"Bayesian Networks without Tears" by Eugene Charniak of Brown University is a classic introductory paper by the author of seminal texts such as Statistical Language Learning, Introduction to Artificial Intelligence, Artificial Intelligence Programming, and Computational Semantics. It has repeatedly been made part (officially or unofficially) of PhD comprehensive exam reading lists.

This short and well-written 14-page paper published by AAAI is a pleasure to read because it is concrete, concise, and comes with the subtext of "making Bayesian networks more accessible to the probabilistically unsophisticated". Bayesian networks, also known as causal networks, combine the visual and representative nature of graphical models with causal modeling and Bayesian probability. This results in a highly useful tool for intelligence analysis. Unlike the usual texts on this subject, the author does not start with the basics of degrees of belief, truth, or uncertainty. Delving right into the mechanics of Bayesian networks, he explains the idea of probabilistic reasoning along with their concepts, structure, features, evaluation, and inference.

Referring to Judea Pearl's work on causality, the paper explains Bayesian networks as "causal" networks with the following properties.

- Bayesian networks are directed acyclic graphs (DAGs)
- The nodes represent random variables and the arrows represent dependencies between random variables (which can be interpreted as causal relationships).
- Each node X is associated with a conditional distribution P(X|parents), where parents are the nodes sending arrows to X.
- Root nodes X (nodes with no parents) are associated with a prior distribution P(X).
- The arrow structure encodes conditional-independence assumptions among the variables (a missing arrow asserts an independence).
- The entire Bayesian network can be read as a representation of the joint probability distribution over all the random variables of its nodes, P(X1, X2, …, Xn), factored according to the independence assumptions: P(X1, …, Xn) = ∏ᵢ P(Xi | parents(Xi)).

- The Bayesian network allows us to compute conditional probabilities of the nodes given that some of them have been observed (the evidence). The relationships (beliefs) in the causal network are then updated in light of that evidence.
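The properties above can be sketched on a toy network. This is a minimal illustration (not an example from the paper): a two-node DAG Rain → WetGrass with made-up probabilities, showing the factorization P(R, W) = P(R) · P(W | R) and belief updating by enumeration.

```python
# Hypothetical two-node Bayesian network: Rain -> WetGrass.
# All numbers below are illustrative assumptions, not from the paper.

# Prior for the root node Rain
p_rain = {True: 0.2, False: 0.8}

# Conditional distribution P(WetGrass | Rain)
p_wet_given_rain = {
    True:  {True: 0.9, False: 0.1},   # rain usually wets the grass
    False: {True: 0.1, False: 0.9},   # without rain, grass is usually dry
}

def joint(rain, wet):
    """Joint probability via the DAG factorization P(R, W) = P(R) * P(W | R)."""
    return p_rain[rain] * p_wet_given_rain[rain][wet]

# Belief updating given the evidence WetGrass = True:
# P(Rain | WetGrass) = P(Rain, WetGrass) / P(WetGrass), by enumeration.
numerator = joint(True, True)
denominator = joint(True, True) + joint(False, True)
posterior = numerator / denominator
print(round(posterior, 3))  # prints 0.692
```

Observing wet grass raises the belief in rain from the prior 0.2 to about 0.69; this is exactly the evidence-driven belief updating the paper describes, done here by brute-force enumeration over the joint.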

The shortcomings of belief networks are also pointed out in the text, such as "the major drawback to their use is the time of evaluation…"

and

"Probably the most important constraint on the use of Bayesian networks is the fact that in general, this computation is NP-hard (Cooper 1987)."

The examples in the paper are quite easy to follow and, most importantly, the explainability of the networks is elaborated quite well, for those of us who like to flaunt the white-box nature of belief networks when comparing them against techniques like SVMs or ANNs.

The connection between causal inference and feature selection is not elaborated in detail, probably because it is deemed out of scope for the paper. Dr. Charniak concludes with the following:

"Bayesian networks offer the AI researcher a convenient way to attack a multitude of problems in which one wants to come to conclusions that are not warranted logically but, rather, probabilistically. Furthermore, they allow you to attack these problems without the traditional hurdles of specifying a set of numbers that grows exponentially with the complexity of the model. Probably the major drawback to their use is the time of evaluation (exponential time for the general case). However, because a large number of people are now using Bayesian networks, there is a great deal of research on efficient exact solution methods as well as a variety of approximation schemes. It is my belief that Bayesian networks or their descendants are the wave of the future."
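The point about "a set of numbers that grows exponentially" can be made concrete with a back-of-envelope count. The figures below are illustrative assumptions (a hypothetical network of 20 binary variables with at most 3 parents per node), not numbers from the paper: a full joint table needs 2ⁿ − 1 independent probabilities, while a Bayesian network with fan-in at most k needs at most n · 2ᵏ conditional probabilities.

```python
# Hypothetical sizes: 20 binary variables, each node with at most 3 parents.
n, k = 20, 3

# Full joint distribution: one probability per assignment, minus one
# (the entries must sum to 1), i.e. 2**n - 1 free parameters.
full_joint_params = 2**n - 1

# Bayesian network: each of the n nodes stores P(X | parents), which for
# binary variables with at most k parents is at most 2**k numbers per node.
bn_params = n * 2**k

print(full_joint_params, bn_params)  # prints 1048575 160
```

The reduction from roughly a million numbers to a few hundred is what makes the networks practical to specify by hand, even though exact evaluation remains exponential in the worst case.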