Introduction

A junction tree window shows the junction tree(s) of a Bayesian network or influence diagram.

To understand what a junction tree is and what it can be used for, there are several issues that may be of interest:

What is a Junction Tree?

Inference in a Bayesian network is conducted through message passing in a so-called junction tree of the network. The junction tree is obtained through transformation of the network, which involves the processes of moralization and triangulation. It can be shown that exact inference in a Bayesian network requires (implicit or explicit) moralization and triangulation.

In the moralization step, an undirected link is first added between each pair of unconnected parent nodes that have a common child node. Next, all directed links are converted to undirected links. This process results in a so-called moral graph.

The triangulation step adds (undirected) links to the moral graph such that the resulting graph is triangulated (or chordal). An undirected graph is chordal whenever it does not contain a cycle (i.e., a path of distinct nodes, except that the first and the last nodes are identical) of length greater than 3 without a chord (i.e., a link between non-consecutive nodes of the path). The maximal completely connected components or subgraphs (i.e., where each pair of nodes are connected to each other) of a graph are called cliques.

It is the cliques of the triangulated graph that form the nodes of the junction tree. Each clique has associated with it a table with indices spanning the Cartesian product of the state spaces of the nodes of the clique. Thus, each clique holds a representation of a function over all nodes of the clique. That is, a joint probability distribution over the nodes of the clique can be represented in the clique table.

Obviously, as each clique has associated with it a table that grows exponentially in size as more nodes are added to it, we want the cliques to be as small as possible. However, as the number of different triangulations may be as high as n factorial, where n is the number of nodes of the network, and as the representational and computational complexities of the resulting junction trees vary heavily over the different triangulations, it can be very difficult to find an optimal triangulation. In fact, the complexity of this optimization problem is NP-hard (mathematical way to say that it is impossible within reasonable time).

The Hugin Decision Engine (HDE) offers a number of different triangulation algorithms, most of which are based on greedy one-step look ahead heuristics. In many cases, however, an optimal (or near-optimal) triangulation can in fact be found. The HDE offers a method for (near-)optimal triangulation, which in many cases results in junction trees of much lower complexity than those generated using the other (heuristic) triangulation methods provided. For details, see the "Compile" item of the on-line manual, which is accessed through the help button of the main window of the Hugin Graphical User Interface.

Please note that if the Bayesian network is disconnected, so is the junction tree. A disconnected junction tree is sometimes called a junction forest.

Inference in a Junction Tree

As mentioned above, inference in a Bayesian network amounts to message passing in a junction-tree representation of the network.

An inference step (also referred to as a propagation) consists of two main phases:

A clique, C, are allowed to send a message to a neighboring clique, D, whenever C has received messages from all other neighbors and it has not already sent a message to D.

A message from a clique C to a neighboring clique D consists of a marginal table over the nodes that are members of both C and D. More precisely, the message is computed from the table of C by eliminating the dimensions corresponding to the nodes of C that are not members of D. Typically, elimination is performed through summation. To compute the most likely configuration (also known as the most probable explanation (MPE)) and the probability of this configuration, elimination is, however, performed through maximization.

Data Conflict Analysis

Conflict analysis is the activity of detecting, tracing, and explaining possible conflicts among observations of variable values (i.e., evidence or data). Inconsistencies among observations are easily detected (P(evidence) = 0), but also flawed findings should be detected and traced. For example, in a diagnostic situation a single flawed test result may take the investigation in a completely wrong direction.

To understand what conflict analysis is and how it can be used, there are several issues of interest:

Definition of Data Conflict

We define two sets of observations e1 and e2 to be in a possible conflict with one another if they are negatively correlated.

For positively correlated findings we expect that P(e1|e2) > P(e1) and vice versa (i.e., observing e2 makes it more likely to also observe e1 (and vice versa)). In other words, we expect that

P(e1,e2) > P(e1)P(e2)

if e1 and e2 are positively correlated,

P(e1,e2) < P(e1)P(e2)

if e1 and e2 are negatively correlated, and

P(e1,e2) = P(e1)P(e2)

if e1 and e2 are independent.

Conflict Measure

Therefore, given a set of observations (evidence), e = {e1,...,en}, we define the conflict measure for e as

If conf(e) is positive, e1,...,en are negatively correlated, indicating a possible conflict among these pieces of evidence. (The choice of base for the log function is immaterial.)

Notice, that if conf(e) is negative (i.e., no apparent conflict among e1,...,en), then this gives you no guarantee that all of e1,...,en are positively correlated. It may well happen that there is a local conflict (i.e., that conf(e') > 0 for a proper subset e' of e) although conf(e) < 0.

As an example, consider the junction tree below, where there are five pieces of evidence (marked by the red nodes). The global conflict measure is -0.54, indicating no global conflict. This conflict measure is obtained from the root clique.

The measure indicated in each clique is the conflict measure pertaining to the evidence entered in the subtree with that clique as root. For example, the conflict measure of the clique containing nodes "Education" and "Husb_educ" is -0.6 and pertains to the two pieces of evidence associated with "Education" and "Husb_occup".

To compute the local conflict measure for the three pieces of evidence associated with "Age", "Religion" and "Contraceptive" one can select, for example, the clique { "Education", "Religion", "Contraceptive" } as root (see below). (See the Tools and Functionalities section for details about selecting a different clique as root.)

Now, it turns out that there is a slight conflict among these three pieces of evidence (conflict measure is 0.39).

Conflict Resolution

There are situations in which a positive conflict measure is computed, where there is no real conflict. These include: By activating the button with the   symbol, one can obtain a list of possible instantiations of currently uninstantiated variables that can eliminate the current conflict. An example of the dialog box that appears when this button is activated can be seen below.

The dialog box contains a list of possible instantiations in the form

<RM>: <variable_name> = <state_value>

where <RM> is a measure indicating the ability of the instantiation to resolve the conflict. An instantiation with an <RM> value of 100 will reduce the conflict measure to 0. Thus, only instantiations with an <RM> value greater than or equal to 100 get displayed. The higher the <RM> value of the selected instantiation (if any), the larger the negative value of the resulting conflict measure.

The Instantiate button enters the currently selected instantiation (if any) as evidence.

Tracing Conflicts

Whenever a positive conflict has been observed that cannot be explained as a rare case, it is important to pinpoint the piece (or pieces) of evidence that is in conflict with the majority of the pieces of evidence.

Basically, this involves computation of conflict measures for subsets of the evidence. As mentioned and illustrated above, the junction tree is useful for this purpose. As an example, consider the junction tree below, where four pieces of evidence have been entered (marked by the red nodes).

The conflict measure of 0.21 indicates a slight conflict among the these pieces of evidence. To trace the source of the conflict, we can investigate local conflict measures. The junction tree shown above does not reveal any definitive clues as to which piece could be the one causing the conflict. It does show, however, that the local conflict measure is zero for the subtree rooted at clique { T, E, L }, indicating that the observations on X and A are not in conflict with one another.

To investigate further, we may select another clique as root of the junction tree, so as to compute some different local conflict measures. If we select clique { T, E, L } as the root of the tree (see below), we find that there is a slight local conflict among the observed value for D and the observed value for S.

Tools and Functionalities

The cliques of the junction tree(s) are displayed as rectangles, where the upper part of the rectangle contains a list of the nodes that are members of the clique, and the lower part contains a conflict bar indicating the conflict measure pertaining to the evidence entered in the subtree rooted at the clique.

A clique can be selected by clicking the left mouse button. When selected the border of a clique change from grey to black. It is possible to select multiple cliques by holding the shift key down while clicking. When selecting one or more cliques, the nodes of the selected cliques will get selected in the Network Panel.

The Junction Tree Panel offers a number of other functionalities, which are available through a tool bar of buttons and a pull-down menu (see below).

The magnification glass buttons offer a possibility to zoom in and out for better viewing.

The button with the   symbol offers a possibility to select a different clique as root of the tree. The functionality gets enabled only when exactly one clique is selected. Selecting a different clique as root can be useful in the investigation of local conflicts.

The button with the   symbol gets enabled when the global conflict measure is positive. See the above section on conflict resolution for details.

The button with the   symbol provides the usual sum propagation functionality.

The button with the   symbol activates this help page.

Evidence is indicated by highlighting evidence nodes (the color of their node symbols are changed to the evidence color). Two different modes exist for displaying evidence nodes. Either all instances of the evidence nodes will be highlighted or only those instances that appear in the cliques where the evidence has been entered. (Note that a node can be a member of several cliques.) A pull-down menu (see above) is provided for selection of preferred display mode.