What is Data Conflict Analysis?

Conflict analysis is the activity of detecting, tracing, and explaining possible conflicts among observations of variable values (i.e., evidence or data). Inconsistencies among observations are easily detected (P(evidence) = 0), but flawed findings should also be detected and traced. For example, in a diagnostic situation a single flawed test result may take the investigation in a completely wrong direction.

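To understand what conflict analysis is and how it can be used, the following topics are covered below: the definition of data conflict, the conflict measure, conflict resolution, tracing conflicts, hypothesis variables, and hypothesis driven conflict analysis.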

Definition of Data Conflict

We define two sets of observations e1 and e2 to be in a possible conflict with one another if they are negatively correlated.

For positively correlated findings we expect that P(e1|e2) > P(e1) and P(e2|e1) > P(e2), i.e., observing e2 makes it more likely to also observe e1, and vice versa. In other words, we expect that

P(e1,e2) > P(e1)P(e2)

if e1 and e2 are positively correlated,

P(e1,e2) < P(e1)P(e2)

if e1 and e2 are negatively correlated, and

P(e1,e2) = P(e1)P(e2)

if e1 and e2 are independent.
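
The three cases above can be checked numerically. The following is a minimal Python sketch, not part of the tool itself; the function name classify and the example probabilities are made up for illustration.

```python
import math


def classify(p_e1: float, p_e2: float, p_joint: float) -> str:
    """Compare P(e1,e2) with P(e1)P(e2) and name the relationship."""
    product = p_e1 * p_e2
    # Use a tolerance so that rounding errors do not hide independence.
    if math.isclose(p_joint, product, rel_tol=1e-9, abs_tol=1e-12):
        return "independent"
    if p_joint > product:
        return "positively correlated"
    return "negatively correlated (possible conflict)"


# Hypothetical numbers: P(e1) = 0.5 and P(e2) = 0.4, so P(e1)P(e2) = 0.2.
print(classify(0.5, 0.4, 0.3))
print(classify(0.5, 0.4, 0.1))
print(classify(0.5, 0.4, 0.2))
```

Only the second call describes a possible conflict, since there the joint probability is smaller than the product of the marginals.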

Conflict Measure

Therefore, given a set of observations (evidence), e = {e1,...,en}, we define the conflict measure for e as

conf(e) = log( P(e1)P(e2)...P(en) / P(e) )

If conf(e) is positive, then P(e) < P(e1)...P(en), i.e., e1,...,en are negatively correlated, indicating a possible conflict among these pieces of evidence. (The choice of base for the log function is immaterial.)

Note that if conf(e) is negative (i.e., no apparent conflict among e1,...,en), this gives no guarantee that all of e1,...,en are positively correlated. It may well happen that there is a local conflict (i.e., conf(e') > 0 for a proper subset e' of e) although conf(e) < 0.
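
To illustrate a local conflict, the sketch below computes the conflict measure by brute-force summation over a small joint distribution. It assumes the usual form of the measure, the log of the product of the marginal probabilities of the findings divided by their joint probability. The joint table over three binary variables A, B, and C, and the helper names prob and conf, are hypothetical and chosen only to make the effect visible; the tool itself works on a junction tree rather than an explicit joint table.

```python
import math

VARIABLES = ("A", "B", "C")

# Hypothetical joint distribution P(A, B, C); unlisted states have probability 0.
JOINT = {
    (1, 1, 1): 0.10,
    (1, 0, 0): 0.30,
    (0, 1, 0): 0.30,
    (0, 0, 0): 0.30,
}


def prob(evidence: dict) -> float:
    """P(evidence): sum the joint entries that agree with every finding."""
    return sum(
        p
        for states, p in JOINT.items()
        if all(states[VARIABLES.index(v)] == s for v, s in evidence.items())
    )


def conf(evidence: dict) -> float:
    """Conflict measure: log2 of (product of marginals / joint probability)."""
    p_joint = prob(evidence)
    if p_joint == 0.0:
        return math.inf  # inconsistent evidence, P(evidence) = 0
    p_product = math.prod(prob({v: s}) for v, s in evidence.items())
    return math.log2(p_product / p_joint)


full = conf({"A": 1, "B": 1, "C": 1})
local = conf({"A": 1, "B": 1})
print(f"conf(A=1, B=1, C=1) = {full:.3f}")
print(f"conf(A=1, B=1)      = {local:.3f}")
```

With these numbers the full evidence set has a negative conflict measure, while the subset {A=1, B=1} has a positive one, so the conflict is only visible locally.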

For more information about detection of local conflict, see the help page of the junction tree panel.

Conflict Resolution

There are situations in which a positive conflict measure is computed although there is no real conflict. These include rare cases, and cases where a currently unobserved variable would explain the apparent conflict once it is observed.

By activating the button provided for this purpose, one can obtain a list of possible instantiations of currently uninstantiated variables that can eliminate the current conflict. Activating the button opens a dialog box.

The dialog box contains a list of possible instantiations in the form

<CM>: <variable> = <value>

where <CM> is the new conflict measure obtained if <variable> is instantiated to <value>. Only instantiations (if any) with a resulting conflict measure less than or equal to 0 get displayed.

The Instantiate button enters the currently selected instantiation (if any) as evidence.
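
The search for such instantiations can be sketched in Python as follows. This is an illustrative brute-force version under stated assumptions, not the tool's own algorithm: the joint table, the variable names, and the helper resolving_instantiations are hypothetical, and the conflict measure is taken to be the log of the product of the marginals divided by the joint probability.

```python
import math

VARIABLES = ("A", "B", "C")
STATES = (0, 1)

# Hypothetical joint distribution P(A, B, C); unlisted states have probability 0.
JOINT = {
    (1, 1, 1): 0.10,
    (1, 0, 0): 0.30,
    (0, 1, 0): 0.30,
    (0, 0, 0): 0.30,
}


def prob(evidence: dict) -> float:
    """P(evidence): sum the joint entries that agree with every finding."""
    return sum(
        p
        for states, p in JOINT.items()
        if all(states[VARIABLES.index(v)] == s for v, s in evidence.items())
    )


def conf(evidence: dict) -> float:
    """Conflict measure: log2 of (product of marginals / joint probability)."""
    p_joint = prob(evidence)
    if p_joint == 0.0:
        return math.inf  # inconsistent evidence
    return math.log2(math.prod(prob({v: s}) for v, s in evidence.items()) / p_joint)


def resolving_instantiations(evidence: dict) -> list:
    """List (CM, variable, value) for instantiations giving a new CM <= 0."""
    rows = []
    for variable in VARIABLES:
        if variable in evidence:
            continue  # only uninstantiated variables are candidates
        for value in STATES:
            cm = conf({**evidence, variable: value})
            if cm <= 0:
                rows.append((cm, variable, value))
    return sorted(rows)


current = {"A": 1, "B": 1}
print(f"current conflict measure: {conf(current):.3f}")
for cm, variable, value in resolving_instantiations(current):
    print(f"{cm:.3f}: {variable} = {value}")
```

In this example the findings A=1 and B=1 are in apparent conflict, and instantiating C to 1 is the only candidate that brings the conflict measure to 0 or below.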

Tracing Conflicts

Whenever a positive conflict measure has been observed that cannot be explained as a rare case, it is important to pinpoint the piece (or pieces) of evidence in conflict with the majority of the evidence.

Basically, this involves computation of conflict measures for subsets of the evidence. The junction tree is useful for this purpose; see the help page for the junction tree panel for more information.
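
As a rough illustration of tracing, the Python sketch below enumerates every subset of at least two findings and reports those with a positive conflict measure. It is a brute-force stand-in for the junction tree based computation and is exponential in the number of findings. The model is hypothetical: a binary cause H with three tests T1, T2, and T3 that are conditionally independent given H, each positive with probability 0.9 when H = 1 and 0.1 when H = 0.

```python
import math
from itertools import combinations

# Hypothetical model: P(H) and P(test = 1 | H), the same for every test.
P_H = {1: 0.5, 0: 0.5}
P_POS = {1: 0.9, 0: 0.1}


def prob(evidence: dict) -> float:
    """P(evidence) for test findings, obtained by summing out H."""
    total = 0.0
    for h, p_h in P_H.items():
        p = p_h
        for value in evidence.values():
            p *= P_POS[h] if value == 1 else 1.0 - P_POS[h]
        total += p
    return total


def conf(evidence: dict) -> float:
    """Conflict measure: log2 of (product of marginals / joint probability)."""
    p_joint = prob(evidence)
    if p_joint == 0.0:
        return math.inf  # inconsistent evidence
    return math.log2(math.prod(prob({k: v}) for k, v in evidence.items()) / p_joint)


def conflicting_subsets(evidence: dict) -> list:
    """Return (subset, CM) for every subset of two or more findings with CM > 0."""
    found = []
    names = sorted(evidence)
    for size in range(2, len(names) + 1):
        for subset in combinations(names, size):
            cm = conf({name: evidence[name] for name in subset})
            if cm > 0:
                found.append((subset, cm))
    return found


EVIDENCE = {"T1": 1, "T2": 1, "T3": 0}
for subset, cm in conflicting_subsets(EVIDENCE):
    print(f"{', '.join(subset)}: conf = {cm:.3f}")
```

Every conflicting subset in this example contains T3, while {T1, T2} shows no conflict, which points to T3 as the finding that disagrees with the rest.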

Hypothesis Variables

When searching for a hypothesis or observation that may eliminate the current conflict, it may be desirable to restrict the set of uninstantiated variables considered in the search. This is done by selecting a set of hypothesis variables. Any subset of the discrete chance nodes may be selected as hypothesis variables. Only the selected hypothesis variables are considered in the search for instantiations that may eliminate the current conflict.

Hypothesis Driven Conflict Analysis

So far we have considered data conflict analysis, i.e., how to identify, resolve, or explain a possible conflict in data. In hypothesis driven conflict analysis we instead investigate how each piece of evidence impacts a given hypothesis h.

We compute the prior P(h) of the hypothesis, the posterior P(h|e) of the hypothesis given the entire set of evidence e and, for each piece of evidence (finding) f, the posterior P(h|e\f) of the hypothesis given the evidence with f retracted. Comparing these values shows how a single finding impacts the probability of the hypothesis.

A threshold value is applied to reduce the number of findings displayed. Results are shown only for findings whose cost-of-omission is above the threshold, where cost-of-omission is a measure of the distance between P(h|e) and P(h|e\f).
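
The quantities involved can be sketched in Python as follows. The distance used for the cost-of-omission is not specified on this page; the sketch assumes the Kullback-Leibler divergence from P(h|e\f) to P(h|e), which is one common choice. The model (a hypothesis with states yes and no and three conditionally independent tests), the threshold of 0.2, and all function names are hypothetical.

```python
import math

# Hypothetical model: prior P(h) and P(test = 1 | h), the same for every test.
P_H = {"yes": 0.5, "no": 0.5}
P_POS = {"yes": 0.9, "no": 0.1}


def posterior(evidence: dict) -> dict:
    """P(h | evidence) by Bayes' rule over the two hypothesis states."""
    weights = {}
    for h, p_h in P_H.items():
        w = p_h
        for value in evidence.values():
            w *= P_POS[h] if value == 1 else 1.0 - P_POS[h]
        weights[h] = w
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def cost_of_omission(full: dict, reduced: dict) -> float:
    """Assumed distance: KL divergence between P(h|e) and P(h|e\\f)."""
    cost = 0.0
    for h, p in full.items():
        if p == 0.0:
            continue  # terms with zero probability contribute nothing
        if reduced[h] == 0.0:
            return math.inf
        cost += p * math.log(p / reduced[h])
    return cost


def impact_report(evidence: dict, threshold: float) -> list:
    """Return (finding, P(h|e\\f), cost) for findings above the threshold."""
    full = posterior(evidence)
    rows = []
    for name in evidence:
        rest = {k: v for k, v in evidence.items() if k != name}
        reduced = posterior(rest)
        cost = cost_of_omission(full, reduced)
        if cost > threshold:
            rows.append((name, reduced, cost))
    return rows


EVIDENCE = {"T1": 1, "T2": 1, "T3": 0}
print("P(h)   =", P_H)
print("P(h|e) =", posterior(EVIDENCE))
for name, reduced, cost in impact_report(EVIDENCE, threshold=0.2):
    print(f"without {name}: P(h|e\\f) = {reduced}, cost-of-omission = {cost:.3f}")
```

With the assumed threshold only T1 and T2 are reported: retracting either of them moves the posterior of the hypothesis much further than retracting T3 does.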