Bayesian networks

[figure: cause → effect]

Example
• Topology of the network encodes conditional independence assertions:
  – Weather is independent of the other variables
  – Toothache and Catch are conditionally independent given Cavity

Bayesian Networks
• In the opinion of many AI researchers, Bayesian networks are the most significant contribution to AI in the last 10 years
• They are used in many applications, e.g., spam filtering, speech recognition, robotics, diagnostic systems, and even syndromic surveillance
[figure: anthrax network, with HasAnthrax as parent of HasCough, HasFever, HasDifficultyBreathing, and HasWideMediastinum]

Example
[figure: network over BCR subset, CDR3, V-Gene, J-Gene, Mutated]

Example
• I'm at work; neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes the alarm is set off by minor earthquakes. Is there a burglar?
• Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
• Network topology reflects "causal" knowledge:
  – A burglar can set the alarm off
  – An earthquake can set the alarm off
  – The alarm can cause Mary to call
  – The alarm can cause John to call

Example contd.
[figure: the burglary network with its CPTs]

Compactness
• A CPT for Boolean Xi with k Boolean parents has 2^k rows, one for each combination of parent values
• Each row requires one number p for Xi = true (the number for Xi = false is just 1 − p)
• If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers
• I.e., it grows linearly with n, vs. O(2^n) for the full joint distribution
• For the burglary net: 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31)

Semantics
The full joint distribution is defined as the product of the local conditional distributions:

  P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Parents(Xi))

e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)

Constructing Bayesian networks
1. Choose an ordering of variables X1, …, Xn
2. For i = 1 to n:
   – add Xi to the network
   – select parents from X1, …, Xi−1 such that P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi−1)

This choice of parents guarantees:

  P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, Xi−1)   (chain rule)
               = ∏_{i=1}^{n} P(Xi | Parents(Xi))   (by construction)

Example
• Suppose we choose the ordering M, J, A, B, E
• P(J | M) = P(J)?  No
• P(A | J, M) = P(A | J)?  P(A | J, M) = P(A)?  No
• P(B | A, J, M) = P(B | A)?  Yes
• P(B | A, J, M) = P(B)?  No
• P(E | B, A, J, M) = P(E | A)?  No
• P(E | B, A, J, M) = P(E | A, B)?  Yes

Example
[figure: the network that results from the ordering M, J, A, B, E]
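The factored joint from the Semantics slide can be evaluated directly, as in the minimal Python sketch below. The burglary CPT values live in the "Example contd." figure rather than in this text, so the numbers here are the standard textbook values for this example and should be treated as assumptions.

```python
# Sketch: evaluate the burglary network's factored joint distribution.
# CPT values are the standard textbook numbers for this example,
# assumed here because the slide's figure values are not in the text.

P_B = 0.001                      # P(Burglary = true)
P_E = 0.002                      # P(Earthquake = true)
P_A = {(True, True): 0.95,       # P(Alarm = true | Burglary, Earthquake)
       (True, False): 0.94,
       (False, True): 0.29,
       (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls = true | Alarm)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) as a product of the local CPTs."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# The slide's example: P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
print(joint(b=False, e=False, a=True, j=True, m=True))  # ≈ 0.000628
```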
Outline
1. Introduction
2. Probability Primer
3. Bayesian networks
4. Bayesian networks in syndromic surveillance

Probability Primer: Random Variables
• A random variable is the basic element of probability
• It refers to an event, and there is some degree of uncertainty as to the outcome of that event
• For example, the random variable A could be the event of getting heads on a coin flip

Boolean Random Variables
• We will start with the simplest type of random variable – Boolean ones
• They take the values true or false
• Think of the event as occurring or not occurring
• Examples (let A be a Boolean random variable):
  – A = Getting heads on a coin flip
  – A = It will rain today
  – A = The Cubs win the World Series in 2007

Probabilities
We will write P(A = true) to mean the probability that A = true.
[figure: diagram of the P(A = true) and P(A = false) regions; the sum of the two areas is 1]
What is probability? It is the relative frequency with which an outcome would be obtained if the process were repeated a large number of times under similar conditions*
*Ahem… there's also the Bayesian definition, which says probability is your degree of belief in an outcome

Conditional Probability
• P(A = true | B = true) = out of all the outcomes in which B is true, the fraction in which A is also true
• Read this as: "probability of A conditioned on B" or "probability of A given B"
• Example: H = "Have a headache", F = "Coming down with flu"
  – P(H = true) = 1/10
  – P(F = true) = 1/40
  – P(H = true | F = true) = 1/2
  – "Headaches are rare and flu is rarer, but if you're coming down with flu there's a 50-50 chance you'll have a headache."

The Joint Probability Distribution
• We will write P(A = true, B = true) to mean "the probability of A = true and B = true"
• Notice that:
  P(H = true | F = true) = Area of "H and F" region / Area of "F" region
                         = P(H = true, F = true) / P(F = true)
• In general, P(X | Y) = P(X, Y) / P(Y)

The Joint Probability Distribution
• Joint probabilities can be over any number of variables, e.g., P(A = true, B = true, C = true)
• For each combination of variables, we need to say how probable that combination is
• The probabilities of these combinations need to sum to 1:

  A      B      C      P(A,B,C)
  false  false  false  0.1
  false  false  true   0.2
  false  true   false  0.05
  false  true   true   0.05
  true   false  false  0.3
  true   false  true   0.1
  true   true   false  0.05
  true   true   true   0.15
                       (sums to 1)

The Joint Probability Distribution
• Once you have the joint probability distribution, you can calculate any probability involving A, B, and C
• Note: you may need to use marginalization and Bayes' rule (neither of which is discussed in these slides)
• Examples of things you can compute:
  – P(A = true) = the sum of P(A,B,C) over the rows with A = true
  – P(A = true, B = true | C = true) = P(A = true, B = true, C = true) / P(C = true)
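Those two computations are mechanical enough to script. Here is a minimal Python sketch over the table above (the variable names are mine):

```python
# The joint distribution P(A, B, C) from the table above,
# keyed by the truth values (a, b, c).
joint = {
    (False, False, False): 0.1,
    (False, False, True):  0.2,
    (False, True,  False): 0.05,
    (False, True,  True):  0.05,
    (True,  False, False): 0.3,
    (True,  False, True):  0.1,
    (True,  True,  False): 0.05,
    (True,  True,  True):  0.15,
}

# Marginalization: P(A = true) is the sum over the rows with A = true.
p_a = sum(p for (a, b, c), p in joint.items() if a)
print(p_a)  # 0.3 + 0.1 + 0.05 + 0.15 = 0.6

# Conditioning: P(A = true, B = true | C = true)
#             = P(A = true, B = true, C = true) / P(C = true)
p_c = sum(p for (a, b, c), p in joint.items() if c)
print(joint[(True, True, True)] / p_c)  # 0.15 / 0.5 = 0.3
```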
The Problem with the Joint Distribution
• Lots of entries in the table to fill up!
• For k Boolean random variables, you need a table of size 2^k
• How do we use fewer numbers? We need the concept of independence.

Independence
Variables A and B are independent if any of the following hold:
• P(A, B) = P(A) P(B)
• P(A | B) = P(A)
• P(B | A) = P(B)
This says that knowing the outcome of A does not tell me anything new about the outcome of B.

Independence
How is independence useful?
• Suppose you have n coin flips and you want to calculate the joint distribution P(C1, …, Cn)
• If the coin flips are not independent, you need 2^n values in the table
• If the coin flips are independent, then

  P(C1, …, Cn) = ∏_{i=1}^{n} P(Ci)

  Each P(Ci) table has 2 entries, and there are n of them, for a total of 2n values

Conditional Independence
Variables A and B are conditionally independent given C if any of the following hold:
• P(A, B | C) = P(A | C) P(B | C)
• P(A | B, C) = P(A | C)
• P(B | A, C) = P(B | C)
Knowing C tells me everything about B. I don't gain anything by also knowing A (either because A doesn't influence B, or because knowing C provides all the information that knowing A would give).

Outline
1. Introduction
2. Probability Primer
3. Bayesian networks
4. Bayesian networks in syndromic surveillance

A Bayesian Network
A Bayesian network is made up of:
1. A directed acyclic graph (here: A → B, B → C, B → D)
2. A conditional probability table for each node in the graph:

  A      P(A)
  false  0.6
  true   0.4

  A      B      P(B|A)
  false  false  0.01
  false  true   0.99
  true   false  0.7
  true   true   0.3

  B      C      P(C|B)
  false  false  0.4
  false  true   0.6
  true   false  0.9
  true   true   0.1

  B      D      P(D|B)
  false  false  0.02
  false  true   0.98
  true   false  0.05
  true   true   0.95

Shotgun proteomics
[figure: pipeline in which training PSMs feed a probability model to produce a trained model, which is then evaluated on test PSMs; PSM = peptide-spectrum match]
• Peptide sequence influences peak height

Bayesian network
• We model peptide fragmentation using a Bayesian network.
• Nodes represent random variables, and edges represent conditional dependencies.
• Each node stores a conditional probability table (CPT) giving Pr(node | parents).
• Example CPT for "b-ion intensity" given "Is b-ion observed?":

                       intensity > 50%   intensity < 50%
  b-ion observed       0.25              0.75
  no b-ion observed    0.00              1.00

Ion series modeled in a Markov chain
[figure: a chain of ("Is b-ion observed?" → "b-ion intensity") node pairs, each observation node linked to the next]
~ PepHMM (Han et al., 2005)

A more realistic model
[figure: "Is b-ion observed?" → "b-ion intensity", with additional parents N-term AA, C-term AA, "Is ion detectable?", fractional m/z, and "Is proton mobile?"]

Ion series modeled in a Markov chain

  LOR_b = log [ Pr(b-ions, peptide | model) / Pr(b-ions, peptide | null model) ]
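Both Pr terms in LOR_b are joint probabilities, and a Bayesian network evaluates any joint as the product of its local CPTs. As a concrete, minimal sketch of that product rule, here is Python for the fully specified A → B → {C, D} network from the "A Bayesian Network" slide above, using its CPT values:

```python
from itertools import product

# CPTs from the "A Bayesian Network" slide, keyed by truth values.
P_A = {True: 0.4, False: 0.6}                       # P(A = a)
P_B = {(False, False): 0.01, (False, True): 0.99,   # P(B = b | A = a), key (a, b)
       (True,  False): 0.7,  (True,  True): 0.3}
P_C = {(False, False): 0.4,  (False, True): 0.6,    # P(C = c | B = b), key (b, c)
       (True,  False): 0.9,  (True,  True): 0.1}
P_D = {(False, False): 0.02, (False, True): 0.98,   # P(D = d | B = b), key (b, d)
       (True,  False): 0.05, (True,  True): 0.95}

def joint(a, b, c, d):
    """P(A,B,C,D) = P(A) P(B|A) P(C|B) P(D|B)."""
    return P_A[a] * P_B[(a, b)] * P_C[(b, c)] * P_D[(b, d)]

print(joint(True, False, True, True))  # 0.4 * 0.7 * 0.6 * 0.98 = 0.16464

# Sanity check: the factored joint sums to 1 over all 16 assignments.
print(sum(joint(*v) for v in product([False, True], repeat=4)))  # 1.0
```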