Bayesian networks

[figure: cause → effect]

Example
• Topology of the network encodes conditional independence assertions:
  – Weather is independent of the other variables
  – Toothache and Catch are conditionally independent given Cavity

Bayesian Networks
• In the opinion of many AI researchers, Bayesian networks are the most significant contribution to AI in the last 10 years
• They are used in many applications, e.g., spam filtering, speech recognition, robotics, diagnostic systems, and even syndromic surveillance
[figure: anthrax network, with HasAnthrax as parent of HasCough, HasFever, HasDifficultyBreathing, and HasWideMediastinum]

Example
[figure: network over BCR subset, CDR3, V-Gene, J-Gene, Mutated]

Example
• I'm at work; neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes the alarm is set off by minor earthquakes. Is there a burglar?
• Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
• Network topology reflects "causal" knowledge:
  – A burglar can set the alarm off
  – An earthquake can set the alarm off
  – The alarm can cause Mary to call
  – The alarm can cause John to call

Example contd.
[figure: the burglary network with its CPTs]

Compactness
• A CPT for Boolean Xi with k Boolean parents has 2^k rows, one for each combination of parent values
• Each row requires one number p for Xi = true (the number for Xi = false is just 1 − p)
• If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers
• I.e., it grows linearly with n, vs. O(2^n) for the full joint distribution
• For the burglary net: 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31)

Semantics
The full joint distribution is defined as the product of the local conditional distributions:

  P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Parents(Xi))

e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)

Constructing Bayesian networks
1. Choose an ordering of variables X1, …, Xn
2. For i = 1 to n:
   – add Xi to the network
   – select parents from X1, …, Xi−1 such that P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi−1)

This choice of parents guarantees:

  P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, Xi−1)   (chain rule)
               = ∏_{i=1}^{n} P(Xi | Parents(Xi))   (by construction)

Example
• Suppose we choose the ordering M, J, A, B, E
• P(J | M) = P(J)?  No
• P(A | J, M) = P(A | J)?  P(A | J, M) = P(A)?  No
• P(B | A, J, M) = P(B | A)?  Yes
• P(B | A, J, M) = P(B)?  No
• P(E | B, A, J, M) = P(E | A)?  No
• P(E | B, A, J, M) = P(E | A, B)?  Yes

Example
[figure: the network that results from the ordering M, J, A, B, E]
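The factored joint from the Semantics slide can be evaluated directly, as in the minimal Python sketch below. The burglary CPT values live in the "Example contd." figure rather than in this text, so the numbers here are the standard textbook values for this example and should be treated as assumptions.

```python
# Sketch: evaluate the burglary network's factored joint distribution.
# CPT values are the standard textbook numbers for this example,
# assumed here because the slide's figure values are not in the text.

P_B = 0.001                      # P(Burglary = true)
P_E = 0.002                      # P(Earthquake = true)
P_A = {(True, True): 0.95,       # P(Alarm = true | Burglary, Earthquake)
       (True, False): 0.94,
       (False, True): 0.29,
       (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls = true | Alarm)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) as a product of the local CPTs."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# The slide's example: P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
print(joint(b=False, e=False, a=True, j=True, m=True))  # ≈ 0.000628
```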
Outline
1. Introduction
2. Probability Primer
3. Bayesian networks
4. Bayesian networks in syndromic surveillance

Probability Primer: Random Variables
• A random variable is the basic element of probability
• It refers to an event, and there is some degree of uncertainty as to the outcome of that event
• For example, the random variable A could be the event of getting heads on a coin flip

Boolean Random Variables
• We will start with the simplest type of random variable – Boolean ones
• They take the values true or false
• Think of the event as occurring or not occurring
• Examples (let A be a Boolean random variable):
  – A = Getting heads on a coin flip
  – A = It will rain today
  – A = The Cubs win the World Series in 2007

Probabilities
We will write P(A = true) to mean the probability that A = true.
[figure: diagram of the P(A = true) and P(A = false) regions; the sum of the two areas is 1]
What is probability? It is the relative frequency with which an outcome would be obtained if the process were repeated a large number of times under similar conditions*
*Ahem… there's also the Bayesian definition, which says probability is your degree of belief in an outcome

Conditional Probability
• P(A = true | B = true) = out of all the outcomes in which B is true, the fraction in which A is also true
• Read this as: "probability of A conditioned on B" or "probability of A given B"
• Example: H = "Have a headache", F = "Coming down with flu"
  – P(H = true) = 1/10
  – P(F = true) = 1/40
  – P(H = true | F = true) = 1/2
  – "Headaches are rare and flu is rarer, but if you're coming down with flu there's a 50-50 chance you'll have a headache."

The Joint Probability Distribution
• We will write P(A = true, B = true) to mean "the probability of A = true and B = true"
• Notice that:
  P(H = true | F = true) = Area of "H and F" region / Area of "F" region
                         = P(H = true, F = true) / P(F = true)
• In general, P(X | Y) = P(X, Y) / P(Y)

The Joint Probability Distribution
• Joint probabilities can be over any number of variables, e.g., P(A = true, B = true, C = true)
• For each combination of variables, we need to say how probable that combination is
• The probabilities of these combinations need to sum to 1:

  A      B      C      P(A,B,C)
  false  false  false  0.1
  false  false  true   0.2
  false  true   false  0.05
  false  true   true   0.05
  true   false  false  0.3
  true   false  true   0.1
  true   true   false  0.05
  true   true   true   0.15
                       (sums to 1)

The Joint Probability Distribution
• Once you have the joint probability distribution, you can calculate any probability involving A, B, and C
• Note: you may need to use marginalization and Bayes' rule (neither of which is discussed in these slides)
• Examples of things you can compute:
  – P(A = true) = the sum of P(A,B,C) over the rows with A = true
  – P(A = true, B = true | C = true) = P(A = true, B = true, C = true) / P(C = true)
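Those two computations are mechanical enough to script. Here is a minimal Python sketch over the table above (the variable names are mine):

```python
# The joint distribution P(A, B, C) from the table above,
# keyed by the truth values (a, b, c).
joint = {
    (False, False, False): 0.1,
    (False, False, True):  0.2,
    (False, True,  False): 0.05,
    (False, True,  True):  0.05,
    (True,  False, False): 0.3,
    (True,  False, True):  0.1,
    (True,  True,  False): 0.05,
    (True,  True,  True):  0.15,
}

# Marginalization: P(A = true) is the sum over the rows with A = true.
p_a = sum(p for (a, b, c), p in joint.items() if a)
print(p_a)  # 0.3 + 0.1 + 0.05 + 0.15 = 0.6

# Conditioning: P(A = true, B = true | C = true)
#             = P(A = true, B = true, C = true) / P(C = true)
p_c = sum(p for (a, b, c), p in joint.items() if c)
print(joint[(True, True, True)] / p_c)  # 0.15 / 0.5 = 0.3
```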
The Problem with the Joint Distribution
• Lots of entries in the table to fill up!
• For k Boolean random variables, you need a table of size 2^k
• How do we use fewer numbers? We need the concept of independence.

Independence
Variables A and B are independent if any of the following hold:
• P(A, B) = P(A) P(B)
• P(A | B) = P(A)
• P(B | A) = P(B)
This says that knowing the outcome of A does not tell me anything new about the outcome of B.

Independence
How is independence useful?
• Suppose you have n coin flips and you want to calculate the joint distribution P(C1, …, Cn)
• If the coin flips are not independent, you need 2^n values in the table
• If the coin flips are independent, then

  P(C1, …, Cn) = ∏_{i=1}^{n} P(Ci)

  Each P(Ci) table has 2 entries, and there are n of them, for a total of 2n values

Conditional Independence
Variables A and B are conditionally independent given C if any of the following hold:
• P(A, B | C) = P(A | C) P(B | C)
• P(A | B, C) = P(A | C)
• P(B | A, C) = P(B | C)
Knowing C tells me everything about B. I don't gain anything by also knowing A (either because A doesn't influence B, or because knowing C provides all the information that knowing A would give).

Outline
1. Introduction
2. Probability Primer
3. Bayesian networks
4. Bayesian networks in syndromic surveillance

A Bayesian Network
A Bayesian network is made up of:
1. A directed acyclic graph (here: A → B, B → C, B → D)
2. A conditional probability table for each node in the graph:

  A      P(A)
  false  0.6
  true   0.4

  A      B      P(B|A)
  false  false  0.01
  false  true   0.99
  true   false  0.7
  true   true   0.3

  B      C      P(C|B)
  false  false  0.4
  false  true   0.6
  true   false  0.9
  true   true   0.1

  B      D      P(D|B)
  false  false  0.02
  false  true   0.98
  true   false  0.05
  true   true   0.95

Shotgun proteomics
[figure: pipeline in which training PSMs feed a probability model to produce a trained model, which is then evaluated on test PSMs; PSM = peptide-spectrum match]
• Peptide sequence influences peak height

Bayesian network
• We model peptide fragmentation using a Bayesian network.
• Nodes represent random variables, and edges represent conditional dependencies.
• Each node stores a conditional probability table (CPT) giving Pr(node | parents).
• Example CPT for "b-ion intensity" given "Is b-ion observed?":

                       intensity > 50%   intensity < 50%
  b-ion observed       0.25              0.75
  no b-ion observed    0.00              1.00

Ion series modeled in a Markov chain
[figure: a chain of ("Is b-ion observed?" → "b-ion intensity") node pairs, each observation node linked to the next]
~ PepHMM (Han et al., 2005)

A more realistic model
[figure: "Is b-ion observed?" → "b-ion intensity", with additional parents N-term AA, C-term AA, "Is ion detectable?", fractional m/z, and "Is proton mobile?"]

Ion series modeled in a Markov chain

  LOR_b = log [ Pr(b-ions, peptide | model) / Pr(b-ions, peptide | null model) ]
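Both Pr terms in LOR_b are joint probabilities, and a Bayesian network evaluates any joint as the product of its local CPTs. As a concrete, minimal sketch of that product rule, here is Python for the fully specified A → B → {C, D} network from the "A Bayesian Network" slide above, using its CPT values:

```python
from itertools import product

# CPTs from the "A Bayesian Network" slide, keyed by truth values.
P_A = {True: 0.4, False: 0.6}                       # P(A = a)
P_B = {(False, False): 0.01, (False, True): 0.99,   # P(B = b | A = a), key (a, b)
       (True,  False): 0.7,  (True,  True): 0.3}
P_C = {(False, False): 0.4,  (False, True): 0.6,    # P(C = c | B = b), key (b, c)
       (True,  False): 0.9,  (True,  True): 0.1}
P_D = {(False, False): 0.02, (False, True): 0.98,   # P(D = d | B = b), key (b, d)
       (True,  False): 0.05, (True,  True): 0.95}

def joint(a, b, c, d):
    """P(A,B,C,D) = P(A) P(B|A) P(C|B) P(D|B)."""
    return P_A[a] * P_B[(a, b)] * P_C[(b, c)] * P_D[(b, d)]

print(joint(True, False, True, True))  # 0.4 * 0.7 * 0.6 * 0.98 = 0.16464

# Sanity check: the factored joint sums to 1 over all 16 assignments.
print(sum(joint(*v) for v in product([False, True], repeat=4)))  # 1.0
```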