Daphne Koller

Models
[Figure: a model is a declarative representation. It can be constructed by elicitation from a domain expert or by learning from data, and the same model can then be used by different algorithms.]

Graphical Models
• Bayesian networks (directed), e.g. the student network over Difficulty, Intelligence, Grade, SAT, and Letter
• Markov networks (undirected)

Textual Information Extraction
"Mrs. Green spoke today in New York. Green chairs the finance committee."

The Student Example
• Grade
• Course Difficulty
• Student Intelligence
• Student SAT
• Reference Letter
Joint distribution: P(G, D, I, S, L)

[Figure: the student network — Difficulty → Grade, Intelligence → Grade, Intelligence → SAT, Grade → Letter.]

The CPDs of the student network:

P(D):  d0 = 0.6, d1 = 0.4
P(I):  i0 = 0.7, i1 = 0.3

P(G | I, D):    g1 (A)   g2 (B)   g3 (C)
    i0, d0      0.3      0.4      0.3
    i0, d1      0.05     0.25     0.7
    i1, d0      0.9      0.08     0.02
    i1, d1      0.5      0.3      0.2

P(S | I):       s0       s1
    i0          0.95     0.05
    i1          0.2      0.8

P(L | G):       l0       l1
    g1          0.1      0.9
    g2          0.4      0.6
    g3          0.99     0.01

Chain Rule for Bayesian Networks
P(D, I, G, S, L) = P(D) P(I) P(G | I, D) P(S | I) P(L | G)
Distribution defined as a product of factors!

Example, using the CPDs above:
P(d0, i1, g3, s1, l1) = P(d0) P(i1) P(g3 | i1, d0) P(s1 | i1) P(l1 | g3)
                      = 0.6 × 0.3 × 0.02 × 0.8 × 0.01 = 2.88 × 10^-5

Bayesian Network
• A Bayesian network is:
  – A directed acyclic graph (DAG) G whose nodes represent the random variables X1,…,Xn
  – For each node Xi, a CPD P(Xi | ParG(Xi))
• The BN represents a joint distribution via the chain rule for Bayesian networks:
  P(X1,…,Xn) = Πi P(Xi | ParG(Xi))

BN Is a Legal Distribution: P ≥ 0
P is a product of CPDs, and every CPD entry is nonnegative, so P ≥ 0.

BN Is a Legal Distribution: ∑ P = 1
∑D,I,G,S,L P(D,I,G,S,L) = ∑D,I,G,S,L P(D) P(I) P(G|I,D) P(S|I) P(L|G)
  = ∑D,I,G,S P(D) P(I) P(G|I,D) P(S|I) ∑L P(L|G)
  = ∑D,I,G,S P(D) P(I) P(G|I,D) P(S|I)
  = ∑D,I,G P(D) P(I) P(G|I,D) ∑S P(S|I)
  = ∑D,I,G P(D) P(I) P(G|I,D)
  = ∑D,I P(D) P(I) ∑G P(G|I,D)
  = ∑D,I P(D) P(I) = ∑D P(D) ∑I P(I) = 1

P Factorizes over G
• Let G be a graph over X1,…,Xn.
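The chain-rule computation for the student network can be sketched in code. This is a minimal illustration: the CPT values are the ones from the slides, while the dictionary layout and function names are my own.

```python
# Chain rule for the student Bayesian network:
#   P(D, I, G, S, L) = P(D) P(I) P(G|I,D) P(S|I) P(L|G)
# CPT values transcribed from the slides; data structures are illustrative.

P_D = {"d0": 0.6, "d1": 0.4}
P_I = {"i0": 0.7, "i1": 0.3}
P_G = {  # P(G | I, D), one row per parent assignment (i, d)
    ("i0", "d0"): {"g1": 0.3,  "g2": 0.4,  "g3": 0.3},
    ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.7},
    ("i1", "d0"): {"g1": 0.9,  "g2": 0.08, "g3": 0.02},
    ("i1", "d1"): {"g1": 0.5,  "g2": 0.3,  "g3": 0.2},
}
P_S = {"i0": {"s0": 0.95, "s1": 0.05},
       "i1": {"s0": 0.2,  "s1": 0.8}}
P_L = {"g1": {"l0": 0.1,  "l1": 0.9},
       "g2": {"l0": 0.4,  "l1": 0.6},
       "g3": {"l0": 0.99, "l1": 0.01}}

def joint(d, i, g, s, l):
    """P(d, i, g, s, l): product of the five local CPD entries."""
    return P_D[d] * P_I[i] * P_G[(i, d)][g] * P_S[i][s] * P_L[g][l]

# The worked example from the slides:
print(joint("d0", "i1", "g3", "s1", "l1"))  # ≈ 2.88e-05
```

Summing `joint` over all 2 × 2 × 3 × 2 × 2 assignments gives 1, mirroring the "BN is a legal distribution" argument: each inner sum over a CPD row collapses to 1.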
• P factorizes over G if P(X1,…,Xn) = Πi P(Xi | ParG(Xi))

Naïve Bayes Model
[Figure: class variable C with children X1, X2, …, Xn.]
(Xi ⊥ Xj | C) for all Xi, Xj

Naïve Bayes Classifier
P(C=c1 | x1,…,xn) / P(C=c2 | x1,…,xn) = [P(C=c1) / P(C=c2)] Πi [P(xi | C=c1) / P(xi | C=c2)]

Bernoulli Naïve Bayes for Text
[Figure: Document Label → one binary "word appears" variable per vocabulary word: cat, dog, buy, ….]

P("word" appears | Label):
              cat      dog      buy      sell     …
  Financial   0.001    0.001    0.2      0.3      …
  Pets        0.3      0.4      0.02     0.0001   …

Multinomial Naïve Bayes for Text
[Figure: Document Label → one word variable per position: W1, W2, …, Wn.]

P(Wi = word | Label):
              cat      dog      buy      sell     …
  Financial   0.001    0.001    0.02     0.02     …
  Pets        0.02     0.03     0.003    0.001    …

Summary
• Simple approach for classification
  – Computationally efficient
  – Easy to construct
• Surprisingly effective in domains with many weakly relevant features
• Strong independence assumptions reduce performance when many features are strongly correlated

Application: Diagnosis
Probabilistic Graphical Models — Bayesian Networks: Representation

Medical Diagnosis: Pathfinder (1992)
• Help pathologists diagnose lymph-node pathologies (60 different diseases)
• Pathfinder I: rule-based system
• Pathfinder II used naïve Bayes and got superior performance
Heckerman et al.

Medical Diagnosis: Pathfinder (1992)
• Pathfinder III: naïve Bayes with better knowledge engineering
• No incorrect zero probabilities
• Better calibration of conditional probabilities
  – Comparing P(finding | disease1) to P(finding | disease2)
  – Not P(finding1 | disease) to P(finding2 | disease)
Heckerman et al.

Medical Diagnosis: Pathfinder (1992)
• Pathfinder IV: full Bayesian network
  – Removed incorrect independencies
  – Additional parents led to more accurate estimation of probabilities
• BN model agreed with the expert panel in 50/53 cases, vs. 47/53 for the naïve Bayes model
• Accuracy as high as the expert who designed the model
Heckerman et al.
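The Bernoulli naïve Bayes model above can be turned into a classifier with a few lines of code. A minimal sketch: the P("word" appears | Label) values are the ones from the slides, while the uniform class prior and all function names are my own assumptions for illustration.

```python
from math import prod

# Bernoulli naive Bayes for text: each feature is a binary
# "does word w appear in the document?" variable.
# P(word appears | Label) transcribed from the slides.
P_APPEARS = {
    "Pets":      {"cat": 0.3,   "dog": 0.4,   "buy": 0.02, "sell": 0.0001},
    "Financial": {"cat": 0.001, "dog": 0.001, "buy": 0.2,  "sell": 0.3},
}
PRIOR = {"Pets": 0.5, "Financial": 0.5}  # assumed uniform prior (not in slides)

def score(label, present):
    """Unnormalized P(label, evidence): prior times, for every vocabulary
    word, P(appears | label) if the word is present, else 1 - P(appears | label)."""
    return PRIOR[label] * prod(
        p if w in present else 1.0 - p
        for w, p in P_APPEARS[label].items())

def classify(present):
    """Pick the label with the highest posterior (argmax over scores)."""
    return max(PRIOR, key=lambda lbl: score(lbl, present))

print(classify({"cat", "dog"}))   # → Pets
print(classify({"buy", "sell"}))  # → Financial
```

The strongly skewed entries (e.g. "sell" at 0.0001 under Pets vs. 0.3 under Financial) dominate the product, which is why a few highly indicative words are enough to flip the decision — and also why correlated features, counted as if independent, can overwhelm the posterior.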