Daphne Koller

Models
[Figure: a model is a declarative representation. It can be constructed by elicitation from a domain expert or by learning from data, and the same model can then be used by different algorithms.]

Graphical Models
• Bayesian networks (directed), e.g. the student network over Difficulty, Intelligence, Grade, SAT, and Letter
• Markov networks (undirected)

Textual Information Extraction
"Mrs. Green spoke today in New York. Green chairs the finance committee."

The Student Example
• Grade
• Course Difficulty
• Student Intelligence
• Student SAT
• Reference Letter
Joint distribution: P(G, D, I, S, L)

[Figure: the student network — Difficulty → Grade, Intelligence → Grade, Intelligence → SAT, Grade → Letter.]

The CPDs of the student network:

P(D):  d0 = 0.6, d1 = 0.4
P(I):  i0 = 0.7, i1 = 0.3

P(G | I, D):    g1 (A)   g2 (B)   g3 (C)
    i0, d0      0.3      0.4      0.3
    i0, d1      0.05     0.25     0.7
    i1, d0      0.9      0.08     0.02
    i1, d1      0.5      0.3      0.2

P(S | I):       s0       s1
    i0          0.95     0.05
    i1          0.2      0.8

P(L | G):       l0       l1
    g1          0.1      0.9
    g2          0.4      0.6
    g3          0.99     0.01

Chain Rule for Bayesian Networks
P(D, I, G, S, L) = P(D) P(I) P(G | I, D) P(S | I) P(L | G)
Distribution defined as a product of factors!

Example, using the CPDs above:
P(d0, i1, g3, s1, l1) = P(d0) P(i1) P(g3 | i1, d0) P(s1 | i1) P(l1 | g3)
                      = 0.6 × 0.3 × 0.02 × 0.8 × 0.01 = 2.88 × 10^-5

Bayesian Network
• A Bayesian network is:
  – A directed acyclic graph (DAG) G whose nodes represent the random variables X1,…,Xn
  – For each node Xi, a CPD P(Xi | ParG(Xi))
• The BN represents a joint distribution via the chain rule for Bayesian networks:
  P(X1,…,Xn) = Πi P(Xi | ParG(Xi))

BN Is a Legal Distribution: P ≥ 0
P is a product of CPDs, and every CPD entry is nonnegative, so P ≥ 0.

BN Is a Legal Distribution: ∑ P = 1
∑D,I,G,S,L P(D,I,G,S,L) = ∑D,I,G,S,L P(D) P(I) P(G|I,D) P(S|I) P(L|G)
  = ∑D,I,G,S P(D) P(I) P(G|I,D) P(S|I) ∑L P(L|G)
  = ∑D,I,G,S P(D) P(I) P(G|I,D) P(S|I)
  = ∑D,I,G P(D) P(I) P(G|I,D) ∑S P(S|I)
  = ∑D,I,G P(D) P(I) P(G|I,D)
  = ∑D,I P(D) P(I) ∑G P(G|I,D)
  = ∑D,I P(D) P(I) = ∑D P(D) ∑I P(I) = 1

P Factorizes over G
• Let G be a graph over X1,…,Xn.
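The chain-rule computation for the student network can be sketched in code. This is a minimal illustration: the CPT values are the ones from the slides, while the dictionary layout and function names are my own.

```python
# Chain rule for the student Bayesian network:
#   P(D, I, G, S, L) = P(D) P(I) P(G|I,D) P(S|I) P(L|G)
# CPT values transcribed from the slides; data structures are illustrative.

P_D = {"d0": 0.6, "d1": 0.4}
P_I = {"i0": 0.7, "i1": 0.3}
P_G = {  # P(G | I, D), one row per parent assignment (i, d)
    ("i0", "d0"): {"g1": 0.3,  "g2": 0.4,  "g3": 0.3},
    ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.7},
    ("i1", "d0"): {"g1": 0.9,  "g2": 0.08, "g3": 0.02},
    ("i1", "d1"): {"g1": 0.5,  "g2": 0.3,  "g3": 0.2},
}
P_S = {"i0": {"s0": 0.95, "s1": 0.05},
       "i1": {"s0": 0.2,  "s1": 0.8}}
P_L = {"g1": {"l0": 0.1,  "l1": 0.9},
       "g2": {"l0": 0.4,  "l1": 0.6},
       "g3": {"l0": 0.99, "l1": 0.01}}

def joint(d, i, g, s, l):
    """P(d, i, g, s, l): product of the five local CPD entries."""
    return P_D[d] * P_I[i] * P_G[(i, d)][g] * P_S[i][s] * P_L[g][l]

# The worked example from the slides:
print(joint("d0", "i1", "g3", "s1", "l1"))  # ≈ 2.88e-05
```

Summing `joint` over all 2 × 2 × 3 × 2 × 2 assignments gives 1, mirroring the "BN is a legal distribution" argument: each inner sum over a CPD row collapses to 1.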
• P factorizes over G if P(X1,…,Xn) = Πi P(Xi | ParG(Xi))

Naïve Bayes Model
[Figure: class variable C with children X1, X2, …, Xn.]
(Xi ⊥ Xj | C) for all Xi, Xj

Naïve Bayes Classifier
P(C=c1 | x1,…,xn) / P(C=c2 | x1,…,xn) = [P(C=c1) / P(C=c2)] Πi [P(xi | C=c1) / P(xi | C=c2)]

Bernoulli Naïve Bayes for Text
[Figure: Document Label → one binary "word appears" variable per vocabulary word: cat, dog, buy, ….]

P("word" appears | Label):
              cat      dog      buy      sell     …
  Financial   0.001    0.001    0.2      0.3      …
  Pets        0.3      0.4      0.02     0.0001   …

Multinomial Naïve Bayes for Text
[Figure: Document Label → one word variable per position: W1, W2, …, Wn.]

P(Wi = word | Label):
              cat      dog      buy      sell     …
  Financial   0.001    0.001    0.02     0.02     …
  Pets        0.02     0.03     0.003    0.001    …

Summary
• Simple approach for classification
  – Computationally efficient
  – Easy to construct
• Surprisingly effective in domains with many weakly relevant features
• Strong independence assumptions reduce performance when many features are strongly correlated

Application: Diagnosis
Probabilistic Graphical Models — Bayesian Networks: Representation

Medical Diagnosis: Pathfinder (1992)
• Help pathologists diagnose lymph-node pathologies (60 different diseases)
• Pathfinder I: rule-based system
• Pathfinder II used naïve Bayes and got superior performance
Heckerman et al.

Medical Diagnosis: Pathfinder (1992)
• Pathfinder III: naïve Bayes with better knowledge engineering
• No incorrect zero probabilities
• Better calibration of conditional probabilities
  – Comparing P(finding | disease1) to P(finding | disease2)
  – Not P(finding1 | disease) to P(finding2 | disease)
Heckerman et al.

Medical Diagnosis: Pathfinder (1992)
• Pathfinder IV: full Bayesian network
  – Removed incorrect independencies
  – Additional parents led to more accurate estimation of probabilities
• BN model agreed with the expert panel in 50/53 cases, vs. 47/53 for the naïve Bayes model
• Accuracy as high as the expert who designed the model
Heckerman et al.
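The Bernoulli naïve Bayes model above can be turned into a classifier with a few lines of code. A minimal sketch: the P("word" appears | Label) values are the ones from the slides, while the uniform class prior and all function names are my own assumptions for illustration.

```python
from math import prod

# Bernoulli naive Bayes for text: each feature is a binary
# "does word w appear in the document?" variable.
# P(word appears | Label) transcribed from the slides.
P_APPEARS = {
    "Pets":      {"cat": 0.3,   "dog": 0.4,   "buy": 0.02, "sell": 0.0001},
    "Financial": {"cat": 0.001, "dog": 0.001, "buy": 0.2,  "sell": 0.3},
}
PRIOR = {"Pets": 0.5, "Financial": 0.5}  # assumed uniform prior (not in slides)

def score(label, present):
    """Unnormalized P(label, evidence): prior times, for every vocabulary
    word, P(appears | label) if the word is present, else 1 - P(appears | label)."""
    return PRIOR[label] * prod(
        p if w in present else 1.0 - p
        for w, p in P_APPEARS[label].items())

def classify(present):
    """Pick the label with the highest posterior (argmax over scores)."""
    return max(PRIOR, key=lambda lbl: score(lbl, present))

print(classify({"cat", "dog"}))   # → Pets
print(classify({"buy", "sell"}))  # → Financial
```

The strongly skewed entries (e.g. "sell" at 0.0001 under Pets vs. 0.3 under Financial) dominate the product, which is why a few highly indicative words are enough to flip the decision — and also why correlated features, counted as if independent, can overwhelm the posterior.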