Efﬁcient Algorithms for Pure Nash Equilibria
In step 2 of the backward induction, the algorithm may choose
an arbitrary $h_{\max} \in \operatorname{argmax}_{h' \in K} u_{\rho(h)}(h')$ and always obtain an SPE.
In order to compute all SPE, the algorithm may systematically search
through all possible choices of hmax throughout the induction.
Backward induction is often too inefficient: it unnecessarily searches through
the whole tree.
There are better algorithms, such as α-β pruning.
For details, extensions, etc., see e.g.
- PB016 Artificial Intelligence I
- R. Korf, Multi-player alpha-beta pruning, Artificial Intelligence 48, pages 99-111, 1991
- S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach (3rd edition), Prentice Hall, 2009
Example
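Centipede game: players 1, 2, 1, 2, 1 move in turn. At each node the player either continues (A) or stops (D).
Payoffs after D at nodes 1 to 5: (1, 0), (0, 2), (3, 1), (2, 4), (4, 3).
If A is played at every node, the payoffs are (3, 5).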
SPE in pure strategies: (DDD, DD) ... Isn’t it weird?
There are serious issues here ...
- In laboratory settings, people usually play A for several steps.
- There is a theoretical problem: Imagine that you are player 2.
  What would you do if player 1 chose A in the first step?
  The SPE analysis says that you should go down, but the same
  analysis also says that the situation you are in cannot arise :-)
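The SPE of the centipede game can be checked mechanically. The following is a minimal Python sketch of backward induction on this tree; the tree encoding (dictionaries with "player" and "moves"), the 0-indexed players, and the function names are assumptions made for this illustration, not part of the lecture.

```python
def centipede():
    """Build the centipede game tree from the example (players are 0-indexed)."""
    stop_payoffs = [(1, 0), (0, 2), (3, 1), (2, 4), (4, 3)]
    node = (3, 5)  # reached only if A is played at all five nodes
    for k in reversed(range(5)):
        node = {"player": k % 2, "moves": {"D": stop_payoffs[k], "A": node}}
    return node


def backward_induction(node, history=""):
    """Return (payoffs, strategy); strategy maps a history to (player, action)."""
    if isinstance(node, tuple):  # terminal node: just a payoff vector
        return node, {}
    player = node["player"]
    strategy, best_action, best_payoffs = {}, None, None
    for action, child in node["moves"].items():
        payoffs, sub_strategy = backward_induction(child, history + action)
        strategy.update(sub_strategy)
        # Strict '>' keeps the first maximizer; any maximizer would give an SPE.
        if best_payoffs is None or payoffs[player] > best_payoffs[player]:
            best_action, best_payoffs = action, payoffs
    strategy[history] = (player + 1, best_action)
    return best_payoffs, strategy


payoffs, strategy = backward_induction(centipede())
print(payoffs)
for history in sorted(strategy, key=len):
    print(repr(history), strategy[history])
```

Every node is labeled with D, which corresponds to the profile (DDD, DD) with payoffs (1, 0).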
Dynamic Games of Complete Information
Extensive-Form Games
Mixed and Behavioral Strategies
Deﬁnition 58
A mixed strategy σi of player i in G is a mixed strategy of player i in
the corresponding strategic-form game.
I.e., a mixed strategy σi of player i in G is a probability distribution on Si (recall
that Si is the set of all pure strategies, i.e., functions of the form si : Hi → A).
As before, we denote by Σi the set of all mixed strategies of player i
and by Σ the set of all mixed strategy profiles Σ1 × · · · × Σn.
Deﬁnition 59
A behavioral strategy of player i in G is a function βi : Hi → Δ(A)
such that for every h ∈ Hi we have that supp(βi(h)) ⊆ χ(h).
Given a profile β = (β1, . . . , βn) of behavioral strategies, we denote by
Pβ(z) the probability of reaching z ∈ Z when β is used, i.e.,
$$P_\beta(z) = \prod_{\ell=1}^{k} \beta_{\rho(h_{\ell-1})}(h_{\ell-1})(a_\ell)$$
where $h_0 a_1 h_1 a_2 h_2 \cdots a_k h_k$ is the unique path from $h_0$ to $h_k = z$.
We define $u_i(\beta) := \sum_{z \in Z} P_\beta(z) \cdot u_i(z)$.
Behavioral Strategies: Example
Game tree (player to move in brackets):
h0 [player 1]
  A → h1 [player 2]
    B → z1
    ¯B → h3 [player 1]
      C → z2
      ¯C → z3
  ¯A → h2 [player 2]
    D → z4
    ¯D → z5
Pure strategies of player 1: AC, A ¯C, ¯AC, ¯A ¯C
An example of a mixed strategy σ1 of player 1:
σ1(AC) = 1/3, σ1(A ¯C) = 1/9, σ1(¯AC) = 1/6 and σ1(¯A ¯C) = 7/18
An example of behavioral strategies of both players:
- player 1: β1(h0)(A) = 1/3 and β1(h3)(C) = 1/2
- player 2: β2(h1)(B) = 1/4 and β2(h2)(D) = 1/5
$$P_{(\beta_1,\beta_2)}(z_2) = \frac{1}{3} \left(1 - \frac{1}{4}\right) \frac{1}{2} = \frac{1}{8}$$
Behavioral Strategies: Example
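Game tree with payoffs (player to move in brackets):
h0 [player 1]
  A → h1 [player 2]
    B → z1 (1, 0)
    ¯B → h3 [player 1]
      C → z2 (2, 3)
      ¯C → z3 (3, 2)
  ¯A → h2 [player 2]
    D → z4 (1, 1)
    ¯D → z5 (5, 4)
β = (β1, β2)
- player 1: β1(h0)(A) = 1/3 and β1(h3)(C) = 1/2
- player 2: β2(h1)(B) = 1/4 and β2(h2)(D) = 1/5
$$u_1(\beta) = P_\beta(z_1) \cdot 1 + P_\beta(z_2) \cdot 2 + P_\beta(z_3) \cdot 3 + P_\beta(z_4) \cdot 1 + P_\beta(z_5) \cdot 5$$
$$= \tfrac{1}{3} \cdot \tfrac{1}{4} \cdot 1 + \tfrac{1}{3} \cdot \tfrac{3}{4} \cdot \tfrac{1}{2} \cdot 2 + \tfrac{1}{3} \cdot \tfrac{3}{4} \cdot \tfrac{1}{2} \cdot 3 + \tfrac{2}{3} \cdot \tfrac{1}{5} \cdot 1 + \tfrac{2}{3} \cdot \tfrac{4}{5} \cdot 5 \approx 3.508$$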
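The computation of Pβ(z) and u1(β) in this example can be reproduced with exact fractions. This is a minimal sketch; the dictionary encoding of the tree, the spelling "~A" for ¯A, and the function names are assumptions made for this illustration.

```python
from fractions import Fraction as F

# Tree of the example: node -> (owner, {action: child}); "~X" stands for the barred action.
TREE = {
    "h0": (1, {"A": "h1", "~A": "h2"}),
    "h1": (2, {"B": "z1", "~B": "h3"}),
    "h3": (1, {"C": "z2", "~C": "z3"}),
    "h2": (2, {"D": "z4", "~D": "z5"}),
}
PAYOFF = {"z1": (1, 0), "z2": (2, 3), "z3": (3, 2), "z4": (1, 1), "z5": (5, 4)}

# Behavioral strategies of both players, merged into one map node -> distribution.
BETA = {
    "h0": {"A": F(1, 3), "~A": F(2, 3)},
    "h3": {"C": F(1, 2), "~C": F(1, 2)},
    "h1": {"B": F(1, 4), "~B": F(3, 4)},
    "h2": {"D": F(1, 5), "~D": F(4, 5)},
}


def reach_probabilities(tree, beta, root="h0"):
    """Multiply the action probabilities along the path to each terminal node."""
    probs, stack = {}, [(root, F(1))]
    while stack:
        node, p = stack.pop()
        if node not in tree:  # terminal node
            probs[node] = p
            continue
        for action, child in tree[node][1].items():
            stack.append((child, p * beta[node][action]))
    return probs


def expected_payoff(probs, payoff, player):
    """Expected payoff of the given player (1 or 2)."""
    return sum(p * payoff[z][player - 1] for z, p in probs.items())


probs = reach_probabilities(TREE, BETA)
print(probs["z2"], expected_payoff(probs, PAYOFF, 1), float(expected_payoff(probs, PAYOFF, 1)))
```

The exact value of u1(β) is 421/120, which rounds to 3.508.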
Mixed/Behavioral Proﬁles
Each pure strategy can be considered as a behavioral strategy.
Deﬁnition 60
A mixed/behavioral strategy proﬁle is a tuple α = (α1, . . . , αn) where
each αi is either a mixed, or a behavioral strategy.
Let α = (α1, . . . , αn) be a mixed/behavioral strategy profile, and let
M = {i1, . . . , ik} ⊆ N be the set of all players ij ∈ N such that αij is
a mixed strategy. We define
$$u_i(\alpha) = \sum_{s_{i_1} \in S_{i_1}} \cdots \sum_{s_{i_k} \in S_{i_k}} \left( \prod_{\ell=1}^{k} \alpha_{i_\ell}(s_{i_\ell}) \right) \cdot u_i(\alpha'_1, \ldots, \alpha'_n)$$
where
$$\alpha'_j = \begin{cases} s_j & \text{if } j \in M, \\ \alpha_j & \text{otherwise.} \end{cases}$$
Intuitively, ui(α) is the expected payoff of player i in the following play: First,
each player $i_\ell \in M$ chooses his pure strategy $s_{i_\ell}$ randomly with probability
$\alpha_{i_\ell}(s_{i_\ell})$; then these fixed pure strategies are played against the behavioral
strategies of the players from N \ M (who may still randomize along the play).
Equivalence of Mixed and Behavioral Strategies
We show how to translate behavioral strategies to equivalent mixed
ones (w.r.t. probabilities of reaching terminal nodes) and vice versa.
Behavioral to mixed: We say that a mixed strategy σi is induced by
a behavioral strategy βi if
$$\sigma_i(s_i) = \prod_{h \in H_i} \beta_i(h)(s_i(h)) \quad \text{for all } s_i \in S_i$$
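The product formula can be illustrated by a small Python sketch. The data layout (a map from nodes to action distributions), the spelling "~A" for ¯A, and the name induced_mixed are assumptions made for this illustration; the numbers are the behavioral strategy β1 from the earlier example.

```python
from fractions import Fraction as F
from itertools import product

# beta_1 from the example: a distribution over actions in each node of player 1.
BETA1 = {
    "h0": {"A": F(1, 3), "~A": F(2, 3)},
    "h3": {"C": F(1, 2), "~C": F(1, 2)},
}


def induced_mixed(beta_i):
    """Return (nodes, sigma) where sigma maps each pure strategy, written as a
    tuple of actions (one per node, in the order of `nodes`), to its probability."""
    nodes = sorted(beta_i)
    sigma = {}
    for choice in product(*(beta_i[h] for h in nodes)):
        prob = F(1)
        for h, action in zip(nodes, choice):
            prob *= beta_i[h][action]  # one factor per node h of player i
        sigma[choice] = prob
    return nodes, sigma


nodes, sigma = induced_mixed(BETA1)
for strategy, prob in sigma.items():
    print(strategy, prob)
```

Because every factor is a probability distribution over the actions of one node, the resulting values sum to 1 over all pure strategies.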
Mixed to behavioral: For this direction some notation is needed.
Given h ∈ H, we denote by w[h] the unique path from h0 to h.
Given h ∈ Hi, we denote by $S_i^h$ the set of all pure strategies si ∈ Si
such that for every h' ∈ Hi visited by w[h] we have that si(h') is
the action chosen in h' on w[h].
Intuitively, $S_i^h$ consists of all pure strategies that, on the unique path from h0 to
h, choose the appropriate actions to stay on the path. In other words, h can be
reached using si (assuming that the opponents play appropriately) iff $s_i \in S_i^h$.
Given h ∈ Hi and a ∈ χ(h), we denote by $S_i^{h,a} \subseteq S_i^h$ the set of all pure
strategies $s_i \in S_i^h$ such that si(h) = a.
I.e., strategies of $S_i^{h,a}$ may reach h and then choose a there.
Equivalence of Mixed and Behavioral Strategies
(Cont.)
We say that a behavioral strategy βi is induced by a mixed strategy σi
if the following holds:
For every h ∈ Hi and a ∈ χ(h)
- either $\sum_{s_i \in S_i^h} \sigma_i(s_i) = 0$
- or
$$\beta_i(h)(a) = \frac{\sum_{s_i \in S_i^{h,a}} \sigma_i(s_i)}{\sum_{s_i \in S_i^h} \sigma_i(s_i)}$$
Intuitively, βi(h)(a) is the probability of selecting a in h, assuming that h can
be reached with a positive probability if the other players play appropriately.
If the probability of reaching h using σi is zero (no matter what
the opponents are doing), then βi(h) may be defined arbitrarily, since h is
reached with zero probability using βi as well.
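The definition of an induced behavioral strategy can be sketched in Python as follows. The encoding is an assumption made for this illustration: a mixed strategy is a list of (pure strategy, probability) pairs, ON_PATH lists for each node h the choices of player i needed to stay on w[h], and unreachable nodes receive a uniform distribution (one of the arbitrary choices the definition permits). The numbers form an illustrative distribution for player 1 in the tree used in the examples.

```python
from fractions import Fraction as F

# Illustrative mixed strategy of player 1; "~A" stands for the barred action.
SIGMA1 = [
    ({"h0": "A", "h3": "C"}, F(1, 3)),
    ({"h0": "A", "h3": "~C"}, F(1, 9)),
    ({"h0": "~A", "h3": "C"}, F(1, 6)),
    ({"h0": "~A", "h3": "~C"}, F(7, 18)),
]
ACTIONS = {"h0": ["A", "~A"], "h3": ["C", "~C"]}
# Choices of player 1 on the path w[h]: h3 is reachable only after A in h0.
ON_PATH = {"h0": {}, "h3": {"h0": "A"}}


def induced_behavioral(sigma, actions, on_path):
    beta = {}
    for h, acts in actions.items():
        # S_i^h: pure strategies that stay on the path to h.
        reach = [(s, p) for s, p in sigma
                 if all(s[g] == a for g, a in on_path[h].items())]
        total = sum(p for _, p in reach)
        if total == 0:
            # h is unreachable under sigma: any distribution is allowed.
            beta[h] = {a: F(1, len(acts)) for a in acts}
        else:
            # S_i^{h,a}: strategies in S_i^h that choose a in h.
            beta[h] = {a: sum(p for s, p in reach if s[h] == a) / total
                       for a in acts}
    return beta


beta1 = induced_behavioral(SIGMA1, ACTIONS, ON_PATH)
print(beta1)
```

For this input the sketch gives β1(h0)(A) = 4/9 and β1(h3)(C) = 3/4.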
Equivalence of Mixed and Behavioral Strategies
Theorem 61
Let α be a mixed/behavioral strategy profile and let α' be any
mixed/behavioral profile obtained from α by substituting some of
the strategies in α with strategies they induce. Then ui(α) = ui(α').
In fact, any node of H is reached from h0 with the same probability
for all such α'.
Equivalence of Mixed and Behavioral Strategies
[Figure: the game tree with payoffs from the Behavioral Strategies example above.]
Pure strategies of player 1: AC, A ¯C, ¯AC, ¯A ¯C
Pure strategies of player 2: BD, B ¯D, ¯BD, ¯B ¯D
Mixed strategies of player 1: σ1 = (pAC, pA ¯C, p¯AC, p¯A ¯C)
(Here pXY = σ1(s) where s is a pure strategy such that s(h0) = X, s(h3) = Y)
Mixed strategies of player 2: σ2 = (pBD, pB ¯D, p¯BD, p¯B ¯D)
Behavioral strategies of player 1: β1 = (qA, qC) where qA = β1(h0)(A)
and qC = β1(h3)(C); denote q¯A = 1 − qA and q¯C = 1 − qC
Behavioral strategies of player 2: β2 = (qB, qD) where qB = β2(h1)(B)
and qD = β2(h2)(D); denote q¯B = 1 − qB and q¯D = 1 − qD
Equivalence of Mixed and Behavioral Strategies
Behavioral to mixed: Given β1 = (qA, qC) and β2 = (qB, qD), define
σ1 = (pAC, pA ¯C, p¯AC, p¯A ¯C) := (qA qC, qA q¯C, q¯A qC, q¯A q¯C)
σ2 = (pBD, pB ¯D, p¯BD, p¯B ¯D) := (qB qD, qB q¯D, q¯B qD, q¯B q¯D)
What is the probability of reaching z2?
- Using (β1, β2): qA q¯B qC
  (i.e., multiply the probabilities assigned by β1, β2 along the path from h0 to z2)
- Using (σ1, σ2): (qA qC)(q¯B qD + q¯B q¯D) = qA q¯B qC
  (i.e., player 1 needs to choose the pure strategy AC, player 2 needs to
  choose any pure strategy which selects ¯B)
- Using (σ1, β2): (qA qC) q¯B = qA q¯B qC
  (i.e., first player 1 chooses a pure strategy, which needs to be AC, and
  then player 2 plays against this particular strategy by choosing ¯B)
- Using (β1, σ2): (q¯B qD + q¯B q¯D) qA qC = qA q¯B qC
  (i.e., first player 2 chooses a pure strategy, which needs to be one playing ¯B in
  h1, and then player 1 plays against this strategy by choosing A and C)
Equivalence of Mixed and Behavioral Strategies
Mixed to behavioral: Given σ1 = (pAC, pA ¯C, p¯AC, p¯A ¯C) and
σ2 = (pBD, pB ¯D, p¯BD, p¯B ¯D) we have
- β1 = (qA, qC) where
$$q_A = p_{AC} + p_{A\bar{C}}, \qquad q_C = \begin{cases} \dfrac{p_{AC}}{p_{AC} + p_{A\bar{C}}} & \text{if } p_{AC} + p_{A\bar{C}} > 0 \\ x & \text{otherwise} \end{cases}$$
  Here x is an arbitrary number between 0 and 1.
- β2 = (qB, qD) where
$$q_B = p_{BD} + p_{B\bar{D}}, \qquad q_D = p_{BD} + p_{\bar{B}D}$$
Equivalence of Mixed and Behavioral Strategies
First, consider qA = pAC + pA ¯C > 0.
What is the probability of reaching z2?
- Using (σ1, σ2): pAC · (p¯BD + p¯B ¯D)
  i.e., player 1 chooses AC and player 2 chooses a pure strategy playing ¯B
- Using (β1, β2):
  qA · q¯B · qC = (pAC + pA ¯C) · q¯B · pAC / (pAC + pA ¯C)
  = q¯B · pAC
  = pAC · (1 − qB)
  = pAC · (1 − (pBD + pB ¯D))
  = pAC · (p¯BD + p¯B ¯D)
- Using (β1, σ2):
  (p¯BD + p¯B ¯D) · qA · qC = qA · q¯B · qC = pAC · (p¯BD + p¯B ¯D)
  i.e., first player 2 chooses a pure strategy playing ¯B in h1 and then
  player 1 plays the behavioral strategy β1 against it
- Using (σ1, β2): pAC · q¯B = pAC · (p¯BD + p¯B ¯D)
  i.e., first player 1 chooses the pure strategy AC and then player 2 plays
  the behavioral strategy β2 against it
Observe that all possible combinations of mixed and behavioral
strategies give the same probability of reaching z2; this holds for all
terminal nodes and hence all combinations give the same payoff.
Now, assume qA = pAC + pA ¯C = 0 (which implies pAC = 0).
What is the probability of reaching z2?
- Using (σ1, σ2): pAC · (p¯BD + p¯B ¯D) = 0
- Using (β1, β2): qA · q¯B · qC = 0
- Using (β1, σ2): (p¯BD + p¯B ¯D) · qA · qC = 0
- Using (σ1, β2): pAC · q¯B = 0
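The equalities for z2 can also be checked numerically. The sketch below uses arbitrary illustrative probabilities (an assumption of this example, not values from the lecture), derives qA, qC and qB by the mixed-to-behavioral formulas, and evaluates the four combinations; "~B" stands for ¯B.

```python
from fractions import Fraction as F

# Illustrative mixed strategies (each sums to 1).
p1 = {"AC": F(1, 3), "A~C": F(1, 9), "~AC": F(1, 6), "~A~C": F(7, 18)}
p2 = {"BD": F(1, 10), "B~D": F(2, 10), "~BD": F(3, 10), "~B~D": F(4, 10)}

# Induced behavioral strategies (mixed to behavioral).
qA = p1["AC"] + p1["A~C"]
qC = p1["AC"] / qA if qA > 0 else F(1, 2)  # arbitrary value if h3 is unreachable
qB = p2["BD"] + p2["B~D"]
not_B = p2["~BD"] + p2["~B~D"]  # probability that sigma2 picks a strategy playing ~B

# Probability of reaching z2 under each combination of strategies.
reach_z2 = {
    "(sigma1, sigma2)": p1["AC"] * not_B,
    "(beta1, beta2)": qA * (1 - qB) * qC,
    "(beta1, sigma2)": not_B * qA * qC,
    "(sigma1, beta2)": p1["AC"] * (1 - qB),
}
for profile, prob in reach_z2.items():
    print(profile, prob)
```

All four entries coincide (7/30 for these numbers), as Theorem 61 predicts.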
Behavioral (Mixed) Strategy SPE
Let us denote by Bi the set of all behavioral strategies of player i, and
by B the set of all behavioral strategy proﬁles B1 × . . . × Bn.
Deﬁnition 62
β = (β1, . . . , βn) ∈ B is a behavioral Nash equilibrium if
$$u_i(\beta_i, \beta_{-i}) \geq u_i(\beta'_i, \beta_{-i}) \quad \text{for all } i \in N \text{ and } \beta'_i \in B_i$$
Observe that due to Theorem 61 behavioral NE coincide with mixed NE.
Definition 63
A subgame perfect equilibrium (SPE) in behavioral strategies is
a behavioral strategy profile β ∈ B such that for any subgame $G^h$ of
G, the restriction of β to $H^h$ is a behavioral Nash equilibrium in $G^h$.
Here β = (β1, . . . , βn) and the restriction of β to $H^h$ is a behavioral strategy
profile $\beta^h = (\beta^h_1, \ldots, \beta^h_n)$ where each $\beta^h_i$ is the restriction of βi to $H^h \cap H_i$.
Theorem 64
There exists a pure strategy proﬁle which is a SPE in behavioral
strategies.
The proof is similar to the proof of Theorem 57.
Comments on Algorithms
Note that some SPE in behavioral strategies can be computed using
the backward induction.
Indeed, the algorithm computes a pure strategy proﬁle where each player
always maximizes his value; such a pure strategy proﬁle is SPE in both pure
and behavioral strategies.
Even though there always exists a pure SPE, there may exist
(a continuum of) SPE composed of "non-pure" behavioral strategies.
However, the necessary and sufficient condition for the existence of such
an SPE is that at some point of the backward induction one of the players
(say i) has two or more alternatives with the same equilibrium payoff.
The payoff needs to be the same only for player i; the other players may have different
payoffs depending on the choice of player i.
Then player i may play any convex combination of such alternatives
and still obtain an SPE (of course, the resulting SPE may be different
for each combination).
For two players the backward induction can be extended to compute
(a ﬁnite representation of) all SPE in behavioral strategies in
polynomial time.
Dynamic Games of Complete Information
Extensive-Form Games
Imperfect-Information Games
Extensive-form of Matching Pennies
Is it possible to model Matching pennies using extensive-form
games?
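Strategic form (rows: player 1, columns: player 2):
        H         T
H     1, −1    −1, 1
T    −1, 1      1, −1
Extensive form (player to move in brackets):
h0 [player 1]
  H → h1 [player 2]
    H → (1, −1)
    T → (−1, 1)
  T → h2 [player 2]
    H → (−1, 1)
    T → (1, −1)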
The problem is that player 2 is "perfectly" informed about the choice
of player 1. In particular, there are pure Nash equilibria (H, TH) and
(T, TH) in the extensive-form game as opposed to the strategic-form.
Reversing the order of players does not help.
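The claim about pure Nash equilibria of the perfect-information version can be checked by brute force. In this minimal sketch a strategy "XY" of player 2 means X in h1 and Y in h2; the function names are assumptions made for this illustration.

```python
from itertools import product

S1 = ["H", "T"]
# Player 2 observes the move of player 1, so a pure strategy fixes a reply
# in h1 (after H) and in h2 (after T).
S2 = ["".join(s) for s in product("HT", repeat=2)]


def payoff(s1, s2):
    """Payoffs of the play determined by the pure strategy profile (s1, s2)."""
    reply = s2[0] if s1 == "H" else s2[1]
    return (1, -1) if s1 == reply else (-1, 1)


def pure_nash():
    equilibria = []
    for s1, s2 in product(S1, S2):
        u = payoff(s1, s2)
        no_gain_1 = all(payoff(d, s2)[0] <= u[0] for d in S1)
        no_gain_2 = all(payoff(s1, d)[1] <= u[1] for d in S2)
        if no_gain_1 and no_gain_2:
            equilibria.append((s1, s2))
    return equilibria


print(pure_nash())
```

The enumeration finds exactly the profiles (H, TH) and (T, TH): player 2 always mismatches, so player 1 gets −1 whatever he plays.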
We need to extend the formalism to be able to hide some information
about previous moves.
Extensive-form of Matching Pennies
Matching pennies can be modeled using
an imperfect-information extensive-form game:
[Figure: the same game tree as above, with h1 and h2 joined into one information set of player 2.]
Here h1 and h2 belong to the same information set of player 2.
As a result, player 2 is not able to distinguish between h1 and h2.
So even though players do not move simultaneously, the information
player 2 has about the current situation is the same as in
the simultaneous case.
Imperfect Information Games
An imperfect-information extensive-form game is a tuple
Gimp = (Gperf, I) where
- Gperf = (N, A, H, Z, χ, ρ, π, h0, u) is a perfect-information
  extensive-form game (called the underlying game),
- I = (I1, . . . , In) where for each i ∈ N = {1, . . . , n}
  $$I_i = \{I_{i,1}, \ldots, I_{i,k_i}\}$$
  is a collection of information sets for player i that satisfies
  - $\bigcup_{j=1}^{k_i} I_{i,j} = H_i$ and $I_{i,j} \cap I_{i,k} = \emptyset$ for $j \neq k$
    (i.e., Ii is a partition of Hi)
  - for all h, h' ∈ Ii,j, we have ρ(h) = ρ(h') and χ(h) = χ(h')
    (i.e., nodes from the same information set are owned by the same
    player and have the same sets of enabled actions)
Given an information set Ii,j, we denote by χ(Ii,j) the set of all
actions enabled in some (and hence all) nodes of Ii,j.
Now we define the sets of pure, mixed, and behavioral strategies in Gimp as
subsets of pure, mixed, and behavioral strategies, resp., in Gperf that respect
the information sets.
Imperfect Information Games – Strategies
Let Gimp = (Gperf , I) be an imperfect-information extensive-form game
where Gperf = (N, A, H, Z, χ, ρ, π, h0, u).
Deﬁnition 65
A pure strategy of player i in Gimp is a pure strategy si in Gperf such
that for all j = 1, . . . , ki and all h, h' ∈ Ii,j we have si(h) = si(h').
Note that each si can also be seen as a function si : Ii → A such that for
every Ii,j ∈ Ii we have that si(Ii,j) ∈ χ(Ii,j).
As before, we denote by Si the set of all pure strategies of player i in
Gimp, and by S = S1 × · · · × Sn the set of all pure strategy proﬁles.
As in the perfect-information case we have a corresponding
strategic-form game ¯Gimp = (N, (Si)i∈N , (ui)i∈N).
Matching Pennies
[Figure: the matching pennies game tree as above, with h1 and h2 in one information set of player 2.]
I1 = {I1,1} where I1,1 = {h0}
I2 = {I2,1} where I2,1 = {h1, h2}
Example of pure strategies:
- s1(I1,1) = H, which describes the strategy s1(h0) = H
- s2(I2,1) = T, which describes the strategy s2(h1) = s2(h2) = T
  (it is also sufficient to specify s2(h1) = T since then s2(h2) = T)
So we really have strategies H, T for player 1 and H, T for player 2.
Weird Example
Game tree (player to move in brackets):
h0 [player 1]
  A → h1 [player 2]
    K → (1, 2)
    L → (2, 1)
  B → h2 [player 2]
    K → (3, 5)
    L → (7, 1)
  C → h3 [player 1]
    A → (2, 5)
    B → (11, 0)
    C → (−4, 10)
Note that I1 = {I1,1} where I1,1 = {h0, h3}
and that I2 = {I2,1} where I2,1 = {h1, h2}
What pure strategies are in this example?
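One way to explore this question is to enumerate the functions from information sets to actions and to play them out. The sketch below does this for the tree above; the encoding of the tree and of the information sets, and the names pure_strategies and play, are assumptions made for this illustration.

```python
from itertools import product

TREE = {
    "h0": {"A": "h1", "B": "h2", "C": "h3"},
    "h1": {"K": (1, 2), "L": (2, 1)},
    "h2": {"K": (3, 5), "L": (7, 1)},
    "h3": {"A": (2, 5), "B": (11, 0), "C": (-4, 10)},
}
OWNER = {"h0": 1, "h1": 2, "h2": 2, "h3": 1}
# Information sets: name -> (nodes in the set, actions enabled there).
I1 = {"I11": (["h0", "h3"], ["A", "B", "C"])}
I2 = {"I21": (["h1", "h2"], ["K", "L"])}


def pure_strategies(info_sets):
    """All maps node -> action that are constant on every information set."""
    names = list(info_sets)
    strategies = []
    for choice in product(*(info_sets[n][1] for n in names)):
        s = {}
        for name, action in zip(names, choice):
            for h in info_sets[name][0]:
                s[h] = action  # same action in all nodes of the set
        strategies.append(s)
    return strategies


def play(s1, s2):
    """Follow the pure strategies from h0 to a terminal payoff vector."""
    node = "h0"
    while isinstance(node, str):
        s = s1 if OWNER[node] == 1 else s2
        node = TREE[node][s[node]]
    return node


outcomes = {(s1["h0"], s2["h1"]): play(s1, s2)
            for s1 in pure_strategies(I1) for s2 in pure_strategies(I2)}
print(outcomes)
```

Under this encoding player 1 has three pure strategies and player 2 has two. Since player 1 must repeat in h3 the action chosen in h0, the outcomes (2, 5) and (11, 0) are never reached by pure strategies.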
SPE with Imperfect Information
Game tree (player to move in brackets; h3 and h4 form one information set of player 1):
h0 [player 1]
  A → h1 [player 2]
    B → h3 [player 1]
      C → z1 (4, 1)
      ¯C → z2 (1, 4)
    ¯B → h4 [player 1]
      C → z3 (1, 4)
      ¯C → z4 (4, 1)
  ¯A → h2 [player 2]
    D → z5 (1, 1)
    ¯D → z6 (4, 5)
What do we designate as subgames to allow backward induction?
Only subtrees rooted in h1, h2, and h0 (together with all subtrees
rooted in terminal nodes).
Note that subtrees rooted in h3 and h4 cannot be considered as
"independent" subgames because their individual solutions cannot be
combined into a single best response in the information set {h3, h4}.
SPE with Imperfect Information
Let Gimp = (Gperf , I) be an imperfect-information extensive-form game
where Gperf = (N, A, H, Z, χ, ρ, π, h0, u) is the underlying
perfect-information extensive-form game.
Let us denote by Hsingle the set of all h ∈ H such that the information set Iρ(h),j containing
h satisfies Iρ(h),j = {h}.
I.e., h ∈ Hsingle iff h is a "perfect-information" node in which player ρ(h) knows
precisely the node h.
Definition 66
For every h ∈ Hsingle we define a subgame $G^h_{imp}$ to be the imperfect-information
game $(G^h_{perf}, I^h)$ where $I^h$ is the restriction of I to $H^h$.
Note that as subgames we consider only subtrees rooted in
"perfect-information" nodes, that is, nodes whose corresponding information
set is a singleton.
Definition 67
A strategy profile s ∈ S is a subgame perfect equilibrium (SPE) if $s^h$ is
a Nash equilibrium in every subgame $G^h_{imp}$ of Gimp (here h ∈ Hsingle).
Backward Induction with Imperfect Info
The backward induction generalizes to imperfect-information
extensive-form games along the following lines:
1. As in the perfect-information case, the goal is to label each node
   h ∈ Hsingle ∪ Z with an SPE $s^h$ and a vector of payoffs
   u(h) = (u1(h), . . . , un(h)) for individual players according to $s^h$.
2. Starting with terminal nodes, the labeling proceeds bottom up.
   Terminal nodes are labeled similarly as in the perfect-information case.
3. Consider h ∈ Hsingle, and let K be the set of all $h' \in (H_{single} \cup Z) \setminus \{h\}$
   that are h's closest descendants out of Hsingle ∪ Z.
   I.e., h' ∈ K iff h' ≠ h is reachable from h and the unique path from h to
   h' visits only nodes of H \ Hsingle (except the first and the last node).
   For every h' ∈ K we have already computed an SPE $s^{h'}$ in $G^{h'}_{imp}$
   and the vector of corresponding payoffs u(h').
4. Now consider all nodes of K as terminal nodes where each
   h' ∈ K has payoffs u(h'). This gives a new game in which we
   compute an equilibrium $\bar{s}^h$ together with the vector u(h).
   The equilibrium $s^h$ is then obtained by "concatenating" $\bar{s}^h$ with
   all $s^{h'}$, here h' ∈ K, in the subgames $G^{h'}_{imp}$ of $G^h_{imp}$.
Mutually Assured Destruction
Analysis of the Cuban missile crisis of 1962
(as described in Games for Business and
Economics by R. Gardner)
- The crisis started with the United States' discovery of Soviet nuclear
  missiles in Cuba.
- The USSR then backed down, agreeing to remove the missiles
  from Cuba, which suggests that the US had a credible threat "if you
  don't back off we both pay dearly".
Question: Could this indeed be a credible threat?
Mutually Assured Destruction (Cont.)
Model as an extensive-form game:
- First, player 1 (US) chooses to either ignore the incident (I),
  resulting in maintenance of the status quo (payoffs (0, 0)), or
  escalate the situation (E).
- Following escalation by player 1, player 2 can back down (B),
  causing it to lose face (payoffs (10, −10)), or it can choose to
  proceed to a nuclear confrontation (N).
- Upon this choice, the players play a simultaneous-move game in
  which they can either retreat (R), or choose doomsday (D).
  - If both retreat, the payoffs are (−5, −5), a small loss due to
    a mobilization process.
  - If either of them chooses doomsday, then the world
    is destroyed and the payoffs are (−100, −100).
Find SPE in pure strategies.
Mutually Assured Destruction (Cont.)
Game tree (player to move in brackets; h3 and h4 form one information set of player 2):
h0 [player 1]
  E → h1 [player 2]
    N → h2 [player 1]
      R → h3 [player 2]
        R → z1 (−5, −5)
        D → z2 (−100, −100)
      D → h4 [player 2]
        R → z3 (−100, −100)
        D → z4 (−100, −100)
    B → z5 (10, −10)
  I → z6 (0, 0)
Solve $G^{h_2}_{imp}$ (a strategic-form game). Then solve $G^{h_1}_{imp}$ by solving a game rooted in h1
with terminal nodes h2, z5 (payoffs in h2 correspond to an equilibrium in $G^{h_2}_{imp}$).
Finally solve Gimp by solving a game rooted in h0 with terminal nodes h1, z6
(payoffs in h1 have been computed in the previous step).
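The three steps can be traced with a short Python sketch that enumerates the pure SPE of this game. It is an illustration under stated assumptions: the names stage_equilibria, best_options and pure_spe are hypothetical, players are 0-indexed inside the payoff tuples, and only pure equilibria of the simultaneous-move stage are considered.

```python
from itertools import product

MOVES = ["R", "D"]  # retreat or doomsday in the simultaneous-move stage


def stage_payoffs(a1, a2):
    return (-5, -5) if (a1, a2) == ("R", "R") else (-100, -100)


def stage_equilibria():
    """Pure Nash equilibria of the strategic-form game rooted in h2."""
    result = []
    for a1, a2 in product(MOVES, MOVES):
        u = stage_payoffs(a1, a2)
        best1 = all(stage_payoffs(d, a2)[0] <= u[0] for d in MOVES)
        best2 = all(stage_payoffs(a1, d)[1] <= u[1] for d in MOVES)
        if best1 and best2:
            result.append(((a1, a2), u))
    return result


def best_options(options, player):
    """Moves whose payoff vector is optimal for the given player."""
    top = max(u[player] for u in options.values())
    return [(move, u) for move, u in options.items() if u[player] == top]


def pure_spe():
    found = []
    for (a1, a2), u_h2 in stage_equilibria():
        # h1: player 2 chooses between B and N, with h2 treated as terminal.
        for m2, u_h1 in best_options({"B": (10, -10), "N": u_h2}, player=1):
            # h0: player 1 chooses between I and E, with h1 treated as terminal.
            for m1, u_h0 in best_options({"I": (0, 0), "E": u_h1}, player=0):
                found.append({"player 1": (m1, a1), "player 2": (m2, a2),
                              "payoffs": u_h0})
    return found


for spe in pure_spe():
    print(spe)
```

The stage game has two pure equilibria, (R, R) and (D, D). The first leads to an SPE in which player 1 ignores the incident (payoffs (0, 0)); the second leads to an SPE in which player 1 escalates and player 2 backs down (payoffs (10, −10)), so in that equilibrium the threat is credible.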