Dynamic Games of Complete Information
Extensive-Form Games
Imperfect-Information
Mixed and Behavioral Strategies
Games with Chance Nodes
Mixed and Behavioral Strategies
Deﬁnition 68
A mixed strategy σi of player i in Gimp is a mixed strategy of player i in
the corresponding strategic-form game ¯Gimp = (N, (Si)i∈N , ui).
Do not forget that now si ∈ Si iff si is a pure strategy that assigns the same
action to all nodes of every information set. Hence each si ∈ Si can be seen
as a function si : Ii → A.
As before, we denote by Σi the set of all mixed strategies of player i
and by Σ the set of all mixed strategy proﬁles Σ1 × · · · × Σn.
Deﬁnition 69
A behavioral strategy of player i in Gimp is a behavioral strategy βi in
Gperf such that for all j = 1, . . . , ki and all h, h′ ∈ Ii,j : βi(h) = βi(h′).
Each βi can be seen as a function βi : Ii → Δ(A) such that for all Ii,j ∈ Ii we
have supp(βi(Ii,j)) ⊆ χ(Ii,j).
Are they equivalent as in the perfect-information case?
Example: Absent Minded Driver
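[Figure: game tree of the absent-minded driver. Both decision nodes belong to player 1 and lie in a single information set. At the first node, L ends the game with payoff 0 and R leads to the second node; at the second node, L yields payoff 5 and R yields payoff 1.]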
Only one player: A driver who has to take a turn at a particular
junction. There are two identical junctions, the ﬁrst one leads to
a wrong neighborhood where the driver gets completely lost
(payoff 0), the second one leads home (payoff 5). If the driver misses
both, there is a longer way home (payoff 1). The problem is that after
missing the ﬁrst turn, the driver forgets that he missed the turn.
Behavioral strategy: β1(I1,1)(L) = 1/2 has the expected payoff 3/2.
No mixed strategy gives a larger payoff than 1 since no pure strategy
ever reaches the terminal node with payoff 5.
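The two payoff claims above can be checked with a short computation. This is a minimal sketch; the function name `driver_payoff` and the parameter `p_left` are illustrative and not part of the slides.

```python
from fractions import Fraction

def driver_payoff(p_left):
    """Expected payoff when the driver plays L with probability p_left
    at every node of the single information set I_{1,1}."""
    p_right = 1 - p_left
    return (p_left * 0                 # turn at the first junction: lost
            + p_right * p_left * 5     # miss the first, turn at the second: home
            + p_right * p_right * 1)   # miss both: longer way home

# Behavioral strategy with beta_1(I_{1,1})(L) = 1/2
behavioral = driver_payoff(Fraction(1, 2))

# The only two pure strategies: always L, or always R
pure = {"L": driver_payoff(Fraction(1)), "R": driver_payoff(Fraction(0))}
```

A mixed strategy randomizes once over the two pure strategies, so its payoff is a convex combination of the values in `pure` and cannot exceed their maximum.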
Kuhn’s Theorem
Deﬁnition 70
Player i has perfect recall in Gimp if the following holds:
• Every information set of player i intersects every path from
  the root h0 to a terminal node at most once.
• Every two paths from the root that end in the same information
  set of player i
  - pass through the same information sets of player i,
  - and in the same order,
  - and in every such information set the two paths choose the
    same action.
In other words, along all paths ending in the same information set, player i
sees the same sequence of information sets and makes the same decisions
in his nodes (i.e. at the end knows exactly the sequence of visited information
sets and all his own choices along the way).
The notion of induced strategies can be straightforwardly generalized
to imperfect information games:
Behavioral to mixed: We say that a mixed strategy σi is induced by
a behavioral strategy βi if
$$\sigma_i(s_i) = \prod_{I_{i,j} \in I_i} \beta_i(I_{i,j})(s_i(I_{i,j})) \quad \text{for all } s_i \in S_i$$
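As an illustration of the product formula, the following sketch builds the induced mixed strategy from a behavioral strategy. The dictionary encoding and the information-set and action names (`I1`, `I2`, `a`, `b`, `c`, `d`) are assumptions made for this example only.

```python
from fractions import Fraction
from itertools import product

def induced_mixed(beta):
    """beta maps each information set to a dict action -> probability.
    Returns a dict that maps every pure strategy, written as a tuple of
    (information set, action) pairs, to the product of the probabilities
    that beta assigns to the chosen actions."""
    infosets = sorted(beta)
    sigma = {}
    for actions in product(*(sorted(beta[I]) for I in infosets)):
        prob = Fraction(1)
        for I, a in zip(infosets, actions):
            prob *= beta[I][a]
        sigma[tuple(zip(infosets, actions))] = prob
    return sigma

# Hypothetical player with two information sets
beta = {"I1": {"a": Fraction(1, 3), "b": Fraction(2, 3)},
        "I2": {"c": Fraction(1, 4), "d": Fraction(3, 4)}}
sigma = induced_mixed(beta)
```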
As before, for the opposite direction some notation is needed. Recall that
given h ∈ H, we denote by w[h] the unique path from h0 to h.
Given h ∈ Hi, we denote by $S_i^h$ the set of all pure strategies si ∈ Si such that
for every h′ ∈ Hi visited by w[h] we have that si(h′) is the action chosen in h′
on w[h].
Given h ∈ Hi and a ∈ χ(h), we denote by $S_i^{h,a}$ the set of all pure strategies
si ∈ $S_i^h$ such that si(h) = a.
Mixed to behavioral: We say that a behavioral strategy βi is induced
by a mixed strategy σi if the following holds:
Let Ii,j be an information set of player i and let h ∈ Ii,j be (an arbitrary)
node of Ii,j. We have that
• either $\sum_{s_i \in S_i^h} \sigma_i(s_i) = 0$,
• or for each action a ∈ χ(h) (= χ(Ii,j)):
$$\beta_i(I_{i,j})(a) = \frac{\sum_{s_i \in S_i^{h,a}} \sigma_i(s_i)}{\sum_{s_i \in S_i^{h}} \sigma_i(s_i)}$$
(Here the perfect recall implies that the deﬁnition of βi(Ii,j) does not depend
on the choice of h.)
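The conditional-probability formula can be sketched in code as follows. The encoding is an assumption of this example: under perfect recall, each information set I is described by `own_history[I]`, the (information set, action) pairs of the player on the path to any node of I, which plays the role of the set of strategies consistent with w[h].

```python
from fractions import Fraction

def induced_behavioral(sigma, own_history, actions):
    """sigma: list of (pure_strategy, probability) pairs, where a pure
    strategy is a dict information set -> action.
    own_history[I]: the player's own (information set, action) pairs on the
    path to I (well defined under perfect recall).
    actions[I]: the actions enabled at I."""
    beta = {}
    for I, path in own_history.items():
        # Pure strategies that agree with the player's choices on the path
        reach = [(s, p) for s, p in sigma if all(s[J] == a for J, a in path)]
        total = sum(p for _, p in reach)
        if total == 0:
            continue  # first case of the definition: beta(I) is unconstrained
        beta[I] = {a: sum(p for s, p in reach if s[I] == a) / total
                   for a in actions[I]}
    return beta

# Hypothetical player: I1 at the root, I2 reached only after action a at I1
sigma = [({"I1": "a", "I2": "c"}, Fraction(1, 2)),
         ({"I1": "b", "I2": "c"}, Fraction(1, 4)),
         ({"I1": "b", "I2": "d"}, Fraction(1, 4))]
own_history = {"I1": (), "I2": (("I1", "a"),)}
actions = {"I1": ["a", "b"], "I2": ["c", "d"]}
beta = induced_behavioral(sigma, own_history, actions)
```

In this example only the first pure strategy reaches I2, so the induced behavioral strategy plays c there with probability 1.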
Theorem 71 (Kuhn, 1953)
Let α be a mixed/behavioral strategy profile and let α′ be any
mixed/behavioral profile obtained from α by substituting some of
the strategies in α with strategies they induce. Then ui(α) = ui(α′).
The concepts of Nash equilibria and SPE in behavioral strategies are
the same as in the perfect information case.
Complexity of Zero-Sum Games
Recall that a behavioral strategy βi of player i is maxmin if
$$\beta_i \in \operatorname{argmax}_{\beta_i' \in B_i} \; \min_{\beta_{-i} \in B_{-i}} u_i(\beta_i', \beta_{-i})$$
Similarly for pure and mixed strategies.
Theorem 72 (Koller and Megiddo, 1990)
Consider ﬁnite two-player zero-sum imperfect information games.
• For such games with perfect recall, the problem of computing
  a maxmin behavioral strategy is in PTIME.
• For games with possibly imperfect recall, the problem of
  computing a (pure, behavioral, or mixed) strategy that
  guarantees a given payoff is NP-hard.
How to compute Nash equilibria in polynomial time?
Existence of a poly. time algorithm for computing behavioral NE does not
immediately follow from Thm 72 and von Neumann’s Thm 48. Indeed,
Thm 48 has been proved only for mixed strategies. However, using Kuhn’s
thm, von Neumann’s thm can be easily extended to behavioral strategies.
Proposition 5
Let (β1, β2) be a behavioral strategy proﬁle. Then (β1, β2) is a NE iff
both β1 and β2 are maxmin.
Proof. Let (β1, β2) be a profile of behavioral strategies. Apply Kuhn’s
theorem and obtain induced mixed strategies (σ1, σ2).
Since (σ1, σ2) was obtained from (β1, β2) using only Kuhn’s theorem,
for both i ∈ {1, 2} we have ui(β1, β2) = ui(σ1, σ2) and
• for every behavioral strategy β′−i and an induced mixed strategy
  σ′−i, we have that ui(βi, β′−i) = ui(σi, σ′−i),
• for every mixed strategy σ′−i and an induced behavioral strategy
  β′−i, we have that ui(σi, σ′−i) = ui(βi, β′−i).
Now (β1, β2) is a Nash equilibrium iff (σ1, σ2) is a Nash equilibrium iff
σ1 and σ2 are maxmin iff β1 and β2 are maxmin.
Corollary 73
The problem of computing Nash equilibria in behavioral strategies
in two-player zero-sum imperfect information games with perfect
recall is in PTIME.
Complexity of Non-Zero-Sum Games
Computing NE (or SPE) in non-zero-sum imperfect-information
extensive-form games is at least as hard as for strategic-form games.
Backward induction helps in decomposing the game into "subgames"
rooted in nodes of Hsingle but large games may still remain to be
solved using other methods.
Naively, any solution concept developed for strategic-form games can
be applied to imperfect-information extensive-form games (with
perfect recall) via the corresponding strategic-form game ¯Gimp.
However, such a solution is not efficient (the corresponding game is
exponentially large and often degenerate).
More efﬁcient methods exist for two-player games of perfect recall,
e.g., using sequence form representation of the game, where nodes
of Gimp are represented by sequences of actions leading to the
nodes, which leads to a linear complementarity problem of polynomial
size, which in turn can be solved using a modiﬁed Lemke-Howson.
For a detailed treatment of complexity see "The complexity of computing a
(perfect) equilibrium for an n-player extensive form game of perfect recall" by
Kousha Etessami.
Imperfect-information and Chance Nodes
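[Figure: game tree. The root h0 is a chance node (player 0) that moves to h1 or h2 with probability 1/2 each. Both h1 and h2 belong to player 1. At h1, C yields payoffs (−2, 2) and F yields (−1, 1); at h2, C yields (2, −2) and F yields (−1, 1).]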
A very simple card game:
• Player 1 randomly draws a card from a large deck of cards
  containing equal numbers of Kings and Aces.
• Then Player 1 may either call (C) or fold (F), without looking at the card.
• If he folds, he pays $1 to player 2; otherwise
  - call + King means that player 1 pays $2 to player 2,
  - call + Ace means that player 2 pays $2 to player 1.
An imperfect-information extensive-form game with chance nodes is
a tuple Gimp = (Gperf , I, β0) where
• The set of players N is equal to {0, 1, . . . , n} (i.e., there is a new
  player 0 called chance, or nature),
• We assume that for every h ∈ H0 the set of enabled actions χ(h)
  is the set of all children nodes of h,
• Each information set of player 0 is a singleton (i.e., nature
  has complete information),
• β0 is a fixed behavioral strategy for player 0. Player 0 always
  plays according to β0.
Note that due to the above assumption, β0(h) is a distribution on all
children of h.
As player 0 always plays the same strategy, we exclude this strategy
from strategy profiles
(i.e., pure strategy profiles remain elements of S1 × · · · × Sn).
A game with chance nodes is a perfect information game if all
information sets of I are singletons.
Example
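[Figure: the card-game tree. Chance node h0 moves to h1 or h2 with probability 1/2 each; at h1, player 1 gets (−2, 2) for C and (−1, 1) for F; at h2, player 1 gets (2, −2) for C and (−1, 1) for F.]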
Here β0(h0)(h1) = 1/2 and β0(h0)(h2) = 1/2.
Player 1 has just one information set I1,1 = {h1, h2}.
Consider a mixed strategy σ1 of player 1 defined by
σ1(I1,1)(C) = 1/4 and σ1(I1,1)(F) = 3/4.
Then
$$u_1(\sigma_1) = \tfrac{1}{2}\cdot\tfrac{1}{4}\cdot(-2) + \tfrac{1}{2}\cdot\tfrac{3}{4}\cdot(-1) + \tfrac{1}{2}\cdot\tfrac{1}{4}\cdot 2 + \tfrac{1}{2}\cdot\tfrac{3}{4}\cdot(-1) = -\tfrac{3}{4}$$
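The expected payoff in this example can be reproduced with a few lines of code. This is a sketch only; the names `chance`, `payoff1`, and `expected_u1` are introduced here for illustration.

```python
from fractions import Fraction

# beta_0(h0): chance picks h1 (King) or h2 (Ace) with probability 1/2 each
chance = {"h1": Fraction(1, 2), "h2": Fraction(1, 2)}

# Player 1's payoff after each (node, action) pair
payoff1 = {("h1", "C"): -2, ("h1", "F"): -1,
           ("h2", "C"): 2, ("h2", "F"): -1}

def expected_u1(p_call):
    """Expected payoff of player 1 when calling with probability p_call
    at the single information set {h1, h2}."""
    sigma1 = {"C": p_call, "F": 1 - p_call}
    return sum(chance[h] * sigma1[a] * payoff1[(h, a)]
               for h in chance for a in sigma1)

u1 = expected_u1(Fraction(1, 4))
u1_always_call = expected_u1(Fraction(1))
```

With these numbers the expected payoff equals −(1 − p) for calling probability p, so always calling gives player 1 the expected payoff 0.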
Results
All results for games without chance nodes presented so
far remain valid for games with chance nodes.
In particular, Theorem 57 and Theorem 64 remain valid for
games of perfect information with chance nodes. Concretely:
Theorem 74
Consider games of perfect information with chance nodes.
• There exists a pure strategy profile which is a SPE with
  respect to pure strategies.
• There exists a pure strategy profile which is a SPE with
  respect to behavioral strategies.
Backward induction can be straightforwardly modiﬁed to deal
with chance nodes (see next slide).
Backward induction with perfect info. & chance
Backward Induction: We inductively "attach" to every node h a SPE $s^h$
in $G^h$, and expected payoffs $u(h) = (u_1(h), \dots, u_n(h))$.
• Initially: Attach to each terminal node z ∈ Z the empty profile
  $s^z = (\emptyset, \dots, \emptyset)$ and the payoff vector $u(z) = (u_1(z), \dots, u_n(z))$.
• While (there is an unattached node h with all children attached):
  1. Let K be the set of all children of h.
  2. If $\rho(h) \neq 0$, then let $h_{\max} \in \operatorname{argmax}_{h' \in K} u_{\rho(h)}(h')$ and
     - attach to h a SPE $s^h$ where $s^h_{\rho(h)}(h) = h_{\max}$ and, for $i \in N \setminus \{0\}$
       and all $h' \in H_i$, define $s^h_i(h') = s^{\bar h}_i(h')$ where $h' \in H^{\bar h} \cap H_i$
       (i.e., in subgames rooted in $\bar h \in K$, $s^h$ behaves as $s^{\bar h}$),
     - attach to h expected payoffs $u_i(h) = u_i(h_{\max})$ for $i \in N \setminus \{0\}$.
  3. If $\rho(h) = 0$, then
     - attach to h a SPE $s^h$ where, for all $i \in N \setminus \{0\}$ and all $h' \in H_i$,
       define $s^h_i(h') = s^{\bar h}_i(h')$ where $h' \in H^{\bar h} \cap H_i$
       (i.e., in subgames rooted in $\bar h \in K$, $s^h$ behaves as $s^{\bar h}$),
     - attach to h the expected payoffs
       $$u_i(h) = \sum_{\bar h \in K} \beta_0(h)(\bar h) \cdot u_i(\bar h)$$
       (i.e., the weighted average payoff in all children nodes).
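The procedure above can be sketched in code. The version below is recursive rather than a literal while-loop, and the tuple encoding of nodes (`"leaf"`, `"chance"`, `"move"`) as well as the example tree are assumptions made for this illustration. Ties are broken in favour of the first maximizing action.

```python
from fractions import Fraction

def backward_induction(node):
    """Return (payoffs, choices) for the subgame rooted at node.

    A node is one of
      ("leaf", payoffs)                  payoffs: tuple for players 1..n
      ("chance", [(prob, child), ...])   player 0 with fixed distribution beta_0
      ("move", name, player, {action: child})
    """
    kind = node[0]
    if kind == "leaf":
        return node[1], {}
    if kind == "chance":
        choices, total = {}, None
        for prob, child in node[1]:
            u, s = backward_induction(child)
            choices.update(s)
            weighted = [prob * x for x in u]
            total = weighted if total is None else [a + b for a, b in zip(total, weighted)]
        return tuple(total), choices      # weighted average over the children
    _, name, player, children = node
    choices, best = {}, None
    for action, child in children.items():
        u, s = backward_induction(child)
        choices.update(s)
        if best is None or u[player - 1] > best[1][player - 1]:
            best = (action, u)            # keep the child maximizing the mover's payoff
    choices[name] = best[0]
    return best[1], choices

# Hypothetical tree: chance moves to A (player 1) or B (player 2), each w.p. 1/2
tree = ("chance", [
    (Fraction(1, 2), ("move", "A", 1, {"L": ("leaf", (2, 0)), "R": ("leaf", (0, 1))})),
    (Fraction(1, 2), ("move", "B", 2, {"l": ("leaf", (1, 3)), "r": ("leaf", (4, 1))})),
])
payoffs, choices = backward_induction(tree)
```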
Backward Induction for Imperfect Info & Chance
The high-level description of backward induction for
imperfect-information games given earlier remains valid also for
imperfect-information games with chance nodes.
We only have to notice that in the games newly created in
step 4., player 0 participates with the strategy β0.
Dynamic Games of Complete Information
Repeated Games
Finitely Repeated Games
Example
        C          S
C    −5, −5     0, −20
S    −20, 0    −1, −1
Imagine that the criminals are being arrested repeatedly.
Can they somehow reflect upon their experience in order to play
"better"?
In what follows we consider strategic-form games played repeatedly
• for finitely many rounds, where the final payoff of each player is
  the average of payoffs from all rounds,
• for infinitely many rounds, where we consider a discounted sum of
  payoffs and the long-run average payoff.
We analyze Nash equilibria and sub-game perfect equilibria.
We stick to pure strategies only!
Finitely Repeated Games
Let G = ({1, 2}, (S1, S2) , (u1, u2)) be a ﬁnite strategic-form game of
two players.
A T-stage game GT-rep based on G proceeds in T stages so that in
a stage t ≥ 1, players choose a strategy profile $s^t = (s_1^t, s_2^t)$.
After T stages, both players collect the average payoff $\sum_{t=1}^{T} u_i(s^t) / T$.
A history of length 0 ≤ t ≤ T is a sequence $h = s^1 \cdots s^t \in S^t$ of t
strategy profiles. Denote by H(t) the set of all histories of length t.
A pure strategy for player i in a T-stage game GT-rep is a function
$$\tau_i : \bigcup_{t=0}^{T-1} H(t) \to S_i$$
which for every possible history chooses a next step for player i.
Every strategy profile τ = (τ1, τ2) in GT-rep induces a sequence of
pure strategy profiles $w_\tau = s^1 \cdots s^T$ in G so that $s_i^t = \tau_i(s^1 \cdots s^{t-1})$.
Given a pure strategy profile τ in GT-rep such that $w_\tau = s^1 \cdots s^T$,
define the payoffs $u_i(\tau) = \sum_{t=1}^{T} u_i(s^t) / T$.
Example
        C          S
C    −5, −5     0, −20
S    −20, 0    −1, −1
Consider a 3-stage game.
Examples of histories: ε, (C, S), (C, S)(S, S), (C, S)(S, S)(C, C)
Here the last one is terminal, obtained using τ1, τ2 s.t.:
τ1(ε) = C, τ1((C, S)) = S, τ1((C, S)(S, S)) = C
τ2(ε) = S, τ2((C, S)) = S, τ2((C, S)(S, S)) = C
Thus w(τ1,τ2) = (C, S)(S, S)(C, C)
u1(τ1, τ2) = (0 + (−1) + (−5))/3 = −2
u2(τ1, τ2) = (−20 + (−1) + (−5))/3 = −26/3
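The induced play and the average payoffs of this example can be recomputed as follows. This is a sketch: histories are encoded as tuples of stage profiles, and the strategies are given only on the three histories listed in the example (their values on other histories are not specified here and are not needed for this play).

```python
from fractions import Fraction

# Stage-game payoffs (player 1, player 2)
U = {("C", "C"): (-5, -5), ("C", "S"): (0, -20),
     ("S", "C"): (-20, 0), ("S", "S"): (-1, -1)}

def play(tau1, tau2, T):
    """Return the sequence of stage profiles induced by (tau1, tau2)."""
    history = ()
    for _ in range(T):
        history += ((tau1(history), tau2(history)),)
    return history

def average_payoffs(history):
    """Average stage payoff of each player along a terminal history."""
    T = len(history)
    return tuple(Fraction(sum(U[s][i] for s in history), T) for i in range(2))

# Strategies from the example, restricted to the histories actually reached
table1 = {(): "C", (("C", "S"),): "S", (("C", "S"), ("S", "S")): "C"}
table2 = {(): "S", (("C", "S"),): "S", (("C", "S"), ("S", "S")): "C"}

w = play(table1.get, table2.get, 3)
u = average_payoffs(w)
```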
Finitely Repeated Games in Extensive-Form
Every T-stage game GT-rep can be deﬁned as an imperfect
information extensive-form game.
Define an imperfect-information extensive-form game $G^{rep}_{imp} = (G^{rep}_{perf}, I)$
such that $G^{rep}_{perf} = (\{1, 2\}, A, H, Z, \chi, \rho, \pi, h_0, u)$ where
• $A = S_1 \cup S_2$
• $H = (S_1 \times S_2)^{\le T} \cup (S_1 \times S_2)^{<T} \cdot S_1$
  Intuitively, elements of $(S_1 \times S_2)^{\le T}$ are possible histories;
  $(S_1 \times S_2)^{<T} \cdot S_1$ is used to simulate a simultaneous play of G by letting
  player 1 choose first and player 2 second.
• $Z = (S_1 \times S_2)^{T}$
• $\chi(\varepsilon) = S_1$ and $\chi(h \cdot s_1) = S_2$ for $s_1 \in S_1$, and $\chi(h \cdot (s_1, s_2)) = S_1$
  for $(s_1, s_2) \in S$
• $\rho(\varepsilon) = 1$ and $\rho(h \cdot s_1) = 2$ and $\rho(h \cdot (s_1, s_2)) = 1$
• $\pi(\varepsilon, s_1) = s_1$ and $\pi(h \cdot s_1, s_2) = h \cdot (s_1, s_2)$ and
  $\pi(h \cdot (s_1, s_2), s_1') = h \cdot (s_1, s_2) \cdot s_1'$
• $h_0 = \varepsilon$ and $u_i((s_1^1, s_2^1)(s_1^2, s_2^2) \cdots (s_1^T, s_2^T)) = \sum_{t=1}^{T} u_i(s_1^t, s_2^t) / T$
The set of information sets is deﬁned as follows: Let h ∈ H1 be a node
of player 1, then
• there is exactly one information set of player 1 containing h as
  the only element,
• there is exactly one information set of player 2 containing all
  nodes of the form h · s1 where s1 ∈ S1.
Intuitively, in every round,
• player 1 has complete information about the results of past plays,
• player 1 chooses a pure strategy s1 ∈ S1,
• player 2 is not informed about s1 but still has complete information
  about the results of all previous rounds,
• player 2 chooses a pure strategy s2 ∈ S2, and both players are
  informed about the result.
Finitely Repeated Games – Equilibria
Deﬁnition 75
A strategy profile τ = (τ1, τ2) in a T-stage game GT-rep is a Nash
equilibrium if for every i ∈ {1, 2} and every τ′i we have
$$u_i(\tau_1, \tau_2) \ge u_i(\tau_i', \tau_{-i})$$
To define SPE we use the following notation. Given a history
$h = s^1 \cdots s^t$ and a strategy τi of player i, we define a strategy $\tau_i^h$ in
the (T − t)-stage game based on G by
$$\tau_i^h(\bar s^1 \cdots \bar s^{\bar t}) = \tau_i(s^1 \cdots s^t \, \bar s^1 \cdots \bar s^{\bar t}) \quad \text{for every sequence } \bar s^1 \cdots \bar s^{\bar t}$$
(i.e., $\tau_i^h$ behaves as τi after h).
Definition 76
A strategy profile τ = (τ1, τ2) in a T-stage game GT-rep is
a subgame-perfect Nash equilibrium (SPE) if for every history h
the profile $(\tau_1^h, \tau_2^h)$ is a Nash equilibrium in the (T − |h|)-stage game
based on G.
SPE with Single NE in G
        C          S
C    −5, −5     0, −20
S    −20, 0    −1, −1
Consider a T-stage game based on Prisoner’s dilemma.
For every T, ﬁnd a SPE.
... there is one: play (C, C) all the time. Is it the only one?
Theorem 77
Let G be an arbitrary ﬁnite strategic-form game. If G has a unique
Nash equilibrium, then playing this equilibrium all the time is
the unique SPE in the T-stage game based on G.
Proof.
By backward induction, players have to play the NE in the last stage.
As the behavior in the last stage does not depend on the behavior in
the (T − 1)-th stage, they have to play the NE also in the (T − 1)-th
stage. Then the same holds in the (T − 2)-th stage, etc. □
Further Discussion of Prisoner’s Dilemma
        C          S
C    −5, −5     0, −20
S    −20, 0    −1, −1
Are there other NE (that are not SPE) in the repeated Prisoner’s
dilemma?
To simplify our discussion, we use the following notation: X−YZ,
where X, Y, Z ∈ {C, S} denotes the following strategy:
• In the first stage, play X
• In the second stage, play Y if the opponent played C in the first
  stage, otherwise play Z
There are 4 NE: They are the four proﬁles that lead to (C, C)(C, C),
i.e., each player plays either C−CC, or C−CS.
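The count of equilibria can be checked by brute force over the eight strategies X−YZ of each player. In this sketch a strategy X−YZ is encoded as the string "XYZ", and total payoffs are used instead of averages, which does not change the comparisons.

```python
from itertools import product

# Stage-game payoffs (player 1, player 2)
U = {("C", "C"): (-5, -5), ("C", "S"): (0, -20),
     ("S", "C"): (-20, 0), ("S", "S"): (-1, -1)}

# "XYZ" stands for the strategy X-YZ
STRATEGIES = ["".join(t) for t in product("CS", repeat=3)]

def second_move(strategy, opponent_first):
    """Y if the opponent played C in the first stage, otherwise Z."""
    return strategy[1] if opponent_first == "C" else strategy[2]

def total_payoffs(t1, t2):
    first = (t1[0], t2[0])
    second = (second_move(t1, t2[0]), second_move(t2, t1[0]))
    return tuple(U[first][i] + U[second][i] for i in range(2))

def is_nash(t1, t2):
    u1, u2 = total_payoffs(t1, t2)
    return (all(total_payoffs(d, t2)[0] <= u1 for d in STRATEGIES)
            and all(total_payoffs(t1, d)[1] <= u2 for d in STRATEGIES))

nash = [(t1, t2) for t1 in STRATEGIES for t2 in STRATEGIES if is_nash(t1, t2)]
```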
        C          S
C    −5, −5     0, −20
S    −20, 0    −1, −1
The strategy C strictly dominates S in the Prisoner’s dilemma.
Is there a strictly dominant strategy in the 2-stage game based on
the Prisoner’s dilemma?
If player 2 plays S−CS, then the best responses of player 1 are
S−CC and S−SC.
(The strategy S−CS is usually called "tit-for-tat".)
If player 2 plays S−SC, then the best responses are C−SC and
C−CC.
So there is no strictly dominant strategy for player 1.
(Which would be among the best responses for all strategies of player 2.)
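The two best-response sets can be recomputed by enumeration. In this sketch a strategy X−YZ is again written as the string "XYZ", and only player 1's total payoff over the two stages is evaluated.

```python
from itertools import product

# Player 1's stage payoff for (own action, opponent's action)
U1 = {("C", "C"): -5, ("C", "S"): 0, ("S", "C"): -20, ("S", "S"): -1}

STRATEGIES = ["".join(t) for t in product("CS", repeat=3)]

def u1(t1, t2):
    """Total payoff of player 1 over the two stages."""
    second1 = t1[1] if t2[0] == "C" else t1[2]
    second2 = t2[1] if t1[0] == "C" else t2[2]
    return U1[(t1[0], t2[0])] + U1[(second1, second2)]

def best_responses(t2):
    best = max(u1(t1, t2) for t1 in STRATEGIES)
    return {t1 for t1 in STRATEGIES if u1(t1, t2) == best}

vs_tit_for_tat = best_responses("SCS")   # player 2 plays S-CS
vs_s_sc = best_responses("SSC")          # player 2 plays S-SC
```

The two sets are disjoint, so no single strategy of player 1 is a best response to both strategies of player 2.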
SPE with Multiple NE in G
Let s = (s1, s2) be a Nash equilibrium in G.
Deﬁne a strategy proﬁle τ = (τ1, τ2) in GT-rep where
� τ1 chooses s1 in every stage
� τ2 chooses s2 in every stage
Proposition 6
τ is a SPE in GT-rep for every T ≥ 1.
Proof.
Clearly, changing τi in some stage(s) may only result in the same
or worse payoff for player i, since the other player always plays s−i
independently of the choices of player i. □
The proposition may be generalized by allowing players to play
different equilibria in particular stages.
I.e., consider a sequence of NE $s^1, s^2, \dots, s^T$ in G and assume that in stage ℓ
player i plays $s_i^\ell$.
Does this cover all possible SPE in ﬁnitely repeated games?
        m         f        r
M     4, 4    −1, 5     0, 0
F     5, −1    1, 1     0, 0
R     0, 0     0, 0     3, 3
NE in the above game G : (F, f) and (R, r)
Consider 2-stage game G2-rep and strategies τ1, τ2 where
• τ1 : Chooses M in stage 1. In stage 2 plays R if (M, m) was
  played in the first stage, and plays F otherwise.
• τ2 : Chooses m in stage 1. In stage 2 plays r if (M, m) was
  played in the first stage, and plays f otherwise.
Is this SPE?
Note that here the players do not play a NE in the first stage.
The idea is that both players agree to play a Pareto optimal proﬁle. If
both comply, then a favorable NE is played in the second stage. If one
of them betrays then a "punishing" NE is played.
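The question can be explored numerically. The sketch below lists the NE of the stage game and compares the total payoff of complying with the best total payoff after a unilateral first-stage deviation; the helper names are illustrative. Since the profile prescribes a NE of G in the second stage after every history, the first-stage comparison is the only place where a deviation could pay off.

```python
# Stage-game payoffs (player 1, player 2)
U = {("M", "m"): (4, 4), ("M", "f"): (-1, 5), ("M", "r"): (0, 0),
     ("F", "m"): (5, -1), ("F", "f"): (1, 1), ("F", "r"): (0, 0),
     ("R", "m"): (0, 0), ("R", "f"): (0, 0), ("R", "r"): (3, 3)}
ROWS, COLS = "MFR", "mfr"

def is_stage_nash(r, c):
    return (all(U[(d, c)][0] <= U[(r, c)][0] for d in ROWS)
            and all(U[(r, d)][1] <= U[(r, c)][1] for d in COLS))

def second_stage(first):
    """Stage-2 profile prescribed by (tau1, tau2) after the stage-1 profile."""
    return ("R", "r") if first == ("M", "m") else ("F", "f")

def total(first):
    second = second_stage(first)
    return tuple(U[first][i] + U[second][i] for i in range(2))

stage_nash = [(r, c) for r in ROWS for c in COLS if is_stage_nash(r, c)]
comply = total(("M", "m"))
best_dev_1 = max(total((d, "m"))[0] for d in ROWS if d != "M")
best_dev_2 = max(total(("M", d))[1] for d in COLS if d != "m")
```

Under these payoffs complying yields a total of 7 for each player, while the best first-stage deviation yields 6, which suggests that the profile is indeed a SPE.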