Dynamic Games of Complete Information: Imperfect-Information Extensive-Form Games

Mixed and Behavioral Strategies

Definition 68. A mixed strategy $\sigma_i$ of player $i$ in $G_{imp}$ is a mixed strategy of player $i$ in the corresponding strategic-form game $\bar{G}_{imp} = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$.

Do not forget that now $s_i \in S_i$ iff $s_i$ is a pure strategy that assigns the same action to all nodes of every information set. Hence each $s_i \in S_i$ can be seen as a function $s_i : \mathcal{I}_i \to A$.

As before, we denote by $\Sigma_i$ the set of all mixed strategies of player $i$ and by $\Sigma = \Sigma_1 \times \cdots \times \Sigma_n$ the set of all mixed strategy profiles.

Definition 69. A behavioral strategy of player $i$ in $G_{imp}$ is a behavioral strategy $\beta_i$ in $G_{perf}$ such that for all $j = 1, \ldots, k_i$ and all $h, h' \in I_{i,j}$ we have $\beta_i(h) = \beta_i(h')$.

Each $\beta_i$ can be seen as a function $\beta_i : \mathcal{I}_i \to \Delta(A)$ such that $\mathrm{supp}(\beta_i(I_{i,j})) \subseteq \chi(I_{i,j})$ for all $I_{i,j} \in \mathcal{I}_i$.

Are mixed and behavioral strategies equivalent, as in the perfect-information case?

Example: Absent-Minded Driver

[Figure: game tree of the single player 1. At the first junction, L ends with payoff 0 and R leads to the second junction; there, L ends with payoff 5 and R ends with payoff 1. Both junctions form one information set $I_{1,1}$.]

Only one player: a driver who has to take a turn at a particular junction. There are two identical junctions: turning at the first one leads to a wrong neighborhood where the driver gets completely lost (payoff 0), turning at the second one leads home (payoff 5). If the driver misses both, there is a longer way home (payoff 1). The problem is that after missing the first turn, the driver forgets that he missed it.

Behavioral strategy: $\beta_1(I_{1,1})(L) = \frac{1}{2}$ has the expected payoff $\frac{1}{2} \cdot 0 + \frac{1}{4} \cdot 5 + \frac{1}{4} \cdot 1 = \frac{3}{2}$.

No mixed strategy gives a larger payoff than 1, since no pure strategy ever reaches the terminal node with payoff 5: the pure strategy L yields 0 and R yields 1, so every mixture of them yields at most 1.

Kuhn's Theorem

Definition 70. Player $i$ has perfect recall in $G_{imp}$ if the following holds:
• Every information set of player $i$ intersects every path from the root $h_0$ to a terminal node at most once.
• Every two paths from the root that end in the same information set of player $i$
  – pass through the same information sets of player $i$,
  – in the same order,
  – and in every such information set the two paths choose the same action.

In other words, along all paths ending in the same information set, player $i$ sees the same sequence of information sets and makes the same decisions in his nodes (i.e., at the end he knows exactly the sequence of visited information sets and all his own choices along the way).

The notion of induced strategies generalizes straightforwardly to imperfect-information games.

Behavioral to mixed: We say that a mixed strategy $\sigma_i$ is induced by a behavioral strategy $\beta_i$ if
$$\sigma_i(s_i) = \prod_{I_{i,j} \in \mathcal{I}_i} \beta_i(I_{i,j})(s_i(I_{i,j})) \quad \text{for all } s_i \in S_i.$$

As before, the opposite direction needs some notation. Recall that given $h \in H$, we denote by $w[h]$ the unique path from $h_0$ to $h$. Given $h \in H_i$, we denote by $S_i^h$ the set of all pure strategies $s_i \in S_i$ such that for every $h' \in H_i$ visited by $w[h]$ before $h$, $s_i(h')$ is the action chosen in $h'$ on $w[h]$. Given $h \in H_i$ and $a \in \chi(h)$, we denote by $S_i^{h,a}$ the set of all pure strategies $s_i \in S_i^h$ such that $s_i(h) = a$.

Mixed to behavioral: We say that a behavioral strategy $\beta_i$ is induced by a mixed strategy $\sigma_i$ if the following holds. Let $I_{i,j}$ be an information set of player $i$ and let $h \in I_{i,j}$ be an arbitrary node of $I_{i,j}$. Then
• either $\sum_{s_i \in S_i^h} \sigma_i(s_i) = 0$ (and $\beta_i(I_{i,j})$ is arbitrary),
• or for each action $a \in \chi(h)$ $(= \chi(I_{i,j}))$:
$$\beta_i(I_{i,j})(a) = \frac{\sum_{s_i \in S_i^{h,a}} \sigma_i(s_i)}{\sum_{s_i \in S_i^h} \sigma_i(s_i)}.$$
(Here perfect recall implies that the definition of $\beta_i(I_{i,j})$ does not depend on the choice of $h$.)

Theorem 71 (Kuhn, 1953). Consider a game with perfect recall. Let $\alpha$ be a mixed/behavioral strategy profile and let $\alpha'$ be any mixed/behavioral profile obtained from $\alpha$ by substituting some of the strategies in $\alpha$ with strategies they induce. Then $u_i(\alpha) = u_i(\alpha')$ for every player $i$.
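Both induced-strategy constructions can be tried out on a small example. The sketch below is a minimal illustration under stated assumptions, not code from the course: it uses a hypothetical one-player game with perfect recall (information sets `I1` and `I2`, payoffs chosen arbitrarily), represents a pure strategy as a dict from information sets to actions, and encodes $S_i^h$ through the hypothetical table `PREFIX`, which lists the player's own earlier choices on $w[h]$ (each information set here contains a single node). It checks the payoff equality of Theorem 71 in both directions and then evaluates the absent-minded driver, where perfect recall fails and the behavioral strategy with $\beta_1(I_{1,1})(L) = \frac{1}{2}$ beats every pure, hence every mixed, strategy. Exact `Fraction` arithmetic keeps the equalities exact.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical one-player game with perfect recall (not taken from the text):
# at information set "I1" (the root) the player picks "a" or "b"; "b" ends with payoff 2.
# After "a" the player reaches information set "I2" and picks "c" (payoff 3) or "d" (payoff 0).
INFO_SETS = {"I1": ["a", "b"], "I2": ["c", "d"]}
# The player's own earlier choices on the path w[h] to (the node of) each information set.
PREFIX = {"I1": {}, "I2": {"I1": "a"}}


def payoff_pure(s):
    """Payoff of a pure strategy s, given as a dict: information set -> action."""
    if s["I1"] == "b":
        return F(2)
    return F(3) if s["I2"] == "c" else F(0)


def payoff_behavioral(beta):
    """Expected payoff of a behavioral strategy, computed along the tree."""
    return beta["I1"]["b"] * 2 + beta["I1"]["a"] * (beta["I2"]["c"] * 3 + beta["I2"]["d"] * 0)


def pure_strategies():
    """All pure strategies, i.e. all functions from information sets to actions."""
    names = list(INFO_SETS)
    for actions in product(*(INFO_SETS[n] for n in names)):
        yield dict(zip(names, actions))


def behavioral_to_mixed(beta):
    """Induced mixed strategy: sigma(s) = product over I of beta(I)(s(I))."""
    sigma = {}
    for s in pure_strategies():
        prob = F(1)
        for info_set, action in s.items():
            prob *= beta[info_set][action]
        sigma[tuple(sorted(s.items()))] = prob  # hashable key for the pure strategy
    return sigma


def payoff_mixed(sigma):
    """Expected payoff of a mixed strategy: weighted sum over pure strategies."""
    return sum(prob * payoff_pure(dict(s)) for s, prob in sigma.items())


def mixed_to_behavioral(sigma):
    """Induced behavioral strategy: beta(I)(a) = sigma(S^{h,a}) / sigma(S^h)."""
    beta = {}
    for info_set, actions in INFO_SETS.items():
        # S^h: pure strategies agreeing with the player's own choices on w[h].
        s_h = {s: prob for s, prob in sigma.items()
               if all(dict(s)[j] == a for j, a in PREFIX[info_set].items())}
        total = sum(s_h.values())
        if total == 0:
            beta[info_set] = None  # h is unreachable under sigma: any distribution works
            continue
        beta[info_set] = {a: sum(prob for s, prob in s_h.items() if dict(s)[info_set] == a) / total
                          for a in actions}
    return beta


beta = {"I1": {"a": F(1, 3), "b": F(2, 3)}, "I2": {"c": F(3, 4), "d": F(1, 4)}}
sigma = behavioral_to_mixed(beta)
print(payoff_behavioral(beta), payoff_mixed(sigma))  # 25/12 25/12
print(mixed_to_behavioral(sigma) == beta)            # True


# Absent-minded driver (no perfect recall): both junctions share one information set.
def driver_behavioral(p):
    """Expected payoff when turning (L) with probability p at every junction."""
    return p * 0 + (1 - p) * (p * 5 + (1 - p) * 1)


print(driver_behavioral(F(1, 2)))                             # 3/2
print(max(driver_behavioral(F(0)), driver_behavioral(F(1))))  # 1 (best pure strategy: R)
```

The dict-based encoding is chosen for readability only; `PREFIX` works here because every information set is a single node, and perfect recall is what makes one prefix per information set well defined in general.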
The concepts of Nash equilibria and SPE in behavioral strategies are the same as in the perfect-information case.

Complexity of Zero-Sum Games

Recall that a behavioral strategy $\beta_i$ of player $i$ is maxmin if
$$\beta_i \in \operatorname{argmax}_{\beta'_i \in B_i} \min_{\beta_{-i} \in B_{-i}} u_i(\beta'_i, \beta_{-i}).$$
Similarly for pure and mixed strategies.

Theorem 72 (Koller and Megiddo, 1990). Consider finite two-player zero-sum imperfect-information games.
• For such games with perfect recall, the problem of computing a maxmin behavioral strategy is in PTIME.
• For games with possibly imperfect recall, the problem of computing a (pure, behavioral, or mixed) strategy that guarantees a given payoff is NP-hard.

How to compute Nash equilibria in polynomial time? The existence of a polynomial-time algorithm for computing behavioral NE does not immediately follow from Theorem 72 and von Neumann's Theorem 48, because Theorem 48 has been proved only for mixed strategies. However, using Kuhn's theorem, von Neumann's theorem easily extends to behavioral strategies.

Proposition 5. Let $(\beta_1, \beta_2)$ be a behavioral strategy profile in a two-player zero-sum game with perfect recall. Then $(\beta_1, \beta_2)$ is a NE iff both $\beta_1$ and $\beta_2$ are maxmin.

Proof. Let $(\sigma_1, \sigma_2)$ be the mixed strategies induced by $(\beta_1, \beta_2)$. Since $(\sigma_1, \sigma_2)$ was obtained from $(\beta_1, \beta_2)$ only by taking induced strategies, Kuhn's theorem gives, for both $i \in \{1, 2\}$, $u_i(\beta_1, \beta_2) = u_i(\sigma_1, \sigma_2)$ and
• for every behavioral strategy $\beta'_{-i}$ and its induced mixed strategy $\sigma'_{-i}$, we have $u_i(\beta_i, \beta'_{-i}) = u_i(\sigma_i, \sigma'_{-i})$,
• for every mixed strategy $\sigma'_{-i}$ and its induced behavioral strategy $\beta'_{-i}$, we have $u_i(\sigma_i, \sigma'_{-i}) = u_i(\beta_i, \beta'_{-i})$.

Now $(\beta_1, \beta_2)$ is a Nash equilibrium iff $(\sigma_1, \sigma_2)$ is a Nash equilibrium iff $\sigma_1$ and $\sigma_2$ are maxmin iff $\beta_1$ and $\beta_2$ are maxmin.

Corollary 73. Computing Nash equilibria in behavioral strategies in two-player zero-sum imperfect-information games with perfect recall is in PTIME.
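Proposition 5 can be checked by hand on a tiny case. The sketch below is a minimal illustration, not the polynomial-time sequence-form method behind Theorem 72: it takes a hypothetical 2×2 zero-sum matrix game (a one-shot game in which each player has a single information set, so mixed and behavioral strategies coincide), computes each player's maxmin strategy with the elementary piecewise-linear argument that only works for 2×2 games, and verifies the "maxmin implies NE" direction by testing pure deviations.

```python
from fractions import Fraction as F


def maxmin_row(A):
    """Maxmin mixed strategy (p, 1 - p) of the row player in a 2x2 zero-sum game.

    A[r][c] is the row player's payoff. The payoff the row player can guarantee
    with p is the minimum of two linear functions of p, so its maximum is attained
    at p = 0, p = 1, or where the two lines cross.
    """
    def guarantee(p):
        return min(p * A[0][c] + (1 - p) * A[1][c] for c in range(2))

    candidates = [F(0), F(1)]
    d0, d1 = A[0][0] - A[1][0], A[0][1] - A[1][1]  # slopes of the two lines
    if d0 != d1:
        p = F(A[1][1] - A[1][0], d0 - d1)  # crossing point of the two lines
        if 0 <= p <= 1:
            candidates.append(p)
    best = max(candidates, key=guarantee)
    return best, guarantee(best)


def payoff(A, p, q):
    """Row player's expected payoff for rows mixed (p, 1-p) and columns (q, 1-q)."""
    rows, cols = (p, 1 - p), (q, 1 - q)
    return sum(rows[r] * cols[c] * A[r][c] for r in range(2) for c in range(2))


def is_nash(A, p, q):
    """No pure deviation helps either player (sufficient for a mixed NE)."""
    v = payoff(A, p, q)
    rows_ok = all(payoff(A, r, q) <= v for r in (F(0), F(1)))  # row player maximizes
    cols_ok = all(payoff(A, p, c) >= v for c in (F(0), F(1)))  # column player minimizes
    return rows_ok and cols_ok


A = [[2, -1], [-1, 1]]  # hypothetical payoffs of player 1
p, v1 = maxmin_row(A)
# Player 2 maximizes -A, i.e. plays as the row player of the negated transpose.
q, v2 = maxmin_row([[-A[c][r] for c in range(2)] for r in range(2)])
print(p, q, v1, v2, is_nash(A, p, q))  # 2/5 2/5 1/5 -1/5 True
```

The two guaranteed values are negatives of each other, as expected in a zero-sum game where both players use maxmin strategies.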
Complexity of Non-Zero-Sum Games

Computing NE (or SPE) in non-zero-sum imperfect-information extensive-form games is at least as hard as in strategic-form games. Backward induction helps to decompose the game into "subgames" rooted in nodes of $H_{single}$, but large games may still remain that have to be solved by other methods.

Naively, any solution concept developed for strategic-form games can be applied to imperfect-information extensive-form games (with perfect recall) via the corresponding strategic-form game $\bar{G}_{imp}$. However, such a solution is not efficient: the corresponding game is exponentially large and often degenerate.

More efficient methods exist for two-player games with perfect recall, e.g., using the sequence-form representation of the game, in which nodes of $G_{imp}$ are represented by the sequences of actions leading to them. This leads to a linear complementarity problem of polynomial size, which in turn can be solved using a modified Lemke-Howson algorithm.

For a detailed treatment of complexity, see "The complexity of computing a (perfect) equilibrium for an n-player extensive form game of perfect recall" by Kousha Etessami.

Imperfect-Information and Chance Nodes

[Figure: the chance node $h_0$ (player 0) moves with probability $\frac{1}{2}$ each to $h_1$ (King) and $h_2$ (Ace), both nodes of player 1. At $h_1$, C yields $(-2, 2)$ and F yields $(-1, 1)$; at $h_2$, C yields $(2, -2)$ and F yields $(-1, 1)$.]

A very simple card game:
• Player 1 randomly draws a card from a large deck containing only Kings and Aces, in equal numbers.
• Then player 1 may either call (C) or fold (F), without looking at the card.
• If he folds, he pays 1 dollar to player 2; otherwise
  – call + King means that player 1 pays 2 dollars to player 2,
  – call + Ace means that player 2 pays 2 dollars to player 1.

An imperfect-information extensive-form game with chance nodes is a tuple $G_{imp} = (G_{perf}, \mathcal{I}, \beta_0)$ where
• the set of players $N$ is equal to $\{0, 1, \ldots, n\}$ (i.e., there is a new player 0 called chance, or nature),
• for every $h \in H_0$ the set of enabled actions $\chi(h)$ is assumed to be the set of all children of $h$,
• each information set of player 0 is a singleton (i.e., nature has complete information),
• $\beta_0$ is a fixed behavioral strategy for player 0; player 0 always plays according to $\beta_0$.

Note that due to the above assumption, $\beta_0(h)$ is a distribution on all children of $h$.

As player 0 always plays the same strategy, we exclude this strategy from strategy profiles (i.e., pure strategy profiles remain elements of $S_1 \times \cdots \times S_n$).

A game with chance nodes is a perfect-information game if all information sets of $\mathcal{I}$ are singletons.

Example

In the card game above, $\beta_0(h_0)(h_1) = \frac{1}{2}$ and $\beta_0(h_0)(h_2) = \frac{1}{2}$.

Player 1 has just one information set $I_{1,1} = \{h_1, h_2\}$, so his pure strategies are simply C and F. Consider the mixed strategy $\sigma_1$ of player 1 with $\sigma_1(C) = \frac{1}{4}$ and $\sigma_1(F) = \frac{3}{4}$. Then
$$u_1(\sigma_1) = \tfrac{1}{2} \cdot \tfrac{1}{4} \cdot (-2) + \tfrac{1}{2} \cdot \tfrac{3}{4} \cdot (-1) + \tfrac{1}{2} \cdot \tfrac{1}{4} \cdot 2 + \tfrac{1}{2} \cdot \tfrac{3}{4} \cdot (-1) = -\tfrac{3}{4}.$$

Results

All results for games without chance nodes presented so far remain valid for games with chance nodes. In particular, Theorem 57 and Theorem 64 remain valid for games of perfect information with chance nodes. Concretely:

Theorem 74. Consider games of perfect information with chance nodes.
• There exists a pure strategy profile which is a SPE with respect to pure strategies.
• There exists a pure strategy profile which is a SPE with respect to behavioral strategies.

Backward induction can be straightforwardly modified to deal with chance nodes, as follows.

Backward Induction with Perfect Information and Chance Nodes

We inductively "attach" to every node $h$ a SPE $s^h$ in $G^h$ and expected payoffs $u(h) = (u_1(h), \ldots, u_n(h))$.
• Initially: attach to each terminal node $z \in Z$ the empty profile $s^z = (\emptyset, \ldots, \emptyset)$ and the payoff vector $u(z) = (u_1(z), \ldots, u_n(z))$.
• While there is an unattached node $h$ all of whose children are attached:
  1. Let $K$ be the set of all children of $h$.
  2. If $\rho(h) \neq 0$, then let $h_{max} \in \operatorname{argmax}_{h' \in K} u_{\rho(h)}(h')$ and
     – attach to $h$ a SPE $s^h$ where $s^h_{\rho(h)}(h) = h_{max}$, and for $i \in N \setminus \{0\}$ and all $h' \in H^{\bar{h}} \cap H_i$ with $\bar{h} \in K$ define $s^h_i(h') = s^{\bar{h}}_i(h')$ (i.e., in subgames rooted in $\bar{h} \in K$, $s^h$ behaves as $s^{\bar{h}}$);
     – attach to $h$ the expected payoffs $u_i(h) = u_i(h_{max})$ for $i \in N \setminus \{0\}$.
  3. If $\rho(h) = 0$, then
     – attach to $h$ a SPE $s^h$ where for all $i \in N \setminus \{0\}$ and all $h' \in H^{\bar{h}} \cap H_i$ with $\bar{h} \in K$ we define $s^h_i(h') = s^{\bar{h}}_i(h')$ (i.e., in subgames rooted in $\bar{h} \in K$, $s^h$ behaves as $s^{\bar{h}}$);
     – attach to $h$ the expected payoffs $u_i(h) = \sum_{\bar{h} \in K} \beta_0(h)(\bar{h}) \cdot u_i(\bar{h})$ (i.e., the weighted average of the payoffs in all children).

Backward Induction for Imperfect Information and Chance Nodes

The high-level description of backward induction for imperfect-information games given earlier remains valid also for imperfect-information games with chance nodes. We only have to notice that in the games newly created in step 4, player 0 participates with the strategy $\beta_0$.

Dynamic Games of Complete Information: Repeated Games

Example (rows: player 1, columns: player 2)

          C          S
   C   −5, −5     0, −20
   S   −20, 0     −1, −1

Imagine that the criminals are arrested repeatedly. Can they somehow reflect upon their experience in order to play "better"?

In what follows we consider strategic-form games played repeatedly
• for finitely many rounds, where the final payoff of each player is the average of the payoffs from all rounds;
• for infinitely many rounds, where we consider the discounted sum of payoffs and the long-run average payoff.

We analyze Nash equilibria and subgame perfect equilibria. We stick to pure strategies only!

Finitely Repeated Games

Let $G = (\{1, 2\}, (S_1, S_2), (u_1, u_2))$ be a finite strategic-form game of two players. A $T$-stage game $G_{T\text{-rep}}$ based on $G$ proceeds in $T$ stages so that in a stage $t \geq 1$, players choose a strategy profile $s^t = (s^t_1, s^t_2)$.
After $T$ stages, both players collect the average payoff $\sum_{t=1}^{T} u_i(s^t) / T$.

A history of length $0 \leq t \leq T$ is a sequence $h = s^1 \cdots s^t \in S^t$ of $t$ strategy profiles (where $S = S_1 \times S_2$). Denote by $H(t)$ the set of all histories of length $t$.

A pure strategy for player $i$ in a $T$-stage game $G_{T\text{-rep}}$ is a function $\tau_i : \bigcup_{t=0}^{T-1} H(t) \to S_i$ which for every possible history chooses the next step of player $i$.

Every strategy profile $\tau = (\tau_1, \tau_2)$ in $G_{T\text{-rep}}$ induces a sequence of pure strategy profiles $w_\tau = s^1 \cdots s^T$ in $G$ such that $s^t_i = \tau_i(s^1 \cdots s^{t-1})$.

Given a pure strategy profile $\tau$ in $G_{T\text{-rep}}$ such that $w_\tau = s^1 \cdots s^T$, define the payoffs $u_i(\tau) = \sum_{t=1}^{T} u_i(s^t) / T$.

Example

Consider the 3-stage game based on the game above. Examples of histories: $\varepsilon$ (the empty history), $(C, S)$, $(C, S)(S, S)$, $(C, S)(S, S)(C, C)$.

Here the last one is terminal, obtained using $\tau_1, \tau_2$ such that:
$\tau_1(\varepsilon) = C$, $\tau_1((C, S)) = S$, $\tau_1((C, S)(S, S)) = C$,
$\tau_2(\varepsilon) = S$, $\tau_2((C, S)) = S$, $\tau_2((C, S)(S, S)) = C$.

Thus $w_{(\tau_1, \tau_2)} = (C, S)(S, S)(C, C)$ and
$u_1(\tau_1, \tau_2) = (0 + (-1) + (-5))/3 = -2$,
$u_2(\tau_1, \tau_2) = (-20 + (-1) + (-5))/3 = -26/3$.

Finitely Repeated Games in Extensive Form

Every $T$-stage game $G_{T\text{-rep}}$ can be defined as an imperfect-information extensive-form game. Define an imperfect-information extensive-form game $G^{rep}_{imp} = (G^{rep}_{perf}, \mathcal{I})$ such that $G^{rep}_{perf} = (\{1, 2\}, A, H, Z, \chi, \rho, \pi, h_0, u)$ where
• $A = S_1 \cup S_2$
• H = (S1 × S2)≤T ∪ (S1 × S2)