Efficient Algorithms for Pure Nash Equilibria

In step 2 of the backward induction, the algorithm may choose an arbitrary $h_{\max} \in \operatorname{argmax}_{h' \in K} u_{\rho(h)}(h')$ and always obtains a SPE. To compute all SPE, the algorithm may systematically search through all possible choices of $h_{\max}$ throughout the induction.

Backward induction is inefficient because it always searches through the whole tree, even where this is unnecessary. There are better algorithms, such as α-β pruning. For details, extensions, etc., see e.g.
• PB016 Artificial Intelligence I
• R. Korf, Multi-player alpha-beta pruning, Artificial Intelligence 48, pages 99-111, 1991
• S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach (3rd edition), Prentice Hall, 2009

Example

Centipede game (the nodes are owned, from left to right, by players 1, 2, 1, 2, 1; A continues to the right, D goes down and ends the game):

  1 --A--> 2 --A--> 1 --A--> 2 --A--> 1 --A--> (3, 5)
  |D       |D       |D       |D       |D
  (1, 0)   (0, 2)   (3, 1)   (2, 4)   (4, 3)

SPE in pure strategies: (DDD, DD) ... Isn't it weird? There are serious issues here:
• In laboratory settings, people usually play A for several steps.
• There is a theoretical problem: imagine that you are player 2. What would you do when player 1 chooses A in the first step? The SPE analysis says that you should go down, but the same analysis also says that the situation you are in cannot occur :-)

Dynamic Games of Complete Information
Extensive-Form Games
Mixed and Behavioral Strategies

Mixed and Behavioral Strategies

Definition 58
A mixed strategy $\sigma_i$ of player $i$ in $G$ is a mixed strategy of player $i$ in the corresponding strategic-form game. I.e., a mixed strategy $\sigma_i$ of player $i$ in $G$ is a probability distribution on $S_i$ (recall that $S_i$ is the set of all pure strategies, i.e., functions of the form $s_i : H_i \to A$).

As before, we denote by $\Sigma_i$ the set of all mixed strategies of player $i$ and by $\Sigma$ the set of all mixed strategy profiles $\Sigma_1 \times \cdots \times \Sigma_n$.

Definition 59
A behavioral strategy of player $i$ in $G$ is a function $\beta_i : H_i \to \Delta(A)$ such that for every $h \in H_i$ we have $\operatorname{supp}(\beta_i(h)) \subseteq \chi(h)$.

Given a profile $\beta = (\beta_1, \ldots, \beta_n)$ of behavioral strategies, we denote by $P_\beta(z)$ the probability of reaching $z \in Z$ when $\beta$ is used, i.e.,

$$P_\beta(z) = \prod_{\ell=1}^{k} \beta_{\rho(h_{\ell-1})}(h_{\ell-1})(a_\ell)$$

where $h_0 a_1 h_1 a_2 h_2 \cdots a_k h_k$ is the unique path from $h_0$ to $h_k = z$. We define

$$u_i(\beta) := \sum_{z \in Z} P_\beta(z) \cdot u_i(z).$$

Behavioral Strategies: Example

  h0 (player 1)
    A:  h1 (player 2)
          B:  z1 (1, 0)
          ¯B: h3 (player 1)
                C:  z2 (2, 3)
                ¯C: z3 (3, 2)
    ¯A: h2 (player 2)
          D:  z4 (1, 1)
          ¯D: z5 (5, 4)

Pure strategies of player 1: $AC$, $A\bar{C}$, $\bar{A}C$, $\bar{A}\bar{C}$

An example of a mixed strategy $\sigma_1$ of player 1: $\sigma_1(AC) = \frac{1}{3}$, $\sigma_1(A\bar{C}) = \frac{1}{9}$, $\sigma_1(\bar{A}C) = \frac{1}{6}$ and $\sigma_1(\bar{A}\bar{C}) = \frac{7}{18}$ (the four probabilities sum to 1).

An example of behavioral strategies of both players, $\beta = (\beta_1, \beta_2)$:
• player 1: $\beta_1(h_0)(A) = \frac{1}{3}$ and $\beta_1(h_3)(C) = \frac{1}{2}$
• player 2: $\beta_2(h_1)(B) = \frac{1}{4}$ and $\beta_2(h_2)(D) = \frac{1}{5}$

$$P_{(\beta_1,\beta_2)}(z_2) = \frac{1}{3} \left(1 - \frac{1}{4}\right) \frac{1}{2} = \frac{1}{8}$$

$$u_1(\beta) = P_\beta(z_1) \cdot 1 + P_\beta(z_2) \cdot 2 + P_\beta(z_3) \cdot 3 + P_\beta(z_4) \cdot 1 + P_\beta(z_5) \cdot 5$$
$$= \frac{1}{3} \cdot \frac{1}{4} \cdot 1 + \frac{1}{3} \cdot \frac{3}{4} \cdot \frac{1}{2} \cdot 2 + \frac{1}{3} \cdot \frac{3}{4} \cdot \frac{1}{2} \cdot 3 + \frac{2}{3} \cdot \frac{1}{5} \cdot 1 + \frac{2}{3} \cdot \frac{4}{5} \cdot 5 \approx 3.508$$

Mixed/Behavioral Profiles

Each pure strategy can be considered as a behavioral strategy.

Definition 60
A mixed/behavioral strategy profile is a tuple $\alpha = (\alpha_1, \ldots, \alpha_n)$ where each $\alpha_i$ is either a mixed or a behavioral strategy.

Let $\alpha = (\alpha_1, \ldots, \alpha_n)$ be a mixed/behavioral strategy profile, and let $M = \{i_1, \ldots, i_k\} \subseteq N$ be the set of all players $i_j \in N$ such that $\alpha_{i_j}$ is a mixed strategy. We define

$$u_i(\alpha) = \sum_{s_{i_1} \in S_{i_1}} \cdots \sum_{s_{i_k} \in S_{i_k}} \left( \prod_{\ell=1}^{k} \alpha_{i_\ell}(s_{i_\ell}) \right) \cdot u_i(\alpha'_1, \ldots, \alpha'_n)$$

where $\alpha'_j = s_j$ if $j \in M$, and $\alpha'_j = \alpha_j$ otherwise.
Intuitively, $u_i(\alpha)$ is the expected payoff of player $i$ in the following play: first, each player $i_\ell \in M$ chooses his pure strategy $s_{i_\ell}$ randomly with probability $\alpha_{i_\ell}(s_{i_\ell})$; then these fixed pure strategies are played against the behavioral strategies of the players from $N \setminus M$ (who may still randomize along the play).

Equivalence of Mixed and Behavioral Strategies

We show how to translate behavioral strategies to equivalent mixed ones (w.r.t. probabilities of reaching terminal nodes) and vice versa.

Behavioral to mixed: We say that a mixed strategy $\sigma_i$ is induced by a behavioral strategy $\beta_i$ if

$$\sigma_i(s_i) = \prod_{h \in H_i} \beta_i(h)(s_i(h)) \quad \text{for all } s_i \in S_i.$$

Mixed to behavioral: For this direction some notation is needed.

Given $h \in H$, we denote by $w[h]$ the unique path from $h_0$ to $h$.

Given $h \in H_i$, we denote by $S_i^h$ the set of all pure strategies $s_i \in S_i$ such that for every $h' \in H_i$ visited by $w[h]$ we have that $s_i(h')$ is the action chosen in $h'$ on $w[h]$. Intuitively, $S_i^h$ consists of all pure strategies that choose the appropriate actions to stay on the unique path from $h_0$ to $h$. In other words, $h$ can be reached using $s_i$ (assuming that the opponents play appropriately) iff $s_i \in S_i^h$.

Given $h \in H_i$ and $a \in \chi(h)$, we denote by $S_i^{h,a} \subseteq S_i^h$ the set of all pure strategies $s_i \in S_i^h$ such that $s_i(h) = a$. I.e., strategies of $S_i^{h,a}$ may reach $h$ and then choose $a$ there.

We say that a behavioral strategy $\beta_i$ is induced by a mixed strategy $\sigma_i$ if the following holds: for every $h \in H_i$ and $a \in \chi(h)$
• either $\sum_{s_i \in S_i^h} \sigma_i(s_i) = 0$
• or
$$\beta_i(h)(a) = \frac{\sum_{s_i \in S_i^{h,a}} \sigma_i(s_i)}{\sum_{s_i \in S_i^h} \sigma_i(s_i)}$$

Intuitively, $\beta_i(h)(a)$ is the probability of selecting $a$ in $h$, assuming that $h$ can be reached with positive probability if the other players play appropriately. If the probability of reaching $h$ using $\sigma_i$ is zero (no matter what the opponents do), then $\beta_i(h)$ may be defined arbitrarily, since $h$ is reached with zero probability using $\beta_i$ as well.
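The two translations can be tried out on the example tree with payoffs (1, 0), (2, 3), (3, 2), (1, 1), (5, 4). The following is only a minimal sketch: the dictionary encoding of the tree, the spelling `~A` for the barred actions, and all function names are assumptions made for illustration, not part of the lecture. When $S_i^h$ has zero probability, the sketch picks the uniform distribution as one of the arbitrary admissible choices.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical encoding of the example tree: node -> (owner, {action: child}).
TREE = {
    "h0": (1, {"A": "h1", "~A": "h2"}),
    "h1": (2, {"B": "z1", "~B": "h3"}),
    "h3": (1, {"C": "z2", "~C": "z3"}),
    "h2": (2, {"D": "z4", "~D": "z5"}),
}
U1 = {"z1": 1, "z2": 2, "z3": 3, "z4": 1, "z5": 5}  # payoffs of player 1

# The behavioral profile from the example.
beta = {
    "h0": {"A": F(1, 3), "~A": F(2, 3)},
    "h3": {"C": F(1, 2), "~C": F(1, 2)},
    "h1": {"B": F(1, 4), "~B": F(3, 4)},
    "h2": {"D": F(1, 5), "~D": F(4, 5)},
}

def reach_probs(beta):
    """P_beta(z): product of the action probabilities along the path to z."""
    probs = {}
    def walk(node, p):
        if node not in TREE:          # terminal node
            probs[node] = p
            return
        for a, child in TREE[node][1].items():
            walk(child, p * beta[node][a])
    walk("h0", F(1))
    return probs

def expected_u1(beta):
    return sum(p * U1[z] for z, p in reach_probs(beta).items())

def path_to(target, node="h0", acc=()):
    """Sequence of (node, action) pairs on the unique path w[target]."""
    if node == target:
        return acc
    if node not in TREE:
        return None
    for a, child in TREE[node][1].items():
        found = path_to(target, child, acc + ((node, a),))
        if found is not None:
            return found
    return None

def own_nodes(player):
    return [h for h, (owner, _) in TREE.items() if owner == player]

def induced_mixed(beta, player):
    """sigma_i(s_i) = product over h in H_i of beta_i(h)(s_i(h))."""
    nodes = own_nodes(player)
    sigma = {}
    for choice in product(*(TREE[h][1] for h in nodes)):
        s = tuple(zip(nodes, choice))  # pure strategy as a hashable tuple
        p = F(1)
        for h, a in s:
            p *= beta[h][a]
        sigma[s] = p
    return sigma

def induced_behavioral(sigma, player):
    """beta_i(h)(a) = sigma_i(S_i^{h,a}) / sigma_i(S_i^h), uniform if undefined."""
    result = {}
    for h in own_nodes(player):
        required = [(n, a) for n, a in path_to(h) if TREE[n][0] == player]
        S_h = [s for s in sigma if all(dict(s)[n] == a for n, a in required)]
        total = sum(sigma[s] for s in S_h)
        actions = list(TREE[h][1])
        result[h] = {
            a: (F(1, len(actions)) if total == 0
                else sum(sigma[s] for s in S_h if dict(s)[h] == a) / total)
            for a in actions
        }
    return result

sigma1 = induced_mixed(beta, 1)
print(reach_probs(beta)["z2"], float(expected_u1(beta)))
print(induced_behavioral(sigma1, 1) == {h: beta[h] for h in own_nodes(1)})
```

Since every action has positive probability in this profile, translating $\beta_1$ to a mixed strategy and back returns $\beta_1$ itself; with zero-probability nodes the round trip may differ exactly on the arbitrary choices.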
Equivalence of Mixed and Behavioral Strategies (Cont.)

Theorem 61
Let $\alpha$ be a mixed/behavioral strategy profile and let $\alpha'$ be any mixed/behavioral profile obtained from $\alpha$ by substituting some of the strategies in $\alpha$ with strategies they induce. Then $u_i(\alpha) = u_i(\alpha')$. In fact, any node of $H$ is reached from $h_0$ with the same probability for all such $\alpha'$.

Example:

  h0 (player 1)
    A:  h1 (player 2)
          B:  z1 (1, 0)
          ¯B: h3 (player 1)
                C:  z2 (2, 3)
                ¯C: z3 (3, 2)
    ¯A: h2 (player 2)
          D:  z4 (1, 1)
          ¯D: z5 (5, 4)

Pure strategies of player 1: $AC$, $A\bar{C}$, $\bar{A}C$, $\bar{A}\bar{C}$
Pure strategies of player 2: $BD$, $B\bar{D}$, $\bar{B}D$, $\bar{B}\bar{D}$

Mixed strategies of player 1: $\sigma_1 = (p_{AC}, p_{A\bar{C}}, p_{\bar{A}C}, p_{\bar{A}\bar{C}})$
(Here $p_{XY} = \sigma_1(s)$ where $s$ is the pure strategy such that $s(h_0) = X$ and $s(h_3) = Y$.)
Mixed strategies of player 2: $\sigma_2 = (p_{BD}, p_{B\bar{D}}, p_{\bar{B}D}, p_{\bar{B}\bar{D}})$

Behavioral strategies of player 1: $\beta_1 = (q_A, q_C)$ where $q_A = \beta_1(h_0)(A)$ and $q_C = \beta_1(h_3)(C)$; denote $q_{\bar{A}} = 1 - q_A$ and $q_{\bar{C}} = 1 - q_C$.
Behavioral strategies of player 2: $\beta_2 = (q_B, q_D)$, and we use $q_{\bar{B}} = 1 - q_B$ and $q_{\bar{D}} = 1 - q_D$.

Behavioral to mixed: Given $\beta_1 = (q_A, q_C)$ and $\beta_2 = (q_B, q_D)$, define

$$\sigma_1 = (p_{AC}, p_{A\bar{C}}, p_{\bar{A}C}, p_{\bar{A}\bar{C}}) := (q_A q_C,\; q_A q_{\bar{C}},\; q_{\bar{A}} q_C,\; q_{\bar{A}} q_{\bar{C}})$$
$$\sigma_2 = (p_{BD}, p_{B\bar{D}}, p_{\bar{B}D}, p_{\bar{B}\bar{D}}) := (q_B q_D,\; q_B q_{\bar{D}},\; q_{\bar{B}} q_D,\; q_{\bar{B}} q_{\bar{D}})$$

What is the probability of reaching $z_2$?
• Using $(\beta_1, \beta_2)$: $q_A q_{\bar{B}} q_C$ (i.e., multiply the probabilities assigned by $\beta_1, \beta_2$ along the path from $h_0$ to $z_2$)
• Using $(\sigma_1, \sigma_2)$: $(q_A q_C)(q_{\bar{B}} q_D + q_{\bar{B}} q_{\bar{D}}) = q_A q_{\bar{B}} q_C$ (i.e., player 1 needs to choose the pure strategy $AC$, player 2 needs to choose any pure strategy which selects $\bar{B}$)
• Using $(\sigma_1, \beta_2)$: $(q_A q_C) q_{\bar{B}} = q_A q_{\bar{B}} q_C$ (i.e., first player 1 chooses a pure strategy, which needs to be $AC$, and then player 2 plays against this particular strategy by choosing $\bar{B}$)
• Using $(\beta_1, \sigma_2)$: $(q_{\bar{B}} q_D + q_{\bar{B}} q_{\bar{D}}) q_A q_C = q_A q_{\bar{B}} q_C$ (i.e., first player 2 chooses a pure strategy, which needs to be one playing $\bar{B}$ in $h_1$, and then player 1 plays against this strategy by choosing $A$ and $C$)

Mixed to behavioral: Given $\sigma_1 = (p_{AC}, p_{A\bar{C}}, p_{\bar{A}C}, p_{\bar{A}\bar{C}})$ and $\sigma_2 = (p_{BD}, p_{B\bar{D}}, p_{\bar{B}D}, p_{\bar{B}\bar{D}})$ we have
• $\beta_1 = (q_A, q_C)$ where $q_A = p_{AC} + p_{A\bar{C}}$ and
$$q_C = \begin{cases} \dfrac{p_{AC}}{p_{AC} + p_{A\bar{C}}} & \text{if } p_{AC} + p_{A\bar{C}} > 0 \\ x & \text{otherwise} \end{cases}$$
Here $x$ is an arbitrary number between 0 and 1.
• $\beta_2 = (q_B, q_D)$ where $q_B = p_{BD} + p_{B\bar{D}}$ and $q_D = p_{BD} + p_{\bar{B}D}$

First, consider $q_A = p_{AC} + p_{A\bar{C}} > 0$. What is the probability of reaching $z_2$?
• Using $(\sigma_1, \sigma_2)$: $p_{AC} \cdot (p_{\bar{B}D} + p_{\bar{B}\bar{D}})$, i.e., player 1 chooses $AC$ and player 2 chooses a pure strategy playing $\bar{B}$
• Using $(\beta_1, \beta_2)$:
$$q_A \cdot q_{\bar{B}} \cdot q_C = (p_{AC} + p_{A\bar{C}}) \cdot q_{\bar{B}} \cdot \frac{p_{AC}}{p_{AC} + p_{A\bar{C}}} = q_{\bar{B}} \cdot p_{AC} = p_{AC} \cdot (1 - q_B) = p_{AC} \cdot (1 - (p_{BD} + p_{B\bar{D}})) = p_{AC} \cdot (p_{\bar{B}D} + p_{\bar{B}\bar{D}})$$
• Using $(\beta_1, \sigma_2)$: $(p_{\bar{B}D} + p_{\bar{B}\bar{D}}) \cdot q_A \cdot q_C = q_A \cdot q_{\bar{B}} \cdot q_C = p_{AC} \cdot (p_{\bar{B}D} + p_{\bar{B}\bar{D}})$, i.e., first player 2 chooses a pure strategy playing $\bar{B}$ in $h_1$ and then player 1 plays the behavioral strategy $\beta_1$ against it
• Using $(\sigma_1, \beta_2)$: $p_{AC} \cdot q_{\bar{B}} = p_{AC} \cdot (p_{\bar{B}D} + p_{\bar{B}\bar{D}})$, i.e., first player 1 chooses the pure strategy $AC$ and then player 2 plays the behavioral strategy $\beta_2$ against it

Observe that all possible combinations of mixed and behavioral strategies give the same probability of reaching $z_2$; this holds for all terminal nodes and hence all combinations give the same payoff.

Now, assume $q_A = p_{AC} + p_{A\bar{C}} = 0$ (which implies $p_{AC} = 0$). What is the probability of reaching $z_2$?
• Using $(\sigma_1, \sigma_2)$: $p_{AC} \cdot (p_{\bar{B}D} + p_{\bar{B}\bar{D}}) = 0$
• Using $(\beta_1, \beta_2)$: $q_A \cdot q_{\bar{B}} \cdot q_C = 0$
• Using $(\beta_1, \sigma_2)$: $(p_{\bar{B}D} + p_{\bar{B}\bar{D}}) \cdot q_A \cdot q_C = 0$
• Using $(\sigma_1, \beta_2)$: $p_{AC} \cdot q_{\bar{B}} = 0$

Behavioral (Mixed) Strategy SPE

Let us denote by $B_i$ the set of all behavioral strategies of player $i$, and by $B$ the set of all behavioral strategy profiles $B_1 \times \cdots \times B_n$.

Definition 62
$\beta = (\beta_1, \ldots, \beta_n) \in B$ is a behavioral Nash equilibrium if $u_i(\beta_i, \beta_{-i}) \geq u_i(\beta'_i, \beta_{-i})$ for all $i \in N$ and $\beta'_i \in B_i$.

Observe that due to Theorem 61 behavioral NE coincide with mixed NE.

Definition 63
A subgame perfect equilibrium (SPE) in behavioral strategies is a behavioral strategy profile $\beta \in B$ such that for any subgame $G^h$ of $G$, the restriction of $\beta$ to $H^h$ is a behavioral Nash equilibrium in $G^h$.

Here $\beta = (\beta_1, \ldots, \beta_n)$ and the restriction of $\beta$ to $H^h$ is the behavioral strategy profile $\beta^h = (\beta^h_1, \ldots, \beta^h_n)$ where each $\beta^h_i$ is the restriction of $\beta_i$ to $H^h \cap H_i$.

Theorem 64
There exists a pure strategy profile which is a SPE in behavioral strategies.

The proof is similar to the proof of Theorem 57.

Comments on Algorithms

Note that some SPE in behavioral strategies can be computed using backward induction. Indeed, the algorithm computes a pure strategy profile where each player always maximizes his value; such a pure strategy profile is a SPE in both pure and behavioral strategies.

Even though there always exists a pure SPE, there may exist (a continuum of) SPE composed of "non-pure" behavioral strategies.
However, the necessary and sufficient condition for the existence of such SPE is that at some point of the backward induction one of the players (say $i$) has two or more alternatives with the same equilibrium payoff. The payoff needs to be the same only for player $i$; the other players may have different payoffs depending on the choice of player $i$. Player $i$ can then play any convex combination of such alternatives, still leading to a SPE (of course, for each combination the resulting SPE may be different).

For two players, the backward induction can be extended to compute (a finite representation of) all SPE in behavioral strategies in polynomial time.

Dynamic Games of Complete Information
Extensive-Form Games
Imperfect-Information Games

Extensive-form of Matching Pennies

Is it possible to model Matching pennies using extensive-form games?

Strategic form (row player 1, column player 2):

        H        T
  H   1, −1   −1, 1
  T   −1, 1   1, −1

Perfect-information extensive form:

  h0 (player 1)
    H: h1 (player 2)
         H: (1, −1)
         T: (−1, 1)
    T: h2 (player 2)
         H: (−1, 1)
         T: (1, −1)

The problem is that player 2 is "perfectly" informed about the choice of player 1. In particular, there are pure Nash equilibria (H, TH) and (T, TH) in the extensive-form game (here TH denotes the strategy of player 2 that plays T in h1 and H in h2), whereas the strategic-form game has no pure Nash equilibrium. Reversing the order of players does not help.

We need to extend the formalism to be able to hide some information about previous moves.

Matching pennies can be modeled using an imperfect-information extensive-form game: take the same tree and let h1 and h2 belong to the same information set of player 2. As a result, player 2 is not able to distinguish between h1 and h2. So even though the players do not move simultaneously, the information player 2 has about the current situation is the same as in the simultaneous case.

Imperfect Information Games

An imperfect-information extensive-form game is a tuple $G_{\mathrm{imp}} = (G_{\mathrm{perf}}, I)$ where
• $G_{\mathrm{perf}} = (N, A, H, Z, \chi, \rho, \pi, h_0, u)$ is a perfect-information extensive-form game (called the underlying game),
• $I = (I_1, \ldots, I_n)$ where for each $i \in N = \{1, \ldots, n\}$, $I_i = \{I_{i,1}, \ldots, I_{i,k_i}\}$ is a collection of information sets for player $i$ that satisfies
  • $\bigcup_{j=1}^{k_i} I_{i,j} = H_i$ and $I_{i,j} \cap I_{i,k} = \emptyset$ for $j \neq k$ (i.e., $I_i$ is a partition of $H_i$),
  • for all $h, h' \in I_{i,j}$, we have $\rho(h) = \rho(h')$ and $\chi(h) = \chi(h')$ (i.e., nodes from the same information set are owned by the same player and have the same sets of enabled actions).

Given an information set $I_{i,j}$, we denote by $\chi(I_{i,j})$ the set of all actions enabled in some (and hence all) nodes of $I_{i,j}$.

Now we define the sets of pure, mixed, and behavioral strategies in $G_{\mathrm{imp}}$ as the subsets of pure, mixed, and behavioral strategies, resp., in $G_{\mathrm{perf}}$ that respect the information sets.

Imperfect Information Games: Strategies

Let $G_{\mathrm{imp}} = (G_{\mathrm{perf}}, I)$ be an imperfect-information extensive-form game where $G_{\mathrm{perf}} = (N, A, H, Z, \chi, \rho, \pi, h_0, u)$.

Definition 65
A pure strategy of player $i$ in $G_{\mathrm{imp}}$ is a pure strategy $s_i$ in $G_{\mathrm{perf}}$ such that for all $j = 1, \ldots, k_i$ and all $h, h' \in I_{i,j}$ we have $s_i(h) = s_i(h')$.

Note that each $s_i$ can also be seen as a function $s_i : I_i \to A$ such that for every $I_{i,j} \in I_i$ we have $s_i(I_{i,j}) \in \chi(I_{i,j})$.

As before, we denote by $S_i$ the set of all pure strategies of player $i$ in $G_{\mathrm{imp}}$, and by $S = S_1 \times \cdots \times S_n$ the set of all pure strategy profiles.

As in the perfect-information case we have a corresponding strategic-form game $\bar{G}_{\mathrm{imp}} = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$.

Matching Pennies

In the imperfect-information model of Matching pennies:
$I_1 = \{I_{1,1}\}$ where $I_{1,1} = \{h_0\}$
$I_2 = \{I_{2,1}\}$ where $I_{2,1} = \{h_1, h_2\}$

Example of pure strategies:
• $s_1(I_{1,1}) = H$, which describes the strategy $s_1(h_0) = H$
• $s_2(I_{2,1}) = T$, which describes the strategy $s_2(h_1) = s_2(h_2) = T$ (it is also sufficient to specify $s_2(h_1) = T$ since then $s_2(h_2) = T$)

So we really have strategies H, T for player 1 and H, T for player 2.
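The effect of the information sets on the set of pure strategies, and hence on the pure Nash equilibria, can be checked by brute force. The sketch below is only illustrative: the tuple encoding of information sets and the names `pure_strategies` and `pure_nash` are assumptions, not notation from the lecture. A pure strategy of player 2 is represented as a function from information sets to actions, as in the remark after Definition 65.

```python
from itertools import product

ACTIONS = ["H", "T"]
NODE_AFTER = {"H": "h1", "T": "h2"}  # node of player 2 reached after player 1's move

def payoff(a1, a2):
    """Matching pennies: player 1 wins when the two choices match."""
    return (1, -1) if a1 == a2 else (-1, 1)

def pure_strategies(info_sets):
    """All functions assigning one action to each information set."""
    return [dict(zip(info_sets, choice))
            for choice in product(ACTIONS, repeat=len(info_sets))]

def pure_nash(info_sets_2):
    """Pure Nash equilibria of the strategic form induced by player 2's information sets."""
    set_of = {h: I for I in info_sets_2 for h in I}
    S1, S2 = ACTIONS, pure_strategies(info_sets_2)

    def u(s1, s2):
        return payoff(s1, s2[set_of[NODE_AFTER[s1]]])

    equilibria = []
    for s1, s2 in product(S1, S2):
        best1 = all(u(d, s2)[0] <= u(s1, s2)[0] for d in S1)
        best2 = all(u(s1, d)[1] <= u(s1, s2)[1] for d in S2)
        if best1 and best2:
            equilibria.append((s1, "".join(s2[I] for I in info_sets_2)))
    return equilibria

perfect = [("h1",), ("h2",)]   # player 2 distinguishes h1 from h2
imperfect = [("h1", "h2")]     # one information set {h1, h2}

print(pure_nash(perfect))      # [('H', 'TH'), ('T', 'TH')]
print(pure_nash(imperfect))    # []
```

With singleton information sets player 2 has four pure strategies and the two equilibria (H, TH) and (T, TH) appear; with the single information set {h1, h2} only H and T remain, and no pure equilibrium exists, as in the strategic-form game.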
Weird Example

  h0 (player 1)
    A: h1 (player 2)
         K: (1, 2)
         L: (2, 1)
    B: h2 (player 2)
         K: (3, 5)
         L: (7, 1)
    C: h3 (player 1)
         A: (2, 5)
         B: (11, 0)
         C: (−4, 10)

Note that $I_1 = \{I_{1,1}\}$ where $I_{1,1} = \{h_0, h_3\}$ and that $I_2 = \{I_{2,1}\}$ where $I_{2,1} = \{h_1, h_2\}$.

What pure strategies are in this example?

SPE with Imperfect Information

  h0 (player 1)
    A:  h1 (player 2)
          B:  h3 (player 1)
                C:  z1 (4, 1)
                ¯C: z2 (1, 4)
          ¯B: h4 (player 1)
                C:  z3 (1, 4)
                ¯C: z4 (4, 1)
    ¯A: h2 (player 2)
          D:  z5 (1, 1)
          ¯D: z6 (4, 5)

Here h3 and h4 belong to the same information set of player 1.

What should we designate as subgames to allow backward induction? Only subtrees rooted in h1, h2, and h0 (together with all subtrees rooted in terminal nodes).

Note that subtrees rooted in h3 and h4 cannot be considered as "independent" subgames because their individual solutions cannot be combined into a single best response in the information set {h3, h4}.

SPE with Imperfect Information (Cont.)

Let $G_{\mathrm{imp}} = (G_{\mathrm{perf}}, I)$ be an imperfect-information extensive-form game where $G_{\mathrm{perf}} = (N, A, H, Z, \chi, \rho, \pi, h_0, u)$ is the underlying perfect-information extensive-form game.

Let us denote by $H_{\mathrm{single}}$ the set of all $h \in H$ such that the information set $I_{\rho(h),j}$ containing $h$ satisfies $I_{\rho(h),j} = \{h\}$. I.e., $h \in H_{\mathrm{single}}$ iff $h$ is a "perfect-information" node in which player $\rho(h)$ knows precisely that the play is in $h$.

Definition 66
For every $h \in H_{\mathrm{single}}$ we define a subgame $G^h_{\mathrm{imp}}$ to be the imperfect-information game $(G^h_{\mathrm{perf}}, I^h)$ where $I^h$ is the restriction of $I$ to $H^h$.

Note that as subgames we consider only subtrees rooted in "perfect-information" nodes, that is, nodes whose corresponding information set is a singleton.

Definition 67
A strategy profile $s \in S$ is a subgame perfect equilibrium (SPE) if $s^h$ is a Nash equilibrium in every subgame $G^h_{\mathrm{imp}}$ of $G_{\mathrm{imp}}$ (here $h \in H_{\mathrm{single}}$).

Backward Induction with Imperfect Info

The backward induction generalizes to imperfect-information extensive-form games along the following lines:
1. As in the perfect-information case, the goal is to label each node $h \in H_{\mathrm{single}} \cup Z$ with a SPE $s^h$ and a vector of payoffs $u(h) = (u_1(h), \ldots, u_n(h))$ for individual players according to $s^h$.
2. Starting with terminal nodes, the labeling proceeds bottom up. Terminal nodes are labeled as in the perfect-information case.
3. Consider $h \in H_{\mathrm{single}}$, and let $K$ be the set of all $h' \in (H_{\mathrm{single}} \cup Z) \setminus \{h\}$ that are $h$'s closest descendants out of $H_{\mathrm{single}} \cup Z$. I.e., $h' \in K$ iff $h' \neq h$ is reachable from $h$ and the unique path from $h$ to $h'$ visits only nodes of $H \setminus H_{\mathrm{single}}$ (except the first and the last node). For every $h' \in K$ we have already computed a SPE $s^{h'}$ in $G^{h'}_{\mathrm{imp}}$ and the vector of corresponding payoffs $u(h')$.
4. Now consider all nodes of $K$ as terminal nodes where each $h' \in K$ has payoffs $u(h')$. This gives a new game in which we compute an equilibrium $\bar{s}^h$ together with the vector $u(h)$. The equilibrium $s^h$ is then obtained by "concatenating" $\bar{s}^h$ with all $s^{h'}$, $h' \in K$, in the subgames $G^{h'}_{\mathrm{imp}}$ of $G^h_{\mathrm{imp}}$.

Mutually Assured Destruction

Analysis of the Cuban missile crisis of 1962 (as described in Games for Business and Economics by R. Gardner)
• The crisis started with the United States' discovery of Soviet nuclear missiles in Cuba.
• The USSR then backed down, agreeing to remove the missiles from Cuba, which suggests that the US had a credible threat "if you don't back off we both pay dearly".

Question: Could this indeed be a credible threat?

Mutually Assured Destruction (Cont.)

Model as an extensive-form game:
• First, player 1 (US) chooses to either ignore the incident (I), resulting in maintenance of the status quo (payoffs (0, 0)), or escalate the situation (E).
• Following escalation by player 1, player 2 can back down (B), causing it to lose face (payoffs (10, −10)), or it can choose to proceed to a nuclear confrontation (N).
• Upon this choice, the players play a simultaneous-move game in which they can either retreat (R), or choose doomsday (D).
  • If both retreat, the payoffs are (−5, −5), a small loss due to the mobilization process.
  • If either of them chooses doomsday, then the world is destroyed and the payoffs are (−100, −100).

Find SPE in pure strategies.
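Mutually Assured Destruction (Cont.)

  h0 (player 1)
    I: z6 (0, 0)
    E: h1 (player 2)
         B: z5 (10, −10)
         N: h2 (player 1)
              R: h3 (player 2)
                   R: z1 (−5, −5)
                   D: z2 (−100, −100)
              D: h4 (player 2)
                   R: z3 (−100, −100)
                   D: z4 (−100, −100)

Here h3 and h4 belong to the same information set of player 2, which models the simultaneous move.

Solve $G^{h_2}_{\mathrm{imp}}$ (a strategic-form game). Then solve $G^{h_1}_{\mathrm{imp}}$ by solving a game rooted in $h_1$ with terminal nodes $h_2$, $z_5$ (the payoffs in $h_2$ correspond to an equilibrium in $G^{h_2}_{\mathrm{imp}}$). Finally, solve $G_{\mathrm{imp}}$ by solving a game rooted in $h_0$ with terminal nodes $h_1$, $z_6$ (the payoffs in $h_1$ have been computed in the previous step).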
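The three steps can be carried out mechanically. The following is a minimal sketch under assumed names (`final_payoff`, `pure_nash_h2`, and `solve` are hypothetical helpers, not part of the lecture): it enumerates the pure Nash equilibria of the simultaneous-move subgame at h2 and, for each of them, folds the resulting payoffs back through h1 and h0.

```python
from itertools import product

FINAL_ACTIONS = ["R", "D"]

def final_payoff(a1, a2):
    """Payoffs of the simultaneous-move game rooted in h2."""
    return (-5, -5) if (a1, a2) == ("R", "R") else (-100, -100)

def pure_nash_h2():
    """Pure Nash equilibria of the strategic-form game at h2."""
    equilibria = []
    for a1, a2 in product(FINAL_ACTIONS, FINAL_ACTIONS):
        u = final_payoff(a1, a2)
        best1 = all(final_payoff(d, a2)[0] <= u[0] for d in FINAL_ACTIONS)
        best2 = all(final_payoff(a1, d)[1] <= u[1] for d in FINAL_ACTIONS)
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria

def solve():
    """Pure SPE as (player 1's moves at h0 and h2, player 2's moves at h1 and {h3, h4}, payoffs)."""
    spe = []
    for a1, a2 in pure_nash_h2():
        u_h2 = final_payoff(a1, a2)
        # Game rooted in h1: player 2 compares N (leading to h2) with B.
        options_h1 = {"N": u_h2, "B": (10, -10)}
        best2 = max(p[1] for p in options_h1.values())
        for m2, u_h1 in options_h1.items():
            if u_h1[1] != best2:
                continue
            # Game rooted in h0: player 1 compares E (leading to h1) with I.
            options_h0 = {"E": u_h1, "I": (0, 0)}
            best1 = max(p[0] for p in options_h0.values())
            for m1, u_h0 in options_h0.items():
                if u_h0[0] == best1:
                    spe.append(((m1, a1), (m2, a2), u_h0))
    return spe

for profile in solve():
    print(profile)
```

Under this encoding the subgame at h2 has two pure equilibria, (R, R) and (D, D), and each extends to one pure SPE: player 1 ignores when both would retreat, whereas with (D, D) in the final stage player 2 backs down after escalation and player 1 escalates, with payoffs (10, −10).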