8 SPACE COMPLEXITY

In this chapter we consider the complexity of computational problems in terms of the amount of space, or memory, that they require. Time and space are two of the most important considerations when we seek practical solutions to many computational problems. Space complexity shares many of the features of time complexity and serves as a further way of classifying problems according to their computational difficulty.

As we did with time complexity, we need to select a model for measuring the space used by an algorithm. We continue with the Turing machine model for the same reason that we used it to measure time. Turing machines are mathematically simple and close enough to real computers to give meaningful results.

DEFINITION 8.1
Let M be a deterministic Turing machine that halts on all inputs. The space complexity of M is the function f: N → N, where f(n) is the maximum number of tape cells that M scans on any input of length n. If the space complexity of M is f(n), we also say that M runs in space f(n). If M is a nondeterministic Turing machine wherein all branches halt on all inputs, we define its space complexity f(n) to be the maximum number of tape cells that M scans on any branch of its computation for any input of length n.

We typically estimate the space complexity of Turing machines by using asymptotic notation.

DEFINITION 8.2
Let f: N → R+ be a function. The space complexity classes, SPACE(f(n)) and NSPACE(f(n)), are defined as follows.

SPACE(f(n)) = {L | L is a language decided by an O(f(n)) space deterministic Turing machine}.
NSPACE(f(n)) = {L | L is a language decided by an O(f(n)) space nondeterministic Turing machine}.

EXAMPLE 8.3
In Chapter 7 we introduced the NP-complete problem SAT. Here, we show that SAT can be solved with a linear space algorithm. We believe that SAT cannot be solved with a polynomial time algorithm, much less with a linear time algorithm, because SAT is NP-complete.
Space appears to be more powerful than time because space can be reused, whereas time cannot.

M1 = "On input ⟨φ⟩, where φ is a Boolean formula:
1. For each truth assignment to the variables x1, ..., xm of φ:
2.   Evaluate φ on that truth assignment.
3. If φ ever evaluated to 1, accept; if not, reject."

Machine M1 clearly runs in linear space because each iteration of the loop can reuse the same portion of the tape. The machine needs to store only the current truth assignment, and that can be done with O(m) space. The number of variables m is at most n, the length of the input, so this machine runs in space O(n).

EXAMPLE 8.4
Here, we illustrate the nondeterministic space complexity of a language. In the next section we show how determining the nondeterministic space complexity can be useful in determining its deterministic space complexity. Consider the problem of testing whether a nondeterministic finite automaton accepts all strings. Let

ALL_NFA = {⟨A⟩ | A is an NFA and L(A) = Σ*}.

We give a nondeterministic linear space algorithm that decides the complement of this language. The idea behind this algorithm is to use nondeterminism to guess a string that is rejected by the NFA and to use linear space to keep track of which states the NFA could be in at a particular time. Note that this language is not known to be in NP or in coNP.

N = "On input ⟨M⟩, where M is an NFA:
1. Place a marker on the start state of the NFA.
2. Repeat 2^q times, where q is the number of states of M:
3.   Nondeterministically select an input symbol and change the positions of the markers on M's states to simulate reading that symbol.
4. Accept if stages 2 and 3 reveal some string that M rejects; that is, if at some point none of the markers lie on accept states of M. Otherwise, reject."
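The marker idea in N can be mirrored deterministically by exploring every reachable set of markers instead of guessing symbols. The following is a minimal sketch, assuming the NFA is given as a transition dictionary (this encoding, and the omission of ε-moves, are our own simplifications, not part of the text):

```python
from collections import deque

def exists_rejected_string(alphabet, delta, start, accepting):
    """Deterministic stand-in for machine N: a marker set is the set of
    states the NFA could occupy after reading some string. If a reachable
    marker set contains no accept state, that string is rejected by the NFA.
    `delta` maps (state, symbol) to a set of successor states."""
    start_set = frozenset([start])
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        markers = queue.popleft()
        if not (markers & accepting):
            return True        # N would accept: some string is rejected
        for sym in alphabet:   # N guesses one symbol; we try them all
            nxt = frozenset(s for m in markers
                            for s in delta.get((m, sym), ()))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False               # every reachable marker set hits an accept state
```

Unlike N, this brute-force search may visit exponentially many marker sets; it trades N's linear nondeterministic space for deterministic search.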
If M rejects any strings, it must reject one of length at most 2^q because in any longer rejected string the locations of the markers described in the preceding algorithm would repeat. The section of the string between the repetitions can be removed to obtain a shorter rejected string. Hence N decides the complement of ALL_NFA. (Note that N accepts improperly formed inputs, too.)

The only space needed by this algorithm is for storing the location of the markers and the repeat loop counter, and that can be done with linear space. Hence the algorithm runs in nondeterministic space O(n). Next, we prove a theorem that provides information about the deterministic space complexity of ALL_NFA.

8.1 SAVITCH'S THEOREM

Savitch's theorem is one of the earliest results concerning space complexity. It shows that deterministic machines can simulate nondeterministic machines by using a surprisingly small amount of space. For time complexity, such a simulation seems to require an exponential increase in time. For space complexity, Savitch's theorem shows that any nondeterministic TM that uses f(n) space can be converted to a deterministic TM that uses only f²(n) space.

THEOREM 8.5
Savitch's theorem: For any function f: N → R+, where f(n) ≥ n,
NSPACE(f(n)) ⊆ SPACE(f²(n)).

PROOF IDEA We need to simulate an f(n) space NTM deterministically. A naive approach is to proceed by trying all the branches of the NTM's computation, one by one. The simulation needs to keep track of which branch it is currently trying so that it is able to go on to the next one. But a branch that uses f(n) space may run for 2^O(f(n)) steps, and each step may be a nondeterministic choice. Exploring the branches sequentially would require recording all the choices used on a particular branch in order to be able to find the next branch. Therefore, this approach may use 2^O(f(n)) space, exceeding our goal of O(f²(n)) space.
Instead, we take a different approach by considering the following more general problem. We are given two configurations of the NTM, c1 and c2, together with a number t, and we must test whether the NTM can get from c1 to c2 within t steps. We call this problem the yieldability problem. By solving the yieldability problem, where c1 is the start configuration, c2 is the accept configuration, and t is the maximum number of steps that the nondeterministic machine can use, we can determine whether the machine accepts its input.

We give a deterministic, recursive algorithm that solves the yieldability problem. It operates by searching for an intermediate configuration cm, and recursively testing whether (1) c1 can get to cm within t/2 steps, and (2) cm can get to c2 within t/2 steps. Reusing the space for each of the two recursive tests allows a significant saving of space.

This algorithm needs space for storing the recursion stack. Each level of the recursion uses O(f(n)) space to store a configuration. The depth of the recursion is log t, where t is the maximum time that the nondeterministic machine may use on any branch. We have t = 2^O(f(n)), so log t = O(f(n)). Hence the deterministic simulation uses O(f²(n)) space.

PROOF Let N be an NTM deciding a language A in space f(n). We construct a deterministic TM M deciding A. Machine M uses the procedure CANYIELD, which tests whether one of N's configurations can yield another within a specified number of steps. This procedure solves the yieldability problem described in the proof idea.

Let w be a string considered as input to N. For configurations c1 and c2 of N on w, and integer t, CANYIELD(c1, c2, t) outputs accept if N can go from configuration c1 to configuration c2 in t or fewer steps along some nondeterministic path. If not, CANYIELD outputs reject. For convenience, we assume that t is a power of 2.

(Footnote: On page 323, we show that Savitch's theorem also holds whenever f(n) ≥ log n.)
CANYIELD = "On input c1, c2, and t:
1. If t = 1, then test directly whether c1 = c2 or whether c1 yields c2 in one step according to the rules of N. Accept if either test succeeds; reject if both fail.
2. If t > 1, then for each configuration cm of N on w using space f(n):
3.   Run CANYIELD(c1, cm, t/2).
4.   Run CANYIELD(cm, c2, t/2).
5.   If steps 3 and 4 both accept, then accept.
6. If haven't yet accepted, reject."

Now we define M to simulate N as follows. We first modify N so that when it accepts, it clears its tape and moves the head to the leftmost cell, thereby entering a configuration called c_accept. We let c_start be the start configuration of N on w. We select a constant d so that N has no more than 2^{df(n)} configurations using f(n) tape, where n is the length of w. Then we know that 2^{df(n)} provides an upper bound on the running time of any branch of N on w.

M = "On input w:
1. Output the result of CANYIELD(c_start, c_accept, 2^{df(n)})."

Algorithm CANYIELD obviously solves the yieldability problem, and hence M correctly simulates N. We need to analyze it to verify that M works within O(f²(n)) space. Whenever CANYIELD invokes itself recursively, it stores the current stage number and the values of c1, c2, and t on a stack so that these values may be restored upon return from the recursive invocation. Each level of the recursion thus uses O(f(n)) additional space. Furthermore, each level of the recursion divides the size of t in half. Initially t starts out equal to 2^{df(n)}, so the depth of the recursion is O(log 2^{df(n)}), or O(f(n)). Therefore, the total space used is O(f²(n)), as claimed.

One technical difficulty arises in this argument because algorithm M needs to know the value of f(n) when it calls CANYIELD. We can handle this difficulty by modifying M so that it tries f(n) = 1, 2, 3, ... For each value f(n) = i, the modified algorithm uses CANYIELD to determine whether the accept configuration is reachable.
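The divide-and-conquer recursion of CANYIELD can be sketched as follows. This is a schematic version only: the configuration encoding is abstracted away, and `configs` and `step` are our own stand-ins for "all configurations using space f(n)" and "yields in one move of N":

```python
def can_yield(c1, c2, t, configs, step):
    """Sketch of CANYIELD: can configuration c1 reach c2 within t steps?
    `configs` enumerates the candidate intermediate configurations and
    `step(a, b)` tests whether a yields b in one move. t is assumed to be
    a power of 2. As in the proof, each recursion level stores only one
    intermediate configuration, and the depth is log t."""
    if t == 1:
        return c1 == c2 or step(c1, c2)
    for cm in configs:
        # the two half-searches run one after the other, reusing space
        if can_yield(c1, cm, t // 2, configs, step) and \
           can_yield(cm, c2, t // 2, configs, step):
            return True
    return False
```

Running it on a toy "configuration graph" (states 0..3, each yielding its successor) shows that 0 reaches 3 within 4 steps but not within 2.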
In addition, it uses CANYIELD to determine whether N uses at least space i + 1 by testing whether N can reach any of the configurations of length i + 1 from the start configuration. If the accept configuration is reachable, M accepts; if no configuration of length i + 1 is reachable, M rejects; and otherwise, M continues with f(n) = i + 1. (We could have handled this difficulty in another way by assuming that M can compute f(n) within O(f(n)) space, but then we would need to add that assumption to the statement of the theorem.)

8.2 THE CLASS PSPACE

By analogy with the class P, we define the class PSPACE for space complexity.

DEFINITION 8.6
PSPACE is the class of languages that are decidable in polynomial space on a deterministic Turing machine. In other words,

PSPACE = ⋃_k SPACE(n^k).

We define NPSPACE, the nondeterministic counterpart to PSPACE, in terms of the NSPACE classes. However, PSPACE = NPSPACE by virtue of Savitch's theorem, because the square of any polynomial is still a polynomial.

In Examples 8.3 and 8.4, we showed that SAT is in SPACE(n) and that ALL_NFA is in coNSPACE(n) and hence, by Savitch's theorem, in SPACE(n²), because the deterministic space complexity classes are closed under complement. Therefore, both languages are in PSPACE.

Let's examine the relationship of PSPACE with P and NP. We observe that P ⊆ PSPACE because a machine that runs quickly cannot use a great deal of space. More precisely, for t(n) ≥ n, any machine that operates in time t(n) can use at most t(n) space because a machine can explore at most one new cell at each step of its computation. Similarly, NP ⊆ NPSPACE, and so NP ⊆ PSPACE.

Conversely, we can bound the time complexity of a Turing machine in terms of its space complexity. For f(n) ≥ n, a TM that uses f(n) space can have at most f(n) 2^{O(f(n))} different configurations, by a simple generalization of the proof of Lemma 5.8 on page 194.
A TM computation that halts may not repeat a configuration. Therefore, a TM that uses space f(n) must run in time f(n) 2^{O(f(n))}, so PSPACE ⊆ EXPTIME = ⋃_k TIME(2^{n^k}).

(Footnote: The requirement here that f(n) ≥ n is generalized later to f(n) ≥ log n, when we introduce TMs that use sublinear space on page 322.)

We summarize our knowledge of the relationships among the complexity classes defined so far in the series of containments

P ⊆ NP ⊆ PSPACE = NPSPACE ⊆ EXPTIME.

We don't know whether any of these containments is actually an equality. Someone may yet discover a simulation like the one in Savitch's theorem that merges some of these classes into the same class. However, in Chapter 9 we prove that P ≠ EXPTIME. Therefore, at least one of the preceding containments is proper, but we are unable to say which! Indeed, most researchers believe that all the containments are proper. The following diagram depicts the relationships among these classes, assuming that all are different.

FIGURE 8.7
Conjectured relationships among P, NP, PSPACE, and EXPTIME

8.3 PSPACE-COMPLETENESS

In Section 7.4 we introduced the category of NP-complete languages as representing the most difficult languages in NP. Demonstrating that a language is NP-complete provides strong evidence that the language is not in P. If it were, P and NP would be equal. In this section we introduce the analogous notion, PSPACE-completeness, for the class PSPACE.

DEFINITION 8.8
A language B is PSPACE-complete if it satisfies two conditions:
1. B is in PSPACE, and
2. every A in PSPACE is polynomial time reducible to B.
If B merely satisfies condition 2, we say that it is PSPACE-hard.

In defining PSPACE-completeness, we use polynomial time reducibility as given in Definition 7.29. Why don't we define a notion of polynomial space reducibility and use that instead of polynomial time reducibility?
To understand the answer to this important question, consider our motivation for defining complete problems in the first place.

Complete problems are important because they are examples of the most difficult problems in a complexity class. A complete problem is most difficult because any other problem in the class is easily reduced into it, so if we find an easy way to solve the complete problem, we can easily solve all other problems in the class. The reduction must be easy, relative to the complexity of typical problems in the class, for this reasoning to apply. If the reduction itself were difficult to compute, an easy solution to the complete problem wouldn't necessarily yield an easy solution to the problems reducing to it. Therefore, the rule is: Whenever we define complete problems for a complexity class, the reduction model must be more limited than the model used for defining the class itself.

THE TQBF PROBLEM

Our first example of a PSPACE-complete problem involves a generalization of the satisfiability problem. Recall that a Boolean formula is an expression that contains Boolean variables, the constants 0 and 1, and the Boolean operations ∧, ∨, and ¬. We now introduce a more general type of Boolean formula.

The quantifiers ∀ (for all) and ∃ (there exists) make frequent appearances in mathematical statements. Writing the statement ∀x φ means that, for every value for the variable x, the statement φ is true. Similarly, writing the statement ∃x φ means that, for some value of the variable x, the statement φ is true.

For example, considering the natural numbers, the statement ∀x [x + 1 > x] means that the successor x + 1 of every natural number x is greater than the number itself. Obviously, this statement is true. However, the statement ∃y [y + y = 3] obviously is false. When interpreting the meaning of statements involving quantifiers, we must consider the universe from which the values are drawn. In the preceding cases the universe comprised the natural numbers; but if we took the real numbers instead, the existentially quantified statement would become true.
Statements may contain several quantifiers, as in ∀x ∃y [y > x]. For the universe of the natural numbers, this statement says that every natural number has another natural number larger than it. The order of the quantifiers is important. Reversing the order, as in the statement ∃y ∀x [y > x], gives an entirely different meaning: namely, that some natural number is greater than all others. Obviously, the first statement is true and the second statement is false.

A quantifier may appear anywhere in a mathematical statement. It applies to the fragment of the statement appearing within the matched pair of parentheses or brackets following the quantified variable. This fragment is called the scope of the quantifier. Often, it is convenient to require that all quantifiers appear at the beginning of the statement and that each quantifier's scope is everything following it. Such statements are said to be in prenex normal form. Any statement may be put into prenex normal form easily. We consider statements in this form only, unless otherwise indicated.

Boolean formulas with quantifiers are called quantified Boolean formulas. For such formulas, the universe is {0, 1}. For example,

φ = ∀x ∃y [(x ∨ y) ∧ (¬x ∨ ¬y)]

is a quantified Boolean formula. Here, φ is true, but it would be false if the quantifiers ∀x and ∃y were reversed.

When each variable of a formula appears within the scope of some quantifier, the formula is said to be fully quantified. A fully quantified Boolean formula is sometimes called a sentence and is always either true or false. For example, the preceding formula φ is fully quantified. However, if the initial part, ∀x, of φ were removed, the formula would no longer be fully quantified and would be neither true nor false.

The TQBF problem is to determine whether a fully quantified Boolean formula is true or false. We define the language

TQBF = {⟨φ⟩ | φ is a true fully quantified Boolean formula}.
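Because the universe is {0, 1}, a fully quantified formula can be evaluated by simple recursion, trying both values for each variable in quantifier order. Here is a minimal sketch; the 'A'/'E' quantifier string and the matrix-as-function representation are our own encoding, not the book's:

```python
def eval_tqbf(quantifiers, matrix, values=()):
    """Evaluate a fully quantified Boolean formula over the universe {0, 1}.
    `quantifiers` is a string of 'A' (for all) and 'E' (there exists), one
    per variable in order; `matrix` maps a complete tuple of 0/1 values to
    a truth value. Only the partial assignment is stored at any moment, so
    the space used is linear in the number of variables."""
    if not quantifiers:
        return bool(matrix(values))
    q, rest = quantifiers[0], quantifiers[1:]
    branch = (eval_tqbf(rest, matrix, values + (v,)) for v in (0, 1))
    return all(branch) if q == 'A' else any(branch)

# the two-variable example: forall x exists y [(x or y) and (not x or not y)]
phi_matrix = lambda v: (v[0] or v[1]) and (not v[0] or not v[1])
```

Evaluating `eval_tqbf('AE', phi_matrix)` gives true, while the reversed quantifier order `'EA'` gives false, matching the discussion above.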
THEOREM 8.9
TQBF is PSPACE-complete.

PROOF IDEA To show that TQBF is in PSPACE, we give a straightforward algorithm that assigns values to the variables and recursively evaluates the truth of the formula for those values. From that information the algorithm can determine the truth of the original quantified formula.

To show that every language A in PSPACE reduces to TQBF in polynomial time, we begin with a polynomial space-bounded Turing machine for A. Then we give a polynomial time reduction that maps a string to a quantified Boolean formula φ that encodes a simulation of the machine on that input. The formula is true iff the machine accepts.

As a first attempt at this construction, let's try to imitate the proof of the Cook-Levin theorem, Theorem 7.37. We can construct a formula φ that simulates M on an input w by expressing the requirements for an accepting tableau. A tableau for M on w has width O(n^k), the space used by M, but its height is exponential in n^k because M can run for exponential time. Thus, if we were to represent the tableau with a formula directly, we would end up with a formula of exponential size. However, a polynomial time reduction cannot produce an exponential-size result, so this attempt fails to show that A ≤_P TQBF.

Using two collections of variables representing configurations c1 and c2, and a number t > 0, we construct a formula φ_{c1,c2,t}. If we assign c1 and c2 to actual configurations, the formula is true iff M can go from c1 to c2 in at most t steps. Then we can let φ be the formula φ_{c_start,c_accept,h}, where h = 2^{df(n)} for a constant d, chosen so that M has no more than 2^{df(n)} possible configurations on an input of length n. Here, let f(n) = n^k. For convenience, we assume that t is a power of 2.

The formula encodes the contents of tape cells as in the proof of the Cook-Levin theorem. Each cell has several variables associated with it, one for each tape symbol and state, corresponding to the possible settings of that cell. Each configuration has n^k cells and so is encoded by O(n^k) variables.

If t = 1, we can easily construct φ_{c1,c2,t}. We design the formula to say that either c1 equals c2, or c2 follows from c1 in a single step of M. We express the equality by writing a Boolean expression saying that each of the variables representing c1 contains the same Boolean value as the corresponding variable representing c2. We express the second possibility by using the technique presented in the proof of the Cook-Levin theorem. That is, we can express that c1 yields c2 in a single step of M by writing Boolean expressions stating that the contents of each triple of c1's cells correctly yields the contents of the corresponding triple of c2's cells.

If t > 1, we construct φ_{c1,c2,t} recursively. As a warmup, let's try one idea that doesn't quite work and then fix it. Let

φ_{c1,c2,t} = ∃m1 [φ_{c1,m1,t/2} ∧ φ_{m1,c2,t/2}].

The symbol m1 represents a configuration of M. Writing ∃m1 is shorthand for ∃x1, ..., xl, where l = O(n^k) and x1, ..., xl are the variables that encode m1. So this construction of φ_{c1,c2,t} says that M can go from c1 to c2 in at most t steps if some intermediate configuration m1 exists, whereby M can go from c1 to m1 in at most t/2 steps and then from m1 to c2 in at most t/2 steps.
Then we construct the two formulas φ_{c1,m1,t/2} and φ_{m1,c2,t/2} recursively. The formula φ_{c1,c2,t} has the correct value; that is, it is TRUE whenever M can go from c1 to c2 within t steps. However, it is too big. Every level of the recursion involved in the construction cuts t in half but roughly doubles the size of the formula. Hence we end up with a formula of size roughly t. Initially t = 2^{df(n)}, so this method gives an exponentially large formula.

To reduce the size of the formula, we use the ∀ quantifier in addition to the ∃ quantifier. Let

φ_{c1,c2,t} = ∃m1 ∀(c3,c4) ∈ {(c1,m1), (m1,c2)} [φ_{c3,c4,t/2}].

The introduction of the new variables representing the configurations c3 and c4 allows us to "fold" the two recursive subformulas into a single subformula, while preserving the original meaning. By writing ∀(c3,c4) ∈ {(c1,m1), (m1,c2)}, we indicate that the variables representing the configurations c3 and c4 may take the values of the variables of c1 and m1 or of m1 and c2, respectively, and that the resulting formula φ_{c3,c4,t/2} is true in either case. We may replace the construct ∀x ∈ {y,z} [...] by the equivalent construct ∀x [(x = y ∨ x = z) → ...] to obtain a syntactically correct quantified Boolean formula. Recall that in Section 0.2 we showed that Boolean implication (→) and Boolean equality (=) can be expressed in terms of AND and NOT. Here, for clarity, we use the symbol = for Boolean equality instead of the equivalent symbol ↔ used in Section 0.2.

To calculate the size of the formula φ_{c_start,c_accept,h}, where h = 2^{df(n)}, we note that each level of the recursion adds a portion of the formula that is linear in the size of the configurations and is thus of size O(f(n)). The number of levels of the recursion is log(2^{df(n)}), or O(f(n)). Hence the size of the resulting formula is O(f²(n)).
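The effect of the folding trick on formula size can be checked with a small recurrence calculation. Here c stands in for the O(f(n)) cost added per level; the sizes are schematic counts, not exact symbol counts:

```python
def naive_size(t, c):
    """Size under the broken construction: two recursive subformulas per
    level, so the size roughly doubles as t halves, ending up about c * t."""
    if t == 1:
        return c          # base case: comparing/stepping two configurations
    return c + 2 * naive_size(t // 2, c)

def folded_size(t, c):
    """Size with the forall-fold: one recursive subformula per level, so
    each of the log t levels adds only c, for about c * log t in total."""
    if t == 1:
        return c
    return c + folded_size(t // 2, c)
```

For t = 2^10 and c = 1, the naive recurrence gives 2047 (exponential in log t) while the folded one gives 11 (linear in log t), which is the gap the quantifier exploits.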
WINNING STRATEGIES FOR GAMES

For the purposes of this section, a game is loosely defined to be a competition in which opposing parties attempt to achieve some goal according to prespecified rules. Games appear in many forms, from board games such as chess to economic and war games that model corporate or societal conflict.

Games are closely related to quantifiers. A quantified statement has a corresponding game; conversely, a game often has a corresponding quantified statement. These correspondences are helpful in several ways. For one, expressing a mathematical statement that uses many quantifiers in terms of the corresponding game may give insight into the statement's meaning. For another, expressing a game in terms of a quantified statement aids in understanding the complexity of the game. To illustrate the correspondence between games and quantifiers, we turn to an artificial game called the formula game.

Let φ = ∃x1 ∀x2 ∃x3 ... Qxk [ψ] be a quantified Boolean formula in prenex normal form. Here Q represents either a ∀ or an ∃ quantifier. We associate a game with φ as follows. Two players, called Player A and Player E, take turns selecting the values of the variables x1, ..., xk. Player A selects values for the variables that are bound to ∀ quantifiers, and Player E selects values for the variables that are bound to ∃ quantifiers. The order of play is the same as that of the quantifiers at the beginning of the formula. At the end of play, we use the values that the players have selected for the variables and declare that Player E has won the game if ψ, the part of the formula with the quantifiers stripped off, is now TRUE. Player A has won if ψ is now FALSE.

EXAMPLE 8.10
Say that φ1 is the formula

∃x1 ∀x2 ∃x3 [(x1 ∨ x2) ∧ (x2 ∨ x3) ∧ (¬x2 ∨ ¬x3)].
In the formula game for φ1, Player E picks the value of x1, then Player A picks the value of x2, and finally Player E picks the value of x3.

To illustrate a sample play of this game, we begin by representing the Boolean value TRUE with 1 and FALSE with 0, as usual. Let's say that Player E picks x1 = 1, then Player A picks x2 = 0, and finally Player E picks x3 = 1. With these values for x1, x2, and x3, the subformula

(x1 ∨ x2) ∧ (x2 ∨ x3) ∧ (¬x2 ∨ ¬x3)

is 1, so Player E has won the game. In fact, Player E may always win this game by selecting x1 = 1 and then selecting x3 to be the negation of whatever Player A selects for x2. We say that Player E has a winning strategy for this game. A player has a winning strategy for a game if that player wins when both sides play optimally.

Now let's change the formula slightly to get a game in which Player A has a winning strategy. Let φ2 be the formula

∃x1 ∀x2 ∃x3 [(x1 ∨ x2) ∧ (x2 ∨ x3) ∧ (x2 ∨ ¬x3)].

Player A now has a winning strategy because, no matter what Player E selects for x1, Player A may select x2 = 0, thereby falsifying the part of the formula appearing after the quantifiers, whatever Player E's last move may be.

We next consider the problem of determining which player has a winning strategy in the formula game associated with a particular formula. Let

FORMULA-GAME = {⟨φ⟩ | Player E has a winning strategy in the formula game associated with φ}.

THEOREM 8.11
FORMULA-GAME is PSPACE-complete.

PROOF IDEA FORMULA-GAME is PSPACE-complete for a simple reason: it is the same as TQBF. To see that FORMULA-GAME = TQBF, observe that a formula is TRUE exactly when Player E has a winning strategy in the associated formula game. The two statements are different ways of saying the same thing.
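The claims about φ1 and φ2 can be checked by direct game-tree search, which, as just observed, is nothing other than TQBF evaluation. The quantifier-string and matrix encodings below are our own:

```python
def e_has_winning_strategy(quantifiers, matrix, values=()):
    """Game-tree search for the formula game. At an 'E' variable Player E
    needs SOME value that works; at an 'A' variable Player A gets to pick,
    so EVERY value must work for E. E wins a finished play iff the matrix
    (the unquantified part) is TRUE under the chosen values."""
    if not quantifiers:
        return bool(matrix(values))
    branch = (e_has_winning_strategy(quantifiers[1:], matrix, values + (v,))
              for v in (0, 1))
    return any(branch) if quantifiers[0] == 'E' else all(branch)

# phi1 and phi2 from Example 8.10, encoded by hand as functions of (x1, x2, x3)
phi1 = lambda v: (v[0] or v[1]) and (v[1] or v[2]) and (not v[1] or not v[2])
phi2 = lambda v: (v[0] or v[1]) and (v[1] or v[2]) and (v[1] or not v[2])
```

The search confirms the text: Player E has a winning strategy for φ1 but not for φ2, where Player A can always win by setting x2 = 0.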
PROOF The formula φ = ∃x1 ∀x2 ∃x3 ... [ψ] is TRUE when some setting for x1 exists such that, for any setting of x2, a setting of x3 exists such that, and so on ..., where ψ is TRUE under the settings of the variables. Similarly, Player E has a winning strategy in the game associated with φ when Player E can make some assignment to x1 such that, for any setting of x2, Player E can make an assignment to x3 such that, and so on ..., ψ is TRUE under these settings of the variables.

The same reasoning applies when the formula doesn't alternate between existential and universal quantifiers. If φ has the form ∀x1, x2, x3 ∃x4, x5 ∀x6 [ψ], Player A would make the first three moves in the formula game to assign values to x1, x2, and x3; then Player E would make two moves to assign x4 and x5; and finally Player A would assign a value to x6. Hence φ ∈ TQBF exactly when φ ∈ FORMULA-GAME, and the theorem follows from Theorem 8.9.

GENERALIZED GEOGRAPHY

Now that we know that the formula game is PSPACE-complete, we can establish the PSPACE-completeness or PSPACE-hardness of some other games more easily. We'll begin with a generalization of the game geography and later discuss games such as chess, checkers, and GO.

Geography is a child's game in which players take turns naming cities from anywhere in the world. Each city chosen must begin with the same letter that ended the previous city's name. Repetition isn't permitted. The game starts with some designated starting city and ends when some player loses because he or she is unable to continue. For example, if the game starts with Peoria, then Amherst might legally follow (because Peoria ends with the letter a, and Amherst begins with the letter a), then Tucson, then Nashua, and so on until one player gets stuck and thereby loses.

We can model this game with a directed graph whose nodes are the cities of the world. We draw an arrow from one city to another if the first can lead to the second according to the game rules.
In other words, the graph contains an edge from a city X to a city Y if city X ends with the same letter that begins city Y. We illustrate a portion of the geography graph in Figure 8.12.

FIGURE 8.12
Portion of the graph representing the geography game

When the rules of geography are interpreted for this graphic representation, one player starts by selecting the designated start node, and then the players take turns alternately by picking nodes that form a simple path in the graph. The requirement that the path be simple (i.e., doesn't use any node more than once) corresponds to the requirement that a city may not be repeated. The first player unable to extend the path loses the game.

In generalized geography, we take an arbitrary directed graph with a designated start node instead of the graph associated with the actual cities. For example, the following graph is an example of a generalized geography game.

FIGURE 8.13
A sample generalized geography game

Say that Player I is the one who moves first and Player II second. In this example, Player I has a winning strategy as follows. Player I starts at node 1, the designated start node. Node 1 points only at nodes 2 and 3, so Player I's first move must be one of these two choices. He chooses 3. Now Player II must move, but node 3 points only to node 5, so she is forced to select node 5. Then Player I selects 6, from choices 6, 7, and 8. Now Player II must play from node 6, but it points only to node 3, and 3 was previously played. Player II is stuck, and thus Player I wins.

If we change the example by reversing the direction of the edge between nodes 3 and 6, Player II has a winning strategy. Can you see it? If Player I starts out with node 3 as before, Player II responds with 6 and wins immediately, so Player I's only hope is to begin with 2. In that case, however, Player II responds with 4. If Player I now takes 5, Player II wins with 6.
If Player I takes 7, Player II wins with 9. No matter what Player I does, Player II can find a way to win, so Player II has a winning strategy.

The problem of determining which player has a winning strategy in a generalized geography game is PSPACE-complete. Let

GG = {⟨G, b⟩ | Player I has a winning strategy for the generalized geography game played on graph G starting at node b}.

THEOREM 8.14
GG is PSPACE-complete.

PROOF IDEA A recursive algorithm similar to the one used for TQBF in Theorem 8.9 determines which player has a winning strategy. This algorithm runs in polynomial space, and so GG ∈ PSPACE. To prove that GG is PSPACE-hard, we give a polynomial time reduction from FORMULA-GAME to GG. This reduction converts a formula game to a generalized geography graph so that play on the graph mimics play in the formula game. In effect, the players in the generalized geography game are really playing an encoded form of the formula game.

PROOF The following algorithm decides whether Player I has a winning strategy in instances of generalized geography; in other words, it decides GG. We show that it runs in polynomial space.

M = "On input ⟨G, b⟩, where G is a directed graph and b is a node of G:
1. If b has outdegree 0, reject, because Player I loses immediately.
2. Remove node b and all connected arrows to get a new graph G1.
3. For each of the nodes b1, b2, ..., bk that b originally pointed at, recursively call M on ⟨G1, bi⟩.
4. If all of these accept, Player II has a winning strategy in the original game, so reject. Otherwise, Player II doesn't have a winning strategy, so Player I must; therefore, accept."

The only space required by this algorithm is for storing the recursion stack. Each level of the recursion adds a single node to the stack, and at most m levels occur, where m is the number of nodes in G.
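Algorithm M's recursion can be sketched as follows. Instead of deleting nodes and arrows, this sketch tracks the simple path played so far, which is equivalent; the dictionary encoding of G is our own:

```python
def player_one_wins(graph, b, path=None):
    """Sketch of algorithm M for GG: the player to move at node b wins iff
    some move leads to a position from which the opponent (now the mover)
    loses. The recursion stack holds one node per level, and at most m
    levels occur, mirroring the space analysis in the proof.
    `graph` maps each node to the list of nodes it points at."""
    path = path or {b}
    moves = [n for n in graph.get(b, []) if n not in path]
    if not moves:
        return False        # the player to move cannot extend the path: loses
    # some move must leave the opponent in a losing position
    return any(not player_one_wins(graph, n, path | {n}) for n in moves)
```

On a toy two-node graph {1: [2], 2: []}, Player I moves to node 2 and Player II is stuck, so Player I wins; adding a third node to the chain hands the loss back to Player I.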
Hence the algorithm runs in linear space.

To establish the PSPACE-hardness of GG, we show that FORMULA-GAME is polynomial time reducible to GG. The reduction maps a formula

φ = ∃x1 ∀x2 ∃x3 ⋯ Qxk [ψ]

to an instance ⟨G, b⟩ of generalized geography. Here we assume for simplicity that φ's quantifiers begin and end with ∃ and that they strictly alternate between ∃ and ∀. A formula that doesn't conform to this assumption may be converted to a slightly larger one that does by adding extra quantifiers binding otherwise unused or "dummy" variables. We assume also that ψ is in conjunctive normal form (see Problem 8.12).

The reduction constructs a geography game on a graph G where optimal play mimics optimal play of the formula game on φ. Player I in the geography game takes the role of Player E in the formula game, and Player II takes the role of Player A.

The structure of graph G is partially shown in the following figure. Play starts at node b, which appears at the top left-hand side of G. Underneath b, a sequence of diamond structures appears, one for each of the variables of φ. Before getting to the right-hand side of G, let's see how play proceeds on the left-hand side.

figure 8.15 Partial structure of the geography game simulating the formula game

Play starts at b. Player I must select one of the two edges going from b. These edges correspond to Player E's possible choices at the beginning of the formula game. The left-hand choice for Player I corresponds to TRUE for Player E in the formula game and the right-hand choice to FALSE. After Player I has selected one of these edges—say, the left-hand one—Player II moves. Only one outgoing edge is present, so this move is forced. Similarly, Player I's next move is forced and play continues from the top of the second diamond. Now two edges again are present, but Player II gets the choice. This choice corresponds to Player A's first move in the formula game.
As play continues in this way, Players I and II choose a rightward or leftward path through each of the diamonds. After play passes through all the diamonds, the head of the path is at the bottom node in the last diamond, and it is Player I's turn because we assumed that the last quantifier is ∃. Player I's next move is forced. Then the play is at node c in Figure 8.15 and Player II makes the next move.

This point in the geography game corresponds to the end of play in the formula game. The chosen path through the diamonds corresponds to an assignment to φ's variables. Under that assignment, if ψ is TRUE, Player E wins the formula game, and if ψ is FALSE, Player A wins. The structure on the right-hand side of the following figure guarantees that Player I can win if Player E has won and that Player II can win if Player A has won.

figure 8.16 Full structure of the geography game simulating the formula game, where φ = ∃x1 ∀x2 ⋯ Qxk [(x1 ∨ x2 ∨ x3) ∧ (¬x2 ∨ ¬x3 ∨ ⋯) ∧ ⋯ ∧ ( )]

At node c, Player II may choose a node corresponding to one of ψ's clauses. Then Player I may choose a node corresponding to a literal in that clause. The nodes corresponding to unnegated literals are connected to the left-hand (TRUE) sides of the diamonds for the associated variables, and similarly for negated literals and right-hand (FALSE) sides, as shown in Figure 8.16.

If φ is FALSE, Player II may win by selecting the unsatisfied clause. Any literal that Player I may then pick is FALSE and is connected to the side of the diamond that hasn't yet been played. Thus Player II may play the node in the diamond, but then Player I is unable to move and loses. If
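The formula game that the geography graph encodes can itself be sketched in a few lines. This assumes, as the reduction does, strictly alternating quantifiers beginning with ∃ and a CNF ψ; the clause encoding (signed integers, +i for xi and -i for its negation) and the function names are illustrative choices, not from the text.

```python
# A sketch of the formula game on phi = ∃x1 ∀x2 ... Qxk [psi], psi in CNF.
# Clauses are lists of signed integers: +i stands for x_i, -i for its negation.

def eval_cnf(clauses, assignment):
    """True iff every clause contains a literal satisfied by `assignment` (i -> bool)."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def e_wins(clauses, k, assignment=None, i=1):
    """True iff Player E wins the game on the remaining quantifiers Qx_i ... Qx_k.
    Odd i: Player E (∃) chooses x_i; even i: Player A (∀) chooses x_i."""
    if assignment is None:
        assignment = {}
    if i > k:
        return eval_cnf(clauses, assignment)  # end of play: evaluate psi
    outcomes = [e_wins(clauses, k, {**assignment, i: v}, i + 1) for v in (True, False)]
    return any(outcomes) if i % 2 == 1 else all(outcomes)

# ∃x1 ∀x2 [(x1 ∨ x2) ∧ (x1 ∨ ¬x2)]: Player E wins by setting x1 = TRUE.
print(e_wins([[1, 2], [1, -2]], 2))    # True
# ∃x1 ∀x2 [(x1 ∨ x2) ∧ (¬x1 ∨ ¬x2)]: Player A can always falsify a clause.
print(e_wins([[1, 2], [-1, -2]], 2))   # False
```

The reduction's diamonds implement exactly the two-way choices made by this recursion, and the clause/literal nodes implement the final evaluation of ψ.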

0} is a member of L. In Section 7.1 on page 247 we described a Turing machine that decides A by zigzagging back and forth across the input, crossing off the 0s and 1s as they are matched. That algorithm uses linear space to record which positions have been crossed off, but it can be modified to use only log space.

The log space TM for A cannot cross off the 0s and 1s that have been matched on the input tape because that tape is read-only. Instead, the machine counts the number of 0s and, separately, the number of 1s in binary on the work tape. The only space required is that used to record the two counters. In binary, each counter uses only logarithmic space, and hence the algorithm runs in O(log n) space. Therefore A ∈ L.

example 8.19
Here, we illustrate the nondeterministic space complexity of a language. Recall the language PATH = {⟨G, s, t⟩ | G is a directed graph that has a directed path from s to t} defined in Section 7.2. Theorem 7.14 shows that PATH is in P, but that algorithm uses linear space. We don't know whether PATH can be solved in logarithmic space deterministically, but we do know a nondeterministic log space algorithm for PATH.

The nondeterministic log space Turing machine deciding PATH operates by starting at node s and nondeterministically guessing the nodes of a path from s to t. The machine records only the position of the current node at each step on the work tape, not the entire path (which would exceed the logarithmic space requirement). The machine nondeterministically selects the next node from among those pointed at by the current node. Then it repeats this action until it reaches node t and accepts, or until it has gone on for m steps and rejects, where m is the number of nodes in the graph. Thus PATH is in NL.

Our earlier claim that any f(n) space bounded Turing machine also runs in time 2^O(f(n)) is no longer true for very small space bounds.
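The two-counter idea for A can be sketched as an ordinary program; the counters are the only state kept between input symbols, which is what makes the space logarithmic. The function name `in_A` is illustrative.

```python
# A sketch of the log space idea for A = {0^k 1^k | k >= 0}: scan the read-only
# input once, keeping only two binary counters (O(log n) bits) as work-tape state.

def in_A(w):
    zeros = ones = 0      # each counter needs only O(log n) bits in binary
    seen_one = False
    for c in w:
        if c == '0':
            if seen_one:  # a 0 appearing after a 1: wrong shape, reject
                return False
            zeros += 1
        elif c == '1':
            seen_one = True
            ones += 1
        else:
            return False  # input not over the alphabet {0, 1}
    return zeros == ones

print(in_A('000111'))   # True
print(in_A('0011101'))  # False
```

Checking the shape (all 0s before all 1s) needs only one extra bit of state, so the counters dominate the space used.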
For example, a Turing machine that uses O(1) (i.e., constant) space may run for n steps. To obtain a bound on the running time that applies for every space bound f(n), we give the following definition.

definition 8.20
If M is a Turing machine that has a separate read-only input tape and w is an input, a configuration of M on w is a setting of the state, the work tape, and the positions of the two tape heads. The input w is not a part of the configuration of M on w.

If M runs in f(n) space and w is an input of length n, the number of configurations of M on w is n·2^O(f(n)). To explain this result, let's say that M has c states and g tape symbols. The number of strings that can appear on the work tape is g^f(n). The input head can be in one of n positions and the work tape head can be in one of f(n) positions. Therefore the total number of configurations of M on w, which is an upper bound on the running time of M on w, is c·n·f(n)·g^f(n), or n·2^O(f(n)).

We focus almost exclusively on space bounds f(n) that are at least log n. Our earlier claim that the time complexity of a machine is at most exponential in its space complexity remains true for such bounds because n·2^O(f(n)) is 2^O(f(n)) when f(n) ≥ log n.

Recall that Savitch's theorem shows that we can convert nondeterministic TMs to deterministic TMs and increase the space complexity f(n) by only a squaring, provided that f(n) ≥ n. We can extend Savitch's theorem to hold for sublinear space bounds down to f(n) ≥ log n. The proof is identical to the original one we gave on page 305, except that we use Turing machines with a read-only input tape, and instead of referring to configurations of N we refer to configurations of N on w. Storing a configuration of N on w uses log(n·2^O(f(n))) = log n + O(f(n)) space. If f(n) ≥ log n, the storage used is O(f(n)) and the remainder of the proof remains the same.

8.5 NL-COMPLETENESS

As we mentioned in Example 8.19, the PATH problem is known to be in NL but isn't known to be in L.
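The recursive strategy behind Savitch's theorem, which the proof on page 305 applies to configurations, can be sketched abstractly. Here configurations are just labels in a toy graph and `succ` stands in for the machine's one-step yield relation; the names are illustrative, not from the text.

```python
# A sketch of the CANYIELD recursion from Savitch's theorem: can configuration
# c1 reach c2 in at most 2^t steps? Each recursion level stores one midpoint
# configuration, so only t configurations are ever held at once.

def can_yield(succ, c1, c2, t, universe):
    """succ(c) gives the one-step successors of configuration c;
    universe is the set of all possible configurations."""
    if t == 0:
        return c1 == c2 or c2 in succ(c1)   # at most one step
    # Split 2^t steps into two halves of 2^(t-1) steps around a midpoint m.
    return any(
        can_yield(succ, c1, m, t - 1, universe) and
        can_yield(succ, m, c2, t - 1, universe)
        for m in universe
    )

# Toy configuration graph: 0 -> 1 -> 2 -> 3 -> 0.
edges = {0: [1], 1: [2], 2: [3], 3: [0]}
succ = lambda c: edges.get(c, [])
nodes = [0, 1, 2, 3]
print(can_yield(succ, 0, 3, 2, nodes))  # True: 0 reaches 3 within 2^2 = 4 steps
print(can_yield(succ, 0, 3, 0, nodes))  # False: not reachable in at most 1 step
```

Taking t to be the logarithm of the number of configurations, which is log n + O(f(n)) for a machine with a read-only input tape, gives the O(f(n))-per-level storage the extension to sublinear bounds relies on.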
We believe that PATH doesn't belong to L, but we don't know how to prove this conjecture. In fact, we don't know of any problem in NL that can be proven to be outside L. Analogous to the question of whether P = NP, we have the question of whether L = NL.

As a step toward resolving the L versus NL question, we can exhibit certain languages that are NL-complete. As with complete languages for other complexity classes, the NL-complete languages are examples of languages that are, in a certain sense, the most difficult languages in NL. If L and NL are different, all NL-complete languages don't belong to L.

As with our previous definitions of completeness, we define an NL-complete language to be one that is in NL and to which every other language in NL is reducible. However, we don't use polynomial time reducibility here because, as you will see, all problems in NL are solvable in polynomial time. Therefore, every two problems in NL except ∅ and Σ* are polynomial time reducible to one another (see the discussion of polynomial time reducibility in the definition of PSPACE-completeness on page 309). Hence polynomial time reducibility is too strong to differentiate problems in NL from one another. Instead we use a new type of reducibility called log space reducibility.

definition 8.21
A log space transducer is a Turing machine with a read-only input tape, a write-only output tape, and a read/write work tape. The work tape may contain O(log n) symbols. A log space transducer M computes a function f: Σ*→Σ*, where f(w) is the string remaining on the output tape after M halts when it is started with w on its input tape. We call f a log space computable function. Language A is log space reducible to language B, written A ≤L B, if A is mapping reducible to B by means of a log space computable function f.

EXERCISES

8.1 Show that for any function f: N→R+, where f(n) ≥ n, the space complexity class SPACE(f(n)) is the same whether you define the class by using the single-tape TM model or the two-tape read-only input TM model.
8.2 Consider the following position in the standard tic-tac-toe game.

[board figure: a partly filled board with two X's and two O's; the layout was lost in extraction]

Let's say that it is the X-player's turn to move next. Describe a winning strategy for this player. (Recall that a winning strategy isn't merely the best move to make in the current position. It also includes all the responses that this player must make in order to win, however the opponent moves.)

8.3 Consider the following generalized geography game wherein the start node is the one with the arrow pointing in from nowhere. Does Player I have a winning strategy? Does Player II? Give reasons for your answers.

8.4 Show that PSPACE is closed under the operations union, complementation, and star.

A8.5 Show that NL is closed under the operations union, intersection, and star.

8.6 Show that any PSPACE-hard language is also NP-hard.

A8.7 Show that A_DFA ∈ L.

PROBLEMS

8.8 Let EQ_REX = {⟨R, S⟩ | R and S are equivalent regular expressions}. Show that EQ_REX ∈ PSPACE.

8.9 A ladder is a sequence of strings s1, s2, ..., sk, wherein every string differs from the preceding one in exactly one character. For example, the following is a ladder of English words, starting with "head" and ending with "free": head, hear, near, fear, bear, beer, deer, deed, feed, feet, fret, free. Let LADDER_DFA = {⟨M, s, t⟩ | M is a DFA and L(M) contains a ladder of strings, starting with s and ending with t}. Show that LADDER_DFA is in PSPACE.

8.10 The Japanese game go-moku is played by two players, "X" and "O," on a 19 × 19 grid. Players take turns placing markers, and the first player to achieve 5 of his markers consecutively in a row, column, or diagonal is the winner. Consider this game generalized to an n × n board. Let GM = {⟨B⟩ | B is a position in generalized go-moku, where player "X" has a winning strategy}. By a position we mean a board with markers placed on it, such as may occur in the middle of a play of the game, together with an indication of which player moves next.
Show that GM ∈ PSPACE.

8.11 Show that, if every NP-hard language is also PSPACE-hard, then PSPACE = NP.

8.12 Show that TQBF restricted to formulas where the part following the quantifiers is in conjunctive normal form is still PSPACE-complete.

8.13 Define A_LBA = {⟨M, w⟩ | M is an LBA that accepts input w}. Show that A_LBA is PSPACE-complete.

8.14 Consider the following two-person version of the language PUZZLE that was described in Problem 7.26. Each player starts with an ordered stack of puzzle cards. The players take turns placing the cards in order in the box and may choose which side faces up. Player I wins if, in the final stack, all hole positions are blocked, and Player II wins if some hole position remains unblocked. Show that the problem of determining which player has a winning strategy for a given starting configuration of the cards is PSPACE-complete.

*8.15 The cat-and-mouse game is played by two players, "Cat" and "Mouse," on an arbitrary undirected graph. At a given point each player occupies a node of the graph. The players take turns moving to a node adjacent to the one that they currently occupy. A special node of the graph is called "Hole." Cat wins if the two players ever occupy the same node. Mouse wins if it reaches the Hole before the preceding happens. The game is a draw if a situation repeats (i.e., the two players simultaneously occupy positions that they simultaneously occupied previously and it is the same player's turn to move). Let HAPPY-CAT = {⟨G, c, m, h⟩ | G, c, m, h are respectively a graph and positions of the Cat, Mouse, and Hole, such that Cat has a winning strategy if Cat moves first}. Show that HAPPY-CAT is in P. (Hint: The solution is not complicated and doesn't depend on subtle details in the way the game is defined. Consider the entire game tree. It is exponentially big, but you can search it in polynomial time.)

8.16 Read the definition of MIN-FORMULA in Problem 7.44.
a. Show that MIN-FORMULA ∈ PSPACE.
b.
Explain why this argument fails to show that MIN-FORMULA ∈ coNP: If φ ∉ MIN-FORMULA, then φ has a smaller equivalent formula. An NTM can verify that φ ∈ MIN-FORMULA by guessing that formula.

8.17 Let A be the language of properly nested parentheses. For example, (()) and (()(()))() are in A, but )( is not. Show that A is in L.

*8.18 Let B be the language of properly nested parentheses and brackets. For example, ([()()]()[]) is in B but ([)] is not. Show that B is in L.

*8.19 The game of Nim is played with a collection of piles of sticks. In one move a player may remove any nonzero number of sticks from a single pile. The players alternately take turns making moves. The player who removes the very last stick loses. Say that we have a game position in Nim with k piles containing s1, ..., sk sticks. Call the position balanced if, when each of the numbers si is written in binary and the binary numbers are written as rows of a matrix aligned at the low order bits, each column of bits contains an even number of 1s. Prove the following two facts.
a. Starting in an unbalanced position, a single move exists that changes the position into a balanced one.
b. Starting in a balanced position, every single move changes the position into an unbalanced one.
Let NIM = {⟨s1, ..., sk⟩ | each si is a binary number and Player I has a winning strategy in the Nim game starting at this position}. Use the preceding facts about balanced positions to show that NIM ∈ L.

8.20 Let MULT = {a#b#c | a, b, c are binary natural numbers and a × b = c}. Show that MULT ∈ L.

8.21 For any positive integer x, let x^R be the integer whose binary representation is the reverse of the binary representation of x. (Assume no leading 0s in the binary representation of x.) Define the function R+: N→N, where R+(x) = x + x^R.
a. Let A2 = {⟨x, y⟩ | R+(x) = y}. Show that A2 ∈ L.
b. Let A3 = {⟨x, y⟩ | R+(R+(x)) = y}. Show that A3 ∈ L.

8.22 a. Let ADD = {⟨x, y, z⟩ | x, y, z > 0 are binary integers and x + y = z}.
Show that ADD ∈ L.
b. Let PAL-ADD = {⟨x, y⟩ | x, y > 0 are binary integers where x + y is an integer whose binary representation is a palindrome}. (Note that the binary representation of the sum is assumed not to have leading zeros. A palindrome is a string that equals its reverse.) Show that PAL-ADD ∈ L.

*8.23 Define UCYCLE = {⟨G⟩ | G is an undirected graph that contains a simple cycle}. Show that UCYCLE ∈ L. (Note: G may be a graph that is not connected.)

*8.24 For each n, exhibit two regular expressions, R and S, of length poly(n), where L(R) ≠ L(S), but where the first string on which they differ is exponentially long. In other words, L(R) and L(S) must be different, yet agree on all strings of length up to 2^(εn) for some constant ε > 0.

8.25 An undirected graph is bipartite if its nodes may be divided into two sets so that all edges go from a node in one set to a node in the other set. Show that a graph is bipartite if and only if it doesn't contain a cycle that has an odd number of nodes. Let BIPARTITE = {⟨G⟩ | G is a bipartite graph}. Show that BIPARTITE ∈ NL.

8.26 Define UPATH to be the counterpart of PATH for undirected graphs. Show that BIPARTITE