8 SPACE COMPLEXITY

In this chapter we consider the complexity of computational problems in terms of the amount of space, or memory, that they require. Time and space are two of the most important considerations when we seek practical solutions to many computational problems. Space complexity shares many of the features of time complexity and serves as a further way of classifying problems according to their computational difficulty.

As we did with time complexity, we need to select a model for measuring the space used by an algorithm. We continue with the Turing machine model for the same reason that we used it to measure time. Turing machines are mathematically simple and close enough to real computers to give meaningful results.

DEFINITION 8.1
Let M be a deterministic Turing machine that halts on all inputs. The space complexity of M is the function f: N → N, where f(n) is the maximum number of tape cells that M scans on any input of length n. If the space complexity of M is f(n), we also say that M runs in space f(n). If M is a nondeterministic Turing machine wherein all branches halt on all inputs, we define its space complexity f(n) to be the maximum number of tape cells that M scans on any branch of its computation for any input of length n.

We typically estimate the space complexity of Turing machines by using asymptotic notation.

DEFINITION 8.2
Let f: N → R+ be a function. The space complexity classes, SPACE(f(n)) and NSPACE(f(n)), are defined as follows.

SPACE(f(n)) = {L | L is a language decided by an O(f(n)) space deterministic Turing machine}.
NSPACE(f(n)) = {L | L is a language decided by an O(f(n)) space nondeterministic Turing machine}.

EXAMPLE 8.3
In Chapter 7 we introduced the NP-complete problem SAT. Here, we show that SAT can be solved with a linear space algorithm. We believe that SAT cannot be solved with a polynomial time algorithm, much less with a linear time algorithm, because SAT is NP-complete.
Space appears to be more powerful than time because space can be reused, whereas time cannot.

M1 = "On input ⟨φ⟩, where φ is a Boolean formula:
1. For each truth assignment to the variables x1, ..., xm of φ:
2.   Evaluate φ on that truth assignment.
3. If φ ever evaluated to 1, accept; if not, reject."

Machine M1 clearly runs in linear space because each iteration of the loop can reuse the same portion of the tape. The machine needs to store only the current truth assignment, and that can be done with O(m) space. The number of variables m is at most n, the length of the input, so this machine runs in space O(n).

EXAMPLE 8.4
Here, we illustrate the nondeterministic space complexity of a language. In the next section we show how determining the nondeterministic space complexity can be useful in determining its deterministic space complexity. Consider the problem of testing whether a nondeterministic finite automaton accepts all strings. Let

ALL_NFA = {⟨A⟩ | A is an NFA and L(A) = Σ*}.

We give a nondeterministic linear space algorithm that decides the complement of this language. The idea behind this algorithm is to use nondeterminism to guess a string that is rejected by the NFA and to use linear space to keep track of which states the NFA could be in at a particular time. Note that this language is not known to be in NP or in coNP.

N = "On input ⟨M⟩, where M is an NFA:
1. Place a marker on the start state of the NFA.
2. Repeat 2^q times, where q is the number of states of M:
3.   Nondeterministically select an input symbol and change the positions of the markers on M's states to simulate reading that symbol.
4. Accept if stages 2 and 3 reveal some string that M rejects; that is, if at some point none of the markers lie on accept states of M. Otherwise, reject."
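The marker idea in N can be mirrored deterministically by exploring every reachable set of markers instead of guessing symbols. The following is a minimal sketch, assuming the NFA is given as a transition dictionary (this encoding, and the omission of ε-moves, are our own simplifications, not part of the text):

```python
from collections import deque

def exists_rejected_string(alphabet, delta, start, accepting):
    """Deterministic stand-in for machine N: a marker set is the set of
    states the NFA could occupy after reading some string. If a reachable
    marker set contains no accept state, that string is rejected by the NFA.
    `delta` maps (state, symbol) to a set of successor states."""
    start_set = frozenset([start])
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        markers = queue.popleft()
        if not (markers & accepting):
            return True        # N would accept: some string is rejected
        for sym in alphabet:   # N guesses one symbol; we try them all
            nxt = frozenset(s for m in markers
                            for s in delta.get((m, sym), ()))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False               # every reachable marker set hits an accept state
```

Unlike N, this brute-force search may visit exponentially many marker sets; it trades N's linear nondeterministic space for deterministic search.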
If M rejects any strings, it must reject one of length at most 2^q because in any longer rejected string the locations of the markers described in the preceding algorithm would repeat. The section of the string between the repetitions can be removed to obtain a shorter rejected string. Hence N decides the complement of ALL_NFA. (Note that N accepts improperly formed inputs, too.)

The only space needed by this algorithm is for storing the location of the markers and the repeat loop counter, and that can be done with linear space. Hence the algorithm runs in nondeterministic space O(n). Next, we prove a theorem that provides information about the deterministic space complexity of ALL_NFA.

8.1 SAVITCH'S THEOREM

Savitch's theorem is one of the earliest results concerning space complexity. It shows that deterministic machines can simulate nondeterministic machines by using a surprisingly small amount of space. For time complexity, such a simulation seems to require an exponential increase in time. For space complexity, Savitch's theorem shows that any nondeterministic TM that uses f(n) space can be converted to a deterministic TM that uses only f²(n) space.

THEOREM 8.5
Savitch's theorem: For any function f: N → R+, where f(n) ≥ n,
NSPACE(f(n)) ⊆ SPACE(f²(n)).

PROOF IDEA We need to simulate an f(n) space NTM deterministically. A naive approach is to proceed by trying all the branches of the NTM's computation, one by one. The simulation needs to keep track of which branch it is currently trying so that it is able to go on to the next one. But a branch that uses f(n) space may run for 2^O(f(n)) steps, and each step may be a nondeterministic choice. Exploring the branches sequentially would require recording all the choices used on a particular branch in order to be able to find the next branch. Therefore, this approach may use 2^O(f(n)) space, exceeding our goal of O(f²(n)) space.
Instead, we take a different approach by considering the following more general problem. We are given two configurations of the NTM, c1 and c2, together with a number t, and we must test whether the NTM can get from c1 to c2 within t steps. We call this problem the yieldability problem. By solving the yieldability problem, where c1 is the start configuration, c2 is the accept configuration, and t is the maximum number of steps that the nondeterministic machine can use, we can determine whether the machine accepts its input.

We give a deterministic, recursive algorithm that solves the yieldability problem. It operates by searching for an intermediate configuration cm, and recursively testing whether (1) c1 can get to cm within t/2 steps, and (2) cm can get to c2 within t/2 steps. Reusing the space for each of the two recursive tests allows a significant saving of space.

This algorithm needs space for storing the recursion stack. Each level of the recursion uses O(f(n)) space to store a configuration. The depth of the recursion is log t, where t is the maximum time that the nondeterministic machine may use on any branch. We have t = 2^O(f(n)), so log t = O(f(n)). Hence the deterministic simulation uses O(f²(n)) space.

PROOF Let N be an NTM deciding a language A in space f(n). We construct a deterministic TM M deciding A. Machine M uses the procedure CANYIELD, which tests whether one of N's configurations can yield another within a specified number of steps. This procedure solves the yieldability problem described in the proof idea.

Let w be a string considered as input to N. For configurations c1 and c2 of N on w, and integer t, CANYIELD(c1, c2, t) outputs accept if N can go from configuration c1 to configuration c2 in t or fewer steps along some nondeterministic path. If not, CANYIELD outputs reject. For convenience, we assume that t is a power of 2.

(Footnote: On page 323, we show that Savitch's theorem also holds whenever f(n) ≥ log n.)
CANYIELD = "On input c1, c2, and t:
1. If t = 1, then test directly whether c1 = c2 or whether c1 yields c2 in one step according to the rules of N. Accept if either test succeeds; reject if both fail.
2. If t > 1, then for each configuration cm of N on w using space f(n):
3.   Run CANYIELD(c1, cm, t/2).
4.   Run CANYIELD(cm, c2, t/2).
5.   If steps 3 and 4 both accept, then accept.
6. If haven't yet accepted, reject."

Now we define M to simulate N as follows. We first modify N so that when it accepts, it clears its tape and moves the head to the leftmost cell, thereby entering a configuration called c_accept. We let c_start be the start configuration of N on w. We select a constant d so that N has no more than 2^{df(n)} configurations using f(n) tape, where n is the length of w. Then we know that 2^{df(n)} provides an upper bound on the running time of any branch of N on w.

M = "On input w:
1. Output the result of CANYIELD(c_start, c_accept, 2^{df(n)})."

Algorithm CANYIELD obviously solves the yieldability problem, and hence M correctly simulates N. We need to analyze it to verify that M works within O(f²(n)) space. Whenever CANYIELD invokes itself recursively, it stores the current stage number and the values of c1, c2, and t on a stack so that these values may be restored upon return from the recursive invocation. Each level of the recursion thus uses O(f(n)) additional space. Furthermore, each level of the recursion divides the size of t in half. Initially t starts out equal to 2^{df(n)}, so the depth of the recursion is O(log 2^{df(n)}), or O(f(n)). Therefore, the total space used is O(f²(n)), as claimed.

One technical difficulty arises in this argument because algorithm M needs to know the value of f(n) when it calls CANYIELD. We can handle this difficulty by modifying M so that it tries f(n) = 1, 2, 3, ... For each value f(n) = i, the modified algorithm uses CANYIELD to determine whether the accept configuration is reachable.
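The divide-and-conquer recursion of CANYIELD can be sketched as follows. This is a schematic version only: the configuration encoding is abstracted away, and `configs` and `step` are our own stand-ins for "all configurations using space f(n)" and "yields in one move of N":

```python
def can_yield(c1, c2, t, configs, step):
    """Sketch of CANYIELD: can configuration c1 reach c2 within t steps?
    `configs` enumerates the candidate intermediate configurations and
    `step(a, b)` tests whether a yields b in one move. t is assumed to be
    a power of 2. As in the proof, each recursion level stores only one
    intermediate configuration, and the depth is log t."""
    if t == 1:
        return c1 == c2 or step(c1, c2)
    for cm in configs:
        # the two half-searches run one after the other, reusing space
        if can_yield(c1, cm, t // 2, configs, step) and \
           can_yield(cm, c2, t // 2, configs, step):
            return True
    return False
```

Running it on a toy "configuration graph" (states 0..3, each yielding its successor) shows that 0 reaches 3 within 4 steps but not within 2.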
In addition, it uses CANYIELD to determine whether N uses at least space i + 1 by testing whether N can reach any of the configurations of length i + 1 from the start configuration. If the accept configuration is reachable, M accepts; if no configuration of length i + 1 is reachable, M rejects; and otherwise, M continues with f(n) = i + 1. (We could have handled this difficulty in another way by assuming that M can compute f(n) within O(f(n)) space, but then we would need to add that assumption to the statement of the theorem.)

8.2 THE CLASS PSPACE

By analogy with the class P, we define the class PSPACE for space complexity.

DEFINITION 8.6
PSPACE is the class of languages that are decidable in polynomial space on a deterministic Turing machine. In other words,

PSPACE = ⋃_k SPACE(n^k).

We define NPSPACE, the nondeterministic counterpart to PSPACE, in terms of the NSPACE classes. However, PSPACE = NPSPACE by virtue of Savitch's theorem, because the square of any polynomial is still a polynomial.

In Examples 8.3 and 8.4, we showed that SAT is in SPACE(n) and that ALL_NFA is in coNSPACE(n) and hence, by Savitch's theorem, in SPACE(n²), because the deterministic space complexity classes are closed under complement. Therefore, both languages are in PSPACE.

Let's examine the relationship of PSPACE with P and NP. We observe that P ⊆ PSPACE because a machine that runs quickly cannot use a great deal of space. More precisely, for t(n) ≥ n, any machine that operates in time t(n) can use at most t(n) space because a machine can explore at most one new cell at each step of its computation. Similarly, NP ⊆ NPSPACE, and so NP ⊆ PSPACE.

Conversely, we can bound the time complexity of a Turing machine in terms of its space complexity. For f(n) ≥ n, a TM that uses f(n) space can have at most f(n) 2^{O(f(n))} different configurations, by a simple generalization of the proof of Lemma 5.8 on page 194.
A TM computation that halts may not repeat a configuration. Therefore, a TM that uses space f(n) must run in time f(n) 2^{O(f(n))}, so PSPACE ⊆ EXPTIME = ⋃_k TIME(2^{n^k}).

(Footnote: The requirement here that f(n) ≥ n is generalized later to f(n) ≥ log n, when we introduce TMs that use sublinear space on page 322.)

We summarize our knowledge of the relationships among the complexity classes defined so far in the series of containments

P ⊆ NP ⊆ PSPACE = NPSPACE ⊆ EXPTIME.

We don't know whether any of these containments is actually an equality. Someone may yet discover a simulation like the one in Savitch's theorem that merges some of these classes into the same class. However, in Chapter 9 we prove that P ≠ EXPTIME. Therefore, at least one of the preceding containments is proper, but we are unable to say which! Indeed, most researchers believe that all the containments are proper. The following diagram depicts the relationships among these classes, assuming that all are different.

FIGURE 8.7
Conjectured relationships among P, NP, PSPACE, and EXPTIME

8.3 PSPACE-COMPLETENESS

In Section 7.4 we introduced the category of NP-complete languages as representing the most difficult languages in NP. Demonstrating that a language is NP-complete provides strong evidence that the language is not in P. If it were, P and NP would be equal. In this section we introduce the analogous notion, PSPACE-completeness, for the class PSPACE.

DEFINITION 8.8
A language B is PSPACE-complete if it satisfies two conditions:
1. B is in PSPACE, and
2. every A in PSPACE is polynomial time reducible to B.
If B merely satisfies condition 2, we say that it is PSPACE-hard.

In defining PSPACE-completeness, we use polynomial time reducibility as given in Definition 7.29. Why don't we define a notion of polynomial space reducibility and use that instead of polynomial time reducibility?
To understand the answer to this important question, consider our motivation for defining complete problems in the first place.

Complete problems are important because they are examples of the most difficult problems in a complexity class. A complete problem is most difficult because any other problem in the class is easily reduced into it, so if we find an easy way to solve the complete problem, we can easily solve all other problems in the class. The reduction must be easy, relative to the complexity of typical problems in the class, for this reasoning to apply. If the reduction itself were difficult to compute, an easy solution to the complete problem wouldn't necessarily yield an easy solution to the problems reducing to it. Therefore, the rule is: Whenever we define complete problems for a complexity class, the reduction model must be more limited than the model used for defining the class itself.

THE TQBF PROBLEM

Our first example of a PSPACE-complete problem involves a generalization of the satisfiability problem. Recall that a Boolean formula is an expression that contains Boolean variables, the constants 0 and 1, and the Boolean operations ∧, ∨, and ¬. We now introduce a more general type of Boolean formula.

The quantifiers ∀ (for all) and ∃ (there exists) make frequent appearances in mathematical statements. Writing the statement ∀x φ means that, for every value for the variable x, the statement φ is true. Similarly, writing the statement ∃x φ means that, for some value of the variable x, the statement φ is true.

For example, considering the natural numbers, the statement ∀x [x + 1 > x] means that the successor x + 1 of every natural number x is greater than the number itself. Obviously, this statement is true. However, the statement ∃y [y + y = 3] obviously is false. When interpreting the meaning of statements involving quantifiers, we must consider the universe from which the values are drawn. In the preceding cases the universe comprised the natural numbers; but if we took the real numbers instead, the existentially quantified statement would become true.
Statements may contain several quantifiers, as in ∀x ∃y [y > x]. For the universe of the natural numbers, this statement says that every natural number has another natural number larger than it. The order of the quantifiers is important. Reversing the order, as in the statement ∃y ∀x [y > x], gives an entirely different meaning: namely, that some natural number is greater than all others. Obviously, the first statement is true and the second statement is false.

A quantifier may appear anywhere in a mathematical statement. It applies to the fragment of the statement appearing within the matched pair of parentheses or brackets following the quantified variable. This fragment is called the scope of the quantifier. Often, it is convenient to require that all quantifiers appear at the beginning of the statement and that each quantifier's scope is everything following it. Such statements are said to be in prenex normal form. Any statement may be put into prenex normal form easily. We consider statements in this form only, unless otherwise indicated.

Boolean formulas with quantifiers are called quantified Boolean formulas. For such formulas, the universe is {0, 1}. For example,

φ = ∀x ∃y [(x ∨ y) ∧ (¬x ∨ ¬y)]

is a quantified Boolean formula. Here, φ is true, but it would be false if the quantifiers ∀x and ∃y were reversed.

When each variable of a formula appears within the scope of some quantifier, the formula is said to be fully quantified. A fully quantified Boolean formula is sometimes called a sentence and is always either true or false. For example, the preceding formula φ is fully quantified. However, if the initial part, ∀x, of φ were removed, the formula would no longer be fully quantified and would be neither true nor false.

The TQBF problem is to determine whether a fully quantified Boolean formula is true or false. We define the language

TQBF = {⟨φ⟩ | φ is a true fully quantified Boolean formula}.
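Because the universe is {0, 1}, a fully quantified formula can be evaluated by simple recursion, trying both values for each variable in quantifier order. Here is a minimal sketch; the 'A'/'E' quantifier string and the matrix-as-function representation are our own encoding, not the book's:

```python
def eval_tqbf(quantifiers, matrix, values=()):
    """Evaluate a fully quantified Boolean formula over the universe {0, 1}.
    `quantifiers` is a string of 'A' (for all) and 'E' (there exists), one
    per variable in order; `matrix` maps a complete tuple of 0/1 values to
    a truth value. Only the partial assignment is stored at any moment, so
    the space used is linear in the number of variables."""
    if not quantifiers:
        return bool(matrix(values))
    q, rest = quantifiers[0], quantifiers[1:]
    branch = (eval_tqbf(rest, matrix, values + (v,)) for v in (0, 1))
    return all(branch) if q == 'A' else any(branch)

# the two-variable example: forall x exists y [(x or y) and (not x or not y)]
phi_matrix = lambda v: (v[0] or v[1]) and (not v[0] or not v[1])
```

Evaluating `eval_tqbf('AE', phi_matrix)` gives true, while the reversed quantifier order `'EA'` gives false, matching the discussion above.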
THEOREM 8.9
TQBF is PSPACE-complete.

PROOF IDEA To show that TQBF is in PSPACE, we give a straightforward algorithm that assigns values to the variables and recursively evaluates the truth of the formula for those values. From that information the algorithm can determine the truth of the original quantified formula.

To show that every language A in PSPACE reduces to TQBF in polynomial time, we begin with a polynomial space-bounded Turing machine for A. Then we give a polynomial time reduction that maps a string to a quantified Boolean formula φ that encodes a simulation of the machine on that input. The formula is true iff the machine accepts.

As a first attempt at this construction, let's try to imitate the proof of the Cook-Levin theorem, Theorem 7.37. We can construct a formula φ that simulates M on an input w by expressing the requirements for an accepting tableau. A tableau for M on w has width O(n^k), the space used by M, but its height is exponential in n^k because M can run for exponential time. Thus, if we were to represent the tableau with a formula directly, we would end up with a formula of exponential size. However, a polynomial time reduction cannot produce an exponential-size result, so this attempt fails to show that A ≤_P TQBF.

Using two collections of variables representing configurations c1 and c2, and a number t > 0, we construct a formula φ_{c1,c2,t}. If we assign c1 and c2 to actual configurations, the formula is true iff M can go from c1 to c2 in at most t steps. Then we can let φ be the formula φ_{c_start,c_accept,h}, where h = 2^{df(n)} for a constant d, chosen so that M has no more than 2^{df(n)} possible configurations on an input of length n. Here, let f(n) = n^k. For convenience, we assume that t is a power of 2.

The formula encodes the contents of tape cells as in the proof of the Cook-Levin theorem. Each cell has several variables associated with it, one for each tape symbol and state, corresponding to the possible settings of that cell. Each configuration has n^k cells and so is encoded by O(n^k) variables.

If t = 1, we can easily construct φ_{c1,c2,t}. We design the formula to say that either c1 equals c2, or c2 follows from c1 in a single step of M. We express the equality by writing a Boolean expression saying that each of the variables representing c1 contains the same Boolean value as the corresponding variable representing c2. We express the second possibility by using the technique presented in the proof of the Cook-Levin theorem. That is, we can express that c1 yields c2 in a single step of M by writing Boolean expressions stating that the contents of each triple of c1's cells correctly yields the contents of the corresponding triple of c2's cells.

If t > 1, we construct φ_{c1,c2,t} recursively. As a warmup, let's try one idea that doesn't quite work and then fix it. Let

φ_{c1,c2,t} = ∃m1 [φ_{c1,m1,t/2} ∧ φ_{m1,c2,t/2}].

The symbol m1 represents a configuration of M. Writing ∃m1 is shorthand for ∃x1, ..., xl, where l = O(n^k) and x1, ..., xl are the variables that encode m1. So this construction of φ_{c1,c2,t} says that M can go from c1 to c2 in at most t steps if some intermediate configuration m1 exists, whereby M can go from c1 to m1 in at most t/2 steps and then from m1 to c2 in at most t/2 steps.
Then we construct the two formulas φ_{c1,m1,t/2} and φ_{m1,c2,t/2} recursively. The formula φ_{c1,c2,t} has the correct value; that is, it is TRUE whenever M can go from c1 to c2 within t steps. However, it is too big. Every level of the recursion involved in the construction cuts t in half but roughly doubles the size of the formula. Hence we end up with a formula of size roughly t. Initially t = 2^{df(n)}, so this method gives an exponentially large formula.

To reduce the size of the formula, we use the ∀ quantifier in addition to the ∃ quantifier. Let

φ_{c1,c2,t} = ∃m1 ∀(c3,c4) ∈ {(c1,m1), (m1,c2)} [φ_{c3,c4,t/2}].

The introduction of the new variables representing the configurations c3 and c4 allows us to "fold" the two recursive subformulas into a single subformula, while preserving the original meaning. By writing ∀(c3,c4) ∈ {(c1,m1), (m1,c2)}, we indicate that the variables representing the configurations c3 and c4 may take the values of the variables of c1 and m1 or of m1 and c2, respectively, and that the resulting formula φ_{c3,c4,t/2} is true in either case. We may replace the construct ∀x ∈ {y,z} [...] by the equivalent construct ∀x [(x = y ∨ x = z) → ...] to obtain a syntactically correct quantified Boolean formula. Recall that in Section 0.2 we showed that Boolean implication (→) and Boolean equality (=) can be expressed in terms of AND and NOT. Here, for clarity, we use the symbol = for Boolean equality instead of the equivalent symbol ↔ used in Section 0.2.

To calculate the size of the formula φ_{c_start,c_accept,h}, where h = 2^{df(n)}, we note that each level of the recursion adds a portion of the formula that is linear in the size of the configurations and is thus of size O(f(n)). The number of levels of the recursion is log(2^{df(n)}), or O(f(n)). Hence the size of the resulting formula is O(f²(n)).
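The effect of the folding trick on formula size can be checked with a small recurrence calculation. Here c stands in for the O(f(n)) cost added per level; the sizes are schematic counts, not exact symbol counts:

```python
def naive_size(t, c):
    """Size under the broken construction: two recursive subformulas per
    level, so the size roughly doubles as t halves, ending up about c * t."""
    if t == 1:
        return c          # base case: comparing/stepping two configurations
    return c + 2 * naive_size(t // 2, c)

def folded_size(t, c):
    """Size with the forall-fold: one recursive subformula per level, so
    each of the log t levels adds only c, for about c * log t in total."""
    if t == 1:
        return c
    return c + folded_size(t // 2, c)
```

For t = 2^10 and c = 1, the naive recurrence gives 2047 (exponential in log t) while the folded one gives 11 (linear in log t), which is the gap the quantifier exploits.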
WINNING STRATEGIES FOR GAMES

For the purposes of this section, a game is loosely defined to be a competition in which opposing parties attempt to achieve some goal according to prespecified rules. Games appear in many forms, from board games such as chess to economic and war games that model corporate or societal conflict.

Games are closely related to quantifiers. A quantified statement has a corresponding game; conversely, a game often has a corresponding quantified statement. These correspondences are helpful in several ways. For one, expressing a mathematical statement that uses many quantifiers in terms of the corresponding game may give insight into the statement's meaning. For another, expressing a game in terms of a quantified statement aids in understanding the complexity of the game. To illustrate the correspondence between games and quantifiers, we turn to an artificial game called the formula game.

Let φ = ∃x1 ∀x2 ∃x3 ... Qxk [ψ] be a quantified Boolean formula in prenex normal form. Here Q represents either a ∀ or an ∃ quantifier. We associate a game with φ as follows. Two players, called Player A and Player E, take turns selecting the values of the variables x1, ..., xk. Player A selects values for the variables that are bound to ∀ quantifiers, and Player E selects values for the variables that are bound to ∃ quantifiers. The order of play is the same as that of the quantifiers at the beginning of the formula. At the end of play, we use the values that the players have selected for the variables and declare that Player E has won the game if ψ, the part of the formula with the quantifiers stripped off, is now TRUE. Player A has won if ψ is now FALSE.

EXAMPLE 8.10
Say that φ1 is the formula

∃x1 ∀x2 ∃x3 [(x1 ∨ x2) ∧ (x2 ∨ x3) ∧ (¬x2 ∨ ¬x3)].
In the formula game for φ1, Player E picks the value of x1, then Player A picks the value of x2, and finally Player E picks the value of x3.

To illustrate a sample play of this game, we begin by representing the Boolean value TRUE with 1 and FALSE with 0, as usual. Let's say that Player E picks x1 = 1, then Player A picks x2 = 0, and finally Player E picks x3 = 1. With these values for x1, x2, and x3, the subformula

(x1 ∨ x2) ∧ (x2 ∨ x3) ∧ (¬x2 ∨ ¬x3)

is 1, so Player E has won the game. In fact, Player E may always win this game by selecting x1 = 1 and then selecting x3 to be the negation of whatever Player A selects for x2. We say that Player E has a winning strategy for this game. A player has a winning strategy for a game if that player wins when both sides play optimally.

Now let's change the formula slightly to get a game in which Player A has a winning strategy. Let φ2 be the formula

∃x1 ∀x2 ∃x3 [(x1 ∨ x2) ∧ (x2 ∨ x3) ∧ (x2 ∨ ¬x3)].

Player A now has a winning strategy because, no matter what Player E selects for x1, Player A may select x2 = 0, thereby falsifying the part of the formula appearing after the quantifiers, whatever Player E's last move may be.

We next consider the problem of determining which player has a winning strategy in the formula game associated with a particular formula. Let

FORMULA-GAME = {⟨φ⟩ | Player E has a winning strategy in the formula game associated with φ}.

THEOREM 8.11
FORMULA-GAME is PSPACE-complete.

PROOF IDEA FORMULA-GAME is PSPACE-complete for a simple reason: it is the same as TQBF. To see that FORMULA-GAME = TQBF, observe that a formula is TRUE exactly when Player E has a winning strategy in the associated formula game. The two statements are different ways of saying the same thing.
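The claims about φ1 and φ2 can be checked by direct game-tree search, which, as just observed, is nothing other than TQBF evaluation. The quantifier-string and matrix encodings below are our own:

```python
def e_has_winning_strategy(quantifiers, matrix, values=()):
    """Game-tree search for the formula game. At an 'E' variable Player E
    needs SOME value that works; at an 'A' variable Player A gets to pick,
    so EVERY value must work for E. E wins a finished play iff the matrix
    (the unquantified part) is TRUE under the chosen values."""
    if not quantifiers:
        return bool(matrix(values))
    branch = (e_has_winning_strategy(quantifiers[1:], matrix, values + (v,))
              for v in (0, 1))
    return any(branch) if quantifiers[0] == 'E' else all(branch)

# phi1 and phi2 from Example 8.10, encoded by hand as functions of (x1, x2, x3)
phi1 = lambda v: (v[0] or v[1]) and (v[1] or v[2]) and (not v[1] or not v[2])
phi2 = lambda v: (v[0] or v[1]) and (v[1] or v[2]) and (v[1] or not v[2])
```

The search confirms the text: Player E has a winning strategy for φ1 but not for φ2, where Player A can always win by setting x2 = 0.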
PROOF The formula φ = ∃x1 ∀x2 ∃x3 ... [ψ] is TRUE when some setting for x1 exists such that, for any setting of x2, a setting of x3 exists such that, and so on ..., where ψ is TRUE under the settings of the variables. Similarly, Player E has a winning strategy in the game associated with φ when Player E can make some assignment to x1 such that, for any setting of x2, Player E can make an assignment to x3 such that, and so on ..., ψ is TRUE under these settings of the variables.

The same reasoning applies when the formula doesn't alternate between existential and universal quantifiers. If φ has the form ∀x1, x2, x3 ∃x4, x5 ∀x6 [ψ], Player A would make the first three moves in the formula game to assign values to x1, x2, and x3; then Player E would make two moves to assign x4 and x5; and finally Player A would assign a value to x6. Hence φ ∈ TQBF exactly when φ ∈ FORMULA-GAME, and the theorem follows from Theorem 8.9.

GENERALIZED GEOGRAPHY

Now that we know that the formula game is PSPACE-complete, we can establish the PSPACE-completeness or PSPACE-hardness of some other games more easily. We'll begin with a generalization of the game geography and later discuss games such as chess, checkers, and GO.

Geography is a child's game in which players take turns naming cities from anywhere in the world. Each city chosen must begin with the same letter that ended the previous city's name. Repetition isn't permitted. The game starts with some designated starting city and ends when some player loses because he or she is unable to continue. For example, if the game starts with Peoria, then Amherst might legally follow (because Peoria ends with the letter a, and Amherst begins with the letter a), then Tucson, then Nashua, and so on until one player gets stuck and thereby loses.

We can model this game with a directed graph whose nodes are the cities of the world. We draw an arrow from one city to another if the first can lead to the second according to the game rules.
In other words, the graph contains an edge from a city X to a city Y if city X ends with the same letter that begins city Y. We illustrate a portion of the geography graph in Figure 8.12.

FIGURE 8.12
Portion of the graph representing the geography game

When the rules of geography are interpreted for this graphic representation, one player starts by selecting the designated start node, and then the players take turns alternately by picking nodes that form a simple path in the graph. The requirement that the path be simple (i.e., doesn't use any node more than once) corresponds to the requirement that a city may not be repeated. The first player unable to extend the path loses the game.

In generalized geography, we take an arbitrary directed graph with a designated start node instead of the graph associated with the actual cities. For example, the following graph is an example of a generalized geography game.

FIGURE 8.13
A sample generalized geography game

Say that Player I is the one who moves first and Player II second. In this example, Player I has a winning strategy as follows. Player I starts at node 1, the designated start node. Node 1 points only at nodes 2 and 3, so Player I's first move must be one of these two choices. He chooses 3. Now Player II must move, but node 3 points only to node 5, so she is forced to select node 5. Then Player I selects 6, from choices 6, 7, and 8. Now Player II must play from node 6, but it points only to node 3, and 3 was previously played. Player II is stuck, and thus Player I wins.

If we change the example by reversing the direction of the edge between nodes 3 and 6, Player II has a winning strategy. Can you see it? If Player I starts out with node 3 as before, Player II responds with 6 and wins immediately, so Player I's only hope is to begin with 2. In that case, however, Player II responds with 4. If Player I now takes 5, Player II wins with 6.
If Player I takes 7, Player II wins with 9. No matter what Player I does, Player II can find a way to win, so Player II has a winning strategy.

The problem of determining which player has a winning strategy in a generalized geography game is PSPACE-complete. Let

GG = {⟨G, b⟩ | Player I has a winning strategy for the generalized geography game played on graph G starting at node b}.

THEOREM 8.14
GG is PSPACE-complete.

PROOF IDEA A recursive algorithm similar to the one used for TQBF in Theorem 8.9 determines which player has a winning strategy. This algorithm runs in polynomial space, and so GG ∈ PSPACE. To prove that GG is PSPACE-hard, we give a polynomial time reduction from FORMULA-GAME to GG. This reduction converts a formula game to a generalized geography graph so that play on the graph mimics play in the formula game. In effect, the players in the generalized geography game are really playing an encoded form of the formula game.

PROOF The following algorithm decides whether Player I has a winning strategy in instances of generalized geography; in other words, it decides GG. We show that it runs in polynomial space.

M = "On input ⟨G, b⟩, where G is a directed graph and b is a node of G:
1. If b has outdegree 0, reject, because Player I loses immediately.
2. Remove node b and all connected arrows to get a new graph G1.
3. For each of the nodes b1, b2, ..., bk that b originally pointed at, recursively call M on ⟨G1, bi⟩.
4. If all of these accept, Player II has a winning strategy in the original game, so reject. Otherwise, Player II doesn't have a winning strategy, so Player I must; therefore, accept."

The only space required by this algorithm is for storing the recursion stack. Each level of the recursion adds a single node to the stack, and at most m levels occur, where m is the number of nodes in G.
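Algorithm M's recursion can be sketched as follows. Instead of deleting nodes and arrows, this sketch tracks the simple path played so far, which is equivalent; the dictionary encoding of G is our own:

```python
def player_one_wins(graph, b, path=None):
    """Sketch of algorithm M for GG: the player to move at node b wins iff
    some move leads to a position from which the opponent (now the mover)
    loses. The recursion stack holds one node per level, and at most m
    levels occur, mirroring the space analysis in the proof.
    `graph` maps each node to the list of nodes it points at."""
    path = path or {b}
    moves = [n for n in graph.get(b, []) if n not in path]
    if not moves:
        return False        # the player to move cannot extend the path: loses
    # some move must leave the opponent in a losing position
    return any(not player_one_wins(graph, n, path | {n}) for n in moves)
```

On a toy two-node graph {1: [2], 2: []}, Player I moves to node 2 and Player II is stuck, so Player I wins; adding a third node to the chain hands the loss back to Player I.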
Hence the algorithm runs in linear space.

To establish the PSPACE-hardness of GG, we show that FORMULA-GAME is polynomial time reducible to GG. The reduction maps a formula

φ = ∃x1 ∀x2 ∃x3 ⋯ Qxk [ψ]

to an instance ⟨G, b⟩ of generalized geography. Here we assume for simplicity that φ's quantifiers begin and end with ∃ and that they strictly alternate between ∃ and ∀. A formula that doesn't conform to this assumption may be converted to a slightly larger one that does by adding extra quantifiers binding otherwise unused or "dummy" variables. We assume also that ψ is in conjunctive normal form (see Problem 8.12).

The reduction constructs a geography game on a graph G where optimal play mimics optimal play of the formula game on φ. Player I in the geography game takes the role of Player E in the formula game, and Player II takes the role of Player A.

The structure of graph G is partially shown in the following figure. Play starts at node b, which appears at the top left-hand side of G. Underneath b, a sequence of diamond structures appears, one for each of the variables of φ. Before getting to the right-hand side of G, let's see how play proceeds on the left-hand side.

figure 8.15 Partial structure of the geography game simulating the formula game

Play starts at b. Player I must select one of the two edges going from b. These edges correspond to Player E's possible choices at the beginning of the formula game. The left-hand choice for Player I corresponds to TRUE for Player E in the formula game and the right-hand choice to FALSE. After Player I has selected one of these edges—say, the left-hand one—Player II moves. Only one outgoing edge is present, so this move is forced. Similarly, Player I's next move is forced and play continues from the top of the second diamond. Now two edges again are present, but Player II gets the choice. This choice corresponds to Player A's first move in the formula game.
As play continues in this way, Players I and II choose a rightward or leftward path through each of the diamonds. After play passes through all the diamonds, the head of the path is at the bottom node in the last diamond, and it is Player I's turn because we assumed that the last quantifier is ∃. Player I's next move is forced. Then the play is at node c in Figure 8.15 and Player II makes the next move.

This point in the geography game corresponds to the end of play in the formula game. The chosen path through the diamonds corresponds to an assignment to φ's variables. Under that assignment, if ψ is TRUE, Player E wins the formula game, and if ψ is FALSE, Player A wins. The structure on the right-hand side of the following figure guarantees that Player I can win if Player E has won and that Player II can win if Player A has won.

figure 8.16 Full structure of the geography game simulating the formula game, where φ = ∃x1 ∀x2 ⋯ Qxk [(x1 ∨ x2 ∨ x3) ∧ (¬x2 ∨ ¬x3 ∨ ⋯) ∧ ⋯ ∧ ( )]

At node c, Player II may choose a node corresponding to one of ψ's clauses. Then Player I may choose a node corresponding to a literal in that clause. The nodes corresponding to unnegated literals are connected to the left-hand (TRUE) sides of the diamonds for the associated variables, and similarly for negated literals and right-hand (FALSE) sides, as shown in Figure 8.16.

If φ is FALSE, Player II may win by selecting the unsatisfied clause. Any literal that Player I may then pick is FALSE and is connected to the side of the diamond that hasn't yet been played. Thus Player II may play the node in the diamond, but then Player I is unable to move and loses. If
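The formula game that the geography graph encodes can itself be sketched in a few lines. This assumes, as the reduction does, strictly alternating quantifiers beginning with ∃ and a CNF ψ; the clause encoding (signed integers, +i for xi and -i for its negation) and the function names are illustrative choices, not from the text.

```python
# A sketch of the formula game on phi = ∃x1 ∀x2 ... Qxk [psi], psi in CNF.
# Clauses are lists of signed integers: +i stands for x_i, -i for its negation.

def eval_cnf(clauses, assignment):
    """True iff every clause contains a literal satisfied by `assignment` (i -> bool)."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def e_wins(clauses, k, assignment=None, i=1):
    """True iff Player E wins the game on the remaining quantifiers Qx_i ... Qx_k.
    Odd i: Player E (∃) chooses x_i; even i: Player A (∀) chooses x_i."""
    if assignment is None:
        assignment = {}
    if i > k:
        return eval_cnf(clauses, assignment)  # end of play: evaluate psi
    outcomes = [e_wins(clauses, k, {**assignment, i: v}, i + 1) for v in (True, False)]
    return any(outcomes) if i % 2 == 1 else all(outcomes)

# ∃x1 ∀x2 [(x1 ∨ x2) ∧ (x1 ∨ ¬x2)]: Player E wins by setting x1 = TRUE.
print(e_wins([[1, 2], [1, -2]], 2))    # True
# ∃x1 ∀x2 [(x1 ∨ x2) ∧ (¬x1 ∨ ¬x2)]: Player A can always falsify a clause.
print(e_wins([[1, 2], [-1, -2]], 2))   # False
```

The reduction's diamonds implement exactly the two-way choices made by this recursion, and the clause/literal nodes implement the final evaluation of ψ.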

0} is a member of L. In Section 7.1 on page 247 we described a Turing machine that decides A by zigzagging back and forth across the input, crossing off the 0s and 1s as they are matched. That algorithm uses linear space to record which positions have been crossed off, but it can be modified to use only log space.

The log space TM for A cannot cross off the 0s and 1s that have been matched on the input tape because that tape is read-only. Instead, the machine counts the number of 0s and, separately, the number of 1s in binary on the work tape. The only space required is that used to record the two counters. In binary, each counter uses only logarithmic space, and hence the algorithm runs in O(log n) space. Therefore A ∈ L.

example 8.19
Here, we illustrate the nondeterministic space complexity of a language. Recall the language PATH = {⟨G, s, t⟩ | G is a directed graph that has a directed path from s to t} defined in Section 7.2. Theorem 7.14 shows that PATH is in P, but that algorithm uses linear space. We don't know whether PATH can be solved in logarithmic space deterministically, but we do know a nondeterministic log space algorithm for PATH.

The nondeterministic log space Turing machine deciding PATH operates by starting at node s and nondeterministically guessing the nodes of a path from s to t. The machine records only the position of the current node at each step on the work tape, not the entire path (which would exceed the logarithmic space requirement). The machine nondeterministically selects the next node from among those pointed at by the current node. Then it repeats this action until it reaches node t and accepts, or until it has gone on for m steps and rejects, where m is the number of nodes in the graph. Thus PATH is in NL.

Our earlier claim that any f(n) space bounded Turing machine also runs in time 2^O(f(n)) is no longer true for very small space bounds.
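The two-counter idea for A can be sketched as an ordinary program; the counters are the only state kept between input symbols, which is what makes the space logarithmic. The function name `in_A` is illustrative.

```python
# A sketch of the log space idea for A = {0^k 1^k | k >= 0}: scan the read-only
# input once, keeping only two binary counters (O(log n) bits) as work-tape state.

def in_A(w):
    zeros = ones = 0      # each counter needs only O(log n) bits in binary
    seen_one = False
    for c in w:
        if c == '0':
            if seen_one:  # a 0 appearing after a 1: wrong shape, reject
                return False
            zeros += 1
        elif c == '1':
            seen_one = True
            ones += 1
        else:
            return False  # input not over the alphabet {0, 1}
    return zeros == ones

print(in_A('000111'))   # True
print(in_A('0011101'))  # False
```

Checking the shape (all 0s before all 1s) needs only one extra bit of state, so the counters dominate the space used.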
For example, a Turing machine that uses O(1) (i.e., constant) space may run for n steps. To obtain a bound on the running time that applies for every space bound f(n), we give the following definition.

definition 8.20
If M is a Turing machine that has a separate read-only input tape and w is an input, a configuration of M on w is a setting of the state, the work tape, and the positions of the two tape heads. The input w is not a part of the configuration of M on w.

If M runs in f(n) space and w is an input of length n, the number of configurations of M on w is n·2^O(f(n)). To explain this result, let's say that M has c states and g tape symbols. The number of strings that can appear on the work tape is g^f(n). The input head can be in one of n positions and the work tape head can be in one of f(n) positions. Therefore the total number of configurations of M on w, which is an upper bound on the running time of M on w, is c·n·f(n)·g^f(n), or n·2^O(f(n)).

We focus almost exclusively on space bounds f(n) that are at least log n. Our earlier claim that the time complexity of a machine is at most exponential in its space complexity remains true for such bounds because n·2^O(f(n)) is 2^O(f(n)) when f(n) ≥ log n.

Recall that Savitch's theorem shows that we can convert nondeterministic TMs to deterministic TMs and increase the space complexity f(n) by only a squaring, provided that f(n) ≥ n. We can extend Savitch's theorem to hold for sublinear space bounds down to f(n) ≥ log n. The proof is identical to the original one we gave on page 305, except that we use Turing machines with a read-only input tape, and instead of referring to configurations of N we refer to configurations of N on w. Storing a configuration of N on w uses log(n·2^O(f(n))) = log n + O(f(n)) space. If f(n) ≥ log n, the storage used is O(f(n)) and the remainder of the proof remains the same.

8.5 NL-COMPLETENESS

As we mentioned in Example 8.19, the PATH problem is known to be in NL but isn't known to be in L.
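The recursive strategy behind Savitch's theorem, which the proof on page 305 applies to configurations, can be sketched abstractly. Here configurations are just labels in a toy graph and `succ` stands in for the machine's one-step yield relation; the names are illustrative, not from the text.

```python
# A sketch of the CANYIELD recursion from Savitch's theorem: can configuration
# c1 reach c2 in at most 2^t steps? Each recursion level stores one midpoint
# configuration, so only t configurations are ever held at once.

def can_yield(succ, c1, c2, t, universe):
    """succ(c) gives the one-step successors of configuration c;
    universe is the set of all possible configurations."""
    if t == 0:
        return c1 == c2 or c2 in succ(c1)   # at most one step
    # Split 2^t steps into two halves of 2^(t-1) steps around a midpoint m.
    return any(
        can_yield(succ, c1, m, t - 1, universe) and
        can_yield(succ, m, c2, t - 1, universe)
        for m in universe
    )

# Toy configuration graph: 0 -> 1 -> 2 -> 3 -> 0.
edges = {0: [1], 1: [2], 2: [3], 3: [0]}
succ = lambda c: edges.get(c, [])
nodes = [0, 1, 2, 3]
print(can_yield(succ, 0, 3, 2, nodes))  # True: 0 reaches 3 within 2^2 = 4 steps
print(can_yield(succ, 0, 3, 0, nodes))  # False: not reachable in at most 1 step
```

Taking t to be the logarithm of the number of configurations, which is log n + O(f(n)) for a machine with a read-only input tape, gives the O(f(n))-per-level storage the extension to sublinear bounds relies on.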
We believe that PATH doesn't belong to L, but we don't know how to prove this conjecture. In fact, we don't know of any problem in NL that can be proven to be outside L. Analogous to the question of whether P = NP, we have the question of whether L = NL.

As a step toward resolving the L versus NL question, we can exhibit certain languages that are NL-complete. As with complete languages for other complexity classes, the NL-complete languages are examples of languages that are, in a certain sense, the most difficult languages in NL. If L and NL are different, all NL-complete languages don't belong to L.

As with our previous definitions of completeness, we define an NL-complete language to be one that is in NL and to which every other language in NL is reducible. However, we don't use polynomial time reducibility here because, as you will see, all problems in NL are solvable in polynomial time. Therefore, every two problems in NL except ∅ and Σ* are polynomial time reducible to one another (see the discussion of polynomial time reducibility in the definition of PSPACE-completeness on page 309). Hence polynomial time reducibility is too strong to differentiate problems in NL from one another. Instead we use a new type of reducibility called log space reducibility.

definition 8.21
A log space transducer is a Turing machine with a read-only input tape, a write-only output tape, and a read/write work tape. The work tape may contain O(log n) symbols. A log space transducer M computes a function f: Σ*→Σ*, where f(w) is the string remaining on the output tape after M halts when it is started with w on its input tape. We call f a log space computable function. Language A is log space reducible to language B, written A ≤L B, if A is mapping reducible to B by means of a log space computable function f.

EXERCISES

8.1 Show that for any function f: N→R+, where f(n) ≥ n, the space complexity class SPACE(f(n)) is the same whether you define the class by using the single-tape TM model or the two-tape read-only input TM model.
8.2 Consider the following position in the standard tic-tac-toe game.

[board figure: a partly filled board with two X's and two O's; the layout was lost in extraction]

Let's say that it is the X-player's turn to move next. Describe a winning strategy for this player. (Recall that a winning strategy isn't merely the best move to make in the current position. It also includes all the responses that this player must make in order to win, however the opponent moves.)

8.3 Consider the following generalized geography game wherein the start node is the one with the arrow pointing in from nowhere. Does Player I have a winning strategy? Does Player II? Give reasons for your answers.

8.4 Show that PSPACE is closed under the operations union, complementation, and star.

A8.5 Show that NL is closed under the operations union, intersection, and star.

8.6 Show that any PSPACE-hard language is also NP-hard.

A8.7 Show that A_DFA ∈ L.

PROBLEMS

8.8 Let EQ_REX = {⟨R, S⟩ | R and S are equivalent regular expressions}. Show that EQ_REX ∈ PSPACE.

8.9 A ladder is a sequence of strings s1, s2, ..., sk, wherein every string differs from the preceding one in exactly one character. For example, the following is a ladder of English words, starting with "head" and ending with "free": head, hear, near, fear, bear, beer, deer, deed, feed, feet, fret, free. Let LADDER_DFA = {⟨M, s, t⟩ | M is a DFA and L(M) contains a ladder of strings, starting with s and ending with t}. Show that LADDER_DFA is in PSPACE.

8.10 The Japanese game go-moku is played by two players, "X" and "O," on a 19 × 19 grid. Players take turns placing markers, and the first player to achieve 5 of his markers consecutively in a row, column, or diagonal is the winner. Consider this game generalized to an n × n board. Let GM = {⟨B⟩ | B is a position in generalized go-moku, where player "X" has a winning strategy}. By a position we mean a board with markers placed on it, such as may occur in the middle of a play of the game, together with an indication of which player moves next.
Show that GM ∈ PSPACE.

8.11 Show that, if every NP-hard language is also PSPACE-hard, then PSPACE = NP.

8.12 Show that TQBF restricted to formulas where the part following the quantifiers is in conjunctive normal form is still PSPACE-complete.

8.13 Define A_LBA = {⟨M, w⟩ | M is an LBA that accepts input w}. Show that A_LBA is PSPACE-complete.

8.14 Consider the following two-person version of the language PUZZLE that was described in Problem 7.26. Each player starts with an ordered stack of puzzle cards. The players take turns placing the cards in order in the box and may choose which side faces up. Player I wins if, in the final stack, all hole positions are blocked, and Player II wins if some hole position remains unblocked. Show that the problem of determining which player has a winning strategy for a given starting configuration of the cards is PSPACE-complete.

*8.15 The cat-and-mouse game is played by two players, "Cat" and "Mouse," on an arbitrary undirected graph. At a given point each player occupies a node of the graph. The players take turns moving to a node adjacent to the one that they currently occupy. A special node of the graph is called "Hole." Cat wins if the two players ever occupy the same node. Mouse wins if it reaches the Hole before the preceding happens. The game is a draw if a situation repeats (i.e., the two players simultaneously occupy positions that they simultaneously occupied previously and it is the same player's turn to move). Let HAPPY-CAT = {⟨G, c, m, h⟩ | G, c, m, h are respectively a graph and positions of the Cat, Mouse, and Hole, such that Cat has a winning strategy if Cat moves first}. Show that HAPPY-CAT is in P. (Hint: The solution is not complicated and doesn't depend on subtle details in the way the game is defined. Consider the entire game tree. It is exponentially big, but you can search it in polynomial time.)

8.16 Read the definition of MIN-FORMULA in Problem 7.44.
a. Show that MIN-FORMULA ∈ PSPACE.
b.
Explain why this argument fails to show that MIN-FORMULA ∈ coNP: If φ ∉ MIN-FORMULA, then φ has a smaller equivalent formula. An NTM can verify that φ ∈ MIN-FORMULA by guessing that formula.

8.17 Let A be the language of properly nested parentheses. For example, (()) and (()(()))() are in A, but )( is not. Show that A is in L.

*8.18 Let B be the language of properly nested parentheses and brackets. For example, ([()()]()[]) is in B but ([)] is not. Show that B is in L.

*8.19 The game of Nim is played with a collection of piles of sticks. In one move a player may remove any nonzero number of sticks from a single pile. The players alternately take turns making moves. The player who removes the very last stick loses. Say that we have a game position in Nim with k piles containing s1, ..., sk sticks. Call the position balanced if, when each of the numbers si is written in binary and the binary numbers are written as rows of a matrix aligned at the low order bits, each column of bits contains an even number of 1s. Prove the following two facts.
a. Starting in an unbalanced position, a single move exists that changes the position into a balanced one.
b. Starting in a balanced position, every single move changes the position into an unbalanced one.
Let NIM = {⟨s1, ..., sk⟩ | each si is a binary number and Player I has a winning strategy in the Nim game starting at this position}. Use the preceding facts about balanced positions to show that NIM ∈ L.

8.20 Let MULT = {a#b#c | a, b, c are binary natural numbers and a × b = c}. Show that MULT ∈ L.

8.21 For any positive integer x, let x^R be the integer whose binary representation is the reverse of the binary representation of x. (Assume no leading 0s in the binary representation of x.) Define the function R+: N→N, where R+(x) = x + x^R.
a. Let A2 = {⟨x, y⟩ | R+(x) = y}. Show that A2 ∈ L.
b. Let A3 = {⟨x, y⟩ | R+(R+(x)) = y}. Show that A3 ∈ L.

8.22 a. Let ADD = {⟨x, y, z⟩ | x, y, z > 0 are binary integers and x + y = z}.
Show that ADD ∈ L.
b. Let PAL-ADD = {⟨x, y⟩ | x, y > 0 are binary integers where x + y is an integer whose binary representation is a palindrome}. (Note that the binary representation of the sum is assumed not to have leading zeros. A palindrome is a string that equals its reverse.) Show that PAL-ADD ∈ L.

*8.23 Define UCYCLE = {⟨G⟩ | G is an undirected graph that contains a simple cycle}. Show that UCYCLE ∈ L. (Note: G may be a graph that is not connected.)

*8.24 For each n, exhibit two regular expressions, R and S, of length poly(n), where L(R) ≠ L(S), but where the first string on which they differ is exponentially long. In other words, L(R) and L(S) must be different, yet agree on all strings of length up to 2^(εn) for some constant ε > 0.

8.25 An undirected graph is bipartite if its nodes may be divided into two sets so that all edges go from a node in one set to a node in the other set. Show that a graph is bipartite if and only if it doesn't contain a cycle that has an odd number of nodes. Let BIPARTITE = {⟨G⟩ | G is a bipartite graph}. Show that BIPARTITE ∈ NL.

8.26 Define UPATH to be the counterpart of PATH for undirected graphs. Show that BIPARTITE