Nanosyntax: some key features

Pavel Caha*
Masarykova univerzita, Brno
January 31, 2019

1 Introduction

Nanosyntax (Nano, Starke 2002, 2009; Caha 2009) is a theory of morphosyntax whose central tenets overlap to some extent with those proposed in Distributed Morphology (DM, Halle and Marantz 1993, 1994). For instance, DM's famous dictum of 'syntax all the way down' is something that Nanosyntax would subscribe to just as much as DM. Similarly, both theories converge on a 'late insertion' approach to morphology, where syntactic computation, consisting minimally of Merge and Move, applies before spellout. The shared features are given in (1).1

(1) Nanosyntax and DM, shared features:
a. Late Insertion: All syntactic nodes systematically lack all phonological features. The phonological features are supplied - after the syntax - by consulting the Vocabulary Items (lexical entries) in the postsyntactic lexicon.
b. Syntax all the way down: Terminal nodes are organized into hierarchical structures determined by the principles and operations of the syntax. Since terminal nodes correspond to units smaller than words, it follows that syntactic principles govern the internal structure of words. There is no sharp boundary between the traditional syntax and morphology.

*My work on this paper has been supported by the Czech Science Foundation, Grant no. GA17-10144S. I want to thank Karen De Clercq, Michal Starke and Guido Vanden Wyngaerd for their helpful comments on a previous version of this paper.

1However, I have to add that the shared features are somewhat compromised by the fact that I had to change the definitions compared to the wording used in Halle and Marantz (1994) (their literal rendering would make such an agreement impossible). Nevertheless, I hope to have extracted the spirit correctly in an attempt to show what both frameworks share, compared to other approaches, such as, e.g., A-morphous Morphology (see Anderson 1992) or Lexicalist approaches (e.g., Di Sciullo and Williams 1987).

At the same time, there are also differences. The first major difference is that in Nanosyntax, each morphosyntactic feature corresponds to its own syntactic terminal (cf. Kayne 2005, Cinque and Rizzi 2010). I will refer to this as "No Bundling." The second important difference is that there are no post-syntactic operations in Nanosyntax ("No Morphology," see, e.g., Koopman 2005, Kayne 2010). In this article, I shall begin with No Bundling, and explain the idea behind "No Morphology" later on. Let me start by noting that as a consequence of No Bundling, any complex object, e.g., the set of features [1st plural], must be created by (binary) Merge, and must therefore correspond to a syntactic phrase. This position reflects a more fundamental hypothesis about language, namely that any complex grouping of more primitive building blocks should be dealt with within a single generative system. DM does not share this view. From its early days to its most recent incarnations, the input to syntactic computation in DM has included not only terminals with single features, but also complex sets of features, called feature bundles. As Bobaljik (2017) puts it in his recent overview of the theory, the input to the syntactic computation in DM is "a list of the syntactic atoms [...]. Items on this list would include [...] (possibly language-particular) bundles of features that constitute a single node: for example English (plausibly) groups both tense and agreement (person and number) under a single INFL node in the syntax." The reliance on 'syntactically atomic' feature bundles is characteristic of the work done in DM, despite the fact that much of its home-grown research has been devoted to uncovering the rich structure of these bundles (cf. Harley and Ritter 2002).
In fact, this type of work lays bare a fundamental tension inside the DM model: on the one hand, it is becoming increasingly clear that feature bundles have a rich internal structure. On the other hand, this structure cannot be generated by syntax, because it is already in place when syntax starts assembling such bundles together. Feature bundles are a point of concern for Nano. As Starke (2014a) puts it, "a 'feature bundle' is equivalent to a constituent," because "enclosing elements inside square brackets is a notational variant of linking those elements under a single mother node." From this perspective, feature bundles are equivalent to n-ary trees, with n typically greater than 2. Starke further points out that this is equivalent to having a "second syntax" with "a new type of Merge for the purpose of lexical storage." Crucially, the issue of what generative system lies behind the formation of these bundles remains a largely unaddressed question within DM. Nanosyntax, as already mentioned, entirely dispenses with feature bundles. The framework (as a core hypothesis about the architecture of grammar) simply rejects the possibility that feature bundles may be generated outside of the core syntactic computation:

(2) No Bundling (a property of Nanosyntax, but not DM): The atoms (terminal nodes) of syntactic trees are single features. All combinations of morphosyntactic features arise as the result of (binary) Merge. Pre-syntactic feature bundles do not exist; apparent bundles correspond to phrases assembled by syntax.

The elimination of feature bundles eradicates the residue of Lexicalism in the theory of grammar. To see that, consider the fact that feature bundles are language specific (recall the quote from Bobaljik's 2017 introduction to DM). If that is so, then the list of "pre-syntactic building blocks" from which syntactic structures are constructed in DM is also necessarily language specific, as used to be the case in Lexicalist theories.
DM of course differs from Lexicalist theories in many important respects, but the idea that syntax begins from language-particular objects is shared between DM and Lexicalism. In Nanosyntax, this residue of a language-particular presyntactic lexicon is eliminated. What we are left with as the building blocks are universal features. Concerning the inventory of such features, Nanosyntax, in line with Cartography (Cinque and Rizzi 2010, 55), adopts "the strongest position one could take; one which implies that if some language provides evidence for the existence of a particular functional head (and projection), then that head (and projection) must be present in every other language." As a result, there is no trace left of "a language particular list that feeds syntax with building blocks," which completes the shift from a presyntactic Lexicon to a postsyntactic lexicon. The goal of this chapter is to further elaborate on the technical consequences of the differences highlighted above. I will introduce phrasal spellout and spellout-driven movement as the essential theoretical tools which allow the theory to capture a range of data while adhering to the principles introduced above. I will then briefly go through three case studies (on case marking, comparatives and root suppletion), which, I think, provide a good illustration of how the shared features and differences play out in the analysis of particular pieces of data.

2 Constituent spellout

Once feature bundles are dispensed with, it follows that a marker like we, which corresponds to multiple features ([1 PL]), spells out multiple terminals of the syntactic tree (cf. Vanden Wyngaerd 2018). The first question is how to delimit the sets of terminals that may be spelled out by a single marker. This question arises because it is not the case that just about any two terminals may be pronounced together regardless of their position in the tree.
To give an example: a morpheme like we cannot lexicalise the person of the subject and the number of a fronted XP, so that a sentence like In god we trust would be the spellout of a meaning corresponding to In god[-s I] trust. Here the bracket in the second sentence encloses two elements that jointly provide both the feature of the first person and that of a plural, yet their adjacency is not sufficient for joint spellout. So clearly, some restrictions on which features may be lexicalised together must be a part of the spellout mechanism: the sheer reduction of the number of morphemes in a string cannot be the right criterion.2 Starke (2002; 2009; 2014b; 2018), as well as much current work in Nanosyntax, proposes that the theoretically simplest way to define sets of terminals eligible for joint spellout is to rely on a grouping mechanism that is already needed for independent reasons. The sets of terminals that syntax provides for free are (by definition) constituents, and hence, the zero theory is one where morphemes spell out constituents (cf. McCawley 1968, Weerman and Evers-Vermeul 2002, Neeleman and Szendroi 2007, Radkevich 2010 for similar approaches outside of Nanosyntax). Constructing theories along these lines has been the gold standard of work in generative grammar. Consider, for instance, the work done on ellipsis or movement. Here, we also encounter situations where ellipsis/movement targets multiple terminals. The standard way of explaining why several terminals undergo ellipsis/movement jointly is to say that they are all contained in a single constituent, and it is this constituent that actually undergoes ellipsis/movement. Nanosyntax adopts the same methodology (just applying it to spellout) and adheres to the view that when multiple terminals are joined inside a single morpheme, this is so because the morpheme spells out a constituent containing these terminals.
All observable restrictions on joint lexicalisation should follow from this (in the same way as restrictions on 'joint movement' or 'joint ellipsis').

2Note incidentally that phenomena like these are not completely out of this world; see Blix (2016, sec. 6.2) for a discussion of a morpheme of Pazar Laz, which is able to spell out the number of the object alongside the person of the subject, provided they form a unit available for spellout.

In following this logic, Nanosyntax not only adheres to a standard theory-building procedure, but also extends our model of grammar incrementally (rather than changing it completely). To see that, consider the fact that where DM has feature bundles located under a terminal, Nanosyntax has a run-of-the-mill syntactic phrase. However, all terminal nodes of standard DM (where markers are inserted) are still syntactic nodes in Nano; they are just phrasal. This is not so in sequence-based approaches to spellout. These approaches are close to Nano in that a single morpheme may correspond to several terminals. The difference is that in these theories, spellout targets multiple terminals that form "a functional/linear sequence" (Abels and Muriungi 2008, Dekany 2012, Svenonius 2012, Merchant 2015, Haugen and Siddiqi 2016). These approaches thus represent a much more radical departure from the feature constituency used in DM. In such approaches, the feature bundles of DM (targeted by insertion) are no longer constituents at all: they have been replaced by a new type of object, a "sequence."

2.1 Underspecification

Let me now move on to the observation that if we want a theory where morphemes target phrasal nodes, we must define our insertion principles differently from what is usually assumed in DM. The reason is that non-terminal insertion clashes with one of DM's key features (as defined in Halle and Marantz 1994), namely Underspecification, see (3).

(3) Underspecification (a feature specific to DM):
In order for a Vocabulary Item to be inserted in a terminal node, the identifying features of the Vocabulary Item must be a subset of the features at the terminal node.

Underspecification is embodied in the well-known insertion principle used in DM, the Subset Principle, given in an abbreviated form below. Notice that this principle explicitly states that it governs insertion only at terminal nodes.

(4) The Subset Principle (abbreviated, Halle 1997): The phonological exponent of a Vocabulary Item is inserted into a morpheme of the terminal string if the item matches all or only a subset of the grammatical features specified in the terminal morpheme.

Can this insertion principle be broadened in such a way that the Subset Principle could also govern insertion at phrasal nodes? It turns out it cannot. Consider, for instance, the suppletive comparative worse. Bobaljik (2012) proposes that it spells out a non-terminal composed of the root √BAD and an associated cmpr head. Its lexical entry is as shown in (5):

(5) worse ⇔ [CmprP cmpr √BAD]

To allow for insertion of such lexical items, one could simply drop the restriction to terminal nodes from the Subset Principle. This would preserve the spirit of Underspecification and, at the same time, allow insertion into all nodes in general. To reflect the more general nature of insertion sites in such a reformulation of the Subset Principle, given in (6), I will call it the 'Generalised' Subset Principle. The boldfaced parts highlight the modifications introduced in (6) compared to the standard formulation in (4).

(6) The Generalised Subset Principle (a made-up principle that would fail if proposed): The phonological exponent of a Vocabulary Item is inserted into a node if the item matches all or only a subset of the grammatical features specified in the node.

This principle would allow the insertion of worse (5) into a non-terminal built by syntax, as shown in (7).
The tree here depicts the structure built by syntax, and the bracket indicates that the spellout of CmprP is successful, since, by (6), worse matches a (trivial) subset of the features specified inside the CmprP, namely the root √BAD and the cmpr feature.

(7) [CmprP cmpr √BAD] (worse)

However, the Generalised Subset Principle would also allow for the CmprP in (8-a) to be pronounced by a regular (non-suppletive) root like fast, with an entry as given in (8-b). The spellout of the CmprP in (8-a) is allowed because fast is specified for a (proper) subset of the features contained in the CmprP node, specifically for the root √FAST. This is obviously a wrong result, since fast does not have a comparative meaning, and so insertion at CmprP must be blocked in this case. In fact, taking this logic to its extreme, a whole sentence containing the root √FAST at the bottom could be spelled out as fast.

(8) a. [CmprP cmpr √FAST] (fast)    b. fast ⇔ √FAST

There are various ways in which theories based on Underspecification may block fast from spelling out the whole CmprP in (8-a). The main strategy is to augment the Subset Principle by additional principles that restrict Underspecification at non-terminals. For instance, Bobaljik adopts the Vocabulary Insertion Principle proposed by Radkevich (2010) for the case at hand; cf. Feature Portaging in Newell and Noonan (2018) or negative features in Siddiqi (2006). I shall not discuss these theories in any detail here (see Caha 2018a for the discussion of some potential problems), since the general point is exactly this: in order to allow for non-terminal spellout, one needs to re-think how insertion works; Underspecification on its own is not enough. Nanosyntax, rather than proposing additional principles on top of Underspecification, replaces Underspecification with a similar (but inverted) condition, namely Overspecification.
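To make the overgeneration problem concrete, here is a minimal sketch of my own (not from the DM literature) in which nodes and lexical entries are flattened to plain feature sets, abstracting away from tree geometry; all the Python names are invented for the illustration.

```python
# A sketch of the hypothetical Generalised Subset Principle (6).
# Nodes and lexical entries are modelled as plain feature sets here,
# which abstracts away from constituency; names are illustrative.

def gsp_matches(entry_feats, node_feats):
    # (6): an entry may spell out a node if the entry's features
    # are all or only a subset of the node's features.
    return entry_feats <= node_feats

cmprp_bad  = {"cmpr", "BAD"}    # the node [CmprP cmpr BAD]
cmprp_fast = {"cmpr", "FAST"}   # the node [CmprP cmpr FAST]

worse = {"cmpr", "BAD"}         # entry (5)
fast  = {"FAST"}                # entry (8-b)

assert gsp_matches(worse, cmprp_bad)    # intended: worse at CmprP
assert gsp_matches(fast, cmprp_fast)    # unintended: fast matches the
                                        # whole CmprP despite lacking cmpr
```

The second assertion is exactly the wrong result described in the text: nothing in (6) itself stops the underspecified fast from spelling out the comparative node.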
Once Overspecification is adopted, phrasal lexicalisation works with no need for any additional principles.

2.2 Fusion and the architecture of grammar

In mainstream DM, however, insertion is proposed to target only terminals. Trivially, this move eliminates the need to augment Underspecification by any additional principles to deal with phrasal spellout. However, the need to associate worse with multiple terminals then requires something special.3 One possible approach within DM is to introduce a post-syntactic operation called Fusion. What Fusion does is take the relevant non-terminal as an input and turn it into a terminal, see (9).

(9) Fusion: [CmprP cmpr √BAD] → [cmpr, √BAD] (a single terminal)

3It is of course also possible to deny that worse realises multiple terminals, in which case the cmpr marker would be silent. See Caha (2018a) for a discussion of the issues that interact with this decision, specifically, what kind of consequences this has for the so-called *ABA property of paradigms that we will turn to shortly.

The availability of such a proposal rests on a particular architecture of the grammar. In particular, because Fusion does not affect the interpretation of cmpr, Fusion (along with other similar operations) is assumed to take place on a separate branch of the derivation that only affects PF. Still, Fusion happens before the insertion of Vocabulary Items, which means that Vocabulary Items must also be inserted on the PF branch, following Fusion (and, as we shall see, other postsyntactic operations). The overall model is thus as shown in (10) on the left-hand side (a picture taken from Harley and Noyer 1999, slightly simplified).

(10) The architecture of Distributed Morphology (left) and Nanosyntax (right). [Diagram: in DM, syntax operates on feature bundles and splits into a PF branch and a CF branch; the Lexicon (Vocabulary Insertion) is located on the PF branch, preceded by postsyntactic operations such as Merger and Fusion, and a dashed line connects PF directly to CF. In Nanosyntax, syntax operates on single features, and the Lexicon sits at the juncture of syntax, PF and CF, with a dashed arrow leading from the Lexicon back to syntax.]

The point of Vocabulary-Item insertion is labelled as 'Lexicon.'
The label is used because it is here that syntactic features are paired with their pronunciation. (In contrast, I will not use the term Lexicon for the pre-syntactic list of features or feature bundles, since the pre-syntactic list does not contain such pairs, at least in Harley and Noyer 1999.) In DM, the Lexicon is pushed down the PF branch because, as said, it needs to follow Fusion, which does not feed interpretation. This, however, leads to a paradox. To see that, consider the fact that the CF needs to know whether dog or cat has been inserted into the root node. But since the nodes of the syntactic tree are devoid of phonological and conceptual features (recall (1-a)), this information is only present in the derivation after Vocabulary Insertion, i.e., after the 'Lexicon' box. So in order for the CF to know which Vocabulary Item has been inserted, it needs to have access to the stage of the derivation that follows Vocabulary Insertion. This is paradoxical, because the PF/CF split actually precedes Vocabulary Insertion, so if one sticks to the strict Y-model, CF should not be able to see which Vocabulary Item has been inserted. The tension is resolved (in Harley and Noyer 1999) by enriching the model with a direct communication line between the CF and the PF. This is indicated by the dashed line, which bypasses the syntactic derivation.4 Nano rejects post-syntactic operations, replacing Fusion with phrasal spellout. Therefore, the Lexicon is not located down on the PF branch, but appears at the juncture of the three systems (syntax, PF and CF). When a lexical item is inserted at a node, its phonology (e.g., good) is sent to PF, while the corresponding concept (good') is simultaneously sent to CF. No additional direct communication line between the PF branch and the CF is needed.
(The meaning of the dashed arrow leading from the Lexicon back to syntax will become clear in section 4.1.)

4Harley (2014) or Embick and Noyer (2007) provide a different solution to the issue of how conceptual information is passed on to CF (without the need for a direct communication line), which I discuss in section 4.2.

Even though the existence of Fusion and similar operations complicates the architecture, mainstream DM has fully embraced the model with the Lexicon on the PF branch, preceded by a number of operations, some of which are listed in (10). As Bobaljik (2017) puts it, in DM, "the investigation of mismatches between syntactically-motivated representations, and those observed in the morphophonological string" assumes "a central role," and "a variety of devices [=postsyntactic operations] serve together to constitute a theory of possible mismatches." In Nano, the theoretical goal is different, namely to develop a unified theory of syntax and morphology based on No Bundling. On this approach, all such 'mismatches' must be accommodated by updating our syntax. Reference to post-syntactic operations represents a type of solution that would not be considered satisfactory in Nano. Note that deciding whether we do or do not need a Morphological Component of the sort envisaged in DM is not a matter of simple empirical observation (as researchers working in DM sometimes tend to suggest). This is not to say that facts play no role, but it is an indisputable fact that for as long as the theory of syntax is not ready and finished once and for all (accounting for all the facts there are), mismatches between surface forms and syntactic structures are bound to occur: their very existence does not constitute evidence for anything.
The question is rather how we proceed when mismatches are uncovered: do we treat them as illusions, which disappear once syntax is properly set up, or do we accommodate the facts by adding 'mismatch-removing operations' on top of an existing theory? Nanosyntax (along with other approaches) rejects the latter and opts for the former. To express the programmatic resistance of Nanosyntax to a post-syntactic component through a slogan, Caha (2007) suggested that one should not only avoid morphological analysis in the privacy of one's own Lexicon (as Marantz 1997 proposed), but also in the privacy of one's own Morphology.

(11) No Morphology (a feature of Nanosyntax): There is no component of grammar other than syntax that has the power to manipulate syntactic structures. No nodes or features may be added or deleted outside of syntax, and no displacement operations take place outside of the syntactic computation.

As the chapter unfolds, I would like to give the reader a sense of the technology that Nano uses to deal with the scenarios that DM covers by the post-syntactic operations given in (10).

2.3 Overspecification

As the first step towards a model without post-syntactic operations, Nanosyntax replaces all analyses with Fusion by phrasal lexicalisation. To avoid the problems caused by Underspecification, Starke (2009) replaces it with Overspecification, see (12).

(12) Overspecification (a feature specific to Nanosyntax): In order for a lexical item to be inserted in a node, the lexical entry must fully contain the syntactic node, including all its daughters, granddaughters, etc., all the way down to every single feature dominated by the node to be spelled out, and in exactly the right geometrical shape.

Technically, Overspecification is implemented by the so-called Superset Principle:

(13) The Superset Principle (Starke 2009): A lexically stored tree L matches a syntactic node S iff L contains the syntactic tree dominated by S as a subtree.
Once the Superset Principle is in place, non-terminal spellout works as needed (and without the need to add further principles). Specifically, worse (recall (5)) can still spell out the structure [cmpr √BAD] in (7), because the syntactic tree is contained in the lexically stored tree in (5). However, fast (with the entry as in (8-b)) can no longer spell out [cmpr √FAST], because such a syntactic tree is not contained in the entry of fast in (8-b). So all is well. This is an important result. It shows that once Overspecification is adopted, we no longer need to search for additional principles that counteract the effects of Underspecification, and we have a fairly simple insertion rule that in principle applies to all nodes (both phrasal and terminal), an important achievement in the pursuit of No Bundling.

2.4 Elsewhere

The Superset Principle (just like the Subset Principle) sometimes leads to the result that several candidate morphemes qualify for insertion at a particular phrasal node. Consider, for instance, the phrases given in (14) and the lexical entries in (15).

(14) a. [F2P F2 F1]    b. [F3P F3 [F2P F2 F1]]    c. [F4P F4 [F3P F3 [F2P F2 F1]]]

(15) a. α ⇔ [F3P F3 [F2P F2 F1]]    b. β ⇔ [F4P F4 [F3P F3 [F2P F2 F1]]]

What we see here is that the lexical entry for β matches all the structures in (14), because it contains every single one. α matches only (14-a,b), but it does not match (14-c). This means that for (14-a,b), both α and β are candidates for insertion. In such cases, the entries compete, and the so-called Elsewhere Condition (Kiparsky 1973) determines the winner. The Elsewhere Condition says that when two entries compete, the more specific entry wins. In our case, this is α, because it spells out a proper subset of structures compared to β. As a rule of thumb, the more specific entry is the one that contains fewer superfluous features:

(16) The Elsewhere Condition: When two entries can spell out a given node, the more specific entry wins. Under Superset-Principle-governed insertion, the more specific entry is the one which has fewer unused features.
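The interaction of (13) and (16) can be rendered computationally. The sketch below is my own toy implementation, assuming the structures in (14) and the entries in (15); constituents are modelled as nested tuples, and the function names are invented for the illustration.

```python
# A toy implementation (mine, not from the paper) of Superset-Principle
# matching (13) plus the Elsewhere Condition (16), applied to the
# structures in (14) and the entries in (15). A "feature" is a string;
# a constituent is a nested tuple of daughters.

def contains(ltree, stree):
    """(13): lexical tree L matches node S iff L contains the tree
    dominated by S as a subtree."""
    if ltree == stree:
        return True
    return isinstance(ltree, tuple) and any(contains(d, stree) for d in ltree)

def size(tree):
    """Number of features dominated by the tree."""
    return sum(size(d) for d in tree) if isinstance(tree, tuple) else 1

def spell(node, lexicon):
    """(16): among the matching entries, insert the one with the
    fewest unused features; None means no entry matches."""
    matches = [(size(entry), name) for name, entry in lexicon
               if contains(entry, node)]
    return min(matches)[1] if matches else None

f2p = ("F2", "F1")          # (14-a)
f3p = ("F3", f2p)           # (14-b)
f4p = ("F4", f3p)           # (14-c)
lexicon = [("alpha", f3p),  # (15-a)
           ("beta",  f4p)]  # (15-b)

# Reproduces the competition results described in the text:
print([spell(n, lexicon) for n in (f2p, f3p, f4p)])
# → ['alpha', 'alpha', 'beta']
```

Both entries match (14-a,b), but alpha wins there because it carries fewer superfluous features; only beta matches (14-c).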
3 Features, Paradigms and the *ABA

With the basics of insertion in place, let me now turn to how the theory is put to use in modelling morphological paradigms. In a tradition going back at least to Jakobson (1936 [1962]), it has become customary to characterise the cells of paradigms in terms of more primitive units of analysis, namely features, where each cell of the paradigm is defined by a unique set of features. This approach is also widely adopted in DM, where the relevant features usually form a bundle at the relevant terminal. In Nanosyntax, feature bundles are dispensed with, and so each cell in a paradigm corresponds to a constituent containing the relevant features. Consider, for instance, the trees in (14). These trees can be taken to define a paradigm like the one in (17). Specifically, the tree (14-a) contains the features F1 and F2; these correspond to the features that characterise the cell on the first row (Cell 1). The tree (14-b) corresponds to Cell 2 (it contains the same features as Cell 2), and (14-c) corresponds to Cell 3.

(17) An example paradigm

         features          α matches   β matches   insertion
Cell 1   F1, F2            yes         yes         α
Cell 2   F1, F2, F3        yes         yes         α
Cell 3   F1, F2, F3, F4    no          yes         β

Consider now in addition that these cells (each cell representing a particular constituent) can be spelled out by lexical items which contain them. In (15), I have given two lexical entries such that α contains the features of Cell 2 and Cell 1, and β contains the features of all the cells. Such lexical entries therefore match the cells of the paradigm as depicted in the table (17). Where both match, competition arises, with a winner determined by the Elsewhere Condition. The winners are recorded in the final column, and they correspond to the surface paradigm generated by the system introduced in the previous section.
The interest of "translating" the abstract structures in (14) into a paradigm like (17) is that once we realise the possibility of such a "translation," we can also start doing it the other way round. If we succeed, we ultimately reduce surface paradigms (the sequences of αs and βs) to surface manifestations of syntactic structures of a rather familiar kind. A fundamental question in this enterprise is how we come to know, given a set of forms, what order they come in, so that we can then decide what structure they correspond to. An important stepping stone on this path was the investigation of so-called *ABA patterns. ABA (without the asterisk) refers to a pattern where, in a particular arrangement of cells, the first cell and the last cell are the same, while the middle cell is different. When ABA is preceded by an asterisk, this means that such a syncretism is not found. As in any area of science, the goal is to explain why we observe some patterns of behaviour, but never other patterns; so if *ABA is observed in a paradigm, we want the theory to be able to explain this. It can be shown that when the cells in a paradigm are ordered in terms of growing complexity (as in (17)), the *ABA restriction falls out from the theory. Consider the reasoning: in order for Cell 3 in (17) to be spelled out as β, the entry of β must be as in (15-b) (it must contain all the features of Cell 3). Now, by virtue of containing all the features of Cell 3, and because we are dealing with a paradigm of growing complexity (by assumption), β necessarily also contains all the features of Cell 2 and of Cell 1. If that is so, then β also automatically applies in those cells (in virtue of the Superset Principle). Now, since β is in principle applicable in all the cells, the only way for β to fail to spell out the middle cell is for there to be a more specific competitor, α in our case, which outcompetes β due to the fact that it has fewer superfluous features.
Crucially, once we have β in the most complex cell and α in the middle cell, we realize that the least complex cell (Cell 1) can never be spelled out by β (which would yield an ABA type of pattern). To see that, consider the fact that in the setup we have created, both α and β can spell out Cell 1. Further, since α has fewer features than β, α will always win when they compete, including in Cell 1. The conclusion to be drawn here is therefore the following: if it is true that surface paradigms derive from syntactic structures of the sort in (14), we expect such paradigms to exhibit rather stringent restrictions on syncretism. For this reason, the study of *ABA patterns has been an important empirical domain to look at within Nanosyntax.5 In one of the first approaches along these lines, Caha (2009) addressed this issue for case morphology, and found a number of languages where such constraints had been independently observed in the existing literature. Moreover, he argued that the results of such studies can be generalised into a universal linear restriction on syncretism in case, such that in the sequence NOM—ACC—GEN—DAT—INS—COM, only adjacent functions can be syncretic. Leaving the subsequent ramifications of this ordering aside (see Hardarson 2016, Starke 2017, Zompi 2017, Van Baal and Don 2018, Caha 2018b), Caha (2009) proposed that such a constraint can be explained by organising the cells of case paradigms in a cumulative fashion, as depicted abstractly in table (17), so that ultimately, the full case structure looks as given in (18).

5See, e.g., McCreight and Chvany (1991), Plank (1991) or Johnston (1996) for the investigation of *ABA patterns outside of both DM and Nanosyntax. See Bobaljik (2012) for an important discussion of *ABA within DM, which has provided much inspiration for this kind of work.
(18) [ComP F6 [InsP F5 [DatP F4 [GenP F3 [AccP F2 [NomP F1]]]]]]

The proposal embodied in this structure is that the nominative case (corresponding to NomP in (18)) is characterised by the feature F1, the accusative (corresponding to AccP in (18)) is derived from the nominative by yet another feature, etc. The proposal has the effect that the cases NOM—ACC—GEN etc. stand in a containment relation, exactly as the abstract structures in (14). The labels of the phrasal constituents are apparently exocentric, but this is only for clarity: the 'true' label of the nominative is F1P, but this would be a rather opaque label, so the non-terminal nodes carry the name of the case, which is defined by the collection of Fs it dominates (e.g., ACC = [F1, F2]). I am leaving the content of the features aside; see Caha (2013) for some remarks. The novel part of this proposal is not so much the idea that case decomposes into various features; this has been a common stance ever since Jakobson's (1936 [1962]) pioneering work. The novel part is rather that each case feature corresponds to a head of its own, which, recall, is one of the core features of Nano. What is also different from Jakobson is that the features are privative. In sum, the proposal derives the morphological patterns found in case paradigms from a type of architecture that is characteristic of syntactic derivations. By now, a number of researchers working within Nanosyntax have looked at various phenomena through a similar lens, and used *ABA patterns as a tool for uncovering the underlying features and their hierarchical organisation into 'nesting' structures of the type in (18) (see in particular Starke 2009; Pantcheva 2010; De Clercq 2013; Baunaz and Lander 2018a,c; Lander and Haegeman 2018; Taraldsen Medová and Wiland 2018).
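The *ABA deduction rehearsed above can also be checked by brute force. The sketch below is my own sanity check, not a general proof: it only considers lexicons whose entries are drawn from the three cumulative cell structures of (17) themselves, and all names are invented for the illustration.

```python
# A brute-force check (my own sketch) that with cells ordered by
# growing containment as in (17), Superset-driven insertion plus the
# Elsewhere Condition never yields an ABA paradigm.
from itertools import combinations

def contains(l, s):
    # Superset Principle (13): entry l matches node s iff l contains s.
    return l == s or (isinstance(l, tuple) and any(contains(d, s) for d in l))

def size(t):
    return sum(size(d) for d in t) if isinstance(t, tuple) else 1

cell1 = ("F2", "F1")        # Cell 1 of (17)
cell2 = ("F3", cell1)       # Cell 2
cell3 = ("F4", cell2)       # Cell 3
cells = (cell1, cell2, cell3)

def paradigm(lexicon):
    """Spell out all three cells; None if some cell has no match."""
    forms = []
    for cell in cells:
        matches = [e for e in lexicon if contains(e, cell)]
        if not matches:
            return None                  # a paradigm gap
        forms.append(min(matches, key=size))   # Elsewhere Condition
    return forms

# Try every lexicon of one, two or three entries drawn from the
# cell structures themselves, and collect any ABA outcomes:
aba = []
for n in (1, 2, 3):
    for lex in combinations(cells, n):
        p = paradigm(lex)
        if p and p[0] == p[2] != p[1]:
            aba.append(lex)
print(aba)  # → [] : no such lexicon generates ABA
```

AAA, AAB, ABB and ABC paradigms all arise from some choice of lexicon, but ABA never does, mirroring the argument in the text.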
And even though some more recent contributions point out that *ABA patterns can also be derived with non-nesting types of structures (e.g., Caha 2017a or Bobaljik and Sauerland 2018), this does not change the fact that for a number of domains, the existence of *ABA patterns has led to the confirmation of an architecture where the atoms of syntax are not feature bundles, but single features. In DM, the standard treatment of case morphology is different in a way that I think is symptomatic of the larger architectural differences between the frameworks (see Halle 1997; Halle and Vaux 1998; McFadden 2004; Embick and Noyer 2007; Calabrese 2008). Consider, for instance, one specific proposal taken from Embick and Noyer (2007), a state-of-the-art paper on DM. Their feature decomposition, intended to capture the facts of the Latin declension, is given in (19). I have taken the liberty to re-label their ablative as instrumental, since in Latin, the ablative also marks instruments.

(19) Case decomposition in DM

              NOM   ACC   GEN   DAT   INS
  Oblique      -     -     +     +     +
  Structural   +     +     +     +     -
  Superior     +     -     -     +     +

To see the generative power of such a decomposition, consider, for instance, the triplet NOM—ACC—GEN. In this sequence, no ABA pattern is attested in Latin, which is in line with the fact that such patterns are very rare crosslinguistically. In particular, Baerman et al. (2005) report that if one of NOM/ACC is the same as an oblique case (frequently a genitive), it is going to be the accusative and not the nominative.6 However, the decomposition in (19) cannot rule out such ABA patterns, and so it does not allow us to capture the asymmetry reported by Baerman et al. (2005) (cf. McFadden 2017, Smith et al. 2018). Consider the reasoning: In the proposal (19), the three cases under discussion share the feature [+structural], and so any exponent marked for [+structural] can appear in all three cases, yielding an AAA pattern (recall that DM uses underspecification).
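The following sketch illustrates how underspecification operates over the decomposition in (19) (the Vocabulary Item names A and B and the matching logic are a simplified illustration of DM-style insertion, not Embick and Noyer's actual rules):

```python
# Latin-style case specifications from (19), NOM/ACC/GEN only
CASES = {
    "NOM": {"Oblique": "-", "Structural": "+", "Superior": "+"},
    "ACC": {"Oblique": "-", "Structural": "+", "Superior": "-"},
    "GEN": {"Oblique": "+", "Structural": "+", "Superior": "-"},
}

def insert_exponent(case, items):
    # DM-style matching: an item matches if all of its features are
    # present in the case with the same value; the most specific match wins
    matches = [(name, spec) for name, spec in items.items()
               if all(case[f] == v for f, v in spec.items())]
    return max(matches, key=lambda m: len(m[1]))[0]

default_only = {"A": {"Structural": "+"}}
print([insert_exponent(CASES[c], default_only) for c in ("NOM", "ACC", "GEN")])
# ['A', 'A', 'A'] — the underspecified default yields AAA

with_competitor = {
    "A": {"Structural": "+"},
    "B": {"Oblique": "-", "Structural": "+", "Superior": "-"},  # tailor-made for ACC
}
print([insert_exponent(CASES[c], with_competitor) for c in ("NOM", "ACC", "GEN")])
# ['A', 'B', 'A'] — an ABA pattern, which (19) cannot exclude
```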
When such a 'default' AAA pattern interacts with competing entries, ABA patterns emerge. Specifically, because of the decomposition into equipollent features, it is possible to devise tailor-made competitors for each individual case. Suppose, for instance, that NOM has a dedicated case marker, B, specified as [-Oblique, +Structural, +Superior]. Its competition with the underspecified marker A would yield a BAA pattern. However, if B were tailor-made for ACC, competition would yield ABA, and if B were tailor-made for GEN, we would get AAB. This shows that within this particular triplet, any pair of cases can be syncretic, which goes against the generalisation observed in the typological literature.6 Hence, as Caha (2009) argues, the Nanosyntactic proposal that features are privative, and assembled by Merge, is not only theoretically attractive (consistent with No Bundling), it also allows one to capture important generalisations. The standard DM account (as described in Embick and Noyer 2007) has an additional feature that is worth mentioning in this context. Following Marantz (1991) and McFadden (2004), Embick and Noyer (2007) report that in DM, case features are not a part of the syntactic derivation at all. They note: "At PF, case features are added to DPs [...], based on the syntactic structure that the DP appears in. [...] These features are added at PF, and are not present in the syntactic derivation." I will refer to such features introduced post-syntactically as morphological features, 'M-features' for short. The important point is that the postulation of M-features amounts to the introduction of yet another generative component (this time post-syntax), where complex feature bundles can be assembled. 6Baerman et al. (2005) phrase this as a tendency; see Caha (2018b) for a discussion of some counterexamples.
In addition, when case is expressed independently of other categories (as in agglutinative languages), such case features would be introduced in a separate node, also created post-syntactically, as Embick and Noyer (2007) make clear in their footnote 25. I find it difficult to reconcile such an array of structure-building operations with the explicit statement that in DM, "all complex objects, whether words or phrases, are treated as the output of the same generative system (the syntax)" (Embick and Noyer 2007). One can of course always backtrack from specific proposals about case features, but the fact remains that from the perspective of Nanosyntax, the multitude of mechanisms that DM's architecture makes available is "an embarrassment of riches," as Bobaljik (2017) points out. To make explicit the implications of these findings for the general architecture assumed in Nano and DM, I will start by quoting a passage from Embick and Noyer (2007), originally meant as a guideline for comparing Lexicalist and non-Lexicalist approaches. They say: "It is often objected in discussions of non-Lexicalist versus Lexicalist analyses that the patterns analyzed syntactically in the former type of approach could potentially be stated in a theory with a Lexicon. This point is almost certainly correct, but at the same time never at issue. [...] The Lexicalist position, which posits two distinct generative systems in the grammar, can be supported only to the extent that there is clear evidence that Lexical derivations and syntactic derivations must be distinct." DM (compared to Nano) has exactly the same issue of multiple systems that can generate (or minimally provide) complex objects: (i) pre-syntactic feature bundles, (ii) syntax, (iii) feature bundles constructed at 'PF,' (iv) nodes inserted at PF.
So if the reasoning quoted above is followed consistently, it must be concluded that Nano has an architectural advantage of exactly the same sort that differentiates between Lexicalist and non-Lexicalist approaches. In particular, to the extent that feature bundles can be generated by syntax (corresponding to vanilla-flavour syntactic constituents), they should not be drawn from a pre-syntactic list or created at PF.

4 Cyclic spellout

Another important feature of current work in Nanosyntax is cyclic spellout (cf. Starke 2018, Baunaz and Lander 2018b, Caha et al. 2019a).

(20) Cyclic spellout: Spellout must successfully apply to the output of every Merge F operation. After successful spellout, the derivation may terminate, or proceed to another round of Merge F, in which case a new round of spellout is initiated, and so on.

Cyclic spellout plays a central role in current Nanosyntactic thinking, providing the basis of an account for a number of phenomena including idioms, root suppletion and affix ordering. To see how cyclic spellout works, let us assume the very same toy scenario that we have been working with in (14), only enriched by the idea of cyclic spellout. Suppose then that syntax merges F1 and F2, forming F2P:

(21) [F2P F2 F1 ]

After Merge F has applied, spellout applies. Spellout means that the lexicon is searched for an item matching the phrase in (21). In our toy scenario, F2P is contained in the lexical entry for both α, recall (15-a), and β, recall (15-b). Recall also that α wins against β due to the Elsewhere Condition. Spellout is therefore successful. After the successful application of spellout at F2P, the lexicalisation procedure remembers minimally that F2P can be lexicalised by α. If no more features are added (we are finished constructing the intended meaning), the derivation terminates and F2P will ultimately be pronounced as α. However, if we want to add more meaning, the derivation continues—without being immediately pronounced.
Suppose it continues, and that F2P is fed back to syntax for an additional Merge F operation. The result is that F3 is added, and spellout applies to the F3P depicted in (22). In (22), the tree contains the information (accessible to the lexicalisation procedure, not to syntax) that F2P has been matched by α at the previous round of spellout.

(22) [F3P F3 [F2P (α) F2 F1 ]]

When a tree like (22) is fed to spellout, we again find two possible matches for F3P, namely α and β, with α again as the winner. The match of α at F3P is remembered, and all previous matches inside F3P are forgotten (overridden). Should no more features be added, F3P would be pronounced the same as F2P, namely by α. If we want to add more meaning, (22) is fed back to syntax again, and F4 is added, producing F4P as shown in (23).

(23) [F4P F4 [F3P (α) F3 [F2P F2 F1 ]]]

Once again, at spellout, the product of Merge F (namely F4P) must be matched against a lexical entry. This time, only the lexical entry for β is a match. It is thus remembered as the spellout of F4P, and α is overridden. If no more features are added, F4P is pronounced as β. Note that lexical entries containing trees do not duplicate syntax in any way (which is similar to saying that lexical entries containing phonology do not "duplicate" phonology). The purpose of the lexicon in Nanosyntax is to link syntactic representations (trees) to representations legible by phonology (sound) and by the conceptual system (meaning). Note finally that cyclicity here is neither the same as the notion of a phase as currently entertained in the syntactic literature (e.g., Chomsky 2001), nor is it meant as its replacement. The core of Chomsky's proposal is that some phrasal nodes are special and correspond to phases, while other phrasal nodes are ordinary and do not correspond to phases. Cyclic spellout treats all phrasal nodes alike.
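The toy derivation in (21)–(23) can be rendered as a simple loop (a minimal sketch; the feature names F1–F4 follow the text, and α and β carry the entry sizes assumed in (15)):

```python
# Lexical entries as feature sets: alpha spells out up to F3P,
# beta up to F4P, as in the toy scenario
LEXICON = {"alpha": {"F1", "F2", "F3"}, "beta": {"F1", "F2", "F3", "F4"}}

def match(features):
    # Superset Principle + Elsewhere Condition (the smaller entry wins)
    candidates = [(name, entry) for name, entry in LEXICON.items()
                  if entry >= features]
    return min(candidates, key=lambda c: len(c[1]))[0] if candidates else None

current = {"F1"}
spellouts = []                       # the 'memory' of the lexicalisation procedure
for f in ("F2", "F3", "F4"):         # successive Merge F operations
    current = current | {f}
    spellouts.append((f + "P", match(current)))  # spellout at every cycle

print(spellouts)
# [('F2P', 'alpha'), ('F3P', 'alpha'), ('F4P', 'beta')]
# each later match overrides the earlier ones, so the full
# structure is ultimately pronounced as beta
```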
Phases in Chomsky's sense are not a part of the standard Nano toolbox, but they could easily be added; there is no logical incompatibility between cyclic spellout (cyclic lexical lookup) and the idea that some phrasal nodes are special (for instance, in that actual shipping to PF/CF occurs there). There are two empirical domains where cyclic spellout plays an important role. I now visit them in turn.

4.1 Spellout-driven movement

When spellout at a newly formed FP fails, spellout-driven movements take place. The goal of these movements is to create a configuration where the spellout of FP succeeds.

(24) Spellout-driven movement: When Merge F produces an FP that cannot be spelled out (no lexical item matches the FP), the FP is rejected at the interface. Syntax then tries to rescue the structure by performing one in a predefined hierarchy of movement operations, before it sends the structure for spellout again. A variety of spellout-driven movement operations may apply until spellout at FP succeeds. Once lexicalisation succeeds, the derivation either terminates or continues by Merge F.

Spellout movements are different from standard feature-driven movements in that they have no effect on interpretation, they are strictly local (inverting the order of two adjacent phrases), and they show no reconstruction effects (there is no evidence for two interpretation sites). Within Nanosyntax, spellout-driven movement is used as a replacement for traditional head movement as well as for DM's Merger. The machinery is also related to the U20 type of movements proposed in Cinque (2005). However, wh-movement, focus movement, etc. are of a different kind and contrast with spellout-driven movement on all three properties given above (they affect interpretation, they can cross multiple phrases, and they show reconstruction effects). The algorithm for spellout-driven movement is given in (25).
It is basically a version of the algorithm as presented in Starke (2018), and I will explain its workings step by step.

(25) Spellout Algorithm
a. Merge F and spell out.
   (i) If (a) fails, try spec-to-spec movement of the node inserted at the previous cycle, and spell out.
   (ii) If (a.i) fails, move the complement of F, and spell out.
b. If (a.ii) fails, remove F from the main workspace. Start a new workspace, and build a phrase containing F that can be spelled out. Once done, Merge that phrase back with the main projection line.

In the previous section, we have already informally talked about how (25-a) works. What we shall now see in more detail is what happens when spellout fails. I am going to illustrate the algorithm on a fragment of data discussed in Caha et al. (2019a) (CDV henceforth). Once the system is introduced, I show that it is capable of replicating derivations that have been treated by post-syntactic Merger in DM. The goal of CDV's paper is to capture alternations in comparative marking in Czech, English and other languages. Beginning with Czech, the basic contrast is illustrated in (26).

(26) Czech comparatives

  FULL MARKING                     REDUCED MARKING
  POS      CMPR         GLOSS      POS      CMPR        GLOSS
  chab-ý   chab-ějš-í   'weak'     slab-ý   slab-š-í    'weak'
  kulat-ý  kulat-ějš-í  'round'    bohat-ý  bohat-š-í   'rich'
  jist-ý   jist-ějš-í   'certain'  tlust-ý  tlust-š-í   'fat'

The table shows two different classes of comparatives in Czech. The first class—which is the productive one—can be seen on the left; it forms comparatives with the suffix -ějš. The marker appears in a position preceding the final agreement marker -í (obligatory in Czech). The second class uses a reduced marker to the same effect, namely -š. There does not seem to be any straightforward way of deciding which adjective forms which type of comparative; this seems to be to a large extent an arbitrary property of a particular root (though frequency and phonology play a role, see Křivan 2012).
In what follows, I will use the examples in the first row (both meaning 'weak') to illustrate the workings of the theory. CDV give various reasons to believe that the non-reduced marker -ějš decomposes into -ěj and -š, where the latter marker is shared between the two comparatives. For instance, the comparative adjective chab-ěj-š-í 'weaker' has a corresponding comparative adverb chab-ěj-i, which lacks the -š, suggesting that -š has an independent life of its own. The two classes thus differ as shown below, with the final agreement omitted:

(27) Two classes of comparatives
a. √ -ěj -š
b. √ -š

Taking the bi-morphemic nature of the comparative in (27-a) as a starting point, CDV propose that in the morphosyntactic structure, two comparative projections must be present, where the lower one is pronounced as -ěj, and the higher one as -š (cf. Caha 2017b, De Clercq and Vanden Wyngaerd 2017). This is schematically depicted in (28-a), where the comparative markers appear on top of a QP. QP corresponds to a gradable adjective, and decomposes into the gradability head Q and the property head A (not shown in the tree).

(28) a. [Cmpr2P Cmpr2 [Cmpr1P Cmpr1 [QP ]]]
     b. [Cmpr2P Cmpr2 [Cmpr1P (slab) Cmpr1 [QP ]]]

In this setting, (28-b) encodes the proposal that Class 2 adjectives lack Cmpr1 -ěj because their roots spell out a phrasal projection that includes Cmpr1 as well as the QP. The structures are simplified, and I elaborate on them further below. However, what can be seen right away is that the two classes of roots can be easily distinguished in the lexicon as follows:

(29) a. chab ⇔ [QP ], WEAK
     b. slab ⇔ [Cmpr1P Cmpr1 [QP ]], WEAK

With such entries, both roots can spell out the QP (due to the Superset Principle), and appear as such (without any affixes) in the positive degree (which corresponds precisely to the QP). In the comparative, a difference shows up. The chab root, given in (29-a), still spells out QP only, and needs additional affixes to express Cmpr1 (-ěj) and Cmpr2 (-š).
The slab root, however, is able to spell out Cmpr1 on its own, and combines only with Cmpr2 -š. It is interesting to note that this way, we state the selectional requirements between the root and the particular comparative suffix using the variable size of the lexical tree associated with the root. There is no need for a statement of the sort 'this root combines with -š' or 'this root combines with -ěj-š;' such combinatorial statements simply fall out from the lexical difference in the size of the tree associated with the two classes of roots.7 Let me now describe how exactly spellout works in Class 1 (non-reduced marking). The derivation begins by forming a QP. Such a QP is contained in both lexical entries in (29), and so both roots can be inserted. I am assuming here (following CDV) that the choice of the root is free, and not subject to Elsewhere reasoning (cf. Harley and Noyer 1999). Suppose that the root in (29-a) (chab) is selected. QP is thus successfully spelled out, and the derivation continues by adding Cmpr1, forming Cmpr1P. When this happens, the structure is again sent for spellout. The structure now looks as in (30-a):

(30) a. [Cmpr1P Cmpr1 [QP (chab) ]]
     b. [ [QP (chab) ] [Cmpr1P Cmpr1 ]]

(30-a) cannot be spelled out by the root chab (its lexical entry does not contain Cmpr1), and so the output of Merge F (Cmpr1P) ends up without a spellout. The structure is therefore rejected at the interface, and Merge F cannot continue. A repair by spellout-driven movement is therefore attempted. The various options of this movement are always applied in the succession given in (25) (no look-ahead as to whether a particular step will succeed or not). The first option is moving the Spec of the complement. 7This is an interesting proposal for allomorphy in general, and there is ongoing work that investigates this option (see, e.g., Holaj 2018), but I leave this aside here for reasons of space.
For our case, this entails that the movement of Spec,QP should be tried first, but QP has no movable Spec in (30-a), so this option is skipped. The next option down the list is the movement of the full complement, which is the QP. The output of such a movement is shown in (30-b). Note that QP leaves no trace inside Cmpr1P. According to Starke (2018), this is a general property of spellout-driven movement, and this is also how it differs from, e.g., wh-movement (recall that spellout-driven movement, unlike wh-movement, never shows any reconstruction effects, and so there is never any evidence for two different interpretive positions). After movement, the spellout of Cmpr1P is tried again. Cmpr1P now lacks the QP inside, and so there is a chance that lexicalisation succeeds. We know that it does in Czech, inserting the marker -ěj. Its lexical entry according to CDV is therefore as shown in (31). It is easy to see that this entry perfectly matches the Cmpr1P in (30-b). (Recall that it is the lower Cmpr1P that has been created by Merge F, and it is therefore this lower node that undergoes spellout.)

(31) -ěj ⇔ [Cmpr1P Cmpr1 ]

At this stage, the structure (30-b) is successfully spelled out, and if no more features are added, it would be pronounced as the sequence of chab and -ěj. Note that as a result of spellout-driven movement, Cmpr1P follows the QP, so on the surface, -ěj follows chab. The suffixal nature of -ěj is determined by the shape of the lexical tree it is associated with in (31). The tree has just a single feature dependent on the lowest phrasal projection Cmpr1P. In a model like that of Chomsky (1994) (Bare Phrase Structure), such a configuration only arises in syntax when the second daughter of Cmpr1P extracts. And since movement is only to the left (as in Kayne 1994 and many others), this means that -ěj will only ever be inserted as a suffix. The derivation now proceeds by merging Cmpr2 on top of (30-b), with the result shown in (32).
This constituent cannot be spelled out as is, triggering spellout-driven movement. According to the spellout algorithm, the first operation that must be tried is the movement of the Spec of Cmpr2's complement. This phrase corresponds to the QP, and so the QP is moved out, with the result in (33). In Czech, there is no marker to spell out the Cmpr2P thus formed (containing the features Cmpr1 and Cmpr2), and hence, spellout fails.

(32) [Cmpr2P Cmpr2 [ [QP (chab) ] [Cmpr1P (-ěj) Cmpr1 ]]]

(33) [ [QP (chab) ] [Cmpr2P Cmpr2 [Cmpr1P (-ěj) Cmpr1 ]]]

When (33) is rejected, the next option to be tried is complement movement. We start from the original Merge F structure (32); the complement of the newly added F is moved, producing (34). CDV propose that the lexical entry for -š is as in (35). This lexical item matches the Cmpr2P in (34) out of which the phrase [chab-ěj] had extracted, and so spellout succeeds, producing the correct sequence of morphemes chab-ěj-š.

(34) [ [ [QP (chab) ] [Cmpr1P (-ěj) Cmpr1 ]] [Cmpr2P Cmpr2 ]]

(35) -š ⇔ [Cmpr2P Cmpr2 ]

Note that this way, Nanosyntax replicates a roll-up movement type of derivation without the need to postulate the usual 'movement' features on particular heads. In this theory, mirror-image orders (in the sense of Baker 1985) arise as a result of the interaction between the spellout algorithm and the tree shape of lexical entries (Starke 2014b). I will now briefly show how reduced comparative marking arises in this theory. The derivation starts again by assembling a QP, which can be spelled out by slab, because it is contained in its entry, recall (29-b). If we wanted to produce the positive, the derivation would end here, producing just slab, to which agreement would be added. In the comparative, when Cmpr1 is added on top of such a QP, see (36-a), spellout succeeds without any movement, because exactly such a Cmpr1P is contained in the lexical entry of slab in (29-b).
Cmpr1P is thus successfully spelled out, and the derivation continues by merging Cmpr2 on top, see (36-b).

(36) a. [Cmpr1P (slab) Cmpr1 [QP ]]
     b. [Cmpr2P Cmpr2 [Cmpr1P (slab) Cmpr1 [QP ]]]

This time, spellout fails, and spellout-driven movement is triggered. The first thing to be tried is Spec movement. However, the complement of Cmpr2 in (36-b) has no Spec, and so this option is skipped. Complement movement is tried next, producing the structure (37), which correctly spells out as slab followed by -š:

(37) [ [Cmpr1P (slab) Cmpr1 [QP ]] [Cmpr2P Cmpr2 ]]

An important observation is that for different roots, the spellout algorithm produces different tree shapes; compare (37) with (34). The choice of a particular root thus has a certain (limited) power to steer the derivation in a particular direction. For example, the lexical item does not influence the sequence in which features are merged, but it does influence how the features are linearly ordered (Cmpr1 either precedes or follows the complement). This turns out to be useful in extending this theory to English, focussing on the alternation between more intelligent and smart-er. CDV build their analysis around the fact that the two markers differ in terms of complexity. In particular, -er is simpler than mo-re (it spells out fewer features). The complexity of more can be understood literally, with more segmented as mo-re. However, even in the absence of surface decomposition, the interpretation tells us that more is the comparative of much, which in Nano necessarily means that more (which is minimally [CMPR MUCH]) must express more features than -er (CMPR). Importantly, each feature must correspond to a syntactic head. CDV implement these observations as follows.
First of all, they interpret adjectives like smart-er as exactly parallel to the Czech reduced comparatives like slab-š 'weaker,' with the final structures as in (38), where spellout-driven movement has moved Cmpr1P out of Cmpr2P in the way described for (37). Because of this parallel, the lexical entry of smart will be like that of slab in (29-b), i.e., associated with the full Cmpr1P.

(38) a. [ [Cmpr1P (slab) Cmpr1 [QP ]] [Cmpr2P (-š) Cmpr2 ]]
     b. [ [Cmpr1P (smart) Cmpr1 [QP ]] [Cmpr2P (-er) Cmpr2 ]]

Now we know that adjectives like intelligent do not combine with -er. CDV encode this by associating such adjectives with a QP only, see (39). This yields *intelligent-er, since the two pieces (intelligent and -er) do not spell out all the features of the comparative (they lack Cmpr1).

(39) intelligent ⇔ [QP ]

(40) more
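The two Czech derivations discussed above can be summarised in a deliberately simplified, linearized sketch. Trees are flattened to a morpheme list: 'extending' the last morpheme stands in for spelling out the whole newly merged FP in place, and appending a suffix stands in for complement movement followed by insertion at the evacuated FP (the data structures and the feature names Q, C1, C2 are illustrative simplifications, not CDV's implementation):

```python
# Lexical entries as feature sets (cf. (29), (31), (35));
# C1 stands for Cmpr1, C2 for Cmpr2
ROOTS = {"chab": {"Q"}, "slab": {"Q", "C1"}}
AFFIXES = {"ěj": {"C1"}, "š": {"C2"}}

def derive(root, features):
    # each output morpheme is a pair (exponent, features it spells out)
    word = [(root, {"Q"})]                  # positive degree: just the QP
    for f in features:
        exponent, span = word[-1]
        entry = (ROOTS | AFFIXES)[exponent]
        if entry >= span | {f}:             # spell out the whole FP in place
            word[-1] = (exponent, span | {f})
            continue
        # otherwise, complement movement evacuates the spelled-out material
        # and a suffix is inserted at the remnant FP
        suffix = next((a for a, e in AFFIXES.items() if e >= {f}), None)
        if suffix is None:
            raise ValueError(f"spellout fails at {f}")
        word.append((suffix, {f}))
    return "-".join(morpheme for morpheme, _ in word)

print(derive("chab", ["C1", "C2"]))   # chab-ěj-š  (full marking)
print(derive("slab", ["C1", "C2"]))   # slab-š     (reduced marking)
```

The only difference between the two derivations is the size of the root's lexical entry, mirroring the point made above: which suffixes appear falls out from the lexicon, not from selectional statements.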