Universals of Human Language

CONTRIBUTORS: Alan Bell, D. N. S. Bhat, Dwight Bolinger, John Crothers, Charles A. Ferguson, Thomas V. Gamkrelidze, Joseph H. Greenberg, Larry M. Hyman, Ian Maddieson, Merritt Ruhlen, Russell Ultan, Marilyn May Vihman

Edited by Joseph H. Greenberg
Associate Editors: Charles A. Ferguson & Edith A. Moravcsik

VOLUME 2: Phonology

Stanford University Press, Stanford, California, 1978

Some Generalizations Concerning Initial and Final Consonant Clusters

JOSEPH H. GREENBERG

ABSTRACT

Based on a sample of 104 languages, 40 universals regarding initial and final clusters are formulated. These fall into five main groups: 1) the marked status of clusters as such; 2) preferred types of assimilation; 3) preferred types of dissimilation; 4) preferences based on the relation to the peak of the syllable; 5) preferences for certain consonantal types over others not dependent on factors involved in groups 2), 3) and 4).

This article is a somewhat revised version of one originally published in Russian in Voprosy Jazykoznanija (1964) 4. 41-65. An English version appeared subsequently in Linguistics 18. 5-34 (1965). It was subsequently utilized and commented on in Charles E. Cairns, Markedness, neutralization and universal redundancy rules, Language 45.4. 863-85 (1969); Doris L. Pertz, Sensitivity to phonological universals in children and adults, doctoral dissertation, Columbia University (1973); and Doris L. Pertz and Thomas G. Bever, Sensitivity to phonological universals in children and adolescents, Working Papers on Language Universals 13. 69-90 (December 1973). I am particularly indebted to Doris L. Pertz for critical comments. The only substantial change incorporated into this version is owing to her. Former universal 34, asserting that the existence of at least one initial cluster of liquid + nasal implies at least one of liquid + obstruent, has been withdrawn for lack of sufficient empirical evidence, and subsequent universals have been renumbered.
The original study was supported by the National Institute of Health (M8 0739-01) and by a grant in aid of my personal research from the Behavioral Sciences division of the Ford Foundation. This assistance is gratefully acknowledged.

In the present study, which is of a preliminary nature, a number of generalizations are proposed regarding initial and final consonant sequences, based on a sample of approximately one hundred languages. All assertions made here are to be understood as not claiming any validity beyond this sample. It is, of course, reasonable to conclude that, although exceptions are to be expected with further investigation, they should be few in number and that, therefore, at least a statistical validity for the statements made here can be claimed.

Most previous study of consonant clusters has been related more or less directly to the question of the possible functional definition of phonemes in terms of their behavior in combinations. Almost the only attempt to generalize about the characteristics of consonant clusters is to be found in an article of Trnka, which was subject to a critique by Trubetskoi.¹ The most important suggestion of Trnka in relation to the present paper is that "...phonemes differentiated by a mark of correlation never combine in the same morpheme..." As indicated by Trubetskoi, this is not true as a completely general statement since, for example, the sequence of nasal and voiced homorganic stop is not only a permitted but even a favored combination. It does, however, hold for some features under certain conditions. The generalizations 11, 12, and 13 below are specific cases of this principle. Trnka also pointed out the incompatibility of combinations of two different kinds of sibilants, reflected in generalization 16 of the present paper. A further point in Trnka's discussion which has proven useful is his appreciation of morpheme boundaries as allowing combinations forbidden internally in the morpheme.
This factor is involved in the statement of a number of the universals presented here.

The only other suggestion which has proven useful for this study is that of Hjelmslev in regard to resolvability, that is, the principle that longer consonant sequences in general contain as partial sequences shorter ones which are likewise occurrent. This point is incorporated in generalization 2.

¹ B. Trnka, General Laws of Phoneme Combinations, Travaux du Cercle Linguistique de Prague 4. 75-61 (1931) and N. Trubetskoi, Principes de phonologie (Paris, 1949), 264-8. It is indicative of how little has been done in this area that no universals of consonant combinations figure in the general table of universals to be found in B. Uspenskii's review of Universals of language (Cambridge, 1962) in Voprosy Jazykoznanija 5. 121-9 (1963).

² L. Hjelmslev, On the principles of phonematics, Proceedings of the Second International Congress of Phonetic Sciences (1936), 49-54.

The languages employed in the present study are listed in the appended bibliography with a numerical designation for convenience of reference in the text of the article and with the chief sources employed for each language. Where I consider the material to be significantly defective in completeness or in phonetic information, this has likewise been indicated. It is clear that even incomplete information can yield relevant evidence for some generalizations, while being insufficient for others. Thus, a generalization that a lateral is never followed by a vibrant ("r" sound) can be refuted by a single valid example from a description which is seriously incomplete, while such a description is insufficient to refute a generalization of other types, e.g. that every language which has an initial obstruent followed by a nasal also has some combination of initial obstruent followed by a liquid.
In such a case we might find /kn/ among the clusters reported but no examples such as /kr/ because of the insufficiency of the material.

The exclusion of medial clusters was dictated chiefly by practical considerations. The number of such combinations is often very large. Further, many of the sources utilized in the present study contain statements regarding initial and final clusters only. The study of medial clusters also raises some theoretical problems not present in the case of initial and final clusters. For example, in languages with syllabic initial and final single consonants or clusters, except for possible word-sandhi phenomena, the medial clusters produced at word boundaries are in general predictable from initial and final combinations. Such clusters should evidently be distinguished from those which are word-internal and which may or may not be present in languages independently of the question as to whether word-boundary clusters exist. Again, morpheme-boundary and morpheme-internal clusters should be distinguished among word-internal clusters. For these reasons a study of medial clusters would be much more elaborate and difficult to undertake from existing data.

Preliminary to setting forth specific results, it will be necessary to consider a few problems of theoretical method. These are only briefly discussed, insofar as they affect the procedure employed in the present study. The very phrase 'consonant cluster' raises definitional problems which have to be decided in order to compare consistently data from different languages.

A first question, then, concerns the definition of consonant. In accordance with the usual notion, consonant is understood here in terms of function in the syllable, i.e. as a margin rather than peak. This will mean, however, that the syllabic and non-syllabic allophones of the same phoneme, e.g. [u], [w] in some languages, will have only the latter allophones reckoned as consonants.
Strictly speaking, if there are both consonant and vowel allophones of the same phoneme, then this distinction is an irrelevant feature for the phoneme concerned at the same time that it is a central question in the present study.

It is clear that this same basic consideration arises at other points. Consider, for example, the generalization stated below as number 5. This asserts that the presence of final heterorganic nasal + obstruent combinations in any language implies the existence of homorganic combinations. Now many languages have final combinations which are phonetically [ŋk] or [ŋg], where [ŋ] is to be considered as a member of an /n/ phoneme containing [n] and [ŋ] among its variants. A naive reading of a phonemic transcription /nk/ would lead to its classification as a heterorganic combination. Strictly speaking, however, since in this case the difference between dental-alveolar and velar position is a non-distinctive feature, we have no right to classify it as either heterorganic or homorganic on a phonemic basis. Yet a classification of the sounds on a phonetic basis allows us to compare languages and draw general conclusions.³ Further, it is evidently the same tendency operating in different languages which leads to this particular allophonic distribution of [n] and [ŋ] in one language, while in another, which contains distinct /n/ and /ŋ/ phonemes, it leads to a preference for sequences like /nd/ and /ŋg/ while it disfavors /ng/ and /ŋd/.

Since allophonic information is, for the reasons just adduced, essential to the present study, statements of phonemic combinations in the literature had to be supplemented by phonetic descriptions in the same or other works. For this reason also, data from languages of the past were not included in the sample of 104, although in certain instances such languages were taken into consideration when the absence of phonetic detail did not seriously affect a particular hypothesis.
³ What is said here is in close agreement with the view of E. Fischer-Jørgensen that "...the tendencies to free combinations or to definite restrictions between different parts of the syllable seem to be more easily formulated when the parts of the syllable are defined on a phonetic basis." (On the definition of phoneme categories on a distributional basis, Acta Linguistica 7. 8-39 (1952)).

The other part of the phrase 'consonant cluster' also raises difficulties. It is well known that for certain classes of sounds, the decision as to whether we have a cluster or succession of phonemes as against a single phoneme has not produced a usable unarbitrary criterion which meets with general consent. Thus, in languages with the phonemic contrast of unaspirated and aspirated consonants, the alternative solution as a single phoneme, e.g. /pʰ/, or a cluster /ph/ depends on considerations of symmetry or "pattern" in the sound system in general, which often leads to individually different solutions even for the same language. It seems unavoidable, for purposes of valid comparison among languages, that one must make a decision in such matters which, even though it may be arbitrary, will be consistently applied.

In general the sequences at issue are well characterized in N. Trubetskoi's classic work on phonology as "produced by a single articulatory movement or by means of a progressive dissociation of an articulatory complex."⁴ In the former of these cases, that of the affricates, I have considered the articulation to be a cluster of stop + fricative. The latter have all been considered single consonants. These include aspirated, glottalized, labialized, palatalized, velarized, and pharyngealized sounds. The sequence nasal + homorganic voiced stop, treated as a single phoneme in some languages by some analysts, e.g. FIJIAN, is here always treated as a cluster.
⁴ N. Trubetskoi 1949: 58.

A further problem of definition arises regarding the terms 'initial' and 'final.' In principle, initial and final in the utterance is intended. Most studies are in terms of word initial and final, which generally comes to the same thing as utterance initial and final. Where there are word sandhi rules, however, only the utterance initial or final forms are considered in the formulation of generalizations. Where word boundaries occur between members of a consonantal sequence which is actually or potentially utterance initial or final, the entire sequence is viewed as a valid cluster. Thus, RUSSIAN v dome, 'in the house,' is considered as having an initial consonant cluster [vd].

A particularly vexing question concerns the treatment of clusters in borrowed words. The line between forms recent enough to be considered borrowings and those which can be considered fully assimilated into the language is difficult to draw. Moreover, some of the studies utilized distinguished borrowed from native clusters, while others did not. As far as possible this distinction was made in compiling the material from the original sources where given, from etymological dictionaries, or from my own knowledge. However, since such data were not obtained from all of the languages, it is to be understood that, as a general rule, borrowed clusters are included along with indigenous clusters in the statements of the present article. While the exclusion of combinations in borrowed forms would change the typological assignment of certain languages in the tables below, in no case would such a change have been sufficient to invalidate a hypothesis.

A further question concerns the distinction between clusters which appear only with contained morpheme boundaries and those which do not. Again an attempt was made to record this distinction as far as possible, but in some instances the information at my disposal was not sufficient to resolve this question.
In several instances where the relevant evidence was sufficient, the existence of morpheme boundaries figures in the statement of generalizations.

Finally, there is the question as to what particular variety of a language is intended, or even which speech tempo since, in many instances, particular consonant sequences which occur in slower and careful speech are contracted or assimilated in more rapid or more colloquial instances. In general, I have treated so-called "standard" languages simply because they have usually been more carefully described from the phonetic point of view. In general, the source cited in the bibliography will give sufficient indication of the particular variety of a language which is being considered. For HINDI the RANKHANDI dialect was used and for KAREN, SGAW. On the question of speech styles, I have in general utilized the lento forms as against the allegro, but I have recorded data concerning such variation where they were present. It is plausible to consider that allegro forms give important insight into the identification of "difficult" and less-favored sequences and into the direction of historical change. However, their systematic treatment has been left for further investigation.

In general it proved useful to distinguish between initial and final clusters as separate systems with distinct though often similar properties. In the sample of 104 languages there were found to be 90 initial systems and 62 final systems. Although the possibilities of certain connections are not to be excluded, in general initial and final systems seem to function independently and it was not possible to formulate any generalizations connecting them.

The first set of hypotheses concerns properties of initial and final systems which correlate with the length of the sequences.

1.
For initial and final systems, if x is the number of sequences of length m and y is the number of sequences of length n and m > n, and p is the number of consonant phonemes, then $x / p^{m} \le y / p^{n}$.

In other words, the proportion of the logically possible combinations utilized decreases or remains the same with increasing length of the sequences. This may be illustrated for ENGLISH initial clusters as follows: the number of consonant phonemes is 22. All of these except /ž/ and /ŋ/ occur as single phonemes. The logically possible sequences of length 2 are $22^{2} = 484$. Of these 28 occur. For length 3 the logically possible number of combinations is $22^{3} = 10{,}648$. Of these only 8 occur. No sequences of length greater than 3 are found. Hence,

$\frac{20}{22}\ (L=1) > \frac{28}{484}\ (L=2) > \frac{8}{10{,}648}\ (L=3) > 0\ (L=4) = 0\ (L=5)$, etc.

It will be noted that the absolute number of combinations of length 2, i.e. 28, is greater than those of length 3, i.e. 8, etc. However, in the limiting case L = 1, in this as in many other instances the number of combinations of length 2 is greater than that of length 1. We can therefore make the following statement regarding absolute length:

2. For initial and final systems, if x is the number of sequences of length m and y is the number of sequences of length n, and m > n and n ≥ 2, then $x \le y$.

The statement in the "Memorandum concerning language universals," in Universals of language, that "If syllables containing sequences of n consonants in a language are to be found as syllabic types, then sequences of n - 1 consonants are also to be found in the corresponding position (prevocalic or postvocalic) except that CV → V does not hold," can be deduced as a corollary from either 1 or 2 above, insofar as it refers to word initial and final as a special case of syllabic initial and final.⁵ Further, 1 and 2 make no assertion concerning L = 0 since in this case there is no question of combinations.
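The ENGLISH illustration of generalizations 1 and 2 can be checked mechanically. The following is a minimal sketch in Python, not part of the original study; the function names are hypothetical, and the only data used are the figures quoted in the text (22 consonant phonemes, all but two occurring singly, 28 two-member and 8 three-member initial sequences, none longer).

```python
from fractions import Fraction

# Figures quoted in the text for ENGLISH initial systems.
CONSONANT_PHONEMES = 22
# length -> number of occurring sequences (length 1: 22 minus the 2 that do not occur)
OCCURRING = {1: 20, 2: 28, 3: 8, 4: 0}


def proportions(occurring, inventory_size):
    """Share of the logically possible sequences actually used, per length."""
    return {
        length: Fraction(count, inventory_size ** length)
        for length, count in sorted(occurring.items())
    }


def satisfies_generalization_1(occurring, inventory_size):
    """Generalization 1: the proportion never increases as length increases."""
    values = list(proportions(occurring, inventory_size).values())
    return all(a >= b for a, b in zip(values, values[1:]))


def satisfies_generalization_2(occurring):
    """Generalization 2: from length 2 upward, the absolute count never increases."""
    counts = [count for length, count in sorted(occurring.items()) if length >= 2]
    return all(a >= b for a, b in zip(counts, counts[1:]))


if __name__ == "__main__":
    for length, share in proportions(OCCURRING, CONSONANT_PHONEMES).items():
        print(f"L={length}: {share} (about {float(share):.6f})")
    print(satisfies_generalization_1(OCCURRING, CONSONANT_PHONEMES))
    print(satisfies_generalization_2(OCCURRING))
```

Exact fractions are used rather than floating-point values so that the "decreases or remains the same" comparison is not affected by rounding.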
⁵ Universals of language, ed. by J. H. Greenberg (Cambridge 1963), p. 263.

In general the validity of 1 and 2, to which no exception was found in the 104 languages of the sample, provides objective evidence of the "difficulty" of clusters. This would seem to correlate with the diachronic tendency towards their simplification, since any simplification automatically reduces the number, both absolutely and proportionally, of sequences of the length subject to reduction and increases the number of shorter sequences.

The next statement refers to the property of resolvability which was first suggested by Hjelmslev. A sequence is here said to be completely resolvable if every continuous subsequence also occurs. For example, if in a language initial fstr occurs, then if fs, st, tr, fst and str all occur, it is completely resolvable. If some of these occur but not others, it is partially resolvable, and if none occurs, it is non-resolvable.

3. Every initial or final sequence of length m contains at least one continuous subsequence of length m - 1.

In the overwhelming majority of instances sequences are completely resolvable. In the weaker form asserted here there are still a very small number of unresolvable sequences in the material collected. These were from CHATINO, PAME, and COEUR D'ALENE and totalled 10 in all. Thus this assertion has only statistical validity, but one far beyond chance within any reasonable confidence limit. That this is not a chance phenomenon in the individual languages can be illustrated from initial clusters in ENGLISH. Here all 8 clusters of length 3 (skw, skr, skl, skj, spl, spj, spr and str) are completely resolvable. Now the only initial clusters of length 3 that can be formed from those of length 2 and that conform to the requirement of complete resolvability are the following: spl, spr, spj, str, stw, skl, skr, skj, skw, sfl, sfr, sfj.
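The three-way classification of resolvability defined above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the procedure used in the study: sequences are written as strings with one character per consonant, the function names are hypothetical, and, following the fstr example, only continuous subsequences of length 2 or more are checked.

```python
def continuous_subsequences(sequence):
    """Proper continuous subsequences of length >= 2 (for 'fstr': fs, st, tr, fst, str)."""
    n = len(sequence)
    return {
        sequence[i:j]
        for i in range(n)
        for j in range(i + 2, n + 1)
        if j - i < n  # exclude the sequence itself
    }


def resolvability(sequence, occurring):
    """Classify a sequence as 'complete', 'partial' or 'none' relative to the
    set of sequences that occur in the same (initial or final) system.
    A sequence of length 2 has no subsequences to check and therefore counts
    as 'complete' vacuously; that is a choice made for this sketch."""
    subsequences = continuous_subsequences(sequence)
    found = {s for s in subsequences if s in occurring}
    if found == subsequences:
        return "complete"
    return "partial" if found else "none"


def satisfies_generalization_3(sequence, occurring):
    """Weaker form: at least one continuous subsequence of length m - 1 occurs."""
    return sequence[:-1] in occurring or sequence[1:] in occurring


if __name__ == "__main__":
    # Hypothetical inventories built around the text's own example, fstr.
    print(resolvability("fstr", {"fs", "st", "tr", "fst", "str"}))
    print(resolvability("fstr", {"st", "tr"}))
    print(resolvability("fstr", set()))
    print(satisfies_generalization_3("fstr", {"str"}))
```

Keeping the complete and the weaker (generalization 3) tests as separate functions mirrors the distinction drawn in the text between the usual case and the minimal claim.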
That 8 clusters chosen at random out of $20^{3} = 8{,}000$ logically possible combinations should all fall within a set of 12 is, of course, highly significant statistically. This generalization could be restated as follows: For every initial and final system and for every length, the number of completely resolvable sequences is greater than the number of those which are not completely resolvable. This statement has no exceptions in the present material.

The reason for the phenomenon of resolvability is, at least partly, that longer sequences are formed from shorter sequences by morphological or syntactic combination. The latter occurs, for example, in initial consonant sequences in the SLAVIC languages, where prepositions consisting of a single consonant are found. From this it should follow, as a general result, that the longer a sequence the more likely it is to contain one or more morpheme or word boundaries. Unfortunately, it was only possible to classify the sequences in this regard for a few languages. Two classes of sequences were distinguished, those which occurred exclusively with a contained morpheme or word boundary and those which occurred in at least some cases without such an internal boundary. The following generalization is therefore merely a probable conjecture which was verified in the few cases presented below.

4. In all initial and final systems, if there are sequences of length m and n and m > n, then the proportion of sequences of length m which only occur with internal morpheme boundaries is equal to or greater than the proportion of such sequences which occur of length n.

Since every word boundary will also be a morpheme boundary, it is sufficient to state the above generalization in terms of morpheme boundaries. Among the cases investigated were Coeur d'Alene and Dutch.
In Coeur d'Alene initial clusters the ratios were as follows: 0/42 (L=1) < [...] (L=2) < [...] (L=3). The final ratios were: [...]

[...] it also has some other, ψ. The majority of the hypotheses stated here are of this nature. If, for properties φ and ψ, φ → ψ, the falsity of the opposite implication ψ → φ is tacitly asserted. If it were true, the universal would be stated in the form of an equivalence.

As supporting evidence from the sample of 104 languages, we cite by number those which have both final heterorganic and homorganic sequences of nasal followed by obstruent and those which have homorganic sequences without heterorganic. In the implicitly asserted typology, then, one of the four logically possible types, the class of languages with heterorganic but without homorganic sequences of this kind, is null.

Languages with final systems containing both heterorganic and homorganic combinations are, then, as follows: 1, 2, 3, 4, 7, 10, 13, 27, 28, 30, 33, 35, 36, 41, 42, 43, 48, 49, 61, 62, 64, 67, 72, 75, 76, 80, 81, 86, 87, 94, 101, 102, and 103. Languages with final systems containing homorganic without heterorganic combinations are: 5, 11, 15, 16, 23, 34, 44, 45, 46, 51, 66, 69, 70, 77, 82, 89, 93, 96, 97, 98. The data for 57 (KASHMIRI) were not sufficient to decide between these two possibilities. The remaining languages with final systems did not have any combinations of nasals and obstruents. These results can be shown in the following table:

Table 1

                  Homorganic    ~Homorganic
Heterorganic          33             0
~Heterorganic         20

A similar statement for initial systems holds in almost all cases, but a number of SLAVIC languages (e.g. RUSSIAN, POLISH, CZECH) are conspicuous exceptions in that they contain initial heterorganic combinations such as mg without having homorganic sequences. It should be noted that for purposes of the present hypothesis sequences of nasal followed by obstruent are included even when preceded or followed by one or more consonants.
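The counts in Table 1 and the implicational reading of the typology can be re-derived from the two lists of language numbers given above. The following Python sketch is illustrative only; the variable and function names are hypothetical, and the cell for languages with neither kind of sequence is left out because the text gives no list for it.

```python
from collections import Counter

# Language numbers as cited in the text for final systems.
BOTH = [1, 2, 3, 4, 7, 10, 13, 27, 28, 30, 33, 35, 36, 41, 42, 43, 48, 49,
        61, 62, 64, 67, 72, 75, 76, 80, 81, 86, 87, 94, 101, 102, 103]
HOMORGANIC_ONLY = [5, 11, 15, 16, 23, 34, 44, 45, 46, 51,
                   66, 69, 70, 77, 82, 89, 93, 96, 97, 98]

# language number -> (has heterorganic sequences, has homorganic sequences)
FEATURES = {lang: (True, True) for lang in BOTH}
FEATURES.update({lang: (False, True) for lang in HOMORGANIC_ONLY})


def contingency(features):
    """Count languages in each cell of the two-by-two typology."""
    return Counter(features.values())


def implication_holds(features, antecedent=0, consequent=1):
    """True if no language has the antecedent property without the consequent."""
    return not any(f[antecedent] and not f[consequent] for f in features.values())


if __name__ == "__main__":
    table = contingency(FEATURES)
    print(table[(True, True)], table[(True, False)], table[(False, True)])
    # heterorganic -> homorganic is unfalsified; the converse is not.
    print(implication_holds(FEATURES))
    print(implication_holds(FEATURES, antecedent=1, consequent=0))
```

The second call illustrates the point made in the text that the opposite implication is tacitly denied: languages with homorganic but no heterorganic sequences falsify it.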
Thus, RUSSIAN /mgla/ counts as an instance of a heterorganic cluster.

We now consider hypotheses of dissimilation. The first group has to do with preference for combinations of stop and spirant as against stop + stop or spirant + spirant. The hypotheses presented here refer to sequences of length two only. In fact sequences of three or more stops or three or more fricatives are excessively rare, and implicational universals analogous to the following could doubtless be formulated.

7. In initial systems the presence of at least one combination of stop + stop implies the presence of at least one combination of stop + fricative.

In fact languages with stop + stop almost always have both stop + fricative and fricative + stop combinations. Affricates are counted here as stop + fricative, as explained in an earlier section. The following languages which have stop + stop combinations also have both stop + fricative and fricative + stop: 2, 10, 14, 16, 27, 28, 30, 31, 32, 34, 38, 41, 42, 44, 59, 61, 77, 79, 80, 81, 82, 93, and 102. Two languages, 47 and 87, have stop + stop and stop + fricative but do not have fricative + stop. Further details are given in Table 2.

FS-SF FS SF