Phonotactic aspects of the linguistic expression

BENGT SIGURD
Department of Scandinavian Languages, University of Lund

1. Introduction

Phonological descriptions of languages generally consist of a description of the phonemes of the language and a description of the distribution of the phonemes. The latter part is also referred to as the phonotactic structure of the language.¹ Phonotactic data are generally presented as word, morpheme or syllable formulae, as lists of phoneme sequences, or as rules of the following type: only word medial sequences (interludes) are permitted in the language, the longest type of consonant cluster has three members, dentals do not combine in initial clusters, the vowel before / is always short, etc. Such information may be given in relation to various units, such as the utterance, macrosegment, word, morpheme or syllable. The choice of reference unit is a problem in many languages.

Phonotactic descriptions may also be called phonological grammars (Householder, 1959; Saporta and Contreras, 1962; Romeo, 1964). In this case the permitted strings of phonemes constitute the language which is to be generated by the grammar.

The relation of descriptions of phonotactic structures to the sentence grammar of the language is not clear. The scholars mentioned above hold that it is necessary to supplement the sentence grammar with a phonotactic description, but others (Halle, 1962) maintain that there is no motivation for separate phonotactic descriptions: the phonotactic restrictions will appear at their proper place in the sentence grammar of the language as rules concerning the combination of distinctive features (provided the lexical items are specified in terms of distinctive features).

¹ The useful term 'phonotactics' was introduced by Robert P. Stockwell (Hill, 1958). The term 'distribution of phonemes' is sometimes used in the sense of the relative frequencies of the phonemes (Herdan, 1956). I am indebted to Gerald Sanders, Indiana University, for revising my English and suggesting many improvements, in particular in the part dealing with string-replacement rules.

It might be argued that the shape of words or syllables is a characteristic feature of the language which is hard to extract from a sentence grammar of the language. The permitted phoneme sequences are generally much restricted, and only a fraction of the combinations which can be formed by combining the phonemes in all possible ways actually occurs. The restrictions vary greatly between languages, and given only the set of phonemes it is not possible to produce acceptable words. There might be languages with identical phoneme inventories but greatly different ways of combining the phonemes. Phonemes can be classified according to their combinatory possibilities.

The study of phonotactic structures aims at discovering the characteristic patterns of the language. These patterns seem to be fairly stable properties of languages, and they can be used for comparative or typological purposes. The permitted phoneme arrangements (and the permitted letter arrangements) are of interest to those who coin new words such as personal names, names of industrial products and trade names. The accidental gaps found in the analysis suggest potential words which can be used without breaking the rules of the language. The semantic associations of certain sequences must, however, also be taken into account in such cases. Phonotactic investigations are also of importance to areas such as shorthand writing, cryptography, speech audiometry, and automatic speech recognition. The teaching of foreign languages may also benefit from phonotactic studies. A contrastive study may show that the native language and the foreign language combine identical phonemes differently.
Initial German kn, for instance, is difficult for an English speaker, since this cluster does not occur in English although both k and n occur.

This chapter will give a survey of the problems and methods of phonotactic investigation. Data from several languages will be treated for the sake of demonstration.²

² The general problems and methods have been treated in Trubetzkoy, 1939; Vogt, 1942; Fischer-Jørgensen, 1952; Hockett, 1955; Haugen, 1956a; Harary and Paper, 1957; Spang-Hanssen, 1959; Malmberg, 1966; Sigurd, 1965; Greenberg, 1965.

2. The choice of units

It is apparent from language descriptions that various linguistic units have been chosen as the frame of reference for phonotactic descriptions. Hockett suggests the macrosegment, the microsegment and the syllable as basic units, Trubetzkoy prefers the morpheme, and Haugen and Fischer-Jørgensen insist on the syllable. Haugen and Fischer-Jørgensen have discussed the problem from general points of view, and they find the syllable the most suitable unit for comparing the phonotactic structures of different languages. Haugen goes further and tries to define the syllable on the basis of phonotactic structures: the syllable is the unit within which phoneme distribution can be most economically described. All languages seem to have syllables, and their structure is generally of the type (C)V(C).

The trouble is, however, that other units have an influence on the phonotactic patterns. In the Germanic languages the word and syllable final consonant clusters differ between monomorphemic and polymorphemic sequences. The monomorphemic sequences are much more restricted and form a neater pattern. The addition of suffixes breaks this pattern. Although morphophonemic changes adapt many combinations to the basic pattern found in monomorphemic sequences, the morphological pressure may result in sequences which deviate considerably from the monomorphemic pattern.
In English the final cluster ksθ is only found in polymorphemic sequences, as in sixth. In Swedish the addition of suffixes may result in clusters of six or even more members, in words such as skälmskts {skälm-sk-t-s} (the nominalized genitive form of a neuter adjective derived from the noun skälm, 'rogue'; s, 'genitive'; t, 'neuter'; sk, 'derivational suffix'). Monomorphemic final clusters can only have three members in Swedish. Similarly, in Russian all initial clusters with more than three members seem to be polymorphemic.

The word is of phonotactic importance in some languages, such as Eskimo (Swadesh, 1946), Finnish (Haugen, 1956a), and Kutenai (Haugen, 1956b). Any syllable division in Finnish and Kutenai will yield syllable initial or syllable final sequences which cannot occur word initially or word finally, and rules for the occurrence of syllables in different positions of the word must therefore be introduced if we insist on describing the phonotactic pattern within syllables.

3. Data

Another problem is the heterogeneous character of linguistic data.³ Generally it is possible to find a fairly neat system inherent in frequent and genuine words. As we include more data, such as loan-words, archaic words, nursery words and interjections, the system breaks down or has to be supplemented in a way which destroys the neat patterning. Linguists usually try to get rid of the disturbing material by reference to the special character of the words, but it is generally difficult to apply the criteria and retain only the well-formed words. If frequency is used as a criterion, it will often be clear, for instance, that words which we would like to include are less frequent than words which we would like to exclude.

The relevance of statistical information to phonotactic investigations is not clear. It is also a problem whether running text frequency or lexical frequency should be used (Karlgren, 1961).
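The difference between the two measures can be illustrated with a small sketch. It assumes a hypothetical toy corpus of Swedish words and uses spelling as a crude stand-in for phonemic form; the helper names are invented for the example. Each word-initial consonant cluster is counted once per running-text token and once per distinct word, so a cluster carried by a few very frequent words ranks high in running text but not in the lexicon.

```python
from collections import Counter

# Orthographic vowels, used as a crude stand-in for a phonemic analysis
# (an assumption for illustration; a real study would use phonemic transcriptions).
VOWELS = set("aeiouyåäö")

def initial_cluster(word: str) -> str:
    """Return the consonant letters before the first vowel ('' if the word starts with a vowel)."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return word[:i]
    return word

def cluster_frequencies(tokens: list[str]) -> tuple[Counter, Counter]:
    """Count initial clusters once per running-text token and once per distinct word."""
    text_freq = Counter(initial_cluster(t) for t in tokens)
    lexical_freq = Counter(initial_cluster(w) for w in set(tokens))
    return text_freq, lexical_freq

# Hypothetical toy corpus: one very frequent word dominates the running-text count.
tokens = "på på på på som som skola sprit spis stor".split()
text_freq, lexical_freq = cluster_frequencies(tokens)
print(text_freq["p"], lexical_freq["p"])      # 4 1
print(text_freq["spr"], lexical_freq["spr"])  # 1 1
```

In running text the single consonant p outweighs every other onset here, while in the lexicon all six onsets count equally.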
Such figures are supposed to show the importance of the patterns, but it is doubtful whether running text frequency or lexical frequency (or a combination of both) would do this best. This would seem to be part of the general problem of determining an appropriate role for statistical information in the general model of linguistic structure.

³ The heterogeneous nature of linguistic data has been pointed out recently by Malmberg (1964).

4. Methods of analysis and description

A list of permitted sequences is in itself a description of the data, but it does not reveal the inherent structure which we feel is present. The following list shows permitted word initial consonant sequences in Swedish (∅ means no consonant).

Single consonants: r, l, m, n, v, j, b, g, d, f, p, k, t, s, ʃ, ç, h, ∅
2-member clusters: tr, pr, kr, fr, dr, br, gr, vr, sl, pl, kl, fl, bl, gl, pj, fj, bj, mj, nj, sv, tv, kv, dv, sm, sn, kn, fn, gn, st, sp, sk
3-member clusters: skr, spr, str, spj, skv, spl

The following diagram displays the inherent structure better. It generates all permitted sequences. The words to the right are examples.

Fig. 1. Word initial sequences. (Diagram generating the Swedish initial consonant sequences listed above, with example words such as al, jul, lur, mjuk, var, vrak, dag, drag, gris, glad, bo, blod, pris, krig, kvar, knå, fri, fjäder, slag, skola, sprit, spjut.)

4.1. Position analysis

The diagram above shows that we can define different classes on the basis of position in relation to the vowel. Class 1, defined as the class of phonemes which can occur one step before the vowel, contains all the phonemes. Class 2, which includes the phonemes occurring two steps from the vowel, has a restricted membership. Class 3 includes only s.
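The position classes can be read off the cluster list mechanically. The following is a minimal sketch, assuming each phoneme is written as one character and leaving single consonants aside (every consonant occupies position 1 on its own); the function name is hypothetical.

```python
# Word-initial Swedish clusters, as listed above (single consonants omitted).
TWO = ("tr pr kr fr dr br gr vr sl pl kl fl bl gl pj fj bj mj nj "
       "sv tv kv dv sm sn kn fn gn st sp sk").split()
THREE = "skr spr str spj skv spl".split()

def position_classes(clusters):
    """Map distance from the vowel (1 = immediately before it) to the set
    of consonants found at that distance in the given clusters."""
    classes = {}
    for cluster in clusters:
        # Read the cluster backwards so that distance 1 is next to the vowel.
        for distance, consonant in enumerate(reversed(cluster), start=1):
            classes.setdefault(distance, set()).add(consonant)
    return classes

all_classes = position_classes(TWO + THREE)
print(sorted(all_classes[3]))              # ['s']
print(sorted(position_classes(THREE)[2]))  # ['k', 'p', 't']
print(sorted(all_classes[2]))
```

Restricting the input to the three-member clusters, as in the second call, is one way of defining classes by cluster length as well as by position.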
It is possible to go further and define classes on the basis of the number of members in the clusters. Thus position 2 can only be occupied by p, t, k in three-member clusters, while it can be occupied by m, n, v, p, t, k, b, d, g, f, s if we include two-member clusters. The varying ability of phonemes to occur separated from the vowel may be used for measuring the vowel adherence of the consonants (Sigurd, 1965, p. 47).

4.2. Combination analysis

We may define classes of phonemes on the basis of their combinability. Such classes will be of different sizes. A big class may be said to show a more important feature of the language than a small class. In Swedish the labials (p, b, m, f, v) constitute a class of phonemes which do not combine initially; the palatals (k, g, ç, j) constitute another such class. The dental stops (t, d) do not combine with l; although this class is small, the fact is interesting, since this seems to be a characteristic of most Indo-European languages.

Combinations between nasals and stops show interesting differences between languages. In the basic Swedish system in syllable final position we have the following pattern.

Fig. 2. Nasal + stop combinations in syllable final position in Swedish (basic system):
labial nasal + labial stop (mp, mb) or dental stop
dental nasal + dental stop
palatal nasal + dental stop or palatal stop

The place of articulation of a stop following a dental nasal is thus predictable. A stop after a labial or palatal nasal may have two different places of articulation. In word medial position in Finnish, where we only have mp, nt, ŋk, the place of articulation is always the same in the combination. In Spanish the place of articulation of a nasal at the end of a syllable is also predictable (from the pause or the following phoneme).

4.3. Order analysis

If we look at phoneme sequences purely from the point of view of the order of the phonemes, we may sometimes find an order relation inherent in the data.
An order relation should fulfil the following requirements: (1) there should be no reversible sequences, i.e. not both xy and yx; (2) if we have xy and yz, we should also have xz. The first criterion is the asymmetry criterion; the second is the transitivity criterion. The Swedish sequences are very close to an order relation. They meet the first requirement and almost meet the second. We have, for instance, sk and kn and also sn; we have kv and vr and also kr. If we add the missing sequences (sr, sj, kj, gj), we get an order relation which can be depicted as in Fig. 3.

Fig. 3. Order diagram which generates initial two-member clusters in Swedish.

In this figure a box containing the added sequences has been drawn to the right of the order diagram. This box can be considered a filter which removes the non-permitted sequences from the sequences generated when we select two phonemes going from left to right along the lines. The diagram may also generate three-member clusters, but in this case it overgenerates. This diagram gives a neat description of the Swedish sequences, and it is reasonable to use this approach in this case. It is obvious, however, that the size of the filter is a measure of the fit of the description to the data, and that nothing important has been said if the filter removes most of the sequences generated by the order diagram.

The following examples show how this approach works in some other languages. The two-consonant clusters in Fox interludes are the following (cf. Hockett, 1955, p. 92):

pw, py, tw, kw, ky, sw, šw, hw, hy, mw, my, nw, ny, sk, hp, ht, hk, hc

In the Fox interludes presented, the relation 'precedes' is almost transitive on the entire set of consonants. If the cluster sy were present, it would be completely transitive.
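The two criteria and the filter can be checked mechanically. The Python sketch below uses hypothetical helper names. It computes the transitive closure of the observed two-member clusters and returns the pairs the closure adds. On the Swedish list this should be exactly the filter sr, sj, kj, gj, and on the Fox clusters the single cluster sy. The Fox data are transcribed here on the assumption that c stands for č and š for the second sibilant.

```python
from itertools import product

# Two-member initial clusters in Swedish, as listed in section 4.
SWEDISH_TWO = ("tr pr kr fr dr br gr vr sl pl kl fl bl gl pj fj bj mj nj "
               "sv tv kv dv sm sn kn fn gn st sp sk").split()
# Two-consonant Fox interludes (assumed transcription: c = č, š = second sibilant).
FOX_TWO = "pw py tw kw ky sw šw hw hy mw my nw ny sk hp ht hk hc".split()

def parse(clusters):
    """Turn two-phoneme cluster strings into a set of (first, second) pairs."""
    return {(c[0], c[1]) for c in clusters}

def is_asymmetric(pairs):
    """Criterion (1): no sequence occurs in both orders, xy and yx."""
    return all((y, x) not in pairs for (x, y) in pairs)

def transitive_closure(pairs):
    """Criterion (2): add xz whenever xy and yz are present, until nothing changes."""
    closure = set(pairs)
    while True:
        new = {(x, z) for (x, y), (y2, z) in product(closure, closure) if y == y2}
        if new <= closure:
            return closure
        closure |= new

def filter_sequences(clusters):
    """Sequences the order relation implies but the data lack (the 'filter' box)."""
    pairs = parse(clusters)
    return {x + y for (x, y) in transitive_closure(pairs) - pairs}

print(is_asymmetric(parse(SWEDISH_TWO)))      # True
print(sorted(filter_sequences(SWEDISH_TWO)))  # ['gj', 'kj', 'sj', 'sr']
print(sorted(filter_sequences(FOX_TWO)))      # ['sy']
```

The number of sequences returned by `filter_sequences` is the simple measure of fit mentioned in the text: the smaller it is relative to the data, the better the order diagram describes them.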
From the point of view of sequential order, this cluster might be said to be an 'accidental gap' in Fox (although, of course, there may be other reasons for rejecting the cluster). It thus seems reasonable to describe the Fox interludes by the following diagram (Fig. 4). A box containing the exception sy has been drawn to the right of the diagram. This may be considered a representation of a 'filter' which excludes the cluster sy from the output of the diagram.

Fig. 4. Diagram generating medial sequences of one, two or three members in Fox.

The three-consonant interludes of Fox are the following: hpy, hpw, hky, hkw, sky, skw, and htw. This is precisely the set of three-consonant clusters generated by the diagram. Furthermore, the set of Fox interludes which consist of a single consonant is just the set of consonants used to construct the diagram. Consequently, if we consider any one-, two- or three-consonant cluster obtained by traversing an arbitrary path of the diagram and filter to be allowable output, we find that the diagram generates exactly the full set of Fox interludes.

The set of onsets in Fox is as follows: p, pw, py, t, tw, c, k, kw, ky, s, sw, š, šw, m, mw, my, n, nw, ny, w, and y. To generate this set with the preceding diagram would require a somewhat more complex filter to exclude the many interludes which do not occur as onsets.

The following medial clusters occur in Eskimo⁴: jp, yp, pq, pk, pt, pn, ps, fa, pi, jp, jk, jt, jim, /in, js, js, jl, yp, yq, yt, ym, yn, ys, ys, yl, tl, ts. ji is an allophone of j before a nasal. The diagram below (Fig. 5) shows the order relation inherent in these clusters. It generates all the clusters and no others if two consonants are selected along the lines going from left to right and those in the filter are removed.
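The claim that the Fox diagram generates exactly the attested interludes can be tested with a short sketch. It rests on one interpretive assumption: "traversing a path" means choosing consonants in increasing order of the relation, so every adjacent pair in an output cluster must lie in the transitive closure of the observed pairs. The filter removes listed sequences from the output. The cluster list uses the same assumed transcription as above (c = č, š = second sibilant). Function names are hypothetical.

```python
from itertools import product

FOX_TWO = "pw py tw kw ky sw šw hw hy mw my nw ny sk hp ht hk hc".split()
FILTER = {"sy"}  # the exception drawn in the box beside the diagram

def transitive_closure(pairs):
    """Smallest transitive relation containing the given (first, second) pairs."""
    closure = set(pairs)
    while True:
        new = {(x, z) for (x, y), (y2, z) in product(closure, closure) if y == y2}
        if new <= closure:
            return closure
        closure |= new

def generate(two_member, filter_, max_len=3):
    """Clusters of 1..max_len consonants read off the order diagram: each
    consonant must follow the previous one in the closure; the filter then
    removes the listed sequences."""
    order = transitive_closure({(c[0], c[1]) for c in two_member})
    consonants = {c for pair in order for c in pair}
    output = set(consonants)
    paths = list(consonants)
    for _ in range(max_len - 1):
        # Extend every path by any consonant that follows its last member.
        paths = [p + z for p in paths for (y, z) in order if y == p[-1]]
        output.update(paths)
    return output - filter_

interludes = generate(FOX_TWO, FILTER)
print(sorted(c for c in interludes if len(c) == 3))
# ['hkw', 'hky', 'hpw', 'hpy', 'htw', 'skw', 'sky']
```

The filter applies only to the two-member string sy, so sky survives, in line with the attested interludes. Applied to the Swedish two-member list, the same procedure would also yield unattested three-member sequences such as kvr, which is the overgeneration the text notes for the Swedish diagram.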
The clusters in the filter show some systematic features: labials do not combine, the palatal fricative does not combine with a velar stop, and the velar fricative does not combine with a palatal stop. The order inherent in the sequences permits us to arrange the consonants in columns (order classes⁵) in such a way that we get the permitted clusters by going from left to right selecting members of the different classes.

⁴ After Swadesh (1946, p. 31), j and y are palatal and velar fricatives respectively, k and q are palatal and velar stops respectively, s is articulated with the blade, s is articulated with the point of the tongue.

⁵ The application of order classifications in linguistic analysis has been discussed in detail in Brodda and Karlgren (1964).

Fig. 5. Diagram generating medial two-member clusters in Eskimo.

The members within a class do not combine, but members of different classes combine, although they need not all combine. With this classification, based only on order, we lose some information about the occurring combinations. The members of the different classes will often have phonetic features in common. However, it is often possible to arrange the phonemes in order classes in several ways if not all phonemes combine, which is often the case. The following arrangement shows one possible classification in Eskimo.

Fricatives (pal.-vel.: j, y; lab.: β) | Stops and nasals + s (s, p, k, q, t, m, n) | Lateral (l)

4.4. Phonological grammars using string-replacement rules

String-replacement rules, or rewrite rules of the type used in syntactic descriptions,⁶ may also be used in phonotactic descriptions. Such grammars may take advantage of existing restrictions in position, combination or order in different ways.

⁶ Cf. Chomsky (1957). Phonotactic descriptions of this type are discussed in Bach (1964) and in the papers mentioned in the introduction.
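To make the notion concrete, here is a minimal Python sketch of a context-free rewrite grammar with an exhaustive generator. The grammar is a hypothetical illustration, not one given in the chapter. It generates exactly the six Swedish three-member initial clusters listed in section 4, using the restrictions in position (s first) and in combination (which consonants may follow p, t, k). The symbol names C3, P, R1 and R2 are invented for the example.

```python
# Hypothetical grammar: nonterminals map to lists of alternative right-hand sides.
GRAMMAR = {
    "C3": [["s", "P"]],                          # s always comes first
    "P": [["p", "R1"], ["t", "r"], ["k", "R2"]],  # second position: p, t, k
    "R1": [["r"], ["l"], ["j"]],                  # after sp: r, l, j
    "R2": [["r"], ["v"]],                         # after sk: r, v
}

def expand(symbols, grammar):
    """Rewrite the leftmost nonterminal in every possible way until only
    phonemes remain, yielding each resulting string."""
    for i, sym in enumerate(symbols):
        if sym in grammar:
            for rhs in grammar[sym]:
                yield from expand(symbols[:i] + rhs + symbols[i + 1:], grammar)
            return
    yield "".join(symbols)

print(sorted(expand(["C3"], GRAMMAR)))
# ['skr', 'skv', 'spj', 'spl', 'spr', 'str']
```

Each arrow in the chapter's notation corresponds to one alternative in a list here. Leftmost rewriting is chosen only for simplicity; with a context-free grammar the order of rewriting does not change the generated set.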
Various classes of grammars of this type have been defined. The following grammar (a) is a context-free phrase structure grammar generating the initial sequences in Spanish: θ, p, b, t, d, k, g, f, r, l, ʎ, x, s, m, n, ñ, č, pr, br, kr, gr, fr, tr, dr, pl, bl, kl, gl, fl.⁷

(a)
1. [illegible]
2. P → p, b, k, g, f
3. T → t, d

(b)
1.-4. [rules for O, C, L and C1; illegible]
5. C2 → θ, s, x, m, n, ñ, ʎ, č
6. Cp → p, b, k, g, f
7. T → t, d

⁷ Cf. Saporta and Contreras (1962). The following notations are used. The arrow → is to be read as 'is rewritten as'. The double arrow ⇒ means 'is transformed into'. The hook ⌢ means 'is followed by'. Items in a column linked by a brace { } represent items one of which must be chosen. Items within angle brackets ⟨ ⟩ are optional. Square brackets [ ] used in transformations abbreviate the listing of rules in which different symbols are replaced in the same environment; such rules are read across by lines and rows. /_x means 'in the environment (context) before x'; /x_ means 'in the environment after x'.

We can get a more general first rule if we use context-sensitive rules in the following part of the grammar, as is shown by (b).

The initial Swedish clusters presented above are more numerous and more complex than the Spanish ones. The detailed observations on restrictions in position, combination and order made above suggest different string-replacement rules. The filters in the order diagrams suggest that some non-permitted sequences may be considered underlying forms of certain phonemes with restricted distribution (ç, ʃ). The grammar presented below is a transformational grammar or, more specifically, a context-sensitive phrase structure grammar with two transformations. The clusters kj, tj generated by the phrase structure part of the grammar are transformed into ç. The cluster sj generated by the phrase structure rules, and the cluster sç generated by the phrase structure rules and the first transformation, are transformed into ʃ. These two transformations reflect diachronic changes that took place in earlier Swedish (cf. Saporta, 1965).
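The effect of the two transformations just described can be traced with a minimal sketch. It assumes that each transformation is an ordered, context-free string replacement applied to the output of the phrase structure part. The phrase structure rules themselves are not modelled; the input clusters are supplied by hand, and the names are hypothetical.

```python
# Ordered transformations applied to clusters from the phrase structure part.
TRANSFORMATIONS = [
    (("kj", "tj"), "ç"),  # first transformation:  kj, tj => ç
    (("sj", "sç"), "ʃ"),  # second transformation: sj, sç => ʃ
]

def derive(underlying: str) -> str:
    """Apply the transformations in order; later rules see earlier outputs."""
    form = underlying
    for sources, target in TRANSFORMATIONS:
        for source in sources:
            form = form.replace(source, target)
    return form

for cluster in ["kj", "tj", "sj", "skj", "stj", "pj"]:
    print(cluster, "=>", derive(cluster))
# kj => ç, tj => ç, sj => ʃ, skj => ʃ, stj => ʃ, pj => pj
```

Rule order matters here. skj becomes ʃ only because the first transformation turns it into sç before the second one applies, which is how the text derives sç.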
The existence of the clusters kj, tj, gj, dj, stj, skj in earlier Swedish is reflected in the modern spelling, for instance in tjock /çok/, 'thick'; kjol /çu:l/, 'skirt'; gjorde /ju:rde/, 'made'; djävul /jɛ:vul/, 'devil'; stjärna /ʃɛ:rna/, 'star'; skjuta /ʃu:ta/, 'shoot'.

1.-6. O, C, R, P, T, K → [phrase structure rules; illegible]
7. kj, tj ⇒ ç
8. sj, sç ⇒ ʃ

The older clusters dj, gj have developed into j, not ç, however, as is obvious from the examples. It would be possible to mirror this by a transformation, but there seem to be no synchronic reasons for doing so. It would also be possible to generate h from a sequence generated by the phrase structure component, for instance from sr, which would mean that we could remove the condition for r in rule 3, or from gv, which would mean that we could remove a context restriction in rule 6; but any choice seems to be arbitrary.

Instead of the last two transformations, the following three could be used. They probably reflect the actual sound changes that took place in earlier Swedish better, but of course this is no argument in the synchronic description. The first transformation accounts for all voiceless allophones (ç) of j after voiceless consonants.⁸ The second deletes t, k before ç, and the third fuses s and ç.

7. Xj ⇒ Xç, where X is any voiceless consonant
8. tç, kç ⇒ ç
9. sç ⇒ ʃ

The use of string-replacement rules and transformations may reveal new patterns. However, there are often several patterns present in phonotactic structures, and one description can hardly account for them all. It seems that full insight into the structures can only be gained if we apply several approaches.

All the descriptions discussed above have been based on the phoneme. It is obvious, however, that the rules can also be stated in terms of distinctive features.
This approach has been suggested by the advocates of transformational grammar, who do not think it is necessary to have a phonemic level in the grammar. It has even been maintained that a phonemic level prevents one from making certain intuitively and observationally obvious generalizations (cf. Lees, 1957; Chomsky, 1964). If we use distinctive features, the predictability of certain features in certain contexts can be used in writing the rules. The first consonant in a Swedish three-member cluster can thus be specified by the feature [+consonantal] alone, since only s can occur in this position. Similarly, since only voiceless stops occur in the second position of a three-member cluster, the features [+stop] and [−voiced] are predictable in this context. These features can thus be assigned by a general rule, and they need not be included in the underlying phonemic characterization of these segments.

Bibliography

Bach, E., 1964. An introduction to transformational grammars. New York.
Brodda, B., and H. Karlgren, 1964. Relative position of elements in linguistic strings. In Statistical methods in linguistics 3, Stockholm, p. 49.
Chomsky, N., 1957. Syntactic structures. The Hague.
Chomsky, N., 1964. Current issues in linguistic theory. The Hague.
Fischer-Jørgensen, E., 1952. On the definition of phoneme categories on a distributional basis. Acta Linguistica 7, 8.
Greenberg, J., 1965. Some generalizations concerning initial and final consonant sequences. Linguistics 18, 5.
Halle, M., 1962. On phonology in generative grammar. Word 18, 54.
Harary, F., and H. Paper, 1957. Toward a general calculus of phoneme distribution. Language 33, 143.
Haugen, E., 1956a. The syllable in linguistic descriptions. In For Roman Jakobson. The Hague, p. 213.
Haugen, E., 1956b. Syllabification in Kutenai. Intern. J. Amer. Linguistics 22, 196.
Herdan, G., 1956. Language as choice and chance. Groningen.
Hill, A. A., 1958. Linguistic structures. New York.
Hockett, Ch. F., 1955. A manual of phonology. Intern. J. Amer. Linguistics, Memoir 11. Baltimore.
Householder, F. W., Jr., 1959. On linguistic primes. Word 15, 231.
Karlgren, H., 1961. Die Tragweite lexikalischer Statistik. Språkvetenskapliga Sällskapets i Uppsala förhandlingar 1958-1960, Uppsala, p. 77.
Lees, R. B., 1957. Review of Chomsky, Syntactic structures. Language 33, 375.
Malmberg, B., 1964. Minimal systems, potential distinctions and primitive structures. Proc. 9th Intern. Congr. of Linguists. The Hague, p. 78.
Malmberg, B., 1966. Structural linguistics and human communication. 2nd ed. Berlin-Heidelberg-New York.
Romeo, L., 1964. Toward a phonological grammar of modern spoken Greek. Word 20, Special Publication 5, p. 60.
Saporta, S., 1965. Ordered rules, dialect differences, and historical processes. Language 41, 218.
Saporta, S., and H. Contreras, 1962. A phonological grammar of Spanish. Seattle.
Sigurd, B., 1965. Phonotactic structures in Swedish. Lund.
Spang-Hanssen, H., 1959. Probability and structural classification. Copenhagen.
Swadesh, M., 1946. South Greenlandic (Eskimo). In Linguistic structures of native America, H. Hoijer (ed.). New York, p. 30.
Trubetzkoy, N. S., 1939. Grundzüge der Phonologie (Travaux du Cercle Linguistique de Prague 7). Prague.
Vogt, H., 1942. The structure of the Norwegian monosyllables. Norsk Tidsskr. for Sprogvidenskap 12, 5.

⁸ This rule is allophonic and also generates the clusters [pç], [fç], which were not listed above since they are phonemically /pj/, /fj/.