List of Phonetic Symbols and Signs

a  Cardinal Vowel no. 4 (approximately as in French patte); used for first element of Eng. diphthong [aɪ]
æ  front vowel between open and open-mid (Eng. vowel in cat)
ɑ  Cardinal Vowel no. 5 (approximately as in French pas); used for Eng. [ɑː] in car
ɒ  open rounded Cardinal Vowel no. 5 (Eng. vowel in dog)
b  voiced bilabial plosive (Eng. b in labour)
ɓ  voiced ingressive bilabial plosive
β  voiced bilabial fricative
c  voiceless palatal plosive
ç  voiceless palatal fricative
ɔ  Cardinal Vowel no. 6 (approximately as in German Sonne); used for Eng. [ɔː] in saw, and first element of diphthong [ɔɪ]
d  voiced alveolar plosive (Eng. d in lady)
ɗ  voiced ingressive alveolar plosive
dʒ  voiced palato-alveolar affricate
ð  voiced dental fricative (Eng. th in other)
e  Cardinal Vowel no. 2 (approximately as in French thé); used for Eng. [e] in bed, and first element of diphthong [eɪ]
ə  unrounded central vowel (Eng. initial and final vowels in another)
ɚ  retroflexed central vowel (American er in water)
ɛ  Cardinal Vowel no. 3 (approximately as in French père); used for first element of diphthong in [ɛə]
ɜ  unrounded central vowel (Eng. vowel in bird)
ɝ  retroflexed central vowel
f  voiceless labiodental fricative (Eng. f in four)
ɟ  voiced palatal plosive
g  voiced velar plosive (Eng. g in eager)
ɠ  voiced velar implosive
h  voiceless glottal fricative (Eng. h in house)
ɦ  voiced glottal fricative (sometimes Eng. h in behind)
i  Cardinal Vowel no. 1 (approximately as in French si); used for Eng. /iː/ in see
ɨ  unrounded close central vowel
ɪ  centralized unrounded close-mid vowel (Eng. vowel in sit)
j  (unrounded) palatal approximant (Eng. y in you)
ɾ  voiced alveolar tap (sometimes r in Eng. very)
k  voiceless velar plosive (Eng. c in car)
l  voiced alveolar lateral approximant (Eng. l in lay)
ɫ  voiced alveolar lateral approximant with velarization (Eng. ll in ill)
ɬ  voiceless alveolar lateral fricative (Welsh ll)
m  voiced bilabial nasal (Eng. m in me)
ɱ  voiced labiodental nasal (Eng. m in comfort)
ɯ  Cardinal Vowel no. 16 (like Eng. /uː/ with spread lips)
n  voiced alveolar nasal (Eng. n in no)
ŋ  voiced velar nasal (Eng. ng in sing)
ɲ  voiced palatal nasal (French gn in vigne)
o  Cardinal Vowel no. 7 (approximately as in French eau)
ø  Cardinal Vowel no. 10 (approximately as in French peu)
œ  Cardinal Vowel no. 11 (approximately as in French peur)
θ  voiceless dental fricative (Eng. th in thing)
p  voiceless bilabial plosive (Eng. p in pea)
r  voiced alveolar trill (an emphatic pronunciation of r in Scottish English)
ɹ  voiced post-alveolar approximant (Eng. r in red)
ɻ  voiced retroflex approximant
ʀ  voiced uvular trill
ʁ  voiced uvular fricative or approximant
s  voiceless alveolar fricative (Eng. s in see)
ʃ  voiceless palato-alveolar fricative (Eng. sh in she)
t  voiceless alveolar plosive (Eng. t in tea)
tʃ  voiceless palato-alveolar affricate
ǀ  dental click
u  Cardinal Vowel no. 8 (approximately as in French doux); used for Eng. /uː/ in do
ʉ  rounded close central vowel
ʊ  centralized rounded close-mid vowel (Eng. u in put)
v  voiced labiodental fricative (Eng. v in ever)
ʌ  Cardinal Vowel no. 14; used for Eng. /ʌ/ in cup
ʋ  voiced labiodental approximant
w  labial-velar semi-vowel (Eng. w in we)
ʍ  voiceless labial-velar fricative (sometimes Eng. wh in why)
x  voiceless velar fricative (Scottish ch in loch)
y  Cardinal Vowel no. 9 (approximately as in French du)
ʎ  voiced palatal lateral approximant (Italian gl in egli)
ɤ  Cardinal Vowel no. 15
ɣ  voiced velar fricative
z  voiced alveolar fricative (Eng. z in lazy)
ʒ  voiced palato-alveolar fricative (Eng. s in measure)
ɸ  voiceless bilabial fricative
ǁ  alveolar lateral click
ʔ  glottal plosive
ː  indicates full length of preceding vowel
ˑ  indicates half length of preceding vowel

.  high unaccented pre-nuclear syllable
*  high falling nuclear tone (and used to indicate primary accent in citation forms)
   low falling nuclear tone
1  high rising nuclear tone
   low rising nuclear tone
*  falling-rising nuclear tone
*  rising-falling nuclear tone
*  mid-level nuclear tone
=  stylized tone (high level followed by mid-level)
'  syllable carrying (high) secondary accent
,  syllable carrying (low) secondary accent

˜  nasalization, e.g. [õ]
¨  centralization, e.g. [ö]
.  more open quality, e.g. lo]
   closer quality, e.g. [o]
[  devoiced lenis consonant, e.g. [5) (above in the case of [g,j,g])
*  syllabic consonant, e.g. [nl (above in the case of [gl)
   dental articulation, e.g. [%]
   post-alveolar articulation

[ ]  phonetic transcription
/ /  phonemic transcription
>  changed to
<  developed from
→  is realized as
*  common in RP (Figs. 8-26 and in Chapter 10)

PART I Speech and Language

1 Communication

1.1 Speech

One of the chief characteristics of human beings is their ability to communicate to their fellows complicated messages concerning every aspect of their activity. A man possessing the normal human faculties achieves this exchange of information mainly by means of two types of sensory stimulation, auditory and visual. Children learn from a very early age to respond to the sounds and tunes which their elders habitually use in talking to them; and, in due course, from a need to communicate, they begin to imitate the recurrent sound patterns with which they have become familiar. In other words, they begin to make use of speech; and their constant exposure to the spoken form of their own language, together with their need to convey increasingly subtle types of information, leads to a rapid acquisition of the framework of spoken language. Nevertheless, with all the conditions in favour, a number of years pass before they master the sound system used in their community.
It is no wonder, therefore, that the learning of another language later in life, acquired artificially in brief and sporadic spells of activity and often without the stimulus arising from an immediate need for communication, will tend to be tedious and rarely more than partially successful. In addition, the more firmly consolidated the basis of a first language becomes and the later in life a second language is begun, the more learners will be subject to resistances and prejudices deriving from the framework of their original language. As we grow older, the acquisition of a new language will normally entail a great deal of conscious, analytical effort, instead of children's ready and facile imitation.

1.2 Writing

Later in childhood children will be taught the conventional visual representation of speech: they will learn to use writing. Today, in considering those languages which have long possessed a written form, we are apt to forget that the writing was originally an attempt at reflecting the spoken language, and that the latter precedes the former for both the individual and the community. Indeed, in many languages, so parallel are the two forms felt to be that the written form may be responsible for changes in pronunciation or may at least tend to impose restraints upon its development. In the case of English, this sense of parallelism, rather than of derivation, may be encouraged by the obvious lack of consistent relationship between sound and spelling. A written form of English, based on the Latin alphabet, has existed for more than 1,000 years and, though the pronunciation of English has been constantly changing during this time, few basic changes of spelling have been made since the fifteenth century. The result is that written English is often an inadequate and misleading representation of the spoken language of today.
Clearly it would be unwise, to say the least, to base our judgements concerning the spoken language on prejudices derived from the orthography. Moreover, if we are to examine the essence of the English language, we must make our approach through the spoken rather than the written form. The primary concern of this book will be the production, transmission, and reception of the sounds of English: in other words, the phonetics of English.

1.3 Language

From the moment that we abandon orthography as our starting-point, it is clear that the analysis of the spoken form of English is by no means simple. Each of us uses an infinite number of different speech sounds when we speak English. Indeed, it is true to say that it is difficult to produce two sounds which are precisely identical from the point of view of instrumental measurement: two utterances by the same person of the word cat may well show quite marked differences when measured instrumentally. Yet we are likely to say that the same sound sequence has been repeated. Additionally we may hear clear and considerable differences of quality in the vowel of cat as, for instance, in the London and Manchester pronunciations of the word; yet, though we recognize differences of vowel quality, we are likely to feel that we are dealing with a 'variant' of the 'same' vowel. It seems, then, that we are concerned with two kinds of reality: the concrete, measurable reality of the sounds uttered, and another kind of reality, an abstraction made in our minds, which appears to reduce this infinite number of different sounds to a 'manageable' number of categories. In the first, concrete, approach, we are dealing with sounds in relation to speech; at the second, abstract, level, our concern is the behaviour of sounds in a particular language. A language is a system of conventional signals used for communication by a whole community.
This pattern of conventions covers a system of significant sound units (the phonemes), the inflexion and arrangement of 'words', and the association of meaning with words. An utterance, an act of speech, is a single concrete manifestation of the system at work. As we have seen, several utterances which are plainly different on the concrete, phonetic, level may fulfil the same function, i.e. are the 'same', on the systematic language level. It is important in any analysis of spoken language to keep this distinction in mind and we shall later be considering in some detail how this dual approach to the utterance is to be made. It is not, however, always possible or desirable to keep the two levels of analysis entirely separate: thus, as we shall see, we will draw upon our knowledge of the linguistically significant units to help us in determining how the speech continuum shall be divided up on the concrete, phonetic, level; and again, our classification of linguistic units will be helped by our knowledge of their phonetic features.

1.4 Redundancy

Finally, it is well to remember that, although the sound system of our spoken languages serves us primarily as a medium of communication, its efficiency as such an instrument of communication does not depend upon the perfect production and reception of every single element of speech. A speaker will, in almost any utterance, provide the listener with far more cues than he needs for easy comprehension. In the first place, the situation, or context, will itself delimit very largely the purport of an utterance. Thus, in any discussion about a zoo, involving a statement such as 'We saw the lions and tigers', we are predisposed by the context to understand lions, even if the n is omitted and the word actually said is liars. Or again, we are conditioned by grammatical probabilities, so that a particular sound may lose much of its significance; e.g.
in the phrase 'These men are working', the quality of the vowel in men is not as vitally important for deciding whether it is a question of men or man as it would be if the word were said in isolation, since here the plurality is determined in addition by the demonstrative adjective preceding men and the verb form following. Then again, there are particular probabilities in every language as to the different combinations of sounds which will occur. Thus in English, if we hear an initial th sound [ð], we expect a vowel to follow, and of the vowels some are much more likely than others. We distinguish such sequences as -gl and -dl in final positions, e.g. in beagle and beadle, but this distinction is not relevant initially, so that even if dloves is said, we understand gloves. Or again, the total rhythmic shape of a word may provide an important cue to its recognition: thus, in a word such as become, the general rhythmic pattern may be said to contribute as much to the recognition of the word as the precise quality of the vowel in the first, weakly accented, syllable. Indeed, we may come to doubt the relative importance of vowels as a help to intelligibility, since we can replace our twenty English vowels by the single vowel [ə] in any utterance and still, if the rhythmic pattern is kept, retain a high degree of intelligibility. An utterance, therefore, will provide a large complex of cues for the listener to interpret, but a great deal of this information will be redundant, as far as the listener's needs are concerned. On the other hand, such an over-proliferation of cues will serve to offset any disturbance such as noise or to counteract the sound-quality divergences which may exist between speakers of two dialects of the same language.
But to insist, for instance, upon exaggerated articulation in order to achieve clarity may well be to go beyond the requirements of speech as a means of communication; indeed, certain obscurations of quality are, and have been for many centuries, characteristic of English. Aesthetic judgements on speech, such as those which deplore the use of the 'intrusive r', take into account social considerations of a somewhat different order from those involved in a study of speech as communication.

1.5 Phonetics and Linguistics

This book describes the sound system of English, but it should be remembered that such a description forms only part of the total description of a language. A complete description of the current state of a language provides information on a number of interrelated components. The phonetics of a language concerns the concrete characteristics (articulatory, acoustic, auditory) of the sounds used in languages, while phonology concerns how sounds function in a systemic way in a particular language. The traditional approach to phonology is through phonemics, which analyses the stream of speech into a sequence of contrastive segments, 'contrastive' here meaning 'contrasting with other segments which might change the meaning' (see further §5.3 below). The phonemic approach to phonology is not the only type of phonological theory but it is the most accessible to those with no training in linguistic theory, besides being more relatable to the writing system. Hence the major part of this book is set within phonemic analysis. Besides being concerned with the sounds of a language, both phonetics and phonology must also describe the combinatory possibilities of the sounds (the phonotactics or syllable structure) and the prosody of the language, that is, how features of pitch, loudness, and length work to produce accent, rhythm, and intonation.
Additionally, a study can be made of the relationship between the sounds of a language and the letters used in its writing system (graphology or graphemics). While this book presents a detailed description of the phonetics and phonemics of English, reference will need to be made from time to time to other components of the language:

(1) The lexicon: the words of the language, the sequence of phonemes of which they are composed, together with their meanings.
(2) The morphology: the structure of words, in particular their inflexion (e.g. start/started, where the past-tense morpheme is added to the stem morpheme). Statements can be made of the phonemic structure of morphemes (the morphophonemics). So the morphophonemics of the English plural morpheme involve the morphophonemic alternations illustrated by the /s/ in cats, the /z/ in dogs, and the /ɪz/ in losses.
(3) The syntax: the description of categories like noun and verb, and the system of rules governing the structure of phrases, clauses, and sentences in terms of order and constituency.
(4) The semantics: the meaning of words and the relationship between word meanings, and the way such meanings are combined to give the meanings of sentences.
(5) The pragmatics: the influence of situation on the interpretation of utterances.

Moreover, various other aspects of linguistics will involve phonetics and phonology. Stylistics concerns the variations involved in different situations and in different styles of speech. Sociolinguistics concerns the interaction between language and society (e.g. the variation involved across classes and between the sexes). Dialectology (often considered a branch of sociolinguistics) concerns the variation in the same language in different regions. Psycholinguistics concerns the behaviour of human beings in their production and perception of language (e.g. how far do we plan ahead and how much of an utterance do we decode at a time?).
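
The plural alternation mentioned under (2) above lends itself to a short illustration. The following is a minimal Python sketch, not a full morphophonemic analysis. It assumes the usual textbook statement of the conditioning, which the passage itself does not spell out: /ɪz/ after sibilants, /s/ after other voiceless consonants, and /z/ elsewhere. The function name and the two phoneme sets are illustrative assumptions.

```python
# Hypothetical helper: choose the plural allomorph from the final phoneme
# of the stem. The conditioning sets are an assumption (the standard
# textbook rule), not something stated in the passage above.
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS_NON_SIBILANTS = {"p", "t", "k", "f", "θ"}


def plural_allomorph(final_phoneme: str) -> str:
    """Return the plural ending as a phoneme string, without slashes."""
    if final_phoneme in SIBILANTS:
        return "ɪz"  # as in losses
    if final_phoneme in VOICELESS_NON_SIBILANTS:
        return "s"   # as in cats
    return "z"       # as in dogs; also after vowels and other voiced sounds


if __name__ == "__main__":
    # The three examples given in the text, with their stem-final phonemes.
    for word, final in [("cat", "t"), ("dog", "g"), ("loss", "s")]:
        print(f"{word}: /{plural_allomorph(final)}/")
```

The order of the tests matters: the sibilant check comes first because /s/ and /ʃ/ are themselves voiceless and would otherwise be given the /s/ ending.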
Language acquisition concerns children's learning of their first language, whereas applied linguistics principally concerns the acquisition of a second language. Finally, it is clear that the various components of a language are always undergoing change in time. The state of a language at any (synchronic) moment must be seen against a background of its historical (diachronic) evolution. It is for this reason that this book includes information on earlier states of the sound system of English, with some speculation on possible developments in the future.

2 The Production of Speech: The Physiological Aspect

2.1 The Speech Chain

Any manifestation of language by means of speech is the result of a highly complicated series of events. The communication in sound of such a simple concept as 'It's raining' involves a number of activities on the part of the speaker. In the first place, the formulation of the concept will take place at a linguistic level, i.e. in the brain; the first stage may, therefore, be said to be psychological. The nervous system transmits this message to the so-called 'organs of speech' and these in turn behave in a conventional manner, which, as we have learned by experience, will have the effect of producing a particular pattern of sound; the second important stage for our purposes may thus be said to be articulatory or physiological. The movement of our organs of speech will create disturbances in the air, or whatever the medium may be, through which we are talking; these varying air pressures may be investigated and they constitute the third stage in our chain, the physical, or acoustic.
Since communication generally requires a listener as well as a speaker, these stages will be reversed at the listening end: the reception of the sound waves by the hearing apparatus (physiological) and the transmission of the information along the nervous system to the brain, where the linguistic interpretation of the message takes place (psychological). Phonetic analysis has often ignored the role of the listener. But any investigation of speech as communication must ultimately be concerned with both the production and the reception ends. Our immediate concern, however, is with the speaker's behaviour and more especially, on the concrete speech level, with the activity involved in the production of sounds. For this reason, we must now examine the articulatory stage (the speech mechanism) to discover how the various organs behave in order to produce the sounds of speech.

2.2 The Speech Mechanism

Man possesses, in common with many other animals, the ability to produce sounds by using certain of his body's mechanisms. The human being differs from other animals in that he has been able to organize the range of sounds which he can emit into a highly efficient system of communication. Non-human animals rarely progress beyond the stage of using the sounds they produce as a reflex of certain basic stimuli to signal fear, hunger, sexual excitement, and the like. Nevertheless, like other animals, man when he speaks makes use of organs whose primary physiological function is unconnected with vocal communication; in particular, those situated in the respiratory tract.

2.2.1 Sources of Energy: The Lungs

The most usual source of energy for our vocal activity is provided by an airstream expelled from the lungs.
There are languages which possess sounds not requiring lung (pulmonic) air for their articulation, and, indeed, in English we have one or two extralinguistic sounds, such as the one we write as tut-tut and the noise of encouragement made to horses, which are produced without the aid of the lungs; but all the essential sounds of English use lung air for their production. Our utterances are, therefore, largely shaped by the physiological limitations imposed by the capacity of our lungs and by the muscles which control their action. We are obliged to pause in articulation in order to refill our lungs with air, and the number of energetic peaks of exhalation which we make will to some extent condition the division of speech into sense-groups. In those cases where the airstream is not available for the upper organs of speech, as when, after the removal of the larynx, lung air does not reach the mouth but escapes from an artificial aperture in the neck, a new source of energy, such as stomach air, has to be employed; a new source of this kind imposes restrictions of quite a different nature from those exerted by the lungs, so that the organization of the utterance into groups is changed and variation of energy is less efficiently controlled.

A number of techniques are available for the investigation of the activity in speech of the lungs and their controlling muscles. At one time air pressure within the lungs was observed by the reaction of an air-filled balloon in the stomach. On the basis of such evidence from a gastric balloon, it was at one time claimed that syllables were formed by chest pulses.1
Such a primitive procedure was replaced by the technique of electromyography, which demonstrated the electrical activity of those respiratory muscles most concerned in speech, notably the internal intercostals; this technique disproved the relationship between chest pulses and syllables.2 X-ray photography can reveal the gross movements of the ribs and hence by inference the surrounding muscles, although the technique of Magnetic Resonance Imaging (MRI) is now preferred on medical grounds.

1 Stetson (1951).
2 Ladefoged (1967).

Fig. 1. Organs of speech.

2.2.2 The Larynx and Vocal Folds

The airstream provided by the lungs undergoes important modifications in the upper parts of the respiratory tract before it acquires the quality of a speech sound. First of all, in the trachea or windpipe, it passes through the larynx, containing the so-called vocal folds, often, less correctly, called the vocal cords, or even vocal chords (see Fig. 1). The larynx is a casing, formed of cartilage and muscle, situated in the upper part of the trachea. Its forward portion is prominent in the neck below the chin and is commonly called the 'Adam's apple'. Housed within this structure from back to front are the vocal folds, two folds of ligament and elastic tissue which may be brought together or parted by the rotation of the arytenoid cartilages (attached at the posterior end of the folds) through muscular action. The inner edge of these folds is typically about 17 to 22 mm long in males and about 11 to 16 mm in females.3 The opening between the folds is known as the glottis. Biologically, the vocal folds act as a valve which is able to prevent the entry into the trachea and lungs of any foreign body, or which may have the effect of enclosing the air within the lungs to assist in muscular effort on the part of the arms or the abdomen.
In using the vocal folds for speech, the human being has adapted and elaborated upon this original open-or-shut function in the following ways (see Fig. 2).

3 Clark and Yallop (1990).

Fig. 2. The vocal cords as seen from above: (a) tightly together as for [ʔ]; (b) loosely together and vibrating as for voiced sounds; (c) open for normal breathing and voiceless sounds. The arytenoid cartilages are labelled.

(1) The glottis may be held tightly closed, with the lung air pent up below it. This 'glottal stop' [ʔ] frequently occurs in English, e.g. when it precedes the energetic articulation of a vowel as in apple [ʔæpl] or when it reinforces /p,t,k/ as in clock [klɒʔk] or even replaces them, as in cotton [kɒʔn]. It may also be heard in defective speech, such as that arising from cleft palate, when [ʔ] may be substituted for the stop consonants, which, because of the nasal air escape, cannot be articulated with proper compression in the mouth cavity.

(2) The glottis may be held open as for normal breathing and for voiceless sounds like [s] in sip and [p] in peak.

(3) The action of the vocal folds which is most characteristically a function of speech consists in their role as a vibrator set in motion by lung air, that is, the production of voice, or phonation; this vocal-fold vibration is a normal feature of all vowels or of such a consonant as [z] compared with voiceless [s]. In order to achieve the effect of voice, the vocal folds are brought sufficiently close together that they vibrate when subjected to air pressure from the lungs. This vibration, of a somewhat undulatory character, is caused by compressed air forcing the opening of the glottis and the resultant reduced air pressure permitting the elastic folds to come together once more; the vibratory effect may easily be felt by touching the neck in the region of the larynx or by putting a finger over each ear flap when pronouncing a vowel or [z] for instance.
In the typical speaking voice of a man, this opening and closing action is likely to be repeated between 100 and 150 times in a second, i.e. there are that number of cycles of vibration (called Hertz, which is abbreviated to Hz); in the case of a woman's voice, this frequency of vibration might well be between 200 and 325 Hz. We are able, within limits, to vary the speed of vibration of our vocal folds or, in other words, are able consciously to change the pitch of the voice produced in the larynx; the more rapid the rate of vibration, the higher is the pitch (an extremely low rate of vibration being partly responsible for what is usually called creaky voice). Normally the vocal folds come together rapidly and part more slowly, the opening phase of each cycle thus being longer than the closing phase. This gives rise to 'modal' (or 'normal') voice which is used for most of English speech. Other modes of vibration result in other voice qualities, most notably breathy and creaky voice, which are used contrastively in a number of languages. (See also §5.8.) Moreover, we are able, by means of variations in pressure from the lungs, to modify the size of the puff of air which escapes at each vibration of the vocal folds; in other words, we can alter the amplitude of the vibration, with a corresponding change of loudness of the sound heard by a listener. The normal human being soon learns to manipulate his glottal mechanism so that most delicate changes of pitch and loudness are achieved. Control of this mechanism is, however, very largely exercised by the ear, so that such variations are exceedingly difficult to teach to those who are born deaf, and a derangement of pitch and loudness control is liable to occur among those who become totally deaf later in life.

(4) One other action of the larynx should be mentioned. A very quiet whisper may result merely from holding the glottis in the voiceless position.
But the more normal whisper, by means of which we are able to communicate with some ease, can be felt to involve energetic articulation and considerable stricture in the glottal region. Such a whisper may in fact be uttered with an almost total closure of the glottis and an escape of air in the region of the arytenoids.

The simplest way of observing the behaviour of the vocal cords is by the use of a laryngoscope, which gives a stationary mirrored image of the glottis. Using stroboscopic techniques, it is possible to obtain a moving record, and high-speed films have been made of the vocal cords, showing their action in ordinary breathing, producing voice and whisper, and closed as for a glottal stop. The modern technique of observation is to use fiberoptic endoscopy coupled if required with a videocamera.

2.2.3 The Resonating Cavities

The airstream, having passed through the larynx, is now subject to further modification according to the shape assumed by the upper cavities of the pharynx and mouth, and according to whether the nasal cavity is brought into use or not. These cavities function as the principal resonators of the voice produced in the larynx.

2.2.3.1 The Pharynx

The pharyngeal cavity (see Fig. 1) extends from the top of the trachea and oesophagus, past the epiglottis and the root of the tongue, to the region at the rear of the soft palate. It is convenient to identify these sections of the pharynx by naming them: laryngopharynx, oropharynx, nasopharynx. The shape and volume of this long chamber may be considerably modified by the constrictive action of the muscles enclosing the pharynx, by the movement of the back of the tongue, by the position of the soft palate which may, when raised, exclude the nasopharynx, and by the raising of the larynx itself.
The position of the tongue in the mouth, whether it is advanced or retracted, will affect the size of the oropharyngeal cavity; the modifications in shape of this cavity should, therefore, be included in the description of any vowel. It is a characteristic of some kinds of English pronunciation that certain vowels, e.g. the [æ] vowel in sad, are articulated with a strong pharyngeal contraction; in addition, a constriction may be made between the lower rear part of the tongue and the wall of the pharynx so that friction, with or without voice, is produced, such fricative sounds being a feature of a number of languages. The pharynx may be observed by means of a laryngoscope or fiberoptic nasendoscopy, and its constrictive actions are revealed by lateral x-ray photography or, nowadays, preferably by MRI.

The escape of air from the pharynx may be effected in one of three ways:

(1) The soft palate may be lowered, as in normal breathing, in which case the air may escape through the nose and the mouth. This is the position taken up by the soft palate in articulation of the French nasalized vowels in such a phrase as un bon vin blanc [œ̃ bɔ̃ vɛ̃ blɑ̃], the particular quality of such vowels being achieved through the function of the nasopharyngeal cavity. Indeed, there is no absolute necessity for nasal airflow out of the nose, the most important factor in the production of nasality being the sizes of the posterior oral and nasal openings (some speakers may even make the nasal cavities vibrate through nasopharyngeal mucus or through the soft palate itself).4

(2) The soft palate may be lowered so that a nasal outlet is afforded to the airstream, but a complete obstruction is made at some point in the mouth, with the result that, although air enters all or part of the mouth cavity, no oral escape is possible. A purely nasal escape of this sort occurs in such nasal consonants as [m, n, ŋ] in the English words ram, ran, rang.
In a snore and some kinds of defective speech, this nasal escape may be accompanied by friction between the rear side of the soft palate and the pharyngeal wall.

(3) The soft palate may be held in its raised position, eliminating the action of the nasopharynx, so that the air escape is solely through the mouth. All normal English sounds, with the exception of the nasal consonants mentioned, have this oral escape. Moreover, if for any reason the lowering of the soft palate cannot be effected, or if there is an enlargement of the organs enclosing the nasopharynx or a blockage brought about by mucus, it is often difficult to articulate either nasalized vowels or nasal consonants. In such speech, typical of adenoidal enlargement or the obstruction caused by a cold, the French phrase mentioned above would have its nasalized vowels turned into their oral equivalents and the English word morning would have its nasal consonants replaced by [b, d, g]. On the other hand, an inability to make an effective closure by means of the raising of the soft palate (either because the soft palate itself is defective or because an abnormal opening in the roof of the mouth gives access to the nasal cavity) will result in the general nasalization of vowels and the failure to articulate such oral stop consonants as [b, d, g]. This excessive nasalization (or hypernasality) is typical of such a condition as cleft palate.

It is evident that the action of the soft palate is accessible to observation by direct means, as well as by lateral x-ray photography and MRI; the pressure of the air passing through the nasal cavities may be measured at the nostrils or within the cavities themselves.

2.2.3.2 The Mouth

Although all the cavities so far mentioned play an essential part in the production of speech sounds, most attention has traditionally been paid to the behaviour of the cavity formed by the mouth. Indeed, in many languages the word tongue is used to refer to our speech and language activity.
Such a preoccupation with the oral cavity is doubtless due to the fact that it is the most readily accessible and easily observed section of the vocal tract; but there is in such an attitude a danger of gross oversimplification. Nevertheless, it is true that the shape of the mouth determines finally the quality of the majority of our speech sounds.
4 Laver (1980).
Far more finely controlled variations of shape are possible in the mouth than in any other part of the speech mechanism. The only boundaries of this oral chamber which may be regarded as relatively fixed are, in the front, the teeth; in the upper part, the hard palate; and, in the rear, the pharyngeal wall. The remaining organs are movable: the lips, the various parts of the tongue, and the soft palate with its pendant uvula (see Fig. 1). The lower jaw, too, is capable of very considerable movement; its movement will control the gap between the upper and lower teeth and also to a large extent the disposition of the lips. The space between the upper and lower teeth will often enter into our description of the articulation of sounds; in all such cases, it is clear that the movement of the lower jaw is ultimately responsible for the variation described. Movement of the lower jaw is also one way of altering the distance between the tongue and the roof of the mouth.
It is convenient for our descriptive purposes to divide the roof of the mouth into three parts: moving backwards from the upper teeth, first, the teeth ridge (adjective: alveolar), which can be clearly felt behind the teeth; secondly, the bony arch which forms the hard palate (adjective: palatal), which varies in size and arching from one individual to another; and finally, the soft palate (adjective: velar), which, as we have seen, is capable of being raised or lowered, and at the extremity of which is the uvula (adjective: uvular). All these parts can be readily observed by means of a mirror.
(1) Of the movable parts, the lips (adjective: labial) constitute the final orifice of the mouth cavity whenever the nasal passage is shut off. The shape which they assume will, therefore, affect very considerably the shape of the total cavity. They may be shut or held apart in various ways. When they are held tightly shut, they form a complete obstruction or occlusion to the airstream, which may either be momentarily prevented from escaping at all, as in the initial sounds of par and bat, or may be directed through the nose by the lowering of the soft palate, as in the initial sound of mat. If the lips are held apart, the positions they assume may be summarized under five headings:
(a) held sufficiently close together over all their length that friction occurs between them. Fricative sounds of this sort, with or without voice, occur in many languages and the voiced variety [β] is sometimes wrongly used by foreign speakers of English for the first sound in the words vet or wet;
(b) held sufficiently far apart for no friction to be heard, yet remaining fairly close together and energetically spread. This shape is taken up for vowels like that in see and is known as the spread lip position;
(c) held in a relaxed position with a lowering of the lower jaw. This is the position taken up for the vowel of get and is known as the neutral position;
(d) tightly pursed, so that the aperture is small and rounded, as in the vowel of do, or more markedly so in the French vowel of doux. This is the close rounded position;
(e) held wide apart, but with slight projection and rounding, as in the vowel of got. This is the open rounded position.
Variations of these five positions may be encountered, e.g. in the vowel of saw, for which a type of lip-rounding between open and close is commonly used. It will be seen from the examples given that lip position is particularly significant in the formation of vowel quality.
English consonants, on the other hand, with the exception of [p,b,m,w], whose primary articulation involves lip action, will tend to share the lip position of the adjacent vowel. In addition, the lower lip is an active articulator in the pronunciation of [f,v], a light contact being made between the lower lip and the upper teeth.
(2) Of all the movable organs within the mouth, the tongue is by far the most flexible, and is capable of assuming a great variety of positions in the articulation of both vowels and consonants. The tongue is a complex muscular structure which does not show obvious sections; yet, since its position must often be described in considerable detail, certain arbitrary divisions are made. When the tongue is at rest, with its tip lying behind the lower teeth, that part which lies opposite the hard palate is called the front and that which faces the soft palate is called the back, with the region where the front and back meet known as the centre (adjective: central). These areas together with the root are sometimes collectively referred to as the body of the tongue. The tapering section facing the teeth ridge is called the blade (adjective: laminal) and its extremity the tip (adjective: apical). The edges of the tongue are known as the rims.
Generally, in the articulation of vowels, the tongue tip remains low behind the lower teeth. The body of the tongue may, however, be 'bunched up' in different ways, e.g. the front may be the highest part, as when we say the vowel of he; or the back may be most prominent, as in the case of the vowel in who; or the whole surface may be relatively low and flat, as in the case of the vowel in ah. Such changes of shape can be felt if the above words are said in succession.
These changes, moreover, together with the variations in lip position, have the effect of modifying very considerably the size of the mouth cavity and of dividing this chamber into two parts: that cavity which is in the forward part of the mouth behind the lips and that which is in the rear, in the region of the pharynx.
The various parts of the tongue may also come into contact with the roof of the mouth. Thus, the tip, blade, and rims may articulate with the teeth, as for the th sounds in English, or with the upper alveolar ridge, as in the case of /t,d,s,z,n/, or the apical contact may be only partial, as in the case of /l/ (where the tip makes firm contact whilst the rims make none), or intermittent in a trilled /r/ as in some forms of Scottish English. In some languages, notably those of India, Pakistan, and Sri Lanka, the tip contact may be retracted to the very back of the teeth ridge or even slightly behind it; the same kind of retroflexion, without the tip contact, is typical of some kinds of English /r/, e.g. those used in south-west England and in the USA.
The front of the tongue may articulate against or near to the hard palate. Such a raising of the front of the tongue towards the palate (palatalization) is an essential part of the [ʃ,ʒ] sounds in English words such as she and measure, being additional to an articulation made between the blade and the alveolar ridge; or again, it is the feature of the [j] sound initially in yield.
The back of the tongue can form a total obstruction by its contact with the soft palate, raised in the case of [k,g] and lowered for [ŋ], as in sing; or again, there may merely be a narrowing between the soft palate and the back of the tongue, so that friction of the type occurring finally in the Scottish pronunciation of loch is heard.
And finally, the uvula may vibrate against the back of the tongue, or there may be a narrowing in this region which causes uvular friction, as at the beginning of the French word rouge.
It will be seen from these few examples that, whereas for vowels the tongue is generally held in a position which is convex in relation to the roof of the mouth, some consonant articulations, such as the southern British English /r/ in red and the /l/ in table, will involve the 'hollowing' of the body of the tongue so that it has, at least partially, a concave relationship with the roof of the mouth. Moreover, the surface of the tongue, viewed from the front, may take on various forms: there may be a narrow groove running from back to front down the mid line as for the /s/ in see, or the grooving may be very much more diffuse as in the case of the /ʃ/ in ship; or again, the whole tongue may be laterally contracted, with or without a depression in the centre (sulcalization), as is the case with various kinds of r sounds.
(3) The oral speech mechanism is readily accessible to direct observation as far as the lip movements are concerned, as are many of the tongue movements which take place in the forward part of the mouth. A lateral view of the shape of the tongue over all its length and its relationship with the palate and the velum may be obtained by means of still and moving x-ray photography and by MRI. It is not, however, to be expected that pictures of the articulation of, say, the vowel in cat will show an identical tongue position for the pronunciation of a number of individuals.
Not only is the sound itself likely to be different from one individual to another, but, even if the sound is for all practical purposes the 'same', the tongue positions may be different, since the boundaries of the mouth cavity are not identical for two speakers; and, in any case, two sounds judged to be the same may be produced by the same individual with different articulations. When, therefore, we describe an articulation in detail, it should be understood that such an articulation is typical for the sound in question, but that variations are to be expected.
Palatography, showing the extent of the area of contact between the tongue and the roof of the mouth, has long been a more practical and informative way of recording tongue movements. At one time the palate was coated with a powdery substance, the articulation was made, and the 'wipe-off' subsequently photographed. But the modern method uses electropalatography, whereby electrodes on a false palate respond to any tongue contact, the contact points being simultaneously registered on a visual display. This has the advantage of showing a series of representations of the changing contacts between the tongue and the palate during speech. Electropalatograms of this sort are used to illustrate the articulations of consonants in Chapter 9.

2.3 Articulatory Description

We have now reviewed briefly the complex modifications which are made to the original airstream by a mechanism which extends from the lungs to the mouth and nose. The description of any sound necessitates the provision of certain basic information:
(1) The nature of the airstream; usually, this will be expelled by direct action of the lungs, but we shall later consider cases where this is not so.
(2) The action of the vocal folds; in particular, whether they are closed, wide apart, or vibrating.
(3) The position of the soft palate, which will decide whether or not the sound has nasal resonances.
(4) The disposition of the various movable organs of the mouth, i.e. the shape of the lips and tongue, in order to determine the nature of the related oral and oropharyngeal cavities.
In addition, it may be necessary to provide other information concerning, for instance, a particular secondary narrowing, or tenseness which may accompany the primary articulation; or again, when it is a question of a sound with no steady state to describe, an indication of the kind of movement which is taking place. A systematic classification of possible speech sounds is given in Chapter 4.

The Sounds of Speech: The Acoustic and Auditory Aspects

3.1 Sound Quality

To complete an act of communication, it is not normally sufficient that our speech mechanism should simply function in such a way as to produce sounds; these in turn must be received by a hearing mechanism and interpreted, after having been transmitted through a medium, such as the air, which is capable of conveying sounds. We must now, therefore, examine briefly the nature of the sounds which we hear, the characteristics of the transmission phase of these sounds, and the way in which these sounds are perceived by a listener.
When we listen to a continuous utterance, we perceive an ever-changing pattern of sound. As we have seen, when it is a question of our own language, we are not conscious of all the complexities of pattern which reach our ears: we tend consciously to perceive and interpret only those sound features which are relevant to the intelligibility of our language. Nevertheless, despite this linguistic selection which we ultimately make, we are aware that this changing pattern consists of variations of different kinds: of sound quality (we hear a variety of vowels and consonants); of pitch (we appreciate the melody, or intonation, of the utterance); of loudness (we will agree that some sounds or syllables sound 'louder' than others); and of length (some sounds will be appreciably longer to our ears than others).
These are judgements made by a listener in respect of a sound continuum emitted by a speaker and, if the sound stimulus from the speaker and response from the listener are made in terms of the same linguistic system, then the utterance will be meaningful for speaker and listener alike. It is reasonable to assume, therefore, that there is some constant relationship between the speaker's articulation and the listener's reception of sound variations. In other words, it should be possible to link through the transmission phase the listener's impressions of changes of quality, pitch, loudness, and length to some articulatory activity on the part of the speaker. It will in fact be seen that an exact parallelism or correlation between the production, transmission, and reception phases of speech is not always easy to establish, the investigation of such relationships being one of the tasks of present-day phonetic studies.
The formation of any sound requires that a vibrating medium should be set in motion by some kind of energy. We have seen that in the case of the human speech mechanism the function of vibrator is often fulfilled by the vocal folds, and that these are activated by air pressure from the lungs. In addition, any such sound produced in the larynx is modified by the resonating chambers of the pharynx, mouth, and, in certain cases, the nasal cavities. The listener's impression of sound quality will be determined by the way in which the speaker's vibrator and resonators function together.
Speech sounds, like other sounds, are conveyed to our ears by means of waves of compression and rarefaction of the air particles (the commonest medium of communication). These variations in pressure, initiated by the action of the vibrator, are propagated in all directions from the source, the air particles themselves vibrating at the same rate (or frequency) as the original vibrator.
In speech, these vibrations may be of a complex but regular pattern, producing 'tone' such as may be heard in a vowel sound; or they may be of an irregular kind, producing 'noise', such as we have in the consonant /s/; or there may be both regular and irregular vibrations present, i.e. a combination of tone and noise, as in /z/. In the production of normal vowels, the vibrator is normally provided by the vocal folds; in the case of many consonant articulations, however, a source of air disturbance is provided by constriction at a point above the larynx, with or without accompanying vocal fold vibrations.
Despite the fact that the basis of all normal vowels is the glottal tone, we are all capable of distinguishing a large number of vowel qualities. Yet the glottal vibrations in the case of [a:] are not very different from those for [i:], when both vowels are said with the same pitch. The modifications in quality which we perceive are due to the action of the supraglottal resonators which we have previously described. To understand this action, it is necessary to consider a little more closely the nature of the glottal vibrations.
It has already been mentioned that the glottal tone is the result of a complex, but mainly regular, vibratory motion. In fact, the vocal folds vibrate in such a way as to produce, in addition to a basic vibration over their whole length (the fundamental frequency), a number of overtones or harmonics having frequencies which are simple multiples of the fundamental or first harmonic. Thus, if there is a fundamental frequency of vibration of 100 Hz, the upper harmonics will be of the order of 200, 300, 400, etc. Hz. Indeed, there may be no energy at the fundamental frequency, but merely the harmonics of higher frequency such as 200, 300, 400 Hz. Nevertheless, we still perceive a pitch which is appropriate to a fundamental frequency of 100 Hz; i.e. the fundamental frequency is the highest common factor of
the frequencies present, whether or not it is present itself.
The number and strength of the component frequencies of this complex glottal tone will differ from one individual to another, and this accounts at least in part for the differences of voice quality by which we are able to recognize speakers. But we can all modify the glottal tone so as to produce at will vowels as different as [i:] and [a:], so that, despite our divergences of voice quality, we can convey the distinction between two words such as key and car. This variation of quality, or timbre, of the glottal tone is achieved by the shapes which we give the resonators above the larynx: the pharynx, mouth, and nasal cavity. These chambers are capable of assuming an infinite number of shapes, each of which will have a characteristic vibrating resonance of its own. Those harmonics of the glottal tone which coincide with the chamber's own resonance are very considerably amplified. Thus, certain bands of strongly reinforced harmonics are characteristic of a particular arrangement of the resonating chambers which produces, for instance, a certain vowel sound. Moreover, these bands of frequencies will be reinforced whatever the fundamental frequency. In other words, whatever the pitch on which we say, for instance, the vowel [a:], the shaping of the resonators and their resonances will be very much the same, so that it is still possible, except on extremely high or low pitches, to recognize the quality intended. It is found that, for male speakers, the vowel [a:] has one such characteristic band of strong components in the region of 700 Hz and another at about 1,100 Hz. The vowel [i:] has, for female speakers, bands of energy at about 320 and 2,700 Hz.
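The arithmetic of harmonics described above can be tried out in a few lines. The following is only an illustrative sketch in Python: the function names are invented for this example, frequencies are treated as whole numbers of Hz, and the resonance bands are reduced to the two centre values quoted for male [a:] (700 and 1,100 Hz). It shows that a set of harmonics implies its fundamental even when the fundamental is absent, and that the harmonics lying nearest the resonance bands stay in roughly the same region when the fundamental changes.

```python
from functools import reduce
from math import gcd


def harmonics(f0_hz, count):
    """Return the first `count` harmonics (whole-number multiples) of f0_hz."""
    return [f0_hz * n for n in range(1, count + 1)]


def implied_fundamental(frequencies_hz):
    """Highest common factor of the component frequencies (integers, in Hz)."""
    return reduce(gcd, frequencies_hz)


def nearest_harmonic(f0_hz, band_centre_hz):
    """Harmonic of f0_hz that lies closest to the centre of a resonance band."""
    n = max(1, round(band_centre_hz / f0_hz))
    return n * f0_hz


# Harmonics of a 100 Hz fundamental.
print(harmonics(100, 4))

# The fundamental is still implied when only the upper harmonics are present.
print(implied_fundamental([200, 300, 400]))

# Assumed band centres for male [a:], taken from the figures quoted in the text.
A_BANDS_HZ = (700, 1100)
for f0 in (100, 120, 150):
    print(f0, [nearest_harmonic(f0, centre) for centre in A_BANDS_HZ])
```

Whatever fundamental is chosen in the final loop, the reinforced harmonics fall close to 700 and 1,100 Hz, which is the sense in which the bands are reinforced whatever the pitch.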
3.2 The Acoustic Spectrum

This complex range of frequencies of varying intensity which go to make up the quality of a sound is known as the acoustic spectrum, and those bands of energy which are characteristic of a particular sound are known as the sound's formants. Thus, formants of [a:] are said to occur, for male speakers, in the region of 700 and 1,100 Hz.
Such complex waveforms can be analysed and displayed as a spectrogram. Originally this display required a special instrument, a spectrograph, but nowadays it is generally done by computer. The spectrogram consists of a three-dimensional display: frequency is shown on the vertical axis, time on the horizontal axis, and the energy at any frequency level either by the density of blackness in a black and white display, or by colours in a colour display. Thus the concentrations of energy at particular frequency bands (the formants) stand out very clearly. Fig. 3 shows spectrograms of the vowels /a:/ and /i:/ as said by a male speaker of British English. Fig. 3 also shows, in the spectrogram of Manchester music shops, the extent to which utterances are not neatly segmented into a succession of sounds but that, on the contrary, considerable overlap is involved.
Such spectrographic analysis provides a great deal of acoustic information in a convenient form. Nevertheless, much of the information given is, in fact, irrelevant to our understanding of speech, and the phonetician is obliged to establish by other methods the elements of the spectrum which are essential to speech communication. For instance, two, or at the most three, formants appear to be sufficient for the correct identification of vowels. As far as the English vowels are concerned, the first three formants are all included in the frequency range 0-4,000 Hz, so that the spectrum above 4,000 Hz would appear to be largely irrelevant to the recognition of our vowels.
It is true that on a telephone system, which may have a frequency range of about 300-3,000 Hz, we find little difficulty in identifying the sound patterns used by a speaker and are even able to recognize voice qualities. Indeed, when we are dealing with a complete utterance in a given context, where there is a multiplicity of cues to help our understanding, a high degree of intelligibility may be retained even when there are no frequencies above 1,500 Hz.
Fig. 3. Spectrograms of individual vowels and consonants and of Manchester music shops.
As one would suspect, there appear to be certain relationships between the formants of vowels and the cavities of the vocal tract (i.e. the shapes taken on by the resonators, notably the relation of the oral and pharyngeal cavities). Thus the first formant appears to be low when the tongue is high in the mouth: e.g. [i:] and [u:], having high tongue positions, have first formants of the order of 280-320 Hz, whereas [a:] and [ɒ] have their first formants in the region 600-800 Hz, their tongue positions being relatively low. On the other hand, the second formant seems to be inversely related to the length of the front cavity: thus [i:], where the tongue is raised high in the front of the mouth, has a second formant around 2,200-2,700 Hz, whereas [u:], where the tongue is raised at the back of the mouth and lips are rounded, has a relatively low second formant around 1,100-1,400 Hz.
It is also confirmed from spectrographic analysis that a diphthong, such as that in my, is indeed a glide between two vowel elements (reflecting a perceptible articulatory movement), since the formants bend from those positions typical of one vowel to those characteristic of another (see Fig. 3).
For many consonant articulations (e.g.
the initial sounds in pin, tin, kin, thin, fin, sin, shin, in which the glottal vibrations play no part) there is an essential noise component, deriving from an obstruction or constriction within the mouth, approximately within the range 2,000-8,000 Hz (see Fig. 3). This noise component is also present in analogous articulations in which vocal fold excitation is present, as in the final sounds of ruse and rouge, where we are dealing with sounds which consist of a combination of glottal tone and noise. Relevant acoustic data concerning both vowel and consonant articulations will be given in the sections dealing with individual English sounds (Chapters 8 and 9).
Spectrographic analysis also reveals the way in which there tends, on the acoustic level, to be a merging of features of units which, linguistically, we treat separately. Thus, our discrimination of [f] and [θ] sounds would appear to depend not only on the frequency and duration of the noise component but also upon a characteristic bending of the formants of the adjacent vowel. Indeed, in the case of such consonants as [p,t,k], which involve a complete obstruction of the airstream and whose release is characterized acoustically by a relatively brief burst of noise, the vowel transition between the noise and the steady state of the vowel appears to be of prime importance for our recognition of the consonant.

3.2.1 Fundamental Frequency: Pitch

Our perception of the pitch of a speech sound depends directly upon the frequency of vibration of the vocal folds. Thus we are normally conscious of the pitch caused by the voiced sounds, especially vowels; pitch judgements made on voiceless or whispered sounds, without the glottal tone, are limited in comparison with those made on voiced sounds, and are induced mainly by variations of intensity or by the dominance of certain harmonics brought about by the dispositions of the resonating cavities.
The higher the glottal fundamental frequency, the higher our impression of pitch. A male voice may have an average pitch level of about 120 Hz and a female voice a level in the region of 220 Hz.1 The pitch level of voices, however, will vary a great deal between individuals and also within the speech of one speaker, the total range of one speaking voice being liable to be as extensive as 80-350 Hz. Yet our perception of frequency extends further than the limits of glottal fundamental frequency, since our recognition of quality depends upon frequencies of a much higher order. In fact, the human ear perceives frequencies from as low as 16 Hz to about 20,000 Hz and in some cases even higher. As one becomes older, this upper limit may fall considerably, so that at the age of fifty it may extend no higher than about 10,000 Hz. As we have seen, such a reduced range is no impediment to perfect understanding of speech, since a high percentage of acoustic cues for speech recognition fall within the range 0-4,000 Hz.
1 Fant (1936).
Our perception of pitch is not, however, solely dependent upon fundamental frequency. Variations of intensity on the same frequency may induce impressions of a change of pitch; and conversely, tones of very high or very low frequency, if they are to be audible at all, require greater intensity than those in a middle range of frequencies.
Instrumental measurement of fundamental frequency based on signals received through a microphone employs two general methods. The first is to count the number of times that a particular pattern is repeated within a selected segment of a waveform such as that provided on an oscillogram. The second is to track the progress of the fundamental frequency on a spectral display like that provided on a spectrogram, or, alternatively, to track the progress of a particular harmonic and divide by the relevant number.
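The two general methods just described can be illustrated with a small sketch in Python. It is not a description of any particular instrument or program: the function names, the 8,000 Hz sampling rate, and the use of a synthetic sine wave in place of a recorded waveform are all assumptions made for the example. The first function counts repetitions of a pattern (here, upward zero crossings) within a segment; the second divides the frequency of a tracked harmonic by its number.

```python
import math


def synth_tone(f0_hz, duration_s, sample_rate=8000):
    """Synthesize a sine wave as a stand-in for a voiced segment (assumption)."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * f0_hz * i / sample_rate) for i in range(n)]


def f0_by_cycle_count(samples, sample_rate=8000):
    """Method 1: count repeated cycles between the first and last upward zero crossing."""
    crossings = [
        i + 1
        for i, (a, b) in enumerate(zip(samples, samples[1:]))
        if a < 0 <= b
    ]
    if len(crossings) < 2:
        raise ValueError("segment too short to contain a full cycle")
    cycles = len(crossings) - 1
    span_s = (crossings[-1] - crossings[0]) / sample_rate
    return cycles / span_s


def f0_from_harmonic(harmonic_hz, harmonic_number):
    """Method 2: divide the frequency of a tracked harmonic by its number."""
    return harmonic_hz / harmonic_number


segment = synth_tone(120, 0.5)
print(round(f0_by_cycle_count(segment), 1))
print(f0_from_harmonic(600, 5))
```

With real speech the waveform is far less regular than a sine wave, and choosing the wrong harmonic number in the second method gives exactly the kind of octave error that automatic trackers are known to make.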
Nowadays various computer programs are available which average the results from a range of measurements based on the two general methods noted above. But even with such sophisticated programs there are still likely to be the occasional mistakes like octave jumps (the difference between two harmonics representing an octave perceptually).
A third method of fundamental frequency extraction involves direct measurement of the vibration of the vocal folds either by glottal illumination or by electroglottography. The best known technique in the latter class involves using a laryngograph.2 Electrodes are attached to the outside of the throat, and the varying electrical impedance is monitored and projected onto a visual display. The signal generated by the variation in impedance can also be stored, enabling this technique to be used outside the laboratory.
Measures of fundamental frequency do not always correspond to our auditory perception of pitch. Different segments affect the fundamental frequency in different ways: for example, other things being equal, an [i] will have a higher fundamental frequency than an [a] and a [p] will produce a higher frequency on a following vowel than a [b]. What is more, many slight changes in frequency will be undetectable by the ear. As in many other cases of instrumental measurement, we still have to use our auditory perception to interpret what instruments tell us.
2 Abberton and Fourcin (1984).

3.2.2 Intensity: Loudness

Our sensation of the relative loudness of sounds may depend on several factors, e.g. a sound or syllable may appear to stand out from its neighbours (be 'louder') because a marked pitch change is associated with it or because it is longer than its neighbours. It is better to use a term such as prominence to cover these general listener-impressions of variations in the perceptibility of sounds. More strictly, what
is 'loudness' at the receiving end should be related to intensity at the production stage, which in turn is related to the size or amplitude of the vibration. An increase in amplitude of vibration, with its resultant impression of greater loudness, is brought about by an increase in air pressure from the lungs. As we shall see (§10.2), this greater intensity is not in itself usually the most important factor in rendering a sound prominent in English. Moreover, all other things being equal, some sounds appear by their nature to be more prominent or sonorous than others, e.g. the vowel in barn has more carrying power than that in bean, and vowels generally are more powerful than consonants.
The judgements we make concerning loudness are not as fine as those made for either quality or pitch. We may judge which of two sounds is the louder, but we find it difficult to express the extent of the difference. Indeed, in terms of our linguistic system, we need perceive and interpret only gross differences of loudness, despite the fact that when we judge quality we are, in recognizing the formant structure of a sound, reacting to characteristic regions of strong intensity in the spectrum.

3.2.3 Duration: Length

In addition to affording different auditory impressions of quality, pitch, and loudness, sounds may appear to a listener to be of different length. Clearly, whenever it is possible to establish the boundaries of sounds or syllables, it will be possible to measure their duration by means of such traces as are provided by oscillograms or spectrograms. Such delimitation of units, in both the articulatory and acoustic sense, may be difficult, as we shall see when we deal with the segmentation of the utterance. But, even when it can be done, variations of duration in acoustic terms may not correspond to our linguistic judgements of length.
We shall, for instance, refer later to the 'long' vowels of English such as those of bean and barn, as compared with the 'short' vowel in bin. But, in making such statements, we shall not be referring to absolute duration values, since the duration of all vowels will vary considerably from utterance to utterance, according to factors like whether the utterance is spoken fast or slowly, whether the syllable containing the vowel is accented or not, and whether the vowel is followed by a voiced or voiceless consonant. In the English system, however, we know that no more than two degrees of length are ever linguistically significant, and all absolute durations will be interpreted in terms of this relationship. This distinction between measurable duration and linguistic length provides another example of the way in which our linguistic sense interprets from the acoustic material only that which is significant.
The sounds composing any utterance will have varying durations, and we will have the impression that some syllables are longer than others. Such variations of length within the utterance constitute one manifestation of the rhythmic delivery which is characteristic of English and so is fundamentally different from the flow of other languages, such as French, where syllables tend to be of much more even length.
As already mentioned, the absolute duration of sounds or syllables will depend, among other things, upon the speed of utterance. An average rate of delivery might contain anything from about 6 to 20 sounds per second, but lower and much higher speeds are frequently used without loss of intelligibility. The time required for the recognition of a sound will depend upon the nature of the sound and the pitch, vowels and consonants differing considerably in this respect, but it seems that a vowel lasting only about 4 msec may have a good chance of being recognized.
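The rates of delivery quoted above can be turned into average durations with one line of arithmetic. The short Python sketch below (the function name is invented for this example) simply divides one second by the number of sounds it contains; it gives a mean only, and says nothing about how individual sounds in an utterance vary around that mean.

```python
def mean_sound_duration_ms(sounds_per_second):
    """Average duration of one sound, in milliseconds, at a given rate of delivery."""
    return 1000.0 / sounds_per_second


# The two ends of the range of rates quoted in the text.
for rate in (6, 20):
    print(rate, "sounds per second:", round(mean_sound_duration_ms(rate), 1), "ms each")
```

On these figures the average sound lasts roughly between 50 and 167 msec, so even at the faster rate it is many times longer than the 4 msec mentioned as possibly sufficient for recognizing a vowel.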
3.2.4 'Stress'

We have purposely avoided the use of the word 'stress' in this chapter because this word has been used in different and ambiguous ways in phonetics and linguistics. It has sometimes been used as simply equivalent to loudness, sometimes as meaning 'made prominent by means other than pitch' (i.e. by loudness or length), and sometimes as referring just to syllables in words in the lexicon and meaning something like 'having the potential for accent on utterances'. Throughout this book we will avoid use of the term 'stress' altogether, using prominence as the general term referring to segments or syllables, sonority as the particular term referring to the carrying power of individual sounds, and accent as referring to those syllables which stand out above others, either in individual words or in longer utterances.

3.3 Hearing

Our hearing mechanism must be thought of in two ways: the physiological mechanism which reacts to the acoustic stimuli, that is, the varying pressures in the air which constitute sound; and the psychological activity which, at the level of the brain, selects from the gross acoustic information that which is relevant in terms of the linguistic system involved. In this way, measurably different acoustic stimuli may be interpreted as being the 'same' sound unit. As we have seen, only part of the total acoustic information seems to be necessary for the perception of particular sound values. One of the tasks which confront the phonetician is the disentanglement of these relevant features from the mass of acoustic material, such as modern methods of sound analysis make available. The most fruitful technique of discovering the significant acoustic cues is that of speech synthesis, controlled by listeners' judgements. After all, the sounds [ɑː] and [s] are [ɑː] and [s] only if listeners recognize them as such.
Thus, it has been established that only two formants are necessary for the recognition of vowels, because machines which generate sound of the appropriate frequency bands and intensity produce vowels which are correctly identified by listeners.

Listeners without any phonetic training can, therefore, frequently give valuable guidance by their judgements of synthetic qualities. But it is important to be aware of the limitations of such listeners, so as to be able to make a proper evaluation of their judgements. A listener's reactions are normally conditioned by his experience of handling his own language. Thus, if there are only five significant vowel units in his language, he is liable to allow a great deal more latitude in his assessment of what is the 'same' vowel sound than if he has twenty. An Englishman, for instance, having a complex vowel system and being accustomed to distinguishing such subtle distinctions as those in sit, set, sat, will be fairly precise in his judgement of vowel qualities. A Spaniard, however, whose vowel system is made up of fewer significant units, is likely for this reason to be more tolerant of variation of quality. Or again, if a listener is presented with a system of synthetic vowels which is numerically the same as his own, he is able to make allowance for considerable variations of quality between his and the synthetic system and still identify the vowels correctly, by their 'place' in a system rather than by their precise quality; this is what he does when he listens to and understands his language as used by a speaker of a different dialect.

Our hearing mechanism also plays an important part in monitoring our own speech; it places a control upon our speech production which is complementary to our motor, articulatory, habits. If this feedback control is disturbed, e.g.
by the imposition of an artificial delay upon our reception of our own speech, disturbance in the production of our utterance is likely to result. Those who are born deaf or who become deaf before the acquisition of speech habits are rarely able to learn normal speech; similarly, a severe hearing loss later in life is likely to lead eventually to a deterioration of speech.

4 The Description and Classification of Speech Sounds

4.1 Phonetic Description

We have considered briefly both the mechanism which produces speech sounds and also some of the acoustic and auditory characteristics of the sounds themselves. It is now important to formulate a method of description and classification of the sound types which occur in speech and, more particularly, in English. We have seen that a speech sound has at least three stages available for investigation: the production, transmission, and reception stages. A complete description of a sound should, therefore, include information concerning all three stages. To describe the first sound in the word ten merely in terms of the movements of the organs of speech is to ignore the nature of the sound which is produced and the features perceived by a listener. Nevertheless, to provide all the information in respect of all phases entails a lengthy description, much of which may be irrelevant to a particular purpose. For example, since the description of the sounds of a language has in the past been most commonly used in the teaching of the language to foreigners, the emphasis has always been laid on the articulatory event. Moreover, it is only comparatively recently that there has existed any considerable body of acoustic information concerning speech. The most convenient and brief descriptive technique continues to rely either on articulatory criteria or on auditory judgements, or on a combination of both.
Thus, those sounds which are commonly known as 'consonants' are most easily described mainly in terms of their articulation, whereas 'vowel' sounds require for their description a predominance of auditory impressions.

4.2 Vowel and Consonant

Two types of meaning are associated with the terms 'vowel' and 'consonant'. Traditionally, consonants are those segments which, in a particular language, occur at the edges of syllables, while vowels are those which occur at the centre of syllables. So, in red, wed, dead, lead, said, the sounds represented by <r,w,d,l,s> are consonants, while in beat, bit, bet, but, bought, the sounds represented by <ea,i,e,u,ough> are vowels. This reference to the functioning of sounds in syllables in a particular language is a phonological definition. But once any attempt is made to define what sorts of sounds generally occur in these different syllable-positions, then we are moving to a phonetic definition. This type of definition might define vowels as median (air must escape over the middle of the tongue, thus excluding the lateral [l]), oral (air must escape through the mouth, thus excluding nasals like [n]), frictionless (thus excluding fricatives like [s]), and continuant (thus excluding plosives like [p]); all sounds excluded from this definition would be consonants. But difficulties arise in English with this definition (and with others of this sort) because English /j,w,r/, which are consonants phonologically (functioning at the edges of syllables), are vowels phonetically. Because of this, these sounds are often called semi-vowels. The reverse type of difficulty is encountered in words like sudden and little, where the final consonants /n/ and /l/ form syllables on their own and hence must be the centre of such syllables even though they are phonetically consonants, and even though /n/ and /l/ more frequently occur at the edges of syllables, as in net and let.
When occurring in words like sudden and little, nasals and laterals are called syllabic consonants.

In this chapter we will be describing and classifying speech sounds phonetically (in the next chapter we return to the phonological definitions). We shall find that consonants can be voiced or voiceless, and are most easily described wholly in articulatory terms, since we can generally feel the contacts and movements involved. Vowels, on the other hand, are voiced, and, depending as they do on subtle adjustments of the body of the tongue, are more easily described in terms of auditory relationships.

4.3 Consonants

We have seen, in the preceding chapters, that the production of a speech sound may involve the action of a source of energy, a vibrator, and the movement of certain supraglottal organs. In the case of consonantal articulations, a description must provide answers to the following questions:

(1) Is the airstream set in motion by the lungs or by some other means? (pulmonic or non-pulmonic)
(2) Is the airstream forced outwards or sucked inwards? (egressive or ingressive)
(3) Do the vocal folds vibrate or not? (voiced or voiceless)
(4) Is the soft palate raised, directing the airstream wholly through the mouth, or lowered, allowing the passage of air through the nose? (oral, or nasal or nasalized)
(5) At what point or points and between what organs does the closure or narrowing take place? (place of articulation)
(6) What is the type of closure or narrowing at the point of articulation? (manner of articulation)

In the case of the sound [z], occurring medially in the word easy, the following answers would be given:

(1) pulmonic
(2) egressive
(3) voiced
(4) oral
(5) tongue tip-alveolar ridge
(6) fricative

These answers provide a concise phonetic label for the sound; a more detailed description would include additional information concerning, for instance, the shape of the remainder of the tongue, the relative position of the jaws, and the lip position.
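The six questions above amount to a small record with one field per answer. The following Python sketch is offered only as an illustration of that idea; the class and field names are hypothetical and are not part of any established phonetic notation or library.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class ConsonantLabel:
    """One field per question (1)-(6); names are illustrative only."""
    airstream: str   # (1) "pulmonic" or "non-pulmonic"
    direction: str   # (2) "egressive" or "ingressive"
    voicing: str     # (3) "voiced" or "voiceless"
    velum: str       # (4) "oral", "nasal" or "nasalized"
    place: str       # (5) place of articulation
    manner: str      # (6) manner of articulation

    def describe(self) -> str:
        """Join the six answers, in order, into a concise label."""
        return ", ".join(astuple(self))


# The answers given in the text for [z], as in the middle of 'easy'.
z = ConsonantLabel(
    airstream="pulmonic",
    direction="egressive",
    voicing="voiced",
    velum="oral",
    place="tongue tip-alveolar ridge",
    manner="fricative",
)
print(z.describe())
```

A fuller description of the kind mentioned above (tongue shape, jaw position, lip position) would need further fields, which this sketch deliberately leaves out.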
4.3.1 Egressive Pulmonic Consonants

Most speech sounds are made with egressive lung air. Virtually all English sounds are so made, the exception being [p,t,k], which in some dialects become ejectives (see §4.3.9 below).

4.3.2 Voicing

At any place of articulation, a consonantal articulation may be voiceless or voiced.

4.3.3 Place of Articulation

The chief points of articulation are the following:

Bilabial. The two lips are the primary articulators, e.g. [p,b,m].
Labiodental. The lower lip articulates with the upper teeth, e.g. [f,v].
Dental. The tongue tip and rims articulate with the upper teeth, e.g. [θ,ð], as in think and then.
Alveolar. The tip or blade of the tongue articulates with the alveolar ridge, e.g. [t,d,l,n,s,z].
Post-alveolar. The tip (and rims) of the tongue articulate with the rear part of the alveolar ridge, e.g. [ɹ] as at the beginning of English red.
Retroflex. The tip of the tongue is curled back to articulate with the part of the hard palate immediately behind the alveolar ridge, e.g. [ɻ] such as is found in southwest British and American English pronunciation of red.
Palato-alveolar. The blade, or the tip and blade, of the tongue articulates with the alveolar ridge and there is at the same time a raising of the front of the tongue towards the hard palate, e.g. [ʃ,ʒ,tʃ,dʒ] as in English ship, measure, beach, edge.1
Palatal. The front of the tongue articulates with the hard palate, e.g. [j] or [ç] as in queue [kjuː] or [kçuː] or a very advanced type of [k,g] = [c,ɟ], as in French quitter or guide.
Velar. The back of the tongue articulates with the soft palate, e.g. [k,g,ŋ], the last as in sing.
Uvular. The back of the tongue articulates with the uvula, e.g. [ʁ] as in French rouge.
Glottal. An obstruction, or a narrowing causing friction but not vibration, between the vocal folds, e.g. [h].

1 Note that these are called post-alveolar on the chart of the International Phonetic Alphabet (Table 1).
In the case of some consonantal sounds, there may be a secondary place of articulation in addition to the primary. Thus, in the so-called 'dark' [ɫ], as at the end of pull, in addition to the partial alveolar contact, there is an essential raising of the back of the tongue towards the velum (velarization); or, again, some post-alveolar articulations of [ɹ] are accompanied by slight lip-rounding (labialization). The place of primary articulation is that of the greatest stricture, that which gives rise to the greatest obstruction to the airflow. The secondary articulation exhibits a stricture of lesser rank. Where there are two coextensive strictures of equal rank, an example of double articulation results.

4.3.4 Manner of Articulation

The obstruction made by the organs may be total, intermittent, or partial, or may merely constitute a narrowing sufficient to cause friction. The chief types of articulation, in decreasing degrees of closure, are as follows:

(1) Complete Closure

Plosive. A complete closure at some point in the vocal tract, behind which the air pressure builds up and can be released explosively, e.g. [p,b,t,d,k,g,ʔ].
Affricate. A complete closure at some point in the mouth, behind which the air pressure builds up; the separation of the organs is however slow compared with that of a plosive, so that friction is a characteristic second element of the sound, e.g. [tʃ,dʒ].
Nasal. A complete closure at some point in the mouth but, the soft palate being lowered, the air escapes through the nose. These sounds are continuants and, in the voiced form, have no noise component; they are, to this extent, vowel-like, e.g. [m,n,ŋ].

(2) Intermittent Closure

Trill (or roll). A series of rapid intermittent closures made by a flexible organ on a firmer surface, e.g. [r], where the tongue tip trills against the alveolar ridge as in Spanish perro, or [ʀ] where the uvula trills against the back of the tongue, as in a stage pronunciation of French rouge.
Tap.
A single tap made by a flexible organ on a firmer surface, e.g. [ɾ] where the tongue tip taps once against the teeth ridge, as in many Scottish pronunciations of English /r/.

(3) Partial Closure

Lateral. A partial (but firm) closure is made at some point in the mouth, the airstream being allowed to escape on one or both sides of the contact. These sounds may be continuant and frictionless and therefore vowel-like (i.e. approximants like the sounds in (5) below), as in [l,ɫ], as pronounced in southern British little [lɪtɫ], or they may be accompanied by a little friction [l̥] as in fling or by considerable friction [ɬ] as in please.

(4) Narrowing

Fricative. Two organs approximate to such an extent that the airstream passes between them with friction, e.g. [f,v,θ,ð,s,z,ʃ,ʒ,ç,x,h]. In the bilabial region, a distinction is to be made between those purely bilabial such as [ɸ,β], where the friction occurs between spread lips, and a labial-velar sound like [ʍ], where the friction occurs between rounded lips and is accompanied by a characteristic modification of the mouth cavity brought about by the raising of the back of the tongue towards the velum. [ç] occurs at the beginning of huge, [x] and [ʍ] in Scottish pronunciations of loch and which, and [β] in Spanish haber.

(5) Narrowing without Friction

Approximant (or Frictionless Continuant). A narrowing is made in the mouth but the narrowing is not quite sufficient to cause friction. In being frictionless and continuant, approximants are vowel-like; however, they function phonologically as consonants, i.e. they appear at the edges of syllables. They also differ phonetically from such sounds functioning as vowels in either of two ways. Firstly, the articulation may not involve the body of the tongue, e.g. post-alveolar [ɹ] and labiodental [ʋ], the former the usual pronunciation in RP at the beginning of red, the latter a speech-defective pronunciation of the same sound.
Secondly, where they do involve the body of the tongue, the articulations represent only brief glides to a following vowel: thus [j] in yet is a glide starting from the [i] region and [w] in wet is a glide starting from the [u] region.

4.3.5 Obstruents and Sonorants

It is sometimes found useful to classify categories of sounds according to their noise component. Those in whose production the constriction impeding the airflow through the vocal tract is sufficient to cause noise are known as obstruents. This category comprises plosives, fricatives, and affricates. Sonorants are those voiced sounds in which there is no noise component (i.e. voiced nasals, approximants, and vowels).

4.3.6 Fortis and Lenis

A voiceless/voiced pair such as English /s,z/ are distinguished not only by the presence or absence of voice but also by the degree of breath and muscular effort involved in the articulation. Those English consonants which are usually voiced tend to be articulated with relatively weak energy, whereas those which are always voiceless are relatively strong. Indeed, we shall see that in certain situations the so-called voiced consonants may have very little voicing, so that the energy of articulation becomes a significant factor.

4.3.7 Classification of Consonants

The chart of the International Phonetic Alphabet (IPA) (see Table 1) shows manner of articulation on the vertical axis; place of articulation on the horizontal axis; and a pairing within each box thus created shows voiceless consonants on the left and voiced consonants on the right.

4.3.8 Ingressive Pulmonic Consonants

Consonants of this type, made as we are breathing in, sometimes occur in languages as variants of their egressive pulmonic equivalents. So we may use such sounds when we are out of breath, but have not got time to pause, either because the need for communication is pressing or because we do not wish someone else to have a chance to speak.
The use of such an ingressive pulmonic airstream is, however, variable between languages, and is not especially common in English. Individual sounds may occur as speech defects. Some sounds may also occur extralinguistically, so in English a common way of expressing surprise or pain involves the energetic inspiration of air accompanied by bilabial or labiodental friction.

4.3.9 Egressive Glottalic Consonants

In the production of these sounds, known as ejectives, the glottis is closed, so that lung air is contained beneath it. A closure or narrowing is made at some point above the glottis (the soft palate being raised) and the air between this point and the glottis is compressed by a general muscular constriction of the chamber and a raising of the larynx. Thus a bilabial ejective plosive sound [p'] may be made by compressing the air in this way behind the lips. However, it is not only plosives which may be ejective; affricates and fricatives commonly have this type of compression in a number of languages, e.g. [ts',tʃ',s',x']. If the glottis is tightly closed, it follows that this type of articulation can apply only to voiceless sounds. [p',t',k'] occur sometimes in final positions in some dialects of English (e.g. in south-east Lancashire). These are not to be confused with the more common variants of final [p,t,k] which are frequently (e.g. in London English) replaced or reinforced by a glottal stop; i.e. the final sound in the word stop may be replaced by a glottal stop or have a glottal closure accompanying the bilabial one, but there is no compression between the glottal and bilabial closures.

4.3.10 Ingressive Glottalic Consonants

For these sounds a complete closure is made in the mouth but, instead of air pressure from the lungs being compressed behind the closure, the almost completely closed larynx is lowered so that the air in the mouth and pharyngeal cavities is rarefied.
The result is that outside air is sucked in once the mouth closure is released; at the same time, there is sufficient leakage of lung air through the glottis to produce voice. It will be seen that the resulting sound is made by means of a combined airstream mechanism, namely a pulmonic airstream in combination with ingressive glottalic air. Such ingressive stops (generally voiced) are known as implosives and occur with bilabial [ɓ], dental or alveolar [ɗ], or velar [ɠ] mouth closures. Though such sounds occur in a number of languages, sometimes in the speech of the deaf, and in types of stammering, they are not found in normal English. In some languages voiceless implosives may occur, which of course means that in these cases the larynx must be completely closed.

4.3.11 Ingressive Velaric Consonants

Another set of sounds involving an ingressive airstream is produced entirely by means of closures within the mouth cavity; normal breathing through the nose may continue quite independently if the soft palate is lowered, and may even produce accompanying nasalization. Thus, the sound made to indicate irritation or sympathy (often written as 'tut-tut') is articulated by means of a double closure, the back of the tongue against the velum and the tip, blade, and sides against the alveolar ridge and side teeth. The cavity contained within these closures is then enlarged mainly by tongue movement, so that the air is rarefied. The release of the forward closure causes the outer air to be sucked in; the release may be crisp, in which case a sound of a plosive type is heard, or relatively slow, in which case an affricated sound is produced. These sounds are known as clicks, the one referred to above being a dental click [ǀ]. The sound made to encourage horses is a lateral click, i.e. the air is sucked in by releasing one side of the tongue [ǁ]. These clicks and several others occur as significant sounds in a small number of languages in Africa (e.g.
Zulu) and paralinguistically in most languages (as in English).

4.4 Vowels

This category of sounds is normally made with a voiced egressive airstream, without any closure or narrowing such as would result in the noise component characteristic of many consonantal sounds; moreover, the escape of the air is characteristically accomplished in an unimpeded way over the middle line of the tongue. We are now concerned with a glottal tone modified by the action of the upper resonators of the mouth, pharyngeal, and nasal cavities. As we have seen (Chapters 2 and 3), the movable organs mainly responsible for shaping these resonators are the soft palate, lips, and tongue. A description of vowel-like sounds must, therefore, note:

(1) the position of the soft palate: raised for oral vowels, lowered for nasalized vowels;
(2) the kind of aperture formed by the lips: degrees of spreading or rounding;
(3) the part of tongue which is raised and the degree of raising.

Of these three factors, only the second, the lip position, can be easily described by visual or tactile means. Our judgement of the action of the soft palate depends less on our feeling for its position than on our perception of the presence or absence of nasality in the sound produced. Again, the movements of the tongue, which so largely determine the shape of the mouth and pharyngeal cavities, may be so minute that it is impossible to assess them by any simple means; moreover, there being normally no contact of the tongue with the roof of the mouth, no help is given by any tactile sensation. A vowel description will usually, therefore, be based mainly on auditory judgements of sound relationships, together with some articulatory information, especially as regards the position of the lips. In addition, an acoustic description can be given in terms of the disposition of the characteristic formants of the sound.
4.4.1 Difficulties of Description

The description of vowel sounds, especially by means of the written word, has always presented considerable difficulty. Certain positions and gross movements of the tongue can be felt. We are, for instance, aware that when we pronounce most vowel sounds the tongue tip lies behind the lower teeth; moreover, in comparing two such vowels as /iː/ (key) and /ɑː/ (car) (Fig. 4), we can feel that, in the case of the former, the front of the tongue is the part which is mainly raised, whereas in the case of the latter, such raising as there is is accomplished by the back part of the tongue. Therefore, it can be stated in articulatory terms that some vowel sounds require the raising of the front of the tongue, while others are articulated with a typical 'hump' at the back; and these statements can be confirmed by means of x-ray photography. But the actual point and degree of raising is more difficult to judge. It is not, for instance, helpful to say that a certain vowel is articulated with the front part of the tongue raised to within 5 mm of the hard palate. This may be a statement of fact for one person's pronunciation, but an identical sound may be produced by another speaker with a different relationship between the tongue and palate. Moreover, we would not find it easy to judge whether our tongue was at 4 or 5 mm from the palate. It is no more helpful to relate the vowel quality to a value used in a particular language, as is still so often done. A statement such as 'a vowel quality similar to that in the English word cat' is not precise, since the vowel in cat may have a wide range of values in English. The statement becomes more useful if the accent of English is specified, but even then a number of variant interpretations will always be possible.

4.4.2 Cardinal Vowels

It is clear that a finer and more independent system of description is needed, on both the auditory and articulatory levels.
The most satisfactory scheme is that devised by Daniel Jones and known as the cardinal vowel system. The basis of the system is physiological, i.e. the two qualities upon which all the others were 'hinged' were produced with the tongue in certain easily felt positions: the front of the tongue raised as close as possible to the palate without friction being produced, for the Cardinal Vowel [i]; and the whole of the tongue as low as possible in the mouth, with very slight raising at the extreme back, for the Cardinal Vowel [ɑ].

Fig. 4. Tongue positions of [iː], [ɑː].

Starting from the [i] position, the front of the tongue was lowered gradually, the lips remaining spread or neutrally open and the soft palate raised. The lowering of the tongue was halted at three points at which the vowel qualities seemed, from an auditory standpoint, to be equidistant. The tongue positions of these qualities were x-rayed and were found to be more or less equidistant from a spatial point of view. The symbols [e,ɛ,a] were assigned to these vowel values. The same procedure was applied to vowel qualities depending on the height of the back of the tongue, thus raising the back of the tongue from the [ɑ] position; the lips were changed progressively from a wide open shape (for [ɑ]) to a closely rounded one and the soft palate remained raised. Again, three auditorily equidistant points were established from the lowest to the highest position; the corresponding tongue positions were photographed and the spatial relationships confirmed as for the front vowels. These values were given the symbols [ɔ,o,u]. Thus a scale of eight primary Cardinal Vowels was set up, denoted by the following numbers and symbols: 1, [i]; 2, [e]; 3, [ɛ]; 4, [a]; 5, [ɑ]; 6, [ɔ]; 7, [o]; 8, [u].
It is to be noticed that the front series [i,e,ɛ,a] and [ɑ] of the back series are pronounced with spread or open lips, whereas the remaining three members of the back series have varying degrees of lip-rounding. The combinations of tongue and lip positions in the primary Cardinal Vowels are the most frequent in languages; i.e. front and open vowels are most commonly unrounded while back vowels other than in the open position are most commonly rounded. A secondary series can be obtained by reversing the lip positions, e.g. lip-rounding applied to the [i] tongue position, or lip-spreading applied to the [u] position. Such a secondary series is denoted by the following numbers and symbols: 9, [y]; 10, [ø]; 11, [œ]; 12, [ɶ]; 13, [ɒ]; 14, [ʌ]; 15, [ɤ]; 16, [ɯ].

This complete series of sixteen Cardinal Vowel values may be divided into two lip shape categories, with corresponding tongue positions:

unrounded: [i,e,ɛ,a,ɑ,ʌ,ɤ,ɯ]
rounded: [y,ø,œ,ɶ,ɒ,ɔ,o,u]

Such a scale is useful because (a) the vowel qualities are unrelated to particular values in languages, though many may occur in various languages, and (b) the set is recorded, so that reference may always be made to a standard, invariable scale.2 Thus a vowel quality can be described as being, for instance, similar to that of Cardinal 2 ([e]), or another as being a type half-way between Cardinal 6 ([ɔ]) and Cardinal 7 ([o]), but somewhat centralized. Diacritics are available in the IPA alphabet to show modifications of Cardinal values, e.g. a subscript ˕ to mean more open, a subscript ˔ meaning closer, and raised dots ¨ to mean centralized. The last example given above might in this way be symbolized as [ɔ̝̈] or [ö̞].

It is, moreover, possible to give a visual representation of these vowel relationships on a chart which is based on the Cardinal Vowel tongue positions. The simplified diagram shown in Fig. 5 is obtained by plotting the highest point of tongue-raising for each of the primary Cardinal Vowels and joining the points together. The internal triangle, corresponding to the region of central or [ə]-type vowel sounds, is made by dividing the top line into three approximately equal sections and drawing lines parallel to the two sides, so that they meet near the base of the figure. On such a figure, the sound symbolized by [ɔ̝̈] or [ö̞] may have its relationship to the Cardinal scale shown visually (see the black circle on Fig. 5).

It must be understood that this diagram is a highly conventionalized one which shows, above all, quality relationships. Some attempt is, however, made to relate the shape of the figure to actual tongue positions: thus the range of movement is greater at the top of the figure, and the tongue-raising of front vowels becomes more retracted as the tongue position lowers. Nevertheless, it has been shown that it is possible to articulate vowel qualities without the tongue and lip positions which this diagram seems to postulate as necessary. It is, for instance, possible to produce a sound of the Cardinal 7 ([o]) type without the lip-tongue relationship suggested. But, on the whole, it may be assumed that a certain auditorily identified vowel quality will be produced by an articulation of the kind presupposed by the Cardinal Vowel diagram. Moreover, it is a remarkable fact that the auditory judgements as to vowel relationships made by Daniel Jones have been largely supported by recent acoustic analysis; in fact, a chart based on an acoustic analysis of Cardinal Vowel qualities corresponds very well with the traditional Cardinal Vowel figure.

2 Copies of the original recording of the Cardinal Vowels by Daniel Jones are available from the Phonetics Laboratory, Department of Linguistics, University of Manchester, Manchester M13
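The numbering of the sixteen Cardinal Vowels follows a regular pattern, which the short Python sketch below tries to capture. It is an illustration under stated assumptions rather than a standard tool: the function names are hypothetical, and the symbols are simply the conventional IPA letters for Cardinal Vowels 1 to 16 typed in by hand.

```python
# Conventional IPA symbols, listed in order of Cardinal Vowel number.
PRIMARY = ["i", "e", "ɛ", "a", "ɑ", "ɔ", "o", "u"]     # nos. 1-8
SECONDARY = ["y", "ø", "œ", "ɶ", "ɒ", "ʌ", "ɤ", "ɯ"]   # nos. 9-16


def symbol(number: int) -> str:
    """Return the symbol for Cardinal Vowel `number` (1-16)."""
    if not 1 <= number <= 16:
        raise ValueError("Cardinal Vowel numbers run from 1 to 16")
    return (PRIMARY + SECONDARY)[number - 1]


def counterpart(number: int) -> int:
    """Number of the vowel with the same tongue position, reversed lips."""
    symbol(number)  # reuse the range check
    return number + 8 if number <= 8 else number - 8


def is_rounded(number: int) -> bool:
    """Primary nos. 6-8 are rounded; the secondary series reverses the lips."""
    symbol(number)  # reuse the range check
    if number <= 8:
        return number >= 6
    return not is_rounded(counterpart(number))


unrounded = [symbol(n) for n in range(1, 17) if not is_rounded(n)]
rounded = [symbol(n) for n in range(1, 17) if is_rounded(n)]
print("unrounded:", ",".join(unrounded))
print("rounded:  ", ",".join(rounded))
```

The sketch encodes only the lip position and the pairing of primary and secondary values; tongue height and frontness, which the Cardinal Vowel diagram shows, are not modelled here.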
4.4.3 Nasality

Besides the information concerning lip and tongue positions which the chart and symbolization denote, a vowel description must also indicate whether the vowel is purely oral or whether it is nasalized. The sixteen Cardinal Vowels mentioned may all be transformed into their nasalized counterparts if the soft palate is lowered. It is unusual, however, to find such an extensive series of nasalized vowels, since it is unusual (though not unknown) for languages to make such fine, significant, distinctions of nasalized qualities as are common in the case of the purely oral values.

Fig. 5. The primary Cardinal Vowels; the area symbolized by [ɔ̝̈] or [ö̞] is shown as a circle.

4.4.4 Relatively Pure Vowels v. Gliding Vowels

It is clearly not possible for the quality of a vowel to remain absolutely constant (or, in other words, for the organs of speech to function for any length of time in an unchanging way). Nevertheless, we may distinguish between those vowels which are relatively pure (or unchanging), such as the vowel in learn, and those which have a considerable and voluntary glide, such as the gliding vowel in line. The so-called pure vowels will be marked on the diagram as a dot, showing the highest point of the tongue, or, better, as a ring, since it would be inadvisable to attempt to be over-precise in the matter of these auditory judgements; the gliding (or diphthongal) vowel sound will be shown as an arrow, which indicates the quality of the starting-point and the direction in which the quality change is made (corresponding to a movement of the tongue). Fig. 6 shows the way in which the vowels of learn and line will be marked.

We are now in a position to give a practical and comprehensive description of a vowel sound, partly in articulatory terms, partly in auditory terms.
The vowel which we have symbolized above as [ɔ̝̈] or [ö̞] might be described in this way: 'A vowel quality between Cardinal Vowels nos. 6 and 7, but having a somewhat centralized value; the lips are fairly closely rounded; and the soft palate is raised'. Such a written description will have a meaning in terms of sound for anyone who is familiar with the Cardinal Vowel scale. There may, of course, be other features of the sound to mention, e.g. a breathy or creaky voice quality.
Fig. 6. The vowels of learn and line.
4.4.5 Articulatory Classification of Vowels
Although precise descriptions of vowels are better done auditorily, nevertheless it is convenient to have available a rough scheme of articulatory classification. Such a scheme is represented by the vowel diagram on the chart of the International Phonetic Alphabet (IPA) as shown in Table 1. It will immediately be noticed that this is of similar shape to the Cardinal Vowel diagram. Labels are provided to distinguish between front, central, and back, and between four degrees of opening: close, close-mid, open-mid, and open (see Fig. 7).
Fig. 7. Classification labels combined with the Cardinal Vowel diagram.
At each intersection point on the periphery of the diagram on the IPA chart (Table 1) two symbols are supplied; these symbols are the same as those used for the Cardinal Vowels. However, on this chart the unrounded vowel is always the first of the pair and the rounded the second; this means that we cannot say that the first corresponds to the primary Cardinal and the second to the secondary Cardinal. (It will be remembered that primary Cardinals involve the most frequent lip positions, back vowels being more usually rounded.) The IPA diagram also supplies us with a number of additional symbols for vowels in certain positions, [i,as,i,o,e] being used for unrounded vowels and [u,o,e] for rounded vowels.
Of course, this chart does not show nasalized vowels.
5 Sounds in Language
5.1 Speech Sounds and Linguistic Units
We have now considered a method of describing and classifying the sounds capable of being produced by the speech organs. Speech is, however, a manifestation of language, and spoken language is normally a continuum of sound. A speech sound, produced in isolation and without the meaningfulness imposed by a linguistic system, may be described in purely phonetic terms; but any purely phonetic approach to the sounds of language encounters considerable difficulties. Two initial problems concern, first, the identification and delimitation of the sound unit (or segment) to be described and, secondly, the way in which different sounds are treated, for the purpose of linguistic analysis, as if they were the same. As we have seen, in any investigation of speech, it is at the physiological and acoustic levels that most information is available to us. Yet, especially on the articulatory level, as is revealed by moving x-ray photography, any utterance consists of apparently continuous movements by a very large number of organs; it is well-nigh impossible to say, simply from an x-ray film of the speech organs at work, how many speech sounds have been uttered. A display of acoustic information is easier to handle (see Fig. 3), but even here it is not always possible, because of the way in which many sounds merge into one another, to delimit exactly the beginning and end of sound segments. Moreover, even if it were possible to identify the main characteristics of certain sounds without being sure of their limits, it would not follow that the phonetic statement we might accordingly make concerning the sequence of sound segments would be a useful one in terms of the language which we were investigating.
Thus, the word tot is frequently pronounced in the London region in such a way that it is possible to identify five sound segments: [t], [s], [h], [ɒ], [t]. Yet much of this phonetic reality may be discarded as irrelevant when it is a question of the structure of the word tot in terms of the sound system of English. Indeed, the speaker himself will probably feel that the utterance tot consists of only three 'sounds' (and not only because of the influence of the spelling), such a judgement on his part being a highly sophisticated one which results from his experience in hearing and speaking English. In other words, the [s] and [h] segments are to be treated as part of the phonological, or linguistic, unit /t/.¹ The phonetic sequence [tsh] does not, in an initial position in this type of English, consist of three meaningful units; in other languages, on the other hand, such a sequence might well constitute three linguistic units as well as three phonetic segments. This same example illustrates how different sounds may count, in respect of their function in a language, as the same linguistic unit. In such a pronunciation of tot as is noted above, the first realization of /t/ might be described as consisting of: (1) a voiceless stop made by the tongue tip and rims against the alveolar ridge and side teeth; (2) a slow release of the compressed air, so that friction is heard: [s]; (3) the complete disengagement of the tongue from the roof of the mouth, so that no friction is caused in the mouth, but with an interval before the beginning of the next sound, during which there is friction in the glottis (and voiceless resonance in the supraglottal cavities): [h].
The second manifestation of /t/, on the other hand, might have an articulation which could be described phonetically as follows: (1) an alveolar stop made as before, but with a simultaneous stop made in the glottis; (2) the glottal closure is released, but the oral stop is retained slightly longer, during which time the air escapes through the nasal passage, the soft palate being lowered. The first [t] might be briefly described as a voiceless alveolar plosive, released with affrication and aspiration; the second as an unexploded voiceless alveolar plosive made with a simultaneous glottal stop. These two different articulations, with the resultant difference of sound, nevertheless function as the same linguistic unit, the first sound occurring predictably under strong accent initially in a syllable and the second being a typical manifestation of the unit in a final position. Such an abstract linguistic unit, which will include sounds of different types, is called a phoneme; the different phonetic realizations of a phoneme are known as its allophones.
5.2 The Linguistic Hierarchy
It is clear, as we hinted in Chapter 1, that speech and language require in their analysis different types of unit. An utterance, on the concrete speech level, will consist of the continuous physiological activity which results in a continuum of sound; the largest unit will, therefore, be the span of sound occurring between two silences. Within this unit of varying extent it may be possible to find smaller segments. It is, however, from the abstract, linguistic, level of analysis that we receive guidance as to how the utterance may be usefully segmented in the case of any particular language. We might find, for instance, that an utterance such as 'The boys ran quickly away and were soon out of sight' is spoken without a pause or interruption for breath; it might be said to constitute a single breath group on the articulatory level.
But, on the linguistic level, we know that this utterance is capable of being analysed as a sentence consisting of two clauses. Moreover, certain extensive sequences occurring within the utterance might be meaningfully replaced by other sound sequences, e.g. boys might be replaced by dogs, ran by walked, quickly by slowly, etc. These replaceable sound sequences are able to stand by themselves and are called words. In written forms of language, it usually happens that words are separated from each other by spaces, this being a sophisticated convention which is not reflected in speech. (We shall see, however, in Chapter 11, that words may retain at their boundaries certain characteristics in connected speech, so that their presence and span is signalled on a phonetic as well as a linguistic level.)
¹ It is customary to distinguish sound segments from linguistic sound units (phonemes) by using [ ] to enclose the former and / / to enclose the latter.
Yet there are meaningful units smaller than the word. The word boys may be divided into boy and s ([z]), where the presence or absence of [z] indicates the plural or singular form; quickly may be said to consist of quick and the adverbial suffix -ly. These are smaller sound sequences which may be interchanged meaningfully, but which may or may not be capable of standing by themselves. These smaller units, known as morphemes, may correspond with words, e.g. boy, in which case they may stand alone, or they may not normally occur other than in association with a word. There is, however, a yet lower level at which meaningful commutation is possible. The word ran is also a morpheme; but if, instead of saying [ræn], we say [rʌn], we have, by changing an element on a lower level than the morpheme, changed the meaning and function of the word. This basic linguistic element, beyond which it is not necessary to go for practical purposes, is what we have already referred to as a phoneme.
A phoneme may, therefore, be thought of as the smallest contrastive linguistic unit which may bring about a change of meaning.
5.3 Phonemes
It is possible to establish the phonemes of a language by means of a process of commutation or the discovery of minimal pairs, i.e. pairs of words which are different in respect of only one sound segment. The series of words pin, bin, tin, din, kin, chin, gin, fin, thin, sin, shin, win supplies us with twelve words which are distinguished simply by a change in the first (consonantal) element of the sound sequence. These elements, or phonemes, are said to be in contrast or opposition; we may symbolize them as /p,b,t,d,k,tʃ,dʒ,f,θ,s,ʃ,w/. But other sound sequences will show other consonantal oppositions, e.g. (1) tame, dame, game, lame, maim, name, adding /g,l,m,n/ to our inventory; (2) pot, tot, cot, lot, yacht, hot, rot, adding /j,h,r/; (3) pie, tie, buy, thigh, thy, vie, adding /ð,v/; (4) two, do, who, woo, zoo, adding /z/. Such comparative procedures reveal twenty-two consonantal phonemes capable of contrastive function initially in a word. It is not sufficient, however, to consider merely one position in the word. Possibilities of phonemic opposition have to be investigated in medial and final positions as well as in the initial. If this is done in English, we discover in medial positions another consonantal phoneme, /ʒ/, cf. the word oppositions letter, leather, leisure or seater, seeker, Caesar, seizure. This phoneme /ʒ/ does not occur in initial positions and is rare (e.g. in rouge) in final positions. Moreover, in final positions, we do not find /h/ or /r/, and it is questionable whether we should consider /w,j/ as separate, final, contrastive units (see §8.2). We do, however, find one more phoneme that is common in medial and final positions but unknown initially, viz. /ŋ/; cf. simmer, sinner, singer or some, son, sung.
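The commutation procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy lexicon, its transcriptions, and the function names `minimal_pairs` and `contrasts_at` are hypothetical and do not come from the text. The sketch looks for words of equal length that differ in exactly one segment, then collects the segments shown to contrast at a given position.

```python
from itertools import combinations

# Hypothetical toy lexicon: spelling -> tuple of phoneme symbols.
# The transcriptions are illustrative assumptions, not taken from the text.
LEXICON = {
    "pin": ("p", "ɪ", "n"),
    "bin": ("b", "ɪ", "n"),
    "tin": ("t", "ɪ", "n"),
    "thin": ("θ", "ɪ", "n"),
    "shin": ("ʃ", "ɪ", "n"),
    "some": ("s", "ʌ", "m"),
    "son": ("s", "ʌ", "n"),
    "sung": ("s", "ʌ", "ŋ"),
}


def minimal_pairs(lexicon):
    """Return (word_a, word_b, position, (seg_a, seg_b)) for every pair of
    equal-length words that differ in exactly one segment."""
    pairs = []
    for (w1, t1), (w2, t2) in combinations(sorted(lexicon.items()), 2):
        if len(t1) != len(t2):
            continue
        diffs = [i for i, (a, b) in enumerate(zip(t1, t2)) if a != b]
        if len(diffs) == 1:
            i = diffs[0]
            pairs.append((w1, w2, i, (t1[i], t2[i])))
    return pairs


def contrasts_at(lexicon, position):
    """Collect the segments shown to contrast at the given position."""
    found = set()
    for _, _, i, segs in minimal_pairs(lexicon):
        if i == position:
            found.update(segs)
    return found


initial = contrasts_at(LEXICON, 0)  # contrasts in word-initial position
final = contrasts_at(LEXICON, 2)    # contrasts in word-final position
print(sorted(initial))
print(sorted(final))
```

With this toy data the velar nasal turns up only among the final contrasts, which mirrors the point that an inventory established from one position in the word is incomplete.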
Such an analysis of the consonantal phonemes of English will give us a total of twenty-four phonemes, of which four (/h,r,ʒ,ŋ/) are of restricted occurrence, or six, if /w,j/ are not admitted finally. Similar procedures may be used to establish the vowel phonemes of English (see Chapter 8). The final inventories of vowel and consonant phonemes will constitute a statement of the total oppositions in all positions in the word or syllable; when any particular place in the word or syllable is taken into consideration, the number of terms in the series of oppositions is likely to be more restricted.
5.3.1 Diversity of Phonemic Solutions
It is important to emphasize the fact that it is frequently possible to make several different statements of the phonemic structure of a language, all of which may be equally valid from a logical standpoint. The solution chosen will be the one which is most convenient as regards the use to which the phonemic analysis is to be put. Thus one solution might be appropriate when it is a question of teaching a language to a particular group of foreigners, when similarities and differences between two languages may need to be underlined; another solution might be appropriate if it is a question of using the phonemic analysis as a basis for an orthography, when sociolinguistic considerations (e.g. relations with other countries having particular orthographic conventions) have to be taken into account. Even without such considerations, discrepancies in analysis frequently arise in the case of such sound combinations as affricates (e.g. [tʃ,dʒ,tr,dr]) and diphthongs (e.g. [eɪ,əʊ,aɪ,aʊ]), which may be treated as single phonemes or combinations of two. Such problems concerning particular English sounds will be dealt with when vowels and consonants are considered in detail.
5.3.2 Distinctive Features
Up to now we have obtained an inventory of phonemes for English which is no more than a set of relationships or oppositions.
The essence of the phoneme /p/, for instance, is that it is not /t/ or /k/ or /s/, etc. This is a negative definition, which it is desirable to amplify by means of positive information of a phonetic type. Thus we may say that /p/ is, from a phonetic point of view, characteristically voiceless (compared with voiced /b/); labial (compared with the places of articulation of such sounds as /t/ or /k/); and plosive (compared with /f/). The /p/ phoneme may, therefore, be defined positively by stating the combination of distinctive features which identify it within the English phonemic system: voiceless, labial, plosive. As originally conceived, the distinctive features of a language were stated in articulatory terms using as a basis the phonetic classification of consonants described in the previous chapter. So the distinctive features of English /p/ were voiceless, labial, and plosive. Here there are three dimensions of variation: voicing, place, and manner. But it was conceded that the distinctive features of a language might involve more or less than three dimensions. For example, in some languages (e.g. in Tamil, a language of south India) voicing is not a distinctive feature (so changing from [p] to [b] does not bring about a change of meaning), and thus only place and manner are distinctive. In other languages we may need to state four dimensions of variation. In Hindi not only is voicing (and place and manner) distinctive but aspiration is also separately distinctive from voice; compare /kaan/ 'ear', /kʰaan/ 'mine', /gaan/ 'anthem', /gʱaan/ 'quantity'. Such articulatory distinctive features sometimes involve two terms (voiceless v. voiced, aspirated v. unaspirated), sometimes three (e.g. labial /p,b/ v. alveolar /t,d/ v. velar /k,g/ in English), and sometimes more.
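The positive definition of a phoneme as a bundle of feature values can be made concrete with a small Python sketch. The feature values below follow the three traditional dimensions named above (voicing, place, manner). The dictionary layout and the function names `differing_dimensions` and `is_uniquely_identified` are illustrative assumptions, not part of any established analysis.

```python
# Traditional (non-binary) feature values for a few English consonants.
# The data layout and function names are assumptions made for illustration.
FEATURES = {
    "p": {"voicing": "voiceless", "place": "labial", "manner": "plosive"},
    "b": {"voicing": "voiced", "place": "labial", "manner": "plosive"},
    "t": {"voicing": "voiceless", "place": "alveolar", "manner": "plosive"},
    "d": {"voicing": "voiced", "place": "alveolar", "manner": "plosive"},
    "k": {"voicing": "voiceless", "place": "velar", "manner": "plosive"},
    "g": {"voicing": "voiced", "place": "velar", "manner": "plosive"},
    "s": {"voicing": "voiceless", "place": "alveolar", "manner": "fricative"},
}


def differing_dimensions(a, b):
    """Name the dimensions on which two phonemes have different values."""
    return sorted(
        dim for dim in FEATURES[a] if FEATURES[a][dim] != FEATURES[b][dim]
    )


def is_uniquely_identified(phoneme):
    """True if no other phoneme in the table has the same feature bundle."""
    bundle = FEATURES[phoneme]
    return sum(other == bundle for other in FEATURES.values()) == 1


print(differing_dimensions("p", "b"))  # voicing only
print(differing_dimensions("p", "t"))  # place only
print(differing_dimensions("p", "s"))  # place and manner
print(all(is_uniquely_identified(ph) for ph in FEATURES))
```

In this small table every phoneme has a unique bundle, so 'voiceless, labial, plosive' is enough to pick out /p/. A language needing a fourth dimension, such as aspiration in Hindi, would simply add another key to each bundle.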
Later developments in the theory of distinctive features have involved explaining all the contrasts of a language in terms of binary distinctive features and suggesting that there is a set of binary features (involving around twelve or thirteen distinctions) which will account for all languages. An apparent three-term distinction like labial v. alveolar v. velar is turned into two features with plus or minus values; using 'coronal' to mean 'made with the blade of the tongue raised above the neutral position' and 'anterior' to mean 'made in front of the hard palate', the English plosives /p,b,t,d,k,g/ are then defined as follows:
            p,b   t,d   k,g
coronal      -     +     -
anterior     +     +     -
In the most well-known set of binary distinctive features², many features are still articulatory, although some are auditory or acoustic (e.g. 'strident'). In this book we use distinctive feature analysis (of the more traditional kind which allows non-binary dimensions) where such analysis is not in doubt and where it is obviously explanatory. This means that we frequently refer to feature analysis when describing the consonants of English, but use it very little when describing the vowels, since almost all distinctive feature analysis in this area is disputed and not always helpful.
5.3.3 Allophones
No two realizations of a phoneme are the same. This is true even when the same word is repeated; thus, when the word cat is said twice, there are likely to be slight phonetic variations in the two realizations of the phoneme sequence /k + æ + t/. Nevertheless, the phonetic similarities between the utterances will probably be more striking than the differences. But variants of the same phoneme occurring in different words or in different positions in a word will frequently show consistent phonetic differences; such consistent variants are referred to as allophones. We have seen (§5.1) how different the initial and final allophones of /t/ in the word tot may be.
Or again, the [k] sounds which occur initially in the words key and car are phonetically clearly different: the first can be felt to be a forward articulation, near the hard palate, whereas the second is made further back, on the soft palate. This difference of articulation is brought about by the nature of the following vowel, [iː] having a more advanced articulation than [ɑː]; the allophonic variation is in this case conditioned by the context. In some varieties of English the two [l] sounds of lull [lʌɫ] show a variation of a different kind. The first [l], the so-called 'clear' [l] with a front vowel resonance, has a quality very different from that of the final 'dark' [ɫ] with a back vowel resonance. Here the difference of quality is related to the position of the phoneme in the word or syllable and is not dependent upon the phonetic context, i.e. the adjacent sounds. It is possible, therefore, to predict in a given language which allophones of a phoneme will occur in any particular context or situation: they are said to be in complementary distribution. Statements of complementary distribution can refer to preceding or following sounds (e.g. fronted [k̟] before front vowels like /iː/ in key but retracted [k̠] before back vowels like /ɑː/ in car); to positions in syllables (plosives are strongly aspirated when initial in accented syllables); or to positions in any grammatical unit, e.g. words (vowels may optionally be preceded by a glottal stop when word-initial) or morphemes (Cockney has a different allophone of /ɔː/ in morpheme-medial and morpheme-final positions, cf. board [bɔʊd] v. bored [bɔwəd]). Complementary distribution does not take into account those variant realizations of the same phoneme in the same situation which may constitute the difference between two utterances of the same word. When the same speaker produces noticeably different pronunciations of the word cat (e.g.
by exploding or not exploding the final /t/), the different realizations of the phonemes are said to be in free variation. Again, the word very may be pronounced [veɹɪ] (where the middle consonant is an approximant) or [veɾɪ] (where the middle consonant is a flap). The approximant and the flap are here in free variation. Variants in free variation are also allophones (since, like those in complementary distribution, they are not involved in changes of meaning). It is usually the case that there is some phonetic similarity between the allophones of a phoneme: for example, both the [l] sounds discussed above, as well as the voiceless fricative variety which follows /p/ or /k/ in words such as please and clean, are lateral articulations. It sometimes happens that two sounds occur in complementary distribution, but are not treated as allophones of the same phoneme because of their total phonetic dissimilarity. This is the case of [h] and [ŋ] in English; they are never significantly opposed, since [h] occurs typically in initial positions in the syllable or word, and [ŋ] in final positions. A purely logical arrangement might include these two sounds within the same phoneme, so that hung might be transcribed phonemically as either /hʌh/ or /ŋʌŋ/; but such a solution would ignore the total lack of phonetic similarity and also the feeling of native speakers. The ordinary native speaker is, in fact, often unaware of the allophonic variations of his phonemes and will, for instance, say that the various allophones of /l/ we have discussed are the 'same' sound; [h] and [ŋ], however, he will always consider to be 'different' sounds. When he makes a statement of this kind, he is usually referring to the function of the sounds in the language system, and can thereby offer helpful, intuitive information regarding the phonemic organization of his language.
In the case of a language such as English, prejudices induced by the existence of written forms have naturally to be taken into account in evaluating the native speaker's reaction.
² Chomsky and Halle (1968).
5.3.4 Neutralization
It sometimes happens that a sound may be assigned to either of two phonemes with equal validity. In English, examples of this kind are to be found in the plosive series. The contrast between English /p,t,k/ and /b,d,g/ is shown in word-initial position by pairs like pin/bin, team/deem, come/gum. However, following /s/ there is no such contrast. Words beginning /sp-, st-, sk-/ are not contrasted with words beginning /sb-, sd-, sg-/, although a distinction sometimes occurs word-medially, as in disperse/disburse and discussed/disgust (suggesting a syllable division between the /s/ and the following plosive). In such circumstances we say that the contrast between /p,t,k/ and /b,d,g/, the contrast between voiceless and voiced, is neutralized following /s/ in word-initial position. Words like spin, steam, and scar could equally well be transcribed with /b,d,g/ as with /p,t,k/. Indeed, even though the writing system itself suggests /p,t,k/ (/k/ may be written with
[...]
or [miːs], where the change [uː] > [yː] can be explained by the fronting of [uː] under the influence of the [iː] of the following syllable. Such a combinative change belongs to OE, but a more recent change of this type is exemplified by words such as swan. This word was probably pronounced [swan] or [swæn] in about 1600, but the [w] sound has rounded and retracted the vowel to give the modern form [swɒn]. The large majority of earlier [w] + [a] sequences have now given [w] + [ɒ], or [ɔː], by reason of this combinative change affecting this particular sound sequence, e.g. want, quality, war, water.
(3) Some changes are neither independent nor dependent upon the phonetic context; they may be said to be external to the main line of evolution.
Thus it was fashionable in Elizabethan times to pronounce such words as servant and heard with [ær] or [ar], perhaps originally a dialect form, rather than with [er], the regular form of development; these words, with some exceptions such as clerk, have reverted to the normal development of ME [er] > [ɜː] rather than [ɑː]. It was also fashionable to pronounce the termination -ing as [ɪn], only now retained as a special form of affectation or in some dialects. Such changes, involving a change of distribution of phonemes among words and morphemes, do not affect the phonemic system of the language. The introduction of foreign words may, however, at least temporarily and in the speech of a restricted number of individuals, disturb the number of phonemes or their distribution as regards position in the word. Thus, if the French word beige is used in English with the pronunciation /beɪʒ/, we have a case of a final /ʒ/ previously unknown in English words; or again, if restaurant is pronounced with any kind of nasalized vowel in the last syllable, the possibility of a new kind of vocalic opposition is introduced into the language. However, such foreign borrowings generally tend to conform to the English system: words with a final French /ʒ/, such as prestige or camouflage, may be realized in the English form with /dʒ/, and a word with a nasalized vowel like restaurant will be normalized to /ˈrestərɒŋ/, /ˈrestərɒnt/, or /ˈrestrənt/.
(4) In addition to changes of quality, there have also to be taken into account changes involving length and accentual pattern. Thus the vowel in such words as path, half, pass, still short three hundred years ago, is now long in the south of England. Or again, the vowels in good, book and breath, death, once long, are now relatively short.
Changes of accent are particularly striking in the case of words which have come into the language from French: in ME, such words as village or necessary retained their accent on the penultimate syllable: /vɪˈlaːdʒə/ and /neseˈsaːrɪə/. Now, the accent has shifted to an earlier syllable, together with associated changes of quality: /ˈvɪlɪdʒ/, /ˈnesəsrɪ/ (the latter may retain the ME pattern in American English). Later borrowings, or those in less common use, often retain the French accentual pattern; thus hotel or machine have the accent on the final syllable, whereas, if they had conformed to the English system, we might have had such modern forms as /ˈhəʊtl/ and /ˈmætʃɪn/ or /ˈmeɪtʃɪn/, in the same way that the thoroughly anglicized form of garage gives /ˈgærɪdʒ/. (See §7.5 on current changes.)
6.2.2 Rate and Route of Vowel Change
The English vowels have been subject to more striking changes than have the consonants. This is not surprising, for a consonantal articulation usually involves an approximation of organs which can be felt; such an articulation tends to be more stable, in that it is more easily identified and transmitted more exactly from one generation to another. Changes in the consonantal system comparatively rarely involve a modification of sound (an example of such a modification would be the affrication, for combinative reasons, of the OE palatal plosives [c,ɟ] to [tʃ,dʒ] as in church < OE cirice and bridge < OE brycg). Far more common is the type of distributional change involving the conferment of phonemic status on an existing sound (e.g. [v,ð,z], allophones of /f,θ,s/ in OE, later obtain contrastive, phonemic, significance), the disappearance of an allophone (e.g. postvocalic [x] and [ç] in such words as brought and right were largely lost in the south of England by the seventeenth century), or the insertion of an existing phoneme in a particular class of words (e.g.
the initial /h/ in words of French origin such as herb, homage). Whether it is a question of consonantal change, loss, or addition, it is usually possible to explain the type of modification which has taken place and the approximate period during which it occurred. A modification of vowel quality will, however, result from very slight changes of tongue or lip position, and there may be a series of imperceptible gradations before an appreciable quality change is evident (or is capable of being expressed by means of the Latin vowel letters). It is particularly difficult to assess rate and phonetic route of change in the case of those internal independent vowel changes which affect a phoneme throughout the language. It is known, for instance, that the modern homophones meet and meat had in ME different vowel forms, approximately of the value [eː] and [ɛː]. The [eː] vowel of meet became [iː] by about 1500, and it might be postulated that by a process of gradual change the [ɛː] of meat first closed to [eː] and then, by the eighteenth century, coalesced with the [iː] in meet. The available evidence, however, suggests that the change [ɛː] > [iː] may not have been either simple or gradual, but that two pronunciations existed side by side for a long period (the conservative [ɛː] beside another form [iː] which had resulted from an early coalescence with the meet vowel). In other vowel changes, it may be agreed that the change was gradual, but it is difficult to date precisely the stages of development. Thus the modern /aɪ/ of time results from a ME [iː] value; it is clear that the change has been one of progressive, widening diphthongization, but there may have been a period of incipient diphthongization when there was hesitation between the pure vowel [iː] and some such diphthong as [ɪi] or [əi].
It is well to remember, therefore, that at any particular time in history there are likely to be a number of different, coexistent realizations of vowel phonemes, not only between regions but also between generations and social groups. An example of such variety in modern English is provided by the vowel at the end of city, which in the south of England may be rendered as [ɪ] by the older generation and as something more like [i] by younger people. The speech of any community may, therefore, be said to reflect the pronunciation of the previous century and to anticipate that of the next.
6.2.3 Sound Change and the Linguistic System
It is convenient to study sound change in terms of the development of particular phonemes or sounds, but it is misleading to ignore the relationship of the sound units to the system within which they function and which may, in fact, not be changing. In other words, although there may be considerable qualitative changes, the number and pattern of the terms within the system may show relative stability. The ME /iː/ phoneme, for instance, is now realized as [aɪ], but there is still a phonemic opposition which contrasts such words as time, team, tame, term, tomb, and, in any case, a new phoneme /iː/ has emerged in words of the team type. On the other hand, the system may change because a sound, without itself changing, may receive a new, phonemic, value; e.g. the sound [ŋ] has always existed in English as a realization of /n/ followed by the velars /k/ or /g/, but when the final /g/ in a word like sing was no longer pronounced, /ŋ/ contrasted significantly with /n/ and /m/, e.g. ram, ran, and rang. Since the system of our language consists of a framework of significant oppositions by means of which we communicate, it may be assumed that there is a tendency for the system to remain stable, the loss of an opposition involving a possibility of confusion.
In fact, of course, the redundancy of English is such that some degree of neutralization of phonemes is easily tolerated: today, few speakers in the south of England distinguish saw and sore by means of an opposition /ɔː/-/ɔə/, yet the loss of the /ɔə/ diphthong is no impediment to communication. An example of an earlier coalescence of vowel phonemes is that illustrated by the homophony of meet and meat. On the other hand, new oppositions may emerge in the language, e.g. the phonemes /v,ð,z,ŋ/, as we have seen. Nevertheless, despite the adjustments in the number of phonemes which have taken place, the history of the English sound system displays, over the last 1,000 years, a considerable degree of stability. Though the relationships within the system may tend to remain stable, a change of phonetic realization of any phoneme is likely to have qualitative repercussions throughout the system. Such a disturbance may be observed in modern English. The phonetic relationship of the vowel phonemes in set and sat, in one type of pronunciation, is of a front vowel between close-mid and open-mid to a front vowel between open-mid and open. If, however, the vowel of sat has a closer articulation than that described, that of set must be raised too. A limit of raising is imposed by the presence of sit and seat, for it is not possible to raise the vowel of sit to any extent without danger of confusion with that of seat, unless the latter vowel becomes strongly diphthongal. (It may be objected that a quantitative as well as qualitative difference distinguishes /iː/ from /ɪ/; but in the examples given, seat and sit, the phonetic context imposes a quantity on /iː/ which is practically the same as that of /ɪ/. If /ɪ/ were too close to the region of /iː/, the opposition would be maintained only by realizing /iː/ as fully long at the expense of the shortening influence of the final /t/, or by a process such as diphthongization.)
Alternatively, if the vowel phoneme of sat is realized as a front open vowel, as in many English regional dialects, the vocalic area in which the phoneme of set can be realized becomes more extensive; in fact, in those kinds of English where this occurs, the vowel in set tends to be of an open-mid variety.

Such considerations of the phonetic relationship of phonemes have a relevance in the historical, diachronic study of English. In ME there were, for instance, four long vowels in the front region: /i:, e:, ɛ:, a:/. By 1600 /i:/ had diphthongized and the remaining vowels closed up. Such a movement may have been caused by pressure upwards from /a:/ or by the creation of an empty space brought about by the diphthongization of the pure vowel /i:/.

Although, therefore, it is often convenient in diachronic studies to investigate the development of individual phonemes in terms of the quality of their realization, it is clear that many sound changes can be explained only by reference to a readjustment of the phonetic relationships of the phonemes of the system as a whole. Moreover, any particular point in the development of the sound system of a language is not simply to be considered as a stage in the process of change of a number of sound units but rather as the presentation of the functioning of a system at a certain historical moment. The primary significance of the sounds of modern English is their function in the system of today; in the same way, the English sounds of 1600 are to be viewed in terms not only of their past and future forms but also of their contemporary, synchronic relationships and functions.

Some sound changes are, indeed, the result of an influence which applies to the system as a whole. Those drastic changes of vowel quality known as the Great Vowel Shift mainly affect vowels in accented syllables.
But vowels in most unaccented syllables (especially those in word-final positions) have undergone, in the last thousand years, an equally striking, though different, type of change. Henry Sweet has called OE the period of full endings, stanas being realized as [ˈstɑ:nɑs]; ME, the period of levelled endings, when stones was pronounced [ˈstɔ:nəs]; and eModE and later English, the period of lost endings, when stones is [sto:nz], [stəʊnz]. There is, therefore, a general tendency for all unaccented vowels to shorten (if long) and to gravitate towards the weak centralized vowels [ɪ] or [ə], or sometimes [ʊ], if not to disappear altogether. This fact accounts for the high frequency of occurrence of [ɪ] and [ə] in PresE and for the complete elision of many vowels in unaccented syllables in rapid colloquial speech, e.g. suppose [spəʊz], probably [pɹɒbblɪ].

6.2.4 Sources of Evidence for Reconstruction

Whether our aim is to reconstruct the phonological system of English at any particular moment in history or to estimate the nature of the development affecting particular phonemes, it is necessary to establish the sound values which were used in the pronunciation of the language: relative values in the case of the system, absolute values as far as possible in the case of sound development. An investigation of the phonological structure of PresE would have to include direct observation of its phonetic features. For this purpose, future generations will have the benefit of recordings of the speech of today. Obviously, this type of evidence cannot be used for the reconstruction of past states of the spoken language. The further back we go into history, the scantier the evidence of spoken forms becomes. Our conclusions will, therefore, be based on information mostly of an indirect kind; yet such is the general agreement amongst the various types of evidence that the broad lines of sound change can be conjectured with reasonable certainty.

(1) Theoretical paths of development.
If, in dealing with the changing realization of a particular phoneme, we can be reasonably sure of its sound value at two points in history, we can, from our knowledge of phonetic possibilities and probabilities, infer theoretically the intervening stages of development. We can, of course, be sure of the pronunciation of PresE. If, then, the evidence suggested unequivocally that, for instance, the vowel in home was pronounced as [ɑ:] in OE, the development to be described and accounted for would be [ɑ:] > [əʊ]. It is likely that the articulation has always involved the back, rather than the front, of the tongue; the change has clearly meant a closing of the tongue position, to which at some stage there has been added a gliding (diphthongal) movement. We might, therefore, postulate such developments as [ɑ: > ɑʊ > oʊ > əʊ] or [ɑ: > o: > oʊ > əʊ]. The available evidence will then confirm or refute the hypothesis, in this case the second solution being more in keeping with the information. Such recognition of phonetic probabilities will always be implicit in the tracing of change. It must be considered unlikely that [ɑ:] on its way to [oʊ] or [əʊ] would have passed through a stage of front articulation, without any combinative influence.

Nevertheless, the possibility of a type of change which is not the most probable theoretically must never be excluded. The rounded close-mid back ME [ʊ] developed by the nineteenth century to an unrounded open-mid centralized back [ʌ]; and in the London area this vowel has now become more open and more front [a]. Yet, at the same time, there is a tendency to make the vowel in sad more open. There is here a potential conflict, and the future development of these vowels is uncertain. It would, therefore, be dangerous to predict, merely according to phonetic probabilities, the way our present sound system will develop.

(2) Old English.
It is most important in an investigation of the development of English sounds over the last thousand years that the pronunciation of OE should be established with some certainty. If this can be done, we shall have a 'starting-point' for the phonetic route of change to PresE. The term Old English, however, spans a period of some four hundred years, from about AD 700 to AD 1100. Moreover, the invasion of the Angles, Saxons, and Jutes in the fifth and sixth centuries introduced four separate varieties of English: the Angles, in the Midlands, north-east England, and the south of Scotland, using types of English known as Mercian and Northumbrian (or, in general terms, Anglian); the Saxons, in the south and south-west, using the West Saxon dialect; and the Jutes, settling mainly in the region of Kent and using a dialect called Kentish. Of the four dialects, West Saxon, which was to become a kind of standard language, is the one about which most is known from the extant texts. In its later form, that in use between about AD 900 and AD 1100, it is referred to as Classical OE. The broad lines of the pronunciation of this language can be conjectured from a comparison of the development of the other members of the West Germanic group of languages to which it is related. But by far the most explicit evidence concerning its sounds is to be inferred from the alphabet in which it is written. The earlier runic spelling was replaced by a form of the Latin alphabet. This alphabet was probably introduced into the country in the seventh century by Irish missionaries. It can be assumed, therefore, that the sounds of OE were represented as far as possible by the Latin letters with their Latin values, with some modifications of an Irish kind. A great deal is known about the pronunciation of Vulgar Latin, whose sound system had much in common with that of modern Italian.
If an Italian, knowing no English, were today asked to write down with his own spelling the PresE pronunciation of the word milk [mɪɫk], he would have no difficulty in representing the first sound, which he could spell as m; the vowel [ɪ] might, however, seem to him to resemble the sound he would write in Italian as e rather than as i; the 'dark' [ɫ] would appear to have a back vowel glide accompanying it, requiring a spelling such as ol; and, since he has no k letter, he would spell the final [k] as c. His transcription of the word might, therefore, be meolc, which is, in fact, a West Saxon spelling of the word now written milk. This is a fortuitous example, and must not be taken to suggest that OE was pronounced in the same way as PresE. But it does demonstrate that OE spellings, which may appear to be very different, are often less surprising when we keep in mind the Latin values originally attached to the letters.

Sometimes the simple forms of the Latin alphabet were evidently inadequate for representing the English sound: thus the joined form æ was used to symbolize a sound between C[a] and C[ɛ]; the sounds [θ] and [ð] were written in the earlier manuscripts as th initially and d medially and finally in a word, and later as ð or the rune þ, regardless of the sound's position in the word or its voiced or voiceless quality; the rune ƿ frequently replaced the earlier u or uu. The vowel values of the OE system were particularly difficult to represent with the five Latin vowel letters. Sometimes the spelling used hesitated between two letters: thus the vowel of mann, probably of a C[ɑ] or [ɒ] quality, was written either with a or o, indicating a vowel between the unrounded open central value of the Latin letter a and the rounded open-mid to close-mid back value of o.
Unaccented vowels, too, already beginning to be obscured and levelled, presented a problem to the scribes, the Latin alphabet offering no way of showing a central vowel of the [ə] type. Unaccented æ, e, and i soon began all to be written as e, and unaccented a, u, o later tended to be used indifferently, indicating that the vowel distinction was being lost. A diphthong such as the one written as ea must probably be interpreted as a glide to a central [ə] quality. Quantity is often shown in the case of vowels by doubling the letter or by the use of an accent and in the case of consonants by doubling the letter. The accent in a word is also sometimes shown by the use of a mark; but, in any case, it is agreed, from a comparison of the West Germanic languages, that the word accent in OE fell generally on the first syllable of words, with the exception of certain compounds.

The written form of OE provides us, therefore, with considerable information concerning the language's pronunciation; we have a working hypothesis from which to begin our investigations. The study of later forms of English will often, in fact, confirm that the OE pronunciation postulated from the spelling and the comparison of Germanic languages is the only one from which later forms can be expected to have developed.

(3) Middle English. Spelling forms can also help us to deduce the pronunciation of the ME period, roughly AD 1100-1450. Generally speaking, it may be said that the letters still had their Latin values and that those letters which were written were meant to be sounded. Thus, the initial k in a word such as knokke was still pronounced and the vowel in time would have an [i] quality. This persistence of Latin values in spelling was no doubt due to the influence of the Church, which was still the centre of teaching and writing, and the absence of a thoroughly standardized spelling accounts for its predominantly phonetic character. However, English spelling was modified by French influences.
Notably, the French ch spelling was introduced to represent the [tʃ] sound in a word such as chin (formerly spelt cinn), where the new spelling form indicates no change of pronunciation; in addition ou, or ow, represents the sound [u], formerly written u, e.g. hous, in OE hus. The simple u spelling was retained to express both the French sound [y] in words like duke and fortune and the OE short [u] sound, though this latter sound is often written as o, especially when juxtaposed with letters of the w, m, n type, e.g. wonne rather than wunne, to avoid confusion between the letter shapes.

Rhymes, too, have their value, especially as, in this period, they are likely to have been satisfactory to the ear as well as to the eye: in the whole of Chaucer's work, for instance, there are very few rhymes which appear to involve the pairing of different vowel sounds. Nevertheless, evidence from rhymes is valueless unless it is possible to be certain, from other sources of evidence, of the pronunciation of one member of the pair. Thus, the Chaucerian rhyme par cas :: was, because we can be sure that the French word cas had a vowel of the [a] quality, is evidence to confirm the view that the [w] of was had not yet retracted and rounded the vowel to [ɒ] and, the final s in the two words being still likely to represent [s], that the word was probably pronounced [was].

Again, words imported from French can give us information concerning the timing of sound changes. Thus French words such as age and couch, which we know from French sources had [a:] and [u:] at the time of their introduction into English, fell in with the English vowel development [a:] > [eɪ] and [u:] > [aʊ] in words like name and house; we can conclude, therefore, that at the time the French words came into the language the [a:] and [u:] vowels had not begun their change.
Moreover, after the ME period, as we shall see, a great deal of direct evidence is available to us, so that our conjectures from about 1500 onwards can be made with considerable certainty. We may often, therefore, be able to deduce from our knowledge of pronunciation in the sixteenth century the stage probably reached in the ME period in the development of a sound from OE. The OE [i:] sound in time, for example, was beginning to be diphthongized generally very early in the sixteenth century. It is reasonable to suppose (even if other evidence to support the theory did not exist) that time still had a relatively pure [i:] for much of the ME period.

Finally, the metre of verse reveals the accent of words. It is for this reason that we know that French words, in Chaucer's verse, generally retained their original accentual pattern, e.g. courage [kuˈra:dʒə], and that the accent shift in these cases is a phenomenon of at least late ME.

(4) Early Modern English. The same sources of evidence which we have already considered remain available for the eModE period, roughly AD 1450-1600. The introduction of printing brought standardization of spelling, and already the spoken and written forms of the language were beginning to diverge. But individuals, especially in their private correspondence, often used spellings of a largely phonetic kind, in the same unsophisticated and logical way that children still do. If a modern child writes He must have gone as He must of gone, he is only representing the phonetic identity of the weak forms of have and of ([əv]), an identity which he will learn to ignore when he adopts the conventional spelling distinction. In the same way, if fifteenth- and sixteenth-century spellings show the word sweet occasionally written as swit, it may be assumed that this original ME [e:] was by now so close that it could be represented by i with its Latin value.
Or again, the spelling form sarvant instead of servant reflects an open type of vowel in the first syllable which was current throughout the eModE period in such words. Moreover, the conventional adoption of an unphonetic spelling can sometimes provide us with positive evidence as to its value: thus, when words like delight (formerly delite) began to be spelt with gh, this spelling form gh clearly no longer had the consonantal fricative value which it had formerly represented in light, since there never was a consonantal sound between the vowel and final [t] in delight. We may conclude, therefore, that gh no longer had its former phonetic significance in words such as light. Care must, of course, be taken to identify the increasing number of learned or technical spellings adopted by printers. The initial letter group gh in ghost (OE gast) indicates no change in pronunciation; goose was also sometimes spelt ghoose in this period. Again, spellings which aim at revealing the etymology (true or false) of a word must usually be discarded as phonetically valueless, e.g. debt, island. Thus from the writings of individuals some general indications concerning sound changes may be gathered and used to supplement evidence derived from other sources.

Rhymes, too, continue to be useful as complementary evidence. A rhyme such as night :: white confirms the view that post-vocalic gh no longer had a consonantal value; or again, can :: swan suggests that the rounding of [a] after [w] had not yet taken place. Yet, just as in the case of ME, rhymes must be treated with caution, more particularly as eye-rhymes were doubtless beginning to become more prevalent. In Elizabethan literature, however, additional evidence is afforded by the frequent use of puns, which usually rely for their effect upon similarities, if not identities, of phonetic value.
Shakespeare, for instance, plays on the phonetic identity of such pairs as suitor, shooter (both capable of being pronounced [ʃu:tər]) and known, none (both [no:n]); such puns suggest that the pronunciation of the two words was commonly sufficiently close to make an immediate impression upon an audience.

The most important and fruitful evidence for this period is, however, of a direct kind. It is provided by the published works of the contemporary grammarians, orthoepists, and schoolmasters, some of whom have been mentioned in §6.1. They are of unequal value and their statements have often to be interpreted in the light of other evidence; yet they provide us with the first direct descriptive accounts of the pronunciation of English. From the sixteenth century onwards, our conclusions rely more and more on their descriptive statements and less on clues of an indirect kind.

Sometimes there appears to be a conflict between the phonetic probabilities, the statements of grammarians, and evidence from other sources. Frequently the solution must be that there existed at any time a variety of current pronunciations, resulting from differences of dialect, generation, fashion, and place in society, in the same way that a description of PresE (even that of a restricted area such as the south of England) would have to take into account a large number of variants. The following representative systems are conjectures of one possible set of phonemes current in the periods in question.

6.2.5 The Classical Old English Sound System

Vowels i:,i,y:,y u:,u e:,e o:p se ;

... English may be said to have begun in the ninth and tenth centuries. But there has always existed a great diversity in the spoken realizations of our language, in terms of the sounds used in different parts of the country and by different sections of the community.
On the one hand, the sounds of the language always being in process of change, there have always been at any one time disparities between the speech sounds of the younger and older generations; the speech of the young is traditionally characterized by the old as slovenly and debased. On the other hand, especially in those times when communications between regions were poor, it was natural that the speech of all communities should not develop either in the same direction or at the same rate; moreover, different parts of the country might be exposed to different external influences (e.g. foreign invasion) which might influence the phonetic structure of the language in a particular area. English has, therefore, always had its regional pronunciations in the same way that other languages have been pronounced in a variety of ways for basically geographical reasons.

Yet, at the same time, especially for the last five centuries, there has existed in this country the notion that one kind of pronunciation of English was socially preferable to others; one regional accent began to acquire social prestige. For reasons of politics, commerce, and the presence of the Court, it was to the pronunciation of the south-east of England and, more particularly, to that of the London region that this prestige was attached. The early phonetician John Hart notes (1569) that it is in the Court and London that 'the flower of the English tongue is used . . . though some would say it were not so, reason would we should grant no less: for that unto these two places, do daily resort from all towns and countries, of the best of all professions, as well of the own landsmen, as of aliens and strangers . . .' Puttenham's celebrated advice in the Arte of English Poesie (1589) recommends 'the usual speech of the Court, and that of London and the shires lying about London within 60 miles and not much above . . .
Northern men, whether they be noblemen or gentlemen, or of their best clerks, [use an English] which is not so courtly or so current as our Southern English is.' Nevertheless, many courtiers continued to use the pronunciation of their own region; we are told, for instance, that Sir Walter Raleigh kept his Devon accent. The speech of the Court, however, phonetically largely that of the London area, increasingly acquired a prestige value and, in time, lost some of the local characteristics of London speech. It may be said to have been finally fixed, as the speech of the ruling class, through the conformist influence of the public schools of the nineteenth century. Moreover, its dissemination as a class pronunciation throughout the country caused it to be recognized as characteristic not so much of a region as of a social stratum. With the spread of education, the situation arose in which an educated man might not belong to the upper classes and might retain his regional characteristics; on the other hand, those eager for social advancement felt obliged to modify their accent in the direction of the social standard. Pronunciation became, therefore, a marker of position in society.

7.3 The Present-Day Situation: RP

(1) Some prestige is still attached to this implicitly accepted social standard of pronunciation. Often called Received Pronunciation (RP), the term suggesting that it is the result of a social judgement rather than of an official decision as to what is 'correct' or 'wrong', it has become more widely known and accepted through the advent of radio and television. The BBC used to recommend this form of pronunciation for its announcers mainly because it was the type which was most widely understood and which excited least prejudice of a regional kind. Indeed, attempts to use announcers who had a mild regional accent used to provoke protests even from the region whose accent was used.
Thus, RP often became identified in the public mind with 'BBC English'. This special position occupied by RP, basically educated southern British English, has led to its being the form of pronunciation most commonly described in books on the phonetics of British English and traditionally taught to foreigners.

(2) Nevertheless, it cannot be said that RP is any longer the exclusive property of a particular social stratum. This change is due partly to the influence of radio and television in constantly bringing the accent to the ears of the whole nation but also, in considerable measure, to the modifications which are taking place in the structure of English society. Just as the sharp divisions between classes have disappeared, so the more marked characteristics of regional speech and, in the London region, the popular forms of pronunciation are tending to be modified in the direction of RP, which is equated with the 'correct' pronunciation of English. This tendency does not mean that regional forms of pronunciation show signs of disappearing; but it has to be recognized that those who wish, for any reason, to modify their speech have models of RP always readily available to their ears while, at the same time, the social inhibitions concerning movement between classes, which were formerly so strongly operative, no longer exert the same pressure. Moreover, it must be remarked that some members of the present younger generation reject RP because of its association with the 'Establishment' in the same way that they question the validity of other forms of traditional authority. For them, real or assumed regional or popular accent has a greater (and less committed) prestige. It is too early to predict whether such attitudes will have any lasting effect upon the future development of the pronunciation of English.
But if this tendency were to become more widespread and permanent, the result could be that, within the next century, RP might be so diluted that it could lose its historic identity, and that a new standard with a wider popular and regional base would emerge. Such a change is made more likely through the recent more permissive attitude of the BBC (and of the commercial television companies) in their choice of announcers, many of whom now have markedly non-RP or non-British accents.

(3) Certain types of regional pronunciation are, indeed, firmly established. Some, especially Scottish English speech, are universally accepted; others, particularly the popular forms of pronunciation used in large towns such as London, Liverpool, or Birmingham, are generally characterized as ugly by those (especially of the older generations) who do not use them. This rejection of certain sounds used in speech is not, of course, a matter of the sounds themselves: thus, [paint] may be acceptable if it means pint, but 'ugly' if it means paint. It is rather a reflection of the social connotations of speech which, though they have lost some of their force, have by no means disappeared. Indeed, RP itself can be a handicap if used in inappropriate social situations, since it may be taken as a mark of affectation or a desire to emphasize social superiority.1 It may be said, too, that if improved communications and radio have spread the availability of RP, these same influences have rendered other forms of pronunciation less remote and strange. An American pronunciation of English, for instance, is now completely accepted in Britain; this was not the case at the time when the first sound films were shown in this country, an American pronunciation then being considered strange and even difficult to understand.
Speakers of RP are becoming increasingly aware of the fact that their type of pronunciation is one which is used by only a very small part of the English-speaking world.

1 For a summary of experiments on the social evaluation of RP using the matched-guise technique, see Giles et al. (1990).

(4) Within RP, those habits of pronunciation that are most firmly established tend to be regarded as 'correct', whilst innovation tends to be stigmatized. Thus conservative forms tend to be most generally accepted, sometimes even by those who themselves use other pronunciations. Where the accentual patterns or the phonemic structure of words is concerned, this attitude may result in a speaker's use of the conservative variant in a formal situation and the use by the same speaker of a less well-established variant in more casual speech, e.g. the avoidance of /verɪˈfaɪəbl/ (verifiable) and /ˈdʒʊərɪŋ/ (during) in more formal speech and their replacement with the more conservative /ˈverɪfaɪəbl/ and /ˈdjʊərɪŋ/. It may be of interest that the pronunciation /ˈdʒʊərɪŋ/ with initial coalescent assimilation was acknowledged by Daniel Jones in the English Pronouncing Dictionary in the 1960s and noted as long ago as 1913 by Robert Bridges in his Tract on English Pronunciation. Nevertheless, there is still some resistance to accepting such coalescence word-initially in accented syllables. Where realizational variation (below the level of the phoneme) is affected, most speakers are unaware of their own changing speech patterns. Objections to the use of the glottal stop are often made, its use being popularly associated with Cockney speech, and yet its occurrence as a realization of preconsonantal /t/ is increasingly frequent within the speech of the middle and younger generations of RP speakers (see §9.2.8).

(5) Even within RP there are some areas and many individual words where alternative pronunciations are possible.
It is convenient to distinguish three main types of RP: General RP, Refined RP, and Regional RP (cf. Wells 1982: 280-3, 297-301). The last two types require some explanation.

Refined RP is that type which is commonly considered to be upper-class, and it does indeed seem to be mainly associated in some way with upper-class families and with professions which have traditionally recruited from such families, e.g. officers in the navy and in some regiments. Where formerly it was very common, the number of speakers using Refined RP is steadily declining. This may be because for many other speakers (both of other types of RP and of regional dialects) a speaker of Refined RP has become a figure of fun, and the type of speech itself is often regarded as affected. (The adjective 'Refined' has been chosen deliberately as having positive overtones for some people and negative overtones for others.) Particular characteristics of Refined RP are the realization of /əʊ/ as [ɛʊ], and a very open word-final /ə/ (and where [ə] forms part of /ɪə, eə, ʊə/) and /ɪ/. The vowel /ɜ:/ is also pronounced very open, this time in all positions. The vowel /æ/ is often diphthongized as [eæ].

While Refined RP reflects a class distinction and describes a type of pronunciation which is relatively homogeneous, Regional RP reflects regional rather than class variation and will vary according to which region is involved in 'regional'. Some phoneticians, on the basis that part of the definition of RP is that it should not tell you where someone comes from, would regard the term 'Regional RP' as a contradiction in terms. Yet it is useful to have such a term as 'Regional RP' to describe the type of speech which is basically RP except for the presence of a few regional characteristics which go unnoticed even by other speakers of RP. For example, vocalization of dark [ɫ] to [ʊ] in words like held [heʊd] and ball [bɔʊ], a
characteristic of Cockney (and some other regional accents), now passes virtually unnoticed in an otherwise fully RP accent (listen, for example, to umpires at Wimbledon saying all). Or, again, the use of /æ/ instead of /ɑ:/ before voiceless fricatives in words like after, bath, and past (part of the general Northern accent within England) may be likewise acceptable. But some other features of regional accents may be too stigmatized to be acceptable as RP, e.g. realization of /t/ by glottal stop word-medially between vowels, as in water (Cockney), the lack of a distinction between /ʌ/ and /ʊ/ (Northern), or the fronting of /u:/ to [y:] (Scottish). The concept of Regional RP reflects the fact that there is nowadays a far greater tolerance of dialectal variation in all walks of life, although, where RP is the norm, only certain types of regional dilution of RP are acceptable.

It remains true, however, that most manuals and dictionaries of the pronunciation of British English, like this book, are based almost entirely on RP. RP, Refined RP, and Regional RP are not accents with precisely enumerable lists of features but rather represent clusterings of features, such clusterings varying from individual to individual. Thus there are no categorial boundaries between the three types of RP nor between RP and regional pronunciation; a speaker may, for example, generally be an RP speaker but have one noticeable feature of Refined RP.

(6) Finally, it has to be recognized that the role of RP in the English-speaking world has changed very considerably in the last century. Over 300 million people now speak English as a first language, and of this number native RP speakers form only a minute proportion; the majority of English speakers use some form of American pronunciation.
However, despite the discrepancy in numbers, RP continues for historical reasons to serve as a model in many parts of the world; and, if a model is used at all, the choice is still effectively between RP and an American pronunciation. When it is a question of teaching English as a second language, there is clearly even greater adherence to one of the two main models. Most teaching textbooks describe either RP or General American, and allegiances to one or the other tend to be traditional or geographical: thus, for instance, European countries continue on the whole to teach RP, whereas some parts of Asia and Latin America follow the American model (see also Chapter 13).

7.4 Comparing Systems of Pronunciation

A comparison of two types of pronunciation will reveal differences of several kinds (as mentioned in §5.3.5):

(a) Realizational differences. The system, i.e. the number of distinctive (phonemic) terms operating, may be the same, but the phonetic realizations of the phonemes may be different: e.g. the RP opposition between the vowels of bet and bat may be maintained, but the realization of both vowels is much more open than in RP (as in Northern English) (see §§8.9.3-4), so that the sound of /æ/ may come near to that of one type of RP /ʌ/ (see §8.9.5); or when, as in Cockney, an allophone [ʔ] represents /t/ between vowels (see §9.2.8); or when the final allophone of /l/ is [l] rather than [ɫ] (see §9.7.1).

(b) Systemic differences (i.e. differences in phoneme inventory). The system may be different, i.e. the number of oppositions may be smaller or greater, e.g.
the RP /æ/–/ɑː/ opposition may not be present in those Ulster or Scottish forms which do not distinguish Sam and psalm; or when RP /aɪ/ homophones, as in side and sighed, are differentiated qualitatively or quantitatively, as in some types of Scottish English; or when the presence of /g/ after [ŋ] in such a word as sing deprives [ŋ] of its phonemic status (see §9.6.3).

(c) Lexical differences (i.e. differences of lexical incidence). The system may be the same, but the incidence of phonemes in words is different, e.g. in those Northern forms which have the RP opposition /uː/–/ʊ/, but nevertheless use /uː/ in book, took, etc. (see §§8.9.9-10); or when /ɒ/ is used instead of /ʌ/ in one, among, etc., though the opposition /ɒ/–/ʌ/ exists (see §8.9.5); or when the choice of phoneme is associated with the habits of different generations, e.g. /ɔː/ for /ɒ/ in off, cloth, cross, etc. (see §8.9.7) or /eɪ/ for /ɪ/ in Monday, holiday, etc.

(d) Distributional differences. The system may be the same, but the phonetic context in which certain phonemes occur may be limited, e.g. in RP /r/ has a limited distribution, being restricted in its occurrence to prevocalic position as in red or horrid. Accents which display this limited distribution of /r/ are referred to as non-rhotic accents, whilst those in which /r/ has a full distribution (such as most American and Scottish accents) are termed rhotic. In the latter accents /r/ occurs pre-consonantally and pre-pausally as well as pre-vocalically; thus part and car will be pronounced /pɑːrt/ and /kɑːr/, whereas in non-rhotic accents the pronunciation will be /pɑːt/ and /kɑː/. See §§9.7.2 and 12.4.7.

7.5 Current Changes within RP³

(1) Realizational changes. RP /æ/ is frequently heard with a more open quality approaching C[a]. This continues a trend in which this RP vowel was typically around C[ɛ] early in this century.
It appears to conflict with another trend whereby RP /ʌ/ was becoming more fronted and also approaching C[a]. There is no evidence suggesting that the two vowels are coalescing; indeed, it seems more likely that /ʌ/ is retreating to its central position. Other developments among the vowels include /eə/ becoming monophthongal [ɛː], and /aɪ/ and /aʊ/ having the same centrally open starting-point (as shown in the revision of the first symbol of /aʊ/ from previous editions). The vowel represented by the spelling <y> at the ends of words like pity, cruelty, lengthy is more and more frequently heard with a closer and more forward pronunciation than the usual realization of /ɪ/, e.g. in sit. Indeed the two vowels in such a pronunciation of city are far less similar to one another than the two vowels in meaty. Thus it seems best to regard the newer, closer pronunciation as involving unaccented /iː/ (although, of course, theoretically the distinction may be said to be neutralized in this position; see §5.3.4).⁴ As has been mentioned in §7.3 (4), the realization of preconsonantal /t/ as a glottal stop is increasingly common in present-day RP (see §9.2.8).

³ See further in Ramsaran (1990a).
⁴ For fuller discussion, see Lewis (1990).

(2) Systemic changes. The one recent systemic change that is now more or less completed is the loss of /ɔə/ from the phoneme inventory.

(3) Lexical changes. There is a strong trend towards selecting /ə/ instead of /ɪ/ in weak syllables, the choice of /ə/ being particularly favoured after /l/ and even more so after /r/, e.g. angrily /ˈæŋgrɪlɪ/ > /ˈæŋgrəlɪ/. For further detail and examples, see §8.9.2. Another noticeable trend is the replacement of /ʊə/ by /ɔː/ in many common words, e.g. poor /pɔː/, sure /ʃɔː/, though /ʊə/ still retains its phonemic status, its contrastive function being illustrated in the speech of most speakers by such sets as doer, dour, door /duːə, dʊə, dɔː/.

(4) Distributional changes.
The most noteworthy trend concerning a regular change in the occurrence of a phoneme is the loss of /j/ after alveolar consonants in such words as allude /əˈljuːd/ > /əˈluːd/, luminous /ˈljuːmɪnəs/ > /ˈluːmɪnəs/, supersede /sjuːpəˈsiːd/ > /suːpəˈsiːd/. /j/ is most commonly dropped after /l/ and /s/ (as, indeed, it was long ago after /r/). In sequences of /n/ + /j/, elision of the /j/ is increasingly common in British English. In the case of the alveolar plosives + /j/, coalescence whereby /tj, dj/ > /tʃ, dʒ/, rather than elision, is now increasingly common except initially in an accented syllable, where /t/ + /j/ or /d/ + /j/ tend to be retained. Thus educate /ˈedjuːkeɪt/ > /ˈedʒuːkeɪt/, statuesque /stætjuːˈesk/ > /stætʃuːˈesk/.

(5) Word accent changes. Certain patterns may be detected, especially in the change affecting adjectives in -able/-ible and -ary/-ory. In both classes of words, the accent tends now to fall later in the word, thus ˈapplicable > aˈpplicable, ˈexplicable > exˈplicable, ˈjustifiable > justiˈfiable, ˈfragmentary > fragˈmentary, ˈmandatory > manˈdatory. Similarly, the feminine suffix -ess increasingly attracts primary accent in words like counˈtess, lioˈness, prioˈress, stewarˈdess. Other current changes do not display such regular patterns, and it remains to be seen which of two variant pronunciations at present coexisting will prevail.

7.6 Systems and Standards other than RP

The remainder of this book is a description of English set within the basic framework of RP, with some reference to variation in other dialects in the discussion of each of the RP phonemes. But there are a number of reasons why such particular differences should be drawn together to show the major overall differences between the phonemic system of RP and that of other major dialects of English. In this section we survey briefly differences between RP and five other systems: General American, Scottish English, Northern (England) English, Cockney, and Australian English.
We survey an American pronunciation because, as noted in §7.3, this is more frequently the standard model for learners of English as a second language in much of Asia and Latin America. We look at Scottish English because this is the type of pronunciation of English within the British Isles which is most frequently accepted as an alternative standard to RP. We survey Northern (England) English and Cockney because these are the areas (apart from Scotland) whose characteristic pronunciations are heard most widely within Britain and which often underlie regional forms of RP. We look at Australian English because this is typical of an English pronunciation of the southern hemisphere and may increasingly become the standard for a wider area rather than just Australia. Of course, we could easily have made a case for the inclusion of other systems of pronunciation here (e.g. Caribbean English and Indian English); but since this is not primarily a book about varieties of English, a limit had to be set somewhere. Moreover there are now books which survey dialectal variation in English pronunciation in detail.⁵ Where reference is made in this book to non-standard varieties of English, the type of pronunciation being referred to is the basilectal variety of the area concerned, i.e. that used by lower socio-economic classes (and by middle socio-economic classes in informal situations).

⁵ In particular Wells (1982).

7.6.1 General American

The traditional (although not undisputed) division of the United States for pronunciation purposes is into Eastern (including New England and New York City), Southern (stretching from Virginia to Texas and to all points southwards), and General (all the remaining area). General American (GA) can thus be regarded as that form of American which does not have marked regional characteristics (and is in this way comparable to RP). It is the standard model for the pronunciation of English as an L2 in parts of Asia (e.g. the Philippines) and parts of Latin America (e.g. Mexico).

There are two areas of systemic difference between RP and GA. First, GA has no /ɒ/. Most commonly, those words which have /ɒ/ in RP are pronounced with /ɑː/ in GA, e.g. cod, spot, pocket, bottle. But a limited subset has /ɔː/, e.g. across, gone, often, cough (as can be seen from the examples, these frequently involve a following voiceless fricative). Secondly, GA lacks the RP diphthongs /ɪə, eə, ʊə/, which correspond in GA to sequences of vowel plus /r/, e.g. beard, fare, dour /bɪrd/, /fer/, /dʊr/. This reflects the allied distributional difference between RP and GA, namely that, unlike RP, where /r/ occurs only before vowels, GA /r/ can occur before consonants and before pause (GA is called a rhotic dialect and RP a non-rhotic dialect).

The main difference of lexical incidence concerns words which in RP have /ɑː/ while in GA they have /æ/. Like the change from /ɒ/ to /ɔː/, this change commonly involves the context before a voiceless fricative, or alternatively before a nasal followed by another consonant; thus RP /pɑːst/ – GA /pæst/, RP /ˈɑːftə/ – GA /ˈæftər/, RP /pɑːθ/ – GA /pæθ/.

Differences of realization are always numerous between any two systems of English pronunciation, and only the most salient will be mentioned. Among the vowels these include the realization of the diphthongs /eɪ/ and /əʊ/ as monophthongs [eː] and [oː], hence late [leːt] and load [loːd]. Among the consonants, /r/ is either phonetically [ɻ], i.e. the tip of the tongue is curled further backwards than in RP, or else a similar auditory effect is achieved by bunching the body of the tongue upwards and backwards; /t/ intervocalically is usually a voiced tap in GA, e.g. better [ˈbeɾɚ]; and /l/ is generally a dark [ɫ] in all positions in GA, unlike RP, where it is a clear [l] before vowels and a dark [ɫ] in other positions (see §9.7.1).

7.6.2 Scottish English

The typical vowel system of Scottish English (SE) involves the loss of the RP distinctions between /ɑː/ and /æ/, between /uː/ and /ʊ/, and between /ɔː/ and /ɒ/. Thus SE pronounces the pairs ant and aunt, soot and suit, caught and cot similarly. SE also has no /ɪə, eə, ʊə/ because, like American English, it is rhotic, and beard, fare, and dour are pronounced as /biːrd/, /feɪr/, and /duːr/. However, the vowel in /feɪr/, which we have transcribed with the RP diphthong /eɪ/, is typically monophthongal [eː] (and of course would be transcribed as such if we were devising a phonemic transcription independently for SE). The vowel /əʊ/ is also monophthongal [oː] as in coat [koːt]; so the vowels in fare and coat are similar to those in American English. Moreover, the vowel in soot and suit is not like either of the RP vowels in these words, but is considerably fronted to something like [y], hence [syt].

The chief difference from RP in the realization of the consonants lies in the use of a tap [ɾ], e.g. red [ɾed] and trip [tɾɪp], though there is variation between this and [ɹ] (the usual type in RP), the use of [ɹ] being generally more prestigious. The phoneme /l/ is most commonly a dark [ɫ] in all positions, e.g. little [ɫɪtɫ] and plough [pɫaʊ]. Finally, intervocalic /t/ is often realized as a glottal stop, e.g. in butter.

7.6.3 Cockney

We use the term Cockney, rather than London English, because, unlike General American and Scottish English, Cockney is as much a class dialect as a regional one. In its broadest form the dialect of Cockney includes a considerable vocabulary of its own, including rhyming slang. The characteristics of Cockney pronunciation are spread more widely through the working class of London than is its vocabulary. Moreover, some traces of Cockney pronunciation are often present in most middle-class speech of the area.
Unlike the previous two types of pronunciation, there are no differences in the inventory of vowel phonemes between RP and Cockney, and there are relatively few (compared with GA and SE) differences of lexical incidence. There are, however, a large number of differences of realization. The short front vowels tend to be uniformly closer than in RP, e.g. in sat, set, and sit, so much so that sat may sound like set and set itself like sit to speakers from other regions. Additionally the short vowel /ʌ/ moves forward to almost C[a]. Among the long vowels, most noticeable is the diphthongization of /iː/ (= [əi]), /uː/ (= [əu]), and /ɔː/, which varies between [ɔʊ] morpheme-medially and [ɔwə] morpheme-finally, thus bead [bəid], boot [bəut], sword [sɔʊd], saw [sɔwə]. Cockney also uses distinctive pronunciations of a number of diphthongs: /eɪ/ = [aɪ], /aɪ/ = [ɑɪ], /əʊ/ = [æu], and /aʊ/ = [aː], e.g. late [laɪt], light [lɑɪt], load [læud], loud [laːd]. The last two vowels are close enough to cause considerable confusion among non-Cockney listeners, although the distinction is never actually neutralized.

Among the consonants, most notable are the omission of /h/ and the replacement of /θ, ð/ by /f, v/, e.g. think /fɪŋk/, father /ˈfɑːvə/, hammer /ˈæmə/. Dark [ɫ], i.e. /l/ in positions not immediately before vowels, becomes vocalic [ʊ], e.g. milk [mɪʊk]; /t/ is realized as a glottal stop between vowels, e.g. [da?*]; there is glottal replacement of [p,t,k] before a following consonant, e.g. soapbox [ˈsæuʔbɒks], statement [ˈstaɪʔmənt], technical [ˈteʔnɪkəl], as in some types of Scottish English; and /j/ is elided after alveolar plosives, e.g. student, during.

Cockney has consistently had a major influence on the development of RP, and nowadays that type of Regional RP which is heavily influenced by Cockney is often referred to as Estuary English (i.e. a middle-class pronunciation typical of the Thames estuary).
Particularly characteristic of this type of Regional RP are the replacement of [p,t,k] by [ʔ] before a consonant (see §9.2.8 (b) (ii) below) and the use of [ʊ] in place of [ɫ].

7.6.4 Northern English

While there is relative homogeneity in a broad Cockney accent but much less so in General American and Scottish English, the label 'Northern English' covers an even less homogeneous group. We use it here simply to identify those things which the disparate pronunciation systems in the north of England have in common (and we will also mention a few characteristics which are typical only of certain areas). The area we are talking about is that north of a line from the River Severn to the Wash, and includes Birmingham. The major identifying feature of this area is the loss of the distinction between RP /ʊ/ and /ʌ/, the single phoneme doing duty here varying in quality from [ʊ] to [ə]. So Northern English has no distinction between put and putt, could and cud, and, for many speakers, between buck and book (although others may use /uː/ in the latter word). Hypercorrections are often made by those attempting Regional RP, producing, for example, sugar [ˈʃʌgə], pussy [ˈpʌsɪ], put [pʌt]. Almost as identifying a characteristic is the changeover in lexical incidence from /ɑː/ to /æ/ in words with a following voiceless fricative (or a nasal followed by a further consonant, as in General American), e.g. past /pæst/, laugh /læf/, aunt /ænt/. Another type of lexical incidence concerns the occurrence of a full vowel in prefixes where RP has /ə/, e.g. advance /ædˈvæns/, consume /kɒnˈsjuːm/, observe /ɒbˈzɜːv/. The short vowels are generally realized with more open qualities than RP, e.g. mad [mad], and the diphthongs /eɪ/ and /əʊ/ are commonly monophthongal [eː] and [oː] as in GA and SE (indeed sometimes, as in Newcastle, the direction of the diphthong is reversed to [eə] and [oə]).
Other vowel changes (compared with RP) characteristic of particular areas include the loss of the /eə/–/ɜː/ distinction in Liverpool (the local accent is called Scouse) and its common realization as [œː], e.g. both fare and fur are pronounced [fœː]; the realization of /aʊ/ as [uː] in Newcastle (where the broad local accent is called Geordie) while /uː/ itself becomes [ɪə], e.g. about [əˈbuːt], boot [bɪət]; and the use of a particularly close /ɪ/ in Birmingham, e.g. pit is almost [pit], where the distinction between pit and peat will depend on length alone.

Most notable among the consonants of Northern English is the realization of /r/ as [ɾ] in a number of conurbations including Leeds, Liverpool, and Newcastle, and the lack of the RP allophonic difference between clear [l] and dark [ɫ], clear [l] being used in all positions in many areas, e.g. Newcastle, and dark [ɫ] in others, e.g. Manchester. In a quite extensive area, from Birmingham to Manchester and Liverpool, the RP single consonant /ŋ/ becomes /ŋg/, e.g. singing [ˈsɪŋgɪŋg]. Also in a number of urban areas, notably south-east Lancashire, /p,t,k/ in final position (i.e. before a pause) are realized as ejectives.

7.6.5 Australian English

There is little regional variation in Australian English (ANE), the variation which does occur being largely correlated with social class and ranging from a broad accent all the way up to Regional RP. The broad accent described here shares many features with Cockney, but has of course a particular combination of these and other features which identify it. As in Cockney, there are no differences of phonemic inventory from RP and no extensive classes of word involved in differences of incidence. It is the realization of long /ɑː/ as [aː] which more than any other feature identifies ANE, e.g. father [ˈfaːðə], part [paːt].
As in Cockney, /iː/ and /uː/ are realized as [əi] and [əu] and the short front vowels are all closer than RP, although [ɪ] does not occur in unaccented positions, being replaced by /iː/ word-finally and by /ə/ in other positions, e.g. city /ˈsɪtiː/. In its diphthongs ANE is again like Cockney in having /eɪ/ = [aɪ] and /aɪ/ = [ɑɪ], and in having a convergence of quality of /əʊ/ and /aʊ/; however, diphthongs in /ə/ are monophthongized, so /ɪə/ = [iː], clear [kliː] (leading to an accumulation of three vowels, /iː/, /ɪ/, and [iː], in the close front area), /eə/ = [ɛː], fare [fɛː], while /ʊə/ is either replaced by /ɔː/ as in sure or becomes disyllabic as in sewer /ˈsuːə/. Although ANE does drop /h/, it does not use glottal stop, nor does it vocalize /l/, having dark [ɫ] in all positions.

A particular development in Australian English (and in New Zealand) which has been the subject of much discussion recently, both in newspapers and in academic journals,⁶ is the increasing use of a high rising tone on declarative clauses (where a fall would normally have been expected). The meaning of this tone and the reasons behind its increased use have also been much discussed (see further under §11.6.3).

⁶ Guy et al. (1986); Britain (1992).
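
The distributional difference described in §7.4 (d), which also separates the rhotic systems surveyed in §7.6 (GA and SE) from the non-rhotic ones, can be stated as a simple rule over a string of phonemes. The following is only a minimal sketch in Python: the function name `derhoticize`, the vowel set, and the pre-segmented word lists are assumptions made for illustration, and the rule treats each word in isolation (before a pause), so it ignores what happens to word-final /r/ in connected speech.

```python
# Illustrative sketch only: a toy rule for the distribution of /r/ (§7.4 (d)).
# The vowel inventory and segmented transcriptions are simplifying assumptions
# made for this example, not a full phonemic analysis.

VOWELS = {"iː", "ɪ", "e", "æ", "ʌ", "ɑː", "ɒ", "ɔː", "ʊ", "uː", "ɜː", "ə"}


def derhoticize(segments):
    """Drop every /r/ that is not immediately followed by a vowel.

    `segments` is a list of phoneme symbols for one word said before a pause.
    """
    result = []
    for i, seg in enumerate(segments):
        following = segments[i + 1] if i + 1 < len(segments) else None
        if seg == "r" and following not in VOWELS:
            # Pre-consonantal or pre-pausal /r/: absent in non-rhotic accents.
            continue
        result.append(seg)
    return result


# Rhotic-style inputs taken from the examples in §7.4 (d).
for word in (["p", "ɑː", "r", "t"], ["k", "ɑː", "r"],
             ["r", "e", "d"], ["h", "ɒ", "r", "ɪ", "d"]):
    print("".join(word), "->", "".join(derhoticize(word)))
```

Applied to part and car, the rule removes the /r/, giving the non-rhotic forms /pɑːt/ and /kɑː/, while red and horrid are left unchanged because their /r/ is prevocalic.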