Dr. C. George Boeree
Shippensburg University

Phonetics is the study of the sounds of language.  These sounds are called phonemes. There are literally hundreds of them used in different languages.  Even a single language like English requires us to distinguish about 40!  The key word here is distinguish.  We actually make much finer discriminations among sounds, but English only requires 40.  The other discriminations are what let us detect the differences in accents and dialects, identify individuals, and differentiate tiny nuances of speech that indicate things beyond the obvious meanings of the words.

The Vocal Tract

In order to study the sounds of language, we first need to study the vocal tract.  Speech starts with the lungs, which push air out and pull it in.  The original purpose was, of course, to get oxygen and eliminate carbon dioxide.  But it is also essential for speech.  There are phonemes that are little more than breathing:  the h for example.

Next, we have the larynx, or voice box.  It sits at the juncture of the trachea or windpipe coming up from the lungs and the esophagus coming up from the stomach.  In the larynx, we have an opening called the glottis, an epiglottis which covers the glottis when we are swallowing, and the vocal cords.  The vocal cords consist of two flaps of mucous membrane stretched across the glottis.

The vocal cords can be tightened and loosened and can vibrate when air is forced past them, creating sound.  Some phonemes use that sound, and are called voiced.  Examples include the vowels (a, e, i, o, and u, for example) and some of the consonants (m, l, and r, for example).  Other phonemes do not involve the vocal cords, such as the consonants h, t, or s, and so are called unvoiced.

The area above the glottis is called the pharynx, or upper throat.  It can be tightened to make pharyngeal consonants.  English doesn’t have any of these, but they sound like when you try to get a piece of food back up out of your throat.

At the top of the throat is the opening to the nasal passages (called the nasopharynx, in case you are interested).  When we allow air to pass into the nose while speaking, the sounds we make are called nasal.  Examples include m, n, and the ng sound of sing.

Much of the action during speech occurs in the mouth, of course, especially involving the interaction of the tongue with the roof of the mouth.  The roof of the mouth has several specific areas:  At the very back, just before the nasal passage, is that little bag called the uvula.  Its major function seems to be moisturizing the air and making certain sounds called, obviously, uvular.  The best known is the kind of r pronounced in the back of the mouth by some French and German speakers.  Uvular, pharyngeal, and glottal sounds are often referred to as gutturals.

Next, we have the soft palate, called the velum.  If you turn your tongue back as far as it will go and press up, you can feel how soft it is.  When you say k or g, you are using the velum, so they are called velar consonants.

Further forward is the hard palate.  Several consonants are made using the hard palate, such as sh and y, and are called palatals.  Just behind the teeth is the dental ridge or alveolus.  Here is where many of us make our t’s and d’s, as well as s, n, and l -- alveolar consonants.

At the very outer edge of the mouth we have the teeth and the lips.  Dental consonants are made by touching the tongue to the teeth.  In English, we make the two th sounds like this.  Note that one of these is voiced (the th in the) and one is unvoiced (the th in thin).

At the lips we can make several sounds as well.  The simplest, perhaps, are the bilabial sounds, made by holding the lips together and then releasing the sound, such as p and b, or by keeping them together and releasing the air through the nose, making the bilabial nasal m.  We can also use the upper teeth with the lower lip, for labiodental sounds.  This is how we make an f, for example.

Incidentally, we also have two names for the parts of the tongue used with these various parts of the mouth:  The front edge is called the corona, and the back is called the dorsum.  Sounds like t, th, and s are made with the corona, while k, g, and ng are made with the dorsum.


Consonants

Consonants are sounds which involve full or partial blocking of airflow.  In English, the consonants are p, b, t, d, ch, j, k, g, f, v, th, dh, s, z, sh, zh, m, n, ng, l, r, w, and y.  They are classified in a number of different ways, depending on the vocal tract details we just discussed.

1.  Stops, also known as plosives.  The air is blocked for a moment, then released.  In English, they are p, b, t, d, k, and g.

a.  Bilabial plosives: p (unvoiced) and b (voiced)
b.  Alveolar plosives:  t (unvoiced) and d (voiced)
c.  Velar plosives:  k (unvoiced) and g (voiced)

In other languages, we find labiodental, palatal, uvular, pharyngeal, and glottal plosives as well, and retroflex plosives, which involve reaching back to the palate with the corona of the tongue.

In many languages, plosives may be followed by aspiration, that is, by a breathy sound like an h.  In Chinese, for example, there is a distinction between a p pronounced crisply and an aspirated p.  We use both in English (the aspirated p of pit vs the crisp p of spit), but it isn’t a distinction that separates one meaning from another.

2.  Fricatives involve a slightly resisted flow of air.  In English, these include f, v, th, dh, s, z, sh, zh, and h.

a.  Labiodental fricatives:  f (unvoiced) and v (voiced)
b.  Dental fricatives:  th (as in thin -- unvoiced) and dh (as in the -- voiced)
c.  Alveolar fricatives:  s (unvoiced) and z (voiced)
d.  Palatal fricatives:  sh (unvoiced) and zh (like the s in vision -- voiced)
e.  Glottal fricative:  h (unvoiced)

3.  Affricates are sounds that involve a plosive followed immediately by a fricative at the same location.  In English, we have ch (unvoiced) and j (voiced).  Many consider these to be blends:  t-sh and d-zh.

4.  Nasals are sounds made with air passing through the nose.  In English, these are m, n, and ng.

a.  Bilabial nasal:  m
b.  Alveolar nasal:  n
c.  Velar nasal:  ng

5.  Liquids are sounds with very little air resistance.  In English, we have l and r, which are both alveolar, but differ in the shape of the tongue.  For l, we touch the tip to the ridge of the teeth and let the air go around both sides.  For the r, we almost block the air on both sides and let it through at the top.  Note that there are many variations of l and r in other languages and even within English itself!

6.  Semivowels are sounds that are, as the name implies, very nearly vowels.  In English, we have w and y, which you can see are a lot like vowels such as oo and ee, but with the lips almost closed for w (a bilabial) and the tongue almost touching the palate for y (a palatal).  They are also called glides, since they normally “glide” into or out of vowel positions (as in woo, yeah, ow, and oy).

In many languages, such as Russian, there is a whole set of palatalized consonants, which means they are followed by a y before the vowel.  This is also called an on-glide.
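
The classification above can be sketched as a small lookup table.  This is only an illustration, not a standard resource:  the names Consonant, CONSONANTS, describe, and voicing_partner are invented here, the symbols are the informal spellings used in this article rather than IPA, and the affricates ch and j are left out because the text does not assign them a single place of articulation.  The nasals, liquids, and semivowels are entered as voiced.

```python
from typing import NamedTuple, Optional


class Consonant(NamedTuple):
    place: str    # where in the vocal tract the sound is made
    manner: str   # how the airflow is blocked or resisted
    voiced: bool  # whether the vocal cords vibrate


# Informal spellings from the article (not IPA). Affricates are omitted.
CONSONANTS = {
    "p": Consonant("bilabial", "plosive", False),
    "b": Consonant("bilabial", "plosive", True),
    "t": Consonant("alveolar", "plosive", False),
    "d": Consonant("alveolar", "plosive", True),
    "k": Consonant("velar", "plosive", False),
    "g": Consonant("velar", "plosive", True),
    "f": Consonant("labiodental", "fricative", False),
    "v": Consonant("labiodental", "fricative", True),
    "th": Consonant("dental", "fricative", False),
    "dh": Consonant("dental", "fricative", True),
    "s": Consonant("alveolar", "fricative", False),
    "z": Consonant("alveolar", "fricative", True),
    "sh": Consonant("palatal", "fricative", False),
    "zh": Consonant("palatal", "fricative", True),
    "h": Consonant("glottal", "fricative", False),
    "m": Consonant("bilabial", "nasal", True),
    "n": Consonant("alveolar", "nasal", True),
    "ng": Consonant("velar", "nasal", True),
    "l": Consonant("alveolar", "liquid", True),
    "r": Consonant("alveolar", "liquid", True),
    "w": Consonant("bilabial", "semivowel", True),
    "y": Consonant("palatal", "semivowel", True),
}


def describe(symbol: str) -> str:
    """Return a label such as 'voiced dental fricative'."""
    c = CONSONANTS[symbol]
    voicing = "voiced" if c.voiced else "unvoiced"
    return f"{voicing} {c.place} {c.manner}"


def voicing_partner(symbol: str) -> Optional[str]:
    """Find the consonant with the same place and manner but opposite voicing."""
    c = CONSONANTS[symbol]
    for other, o in CONSONANTS.items():
        if o.place == c.place and o.manner == c.manner and o.voiced != c.voiced:
            return other
    return None
```

For instance, describe("dh") gives the label for the th in the, and voicing_partner("p") finds b, while a sound like h has no partner in this table.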


Vowels

There are about 14 vowels in English.  They are the ones found in these words:  beet, bit, bait, bet, bat, car, pot (in British English), bought, boat, book, boot, bird, but, and the a in ago.  There are also three diphthongs or double vowels:  bite, cow, and boy.  Diphthongs involve off-glides:  You can hear the y in bite and boy, and the w in cow.  Actually, the sounds in bait and boat are also diphthongs (with y and w off-glides, respectively), but the first parts of the diphthongs are different from the nearby sounds in bet and bought.

Vowels are classified in three dimensions:

1.  The height of the tongue in the mouth -- low, mid, or high

high are beet, bit, boot, and book
mid are bait, bet, but, boat, bought, bird and a in ago
low are bat, car, and British pot

2.  How far forward or backward in the mouth the tongue rises -- front, center, or back

front are beet, bit, bait, bet, and bat
center are but, bird, and a in ago
back are boot, book, boat, bought, and British pot

3.  How rounded or unrounded the lips are

the front vowels are unrounded
the center and back vowels are rounded

The rounding idea may seem unnecessary until you realize that many languages have rounded front vowels -- such as the German ü and ö and the French u and eu -- and many have unrounded back vowels -- such as the Japanese u.  If you took French in high school, you may remember the teacher telling you to say tea with your lips rounded for French tu.  It isn’t the best way to teach the sound, but it shows you where it fits in the scheme.
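
The height and front-to-back groupings listed above can be put into a small table and queried.  This is a minimal sketch:  the names VOWELS and vowels_where are invented for the example, each vowel is identified by its key word from the lists, and car is given no front/center/back value because the lists above do not place it.

```python
# Each key word stands for its vowel; values are (height, backness).
# None means the article's lists do not assign a value.
VOWELS = {
    "beet": ("high", "front"),
    "bit": ("high", "front"),
    "boot": ("high", "back"),
    "book": ("high", "back"),
    "bait": ("mid", "front"),
    "bet": ("mid", "front"),
    "but": ("mid", "center"),
    "bird": ("mid", "center"),
    "ago": ("mid", "center"),
    "boat": ("mid", "back"),
    "bought": ("mid", "back"),
    "bat": ("low", "front"),
    "car": ("low", None),
    "pot": ("low", "back"),  # British pronunciation
}


def vowels_where(height=None, backness=None):
    """Return the key words matching the given height and/or backness."""
    return sorted(
        word
        for word, (h, b) in VOWELS.items()
        if (height is None or h == height)
        and (backness is None or b == backness)
    )
```

Asking for vowels_where(height="high", backness="front") picks out beet and bit, the pair that English also distinguishes partly by length.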

There is one more dimension that doesn’t have much to do with English, but is essential in many languages, and that is vowel length.  Vowels can be short or long, and it is just a matter of how long you continue the sound.  The closest we get in English is that the vowel in beet is longer (as well as higher) than the vowel in bit.  The same goes for boot and book, and for caught and the British pot.

In some languages, such as French, there is another quality to vowels, and that is nasality.  Some vowels are pronounced with airflow through the nose as well as the mouth.  Originally, these were simply vowels followed by nasal consonants.  But over time, the French blended the vowels and the nasals into one unit.


Over the years, linguists have developed a complex chart of phonemes for transcribing the sounds of all languages around the world.  It is called the International Phonetic Alphabet, and much of it is in the charts below.  If you get question marks or little squares, that means your computer isn't equipped with Unicode, in which case you will have to look elsewhere for charts like this.

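[Consonant chart:  only fragments survive, including the column headings bilabial and labio-, and a fricatives row with uvular χ and h.]
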
         front      center       back
high     i   y      ɨ   ʉ        ɯ   u
         ɪ   ʏ
mid      e   ø      ɜ   ə   ɵ    ɤ   o
         ɛ   œ      ɐ   ʌ
low      æ   a                   ɑ   ɒ

Vowel length is marked with a colon after the vowel, e.g. i:

Nasal vowels are shown by placing a tilde over the vowel, e.g. ã
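
On a computer, these two notations can be produced with ordinary string handling.  The sketch below assumes Python's standard unicodedata module;  the function names lengthen and nasalize are invented for the example.  The tilde is added as the Unicode combining tilde (U+0303) and then normalized, so that a vowel with a ready-made tilde form, such as ã, becomes a single character, while a vowel without one keeps the combining mark.

```python
import unicodedata

COMBINING_TILDE = "\u0303"  # Unicode combining tilde, drawn over the previous letter


def lengthen(vowel: str) -> str:
    """Mark a vowel as long with a following colon, as described above."""
    return vowel + ":"


def nasalize(vowel: str) -> str:
    """Mark a vowel as nasal by placing a tilde over it."""
    return unicodedata.normalize("NFC", vowel + COMBINING_TILDE)


print(lengthen("i"))   # the long vowel of the example above
print(nasalize("a"))   # the nasal vowel of the example above
```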

There are dozens more phonemes beyond the ones in the preceding charts, but one set is particularly interesting:  clicks.  Clicks are sounds made by creating a vacuum with the tongue and then suddenly snapping the tongue away.  We use these ourselves, though not as parts of words:  When we “tsk tsk,” when we make clucking sounds, and when we make a click in the side of our mouths when we tell a horse to get a move on.  Clicks are used in the Bushman languages and in the Bantu languages that had prolonged contact with them.  The best known is the Bantu language Xhosa, because of the famous South African singer Miriam Makeba.

Stress and Tones

In many languages around the world, including English, words are differentiated by means of stress.  One syllable is usually given a higher pitch ("up" the musical scale) and sometimes a bit more force.  This is how we differentiate af-FECT (as in influence) and AF-fect (as in emotion), for example, with the stressed syllable shown in capitals.  In longer words, there may even be a second semi-stressed syllable, as in math-e-mat-ics:  mat has the primary stress, math has the secondary stress.  In IPA, primary stress is indicated by preceding the syllable with a high vertical line, secondary with a low vertical line.
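
The IPA stress marks can be added to a word mechanically once its syllables and stress positions are known.  The sketch below is only an illustration:  the function name mark_stress is invented, the syllables are given in ordinary spelling rather than phonetic transcription, and the two marks are the Unicode characters U+02C8 (high vertical line) and U+02CC (low vertical line).

```python
PRIMARY = "\u02c8"    # high vertical line, placed before the primary-stressed syllable
SECONDARY = "\u02cc"  # low vertical line, placed before the secondary-stressed syllable


def mark_stress(syllables, primary, secondary=None):
    """Join syllables, prefixing the stressed ones with IPA stress marks.

    primary and secondary are zero-based positions in the syllable list.
    """
    marked = []
    for position, syllable in enumerate(syllables):
        if position == primary:
            marked.append(PRIMARY + syllable)
        elif position == secondary:
            marked.append(SECONDARY + syllable)
        else:
            marked.append(syllable)
    return "".join(marked)


# affect as in influence (stress on the second syllable)
print(mark_stress(["af", "fect"], primary=1))
# affect as in emotion (stress on the first syllable)
print(mark_stress(["af", "fect"], primary=0))
# mathematics: primary stress on mat, secondary on math
print(mark_stress(["math", "e", "mat", "ics"], primary=2, secondary=0))
```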

Note that even when we do not need to use stress to differentiate words, we use it anyway.  Sometimes we can tell where a person is from by how they use stress:  insurance is usually stressed on the sur; southerners stress it on the in.  But many languages do not use stress at all.  To our ears, they sound rather monotone.

Some other languages use dynamic stress or tones.  Swedish is an example.  This means that there is actual change of stress within syllables.  In Swedish, there are two tones: 

The single tone starts high and goes down.  If a single tone word has a second syllable, that syllable is unstressed.  Single tone words don’t sound very unusual to English speakers.

The double tone is found only in words of two or more syllables.  The first syllable starts in the middle range of pitch and the second starts high and goes down.  If there is a third syllable, it is unstressed.  The double tone gives the word a sing-song quality to English speakers.

These tones differentiate many words in Swedish.  In the single tone, anden, tomten, biten, and slaget mean the duck, the building, the bit, and the battle, respectively.  In the double tone, they mean the spirit, the elf, bitten, and beaten, respectively!  English uses dynamic stress or tones also, but only on whole phrases, such as the rising pitch at the end of questions.

But many languages in Africa and Asia use far more complex tones, and in fact are called tonal languages.  Chinese is the best known example.  Although words are often more than one syllable in length, each syllable has a particular meaning.  And Chinese uses a very limited number of phonemes.  It is the tones that prevent every syllable from having hundreds of meanings.  There are four of them:

Tone 1 -- high and level (as in hey!)
Tone 2 -- middle, then rising (as in was it you?)
Tone 3 -- middle, falling, then rising (as in mom!? spoken by a whining teenager)
Tone 4 -- high, then falling (as in Tom spoken by a disappointed mom)

For example, the simple syllable yi can mean many different things.  With tone 1 it means cloth, with tone 2 it means to suspect, with tone 3 it means chair, and with tone 4 it means meaning.  The syllable wu means house, none, five, and fog, respectively.  And ma means mother, hemp, horse, and scold.  In the official transcription, the four tones are indicated by ¯, ´, ˇ, and `.
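
The examples above amount to a lookup from syllable and tone to meaning, which can be sketched in a few lines.  The names TONE_MARKS, MEANINGS, with_tone, and gloss are invented for this illustration, and the meanings are just the ones given above.  One simplification is made loudly:  the tone mark is attached to the last letter of the syllable, which works for yi, wu, and ma but is not the general rule for placing tone marks in longer syllables.

```python
import unicodedata

# Combining marks for the four tones: macron, acute, caron, grave.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}

# Meanings as listed in the article, indexed by syllable and tone number.
MEANINGS = {
    "yi": {1: "cloth", 2: "to suspect", 3: "chair", 4: "meaning"},
    "wu": {1: "house", 2: "none", 3: "five", 4: "fog"},
    "ma": {1: "mother", 2: "hemp", 3: "horse", 4: "scold"},
}


def with_tone(syllable: str, tone: int) -> str:
    """Write a syllable with its tone mark.

    Simplifying assumption: the mark goes on the final letter, which is
    only correct for single-vowel syllables like the three used here.
    """
    return unicodedata.normalize("NFC", syllable + TONE_MARKS[tone])


def gloss(syllable: str, tone: int) -> str:
    """Return the marked syllable together with its meaning."""
    return f"{with_tone(syllable, tone)}: {MEANINGS[syllable][tone]}"


for tone in (1, 2, 3, 4):
    print(gloss("ma", tone))
```

The same four-way loop over yi or wu prints the other two sets of meanings, which shows how much work the tones do for an otherwise identical syllable.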

Thai has five tones:  high, middle, low, rising, and falling.  The African language Katamba has six, adding a falling, then rising tone.  Cantonese has nine tones: high long, high short, middle long, middle short, low long, low short, high falling, middle falling, and low rising.

We don't know how tonal languages arise.  Many believe that it has to do with phonemes or even whole syllables that have been lost, but influenced the pronunciation anyway.  But this makes it hard to explain that Cantonese, which has kept many old consonant endings, has nine tones, while its relative Mandarin Chinese, which has lost those endings, only has four.  Of course a linguist from China might ask how non-tonal languages lost their tones!

One interesting tidbit is that tonality often crosses family lines.  In Asia, for example, tonality is found in Chinese, Thai, and Vietnamese -- which are unrelated languages.  On the other hand, Tibetan and Burmese are related to Chinese, but are not tonal; neither is Khmer, a relative of Vietnamese.  Most African languages are tonal, but Swahili is not.  Hausa, spoken in Nigeria, is tonal, but relatives like Arabic are not.  It is possible that one or another language family influenced others around it, or was original to an area before being invaded by speakers of another language.

© Copyright 2005, C. George Boeree