Phoneme

Overview
In human language, a phoneme is the smallest unit of speech that distinguishes meaning. Phonemes are not the physical segments themselves, but abstractions of them. An example of a phoneme is the /t/ found in words like tip, stand, writer, and cat.

In sign languages, the basic movements were formerly called cheremes (or cheiremes), but usage changed to phoneme.

Some linguists (e.g. Roman Jakobson) consider phonemes to be further decomposable into features, with such features being the true minimal constituents of language. Unlike phonemes, however, features overlap each other in time. A phoneme can thus be seen as a bundle of simultaneous features.

A phoneme can include slightly different sounds, or phones. For instance, the p sound in the English words pin and spin is pronounced differently: it is aspirated in pin and unaspirated in spin. In some languages, such as Korean, these phones would be considered different phonemes. English does not distinguish them, so in English both are considered to belong to a single phoneme. Two phones that belong to the same phoneme are called allophones. A common test to determine whether two phones are allophones or separate phonemes relies on finding so-called minimal pairs: words that differ only by the phones in question.
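
The minimal-pair test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy lexicon, its segmentation into phone symbols, and the function names `minimal_pairs` and `contrast` are all hypothetical and not part of any standard tool.

```python
from itertools import combinations

# Hypothetical toy lexicon: each word maps to a tuple of phone symbols.
LEXICON = {
    "pin": ("p", "ɪ", "n"),
    "bin": ("b", "ɪ", "n"),
    "tin": ("t", "ɪ", "n"),
    "spin": ("s", "p", "ɪ", "n"),
    "pan": ("p", "æ", "n"),
}

def minimal_pairs(lexicon):
    """Return (word1, word2, phone1, phone2) for words differing in exactly one segment."""
    pairs = []
    for (w1, s1), (w2, s2) in combinations(sorted(lexicon.items()), 2):
        if len(s1) != len(s2):
            continue  # words of different length cannot form a minimal pair here
        diffs = [(a, b) for a, b in zip(s1, s2) if a != b]
        if len(diffs) == 1:
            pairs.append((w1, w2, diffs[0][0], diffs[0][1]))
    return pairs

def contrast(lexicon, x, y):
    """True if some minimal pair in the lexicon is distinguished only by phones x and y."""
    return any({a, b} == {x, y} for _, _, a, b in minimal_pairs(lexicon))
```

With this lexicon, `contrast(LEXICON, "p", "b")` is true because of pin and bin. A missing minimal pair does not by itself prove that two phones are allophones; it only means the lexicon offers no direct evidence of contrast.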

Background and related ideas
In ancient India, the Sanskrit grammarian Pāṇini (c. 520–460 BC), in his text of Sanskrit grammar, the Shiva Sutras, originated the concepts of the phoneme, the morpheme and the root. The Shiva Sutras describe a phonemic notational system in the fourteen initial lines of the Aṣṭādhyāyī. The notational system introduces different clusters of phonemes that serve special roles in the morphology of Sanskrit, and are referred to throughout the text.

Around the 1st century CE, the definitions of phoneme (oliyam) and alphabet (ezuththu) were discussed in the Tolkāppiyam concerning the Tamil language.

The term phonème was reportedly first used by Dufriche-Desgenettes in 1873, but it referred only to a sound of speech. The term phoneme as an abstraction was developed by the Polish linguist Jan Mieczysław Baudouin de Courtenay and his student Mikołaj Kruszewski during 1875–1895. The term used by these two was fonema, the basic unit of what they called psychophonetics. The concept of the phoneme was elaborated in the works of Nikolai Trubetzkoy and others of the Prague School (during the years 1926–1935), as well as in those of structuralists like Ferdinand de Saussure, Edward Sapir, and Leonard Bloomfield. Later, it was also used in generative linguistics, most famously by Noam Chomsky and Morris Halle, and it remains central to accounts of the development of virtually all modern schools of phonology.

Some languages make use of pitch for phonemic distinction. In this case, the tones used are called tonemes. Some languages distinguish words made up of the same phonemes (and tonemes) by using different durations of some elements, which are called chronemes. However, not all scholars working on languages with distinctive duration use this term.

Usually, long vowels and consonants are represented either by a length indicator or doubling of the symbol in question.

In sign languages, phonemes may be classified as Tab (elements of location, from Latin tabula), Dez (the hand shape, from designator), Sig (the motion, from signation), and, for some researchers, Ori (orientation). Facial expressions and mouthing are also phonemic.

Notation
A transcription that only indicates the different phonemes of a language is said to be phonemic. Such transcriptions are enclosed within virgules (slashes), / /; these show that each enclosed symbol is claimed to be phonemically meaningful. On the other hand, a transcription that indicates finer detail, including allophonic variation like the two English L's, is said to be phonetic, and is enclosed in square brackets, [ ].

The common notation used in linguistics employs virgules (slashes) (/ /) around the symbol that stands for the phoneme. For example, the phoneme for the initial consonant sound in the word "phoneme" would be written as /f/. In other words, the graphemes are <ph>, but this digraph represents one sound. Allophones, more phonetically specific descriptions of how a given phoneme might be commonly instantiated, are often denoted in linguistics by the use of diacritical or other marks added to the phoneme symbols and then placed in square brackets ([ ]) to differentiate them from the phoneme in slant brackets (/ /). The conventions of orthography are then kept separate from both phonemes and allophones by the use of angle brackets < > to enclose the spelling.
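
As a rough illustration of these three delimiter conventions, the following Python sketch picks out delimited transcriptions in a string and labels each one by level. The function name `classify` and the level labels are this sketch's own assumptions, and the pattern is deliberately naive: a slash used in ordinary prose (as in "and/or") could be mistaken for a phonemic delimiter.

```python
import re

# One alternative per convention: /phonemic/, [phonetic], <orthographic>.
PATTERN = re.compile(
    r"/(?P<phonemic>[^/]+)/"
    r"|\[(?P<phonetic>[^\]]+)\]"
    r"|<(?P<orthographic>[^>]+)>"
)

def classify(text):
    """Return (level, content) pairs for every delimited transcription in text."""
    found = []
    for match in PATTERN.finditer(text):
        level = match.lastgroup  # name of the single group that matched
        found.append((level, match.group(level)))
    return found
```

For example, `classify("<ph> spells /f/")` yields an orthographic item `ph` followed by a phonemic item `f`.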

The symbols of the International Phonetic Alphabet (IPA) and extended sets adapted to a particular language are often used by linguists to write phonemes of oral languages, with the principle being one symbol equals one categorical sound. Due to problems displaying some symbols in the early days of the Internet, systems such as X-SAMPA and Kirshenbaum were developed to represent IPA symbols in plain text. As of 2004, any modern web browser can display IPA symbols (as long as the operating system provides the appropriate fonts), and we use this system in this article.

The only published set of phonemic symbols for a sign language is the Stokoe notation developed for American Sign Language, which has since been applied to British Sign Language by Kyle and Woll, and to Australian Aboriginal sign languages by Adam Kendon. However, there are several phonetic systems, such as SignWriting.

Examples
Examples of phonemes in the English language would include sounds from the set of English consonants, like /p/ and /b/. These two are most often written consistently with one letter for each sound. However, phonemes might not be so apparent in written English, such as when they are typically represented with combined letters, called digraphs, like <sh> (pronounced /ʃ/) or <ch> (pronounced /tʃ/).

To see a list of the phonemes in the English language, see IPA for English.

Two sounds that may be allophones (sound variants belonging to the same phoneme) in one language may belong to separate phonemes in another language or dialect. In English, for example, /p/ has aspirated and non-aspirated allophones: aspirated [pʰ] as in pin, and non-aspirated [p] as in spin. However, in many languages (e.g. Chinese), aspirated /pʰ/ is a phoneme distinct from unaspirated /p/. As another example, there is no distinction between [r] and [l] in Japanese; there is only one /r/ phoneme, although the Japanese /r/ has allophones that make it sound more like an [l], [d] (specifically the flapped form [ɾ]), or [r] to English speakers. The sounds /z/ and /s/ are distinct phonemes in English, but allophones in Spanish. /n/ (as in run) and /ŋ/ (as in rung) are phonemes in English, but allophones in Italian and Spanish.
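
One way to picture this language dependence is as a separate phone-to-phoneme table for each language. The Python sketch below is a minimal illustration: the tables are drastically simplified assumptions covering only the aspiration and nasal examples mentioned here (with Mandarin standing in for "Chinese"), and `same_phoneme` is a hypothetical helper.

```python
# Toy phone-to-phoneme tables; real inventories are far larger (assumption of this sketch).
PHONE_TO_PHONEME = {
    "English": {"p": "p", "pʰ": "p", "n": "n", "ŋ": "ŋ"},
    "Mandarin": {"p": "p", "pʰ": "pʰ"},
    "Italian": {"n": "n", "ŋ": "n"},
}

def same_phoneme(language, phone_a, phone_b):
    """True if the two phones are allophones of one phoneme in the given language."""
    table = PHONE_TO_PHONEME[language]
    return table[phone_a] == table[phone_b]
```

The same pair of phones gives different answers depending on the table consulted, which is the point of the examples: phonemic status is a property of a language's system, not of the sounds themselves.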

A notable phonemic unit is the chroneme, a phonemically relevant extension of the duration of a consonant or vowel. Some languages or dialects, such as Finnish or Japanese, allow chronemes on both consonants and vowels. Others, like Italian or Australian English, use them on only one of the two (in Italian, consonants; in Australian English, vowels).

Restricted phonemes
A restricted phoneme is a phoneme that can only occur in a certain environment: There are restrictions as to where it can occur. English has several restricted phonemes:


 * /ŋ/, as in sing, occurs only at the end of a syllable, never at the beginning (in many other languages, such as Swahili, /ŋ/ can appear word-initially).
 * /h/ occurs only before vowels and at the beginning of a syllable, never at the end (a few languages, such as Arabic or Romanian, allow /h/ syllable-finally).
 * In many American dialects with the cot-caught merger, /ɔ/ occurs only before /r/ and /l/, and in the diphthong /ɔɪ/.
 * In non-rhotic dialects, /r/ can only occur before a vowel, never at the end of a word or before a consonant.
 * Under most interpretations, /w/ and /j/ occur only before a vowel, never at the end of a syllable. However, many phonologists interpret a word like boy as either /bɔɪ/ or /bɔj/.
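
Positional restrictions of this kind can be written down as simple checks on syllable structure. The Python sketch below covers only two of the English restrictions listed (the velar nasal of sing never in the onset, /h/ never in the coda). Representing a syllable as an (onset, nucleus, coda) tuple is an assumption of this sketch, as are the names `FORBIDDEN` and `violations`.

```python
# Positional restrictions for two English phonemes.
# Syllables are hypothetical (onset, nucleus, coda) tuples of phoneme symbols.
FORBIDDEN = {
    "onset": {"ŋ"},  # the velar nasal never begins a syllable
    "coda": {"h"},   # /h/ never ends a syllable
}

def violations(syllable):
    """Return a list of restriction violations found in one syllable."""
    onset, _nucleus, coda = syllable
    problems = []
    for phoneme in onset:
        if phoneme in FORBIDDEN["onset"]:
            problems.append(f"/{phoneme}/ in onset")
    for phoneme in coda:
        if phoneme in FORBIDDEN["coda"]:
            problems.append(f"/{phoneme}/ in coda")
    return problems
```

A syllable like sing, `(("s",), "ɪ", ("ŋ",))`, passes, while a hypothetical syllable beginning with the velar nasal is flagged. For a language such as Swahili the onset restriction would simply be absent from the table.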

Neutralization, archiphoneme, underspecification
Phonemes that are contrastive in certain environments may not be contrastive in all environments. In the environments where they don't contrast, the contrast is said to be neutralized.

In English there are three nasal phonemes, /m, n, ŋ/, as shown by the minimal triplet:

 * sum /sʌm/
 * sun /sʌn/
 * sung /sʌŋ/

However, with rare exceptions, these sounds are not contrastive before plosives such as /p, t, k/ within the same morpheme. Although all three phones appear before plosives, for example in limp, lint, link, only one of them may appear before each of the plosives. That is, the /m, n, ŋ/ distinction is neutralized before each of the plosives /p, t, k/:
 * only /m/ occurs before /p/,
 * only /n/ before /t/, and
 * only /ŋ/ before /k/.

Thus these phonemes are not contrastive in these environments, and according to some theorists, there is no evidence as to what the underlying representation might be. If we hypothesize that we are dealing with only a single underlying nasal, there is no reason to pick one of the three phonemes over the other two.

(In some languages there is only one phonemic nasal anywhere, and due to obligatory assimilation, it surfaces as [m, n, ŋ] in just these environments, so this idea is not as far-fetched as it might seem at first glance.)

In certain schools of phonology, such a neutralized distinction is known as an archiphoneme (Nikolai Trubetzkoy of the Prague school is often associated with this analysis). Archiphonemes are often notated with a capital letter. Following this convention, the neutralization of /m, n, ŋ/ before /p, t, k/ could be notated as |N|, and limp, lint, link would be represented as |lɪNp, lɪNt, lɪNk|. (The |pipes| indicate underlying representation.) Other ways this archiphoneme could be notated are |m-n-ŋ|, {m, n, ŋ}, or |n*|.
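
The archiphoneme analysis of limp, lint, link can be sketched as a rule that fills in the nasal's place of articulation from the following plosive. The Python below is a minimal illustration, not a general model of assimilation: the function name `realize_N`, the list-of-symbols input, and the vowel symbol ɪ are assumptions of this sketch.

```python
# Nasal chosen before each plosive, following the English pattern described above.
NASAL_BEFORE = {"p": "m", "t": "n", "k": "ŋ"}

def realize_N(underlying):
    """Replace each archiphoneme N with the nasal matching the following plosive."""
    out = []
    for i, segment in enumerate(underlying):
        if segment == "N":
            following = underlying[i + 1] if i + 1 < len(underlying) else None
            if following not in NASAL_BEFORE:
                raise ValueError("this sketch only handles N before p, t, or k")
            out.append(NASAL_BEFORE[following])
        else:
            out.append(segment)
    return "".join(out)
```

Here `realize_N(["l", "ɪ", "N", "k"])` returns the form with the velar nasal. The underlying form never has to commit to one of the three nasals, which is exactly what the archiphoneme notation is meant to express.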

Another example from American English is the neutralization of the plosives /t, d/ following a stressed syllable. Phonetically, both are realized in this position as [ɾ], a voiced alveolar flap. This can be heard by comparing writer with rider (for the sake of simplicity, Canadian raising is not taken into account).


 * write /raɪt/
 * ride /raɪd/

With the suffix -er:

 * writer [ˈraɪɾɚ]
 * rider [ˈraɪɾɚ]

Thus, one cannot say whether the underlying representation of the intervocalic consonant in either word is /t/ or /d/ without looking at the unsuffixed form. This neutralization can be represented as an archiphoneme |D|, in which case the underlying representation of writer or rider would be |ˈraɪDɚ|.
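
The flapping pattern can be sketched as a rule that applies to a plosive directly after the stressed vowel and before another vowel. This Python sketch is a simplification under stated assumptions: the vowel set, the segment symbols, and the `stressed_index` parameter are hypothetical conveniences, and the real conditions on flapping in American English are more involved.

```python
VOWELS = {"aɪ", "ɚ", "ɪ", "æ"}  # toy vowel set, an assumption of this sketch

def flap(segments, stressed_index):
    """Realize t or d as the flap ɾ when it directly follows the stressed vowel
    and directly precedes another vowel."""
    out = list(segments)
    i = stressed_index + 1
    if i + 1 < len(out) and out[i] in {"t", "d"} and out[i + 1] in VOWELS:
        out[i] = "ɾ"
    return out

writer = flap(["r", "aɪ", "t", "ɚ"], stressed_index=1)
rider = flap(["r", "aɪ", "d", "ɚ"], stressed_index=1)
```

Both calls produce the same segment list, so the surface forms alone no longer show which plosive was underlying; the unsuffixed forms write and ride are left unchanged by the rule because no vowel follows the plosive.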

Another way to talk about archiphonemes involves the concept of underspecification: phonemes can be considered fully specified segments while archiphonemes are underspecified segments. In Tuvan, phonemic vowels are specified with the features of tongue height, backness, and lip rounding. The archiphoneme |U| is an underspecified high vowel where only the tongue height is specified.


 * /i/: height high, backness front, roundedness unrounded
 * /ɯ/: height high, backness back, roundedness unrounded
 * /u/: height high, backness back, roundedness rounded
 * |U|: height high, backness unspecified, roundedness unspecified

Whether |U| is pronounced as front or back, and whether rounded or unrounded, depends on vowel harmony. If |U| occurs following a front unrounded vowel, it will be pronounced as the phoneme /i/; if following a back unrounded vowel, it will be pronounced as /ɯ/; and if following a back rounded vowel, it will be pronounced as /u/. This can be seen in the following words:


 * -|Um| 'my' (the vowel of this suffix is underspecified)
 * |idikUm| 'my boot' (/i/ is front & unrounded)
 * |xarUm| 'my snow' (/a/ is back & unrounded)
 * |nomUm| 'my book' (/o/ is back & rounded)
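
The harmony rule for |U| can be sketched as a lookup on the features of the nearest preceding vowel. The Python below is a minimal illustration covering only the three cases described: it assumes the high back unrounded vowel is written ɯ, treats each character as one segment, and uses hypothetical names (`VOWEL_FEATURES`, `HIGH_VOWEL`, `realize_U`). The surface forms it produces are what the stated rule predicts, not attested transcriptions.

```python
# (backness, roundedness) of the vowels that appear in the examples.
VOWEL_FEATURES = {
    "i": ("front", "unrounded"),
    "a": ("back", "unrounded"),
    "ɯ": ("back", "unrounded"),
    "o": ("back", "rounded"),
    "u": ("back", "rounded"),
}

# High vowel selected for each feature combination covered by the rule.
HIGH_VOWEL = {
    ("front", "unrounded"): "i",
    ("back", "unrounded"): "ɯ",
    ("back", "rounded"): "u",
}

def realize_U(underlying):
    """Fill in each underspecified U from the features of the preceding vowel."""
    out = []
    last_features = None
    for segment in underlying:
        if segment == "U":
            if last_features is None:
                raise ValueError("U needs a preceding vowel to harmonize with")
            segment = HIGH_VOWEL[last_features]
        if segment in VOWEL_FEATURES:
            last_features = VOWEL_FEATURES[segment]
        out.append(segment)
    return "".join(out)
```

Calling `realize_U("idikUm")` fills the suffix vowel with a front unrounded high vowel, while `realize_U("nomUm")` fills it with a back rounded one. A fuller treatment would also need the front rounded case, which the description here does not cover.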

Not all phonologists accept the concept of archiphonemes. Many doubt that it reflects how people process language or control speech, and some argue that archiphonemes add unnecessary complexity.

Phonological extremes
Of all the sounds that a human vocal tract can create, different languages vary considerably in the number of these sounds that are considered to be distinctive phonemes in the speech of that language. Ubyx and Arrernte have only two phonemic vowels, while at the other extreme, the Bantu language Ngwe has fourteen vowel qualities, twelve of which may occur long or short, for twenty-six oral vowels, plus six nasalized vowels, long and short, for thirty-eight vowels; while !Xóõ achieves thirty-one pure vowels (not counting vowel length, which it also has) by varying the phonation. Rotokas has only six consonants, while !Xóõ has somewhere in the neighborhood of seventy-seven, and Ubyx eighty-one. French has no phonemic tone or stress, while several of the Kam-Sui languages have nine tones, and one of the Kru languages, Wobe, has been claimed to have fourteen, though this is disputed. The total phonemic inventory in languages varies from as few as eleven in Rotokas to as many as 112 in !Xóõ (including four tones). These may range from familiar sounds like [t], [s], or [m] to very unusual ones produced in extraordinary ways (see: Click consonant, phonation, airstream mechanism). The English language itself uses a rather large set of thirteen to twenty-two vowels, including diphthongs, though its twenty-two to twenty-six consonants are close to average. (There are twenty-one consonant and five vowel letters in the English alphabet, but this does not correspond to the number of consonant and vowel sounds.)
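
The Ngwe vowel count quoted above can be checked with a little arithmetic. The variable names in this Python sketch are only labels for the figures given in the paragraph.

```python
qualities = 14        # distinct oral vowel qualities
with_length = 12      # qualities that occur both long and short
nasal_qualities = 6   # nasalized qualities, each long and short

# Twelve qualities count twice (long and short); the remaining two count once.
oral_vowels = with_length * 2 + (qualities - with_length)
nasal_vowels = nasal_qualities * 2
total_vowels = oral_vowels + nasal_vowels
```

This gives twenty-six oral vowels and thirty-eight vowels in total, matching the figures in the text.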

The most common vowel system consists of the five vowels /i/, /e/, /a/, /o/, /u/. The most common consonants are /p/, /t/, /k/, /m/, /n/. Very few languages lack one of these: Arabic lacks /p/, standard Hawaiian lacks /t/, Mohawk lacks /p/ and /m/, Hupa lacks both /p/ and a simple /k/, colloquial Samoan lacks /t/ and /n/, while Rotokas and Quileute lack /m/ and /n/. While most of these languages have very small inventories, Quileute and Hupa have quite complex consonant systems.