Psychoacoustics

Psychoacoustics is the study of subjective human perception of sounds. Alternatively, it can be described as the study of the psychological correlates of the physical parameters of acoustics.

Background
Hearing is not a purely mechanical phenomenon of wave propagation, but is also a sensory and perceptual event. When a person hears something, that something arrives at the ear as a mechanical sound wave traveling through the air, but within the ear it is transformed into neural action potentials. These nerve pulses then travel to the brain where they are perceived. Hence, in many problems in acoustics, such as for audio processing, it is advantageous to take into account not just the mechanics of the environment, but also the fact that both the ear and the brain are involved in a person’s listening experience.

The ear, for example, performs a spectral decomposition of sound as part of the process of turning sound into a neural stimulus, so certain time-domain effects are inaudible. MP3 compression makes use of this fact. In addition, the ear has a logarithmic dynamic response. Telephone networks make use of this fact by logarithmically compressing data samples before transmission and then exponentially expanding them for playback. Another side effect of the ear’s non-linear response is that sounds which arrive at the ear drum in close spectral proximity produce phantom beat notes. This is the same principle that is used for down-conversion of carrier frequencies in radio front ends by a non-linear amplifier. Such physiological effects due to the ear’s anatomy are properly called physiology-acoustic effects, though people commonly lump them in with psycho-acoustic effects.
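
The logarithmic compression and exponential expansion used by telephone networks can be sketched as follows. This is a minimal illustration that assumes the continuous μ-law curve with μ = 255 (the value associated with G.711 μ-law telephony); the function names are hypothetical, and real codecs use a quantized, piecewise-linear version of this curve rather than floating-point arithmetic.

```python
import math

MU = 255  # assumption: the mu value used by G.711 mu-law telephony


def mu_law_compress(x: float, mu: int = MU) -> float:
    """Map a sample in [-1, 1] onto a logarithmic curve, still in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: int = MU) -> float:
    """Inverse of mu_law_compress: exponential expansion for playback."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)
```

Quiet samples are boosted before transmission (an input of 0.01 maps to a value above 0.2), so that a fixed number of quantization steps is spent more evenly across the ear's logarithmic loudness range.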

There are also true psycho-acoustic effects introduced by the brain. For example, when a person listens to crackly, hiss-filled vinyl records, he or she soon stops noticing the background noise and enjoys the music. A person who does this habitually appears to forget about the noise altogether, and may not be able to say after listening whether noise was present. This effect is called psycho-acoustical masking. The brain’s ability to perform such masking has been important for the adoption of a number of technologies, though in this age of digital signaling and high-fidelity playback the effect is typically used to hide losses in compression rather than to cover up analog white noise.

As another example of a psycho-acoustic effect, the brain appears to use a correlative process for pattern recognition, much as is done in electronic circuits that look for signal patterns. When the threshold for acceptance of a correlative match is very low, a person may perceive a sought-after pattern in pure noise or among sounds that are only somewhat indicative, as the brain fills in the rest of the pattern. This is a psycho-acoustic phantom effect. For example, when a radio operator is straining to hear a weak Morse code signal in a noisy background, he or she often perceives the pitch of tiny dots and dashes even when they are not present. In general, psycho-acoustic phantom effects play an important role in any environment where people have heightened perceptions, such as when danger may be perceived to be near. (There is an analogous visual effect experienced by people standing watch in very dark places.) The psycho-acoustic phantom effect is conceptually distinct from hallucination, where the brain auto-generates perceptions. Also, the psycho-acoustic phantom effect is distinct from the physiology-acoustic phantom effect. It is the estimation of masking threshold level.

Limits of perception
The human ear can nominally hear sounds in the range 20 Hz to 20,000 Hz (20 kHz). This upper limit tends to decrease with age, most adults being unable to hear above 16 kHz. The ear itself does not respond to frequencies below 20 Hz, but these can be perceived via the body's sense of touch. (Some research has reported a hypersonic effect: although sounds above 20 kHz cannot consciously be heard, they may still have an effect on the listener.)

Frequency resolution of the ear is, in the middle range, about 2 Hz. That is, changes in pitch larger than 2 Hz can be perceived. However, even smaller pitch differences can be perceived through other means. For example, the interference of two tones of nearly equal frequency can often be heard as a slow, periodic variation in loudness at the difference frequency. This effect of the changing phase relationship upon the resultant sound is known as 'beating'.
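
Beating can be illustrated with the trigonometric identity sin(a) + sin(b) = 2 cos((a − b)/2) sin((a + b)/2): the sum of two equal-amplitude tones is a tone at the average frequency whose loudness rises and falls at the difference frequency. The sketch below assumes two pure sinusoids of equal amplitude; the function names are hypothetical.

```python
import math


def two_tone(f1: float, f2: float, t: float) -> float:
    """Sum of two equal-amplitude sinusoids at time t (seconds)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)


def envelope_form(f1: float, f2: float, t: float) -> float:
    """The same signal written as a slow envelope times a fast carrier."""
    envelope = 2 * math.cos(math.pi * (f1 - f2) * t)  # slow loudness variation
    carrier = math.sin(math.pi * (f1 + f2) * t)       # tone at the mean frequency
    return envelope * carrier


def beat_frequency(f1: float, f2: float) -> float:
    """Rate (in Hz) at which the loudness rises and falls."""
    return abs(f1 - f2)
```

For instance, `beat_frequency(440.0, 442.0)` gives 2.0, meaning two loudness swells per second, which a listener can notice even when the two pitches themselves are hard to tell apart.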

However, the effect of frequency on the human ear has a logarithmic basis. In other words, the perceived pitch of a sound is related to its frequency roughly as a logarithmic function (equivalently, frequency grows exponentially with perceived pitch). The 12-tone musical scale is an example of this; it evolved due to the way tones are perceived. When the fundamental frequency of a note or tone is multiplied by approximately $$2^\frac{1}{12}$$ (this factor holds on average, but varies slightly depending on the tuning), the result is the frequency of the next higher semitone. Going 12 notes higher (an octave) is the same as multiplying the frequency by $$2^\frac{12}{12}$$, which is the same as doubling the frequency.
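
The semitone arithmetic above can be checked with a short sketch. It assumes exact equal temperament (every semitone is precisely a factor of 2^(1/12)), and the function names are hypothetical.

```python
import math


def semitone_up(freq_hz: float, n: int = 1) -> float:
    """Frequency n equal-tempered semitones above freq_hz."""
    return freq_hz * 2 ** (n / 12)


def semitones_between(f_low_hz: float, f_high_hz: float) -> float:
    """Number of equal-tempered semitones from f_low_hz up to f_high_hz."""
    return 12 * math.log2(f_high_hz / f_low_hz)
```

Twelve steps of `semitone_up` double the frequency, and `semitones_between` turns a frequency ratio back into a distance on the logarithmic pitch scale.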

The consequence is that the semitone scale used in Western musical notation is not a linear frequency scale but a logarithmic one. Other scales have been derived directly from experiments on human hearing perception, such as the Mel scale and Bark scale (these are used in studying perception, but not usually in musical composition). These are approximately logarithmic in frequency at the high-frequency end and closer to linear at the low-frequency end.
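
As an illustration of such a perceptual scale, the sketch below uses one widely quoted closed-form approximation of the Mel scale, m = 2595 log10(1 + f/700). This formula is not given in the text above and is only one of several variants in the literature, so treat the constants as an assumption.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Approximate Mel value for a frequency in Hz (one common variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

With this approximation, a 100 Hz step near the bottom of the range (100 Hz to 200 Hz) spans many more mels than the same step near 8 kHz, reflecting the compression of the scale at high frequencies.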

The intensity range of audible sounds is enormous. Our ear drums are sensitive only to variations in sound pressure. The lower limit of audibility is defined as 0 dB, but the upper limit is not as clearly defined. It is more a question of the level at which the ear will be physically harmed or a hearing disability may result. This limit also depends on the duration of exposure. The ear can be exposed for short periods to levels in excess of 120 dB without permanent harm, albeit with discomfort and possibly pain, but long-term exposure to sound levels over 80 dB can cause permanent hearing loss.
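
To give a sense of how large this range is, the sketch below converts sound pressure to decibels of sound pressure level. It assumes the conventional reference pressure in air of 20 micropascals for 0 dB, which is not stated in the text above; the function name is hypothetical.

```python
import math

P_REF_PA = 20e-6  # assumption: standard 0 dB SPL reference in air, 20 micropascals


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB for an RMS pressure given in pascals."""
    return 20.0 * math.log10(pressure_pa / P_REF_PA)
```

Under this convention, a pressure of 20 Pa corresponds to about 120 dB, so the span from the threshold of audibility to the onset of pain covers a factor of roughly one million in sound pressure.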

A more rigorous exploration of the lower limits of audibility shows that the minimum threshold at which a sound can be heard is frequency dependent. By measuring this minimum intensity for test tones of various frequencies, a frequency-dependent Absolute Threshold of Hearing (ATH) curve may be derived. Typically, the ear shows a peak of sensitivity (i.e., its lowest ATH) between 1 kHz and 5 kHz, though the threshold changes with age, with older ears showing decreased sensitivity above 2 kHz.
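
The shape of such a curve can be sketched with an analytic approximation that is often used in audio coding literature (commonly attributed to Terhardt). The formula and its constants are not from the text above; they are an assumption chosen for illustration and describe a young, healthy listener in quiet.

```python
import math


def ath_db(freq_hz: float) -> float:
    """Approximate absolute threshold of hearing in dB SPL at freq_hz."""
    khz = freq_hz / 1000.0
    return (
        3.64 * khz ** -0.8                          # steep rise toward low frequencies
        - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)   # sensitivity dip around 3.3 kHz
        + 1e-3 * khz ** 4                           # steep rise toward high frequencies
    )


# Frequency (on a 10 Hz grid) at which this approximation is most sensitive.
most_sensitive_hz = min(range(20, 20001, 10), key=ath_db)
```

In this approximation the lowest threshold falls between 1 kHz and 5 kHz, in line with the peak of sensitivity described above, while the threshold at 100 Hz is far higher than at 1 kHz.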

The ATH is the lowest of the equal-loudness contours. Equal-loudness contours indicate the sound pressure levels (in dB), over the range of audible frequencies, that are perceived as being of equal loudness. Equal-loudness contours were first measured by Fletcher and Munson at Bell Labs in 1933 using pure tones reproduced via headphones, and the data they collected are called Fletcher-Munson curves. Because subjective loudness was difficult to measure, the Fletcher-Munson curves were averaged over many subjects.

Robinson and Dadson refined the process in 1956 to obtain a new set of equal-loudness curves for a frontal sound source measured in an anechoic chamber. The Robinson-Dadson curves were standardized as ISO 226 in 1986. In 2003, ISO 226 was revised using equal-loudness data collected from 12 international studies.

Interpretation of sound
To a first approximation, human hearing works like a spectrum analyzer: the ear resolves the spectral content of the pressure wave without respect to the phase of the signal. In practice, though, some phase information can be perceived. Inter-aural phase difference, that is, the difference in the phase of a sound between the two ears, is a notable exception, providing a significant part of the directional sensation of sound. The filtering effects of head-related transfer functions provide another important directional cue.

Masking effects
In some situations an otherwise clearly audible sound can be masked by another sound. For example, conversation at a bus stop can become impossible while a loud bus is driving past. This phenomenon is called masking. A weaker sound is masked if it is made inaudible in the presence of a louder sound. The masking phenomenon occurs because any loud sound distorts the Absolute Threshold of Hearing, raising it so that quieter, otherwise perceptible sounds become inaudible.

If two sounds occur simultaneously and one is masked by the other, this is referred to as simultaneous masking. Simultaneous masking is also sometimes called frequency masking. The tonality of a sound partially determines its ability to mask other sounds. A sinusoidal masker, for example, requires a higher intensity to mask a noise-like maskee than a loud noise-like masker does to mask a sinusoid. Computer models which calculate the masking caused by sounds must therefore classify their individual spectral peaks according to their tonality.

Similarly, a weak sound emitted soon after the end of a louder sound is masked by the louder sound. Even a weak sound just before a louder sound can be masked by the louder sound. These two effects are called forward and backward temporal masking, respectively.

'Phantom' fundamentals
At the lower end of the ear's response, low notes can sometimes be heard when there is no sound at that frequency. This is due to the brain synthesising the low-frequency sound from the differences between the audible harmonics that are present. This effect is used in some commercial sound systems to give the impression of extended low-frequency response when the system itself cannot reproduce that frequency adequately. See missing fundamental.
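
The arithmetic behind this effect can be sketched as follows. The example assumes idealised, exactly integer-valued harmonic frequencies in Hz; the function names are hypothetical, and real pitch perception is considerably more tolerant of mistuning than a greatest-common-divisor calculation.

```python
from functools import reduce
from math import gcd


def adjacent_differences(harmonics_hz: list[int]) -> list[int]:
    """Frequency gaps between neighbouring harmonics, lowest first."""
    ordered = sorted(harmonics_hz)
    return [high - low for low, high in zip(ordered, ordered[1:])]


def implied_fundamental(harmonics_hz: list[int]) -> int:
    """Largest frequency of which every given harmonic is a whole multiple."""
    return reduce(gcd, harmonics_hz)
```

For harmonics at 200, 300 and 400 Hz, every adjacent difference is 100 Hz and the implied fundamental is 100 Hz, even though no energy is present at that frequency.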

Psychoacoustics in software
The psychoacoustic model provides for high-quality lossy signal compression by describing which parts of a given digital audio signal can be removed (or aggressively compressed) safely, that is, without significant losses in the (consciously) perceived quality of the sound.

The model can explain how a sharp clap of the hands might seem painfully loud in a quiet library, but is hardly noticeable after a car backfires on a busy, urban street. Exploiting such effects greatly benefits the overall compression ratio, and psychoacoustic analysis routinely leads to compressed music files that are 1/10 to 1/12 the size of high-quality original masters with very little discernible loss in quality. Such compression is a feature of nearly all modern lossy audio compression formats. These formats include MP3, Ogg Vorbis, WMA, Musicam (used for digital audio broadcasting in several countries) and ATRAC, the compression used in MiniDisc and Walkman.
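
The 1/10 to 1/12 figure can be sanity-checked with simple arithmetic. The sketch below assumes a CD-quality master (44.1 kHz sampling, 16 bits per sample, two channels) and a compressed stream of 128 kbit/s; both values are illustrative assumptions rather than figures from the text above.

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(original_bps: float, compressed_bps: float) -> float:
    """How many times smaller the compressed stream is than the original."""
    return original_bps / compressed_bps


cd_bps = pcm_bitrate(44_100, 16, 2)          # assumed CD-quality master
ratio = compression_ratio(cd_bps, 128_000)   # assumed 128 kbit/s compressed stream
```

Under these assumptions the uncompressed stream runs at 1,411,200 bit/s and the ratio comes out at about 11, which sits inside the range quoted above.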

Psychoacoustics is based heavily on human anatomy, especially the ear's limitations in perceiving sound as outlined previously. To summarize, these limitations are:


 * High frequency limit
 * Absolute threshold of hearing
 * Temporal masking
 * Simultaneous masking

Given that the ear will not be at peak perceptive capacity when dealing with these limitations, a compression algorithm can assign a lower priority to sounds outside the range of human hearing. By carefully shifting bits away from the unimportant components and toward the important ones, the algorithm ensures that the sounds a listener can hear most clearly are of the highest quality.
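
The idea of shifting bits toward the audible components can be sketched with a toy greedy allocator. It assumes that a psychoacoustic model has already produced a signal-to-mask ratio (in dB) for each frequency band, and that each extra bit lowers quantization noise by roughly 6.02 dB; the function name and inputs are hypothetical, and real encoders are considerably more elaborate.

```python
DB_PER_BIT = 6.02  # approximate noise reduction per extra bit of uniform quantization


def allocate_bits(smr_db: list[float], total_bits: int) -> list[int]:
    """Greedily hand out bits to the band whose noise is most audible.

    smr_db[i] is the assumed signal-to-mask ratio of band i in dB; a value
    at or below zero means the band is already masked and needs no bits.
    """
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        # Noise-to-mask ratio of each band with its current allocation.
        nmr = [smr - DB_PER_BIT * b for smr, b in zip(smr_db, bits)]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] <= 0:
            break  # all quantization noise is below the masking threshold
        bits[worst] += 1
    return bits
```

With `allocate_bits([20.0, -5.0, 8.0], 10)` the masked middle band receives no bits at all and the allocator stops after six bits, because by then the noise in every band lies below its masking threshold.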

Psychoacoustics and music
Psychoacoustics includes topics and studies which are relevant to music psychology. Theorists such as Benjamin Boretz consider some of the results of psychoacoustics to be meaningful only in a musical context.

Applied psychoacoustics
Psychoacoustics is presently applied within many fields: in software development, where developers map proven and experimental mathematical patterns; in the design of (high-end) audio systems for accurate reproduction of music in theatres and homes; and in defense systems, where scientists have explored new acoustic weapons, some of which emit frequencies that may impair, harm, or kill, so far with very limited success (http://www.nationaldefensemagazine.org/issues/2002/Mar/Acoustic-Energy.htm). It is also applied today within music, where musicians and artists continue to create new sonic experiences by masking unwanted frequencies of instruments so that other frequencies are enhanced by their absence. Yet another application is to give listeners of small loudspeakers the impression that they hear low notes, by masking what is subsonic and enhancing the frequencies that are perceived as low (see references).