Digital Audio Concepts

Sound Basics

Sound originates from a disturbance of the air by any object. For example, two hands clapping cause a disturbance of the air around the hands: the hands are the source of the sound. The local region of air has increased energy caused by the motion of the air molecules. This energy spreads outwards in sound waves.

A typical source of acoustic energy today is the loudspeaker: the cone of the loudspeaker vibrates in the air causing disturbances dependent on the electrical signals reaching the loudspeaker from the sound system. Effectively, the loudspeaker converts electrical energy into sound energy, which travels through the air as waves radiating from the loudspeaker.

Here is a graph of the variations in air pressure over time, for a pure sine tone — the sound produced by a tuning fork or an electronic oscillator.

Frequency and Pitch

Subjectively, we often refer to a sound as having a high pitch or a low pitch. From a technical standpoint, the sensation of pitch depends on frequency: how many vibrations (periods or cycles) you hear each second. Frequency is the most fundamental measurable property of sound. When we talk about the radiation of sound, we are talking about a transfer of energy among the molecules of air that produces a moving zone of high pressure (compression) traveling at about 1130 feet/second. For the sound to be perceived as having a pitch, it must be composed of many high and low pressure zones following one another. The time from one high pressure zone to the next defines a cycle, and frequency is defined as the number of cycles per second.

Formerly, frequency was described in units of cps (cycles per second). Today it is more common to give a frequency in units called Hz (hertz). One hertz simply means one cycle per second. If we talk about the fundamental frequency of the concert "A" when played by a piano, then we are referring to 440 Hz. This tells us that when the piano plays a concert "A," it is generating 440 high and low pressure zones (periods or cycles) which pass through a given point every second. The range of human hearing is roughly 20 Hz to 20,000 Hz.
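As a rough illustration of these definitions, the following Python sketch computes the period (the duration of one cycle) and the wavelength of concert "A". The function names are invented for this example. The wavelength relation (speed divided by frequency) is standard physics rather than something stated above, and the speed of sound is the 1130 feet/second figure quoted earlier.

```python
# Assumption: speed of sound in air, using the figure quoted in the text.
SPEED_OF_SOUND_FT_PER_S = 1130.0


def period_seconds(frequency_hz: float) -> float:
    """Duration of one cycle, in seconds."""
    return 1.0 / frequency_hz


def wavelength_feet(frequency_hz: float,
                    speed_ft_per_s: float = SPEED_OF_SOUND_FT_PER_S) -> float:
    """Distance between successive high pressure zones, in feet."""
    return speed_ft_per_s / frequency_hz


concert_a_hz = 440.0
print(f"Period of concert A:     {period_seconds(concert_a_hz) * 1000:.3f} ms")
print(f"Wavelength of concert A: {wavelength_feet(concert_a_hz):.2f} ft")
```

Higher frequencies give shorter periods and shorter wavelengths: under the same assumption, a 20 Hz tone has a wavelength of 56.5 feet, while a 20,000 Hz tone has a wavelength of well under an inch.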

Intensity of Sound

Intensity is the magnitude of the variation in air pressure resulting from sound. From a graphical point of view, intensity is directly related to the amplitude of the wave. The intensity of sound is usually expressed in decibels (dB). The decibel is not an absolute measure of sound intensity; rather, it defines a relationship between two sound intensities. A decibel value expresses, on a logarithmic scale, the ratio between the measured sound intensity and a reference intensity defined as zero decibels (0 dB). Some typical levels:

Threshold of hearing             0 dB
Leaves rustling in the breeze    20 dB
A quiet restaurant               50 dB
Busy traffic                     70 dB
Vacuum cleaner                   80 dB
Threshold of pain                120 dB
Jet at takeoff                   140 dB
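
The logarithmic relationship can be sketched in a few lines of Python. This example assumes the decibel value is computed from a ratio of intensities (10 times the base-10 logarithm of the ratio), which is the usual convention; the function names are made up for illustration.

```python
import math


def intensity_ratio_to_db(ratio: float) -> float:
    """Convert a ratio of two sound intensities to decibels."""
    return 10.0 * math.log10(ratio)


def db_to_intensity_ratio(db: float) -> float:
    """Convert a decibel difference back to a ratio of intensities."""
    return 10.0 ** (db / 10.0)


# Busy traffic (70 dB) versus a quiet restaurant (50 dB): a 20 dB difference.
ratio = db_to_intensity_ratio(70 - 50)
print(f"A 20 dB difference is an intensity ratio of about {ratio:.0f} to 1")
```

Because the scale is logarithmic, every additional 10 dB corresponds to a tenfold increase in intensity, so the 120 dB threshold of pain is about a trillion times the intensity of the 0 dB reference.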

Timbre, Tone Color

Sounds have another perceptual attribute: that of timbre or tone quality. We may describe sounds as being tinny, full, brassy, trumpet-like, etc. Timbre allows us to identify sounds.

Timbre is defined as that attribute of a sound that allows us to differentiate between two sounds of the same pitch, intensity and duration. The shape of the periodic wave producing the sound determines the relative strengths of the harmonics — or additional higher frequency components. A naturally occurring sound has a waveshape that is more complex than the sine wave example in the graphic above.
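
To make the role of harmonics concrete, here is a minimal Python sketch that builds two waveforms with the same fundamental frequency but different harmonic strengths. The function name, the 220 Hz fundamental, the 8000 Hz sample rate and the particular strength values are all arbitrary choices for illustration; real instruments are considerably more complex.

```python
import math


def harmonic_wave(fundamental_hz, strengths, sample_rate_hz=8000, duration_s=0.01):
    """Sum sine waves at whole-number multiples of the fundamental.

    strengths[0] scales the fundamental, strengths[1] the second harmonic,
    and so on.
    """
    count = round(sample_rate_hz * duration_s)
    samples = []
    for n in range(count):
        t = n / sample_rate_hz
        value = sum(
            strength * math.sin(2 * math.pi * fundamental_hz * (k + 1) * t)
            for k, strength in enumerate(strengths)
        )
        samples.append(value)
    return samples


pure_tone = harmonic_wave(220.0, [1.0])                      # sine wave only
richer_tone = harmonic_wave(220.0, [1.0, 0.5, 0.33, 0.25])   # added harmonics

print(len(pure_tone), "samples per waveform")
print("Waveshapes differ:", pure_tone != richer_tone)
```

Both waveforms repeat 220 times per second, so they have the same pitch, but their shapes differ because of the added higher frequency components.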

It used to be thought that timbre was related only to the relative strengths of the harmonics produced by an instrument, but recent research in computer synthesis of instruments has shown that the pattern of change over time of each of the components contributes to timbre. Sound recognition is also dependent on the sounds that are associated with the attack, for example the noise at the start of a trumpet sound, and to a lesser extent on the release, as when a piano key is released.

Digital Audio Basics

Converting analog signals

Here's what happens when sound is recorded digitally:

The analog signal is converted to digital form

The analog signal — a continuous variable defined with infinite precision — is converted to a discrete sequence of measured values which are represented digitally.
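
A minimal Python sketch of this conversion follows. An ordinary function of time stands in for the continuous analog signal, and the sampler measures it at equal time intervals. The names `sample_signal` and `tone` are hypothetical, and the 440 Hz tone and 44,100 Hz sample rate are example values only.

```python
import math


def sample_signal(signal, sample_rate_hz, duration_s):
    """Measure a continuous signal (a function of time) at equal intervals."""
    count = round(sample_rate_hz * duration_s)
    return [signal(n / sample_rate_hz) for n in range(count)]


def tone(t):
    """Stand-in for an analog signal: a 440 Hz sine tone."""
    return math.sin(2 * math.pi * 440.0 * t)


samples = sample_signal(tone, sample_rate_hz=44100, duration_s=0.001)
print(len(samples), "samples in one millisecond")
print("First few values:", [round(s, 4) for s in samples[:4]])
```

The list holds only the values measured at the sample instants; whatever the signal did between those instants is not recorded.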

Aliasing

We sample the signal only at equal time intervals. We don't know what happened between the samples. Consider a "glitch" that happened to fall between adjacent samples. Since we don't measure it, we have no way of knowing the glitch was there at all.

In a less obvious case, we might have signal components that are varying rapidly in between samples. Again, we could not track these rapid inter-sample variations.

We must sample fast enough to see the most rapid changes in the signal.

If we do not sample fast enough, we cannot track completely the most rapid changes in the signal.

Some higher frequencies can be incorrectly interpreted as lower ones.

In the diagram, the high frequency signal is sampled just under twice every cycle. As a result, each sample is taken at a slightly later part of the cycle. If we draw a smooth connecting line between the samples, the resulting curve looks like a lower frequency. This is called aliasing because one frequency looks like another: it travels under an alias.
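
The effect can be checked numerically. In the Python sketch below, a 550 Hz cosine is sampled 1000 times per second (just under twice per cycle), and its samples are compared with those of a 450 Hz cosine. The frequencies and function names are example choices, not taken from the diagram, and the folding formula in `alias_frequency` is the standard one for a single sinusoid.

```python
import math


def alias_frequency(signal_hz, sample_rate_hz):
    """Frequency that a sampled sinusoid appears to have after sampling."""
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


def sample_cosine(frequency_hz, sample_rate_hz, count):
    """Sample a unit-amplitude cosine at equal time intervals."""
    return [
        math.cos(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(count)
    ]


SAMPLE_RATE_HZ = 1000.0
apparent_hz = alias_frequency(550.0, SAMPLE_RATE_HZ)

high = sample_cosine(550.0, SAMPLE_RATE_HZ, 20)
low = sample_cosine(apparent_hz, SAMPLE_RATE_HZ, 20)
largest_gap = max(abs(a - b) for a, b in zip(high, low))

print(f"550 Hz sampled at 1000 Hz looks like {apparent_hz:.0f} Hz")
print(f"Largest difference between the two sample sets: {largest_gap:.2e}")
```

The two sets of samples agree to within floating-point rounding, so once the signal has been sampled there is no way to tell the 550 Hz tone from the 450 Hz one.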

Harry Nyquist (1920s) showed that to distinguish unambiguously between all signal frequency components, we must sample at a rate at least twice the frequency of the highest frequency component.

In the diagram, the high frequency signal is sampled twice every cycle. If we draw a smooth connecting line between the samples, the resulting curve looks like the original signal. This avoids aliasing.

The highest signal frequency allowed for a given sample rate is called the Nyquist frequency; it is half the sample rate. Here are some standard sampling rates:

96 kHz (96,000 samples per second)    DVD-Audio
48 kHz                                DVD-Video, DV cameras, DAT, samplers
44.1 kHz                              CD, DAT, samplers
32 kHz, 22.05 kHz                     Older samplers

Most professional-level computer software supports all these rates.
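
For reference, this short Python sketch computes the Nyquist frequency (half the sample rate) for each of the rates listed and compares it with the 20,000 Hz upper limit of human hearing mentioned earlier. The function and constant names are invented for this example.

```python
HEARING_LIMIT_HZ = 20_000  # upper limit of human hearing quoted in the text


def nyquist_frequency(sample_rate_hz: float) -> float:
    """Highest signal frequency a given sample rate can represent."""
    return sample_rate_hz / 2.0


for rate_hz in (96_000, 48_000, 44_100, 32_000, 22_050):
    limit_hz = nyquist_frequency(rate_hz)
    covers = "yes" if limit_hz >= HEARING_LIMIT_HZ else "no"
    print(f"{rate_hz:>6} Hz -> Nyquist {limit_hz:>8.1f} Hz, covers hearing range: {covers}")
```

Under these figures, 44.1 kHz and above can represent the full range of human hearing, while 32 kHz and 22.05 kHz cannot.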

Quantization

When the signal is converted to digital form, the resolution is limited by the number of bits available (that is, the number of values available to encode each sample). We refer to this sort of resolution as sample word length or bit depth.

The diagram shows an analog signal which is then converted to a digital representation — in this case, with 8-bit word length. The smoothly varying analog signal can only be represented as a "stepped" waveform due to the limited resolution. The word length of hardware used for the sampling process determines the available resolution and dynamic range.

The difference between the original signal and its stepped representation, called quantization error, behaves very much like low-level random noise. The more bits in the data format, the smaller this error and the higher the signal-to-noise ratio.
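
The link between word length and noise can be estimated with a small experiment. The Python sketch below quantizes a full-scale sine wave to a given number of bits and measures the resulting signal-to-noise ratio. The quantizer design (evenly spaced levels across the range -1.0 to 1.0), the test frequency and the function names are assumptions made for illustration; real converters differ in detail. The figure of roughly 6 dB per bit is a well-known rule of thumb, not something stated above.

```python
import math


def quantize(sample: float, bits: int) -> float:
    """Map a sample in [-1.0, 1.0] onto one of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = math.floor((sample + 1.0) / step)
    index = min(index, levels - 1)        # keep +1.0 inside the top level
    return -1.0 + (index + 0.5) * step    # centre of the chosen step


def snr_db(bits: int, num_samples: int = 10_000) -> float:
    """Measured signal-to-noise ratio for a quantized full-scale sine wave."""
    signal_power = 0.0
    noise_power = 0.0
    for n in range(num_samples):
        x = math.sin(2 * math.pi * 0.0123456 * n)   # arbitrary test tone
        error = quantize(x, bits) - x
        signal_power += x * x
        noise_power += error * error
    return 10.0 * math.log10(signal_power / noise_power)


for bits in (8, 16):
    print(f"{bits:>2} bits: measured SNR about {snr_db(bits):.1f} dB")
```

With this setup the 8-bit result should come out near 50 dB and the 16-bit result near 98 dB, which is why longer word lengths give a wider dynamic range.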

©2003, Jeffrey Hass, John Gibson, Christopher Cook