This article is designed to give you an understanding of the basic concepts of digital audio and the terminology associated with it.
Amplitude and frequency
The two most important properties of analog audio are amplitude and frequency. Let's look at these basic properties of sound waves and see what makes one sound different from another.
Amplitude: In audio, amplitude refers to the strength of the compression and expansion experienced by the medium (mainly air) through which the sound passes. It is usually expressed in decibels (dB) and is perceived as loudness. The higher the amplitude, the louder the sound.
Frequency: Frequency refers to the number of vibrations the medium experiences in one second. It is measured in hertz (Hz) and is perceived as pitch. Low-frequency sounds tend to travel farther than high-frequency sounds. For example, a drum produces lower frequencies than a flute.
Humans can hear frequencies between roughly 20 Hz and 20,000 Hz. Frequencies above 20,000 Hz are called ultrasound and frequencies below 20 Hz are called infrasound; neither can be heard by humans.
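Decibels are a logarithmic way of comparing amplitudes. As a minimal sketch, assuming the amplitude is compared against some reference level (the function name and the default reference of 1.0 are illustrative choices, not taken from any particular library), the conversion can be written as:

```python
import math

def amplitude_to_db(amplitude, reference=1.0):
    """Convert an amplitude ratio to decibels: 20 * log10(amplitude / reference)."""
    if amplitude <= 0 or reference <= 0:
        raise ValueError("amplitude and reference must be positive")
    return 20 * math.log10(amplitude / reference)

# Doubling the amplitude adds about 6 dB; halving it removes about 6 dB.
print(round(amplitude_to_db(2.0), 2))
print(round(amplitude_to_db(0.5), 2))
```

Because the scale is logarithmic, equal steps in decibels correspond to equal ratios of amplitude rather than equal differences.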
What is "digital audio"?
Digital audio is the practice of recording, processing, storing and transmitting audio signals with computer systems, including over the Internet. Computers can only work with signals that have been encoded as binary numbers. Sound, however, is analog in nature and cannot be interpreted directly by a processor.
For a computer to store and process an audio signal, the signal needs to be converted into digital (binary) form.
The conversion takes several steps. An analog audio signal is a continuous waveform, while digital audio consists of discrete points that record the amplitude of the waveform at specific instants. Continuous signals must be converted to discrete signals because a computer can only store a finite, countable set of values, one for each time interval.
Sampling and quantization or digitization of audio signals
The conversion is carried out by an analog-to-digital converter (ADC). The ADC must complete two tasks, namely sampling and quantization. Sampling captures the amplitude of the signal at regular intervals; each captured value is a sample. The sample rate is the number of samples collected per second, measured in hertz (Hz). If we record 48,000 samples per second, the sampling rate is 48,000 Hz, or 48 kHz.
Sampling Rate (Fs) = 48 kHz
Sampling period (Ts) = 1/Fs
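To make the relationship between Fs and Ts concrete, here is a small sketch that samples a sine wave at 48 kHz. The 1 kHz test tone and the helper name sample_sine are assumptions made for illustration only.

```python
import math

def sample_sine(freq_hz, fs_hz, num_samples, amplitude=1.0):
    """Return amplitude values of a sine wave taken every Ts = 1 / Fs seconds."""
    ts = 1.0 / fs_hz  # sampling period in seconds
    return [amplitude * math.sin(2 * math.pi * freq_hz * n * ts)
            for n in range(num_samples)]

fs = 48_000          # sampling rate in Hz
ts = 1 / fs          # sampling period, roughly 20.8 microseconds
samples = sample_sine(1_000, fs, 48)  # one full period of a 1 kHz tone

print(f"Ts = {ts * 1e6:.2f} microseconds")
print(f"{len(samples)} samples cover one period of the 1 kHz tone")
```

At 48 kHz, one period of a 1 kHz tone is described by 48 samples; a higher-pitched tone gets fewer samples per period at the same rate.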
Sampling rate and audio frequency
A smaller sampling interval means a higher sampling rate. A higher sampling rate can capture higher audio frequencies, at the cost of larger file sizes. For accurate digitization, the sample rate must therefore be high enough to cover the highest frequency of interest.
Frequencies above half the sample rate cannot be represented in digital samples. According to the Nyquist theorem, a continuous-time signal can be perfectly reconstructed from its digital samples when the sampling rate is more than twice the highest audio frequency.
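Nyquist criterion: The sampling rate should be more than twice the highest frequency in the signal (Fmax). Half the sampling rate, Fs/2, is called the Nyquist frequency.
Fs > 2 Fmax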
Aliasing: Aliasing is a distortion that occurs when a signal is sampled at less than twice its highest frequency. Components above half the sampling rate fold back and appear as lower frequencies, so the signal reconstructed from the samples differs from the original continuous signal. Where each component lands depends on its frequency and on the sampling rate. For example, if the signal is sampled at 38 kHz, any frequency components above 19 kHz will alias.
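The 38 kHz example can be worked through in a few lines. This is a sketch under the usual assumption of ideal sampling of a pure tone; alias_frequency is a hypothetical helper name.

```python
import math

def alias_frequency(freq_hz, fs_hz):
    """Frequency at which a pure tone appears after sampling at fs_hz."""
    folded = freq_hz % fs_hz
    # Anything above half the sampling rate folds back below it.
    return fs_hz - folded if folded > fs_hz / 2 else folded

fs = 38_000
print(alias_frequency(10_000, fs))  # below 19 kHz: unchanged
print(alias_frequency(20_000, fs))  # above 19 kHz: folds back to 18 kHz

# The samples of a 20 kHz cosine and an 18 kHz cosine are indistinguishable at 38 kHz.
tone_20k = [math.cos(2 * math.pi * 20_000 * n / fs) for n in range(50)]
tone_18k = [math.cos(2 * math.pi * 18_000 * n / fs) for n in range(50)]
largest_gap = max(abs(a - b) for a, b in zip(tone_20k, tone_18k))
print(largest_gap < 1e-9)
```

Once the two tones produce the same samples, no later processing can tell them apart, which is why the problem has to be dealt with before sampling.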
Anti-aliasing filter: Aliasing can be avoided by using a low-pass filter, known as an anti-aliasing filter. The filter is applied to the input signal prior to sampling to limit its bandwidth. It removes components above the Nyquist frequency (half the sampling rate) so that the remaining signal can be reconstructed from the digital samples without aliasing distortion.
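In a real ADC the anti-aliasing filter is analog circuitry placed ahead of the sampler. The sketch below only illustrates the same low-pass idea in the digital domain, as one might do before lowering the sample rate of audio that is already digital. The windowed-sinc design, the 101-tap length, the 8 kHz cutoff and all function names are illustrative assumptions.

```python
import math

def lowpass_taps(cutoff_hz, fs_hz, num_taps=101):
    """Windowed-sinc low-pass FIR taps (Hamming window), normalized to unity gain at 0 Hz."""
    fc = cutoff_hz / fs_hz            # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def fir_filter(signal, taps):
    """Convolve signal with taps, keeping only fully overlapped output samples."""
    k = len(taps)
    return [sum(taps[j] * signal[i + k - 1 - j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def tone(freq_hz, fs_hz, num_samples):
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(num_samples)]

fs = 48_000
taps = lowpass_taps(8_000, fs)
kept = rms(fir_filter(tone(1_000, fs, 2_000), taps))      # well below the cutoff
removed = rms(fir_filter(tone(15_000, fs, 2_000), taps))  # well above the cutoff
print(f"1 kHz tone RMS after filtering:  {kept:.3f}")
print(f"15 kHz tone RMS after filtering: {removed:.5f}")
```

The low tone passes almost unchanged while the high tone is strongly attenuated, so it can no longer fold back into the audible band if the signal is subsequently sampled at a lower rate.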
Bit Depth: Simply put, bit depth is the number of bits available per sample. Computers can only understand and store information as binary digits (i.e. 1 or 0), which are called bits. The more bits per sample, the more distinct amplitude values can be stored. Therefore, the higher the bit depth, the more accurately each sample is captured.
Dynamic Range: Bit depth also determines the dynamic range of the signal, that is, the ratio between the loudest and quietest levels that can be represented. When a sampled signal is quantized, each sample is rounded to the nearest of a fixed set of values, and the number of values in that set is determined by the bit depth. Dynamic range is expressed in decibels (dB). In digital audio, 24-bit audio has a maximum dynamic range of about 144 dB, while 16-bit audio has a maximum dynamic range of about 96 dB.
16-bit audio with a sample rate of 44.1 kHz is widely used for consumer audio, while 24-bit audio with a sample rate of 48 kHz is common in professional audio for recording, mixing, storage, and editing.
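The 96 dB and 144 dB figures follow from the number of levels a given bit depth provides. A rough sketch, assuming the usual idealized formula 20 * log10(2^bits), which works out to roughly 6 dB per bit (the function name is hypothetical):

```python
import math

def dynamic_range_db(bits):
    """Idealized dynamic range of linear PCM: 20 * log10(number of levels)."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
```

The exact values come out slightly above the rounded figures (about 96.3 dB and 144.5 dB), which is why the numbers are usually quoted as approximate.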
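Sample rate and bit depth together determine how much data uncompressed audio produces. The sketch below assumes plain, uncompressed PCM and two channels (stereo); both are assumptions for the sake of the example, and the function names are hypothetical.

```python
def pcm_bit_rate(sample_rate_hz, bit_depth, channels):
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels

def pcm_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Storage needed for the given duration of uncompressed PCM audio."""
    return pcm_bit_rate(sample_rate_hz, bit_depth, channels) * seconds // 8

consumer = pcm_bit_rate(44_100, 16, 2)       # 16-bit / 44.1 kHz, stereo assumed
professional = pcm_bit_rate(48_000, 24, 2)   # 24-bit / 48 kHz, stereo assumed
one_minute = pcm_size_bytes(44_100, 16, 2, 60)

print(f"Consumer format:     {consumer:,} bits per second")
print(f"Professional format: {professional:,} bits per second")
print(f"One minute at 16-bit / 44.1 kHz: {one_minute:,} bytes")
```

This is the trade-off behind choosing a format: every increase in sample rate or bit depth raises the storage and bandwidth needed in direct proportion.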
Quantization of audio signals
Quantization is the process of mapping an analog audio signal, which can take infinitely many amplitude values, to a digital audio signal with a finite, countable set of values during analog-to-digital conversion (ADC). Bit depth plays an important role in determining the accuracy and quality of the quantized values. If the audio signal uses 16 bits, the number of amplitude values that can be represented is 2^16 = 65,536.
This means the amplitude range of the signal is divided into 65,536 levels, and every sample is assigned the nearest of these discrete values. During this process there is a small loss of audio quality, but it is usually imperceptible to the human ear. This loss is due to the difference between the input value and the quantized value and is described as quantization error.
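Quantization error is easy to observe directly. The following sketch assumes samples normalized to the range -1.0 to 1.0 and a simple round-to-nearest quantizer with signed integer codes; real converters differ in detail, and the helper names are hypothetical.

```python
import math

def quantize(sample, bits):
    """Round a sample in [-1.0, 1.0] to the nearest signed integer code."""
    scale = 2 ** (bits - 1)
    code = round(sample * scale)
    # Clamp to the representable range, e.g. -32768..32767 for 16 bits.
    return max(-scale, min(scale - 1, code))

def dequantize(code, bits):
    """Map an integer code back to the -1.0..1.0 range."""
    return code / 2 ** (bits - 1)

def max_quantization_error(samples, bits):
    """Largest difference between an input value and its quantized value."""
    return max(abs(s - dequantize(quantize(s, bits), bits)) for s in samples)

# One period of a 1 kHz tone sampled at 48 kHz, peaking at 0.9.
signal = [0.9 * math.sin(2 * math.pi * 1_000 * n / 48_000) for n in range(48)]

error_16 = max_quantization_error(signal, 16)
error_3 = max_quantization_error(signal, 3)
print(f"Levels at 16 bits: {2 ** 16}")
print(f"Largest error at 16 bits: {error_16:.8f}")
print(f"Largest error at 3 bits:  {error_3:.8f}")
```

With 16 bits the error stays within half of one quantization step, which is tiny compared with the signal. With only 3 bits the steps are coarse and the error becomes a clearly audible fraction of the signal.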
Figure 6: Relationship between sampling and quantization error at different frequencies
Mono, Stereo and Surround in Digital Audio
Monophonic (mono) sound is a system in which all sounds are combined and passed through a single channel. Only one channel is used when converting the signal to sound. Even if there are multiple speakers, every speaker plays the same signal, so the sound appears to come from a single source.
Stereophonic (stereo) sound, in contrast, uses two separate channels (left and right). Because each speaker receives its own signal, sounds can appear to come from different directions. This gives listeners an illusion of multi-dimensional audio and coverage across both the left and right channels.
Stereo has largely replaced mono in music and film because its additional channel gives a richer listening experience.
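The difference between the two can be seen by folding a stereo signal down to mono. Averaging the left and right channels, as in this sketch, is one common and simple choice rather than the only way to do it; the function name and sample values are made up for illustration.

```python
def downmix_to_mono(left, right):
    """Combine two channels into one by averaging each pair of samples."""
    if len(left) != len(right):
        raise ValueError("both channels must have the same number of samples")
    return [(l + r) / 2 for l, r in zip(left, right)]

left = [0.5, 0.25, -1.0]    # samples sent to the left speaker
right = [0.5, -0.25, 0.0]   # samples sent to the right speaker
mono = downmix_to_mono(left, right)
print(mono)
```

After the downmix, any difference between the left and right channels is gone, which is exactly the directional information that stereo adds.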
Surround sound enriches the fidelity of audio reproduction by using multiple channels. It lets the audience experience sound from three or more directions. In addition to left, right and center, sound can also come from the front and rear, giving the listener a sense of sound coming from all directions. It is widely used in 5.1 and 7.1 channel home theater systems, with formats from Dolby and DTS.
Digital audio converts analog signals into discrete (binary) form, which can be stored and manipulated on computer systems. Today, digital audio systems are everywhere: phones, music systems, computers, home audio systems, conference equipment, and other smart devices. Digital audio brings many advantages over traditional analog recording and playback. In addition to a variety of personalization features, it offers high-quality audio, reliability, efficient storage, wireless connectivity, portability and an immersive experience.