Introduction to Digital Audio
Your ears are analog devices that convert sound waves into mechanical pulses the brain can understand. Your computer is a binary device, which means that it can only understand messages described in ones and zeros. In order to convert an analog signal to a digital signal, a converter executes several operations. The main objective of the converter is to sample a piece of the incoming analog signal (kind of like nibbling on a slice of cake), and then convert each sample into a 16-bit binary description.
The standard format for digital audio on music CDs is 16-bit, 44.1 kHz. The sampling rate follows from a principle worked out early on by a fellow named Nyquist. Mr. Nyquist determined that the sample rate needs to be at least twice the highest frequency you want to capture. As most people can hear up to 20 kHz, it was decided that the sample rate should be 44.1 kHz, which gives you a frequency response up to 22.05 kHz, a little beyond what most human beings can hear. The entire range of usual human hearing is 20 Hz to 20 kHz, with 20 Hz being the lowest frequency people can usually hear (ex: rap records try to utilize these low frequencies) and 20 kHz being the highest (think of a dentist's drill).
Electrically, an analog audio signal looks like wavy lines on an oscilloscope (a device electronic technicians use to test audio equipment). When you use a hard disc audio recorder on your computer, the program will represent audio waves in the same manner, as a waveform drawn above and below a centerline.
The converter looks at the amplitude (the distance above or below the centerline of an audio signal’s waveform) of the incoming signal 44,100 times per second! The amplitude is then described using 16 digits (always a combination of zeros and ones: binary code). This 16-digit number is called a word. A stream of words is then recorded onto your hard drive, and is converted back into analog audio when the program you are using to edit your audio plays it through your sound hardware. In other words, every audio file of this kind is just a series of ones and zeros grouped into 16-bit words. These audio files are referred to as uncompressed digital audio. The most popular file formats for uncompressed digital audio are: WAVs (Waveform Audio File Format, a Windows native file format); AIFFs (Audio Interchange File Format, the Macintosh counterpart of a WAV); and SDIIs (Sound Designer II, a proprietary file format used by Digidesign for their suite of programs, including Pro Tools, Sound Designer and AVID).
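Here is a rough sketch, in Python, of what "describing an amplitude as a 16-bit word" can look like. The function names, the 440 Hz test tone, and the choice to scale amplitudes from the range -1.0 to 1.0 onto whole numbers up to 32767 are all assumptions made for illustration; a real converter does this in hardware.

```python
import math

SAMPLE_RATE = 44_100  # samples per second, as on a CD


def to_16bit_word(amplitude: float) -> str:
    """Describe an amplitude between -1.0 and 1.0 as a 16-digit binary word."""
    clipped = max(-1.0, min(1.0, amplitude))
    value = round(clipped * 32767)  # signed whole number from -32767 to 32767
    # Mask to 16 bits so negative values appear in two's complement form.
    return format(value & 0xFFFF, "016b")


def sample_tone(freq_hz: float, count: int) -> list:
    """Measure a sine wave's amplitude at the first `count` sample instants."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(count)]


# The first four words of a 440 Hz tone (hypothetical test signal).
words = [to_16bit_word(a) for a in sample_tone(440.0, 4)]
for word in words:
    print(word)
```

Each printed line is one word; one second of mono audio at this rate would be 44,100 of them in a row.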
Recording audio onto your hard disc is easy with the right tools, but the file size is huge. One minute of stereo audio at 16-bit 44.1 kHz has a file size of about 10.5 megabytes. This may not seem like a lot of space if you have a 27 gig hard drive, but let’s look at it the way the internet sees it: a 28.8 kbps modem moves at most about 3.6 kilobytes per second, so each meg of information takes about four and a half minutes to download. Hence, a 3 minute music sample will take roughly two and a half hours to download, which is not a very efficient way of transmitting data! For this reason, several companies have developed various methods of reducing audio file sizes so that reasonable quality can be maintained while file sizes are reduced dramatically.
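The size and download-time arithmetic can be checked with a short Python sketch. The function names are made up for this example, and the calculation assumes an ideal 28,800 bits-per-second link with no protocol overhead, so a real modem download would be slower still.

```python
def pcm_bytes(seconds, sample_rate=44_100, bit_depth=16, channels=2):
    """Size in bytes of uncompressed audio of the given length."""
    return seconds * sample_rate * (bit_depth // 8) * channels


def download_seconds(num_bytes, link_bits_per_second):
    """Best-case transfer time: 8 bits per byte, no overhead assumed."""
    return num_bytes * 8 / link_bits_per_second


one_minute = pcm_bytes(60)       # 10,584,000 bytes, about 10.5 megs
three_minutes = pcm_bytes(180)   # 31,752,000 bytes
hours = download_seconds(three_minutes, 28_800) / 3600

print(f"1 minute of stereo: {one_minute / 1_000_000:.2f} megs")
print(f"3 minutes over a 28.8 kbps modem: {hours:.2f} hours")
```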
As we learned, several companies have developed methods of reducing a 16-bit 44.1 kHz audio file to a more manageable size for internet distribution. A method of reducing audio file sizes is known as a codec, which is short for compression/decompression. Remember that WAVs and AIFFs are uncompressed audio. Applying a codec to an uncompressed audio file will yield a compressed file that is smaller in size, yet (hopefully) maintains the sonic integrity of the original file. You might be familiar with WinZip or StuffIt. These programs compress computer data into smaller files that can be emailed or distributed in less time over the internet. The difference is that those programs restore the data exactly, while most audio codecs discard some of it for good.
Codecs determine what information is unnecessary and throw it away. As a result, the file size is smaller. We learned that stereo audio is about 10.5 megs per minute. A mono audio file will be half the size of a stereo audio file, since stereo is actually a combination of two mono files! A simple codec works in the same way: it does its magic by reducing bit resolution (16 to 8 to 4 bits) and reducing sample rates (44.1 k to 32 k to 22 k to 11 k).
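To see how each of those reductions shrinks a file, here is a small Python sketch. The particular combinations of settings in the list are picked for illustration only, and a "meg" is taken to be one million bytes.

```python
def megs_per_minute(sample_rate, bit_depth, channels):
    """Uncompressed size of one minute of audio, in millions of bytes."""
    bytes_per_minute = 60 * sample_rate * bit_depth * channels // 8
    return bytes_per_minute / 1_000_000


# (sample rate in Hz, bit depth, channels): hypothetical settings to compare
settings = [
    (44_100, 16, 2),  # CD-quality stereo
    (44_100, 16, 1),  # same, but mono
    (22_050, 16, 1),  # half the sample rate
    (22_050, 8, 1),   # half the bit resolution as well
    (11_025, 8, 1),   # half the sample rate again
]

for rate, bits, channels in settings:
    size = megs_per_minute(rate, bits, channels)
    print(f"{rate:>6} Hz, {bits:>2}-bit, {channels} ch: {size:.3f} megs per minute")
```

Every halving of a parameter halves the file, which is why these are the first things a size-conscious encoder reaches for.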
Bit resolution is an important component of the fidelity of an audio file. Each reduction in bit resolution results in a less accurate description of the amplitude of each sample. (An 8-bit word can describe 256 different amplitude levels; a 16-bit word can describe 65,536.) For example, if I asked you to measure a wall using only full sheets of 8.5" x 11" paper, you would be able to give me a number (say 10 sheets) that represents the height of the wall. When you get to that last sheet of paper, you might find that the wall is actually 9.5 sheets high, but the rule is to describe the height using whole sheets of paper, so you opt for saying 10 sheets. This is equivalent to 8-bit resolution. Now remeasure that wall with index cards. You will find that you can get much closer to describing the actual height of the wall because your measuring unit is smaller. This is equivalent to 16-bit resolution.
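The same measuring exercise can be sketched in Python: round each sample of a sine wave to the nearest step available at a given bit resolution, then see how far off the worst sample is. The function names and the 1,000-point test wave are assumptions for illustration, not how any particular converter works.

```python
import math


def quantize(amplitude, bits):
    """Round an amplitude in [-1, 1] to the nearest step at this bit resolution."""
    levels = 2 ** (bits - 1) - 1  # 127 steps per side for 8-bit, 32767 for 16-bit
    return round(amplitude * levels) / levels


def worst_error(bits, points=1000):
    """Largest rounding error over one cycle of a full-scale sine wave."""
    samples = [math.sin(2 * math.pi * i / points) for i in range(points)]
    return max(abs(s - quantize(s, bits)) for s in samples)


err8 = worst_error(8)
err16 = worst_error(16)
print(f"worst 8-bit error:  {err8:.6f}")
print(f"worst 16-bit error: {err16:.8f}")
```

The 16-bit error comes out far smaller than the 8-bit one, just as the index cards beat the full sheets of paper.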
Sample rate reduction affects the frequency response of your audio file. Remember Nyquist? The sample rate needs to be at least twice the highest frequency you plan to encode, and 44.1 kHz is the standard for CD-quality audio. This means that the upper limit on the high end is 22.05 kHz, which is beyond what most people can hear. 32 k will give you a high end limit of 16 k, which is just below what the average person can hear (of course, we lose high end response abilities as we age). This sort of reduction in high end is almost undetectable to the average listener. A 22 k sampling rate will limit the high end to about 11 k. Cymbals on a drum set live in the 10 k range, so you can see that we are still at an acceptable frequency response (maybe slightly dull), and this will be perceived as good quality by the majority of listeners. Also notice that the sampling frequency is now half of its original 44.1 k; therefore, the file size is also half as large. Each reduction of these parameters yields a smaller file size but at the cost of fidelity. The race in this field is to provide a small file size with excellent audio quality, which is no small task indeed.
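As a quick check on those limits, this Python sketch applies the "half the sample rate" rule to each rate and asks whether a 10 kHz cymbal would make it through. The function names and the strict less-than comparison are choices made for this example.

```python
def nyquist_limit(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def survives(frequency_hz, sample_rate_hz):
    """True if the frequency sits below the limit for this sample rate."""
    return frequency_hz < nyquist_limit(sample_rate_hz)


CYMBAL_HZ = 10_000  # rough cymbal range mentioned in the text

for rate in (44_100, 32_000, 22_050, 11_025):
    limit = nyquist_limit(rate)
    verdict = "kept" if survives(CYMBAL_HZ, rate) else "lost"
    print(f"{rate:>6} Hz sampling: high end up to {limit:>8.1f} Hz, cymbals {verdict}")
```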
There are two types of delivery modes for the internet: download and streaming. Any audio format can be downloaded: you can post or send a WAV or AIFF to anyone via email. Of course, the result of downloading a WAV or AIFF is massive connect times on the internet because the files are so big, so the person you send such a file to may not be too happy about it, but it can be done.
Some genius somewhere realized that they would be crowned the King/Queen of Internet Delivery if audio files could be reduced in size yet the audio quality was left intact. The most common form of downloadable audio delivery is mp3. This codec analyzes audio information and compresses it at a ratio of about 5:1, with almost no detectable loss of fidelity. This means, for instance, that a 3 minute music sample that was originally 31.5 megs could become 6.3 megs or smaller. This is accomplished through the use of variable sample rates, variable bit rates, and perceptual coding. To explain perceptual coding, let's look at a typical song: the music begins, the vocals come in, and possibly the music continues by itself at the end of the piece. When the voice comes in, the music drops down in level and is at times masked by the voice itself. Codecs analyze these waveforms and give the most bits to the voice (which is up front) and fewer bits to the music (in the background). There is no need to encode the music in full fidelity since it is covered by the voice most of the time.
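To put a 5:1 ratio in numbers, here is a minimal Python sketch. It takes the figure of about 10.5 megs per minute for CD-quality stereo as its starting point; the function names are invented for the example, and the sketch only does the arithmetic, it does not encode any audio.

```python
def compressed_megs(original_megs, ratio):
    """File size after compressing at the given ratio (5 means 5:1)."""
    return original_megs / ratio


def bitrate_kbps(sample_rate=44_100, bit_depth=16, channels=2, ratio=1.0):
    """Bits per second needed for the audio, in thousands, after compression."""
    return sample_rate * bit_depth * channels / ratio / 1000


original = 10.5 * 3  # a 3 minute sample at about 10.5 megs per minute
print(f"original: {original:.1f} megs")
print(f"at 5:1:   {compressed_megs(original, 5):.1f} megs")
print(f"uncompressed bit rate: {bitrate_kbps():.1f} kbps")
print(f"bit rate at 5:1:       {bitrate_kbps(ratio=5):.2f} kbps")
```

Even at 5:1 the stream needs far more than a 28.8 kbps modem can carry in real time, which is why downloads of this kind are fetched first and played afterward.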
Streaming media is the ability to see or hear content on demand from a web site. This is a hot field in the internet world! The main players in streaming technology are Apple's QuickTime, RealNetworks' RealMedia, and Microsoft's Windows Media Player. These three companies have led the march to provide high quality media streams at the lowest bit rate possible. At one time, each company's player would only play its own files, but these days most players are able to decode the other formats as well. Isn't direct competition grand?
This article was made possible by Spot Taxi. www.spottaxi.com uses the internet to traffic radio commercials using mp2 technology.