Psychoacoustics is the study of how humans perceive sound, and it is a huge part of how lossy compression works.
It is argued that our brains cannot accurately perceive every bit of data that passes our ears when listening to CD-quality audio.
The argument then states that if we don’t perceive it as an audible sound then we don’t need it.
In principle, we get the same listening experience but use a fraction of the disk space.
In practice, it’s still up for debate as different audio coding algorithms make different choices on what data should be discarded.
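To put "a fraction of the disk space" into numbers, here is a minimal Python sketch. It assumes that CD-quality audio means 16-bit stereo at 44.1 kHz, and it uses a 128 kbps MP3 and a four-minute track purely as illustrative values. The function names are made up for this example.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bit_depth, channels):
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


def size_mb(bit_rate_kbps, seconds):
    """Approximate audio payload size in megabytes (1 MB = 1,000,000 bytes)."""
    return bit_rate_kbps * 1000 * seconds / 8 / 1_000_000


# Assumption: CD quality = 44.1 kHz, 16-bit, 2 channels.
cd_kbps = pcm_bit_rate_kbps(44_100, 16, 2)
mp3_kbps = 128            # illustrative MP3 bit rate
track_seconds = 4 * 60    # illustrative four-minute track

ratio = cd_kbps / mp3_kbps

print(f"CD-quality PCM: {cd_kbps:.1f} kbps, {size_mb(cd_kbps, track_seconds):.2f} MB")
print(f"MP3:            {mp3_kbps:.1f} kbps, {size_mb(mp3_kbps, track_seconds):.2f} MB")
print(f"The PCM stream is about {ratio:.1f} times larger")
```

Under these assumptions the uncompressed stream runs at 1411.2 kbps, roughly eleven times the MP3 bit rate, which is where the disk-space saving comes from.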
Byproducts of Lossy Compression
Leaving out sounds that are unlikely to be heard anyway sounds good, but there is a downside.
Artifacts left behind by lossy compression create unwanted sounds or anomalies that are not in the original recording.
These come in many different forms, such as loss of bandwidth, pre-echoes and post-echoes, the double-track effect, dynamics and phase shift, and a weakened low end.
Loss of Bandwidth
Different encoders have different perceptual coding algorithms.
This means you can’t always go between platforms and get the same results from the same settings.
A bit rate of 128 kbps or less no longer cuts it, despite previously being the standard for platforms like iTunes.
At 128 kbps, MP3 encoders filter the higher frequencies very crudely, discarding frequency content above approximately 16 kHz.
The iTunes MP3 encoder goes as far as creating distortions in this frequency range, so to maintain full bandwidth through it you need a bit rate of 256 kbps or higher.
Pre and Post-Echoes
This is when sounds are heard before or after the expected sonic event.
It’s a common artifact in MP3 files and is caused by quantization noise being spread over the entire transform window of the codec.
Temporal masking occurs when a loud sonic event masks a quieter one.
This happens when they occur in close proximity and it doesn’t matter which comes first.
Even if the quiet one happens first it will be masked by the louder one if there is only a small interval of time between the two.
The masking threshold is the sound pressure level needed to make a sound audible to the human ear when in the presence of another sound known as a masker.
The threshold depends on the frequency, the type of masker and the type of sound being masked.
If the sound being masked exceeds the masking threshold, it becomes audible and we hear it as a pre- or post-echo.
This most often occurs with sounds from percussion instruments, but it is likely with any short transient burst of noise that is encoded to a format such as MP3.
It’s a problem that can occur commonly even at higher bitrates like 256 kbps. There is a psychoacoustic element that means one often hears the pre-echo but not the post-echo.
Forward temporal masking is much stronger than backward temporal masking which results in the post-echo being drowned out by the transient.
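The way quantization noise spreads across a transform window can be illustrated with a toy transform coder. This is only a sketch under loud assumptions: it uses a plain DCT on a single 64-sample block with one uniform quantizer step. A real MP3 encoder uses a different filter bank, different block sizes, and a psychoacoustic model. The block length, onset position, and step size are arbitrary values picked for the illustration.

```python
import math

N = 64        # block length (assumption, not an MP3 block size)
ONSET = 48    # sample index where the transient starts
STEP = 0.25   # coarse uniform quantizer step (assumption)


def dct(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1 / n)
        s += sum(
            math.sqrt(2 / n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


# Pure silence followed by a short, decaying click.
signal = [0.0] * ONSET + [(0.8 ** j) * (-1) ** j for j in range(N - ONSET)]

coeffs = dct(signal)
quantized = [STEP * round(c / STEP) for c in coeffs]  # the lossy step
decoded = idct(quantized)

# Energy in the part of the block that was silent before the click.
pre_energy_original = sum(v * v for v in signal[:ONSET])
pre_energy_decoded = sum(v * v for v in decoded[:ONSET])

print("energy before the click, original:", pre_energy_original)
print("energy before the click, decoded: ", pre_energy_decoded)
```

The original is silent before the click, but the decoded block is not. The rounding error made in the frequency domain is spread over all 64 samples when the block is transformed back, including the samples before the transient.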
Double Track Effect
Lower bit rates can sometimes cause audio content timing errors.
The effect is most noticeable on vocals, where it creates the illusion of the voice being double-tracked.
Dynamics and Phase Shift
The nature of perceptual audio coding is to remove frequency content that we are unlikely to hear.
As a result, our perception of the remaining frequency content can sometimes be altered.
You can end up with a dynamic range that is massively inconsistent.
Some sounds can seem attenuated which makes surrounding sounds seem boosted.
The relative phase or timing of frequency content can be changed which can affect stereo imaging or even the transparency and clarity of the material.
When frequency content is stretched over time, like with pre-echoes and post-echoes, it can play havoc with the listeners’ perception of the audio.
Weak Low End
One of the issues the MP3 format is most known for is making a banging bassline sound timid and weak.
Lower frequencies are far more difficult for DSP (Digital Signal Processing) algorithms to analyze.
This is because a low frequency has a long period (each cycle takes a long time), while the analysis windows are short.
This means that the analysis window often won’t capture an entire cycle of a low frequency.
In many cases, the encoder will get less than half a cycle of any frequency under 114 Hz.
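The arithmetic behind that figure can be sketched in a few lines of Python. The text above does not state a window length, so the 192-sample window here is an assumption. It was chosen because, at a 44.1 kHz sample rate, it reproduces a half-cycle limit of roughly 114 Hz. The function names are hypothetical.

```python
SAMPLE_RATE_HZ = 44_100   # CD-quality sample rate
WINDOW_SAMPLES = 192      # assumed short analysis window (not stated in the text)


def cycles_in_window(freq_hz, window_samples=WINDOW_SAMPLES,
                     sample_rate_hz=SAMPLE_RATE_HZ):
    """How many cycles of freq_hz fit inside one analysis window."""
    window_seconds = window_samples / sample_rate_hz
    return freq_hz * window_seconds


def half_cycle_limit_hz(window_samples=WINDOW_SAMPLES,
                        sample_rate_hz=SAMPLE_RATE_HZ):
    """Frequency below which the window holds less than half a cycle."""
    return sample_rate_hz / (2 * window_samples)


print(f"window length: {WINDOW_SAMPLES / SAMPLE_RATE_HZ * 1000:.2f} ms")
print(f"half-cycle limit: {half_cycle_limit_hz():.1f} Hz")
for freq in (40, 80, 114, 1000):
    print(f"{freq:>5} Hz -> {cycles_in_window(freq):.2f} cycles per window")
```

With these assumed values the window lasts about 4.35 ms, so anything below about 114.8 Hz contributes less than half a cycle to each window, while a 1 kHz tone contributes more than four full cycles.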
How WAV Files are Encoded
PCM (Pulse Code Modulation) is used for the lossless encoding of audio data.
It is the method used to digitally represent sampled analog signals.
In a PCM stream, the amplitude of the analog signal is sampled regularly at uniform intervals, and each sample is quantized to the nearest value within a range of digital steps.
LPCM (Linear Pulse Code Modulation) is a type of PCM where the quantization levels are linearly uniform.
The two basic properties that determine the encoded stream’s fidelity to the original recording are the sample rate and bit depth.
Sample rate refers to the number of samples that are taken per second.
Bit depth refers to the number of bits used to represent each sample, which determines the number of possible digital values (a bit depth of 16 gives 65,536 values).
PCM is the broader term, but it is often used to describe data that is encoded as LPCM.
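The sampling and quantizing steps described above can be sketched in Python. This is a simplified illustration rather than how any particular WAV writer works. It assumes samples are normalized to the range -1.0 to 1.0 and scales them symmetrically to signed integers. The function names and the 440 Hz test tone are made up for the example.

```python
import math


def quantize_lpcm(sample, bit_depth):
    """Map a sample in [-1.0, 1.0] to the nearest signed integer step.

    Simplification: symmetric scaling, so the most negative code is unused.
    """
    max_code = 2 ** (bit_depth - 1) - 1
    clipped = max(-1.0, min(1.0, sample))
    return round(clipped * max_code)


def sample_sine(freq_hz, sample_rate_hz, n_samples, bit_depth):
    """Sample a sine wave at uniform intervals and quantize each sample."""
    return [
        quantize_lpcm(math.sin(2 * math.pi * freq_hz * n / sample_rate_hz), bit_depth)
        for n in range(n_samples)
    ]


levels_16 = 2 ** 16   # number of possible values at a bit depth of 16
levels_24 = 2 ** 24   # number of possible values at a bit depth of 24

# Ten samples of a 440 Hz tone at 44.1 kHz, quantized to 16 bits.
samples = sample_sine(440, 44_100, 10, 16)

print("16-bit levels:", levels_16)
print("24-bit levels:", levels_24)
print("first samples:", samples)
```

A higher sample rate adds more samples per second, and a higher bit depth adds more steps to choose from for each sample. Both increase fidelity and file size.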
A Brief History of the MP3
Karlheinz Brandenburg, a professor at the Fraunhofer Institute, was one of the lead developers of the MP3.
He was also one of the first people to push for the use of psychoacoustics.
By the late 1980s, the MP3 was almost ready but still had issues dealing with the human voice.
The song Tom’s Diner by Suzanne Vega is a common choice amongst audiophiles for testing sound systems.
The a cappella version of Tom’s Diner would also be the first track chosen to test the MP3.
Initially, MP3 compression absolutely destroyed the track, leading to hundreds of revisions to get it right.
Ghost in the MP3 is a project by Ryan Maguire, who created a track from the sounds discarded from Tom’s Diner during compression.
The MP3 format still widely divides opinion but whether you think it saved the industry or ruined it, it certainly had a huge effect on it.
There were some seminal moments in the history of MP3: the release of the Winamp media player for Windows in 1997 was huge.
In the late 1990s, I think everyone with a PC created Winamp playlists full of MP3s.
The big change was that people could now have hundreds of songs on their computer without filling up their entire hard drive.
Furthermore, people could easily share these songs with others.
This development gave birth to a host of illegal file-sharing platforms.
In 1999 came Napster, the most infamous of the peer-to-peer sharing platforms, which would be caught up in endless legal battles with most of the record industry.
The next huge development was making this music portable.
The very first MP3 player was the MPman, released in 1998, and then Apple soon joined the market in 2001 with iTunes and the iPod.
As we all know, the iPod in its various forms took the world by storm and the MP3 along with it.
When you purchase music from iTunes it’s in the AAC format but you can still convert to MP3 to transfer to compatible devices.
MP3 players have of course been all but forgotten.
Most mobile phones now have enough storage for all the music you can handle. Because of this, MP3s are still a daily feature in many people’s lives.
Advantages of MP3
Small File Format
Because the files are so small, they can be easily distributed over the Internet, and huge libraries can be stored on computers or handheld devices.
Because of this, they are still widely used today.
Compresses Files with Little Perceivable Difference to the Overall Sound Quality
In most applications, the average listener won’t actually hear any loss in quality.
Easy to Convert a WAV or CD to MP3 with Free Software
iTunes, the LAME MP3 encoder, and many other free converters are available.
Disadvantages of MP3
MP3’s lossy compression will always mean you sacrifice quality for smaller file size.
The artifacts described earlier are nasty side effects of MP3 compression.
Not Suitable for Professional Work
The loss of quality and the artifacts mean MP3 really doesn’t work for professional work.
Advantages of WAV
Retains Full Quality
It is an accurate, lossless format, so the quality remains the same as the original recording. Our selection of royalty-free music, for example, is in the lossless WAV format at 16-bit/44.1 kHz or better.
Files are easy to edit and process with user-friendly software from freeware to professional applications.
Advancements in Home Recording
Many popular home studio audio interfaces can now record at sample rates of up to 192 kHz.
WAV is the perfect format to take advantage of this high quality and huge dynamic range.
Disadvantages of WAV
The large size makes WAV files very impractical for portable devices and streaming.
Quality vs. Size
Despite several advantages and disadvantages for each, the argument over MP3 or WAV will always come down to quality vs. size.
So, the first question is what do you need the file for?
If you are an artist hoping to release a single, most online music stores require the WAV format.
When your music is streamed, it’s going to be in some lossy format even if not uploaded as an MP3 file.
An MP3 converted to a WAV will still be missing all of the data discarded in the MP3 encoding.
So, don’t start from a point where it could only get worse.
Imagine paying a mastering engineer to add the final polish then handing over an MP3 to work from.
The point is an MP3 will never be better quality than a WAV under any circumstances.
When quality is the most important thing, always use WAV.
If you want to share a demo or idea quickly, use MP3.
There are far too many examples to list, but it’s common sense: decide what’s more important, quality or speed and size.
FLAC – Free Lossless Audio Codec
FLAC files are lossless so no quality is lost but they are compressed.
They generally reduce the size of the original audio file by about 50%.
AIFF – Audio Interchange File Format
Audio data in AIFF files is uncompressed PCM.
It’s another lossless format and basically the Mac OS version of WAV.
OGG – Ogg Vorbis
A less widely used and less widely supported lossy audio format.
OGG files are encoded using a variable bit rate system.
Choosing a quality setting gives the encoder an average number of bits to use.
When parts of the audio are more difficult to encode, extra bits will be used.