A comparison of Internet audio compression formats

Note: This section is on Non-Speech Audio Compression. Speech Compression is a separate field and will be covered separately.
Copyright (c) 1995-97, 2001, 2003 Serious Cybernetics
Commissioned by Radio Australia
Original research by Andrew Pam
Updated 1997 by Ben Hemming
"Further information" updated 2001 by Andrew Pam
Ogg information updated 2003 by Andrew Pam with thanks to Joel Forsberg

Quick comparison chart

8KHz mono audio formats
Audio format | 16-bit PCM | G.711 mu-law | 32Kbps MPEG-1 | IMA/DVI ADPCM | GSM 06.10 | InterWave VSC112 | TrueSpeech 8.5 | RealAudio v1.0 | ToolVox for the Web
File extension | .wav or .aiff | .au | .mpa or .mp2 | .wav | .gsm | .vmf | .wav | .ra | .vox
Data rate | 128Kbps | 64Kbps | 32Kbps | 32Kbps | 13.2Kbps | 11.2Kbps | 8.5Kbps | 8Kbps | 2.4Kbps
File size per minute | 960K | 480K | 240K | 240K | 96K | 82K | 62K | 59K | 18K
Compression factor | 1:1 | 2:1 | 4:1 | 4:1 | 10:1 | 11:1 | 15:1 | 16:1 | 53:1
Sound quality | 5 | 4 | 4 | 3 | 2 | 2 | 2 | 1 | 0-3
Relative compression speed | N/A | 10 | (not tested) | 1.2 | 0.75 | 3 | 0.5 | 0.2 | 0.25
Windows player | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Mac player | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes
Unix player | Yes | Yes | Yes | Some | Yes | No | No | only v2.0 | Promised
Supports higher sample rates | Yes | Yes, but rarely used | Yes | Yes | No | Yes | No | in v2.0 | in other products
Streaming playback file | None | None | None | None | .gsd | .vmd | .tsp | .ram | None
Credits | None | None | None | None | None | In .vmd file | None | In .ra file | None
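
The compression factors in the chart follow from the data rates: each is the 128Kbps rate of the 8KHz 16-bit mono source divided by the format's data rate, rounded to a whole number. A minimal sketch of that arithmetic follows; the function names are illustrative, and K is assumed here to mean 1000 bytes. The nominal file sizes it computes match the chart for the higher-rate formats, but the chart's figures for the lower-rate formats are a few K smaller (for example 96K rather than 99K for GSM 06.10), presumably because of rounding or a different definition of K.

```python
# Data rate of the uncompressed source: 8KHz x 16 bits x 1 channel.
PCM_RATE_KBPS = 128.0


def compression_factor(data_rate_kbps):
    """Ratio of the uncompressed data rate to the compressed data rate."""
    return PCM_RATE_KBPS / data_rate_kbps


def kbytes_per_minute(data_rate_kbps):
    """Nominal file size for one minute of audio, with K taken as 1000 bytes."""
    return data_rate_kbps * 60 / 8


if __name__ == "__main__":
    chart_rates = [
        ("16-bit PCM", 128),
        ("G.711 mu-law", 64),
        ("GSM 06.10", 13.2),
        ("TrueSpeech 8.5", 8.5),
        ("ToolVox for the Web", 2.4),
    ]
    for name, rate in chart_rates:
        print(f"{name}: {round(compression_factor(rate))}:1, "
              f"about {kbytes_per_minute(rate):.0f}K per minute")
```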

Audio compression formats

16-bit PCM

Original source material is uncompressed. 8-bit sampling is possible, but the dynamic range is much lower; 24-bit and 32-bit sampling also exist but are rarely supported. PCs use the Microsoft .WAV format; Macs and Unix systems use the .AIFF format.
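
To put a number on the difference in dynamic range, the usual rule of thumb is about 6 dB per bit of sample resolution. The sketch below applies the standard formula for the ratio between the largest and smallest representable amplitudes; the function name is illustrative. It gives roughly 48 dB for 8-bit and 96 dB for 16-bit samples.

```python
import math


def dynamic_range_db(bits):
    """Theoretical dynamic range of linear PCM with the given sample size."""
    return 20 * math.log10(2 ** bits)


if __name__ == "__main__":
    for bits in (8, 16, 24):
        print(f"{bits}-bit PCM: about {dynamic_range_db(bits):.1f} dB")
```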

mu-law

mu-law is the international standard telephony encoding format, also known as ITU (formerly CCITT) standard G.711. It packs each 16-bit sample into 8 bits by using a logarithmic table to encode with a 13-bit dynamic range and dropping the least significant 3 bits of precision. Encoding and decoding are very fast and support is universal. There is a slight variation called A-law used in European telephone systems.
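
The logarithmic packing can be sketched in a few lines. The version below follows the form commonly seen in software codecs: each output byte holds a sign bit, a 3-bit exponent (the segment) and a 4-bit mantissa, and the byte is stored inverted. The BIAS and CLIP constants are taken from widely circulated reference implementations and should be treated as assumptions here, as should the function names; this is an illustration, not a conformance-tested G.711 codec.

```python
BIAS = 0x84   # offset added before segment search (assumed, from common reference code)
CLIP = 32635  # largest magnitude that still fits after adding BIAS


def linear_to_ulaw(sample):
    """Compress one signed 16-bit sample to an 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment: the position of the highest set bit, from bit 14 down to bit 7.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    # Keep the four bits just below the leading bit; lower bits are dropped.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(byte):
    """Expand one 8-bit mu-law byte back to a signed linear sample."""
    byte = ~byte & 0xFF
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


if __name__ == "__main__":
    for s in (0, 100, 1000, -1000, 32767):
        code = linear_to_ulaw(s)
        print(s, hex(code), ulaw_to_linear(code))
```

Quiet samples are reproduced with small steps and loud samples with large ones, which is why the format sounds better than plain 8-bit PCM at the same data rate.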

MPEG

MPEG (from the Moving Picture Experts Group) is the international standard for multimedia. It incorporates both audio and video encoding at a range of data rates. MPEG audio and video are the standard formats used on Video CDs and DVDs. The lowest data rate supported for MPEG-1 mono audio is 32Kbps. Sample rates of 32KHz, 44.1KHz (audio CD) and 48KHz (Digital Audio Tape) are supported; I used 32KHz for the 8KHz source material. There are three types of MPEG audio encoding: layer I, layer II and layer III, in increasing order of sound quality and encoding time. Layer I is the "PASC" compression used in Digital Compact Cassettes and Layer II is the "MUSICAM" compression format. Layer III (aka "MP3") has recently become very popular on the Internet due to its combination of high quality and high compression ratio. MPEG-2 provides broadcast quality audio and video at higher data rates and MPEG-3 has been absorbed into MPEG-2. The new MPEG-4 standard will add support for lower sample rates (16KHz, 22KHz and 24KHz) and low data rate encoding (down to 8Kbps).

ADPCM

ADPCM (Adaptive Differential Pulse Code Modulation) comes in many varieties. There is the IMA (Interactive Multimedia Association) DVI standard, the ITU (formerly CCITT) standards G.726 and G.727, which supersede the earlier G.721 and G.723 standards, and proprietary versions from Microsoft, Creative Labs, Yamaha and Oki. There is also Sub-Band ADPCM (G.722), which is used for audio on ISDN phone lines.

GSM 06.10

GSM 06.10 is the international standard digital mobile telephony encoding format. It uses linear predictive coding to substantially compress the data by predicting the likely shape of the sound wave and recording the differences between the actual sound and the prediction. Compression and decompression are slow and the quality is not great, but the algorithm is freely available, resulting in widespread use in products such as CyberPhone, NetPhone and Speak Freely.
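
The predict-and-record-the-difference idea can be illustrated with a toy example. The sketch below is not the GSM 06.10 algorithm, which uses a considerably more elaborate adaptive predictor and then quantises the differences; it uses a fixed second-order predictor chosen purely for illustration, and the function names are hypothetical. It shows only that a smooth waveform leaves small differences, which need fewer bits to store, and that the decoder can rebuild the signal by running the same predictor.

```python
def prediction_residual(samples):
    """Predict each sample by extending the line through the previous two
    samples, and return the differences between actual and predicted values."""
    residual = []
    prev1 = prev2 = 0
    for s in samples:
        predicted = 2 * prev1 - prev2
        residual.append(s - predicted)
        prev2, prev1 = prev1, s
    return residual


def reconstruct(residual):
    """Invert prediction_residual by running the same predictor in the decoder."""
    samples = []
    prev1 = prev2 = 0
    for r in residual:
        s = r + 2 * prev1 - prev2
        samples.append(s)
        prev2, prev1 = prev1, s
    return samples


if __name__ == "__main__":
    ramp = [0, 100, 200, 300, 400]
    diffs = prediction_residual(ramp)
    print(diffs)               # mostly zeros for a steadily rising signal
    print(reconstruct(diffs))  # the original samples again
```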

InterWave

InterWave is a proprietary encoding format created by VocalTec. It is designed specifically for real-time audio on the Internet and features a friendly user interface and very rapid encoding times. A small program is provided which uses the Common Gateway Interface on a World Wide Web server to support repositioning during real-time playback. InterWave is also the basis for VocalTec's Internet Phone product. The following encoding formats are available to support a variety of sample rates:

Name | Data rate | Original sample rate
VSC77 | 7.7Kbps | 5KHz
VSC112 | 11.2Kbps | 8KHz
VSC154 | 15.4Kbps | 11KHz
VSC224 | 22.4Kbps | 16KHz

TrueSpeech

TrueSpeech is a proprietary encoding format created by DSP Group, Inc. It is designed for digital telephony use (such as WebPhone) and intended to be implemented in hardware using Digital Signal Processing chips. The decoder can play all TrueSpeech formats but a software encoder is currently only available for TrueSpeech 8.5. The following encoding formats are available to provide varying degrees of compression, using increasingly powerful chips:

Name | Data rate | Compression factor
TrueSpeech 8.5 | 8.5Kbps | 15:1
TrueSpeech 6.3 | 6.3Kbps | 20:1
TrueSpeech 5.3 | 5.3Kbps | 24:1
TrueSpeech 4.8 | 4.8Kbps | 27:1

RealAudio

RealAudio is a proprietary encoding format created by Progressive Networks. It was the first compression format to support live audio over the Internet and thus gained considerable support, but it requires proprietary server software in order to provide the real-time playback facility. It also supports repositioning during real-time playback. Version 2.0 offers two encoding algorithms: the original v1.0 algorithm at 8Kbps and a new algorithm at a higher data rate, which provides higher audio quality from source material with higher sample rates (11KHz, 22KHz and 44KHz). However, RealAudio 2.0 requires the latest hardware; a Pentium or a PowerPC Mac is required for best results, although it will work on a 68040/25 or faster Mac or a 486/66MHz or faster PC.

ToolVox for the Web

ToolVox for the Web is a proprietary encoding format created by VoxWare which achieves very high compression by using vocal modelling. This allows the sound to be played faster or slower without changing the pitch, but it is only designed to work with spoken voice material; music and sound effects will usually not compress properly. VoxWare's MetaVoice technology is the basis for their TeleVox, ToolVox for Multimedia and ToolVox RT products and will also be the codec used in Netscape's LiveMedia standard.

OggSquish

OggSquish is intended to compete with MPEG layer III, but is still in alpha test. It will provide a range of compression factors from 5:1 up to 18:1 plus a "lossless" compression mode. It is optimised for very high sound quality (source material at 30-48KHz sample rates).

(Later update:) The original OggSquish project evolved into the Xiph.org Foundation, whose codecs include the audio codec "Ogg Vorbis" and a lossless codec named "FLAC".

ASPEC

ASPEC is one of the higher quality sound compression algorithms. ASPEC can produce CD quality sound and supports several different bitrates from 128Kbps down, including 64Kbps. ASPEC uses the frequency limitations of human hearing as well as complex entropy coding for its lossy compression.

The best bits of the ASPEC and MUSICAM compression formats have been combined for the MPEG Layer III audio compression standard.

Mac problems

Mac hardware does not support sampling at 8KHz. Therefore, to compress audio on a Mac you will need to sample at a supported rate such as 11KHz and either use a format which supports the Mac rates (mu-law, MPEG or ADPCM) or resample the audio down to 8KHz before compression. Resampling is very CPU-intensive and further reduces the audio quality. However, you should be able to play back 8KHz samples created on other hardware.
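
As a rough illustration of what resampling down to 8KHz involves, the sketch below converts between sample rates by linear interpolation. This is an assumption made for simplicity rather than a description of any particular Mac tool: a proper converter would also low-pass filter the audio first to avoid aliasing, which is a large part of the CPU cost. The function name is hypothetical.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation between neighbouring input samples."""
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        # Position of this output sample on the input time axis.
        pos = i * src_rate / dst_rate
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


if __name__ == "__main__":
    one_second = [0] * 11025  # one second of silence sampled at 11.025KHz
    print(len(resample_linear(one_second, 11025, 8000)))
```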

Windows problems

When playing compressed audio files under Windows, there may be annoying pauses in the sound. There are two main causes for this: not being able to effectively use a fast (28.8Kbps) modem, or interruptions caused by accessing the hard drive during playback. Both are symptoms of interrupt handling problems in Windows which do not occur under other operating systems such as Windows NT, OS/2, or Unix. Also, high speed modems require a "buffered UART" such as the 16550 chip rather than the older 8250 and 16450 chips still used in many PC serial ports. If you are using a 28.8Kbps modem, check that you have this hardware and the appropriate driver software installed.
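
A back-of-the-envelope calculation suggests why the buffered UART matters. The sketch below rests on several assumptions not stated above: each character takes 10 bits on the serial line (8 data bits plus start and stop bits), the serial port runs at the modem's 28.8Kbps rate, an unbuffered UART interrupts the CPU for every received byte, and the 16550's FIFO is set to interrupt after 14 bytes. The function name is illustrative.

```python
def interrupts_per_second(line_rate_bps, bytes_per_interrupt, bits_per_char=10):
    """Approximate receive interrupt rate for a serial port running flat out."""
    chars_per_second = line_rate_bps / bits_per_char
    return chars_per_second / bytes_per_interrupt


if __name__ == "__main__":
    unbuffered = interrupts_per_second(28800, 1)   # 8250/16450: one byte per interrupt
    buffered = interrupts_per_second(28800, 14)    # 16550 with a 14-byte FIFO trigger
    print(f"unbuffered: {unbuffered:.0f} interrupts/s")
    print(f"buffered:   {buffered:.0f} interrupts/s")
```

With far fewer interrupts to service each second, the system is less likely to lose incoming data while it is busy with the hard drive or other work.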

Send comments to xanni@sericyb.com.au (Andrew Pam)