A comparison of Internet audio compression formats
Note: This section is on non-speech audio compression. Speech compression
is a separate field and will be covered separately.
Copyright (c) 1995-97, 2001, 2003 Serious Cybernetics
Commissioned by Radio Australia
Original research by Andrew Pam
Updated 1997 by Ben Hemming
"Further information" updated 2001 by Andrew Pam
Ogg information updated 2003 by Andrew Pam with thanks to Joel Forsberg
Quick comparison chart
8KHz mono audio formats
Audio format | 16-bit PCM | G.711 mu-law | 32Kbps MPEG-1 | IMA/DVI ADPCM | GSM 06.10 | InterWave VSC112 | TrueSpeech 8.5 | RealAudio v1.0 | ToolVox for the Web |
File extension | .wav or .aiff | .au | .mpa or .mp2 | .wav | .gsm | .vmf | .wav | .ra | .vox |
Data rate | 128Kbps | 64Kbps | 32Kbps | 32Kbps | 13.2Kbps | 11.2Kbps | 8.5Kbps | 8Kbps | 2.4Kbps |
File size per minute | 960K | 480K | 240K | 240K | 96K | 82K | 62K | 59K | 18K |
Compression factor | 1:1 | 2:1 | 4:1 | 4:1 | 10:1 | 11:1 | 15:1 | 16:1 | 53:1 |
Sound quality | 5 | 4 | 4 | 3 | 2 | 2 | 2 | 1 | 0-3 |
Relative compression speed | N/A | 10 | (not tested) | 1.2 | 0.75 | 3 | 0.5 | 0.2 | 0.25 |
Windows player | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Mac player | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes |
Unix player | Yes | Yes | Yes | Some | Yes | No | No | Only v2.0 | Promised |
Supports higher sample rates | Yes | Yes, but rarely used | Yes | Yes | No | Yes | No | In v2.0 | In other products |
Streaming playback file | None | None | None | None | .gsd | .vmd | .tsp | .ram | None |
Credits | None | None | None | None | None | In .vmd file | None | In .ra file | None |
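The "File size per minute" and "Compression factor" rows follow directly from the data rate: one minute takes rate x 60 / 8 bytes, and the factor is 128Kbps (16-bit samples at 8KHz) divided by the format's rate. A minimal Python sketch of that arithmetic, using rates from the chart; small differences from the chart's K figures may come from rounding or from counting 1000 versus 1024 bytes per K:

```python
# Sketch: derive "File size per minute" and "Compression factor"
# from a format's data rate alone.
PCM_RATE_BPS = 16 * 8000  # 16-bit samples x 8000 samples/s = 128Kbps


def bytes_per_minute(rate_bps: int) -> int:
    """Bytes needed to store one minute of audio at rate_bps bits/second."""
    return rate_bps * 60 // 8


def compression_factor(rate_bps: int) -> float:
    """How many times smaller than 8KHz 16-bit mono PCM the format is."""
    return PCM_RATE_BPS / rate_bps


# Data rates (bits/second) taken from the comparison chart.
RATES = {
    "16-bit PCM": 128_000,
    "G.711 mu-law": 64_000,
    "GSM 06.10": 13_200,
    "InterWave VSC112": 11_200,
    "TrueSpeech 8.5": 8_500,
    "RealAudio v1.0": 8_000,
    "ToolVox for the Web": 2_400,
}

if __name__ == "__main__":
    for name, rate in RATES.items():
        print(f"{name:20} {bytes_per_minute(rate):>8} bytes/min  "
              f"{compression_factor(rate):5.1f}:1")
```

The same division reproduces the TrueSpeech table further down (for example 128 / 4.8 is about 27:1).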
Audio compression formats
16-bit PCM
The original source material is uncompressed 16-bit PCM (Pulse Code
Modulation). 8-bit sampling is possible, but the dynamic range is much
lower; 24-bit and 32-bit samples also exist but are rarely supported.
PCs use the Microsoft .WAV format; Macs and Unix use the .AIFF format.
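As a sketch of what the uncompressed source looks like, Python's standard `wave` module can write 8KHz mono 16-bit PCM. The 440Hz test tone and one-second length are arbitrary choices for illustration. Each second takes 8000 x 2 = 16,000 bytes of audio data, which is the 960K per minute in the chart.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000  # 8KHz, as in the comparison chart
DURATION_S = 1      # one second of audio (arbitrary)
TONE_HZ = 440       # hypothetical test tone


def make_pcm_wav() -> bytes:
    """Return a mono 16-bit PCM .wav file containing a sine tone."""
    n = SAMPLE_RATE * DURATION_S
    samples = [int(10000 * math.sin(2 * math.pi * TONE_HZ * i / SAMPLE_RATE))
               for i in range(n)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(2)            # 2 bytes = 16 bits per sample
        w.setframerate(SAMPLE_RATE)
        # .wav stores 16-bit samples as little-endian signed integers
        w.writeframes(struct.pack(f"<{n}h", *samples))
    return buf.getvalue()
```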
mu-law is the international standard telephony encoding format, also
known as ITU (formerly CCITT) standard G.711. It packs each 16-bit
sample into 8 bits by using a logarithmic table to encode with a 13-bit
dynamic range and dropping the least significant 3 bits of precision.
Encoding and decoding are very fast and support is universal. There is
a slight variation called A-law used in European telephone systems.
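A minimal sketch of mu-law companding in Python, assuming the widely used bias-and-segment formulation (bias 132, clip level 32635); it is an illustration rather than a checked G.711 implementation. Each 16-bit sample becomes one byte, which is the 2:1 factor in the chart, and the quantization step grows with the sample's magnitude, so quiet sounds keep more relative precision than loud ones.

```python
# Sketch of mu-law companding for signed 16-bit samples, using the common
# bias-and-segment formulation (assumed; not a conformance-tested G.711).
BIAS = 0x84   # 132: guarantees a leading 1 bit at or above bit 7
CLIP = 32635  # largest magnitude that still fits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Compress one signed 16-bit sample to an 8-bit mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Exponent (segment 0..7) = position of the highest set bit above bit 7.
    exponent = (magnitude >> 7).bit_length() - 1
    # Mantissa = the 4 bits just below that leading 1 bit.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # The code is sent as the bitwise complement of sign|exponent|mantissa.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(code: int) -> int:
    """Expand an 8-bit mu-law code back to a signed 16-bit sample."""
    code = ~code & 0xFF
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    # Rebuild the middle of the quantization interval, then remove BIAS.
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude
```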
MPEG (from the Moving Picture Experts Group) is the international standard
for multimedia. It incorporates both audio and video encoding at a range
of data rates. MPEG audio and video are the standard formats used on
Video CDs and DVDs. The lowest data rate supported for MPEG-1 mono audio
is 32Kbps. Sample rates of 32KHz, 44.1KHz (audio CD) and 48KHz (Digital
Audio Tape) are supported; I used 32KHz for the 8KHz source material.
There are three types of MPEG audio encoding, Layer I, Layer II and
Layer III, in increasing order of sound quality and encoding time.
Layer I is the "PASC" compression used in Digital Compact Cassettes
and Layer II is the "MUSICAM" compression format. Layer III (aka
"MP3") has recently become very popular on the Internet due to its
combination of high quality and high compression ratio. MPEG-2 provides
broadcast quality audio and video at higher data rates, and MPEG-3 has
been absorbed into MPEG-2. MPEG-2 audio also adds support for lower
sample rates (16KHz, 22.05KHz and 24KHz) and low data rate encoding
(down to 8Kbps).
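To see where the 240K per minute for 32Kbps MPEG-1 comes from: each MPEG-1 Layer II or Layer III frame carries 1152 samples, so a frame is 144 x bitrate / sample rate bytes (rounded down, plus one padding byte when the header's padding bit is set). A small sketch of that arithmetic for the 32Kbps, 32KHz setting used here:

```python
SAMPLES_PER_FRAME = 1152  # samples per channel in an MPEG-1 Layer II/III frame


def frame_bytes(bitrate_bps: int, sample_rate_hz: int, padding: bool = False) -> int:
    """Length in bytes of one MPEG-1 Layer II/III frame, header included."""
    # 1152 samples / 8 bits per byte = 144
    return 144 * bitrate_bps // sample_rate_hz + (1 if padding else 0)


def frames_per_minute(sample_rate_hz: int) -> float:
    """Number of frames needed to cover one minute of audio."""
    return sample_rate_hz * 60 / SAMPLES_PER_FRAME


if __name__ == "__main__":
    size = frame_bytes(32_000, 32_000)
    count = frames_per_minute(32_000)
    print(f"{size} bytes/frame, {count:.1f} frames/min, {size * count:.0f} bytes/min")
```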
ADPCM (Adaptive Differential Pulse Code Modulation) comes in many
varieties. There is the IMA (Interactive Multimedia Association) DVI
standard, the ITU (formerly CCITT) standards G.726 and G.727 which
supersede the earlier G.721 and G.723 standards, and proprietary
versions from Microsoft, Creative Labs, Yamaha and Oki. There is also
Sub-Band ADPCM (G.722), which is used for audio on ISDN phone lines.
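IMA/DVI ADPCM shows the basic idea shared by these varieties: each 16-bit sample is replaced by a 4-bit code (hence 4:1) describing the difference from a running prediction, scaled by a step size that adapts to the signal. The sketch below follows the commonly published IMA step and index tables but omits the block headers, nibble packing and stereo handling of real .wav files; `sine_tone` is a hypothetical test signal.

```python
import math

# Commonly published IMA ADPCM step sizes (89 entries, index 0..88).
STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]
# Step-index adjustment, looked up by the code's magnitude bits (0..7).
INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8]


def _clamp(x, lo, hi):
    return max(lo, min(hi, x))


def _apply(code, predictor, index):
    """Advance predictor state by one 4-bit code (shared by encoder and decoder)."""
    step = STEP_TABLE[index]
    delta = step >> 3
    if code & 4:
        delta += step
    if code & 2:
        delta += step >> 1
    if code & 1:
        delta += step >> 2
    predictor += -delta if code & 8 else delta  # bit 3 is the sign
    predictor = _clamp(predictor, -32768, 32767)
    index = _clamp(index + INDEX_TABLE[code & 7], 0, 88)
    return predictor, index


def adpcm_encode(samples):
    """Encode 16-bit samples as 4-bit codes (one code per list element)."""
    predictor, index, codes = 0, 0, []
    for s in samples:
        step = STEP_TABLE[index]
        diff = s - predictor
        code = 8 if diff < 0 else 0
        diff = abs(diff)
        # Successive approximation of |diff| against step, step/2, step/4.
        for bit, threshold in ((4, step), (2, step >> 1), (1, step >> 2)):
            if diff >= threshold:
                code |= bit
                diff -= threshold
        # Track exactly what the decoder will reconstruct.
        predictor, index = _apply(code, predictor, index)
        codes.append(code)
    return codes


def adpcm_decode(codes):
    """Rebuild 16-bit samples from 4-bit codes."""
    predictor, index, out = 0, 0, []
    for code in codes:
        predictor, index = _apply(code, predictor, index)
        out.append(predictor)
    return out


def sine_tone(n, freq=440, rate=8000, amplitude=10000):
    """Hypothetical test signal: n samples of a 16-bit sine tone."""
    return [int(amplitude * math.sin(2 * math.pi * freq * i / rate)) for i in range(n)]
```

Because the encoder updates its state through the same routine as the decoder, both sides stay in step without any side information beyond the codes themselves.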
GSM 06.10 is the international standard digital mobile telephony encoding
format. It uses linear predictive coding to substantially compress the
data by predicting the likely shape of the sound wave and recording the
differences between the actual sound and the prediction. Compression and
decompression are slow and the quality is not great, but the algorithm is
freely available, resulting in widespread use in products such as
CyberPhone, NetPhone and Speak Freely.
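The prediction idea can be illustrated with generic linear prediction: fit coefficients from the signal's autocorrelation (here with the Levinson-Durbin recursion), predict each sample from the previous ones, and keep only the residual. This is a sketch of plain LPC analysis, not GSM 06.10's actual RPE-LTP algorithm, which adds long-term (pitch) prediction and quantizes the residual; the predictor order of 8 and the noisy sine test signal are assumptions for illustration.

```python
import math
import random


def autocorrelation(x, lag):
    """Sum of x[n] * x[n - lag] over the whole block."""
    return sum(x[n] * x[n - lag] for n in range(lag, len(x)))


def levinson_durbin(r, order):
    """Return the prediction-error filter a[0..order] (a[0] = 1) and the final
    prediction error energy, from autocorrelations r[0..order]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def residual(x, a):
    """Prediction error e[n] = x[n] + sum(a[j] * x[n - j]) for j >= 1, i.e. the
    actual sample minus its prediction -sum(a[j] * x[n - j])."""
    p = len(a) - 1
    return [sum(a[j] * x[n - j] for j in range(p + 1)) for n in range(p, len(x))]


def demo_signal(n=800, seed=1):
    """Hypothetical test input: a 440Hz sine at 8KHz plus a little noise."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * 440 * i / 8000) + rng.uniform(-0.01, 0.01)
            for i in range(n)]
```

When the prediction is good, the residual has far less energy than the signal, so it can be coded with fewer bits; the coefficients are sent alongside it for each block.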
InterWave is a proprietary encoding format created by
VocalTec. It is designed
specifically for real-time audio on the Internet and features a friendly
user interface and very rapid encoding times. A small program is
provided which uses the Common Gateway Interface on a World Wide Web
server to support repositioning during real-time playback. InterWave is
also the basis for VocalTec's
Internet Phone product.
The following encoding formats are available to support a variety of
sample rates:
Name | Data rate | Original sample rate |
VSC77 | 7.7Kbps | 5KHz |
VSC112 | 11.2Kbps | 8KHz |
VSC154 | 15.4Kbps | 11KHz |
VSC224 | 22.4Kbps | 16KHz |
TrueSpeech is a proprietary encoding format created by
DSP Group, Inc. It is designed for
digital telephony use (such as WebPhone) and intended to be
implemented in hardware using Digital Signal Processing chips. The
decoder can play all TrueSpeech formats but a software encoder is
currently only available for TrueSpeech 8.5. The following encoding
formats are available to provide varying degrees of compression, using
increasingly powerful chips:
Name | Data rate | Compression factor |
TrueSpeech 8.5 | 8.5Kbps | 15:1 |
TrueSpeech 6.3 | 6.3Kbps | 20:1 |
TrueSpeech 5.3 | 5.3Kbps | 24:1 |
TrueSpeech 4.8 | 4.8Kbps | 27:1 |
RealAudio is a proprietary encoding format created by
Progressive Networks.
It was the first compression format to support live audio over the
Internet and thus gained considerable support, but it requires
proprietary server software in order to provide the real-time playback
facility. It also supports repositioning during real-time playback.
Version 2.0 offers two encoding algorithms: the original v1.0 8Kbps data
rate and a new, higher data rate which provides better audio quality from
source material with higher sample rates (11KHz, 22KHz and 44KHz).
However, RealAudio 2.0 is demanding on hardware: a Pentium PC or a
PowerPC Mac is recommended for best results, although it will work on a
25MHz 68040 or faster Mac or a 66MHz 486 or faster PC.
ToolVox for the Web uses a proprietary encoding format created by
VoxWare which achieves very high compression by using vocal modelling.
This allows the sound to be played faster or slower without changing the
pitch, but the format is only designed to work with spoken voice
material; music and sound effects will usually not compress properly.
VoxWare's MetaVoice technology is the basis for their TeleVox, ToolVox
for Multimedia and ToolVox RT products and will also be the codec used
in Netscape's LiveMedia standard.
OggSquish is intended to compete with MPEG Layer III, but is still in
alpha test. It will provide a range of compression factors from 5:1 up
to 18:1 plus a "lossless" compression mode. It is optimised for very high
sound quality (source material at 30-48KHz sample rates).
(Later update:) The original OggSquish evolved into the
Xiph.org Foundation, whose projects include the audio codec "Ogg Vorbis"
and a lossless codec named "FLAC".
ASPEC is one of the higher quality sound compression algorithms. ASPEC
can produce CD quality sound and supports several different bitrates
at or below 128Kbps, including 64Kbps. ASPEC uses the frequency
limitations of human hearing as well as complex entropy coding for
its lossy compression.
The best bits of the ASPEC and MUSICAM compression formats have been
combined for the MPEG Layer III audio compression standard.
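One way to picture "the frequency limitations of human hearing" is the threshold of hearing in quiet: sounds below it need not be coded at all. The sketch below uses Terhardt's well-known approximation of that threshold in dB SPL. It is only an illustration; real perceptual coders such as ASPEC and MPEG Layer III also model the masking of quiet sounds by louder nearby ones, which is not shown here.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL)."""
    f = freq_hz / 1000.0  # the formula is expressed in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def audible(freq_hz: float, level_db_spl: float) -> bool:
    """True if a tone at this level lies above the threshold in quiet."""
    return level_db_spl > threshold_in_quiet_db(freq_hz)


if __name__ == "__main__":
    for f in (50, 100, 1000, 3300, 10000, 16000):
        print(f"{f:>6} Hz: {threshold_in_quiet_db(f):6.1f} dB SPL")
```

The curve is lowest around 3-4KHz and rises steeply at both ends, so a coder can spend few or no bits on very low and very high frequencies at modest levels.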
Mac problems
Mac hardware does not support sampling at 8KHz. Therefore, to compress
audio on a Mac you will need to sample at a supported rate such as 11KHz
and either use a format which supports the Mac rates (mu-law, MPEG or
ADPCM) or resample the audio down to 8KHz before compression. Resampling
is very CPU-intensive and further reduces the audio quality. However, you
should be able to play back 8KHz samples created on other hardware.
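A minimal sketch of the resampling step, assuming an 11,025Hz Mac recording and plain linear interpolation. It also hints at why quality suffers: without a low-pass filter before decimation, any content above 4KHz in the 11KHz recording folds back (aliases) into the 8KHz result, and a proper resampler that filters first costs considerably more CPU time.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampler (no anti-aliasing filter).

    Returns float samples; round and clip them before storing as 16-bit.
    """
    if not samples:
        return []
    # Number of output samples that fall within the input's time span.
    n_out = (len(samples) - 1) * dst_rate // src_rate + 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # position measured in source samples
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] + (nxt - samples[j]) * frac)
    return out
```

For example, `resample_linear(recording, 11025, 8000)` turns one second of 11,025Hz audio into 8000 samples.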
Windows problems
When playing compressed audio files under Windows, there may be annoying
pauses in the sound. There are two main causes for this: being unable to
make effective use of a fast (28.8Kbps) modem, or interruptions caused by
accessing the hard drive during playback. Both are symptoms of
interrupt handling problems in Windows which do not occur under other
operating systems such as Windows NT, OS/2 or Unix. Also, high speed
modems require a "buffered UART" such as the 16550 chip rather than the
older 8250 and 16450 chips still used in many PC serial ports. If you
are using a 28.8Kbps modem, check that you have this hardware and the
appropriate driver software installed.
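Some rough arithmetic shows why the buffered UART matters. Assuming the serial port runs at 115,200 bps (a common setting for a 28.8Kbps modem with data compression) and 8N1 framing of 10 bits per character, the port delivers 11,520 characters per second. With the one-byte buffer of the 8250 or 16450 the CPU must take an interrupt for every character, while the 16550's 16-byte FIFO (with the receive trigger level set to 14, an assumed setting) lets it handle a batch of characters per interrupt:

```python
BITS_PER_CHAR = 10  # assumed 8N1 framing: start bit + 8 data bits + stop bit


def chars_per_second(port_bps: int) -> float:
    """Characters per second arriving at the serial port."""
    return port_bps / BITS_PER_CHAR


def interrupts_per_second(port_bps: int, chars_per_interrupt: int) -> float:
    """Receive interrupts the CPU must service per second."""
    return chars_per_second(port_bps) / chars_per_interrupt


if __name__ == "__main__":
    port = 115_200  # assumed serial port speed
    for chip, per_irq in (("8250/16450 (1-byte buffer)", 1),
                          ("16550 (FIFO trigger at 14)", 14)):
        rate = interrupts_per_second(port, per_irq)
        print(f"{chip}: {rate:.0f} interrupts/s, "
              f"{1e6 / rate:.0f} us between interrupts")
```

Without the FIFO, any interrupt delay longer than one character time (under 90 microseconds at this speed) loses data, which is exactly what slow interrupt handling during disk access causes.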
Further information
Send comments to xanni@sericyb.com.au
(Andrew Pam)