What is cdparanoia?
Cdparanoia is a Compact Disc Digital Audio (CDDA) Digital Audio
Extraction (DAE) tool, commonly known on the net as a 'ripper'. The
application is built on top of the Paranoia library, which does
the real work (the Paranoia source is included in the cdparanoia
source distribution). Cdparanoia reads audio from the CDROM directly
as data, with no analog step between, and writes the data to a file or
pipe in WAV, AIFC or raw 16-bit linear PCM.
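For reference, here's a minimal sketch of the canonical 44-byte WAV header for CD audio (44.1 kHz, two channels, 16-bit little-endian samples). It illustrates the file format only and is not cdparanoia's own output code; `cd_wav_header` and the `put_le*` helpers are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* CD-DA parameters: 44.1 kHz, 2 channels, 16-bit signed samples. */
#define CD_RATE     44100u
#define CD_CHANNELS 2u
#define CD_BITS     16u

/* Hypothetical helpers: store integers little-endian, as RIFF requires. */
static void put_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

static void put_le16(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
}

/* Hypothetical: fill a 44-byte RIFF/WAVE header for `data_bytes` of CD audio. */
void cd_wav_header(uint8_t hdr[44], uint32_t data_bytes) {
    uint32_t block_align = CD_CHANNELS * (CD_BITS / 8); /* 4 bytes per stereo frame */
    uint32_t byte_rate = CD_RATE * block_align;         /* 176400 bytes per second */

    memcpy(hdr, "RIFF", 4);
    put_le32(hdr + 4, 36 + data_bytes);   /* size of everything after this field */
    memcpy(hdr + 8, "WAVE", 4);
    memcpy(hdr + 12, "fmt ", 4);
    put_le32(hdr + 16, 16);               /* fmt chunk size for plain PCM */
    put_le16(hdr + 20, 1);                /* format tag 1 = linear PCM */
    put_le16(hdr + 22, (uint16_t)CD_CHANNELS);
    put_le32(hdr + 24, CD_RATE);
    put_le32(hdr + 28, byte_rate);
    put_le16(hdr + 32, (uint16_t)block_align);
    put_le16(hdr + 34, (uint16_t)CD_BITS);
    memcpy(hdr + 36, "data", 4);
    put_le32(hdr + 40, data_bytes);       /* the raw samples follow the header */
}
```

AIFC, by contrast, stores its samples big-endian, which is one reason a ripper offering several output formats needs byte-swapping logic somewhere.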
Cdparanoia is a bit different from most other CDDA extraction tools.
It contains few-to-no 'extra' features, concentrating only on the
ripping process and knowing as much as possible about the hardware
performing it. Cdparanoia will read correct, rock-solid audio data
from inexpensive drives prone to misalignment, frame jitter and loss
of streaming during atomic reads. Cdparanoia will also read and
repair data from CDs that have been damaged in some way.
Cdparanoia is easy to use and administer; it has no compile-time
configuration, happily autodetecting the CDROM, its type, its
interface and other aspects of the ripping process at runtime. A
single binary can serve the diverse hardware of the do-it-yourself
computer laboratory from Hell.
Why use cdparanoia?
All CDROM drives are not created equal. You'll need cdparanoia if
yours is a little less equal than others-- or maybe you just keep your
CD collection in a box full of gravel. Jewel cases are for wimps;
you know what I'm talking about.
Unfortunately, most rippers cannot work properly with a large number
of CDROM drives in the desktop world today. The most common problem
is sporadic or regular clicks and pops in the read sample, regardless
of options or settings. The great lesson from coding software for
CDROMs over the past 15 years is that drives that don't have bugs reading
digital audio are exceptionally rare. Most drives advertise that they
support 'error correcting streaming' or 'perfect reconstruction'. If
that's true, why is your music collection full of glitches?
Cdparanoia is also smarter about finding and probing CDDA support from
drives. Cdparanoia knows most of the old proprietary CDDA reading
command sets from the bad old days and can autodetect them all. Many
drives, especially older drives, that do not work at all with other
rippers will work just fine with cdparanoia.
Nor will you need to type in PCI or SCSI bus ids to use your cdroms
ever again. OK, that's not so common today, but it was a *killer*
feature 15 years ago ;-) That alone nearly sent cdda2wav to its grave
;-)
What are the differences between Paranoia versions?
Paranoia I and II were a set of patches to Heiko Eissfeldt's cdda2wav
0.8. These patches did nothing more than add some error checks to the
standard cdda2wav. They were inefficient and only worked with some
drives. Paranoia III was the first version to be written separately from
cdda2wav in the form of a standalone library.
The last of the previous generation of cdparanoia was cdparanoia III
version 9.8 from early 2001, designed for Linux 2.0 through early 2.4.
At this point, the project met all its original goals and was declared
'finished'.
Linux kept moving forward, finally unifying CDROM device access across all device types behind a new kernel interface in the 2.6 kernel series.
Paranoia IV is an upcoming generation that intends to improve the
library API as well as take advantage of new CDROM features that
existed on only a few specialist drives five years ago, but are now
ubiquitous even in inexpensive models. Where Paranoia III
concentrated on bulletproof extraction from good media and reliable
extraction from damaged media, Paranoia IV will concentrate on the
best possible extraction and correction from even heavily damaged
media-- so long as the drive can still recognize the disc.
I can play audio CDs perfectly; why is reading the CD into a file so difficult and prone to errors? It's just the same thing.
Unfortunately, it isn't that easy.
The audio CD is not a random access format. It can only be played
from some starting point in sequence until it is done, like a vinyl
LP. Unlike a data CD, there are no synchronization or positioning
headers in the audio data (a CD, audio or data, uses 2352-byte
sectors. In a data CD, 304 bytes of each sector are used for header,
sync and error correction. An audio CD uses all 2352 bytes for data).
The audio CD *does* have a continuous fragmented subchannel, but this
is only good for seeking +/-1 second (or 75 sectors or ~176kB) of the
desired area, as per the SCSI spec.
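The numbers above work out as follows. This sketch assumes a Mode 1 data sector carrying 2048 bytes of user data, which is where the 304 bytes of overhead come from; the constant and function names are hypothetical.

```c
#include <stdint.h>

#define CD_SECTOR_RAW      2352u /* bytes per sector, audio or data */
#define CD_MODE1_USER_DATA 2048u /* assumed: user bytes in a Mode 1 data sector */
#define CD_SECTORS_PER_SEC 75u   /* sectors per second at 1x */

/* Bytes of each data sector spent on sync, header and error correction. */
uint32_t data_sector_overhead(void) {
    return CD_SECTOR_RAW - CD_MODE1_USER_DATA;
}

/* Bytes covered by `seconds` of seek uncertainty on one side of the target. */
uint32_t seek_window_bytes(uint32_t seconds) {
    return seconds * CD_SECTORS_PER_SEC * CD_SECTOR_RAW;
}
```

`data_sector_overhead()` gives 304, and `seek_window_bytes(1)` gives 176400, the ~176kB quoted above. An audio sector has no such overhead, so all of that uncertainty window is indistinguishable sample data.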
When the CD is being played as audio, it is not only moving at 1x; the
drive is also keeping the media data rate (the spin speed) exactly locked
to playback speed. Pick up a portable CD player while it's playing
and rotate it 90 degrees. Chances are it will skip; you disturbed
this delicate balance. In addition, a player is never distracted from
what it's doing... it has nothing else taking up its time. Now add a
non-realtime, (relatively) high-latency, multitasking kernel into the
mess; it's like picking up the player and constantly shaking it.
CDROM drives generally assume that any sort of DAE will be linear and
throw a readahead buffer at the task. However, the OS is reading the
data as a series of separate read requests. The drive is doing
readahead buffering and attempting to store additional data as it
comes in off media while it waits for the OS to get around to reading
previous blocks. At 36x, data is coming in at about 6.35MB/second
(36 x 176,400 bytes), and each read is only 13 sectors or ~30k (due to DMA
restrictions), so one has to issue at least 208 read requests a second,
without any interruption, to avoid skipping. A single swap to disc or
flush of filesystem cache by the OS will generally result in loss of
streaming, assuming the drive is working flawlessly. Oh, and
virtually no PC on earth has that kind of I/O throughput; a Sun
Enterprise server might, but a PC does not. Most don't come within a
factor of five, assuming perfect realtime behavior.
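The 208-requests figure falls straight out of the arithmetic. A sketch, using the 13-sectors-per-read DMA limit given above; `reads_per_second` is a hypothetical helper.

```c
#include <stdint.h>

#define CD_SECTOR_RAW      2352u /* bytes per sector */
#define CD_SECTORS_PER_SEC 75u   /* sectors per second at 1x */

/* Minimum read requests per second needed to keep up with a drive streaming
 * at `speed`x when each request fetches `sectors_per_read` sectors.
 * Rounds up, since a partial read still costs a whole request. */
uint32_t reads_per_second(uint32_t speed, uint32_t sectors_per_read) {
    uint32_t bytes_per_sec = speed * CD_SECTORS_PER_SEC * CD_SECTOR_RAW;
    uint32_t bytes_per_read = sectors_per_read * CD_SECTOR_RAW;
    return (bytes_per_sec + bytes_per_read - 1) / bytes_per_read;
}
```

`reads_per_second(36, 13)` gives 208, leaving the OS under 5ms per request with no slack at all; at 1x the same read size needs only 6 requests a second, which is why slow ripping is so much more forgiving.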
To keep piling on the difficulties, faster drives are often prone to
vibration and alignment problems; some are total fiascos. They lose
streaming *constantly* even without being interrupted. Philips
determined 15 years ago that the CD could only be spun up to 50-60x
before the physical CD (made of polycarbonate) would deform from
centripetal force badly enough to become unreadable. Today's drives
are pushing physics to the limit. Few do so terribly reliably.
Note that CD 'playback speed' is an excellent example of advertisers
making numbers lie for them. A 36x cdrom is generally not spinning at
36x a normal drive's speed. A 1x drive varies its rotational speed
depending on how far the access is from the hub, keeping the linear
velocity constant; a 36x drive is probably using a constant angular
velocity across the whole surface such that it reaches 36x only at the
outer edge. Thus over most of the disc it's actually reading slower than
36x, assuming the '36x' isn't a complete lie, as it is on some drives.
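Under constant angular velocity, the linear speed under the head (and hence the data rate) grows in proportion to the distance from the hub. A sketch, assuming the program area spans roughly 25mm to 58mm from the center of the disc; those radii are approximate figures I'm supplying, not numbers from this FAQ, and the names are hypothetical.

```c
/* Assumed, approximate radii of the CD program area, in millimetres. */
#define CD_INNER_RADIUS_MM 25.0
#define CD_OUTER_RADIUS_MM 58.0

/* Effective read speed (as a multiple of 1x) at radius `r_mm` for a drive
 * spinning at constant angular velocity that reaches `max_speed` at
 * `r_outer_mm`. With a fixed spin rate, linear velocity scales with radius. */
double cav_speed_at(double max_speed, double r_mm, double r_outer_mm) {
    return max_speed * (r_mm / r_outer_mm);
}
```

`cav_speed_at(36.0, CD_INNER_RADIUS_MM, CD_OUTER_RADIUS_MM)` comes out to roughly 15.5x, so under these assumptions a '36x' drive reads the first tracks of a disc at well under half its advertised speed.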
Because audio discs have no headers in the data to assist in picking
up where things got lost, most drives will just guess.
This doesn't even *begin* to get into stupid firmware bugs. Even
Plextors have occasionally had DAE bugs (although in every case,
Plextor has fixed the bug *and* replaced/repaired drives for free).
Cheaper drives are often complete basket cases.
Rant Update (for those in the know):
Several folks, through personal mail and on Usenet, have pointed out
that audio discs do place absolute positioning information for (at
least) nine out of every ten sectors into the Q subchannel, and that
my original statement of +/-75 sectors above is wrong. I admit to it
being misleading, so I'll try to clarify.
The positioning data certainly is in subchannel Q; the point is moot
however, for a couple of reasons.
- The SCSI and ATAPI specs (there are a couple of each, pick one)
don't give any way to retrieve the subchannel from a desired sector.
The READ SUB-CHANNEL command will hand you Q all right, you just don't
have any idea where exactly that Q came from. The command was
intended for getting rough positioning information from audio discs
that are paused or playing. This is audio; missing by several sectors
is a tiny fraction of a second.
- Older CDROM drives tended not to expect 'READ SUB-CHANNEL' unless
the drive was playing audio; calling it during data reads could crash
the drive and lock up the system. I had one of these drives (Apple
803i, actually a repackaged Sony CD-8003).
- MMC-2 *does* give a way to retrieve the Q subchannel along with
user data in the READ CD command. Although the drive is required to
recognize the feature, it is allowed to simply return zeroes
(effectively leaving the feature unimplemented). Guess how many
drives actually implement this feature: not many.
- Assuming you *can* get back the subchannel, most CDROM drives seem
to understand audio discs primarily at the "little frame" level; thus
sector-level structures aren't reliable. One might get a reassembled
subQ, but if the read began in the middle of a sector (or dropped a
little frame in the middle; many do), the subQ is likely corrupt and
useless.
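For the curious, here's roughly what decoding position from Q looks like once you actually have an intact frame. This sketch assumes the raw 12-byte Q layout (CONTROL in the high nibble of byte 0, ADR in the low nibble, absolute minutes/seconds/frames as packed BCD in bytes 7-9, CRC in bytes 10-11), which is not necessarily the format a given drive hands back from READ CD. The CRC check is omitted, and `q_absolute_lba` is a hypothetical name. Sectors whose Q carries a different ADR mode (catalog number, ISRC) have no position at all, which is why only most sectors carry positioning data.

```c
#include <stdint.h>

/* Decode one packed BCD byte (0x59 -> 59); returns -1 if a nibble exceeds 9. */
static int bcd_to_int(uint8_t b) {
    int hi = b >> 4, lo = b & 0x0f;
    if (hi > 9 || lo > 9) return -1;
    return hi * 10 + lo;
}

/* Convert the absolute MSF time in an assumed raw 12-byte Q frame to a
 * logical block address. Returns -1 for non-position Q modes or bad BCD.
 * NOTE: the CRC in bytes 10..11 is NOT verified here; a real decoder must. */
long q_absolute_lba(const uint8_t q[12]) {
    if ((q[0] & 0x0f) != 1) return -1;  /* ADR 1 = position data */
    int m = bcd_to_int(q[7]);
    int s = bcd_to_int(q[8]);
    int f = bcd_to_int(q[9]);
    if (m < 0 || s < 0 || f < 0 || s > 59 || f > 74) return -1;
    return ((long)m * 60 + s) * 75 + f - 150;  /* MSF 00:02:00 is LBA 0 */
}
```

Note that nothing here helps if the frame was stitched together from the wrong little frames; a corrupt Q decodes to a perfectly plausible wrong address unless the CRC catches it.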
As reassembling uncorrupted frames is easy without the subchannel, and
corrupted reads likely result in a corrupted subchannel too,
cdparanoia treats the subchannel as more trouble than it's worth
(during verification).
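The basic idea behind reassembling reads without the subchannel can be sketched simply: read overlapping chunks, then find where the start of the new read lines up inside the previous one by matching the samples themselves. This is a deliberately naive illustration of overlap matching, not Paranoia's actual algorithm (which does considerably more verification and repair); `find_overlap` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Find the offset in `a` (the previous read) at which the first `match`
 * samples of `b` (the new, overlapping read) appear. Returns that offset,
 * or -1 if there is no exact match or more than one (ambiguous). */
long find_overlap(const int16_t *a, size_t a_len,
                  const int16_t *b, size_t b_len, size_t match) {
    if (match == 0 || match > b_len || match > a_len) return -1;
    long found = -1;
    for (size_t off = 0; off + match <= a_len; off++) {
        if (memcmp(a + off, b, match * sizeof *b) == 0) {
            if (found >= 0) return -1;  /* e.g. digital silence matches anywhere */
            found = (long)off;
        }
    }
    return found;
}
```

Refusing ambiguous matches is the important design choice: a run of silence or a repeated pattern lines up in many places and says nothing about where the drive actually landed, so a careful ripper has to look for a longer or more distinctive match instead of guessing.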
At least one other package (Exact Audio Copy for Win32) manages to use
the subchannel to enhance the Table of Contents information. I don't
know if this only works on MMC-2 drives that support returning Q with
READ CD, but I think I'm going to revisit using the subchannel for
extra TOC information.
Why don't you implement CDDB? A GUI? Four million other features I want?
Too many features spoil the broth. "Software is not perfect when
there is nothing left to add, but rather when there is nothing
extraneous left to take away." The goal of cdparanoia is perfect,
rock-solid audio from every capable cdrom on every platform. As this
goal has not yet been met, I'm uninterested in adding unrelated
capability to the core engine.
Several GUIs that incorporate cdparanoia already exist; I'm in the
process of compiling a list (see the links
page). Other software that implements new features by wrapping
around cdparanoia (like CDDB lookup) also exists.
'Cdparanoia' will not play to sound cards (you can always pipe the
output to a WAV player), do MD5 signatures, read CD catalog or serial
numbers (this *is* a feature I plan to add), search indexes, do rate
reduction (use Sox, Ogg or a million others), or generally make use of
the maximum speed available from a CDROM drive.
If your CDROM drive is *not* prone to jitter and you don't have
scratched discs to worry about, you might want to look at the original
cdda2wav for features cdparanoia does not have. Keep in mind, however,
that even the really good drives do occasionally stumble. I know of at least one
cdparanoia user who insists on using full paranoia with his Plextor
UltraPlex because it once botched a single sector from a rip; he'd already
burned the track to several CD-Rs before noticing...