You’ve mentioned CD-Rs and DVD-Rs more than once as excellent ways
to back up data. Now I have about ten gigabytes of data to back up. If I
compress the files to ZIP format, I can reduce them down to under four
gigabytes – small enough to burn to a DVD-R. But I am scared to do this,
because I fear my important files might eventually get corrupted or
damaged if compressed. I’ve had many bad experiences using compressed
file formats (ZIP, RAR, 7z, etc). It seems that any compressed file I
leave alone for too long ends up damaged or corrupted at some point. My
question is, will burning my compressed files to a finalized,
non-rewriteable DVD prevent them from getting corrupted? (I would
assume that data on a finalized DVD cannot be changed?)
There’s nothing about compression that increases the likelihood of
corruption. It doesn’t matter what format you pick or how well the
compression is performed; the actual chances of corruption are
completely and totally unrelated.
The impact of corruption, on the other hand, is an entirely
different matter.
The kind of corruption we’re talking about here is likely caused by a bad sector on the media – be it CD, DVD or even a hard disk. Some data in the middle of one of your files becomes unreadable. Depending on exactly what that file contains, the impact could be negligible (some easily cleaned-up noise in the middle of a document, perhaps) or catastrophic (the entire file is rendered useless).
If you have lots of files archived or backed up in an uncompressed form, then a single bad sector is likely to affect only a single file. Depending on the file, as I mentioned above, this might be benign or catastrophic, but it’s limited to that single file. In particular, you might never even notice if it happens to a file you never attempt to recover.
Compression utilities most often do two things:
First, they compress each file’s data so as to take up less space.
Compression is simply a mathematical algorithm that analyzes the raw bytes within the file, and uses alternate ways to represent the same information in less space. For one over-simplified example, a series of ten asterisks (**********) might be replaced by an indicator that what follows is compressed data, a single asterisk, and a count of 10. 10 bytes have been reduced to 3, with no data loss. On decompression the encoded data is expanded back to the 10 asterisks.
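To make the asterisk example concrete, here is a minimal run-length encoding sketch in Python. It is only an illustration of the idea: real tools like ZIP, RAR and 7z use far more sophisticated algorithms, and the MARKER byte and the function names below are made up for this example.

```python
MARKER = 0x00  # hypothetical escape byte meaning "a compressed run follows"


def rle_encode(data: bytes) -> bytes:
    """Replace long runs of one byte with MARKER, the byte, and a count."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        run = 1
        # Count how many times this byte repeats (a count must fit in one byte).
        while i + run < len(data) and data[i + run] == byte and run < 255:
            run += 1
        if run >= 4 or byte == MARKER:
            # Long runs are worth encoding; a literal MARKER byte must always
            # be encoded so the decoder cannot mistake it for an escape.
            out += bytes([MARKER, byte, run])
        else:
            out += bytes([byte]) * run
        i += run
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    """Expand MARKER/byte/count triples back into the original runs."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == MARKER:
            byte, run = data[i + 1], data[i + 2]
            out += bytes([byte]) * run
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


encoded = rle_encode(b"*" * 10)
print(len(encoded))                        # 3 bytes: marker, asterisk, count
print(rle_decode(encoded) == b"*" * 10)    # True: nothing was lost
```

Ten bytes go in, three come out, and decoding gives back exactly the ten asterisks we started with.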
Second, they bundle a number of compressed files together into a single file, so that the aggregate will take up less space.
Files are stored on hard disks in “clusters” or “sectors”, which have a minimum size. For example, a file containing our 10 asterisks will take up at least 512 bytes on a disk – the size of one sector. A file containing 513 bytes will take up at least 1024 bytes – two sectors, and so on.
By collecting all the compressed files into a single container file, all this storage inefficiency is avoided. Files take up whatever compressed space they need in the container, and no more.
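The sector arithmetic is easy to check for yourself. The sketch below assumes the 512-byte sector from the example above; the function names and the batch of 1,000 ten-byte files are hypothetical, and it ignores whatever bookkeeping a real container format stores for each file.

```python
SECTOR_SIZE = 512  # bytes; the sector size used in the example above


def allocated_size(file_size: int, sector_size: int = SECTOR_SIZE) -> int:
    """Disk space consumed by a file, rounded up to whole sectors."""
    sectors = -(-file_size // sector_size)  # ceiling division
    return sectors * sector_size


print(allocated_size(10))    # 512
print(allocated_size(513))   # 1024

# Hypothetical batch: 1,000 files of 10 bytes each.
small_files = [10] * 1000

# Stored separately, every file is rounded up to its own sector.
separately = sum(allocated_size(size) for size in small_files)

# Stored in one container, only the container is rounded up, once.
in_container = allocated_size(sum(small_files))

print(separately, in_container)   # 512000 10240
```

Real file systems often allocate in clusters larger than 512 bytes, which makes the gap bigger still.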
Here’s the problem: we’ve said that in the worst case a single bad sector could render an entire file unrecoverable.
And we’ve just placed all our “files” into a single container file.
The bad news that you often hear about corruption in archives has nothing to do with increased corruption at all. It has to do with the fact that everything was placed into a single container file, and after a corruption that container file was rendered completely inaccessible.
And all the files within it, lost.
So as you can see, by using a compressed archive format, you haven’t really increased the likelihood of corruption on the disk, but you have increased the impact of such corruption should it happen – perhaps dramatically.
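You can see the difference in impact with a small experiment. The sketch below uses Python's zlib module and makes one big assumption: it models the archive as a single compressed stream holding every file, which is the worst case. Whether a real archive behaves this badly depends on the format and the tool. The file names, contents and helper functions are invented for the illustration.

```python
import zlib


def flip_byte(blob: bytes, position: int) -> bytes:
    """Simulate a bad sector by damaging one byte."""
    damaged = bytearray(blob)
    damaged[position] ^= 0xFF
    return bytes(damaged)


def readable(blob: bytes) -> bool:
    """True if the compressed data still decompresses cleanly."""
    try:
        zlib.decompress(blob)
        return True
    except zlib.error:
        return False


# Twenty hypothetical text files.
files = {
    f"file{i}.txt": (f"contents of file {i}\n" * 200).encode()
    for i in range(20)
}

# Case 1: each file compressed on its own, then one of them is damaged.
separate = {name: zlib.compress(data) for name, data in files.items()}
victim = "file7.txt"
separate[victim] = flip_byte(separate[victim], len(separate[victim]) // 2)
survivors_separate = sum(readable(blob) for blob in separate.values())

# Case 2: everything compressed as one stream, then one byte is damaged.
solid = zlib.compress(b"".join(files.values()))
solid = flip_byte(solid, len(solid) // 2)
survivors_solid = len(files) if readable(solid) else 0

print("compressed individually:", survivors_separate, "of", len(files), "readable")
print("single compressed stream:", survivors_solid, "of", len(files), "readable")
```

The amount of damage is identical in both cases: one byte. What changes is how many files that one byte takes with it.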
So, what should you do?
Choose and use good media.
Test your media by making sure that once written it can be read on other machines – preferably more than one.
Make multiple copies – even quality CDs and DVDs are cheap.
Consider compressing individual files, rather than creating compressed archives. The total will be larger than a single archive, and the savings are greatest on larger files, but this reduces your exposure to corruption back to the single-file level.
Consider not compressing at all. Perhaps create a set of DVDs with your data, rather than trying to get it all on one.
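Two of the suggestions above, compressing files individually and testing that what you wrote can be read back, are simple to script. Here is one possible sketch in Python using the standard gzip and hashlib modules. The function names and the folder layout are my own assumptions, not features of any particular backup tool, and you would point backup_dir at wherever you stage files before burning (or at the mounted disc when verifying).

```python
import gzip
import hashlib
import shutil
import zlib
from pathlib import Path


def sha256_of(stream) -> str:
    """Hash a file-like object in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(1024 * 1024), b""):
        digest.update(chunk)
    return digest.hexdigest()


def gz_path(backup_dir: Path, relative: Path) -> Path:
    """Where the compressed copy of a given source file lives."""
    target = backup_dir / relative
    return target.with_name(target.name + ".gz")


def compress_individually(source_dir: Path, backup_dir: Path) -> None:
    """Write one .gz per source file, keeping the folder structure."""
    for source in sorted(source_dir.rglob("*")):
        if source.is_file():
            target = gz_path(backup_dir, source.relative_to(source_dir))
            target.parent.mkdir(parents=True, exist_ok=True)
            with source.open("rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)


def verify_backup(source_dir: Path, backup_dir: Path) -> list:
    """Return the files whose backup is missing, unreadable or different."""
    problems = []
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(source_dir)
        try:
            with source.open("rb") as src, \
                    gzip.open(gz_path(backup_dir, relative), "rb") as dst:
                if sha256_of(src) != sha256_of(dst):
                    problems.append(str(relative))
        except (OSError, EOFError, zlib.error):
            # Missing file, damaged gzip data, or a truncated stream.
            problems.append(str(relative))
    return problems
```

Run compress_individually once to build the backup folder, then run verify_backup against each copy you burn, ideally on a different machine. An empty list means every file decompressed and matched the original.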
I’ve explicitly avoided talking about specific compression tools, like Zip, WinZip, 7-Zip, Rar, Gzip and others, simply because the characteristics of each, and the availability of recovery tools for each, vary widely. And while choosing a good one is important (I’m a fan of 7-Zip), I think it’s less important than the choices you make above to avoid or sidestep the problem in the first place.