
Does archiving compressed files increase the chance of corruption?

You’ve mentioned CD-Rs and DVD-Rs more than once as excellent ways
to back up data. Now I have about ten gigabytes of data to back up. If I
compress the files to ZIP format, I can reduce them down to under four
gigabytes–small enough to burn to a DVD-R. But I am scared to do this,
because I fear my important files might eventually get corrupted or
damaged if compressed. I’ve had many bad experiences using compressed
file formats (ZIP, RAR, 7z, etc). It seems that any compressed file I
leave alone for too long ends up damaged or corrupted at some point. My
question is, will burning my compressed files to a finalized,
non-rewriteable DVD prevent them from getting corrupted? (I would
assume that data on a finalized DVD cannot be changed?)

There’s nothing about compression that increases the likelihood of
corruption. No matter what format you pick, or how well the
compression is performed, the chance of corruption is completely
unrelated to whether the data is compressed.

The impact of corruption, on the other hand, is an entirely
different story.

The kind of corruption we’re talking about here is likely caused by a bad sector on the media – be it CD, DVD or even a hard disk. Some data in the middle of one of your files becomes unreadable. Depending on exactly what that file contains, the impact could be negligible (some easily cleaned up noise in the middle of a document perhaps), or catastrophic (the entire file is rendered useless).

If you have lots of files archived or backed up in an uncompressed form, then a single bad sector is likely to affect only a single file. Depending on the file, as I mentioned above, this might be benign or catastrophic, but it’s limited to that single file. In fact, if the damaged file is one you never attempt to recover, you might never even notice.

Compression utilities most often do two things:

  • Compress each file’s data so as to take up less space.

    Compression is simply a mathematical algorithm that analyzes the raw bytes within the file, and uses alternate ways to represent the same information in less space. For one over-simplified example, a series of ten asterisks (**********) might be replaced by an indicator that what follows is compressed data, a single asterisk, and a count of 10. 10 bytes have been reduced to 3, with no data loss. On decompression the encoded data is expanded back to the 10 asterisks.

  • Bundle a number of compressed files together into a single file, so that the aggregate will take up less space.

    Files are stored on hard disks in “clusters” or “sectors”, which have a minimum size. For example, a file containing our 10 asterisks will take up at least 512 bytes on a disk – the size of one sector. A file containing 513 bytes will take up at least 1024 bytes – two sectors, and so on.

    By collecting all the compressed files into a single container file, all this storage inefficiency is avoided. Files take up whatever compressed space they need in the container, and no more.
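
Both ideas can be sketched in a few lines of Python. This is only an illustration: the run-length scheme mirrors the over-simplified asterisk example above and is not how ZIP or 7-Zip actually compress, and the marker value, the function names and the 512-byte sector size are assumptions made for the example.

```python
MARKER = 0xFF       # assumed "a compressed run follows" indicator
SECTOR_SIZE = 512   # assumed sector size, matching the example above


def rle_encode(data: bytes) -> bytes:
    """Replace runs of 4+ identical bytes with (MARKER, byte, count)."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == byte and run < 255:
            run += 1
        # A literal MARKER byte is always written as a run, so the
        # decoder never mistakes it for the indicator.
        if run >= 4 or byte == MARKER:
            out += bytes([MARKER, byte, run])
        else:
            out += bytes([byte]) * run
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Expand (MARKER, byte, count) triples back into the original bytes."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] == MARKER:
            byte, run = encoded[i + 1], encoded[i + 2]
            out += bytes([byte]) * run
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return bytes(out)


def on_disk_size(file_size: int, sector_size: int = SECTOR_SIZE) -> int:
    """Round a file size up to a whole number of sectors."""
    sectors = (file_size + sector_size - 1) // sector_size
    return sectors * sector_size


stars = b"*" * 10
packed = rle_encode(stars)
print(len(stars), "->", len(packed))        # 10 -> 3
print(rle_decode(packed) == stars)          # True: no data lost

sizes = [10, 513, 2000]                     # three hypothetical files
print(sum(on_disk_size(s) for s in sizes))  # 3584 bytes stored separately
print(on_disk_size(sum(sizes)))             # 2560 bytes stored as one container
```

A real container also stores file names and other bookkeeping, so the actual saving is somewhat smaller than this arithmetic suggests.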

Here’s the problem: we’ve said that in the worst case a single bad sector could render an entire file unrecoverable.

And we’ve just placed all our “files” into a single container file.

The bad news that you often hear about corruption in archives has nothing to do with increased corruption at all. It has to do with the fact that everything was placed into a single container file, and after a corruption that container file was rendered completely inaccessible.

And all the files within it, lost.

So as you can see, by using a compressed archive format, you haven’t really increased the likelihood of corruption on the disk, but you have increased the impact of such corruption should it happen – perhaps dramatically.
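
Here is a rough Python sketch of that difference in impact. It makes several assumptions: the "container" is simply every file compressed together as one zlib stream (similar in spirit to a .tar.gz), the reader gives up at the first error, and the "bad sector" is a single damaged byte in the middle of the compressed data. Real archive formats and recovery tools differ in how much they can salvage, and all of the names below are made up for the example.

```python
import random
import zlib


def make_files(count=5, words_per_file=400, seed=1):
    """Build a few hypothetical text files with repeatable contents."""
    rng = random.Random(seed)
    vocab = ["backup", "archive", "sector", "media",
             "file", "data", "copy", "disc"]
    return {
        f"file{i}.txt": " ".join(
            rng.choice(vocab) for _ in range(words_per_file)
        ).encode()
        for i in range(count)
    }


def damage(blob: bytes) -> bytes:
    """Simulate a bad spot by flipping every bit of the middle byte."""
    middle = len(blob) // 2
    return blob[:middle] + bytes([blob[middle] ^ 0xFF]) + blob[middle + 1:]


def recovered(blob: bytes, expected: bytes) -> bool:
    """True only if the data decompresses back to exactly what was stored."""
    try:
        return zlib.decompress(blob) == expected
    except zlib.error:
        return False


def survivors_separate(files, damaged_name):
    """Compress each file on its own, damage one, count the files recovered."""
    count = 0
    for name, content in files.items():
        blob = zlib.compress(content)
        if name == damaged_name:
            blob = damage(blob)
        if recovered(blob, content):
            count += 1
    return count


def survivors_single_stream(files):
    """Compress everything as one stream, damage it, count the files recovered."""
    whole = b"".join(files.values())
    blob = damage(zlib.compress(whole))
    return len(files) if recovered(blob, whole) else 0


files = make_files()
print("separate files recovered:", survivors_separate(files, "file2.txt"))
print("single stream recovered: ", survivors_single_stream(files))
```

The same one-byte damage costs one file in the first case and, for this naive reader, every file in the second. The chance of the damage happening is identical in both cases; only the consequences differ.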

So, what should you do?

  • Choose and use good media.

  • Test your media by making sure that once written it can be read on other machines – preferably more than one.

  • Make multiple copies – Even quality CDs and DVDs are cheap.

  • Consider compressing individual files, rather than creating compressed archives. The total will be larger than a single archive would be, and the approach pays off mostly on larger files, but it reduces your exposure to corruption back to the single-file level.

  • Consider not compressing at all. Perhaps create a set of DVDs with your data, rather than trying to get it all on one.
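
One way to do the "test your media" step is to record a checksum for every file before burning, then recompute the checksums from the finished disc on another machine. The sketch below is a minimal illustration in Python and not a feature of any particular burning tool; the function names and the manifest layout are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_copy(root, manifest):
    """Return the relative paths that are missing, unreadable or changed."""
    root = Path(root)
    problems = []
    for relative_path, expected in manifest.items():
        try:
            ok = sha256_of(root / relative_path) == expected
        except OSError:
            ok = False  # missing file or read error, e.g. a bad sector
        if not ok:
            problems.append(relative_path)
    return problems
```

In use, you would call build_manifest() on the folder you are about to burn, keep the result somewhere other than the disc, and later call verify_copy() with the disc's mount point or drive letter. An empty list means every file read back exactly as written.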

I’ve explicitly avoided talking about specific compression tools, like Zip, WinZip, 7-Zip, Rar, Gzip and others, simply because the characteristics of each, and the availability of recovery tools for each, vary widely. And while choosing a good one is important (I’m a fan of 7-Zip), I think it’s less important than the choices you make above to avoid or sidestep the problem in the first place.

5 comments on “Does archiving compressed files increase the chance of corruption?”

  1. Good Info;
    as always “Back up, Back up, Back up…”
    I never rely on a single copy of anything whether it’s documents, images, audio or video files etc.
    And when I do archiving I make a duplicate of the disc. CDs and DVDs are rather inexpensive now; even if it costs $1 per disc, that’s still cheaper than replacing, or trying to replace, some irreplaceable data.

  2. I believe you’ve mentioned using True Image to do backups. I’ve been using that as well. After reading your article, I started thinking… A dangerous thing for me. When you use True Image, doesn’t that mean you’re putting all your stuff in one file – increasing the potential problems with corruption? How much does having True Image verify data when backing up help reduce the chances of corruption?

    All backup solutions that create a single image file do indeed fall into this bucket. Verifying after write helps, as do the additional techniques mentioned in the article (multiple copies, good media, etc.).

    – Leo
    08-Apr-2009
  3. I have dozens of backup DVDs around; I’m constantly backing up data, copying and moving data. I also store things regularly on flash drives. It’s not a good idea to put all files in a single compressed container, for the reasons stated above. And with today’s cheap, large storage, there’s no real reason to compress.

  4. I have avoided compression of files for the same reasons. If I had a thousand files on a CD uncompressed and a few bad sectors developed, I could still recover some files. If they were all in one and it became corrupted, they are all gone. That has always been my reasoning.

    Except for programs/games. If even one program file becomes corrupt, it’s screwed anyway. So I don’t mind compressing games and program files.

    And I always make 2 copies of my backup CD/DVDs.
