
Are Backup Images More Fragile Than Just Copying Individual Files?

Not to any important degree.

I'm often asked if backup images are more susceptible to failure than storing the contents as individual files. My take: not really.
Darts: many versus one. (Image: askleo.com)
Question:

So my concern is that if you have one huge backup image (which for me could be 2TB+) or even one logical backup file split into say 1GB chunks, it’s relatively easy to corrupt that single file/backup set. If each file (or folder or some subset) is backed up individually, corruption is likely to take out a small subset of your backups, not the whole backup.

Interested to hear your thoughts on this and if you know any programs that can do something along these lines.

Individual-file backups can be useful. However, I don’t think the solution you’re proposing solves the problem you think it does.

This concept isn’t limited to just backups, believe it or not. Let’s consider.


TL;DR:

Images versus files

Backing up one large image file or individual files each has pros and cons, but the risk of corruption and/or data loss depends more on the software’s design than the storage method. Reliable tools can typically recover data from corrupted backups. A layered approach combining image and file-based backups is even more resilient.

Many versus one

For simplicity’s sake, let’s describe the problem this way: you have 10 gigabytes of data. You can store it as:

  • 10,000,000 files of 1 kilobyte each

or

  • 1 file of 10 gigabytes

Both represent the same data. In one case, it’s stored in one massive file; in the other, it’s spread across a collection of 10,000,000 small files.
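
If you want to sanity-check that arithmetic, here's a tiny Python snippet. It's an illustration only, using decimal units (1 KB = 1,000 bytes, 1 GB = 1,000,000,000 bytes); binary units would give a slightly different count:

    # How many 1 KB files does it take to hold 10 GB? (decimal units)
    total_bytes = 10 * 1_000_000_000   # 10 gigabytes of data
    small_file = 1_000                 # 1 kilobyte per small file
    print(total_bytes // small_file)   # 10000000 -- ten million small files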

Which is more resilient to corruption?

As usual, it depends.

Corruption

I take issue with the comment that “It’s relatively easy to corrupt” a file.

These days, it shouldn’t be that easy. If a system is working even moderately well, the chances of random file corruption are low. Not zero, of course, and this is one reason we back up, but it should be infrequent.

If it’s happening “often” (however you’d like to interpret that), it’s more likely a sign of a problem that needs to be fixed rather than some inevitable circumstance.

Resiliency is relative

A larger file does represent a larger target.

If you’re going to have a problem somewhere in those 10 gigabytes, then by definition it will happen inside that single 10-gigabyte file (if that’s the way you’re storing the data), or in only one (or a small number) of the 10,000,000 files.

What happens next depends on what that data represents and how the software reacts to corruption.

  • If a single error in the 10GB file invalidates the entire file, then you’ve lost all the data, and that’s a bad thing. (It’s also bad design.)
  • If a single error in one of the 10,000,000 1KB files invalidates only that file and no others, that’s significantly more resilient. Most of the data “survives”.

But here’s the thing: it could just as easily be the other way around.

  • If a single error in the 10GB file invalidates only a small portion of what it contains, the rest could be recoverable by the software that understands it. You lost little; most of the data “survives”.
  • If a single error in any of the 10,000,000 1KB files invalidates the software’s ability to use the entire collection, then you’ve lost it all. (Again, bad design.)

What happens to your data is less dependent on how it’s stored than it is on the software that subsequently reads it.
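
To make that concrete, here's a small, hypothetical Python sketch of those four scenarios. The sizes, the names, and the single corrupted kilobyte are all assumptions for illustration; no real backup format is being modeled. The output shows that what determines the loss is the software's reaction, not the storage layout:

    # One corrupted kilobyte somewhere in 10 GB of data. Illustrative numbers only.
    TOTAL_BYTES = 10 * 1_000_000_000          # 10 gigabytes of data
    SMALL_FILE = 1_000                        # 1 kilobyte per small file

    def bytes_lost(layout: str, reaction: str) -> int:
        """How much data is lost when one random kilobyte is corrupted.

        layout:   "one-big-file" or "many-small-files"
        reaction: how the reading software handles corruption --
                  "invalidate-everything" (bad design) or "localize-the-damage"
        """
        if reaction == "invalidate-everything":
            return TOTAL_BYTES                # everything is gone, regardless of layout
        if layout == "one-big-file":
            return SMALL_FILE                 # only the damaged region inside the big file
        return SMALL_FILE                     # only the one small file it landed in

    for layout in ("one-big-file", "many-small-files"):
        for reaction in ("invalidate-everything", "localize-the-damage"):
            print(f"{layout:16} / {reaction:22} -> {bytes_lost(layout, reaction):>14,} bytes lost")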

Most backup programs try to be smart

If a single large image file becomes corrupted, most backup programs attempt to recover what they can.

For example, Macrium Reflect is typically still able to extract individual files from a single .mrimg image file even if corruption somewhere in that file prevents a full restore. Most files are unaffected.

That’s more or less the same result as if you were to store everything as individual files. Isolated files might not restore, but most files would be unaffected.

Backups are special

Backing up individual files isn’t enough.

A full-image backup includes things like partitions, partition information, boot information, file system overhead, and more. That’s the stuff you need to ensure you can restore to an empty disk when the time comes. To work as a backup, your collection of individual files needs to include more than just the files on the system.

And that’s the stuff that, if corrupted, could also prevent you from performing that restore — whether it’s stored as a single massive image file or as a collection of individual files.

It’s not specific to backups

The reason I say this issue isn’t specific to backups is that backup image files are just one example of a larger collection of information being bundled into a single file.

For example, we regularly distribute software in .zip, .iso, and .msi (Microsoft Installer) files, or many other “archive” formats. Each type combines many files into a single, larger file. Depending on the file format and the robustness of the specific tools being used, these files can be just as vulnerable to corruption. In fact, corruption (at least in the wrong place) in these archive files can render them unreadable.
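
As a small illustration of that "wrong place" point, Python's standard zipfile module can check a .zip archive's integrity. The filename below is a placeholder, and this is just a sketch of the idea, not a recovery tool:

    import zipfile

    archive = "backup.zip"   # placeholder name for illustration

    try:
        with zipfile.ZipFile(archive) as zf:
            bad = zf.testzip()           # re-reads every member and checks its CRC
            if bad is None:
                print("All members passed their CRC checks.")
            else:
                print(f"First corrupted member: {bad}")
    except zipfile.BadZipFile:
        # Damage to the archive's central directory can make the whole thing
        # unreadable, even though most of the compressed data may be intact.
        print("Archive is unreadable as a whole.")

Corruption inside one member typically shows up as a single bad CRC; corruption in the central directory can take out access to everything at once.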

And when you think about it, isn’t your hard disk just a file container as well?

You don’t have to be using image files or archive files or anything like that for corruption to render your entire hard disk instantly unreadable. Corruption in the wrong place on your hard disk can do exactly that.

Hard disks and file systems are designed to be resilient — to tolerate a certain amount of corruption before giving up completely —  but there’s always a point where things can get bad enough that recovery isn’t possible.

This is why we back up.

My backup software criteria

What I look for in a backup program includes:

  • The ability to back up a complete “image” of an entire hard disk.
  • The ability to back up only those things that have changed since the previous backup (incremental backups).
  • The ability to restore a backup image to a completely empty hard disk.
  • The ability to recover individual files from full disk images.

What you’re looking for

Ultimately, what I believe you’re asking for is this:

You want a program that, instead of collecting all the information into a single file (like a backup program’s image file), copies individual files as individual files and then includes overhead information as some kind of additional “special” file that your backup software could recognize and use during a restoration.

It’s possible, but I’m not aware of a backup program that works this way.
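
To be concrete about what such a (purely hypothetical) program might look like, here's a rough Python sketch: it copies each file individually, unchanged, and writes a separate manifest of per-file checksums standing in for the "special" overhead file. The paths, names, and design are all made up for illustration; this is not an existing product:

    import hashlib
    import json
    import shutil
    from pathlib import Path

    def backup_tree(source: Path, destination: Path) -> None:
        """Copy every file under source to destination as-is, and write a
        manifest of SHA-256 hashes as the hypothetical "special" overhead file."""
        manifest = {}
        for path in source.rglob("*"):
            if not path.is_file():
                continue
            relative = path.relative_to(source)
            target = destination / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)   # copies data plus timestamps
            # Whole-file read is fine for a sketch; large files would want chunked hashing.
            manifest[str(relative)] = hashlib.sha256(path.read_bytes()).hexdigest()
        (destination / "manifest.json").write_text(json.dumps(manifest, indent=2))

    # Example call (made-up paths):
    # backup_tree(Path("C:/Users/Me/Documents"), Path("E:/Backups/Documents"))

Notice that the manifest plays exactly the role of the overhead information discussed above: if it (or any real partition/boot information stored alongside it) is corrupted, a complete restore is in trouble no matter how the individual files are stored.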

Why it wouldn’t help

Either scenario deals with the same data stored differently. If a file is corrupted, it’s corrupted, regardless of whether it’s inside a larger image file or directly accessible on its own.

If the overhead information is corrupted, then the full-restore process is impossible — again regardless of whether it’s inside a larger image file or directly accessible as a separate “special” file.

I honestly don’t believe that this buys you anything. Corruption is corruption, and if it happens in a benign place, you may never notice. If it happens in the wrong place, your entire backup could be invalidated, regardless of how it’s stored.

What I think you really want

I recommend tools like Macrium Reflect or EaseUS Todo Backup for creating full-image backups.

They’re not perfect, but they’re good.

If there were one thing I would change, it would be this: I would have them be significantly more resilient to image-file corruption. They’re good, don’t get me wrong, but I would have them try even harder when they detect an error, offering a “best effort” restoration in the face of corruption rather than just throwing up their digital hands and giving up.

Ultimately, the problems that could keep that from working are the same problems that would prevent the suggested comprehensive file-based backup from working.

In both cases, your uncorrupted files are accessible; in both cases, it could be impossible to do a complete restore.

Individual file backups are a convenience

I do agree that individual file-based backups are useful. When your backups are accessible in their original form, retrieving them is simple: you locate the backup of the file you want and copy it back.

There’s no need to fire up a backup program to retrieve a file or even look to see what’s in the backup; just navigate with Windows File Explorer and copy the file like you would any other.

It can be useful. And it’s why I do both — sort of.

Do this: layers

I implement a layered approach using:

  • An external hard drive
  • Monthly full-image backups (using Macrium Reflect, or similar)
  • Daily incremental image backups (using same)
  • Near real-time individual file backups using a syncing online service like Dropbox, OneDrive, or others

And while I don’t do it personally, there’s a very strong argument to be made that if your backup program can verify a backup after making it, that might be a good option to turn on, especially if you’re particularly concerned about corruption.
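
If your file-based layer is just plain copies (as with a synced folder), a rough stand-in for that kind of verification is comparing checksums of the originals against the copies. Here's a minimal sketch, assuming both folders have the same layout; the paths are placeholders:

    import hashlib
    from pathlib import Path

    def sha256(path: Path) -> str:
        """Hash a file in 1 MB chunks so large files don't need to fit in memory."""
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def verify(source: Path, backup: Path) -> list[str]:
        """Return relative paths whose backup copy is missing or differs."""
        problems = []
        for path in source.rglob("*"):
            if not path.is_file():
                continue
            relative = path.relative_to(source)
            copy = backup / relative
            if not copy.is_file() or sha256(path) != sha256(copy):
                problems.append(str(relative))
        return problems

    # Example call (made-up paths):
    # print(verify(Path("C:/Users/Me/Documents"), Path("E:/Backups/Documents")))

It's no substitute for a backup program's built-in verify (which also checks the image's own structure), but it does catch silently damaged or missing copies.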


Footnotes & References

1: Within reason, of course. For the sake of keeping things conceptually simple here, I don’t want to devolve into the particular pros and cons of overly specific implementation details.

20 comments on “Are Backup Images More Fragile Than Just Copying Individual Files?”

  1. As part of my backup strategy, I’ve been using a program named GoodSync. It makes copies of individual files (My Documents [and other files I specify]) to external hard drives. They are exact copies and accessible directly. When I run GoodSync, the program analyzes the files on the C: drive vs. the files on the external drive and copies only new and changed files.

    It also will retain any “old / outdated” files in a separate location for 30 days. This allows them to be recovered if needed, but I’ve never needed to use this feature in 5+ years of using GoodSync.

    Note: I’m not aware that GoodSync can or does make full disk image backups, only copies of individual files.

    Leo – Have you used GoodSync for making backup copies of files? If so, any thoughts / comments?

    • I have used similar programs, but not GoodSync itself, though I’ve heard good things about it. Seems like a good part of a backup strategy, though it’s no replacement for images.

  2. “10,000 files of 1 kilobyte each

    or…

    1 file of 10 gigabytes”

    I think that should be 10,000,000 files of 1 kB each (or 10,000 files of 1 MB each)

  3. I admire Leo’s thoroughness, but my own approach is similar to Martin’s. I have separate system and data drives. I do a full system disk image back-up to a partition on my large external hard drive from time to time, ideally once a month but in reality two or three times a year. The data drive was also originally copied as an image to another partition; subsequently, I update the larger folders on it as and when with a synchronisation program, at present FolderMatch. I also use Horizon Rollback to keep a snapshot of the system drive which makes for very easy restoration if the system does not boot, although it would not help in the event of the hard drive physically failing.
    Like the questioner, my experience leads me to be wary of relying on large compressed back-up files: from what I have seen, more often than not they don’t work when you need them. Furthermore, uncompressed back-ups offer more ease and flexibility in making customised incremental back-ups or extracting individual files and folders when needed.

  4. Part of my discipline is to perform a backup every Saturday morning. Two important financial files I backup to a thumb drive daily. Something else I always do with Macrium is enable verify. It only takes a very few minutes and reassures me that things are probably okay. Once, and only once, did I get a bad incremental. I immediately ran a full backup and all was well. Verify takes so little time that I consider it cheap insurance. It’s only my opinion, but I feel that verify is something everybody should do as habit.

  5. I don’t know if this is still true, but in the old days, backup programs would use added layers of error correction logic. That makes the backup file significantly LESS prone to data loss than a collection of plain files. If I have 100 files and I get one bad sector, the file that includes the sector is corrupted, possibly beyond recovery. If that same bad sector were to occur in a backup file with error correction, the error correction logic SHOULD be able to recover the “lost” data.

  6. Been using MirrorFolder for years. I started doing it (mirroring to a drive that I can install into the computer) when I noticed bad sectors on the disk. When the disk finally gives up the ghost, I swap in the mirror, fire up the computer, the bios remaps the drive, and boots the operating system. Then I get another drive and use MirrorFolder again. I’ve had more than a few disks go bad over the years and, after the first one, it took me three days to get everything back to “normal”. With MirrorFolder I’m up and running, and back to “normal” in about ten minutes. I am not affiliated with the company that makes MirrorFolder.

  7. I didn’t realize it, but I also use the layered approach. 1) I use Macrium Reflect Free Edition and make full image backups monthly for my 4 machines, saving the .mrimg file to my Synology DS-1812. 2) I don’t use My Documents because too many programs have overrun that directory, so I create a new folder called “Home” (aka C:\Users\\Home) and then create a “Home” library for Windows Explorer — I leave Pictures, Music, and Videos in the standard places, but I store all other personally created files and folders in Home, which 3) makes it easy to copy portions to Dropbox, or 4) use SyncToy or FreeFileSync to back up.

    Using this “Home” folder approach has been very satisfying, in that “Home” is the most critical thing that changes on a daily basis, so it gets the most attention on backups. If needed, just doing a daily backup, or syncing, of Home to my Synology is enough, and it’s guaranteed to be just MY stuff (no extra files and other program folders that have overrun the My Documents folder). I have lots of small files, sure, but the total size of Home stays relatively small, so a full sync is quite fast.

    On a final note, my entire (software) solution cost me nothing. Macrium Reflect Free Edition, SyncToy, FreeFileSync, all are free. Granted, I did put quite a bit of money into the hardware side: my Synology DS-1812 has 8 drives, each is a 3TB Seagate … but I love my Synology (it does a lot more), so it was well worth it.

  8. I wonder if there’s a hidden trap there… download the pro backup software, do your backups, and then have to reinstall your OS and lose the license. Then you have to pay to get it back. Then again the free version is excellent and may do the job.

    • I’ve definitely never run into a situation like that. One way or another I’ve never had to pay twice for the same backup software across a reinstall.

  9. I’m the person who asked this question originally. I eventually worked out I had bad RAM – memtest x86 didn’t detect errors, HCI memtest did eventually – this is why corruption resistance is important to me. I agree with a lot of what Leo says, and I appreciate his thought and opinions. However I still think a single backup file increases your risk, mostly because most software is poorly written. If software was well written and tested to cope with errors then the risk would be much lower.

    I’ve done some research into software. Cobian Backup is a good piece of software, it’s free, and it can back up to either compressed image files (zip/7z) or do individual files. It does incremental and differential backups using folders. I’ve decided to use it for my uncompressed backups – I have many TB of RAW images and I’m more comfortable having these large files stored individually. Cobian is INCREDIBLY slow at compressing (5MB/sec on a fast computer, compared with 25MB/sec for other software), so large backups will be slow, especially the initial backup.

    For smaller data (documents, website archives, I guess anything compressable) I’ve decided to use AOMEI Backupper Free. It’s fast, it resists file corruption (I damaged the file with a hex editor and it coped reasonably well), and it’s flexible. You can do compression on a per backup set basis, not globally.

    I do mean to look into Reflect and TrueImage at some point in the near future as well.

    Incidentally, I don’t consider mirroring to be backup: if your original file is corrupted by user error, a virus, or cryptoware, and you don’t notice before you run a mirror, you lose your “backup”. I also think backups have to be in a different location from the computer to be effective, and anything permanently connected to a computer should be called a “copy”, not a “backup”.

    • A mirror can be a backup. Dropbox and others like it, for example, keep deleted and overwritten files (they are not physically overwritten) for 30 days. Older files might be a problem, but that’s why it’s useful to have a few different backup methods in place.

  10. Like Tim, I use Cobian too. But I suspect that I don’t utilise it to its full potential. I find the deleter very useful, too.

    One seeming pitfall is watching the backup drive[s] (2 x RAID 1TB HDs) for space overload. Every so often, I have to go and delete a large number of very big files.

    However, having the back-up files in an easily program readable (original) format has proved a blessing, at times.

    Interesting and provocative thought that another computer on a network is not a real back up, but only a copy. I can’t disconnect mine, as it’s also a LAN and Internet server.

  11. Maybe I misunderstood you, but in the section “Individual file backups are a convenience,” you said that you were not aware of any backup program that would allow you to “just navigate with Windows Explorer and copy like you would any other file.”

    I know that you are currently evaluating EaseUs Todo Backup. Well, in your testing, you should try doing just that. I have plugged in my external hard drive, gone into Windows Explorer, navigated to my image backup and double clicked it. The image got mounted (or whatever the correct term is) and I could navigate all the files and folders from within Windows Explorer, as if I was navigating my hard drive. I found the file I wanted and copied and pasted the missing file. No need to run the backup program. This all happened within Windows Explorer, in much the same way that Windows handles ZIP files by default (unless you’ve gone and installed WinZip or something like it).

    P.S. I use the free version of EaseUS to make my images.

    • Mounting the backup as a virtual drive is a very common use of image backups. The convenience Leo is talking about is having the files exist as individual files so that it wouldn’t be necessary to mount the image backup to view or restore a file.

    • EaseUS still implements that as a single image file, no? What I meant was that you didn’t need to “mount” anything by double-clicking it – just navigate around with Windows Explorer and the files are already there. (What you’re describing can be done in Macrium Reflect as well as in other backup programs.)

  12. I rate the files I generate myself (documents, programs, presentations, experimental data) much more highly than system files.
    It takes time, and makes me swear, but I can reinstall Windows, Office, etc. in a few hours.
    I make backups of all my own files using the Windows system program ROBOCOPY. It has the nice properties that it can retain create/modify/access date information, and can be set to only copy newer files: no need to recopy files already on the backup. The backup takes more space than a compressed file, but these days disk space is cheap. I write a script file so running the backup is easy. I put the script file on as a scheduled job and it happens automatically every few hours.
    ROBOCOPY stands for “Robust Copy” and it handles several system conditions (e.g. file in use) more gracefully than programs like XCOPY (which I used to use).

    • There’s a virus around that encrypts your files and demands money to restore them. If you run a mirror before you notice this happening, your “backup” then has the corrupted files. Incremental backup programs avoid this issue.

  13. I just lost my SSD and the hard drive I had backups on. I have old data backups of large data files, movies, music, etc., but it wasn’t an image file I was able to recover from. So I’m currently in the position of reinstalling everything, and now I’m compiling a game plan for having better, fuller, and more secure data restorability.

    SyncToy is something I’ve used in the past and will soon set up for my current rig. It’s from MS, it’s free, and it makes copies of files and can sync them up if changed or updated between the two points, one point being the origin drive, the other being the backup. But I’m only going to use this for very large files, which really I would be able to recompile again if I were to suffer another data loss. I have the DVDs/CDs that the music/movie files originated from, just in a closet.

    What I am going to image is a relatively smaller backup of Windows, programs, and other utilities I use, after I’m fully updated with all the settings I like. It’s still going to be large, I figure about 150-200GB, but not nearly as big as the 1.2TB the big files take. I’m going to back up the “Bulk Data” only on two hard drives, one internal and one external. For the image file I’m going to use a differential backup scheme and keep it on the internal HDD, the original SSD, and the backup external SSD, and I’m going to acquire enough cloud storage to hold the OS image and differential backups as well.

    I basically want multiple backup points to ensure I don’t go through this again. When you create the image file, be sure to verify it; that way, if there is corruption, you can deal with it before you’re in desperate need. That’s when I’ll just copy the image to the multiple sources, the cloud being the most time consuming.

