Are Backup Image Files more Fragile than just Having Copies of All the Individual Files?

I’ve been suffering from some minor disk corruption over the past couple of years. I ran every test I could think of. Eventually it turned out all four sticks of RAM, when used together, caused data corruption, but any two worked fine. I still can’t figure that out, but anyway, it’s replaced now and works fine.

However I now have some corrupt files, including some Macrium reflect disk images. Fortunately even if the backup is corrupt you can still browse it to get individual files out, but you can’t restore the whole backup.

So my concern is that if you have one huge backup image (which for me could be 2TB+), or even one logical backup file split into, say, 1GB chunks, it’s relatively easy to corrupt that single file/backup set. If each file (or folder, or some subset) is backed up individually, corruption is likely to take out a small subset of your backups, not the whole backup.

Interested to hear your thoughts on this, and if you know any programs that can do something along these lines.

Actually I have several thoughts, but I’ll answer your last question first: no, I’m not aware of a backup program that works as you’ve outlined.

Your line of reasoning isn’t at all new; I get comments similar to this relatively often. Nor is the concept limited to just backups, believe it or not.

There are reasons individual file backups can be useful. However, while I could be wrong, I don’t think the solution you’re proposing actually solves the problem you think it does.

Many versus one

For simplicity’s sake, let’s describe the problem this way: you have 10 gigabytes of data. You can store it as:

  • 10,000 files of 1 megabyte each

or…

  • 1 file of 10 gigabytes

Each represents the exact same data. In one case, it’s stored in one massive file; in the other, it’s stored across a collection of 10,000 smaller files.

The question is, which is more resilient to corruption?

The answer is . . . it depends.

Resiliency is relative

While I take issue with your comment that “it’s relatively easy to corrupt” a file (it shouldn’t be, for a system working even moderately well), the fact is that a larger file represents a larger target. If you’re going to have a problem somewhere in that 10 gigabytes, then by definition it will happen inside that 10-gigabyte file, if that’s the storage you’re using; or, it’ll happen in only one of the 10,000 files, if that’s the approach taken.
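As a rough illustration of the “larger target” idea, here is a minimal Python sketch. It assumes the two layouts are simple byte ranges laid back to back (real file systems do not store files that neatly), and both the `damaged_file` helper and the corruption offset are made up for the example. It only answers one question: given a single bad byte, which file does it land in?

```python
from bisect import bisect_right
from itertools import accumulate

MB = 1024 * 1024


def damaged_file(file_sizes, bad_offset):
    """Return the index of the file containing the byte at bad_offset,
    treating the files as if they were stored back to back."""
    ends = list(accumulate(file_sizes))  # cumulative end offset of each file
    if not 0 <= bad_offset < ends[-1]:
        raise ValueError("offset is outside the stored data")
    return bisect_right(ends, bad_offset)


many = [MB] * 10_000   # 10,000 files of 1 MB each
one = [10_000 * MB]    # the same amount of data as a single file

bad = 1234 * MB + 567  # hypothetical position of one corrupted byte

print(damaged_file(many, bad))  # index of the one small file that is hit
print(damaged_file(one, bad))   # the single large file is always the one hit
```

Either way, exactly one byte of the same data is bad. The layouts differ only in how much else shares a file with it, and what that costs you depends on what the damaged data was.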

What happens next depends on what that data represents – regardless of how it’s stored.[1]

As you saw, Macrium Reflect was still able to extract individual files from a single .mrimg image file, even though there was corruption somewhere in that file that prevented it from attempting a full restore.

The exact same thing could still happen using your suggested “store it all as individual files” approach.

For the backup you’re looking for to be a true replacement for an image file, it needs to include much more than just the files on the system. A full-image backup also includes things like partition information, boot information, file system overhead, and more. That’s the stuff you need to ensure you can restore to an empty disk when the time comes.

And that’s the stuff that, if corrupted, would also prevent you from performing that restore – just as Macrium was unable to perform the restore.

But you can still get at all the individual uncorrupted files, just like you can with Macrium.

It’s not specific to backups

The reason I say “this isn’t specific to backups” is that backup image files are just one example of a larger collection of information being bundled into a single file.

For example, we regularly distribute software in zip files, or “.cab” cabinet files, or “.iso” image files, or “.msi” Microsoft Installer files, or many other so-called “archive” formats. Each takes a large number of files and combines them into a single, larger file. Depending on the file format and robustness of the specific tools being used, these files can be just as vulnerable to corruption as you experienced. In fact, a corruption anywhere, or at least in the wrong place, in these archive files can render them unreadable.

Isn’t your hard disk just a file container as well?

Indeed, you don’t even have to be using image files or archive files or anything like that for corruption to render your entire hard disk instantly unreadable. Corruption in the wrong place on your hard disk can do exactly that. Hard disks and file systems are designed to be resilient – to be able to tolerate a certain amount of corruption before giving up completely – but there’s always a point where things can get bad enough that recovery isn’t possible.

This is why we back up in the first place.

My backup software criteria

  • The ability to back up a complete “image” of an entire hard disk.
  • The ability to back up only those things that have changed since the previous backup.
  • The ability to restore a backup image to a completely empty hard disk.
  • The ability to recover individual files from full disk images.

What you’re looking for

Ultimately, what I believe you’re asking for is this:

Instead of collecting all the information into a single file, like Macrium’s .mrimg file, copy individual files as individual files, and then also include the overhead information as some kind of additional, “special” file that your backup software would be able to recognize and use come restore time.

It’s totally possible, but as I said, I’m just not aware of a backup program that works this way. Perhaps someone will mention one in the comments to this article.

Why it doesn’t really solve the problem

In either scenario, it’s the same data, stored differently. If a file is corrupted, it’s corrupted, regardless of whether it’s inside a larger image file, or directly accessible on its own.

If the overhead information is corrupted, then the full-restore process is impossible, again regardless of whether it’s inside a larger image file, or directly accessible as a separate “special” file.

I honestly don’t believe that this buys you anything. Corruption is corruption, and if it happens in a benign place you may never notice, but if it happens in the wrong place then your entire backup could be invalidated.

Regardless of how it’s stored.

What I think you really want

Now, we’ve been talking about Macrium as if it is perfect software for creating full-image files.

It’s not. It’s good, but it’s not perfect.

If there were one thing I would change, it would be this: I would have Macrium – or any image backup program, for that matter – be significantly more resilient to image file corruption. I would have it try harder. I would have it offer to perform a “best effort” restore in the face of detected corruption, rather than just throwing up its digital hands and giving up.

But ultimately, the problems that could keep that from working are the same problems that would prevent the suggested comprehensive file-based backup from working.

In both cases, your un-corrupted files are accessible, and in either case, it could be impossible to do a complete restore.

Individual file backups are a convenience

I did say that individual file-based backups could be useful.

In short, when your backups are accessible in their original form, retrieving them is simple: you locate the backup of the file you want, and copy it back.

No need to fire up a backup program to retrieve a file, or even look to see what files are included in the backup; just navigate with Windows Explorer and copy like you would any other file.

None of the imaging backup programs I’m aware of do this, and I believe it’s for one reason: space. By collecting all the individual files into a single file, and then also compressing the collection, the space taken up by the resulting single image is significantly smaller than the space allocated to all the individual files on disk.
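To get a feel for the space argument, here is a small Python sketch. It assumes a 4,096-byte allocation unit (cluster) and uses 1,000 made-up, highly repetitive sample files, so the compression result is far better than real data would give. The `allocated` helper is hypothetical; it simply rounds a size up to whole clusters, which is how file systems generally hand out space.

```python
import zlib

CLUSTER = 4096  # assumed allocation unit; real file systems vary


def allocated(size, cluster=CLUSTER):
    """Bytes a file of `size` occupies when space is handed out in whole clusters."""
    return -(-size // cluster) * cluster  # ceiling division, then back to bytes


# Hypothetical sample data: 1,000 small files of 1,500 bytes each.
files = [b"line of sample text\n" * 75 for _ in range(1000)]

raw = sum(len(f) for f in files)                # bytes of actual content
on_disk = sum(allocated(len(f)) for f in files)  # space used as separate files

image = zlib.compress(b"".join(files))           # one compressed "image"
image_on_disk = allocated(len(image))

print("content bytes:       ", raw)
print("as individual files: ", on_disk)
print("as one compressed file:", image_on_disk)
```

Two separate effects show up here: individual files each waste the unused tail of their last cluster, and a single bundled file can be compressed as a whole. How much either one matters depends entirely on the data being backed up.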

But it can be useful… which is why I do both – sort of.

My takeaway: layers

Clearly, backing up is important. Regardless of the situation, “stuff” happens. Rarely is it as insidious and long term as what you were experiencing, but … well, as I said, “stuff” happens, and sometimes it happens to the backups themselves.

That’s why I implement a layered approach:

  • External hard drive
  • Monthly full-image backups (using Macrium Reflect, or similar)
  • Daily incremental image backups
  • Near real-time individual file backups using an online service like Dropbox, OneDrive, or others

In fact, if you’re a back-up beginner, that’s pretty much what I lay out in my book Just Do This: Back Up!

And while I don’t do it personally, there’s a very strong argument to be made that if your backup program has the ability to verify a backup after making it, that might be a good option to turn on, particularly in situations such as yours. (I don’t do it simply because I have so many backups, including backups of backups, that the additional time it takes just isn’t warranted.)
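For backups kept as plain individual files, the same idea can be sketched by hand. The Python below is not how Macrium Reflect or any other backup program implements its verify option; it is just an illustration of the general technique, under the assumption that you record a SHA-256 checksum for every file at backup time and compare against it later. The function names are made up for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return the relative paths that are missing or no longer match."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            bad.append(rel)
    return bad
```

In use, you would call `build_manifest` on the backup folder right after copying, save the result somewhere safe, and later call `verify` with that saved manifest; an empty list means every file still matches. Note that the manifest is itself one more file that can be corrupted, so it is worth keeping more than one copy.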

Footnotes & references

1: Within reason, of course. For the sake of keeping things conceptually simple here, I don’t want to devolve into the particular pros and cons of overly specific implementation details.

20 comments on “Are Backup Image Files more Fragile than just Having Copies of All the Individual Files?”

  1. As part of my backup strategy I’ve been using a program named GoodSync. It makes copies of individual files (My Documents [and other files I specify]) to external hard drives. They are exact copies and accessible directly. When I run GoodSync, the program compares the files on the C: drive with the files on the external drive and copies only new and changed files.

    It also will retain any “old / outdated” files in a separate location for 30 days. This allows them to be recovered if needed, but I’ve never needed to use this feature in 5+ years of using GoodSync.

    Note: I’m not aware that GoodSync can or does make full disk image backups. It only makes copies of individual files.

    Leo – Have you used GoodSync for making backup copies of files? If so, any thoughts / comments?

    • I have used similar programs, but not GoodSync itself, though I’ve heard good things about it. Seems like a good part of a backup strategy, though it’s no replacement for images.

  2. “10,000 files of 1 kilobyte each

    or…

    1 file of 10 gigabytes”

    I think that should be 10,000,000 files of 1 kB each (or 10,000 files of 1 MB each)

  3. I admire Leo’s thoroughness, but my own approach is similar to Martin’s. I have separate system and data drives. I do a full system disk image back-up to a partition on my large external hard drive from time to time, ideally once a month but in reality two or three times a year. The data drive was also originally copied as an image to another partition; subsequently, I update the larger folders on it as and when with a synchronisation program, at present FolderMatch. I also use Horizon Rollback to keep a snapshot of the system drive which makes for very easy restoration if the system does not boot, although it would not help in the event of the hard drive physically failing.
    Like the questioner, my experience leads me to be wary of relying on large compressed back-up files: from what I have seen, more often than not they don’t work when you need them. Furthermore, uncompressed back-ups offer more ease and flexibility in making customised incremental back-ups or extracting individual files and folders when needed.

  4. Part of my discipline is to perform a backup every Saturday morning. Two important financial files I back up to a thumb drive daily. Something else I always do with Macrium is enable verify. It only takes a very few minutes and reassures me that things are probably okay. Once, and only once, did I get a bad incremental. I immediately ran a full backup and all was well. Verify takes so little time that I consider it cheap insurance. It’s only my opinion, but I feel that verify is something everybody should do as habit.

  5. I don’t know if this is still true, but in the old days, backup programs would use added layers of error correction logic. That makes the backup file significantly LESS prone to data loss than a collection of plain files. If I have 100 files and I get one bad sector, the file that includes the sector is corrupted, possibly beyond recovery. If that same bad sector were to occur in a backup file with error correction, the error correction logic SHOULD be able to recover the “lost” data.

  6. Been using MirrorFolder for years. I started doing it (mirroring to a drive that I can install into the computer) when I noticed bad sectors on the disk. When the disk finally gives up the ghost, I swap in the mirror, fire up the computer, the bios remaps the drive, and boots the operating system. Then I get another drive and use MirrorFolder again. I’ve had more than a few disks go bad over the years and, after the first one, it took me three days to get everything back to “normal”. With MirrorFolder I’m up and running, and back to “normal” in about ten minutes. I am not affiliated with the company that makes MirrorFolder.

  7. I didn’t realize it, but I also use the layered approach. 1) I use Macrium Reflect Free Edition and make full image backups monthly for my 4 machines, saving the .mrimg file to my Synology DS-1812. 2) I don’t use My Documents because too many programs have overrun that directory, so I create a new folder called “Home” (aka C:\Users\\Home) and then create a “Home” library for Windows Explorer — I leave Pictures, Music, and Videos in the standard places, but I store all other personally created files and folders in Home, which 3) makes it easy to copy portions to Dropbox, or 4) use SyncToy or FreeFileSync to back up.

    Using this “Home” folder approach has been very satisfying, in that “Home” is the most critical thing that changes on a daily basis, so it gets the most attention on backups. If needed, just doing a daily backup, or syncing, of Home to my Synology is enough, and it’s guaranteed to be just MY stuff (no extra files and other program folders that have overrun the My Documents folder). I have lots of small files, sure, but total size of Home stays relatively small, so a full sync is quite fast.

    On a final note, my entire (software) solution cost me nothing. Macrium Reflect Free Edition, SyncToy, FreeFileSync, all are free. Granted, I did put quite a bit of money into the hardware side: my Synology DS-1812 has 8 drives, each is a 3TB Seagate … but I love my Synology (it does a lot more), so it was well worth it.

  8. I wonder if there’s a hidden trap there… download the pro backup software, do your backups, and then have to reinstall your OS and lose the license. Then you have to pay to get it back. Then again the free version is excellent and may do the job.

    • I’ve definitely never run into a situation like that. One way or another I’ve never had to pay twice for the same backup software across a reinstall.

  9. I’m the person who asked this question originally. I eventually worked out I had bad RAM – memtest x86 didn’t detect errors, HCI memtest did eventually – this is why corruption resistance is important to me. I agree with a lot of what Leo says, and I appreciate his thought and opinions. However I still think a single backup file increases your risk, mostly because most software is poorly written. If software was well written and tested to cope with errors then the risk would be much lower.

    I’ve done some research into software. Cobian Backup is a good piece of software, it’s free, and it can back up to either compressed image files (zip/7z) or do individual files. It does incremental and differential backups using folders. I’ve decided to use it for my uncompressed backups – I have many TB of RAW images and I’m more comfortable having these large files stored individually. Cobian is INCREDIBLY slow at compressing (5MB/sec on a fast computer, compared with 25MB/sec for other software), so large backups will be slow, especially the initial backup.

    For smaller data (documents, website archives, I guess anything compressible) I’ve decided to use AOMEI Backupper Free. It’s fast, it resists file corruption (I damaged the file with a hex editor and it coped reasonably well), and it’s flexible. You can do compression on a per backup set basis, not globally.

    I do mean to look into Reflect and TrueImage at some point in the near future as well.

    Incidentally I don’t consider mirroring to be backup, as if your original file is corrupted by user error, a virus, or cryptoware, if you don’t notice and run a mirror you lose your “backup”. I also think backups have to be in a different location from the computer to be effective, and anything permanently connected to a computer should be called a “copy” not a “backup”.

    • A mirror can be a backup. Dropbox and others like it, for example, keep deleted and overwritten files (they are not physically overwritten) for 30 days. Older files might be a problem, but that’s why it’s useful to have a few different backup methods in place.

  10. Like Tim, I use Cobian, too. But I suspect that I don’t utilise it to its full potential. I find the deleter very useful, too.

    One seeming pitfall is watching the back up drive[s] (2 x Raid 1Tb HDs) for space overload. Every so often, I have to go and delete a large number of very big files.

    However, having the back-up files in an easily program readable (original) format has proved a blessing, at times.

    Interesting and provocative thought that another computer on a network is not a real back up, but only a copy. I can’t disconnect mine, as it’s also a LAN and Internet server.

  11. Maybe I misunderstood you, but in the section “Individual file backups are a convenience,” you said that you were not aware of any backup program that would allow you to “just navigate with Windows Explorer and copy like you would any other file.”

    I know that you are currently evaluating EaseUs Todo Backup. Well, in your testing, you should try doing just that. I have plugged in my external hard drive, gone into Windows Explorer, navigated to my image backup and double clicked it. The image got mounted (or whatever the correct term is) and I could navigate all the files and folders from within Windows Explorer, as if I was navigating my hard drive. I found the file I wanted and copied and pasted the missing file. No need to run the backup program. This all happened within Windows Explorer, in much the same way that Windows handles ZIP files by default (unless you’ve gone and installed WinZip or something like it).

    P.S. I use the free version of EaseUS to make my images.

    • Mounting the backup as a virtual drive is a very common use of image backups. The convenience Leo is talking about is having the files exist as individual files so that it wouldn’t be necessary to mount the image backup to view or restore a file.

    • EaseUS still implements that as a single image file, no? What I meant was that you didn’t need to “mount” anything by double clicking it – just navigate around with Windows Explorer and the files are already there. (What you’re describing can be done in Macrium Reflect and other backup programs as well.)

  12. I rate the files I generate myself (documents, programs, presentations, experimental data) much more highly than system files.
    It takes time, and makes me swear, but I can reinstall windows, office etc in a few hours.
    I make backups of all my own files using the windows system program ROBOCOPY. It has the nice properties that it can retain create/modify/access date information, and can be set to only copy newer files: no need to recopy files already on the backup. The backup takes more space than a compressed file, but these days disk space is cheap. I write a script file so running the backup is easy. I put the script file on as a scheduled job and it happens automatically every few hours.
    ROBOCOPY stands for “Robust Copy” and it handles several system conditions (e.g. file in use) more gracefully than programs like XCOPY (which I used to use).

    • There’s a virus around that encrypts your files and demands money to restore them. If you run a mirror before you notice this happening, your “backup” then has the corrupted files. Incremental backup programs avoid this issue.

  13. I just lost my SSD and the hard drive I had backups on. I have old data backups of large data files, movies music etc… but it wasn’t an image file I was able to recover from. So I’m currently in the position of reinstalling everything, and now I’m compiling a game plan for having better, fuller and more secure data restore-ability.

    SyncToy is something I’ve used in the past and will soon set up for my current rig. It’s from MS and it’s free and makes copies of files and can sync them up if changed or updated between the two points, 1 point being the origin drive the other being the backup. But I’m only going to use this for very large files which really I would be able to recompile again if I were to suffer another data loss. I have the DVD’s/CD’s etc that the music/movie files originated, just in a closet.

    What I am going to image is a smaller, relatively, image backup of Windows/programs and other utilities I use after I’m fully updated with all the settings I like. It’s still going to be large, I figure about 150-200gb, but not nearly as big as the 1.2TB’s the big files take. I’m going to back up the “Bulk Data” only on two hard drives, one internal one external, and the image file I’m going to use a differential backup scheme and keep it on the internal HDD, the original SSD, the backup external SSD and I’m going to acquire enough cloud storage to hold the OS/Image/differential backups as well.

    I basically want multiple backup points to ensure I don’t go through this again. When you create the image file be sure to verify the file, that way if there is a corruption you can deal with it before you’re in desperate need. That’s when I’ll just copy the image to the multiple sources. The cloud being the most time consuming.
