I’ve been suffering from some minor disk corruption over the past couple of years. I ran every test I could think of. Eventually it turned out all four sticks of RAM, when used together, caused data corruption, but any two worked fine. I still can’t figure that out, but anyway, it’s replaced now and works fine.
However, I now have some corrupt files, including some Macrium Reflect disk images. Fortunately, even if the backup is corrupt you can still browse it to get individual files out, but you can’t restore the whole backup.
So my concern is that if you have one huge backup image (which for me could be 2TB+) or even one logical backup file split into say 1GB chunks it’s relatively easily to corrupt that single file/backup set. If each file (or folder, or some subset) is backed up individually corruption is likely to take out a small subset of your backups, not the whole backup.
Interested to hear your thoughts on this, and if you know any programs that can do something along these lines.
Actually I have several thoughts, but I’ll answer your last question first: no, I’m not aware of a backup program that works as you’ve outlined.
Your line of reasoning isn’t at all new; I get comments similar to this relatively often. Nor is the concept limited to just backups, believe it or not.
There are reasons individual file backups can be useful. However, while I could be wrong, I don’t think the solution you’re proposing actually solves the problem you think it does.
Many versus one
For simplicity’s sake, let’s describe the problem this way: you have 10 gigabytes of data. You can store it as:
- One 10-gigabyte file
- 10,000 files of one megabyte each
Each represents the exact same data. In one case, it’s stored in one massive file; in the other, it’s stored across a collection of 10,000 smaller files.
The question is, which is more resilient to corruption?
The answer is . . . it depends.
Resiliency is relative
While I take issue with your comment that “it’s relatively easily to corrupt” a file (it shouldn’t be, for a system working even moderately well), the fact is that a larger file represents a larger target. If a problem is going to occur somewhere in that 10 gigabytes, then with single-file storage it will, by definition, land inside the one 10-gigabyte file; with the many-file approach, it will land in only one of the 10,000 files.
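To put numbers on that, here’s a small Python sketch. It assumes the 10 gigabytes is laid out end to end and split into equal-sized files, and it picks an arbitrary offset for a single bad byte; the function name and the offset are made up for illustration.

```python
TOTAL_BYTES = 10 * 1000**3   # 10 GB of data, as in the example above
FILE_COUNT = 10_000          # the "many small files" layout

def file_hit(offset: int, file_count: int, total: int = TOTAL_BYTES) -> int:
    """Return the index of the file holding the byte at `offset`, assuming
    the data is split into `file_count` equal-sized files laid end to end."""
    file_size = total // file_count
    return offset // file_size

bad_offset = 7_123_456_789   # hypothetical location of one corrupted byte

print(file_hit(bad_offset, 1))            # 0: the one and only file is damaged
print(file_hit(bad_offset, FILE_COUNT))   # 7123: one file out of 10,000 is damaged
```

Either way it’s the same damaged byte. The layout changes how many files are marked as bad, not how much data was hit.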
What happens next depends on what that data represents – regardless of how it’s stored.
As you saw, Macrium Reflect was still able to extract individual files from a single .mrimg image file, even though there was corruption somewhere in that file that prevented it from attempting a full restore.
The exact same thing could still happen using your suggested “store it all as individual files” approach.
For the backup you’re looking for to be a true replacement for an image file, it needs to include much more than just the files on the system. A full-image backup also includes things like partition information, boot information, file system overhead, and more. That’s the stuff you need to ensure you can restore to an empty disk when the time comes.
And that’s the stuff that, if corrupted, would also prevent you from performing that restore – just as Macrium was unable to perform the restore.
But you can still get at all the individual uncorrupted files, just like you can with Macrium.
It’s not specific to backups
The reason I say “this isn’t specific to backups” is that backup image files are just one example of a larger collection of information being bundled into a single file.
For example, we regularly distribute software in zip files, or “.cab” cabinet files, or “.iso” image files, or “.msi” Microsoft Installer files, or many other so-called “archive” formats. Each takes a large number of files and combines them into a single, larger file. Depending on the file format and robustness of the specific tools being used, these files can be just as vulnerable to corruption as you experienced. In fact, a corruption anywhere, or at least in the wrong place, in these archive files can render them unreadable.
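A zip file makes this easy to try for yourself. The sketch below uses Python’s standard `zipfile` module to build a tiny, uncompressed archive in memory and then damages it in two different places. The file names and contents are invented for the demonstration, and other archive formats and tools may behave differently.

```python
import io
import zipfile

def build_archive() -> bytes:
    """Create a small, uncompressed zip archive in memory with two members."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("a.txt", b"A" * 1000)
        zf.writestr("b.txt", b"B" * 1000)
    return buf.getvalue()

def first_bad_member(data: bytes):
    """Return the name of the first member that fails its CRC check, or None."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.testzip()

good = build_archive()

# Case 1: flip one byte inside the stored contents of a.txt.
damaged_member = bytearray(good)
damaged_member[good.find(b"A" * 1000) + 10] ^= 0xFF
print(first_bad_member(bytes(damaged_member)))   # a.txt
with zipfile.ZipFile(io.BytesIO(bytes(damaged_member))) as zf:
    print(zf.read("b.txt") == b"B" * 1000)       # True: b.txt is still readable

# Case 2: wipe the signature of the end-of-central-directory record,
# which is what the reader uses to locate the archive's index.
damaged_index = bytearray(good)
eocd = good.rfind(b"PK\x05\x06")
damaged_index[eocd:eocd + 4] = b"\x00" * 4
try:
    zipfile.ZipFile(io.BytesIO(bytes(damaged_index)))
    archive_opened = True
except zipfile.BadZipFile:
    archive_opened = False
print(archive_opened)                            # False: the whole archive is rejected
```

Damage in one member’s data costs you that member. Damage to the bookkeeping costs you the archive, at least as far as this particular tool is concerned.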
Isn’t your hard disk just a file container as well?
Indeed, you don’t even have to be using image files or archive files or anything like that for corruption to render your entire hard disk instantly unreadable. Corruption in the wrong place on your hard disk can do exactly that. Hard disks and file systems are designed to be resilient – to be able to tolerate a certain amount of corruption before giving up completely – but there’s always a point where things can get bad enough that recovery isn’t possible.
This is why we back up in the first place.
What you’re looking for
Ultimately, what I believe you’re asking for is this:
Instead of collecting all the information into a single file, like Macrium’s .mrimg file, copy individual files as individual files, and then also include the overhead information as some kind of additional, “special” file that your backup software would be able to recognize and use come restore time.
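To make the idea concrete, here’s a minimal sketch in Python of what such a layout might look like. Everything in it is hypothetical: the function names, the `files/` folder, and the `manifest.json` “special” file are my own inventions, and the `overhead` dictionary is only a placeholder, since actually capturing partition and boot information is well beyond a few lines of script.

```python
import hashlib
import json
import shutil
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def backup_tree(source: Path, dest: Path, overhead: dict) -> Path:
    """Copy every file under `source` into `dest/files`, then write a
    manifest (the hypothetical "special" file) listing what was copied."""
    dest.mkdir(parents=True, exist_ok=True)
    entries = {}
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        target = dest / "files" / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)                  # keeps timestamps where possible
        entries[rel.as_posix()] = sha256_of(src)   # what the copy should hash to
    manifest = dest / "manifest.json"
    manifest.write_text(json.dumps({"overhead": overhead, "files": entries}, indent=2))
    return manifest
```

Calling something like `backup_tree(Path("C:/Data"), Path("E:/Backup"), {...})` would leave every file individually browsable, with one extra file that a restore tool would depend on.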
It’s totally possible, but as I said, I’m just not aware of a backup program that works this way. Perhaps someone will mention one in the comments to this article.
Why it doesn’t really solve the problem
In either scenario, it’s the same data, stored differently. If a file is corrupted, it’s corrupted, regardless of whether it’s inside a larger image file, or directly accessible on its own.
If the overhead information is corrupted, then the full-restore process is impossible, again regardless of whether it’s inside a larger image file, or directly accessible as a separate “special” file.
I honestly don’t believe that this buys you anything. Corruption is corruption, and if it happens in a benign place you may never notice, but if it happens in the wrong place then your entire backup could be invalidated.
Regardless of how it’s stored.
What I think you really want
Now, we’ve been talking about Macrium as if it is perfect software for creating full-image files.
It’s not. It’s good, but it’s not perfect.
If there were one thing I would change, it would be this: I would have Macrium – or any image backup program, for that matter – be significantly more resilient to image file corruption. I would have it try harder. I would have it offer to perform a “best effort” restore in the face of detected corruption, rather than just throwing up its digital hands and giving up.
But ultimately, the problems that could keep that from working are the same problems that would prevent the suggested comprehensive file-based backup from working.
In both cases, your uncorrupted files remain accessible, and in both cases a complete restore could be impossible.
Individual file backups are a convenience
I did say that individual file-based backups could be useful.
In short, when your backups are accessible in their original form, retrieving them is simple: you locate the backup of the file you want, and copy it back.
No need to fire up a backup program to retrieve a file, or even look to see what files are included in the backup; just navigate with Windows Explorer and copy like you would any other file.
None of the imaging backup programs I’m aware of do this, and I believe it’s for one reason: space. By collecting all the individual files into a single file, and then also compressing the collection, the space taken up by the resulting single image is significantly smaller than the space allocated to all the individual files on disk.
But it can be useful… which is why I do both – sort of.
My takeaway: layers
Clearly, backing up is important. Regardless of the situation, “stuff” happens. Rarely is it as insidious and long term as what you were experiencing, but … well, as I said, “stuff” happens, and sometimes it happens to the backups themselves.
That’s why I implement a layered approach:
- External hard drive
  - Monthly full-image backups (using Macrium Reflect, or similar)
  - Daily incremental image backups
- Near real-time individual file backups using an online service like Dropbox, OneDrive, or others
And while I don’t do it personally, there’s a very strong argument to be made that if your backup program has the ability to verify a backup after making it, that might be a good option to turn on, particularly in situations such as yours. (I don’t do it simply because I have so many backups, including backups of backups, that the additional time it takes just isn’t warranted.)
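If your backups include plain file copies, you can do a rough version of that verification yourself. The sketch below is not how Macrium or any other imaging program verifies its images; it’s simply a byte-for-byte comparison of a source folder against a copy, using Python’s standard `filecmp` module. The function name is made up for illustration.

```python
import filecmp
from pathlib import Path

def verify_copy(source: Path, backup: Path) -> list[str]:
    """Return the relative paths of files that are missing from `backup`
    or whose contents differ from the matching file under `source`."""
    problems = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        copy = backup / rel
        # shallow=False forces a byte-by-byte comparison rather than
        # trusting matching sizes and timestamps.
        if not copy.is_file() or not filecmp.cmp(src, copy, shallow=False):
            problems.append(rel.as_posix())
    return problems
```

An empty list means every file in the source has an identical copy in the backup. Keep in mind that it reads both sides in full, which is exactly the extra time that verification costs.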