
The Perils of Digital Archiving

61 comments on “The Perils of Digital Archiving”

  1. Leo, everything you put on hard drives is also destined to be lost. How many kinds of hard drives are already obsolete? Some from just a few years ago are no longer readable on current computers.
    • Indeed. This is why periodic forward migration (from old media – whatever that might be – to newer) is an important part of any long term plan.

  2. Leo, I think you are dead on about storage of digital information.
    One thing I have experienced with optical media is that after ten years or so it gets iffy as far as reading it goes. I live in Mexico, and I am sure my very limited climate control is a factor.
    I have progressed from floppy drives to CDs to DVDs, and at present I use Blu-ray discs for most storage. I also, of course, back up to multiple drives, though all of my backups are housed at my one physical location. As my house is built of brick and cement, roof included, fire is not a threat.
    Over the years I have experienced hard drive failures, but I had my photos and many important documents burned to optical media. I still lost things, but nothing earth-shaking to my life.
    The main problem I experience is accessing the huge number of things I have saved and backed up. Not that I don’t have them, but remembering what I have and simply scrolling through thousands and thousands of documents and photos makes it a daunting task at best.
    As I live in a country where it is legal to download movies and music at will, I have a huge collection, all burned to DVDs over the past fifteen or so years. My friend has made a spreadsheet for me, but once again the physical size of the disc collection makes it hard to find anything. Not sure what I will do with it all, actually. Maybe when I am gone my kids will want to sort through it. Mike in Mexico

  3. My concerns are threefold:

    1. I’m sure I have old files from my Atari word processor, spreadsheet and database that can’t be read any more; I no longer have either the machine or the programs to read them. Some I saved in .txt format, so I can still read those.

    2. That backup drives will no longer connect to the ports on a new 2035 PC, assuming that we are actually still using PCs then. I have old hard drives with various connections: IDE, SCSI, SATA, USB, etc. SCSI is pretty obsolete already.

    3. That the backup drives will seize up or be rendered useless due to magnetism, moisture, heat, cold, radiation or some other physical damage.

    • 1. There may be software out there to read the old file formats. Sometimes it’s possible to retrieve the contents of files while losing the formatting, but that also usually involves some after-the-fact clean up.

      2. Once again, this is an argument for always migrating data forward to current technologies on a regular basis. What’s current today will hold you for a while, but will not be current forever.

      3. If the data is in only one place, it’s not backed up. By that I mean if the failure of a single drive would cause you data loss, then you’re not properly backed up.
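Point 2 (migrating forward) has a quiet failure mode: a copy can come out damaged without any error message. One way to check, sketched here in Python purely as an illustration and not as any particular tool's method, is to compare a SHA-256 checksum of every original file with its new copy. The function names and folder paths are hypothetical. The old media is only read, never changed, so it can stay around as an extra copy in the spirit of point 3.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(old_root: Path, new_root: Path) -> list[Path]:
    """Copy every file under old_root to new_root, then verify each copy.

    Returns the relative paths whose copies did NOT match (ideally none).
    new_root should not be inside old_root.
    """
    mismatches = []
    for src in old_root.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(old_root)
        dst = new_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also carries over the modification time
        if sha256_of(src) != sha256_of(dst):
            mismatches.append(rel)
    return mismatches
```

For example, migrate(Path("E:/"), Path("F:/archive-2016")) would copy an old disc mounted at a hypothetical E: drive and return an empty list if every copy matched its original.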

  4. Before retirement, I was involved with document management (classification, storage, destruction, archiving). Strategic documents, those necessary for recovery from the severest of disasters (such as, but not limited to, all-out nuclear war, where high-energy electromagnetic pulses would immediately wipe out electronic devices and the ability to produce electricity), had to be archived in such a manner that little or no technology would be necessary to recover them.

    The requirement to be able to read such documents immediately after a worst-case scenario ruled out digital storage of strategic documents. Paper copies of original documents, held in multiple, separated, secure and “hardened” locations, were by far the most secure way of storing documents for long periods of time. Paper tightly packed in steel filing cabinets is very resistant to destruction by fire.

    A practical example of writing surviving for a long time is the Dead Sea Scrolls (written mostly on parchment, some on papyrus), which survived roughly 2,000 years in caves near the Dead Sea. All that is required to read their contents is knowledge of the language(s) in which they were written.

    • @Mike: Fire IS still a very real threat to the contents of brick-built houses! And if your main problem is “accessing the huge number of things I have saved and backed up”, perhaps you should adopt a uniform file-naming system (i.e. always add dates in the same format, such as 2016-11-01) and try to make the time to sort files into separate folders. Failing all that, you will have to rely on a search tool to find particular files :-/

      @Rob: Storing paper in separated secure and “hardened” locations, in steel filing cabinets may well offer the surest way of protecting the documents – if you have a HUGE house with an absolutely HUGE cellar. But finding and retrieving just one sheet of paper would still be a nightmare! LOL. I think I’ll stick to keeping 3 or 4 digital copies, spread over different kinds of storage media (including 2 that are stored off site)! LOL.
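The date-prefix idea works because YYYY-MM-DD dates sort alphabetically in the same order as they sort chronologically. A minimal Python sketch; the helper name dated_name and the sample files are hypothetical:

```python
from datetime import date
from pathlib import Path


def dated_name(filename: str, when: date) -> str:
    """Return filename prefixed with an ISO 8601 date, e.g. '2016-11-01 party.jpg'."""
    # Path(...).name drops any folder part, so only the file name is prefixed.
    return f"{when.isoformat()} {Path(filename).name}"


names = [
    dated_name("party.jpg", date(2016, 11, 1)),
    dated_name("wedding.jpg", date(2009, 6, 20)),
    dated_name("graduation.jpg", date(2012, 5, 30)),
]

# A plain alphabetical sort puts the files in chronological order:
for name in sorted(names):
    print(name)
# 2009-06-20 wedding.jpg
# 2012-05-30 graduation.jpg
# 2016-11-01 party.jpg
```

A day-first format such as 01-11-2016 would not sort this way, which is why the year goes first.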

  5. Hi Leo.
    This is a very interesting article.
    There is just one point I would love to see covered:
    – How do you keep track of all those individual pieces of information?
    I have a few boxes of floppies, and more or less outdated CDs/DVDs, but if I ever want to retrieve some particular documentation, how will I remember which is which, and which folder it resides in?
    The grandparents’ system of having a couple of shoeboxes of photos that we could skim through on festive occasions was a lot easier to handle. They did not have to fear the recording format going obsolete; at most, the photos might fade a bit over the decades.
    When will technology advance to the stage where we can directly decode the storage media like we could back then?
    – Please address my first question first.

  6. Ole K: “I have a few boxes of floppies, and more or less outdated CD/DVDs, but if ever I should want to retrieve some particular documentation, how will I be able to remember which is which, and in which folder it may reside?”

    It’s a very good point, and one I imagine that troubles the older ones among us more than those who are younger, who MAY not have quite such an extensive collection of files (certainly self-created documents). It can be a huge task to try to keep track of your files, because you have to keep cross-referencing everything.

    Take me, for instance: “somewhere” I have hundreds, perhaps several thousand, DOS files of my documents (I am a writer), which I would LIKE to be able to get back and “catalog” so I can find them when I want them. But since DOS file names are limited to 8+3 characters, the same names are bound to pop up again, referring to something else! At least with Windows long file names you can make them “more unique”. (Anyway, in my case, most of those files are in the WordStar format, with the hi-bit set, so they do not appear natively as text files; another problem, although I can VIEW them with DR.Com.)

    Back in the early 90’s I recall there was at least one programme that allowed you to “run” a floppy: it did a directory sort, so that you had a file of all the file names very quickly, and you would indicate what the floppy’s “name” was. You could process a lot of floppies very quickly this way. Now, all you can do at the DOS prompt is something like DIR /od > d:\Floppy1, which creates a directory list in file date order (Order-Date) and sends it to the D: drive as a file named “Floppy1”. Doing that yourself WILL build up a list, certainly, but then you have to join all the files together so that you can use the Search function in a document creation programme to hopefully find various possibilities for what you seek. But pretty soon that file will be huge… And if you wish to index even a “small” HDD… well, it could take you forever, if you also want to know where some old programmes are located…

    The point is, you will likely spend more of your LIFE either trying to find something, OR trying to index all that you have, than it may be worth. And even if your task is not so huge, and you figure you can handle it, you may soon be daunted by the realisation that in a few years’ time, as standards change, you may not even be able to READ all that “organisation” you spent so much of your life trying to create… so you then have to migrate everything…

    And THEN — even if you COULD… just WHEN are you really going to have time to READ all that interesting stuff you created some 20 years ago??

    “Getting a life” nowadays probably doesn’t allow for much looking backwards — especially in today’s increasingly hectic world…..
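About the WordStar files mentioned above: WordStar’s document mode is generally described as setting the high bit (0x80) on some characters, typically the last letter of a word and soft line breaks, which is why such files look garbled as plain text. A rough recovery, assuming that convention, is to clear the high bit of every byte and drop leftover control codes. This is a sketch, not a full WordStar converter: formatting commands are not interpreted, and the function name is hypothetical.

```python
def wordstar_to_text(raw: bytes) -> str:
    """Rough plain-text recovery from a WordStar document-mode file.

    Assumes WordStar's habit of setting the high bit (0x80) on some
    characters. Clearing that bit restores the letters; other control
    codes (print toggles, soft hyphens, ...) are dropped. Formatting is lost.
    """
    out = []
    for byte in raw:
        ch = byte & 0x7F                    # clear the high bit
        if ch == 0x1A:                      # Ctrl-Z: end-of-file marker in old CP/M and DOS files
            break
        if ch in (0x09, 0x0A, 0x0D) or 0x20 <= ch < 0x7F:
            out.append(chr(ch))             # keep tab, CR, LF and printable ASCII
    return "".join(out).replace("\r\n", "\n")
```

Usage would look like wordstar_to_text(open("CHAPTER1.DOC", "rb").read()), where the file name is only an example; expect some clean-up afterwards.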

  7. It’s not enough to preserve the data; you should also be thinking about retrieval. With digital cameras generating dozens to thousands of images per day, you’re piling up tens of thousands of images per year. Twenty years from now, it’s not just a matter of being able to read a file and decode the contents: how will you find the image that you want? SAVING digital data is trivial; making it retrievable takes far more work. In fact, the ease of saving digital information makes it harder to retrieve. In 1916, only important documents were saved: wedding licenses, diplomas, letters from the frontier. In 2016, backing up your phone every day means this week’s shopping list gets the same treatment as the photo of your child’s first step. And the photo of that first step is buried along with two dozen shots of your pants leg that you took by accident, and 600 shots of fall foliage that looked amazing in person but are utterly lackluster in the pictures. How do you find one image among 100,000?

  8. Mark
    My problem with most things is remembering that I have something after a few years of not using it. If I don’t remember I have it (and/or where it is), I may as well not have it.

    • That issue is known in computer circles as a user error, or PEBCAK: Problem Exists Between Chair And Keyboard ;-)

  9. I noticed a lot of the comments are about finding the data. You could use a document management system, where you don’t need to know the location of individual files explicitly. Of course, that is also subject to obsolescence, not to mention that it is a lot of work to get everything entered, categorized, linked to viewers, etc. For companies like marketing firms this is the subject of legitimate work-at-home data entry positions, but the same concern applies on a smaller scale at home.

  10. There are a number of free software programs that will scan and list the files from your media. The one I use is ‘JR Directory Printer’. You can save the file as well as print a copy.
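Whether it’s done with a DIR /od listing, as described in an earlier comment, or with a directory-printing tool, the idea is the same: write every file name into one searchable catalog, tagged with the label of the disc it lives on. A minimal Python sketch, assuming each disc or drive is mounted as a folder and that you label it yourself; the function names, CSV columns and labels are all hypothetical:

```python
import csv
from datetime import datetime
from pathlib import Path


def catalog_media(root: Path, label: str, catalog: Path) -> int:
    """Append one row per file on a mounted disc/drive to a CSV catalog.

    `label` is whatever is written on the disc, e.g. "Floppy1" or "DVD-01".
    Returns the number of files recorded.
    """
    new_file = not catalog.exists()
    count = 0
    with catalog.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["label", "path", "bytes", "modified"])
        for p in sorted(root.rglob("*")):
            if p.is_file():
                st = p.stat()
                modified = datetime.fromtimestamp(st.st_mtime).isoformat(timespec="seconds")
                writer.writerow([label, p.relative_to(root).as_posix(), st.st_size, modified])
                count += 1
    return count


def search_catalog(catalog: Path, text: str) -> list[tuple[str, str]]:
    """Return (label, path) for every catalogued file whose path contains `text`."""
    with catalog.open(newline="", encoding="utf-8") as f:
        return [(row["label"], row["path"]) for row in csv.DictReader(f)
                if text.lower() in row["path"].lower()]
```

Running catalog_media once per disc builds a single file, so a later search_catalog(catalog, "birthday") tells you which disc to pull off the shelf.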

  11. A very good program. Thanks.

    Your suggestions are appropriate for most people, but they don’t address some of the problems that archivists face, which should be noted and which your fans should keep in mind. So far as I know, there are only two reasonably certain methods of archiving information: (1) low-acid paper (think papyrus, which has a good track record of a couple thousand years), and (2) stable photographic film, ranging from microfilm/microfiche in 35, 70 or 105 mm sizes upward (which has a proven track record of less than 300 years). The point here is that such media will likely always be digitizable as a source document at whatever resolution is appropriate, a technology niche that seems to be ever improving in capacity and cost-effectiveness. The downside, of course, is that paper has to be stored in controlled conditions to be durable, and that creating an analog image from any film medium is expensive, though the result is presumably enduring. Neither of these approaches is likely to become a reasonable alternative for average people.

    Your insistence on updating files is excellent, though I wonder how many people are as dedicated as you. Somewhere in my attic are old CRM, 5-1/4 inch floppies and 3-1/2 inch floppies that I despair of ever using. Likewise I have both Betamax and VHS tapes that I’ll never watch again. It’s probably all to the good, I suppose.

    L

  12. When I read the title of the article I thought the issue of cyber attacks would be explored. Digital data stored in a drawer in my room is not vulnerable. But aren’t so many of us engaged in a growing relationship with, and dependency on, Dropbox, OneDrive, etc.? How secure is what I have stored in the cloud?

    • Cloud storage can be vulnerable to loss, but the greatest danger is having your account hacked or forgetting your password. Most large systems keep mirror backups in several locations, which should mitigate the risk of cyber-attack, but there is still a chance of that failing. That’s why the 3-2-1 backup strategy is recommended: 3 copies, on 2 kinds of media (cloud, external drives, etc.), with 1 copy off site (in the cloud or on an external drive away from your computer’s location).
      https://askleo.com/how_do_i_backup_my_computer/
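The 3-2-1 rule is concrete enough to check mechanically. A small Python sketch, assuming you describe each copy of your data by hand; the Copy fields, function name and example media are made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class Copy:
    where: str      # free-form description, e.g. "laptop SSD"
    media: str      # kind of media, e.g. "internal drive", "external drive", "cloud"
    offsite: bool   # kept away from the computer's location?


def check_321(copies: list[Copy]) -> list[str]:
    """List the ways a backup plan falls short of 3-2-1 (empty list = passes)."""
    problems = []
    if len(copies) < 3:
        problems.append(f"{len(copies)} copies in total; 3-2-1 asks for at least 3")
    if len({c.media for c in copies}) < 2:
        problems.append("fewer than 2 kinds of media")
    if not any(c.offsite for c in copies):
        problems.append("no copy is kept off site")
    return problems


plan = [
    Copy("laptop SSD", "internal drive", offsite=False),
    Copy("USB backup disk", "external drive", offsite=False),
    Copy("OneDrive", "cloud", offsite=True),
]
print(check_321(plan))       # []
print(check_321(plan[:2]))   # drop the cloud copy and two problems appear
```

Note that the "3 copies" include the working copy on your computer; a single drive with no backup fails all three tests.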

  13. I think that in the long, long term, digital information is doomed. The simple reason is that it cannot be read at all without an intermediary device of some sort. A piece of paper, or papyrus, from 3,000 years ago can still be read, if you can find one. Even better for hieroglyphics engraved on a stone. But if you were to discover the equivalent of a memory chip from a lost technology of 3,000 years ago, or possibly even as recently as 500 years ago, what would you do with it? Unless the reading technology had been passed down through the ages, you wouldn’t even know that it contained any data.

  14. Great intro to the issues that digital archivists have been facing for the last twenty years. The one challenge that often seems to be overlooked is preserving file attributes, particularly date and time stamps. If you are keeping a document and want it to be as genuine a copy as possible, then you need to be concerned with maintaining date created and date modified. Many Windows utilities don’t preserve either or both when copying to a new physical media, or invert created and modified. It would be useful if you could point out some tools that actually allow for proper “archiving”.
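On the timestamp question: Python’s shutil.copy2 copies a file together with its modification and access times, but Python’s standard library has no portable way to set a file’s creation time. A minimal sketch, assuming it is acceptable to record the original timestamps in a small JSON "sidecar" file next to each copy (a convention invented for this example, not a standard); the function name is hypothetical:

```python
import json
import shutil
from pathlib import Path


def archive_copy(src: Path, dst: Path) -> dict:
    """Copy src to dst, keep its modification time, and log original timestamps.

    shutil.copy2 carries over the modification and access times. The creation
    time cannot be set portably, so it is saved to a JSON sidecar instead.
    """
    st = src.stat()
    stamps = {
        "modified": st.st_mtime,
        # st_birthtime exists on macOS/BSD and (Python 3.12+) Windows; None elsewhere
        "created": getattr(st, "st_birthtime", None),
    }
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    if abs(dst.stat().st_mtime - st.st_mtime) > 2:   # 2 s allows for FAT's coarse timestamps
        raise RuntimeError(f"modification time not preserved for {dst}")
    sidecar = dst.with_name(dst.name + ".timestamps.json")
    sidecar.write_text(json.dumps(stamps))
    return stamps
```

The check after copying matters because, as the comment says, not every copying tool keeps these dates; failing loudly is better than silently archiving a copy with today’s date on it.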

  15. One day at a Google meeting I heard that you could compress all of the data since the beginning of mankind into a 1 m cube, but that to index it, the cube would have to be the size of Jupiter. That’s what Google does: it’s an index service.
    Another thing comes to mind: do you really want to save everything forever?
    What happens to all of your digital identity if you die or get killed suddenly? What is Facebook going to do with all those dead people?
    Are they going to have a digital cemetery?
    It’s a funny thing what time does to things. For instance, if you dig up the bones from a gravesite 2,000 years after the burial, you’re an archaeologist;
    if you do it within a couple of years, you’re in forensics; but if you do it after a day or two, you’re a ghoul.
    And if you do it within an hour, you are a pathologist. But if you go right up to the second after death, you’re the murderer!
    After all, isn’t that the true purpose of memory, to defeat time?
    We all strive for immortality,
    to live forever.

    • Thanks Jerry…*LIKE* your take…I personally have a one-week to one-month ‘existence’ in Gmail and Facebook, and one year for paper and sentimental paper…I love Love Love to delete delete delete. I am living life forward and have booked 2017 as my transition to retirement, to tidy my photos, papers and sentimental items, and now those of my deceased parents.

      I have completed a few online declutter courses and read endless articles about digital storage…and the only advice, other than back up, back up, back up, that turned it around for me was to choose a defined amount of storage and not go past it…therefore edit edit edit…your ‘space’ defines how much to keep.

      One option for me…re: photos and print is to think in decades…keep only the best of the best in a photo book with a hundred or so pages…a week to an opening so to speak. And when I retreat to my nursing home I can take my ten books with me. Doable? Paper shredder here I come.
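A fixed storage budget is easy to keep an eye on with a few lines of Python. This sketch assumes the photos you keep live under one folder; the function names, the folder name and the 32 GB figure in the usage line are hypothetical:

```python
from pathlib import Path


def folder_size(root: Path) -> int:
    """Total size, in bytes, of every file under root (subfolders included)."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def bytes_to_prune(root: Path, budget_bytes: int) -> int:
    """How much must be edited out to fit the budget (0 if it already fits)."""
    return max(0, folder_size(root) - budget_bytes)


# Hypothetical example: a 32 GB budget for a "Keepers" photo folder.
# print(bytes_to_prune(Path.home() / "Pictures" / "Keepers", 32 * 10**9))
```

The point is the one made above: the budget, not the number of files, decides how much gets kept.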

  16. I have a prime example of data lost due to obsolescence. Several years ago I was using Iomega 100 MB Zip disks to back up my data, mainly photos. Much to my disappointment, and through my own negligence, I recently discovered that Iomega stopped supporting operating systems after Windows 98. It appears I can no longer use my Zip drive to retrieve the data from my 100 MB Zip disks. There are several hard lessons here, mainly: stay current with your old data, and make multiple backups with different methods.

    • You might be able to get a Win 98 computer very cheap on Ebay or Craigslist. Then copy your data, back it up and donate the computer and Zip drive.

    • Just checked my Iomega Zip 250 and disks. Working okay. If memory serves there was an Iomega backup utility which may have been what you used. Will have a look later for the CD.

  17. I have a reference book printed in 1746 beside me which I refer to from time to time – I doubt that any digital media will be readable after 270 years.

    • Perhaps, but consider how many books that were printed then that no longer exist, having been lost through various reasons. They, too, are “unreadable”.

      • Not to mention that it also depends on the quality of the paper. Cheap paper will yellow and cheap inks will fade over time. Not all of the Dead Sea Scrolls are readable because the ink is faded/smudged. Some of them have holes in the middle where archaeologists had to make “guesses” as to what was missing.

  18. I never keep files on my computer. I keep them on an external hard drive. I back everything up to a separate hard drive with Macrium Reflect, and I disconnect it after each backup. I have all documents in Word, PDF or WordPerfect format.

  19. Thanks for all your great information, Leo. I really learn a lot from your articles, books & videos. Regarding this particular article: I too have gone through lots of changes in storage media, particularly for favorite/important documents and especially my photo collection. Mostly I store things on portable hard drives (having moved from 5 1/4 disks to 3 1/2 disks, to CDs/DVDs, to thumb drives, etc. over the years). I have not stored much in the cloud but will probably do more of this in the near future. My worry here is accessibility. We had an all-day internet outage here yesterday, and I could not get to some things stored in the cloud that I wanted to work on.

    I store my photos in .jpg format. I learned the hard way that storing them in a particular program’s format (like PowerPoint or Picasa) is a good way to lose photos. I think the .jpg will be around for a long time.

    One thing we need to remember is that 99% of the stuff we save (as physical things or digital things) as individuals is only important to us. When we die, nobody (including the kids) wants the stuff, and it will be dumped. As I’ve changed storage media over the years, I’ve also “pruned out” things that really no longer mattered to me. As someone pointed out, if you can’t find it, it is of no use to anyone. I think we tend to save way more than is necessary because, nowadays, we can. Digital storage, in past or present forms, is so compact that it invites us to store stuff. And, yes, digital cameras (especially those on cell phones) create more photographic junk than the old printed photos ever did. I never allow my phone to hold more than 30 photos before “pruning out” the unnecessary ones and downloading the others to my computer. I can’t believe friends with over a thousand photos on their phones; of course, they can’t find the photo they want to show me!

    • Exactly Teen…we store because we can…our stuff is defining our space rather than our space defining our stuff…outsourcing all the time.

      I can’t tell you the number of times I had to wait…and wait…for someone to find a photo, that truly only interested them, and then they give up when they can’t find it…thirty sounds like a fine number.

      Thanks for the .jpg tip.

  20. When, many years ago, I purchased my first computer, it came with a 5.25-inch floppy disk containing all the software loaded on the machine…but no 5.25-inch drive!
    It was never an issue, but later, when I replaced another computer from Dell, I had to specify the addition of a floppy drive (I think they were 3.5-inch), because the drive was no longer standard on new computers and I had all my financial and correspondence backups on a collection of such disks.
    Of course I promptly transferred the data from those disks and stopped using the drive, and threw away the substantial collection of disks that came with PC magazines offering free software.
    How would I read such a disk now? And with all my ‘recent’ music on CD, I can no longer play any of the tapes I still have, or the old 45 and 33 1/3 records I still treasure but never play.
    Moving house recently, I also realised that I still had a significant number of VCR tapes, but I dumped the lot because DVD has replaced them and left me with no means to play tapes.
    How long before I have to dump my CDs and DVDs!
    Life goes on, and not just in the computer world!!

  21. The basic requirements of data archiving really haven’t changed, whether the medium is paper or digital. First is TCO, total cost of ownership. While there is a cost for the medium (paper or digital), more significant are the storage and retrieval costs. These costs are often buried and overlooked. While it is clear that digital technology is changing rapidly and data must be migrated to updated media and interpretation software, the same is true, albeit at a slower pace, for paper-based data. We had a paper document written in a historical German script. Although we could photocopy the paper, we needed the services of a German history professor to read it. The best approach is to estimate how long the data will be needed and its monetary value over that time. Archiving is only suitable if the data’s TCO is less than the value of the data over that time.
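The TCO test in this comment can be written out as a small calculation. The cost categories below (media, yearly storage, periodic migration, retrieval) follow the comment; every number in the example is invented for illustration, and a real estimate would need your own figures. The function names are hypothetical.

```python
def total_cost_of_ownership(media: float, storage_per_year: float, years: int,
                            migration_cost: float, migrate_every_years: int,
                            retrievals: int, cost_per_retrieval: float) -> float:
    """Rough lifetime cost of keeping an archive (all inputs are estimates)."""
    migrations = years // migrate_every_years   # one migration per full interval
    return (media
            + storage_per_year * years
            + migration_cost * migrations
            + retrievals * cost_per_retrieval)


def worth_archiving(value_over_period: float, tco: float) -> bool:
    """The comment's rule: archive only if TCO is less than the data's value."""
    return tco < value_over_period


# Invented example: keep data 10 years, migrating to new media every 5 years.
tco = total_cost_of_ownership(media=100, storage_per_year=20, years=10,
                              migration_cost=50, migrate_every_years=5,
                              retrievals=4, cost_per_retrieval=25)
print(tco)                          # 500
print(worth_archiving(800, tco))    # True
print(worth_archiving(400, tco))    # False
```

Here the media itself is only a fifth of the total; storage, migration and retrieval make up the rest, which is the comment’s point about buried costs.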

  22. I am not really concerned with the potential problem of “will my data be readable 100 years from now”. In my opinion, the question is moot, as in “Who will give a sh..t about my data a hundred years from now?” As long as I am still around, I do what Leo suggests: save forward onto newer media. I have digitized most of my old 35 millimeter slides, and hundreds of photographs. They are stored in chronological order, with a spreadsheet, keyed by file name, that records the background of each one. My current writings, receipts, transactions and documents are digitized to the extent possible. I do this because I got tired of storing old statements that maybe once in a blue moon need to be found. This is for my immediate purposes, and maybe, I hope, for my children. After that, it would be up to them to “save forward” onto new media. With digital it is possible. With hard copy, such as prints and slides, well, after a time they may be useless or gone to a city dump. So, for now, I will go with digitizing. Once I am gone, it will not matter.

  23. This is a very difficult problem, and I think the inescapable conclusion is that much of our digital data will be lost. Historically, the only requirements were that the thing itself survived the ages and that people in the future could figure out a way to interpret the data. A book hand-written by a monk in the 11th century can be read by scholars today as long as the book itself has survived. That data never had to be re-written in new formats along the way. With digital data, the key to preservation is the periodic copying of that data to whatever is the current format. Who exactly is in charge of that job? The creator? The publisher? The reader? …and how often is “periodic”? Every 5 years? Every 10 years? Will that job really be assigned and carried out on these cycles for 200 years? 500 years? I just don’t see that as a realistic assumption. So much is going to be lost because of the decentralization of storage and the lack of assigned responsibility. As far as individual personal data goes, it’s probably no different than in history: whatever survives will depend on your heirs. I suspect that much of this same kind of data from history is also long lost, for the same reason much of our current personal data will be lost…lack of upkeep by those who come after us.

    • Some are talking of how past information has come to us via books etc. Actually, the ratio of data lost to data preserved over the ages which was contained in books, pottery etc. is extremely high. Very little remains, enough to fit in a few museums. The entire Library of Alexandria was burned to the ground.

      Considering the technology we have, and the small space it takes to contain the data, I expect there will be much more data preserved 4,000 years from now than was preserved over the past 4,000 years. Much will be up-cycled to newer media, while a lot of the purely personal stuff will probably be lost. Large companies have vaults in caves to preserve their backups, much of which are on tape. Even with changing formats future archaeologists with their advanced technologies should be able to easily decode the contents of tapes and probably hard drives or SSDs or whatever future storage method is devised.

      • “Considering the technology we have, and the small space it takes to contain the data, I expect there will be much more data preserved 4,000 years from now than was preserved over the past 4,000 years. ” – Unless we run out of capacity, of course!

        http://www.dailymail.co.uk/sciencetech/article-2883722/Be-careful-save-World-run-computer-hard-drive-space-2020-expert-warns.html

        I don’t believe that we will run out of capacity – I’m sure new storage technologies will come along before that happens – but the numbers are certainly interesting. According to IBM, we create 2.5 quintillion bytes of data every day – so much that 90% of all the data that exists has been created in the last 2 years. Surveillance cameras alone – which are constantly increasing in number – create enough data each and every day to fill 11.3 million double-layer Blu-ray disks. And the rate at which we as individuals create data continues to increase exponentially as camera/video resolutions get ever better. And then we further increase our storage needs by backing all our stuff up using the good ol’ 3-2-1 strategy, as well as further duplicating it by sharing it on Facebook, etc – which in turn back their/your data up.
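To put the comment’s figures side by side: assuming a double-layer Blu-ray disc holds 50 GB (decimal gigabytes, 50 × 10^9 bytes) and reading "2.5 quintillion bytes" as 2.5 × 10^18 bytes, the numbers work out as below. This only converts the quoted figures; it doesn’t verify them.

```python
BYTES_PER_DAY = 2.5e18               # "2.5 quintillion bytes" per day, as quoted
BLU_RAY_DL_BYTES = 50e9              # double-layer Blu-ray capacity, 50 GB
SURVEILLANCE_DISCS_PER_DAY = 11.3e6  # "11.3 million" discs per day, as quoted

discs_for_all_data = BYTES_PER_DAY / BLU_RAY_DL_BYTES
surveillance_bytes = SURVEILLANCE_DISCS_PER_DAY * BLU_RAY_DL_BYTES
surveillance_share = surveillance_bytes / BYTES_PER_DAY

print(f"All data per day: {discs_for_all_data:,.0f} discs")          # 50,000,000 discs
print(f"Surveillance per day: {surveillance_bytes / 1e15:,.0f} PB")  # 565 PB
print(f"Surveillance share: {surveillance_share:.0%}")               # 23%
```

In other words, by these figures surveillance cameras alone would account for roughly a fifth of everything created each day.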

      • It just dawned on me that so much information will survive for thousands of years that people then will be inundated and overwhelmed by it all; the problem won’t be preservation, but interpretation. Not determining which words the text is composed of, but which words have value for them. Technology is always evolving, so this shouldn’t be a problem as Artificial Intelligence evolves.

        • “So much information will survive thousands of years from now….” – And 95% of that information will be cat videos. Googols and googols of cat videos. All duplicated across multiple platforms – from Facebook to YouTube – and safely backed up for posterity in multiple geographically redundant data centres.

          • You can haz as many backups as you likes.

            It’s interesting how the numbers increase. You start with one of your books that weighs in at, say, 10MB. It’s downloaded by 1,000 people, and suddenly it’s eating up 10,000MB of disk space. Each of those people follows the 3-2-1 rule for backup, and the number increases to 20,000MB. Actually, it increases to 30,000 because each buyer backed up online, and the online backup company has a backup of all those backups. So, a little 10MB book ends up occupying 30GB of disk space. And that doesn’t even include all the other ways it may be shared/duplicated.
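The arithmetic here is a straight multiplication. A tiny sketch using the comment’s own numbers (a 10 MB book, 1,000 downloads, 3 copies per buyer), with decimal units so that 1,000 MB = 1 GB; the function name is hypothetical:

```python
def footprint_mb(size_mb: float, people: int, copies_each: int) -> float:
    """Total storage used when each of `people` keeps `copies_each` copies."""
    return size_mb * people * copies_each


print(footprint_mb(10, 1_000, 1))          # 10000 (MB): one copy per buyer
print(footprint_mb(10, 1_000, 3) / 1_000)  # 30.0 (GB): three copies per buyer
```

Every extra layer (a backup service backing up its own customers, shared copies on social media) just adds another multiplier.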

        • I understand the ancient Egyptians worshiped cats.
          We’ll likely come full circle. Future Egyptians will find our 3,000-year-old cat videos and conclude that we worship cats.

  24. Books. Books in any language. They were nice to have and hold, nice to peruse at the library, nice to borrow. Then the library system underwent political attack and our access was reduced. The library underwent religious and school-appropriate attacks and our access was reduced. Then libraries were supplanted by commercial stores, then those stores by online stores, then online stores honed selection by selling hierarchies. Eventually our access to knowledge–musical, mathematical, social, medical, scientific (you get the picture) gets bombarded and reduced to what some nebulous Leo out there in Delta Control Central says is relevant (or “all a lowly serf can handle”). So, everything becomes smaller. Have you noticed? Brains, imaginations, in-depth and meaningful information, manners, poetics, styles, political parties, representative choices, even color selections. (Take a gander at the colors on the cars in your town.) Now we all just fit on a blade that suggests what we want and what we should learn. But who controls the programming that controls the blade? And there you have it–incontinence.
    If you can’t put your hands on it, if electricity is the key, if an entity can send binary in and rob you of your privacy and chase you with buy orders, then there is no diaper so secure that your dryness and comfort can be assured. Be prepared for wetness, rash and to lose whatever you entrust to digital security. It’s not a muscle you control; it’s not a pocket.
    For my part, I run backups every hour on the hour and still have not been able to stem the mysterious losses that occur two or three times a year. But I have no problem putting my hands on my first book, my first record, or my first sheet of music. I have to subscribe to borrow a writing program. If I don’t my access to my work is severely hampered. How much progress have we made?
    There is no substitute for the comfort of dry pants, and none for actual ownership.

  25. I am not that computer-smart, but I’m getting better thanks to you and others who helped me when I worked. Now that I’m retired, I depend on the internet for help. I do have 3.5-inch disks of pictures of the children that, for the life of me, I cannot download. PS: I learned not to format any more disks; it wipes them out. LOL

    • There are inexpensive (as low as $10) 3.5-inch floppy disk drives that should be able to read those, as long as the disks are still OK.

      • You forgot to mention that most newer computer don’t have a floppy drive connector, so he should look for an external USB drive.
        I checked Amazon.com and found a lot of them (even one for the 720kb disks).

        • Good point. I neglected to mention USB-connected drives because when I searched for floppy drives, the first page on Amazon only showed me USB floppy drives. But I may have typed USB floppy drives into the search. :-)

  26. There are professional archives out there whose job it is to take care of all of this. I used to work for one that archives social science data. When I left, we could still retrieve data from punched cards, in addition to floppies, etc. It was our job to back up, to research which storage formats to use, AND to keep all the metadata about the files and groups of files.

    If documents are actually going to be important in the future, someone can be paid to take care of them.

    My guess would be that few individuals have very many documents that are going to be regarded as interesting in the future, let alone important. And, as others have said — I’ll be gone; it’s not my concern.

    One thing I’d suggest for anyone with written documents they regard as important: save a copy in plain text. My guess would be that .jpg files will be around a lot longer than word-processor-format files. My dad had a long mailing list stored in an obscure database format that his newer software could not read. We saved it out in “comma-separated values” (CSV) format, which the ancient database program could produce, and hey presto, the new software could read it.

    My advice for anyone concerned about this is: “Save a copy in the most general/generic format possible. No proprietary formats. Save it all on hard drives. Multiple up-to-date hard drives, with at least one hard drive off-site (mine is in the safe-deposit box at the bank).” It would take a group of disasters to cause me a total loss of pictures, financial documents, etc., and I suspect that if that group of disasters were to happen, the loss of my pictures would not be one of my significant problems.
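To make the “most generic format possible” advice concrete, here is a minimal Python sketch that writes a hypothetical mailing list out as CSV using only the standard library’s `csv` module, then reads it back the way any later program could. The field names and records are made up for illustration; the point is that the result is plain text, and fields containing commas are quoted automatically.

```python
import csv
import io

# Hypothetical mailing-list records, standing in for data exported from an old program.
contacts = [
    {"name": "Ada Lovelace", "street": "12 St James's Square", "city": "London"},
    {"name": "Grace Hopper", "street": "1 Navy Way, Apt 3", "city": "Arlington"},
]

def to_csv(records, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)  # fields containing commas are quoted automatically
    return buffer.getvalue()

def from_csv(text):
    """Read CSV text back into a list of dicts, as any later program could."""
    return list(csv.DictReader(io.StringIO(text)))

text = to_csv(contacts, ["name", "street", "city"])
print(text)
```

To save to disk instead, open the file with `newline=""`, as the `csv` module documentation recommends, and pass that file object to `csv.DictWriter`.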

  27. This excellent article deals with file formats and the importance of data backup. It also mentions one particular backup application, Macrium Reflect, which can make backup archives (or copies) of whole hard disks and/or partitions or, should the user prefer, backup copies of individual files. Another application of this kind, also often mentioned by Leo (but not in this particular article), is EaseUS Todo Backup, which also comes in free and paid versions. I am sure there are many other backup applications available (various applications from TeraByte Unlimited and Acronis come to mind).

    Also, I seem to remember reading another article discussing this type of problem around 8 or 10 years ago (possibly on Photo.net). It was primarily concerned with the various proprietary RAW image formats created by different digital camera manufacturers, and how one of those manufacturers refused to make available any means of reading its format other than its own proprietary software, which could not be activated unless the user was a verified purchaser of one of its camera models. (If I remember correctly, the software could only be activated by entering the serial number of the camera used to create the original image, but my recollection on this point is now rather hazy.) I also seem to remember that some third-party photo-editing software producers, such as Adobe, were in dispute with that manufacturer, because their applications (such as Photoshop) could not edit such files without a proprietary plugin for the RAW format in question, for which a licence fee would have to be paid. I have no idea whether this dispute was ever resolved.

    My problem/question is this:

    As far as I am aware, each of these backup applications stores the backed-up data in its own proprietary format, and I would be very surprised to discover that any of those formats could be read by any application other than the one used to create the original backup archive. (For example, I know for a fact that backup archives created by Macrium Reflect v6 cannot be read by Macrium Reflect v5 or earlier.) Archives created by older versions can be read by newer versions of the same software, but not the other way around.

    Therefore, not only do we need to be concerned with the readability of the backed-up files: we also need to be concerned with the readability of the backup archives themselves.

    Furthermore, the same applies to archiving/compression formats such as ZIP or RAR. (For example, the original ZIP format was limited to 65,535 files per archive; the newer Zip64 extensions raised that limit.)

    What can we do in the (admittedly highly unlikely) event that the ZIP or RAR formats themselves were to become obsolete at some time in the future?

    I really feel that this could be an even bigger problem than the formats of the backed-up individual files.

    That’s my 2¢ worth, anyway. :-)

    • I consider backups and archiving two different things, so I wouldn’t count on true long-term access to backups. Fortunately, backup programs, while not forward compatible, are usually backward compatible. That means the backups you create today with Macrium v6 (for example) will in all likelihood be readable by v7 and beyond, to some limit, I’m sure.

      The way you’re asking the question actually answers it, though. Don’t use proprietary formats for long term archival of data. Use common (VERY common) formats whenever possible. If you must collect files into other files, I expect “zip” files will be readable for decades if not centuries, just because they are so common. Macrium’s “.mrimg” (and Acronis’ “.tib” and EaseUS’s “.iforget”) not so much.

    • You are absolutely right that each backup program makes its own file type that can only be restored with that particular software. Backup programs are not meant to be archival storage. They are really for much more immediate use, such as recovering from a computer crash. As some people point out in the comments on this article, not every single thing we own needs to be archived. But it’s certainly good to be thinking about all the problems that archiving involves.

  28. Zip files are so ubiquitous that many programs can read inside them almost as if they were plain text. Google, for example, doesn’t allow executable files to be sent as attachments; if you try to hide the file in a zip file, it still blocks it. File Explorer also unpacks zip files. I’ve always recommended that people use zip files for universal compatibility.
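For anyone following the advice to collect files into zip archives, here is a minimal Python sketch using the standard library’s `zipfile` module: it bundles files into an ordinary deflate-compressed zip and then checks it with `ZipFile.testzip()`, which reads every member and returns the name of the first one whose CRC check fails (or `None` if all pass). The file names and contents are hypothetical. Python’s `zipfile` writes the standard format (with Zip64 extensions allowed by default for very large archives), so the result should open in File Explorer or any other zip tool.

```python
import os
import tempfile
import zipfile

def make_archive(zip_path, files):
    """Bundle files into a standard, deflate-compressed zip archive."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Store each file under its base name only, not its full path.
            zf.write(path, arcname=os.path.basename(path))

def verify_archive(zip_path):
    """Return None if every member's CRC checks out, else the first bad name."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.testzip()

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical example files to archive.
    paths = []
    for name, body in [("letter.txt", "Dear family..."), ("notes.txt", "Cornwall, 1992")]:
        path = os.path.join(folder, name)
        with open(path, "w", encoding="utf-8") as f:
            f.write(body)
        paths.append(path)

    archive = os.path.join(folder, "family.zip")
    make_archive(archive, paths)
    result = verify_archive(archive)  # None means no CRC errors were found
    with zipfile.ZipFile(archive) as zf:
        members = zf.namelist()

print(result, members)
```

Running the verification step periodically on an archived copy is a cheap way to notice damage while another good copy still exists.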

  29. I agree with what you said about the potential problems with digital archiving and how to work around them. However, two aspects of this weren’t covered. First, the reliability and durability of the actual storage media is a real threat to the retention of data. No one really knows whether a CD, DVD, flash drive, or even a hard drive will be able to keep its data for 50 or 100 years. Really long-term data retention on the media we have available today is a real question mark. Because of this, someone, only half in jest, suggested that anyone who wanted to pass his or her data on to people 1000 years from now should use an electron beam writer to burn it onto a piece of chrome steel.
    The other potential problem is a bit more esoteric. We have no way of knowing what kinds of technologies we will develop in the next hundred years to do our data processing. If progress in the IT field moves us completely away from our current methods of computing and data storage, it’s quite possible that in 100 years we will have no means of reading any of our existing media, since the entire paradigm of computing will have made such storage systems completely obsolete. As an off-the-wall example, suppose we move to artificial biological devices, scalable to any capacity we want, to do our computing. In that event, any kind of mechanical storage medium we pass on will most likely be unreadable by our descendants, since mechanical data storage and retrieval will have been rendered totally unnecessary by then. Granted, this possibility might sound far-fetched now, but we have no way of knowing what computing in the next century will be like.

  30. Here’s a historical case going back to 1930, the latest event being last week (November 2016).
    My Grandfather from 1930 to 1940 had a cine camera and took much footage of my Grandmother, and my Dad, Uncle and Aunt as children. Places featured include London, North Wales, South of France, and Switzerland in summer and winter.
    In the 1980s my Uncle found these and had them converted to VHS tape and gave copies to my Dad and Aunt.
    A couple of months ago I discovered my now deceased Dad’s tape and took it to my local camera shop who copied it all onto DVDs.
    I put these DVDs into my laptop and copied the .VOB files onto it. Using FormatFactory, I converted the .VOB files to .MP4 format and edited them into clips, naming and dating them.
    I followed the same process with the recently found cine film my Dad, Uncle and Aunt took of their children (me, my sister and our cousins).
    And I’ve done the same with VHS tapes I took of a week-long family gathering (there were 30 of us!) in Cornwall from 1992.
    ALL of the above have now been copied to USB sticks (“thumbdrives” in the USA) and just last week distributed (by post!) to my sister and our cousins, who will make copies for their many kids to have.
    My guess is that as each significant new method of media capture, playback and storage comes along, the previous format will be copied up to it, probably ad infinitum, if the generations are prepared to do the work!

  31. Congratulations, Leo, for bringing up a basic but very important and vexed issue of digital archiving. In fact, it is surprising that even at this advanced stage of digital technology, computer scientists and engineers have not come up with an answer to this challenge. As a layman, I would suggest that the digital technology industry come up with a common/unified standard wherein all kinds of digital files, whether simple text or multimedia, can be backed up into and retrieved from a common/unified code or format (one for each type of file, i.e. text, image, audio, video, etc.) using a special tool, independent of the software that generated them in the first place. Let me call data stored in such a common/unified code or format “absolute data” or “eternal data”. No matter what types of media, formats or software are developed in future, the format of “absolute/eternal data” should be kept essentially the same. The digital technology industry should then start certifying all future hardware, software and technology capable of reading such data as “absolute/eternal data compliant”!! Just a thought!

  32. I have just watched your video about backups (BUP) that will last over time, and I agree with all your points about formats, etc.

    Like you, I suspect, my wife and I have gone digital for almost everything, like, say, credit cards, and I usually OCR them. This has the great advantage that I can search them. All our backups are to HDDs.

    However, not all is going well. The problem is digital copies becoming corrupt.

    As is recommended, we create BUPs, and then BUPs of the BUPs to store offsite (with a relative). These BUPs are huge, currently about 3 TB, as so much data has been created over the years. We try to keep an older “archive” BUP, but we only have one or two “archives” at any point in time. So, what is the issue?

    When we need to recover a file from an “archive”, we are increasingly finding corrupt files. This is a showstopper, and it occurs mainly in .exe and/or .rar files. I use a legal copy of WinRAR and always choose its older, less compressed formats.

    Sometimes a file has clearly been corrupted for some time; after all, the reason I visit my BUPs is usually that a file is corrupted or lost. All storage is on mechanical HDDs; SSDs are used as boot (C:\) devices, and because something always creeps back into C:\Users, I back that up too.

    What am I asking? Is it the quality of the HDDs (I use WD Essentials or similar)? Or is there perhaps a utility that could work away in the background, testing each file?

    I’m sure someone can help with this issue. It must be common in the workplace, but those places can afford more “archive” BUPs than I can.

    I’m looking for someone to help.
    Regards,
    Doug
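One common approach to the question of a utility that tests each file is a checksum manifest: record a hash of every file while the backup is known to be good, then recompute and compare later. Here is a minimal Python sketch using only the standard library (`hashlib`, `pathlib`, `tempfile`); the folder and file names are hypothetical, and the manifest is kept as an in-memory dict for illustration. It can detect corruption but not repair it, so it complements keeping multiple backup copies rather than replacing them.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files needn't fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hash."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def verify(root, manifest):
    """Compare current files with an earlier manifest; return (changed, missing)."""
    current = build_manifest(root)
    changed = [name for name, h in manifest.items()
               if name in current and current[name] != h]
    missing = [name for name in manifest if name not in current]
    return changed, missing

with tempfile.TemporaryDirectory() as backup:
    # Hypothetical backup folder with two files.
    (Path(backup) / "photos").mkdir()
    (Path(backup) / "photos" / "1992.jpg").write_bytes(b"\xff\xd8 fake image data")
    (Path(backup) / "taxes.rar").write_bytes(b"Rar! fake archive data")

    manifest = build_manifest(backup)  # taken while the files are known to be good

    # Simulate silent corruption of one file.
    (Path(backup) / "taxes.rar").write_bytes(b"Rar! fXke archive data")
    changed, missing = verify(backup, manifest)

print("changed:", changed, "missing:", missing)
```

In practice the manifest would be saved alongside each backup copy (for example with `json.dump`) and the check run periodically; a mismatch tells you which files to restore from another copy before that copy is lost too.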

