Hi, everyone. Leo Notenboom for askleo.com. By now you know, or you certainly should know, I’m a big fan of the digital. I’m a big fan of digital photography, digital video, digital document retention, digital … everything, basically.
And I say that because one of the big lessons that I try to share with people is that digital is significantly easier to back up than just about any other format you can think of. If you’ve got a piece of paper, that’s fantastic; it’s the original, but if anything happens to that one piece of paper, you no longer have your original. Anything else will be either a copy (a lower quality copy) or just not exist, which is what we find happening way too often when there are things like fires and such.
So, like I said, digital is so trivial to copy that it makes backing up realistic. It makes it possible; it makes it downright easy. There’s simply no reason, with a proper backup strategy, that you would ever lose a digital document. Or is there?
So, one of the arguments against digital document retention in particular, though it also applies to digital music, photography and video, is that things change over time. How do we know today that the documents we’re storing, the videos that I’m creating, the audio that’s getting produced are being saved in a format that will be understood and recognizable ten years from now, 50 years from now, 100 years from now? We don’t.
The issue is, also, with respect to physical media. However I store this video or these documents, how do I know that ten years from now, 50 years from now, 100 years from now, people will have any way of actually physically retrieving the document off the media that I chose? If you’ve got old floppy disks lying around your home, you’re facing this already.
You’ve got documents or files on those floppy disks and there’s a good chance you no longer have a machine that could read them. There are currently still alternatives; there are ways to get the documents off of those floppy disks, but are they gonna be here ten years from now? I don’t know. I can’t tell you. We’re also seeing this with the very slow disappearance of things like the CD-ROM drive or the DVD drive from laptops and occasionally even desktop machines.
We’re now all relying on ubiquitous connectivity to serve the same function. You don’t necessarily get your programs installed from a DVD; you actually download them from an online source. It’s an assumption that a lot of manufacturers love to make because making something available online is a heckuva lot cheaper than physically producing DVDs, boxing them up, shipping them out and doing whatever they do to get the product in your hands.
Again, how do we know that 10, 50, or 100 years from now, things like CDs, DVDs, whatever follows them are still going to be around? The answer ultimately is that we don’t. We don’t know that those will be around. We can make some assumptions, some very broad assumptions about digital formats.
For example, I think there’s a very high likelihood that PDF files will be readable 100 years from now. They may be considered old and arcane, but the fact is there is so much information being preserved and presented in PDF format today that it seems unlikely that all that would be discarded so readily by something as simple as not being able to understand and display the format.
I think that’s going to be around for a while. JPEG files – around for a while; MP3 files – all your music, that’s going to be around for a while. Will there be better, newer, higher quality alternatives in the future? Very likely. But will support for these formats, “ancient” at that time, go away? I suspect not.
There are still issues. For example, what if you’ve got a document in a more arcane format? Something that is less popular today? Maybe a word processor original document that is no longer something that is popular; no longer something that is supported. Maybe it’s a Works document. I’ll just throw that out there for example.
What are the chances that a .wks document will be readable in its native format 100 years from now? I’m going to call that one a coin toss because we really don’t know. And coin tosses aren’t what we want to rely on for digital archiving for long-term preservation.
What true digital archives are faced with doing is two things: understanding the formats that they have, and making sure that as new formats arise, documents are migrated into those new formats. So, for example, right now, this video’s being recorded in .mp4 format.
Will that be around 100 years from now? Again, I suspect so but let’s say that it’s risky; let’s say it’s riskier than my assumption would have us believe. It would behoove archivists of the future to do the work; to translate, automatically, this format into whatever the appropriate format of the future would be.
Presumably it would be higher quality; there would be no image quality loss, but that’s a step that would need to be taken. Again, it could be taken automatically, but it’s something that digital archivists need to think about. It’s something that a lot of people need to think about.
The same is true for physical formats. I no longer have floppy disks. Why? Because I copied all of the content off the floppy disks that I wanted to retain to hard disks. Some time ago actually. So, I may not have a floppy disk reader (I think I do but I don’t think I can use it anymore). But even if I didn’t, that’s okay because I’ve migrated all that data.
I’m doing the same thing with my CD archives. My archives of documents and backups that I care to keep for a long time. I’m actually very slowly copying off the contents of those CDs on to what are now significantly larger hard disks that have no problem with the capacity issues.
Today, it’s a simple copy operation. Tomorrow, who knows what it will be in terms of format changes or the hardware required to read the old media. So again, the second part of digital archiving is as much about making sure that the data is preserved in a format that can be accessed, and copying it if it looks like the current format is something that’s not going to be supported over time.
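To make that "simple copy operation" concrete, here is a minimal sketch of a forward migration that also checks each copy against its original. It is only an illustration under assumptions: the function names `sha256_of` and `migrate` are hypothetical, and the old and new media are assumed to be mounted as ordinary folders.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(old_root: Path, new_root: Path) -> list[Path]:
    """Copy every file under old_root into new_root.

    Returns the source files whose copies did not verify (empty list = all good).
    Assumes new_root is not located inside old_root.
    """
    mismatches = []
    for source in sorted(old_root.rglob("*")):
        if not source.is_file():
            continue
        target = new_root / source.relative_to(old_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also tries to keep the modification time
        if sha256_of(source) != sha256_of(target):
            mismatches.append(source)
    return mismatches
```

Calling something like `migrate(Path("E:/"), Path("D:/archive/cd-001"))` (both paths hypothetical) and getting back an empty list would suggest every file arrived intact. Note that this only moves bytes between media; it does nothing about converting old file formats.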
These are hard problems to solve. I don’t want to discount people’s concerns that digital data can be lost over time because it can. But the issue is that what we have with digital data way more so than with paper or other physical forms of documentation is we have options. We have so many options. Not just for backing the data up but also for ways in which to retain it.
For ways in which to distribute it to multiple different locations so we’re not at risk of things like an archive bursting into flames. If that happens, other copies, backup copies of all that information can be stored in multiple locations. We no longer have to experience the devastation that was the loss of the library in Alexandria, for example.
Consider if all of that stuff, however many thousands of years ago, consider if all of that stuff had been available digitally and replicated somewhere else. That data would still be around today. And even if there were no readers available to decipher the documents at that time, we would be able to create them. Software can be written to understand and decode even the file formats that today we no longer understand.
We may have lost the ability to read a Works file or something like that 10 or 15 years from now, but that doesn’t mean that we can’t, if it’s important enough, go back and create it. So, with digital data, you have options more than you have with any other format. That’s why I’m such a proponent of it. It definitely comes with concerns and risks but again because there are so many options available to us when we have our information stored digitally, it’s significantly less of a risk to me anyway, than almost any other alternative.
What do you think? What do you think about digital data, digital archiving, backing up and so forth? Am I completely off the wall here? Is there a flaw in my thinking? I really would love to hear what that might be. As always, here’s a link to this article out on askleo.com.
That’s where the comments are read, moderated. That’s where all the fun stuff happens. I’d encourage you to come out; let me know what you think. Until next time, I’m Leo Notenboom for askleo.com. Take care, have fun, stay safe and don’t forget to back that stuff up. Bye, bye.
Was that video interesting? Helpful even? Well then, I could use your help. I’ve got a Patreon project under way. You’ve got an opportunity to contribute and help support askleo.com, to help me do what I do. Help more people, answer more questions, produce more information about technology that hopefully can help you and others use it more effectively and with more confidence. Visit Patreon.com to learn more. Among other things, you get rewards depending on the level of your patronage so check out Patreon.com/askleo to learn more and help contribute to askleo.com. Thanks.
61 comments on “The Perils of Digital Archiving”
Leo, you putting everything on hard drives is also destined to be lost. How many versions of hard drives are obsolete already? Some from just a few years ago are no longer readable on current computers.
Indeed. This is why periodic forward migration (from old media – whatever that might be – to newer) is an important part of any long term plan.
Leo, I think you are dead on about storage of digital information.
One thing I have experienced with optical media is that after ten years or so it gets iffy as far as reading it goes. I live in Mexico and as such my very limited climate control I am sure is an issue.
I have progressed from floppy drives, to cd’s to dvd’s and at present use bluray discs for most storage. I also of course back up to multiple drives though my physical location houses all backups. As my house is built of brick and cement, roof included fire is not a threat.
Over the years I have experienced hard drive failures but had my photo and many important documents burned to optical medium. I still lost things but nothing earth shaking to my life.
The main problem I experience is accessing the huge number of things I have saved and backed up. Not that I don’t have them, but remembering what I have and simply scrolling through thousands and thousands of documents and photos makes it a daunting task at best.
As I live in a country where it is legal to download movies and music at will I have a huge collection, all burned to dvd’s over the past fifteen or so years. My friend has made a spread sheet for me but once again the physical size of the disc collection makes it hard to find anything. Not sure what I will do with it all actually. Maybe when I am gone my kids will want to parse. Mike in Mexico
My concerns are threefold:
1. I’m sure I have old files that can’t be read any more from my Atari word processor, spreadsheet and database, however, I no longer have either the machine or programs to read them. Some I saved in txt format so I can still read them.
2. That backup drives will no longer connect to the new 2035 PC ports assuming that we are actually still using PCs then. I have old hard drives with various different connections, IDE, SCSI, SATA, USB, etc. SCSI is pretty obsolete already.
3. That the backup drives will seize up or be rendered useless due to magnetism, moisture, heat, cold, radiation or some other physical damage.
1. There may be software out there to read the old file formats. Sometimes it’s possible to retrieve the contents of files while losing the formatting, but that also usually involves some after-the-fact clean up.
2. Once again, this is an argument for always migrating data forward to current technologies on a regular basis. What’s current today will hold you for a while, but will not be current forever.
3. If the data is in only one place, it’s not backed up. By that I mean if the failure of a single drive would cause you data loss, then you’re not properly backed up.
I used to be, before retirement, involved with document management (classification, storage, destruction, archiving). The archiving of strategic documents (those documents necessary for the recovery from the severest of disasters such as but not limited to all out nuclear war where high energy electromagnetic pulses would wipe out electronic devices and the ability to produce electricity immediately) had to be done in such a manner that low tech or no tech would be necessary to recover those documents.
The requirement for the immediate ability to read such documents after a worst case scenario eliminated the digital storage of strategic documents. Paper copies of original documents held in multiple, separated secure and “hardened” locations was by far the most secure way of storing documents for long periods of time. Paper, when tightly packed in steel filing cabinets, is very resistant to destruction by fire.
A practical example of paper surviving for a long time are the Dead Sea Scrolls that survived roughly 2000 years in caves near the Dead Sea. All that is required to read the contents is the knowledge of the language(s) used to create them.
@Mike: Fire IS still a very real threat to the contents of brick built houses! And if your main problem is “accessing the huge number of things I have saved and backed up,” perhaps you should try to adopt a uniform file-naming system (i.e. always add dates in the same format such as 2016-11-01) and try and make the time to sort files into separate folders. Failing all that, you will have to rely on a search engine to find particular files :-/
@Rob: Storing paper in separated secure and “hardened” locations, in steel filing cabinets may well offer the surest way of protecting the documents – if you have a HUGE house with an absolutely HUGE cellar. But finding and retrieving just one sheet of paper would still be a nightmare! LOL. I think I’ll stick to keeping 3 or 4 digital copies, spread over different kinds of storage media (including 2 that are stored off site)! LOL.
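The uniform date-prefix naming suggested to Mike above could be automated along these lines. This is a rough sketch, not a recommendation of any particular tool: the names `dated_name` and `plan_renames` are made up for illustration, and it assumes a file's modification time is a good enough stand-in for the date you care about. It only builds a rename plan; nothing is renamed until you act on it.

```python
import re
from datetime import date
from pathlib import Path

# Matches names that already start with a date such as 2016-11-01_
DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}_")


def dated_name(name: str, when: date) -> str:
    """Prefix a file name with an ISO date unless it already has one."""
    if DATE_PREFIX.match(name):
        return name
    return f"{when.isoformat()}_{name}"


def plan_renames(folder: Path) -> dict[Path, Path]:
    """Map each file in folder to its proposed dated name (no files are touched)."""
    plan = {}
    for path in sorted(folder.iterdir()):
        if path.is_file():
            # Assumption: the modification date is the date worth recording
            modified = date.fromtimestamp(path.stat().st_mtime)
            new_name = dated_name(path.name, modified)
            if new_name != path.name:
                plan[path] = path.with_name(new_name)
    return plan
```

Because year-month-day order sorts the same way alphabetically and chronologically, a plain file listing then doubles as a timeline. Reviewing the plan before calling `old.rename(new)` on each pair keeps the step reversible.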
This is a very interesting article.
There is just one point I would love to see covered:
– How do you keep track of all those individual pieces of information?
I have a few boxes of floppies, and more or less outdated CD/DVDs, but if ever I should want to retrieve some particular documentation, how will I be able to remember which is which, and in which folder it may reside in?
The grandparents’ system of having a couple of shoeboxes with photos that we could skim through on festive occasions was a lot easier to handle. They did not have to fear the recording format going obsolete; at worst the photos might fade a bit over the decades.
When will technology advance to the stage where we can directly decode the storage media like we could back then?
– Please address my first question first.
Ole K: “I have a few boxes of floppies, and more or less outdated CD/DVDs, but if ever I should want to retrieve some particular documentation, how will I be able to remember which is which, and in which folder it may reside?”
It’s a very good point, and one I imagine that troubles the older ones among us more than those who are younger, who MAY not have quite such an extensive collection of files (certainly self-created documents). It can be a huge task to try to keep track of your files, because you have to keep cross-referencing everything.
Take me, for instance — “somewhere” I have hundreds, perhaps several thousand DOS files of my documents (I am a writer) — which I would LIKE to be able to get back, and “catalog” so I can find them when I want them. But since DOS only uses 8+3 for the names of files, for sure the names are likely to pop up again, referring to something else! At least with Windows files “long names” you can make them “more unique”. (Anyway, in my case, most of those files are in the WordStar format, with the hi-bit, so they do not appear natively as text files — another problem…although I can VIEW them with DR.Com.)
Back in the early 90’s I recall there was at least one programme that allowed you to “run” a floppy, by doing a directory sort, so that you had a file of all the file names, very quickly, and you would indicate what the floppy’s “name” was. You could process a lot of floppies very quickly this way. Now, you can only do, at the DOS prompt, DIR /od > d:\Floppy1 — to create, for instance, a directory list, in file date order (Order-Date) sending it to D: drive as a file named “Floppy1”. Doing that yourself WILL build up a list, certainly, but then you have to join all the files together, so that you can use the Search function in a document creation programme, in order to hopefully find various possibilities for what you seek. But pretty soon that file will be huge… And if you wish to index even a “small” HDD… well, it could take you forever, if you also want to know where some old programmes are located….
The point is, you will likely spend more of your LIFE either trying to find something, OR trying to index all that you have, than it may be worth. And even if your task may not be so huge, and you figure you can handle it, you may soon be daunted by the realisation that in a few years time, as standards change, you may not even be able to READ all that “organisation” you spent so much of your life trying to create… so you then have to migrate everything….
And THEN — even if you COULD… just WHEN are you really going to have time to READ all that interesting stuff you created some 20 years ago??
“Getting a life” nowadays probably doesn’t allow for much looking backwards — especially in today’s increasingly hectic world…..
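For anyone who does want to attempt the index, the per-floppy DIR listings described above can be scripted so that every disk lands in one searchable file rather than many separate listings. The following is a sketch under assumptions: `catalog_media` and `search_catalog` are hypothetical names, the media is assumed to be mounted as a normal folder, and the catalog file is assumed to live somewhere other than the media being scanned.

```python
import csv
from datetime import datetime
from pathlib import Path


def catalog_media(label: str, root: Path, catalog_file: Path) -> int:
    """Append one CSV row per file under root, tagged with the media label.

    Returns the number of rows added. catalog_file must not be inside root.
    """
    is_new = not catalog_file.exists()
    count = 0
    with catalog_file.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["label", "path", "bytes", "modified"])
        for path in sorted(root.rglob("*")):
            if path.is_file():
                info = path.stat()
                modified = datetime.fromtimestamp(info.st_mtime).isoformat(timespec="seconds")
                writer.writerow([label, path.relative_to(root).as_posix(), info.st_size, modified])
                count += 1
    return count


def search_catalog(catalog_file: Path, term: str) -> list[tuple[str, str]]:
    """Return (label, path) pairs whose path contains term, ignoring case."""
    with catalog_file.open(newline="", encoding="utf-8") as handle:
        return [
            (row["label"], row["path"])
            for row in csv.DictReader(handle)
            if term.lower() in row["path"].lower()
        ]
```

Each disk is scanned once with its own label ("Floppy1", "Floppy2", and so on), and a later search tells you which disk to pull out of the box. Since the catalog is plain CSV, it should itself be easy to migrate forward, though it only indexes file names, not what is inside the files.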
It’s not enough to preserve the data; you should also be thinking about retrieval. With your digital cameras generating dozens to thousands of images per day, you’re piling up tens of thousands of images per year. Twenty years from now, it’s not just a matter of being able to read a file and decode the contents — how will you find the image that you want? SAVING digital data is trivial; making it retrievable takes far more work. The ease of saving digital information makes it harder to retrieve. In 1916, only important documents were saved: Wedding licenses, diplomas, letters from the frontier. In 2016, backing up your phone every day means this week’s shopping list gets the same treatment as the photo of your child’s first step. And the photo of that first step is buried along with two dozen shots of your pants leg that you took by accident, and 600 shots of the fall foliage that looked amazing in person but are utterly lackluster in the pictures. How do you find one image among 100,000?
My problem with most things is remembering that I have something after a few years of not using it. If I don’t remember I have it (and/or where it is), I may as well not have it.
That issue is known in computer circles as a user error or PEBCAK: Problem Existing Between Chair and Keyboard ;-)
I was noticing a lot of the comments about finding the data. You could use a document management system where the location of individual files is not explicitly known. Of course that is also subject to obsolescence, not to mention it is a lot of work to get everything entered into, categorized, linked to viewers, etc. For companies like marketing firms this is the subject of legitimate work at home data entry positions, but it is the same concern on a smaller scale for the home.
There are a number of free software programs that will scan and list the files from your media. The one I use is ‘JR Directory Printer’. You can save the file as well as print a copy.
A very good program. Thanks.
Your suggestions are appropriate for most people, but don’t address some of the problems that archivists face and should be noted, and which your fans should keep in mind. So far as I know there are only two reasonably certain methods of archiving information: (1) Low acid content paper (think papyrus which has a good track record for a couple thousand years), and (2) stable photographic film, ranging from microfilm/microfiche in 35, 70 or 105 mm sizes upward (which has a proven track record of less than 300 years). The point here is that such media will likely always be digitizable as a source document at whatever resolution is appropriate – a technology niche that seems to be ever improving in capacity and cost effectiveness. The downside, of course, is that paper has to be stored in controlled conditions to be durable, and that creating an analog image from any film medium is expensive, though the result is presumably enduring. Neither of these approaches is likely to become a reasonable alternative to average people.
Your insistence on updating files is excellent, though I wonder how many people are as dedicated as you. Somewhere in my attic are old CRM, 5-1/4 inch floppies and 3-1/2 inch floppies that I despair of ever using. Likewise I have both Betamax and VHS tapes that I’ll never watch again. It’s probably all to the good, I suppose.
When I read the title of the article I thought the issue of cyber attack would be explored. Digital data stored in a drawer in my room is not vulnerable. But aren’t so many of us engaged in a growing relationship and dependency on Drop Box, One Drive, etc? How secure is what I have stored in the cloud?
Cloud storage can be vulnerable to loss, but the greatest danger is having your account hacked or forgetting your password. Most large systems have mirror backups in several locations which should mitigate against cyber-attack, but there is still a chance of that failing. That’s why the 3-2-1 backup strategy is recommended. 3 copies on 2 kinds of media (cloud, external drives etc.) 1 copy off site (cloud or external drive away from your computer’s location)
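The 3-2-1 rule described above is simple enough to express as a check you could run against a list of your own backups. This is a toy sketch: the `Copy` record and `meets_3_2_1` function are invented for illustration, and what counts as a distinct "kind of media" is left to your judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of your data: where it lives and whether that is off site."""
    name: str
    media: str      # e.g. "internal drive", "external drive", "cloud"
    offsite: bool


def meets_3_2_1(copies: list[Copy]) -> bool:
    """True if there are at least 3 copies, on at least 2 kinds of media, with at least 1 off site."""
    enough_copies = len(copies) >= 3
    enough_media = len({copy.media for copy in copies}) >= 2
    has_offsite = any(copy.offsite for copy in copies)
    return enough_copies and enough_media and has_offsite


plan = [
    Copy("working files", "internal drive", offsite=False),
    Copy("nightly image", "external drive", offsite=False),
    Copy("cloud sync", "cloud", offsite=True),
]
print(meets_3_2_1(plan))  # True: 3 copies, 3 kinds of media, 1 off site
```

Dropping the cloud entry from `plan` makes the check fail on two counts (only two copies, none off site), which is the situation of keeping every backup in the same house.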
I think that in the long, long term, digital information is doomed. The simple reason is that it cannot be read at all without an intermediary device of some sort. A piece of paper, or papyrus, from 3,000 years ago can still be read, if you can find one. Even better for hieroglyphics engraved on a stone. But if you were to discover the equivalent of a memory chip from a lost technology of 3,000 years ago, or possibly even as recently as 500 years ago, what would you do with it? Unless the reading technology had been passed down through the ages, you wouldn’t even know that it contained any data.
Great intro to the issues that digital archivists have been facing for the last twenty years. The one challenge that often seems to be overlooked is preserving file attributes, particularly date and time stamps. If you are keeping a document and want it to be as genuine a copy as possible, then you need to be concerned with maintaining date created and date modified. Many Windows utilities don’t preserve either or both when copying to a new physical media, or invert created and modified. It would be useful if you could point out some tools that actually allow for proper “archiving”.
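One hedged way to approach the timestamp concern raised above is to copy with a call that tries to carry metadata across, and also write the original timestamps into a small sidecar file so they survive even if a later copy drops them. In the sketch below, `copy_with_manifest` and the `.meta.json` suffix are hypothetical choices. Python's `shutil.copy2` attempts to preserve the modification time, but a file's creation time is generally not carried over by a copy, and the `st_birthtime` field is only available on some platforms, so it is recorded when present and left empty otherwise.

```python
import json
import shutil
from pathlib import Path


def copy_with_manifest(source: Path, target: Path) -> dict:
    """Copy source to target and write the original timestamps beside it.

    The sidecar file is named <target name>.meta.json (an arbitrary convention).
    """
    info = source.stat()
    record = {
        "name": source.name,
        "bytes": info.st_size,
        "modified": info.st_mtime,  # seconds since the epoch
        # Creation time is not exposed everywhere, so fall back to None
        "created": getattr(info, "st_birthtime", None),
    }
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)  # tries to keep the modification time on the copy
    manifest = target.with_name(target.name + ".meta.json")
    manifest.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return record
```

The sidecar is plain text, so even if a future tool resets the dates on the copied file, the recorded values remain readable next to it. This does not make the copy forensically "genuine"; it just keeps the dates from silently disappearing.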
One day at a Google meeting I heard that you can compress all of the data since the beginning of man into a cube 1 m, but to index it the cube would be the size of Jupiter. That’s what Google does; it’s an index service.
Another thing comes to mind: do you really want to save everything forever?
What happens if you die or get killed suddenly? What happens to all of your digital identity? What is Facebook going to do with all those dead people? Are they going to have a digital cemetery?
It’s a funny thing what time does to things. For instance, if you dig up the bones from a gravesite 2000 years after the burial, you’re an archaeologist. If you’ve done it within a couple of years, you’re forensics, but if you do it after a day or two, you’re a ghoul. And if you do it within an hour, you are a pathologist. But if you go right up to the second after death, you’re the murderer!
After all, isn’t that the true purpose of memory, to defeat time?
We all strive for immortality, to live forever.
Thanks Jerry…*LIKE* your take…I personally have a one week to one month ‘existence’ in gmail and facebook and one year for paper and sentimental paper…I love Love Love to delete delete delete. I am living life forward and have booked 2017 as my transition to retirement to tidy my photos, papers, sentimental item and now my deceased parents.
I have completed a few online declutter courses, read endless articles about digital storage…and the only advice, other than back up back up back up, which turned it around for me was to choose a defined amount for storage and don’t go past that…therefore edit edit edit…your ‘space’ defines how much to keep.
One option for me…re: photos and print is to think in decades…keep only the best of the best in a photo book with a hundred or so pages…a week to an opening so to speak. And when I retreat to my nursing home I can take my ten books with me. Doable? Paper shredder here I come.
I have a prime example of lost data due to obsolescence. Several years ago I was using IOMEGA 100mb Zip drive disks to backup my data, mainly photos. Much to my disappointment, caused by my own negligence, I recently discovered that IOMEGA quit supporting OS’s after Windows 98. It appears I can no longer use my Zip Drive and retrieve the data from my 100mb Zip disks. There are several hard lessons here, mainly staying current with your old data and multi-back up with different methods.
You might be able to get a Win 98 computer very cheap on Ebay or Craigslist. Then copy your data, back it up and donate the computer and Zip drive.
Just checked my Iomega Zip 250 and disks. Working okay. If memory serves there was an Iomega backup utility which may have been what you used. Will have a look later for the CD.
I have a reference book printed in 1746 beside me which I refer to from time to time – I doubt that any digital media will be readable after 270 years.
Perhaps, but consider how many books that were printed then that no longer exist, having been lost through various reasons. They, too, are “unreadable”.
Not to mention that it also depends on the quality of the paper. Cheap paper will yellow and cheap inks will fade over time. Not all of the Dead Sea Scrolls are readable because the ink is faded/smudged. Some of them have holes in the middle where archaeologists had to make “guesses” as to what was missing.
Carving things into stone is probably the best bet. :)
But really slow.
I never keep files on my computer. I keep them on an outboard hard drive. I back everything up to a separate hard drive with Macrium Reflect. I disconnect it after each backup. I have all documents in either Word, PDF or WordPerfect format.
Thanks for all your great information, Leo. I really learn a lot from your articles, books & videos. Regarding this particular article: I too have gone through lots of changes in storage of media–particularly for favorite/important documents and especially my photo collection. Mostly I store things on portable hard drives (having moved from 5 1/4 disks to 3 disks, to CDs/DVDs, to thumb drives, etc. over the years). I have not stored much in the cloud but will probably increase this in the near future. My worry here is accessibility. We had an all day internet outage here yesterday and I could not get to some things stored in the cloud that I wanted to work on because my internet was down.
I store my photos in .jpg format. I learned the hard way that storing them in a particular program’s format (like PowerPoint or Picasa) is a good way to lose photos. I think the .jpg will be around for a long time.
One thing we need to remember is that 99% of the stuff we save (as physical things or digital things) as individuals is only important to us. When we die, nobody (including the kids) wants the stuff and it will be dumped. As I’ve changed media storage over the years, I’ve also “pruned out” things that really no longer mattered to me. As someone pointed out, if you can’t find it, it is of no use to anyone. I think we tend to save way more than is necessary because, now days, we can. Digital storage in past or present forms is so compact that it invites us to store stuff. And, yes, digital cameras (especially those on cell phones) create more photographic junk than the old printed photos ever did. I never allow my phone to hold more than 30 photos before “pruning out” the unnecessary ones and downloading the others to my computer. Can’t believe friends with over a thousand photos on their phones–of course, they can’t find the photo they want to show me!
Exactly Teen…we store because we can…our stuff is defining our space rather than our space defining our stuff…outsourcing all the time.
I can’t tell you the number of times I had to wait…and wait…for someone to find a photo, that truly only interested them, and then they give up when they can’t find it…thirty sounds like a fine number.
Thanks for the .jpg tip.
When many years ago I purchased my first computer it came with a 5.5 inch floppy disk containing all the software loaded on the machine…….but no 5.5 inch drive !
It was never an issue, but later, when I replaced another computer – from Dell, I had to specify the addition of a floppy drive (I think they were about 3.5 inch), because the drive was no longer standard on new computers and I had all my financial and correspondence back-up on a collection of such disks.
Of course I promptly swapped the data from those disks and no longer used it, and threw away the substantial collection of disks supplied with PC magazines with free software.
How would I read such a disk now, and with all my ‘recent’ music on CD, I can no longer play any of the tapes I still have, or the old 45 and 33.3 records I still treasure but never play.
Moving house recently I also realised that I still had a significant number of VCR tapes, but dumped the lot because DVD has replaced them and left me with no means to play tapes.
How long before I will have to dump cd’s and dvd’s !
Life goes on, and not just in the computer world !!
You can now get record players that connect by USB. So you can play your old 33s and 45s.
You can still get record decks, amplifiers, tuners, speakers – no need for USB or a computer to play vinyl (or even shellac!) :)
The basic requirements of data archiving really haven’t changed whether the medium is paper or digital. First is TCO, total cost of ownership. While there is a cost for the medium (paper or digital), more significant are the storage and retrieval costs. These costs are often buried and overlooked. While it is clear that digital technology is changing rapidly and data must be migrated to updated media and interpretation software, the same is true, albeit at a slower pace, for paper-based data. We had a paper document written in a historical German script. Although we could photocopy the paper, we needed the services of a German history professor to read it. The best approach is to attempt to estimate how long the data will be needed and its monetary value over that time. Archiving is only suitable if the data TCO is less than the value of the data over that time.
@Martyn Green – “DR.com”? What is this? All I get is garbage when searching for it online. Thanks.
I am not really concerned with the potential problem of “will my data be readable 100 years from now”. In my opinion, the question is moot, as in “Who will give a sh..t about my data. a hundred years from now?” As long as I am still around, I do what Leo suggests: save forward onto newer media. I have digitized most of my old 35 millimeter slides, and hundreds of photographs. They are stored in chronological order with a spreadsheet to support the background with the file name embedded. My current writings, receipts, transactions, documents, are digitized to the extent possible. I do this because I got tired of storing old statements that maybe once in a blue moon need to be found. This is for my immediate purposes, and maybe, I hope, for my children. After that, it would be up to them to “save forward” onto new media. With digital it is possible. With hard copy, such as prints and slides, well, after a time they may be useless or gone to a city dump. So, for now, I will go with digitizing. Once I am gone, it will not matter.
This is a very difficult problem, and I think the inescapable conclusion is that much of our digital data will be lost. In history, the only requirements were that the thing itself survived the ages and that people in the future could figure out a way to interpret the data. A book hand-written by a monk in the 11th century could be read by scholars today as long as the book itself survived. That data never had to be re-written in new formats along the way. With digital data, the key to preservation is the periodic copying of that data to whatever is the current format. Who exactly is in charge of that job? The creator? The publisher? The reader? …and how often is “periodic”? Every 5 years? Every 10 years? Will that job really be assigned and carried out on these cycles for 200 years? 500 years? I just don’t see that as a realistic assumption. So much is going to be lost because of the decentralization of storage and lack of assignment of responsibility. As far as individual personal data, it’s probably no different than in history – whatever survives will depend on your heirs. I suspect that much of this same kind of data from history is also long lost for the same reason much of our current personal data will be lost: lack of upkeep by those who come after us.
Some are talking of how past information has come to us via books etc. Actually, the ratio of data lost to data preserved over the ages which was contained in books, pottery etc. is extremely high. Very little remains, enough to fit in a few museums. The entire Library of Alexandria was burned to the ground.
Considering the technology we have, and the small space it takes to contain the data, I expect there will be much more data preserved 4,000 years from now than was preserved over the past 4,000 years. Much will be up-cycled to newer media, while a lot of the purely personal stuff will probably be lost. Large companies have vaults in caves to preserve their backups, much of which are on tape. Even with changing formats future archaeologists with their advanced technologies should be able to easily decode the contents of tapes and probably hard drives or SSDs or whatever future storage method is devised.
“Considering the technology we have, and the small space it takes to contain the data, I expect there will be much more data preserved 4,000 years from now than was preserved over the past 4,000 years. ” – Unless we run out of capacity, of course!
I don’t believe that we will run out of capacity – I’m sure new storage technologies will come along before that happens – but the numbers are certainly interesting. According to IBM, we create 2.5 quintillion bytes of data every day – so much that 90% of all the data that exists has been created in the last 2 years. Surveillance cameras alone – which are constantly increasing in number – create enough data each and every day to fill 11.3 million double-layer Blu-ray disks. And the rate at which we as individuals create data continues to increase exponentially as camera/video resolutions get ever better. And then we further increase our storage needs by backing all our stuff up using the good ol’ 3-2-1 strategy, as well as further duplicating it by sharing it on Facebook, etc – which in turn back their/your data up.
It just dawned on me that so much information will survive thousands of years from now that future generations will be inundated and overwhelmed by it. The problem isn’t preservation, but interpretation: not determining which words the text is composed of, but which words have value for them. Technology is always evolving, so it shouldn’t be a problem as Artificial Intelligence evolves.
“So much information will survive thousands of years from now….” – And 95% of that information will be cat videos. Googols and googols of cat videos. All duplicated across multiple platforms – from Facebook to YouTube – and safely backed up for posterity in multiple geographically redundant data centres.
You can haz as many backups as you likes.
It’s interesting how the numbers increase. You start with one of your books that weighs in at, say, 10MB. It’s downloaded by 1,000 people, and suddenly it’s eating up 10,000MB of disk space. Each of those people follows the 3-2-1 rule for backup, and the number increases to 20,000MB. Actually, it increases to 30,000 because each buyer backed up online, and the online backup company has a backup of all those backups. So, a little 10MB book ends up occupying 30GB of disk space. And that doesn’t even include all the other ways it may be shared/duplicated.
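The arithmetic in that comment can be checked with a few lines of Python. This is only a sketch of the commenter's own scenario: the function name and the "copies per download" framing are my assumptions, and it uses 1 GB = 1,000 MB as the comment does.

```python
# Figures taken from the comment above: a 10 MB book downloaded 1,000 times.
BOOK_MB = 10
DOWNLOADS = 1_000


def total_storage_mb(size_mb, downloads, copies_per_download):
    """Total space used when every download exists in several copies."""
    return size_mb * downloads * copies_per_download


# One copy per reader, then one backup each, then the backup
# provider's own backup of those backups.
originals_mb = total_storage_mb(BOOK_MB, DOWNLOADS, 1)
with_backups_mb = total_storage_mb(BOOK_MB, DOWNLOADS, 2)
with_provider_backups_mb = total_storage_mb(BOOK_MB, DOWNLOADS, 3)

print(f"Originals only:        {originals_mb:,} MB")
print(f"Plus reader backups:   {with_backups_mb:,} MB")
print(f"Plus provider backups: {with_provider_backups_mb:,} MB "
      f"(about {with_provider_backups_mb / 1000:.0f} GB)")
```

Changing `copies_per_download` shows how quickly the total grows once sharing and further duplication are counted.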
I understand the ancient Egyptians worshiped cats.
We’ll likely come full circle. Future Egyptians will find our 3,000-year-old cat videos and conclude that we worship cats.
They will have concluded correctly ;-)
Books. Books in any language. They were nice to have and hold, nice to peruse at the library, nice to borrow. Then the library system underwent political attack and our access was reduced. The library underwent religious and school-appropriate attacks and our access was reduced. Then libraries were supplanted by commercial stores, then those stores by online stores, then online stores honed selection by selling hierarchies. Eventually our access to knowledge–musical, mathematical, social, medical, scientific (you get the picture) gets bombarded and reduced to what some nebulous Leo out there in Delta Control Central says is relevant (or “all a lowly serf can handle”). So, everything becomes smaller. Have you noticed? Brains, imaginations, in-depth and meaningful information, manners, poetics, styles, political parties, representative choices, even color selections. (Take a gander at the colors on the cars in your town.) Now we all just fit on a blade that suggests what we want and what we should learn. But who controls the programming that controls the blade? And there you have it–incontinence.
If you can’t put your hands on it, if electricity is the key, if an entity can send binary in and rob you of your privacy and chase you with buy orders, then there is no diaper so secure that your dryness and comfort can be assured. Be prepared for wetness, rash and to lose whatever you entrust to digital security. It’s not a muscle you control; it’s not a pocket.
For my part, I run backups every hour on the hour and still have not been able to stem the mysterious losses that occur two or three times a year. But I have no problem putting my hands on my first book, my first record, or my first sheet of music. I have to subscribe to borrow a writing program. If I don’t my access to my work is severely hampered. How much progress have we made?
There is no substitute for the comfort of dry pants, and none for actual ownership.
I am not that computer smart, but I am getting better thanks to you and others who helped me when I worked. Now retired, I depend on the internet for help. I do have 3.5 disks of pictures of the children that, for the life of me, I cannot download. PS: I learned not to format any more disks; it wipes them out. LOL
There are inexpensive (as low as $10) 3.5 floppy disk drives which should be able to read those as long as the disks are still OK.
You forgot to mention that most newer computers don’t have a floppy drive connector, so he should look for an external USB drive.
I checked Amazon.com and found a lot of them (even one for the 720kb disks).
Good point. I neglected to mention USB connected drives because when I searched for floppy drives, the first page on Amazon only showed me USB floppy drives. But I may have typed USB floppy drives into the search. :-)
There are professional archives out there, whose job it is to take care of all of this. I used to work for one which archives social science data. When I left, we could still retrieve data from punched cards, in addition to floppies, etc. And it was our job to back up, to research which storage formats to use, AND to keep all the metadata about the files and groups of files……
If documents are actually going to be important in the future, someone can be paid to take care of them.
My guess would be that few individuals have very many documents that are going to be regarded as interesting in the future, let alone important. And, as others have said — I’ll be gone; it’s not my concern.
One thing I’d suggest for anyone with written documents they regard as important — save a copy in plain text. My guess would be that .jpg files will be around a lot longer than word-processor-format files. My dad had a long mailing list stored in an obscure database format which his newer software could not read. We saved it out in “comma-separated variable” format (which the ancient db program could produce), and hey, presto, the new software could read it.
My advice for anyone concerned about this is “Save a copy in the most general/generic format possible. No proprietary formats. Save it all on hard drives. Multiple up-to-date hard drives, with at least one hard drive off-site (mine is in the safe-deposit box at the bank).” It would take a group of disasters to cause me a total loss of pics, financial docs, etc, and I suspect if that group of disasters were to happen, the loss of my pictures would not be one of my significant problems…………
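As a rough illustration of the “save it out in a generic format” advice, here is a minimal Python sketch that writes a small mailing list to a comma-separated file and reads it back using only the standard library. The function names and the sample records are hypothetical; the commenter's actual database program is unknown.

```python
import csv


def export_to_csv(records, path):
    """Write a list of dicts to a plain comma-separated text file."""
    fieldnames = list(records[0].keys())
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)


def import_from_csv(path):
    """Read the file back; every value comes back as plain text."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Hypothetical mailing-list entries, for illustration only.
    mailing_list = [
        {"name": "A. Example", "street": "1 Main St, Apt 2", "city": "Springfield"},
        {"name": "B. Sample", "street": "22 Oak Ave", "city": "Shelbyville"},
    ]
    export_to_csv(mailing_list, "mailing_list.csv")
    print(import_from_csv("mailing_list.csv"))
```

The resulting file can be opened in any text editor or spreadsheet, which is the point: nothing about it depends on the program that produced it.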
This excellent article deals with file formats and the importance of data backup; it also mentions one particular backup application (Macrium Reflect), which can make backup archives, (or copies), of whole hard disks and/or partitions, or, (should the user prefer), backup copies of individual files. Another application of this kind, also often mentioned by Leo, (but not in this particular article), is EaseUS Todo Backup, (which also comes in free or paid versions). I am sure that there are many other backup applications available, (various applications from Terabyte Unlimited and Acronis come to mind).
Also, I seem to remember, around 8 or 10 years ago, reading another article discussing this type of problem, (possibly on Photo.net). The article was primarily concerned with the various proprietary RAW image formats created by different digital camera manufacturers, and how one of those manufacturers was refusing to make available any means of reading those formats, other than by the use of that manufacturer’s own proprietary software, (which also could not be activated unless the user was a verified purchaser of one of their own camera models). (If I remember correctly, activation of the software was only possible through the input of the serial number of the camera which had been used to create the original image, but my recollection on this point is now rather hazy). I also seem to remember that some third-party photo editing software producers, (such as Adobe), were in dispute with the camera manufacturer concerned, because their software applications, (such as Photoshop), could not edit such files without needing a proprietary plugin for the RAW format in question, for which a licence fee would have to be paid. I have no idea whether or not this dispute was ever resolved.
My problem/question is this:
As far as I am aware, each of these backup applications stores the backed-up data in its own proprietary format, and I would be very surprised if I were to discover that any one of those proprietary formats could be read by any application other than the one which had been used to create the original backup archive. (For example, I know for a fact that backup archives created by Macrium Reflect v6 cannot be read by Macrium Reflect v5 or earlier). Archives created by older software versions can be read by newer versions of the same software, but not the other way around, (i.e., newer archives cannot be read by older software versions).
Therefore, not only do we need to be concerned with the readability of the backed-up files: we also need to be concerned with the readability of the backup archives themselves.
Furthermore, this also applies to archiving/compression applications, such as ZIP or RAR. (For example, the original ZIP format had a content limitation of 65,535 files per archive, but this has been increased by the newer ZIP64 extension).
What can we do in the (admittedly highly unlikely) event that the ZIP or RAR formats themselves were to become obsolete at some time in the future?
I really feel that this could be an even bigger problem than the formats of the backed-up individual files.
That’s my 2¢ worth, anyway. :-)
I consider backups and archiving two different things. As a result, I wouldn’t count on true long-term access to backups. Fortunately, the backup programs, while not forward compatible, are usually backwards compatible, meaning that the backups you create today with Macrium v6 (for example) will in all likelihood be readable by v7 and beyond – to some limit, I’m sure.
The way you’re asking the question actually answers it, though. Don’t use proprietary formats for long term archival of data. Use common (VERY common) formats whenever possible. If you must collect files into other files, I expect “zip” files will be readable for decades if not centuries, just because they are so common. Macrium’s “.mrimg” (and Acronis’ “.tib” and EaseUS’s “.iforget”) not so much.
You are absolutely right that each backup program makes its own file type that can only be restored with that particular software. Backup programs are not meant to be archival storage. They are really for much more immediate use – in the case of a computer crash, etc. As some people point out in the comments on this article… every single thing we own does not need to be archived. But it’s certainly good to be thinking about all the problems that involves.
Zip files are so ubiquitous, that many programs virtually see them as if they were plain text. Google, for example, doesn’t allow executable files to be sent as attachments. If you try to hide the file in a zip file, they still block it. File Explorer also unpacks zip files. I’ve always recommended people use zip files for universal compatibility.
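One way to see how widely supported the zip format is: Python's standard library can create, list and integrity-check a zip archive without any third-party tool. The sketch below is only an illustration; the helper names and file names are hypothetical.

```python
import io
import zipfile


def make_archive(files):
    """Pack a {name: text} mapping into zip bytes held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def inspect_archive(raw_bytes):
    """Return the member names and the first corrupt member (None if all pass)."""
    with zipfile.ZipFile(io.BytesIO(raw_bytes)) as archive:
        first_bad = archive.testzip()  # re-reads every member and checks its CRC
        return archive.namelist(), first_bad


if __name__ == "__main__":
    raw = make_archive({"letter.txt": "Dear family,", "notes.txt": "Scanned in 2016."})
    names, first_bad = inspect_archive(raw)
    print(names, "all members OK" if first_bad is None else f"corrupt: {first_bad}")
```

The same `testzip()` check can be run on an archive file on disk by passing its path to `zipfile.ZipFile` instead of an in-memory buffer.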
I agree with what you said about the potential problems with digital archiving and how to work around them. However, there are two aspects of this that weren’t covered. First, the reliability and durability of the actual media used for storage is a real threat to the retention of data. No one really knows if a CD, or DVD, flash drive or even a hard drive will be able to keep the data on it for 50 or 100 years. Really long-term data retention on the media we have available today is a real question mark. Because of this, someone, only half in jest, suggested that anyone who wanted to pass his or her data on to people 1000 years from now should use an electron beam writer to burn the data onto a piece of chrome steel.
The other potential problem is a bit more esoteric. We have no way of knowing what kinds of technologies we will develop in the next hundred years to do our data processing. If the direction of progress in the IT field moves us completely away from our current methods of computing and data storage, it’s quite possible that in 100 years we will have no means of reading any of our existing media, since the entire paradigm of computing will have made the use of such storage systems completely obsolete. For instance, as an off-the-wall example, suppose we move to artificial biological devices that can be scaled up to any capacity we want to do our computing. In such an event, any kind of mechanical storage medium we pass on will most likely never be able to be read by our descendants since such mechanical data storage and retrieval will have been rendered totally unnecessary by then. Granted, this possibility might sound far-fetched now, but we have no way of knowing what computing of the next century will be like.
Here’s a historical case going back to 1930, the latest event being last week November 2016.
My Grandfather from 1930 to 1940 had a cine camera and took much footage of my Grandmother, and my Dad, Uncle and Aunt as children. Places featured include London, North Wales, South of France, and Switzerland in summer and winter.
In the 1980s my Uncle found these and had them converted to VHS tape and gave copies to my Dad and Aunt.
A couple of months ago I discovered my now deceased Dad’s tape and took it to my local camera shop who copied it all onto DVDs.
These DVDs went into my laptop and I copied the .VOB files back to the laptop. Using FormatFactory I converted the .VOB files to .MP4 format and edited them into clips, naming and dating them.
I followed the same process with the recently found cine film my Dad, Uncle and Aunt took of their children (me, my sister and our cousins).
And I’ve done the same with VHS tapes I took of a week-long family gathering (there were 30 of us!) in Cornwall from 1992.
ALL of the above have now been copied to USB sticks (“thumbdrives” in the USA) and just last week distributed (by post!) to my sister and our cousins, who will make copies for their many kids to have.
My guess is that each significant new method of media capture, playback and storage we adopt will have the previous format copied up to it, probably ad infinitum, if the generations are prepared to do the work!
Congratulations, Leo, for bringing up a basic but very important and vexed issue of digital archiving. In fact, it is surprising that even at this advanced stage of digital technology, computer scientists and engineers have not come up with an answer to this challenge. As a layman, I would suggest that the digital technology industry should come up with a common/unified standard wherein all kinds of digital files, whether simple text or multimedia files, can be, with a special tool/software, backed up into and retrieved from a common/unified code or format (one for each type of file, i.e. text, image, audio, video etc.) which should be independent of the software which generated them in the first place. Let me call the data created in such common/unified code or format “absolute data” or “eternal data”. No matter what types of media, formats or software are developed in future, the format of “absolute/eternal data” should be kept essentially the same. The digital technology industry should then start certifying all future hardware, software and technology capable of reading such data as “absolute/eternal data compliant”!! Just a thought!
I have just watched your video about backups (BUP) that will last over time, and I agree with all your points about formats, etc.
Like you, I suspect, my wife and I have gone digital for almost all things, like, say, credit cards, and I usually OCR them. This has the great advantage that I can search them. All our backups are to HDDs.
However not all is going well. This has to do with digital copies becoming corrupt.
As is recommended, we create BUPs, and then BUPs of the BUPs to store offsite (with a relative). These BUPs are huge, as so much data has been created over the years, and currently about 3TB. We try to have an older “archive” BUP, but we only have one or two “archives” at any point in time. So, what is the issue?
When we need to recover a file from the “archive” we are increasingly finding corrupt files, and this is a game stopper. It occurs mainly in .exe and/or .rar files. I use a legal copy of WinRAR and always their older, less crunched formats.
Sometimes this has clearly been corrupted for some time, and that is why I need to visit my BUPs because a file is corrupted/lost, etc. All storage is on mechanical HDDs; SSDs are used as boot C:\ devices, with something always creeping back into C:\users, I also back that up.
What am I asking? Quality of HDDs (I use WD Essentials or similar)? Maybe a utility to work away in the background to test each file?
I’m sure someone will help with this issue. It must be common in the workplace, but those places have more “archive” BUPs, more than I can afford.
I’m looking for someone to help.
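One possible shape for the kind of “test each file” utility asked about above is a checksum manifest: record a SHA-256 hash for every file when the backup is made, then re-check the hashes later to spot files that have silently changed. The following Python sketch is an assumption about how such a tool could work, not a description of any existing product; all function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hash."""
    root = Path(root)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root, manifest):
    """Return (name, problem) pairs for files that are missing or changed."""
    root = Path(root)
    problems = []
    for relative_name, expected in manifest.items():
        candidate = root / relative_name
        if not candidate.is_file():
            problems.append((relative_name, "missing"))
        elif sha256_of(candidate) != expected:
            problems.append((relative_name, "changed"))
    return problems
```

In practice the manifest would be saved next to the backup (for example as a JSON file) and `verify_manifest` run on a schedule. Note that a hash mismatch only says the bytes differ from when the manifest was built; it cannot tell corruption from a deliberate edit, so this suits archive folders that are not supposed to change.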