Your situation is actually not that uncommon.
These days, file formats are complex and the programs that read them are often unforgiving when there’s something wrong with the file.
When only portions of a file are recovered, some of the information that the application relies on to open and interpret the file is so badly damaged that the application can’t even recognize the file to open it.
Typically, that happens when the first few pieces of the file are missing. But it actually can happen if any piece of the file is missing, out of order, or just otherwise unrecoverable.
Recovering text from files
When faced with this in the past, I’ve typically used a somewhat unusual technique that tries to recover as much of the text in the document as it can.
Now, I want to be clear about what I’m talking about here. When I say “text,” I mean the words that you’ve written and only those words. Using this technique, you will lose any formatting or layout information, as well as any images.
But I’m assuming this is primarily a text document, so you want to recover all of the words.
Pulling Strings
Download the utility called Strings. It’s a command-line tool, part of the Sysinternals collection, available from the TechNet portion of the Microsoft website.
Very often, the words that you write in a document are stored as plain text. It’s like writing it all in Notepad, where there is no formatting. Word processing programs, like WordPerfect, Microsoft Word, and others, add formatting and layout information to the file, so you can see that a paragraph in your document is centered, in all italics, or on the next page. Nonetheless, the words themselves are still in plain text.
Strings looks through a file that you specify on the command line and simply displays everything that it recognizes as being a plain-text string.
You may have to run it twice:
- In ASCII mode to get simple characters that you obviously would recognize.
- In Unicode mode. Some programs store characters in Unicode, which lets them represent characters from writing systems all around the planet. Even if you’re only writing in English, you’ll want to make sure that you run a Unicode scan.
One scan will probably give better results than the other, and it should be easy to tell which by comparing the output of the two runs.
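To make the difference between the two scans concrete, here is a minimal Python sketch of the idea behind them. It is not how Strings itself is implemented: the function names, the minimum run length of 4, and the sample bytes are all assumptions made for illustration. It also assumes that “Unicode” means UTF-16 little-endian, where each plain English character is followed by a zero byte.

```python
import re

MIN_LEN = 4  # assumed minimum run length; shorter runs are mostly noise


def ascii_strings(data: bytes, min_len: int = MIN_LEN) -> list:
    """Return runs of printable ASCII bytes that are at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def utf16_strings(data: bytes, min_len: int = MIN_LEN) -> list:
    """Return runs of printable ASCII characters stored as UTF-16LE
    (each character followed by a zero byte), at least min_len long."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [m.group().decode("utf-16-le") for m in pattern.finditer(data)]


# Hypothetical damaged file: one 8-bit string, one UTF-16 string, and junk.
sample = (
    b"\x00\x01Dear Bob,\x00\xff\x02"
    + "Quarterly report".encode("utf-16-le")
    + b"\x03\x04"
)

ascii_found = ascii_strings(sample)   # finds only the 8-bit text
utf16_found = utf16_strings(sample)   # finds only the 16-bit text
print(ascii_found)
print(utf16_found)
```

Each scan finds text that the other one misses, which is why it is worth running both and comparing the results.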
Next, you’ll redirect the output of the Strings tool to a file. This is simple command-line stuff. Look up “redirect” or “command line redirect” in Windows. For example:
strings -a example.doc >recovered_example.txt
That dumps all of the plain-text strings that it can find into a new file, recovered_example.txt, which you can edit using Notepad or whatever editor you prefer.
You will see a lot of junk in these files. Some of it you can just delete. Hopefully, the majority of the text from the original document will be there.
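Some of the junk can be weeded out automatically. Below is a rough Python sketch of one heuristic. It is an assumption of mine and not a feature of Strings: keep a line only if most of its characters are letters or spaces and it contains at least one word of three or more letters. The function names, the 0.8 threshold, and the sample lines are made up for illustration.

```python
import re


def looks_like_prose(line: str, min_ratio: float = 0.8) -> bool:
    """Heuristic (an assumption, not part of Strings): True when the line
    is mostly letters and spaces and contains a word of 3+ letters."""
    stripped = line.strip()
    if not stripped:
        return False
    # Count characters that are letters or spaces.
    wordish = sum(ch.isalpha() or ch == " " for ch in stripped)
    has_word = re.search(r"[A-Za-z]{3,}", stripped) is not None
    return has_word and wordish / len(stripped) >= min_ratio


def filter_junk(lines):
    """Keep only the lines that look like readable text."""
    return [line for line in lines if looks_like_prose(line)]


# Hypothetical lines from a recovered file.
recovered = [
    "Dear Bob,",
    "x9$!@#Q",
    "Thank you for the report.",
    "@@bjbj##",
    "0@P`p",
]
kept = filter_junk(recovered)
print(kept)
```

A filter like this is blunt. It will drop legitimate lines that are mostly numbers or punctuation, and it will keep junk that happens to be all letters, so check the result against the unfiltered file before deleting anything.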
It’s been a while since I last used Strings, but it has helped me recover corrupt documents from time to time. It definitely beats having to retype everything from scratch.
Remember to back up
Now, I do have a recommendation that I’m sure everybody’s expecting me to make.
This wouldn’t have been a problem if the file had been backed up somewhere.
Even while you’re still working on a document, it’s helpful to have a backup copy of it. Even if the most recent backup is from yesterday and you lose all of today’s work, you don’t lose everything.
So, use Strings to see if it will recover enough of your document and then start backing up that file in the future.
Leo, Thanks for mentioning the STRINGS program.
When I went to the Link I noticed STRINGS was a part of the SysInternals Suite, so I grabbed a newer copy of the Suite while I was there instead of just the one file.
Thanks Again!
This is a tip for anyone writing a serious volume of text.
I’ve written two books of more than 600 pages each. The first project took 1.5 years, the second nearly 3 years. Since I truly hate to redo any work, I’ve designed a scheme that goes further than simply backing up. I save my work every 30 minutes, using a new name for every instance. If the project working name is XYZ, then each file name follows the format XYZ_yymmdd_a, i.e., it is suffixed with the date and a letter. Every day I start with the letter a, and every half hour I save the file using the next letter of the alphabet. I also immediately copy every new instance to a USB thumb drive dedicated to this one purpose. At the end of every day I back up the harvest of that day to an external disk. On the computer’s hard drive, I use a set of folders to store away the bulk so as not to be bothered by it.
Through the months, I regularly remove old instances, but always keep the last one of each day.
I have a micro vault cord around my neck to which I attach the daily USB thumb drive, and never leave the house without wearing it. If the whole place burns down, I get a new computer and happily continue the project.
Especially in later stages of the writing, when you’re not adding new text at the end but making changes all over the place, it is important *never* to lose any of your work.
This procedure takes little effort, and only a bit of discipline, and delivers great peace of mind.
I was not able to extract the zip file using WinZip or 7-Zip. I am wondering, MoreOff, whether I need the whole suite for it to work?