Why is my partially recovered document still not readable?

Writing over a file is a good way to lose data, but if you are willing to dive into some command line programs, you may be able to recover some of it.

Hi, Leo. I have a sticky situation here. My daughter accidentally overwrote a written document and lost 60 pages of her story. We’ve tried recovery tools, such as Recuva and Undelete. She may have recovered some of it, but she’s unable to open it. It’s a Word Perfect 12 document and when we try to open it, it says it’s an “unsupported format.” We cannot understand how it was saved or recovered in a different format. Just to let you know we have it still. I’ve tried emailing it as an attachment to different people who are better at computers and no one so far has been able to open it.

Your situation is actually not that uncommon.

These days, file formats are complex and the programs that read them are often unforgiving when there’s something wrong with the file.

When only portions of a file are recovered, some of the information that the application relies on to open and interpret the file may be missing or so badly damaged that the application can’t even recognize the file, let alone open it.

Typically, that happens when the first few pieces of the file are missing. But it actually can happen if any piece of the file is missing, out of order, or just otherwise unrecoverable.
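To make the "first few pieces" point concrete, here is a minimal sketch of the kind of check an application might perform before it even tries to read a document. It assumes that WordPerfect documents begin with the four bytes 0xFF "WPC"; the function names are hypothetical, and real programs check far more than this.

```python
# Assumption: WordPerfect documents begin with the 4-byte signature 0xFF "WPC".
WPC_SIGNATURE = b"\xffWPC"


def looks_like_wordperfect(data: bytes) -> bool:
    """Return True if the data starts with the WordPerfect signature."""
    return data.startswith(WPC_SIGNATURE)


def find_signature(data: bytes) -> int:
    """Return the offset of the signature anywhere in the data, or -1."""
    return data.find(WPC_SIGNATURE)
```

Reading a recovered file with open(path, "rb").read() and passing the bytes to these functions shows whether the start of the file survived. A signature that is absent, or that turns up somewhere other than offset 0, would be consistent with an "unsupported format" message.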

Recovering text from files

When faced with this in the past, I’ve typically used a very bizarre technique that basically tries to recover as much of the text in the document as it can.

Now, I want to be clear about what I’m talking about here. When I say “text,” I mean the words that you’ve written and only those words. Using this technique, you will end up losing any formatting or layout information as well as any images.

But I’m assuming this is primarily a text document, so you want to recover all of the words.

Pulling Strings

Download the utility called Strings. It’s a command-line tool available from the TechNet portion of the Microsoft website.

Very often, the words that you write in a document are stored as plain text. It’s like writing it all in Notepad, where there is no formatting. Word processing programs, like WordPerfect, Microsoft Word, and others, add formatting and layout information to the file, so you can see that a paragraph in your document is centered, in all italics, or on the next page. Nonetheless, the words themselves are often still stored as plain text.

Strings looks through a file that you specify on the command line and simply displays everything that it recognizes as being a plain-text string.
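If you're curious what "recognizes as being a plain-text string" might mean in practice, the idea can be sketched in a few lines of Python. This is not how Strings itself is implemented; it is a rough illustration that treats any run of printable ASCII characters as a string. The function name and the minimum length of four characters are my own assumptions.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return every run of at least min_len printable ASCII characters."""
    # Bytes 0x20 through 0x7E are the printable ASCII range (space to tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]
```

Given the raw bytes of a damaged file, this returns the readable runs in the order they appear and skips everything in between.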

A run of the Strings utility

You may have to run it twice:

  • In ASCII mode to get simple characters that you obviously would recognize.
  • In Unicode mode. Sometimes, programs will store characters in Unicode. This allows them to represent well over a hundred thousand different characters from writing systems around the world. Even if you’re only writing in English, you’ll want to make sure that you run a Unicode scan.

One scan will probably be better than the other, and it should be easy to tell which from the output of the tool each of the two times you run it (see footnote 1).
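Here is a rough Python sketch of why the two scans give different results. It assumes that "Unicode" means two-byte UTF-16 text, which is common on Windows, and it only looks for English-range characters. The names scan and better_mode are hypothetical.

```python
import re

# Runs of printable ASCII bytes, one byte per character.
ASCII_RUN = re.compile(rb"[\x20-\x7e]{4,}")
# The same characters stored as UTF-16 (little-endian): each followed by a zero byte.
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")


def scan(data: bytes) -> dict[str, list[str]]:
    """Extract strings from the data in both modes."""
    return {
        "ascii": [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)],
        "unicode": [m.group().decode("utf-16-le") for m in UTF16_RUN.finditer(data)],
    }


def better_mode(data: bytes) -> str:
    """Name the mode that recovered more characters in total."""
    totals = {mode: sum(len(s) for s in found) for mode, found in scan(data).items()}
    return max(totals, key=totals.get)
```

Text stored as UTF-16 has a zero byte after every English letter, so an ASCII-only scan never sees four readable characters in a row. That is why one scan can come back nearly empty while the other finds the story.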

Next, you’ll redirect the output of the Strings tool to a file. This is simple command-line stuff; look up “redirect” or “command line redirect” in Windows if you haven’t done it before. For example:

strings -a example.doc >recovered_example.txt

That dumps all of the plain-text strings that it can find into a new file, recovered_example.txt, which you can edit using Notepad or any other text editor.

You will see a lot of junk in these files. Some of it you can just delete. Hopefully, the majority of the text from the original document will be there.
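If the recovered file is long, a rough filter can do a first pass at the junk for you. The sketch below is only a heuristic of my own, not part of Strings: it keeps lines that are reasonably long and made up mostly of letters and spaces. The thresholds are guesses you would need to tune.

```python
def looks_like_prose(line: str, min_len: int = 8, min_ratio: float = 0.8) -> bool:
    """Guess whether a line is real writing rather than leftover file data."""
    stripped = line.strip()
    if len(stripped) < min_len:
        return False
    # Count the characters that are letters or spaces.
    wordy = sum(ch.isalpha() or ch.isspace() for ch in stripped)
    return wordy / len(stripped) >= min_ratio


def filter_prose(lines: list[str]) -> list[str]:
    """Keep only the lines that look like prose."""
    return [line for line in lines if looks_like_prose(line)]
```

A filter like this will also throw away short lines of genuine text, such as one-word dialogue or chapter numbers, so keep the unfiltered file and treat the filtered version as a starting point.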

I used Strings in the past. It’s been a while since I’ve done it, but Strings has helped me recover corrupt documents from time to time. It definitely beats having to retype everything from scratch.

Remember to back up

Now, I do have a recommendation that I’m sure everybody’s expecting me to make.

This wouldn’t have been a problem if the file had been backed up somewhere.

Even while you’re still working on a document, it’s helpful to have a backup copy of it. If the most recent backup is from yesterday and you lose all of today’s work, you still haven’t lost everything.
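A backup does not have to be elaborate. As one possible approach, this small Python sketch copies a document into a backup folder under a name that includes the date. The function name, the folder layout, and the yymmdd naming are illustrative assumptions, not a recommendation of any particular tool.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def backup_copy(source: Path, backup_dir: Path, today: Optional[date] = None) -> Path:
    """Copy source into backup_dir under a date-stamped name and return the new path."""
    today = today or date.today()
    backup_dir.mkdir(parents=True, exist_ok=True)
    # story.wpd backed up on 5 March 2024 becomes story_240305.wpd
    target = backup_dir / f"{source.stem}_{today:%y%m%d}{source.suffix}"
    shutil.copy2(source, target)
    return target
```

Running it more than once on the same day overwrites that day's copy, so you end up with one backup per day. Pointing backup_dir at an external drive or a synced folder keeps the copies off the machine you are writing on.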

So, use Strings to see if it will recover enough of your document and then start backing up that file in the future.

 

Footnotes and references

1: Some document formats lend themselves to this type of operation – like “.doc”, used in the examples. Other formats may need additional steps. “.docx”, for example, is actually a compressed zip file and needs to be unzipped using tools like WinZip, 7-zip or others before something like strings will work. Other formats may not work at all.
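For the “.docx” case, the unzip step can also be done in a few lines of Python. This is a minimal sketch that assumes the main body of the document lives in the archive member word/document.xml. The function name is hypothetical, and it only works if the file is intact enough to be opened as a zip at all.

```python
import html
import re
import zipfile


def docx_text(path_or_file) -> str:
    """Pull the plain text out of a .docx file, one paragraph per line."""
    with zipfile.ZipFile(path_or_file) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    # Turn the end of each paragraph into a line break, then drop all other tags.
    xml = re.sub(r"</w:p>", "\n", xml)
    text = re.sub(r"<[^>]+>", "", xml)
    # Convert escaped characters such as &amp; back to normal text.
    return html.unescape(text)
```

If the recovered file is too damaged, zipfile raises BadZipFile instead of returning text, and this approach will not help.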

There are 3 comments:

  1. MoreOff

    Leo, Thanks for mentioning the STRINGS program.
    When I went to the Link I noticed STRINGS was a part of the SysInternals Suite, so I grabbed a newer copy of the Suite while I was there instead of just the one file.
    Thanks Again!

  2. Hans

    This is a tip for anyone writing a serious volume of text.
    I’ve written two books of more than 600 pages each. The first project took 1.5 years, the second nearly 3 years. Since I truly hate to redo any work, I’ve designed a scheme that goes further than simply backing up. I save my work every 30 minutes, using a new name for every instance. If the project working name is XYZ, then each file name follows the format XYZ_yymmdd_a, i.e., it is suffixed with the date and a letter. Every day I start with the letter a, and every half hour I save the file using the next letter of the alphabet. I also immediately copy every new instance to a usb thumbdrive dedicated to this one purpose. At the end of every day I back up the harvest of that day to an external disk. On the computer’s hard drive, I use a set of folders to store away the bulk so as not to be bothered by it.
    Through the months, I regularly remove old instances, but always keep the last one of each day.
    I have a micro vault cord around my neck to which I attach the daily usb thumbdrive, and never leave the house without wearing it. If the whole place burns down, I get a new computer and happily continue the project.
    Especially in later stages of the writing, when you’re not adding new text at the end but make changes all over the place, it is important *never* to lose any of your work.
    This procedure takes little effort, and only a bit of discipline, and delivers great peace of mind.

  3. Lee Guptill

    I was not able to extract the zip file using WinZip or 7-Zip. I am wondering, MoreOff, whether I need the whole suite for it to work?
