I am trying to convert a PDF into a Word document and there do not seem to
be any free programs that will do this. I have tried several. Can you help?
Your question is a common one, but it represents a fundamental
misunderstanding of exactly what PDF documents are and how they were intended
to be used.
I’ll put it another way: let me explain why I can’t help you.
PDF, or Portable Document Format, is a document format that is intended to address the fact that every computer is different from almost every other computer in some ways. Frequently, those ways affect how documents are displayed and/or printed.
For example, a Word document you receive might use fonts that aren’t installed on your machine, so when it’s displayed, Word has to pick a “close” font – but it’s still different. Perhaps your printer has a minimum margin of 1/2 inch, but your friend’s printer can handle 1/8 inch – when you each print the same Word document, it wraps, paginates and fundamentally looks different depending on the specifics of each printer.
The things that contribute to visible differences in document presentation aren’t limited to fonts and printers, nor are they limited to Word documents – almost any program that produces a document is susceptible to all sorts of system-to-system differences.
That’s the problem the PDF format sets out to solve: a PDF document is intended to look the same everywhere, and in practice it pretty much does, not only across a wide variety of computers but also, these days, on devices ranging from portable readers to cell phones. It’s a pretty powerful concept.
I want to stress that last point again: “a PDF document is intended to look the same everywhere.”
The upshot is that PDF is fundamentally a display format. It’s often been termed “electronic paper” and that’s a great way to think of it. One of the most common ways to create a PDF file is to install a virtual printer driver and “print” one.
It was never intended that a PDF document be used as “input” to some process to extract, edit or copy its contents.
Editing or Converting PDFs
What do you do after you print a document to paper and find an error that needs to be corrected?
You reprint it.
That’s the “correct” way to make a change in a PDF – alter the original document that it was created from and re-create the PDF.
That’s also the “correct” way to “convert” it to a Word document – keep the original Word (or other) document around for editing purposes.
One of the problems is that there’s really no always-effective way to get the data back out of a PDF. PDFs can contain so many different kinds of data – text, pictures, pictures of text – that getting data out could be as simple as copy/paste, or as complex as having to run every page through OCR (optical character recognition) software to recover text in some kind of editable format.
In addition, the information in a PDF is organized for layout, page by page. All your careful organization by topic and chapter, with paragraphs that flow neatly from one to another as you edit the document, is completely replaced with an organization that reflects the physical layout of the information on the page. The display organization of a document is often very different from your conceptual organization of the document’s contents.
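To make that concrete, here’s a minimal sketch in Python. It is not how any real PDF tool works; the `Fragment` class, the coordinates and the `column_split` value are all hypothetical, made up purely for illustration. It models a two-column page as pieces of text placed at positions, then reads them back two ways: in simple top-to-bottom page order, and in the order the author actually wrote them.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A piece of text placed at a position on the page (hypothetical model)."""
    x: float  # distance from the left edge of the page
    y: float  # distance from the top edge of the page
    text: str


# A two-column page. The author wrote the left column first, then the right.
page = [
    Fragment(50, 100, "Chapter 1 begins here"),
    Fragment(50, 120, "and continues on this line."),
    Fragment(320, 100, "Chapter 2 starts in"),
    Fragment(320, 120, "the second column."),
]


def layout_order(fragments):
    """Read top to bottom, then left to right, ignoring columns entirely."""
    return [f.text for f in sorted(fragments, key=lambda f: (f.y, f.x))]


def logical_order(fragments, column_split=300):
    """Read the whole left column, then the right.

    column_split is an assumption: the page itself does not say where
    one column ends and the next begins.
    """
    return [f.text for f in sorted(fragments, key=lambda f: (f.x >= column_split, f.y))]


print(" ".join(layout_order(page)))
print(" ".join(logical_order(page)))
```

The first line printed interleaves the two chapters; the second reads correctly, but only because the code was told where the columns split. Nothing on the page records that.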
PDF Editing and Extraction Tools
There are tools, but as you’ve seen, they often don’t work, don’t work well, don’t work completely, or don’t work the way we really want them to.
For example, I can’t point you to a tool that’ll take a PDF in and produce a Word document out that matches the Word document you probably expect, if for no other reason than that it’s difficult, if not impossible, to reconstruct the document’s logical organization from its physical layout. Tools can make lots of assumptions, but that’s all they are – assumptions – and by their very nature they’re often wrong.
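Here’s a rough sketch, in Python, of the kind of assumption a conversion tool might make. It isn’t taken from any actual product; the `guess_paragraphs` function, the `gap_threshold` value and the sample line positions are hypothetical. It assumes that a larger-than-normal vertical gap between lines means a new paragraph.

```python
def guess_paragraphs(lines, gap_threshold=18.0):
    """Group (y, text) lines into paragraphs.

    Assumption: a vertical gap larger than gap_threshold starts a new
    paragraph. The threshold is a guess, not something the page states.
    """
    paragraphs = []
    current = []
    previous_y = None
    for y, text in sorted(lines):  # top of the page first
        if previous_y is not None and y - previous_y > gap_threshold:
            paragraphs.append(" ".join(current))
            current = []
        current.append(text)
        previous_y = y
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


# Three paragraphs as written, but the second has no extra space above it.
lines = [
    (100.0, "The first paragraph starts here"),
    (114.0, "and ends on this line."),
    (128.0, "A second paragraph, set with no extra spacing,"),
    (142.0, "is wrongly merged into the first."),
    (170.0, "Only this one is split off."),
]

for paragraph in guess_paragraphs(lines):
    print(paragraph)
```

The author wrote three paragraphs, but the guess finds only two, because the second paragraph happens to be set without extra spacing. Lowering the threshold would fix this page and break some other one.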
Editing tools do exist – the makers of Foxit Reader, for example, have several. But once again not everything works the way you might expect: yes, you may be able to change a typo, but adding a paragraph or picture that would cause the entire document to re-flow and re-paginate is probably not something that’ll work as expected, if at all.
And of course not every PDF is editable – either by design (encryption or password protection applied when the PDF is created) or as a practical matter. A PDF that’s built from images of pages – say, .jpg files – may look exactly like a PDF that’s made from a Word document, but the ability to go back to the original text is severely impaired.