My document management approach has changed over the years. I used to, very carefully and manually, scan documents, name files, put them in organized folders and so on. I don’t do anything like that anymore.
I’ve talked and written before about how I strive to be as paperless as possible. I get documents electronically whenever possible. Bills, receipts, and much more come to me in the form of bits rather than on sheets of paper.
One of the biggest reasons I do this is that I can do something with bits much more easily than I can with paper: I can back them up. With paper, unless you burn more paper with a copy machine, you have exactly and only one copy. With electronic documents, you can have as many copies as you care to create.
I think for most of my electronic documents, about half a dozen machines in something like three or four states would all have to implode simultaneously for me to actually lose something – and those include not just my machines but some that are owned by some pretty large companies.
So, what do I do when I actually get paper? Needless to say, I scan it, but it’s how I scan and how I manage my documents after that where things get kind of interesting.
I have a Fujitsu ScanSnap scanner. It’s an older model, but it has worked solidly for several years and it has been worth every penny I paid for it. You can drop a multi-page document into its feeder, push a button, and it’ll proceed to scan both sides of the entire document with pretty amazing speed.
Once it’s done so, I typically shred the actual paper document for security. My thinking is that the recycle bin on my street is a lot less secure than my own electronic document management techniques.
When a document is scanned by my ScanSnap, it automatically runs OCR: Optical Character Recognition. Remember, a scan of a document is really just a picture of that document. There’s no text associated with it that a computer can use. OCR is a separate process where the computer looks at the picture and tries to determine what the actual text on that page is.
The document with the text is then stored.
I put it all in Evernote
Now, I store almost all of my documents in Evernote. For security, of course, it has two-factor authentication enabled. I’ve stopped trying to name or file any of the scanned documents in Evernote beyond a rudimentary scanned documents folder.
So how do I find anything? Well, I use search first, and it’s awesome. That’s why the documents are OCR’d in the first place. It allows me to search for a term I think the document I’m looking for contains.
Evernote search is blazingly fast, and I quickly cut the list of hundreds or thousands of documents down to just a few or even just one. If there is more than one, I do a quick visual scan and I’ve got exactly what I’m looking for.
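To illustrate why adding a search term or two narrows things down so quickly, here’s a minimal Python sketch. It is not how Evernote’s search works internally; the file names, the sample OCR text, and the `search` function are all made up for illustration. It simply keeps the documents whose text contains every term in the query.

```python
def search(documents, query):
    """Return the names of documents whose OCR text contains every query term."""
    terms = query.lower().split()
    return [
        name
        for name, text in documents.items()
        # A document matches only if all terms appear somewhere in its text.
        if all(term in text.lower() for term in terms)
    ]


# Hypothetical scanned documents: file name -> text recovered by OCR.
documents = {
    "scan_001.pdf": "Acme Power Company statement amount due 84.12",
    "scan_002.pdf": "Luigi's Trattoria receipt total 42.50 thank you",
    "scan_003.pdf": "Acme Power Company notice of rate change",
}

print(search(documents, "acme"))            # two matches
print(search(documents, "acme statement"))  # narrowed to one
```

One term leaves two candidates; a second term leaves just one. That’s the same narrowing I rely on, only on a much larger pile of documents.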
But note my overall process:
- insert the paper into the scanner
- push a button
That’s all I need to do. Everything that happens after that is completely automatic until, of course, I need to find a document. Then I just perform a quick search and have what I need.
Now, it gets even better. Evernote has a mobile app. That means two things. 1) Wherever I go, I have almost immediate access to all the documents in my collection and 2) I can add documents quickly and easily wherever I am.
How? By taking a picture.
Like I said, a scan is nothing more than a picture of a document. My phone has a camera. So, for example, I take pictures of receipts at restaurants rather than carting home the paper slip. That picture goes directly into Evernote. Evernote even does OCR on uploaded pictures this way, so I can easily find them later with a search.
Honestly, the only time I ever have trouble finding documents these days is when they pre-date my switch to Evernote a few years ago. Then I have to go back to searching my old collection of folders and files.
15 comments on “How Can I Manage a Lot of Scanned Documents?”
OK, you scan and OCR the document. Maybe it’s a bank statement. OCR is only about 99% accurate. How do you deal with the occasional drop-out?
To be honest, I never notice. The OCR is used only for searching – when I look at (or, heaven forbid, print :-) ) the document the original image is used. So a missed OCR hit only affects search ability. When searching I tend to throw enough random terms into the mix that it’s rare I can’t find what I need.
Hi, what OCR program do you use? Is it free?
It’s part of the ScanSnap software that came with the scanner, I believe. Evernote can do it as well. And, sidestepping my process completely, you can use Google Drive and Docs to do it.
I love this idea, but it would concern me to have, for example, my bank and credit card statements in Evernote since it cannot be encrypted. So you have only a password to your Evernote account between this sensitive data and the rest of the world. Especially on the phone, where the default settings keep you logged in at all times, access is just a tap away. Do you take additional steps to protect this data somehow?
I have two-factor authentication enabled on my Evernote account. I also lock my phone with a PIN, and have remote-wipe enabled should I ever lose it.
Surely, Mark, in this electronic age you should be managing your bank and credit card accounts online. Thus… no paper to scan. My bank and credit card providers offer a service to supply statements going back more years than I care to remember!
Yes, bank and CC statements are available online. What if, say, you switch banks or credit cards? As soon as you are no longer a customer, you will lose access to the historical online statements. Additionally, lots of data that I would classify as “sensitive” arrives in the mail without being easily available online. Everyone has their own level of comfort, I guess.
I also think the “search for anything you want” system has limitations, as well. Let’s say it’s tax time and you want to get documentation for all charitable contributions for the year. Without categorization, you can’t just search for “contribution”. You would have to remember the name of every organization (or some other data that would likely be on the documents you scanned), for example, and search separately for that.
This is just one example, but it points out the limitations. For any system to work, you need to know its limitations so you can plan around them where necessary.
Maybe it’s as easy as hand-writing “TAX” on things during the year so that then becomes a searchable term – I don’t know.
Lastly, don’t forget to back up the Evernote database. Sure, they are a thriving company now, but what about in 10 years? Could you even convert their format to something readable if their online presence disappeared?
I download my bank and CC statements in PDF form for long-term archives. You don’t have to leave the institution … sometimes they only keep the last year or two available online. And, not terribly surprising I hope, I do back up my Evernote database. :-) I get what you’re saying about search – I really do – and I had many of those same reservations. And yet … I have always found what I’m looking for and I’ve been doing this for something like three years now.
Where does it store the OCR’ed text? Is it embedded in the PDF file? Or is it Evernote metadata? Other?
My approach embeds it in the PDF. If I save the PDF out of Evernote the ability to search the text remains.
My company takes this a step further. Using something called Doc-It, they rename PDFs and move them into the appropriate folder. Doc-It says their product is for accounting companies, but I think it could be widely applicable.
A Fujitsu scanner with ScanSnap is expensive, but it’s a classic case of, “you get what you pay for.”
For about 12 years I’ve used PaperPort [by ScanSoft] to manage stored scans and find it invaluable. The later versions [now PaperPort Professional 14] provide an excellent search facility which has never failed me. If documents are important, as they are for me, I’d rather pay a few bucks every few years to keep programs like this up to date. As has already been said… you get what you pay for!
Many thanks, Leo, as always. Time for another latte!
Has anyone found a scanner that can take a stack of empty envelopes and scan one side? Only problem with most feeder scanners is that they won’t feed anything thicker than one piece of paper.
I store my personal scanned documents in cloud storage services like Evernote, Dropbox, SkyDrive and Google Drive. The free space on these sites is more than enough for me. But for office documents, I never use these options. Our company has hired a third-party company named Ash Conversions in Weston to deal with all the document scanning and management work. They do a good job of freeing us from all the hassles of scanning and management. Their online document management tool lets us easily access all our essential documents.