I get this question, and variations on it, surprisingly often. I have theories about why, but nonetheless it’s a real question that apparently a lot of people have.
On one hand, the answer seems obvious. However, depending on the circumstances, there are possibilities we need to consider.
The devil, as they say, is in the copy/pasted details.
There should be no difference
In general, pasting text into a document should be no different than typing it in.
Highlight this sentence, copy it, switch to Notepad, and paste it. The result will be exactly the same as if you had carefully typed it in by hand.
That’s the theory behind the clipboard and copy/paste: it’s a shortcut to make life easier by saving us keystrokes.
Highlight this sentence, copy it, switch to Notepad, and paste it. The result will be different: the word “this”, italicized in the source, will not be italicized in the pasted copy, because Notepad doesn’t support rich text.
This is an example to show that in some applications, copy/paste can actually copy and paste more than just the text you see: it can include “metadata”, the data about the data.
Exactly what the metadata is, what it says, or even whether it’s there at all, depends on exactly where you are copying from. It could be visible, as in “this word should be in italics”, or it could be invisible, as in “these words link to that website”.
Exactly what happens to metadata depends on the program you’re pasting it into. It could be ignored, as Notepad ignores the instructions to italicize a word; it could be copied verbatim, as in preserving a hidden link as a hidden link; or it could be modified, perhaps unhiding that hidden link by applying default link formatting to it.
As a result, more could be copy/pasted than you think, and the presence of some of that data could give away the fact that it probably hadn’t been typed in by hand.
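To make the idea of metadata concrete, here is a minimal Python sketch. It assumes the copied content arrives as an HTML fragment, which is one common way rich text is represented, though not the only one. The class name `ClipboardFragment`, the function `inspect_fragment`, and the sample sentence are all made up for illustration. The sketch separates what a plain-text editor like Notepad would keep from what a rich-text editor might also keep.

```python
from html.parser import HTMLParser


class ClipboardFragment(HTMLParser):
    """Split an HTML fragment into visible text and the metadata around it."""

    def __init__(self):
        super().__init__()
        self.text_parts = []  # what a plain-text editor would keep
        self.metadata = []    # formatting and link info a rich editor may keep

    def handle_starttag(self, tag, attrs):
        if tag in ("i", "em", "b", "strong"):
            # Visible metadata: "this word should be in italics (or bold)"
            self.metadata.append(("format", tag))
        elif tag == "a":
            # Invisible metadata: "these words link to that website"
            href = dict(attrs).get("href")
            if href:
                self.metadata.append(("link", href))

    def handle_data(self, data):
        self.text_parts.append(data)


def inspect_fragment(html):
    """Return (plain_text, metadata) for a hypothetical copied HTML fragment."""
    parser = ClipboardFragment()
    parser.feed(html)
    parser.close()
    return "".join(parser.text_parts), parser.metadata


rich = ('Highlight <i>this</i> sentence and visit '
        '<a href="https://example.com/">that website</a>.')
plain, meta = inspect_fragment(rich)
print(plain)  # the text alone, as Notepad would show it
print(meta)   # the extra data that came along for the ride
```

Text typed by hand into a plain box would have an empty metadata list. A paste from a web page into a rich-text editor might not, and that leftover data is the kind of thing that can give a paste away.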
Intentional or accidental differences
The scenario in the original question — an OCR of an image of text, effectively transforming a picture of text to a series of individual characters that can be copy/pasted — is an interesting one.
OCR is rarely perfect. If you are supposed to type what you see, and what you hand in includes the kinds of mistakes OCR typically makes, you probably weren’t typing. Is this a one or the letter l: l? Depending on the font being used, they might be virtually indistinguishable; only the surrounding context can tell you which it is. (And even then, in this example, there’s no context to know which it really is.) OCR errors like this are common, and have patterns that are easy to look for and detect.
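As a rough illustration of what "looking for patterns" could mean, here is a small Python sketch. It assumes the checker has the expected text and that the submission lines up with it character for character, which is a simplification; real comparisons would need proper alignment. The confusable groups and the function name `ocr_style_substitutions` are my own illustrative choices, not a standard list or any particular tool's method.

```python
# Groups of characters that can look alike in some fonts (illustrative only).
CONFUSABLE_GROUPS = [set("l1I|"), set("O0"), set("S5"), set("B8")]


def ocr_style_substitutions(expected, submitted):
    """List (position, expected_char, submitted_char) for look-alike swaps.

    Assumes both strings are aligned position by position; any text past
    the end of the shorter string is ignored.
    """
    hits = []
    for pos, (want, got) in enumerate(zip(expected, submitted)):
        if want == got:
            continue
        # Flag the mismatch only when both characters share a look-alike group.
        if any(want in group and got in group for group in CONFUSABLE_GROUPS):
            hits.append((pos, want, got))
    return hits


expected = "Invoice 1001 is filed"
submitted = "Invoice l00l is filed"
for pos, want, got in ocr_style_substitutions(expected, submitted):
    print(f"position {pos}: expected {want!r}, found {got!r}")
```

A human typist is more likely to hit a neighboring key than to swap a one for a lowercase L, so a submission whose only mistakes are look-alike swaps is suggestive, though not proof, of OCR.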
As we’ll see in a moment, the error might be intentional. If I purposely misspell a word, give you an image of the text containing the word, and tell you to type what you see — do you take that literally, and include the typo? Or do you fix the typo? A copy/paste will never fix a typo — it’ll copy exactly, and only, what was there to begin with.
The difference can be telling.
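The intentional-typo idea can be sketched in a few lines of Python. This assumes the person setting the task planted one known misspelling and knows its correct spelling; the function name `classify_submission` and the sample words are hypothetical. Note what it can and cannot say: a preserved typo is consistent with copy/paste, but also with someone typing literally what they saw.

```python
def classify_submission(submitted, planted_typo, corrected):
    """Report what happened to a deliberately planted typo in a submission."""
    has_typo = planted_typo in submitted
    has_fix = corrected in submitted
    if has_typo and not has_fix:
        # Copied exactly as given: could be a paste, could be literal typing.
        return "typo preserved"
    if has_fix and not has_typo:
        # Someone (or some spell-checker) changed the text along the way.
        return "typo corrected"
    # Neither or both spellings present: nothing can be concluded.
    return "inconclusive"


print(classify_submission("Please recieve the package.", "recieve", "receive"))
print(classify_submission("Please receive the package.", "recieve", "receive"))
```

On its own this is a weak signal. It becomes more telling when combined with other evidence, such as the OCR-style mistakes or leftover metadata discussed earlier.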
Again, for reasons that will become apparent shortly, I have to include spyware of some sort in the mix. Spyware can tell exactly what you were doing, right down to the keystroke. It will make it glaringly obvious that you didn’t type something, but copy/pasted instead.
If you’re using a school or business computer, they have every right to monitor your activity with spyware.
They can tell.
What I think is going on
That last point is probably a giveaway to what I think is happening.
I believe that students or employees have been given an assignment to re-type text that’s been given to them in the form of an image, and they’re trying to cheat by using OCR and copy/paste instead.
The worst case, I suppose, would be a typing class where you’re supposed to be practicing your typing. Copy/paste isn’t practice, but it might seem a lot easier.
Regardless of the reasons, my gut tells me people are trying to take a shortcut where they’re not supposed to, and are concerned about being found out.
I have two pieces of advice for those folks:
- Ask for clarity in the assignment, or explicit permission to OCR and copy/paste. If the task allows it, it can absolutely speed things up.
- Don’t cheat. Follow your instructor’s instructions or your boss’s rules.
As we’ve seen, while in theory copy/paste isn’t detectable in most cases, it’s possible it could be accidentally exposed by various means.