
Can I Tell If Something’s Been Pasted Instead of Typed?

Question: I have received a .rar file containing many JPEG files, and I have to type their text into Notepad. Instead, I copy the text from the images using Microsoft notes, paste it into Notepad, and save the document as a .txt file. Can that be detected?

I get this question, and variations on it, surprisingly often. I have theories about why, but nonetheless it’s a real question that apparently a lot of people have.

On one hand, the answer seems obvious. However, depending on the circumstances, there are possibilities we need to consider.

The devil, as they say, is in the copy/pasted details.


TL;DR:

In theory, copy/pasting should be indistinguishable from typing text by hand. Exceptions might include formatting and other metadata not supported by the application you paste into, or errors intentionally introduced into an original specifically to confirm that the information was typed and not copied.

There should be no difference

In general, pasting text into a document should be no different from typing it in.

Highlight this sentence, copy it, switch to Notepad, and paste it. The result will be exactly the same as if you had carefully typed it in by hand.

That’s the theory behind the clipboard and copy/paste: it’s a shortcut to make life easier by saving us keystrokes.

However.

Obvious differences

Highlight this sentence, copy it, switch to Notepad, and paste it. The result will be different. The word “this” will not be italicized, because Notepad doesn’t support rich text.

This example shows that in some applications, copy/paste can carry more than just the text you see: it can include "metadata", the data about the data.

Exactly what the metadata is, what it says, or even whether it's there at all depends on where you are copying from. It could be visible, as in "this word should be in italics", or it could be invisible, as in "these words link to that website".

What happens to that metadata depends on the program you paste it into. It could be ignored, as Notepad ignores the instruction to italicize a word; it could be copied verbatim, preserving a hidden link as a hidden link; or it could be modified, perhaps making that hidden link visible by applying the default link formatting to it.

As a result, more could be copy/pasted than you think, and the presence of some of that data could reveal that the text probably wasn't typed in by hand.
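To make the "ignored" case concrete, here is a minimal Python sketch. It assumes the copied selection arrives as an HTML fragment (many programs put an HTML version of a rich-text selection on the clipboard, but the fragment below is made up for illustration) and mimics a plain-text editor by keeping only the characters between tags. The names `PlainTextPaste` and `paste_as_plain_text` are hypothetical; this is not how Notepad is actually implemented.

```python
from html.parser import HTMLParser


class PlainTextPaste(HTMLParser):
    """Keep only the visible text of an HTML fragment, roughly what a
    plain-text editor such as Notepad ends up with after a paste."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: list[str] = []
        self.discarded_links: list[str] = []  # metadata that never reaches the plain text

    def handle_starttag(self, tag, attrs):
        # Tags such as <em> and <a> carry formatting and link metadata.
        # Record link targets so we can see what was thrown away.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.discarded_links.append(value)

    def handle_data(self, data):
        # Only the characters between tags survive.
        self.text_parts.append(data)


def paste_as_plain_text(html_fragment: str) -> tuple[str, list[str]]:
    parser = PlainTextPaste()
    parser.feed(html_fragment)
    parser.close()
    return "".join(parser.text_parts), parser.discarded_links


# Hypothetical clipboard fragment: one italic word and one link.
fragment = 'Highlight <em>this</em> sentence, then read <a href="https://example.com/more">more</a>.'
text, lost = paste_as_plain_text(fragment)
print(text)  # Highlight this sentence, then read more.
print(lost)  # ['https://example.com/more']
```

A rich-text target such as a word processor might instead keep the `<em>` and the link, which is the "copied verbatim" case; whatever survives is what could give a paste away.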

Intentional or accidental differences

The scenario in the original question, running OCR (optical character recognition) on an image of text to turn a picture of text into a series of individual characters that can be copy/pasted, is an interesting one.

OCR is rarely perfect. If you are supposed to type what you see, and what you end up with includes OCR mistakes, you probably weren't typing. Is this a one or the letter l: l? Depending on the font, they can be virtually indistinguishable; often only the context tells you which is meant. (And even then, in this example, there's no context to tell which it really is.) OCR errors like this are common, and they follow patterns that are easy to look for and detect.
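As a rough illustration of how easy such patterns are to look for, here is a Python sketch that flags "words" mixing letters with digits that OCR commonly confuses with look-alike letters. The confusable set (0 for O, 1 for l or I) and the function name `suspicious_words` are assumptions chosen for this example, not a standard list.

```python
import re

# Digits that OCR often produces in place of look-alike letters
# (assumed set for this example: 0 for O/o, 1 for l/I).
CONFUSABLE_DIGITS = set("01")

# Treat any run of letters and digits as a "word".
WORD = re.compile(r"[A-Za-z0-9]+")


def suspicious_words(text: str) -> list[str]:
    """Return words that contain both a letter and a confusable digit."""
    flagged = []
    for word in WORD.findall(text):
        has_letter = any(ch.isalpha() for ch in word)
        has_confusable = any(ch in CONFUSABLE_DIGITS for ch in word)
        if has_letter and has_confusable:
            flagged.append(word)
    return flagged


print(suspicious_words("P1ease return the f0rm to the 0ffice by 10 am."))
# ['P1ease', 'f0rm', '0ffice']
```

A real checker would need to allow legitimate mixes such as "MP3" or "B52", which this sketch would also flag.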

As we'll see in a moment, the error might be intentional. If I purposely misspell a word, give you an image of the text containing that word, and tell you to type what you see, do you take that literally and include the typo? Or do you fix it? A copy/paste will never fix a typo; it copies exactly, and only, what was there to begin with.

The difference can be telling.
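An instructor who plants deliberate misspellings could check a submission for them with something as simple as the Python sketch below. The trap words and the function name `planted_typos_found` are hypothetical. A faithful typist would also reproduce the typos, so a match is only a hint; it's a planted typo that was silently "fixed" that points to typing rather than copy/paste.

```python
import re

# Hypothetical misspellings deliberately placed in the image handed out.
PLANTED_TYPOS = ["recieve", "seperate", "definately"]


def planted_typos_found(submission: str) -> dict[str, bool]:
    """Report which planted misspellings appear verbatim in the submission."""
    words = set(re.findall(r"[a-z]+", submission.lower()))
    return {typo: typo in words for typo in PLANTED_TYPOS}


result = planted_typos_found("You will recieve two seperate forms, definitely by Friday.")
print(result)  # {'recieve': True, 'seperate': True, 'definately': False}
```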

Spyware

Again, for reasons that will become apparent shortly, I have to include spyware of some sort in the mix. Spyware can tell exactly what you were doing, right down to the keystroke. It will make it glaringly obvious that you didn’t type something, but copy/pasted instead.

If you’re using a school or business computer, they have every right to monitor your activity with spyware.

They can tell.
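Monitoring software doesn't need anything clever to spot a paste. The Python sketch below assumes a hypothetical keystroke log (the `KeyEvent` record is invented for this example; real monitoring tools each have their own formats) and flags a document when Ctrl+V was pressed and far fewer characters were typed than the document contains. The 50% threshold is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    """Hypothetical keystroke-log entry."""
    key: str            # a single character such as "a" or " ", or a name such as "Enter"
    ctrl: bool = False  # True if Ctrl was held down


def looks_pasted(events: list[KeyEvent], document: str, threshold: float = 0.5) -> bool:
    """Flag a document that is much longer than what was actually typed
    and whose log includes at least one Ctrl+V."""
    typed_chars = sum(1 for e in events if not e.ctrl and len(e.key) == 1)
    used_paste = any(e.ctrl and e.key.lower() == "v" for e in events)
    return used_paste and typed_chars < threshold * len(document)


doc = "The quick brown fox jumps over the lazy dog."
typed_log = [KeyEvent(ch) for ch in doc]                        # every character typed
paste_log = [KeyEvent("v", ctrl=True), KeyEvent("s", ctrl=True)]  # paste, then save

print(looks_pasted(typed_log, doc))  # False
print(looks_pasted(paste_log, doc))  # True
```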

What I think is going on

That last point is probably a giveaway to what I think is happening.

I believe that students or employees have been given an assignment to retype text they received in the form of an image, and they're trying to cheat by using OCR and copy/paste instead.

The worst case, I suppose, would be a typing class where you’re supposed to be practicing your typing. Copy/paste isn’t practice, but it might seem a lot easier.

Regardless of the reasons, my gut tells me people are trying to take a shortcut where they’re not supposed to, and are concerned about being found out.

I have two pieces of advice for those folks:

  1. Ask for clarity in the assignment, or explicit permission to OCR and copy/paste. If the task allows it, it can absolutely speed things up.
  2. Don’t cheat. Follow your instructor’s instructions or your boss’s rules.

As we’ve seen, while in theory copy/paste isn’t detectable in most cases, it’s possible it could be accidentally exposed by various means.

Do this

Subscribe to Confident Computing! Less frustration and more confidence, solutions, answers, and tips in your inbox every week.

I'll see you there!


7 comments on “Can I Tell If Something’s Been Pasted Instead of Typed?”

  1. I’ve seen this question asked many times on Ask Leo! My question is why would an employer insist that it be typed and not copied and pasted? Perhaps there are some employers who require this who can explain why.

    As a teacher, I tell my students not to plagiarize articles, and I verify by copying and pasting portions of the assignment into Google. I've caught a lot of students that way, but that's something completely different from a work assignment.

  2. There are two different characters that can show up as the blank space between words. Word inserts a regular space when you press the space bar. Text copied from web pages often contains a different one, the non-breaking space. If you are printing the document, it does not matter: neither character prints, and no one will know.
    You can see them by turning on the "Show/Hide" tool (Home tab > Paragraph group > click the ¶ symbol). Text typed in Word will have little dots between the words: "I·want·lunch". Text copied from the Internet may have little circles instead: "I◦want◦lunch". Sometimes there is a mixture. These different space markers can indicate that something was copied and pasted.
    If you want to make them all the same, use the Find and Replace tool (press Ctrl+H). In Find what, enter the non-breaking space (◦), which is Alt+0160, or 00A0 followed by Alt+X. This will find every ◦. In Replace with, enter a regular space (·), which is Alt+32, or 0020 followed by Alt+X.
    Click Replace All. This will replace every ◦ with · .
    [Alt+0160] means hold down the Alt key, type 0160 on the numeric keypad, then release the Alt key.
    [00A0, Alt+X] means type the number, then press the Alt key and X at the same time.

  3. I'm corresponding with someone on Zoosk (a dating site), and some of his emails hint that he is sending the same thing to multiple women. Is there a way I can tell?

  4. @Jan Elliott
    There is a way. Make several “fake” female accounts, get in touch with the “suspect” with all of these accounts. If the “suspect” sends the same emails to your alter egos, (s)he probably also does it to other (real) women.

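The non-breaking-space check described in the second comment can also be done outside Word. This Python sketch counts U+00A0 (non-breaking space) characters, reports whether regular and non-breaking spaces are mixed, and normalizes everything to ordinary U+0020 spaces, which is what the Find and Replace steps accomplish. The function names are hypothetical, and a non-breaking space is only a hint: some people and programs insert them deliberately.

```python
NBSP = "\u00a0"   # non-breaking space, common in text copied from web pages
SPACE = "\u0020"  # ordinary space, what the space bar types


def nbsp_count(text: str) -> int:
    """Count non-breaking spaces in the text."""
    return text.count(NBSP)


def has_mixed_spaces(text: str) -> bool:
    """True when both kinds of space appear, which may hint at pasted text."""
    return NBSP in text and SPACE in text


def normalize_spaces(text: str) -> str:
    """Replace every non-breaking space with an ordinary space."""
    return text.replace(NBSP, SPACE)


sample = "I want" + NBSP + "lunch"
print(nbsp_count(sample))                          # 1
print(has_mixed_spaces(sample))                    # True
print(normalize_spaces(sample) == "I want lunch")  # True
```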
