I fully understand the theory behind using “gotcha” symbols for many online
processes. But if the gotcha is a picture of numbers and letters, then WHY must
they make them so difficult to read? I have to regenerate these things over and
over to get one that is readable. If it’s a photo, then why make the symbols
all twisty, blurred, and faded?
While it might feel like a “gotcha”, it’s actually called a CAPTCHA, which is an acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart”.
Yep, it’s a “prove you’re human” test.
And all that twisty, blurry, faded stuff you’re complaining about? That’s
actually kinda the point.
That’s the test.
The problem that CAPTCHAs address is actually an important one: preventing computers from doing things in an automated fashion, like creating millions of fake email accounts or posting comment spam to websites. By forcing a test that computers cannot (yet) pass, a site ensures that the protected activity can be performed only by a real, live person.
The limitation that these tests take advantage of is that computers can’t read.
Now, technically that’s incorrect – optical character recognition has come a long way. Computer OCR software can, with a very high degree of reliability, take a photograph or scan of text printed on a page and “read” it – turn it into the computer representation of the text that the page contains, as opposed to a picture of that text.
That’s actually pretty cool, and very handy for many applications.
However, there are limits. Even with clear copies of the text, a computer has a difficult time with some characters (the letter ‘l’ versus the number ‘1’ in many typefaces, for example), and thus can still get things wrong.
When things get blurry, twisted, or faded, current computer algorithms try to figure out what those characters are and fail miserably.
You and I, on the other hand, can.
So when we get the answer correct where a computer couldn’t possibly, it “proves” we’re human.
As computer technology advances, I’m sure techniques will be developed that allow computers to correctly interpret today’s CAPTCHAs. What happens then, I don’t know.
A couple of random notes on CAPTCHAs:
One way that they’re often defeated is by hiring real, live humans to solve them, often cheaply and overseas.
Another way that some are bypassed is by exploiting weaknesses in a particular implementation. For example, if one type of CAPTCHA always selects from one of 100 different scrambled words, then one need only have a real human interpret each one once, and then simply let the computer compare pictures – something it is good at.
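To see why a fixed pool of images is such a weakness, here is a minimal Python sketch of the "compare pictures" step. It assumes the site serves byte-for-byte identical image files each time; the placeholder byte strings, the answers, and the names `fingerprint`, `solved_once`, and `lookup` are all made up for illustration.

```python
import hashlib

def fingerprint(image_bytes: bytes) -> str:
    """Reduce an image file to a short, comparable fingerprint."""
    return hashlib.sha256(image_bytes).hexdigest()

# Hypothetical pool: each image a human has already read once,
# keyed by fingerprint, with the answer that human typed.
solved_once = {
    fingerprint(b"<bytes of image 1>"): "harvest",
    fingerprint(b"<bytes of image 2>"): "lantern",
}

def lookup(image_bytes: bytes):
    """Return the known answer for a previously seen image, or None."""
    return solved_once.get(fingerprint(image_bytes))
```

No reading is involved at all: the computer only checks whether it has seen this exact file before. That is one reason a CAPTCHA that generates a freshly distorted image for every request is harder to bypass this way.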
My favorite CAPTCHA, when I use one, is reCAPTCHA, which presents two words in random order: one is a real test, and the other is a word from a book digitization project. (Their about page has not only a good overview of CAPTCHA, but also an explanation of how they’re using it in reCAPTCHA.)
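The two-word idea can be sketched roughly as follows. This is not reCAPTCHA’s actual code; it is a simplified illustration that assumes answers for the unknown word are tallied as votes once the known word has been typed correctly. The function names, the `min_votes` threshold, and the sample words are hypothetical.

```python
from collections import Counter

# Tally of human answers for each not-yet-known word, keyed by a made-up id.
votes = {}

def check_two_words(known_answer, typed_known, unknown_id, typed_unknown):
    """Pass the test only if the known word is typed correctly.

    When it is, record the answer for the unknown word as one vote.
    """
    passed = typed_known.strip().lower() == known_answer.lower()
    if passed:
        tally = votes.setdefault(unknown_id, Counter())
        tally[typed_unknown.strip().lower()] += 1
    return passed

def consensus(unknown_id, min_votes=3):
    """Return the most common answer once enough people agree, else None."""
    tally = votes.get(unknown_id)
    if not tally:
        return None
    word, count = tally.most_common(1)[0]
    return word if count >= min_votes else None
```

The person taking the test cannot tell which word is the real test, so the only reliable way to pass is to read both as carefully as possible.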
CAPTCHAs can have problems – specifically for people with poor or no eyesight. In most cases, an audible CAPTCHA equivalent is made available where you type in what you hear spoken.
Even in normal cases, as you’re seeing, sometimes CAPTCHAs are too hard, too blurry, or too unreadable even for humans. Fortunately, most also include some kind of “show me something else” alternative.
But unfortunately, the bottom line is that the blurriness and the difficulty are indeed the point.
And CAPTCHAs or something much like them will be around for quite some time – probably as long as there are spammers and those who would do other malicious things en masse, given the opportunity to automate the process.
4 comments on “Why are those "retype the word" tests so twisted, faded and blurred?”
“In most cases an audible CAPTCHA equivalent is made available where you type in what you hear spoken.”
I’ve tried this option a few times but the result is usually even less comprehensible than the visual.
Sometimes I just give up.
I’ve heard that “the bad guys” can use even cheaper human labor to bypass CAPTCHA tests — free use of spam victims.
They grab the CAPTCHA image and display it to a human who clicks on their spam link (as if the CAPTCHA image were theirs). The victim then decodes the image, and the “bad guy’s” scripts pass the answer on to the target computer.
Voila! Bad guy’s scripts now bypass the CAPTCHA test.
I like those twisty letters and numbers.
Maybe they should include a few upside-down alphanumericals.
Figure that out if you’re not human.
Just a correction on how reCAPTCHA works: both words are from scanned books. One of them has already been verified. So, “If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.”
And that’s how you are helping to OCR antique books and old editions of the New York Times ;)