I fully understand the theory behind using "gotcha" symbols for many online
processes. But if the gotcha is a picture of numbers and letters, then WHY must
they make them so difficult to read? I have to regenerate these things over and
over to get one that is readable. If it's a photo, then why make the symbols
all twisty, blurred, and faded?
While it might feel like a "gotcha", they're actually called a CAPTCHA,
which is an acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart".
Yep, it's a "prove you're human" test.
And all that twisty, blurry, faded stuff you're complaining about? That's
actually kinda the point.
That's the test.
The problem that CAPTCHAs solve is actually an important one: preventing computers from doing things in an automated fashion, like creating millions of fake email accounts or posting comment spam to websites. By requiring a test that computers cannot (yet) pass, the protected activity can be performed only by a real, live person.
The limitation that these tests take advantage of is that computers can't read.
Now, technically that's incorrect: optical character recognition (OCR) has come a long way. OCR software can, with a very high degree of reliability, take a photograph or scan of text printed on a page and "read" it, turning it into the computer's representation of the text the page contains, as opposed to a picture of that text.
That's actually pretty cool, and very handy for many applications.
However, there are limits. Even with clear copies of the text, a computer has a difficult time with some characters (the letter "l" versus the number "1" in many typefaces, for example), and thus can still get things wrong.
When text gets blurry, twisted, or faded, current algorithms that try to figure out what those characters are fail miserably. They just can't tell what the characters are.
You and I, on the other hand, can.
Usually.
So when we get the answer correct where a computer couldn't possibly, it "proves" we're human.
For now.
As computer technology advances, I'm sure techniques will be developed that allow computers to correctly interpret today's CAPTCHAs. What happens then, I don't know.
A couple of random notes on CAPTCHAs:
- One way they're often defeated is by hiring real, live humans, often cheaply and overseas.
- Another way some are bypassed is by exploiting weaknesses in a particular implementation. For example, if one type of CAPTCHA always selects from one of 100 different scrambled words, then a real human need only interpret each one once; after that, the computer can simply compare pictures, something it is good at.
- My favorite CAPTCHA, when I use one, is reCAPTCHA, which presents two words in random order: one is a real test, and the other is a word from a book digitization project. (Their about page has a good overview not only of CAPTCHA, but also of how they're using it in reCAPTCHA.)
- CAPTCHAs can have problems, specifically for people with poor or no eyesight. In most cases, an audible CAPTCHA equivalent is made available where you type in what you hear spoken.
- Even in normal cases, as you're seeing, CAPTCHAs are sometimes too hard, too blurry, or too unreadable even for humans. Fortunately, most also include some kind of "show me something else" alternative.
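The weakness in the second note (a fixed pool of scrambled words) can be illustrated with a minimal Python sketch of the attacker's side. It assumes the site serves byte-for-byte identical images each time a given word comes up; the class name `CaptchaAnswerCache` and its methods are hypothetical, made up for illustration. A human labels each image once, and from then on the computer only has to match pictures.

```python
import hashlib
from typing import Dict, Optional


class CaptchaAnswerCache:
    """Hypothetical attacker-side lookup table: image fingerprint -> answer.

    Assumes the CAPTCHA serves exactly the same image bytes every time a
    given word comes up (a fixed pool, like the 100 words in the example).
    """

    def __init__(self) -> None:
        self._answers: Dict[str, str] = {}

    @staticmethod
    def fingerprint(image_bytes: bytes) -> str:
        # Exact-match fingerprint: identical bytes always give identical digests.
        return hashlib.sha256(image_bytes).hexdigest()

    def learn(self, image_bytes: bytes, answer: str) -> None:
        # Done once per image, by a real human who reads it.
        self._answers[self.fingerprint(image_bytes)] = answer

    def solve(self, image_bytes: bytes) -> Optional[str]:
        # From then on the computer just compares pictures; None means "never seen".
        return self._answers.get(self.fingerprint(image_bytes))


cache = CaptchaAnswerCache()
cache.learn(b"<bytes of scrambled image #17>", "harbor")
print(cache.solve(b"<bytes of scrambled image #17>"))  # harbor
print(cache.solve(b"<bytes of an image never seen>"))   # None
```

An exact hash like this stops working if the site re-renders every challenge with fresh random distortion, which is one reason a large, randomized pool of challenges matters.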
But unfortunately, the bottom line is that the blurriness and the difficulty are indeed the point.
And CAPTCHAs, or something much like them, will be around for quite some time: probably as long as there are spammers and others who would do malicious things en masse, given the opportunity to automate the process.
"In most cases an audible CAPTCHA equivalent is made available where you type in what you hear spoken."
I've tried this option a few times, but the result is usually even less comprehensible than the visual one.
Sometimes I just give up.
I've heard that "the bad guys" can use even cheaper human labor to bypass CAPTCHA tests: the free labor of spam victims.
They grab the CAPTCHA image and display it to a person who clicks on their spam link (as if the CAPTCHA image were theirs). The victim decodes the image, and the bad guy's scripts then pass the answer on to the target computer.
Voila! The bad guy's scripts now bypass the CAPTCHA test.
I like those twisty letters and numbers.
Maybe they should include a few upside-down alphanumerics.
Figure that out if you're not human.
Just a correction on how reCAPTCHA works: both words are from scanned books. One of them has already been verified. So, "If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct."
And that's how you are helping to digitize antique books and old editions of the New York Times ;)
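The known-word/unknown-word scheme described in that correction can be sketched in a few lines of Python. This is only an illustration of the idea, not reCAPTCHA's actual code: the names `UnknownWord` and `grade`, and the thresholds `MIN_VOTES` and `MIN_AGREEMENT`, are hypothetical assumptions. A user passes if they get the already-verified word right, and only then does their reading of the unverified word count as a vote; the word is accepted once enough people agree.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical thresholds, chosen for illustration; not reCAPTCHA's real values.
MIN_VOTES = 3
MIN_AGREEMENT = 0.75


@dataclass
class UnknownWord:
    """A scanned word whose correct reading is not yet known."""
    image_id: str
    votes: Counter = field(default_factory=Counter)

    def consensus(self) -> Optional[str]:
        # Accept a reading only once enough people have agreed on it.
        total = sum(self.votes.values())
        if total < MIN_VOTES:
            return None
        answer, count = self.votes.most_common(1)[0]
        return answer if count / total >= MIN_AGREEMENT else None


def grade(control_answer: str, control_truth: str,
          unknown: UnknownWord, unknown_answer: str) -> bool:
    """Pass the user if the known word is right; only then count their unknown-word reading."""
    passed = control_answer.strip().lower() == control_truth.lower()
    if passed:
        unknown.votes[unknown_answer.strip().lower()] += 1
    return passed


word = UnknownWord("scan-0042")
print(grade("Morning", "morning", word, "harbor"))    # True: vote counted
print(grade("mourning", "morning", word, "harbour"))  # False: vote ignored
print(word.consensus())                                # None: only 1 vote so far
grade("morning", "morning", word, "Harbor")
grade("MORNING", "morning", word, "harbor")
print(word.consensus())                                # harbor
```

Ignoring the unknown-word answer from anyone who fails the known word is what lets the system trust the votes it does count.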