Why Am I Seeing So Many CAPTCHAs?

The rise of the bots.

Tired of being asked if you're human? I'll explain what site owners like me are dealing with behind the scenes and why you’ll probably be clicking “I’m not a robot” a lot more often.
An “I’m not a robot” CAPTCHA checkbox. (Screenshot: askleo.com)

CAPTCHAs — Completely Automated Public Turing test to tell Computers and Humans Apart — seem to be popping up everywhere, even on sites where you wouldn’t think they’d be needed.

I’ve been tempted to add a CAPTCHA to Ask Leo!. Seriously tempted.

Let me explain what leads to that temptation. I’ll also explain why it’s unlikely to happen, even though the costs of not doing so can be high.

TL;DR:

So. Many. CAPTCHAs.

Bots from AI companies and search engines are flooding websites to read pages. This traffic makes sites slow or expensive to run. Owners use CAPTCHAs to block these bots and save money. I pay for a bigger server instead so that my work gets found online.

The origin of CAPTCHA

We’re all pretty used to the occasional CAPTCHA. They exist to prevent automated systems — bots — from doing things intended only for actual humans.

The most common example is email account creation. Before CAPTCHA, bots could create thousands upon thousands of email accounts from which to send millions and millions of spam emails. CAPTCHAs put a stop to that. Spammers had to resort to other means for their garbage1.

CAPTCHAs are also commonly used to prevent bots from accessing sites over VPNs2. “Real” people encounter them more often when using VPNs, but again, this isn’t terribly common for most people.

Things have changed.


The rise of the bots

A very common CAPTCHA, in this case from Cloudflare. (Screenshot: askleo.com)

We’re seeing the CAPTCHA above, or variations thereof, more and more frequently.

What’s odd is that there’s no clear reason why. These aren’t account-creation pages, and even where accounts are involved, there’s no obvious reason bots would want to create them. Similarly, there’s nothing special about the content that would make it an attractive target for automated abuse by malicious actors like spammers.

But clearly there’s a problem these site owners are trying to address by making a CAPTCHA an entrance requirement.

My recent experience leads me to a theory.

The rise of AI

I mentioned recently that I had to increase the size of my web server because it was getting overwhelmed with page requests.

The problem? Most requests were not from humans. They were bots.

  • Search engine spiders. These have been around since the dawn of search. They scan websites and build indexes to use when people search for things online.
  • AI spiders. These are new, and there are many. They’re all scanning websites to use the content on those sites to train their models or augment their responses.

I had to get a bigger and more expensive server to provide my content to spiders and bots scanning the web.
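
How do I know it’s mostly bots? It shows up right in the server’s access logs. Here’s a rough sketch of the kind of tally involved; the log path and the user-agent markers are simplified assumptions, not my actual setup.

```python
# Rough sketch: tally web server requests by user agent to estimate how much
# traffic is bots. The log path and the marker substrings are illustrative
# assumptions, not my actual configuration.

from collections import Counter

BOT_HINTS = ("bot", "spider", "crawler")  # common markers in crawler user agents

def tally_user_agents(log_path: str) -> Counter:
    counts = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            # Combined-format logs put the user agent in the last quoted field.
            user_agent = line.rsplit('"', 2)[-2].lower() if '"' in line else ""
            kind = "bot" if any(h in user_agent for h in BOT_HINTS) else "probably human"
            counts[kind] += 1
    return counts

# Example use: print(tally_user_agents("/var/log/nginx/access.log"))
```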

The alternative? Block the bots.

The rise of CAPTCHA

My theory is that I’m not alone. Other site owners are faced with the same problem: overwhelming demand from automated systems.

There are two choices:

  • Implement a method to block the bots. This is why we’re seeing so many more CAPTCHAs. They block the bots and presumably allow the website to serve its intended audience on its existing infrastructure.
  • Spend money to increase capacity so as to be able to serve humans and bots alike.

Many websites are choosing the former.
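
For illustration, blocking doesn’t even have to mean a full CAPTCHA; the crudest version simply refuses requests whose user agent identifies them as a crawler. This is a simplified sketch: the substrings are examples only, real block lists are longer and change constantly, and services like Cloudflare do far more sophisticated checks.

```python
# Crude sketch of the "block the bots" choice: refuse requests whose
# User-Agent header looks like a known crawler. The substrings below are
# examples only; real block lists are longer and change constantly.

BLOCKED_AGENT_HINTS = ("gptbot", "ccbot", "examplecrawler")  # illustrative

def should_block(user_agent: str) -> bool:
    ua = (user_agent or "").lower()
    return any(hint in ua for hint in BLOCKED_AGENT_HINTS)

# In a web server or framework, this check would run before serving the page,
# returning a 403 (or a CAPTCHA challenge) instead of the content.
print(should_block("Mozilla/5.0 (compatible; GPTBot/1.0)"))               # True
print(should_block("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox"))  # False
```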

I’ve chosen the latter. Why? Blocking the bots means my content would never appear in their services, whether AI or search. I want my content to be found, and search and AI exist to share content.

Couldn’t you just ask?

There are standard ways to ask search and AI spiders not to scan your website.

Respecting that request is voluntary. While many spiders honor it, many others scan anyway.
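
The best known of those standard ways is a robots.txt file at the root of the site. Here’s a minimal sketch of the polite side of that arrangement, assuming a hypothetical site and bot name: a well-behaved crawler checks the file before fetching anything, but nothing forces a badly behaved one to do so.

```python
# Minimal sketch of the robots.txt mechanism. "example.com" and "ExampleAIBot"
# are placeholders. A polite crawler asks first; compliance is voluntary.

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the site's robots.txt

if rp.can_fetch("ExampleAIBot", "https://example.com/some-article/"):
    print("The site allows this bot to crawl that page.")
else:
    print("The site has asked this bot to stay away.")
```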

My sense is that those that don’t respect the requests are typically fairly poorly written. One way that manifests: rather than spreading their requests out over time so as not to overwhelm sites, they flood a site with hundreds of requests at once. In effect, it’s an unintentional denial-of-service (DoS) attack that can bring some sites to their knees.
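
From the bot’s side, behaving well can be as simple as pausing between requests. A toy sketch, with placeholder URLs:

```python
# Toy sketch of a "polite" crawler: it fetches pages one at a time and pauses
# between requests. The URLs are placeholders. A carelessly written bot skips
# the pause (or fetches everything in parallel), which is what turns routine
# scanning into an accidental denial of service for a small site.

import time
import urllib.request

URLS = [f"https://example.com/article-{i}/" for i in range(1, 6)]  # placeholders

def polite_crawl(urls, delay_seconds=10):
    for url in urls:
        with urllib.request.urlopen(url) as response:
            response.read()
        time.sleep(delay_seconds)  # spread the load out over time

# Example use: polite_crawl(URLS)
```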

Do this

Get used to it.

There’s nothing you or I as site visitors can do about this problem. From the website side of things, I can assure you it’s an ongoing race between site owners trying to block bots and allow humans, and bots getting better and better at appearing to be human.

About the only thing for certain is that this race will be with us for some time.



Footnotes & References

1: This led to the rise of botnets: millions of individual computers compromised with malware and put to work sending spam.

2: I don’t have a clear reason why, but it’s common for VPNs to trigger CAPTCHAs when direct access does not.

7 comments on “Why Am I Seeing So Many CAPTCHAs?”

  1. The problem I have with CAPTCHA is that I just don’t seem to be able to determine, to the system’s satisfaction, which squares show, say, traffic signals. Do you check the picture that has only a tiny portion of it shown? Or not?

    Another site has me press and hold a spot on their site, but I guess I don’t do that correctly either.

    Now I am starting to think it is me?? I don’t have green fingers for gardening either.

  2. “I don’t have a clear reason why, but it’s common for VPNs to trigger CAPTCHAs when direct access does not.”
    It may not be that they are specifically targeting VPNs. It might be the high number of requests coming from VPNs, which can make them look like bots.

    • Those tick-the-box CAPTCHAs use various “fingerprinting” techniques to determine whether you are human. Sometimes they look at aspects of your hardware, or at cookies showing you’ve previously passed a CAPTCHA challenge. Other times they use criteria like mouse movement and other behavioral and environmental signals.

  3. There is a third option: fingerprinting, which, used correctly, lets you provide valuable data to the honest user and fake or no data to bots.

    If the same fingerprint reads pages from your site every two seconds, it’s fair to assume it is a bot.

    Three examples:
    1) You want to protect your data.

    E.g., if the data is the length of a car, you can add 10% to the length.
    The bot (e.g., the Google bot) does not care, and your page will still be properly indexed.
    When a genuine user clicks the Google link, the user, having a different fingerprint, will see the correct data.
    That way you can protect your data.

    2) You want to reduce the load on the server.

    Just provide less data to the bot.

    3) You just want to provide data to proper search engines.

    It’s not difficult to find the IP addresses of legitimate search-engine bots like Google’s and Microsoft’s.
    Make an exception IP table and provide fake data (as in example 1) to those bots, and return reduced or no data to the rest.
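
    For instance, a rough sketch of that exception-table idea (the IP ranges below are documentation placeholders, not real search-engine ranges):

    ```python
    # Rough sketch of an exception IP table: known search-engine ranges get one
    # treatment, everything else gets another. The ranges are documentation
    # placeholders (RFC 5737), not real search-engine addresses.

    import ipaddress

    TRUSTED_BOT_RANGES = [
        ipaddress.ip_network("192.0.2.0/24"),
        ipaddress.ip_network("198.51.100.0/24"),
    ]

    def classify_visitor(remote_ip: str) -> str:
        ip = ipaddress.ip_address(remote_ip)
        if any(ip in network for network in TRUSTED_BOT_RANGES):
            return "trusted bot"   # serve data (adjusted as in example 1)
        return "unknown"           # serve reduced or no data, or challenge

    print(classify_visitor("192.0.2.55"))   # trusted bot
    print(classify_visitor("203.0.113.9"))  # unknown
    ```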

