
Web site spam: what can I do about it?

I have a personal web-site to help computer users that’s been running for
about 6 years. I have a guest book and people have been signing it for years.
Within the past year, though, I’ve been swamped with spammers signing my book.
I get about 6 to 10 spams each day. Each morning I delete them, but it is
getting worse by the day.

I had tried to “hide” my guest book from the public and sacrifice the
ability to have people sign and my enjoyment in reading these. But even this
“hidden” page keeps getting spam.

How can I prevent spammers from signing my guest book? I’d appreciate your
comments and hopefully a solution to this annoying problem.

Oh, I have plenty of comments and opinions on this topic – it’s a problem I
face right here with Ask Leo!

But unfortunately, like spam in general, there’s no single answer – no magic
bullet.

Depending on your server and other specifics, there are several approaches
you can take.


Web spam, also known as “blog spam” or “comment spam”, is definitely on the
rise. Spurred by the popularity of weblogs, or blogs, which allow people to post
comments, spammers are using these forms to post links back to their own sites.
The links aren’t really intended to serve as advertising, per se, but rather
to trick the search engines into thinking that the target site is more
important than it is because of all the incoming links.

Regardless of why, it’s a mess.

There are two types of comment spam generation techniques: manual and
automated. Automated tools will scour the web looking for things that look like
comment or guest book forms, and automatically post their bogus content to
these forms. Manual techniques involve hiring cheap labor overseas to do exactly
the same thing by hand.

While it started as comment spam on blogs, it’s most definitely no longer
limited to that. Almost any form that accepts input on the web is getting
hit.

As I said, there are various tools and techniques to combat comment or web
spam. Which technique might help you depends on how your form is set up and on
what type of server or publishing platform you might be running.

A very common technique is to use what’s called a “CAPTCHA” (“Completely
Automated Public Turing test to tell Computers and Humans Apart”). You’ve
probably seen them – they’re the often distorted characters that you’re asked
to re-type into the form before it will be accepted. As the name implies, it’s
a way to prevent automated tools from posting to your form. Unfortunately it
does nothing to stop actual humans.

If you’re running on a content management system like MovableType, WordPress
or others, then CAPTCHA may already be an option – either as a built-in
feature, or as a plugin for your platform. Unfortunately, creating and using a
CAPTCHA test on your own is far from trivial.


However, if you’re using a standard HTML <form> to get your input, I
developed a technique that relies on JavaScript to throttle spam. In fact, it’s
a technique I use here on Ask Leo! with great success. It’s developed and
described for the MovableType publishing platform, but the technique is in fact
valid for any <form>-based input. You can read more about it on my
MovableType Tips site: Dealing with Comment Spam.

The drawback of this technique is that it requires that JavaScript be
enabled in order for people to post to your form. While most people do have it
turned on, there’s a percentage that do not, and you’ll have to decide if that
is important enough to you.
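
The linked article has the actual details of that technique. As a rough illustration of the general idea only, here is a minimal Python sketch of a server-side check that rejects any submission missing a value that the page's script is expected to write into a hidden field. The field name js_token, the secret key, and both function names are hypothetical, invented for this example.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real site would load this from configuration.
SECRET_KEY = b"replace-with-a-private-value"


def make_token(form_id):
    """Build the value the page's script is expected to copy into a hidden field."""
    return hmac.new(SECRET_KEY, form_id.encode("utf-8"), hashlib.sha256).hexdigest()


def passes_script_check(form_id, submitted_fields):
    """Return True only if the hidden field holds the expected token.

    "js_token" is an assumed field name. A tool that posts to the form
    without running the page's script leaves it empty or missing.
    """
    submitted = submitted_fields.get("js_token", "")
    expected = make_token(form_id)
    return hmac.compare_digest(submitted.encode("utf-8"), expected.encode("utf-8"))
```

A check like this only slows down tools that don't run scripts. It does nothing against a person filling in the form by hand.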

If you’re running an Apache-based web server and you have access to its
configuration, the mod_security module might be an
option. This module can be configured to monitor for terms and take action when
those terms are posted to your form. It’s something else I run on Ask Leo!’s
server, and as a result, attempts to post a comment with certain
four-letter words or certain spam-related phrases will simply be rejected.

Another technique I find myself using applies to forms where I control the
script that processes the form input. Most notably, my ask a question page has been getting hammered of late with
various attempts at web spam. What I’ve done is simply make note of common
strings (typically the websites that are being linked to) and updated the code
to disallow posts containing those strings. (Apparently, being PHP based, it
bypasses mod_security.)
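
A string check of this kind can be sketched in a few lines. The script described above is PHP; the version below is Python purely for illustration, and the blocked strings and function names are made-up placeholders rather than anything taken from the real list.

```python
# Hypothetical blocklist; the entries are placeholders, not real spam targets.
BLOCKED_STRINGS = [
    "cheap-pills.example",
    "casino-bonus.example",
]


def contains_blocked_string(text, blocked=BLOCKED_STRINGS):
    """Return True if any blocked string appears in text, ignoring case."""
    lowered = text.lower()
    return any(entry.lower() in lowered for entry in blocked)


def accept_post(text):
    """Accept a submission only if it contains no blocked string."""
    return not contains_blocked_string(text)
```

The form-processing script would call accept_post() on the submitted text before saving it and reject the submission when it returns False.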

Both techniques that scan for strings require a certain amount of
maintenance. As spammers arrive attempting to promote new things, those things
need to get added to the disallowed list. However, if you’re willing
to completely disallow links in the content posted from valid users, then
disallowing the string “http:” would stop the vast majority of this type of spam.
Unfortunately that’s not something I can do, as many of the questions I get do
need to refer to specific web pages.
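
For a form that never needs links, the check is short. This is a minimal Python sketch, not the code that runs on any particular site. Matching "https:" and ignoring letter case are additions beyond the plain "http:" string mentioned above, and the function name is hypothetical.

```python
import re

# Matches "http:" or "https:" anywhere in the text, in any letter case.
LINK_PATTERN = re.compile(r"https?:", re.IGNORECASE)


def contains_link(text):
    """Return True if the text contains something that looks like a web link."""
    return LINK_PATTERN.search(text) is not None
```

A submission for which contains_link() returns True would simply be refused. Note that a link typed without the "http:" prefix would still get through.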

If you don’t have access to the levels of scripting or server configuration
that I’ve described here, then your next best bet is to investigate the
specific publishing platform you’re using. The spam problem is widespread, and
many of the popular platforms are implementing solutions of various types.


9 comments on “Web site spam: what can I do about it?”

  1. We had a similar problem with the Guest Book page my wife has on her web site – spammers got in with advertising material. We solved this problem (so far) by signing up for an e-Guestbook (Google them) account. Annual subscription is very low. Posters have to type in a “magic” number which beats automated posters. You are advised when a new post has been sent, and can vet it before allowing the post. As I said, it works so far.

    We had a similar situation with the site’s discussion forum. We fixed this briefly by setting up a forum with phpBB, but recently, we started getting dozens of signups by “members” who are obviously pushing spam. More maintenance work to be done…

  2. Leo –
    A few months ago I joined an online computer help forum sponsored by a major computer manufacturer. Within just a few days I began receiving (on average) 10 spam emails a day. Now it’s up to 20 a day. I’ve got my firewall, antivirus, and antispyware programs current and running. The spam has been directed to my bulk folder so apparently the filters are working and not sending the spam to my inbox.

    But how did the spammers get *MY* email address in the first place? In order to access the forum I have to first go to http://www.computer company.com and then sign in from there. If members want to communicate privately, they can send messages via a separate link provided on the forum site. We never see each other’s actual email address. (Similar to how eBay allows people to communicate.) It’s not like my address is being posted by the computer manufacturer or the discussion forum… or is it?
    Mary

  3. Either phpBB’s built-in CAPTCHA has been cracked by spammers, or human-created phpBB spam is on the rise.

    The problem with phpBB is that even if you require answering an e-mail to activate the account, or even if you go so far as to require manual administrator approval to activate an account, the moment someone signs up (BEFORE they’re activated), they end up in the member directory.

    And if they’ve specified a homepage link in the form when they signed up, it’s linked from two places in the member directory. So they can give a fake e-mail address and never activate their account, but get linky goodness from just barraging your phpBB board with fake accounts.

    This is why I have removed all my phpBB installations.

  4. Maybe we’re going about this wrong? How does the spammers’ automated form search spider determine a page is a form they want to spam? Maybe there is a way to make the form page NOT look like a form they want to submit to. Are they looking for one which posts to a .cgi? In that case why not make the cgi extension .xyz and change your server .htaccess to execute .xyz like a .cgi?

  5. The article you commented on has my suggestions.

    Good luck!

    Leo

  6. My very popular website (over 1 million viewers) is now getting sick sex postings (my site is a family site). How can I prevent it? My site is http://www.pennypincher.ca

    The article you just commented on has my basic suggestions – they all involve modifying the website or website software to put up barriers to this type of thing. Unfortunately there’s no simple answer that just works – it depends heavily on the type of software that’s running the site.

    – Leo
    26-May-2008
