Do you happen to know what the gibberish writing at the bottom of
spam is? I periodically get spam in my regular mail with a normal
enough header, but at the bottom of these emails (not all) there's like a
history story, or some funky strung together writing about a person or
event or who knows what.
It’s spammers being spammy.
It’s one of many techniques spammers use to try and slide by
automated spam filters.
Let’s look at how that works.
Spam filters are incredibly complicated analysis tools. Some of course are better than others, but they all operate in a variety of ways, looking at various characteristics of each message they analyze.
Headers you see
The “To:”, “From:” and “Subject:” lines are examined for suspicious characteristics. Some spam filters will make sure that the email domains actually exist, for example, or will flag messages that are “From:” suspicious sources.
This is one of the reasons you’ll often see spam “From:” yourself. You didn’t send it, but the spammer simply spoofed the “From:” line to make it look like you did. By definition, if you get it, then your email address is valid, and will pass many spam checkers as a valid “From:” address as well.
And of course most will look for “bad words” in the “Subject:” line. This is often why you’ll see spam with subjects that are completely unrelated to the body of the message, and that are in fact often worded so as to entice you to open the message.
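To make those visible-header checks a little more concrete, here is a minimal sketch in Python. It is not how any particular spam filter works. The function name, the word list and the flag names are all made up for illustration, and it only checks that a sender domain is present, not that the domain really exists (that would take a DNS lookup).

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical word list; real filters use far larger, weighted lists.
SUSPICIOUS_SUBJECT_WORDS = {"viagra", "winner", "free"}

def visible_header_flags(raw_message):
    """Return a list of reasons the visible headers look suspicious."""
    msg = message_from_string(raw_message)
    flags = []

    # parseaddr returns a (display name, address) pair.
    _, from_addr = parseaddr(msg.get("From", ""))
    _, to_addr = parseaddr(msg.get("To", ""))

    # A sender with no domain part cannot be a real mailbox.
    if "@" not in from_addr or not from_addr.rsplit("@", 1)[1]:
        flags.append("missing sender domain")

    # Mail that claims to come from its own recipient is often spoofed.
    if from_addr and from_addr.lower() == to_addr.lower():
        flags.append("sender equals recipient")

    # Look for "bad words" in the subject line.
    subject_words = set(msg.get("Subject", "").lower().split())
    if subject_words & SUSPICIOUS_SUBJECT_WORDS:
        flags.append("suspicious subject word")

    return flags
```

A real filter would treat each of these as one signal among many rather than as proof of anything.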
Headers you don’t see
The full headers that accompany email messages contain a lot more information. Once again, spammers often falsify that information, and spam checkers will look for signs of that. Even when the headers aren’t falsified, this is also where the IP addresses of the email servers that route the message can be found. There are many “blacklists” that contain the IP addresses of known spam sources, and many spam checkers will use these blacklists to determine if an incoming message is likely to be spam. (Sadly, with so many lists, they are also often prone to errors, missing some spammers, and blacklisting honest sources by mistake.)
The majority of full-header analysis is typically done by spam filtering solutions on mail servers, before the message ever reaches you.
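As a rough illustration of the blacklist idea, the sketch below pulls IPv4 addresses out of a message's “Received:” headers and compares them to a list. Everything here is an assumption made for the example: the function name, the tiny in-memory blacklist, and the addresses themselves, which come from ranges reserved for documentation. Real mail servers typically query published blacklists instead of keeping a hard-coded set.

```python
import re
from email import message_from_string

# Hypothetical blacklist using documentation-only IP addresses.
BLACKLISTED_IPS = {"203.0.113.7", "198.51.100.23"}

# Loose IPv4 pattern; it does not verify that each octet is 0-255.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def blacklisted_relays(raw_message):
    """Return blacklisted IPv4 addresses found in the Received headers."""
    msg = message_from_string(raw_message)
    hits = []
    # Each server that handles a message adds its own Received header.
    for received in msg.get_all("Received", []):
        for ip in IPV4_PATTERN.findall(received):
            if ip in BLACKLISTED_IPS and ip not in hits:
                hits.append(ip)
    return hits
```

Because spammers can forge the “Received:” lines added before the message reached a trustworthy server, a real filter would be choosy about which of those lines it believes.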
The Message Body
Naturally, the message body is where the spam is most evident. Embedded pictures, bad words or intentional misspellings of bad words are all things that a spam filter can look at to determine if a message is in fact spam.
In fact, it would seem … obvious. I mean, you know what spam is when you see it, right?
Computers are phenomenally stupid. They make up for it in speed, but at the core of the issue, they’re just dumb. They can parse, they can count and they can categorize, but they can’t understand. So we have to give them rules – often incredibly complex rules – that help them determine what is and is not spam.
For example, is a message that contains the word Viagra spam? How about if it’s mentioned twice? How about if it’s misspelled? If it comes from an overseas domain?
Maybe. Maybe not.
The classic case is of breast cancer discussion lists that lose a bunch of messages because they use the word breast. Spam? Probably not. But the word actually is in an awful lot of spam, so it has to be analyzed for the possibility.
The solution is that most spam filters don’t look at a message as simply black or white, spam or not spam. Instead they formulate a guess as to “how spammy” it is, and then compare that guess to a threshold: anything over that threshold of spammyness is flagged as spam, and anything below it is not.
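Here is one way the score-and-threshold idea could look in Python. Treat it as a toy, not a description of any real filter: the rules, the point values and the threshold are all invented for the example, and real rule sets are vastly larger.

```python
import re

# Hypothetical threshold and point values, chosen only for illustration.
THRESHOLD = 5.0

def spam_score(body):
    """Add up points for each rule the message body triggers."""
    text = body.lower()
    score = 0.0

    mentions = len(re.findall(r"\bviagra\b", text))
    if mentions >= 1:
        score += 2.0   # the word appears at all
    if mentions >= 2:
        score += 1.5   # the word is repeated

    # Deliberate misspellings such as "v1agra" or "vi@gra".
    if re.search(r"\bv[1!|]agra\b|\bvi[@4]gra\b", text):
        score += 3.0

    if "click here" in text:
        score += 1.5

    return score

def is_spam(body):
    """Flag the message only when its score is over the threshold."""
    return spam_score(body) > THRESHOLD
```

With these made-up numbers, a single honest mention of the word scores 2.0 and gets through, while a message that stacks a misspelling and a “click here” on top of it scores 6.5 and gets flagged.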
And that’s where the off-topic text comes in.
A message that has a line or two about Viagra is likely to be analyzed as spam, since that’s all it talks about. However, a line or two about Viagra, followed by multiple paragraphs of boring and unrelated text? That’s harder to say. The spam filter can’t tell that the boring and unrelated stuff is in fact boring and unrelated. The message, as a whole, might actually be legitimate.
As a result, spammers are using that random text to tip the balance of the message’s spammyness in the eyes of many spam filters back into the “probably not spam” category.
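To see why the padding can work, consider a deliberately naive filter that scores a message by the fraction of its words that appear on a spammy-word list. This is only a sketch: the word list, the threshold and the function name are assumptions for the example, and real filters are much more sophisticated than this.

```python
# Hypothetical word list and threshold, chosen only for illustration.
SPAMMY_WORDS = {"viagra", "cheap", "pills", "buy"}
THRESHOLD = 0.20

def spammy_fraction(text):
    """Fraction of the words in the message that are on the spammy list."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    if not words:
        return 0.0
    spammy = sum(1 for w in words if w in SPAMMY_WORDS)
    return spammy / len(words)

pitch = "Buy cheap Viagra pills today!"
filler = ("The regiment marched north in the spring and camped by the river "
          "while the general waited for word from the capital.")
padded = pitch + " " + filler

print(spammy_fraction(pitch) > THRESHOLD)   # the bare pitch is over the line
print(spammy_fraction(padded) > THRESHOLD)  # the padded version is not
```

The sales pitch is identical in both messages. The only thing that changed is how much unrelated text surrounds it, and for a filter this simple that is enough to pull the score under the threshold.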
Even though it is.
Spam. It’s a war. Or a game of whack-a-mole. About the time one side gets better weapons, the other side gets better defenses. Repeat, ad nauseam.