Why does so much spam have a part of some other email or document in it?

Question:

Do you happen to know what the gibberish writing at the bottom of
spam is? I periodically get spam in my regular mail with a normal
enough header, but at the bottom of these emails (not all) have like a
history story, or some funky strung together writing about a person or
event or who knows what.

Yep.

It’s spammers being spammy.

It’s one of many techniques spammers use to try and slide by
automated spam filters.

Let’s look at how that works.

Spam filters are incredibly complicated analysis tools. Some of course are better than others, but they all operate in a variety of ways, looking at various characteristics of each message they analyze.

Headers you see

The “To:”, “From:” and “Subject:” lines are examined for suspicious behaviors. Some spam filters will check that the email domains actually exist, for example, or flag messages that are “From:” suspicious sources.

This is one of the reasons you’ll often see spam “From:” yourself. You didn’t send it, but the spammer simply spoofed the “From:” line to make it look like you did. By definition, if you get it, then your email address is valid, and will pass many spam checkers as a valid “From:” address as well.

And of course most will look for “bad words” in the “Subject:” line. This is often why you’ll see spam with subjects that are completely unrelated to the body of the message, and that are in fact often worded so as to entice you to open the message.

Headers you don’t see

The full headers that accompany email messages contain a lot more information. Once again, spammers often falsify that information, and spam checkers will look for signs of that. Even without falsified headers, this is also where the IP addresses of the email servers that routed the message can be found. There are many “blacklists” that contain the IP addresses of known spam sources, and many spam checkers will use these blacklists to determine if an incoming message is likely to be spam. (Sadly, with so many lists, they are also prone to errors, missing some spammers and blacklisting honest sources by mistake.)
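To make the blacklist idea concrete, here is a minimal Python sketch. It assumes the relay addresses appear in square brackets in the “Received:” headers, which is common but not guaranteed, and it uses a small in-memory set in place of a real published blacklist. The names `BLACKLIST`, `relay_ips` and `from_blacklisted_relay`, and the sample message, are made up for illustration.

```python
import re
from email import message_from_string

# Hypothetical blacklist for illustration; real filters consult published lists.
BLACKLIST = {"203.0.113.7", "198.51.100.23"}

# Matches an IPv4 address written in square brackets, e.g. [203.0.113.7].
IPV4_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def relay_ips(raw_message):
    """Return the bracketed IPv4 addresses found in the Received: headers."""
    msg = message_from_string(raw_message)
    ips = []
    for header in msg.get_all("Received", []):
        ips.extend(IPV4_IN_BRACKETS.findall(header))
    return ips


def from_blacklisted_relay(raw_message, blacklist=BLACKLIST):
    """True if any server that handled the message is on the blacklist."""
    return any(ip in blacklist for ip in relay_ips(raw_message))


# A made-up message with two relay hops, newest first.
SAMPLE = (
    "Received: from mail.example.net (mail.example.net [203.0.113.7])\n"
    "\tby mx.example.com with ESMTP; Mon, 1 Jan 2024 00:00:00 +0000\n"
    "Received: from localhost (localhost [192.0.2.10])\n"
    "\tby mail.example.net with SMTP; Mon, 1 Jan 2024 00:00:00 +0000\n"
    "From: someone@example.net\n"
    "To: you@example.com\n"
    "Subject: hello\n"
    "\n"
    "Body text.\n"
)

print(relay_ips(SAMPLE))
print(from_blacklisted_relay(SAMPLE))
```

A real filter would typically treat a blacklist hit as one signal among many rather than as proof, precisely because the lists contain mistakes.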

The majority of full-header analysis is typically done by spam filtering solutions on mail servers, before the message ever reaches you.

The Message Body

Naturally, the message body is where the spam is most evident. Embedded pictures, bad words or intentional misspellings of bad words are all things that a spam filter can look at to determine if a message is in fact spam.

In fact, it would seem … obvious. I mean, you know what spam is when you see it, right?

The Dilemma

Computers are phenomenally stupid. They make up for it in speed, but at the core of the issue, they’re just dumb. They can parse, they can count and they can categorize, but they can’t understand. So we have to give them rules – often incredibly complex rules – that help them determine what is and is not spam.

For example, is a message that contains the word Viagra spam? How about if it’s mentioned twice? How about if it’s misspelled? If it comes from an overseas domain?

Maybe. Maybe not.

The classic case is of breast cancer discussion lists that lose a bunch of messages because they use the word breast. Spam? Probably not. But the word actually is in an awful lot of spam, so it has to be analyzed for the possibility.

The solution is that most spam filters don’t look at spam as either black or white – they formulate a guess as to “how spammy” it is, and then choose a threshold – anything over that threshold of spammyness is flagged as spam, and anything below it is not.
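Here is a rough Python sketch of that score-and-threshold approach. The rule names, weights and threshold are invented for illustration; actual filters use far more rules and tune their numbers against large collections of real mail.

```python
# Hypothetical rules and weights; not taken from any real spam filter.
RULE_WEIGHTS = {
    "mentions_viagra": 2.5,
    "misspelled_keyword": 1.5,
    "blacklisted_relay": 3.0,
    "subject_all_caps": 1.0,
}

# Assumed cut-off: anything scoring above this is flagged as spam.
SPAM_THRESHOLD = 5.0


def spam_score(triggered_rules):
    """Add up the weights of every rule the message triggered."""
    return sum(RULE_WEIGHTS.get(rule, 0.0) for rule in triggered_rules)


def is_spam(triggered_rules, threshold=SPAM_THRESHOLD):
    """Flag the message only when its total score is over the threshold."""
    return spam_score(triggered_rules) > threshold


# One suspicious word alone is not enough to cross the line...
print(is_spam(["mentions_viagra"]))
# ...but the same word arriving from a blacklisted server is.
print(is_spam(["mentions_viagra", "blacklisted_relay"]))
```

The point of the design is that no single rule decides the outcome, so a legitimate message that happens to trip one rule can still get through.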

And that’s where the off-topic text comes in.

A message that has a line or two about Viagra is likely to be analyzed as spam, since that’s all it talks about. However, a line or two about Viagra, followed by multiple paragraphs of boring and unrelated text? That’s harder to say. The spam filter can’t tell that the boring and unrelated stuff is in fact boring and unrelated. The message, as a whole, might actually be legitimate.

As a result, spammers are using that random text to tip the balance of the message’s spammyness in the eyes of many spam filters back into the “probably not spam” category.
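A toy Python example shows why the padding helps the spammer. It assumes a deliberately simple filter that only measures what fraction of a message’s words are on a suspicious-word list. The word list, the 10% cut-off and the sample text are all made up, and real filters are more sophisticated than this, but the dilution effect is the same idea.

```python
import re

# Hypothetical word list and cut-off, chosen only to illustrate the effect.
SUSPICIOUS_WORDS = {"viagra", "pills", "cheap"}
DENSITY_THRESHOLD = 0.10


def suspicious_density(text):
    """Fraction of the words in the text that are on the suspicious list."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in SUSPICIOUS_WORDS)
    return hits / len(words)


def looks_spammy(text, threshold=DENSITY_THRESHOLD):
    """Flag the text when suspicious words make up too much of it."""
    return suspicious_density(text) > threshold


PITCH = "Cheap Viagra pills, order today."
# Boring, unrelated filler of the kind found at the bottom of spam.
PADDING = " ".join(
    ["The old mill stood quietly beside the river for many years."] * 5
)

print(looks_spammy(PITCH))                  # the pitch alone
print(looks_spammy(PITCH + " " + PADDING))  # the same pitch, padded
```

The sales pitch is identical in both messages. Only the proportion of suspicious words changes, and that is enough to slip under this naive threshold.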

Even though it is.

Spam. It’s a war. Or a game of whack-a-mole. About the time one side gets better weapons, the other side gets better defenses. Repeat, ad nauseam.

5 comments on “Why does so much spam have a part of some other email or document in it?”

  1. I’ve noticed a lot of spammers will deliberately insert spaces inside the key words. For example, “Viagra” becomes “Vi agra”. Still readable, but the computer doesn’t recognize it, so it goes through.

  2. Talking of spam, I see my GMail spam box is getting bigger by the day. Used to be little more than about ten in there when I checked once a week but I’ve just deleted fifty four of the beasts in two days and one popped up before my very eyes! Anything to do with Conficker do you think?

  3. I get MUCH, M-U-C-H more spam on my Yahoo account than I get on my hotmail account. Are there certain rules that you can use to get less spam? Like, “Don’t use a yahoo account” likely being one.
