What is the definition of signature scanning? What does
signature scanning do from an anti-virus program’s stance?
It’s a term that we throw around a lot, but it’s easy to overlook
that it’s not all that clear what we mean by a “signature”. There are
several definitions, so I’ll try to clarify what it means when it comes
to anti-malware tools.
First, we have to step back and take a look at exactly how computers
store data, keeping in mind that computer programs are themselves just
data of another sort.
All data is stored on a computer as nothing more than a series of numbers. In fact, if we go even deeper, those numbers are themselves stored as nothing more than a series of 1s and 0s, but that’s deeper than we need to go here.
Let’s look at character data. If we take a string, each letter and symbol has a specific numeric value. The string “Ask Leo!” is represented by the values:
65 115 107 32 76 101 111 33
65 is the numeric representation for the uppercase letter “A”, 115 means “s”, and so on. (There can be other representations, but for the purposes of this discussion the standard ASCII character set will do.)
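If you’d like to see those numbers for yourself, here is a minimal Python sketch. It assumes plain ASCII, as in the discussion above; the variable names are just for illustration.

```python
# Turn a string into the numbers the computer actually stores (ASCII codes).
text = "Ask Leo!"
codes = list(text.encode("ascii"))
print(codes)

# Going the other way: the same numbers turned back into text.
restored = bytes(codes).decode("ascii")
print(restored)
```

The first line printed is the list of numbers, one per character (including the space and the exclamation mark); the second is the original string again.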
Similarly, the instructions of a computer program are also represented as numbers. In fact, they are represented by those same kinds of numbers. The numbers that begin the text “Ask Leo!” also represent computer instructions to add one to the number being held in the CPU, make a decision based on whether that number got “too big”, and so on.
Everything on your computer is numbers and what they mean depends on how you look at them.
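As a rough illustration of “it depends on how you look at them”, this sketch takes the same eight bytes and views them three ways: as text, as hexadecimal byte values, and as two 32-bit integers. It doesn’t decode CPU instructions (that would need a disassembler); the point is only that nothing in the bytes themselves says which view is the “right” one. The little-endian integer layout is my choice for the example.

```python
import struct

data = b"Ask Leo!"  # eight bytes, exactly as they would sit in memory

# View 1: the bytes as ASCII text.
as_text = data.decode("ascii")

# View 2: the bytes as raw values, written in hexadecimal.
as_hex = " ".join(f"{b:02x}" for b in data)

# View 3: the very same bytes as two unsigned 32-bit little-endian integers.
as_ints = struct.unpack("<2I", data)

print(as_text)
print(as_hex)
print(as_ints)
```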
What viruses often do is “smuggle in” a set of numbers disguised as, for example, a text string that, if treated as a computer program, would actually be the virus.
In a sense, they do the reverse of what I just did. I showed you the string “Ask Leo!” as a series of numbers, and then described what computer instructions those characters also represent. One virus writing technique is to craft a series of computer instructions that are the virus, and then figure out what string they represent. The virus writers then look for some way of getting that string onto your computer, after which they try to get the computer to treat that “string” not as a string, but as a computer program.
If they’re successful, your computer just got infected.
The “signature” is just that series of numbers that represents the string/virus.
Each virus has its own unique series of numbers that represent the virus itself and what it’s attempting to do. A virus scanner uses a large database of these signatures (a list of those numbers for all the different viruses it knows to look for) and simply scans all the files it can for those numbers. If it finds the unique sequence of numbers, the signature, that identifies a particular virus, it alerts you.
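Here’s a minimal sketch of that basic idea in Python. Everything in it is made up for illustration: the signature names and byte patterns are hypothetical and harmless, and a real scanner’s database and matching logic are far more elaborate than a simple “is this sequence somewhere in the file” test.

```python
from typing import Dict, List

# Hypothetical signature database: a name mapped to the byte sequence to look for.
SIGNATURES: Dict[str, bytes] = {
    "Example.A": bytes([0xDE, 0xAD, 0xBE, 0xEF, 0x13, 0x37]),
    "Example.B": b"HARMLESS-DEMO-PATTERN",
}


def scan_bytes(data: bytes) -> List[str]:
    """Return the name of every known signature that appears in the data."""
    return [name for name, sig in SIGNATURES.items() if sig in data]


def scan_file(path: str) -> List[str]:
    """Read a file as raw bytes and scan it for known signatures."""
    with open(path, "rb") as f:
        return scan_bytes(f.read())


clean = b"just some ordinary text"
infected = b"header " + SIGNATURES["Example.A"] + b" trailer"

print(scan_bytes(clean))     # nothing found
print(scan_bytes(infected))  # reports Example.A
```

Notice that the scanner has no idea what the bytes “mean”. It only knows that a particular sequence of numbers is in its list.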
Now, naturally that’s an oversimplification. Early on it may have been that simple, but the virus/anti-virus war has escalated in complexity as each side tries to outwit the other. For example:
Database updates. Each new virus has its own unique signature. If an anti-virus program doesn’t know to look for that signature, it won’t find the virus, even if you are infected. New viruses are being found every day, and that’s why regular database updates are critical to keeping your anti-virus solution effective.
Encryption. Yep, the same technology that hides your data can also hide viruses. The good news is that many times the encrypted data is, in fact, its own “signature” of a sort that can be scanned for. Similarly, the decryption instructions that are required to unlock the virus can themselves be the signature.
Polymorphic code. That’s a fancy term for computer program instructions (“code”) that change each time the virus infects a new machine. This means, in a sense, that the virus’s signature is constantly changing and thus nearly impossible to detect through traditional means. Anti-virus programs have reacted with much more complex analysis of the files being scanned to detect these types of viruses.
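To see why encryption and changing code make life hard for a simple scanner, consider this small sketch. It assumes a single-byte XOR as a stand-in for real encryption (actual malware uses far stronger schemes), and the “signature” is a hypothetical, harmless pattern. Each copy is scrambled with a different key, so the stored bytes differ from copy to copy and the fixed signature no longer matches, even though unscrambling brings back the original.

```python
# Hypothetical, harmless byte pattern standing in for a virus signature.
SIGNATURE = b"HARMLESS-DEMO-PATTERN"


def xor_bytes(data: bytes, key: int) -> bytes:
    """Scramble (or unscramble) data by XOR-ing every byte with a one-byte key."""
    return bytes(b ^ key for b in data)


payload = b"prefix " + SIGNATURE + b" suffix"

# Two "copies" of the same payload, each scrambled with a different key.
copy_a = xor_bytes(payload, 0x5A)
copy_b = xor_bytes(payload, 0xC3)

print(SIGNATURE in payload)                # True: the plain copy is caught
print(SIGNATURE in copy_a)                 # False: the scrambled copy slips past
print(SIGNATURE in copy_b)                 # False: so does this one
print(copy_a == copy_b)                    # False: every copy looks different
print(xor_bytes(copy_a, 0x5A) == payload)  # True: unscrambling restores it
```

This is also why the unscrambling instructions themselves are worth scanning for: they have to be present in usable form for the hidden code to ever run.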
Signature scanning remains perhaps the most important part of malware detection.
And it’s all basically just looking for a series of numbers.