What is the definition of signature scanning? What does
signature scanning do from an anti-virus program's stance?
It's a term that we throw around a lot, but it's easy to overlook
that it's not all that clear what we mean by a "signature". There are
several definitions, so I'll try to clarify what it means when it comes
to anti-malware tools.
First, we have to step back and take a look at exactly how computers
store data, keeping in mind that computer programs are themselves just
data of another sort.
All data is stored on a computer as nothing more than a series of numbers. In fact, if we go even deeper, those numbers are stored as nothing more than a series of 1s and 0s, but that's deeper than we need to go here.
Let's look at character data. In a string, each letter and symbol has a specific numeric value. The string "Ask Leo!" is represented by the values:
65 115 107 32 76 101 111 33
65 is the numeric representation for the uppercase letter "A", 115 means "s", and so on. (There can be other representations, but for the purposes of this discussion the standard ASCII character set will do.)
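If you'd like to see this for yourself, here's a short Python sketch (Python is just a convenient choice for illustration; nothing in this article depends on it) that turns each character of "Ask Leo!" into its ASCII value:

```python
# Show the numeric (ASCII) value behind each character of a string.
text = "Ask Leo!"

# ord() returns the number that represents a single character.
codes = [ord(ch) for ch in text]
print(codes)  # [65, 115, 107, 32, 76, 101, 111, 33]

# Encoding the whole string as ASCII yields the same numbers, stored as bytes.
raw = text.encode("ascii")
print(list(raw))  # [65, 115, 107, 32, 76, 101, 111, 33]

# Print each character next to the number that represents it.
for ch, code in zip(text, codes):
    print(repr(ch), code)
```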
The instructions of a computer program are also represented as numbers; in fact, by exactly the same kinds of numbers. The numbers that begin the text "Ask Leo!" also represent computer instructions to add one to the number being held in the CPU, make a decision based on whether that number got "too big", and so on.
Everything on your computer is numbers, and what they mean depends on how you look at them.
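To illustrate, this small Python sketch takes the same eight numbers and reads them three different ways: as text, as hexadecimal digits, and as one large 64-bit integer. The bytes themselves never change; only the interpretation does.

```python
# The same eight numbers, read three different ways.
data = bytes([65, 115, 107, 32, 76, 101, 111, 33])

as_text = data.decode("ascii")          # as ASCII characters
as_hex = data.hex()                      # as two hex digits per byte
as_number = int.from_bytes(data, "big")  # as one 64-bit unsigned integer

print(as_text)         # Ask Leo!
print(as_hex)          # 41736b204c656f21
print(hex(as_number))  # 0x41736b204c656f21
```

Reading those same bytes as processor instructions would simply be one more interpretation.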
What viruses often do is "smuggle in" a set of numbers disguised as something harmless, such as a text string, that would actually be the virus if the computer treated it as a program.
In a sense, they do the reverse of what I just did. I showed you the string "Ask Leo!" as a series of numbers, and then described what computer instructions those characters also represent. One virus-writing technique is to craft a series of computer instructions that are the virus, and then figure out what string they represent. The virus writers then find some way of getting that string onto your computer, after which they try to get the computer to treat that "string" not as a string, but as a computer program.
If they're successful, your computer just got infected.
The "signature" is just that series of numbers representing the string, and therefore the virus.
Each virus has its own unique series of numbers that represents the virus itself and what it's attempting to do. A virus scanner uses a large database of these signatures (a list of those numbers for all the different viruses it knows to look for) and simply scans all the files it can for those numbers. If it finds the unique sequence of numbers, the signature, that identifies a particular virus, it alerts you.
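Here's a deliberately naive Python sketch of that idea, assuming a tiny made-up database in which each signature is a fixed run of bytes. The signature names (`Hypothetical.AskLeo`, `Hypothetical.Sample`), their byte patterns, and the helper names `scan_bytes`, `scan_file`, and `scan_tree` are all invented for illustration; real anti-malware products use enormous databases and far more sophisticated matching.

```python
from __future__ import annotations

from pathlib import Path

# A toy signature database: name -> byte sequence to look for.
# Hypothetical names and patterns, for illustration only.
SIGNATURES = {
    "Hypothetical.AskLeo": b"Ask Leo!",
    "Hypothetical.Sample": bytes([0xDE, 0xAD, 0xBE, 0xEF]),
}


def scan_bytes(data: bytes, signatures: dict[str, bytes]) -> list[str]:
    """Return the names of every signature whose bytes appear anywhere in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


def scan_file(path: Path, signatures: dict[str, bytes]) -> list[str]:
    """Read a whole file as raw bytes and scan it."""
    return scan_bytes(path.read_bytes(), signatures)


def scan_tree(root: Path, signatures: dict[str, bytes]) -> dict[Path, list[str]]:
    """Scan every regular file under root; report only files with matches."""
    hits: dict[Path, list[str]] = {}
    for path in root.rglob("*"):
        if path.is_file():
            try:
                found = scan_file(path, signatures)
            except OSError:
                continue  # unreadable file (permissions, locked, etc.): skip it
            if found:
                hits[path] = found
    return hits


if __name__ == "__main__":
    sample = b"...some harmless text... Ask Leo! ...more text..."
    print(scan_bytes(sample, SIGNATURES))  # ['Hypothetical.AskLeo']
```

Checking each signature separately with `in` is the simplest possible approach. With many thousands of signatures, a scanner would more plausibly use a multi-pattern search algorithm such as Aho-Corasick, so that each file is read through only once.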
Now, naturally that's an oversimplification. Early on it may have been that simple, but the virus/anti-virus war has escalated in complexity as each side tries to outwit the other. For example:
- Database updates. Each new virus has its own unique signature. If an anti-virus program doesn't know to look for that signature, it won't find the virus, even if you are infected. New viruses are found every day, which is why regular database updates are critical to keeping your anti-virus solution effective.
- Encryption. Yep, the same technology that hides your data can also hide viruses. The good news is that the encrypted data is often its own "signature" of a sort that can be scanned for. Similarly, the decryption instructions required to unlock the virus can themselves serve as the signature.
- Polymorphic code. That's a fancy term for computer program instructions ("code") that change each time the virus infects a new machine. In a sense, the virus's signature is constantly changing, which makes it nearly impossible to detect through traditional means. Anti-virus programs have responded with much more complex analysis of the files being scanned to detect these types of viruses.
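The encryption and database-update points can be sketched in Python under some loudly stated assumptions: the "virus body" is a harmless placeholder string, the encryption is a one-byte XOR chosen only because it's simple, and `DECRYPTOR_STUB` is a made-up constant standing in for the decryption routine. Each copy uses a different key, so its encrypted body looks different and a signature for the plain body finds nothing. A signature for one encrypted form catches only the copy that used that key, while a signature for the unchanging decryptor, once added to the database, catches every copy.

```python
# Stand-in for a virus body: harmless placeholder bytes, purely for illustration.
PAYLOAD = b"pretend this is the virus body"

# Hypothetical constant standing in for the decryption routine. In a simple
# encrypted virus this routine can't itself be encrypted, so it is the same
# in every copy.
DECRYPTOR_STUB = b"\x90\x90DECRYPT\x90\x90"


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key; doing it twice restores the data."""
    return bytes(b ^ key for b in data)


def make_variant(key: int) -> bytes:
    """One 'infected' copy: constant stub + key byte + body encrypted with that key."""
    return DECRYPTOR_STUB + bytes([key]) + xor_bytes(PAYLOAD, key)


def scan(data: bytes, signatures: dict) -> list:
    """Names of all signatures whose bytes appear anywhere in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


# Three copies, each encrypted with a different key, so each looks different.
variants = [make_variant(k) for k in (0x21, 0x5A, 0xC3)]

# A signature for the plain body finds nothing: it never appears unencrypted.
old_db = {"Hypothetical.Body": PAYLOAD}
print([scan(v, old_db) for v in variants])  # [[], [], []]

# A signature for one encrypted form only matches the copy that used that key.
fixed_key_db = {"Hypothetical.Encrypted21": xor_bytes(PAYLOAD, 0x21)}
print([scan(v, fixed_key_db) for v in variants])  # [['Hypothetical.Encrypted21'], [], []]

# "Updating the database" with the unchanging decryptor catches every copy.
new_db = {**old_db, "Hypothetical.Stub": DECRYPTOR_STUB}
print([scan(v, new_db) for v in variants])  # [['Hypothetical.Stub'], ['Hypothetical.Stub'], ['Hypothetical.Stub']]
```

This only models encryption with a varying key. Truly polymorphic code goes a step further and rewrites the decryptor itself on each infection, so even the stub signature stops matching; that's where the more complex analysis comes in.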
Signature scanning remains perhaps the most important part of malware detection.
And it's all basically just looking for a series of numbers.