How in the world can my antivirus/antispyware/antimalware program
possibly scan all of my files for the thousands of trojans/signatures
out there without taking an eon to do so? Don’t they have to scan every
file on your computer (or at the very least the exes, zips, dlls and
registry) sequentially for a trojan-name and/or each signature? I can
only presume they must do this one trojan-name/signature at a time, and
then repeat. I can’t fathom how it can be done so quickly, relatively
speaking, given the task at hand. Heck – just a manual search for one
or two obscure files on my computer can take me almost as long to find
them – if I even do!
And here I was thinking that the virus scans take forever, and
you’re wondering how they can be so fast! It’s all a matter of
perspective, I suppose.
The short answer is that sometimes it does take a really long time.
But there are techniques that scanners use to dramatically speed up the
process, or at least make it look that way.
In addition, not everything is, in fact, a scanner.
Time for some explanation of how anti-malware software typically
works.
I want to throw out an entire class of anti-malware software before we even start: those that don’t scan files at all.
Much of what we classify as “anti-spyware” software doesn’t scan files. In fact, it’s one of the semi-accurate rules of thumb that differentiate anti-spyware tools from anti-virus (though the line continues to blur over time).
Rather than scanning files, these tools monitor behavior. For example, they might look for attempts to reset your browser home page as they happen. If things look kosher, they allow the change; if not, they alert you. No scanning is involved: they just hook into places where spyware-like behavior is likely to happen, and keep an eye out.
I’m not saying that all anti-spyware software doesn’t scan (or that all anti-virus tools don’t watch behavior); I’m just saying that a large portion of what anti-spyware software does doesn’t involve scanning at all.
So I’ll focus on anti-virus software, which typically does scan.
As the question outlines, you would think that a complete scan would involve two things:
-
Read the contents of every file on your hard drive (or whatever media is being scanned).
-
Compare the entire contents of each file against the pattern of every known virus.
In other words, one heck of a lot of work.
Fortunately there are several shortcuts that anti-virus software can take.
-
Full scans can happen in the background. In reality, a full scan typically happens in the background as you’re doing other things. As a result you might not realize just how long the scan is taking. A good scanner will prioritize its work so that it doesn’t get in the way of what you’re doing, but still gets its work done. It’s not uncommon for this type of scan to take hours, and if it’s any good, you’d never notice.
-
Full scans can be scheduled for when you’re not using your computer. Once again, in the “so you’d never notice it” category, a full scan could be scheduled to happen automatically in the middle of the night (assuming you leave your computer on), or at some other time that’s appropriate. It could take a long time, but if you don’t see it, did it matter?
-
Full scans might not be needed after the first. After you install your anti-virus software it typically does one full scan shortly thereafter. Theoretically, as long as it then monitors all the files that arrive on your machine as they come in, and any changes to the files that are already on your machine, another full scan isn’t really necessary. There’s often no need to re-scan an old file that’s never changed. Many anti-virus products’ default configurations use exactly this model.
-
It might not scan every file. In reality, not all file types are likely to carry viruses. “.exe” or “.dll” files are typical targets, but a “.dat”, “.chm” or even a “.leo” file typically is not. That’s not to say that such files couldn’t contain a virus, just that there’s typically no way for that virus to be run. Virus scanners can take advantage of that and not bother scanning many types of files at all. Once again, this is typically an option that is set by default.
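To make the last two shortcuts a little more concrete, here’s a minimal sketch of how a scanner might decide which files are worth looking at. It’s not how any particular product does it: the extension list, the use of size and modification time as a “has it changed?” check, and the names `files_needing_scan` and `mark_clean` are all assumptions of mine for illustration.

```python
import os

# Assumption: a tiny stand-in for the list of file types considered able to
# run code. Real products ship far longer, configurable lists.
SCANNABLE_EXTENSIONS = {".exe", ".dll"}


def fingerprint(path):
    """Cheap change detector: file size plus last-modified time."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def files_needing_scan(paths, cache):
    """Return the paths worth scanning.

    `cache` maps path -> fingerprint recorded the last time the file was
    scanned and found clean (a hypothetical structure for this sketch).
    """
    todo = []
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        if ext not in SCANNABLE_EXTENSIONS:
            continue  # file type is skipped by default
        if cache.get(path) == fingerprint(path):
            continue  # unchanged since it was last scanned
        todo.append(path)
    return todo


def mark_clean(path, cache):
    """Remember that `path` was scanned in its current state."""
    cache[path] = fingerprint(path)
```

After the first full pass has called `mark_clean` on everything, later passes only touch new or modified files of the risky types, which is usually a tiny fraction of the disk. A real product would presumably use a more tamper-resistant check than size and timestamp.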
The other part of this scenario is that the actual algorithms used to perform the scan aren’t as brute-force as we might think.
Let’s say there are 100,000 virus definitions that your anti-virus software needs to look for. The scan is most certainly not 100,000 cases of “is it this one?”, repeated for each file being scanned. That really would take forever.
In reality, the data and the patterns that make up the various virus signatures are optimized and stored in such a way that, in a sense, the scanner’s actually looking for almost all the viruses at once. It’s difficult to describe without getting all geeky, or even computer science-y, but I’m sure there’s a lot of math, organization and optimization that goes into setting up anti-virus databases for the fastest and most complete scan possible. I’d bet it rivals the complexity of encryption in many ways.
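For the geeky among you, here’s one well-known way of “looking for all the patterns at once”: the Aho-Corasick algorithm, which compiles all the patterns into a single structure and then reads the data exactly once. I don’t know what any given anti-virus product actually uses, so treat this as a sketch of the general idea. The signature names and byte strings in the usage example are made up.

```python
from collections import deque


def build_automaton(patterns):
    """Compile {name: bytes} into an Aho-Corasick automaton."""
    goto = [{}]   # goto[state][byte] -> next state (a trie of all patterns)
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> names of patterns that end at this state

    # Step 1: insert every pattern into the trie.
    for name, pattern in patterns.items():
        state = 0
        for byte in pattern:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        out[state].append(name)

    # Step 2: breadth-first pass to fill in the fallback links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(byte, 0)
            # A match at the fallback state is also a match here.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def scan(data, automaton):
    """Return the names of all patterns found in `data`, in one pass."""
    goto, fail, out = automaton
    state = 0
    found = set()
    for byte in data:
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        found.update(out[state])
    return found


# Usage with made-up signatures (not real virus patterns).
signatures = {
    "Example.A": b"\xde\xad\xbe\xef",
    "Example.B": b"\xbe\xef\x00",
    "Example.C": b"MALWARE",
}
automaton = build_automaton(signatures)
print(sorted(scan(b"header\xde\xad\xbe\xef\x00trailer", automaton)))
```

The point is that `scan` walks through the file one byte at a time, no matter how many signatures were compiled in. Adding more definitions makes the compiled structure bigger, but doesn’t add another pass over the file.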
I’d also bet it’s one of the key differentiators in anti-virus software.
So, in a nutshell: not everything’s a scanner, you might not notice full scans, full scans might not even be needed, and the actual technology of a scan is much faster than you might think.
Me, I’m just glad there are smart people in the world who are writing these critical pieces of our security infrastructure.
A simple principle that explains how so many things can be done very fast is putting things in alphabetical order. Suppose I have a book written in a foreign language that uses our alphabet, and I want to know whether it contains any English words. (An English word is any of the 100,000 or so on a list.) I look at each word in the foreign book, and for each one the question is: is it on the English list? That does not mean comparing it with 100,000 others. My English list is a dictionary. Finding out whether “reciept”, say, is in the dictionary (which we might have to do if we are not sure how to spell “receipt”) just means finding it if it is there, or finding the place where it would be if it isn’t. This takes far fewer lookups: about 17, in fact, which is roughly the log to base 2 of the number of words in the dictionary.
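The dictionary idea can be sketched as a binary search that counts how many entries it has to look at. The word list here is a synthetic stand-in for a real dictionary (100,000 made-up “words” that are already in sorted order), and the function name `lookup` is just for illustration.

```python
import math


def lookup(sorted_words, word):
    """Binary search; returns (found, number_of_entries_examined)."""
    lo, hi = 0, len(sorted_words)
    lookups = 0
    while lo < hi:
        mid = (lo + hi) // 2
        lookups += 1
        if sorted_words[mid] == word:
            return True, lookups
        if sorted_words[mid] < word:
            lo = mid + 1   # word would be in the upper half
        else:
            hi = mid       # word would be in the lower half
    return False, lookups


# Synthetic stand-in for a 100,000-word dictionary, already in sorted order.
words = [f"w{i:06d}" for i in range(100_000)]

print(lookup(words, "w012345"))   # a word that is in the list
print(lookup(words, "reciept"))   # a word that is not
print(math.ceil(math.log2(len(words))))
```

Each lookup halves the part of the list still in play, so with 100,000 entries the search never needs more than 17 comparisons, whether or not the word turns out to be there.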