I have a lot of files, pictures and documents, and most of the time, I copy
or move them to the hard drive or to DVD-ROM. My question is: how can I make sure
that what I’ve copied or moved is exactly the same as the original? Usually,
after copying, I check the folder’s File Properties and compare
Size, Size on disk, and Contains. Every time, Size is the same, but Size on disk is sometimes different.
In this excerpt from
Answercast #58, I look at the way files copy to disk and why you needn’t
really worry about it, especially if you have a backup!
Different size designations
So, it’s interesting because I was just reading an article by a Microsoft
techie who was discussing something very similar to this from a programming
perspective.
His bottom line was that there are many things you can do to
double-check that what has been written to the destination matches
the source. And yet, no matter how much time you spend doing that, it’s still
possible that immediately afterward something else could go wrong.
Backing up is best
So, my very first recommendation is that the best thing you can do to make
sure that you never lose data is (as you might expect) to back up, and to back up
regularly.
Copying usually works
Now, in a case like this where you’re copying files, to be honest, I’d just
let the copy run and assume it worked. Remember that if there’s a problem
along the way, if the copy operation actually stumbles and runs into something
that would cause a failure of the write operation, it’s going to tell you when
you do the copy. So, by the time the copy has completed and has completed
successfully (without error messages), then there is a very, very high
likelihood that the file has been copied: it’s been copied completely, it’s
been copied entirely and it’s been copied correctly.
You probably don’t need to take any additional steps.
Command line functions
Now, if you’re a little extra paranoid about these kinds of
things, there are two solutions that I can point you at.
Unfortunately, they both involve the command line.
The xcopy command has an option, /v, which it calls “verify.” So, if you
learn how to use xcopy to copy files from one place to another and specify
this “verify” option, xcopy copies all of the files that you specified and
then goes back and re-reads them. It makes sure that what it finds in the
destination, which it just put there, still matches what it originally read
from the source.
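The copy-then-verify idea behind that option can be sketched in Python. This is a minimal illustration under my own assumptions, not how xcopy is actually implemented: the function name copy_and_verify and the 1 MB chunk size are hypothetical choices. One caveat: a re-read done immediately after writing may be served from the operating system’s cache rather than from the physical disk, so treat it as a check on the copy, not a guarantee about the media.

```python
import os
import shutil
import tempfile


def copy_and_verify(src: str, dst: str, chunk_size: int = 1024 * 1024) -> bool:
    """Copy src to dst, then re-read both files and compare them chunk by chunk.

    Hypothetical helper illustrating the copy-then-verify idea; it is not
    xcopy's actual implementation.
    """
    shutil.copyfile(src, dst)  # step 1: the copy itself
    with open(src, "rb") as a, open(dst, "rb") as b:
        while True:
            block_a = a.read(chunk_size)
            block_b = b.read(chunk_size)
            if block_a != block_b:
                return False  # contents (or lengths) differ
            if not block_a:
                return True  # both files ended at the same point with no mismatch


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "photo.jpg")
        dst = os.path.join(tmp, "photo_copy.jpg")
        with open(src, "wb") as f:
            f.write(os.urandom(3000))  # stand-in for a real picture file
        print(copy_and_verify(src, dst))  # True
```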
Now, the other approach, which doesn’t involve using the command line to do
the copies themselves, uses a command line utility called FC: just the letter F
and the letter C. That stands for “file compare,” and it does pretty much
exactly what you’re looking for.
You tell it “compare the files that are here with the files that are there,”
and it will tell you if any of them are different. You have to be a little
careful with File Compare, though: it assumes that you’re comparing text files.
If the files you’re looking at are not text, you need to specify the /b
(binary) option, and most of the files you’re probably dealing with, like your
documents and your pictures, are definitely not text.
But it’s the same idea: it reads both sets of files and compares them, and
if they’re not the same, it lets you know.
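A rough Python analogue of a binary comparison over two folders can be built on the standard library’s filecmp module. This is a sketch assuming a flat folder (subfolders are skipped) where each copied file keeps its name; compare_folders is a hypothetical helper and does not reproduce FC’s output format.

```python
import filecmp
import os
import sys


def compare_folders(src_dir: str, dst_dir: str) -> list[str]:
    """Byte-compare each file in src_dir with the same-named file in dst_dir.

    Hypothetical helper: returns a list of problems; an empty list means every
    file matched. Subfolders are skipped in this sketch.
    """
    problems = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            continue  # skip subfolders and other non-file entries
        dst = os.path.join(dst_dir, name)
        if not os.path.isfile(dst):
            problems.append(f"missing: {name}")
        # shallow=False forces a content comparison (similar in spirit to FC /b)
        # instead of trusting matching size and timestamp alone.
        elif not filecmp.cmp(src, dst, shallow=False):
            problems.append(f"different: {name}")
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_folders.py <source folder> <destination folder>
    for line in compare_folders(sys.argv[1], sys.argv[2]) or ["all files match"]:
        print(line)
```

Run with the source and destination folders as arguments, it prints one line per missing or mismatched file, or “all files match”.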
Different size on disk
Now, about the Size and Size on disk.
The interesting thing about the way Windows (and actually most operating
systems) writes information to disk is that space is not allocated one byte
at a time. It’s allocated in units of sectors or clusters. A sector is
often 512 bytes long, and it is allocated as a single unit. So if you have a
one-byte file, it will actually consume 512 bytes of disk space.
It will only use the first byte of those 512 bytes to store its data, and
other information stored with the file says, in effect, “this file is exactly
one byte long; only look at the first byte.” But the entire 512-byte sector
has been allocated to that file.
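To make the arithmetic concrete, here is a small Python sketch that rounds a file’s true size up to whole allocation units. It’s a simplified model assuming a fixed 512-byte unit, as in the example above; real file systems add wrinkles (some can store very small files inside their own metadata, for instance), so size_on_disk is a hypothetical helper rather than a reproduction of what File Properties reports.

```python
def size_on_disk(file_size: int, cluster_size: int = 512) -> int:
    """Round a file's true size up to a whole number of allocation units.

    Simplified, hypothetical model: ignores compression, sparse files, and
    file systems that keep tiny files inside their own metadata.
    """
    if file_size == 0:
        return 0  # assumption: an empty file is allocated no clusters
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size


for size in (1, 512, 513, 1500):
    print(size, "->", size_on_disk(size))
# 1 -> 512
# 512 -> 512
# 513 -> 1024
# 1500 -> 1536
```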
So, when you take a look at “file size” versus “size on disk,” the two
numbers are actually telling you two different things.
- The “file size” is the true file size (one byte, in my one-byte file example).
- The “size on disk” tells you how much space on disk has been allocated for that file (in my example, 512 bytes, even though it’s a one-byte file).
So, you would see two different numbers. In fact, those numbers might
differ between two different types of disks, because 512 bytes is just an
example. The allocation unit can be 512, 1024, 2048, or 4096 bytes, and there
are larger cluster sizes as well.
The bottom line is that the cluster size is determined by how the disk is
formatted. In most cases, it’s an option specified at the time the disk is
formatted.
What that means is that on a given disk, a file will always take up at least
one cluster, and its size on disk will always grow in steps of 512 bytes, or
1024 bytes, or whatever the cluster size is. That could be the difference
between the two places you’re looking at. Specifically, if you’re comparing a
hard drive and a DVD, it’s very reasonable to expect that the cluster size
chosen for a large hard disk will be different from the one you’d find on
something smaller like a CD or a DVD.
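Here’s a sketch of why the same folder can show an identical Size but a different Size on disk in two places. The cluster sizes (4,096 bytes, a common choice on hard disks, and 2,048 bytes, the usual logical block size on CDs and DVDs) and the file sizes are assumptions for illustration, and folder_totals is a hypothetical helper using the same simplified rounding model as before.

```python
def folder_totals(file_sizes: list[int], cluster_size: int) -> tuple[int, int]:
    """Return (total Size, total Size on disk) for a list of file sizes in bytes.

    Hypothetical helper: each file is rounded up to whole clusters.
    """
    total_size = sum(file_sizes)
    total_on_disk = sum(-(-s // cluster_size) * cluster_size for s in file_sizes)
    return total_size, total_on_disk


photos = [1_200_000, 845_321, 3_000]  # hypothetical file sizes in bytes
print(folder_totals(photos, 4096))  # (2048321, 2052096), e.g. a hard disk
print(folder_totals(photos, 2048))  # (2048321, 2050048), e.g. a DVD
```

The same three files give the same Size in both cases; only Size on disk changes, because each file is rounded up to a different unit.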
Bottom line: I wouldn’t pay much attention to the size on disk. I
would pay attention to the file size, and only the file size. And like
I said, in the long run, I wouldn’t really sweat trying to verify
that each file was copied correctly. If you don’t get any errors, it
probably was. And if it fails later, you wouldn’t have caught that at the
time of the copy anyway!
That’s why you want to make sure that you’re backing up and backing up
regularly so that if (or more specifically when) something
finally does fail, you’ve got backup copies of everything that you care
about.
I would think hash values would be much easier than command line functions for average users. Usually, they are used to ensure a downloaded file was completed properly, but double checking files you copied is possible as well…
04-Oct-2012
When used on the command line, it’s XCOPY not X COPY. XCOPY has a lot of options, including /V to verify. Use:
XCOPY /?
to see all the options.
Leo,
You mentioned that you were reading an article about this by a Microsoft person. I’m a programmer myself and would be quite interested in reading it, so could you provide the link if you still have it, please?
Thanks
05-Oct-2012
I use TeraCopy to copy files when I want immediate confirmation that they were copied correctly. TeraCopy has an option to do an automatic hash value check as part of the copying process. Very nice program, and free.