
Why is the Same File a Different Size in Different Places?

Question: When backing up online, my pictures only take up ~65 GB, but ~88 GB are reported on my computer. Why?

This is something that's confused computer users for many years: the exact same file can show as taking up a different amount of space, depending on where you look and the characteristics of different disk drives.

Copy that file online and things get even more confusing.

I don't think I've ever seen this be something to worry about. Whatever the differences in reported size, your file is still your file, wherever it's stored.


TL;DR:
  • Disk space is allocated one "cluster" at a time. Even a one-byte file takes up at least one cluster of space.
  • Cluster size is configurable when a disk is formatted, and generally ranges from 512 to 131,072 bytes.
  • Different utilities show disk space differently.
  • Online services hide all that and simply show your file sizes.

File size versus file size

I'll use a one-byte file as my example: one-byte-file.txt.

A one-byte file listed in the Windows Command Prompt.

I used Command Prompt specifically because it shows the file size as being exactly one byte.

Unlike Windows File Explorer:

A one-byte file listed in the Windows File Explorer.

Here you can see the file listed as "1KB" (1024 bytes) in size.

So, which is it? One byte or over a thousand?

Well, in a way, it's both.

To understand why, we need to look at how disk space is allocated.

Clusters

Data on hard disks is stored in sectors of 512 or 4,096[1] bytes at a time. This physical organization maximizes the amount of data stored on the media, while providing the ability to recover from errors, access data randomly as needed, and do all of it quickly.

File systems, or more accurately, file storage systems, keep track of all the information about files stored on a disk, including in which sectors the data is stored. Rather than track one sector at a time, however, most file systems group multiple sectors together in what are called clusters.

Clusters are simply groups of 1, 2, 4, 8, 16 or more adjacent sectors[2]. A file system then tracks the location of a file's data by keeping a list of the clusters assigned to it.
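As a rough illustration of that grouping, the cluster size is just the sector size multiplied by the number of sectors per cluster. The function name here is made up for this sketch and isn't part of any Windows tool.

```python
def cluster_size(sector_bytes: int, sectors_per_cluster: int) -> int:
    """Bytes in one cluster: a run of adjacent sectors treated as a unit."""
    return sector_bytes * sectors_per_cluster


# The two sector sizes mentioned above, paired with a few grouping choices.
for sector_bytes in (512, 4096):
    for sectors in (1, 2, 4, 8, 16):
        total = cluster_size(sector_bytes, sectors)
        print(f"{sector_bytes:>4}-byte sectors x {sectors:>2} = {total:>5} bytes per cluster")
```

Note that 512-byte sectors grouped eight at a time and 4,096-byte sectors taken one at a time both give a 4,096-byte cluster.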

Running CHKDSK (no parameters required) will display the cluster size used on a drive as "bytes in each allocation unit" at the end of its report.

CHKDSK report showing cluster size.

You can see that my hard disk has 4096 bytes per cluster[3].
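If you'd rather not read through a CHKDSK report, the same number can probably be fetched from a script. This is only a sketch under a couple of assumptions: on Windows it asks the Win32 GetDiskFreeSpaceW function for sectors per cluster and bytes per sector, and on Linux or macOS it falls back to the block size reported by os.statvfs, which is usually, but not always, the allocation unit. The function name allocation_unit_size is my own.

```python
import ctypes
import os


def allocation_unit_size(path: str) -> int:
    """Return the allocation unit (cluster) size in bytes for the disk holding path."""
    if os.name == "nt":
        sectors_per_cluster = ctypes.c_ulong(0)
        bytes_per_sector = ctypes.c_ulong(0)
        free_clusters = ctypes.c_ulong(0)
        total_clusters = ctypes.c_ulong(0)
        # Assumption: path is a drive root such as "C:\\".
        ok = ctypes.windll.kernel32.GetDiskFreeSpaceW(
            ctypes.c_wchar_p(path),
            ctypes.byref(sectors_per_cluster),
            ctypes.byref(bytes_per_sector),
            ctypes.byref(free_clusters),
            ctypes.byref(total_clusters),
        )
        if not ok:
            raise ctypes.WinError()
        return sectors_per_cluster.value * bytes_per_sector.value
    # Elsewhere, statvfs reports the file system's fundamental block size.
    return os.statvfs(path).f_frsize


if __name__ == "__main__":
    root = "C:\\" if os.name == "nt" else "/"
    print(allocation_unit_size(root), "bytes in each allocation unit")
```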

Space given versus space used

Conceptually, when I created my one-byte file, the file system had to do a few things:

  • Create an entry in its table of files, or the "directory listing", as it's more commonly known.
  • Allocate a cluster on the hard disk in which to store the file.
  • Write the data to disk.

The file was given a cluster -- 4,096 bytes of disk space -- even though the file is only one byte.

A one-byte file takes up 4KB of space because that's how disk space is allocated: one cluster at a time. Should the file grow to 4,097 bytes in size, an additional cluster will be allocated; the 4,097-byte file will actually take up 8,192 bytes of disk space.
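The rounding is easy to sketch in a few lines. This assumes the 4,096-byte cluster size from my disk and ignores any special handling a file system might apply to very small files; size_on_disk is just a name I picked for the example.

```python
def size_on_disk(file_bytes: int, cluster_bytes: int = 4096) -> int:
    """Round a file's size up to a whole number of clusters."""
    clusters = -(-file_bytes // cluster_bytes)  # ceiling division
    return clusters * cluster_bytes


for size in (1, 4096, 4097):
    print(f"{size:>4} bytes of data -> {size_on_disk(size):>4} bytes on disk")
```

With those inputs, the one-byte and 4,096-byte files each come out at 4,096 bytes on disk, and the 4,097-byte file comes out at 8,192.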

Depending on where you're looking, either number might be reported.
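To see both numbers for a real file, something like the following may help. It's a sketch, not a definitive tool: it relies on the st_blocks field that Python's os.stat provides on Linux and macOS (counted in 512-byte units) and simply reports "not available" where that field is missing, as it is on Windows. The report_sizes name is hypothetical.

```python
import os
import tempfile
from typing import Optional, Tuple


def report_sizes(path: str) -> Tuple[int, Optional[int]]:
    """Return (size of the data, space allocated on disk or None), in bytes."""
    info = os.stat(path)
    # st_blocks counts 512-byte units; it is absent on Windows.
    blocks = getattr(info, "st_blocks", None)
    allocated = blocks * 512 if blocks is not None else None
    return info.st_size, allocated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "one-byte-file.txt")
        with open(path, "wb") as handle:
            handle.write(b"x")
        size, allocated = report_sizes(path)
        print("Size:", size, "bytes")
        print("Size on disk:", allocated if allocated is not None else "not available")
```

The first number is always the amount of data in the file. The second depends entirely on the file system underneath.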

But File Explorer showed 1KB, not 4

Note that I said "conceptually", above. In reality, that's not quite what happened.

A file system tracks more than just your file's data. It also records its name, the list of clusters allocated, timestamps, attributes, permissions, and more. All that "meta-data" (data about your data) takes up disk space in the file's directory listing.

In the NTFS file system directory listing, space is allocated one "chunk" at a time. Regardless of the actual amount of meta-data, the space it's given grows 1,024 bytes at a time.

NTFS takes advantage of that with a simple optimization: if the file is small enough, and there's enough space left over in the directory listing to hold the file's data, it's placed there instead of being allocated any clusters at all. In a sense, the file takes up no additional space on disk (zero clusters) beyond its directory listing.
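Here's a toy model of that decision. It assumes the 1,024-byte chunk described above, and the 400 bytes of meta-data is a figure I invented purely for illustration; the real amount varies from file to file, and the function name is hypothetical too.

```python
RECORD_BYTES = 1024  # one directory-listing "chunk", per the description above


def fits_in_listing(data_bytes: int, metadata_bytes: int,
                    record_bytes: int = RECORD_BYTES) -> bool:
    """True if the file's data fits in the room the meta-data leaves free."""
    return metadata_bytes + data_bytes <= record_bytes


# 400 bytes of meta-data is a made-up number, not a measured one.
for data_bytes in (1, 500, 2000):
    if fits_in_listing(data_bytes, metadata_bytes=400):
        where = "in the directory listing"
    else:
        where = "in its own clusters"
    print(f"{data_bytes:>4}-byte file: data stored {where}")
```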

When that happens, Windows File Explorer lists the size as 1KB -- the size of the directory listing -- rather than the size of the clusters allocated to the file.

It's the same online, except different

It's all the same online, in the sense that cloud storage services use hard disks just like you and I do. Those hard disks are formatted with file systems, and those file systems allocate space in various ways that probably behave much like I've just described. It's a good bet that Microsoft's OneDrive uses NTFS-formatted hard disks to hold your files.

It's all different in the sense that none of that matters, and the hard disks are hidden from you completely. All OneDrive and other cloud storage providers show you are your files and their actual file sizes.

While it's important for you to know how much space your files are consuming on the hard disks on your machine, that information is completely irrelevant online. It could even change as cloud storage providers transparently update their infrastructure, possibly moving your data from hard disks formatted one way to hard disks formatted some other way.

The result of all this? The pictures that take up 88 gigabytes of space on your hard disk might contain only 65 gigabytes' worth of actual data. Chalk up the difference to file system overhead, and the fact that full clusters are allocated even for the smallest files.
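If you're curious how much of a gap cluster rounding alone would account for in one of your own folders, a rough estimate might look like this. It assumes a 4,096-byte cluster and counts only the rounding, not any other file system overhead, so treat the result as a ballpark figure. The folder_totals name is my own invention.

```python
import os
from typing import Tuple


def folder_totals(folder: str, cluster_bytes: int = 4096) -> Tuple[int, int]:
    """Return (total bytes of data, estimated bytes on disk) for a folder tree."""
    data_total = 0
    disk_total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            size = os.path.getsize(os.path.join(root, name))
            data_total += size
            # Round each file up to whole clusters (ceiling division).
            disk_total += -(-size // cluster_bytes) * cluster_bytes
    return data_total, disk_total


if __name__ == "__main__":
    data, disk = folder_totals(".")  # "." is a placeholder; use your pictures folder
    print(f"Data: {data:,} bytes; estimated space on disk: {disk:,} bytes")
```

The first total is roughly what an online service would report, and the second is closer to what your own disk has set aside.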


Footnotes & References

1: The move to 4,096-byte sectors, common in newer hard drives, reflects improvements in the underlying technologies.

2: A choice typically made when the disk is formatted. And yes, one sector per cluster is often an option.

3: Which is either one sector per cluster or eight, depending on the sector size used by the underlying physical disk.

5 comments on “Why is the Same File a Different Size in Different Places?”

  1. And if you look at the file’s properties in File Explorer, you will see both the actual size and the size that it is taking up on the disk.

  2. As a programmer, it bothers me that MS didn’t put the simple if/then logic in the command prompt interface so that when the file size = 1 then “BYTE” else “BYTES”. Of course, back then, an extra line of if/then logic would have taken up several bytes and that was when storage was on floppy disks and measured in KB’s.

    Leo, you could have reprogrammed this during your tenure…why didn’t you? :-P

    It appears no one has noticed it or cared for all of these years. I just checked on Server 2016 where the command prompt has been updated to FINALLY take CTRL+V as an actual paste command and it still reads as “1 bytes”. *sigh*

    OK, I’m done whining! :D

    • When I was programming or designing programs, I always made sure the program reflected proper singulars or plurals. So many little things bother me when I know that a couple of lines of code can reduce the number of keystrokes or clicks, especially clicks.

  3. Can your phone be hacked and someone putting information on those large files? Ex-boyfriend has done so many things to my phone desperate

