Why does a file sometimes double in size? Sometimes, I’ll be working with a
file that is, say, 25 MB in size. I will add maybe 2.5 MB of text and
photos. When I save this file to my hard drive or external drive or thumb
drive, the file size sometimes doubles up. So instead of being something like
27.5 MB, in other words, the sum of the two original sizes, it jumps to 55 MB.
This happens very infrequently, but it’s a pain when it does. What am I doing
wrong? And how can I correct a file that this has happened to?
In this excerpt from
Answercast #60, I look at the most common reason for files to be larger than
expected after adding new data.
Files doubling in size
You’re probably not doing anything wrong at all. It sounds to me like a
feature of the program that you’re using to edit the file. Now unfortunately, I
don’t know what that program is, but I’m going to use Microsoft Word as an
example.
How Save works
What some programs do (Microsoft Word being one of them) is that when you
save a file, they assume that is a time-consuming operation, so they make some
decisions in order to make that appear faster.
One of the decisions that they might make is not to overwrite the original file. Instead:
- They may just append a second copy to the end of the existing file;
- Or they might only write a few changes in a different place;
- Or they might write some of the changes in one place, leave the deleted
portions alone (just marking them as deleted), and then continue to append
new data to the end of the file.
As you can imagine, it can get quite complex.
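To make the idea concrete, here is a minimal Python sketch of an append-style save compared with a full rewrite. It is a toy model only, not Word's actual file layout: `ToyDocumentFile`, `Record`, `fast_save`, and `full_save` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    data: bytes
    deleted: bool = False  # "marked as deleted" but still stored in the file


@dataclass
class ToyDocumentFile:
    """Hypothetical container format; an assumption, not a real Word format."""
    records: list = field(default_factory=list)

    def fast_save(self, new_content: bytes) -> None:
        # Mark what is already stored as deleted, then append the new copy.
        for record in self.records:
            record.deleted = True
        self.records.append(Record(new_content))

    def full_save(self, new_content: bytes) -> None:
        # Rewrite from scratch: only the current content survives.
        self.records = [Record(new_content)]

    def size_on_disk(self) -> int:
        # Deleted records still take up space.
        return sum(len(r.data) for r in self.records)

    def visible_content(self) -> bytes:
        # What the editor would show: live records only.
        return b"".join(r.data for r in self.records if not r.deleted)


doc = ToyDocumentFile()
doc.full_save(b"x" * 25)             # original document, 25 bytes
doc.fast_save(b"x" * 25 + b"y" * 3)  # add 3 bytes, saved append-style
print(doc.size_on_disk())            # 53: old copy plus new copy

doc.full_save(doc.visible_content())
print(doc.size_on_disk())            # 28: only the current content
```

In this toy model, the visible document is identical after either kind of save; only the size on disk differs.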
Similar to defragmentation
If this kind of thing sounds familiar to you, it should. It’s very similar
to what happens on a disk that needs defragmentation, and to how deleted files are handled.
When you delete a file, it doesn’t really get deleted. The
data is still there. The same kind of thing is true for some of these programs
in the way that they save their data. They may not delete the original copy of
the file; they may just write a new one at the end of the actual physical file.
Like I said, it gets kind of complicated.
Turn off Fast Save
The good news is that the solution is usually very simple. The thing to look
for (at least in Microsoft Office programs) is something called Fast
Save; turn that off.
Fast Save is what does all of these magical things, and they may not
result in the most compact copy of the file.
With Fast Save turned off, Word will go through the work of creating a
completely new version of the file that contains all of the changes you’ve
made in the correct order, in the correct place, and with only the content
that is currently in the file. It may take a little bit longer;
that’s the trade-off. But the net result is you’ll get a file that has only the
things that are supposed to be in the file.
Practicality
Now, from an operational perspective (in other words, from just using this
stuff), there isn’t other “stuff” in your file that you’re going to
see when you edit it or print it. It’s purely a matter of how the
information is stored on disk.
If you didn’t pay any attention to the file size, you would never know that this was going on – because when you’re editing the file, you would only see the file in the state from your last edit.
So, I wouldn’t worry about that.
Sharing files
There is one additional interesting little side effect – and that’s when
files get shared with other people.
If this Fast Save magic is going on, the program that you’re using may remove
deleted content from what’s being displayed (the file that you’re seeing and
editing) without really removing it from the actual physical file as it’s
stored on disk.
What that means is that if you give that file to somebody else, they could
potentially use some other tools to take a look at the parts of the file that
aren’t currently being used. It’s very much (once again) like deleting a file in the file system, where you can actually recover deleted files by looking in the right places, as long as that data hasn’t been overwritten.
The same thing applies to these kinds of magically fast-saved files. It is
possible that, by looking at the areas of the file that are deleted but not
really removed from the file, somebody could find something that you have
previously deleted.
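As a rough illustration of how little tooling this takes, here is a minimal Python sketch that pulls runs of readable text out of raw bytes, much like the Unix `strings` utility. The function name `printable_runs` and the sample bytes are made up for the example, and it only looks for plain ASCII; real document formats may store text in other encodings, so treat it as a sketch rather than a forensic tool.

```python
import re


def printable_runs(raw: bytes, min_length: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_length bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(raw)]


# Made-up stand-in for a file that still holds "deleted" text.
sample = (
    b"\x00\x01CURRENT: final draft\x00"
    b"\xff\xfeDELETED: old salary figures\x00"
)

for run in printable_runs(sample):
    print(run)
# CURRENT: final draft
# DELETED: old salary figures
```

To try it on a real file, something like `printable_runs(open(path, "rb").read())` would work, where `path` is whatever file you want to inspect.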
There have in fact been news stories of exactly this kind of thing
happening where sensitive or embarrassing data was allowed to leak out from an
organization because somebody had Fast Save turned on. The file that was sent
out contained not only the final version of the file, but some of the remnants
of things that had previously been deleted.
Computers have enough speed
So, I actually do recommend in general that Fast Save should be turned
off.
These days, there really isn’t a big reason to have it on anymore. Computers,
disks, and so forth are fast enough that you’ll never notice the difference on
anything but the largest document.
But, that’s probably what’s going on here. That’s the option to look for.
Like I said, I don’t know what specific program you’re using, but those are the
kinds of things to be searching for as you search that program’s options or
online help.
Files often ‘save’ more than just your content. They save the previous version as well, or other information that is not always visible.
I remember a Word document I was trying to edit that was a single page of text, yet took up a staggering 10 MB of room. After highlighting the entire document and copy-pasting into a new file (including formatting), the size was a mere 400 KB. Identical to look at, identical to edit, but about 1/25th the size.
There’s a feature in MS Word and most other word processing programs called Track Changes. If this feature is turned on, things you delete from that file will not be deleted, but marked for deletion in a manner similar to Windows placing deleted files in the Recycle Bin.
This is one of those file sharing features that allows other users of that file to see which changes were made and who made them. I believe this is one of the causes of the cases Leo was talking about when he mentioned that sensitive information was discovered in a file which supposedly had that information deleted.
If you have a file with these changes saved, there are two steps you’ll have to follow to clean them up. This is how to do it in Word 2010.
Click on the Review ribbon, then:
1. Click on the arrow under Accept and choose “Accept all changes in document”.
2. If the icon above “Track Changes” is highlighted, click on that icon to unhighlight it.
The process is similar in other versions of Word and other word processors such as OpenOffice/LibreOffice and WordPerfect.
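If you want to check a file before sending it, here is a minimal Python sketch that counts tracked insertions and deletions in a .docx file. It relies on a .docx being a ZIP archive whose main text lives in `word/document.xml`, with tracked changes stored as `w:ins` and `w:del` elements. The function name `count_tracked_changes` is made up for the example, and the simple text match assumes the usual `w:` namespace prefix, so treat the counts as a rough indication only. It does not apply to the older .doc format.

```python
import re
import zipfile


def count_tracked_changes(docx_path):
    """Count tracked insertions and deletions in a .docx (rough heuristic)."""
    with zipfile.ZipFile(docx_path) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    return {
        # [\s>] avoids matching longer names such as w:instrText or w:delText.
        "insertions": len(re.findall(r"<w:ins[\s>]", xml)),
        "deletions": len(re.findall(r"<w:del[\s>]", xml)),
    }
```

Calling `count_tracked_changes("report.docx")` (with a file name of your own) returns a small dictionary; non-zero counts suggest the file still carries tracked changes.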
Another thing I have heard about is style sheets. In Excel (as one example), when people work on spreadsheets, their style sheets get added. Not only does this enlarge the file, but it slows it way down, because it needs to open all the connected style sheets when the spreadsheet opens. As more people work on a document, all their style sheets get added. I’m not sure how much space it takes, but I have seen people with hundreds of styles they didn’t even know they had. Once they were removed, the spreadsheets opened in seconds instead of minutes.
You can also try ‘saving as’ to reduce file size.
For all the reasons mentioned, a file that is edited and saved can often become bloated. Some programs perform worse in this regard than others. I work in the printing industry and I often see this (and in the extreme) when it is necessary to edit a client’s PDF.
After a number of edits, a 5 MB PDF may become 20 or 30 MB (and even more) even though the PDF content was just rearranged rather than added to. Performing a ‘save as’ will always bring these ‘bloated saves’ down to a more reasonable file size.
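One likely mechanism here is that the PDF format allows incremental updates, where changes are appended after the existing content and each update ends with its own `%%EOF` marker. As a rough check, this minimal Python sketch counts those markers; the function name `count_eof_markers` is made up for the example. A count above one or two hints that the file has been saved incrementally, though some PDFs (linearized ones, for example) can contain two markers without any edits, so this is only a heuristic.

```python
from pathlib import Path


def count_eof_markers(pdf_path) -> int:
    """Count %%EOF markers in a PDF file (rough sign of incremental saves)."""
    return Path(pdf_path).read_bytes().count(b"%%EOF")
```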