Why are email attachments in general much larger than the actual attachment
sent? Is something added to them by the email client?
I wouldn't say that anything is "added" to the attachment other than perhaps
some administrative data like its name - the attachment is still just the attachment.
However, something is done to the attachment that will
most definitely make it larger.
And it all has to do with the fact that the technology behind email is,
basically, older than dirt. (In internet terms, of course.)
First we need to explain how data is represented, and the difference between "text" and "binary" data.
Data that is text (and I'm restricting myself to "plain old ASCII text" here) - the traditional letters, numbers and a specific set of symbols - is represented by numbers from 0 to 127. (I won't get into the actual 1's and 0's representation, but the reason it's 127 is that 2 to the 7th power is 128, which gives 128 possible values: 0 through 127. A lot of the magic numbers you'll see related to computers are related to powers of two.)
Binary data, on the other hand, is represented by numbers from 0 to 255. You might recall that a "byte" is also a number from 0 to 255, and that all the data stored in your computer is stored this way. We measure file sizes, memory sizes and disk sizes all in bytes - be it megabytes, gigabytes, or any of several other shorthands for large quantities of bytes.
Text data, when stored on your computer, is still stored in bytes which could hold values greater than 127, but the characteristic of plain old text data is that all the actual values will be less than 128.
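A minimal Python sketch of that distinction. The helper name is_plain_text is made up for illustration; all it does is check whether every byte in a chunk of data is below 128.

```python
def is_plain_text(data: bytes) -> bool:
    """Return True if every byte falls in the 0-127 range of plain ASCII."""
    return all(byte < 128 for byte in data)


text_bytes = "Hello, world!".encode("ascii")
png_signature = b"\x89PNG\r\n\x1a\n"  # the 8 bytes every PNG file starts with

print(is_plain_text(text_bytes))     # True
print(is_plain_text(png_signature))  # False, because 0x89 is 137
```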
Why does this matter?
Email is primarily (or at least originally) a text-only medium.
And yet, attachments, by definition, are binary. An attachment can be anything: text, a program, a video, a music file - and all of them are treated as binary data.
So the question becomes: how do you send data that can have values between 0 and 255 through a medium that itself can only handle values from 0 to 127?
Answer: you encode it. You come up with a way to represent the binary data as text.
If you've ever seen something like this:
iVBORw0KGgoAAAANSUhEUgAAAQQAAABOCAIAAABJ3v/jAAAACXBIWXMAABJ0AAASdAHeZh94
AAAKT2lDQ1BQaG90b3Nob3AgSUNDIHByb2ZpbGUAAHjanVNnVFPpFj333vRCS4iAlEtvUhUI
IFJCi4AUkSYqIQkQSoghodkVUcERRUUEG8igiAOOjoCMFVEsDIoK2AfkIaKOg6OIisr74Xuj
...
That's binary data, encoded as text. That particular example shows the first three of 390 lines that are a "base64" encoding of my logo from the top of this page - a binary file. "Base64" is one of several possible encoding mechanisms.
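For the curious, here is roughly what that encoding step looks like using Python's standard base64 module. The input is only the 8-byte signature that every PNG file begins with, not the actual logo, which is why the output matches just the first few characters of the sample above.

```python
import base64

# The 8 bytes every PNG file starts with; note that 0x89 is above 127.
png_signature = b"\x89PNG\r\n\x1a\n"

encoded = base64.b64encode(png_signature)
print(encoded.decode("ascii"))  # iVBORw0KGgo=

# Every byte of the encoded form is below 128, so it can travel as text.
print(all(byte < 128 for byte in encoded))

# Decoding recovers the original binary data exactly.
decoded = base64.b64decode(encoded)
print(decoded == png_signature)
```

The trailing "=" is padding, added because 8 bytes does not divide evenly into groups of three.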
And now to your point: that image (a PNG file) is 20,985 bytes of binary data. The base64 encoded text version? 28,760 bytes - a 7,775 byte or 37% increase in size.
Representing binary data as text makes it bigger. The actual growth depends on several factors, but is mostly related to the encoding scheme used.
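The arithmetic behind that growth can be sketched in Python. Base64 turns every 3 bytes of input into 4 characters of output, and the encoded text is then broken into lines. The function name below is hypothetical, and the 72-character line length (taken from the sample above) and the 2-byte line ending are assumptions about how this particular message was wrapped.

```python
def base64_encoded_size(raw_bytes: int, line_length: int = 72, eol_bytes: int = 2) -> int:
    """Estimate the size of raw_bytes of data after base64 encoding and line wrapping.

    line_length and eol_bytes are assumptions; other mail software may wrap differently.
    """
    # Every group of up to 3 input bytes becomes 4 output characters.
    chars = 4 * ((raw_bytes + 2) // 3)
    # The characters are split into lines, each followed by a line ending.
    lines = (chars + line_length - 1) // line_length
    return chars + lines * eol_bytes


original = 20985
estimate = base64_encoded_size(original)
growth_percent = (estimate - original) / original * 100

print(estimate)
print(round(growth_percent, 1))
```

For the 20,985-byte file this estimate comes out at 28,758 bytes, within a couple of bytes of the figure quoted above, or roughly a 37% increase. About 33 points of that is the 3-to-4 expansion itself; the rest is line breaks.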
Why do we have to do this?
Remember when I said that email technology is "older than dirt"? It's actually one of the oldest technologies on the internet. And originally binary data could not be transmitted via email at all. Email was restricted to text data with values less than 128. When people decided that sending binary files around was a good idea, they had to come up with this approach of encoding that data as text to allow it to go through.
Today there are ways to transmit many forms of binary data directly - Asian and other character sets in particular rely on it. But the basic problem remains: while most email programs, email servers and other email-related software might work with it quite well, we can't guarantee that all will. Hence email is often transmitted using the "lowest common denominator" - the basic encoding that by definition all email programs must support.
Why Care?
That's pretty simple, actually. A limit on the size of an email doesn't mean you can send attachments that approach that size. Because of the expansion caused by text encoding, the largest attachment you can include is typically much smaller than the limit.
Let's say your ISP imposes a 10 megabyte limit per message. If, as in my example, your email program uses an encoding method that increases the size of your attachments by 37%, then the largest file you could attach and send successfully would be at most around 7.3 megabytes (10 divided by 1.37), and a little less once the message text and headers are counted.
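Turning that around, here is a quick Python estimate of the largest attachment that fits under a given limit. The function name and the optional allowance for message text and headers are made up for illustration; the 37% default is simply the growth measured in the example above.

```python
def max_attachment_bytes(limit_bytes: int, growth: float = 0.37, other_bytes: int = 0) -> int:
    """Largest raw file size whose encoded form still fits under limit_bytes.

    other_bytes is whatever the message text and headers are expected to use.
    """
    available = max(limit_bytes - other_bytes, 0)
    # The encoded file is (1 + growth) times the raw size, so divide to go back.
    return int(available / (1 + growth))


limit = 10 * 1000 * 1000  # 10 megabytes, counting a megabyte as 1,000,000 bytes
print(max_attachment_bytes(limit))
```

Treat the result as a ceiling rather than a promise: the message text and headers count against the limit too, and providers don't all measure size the same way.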
If your email's bouncing because of size limitations, that's important to know.
Hi Leo – Clarity must be your middle name as you always make your points in a way most people can understand. Nice work!
As to mapping an 8-bit world on a 7-bit system, the same problem arises when mapping a decimal world on a binary world: the fit is not exact. One approach in the old days was to approximate and on the early IBM PC’s, this could lead to compounding errors – for instance I took the number 3 and squared it on a loop many times (20?) then took the square root of the result the same number of times and did NOT get back to 3!
Another approach, much preferred, was to represent the number in binary-equivalent to, say, 15 decimal points as was done on the WANG T-CPU in the late ’70’s (I cut my teeth on this Z-80 machine) but this wasted precious memory address space in the interests of precision. On the WANG system, it was possible to take the number 3 and square it any number of times (within reason, I guess) then reverse the process and, voila, back to 3 exactly. Neat.
Me, I’ll take precision…
Finally! I’ve wondered about this for years.
I was out sick for a week and when I logged back into my email, the file sizes of some of the attachments increased by up to 10 times. i.e. what once was a 25kb .pdf is now 10mb large. Not sure why. Help?
Question: when archiving emails to a PST file, the original byte size of the email increases. For example, if it's 30K in the inbox, it shows as 35K in the archive PST. If I grab another email it increases in size by 10K; another increases by 20K or 32K, or even 45K for another set, so if it was originally 45K in size it is now 90K in the PST file. So the padding starts small but then increases to the point that it's double the original size of the email. What is causing the emails to be padded to the point where, when one tries to open an archived email, you get an out of virtual memory error opening the file? I would greatly appreciate some help.
Hi,
I just wanted to know, is there any way I can get the size of the message body or the size of the encoded part?
I made a word doc with a .jpg watermark for my boss who wants to use it as a letterheading. The original file I send him is 85kb. He saves it to his desktop then attaches it to his email and it becomes 7mb!
I’ve sent it between 3 different email accounts (yahoo, gmail & other) and it stays 85kb’s. Is this happening because of his email account?
You said: “The actual growth depends on several factors, but is mostly related to the encoding scheme used” could there be another factor involved (because it’s such a massive increase in size) that I can change?
16-Apr-2013
I’ve bookmarked your site because your articles are really clear and informative.
Thanks so much for getting back to me, at least I know now that this is an unusual issue and I’m not doing anything totally thick.