Why are email attachments in general much larger than the actual attachment
sent? Is something added to them by the email client?
I wouldn’t say that anything is “added” to the attachment other than perhaps
some administrative data like its name – the attachment is still just the attachment.
However, something is done to the attachment that will
most definitely make it larger.
And it all has to do with the fact that the technology behind email is,
basically, older than dirt. (In internet terms, of course.)
First we need to explain how data is represented, and the difference between “text” and “binary” data.
Data that is text (and I’m restricting myself to “plain old ASCII text” here), meaning the traditional letters, numbers and a specific set of symbols, is represented by numbers from 0 to 127. (I won’t get into the actual 1s and 0s representation, but the reason it’s 127 is that 127 is 2 to the 7th power, 128, minus one. A lot of the magic numbers you’ll see related to computers are related to powers of two.)
Binary data, on the other hand, is represented by numbers from 0 to 255. You might recall that a “byte” is also a number from 0 to 255, and that all the data stored in your computer is stored this way. We measure file sizes, memory sizes and disk sizes all in bytes, be it megabytes, gigabytes, or any of several other shorthands for large quantities of bytes.
Text data, when stored on your computer, is still stored in bytes which could hold values greater than 127, but the characteristic of plain old text data is that all the actual values will be less than 128.
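To make that distinction concrete, here is a minimal sketch in Python (my choice of language for illustration; the function name is made up for this example). It simply checks whether every byte in a chunk of data falls in the plain-text range.

```python
def is_plain_ascii(data: bytes) -> bool:
    """Return True if every byte value is below 128."""
    return all(b < 128 for b in data)

# 7 bits give values 0..127; a full 8-bit byte gives 0..255.
print(2**7 - 1, 2**8 - 1)                     # 127 255

print(is_plain_ascii(b"Hello, world"))        # True
# The first 8 bytes of every PNG file; 0x89 is 137, outside the ASCII range.
print(is_plain_ascii(b"\x89PNG\r\n\x1a\n"))   # False
```

A single byte of 128 or more is enough to make the data “binary” as far as a text-only system is concerned.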
Why does this matter?
Email is primarily (or at least originally) a text-only medium.
And yet attachments are, by definition, treated as binary. An attachment can be anything (text, a program, a video, a music file), so all of them are handled as binary data.
So the question becomes: how do you send data that can have values between 0 and 255 through a medium that itself can only handle values from 0 to 127?
Answer: you encode it. You come up with a way to represent the binary data as text.
If you’ve ever seen something like this:
iVBORw0KGgoAAAANSUhEUgAAAQQAAABOCAIAAABJ3v/jAAAACXBIWXMAABJ0AAASdAHeZh94
AAAKT2lDQ1BQaG90b3Nob3AgSUNDIHByb2ZpbGUAAHjanVNnVFPpFj333vRCS4iAlEtvUhUI
IFJCi4AUkSYqIQkQSoghodkVUcERRUUEG8igiAOOjoCMFVEsDIoK2AfkIaKOg6OIisr74Xuj
...
That’s binary data, encoded as text. That particular example shows the first three of 390 lines that are a “base64” encoding of my logo from the top of this page – a binary file. “Base64” is one of several possible encoding mechanisms.
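If you want to see the encoding happen, here is a small sketch using Python’s standard base64 module. The input is not my logo, just the 8-byte signature that every PNG file starts with, which is why the output begins with the same characters as the sample above.

```python
import base64

# The first 8 bytes of any PNG file (the PNG "signature").
binary = b"\x89PNG\r\n\x1a\n"

encoded = base64.b64encode(binary)   # bytes containing only ASCII characters
print(encoded.decode("ascii"))       # iVBORw0KGgo=

# Decoding restores the original bytes exactly.
assert base64.b64decode(encoded) == binary
```

Base64 turns every 3 bytes of input into 4 text characters, padding the end with “=” when the input doesn’t divide evenly by three.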
And now to your point: that image (a PNG file) is 20,985 bytes of binary data. The base64 encoded text version? 28,760 bytes, an increase of 7,775 bytes, or 37%.
Representing binary data as text makes it bigger. The actual growth depends on several factors, but is mostly related to the encoding scheme used.
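For base64 the growth can be estimated with a little arithmetic. The sketch below is only an approximation under two assumptions of mine: the encoded text is broken into 72-character lines (the length of the sample lines shown earlier), and each line ends with a two-byte CR LF. The function name is made up for this example.

```python
def mime_base64_size(n_bytes: int, line_length: int = 72) -> int:
    """Estimate the size of n_bytes after base64 encoding with line breaks."""
    chars = 4 * ((n_bytes + 2) // 3)                   # every 3 bytes become 4 characters
    lines = (chars + line_length - 1) // line_length   # number of lines, rounded up
    return chars + 2 * lines                           # each line ends with CR LF

original = 20_985
encoded = mime_base64_size(original)
print(encoded, f"{(encoded - original) / original:.0%}")   # 28758 37%
```

That lands within a couple of bytes of the size I measured. The 4-for-3 substitution accounts for about 33% of the growth, and the line breaks for the rest.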
Why do we have to do this?
Remember when I said that email technology is “older than dirt”? It’s actually one of the oldest technologies on the internet. And originally binary data could not be transmitted via email at all. Email was restricted to text data with values less than 128. When people decided that sending binary files around was a good idea, they had to come up with this approach of encoding that data as text to allow it to go through.
Today there are ways to transmit many forms of binary data directly; Asian and other character sets, in particular, rely on them. But the basic problem remains: while most email programs, email servers and other email-related software might work with it quite well, we can’t guarantee that all will. Hence email is often transmitted using the “lowest common denominator”: the basic encoding that by definition all email programs must support.
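You can watch an email library make that choice. The sketch below uses Python’s standard email package to attach some binary data to a message; the data is a stand-in I made up rather than a real image, and the subject and file name are placeholders. Other email programs may behave differently.

```python
from email.message import EmailMessage

# Stand-in for a real file: 10,240 bytes covering every value from 0 to 255.
data = bytes(range(256)) * 40

msg = EmailMessage()
msg["Subject"] = "Logo attached"
msg.set_content("See the attached file.")
msg.add_attachment(data, maintype="image", subtype="png", filename="logo.png")

part = next(msg.iter_attachments())
print(part["Content-Transfer-Encoding"])   # base64
print(part.get_filename())                 # logo.png

wire = msg.as_bytes()                      # the message as it would be transmitted
print(len(data), len(wire))                # the message is noticeably larger than the file
print(max(wire) < 128)                     # True: only 7-bit values in the message
```

Note that the library also records the file name and type alongside the encoded data, which is the small amount of “administrative data” that really is added.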
The practical effect on size limits is pretty simple, actually. A limit on the size of an email doesn’t imply you can send attachments that approach that size. Because of the expansion caused by text encoding, the size of attachments you can include is typically much smaller.
Let’s say your ISP imposes a 10 megabyte limit per message. If, as in my example, your email program uses an encoding method that increases the size of your attachments by 37%, then the largest file you could attach and send successfully would be around 7.3 megabytes at most (10 divided by 1.37), and slightly less once the message text and headers are counted.
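As a rough check on that arithmetic, the limit is divided by the growth factor, not reduced by the percentage. This sketch ignores the message text and headers, which take up a little more room, and the function name is made up for this example.

```python
def max_attachment_mb(limit_mb: float, growth: float) -> float:
    """Largest raw file size whose encoded form still fits within limit_mb."""
    return limit_mb / (1 + growth)

print(round(max_attachment_mb(10, 0.37), 1))   # 7.3
```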
If your email’s bouncing because of size limitations, that’s important to know.