Why Shouldn’t I Use TAB Characters in My Source Files?

Someone’s telling me not to use tabs in code files. Why? He mumbles
something about “cross-platform issues.”

Is this even *true*!? Is it even *possible* for (say) a Mac or a Unix machine
to botch the reading of a Windows file, or (more likely, I’d think) for a Mac
or Unix machine to botch the other’s?

I do know that between Mac and Unix, one uses CR (000D) and the other, LF
(000A), whereas Windows uses both (though I know not in which order).

That being the case, what’s the deal with the HT code (0009)? Don’t all
three use the same 0009 to indicate a tab?

<Shudder>

No, this has nothing to do with platforms. All use hex 0x09 to represent a
tab character.

The problem is much, much deeper than that.

This is about programmers, religion, and the meaning of that lowly
little character we call “tab”.

I shudder because I’ve witnessed religious flame wars between computer
programmers on this issue. Seriously.

To understand why something this seemingly simple would inspire deep
passion, we need to define a few things. Like “tab”. And in doing so, we’ll see
that there is no one true definition. Only the one you choose to adhere to.
(It’s sounding like religion already!)

First let’s separate out two concepts: the tab character and the
tab key. They’re not the same. That’s part of the confusion.

TAB: The Character

The tab ASCII character, hexadecimal value 0x09, decimal value 9, probably
dates back to the days of the teletype. As defined by how the hardware was
constructed, a tab character moved the print head to the next tab stop, and
tab stops were fixed every 8 character positions. If a tab was encountered as
the teletype printed, the print head would jump ahead to column 9, 17, 25 and
so on, whichever one was next.

Put another way, a tab character was hard-coded to be every 8
columns. And you can’t get much harder than hardware.
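
The "jump to the next stop" rule is simple arithmetic. Here is a minimal sketch in C, assuming columns are numbered from 1 as in the description above. The function name `next_tab_stop` and its `tab_width` parameter are made up for illustration and don't come from any particular editor or terminal.

```c
#include <assert.h>

/* Hypothetical helper: given the current 1-based column, return the
 * 1-based column a tab would move to when tab stops sit every
 * tab_width positions. With tab_width = 8 the stops are 9, 17, 25... */
int next_tab_stop(int column, int tab_width)
{
    assert(column >= 1 && tab_width >= 1);

    /* Work with a 0-based offset, round up to the next multiple of
     * tab_width, then convert back to a 1-based column. */
    int offset = column - 1;
    return (offset / tab_width + 1) * tab_width + 1;
}
```

Calling `next_tab_stop(3, 8)` gives 9, and calling it again from column 9 gives 17, which matches the teletype behavior described above.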

My guess is simply that the tab was a convenient form of compression. A long
run of blank space could be described by a much smaller number of tabs followed
by an appropriate number of individual single spaces.

Indenting: The Root of the Controversy

Let’s step away from the tab character issue for a moment, and talk about
indenting. Indenting is an approach to making programming code, be it HTML, C,
Basic, or who-knows-what, easier to read by laying out the individual
instructions in a manner that visually mimics the intended structure of the
code. For example:

if (1 == a)
    printf ("'a' is one\n");
else
    printf ("'a' is not one\n");

This silly little snippet of code uses indentation as a visual aid to show
the structure of the ‘if’ statement. This is equivalent:

if (1 == a)
printf ("'a' is one\n");
else
printf ("'a' is not one\n");

but as you can see it’s much harder to get a sense for what’s happening.

Now the controversy. How many columns should indented lines be indented? My
example above is 4. Here’s that same example with an indent of 2:

if (1 == a)
  printf ("'a' is one\n");
else
  printf ("'a' is not one\n");

and again with an indent of 8:

if (1 == a)
        printf ("'a' is one\n");
else
        printf ("'a' is not one\n");

Which is “better” is a matter of personal taste and readability. I have
seen, and at various times used, indents of 1, 2, 3, 4 and 8. And while it
sounds silly to some, programmers do get passionate at times as to how much to
indent – this is code they have to look at every day, and they want it to be as
readable and understandable as possible. Indenting is part of that.

In particular, when you have multiple programmers working on the same source
code, it’s critical that they agree on how much indenting they’ll use. Why?
Because if some indent at 2, and others indent at 4, for example, the code will
over time become more and more difficult to read. And that, in turn, makes the
code more fragile and error prone.

The Tab Key: A Solution, and Yet…

It’s easy to indent to any column just by pressing the spacebar the
appropriate number of times. However, that quickly gets cumbersome. But what
about tab?

If you can standardize on an indent of 8, then you can have the tab key
insert a tab character, and pressing tab the appropriate number of times will
get you right to your level of indent. Very quick, easy to use, easy to do.

But what if your indent style isn’t 8? What if it’s 4? Or 3?

Two different approaches are commonly used:

Redefine the tab character, and still have the tab key
insert a tab character. Many, if not most, text editors will allow you to
redefine what the tab character physically means. So many programmers will
simply use this to say “a tab character means tab stops every 4 columns”. Then
they define the tab key to insert a tab character. Very quick, easy to use,
easy to do. But it only displays properly if that redefinition of tab is
used. If someone else who hasn’t defined the tab character to be every 4
columns looks at the file, the indentation will be wrong. Things won’t line up
properly.

Redefine the tab key, and leave the tab character
definition alone. Many, if not most, text editors actually understand that
“indent” is a different concept than “tab”. By assigning the tab key
to some kind of smart indent feature in the editor, the computer can simply
automatically insert the correct number of tab characters and spaces to move to
the desired indent column. For example, if you want to indent to column 13, the
computer would simply insert one tab and 4 spaces, without your having to
think about it. Once again simple, efficient, allows you to use tab characters
according to their default definition, and makes the computer do the work.
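
The "one tab and 4 spaces" arithmetic from the second approach can be sketched in C. This is only an illustration of the idea, assuming 1-based columns and evenly spaced tab stops. The names `indent_fill` and `fill_to_column` are hypothetical and not taken from any real editor.

```c
#include <assert.h>

/* Hypothetical result type: how many tabs, then how many spaces,
 * are needed to reach a column starting from column 1. */
struct indent_fill {
    int tabs;
    int spaces;
};

/* Hypothetical helper: use as many tabs as fit, then pad with spaces. */
struct indent_fill fill_to_column(int target_column, int tab_width)
{
    assert(target_column >= 1 && tab_width >= 1);

    int offset = target_column - 1;      /* positions to advance */
    struct indent_fill fill;
    fill.tabs = offset / tab_width;      /* whole tab stops crossed */
    fill.spaces = offset % tab_width;    /* remainder made up of spaces */
    return fill;
}
```

With the default 8-column tab stops, `fill_to_column(13, 8)` yields 1 tab and 4 spaces, the same answer as the column 13 example above.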

The Third Option: Kill the Tab Character

The problem is that those two different approaches are, indeed, commonly
used. That means that you may well find programmers who have set their text
editors, viewers and other tools to assume that a tab character means every 3
spaces. Or every 4. Or every 8. And that means that any file that contains
tabs, regardless of the definition, may not display properly for
everyone.
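
To see how the same bytes can land in different places, here is a rough sketch in C that measures how far a line's leading whitespace reaches under a given tab setting. The function name `indent_width` is invented for this example, and it assumes a fixed-width font with evenly spaced tab stops.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return how many display positions the leading
 * spaces and tabs of a line occupy when tab stops are every
 * tab_width positions. */
int indent_width(const char *line, int tab_width)
{
    assert(line != NULL && tab_width >= 1);

    int width = 0;
    for (const char *p = line; *p == ' ' || *p == '\t'; ++p) {
        if (*p == '\t')
            width += tab_width - (width % tab_width); /* jump to next stop */
        else
            width += 1;                               /* a space is one position */
    }
    return width;
}
```

A line that starts with one tab followed by four spaces measures 8 positions when tabs are set to 4, but 12 positions when tabs are set to 8. A line indented with eight spaces measures 8 either way, so the two lines agree for one reader and disagree for another.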

It’s important to have a public convention that says “tab characters mean
this”, so that when someone views a file they can adjust their settings so that
it will view properly, but still… that’s cumbersome and easily forgotten,
especially for folks that move between projects that have different
conventions.

One solution is simply to avoid the problem. Don’t use tabs. Indent to
whatever degree your convention calls for, but use spaces to do it. Let your
text editor simply insert the appropriate number of spaces to get to the next
tab stop.

That way, all files that are tab-free are guaranteed to display properly no
matter what your tab character is defined to be.
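
Converting an existing file to a tab-free one is mostly the same tab stop arithmetic applied character by character. The following is a minimal sketch in C, not a production tool. The function name `expand_tabs`, its parameters, and its error convention are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy `in` to `out`, replacing each tab with the
 * spaces needed to reach the next tab stop. Returns the number of
 * characters written (not counting the terminating NUL), or -1 if
 * `out` is too small to hold the result. */
long expand_tabs(const char *in, char *out, size_t out_size, int tab_width)
{
    assert(in != NULL && out != NULL && tab_width >= 1);

    size_t n = 0;     /* characters written so far */
    int column = 0;   /* 0-based display column on the current line */

    for (; *in != '\0'; ++in) {
        if (*in == '\t') {
            int pad = tab_width - (column % tab_width);
            if (n + (size_t)pad >= out_size)
                return -1;            /* no room for padding plus NUL */
            for (int i = 0; i < pad; ++i)
                out[n++] = ' ';
            column += pad;
        } else {
            if (n + 1 >= out_size)
                return -1;            /* no room for char plus NUL */
            out[n++] = *in;
            column = (*in == '\n') ? 0 : column + 1;
        }
    }
    if (n >= out_size)
        return -1;
    out[n] = '\0';
    return (long)n;
}
```

The `tab_width` passed in has to be the setting the file was written with. Once the tabs are expanded, the result looks the same under any reader's tab setting.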

The cost? Your source files will be a little bigger. A trivially small cost
in today’s environment of huge disks and storage.

Who’s Right?

So what’s the answer? Who’s right?

Everyone. No one.

Regardless of how we got here, here we are. The tab character does, and
doesn’t, get redefined. People do use only tabs, mixes of tabs and spaces, or
only spaces to indent their code.

What’s most important is that everyone working on the same source
code use the same convention, and that the convention is somehow documented and
easy to find, so that others looking at, or perhaps about to work on the code
can adjust their own settings, if needed.
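
A documented convention is also something a small tool can check. As a hedged sketch only, here is a C function that reports what kind of whitespace a line is indented with, so a team could flag lines that stray from whatever they agreed on. The enum values and the name `classify_indent` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for what a line's leading whitespace contains. */
enum indent_style {
    INDENT_NONE,    /* no leading whitespace */
    INDENT_SPACES,  /* spaces only */
    INDENT_TABS,    /* tabs only */
    INDENT_MIXED    /* both tabs and spaces */
};

/* Hypothetical helper: inspect only the leading run of spaces and tabs. */
enum indent_style classify_indent(const char *line)
{
    assert(line != NULL);

    int saw_space = 0;
    int saw_tab = 0;
    for (const char *p = line; *p == ' ' || *p == '\t'; ++p) {
        if (*p == ' ')
            saw_space = 1;
        else
            saw_tab = 1;
    }

    if (saw_space && saw_tab)
        return INDENT_MIXED;
    if (saw_tab)
        return INDENT_TABS;
    if (saw_space)
        return INDENT_SPACES;
    return INDENT_NONE;
}
```

A spaces-only project would treat anything other than `INDENT_SPACES` or `INDENT_NONE` as a violation, while a tabs-only project would do the reverse.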

And I say that as someone who’s been a referee on too many of these coding
standards religious battles.

(And I won’t even touch the issue of where the curly braces go in
programming languages like C or C++. 🙂)

12 comments on “Why Shouldn’t I use TAB Characters in My Source Files?”

  1. “The tab ASCII character, hexadecimal value 0x0a, decimal value 10, probably dates back to the days of the teletype.”

    I assume you mean 0x09 (decimal 9) in the above.

    (Note that my parents had an IBM Selectric typewriter that allowed user-settable tab stops. The default, of course, was every 8 columns.)

  2. We shouldn’t use TAB characters because most people I know don’t know how they work. Even I don’t know what it means, so please remember this comment and reply soon. From karina robinson

  3. Bah! You are the devil! The entire point of using tab is that people can choose how to display it. Say a team member chooses to use four spaces, when I prefer just two: I’m stuffed and his code looks alien.

    If we were to use the tab key, he can set it at four characters, I set it at two and we are both happy as hamsters!

    Go tab!

  4. Yea tabs. But tabs are really just the beginning. What is needed is, for instance, an IDE that stores the code in some normalized form. It could even store it minified or compressed and encrypted. When you open it, it formats the code with what I call a “Parsing Beautifier”. In other words, a beautifier that does not just use patterns but has the same understanding of the code as a compiler would. Each person who looks at the code can fine-tune the PB themselves and see the code in whatever way makes the most sense to them. Agree, disagree, suggestions?

  5. One problem with using TABs which was not mentioned here yet: people tend to intermix TABs and spaces. With the result that a text file is only really readable if the TAB setting of the reader is exactly the same as that of the writer…

  6. For me, the largest benefit to using tabs is the very quick ability to shift spacing forward and back. When you have the tab key set to insert *spaces*, shifting code forward is still a simple, single tab key hit; shifting code back, however, is x backspace keys. This may sound trivial, but when you are coding day in day out, that is a big difference.

    Solutions to some of the problems:

    * As a team, define how tab should be used in everyone’s editors. Ensure everyone is using the same tab settings (this should be done whether spaces or tabs are demanded)
    * Even with the above, when spaces absolutely must be used, have a tool in your editor to convert spaces to tabs (for people who don’t like typing) and have a tool in your version control system to ensure tabs are converted to spaces before committing.
    * Alternatively, have shift-tab set to remove x-spaces when you have tab set to insert x-spaces

    My tuppence

  7. Personally I prefer TAB indentation, and then space indentation when it comes to alignment of parameters or something like that. Everything done automagically by my IDE ^^

  8. What is missing in this article is the fact that you need to diff source files. Each diff tool will have a different idea of what a tab is. Usually 8 characters, but if you used your tabs at 4 or 2, then 8 will probably not work right.

    Also, tabs are okay in many places, if you have a mix of tabs and spaces, that’s when it breaks (as mentioned by another user.)

    Another really bad one is tables. Code is likely to look right (except when a statement is written across multiple lines!), but tables are not. Write a table with tabs separating the columns, and you are sure to create a big mess if the reader does not know the size of your tabs.

    By the way, I use vim and you can add a comment with the info, for example:

    // vim: ts=4 sw=4

    Now all the users know that tabs are equivalent to 4 spaces (ts = tab stop) and the indentation to the left or right is also 4 spaces (sw = shift width); and if you want to have spaces instead of actual tab characters, add “et” (expand tabs).

    Most diff tools have the ability to “ignore whitespace”, which is super-handy when diffing source files with different tab assumptions.

    Leo
    02-Apr-2013
  9. When I started programming, it would have been unthinkable to use multiple spaces instead of a tab. I went as far as storing the date as the number of days since 1950 (a Y2k bug waiting to happen) and I’d pack 8 switches into a byte and use bits as yes/no switches. Now a 100K picture is considered tiny.

  10. Leo, you wrote:

    “Indenting is an approach to making programming code, be it HTML, C, Basic, or who-knows-what.”

    The correct spelling of that last programming language is BASIC, not “Basic.” That’s because it’s an acronym — it stands for “Beginner’s All-purpose Symbolic Instruction Code.”

    Hope that helps!

    • As I understand it (and recall, I was there at the beginning-ish), BASIC as an acronym refers to one specific implementation of the language. By the time it hit places like Microsoft and others, and oodles of variations had been created, it was referred to as “Basic”.
