Someone’s telling me not to use tabs in code files. Why? He mumbles something about “cross-platform issues.”
Is this even *true*!? Is it even *possible* for (say) a Mac or a Unix machine to botch the reading of a Windows file, or (more likely, I’d think) for a Mac or a Unix machine to botch the other’s?
I do know that between Mac and Unix, one uses CR (000D) and the other, LF (000A), whereas Windows uses both (though I know not in which order).
That being the case, what’s the deal with the HT code (0009)? Don’t all three use the same 0009 to indicate a tab?
<Shudder>
No, this has nothing to do with platforms. All use hex 0x09 to represent a tab character.
The problem is much, much deeper than that.
This is about programmers, religion, and the meaning of that lowly little character we call “tab”.
I shudder because I’ve witnessed religious flame wars between computer programmers on this issue. Seriously.
To understand why something this seemingly simple would inspire deep passion, we need to define a few things. Like “tab”. And in doing so, we’ll see that there is no one true definition. Only the one you choose to adhere to. (It’s sounding like religion already!)
First let’s separate out two concepts: the tab character and the tab key. They’re not the same. That’s part of the confusion.
TAB: The Character
The tab ASCII character, hexadecimal value 0x09, decimal value 9, probably dates back to the days of the teletype. As defined by how the hardware was constructed, a tab character moved the print head to the next tab stop, which was the next multiple of 8 character positions. If a tab was encountered as the teletype printed, the print head would jump ahead to column 9, 17, 25 and so on – whichever one was next.
Put another way, a tab character was hard-coded to be every 8 columns. And you can’t get much harder than hardware.
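To make that arithmetic concrete, here is a minimal sketch in C. The function name next_tab_stop is my own invention for illustration, not something from any standard library, and it assumes columns are numbered from 1, as in the teletype description above.

```c
#include <assert.h>

/* Hypothetical helper (the name is an assumption, not a library call).
   Given the current 1-based column and the distance between tab stops,
   return the column a tab character would move the print head to. */
int next_tab_stop(int column, int width)
{
    assert(column >= 1 && width >= 1);
    /* Count how many whole tab widths are already behind us,
       then step to the start of the following one. */
    return ((column - 1) / width + 1) * width + 1;
}
```

With a width of 8, calling this from column 1 or column 8 gives 9, and calling it from column 9 gives 17, which matches the 9, 17, 25 sequence.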
My guess is simply that the tab was a convenient form of compression. A long run of blank space could be described by a much smaller number of tabs followed by an appropriate number of individual single spaces.
Indenting: The Root of the Controversy
Let’s step away from the tab character issue for a moment, and talk about indenting. Indenting is an approach to making programming code, be it HTML, C, Basic, or who-knows-what, easier to read by laying out the individual instructions in a manner that visually mimics the intended structure of the code. For example:
if (1 == a)
    printf ("'a' is one\n");
else
    printf ("'a' is not one\n");
This silly little snippet of code uses indentation as a visual aid to show the structure of the ‘if’ statement. This is equivalent:
if (1 == a) printf ("'a' is one\n"); else printf ("'a' is not one\n");
but as you can see it’s much harder to get a sense for what’s happening.
Now the controversy. How many columns should indented lines be indented? My example above is 4. Here’s that same example with an indent of 2:
if (1 == a)
  printf ("'a' is one\n");
else
  printf ("'a' is not one\n");
and again with an indent of 8:
if (1 == a)
        printf ("'a' is one\n");
else
        printf ("'a' is not one\n");
Which is “better” is a matter of personal taste and readability. I have seen, and at various times used, indents of 1, 2, 3, 4 and 8. And while it sounds silly to some, programmers do get passionate at times as to how much to indent – this is code they have to look at every day, and they want it to be as readable and understandable as possible. Indenting is part of that.
In particular, when you have multiple programmers working on the same source code, it’s critical that they agree on how much indenting they’ll use. Why? Because if some indent at 2, and others indent at 4, for example, the code will over time become more and more difficult to read. And that, in turn, makes the code more fragile and error-prone.
The Tab Key: A Solution, and Yet…
It’s easy to indent to any column just by pressing the spacebar the appropriate number of times. However, that quickly gets cumbersome. But what about tab?
If you can standardize on an indent of 8, then having the tab key insert a tab character, and pressing tab the appropriate number of times, will get you right to your level of indent. Very quick, easy to use, easy to do.
But what if your indent style isn’t 8? What if it’s 4? Or 3?
Two different approaches are commonly used:
Redefine the tab character, and still have the tab key insert a tab character: Many, if not most, text editors will allow you to redefine what the tab character physically means. So many programmers will simply use this to say “a tab character means tab stops every 4 columns”. Then they define the tab key to insert a tab character. Very quick, easy to use, easy to do. But it only displays properly if that redefinition of tab is used. If someone else who hasn’t defined the tab character to be every 4 columns looks at the file, the indentation will be wrong. Things won’t line up properly.
Redefine the tab key, and leave the tab character definition alone: Many, if not most, text editors actually understand that “indent” is a different concept than “tab”. By assigning the tab key to some kind of smart indent feature in the editor, the computer can automatically insert the correct number of tab characters and spaces to move to the desired indent column. For example, if you want to indent to column 13, the computer would simply insert one tab and 4 spaces, without your having to think about it. Once again simple, efficient, allows you to use tab characters according to their default definition, and makes the computer do the work.
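The arithmetic behind that second approach is small enough to sketch in C. This is only an illustration of the idea, not how any particular editor does it. The struct indent type and the indent_to_column function are names I made up, and the sketch assumes you start at column 1 with columns numbered from 1.

```c
#include <assert.h>

/* Hypothetical result type: how many tabs, then how many spaces. */
struct indent {
    int tabs;
    int spaces;
};

/* Hypothetical helper (an assumption for illustration).
   Work out the tabs and spaces needed to move from column 1 to the
   1-based column `target`, with tab stops every `tab_width` columns. */
struct indent indent_to_column(int target, int tab_width)
{
    assert(target >= 1 && tab_width >= 1);
    struct indent result;
    result.tabs = (target - 1) / tab_width;    /* whole tab stops crossed */
    result.spaces = (target - 1) % tab_width;  /* remainder made up of spaces */
    return result;
}
```

For the column 13 example, with the default tab stops every 8 columns, this gives one tab and 4 spaces.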
The Third Option: Kill the Tab Character
The problem is that those two different approaches are, indeed, commonly used. That means that you may well find programmers who have set their text editors, viewers and other tools to assume that a tab character means every 3 spaces. Or every 4. Or every 8. And that means that any file that contains tabs, regardless of the definition, may not display properly for everyone.
It’s important to have a public convention that says “tab characters mean this”, so that when someone views a file they can adjust their settings so that it will view properly, but still… that’s cumbersome and easily forgotten, especially for folks that move between projects that have different conventions.
One solution is simply to avoid the problem. Don’t use tabs. Indent to whatever degree your convention calls for, but use spaces to do it. Let your text editor simply insert the appropriate number of spaces to get to the next tab stop.
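Here is a rough C sketch of why that happens. It computes how many columns a line takes up for a given tab setting. The function name display_width is hypothetical, and it assumes every character other than a tab is exactly one column wide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration).
   Return the number of columns `s` occupies when tab stops fall
   every `tab_width` columns. Assumes non-tab characters are one column. */
int display_width(const char *s, int tab_width)
{
    assert(s != NULL && tab_width >= 1);
    int col = 0;  /* columns used so far, counted from 0 */
    for (; *s != '\0'; s++) {
        if (*s == '\t')
            col += tab_width - (col % tab_width);  /* jump to the next tab stop */
        else
            col++;
    }
    return col;
}
```

A line indented with one tab and a line indented with four spaces come out the same width when tabs are set to 4, but not when tabs are set to 8. The space-indented line is the same width under either setting, which is exactly why two people can look at the same file and see different alignment.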
That way, all files that are tab-free are guaranteed to display properly no matter what your tab character is defined to be.
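If you already have files containing tabs, the conversion is mechanical. What follows is a minimal sketch in C, assuming you know what tab width the file was written with. The function name expand_tabs and its return convention are my own choices for illustration, not a standard library routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration).
   Copy `in` to `out`, replacing each tab with enough spaces to reach
   the next tab stop. Returns 0 on success, or -1 if `out` (of size
   `out_size`, including room for the terminator) is too small. */
int expand_tabs(const char *in, char *out, size_t out_size, int tab_width)
{
    assert(in != NULL && out != NULL && tab_width >= 1);
    size_t n = 0;  /* characters written so far */
    int col = 0;   /* column on the current line, counted from 0 */
    for (; *in != '\0'; in++) {
        if (*in == '\t') {
            int pad = tab_width - (col % tab_width);
            if (n + (size_t)pad >= out_size)
                return -1;
            while (pad-- > 0) {
                out[n++] = ' ';
                col++;
            }
        } else {
            if (n + 1 >= out_size)
                return -1;
            out[n++] = *in;
            col = (*in == '\n') ? 0 : col + 1;  /* a newline restarts the column count */
        }
    }
    if (n >= out_size)
        return -1;
    out[n] = '\0';
    return 0;
}
```

Note that the result is only right if tab_width matches what the original author had in mind, which is the whole problem in a nutshell. Once converted, though, the file no longer depends on anyone’s tab setting.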
The cost? Your source files will be a little bigger. A trivially small cost in today’s environment of huge disks and storage.
Who’s Right?
So what’s the answer? Who’s right?
Everyone. No one.
Regardless of how we got here, here we are. The tab character does, and doesn’t, get redefined. People do use only tabs, mixes of tabs and spaces, or only spaces to indent their code.
What’s most important is that everyone working on the same source code use the same convention, and that the convention is somehow documented and easy to find, so that others looking at, or perhaps about to work on, the code can adjust their own settings, if needed.
And I say that as someone who’s been a referee on too many of these coding standards religious battles.
(And I won’t even touch the issue of where the curly braces go in programming languages like C or C++. :-)
“The tab ASCII character, hexadecimal value 0x0a, decimal value 10, probably dates back to the days of the teletype.”
I assume you mean 0x09 (decimal 9) in the above.
(Note that my parents had an IBM Selectric typewriter that allowed user-settable tab stops. The default, of course, was every 8 columns.)
D’oh. Thank you. I’ve corrected the article. Had LF on the brain, or something.
We shouldn’t use TAB characters, because most people I know don’t know how it works. Even I don’t know what it means. So please remember this comment and reply soon. From Karina Robinson
Bah! You are the devil! The entire point of using tab is that people can choose how to display it – say a team member chooses to use four spaces, when I prefer just two – I’m stuffed and his code looks alien.
If we were to use the tab key, he can set it at four characters, I set it at two and we are both happy as hamsters!
Go tab!
Yea tabs. But tabs are really just the beginning. What is needed is, for instance, an IDE that stores the code in some normalized form. It could even store it minified or compressed and encrypted. When you open it, it formats the code with what I call a “Parsing Beautifier”. In other words, a beautifier that does not just use patterns but has the same understanding of the code as a compiler would. Each person who looks at the code can fine-tune the PB themselves and see the code in whatever way makes the most sense to them. Agree, disagree, suggestions?
One problem with using TABs which was not mentioned here yet: people tend to intermix TABs and spaces. With the result that a text file is only really readable if the TAB setting of the reader is exactly the same as that of the writer…
For me, the largest benefit to using tabs is the very quick ability to shift spacing forward and back. When you have the tab key set to insert *spaces*, shifting code forward is still a simple, single tab key hit; shifting code back, however, is x backspace keys. This may sound trivial, but when you are coding day in day out, that is a big difference.
Solutions to some of the problems:
* As a team, define how tab should be used in everyone’s editors. Ensure everyone is using the same tab settings (this should be done whether spaces or tabs are demanded)
* Even with the above, when spaces absolutely must be used, have a tool in your editor to convert spaces to tabs (for people who don’t like typing) and have a tool in your version control system to ensure tabs are converted to spaces before committing.
* Alternatively, have shift-tab set to remove x-spaces when you have tab set to insert x-spaces
My tuppence
Personally I prefer TAB indentation, and then space indentation when it comes to alignment of parameters or something like that. Everything done automagically by my IDE ^^
What is missing in this article is the fact that you need to diff source files. Each diff tool will have a different idea of what a tab is. Usually 8 characters, but if you used your tabs at 4 or 2, then 8 will probably not work right.
Also, tabs are okay in many places, if you have a mix of tabs and spaces, that’s when it breaks (as mentioned by another user.)
Another really bad case is tables: code is likely to look right (except if written on multiple lines!), but tables are not. Write a table with tabs to separate the columns, and you are sure to create a big mess if you do not know the size of the tabs.
By the way, I use vim and you can add a comment with the info, for example:
// vim: ts=4 sw=4
Now all the users know that tabs are equivalent to 4 spaces (ts = tab stop) and the indentation to the left or right is also 4 spaces (sw = shift width); and if you want to have spaces instead of actual tab characters, add “et” (expand tabs).
02-Apr-2013
When I started programming, it would have been unthinkable to use multiple spaces instead of a tab. I went as far as storing the date as the number of days since 1950 (a Y2k bug waiting to happen) and I’d pack 8 switches into a byte and use bits as yes/no switches. Now a 100K picture is considered tiny.
Leo, you wrote:
“Indenting is an approach to making programming code, be it HTML, C, Basic, or who-knows-what.”
The correct spelling of that last programming language is BASIC, not “Basic.” That’s because it’s an acronym — it stands for “Beginner’s All-purpose Symbolic Instruction Code.”
Hope that helps!
As I understand it (and recall, I was there at the beginning-ish), BASIC as an acronym refers to one specific implementation of the language. By the time it hit places like Microsoft and others, and oodles of variations had been created, it was referred to as “Basic”.