Clifford on Programming Style
Some discussions with kyrah and Rusty Russell's keynote at Linuxkongress 2004, as well as reading too much ugly code (others' and my own), have inspired me to create this page. These are my personal opinions and thoughts, as well as others' opinions and thoughts that I can agree with.
Some rules can be bent, others can be broken.
This is about programming style and programming languages. It's not about algorithms themselves. Good programming style can't fix a broken algorithm – but the best algorithm doesn't help you at all if it is implemented in bad style.
In fact, this document is more about philosophy than actual guidelines for good coding style. Read „The Elements of Programming Style“ (by Kernighan and Plauger) if you are looking for guidelines to follow.
/* You are not expected to understand this */
Comments are IMO the most misunderstood coding-style issue. Most people think „the more comments the better“. In fact, I find most so-called „well commented“ source code much harder to read than uncommented source code.
Some code explains itself and any additional comment would just make the code ugly and harder to read. The comments should always be one abstraction layer above the code. Let's have a look at the following C99 code snippet:
// pi is a float, start with 0
float pi = 0;
// i is an integer, go from 0 to 10000
for (int i=0; i<10000; i++) {
    // initialize the floats x and y with random numbers
    float x = (float)rand() / RAND_MAX;
    float y = (float)rand() / RAND_MAX;
    // add .0004 to pi if x*x + y*y is smaller than 1
    if ( x*x + y*y < 1 )
        pi += .0004;
}
The comments above are absolutely useless! Now let's have a 2nd look at the same code fragment with different comments:
// estimate pi by using a monte carlo algorithm:
// get 10000 random points in the [0..1] x/y range and use pythagoras to check
// if they are inside the 1/4 circle. Add 4/10000 for each point within the circle.
float pi = 0;
for (int i=0; i<10000; i++) {
    float x = (float)rand() / RAND_MAX;
    float y = (float)rand() / RAND_MAX;
    if ( x*x + y*y < 1 )
        pi += .0004;
}
This is actually one line shorter than the code above – and this time the comments do actually help. So here are the first rules:
- Comments should give the reader the „big picture“. As long as the code itself explains the details, extra comments are just disturbing.
- Never explain the programming language or API in your code.
It is safe to trust programmers to understand their languages and APIs.
Of course, that only works if your code does explain itself…
if ( (!!strcmp(input, "no")) != 0 ) printf("You did not type 'no'.\n");
… I will come back to that point soon.
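As a small taste of what „explains itself“ could mean for the strcmp() line above, here is one possible rewrite. It is only a sketch: the helper names is_answer_no() and reply_to() are made up for this page and are not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: the name states what the comparison means,
 * so the call site needs neither a comment nor the "!!" trick. */
bool is_answer_no(const char *input)
{
    assert(input != NULL);
    return strcmp(input, "no") == 0;
}

/* Hypothetical wrapper: returns the message the original one-liner
 * would print, or NULL if the user did type "no". */
const char *reply_to(const char *input)
{
    if (!is_answer_no(input))
        return "You did not type 'no'.\n";
    return NULL;
}
```

The behaviour is meant to match the original line; only the intent is now readable from the names.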
What if you implement your own language or API? It would be stupid to expect other people to know it already. But languages and APIs are to be implemented once and then used often. Re-documenting them whenever using them would be stupid, too.
- If you introduce any new language features or APIs, add separate documentation for them so it is possible to learn them first and then read the code using them.
If Python is executable pseudocode, then Perl is executable line noise.
How do you write code that explains itself well? Well, you could do it the hard way:
- Don't comment your code at all. Read it again after a while and rewrite whatever you don't understand immediately. Repeat until you don't need to rewrite anything.
You think I'm kidding? I'm not! Sure – the idea isn't to rewrite your code all the time. The idea is to write it right in the first place – and „right“ in this context is code that would be produced by using the procedure described above. If comments are needed to understand the program, something has gone wrong. After your program is self-explanatory you can add comments for the big picture.
- Add comments to further improve the readability of a program, not to establish it in the first place.
OK, I think I'm done with the comments now.
If it still doesn't work, re-write it in assembler. This won't fix the bug, but it will make sure no one else finds it and makes you look bad.
The good thing about high level languages is that they allow you to choose random names for almost everything. The bad thing about high level languages is that they allow you to choose random names for almost everything.
- Use descriptive names for everything which is not obvious.
- Use short names for everything which is.
Calling a loop iterator „loop_counter_int“ when „i“ cannot be misunderstood either is as bad as naming all your variables x0 to x999.
- When your variable names describe what the variable is doing and your function names describe what the function is doing, you won't need any comments…
If you cannot find a fitting name, your functions are probably too complex, or you are using one variable for several different purposes.
- Break up your big and complex functions into small and non-complex ones.
- If you are using one variable for two things, you might want to use two variables instead.
Inlining code and allocating variable storage is the compiler's job. Don't try to be smarter – usually you are not.
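A minimal sketch of the last two rules, assuming nothing more than an array of doubles. Both function names are invented for this example. The first version reuses one variable for two purposes, the second gives each purpose its own name and keeps the short name only for the obvious loop index.

```c
#include <assert.h>
#include <stddef.h>

/* Harder to read: "x" is a running sum first and the average later. */
double average_reused(const double *values, size_t count)
{
    assert(values != NULL && count > 0);
    double x = 0.0;
    for (size_t i = 0; i < count; i++)
        x += values[i];
    x = x / (double)count;
    return x;
}

/* Same behaviour: one variable per purpose. */
double average_named(const double *values, size_t count)
{
    assert(values != NULL && count > 0);
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += values[i];
    double average = sum / (double)count;
    return average;
}
```

The extra variable costs nothing worth worrying about: deciding where „sum“ and „average“ live is exactly the kind of storage allocation that should be left to the compiler.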
When evaluating bigger expressions, the compiler adds unnamed temporary variables (one for each node in the DAG; read the „Dragon Book“ if you want to know more about that). Let's have a look at this code from a self-modifying hashing algorithm:
hash = (hash << ((hash % 7) + 9)) ^ (hash >> (32 - ((hash % 7) + 9))) ^ data[i];
Why is this expression hard to understand? Because of all the temporary values with no names. If we split this code up into pieces with dedicated variable names, it becomes much easier to read:
unsigned int shifting_level = (hash % 7) + 9;
unsigned int cross_shifted_hash = (hash << shifting_level) ^ (hash >> (32 - shifting_level));
hash = cross_shifted_hash ^ data[i];
Now the temporary values have names and it's much easier to understand the algorithm. With modern C compilers both variations produce exactly the same assembler code.
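If you want to convince yourself that the two variations compute the same thing, both can be wrapped into functions and compared. This is only a sketch under a few assumptions of mine: a 32-bit hash (uint32_t), byte-sized input data, and an arbitrary start value. The names hash_step_compact(), hash_step_named() and hash_buffer() are made up for this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One step of the hash, written as a single expression. */
uint32_t hash_step_compact(uint32_t hash, uint8_t byte)
{
    return (hash << ((hash % 7) + 9)) ^ (hash >> (32 - ((hash % 7) + 9))) ^ byte;
}

/* The same step with the intermediate values named. */
uint32_t hash_step_named(uint32_t hash, uint8_t byte)
{
    uint32_t shifting_level = (hash % 7) + 9;   /* always in the range 9..15 */
    uint32_t cross_shifted_hash = (hash << shifting_level)
                                ^ (hash >> (32 - shifting_level));
    return cross_shifted_hash ^ byte;
}

/* Hypothetical driver: feeds a buffer through either step function. */
uint32_t hash_buffer(const uint8_t *data, size_t len,
                     uint32_t (*step)(uint32_t, uint8_t))
{
    assert(step != NULL && (data != NULL || len == 0));
    uint32_t hash = 0x12345678u;   /* arbitrary start value, not from the text */
    for (size_t i = 0; i < len; i++)
        hash = step(hash, data[i]);
    return hash;
}
```

Calling hash_buffer() once with each step function on the same input should give identical results. Whether the generated assembler is also identical depends on your compiler and flags, so compare the output of your own toolchain if that matters to you.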
Some languages are designed to solve a problem. Others are designed to prove a point.
An issue that is IMO underestimated in almost all programming style publications is the process of choosing the right language.
ht pu setxy -115 -200 pd

to koch :length :depth
  if :depth = 1 [ fd :length stop ]
  koch :length / 3 :depth - 1
  lt 60
  koch :length / 3 :depth - 1
  rt 120
  koch :length / 3 :depth - 1
  lt 60
  koch :length / 3 :depth - 1
end

repeat 3 [ koch 400 6 rt 120 ]
This LOGO program draws a „Koch snowflake“ (a very simple fractal). It is small, clean and (at least for LOGO programmers) very easy to read. While I believe that LOGO is almost the perfect language for this program, I also believe that Oracle SQL*Forms would be a pretty stupid choice.
Here comes the most important of all programming style rules:
- Try to learn as many programming languages as possible and always choose the language which best fits the specific needs of the problem you are working on.
- Never be afraid to learn a new language just to see if it might be a good choice for your current project.
Profanity is the one language all programmers know best.
One final word about API design: APIs don't need to be easy to use! Complex things sometimes need complex APIs. But APIs must not be easy to misuse!
And the two most important rules to reach this goal are IMO:
- Function names and types must be self-explanatory.
E.g. if it is not obvious that a function can fail, add a _try to the end of the function name, don't expect void pointers to be int-aligned, etc.
- The most obvious way to use the API must be the right one.
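E.g. when a C function generates a string, it should always be null-terminated. strncpy() and readlink() are good examples of bad APIs: strncpy() leaves the destination unterminated when the source does not fit, and readlink() never appends a terminator at all.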
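To illustrate both rules at once, here is a sketch of a string copy function that is hard to use wrong. The name copy_string_try() and its signature are my invention for this page, not an existing library call. The _try suffix announces that the call can fail, and the result is null-terminated whenever the destination has room for at least one byte, even on failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical strncpy() replacement.
 * Copies src into dst (capacity dst_size bytes, including the terminator).
 * dst is null-terminated whenever dst_size > 0.
 * Returns true if src fit completely, false if it was truncated
 * or if dst_size is 0.
 */
bool copy_string_try(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    if (dst_size == 0)
        return false;

    size_t src_len = strlen(src);
    size_t copy_len = src_len < dst_size ? src_len : dst_size - 1;

    memcpy(dst, src, copy_len);
    dst[copy_len] = '\0';
    return copy_len == src_len;
}
```

A caller who ignores the return value still ends up with a valid C string, so the most obvious way to use the function is also a safe one. A caller who cares about truncation only has to check the boolean.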
That's it for now. Maybe I will add more later.
- Break the rules – but first learn them, so you know you're breaking them, and why.
Now get yourself a copy of „The Elements of Programming Style“ (by Kernighan and Plauger; McGraw-Hill 1974, 1978; ISBN 0-07-034207-5) and read it carefully. A summary of the rules from the book can be found here. If you are programming in C (or any other language which looks like C), also read the Linux kernel Coding Style document (linux/Documentation/CodingStyle).