Why C Is Not My Favourite Programming Language

 
Brian Kernighan, the documenter of the C programming language, wrote a rant entitled "Why Pascal is Not My Favourite Programming Language". I can picture him thinking to himself smugly as he strikes facetiously at Pascal, cataloguing its small flaws one after another.
 
Unfortunately, time has not been kind to Kernighan's tract. Pascal has matured and grown in leaps and bounds, becoming a premier commercial language. Meanwhile, C has continued to stagnate over the last 35 years with few fundamental improvements made. It's time to redress the balance; here's why C is now owned by Pascal.
 
No string type
 
C has no string type. Huh? Most sane programming languages have a string type which allows one to just say "this is a string" and let the compiler take care of the rest. Not so with C. It's so stubborn and dumb that it only has three types of variable; everything is either a number, a bigger number, or a pointer, or some combination of the three. Thus we don't get proper strings, just "arrays of char", and 'char' is basically only a really small number. And then we have to start using unsigned ints to represent multibyte characters.
 
What. A. Crock. An ugly hack.
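 
For the record, here is roughly what passes for a string in C; just a sketch, but it makes the point:
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* A "string" is just an array of char with a terminating '\0' byte. */
    char greeting[] = {'h', 'e', 'l', 'l', 'o', '\0'};

    /* Equivalent shorthand; the compiler appends the '\0' for you. */
    char greeting2[] = "hello";

    /* The length isn't stored anywhere; strlen() walks the bytes until
     * it happens to find the terminator. */
    printf("%s has %zu characters\n", greeting, strlen(greeting));
    printf("%s has %zu characters\n", greeting2, strlen(greeting2));
    return 0;
}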
 
Functions for insignificant operations
 
Copying one string into another requires including <string.h> in your source code, and there are two functions for copying a string. One could even conceivably copy strings using other functions (if one wanted to, though I can't imagine why). Why does any normal language need two functions just for copying a string? Why can't we just use the assignment operator ('=') like for the other types? Oh, I forgot. There's no such thing as strings in C; just one big continuous block of memory. Great! Better still, there's no syntax for:
  • string concatenation
  • string comparison
  • substrings
 
Ditto for converting numbers to strings, or vice versa. You have to use something like atol(), or strtod(), or a variant on printf(). Three families of functions for variable type conversion. Hello? Flexible casting? Hello?
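 
To illustrate, here is a rough sketch of what those 'insignificant operations' look like in practice, using the library functions mentioned above (the buffer size is an arbitrary choice for the example):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char buf[32];
    const char *name = "world";

    strcpy(buf, "hello ");                  /* "assignment" */
    strcat(buf, name);                      /* "concatenation" */

    if (strcmp(buf, "hello world") == 0)    /* "comparison" */
        puts("they match");

    /* Conversion also goes through library calls, not casts. */
    long n = atol("42");                    /* quick and dirty */
    double d = strtod("3.14", NULL);        /* slightly less dirty */
    printf("%ld %f\n", n, d);
    return 0;
}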
 
And don't even get me started on the lack of exponentiation operators.
 
No string type: the redux
 
Because there's no real string type, we have two options: arrays or pointers. Array sizes can only be constants. This means we run the risk of buffer overflow since we have to try (in vain) to guess in advance how many characters we need. Pathetic. The only alternative is to use malloc(), which is just filled with pitfalls. The whole concept of pointers is an accident waiting to happen. You can't free the same pointer twice. You have to always check the return value of malloc() and you mustn't cast it. There's no built-in way of telling if a spot of memory is in use, or if a pointer's been freed, and so on and so forth. Having to resort to low-level memory operations just to be able to store a line of text is asking for...
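 
By way of illustration, here is a rough sketch of the ritual needed just to hold a copy of a line of text whose length isn't known in advance; the string itself is, of course, made up:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char *source = "a line of text of some unknown length";

    /* +1 for the terminating '\0' that nothing will add for you. */
    char *copy = malloc(strlen(source) + 1);
    if (copy == NULL) {                 /* malloc() can fail; you must check */
        fprintf(stderr, "out of memory\n");
        return 1;
    }
    strcpy(copy, source);

    printf("%s\n", copy);

    free(copy);                         /* free exactly once... */
    /* free(copy);  ...freeing it again would be undefined behaviour */
    copy = NULL;                        /* and it still dangles unless you clear it */
    return 0;
}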
 
The encouragement of buffer overflows
 
Buffer overflows abound in virtually any substantial piece of C code. This is caused by programmers accidentally putting too much data in one space or leaving a pointer pointing somewhere because a returning function ballsed up somewhere along the line. C includes no way of telling when the end of an array or allocated block of memory is overrun. The only way of telling is to run, test, and wait for a segfault. Or a spectacular crash. Or a slow, steady leakage of memory from a program, agonisingly 'bleeding' it to death.
 
Functions which encourage buffer overflows
  • gets()
  • strcat()
  • strcpy()
  • sprintf()
  • vsprintf()
  • bcopy()
  • scanf()
  • fscanf()
  • sscanf()
  • getwd()
  • getopt()
  • realpath()
  • getpass()
 
The list goes on and on and on. Need I say more? Well, yes I do.
 
You see, even if you're not writing to any memory, you can still access memory you're not supposed to. C can't be bothered to keep track of the ends of strings; the end of a string is indicated by a null '\0' character. All fine, right? Well, some functions in your C library, strlen() for example, will just run off the end of a 'string' if it doesn't have a null in it. What if you're using a binary string? Careless programming this may be, but we all make mistakes, and so the language authors have to take some responsibility for being so intolerant.
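 
To make this concrete, here is a rough sketch contrasting the overflow-prone approach with the bounded library calls; the sixteen-byte buffer is an arbitrary choice for the example:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[16];
    const char *input = "a string that is much longer than sixteen bytes";

    /* strcpy(buf, input);   would write past the end of buf: undefined
     *                       behaviour, and the classic buffer overflow  */

    /* The bounded alternatives at least stop at the buffer's edge... */
    snprintf(buf, sizeof buf, "%s", input);     /* always '\0'-terminates */
    printf("%s\n", buf);

    /* ...but strncpy() has its own trap: if the source is too long it
     * does NOT write a terminating '\0', so you must add it yourself. */
    strncpy(buf, input, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    printf("%s\n", buf);
    return 0;
}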
 
No built-in Boolean type
 
If you don't believe me, just watch:
$ cat > test.c
int main(void)
{
    bool b;
    return 0;
}

$ gcc -ansi -pedantic -Wall -W test.c
test.c: In function 'main':
test.c:3: 'bool' undeclared (first use in this function)
Not until the 1999 ISO C standard were we finally able to use 'bool' as a data type. But guess what? It's implemented as a macro and one actually has to include a header file to be able to use it!
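 
For completeness, here is roughly what the C99 arrangement looks like; 'bool', 'true' and 'false' arrive via a header, as macros, not as keywords:
#include <stdio.h>
#include <stdbool.h>   /* without this line, 'bool' still doesn't exist */

int main(void)
{
    bool b = true;      /* 'bool' is a macro for the built-in type _Bool */
    if (b)
        puts("finally, a Boolean");
    return 0;
}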
 
High-level or low-level?
 
On the one hand, we have the fact that there is no string type, and direct memory management, implying a low-level language. On the other hand, we have a mass of library functions, a preprocessor and a plethora of other things which imply a high-level language. C tries to be both, and as a result spreads itself too thinly.
 
The great thing about this is that when C is lacking a genuinely useful feature, such as reasonably strong data typing, the excuse "C's a low-level language" can always be used, functioning as a perfect 'reason' for C to remain unhelpfully and fatally sparse.
 
The original intention for C was for it to be a portable assembly language for writing UNIX. Unfortunately, from its very inception C has had extra things packed into it which make it fail as an assembly language. Its kludgy strings are a good example. If it were at least portable these failings might be forgivable, but C is not portable.
 
Integer overflow without warning
 
Self explanatory. One minute you have a fifteen digit number, then try to double or triple it and boom! its value is suddenly -234891234890892 or something similar. Stupid, stupid, stupid. How hard would it have been to give a warning or overflow error or even reset the variable to zero?
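 
A rough sketch of the silence in question; the unsigned case wraps around by definition, while signed overflow is formally undefined behaviour and merely happens to wrap on most machines:
#include <stdio.h>
#include <limits.h>

int main(void)
{
    unsigned int u = UINT_MAX;      /* the largest value an unsigned int can hold */
    u = u + 1;                      /* wraps silently to 0: no warning, no error */
    printf("UINT_MAX + 1 = %u\n", u);

    /* For signed types the situation is worse: overflow is undefined
     * behaviour, so the compiler may do anything at all. On typical
     * two's-complement hardware it simply wraps to a large negative
     * number, which is the surprise described above. */
    return 0;
}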
 
This is widely known as bad practice. Most competent developers acknowledge that silently ignoring an error is a bad attitude to have; this is especially true for such a commonly used language as C.
 
Portability?!
 
Please. There are at least four official specifications of C I could name off the top of my head and no compiler has properly implemented all of them. They conflict, and they grow and grow. The problem isn't subsiding; it's increasing each day. New compilers and libraries are developed, and proprietary extensions keep creeping in. GNU C isn't the same as ANSI C isn't the same as K&R C isn't the same as Microsoft C isn't the same as POSIX C. C isn't portable; all kinds of machine architectures are totally different, and C can't properly adapt because it's so muttonheaded. It's trapped in The Unix Paradigm.
 
If it weren't for the C preprocessor, then it would be virtually impossible to get C to run on multiple families of processor hardware, or even just slightly differing operating systems. A programming language should not require a preprocessor just so that it can compile on FreeBSD, Linux and Windows alike.
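 
Here is a rough sketch of the preprocessor gymnastics being complained about; the macros tested are ones commonly predefined by compilers on those platforms, not anything the C standard guarantees:
#include <stdio.h>

int main(void)
{
#if defined(_WIN32)
    puts("compiled for Windows");
#elif defined(__linux__)
    puts("compiled for Linux");
#elif defined(__FreeBSD__)
    puts("compiled for FreeBSD");
#else
    puts("compiled for something else entirely");
#endif
    return 0;
}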
 
C is unable to adapt to new conditions for the sake of "backward compatibility", throwing away the opportunity to get rid of stupid, utterly useless and downright dangerous functions for a nonexistent goal. And yet C is growing new tentacles and unnecessary features because of idiots who think adding seven new functions to their C library will make life easier. It does not.
 
Even the C89 and C99 standards conflict with each other in ridiculous ways. Can you use the long long type or can't you? Is a certain constant defined by a preprocessor macro hidden deep, deep inside my C library? Is using a function in this particular way going to be undefined, or acceptable? What do you mean, getch() isn't a proper function but getchar() is?
 
The implications of this false 'portability'
 
Because C pretends to be portable, even professional C programmers can be caught out by hardware and an unforgiving programming language; almost anything like comparisons, character assignments, arithmetic, or string output can blow up spectacularly for no apparent reason because of endianness or because your particular processor treats all chars as unsigned or silly, subtle, deadly traps like that.
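 
One concrete example of the sort of trap meant here; whether plain 'char' is signed or unsigned is implementation-defined, so the very same comparison can go either way depending on the compiler and platform:
#include <stdio.h>

int main(void)
{
    char c = '\xFF';    /* the byte 0xFF stored in a plain char */

    /* If char is signed (as with most x86 compilers), c holds -1 and the
     * test fails; if char is unsigned (as on some other ABIs), c holds
     * 255 and the test succeeds. Same code, different answers. */
    if (c == 0xFF)
        puts("char is unsigned here");
    else
        puts("char is signed here");
    return 0;
}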
 
Archaic, unexplained conventions
 
In addition to the aforementioned problems, C also has various idiosyncrasies (invariably undocumented) which even some teachers of C are unaware of: "Don't use fflush(stdin), gets() is evil, main() must return an integer, main() can only take one of three sets of arguments, you mustn't cast the return value of malloc(), fileno() isn't an ANSI compliant function..." All these unnecessary and unmentioned quirks mean buggy code. Death by a thousand cuts. Ironic, when you consider that Kernighan thinks of Pascal in the same way, when C has just as many little gotchas that bleed you to death gradually and painfully.
 
Blaming The Programmer
 
Because C is pretty difficult to learn, and even harder to actually use without breaking something in a subtle yet horrific way, it's assumed that anything which goes wrong is the programmer's fault. If your program segfaults, it's your fault. If it crashes, mysteriously returning 184 with no error message, it's your fault. When a single condition you happened to forget about while coding screws up, it's your fault.
 
Obviously the programmer has to shoulder most of the responsibility for a broken program. But as we've already seen, C positively tries to make the programmer fail. This increases the failure rate and yet for some reason we don't blame the language when yet another buffer overflow is discovered. C programmers try to cover up C's inconsistencies and inadequacies by creating a culture of 'tua culpa'; if something's wrong, it's your fault, not that of the compiler, linker, assembler, specification, documentation, or hardware.
 
Compilers have to take some of the blame. Two reasons. The first is that most compilers have proprietary extensions built into them. Let me remind you that half of the point of using C is that it should be portable and compile anywhere. Adding extensions violates the original spirit of C and removes one of its advantages (albeit an already diminished advantage).
 
The other (and perhaps more pressing) reason is the lack of anything beyond minimal error checking which C compilers do. For every ten types of errors your compiler catches, another fifty will slip through. Beyond variable type and syntax checking the compiler does not look for anything else. All it can do is give warnings on unusual behaviour, though these warnings are often spurious. On the other hand, a single error can cause a ridiculous cascade, or make the compiler fall over and die because of a misplaced semicolon, or, more accurately and incriminatingly, a badly constructed parser and grammar. And yet, despite this, it's your fault.
 
To quote The Unix Haters' Handbook:
 
"If you make even a small omission, like a single semicolon, a C compiler tends to get so confused and annoyed that it bursts into tears and complains that it just can't compile the rest of the file since one missing semicolon has thrown it off so much."
 
So C compilers may well give literally hundreds of errors stating that half of your code is wrong if you miss out a single semicolon. Can it get worse? Of course it can! This is C!
 
You see, a compiler will often not deluge you with error information when compiling. Sometimes it will give you no warning whatsoever even if you write totally foolish code like this:
#include <stdio.h>

int main()
{
    char *p;     /* never initialised: points nowhere in particular */
    puts(p);     /* reads memory through the wild pointer */
    return 0;
}
When we compile this with our 'trusty' compiler gcc, we get no errors or warnings at all. Even when using the '-W' and '-Wall' flags to make it watch out for dangerous code it says nothing.
 
In fact, no warning is ever given unless you try to optimise the program with a '-O' flag. But what if you never optimise your program? Well, you now have a dangerous program. And unless you check the code again you may well never notice that error.
 
What this section (and entire document) is really about is the sheer unfriendliness of C and how it is as if it takes great pains to be as difficult to use as possible. It is flexible in the wrong way; it can do many, many different things, but this makes it impossible to do any single thing with it.
 
Trapped in the 1970s
 
C is over thirty years old, and it shows. It lacks features that modern languages have such as exception handling, many useful data types, function overloading, optional function arguments and garbage collection. This is hardly surprising considering that it was constructed from an assembler language with just one data type on a computer from 1970.
 
C was designed for the computer and programmer of the 1970s, sacrificing stability and programmer time for the sake of memory. Despite the fact that the most recent standard is just half a decade old, C has not been updated to take advantage of increased memory and processor power to implement such things as automatic memory management. What for? The illusion of backward compatibility and portability.
 
Yet more missing data types
 
Hash tables. Why was this so difficult to implement? C is intended for the programming of things like kernels and system utilities, which frequently use hash tables. And yet it didn't occur to C's creators that maybe including hash tables as a type of array might be a good idea when writing UNIX? Perl has them. PHP has them. With C you have to fake hash tables, and even then it doesn't really work at all.
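 
To show what 'faking it' involves, here is a rough, hypothetical sketch of a hand-rolled string-to-integer table: fixed bucket count, linear probing, no resizing, no deletion, and every name in it invented for this example:
#include <stdio.h>
#include <string.h>

#define NBUCKETS 64

struct entry {
    const char *key;    /* NULL means the slot is empty */
    int value;
};

static struct entry table[NBUCKETS];

static unsigned hash(const char *s)
{
    unsigned h = 5381;                      /* djb2-style string hash */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

static void put(const char *key, int value)
{
    unsigned i = hash(key);
    while (table[i].key && strcmp(table[i].key, key) != 0)
        i = (i + 1) % NBUCKETS;             /* linear probing; assumes the table never fills */
    table[i].key = key;
    table[i].value = value;
}

static int *get(const char *key)
{
    unsigned i = hash(key);
    while (table[i].key) {
        if (strcmp(table[i].key, key) == 0)
            return &table[i].value;
        i = (i + 1) % NBUCKETS;
    }
    return NULL;                            /* not found */
}

int main(void)
{
    put("apples", 3);
    put("oranges", 7);
    printf("apples = %d\n", *get("apples"));
    return 0;
}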
 
Multidimensional arrays. Before you tell me that you can do stuff like int multiarray[50][50][50] I think that I should point out that that's an array of arrays of arrays. Different thing. Especially when you consider that you can also use it as a bunch of pointers. C programmers call this "flexibility". Others call it "redundancy", or, more accurately, "mess".
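 
A quick sketch of the distinction being drawn; a real multidimensional array and a bunch of pointers are laid out completely differently, even though both can be indexed with [i][j]:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* One contiguous block of 2*3 ints; the compiler does the arithmetic. */
    int grid[2][3] = { {1, 2, 3}, {4, 5, 6} };

    /* The "faked" version: an array of pointers, each to a separate row. */
    int *rows[2];
    rows[0] = malloc(3 * sizeof *rows[0]);
    rows[1] = malloc(3 * sizeof *rows[1]);
    if (rows[0] == NULL || rows[1] == NULL)
        return 1;
    rows[1][2] = 6;

    printf("%d %d\n", grid[1][2], rows[1][2]);   /* same syntax, different machinery */
    free(rows[0]);
    free(rows[1]);
    return 0;
}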
 
Complex numbers. They may be in C99, but how many compilers support that? It's not exactly difficult to get your head round the concept of complex numbers, so why weren't they included in the first place? Were complex numbers not discovered back in 1989?
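 
For what it's worth, the C99 version looks roughly like this; whether your compiler actually supports it is precisely the complaint:
#include <stdio.h>
#include <complex.h>

int main(void)
{
    /* 'complex' is a macro for the C99 keyword _Complex; I is the imaginary unit. */
    double complex z = 3.0 + 4.0 * I;
    printf("|z| = %f\n", cabs(z));      /* prints 5.000000 */
    return 0;
}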
 
Binary strings. It wouldn't have been that hard just to make a compulsory struct with a mere two members: a char * for the string of bytes and a size_t for the length of the string. Binary strings have always been around on Unix, so why wasn't C more accommodating?
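 
The struct being wished for is hardly rocket science; here is a rough, hypothetical sketch (nothing like it is in the standard, which is the point):
#include <stdio.h>
#include <string.h>

/* A hypothetical counted "binary string": the bytes plus their length,
 * so embedded '\0' bytes are no longer a problem. */
struct bstring {
    char   *data;
    size_t  len;
};

int main(void)
{
    char raw[] = { 'a', '\0', 'b', 'c' };           /* contains an embedded '\0' */
    struct bstring s = { raw, sizeof raw };

    printf("length: %zu\n", s.len);                 /* 4, not 1 */
    printf("strlen would say: %zu\n", strlen(raw)); /* 1: stops at the '\0' */
    return 0;
}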
 
Library size
 
The actual core of C is admirably small, even if some of the syntax isn't the most efficient or readable (case in point: the '? :' conditional operator). One thing that is bloated is the C library. The number of functions in a full C library which complies with all significant standards runs into four digits. There's a great deal of redundancy, and code which really shouldn't be there.
 
This has knock-on effects, such as the large number of configuration constants which are defined by the preprocessor (which shouldn't be necessary), the size of the libraries (the GNU C library almost fills a floppy disk, and its documentation fills three) and inconsistently named groups of functions in addition to duplication.
 
For example, a function for converting a string to a long integer is atol(). One can also use strtol() for exactly the same thing. Boom - instant redundancy. Worse still, both functions are included in the C99, POSIX and SUSv3 standards!
 
Can it get worse? Of course it can! This is C!
 
As a result it's only logical that there's an equivalent pair of atod() and strtod() functions for converting a string to a double. As you've probably guessed, this isn't true. They are called atof() and strtod(). This is very foolish. There are yet more examples scattered through the standard C library like a dog's smelly surprises in a park.
 
The Single Unix Specification version three specifies 1,123 functions which must be available to the C programmer of a compliant system. We already know about the redundancies and unnecessary functions, but across how many header files are these 1,123 functions spread? 62. That's right, on average a C library header defines approximately eighteen functions. Even if you only need maybe one function from each of, say, five headers (a common occurrence) you may well wind up including 90, 100 or even 150 function declarations you will never need. Bloat, bloat, bloat. Python has the right idea; its import statement lets you pull in exactly the functions (and global variables!) you need from each module if you prefer. But C? Oh, no.
 
Specifying structure members
 
Why does this need two operators? Why do I have to pick between '.' and '->' for a ridiculous, arbitrary reason? Oh, I forgot; it's just yet another of C's gotchas.
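 
For anyone who hasn't tripped over it yet: '.' is for a structure, '->' is for a pointer to one, and a small sketch shows the arbitrariness of the distinction:
#include <stdio.h>

struct point { int x, y; };

int main(void)
{
    struct point p = { 1, 2 };
    struct point *pp = &p;

    printf("%d %d\n", p.x, pp->y);      /* value uses '.', pointer uses '->' */
    printf("%d\n", (*pp).x);            /* '->' is just shorthand for this   */
    return 0;
}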
 
Limited syntax
 
A couple of examples should illustrate what I mean quite nicely. If you've ever programmed in PHP for a substantial period of time, you're probably aware of the 'break' keyword. You can use it to break out from nested loops of arbitrary depth by using it with an integer, such as "break 3"; this would break out of three levels of loops.
 
There is no way of doing this in C. If you want to break out from a series of nested for or while loops then you have to use a goto. This is what is known as a crude hack.
In addition to this, there is no way to compare any non-numerical data type using a switch statement. C does not allow you to use switch and case statements for strings. One must use several variables to iterate through an array of case strings and compare them to the given string with strcmp(). This reduces performance and is just yet another hack.
 
In fact, this is an example of gratuitous library functions running wild once again. Even comparing one string to another requires use of the strcmp() function.
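 
Here, purely for illustration, are the two workarounds just described: a goto to escape nested loops, and a chain of strcmp() calls standing in for a switch on strings (the loop bounds and command names are arbitrary):
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Breaking out of nested loops: there is no "break 2", so a goto it is. */
    for (int i = 0; i < 10; i++) {
        for (int j = 0; j < 10; j++) {
            if (i * j == 42)
                goto done;              /* the "crude hack" in question */
        }
    }
done:
    puts("escaped the nested loops");

    /* switch on a string: not allowed, so chain strcmp() calls instead. */
    const char *command = "stop";
    if (strcmp(command, "start") == 0)
        puts("starting");
    else if (strcmp(command, "stop") == 0)
        puts("stopping");
    else
        puts("unknown command");
    return 0;
}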
 
Flushing standard I/O
 
A simple microcosm of the "you can do this, but not that" philosophy of C; one has to do two different things to flush standard input and standard output.
 
To flush the standard output stream, one can use fflush() (defined by <stdio.h>). One doesn't usually need to do this after every bit of text is printed, but it's nice to know it's there, right?
 
Unfortunately, one cannot use fflush() to flush the contents of standard input. The ISO C standard explicitly leaves fflush() on an input stream undefined, but this is so illogical that even textbook authors sometimes mistakenly use fflush(stdin) in examples, and some compilers won't bother to warn you about it. One shouldn't even have to flush standard input; you ask for a character with getchar(), and the program should just read in the first character given and disregard the rest. But I digress...
 
There is no 'real' way to flush standard input up to, say, the end of a line. Instead one has to use a kludge like so:
int c;
do {
    errno = 0;
    c = getchar();

    if (errno) {
        fprintf(stderr,
                "Error flushing standard input buffer: %s\n",
                strerror(errno));
    }
} while ((c != '\n') && (!feof(stdin)));
That's right; you need a variable, a looping construct, two library functions and several lines of error-handling code just to flush the standard input buffer.
 
Inconsistent error handling
 
A seasoned C programmer will be able to tell what I'm talking about just by reading the title of this section. There are many incompatible ways in which a C library function indicates that an error has occurred:
  • Returning zero.
  • Returning nonzero.
  • Returning a NULL pointer.
  • Setting errno.
  • Requiring a call to another function.
  • Outputting a diagnostic message to the user.
 
Some functions may actually use up to three of these methods. But the thing is that none of these are compatible with each other and error handling does not occur automatically; every time a C programmer uses a library function they must check manually for an error. This bloats code which would otherwise be perfectly readable without if-blocks for error handling and variables to keep track of errors. In a large software project one must write a section of code for error handling hundreds of times. If you forget, something can go horribly wrong. For example, if you don't check the return value of malloc() you may accidentally try to use a null pointer. Oops...
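 
A brief sketch of the ceremony this imposes; three calls, three different conventions, each checked by hand:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

int main(void)
{
    /* Convention 1: failure signalled by a NULL return value. */
    char *buf = malloc(64);
    if (buf == NULL) {
        fprintf(stderr, "malloc failed\n");
        return 1;
    }

    /* Convention 2: NULL return, with the actual reason stashed in errno. */
    FILE *fp = fopen("no-such-file", "r");
    if (fp == NULL)
        fprintf(stderr, "fopen failed: %s\n", strerror(errno));

    /* Convention 3: a perfectly valid-looking value comes back, and you
     * must inspect errno yourself to notice that the call overflowed. */
    errno = 0;
    long n = strtol("999999999999999999999999", NULL, 10);
    if (errno == ERANGE)
        fprintf(stderr, "strtol overflowed (clamped to %ld)\n", n);

    free(buf);
    return 0;
}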
 
Commutative array subscripting
 
"Hey, Thompson, how can I make C's syntax even more obfuscated and difficult to understand?"
 
"How about you allow 5[var] to mean the same as var[5]?"
 
"Wow; unnecessary and confusing syntactic idiocy! Thanks!"
 
"You're welcome, Dennis."
 
Variadic anonymous macros
 
In case you don't understand what variadic anonymous macros are, they're macros (i.e. pseudofunctions defined by the preprocessor) which can take a variable number of arguments. Sounds like a simple thing to implement. I mean, it's all done by the preprocessor, right? And besides, you can define proper functions with variable numbers of arguments even in the original K&R C, right?
 
In that case, why can't I do:
 
#define error(...) fprintf(stderr, __VA_ARGS__)
 
without getting a warning from GCC?
 
warning: anonymous variadic macros were introduced in C99
 
That's right, folks. Not until late 1999, thirty years after development of the C programming language began, were we allowed to do such a simple task with the preprocessor.
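 
For reference, the C99-blessed version of the macro looks like this; compile in C99 mode (or later) and the warning goes away:
#include <stdio.h>

/* The C99 way: the arguments are collected into __VA_ARGS__. */
#define error(...) fprintf(stderr, __VA_ARGS__)

int main(void)
{
    error("something went wrong on line %d\n", 42);
    return 0;
}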
 
The C standards don't make sense
 
Only one simple quote from the ANSI C standard - nay, a single footnote - is needed to demonstrate the immense idiocy of the whole thing. Ladies, gentlemen, and everyone else, I present to you...footnote 82:
 
All whitespace is equivalent except in certain situations.
 
I'd make a cutting remark about this, but it'd be too easy.
 
Too much preprocessor power
 
Rather foolishly, half of the actual C language is reimplemented in the preprocessor. (This should be a concern from the start; redundancy usually indicates an underlying problem.) We can #define fake variables, fake conditions with #ifdef and #ifndef, and look, there's even #if, #endif and the rest of the crew! How useful!
 
Erm, sorry, no.
 
Preprocessors are a good idea for a language like C. As has already been noted, C is not portable. Preprocessors are vital for bridging the gap between different computer architectures and libraries, and for allowing a program to compile on multiple machines without having to rely on external programs. The #define statement, in this case, can be used perfectly validly to set 'flags' that a program can use to determine all sorts of things: which C standard is being used, which library, who wrote it, and so on and so forth.
 
Now, the situation isn't as bad as for C++. In C++, the preprocessor is so packed with unnecessary rubbish that one can actually use it to calculate an arbitrary series of Fibonacci numbers at compile-time. However, C comes dangerously close; it allows the programmer to define fake global variables with wacky values which would not otherwise be proper code, and then compare values of these variables. Why? It's not needed; the C language of the Plan 9 operating system doesn't let you play around with preprocessor definitions like this. It's all just bloat.
 
"But what about when we want to use a constant throughout a program? We don't want to have to go through the program changing the value each time we want to change the constant!" some may complain. Well, there's these things called global variables. And there's this keyword, const. It makes a constant variable. Do you see where I'm going with this?
 
You can do search and replace without the preprocessor, too. In fact, they were able to do it back in the seventies on the very first versions of Unix. They called it sed. Need something more like cpp? Use m4 and stop complaining. It's the Unix way!