Friday, September 4, 2020

Massacring C Pointers (2018)

I'm taking a break from debugging books to talk about a calamitous shitshow of textbook writing: Mastering C Pointers: Tools for Programming Power, by Robert J. Traister.

I learned of the book through a talk by Brian Kernighan where he refers to the book as probably “the worst C programming textbook ever written.” He doesn't name it, but with some help I was able to track down his obliquely accurate reference.

This book has become my white whale. Since I started reading debugging books, and especially now that I'm digging through older ones, I find bits of advice that simply don't work today. While some of it could be construed as useless or idiotic, I've always found the authors come from a position of earnestness, attempting to draw the best conclusions based on decent principles and what they knew at the time they wrote it. In some cases they may not have known much, but they're honestly and humbly trying to impart some wisdom.

When Kernighan put up the following example, I saw what seemed to be the opposite of that.

char *combine(s, t)
char *s, *t;
{

      int x, y;
      char r[100];

      strcpy(r, s);
      y = strlen(r);
      for (x = y; *t != '\0'; ++x)
           r[x] = *t++;

      r[x] = '\0';

      return(r);

}

This program (formatting preserved) is taken from the first edition of the book (p.146). It's horrific (Kernighan calls it “malpractice”). It does not exhibit the genuineness I have seen with, say, books from the late 1970s on how to debug BASIC programs. I didn't know what it was. Deceit? Laziness? Unremitting ignorance? What is the mindset of someone who writes this, presumably thinking it's a good idea? Can the whole book be that bad? Kernighan said it was. I had to know.

The book has two editions: the first was published in 1990, the second in 1993. The fact that there are two editions piqued my curiosity even more. It sold enough to make another version? Is that horrible example corrected? I obtained a copy of each and read them.

Reviewing the book is pointless. Kernighan was right: it's garbage. And the second edition only makes things worse. Teaching from either one would be a breach of ethics. If you follow Traister’s coding practices, even adjusted for today's standards, you are guaranteed to create defects and vicious, latent bugs. A subtly pernicious aspect of the book is the casual tone of the writing. It’s informal enough that if you don’t really know much about C, what he says sort of makes sense, despite the sloppy terminology and mixed, inaccurate metaphors.

Any trained programmer will recognize the lessons as worthless. His terminology is all over the map and typically inaccurate, if not plainly wrong. Expressions “return a value.” Values are “handed” into locations. Constants are “written directly into the program.” A union is a “specialized pointer.” The terminology isn’t even consistent. Micro-optimizations are stressed at all times, and program efficiency is valued over comprehension. He can’t even define a pointer correctly: “A pointer is a special variable that returns the address of a memory location.” It does not take long to realize Traister has no idea what he’s talking about.

Although I’m not going to go over the content (that would take far too long since there’s nothing redeeming there), I did take extensive notes. You can read them, if you are so inclined.

I must, however, take a moment to single out the code. It is universally bad and much of it is simply wrong. (Imagine trying to learn programming principles from a book that contains a large number of programs that don’t even compile.) I’ve transcribed some of the programs and annotated them with comments so that you can get a taste of how inept Traister is as a C programmer. One thing to keep in mind (both for the programs and the notes) is that C89/C90 was new at the time and that the code was written on (and for) MS-DOS systems of the late 1980s/early 1990s. Things were a bit different then.

Enough about the material. I want to explore the question of how something so wrong even got written. It’s not that everything in the book is wrong, but it feels like when it’s right, it’s right by accident.

Traister has written other books, some about electronics and some about programming, one called Going from BASIC to C. In Mastering C Pointers, he talks about a product he created called CBREEZE that converts BASIC code to C. Throughout the book he makes passing, roundabout references to BASIC and uses terminology that suggests he’s written a lot of BASIC code. For example, there is a whole chapter on using pointers to access memory, where reading and writing memory is instead called “peeking” and “poking”, based on the PEEK and POKE instructions in BASIC. He also says that it took him a couple of tries to learn C coming from BASIC. In short, I’m convinced he’s knowledgeable about BASIC and has worked on writing software for small, electronic devices.

Why is this important? As I read the book (and if you read my notes, you know where this is going) I started to notice something in the wording and tone. The further I progressed the more I became convinced of it, and I think it explains how he managed to mangle the explanation of C pointers so badly.

I don’t think he understands the call stack.

My argument for this interpretation requires a little knowledge of BASIC and embedded devices.

With BASIC, the key thing to know about most implementations at the time is that there were no functions and no scope aside from the global scope. The closest thing to a function in BASIC is the GOSUB command. The GOSUB command jumps to a line and executes code until it gets to a RETURN statement, where control is transferred back to the line following the GOSUB command. Within a GOSUB you can jump somewhere else with another GOSUB. The control follows a stack principle, but no arguments are passed. GOSUB routines are a way to factor out common code, but that common code has to work on global variables. (And yes, it’s as terrible as it sounds.)

Now, consider the case of simple electronic devices. Even today some embedded devices, usually programmed using C, do not have a call stack that dynamically allocates space for automatic variables. There simply isn’t enough memory for it. Instead, the compiler lays out memory such that each function’s local variables have fixed memory addresses (a “compiled stack” model). The only stack you have is for return addresses and it is probably handled in hardware.

Suppose you’re used to writing BASIC for small memory electronic devices and you learn about C. You read about pointers and realize something: it’s possible to write a subroutine that can change variables without knowing their names. It’s manna from heaven! You don’t have to devote global variables to being the “parameters” of your subroutines anymore. Life is great.

This is the mindset I think Traister had and never got past. In the book there is one fleeting mention of the stack in reference to excessive (automatic) memory allocation. (On MS-DOS, if the space for local variables is too large in the program, it might not compile.) He consistently describes variables as having “exclusive” addresses in the program. His writing about pointers suggests that he thinks, for each function, space is set aside to hold the local variables for the duration of the program, but you can only access them when inside the function in which they were declared. So pointers are really powerful because you can provide this address to another function and it can change the value using only a parameter.

Further evidence of his lack of understanding is that he frequently cites ridiculous space micro-optimizations within functions, such as avoiding int for index variables where possible. Another one, mentioned often, concerns local char arrays that have a fixed size. There are good reasons not to use them, but his are not among them. His admonishment is that they waste space. Technically, that is true, but they don't exist until they're on the stack. And he never talks about global or file-scope variables. He only refers to locals with “exclusive” addresses “set aside” for variables.

This interpretation runs into some problems once you start asking how functions with malloc will work, but it's worth pointing out that there is almost no discussion about memory management. In a book devoted to C pointers, that's a toxic mix of gross negligence and incompetence. There is literally one short paragraph devoted to talking about the free function—and it's characterized as a “side note.”

Another sticking point in this interpretation is Traister’s incomprehensible approach to writing functions that take a variable number of arguments. He does this by passing an arbitrary number of arguments to a function (the first being the number of arguments) and accessing them using offsets from the address of the first argument. This suggests he has some idea about parameters being passed in a dynamic fashion, but it is so spectacularly wrong you’re left wondering if he even tried his programs out before publishing them.

Honestly, this is the most generous interpretation of the text I could come up with, and it still paints a terrible picture. Occam's Razor suggests that Traister is just clueless. But like analyzing a terrible movie that somehow gets made, it's more fun to reason through the “behind the scenes” parts.

Given the ineptness of the book, you'd think it was self-published. You would be wrong. It was published through Academic Press, which was a division of Harcourt Brace Jovanovich at the time, but is now an “imprint of Elsevier.”

In the preface of the second edition it says that the first edition was reviewed “by a professional C programmer hired by the publisher.” That programmer said it should not be published. That programmer was right, but the publisher went ahead and published it anyway.

Since there was a second edition, the assumption is that the book sold well. According to WorldCat, Mastering C Pointers is in at least 242 libraries, most appearing to be the first edition, but I didn’t check them all. It claims to be one of the first books to tackle the subject of pointers in C, which is often a sticking point for novice programmers. The lack of material in this area at the time is probably why it sold. I could not track down a review of this book anywhere (and yes, I looked through scans of Byte Magazine et al.), but I did find reviews for other C books in the 1980s and what I found suggested that pointers were not covered well, if at all. In other words, like many books—and tech books in particular—it sold because of its title and good timing.

If you browse search results for other books by Traister you’ll find a lot of questionable-sounding titles: Making money with your microcomputer (1982), Leaping from BASIC to C++ (1994), Learn C in Two Weeks with Run/C and CBreeze (1987). The breadth of topics covered in his works seems exhausting: Beginner’s Guide to Reading Schematics (1991), Astronomy and Telescopes: A Beginner’s Handbook (1983), Make your own professional home video recordings (1982), Cave Exploring (1983), just to name a few. You start to wonder if this is actually the same person who wrote all of them. Then you start to wonder if maybe all these books just touch on the topics, and are churned out mainly to try and make a buck. I had a hard time finding a review for any of them (the best resource was archive.org). Practically all I could find about his books were ads for them, and even then there weren’t many, which is odd given the apparent volume of output from him. I did manage to find two reviews for his book Programming in C for the Microcomputer User (1984): one was favourable (80 Micro, Nov. 1984), the other was not (Practical Computing, Oct. 1985). The only other book I can find of his that seems to have some staying power is the schematics book, which went to a third edition.

Ultimately, the aspect of Mastering C Pointers that truly disturbs me is that there are probably a fair number of people who actually learned C pointers from it. There’s no way to know how much of an impact this book had on programmers in the 1990s, but given the number of copies in libraries it must have had some. It’s hard not to wonder how much of the terrible C code that has made its way into production can be attributed to the awful advice in Traister’s travesty of a text.

Thanks to John Regehr for helping me track down the book in the first place. The title was stolen (er, inspired) by one of his tweets.



from Hacker News https://ift.tt/3h22qK1
