Coroutines in C (by Simon Tatham)

Introduction

Structuring a large program is always a difficult job. One of the particular problems that often comes up is this: if you have a piece of code producing data, and another piece of code consuming it, which should be the caller and which should be the callee?

Here is a very simple piece of run-length decompression code, and an equally simple piece of parser code:

    /* Decompression code */
    while (1) {
        c = getchar();
        if (c == EOF)
            break;
        if (c == 0xFF) {
            len = getchar();
            c = getchar();
            while (len--)
                emit(c);
        } else
            emit(c);
    }
    emit(EOF);

    /* Parser code */
    while (1) {
        c = getchar();
        if (c == EOF)
            break;
        if (isalpha(c)) {
            do {
                add_to_token(c);
                c = getchar();
            } while (isalpha(c));
            got_token(WORD);
        }
        add_to_token(c);
        got_token(PUNCT);
    }

Each of these code fragments is very simple, and easy to read and understand. One produces a character at a time by calling emit(); the other consumes a character at a time by calling getchar(). If only the calls to emit() and the calls to getchar() could be made to feed data to each other, it would be simple to connect the two fragments together so that the output from the decompressor went straight to the parser.

In many modern operating systems, you could do this using pipes between two processes or two threads. emit() in the decompressor writes to a pipe, and getchar() in the parser reads from the other end of the same pipe. Simple and robust, but also heavyweight and not portable. Typically you don't want to have to divide your program into threads for a task this simple.

In this article I offer a creative solution to this sort of structure problem.

Rewriting

The conventional answer is to rewrite one of the ends of the communication channel so that it's a function that can be called. Here's an example of what that might mean for each of the example fragments.

    int decompressor(void) {
        static int repchar;
        static int replen;
        int c;
        if (replen > 0) {
            replen--;
            return repchar;
        }
        c = getchar();
        if (c == EOF)
            return EOF;
        if (c == 0xFF) {
            replen = getchar();
            repchar = getchar();
            replen--;
            return repchar;
        } else
            return c;
    }

    void parser(int c) {
        static enum {
            START, IN_WORD
        } state;
        switch (state) {
            case IN_WORD:
            if (isalpha(c)) {
                add_to_token(c);
                return;
            }
            got_token(WORD);
            state = START;
            /* fall through */
            case START:
            add_to_token(c);
            if (isalpha(c))
                state = IN_WORD;
            else
                got_token(PUNCT);
            break;
        }
    }

Of course you don't have to rewrite both of them; just one will do. If you rewrite the decompressor in the form shown, so that it returns one character every time it's called, then the original parser code can replace calls to getchar() with calls to decompressor(), and the program will be happy. Conversely, if you rewrite the parser in the form shown, so that it is called once for every input character, then the original decompression code can call parser() instead of emit() with no problems. You would only want to rewrite both functions as callees if you were a glutton for punishment.
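As a minimal sketch of the first option, the rewritten decompressor below is driven by an ordinary caller loop. To keep the example self-contained it reads from an in-memory buffer instead of standard input; the names input, input_len, input_pos, next_byte() and decompress_all() are invented for this illustration and are not part of the original fragments.

```c
#include <assert.h>
#include <stddef.h>  /* size_t */
#include <stdio.h>   /* EOF */

/* Hypothetical input source standing in for getchar(). */
static const unsigned char *input;
static size_t input_len, input_pos;

static int next_byte(void) {
    return input_pos < input_len ? input[input_pos++] : EOF;
}

/* Callee form of the decompressor: one output character per call. */
static int decompressor(void) {
    static int repchar;
    static int replen;
    int c;

    if (replen > 0) {           /* still inside a run */
        replen--;
        return repchar;
    }
    c = next_byte();
    if (c == EOF)
        return EOF;
    if (c == 0xFF) {            /* run marker: length, then character */
        replen = next_byte();
        repchar = next_byte();
        replen--;               /* this sketch assumes a run length >= 1 */
        return repchar;
    }
    return c;
}

/* Caller: pulls characters until EOF or until out is full.
 * Returns the number of characters stored (out is NUL-terminated). */
static size_t decompress_all(const unsigned char *src, size_t n,
                             char *out, size_t cap) {
    size_t used = 0;
    int c;

    assert(cap > 0);            /* room for the terminating NUL */
    input = src;
    input_len = n;
    input_pos = 0;
    while (used + 1 < cap && (c = decompressor()) != EOF)
        out[used++] = (char)c;
    out[used] = '\0';
    return used;
}
```

The caller keeps its natural loop shape; all the awkwardness is confined to the callee, which has to remember in replen and repchar where it was in the middle of a run.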

And that's the point, really. Both these rewritten functions are thoroughly ugly compared to their originals. Both of the processes taking place here are easier to read when written as a caller, not as a callee. Try to deduce the grammar recognised by the parser, or the compressed data format understood by the decompressor, just by reading the code, and you will find that both the originals are clear and both the rewritten forms are less clear. It would be much nicer if we didn't have to turn either piece of code inside out.

Knuth's coroutines

In The Art of Computer Programming, Donald Knuth presents a solution to this sort of problem. His answer is to throw away the stack concept completely. Stop thinking of one process as the caller and the other as the callee, and start thinking of them as cooperating equals.

In practical terms: replace the traditional "call" primitive with a slightly different one. The new "call" will save the return address somewhere other than on the stack, and will then jump to a location specified in another saved return address. So each time the decompressor emits another character, it saves its program counter and jumps to the last known location within the parser - and each time the parser needs another character, it saves its own program counter and jumps to the location saved by the decompressor. Control shuttles back and forth between the two routines exactly as often as necessary.

This is very nice in theory, but in practice you can only do it in assembly language, because no commonly used high level language supports the coroutine call primitive. Languages like C depend utterly on their stack-based structure, so whenever control passes from any function to any other, one must be the caller and the other must be the callee. So if you want to write portable code, this technique is at least as impractical as the Unix pipe solution.

Stack-based coroutines

So what we would really like is the ability to mimic Knuth's coroutine call primitive in C. We must accept that in reality, at the C level, one function will be caller and the other will be callee. In the caller, we have no problem; we code the original algorithm, pretty much exactly as written, and whenever it has (or needs) a character it calls the other function.

The callee has all the problems. For our callee, we want a function which has a "return and continue" operation: return from the function, and next time it is called, resume control from just after the return statement. For example, we would like to be able to write a function that says

    int function(void) {
        int i;
        for (i = 0; i < 10; i++)
            return i;   /* won't work, but wouldn't it be nice */
    }

and have ten successive calls to the function return the numbers 0 through 9.

How can we implement this? Well, we can transfer control to an arbitrary point in the function using a goto statement. So if we use a state variable, we could do this:

    int function(void) {
        static int i, state = 0;
        switch (state) {
            case 0: goto LABEL0;
            case 1: goto LABEL1;
        }
        LABEL0: /* start of function */
        for (i = 0; i < 10; i++) {
            state = 1; /* so we will come back to LABEL1 */
            return i;
            LABEL1:; /* resume control straight after the return */
        }
    }

This method works. We have a set of labels at the points where we might need to resume control: one at the start, and one just after each return statement. We have a state variable, preserved between calls to the function, which tells us which label we need to resume control at next. Before any return, we update the state variable to point at the right label; after any call, we do a switch on the value of the variable to find out where to jump to.

It's still ugly, though. The worst part of it is that the set of labels must be maintained manually, and must be consistent between the function body and the initial switch statement. Every time we add a new return statement, we must invent a new label name and add it to the list in the switch; every time we remove a return statement, we must remove its corresponding label. We've just increased our maintenance workload by a factor of two.
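To make that bookkeeping cost concrete, here is a sketch of the same goto technique with two return points inside the loop plus a final "finished" state. The function name alternate(), the values it yields and the sentinel -100 are all invented for illustration. Notice that every return needs its own label and its own line in the dispatch switch.

```c
/* Yields 1, -1, 2, -2, 3, -3 and then -100 on every later call. */
int alternate(void) {
    static int i, state = 0;

    switch (state) {
        case 0: goto LABEL0;
        case 1: goto LABEL1;
        case 2: goto LABEL2;
        case 3: goto LABEL3;
    }
    LABEL0: /* start of function */
    for (i = 1; i <= 3; i++) {
        state = 1; /* so we will come back to LABEL1 */
        return i;
        LABEL1:; /* resume after the first return */
        state = 2; /* so we will come back to LABEL2 */
        return -i;
        LABEL2:; /* resume after the second return */
    }
    state = 3; /* the sequence is exhausted; stay here from now on */
    LABEL3:;
    return -100;
}
```

Adding the second return meant touching three places: the new state assignment, the new label, and the new case in the switch.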

Duff's device

The famous "Duff's device" in C makes use of the fact that a case statement is still legal within a sub-block of its matching switch statement. Tom Duff used this for an optimised output loop:

    switch (count % 8) {
        case 0:        do {  *to = *from++;
        case 7:              *to = *from++;
        case 6:              *to = *from++;
        case 5:              *to = *from++;
        case 4:              *to = *from++;
        case 3:              *to = *from++;
        case 2:              *to = *from++;
        case 1:              *to = *from++;
                       } while ((count -= 8) > 0);
    }
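In the loop above, to is never incremented; that fits an output loop in which every byte goes to the same destination, such as a device register. As a runnable sketch, the variant below copies memory to memory instead, so it increments to as well, and it guards against a zero count. The function name duff_copy() is an assumption made for this example.

```c
#include <assert.h>

/* Copy count bytes from 'from' to 'to', unrolled eight ways.
 * The switch jumps into the middle of the do-while body to handle
 * the count % 8 leftover bytes on the first pass. */
void duff_copy(char *to, const char *from, int count) {
    assert(count >= 0);
    if (count == 0)
        return;   /* the bare device would copy 8 bytes here */
    switch (count % 8) {
        case 0: do { *to++ = *from++;
        case 7:      *to++ = *from++;
        case 6:      *to++ = *from++;
        case 5:      *to++ = *from++;
        case 4:      *to++ = *from++;
        case 3:      *to++ = *from++;
        case 2:      *to++ = *from++;
        case 1:      *to++ = *from++;
                } while ((count -= 8) > 0);
    }
}
```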

We can put it to a slightly different use in the coroutine trick. Instead of using a switch statement to decide which goto statement to execute, we can use the switch statement to perform the jump itself:

    int function(void) {
        static int i, state = 0;
        switch (state) {
            case 0: /* start of function */
            for (i = 0; i < 10; i++) {
                state = 1; /* so we will come back to "case 1" */
                return i;
                case 1:; /* resume control straight after the return */
            }
        }
    }

Now this is looking promising. All we have to do now is construct a few well chosen macros, and we can hide the gory details in something plausible-looking:

    #define crBegin static int state=0; switch(state) { case 0:
    #define crReturn(i,x) do { state=i; return x; case i:; } while (0)
    #define crFinish }

    int function(void) {
        static int i;
        crBegin;
        for (i = 0; i < 10; i++)
            crReturn(1, i);
        crFinish;
    }

(note the use of do ... while(0) to ensure that crReturn does not need braces around it when it comes directly between if and else)
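A small sketch of the situation that note has in mind: crReturn used as the unbraced body of an if that has an else. The macros are repeated from above so the block stands alone; the function sign_stream() and the values it yields are invented for illustration, and the trailing return 0 is an addition so that calls after the loop ends still return a defined value.

```c
#define crBegin static int state=0; switch(state) { case 0:
#define crReturn(i,x) do { state=i; return x; case i:; } while (0)
#define crFinish }

/* Yields 1, -1, 1, -1 (for i = 0..3) and then 0 on every later call. */
int sign_stream(void) {
    static int i;
    crBegin;
    for (i = 0; i < 4; i++) {
        if (i % 2 == 0)
            crReturn(1, 1);     /* expands to a single statement, so  */
        else                    /* this else still pairs with the if  */
            crReturn(2, -1);
    }
    crFinish;
    return 0;   /* reached once the loop is exhausted */
}
```

Had crReturn been defined as a bare brace block, the semicolon after crReturn(1, 1) would have ended the if statement and left the else orphaned.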

This is almost exactly what we wanted. We can use crReturn to return from the function in such a way that control at the next call resumes just after the return. Of course we must obey some ground rules (surround the function body with crBegin and crFinish; declare all local variables static if they need to be preserved across a crReturn; never put a crReturn within an explicit switch statement); but those do not limit us very much.

The only snag remaining is the first parameter to crReturn. Just as when we invented a new label in the previous section we had to avoid it colliding with existing label names, now we must ensure all our state parameters to crReturn are different. The consequences will be fairly benign - the compiler will catch it and not let it do horrible things at run time - but we still need to avoid doing it.

Even this can be solved. ANSI C provides the special macro name __LINE__, which expands to the current source line number. So we can rewrite crReturn as

    #define crReturn(x) do { state=__LINE__; return x; \
                             case __LINE__:; } while (0)

and then we no longer have to worry about those state parameters at all, provided we obey a fourth ground rule (never put two crReturn statements on the same line).
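As a sketch of the __LINE__ form in use, here is a function with three separate return points and no hand-chosen state numbers. The function name fib6() and the sentinel -1 are invented for this example; t is left non-static on purpose, because its value is never needed across a crReturn.

```c
#define crBegin static int state=0; switch(state) { case 0:
#define crReturn(x) do { state=__LINE__; return x; \
                         case __LINE__:; } while (0)
#define crFinish }

/* Yields 1, 1, 2, 3, 5, 8 and then -1 on every later call. */
int fib6(void) {
    static int a, b, n;  /* must survive across returns */
    int t;               /* scratch only; recomputed before each use */
    crBegin;
    a = 1;
    b = 1;
    crReturn(a);         /* each crReturn sits on its own source line */
    crReturn(b);
    for (n = 2; n < 6; n++) {
        t = a + b;
        a = b;
        b = t;
        crReturn(b);
    }
    crFinish;
    return -1;           /* sequence exhausted */
}
```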

Evaluation

So now we have this monstrosity, let's rewrite our original code fragments using it.

    int decompressor(void) {
        static int c, len;
        crBegin;
        while (1) {
            c = getchar();
            if (c == EOF)
                break;
            if (c == 0xFF) {
                len = getchar();
                c = getchar();
                while (len--)
                    crReturn(c);
            } else
                crReturn(c);
        }
        crReturn(EOF);
        crFinish;
    }

    void parser(int c) {
        crBegin;
        while (1) {
            /* first char already in c */
            if (c == EOF)
                break;
            if (isalpha(c)) {
                do {
                    add_to_token(c);
                    crReturn( );
                } while (isalpha(c));
                got_token(WORD);
            }
            add_to_token(c);
            got_token(PUNCT);
            crReturn( );
        }
        crFinish;
    }

We have rewritten both decompressor and parser as callees, with no need at all for the massive restructuring we had to do last time we did this. The structure of each function exactly mirrors the structure of its original form. A reader can deduce the grammar recognised by the parser, or the compressed data format used by the decompressor, far more easily than by reading the obscure state-machine code. The control flow is intuitive once you have wrapped your mind around the new format: when the decompressor has a character, it passes it back to the caller with crReturn and waits to be called again when another character is required. When the parser needs another character, it returns using crReturn, and waits to be called again with the new character in the parameter c.

There has been one small structural alteration to the code: parser() now has its getchar() (well, the corresponding crReturn) at the end of the loop instead of the start, because the first character is already in c when the function is entered. We could accept this small change in structure, or if we really felt strongly about it we could specify that parser() required an "initialisation" call before you could start feeding it characters.
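With both routines written as callees, something still has to call them. The sketch below shows one possible driver loop that shuttles characters from the decompressor coroutine to the parser coroutine. Everything outside the two coroutine bodies is an assumption made so the example is self-contained: the in-memory input read by next_byte(), the trivial add_to_token() and got_token() that only count tokens, the run() driver, and the extra return EOF after crFinish.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>   /* EOF */

#define crBegin static int state=0; switch(state) { case 0:
#define crReturn(x) do { state=__LINE__; return x; \
                         case __LINE__:; } while (0)
#define crFinish }

enum { WORD, PUNCT };

/* Hypothetical stand-ins for getchar() and the token helpers. */
static const unsigned char *input;
static size_t input_len, input_pos;
static int words, puncts;

static int next_byte(void) {
    return input_pos < input_len ? input[input_pos++] : EOF;
}
static void add_to_token(int c) { (void)c; /* a real parser would store c */ }
static void got_token(int type) { if (type == WORD) words++; else puncts++; }

/* Producer: returns one decompressed character per call. */
static int decompressor(void) {
    static int c, len;
    crBegin;
    while (1) {
        c = next_byte();
        if (c == EOF)
            break;
        if (c == 0xFF) {
            len = next_byte();
            c = next_byte();
            while (len--)
                crReturn(c);
        } else
            crReturn(c);
    }
    crReturn(EOF);
    crFinish;
    return EOF;   /* keep returning EOF if called again */
}

/* Consumer: called once per character, resumes where it left off. */
static void parser(int c) {
    crBegin;
    while (1) {
        /* first char already in c */
        if (c == EOF)
            break;
        if (isalpha(c)) {
            do {
                add_to_token(c);
                crReturn( );
            } while (isalpha(c));
            got_token(WORD);
        }
        add_to_token(c);
        got_token(PUNCT);
        crReturn( );
    }
    crFinish;
}

/* Driver: the only real caller; it just moves characters across. */
static void run(const unsigned char *src, size_t n) {
    int c;
    input = src;
    input_len = n;
    input_pos = 0;
    do {
        c = decompressor();
        parser(c);
    } while (c != EOF);
}
```

Because both coroutines keep their state in static variables, this driver can be run only once per program; that limitation is the subject of the refinements discussed later.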

As before, of course, we don't have to rewrite both routines using the coroutine macros. One will suffice; the other can be its caller.

We have achieved what we set out to achieve: a portable ANSI C means of passing data between a producer and a consumer without the need to rewrite one as an explicit state machine. We have done this by combining the C preprocessor with a little-used feature of the switch statement to create an implicit state machine.

Coding Standards

Of course, this trick violates every coding standard in the book. Try doing this in your company's code and you will probably be subject to a stern telling off if not disciplinary action! You have embedded unmatched braces in macros, used case within sub-blocks, and as for the crReturn macro with its terrifyingly disruptive contents . . . It's a wonder you haven't been fired on the spot for such irresponsible coding practice. You should be ashamed of yourself.

I would claim that the coding standards are at fault here. The examples I've shown in this article are not very long, not very complicated, and still just about comprehensible when rewritten as state machines. But as the functions get longer, the degree of rewriting required becomes greater and the loss of clarity becomes much, much worse.

Consider. A function built of small blocks of the form

    case STATE1:
    /* perform some activity */
    if (condition) state = STATE2; else state = STATE3;

is not very different, to a reader, from a function built of small blocks of the form

    LABEL1:
    /* perform some activity */
    if (condition) goto LABEL2; else goto LABEL3;

One is caller and the other is callee, true, but the visual structure of the two functions is the same, and the insights they provide into their underlying algorithms are exactly as small as each other. The same people who would fire you for using my coroutine macros would fire you just as loudly for building a function out of small blocks connected by goto statements! And this time they would be right, because laying out a function like that obscures the structure of the algorithm horribly.

Coding standards aim for clarity. By hiding vital things like switch, return and case statements inside "obfuscating" macros, the coding standards would claim you have obscured the syntactic structure of the program, and violated the requirement for clarity. But you have done so in the cause of revealing the algorithmic structure of the program, which is far more likely to be what the reader wants to know!

Any coding standard which insists on syntactic clarity at the expense of algorithmic clarity should be rewritten. If your employer fires you for using this trick, tell them that repeatedly as the security staff drag you out of the building.

Refinements and Code

In a serious application, this toy coroutine implementation is unlikely to be useful, because it relies on static variables and so it fails to be re-entrant or multi-threadable. Ideally, in a real application, you would want to be able to call the same function in several different contexts, and at each call in a given context, have control resume just after the last return in the same context.

This is easily enough done. We arrange an extra function parameter, which is a pointer to a context structure; we declare all our local state, and our coroutine state variable, as elements of that structure.

It's a little bit ugly, because suddenly you have to use ctx->i as a loop counter where you would previously just have used i; virtually all your serious variables become elements of the coroutine context structure. But it removes the problems with re-entrancy, and still hasn't impacted the structure of the routine.
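A minimal sketch of that idea follows, under stated assumptions: the macro names ctxBegin, ctxReturn and ctxFinish, the struct counter_ctx and the function counter() are all invented for this illustration and should not be read as the interface of any existing header. The caller owns the context and must zero its state field before the first call.

```c
#include <assert.h>
#include <stddef.h>

/* All per-coroutine state lives here instead of in static variables. */
struct counter_ctx {
    int state;   /* resume point; must start at 0 */
    int i;       /* loop counter, preserved across returns */
    int limit;   /* how many values to yield */
};

#define ctxBegin(c)     switch ((c)->state) { case 0:
#define ctxReturn(c, x) do { (c)->state = __LINE__; return x; \
                             case __LINE__:; } while (0)
#define ctxFinish       }

/* Yields 0 .. limit-1 for the given context, then -1. */
int counter(struct counter_ctx *ctx) {
    assert(ctx != NULL);
    ctxBegin(ctx);
    for (ctx->i = 0; ctx->i < ctx->limit; ctx->i++)
        ctxReturn(ctx, ctx->i);
    ctxFinish;
    return -1;
}
```

Two contexts, say struct counter_ctx a = { 0, 0, 3 } and b = { 0, 0, 2 }, can now be stepped in any interleaving, and each resumes from its own last return because the resume point is stored in the context and not in the function.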

(Of course, if C only had Pascal's with statement, we could arrange for the macros to make this layer of indirection truly transparent as well. A pity. Still, at least C++ users can manage this by having their coroutine be a class member, and keeping all its local variables in the class so that the scoping is implicit.)

Included here is a C header file that implements this coroutine trick as a set of pre-defined macros. There are two sets of macros defined in the file, prefixed scr and ccr. The scr macros are the simple form of the technique, for when you can get away with using static variables; the ccr macros provide the advanced re-entrant form. Full documentation is given in a comment in the header file itself.

Note that Visual C++ version 6 doesn't like this coroutine trick, because its default debug state (Program Database for Edit and Continue) does something strange to the __LINE__ macro. To compile a coroutine-using program with VC++ 6, you must turn off Edit and Continue. (In the project settings, go to the "C/C++" tab, category "General", setting "Debug info". Select any option other than "Program Database for Edit and Continue".)

(The header file is MIT-licensed, so you can use it in anything you like without restriction. If you do find something the MIT licence doesn't permit you to do, mail me, and I'll probably give you explicit permission to do it anyway.)

Follow this link for coroutine.h.

Thanks for reading. Share and enjoy!

References

  • Donald Knuth, The Art of Computer Programming, Volume 1. Addison-Wesley, ISBN 0-201-89683-4. Section 1.4.2 describes coroutines in the "pure" form.
  • http://www.lysator.liu.se/c/duffs-device.html is Tom Duff's own discussion of Duff's device. Note, right at the bottom, a hint that Duff might also have independently invented this coroutine trick or something very like it.

    Update, 2005-03-07: Tom Duff confirms this in a blog comment. The "revolting way to use switches to implement interrupt driven state machines" of which he speaks in his original email is indeed the same trick as I describe here.

  • PuTTY is a Win32 Telnet and SSH client. The SSH protocol code contains real-life use of this coroutine trick. As far as I know, this is the worst piece of C hackery ever seen in serious production code.

