Linux Coding Style

来源:互联网 发布:virtualbox5 装mac dmg 编辑:程序博客网 时间:2024/06/02 05:54


        Linux kernel coding style

This is a short document describing the preferred coding style for the
linux kernel.  Coding style is very personal, and I won't _force_ my
views on anybody, but this is what goes for anything that I have to be
able to maintain, and I'd prefer it for most other things too.  Please
at least consider the points made here.

First off, I'd suggest printing out a copy of the GNU coding standards,
and NOT read it.  Burn them, it's a great symbolic gesture.

Anyway, here goes:


         Chapter 1: Indentation

Tabs are 8 characters, and thus indentations are also 8 characters.
There are heretic movements that try to make indentations 4 (or even 2!)
characters deep, and that is akin to trying to define the value of PI to
be 3.

Rationale: The whole idea behind indentation is to clearly define where
a block of control starts and ends.  Especially when you've been looking
at your screen for 20 straight hours, you'll find it a lot easier to see
how the indentation works if you have large indentations.

Now, some people will claim that having 8-character indentations makes
the code move too far to the right, and makes it hard to read on a
80-character terminal screen.  The answer to that is that if you need
more than 3 levels of indentation, you're screwed anyway, and should fix
your program.

In short, 8-char indents make things easier to read, and have the added
benefit of warning you when you're nesting your functions too deep.
Heed that warning.

The preferred way to ease multiple indentation levels in a switch statement is
to align the "switch" and its subordinate "case" labels in the same column
instead of "double-indenting" the "case" labels.  E.g.:

    switch (suffix) {
    case 'G':
    case 'g':
        mem <<= 30;
        break;
    case 'M':
    case 'm':
        mem <<= 20;
        break;
    case 'K':
    case 'k':
        mem <<= 10;
        /* fall through */
    default:
        break;
    }


Don't put multiple statements on a single line unless you have
something to hide:

    if (condition) do_this;
      do_something_everytime;

Don't put multiple assignments on a single line either.  Kernel coding style
is super simple.  Avoid tricky expressions.

Outside of comments, documentation and except in Kconfig, spaces are never
used for indentation, and the above example is deliberately broken.

Get a decent editor and don't leave whitespace at the end of lines.


        Chapter 2: Breaking long lines and strings

Coding style is all about readability and maintainability using commonly
available tools.

The limit on the length of lines is 80 columns and this is a strongly
preferred limit.

Statements longer than 80 columns will be broken into sensible chunks.
Descendants are always substantially shorter than the parent and are placed
substantially to the right. The same applies to function headers with a long
argument list. Long strings are as well broken into shorter strings. The
only exception to this is where exceeding 80 columns significantly increases
readability and does not hide information.

void fun(int a, int b, int c)
{
    if (condition)
        printk(KERN_WARNING "Warning this is a long printk with "
                        "3 parameters a: %u b: %u "
                        "c: %u \n", a, b, c);
    else
        next_statement;
}

        Chapter 3: Placing Braces and Spaces

The other issue that always comes up in C styling is the placement of
braces.  Unlike the indent size, there are few technical reasons to
choose one placement strategy over the other, but the preferred way, as
shown to us by the prophets Kernighan and Ritchie, is to put the opening
brace last on the line, and put the closing brace first, thusly:

    if (x is true) {
        we do y
    }

This applies to all non-function statement blocks (if, switch, for,
while, do).  E.g.:

    switch (action) {
    case KOBJ_ADD:
        return "add";
    case KOBJ_REMOVE:
        return "remove";
    case KOBJ_CHANGE:
        return "change";
    default:
        return NULL;
    }

However, there is one special case, namely functions: they have the
opening brace at the beginning of the next line, thus:

    int function(int x)
    {
        body of function
    }

Heretic people all over the world have claimed that this inconsistency
is ...  well ...  inconsistent, but all right-thinking people know that
(a) K&R are _right_ and (b) K&R are right.  Besides, functions are
special anyway (you can't nest them in C).

Note that the closing brace is empty on a line of its own, _except_ in
the cases where it is followed by a continuation of the same statement,
ie a "while" in a do-statement or an "else" in an if-statement, like
this:

    do {
        body of do-loop
    } while (condition);

and

    if (x == y) {
        ..
    } else if (x > y) {
        ...
    } else {
        ....
    }

Rationale: K&R.

Also, note that this brace-placement also minimizes the number of empty
(or almost empty) lines, without any loss of readability.  Thus, as the
supply of new-lines on your screen is not a renewable resource (think
25-line terminal screens here), you have more empty lines to put
comments on.

Do not unnecessarily use braces where a single statement will do.

if (condition)
    action();

and

if (condition)
    do_this();
else
    do_that();

This does not apply if one branch of a conditional statement is a single
statement. Use braces in both branches.

if (condition) {
    do_this();
    do_that();
} else {
    otherwise();
}

        3.1:  Spaces

Linux kernel style for use of spaces depends (mostly) on
function-versus-keyword usage.  Use a space after (most) keywords.  The
notable exceptions are sizeof, typeof, alignof, and __attribute__, which look
somewhat like functions (and are usually used with parentheses in Linux,
although they are not required in the language, as in: "sizeof info" after
"struct fileinfo info;" is declared).

So use a space after these keywords:
    if, switch, case, for, do, while
but not with sizeof, typeof, alignof, or __attribute__.  E.g.,
    s = sizeof(struct file);

Do not add spaces around (inside) parenthesized expressions.  This example is
*bad*:

    s = sizeof( struct file );

When declaring pointer data or a function that returns a pointer type, the
preferred use of '*' is adjacent to the data name or function name and not
adjacent to the type name.  Examples:

    char *linux_banner;
    unsigned long long memparse(char *ptr, char **retptr);
    char *match_strdup(substring_t *s);

Use one space around (on each side of) most binary and ternary operators,
such as any of these:

    =  +  -  <  >  *  /  %  |  &  ^  <=  >=  ==  !=  ?  :

but no space after unary operators:
    &  *  +  -  ~  !  sizeof  typeof  alignof  __attribute__  defined

no space before the postfix increment & decrement unary operators:
    ++  --

no space after the prefix increment & decrement unary operators:
    ++  --

and no space around the '.' and "->" structure member operators.

Do not leave trailing whitespace at the ends of lines.  Some editors with
"smart" indentation will insert whitespace at the beginning of new lines as
appropriate, so you can start typing the next line of code right away.
However, some such editors do not remove the whitespace if you end up not
putting a line of code there, such as if you leave a blank line.  As a result,
you end up with lines containing trailing whitespace.

Git will warn you about patches that introduce trailing whitespace, and can
optionally strip the trailing whitespace for you; however, if applying a series
of patches, this may make later patches in the series fail by changing their
context lines.


        Chapter 4: Naming

C is a Spartan language, and so should your naming be.  Unlike Modula-2
and Pascal programmers, C programmers do not use cute names like
ThisVariableIsATemporaryCounter.  A C programmer would call that
variable "tmp", which is much easier to write, and not the least more
difficult to understand.

HOWEVER, while mixed-case names are frowned upon, descriptive names for
global variables are a must.  To call a global function "foo" is a
shooting offense.

GLOBAL variables (to be used only if you _really_ need them) need to
have descriptive names, as do global functions.  If you have a function
that counts the number of active users, you should call that
"count_active_users()" or similar, you should _not_ call it "cntusr()".

Encoding the type of a function into the name (so-called Hungarian
notation) is brain damaged - the compiler knows the types anyway and can
check those, and it only confuses the programmer.  No wonder MicroSoft
makes buggy programs.

LOCAL variable names should be short, and to the point.  If you have
some random integer loop counter, it should probably be called "i".
Calling it "loop_counter" is non-productive, if there is no chance of it
being mis-understood.  Similarly, "tmp" can be just about any type of
variable that is used to hold a temporary value.

If you are afraid to mix up your local variable names, you have another
problem, which is called the function-growth-hormone-imbalance syndrome.
See chapter 6 (Functions).


        Chapter 5: Typedefs

Please don't use things like "vps_t".

It's a _mistake_ to use typedef for structures and pointers. When you see a

    vps_t a;

in the source, what does it mean?

In contrast, if it says

    struct virtual_container *a;

you can actually tell what "a" is.

Lots of people think that typedefs "help readability". Not so. They are
useful only for:

 (a) totally opaque objects (where the typedef is actively used to _hide_
     what the object is).

     Example: "pte_t" etc. opaque objects that you can only access using
     the proper accessor functions.

     NOTE! Opaqueness and "accessor functions" are not good in themselves.
     The reason we have them for things like pte_t etc. is that there
     really is absolutely _zero_ portably accessible information there.

 (b) Clear integer types, where the abstraction _helps_ avoid confusion
     whether it is "int" or "long".

     u8/u16/u32 are perfectly fine typedefs, although they fit into
     category (d) better than here.

     NOTE! Again - there needs to be a _reason_ for this. If something is
     "unsigned long", then there's no reason to do

    typedef unsigned long myflags_t;

     but if there is a clear reason for why it under certain circumstances
     might be an "unsigned int" and under other configurations might be
     "unsigned long", then by all means go ahead and use a typedef.

 (c) when you use sparse to literally create a _new_ type for
     type-checking.

 (d) New types which are identical to standard C99 types, in certain
     exceptional circumstances.

     Although it would only take a short amount of time for the eyes and
     brain to become accustomed to the standard types like 'uint32_t',
     some people object to their use anyway.

     Therefore, the Linux-specific 'u8/u16/u32/u64' types and their
     signed equivalents which are identical to standard types are
     permitted -- although they are not mandatory in new code of your
     own.

     When editing existing code which already uses one or the other set
     of types, you should conform to the existing choices in that code.

 (e) Types safe for use in userspace.

     In certain structures which are visible to userspace, we cannot
     require C99 types and cannot use the 'u32' form above. Thus, we
     use __u32 and similar types in all structures which are shared
     with userspace.

Maybe there are other cases too, but the rule should basically be to NEVER
EVER use a typedef unless you can clearly match one of those rules.

In general, a pointer, or a struct that has elements that can reasonably
be directly accessed should _never_ be a typedef.


        Chapter 6: Functions

Functions should be short and sweet, and do just one thing.  They should
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,
as we all know), and do one thing and do that well.

The maximum length of a function is inversely proportional to the
complexity and indentation level of that function.  So, if you have a
conceptually simple function that is just one long (but simple)
case-statement, where you have to do lots of small things for a lot of
different cases, it's OK to have a longer function.

However, if you have a complex function, and you suspect that a
less-than-gifted first-year high-school student might not even
understand what the function is all about, you should adhere to the
maximum limits all the more closely.  Use helper functions with
descriptive names (you can ask the compiler to in-line them if you think
it's performance-critical, and it will probably do a better job of it
than you would have done).

Another measure of the function is the number of local variables.  They
shouldn't exceed 5-10, or you're doing something wrong.  Re-think the
function, and split it into smaller pieces.  A human brain can
generally easily keep track of about 7 different things, anything more
and it gets confused.  You know you're brilliant, but maybe you'd like
to understand what you did 2 weeks from now.

In source files, separate functions with one blank line.  If the function is
exported, the EXPORT* macro for it should follow immediately after the closing
function brace line.  E.g.:

int system_is_up(void)
{
    return system_state == SYSTEM_RUNNING;
}
EXPORT_SYMBOL(system_is_up);

In function prototypes, include parameter names with their data types.
Although this is not required by the C language, it is preferred in Linux
because it is a simple way to add valuable information for the reader.


        Chapter 7: Centralized exiting of functions

Albeit deprecated by some people, the equivalent of the goto statement is
used frequently by compilers in form of the unconditional jump instruction.

The goto statement comes in handy when a function exits from multiple
locations and some common work such as cleanup has to be done.

The rationale is:

- unconditional statements are easier to understand and follow
- nesting is reduced
- errors by not updating individual exit points when making
    modifications are prevented
- saves the compiler work to optimize redundant code away ;)

int fun(int a)
{
    int result = 0;
    char *buffer = kmalloc(SIZE);

    if (buffer == NULL)
        return -ENOMEM;

    if (condition1) {
        while (loop1) {
            ...
        }
        result = 1;
        goto out;
    }
    ...
out:
    kfree(buffer);
    return result;
}

        Chapter 8: Commenting

Comments are good, but there is also a danger of over-commenting.  NEVER
try to explain HOW your code works in a comment: it's much better to
write the code so that the _working_ is obvious, and it's a waste of
time to explain badly written code.

Generally, you want your comments to tell WHAT your code does, not HOW.
Also, try to avoid putting comments inside a function body: if the
function is so complex that you need to separately comment parts of it,
you should probably go back to chapter 6 for a while.  You can make
small comments to note or warn about something particularly clever (or
ugly), but try to avoid excess.  Instead, put the comments at the head
of the function, telling people what it does, and possibly WHY it does
it.

When commenting the kernel API functions, please use the kernel-doc format.
See the files Documentation/kernel-doc-nano-HOWTO.txt and scripts/kernel-doc
for details.

Linux style for comments is the C89 "/* ... */" style.
Don't use C99-style "// ..." comments.

The preferred style for long (multi-line) comments is:

    /*
     * This is the preferred style for multi-line
     * comments in the Linux kernel source code.
     * Please use it consistently.
     *
     * Description:  A column of asterisks on the left side,
     * with beginning and ending almost-blank lines.
     */

It's also important to comment data, whether they are basic types or derived
types.  To this end, use just one data declaration per line (no commas for
multiple data declarations).  This leaves you room for a small comment on each
item, explaining its use.


        Chapter 9: You've made a mess of it

That's OK, we all do.  You've probably been told by your long-time Unix
user helper that "GNU emacs" automatically formats the C sources for
you, and you've noticed that yes, it does do that, but the defaults it
uses are less than desirable (in fact, they are worse than random
typing - an infinite number of monkeys typing into GNU emacs would never
make a good program).

So, you can either get rid of GNU emacs, or change it to use saner
values.  To do the latter, you can stick the following in your .emacs file:

(defun c-lineup-arglist-tabs-only (ignored)
  "Line up argument lists by tabs, not spaces"
  (let* ((anchor (c-langelem-pos c-syntactic-element))
     (column (c-langelem-2nd-pos c-syntactic-element))
     (offset (- (1+ column) anchor))
     (steps (floor offset c-basic-offset)))
    (* (max steps 1)
       c-basic-offset)))

(add-hook 'c-mode-common-hook
          (lambda ()
            ;; Add kernel style
            (c-add-style
             "linux-tabs-only"
             '("linux" (c-offsets-alist
                        (arglist-cont-nonempty
                         c-lineup-gcc-asm-reg
                         c-lineup-arglist-tabs-only))))))

(add-hook 'c-mode-hook
          (lambda ()
            (let ((filename (buffer-file-name)))
              ;; Enable kernel mode for the appropriate files
              (when (and filename
                         (string-match (expand-file-name "~/src/linux-trees")
                                       filename))
                (setq indent-tabs-mode t)
                (c-set-style "linux-tabs-only")))))

This will make emacs go better with the kernel coding style for C
files below ~/src/linux-trees.

But even if you fail in getting emacs to do sane formatting, not
everything is lost: use "indent".

Now, again, GNU indent has the same brain-dead settings that GNU emacs
has, which is why you need to give it a few command line options.
However, that's not too bad, because even the makers of GNU indent
recognize the authority of K&R (the GNU people aren't evil, they are
just severely misguided in this matter), so you just give indent the
options "-kr -i8" (stands for "K&R, 8 character indents"), or use
"scripts/Lindent", which indents in the latest style.

"indent" has a lot of options, and especially when it comes to comment
re-formatting you may want to take a look at the man page.  But
remember: "indent" is not a fix for bad programming.


        Chapter 10: Kconfig configuration files

For all of the Kconfig* configuration files throughout the source tree,
the indentation is somewhat different.  Lines under a "config" definition
are indented with one tab, while help text is indented an additional two
spaces.  Example:

config AUDIT
    bool "Auditing support"
    depends on NET
    help
      Enable auditing infrastructure that can be used with another
      kernel subsystem, such as SELinux (which requires this for
      logging of avc messages output).  Does not do system-call
      auditing without CONFIG_AUDITSYSCALL.

Features that might still be considered unstable should be defined as
dependent on "EXPERIMENTAL":

config SLUB
    depends on EXPERIMENTAL && !ARCH_USES_SLAB_PAGE_STRUCT
    bool "SLUB (Unqueued Allocator)"
    ...

while seriously dangerous features (such as write support for certain
filesystems) should advertise this prominently in their prompt string:

config ADFS_FS_RW
    bool "ADFS write support (DANGEROUS)"
    depends on ADFS_FS
    ...

For full documentation on the configuration files, see the file
Documentation/kbuild/kconfig-language.txt.


        Chapter 11: Data structures

Data structures that have visibility outside the single-threaded
environment they are created and destroyed in should always have
reference counts.  In the kernel, garbage collection doesn't exist (and
outside the kernel garbage collection is slow and inefficient), which
means that you absolutely _have_ to reference count all your uses.

Reference counting means that you can avoid locking, and allows multiple
users to have access to the data structure in parallel - and not having
to worry about the structure suddenly going away from under them just
because they slept or did something else for a while.

Note that locking is _not_ a replacement for reference counting.
Locking is used to keep data structures coherent, while reference
counting is a memory management technique.  Usually both are needed, and
they are not to be confused with each other.

Many data structures can indeed have two levels of reference counting,
when there are users of different "classes".  The subclass count counts
the number of subclass users, and decrements the global count just once
when the subclass count goes to zero.

Examples of this kind of "multi-level-reference-counting" can be found in
memory management ("struct mm_struct": mm_users and mm_count), and in
filesystem code ("struct super_block": s_count and s_active).

Remember: if another thread can find your data structure, and you don't
have a reference count on it, you almost certainly have a bug.


        Chapter 12: Macros, Enums and RTL

Names of macros defining constants and labels in enums are capitalized.

#define CONSTANT 0x12345

Enums are preferred when defining several related constants.

CAPITALIZED macro names are appreciated but macros resembling functions
may be named in lower case.

Generally, inline functions are preferable to macros resembling functions.

Macros with multiple statements should be enclosed in a do - while block:

#define macrofun(a, b, c)             \
    do {                    \
        if (a == 5)            \
            do_this(b, c);        \
    } while (0)

Things to avoid when using macros:

1) macros that affect control flow:

#define FOO(x)                    \
    do {                    \
        if (blah(x) < 0)        \
            return -EBUGGERED;    \
    } while(0)

is a _very_ bad idea.  It looks like a function call but exits the "calling"
function; don't break the internal parsers of those who will read the code.

2) macros that depend on having a local variable with a magic name:

#define FOO(val) bar(index, val)

might look like a good thing, but it's confusing as hell when one reads the
code and it's prone to breakage from seemingly innocent changes.

3) macros with arguments that are used as l-values: FOO(x) = y; will
bite you if somebody e.g. turns FOO into an inline function.

4) forgetting about precedence: macros defining constants using expressions
must enclose the expression in parentheses. Beware of similar issues with
macros using parameters.

#define CONSTANT 0x4000
#define CONSTEXP (CONSTANT | 3)

The cpp manual deals with macros exhaustively. The gcc internals manual also
covers RTL which is used frequently with assembly language in the kernel.


        Chapter 13: Printing kernel messages

Kernel developers like to be seen as literate. Do mind the spelling
of kernel messages to make a good impression. Do not use crippled
words like "dont"; use "do not" or "don't" instead.  Make the messages
concise, clear, and unambiguous.

Kernel messages do not have to be terminated with a period.

Printing numbers in parentheses (%d) adds no value and should be avoided.

There are a number of driver model diagnostic macros in <linux/device.h>
which you should use to make sure messages are matched to the right device
and driver, and are tagged with the right level:  dev_err(), dev_warn(),
dev_info(), and so forth.  For messages that aren't associated with a
particular device, <linux/printk.h> defines pr_debug() and pr_info().

Coming up with good debugging messages can be quite a challenge; and once
you have them, they can be a huge help for remote troubleshooting.  Such
messages should be compiled out when the DEBUG symbol is not defined (that
is, by default they are not included).  When you use dev_dbg() or pr_debug(),
that's automatic.  Many subsystems have Kconfig options to turn on -DDEBUG.
A related convention uses VERBOSE_DEBUG to add dev_vdbg() messages to the
ones already enabled by DEBUG.


        Chapter 14: Allocating memory

The kernel provides the following general purpose memory allocators:
kmalloc(), kzalloc(), kcalloc(), vmalloc(), and vzalloc().  Please refer to
the API documentation for further information about them.

The preferred form for passing a size of a struct is the following:

    p = kmalloc(sizeof(*p), ...);

The alternative form where struct name is spelled out hurts readability and
introduces an opportunity for a bug when the pointer variable type is changed
but the corresponding sizeof that is passed to a memory allocator is not.

Casting the return value which is a void pointer is redundant. The conversion
from void pointer to any other pointer type is guaranteed by the C programming
language.


        Chapter 15: The inline disease

There appears to be a common misperception that gcc has a magic "make me
faster" speedup option called "inline". While the use of inlines can be
appropriate (for example as a means of replacing macros, see Chapter 12), it
very often is not. Abundant use of the inline keyword leads to a much bigger
kernel, which in turn slows the system as a whole down, due to a bigger
icache footprint for the CPU and simply because there is less memory
available for the pagecache. Just think about it; a pagecache miss causes a
disk seek, which easily takes 5 milliseconds. There are a LOT of cpu cycles
that can go into these 5 milliseconds.

A reasonable rule of thumb is to not put inline at functions that have more
than 3 lines of code in them. An exception to this rule are the cases where
a parameter is known to be a compiletime constant, and as a result of this
constantness you *know* the compiler will be able to optimize most of your
function away at compile time. For a good example of this later case, see
the kmalloc() inline function.

Often people argue that adding inline to functions that are static and used
only once is always a win since there is no space tradeoff. While this is
technically correct, gcc is capable of inlining these automatically without
help, and the maintenance issue of removing the inline when a second user
appears outweighs the potential value of the hint that tells gcc to do
something it would have done anyway.


        Chapter 16: Function return values and names

Functions can return values of many different kinds, and one of the
most common is a value indicating whether the function succeeded or
failed.  Such a value can be represented as an error-code integer
(-Exxx = failure, 0 = success) or a "succeeded" boolean (0 = failure,
non-zero = success).

Mixing up these two sorts of representations is a fertile source of
difficult-to-find bugs.  If the C language included a strong distinction
between integers and booleans then the compiler would find these mistakes
for us... but it doesn't.  To help prevent such bugs, always follow this
convention:

    If the name of a function is an action or an imperative command,
    the function should return an error-code integer.  If the name
    is a predicate, the function should return a "succeeded" boolean.

For example, "add work" is a command, and the add_work() function returns 0
for success or -EBUSY for failure.  In the same way, "PCI device present" is
a predicate, and the pci_dev_present() function returns 1 if it succeeds in
finding a matching device or 0 if it doesn't.

All EXPORTed functions must respect this convention, and so should all
public functions.  Private (static) functions need not, but it is
recommended that they do.

Functions whose return value is the actual result of a computation, rather
than an indication of whether the computation succeeded, are not subject to
this rule.  Generally they indicate failure by returning some out-of-range
result.  Typical examples would be functions that return pointers; they use
NULL or the ERR_PTR mechanism to report failure.


        Chapter 17:  Don't re-invent the kernel macros

The header file include/linux/kernel.h contains a number of macros that
you should use, rather than explicitly coding some variant of them yourself.
For example, if you need to calculate the length of an array, take advantage
of the macro

  #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

Similarly, if you need to calculate the size of some structure member, use

  #define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))

There are also min() and max() macros that do strict type checking if you
need them.  Feel free to peruse that header file to see what else is already
defined that you shouldn't reproduce in your code.


        Chapter 18:  Editor modelines and other cruft

Some editors can interpret configuration information embedded in source files,
indicated with special markers.  For example, emacs interprets lines marked
like this:

-*- mode: c -*-

Or like this:

/*
Local Variables:
compile-command: "gcc -DMAGIC_DEBUG_FLAG foo.c"
End:
*/

Vim interprets markers that look like this:

/* vim:set sw=8 noet */

Do not include any of these in source files.  People have their own personal
editor configurations, and your source files should not override them.  This
includes markers for indentation and mode configuration.  People may use their
own custom mode, or may have some other magic method for making indentation
work correctly.



        Appendix I: References

The C Programming Language, Second Edition
by Brian W. Kernighan and Dennis M. Ritchie.
Prentice Hall, Inc., 1988.
ISBN 0-13-110362-8 (paperback), 0-13-110370-9 (hardback).
URL: http://cm.bell-labs.com/cm/cs/cbook/

The Practice of Programming
by Brian W. Kernighan and Rob Pike.
Addison-Wesley, Inc., 1999.
ISBN 0-201-61586-X.
URL: http://cm.bell-labs.com/cm/cs/tpop/

GNU manuals - where in compliance with K&R and this text - for cpp, gcc,
gcc internals and indent, all available from http://www.gnu.org/manual/

WG14 is the international standardization working group for the programming
language C, URL: http://www.open-std.org/JTC1/SC22/WG14/

Kernel CodingStyle, by greg@kroah.com at OLS 2002:
http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/



        这篇短小的文章是对Linux内核编程风格的建议. 编程风格非常的个性化,而且,我并不想将我的观点强加给任何人,但是为了变于维护,我不得不提出这个观点.详情如下: 


  在最开始,我应该写出GNU 编程风格的标准而不用理会它. 不要理会他们,它只是一个符号表情而已. 

  好,让我们开始吧! 

第一章: 缩进格式 

  Tab 是 8 个字符,于是缩进也是8个字符.有很多怪异的风格,他们将缩进格式定义为4个字符(设置为2个字符!)的深度,这就象试图将PI定义为3一样让人难以接受. 

  理由是: 缩进的大小是为了清楚的定义一个块的开始和结束.特别是当你已经在计算机前面呆了20多个小时了以后,你会发现一个大的缩进格式使得你对程序的理解更容易. 

  现在,有一些人说,使用8个字符的缩进使得代码离右边很近,在80个字符宽度的终端屏幕上看程序很难受.回答是,但你的程序有3个以上的缩进的时候,你就应该修改你的程序. 

  总之,8个字符的缩进使得程序易读,还有一个附加的好处,就是它能在你将程序变得嵌套层数太多的时候给你警告.这个时候,你应该修改你的程序. 

第二章:大符号的位置 

  另外一个C 程序编程风格的问题是对 大括号的处理. 同缩进大小不同,几乎没有什么理由去选择一种而不选择另外一种风格,但有一种推荐的风格,它是 Kernighan 和 Ritchie的经典的那本书带来的,它将开始的大括号放在一行的最后,而将结束大括号放在一行的第一位,如下所示: 

if (x is true) { 
we do y 



  然而,还有一种特殊的情况:命名函数: 开始的括号是放在下一行的第一位,如下: 
int function(int x) 

body of function 



  所有非正统的人会非难这种不一致性,但是,所有思维正常的人明白: (第一) K&R是___对___的,(第二)如果K&R不对,请参见第一条. (:-))......另外,函数也是特殊的,不一定非得一致. 

需要注意的是结束的括号在它所占的那一行是空的,__除了__它跟随着同一条语句的继续符号.如 "while"在 do-while循环中,或者"else"在if语句中.如下: 

do { 
body of do-loop 
} while (condition); 


以及 

if (x == y) { 
.. 
} else if (x >; y) { 
... 
} else { 
.... 


理由: K&R. 


  另外,注意到这种大括号的放置方法减小了空行的数量,但却没有减少可读性.于是,在屏幕大小受到限制的时候,你就可以有更多的空行来写些注释了. 

第三章 : 命名系统 

  C是一种简洁的语言,那么,命名也应该是简洁的. 同MODULE-2以及PASCAL语言不同的是,C程序员不使用诸如 ThisVariableIsATemporaryCounter 之类的命名方式. 一个C语言的程序员会将之命名为"tmp",这很容易书写,且并不是那么难以去理解. 

  然而,当混合类型的名字不得不出现的时候,描述性名字对全局变量来说是必要的了. 调用一个名为"foo"全局的函数是很让人恼火的. 

  全局变量( 只有你必须使用的时候才使用它) ,就象全局函数一样,需要有描述性的命名方式.假如你有一个函数用来计算活动用户的数量,你应该这样命名--"count_active_users()"--或另外的相近的形式,你不应命名为"cntusr()". 

  有一种称为 Hungarian命名方式,它将函数的类型编码写入变量名中,这种方式是脑子有毛病的一种表现 --- 编译器知道这个类型而且会去检查它,而这样只会迷惑程序员. --知道为什么Micro$oft为什么会生产这么多"臭虫"程序了把!!. 

  局部变量的命名应该短小精悍. 假如你有一个随机的整数循环计数器,它有可能有"i",如果没有任何可能使得它能被误解的话,将其写作"loop_counter"是效率低下的.同样的,""tmp"可以是任何临时数值的函数变量. 

  如果你害怕混淆你的局部变量的名字,还有另外一个问题,就是称 function-growth-hormone-imbalance syndrome. 

第四章: 函数 

  函数应该短小而迷人,而且它只作一件事情. 它应只覆盖一到两个屏幕(80*24一屏),并且只作一件事情,而且将它做好.( 这不就是UNIX 的风格吗,译者注). 

  一个函数的最大长度和函数的复杂程度以及缩进大小成反比. 于是, 如果你已经写了简单但长度较长的的函数, 而且你已经对不同的情况做了很多很小的事情,写一个更长一点的函数也是无所谓的. 

  然而, 假如你要写一个很复杂的函数, 而且你已经估计到假如一般人读这个函数,他可能都不知道这个函数在说些什么,这个时候,使用具有描述性名字的有帮助的函数. 

  另外一个需要考虑的是局部变量的数量. 他们不应该超过5-10个, 否则你有可能会出错. 重新考虑这个函数 , 将他们分割成更小的函数. 人的大脑通常可以很容易的记住7件不同的事情,超过这个数量会引起混乱. 你知道你很聪明,但是你可能仍想去明白2周以前的做的事情. 

第5章 : 注释 

  注释是一件很好的事情, 但是过多的注释也是危险的,不要试图区解释你的代码是注释如何如何的好: 你应该将代码写得更好,而不是花费大量的时间去解释那些糟糕的代码. 

  通常情况下,你的注释是说明你的代码做些什么,而不是怎么做的. 而且,要试图避免将注释插在一个函数体里: 假如这个函数确实很复杂,你需要在其中有部分的注释,你应该回到第四章看看. 你可以写些简短的注释来注明或警告那些你认为特别聪明(或极其丑陋)的部分, 但是你必须要避免过多. 取而代之的是, 将注释写在函数前,告诉别人它做些什么事情,和可能为什么要这样做. 

第六章 : 你已经深陷其中了. 

  不要着急. 你有可能已经被告之"GUN emacs" 会自动的帮你处理C 的源代码格式, 而且你已经看到它确实如此, 但是, 缺省的情况下, 它的作用还是不尽如人意(实际上,他们比随便敲出来的东西还要难看 - a infinite number of monkeys typing into GNU emacs would never make a good program) 

  于是, 你可以要么 不要使用 GUN emacs, 要么让它使用saner valules. 使用后者,你需要将如下的语句输入到你的 .emacs文件中. 
(defun linux-c-mode () 
"C mode with adjusted defaults for use with the Linux kernel." 
(interactive) 
(c-mode) 
(c-set-style "K&R") 
(setq c-basic-offset 8)) 


  这会定义一个 M-x Linux-c-mode 的命令. 当你hacking 一个模块的时候, 如何你将-*- linux-c -*- 输入在最开始的两行, 这个模式会自动起作用. 而且, 你也许想加入如下 

(setq auto-mode-alist (cons '("/usr/src/linux.*/.*\\.〖ch〗$" . linux-c-mode) 
auto-mode-alist)) 


  到你的.emacs文件中, 这样的话, 当你在/usr/src/linux下编辑文件的时候,它会自动切换到linux-c-mode . 

  但是, 假如你还不能让emaces去自动处理文件的格式, 不要紧张, 你还有一样东西 : "缩进" . 

  GNU 的缩进格式也很死板, 这就是你为什么需要加上几行命令选项. 然而, 这还不算太坏,因为GNU 缩进格式的创造者也记得 K&R 的权威, ( GNU 没有罪, 他们仅仅是在这件事情上错误的引导了人们) , 你要做的就只有输入选项 "-kr -i8"(表示"K&R, 缩进8个字符). 

  "缩进"有很多功能, 特别是当它建议你重新格式你的代码的时候,你应该看看帮助. 但要记住 : "缩进"不是风格很差的程序的万灵丹.
原创粉丝点击