Difference between revisions of "Coding style/Practices"
KenRaeburn (talk | contribs) (→The many names of zero) |
KenRaeburn (talk | contribs) (→Misc. to be sorted: we use small struct passing internally, have for a while; it's not a problem. omit -Wconversion) |
||
Line 464: | Line 464: | ||
any narrow types in the argument list. |
any narrow types in the argument list. |
||
− | Don't pass |
+ | Don't pass or return structures in API functions except by address. |
− | restriction for non-API functions, though. |
||
For new functions, input parameters should go before output parameters |
For new functions, input parameters should go before output parameters |
||
Line 478: | Line 478: | ||
Strive to make your code capable of compiling using "gcc -Wall |
Strive to make your code capable of compiling using "gcc -Wall |
||
-Wmissing-prototypes -Wtraditional -Wcast-qual -Wcast-align |
-Wmissing-prototypes -Wtraditional -Wcast-qual -Wcast-align |
||
− | + | -pedantic" [XXX need to rethink this |
|
somewhat] without generating any errors or warnings. Do not, however, |
somewhat] without generating any errors or warnings. Do not, however, |
||
compile using the "-ansi" flag to gcc, since that can result in odd |
compile using the "-ansi" flag to gcc, since that can result in odd |
Revision as of 16:10, 26 January 2009
Contents
String Handling
Code should not use any of the following functions: strcpy, strcat, sprintf, or any *scanf function.
Dynamically allocated buffers are preferred to fixed-sized buffers. When using dynamic buffers:
- Use strdup or asprintf for simple constructions.
- Use the k5buf module (k5-buf.h) for complex constructions. If this is not desirable, strlcpy and strlcat are valid alternatives.
Substitute versions of strlcpy, strlcat, and asprintf, for operating systems that don't supply them, are declared in k5-platform.h and defined in the support library (which practically everything in the tree links directly against).
When using fixed-sized buffers:
- Use strlcpy for simple copies.
- Use snprintf for simple compound constructions. Avoid using precision or field width specifiers in snprintf format strings.
- To check for truncation when using snprintf, use the following approach:
result = snprintf(buffer, sizeof(buffer), "format string", ...); if (SNPRINTF_OVERFLOW(result, sizeof(buffer)) handle_overflow_error;
The SNPRINTF_OVERFLOW macro is defined in k5-platform.h.
Current conformance
Existing code mostly conforms, with some exceptions.
Rationale
It is relatively common to audit a code base such as krb5 and flag all uses of strcpy and similar functions, even if they are used safely. Verifying that the function is used safely requires manual inspection. Using safer alternatives does not guarantee code safety, but does reduce the likelihood of a catastrophic buffer overflow vulnerability.
In some compilation environments under Solaris, field widths and precisions are computed using screen columns rather than bytes.
sprintf returns a signed integer; buffer sizes are typically size_t (which is an unsigned type). Comparing the two directly will result in a warning from some compilers, and has unpredictable semantics depending on the relative widths of int and size_t. Also, some sprintf implementations, such as the one in Solaris prior to version 10, return -1 on a buffer overflow, whereas the C99 behavior returns the number of bytes which would have been written to the string. The SNPRINTF_OVERFLOW macro uses casts to address both issues.
The k5buf module safely allows multi-step string construction within fixed-size or dynamic buffers without performing an error check on each append, and without pre-computing lengths. Errors need only be checked when the resulting string needs to be read or handed off to another function.
Exception Handling
If a function allocates several resources and has many exit paths, use the following general structure when possible:
{ krb5_error_code retval; char *ptr1 = NULL; krb5_blah *ptr2 = NULL; ... retval = another_function(...); if (retval != 0) goto cleanup; ... cleanup: if (ptr1) free(ptr1); if (ptr2) kbr5_free_blah(ptr2); return retval; }
Simpler functions may use simpler structures, but do not get to the point of freeing the same resource in several different places within a function depending on the exit path.
Current conformance
Existing code mostly conforms, with some exceptions.
Rationale
This is the most reliable way to avoid resource leaks within the constraints of the krb5 code base. Within libraries, the code base does not abort on memory allocation failure, so tends to have many exit paths within some functions.
Output parameter handling
If a function has output parameters, it should initialize them in all cases, even if it returns failure. Most of the time, this means setting pointer output parameters to NULL and numeric output parameters to zero at the beginning of the function. The real output values should not be assigned to the output parameters until successful completion from the function is guaranteed. A typical function with output parameters should look something like:
krb5_error_code function_with_output_param(type *input_param, type *out_ptr, int *out_len) { declarations here; *out_ptr = NULL; *out_len = 0; stuff that can go wrong here; *out_ptr = ptr; *out_len = len; return 0; error: cleanup code here; return ret; }
Simple wrappers do not need to worry about this rule unless they are invoking functions through pointers.
Current conformance
Existing code mostly does not conform.
Rationale
This practice makes it clear to static analysis tools that the previous value of the output parameter will not be used, and also helps convey under what circumstances an output parameter is initialized. If there happens to be a bug in the function such that it returns success but does not set output parameters, or in the caller such that it uses an output parameter even though the function returned failure, then the bug will result in predictable behavior (generally a NULL pointer dereference) with no security consequences beyond a denial of service attack.
Assignments as truth values
Do not use assignments as truth values. Rather than this:
/* bad style */ if ((retval = krb5_foo())) /* ... */;
do this:
/* better style */ retval = krb5_foo(); if (retval) /* ... */;
Current conformance
Existing code varies widely.
Rationale
This makes the code easier to read, and also makes it easier to use debuggers. It may be excusable to put assignments into the conditional espression of a "while" statement, though, like:
while ((ch = getopt(argc, argv, "abn")) != -1) /* ... */;
Using assignments as truth values in conditional expressions ("ternary expressions") may make code particularly impenetrable.
The many names of zero
There are at least three types of "zero" known to C. These are the integer zero (0), the null pointer constant (NULL), and the character constant zero ('\0'). Yes, these are usually all technically the integer zero. Use them in their correct contexts. (Purists will point out that 0 is a valid null pointer constant; still, do not use 0 to specify a null pointer constant. For further unconfusion, read the section on null pointer constants in the C FAQ.) Do not use a lone variable as a truth value unless it's of integer type. Thus:
int i; char *cp; /* ... */ if (i) /* ... */; if (cp != NULL) { while (*cp != '\0') /* ... */; }
Do not cast uses of NULL unless you're calling a function with a variable number of arguments, in which case you should cast it to to the appropriate pointer type. (NULL may be defined as 0, and an integer 0 may not be the same size in the argument list as a pointer on a 64-bit platform.) Likewise, do not cast the return value from malloc() and friends; the prototype should declare them properly as returning a void * and thus shouldn't require an explicit cast.
In any case, reading the section in the C FAQ on null pointers is highly recommended to remove confusion regarding null pointers in C, since this is a subject of much confusion to even experienced programmers. In particular, if you do not understand why using calloc() to allocate a struct that contains pointer members or why calling memset() to initialize such a struct to all-bytes-zero is wrong, reread that section again. (Note that there are *lots* of examples of code in the krb5 source tree that erroneously calls memset() to zero a struct, and we should fix these somehow eventually.)
Brace placement
Control flow statements that have a single statement as their body should nevertheless have braces around their bodies if the body is more than one line long, especially in the case of stacked multiple if-else clauses; use:
if (x) { if (y) foo(); else bar(); }
instead of:
/* bad style */ if (x) if (y) foo(); else bar();
which, while legible to the compiler, may confuse human readers and make the code less maintainable, especially if new branches get added to any of the clauses.
If you are writing a do-while loop that has only one statement in its body, put braces around it anyway, since the while clause may be mistaken for a while loop with an empty body. Don't do this:
/* bad style */ do foo(); while (x);
Instead, write this:
/* better style */ do { foo(); } while (x);
Preprocessor conditionals
Instead of scattering #ifdef
or other preprocessor conditionals throughout the code use preprocessor conditionals to control the definition of a function-like macro, or similar construct, which is used instead of embedded preprocessor conditionals in code.
Instead of this:
do_something(); /* bad style */ #ifdef DEBUG fprintf(stderr, "did something\n"); #endif do_something_else();
do something more like this:
#ifdef DEBUG #define DPRINTF(x) fprintf(stderr, x) #else #define DPRINTF(x) /* empty */ #endif /* ... */ do_something(); /* better style */ DPRINTF(("did something\n")); do_something_else();
Do not intersperse conditional compilation
directives with control flow statements, as some combination of
#define
d symbols may result in statements getting eaten by dangling
bits of control flow statements. When it is not possible to avoid
this questionable practice (you really should rewrite the relevant
code section), make use of redundant braces to ensure that a compiler
error will result in preference to incorrect runtime behavior (such as
inadvertently providing someone with a root shell).
Do not intersperse conditional compilation directives with control flow statements in such a way that confuses emacs cc-mode. Not only does emacs get confused, but the code becomes more difficult to read and maintain. Therefore, avoid code like this:
/* bad style */ if (x) { f(); } #ifdef FOO else if (y) { #else else { #endif g(); }
Put comments after conditional compilation directives such as "#else" and "#endif". Make them correspond to the sense of the value that controls the compilation of the section they are closing, i.e.
#ifdef FOO /* ... */ #else /* !FOO */ /* ... */ #endif /* !FOO */
Also, in the case of more complex conditional compilation directives, write the comments like this:
#if defined(FOO) || defined(BAR) /* ... */ #else /* !(defined(FOO) || defined(BAR)) */ /* ... */ #endif /* !(defined(FOO) || defined(BAR)) */
Conformance
Existing code mostly does not conform.
Rationale
Instances of "ifdef soup" make code more difficult to read, and make it more difficult to recognize bugs.
Local variables
Do not declare variables in an inner scope, e.g. inside the compound substatement of an if statement, unless the complexity of the code really demands that the variables be declared that way. In such situations, the function could probably stand to be broken up into smaller chunks anyway. Do not declare variables in an inner scope that shadow ones in an outer scope, since this leads to confusion. Also, some debugging environments, such as gdb under Solaris, can't see variables declared in an inner scope, so declaring such variables will make maintenance more difficult as well.
Placement of parentheses
Parenthesize expressions that may be confusing, particularly where C's precedences are broken. For example, the shift operators have lower precedence than the +, -, *, /, and % operators. Perhaps the most familiar C precedence quirk is that equality and relational operators are of higher precedence than assignment operators. Less well known is that the bitwise operators are of a lower precedence than equality and relational operators.
The sizeof operator takes either a unary expression or a parenthesized type name. It is not necessary to parenthesize the operand of sizeof if it is applied to a unary expression, but still, always parenthesize the operand of the sizeof operator. The sizeof operator does not evaluate its operand if it is a unary expression, so usages such as
s = sizeof(++foo);
should be avoided for the sake of sanity and readability.
Function-like macros
Macros should have all-uppercase names. If it is necessary to use multiple statements, use braces, and wrap the whole thing in a do-while(0) construct, such as
#define FOOMACRO(x, y) do { \ foo = (x) + (y); \ f(y); \ } while (0)
Leave off the semicolon at the end of a function-like macro, so that it can be mostly used like a call to a function without a return value. Line up the backslashes to make it more readable. Use M-x c-backslash-region in emacs to do neat lined-up backslashes. Parenthesize uses of arguments in the replacement text of a macro in order to prevent weird interactions.
Namespaces
The C standard reserves a bunch of namespaces for the implementation. Don't stomp on them. For practical purposes, any identifier with a leading underscore should not be used. (Technically, ^_[a-z].* are reserved only for file scope, so should be safe for things smaller than file scope, but it's better to be paranoid in this case.)
POSIX reserves typedef names ending with _t as well.
Recall that errno is a reserved identifier, and is permitted to be a macro. Therefore, do not use errno as the name of a structure member, etc.
Reserved namespaces are somewhat more restricted than this; read the appropriate section of the C standard if you have questions.
If you're writing new library code, pick a short prefix and stick with it for any identifier with external linkage. If for some reason a library needs to have external symbols that should not be visible to the application, pick another (related) prefix to use for the internal globals. This applies to typedef names, tag names, and preprocessor identifiers as well.
For the krb5 library, the prefix for public global symbols is "krb5_". Use "krb5int_" as a prefix for library internal globals. Avoid using "__" in symbol names, as it may confuse C++ implementations. There are admittedly a number of places where we leak thing into the namespace; we should try to fix these.
Header files should also not leak symbols. Usually using the upcased version of the prefix you've picked will suffice, e.g. "KRB5_" as a CPP symbol prefix corresponding to "krb5_". In general, do not define macros that are lowercase, in order to avoid confusion and to prevent namespace collisions.
The C standard only guarantees six case-insensitive characters to be significant in external identifiers; this is largely regarded as obsolescent even in 1989 and we will ignore it. It does, however, only guarantee 31 case-sensitive characters to be signficant in internal identifiers, so do not use identifiers that differ beyond the 31st character. This is unlikely to be a problem, though.
Misc. to be sorted
Assume, for most purposes, working ANSI/ISO C ('89, not '99) support, both for internal use and for applications compiling against Kerberos header files and libraries. Some exceptions are noted below.
While it is syntactically correct to call through a function pointer without applying a dereference operator to it, do not write code that does this. It is easier to see that the function call is actually taking place through a function pointer if you write an explicit dereference. However, do not explicitly take the address of a function in order to assign it to a function pointer, since a function name degrades into a pointer. Thus:
int (*fp)(void); int foofunc(void); fp = foofunc; x = (*fp)();
In general, do not take the address of an array. It does not return a pointer to the first element; it returns a pointer to the array itself. These are often identical when cast to an integral type, but they are inherently of different types themselves. Functions that take array types or pointers to array types as arguments can be particularly trouble-prone.
If a function is declared to return a value, do not call "return" without an argument or allow the flow of control to fall off the end of the function.
Always declare the return type of a function, even if it returns int. Yes, this means declaring main() to return int, since main() is required to return int by the standard. If a function is not supposed to return a value, declare it as returning void rather than omitting the return type, which will default the return type to int.
Try to use ANSI C prototype-style function definitions in preference to K&R style definitions. When using K&R style function definitions, declare all the argument types, even those that are int, but beware of any narrow types in the argument list.
Don't pass or return structures in API functions except by address.
For new functions, input parameters should go before output parameters in the call signature. There are exceptions, such as a context-like parameter.
Every function should have block comment preceding it describing briefly in complete sentences what it does, what inputs and outputs it has, and what error codes it can return. It should also describe any unusual aspects of the function. At some point we will want to put some of this information into a machine-parsable form.
Strive to make your code capable of compiling using "gcc -Wall -Wmissing-prototypes -Wtraditional -Wcast-qual -Wcast-align -pedantic" [XXX need to rethink this somewhat] without generating any errors or warnings. Do not, however, compile using the "-ansi" flag to gcc, since that can result in odd behavior with header files on some systems, causing some necessary symbols to not be defined.