Difference between revisions of "Coding style/Practices"
(null pointers required to have all-bits-zero for now) |
(Use shorter name for return value in exception handling example) |
||
Line 64: | Line 64: | ||
<pre> |
<pre> |
||
{ |
{ |
||
− | krb5_error_code |
+ | krb5_error_code ret; |
char *ptr1 = NULL; |
char *ptr1 = NULL; |
||
krb5_blah *ptr2 = NULL; |
krb5_blah *ptr2 = NULL; |
||
... |
... |
||
− | + | ret = another_function(...); |
|
− | if ( |
+ | if (ret != 0) |
goto cleanup; |
goto cleanup; |
||
... |
... |
||
Line 76: | Line 76: | ||
free(ptr1); |
free(ptr1); |
||
krb5_free_blah(ptr2); |
krb5_free_blah(ptr2); |
||
− | return |
+ | return ret; |
} |
} |
||
</pre> |
</pre> |
Revision as of 09:56, 10 May 2012
Quick Summary
- Avoid strcpy, strcat, sprintf, and *scanf.
- Use "goto cleanup" to ensure that resources are released on all function exit paths.
- Keep track of which pointers are owners and which are aliases; initialize owner pointers to NULL.
- Avoid creating structure instances containing a mix of owner and alias pointers.
- Name output variables ending with "_out". Initialize them at the beginning of a function, then don't touch them until just before successful return.
- Don't use assignments as truth values except (perhaps) in a while loop.
- Use NULL for the null pointer and '\0' for the null character. Don't use pointers or characters as truth values.
- Put braces around flow control statement bodies if they are more than one line long, and around any do-while loop.
- Try to keep preprocessor statements outside of function bodies.
- Avoid declarations of local variables in inner scope.
- Parenthesize (sparingly) to avoid C precedence confusion, especially when using bitwise operators.
- Use all-uppercase names for macros.
- Avoid stepping on reserved C namespaces.
- Use the krb5_ prefix for libkrb5 API functions, and the k5_ prefix for linker-visible internal functions. Don't use a prefix for static functions.
- Avoid long or deeply nested function bodies. Use helpers to limit function complexity.
- Function calls through pointers should look like "(*fp)(args)" or "vtable->method(vars)".
- Always declare function return types. Use ANSI C-style function definitions.
- Don't pass or return structures by value in API functions.
- Precede functions with block comments describing what they do.
- Make your code warning-clean under the gcc warning flags used in our build system.
String Handling
Code should not use any of the following functions: strcpy, strcat, sprintf, or any *scanf function.
Dynamically allocated buffers are preferred to fixed-sized buffers. When using dynamic buffers:
- Use strdup or asprintf for simple constructions.
- Use the k5buf module (k5-buf.h) for complex constructions. If this is not desirable, strlcpy and strlcat are valid alternatives.
Substitute versions of strlcpy, strlcat, and asprintf, for operating systems that don't supply them, are declared in k5-platform.h and defined in the support library (which practically everything in the tree links directly against).
When using fixed-sized buffers:
- Use strlcpy for simple copies.
- Use snprintf for simple compound constructions. Avoid using precision or field width specifiers in snprintf format strings.
- To check for truncation when using snprintf, use the following approach:
result = snprintf(buffer, sizeof(buffer), "format string", ...); if (SNPRINTF_OVERFLOW(result, sizeof(buffer)) handle_overflow_error;
The SNPRINTF_OVERFLOW macro is defined in k5-platform.h.
Current conformance
Existing code mostly conforms, with some exceptions.
Rationale
It is relatively common to audit a code base such as krb5 and flag all uses of strcpy and similar functions, even if they are used safely. Verifying that the function is used safely requires manual inspection. Using safer alternatives does not guarantee code safety, but does reduce the likelihood of a catastrophic buffer overflow vulnerability.
In some compilation environments under Solaris, field widths and precisions are computed using screen columns rather than bytes.
sprintf returns a signed integer; buffer sizes are typically size_t (which is an unsigned type). Comparing the two directly will result in a warning from some compilers, and has unpredictable semantics depending on the relative widths of int and size_t. Also, some sprintf implementations, such as the one in Solaris prior to version 10, return -1 on a buffer overflow, whereas the C99 behavior returns the number of bytes which would have been written to the string. The SNPRINTF_OVERFLOW macro uses casts to address both issues.
The k5buf module safely allows multi-step string construction within fixed-size or dynamic buffers without performing an error check on each append, and without pre-computing lengths. Errors need only be checked when the resulting string needs to be read or handed off to another function.
Exception Handling
If a function allocates several resources and has many exit paths, use the following general structure when possible:
{ krb5_error_code ret; char *ptr1 = NULL; krb5_blah *ptr2 = NULL; ... ret = another_function(...); if (ret != 0) goto cleanup; ... cleanup: free(ptr1); krb5_free_blah(ptr2); return ret; }
(free() and krb5_free_foo() functions will both accept NULL as input and do nothing.) Simpler functions may use simpler control structures, but do not get to the point of freeing the same resource in several different places within a function depending on the exit path. Do not use multiple exit labels.
Unless constrained by an API standard such as GSSAPI, functions which can fail should return a krb5_error_code. Error codes are defined in com_err tables, usually located in src/lib/krb5/error_tables.
Functions can also set extended error messages using krb5_set_error_message() (or, in dependencies of libkrb5, krb5int_set_error() on &context->err). Code should set extended error messages when an error condition is moderately likely to occur and the default string for the error code is insufficiently clear. Avoid exposing function names and other implementation details in error messages.
After ignoring or handling an error code, krb5_clear_error_message() (or krb5int_clear_error()) should be used to ensure that an extended error message is not applied erroneously to a later instance of the same error code. This can be especially important in loops.
Current conformance
Existing code mostly conforms, with some exceptions.
Rationale
The "goto cleanup" flow control is the most reliable way to avoid resource leaks within the constraints of the krb5 code base. Within libraries, the code base does not abort on memory allocation failure, so tends to have many exit paths within some functions.
Extended error messages can provide invaluable information about error conditions, but merely serve to expand code if used mechanically.
Memory management
An allocated object can have multiple pointers to it. The pointer through which the object will eventually be freed is called the owner, and other pointers to the object are called aliases. The owner may change over the lifetime of the object.
When transferring ownership of a pointer (such as when returning an allocated value to a caller via an output parameter), null out the old owner pointer unless it is about to be destroyed or go out of scope.
Avoid mixing owner pointers and aliases within a structure. Create additional local variables to act as owner pointers when this situation occurs.
Initialize variables containing owner pointers to NULL. Free the pointed-to objects (unconditionally, when possible) in the function's cleanup handler.
In some cases these rules may not be reasonable; for instance, temporary memory allocated within a loop body must be freed within the loop body, not in the cleanup handler. Break these rules when necessary, but also make liberal use of helper functions to simplify memory management.
Current conformance
Newer code mostly conforms; older code does not.
Rationale
Following these rules makes it easier to verify that a function contains no memory leaks, no double frees, and no dereferences after frees. These rules also keep memory management logic out of the core logic of a function, making it easier to understand.
Output parameter handling
Output parameter names should typically end in "_out". Output parameters should typically be the last parameters to a function, with certain exceptions (such as context-like parameters).
Avoid filling in a caller-allocated structure. Instead, allocate the structure within the function and output a pointer to the structure.
If a function has output parameters, their values should be defined in all cases, including on failure. Most of the time, this means setting output parameters to NULL or zero at the beginning of the function, and not assigning real values to the output parameters until successful completion from the function is guaranteed. A typical function with output parameters should look something like:
krb5_error_code function_with_output_param(type *input_param, type *ptr_out) { krb5_error_code ret; type *ptr = NULL; *ptr_out = NULL; stuff that can go wrong here, allocating ptr; *ptr_out = ptr; ptr = NULL; cleanup: free_function(ptr); other cleanup code here; return ret; }
Simple wrapper functions can delegate the initialization of output paremeters to the functions they wrap, unless they are invoking the wrapped function through a function pointer.
Current conformance
Existing code mostly does not conform. Filling in caller-allocated structures is common (though far from universal) in the public libkrb5 API.
Rationale
Naming output parameters with an "_out" suffix helps readers easily identify them, and also makes it clearer when a resource's ownership is transferred to the caller. Since output parameters are only used near the beginning and end of a function, the extra verbosity is not very burdensome.
Initializing output parameters makes it clear to static analysis tools that the previous value of the output parameter will not be used. If there happens to be a bug in the function such that it returns success but does not set output parameters, or in the caller such that it uses an output parameter even though the function returned failure, then the bug will result in predictable behavior (generally a NULL pointer dereference) with no security consequences beyond a denial of service attack.
Assignments as truth values
Do not use assignments as truth values. Rather than this:
/* bad style */ if ((retval = krb5_foo())) /* ... */;
do this:
/* better style */ retval = krb5_foo(); if (retval) /* ... */;
Current conformance
Existing code varies widely.
Rationale
This makes the code easier to read, and also makes it easier to use debuggers. It may be excusable to put assignments into the conditional espression of a "while" statement, though, like:
while ((ch = getopt(argc, argv, "abn")) != -1) /* ... */;
Using assignments as truth values in conditional expressions ("ternary expressions") may make code particularly impenetrable.
The many names of zero
There are at least three types of "zero" known to C. These are the integer zero (0), the null pointer constant (NULL), and the character constant zero ('\0'). Yes, these are usually all technically the integer zero. Use them in their correct contexts. (Purists will point out that 0 is a valid null pointer constant; still, do not use 0 to specify a null pointer constant. For further unconfusion, read the section on null pointer constants in the comp.lang.c FAQ.) Do not use a lone variable as a truth value unless it's of integer type. Thus:
int i; char *cp; /* ... */ if (i) /* ... */; if (cp != NULL) { while (*cp != '\0') /* ... */; }
Do not cast uses of NULL unless you're calling a function with a variable number of arguments, in which case you should cast it to to the appropriate pointer type. (NULL may be defined as 0, and an integer 0 may not be the same size in the argument list as a pointer on a 64-bit platform.) Likewise, do not cast the return value from malloc() and friends; the prototype should declare them properly as returning a void * and thus shouldn't require an explicit cast.
In any case, reading the section in the C FAQ on null pointers is highly recommended to remove confusion regarding null pointers in C, since this is a subject of much confusion to even experienced programmers. In particular, if you do not understand why using calloc() to allocate a struct that contains pointer members or why calling memset() to initialize such a struct to all-bytes-zero is wrong, reread that section again. Due to the prevalence of this practice and the practical difficulties of eliminating it, our portability assumptions currently require that platforms represent null pointers as all-bits-zero.
Brace placement
Control flow statements that have a single statement as their body should nevertheless have braces around their bodies if the body is more than one line long, especially in the case of stacked multiple if-else clauses; use:
if (x) { if (y) foo(); else bar(); }
instead of:
/* bad style */ if (x) if (y) foo(); else bar();
which, while legible to the compiler, may confuse human readers and make the code less maintainable, especially if new branches get added to any of the clauses.
If you are writing a do-while loop that has only one statement in its body, put braces around it anyway, since the while clause may be mistaken for a while loop with an empty body. Don't do this:
/* bad style */ do foo(); while (x);
Instead, write this:
/* better style */ do { foo(); } while (x);
Preprocessor conditionals
Instead of scattering #ifdef
or other preprocessor conditionals throughout the code use preprocessor conditionals to control the definition of a function-like macro, or similar construct, which is used instead of embedded preprocessor conditionals in code.
Instead of this:
do_something(); /* bad style */ #ifdef DEBUG fprintf(stderr, "did something\n"); #endif do_something_else();
do something more like this:
#ifdef DEBUG #define DPRINTF(x) fprintf(stderr, x) #else #define DPRINTF(x) /* empty */ #endif /* ... */ do_something(); /* better style */ DPRINTF(("did something\n")); do_something_else();
Do not intersperse conditional compilation
directives with control flow statements, as some combination of
#define
d symbols may result in statements getting eaten by dangling
bits of control flow statements. When it is not possible to avoid
this questionable practice (you really should rewrite the relevant
code section), make use of redundant braces to make a compiler
error more likely than incorrect runtime behavior (such as
inadvertently providing someone with a root shell -- this has actually happened, long ago).
Do not intersperse conditional compilation directives with control flow statements in such a way that confuses emacs cc-mode. Not only does emacs get confused, but the code becomes more difficult to read and maintain. Therefore, avoid code like this:
/* bad style */ if (x) { f(); } #ifdef FOO else if (y) { #else else { #endif g(); }
Put comments after conditional compilation directives such as "#else" and "#endif". Make them correspond to the sense of the value that controls the compilation of the section they are closing, i.e.
#ifdef FOO /* ... */ #else /* !FOO */ /* ... */ #endif /* !FOO */
Also, in the case of more complex conditional compilation directives, write the comments like this:
#if defined(FOO) || defined(BAR) /* ... */ #else /* !(defined(FOO) || defined(BAR)) */ /* ... */ #endif /* !(defined(FOO) || defined(BAR)) */
Conformance
Existing code mostly does not conform.
Rationale
Instances of "ifdef soup" make code more difficult to read, and make it more difficult to recognize bugs. It also interferes with automated reformatting and analysis tools.
Local variables
Do not declare variables in an inner scope, e.g. inside the compound substatement of an if statement, unless the complexity of the code really demands that the variables be declared that way. In such situations, the function could probably stand to be broken up into smaller chunks anyway. Do not declare variables in an inner scope that shadow ones in an outer scope, since this leads to confusion. Also, some debugging environments, such as gdb under Solaris, can't see variables declared in an inner scope, so declaring such variables will make maintenance more difficult as well.
Placement of parentheses
Parenthesize expressions that may be confusing, particularly where C's precedences are broken. For example, the shift operators have lower precedence than the +, -, *, /, and % operators. Perhaps the most familiar C precedence quirk is that equality and relational operators are of higher precedence than assignment operators. Less well known is that the bitwise operators are of a lower precedence than equality and relational operators.
The sizeof operator takes either a unary expression or a parenthesized type name. It is not necessary to parenthesize the operand of sizeof if it is applied to a unary expression, but still, always parenthesize the operand of the sizeof operator. The sizeof operator does not evaluate its operand if it is a unary expression, so usages such as
s = sizeof(++foo);
should be avoided for the sake of sanity and readability.
Function-like macros
Macros should have all-uppercase names. If it is necessary to use multiple statements, use braces, and wrap the whole thing in a do-while(0) construct, such as
#define FOOMACRO(x, y) do { \ foo = (x) + (y); \ f(y); \ } while (0)
Leave off the semicolon at the end of a function-like macro, so that it can be mostly used like a call to a function without a return value. Line up the backslashes to make it more readable. Use M-x c-backslash-region in emacs to do neat lined-up backslashes. Parenthesize uses of arguments in the replacement text of a macro in order to prevent weird interactions.
Namespaces
The C standard reserves a bunch of namespaces for the implementation. Don't stomp on them. For practical purposes, any identifier with a leading underscore should not be used. (Technically, ^_[a-z].* are reserved only for file scope, so should be safe for things smaller than file scope, but it's better to be paranoid in this case.)
POSIX reserves typedef names ending with _t as well.
Recall that errno is a reserved identifier, and is permitted to be a macro. Therefore, do not use errno as the name of a structure member, etc.
Reserved namespaces are somewhat more restricted than this; read the appropriate section of the C standard if you have questions.
If you're writing new library code, pick a short prefix and stick with it for any identifier with external linkage. If for some reason a library needs to have external symbols that should not be visible to the application, pick another (related) prefix to use for the internal globals. This applies to typedef names, tag names, and preprocessor identifiers as well.
For the krb5 library, the prefix for public global symbols is "krb5_". Use "k5_" as a prefix for library internal globals ("krb5int_" is also acceptable). Avoid using "__" in symbol names, as it may confuse C++ implementations. There are admittedly a number of places where we leak thing into the namespace; we should try to fix these.
Header files should also not leak symbols. Usually using the upcased version of the prefix you've picked will suffice, e.g. "KRB5_" as a CPP symbol prefix corresponding to "krb5_". In general, do not define macros that are lowercase, in order to avoid confusion and to prevent namespace collisions.
Do not formulaically apply prefixes such as krb5_ or krb5int_ to static function names. It is better for static functions to have short descriptive, and it can be confusing to see the krb5_ prefix on a symbol which is not actually part of the krb5 API.
The C standard only guarantees six case-insensitive characters to be significant in external identifiers; this is largely regarded as obsolescent even in 1989 and we will ignore it. It does, however, only guarantee 31 case-sensitive characters to be signficant in internal identifiers, so do not use identifiers that differ beyond the 31st character. This is unlikely to be a problem, though.
Function length
Functions should not be longer than about 60 lines. Strongly consider breaking up functions that are longer than this limit.
Conformance
Existing code conforms somewhat, with some egregious exceptions. Some functions exceed 500 lines in length.
Rationale
Functions that are excessively lengthy are difficult to read. Especially when the beginning of a control structure does not fit on the same screen or page as its end, maintenance can become problematic. 60 lines is about the maximum length that can comfortably fit on a sheet of letter-sized paper.
Nesting depth
The maximum indentation level should not be more than four. Deeply nested control flow generally indicates that a function needs to be broken into smaller pieces.
Conformance
Existing code conforms somewhat.
Rationale
Deeply nested control flow is difficult to read.
Copyright/License Statements
Each applicable copyright/license statement for a file should appear by itself in a block comment. Copyright/license statements should appear near the top of source files, after any emacs formatting or filename comments but before any descriptive comments or non-comment lines of source code. See prototype/prototype.c and prototype/prototype.h for a more detailed example.
All unique (except for year) copyright/license statements should appear in the top-level NOTICE file.
Conformance
Existing code mostly conforms, except that copyright/license statements do not usually appear by themselves in block comments. In a few cases copyright/license statements appear later on in source files and should be moved to the top.
Rationale
Formatting license comments this way allows them to be extracted from source code by scripts, without the need for ugly markers. The top-level NOTICE file makes it easier to satisfy the documentation requirements of licenses and to determine all of the terms under which MIT Kerberos may be used.
Misc. to be sorted
Assume, for most purposes, working ANSI/ISO C ('89, not '99) support, both for internal use and for applications compiling against Kerberos header files and libraries. Some exceptions are noted below.
When calling a function through a variable which holds a function pointer, explicitly deference the variable in order to make it easy to see that the call is through a function pointer. Do not explicitly dereference the pointer if it is contained within a structure or union, since there is no ambiguity in that case. Do not explicitly take the address of a function in order to assign it to a function pointer. Thus:
int (*fp)(void); int foofunc(void); fp = foofunc; x = (*fp)(); x = vtable.funcname(); x = vtable->funcname();
In general, do not take the address of an array. It does not return a pointer to the first element; it returns a pointer to the array itself. These are often identical when cast to an integral type, but they are inherently of different types themselves. Functions that take array types or pointers to array types as arguments can be particularly trouble-prone.
If a function is declared to return a value, do not call "return" without an argument or allow the flow of control to fall off the end of the function.
Always declare the return type of a function, even if it returns int. Yes, this means declaring main() to return int, since main() is required to return int by the standard. If a function is not supposed to return a value, declare it as returning void rather than omitting the return type, which will default the return type to int.
Try to use ANSI C prototype-style function definitions in preference to K&R style definitions. When using K&R style function definitions, declare all the argument types, even those that are int, but beware of any narrow types in the argument list.
Don't pass or return structures in API functions except by address.
Every function should have block comment preceding it describing briefly in complete sentences what it does, what inputs and outputs it has, and what error codes it can return. It should also describe any unusual aspects of the function. At some point we will want to put some of this information into a machine-parsable form.
Strive to make your code capable of compiling using "gcc -Wall -Wmissing-prototypes -Wtraditional -Wcast-align -pedantic" [XXX need to rethink this somewhat] without generating any errors or warnings. Do not, however, compile using the "-ansi" flag to gcc, since that can result in odd behavior with header files on some systems, causing some necessary symbols to not be defined.