Tuesday, 6 January 2015

GCC is wonderful: diagnostic options you should use

In my previous post, I explored how a few extensions to C provided by GCC (and Clang) can be used to implement a better ARRAY_SIZE macro, with type-safety. There are other wonderful things about GCC, though, including the set of compiler warning and error options that it provides. If you thought that ‘-Wall’ enabled all warnings, you may wish to take a closer look, as you’ll otherwise be missing out on many useful checks that could save you a lot of time and effort.

Of course, it’s not a simple case of enabling all the warning options, making sure nothing gets printed and hey presto (or ‘Abracadabra’), you have perfect code. I would liken using a compiler’s diagnostic options to tying up one’s shoelaces: it won’t guarantee that you won’t trip, and it certainly won’t protect you if you walk out into the street without looking, but it’s still a fairly sensible idea!

For the C language family, these diagnostic options can be divided into a few categories:
  • Warnings about the use of old-style ‘K&R’ or ‘traditional’ (pre-ANSI/ISO) C. Almost any C compiler worth caring about supports C90 (or more formally, ISO/IEC 9899:1990 and briefly ANSI X3.159-1989) or newer these days, so these warnings are rarely needed unless you depend upon libraries which have traditional-style declarations in their headers (either accidentally or deliberately).
  • Warnings about poor practices that are simply best avoided.
  • Warnings about potential issues that one might not always choose to address, as these could sometimes be considered as false positives.
Quite a few of these tie in with common misconceptions about C, some of which are equally applicable to C++. GCC also has warning and error options for C++, Objective C and Objective C++, which are to be recommended, although I shall not cover them here.

Starting with GCC 4.2, any individual warning option can be promoted to an error. Prior to this, only a few specific warning options had an equivalent error option, although all enabled warnings could be turned into errors with a single option (-Werror), if desired. There are many situations where it is beneficial to allow false positives to be raised by the compiler without causing compilation to fail (such as many of the sub-options specified by -Wextra and a few by -Wall), and it’s certainly nice to avoid causing unnecessary pain to package maintainers and to users building your code from source, if yours is an open source project.

What follows is a run-through of some of these options, together with some examples of the sorts of issues that they will help you to avoid. To begin with, I’ll focus on several options that you may wish to enable as early on in a project as possible, as they are the most likely ones to affect header files, and are the strongest candidates for turning into errors. I will then move on to discuss other useful warning options, before briefly mentioning one option that’s not quite as useful as one might otherwise imagine.

-Wwrite-strings

The -Wwrite-strings option isn’t actually a warning option at all, but masquerades as one: its effect is to modify the language, giving string literals in C the type ‘const char []’ rather than ‘char []’, as C++ does. No additional option is needed to produce warnings when implicitly dropping the const qualifier, as the messages that GCC emits for this are enabled by default.
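
Its effect is easy to demonstrate (a minimal sketch):
char *s1 = "hello";        /* warning: initialisation discards the
                              const qualifier from the string literal */
const char *s2 = "hello";  /* fine */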

You may wonder why on earth C leaves string literals non-const in the first place, especially given that they are not typically (at least on systems that use virtual memory) held in memory that is writeable. My suspicion here is that an awful lot of code had already been written by the time the const keyword was added, and it was deemed too big a task to modify all of it (there’d certainly be some precedent for this). To avoid getting stuck in the same mess, I would advise any new project to start off using this option.

If you’re writing a library, you should make sure at a minimum to use const anywhere in your headers where a string literal may be used, as you’ll make a mess for those writing applications in C++, otherwise. If your test code is written in C (and not C++), make sure it uses -Wwrite-strings to help check for missing const qualifiers.

In many established projects, you may find you get an awful lot of noise from a very small set of functions, which is likely to include wrappers around printf() and family and wrappers around write() or fwrite(). Once you’ve dealt with these, the warnings will start to relate a little more to potentially risky situations, and you might even find a situation where an attempt could be made to modify or call free() upon a string literal, perhaps in some of the less well-tested error paths of the codebase. Once you’ve got into the habit of using const throughout your codebase (to do otherwise would produce warnings when compiling your test code) you may find that it will help to catch a number of issues. Supposing we have a function for parsing IPv6 addresses, declared as:
bool read_ipv6_addr(uint8_t *ipv6_bin, const char *str);
We will have been forced into adding the const qualifier to str as shown above when writing our test code:
uint8_t ipv6_addr_buf[16];
static const uint8_t ipv6_addr_localhost[16] =
    { 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,1 };
CU_ASSERT(read_ipv6_addr(ipv6_addr_buf, "::1"));
CU_ASSERT(!memcmp(ipv6_addr_buf, ipv6_addr_localhost, sizeof(ipv6_addr_buf)));
Provided we otherwise maintain good discipline and introduce const qualifiers where we know a string will never need to be modified, we can entirely avoid ever accidentally transposing the arguments to read_ipv6_addr(), and the marathon of a debugging session that might otherwise arise from such a trivial mistake.

-Werror=implicit-function-declaration

My suggestion is to turn this into an error, although you could enable it just as a warning, as -Wimplicit-function-declaration. An implicit function declaration occurs at the earliest place where a function is called within a C source file, if the compiler has not already parsed a declaration (or definition) of that function. What this typically means is that you forgot to include the header with the function’s prototype, so the compiler assumes you’ve called it correctly: it works out what the function’s parameters are from the arguments that have been passed to it, and then checks that the arguments of any subsequent calls within the file match the parameters of the implicit declaration. Alternatively, you have included the correct header file, and the prototype is entirely missing from it. In either case, the behaviour is the same: if the function takes a pointer and you’ve forgotten to pass it, the likely result is an instant segfault, and if the code is called infrequently, there’s every chance that this will slip through testing.
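
A minimal sketch of how this goes wrong (set_message() and lib.h are hypothetical names of my own):
/* lib.h, which main.c forgets to include, declares:
 *     void set_message(char *buf, const char *msg);
 */

/* main.c */
int main(void)
{
    set_message("hello");  /* implicitly declared; the missing buffer
                              argument goes unnoticed by the compiler,
                              and the call is likely to crash at run time */
    return 0;
}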

Implicit function declarations are so nasty that GCC had an option to flag them as errors long before the release of GCC 4.2 (going back at least as far as GCC 2.95), although it was named -Werror-implicit-function-declaration, and this older name for the option is still supported in current versions of GCC. If you don’t care too much about GCC 4.1 and older, and if you plan on using any of the other -Werror= options described below, you may just as well use the newer syntax consistently.

C++ doesn’t allow implicit function declarations and that’s probably a fair indicator that you don’t want to allow them when using C, either. If you’re using a library function that someone forgot to include in the header files, you will just have to settle for declaring your own prototype, although it’s probably best to do this in a wrapper around the library’s own header file. This can get a little messy, but it’s still better than relying upon implicit declarations.
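
For example, such a wrapper header might look like this (mylib and mylib_frobnicate() are hypothetical):
/* mylib_compat.h */
#ifndef MYLIB_COMPAT_H
#define MYLIB_COMPAT_H

#include <mylib.h>

/* missing from <mylib.h>; declared here so that calls can be checked */
int mylib_frobnicate(int level);

#endif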

-Werror=strict-prototypes

This option will catch function declarations that are not ANSI C prototypes. If the compiler catches one of these, that means one of two things. Either the function takes no parameters but this has not been properly indicated where the function is declared, or worse, the function does take parameters but these have not been listed in the declaration. We’re basically back to the same behaviour as with implicit function declarations, where the compiler works on the basis of the earliest call to the function within the source file (or the function’s definition, if that happens to be found first).

This tends to throw C++ programmers who are used to thinking of C as a subset of C++, but yes, an empty parameter list in C means ‘don’t check’, whereas in C++ it is equivalent to writing void. If you specify void for the parameter list, the compiler can check to make sure you’re not mistakenly passing parameters to a function that takes none in the belief that it will use them. If the function does actually take parameters, the compiler will detect that the function’s definition (i.e. the function itself) does not match its prototype and will throw a compilation error.

Basically, unless you’re using C++:
int foo();     /* WRONG: K&R declaration, args not checked */
int foo(void); /* RIGHT: ANSI C prototype */
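
To see the difference this makes, consider the following sketch: with the K&R-style declaration, a bogus call compiles silently, whereas the prototype would allow the compiler to reject it.
int foo();  /* flagged by -Werror=strict-prototypes */

int main(void)
{
    return foo(1, 2, 3);  /* accepted without complaint: the empty
                             parameter list means 'don't check' */
}

int foo(void)  /* the function takes no parameters at all */
{
    return 0;
}
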
-Werror=missing-prototypes

This catches cases where a function has been defined globally (i.e. without the static keyword) but the prototype for the function has not been found. If the function is only called locally, you can simply mark it as static and move on. Otherwise, you have either forgotten to include the header in which it is declared, or you have included the correct header but again, it has not been declared within that header. Either way, the result is the same: the compiler cannot check that the function prototype and function definition actually match, and so it cannot provide any guarantee that the function is being called correctly.
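
Sketching both remedies (helper(), helper_local() and util.h are hypothetical names):
#include "util.h"  /* declares 'int helper(int x);', satisfying
                      -Werror=missing-prototypes for the definition below */

int helper(int x) { return x * 2; }

/* a function only used within this file needs no prototype,
   provided it is marked static */
static int helper_local(int x) { return x * 3; }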

-Wformat -Werror=format-security

Certain incorrect uses of the printf() family of functions can, remarkably, result in security vulnerabilities. Even if security is not a direct concern of yours (perhaps all input that your program will receive can be safely trusted), this will still help to catch some common mistakes. -Wformat itself will check that the argument list matches the format string, and is included as part of -Wall. However, these warnings can sometimes be benign, so I wouldn’t strictly recommend turning them into errors in all cases. The more severe issues to look out for are those where too few arguments are passed, where an integral type is passed but a char * is expected for use with %s, or where the wrong type of pointer has been passed for the %s specifier.

-Werror=format-security is another matter, in that it specifically checks for situations where the format string is not a string literal, and no format arguments are passed. In other words, printf(str) has been used where printf("%s", str) or perhaps puts(str) would do just as well. In cases where str is not from a trusted source, this can result in an uncontrolled format string vulnerability, allowing arbitrary code to be executed when a maliciously constructed string is passed to the program.
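
For instance (a minimal sketch):
#include <stdio.h>

void greet(const char *name)
{
    printf(name);          /* rejected by -Werror=format-security */
    printf("%s\n", name);  /* fine: the format string is a literal */
}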

When -Wformat is used, GCC can also check that any wrappers for functions in the printf() family have been called correctly, provided their prototypes have been annotated with the format function attribute, which I shall demonstrate in a later post concerning the language extensions that GCC provides, including function, variable and type attributes.
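
Just as a taste of that (a minimal sketch; log_msg() is a hypothetical wrapper):
/* the wrapper's callers are now checked just as printf()'s are */
void log_msg(const char *fmt, ...)
    __attribute__((format(printf, 1, 2)));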

-Wall

This is an alias for a whole load of other warning options that catch many common mistakes, and comes strongly recommended, although I would not suggest turning these into errors, simply because this option includes -Wmaybe-uninitialized, which can produce false positives. Actually, the -Wmaybe-uninitialized option was only introduced in GCC 4.7; in older versions, the same false positives are produced by -Wuninitialized, which is also included in -Wall. With GCC 4.7 and later, you could perhaps use -Werror=all -Wno-error=maybe-uninitialized, but there is a fair likelihood that having GCC treat the other warnings enabled by -Wall as errors will prove inconvenient. It is worth keeping in mind that the checking performed for certain options included in -Wall varies depending on the level of code optimisation in use, and that the precise details of this, and the compiler’s propensity to report false positives, will vary between versions.
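
A typical shape for such a false positive is sketched below; whether GCC actually warns here varies with the version and optimisation level in use:
int f(int cond)
{
    int x;
    if (cond)
        x = 1;
    /* ... other work ... */
    if (cond)
        return x;  /* x is only read when it has been set, but the
                      compiler cannot always prove that */
    return 0;
}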

-Werror=parentheses

This helps catch cases where missing brackets or braces affect the behaviour of a program, and is included (as a warning) by -Wall. It actually checks for quite a few different omissions, but the main concerns are the accidental use of the assignment operator (=) instead of the equality operator (==) where a truth value is expected, and common mistakes relating to C’s operator precedence, which, as Dennis Ritchie admitted, is not ideal.
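
A couple of examples of what this catches (a sketch):
void example(void)
{
    int x = 0, y = 1, flags = 6, mask = 4;

    if (x = y)              /* assignment where '==' was probably meant */
        x = 2;
    if (flags & mask == 4)  /* parses as flags & (mask == 4), which is
                               almost certainly not what was intended */
        x = 3;
}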

-Werror=undef

This causes preprocessing to fail if an undefined identifier is used within a ‘#if’ directive, rather than simply substituting 0 and carrying on as if nothing had happened. Such an identifier is almost always the result of a missing include or a missing macro definition, which is probably not something you want ignored. One great thing about this option is that it allows you to switch from #ifdef, which offers little confidence when checking a definition that is not currently enabled (you could easily have made a typo, or failed to include the appropriate header), to #if, with complete assurance that if a feature is disabled, the check is correct and an explicit value of ‘0’ has been set.
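
For example, assuming a hypothetical config.h that defines FEATURE_FOO to either 0 or 1:
#include "config.h"

#if FEATURE_FOO  /* with -Werror=undef, a typo here (or a missing
                    include) is an error, rather than the check
                    silently evaluating to 0 */
void foo_init(void);
#endif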

One option to avoid

The -Wshadow option is one that is somewhat less useful than one might imagine. The tl;dr of the argument that Linus Torvalds makes against it: do you really want a warning (or even a compilation failure) whenever you define a local variable named index and you just happen to have included <strings.h>?

In closing

There are quite a few more options that you may wish to use, and my recommendation would be to read the Warning Options page of the GCC manual, if you want to know more.

glibc users may find that they get stricter warnings regarding the use of certain C library functions (including unused return values from read()) if they compile with -D_FORTIFY_SOURCE=2 and optimisation enabled; note that Ubuntu enables fortification by default.

If you’re using GCC 4.1, 4.2, 4.3, 4.4 or 4.5, you may also find -fdiagnostics-show-option useful: at the end of each warning message that is printed, it will add the name of the warning option that controls that message, allowing you to look up the entry for that option in GCC’s manual if you wish, and also making it easier to determine which option to disable should you need to. This became the default in GCC 4.6. (Also, ignore the typo in the documentation for it in GCC 4.1: the option name given here is the correct one.)

If you’re using GCC 4.9, you might choose to enable colourisation of GCC’s diagnostic messages. One way is simply to define the GCC_COLORS environment variable: when it is present in the environment, the default changes from -fdiagnostics-color=never to -fdiagnostics-color=auto (meaning that colourisation will occur when GCC’s output is to a terminal). In GCC 5.0, ‘auto’ will be the default.

There are various static code analysis tools that are worth considering, such as Sparse and Coccinelle (with an appropriate set of semantic match files), and there’s a good deal of run-time analysis that can be helpful too, e.g. AddressSanitizer, ThreadSanitizer and MemorySanitizer, and the venerable Valgrind.

Sunday, 4 January 2015

GCC *is* wonderful: a better ARRAY_SIZE macro

One of the best patch submissions I’ve ever seen is Rusty Russell’s change to Linux’s ARRAY_SIZE macro. For those that haven’t programmed much in C, I’ll briefly explain that the language provides no built-in construct for determining the number of elements in an array. However, it does allow one to determine the size of an array in bytes (well, chars, actually) and also the size of the array’s elements (also in chars) and you can then find the number of array elements by dividing the former by the latter.

This works reasonably well — well enough that to this day, C still does not include a language construct for determining array element counts. However, there’s disagreement as to what to call the macro for doing this, with Unix programmers typically preferring ARRAY_SIZE, Windows programmers sometimes preferring DIM, and some programmers preferring NELEMS or something else entirely. What’s worse, sometimes a macro isn’t used at all, leaving the calculation open-coded. Occasionally, when an open-coded calculation is used, the element size is written as a literal value instead of using C’s sizeof keyword, or the element type is explicitly specified instead of letting the compiler work that out itself.

Okay, so maybe that sounds a little cumbersome but not especially bad, right? Actually, it is bad: the ARRAY_SIZE macro as conventionally defined does not provide good type-safety, and the open-coded approach is more fragile, more verbose and provides no improvement in type-safety.

Type-safety of the conventional ARRAY_SIZE macro

The macro is usually defined as follows:
#define ARRAY_SIZE(array) \
    (sizeof(array) / sizeof(*array))
or:
#define ARRAY_SIZE(array) \
    (sizeof(array) / sizeof(array[0]))
As C uses the same syntax for pointer dereferencing and the indexing of arrays, these are equivalent. The macro would be used as follows:
int i;
int a[100];
for (i = 0; i < ARRAY_SIZE(a); i++) {
    a[i] = i;
}
Some C programmers presumably feel that the version of the definition that uses ‘[0]’ better conveys that the parameter to the macro is to be an array. On the contrary, I feel that ‘*’ better conveys that the parameter could be a pointer. Of course, if the parameter is a pointer, the calculation is unlikely to produce the desired value. For example:
int i;
int a[100];
int *p = a;
for (i = 0; i < ARRAY_SIZE(p); i++) { /* WRONG */
    a[i] = i;
}
ARRAY_SIZE(p) expands to sizeof(p) / sizeof(*p). Although sizeof(*p) provides the expected value, sizeof(p) is equivalent to sizeof(int *), and not sizeof(int[100]) as intended. On most general-purpose platforms in use today, the size of any pointer type will be either four chars (for 32-bit systems) or eight chars (for 64-bit systems). Depending on the array’s element size, the result of the integer division will be 8, 4, 2, 1 or 0, regardless of the number of elements. In the above example, the vast majority of the elements in the array will be left uninitialised.

Competent C programmers are therefore somewhat careful to ensure that they never use ARRAY_SIZE directly with a pointer variable, and are also careful in cases where they might legitimately use the ARRAY_SIZE macro to get the size of an array that is the member of a structure that is referenced through a pointer, as shown:
int i;
struct foo {
    int a[100];
} bar;
struct foo *p = &bar;
for (i = 0; i < ARRAY_SIZE(p->a); i++) {
    p->a[i] = i;
}
It’s also worth considering the fragility of the open-coded approach, but only because it’s still frightfully common. Here, if the array were to change from an array of int to an array of long int, we would need to update the open-coded calculation to suit:
int i;
int a[100];
for (i = 0; i < sizeof(a) / sizeof(int); i++) { /* UGH */
    a[i] = i;
}
There’s simply no reason to burden the programmer with this task, especially when, every so often, a failure to do so will result in a severe bug.

One danger of the example above, where the array member p->a is used, is that if the struct were changed to contain a pointer instead of an array, the code would still compile without modification. Note that open-coding the calculation does nothing to prevent this.

Rusty Russell’s patch to Linux’s ARRAY_SIZE macro

There are at least three awesome things about Rusty’s patch. These are:
  1. The functionality. With this version of ARRAY_SIZE, one never needs to worry about accidentally taking the array size of a pointer, as any code doing this would quite simply fail to compile when using a sufficiently recent version of GCC (such as GCC 3.1, released on 15th May 2002) or Clang. The error message is somewhat opaque, but it’s better than the bug that would otherwise result.
  2. The subject: ‘[PATCH] Use more gcc extensions in the Linux headers’. It’s brilliantly attention-getting, but also conveys quite well that the submission was more of a request for comment than anything else, at that early stage. There’s a sense of humour about this which is wonderfully disarming, and what followed was a short discussion and a far tidier patch which was accepted soon afterwards. The follow-up patch addressing the lack of documentation is worth a quick mention, too: it simply added the comment /* GCC is awesome. */ — of course, this too was only intended as a bit of light humour, and the eventual change provides a far better explanation.
  3. The directness of the original submission. Rather than attempt to explain it, Rusty simply leaves it as a challenge to the reader! I wouldn’t suggest accepting code in this form, but I found it interesting to read his original patch and to make sense of the change. Oddly, it was probably easier than it would have been to make sense of anybody else’s explanation, or to look at the eventual implementation of the __must_be_array macro.
Rusty’s version of the ARRAY_SIZE macro, with the added ‘documentation’ (and with some whitespace changes), is as follows:
/* GCC is awesome. */
#define ARRAY_SIZE(arr) \
    (sizeof(arr) / sizeof((arr)[0]) \
     + sizeof(typeof(int[1 - 2 * \
           !!__builtin_types_compatible_p(typeof(arr), \
                 typeof(&arr[0]))])) * 0)
I won’t explain this in full, so as to provide some of the same challenge to you, except to say:
  • The initial division is the same as before; the addend must equal zero to leave this behaviour unchanged.
  • The type of the addend must be size_t (as provided by sizeof) for all expansions of the macro to retain the type of size_t.
  • typeof(int[-1]) is an invalid expression under GCC (as is ‘sizeof(struct { int : -1; })’, upon which current kernels rely).
  • typeof and __builtin_types_compatible_p are both GCC extensions, as you may have guessed!

Usable ARRAY_SIZE macro with type-safety under GCC

My only question: why should we have to go to so much trouble? Well, I’m not sure how far fair use extends (as these macros are from QEMU and Linux, which are distributed under the GPLv2) and IANAL, but this may help:
#if defined(__GNUC__) && defined(__GNUC_MINOR__)
 #define GNUC_VERSION \
     (((__GNUC__) << 16) + (__GNUC_MINOR__))
 #define GNUC_PREREQ(maj, min) \
     (GNUC_VERSION >= ((maj) << 16) + (min))
#else
 #define GNUC_PREREQ(maj, min) 0
#endif
#define BUILD_BUG_ON_ZERO(e) \
    (sizeof(struct { int:-!!(e)*1234; }))
#if GNUC_PREREQ(3, 1)
 #define SAME_TYPE(a, b) \
     __builtin_types_compatible_p(typeof(a), typeof(b))
 #define MUST_BE_ARRAY(a) \
     BUILD_BUG_ON_ZERO(SAME_TYPE((a), &(*a)))
#else
 #define MUST_BE_ARRAY(a) \
     BUILD_BUG_ON_ZERO(sizeof(a) % sizeof(*a))
#endif
#ifdef __cplusplus
 template <typename T, size_t N>
 char ( &ARRAY_SIZE_HELPER( T (&array)[N] ))[N];
 #define ARRAY_SIZE( array ) \
      (sizeof( ARRAY_SIZE_HELPER( array ) ))
#else
 #define ARRAY_SIZE(a) ( \
     (sizeof(a) / sizeof(*a)) \
     + MUST_BE_ARRAY(a))
#endif
With this version, if you take the ARRAY_SIZE of a pointer, GCC 4.9 will give you the error message:
error: negative width in bit-field ‘<anonymous>’
This will also work for Clang (as it curiously defines __GNUC__ and __GNUC_MINOR__):
error: anonymous bit-field has negative width (-1234)
The magic value of −1234 is intended as something of a give-away as to the cause of the error, should it occur, assuming that some poor soul will be shown this line on its own without the subsequent lines of output that backtrack through successive layers of macro expansion. At least this way, you would have something to grep for!

I’ve added the nice hack from Chromium’s version of the macro that provides somewhat patchy checking for compilers without __builtin_types_compatible_p or typeof. It may be worth noting here that the approach Chromium takes is to use ‘0[a]’ instead of ‘*a’, exploiting the equivalence between pointer dereferencing and array indexing in C (a[i] is equivalent to *(a+i) and therefore to i[a]) but catching operator overloading (the use of an operator[] method to handle subscripting for a class) as well.
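
In other words, something along these lines (a reconstruction of the idea, not Chromium’s exact macro):
/* '0[a]' compiles only for true arrays and pointers; a class with an
 * overloaded operator[] is rejected, as the built-in subscript
 * operator requires a pointer or array operand */
#define ARRAY_SIZE_UNSAFE(a) (sizeof(a) / sizeof(0[a]))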

For C++, rather than adopting Chromium’s trick, I’ve included a portable template-based version in the above, although this has the disadvantage that it cannot be used with types defined within a function.

What would actually make sense would be for C and C++ (and Objective C, assuming that’s also affected... and let’s just pretend Objective C++ doesn’t exist) to be extended to include a keyword or some syntax for this. In lieu of any provision for this in the standards, I feel that it would make good sense for compilers to add this as an extension. In GCC and Clang, I imagine a __builtin_arraysize function would probably make a whole lot of sense. Yes, I know things are a lot better in C++, but there may still be times when you can’t escape from its C roots, such as when writing language bindings, or when using a library written in C.

Really, this should have been fixed thirty years ago!