Wednesday, July 6, 2011

Writing a C library, intro, conclusion and errata

This is a series of blog-posts about best practices for writing C libraries. See below for each part and the topics covered.

Table of contents

The entire series about best practices for writing C libraries covered 15 topics and was written over five parts posted over the course of approximately one week. Feel free to hotlink directly to each topic but please keep in mind that the content (like any other content on this blog) is copyrighted by its author and may not be reproduced without his consent (if you are friendly towards free software, like e.g. LWN, just ask and I will probably give you permission):

Topics not covered

Some topics relevant for writing a C library isn't (yet?) covered in this series either because I'm not an expert on the topic, the topic is still in development or for other reasons:
  • Networking
    You would think IP networking is easy but it's really not and the low-level APIs that are part of POSIX (e.g. BSD Sockets) are not really that helpful since they only do part of what you need. Difficult things here include name resolution, service resolutionproxy server handling, dual-stack addressing and transport security (including handling certificates for authentication).

    If you are using modern GLib networking primitives (such as GSocketClient or GSocketService) all of these problems are taken care of for you without you having to do much work; if not, well, talking to people (or at least, read the blogs) such as Dan Winship, Dan Williams or Lennart Poettering is probably your best bet.

  • Build systems
    This is a topic that continues to make me sad so I decided to not really cover it in the series because the best guidance I can give is to just copy/paste whatever other projects are doing - see e.g. the GLib source tree for how to nicely integrate unit testing (see Makefile.decl) and documentation (see docs/reference sub-directories) into the build system (link).

    Ideally we would have a single great IDE for developing Linux libraries and applications (integrating testing, documentation, distribution, package building and so on - see e.g. Sami's libhover video) but even if we did, most existing Linux programmers probably wouldn't use it because they are so used to e.g. emacs or vi (if you build it, they will come?). There's a couple of initiatives in this area including Eclipse CDT, Anjuta, KDevelop and MonoDevelop.

  • Bundling libraries/resources
    The traditional way of distributing applications on Linux is through so-called Linux distributions - the four most well-known being DebianFedoraopenSUSE and Ubuntu (in alphabetical order!). These guys, basically, take your source code, compile it against some version of other software it depends on (usually a different version than you, the developer, used), and then ship binary packages to users using dynamic linking.

    There's a couple of problems with this legacy model of distributing software (this list is not exhaustive): a) it can take up to one or two distribution release cycles (6-12 months) before your software is available to end users; and b) user X can't give a copy of the software to user Y - he can only tell him where to get it (it might not be available on user Y's distro); and c) it's all a hodgepodge of version skew e.g. the final product that your users are using is, most likely, using different versions of different libraries so who knows if it works; and d) the software is sometimes changed in ways that you, the original author, wasn't expecting or does not approve of (for example, by removing credits); and e) the distribution might not forward you bug reports or may forward you bug reports that is caused by downstream patches; and f) there's a peer pressure to not depend on too new libraries because distributions want to ship your software in old versions of their OS - for example, Mozilla wants to be able to run on a system with just GTK+ 2.8 installed (and hence won't use features in GTK+ 2.10 or later except for using dlopen()-techniques), similar for e.g. Google Chrome (maybe with a newer GTK+ version though). These problems are virtually unknown to developers on other platforms such as Microsoft Windows, Mac OS X or even some of the smartphone platforms such as iOS or Android - they all have fancy tools that bundles things up nicely so the developers won't have to worry about such things.

    There's a couple of interesting initiatives in this area see e.g. bockbuild, glick and the proposal to add a resource-system to GLib. Note that it's very very hard to do this properly since it depends not only on fixing a lot of libraries so they are relocatable, it also depends on identifying exactly what kind of run-time requirements each library in question has. The latter includes the kernel/udev version, the libc version (unless bundled or statically linked), the X11 server version (and its extensions such as e.g. RENDER) version, the presence of one or more message buses and so on. With modern techniques such as direct rendering this becomes even harder if you want to take advantage of hardware acceleration since you must assume that the host OS is providing recent enough versions of e.g. OpenGL or cairo libraries (since you don't want to bundle hardware drivers). And even after all this, you still need to deal with how each distribution patches core components. In some circumstances it might end up being easier to just ship a kernel+runtime along with the application, virtualized.
The way the series is set up is so it can be extended at a later point - so if there is a demand for one or more popular topics about writing a C library, I might write another blog entry and add it to this page as it's considered the canonical location for the entire series.

Errata

Please send me feedback and I will fix up the section in question and credit you here (I already have a couple of corrections lined up that I will add later).

Tuesday, July 5, 2011

Writing a C library, part 5

This is part five in a series of blog-posts about best practices for writing C libraries. Previous installments: part one, part two, part three, part four.

API design

A C library is, almost by definition, something that offers an API that is used in applications. Often an API can't be changed in incompatible ways (it can, however, be extended) so it is usually important to get right the first time because if you don't, you and your users will have to live with your mistakes for a long time.

This section is not a a full-blown guide to API design as there's a lot of literature, courses and presentations available on the subject - see e.g. Designing a library that's easy to use - but we will mention the most important principles and a couple of examples of good and bad API design.

The main goals when it comes to API design is, of course, to make the API easy to use - this include choosing good names for types, functions and constants. Be careful of abbreviations - atof might be quick to type but it's not exactly clear that the function parses a C string and returns a double (no, not a float as the name suggests). Typically nouns are used for types and while verbs are used for methods.

Another thing to keep in mind is the number of function arguments - ideally each function should take only a few arguments so it's easy to remember how to use it. For example, no-one probably ever remembers exactly what arguments to pass to g_spawn_async_with_pipes() so programmers end up looking up the docs, breaking the rhythm. A better approach (which is yet to be implemented in GLib), would be to create a new type, let's call it GProcess, with methods to set what you'd otherwise pass as arguments and then a method to spawn the actual program. Not only is this easier to use, it is also extensible as adding a method to a type doesn't break API while adding an argument to an existing function/method does. An example of such an API is libudev's udev_enumerate API - for example, at the time udev starting dealing with device tags, the udev_enumerate type gained the add_match_tag() method.

If using constants, it is often useful to use the C enum type since the compiler can warn if a switch statement isn't handling all cases. Generally avoid boolean types in functions and use flag enumerations instead - this has two advantages: first of all, it's sometimes easier to read foo_do_stuff(foo, FOO_FLAGS_FROBNICATOR) than foo_do_stuff(foo, TRUE) since the reader does not have to expend mental energy on remembering if TRUE translates into whether the frobnicator is to be used or not. Second, it means that several booleans arguments can be passed in one parameter so hard-to-use functions like e.g. gtk_box_pack_start() can be avoided (most programmers can't remember if the expand or fill boolean comes first). Additionally, this technique allows adding new flags without breaking API.

Often the compiler can help - for example, C functions can be annotated with all kinds of gcc-specific annotations that will cause warnings if the user is not using the function correctly. If using, GLib, some of these annotations are available as macros prefixed with G_GNUC, the most important ones being G_GNUC_CONST, G_GNUC_PURE, G_GNUC_MALLOC, G_GNUC_DEPRECATED_FORG_GNUC_PRINTF and G_GNUC_NULL_TERMINATED.

Checklist

  • Choose good type and function names (favor expressiveness over length).
  • Keep the number of arguments to functions down (consider introducing helper types).
  • Use the type system / compiler to your advantage instead of fighting it (enums, flags, compiler annotations).

Documentation

If your library is very simple, the best documentation might just be a nicely formatted C header file with inline comments. Often it's not that simple and people using your library might expect richer and cross-referenced documentation complete with code samples.

Many C libraries, including those in GLib and GNOME itself, use inline documentation tags that can be read by tools such as gtk-doc or Doxygen. Note that gtk-doc works just fine even on low-level non-GLib-using libraries - see e.g. libudev and libblkid API documentation.

If used with a GLib library, gtk-doc uses the GLib type system to draw type hierarchies and show type-specific things like properties and signals. gtk-doc can also easily integrate with any tool producing Docbook documentation such as manual pages or e.g. gdbus-codegen(1) when used to generate docs describing D-Bus interfaces (example with C API docs, D-Bus docs and man pages).

Checklist

  • Decide what level of documentation is needed (HTML, pdf, man pages, etc.).
  • Try to use standard tools such as Doxygen or gtk-doc.
  • If shipping commands/daemons/helpers (e.g. anything showing up in ps(1) output), consider shipping man pages for those as well.

Language bindings

C libraries are increasingly used from higher-level languages such as Python or JavaScript through a so-called language binding - for example, this is what allows the Desktop Shell in GNOME 3 to be written entirely in JavaScript while still using C libraries such as GLib, Clutter and Mutter underneath.

It's outside the scope of this article to go into detail on language bindings (however a lot of the advice given in this series does apply - see also: Writing Bindable APIs) but it's worth pointing out that the goal of the GObject Introspection project (which is what is used in GNOME's Shell) is aiming for 100% coverage of GLib libraries assuming the library is properly annotated. For example, this applies to the GUdev library (a thin wrapper on top of the libudev library) can be used from any language that supports GObject Introspection (JS example).

GObject Intropspection is interesting because if someone adds GObject Introspection support to a new language, X, then the GNOME platform (and a lot of the underlying Linux plumbing as well cf. GUdev) is now suddenly available from that language without any work.

Checklist

  • Make sure your API is easily bindable (avoid C-isms such as variadic functions).
  • If using GLib, set up GObject Introspection and ship GIR/typelibs (notes).
  • If writing a complicated application, consider writing parts of it in C and parts of it in a higher-level language.

ABI, API and versioning

While the API of a library describes how the programmer use it, the ABI describes how the API is mapped onto the target machine the library is running on. Roughly, a (shared) library is said to be compatible with a previous version if a recompile is not needed. The ABI involves a lot of factors including data alignment rules, calling conventions, file formats and other things that are not suitable to cover in this series; the important thing to know about when writing C libraries is how (and if) the ABI changes when the API changes. Specifically, since some changes (such as adding a new function) are backwards compatible, the interesting question is what kind of API changes result in non-backwards-compatible ABI changes.

Assuming all other factors like calling convention are constant, the rule of thumb about compatibility on the ABI level basically boils down to a very short list of allowed API changes:
  • you may add new C functions; and
  • you may add parameters to a function only if it doesn't cause a memory/resource leak; and
  • you may add a return value to a function returning void only if it doesn't cause a memory leak; and
  • modifiers such as const may be added / removed at will since they are not part of the ABI in C
The latter is an example of a change that breaks the API (causing compiler warnings when compiling existing programs that used to compile without warnings) but preserve the ABI (still allowing any previously compiled program to run) - see e.g. this GLib commit for a concrete example (note that this can't be done in C++ because of how name mangling work).

In general, you may not extend C structs that the user can allocate on the stack or embed in another C structure which is why opaque data types are often used since they can be extended without the user knowing. In case the data type is not opaque, an often used technique is to add padding to structs (example) and use it when adding a new virtual method or signal function pointer (example). Other types, such as enumeration types, may normally be extended with new constants but existing constants may not be changed unless explicitly allowed.

The semantics of a function, e.g. its side effect, is usually considered part of the ABI. For example, if the purpose of a function is to print diagnostics on standard output and it stops doing it in a later version of the library, one could argue it's an ABI break even when existing programs are able to call the function and return to the caller just fine possibly even returning the same value.

On Linux, shared libraries (similar to DLLs on Windows) use the so-called soname to maintain and provide backwards-compatibility as well as allowing having multiple incompatible run-time versions installed at the same time. The latter is achieved by increasing the major version number of a library every time a backwards-incompatible change is made. Additionally, other fields of the soname have other (complex) rules associated (more info).

One solution to managing non-backwards-compatible ABI changes without bumping the so-number is symbol versioning - however, apart from being hard to use, it only applies to functions and not e.g. higher-level run-time data structures like e.g. signals, properties and types registered with the GLib type-system.

It is often desirable to have multiple incompatible versions of libraries and their associated development tools installed at the same time (and in the same prefix) - for example, both version 2 and 3 of GTK+. To easily achieve this, many libraries (including GLib and up) include the major version number (which is what is bumped exactly when non-backwards-compatible changes are made) in the library name as well as names of tools and so on - see the Parallel Installation essay for more information.

Some libraries, especially when they are in their early stages of development, specifically gives no ABI guarantees (and thus, does not manage their soname when incompatible changes are made). Often, to better manage expectations, such unstable libraries require that the user defines a macro acknowledging this (example). Once the library is baked, this requirement is then removed and normal ABI stability rules starts applying (example).

Related to versioning, it's important to mention that in order for your library to be easy to use, it is absolutely crucial that it includes pkg-config files along with the header files and other development files (more information).

Checklist

  • Decide what ABI guarantees to give if any (and when)
  • Make sure your users understand the ABI guarantees (being explicit is good)
  • If possible, make it possible to have multiple incompatible versions of your library and tools installed at the same time (e.g. include the major version number in the library name)

Friday, July 1, 2011

Writing a C library, part 4

This is part four in a series of blog-posts about best practices for writing C libraries. Previous installments: part one, part two, part three.

Helpers and daemons

Occasionally it's useful for a program or library to call upon an external process to do its bidding. There are many reasons for doing this - for example, the code you want to use
  • might not be easily used from C - it could be written in say, python or, gosh, bash; or
  • could mess with signal handlers or other global process state; or
  • is not thread-safe or leaking or just bloated; or
  • its error handling is incompatible with how your library does things; or
  • the code needs elevated privileges; or
  • you have a bad feeling about the library but it's not worth (or (politically) feasible) to re-implement the functionality yourself. 
There are three main ways of doing this.

The first one is to just call fork(2) and start using the new library in the child process - this usually doesn't work because chances are that you are already using libraries that cannot be reliably used after the fork() call as discussed in previously (additionally, a lot of unnecessary COW might be happen if the parent process has a lot of writable pages mapped). If portability to Windows is a concern, this is also a non-starter as Windows does not have fork() or any meaningful equivalent that is as efficient.

The second way is to write a small helper program and distribute the helper along with your library. This also uses fork() but the difference is that one of the exec(3) functions is called immediately in the child process so all previous process state is cleaned up when the process image is replaced (except for file descriptors as they are inherited across exec() so be wary of undesired leaks). If using GLib, there's a couple of (portable) useful utility functions to do this (including support for automatically closing file descriptors).

The third way is to have your process communicate with a long-lived helper process (a socalled daemon or background process). The helper daemon can be launched either by dbus-daemon(1) (if you are using D-Bus as the IPC mechanism), systemd if you are using e.g. Unix domain sockets, an init script (uuidd(8) used to do this - wasteful if your library is not going to get used) or by the library itself.

Helper daemons usually serve multiple instances of library users, however it is sometimes desirable to have a helper daemon instance per library user instance. Note that having a library spawn a long-lived process by itself is usually a bad idea because the environment and other inherited process state might be wrong (or even insecure) - see Rethinking PID 1 for more details on why a good, known, minimal and secure working environment is desirable. Another thing that is horribly difficult to get right (or, rather, horribly easy to get wrong) is uniqueness - e.g. you want at most one instance of your helper daemon - see Colin's notes for details and how D-Bus can be used and note that things like GApplication has built-in support for uniqueness. Also, in a system-level daemon, note that you might need to set things like the loginuid (example of how to do this) so things like auditing work when rendering service for a client (this is related to the Windows concept known as impersonation).

As an example, GLib's libproxy-based GProxy implementation uses a helper daemon because dealing with proxy servers involves a interpreting JavaScript (!) and initializing a JS interpreter from every process wanting to make a connection is too much overhead not to mention the pollution caused (source, D-Bus activation file - also note how the helper daemon is activated by simply creating a D-Bus proxy).

If the helper needs to run with elevated privileges, a framework like PolicyKit is convenient to use (for checking whether the process using your library is authorized) since it nicely integrates with the desktop shell (and also console/ssh logins). If your library is just using a short-lived helper program, it's even simpler: just use the pkexec(1) command to launch your helper (example, policy file).

As an aside (since this write-up is about C libraries, not software architecture), many subsystems in today's Linux desktop are implemented as a system-level daemons (often running privileged) with the primary API being a D-Bus API (example) and a C library to access the functionality either not existing at all (applications then use generic D-Bus libraries or tools like gdbus(1) or dbus-send(1)) or mostly generated from the IDL-like D-Bus XML definition files (example). It's useful to contrast this approach to libraries using helpers since one is more or less upside down compared to the other.

Checklist

  • Identify when a helper program or helper daemon is needed
  • If possible, use D-Bus (or similar) for activation / uniqueness of helper daemons.
  • Communicating with a helper via the D-Bus protocol (instead of using a custom binary protocol) adds a layer of safety because message contents are checked.
  • Using D-Bus through a message bus router (instead of peer-to-peer connections) adds yet another layer of safety since the two processes are connected through an intermediate router process (a dbus-daemon(1) instance) which will also validate messages and disconnects processes sending garbage.
  • Hence, if the helper is privileged (meaning that it must a) treat the unprivileged application/library using it as untrusted and potentially compromised; and b) validate all data to it - see Wheeler's Secure Programming notes for details), activating a helper daemon on the D-Bus system bus is often a better idea than using a setuid root helper program spawned yourself.
  • If possible, in particular if you are writing code that is used on the Linux desktop, use PolicyKit (or similar) in privileged code to check if unprivileged code is authorized to carry out the requested operation.

Testing

A sign of maturity is when a library or application comes with a test suite; a good test suite is also incredible useful for ensuring mostly bug-free releases and, more importantly, ensuring that the maintainer is comfortable putting releases out without loosing too much sleep or sanity. Discussing specifics of testing is out of the scope for a series on writing C libraries, but it's worth pointing to the GLib test framework, how it's used (example, example and example) and how this is used by e.g. the GNOME buildbots.

One metric for measuring how good a test suite is (or at least how extensive it is), is determining how much of the code it covers - for this, the gcov tool can be used - see notes on how this is used in D-Bus. Specifically, if the test suite does not cover some edge case, the code paths for handling said edge case will appear as never being executed. Or if the code base handles OOM but the test suite isn't set up to handle it (for example, by failing each allocation) the code-paths for handling OOM should appear as untested.

Innovative approaches to testing can often help - for example, Mozilla employ a technique known as reftests (see also: notes on GTK+ reftests) while the Dracut test suite employs VMs for both client and server to test that booting from iSCSI work.

Checklist

  • Start writing a test suite as early as possible.
  • Use tools like gcov to ascertain how good the test suite is.
  • Run the test suite often - ideally integrate it into the build system ('make check'), release procedures, version control etc.