As with any project, you need to have a goal in mind for what the project should accomplish. Here I will discuss the goals of welibc and how I plan to achieve them.

Goal

To create an implementation of the standard C library, as defined in ISO/IEC 9899:1990, which is clear, secure, and well documented.

Scope

The user mode portions of the standard C library will be implemented first. This means that functions like free, fwrite, and malloc will be saved until the end, because they depend on OS- and architecture-specific code which, by definition, is not portable and could stall the project. This project is not aimed at providing an implementation which will run on every machine, but rather one which can serve as a reference or perhaps as the base for other projects with similar goals.

Plan

I don't think that any one of the three sub-goals is more important than the others, but it is important to discuss a preliminary plan for achieving them. I think that implementing one function a week is a moderate goal which allows me time to write the function, explore different ways of achieving the same results, write tests for the function, and document the work. Now that I've said that, maybe two weeks is better. We'll see.

Code Clarity

Having code which is easy to read and digest is very important if you plan on maintaining it or passing it on to someone else. It can make all the difference in the world when the previous programmer (which might be you!) has taken the time to write code which makes their intent plain, so that you can verify it was implemented correctly. I briefly touched on this before, but I'll give another example. Which code sample below would you prefer to debug?

Sample 1

/* Print the number of command-line arguments passed in */
#include <stdio.h>

int
main(int argc, char *argv[])
{
    printf("You passed in %d argument%s\n",argc,(argc!=1)?"s":"");
    return 0;
}

Sample 2

/* Print the number of command-line arguments passed in */
#include <stdio.h>

int
main(int argc, char *argv[])
{
    int numArgs = argc;

    printf("You passed in %d argument", numArgs);

    if (1 != numArgs)
    {
        printf("s");
    }

    printf("\n");

    return 0;
}

Can you spot the error? How would you fix it? These two code examples do exactly the same thing, and while the first has fewer lines of code and fewer function calls, I would personally prefer to be debugging the 2nd example.

Using a style guide can help a lot with writing clear code. It helps ensure that your code is written in a consistent manner and that there is a defined naming scheme for files, functions, and variables. Previously I said that I would just use the coding style I developed on my own, which isn't documented, but that seems to go against the goals of the project. So, I'll use the Google Style Guide as a template to write my own style guide which I expect to grow in size and specificity as the project progresses. It would be nice if I could write it in markdown and include it as a part of my directory structure then link a blog post to the file itself, rather than updating a single blog post.

Here are corrected versions of the code samples above. The bug is that argc includes the program name in its count, so the number of arguments passed to the program is argc - 1.

Corrected Sample 1

/* Print the number of command-line arguments passed in */
#include <stdio.h>

int
main(int argc, char *argv[])
{
    printf("You passed in %d argument%s\n",argc-1,((argc-1)!=1)?"s":"");
    return 0;
}

Corrected Sample 2

/* Print the number of command-line arguments passed in */
#include <stdio.h>

int
main(int argc, char *argv[])
{
    int numArgs = argc - 1;

    printf("You passed in %d argument", numArgs);

    if (1 != numArgs)
    {
        printf("s");
    }

    printf("\n");

    return 0;
}

It's worth noting that fixing the first example requires changing code in two places, while the 2nd example only requires a change in one. This can cause problems during maintenance since you rely on the programmer remembering to update every instance of the error.

Writing Secure Code

Some portions of the standard C library will never be secure by their nature, but I want to do as much as I can to create an implementation which is as secure as possible within the confines of the standard.

A great resource for writing secure C code is the CERT C Coding Standard, a free resource that is continuously updated. If you prefer a hardcopy, they took a snapshot of their standard and printed it; it's about $55 on Amazon. I haven't read the entire standard, but I'll try to abide by its rules as much as possible. Also, the bibliography section has a plethora of other fantastic C resources.

Testing will also be a large component of writing secure code. No matter how skilled a programmer you are, or how sure you are that your code is correct, I don't think it would ever be wise to deploy it without testing. I still have some research to do on testing frameworks for C, but this is something I plan on doing from the beginning. I don't think this will be test-driven development, but tests will certainly be included.

There is also the idea of defensive programming, where you write code in a way that makes it less likely to contain bugs. This doesn't mean you write code without bugs, but that the bugs you do introduce will reveal themselves sooner rather than later. A good example of this is below:

/* Print the first argument to the screen */
#include <stdio.h>

int
main(int argc, char *argv[])
{
    if (argc = 2)
    {
        printf("Please provide one argument to be printed.\n");
    }
    else
    {
        printf("%s", argv[1]);
    }

    return 0;
}

First, do you see the bug and can you determine what behavior it causes?

Compiling this with gcc -o defensive defensive.c will not produce any warnings or errors. But no matter what input you give, it will always print "Please provide one argument to be printed." This is because if (argc = 2) is not a comparison at all: it assigns the value 2 to the variable argc, and the expression evaluates to 2, which is nonzero and therefore true in C. I meant to write if (argc != 2). We can guard against this kind of typo by placing the constant on the left side of the expression. Making the same mistake then looks like this:

/* Print the first argument to the screen */
#include <stdio.h>

int
main(int argc, char *argv[])
{
    if (2 = argc)
    {
        printf("Please provide one argument to be printed.\n");
    }
    else
    {
        printf("%s", argv[1]);
    }

    return 0;
}

Now when we compile with gcc -o defensive defensive.c we get the error:

defensive.c: In function ‘main’:
defensive.c:5:11: error: lvalue required as left operand of assignment
     if (2 = argc)
           ^

So, simply by changing the way we write comparisons we can catch bugs like this early since the code won't even compile! See if you can find the other bug in this code snippet (hint).

Another way to program securely is to take advantage of the features of your compiler. For example, compiling the original example with gcc -Wall -o defensive defensive.c would give the following warning:

defensive.c: In function ‘main’:
defensive.c:5:5: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
     if (argc = 2)
     ^

That is better than nothing, but the code still compiles. For this reason, I prefer writing the code defensively because it's not always guaranteed that someone else will be compiling your code with the same settings. Yes, there are makefiles or build settings for a project, but copy/pasting code is very common and doesn't capture your specific build flags.

Documentation

Again, documentation is very important, and for that reason I want to provide a standard C library which is well documented. I will be using Doxygen to generate documentation from the source code, which requires that my code be commented in specific ways so that the Doxygen parser can read it. This will force me to comment every file and function appropriately, which pushes me towards meeting this goal.

In addition to documenting the source code, my plan is to document my progress as I go with blog posts that discuss how I arrived at the implementation of each function. I hope that other programmers will find this documentation useful for answering questions like "How does this work?", "Why did you do it this way?", or "What is the best way to _____?". This will also help me have a log to look back over to see why I chose specific implementations and what worked or didn't work the way I wanted it to.
