Blog

string.h, a retrospective

September 03, 2021

Note: this implementation of string.h is missing the locale functions - when they're added this post will be updated.

Call(er) Graphs

Many of the string functions call other string functions to do their work and from a library perspective this is nice. You write the logic for string length calculation once and just as users would call into your library to calculate a string's length, the library can do the same. You may wonder what the various calls chains looks like within the library itself - who calls what?

Conveniently, doxygen can generate this for us as long as the appropriate settings are enabled. Doxygen not only generates call graphs (what functions does I call), but also caller graphs (what functions call me). We'll explore these graphs using the metaphor of a house.

Foundation Functions

At the lowest level, certain functions form the foundation of string.h - they're what everything else is built on top of. These will provide the most interesting caller graphs and should have an empty call graph.

memcmp

If you abstract the concept of "search for a character" to "search for a byte" then it makes sense that memcmp is a foundational function.

memmove

Another service provided by the string header is copying data from one location in memory to another. At the root of this is the memmove function. memcpy could be here as well but in this library it's a wrapper for memmove.

memset

One of the more important functions that's often used for initializing memory, memset offers the ability to copy a value to each byte of a region of memory.

strchr

Locating a specific character within a string is key to many of the string functions. For the more complicated functions this lets us skip ahead to the next meaningful starting point for a search or comparison.

strlen

This is the absolute workhorse of string.h and as such it has the largest caller graph. Most string functions focus on searching the string in some fashion, so knowing where the strings ends is important to knowing when the search can stop if no match is found earlier.

What a graph!

Structure Functions

Next would be functions that both call and are called by other functions in the library. As such, these have two graphs to enjoy.

strcpy

strcspn

strncmp

strspn

Roof Functions

These functions are at the highest level and only call other string functions so they won't have a caller graph within the context of string.h.

memcpy

As mentioned before, memcpy is only here because it's implemented as a wrapper around memcpy - other libraries may implement this as a foundation function.

strcat

strcmp

strncat

strncpy

strpbrk

strrchr

strstr

strtok

Shed Functions

There are only two functions which neither call nor are called by other string functions. Since these are close by but not tied to other functions, "shed" seems appropriate.

Performance Considerations

These graphs aren't just pretty, they also provide a means for guiding how performance could be improved in a library. If performance is a concern then improving the most used functions will be the fastest way to better performance. Using our house analogy from above, start at the foundation and work your way up and out to the shed. By improving the foundation you are improving everything that's built on top of it - just like the stability of a real house.

Error Checking

Since most functions in this library perform error checking before doing work, it's entirely possible that the same error condition is checked multiple times. This is likely the case with pointer validation as the pointer is passed from roof to structure to foundation. With a call chain like strtok to strcspn to strlen, the s1 argument to strtok will be checked three times!

This knowledge can lead to different design decisions based on the goal of the project. If this checking is excessive then the library can provide "internal use only" versions of the functions which don't perform error checking on the arguments. This may not guarantee one check per argument in all cases, but it can reduce the number of checks to only what's necessary.

Implementation Considerations

Similar to the value that these graphs provide to improving performance, they also provide a roadmap for implementing the library in a progressive order. Follow the same strategy of starting at the foundation and working up and out if you want a library which calls into itself. If not then you can start anywhere you like since each function would be completely self-contained.

Administrivia

A short note on the format of the posts. I'll probably rework the sequence of: local variables, error checking, implementation, tests. This isn't how I think about solving problems with C. While it may be how you would explain a written piece of code to someone else (as in going from top to bottom through a function), it doesn't communicate what I want.

Going forward the posts will discuss the main algorithm first and then cover variables needed to make that happen. Once the main functionality is explained then it's more appropriate to discuss error conditions and then tests logically follow that. We'll see if that sequence works better.

strerror

July 25, 2021

tags: errno.h, strerror, string.h

The strerror function maps an error number into a string description of that error.

char *
strerror(int errnum);

It returns a pointer to the string representation of the error, the contents of which are implementation-defined.

Local Variables

We'll use a local variable to point to the string description corresponding to the received error number. This let's us modify what string is pointed to and have a single return statement (as we'll see shortly).

char *str = NULL;

Error Checking

No error checking is necessary for this function, or more accurately, the error checking will be handling by a switch statement that maps the error number to a string description.

Implementation

The total number of error numbers supported by the implementation is only two at this point (EDOM and ERANGE from errno.h), however it's is expected to grow in the future. This presents a good use case for a switch which makes for a cleaner implementation over the use of a large chain of if/else if/else statements. As more error numbers are defined in the future, new cases will be added to translate them.

If the error number is unknown we'll still translate this into a message indicating as such. This way our implementation always returns a description. After the switch, return the pointer to the string that was chosen.

switch (errnum)
{
case EDOM:
    str = "EDOM - An input argument is outside the domain over which the mathematical function is defined";
    break;

case ERANGE:
    str = "ERANGE - The result of the function cannot be represented as a double value";
    break;

default:
    str = "E??? - Unknown error number";
    break;
}

return str;

It's worth mentioning a short side note on style here. When possible, welibc chooses to limit the width of code to 80 characters which aids in readability. The C language supports splitting these long descriptions across multiple lines which might look like the following:

switch (errnum)
{
case EDOM:
    str = "EDOM - An input argument is outside the domain over which the " \
          "mathematical function is defined";
    break;

/* ... */

Although this maintains the 80-character width, it presents a problem when searching through the source code with a tool like grep.

If you were interested in where this error message came from then you might grep for the string "outside the domain" which would get you to the right place. However, if you chose to search for "the mathematical function" then your search wouldn't return any results because that string wouldn't exist on a single line within the welibc code base. For that reason, I consider it to be worth breaking the 80-character width to ensure easy searching later on.

Testing

Since the strings returned by strerror are implementation-defined, we can't test the contents of the strings themselves - we can only test that a string was returned. This simplifies the tests as we only need to check that the returned pointer is not a null pointer. We'll test the currently supported errno values in addition to the unsupported error number 0.

int
strerrorTest(void)
{
    int errnums[]   = { 0, EDOM, ERANGE };
    size_t i        = 0;
    int ret         = 0;

    for (i = 0; i < sizeof(errnums)/sizeof(errnums[0]); i++)
    {
        if (!strerror(errnums[i]))
        {
            ret = 1;
            break;
        }
    }

    return ret;
}

Conclusion

Unfortunately, strerror is a somewhat boring library function. The implementation chooses how to represents each error number as a string and returns those representations as appropriate.

char *
strerror(int errnum)
{
    char *str = NULL;

    switch (errnum)
    {
    case EDOM:
        str = "EDOM - An input argument is outside the domain over which the mathematical function is defined";
        break;

    case ERANGE:
        str = "ERANGE - The result of the function cannot be represented as a double value";
        break;

    default:
        str = "E??? - Unknown error number";
        break;
    }

    return str;
}

strtok

July 24, 2021

tags: strcspn, string.h, strspn, strtok

The strtok function aids in splitting a string into pieces based on a given set of characters which delimit each piece (tokenizing a string).

char *
strtok(char *s1, const char *s2);

If the delimiting characters are found then a pointer to the first non-delimiter is returned and the next delimiter is overwritten with a null character. Subsequent calls passing a null pointer as the first parameter will return a pointer to the next token from the original string, or a null pointer if no token remains.

Keep in mind that strtok can modify the string you pass in (note the first parameter is a char *, rather than a const char *). If you need to retain the original data then it's best to copy it and tokenize the copy, leaving the original data untouched.

It's also important to understand that strtok will preserve a pointer across calls to it, namely a pointer to the end of the last token so that it can continue tokenizing if a null pointer is passed in as the first parameter.

Local Variables

As mentioned, strtok needs to keep track of where it left off so that subsequent calls with a null pointer as the s1 argument will continue to tokenize the string that was last seen. In order to preserve data across function calls we can use the static storage class specifier to instruct the C compiler that we want the variable to have a static storage duration which means that the "lifetime" of the variable is constant. The way that's achieved during execution of a program is for the variable's lifetime to be the entire duration of the program itself. We choose this storage class on a local variable because this data is only needed within the strtok function itself.

static char *save = NULL;

Error Checking

The first condition we want to check is whether or not a set of token delimiters was passed in, so we validate the s2 pointer. If it's a null pointer then we indicate to the caller that no token was found by returning a null pointer.

if (!s2)
{
    return NULL;
}

Now let's consider the string being tokenized. If multiple tokens exist in the same string then the caller will initiall pass in a pointer to the string and then pass in a null pointer to continue tokenizing the same string. This means that a null pointer for s1 is not necessarily an error. It would be an error if s1 were null before any tokens had been identified and the way we can detect this condition is if our saved pointer is also null. If our saved pointer wasn't null then strtok should use the saved pointer and continue tokenizing where it left off.

if (!s1)
{
    if (!save)
    {
        return NULL;
    }

    s1 = save;
}

Implementation

The first job of strtok is to find the initial character of the next token. If s1 points at a non-delimiter character then that's the initial character. However, if s1 points at a delimiting character then it, along with any subsequent delimiters, must be skipped over. This would be called the span of delimiting characters and is what the strspn function can be used for - it allows us to skip over any initial delimiting characters. In the case that we skip to the end of the string (i.e. the remaining part of the string only consisted of delimiters) then no token remains and we return a null pointer.

s1 += strspn(s1, s2);
if ('\0' == *s1)
{
    return NULL;
}

The next job is to find the end of the token which will either be the end of the string or at the next delimiter. This would be the span of non-delimiting characters, said another way this is the span of the complement of the delimiting characters which is what strcspn can be used for. When we find the end we'll update our saved pointer to that location.

save = s1 + strcspn(s1, s2);

Finally, we need to null terminate the token data to make the token itself a string (i.e. a sequence of characters with a null terminator). There are two possiblities for the character following the end of the token: it's either a delimiter or a null character. Before we can check for these conditions, we should validate the saved pointer since it was updated through the call to strcspn.

if (save)
{
    /* Do something */
}

If the current token ends in a delimiter then we need to overwrite it with a null character and update the saved pointer to the next character so that a subsequent call to strok with a null pointer for s1 will continue tokenizing at the correct place.

if (save && *save)
{
    *save++ = '\0';
}

If the current token already ends in a null terminator then no more work is needed. It's important that we don't update the saved pointer in this case, otherwise it would point beyond the null terminator into memory with unknown data. By retaining a pointer to a null character, subsequent calls will return properly when strtok attempts to skip over any initial delimiting characters.

Testing

There are quite a few edge cases to check for with the functionality of strtok so there are many test cases to go through. Since strtok can preserve a pointer across calls, the test strings consist of tokens which start with different characters. This allows the test to ensure that tokenizing is progressing correctly by checking a single character rather than comparing an entire string.

The first tests ensure that input validation is performed properly (remember that a null pointer for s1 isn't always an error). Next, tests are run to validate that tokenization works with a single delimiter between the tokens, in addition to switching the string being tokenized to ensure that strtok starts using the new string rather than the saved pointer. Following that are similar tests with multiple delimiters. Next, validate that tokenizing strings with initial and terminating characters works properly. Then we verify that the delimiters can be changed while tokenizing the same string. Finally, the standard includes a few test cases to help demonstrate the desired functionality, so those are included as well.

int
strtokTest(void)
{
    int ret             = 1;
    char str1[]         = "ABCD:BCD:CD:D";
    char str1_copy[]    = "ABCD:BCD:CD:D";
    char sep1[]         = ":";
    char str2[]         = "EFGH:!FGH!:GH:!H";
    char str2_copy[]    = "EFGH:!FGH!:GH:!H";
    char sep2[]         = ":!";
    char str3[]         = ":!:IJKL!:!";
    char sep3[]         = ":!";
    char str4[]         = "MNOP!::";
    char sep4[]         = "!:";
    char str5[]         = "QRST::UVWX:!:YZ";
    char sep5_1[]       = ":";
    char sep5_2[]       = "!";
    /* From the standard */
    static char str[]   = "?a???b,,,#c";
    char *t             = NULL;

    do
    {
        /* Invalid token string on first call with invalid separators */
        if (NULL != strtok(NULL, NULL))
        {
            break;
        }

        /* Still invalid token string with valid separators */
        if (NULL != strtok(NULL, sep1))
        {
            break;
        }

        /* Valid token string with invalid separators */
        if (NULL != strtok(str1, NULL))
        {
            break;
        }

        /* Tokenize on single separator */
        if (0 != strcmp("ABCD", strtok(str1, sep1)))
        {
            break;
        }

        /* Tokenize new string */
        if (0 != strcmp("EFGH", strtok(str2, sep2)))
        {
            break;
        }

        /* Tokenize on single seperator */
        memcpy(str1, str1_copy, sizeof(str1));
        if (0 != strcmp("ABCD", strtok(str1, sep1)))
        {
            break;
        }

        /* Tokenize on single seperator, second time */
        if (0 != strcmp("BCD", strtok(NULL, sep1)))
        {
            break;
        }

        /* Tokenize on single seperator, third time */
        if (0 != strcmp("CD", strtok(NULL, sep1)))
        {
            break;
        }

        /* Tokenize on single seperator, fourth time */
        if (0 != strcmp("D", strtok(NULL, sep1)))
        {
            break;
        }

        /* Tokenize on single seperator, end */
        if (NULL != strtok(NULL, sep1))
        {
            break;
        }

        /* Tokenize on multiple seperators */
        memcpy(str2, str2_copy, sizeof(str2));
        if (0 != strcmp("EFGH", strtok(str2, sep2)))
        {
            break;
        }

        /* Tokenize on multiple seperators, second time */
        if (0 != strcmp("FGH", strtok(NULL, sep2)))
        {
            break;
        }

        /* Tokenize on multiple seperators, third time */
        if (0 != strcmp("GH", strtok(NULL, sep2)))
        {
            break;
        }

        /* Tokenize on multiple seperators, fourth time */
        if (0 != strcmp("H", strtok(NULL, sep2)))
        {
            break;
        }

        /* Tokenize on multiple seperators, end */
        if (NULL != strtok(NULL, sep2))
        {
            break;
        }

        /* Tokenize with initial leading separators */
        if (0 != strcmp("IJKL", strtok(str3, sep3)))
        {
            break;
        }

        /* Tokenize with initial leading separators, end */
        if (NULL != strtok(NULL, sep3))
        {
            break;
        }

        /* Tokenize with trailing separators */
        if (0 != strcmp("MNOP", strtok(str4, sep4)))
        {
            break;
        }

        /* Tokenize with trailing separators, repeat */
        if (NULL != strtok(NULL, sep4))
        {
            break;
        }

        /* Change token string, start */
        if (0 != strcmp("QRST", strtok(str5, sep5_1)))
        {
            break;
        }

        /* Change token string, change */
        if (0 != strcmp(":UVWX:", strtok(NULL, sep5_2)))
        {
            break;
        }

        /* Change token string, change again */
        if (0 != strcmp("YZ", strtok(NULL, sep5_1)))
        {
            break;
        }

        /* Test from the standard: 't points to the token "a"' */
        t = strtok(str, "?");
        if (0 != strcmp("a", t))
        {
            break;
        }

        /* Test from the standard, 't points to the token "??b" */
        t = strtok(NULL, ",");
        if (0 != strcmp("??b", t))
        {
            break;
        }

        /* Test from the standard, 't points to the token "c"' */
        t = strtok(NULL, "#,");
        if (0 != strcmp("c", t))
        {
            break;
        }

        /* Test from the standard, 't is a null pointer' */
        t = strtok(NULL, "?");
        if (t)
        {
            break;
        }

        ret = 0;
    } while (0);

    return ret;
}

Conclusion

Although the work performed by strtok is complicated, it can be described succinctly through the use of existing library functions (namely strspn and strcspn). This is a great demonstration of the power of capturing common logic into functions for reuse.

char *
strtok(char *s1, const char *s2)
{
    static char *save = NULL;

    if (!s2)
    {
        return NULL;
    }

    if (!s1)
    {
        if (!save)
        {
            return NULL;
        }

        s1 = save;
    }

    /* Skip initial delimiters */
    s1 += strspn(s1, s2);
    if ('\0' == *s1)
    {
        return NULL;
    }

    /* Find next delimiter */
    save = s1 + strcspn(s1, s2);
    if (save && *save)
    {
        *save++ = '\0';
    }

    return s1;
}

strstr

June 27, 2021

tags: strchr, string.h, strlen, strncmp, strstr

The strstr function lets us search for the first occurrence of one string in another string.

char *
strstr(const char *s1, const char *s2);

If the string is found then a pointer to its beginning is returned, if the string being found is empty then a pointer to the beginning of the search string is returned, otherwise NULL is returned.

Local Variables

During the main loop of this function we'll reference the length of the string being found many times so it will be best to store that in a variable rather than calling strlen(s2) over and over again.

size_t s2_len = 0;

Error Checking

As normal, first we check to see if either string is invalid.

if (!s1 || !s2)
{
    return NULL;
}

Next, we'll calculate and store the length of the string being found and if it's empty (has zero length) then we'll return a pointer to the first string.

s2_len = strlen(s2);
if (0 == s2_len)
{
    return (char *) s1;
}

Implementation

As a precursor to the main search logic, if the string to be found is longer than the string being searched then a match will never occur and we can bail early.

/* A longer string will never be found */
if (s2_len > strlen(s1))
{
    return NULL;
}

The general idea will be to focus on the first character in the string being found and skip along each occurrence in the string being searched. Since this is a prerequisite of matching the entire string, it will prevent us from performing unnecessary comparisons. As we visit each instance of that character we'll then compare the full string being found - if a match is identified then we can stop searching and return the matching location.

First, we'll want to loop until s1 points to the terminating null character.

while (*s1)
{
    /* Perform search */
}

Next, we want to skip to the next location of the first character of the string that we want to find. strchr will do exactly what we need and we'll update our search string since unmatched characters no longer matter as we progress. If the first character isn't found then NULL will be returned and this will be the same value we return to the caller since the string wasn't found either.

while (*s1)
{
    /* Jump to the next match on the first character */
    s1 = strchr(s1, *s2);
    if (!s1)
    {
        break;
    }

    /* Perform comparison */
}

return (char *) s1;

If we do get a match on the first character then we will perform a string comparison to identify a match. However, since the string being searched may be longer than the string being found, we need to ensure that we only compare the characters in the string being found which is done through the use of strncmp.

Finally, if no match was found then we increment our search string pointer and try again.

while (*s1)
{
    /* Jump to the next match on the first character */
    s1 = strchr(s1, *s2);
    if (!s1)
    {
        break;
    }

    if (0 == strncmp(s1, s2, s2_len))
    {
        break;
    }

    s1++;
}

return (char *) s1;

Testing

For testing, we'll ensure that we pass NULL pointers to validate error checking, search for a string which is longer than the string being searched, attempt some near matches, and finally search for matches that exist at the beginning, middle, and end of the string being searched.

int
strstrTest(void)
{
    int ret     = 1;
    char str[]  = "Hello, world!\n";
    char str2[] = "H";

    do
    {
        /* Invalid search string */
        if (NULL != strstr(NULL, str))
        {
           break;
        }

        /* Invalid substring */
        if (NULL != strstr(str, NULL))
        {
            break;
        }

        /* Empty substring */
        if (str != strstr(str, ""))
        {
            break;
        }

        /* No match */
        if (NULL != strstr(str, "goodbye"))
        {
            break;
        }

        /* No match on single character */
        if (NULL != strstr(str, "z"))
        {
            break;
        }

        /* Search for string that is longer but matches initially */
        if (NULL != strstr(str2, "Hello"))
        {
            break;
        }

        /* Mismatch on last character */
        if (NULL != strstr(str, "Hellp"))
        {
            break;
        }

        /* Find itself */
        if (str != strstr(str, str))
        {
            break;
        }

        /* Find match at beginning */
        if (str != strstr(str, "Hello"))
        {
            break;
        }

        /* Find match in middle */
        if (strchr(str, 'w') != strstr(str, "wo"))
        {
            break;
        }

        /* Find match at end */
        if (strchr(str, 'w') != strstr(str, "world!\n"))
        {
            break;
        }

        /* Find match of single character */
        if (strchr(str, ' ') != strstr(str, " "))
        {
            break;
        }

        /* Match a single character only */
        if (str2 != strstr(str2, "H"))
        {
            break;
        }

        ret = 0;
    } while (0);

    return ret;
}

Conclusion

This function is a little larger than the other ones we've covered so far, but still boils down to straightforward logic of jumping to the next match of the first character and then performing a full comparison of the string being found.

char *
strstr(const char *s1, const char *s2)
{
    size_t s2_len = 0;

    if (!s1 || !s2)
    {
        return NULL;
    }

    s2_len = strlen(s2);
    if (0 == s2_len)
    {
        return (char *) s1;
    }

    /* A longer string will never be found */
    if (s2_len > strlen(s1))
    {
        return NULL;
    }

    while (*s1)
    {
        /* Jump to the next match on the first character */
        s1 = strchr(s1, *s2);
        if (!s1)
        {
            break;
        }

        if (0 == strncmp(s1, s2, s2_len))
        {
            break;
        }

        s1++;
    }

    return (char *) s1;
}

strspn

June 21, 2021

tags: strchr, string.h, strspn

The strspn function is the opposite of strcspn in that it computes the number of characters at the start of a string that are in the set of characters you specify.

size_t
strspn(const char *s1, const char *s2);

The return value is that count of characters.

While strcspn is nice for finding the first occurence of any of a given set of characters, strspn's use is more for finding the occurence of the first character not in a given set. This description may seem backwards, but an example will help.

char *item_nums = { "AA-BC-1234", "AA-321" };
char *set = "ABC-";
size_t loc = strspn(item_nums[0], set);
printf("Serial Number 1: %s\n", loc);
size_t loc = strspn(item_nums[1], set);
printf("Serial Number 2: %s\n", loc);

In this scenario there is an item numbering scheme which consists of the letters A, B, and C, hyphens, and a serial number. The letters will always preface the numbers but the number of letters can change. Using strspn allows you to easily index to the number.

As a bonus, how else might you solve this problem (hint: strrchr)?

Local Variables

There isn't much to keep track of since we're only calculating a length, so we'll create a variable of type size_t to hold that information which is also the return value.

size_t len = 0;

Error Checking

The only error conditions we need to explicitly check for are NULL pointers, in both cases the return value will be 0 so they can be handled together.

if (!s1 || !s2)
{
    return 0;
}

You might consider empty strings to be another error condition, but the loop logic that follows will handle both situations automatically.

Implementation

The main loop is nearly identical to strcspn except we want to find a character with strchr so we directly use its return value as a loop condition. The idea is to march along the string being searched and check whether each character is in the set of characters to find. When we find a character that isn't in the set then we end our search and return the number of characters that were found.

while (*s1 &&
       strchr(s2, *s1++))
{
    len++;
}

return len;

If s1 were empty then the loop body would never execute and 0 would be returned. If s2 were empty then strchr would return NULL and the loop body wouldn't execute either.

Testing

These tests closely mimic those for strcspn with slight modifications to error condition checking.

int
strspnTest(void)
{
    int ret     = 1;
    char str[]  = "ABCDEFG";
    char str2[] = "ABCD";

    do
    {
        /* NULL source */
        if (0 != strspn(NULL, "ABC"))
        {
            break;
        }

        /* NULL span set */
        if (0 != strspn(str, NULL))
        {
            break;
        }

        /* Empty search string */
        if (0 != strspn("", "ABCD"))
        {
            break;
        }

        /* Empty search set */
        if (0 != strspn(str, ""))
        {
            break;
        }

        /* Missing characters only */
        if (0 != strspn(str, "hijkl"))
        {
            break;
        }

        /* Short set of existing characters only */
        if (strlen(str2) != strspn(str, str2))
        {
            break;
        }

        /* Existing characters only */
        if (strlen(str) != strspn(str, str))
        {
            break;
        }

        /* Missing character at beginning of search set */
        if (strlen(str) != strspn(str, "aABCDEFG"))
        {
            break;
        }

        /* Missing character in middle of search set */
        if (strlen(str) != strspn(str, "ABCDdEFG"))
        {
            break;
        }

        /* Missing character at end of search set */
        if (strlen(str) != strspn(str, "ABCDEFGg"))
        {
            break;
        }

        ret = 0;
    } while (0);

    return ret;
}

Conclusion

As seen with the serial number example, strspn can be used to tell you two pieces of information about a string: how long the initial set of matching characters is and at what index non-matching characters start.

size_t
strspn(const char *s1, const char *s2)
{
    size_t len = 0;

    if (!s1 || !s2)
    {
        return 0;
    }

    while (*s1 &&
           strchr(s2, *s1++))
    {
        len++;
    }

    return len;
}

strrchr

March 06, 2021

tags: strchr, string.h, strrchr

The strrchr function enables the caller to find the location of the last occurrence of a character in a string.

char *
strrchr(const char *s, int c);

The return value is a pointer to the found character, or NULL if it does not exist in the string.

Local Variables

We use a char search character for the same reason mentioned for strchr. Otherwise, we only need a pointer to the end of the string so we can search in the reverse direction.

const char  *cur    = NULL;
const char  search  = c;

Implementation

The implementation is similar to that of strrchr, but performed in the reverse direction. We use the beginning of the string as our terminal position, rather than that of the null-character. Another difference is that we first determine where the end of the string is, which we can rely on strlen to calculate for us.

The only catch to this function is that since the null-character is considered part of the search, we must ensure we compare that character every time - even if the string is empty (i.e. the string points to the null-character). This is where the --cur becomes important. When an empty string is searched, by pre-decrementing we ensure that we only inspect the null-character rather than looping an extra time (which would happen if post-decrement, cur--, was used).

if (!s)
{
    return NULL;
}

cur = s + strlen(s);
do
{
    if (search == *cur)
    {
        return (char *) cur;
    }

} while (--cur >= s);

return NULL;

Testing

First, we'll test our parameter validation by passing a NULL pointer, and attempting to search an empty string which should succeed when we search for the null-character. Next, we'll search for characters that don't exist at all, or exist past a null character in the array we define. Finally, make sure it works when finding characters at the beginning, middle, and end of the string (i.e. finding the null character), as well as finding the last occurrence of a character which occurs multiple times in the string.

int
strrchrTest(void)
{
    int     ret         = 1;
    char    str[]       = "ABCDEFGBCDEFG\0abcdefg";
    char    emptyStr[]  = "";

    do
    {
        /* NULL object */
        if (NULL != strrchr(NULL, 'a'))
        {
            break;
        }

        /* Search empty string for characters */
        if (NULL != strrchr(emptyStr, 'a'))
        {
            break;
        }

        /* Search empty string for the null-character */
        if (emptyStr != strrchr(emptyStr, '\0'))
        {
            break;
        }

        /* Search for non-existent character */
        if (NULL != strrchr(str, 'Z'))
        {
            break;
        }

        /* Search for char after null character */
        if (NULL != strrchr(str, 'a'))
        {
            break;
        }

        /* Valid search at beginning */
        if (str != strrchr(str, str[0]))
        {
            break;
        }

        /* Valid search in middle*/
        if (strchr(str + 2, str[1]) != strrchr(str, str[1]))
        {
            break;
        }

        /* Valid search at end */
        if ((str + strlen(str) - 1) != strrchr(str, str[strlen(str) - 1]))
        {
            break;
        }

        /* Search for null character at end */
        if ((str + strlen(str)) != strrchr(str, '\0'))
        {
            break;
        }

        ret = 0;
    } while (0);

    return ret;
}

Conclusion

strrchr is the reverse analogue of strchr with the same catch about finding the null-character.

char *
strrchr(const char *s, int c)
{
    const char  *cur    = NULL;
    const char  search  = c;

    if (!s)
    {
        return NULL;
    }

    cur = s + strlen(s);
    do
    {
        if (search == *cur)
        {
            return (char *) cur;
        }

    } while (--cur >= s);

    return NULL;
}

strpbrk

December 17, 2020

tags: strchr, strcspn, string.h, strpbrk

The strpbrk function provides a simpler method of calling strchr when searching for any one of several characters in a given string. It searches one string for the first occurrence of any character in another string.

char *
strpbrk(const char *s1, const char *s2);

The return value is a pointer to the found character, or NULL if no characters were found.

Suppose you wanted to split a string on any one of a set of characters, but you wanted to know whether you could perform a split before doing the work (e.g. before calling strtok). strpbrk will return the pointer to the split location assuming you wanted to split on whichever of your characters came first, then you could do you more advanced logic of breaking the string into two pieces.

char *phrase = "Goodbye, cruel world";
char *set = "abc";

if (strpbrk(phrase, set))
{
    /* Advanced split logic */
}

You may think that this sounds very similar to strcspn and that's because it is! The difference is that strcspn indicates a length between the start of the string and the first match while strpbrk returns a pointer to the first match.

Local Variables

We'll be utilizing the strcspn function to do the core work and we want to store the return value so we can index into the string being searched.

size_t idx = 0;

Error Checking

Since strcspn can indicate an error by returning 0, we wouldn't know whether a character was found in the first location or if an error occurred, so we need to validate s1 before passing it to strcspn. Normally we would want to check for a NULL s2 pointer, however strcspn will do that for us so we can skip that check.

if (!s1)
{
    return NULL;
}

Implementation

We'll leverage strcspn to find the length to the first match which allows us to validate no match if the length is equal to the length of our string being searched. In that case we return NULL, otherwise we return a pointer to the matched character which is the string pointer plus the length returned by strcspn.

idx = strcspn(s1, s2);
if ('\0' == s1[idx])
{
    return NULL;
}

return (char *)(s1 + idx);

Testing

Due to the similar functionality, the testing is nearly identical to that of strcspn. We'll test input validation and then cases where there are no matches and then matches at the beginning, middle, and end of the string being searched.

int
strpbrkTest(void)
{
    int ret     = 1;
    char str[]  = "ABCDEFGabcdefg";

    do
    {
        /* NULL search string */
        if (NULL != strpbrk(NULL, "abcd"))
        {
            ret = 2;
            break;
        }

        /* NULL search set */
        if (NULL != strpbrk(str, NULL))
        {
            ret = 3;
            break;
        }

        /* Empty search string */
        if (NULL != strpbrk("", "abcd"))
        {
            ret = 4;
            break;
        }

        /* Empty search set */
        if (NULL != strpbrk(str, ""))
        {
            ret = 5;
            break;
        }

        /* Characters not in search string */
        if (NULL != strpbrk(str, "hijk"))
        {
            ret = 6;
            break;
        }

        /* First character matches */
        if (str != strpbrk(str, "hijAklm"))
        {
            ret = 7;
            break;
        }

        /* Last character matches */
        if ((str + strlen(str) - 1) != strpbrk(str, "hijgklm"))
        {
            ret = 8;
            break;
        }

        /* Match in middle */
        if (strchr(str, 'a') != strpbrk(str, "hijaklm"))
        {
            ret = 9;
            break;
        }

        ret = 0;
    } while (0);

    return ret;
}

Conclusion

The name wasn't immediately obvious to me but strpbrk stands for "string pointer break" which likely comes from the SNOBOL language which had a BREAK() function to perform the same task (and it had a SPAN() function that mirrors strcspn). Due to the commanality with strcspn, you could swap which function does the main work and then implement strcpsn by calling this function instead.

char *
strpbrk(const char *s1, const char *s2)
{
    size_t idx = 0;

    if (!s1)
    {
        return NULL;
    }

    idx = strcspn(s1, s2);
    if ('\0' == s1[idx])
    {
        return NULL;
    }

    return (char *)(s1 + idx);
}

strcspn

December 05, 2019

tags: strchr, strcspn, string.h, strlen

The strcspn function is like an advanced version of strchr - it determines the number of characters at the start of a string are not in the set of characters you specify.

size_t
strcspn(const char *s1, const char *s2);

The return value is that count of characters.

This function is somewhat confusing so it will help to show an example. Using strcspn allows you to answer a question like "Where does the first instance of a, b, or c occur in the string 'Goodbye, cruel world'?". C code which answers this question would look like so:

char *phrase = "Goodbye, cruel world";
char *set = "abc";
size_t loc = strcspn(phrase, set);
printf("The first instance is at index %u", len);

At this point, loc will either be equal to the length of the phrase, implying that a, b, or c does not exist in the string, or it will hold the index of the first occurrence of either of those characters. We can see that there is a 'b' in "Goodbye", so the message printed to the screen will be "The first instance is at index 4".

Local Variables

There isn't much to keep track of since we're only calculating a length, so we'll create a variable of type size_t to hold that information which is also the return value.

size_t len = 0;

Error Checking

There are two error cases we need to account for and each requires a different return value. The first case is when the string being searched, s1, is NULL. If this is the case then the string has no length and the return value would be 0.

if (!s1)
{
    return 0;
}

The second case is when no search characters are provided, i.e. s2 is NULL. Since we don't have anything to search for then the return value is equal to the length of the string being searched.

if (!s2)
{
    return strlen(s1);
}

The order of these checks is important as well. The first case implies that we can't do any searching while the seconds implies that we don't need to do any searching.

Implementation

The method we'll use is to march along the first string and look for each subsequent character in the second string. We'll keep marching until we reach the end of the string being searched or we find a match. The hardest part here is finding matches but we can farm that work out to strchr since that's exactly what it's for! As long as we continue marching then we also continue updating the count being returned.

while (*s1 &&
       !strchr(s2, *s1++))
{
    len++;
}

return len;

Testing

First, we'll test our parameter validation by passing a NULL pointer for each of the strings in sequence and similarly pass empty strings for each. Next, we'll search for characters that don't exist at all to ensure that we get the length of the search string back. Finally, we'll place the first-matched character at various positions in the string of characters being matched against.

int
strcspnTest(void)
{
    int ret = 1;
    char str[] = "ABCDEFGabcdefg";

    do
    {
        /* NULL source */
        if (0 != strcspn(NULL, "ABC"))
        {
            break;
        }

        /* NULL span complement */
        if (strlen(str) != strcspn(str, NULL))
        {
            break;
        }

        /* Empty search string */
        if (0 != strcspn("", "ABCD"))
        {
            break;
        }

        /* Empty search set */
        if (strlen(str) != strcspn(str, ""))
        {
            break;
        }

        /* Characters aren't in string */
        if (strlen(str) != strcspn(str, "hijkl"))
        {
            break;
        }

        /* First character is in the search set */
        if (0 != strcspn(str, "A"))
        {
            break;
        }

        /* Character at beginning of search set */
        if (3 != strcspn(str, "Dabcd"))
        {
            break;
        }

        /* Character in middle of search set */
        if (3 != strcspn(str, "abDcd"))
        {
            break;
        }

        /* Character at end of search set */
        if (3 != strcspn(str, "abcdD"))
        {
            break;
        }

        ret = 0;
    } while (0);

    return ret;
}

Conclusion

Although the description is somewhat confusing, strcspn is not difficult to implement. The name comes from "string complement span" because it computes the span of characters in a string that are in the set complementary to the one provided.

size_t
strcspn(const char *s1, const char *s2)
{
    size_t len = 0;

    if (!s1)
    {
        return 0;
    }

    if (!s2)
    {
        return strlen(s1);
    }

    while (*s1 &&
           !strchr(s2, *s1++))
    {
        len++;
    }

    return len;
}

memchr

December 05, 2018

tags: memchr, string.h

Once you have collected or generated data, you may want to search through it to find the information you're interested in. memchr is the first in the collection of search functions that can be used to accomplish such a task. It takes the object being search, a character to find, and a number of characters to search through.

void *
memchr(const void *s, int c, size_t n);

The return value is a pointer to the found character, or NULL if it does not exist in the object.

Local Variables

The description of memchr is very specific with respect to the data types used for searching. As you see with the prototype, memchr allows any object to be searched since it takes and returns a void *. In addition, it takes a search character which is actually an int (which is what a character would be promoted to anyways). However, The Standard has this to say about the data types

"The memchr function locates the first occurrence of c (converted to an unsigned char) in the initial n characters (each interpreted as unsigned char) of the object pointed to by s."

As such, rather than adding casts into our code, we can declare local variables of the appropriate types and assign them to implicitly perform the casts for us.

const unsigned char *pCur   = s;
const unsigned char search  = c;

Parameter Validation

Since this function will do all of its own work, we need to perform some parameter validation. This includes checking for a NULL pointer and when the search would wrap around memory. We exclude checking n against 0 because this isn't likely to happen and is handled by the logic to come. If the parameters are invalid then the search can't be performed and we return NULL.

if ( !s ||
    /* Check for wrapping while searching */
    ((((unsigned long) -1) - ((unsigned long) s)) < n))
{
    return NULL;
}

Implementation

A similar approach to memcmp will be taken where we loop over each character, decrementing n each iteration, but we will always compare against our search character. It's important to remember than after the if statement is executed, pCur will point one past the character that was just searched. If that was the character we were searching for then we must decrement pCur before returning its value as a void *. Finally, if the character was never found, return NULL to indicate as such.

while (n-- > 0)
{
    if (search == *(pCur++))
    {
        return (void *) --pCur;
    }
}

return NULL;

Testing

First we'll test our parameter validation by passing a NULL pointer, and attempting to trigger a search which would wrap memory. Next we'll search for a character which exists in the object, but specify that no characters should be searched, along with a character that doesn't exist. Lastly, search for characters at the boundaries of the object and in the middle.

int
memchrTest(void)
{
    int ret = 1;
    char obj[] = "ABCDEFG";
    char *p = NULL;

    do
    {
        /* NULL object */
        if (NULL != memchr(NULL, 'a', 10))
        {
            break;
        }

        /* Wrapped search */
        if (NULL != memchr((void *) ((unsigned long) -5), 'A', 10))
        {
            break;
        }

        /* Search no characters */
        if (NULL != memchr(obj, obj[0], 0))
        {
            break;
        }

        /* Search for non-existent character */
        if (NULL != memchr(obj, 'a', sizeof(obj)))
        {
            break;
        }

        /* Valid search at beginning */
        if (obj != memchr(obj, obj[0], sizeof(obj)))
        {
            break;
        }

        /* Valid search in middle*/
        if ((obj + 4) != memchr(obj, obj[4], sizeof(obj)))
        {
            break;
        }

        /* Search for null-character at end */
        p = memchr(obj, '\0', sizeof(obj));
        if (strlen(obj) != (size_t) (p - obj))
        {
            break;
        }

        ret = 0;
    } while (0);

    return ret;
}

Conclusion

While memchr represents a simple operation, we must ensure that we follow the standard by using the correct data types for comparison. Aside from this, we see the same pattern used in functions like memcmp to iterate over the object and perform the search operation.

void *
memchr(const void *s, int c, size_t n)
{
    const unsigned char *pCur   = s;
    const unsigned char search  = c;

    if ( !s ||
        /* Check for wrapping while searching */
        ((((unsigned long) -1) - ((unsigned long) s)) < n))
    {
        return NULL;
    }

    while (n-- > 0)
    {
        if (search == *(pCur++))
        {
            return (void *) --pCur;
        }
    }

    return NULL;
}

strchr

December 05, 2018

tags: strchr, string.h

The strchr function enables the caller to find the location of the first occurrence of a character in a string.

char *
strchr(const char *s, int c);

The return value is a pointer to the found character, or NULL if it does not exist in the string.

Local Variables

Similar to memchr, the description of strchr is very specific with respect to the data type used for searching. As you see with the prototype, strchr accepts an int as the type being search for. This is partially due to the integer promotion rules, whereby a character being passed as an argument to a function will be implicitly promoted to an int with the same sign. The other part is to maintain compatibility with pre-standardized C code.

"The strchr function locates the first occurrence of c (converted to a char) in the string pointed to by s."

As such, rather than adding casts into our code, we can declare a local char and implicitly cast the searched for character.

const char search = c;

Implementation

We'll use an approach which is slightly similar to strlen where we'll march along the string, comparing each character to search until we find a match or reach the end of the string. One interesting property of this specific function is that it counts the terminating null character as part of the string. This means you can search for null character with strchr. This also means you need to first check for the desired character, and then check against '\0' before exiting the loop. In addition, we get parameter validation for free by using s as the loop condition. Finally, we return NULL if the loop ends as it means we reached the null character without finding a match.

while (s)
{
    if (search == *s)
    {
        return (char *) s;
    }

    if ('\0' == *s)
    {
        break;
    }

    s++;
}

return NULL;

Testing

First, we'll test our parameter validation by passing a NULL pointer, and attempting to search an empty string. Next, we'll search for characters that don't exist at all, or exist past a null character in the array we define. Finally, make sure it works when finding characters at the beginning, middle, and end of the string (i.e. finding the null character).

int
strchrTest(void)
{
    int ret = 1;
    char str[] = "ABCDEFG\0abcdefg";

    do
    {
        /* NULL object */
        if (NULL != strchr(NULL, 'a'))
        {
            break;
        }

        /* Search no characters */
        if (NULL != strchr("", 'a'))
        {
            break;
        }

        /* Search for non-existent character */
        if (NULL != strchr(str, 'Z'))
        {
            break;
        }

        /* Search for char after null character */
        if (NULL != strchr(str, 'a'))
        {
            break;
        }

        /* Valid search at beginning */
        if (str != strchr(str, str[0]))
        {
            break;
        }

        /* Valid search in middle*/
        if ((str + 4) != strchr(str, str[4]))
        {
            break;
        }

        /* Search for null character at end */
        if ((str + strlen(str)) != strchr(str, '\0'))
        {
            break;
        }

        ret = 0;
    } while (0);

    return ret;
}

Conclusion

strchr is straightforward, with the only catch being that the null character is considered a part of the string. Ensuring that it can be found means you could implement strlen by use of strchr.

char *
strchr(const char *s, int c)
{
    const char search  = c;

    while (s)
    {
        if (search == *s)
        {
            return (char *) s;
        }

        if ('\0' == *s)
        {
            break;
        }

        s++;
    }

    return NULL;
}

strncmp

December 05, 2018

tags: memcmp, string.h, strlen, strncmp

The strncmp function allows you to compare to strings to determine their (in)equality while also specifying a maximum number of characters to compare.

int
strncmp(const char *s1, const char *s2, size_t n);

The return value is an integer which indicates whether the first string is greater than, equal to, or less than the second string.

Implementation

strncmp is of course similar to strcmp, but we first need to determine how many characters should be compared, as any characters after a null-character in s1 should not be compared. Therefore, we need to determine the length of s1, plus 1 for the null-character, and compare a number of characters which is the lesser of that value and n. We can then pass the real work on to memcmp.

size_t len1 = strlen(s1) + 1;

if (len1 < n)
{
    n = len1;
}

return memcmp(s1, s2, n);

It's important to include the + 1 as otherwise an s1 which is a substring of s2 would compare as equal (e.g. strncmp("foo", "foobar", strlen("foobar"))).

Testing

We need to test comparison to/from NULL pointers, to/from empty strings, and all cases of equal, greater than, and less than results. The only deviation from the strcmp tests is comparing two strings where one is a substring of the other. Performing this test ensures that the null-character is accounted for when s1 is shorter than s2.

int
strncmpTest(void)
{
    int ret         = 1;
    char pStr1[]    = "This is a Test string";
    char pStr2[]    = "This is a Test string";
    char pStr3[]    = "This is a Lesser string";
    char pStr4[]    = "This is a greater string";
    char *pStr5     = NULL;
    char pStr6[]    = "This is a Test string with more";

    do
    {
        /* NULL s2 pointer */
        if (1 != strncmp(pStr1, pStr5, sizeof(pStr1)))
        {
            break;
        }

        /* NULL s1 pointer */
        if (1 != strncmp(pStr5, pStr1, sizeof(pStr1)))
        {
            break;
        }

        /* Empty s2 */
        if (0 >= strncmp(pStr1, "", sizeof(pStr1)))
        {
            break;
        }

        /* Empty s1 */
        if (0 <= strncmp("", pStr1, sizeof(pStr1)))
        {
            break;
        }

        /* Test equality */
        if (0 != strncmp(pStr1, pStr2, sizeof(pStr1)))
        {
            break;
        }

        /* Test equal substrings */
        if (0 != strncmp(pStr1, pStr6, sizeof(pStr1) - 1))
        {
            break;
        }

        /* Test inequal strings */
        if (0 == strncmp(pStr1, pStr6, sizeof(pStr1)))
        {
            break;
        }

        /* First string greater than second string */
        if (0 >= strncmp(pStr1, pStr3, sizeof(pStr1)))
        {
            break;
        }

        /* First string less than second string */
        if (0 <= strncmp(pStr1, pStr4, sizeof(pStr1)))
        {
            break;
        }

        ret = 0;
    } while (0);

    return ret;
}

Conclusion

Again, memcmp does all the work and we need to take care when calculating the maximum number of characters to be compared.

int
strncmp(const char *s1, const char *s2, size_t n)
{
    size_t len1 = strlen(s1) + 1;

    if (len1 < n)
    {
        n = len1;
    }

    return memcmp(s1, s2, n);
}

strcmp

November 30, 2018

tags: memcmp, strcmp, string.h, strlen

The strcmp function allows you to compare to strings to determine their (in)equality.

int
strcmp(const char *s1, const char *s2);

The return value is an integer which indicates whether the first string is greater than, equal to, or less than the second string.

Implementation

The strcmp function is incredibly similar to memcmp except we must stop comparing when we encounter a null-character in s1. We find find that character with the strlen function and use the result, plus 1, to define how many characters to compare. Thus, we can make a call directly to memcmp to do the work for us. All error handling is done in the other functions and strcmp becomes very simple:

return memcmp(s1, s2, strlen(s1) + 1);

It's important to include the + 1 as otherwise an s1 which is a substring of s2 would compare as equal (e.g. strcmp("foo", "foobar")).

Testing

We need to test comparison to/from NULL pointers, to/from empty strings, and all cases of equal, greater than, and less than results. The testing here is slightly different from memcmp as an empty string, "", has a single character in it, the null-character '\0. Therefore we have at least one character to compare and the result should be a non-zero value, assuming the other string has characters in it.

int
strcmpTest(void)
{
    int ret         = 1;
    char pStr1[]    = "This is a Test string";
    char pStr2[]    = "This is a Test string";
    char pStr3[]    = "This is a Lesser string";
    char pStr4[]    = "This is a greater string";
    char *pStr5     = NULL;

    do
    {
        /* NULL s2 pointer */
        if (1 != strcmp(pStr1, pStr5))
        {
            break;
        }

        /* NULL s1 pointer */
        if (1 != strcmp(pStr5, pStr1))
        {
            break;
        }

        /* Empty s2 */
        if (0 >= strcmp(pStr1, ""))
        {
            break;
        }

        /* Empty s1 */
        if (0 <= strcmp("", pStr1))
        {
            break;
        }

        /* Test equality */
        if (0 != strcmp(pStr1, pStr2))
        {
            break;
        }

        /* First string greater than second string */
        if (0 >= strcmp(pStr1, pStr3))
        {
            break;
        }

        /* First string less than second string */
        if (0 <= strcmp(pStr1, pStr4))
        {
            break;
        }

        ret = 0;
    } while (0);

    return ret;
}

Conclusion

memcmp does all the work here while we use strlen to determine the number of characters to be compared, plus one for the null-terminator.

int
strcmp(const char *s1, const char *s2)
{
    return memcmp(s1, s2, strlen(s1) + 1);
}

strncat

November 29, 2018

tags: memmove, string.h, strlen, strncat

The strncat function allows you to join two strings by appending the second string to the first, modifying the first string rather than creating a new one. In addition, you may specify a maximum number of characters to be appended, allowing you to control the amount of data being appended.

char *
strncat(char *s1, const char *s2, size_t n);

It will return s1 which will now hold the concatenation of the two strings.

Implementation

Much like strcat, we must first determine the length of the two strings to determine where to begin copying and how much to copy. We should copy the lesser value of n and the length of the second string.

size_t len1 = 0;
size_t len2 = 0;

/* Error checking */

len1 = strlen(s1);
len2 = strlen(s2);

if (len2 < n)
{
    n = len2;
}

Now that we know how much to copy, we know that the first character to be overwritten will be the null-character at the end of s1, which is located at s1 + len1. Next, we add a null-character at the very end in case one was not found within the first n (our modified one) characters of s2. Finally, return the value of s1.

memmove(s1 + len1, s2, n);
s1[len1 + n] = '\0';

return s1;

This is nice because it works whether n is positive or zero. When n is zero no characters will be copied and the null-character at the end of s1 will be overwritten with another null-character.

Testing

We need to test concatenation to/from NULL pointers, to/from empty strings, partial concatenation, and finally normal concatenation. In the case of NULL pointers and an empty source string, we expect the destination to be unmodified. Otherwise, we verify that the result is n characters from the second string appended to the first.

int
strncatTest(void)
{
    int ret = 1;
    char *pStr1     = NULL;
    char pStr2[15]  = "hello, world";
    char pStr3[15]  = "hello, world";
    char pStr4[15]  = { 0 };
    char pStr5[15]  = "hello,";
    char pStr6[15]  = " world";

    do
    {
        /* Append to NULL, may crash */
        strncat(pStr1, pStr2, strlen(pStr2));

        /* Concatenate from NULL, may crash */
        strncat(pStr2, pStr1, 10);

        /* Concatenate to an empty string */
        strncat(pStr4, pStr2, strlen(pStr2));
        if (0 != memcmp(pStr4, pStr2, strlen(pStr2) + 1))
        {
            break;
        }

        /* Concatenate from an empty string */
        memset(pStr4, '\0', sizeof(pStr4));
        strncat(pStr2, pStr4, 10);
        if (0 != memcmp(pStr2, pStr3, sizeof(pStr3)))
        {
            break;
        }

        /* Append partial string */
        memset(pStr4, '\0', sizeof(pStr4));
        strncat(pStr4, pStr5, 4);
        if ((4 != strlen(pStr4)) ||
            (0 != memcmp(pStr4, pStr5, 4)))
        {
            break;
        }

        /* Normal concatenation */
        strncat(pStr5, pStr6, sizeof(pStr5) - strlen(pStr5));
        if (0 != memcmp(pStr5, pStr3, sizeof(pStr3)))
        {
            break;
        }

        ret = 0;
    } while (0);

    return ret;
}

Conclusion

Leveraging the functionality of strlen allows us to pass the major work to memmove, which consequently works as expected if n is zero. The only tricks to this function are figuring out how much data to copy and ensuring the resulting string is null-terminated.

char *
strncat(char *s1, const char *s2, size_t n)
{
    size_t len1 = 0;
    size_t len2 = 0;

    if (!s1 || !s2)
    {
        return s1;
    }

    len1 = strlen(s1);
    len2 = strlen(s2);

    if (len2 < n)
    {
        n = len2;
    }

    memmove(s1 + len1, s2, n);
    s1[len1 + n] = '\0';

    return s1;
}

strcat

October 09, 2017

tags: strcat, strcpy, string.h

The strcat function allows you to join two strings by appending the second string to the first, modifying the first string rather than creating a new one.

char *
strcat(char *s1, const char *s2);

It will return s1 which will now hold the concatenation of the two strings.

Implementation

To concatenate two strings means copying one to the end of the other. To find the end of the first string we use the strlen function to get an index for the location of the terminating null character. Then we pass this location to strcpy, providing the second string as the second parameter.

strcpy(s1 + strlen(s1), s2);
return s1;

We don't need any parameter validation because strlen and strcpy do that for us. If s1 is NULL then strlen will return 0 and we'll end up passing NULL to strcpy which will return without modifying anything. Likewise, if s2 is NULL then strcpy won't do anything.

Testing

We need to test concatenation to/from NULL pointers, to/from empty strings, and finally normal concatenation. In the case of NULL pointers and an empty source string, we expect the destination to be unmodified. Otherwise, we verify that the result is the second string appended to the first.

int
strcatTest(void)
{
    int ret         = 1;
    char *pStr1     = NULL;
    char pStr2[15]  = "hello, world";
    char pStr3[15]  = "hello, world";
    char pStr4[15]  = { 0 };
    char pStr5[15]  = "hello,";
    char *pStr6     = " world";

    do
    {
        /* Append to NULL, may crash */
        strcat(pStr1, pStr2);

        /* Concatenate from NULL, may crash */
        strcat(pStr2, pStr1);

        /* Concatenate to an empty string */
        strcat(pStr4, pStr2);
        if (0 != memcmp(pStr4, pStr2, strlen(pStr2)))
        {
            break;
        }

        /* Concatenate from an empty string */
        memset(pStr4, '\0', sizeof(pStr4));
        strcat(pStr2, pStr4);
        if (0 != memcmp(pStr2, pStr3, sizeof(pStr3)))
        {
            break;
        }

        /* Normal concatenation */
        strcat(pStr5, pStr6);
        if (0 != memcmp(pStr5, pStr2, sizeof(pStr2)))
        {
            break;
        }

        ret = 0;
    } while (0);

    return ret;
}

Conclusion

As our library continues to grow, we'll find that more and more functions can be implemented in terms of existing functions. This will reduce the logic and validation required so that it becomes concentrated in certain parts of the library. It also means that there is less code to debug later on if there are any problems.

char *
strcat(char *s1, const char *s2)
{
    strcpy(s1 + strlen(s1), s2);
    return s1;
}

strncpy

September 30, 2017

tags: memmove, memset, strcpy, string.h, strlen, strncpy

strncpy provides control over the same type of copying provided by strcpy. Rather than being forced to copy an entire string, you can choose to copy a portion of a string.

char *
strncpy(char *s1, const char *s2, size_t n);

s1 and s2 are the destination and source strings while n is the maximum number of characters to copy. The catch is that if no NUL is found in the first n characters then no NUL is added to the destination string. If the string is less than n characters long then null characters are appended so a total of n characters are written.

Local Variables

Similar to strcpy, we'll store the length of the source string so we can use it to perform error checking before attempting to copy. It will also be used to add NULs when the source string is shorter than n.

size_t len = strlen(s2);

Parameter Validation

The same validation can be used from strcpy.

if ( !s1 ||
     !s2 ||
    /* Check for wrapping while copying */
    ((((unsigned long) -1) - ((unsigned long) s1)) < len) ||
    ((((unsigned long) -1) - ((unsigned long) s2)) < len))
{
    return s1;
}

Implementation

When parameter validation succeeds there are two cases to handle: n is less than or equal to the source string length, or it is greater.

If n <= len then we can use memmove or memcpy to copy n characters from s2 to s1. We can't use strcpy generically in this case because it will add a NUL if n == len, meaning we will have written n + 1 characters.

When n > len we need to copy the string and then add null characters to pad out the total number of written characters to n. We can use strcpy to copy the string and then memset to add the padding for us. Remember that strcpy will set s1[len] to NUL, so starting the memset ats1 + len would mean we unnecessarily overwrite an existing NUL with NUL. Thus, we should start the padding at s1 + len + 1. Since n > len then we guarantee that the result of n - len - 1 is greater than or equal to zero and that the result will never overflow.

Finally, we return s1 just like strcpy.

if (n <= len)
{
    memmove(s1, s2, n);
}
else
{
    strcpy(s1, s2);
    memset(s1 + len + 1, '\0', n - len - 1);
}

return s1;

Testing

Normal tests for strncpy would include passing bad pointers and passing an n which is less than strlen(s2). We also need to test copying from an empty string, from a string which is shorter than n (to verify the NUL padding), and finally setting n to zero.

int
strncpyTest(void)
{
    int ret         = 1;
    char *str1      = NULL;
    char *str2      = "hello, world";
    char str3[15]   = { 0 };
    char *str4      = "";
    char zeros[15]  = { 0 };

    do
    {
        /* Copy into NULL, may crash */
        strncpy(str1, str2, strlen(str2) + 1);

        /* Copy from NULL, may crash */
        strncpy(str3, str1, sizeof(str3));

        /* Should work the same as strcpy */
        strncpy(str3, str2, strlen(str2) + 1);
        if (0 != memcmp(str3, str2, strlen(str2) + 1))
        {
            break;
        }

        memset(str3, '\0', sizeof(str3));

        /* Taint str3 and verify the taint is not touched */
        str3[strlen(str2) - 4] = 'a';
        strncpy(str3, str2, strlen(str2) - 4);
        if ((0 != memcmp(str3, str2, strlen(str2) - 4)) ||
            (str3[strlen(str2) - 4] != 'a'))
        {
            break;
        }

        /* Taint str3 completely and verify strncpy places NULs afterwards */
        memset(str3, -1, sizeof(str3));
        strncpy(str3, str2, sizeof(str3));
        if ((0 != memcmp(str3, str2, strlen(str2))) ||
            (0 != memcmp(str3 + strlen(str2),
                         zeros,
                         sizeof(str3) - strlen(str2))))
        {
            break;
        }

        /* Verify no copy is made from an empty string */
        memset(str3, 0, sizeof(str3));
        strncpy(str3, str4, strlen(str4));
        if (0 != memcmp(str3, zeros, sizeof(str3)))
        {
            break;
        }

        ret = 0;
    } while (0);

    return ret;
}

Conclusion

This implementation makes use of memmove, strcpy, and memset to carry out the tasks dictated be the relation of n to strlen(s2). Our tests then verify this behavior through various taint methods.

char *
strncpy(char *s1, const char *s2, size_t n)
{
    size_t len = strlen(s2);

    if ( !s1 ||
         !s2 ||
        /* Check for wrapping while copying */
        ((((unsigned long) -1) - ((unsigned long) s1)) < len) ||
        ((((unsigned long) -1) - ((unsigned long) s2)) < len))
    {
        return s1;
    }

    if (n <= len)
    {
        memmove(s1, s2, n);
    }
    else
    {
        strcpy(s1, s2);
        memset(s1 + len + 1, '\0', n - len - 1);
    }

    return s1;
}

memset

August 26, 2017

tags: memset, string.h

Writing a single value to a region of memory is most often used to initialize a structure on the stack or the memory returned from a call to malloc. It can also be used to mark a region "dirty" with a predetermined value that can later be searched for. The memset function is used to perform such tasks

void *
memset(void *s, int c, size_t n);

And returns the value of the passed in pointer, s.

Local Variables

Since data cannot be written by directly dereferencing a void *, we need to convert that to a pointer type we can use. The Standard specifies that the value c is to be converted to an unsigned char when written, so an unsigned char * makes sense as our working pointer.

unsigned char *pChar = s;

Parameter Validation

We need to verify that a valid pointer is passed and also that the amount of characters being set will not wrap around memory. We could return here if n is zero but that's an unlikely scenario. Rather than add a check to every call of memset, we'll let the least likely case suffer a few extra lines of code.

if ( !s ||
    /* Check for wrapping */
    ((((unsigned long) -1) - ((unsigned long) s)) < n))
{
    return s;
}

Implementation

The main portion of the loops looks similar to that of the working loops in memmove. We iterate and decrement n, each time writing c into the pointed to character and finally increment the pointer by one character. Once finished, we return the value of s.

while (0 < n--)
{
    *(pChar++) = (unsigned char) c;
}

return s;

Testing

Parameter validation aside, we need to verify that requesting zero characters be set is honored and that we can't set more characters than are left in memory. We also need to set some memory and verify that only the requested region was changed.

int
memsetTest(void)
{
    int ret     = 1;
    char reg1[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ";
    char reg2[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ";
    char reg3[] = "ABCD55555JKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ";

    do
    {
        /* We may crash */
        memset(NULL, 0, 10);

        /* Request a set that would wrap, may crash */
        memset((void *) ((unsigned long) -5), 0, 10);

        /* No change should be made */
        memset(reg1 + 4, 0, 0);

        if (0 != (memcmp(reg1, reg2, sizeof(reg1))))
        {
            break;
        }

        /* Verify change in only the specified area */
        memset(reg1 + 4, '5', 5);

        if (0 != (memcmp(reg1, reg3, sizeof(reg1))))
        {
            break;
        }

        ret = 0;
    } while (0);

    return ret;
}

Conclusion

memset provides a straightforward way to write a single value to every character in a given region.

void *
memset(void *s, int c, size_t n)
{
    unsigned char *pChar = s;

    if ( !s ||
        /* Check for wrapping */
        ((((unsigned long) -1) - ((unsigned long) s)) < n))
    {
        return s;
    }

    while (0 < n--)
    {
        *(pChar++) = (unsigned char) c;
    }

    return s;
}

strcpy

August 25, 2017

tags: memmove, strcpy, string.h, strlen

Text manipulation is a large part user facing programs so that text can be stored, retrieved, and displayed. To accomplish those actions, strings may need to be copied from one location to another. We can already find the length of a string with strlen and could pass along that value to the memcpy or memmove functions. However, the programmer must always remember to copy an extra character: the null terminator. To prevent this error prone requirement on the programmer, C has the strlen function

char *
strcpy(char *s1, const char *s2);

An (Almost) Easy Win

As mentioned, one could accomplish a string copy with a memory copying function. The Standard does not require strcpy to support copying between overlapping strings, but we can easily support this by using memmove

memmove(dst, src, strlen(src) + 1);

Assuming that strlen works, it follows that a null character will be at the end of the string. Thus, we can copy the source length, plus one, in order to get the entire string along with the null terminator.

In the case of string which runs off the end of memory and the program doesn't crash during strlen (which is a valid possibility with undefined behavior), we would attempt to copy a single character from the source. As long as the source pointer isn't the last addressable byte, this would look valid to memmove which would subsequently copy a single character into the destination... without the null terminator. This is not the correct behavior so we must handle it appropriately.

Local Variables

First, we'll store the length of the string so we can use it to perform error checking before attempting to copy.

size_t len = strlen(s2);

Parameter Validation

We need to perform the same types of checks that memmove does so we'll steal them only exchanging n for our local variable len

if ( !s1 ||
     !s2 ||
    /* Check for wrapping while copying */
    ((((unsigned long) -1) - ((unsigned long) s1)) < len) ||
    ((((unsigned long) -1) - ((unsigned long) s2)) < len))
{
    return s1;
}

In the case that the length is 0 then the wrapping checks don't perform any meaningful verification. However, when the length is positive we guarantee that we won't copy onto the end of memory leaving a string which is not null terminated.

Implementation

If the parameters are valid then we are assured that a valid copy may take place. However, if we're copying from a string which isn't null terminated (which would report a length of 0) then we would still copy a single character and no null terminator if we were to use the above memmove call. To prevent this, we can call memmove with the string length and manually null terminate it. This guarantees that the destination will always become null terminated, even if the source string wasn't. In other words, an invalid source string will result in an empty string being copied.

memmove(s1, s2, len);
s1[len] = '\0';

return s1;

Testing

Aside from passing NULL pointers to strcpy, we also need to verify copies from an empty string along with a normal copy from a string into an appropriately sized array. We still can't accurately test non-null terminated strings at the end of memory because that would invoke undefined behavior, however we can attempt to copy to the end of memory.

int
strcpyTest(void)
{
    int ret = -1;
    char *str1 = NULL;
    char *str2 = "hello, world";
    char *str3 = "";
    char str4[] = "hello, world";
    char str5[15] = { 0 };
    char str6[] = "a";

    do
    {
        /* Wrap checking relies on an unsigned long holding a pointer value
         * without truncation.
         */
        if (sizeof(unsigned long) < sizeof(void *))
        {
            break;
        }

        /* Copy into NULL, may crash */
        strcpy(str1, str2);

        /* Copy from NULL, may crash */
        strcpy(str2, str1);

        if (0 != memcmp(str2, str4, strlen(str4) + 1))
        {
            break;
        }

        /* Copy to destination which would cause wrap, may crash */
        strcpy((char *) ((unsigned long) -5), str2);

        /* Copy an empty string over an array */
        strcpy(str4, str3);

        if (0 != memcmp(str4, str3, strlen(str3) + 1))
        {
            break;
        }

        /* Copy a single character string */
        strcpy(str5, str6);

        if (0 != memcmp(str5, str6, strlen(str6) + 1))
        {
            break;
        }

        /* Copy a string into an array */
        strcpy(str5, str2);

        if (0 != memcmp(str5, str2, strlen(str2) + 1))
        {
            break;
        }

        ret = 0;
    } while (0);

    return ret;
}

Conclusion

Our implementation of strcpy appropriately handles invalid inputs as well as copying from non-null terminated strings at the end of memory.

char *
strcpy(char *s1, const char *s2)
{
    size_t len = strlen(s2);

    if ( !s1 ||
         !s2 ||
        /* Check for wrapping while copying */
        ((((unsigned long) -1) - ((unsigned long) s1)) < len) ||
        ((((unsigned long) -1) - ((unsigned long) s2)) < len))
    {
        return s1;
    }

    memmove(s1, s2, len);
    s1[len] = '\0';

    return s1;
}

strlen

August 23, 2017

tags: string.h, strlen

Although C doesn't have true string objects, it does have arrays of characters and string literals (static character arrays) which we'll call "C strings." Since there is no string object then nothing tracks the length of a C string and C must provide a way to denote the end of a string so the length can be programmatically determined. This is accomplished by placing a NUL terminator, '\0', after the last character.

Determining the length of a C string is such a common task that a function for it is provided in the standard library, strlen. It's prototype is as follows

size_t
strlen(const char *s);

It takes a pointer to the first character in a C string and returns the number of characters in the string, not counting the NUL terminator.

Common Ways

A simple approach to strlen is as follows

size_t
strlen(const char *s)
{
    size_t len = 0;

    for (; *s++; len++)
    {
        /* Do nothing */
    }

    return len;
}

Or similarly using a while loop:

size_t
strlen(const char *s)
{
    size_t len = 0;

    while (*s++)
    {
        len++;
    }

    return len;
}

These take advantage of the fact that the NUL terminator has an integer value of 0. The loop conditions are actually checking that the current character pointed to does not have the integer value of 0, in other words that it's not the NUL terminator. This could be further shortened as such

size_t
strlen(const char *s)
{
    size_t len = 0;

    while (*s++ && ++len);

    return len;
}

Which goes a step further in utilizing the short circuit nature of C conditionals meaning that if the first condition is not met then the second is not tested. This means that the len variable is only incremented when the current character is not the NUL terminator.

Although these are straightforward and simple implementations, glibc, bsdlibc, and musl all use more complicated, but more efficient, versions which use a technique similar to the one I discussed in the memcpy.h article. Once a word boundary is reached the function will test an entire word at a time to detect a single zero valued byte.

The above implementations do not account for NULL pointers or for a string which is not NUL terminated and may wrap around memory. These are things we'll cover below.

Local Variables

For our implementation, we'll only use a single local variable as a pointer to the end of the string (whenever we find it). Its initial value will be the start of the string since it's possible that the string is zero characters long.

const char *end = s;

Parameter Validation

The only parameter to this function is the character pointer, s, which we need to verify is not NULL. This will be one condition of the test loop in the main body of the function.

while (end)
{
}

Implementation

A loop is an obvious choice for iterating over a C string, each time checking whether the current character is the NUL terminator. To factor in our parameter validation and memory wrapping we'll have the first condition check the validity of the end pointer which was initialized with the value of s. We need to increment end inside the loop body in case the check against \0 fails, otherwise we'll increment it once more than intended when we reach the NUL terminator.

while (end &&
       ('\0' != *end))
{
    end++;
}

One of two things will cause the loop to terminate. Either the end pointer will be NULL or we reached the end of the string. The end pointer being NULL represents an error condition that we need to account for by resetting the value of end. Similarly, if an unbounded string at the end of memory was passed to this function, it's possible that the pointer will be incremented after it already points to the last addressable byte. This results in undefined behavior which we can attempt to handle by checking whether the resultant pointer is less than the original pointer. In such a case, we will also reset our end pointer so the string length will be reported as zero since an error will likely occur otherwise.

Lastly, we subtract s from the resulting end pointer to get the length of the string.

while (end &&
       ('\0' != *end))
{
    end++;
}

if (!end ||
    (end < s))
{
    end = s;
}

return (size_t) (end - s);

Testing

We can easily test whether our input validation works by passing NULL to strlen. We can also easily test the length of an empty string which should be 0, as well as a string with a single character in it. For a more "dynamic" test we will use a character array initialized with a string literal. The array will automatically have enough room allocated on the stack for every character in the string literal plus one more for the NUL terminator. As such, the length of that string should be the size of the array minus 1. We're unable to test strings which wrap memory because doing so intentionally would invoke undefined behavior for which the results are... undefined.

int
strlenTest(void)
{
    int ret = -1;
    char *str1 = NULL;
    char str2[] = "hello, world";
    char *str3 = "";
    char *str4 = "a";

    do
    {
        /* NULL pointer */
        if (0 != strlen(str1))
        {
            break;
        }

        /* String length should match array size - 1 */
        if ((sizeof(str2) - 1) != strlen(str2))
        {
            break;
        }

        /* Empty string */
        if (0 != strlen(str3))
        {
            break;
        }

        /* Single character string */
        if (1 != strlen(str4))
        {
            break;
        }

        ret = 0;
    } while (0);

    return ret;
}

Conclusion

Although simple, this implementation of strlen affords us nice error checking and handling of undefined behaviors should they behave in a certain way.

size_t
strlen(const char *s)
{
    const char *end = s;

    /* Check s every time to safeguard against non-NUL terminated strings */
    while (end &&
           ('\0' != *end))
    {
        end++;
    }

    /* Verify the loop didn't stop in an error and s didn't wrap memory */
    if (!end ||
        (end < s))
    {
        end = s;
    }

    return (size_t) (end - s);
}

memcmp

May 21, 2017

tags: memcmp, string.h

Comparing memory regions is an incredibly useful tool when writing C code. This can be used to verify that two regions contain the same data or to compare against a desired value with user-provided data. The memcmp function provides this functionality and has the following prototype

int
memcmp(const void *s1, const void *s2, size_t n);

The return value is an integer which indicates whether the first region is greater than, equal to, or less than the second region. Since we only need to read the data it makes sense that the pointers are const which is an extra safety measure to ensure we don't modify what is being compared.

Local Variables

As with memcpy and memmove, we need two local pointers that enable us to perform comparisons. We will also use an integer to hold the value we will be returning.

const unsigned char *pStr1  = s1;
const unsigned char *pStr2  = s2;
int ret                     = 0;

Parameter Validation

The only validation we need to perform is on the pointer values. The n parameter is of type size_t which means it will always be greater than or equal to zero. This brings up an interesting question: what should the return value be when no memory is compared? The Standard provides no direct answer to this, but we can infer one from section 7.11.4, "Comparison functions", which states

"The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the first two values of the first pair of characters (both interpreted as unsigned char), that differ in the objects being compared."

So, if we were to return a nonzero value we would need to be able to perform a comparison which we can't do if n is zero. As such, if n is zero then the return value will also be zero. Therefore, we don't need to perform any validation on n since all values of n are valid.

Lastly, we need to verify that the comparison will not cause the pointers to wrap around memory. This will be done in the same manner as memmove where we add n to each pointer and see if the result is less than the pointer itself.

Input errors will be indicated with a value of 1 since no comparison will be made and we cannot otherwise determine the correct sign of the difference according to the rules above.

if ( !s1 ||
     !s2 ||
    /* Check for wrapping while comparing */
    ((((unsigned long) -1) - ((unsigned long) s1)) < n) ||
    ((((unsigned long) -1) - ((unsigned long) s2)) < n))
{
    return 1;
}

Implementation

A similar approach to memmove will be taken where we loop over each character, decrementing n each iteration, but we will compare values rather than copy them. We can store the compared value so that we can check it and break out of the loop upon inequality like so

while (n-- > 0)
{
    ret = *(pStr1++) - *(pStr2++);

    if (0 != ret)
    {
        break;
    }
}

However, this adds the overhead of storing that value for every iteration of the loop. We only care about that value when it is not equal to zero so we should only store it when that's the case. This requires us to backup the pointers one step since they were incremented within the if statement.

while (n-- > 0)
{
    if (*(pStr1++) - *(pStr2++))
    {
        ret = *(--pStr1) - *(--pStr2);
        break;
    }
}

This implementation is $O(n)$ which is as good as you can do while staying within the constraints of The Standard. The multi-character copy approach could also be used for comparisons but that's still $O(n)$ and it violates The Standard.

Testing

Aside from testing the obvious cases like equal regions and differing regions (both to get a greater than and less than result), we also need to test that our input validation works as expected. This includes passing in NULL pointers, providing 0 for n, and intentionally attempting to compare regions which will cause an overlap to occur. We'll order the tests such that we only verify correct region comparisons after we verify that input validation works appropriately.

int
memcmpTest(void)
{
    int ret         = 1;
    char pStr1[]    = "This is a Test string";
    char pStr2[]    = "This is a Test string";
    char pStr3[]    = "This is a Lesser string";
    char pStr4[]    = "This is a greater string";
    char *pStr5     = NULL;

    do
    {
        /* NULL s2 pointer */
        if (1 != memcmp(pStr1, pStr5, sizeof(pStr1)))
        {
            break;
        }

        /* NULL s1 pointer */
        if (1 != memcmp(pStr5, pStr1, sizeof(pStr1)))
        {
            break;
        }

        /* Cause a wrap from s1 */
        if (1 != memcmp((void *) ((unsigned long) -5), pStr1, sizeof(pStr1)))
        {
            break;
        }

        /* Cause a wrap from s2 */
        if (1 != memcmp(pStr1, (void *) ((unsigned long) -5), sizeof(pStr1)))
        {
            break;
        }

        /* Compare no characters */
        if (0 != memcmp(pStr1, pStr2, 0))
        {
            break;
        }

        /* Compare no characters with an invalid s2 pointer */
        if (0 == memcmp(pStr1, pStr5, 0))
        {
            break;
        }

        /* Compare no characters with an invalid s1 pointer */
        if (0 == memcmp(pStr5, pStr1, 0))
        {
            break;
        }

        /* Test equality */
        if (0 != memcmp(pStr1, pStr2, sizeof(pStr1)))
        {
            break;
        }

        /* First string greater than second string */
        if (0 >= memcmp(pStr1, pStr3, sizeof(pStr1)))
        {
            break;
        }

        /* First string less than second string */
        if (0 <= memcmp(pStr1, pStr4, sizeof(pStr1)))
        {
            break;
        }

        ret = 0;
    } while (0);

    return ret;
}

Conclusion

This function follows the same pattern that we've seen in memcpy and memmove with the only differences being that a comparison is performed instead of a copy and that we don't need to worry about comparing in a certain direction since data is never written. Remember that comparing zero bytes of memory will produce a return value indicating that the regions are equal. Now that we have memcmp, we can use it to test future functions like strcpy or strcat.

int
memcmp(const void *s1, const void *s2, size_t n)
{
    const unsigned char *pStr1  = s1;
    const unsigned char *pStr2  = s2;
    int ret                     = 0;

    if ( !s1 ||
         !s2 ||
        /* Check for wrapping while comparing */
        ((((unsigned long) -1) - ((unsigned long) s1)) < n) ||
        ((((unsigned long) -1) - ((unsigned long) s2)) < n))
    {
        return 1;
    }

    while (n-- > 0)
    {
        if (*(pStr1++) - *(pStr2++))
        {
            ret = *(--pStr1) - *(--pStr2);
            break;
        }
    }

    return ret;
}

One Function Per File

May 20, 2017

tags: assembling, compiling, design, linking, make

A common layout model seen in most libc implementations (e.g. glibc, bsdlibc, musl, etc.) is to have one source file for every function in libc. For example, memcpy.c, memmove.c residing under a string/ directory. This is in contrast to having one source file corresponding to every libc header (i.e. string.c for string.h). We'll explore why this approach is taken and then demonstrate how to convert to the more common model.

Static Linking

One of the root causes of this layout model is static linking against the standard library. Static linking takes whatever you're linking against, like libc, and embeds those functions into your output binary. This increases the size of your binary somewhat but frees you from worrying about what version of that library may or may not be installed on whatever system your user is running on.

Dynamically linking places some requirements on the version of the library you're linking against. If your user chose to upgrade their standard C library and that library provides more accurate results for sin() than you're expecting then you may run into issues.

This decision means we're trading portability for size or vice versa. Since we're developing a library that will be linked against we want to be considerate to our users so that this decision is easier to make. More specifically, we want the output binary size to be as small as possible when someone statically links against welibc. How do we do that? Place every function in its own file. The real question that we'll be answering is why placing one function per file produces smaller binaries.

Binary Size

To find out why one function per file is better, we need to do a comparison of statically linking against a library written one way versus the other. The test code will use one, but not both, of the functions we have so far in welibc so that we can show when, or if, unnecessary code shows up in our output binary. If you use all of the functions then the approach doesn't matter since you get the entire library anyways. Our test linking program, link.c, will be the following:

#include <string.h>

int
main(int argc, char *argv[])
{
    char src[] = "Hello, world!";
    char dst[20] = { 0 };

    memcpy(src, dst, sizeof(src));

    return 0;
}

This program will be compiled and linked against welibc in the same way the test code is done:

$ gcc -nostartfiles -nodefaultlibs -nostdinc -nostdlib -ffreestanding -isystem ./include -ansi -c link.c -o link.o
$ gcc -nostartfiles -nodefaultlibs -nostdinc -nostdlib -ffreestanding -isystem ./include -ansi link.o -lwelibc -L. -O0 -o test_link

One modification I made is to remove the use of the -ggdb3 flag so that no debugging information is generated. This will simplify things in a moment.

Comparison

welibc is currently written with one source file per header so we'll look at that before changing anything. Using the above procedure and renaming the output file to "combined", we get an output file size of 2164 bytes.

Separating out memmove to a dedicated C file, rebuilding welibc, and relinking the test code to welibc with a file name of "separate" gives a file size of 1892 bytes.

How do we find out where the 272 byte increase comes from? objdump will tell us exactly what is contained in the binaries and we can diff the output of each binary:

$ objdump -d combined > comb.txt
$ objdump -d separate > sep.txt
$ diff comb.txt sep.txt
2c2
< combined:     file format elf64-x86-64
---
> separate:     file format elf64-x86-64
101,162d100
<
< 0000000000400282 <memmove>:
<   400282:     55                      push   %rbp
<   400283:     48 89 e5                mov    %rsp,%rbp
<   400286:     48 89 7d e8             mov    %rdi,-0x18(%rbp)
...

The truncated output contains the rest of the instructions for the memmove function. The only difference is that the "separate" binary also contains the memmove function (which was never called). Let's see exactly how large the function is by subtracting the address of the first byte of the first instruction (0x400282) from the address of the last byte of the last function.

$ diff comb.txt sep.txt | tail
<   40032f:     48 8b 45 f8             mov    -0x8(%rbp),%rax
<   400333:     88 10                   mov    %dl,(%rax)
<   400335:     48 8b 45 d8             mov    -0x28(%rbp),%rax
<   400339:     48 8d 50 ff             lea    -0x1(%rax),%rdx
<   40033d:     48 89 55 d8             mov    %rdx,-0x28(%rbp)
<   400341:     48 85 c0                test   %rax,%rax
<   400344:     75 d8                   jne    40031e <memmove+0x9c>
<   400346:     48 8b 45 e8             mov    -0x18(%rbp),%rax
<   40034a:     5d                      pop    %rbp
<   40034b:     c3                      retq
$ echo $((0x40034b - 0x400282))
201

The difference in the file sizes was 272 bytes though, there are still 71 bytes unaccounted for. These are attributed to other changes required to faciliate this extra function which come in the form of added, or slightly changed, instructions in the exception header and frame.

It's clear that separating functions to their own source file ensures that programs which statically link against welibc will only receive the functions they call. Next we'll discover why this happens.

Translation Units

During the compilation of C programs the compiler will deal in discrete pieces of the larger program which are referred to as translation units. The Standard defines a translation unit as "A source file together with all the headers and source files included via the preprocessing directive #include, less any source lines previously skipped by any of the conditional inclusion preprocessing directives ...".

When all string.h functions are included in a single source file they get lumped into a single translation unit. When you make a call into code within a translation unit, the entire unit must be included in the final output executable. When all functions for a given section of the standard library are in the same file (i.e. string.c), you end up including all functions even if only one was called. This is seen with the combined executable including memmove even though only memcpy was called.

Dynamic Linking

Now that we know why separating functions to their own function is beneficial for static linking, we need to explore what affect this will have on dynamic linking.

When dynamic linking is used, the operating system will load a library into an address space the first time an executable needs to use it. That library is then made available to any other executables that need to use it, without the overhead of it being loaded. The operating system takes care of linking the running executables with the library at run time (hence the name dynamic).

When a library is compiled with the intention of being used with dynamic linking, the output size isn't a concern because the price of loading is paid only once. Moving each function to its own file effectively makes no difference for dynamic loading since all code is included anyways.

Reorganizing

The way I'd like to lay out the source code, and the way other projects have also done it, is with directories that correspond to each header file. For example:

└── src/
    ├── assert
    ├── errno
    ├── math
    ├── stddef
    ├── stdio
    ├── stdlib
    └── string

This means that some changes need to be made to the Makefile so that it will find code two directories deep.

Gather Source Directories

First we need to find all directories that contain source code. After creating some new directories and shuffling the files around, the layout looks like so:

src/
├── errno
│   └── errno.c
├── _start.s
└── string
    ├── memcpy.c
    └── memmove.c

Make has a builtin function, wildcard, which will expand each pattern it receives into the files and directories that it finds.

$(wildcard $(SRCDIR)/*/)

With the current layout, this will expand to

src/string/ src/errno/ src/_start.s

We don't want "src/_start.s" in the mix, so we can use the builtin dir function to extract the directory portion of each filename:

$(dir $(wildcard $(SRCDIR)/*/))

Which gives us:

src/string/ src/errno/ src/

Finally, we will use sort to order the names and remove any possible duplicates. This list of directories will be stored in a variable for later use:

$(SRC_DIRS):= $(sort $(dir $(wildcard $(SRCDIR)/*/)))

Gather Source Files

Now that we've found all directories containing source code, we want to gather all source files that exist within those directories. The addsuffix builtin will add the given suffix to every item in the list; this is perfect for using another wildcard to find all files of a given type. Finally, the notdir function is used to extract just the filename from an item.

C_SRCS  :=  $(notdir $(wildcard $(addsuffix *.c,$(SRC_DIRS))))
S_SRCS  :=  $(notdir $(wildcard $(addsuffic *.s,$(SRC_DIRS))))

This gives us a listing of all current source files:

errno.c memmove.c memcmp.c memcpy.c
_start.s

Convert Sources into Objects

Next we will slightly alter the existing pattern substitutions so that the input data comes from $(C_SRCS) and $(S_SRCS).

OBJECTS := $(patsubst %.c,%.o,$(C_SRCS))
OBJECTS += $(patsubst %.s,%.o,$(S_SRCS))

Modify VPATH

Now that we know the names of all source files and the directories in which they exist, we need to modify the VPATH variable so that it contains all of the source directories. This way Make can find the source files when we refer to them by filename only, rather than by full path. This means switching out $(SRCDIR) for $(SRC_DIRS).

$(VPATH)   := $(SRC_DIRS):$(TSTDIR)

Conclusion

After seeing the advantages of placing every function in its own file it's clear that this is the way welibc should be organized. This only required a few adjustments to the Makefile and we're all set. Now binaries that link against welibc will be leaner and the code will be easier to browse.

memmove

December 16, 2016

tags: memmove, string.h

The followup function to memcpy is memmove which performs the same operations but with the guarantee that overlapping memory regions will not cause any issue. The stipulation added by memmove is that "copying takes place as if the n characters from the object pointed to by s2 are first copied into a temporary array of n characters that does not overlap the objects pointed to by s1 and s2, and then the n characters from the temporary array are copied into the object pointed to by s1." The prototype is as follows

void *
memmove(void *s1, const void *s2, size_t n);

As with memcpy, the return value is not very useful and provides no way of indicating an error. Similarly, s2 is a const void * meaning that we cannot modify the data to which it points.

The "Obvious" Approach

Given the wording of extra stipulation, the logical approach would be to allocate a chunk of new temporary memory, copy from s2 to the temporary region, copy from the temporary region to s1, and finally free the temporary region. Something like this:

char *pDst          = s1;
const char *pSrc    = s2;
char *pTmp          = NULL;

/* Error checking */

pTmp = malloc(n);

if (NULL == pTmp)
{
    return s1;
}

memcpy(pTmp, pSrc, n);
memcpy(pDst, pTmp, n);

free(pTmp);
pTmp = NULL;

return s1;

However, this has an issue if the call to malloc fails. From implementing memcpy we already know that we can copy objects backwards to account for object overlap, so if the malloc fails we can just do that.

char *pDst          = s1;
const char *pSrc    = s2;
char *pTmp          = NULL;

/* Error checking */

pTmp = malloc(n);

if (NULL != pTmp)
{
    memcpy(pTmp, pSrc, n);
    memcpy(pDst, pTmp, n);

    free(pTmp);
    pTmp = NULL;
}
else
{
    /* Copy forward or backward to handle object overlap */
    if (pDst < pSrc)
    {
        while (0 < n--)
        {
            *(pDst++) = *(pSrc++);
        }
    }
    else
    {
        pDst += n;
        pSrc += n;

        while (0 < n--)
        {
            *(--pDst) = *(--pSrc);
        }
    }
}

return s1;

However, this is extra logic and code that's unnecessary. We can use our same strategy from memcpy from the beginning and not worry about malloc failing, using extra local variables, or having a new branch in the code to check malloc's return value.

The Simple Approach

By removing the use of malloc and memcpy, our memmove implementation looks like so:

char *pDst          = s1;
const char *pSrc    = s2;

/* Error checking */

/* Copy forward or backward to handle object overlap */
if (pDst < pSrc)
{
    while (0 < n--)
    {
        *(pDst++) = *(pSrc++);
    }
}
else
{
    pDst += n;
    pSrc += n;

    while (0 < n--)
    {
        *(--pDst) = *(--pSrc);
    }
}

return s1;

This should look familiar because it's exactly the same code found in our implementation of memcpy. In fact, we can even reuse the same error checking and once added, memmove will be implemented identically to memcpy. So, why don't we just call one function straight from the other? Well, we could call memmove from memcpy because the former operates under stricter guidelines that also fulfill the job of the latter. However, we could not call memcpy from memmove because the former makes no guarantee about copying between overlapping objects and this is a requirement for memmove.

Testing

Since memmove provides similar functionality to memcpy we can also reuse our tests by replacing every call to memcpy with a call to memmove. See the memcpy post for the tests.

Conclusion

In the end, this is a copy of our implementation of memcpy and we could slightly clean up the code by calling memmove from memcpy. We can reuse the same error checking to end at the following implementation for memmove.

void *
memmove(void *s1, const void *s2, size_t n)
{
    char *pDst          = s1;
    const char *pSrc    = s2;

    if ( !s1 ||
         !s2 ||
        /* Check for wrapping while copying */
        ((((unsigned long) -1) - ((unsigned long) s1)) < n) ||
        ((((unsigned long) -1) - ((unsigned long) s2)) < n))
    {
        return s1;
    }

    /* Copy forward or backward to handle object overlap */
    if (pDst < pSrc)
    {
        while (0 < n--)
        {
            *(pDst++) = *(pSrc++);
        }
    }
    else
    {
        pDst += n;
        pSrc += n;

        while (0 < n--)
        {
            *(--pDst) = *(--pSrc);
        }
    }

    return s1;
}

memcpy

March 08, 2016

tags: memcpy, string.h

The inaugural string.h function is memcpy which will copy the specified number of characters from one object to another. It takes two character pointers, one to the destination object and one to the source object, along with the number of characters to copy. It returns a pointer to the destination object.

void *
memcpy(void *s1, const void *s2, size_t n);

As for the return value, the standard makes no indication that an error value would be returned if something went wrong, so memcpy lacks a way to do meaningful error checking. Also note that s2 is a const void *, meaning that the data to which it points won't be modifiable through the s2 pointer, but the pointer itself could be changed.

Local Variables

Since this function isn't very complicated, we will only need two local variables, a source pointer and a destination pointer. But, isn't that was s1 and s2 already are? Well, they are in a sense, but since they are void pointers then we can't do anything with them because they don't point to any specific type. What we need to do is copy those pointers into char pointers so that we can read and write characters.

char *pDst          = s1;
const char *pSrc    = s2;

Note that our source needs to be a const char * to match the const that is attached to s2. Without this, you should get a warning about discarding the const qualifier:

src/string.c: In function ‘memcpy’:
src/string.c:59:21: warning: initialization discards ‘const’ qualifier from
pointer target type [enabled by default]
     char *pSrc    = s2;
                     ^

Parameter Validation

The first thing we should do is check our inputs to be sure they are valid before we use them. The basic checks here would be ensuring that we don't have NULL pointers and that our size isn't zero:

if ( !s1 ||
     !s2 ||
     !n))
{
    return s1;
}

This will prevent us from having a NULL pointer dereference when we read or write a character and prevent us from executing unnecessary instructions if we didn't need to perform any actual copying.

The standard explicitly states that 'If copying takes place between objects that overlap, the behavior is undefined.' So, we don't have to verify that s1 isn't in the range of [s2, s2 + n - 1], or that s2 isn't in the range [s1, s1 + n - 1]. Undefined behavior means that the standard allows us to ignore this behavior, to implement a specific behavior that is then required to be documented, or to terminate execution completely. For right now, we will just ignore the behavior.

Function Body

Since we have the number of characters to copy stored in n, we'll use that as a loop counter and compare it against 0 to determine when we need to stop copying. We can then use our source and destination pointers to copy a single character at a time. Once we're done copying, all we need to do is return the original destination pointer.

while (0 < n--)
{
    *(pDst++) = *(pSrc++);
}

return s1;

Security

Although the standard says we can just ignore overlapping objects pointed to by s1 and s2, we should be able to easily verify that the objects don't overlap with the simple range checks that were mentioned before. Since size_t is an unsigned type, then we can use suggestions from the INT30-C section of the CERT C Coding Standard to check for overlap and wrapping:

if ( !s1 ||
     !s2 ||
     !n ||
    /* Check for object overlap */
    ((((unsigned long) s2) - ((unsigned long) s1)) < n) ||
    ((((unsigned long) s1) - ((unsigned long) s2)) < n) ||
    /* Check for wrapping while copying */
    ((((unsigned long) -1) - ((unsigned long) s1)) < n) ||
    ((((unsigned long) -1) - ((unsigned long) s2)) < n))
{
    return s1;
}

Due to a limitation of C90, there is no integral type guaranteed to be large enough to hold the value of a pointer, but in order to perform this error checking we must convert the pointers to an integral type. The largest possible type available in C90 is unsigned long, so that is what we'll convert the pointers to in order to perform arithmetic on them. We'll add a test later on to ensure that this is true on the system where this library is tested. As well, the function comment will include a warning about our assumption.

That's a lot of checking but it provides validation that our copy should succeed. There are other errors that could still happen, like reading outside of a memory boundary, but they are difficult (or impossible) to prevent in a general enough way to be included in the library. There are two camps here: either you prevent the user from doing something bad and let them handle the error checking themselves, or you let them try something bad at the expense of extra cycles spent on error checking in the library itself. Most implementations are done in the first way and I think it would be interesting to do it the latter way.

Backwards copy?

Another use for memcpy would be to slide a region of data over by some amount. But, what if we slide the region over by so little that its destination is actually within the original region? Well, our security checks will prevent that from happening! This is nice, but in actuality a user may need to do this very thing and the standard doesn't actually say we have to prevent it. To allow this we'll first have to remove the overlap checks.

Now that we allow the source and destination regions to overlap, let's look at what happens when we slide a region to lower memory such that destination region will overlap with the source. Assume that we have the characters, A, B, C, and D stored in memory adjacent to each other at address 0x8000, and we want to move it over to address 0x7FFE.

First, we read an A from 0x8000 and store it at 0x7FFE. Next, we read a B from 0x8001 and store it at 0x7FFF. Next we read a C from 0x8002 and store it at 0x8000, overwriting the A from our source region. Finally, we read a D from 0x8003 and store it at 0x8001, overwriting the B from our source region. We end up with A, B, C, D starting at address 0x7FFE. Although we overwrote our source region with new data, it wasn't an issue because we read from the source locations before overwriting them and our memcpy was a success.

Next, let's look at what happens when we slide a region to higher memory such that the destination region will overlap with the source. We'll use the same source address and data, but the destination will be address 0x8002.

First, we read an A from 0x8000 and store it at 0x8002, overwriting the C from our source region. Next, we read a B from 0x8001 and store it at 0x8003, overwriting the D from our source region. Next, we read an A from 0x8002 and store it at 0x8004. Finally, we read a B from 0x8003 and store it at 0x8005. We end up with A, B, A, B, starting at address 0x8002. This is the incorrect result and it happened because we overwrote source locations before reading from them.

So, how can we implement memcpy in such a way that the source and destination may overlap but the copy will still happen properly? Well, for the case that we slide the region in to higher memory, we can simply copy from the end of the region to the start, rather than the normal method of copying from start to end.

We'll redo our failed copy backwards and see what happens. First, we read a D from 0x8003 and store it at 0x8005. Next, we read a C from 0x8002 and store it at 0x8004. Next, we read a B from 0x8001 and store it at 0x8003, overwriting the D from our source region. Finally, we read an A from 0x8000 and store it to 0x8002, overwriting the C from our source region. We end up with A, B, C, D, starting at address 0x8002 and our memcpy was a success when done backwards.

In the same way that the forward copy failed when you have overlapping regions and move from lower to higher memory, a similar copy from higher to lower memory will fail when done backwards. As such, we must check the source and destination and perform a forward or backward copy as appropriate. The only true way to do this is by having two copies of the same code where one is implemented as normal, while the other moves the pointers to the end of the region and decrements them as it copies.

/* Copy forward or backward to handle object overlap */
if (pDst < pSrc)
{
    while (0 < n--)
    {
        *(pDst++) = *(pSrc++);
    }
}
else
{
    pDst += n;
    pSrc += n;

    while (0 < n--)
    {
        *(--pDst) = *(--pSrc);
    }
}

Our backward copy works very similarly to the forward copy with the exception of where our pointers start. Since we use the pre-decrement operator we must start out past the region we want to copy. This will allow us to back up in to the valid region and then perform the copy.

Efficiency

When you look at the complexity of an algorithm you can measure two things to get an idea of how the algorithm will perform: time and space. We'll look at how to analyze them both, how to improve the efficiency of memcpy, and some other generic notes on efficiency. That being said, keep in mind the famous quote, "Premature optimization is the root of all evil." Try not to be too clever and save the hardcore optimizations for when there is a true need for them (i.e. your algorithm isn't meeting speed requirements or it's using too much memory).

Before getting in to the details of efficiency, it's worth noting that the algorithm we have at this point is truly as efficient as it can be according to the standard. This is due to strict aliasing and object access rules which limit the ways in which we can read or write data. The standard tells us that a pointer to char may be used to access any other type of object which is why those are the only pointers we can use. Unfortunately, this limits us to copying one byte at a time.

For the sake of exploration we'll look at using other pointer types to perform the copy in a faster way, but remember that this breaks the rules of the standard.

Time

A common way to measure the time of an algorithm is to use Big O Notation which represents a cap on the growth of a function in relation to its input. This is appropriate for C code because each line, comparison, or assignment could actually take multiple instructions to execute depending on what hardware it runs on. Using Big O Notation allows us to normalize our measurements, providing an accurate description of our function regardless of where it runs.

For this memcpy algorithm, we can see that there is a direct correlation between the amount of input data, n, and the number of steps required to complete the copy. This means our function can be described with $O(n)$ in relation to how long it takes to execute.

Space

We can also use Big O Notation to describe the amount of memory, or space, that a function uses to do its work. With C programming, you may have arrays or multiple variables used to keep track of state, but the space for those will be constant and most likely on the stack. There is the concept of dynamic memory allocation where you use variable amounts of memory from the heap; these will be indicated by calls to functions like malloc or calloc. Since the amount of memory used with dynamic memory allocation is unknown at run time, this is the memory we are interested in measuring. Most other uses of memory will be known at compile time and would remain constant.

Our implementation of memcpy does not make use of dynamic memory allocation which means it will use the same amount of memory every time, regardless of the size of input data. We can describe this behavior with $O(1)$ indicating constant memory usage.

Multi-character copy

Copying one character at a time could be pretty slow if we were copying a large amount of characters. It could be more efficient to do four characters at a time using an int pointer (points to four bytes on x86_64), or even eight characters at a time using a long pointer (points to eight bytes on x86_64). This will incur some additional overhead for calculating which type to use with what values of n, as well as extra logic to copy the leftover characters if the chosen type is not a multiple of n. Let's explore this.

Side-note: copying via a pointer to a non-char type will violate the strict-aliasing rules of C because only pointers to void or char types may alias any other pointer. Since we don't have any type information about the objects being copied to or from, we could only use pointers to char types if we want to stay within the bounds of strict-aliasing rules.

Alignment

The first thing we'll look at is alignment, which deals with how data is arranged and accessed in memory. Data said to be "n-byte aligned" will always be at a memory address that is a multiple of n. For example, four-byte aligned data will always be at an address that ends in 0x0, 0x4, 0x8, or 0xC, because those are all multiples of four. This is an important concept when copying memory because different amounts of data can be accessed depending on the type used for access, but this only works if the addresses match the alignment of that type.

Each type in C may have a different alignment due to its size in bytes. Examples from x86_64 would be char, short, int, and long, which each have a size of one, two, four, and eight bytes, respectively, and also have the same values for their alignment. This means that storing 0xDEADBEEF at address 0x8000 might look like so:

unsigned char *pChar = (unsigned char *) 0x8000;
unsigned short *pShort = (unsigned short *) 0x8000;
unsigned int *pInt = (unsigned int *) 0x8000;
unsigned long *pLong = (unsigned long *) 0x8000;

*pChar = 0xEF;
*(pChar + 1) = 0xBE;
*(pChar + 2) = 0xAD;
*(pChar + 3) = 0xDE;

*pShort = 0xBEEF;
*(pShort + 1) = 0xDEAD;

*pInt = 0xDEADBEEF;

*pLong = 0xDEADBEEF;

As you can see, the larger the size of a type, the larger the amount of data that be accessed at one time. However, there could be a point at which you stop seeing benefits, as can be seen when we used int and long pointers.

The address we chose, 0x8000, is special because it fits the alignment of each of the four types. This won't always be the case and this is where problems start to happen. If we had chosen the address 0x8002 then the only valid types for accessing that address would be char and short, because 0x8002 is a multiple of both one and two, but not four or eight. This will come in to play when attempting to do multi-byte copies, because both the source and destination addresses will need to have the same alignment. The convenient part about copying one char at a time is that every address is one-byte aligned so there are no alignment issues to worry about.

The largest amount of data that a processor can access at one time is referred to as a word, so if we can copy a word at a time then we will theoretically be copying memory as fast as possible. The word size on each machine will change, but it will always be a multiple of the largest data type available on that machine. In terms of C90, this means copying a long at a time is the best we can do. The sign of the long is irrelevant here because an unsigned long occupies the same amount of memory as a long.

For more information about alignment, see this article by Johnathan Rentzsch.

Calculating Efficiency Gains

The second thing we'll look at is how to determine which type will be the most efficient at copying a certain amount of data. To do this, we need to consider how many steps are taken to perform a copy. For simplicities sake, we will consider one copy to be one step. If we use different types then that one step may copy one byte of data or it may copy eight.

To take care of any alignment issues (assuming that the source and destination are capable of being aligned), we will assume that our copies start one byte past an alignment boundary, and end one byte before an alignment boundary which will represent the worst-case scenario. For an n-byte aligned type, this means we have $n-1$ steps before and after we perform the n-byte aligned bulk copy. This also means that an n-byte aligned type will only be used to copy $3n - 2$ characters, or more. Otherwise, we wouldn't have any characters left to copy after handling alignment.

For x86_64, our thresholds for performing copies with each type are as follows:

char, 1-byte aligned: $3(1) - 2 = 3 - 2 = 1$
short, 2-byte aligned: $3(2) - 2 = 6 - 2 = 4$
int, 4-byte aligned: $3(4) - 2 = 12 - 2 = 10$
long, 8-byte aligned: $3(8) - 2 = 24 - 2 = 22$

As you can see, the thresholds are very close together. To determine how many steps, in the worst case, are used to copy $x$ number of characters with an $n$-byte aligned type, we will derive an equation based on the three parts of the copy: a) getting to an alignment boundary, b) the bulk copy, and c) copying whatever is leftover:

$$ f(x,n) = a + b + c $$

$$ a = (n - 1) $$

$$ b = \left \lfloor \dfrac{x - (n - 1)}{n} \right \rfloor $$

$$ c = x - a - nb $$

And if we substitute in $c$ we can simplify:

$$ f(x,n) = a + b + c = a + b + x - a - nb = x + b - nb = x + b(1 - n)$$

Then substitute in $b$ since $a$ isn't needed anymore:

$$ f(x,n) = x + b(n - 1) = x + \left \lfloor \dfrac{x - (n - 1)}{n} \right \rfloor (1 - n) $$

As an example, let's compare copying 100 characters using short and long types.

$$ f(100, 2) = 100 + \left \lfloor \dfrac{100 - (2 - 1)}{2} \right \rfloor (1 - 2) $$

$$ f(100, 2) = 100 + \left \lfloor \dfrac{100 - 1}{2} \right \rfloor (-1) $$

$$ f(100, 2) = 100 - \left \lfloor \dfrac{99}{2} \right \rfloor $$

$$ f(100, 2) = 100 - \left \lfloor 49.5 \right \rfloor $$

$$ f(100, 2) = 100 - 49 $$

$$ f(100, 2) = 51 $$

$$ f(100, 8) = 100 + \left \lfloor \dfrac{100 - (8 - 1)}{8} \right \rfloor (1 - 8) $$

$$ f(100, 8) = 100 + \left \lfloor \dfrac{100 - 7}{8} \right \rfloor (-7) $$

$$ f(100, 8) = 100 + \left \lfloor \dfrac{93}{8} \right \rfloor (-7) $$

$$ f(100, 8) = 100 + \left \lfloor 11.625 \right \rfloor (-7) $$

$$ f(100, 8) = 100 + 11(-7) $$

$$ f(100, 8) = 100 - 77 $$

$$ f(100, 8) = 23 $$

Considering the range of size_t, which dictates how many characters get copied at once, the efficiency gains of using a larger type are noticed for very small values of n.

Implementing a typeless memcpy

Next we'll look at implementing a multi-byte memcpy by starting with the basic copy from before and building up to a version which will work regardless of the type used for the "bulk" copy portion. Here was our original copy loop:

while (0 < n--)
{
    *(pDst++) = *(pSrc++);
}

This relied on the size of each copy being one byte allowing us to use the post-decrement operator to reduce our counter. More generically, this loop would be:

while (0 < n)
{
    *(pDst++) = *(pSrc++);
    n -= sizeof(*pDst);
}

Which will accomplish the same thing while working with any type chosen for pDst and pSrc (assuming they're the same). For now we are assuming that both pDst and pSrc point to memory addresses which are aligned and that n is a multiple of sizeof(*pDst); we'll deal with that later. We need to further abstract this to account for getting to an aligned boundary before the bulk copy and then to copy any leftover bytes.

Like we mentioned with our object overlap checks, C90 doesn't have an integral type that is guaranteed to be large enough to hold the value of a pointer and we need such a type in order to check the alignment of a pointer. The best we can do is use the largest type available to us, long, and note the invalid assumption that it will hold the value of a pointer. In our tests we can include a check to ensure that this is true.

char *pCharDst          = s1;
const char *pCharSrc    = s2;
my_type *pDst           = NULL;
const my_type *pSrc     = NULL;

/* Error checking */

while ((0 < (((unsigned long) pCharDst) & (sizeof(*pDst) - 1))) &&
       (0 < n--))
{
    *(pCharDst++) = *(pCharSrc++);
}

pDst = (my_type *) pCharDst;
pSrc = (my_type *) pCharSrc;

while (sizeof(*pDst) <= n)
{
    *(pDst++) = *(pSrc++);
    n -= sizeof(*pDst);
}

pCharDst = (char *) pDst;
pCharSrc = (char *) pSrc;

while (0 < n--)
{
    *(pCharDst++) = *(pCharSrc++);
}

Finally, we need to deal with our alignment assumption by checking the source and destination addresses for matching alignment. If their offsets from an alignment boundary do not match then they will never sync up to both match the alignment of the bulk copy type at the same time. In this case, the best we can do is use our generic char copy.

char *pCharDst          = s1;
const char *pCharSrc    = s2;
my_type *pDst           = NULL;
const my_type *pSrc     = NULL;

/* Error checking */

if ((((unsigned long) pCharDst) & (sizeof(*pDst) - 1)) ==
        (((unsigned long) pCharSrc) & (sizeof(*pSrc) - 1)))
{
    while ((0 < (((unsigned long) pCharDst) & (sizeof(*pDst) - 1))) &&
           (0 < n--))
    {
        *(pCharDst++) = *(pCharSrc++);
    }

    pDst = (my_type *) pCharDst;
    pSrc = (my_type *) pCharSrc;

    while (sizeof(*pDst) <= n)
    {
        *(pDst++) = *(pSrc++);
        n -= sizeof(*pDst);
    }

    pCharDst = (char *) pDst;
    pCharSrc = (char *) pSrc;
}

while (0 < n--)
{
    *(pCharDst++) = *(pCharSrc++);
}

Choosing when to switch types

Now we have all of tools we need to write a multi-byte memcpy. Before fully implementing the multi-byte memcpy, let's look at our thresholds one more time (again, these are specific to x86_64):

char, 1-byte aligned - $3(1) - 2 = 3 - 2 = 1$
short, 2-byte aligned - $3(2) - 2 = 6 - 2 = 4$
int, 4-byte aligned - $3(4) - 2 = 12 - 2 = 10$
long, 8-byte aligned - $3(8) - 2 = 24 - 2 = 22$

Not only are these the lowest values of n that are required to use that specific type, they are also the first values of n where copying with the smaller type will take more steps than with the larger. This means that the minimum count thresholds double as the efficiency thresholds. We could use a simple check to decide on which type to use for the bulk copy and then use the above code with the appropriate type substituted for pDst and pSrc. Due to the nature of alignment, ensuring that something is aligned to the largest type guarantees that alignment is satisfied for any smaller type (thought I wouldn't be surprised if there was an architecture for which that wasn't true). This means we can always perform our alignment check using a modulus of the largest type, sizeof(long), and if it passes then we're guaranteed that the source and destination can ultimately be aligned for any type.

#define MEMCPY_LONG_THRESHOLD   (22)
#define MEMCPY_INT_THRESHOLD    (10)
#define MEMCPY_SHORT_THRESHOLD  (4)

char *pCharDst          = s1;
const char *pCharSrc    = s2;
short *pShortDst        = NULL;
const short *pShortSrc  = NULL;
int *pIntDst            = NULL;
const int *pIntSrc      = NULL;
long *pLongDst          = NULL;
const long *pLongSrc    = NULL;

/* Error checking */

if ((MEMCPY_SHORT_THRESHOLD <= n) &&
    (((unsigned long) pCharDst) & (sizeof(*pLongDst) - 1)) ==
        (((unsigned long) pCharSrc) & (sizeof(*pLongSrc) - 1)))
{
    if (MEMCPY_LONG_THRESHOLD <= n)
    {
        while ((0 < (((unsigned long) pCharDst) & (sizeof(*pLongDst) - 1)))
               &&
               (0 < n--))
        {
            *(pCharDst++) = *(pCharSrc++);
        }

        pLongDst = (long *) pCharDst;
        pLongSrc = (long *) pCharSrc;

        while (sizeof(*pLongDst) <= n)
        {
            *(pLongDst++) = *(pLongSrc++);
            n -= sizeof(*pLongDst);
        }

        pCharDst = (char *) pLongDst;
        pCharSrc = (char *) pLongSrc;
    }

    if (MEMCPY_INT_THRESHOLD <= n)
    {
        while ((0 < (((unsigned long) pCharDst) & (sizeof(*pIntDst) - 1)))
               &&
               (0 < n--))
        {
            *(pCharDst++) = *(pCharSrc++);
        }

        pIntDst = (int *) pCharDst;
        pIntSrc = (int *) pCharSrc;

        while (sizeof(*pIntDst) <= n)
        {
            *(pIntDst++) = *(pIntSrc++);
            n -= sizeof(*pIntDst);
        }

        pCharDst = (char *) pIntDst;
        pCharSrc = (char *) pIntSrc;
    }

    if (MEMCPY_SHORT_THRESHOLD <= n)
    {
        while ((0 < (((unsigned long) pCharDst) & (sizeof(*pShortDst) - 1)))
               &&
               (0 < n--))
        {
            *(pCharDst++) = *(pCharSrc++);
        }

        pShortDst = (short *) pCharDst;
        pShortSrc = (short *) pCharSrc;

        while (sizeof(*pShortDst) <= n)
        {
            *(pShortDst++) = *(pShortSrc++);
            n -= sizeof(*pShortDst);
        }

        pCharDst = (char *) pShortDst;
        pCharSrc = (char *) pShortSrc;
    }
}

while (0 < n--)
{
    *(pCharDst++) = *(pCharSrc++);
}

You'll notice that we used if statements for each of the bulk copies, rather than if else for the int and short copies. This is because we aren't sure what type of system we will run on and there is a possibility that what is leftover from one bulk copy may be enough to do a smaller bulk copy. On x86_64 this could be seen when copying 22 bytes when the source and destination addresses are already aligned to the size of a long. Two bulk copies would happen with a long pointer, followed by three copies with a short pointer.

While this will mostly guarantee you get the fastest copy for any value of n, it has a lot of repetitive code and is overly complicated just to provide faster copies for a very small window of n values. In addition, if we look at each comparison as another step then a copy using short pointers has completely lost any efficiency gain it would've had because we spent five steps just to figure out we should be using shorts. Performing more detailed step or efficiency analysis really requires you to look at the assembly level and that's not what we're here to do. With that said, a copy using short pointers probably won't have any real benefits over the basic char copy. Likewise, the int copy would show gains for such a small window that it's unnecessary to implement unless your specific application utilizes a lot of copies in that range.

Now that we're down to only performing bulk copies with long pointers, we should consider the overhead of checking n, the alignment, and adjusting pointers as necessary for a successful multi-byte copy. The added overhead means that we should increase the threshold required to perform the extra work, otherwise we aren't actually gaining anything. We'll bump up our threshold by an order of magnitude by setting it to 200. You could more finely tune this threshold by looking at the assembly output of your memcpy implementation and comparing exactly how many instructions are used for the extra checks and the bulk copy itself. Our final multi-byte implementation (sans error checking) is below. Each loop here represents each piece of our efficiency equation from before: $a$ to get to an alignment boundary, $b$ to perform the bulk copy, and $c$ to copy any leftovers. Lastly, we've added two new variables to track the alignment offsets for us so we only have to calculate them once.

#define MEMCPY_EFFICIENCY_THRESHOLD     (200)

void *
memcpy(void *s1, const void *s2, size_t n)
{
    char *pCharDst          = s1;
    const char *pCharSrc    = s2;
    long *pLongDst          = NULL;
    const long *pLongSrc    = NULL;
    size_t dstOffset        = 0;
    size_t srcOffset        = 0;

    /* Error checking */

    dstOffset = ((unsigned long) pCharDst) & (sizeof(*pLongDst) - 1);
    srcOffset = ((unsigned long) pCharSrc) & (sizeof(*pLongSrc) - 1);

    if ((MEMCPY_EFFICIENCY_THRESHOLD <= n) &&
        (dstOffset == srcOffset))
    {
        while ((0 < dstOffset--) &&
               (0 < n--))
        {
            *(pCharDst++) = *(pCharSrc++);
        }

        pLongDst = (long *) pCharDst;
        pLongSrc = (long *) pCharSrc;

        while (sizeof(*pLongDst) <= n)
        {
            *(pLongDst++) = *(pLongSrc++);
            n -= sizeof(*pLongDst);
        }

        pCharDst = (char *) pLongDst;
        pCharSrc = (char *) pLongSrc;
    }

    while (0 < n--)
    {
        *(pCharDst++) = *(pCharSrc++);
    }

    return s1;
}

Other Efficiency Notes

Our parameter validation includes a check on n that is questionable. On one hand, our loops will check n and never run, so we don't actually need the check. On the other hand, we can exit early when n is zero and save ourselves from performing the other checks that are irrelevant since no data is being copied. The chance of (n == 0) being true is possible, but highly unlikely, and keeping the check means that we're adding an extra step to every call to memcpy. Due to this, we'll favor the calls to memcpy where (n != 0) by removing parameter validation on n, and let those unlikely calls suffer a few extra steps.

For error checking, we'll consider each check to be an additional step. So, we're adding just four steps to validate that our objects do not overlap and that we don't wrap during the copy. This is an incredibly small price to pay for the added security and safety that these checks add, so we'll definitely keep those.

Backwards copy revisited

Now that we have a more efficient version of our algorithm, we still need to make it copy backwards to handle the case of object overlap. We've already discussed why this is necessary and how to do it with the simple copy loop and walked through step-by-step the way to expand the simple version into a more efficient version. The pointer arithmetic takes care of a lot of the headache for us. Since this is an extension of what we've done already, the full version is found at the end.

Testing

Normally we could use memcmp to verify that our memcpy happened correctly, but we don't have that function yet. We'll create a simple test function that checks the boundaries of the copy to ensure we copied up to the boundaries, but no further. This function takes a control memory region that should be a copy of the destination region so it can be restored, as well as an offset in to the region such that we can verify no bytes before the region were touched. Before doing the actual copy, we verify that the source and destination are actually different at the critical points, otherwise our tests won't be meaningful. The last note about this function is with the use of the variables src2 and src3. This function will be used to test copying between overlapping objects and we must preserve the values of the source object that we intend to test, otherwise they may be overwritten and our test will appear to fail.

int
verifyCopy(
    const void * const  pVoidCtrl,
    const size_t        ctrlSize,
    void                *pVoidDst,
    const size_t        dstSize,
    const void * const  pVoidSrc,
    const size_t        srcSize,
    const size_t        offset)
{
    int ret                     = 1;
    const char * const pCtrl    = pVoidCtrl;
    char *pDst                  = pVoidDst;
    const char * const pSrc     = pVoidSrc;
    size_t n                    = 0;
    size_t i1                   = offset - 1;
    size_t i2                   = offset;
    size_t i3                   = 0;
    size_t i4                   = 0;
    char src2                   = 0;
    char src3                   = 0;

    if ((NULL == pCtrl) ||
        (NULL == pDst) ||
        (NULL == pSrc) ||
        (ctrlSize != dstSize) ||
        (0 == offset) ||
        ((2 * offset) >= ctrlSize) ||
        ((2 * offset) >= dstSize))
    {
        return ret;
    }

    /* Copy up to offset bytes from the end of the shortest region */
    if (dstSize > srcSize)
    {
        n = srcSize - (2 * offset);
    }
    else
    {
        n = dstSize - (2 * offset);
    }

    i3 = offset + n - 1;
    i4 = offset + n;

    src2 = pSrc[i2];
    src3 = pSrc[i3];

    do
    {
        /* Verify that we will be able to see a difference */
        if ((pDst[i1] == pSrc[i1]) ||
            (pDst[i2] == src2) ||
            (pDst[i3] == src3) ||
            (pDst[i4] == pSrc[i4]))
        {
            break;
        }

        memcpy(pDst + offset, pSrc + offset, n);

        /* Test the boundaries */
        if ((pDst[i1] != pCtrl[i1]) ||
            (pDst[i2] != src2) ||
            (pDst[i3] != src3) ||
            (pDst[i4] != pCtrl[i4]))
        {
            break;
        }

        ret = 0;
    } while (0);

    /* Return the destination to its original form */
    memcpy(pDst, pCtrl, dstSize);

    return ret;
}

Our main test function will explicitly test the error conditions and then use the previous function to verify that the memcpy implementation works properly. The method by which we verify our alignment handling is by testing every combination of start and end offset from zero to sizeof(long) - 1. The last two tests will ensure that copying between overlapping objects works properly.

int
memcpyTest(void)
{
    int ret         = 1;
    int failed      = 0;
    size_t i        = 0;
    size_t j        = 0;
    char bulk1[]    =   "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
                        "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
                        "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
                        "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
                        "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
                        "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ";
    char bulk2[]    =   "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
                        "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
                        "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
                        "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
                        "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
                        "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ";
    char bulk3[]    =   "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                        "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                        "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                        "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                        "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                        "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                        "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                        "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA";
    char bulk4[]    =   "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                        "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                        "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                        "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                        "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                        "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                        "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                        "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA";
    char simple1[]  =   "This will be a short string to test simple things";
    char simple2[]  =   "This will be a short string to test simple things";

    do
    {
        /* Alignment checking relies on an unsigned long holding a pointer value
         * without truncation.
         */
        if (sizeof(unsigned long) < sizeof(void *))
        {
            break;
        }

        /* No way to verify, we'll just crash here if we don't catch the NULL */
        memcpy(NULL, simple1, sizeof(simple1));

        /* We may crash here */
        memcpy(simple2, NULL, sizeof(simple2));

        for (i = 0; i < sizeof(simple2); i++)
        {
            if (simple1[i] != simple2[i])
            {
                failed = 1;
                break;
            }
        }

        if (1 == failed)
        {
            break;
        }

        /* We might crash here too */
        memcpy(simple2, (void *) ((unsigned long) -5), sizeof(simple2));

        for (i = 0; i < sizeof(simple2); i++)
        {
            if (simple1[i] != simple2[i])
            {
                failed = 1;
                break;
            }
        }

        if (1 == failed)
        {
            break;
        }

        /* And here might be another crash */
        memcpy((void *) ((unsigned long) -5), simple1, sizeof(simple1));

        /* Verify no characters are copied */
        memcpy(bulk2, bulk3, 0);

        for (i = 0; i < sizeof(bulk2); i++)
        {
            if (bulk1[i] != bulk2[i])
            {
                failed = 1;
                break;
            }
        }

        if (1 == failed)
        {
            break;
        }

        /* Test each permutation of 0 to sizeof(long) start positions and
         * offsets doing a forward copy.
         */
        for (i = 0; i < sizeof(long); i++)
        {
            for (j = 0; j < sizeof(long); j++)
            {
                if (0 != verifyCopy(bulk1 + i,
                                    sizeof(bulk1) - i,
                                    bulk2 + i,
                                    sizeof(bulk2) - i,
                                    bulk3 + i,
                                    sizeof(bulk3) - i,
                                    j + 5))
                {
                    failed = 1;
                    break;
                }
            }

            if (1 == failed)
            {
                break;
            }
        }

        if (1 == failed)
        {
            break;
        }

        /* Test each permutation of 0 to sizeof(long) start positions and
         * offsets doing a backward copy.
         */
        for (i = 0; i < sizeof(long); i++)
        {
            for (j = 0; j < sizeof(long); j++)
            {
                if (0 != verifyCopy(bulk3 + i,
                                    sizeof(bulk3) - i,
                                    bulk4 + i,
                                    sizeof(bulk4) - i,
                                    bulk2 + i,
                                    sizeof(bulk2) - i,
                                    j + 5))
                {
                    failed = 1;
                    break;
                }
            }

            if (1 == failed)
            {
                break;
            }
        }

        if (1 == failed)
        {
            break;
        }

        /* Slide a buffer backward */
        if (0 != verifyCopy(bulk1 + 12,
                            sizeof(bulk1) - 12,
                            bulk2 + 12,
                            sizeof(bulk2) - 12,
                            bulk2 + 12 + (3 * sizeof(long)),
                            sizeof(bulk2) - 12 - (3 * sizeof(long)),
                            7))
        {
            break;
        }

        /* Slide a buffer forward */
        if (0 != verifyCopy(bulk1 + 12 + (3* sizeof(long)),
                            sizeof(bulk1) - 12 - (3* sizeof(long)),
                            bulk2 + 12 + (3* sizeof(long)),
                            sizeof(bulk2) - 12 - (3* sizeof(long)),
                            bulk2 + 12,
                            sizeof(bulk2) - 12,
                            7))
        {
            break;
        }

        ret = 0;
    } while (0);

    return ret;
}

Conclusion

At the core, memcpy is very simple; just loop over the data and move it from source to destination. The versions we've implemented have several extra checks in place to prevent NULL pointer dereferences and reading/writing with an improperly chosen length. Since copying between overlapping objects is allowed if implemented, we chose to implement that to provide more functionality.

void *
memcpy(void *s1, const void *s2, size_t n)
{
    char *pDst          = s1;
    const char *pSrc    = s2;

    if ( !s1 ||
         !s2 ||
        /* Check for wrapping while copying */
        ((((unsigned long) -1) - ((unsigned long) s1)) < n) ||
        ((((unsigned long) -1) - ((unsigned long) s2)) < n))
    {
        return s1;
    }

    /* Copy forward or backward to handle object overlap */
    if (pDst < pSrc)
    {
        while (0 < n--)
        {
            *(pDst++) = *(pSrc++);
        }
    }
    else
    {
        pDst += n;
        pSrc += n;

        while (0 < n--)
        {
            *(--pDst) = *(--pSrc);
        }
    }

    return s1;
}

We also implemented bulk copies when available by using pointers to the largest type available in C90. Keep in mind that the bulk copy implementation relies on the assumption that an unsigned long is at least as large as a pointer type and that it also violates certain parts of the standard to perform copies in this way.

#define MEMCPY_EFFICIENCY_THRESHOLD     (200)

void *
memcpy(void *s1, const void *s2, size_t n)
{
    char *pCharDst          = s1;
    const char *pCharSrc    = s2;
    long *pLongDst          = NULL;
    const long *pLongSrc    = NULL;
    size_t dstOffset        = 0;
    size_t srcOffset        = 0;

    if ( !s1 ||
         !s2 ||
        /* Check for wrapping while copying */
        ((((unsigned long) -1) - ((unsigned long) s1)) < n) ||
        ((((unsigned long) -1) - ((unsigned long) s2)) < n))
    {
        return s1;
    }

    /* Copy forward or backward to handle object overlap */
    if (pCharDst < pCharSrc)
    {
        dstOffset = ((unsigned long) pCharDst) & (sizeof(*pLongDst) - 1);
        srcOffset = ((unsigned long) pCharSrc) & (sizeof(*pLongSrc) - 1);

        if ((MEMCPY_EFFICIENCY_THRESHOLD <= n) &&
            (dstOffset == srcOffset))
        {
            while ((0 < dstOffset--) &&
                   (0 < n--))
            {
                *(pCharDst++) = *(pCharSrc++);
            }

            pLongDst = (long *) pCharDst;
            pLongSrc = (long *) pCharSrc;

            while (sizeof(*pLongDst) <= n)
            {
                *(pLongDst++) = *(pLongSrc++);
                n -= sizeof(*pLongDst);
            }

            pCharDst = (char *) pLongDst;
            pCharSrc = (char *) pLongSrc;
        }

        while (0 < n--)
        {
            *(pCharDst++) = *(pCharSrc++);
        }
    }
    else
    {
        pCharDst += n;
        pCharSrc += n;

        dstOffset = ((unsigned long) pCharDst) & (sizeof(*pLongDst) - 1);
        srcOffset = ((unsigned long) pCharSrc) & (sizeof(*pLongSrc) - 1);

        if ((MEMCPY_EFFICIENCY_THRESHOLD <= n) &&
            (dstOffset == srcOffset))
        {
            while ((0 < dstOffset--) &&
                   (0 < n--))
            {
                *(--pCharDst) = *(--pCharSrc);
            }

            pLongDst = (long *) pCharDst;
            pLongSrc = (long *) pCharSrc;

            while (sizeof(*pLongDst) <= n)
            {
                *(--pLongDst) = *(--pLongSrc);
                n -= sizeof(*pLongDst);
            }

            pCharDst = (char *) pLongDst;
            pCharSrc = (char *) pLongSrc;
        }

        while (0 < n--)
        {
            *(--pCharDst) = *(--pCharSrc);
        }
    }

    return s1;
}

string.h

December 05, 2015

tags: stddef.h, string.h

C has no concept of string objects, but it's handy to refer to certain pieces of memory as strings. These certain pieces of memory are a collection of consecutive characters ending in a NUL terminator. Because C has no concept of an object associated with those pieces of memory, the string.h header provides the common functions you might usually find associated with strings such as length, copy, concatenate, compare, and search. In addition to these functions, the header provides a type and a definition which we'll cover in this post.

size_t and NULL

The type defined in string.h is size_t, and the single macro it has is NULL... but we've seen these already in stddef.h. So, we have two options:

include stddef.h
redeclare/redefine them in string.h, or

Although both approaches are valid from a general programming perspective, The Standard says this in 7.1.3, Reserved identifiers

Each header declares or defines all identifiers listed in its associated
subclause, ...

Which means we can't include other headers to get a declaration or definition, it must be done directly in the header itself. So, even though it will duplicate code, we have to redefine the size_t type and macro for NULL.

Redeclare and redefine

I won't go over what size_t and NULL are since that was covered in the stddef.h post, so refer back to that if you need.

We'll first start off with copying some code from stddef.h into string.h:

#ifndef _STRING_H
#define _STRING_H

typedef unsigned long size_t;

#define NULL    ((void *) 0)

#endif /* _STRING_H */

Now, let's look at how this causes a slight challenge. I'll use an example piece of code here that pulls from both stddef.h and string.h to demonstrate the issues:

#include <stddef.h>
#include <string.h>

int
main(void)
{
    size_t testSize = 5;
    size_t *pSize = &testSize;

    *pSize = 7;

    return 0;
}

Compile this with the -Wpedantic flag at a minimum to produce the warning:

$ gcc -Wpedantic string_test.c -o string_test
In file included from string_test.c:2:0:
string.h:4:23: warning: redefinition of typedef ‘size_t’ [-Wpedantic]
 typedef unsigned long size_t;
                        ^
In file included from string.c:1:0:
stddef.h:4:23: note: previous declaration of ‘size_t’ was here
 typedef unsigned long size_t;
                        ^

What we didn't see an error with was the redefinition of the NULL macro, which is because we redefined it to the exact same thing. Had the text replacement been different, we would have seen a warning like so:

$ gcc -Wpedantic string_test.c -o string_test
In file included from string_test.c:3:0:
string.h:4:0: warning: "NULL" redefined [enabled by default]
 #define NULL    ((void *) 1)
 ^
In file included from string_test.c:2:0:
stddef.h:4:0: note: this is the location of the previous definition
 #define NULL    ((void *) 0)
 ^

To protect against these things, we can use something similar to header guards to indicate whether or not we have defined size_t or NULL already. These need to go in both stddef.h and string.h:

#ifndef _SIZE_T
#define _SIZE_T
typedef unsigned long size_t;
#endif /* _SIZE_T */

#ifndef _NULL
#define _NULL
#define NULL    ((void *) 0)
#endif /* _NULL */

Since NULL is itself a macro, we could test for the definition directly rather than creating a new #define. However, this would allow a program to provide its own definition of NULL which would supersede, and possibly break, our library. By creating a new test macro inside of the reserved name space, our implementation does its best to prevent that from happening.

These checks will get rid of the warnings, but they also allow for size_t and NULL to become unsynchronized if we change their definitions at some point. Files including string.h could get a different definition than files including stddef.h, which could become difficult to debug. Although it could be error prone, the definitions for size_t and NULL will not change for a given implementation because The Standard won't change (within a given standard at least, e.g. C90 is set in stone at this point).

Conclusion

Due to The Standard, we must have the macro and type directly in the header, luckily we can copy them from stddef.h. However, we now need to introduce checks to prevent redefinitions and add those same checks to stddef.h.

#ifndef _STRING_H
#define _STRING_H

#ifndef _SIZE_T
#define _SIZE_T
typedef unsigned long size_t;
#endif /* _SIZE_T */

#ifndef NULL
#define NULL    ((void *) 0)
#endif /* NULL */

#endif /* _STRING_H */

The initial test is simple as well, we just need to ensure that we can access the size_t type and the NULL macro:

#include <string.h>

int
stringTest(void)
{
    int ret         = 0;
    int *pTest      = NULL;
    size_t pSize    = sizeof(pTest);

    /* "Use" pSize to get rid of the warning */
    pSize = pSize;

    return ret;
}

math.h

August 09, 2015

tags: errno.h, float.h, math.h

The math.h header provides a large variety of mathematical functions as well as one macro. This post will focus on just the macro while each function will have its own dedicated post.

HUGE_VAL

The only macro in math.h is HUGE_VAL which is defined as "a positive double expression, not neceessarily representable as a float." This sounds like a good place to use the IEEE 754 reserved value for infinity which is described like so:

$$sign = \begin{cases} 0 & \text{positive infinity}\\ 1 & \text{negative infinity} \end{cases} $$

$$exponent = 2^{w} - 1$$

$$significand = 0$$

We will use the value for positive infinity so that we can represent negative infinity by simply prepending the unary minus operator to this macro, (-HUGE_VAL). A double value on x86_64 is 8 bytes in length and it will be easiest just to represent this value with 16 hexadecimal digits:

#define HUGE_VAL    (0x7FF0000000000000L)

Since a hexadecimal constant in C can't have an exponent part then we use the L suffix to ensure that HUGE_VAL is interpreted as a double expression when it is used.

Test

The test for this macro is very simple and just ensures that the file has access to the macro through including the math.h header:

#include <math.h>

int
math_test(void)
{
    int ret         = 0;
    double hugeVal  = 0.0;

    /* Test the HUGE_VAL macro */
    hugeVal = HUGE_VAL;

    return ret;
}

If the test fails then a compile error will be generated:

error: ‘HUGE_VAL’ undeclared (first use in this function)

Otherwise, everything will compile and succeed.

Error Conditions

math.h functions make use of two macros to indicate error conditions, EDOM and ERANGE. We already talked about these and defined them in errno.h but we'll review them here.

A domain error occurs when an input argument is outside the domain over which the function is defined. When this occurrs, errno is set to EDOM and an implementation-defined value is returned to indicate that an error condition was triggered.

A range errors occurs when the result of a function can't be represented as a double value. If the result overflows the double type, the return value is set to HUGE_VAL with the same sign as the correct return value (if it was negative then we can return (-HUGE_VAL)) and errno is set to ERANGE. If the result underflows the double type then the return value is set to 0. For underflows, errno is not required to be set to ERANGE, but an implementation may choose to do so. This is because the value is so small that the architecture is not able to differentiate it from 0.

You can associate EDOM with input errors, while ERANGE deals with output errors. These macros aren't used in the math.h header itself but will be used by most, if not all, of the functions provided by math.h.

stdarg.h

August 08, 2015

tags: stdarg.h

The stdarg.h header provides the functionality for marching through each argument of a function when you don't know how many arguments there will be until run time. When you see a function with an ellipsis as the last "parameter" then that function is probably using stdarg.h to process each argument. The printf function is just one example.

Stack Frames

First, we need to talk about stack frames and how they are constructed. If you've never heard of a stack frame then I recommend reading the Wikipedia article on call frames. This explanation is specific to the x86-64 architecture so keep that in mind as this may not be the way a different architecture handles things. A stack frame refers to the portion of stack memory that pertains to a specific function. The stack frame will contain function arguments (if they weren't passed in registers), the return address, a pointer to the previous base pointer, and then any local variables for the function. The base pointer is used to access both the arguments that were passed on the stack as well as the local variables for the function. There is a specific register, rbp, which will contain the current base pointer. Modifying this register without saving it could have serious consequences because you no longer have an easy reference point for accessing the contents of your stack frame. For a visual representation of this, the Wikipedia page I mentioned previously has a good picture or you can look at page 16 of the x86_64 ABI. Keep in mind that these pictures will seem to contradict eachother but this is only because the Wiki page shows the stack with low memory at the top while the ABI shows the stack with high memory at the top.

va_list

stdarg.h specifies a single type for holding the meta-data necessary for advancing through the argument list. Since this type is architecture specific then the Standard doesn't give any more information aside from it being "a type suitable for holding the information needed by the macros va_start, va_arg, and va_end." We can look to the x86_64 ABI for an answer though.

Each specific architecture will have its own way of performing argument passing. For example, the i386 architecture pushes every argument onto the stack before calling a function. Unfortunately, x86_64 is not so simple. The ABI describes the method through with parameters should be passed in detail and even provides the following definition for the va_list type:

typedef struct {
    unsigned int gp_offset;
    unsigned int fp_offset;
    void *overflow_arg_area;
    void *reg_save_area;
} va_list[1];

Due to the way that x86_64 performs parameters passing, it is necessary to keep track of a few things so that we can access different types of parameters since they are stored in different places. The first of these is reg_save_area which is a pointer to the register save area. The register save area is a location on the stack which holds copies of arguments that were passed in registers, the reg_save_area pointer holds the location of the first register copy (which is always a copy of the rdi register). There is a limit to the amount of information that can be passed through registers and the excess arguments are passed directly on the stack. The overflow_arg_area is a pointer to the first of these excess arguments.

Now we have pointers to each group of arguments on the stack but we need to keep track of which argument is next. The gp_offset member is an offset from beginning of the reg_save_area that points to the next argument that was passed in the general purpose registers. The next argument could be retrieved by accessing what is on the stack at reg_save_area + gp_offset. However, what happens when we exhaust the arguments that were passed in registers? We set gp_offset to 48 which indicates that "register arguments have been exhausted" and then we would retrieve the next argument by accessing what is on the stack at overflow_arg_area. The value 48 comes from the number of general purpose registers used for argument passing: rdi, rsi, rdx, rcx, r8, and r9 give us 6, 8-byte, registers totaling 48 bytes on the stack for the copies of each register. Therefore, when gp_offset is 48 or greater, the location being accessed is no longer valid. overflow_arg_area works differently than reg_save_area because it should always point to the next argument to be retrieved; it needs to be updated every time an argument is retrieved. Lastly, there is the fp_offset member which works in the same way as gp_offset except it is the offset to the next argument passed in a floating-point register. This offset will have a value of 48 to 304 where the 304 indicates that all floating-point arguments passed in registers have been exhausted and the next one should be retrieved by using overflow_arg_area.

That is a little bit complicated but overall it's not too bad. The issues comes in with passing structures as arguments because a structure, on x86_64, may be passed partially through registers and partially on the stack. This results in a lot of extra logic in order to properly rebuild the entire structure from different parts of the stack. We'll talk more about this later.

va_start

The va_start macro is used to initialize a va_list structure so that it can be used to retrieve arguments. You must call va_start before accessing any arguments otherwise you won't know where they are.The x86_64 ABI gives us the information we need in order to initialize the va_list structure. This macro takes a va_list structure as well as the identifier of the rightmost parameter in the variable parameter list (e.g. the argument directly before the ellipsis).

First, we need to figure out where the register save area is. Looking at figure 3.33 in the ABI, we see that the register save area has a total size of 304 bytes which we calculate using the last register copy offset, 288, plus the size of the last register copy, 16 bytes for a floating-point register. This value should also be familiar from fp_offset because when it is set to 304, we know that we can't retrieve floating-point arguments from the register save area anymore. The compiler takes care of building the register save area for us, so we can be sure that 304 bytes of space will be taken up on the stack directly in between the previous rbp value and the local function variables. Since we know that the base pointer register, rbp, holds the stack location immediately after our register save area, we can subtract the size of the register save area to get the location we need.

reg_save_area $= rbp - 304$

Second, we need the location of the first argument passed on the stack, which should be in the stack location which follows the return address of the current stack stack frame (which is directly after the saved ebp). This location will be 16 bytes after the saved ebp:

overflow_arg_area $= rbp + 16$

Figure 3.33 from the ABI shows us the offsets to the first general purpose register as well as the first floating-point register, 0 and 48, respectively.

gp_offset $= 0$
fp_offset $= 48$

That's all we need for va_start to initialize the va_list structure.

va_arg

The va_arg macros expands to an expression that has the type and value of the next argument passed to a function. You must provide this macro with the va_list struct initialized from a call to va_start and the type of the argument that you want to be retrieved. va_arg will update the va_list structure appropriately so that a subsequent call to va_arg will work as expected.

The only primitive types that would be returned through va_arg would be the int and double types due to the way that the ABI is defined. For an int type we can just use reg_save_area + gp_offset to get the next int argument, or if gp_offset is greater than or equal to 48 we would read from overflow_arg_area. Likewise, for double arguments we would use reg_save_area + fp_offset or overflow_arg_area if fp_offset was 304 or greater.

For structures, things get pretty complicated since parts of the structure may be passed in general purpose registers, some parts may be passed in floating-point registers, and some parts may be passed directly on the stack. A structure can contain multiple members, each of varying types, and this is what makes it difficult to return a structure with va_arg on this architecture. There isn't a good way to dynamically determine the type of the next member within a structure from the standpoint of a C library, so we need to cheat a little bit for this macro.

The ABI even hints at this and states that "The va_arg macro is usually implemented as a compiler builtin and expanded in simplified forms for each particular type." gcc provides this builtin in the form of the macro __builtin_va_arg and we'll use it like so:

#define va_arg(ap, type)    __builtin_va_arg((ap), type)

va_end

The last macro must ensure that a normal return will happen from a function which called va_start. For the x86_64 architecture this macro doesn't actually need to do anything since va_start and va_arg only modify the va_list structure and not the stack or registers themselves. However, I would choose to zero out each member of the va_list struct just to ensure that va_start would need to be called be called before using va_arg again.

Cheating

Like I mentioned in the va_arg section, we needed to cheat a little bit for one of the macros. Since we are going to cheat on one of the macros then we also need to cheat on the rest of them to ensure that everything works together properly. gcc provides builtin versions of the va_list type as well as all three macros, so our implementation will look like so:

#ifndef _STDARG_H
#define _STDARG_H

typedef __builtin_va_list va_list;

#define va_start(ap, parmN) __builtin_va_start((ap), (parmN))
#define va_arg(ap, type)    __builtin_va_arg((ap), type)
#define va_end(ap)          __builtin_va_end((ap))

#endif /* _STDARG_H */

Freestanding Environment

Now that we've completed the stdarg.h header we have all the headers necessary for a freestanding environment. A freestanding environment is one in which C program can run without an underlying operating system. The freestanding execution environment requires all of the architecture specific header files: float.h, limits.h, stdarg.h, and stddef.h. This means that the rest of the header files (and any backing .c files) can be written in an architecture agnostic way and shouldn't require any assembly.

stddef.h

July 19, 2015

tags: stddef.h

In addition to the integral and floating types provided by C, there are some extra types that will be commonly used. The standard C library groups these common types (and some macros) into a single header, stddef.h.

Types

To define new types in C you used a typedef. A typedef does not actually introduce a new type, it just creates an alias to an existing type. They take the form of:

typedef existing-type new-type;

For example, if you wanted to make it more convenient to use the unsigned int type by only having to type uint, then you would use the following typedef:

typedef unsigned int uint;

Now, any occurrence of the uint identifier will be treated as if it were unsigned int. This may sound similar to a #define, but a typedef does not perform a text replacement.

ptrdiff_t

This type "is the signed integral type of the result of subtracting two pointers". So, we need this type to be of the same type as an address for the specific architecture. The x86_64 ABI tells us that a pointer to any type is an "unsigned eightbyte". Since this type is signed then we can focus on the other signed eightbyte types designated in the ABI: signed long, and signed long long. We'll use the signed long type:

typedef signed long ptrdiff_t;

size_t

This type "is the unsigned integral type of the result of the sizeof operator". The sizeof operator yields the size (in bytes) of its operand. It can be used to find the number of bytes in a type, an array, or a structure:

struct myStruct
{
    char a;
    int b;
    unsigned long c;
};

int a = 0;
char myArray[] = "hello, world";
struct myStruct test;

sizeof char;
sizeof a;
sizeof myArray;
sizeof test;

Since an array or a struct can be arbitrarily long we should choose a type which can represent a large range of values. The largest unsigned integral type available on the x86_64 architecture is the unsigned long type, which is what we'll use.

typedef unsigned long size_t;

wchar_t

This type "is the integral type whose range of values can represent distinct codes for all members of the largest extended character set specified among the supported locales". In other words, if your extended character set used a maximum of 3 bytes per character then wchar_t would need to be at least 3 bytes in length. Since we are using UTF-8 which uses a maximum of 4 bytes per character then we only need to consider types which are at least 4 bytes in length. There is no sense in using a type which is more than 4 bytes as that would just waste space, so we can choose between signed int and unsigned int. A character doesn't really have a sign so we will choose the unsigned int type:

typedef unsigned int wchar_t;

Macros

If you're not already familiar with macros, then see the errno.h post for more information.

NULL

This macro "expands to an implementation-defined null pointer constant". The x86_64 ABI states that "a null pointer (for all types) has the value zero." We can simply cast the value 0 as a void pointer to accomplish this:

#define NULL    ((void *) 0)

offsetof(type, member-designator)

This macro "expands to an integral constant expression that has type size_t, the value of which is the offset in bytes, to the structure member (designated by member-designator), from the beginning of its structure (designated by type). The member-designator shall be such that given static type t; then the expression &(t.member-designator) evaluates to an address constant. (If the specified member is a bit-field, the behavior is undefined.)"

So, the number of bytes between the beginning of a structure and the defined structure member. To get this we can subtract the address of the structure from the address of the member:

#define offsetof(type, member)  ((size_t) &(type.member) - &type)

Conclusion

That's everything for stddef.h. It's not a very involved header but we will see these types and macros pop up frequently throughout the rest of the library.

#ifndef _STDDEF_H
#define _STDDEF_H

typedef signed long ptrdiff_t;

typedef unsigned long size_t;

typedef unsigned int wchar_t;

#define NULL    ((void *) 0)

#define offsetof(type, member) ((size_t) &(type.member) - &type)

#endif /* _STDDEF_H */

float.h

July 17, 2015

tags: float.h, limits.h

Similary to limits.h, some limits must be established on the characteristics of floating-point arithmetic for a given architecture. These are determined by a model described in the standard and implemented in float.h. This won't be a comprehensive introduction to floating-point arithmetic as there are already resources for that:

The Floating-Point Model

Before talking about the specifics of float.h, we need to lay some groundwork for how floating-point numbers are defined and some of the terminology that goes along with them.

sign - positive or negative
radix - the numerical base of exponent representation (binary would have a radix of 2)
significand/mantissa - the part of the floating-point number consisting of its significant digits
precision - the number of digits in the significand

With those definitions, we'll construct a few variables that we need to construct the floating-point model:

$s$ - sign
$b$ - base/radix
$e$ - exponent
$p$ - precision
$f_{k}$ - nonnegative integers less than $b$ (the significand digits)

The standard states, "A normalized floating-point number $x ( f_1 > 0 \mbox{ if } x \neq 0 )$ is defined by the following model":

$$ x = s \times b^{e} \times \sum\limits_{k=1}^p f_{k} \times b^{-k} \mbox{ , } e_{min} \leq e \leq e_{max} $$

This is the model that we will implement with float.h and which can be used to construct a floating-point number. For this implementation, $b$ will always be $2$ since we're dealing with the binary representation of numbers. In addition, we must have a range for $e$ by defining $e_{min}$ and $e_{max}$. To calculate these values we use the following equations:

$k =$ type width in bits
$e_{max} = 2^{k-p-1}$
$e_{min} = 1 - e_{max}$

Floating Types

float, double, and long double are the floating-point types provided by C. Since this first version of welibc is for the x86_64 architecture we can turn to the x86_64 ABI to see how these types are implemented. Page 12 shows that these types are represented by single-precision, double-precision, and 80-bit extended precision floating-point types, respectively. It also shows us that that each of the three floating-point types are to be implemented according to the IEEE Standard for Floating-Point Arithmetic. I'll be choosing to use the most recent revision of that standard, IEEE 754-2008.

Now we have some reference documentation to help us implement float.h. We'll use this documentation to define the variables we discussed above and then we can use those to calculate the values we need for float.h. IEEE 754-2008 will give us values that define the widths, in bits, for the sign ($s$), exponent ($e$), and precision ($p$) of each floating-point type. From those variables we can derive the rest of the information we'll need.

IEEE 754 uses normalized floating-point numbers meaning that each value is represented with a nonzero leading digit. For example, $2.2573 \times 10^{3}$ is normalized, but $0.22573 \times 10^{4}$ is not, even though they represent the same quantity. By using normalized numbers we ensure that each value is uniquely represented.

Since we're using normalized numbers and the leading digit will always be nonzero (except for infinity, 0, and Not a Number), then the most significant bit of the significand would always be 1. There is no point in storing a bit if it will always be 1 so IEEE 754 uses that storage space to gain more precision. Since the original most significant bit is no longer stored in the floating type, it is referred to as a "hidden" bit.

The hidden bit has a few implications that we need to keep in mind through calculating our values. First, the field width of the significand is actually one more than the number of bits we use to represent it. We'll look at this below when we define all of the field widths. Second, the exponent field implicitly encodes the hidden bit for us so the computed exponent ranges are really one less than the value we need. I'll make sure to note these as we go along.

Float

The float type is a single-precision floating-point number represented by 32 bits ($k = 32$). The IEEE 754-2008 standard refers to this format as binary32 and tells us that the bits are broken down like so:

$s = 1$
$e = 8$
$p = 24$

See how the total bits here is actually 33? This is due to the hidden bit in the significand; there are physically 23 bits for the significand but we get 24 bits because the most significant bit is always 1 for normalized floating point numbers.

Double

The double type is a double-precision floating-point number represented by 64 bits ($k = 64$). The IEEE 754-2008 standard refers to this format as binary64 and breaks down the bits like so:

$s = 1$
$e = 11$
$p = 53$

Again, 53 bits for the significand where there are only 52 available.

Long Double

The long double type is an extended-precision floating-point number represented by 80 bits ($k = 80$). The IEEE 754-2008 standard refers to this format as binary64 extended and breaks down the bits like so:

$s = 1$
$e = 15$
$p = 64$

Don't forget that the hidden bit for the significand is what gives us 64 instead of 63 bits for $p$.

float.h Values and Equations

Now that we have the inputs we need, we can derive each value in float.h according to the equations given to us in the C standard. For all but two values in float.h, there is a separate named version for each floating point type denoted by the prefixes FLT, DBL, and LDBL, for float, double, and long double, respectively. We'll look at each value to see what it means and how to compute it based on the data we've already collected.

FLT_ROUNDS

The rounding mode for floating-point addition. This value is used for all types rather than having separate versions for each type (even though it is prefixed with FLT). The standard defines a set of possible assignments and their meanings:

$-1$ - indeterminate
$0$ - toward zero
$1$ - to nearest
$2$ - toward positive infinity
$3$ - toward negative infinity

Your specific application of C may necessitate one of these specific rounding modes. However, what if you don't know? How do you choose one?

Well, we can look at What Every Computer Scientist Should Know About Floating-Point Arithmetic for advice (you did read it, didn't you?). There are two very relevant portions from this paper. The first is titled "Exactly Rounded Operations" in the "Rounding Error" section. The second is in the "IEEE Standard" section titled "Operations". The author suggests that you round to the nearest floating point number using the round to even method.

#define FLT_ROUNDS  (1)

Exactly Rounded Operations

The paper states that rounding is pretty straightforward except for halfway cases such as 12.5; should that be 12 or 13? It then introduces the concept of "round to even" which proposes that instead of rounding up for all halfway values, they should be rounded towards the number that is even. This should result in 50% of the occurrences rounding up and 50% rounding down, rather than 100% rounding up. It also references a paper by Reiser and Knuth which backs up the position that round to even should be used.

So, why round to even on halfway values if half of the values are rounded down (0, 1, 2, 3, 4) and half of the values are rounded up (5, 6, 7, 8, 9)? Well, rounding up every time results in drift which is the gradual upward increase in repeated computations. By rounding to even you can eliminate drift since the error from one round up would be cancelled out by a subsequent round down.

IEEE 754-2008 Rounding

Goldberg's paper also references the IEEE standard because it also suggests that round to even be used. If we look at section 4.3, "Rounding-direction attributes", of IEEE 754-2008 we find that it refers to round to even as roundTiesToEven, stating that it should be the default rounding-direction attribute for results in binary formats. Although it also defines 4 other rounding attributes, we'll go with the default of roundTiesToEven which is an attribute of round to nearest.

FLT_RADIX

The radix of the exponent representation, $b$. As stated before, the radix will always be $2$ since we are performing binary floating-point arithmetic and the base of a binary numerical system is $2$.

#define FLT_RADIX   (2)

FLT_MANT_DIG

The number of base-FLT_RADIX digits in the floating-point significand, $p$. Alternatively, this is the number of bits used to represent the significand which is the value of $p$ from our floating-types discussion above.

#define FLT_MANT_DIG    (24)
#define DBL_MANT_DIG    (53)
#define LDBL_MANT_DIG   (64)

FLT_DIG

The number of decimal digits, $q$, such that any floating-point number with $q$ decimal digits can be rounded into a floating-point number with $p$ radix $b$ digits and back again without change to the $q$ decimal digits,

$$ \lfloor(p-1) \times log_{10}b \rfloor + \begin{cases} 1 & \text{if } b \text{ is a power of 10}\\ 0 & \text{otherwise} \end{cases} $$

This dictates the maximum number of decimal digits that can be converted to binary floating-point and back with no changes resulting from conversion. We'll use $p$ (the FLT_MANT_DIG family) and $b$ (FLT_RADIX).

First, let's try to simplify this equation a little bit. We can immediately see that we will always be adding $0$ to the result of the $floor$ computation because our $b$ (FLT_RADIX) value is $2$ which is not a power of $10$. Secondly, since $b$ is constant, $log_{10}b$ is also constant. Lastly, we just insert the value of $p$ for each floating-type and we have our solutions.

And we end up with:

#define FLT_DIG     (6)
#define DBL_DIG     (15)
#define LDBL_DIG    (18)

FLT_MAX_EXP

The maximum integer such that FLT_RADIX raised to that power minus $1$ is a representable finite floating-point number, $e_{max}$. Or, the maximum possible exponent we can use. To compute this we calculate the highest signed value that we can store in the number of bits alotted to the exponent field. $b$ is our base and $e$ is the exponent width (in bits) from above:

$$ b^{e-1} - 1 $$

FLT_MAX_EXP $= 2^{8-1} - 1 = 2^{7} - 1 = 128 - 1 = 127$
DBL_MAX_EXP $= 2^{11-1} - 1 = 2^{10} - 1 = 1024 - 1 = 1023$
LDBL_MAX_EXP $= 2^{15-1} - 1 = 2^{14} - 1 = 16384 - 1 = 16383$

We can verify these results against IEEE 754-2008 which has some handy tables that just give you the values for $e_{max}$ (specifically Table 3.5-Binary interchange format parameters). Not to spoil the surprise, but our answers agree with the table.

Now, while these values are correct, they are not what we will actually use. Remember that hidden bit from before? This is where it comes in to play with the exponent, so we need to add 1 to these value to get the actual max exponent.

#define FLT_MAX_EXP     (+128)
#define DBL_MAX_EXP     (+1024)
#define LDBL_MAX_EXP    (+16384)

FLT_MIN_EXP

The minimum negative integer such that FLT_RADIX raised to that power minus $1$ is a normalized floating-point number, $e_{min}$. Or, the lowest possible exponent we can use. IEEE 754-2008 gives us the following equation to compute this value:

$$ e_{min} = 1 - e_{max} $$

It's important to note that the $e_{max}$ in this equation is the computed value of $e_{max}$ from before, not the value of the FLT_EXP_MAX family of macros.

FLT_MIN_EXP $= 1 - 127 = -126$
DBL_MIN_EXP $= 1 - 1023 = -1022$
LDBL_MIN_EXP $= 1 - 16383 = -16382$

Again, due to the hidden bit in the significand, we must add one to these values to get the minimum exponent.

#define FLT_MIN_EXP     (-125)
#define DBL_MIN_EXP     (-1021)
#define LDBL_MIN_EXP    (-16381)

FLT_MIN_10_EXP

The minimum negative integer such that $10$ raised to that power is in the range of normalized floating-point numbers,

$$ \lceil log_{10}b^{e_{min}-1} \rceil $$

This is the minimum exponent we could use in a number written in decimal scientific notation while still being able to reasonably convert it to a binary floating-point representation. Here, $e_{min}$ is the value of the FLT_MIN_EXP family of macros to account for the hidden bit.

Which yields:

#define FLT_MIN_10_EXP  (-37)
#define DBL_MIN_10_EXP  (-307)
#define LDBL_MIN_10_EXP (-4931)

FLT_MAX_10_EXP

The maximum integer such that $10$ raised to that power is in the range of representable finite floating-point numbers,

$$ \lfloor log_{10}((1 - b^{-p}) \times b^{e_{max}}) \rfloor $$

This is the maximum exponent we could use in a number written in decimal scientific notation while still being able to reasonably convert it to a binary floating-point representation. Here, $e_{max}$ is the value of the FLT_MAX_EXP family of macros to account for the hidden bit.

Whew!

#define FLT_MAX_10_EXP  (+38)
#define DBL_MAX_10_EXP  (+308)
#define LDBL_MAX_10_EXP (+4932)

FLT_MAX

The maximum representable floating-point number,

$$ (1 - b^{-p}) \times b^{e_{max}} $$

You'll notice that this exact equation is found embedded in the calculation of FLT_MAX_10_EXP because it uses the maximum floating-point number to derive the exponent needed to represent it.

Note that the exponents for each of these match the values we calculated for the FLT_MAX_10_EXP family of macros. You should also see that there are an increasing number of significant figures as we go. This is because each successively larger floating type can hold more precise values than the last. To determine how many significant figures we should use you can just count the number of digits in the largest decimal value that can be stored in the bits alotted for the significand. This is equal to:

$$ \lceil log_{10}2^{p} \rceil $$

With that equation we arrive at $8$, $16$, and $20$ decimal digits for the float, double, and long double floating types, respectively.

#define FLT_MAX     (3.40282347E+38F)
#define DBL_MAX     (1.7976931348623157E+308)
#define LDBL_MAX    (1.18973149535723176502E+4932L)

FLT_EPSILON

The difference between $1$ and the least value greater than $1$ that is representable in the given floating point type,

$$ b^{1-p} $$

In other words, you could store 1.0 as a float, but your floating point type probably can't accurately represent $1.00000000000000000000000023$. We need to define the smallest step you can take when incrementing the floating point type which is what FLT_EPSILON provides.

FLT_EPSILON $= 2^{1-24} = 2^{-23} = 1.19209290... \times 10^{-7}$
DBL_EPSILON $= 2^{1-53} = 2^{-52} = 2.2204460492503131... \times 10^{-16}$
LDBL_EPSILON $= 2^{1-64} = 2^{-63} = 1.08420217248550443401... \times 10^{-19}$

Again, we have used a number of significant digits which corresponds to the precision of the floating type which stores that value.

#define FLT_EPSILON     (1.19209290-7F)
#define DBL_EPSILON     (2.2204460492503131E-16)
#define LDBL_EPSILON    (1.08420217248550443401E-19L)

FLT_MIN

The minimum normalized positive floating-point number,

$$ b^{e_{min}-1} $$

You'll notice that this exact equation is embedded within the calculation of the FLT_MIN_10_EXP family of macros for the same reason as the equation used to calculate FLT_MAX (but for the minimum number).

Those are some tiny numbers!

#define FLT_MIN     (1.17549435E-38F)
#define DBL_MIN     (2.2250738585072014E-308)
#define LDBL_MIN    (3.36210314311209350626E-4932L)

Conclusion

You may have noticed an F appended to each FLT_ value expressed in floating-point notation. This is a floating suffix and is a mechanism used by the C language to indicate that the prepended floating constant is to be interpreted as a float type. Similarly, the L floating suffix is used to denote a value to be interpreted as a long double type. Floating constants with no floating suffix are interpreted as a double floating type by default. Lastly, if you paid attention to the values we used vs. the values that wolframalpha got then you'll notice that we used the "round to even" method, when necessary, to achieve a proper result.

Our full implementation looks like so:

#ifndef _FLOAT_H
#define _FLOAT_H

#define FLT_ROUNDS      (1)

#define FLT_RADIX       (2)

#define FLT_MANT_DIG    (24)
#define DBL_MANT_DIG    (53)
#define LDBL_MANT_DIG   (64)

#define FLT_DIG         (6)
#define DBL_DIG         (15)
#define LDBL_DIG        (18)

#define FLT_MIN_EXP     (-125)
#define DBL_MIN_EXP     (-1021)
#define LDBL_MIN_EXP    (-16381)

#define FLT_MIN_10_EXP  (-37)
#define DBL_MIN_10_EXP  (-307)
#define LDBL_MIN_10_EXP (-4931)

#define FLT_MAX_EXP     (+128)
#define DBL_MAX_EXP     (+1024)
#define LDBL_MAX_EXP    (+16384)

#define FLT_MAX_10_EXP  (+38)
#define DBL_MAX_10_EXP  (+308)
#define LDBL_MAX_10_EXP (+4932)

#define FLT_MAX         (3.40282347E+38F)
#define DBL_MAX         (1.7976931348623157E+308)
#define LDBL_MAX        (1.18973149535723176502E+4932L)

#define FLT_EPSILON     (1.19209290-7F)
#define DBL_EPSILON     (2.2204460492503131E-16)
#define LDBL_EPSILON    (1.08420217248550443401E-19L)

#define FLT_MIN         (1.17549435E-38F)
#define DBL_MIN         (2.2250738585072014E-308)
#define LDBL_MIN        (3.36210314311209350626E-4932L)

#endif /* _FLOAT_H */

limits.h

May 10, 2015

tags: limits.h

When writing C code, it's important to know the ranges for the variables that you'll use. If you hit the top end of what your variable can store then its value will "roll-over" to a large negative number, or to zero. The limits.h file contains the ranges for each type so that you can use them when necessary.

About limits.h

This header is architecture specific and we'll be writing it for an x86_64 machine. Although the values we use are specific, the macros we use are a huge part of the portability of the C language itself. As a C programmer, you should be aware of these values for the architecture you're programming for but you should use these macros as much as possible so that your code will work on any machine with a C compiler. For example, you can be aware of the fact that an unsigned char has a maximum value of 255 on your machine, but you should use UCHAR_MAX in your code instead of 255. In order to know what values to use for limits.h, you'll need the Application Binary Interface for your specific architecture. I searched Google for "x86_64 ABI" and found it here. If you look at "Figure 3.1: Scalar Types" on page 12 then you'll see each of the C types along with their sizes, in bytes, on the x86_64 architecture. This table is what dictates our implementation.

Byte size

The first macro in limits.h is CHAR_BIT which establishes the number of bits used to represent a byte. This may seem kind of odd because a byte is always 8 bits! Almost always. Most modern architectures follow this norm, but some systems actually have a different number of bits per byte. This stack overflow question goes into some more depth, but a few examples would be:

16-bit bytes: TI C54x DSP
9-bit bytes: DEC PDP-10
6-bit byte: CDC-6400

The ABI states that "the term byte refers to an 8-bit object", which tells us that CHAR_BIT should be 8:

#define CHAR_BIT    (8)

Integer limits

For the rest of the macros in limits.h we will use 2's complement arithmetic and Figure 3.1 from the ABI to determine our values.

The formulas for determing these numbers are as follows:

$n =$ CHAR_BIT $*$ number of bytes
signed minimum $= -(2^{n-1})$
signed maximum $= 2^{n-1} - 1$
unsigned minimum $= 0$
unsigned maximum $= 2^{n} - 1$

For example, the char type is 1 byte and gives us the following values:

$n = 8$
signed minimum $= -(2^{8-1}) = {-128}$
signed maximum $= 2^{8-1} - 1 = 127$
unsigned minimum $= 0$
unsigned maximum $= 2^{8} - 1 = 255$

Notice anything odd here? The fact that $|{-128}| \gt |127|$ presents a small issue. We've defined that the maximum signed char is $127$, yet we use an even larger (absolute) value to define the minimum signed char value.

If we look at the C standard to understand how integers are parsed during compilation, we'll see that the sign is not a part of an integer constant. The standard states, "An integer constant begins with a digit, but has no period or exponent part. It may have a prefix that specifies its base and a suffix which specifies its type." So for the minimum signed char value we computed, the unary minus operator, -, is parsed separately from the integer 128 and since we want to use that value with signed char types, then keeping the value as $128$ would result in overflowing the type. To avoid this we can substitute in an expression equal to ${-128}$, like (-127 - 1). We'll use a similar format for the signed minimum on other types.

Two final things to cover. The standard does not specify whether or not a char is signed or unsigned. However, the ABI specifies that a char is a signed byte so we will implement it as such. Lastly, the macro MB_LEN_MAX is for the number of bytes in a multibyte character for any supported locale. By setting this to 4 we allow support for UTF-8 which requires a maximum of 4 bytes for any one character according to its [specification].

Our full implementation looks like so:

#define CHAR_BIT    (8)

#define SCHAR_MIN   (-127 - 1)

#define SCHAR_MAX   (+127)

#define UCHAR_MAX   (+255)

#define CHAR_MIN    SCHAR_MIN

#define CHAR_MAX    SCHAR_MAX

#define MB_LEN_MAX  (4)

#define SHRT_MIN    (-32767 - 1)

#define SHRT_MAX    (+32767)

#define USHRT_MAX   (+65535)

#define INT_MIN     (-2147483647 - 1)

#define INT_MAX     (+2147483647)

#define UINT_MAX    (+4294967295U)

#define LONG_MIN    (-9223372036854775807L - 1L)

#define LONG_MAX    (+9223372036854775807L)

#define ULONG_MAX   (+18446744073709551615UL)

You may notice on the last few macros that the constant expressions have a letter suffixed to them, like U or L (or both). These are used to indicate an unsigned value and a long value, respectively, and help communicate your desired type for integer constants since they don't have type specifiers.

errno.h

April 19, 2015

tags: compiling, errno.h, linking, make

Since the Standard mentions the details of errno.h first, 7.1.4 "Errors <errno.h>", then we'll start there. This post will look into how to implement a global variable, errno, some macros, and how to properly link object files together so that all of your symbols resolve correctly.

About errno.h

The standard states that errno.h provides several macros relating to the reporting of error conditions. Specifically, a "modifiable lvalue that has type int", and two macros: EDOM and ERANGE. So, what is a modifiable lvalue and what are macros?

Lvalues

The standard defines lvalues in section 6.2.2.1, "Lvalues and function designators". An lvalue is an expression that designates an object, with an object type or incomplete type other than void. The definition of a modifiable lvalue is a little more complicated: an lvalue that doesn't have an array type, an incomplete type, or a const-qualified type, and if it is a structure or union, it has no members that are of const-qualified type. Basically, an object which you can assign a value to. The term "lvalue" can be thought of as meaning "left value". In expressions like object = value;, the object is a modifiable lvalue; it's on the left and we are modifying it.

Macros

Macros are a bit simpler. Section 6.8.3 of the standard, "Macro replacement", states that the identifier immediately following the define is called the macro name. For example:

#define MY_MACRO    (10)

MY_MACRO is the macro name. Immediately following a macro of this form is the replacement-list followed by a new-line. This means that when the preprocessor runs through our C code, every instance of MY_MACRO will be replaced with the characters (10). This makes our code much more readable and allows us to avoid using "magic numbers" which are values which have no apparent meaning.

For example, if you are using strings within your program and you know that you only want to handle strings which are less than or equal to 30 characters in length, you could have something like:

if (30 >= strlen(myString))
{
    /* process string */
}

This is fine since you wrote it, but there is a good chance someone else may look through your code and when they come across this 30 they would wonder what meaning it has. It might be easy to infer than it's the max string length, but they would have to further investigate your code to be sure. Instead, you could define a macro for the max string length and then use it like so:

#define MAX_STR_LEN     (30)

...

if (MAX_STR_LEN >= strlen(myString))
{
    /* process string */
}

Now the next person can easily determine that you were checking to be sure the string was within your limits. In addition, it's likely that you would use that value several times within your program and at some point you might decide that you will now handle strings up to 50 characters in length. Instead of having to update every instance of 30 that corresponds to your string length, you only need to modify your macro and everything is automatically updated for you! This will help you avoid bugs since every relevant instance of your max string length is updated, rather than you having to find and replace them manually. Let the preprocessor do the work for you.

errno

So, errno is a modifiable lvalue of type int which should mean we can just define it as int errno;, right? Well, the standard explicitly states that errno does not have to be the identifier of an object and that it could expand to the result from a function call like *errno(). The way I'm choosing to implement it is with an object of type int, rather than with a function. The use case for having a function macro is that every thread of your program would have its own instance of errno. This prevents threads from stepping on each other's error conditions. However, multi-threading support is another beast so I'll just stick to getting things working on one-thread before moving on to more.

The idea behind errno is that the library will update errno when somethings goes wrong and return an indication to the caller that it should check errno. This means that our errno object needs to have global scope such that any part of the code can check it. The way we accomplish this with C is through the use of the extern storage-class specifier.

Declarations vs. definitions

We need to take a short detour on the difference between declarations and definitions. The standard provides some information about them in section 6.5, "Declaration". A declaration specifies the interpretation and attributes of a set of identifiers; i.e. the storage-class, type-specifier, type-qualifiers, and the name. A definition is what causes storage to be reserved for an object.

Some examples of declarations in C look like:

extern int a;

int
main(void);

Note that these are at file scope, and not within a function. This declares an object, a, with external linkage and of type int, in addition to a function of type int, named main which takes no parameters. This indicates to the compiler that these two objects will be defined elsewhere and not to reserve any storage for them since the storage will be reserved by other code.

Some examples of definitions in C look like:

int a;

int
main(void)
{
    int b;
    int c = 0;

    return c;
}

Each object here is defined and has space reserved for it: a, main, b, and c. The two questionable objects are a and b; we never assigned any value to them yet they are still definitions? With C, declarations at file scope, without external linkage (i.e. without the extern storage-class), also count as definitions. Likewise, all function local variable declarations are also definitions regardless of whether or not you assigned a value to them.

Implementing the errno object

Since we want multiple files to be able to access the errno object, we would need to place the line extern int errno; in every file that needs access to errno. The way we will accomplish this is by creating the header file, errno.h, and placing this declaration inside it. Now, any file wishing to access errno can just include errno.h and it will get access. This is only declaring the errno object, so we also need to provide a definition for errno so that space is reserved for it such that the library can actually assign it a value and other files can check that value. We will have a matching errno.c file with int errno; in it providing the definition for our errno object. It is unnecessary to initialize errno since it is an external variable. The standard dictates that objects with external linkage have static storage duration (6.1.2.4 "Storage duraction of objects"), that objects in static storage are initialized before program startup (5.1.2 "Execution environment"), and that the value they are initialized to is 0 if no value is provided (6.4.7 "Initialization").

So, errno.h will contain the declaration for errno:

extern int errno;

errno.c will contain the definition for errno:

int errno;

And files which want to access errno only need to include errno.h like so:

#include <errno.h>

EDOM and ERANGE

errno.h must also provide these two macros which indicate two different error conditions, defined in 7.5.1, "Treatment of error conditions". EDOM indicates a domain error which means that the input argument is outside the domain over which the function is defined; e.g. if your function accepts inputs from 0 to 5 and you pass 10 to it, you've caused a domain error. ERANGE indicates a range error which means that the result of a function cannot be represented as a double value; e.g. if you try to take the log of 0 then you cause a range error because log(0) is undefined and cannot be represented.

Since we're using macros for EDOM and ERANGE it doesn't really matter what value we give them since the macro name would be used to check errno. We will just use the values 1 and 2 the represent EDOM and ERANGE, respectively. Just like errno, we want other files to have access to EDOM and ERANGE; we'll place them in the errno.h file so that anyone accessing errno automatically gets access to these macros as well.

We'll add the following to errno.h:

#define EDOM    (1)
#define ERANGE  (2)

Header Guards

Since we've written our first header file, we also need to talk about header guards. When you #include a file, the contents of that file are inserted into the current file at the location of the #include directive. When multiple files include the same file, the contents get included multiple times and you can end up with multiple definition errors.

To prevent this, we use preprocessor directives to control when the contents of the header file get used. Header guards look like the following:

#ifndef _HEADER_H
#define _HEADER_H

/* contents of header */

#endif /* _HEADER_H */

This works by checking to see if the identifier _HEADER_H has not been defined yet. If it hasn't been defined, in the case of the first occurrence of a file including this header, then the code between the #ifndef and #endif is processed and the identifier _HEADER_H is defined. If the identifier _HEADER_H has been defined, in the case of any inclusion of the header after the first, the code inside the conditionals is not processed. Anytime you use a header, you should use header guards to prevent errors from occurring when the header needs to be included by multiple files.

Identifier name

The identifier we used, _HEADER_H, will usually correspond to the specific name of the file that the header guards are placed in. The reason that we prepend these with an underscore, _, is because tokens of this naming scheme are specifically reserved by the standard for use in the standard library. Section 7.1.3, "Reserved identifiers" states what names are reserved, and our identifier falls within "identifiers that begin with an underscore and either an uppercase letter or another underscore". Specifically for our errno.h file, our header guards will look like:

#ifndef _ERRNO_H
#define _ERRNO_H

/* contents of header */

#endif /* _ERRNO_H */

Implementation of errno.h

That finishes up all the pieces we need for errno.h. Combining all these things together, our implementation of errno.h is below (without comments):

#ifndef _ERRNO_H
#define _ERRNO_H

#define EDOM    (1)

#define ERANGE  (2)

extern int errno;

#endif /* _ERRNO_H */

Makefile changes

A few modifications to the Makefile were needed to facilitate building object files from C source files and to use our header directory instead of the standard system header directory.

Compiling C files into object files

This addition was pretty simple. We we're previously only assembling .s files, but now that we have some C code, we also want to build objects from .c files. We can add a new pattern substitution that appends more object names to the OBJECTS variable, like so:

OBJECTS +=  $(patsubst $(SRCDIR)/%.c,%.o,$(wildcard $(SRCDIR)/*.c))

Defining the system include directory

By default, gcc will use /usr/include as the location for finding system header files. System headers are wrapped with <> rather than "" when you #include them. Since we want to use the header files in our ./include directory, we need to tell gcc not to use files found in /usr/include. The -nostdinc flag tells gcc not to search the standard system directories for header files, exactly what we want! Now, we need to tell gcc where to search for system headers with the -isystem flag. You simple provide a directory path to this argument, so from the base of our directory structure we will use the ./include directory by adding -isystem ./include.

Compile for a specific standard

gcc also includes specific rules for compiling against a certain standard. Since we are writing this for ISO 9899:1990 we want to compile for that standard by passing the following flag: -std=iso9899:1990.

Testing errno.h

Since we implemented errno.h then we need to write a test to ensure that things work the way we want them to. This should be pretty simple because all we need to do is #include <errno.h> and ensure that we can use the errno object as well as the EDOM and ERANGE macros. The test I wrote looks like so:

#include <errno.h>

/*******************************************************************************
    errno_test
*//**
    @brief  Tests the functionality of errno.h

    @return Indication of success
    @retval 0   Successfully ran all tests
    @retval 1   Failed to run tests
*******************************************************************************/
int
errno_test(void)
{
    int ret = 0;

    /* Ensure errno starts out as 0 */
    if (0 == errno)
    {
        /* Test setting errno to all defined values */
        errno = EDOM;
        errno = ERANGE;

        /* Reset since no errors occurred */
        errno = 0;
    }
    else
    {
        ret = 1;
    }

    return ret;
}

Link order

Lastly, we need to slightly alter the order of our build command so that our symbols will resolve properly. Initially, we used the following recipe to link our test code against the library:

$(TSTNAME): $(LIBNAME) $(TSTOBJS)
    $(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(patsubst $(LIBNAME,,$^))

When I compiled this way, I kept getting the following errors:

errno_test.o: In function `errno_test':
errno_test.c:(.text+0xd): undefined reference to `errno'
errno_test.c:(.text+0x17): undefined reference to `errno'
errno_test.c:(.text+0x21): undefined reference to `errno'
errno_test.c:(.text+0x2b): undefined reference to `errno'

The first thing I checked was that the library actually had the errno symbol and that it was global. You check this by dumping the symbol table with objdump -t:

$ objdump -t libwelibc.a

In archive libwelibc.a:

_start.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 l    d  .bss   0000000000000000 .bss
0000000000000000 g       .text  0000000000000000 _start
0000000000000000         *UND*  0000000000000000 main



errno.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 errno.c
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 l    d  .bss   0000000000000000 .bss
0000000000000000 l    d  .note.GNU-stack        0000000000000000 .note.GNU-stack
0000000000000000 l    d  .comment       0000000000000000 .comment
0000000000000004       O *COM*  0000000000000004 errno

Hrm, we can see the last entry is errno and that it's placed in the .common section of the ELF (indicated by *COM*). The .common section holds uninitialized global variables and is where we would expect to see errno since we didn't specificically initialize it in errno.c. So, the symbol is there but the linker is having trouble resolving it.

When linking object files together, the order in which you list them makes a difference in how the symbols are resolved during linking. Eli Bendersky wrote a great article explaining this, Library order in static linking. It even has an example of this exact situation of linking against a library. The answer to the problem is to place the library after the other object files we are linking together. So, we just move the $(LDFLAGS) variable after the pattern substitution like so:

$(TSTNAME): $(LIBNAME) $(TSTOBJS)
    $(CC) $(CFLAGS) $(patsubst $(LIBNAME),,$^) $(LDFLAGS) -o $@

I also moved the output file to the end because that seems to make sense to me.

Conclusion

With all that, we now know what modifiable lvalues and macros are, how to use global variables with C, and how to link properly with our own system headers. We now have the error reporting facilities we need to move on with the library. The next piece we will implement is 7.1.5, "Limits <float.h> and <limits.h>".

Making a Makefile

April 08, 2015

tags: assembling, compiling, design, linking, make

make is a GNU utility used for compiling programs and is the build system that was chosen for welibc. It helps automate the build process so that you don't need to memorize the specific compiler flags you want to use, remember exactly how to link against a library, or rely on having your command history available to compile your code. make is not necessarily complicated but there are a few pitfalls to getting started. We'll go over the creation of the makefile for welibc to help guide you.

Directory Structure

In a previous post the directory structure of welibc was discussed but it was just a placeholder since no code had been written yet. Since we actually have some files now, here is what the structure looks like:

welibc/
    ├── docs
    ├── license.txt
    ├── README.md
    ├── src
    │   └── _start.s
    └── test
        └── main_test.c

All source code will be inside the src/ directory and any code used to test the library will be placed in test/. This keeps files in logically separate places so the project doesn't become cluttered with assembly, C code, headers, and tests all mixed together.

Compiling without make

We only have two files here so compiling this project should be cake, right? Well, let's consider what we need to do:

Generate a _start.o file from _start.s
Create libwelibc.a from _start.o
Compile, but not link, main_test.c
Link main_test.o with libwelibc.a to create a test_welibc program
Run the test

That seems manageable. Some of the compilation basics and commands for generating libraries were covered before so I won't go over them here. Instead, I'll just show the commands:

$ gcc -nostdlib -ffreestanding -c -o _start.o src/_start.s
$ ar cr libwelibc.a _start.o
$ gcc -nostdlib -ffreestanding -c -o main_test.o test/main_test.c
$ gcc -lwelibc -L. -nostdlib -ffreestanding -o test_welibc main_test.o
$ ./test_welibc

That's only five commands which isn't too many, but let's look a little more closely. First, the compile flags are copied across three commands so if we want to change them we'll need to remember to include the updated flags in all three commands. Second, if we change something in _start.s then we'll need to go back and run all five commands again; that's just hitting the up arrow and enter keys five times though. Lastly, when (not if) we add more code we'll need to be sure to include each one as appropriate which could add another entire command for each new file we need to compile for the project. Well, make can take care of these things for us in an automated fashion so that we don't have to rely on ourselves to remember each detail of the compilation process. After all, why should we do it when the computer can do it for us?

Compiling with make

I'll only go over a few of the features of make, but a deeper explanation can be read from the source, the GNU Make Manual. To compile with make you must have a makefile with rules for compiling your project which you can run from the command-line by ussuing the make command.

Comments

You can include comments in your makefile to document how things work. Comments are started with the # character and can be placed on their own line, or at the end of a line.

# This is my makefile. There are many like it, but this one is mine.

Rules

Rules are used to define something for make to do, such as creating an object file or linking an object file against a library. The name of the rule, called a target, goes at the beginning of the line and is followed by a colon. The files or other rules required to run a rule, called prerequisites, go on the same line after the colon. The commands that should be run for that rule go on the following lines and are prepended with a tab character; the combination of these commands is called the recipe.

target: prerequisites
    recipe

Variables

make lets you define variables to use throughout your makefile, these are case sensitive and have a few assignment operators, but we'll just use := and += for now. := assigns data to a variable while += appends more data to it. For the most part, you can just consider variables to be character arrays or strings. Variables are used by placing them inside of parenthesis and prepending a dollar sign, like $(myvar).

myvar := some stuff
myvar += more things

Basic Makefile

The basic version of the makefile used to replicate the steps above looks like so and is named Makefile:

# Makefile

all: _start.o libwelibc.a test_welibc

_start.o: src/_start.s
    gcc -nostdlib -ffreestanding -c -o _start.o src/_start.s

libwelibc.a: _start.o
    ar cr libwelibc.a _start.o

test_welibc: libwelibc.a test/main_test.c
    gcc -nostdlib -ffreestanding -c -o test_welibc.o test/main_test.c
    gcc -lwelibc -L. -nostdlib -ffreestanding -o test_welibc test_welibc.o
    ./test_welibc

clean:
    rm _start.o libwelibc.a test_welibc.o test_welibc

You can see that the same five commands from above are replicated here in addition to an rm command. We have separated out the commands into targets so that each rule will build a specific piece of the project.

By default, make will run the first rule that it comes across; in this makefile that would be the "all" rule. This rule only specifies prerequisites which, if not found or are older than the target, will be built before continuing into the recipe portion of the rule. So, this will run the rules for _start.o, libwelibc, and test_welibc before executing the recipe for the rule. The recipe for this rule is empty so the make process will simply end after executing the rules for each prerequisite.

The _start.o rule requires the src/_start.s file which is located in exactly the spot indicated. If the _start.o file is older the _start.s, or if _start.o doesn't exist then it will create it using the recipe. The same goes for the libwelibc.a rule and the test_welibc rule. However, for the test_welibc rule, the recipe has a few steps which are each run in order.

Since "clean" was not a prerequisite of the "all" rule, it won't be run. We can run specific rules by passing the target as an argument on the command-line. For example, if you want to run the rm command in the "clean" rule, you would run make clean and it would clean up your files for you. Pretty nice!

Basic Enhancements

There are some simple things you can do to enhance your makefile a bit. We'll go over those before getting into some more advanced things to solidify your make skills.

Using Variables

You can see that the list of files to delete and the list of prerequisites for the "all" rule are pretty similar. We should place these file names/prerequisites into a variable so we don't have to worry about updating things in two places. A standard variable name for object files would simply be OBJECTS. We'll split up the test_welibc rule into two pieces as well: one to create test_welibc.o and one to link and run the test program.

# Makefile
OBJECTS := _start.o libwelibc.a test_welibc.o

all: $(OBJECTS) test_welibc

_start.o: src/_start.s
    gcc -nostdlib -ffreestanding -c -o _start.o src/_start.s

libwelibc.a: _start.o
    ar cr libwelibc.a _start.o

test_welibc.o: test/main_test.c
    gcc -nostdlib -ffreestanding -c -o test_welibc.o test/main_test.c

test_welibc: libwelibc.a test_welibc.o
    gcc -lwelibc -L. -nostdlib -ffreestanding -o test_welibc test_welibc.o
    ./test_welibc

clean:
    rm $(OBJECTS) test_welibc

This has the same functionality except that when we add a new object file to be created we would add it to the objects variable and it automatically gets deleted without us having to manually update the "clean" rule.

The next set of variables that we'll add will cover the compiler being used and the flags passed to the compiler. It's very common to see variables named CC, CFLAGS, and LDFLAGS in a makefile. These correspond to the compiler, compiler flags, and linker flags, respectively.

# Makefile
CC      := gcc
CFLAGS  := -nostdlib -ffreestanding
LDFLAGS := -lwelibc -L.
OBJECTS := _start.o libwelibc.a test_welibc.o

all: $(OBJECTS) test_welibc

_start.o: src/_start.s
    $(CC) $(CFLAGS) -c -o _start.o src/_start.s

libwelibc.a: _start.o
    ar cr libwelibc.a _start.o

test_welibc.o: test/main_test.c
    $(CC) $(CFLAGS) -c -o test_welibc.o test/main_test.c

test_welibc: libwelibc.a test_welibc.o
    $(CC) $(LDFLAGS) $(CFLAGS) -o test_welibc test_welibc.o
    ./test_welibc

clean:
    rm $(OBJECTS) test_welibc

Ignoring Errors and PHONYs

Let's say there was an error when test_welibc.o was created, so only _start.o was made. We might want to clean up what was built, fix the error, and finally recompile. However, the rm command would fail because it can't find the rest of the object files or test_welibc. You can prepend any command with a hyphen, -, to instruct make to ignore any errors produced by that command. As well, we can add the -rf flags to rm to remove any errors related to unfound files or directories that are not empty.

Some targets will not have any prerequisites, as with our clean rule, and make provides a special directive to mark rules which are a set of actions that don't rely on specific files. This is the .PHONY directive which you give a target and make will know to always run that target even though it has no files to determine when things are out of date and need to be rebuilt. In addition to this, if we end up with a file actually named "clean" then make won't get confused; it will know that the clean rule we have is for deleting files and doesn't correspond to compiling a file named "clean". Here's the new all and clean rules:

.PHONY: all
all: $(OBJECTS) test_welibc

.PHONY: clean
clean:
    -rm -rf $(OBJECTS) test_welibc

Advanced makery

The following will assist in further automating the make process and will also clean up the makefile to make it easier to understand when changing it.

Automatic File Finding

We already know that we'll be adding more code to the project which means we'll have to update the objects variable each time and add a new rule. Instead of manually updating the objects variables we can automate finding files so we can focus our attention on creating new rules. make has a wildcard character, %, which can be combined with two built in functions, patsubst (for pattern substitution) and wildcard, to generate a list of object files from every ".s" file that is found. A new variable to indicate where our main source files would be helpful as well. The additions look like:

SRCDIR  := src
OBJECTS := $(patsubst $(SRCDIR)/%.s,%.o,$(wildcard $(SRCDIR)/*.s)) libwelibc.a

The $(wildcard $(SRCDIR)/*.s) part will match every file in the src/ directory that ends in ".s". The pattern substitution will then do a search and replace where the % character represents the filename. So, src/_start.s would become _start.o and anytime we add a new ".s" file make will find it automatically.

We can create similar variables just for the ".c" files in the test directory since they're no longer found with the changes we just made:

TSTDIR  := test
TSTOBJS := $(patsubst $(TSTDIR)/%.c,%.o,$(wildcard $(TSTDIR)/*.c))

The rule for compiling the main_test.c file needs to change slightly, along with test_welibc and the prerequisite list for libwelibc.a; here's the new makefile:

# Makefile
CC      := gcc
CFLAGS  := -nostdlib -ffreestanding
LDFLAGS := -lwelibc -L.
SRCDIR  := src
TSTDIR  := test
OBJECTS := $(patsubst $(SRCDIR)/%.s,%.o,$(wildcard $(SRCDIR)/*.s)) libwelibc.a
TSTOBJS := $(patsubst $(TSTDIR)/%.c,%.o,$(wildcard $(TSTDIR)/*.c))

.PHONY: all
all: $(OBJECTS) $(TSTOBJS) test_welibc

_start.o: src/_start.s
    $(CC) $(CFLAGS) -c -o _start.o src/_start.s

libwelibc.a: $(OBJECTS)
    ar cr libwelibc.a _start.o

main_test.o: test/main_test.c
    $(CC) $(CFLAGS) -c -o main_test.o test/main_test.c

test_welibc: libwelibc.a main_test.o
    $(CC) $(LDFLAGS) $(CFLAGS) -o test_welibc main_test.o
    ./test_welibc

.PHONY: clean
clean:
    -rm -rf $(OBJECTS) $(TSTOBJS) test_welibc

Virtual Paths and Automatic Variables

It's kind of annoying to have to specify the directory where a file exists and it clutters the rules up having more than just file names. make provides a variable that you can fill in, VPATH (remember this is case-sensitive), with colon-separated locations to search for files when they aren't found in the current directory. Sounds like a great thing to use for the src/ and test/ directories:

VPATH   := $(SRCDIR):$(TSTDIR)

Now we can simply list the filenames in the prerequisite lists, rather than prepending the directory. However, the recipe commands won't search these directories since they are passed directly to your shell for execution. To get around this, make provides some automatic variables which are expanded before passing a recipe command to the shell. The $^ variable corresponds to every prerequisite file and is updated with the path and name found using VPATH. The $@ variable corresponds to the target name and the $< variable corresponds to the first prerequisite. With these powers combined we get:

# Makefile
CC      := gcc
CFLAGS  := -nostdlib -ffreestanding
LDFLAGS := -lwelibc -L.
SRCDIR  := src
TSTDIR  := test
OBJECTS := $(patsubst $(SRCDIR)/%.s,%.o,$(wildcard $(SRCDIR)/*.s)) libwelibc.a
TSTOBJS := $(patsubst $(TSTDIR)/%.c,%.o,$(wildcard $(TSTDIR)/*.c))
VPATH   := $(SRCDIR):$(TSTDIR)

.PHONY: all
all: $(OBJECTS) $(TSTOBJS) test_welibc

_start.o: _start.s
    $(CC) $(CFLAGS) -c -o $@ $^

libwelibc.a: $(OBJECTS)
    ar cr $@ $^

main_test.o: main_test.c
    $(CC) $(CFLAGS) -c -o $@ $^

test_welibc: main_test.o libwelibc.a
    $(CC) $(LDFLAGS) $(CFLAGS) -o $@ $<
    ./test_welibc

.PHONY: clean
clean:
    -rm -rf $(OBJECTS) $(TSTOBJS) test_welibc

That simplifies the syntax quite a bit!

Wildcard Targets

We can use the same % wildcard from before to create some wildcard rules so we don't need a specific rule for each .o which gets built with the same command, $(CC) $(CFLAGS) -c -o file.o file.c. Remember that the % expands to the matched name so we can use it in the prerequisite list as well. We'll still need two rules since some objects are made from ".s" files and others from ".c" files:

# Makefile
CC      := gcc
CFLAGS  := -nostdlib -ffreestanding
LDFLAGS := -lwelibc -L.
SRCDIR  := src
TSTDIR  := test
OBJECTS := $(patsubst $(SRCDIR)/%.s,%.o,$(wildcard $(SRCDIR)/*.s)) libwelibc.a
TSTOBJS := $(patsubst $(TSTDIR)/%.c,%.o,$(wildcard $(TSTDIR)/*.c))
VPATH   := $(SRCDIR):$(TSTDIR)

.PHONY: all
all: $(OBJECTS) $(TSTOBJS) test_welibc

_%.o: _%.s
    $(CC) $(CFLAGS) -c -o $@ $^

%.o: %.c
    $(CC) $(CFLAGS) -c -o $@ $^

libwelibc.a: $(OBJECTS)
    ar cr $@ $^

test_welibc: main_test.o libwelibc.a
    $(CC) $(LDFLAGS) $(CFLAGS) -o $@ $<
    ./test_welibc

.PHONY: clean
clean:
    -rm -rf $(OBJECTS) $(TSTOBJS) test_welibc

That's not a huge improvement at the moment, but it's the other half of automatic file finding since now we have automatic rule generation for objects which don't require any special recipe.

Generic Tricks

The next few iterations of the makefile expand on the things weve already covered.

First, we'll add variables for the name of our library and the name of the test in addition to creating a "test" rule that is separate from the "all" rule so that we don't test every time we build the library. To test, you would use the make test command. Also, we'll clean up all the files before rebuilding with the "all" target by making "clean" a prerequisite for the "all" rule. The $(TSTNAME) rule uses a pattern substitution to remove $(LIBNAME) from the list of object files that are linked together. It's redundant to provide -lwelibc -L. (the $(LDFLAGS) variable) and also provide libwelibc.a in the list of files to link. Further, we want to make a distinction between them as the proper way to compile this would be to generate the test object file and then link it against our library (which is what we accomplish here).

# Makefile
CC      := gcc
CFLAGS  := -nostdlib -ffreestanding
LDFLAGS := -lwelibc -L.
SRCDIR  := src
TSTDIR  := test
LIBNAME := libwelibc.a
TSTNAME := test_welibc
OBJECTS := $(patsubst $(SRCDIR)/%.s,%.o,$(wildcard $(SRCDIR)/*.s)) $(LIBNAME)
TSTOBJS := $(patsubst $(TSTDIR)/%.c,%.o,$(wildcard $(TSTDIR)/*.c))
VPATH   := $(SRCDIR):$(TSTDIR)

.PHONY: all
all: clean $(OBJECTS)

.PHONY: test
test: $(TSTNAME)
    ./$<

_%.o: _%.s
    $(CC) $(CFLAGS) -c -o $@ $^

%.o: %.c
    $(CC) $(CFLAGS) -c -o $@ $^

$(LIBNAME): $(OBJECTS)
    ar cr $@ $^

$(TSTNAME): $(LIBNAME) $(TSTOBJS)
    $(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(patsubst $(LIBNAME),,$^)

.PHONY: clean
clean:
    -rm -rf $(OBJECTS) $(TSTOBJS) test_welibc

Next, we'll add an install rule to place the final libwelibc.a static library into the /usr/lib folder on the machine, as well as an uninstall rule to remove it.

# Makefile
CC      := gcc
CFLAGS  := -nostdlib -ffreestanding
LDFLAGS := -lwelibc -L.
INSTALL := install
SRCDIR  := src
TSTDIR  := test
LIBNAME := libwelibc.a
TSTNAME := test_welibc
OBJECTS := $(patsubst $(SRCDIR)/%.s,%.o,$(wildcard $(SRCDIR)/*.s)) $(LIBNAME)
TSTOBJS := $(patsubst $(TSTDIR)/%.c,%.o,$(wildcard $(TSTDIR)/*.c))
VPATH   := $(SRCDIR):$(TSTDIR)

.PHONY: all
all: clean $(OBJECTS)

.PHONY: test
test: $(TSTNAME)
    ./$<

.PHONY: install
install: $(LIBNAME)
    $(INSTALL) $^ /usr/lib

.PHONY: uninstall
uninstall: $(LIBNAME)
    -rm -rf /usr/lib/$^

_%.o: _%.s
    $(CC) $(CFLAGS) -c -o $@ $^

%.o: %.c
    $(CC) $(CFLAGS) -c -o $@ $^

$(LIBNAME): $(OBJECTS)
    ar cr $@ $^

$(TSTNAME): $(LIBNAME) $(TSTOBJS)
    $(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(patsubst $(LIBNAME),,$^)

.PHONY: clean
clean:
    -rm -rf $(OBJECTS) $(TSTOBJS) test_welibc

Lastly, the GNU Make Manual has an entire chapter dedicated to conventions that are commonly used with makefiles and is a great source of enhancements to add. I've chosen just a few to add to the last iteration of the makefile, for now.

First, we'll add INSTALL_PROGRAM and INSTALL_DATA which are to be used for installing programs and installing binaries onto a system. Along with these we'll add LIBDIR to specify the default destination which we also prepend with DESTDIR so that the command-line flag, --destdir, can be used to specify an alternate installation destination. They also instruct you to include the SHELL variable set to the bourne shell specifically. The first instance of .SUFFIXES clears out all the suffixes being searched and the second specifies which file suffixes are to be searched. This may save some time on larger projects but probably won't matter for this one. The manual also suggests placing any build programs into a variable (like ar and gcc), but not to do so with shell utilies (like rm). Finally, we end up with the following:

# Makefile
AR      := ar
CC      := gcc
CFLAGS  := -nostdlib -ffreestanding
LDFLAGS := -lwelibc -L.
INSTALL := install
INSTALL_PROGRAM := $(INSTALL)
INSTALL_DATA    := $(INSTALL) -m 644
SHELL   := /bin/sh
LIBDIR  := /usr/lib
SRCDIR  := src
TSTDIR  := test
LIBNAME := libwelibc.a
TSTNAME := test_welibc
OBJECTS := $(patsubst $(SRCDIR)/%.s,%.o,$(wildcard $(SRCDIR)/*.s)) $(LIBNAME)
TSTOBJS := $(patsubst $(TSTDIR)/%.c,%.o,$(wildcard $(TSTDIR)/*.c))
VPATH   := $(SRCDIR):$(TSTDIR)

.SUFFIXES:
.SUFFIXES: .c .o .s

.PHONY: all
all: clean $(OBJECTS)

.PHONY: test
test: $(TSTNAME)
    ./$<

.PHONY: install
install: $(LIBNAME)
    $(INSTALL) $^ $(DESTDIR)$(LIBDIR)/$(LIBNAME)

.PHONY: uninstall
uninstall: $(LIBNAME)
    -rm -rf $(DESTDIR)$(LIBDIR)/$(LIBNAME)

_%.o: _%.s
    $(CC) $(CFLAGS) -c -o $@ $^

%.o: %.c
    $(CC) $(CFLAGS) -c -o $@ $^

$(LIBNAME): $(OBJECTS)
    $(AR) cr $@ $^

$(TSTNAME): $(LIBNAME) $(TSTOBJS)
    $(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(patsubst $(LIBNAME),,$^)

.PHONY: clean
clean:
    -rm -rf $(OBJECTS) $(TSTOBJS) test_welibc

Congratulations, your makefile has officially been pimped.

That's quite a bit more advanced than what we started with and we're only compiling two files! However, a lot of the changes are intended to ease the "pain" of adding more files to the library later on, as well as expanding the test suite. So, we did a bit of heavy lifting on the front-end here but hopefully this makefile will last a while with only minor tweaks along the way.

This wouldn't be complete without mentioning Paul's Rules of Makefiles which is almost always referenced when people ask make questions. I've tried to abide by the rules and I'll admit that I wanted to do things differently but sticking to the rules actually made things easier in the end.

Creating a Static Library

March 22, 2015

tags: assembling, compiling, linking

Creating libraries allows you to place code in a container so that it can be used by other programs. You can create a shared library which is loaded into memory and is available for any running process to use, or you can create a static library which is bundled up within an executable in the same manner as linking in another object file. We'll go over static library creation since welibc isn't ready to be a shared library yet.

Static libraries

Rather than being loaded by the operating system and being made available to any process, a static library is linked in at compile time in a similar fashion to a normal object file. This is the method I'll be using for testing welibc.

We can assemble the following code just as we normally would to create an object file, from _start.s:

.globl _start               # Declare global _start symbol

_start:
    call    main            # Call main function
    mov     %rax,   %rbx    # Copy return value from main as argument to exit(2)
    mov     $1,     %rax    # Set rax to syscall number for exit
    int     $0x80           # Trigger interrupt to call exit

Assemble:

$ gcc -nostdlib -ffreestanding -c -o _start.o _start.s

Then we create an archive with the ar program:

$ ar cr libwelibc.a _start.o

Write our test code, main_test.c:

int
main(int argc, char *argv[])
{
    int ret = 42;

    return ret;
}

And compile:

$ gcc -nostdlib -ffreestanding -c -o main_test.o main_test.c

To link against a static library you specify the library's name without the prepended "lib" using the -l flag to gcc, and ensuring to pass the path of the library with the -L flag:

$ gcc -L./ -nostdlib -ffreestanding -o test_welibc main_test.o -lwelibc

Now you've linked against a static library and you can run your code:

$ ./test_welibc
$ echo $?
42

As welibc comes along it will be built as a static library and then the test code can link against it like above. This should keep things pretty simple for testing the functionality of the library.

Getting Rid of Stdlibc

March 22, 2015

tags: assembling, compiling, linking

To get started writing your own implementation of the standard C library you'll need to stop linking against whatever implementation that your compiler comes with. I'm working on a CentOS 7 machine and use glibc normally; now I want to use welibc (even though it doesn't really exist yet).

Normal Compilation

First, we're going to look at a "normal" compile. The beginning of the test suite for welibc is in a file called main_test.c and looks like this (comment blocks excluded)

int
main(int argc, char *argv[])
{
    int ret = 0;

    return ret;
}

Easy enough. This should just run and return 0 without doing anything else.

I'm using gcc to compile this project, so to compile this code we can use the following command

$ gcc -o test_welibc main_test.c

If things went properly then this compiles with no errors and gcc doesn't print anything. I'm also using bash for my shell and we can test that it ran properly like so

$ ./test_welibc
$ echo $?
0

The echo $? line will print the exit status of the most recently executed foreground pipeline, e.g. the return value of the program we just ran. We got 0 which means things are working properly. You can be sure by using a much different return value

int
main(int argc, char *argv[])
{
    int ret = 42;

    return ret;
}

Then rerun it

$ gcc -o test_welibc main_test.c
$ ./test_welibc
$ echo $?
42

So, now you know how a normal compilation would run.

Compiling without stdlibc

We never explicitly told gcc to use glibc but it does this automatically unless you instruct it not to. You would do that with the -nostdlib -ffreestanding flags, which prevent linking against stdlib and instructs gcc that our final product will be running in a freestanding environment which means the standard library might not exist and our program might not necessarily start in the main function. So, we'll add those to our compile line and see what happens.

$ gcc -nostdlib -ffreestanding -o test_welibc main_test.c
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000400144

Uh... what? /usr/bin/ld? entry symbol _start? First, we didn't even tell you to run ld and second, my code doesn't even mention _start, so why do you need that symbol?

Compilation process

When you compile a C program, a couple things happen:

The preprocessor performs text replacements
The compiler converts C code into assembly instructions
The assembler converts assembly instructions into into object code
The linker combines all of the object files together into an executable

With the GNU compiler collection your preprocessor and compiler will be gcc, the assembler is as, and the linker is ld. gcc will automatically run each of these as instructed so you don't have to worry about doing each of them individually. However, when you need the control, you can absolutely run each step individually.

The output of each of these steps can be seen by using the -save-temps flag to gcc, so we'll compile with that and look at each step

$ gcc -save-temps -nostdlib -ffreestanding -o test_welibc main_test.c
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000400144
$ ls
main_test.c  main_test.i  main_test.o  main_test.s  test_welibc

Preprocessor

The preprocessor, gcc in this case, moves through your C code and performs all of the actions specified by the preprocessor directives, like:

#include
#define
#endif

This mostly performs text replacements within your code. For example, #include actually inserts the text from whatever file you're including, and #define replaces every instance of your macro with the value you assigned to it. It also replaces every comment with a space.

The temporary file corresponding to this step is main_test.i, let's look at it:

# 1 "main_test.c"
# 1 "<command-line>"
# 1 "main_test.c"
# 47 "main_test.c"
int
main(int argc, char *argv[])
{
    int ret = 42;

    return ret;
}

There isn't really anything too interesting here, our code is mostly unchanged. The .i file type indicates to gcc that the code should not be preprocessed; this makes sense because this is the output of the preprocessor. The first four lines are all specific to the gcc C preprocessor and contain a line number followed by a filename in quotes. These aren't really important for now, but you can read more about them here.

Compiler

The compiler, gcc, will convert human readable code into assembly instructions which correspond directly to a specific action that your processor needs to carry out. For example, the processor doesn't know what return ret; means, so we need to translate that C code into the instructions that make the processor "return the value of ret to the calling function." To look at the assembly, check out the test_main.s file

.file   "main_test.c"
.text
.globl  main
.type   main, @function

main:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movl    %edi, -20(%rbp)
        movq    %rsi, -32(%rbp)
        movl    $42, -4(%rbp)
        movl    -4(%rbp), %eax
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:
        .size   main, .-main
        .ident  "GCC: (GNU) 4.8.2 20140120 (Red Hat 4.8.2-16)"
        .section        .note.GNU-stack,"",@progbits

This includes some things that are unique to the DWARF-2 debugging format and are too specific to get into right now. More information about these can be found here and in the DWARF-2 specification. The .ident and .section lines can also be ignored for the purposes of this article, but more info is here.

I'll add some comments to the assembly to indicate what it's doing

.file   "main_test.c"               # The file that these instruction were generated from
.text                               # The following instructions should be executable
.globl  main                        # Declare a global symbol named main
.type   main, @function             # Set the "main" symbol to be of function type

main:                               # Define a label named main
.LFB0:                              # Define a label named .LFB0
        .cfi_startproc
        pushq   %rbp                # Push the value in the rbp register on to the stack
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp          # Copy the contents of the rsp register into the rbp register
        .cfi_def_cfa_register 6
        movl    %edi, -20(%rbp)     # Move the contents of edi into memory
        movq    %rsi, -32(%rbp)     # Move the contents of rsi into memory
        movl    $42, -4(%rbp)       # Place the value 42 in memory
        movl    -4(%rbp), %eax      # Move the value we stored in memory into the eax register
        popq    %rbp                # Pop the value on top of the stack into the rbp register
        .cfi_def_cfa 7, 8
        ret                         # Return to the calling function
        .cfi_endproc
.LFE0:                              # Define a label named .LFE0
        .size   main, .-main        # Calculate the size of the main function
        .ident  "GCC: (GNU) 4.8.2 20140120 (Red Hat 4.8.2-16)"
        .section        .note.GNU-stack,"",@progbits

There is a lot going on here but the point is that you can see the 42 getting placed into memory, getting retrieved from memory and placed into a register, and finally the function returns to the caller. The eax register is the place where calling functions check for the return value.

Assembler

The assembler, /usr/bin/as in this case, takes the assembly instructions in the .s file and translates them into an object file which contains code that is relocatable. This means that is will use offsets to reference things, rather than hardcoded addresses, and will use a special address (like 0x00000000) to indicate when something is unknown and needs to be fixed. This is done because the assembler isn't aware of where this code will be placed in memory. The special addresses will be fixed in the next stage. The object file also contains the symbol table which correlates named objects to specific addresses, like where the main function is. There are several object file formats, the one we are looking at is called ELF, executable and linkable format.

To view the information about the object file, use readelf

$ readelf -a main_test.o

This is too much information to look at, but I'll mention the symbol table specifically, which can be produced by itself with

$ readelf -s main_test.o

Symbol table '.symtab' contains 9 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main_test.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     8: 0000000000000000    23 FUNC    GLOBAL DEFAULT    1 main

Entry 8 is the entry for our main function, you can see the size of the function, the fact that it's a global function, and that it's value is 0 which means it hasn't been fixed up yet.

Linker

The linker, /usr/bin/ld in this case, will take one or more object files and combine them together by looking at their symbol tables to resolve unknown symbols so that it knows exactly where to go when it needs to execute specific parts of your code. The output of this stage is (hopefully) a fully working and executable binary. We instructed gcc to output a file named test_welibc which is the file that the linker produces. We can look at the final symbol table associated with the executable to see how our symbols were resolved

$ readelf -s test_welibc

Symbol table '.symtab' contains 13 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000400120     0 SECTION LOCAL  DEFAULT    1
     2: 0000000000400144     0 SECTION LOCAL  DEFAULT    2
     3: 000000000040015c     0 SECTION LOCAL  DEFAULT    3
     4: 0000000000400170     0 SECTION LOCAL  DEFAULT    4
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main_test.c
     7: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS
     8: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _start
     9: 0000000000601000     0 NOTYPE  GLOBAL DEFAULT    4 __bss_start
    10: 0000000000400144    23 FUNC    GLOBAL DEFAULT    2 main
    11: 0000000000601000     0 NOTYPE  GLOBAL DEFAULT    4 _edata
    12: 0000000000601000     0 NOTYPE  GLOBAL DEFAULT    4 _end

Entry 10 is the entry for our main function and you can see that its value is no longer 0. However, we also have 5 more symbols than before, including the _start symbol that was said to be missing; where did they come from? These extra symbols are needed by the ELF file format and gcc creates them as part of the linking process. However, the _start symbol is special because that is the location where an ELF file will start executing. The standard C library provides its own _start symbol and since we aren't linking against it anymore (because we specified -nostdlib) then we get the error about not finding a symbol. You'll also see that the "Ndx" column lists the _start symbol as being UND, or undefined.

Bootstrapping your library

Now that we know a little more about the compilation process, ELF, and why we ended up with an error, we can look at fixing it.

First, let's just run our executable and see what happens

$ ./test_welibc
Segmentation fault

A segmentation fault generally means that we accessed a region of memory that was protected in some way. It makes sense that we get this error because the ELF file format begins execution at the address associated with the _start symbol, and our _start symbol's address (the value from the symbol table) is 0. Most systems will not allow you to reference data at, or execute code at, the address 0; so, we end up with a seg fault.

This means we need to provide a _start symbol and somehow get it to call our main function. This will require us to do some assembly programming.

Defining the _start symbol

The first file in our library will be _start.S, where we will define our start symbol, call our main function, and finally exit from the program.

To define a global symbol, we use the .globl assembly directive, like so

.globl _start

This directive is specific to the GNU assembler (gas) so if you're using something else, be sure to look at its directive syntax (for example, nasm uses "global").

In assembly, to call a function we use the call instruction, like so

call main

And to exit from the program we will make use of the exit(2) syscall through use of the syscall instruction

syscall

However, we must determine how to pass arguments to main, receives its return value, and pass an argument to the exit syscall. These calling conventions are established by the ABI for your architecture and operating system. In the case of Linux on x86_64 we can turn to the System V ABI which has this to say

Figure 3.4: Register Usage
%rax    temporary register; with variable arguments
        passes information about the number of vector
        registers used; 1st return register

A.2.1 Calling Convention
1. User-level applications use as integer registers for passing the sequence
%rdi, %rsi, %rdx, %rcx, %r8 and %r9. The kernel interface uses %rdi,
%rsi, %rdx, %r10, %r8 and %r9.

2. A system-call is done via the syscall instruction. The kernel destroys
registers %rcx and %r11.

3. The number of the syscall has to be passed in register %rax.

Now that we know how to pass arguments, we will want to call main and exit like so

exit(main(argc, argv));

Where argc and argv are passed to main, whose return value is passed to exit. Before calling main we'll need to place argc into %rdi and argv into %rsi, main will give its return value to us in %rax which needs to be placed into %rdi, and finally we need to place the syscall number for exit into %rax before issuing the syscall instruction. This leaves us with two questions: where do argc and argv come from, and what is the syscall number for exit?

argc and argv

The ABI tells us where these come from in Figure 3.11: Initial Process Stack

Argument pointers       8 + %rsp
Argument count          %rsp

And additionally makes this note about %rsp's initial value

%rsp    The stack pointer holds the address of the byte with lowest address
        which is part of the stack. It is guaranteed to be 16-byte aligned
        at process entry

Alignment will be important in just a moment. Additionally, the pointers in the argv array are located directly on the stack at the location specified above.

At the beginning of _start, %rsp will point to argc and argv will be 8 bytes higher on the stack. We can use pop to grab the value at the location to which %rsp points and the resultant value of %rsp will be the value we need for argv (remember, argv is an array of pointers and to pass that as an argument in C we pass the location of that array). In assembler this is

pop     %rdi
mov     %rsp,   %rsi

Our stack started off being 16-byte aligned but is now 8-byte aligned due to the pop instruction. This means we need to re-align the stack on a 16-byte boundary because that's what is expected upon entry to main. This can be achieved in a few ways, but we'll be explicit about what we're doing by AND'ing %rsp with a bit mask that will preserve all but the last nibble (-16 in two's complement), guaranteeing 16-byte alignment. Once done, we're ready to call main.

pop     %rdi
mov     %rsp,   %rsi
and     $-16,   %rsp
call    main

exit syscall number

Our last item is to figure out what the syscall number for exit is. 64-bit syscalls on Linux are defined in the header /usr/include/asm/unistd_64.h, which has the following line

#define __NR_exit 60

Now we save off the return value from main and call the exit syscall like so

mov     %rax,   %rdi
movq    $60,    %rax
syscall

I was given a protip to specify the size when moving immediate values rather than relying on the default size. This is why movq is used on the immediate value of 60.

_start.S will look something like this

.globl _start               # Declare global _start symbol

_start:
    pop     %rdi            # Place argc as first arg to main
    mov     %rsp,   %rsi    # Place location of argv[] as second arg to main
    and     $-16,   %rsp    # Realign stack on a 16-byte boundary
    call    main            # Call main function
    mov     %rax,   %rdi    # Copy return value from main as argument to exit(2)
    movq    $60,    %rax    # Set rax to syscall number for exit
    syscall                 # Execute the syscall specified in rax

Assembling assembly

We now have an assembly file that we need to assemble. Let's try assembling it by itself to see what happens

$ gcc -o _start _start.S
/tmp/ccZH6Hhm.o: In function `_start':
(.text+0x0): multiple definition of `_start'
/usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../../../lib64/crt1.o:(.text+0x0): first defined here
/usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../../../lib64/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
/tmp/ccZH6Hhm.o: In function `_start':
(.text+0x1): undefined reference to `main'
collect2: error: ld returned 1 exit status

Here we compiled with glibc (because we left out -nostdlib) and we can see that _start is in fact defined by the normal standard library because gcc tells us that the _start symbol was first defined in the crt1.0 file. Hopefully things are starting to make sense as to what glibc is providing and why we get errors when we don't link against it. You can also see there is an undefined reference to main which should be expected since we call the main function but we haven't provided gcc with the code for it yet.

Now, let's compile without glibc and see what output we get

$ gcc -nostdlib -ffreestanding -o _start _start.S
/tmp/ccDUeM2Z.o: In function `_start':
(.text+0x1): undefined reference to `main'
collect2: error: ld returned 1 exit status

Awesome, we don't get anymore errors about multiple definitions of start and we still get the expected error about main. However, we really want an object file so that we can then link it to the code where main is defined. You can instruct gcc to stop before the linking phase which will give you an object file.

$ gcc -nostdlib -ffreestanding -c -o _start.o _start.S

Now we don't get any errors since gcc didn't try to resolve any symbols.

We'll do the same thing with our main_test.c file to get an object file with main defined inside of it

$ gcc -nostdlib -ffreestanding -c -o main_test.o main_test.c

Linking objects together

So we're now at the point where we have one file defining the _start symbol and referencing the main symbol, and one file which defines the main symbol. We need to link them together so that the unresolved symbols will be fixed up so that we can execute our code with no seg faults.

First, let's verify what we have. If we dump the symbol table for _start.o we should see the _start symbol and an undefined main symbol

$ readelf -s _start.o

Symbol table '.symtab' contains 6 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     4: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    1 _start
     5: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND main

The "Ndx" column indicates whether the symbol is undefined or not, and as we expected, the main symbol is undefined. Now, let's look at main_test.o

$ readelf -s main_test.o

Symbol table '.symtab' contains 9 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main_test.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     8: 0000000000000000    23 FUNC    GLOBAL DEFAULT    1 main

Likewise we now have a defined main symbol. Now, let's link them together to get a working binary!

We can pass object files to gcc and it knows not to preprocess, compile, or assemble them. We still need to specify not to use glibc because gcc will automatically link against it otherwise, and in our case we want to link only main_test.o and _start.o:

$ gcc -nostdlib -ffreestanding -o test_welibc _start.o main_test.o

And now let's look at the final symbol table to be sure everything is defined

$ readelf -s test_welibc

Symbol table '.symtab' contains 13 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000400120     0 SECTION LOCAL  DEFAULT    1
     2: 0000000000400144     0 SECTION LOCAL  DEFAULT    2
     3: 0000000000400168     0 SECTION LOCAL  DEFAULT    3
     4: 0000000000400180     0 SECTION LOCAL  DEFAULT    4
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main_test.c
     7: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS
     8: 000000000040015c     0 NOTYPE  GLOBAL DEFAULT    2 _start
     9: 0000000000601000     0 NOTYPE  GLOBAL DEFAULT    4 __bss_start
    10: 0000000000400144    23 FUNC    GLOBAL DEFAULT    2 main
    11: 0000000000601000     0 NOTYPE  GLOBAL DEFAULT    4 _edata
    12: 0000000000601000     0 NOTYPE  GLOBAL DEFAULT    4 _end

Removing startup files and standard libraries

We can add two more compiler flags to get rid of a few more system defaults. First, we'll direct gcc not to use any of the standard system startup files with -nostartfiles. Second, we don't want to use any of the default system libraries since we are writing our own standard C library; this is done with the -nodefaultlibs flag. From the man pages description of the -nostdlib flag we're already using, it seems to be a combination of -nostartfiles and -nodefaultlibs, but we'll include them just to be sure.

Our final compilation looks like

$ gcc -nostdlib -ffreestanding -nostartfiles -nodefaultlibs -o test_welibc _start.o main_test.o

Running glibc free

So, we haven't gotten any errors during preprocessing, compiling, assembling, or linking. It's time to test our binary that doesn't use glibc to see if it runs properly

$ ./test_welibc
$ echo $?
42

Sweet! Now we can continue on with the development of the rest of the library since we can successfully run the most basic code when linking against it.

Project Goals

March 05, 2015

tags: design, standards

As with any project, you need to have a goal in mind for what the project should accomplish. Here I will discuss the goals of welibc and how I plan to achieve them.

Goal

To create an implementation of the standard C library, as defined in ISO/IEC 9899:1990, which is clear, secure, and well documented.

Scope

The user mode portions of the standard C library will be implemented first. This means that functions like free, fwrite, and malloc will be saved until the end. This is because those functions must implement OS and architecture specific code which, by its definition, is not portable and could stall the project. This project is not aimed at providing an implementation which will run on every machine but rather an implementation which can serve as a reference or perhaps the base for other projects with similar goals.

Plan

I don't think that either of the three sub-goals are more important than another, but it is important to discuss a preliminary plan for achieving them. I think that implementing one function a week is a moderate goal to allow me time to write the function, explore different ways of achieving the same results, writing tests for the function, and documenting the work. Now that I've said that, maybe two weeks is better. We'll see.

Code Clarity

Having code which is easy to read and digest is very important if you plan on maintaining it or passing it on to someone else. It can make all the difference in the world when the previous programmer (which might be you!) has taken the time to write code in a way which implements their intent such that you can verify it is done correctly. I briefly touched on this before, but I'll give another example. Which code sample below would you prefer to debug?

Sample 1

/* Print the number of command-line arguments passed in */
#include <stdio.h>

int
main(int argc, char *argv[])
{
    printf("You passed in %d argument%s\n",argc,(argc!=1)?"s":"");
    return 0;
}

Sample 2

/* Print the number of command-line arguments passed in */
#include <stdio.h>

int
main(int argc, char *argv[])
{
    int numArgs = argc;

    printf("You passed in %d argument", numArgs);

    if (1 != numArgs)
    {
        printf("s");
    }

    printf("\n");

    return 0;
}

Can you spot the error? How would you fix it? These two code examples do exactly the same thing and while one has fewer lines of code and less function calls I would personally prefer to be debugging the 2nd example.

Using a style guide can help a lot with writing clear code. It helps ensure that your code is written in a consistent manner and that there is a defined naming scheme for files, functions, and variables. Previously I said that I would just use the coding style I developed on my own, which isn't documented, but that seems to go against the goals of the project. So, I'll use the Google Style Guide as a template to write my own style guide which I expect to grow in size and specificity as the project progresses. It would be nice if I could write it in markdown and include it as a part of my directory structure then link a blog post to the file itself, rather than updating a single blog post.

Here are corrected versions of the code samples above. The bug lies in that argc includes the program name in its count, so the number of arguments passed to the program is argc - 1.

Corrected Sample 1

/* Print the number of command-line arguments passed in */
#include <stdio.h>

int
main(int argc, char *argv[])
{
    printf("You passed in %d argument%s\n",argc-1,((argc-1)!=1)?"s":"");
    return 0;
}

Corrected Sample 2

/* Print the number of command-line arguments passed in */
#include <stdio.h>

int
main(int argc, char *argv[])
{
    int numArgs = argc - 1;

    printf("You passed in %d argument", numArgs);

    if (1 != numArgs)
    {
        printf("s");
    }

    printf("\n");

    return 0;
}

It's worth noting that the first example relies on having to change code in two places while the 2nd example only requires a change in one. This can cause problems during maintence since you rely on the programmer remembering to update every instance of the error.

Writing Secure Code

Some portions of the standard C library will never be secure by their nature, but I want to do as much as I can to create an implementation which is as secure as possible within the confines of the standard.

A great resource for writing secure C code is the Cert C Coding Standard which is a free resource that is continuously updated. If you prefer a hardcopy, they took a snapshot of their standard and printed it; it's about $55 on Amazon. I haven't read the entire standard, but I'll try to abide by its rules as much as possible. Also, the bibliography section has a plethora of other fantastic C resources.

Testing will also be a large component of writing secure code. No matter how skilled of a programmer you are, I don't think it would ever be wise to deploy your code without testing it; no matter how sure you are that it's correct. I still have some research to do on testing frameworks for C, but this is something I plan on doing from the beginning. I don't think this will be test driven development, but tests will certainly be included.

There is also the idea of defensive programming which drives the way you write code with the goal of writing code which is less likely to contain bugs. This doesn't mean you write code without bugs, but that the bugs you do implement will reveal themselves sooner rather than later. A good example of this is below:

/* Print the first argument to the screen */
#include <stdio.h>

int
main(int argc, char *argv[])
{
    if (argc = 2)
    {
        printf("Please provide one argument to be printed.\n");
    }
    else
    {
        printf("%s", argv[1]);
    }

    return 0;
}

First, do you see the bug and can you determine what behavior it causes?

Compiling this with gcc -o defensive defensive.c will not produce any warnings or errors. But no matter what input you give, it will always print "Please provide one argument to be printed." This is because of the comparison if (argc = 2). This actually assigns the value 2 to the variable argc and that expression evaluates to 2 which is true in C. I meant to write if (argc != 2). So, we can rewrite this by placing the constant on the left side of the expression:

/* Print the first argument to the screen */
#include <stdio.h>

int
main(int argc, char *argv[])
{
    if (2 = argc)
    {
        printf("Please provide one argument to be printed.\n");
    }
    else
    {
        printf("%s", argv[1]);
    }

    return 0;
}

Now when we compile with gcc -o defensive defensive.c we get the error:

defensive.c: In function ‘main’:
defensive.c:5:11: error: lvalue required as left operand of assignment
     if (2 = argc)
           ^

So, simply by changing the way we write comparisons we can catch bugs like this early since the code won't even compile! See if you can find the other bug in this code snippet (hint).

Another way to program securely is to take advantage of the features of your compiler. For example, compiling the original example with gcc -Wall -o defensive defensive.c would give the following warning:

defensive.c: In function ‘main’:
defensive.c:5:5: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
     if (argc = 2)
     ^

That is better than nothing, but the code still compiles. For this reason, I prefer writing the code defensively because it's not always guaranteed that someone else will be compiling your code with the same settings. Yes, there are makefiles or build settings for a project, but copy/pasting code is very common and doesn't capture your specific build flags.

Documentation

Again, documentation is very important and for that reason I want to provide a standard C library which is well documented. I will be using Doxygen to generate documentation from the source code which dictates that my code be commented in specific ways so that the Doxygen parser can read it. This will force me to comment every file and function appropriately and forces me towards meeting this goal.

In addition to documenting the source code, my plan is to document my progress as I go with blog posts that discuss how I arrived at the implementation of each function. I hope that other programmers will find this documentation useful for answering questions like "How does this work?", "Why did you do it this way?", or "What is the best way to _____?". This will also help me have a log to look back over to see why I chose specific implementations and what worked or didn't work the way I wanted it to.

Project Design Decisions

March 03, 2015

tags: design, license, standards, style

There are a few decisions that you might want to make at the beginning of your project so that you have them answered before you get too deep in to things. We'll go over choosing a license, establishing a directory structure, choosing a revision control system, and a few other things.

Software License

If you choose to place your code online where other people can see it then you'll want to choose a license to release your project under. Choosing a license ensures that you have appropriate protection both in how people use your code and what you can be blamed for. I fantastic resource for seeing what types of licenses are out there and what they mean is choosealicense.com. Jeff Atwood wrote a pretty good article on this as well; it's worth a read. Once you've chosen your license, generally you'll need to include a copy of it at the base of your directory structure and possibly at the top of each file.

For welibc I'm choosing the BSD 3-Clause License for its simplicity, the requirement of attribution, and preventing the ability to be held liable.

Directory Structure

Establishing your directory structure at the beginning will help guide your development and keep the layout clean and organized. The idea is to separate distinct pieces and parts of your project in to their own directories in a way that makes sense to you and your team. As your project grows and changes you may find that your directory structure is really working out or that it seems to complicate things. Don't be afraid to alter it as you go, but don't compromise the structure for one feature if it complicates a lot of other pieces.

I don't expect welibc to be terribly big, but at this point I see three distinct groups of files: source code, public headers, and test code. Both the source and test code will need access to the headers, so I think it's logical to group them in a separate folder. Here is a sample layout of what I have in mind:

welibc/
    ├── license.md
    ├── readme.md
    ├── include
    │   ├── assert.h
    │   ├── limits.h
    │   ├── stdio.h
    │   └── stdlib.h
    ├── src
    │   ├── assert.c
    │   ├── stdio.c
    │   └── stdlib.c
    └── test
        ├── assert_test.c
        ├── main.c
        ├── stdio_test.c
        └── stdlib_test.c

Revision Control System

A revision control system (RCS) assists in managing your code as you add features, remove bugs, and implement new functionality. It is an absolutely essential piece of your project. Without an RCS it is likely that you will lose working code, leave commented out code laying around, and make the same mistakes over again.

An RCS provides two incredibly important features: snapshotting your code and attaching messages to your code. If you're unfamiliar, an RCS works similarly to a library in that you check out your code, edit it, then check it back in (or "commit" it) with the changes along with a message of what you did. This also means that it is entirely up to you and your team to make the best use of these features. You could use an RCS but only have one commit with your finalized piece of software; that's not useful. You could put song lyrics in every commit message you make; that's not useful. These features exist for a reason and you should use them. Seth Robertson wrote a great post that discusses this in more depth.

Alright, I'm glad you chose to use a revision control system but which one should you use? There are a few out there, but there are really about four in major use today:

I don't know that any of them are necessarily better than others, but right now there are two pretty large providers of free source code hosting that both use git: bitbucket and github. They make it easy to share your code as well as browse it through their web interfaces. It's also nice that they are in charge of managing the infrastructure and server side of things; all you have to worry about is writing your code.

For this project I've chosen to host the code at bitbucket using git. I'm not as proficient with git as I would like to be so this will help me become more familiar with it.

Coding Style

The style of your code is something that is often overlooked but is something that can really enhance your project. If all of your code is written the same way with braces in the same spots, the same variable name scheme, and spacing consistency then it's much easier to read the code and spot errors when things get out of place.

What looks better, this?

int greatestcommonDivisor(int a  , int b){
    int c;

    while(a!=0)
{
c = a; a = b % a;
b = c;
}

    return b;}

Or this?

int
greatestCommonDivisor(int a, int b)
{
    int c;

    while (0 != a)
    {
        c = a;
        a = b % a;
        b = c;
    }

    return b;
}

That's an extreme example, but just keep in mind that it'll probably be you coming back later on to fix a bug. Do you really want to make things harder on yourself later?

The style you choose is entirely up to you but I recommend always using the same style throughout all of your code; don't change it between files or folders. There are a few example style guides you could adopt as well:

To get a better idea of what all a style guide encompasses, read the Linux kernel coding style page. It's a pretty fast read and Linus even includes some justifications for why certain styles were chosen.

For welibc I have my own style that I've developed over the past few years which I'll be using. It's not exactly documented somewhere, but maybe I'll do that as part of this project. I'm pretty particular about how my code looks.

Build Automation Software

Unless you're working with just a handful of files then you'll want to pick a build system to handle everything for you. A build system allows you to issue a simple command like "make" or to just click a button and have your entire project built according to the specific settings that you need. A few examples of build systems would be:

Since I'll be doing this work on Linux and I'm already pretty familiar with it, I'll be using GNU make for this project.

Documentation

Documentation is usually the last thing done and is sometimes left out of a project entirely. Including documentation may seem worthless but it can really pay off when you have to find an obscure bug, when you bring in a new developer, or when an experienced developer leaves the team. If you have appropriate documentation then these things go more smoothly.

You should first start with commenting your code appropriately. This can include file and function comments along with documenting particularly tricky pieces of code when you can't figure out how to simplify it any other way. Once you have a consistent method of documenting things then you could look at documentation generators. These are separate programs that parse your source code and generate documentation which generally looks better and is easier to read than your source code. A few documentation generators to consider are:

My plan is to use Doxygen since that's really the only one I've used before. If it isn't customizeable enough to fit into the style of the rest of the website then I might switch to something else.

Conclusion

This may seem like a lot to decide on and setup before you actually get down to writing any code but a little planning can go a long way in keeping your project organized and easy to understand. Having some of these things in place can really make the difference between looking like a hobby and looking like a professionally produced piece of software.

C Standards

March 01, 2015

tags: history, standards

To implement the standard C library you need a C standard which tells you what the library contains and how it is intended to work. So, which standard do you choose and where do you get it?

History

There is a bit of history behind how the most current revision of the C standard came to be, so we'll go over it to give some background and context for why they exist and where they came from. The beginnings are somewhat scattered across the Internet, but a lot of this information was obtained from Dennis Ritchie's homepage and the documents mentioned there.

Informal Standards - 1975 - 1989

Originally, there were no C standards, just "The C Reference Manual", a document written by Dennis Ritchie and used within AT&T Bell Laboratories. This was not necessarily written with the purpose of being a standard, but rather documentation of the syntax and use of C. The most recent version of this manual was published in 1975 and included with 6th Edition Unix. It mentions a useful introduction to the C language, "Programming in C - A Tutorial", by Brian Kernighan.

The next iteration of an informal standard came in 1978 with the first version of "The C Programming Language", by Brian Kernighan and Dennis Ritchie. This book replaced "The C Reference Manual", but reproduced most, if not all, of it in the book's Appendix A. This book has become very well known and is usually referred to as just "K&R" and served as the informal standard until 1989.

ANSI C - 1983 - 1990

As C grew in popularity it began to quickly evolve, but the informal standards were not keeping up; "The C Programming Language" no longer reflected the most current version of the language. During the summer of 1983 the American National Standards Institude (ANSI) created the X3J11 committee with the goal "to develop a clear, consistent, and unambiguous Standard for the C programming language which codifies the common, existing definition of C and which promotes the portability of user programs across C language environments." On December 14, 1989 the committee ratified the first official C standard, ANSI X3.159-1989. It is more commonly referred to as "ANSI C" or just "C89".

As far as I know, the closest you can get to a copy of the original C89 standard is a text version of a draft. There is also a PDF version of the rationale which helps explain the reasoning and decision making that goes into developing the standard.

International Standardization - 1990 - 1999

In 1990, the International Standards Organization (ISO) and the International Electrotechnical Commission (IEC) created the WG14 working group which is in charge of the international standardization for the programming language C. C was still growing in popularity and ANSI C was really only a standard approved by an American organization - it had no internationally recognized standard. The WG14 working group used the ANSI X3.159-1989 standard to create its own version, ISO/IEC 9899:1990, which was ratified on December 31, 1990, and is more commonly known as "C90". The only difference between C89 and C90 is the formatting, otherwise they are equivalent. You can get a copy of C90 from SAI Global, which only has a hard copy available.

In addition to creating the international standard, ISO/IEC also provides updates to its standards through documents called technical corrigenda which are meant to correct technical errors or ambiguity and updated outdated information. In addition to technical corrigenda, ISO/IEC releases amendments to their standards in order to alter and/or add to previously agreed technical provisions in an existing standard. In 1994, ISO/IEC published Technical Corrigendum 1 for C90 as the first set of corrections/updates for the standard. In 1995, ISO/IEC published "Amendment 1: C integrity" to C90 which was such a significant update to the language that it is known as C95, it can be purchased from SAI Global. The final update to C90 was Technical Corrigendum 2, which was released in 1996. The dates for the technical corrigenda and amendment seem to overlap, but as best I can tell TC1 was published before AMD1.

Updating the International Standard - 1999 - Present

The next major update to the C standard came in 1999 with ISO/IEC 9899:1990, also known as C99. I couldn't find a draft version of just C99 without any of the technical corrigenda included, but a copy is available from SAI Global.

C99 has had three updates: Technical Corrigendum 1 released in 2001, Technical Corrigendum 2 released in 2004, and finally Technical Corrigendum 3 released in 2007. There is a WG14 working paper which includes C99 along with all of its technical corrigenda.

Finally, the most recent C standard is ISO/IEC 9899:2011, also known as C11. A draft version, ISO/IEC 9899:201X, is available for free and according to WG14 "it reflects what was to become the standard at the time of issue." This version of the standard had a technical corrigendum released in 2012. Again, I couldn't find a free draft version of this technical corrigendum, but it can be purchased from SAI Global.

Choosing a standard

It is generally best to use the most up-to-date version of the C standard. However, not all compilers fully support every version, and there may be portability concerns for your specific project which dictate using a specific standard. As well, consider what other code your project will integrate with and which standard that code was written for.

For this project, I will be implementing the standard C library as defined in ISO/IEC 9899:1990. The updated standards usually add functionality on top of the base library as defined C90, so I think the most basic set of functions is a good place to begin.

First Post

February 20, 2015

tags: markdown, meta

Subtitles

This is my first post with markdown, just seeing how things work.

A
List
I think

Need some text here

#include <stdio.h>
int
main(void)
{
    printf("Hello, world!");
    return 0;
}

How about some math? $e=mc^2$

And there we go