Although C doesn't have true string objects, it does have arrays of characters and string literals (static character arrays), which we'll call "C strings." Because there is no string object, nothing tracks the length of a C string, so C needs a way to mark where a string ends so its length can be determined programmatically. This is done by placing a NUL terminator, '\0', after the last character.
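
To make the terminator concrete, the sketch below prints every byte of an array initialized from the literal "hi". The names greeting and dump_bytes are illustrative only. The array is three bytes long because the compiler appends '\0' to the two visible characters.

```c
#include <stddef.h>
#include <stdio.h>

/* The literal "hi" is stored as 'h', 'i', and the NUL terminator '\0'. */
static const char greeting[] = "hi";

/* Print each byte of a character array, terminator included, as an integer. */
static void dump_bytes(const char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        printf("[%zu] = %d\n", i, p[i]);
    }
}
```

Calling dump_bytes(greeting, sizeof greeting) on an ASCII system prints 104, 105, and finally 0 for the terminator.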

Determining the length of a C string is such a common task that the standard library provides a function for it, strlen, declared in <string.h>. Its prototype is as follows:

size_t
strlen(const char *s);

It takes a pointer to the first character in a C string and returns the number of characters in the string, not counting the NUL terminator.
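
Because strlen stops at the first zero-valued byte, its result can differ from the size of the array that holds the string. A short sketch, assuming a hosted implementation with <string.h>; the array and function names are illustrative:

```c
#include <stdio.h>
#include <string.h>

/* Room for 16 chars, but only "abc" and its terminator are meaningful. */
static char buffer[16] = "abc";

/* An embedded '\0' ends the C string early: strlen sees only "ab". */
static const char embedded[] = "ab\0cd";

/* Contrast the array size (sizeof) with the string length (strlen). */
static void show_lengths(void)
{
    printf("buffer:   sizeof = %zu, strlen = %zu\n", sizeof buffer, strlen(buffer));
    printf("embedded: sizeof = %zu, strlen = %zu\n", sizeof embedded, strlen(embedded));
}
```

Here sizeof reports 16 and 6 (five written characters plus the implicit terminator), while strlen reports 3 and 2.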

Common Approaches

A simple implementation of strlen looks like this:

size_t
strlen(const char *s)
{
    size_t len = 0;

    for (; *s++; len++)
    {
        /* Do nothing */
    }

    return len;
}

Or similarly using a while loop:

size_t
strlen(const char *s)
{
    size_t len = 0;

    while (*s++)
    {
        len++;
    }

    return len;
}

These take advantage of the fact that the NUL terminator has the integer value 0. Each loop condition is really testing whether the character currently pointed to is nonzero, in other words, whether it is anything other than the NUL terminator. The while loop can be shortened further:

size_t
strlen(const char *s)
{
    size_t len = 0;

    /* All of the work happens in the condition; the loop body is empty */
    while (*s++ && ++len);

    return len;
}

This version goes a step further by relying on the short-circuit evaluation of C's && operator: if the first operand is false (zero), the second operand is not evaluated. As a result, len is incremented only when the current character is not the NUL terminator. Because ++len yields a nonzero value, the second operand never ends the loop on its own; only the terminator does.
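
The short-circuit behavior can be observed directly by swapping ++len for a function that records each time it runs. This is only a sketch; evaluations, note_evaluation, and short_circuit_length are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical counter: how many times the right-hand operand was evaluated. */
static size_t evaluations = 0;

/* Stands in for ++len: always "true", but leaves a trace when evaluated. */
static int note_evaluation(void)
{
    evaluations++;
    return 1;
}

/* Same shape as while (*s++ && ++len): the right operand runs only for non-NUL chars. */
static size_t short_circuit_length(const char *s)
{
    size_t len = 0;

    while (*s++ && note_evaluation())
    {
        len++;
    }

    return len;
}
```

For "abc" the helper is called three times, not four: when the terminator is read, && stops before evaluating its right operand.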

Although these implementations are simple and straightforward, glibc, bsdlibc, and musl all use more complicated but more efficient versions built on a technique similar to the one I discussed in the memcpy.h article. They advance one byte at a time until the pointer reaches a word boundary, then test an entire word at a time for any zero-valued byte.
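
The heart of the word-at-a-time technique is a bit trick that reports whether any byte in a word is zero. The sketch below shows only that test, not a complete strlen, and the libraries named above use their own macros and loop structure. It assumes 64-bit words via uint64_t and loads bytes with memcpy to sidestep alignment and aliasing concerns; has_zero_byte, load_word, ONES, and HIGHS are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

#define ONES  UINT64_C(0x0101010101010101)  /* 0x01 in every byte */
#define HIGHS UINT64_C(0x8080808080808080)  /* 0x80 in every byte */

/*
 * Returns nonzero if and only if at least one byte of w is zero.
 * (w - ONES) sets the high bit of a zero byte (0x00 - 0x01 borrows to 0xFF);
 * & ~w clears the high bit for bytes that were already >= 0x80.
 * A borrow can also flag a byte above a real zero byte, so the mask does not
 * identify which byte is zero, but it is never nonzero without one.
 */
static int has_zero_byte(uint64_t w)
{
    return ((w - ONES) & ~w & HIGHS) != 0;
}

/* Hypothetical helper: load 8 bytes from a buffer known to hold at least 8. */
static uint64_t load_word(const char bytes[8])
{
    uint64_t w;
    memcpy(&w, bytes, sizeof w);
    return w;
}
```

A full implementation would call a test like this on each aligned word and then locate the exact zero byte inside the word that triggered it. Reading whole aligned words can touch bytes past the terminator; an aligned word never straddles a page, so this is safe on the machines these libraries target, but ISO C itself does not sanction it.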

The implementations above do not account for NULL pointers, or for a string that is not NUL terminated and may run off the end of memory and wrap around. We'll address both below.

Local Variables

For our implementation, we'll use only one local variable: a pointer to the end of the string, once we find it. Its initial value is the start of the string, since the string may be zero characters long.

const char *end = s;

Parameter Validation

The only parameter to this function is the character pointer s, which we need to verify is not NULL. Since end starts out equal to s, this check becomes one condition of the loop in the main body of the function.

while (end)
{
}

Implementation

A loop is an obvious choice for iterating over a C string, checking on each pass whether the current character is the NUL terminator. To fold in our parameter validation and our guard against memory wrapping, the first condition checks that the end pointer, which was initialized with the value of s, is not NULL. We increment end in the loop body rather than in the condition (as the earlier versions did with *s++); otherwise end would be advanced once more than intended when the loop reads the NUL terminator.

while (end &&
       ('\0' != *end))
{
    end++;
}
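
The effect of where the increment happens shows up in where the pointer stops. Both functions below are illustrative sketches with hypothetical names, run on the same input:

```c
#include <stddef.h>

/* Increment in the condition: end is advanced even when '\0' is read. */
static const char *stop_condition_increment(const char *s)
{
    const char *end = s;

    while (*end++ != '\0')
    {
        /* Do nothing */
    }

    return end;  /* one past the NUL terminator */
}

/* Increment in the body: the loop exits with end on the NUL terminator. */
static const char *stop_body_increment(const char *s)
{
    const char *end = s;

    while (*end != '\0')
    {
        end++;
    }

    return end;  /* points at the NUL terminator */
}
```

The earlier counting versions could use *s++ in the condition because the result lived in len, not in the pointer. Here the pointer itself produces the length, so the extra step would make it one too large.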

The loop terminates for one of two reasons: either end is NULL, or we reached the end of the string. A NULL end represents an error condition, which we handle by resetting end to s.

Similarly, if a string without a NUL terminator is passed in and runs to the end of memory, end may be incremented after it already points to the last addressable byte. This is undefined behavior, which we can attempt to detect by checking whether the resulting pointer is less than the original one. In that case we also reset end, so the length is reported as zero, since an error would likely occur otherwise. Because the behavior is undefined, an optimizing compiler is free to assume the wrap never happens and remove this check, so it is a best-effort safeguard at most.

Lastly, we subtract s from the resulting end pointer to get the length of the string.

while (end &&
       ('\0' != *end))
{
    end++;
}

if (!end ||
    (end < s))
{
    end = s;
}

return (size_t) (end - s);
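
The final line relies on pointer subtraction: subtracting two pointers into the same array yields the number of elements between them as a signed ptrdiff_t (from <stddef.h>). As long as end does not precede s, converting that value to size_t is safe. A tiny illustration with a hypothetical helper:

```c
#include <stddef.h>

/* Hypothetical helper: element count between two pointers into the same array. */
static size_t span_length(const char *start, const char *stop)
{
    ptrdiff_t diff = stop - start;  /* signed distance, in chars */
    return (size_t) diff;           /* valid only when stop >= start */
}
```

Strictly speaking, ISO C defines this subtraction only for pointers into the same array object, so the NULL case (where end and s are both NULL) leans on common implementation behavior to produce 0.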

Testing

We can easily test our input validation by passing NULL to strlen. We can also easily test the length of an empty string, which should be 0, and of a string containing a single character. For a more "dynamic" test we'll use a character array initialized with a string literal. The array is automatically given enough room on the stack for every character in the literal plus one more for the NUL terminator, so the length of that string should be the size of the array minus 1. We can't test strings that wrap around memory, because doing so intentionally would invoke undefined behavior, whose results are... undefined.

int
strlenTest(void)
{
    int ret = -1;
    const char *str1 = NULL;
    char str2[] = "hello, world";
    const char *str3 = "";
    const char *str4 = "a";

    do
    {
        /* NULL pointer */
        if (0 != strlen(str1))
        {
            break;
        }

        /* String length should match array size - 1 */
        if ((sizeof(str2) - 1) != strlen(str2))
        {
            break;
        }

        /* Empty string */
        if (0 != strlen(str3))
        {
            break;
        }

        /* Single character string */
        if (1 != strlen(str4))
        {
            break;
        }

        ret = 0;
    } while (0);

    return ret;
}
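
The test above calls a function named strlen, so it only exercises the article's version if that is the one being linked. Defining your own strlen collides with a name reserved by the standard library, GCC and Clang treat strlen as a builtin, and glibc's <string.h> declares its argument nonnull, so the NULL check may not reach the code you expect. One way to run the same checks in a standalone program, sketched under those assumptions, is to rename the function; checked_strlen and checked_strlen_test are hypothetical names.

```c
#include <stddef.h>

/*
 * The article's final strlen, renamed to checked_strlen (a hypothetical name)
 * so it cannot collide with, or be replaced by, the C library's strlen.
 */
static size_t checked_strlen(const char *s)
{
    const char *end = s;

    /* Stop on NULL (including a pointer that wrapped to NULL) or on '\0' */
    while (end && ('\0' != *end))
    {
        end++;
    }

    /* Treat a NULL or wrapped pointer as a zero-length string */
    if (!end || (end < s))
    {
        end = s;
    }

    return (size_t) (end - s);
}

/* Same checks as strlenTest, calling checked_strlen; returns 0 on success. */
static int checked_strlen_test(void)
{
    const char *str1 = NULL;
    char str2[] = "hello, world";
    const char *str3 = "";
    const char *str4 = "a";

    if (0 != checked_strlen(str1))
    {
        return -1;
    }
    if ((sizeof(str2) - 1) != checked_strlen(str2))
    {
        return -1;
    }
    if (0 != checked_strlen(str3))
    {
        return -1;
    }
    if (1 != checked_strlen(str4))
    {
        return -1;
    }

    return 0;
}
```

A main function can then simply return checked_strlen_test() and report failure through its exit status.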

Conclusion

Although simple, this implementation of strlen gives us error checking for NULL pointers and a best-effort guard against pointers that wrap around memory, a guard that only works if that undefined behavior plays out in a particular way.

size_t
strlen(const char *s)
{
    const char *end = s;

    /* Check end on every pass: it catches a NULL s and a pointer that wrapped to NULL */
    while (end &&
           ('\0' != *end))
    {
        end++;
    }

    /* Verify the loop didn't stop on an error and end didn't wrap around memory */
    if (!end ||
        (end < s))
    {
        end = s;
    }

    return (size_t) (end - s);
}