Note: this implementation of string.h is missing the locale functions - when they're added this post will be updated.

Call(er) Graphs

Many of the string functions call other string functions to do their work and from a library perspective this is nice. You write the logic for string length calculation once and just as users would call into your library to calculate a string's length, the library can do the same. You may wonder what the various calls chains looks like within the library itself - who calls what?

Conveniently, doxygen can generate this for us as long as the appropriate settings are enabled. Doxygen not only generates call graphs (what functions does I call), but also caller graphs (what functions call me). We'll explore these graphs using the metaphor of a house.

Foundation Functions

At the lowest level, certain functions form the foundation of string.h - they're what everything else is built on top of. These will provide the most interesting caller graphs and should have an empty call graph.

memcmp

If you abstract the concept of "search for a character" to "search for a byte" then it makes sense that memcmp is a foundational function.

memmove

Another service provided by the string header is copying data from one location in memory to another. At the root of this is the memmove function. memcpy could be here as well but in this library it's a wrapper for memmove.

memset

One of the more important functions that's often used for initializing memory, memset offers the ability to copy a value to each byte of a region of memory.

strchr

Locating a specific character within a string is key to many of the string functions. For the more complicated functions this lets us skip ahead to the next meaningful starting point for a search or comparison.

strlen

This is the absolute workhorse of string.h and as such it has the largest caller graph. Most string functions focus on searching the string in some fashion, so knowing where the strings ends is important to knowing when the search can stop if no match is found earlier.

What a graph!

Structure Functions

Next would be functions that both call and are called by other functions in the library. As such, these have two graphs to enjoy.

strcpy

strcspn

strncmp

strspn

Roof Functions

These functions are at the highest level and only call other string functions so they won't have a caller graph within the context of string.h.

memcpy

As mentioned before, memcpy is only here because it's implemented as a wrapper around memcpy - other libraries may implement this as a foundation function.

strcat

strcmp

strncat

strncpy

strpbrk

strrchr

strstr

strtok

Shed Functions

There are only two functions which neither call nor are called by other string functions. Since these are close by but not tied to other functions, "shed" seems appropriate.

Performance Considerations

These graphs aren't just pretty, they also provide a means for guiding how performance could be improved in a library. If performance is a concern then improving the most used functions will be the fastest way to better performance. Using our house analogy from above, start at the foundation and work your way up and out to the shed. By improving the foundation you are improving everything that's built on top of it - just like the stability of a real house.

Error Checking

Since most functions in this library perform error checking before doing work, it's entirely possible that the same error condition is checked multiple times. This is likely the case with pointer validation as the pointer is passed from roof to structure to foundation. With a call chain like strtok to strcspn to strlen, the s1 argument to strtok will be checked three times!

This knowledge can lead to different design decisions based on the goal of the project. If this checking is excessive then the library can provide "internal use only" versions of the functions which don't perform error checking on the arguments. This may not guarantee one check per argument in all cases, but it can reduce the number of checks to only what's necessary.

Implementation Considerations

Similar to the value that these graphs provide to improving performance, they also provide a roadmap for implementing the library in a progressive order. Follow the same strategy of starting at the foundation and working up and out if you want a library which calls into itself. If not then you can start anywhere you like since each function would be completely self-contained.

Administrivia

A short note on the format of the posts. I'll probably rework the sequence of: local variables, error checking, implementation, tests. This isn't how I think about solving problems with C. While it may be how you would explain a written piece of code to someone else (as in going from top to bottom through a function), it doesn't communicate what I want.

Going forward the posts will discuss the main algorithm first and then cover variables needed to make that happen. Once the main functionality is explained then it's more appropriate to discuss error conditions and then tests logically follow that. We'll see if that sequence works better.

comments powered by Disqus