string.h, a retrospective
Note: this implementation of string.h is missing the locale functions - when they're added this post will be updated.
Many of the string functions call other string functions to do their work and from a library perspective this is nice. You write the logic for string length calculation once and just as users would call into your library to calculate a string's length, the library can do the same. You may wonder what the various calls chains looks like within the library itself - who calls what?
Conveniently, doxygen can generate this for us as long as the appropriate settings are enabled. Doxygen not only generates call graphs (what functions does I call), but also caller graphs (what functions call me). We'll explore these graphs using the metaphor of a house.
At the lowest level, certain functions form the foundation of string.h - they're what everything else is built on top of. These will provide the most interesting caller graphs and should have an empty call graph.
If you abstract the concept of "search for a character" to "search for a byte"
then it makes sense that
memcmp is a foundational function.
Another service provided by the string header is copying data from one location
in memory to another. At the root of this is the
could be here as well but in this library it's a wrapper for
One of the more important functions that's often used for initializing memory,
memset offers the ability to copy a value to each byte of a region of memory.
Locating a specific character within a string is key to many of the string functions. For the more complicated functions this lets us skip ahead to the next meaningful starting point for a search or comparison.
This is the absolute workhorse of string.h and as such it has the largest caller graph. Most string functions focus on searching the string in some fashion, so knowing where the strings ends is important to knowing when the search can stop if no match is found earlier.
What a graph!
Next would be functions that both call and are called by other functions in the library. As such, these have two graphs to enjoy.
These functions are at the highest level and only call other string functions so they won't have a caller graph within the context of string.h.
As mentioned before,
memcpy is only here because it's implemented as a wrapper
memcpy - other libraries may implement this as a foundation function.
There are only two functions which neither call nor are called by other string functions. Since these are close by but not tied to other functions, "shed" seems appropriate.
These graphs aren't just pretty, they also provide a means for guiding how performance could be improved in a library. If performance is a concern then improving the most used functions will be the fastest way to better performance. Using our house analogy from above, start at the foundation and work your way up and out to the shed. By improving the foundation you are improving everything that's built on top of it - just like the stability of a real house.
Since most functions in this library perform error checking before doing work,
it's entirely possible that the same error condition is checked multiple times.
This is likely the case with pointer validation as the pointer is passed from
roof to structure to foundation. With a call chain like
s1 argument to
strtok will be checked three times!
This knowledge can lead to different design decisions based on the goal of the project. If this checking is excessive then the library can provide "internal use only" versions of the functions which don't perform error checking on the arguments. This may not guarantee one check per argument in all cases, but it can reduce the number of checks to only what's necessary.
Similar to the value that these graphs provide to improving performance, they also provide a roadmap for implementing the library in a progressive order. Follow the same strategy of starting at the foundation and working up and out if you want a library which calls into itself. If not then you can start anywhere you like since each function would be completely self-contained.
A short note on the format of the posts. I'll probably rework the sequence of: local variables, error checking, implementation, tests. This isn't how I think about solving problems with C. While it may be how you would explain a written piece of code to someone else (as in going from top to bottom through a function), it doesn't communicate what I want.
Going forward the posts will discuss the main algorithm first and then cover variables needed to make that happen. Once the main functionality is explained then it's more appropriate to discuss error conditions and then tests logically follow that. We'll see if that sequence works better.comments powered by Disqus