Blog
string.h, a retrospective
September 03, 2021
tags: string.h
Note: this implementation of string.h is missing the locale functions - when they're added this post will be updated.
Call(er) Graphs
Many of the string functions call other string functions to do their work, and from a library perspective this is nice. You write the logic for string length calculation once, and just as users call into your library to calculate a string's length, the library can do the same. You may wonder what the various call chains look like within the library itself - who calls what?
Conveniently, doxygen can generate this for us as long as the appropriate settings are enabled. Doxygen not only generates call graphs (what functions do I call), but also caller graphs (what functions call me). We'll explore these graphs using the metaphor of a house.
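For reference, the relevant Doxyfile options (these are Doxygen's documented option names; the dot tool from Graphviz must be installed for any of them to take effect) look like this:

```
# Enable Graphviz-based diagrams, then both graph directions
HAVE_DOT      = YES
CALL_GRAPH    = YES
CALLER_GRAPH  = YES
```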
Foundation Functions
At the lowest level, certain functions form the foundation of string.h - they're what everything else is built on top of. These will provide the most interesting caller graphs and should have an empty call graph.
memcmp
If you abstract the concept of "compare a character" to "compare a byte"
then it makes sense that memcmp
is a foundational function.
memmove
Another service provided by the string header is copying data from one location
in memory to another. At the root of this is the memmove
function. memcpy
could be here as well but in this library it's a wrapper for memmove
.
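As a sketch of what such a wrapper amounts to (the name my_memcpy is hypothetical, used only to avoid clashing with the real libc symbol):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: memcpy implemented as a thin wrapper around
 * memmove, which is how this library structures the two functions. */
void *my_memcpy(void *s1, const void *s2, size_t n)
{
    return memmove(s1, s2, n);
}
```

The trade-off is that the wrapper inherits memmove's overlap handling, which memcpy is allowed to skip - which is why other libraries may make memcpy a foundation function in its own right.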
memset
One of the more important functions that's often used for initializing memory,
memset
offers the ability to copy a value to each byte of a region of memory.
strchr
Locating a specific character within a string is key to many of the string functions. For the more complicated functions this lets us skip ahead to the next meaningful starting point for a search or comparison.
strlen
This is the absolute workhorse of string.h and as such it has the largest caller graph. Most string functions focus on searching the string in some fashion, so knowing where the string ends is important for knowing when the search can stop if no match is found earlier.
What a graph!
Structure Functions
Next would be functions that both call and are called by other functions in the library. As such, these have two graphs to enjoy.
strcpy
strcspn
strncmp
strspn
Roof Functions
These functions are at the highest level and only call other string functions so they won't have a caller graph within the context of string.h.
memcpy
As mentioned before, memcpy
is only here because it's implemented as a wrapper
around memmove
- other libraries may implement this as a foundation function.
strcat
strcmp
strncat
strncpy
strpbrk
strrchr
strstr
strtok
Shed Functions
There are only two functions which neither call nor are called by other string functions. Since these are close by but not tied to other functions, "shed" seems appropriate.
Performance Considerations
These graphs aren't just pretty, they also provide a means for guiding how performance could be improved in a library. If performance is a concern then improving the most used functions will be the fastest way to better performance. Using our house analogy from above, start at the foundation and work your way up and out to the shed. By improving the foundation you are improving everything that's built on top of it - just like the stability of a real house.
Error Checking
Since most functions in this library perform error checking before doing work,
it's entirely possible that the same error condition is checked multiple times.
This is likely the case with pointer validation as the pointer is passed from
roof to structure to foundation. With a call chain like strtok
to strcspn
to
strlen
, the s1
argument to strtok
will be checked three times!
This knowledge can lead to different design decisions based on the goal of the project. If this checking is excessive then the library can provide "internal use only" versions of the functions which don't perform error checking on the arguments. This may not guarantee one check per argument in all cases, but it can reduce the number of checks to only what's necessary.
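A minimal sketch of that pattern (the names here are hypothetical, not part of this library): a static internal worker skips validation, and the public entry point checks once before delegating.

```c
#include <stddef.h>

/* Internal worker: assumes the pointer has already been validated. */
static size_t strlen_unchecked(const char *s)
{
    const char *cur = s;
    while (*cur)
    {
        cur++;
    }
    return (size_t)(cur - s);
}

/* Public entry point: validates once, then trusts the worker. */
size_t my_strlen(const char *s)
{
    if (!s)
    {
        return 0;
    }
    return strlen_unchecked(s);
}
```

Internal callers would invoke strlen_unchecked directly after validating their own arguments, so a pointer passed down a roof-to-foundation call chain is checked only once.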
Implementation Considerations
Similar to the value that these graphs provide to improving performance, they also provide a roadmap for implementing the library in a progressive order. Follow the same strategy of starting at the foundation and working up and out if you want a library which calls into itself. If not then you can start anywhere you like since each function would be completely self-contained.
Administrivia
A short note on the format of the posts. I'll probably rework the sequence of: local variables, error checking, implementation, tests. This isn't how I think about solving problems with C. While it may be how you would explain a written piece of code to someone else (as in going from top to bottom through a function), it doesn't communicate what I want.
Going forward the posts will discuss the main algorithm first and then cover variables needed to make that happen. Once the main functionality is explained then it's more appropriate to discuss error conditions and then tests logically follow that. We'll see if that sequence works better.
strerror
July 25, 2021
tags: errno.h, strerror, string.h
The strerror
function maps an error number into a string description of that
error.
char *
strerror(int errnum);
It returns a pointer to the string representation of the error, the contents of which are implementation-defined.
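A quick usage sketch; since the wording is implementation-defined, callers can only count on getting some description back, not on its exact contents:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Print whatever description this implementation maps errnum to. */
static void report(int errnum)
{
    printf("error %d: %s\n", errnum, strerror(errnum));
}
```

For example, report(EDOM) prints this implementation's description of a domain error.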
Local Variables
We'll use a local variable to point to the string description corresponding to the received error number. This lets us modify which string is pointed to and have a single return statement (as we'll see shortly).
char *str = NULL;
Error Checking
No error checking is necessary for this function, or more accurately, the error checking will be handled by a switch statement that maps the error number to a string description.
Implementation
The total number of error numbers supported by the implementation is only two at
this point (EDOM
and ERANGE
from errno.h
), however it is expected to
grow in the future. This presents a good use case for a switch
which makes for
a cleaner implementation over the use of a large chain of if
/else if
/else
statements. As more error numbers are defined in the future, new cases will be
added to translate them.
If the error number is unknown we'll still translate this into a message indicating as such. This way our implementation always returns a description. After the switch, return the pointer to the string that was chosen.
switch (errnum)
{
case EDOM:
str = "EDOM - An input argument is outside the domain over which the mathematical function is defined";
break;
case ERANGE:
str = "ERANGE - The result of the function cannot be represented as a double value";
break;
default:
str = "E??? - Unknown error number";
break;
}
return str;
It's worth mentioning a short side note on style here. When possible, welibc chooses to limit the width of code to 80 characters which aids in readability. The C language supports splitting these long descriptions across multiple lines which might look like the following:
switch (errnum)
{
case EDOM:
str = "EDOM - An input argument is outside the domain over which the " \
"mathematical function is defined";
break;
/* ... */
Although this maintains the 80-character width, it presents a problem when
searching through the source code with a tool like grep
.
If you were interested in where this error message came from then you might grep
for the string "outside the domain"
which would get you to the right place.
However, if you chose to search for "the mathematical function"
then your
search wouldn't return any results because that string wouldn't exist on a
single line within the welibc code base. For that reason, I consider it to be
worth breaking the 80-character width to ensure easy searching later on.
Testing
Since the strings returned by strerror
are implementation-defined, we can't
test the contents of the strings themselves - we can only test that a string was
returned. This simplifies the tests as we only need to check that the returned
pointer is not a null pointer. We'll test the currently supported errno
values
in addition to the unsupported error number 0
.
int
strerrorTest(void)
{
int errnums[] = { 0, EDOM, ERANGE };
size_t i = 0;
int ret = 0;
for (i = 0; i < sizeof(errnums)/sizeof(errnums[0]); i++)
{
if (!strerror(errnums[i]))
{
ret = 1;
break;
}
}
return ret;
}
Conclusion
Unfortunately, strerror
is a somewhat boring library function. The
implementation chooses how to represent each error number as a string and
returns those representations as appropriate.
char *
strerror(int errnum)
{
char *str = NULL;
switch (errnum)
{
case EDOM:
str = "EDOM - An input argument is outside the domain over which the mathematical function is defined";
break;
case ERANGE:
str = "ERANGE - The result of the function cannot be represented as a double value";
break;
default:
str = "E??? - Unknown error number";
break;
}
return str;
}
strtok
July 24, 2021
tags: strcspn, string.h, strspn, strtok
The strtok
function aids in splitting a string into pieces based on a given
set of characters which delimit each piece (tokenizing a string).
char *
strtok(char *s1, const char *s2);
If the delimiting characters are found then a pointer to the first non-delimiter is returned and the next delimiter is overwritten with a null character. Subsequent calls passing a null pointer as the first parameter will return a pointer to the next token from the original string, or a null pointer if no token remains.
Keep in mind that strtok
can modify the string you pass in (note the first
parameter is a char *
, rather than a const char *
). If you need to retain
the original data then it's best to copy it and tokenize the copy, leaving the
original data untouched.
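A small usage sketch that follows this advice by tokenizing a copy (the buffer size is just illustrative):

```c
#include <stdio.h>
#include <string.h>

/* Print each token of input, one per line, leaving input unmodified. */
static void print_tokens(const char *input, const char *delims)
{
    char buf[64];
    char *tok = NULL;

    strncpy(buf, input, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    for (tok = strtok(buf, delims); tok; tok = strtok(NULL, delims))
    {
        printf("[%s]\n", tok);
    }
}
```

Calling print_tokens("one,two,,three", ",") prints [one], [two], and [three] on separate lines - the empty field vanishes because strtok treats a run of delimiters as a single separator.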
It's also important to understand that strtok
will preserve a pointer across
calls to it, namely a pointer to the end of the last token so that it can
continue tokenizing if a null pointer is passed in as the first parameter.
Local Variables
As mentioned, strtok
needs to keep track of where it left off so that
subsequent calls with a null pointer as the s1
argument will continue to
tokenize the string that was last seen. To preserve data across function calls
we can use the static
storage class specifier, which gives the variable static
storage duration: its lifetime spans the entire execution of the program. We
use this storage class on a local variable because the data is only needed
within the strtok
function itself.
static char *save = NULL;
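As a minimal illustration of static storage duration, separate from strtok itself:

```c
/* count retains its value across calls because of its static storage
 * duration, much like strtok's saved pointer. */
static int counter(void)
{
    static int count = 0;
    return ++count;
}
```

Three successive calls return 1, 2, and 3; an automatic (non-static) count would restart at 0 on every call.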
Error Checking
The first condition we want to check is whether or not a set of token delimiters
was passed in, so we validate the s2
pointer. If it's a null pointer then we
indicate to the caller that no token was found by returning a null pointer.
if (!s2)
{
return NULL;
}
Now let's consider the string being tokenized. If multiple tokens exist in the
same string then the caller will initially pass in a pointer to the string and
then pass in a null pointer to continue tokenizing the same string. This means
that a null pointer for s1
is not necessarily an error. It would be an error
if s1
were null before any tokens had been identified and the way we can
detect this condition is if our saved pointer is also null. If our saved pointer
wasn't null then strtok
should use the saved pointer and continue tokenizing
where it left off.
if (!s1)
{
if (!save)
{
return NULL;
}
s1 = save;
}
Implementation
The first job of strtok
is to find the initial character of the next token. If
s1
points at a non-delimiter character then that's the initial character.
However, if s1
points at a delimiting character then it, along with any
subsequent delimiters, must be skipped over. This would be called the span of
delimiting characters and is what the strspn
function can be used for - it
allows us to skip over any initial delimiting characters. In the case that we
skip to the end of the string (i.e. the remaining part of the string only
consisted of delimiters) then no token remains and we return a null pointer.
s1 += strspn(s1, s2);
if ('\0' == *s1)
{
return NULL;
}
The next job is to find the end of the token which will either be the end of the
string or at the next delimiter. This would be the span of non-delimiting
characters, said another way this is the span of the complement of the
delimiting characters which is what strcspn
can be used for. When we find the
end we'll update our saved pointer to that location.
save = s1 + strcspn(s1, s2);
Finally, we need to null terminate the token data to make the token itself a
string (i.e. a sequence of characters with a null terminator). There are two
possibilities for the character following the end of the token: it's either a
delimiter or a null character. Before we can check for these conditions, we
should validate the saved pointer since it was updated through the call to
strcspn
.
if (save)
{
/* Do something */
}
If the current token ends in a delimiter then we need to overwrite it with a
null character and update the saved pointer to the next character so that a
subsequent call to strtok
with a null pointer for s1
will continue tokenizing
at the correct place.
if (save && *save)
{
*save++ = '\0';
}
If the current token already ends in a null terminator then no more work is
needed. It's important that we don't update the saved pointer in this case,
otherwise it would point beyond the null terminator into memory with unknown
data. By retaining a pointer to a null character, subsequent calls will return
properly when strtok
attempts to skip over any initial delimiting characters.
Testing
There are quite a few edge cases to check for with the functionality of strtok
so there are many test cases to go through. Since strtok
can preserve a
pointer across calls, the test strings consist of tokens which start with
different characters. This allows the test to ensure that tokenizing is
progressing correctly by checking a single character rather than comparing an
entire string.
The first tests ensure that input validation is performed properly (remember
that a null pointer for s1
isn't always an error). Next, tests are run to
validate that tokenization works with a single delimiter between the tokens, in
addition to switching the string being tokenized to ensure that strtok
starts
using the new string rather than the saved pointer. Following that are similar
tests with multiple delimiters. Next, validate that tokenizing strings with
initial and terminating characters works properly. Then we verify that the
delimiters can be changed while tokenizing the same string. Finally, the
standard includes a few test cases to help demonstrate the desired
functionality, so those are included as well.
int
strtokTest(void)
{
int ret = 1;
char str1[] = "ABCD:BCD:CD:D";
char str1_copy[] = "ABCD:BCD:CD:D";
char sep1[] = ":";
char str2[] = "EFGH:!FGH!:GH:!H";
char str2_copy[] = "EFGH:!FGH!:GH:!H";
char sep2[] = ":!";
char str3[] = ":!:IJKL!:!";
char sep3[] = ":!";
char str4[] = "MNOP!::";
char sep4[] = "!:";
char str5[] = "QRST::UVWX:!:YZ";
char sep5_1[] = ":";
char sep5_2[] = "!";
/* From the standard */
static char str[] = "?a???b,,,#c";
char *t = NULL;
do
{
/* Invalid token string on first call with invalid separators */
if (NULL != strtok(NULL, NULL))
{
break;
}
/* Still invalid token string with valid separators */
if (NULL != strtok(NULL, sep1))
{
break;
}
/* Valid token string with invalid separators */
if (NULL != strtok(str1, NULL))
{
break;
}
/* Tokenize on single separator */
if (0 != strcmp("ABCD", strtok(str1, sep1)))
{
break;
}
/* Tokenize new string */
if (0 != strcmp("EFGH", strtok(str2, sep2)))
{
break;
}
/* Tokenize on single separator */
memcpy(str1, str1_copy, sizeof(str1));
if (0 != strcmp("ABCD", strtok(str1, sep1)))
{
break;
}
/* Tokenize on single separator, second time */
if (0 != strcmp("BCD", strtok(NULL, sep1)))
{
break;
}
/* Tokenize on single separator, third time */
if (0 != strcmp("CD", strtok(NULL, sep1)))
{
break;
}
/* Tokenize on single separator, fourth time */
if (0 != strcmp("D", strtok(NULL, sep1)))
{
break;
}
/* Tokenize on single separator, end */
if (NULL != strtok(NULL, sep1))
{
break;
}
/* Tokenize on multiple separators */
memcpy(str2, str2_copy, sizeof(str2));
if (0 != strcmp("EFGH", strtok(str2, sep2)))
{
break;
}
/* Tokenize on multiple separators, second time */
if (0 != strcmp("FGH", strtok(NULL, sep2)))
{
break;
}
/* Tokenize on multiple separators, third time */
if (0 != strcmp("GH", strtok(NULL, sep2)))
{
break;
}
/* Tokenize on multiple separators, fourth time */
if (0 != strcmp("H", strtok(NULL, sep2)))
{
break;
}
/* Tokenize on multiple separators, end */
if (NULL != strtok(NULL, sep2))
{
break;
}
/* Tokenize with initial leading separators */
if (0 != strcmp("IJKL", strtok(str3, sep3)))
{
break;
}
/* Tokenize with initial leading separators, end */
if (NULL != strtok(NULL, sep3))
{
break;
}
/* Tokenize with trailing separators */
if (0 != strcmp("MNOP", strtok(str4, sep4)))
{
break;
}
/* Tokenize with trailing separators, repeat */
if (NULL != strtok(NULL, sep4))
{
break;
}
/* Change token string, start */
if (0 != strcmp("QRST", strtok(str5, sep5_1)))
{
break;
}
/* Change token string, change */
if (0 != strcmp(":UVWX:", strtok(NULL, sep5_2)))
{
break;
}
/* Change token string, change again */
if (0 != strcmp("YZ", strtok(NULL, sep5_1)))
{
break;
}
/* Test from the standard: 't points to the token "a"' */
t = strtok(str, "?");
if (0 != strcmp("a", t))
{
break;
}
/* Test from the standard, 't points to the token "??b"' */
t = strtok(NULL, ",");
if (0 != strcmp("??b", t))
{
break;
}
/* Test from the standard, 't points to the token "c"' */
t = strtok(NULL, "#,");
if (0 != strcmp("c", t))
{
break;
}
/* Test from the standard, 't is a null pointer' */
t = strtok(NULL, "?");
if (t)
{
break;
}
ret = 0;
} while (0);
return ret;
}
Conclusion
Although the work performed by strtok
is complicated, it can be described
succinctly through the use of existing library functions (namely strspn
and
strcspn
). This is a great demonstration of the power of capturing common logic
into functions for reuse.
char *
strtok(char *s1, const char *s2)
{
static char *save = NULL;
if (!s2)
{
return NULL;
}
if (!s1)
{
if (!save)
{
return NULL;
}
s1 = save;
}
/* Skip initial delimiters */
s1 += strspn(s1, s2);
if ('\0' == *s1)
{
return NULL;
}
/* Find next delimiter */
save = s1 + strcspn(s1, s2);
if (save && *save)
{
*save++ = '\0';
}
return s1;
}
strstr
June 27, 2021
tags: strchr, string.h, strlen, strncmp, strstr
The strstr
function lets us search for the first occurrence of one string in
another string.
char *
strstr(const char *s1, const char *s2);
If the string is found then a pointer to its beginning is returned; if the
string being found is empty then a pointer to the beginning of the search
string is returned; otherwise NULL
is returned.
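Those three outcomes in a short sketch:

```c
#include <assert.h>
#include <string.h>

static void strstr_demo(void)
{
    const char *s = "needle in a haystack";

    /* Found: pointer to the first occurrence ("haystack") */
    assert(s + 12 == strstr(s, "hay"));
    /* Empty string to find: pointer to the start of s */
    assert(s == strstr(s, ""));
    /* Not found: NULL */
    assert(NULL == strstr(s, "thread"));
}
```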
Local Variables
During the main loop of this function we'll reference the length of the string
being found many times so it will be best to store that in a variable rather
than calling strlen(s2)
over and over again.
size_t s2_len = 0;
Error Checking
As normal, first we check to see if either string is invalid.
if (!s1 || !s2)
{
return NULL;
}
Next, we'll calculate and store the length of the string being found and if it's empty (has zero length) then we'll return a pointer to the first string.
s2_len = strlen(s2);
if (0 == s2_len)
{
return (char *) s1;
}
Implementation
As a precursor to the main search logic, if the string to be found is longer than the string being searched then a match will never occur and we can bail early.
/* A longer string will never be found */
if (s2_len > strlen(s1))
{
return NULL;
}
The general idea will be to focus on the first character in the string being found and skip along each occurrence in the string being searched. Since this is a prerequisite of matching the entire string, it will prevent us from performing unnecessary comparisons. As we visit each instance of that character we'll then compare the full string being found - if a match is identified then we can stop searching and return the matching location.
First, we'll want to loop until s1
points to the terminating null character.
while (*s1)
{
/* Perform search */
}
Next, we want to skip to the next location of the first character of the string
that we want to find. strchr
will do exactly what we need and we'll update our
search string since unmatched characters no longer matter as we progress. If the
first character isn't found then NULL
will be returned and this will be the
same value we return to the caller since the string wasn't found either.
while (*s1)
{
/* Jump to the next match on the first character */
s1 = strchr(s1, *s2);
if (!s1)
{
break;
}
/* Perform comparison */
}
return (char *) s1;
If we do get a match on the first character then we will perform a string
comparison to identify a match. However, since the string being searched may be
longer than the string being found, we need to ensure that we only compare the
characters in the string being found which is done through the use of strncmp
.
Finally, if no match was found then we increment our search string pointer and try again.
while (*s1)
{
/* Jump to the next match on the first character */
s1 = strchr(s1, *s2);
if (!s1)
{
break;
}
if (0 == strncmp(s1, s2, s2_len))
{
break;
}
s1++;
}
return (char *) s1;
Testing
For testing, we'll ensure that we pass NULL
pointers to validate error
checking, search for a string which is longer than the string being searched,
attempt some near matches, and finally search for matches that exist at the
beginning, middle, and end of the string being searched.
int
strstrTest(void)
{
int ret = 1;
char str[] = "Hello, world!\n";
char str2[] = "H";
do
{
/* Invalid search string */
if (NULL != strstr(NULL, str))
{
break;
}
/* Invalid substring */
if (NULL != strstr(str, NULL))
{
break;
}
/* Empty substring */
if (str != strstr(str, ""))
{
break;
}
/* No match */
if (NULL != strstr(str, "goodbye"))
{
break;
}
/* No match on single character */
if (NULL != strstr(str, "z"))
{
break;
}
/* Search for string that is longer but matches initially */
if (NULL != strstr(str2, "Hello"))
{
break;
}
/* Mismatch on last character */
if (NULL != strstr(str, "Hellp"))
{
break;
}
/* Find itself */
if (str != strstr(str, str))
{
break;
}
/* Find match at beginning */
if (str != strstr(str, "Hello"))
{
break;
}
/* Find match in middle */
if (strchr(str, 'w') != strstr(str, "wo"))
{
break;
}
/* Find match at end */
if (strchr(str, 'w') != strstr(str, "world!\n"))
{
break;
}
/* Find match of single character */
if (strchr(str, ' ') != strstr(str, " "))
{
break;
}
/* Match a single character only */
if (str2 != strstr(str2, "H"))
{
break;
}
ret = 0;
} while (0);
return ret;
}
Conclusion
This function is a little larger than the other ones we've covered so far, but still boils down to straightforward logic of jumping to the next match of the first character and then performing a full comparison of the string being found.
char *
strstr(const char *s1, const char *s2)
{
size_t s2_len = 0;
if (!s1 || !s2)
{
return NULL;
}
s2_len = strlen(s2);
if (0 == s2_len)
{
return (char *) s1;
}
/* A longer string will never be found */
if (s2_len > strlen(s1))
{
return NULL;
}
while (*s1)
{
/* Jump to the next match on the first character */
s1 = strchr(s1, *s2);
if (!s1)
{
break;
}
if (0 == strncmp(s1, s2, s2_len))
{
break;
}
s1++;
}
return (char *) s1;
}
strspn
June 21, 2021
tags: strchr, string.h, strspn
The strspn
function is the opposite of strcspn
in that it computes the
number of characters at the start of a string that are in the set of
characters you specify.
size_t
strspn(const char *s1, const char *s2);
The return value is that count of characters.
While strcspn
is nice for finding the first occurrence of any of a given set of
characters, strspn
's use is more for finding the occurrence of the first
character not in a given set. This description may seem backwards, but an
example will help.
char *item_nums[] = { "AA-BC-1234", "AA-321" };
char *set = "ABC-";
size_t loc = strspn(item_nums[0], set);
printf("Serial Number 1: %s\n", item_nums[0] + loc);
loc = strspn(item_nums[1], set);
printf("Serial Number 2: %s\n", item_nums[1] + loc);
In this scenario there is an item numbering scheme which consists of the
letters A, B, and C, hyphens, and a serial number. The letters will always
preface the numbers but the number of letters can change. Using strspn
allows
you to easily index to the number.
As a bonus, how else might you solve this problem (hint: strrchr
)?
Local Variables
There isn't much to keep track of since we're only calculating a length, so
we'll create a variable of type size_t
to hold that information which is also
the return value.
size_t len = 0;
Error Checking
The only error conditions we need to explicitly check for are NULL
pointers,
in both cases the return value will be 0
so they can be handled together.
if (!s1 || !s2)
{
return 0;
}
You might consider empty strings to be another error condition, but the loop logic that follows will handle both situations automatically.
Implementation
The main loop is nearly identical to strcspn
except we want to find a
character with strchr
so we directly use its return value as a loop condition.
The idea is to march along the string being searched and check whether each
character is in the set of characters to find. When we find a character that
isn't in the set then we end our search and return the number of characters that
were found.
while (*s1 &&
strchr(s2, *s1++))
{
len++;
}
return len;
If s1
were empty then the loop body would never execute and 0
would be
returned. If s2
were empty then strchr
would return NULL
and the loop body
wouldn't execute either.
Testing
These tests closely mimic those for strcspn
with slight modifications to error
condition checking.
int
strspnTest(void)
{
int ret = 1;
char str[] = "ABCDEFG";
char str2[] = "ABCD";
do
{
/* NULL source */
if (0 != strspn(NULL, "ABC"))
{
break;
}
/* NULL span set */
if (0 != strspn(str, NULL))
{
break;
}
/* Empty search string */
if (0 != strspn("", "ABCD"))
{
break;
}
/* Empty search set */
if (0 != strspn(str, ""))
{
break;
}
/* Missing characters only */
if (0 != strspn(str, "hijkl"))
{
break;
}
/* Short set of existing characters only */
if (strlen(str2) != strspn(str, str2))
{
break;
}
/* Existing characters only */
if (strlen(str) != strspn(str, str))
{
break;
}
/* Missing character at beginning of search set */
if (strlen(str) != strspn(str, "aABCDEFG"))
{
break;
}
/* Missing character in middle of search set */
if (strlen(str) != strspn(str, "ABCDdEFG"))
{
break;
}
/* Missing character at end of search set */
if (strlen(str) != strspn(str, "ABCDEFGg"))
{
break;
}
ret = 0;
} while (0);
return ret;
}
Conclusion
As seen with the serial number example, strspn
can be used to tell you two
pieces of information about a string: how long the initial set of matching
characters is and at what index non-matching characters start.
size_t
strspn(const char *s1, const char *s2)
{
size_t len = 0;
if (!s1 || !s2)
{
return 0;
}
while (*s1 &&
strchr(s2, *s1++))
{
len++;
}
return len;
}
strrchr
March 06, 2021
tags: strchr, string.h, strrchr
The strrchr
function enables the caller to find the location of the last
occurrence of a character in a string.
char *
strrchr(const char *s, int c);
The return value is a pointer to the found character, or NULL
if it does not
exist in the string.
Local Variables
We use a char
search character for the same reason mentioned for strchr
.
Otherwise, we only need a pointer to the end of the string so we can search in
the reverse direction.
const char *cur = NULL;
const char search = c;
Implementation
The implementation is similar to that of strchr
, but performed in the reverse
direction. We use the beginning of the string as our terminal position, rather
than that of the null-character. Another difference is that we first determine
where the end of the string is, which we can rely on strlen
to calculate for
us.
The only catch to this function is that since the null-character is considered
part of the search, we must ensure we compare that character every time - even
if the string is empty (i.e. the string points to the null-character). This is
where the --cur
becomes important. When an empty string is searched, by
pre-decrementing we ensure that we only inspect the null-character rather than
looping an extra time (which would happen if post-decrement, cur--
, was used).
if (!s)
{
return NULL;
}
cur = s + strlen(s);
do
{
if (search == *cur)
{
return (char *) cur;
}
} while (--cur >= s);
return NULL;
Testing
First, we'll test our parameter validation by passing a NULL
pointer, and
attempting to search an empty string which should succeed when we search for the
null-character. Next, we'll search for characters that don't exist at all, or
exist past a null character in the array we define. Finally, make sure it works
when finding characters at the beginning, middle, and end of the string (i.e.
finding the null character), as well as finding the last occurrence of a
character which occurs multiple times in the string.
int
strrchrTest(void)
{
int ret = 1;
char str[] = "ABCDEFGBCDEFG\0abcdefg";
char emptyStr[] = "";
do
{
/* NULL object */
if (NULL != strrchr(NULL, 'a'))
{
break;
}
/* Search empty string for characters */
if (NULL != strrchr(emptyStr, 'a'))
{
break;
}
/* Search empty string for the null-character */
if (emptyStr != strrchr(emptyStr, '\0'))
{
break;
}
/* Search for non-existent character */
if (NULL != strrchr(str, 'Z'))
{
break;
}
/* Search for char after null character */
if (NULL != strrchr(str, 'a'))
{
break;
}
/* Valid search at beginning */
if (str != strrchr(str, str[0]))
{
break;
}
/* Valid search in middle */
if (strchr(str + 2, str[1]) != strrchr(str, str[1]))
{
break;
}
/* Valid search at end */
if ((str + strlen(str) - 1) != strrchr(str, str[strlen(str) - 1]))
{
break;
}
/* Search for null character at end */
if ((str + strlen(str)) != strrchr(str, '\0'))
{
break;
}
ret = 0;
} while (0);
return ret;
}
Conclusion
strrchr
is the reverse analogue of strchr
with the same catch about finding
the null-character.
char *
strrchr(const char *s, int c)
{
const char *cur = NULL;
const char search = c;
if (!s)
{
return NULL;
}
cur = s + strlen(s);
do
{
if (search == *cur)
{
return (char *) cur;
}
} while (--cur >= s);
return NULL;
}
strpbrk
December 17, 2020
tags: strchr, strcspn, string.h, strpbrk
The strpbrk
function provides a simpler method of calling strchr
when
searching for any one of several characters in a given string. It searches one
string for the first occurrence of any character in another string.
char *
strpbrk(const char *s1, const char *s2);
The return value is a pointer to the found character, or NULL
if no characters
were found.
Suppose you wanted to split a string on any one of a set of characters, but you
wanted to know whether you could perform a split before doing the work (e.g.
before calling strtok
). strpbrk
will return the pointer to the split
location assuming you wanted to split on whichever of your characters came
first, then you could do your more advanced logic of breaking the string into two
pieces.
char *phrase = "Goodbye, cruel world";
char *set = "abc";
if (strpbrk(phrase, set))
{
/* Advanced split logic */
}
You may think that this sounds very similar to strcspn
and that's because it
is! The difference is that strcspn
indicates a length between the start of the
string and the first match while strpbrk
returns a pointer to the first match.
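The relationship can be stated directly in code:

```c
#include <assert.h>
#include <string.h>

static void relation_demo(void)
{
    const char *s = "Goodbye, cruel world";

    /* When a match exists, strpbrk points where strcspn's count lands. */
    assert(strpbrk(s, "abc") == s + strcspn(s, "abc"));
    /* When nothing matches, strpbrk yields NULL while strcspn yields
     * the full string length. */
    assert(NULL == strpbrk(s, "xz"));
    assert(strlen(s) == strcspn(s, "xz"));
}
```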
Local Variables
We'll be utilizing the strcspn
function to do the core work and we want to
store the return value so we can index into the string being searched.
size_t idx = 0;
Error Checking
Since strcspn
can indicate an error by returning 0
, we wouldn't know whether
a character was found in the first location or if an error occurred, so we need
to validate s1
before passing it to strcspn
. Normally we would want to check
for a NULL
s2
pointer, however strcspn
will do that for us so we can skip
that check.
if (!s1)
{
return NULL;
}
Implementation
We'll leverage strcspn
to find the length to the first match, which lets us detect that no match
exists when that length equals the length of the string being
searched. In that case we return NULL
, otherwise we return a pointer to the
matched character which is the string pointer plus the length returned by
strcspn
.
idx = strcspn(s1, s2);
if ('\0' == s1[idx])
{
return NULL;
}
return (char *)(s1 + idx);
Testing
Due to the similar functionality, the testing is nearly identical to that of
strcspn
. We'll test input validation and then cases where there are no matches
and then matches at the beginning, middle, and end of the string being searched.
int
strpbrkTest(void)
{
int ret = 1;
char str[] = "ABCDEFGabcdefg";
do
{
/* NULL search string */
if (NULL != strpbrk(NULL, "abcd"))
{
ret = 2;
break;
}
/* NULL search set */
if (NULL != strpbrk(str, NULL))
{
ret = 3;
break;
}
/* Empty search string */
if (NULL != strpbrk("", "abcd"))
{
ret = 4;
break;
}
/* Empty search set */
if (NULL != strpbrk(str, ""))
{
ret = 5;
break;
}
/* Characters not in search string */
if (NULL != strpbrk(str, "hijk"))
{
ret = 6;
break;
}
/* First character matches */
if (str != strpbrk(str, "hijAklm"))
{
ret = 7;
break;
}
/* Last character matches */
if ((str + strlen(str) - 1) != strpbrk(str, "hijgklm"))
{
ret = 8;
break;
}
/* Match in middle */
if (strchr(str, 'a') != strpbrk(str, "hijaklm"))
{
ret = 9;
break;
}
ret = 0;
} while (0);
return ret;
}
Conclusion
The name wasn't immediately obvious to me but strpbrk
stands for "string
pointer break" which likely comes from the SNOBOL language which had a BREAK()
function to perform the same task (and it had a SPAN()
function that mirrors
strcspn
). Due to the commonality with strcspn
, you could swap which function
does the main work and then implement strcspn
by calling this function
instead.
char *
strpbrk(const char *s1, const char *s2)
{
size_t idx = 0;
if (!s1)
{
return NULL;
}
idx = strcspn(s1, s2);
if ('\0' == s1[idx])
{
return NULL;
}
return (char *)(s1 + idx);
}
strcspn
December 05, 2019
tags: strchr, strcspn, string.h, strlen
The strcspn
function is like an advanced version of strchr
- it determines
the number of characters at the start of a string that are not in the set of
characters you specify.
size_t
strcspn(const char *s1, const char *s2);
The return value is that count of characters.
This function is somewhat confusing so it will help to show an example. Using
strcspn
allows you to answer a question like "Where does the first instance of
a, b, or c occur in the string 'Goodbye, cruel world'?". C code which answers
this question would look like so:
char *phrase = "Goodbye, cruel world";
char *set = "abc";
size_t loc = strcspn(phrase, set);
printf("The first instance is at index %u", (unsigned) loc);
At this point, loc
will either be equal to the length of the phrase, implying
that a, b, or c does not exist in the string, or it will hold the index of the
first occurrence of any of those characters. We can see that there is a 'b'
in "Goodbye", so the message printed to the screen will be "The first instance
is at index 4".
Local Variables
There isn't much to keep track of since we're only calculating a length, so
we'll create a variable of type size_t
to hold that information which is also
the return value.
size_t len = 0;
Error Checking
There are two error cases we need to account for and each requires a different
return value. The first case is when the string being searched, s1
, is NULL
.
If this is the case then the string has no length and the return value would be
0
.
if (!s1)
{
return 0;
}
The second case is when no search characters are provided, i.e. s2
is NULL
.
Since we don't have anything to search for then the return value is equal to the
length of the string being searched.
if (!s2)
{
return strlen(s1);
}
The order of these checks is important as well. The first case implies that we can't do any searching while the second implies that we don't need to do any searching.
Implementation
The method we'll use is to march along the first string and look for each
subsequent character in the second string. We'll keep marching until we reach
the end of the string being searched or we find a match. The hardest part here
is finding matches but we can farm that work out to strchr
since that's
exactly what it's for! As long as we continue marching then we also continue
updating the count being returned.
while (*s1 &&
!strchr(s2, *s1++))
{
len++;
}
return len;
Testing
First, we'll test our parameter validation by passing a NULL
pointer for each
of the strings in sequence and similarly pass empty strings for each. Next,
we'll search for characters that don't exist at all to ensure that we get the
length of the search string back. Finally, we'll place the first-matched
character at various positions in the string of characters being matched
against.
int
strcspnTest(void)
{
int ret = 1;
char str[] = "ABCDEFGabcdefg";
do
{
/* NULL source */
if (0 != strcspn(NULL, "ABC"))
{
break;
}
/* NULL span complement */
if (strlen(str) != strcspn(str, NULL))
{
break;
}
/* Empty search string */
if (0 != strcspn("", "ABCD"))
{
break;
}
/* Empty search set */
if (strlen(str) != strcspn(str, ""))
{
break;
}
/* Characters aren't in string */
if (strlen(str) != strcspn(str, "hijkl"))
{
break;
}
/* First character is in the search set */
if (0 != strcspn(str, "A"))
{
break;
}
/* Character at beginning of search set */
if (3 != strcspn(str, "Dabcd"))
{
break;
}
/* Character in middle of search set */
if (3 != strcspn(str, "abDcd"))
{
break;
}
/* Character at end of search set */
if (3 != strcspn(str, "abcdD"))
{
break;
}
ret = 0;
} while (0);
return ret;
}
Conclusion
Although the description is somewhat confusing, strcspn
is not difficult to
implement. The name comes from "string complement span" because it computes the
span of characters in a string that are in the set complementary to the one
provided.
size_t
strcspn(const char *s1, const char *s2)
{
size_t len = 0;
if (!s1)
{
return 0;
}
if (!s2)
{
return strlen(s1);
}
while (*s1 &&
!strchr(s2, *s1++))
{
len++;
}
return len;
}
memchr
December 05, 2018
Once you have collected or generated data, you may want to search through it to
find the information you're interested in. memchr
is the first in the
collection of search functions that can be used to accomplish such a task. It
takes the object being searched, a character to find, and the number of characters
to search through.
void *
memchr(const void *s, int c, size_t n);
The return value is a pointer to the found character, or NULL
if it does not
exist in the object.
Local Variables
The description of memchr
is very specific with respect to the data types used
for searching. As you see with the prototype, memchr
allows any object to be
searched since it takes and returns a void *
. In addition, it takes a search
character which is actually an int
(which is what a character would be
promoted to anyway). However, The Standard has this to say about the data types:
"The memchr function locates the first occurrence of c (converted to an unsigned char) in the initial n characters (each interpreted as unsigned char) of the object pointed to by s."
As such, rather than adding casts into our code, we can declare local variables of the appropriate types and assign them to implicitly perform the casts for us.
const unsigned char *pCur = s;
const unsigned char search = c;
Parameter Validation
Since this function will do all of its own work, we need to perform some
parameter validation. This includes checking for a NULL
pointer and when the
search would wrap around memory. We exclude checking n
against 0
because
this isn't likely to happen and is handled by the logic to come. If the
parameters are invalid then the search can't be performed and we return NULL
.
if ( !s ||
/* Check for wrapping while searching */
((((unsigned long) -1) - ((unsigned long) s)) < n))
{
return NULL;
}
Implementation
A similar approach to memcmp
will be taken where we loop over each character,
decrementing n
each iteration, but we will always compare against our search
character. It's important to remember that after the if
statement is executed,
pCur
will point one past the character that was just searched. If that was the
character we were searching for then we must decrement pCur
before returning
its value as a void *
. Finally, if the character was never found, return
NULL
to indicate as such.
while (n-- > 0)
{
if (search == *(pCur++))
{
return (void *) --pCur;
}
}
return NULL;
Testing
First we'll test our parameter validation by passing a NULL
pointer, and
attempting to trigger a search which would wrap memory. Next we'll search for
a character which exists in the object, but specify that no characters should be
searched, along with a character that doesn't exist. Lastly, search for
characters at the boundaries of the object and in the middle.
int
memchrTest(void)
{
int ret = 1;
char obj[] = "ABCDEFG";
char *p = NULL;
do
{
/* NULL object */
if (NULL != memchr(NULL, 'a', 10))
{
break;
}
/* Wrapped search */
if (NULL != memchr((void *) ((unsigned long) -5), 'A', 10))
{
break;
}
/* Search no characters */
if (NULL != memchr(obj, obj[0], 0))
{
break;
}
/* Search for non-existent character */
if (NULL != memchr(obj, 'a', sizeof(obj)))
{
break;
}
/* Valid search at beginning */
if (obj != memchr(obj, obj[0], sizeof(obj)))
{
break;
}
/* Valid search in middle */
if ((obj + 4) != memchr(obj, obj[4], sizeof(obj)))
{
break;
}
/* Search for null-character at end */
p = memchr(obj, '\0', sizeof(obj));
if (strlen(obj) != (size_t) (p - obj))
{
break;
}
ret = 0;
} while (0);
return ret;
}
Conclusion
While memchr
represents a simple operation, we must ensure that we follow the
standard by using the correct data types for comparison. Aside from this, we see
the same pattern used in functions like memcmp
to iterate over the object and
perform the search operation.
void *
memchr(const void *s, int c, size_t n)
{
const unsigned char *pCur = s;
const unsigned char search = c;
if ( !s ||
/* Check for wrapping while searching */
((((unsigned long) -1) - ((unsigned long) s)) < n))
{
return NULL;
}
while (n-- > 0)
{
if (search == *(pCur++))
{
return (void *) --pCur;
}
}
return NULL;
}
strchr
December 05, 2018
The strchr
function enables the caller to find the location of the first
occurrence of a character in a string.
char *
strchr(const char *s, int c);
The return value is a pointer to the found character, or NULL
if it does not
exist in the string.
Local Variables
Similar to memchr
, the description of strchr
is very specific with respect
to the data type used for searching. As you see with the prototype, strchr
accepts an int
as the type being searched for. This is partially due to the
integer promotion rules, whereby a character being passed as an argument to
a function will be implicitly promoted to an int
with the same sign. The other
part is to maintain compatibility with pre-standardized C code.
"The strchr function locates the first occurrence of c (converted to a char) in the string pointed to by s."
As such, rather than adding casts into our code, we can declare a local char
and implicitly cast the searched for character.
const char search = c;
Implementation
We'll use an approach which is slightly similar to strlen
where we'll march
along the string, comparing each character to search
until we find a match or
reach the end of the string. One interesting property of this specific function
is that it counts the terminating null character as part of the string. This
means you can search for null character with strchr
. This also means you need
to first check for the desired character, and then check against '\0'
before
exiting the loop. In addition, we get parameter validation for free by using s
as the loop condition. Finally, we return NULL if the loop ends as it means we
reached the null character without finding a match.
while (s)
{
if (search == *s)
{
return (char *) s;
}
if ('\0' == *s)
{
break;
}
s++;
}
return NULL;
Testing
First, we'll test our parameter validation by passing a NULL
pointer, and
attempting to search an empty string. Next, we'll search for characters that
don't exist at all, or exist past a null character in the array we define.
Finally, make sure it works when finding characters at the beginning, middle,
and end of the string (i.e. finding the null character).
int
strchrTest(void)
{
int ret = 1;
char str[] = "ABCDEFG\0abcdefg";
do
{
/* NULL object */
if (NULL != strchr(NULL, 'a'))
{
break;
}
/* Search no characters */
if (NULL != strchr("", 'a'))
{
break;
}
/* Search for non-existent character */
if (NULL != strchr(str, 'Z'))
{
break;
}
/* Search for char after null character */
if (NULL != strchr(str, 'a'))
{
break;
}
/* Valid search at beginning */
if (str != strchr(str, str[0]))
{
break;
}
/* Valid search in middle */
if ((str + 4) != strchr(str, str[4]))
{
break;
}
/* Search for null character at end */
if ((str + strlen(str)) != strchr(str, '\0'))
{
break;
}
ret = 0;
} while (0);
return ret;
}
Conclusion
strchr
is straightforward, with the only catch being that the null character
is considered a part of the string. Ensuring that it can be found means you
could implement strlen
by use of strchr
.
char *
strchr(const char *s, int c)
{
const char search = c;
while (s)
{
if (search == *s)
{
return (char *) s;
}
if ('\0' == *s)
{
break;
}
s++;
}
return NULL;
}
strncmp
December 05, 2018
tags: memcmp, string.h, strlen, strncmp
The strncmp
function allows you to compare two strings to determine their
(in)equality while also specifying a maximum number of characters to compare.
int
strncmp(const char *s1, const char *s2, size_t n);
The return value is an integer which indicates whether the first string is greater than, equal to, or less than the second string.
Implementation
strncmp
is of course similar to strcmp
, but we first need to determine how
many characters should be compared, as any characters after a null-character in
s1
should not be compared. Therefore, we need to determine the length of s1
,
plus 1 for the null-character, and compare a number of characters which is the
lesser of that value and n
. We can then pass the real work on to memcmp
.
size_t len1 = strlen(s1) + 1;
if (len1 < n)
{
n = len1;
}
return memcmp(s1, s2, n);
It's important to include the + 1
as otherwise an s1
which is a substring of
s2
would compare as equal (e.g. strncmp("foo", "foobar", strlen("foobar"))
).
Testing
We need to test comparison to/from NULL
pointers, to/from empty strings,
and all cases of equal, greater than, and less than results. The only deviation
from the strcmp
tests is comparing two strings where one is a substring of the
other. Performing this test ensures that the null-character is accounted for
when s1
is shorter than s2
.
int
strncmpTest(void)
{
int ret = 1;
char pStr1[] = "This is a Test string";
char pStr2[] = "This is a Test string";
char pStr3[] = "This is a Lesser string";
char pStr4[] = "This is a greater string";
char *pStr5 = NULL;
char pStr6[] = "This is a Test string with more";
do
{
/* NULL s2 pointer */
if (1 != strncmp(pStr1, pStr5, sizeof(pStr1)))
{
break;
}
/* NULL s1 pointer */
if (1 != strncmp(pStr5, pStr1, sizeof(pStr1)))
{
break;
}
/* Empty s2 */
if (0 >= strncmp(pStr1, "", sizeof(pStr1)))
{
break;
}
/* Empty s1 */
if (0 <= strncmp("", pStr1, sizeof(pStr1)))
{
break;
}
/* Test equality */
if (0 != strncmp(pStr1, pStr2, sizeof(pStr1)))
{
break;
}
/* Test equal substrings */
if (0 != strncmp(pStr1, pStr6, sizeof(pStr1) - 1))
{
break;
}
/* Test inequal strings */
if (0 == strncmp(pStr1, pStr6, sizeof(pStr1)))
{
break;
}
/* First string greater than second string */
if (0 >= strncmp(pStr1, pStr3, sizeof(pStr1)))
{
break;
}
/* First string less than second string */
if (0 <= strncmp(pStr1, pStr4, sizeof(pStr1)))
{
break;
}
ret = 0;
} while (0);
return ret;
}
Conclusion
Again, memcmp
does all the work and we need to take care when calculating the
maximum number of characters to be compared.
int
strncmp(const char *s1, const char *s2, size_t n)
{
size_t len1 = strlen(s1) + 1;
if (len1 < n)
{
n = len1;
}
return memcmp(s1, s2, n);
}
strcmp
November 30, 2018
tags: memcmp, strcmp, string.h, strlen
The strcmp
function allows you to compare two strings to determine their
(in)equality.
int
strcmp(const char *s1, const char *s2);
The return value is an integer which indicates whether the first string is greater than, equal to, or less than the second string.
Implementation
The strcmp
function is incredibly similar to memcmp
except we must stop
comparing when we encounter a null-character in s1
. We find that
character with the strlen
function and use the result, plus 1, to define how
many characters to compare. Thus, we can make a call directly to memcmp
to do
the work for us. All error handling is done in the other functions and strcmp
becomes very simple:
return memcmp(s1, s2, strlen(s1) + 1);
It's important to include the + 1
as otherwise an s1
which is a substring of
s2
would compare as equal (e.g. strcmp("foo", "foobar")
).
Testing
We need to test comparison to/from NULL
pointers, to/from empty strings,
and all cases of equal, greater than, and less than results. The testing here is
slightly different from memcmp as an empty string, ""
, has a single character
in it, the null-character '\0'. Therefore we have at least one character to
compare and the result should be a non-zero value, assuming the other string has
characters in it.
int
strcmpTest(void)
{
int ret = 1;
char pStr1[] = "This is a Test string";
char pStr2[] = "This is a Test string";
char pStr3[] = "This is a Lesser string";
char pStr4[] = "This is a greater string";
char *pStr5 = NULL;
do
{
/* NULL s2 pointer */
if (1 != strcmp(pStr1, pStr5))
{
break;
}
/* NULL s1 pointer */
if (1 != strcmp(pStr5, pStr1))
{
break;
}
/* Empty s2 */
if (0 >= strcmp(pStr1, ""))
{
break;
}
/* Empty s1 */
if (0 <= strcmp("", pStr1))
{
break;
}
/* Test equality */
if (0 != strcmp(pStr1, pStr2))
{
break;
}
/* First string greater than second string */
if (0 >= strcmp(pStr1, pStr3))
{
break;
}
/* First string less than second string */
if (0 <= strcmp(pStr1, pStr4))
{
break;
}
ret = 0;
} while (0);
return ret;
}
Conclusion
memcmp
does all the work here while we use strlen
to determine the number of
characters to be compared, plus one for the null-terminator.
int
strcmp(const char *s1, const char *s2)
{
return memcmp(s1, s2, strlen(s1) + 1);
}
strncat
November 29, 2018
tags: memmove, string.h, strlen, strncat
The strncat
function allows you to join two strings by appending the second
string to the first, modifying the first string rather than creating a new one.
In addition, you may specify a maximum number of characters to be appended,
allowing you to control the amount of data being appended.
char *
strncat(char *s1, const char *s2, size_t n);
It will return s1
which will now hold the concatenation of the two strings.
Implementation
Much like strcat
, we must first determine the length of the two strings to
determine where to begin copying and how much to copy. We should copy the lesser
value of n
and the length of the second string.
size_t len1 = 0;
size_t len2 = 0;
/* Error checking */
len1 = strlen(s1);
len2 = strlen(s2);
if (len2 < n)
{
n = len2;
}
Now that we know how much to copy, we know that the first character to be
overwritten will be the null-character at the end of s1
, which is located at
s1 + len1
. Next, we add a null-character at the very end in case one was
not found within the first n
(our modified one) characters of s2
. Finally,
return the value of s1
.
memmove(s1 + len1, s2, n);
s1[len1 + n] = '\0';
return s1;
This is nice because it works whether n
is positive or zero. When n
is zero
no characters will be copied and the null-character at the end of s1
will be
overwritten with another null-character.
Testing
We need to test concatenation to/from NULL
pointers, to/from empty strings,
partial concatenation, and finally normal concatenation. In the case of NULL
pointers and an empty source string, we expect the destination to be unmodified.
Otherwise, we verify that the result is n
characters from the second string
appended to the first.
int
strncatTest(void)
{
int ret = 1;
char *pStr1 = NULL;
char pStr2[15] = "hello, world";
char pStr3[15] = "hello, world";
char pStr4[15] = { 0 };
char pStr5[15] = "hello,";
char pStr6[15] = " world";
do
{
/* Append to NULL, may crash */
strncat(pStr1, pStr2, strlen(pStr2));
/* Concatenate from NULL, may crash */
strncat(pStr2, pStr1, 10);
/* Concatenate to an empty string */
strncat(pStr4, pStr2, strlen(pStr2));
if (0 != memcmp(pStr4, pStr2, strlen(pStr2) + 1))
{
break;
}
/* Concatenate from an empty string */
memset(pStr4, '\0', sizeof(pStr4));
strncat(pStr2, pStr4, 10);
if (0 != memcmp(pStr2, pStr3, sizeof(pStr3)))
{
break;
}
/* Append partial string */
memset(pStr4, '\0', sizeof(pStr4));
strncat(pStr4, pStr5, 4);
if ((4 != strlen(pStr4)) ||
(0 != memcmp(pStr4, pStr5, 4)))
{
break;
}
/* Normal concatenation */
strncat(pStr5, pStr6, sizeof(pStr5) - strlen(pStr5));
if (0 != memcmp(pStr5, pStr3, sizeof(pStr3)))
{
break;
}
ret = 0;
} while (0);
return ret;
}
Conclusion
Leveraging the functionality of strlen
allows us to pass the major work to
memmove
, which consequently works as expected if n
is zero. The only tricks
to this function are figuring out how much data to copy and ensuring the
resulting string is null-terminated.
char *
strncat(char *s1, const char *s2, size_t n)
{
size_t len1 = 0;
size_t len2 = 0;
if (!s1 || !s2)
{
return s1;
}
len1 = strlen(s1);
len2 = strlen(s2);
if (len2 < n)
{
n = len2;
}
memmove(s1 + len1, s2, n);
s1[len1 + n] = '\0';
return s1;
}
strcat
October 09, 2017
tags: strcat, strcpy, string.h
The strcat
function allows you to join two strings by appending the second
string to the first, modifying the first string rather than creating a new one.
char *
strcat(char *s1, const char *s2);
It will return s1
which will now hold the concatenation of the two strings.
Implementation
To concatenate two strings means copying one to the end of the other. To find
the end of the first string we use the strlen
function to get an index for the
location of the terminating null character. Then we pass this location to
strcpy
, providing the second string as the second parameter.
strcpy(s1 + strlen(s1), s2);
return s1;
We don't need any parameter validation because strlen
and strcpy
do that for
us. If s1
is NULL
then strlen
will return 0 and we'll end up passing
NULL
to strcpy
which will return without modifying anything. Likewise, if
s2
is NULL
then strcpy
won't do anything.
Testing
We need to test concatenation to/from NULL
pointers, to/from empty strings,
and finally normal concatenation. In the case of NULL
pointers and an empty
source string, we expect the destination to be unmodified. Otherwise, we verify
that the result is the second string appended to the first.
int
strcatTest(void)
{
int ret = 1;
char *pStr1 = NULL;
char pStr2[15] = "hello, world";
char pStr3[15] = "hello, world";
char pStr4[15] = { 0 };
char pStr5[15] = "hello,";
char *pStr6 = " world";
do
{
/* Append to NULL, may crash */
strcat(pStr1, pStr2);
/* Concatenate from NULL, may crash */
strcat(pStr2, pStr1);
/* Concatenate to an empty string */
strcat(pStr4, pStr2);
if (0 != memcmp(pStr4, pStr2, strlen(pStr2)))
{
break;
}
/* Concatenate from an empty string */
memset(pStr4, '\0', sizeof(pStr4));
strcat(pStr2, pStr4);
if (0 != memcmp(pStr2, pStr3, sizeof(pStr3)))
{
break;
}
/* Normal concatenation */
strcat(pStr5, pStr6);
if (0 != memcmp(pStr5, pStr2, sizeof(pStr2)))
{
break;
}
ret = 0;
} while (0);
return ret;
}
Conclusion
As our library continues to grow, we'll find that more and more functions can be implemented in terms of existing functions. This will reduce the logic and validation required so that it becomes concentrated in certain parts of the library. It also means that there is less code to debug later on if there are any problems.
char *
strcat(char *s1, const char *s2)
{
strcpy(s1 + strlen(s1), s2);
return s1;
}
strncpy
September 30, 2017
tags: memmove, memset, strcpy, string.h, strlen, strncpy
strncpy
provides control over the same type of copying provided by
strcpy. Rather than being forced to copy an entire
string, you can choose to copy a portion of a string.
char *
strncpy(char *s1, const char *s2, size_t n);
s1
and s2
are the destination and source strings while n
is the maximum
number of characters to copy. The catch is that if no NUL is found in the first
n
characters then no NUL is added to the destination string. If the string is
less than n
characters long then null characters are appended so a total of
n
characters are written.
Local Variables
Similar to strcpy
, we'll store the length of the source string so we can use
it to perform error checking before attempting to copy. It will also be used to
add NULs when the source string is shorter than n
.
size_t len = strlen(s2);
Parameter Validation
The same validation can be used from strcpy
.
if ( !s1 ||
!s2 ||
/* Check for wrapping while copying */
((((unsigned long) -1) - ((unsigned long) s1)) < len) ||
((((unsigned long) -1) - ((unsigned long) s2)) < len))
{
return s1;
}
Implementation
When parameter validation succeeds there are two cases to handle: n
is less
than or equal to the source string length, or it is greater.
If n <= len
then we can use memmove
or memcpy
to copy n
characters from
s2
to s1
. We can't use strcpy
generically in this case because it will add
a NUL if n == len
, meaning we will have written n + 1
characters.
When n > len
we need to copy the string and then add null characters to pad
out the total number of written characters to n
. We can use strcpy
to copy
the string and then memset
to add the padding for us. Remember that strcpy
will set s1[len]
to NUL, so starting the memset
at s1 + len
would mean we
unnecessarily overwrite an existing NUL with NUL. Thus, we should start the
padding at s1 + len + 1
. Since n > len
we are guaranteed that the result of
n - len - 1
is greater than or equal to zero and that the result will never
overflow.
Finally, we return s1
just like strcpy
.
if (n <= len)
{
memmove(s1, s2, n);
}
else
{
strcpy(s1, s2);
memset(s1 + len + 1, '\0', n - len - 1);
}
return s1;
Testing
Normal tests for strncpy
would include passing bad pointers and passing an n
which is less than strlen(s2)
. We also need to test copying from an empty
string, from a string which is shorter than n
(to verify the NUL padding), and
finally setting n
to zero.
int
strncpyTest(void)
{
int ret = 1;
char *str1 = NULL;
char *str2 = "hello, world";
char str3[15] = { 0 };
char *str4 = "";
char zeros[15] = { 0 };
do
{
/* Copy into NULL, may crash */
strncpy(str1, str2, strlen(str2) + 1);
/* Copy from NULL, may crash */
strncpy(str3, str1, sizeof(str3));
/* Should work the same as strcpy */
strncpy(str3, str2, strlen(str2) + 1);
if (0 != memcmp(str3, str2, strlen(str2) + 1))
{
break;
}
memset(str3, '\0', sizeof(str3));
/* Taint str3 and verify the taint is not touched */
str3[strlen(str2) - 4] = 'a';
strncpy(str3, str2, strlen(str2) - 4);
if ((0 != memcmp(str3, str2, strlen(str2) - 4)) ||
(str3[strlen(str2) - 4] != 'a'))
{
break;
}
/* Taint str3 completely and verify strncpy places NULs afterwards */
memset(str3, -1, sizeof(str3));
strncpy(str3, str2, sizeof(str3));
if ((0 != memcmp(str3, str2, strlen(str2))) ||
(0 != memcmp(str3 + strlen(str2),
zeros,
sizeof(str3) - strlen(str2))))
{
break;
}
/* Verify no copy is made from an empty string */
memset(str3, 0, sizeof(str3));
strncpy(str3, str4, strlen(str4));
if (0 != memcmp(str3, zeros, sizeof(str3)))
{
break;
}
ret = 0;
} while (0);
return ret;
}
Conclusion
This implementation makes use of memmove
, strcpy
, and memset
to carry out
the tasks dictated by the relation of n
to strlen(s2)
. Our tests then verify
this behavior through various taint methods.
char *
strncpy(char *s1, const char *s2, size_t n)
{
size_t len = strlen(s2);
if ( !s1 ||
!s2 ||
/* Check for wrapping while copying */
((((unsigned long) -1) - ((unsigned long) s1)) < len) ||
((((unsigned long) -1) - ((unsigned long) s2)) < len))
{
return s1;
}
if (n <= len)
{
memmove(s1, s2, n);
}
else
{
strcpy(s1, s2);
memset(s1 + len + 1, '\0', n - len - 1);
}
return s1;
}
memset
August 26, 2017
Writing a single value to a region of memory is most often used to initialize a
structure on the stack or the memory returned from a call to malloc
. It can
also be used to mark a region "dirty" with a predetermined value that can later
be searched for. The memset
function is used to perform such tasks:
void *
memset(void *s, int c, size_t n);
And returns the value of the passed in pointer, s
.
Local Variables
Since data cannot be written by directly dereferencing a void *
, we need to
convert that to a pointer type we can use. The Standard specifies that the value
c
is to be converted to an unsigned char
when written, so an unsigned char
*
makes sense as our working pointer.
unsigned char *pChar = s;
Parameter Validation
We need to verify that a valid pointer is passed and also that the number of
characters being set will not wrap around memory. We could return here if n
is
zero but that's an unlikely scenario. Rather than add a check to every call of
memset
, we'll let the least likely case suffer a few extra lines of code.
if ( !s ||
/* Check for wrapping */
((((unsigned long) -1) - ((unsigned long) s)) < n))
{
return s;
}
Implementation
The main portion of the loop looks similar to that of the working loops in
memmove
. We iterate and decrement n
, each time writing c
into the pointed
to character and finally increment the pointer by one character. Once finished,
we return the value of s
.
while (0 < n--)
{
*(pChar++) = (unsigned char) c;
}
return s;
Testing
Parameter validation aside, we need to verify that requesting zero characters be set is honored and that we can't set more characters than are left in memory. We also need to set some memory and verify that only the requested region was changed.
int
memsetTest(void)
{
int ret = 1;
char reg1[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ";
char reg2[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ";
char reg3[] = "ABCD55555JKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ";
do
{
/* We may crash */
memset(NULL, 0, 10);
/* Request a set that would wrap, may crash */
memset((void *) ((unsigned long) -5), 0, 10);
/* No change should be made */
memset(reg1 + 4, 0, 0);
if (0 != (memcmp(reg1, reg2, sizeof(reg1))))
{
break;
}
/* Verify change in only the specified area */
memset(reg1 + 4, '5', 5);
if (0 != (memcmp(reg1, reg3, sizeof(reg1))))
{
break;
}
ret = 0;
} while (0);
return ret;
}
Conclusion
memset
provides a straightforward way to write a single value to every
character in a given region.
void *
memset(void *s, int c, size_t n)
{
unsigned char *pChar = s;
if ( !s ||
/* Check for wrapping */
((((unsigned long) -1) - ((unsigned long) s)) < n))
{
return s;
}
while (0 < n--)
{
*(pChar++) = (unsigned char) c;
}
return s;
}
strcpy
August 25, 2017
tags: memmove, strcpy, string.h, strlen
Text manipulation is a large part of user-facing programs so that text can be
stored, retrieved, and displayed. To accomplish those actions, strings may need
to be copied from one location to another. We can already find the length of a
string with strlen and could pass along that value to the
memcpy or memmove functions.
However, the programmer must always remember to copy an extra character: the
null terminator. To prevent this error-prone requirement on the programmer, C
has the strcpy
function:
char *
strcpy(char *s1, const char *s2);
An (Almost) Easy Win
As mentioned, one could accomplish a string copy with a memory copying function.
The Standard does not require strcpy
to support copying between overlapping
strings, but we can easily support this by using memmove
memmove(dst, src, strlen(src) + 1);
Assuming that strlen
works, it follows that a null character will be at the
end of the string. Thus, we can copy the source length, plus one, in order to
get the entire string along with the null terminator.
In the case of a string which runs off the end of memory, if the program
doesn't crash during strlen
(which is a valid possibility with undefined behavior), the length will be
reported as zero and we would attempt to copy a single character from the
source. As long as the source pointer isn't the last addressable byte, this
would look valid to memmove
which would subsequently copy a single character into the destination... without
the null terminator. This is not the correct behavior so we must handle it
appropriately.
Local Variables
First, we'll store the length of the string so we can use it to perform error checking before attempting to copy.
size_t len = strlen(s2);
Parameter Validation
We need to perform the same types of checks that memmove
does so we'll steal
them only exchanging n
for our local variable len
if ( !s1 ||
!s2 ||
/* Check for wrapping while copying */
((((unsigned long) -1) - ((unsigned long) s1)) < len) ||
((((unsigned long) -1) - ((unsigned long) s2)) < len))
{
return s1;
}
When the length is 0, the wrapping checks don't perform any meaningful verification. However, when the length is positive, we guarantee that we won't copy onto the end of memory, which would leave a string that is not null terminated.
Implementation
If the parameters are valid then we are assured that a valid copy may take
place. However, if we're copying from a string which isn't null terminated
(which would report a length of 0) then we would still copy a single character
and no null terminator if we were to use the above memmove
call. To prevent
this, we can call memmove
with the string length and manually null terminate
it. This guarantees that the destination will always become null terminated,
even if the source string wasn't. In other words, an invalid source string will
result in an empty string being copied.
memmove(s1, s2, len);
s1[len] = '\0';
return s1;
Testing
Aside from passing NULL
pointers to strcpy
, we also need to verify copies
from an empty string along with a normal copy from a string into an
appropriately sized array. We still can't accurately test non-null terminated
strings at the end of memory because that would invoke undefined behavior,
however we can attempt to copy to the end of memory.
int
strcpyTest(void)
{
int ret = -1;
char *str1 = NULL;
char *str2 = "hello, world";
char *str3 = "";
char str4[] = "hello, world";
char str5[15] = { 0 };
char str6[] = "a";
do
{
/* Wrap checking relies on an unsigned long holding a pointer value
* without truncation.
*/
if (sizeof(unsigned long) < sizeof(void *))
{
break;
}
/* Copy into NULL, may crash */
strcpy(str1, str2);
/* Copy from NULL, may crash */
strcpy(str2, str1);
if (0 != memcmp(str2, str4, strlen(str4) + 1))
{
break;
}
/* Copy to destination which would cause wrap, may crash */
strcpy((char *) ((unsigned long) -5), str2);
/* Copy an empty string over an array */
strcpy(str4, str3);
if (0 != memcmp(str4, str3, strlen(str3) + 1))
{
break;
}
/* Copy a single character string */
strcpy(str5, str6);
if (0 != memcmp(str5, str6, strlen(str6) + 1))
{
break;
}
/* Copy a string into an array */
strcpy(str5, str2);
if (0 != memcmp(str5, str2, strlen(str2) + 1))
{
break;
}
ret = 0;
} while (0);
return ret;
}
Conclusion
Our implementation of strcpy
appropriately handles invalid inputs as well as
copying from non-null terminated strings at the end of memory.
char *
strcpy(char *s1, const char *s2)
{
size_t len = strlen(s2);
if ( !s1 ||
!s2 ||
/* Check for wrapping while copying */
((((unsigned long) -1) - ((unsigned long) s1)) < len) ||
((((unsigned long) -1) - ((unsigned long) s2)) < len))
{
return s1;
}
memmove(s1, s2, len);
s1[len] = '\0';
return s1;
}
strlen
August 23, 2017
Although C doesn't have true string objects, it does have arrays of characters
and string literals (static character arrays) which we'll call "C strings."
Since there is no string object, nothing tracks the length of a C string, so
C must provide a way to denote the end of a string so the length can be
programmatically determined. This is accomplished by placing a NUL terminator,
'\0'
, after the last character.
Determining the length of a C string is such a common task that a function for
it is provided in the standard library, strlen
. Its prototype is as follows
size_t
strlen(const char *s);
It takes a pointer to the first character in a C string and returns the number of characters in the string, not counting the NUL terminator.
Common Ways
A simple approach to strlen is as follows
size_t
strlen(const char *s)
{
size_t len = 0;
for (; *s++; len++)
{
/* Do nothing */
}
return len;
}
Or similarly using a while loop:
size_t
strlen(const char *s)
{
size_t len = 0;
while (*s++)
{
len++;
}
return len;
}
These take advantage of the fact that the NUL terminator has an integer value of 0. The loop conditions are actually checking that the current character pointed to does not have the integer value of 0, in other words that it's not the NUL terminator. This could be further shortened as such
size_t
strlen(const char *s)
{
size_t len = 0;
while (*s++ && ++len);
return len;
}
Which goes a step further in utilizing the short circuit nature of C
conditionals meaning that if the first condition is not met then the second is
not tested. This means that the len
variable is only incremented when the
current character is not the NUL terminator.
Although these are straightforward and simple implementations, glibc, bsdlibc, and musl all use more complicated, but more efficient, versions which employ a technique similar to the one I discussed in the memcpy article. Once a word boundary is reached, the function tests an entire word at a time to detect a single zero valued byte.
The above implementations do not account for NULL pointers or for a string which is not NUL terminated and may wrap around memory. These are things we'll cover below.
Local Variables
For our implementation, we'll only use a single local variable as a pointer to the end of the string (whenever we find it). Its initial value will be the start of the string since it's possible that the string is zero characters long.
const char *end = s;
Parameter Validation
The only parameter to this function is the character pointer, s
, which we need
to verify is not NULL. This will be one condition of the test loop in the main
body of the function.
while (end)
{
}
Implementation
A loop is an obvious choice for iterating over a C string, each time checking
whether the current character is the NUL terminator. To factor in our parameter
validation and memory wrapping we'll have the first condition check the validity
of the end
pointer which was initialized with the value of s
. We need to
increment end
inside the loop body in case the check against \0
fails,
otherwise we'll increment it once more than intended when we reach the NUL
terminator.
while (end &&
('\0' != *end))
{
end++;
}
One of two things will cause the loop to terminate: either the end pointer is NULL or we reached the end of the string. A NULL end pointer represents an error condition that we account for by resetting the value of end. Similarly, if an unbounded string at the end of memory was passed to this function, it's possible that the pointer will be incremented after it already points to the last addressable byte. This results in undefined behavior, which we can attempt to handle by checking whether the resulting pointer is less than the original pointer. In such a case, we also reset the end pointer so the string length is reported as zero, since an error will likely occur otherwise.
Lastly, we subtract s
from the resulting end
pointer to get the length of
the string.
while (end &&
('\0' != *end))
{
end++;
}
if (!end ||
(end < s))
{
end = s;
}
return (size_t) (end - s);
Testing
We can easily test whether our input validation works by passing NULL
to
strlen
. We can also easily test the length of an empty string which should be
0, as well as a string with a single character in it. For a more "dynamic" test
we will use a character array initialized with a string literal. The array will
automatically have enough room allocated on the stack for every character in the
string literal plus one more for the NUL terminator. As such, the length of that
string should be the size of the array minus 1. We're unable to test strings
which wrap memory because doing so intentionally would invoke undefined behavior
for which the results are... undefined.
int
strlenTest(void)
{
int ret = -1;
char *str1 = NULL;
char str2[] = "hello, world";
char *str3 = "";
char *str4 = "a";
do
{
/* NULL pointer */
if (0 != strlen(str1))
{
break;
}
/* String length should match array size - 1 */
if ((sizeof(str2) - 1) != strlen(str2))
{
break;
}
/* Empty string */
if (0 != strlen(str3))
{
break;
}
/* Single character string */
if (1 != strlen(str4))
{
break;
}
ret = 0;
} while (0);
return ret;
}
Conclusion
Although simple, this implementation of strlen
affords us nice error checking
and handling of some undefined behavior, should it manifest in a detectable way.
size_t
strlen(const char *s)
{
const char *end = s;
/* Check s every time to safeguard against non-NUL terminated strings */
while (end &&
('\0' != *end))
{
end++;
}
/* Verify the loop didn't stop in an error and s didn't wrap memory */
if (!end ||
(end < s))
{
end = s;
}
return (size_t) (end - s);
}
memcmp
May 21, 2017
Comparing memory regions is an incredibly useful tool when writing C code. This
can be used to verify that two regions contain the same data or to compare
against a desired value with user-provided data. The memcmp
function provides
this functionality and has the following prototype
int
memcmp(const void *s1, const void *s2, size_t n);
The return value is an integer which indicates whether the first region is
greater than, equal to, or less than the second region. Since we only need to
read the data it makes sense that the pointers are const
which is an extra
safety measure to ensure we don't modify what is being compared.
Local Variables
As with memcpy
and memmove
, we need two local pointers that enable us to
perform comparisons. We will also use an integer to hold the value we will be
returning.
const unsigned char *pStr1 = s1;
const unsigned char *pStr2 = s2;
int ret = 0;
Parameter Validation
The only validation we need to perform is on the pointer values. The n
parameter is of type size_t
which means it will always be greater than or
equal to zero. This brings up an interesting question: what should the return
value be when no memory is compared? The Standard provides no direct answer to
this, but we can infer one from section 7.11.4, "Comparison functions", which
states
"The sign of a nonzero value returned by the comparison functions memcmp
,
strcmp
, and strncmp
is determined by the sign of the difference between the
first two values of the first pair of characters (both interpreted as unsigned
char
), that differ in the objects being compared."
So, if we were to return a nonzero value we would need to be able to perform a
comparison which we can't do if n
is zero. As such, if n
is zero then the
return value will also be zero. Therefore, we don't need to perform any
validation on n
since all values of n
are valid.
Lastly, we need to verify that the comparison will not cause the pointers to
wrap around memory. This will be done in the same manner as memmove
where we
add n
to each pointer and see if the result is less than the pointer itself.
Input errors will be indicated with a value of 1 since no comparison will be made and we cannot otherwise determine the correct sign of the difference according to the rules above.
if ( !s1 ||
!s2 ||
/* Check for wrapping while comparing */
((((unsigned long) -1) - ((unsigned long) s1)) < n) ||
((((unsigned long) -1) - ((unsigned long) s2)) < n))
{
return 1;
}
Implementation
A similar approach to memmove
will be taken where we loop over each character,
decrementing n
each iteration, but we will compare values rather than copy
them. We can store the compared value so that we can check it and break out of
the loop upon inequality like so
while (n-- > 0)
{
ret = *(pStr1++) - *(pStr2++);
if (0 != ret)
{
break;
}
}
However, this adds the overhead of storing that value for every iteration of the
loop. We only care about that value when it is not equal to zero so we should
only store it in that case. This requires us to back up the pointers one
step since they were incremented within the if
condition.
while (n-- > 0)
{
if (*(pStr1++) - *(pStr2++))
{
ret = *(--pStr1) - *(--pStr2);
break;
}
}
This implementation is \(O(n)\) which is as good as you can do while staying within the constraints of The Standard. The multi-character copy approach could also be used for comparisons but that's still \(O(n)\) and it violates The Standard.
Testing
Aside from testing the obvious cases like equal regions and differing regions
(both to get a greater than and less than result), we also need to test that our
input validation works as expected. This includes passing in NULL
pointers,
providing 0 for n
, and intentionally attempting comparisons which would cause a pointer to wrap
around memory. We'll order the tests such that we only verify
correct region comparisons after we verify that input validation works
appropriately.
int
memcmpTest(void)
{
int ret = 1;
char pStr1[] = "This is a Test string";
char pStr2[] = "This is a Test string";
char pStr3[] = "This is a Lesser string";
char pStr4[] = "This is a greater string";
char *pStr5 = NULL;
do
{
/* NULL s2 pointer */
if (1 != memcmp(pStr1, pStr5, sizeof(pStr1)))
{
break;
}
/* NULL s1 pointer */
if (1 != memcmp(pStr5, pStr1, sizeof(pStr1)))
{
break;
}
/* Cause a wrap from s1 */
if (1 != memcmp((void *) ((unsigned long) -5), pStr1, sizeof(pStr1)))
{
break;
}
/* Cause a wrap from s2 */
if (1 != memcmp(pStr1, (void *) ((unsigned long) -5), sizeof(pStr1)))
{
break;
}
/* Compare no characters */
if (0 != memcmp(pStr1, pStr2, 0))
{
break;
}
/* Compare no characters with an invalid s2 pointer */
if (0 == memcmp(pStr1, pStr5, 0))
{
break;
}
/* Compare no characters with an invalid s1 pointer */
if (0 == memcmp(pStr5, pStr1, 0))
{
break;
}
/* Test equality */
if (0 != memcmp(pStr1, pStr2, sizeof(pStr1)))
{
break;
}
/* First string greater than second string */
if (0 >= memcmp(pStr1, pStr3, sizeof(pStr1)))
{
break;
}
/* First string less than second string */
if (0 <= memcmp(pStr1, pStr4, sizeof(pStr1)))
{
break;
}
ret = 0;
} while (0);
return ret;
}
Conclusion
This function follows the same pattern that we've seen in memcpy
and memmove
with the only differences being that a comparison is performed instead of a copy
and that we don't need to worry about comparing in a certain direction since
data is never written. Remember that comparing zero bytes of memory will produce
a return value indicating that the regions are equal. Now that we have memcmp
,
we can use it to test future functions like strcpy
or strcat
.
int
memcmp(const void *s1, const void *s2, size_t n)
{
const unsigned char *pStr1 = s1;
const unsigned char *pStr2 = s2;
int ret = 0;
if ( !s1 ||
!s2 ||
/* Check for wrapping while comparing */
((((unsigned long) -1) - ((unsigned long) s1)) < n) ||
((((unsigned long) -1) - ((unsigned long) s2)) < n))
{
return 1;
}
while (n-- > 0)
{
if (*(pStr1++) - *(pStr2++))
{
ret = *(--pStr1) - *(--pStr2);
break;
}
}
return ret;
}
One Function Per File
May 20, 2017
tags: assembling, compiling, design, linking, make
A common layout model seen in most libc implementations (e.g. glibc, bsdlibc, musl, etc.) is to have one source file for every function in libc. For example, memcpy.c, memmove.c residing under a string/ directory. This is in contrast to having one source file corresponding to every libc header (i.e. string.c for string.h). We'll explore why this approach is taken and then demonstrate how to convert to the more common model.
Static Linking
One of the root causes of this layout model is static linking against the standard library. Static linking takes whatever you're linking against, like libc, and embeds those functions into your output binary. This increases the size of your binary somewhat but frees you from worrying about what version of that library may or may not be installed on whatever system your user is running on.
Dynamically linking places some requirements on the version of the library
you're linking against. If your user chose to upgrade their standard C library
and that library provides more accurate results for sin()
than you're
expecting then you may run into issues.
This decision means we're trading portability for size or vice versa. Since we're developing a library that will be linked against we want to be considerate to our users so that this decision is easier to make. More specifically, we want the output binary size to be as small as possible when someone statically links against welibc. How do we do that? Place every function in its own file. The real question that we'll be answering is why placing one function per file produces smaller binaries.
Binary Size
To find out why one function per file is better, we need to do a comparison of statically linking against a library written one way versus the other. The test code will use one, but not both, of the functions we have so far in welibc so that we can show when, or if, unnecessary code shows up in our output binary. If you use all of the functions then the approach doesn't matter since you get the entire library anyways. Our test linking program, link.c, will be the following:
#include <string.h>
int
main(int argc, char *argv[])
{
char src[] = "Hello, world!";
char dst[20] = { 0 };
memcpy(dst, src, sizeof(src));
return 0;
}
This program will be compiled and linked against welibc in the same way the test code is done:
$ gcc -nostartfiles -nodefaultlibs -nostdinc -nostdlib -ffreestanding -isystem ./include -ansi -c link.c -o link.o
$ gcc -nostartfiles -nodefaultlibs -nostdinc -nostdlib -ffreestanding -isystem ./include -ansi link.o -lwelibc -L. -O0 -o test_link
One modification I made is to remove the use of the -ggdb3 flag so that no debugging information is generated. This will simplify things in a moment.
Comparison
welibc is currently written with one source file per header so we'll look at that before changing anything. Using the above procedure and renaming the output file to "combined", we get an output file size of 2164 bytes.
Separating out memmove
to a dedicated C file, rebuilding welibc, and relinking
the test code to welibc with a file name of "separate" gives a file size of
1892 bytes.
How do we find out where the 272 byte increase comes from? objdump
will tell
us exactly what is contained in the binaries and we can diff the output of each
binary:
$ objdump -d combined > comb.txt
$ objdump -d separate > sep.txt
$ diff comb.txt sep.txt
2c2
< combined: file format elf64-x86-64
---
> separate: file format elf64-x86-64
101,162d100
<
< 0000000000400282 <memmove>:
< 400282: 55 push %rbp
< 400283: 48 89 e5 mov %rsp,%rbp
< 400286: 48 89 7d e8 mov %rdi,-0x18(%rbp)
...
The truncated output contains the rest of the instructions for the memmove
function. The only difference is that the "combined" binary also contains the
memmove
function (which was never called). Let's see exactly how large the
function is by subtracting the address of the first byte of the first
instruction (0x400282) from the address of the last byte of the last instruction.
$ diff comb.txt sep.txt | tail
< 40032f: 48 8b 45 f8 mov -0x8(%rbp),%rax
< 400333: 88 10 mov %dl,(%rax)
< 400335: 48 8b 45 d8 mov -0x28(%rbp),%rax
< 400339: 48 8d 50 ff lea -0x1(%rax),%rdx
< 40033d: 48 89 55 d8 mov %rdx,-0x28(%rbp)
< 400341: 48 85 c0 test %rax,%rax
< 400344: 75 d8 jne 40031e <memmove+0x9c>
< 400346: 48 8b 45 e8 mov -0x18(%rbp),%rax
< 40034a: 5d pop %rbp
< 40034b: c3 retq
$ echo $((0x40034b - 0x400282))
201
The difference in the file sizes was 272 bytes, though, so there are still 71 bytes unaccounted for. These are attributed to other changes required to facilitate this extra function, which come in the form of added, or slightly changed, instructions in the exception header and frame.
It's clear that separating functions to their own source file ensures that programs which statically link against welibc will only receive the functions they call. Next we'll discover why this happens.
Translation Units
During the compilation of C programs the compiler will deal in discrete pieces of the larger program which are referred to as translation units. The Standard defines a translation unit as "A source file together with all the headers and source files included via the preprocessing directive #include, less any source lines previously skipped by any of the conditional inclusion preprocessing directives ...".
When all string.h functions are included in a single source file they get lumped
into a single translation unit. When you make a call into code within a
translation unit, the entire unit must be included in the final output
executable. When all functions for a given section of the standard library are
in the same file (i.e. string.c), you end up including all functions even if
only one was called. This is seen with the combined executable including
memmove
even though only memcpy
was called.
Dynamic Linking
Now that we know why separating functions into their own files is beneficial for static linking, we need to explore what effect this will have on dynamic linking.
When dynamic linking is used, the operating system will load a library into an address space the first time an executable needs to use it. That library is then made available to any other executables that need to use it, without the overhead of it being loaded. The operating system takes care of linking the running executables with the library at run time (hence the name dynamic).
When a library is compiled with the intention of being used with dynamic linking, the output size isn't a concern because the price of loading is paid only once. Moving each function to its own file effectively makes no difference for dynamic loading since all code is included anyways.
Reorganizing
The way I'd like to lay out the source code, and the way other projects have also done it, is with directories that correspond to each header file. For example:
└── src/
├── assert
├── errno
├── math
├── stddef
├── stdio
├── stdlib
└── string
This means that some changes need to be made to the Makefile so that it will find code two directories deep.
Gather Source Directories
First we need to find all directories that contain source code. After creating some new directories and shuffling the files around, the layout looks like so:
src/
├── errno
│ └── errno.c
├── _start.s
└── string
├── memcpy.c
└── memmove.c
Make has a builtin function, wildcard
, which will expand each pattern it
receives into the files and directories that it finds.
$(wildcard $(SRCDIR)/*/)
With the current layout, this will expand to
src/string/ src/errno/ src/_start.s
We don't want "src/_start.s" in the mix, so we can use the builtin dir
function to extract the directory portion of each filename:
$(dir $(wildcard $(SRCDIR)/*/))
Which gives us:
src/string/ src/errno/ src/
Finally, we will use sort
to order the names and remove any possible
duplicates. This list of directories will be stored in a variable for later use:
SRC_DIRS := $(sort $(dir $(wildcard $(SRCDIR)/*/)))
Gather Source Files
Now that we've found all directories containing source code, we want to gather
all source files that exist within those directories. The addsuffix
builtin
will add the given suffix to every item in the list; this is perfect for using
another wildcard to find all files of a given type. Finally, the notdir
function is used to extract just the filename from an item.
C_SRCS := $(notdir $(wildcard $(addsuffix *.c,$(SRC_DIRS))))
S_SRCS := $(notdir $(wildcard $(addsuffix *.s,$(SRC_DIRS))))
This gives us a listing of all current source files:
errno.c memmove.c memcmp.c memcpy.c
_start.s
Convert Sources into Objects
Next we will slightly alter the existing pattern substitutions so that the input
data comes from $(C_SRCS)
and $(S_SRCS)
.
OBJECTS := $(patsubst %.c,%.o,$(C_SRCS))
OBJECTS += $(patsubst %.s,%.o,$(S_SRCS))
Modify VPATH
Now that we know the names of all source files and the directories in which they
exist, we need to modify the VPATH variable so that it contains all of the
source directories. This way Make can find the source files when we refer to
them by filename only, rather than by full path. This means switching out
$(SRCDIR)
for $(SRC_DIRS)
.
VPATH := $(SRC_DIRS):$(TSTDIR)
Conclusion
After seeing the advantages of placing every function in its own file it's clear that this is the way welibc should be organized. This only required a few adjustments to the Makefile and we're all set. Now binaries that link against welibc will be leaner and the code will be easier to browse.
memmove
December 16, 2016
The followup function to memcpy
is memmove
which performs the same
operations but with the guarantee that overlapping memory regions will not cause
any issue. The stipulation added by memmove
is that "copying takes place as if
the n characters from the object pointed to by s2 are first copied into a
temporary array of n characters that does not overlap the objects pointed to by
s1 and s2, and then the n characters from the temporary array are copied into
the object pointed to by s1." The prototype is as follows
void *
memmove(void *s1, const void *s2, size_t n);
As with memcpy
, the return value is not very useful and provides no way of
indicating an error. Similarly, s2
is a const void *
meaning that we cannot
modify the data to which it points.
The "Obvious" Approach
Given the wording of the extra stipulation, the logical approach would be to
allocate a chunk of new temporary memory, copy from s2
to the temporary
region, copy from the temporary region to s1
, and finally free the temporary
region. Something like this:
char *pDst = s1;
const char *pSrc = s2;
char *pTmp = NULL;
/* Error checking */
pTmp = malloc(n);
if (NULL == pTmp)
{
return s1;
}
memcpy(pTmp, pSrc, n);
memcpy(pDst, pTmp, n);
free(pTmp);
pTmp = NULL;
return s1;
However, this has an issue if the call to malloc
fails. From implementing
memcpy
we already know that we can copy objects backwards to account for
object overlap, so if the malloc
fails we can just do that.
char *pDst = s1;
const char *pSrc = s2;
char *pTmp = NULL;
/* Error checking */
pTmp = malloc(n);
if (NULL != pTmp)
{
memcpy(pTmp, pSrc, n);
memcpy(pDst, pTmp, n);
free(pTmp);
pTmp = NULL;
}
else
{
/* Copy forward or backward to handle object overlap */
if (pDst < pSrc)
{
while (0 < n--)
{
*(pDst++) = *(pSrc++);
}
}
else
{
pDst += n;
pSrc += n;
while (0 < n--)
{
*(--pDst) = *(--pSrc);
}
}
}
return s1;
However, this is extra logic and code that's unnecessary. We can use the same
strategy from memcpy from the start and not worry about malloc
failing,
using extra local variables, or having a new branch in the code to check
malloc
's return value.
The Simple Approach
By removing the use of malloc
and memcpy
, our memmove
implementation looks
like so:
char *pDst = s1;
const char *pSrc = s2;
/* Error checking */
/* Copy forward or backward to handle object overlap */
if (pDst < pSrc)
{
while (0 < n--)
{
*(pDst++) = *(pSrc++);
}
}
else
{
pDst += n;
pSrc += n;
while (0 < n--)
{
*(--pDst) = *(--pSrc);
}
}
return s1;
This should look familiar because it's exactly the same code found in our
implementation of memcpy
. In fact, we can even reuse the same error checking
and once added, memmove
will be implemented identically to memcpy
. So, why
don't we just call one function straight from the other? Well, we could call
memmove
from memcpy
because the former operates under stricter guidelines
that also fulfill the job of the latter. However, we could not call memcpy
from memmove
because the former makes no guarantee about copying between
overlapping objects and this is a requirement for memmove
.
Testing
Since memmove
provides similar functionality to memcpy
we can also reuse our
tests by replacing every call to memcpy
with a call to memmove
. See the
memcpy post for the tests.
Conclusion
In the end, this is a copy of our implementation of memcpy
and we could
slightly clean up the code by calling memmove
from memcpy
. We can reuse the
same error checking to end at the following implementation for memmove
.
void *
memmove(void *s1, const void *s2, size_t n)
{
char *pDst = s1;
const char *pSrc = s2;
if ( !s1 ||
!s2 ||
/* Check for wrapping while copying */
((((unsigned long) -1) - ((unsigned long) s1)) < n) ||
((((unsigned long) -1) - ((unsigned long) s2)) < n))
{
return s1;
}
/* Copy forward or backward to handle object overlap */
if (pDst < pSrc)
{
while (0 < n--)
{
*(pDst++) = *(pSrc++);
}
}
else
{
pDst += n;
pSrc += n;
while (0 < n--)
{
*(--pDst) = *(--pSrc);
}
}
return s1;
}
memcpy
March 08, 2016
The inaugural string.h
function is memcpy
which will copy the specified
number of characters from one object to another. It takes two character
pointers, one to the destination object and one to the source object, along with
the number of characters to copy. It returns a pointer to the destination
object.
void *
memcpy(void *s1, const void *s2, size_t n);
As for the return value, the standard makes no indication that an error value
would be returned if something went wrong, so memcpy
lacks a way to do
meaningful error checking. Also note that s2
is a const void *
, meaning that
the data to which it points won't be modifiable through the s2
pointer, but
the pointer itself could be changed.
Local Variables
Since this function isn't very complicated, we will only need two local
variables, a source pointer and a destination pointer. But, isn't that what s1
and s2
already are? Well, they are in a sense, but since they are void
pointers we can't do anything with them because they don't point to any
specific type. What we need to do is copy those pointers into char
pointers so
that we can read and write characters.
char *pDst = s1;
const char *pSrc = s2;
Note that our source needs to be a const char *
to match the const
that is
attached to s2
. Without this, you should get a warning about discarding the
const
qualifier:
src/string.c: In function ‘memcpy’:
src/string.c:59:21: warning: initialization discards ‘const’ qualifier from
pointer target type [enabled by default]
char *pSrc = s2;
^
Parameter Validation
The first thing we should do is check our inputs to be sure they are valid
before we use them. The basic checks here would be ensuring that we don't have
NULL
pointers and that our size isn't zero:
if ( !s1 ||
!s2 ||
!n)
{
return s1;
}
This will prevent us from having a NULL
pointer dereference when we read or
write a character and prevent us from executing unnecessary instructions if we
didn't need to perform any actual copying.
The standard explicitly states that 'If copying takes place between objects that
overlap, the behavior is undefined.' So, we don't have to verify that s1
isn't
in the range of [s2
, s2 + n - 1
], or that s2
isn't in the range [s1
, s1
+ n - 1
]. Undefined behavior means that the standard allows us to ignore this
behavior, to implement a specific behavior that is then required to
be documented, or to terminate execution completely. For right now, we will just
ignore the behavior.
Function Body
Since we have the number of characters to copy stored in n, we'll use that as a loop counter and compare it against 0 to determine when we need to stop copying. We can then use our source and destination pointers to copy a single character at a time. Once we're done copying, all we need to do is return the original destination pointer.
while (0 < n--)
{
    *(pDst++) = *(pSrc++);
}
return s1;
Security
Although the standard says we can just ignore overlapping objects pointed to by s1 and s2, we should be able to easily verify that the objects don't overlap with the simple range checks that were mentioned before. Since size_t is an unsigned type, we can use suggestions from the INT30-C section of the CERT C Coding Standard to check for overlap and wrapping:
if ( !s1 ||
     !s2 ||
     !n ||
     /* Check for object overlap */
     ((((unsigned long) s2) - ((unsigned long) s1)) < n) ||
     ((((unsigned long) s1) - ((unsigned long) s2)) < n) ||
     /* Check for wrapping while copying */
     ((((unsigned long) -1) - ((unsigned long) s1)) < n) ||
     ((((unsigned long) -1) - ((unsigned long) s2)) < n))
{
    return s1;
}
Due to a limitation of C90, there is no integral type guaranteed to be large enough to hold the value of a pointer, but in order to perform this error checking we must convert the pointers to an integral type. The largest possible type available in C90 is unsigned long, so that is what we'll convert the pointers to in order to perform arithmetic on them. We'll add a test later on to ensure that this assumption holds on the system where this library is tested, and the function comment will include a warning about it.
That's a lot of checking, but it provides validation that our copy should succeed. There are other errors that could still happen, like reading outside of a memory boundary, but they are difficult (or impossible) to prevent in a general enough way to be included in the library. There are two camps here: either you let users attempt something bad and leave the error checking to them, or you prevent it for them at the expense of extra cycles spent on error checking in the library itself. Most implementations are done in the first way, and I think it would be interesting to do it the latter way.
Backwards copy?
Another use for memcpy would be to slide a region of data over by some amount. But what if we slide the region over by so little that its destination is actually within the original region? Well, our security checks will prevent that from happening! This is nice, but in actuality a user may need to do this very thing, and the standard doesn't actually say we have to prevent it. To allow this we'll first have to remove the overlap checks.
Now that we allow the source and destination regions to overlap, let's look at what happens when we slide a region to lower memory such that the destination region will overlap with the source. Assume that we have the characters A, B, C, and D stored in memory adjacent to each other at address 0x8000, and we want to move them over to address 0x7FFE.

First, we read an A from 0x8000 and store it at 0x7FFE. Next, we read a B from 0x8001 and store it at 0x7FFF. Next we read a C from 0x8002 and store it at 0x8000, overwriting the A from our source region. Finally, we read a D from 0x8003 and store it at 0x8001, overwriting the B from our source region. We end up with A, B, C, D starting at address 0x7FFE.

Although we overwrote our source region with new data, it wasn't an issue because we read from the source locations before overwriting them, and our memcpy was a success.
Next, let's look at what happens when we slide a region to higher memory such that the destination region will overlap with the source. We'll use the same source address and data, but the destination will be address 0x8002.

First, we read an A from 0x8000 and store it at 0x8002, overwriting the C from our source region. Next, we read a B from 0x8001 and store it at 0x8003, overwriting the D from our source region. Next, we read an A from 0x8002 and store it at 0x8004. Finally, we read a B from 0x8003 and store it at 0x8005. We end up with A, B, A, B starting at address 0x8002. This is the incorrect result, and it happened because we overwrote source locations before reading from them.
So, how can we implement memcpy in such a way that the source and destination may overlap but the copy will still happen properly? Well, for the case where we slide the region into higher memory, we can simply copy from the end of the region to the start, rather than the normal method of copying from start to end.

We'll redo our failed copy backwards and see what happens. First, we read a D from 0x8003 and store it at 0x8005. Next, we read a C from 0x8002 and store it at 0x8004. Next, we read a B from 0x8001 and store it at 0x8003, overwriting the D from our source region. Finally, we read an A from 0x8000 and store it at 0x8002, overwriting the C from our source region. We end up with A, B, C, D starting at address 0x8002, and our memcpy was a success when done backwards.
In the same way that the forward copy failed when you have overlapping regions and move from lower to higher memory, a similar copy from higher to lower memory will fail when done backwards. As such, we must check the source and destination and perform a forward or backward copy as appropriate. The most straightforward way to do this is by having two copies of the same loop, where one is implemented as normal while the other moves the pointers to the end of the region and decrements them as it copies.
/* Copy forward or backward to handle object overlap */
if (pDst < pSrc)
{
    while (0 < n--)
    {
        *(pDst++) = *(pSrc++);
    }
}
else
{
    pDst += n;
    pSrc += n;
    while (0 < n--)
    {
        *(--pDst) = *(--pSrc);
    }
}
Our backward copy works very similarly to the forward copy, with the exception of where our pointers start. Since we use the pre-decrement operator, we must start out past the region we want to copy. This allows us to back up into the valid region and then perform the copy.
Efficiency
When you look at the complexity of an algorithm you can measure two things to get an idea of how the algorithm will perform: time and space. We'll look at how to analyze them both, how to improve the efficiency of memcpy, and some other generic notes on efficiency. That being said, keep in mind the famous quote, "Premature optimization is the root of all evil." Try not to be too clever, and save the hardcore optimizations for when there is a true need for them (i.e. your algorithm isn't meeting speed requirements or it's using too much memory).
Before getting into the details of efficiency, it's worth noting that the algorithm we have at this point is truly as efficient as it can be according to the standard. This is due to strict aliasing and object access rules, which limit the ways in which we can read or write data. The standard tells us that a pointer to char may be used to access any other type of object, which is why those are the only pointers we can use. Unfortunately, this limits us to copying one byte at a time.
For the sake of exploration we'll look at using other pointer types to perform the copy in a faster way, but remember that this breaks the rules of the standard.
Time
A common way to measure the time of an algorithm is to use Big O Notation, which represents an upper bound on the growth of a function in relation to its input. This is appropriate for C code because each line, comparison, or assignment could actually take multiple instructions to execute depending on what hardware it runs on. Using Big O Notation allows us to normalize our measurements, providing an accurate description of our function regardless of where it runs.
For this memcpy algorithm, we can see that there is a direct correlation between the amount of input data, n, and the number of steps required to complete the copy. This means our function can be described with \(O(n)\) in relation to how long it takes to execute.
Space
We can also use Big O Notation to describe the amount of memory, or space, that a function uses to do its work. With C programming, you may have arrays or multiple variables used to keep track of state, but the space for those will be constant and most likely on the stack. There is also the concept of dynamic memory allocation, where you use variable amounts of memory from the heap; these will be indicated by calls to functions like malloc or calloc. Since the amount of memory used with dynamic memory allocation is not known until run time, this is the memory we are interested in measuring. Most other uses of memory will be known at compile time and will remain constant.
Our implementation of memcpy does not make use of dynamic memory allocation, which means it will use the same amount of memory every time, regardless of the size of the input data. We can describe this behavior with \(O(1)\), indicating constant memory usage.
Multi-character copy
Copying one character at a time could be pretty slow if we were copying a large number of characters. It could be more efficient to do four characters at a time using an int pointer (which points to four bytes on x86_64), or even eight characters at a time using a long pointer (which points to eight bytes on x86_64). This will incur some additional overhead for calculating which type to use for which values of n, as well as extra logic to copy the leftover characters if n is not a multiple of the chosen type's size. Let's explore this.
Side-note: copying via a pointer to a non-char type will violate the strict-aliasing rules of C, because only pointers to char types may be used to access objects of any other type. Since we don't have any type information about the objects being copied to or from, we can only use pointers to char types if we want to stay within the bounds of the strict-aliasing rules.
Alignment
The first thing we'll look at is alignment, which deals with how data is arranged and accessed in memory. Data said to be "n-byte aligned" will always be at a memory address that is a multiple of n. For example, four-byte aligned data will always be at an address that ends in 0x0, 0x4, 0x8, or 0xC, because those are all multiples of four. This is an important concept when copying memory because different amounts of data can be accessed depending on the type used for access, but this only works if the addresses match the alignment of that type.
Each type in C may have a different alignment due to its size in bytes. Examples from x86_64 would be char, short, int, and long, which have sizes of one, two, four, and eight bytes, respectively, and also have the same values for their alignment. This means that storing 0xDEADBEEF at address 0x8000 might look like so:
unsigned char *pChar = (unsigned char *) 0x8000;
unsigned short *pShort = (unsigned short *) 0x8000;
unsigned int *pInt = (unsigned int *) 0x8000;
unsigned long *pLong = (unsigned long *) 0x8000;
*pChar = 0xEF;
*(pChar + 1) = 0xBE;
*(pChar + 2) = 0xAD;
*(pChar + 3) = 0xDE;
*pShort = 0xBEEF;
*(pShort + 1) = 0xDEAD;
*pInt = 0xDEADBEEF;
*pLong = 0xDEADBEEF;
As you can see, the larger the size of a type, the larger the amount of data that can be accessed at one time. However, there could be a point at which you stop seeing benefits, as can be seen when we used the int and long pointers.
The address we chose, 0x8000, is special because it fits the alignment of each of the four types. This won't always be the case, and this is where problems start to happen. If we had chosen the address 0x8002, then the only valid types for accessing that address would be char and short, because 0x8002 is a multiple of both one and two, but not four or eight. This will come into play when attempting to do multi-byte copies, because both the source and destination addresses will need to have the same alignment. The convenient part about copying one char at a time is that every address is one-byte aligned, so there are no alignment issues to worry about.
The largest amount of data that a processor can access at one time is referred to as a word, so if we can copy a word at a time then we will theoretically be copying memory as fast as possible. The word size on each machine will change, but it will always be a multiple of the largest data type available on that machine. In terms of C90, this means copying a long at a time is the best we can do. The sign of the long is irrelevant here because an unsigned long occupies the same amount of memory as a long.
For more information about alignment, see this article by Jonathan Rentzsch.
Calculating Efficiency Gains
The second thing we'll look at is how to determine which type will be the most efficient at copying a certain amount of data. To do this, we need to consider how many steps are taken to perform a copy. For simplicity's sake, we will consider one copy to be one step. If we use different types, then that one step may copy one byte of data or it may copy eight.
To take care of any alignment issues (assuming that the source and destination are capable of being aligned), we will assume that our copies start one byte past an alignment boundary, and end one byte before an alignment boundary which will represent the worst-case scenario. For an n-byte aligned type, this means we have \(n-1\) steps before and after we perform the n-byte aligned bulk copy. This also means that an n-byte aligned type will only be used to copy \(3n - 2\) characters, or more. Otherwise, we wouldn't have any characters left to copy after handling alignment.
For x86_64, our thresholds for performing copies with each type are as follows:
- char, 1-byte aligned: \(3(1) - 2 = 3 - 2 = 1\)
- short, 2-byte aligned: \(3(2) - 2 = 6 - 2 = 4\)
- int, 4-byte aligned: \(3(4) - 2 = 12 - 2 = 10\)
- long, 8-byte aligned: \(3(8) - 2 = 24 - 2 = 22\)
As you can see, the thresholds are very close together. To determine how many steps, in the worst case, are used to copy \(x\) number of characters with an \(n\)-byte aligned type, we will derive an equation based on the three parts of the copy: a) getting to an alignment boundary, b) the bulk copy, and c) copying whatever is leftover:

\[ steps = a + b + c \]

And if we substitute in the worst-case leftover, \(c = n - 1\), we can simplify:

\[ steps = a + b + (n - 1) \]

Then substitute in \(b = \frac{x - a - (n - 1)}{n}\) along with the worst case \(a = n - 1\), since \(a\) isn't needed anymore:

\[ steps = 2(n - 1) + \frac{x - 2(n - 1)}{n} \]

As an example, let's compare copying 100 characters using short and long types.
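Plugging \(x = 100\) into the worst-case step count \(2(n - 1) + \frac{x - 2(n - 1)}{n}\) (assuming worst-case alignment at both ends of the copy), the comparison works out roughly to:

```latex
\begin{aligned}
\texttt{char},\ n = 1 &: 2(0) + \tfrac{100 - 0}{1} = 100 \text{ steps} \\
\texttt{short},\ n = 2 &: 2(1) + \tfrac{100 - 2}{2} = 2 + 49 = 51 \text{ steps} \\
\texttt{long},\ n = 8 &: 2(7) + \tfrac{100 - 14}{8} = 14 + 10.75 \approx 25 \text{ steps}
\end{aligned}
```

So the long copy takes roughly half the steps of the short copy, and about a quarter of the plain char copy.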
Considering the range of size_t, which dictates how many characters get copied at once, the efficiency gains of using a larger type are noticed even for very small values of n.
Implementing a typeless memcpy
Next we'll look at implementing a multi-byte memcpy by starting with the basic copy from before and building up to a version which will work regardless of the type used for the "bulk" copy portion. Here was our original copy loop:
while (0 < n--)
{
    *(pDst++) = *(pSrc++);
}
This relied on the size of each copy being one byte allowing us to use the post-decrement operator to reduce our counter. More generically, this loop would be:
while (0 < n)
{
    *(pDst++) = *(pSrc++);
    n -= sizeof(*pDst);
}
Which will accomplish the same thing while working with any type chosen for pDst and pSrc (assuming they're the same). For now we are assuming that both pDst and pSrc point to memory addresses which are aligned and that n is a multiple of sizeof(*pDst); we'll deal with that later. We need to further abstract this to account for getting to an aligned boundary before the bulk copy, and then to copy any leftover bytes.
Like we mentioned with our object overlap checks, C90 doesn't have an integral type that is guaranteed to be large enough to hold the value of a pointer, and we need such a type in order to check the alignment of a pointer. The best we can do is use the largest type available to us, long, and note the potentially invalid assumption that it will hold the value of a pointer. In our tests we can include a check to ensure that this is true.
char *pCharDst = s1;
const char *pCharSrc = s2;
my_type *pDst = NULL;
const my_type *pSrc = NULL;
/* Error checking */
while ((0 < (((unsigned long) pCharDst) & (sizeof(*pDst) - 1))) &&
       (0 < n--))
{
    *(pCharDst++) = *(pCharSrc++);
}
pDst = (my_type *) pCharDst;
pSrc = (my_type *) pCharSrc;
while (sizeof(*pDst) <= n)
{
    *(pDst++) = *(pSrc++);
    n -= sizeof(*pDst);
}
pCharDst = (char *) pDst;
pCharSrc = (char *) pSrc;
while (0 < n--)
{
    *(pCharDst++) = *(pCharSrc++);
}
Finally, we need to deal with our alignment assumption by checking the source and destination addresses for matching alignment. If their offsets from an alignment boundary do not match, then they will never sync up to both match the alignment of the bulk copy type at the same time. In this case, the best we can do is use our generic char copy.
char *pCharDst = s1;
const char *pCharSrc = s2;
my_type *pDst = NULL;
const my_type *pSrc = NULL;
/* Error checking */
if ((((unsigned long) pCharDst) & (sizeof(*pDst) - 1)) ==
    (((unsigned long) pCharSrc) & (sizeof(*pSrc) - 1)))
{
    while ((0 < (((unsigned long) pCharDst) & (sizeof(*pDst) - 1))) &&
           (0 < n--))
    {
        *(pCharDst++) = *(pCharSrc++);
    }
    pDst = (my_type *) pCharDst;
    pSrc = (my_type *) pCharSrc;
    while (sizeof(*pDst) <= n)
    {
        *(pDst++) = *(pSrc++);
        n -= sizeof(*pDst);
    }
    pCharDst = (char *) pDst;
    pCharSrc = (char *) pSrc;
}
while (0 < n--)
{
    *(pCharDst++) = *(pCharSrc++);
}
Choosing when to switch types
Now we have all of the tools we need to write a multi-byte memcpy. Before fully implementing it, let's look at our thresholds one more time (again, these are specific to x86_64):
- char, 1-byte aligned: \(3(1) - 2 = 3 - 2 = 1\)
- short, 2-byte aligned: \(3(2) - 2 = 6 - 2 = 4\)
- int, 4-byte aligned: \(3(4) - 2 = 12 - 2 = 10\)
- long, 8-byte aligned: \(3(8) - 2 = 24 - 2 = 22\)
Not only are these the lowest values of n that are required to use that specific type, they are also the first values of n where copying with the smaller type will take more steps than with the larger. This means that the minimum count thresholds double as the efficiency thresholds. We could use a simple check to decide on which type to use for the bulk copy, and then use the above code with the appropriate type substituted for pDst and pSrc. Due to the nature of alignment, ensuring that something is aligned to the largest type guarantees that alignment is satisfied for any smaller type (though I wouldn't be surprised if there was an architecture for which that wasn't true). This means we can always perform our alignment check using a modulus of the largest type, sizeof(long), and if it passes then we're guaranteed that the source and destination can ultimately be aligned for any type.
#define MEMCPY_LONG_THRESHOLD (22)
#define MEMCPY_INT_THRESHOLD (10)
#define MEMCPY_SHORT_THRESHOLD (4)
char *pCharDst = s1;
const char *pCharSrc = s2;
short *pShortDst = NULL;
const short *pShortSrc = NULL;
int *pIntDst = NULL;
const int *pIntSrc = NULL;
long *pLongDst = NULL;
const long *pLongSrc = NULL;
/* Error checking */
if ((MEMCPY_SHORT_THRESHOLD <= n) &&
    ((((unsigned long) pCharDst) & (sizeof(*pLongDst) - 1)) ==
     (((unsigned long) pCharSrc) & (sizeof(*pLongSrc) - 1))))
{
    if (MEMCPY_LONG_THRESHOLD <= n)
    {
        while ((0 < (((unsigned long) pCharDst) & (sizeof(*pLongDst) - 1))) &&
               (0 < n--))
        {
            *(pCharDst++) = *(pCharSrc++);
        }
        pLongDst = (long *) pCharDst;
        pLongSrc = (long *) pCharSrc;
        while (sizeof(*pLongDst) <= n)
        {
            *(pLongDst++) = *(pLongSrc++);
            n -= sizeof(*pLongDst);
        }
        pCharDst = (char *) pLongDst;
        pCharSrc = (char *) pLongSrc;
    }
    if (MEMCPY_INT_THRESHOLD <= n)
    {
        while ((0 < (((unsigned long) pCharDst) & (sizeof(*pIntDst) - 1))) &&
               (0 < n--))
        {
            *(pCharDst++) = *(pCharSrc++);
        }
        pIntDst = (int *) pCharDst;
        pIntSrc = (int *) pCharSrc;
        while (sizeof(*pIntDst) <= n)
        {
            *(pIntDst++) = *(pIntSrc++);
            n -= sizeof(*pIntDst);
        }
        pCharDst = (char *) pIntDst;
        pCharSrc = (char *) pIntSrc;
    }
    if (MEMCPY_SHORT_THRESHOLD <= n)
    {
        while ((0 < (((unsigned long) pCharDst) & (sizeof(*pShortDst) - 1))) &&
               (0 < n--))
        {
            *(pCharDst++) = *(pCharSrc++);
        }
        pShortDst = (short *) pCharDst;
        pShortSrc = (short *) pCharSrc;
        while (sizeof(*pShortDst) <= n)
        {
            *(pShortDst++) = *(pShortSrc++);
            n -= sizeof(*pShortDst);
        }
        pCharDst = (char *) pShortDst;
        pCharSrc = (char *) pShortSrc;
    }
}
while (0 < n--)
{
    *(pCharDst++) = *(pCharSrc++);
}
You'll notice that we used if statements for each of the bulk copies, rather than if else for the int and short copies. This is because we aren't sure what type of system we will run on, and there is a possibility that what is leftover from one bulk copy may be enough to do a smaller bulk copy. On x86_64 this could be seen when copying 22 bytes when the source and destination addresses are already aligned to the size of a long. Two bulk copies would happen with a long pointer, followed by three copies with a short pointer.
While this will mostly guarantee you get the fastest copy for any value of n, it has a lot of repetitive code and is overly complicated just to provide faster copies for a very small window of n values. In addition, if we look at each comparison as another step, then a copy using short pointers has completely lost any efficiency gain it would've had, because we spent five steps just to figure out we should be using shorts. Performing more detailed step or efficiency analysis really requires you to look at the assembly level, and that's not what we're here to do. With that said, a copy using short pointers probably won't have any real benefits over the basic char copy. Likewise, the int copy would show gains for such a small window that it's unnecessary to implement unless your specific application utilizes a lot of copies in that range.
Now that we're down to only performing bulk copies with long pointers, we should consider the overhead of checking n, checking the alignment, and adjusting pointers as necessary for a successful multi-byte copy. The added overhead means that we should increase the threshold required to perform the extra work, otherwise we aren't actually gaining anything. We'll bump up our threshold by an order of magnitude by setting it to 200. You could more finely tune this threshold by looking at the assembly output of your memcpy implementation and comparing exactly how many instructions are used for the extra checks and the bulk copy itself. Our final multi-byte implementation (sans error checking) is below. Each loop here represents each piece of our efficiency equation from before: \(a\) to get to an alignment boundary, \(b\) to perform the bulk copy, and \(c\) to copy any leftovers. Lastly, we've added two new variables to track the alignment offsets for us so we only have to calculate them once.
#define MEMCPY_EFFICIENCY_THRESHOLD (200)
void *
memcpy(void *s1, const void *s2, size_t n)
{
    char *pCharDst = s1;
    const char *pCharSrc = s2;
    long *pLongDst = NULL;
    const long *pLongSrc = NULL;
    size_t dstOffset = 0;
    size_t srcOffset = 0;
    /* Error checking */
    dstOffset = ((unsigned long) pCharDst) & (sizeof(*pLongDst) - 1);
    srcOffset = ((unsigned long) pCharSrc) & (sizeof(*pLongSrc) - 1);
    if ((MEMCPY_EFFICIENCY_THRESHOLD <= n) &&
        (dstOffset == srcOffset))
    {
        /* A nonzero offset means we are dstOffset bytes past a boundary,
         * so copy (sizeof(long) - dstOffset) bytes to reach the next one.
         */
        if (0 < dstOffset)
        {
            dstOffset = sizeof(*pLongDst) - dstOffset;
        }
        while ((0 < dstOffset--) &&
               (0 < n--))
        {
            *(pCharDst++) = *(pCharSrc++);
        }
        pLongDst = (long *) pCharDst;
        pLongSrc = (long *) pCharSrc;
        while (sizeof(*pLongDst) <= n)
        {
            *(pLongDst++) = *(pLongSrc++);
            n -= sizeof(*pLongDst);
        }
        pCharDst = (char *) pLongDst;
        pCharSrc = (char *) pLongSrc;
    }
    while (0 < n--)
    {
        *(pCharDst++) = *(pCharSrc++);
    }
    return s1;
}
Other Efficiency Notes
Our parameter validation includes a check on n that is questionable. On one hand, our loops will check n and never run, so we don't actually need the check. On the other hand, we can exit early when n is zero and save ourselves from performing the other checks that are irrelevant since no data is being copied. A call where (n == 0) is true is possible, but highly unlikely, and keeping the check means that we're adding an extra step to every call to memcpy. Due to this, we'll favor the calls to memcpy where (n != 0) by removing the parameter validation on n, and let those unlikely calls suffer a few extra steps.
For error checking, we'll consider each check to be an additional step. So, we're adding just four steps to validate our pointers and ensure that we don't wrap during the copy. This is an incredibly small price to pay for the added security and safety that these checks provide, so we'll definitely keep them.
Backwards copy revisited
Now that we have a more efficient version of our algorithm, we still need to make it copy backwards to handle the case of object overlap. We've already discussed why this is necessary and how to do it with the simple copy loop and walked through step-by-step the way to expand the simple version into a more efficient version. The pointer arithmetic takes care of a lot of the headache for us. Since this is an extension of what we've done already, the full version is found at the end.
Testing
Normally we could use memcmp to verify that our memcpy happened correctly, but we don't have that function yet. We'll create a simple test function that checks the boundaries of the copy to ensure we copied up to the boundaries, but no further. This function takes a control memory region that should be a copy of the destination region so it can be restored, as well as an offset into the region such that we can verify no bytes before the region were touched. Before doing the actual copy, we verify that the source and destination are actually different at the critical points, otherwise our tests won't be meaningful. The last note about this function is the use of the variables src2 and src3. This function will be used to test copying between overlapping objects, and we must preserve the values of the source object that we intend to test, otherwise they may be overwritten and our test will appear to fail.
int
verifyCopy(
    const void * const pVoidCtrl,
    const size_t ctrlSize,
    void *pVoidDst,
    const size_t dstSize,
    const void * const pVoidSrc,
    const size_t srcSize,
    const size_t offset)
{
    int ret = 1;
    const char * const pCtrl = pVoidCtrl;
    char *pDst = pVoidDst;
    const char * const pSrc = pVoidSrc;
    size_t n = 0;
    size_t i1 = offset - 1;
    size_t i2 = offset;
    size_t i3 = 0;
    size_t i4 = 0;
    char src2 = 0;
    char src3 = 0;
    if ((NULL == pCtrl) ||
        (NULL == pDst) ||
        (NULL == pSrc) ||
        (ctrlSize != dstSize) ||
        (0 == offset) ||
        ((2 * offset) >= ctrlSize) ||
        ((2 * offset) >= dstSize))
    {
        return ret;
    }
    /* Copy up to offset bytes from the end of the shortest region */
    if (dstSize > srcSize)
    {
        n = srcSize - (2 * offset);
    }
    else
    {
        n = dstSize - (2 * offset);
    }
    i3 = offset + n - 1;
    i4 = offset + n;
    src2 = pSrc[i2];
    src3 = pSrc[i3];
    do
    {
        /* Verify that we will be able to see a difference */
        if ((pDst[i1] == pSrc[i1]) ||
            (pDst[i2] == src2) ||
            (pDst[i3] == src3) ||
            (pDst[i4] == pSrc[i4]))
        {
            break;
        }
        memcpy(pDst + offset, pSrc + offset, n);
        /* Test the boundaries */
        if ((pDst[i1] != pCtrl[i1]) ||
            (pDst[i2] != src2) ||
            (pDst[i3] != src3) ||
            (pDst[i4] != pCtrl[i4]))
        {
            break;
        }
        ret = 0;
    } while (0);
    /* Return the destination to its original form */
    memcpy(pDst, pCtrl, dstSize);
    return ret;
}
Our main test function will explicitly test the error conditions and then use the previous function to verify that the memcpy implementation works properly. The method by which we verify our alignment handling is by testing every combination of start and end offset from zero to sizeof(long) - 1. The last two tests will ensure that copying between overlapping objects works properly.
int
memcpyTest(void)
{
    int ret = 1;
    int failed = 0;
    size_t i = 0;
    size_t j = 0;
    char bulk1[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
                   "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
                   "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
                   "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
                   "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
                   "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ";
    char bulk2[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
                   "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
                   "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
                   "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
                   "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
                   "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ";
    char bulk3[] = "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                   "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                   "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                   "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                   "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                   "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                   "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                   "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA";
    char bulk4[] = "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                   "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                   "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                   "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                   "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                   "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                   "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
                   "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA";
    char simple1[] = "This will be a short string to test simple things";
    char simple2[] = "This will be a short string to test simple things";
    do
    {
        /* Alignment checking relies on an unsigned long holding a pointer
         * value without truncation.
         */
        if (sizeof(unsigned long) < sizeof(void *))
        {
            break;
        }
        /* No way to verify, we'll just crash here if we don't catch the NULL */
        memcpy(NULL, simple1, sizeof(simple1));
        /* We may crash here */
        memcpy(simple2, NULL, sizeof(simple2));
        for (i = 0; i < sizeof(simple2); i++)
        {
            if (simple1[i] != simple2[i])
            {
                failed = 1;
                break;
            }
        }
        if (1 == failed)
        {
            break;
        }
        /* We might crash here too */
        memcpy(simple2, (void *) ((unsigned long) -5), sizeof(simple2));
        for (i = 0; i < sizeof(simple2); i++)
        {
            if (simple1[i] != simple2[i])
            {
                failed = 1;
                break;
            }
        }
        if (1 == failed)
        {
            break;
        }
        /* And here might be another crash */
        memcpy((void *) ((unsigned long) -5), simple1, sizeof(simple1));
        /* Verify no characters are copied */
        memcpy(bulk2, bulk3, 0);
        for (i = 0; i < sizeof(bulk2); i++)
        {
            if (bulk1[i] != bulk2[i])
            {
                failed = 1;
                break;
            }
        }
        if (1 == failed)
        {
            break;
        }
        /* Test each permutation of 0 to sizeof(long) start positions and
         * offsets doing a forward copy.
         */
        for (i = 0; i < sizeof(long); i++)
        {
            for (j = 0; j < sizeof(long); j++)
            {
                if (0 != verifyCopy(bulk1 + i,
                                    sizeof(bulk1) - i,
                                    bulk2 + i,
                                    sizeof(bulk2) - i,
                                    bulk3 + i,
                                    sizeof(bulk3) - i,
                                    j + 5))
                {
                    failed = 1;
                    break;
                }
            }
            if (1 == failed)
            {
                break;
            }
        }
        if (1 == failed)
        {
            break;
        }
        /* Test each permutation of 0 to sizeof(long) start positions and
         * offsets doing a backward copy.
         */
        for (i = 0; i < sizeof(long); i++)
        {
            for (j = 0; j < sizeof(long); j++)
            {
                if (0 != verifyCopy(bulk3 + i,
                                    sizeof(bulk3) - i,
                                    bulk4 + i,
                                    sizeof(bulk4) - i,
                                    bulk2 + i,
                                    sizeof(bulk2) - i,
                                    j + 5))
                {
                    failed = 1;
                    break;
                }
            }
            if (1 == failed)
            {
                break;
            }
        }
        if (1 == failed)
        {
            break;
        }
        /* Slide a buffer backward */
        if (0 != verifyCopy(bulk1 + 12,
                            sizeof(bulk1) - 12,
                            bulk2 + 12,
                            sizeof(bulk2) - 12,
                            bulk2 + 12 + (3 * sizeof(long)),
                            sizeof(bulk2) - 12 - (3 * sizeof(long)),
                            7))
        {
            break;
        }
        /* Slide a buffer forward */
        if (0 != verifyCopy(bulk1 + 12 + (3 * sizeof(long)),
                            sizeof(bulk1) - 12 - (3 * sizeof(long)),
                            bulk2 + 12 + (3 * sizeof(long)),
                            sizeof(bulk2) - 12 - (3 * sizeof(long)),
                            bulk2 + 12,
                            sizeof(bulk2) - 12,
                            7))
        {
            break;
        }
        ret = 0;
    } while (0);
    return ret;
}
Conclusion
At its core, memcpy is very simple: just loop over the data and move it from source to destination. The versions we've implemented have several extra checks in place to prevent NULL pointer dereferences and reading/writing with an improperly chosen length. Since the standard leaves copying between overlapping objects undefined, an implementation is free to define that behavior, so we chose to implement it to provide more functionality.
void *
memcpy(void *s1, const void *s2, size_t n)
{
    char *pDst = s1;
    const char *pSrc = s2;
    if ( !s1 ||
         !s2 ||
         /* Check for wrapping while copying */
         ((((unsigned long) -1) - ((unsigned long) s1)) < n) ||
         ((((unsigned long) -1) - ((unsigned long) s2)) < n))
    {
        return s1;
    }
    /* Copy forward or backward to handle object overlap */
    if (pDst < pSrc)
    {
        while (0 < n--)
        {
            *(pDst++) = *(pSrc++);
        }
    }
    else
    {
        pDst += n;
        pSrc += n;
        while (0 < n--)
        {
            *(--pDst) = *(--pSrc);
        }
    }
    return s1;
}
We also implemented bulk copies when available by using pointers to the largest
type available in C90. Keep in mind that the bulk copy implementation relies on
the assumption that an unsigned long
is at least as large as a pointer type
and that it also violates certain parts of the standard to perform copies in
this way.
#define MEMCPY_EFFICIENCY_THRESHOLD (200)
void *
memcpy(void *s1, const void *s2, size_t n)
{
char *pCharDst = s1;
const char *pCharSrc = s2;
long *pLongDst = NULL;
const long *pLongSrc = NULL;
size_t dstOffset = 0;
size_t srcOffset = 0;
if ( !s1 ||
!s2 ||
/* Check for wrapping while copying */
((((unsigned long) -1) - ((unsigned long) s1)) < n) ||
((((unsigned long) -1) - ((unsigned long) s2)) < n))
{
return s1;
}
/* Copy forward or backward to handle object overlap */
if (pCharDst < pCharSrc)
{
dstOffset = ((unsigned long) pCharDst) & (sizeof(*pLongDst) - 1);
srcOffset = ((unsigned long) pCharSrc) & (sizeof(*pLongSrc) - 1);
if ((MEMCPY_EFFICIENCY_THRESHOLD <= n) &&
(dstOffset == srcOffset))
{
while ((0 < dstOffset--) &&
(0 < n--))
{
*(pCharDst++) = *(pCharSrc++);
}
pLongDst = (long *) pCharDst;
pLongSrc = (long *) pCharSrc;
while (sizeof(*pLongDst) <= n)
{
*(pLongDst++) = *(pLongSrc++);
n -= sizeof(*pLongDst);
}
pCharDst = (char *) pLongDst;
pCharSrc = (char *) pLongSrc;
}
while (0 < n--)
{
*(pCharDst++) = *(pCharSrc++);
}
}
else
{
pCharDst += n;
pCharSrc += n;
dstOffset = ((unsigned long) pCharDst) & (sizeof(*pLongDst) - 1);
srcOffset = ((unsigned long) pCharSrc) & (sizeof(*pLongSrc) - 1);
if ((MEMCPY_EFFICIENCY_THRESHOLD <= n) &&
(dstOffset == srcOffset))
{
while ((0 < dstOffset--) &&
(0 < n--))
{
*(--pCharDst) = *(--pCharSrc);
}
pLongDst = (long *) pCharDst;
pLongSrc = (long *) pCharSrc;
while (sizeof(*pLongDst) <= n)
{
*(--pLongDst) = *(--pLongSrc);
n -= sizeof(*pLongDst);
}
pCharDst = (char *) pLongDst;
pCharSrc = (char *) pLongSrc;
}
while (0 < n--)
{
*(--pCharDst) = *(--pCharSrc);
}
}
return s1;
}
string.h
December 05, 2015
C has no concept of string objects, but it's handy to refer to certain pieces of
memory as strings. These pieces of memory are collections of
consecutive characters ending in a NUL terminator. Because C has no concept
of an object associated with those pieces of memory, the string.h
header
provides the common functions you might usually find associated with strings
such as length, copy, concatenate, compare, and search. In addition to these
functions, the header provides a type and a definition which we'll cover in this
post.
size_t and NULL
The type defined in string.h
is size_t
, and the single macro it has is
NULL
... but we've seen these already in stddef.h
. So, we have two options:
- include stddef.h, or
- redeclare/redefine them in string.h
Although both approaches are valid from a general programming perspective, The Standard says this in 7.1.3, Reserved identifiers
Each header declares or defines all identifiers listed in its associated
subclause, ...
Which means we can't include other headers to get a declaration or definition,
it must be done directly in the header itself. So, even though it will duplicate
code, we have to redefine the size_t
type and macro for NULL
.
Redeclare and redefine
I won't go over what size_t
and NULL
are since that was covered in the
stddef.h post, so refer back to that if you need.
We'll first start off with copying some code from stddef.h
into string.h:
#ifndef _STRING_H
#define _STRING_H
typedef unsigned long size_t;
#define NULL ((void *) 0)
#endif /* _STRING_H */
Now, let's look at how this causes a slight challenge. I'll use an example piece
of code here that pulls from both stddef.h
and string.h
to demonstrate the
issues:
#include <stddef.h>
#include <string.h>
int
main(void)
{
size_t testSize = 5;
size_t *pSize = &testSize;
*pSize = 7;
return 0;
}
Compile this with the -Wpedantic
flag at a minimum to produce the warning:
$ gcc -Wpedantic string_test.c -o string_test
In file included from string_test.c:2:0:
string.h:4:23: warning: redefinition of typedef ‘size_t’ [-Wpedantic]
typedef unsigned long size_t;
^
In file included from string_test.c:1:0:
stddef.h:4:23: note: previous declaration of ‘size_t’ was here
typedef unsigned long size_t;
^
Notice that we didn't see a warning for the redefinition of the NULL
macro; that's
because we redefined it to the exact same replacement text. Had the text replacement
been different, we would have seen a warning like so:
$ gcc -Wpedantic string_test.c -o string_test
In file included from string_test.c:3:0:
string.h:4:0: warning: "NULL" redefined [enabled by default]
#define NULL ((void *) 1)
^
In file included from string_test.c:2:0:
stddef.h:4:0: note: this is the location of the previous definition
#define NULL ((void *) 0)
^
To protect against these things, we can use something similar to header guards
to indicate whether or not we have defined size_t
or NULL
already. These
need to go in both stddef.h
and string.h
:
#ifndef _SIZE_T
#define _SIZE_T
typedef unsigned long size_t;
#endif /* _SIZE_T */
#ifndef _NULL
#define _NULL
#define NULL ((void *) 0)
#endif /* _NULL */
Since NULL
is itself a macro, we could test for the definition directly rather
than creating a new #define
. However, this would allow a program to provide
its own definition of NULL
which would supersede, and possibly break, our
library. By creating a new test macro inside of the reserved name space, our
implementation does its best to prevent that from happening.
These checks will get rid of the warnings, but they also allow for size_t
and
NULL
to become unsynchronized if we change their definitions at some point.
Files including string.h
could get a different definition than files including
stddef.h
, which could become difficult to debug. Although this could be error
prone in general, the definitions of size_t
and NULL won't change for a given
implementation because The Standard itself won't change (within a given
revision at least; C90 is set in stone at this point).
Conclusion
Due to The Standard, we must define the macro and type directly in the header;
luckily, we can copy them from stddef.h
. However, we now need to introduce
checks to prevent redefinitions and add those same checks to stddef.h
.
#ifndef _STRING_H
#define _STRING_H
#ifndef _SIZE_T
#define _SIZE_T
typedef unsigned long size_t;
#endif /* _SIZE_T */
#ifndef _NULL
#define _NULL
#define NULL ((void *) 0)
#endif /* _NULL */
#endif /* _STRING_H */
The initial test is simple as well; we just need to ensure that we can access
the size_t
type and the NULL
macro:
#include <string.h>
int
stringTest(void)
{
int ret = 0;
int *pTest = NULL;
size_t pSize = sizeof(pTest);
/* "Use" pSize to get rid of the warning */
pSize = pSize;
return ret;
}
math.h
August 09, 2015
tags: errno.h, float.h, math.h
The math.h
header provides a large variety of mathematical functions as well
as one macro. This post will focus on just the macro while each function will
have its own dedicated post.
HUGE_VAL
The only macro in math.h
is HUGE_VAL
which is defined as "a positive
double
expression, not necessarily representable as a float." This sounds
like a good place to use the IEEE 754 reserved value for infinity: a sign bit
of 0, an exponent field of all ones, and a significand of all zeros.
We will use the value for positive infinity so that we can represent negative
infinity by simply prepending the unary minus operator to this macro,
(-HUGE_VAL)
. A double value on x86_64 is 8 bytes in length and it will be
easiest just to represent this value with 16 hexadecimal digits:
#define HUGE_VAL (0x7FF0000000000000L)
Since a hexadecimal constant in C can't have an exponent part, we use the
L
suffix to ensure that HUGE_VAL
is interpreted as a double expression when
it is used.
Test
The test for this macro is very simple and just ensures that the file has access
to the macro through including the math.h
header:
#include <math.h>
int
math_test(void)
{
int ret = 0;
double hugeVal = 0.0;
/* Test the HUGE_VAL macro */
hugeVal = HUGE_VAL;
return ret;
}
If the test fails then a compile error will be generated:
error: ‘HUGE_VAL’ undeclared (first use in this function)
Otherwise, everything will compile and succeed.
Error Conditions
math.h
functions make use of two macros to indicate error conditions, EDOM
and ERANGE
. We already talked about these and defined them in
errno.h but we'll review them here.
A domain error occurs when an input argument is outside the domain over which
the function is defined. When this occurs, errno
is set to EDOM
and an
implementation-defined value is returned to indicate that an error condition was
triggered.
A range error occurs when the result of a function can't be represented as a
double
value. If the result overflows the double
type, the return value is
set to HUGE_VAL
with the same sign as the correct return value (if it was
negative then we can return (-HUGE_VAL)
) and errno
is set to ERANGE
. If
the result underflows the double
type then the return value is set to 0
. For
underflows, errno
is not required to be set to ERANGE
, but an implementation
may choose to do so. This is because the value is so small that the architecture
is not able to differentiate it from 0.
You can associate EDOM
with input errors, while ERANGE
deals with output
errors. These macros aren't used in the math.h
header itself but will be used
by most, if not all, of the functions provided by math.h
.
stdarg.h
August 08, 2015
tags: stdarg.h
The stdarg.h
header provides the functionality for marching through each
argument of a function when you don't know how many arguments there will be
until run time. When you see a function with an ellipsis as the last
"parameter" then that function is probably using stdarg.h
to process each
argument. The printf
function is just one example.
Stack Frames
First, we need to talk about stack frames and how they are constructed. If
you've never heard of a stack frame then I recommend reading the Wikipedia
article on call frames. This explanation is specific to the x86-64
architecture so keep that in mind as this may not be the way a different
architecture handles things. A stack frame refers to the portion of stack memory
that pertains to a specific function. The stack frame will contain function
arguments (if they weren't passed in registers), the return address, a pointer
to the previous base pointer, and then any local variables for the function. The
base pointer is used to access both the arguments that were passed on the stack
as well as the local variables for the function. There is a specific register,
rbp
, which will contain the current base pointer. Modifying this register
without saving it could have serious consequences because you no longer have an
easy reference point for accessing the contents of your stack frame. For a
visual representation of this, the Wikipedia page I mentioned previously
has a good picture or you can look at page 16 of the x86_64 ABI. Keep in
mind that these pictures will seem to contradict each other, but this is only
because the Wiki page shows the stack with low memory at the top while the ABI
shows the stack with high memory at the top.
va_list
stdarg.h
specifies a single type for holding the meta-data necessary for
advancing through the argument list. Since this type is architecture specific
then the Standard doesn't give any more information aside from it being "a type
suitable for holding the information needed by the macros va_start
, va_arg
,
and va_end
." We can look to the x86_64 ABI for an answer though.
Each specific architecture will have its own way of performing argument passing.
For example, the i386 architecture pushes every argument onto the stack before
calling a function. Unfortunately, x86_64 is not so simple. The ABI describes
in detail the method through which parameters should be passed and even provides
the following definition for the va_list
type:
typedef struct {
unsigned int gp_offset;
unsigned int fp_offset;
void *overflow_arg_area;
void *reg_save_area;
} va_list[1];
Due to the way that x86_64 performs parameter passing, it is necessary to keep
track of a few things so that we can access different types of parameters since
they are stored in different places. The first of these is reg_save_area
which
is a pointer to the register save area. The register save area is a location on
the stack which holds copies of arguments that were passed in registers, the
reg_save_area
pointer holds the location of the first register copy (which is
always a copy of the rdi
register). There is a limit to the amount of
information that can be passed through registers and the excess arguments are
passed directly on the stack. The overflow_arg_area
is a pointer to the first
of these excess arguments.
Now we have pointers to each group of arguments on the stack but we
need to keep track of which argument is next. The gp_offset
member is an
offset from the beginning of the reg_save_area
that points to the next argument
that was passed in the general purpose registers. The next argument could be
retrieved by accessing what is on the stack at reg_save_area + gp_offset
.
However, what happens when we exhaust the arguments that were passed in
registers? We set gp_offset
to 48
which indicates that "register arguments
have been exhausted" and then we would retrieve the next argument by accessing
what is on the stack at overflow_arg_area
. The value 48
comes from the
number of general purpose registers used for argument passing: rdi
, rsi
,
rdx
, rcx
, r8
, and r9
give us six 8-byte registers, totaling 48
bytes on
bytes on
the stack for the copies of each register. Therefore, when gp_offset
is 48
or greater, the location being accessed is no longer valid. overflow_arg_area
works differently than reg_save_area
because it should always point to the
next argument to be retrieved; it needs to be updated every time an argument is
retrieved. Lastly, there is the fp_offset
member which works in the same way
as gp_offset
except it is the offset to the next argument passed in a
floating-point register. This offset will have a value of 48
to 304
where
the 304
indicates that all floating-point arguments passed in registers have
been exhausted and the next one should be retrieved by using
overflow_arg_area
.
That is a little bit complicated but overall it's not too bad. The issue comes in with passing structures as arguments because a structure, on x86_64, may be passed partially through registers and partially on the stack. This results in a lot of extra logic in order to properly rebuild the entire structure from different parts of the stack. We'll talk more about this later.
va_start
The va_start
macro is used to initialize a va_list
structure so that it can
be used to retrieve arguments. You must call va_start
before accessing any
arguments, otherwise you won't know where they are. The x86_64 ABI gives us the
information we need in order to initialize the va_list
structure. This macro
takes a va_list
structure as well as the identifier of the rightmost parameter
in the variable parameter list (e.g. the argument directly before the ellipsis).
First, we need to figure out where the register save area
is. Looking at figure 3.33 in the ABI, we see that the register save area has a
total size of 304
bytes which we calculate using the last register copy
offset, 288
, plus the size of the last register copy, 16
bytes for a
floating-point register. This value should also be familiar from fp_offset
because when it is set to 304
, we know that we can't retrieve floating-point
arguments from the register save area anymore. The compiler takes care of
building the register save area for us, so we can be sure that 304
bytes of
space will be taken up on the stack directly in between the previous rbp
value
and the local function variables. Since we know that the base pointer register,
rbp
, holds the stack location immediately after our register save area, we can
subtract the size of the register save area to get the location we need.
reg_save_area
\(= rbp - 304\)
Second, we need the location of the first argument passed on the stack, which
should be in the stack location which follows the return address of the current
stack frame (which is directly after the saved rbp
). This location will
be 16
bytes after the saved rbp
:
overflow_arg_area
\(= rbp + 16\)
Figure 3.33 from the ABI shows us the offsets to the first general purpose
register as well as the first floating-point register, 0
and 48
,
respectively.
gp_offset
\(= 0\)
fp_offset
\(= 48\)
That's all we need for va_start
to initialize the va_list
structure.
va_arg
The va_arg
macro expands to an expression that has the type and value of the
next argument passed to a function. You must provide this macro with the
va_list
struct initialized from a call to va_start
and the type of the
argument that you want to be retrieved. va_arg
will update the va_list
structure appropriately so that a subsequent call to va_arg
will work as
expected.
The only primitive types that would be returned through va_arg
would be the
int
and double
types due to the way that the ABI is defined. For an int
type we can just use reg_save_area + gp_offset
to get the next int
argument,
or if gp_offset
is greater than or equal to 48
we would read from
overflow_arg_area
. Likewise, for double
arguments we would use
reg_save_area + fp_offset
or overflow_arg_area
if fp_offset
was 304
or
greater.
For structures, things get pretty complicated since parts of the structure may
be passed in general purpose registers, some parts may be passed in
floating-point registers, and some parts may be passed directly on the stack. A
structure can contain multiple members, each of varying types, and this is what
makes it difficult to return a structure with va_arg
on this architecture.
There isn't a good way to dynamically determine the type of the next member
within a structure from the standpoint of a C library, so we need to cheat a
little bit for this macro.
The ABI even hints at this and states that "The va_arg
macro is usually
implemented as a compiler builtin and expanded in simplified forms for each
particular type." gcc provides this builtin in the form of the macro
__builtin_va_arg
and we'll use it like so:
#define va_arg(ap, type) __builtin_va_arg((ap), type)
va_end
The last macro must ensure that a normal return will happen from a function
which called va_start
. For the x86_64 architecture this macro doesn't actually
need to do anything since va_start
and va_arg
only modify the va_list
structure and not the stack or registers themselves. However, I would choose to
zero out each member of the va_list
struct just to ensure that va_start
would need to be called before using va_arg
again.
Cheating
Like I mentioned in the va_arg
section, we needed to cheat a little bit for
one of the macros. Since we are going to cheat on one of the macros then we also
need to cheat on the rest of them to ensure that everything works together
properly. gcc provides builtin versions of the va_list
type as well as all
three macros, so our implementation will look like so:
#ifndef _STDARG_H
#define _STDARG_H
typedef __builtin_va_list va_list;
#define va_start(ap, parmN) __builtin_va_start((ap), (parmN))
#define va_arg(ap, type) __builtin_va_arg((ap), type)
#define va_end(ap) __builtin_va_end((ap))
#endif /* _STDARG_H */
Freestanding Environment
Now that we've completed the stdarg.h
header we have all the headers necessary
for a freestanding environment. A freestanding environment is one in which a C
program can run without an underlying operating system. The freestanding
execution environment requires all of the architecture specific header files:
float.h
, limits.h
, stdarg.h
, and stddef.h
. This means that the rest of
the header files (and any backing .c files) can be written in an architecture
agnostic way and shouldn't require any assembly.
stddef.h
July 19, 2015
tags: stddef.h
In addition to the integral and floating types provided by C, there are some
extra types that will be commonly used. The standard C library groups these
common types (and some macros) into a single header, stddef.h
.
Types
To define new types in C you use a typedef
. A typedef
does not actually
. A typedef
does not actually
introduce a new type, it just creates an alias to an existing type. They take
the form of:
typedef existing-type new-type;
For example, if you wanted to make it more convenient to use the unsigned int
type by only having to type uint
, then you would use the following typedef
:
typedef unsigned int uint;
Now, any occurrence of the uint
identifier will be treated as if it were
unsigned int
. This may sound similar to a #define
, but a typedef
does not
perform a text replacement.
ptrdiff_t
This type "is the signed integral type of the result of subtracting two
pointers". So, we need this type to be of the same type as an address for the
specific architecture. The x86_64 ABI tells us that a pointer to any type is an
"unsigned eightbyte". Since this type is signed then we can focus on the other
signed eightbyte types designated in the ABI: signed long
, and signed long
long
. We'll use the signed long
type:
typedef signed long ptrdiff_t;
size_t
This type "is the unsigned integral type of the result of the sizeof
operator". The sizeof
operator yields the size (in bytes) of its operand. It
can be used to find the number of bytes in a type, an array, or a structure:
struct myStruct
{
char a;
int b;
unsigned long c;
};
int a = 0;
char myArray[] = "hello, world";
struct myStruct test;
sizeof(char);
sizeof a;
sizeof myArray;
sizeof test;
Since an array or a struct can be arbitrarily long we should choose a type which
can represent a large range of values. The largest unsigned integral type
available on the x86_64 architecture is the unsigned long
type, which is what
we'll use.
typedef unsigned long size_t;
wchar_t
This type "is the integral type whose range of values can represent distinct
codes for all members of the largest extended character set specified among the
supported locales". In other words, if your extended character set used a
maximum of 3 bytes per character then wchar_t
would need to be at least 3
bytes in length. Since we are using UTF-8 which uses a maximum of 4 bytes per
character then we only need to consider types which are at least 4 bytes in
length. There is no sense in using a type which is more than 4 bytes as that
would just waste space, so we can choose between signed int
and unsigned
int
. A character doesn't really have a sign so we will choose the unsigned
int
type:
typedef unsigned int wchar_t;
Macros
If you're not already familiar with macros, then see the errno.h post for more information.
NULL
This macro "expands to an implementation-defined null pointer constant". The x86_64 ABI states that "a null pointer (for all types) has the value zero." We can simply cast the value 0 as a void pointer to accomplish this:
#define NULL ((void *) 0)
offsetof(type, member-designator)
This macro "expands to an integral constant expression that has type size_t
,
the value of which is the offset in bytes, to the structure member (designated
by member-designator), from the beginning of its structure (designated by
type). The member-designator shall be such that given static type t;
then
the expression &(t.member-designator)
evaluates to an address constant. (If
the specified member is a bit-field, the behavior is undefined.)"
So, the number of bytes between the beginning of a structure and the designated structure member. To get this we can take the address of the member within a structure imagined to start at address zero, which amounts to subtracting the address of the structure from the address of the member:
#define offsetof(type, member) ((size_t) &(((type *) 0)->member))
Conclusion
That's everything for stddef.h
. It's not a very involved header but we will
see these types and macros pop up frequently throughout the rest of the library.
#ifndef _STDDEF_H
#define _STDDEF_H
typedef signed long ptrdiff_t;
typedef unsigned long size_t;
typedef unsigned int wchar_t;
#define NULL ((void *) 0)
#define offsetof(type, member) ((size_t) &(((type *) 0)->member))
#endif /* _STDDEF_H */
float.h
July 17, 2015
Similarly to limits.h
, some limits must be established on the characteristics
of floating-point arithmetic for a given architecture. These are determined by a
model described in the standard and implemented in float.h
. This won't be a
comprehensive introduction to floating-point arithmetic as there are already
resources for that:
- Single-precision floating-point format - Wikipedia
- Double-precision floating-point format - Wikipedia
- Floating point - Wikipedia
- What Every Computer Scientist Should Know About Floating-Point Arithmetic
The Floating-Point Model
Before talking about the specifics of float.h
, we need to lay some groundwork
for how floating-point numbers are defined and some of the terminology that goes
along with them.
- sign - positive or negative
- radix - the numerical base of exponent representation (binary would have a radix of 2)
- significand/mantissa - the part of the floating-point number consisting of its significant digits
- precision - the number of digits in the significand
With those definitions, we'll construct a few variables that we need to construct the floating-point model:
- \(s\) - sign
- \(b\) - base/radix
- \(e\) - exponent
- \(p\) - precision
- \(f_{k}\) - nonnegative integers less than \(b\) (the significand digits)
The standard states, "A normalized floating-point number \(x ( f_1 > 0 \mbox{ if } x \neq 0 )\) is defined by the following model":
\[x = s \times b^{e} \times \sum_{k=1}^{p} f_{k} \times b^{-k}, \quad e_{min} \leq e \leq e_{max}\]
This is the model that we will implement with float.h
and which can be used to
construct a floating-point number. For this implementation, \(b\) will always be
\(2\) since we're dealing with the binary representation of numbers. In addition,
we must have a range for \(e\) by defining \(e_{min}\) and \(e_{max}\). To calculate
these values we use the following equations:
- \(k =\) type width in bits
- \(e_{max} = 2^{k-p-1}\)
- \(e_{min} = 1 - e_{max}\)
Floating Types
float
, double
, and long double
are the floating-point types provided by C.
Since this first version of welibc is for the x86_64 architecture we can turn to
the x86_64 ABI to see how these types are implemented. Page 12 shows that
these types are represented by single-precision, double-precision, and 80-bit
extended precision floating-point types, respectively. It also shows us that
that each of the three floating-point types are to be implemented according to
the IEEE Standard for Floating-Point Arithmetic. I'll be choosing to use the
most recent revision of that standard, IEEE 754-2008.
Now we have some reference documentation to help us implement float.h
. We'll
use this documentation to define the variables we discussed above and then we
can use those to calculate the values we need for float.h
. IEEE 754-2008 will
give us values that define the widths, in bits, for the sign (\(s\)), exponent
(\(e\)), and precision (\(p\)) of each floating-point type. From those variables we
can derive the rest of the information we'll need.
IEEE 754 uses normalized floating-point numbers meaning that each value is represented with a nonzero leading digit. For example, \(2.2573 \times 10^{3}\) is normalized, but \(0.22573 \times 10^{4}\) is not, even though they represent the same quantity. By using normalized numbers we ensure that each value is uniquely represented.
Since we're using normalized numbers and the leading digit will always be nonzero (except for infinity, 0, and Not a Number), then the most significant bit of the significand would always be 1. There is no point in storing a bit if it will always be 1 so IEEE 754 uses that storage space to gain more precision. Since the original most significant bit is no longer stored in the floating type, it is referred to as a "hidden" bit.
The hidden bit has a few implications that we need to keep in mind through calculating our values. First, the field width of the significand is actually one more than the number of bits we use to represent it. We'll look at this below when we define all of the field widths. Second, the exponent field implicitly encodes the hidden bit for us so the computed exponent ranges are really one less than the value we need. I'll make sure to note these as we go along.
Float
The float
type is a single-precision floating-point number represented by
32 bits (\(k = 32\)). The IEEE 754-2008 standard refers to this format as binary32
and tells us that the bits are broken down like so:
- \(s = 1\)
- \(e = 8\)
- \(p = 24\)
See how the total bits here is actually 33? This is due to the hidden bit in the significand; there are physically 23 bits for the significand but we get 24 bits because the most significant bit is always 1 for normalized floating point numbers.
Double
The double
type is a double-precision floating-point number represented
by 64 bits (\(k = 64\)). The IEEE 754-2008 standard refers to this format as
binary64 and breaks down the bits like so:
- \(s = 1\)
- \(e = 11\)
- \(p = 53\)
Again, 53 bits for the significand where there are only 52 available.
Long Double
The long double
type is an extended-precision floating-point number
represented by 80 bits (\(k = 80\)). The IEEE 754-2008 standard refers to this
format as binary64 extended and breaks down the bits like so:
- \(s = 1\)
- \(e = 15\)
- \(p = 64\)
Don't forget that the hidden bit for the significand is what gives us 64 instead of 63 bits for \(p\).
float.h Values and Equations
Now that we have the inputs we need, we can derive each value in float.h
according to the equations given to us in the C standard. For all but two values
in float.h
, there is a separate named version for each floating point type
denoted by the prefixes FLT
, DBL
, and LDBL
, for float
, double
, and
long double
, respectively. We'll look at each value to see what it means and
how to compute it based on the data we've already collected.
FLT_ROUNDS
The rounding mode for floating-point addition. This value is used for all types
rather than having separate versions for each type (even though it is prefixed
with FLT
). The standard defines a set of possible assignments and their
meanings:
- \(-1\) - indeterminate
- \(0\) - toward zero
- \(1\) - to nearest
- \(2\) - toward positive infinity
- \(3\) - toward negative infinity
Your specific application of C may necessitate one of these specific rounding modes. However, what if you don't know? How do you choose one?
Well, we can look at What Every Computer Scientist Should Know About Floating-Point Arithmetic for advice (you did read it, didn't you?). There are two very relevant portions from this paper. The first is titled "Exactly Rounded Operations" in the "Rounding Error" section. The second is in the "IEEE Standard" section titled "Operations". The author suggests that you round to the nearest floating point number using the round to even method.
#define FLT_ROUNDS (1)
Exactly Rounded Operations
The paper states that rounding is pretty straightforward except for halfway cases such as 12.5; should that be 12 or 13? It then introduces the concept of "round to even" which proposes that instead of rounding up for all halfway values, they should be rounded towards the number that is even. This should result in 50% of the occurrences rounding up and 50% rounding down, rather than 100% rounding up. It also references a paper by Reiser and Knuth which backs up the position that round to even should be used.
So, why round to even on halfway values if half of the values are rounded down (0, 1, 2, 3, 4) and half of the values are rounded up (5, 6, 7, 8, 9)? Well, rounding up every time results in drift which is the gradual upward increase in repeated computations. By rounding to even you can eliminate drift since the error from one round up would be cancelled out by a subsequent round down.
IEEE 754-2008 Rounding
Goldberg's paper also references the IEEE standard because it also suggests that round to even be used. If we look at section 4.3, "Rounding-direction attributes", of IEEE 754-2008 we find that it refers to round to even as roundTiesToEven, stating that it should be the default rounding-direction attribute for results in binary formats. Although it also defines 4 other rounding attributes, we'll go with the default of roundTiesToEven which is an attribute of round to nearest.
FLT_RADIX
The radix of the exponent representation, \(b\). As stated before, the radix will always be \(2\) since we are performing binary floating-point arithmetic and the base of a binary numerical system is \(2\).
#define FLT_RADIX (2)
FLT_MANT_DIG
The number of base-FLT_RADIX
digits in the floating-point significand, \(p\).
Alternatively, this is the number of bits used to represent the significand
which is the value of \(p\) from our floating-types discussion above.
#define FLT_MANT_DIG (24)
#define DBL_MANT_DIG (53)
#define LDBL_MANT_DIG (64)
FLT_DIG
The number of decimal digits, \(q\), such that any floating-point number with \(q\) decimal digits can be rounded into a floating-point number with \(p\) radix \(b\) digits and back again without change to the \(q\) decimal digits:

\(q = \lfloor(p - 1) \times log_{10}b\rfloor + i\), where \(i = 1\) if \(b\) is a power of \(10\) and \(i = 0\) otherwise

This dictates the maximum number of decimal digits that can be converted to binary floating-point and back with no changes resulting from conversion. We'll use \(p\) (the FLT_MANT_DIG family) and \(b\) (FLT_RADIX).
First, let's try to simplify this equation a little bit. We can immediately see
that we will always be adding \(0\) to the result of the \(floor\) computation
because our \(b\) (FLT_RADIX
) value is \(2\) which is not a power of \(10\).
Secondly, since \(b\) is constant, \(log_{10}b\) is also constant. Lastly, we just
insert the value of \(p\) for each floating-type and we have our solutions.
FLT_DIG \(= \lfloor(24-1) \times log_{10}2\rfloor + 0 = \lfloor 23 \times log_{10}2 \rfloor = \lfloor 6.92368... \rfloor = 6\)
DBL_DIG \(= \lfloor(53-1) \times log_{10}2\rfloor + 0 = \lfloor 52 \times log_{10}2 \rfloor = \lfloor 15.65355... \rfloor = 15\)
LDBL_DIG \(= \lfloor(64-1) \times log_{10}2\rfloor + 0 = \lfloor 63 \times log_{10}2 \rfloor = \lfloor 18.96488... \rfloor = 18\)
And we end up with:
#define FLT_DIG (6)
#define DBL_DIG (15)
#define LDBL_DIG (18)
FLT_MAX_EXP
The maximum integer such that FLT_RADIX
raised to that power minus \(1\) is a
representable finite floating-point number, \(e_{max}\). Or, the maximum possible
exponent we can use. To compute this we calculate the highest signed value that
we can store in the number of bits allotted to the exponent field. \(b\) is our
base and \(e\) is the exponent width (in bits) from above:
FLT_MAX_EXP \(= 2^{8-1} - 1 = 2^{7} - 1 = 128 - 1 = 127\)
DBL_MAX_EXP \(= 2^{11-1} - 1 = 2^{10} - 1 = 1024 - 1 = 1023\)
LDBL_MAX_EXP \(= 2^{15-1} - 1 = 2^{14} - 1 = 16384 - 1 = 16383\)
We can verify these results against IEEE 754-2008 which has some handy tables that just give you the values for \(e_{max}\) (specifically Table 3.5-Binary interchange format parameters). Not to spoil the surprise, but our answers agree with the table.
Now, while these values are correct, they are not what we will actually use. Remember that hidden bit from before? This is where it comes into play with the exponent, so we need to add 1 to these values to get the actual max exponent.
#define FLT_MAX_EXP (+128)
#define DBL_MAX_EXP (+1024)
#define LDBL_MAX_EXP (+16384)
FLT_MIN_EXP
The minimum negative integer such that FLT_RADIX
raised to that power minus
\(1\) is a normalized floating-point number, \(e_{min}\). Or, the lowest possible
exponent we can use. IEEE 754-2008 gives us the following equation to compute this value:

\(e_{min} = 1 - e_{max}\)
It's important to note that the \(e_{max}\) in this equation is the computed value of \(e_{max}\) from before, not the value of the FLT_MAX_EXP family of macros.
FLT_MIN_EXP \(= 1 - 127 = -126\)
DBL_MIN_EXP \(= 1 - 1023 = -1022\)
LDBL_MIN_EXP \(= 1 - 16383 = -16382\)
Again, due to the hidden bit in the significand, we must add one to these values to get the minimum exponent.
#define FLT_MIN_EXP (-125)
#define DBL_MIN_EXP (-1021)
#define LDBL_MIN_EXP (-16381)
FLT_MIN_10_EXP
The minimum negative integer such that \(10\) raised to that power is in the range of normalized floating-point numbers:

\(\lceil log_{10}b^{e_{min}-1} \rceil\)
This is the minimum exponent we could use in a number written in decimal
scientific notation while still being able to reasonably convert it to a binary
floating-point representation. Here, \(e_{min}\) is the value of the FLT_MIN_EXP
family of macros to account for the hidden bit.
FLT_MIN_10_EXP \(= \lceil log_{10}2^{-125-1} \rceil = \lceil log_{10}2^{-126} \rceil = \lceil -37.92977... \rceil = -37\)
DBL_MIN_10_EXP \(= \lceil log_{10}2^{-1021-1} \rceil = \lceil log_{10}2^{-1022} \rceil = \lceil -307.65265... \rceil = -307\)
LDBL_MIN_10_EXP \(= \lceil log_{10}2^{-16381-1} \rceil = \lceil log_{10}2^{-16382} \rceil = \lceil -4931.47338... \rceil = -4931\)
Which yields:
#define FLT_MIN_10_EXP (-37)
#define DBL_MIN_10_EXP (-307)
#define LDBL_MIN_10_EXP (-4931)
FLT_MAX_10_EXP
The maximum integer such that \(10\) raised to that power is in the range of representable finite floating-point numbers:

\(\lfloor log_{10}((1 - b^{-p}) \times b^{e_{max}})\rfloor\)
This is the maximum exponent we could use in a number written in decimal
scientific notation while still being able to reasonably convert it to a binary
floating-point representation. Here, \(e_{max}\) is the value of the FLT_MAX_EXP
family of macros to account for the hidden bit.
FLT_MAX_10_EXP \(= \lfloor log_{10}((1 - 2^{-24}) \times 2^{128})\rfloor = \lfloor log_{10}3.40282... \times 10^{38} \rfloor = \lfloor 38.53183... \rfloor = 38\)
DBL_MAX_10_EXP \(= \lfloor log_{10}((1 - 2^{-53}) \times 2^{1024})\rfloor = \lfloor log_{10}1.79769... \times 10^{308} \rfloor = \lfloor 308.25471... \rfloor = 308\)
LDBL_MAX_10_EXP \(= \lfloor log_{10}((1 - 2^{-64}) \times 2^{16384})\rfloor = \lfloor log_{10}1.18973... \times 10^{4932} \rfloor = \lfloor 4932.07544... \rfloor = 4932\)
Whew!
#define FLT_MAX_10_EXP (+38)
#define DBL_MAX_10_EXP (+308)
#define LDBL_MAX_10_EXP (+4932)
FLT_MAX
The maximum representable floating-point number:

\((1 - b^{-p}) \times b^{e_{max}}\)
You'll notice that this exact equation is found embedded in the calculation of
FLT_MAX_10_EXP
because it uses the maximum floating-point number to derive
the exponent needed to represent it.
FLT_MAX \(= (1 - 2^{-24}) \times 2^{128} = 3.40282347... \times 10^{38}\)
DBL_MAX \(= (1 - 2^{-53}) \times 2^{1024} = 1.7976931348623157... \times 10^{308}\)
LDBL_MAX \(= (1 - 2^{-64}) \times 2^{16384} = 1.18973149535723176502... \times 10^{4932}\)
Note that the exponents for each of these match the values we calculated for the
FLT_MAX_10_EXP
family of macros. You should also see that there are an
increasing number of significant figures as we go. This is because each
successively larger floating type can hold more precise values than the last. To
determine how many significant figures we should use, you can just count the number of digits in the largest decimal value that can be stored in the bits allotted for the significand. This is equal to:

\(\lceil p \times log_{10}2 \rceil\)
With that equation we arrive at \(8\), \(16\), and \(20\) decimal digits for the
float
, double
, and long double
floating types, respectively.
#define FLT_MAX (3.40282347E+38F)
#define DBL_MAX (1.7976931348623157E+308)
#define LDBL_MAX (1.18973149535723176502E+4932L)
FLT_EPSILON
The difference between \(1\) and the least value greater than \(1\) that is representable in the given floating point type:

\(b^{1-p}\)
In other words, you could store 1.0 as a float, but your floating point type
probably can't accurately represent \(1.00000000000000000000000023\). We need to
define the smallest step you can take when incrementing the floating point type
which is what FLT_EPSILON
provides.
FLT_EPSILON \(= 2^{1-24} = 2^{-23} = 1.19209290... \times 10^{-7}\)
DBL_EPSILON \(= 2^{1-53} = 2^{-52} = 2.2204460492503131... \times 10^{-16}\)
LDBL_EPSILON \(= 2^{1-64} = 2^{-63} = 1.08420217248550443401... \times 10^{-19}\)
Again, we have used a number of significant digits which corresponds to the precision of the floating type which stores that value.
#define FLT_EPSILON (1.19209290E-07F)
#define DBL_EPSILON (2.2204460492503131E-16)
#define LDBL_EPSILON (1.08420217248550443401E-19L)
FLT_MIN
The minimum normalized positive floating-point number,
You'll notice that this exact equation is embedded within the calculation of
the FLT_MIN_10_EXP
family of macros for the same reason as the equation used to
calculate FLT_MAX
(but for the minimum number).
FLT_MIN \(= 2^{-125-1} = 2^{-126} = 1.17549435... \times 10^{-38}\)
DBL_MIN \(= 2^{-1021-1} = 2^{-1022} = 2.2250738585072014... \times 10^{-308}\)
LDBL_MIN \(= 2^{-16381-1} = 2^{-16382} = 3.36210314311209350626... \times 10^{-4932}\)
Those are some tiny numbers!
#define FLT_MIN (1.17549435E-38F)
#define DBL_MIN (2.2250738585072014E-308)
#define LDBL_MIN (3.36210314311209350626E-4932L)
Conclusion
You may have noticed an F
appended to each FLT_
value expressed in
floating-point notation. This is a floating suffix and is a mechanism used by
the C language to indicate that the preceding floating constant is to be
interpreted as a float
type. Similarly, the L
floating suffix is used to
denote a value to be interpreted as a long double
type. Floating constants
with no floating suffix are interpreted as a double
floating type by default.
Lastly, if you paid attention to the values we used vs. the values that WolframAlpha computed, you'll notice that we used the "round to even" method, when necessary, to achieve a proper result.
Our full implementation looks like so:
#ifndef _FLOAT_H
#define _FLOAT_H
#define FLT_ROUNDS (1)
#define FLT_RADIX (2)
#define FLT_MANT_DIG (24)
#define DBL_MANT_DIG (53)
#define LDBL_MANT_DIG (64)
#define FLT_DIG (6)
#define DBL_DIG (15)
#define LDBL_DIG (18)
#define FLT_MIN_EXP (-125)
#define DBL_MIN_EXP (-1021)
#define LDBL_MIN_EXP (-16381)
#define FLT_MIN_10_EXP (-37)
#define DBL_MIN_10_EXP (-307)
#define LDBL_MIN_10_EXP (-4931)
#define FLT_MAX_EXP (+128)
#define DBL_MAX_EXP (+1024)
#define LDBL_MAX_EXP (+16384)
#define FLT_MAX_10_EXP (+38)
#define DBL_MAX_10_EXP (+308)
#define LDBL_MAX_10_EXP (+4932)
#define FLT_MAX (3.40282347E+38F)
#define DBL_MAX (1.7976931348623157E+308)
#define LDBL_MAX (1.18973149535723176502E+4932L)
#define FLT_EPSILON (1.19209290E-07F)
#define DBL_EPSILON (2.2204460492503131E-16)
#define LDBL_EPSILON (1.08420217248550443401E-19L)
#define FLT_MIN (1.17549435E-38F)
#define DBL_MIN (2.2250738585072014E-308)
#define LDBL_MIN (3.36210314311209350626E-4932L)
#endif /* _FLOAT_H */
limits.h
May 10, 2015
tags: limits.h
When writing C code, it's important to know the ranges of the variables that you'll use. If you exceed what a variable can store, its value will "roll over": unsigned types wrap around to zero, while signed overflow is technically undefined behavior (though in practice it often shows up as a large negative number). The limits.h file contains the ranges for each type so that you can use them when necessary.
About limits.h
This header is architecture specific and we'll be writing it for an x86_64
machine. Although the values we use are specific, the macros we use are a huge
part of the portability of the C language itself. As a C programmer, you should
be aware of these values for the architecture you're programming for but you
should use these macros as much as possible so that your code will work on any
machine with a C compiler. For example, you can be aware of the fact that an
unsigned char
has a maximum value of 255 on your machine, but you should use
UCHAR_MAX
in your code instead of 255
.
In order to know what values to use for limits.h
, you'll need the Application
Binary Interface for your specific architecture. I searched Google for
"x86_64 ABI" and found it here. If you look at "Figure 3.1: Scalar Types"
on page 12 then you'll see each of the C types along with their sizes, in bytes,
on the x86_64 architecture. This table is what dictates our implementation.
Byte size
The first macro in limits.h
is CHAR_BIT
which establishes the number of bits
used to represent a byte. This may seem kind of odd because a byte is always 8
bits! Almost always. Most modern architectures follow this norm, but some
systems actually have a different number of bits per byte. This Stack Overflow question goes into some more depth, but a few examples would be:
- 16-bit bytes: TI C54x DSP
- 9-bit bytes: DEC PDP-10
- 6-bit bytes: CDC-6400
The ABI states that "the term byte refers to an 8-bit object", which tells us
that CHAR_BIT
should be 8:
#define CHAR_BIT (8)
Integer limits
For the rest of the macros in limits.h
we will use 2's complement
arithmetic and Figure 3.1 from the ABI to determine our values.
The formulas for determining these numbers are as follows:
- \(n =\) CHAR_BIT \(\times\) number of bytes
- signed minimum \(= -(2^{n-1})\)
- signed maximum \(= 2^{n-1} - 1\)
- unsigned minimum \(= 0\)
- unsigned maximum \(= 2^{n} - 1\)
For example, the char
type is 1 byte and gives us the following values:
- \(n = 8\)
- signed minimum \(= -(2^{8-1}) = {-128}\)
- signed maximum \(= 2^{8-1} - 1 = 127\)
- unsigned minimum \(= 0\)
- unsigned maximum \(= 2^{8} - 1 = 255\)
Notice anything odd here? The fact that \(|{-128}| \gt |127|\) presents a small
issue. We've defined that the maximum signed char
is \(127\), yet we use an even
larger (absolute) value to define the minimum signed char
value.
If we look at the C standard to understand how integers are parsed during
compilation, we'll see that the sign is not a part of an integer constant. The
standard states, "An integer constant begins with a digit, but has no period or
exponent part. It may have a prefix that specifies its base and a suffix which
specifies its type." So for the minimum signed char
value we computed, the
unary minus operator, -
, is parsed separately from the integer 128
and since
we want to use that value with signed char
types, then keeping the value as
\(128\) would result in overflowing the type. To avoid this we can substitute in
an expression equal to \({-128}\), like (-127 - 1)
. We'll use a similar format
for the signed minimum on other types.
Two final things to cover. The standard does not specify whether or not a char
is signed
or unsigned
. However, the ABI specifies that a char
is a signed
byte so we will implement it as such. Lastly, the macro MB_LEN_MAX
is for the
number of bytes in a multibyte character for any supported locale. By setting
this to 4
we allow support for UTF-8 which requires a maximum of 4 bytes
for any one character according to its specification.
Our full implementation looks like so:
#define CHAR_BIT (8)
#define SCHAR_MIN (-127 - 1)
#define SCHAR_MAX (+127)
#define UCHAR_MAX (+255)
#define CHAR_MIN SCHAR_MIN
#define CHAR_MAX SCHAR_MAX
#define MB_LEN_MAX (4)
#define SHRT_MIN (-32767 - 1)
#define SHRT_MAX (+32767)
#define USHRT_MAX (+65535)
#define INT_MIN (-2147483647 - 1)
#define INT_MAX (+2147483647)
#define UINT_MAX (+4294967295U)
#define LONG_MIN (-9223372036854775807L - 1L)
#define LONG_MAX (+9223372036854775807L)
#define ULONG_MAX (+18446744073709551615UL)
You may notice on the last few macros that the constant expressions have a
letter suffixed to them, like U
or L
(or both). These are used to indicate
an unsigned value and a long value, respectively, and help communicate your
desired type for integer constants since they don't have type specifiers.
errno.h
April 19, 2015
tags: compiling, errno.h, linking, make
Since the Standard mentions the details of errno.h
first, 7.1.4 "Errors
<errno.h>", then we'll start there. This post will look into how to
implement a global variable, errno
, some macros, and how to properly link
object files together so that all of your symbols resolve correctly.
About errno.h
The standard states that errno.h
provides several macros relating to the
reporting of error conditions. Specifically, a "modifiable lvalue that has type
int
", and two macros: EDOM
and ERANGE
. So, what is a modifiable lvalue and
what are macros?
Lvalues
The standard defines lvalues in section 6.2.2.1, "Lvalues and function
designators". An lvalue is an expression that designates an object, with an
object type or incomplete type other than void
. The definition of a modifiable
lvalue is a little more complicated: an lvalue that doesn't have an array type,
an incomplete type, or a const-qualified type, and if it is a structure or
union, it has no members that are of const-qualified type. Basically, an object
which you can assign a value to. The term "lvalue" can be thought of as meaning
"left value". In expressions like object = value;
, the object
is a
modifiable lvalue; it's on the left and we are modifying it.
Macros
Macros are a bit simpler. Section 6.8.3 of the standard, "Macro replacement",
states that the identifier immediately following the define
is called the
macro name. For example:
#define MY_MACRO (10)
MY_MACRO
is the macro name. Immediately following a macro of this form is the
replacement-list followed by a new-line. This means that when the preprocessor
runs through our C code, every instance of MY_MACRO
will be replaced with the
characters (10)
. This makes our code much more readable and allows us to avoid
using "magic numbers" which are values which have no apparent meaning.
For example, if you are using strings within your program and you know that you only want to handle strings which are less than or equal to 30 characters in length, you could have something like:
if (30 >= strlen(myString))
{
/* process string */
}
This is fine since you wrote it, but there is a good chance someone else may look through your code and when they come across this 30 they would wonder what meaning it has. It might be easy to infer that it's the max string length, but they would have to further investigate your code to be sure. Instead, you could define a macro for the max string length and then use it like so:
#define MAX_STR_LEN (30)
...
if (MAX_STR_LEN >= strlen(myString))
{
/* process string */
}
Now the next person can easily determine that you were checking to be sure the string was within your limits. In addition, it's likely that you would use that value several times within your program and at some point you might decide that you will now handle strings up to 50 characters in length. Instead of having to update every instance of 30 that corresponds to your string length, you only need to modify your macro and everything is automatically updated for you! This will help you avoid bugs since every relevant instance of your max string length is updated, rather than you having to find and replace them manually. Let the preprocessor do the work for you.
errno
So, errno
is a modifiable lvalue of type int
which should mean we can just
define it as int errno;
, right? Well, the standard explicitly states that
errno
does not have to be the identifier of an object and that it could expand
to the result from a function call like *errno()
. The way I'm choosing to
implement it is with an object of type int, rather than with a function. The use
case for having a function macro is that every thread of your program would have
its own instance of errno
. This prevents threads from stepping on each other's
error conditions. However, multi-threading support is another beast so I'll just
stick to getting things working on one-thread before moving on to more.
The idea behind errno
is that the library will update errno
when something
goes wrong and return an indication to the caller that it should check errno
.
This means that our errno
object needs to have global scope such that any part
of the code can check it. The way we accomplish this with C is through the use
of the extern
storage-class specifier.
Declarations vs. definitions
We need to take a short detour on the difference between declarations and definitions. The standard provides some information about them in section 6.5, "Declarations". A declaration specifies the interpretation and attributes of a set of identifiers; i.e. the storage-class, type-specifier, type-qualifiers, and the name. A definition is what causes storage to be reserved for an object.
Some examples of declarations in C look like:
extern int a;
int
main(void);
Note that these are at file scope, and not within a function. This declares an
object, a
, with external linkage and of type int
, in addition to a function
of type int
, named main
which takes no parameters. This indicates to the
compiler that these two objects will be defined elsewhere and not to reserve any
storage for them since the storage will be reserved by other code.
Some examples of definitions in C look like:
int a;
int
main(void)
{
int b;
int c = 0;
return c;
}
Each object here is defined and has space reserved for it: a
, main
, b
, and
c
. The two questionable objects are a
and b
; we never assigned any value
to them yet they are still definitions? With C, declarations at file scope, without external linkage (i.e. without the extern storage-class), also count as definitions; the standard calls these tentative definitions. Likewise, all function local variable declarations are also definitions regardless of whether or not you assigned a value to them.
Implementing the errno object
Since we want multiple files to be able to access the errno
object, we would
need to place the line extern int errno;
in every file that needs access to
errno
. The way we will accomplish this is by creating the header file,
errno.h
, and placing this declaration inside it. Now, any file wishing to
access errno
can just include errno.h
and it will get access. This is only
declaring the errno
object, so we also need to provide a definition for
errno
so that space is reserved for it such that the library can actually
assign it a value and other files can check that value. We will have a matching
errno.c
file with int errno;
in it providing the definition for our errno
object. It is unnecessary to initialize errno
since it is an external
variable. The standard dictates that objects with external linkage have static
storage duration (6.1.2.4 "Storage duration of objects"), that objects in
static storage are initialized before program startup (5.1.2 "Execution
environment"), and that the value they are initialized to is 0 if no value is
provided (6.4.7 "Initialization").
So, errno.h
will contain the declaration for errno
:
extern int errno;
errno.c
will contain the definition for errno
:
int errno;
And files which want to access errno
only need to include errno.h
like so:
#include <errno.h>
EDOM and ERANGE
errno.h
must also provide these two macros which indicate two different error
conditions, defined in 7.5.1, "Treatment of error conditions". EDOM
indicates
a domain error which means that the input argument is outside the domain over
which the function is defined; e.g. if your function accepts inputs from 0 to 5
and you pass 10 to it, you've caused a domain error. ERANGE
indicates a range
error which means that the result of a function cannot be represented as a
double
value; e.g. if you try to take the log of 0 then you cause a range
error because log(0) is undefined and cannot be represented.
Since we're using macros for EDOM
and ERANGE
it doesn't really matter what
value we give them since the macro name would be used to check errno
. We will
just use the values 1 and 2 to represent EDOM
and ERANGE
, respectively.
Just like errno
, we want other files to have access to EDOM
and ERANGE
;
we'll place them in the errno.h
file so that anyone accessing errno
automatically gets access to these macros as well.
We'll add the following to errno.h
:
#define EDOM (1)
#define ERANGE (2)
Header Guards
Since we've written our first header file, we also need to talk about header
guards. When you #include
a file, the contents of that file are inserted into
the current file at the location of the #include
directive. When multiple
files include the same file, the contents get included multiple times and you
can end up with multiple definition errors.
To prevent this, we use preprocessor directives to control when the contents of the header file get used. Header guards look like the following:
#ifndef _HEADER_H
#define _HEADER_H
/* contents of header */
#endif /* _HEADER_H */
This works by checking to see if the identifier _HEADER_H
has not been defined
yet. If it hasn't been defined, in the case of the first occurrence of a file
including this header, then the code between the #ifndef
and #endif
is
processed and the identifier _HEADER_H
is defined. If the identifier
_HEADER_H
has been defined, in the case of any inclusion of the header after
the first, the code inside the conditionals is not processed. Anytime you use a
header, you should use header guards to prevent errors from occurring when the
header needs to be included by multiple files.
Identifier name
The identifier we used, _HEADER_H
, will usually correspond to the specific
name of the file that the header guards are placed in. The reason that we
prepend these with an underscore, _
, is because tokens of this naming scheme
are specifically reserved by the standard for use in the standard library.
Section 7.1.3, "Reserved identifiers" states what names are reserved, and our
identifier falls within "identifiers that begin with an underscore and either an
uppercase letter or another underscore". Specifically for our errno.h
file,
our header guards will look like:
#ifndef _ERRNO_H
#define _ERRNO_H
/* contents of header */
#endif /* _ERRNO_H */
Implementation of errno.h
That finishes up all the pieces we need for errno.h
. Combining all these
things together, our implementation of errno.h
is below (without comments):
#ifndef _ERRNO_H
#define _ERRNO_H
#define EDOM (1)
#define ERANGE (2)
extern int errno;
#endif /* _ERRNO_H */
Makefile changes
A few modifications to the Makefile were needed to facilitate building object files from C source files and to use our header directory instead of the standard system header directory.
Compiling C files into object files
This addition was pretty simple. We we're previously only assembling .s files,
but now that we have some C code, we also want to build objects from .c files.
We can add a new pattern substitution that appends more object names to the
OBJECTS
variable, like so:
OBJECTS += $(patsubst $(SRCDIR)/%.c,%.o,$(wildcard $(SRCDIR)/*.c))
Defining the system include directory
By default, gcc
will use /usr/include
as the location for finding system
header files. System headers are wrapped with <>
rather than ""
when you
#include
them. Since we want to use the header files in our ./include
directory, we need to tell gcc
not to use files found in /usr/include
. The
-nostdinc
flag tells gcc
not to search the standard system directories for
header files, exactly what we want! Now, we need to tell gcc
where to search
for system headers with the -isystem
flag. You simply provide a directory path
to this argument, so from the base of our directory structure we will use the
./include
directory by adding -isystem ./include
.
Compile for a specific standard
gcc
also includes specific rules for compiling against a certain standard.
Since we are writing this for ISO 9899:1990 we want to compile for that standard
by passing the following flag: -std=iso9899:1990
.
Testing errno.h
Since we implemented errno.h
then we need to write a test to ensure that
things work the way we want them to. This should be pretty simple because all we
need to do is #include <errno.h>
and ensure that we can use the errno
object
as well as the EDOM
and ERANGE
macros. The test I wrote looks like so:
#include <errno.h>
/*******************************************************************************
errno_test
*//**
@brief Tests the functionality of errno.h
@return Indication of success
@retval 0 Successfully ran all tests
@retval 1 Failed to run tests
*******************************************************************************/
int
errno_test(void)
{
int ret = 0;
/* Ensure errno starts out as 0 */
if (0 == errno)
{
/* Test setting errno to all defined values */
errno = EDOM;
errno = ERANGE;
/* Reset since no errors occurred */
errno = 0;
}
else
{
ret = 1;
}
return ret;
}
Link order
Lastly, we need to slightly alter the order of our build command so that our symbols will resolve properly. Initially, we used the following recipe to link our test code against the library:
$(TSTNAME): $(LIBNAME) $(TSTOBJS)
$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(patsubst $(LIBNAME),,$^)
When I compiled this way, I kept getting the following errors:
errno_test.o: In function `errno_test':
errno_test.c:(.text+0xd): undefined reference to `errno'
errno_test.c:(.text+0x17): undefined reference to `errno'
errno_test.c:(.text+0x21): undefined reference to `errno'
errno_test.c:(.text+0x2b): undefined reference to `errno'
The first thing I checked was that the library actually had the errno symbol and that it was global. You can check this by dumping the symbol table with objdump -t:
$ objdump -t libwelibc.a
In archive libwelibc.a:
_start.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 g .text 0000000000000000 _start
0000000000000000 *UND* 0000000000000000 main
errno.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 errno.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000004 O *COM* 0000000000000004 errno
Hrm, we can see the last entry is errno and that it's a common symbol (indicated by *COM*). Common symbols are uninitialized global variables that haven't yet been assigned a section, which is where we would expect to see errno since we didn't specifically initialize it in errno.c. So, the symbol is there but the linker is having trouble resolving it.
When linking object files together, the order in which you list them makes a
difference in how the symbols are resolved during linking. Eli Bendersky wrote
a great article explaining this, Library order in static linking. It even
has an example of this exact situation of linking against a library. The answer
to the problem is to place the library after the other object files we are
linking together. So, we just move the $(LDFLAGS)
variable after the pattern
substitution like so:
$(TSTNAME): $(LIBNAME) $(TSTOBJS)
$(CC) $(CFLAGS) $(patsubst $(LIBNAME),,$^) $(LDFLAGS) -o $@
I also moved the output file to the end because that seems to make sense to me.
Conclusion
With all that, we now know what modifiable lvalues and macros are, how to use global variables with C, and how to link properly with our own system headers. We now have the error reporting facilities we need to move on with the library. The next piece we will implement is 7.1.5, "Limits <float.h> and <limits.h>".
Making a Makefile
April 08, 2015
tags: assembling, compiling, design, linking, make
make
is a GNU utility used for compiling programs and is the build system that
was chosen for welibc. It helps automate the build process so that you don't
need to memorize the specific compiler flags you want to use, remember exactly
how to link against a library, or rely on having your command history available
to compile your code. make
is not necessarily complicated but there are a few
pitfalls to getting started. We'll go over the creation of the makefile for
welibc to help guide you.
Directory Structure
In a previous post the directory structure of welibc was discussed but it was just a placeholder since no code had been written yet. Since we actually have some files now, here is what the structure looks like:
welibc/
├── docs
├── license.txt
├── README.md
├── src
│ └── _start.s
└── test
└── main_test.c
All source code will be inside the src/ directory and any code used to test the library will be placed in test/. This keeps files in logically separate places so the project doesn't become cluttered with assembly, C code, headers, and tests all mixed together.
Compiling without make
We only have two files here so compiling this project should be cake, right? Well, let's consider what we need to do:
- Generate a _start.o file from _start.s
- Create libwelibc.a from _start.o
- Compile, but not link, main_test.c
- Link main_test.o with libwelibc.a to create a test_welibc program
- Run the test
That seems manageable. Some of the compilation basics and commands for generating libraries were covered before so I won't go over them here. Instead, I'll just show the commands:
$ gcc -nostdlib -ffreestanding -c -o _start.o src/_start.s
$ ar cr libwelibc.a _start.o
$ gcc -nostdlib -ffreestanding -c -o main_test.o test/main_test.c
$ gcc -lwelibc -L. -nostdlib -ffreestanding -o test_welibc main_test.o
$ ./test_welibc
That's only five commands which isn't too many, but let's look a little more
closely. First, the compile flags are copied across three commands so if we want
to change them we'll need to remember to include the updated flags in all three
commands. Second, if we change something in _start.s then we'll need to go back
and run all five commands again; that's just hitting the up arrow and enter keys
five times though. Lastly, when (not if) we add more code we'll need to be sure
to include each one as appropriate which could add another entire command for
each new file we need to compile for the project. Well, make
can take care of
these things for us in an automated fashion so that we don't have to rely on
ourselves to remember each detail of the compilation process. After all, why
should we do it when the computer can do it for us?
Compiling with make
I'll only go over a few of the features of make
, but a deeper explanation can
be read from the source, the GNU Make Manual. To compile with make
you
must have a makefile with rules for compiling your project which you can run
from the command-line by issuing the make
command.
Comments
You can include comments in your makefile to document how things work. Comments
are started with the #
character and can be placed on their own line, or at
the end of a line.
# This is my makefile. There are many like it, but this one is mine.
Rules
Rules are used to define something for make
to do, such as creating an object
file or linking an object file against a library. The name of the rule, called a
target, goes at the beginning of the line and is followed by a colon. The files
or other rules required to run a rule, called prerequisites, go on the same line
after the colon. The commands that should be run for that rule go on the
following lines and are prepended with a tab character; the combination of these
commands is called the recipe.
target: prerequisites
recipe
Variables
make
lets you define variables to use throughout your makefile; these are case
sensitive and have a few assignment operators, but we'll just use :=
and +=
for now. :=
assigns data to a variable while +=
appends more data to it. For
the most part, you can just consider variables to be character arrays or
strings. Variables are used by placing them inside parentheses and prepending
a dollar sign, like $(myvar)
.
myvar := some stuff
myvar += more things
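These assignments are easy to inspect with a throwaway makefile. The sketch below assumes GNU make; the file name vars.mk is invented for the demo, and the $(info) function simply prints a value while make parses the file:

```shell
# Scratch makefile: := assigns, += appends (with a separating space).
# $(info ...) prints the final value when make reads the file.
cat > vars.mk <<'EOF'
myvar := some stuff
myvar += more things
$(info $(myvar))
all: ;
EOF

make -f vars.mk   # prints: some stuff more things
```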
Basic Makefile
The basic version of the makefile used to replicate the steps above looks like
so and is named Makefile
:
# Makefile
all: _start.o libwelibc.a test_welibc
_start.o: src/_start.s
gcc -nostdlib -ffreestanding -c -o _start.o src/_start.s
libwelibc.a: _start.o
ar cr libwelibc.a _start.o
test_welibc: libwelibc.a test/main_test.c
gcc -nostdlib -ffreestanding -c -o test_welibc.o test/main_test.c
gcc -lwelibc -L. -nostdlib -ffreestanding -o test_welibc test_welibc.o
./test_welibc
clean:
rm _start.o libwelibc.a test_welibc.o test_welibc
You can see that the same five commands from above are replicated here in
addition to an rm
command. We have separated out the commands into targets so
that each rule will build a specific piece of the project.
By default, make
will run the first rule that it comes across; in this
makefile that would be the "all" rule. This rule only specifies prerequisites
which, if they don't exist or are out of date, will be built before
continuing into the recipe portion of the rule. So, this will run the rules for
_start.o, libwelibc.a, and test_welibc before executing the recipe for the rule.
The recipe for this rule is empty so the make process will simply end after
executing the rules for each prerequisite.
The _start.o rule requires the src/_start.s file which is located in exactly the spot indicated. If _start.o is older than _start.s, or if _start.o doesn't exist, then make will create it using the recipe. The same goes for the libwelibc.a rule and the test_welibc rule. However, for the test_welibc rule, the recipe has a few steps which are each run in order.
Since "clean" was not a prerequisite of the "all" rule, it won't be run. We can
run specific rules by passing the target as an argument on the command-line. For
example, if you want to run the rm
command in the "clean" rule, you would run
make clean
and it would clean up your files for you. Pretty nice!
Basic Enhancements
There are some simple things you can do to enhance your makefile a bit. We'll go
over those before getting into some more advanced things to solidify your make
skills.
Using Variables
You can see that the list of files to delete and the list of prerequisites for
the "all" rule are pretty similar. We should place these file
names/prerequisites into a variable so we don't have to worry about updating
things in two places. A standard variable name for object files would simply be
OBJECTS
. We'll split up the test_welibc rule into two pieces as well: one to
create test_welibc.o and one to link and run the test program.
# Makefile
OBJECTS := _start.o libwelibc.a test_welibc.o
all: $(OBJECTS) test_welibc
_start.o: src/_start.s
gcc -nostdlib -ffreestanding -c -o _start.o src/_start.s
libwelibc.a: _start.o
ar cr libwelibc.a _start.o
test_welibc.o: test/main_test.c
gcc -nostdlib -ffreestanding -c -o test_welibc.o test/main_test.c
test_welibc: libwelibc.a test_welibc.o
gcc -lwelibc -L. -nostdlib -ffreestanding -o test_welibc test_welibc.o
./test_welibc
clean:
rm $(OBJECTS) test_welibc
This has the same functionality, except that when we add a new object file we only add it to the OBJECTS variable; it then gets deleted automatically without us having to update the "clean" rule by hand.
The next set of variables that we'll add will cover the compiler being used and
the flags passed to the compiler. It's very common to see variables named CC
,
CFLAGS
, and LDFLAGS
in a makefile. These correspond to the compiler,
compiler flags, and linker flags, respectively.
# Makefile
CC := gcc
CFLAGS := -nostdlib -ffreestanding
LDFLAGS := -lwelibc -L.
OBJECTS := _start.o libwelibc.a test_welibc.o
all: $(OBJECTS) test_welibc
_start.o: src/_start.s
$(CC) $(CFLAGS) -c -o _start.o src/_start.s
libwelibc.a: _start.o
ar cr libwelibc.a _start.o
test_welibc.o: test/main_test.c
$(CC) $(CFLAGS) -c -o test_welibc.o test/main_test.c
test_welibc: libwelibc.a test_welibc.o
$(CC) $(LDFLAGS) $(CFLAGS) -o test_welibc test_welibc.o
./test_welibc
clean:
rm $(OBJECTS) test_welibc
Ignoring Errors and PHONYs
Let's say there was an error when test_welibc.o was created, so only _start.o
was made. We might want to clean up what was built, fix the error, and finally
recompile. However, the rm
command would fail because it can't find the rest
of the object files or test_welibc. You can prepend any command with a hyphen,
-
, to instruct make
to ignore any errors produced by that command. As well,
we can add the -rf
flags to rm
to suppress errors about missing files
and to remove directories that are not empty.
Some targets, like our clean rule, don't correspond to actual files; make
provides a special directive to mark rules which are simply a set of actions.
This is the .PHONY
directive: give it a target and make
will know to always run that target even though it has no files to
determine when things are out of date and need to be rebuilt. In addition to
this, if we end up with a file actually named "clean" then make
won't get
confused; it will know that the clean rule we have is for deleting files and
doesn't correspond to compiling a file named "clean". Here are the new all and
clean rules:
.PHONY: all
all: $(OBJECTS) test_welibc
.PHONY: clean
clean:
-rm -rf $(OBJECTS) test_welibc
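To see the hyphen doing its job, here's a throwaway makefile (the names are invented for the demo); printf is used because recipe lines must start with a real tab character:

```shell
# The '-' prefix tells make to ignore the failure of rm and keep going.
printf 'clean:\n\t-rm no_such_file\n\t@echo kept going\n' > ignore.mk

# rm fails, make notes "Error 1 (ignored)", and the recipe continues.
make -f ignore.mk clean
```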
Advanced makery
The following will assist in further automating the make
process and will also
clean up the makefile to make it easier to understand when changing it.
Automatic File Finding
We already know that we'll be adding more code to the project which means we'll
have to update the objects variable each time and add a new rule. Instead of
manually updating the objects variable we can automate finding files so we can
focus our attention on creating new rules. make
has a wildcard character, %
,
which can be combined with two built-in functions, patsubst
(for pattern
substitution) and wildcard
, to generate a list of object files from every ".s"
file that is found. A new variable to indicate where our main source files live
would be helpful as well. The additions look like:
SRCDIR := src
OBJECTS := $(patsubst $(SRCDIR)/%.s,%.o,$(wildcard $(SRCDIR)/*.s)) libwelibc.a
The $(wildcard $(SRCDIR)/*.s)
part will match every file in the src/ directory
that ends in ".s". The pattern substitution will then do a search and replace
where the %
character represents the filename. So, src/_start.s would become
_start.o and anytime we add a new ".s" file make
will find it automatically.
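The substitution is easy to test in isolation. In this sketch the directory layout and the memcpy.s file name are made up purely to show the pattern matching, and GNU make is assumed:

```shell
# Fake source tree with two assembly files.
mkdir -p demo/src
touch demo/src/_start.s demo/src/memcpy.s

# $(wildcard) collects the .s files; $(patsubst) maps src/NAME.s to NAME.o.
cat > demo/pat.mk <<'EOF'
SRCDIR := src
OBJECTS := $(patsubst $(SRCDIR)/%.s,%.o,$(wildcard $(SRCDIR)/*.s))
$(info $(OBJECTS))
all: ;
EOF

(cd demo && make -f pat.mk)   # lists _start.o and memcpy.o
```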
We can create similar variables just for the ".c" files in the test directory since they're no longer found with the changes we just made:
TSTDIR := test
TSTOBJS := $(patsubst $(TSTDIR)/%.c,%.o,$(wildcard $(TSTDIR)/*.c))
The rule for compiling the main_test.c file needs to change slightly, along with test_welibc and the prerequisite list for libwelibc.a; here's the new makefile:
# Makefile
CC := gcc
CFLAGS := -nostdlib -ffreestanding
LDFLAGS := -lwelibc -L.
SRCDIR := src
TSTDIR := test
OBJECTS := $(patsubst $(SRCDIR)/%.s,%.o,$(wildcard $(SRCDIR)/*.s)) libwelibc.a
TSTOBJS := $(patsubst $(TSTDIR)/%.c,%.o,$(wildcard $(TSTDIR)/*.c))
.PHONY: all
all: $(OBJECTS) $(TSTOBJS) test_welibc
_start.o: src/_start.s
$(CC) $(CFLAGS) -c -o _start.o src/_start.s
libwelibc.a: $(OBJECTS)
ar cr libwelibc.a _start.o
main_test.o: test/main_test.c
$(CC) $(CFLAGS) -c -o main_test.o test/main_test.c
test_welibc: libwelibc.a main_test.o
$(CC) $(LDFLAGS) $(CFLAGS) -o test_welibc main_test.o
./test_welibc
.PHONY: clean
clean:
-rm -rf $(OBJECTS) $(TSTOBJS) test_welibc
Virtual Paths and Automatic Variables
It's kind of annoying to have to specify the directory where a file exists and
it clutters up the rules to have more than just file names. make
provides a
variable that you can fill in, VPATH
(remember this is case-sensitive), with
colon-separated locations to search for files when they aren't found in the
current directory. Sounds like a great thing to use for the src/ and test/
directories:
VPATH := $(SRCDIR):$(TSTDIR)
Now we can simply list the filenames in the prerequisite lists, rather than
prepending the directory. However, the recipe commands won't search these
directories since they are passed directly to your shell for execution. To get
around this, make
provides some automatic variables which are expanded before
passing a recipe command to the shell. The $^
variable corresponds to every
prerequisite file and is updated with the path and name found using VPATH. The
$@
variable corresponds to the target name and the $<
variable corresponds
to the first prerequisite. With these powers combined we get:
# Makefile
CC := gcc
CFLAGS := -nostdlib -ffreestanding
LDFLAGS := -lwelibc -L.
SRCDIR := src
TSTDIR := test
OBJECTS := $(patsubst $(SRCDIR)/%.s,%.o,$(wildcard $(SRCDIR)/*.s)) libwelibc.a
TSTOBJS := $(patsubst $(TSTDIR)/%.c,%.o,$(wildcard $(TSTDIR)/*.c))
VPATH := $(SRCDIR):$(TSTDIR)
.PHONY: all
all: $(OBJECTS) $(TSTOBJS) test_welibc
_start.o: _start.s
$(CC) $(CFLAGS) -c -o $@ $^
libwelibc.a: $(OBJECTS)
ar cr $@ $^
main_test.o: main_test.c
$(CC) $(CFLAGS) -c -o $@ $^
test_welibc: main_test.o libwelibc.a
$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $<
./test_welibc
.PHONY: clean
clean:
-rm -rf $(OBJECTS) $(TSTOBJS) test_welibc
That simplifies the syntax quite a bit!
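A tiny rule makes the automatic variables concrete; the file names here are invented for the demo, and printf supplies the leading tab the recipe needs:

```shell
# $@ = target, $< = first prerequisite, $^ = all prerequisites.
printf 'out.txt: a.txt b.txt\n\t@echo "target=$@ first=$< all=$^"\n' > auto.mk
touch a.txt b.txt

make -f auto.mk   # prints: target=out.txt first=a.txt all=a.txt b.txt
```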
Wildcard Targets
We can use the same %
wildcard from before to create some wildcard rules so we
don't need a specific rule for each .o which gets built with the same command,
$(CC) $(CFLAGS) -c -o file.o file.c
. Remember that the %
expands to the
matched name so we can use it in the prerequisite list as well. We'll still
need two rules since some objects are made from ".s" files and others from ".c"
files:
# Makefile
CC := gcc
CFLAGS := -nostdlib -ffreestanding
LDFLAGS := -lwelibc -L.
SRCDIR := src
TSTDIR := test
OBJECTS := $(patsubst $(SRCDIR)/%.s,%.o,$(wildcard $(SRCDIR)/*.s)) libwelibc.a
TSTOBJS := $(patsubst $(TSTDIR)/%.c,%.o,$(wildcard $(TSTDIR)/*.c))
VPATH := $(SRCDIR):$(TSTDIR)
.PHONY: all
all: $(OBJECTS) $(TSTOBJS) test_welibc
_%.o: _%.s
$(CC) $(CFLAGS) -c -o $@ $^
%.o: %.c
$(CC) $(CFLAGS) -c -o $@ $^
libwelibc.a: $(OBJECTS)
ar cr $@ $^
test_welibc: main_test.o libwelibc.a
$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $<
./test_welibc
.PHONY: clean
clean:
-rm -rf $(OBJECTS) $(TSTOBJS) test_welibc
That's not a huge improvement at the moment, but it's the other half of automatic file finding since now we have automatic rule generation for objects which don't require any special recipe.
Generic Tricks
The next few iterations of the makefile expand on the things we've already covered.
First, we'll add variables for the name of our library and the name of the test
in addition to creating a "test" rule that is separate from the "all" rule so
that we don't test every time we build the library. To test, you would use the
make test
command. Also, we'll clean up all the files before rebuilding with
the "all" target by making "clean" a prerequisite for the "all" rule. The
$(TSTNAME)
rule uses a pattern substitution to remove $(LIBNAME)
from the
list of object files that are linked together. It's redundant to provide
-lwelibc -L.
(the $(LDFLAGS)
variable) and also provide libwelibc.a in the
list of files to link. Further, we want to make a distinction between them as
the proper way to compile this would be to generate the test object file and
then link it against our library (which is what we accomplish here).
# Makefile
CC := gcc
CFLAGS := -nostdlib -ffreestanding
LDFLAGS := -lwelibc -L.
SRCDIR := src
TSTDIR := test
LIBNAME := libwelibc.a
TSTNAME := test_welibc
OBJECTS := $(patsubst $(SRCDIR)/%.s,%.o,$(wildcard $(SRCDIR)/*.s)) $(LIBNAME)
TSTOBJS := $(patsubst $(TSTDIR)/%.c,%.o,$(wildcard $(TSTDIR)/*.c))
VPATH := $(SRCDIR):$(TSTDIR)
.PHONY: all
all: clean $(OBJECTS)
.PHONY: test
test: $(TSTNAME)
./$<
_%.o: _%.s
$(CC) $(CFLAGS) -c -o $@ $^
%.o: %.c
$(CC) $(CFLAGS) -c -o $@ $^
$(LIBNAME): $(OBJECTS)
ar cr $@ $^
$(TSTNAME): $(LIBNAME) $(TSTOBJS)
$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(patsubst $(LIBNAME),,$^)
.PHONY: clean
clean:
-rm -rf $(OBJECTS) $(TSTOBJS) test_welibc
Next, we'll add an install rule to place the final libwelibc.a static library into the /usr/lib folder on the machine, as well as an uninstall rule to remove it.
# Makefile
CC := gcc
CFLAGS := -nostdlib -ffreestanding
LDFLAGS := -lwelibc -L.
INSTALL := install
SRCDIR := src
TSTDIR := test
LIBNAME := libwelibc.a
TSTNAME := test_welibc
OBJECTS := $(patsubst $(SRCDIR)/%.s,%.o,$(wildcard $(SRCDIR)/*.s)) $(LIBNAME)
TSTOBJS := $(patsubst $(TSTDIR)/%.c,%.o,$(wildcard $(TSTDIR)/*.c))
VPATH := $(SRCDIR):$(TSTDIR)
.PHONY: all
all: clean $(OBJECTS)
.PHONY: test
test: $(TSTNAME)
./$<
.PHONY: install
install: $(LIBNAME)
$(INSTALL) $^ /usr/lib
.PHONY: uninstall
uninstall: $(LIBNAME)
-rm -rf /usr/lib/$^
_%.o: _%.s
$(CC) $(CFLAGS) -c -o $@ $^
%.o: %.c
$(CC) $(CFLAGS) -c -o $@ $^
$(LIBNAME): $(OBJECTS)
ar cr $@ $^
$(TSTNAME): $(LIBNAME) $(TSTOBJS)
$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(patsubst $(LIBNAME),,$^)
.PHONY: clean
clean:
-rm -rf $(OBJECTS) $(TSTOBJS) test_welibc
Lastly, the GNU Make Manual has an entire chapter dedicated to conventions that are commonly used with makefiles and is a great source of enhancements to add. I've chosen just a few to add to the last iteration of the makefile, for now.
First, we'll add INSTALL_PROGRAM
and INSTALL_DATA
which are to be used for
installing programs and installing data files onto a system. Along with these
we'll add LIBDIR
to specify the default destination which we also prepend with
DESTDIR
so that the variable can be overridden on the command line, as in
make DESTDIR=/some/path install
, to choose an alternate installation destination. They also instruct you to include the
SHELL
variable set to the Bourne shell specifically. The first instance of
.SUFFIXES
clears out all the suffixes being searched and the second specifies
which file suffixes are to be searched. This may save some time on larger
projects but probably won't matter for this one. The manual also suggests
placing any build programs into a variable (like ar
and gcc
), but not to do
so with shell utilities (like rm
). Finally, we end up with the following:
# Makefile
AR := ar
CC := gcc
CFLAGS := -nostdlib -ffreestanding
LDFLAGS := -lwelibc -L.
INSTALL := install
INSTALL_PROGRAM := $(INSTALL)
INSTALL_DATA := $(INSTALL) -m 644
SHELL := /bin/sh
LIBDIR := /usr/lib
SRCDIR := src
TSTDIR := test
LIBNAME := libwelibc.a
TSTNAME := test_welibc
OBJECTS := $(patsubst $(SRCDIR)/%.s,%.o,$(wildcard $(SRCDIR)/*.s)) $(LIBNAME)
TSTOBJS := $(patsubst $(TSTDIR)/%.c,%.o,$(wildcard $(TSTDIR)/*.c))
VPATH := $(SRCDIR):$(TSTDIR)
.SUFFIXES:
.SUFFIXES: .c .o .s
.PHONY: all
all: clean $(OBJECTS)
.PHONY: test
test: $(TSTNAME)
./$<
.PHONY: install
install: $(LIBNAME)
$(INSTALL) $^ $(DESTDIR)$(LIBDIR)/$(LIBNAME)
.PHONY: uninstall
uninstall: $(LIBNAME)
-rm -rf $(DESTDIR)$(LIBDIR)/$(LIBNAME)
_%.o: _%.s
$(CC) $(CFLAGS) -c -o $@ $^
%.o: %.c
$(CC) $(CFLAGS) -c -o $@ $^
$(LIBNAME): $(OBJECTS)
$(AR) cr $@ $^
$(TSTNAME): $(LIBNAME) $(TSTOBJS)
$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(patsubst $(LIBNAME),,$^)
.PHONY: clean
clean:
-rm -rf $(OBJECTS) $(TSTOBJS) test_welibc
Congratulations, your makefile has officially been pimped.
That's quite a bit more advanced than what we started with and we're only compiling two files! However, a lot of the changes are intended to ease the "pain" of adding more files to the library later on, as well as expanding the test suite. So, we did a bit of heavy lifting on the front-end here but hopefully this makefile will last a while with only minor tweaks along the way.
This wouldn't be complete without mentioning Paul's Rules of Makefiles
which is almost always referenced when people ask make
questions. I've tried
to abide by the rules and I'll admit that I wanted to do things differently but
sticking to the rules actually made things easier in the end.
Creating a Static Library
March 22, 2015
tags: assembling, compiling, linking
Creating libraries allows you to place code in a container so that it can be used by other programs. You can create a shared library which is loaded into memory and is available for any running process to use, or you can create a static library which is bundled up within an executable in the same manner as linking in another object file. We'll go over static library creation since welibc isn't ready to be a shared library yet.
Static libraries
Rather than being loaded by the operating system and being made available to any process, a static library is linked in at compile time in a similar fashion to a normal object file. This is the method I'll be using for testing welibc.
We can assemble the following code just as we normally would to create an object file, from _start.s:
.globl _start # Declare global _start symbol
_start:
call main # Call main function
mov %rax, %rbx # Copy return value from main as argument to exit(2)
mov $1, %rax # Set rax to syscall number for exit
int $0x80 # Trigger interrupt to call exit
Assemble:
$ gcc -nostdlib -ffreestanding -c -o _start.o _start.s
Then we create an archive with the ar
program:
$ ar cr libwelibc.a _start.o
Write our test code, main_test.c:
int
main(int argc, char *argv[])
{
int ret = 42;
return ret;
}
And compile:
$ gcc -nostdlib -ffreestanding -c -o main_test.o main_test.c
To link against a static library you specify the library's name without the
prepended "lib" using the -l
flag to gcc, and ensuring to pass the path of the
library with the -L
flag:
$ gcc -L./ -nostdlib -ffreestanding -o test_welibc main_test.o -lwelibc
Now you've linked against a static library and you can run your code:
$ ./test_welibc
$ echo $?
42
As welibc comes along it will be built as a static library and then the test code can link against it like above. This should keep things pretty simple for testing the functionality of the library.
Getting Rid of Stdlibc
March 22, 2015
tags: assembling, compiling, linking
To get started writing your own implementation of the standard C library you'll need to stop linking against whatever implementation that your compiler comes with. I'm working on a CentOS 7 machine and use glibc normally; now I want to use welibc (even though it doesn't really exist yet).
Normal Compilation
First, we're going to look at a "normal" compile. The beginning of the test suite for welibc is in a file called main_test.c and looks like this (comment blocks excluded)
int
main(int argc, char *argv[])
{
int ret = 0;
return ret;
}
Easy enough. This should just run and return 0 without doing anything else.
I'm using gcc
to compile this project, so to compile this code we can use the
following command
$ gcc -o test_welibc main_test.c
If things went properly then this compiles with no errors and gcc
doesn't
print anything. I'm also using bash for my shell and we can test that it ran
properly like so
$ ./test_welibc
$ echo $?
0
The echo $?
line will print the exit status of the most recently executed
foreground pipeline, i.e. the return value of the program we just ran. We got 0
which means things are working properly. You can be sure by using a much
different return value
int
main(int argc, char *argv[])
{
int ret = 42;
return ret;
}
Then rerun it
$ gcc -o test_welibc main_test.c
$ ./test_welibc
$ echo $?
42
So, now you know how a normal compilation would run.
Compiling without stdlibc
We never explicitly told gcc
to use glibc but it does this automatically
unless you instruct it not to. You would do that with the -nostdlib
-ffreestanding
flags, which prevent linking against the standard library and instruct gcc
that our final product will be running in a freestanding environment which means
the standard library might not exist and our program might not necessarily start
in the main
function. So, we'll add those to our compile line and see what
happens.
$ gcc -nostdlib -ffreestanding -o test_welibc main_test.c
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000400144
Uh... what? /usr/bin/ld? entry symbol _start? First, we didn't even tell you to
run ld
and second, my code doesn't even mention _start, so why do you need that
symbol?
Compilation process
When you compile a C program, a few things happen:
- The preprocessor performs text replacements
- The compiler converts C code into assembly instructions
- The assembler converts assembly instructions into object code
- The linker combines all of the object files together into an executable
With the GNU compiler collection your preprocessor and compiler will be gcc
,
the assembler is as
, and the linker is ld
. gcc
will automatically run each
of these as instructed so you don't have to worry about doing each of them
individually. However, when you need the control, you can absolutely run each
step individually.
The output of each of these steps can be seen by using the -save-temps
flag to
gcc
, so we'll compile with that and look at each step
$ gcc -save-temps -nostdlib -ffreestanding -o test_welibc main_test.c
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000400144
$ ls
main_test.c main_test.i main_test.o main_test.s test_welibc
Preprocessor
The preprocessor, gcc
in this case, moves through your C code and performs all
of the actions specified by the preprocessor directives, like:
#include
#define
#endif
This mostly performs text replacements within your code. For example, #include
actually inserts the text from whatever file you're including, and #define
replaces every instance of your macro with the value you assigned to it. It also
replaces every comment with a space.
The temporary file corresponding to this step is main_test.i
, let's look at
it:
# 1 "main_test.c"
# 1 "<command-line>"
# 1 "main_test.c"
# 47 "main_test.c"
int
main(int argc, char *argv[])
{
int ret = 42;
return ret;
}
There isn't really anything too interesting here, our code is mostly unchanged.
The .i file type indicates to gcc
that the code should not be preprocessed;
this makes sense because this is the output of the preprocessor. The first four
lines are all specific to the gcc
C preprocessor and contain a line number
followed by a filename in quotes. These aren't really important for now, but you
can read more about them here.
Compiler
The compiler, gcc
, will convert human readable code into assembly instructions
which correspond directly to a specific action that your processor needs to
carry out. For example, the processor doesn't know what return ret;
means, so
we need to translate that C code into the instructions that make the processor
"return the value of ret to the calling function." To look at the assembly,
check out the main_test.s file
.file "main_test.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl %edi, -20(%rbp)
movq %rsi, -32(%rbp)
movl $42, -4(%rbp)
movl -4(%rbp), %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (GNU) 4.8.2 20140120 (Red Hat 4.8.2-16)"
.section .note.GNU-stack,"",@progbits
This includes some things that are unique to the DWARF-2 debugging format and are too specific to get into right now. More information about these can be found here and in the DWARF-2 specification. The .ident and .section lines can also be ignored for the purposes of this article, but more info is here.
I'll add some comments to the assembly to indicate what it's doing
.file "main_test.c" # The file that these instruction were generated from
.text # The following instructions should be executable
.globl main # Declare a global symbol named main
.type main, @function # Set the "main" symbol to be of function type
main: # Define a label named main
.LFB0: # Define a label named .LFB0
.cfi_startproc
pushq %rbp # Push the value in the rbp register on to the stack
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp # Copy the contents of the rsp register into the rbp register
.cfi_def_cfa_register 6
movl %edi, -20(%rbp) # Move the contents of edi into memory
movq %rsi, -32(%rbp) # Move the contents of rsi into memory
movl $42, -4(%rbp) # Place the value 42 in memory
movl -4(%rbp), %eax # Move the value we stored in memory into the eax register
popq %rbp # Pop the value on top of the stack into the rbp register
.cfi_def_cfa 7, 8
ret # Return to the calling function
.cfi_endproc
.LFE0: # Define a label named .LFE0
.size main, .-main # Calculate the size of the main function
.ident "GCC: (GNU) 4.8.2 20140120 (Red Hat 4.8.2-16)"
.section .note.GNU-stack,"",@progbits
There is a lot going on here but the point is that you can see the 42 getting placed into memory, getting retrieved from memory and placed into a register, and finally the function returns to the caller. The eax register is the place where calling functions check for the return value.
Assembler
The assembler, /usr/bin/as
in this case, takes the assembly instructions in
the .s file and translates them into an object file which contains code that is
relocatable. This means that it will use offsets to reference things, rather
than hardcoded addresses, and will use a special address (like 0x00000000) to
indicate when something is unknown and needs to be fixed. This is done because
the assembler isn't aware of where this code will be placed in memory. The
special addresses will be fixed in the next stage. The object file also contains
the symbol table which correlates named objects to specific addresses, like
where the main
function is. There are several object file formats; the one we
are looking at is called ELF, the Executable and Linkable Format.
To view the information about the object file, use readelf
$ readelf -a main_test.o
This is too much information to look at, but I'll mention the symbol table specifically, which can be produced by itself with
$ readelf -s main_test.o
Symbol table '.symtab' contains 9 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS main_test.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 2
4: 0000000000000000 0 SECTION LOCAL DEFAULT 3
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 SECTION LOCAL DEFAULT 6
7: 0000000000000000 0 SECTION LOCAL DEFAULT 4
8: 0000000000000000 23 FUNC GLOBAL DEFAULT 1 main
Entry 8 is the entry for our main function; you can see the size of the function, the fact that it's a global function, and that its value is 0, which means it hasn't been fixed up yet.
Linker
The linker, /usr/bin/ld
in this case, will take one or more object files and
combine them together by looking at their symbol tables to resolve unknown
symbols so that it knows exactly where to go when it needs to execute specific
parts of your code. The output of this stage is (hopefully) a fully working and
executable binary. We instructed gcc
to output a file named test_welibc
which
is the file that the linker produces. We can look at the final symbol table
associated with the executable to see how our symbols were resolved
$ readelf -s test_welibc
Symbol table '.symtab' contains 13 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000400120 0 SECTION LOCAL DEFAULT 1
2: 0000000000400144 0 SECTION LOCAL DEFAULT 2
3: 000000000040015c 0 SECTION LOCAL DEFAULT 3
4: 0000000000400170 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 FILE LOCAL DEFAULT ABS main_test.c
7: 0000000000000000 0 FILE LOCAL DEFAULT ABS
8: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _start
9: 0000000000601000 0 NOTYPE GLOBAL DEFAULT 4 __bss_start
10: 0000000000400144 23 FUNC GLOBAL DEFAULT 2 main
11: 0000000000601000 0 NOTYPE GLOBAL DEFAULT 4 _edata
12: 0000000000601000 0 NOTYPE GLOBAL DEFAULT 4 _end
Entry 10 is the entry for our main function and you can see that its value is no
longer 0. However, we also have 5 more symbols than before, including the _start
symbol that was said to be missing; where did they come from? These extra
symbols are needed by the ELF file format and gcc
creates them as part of the
linking process. However, the _start symbol is special because that is the
location where an ELF file will start executing. The standard C library provides
its own _start symbol and since we aren't linking against it anymore (because we
specified -nostdlib
) we get the warning about not being able to find the symbol. You'll
also see that the "Ndx" column lists the _start symbol as being UND, or
undefined.
Bootstrapping your library
Now that we know a little more about the compilation process, ELF, and why we ended up with an error, we can look at fixing it.
First, let's just run our executable and see what happens
$ ./test_welibc
Segmentation fault
A segmentation fault generally means that we accessed a region of memory that was protected in some way. It makes sense that we get this error because the ELF file format begins execution at the address associated with the _start symbol, and our _start symbol's address (the value from the symbol table) is 0. Most systems will not allow you to reference data at, or execute code at, the address 0; so, we end up with a seg fault.
This means we need to provide a _start symbol and somehow get it to call our main function. This will require us to do some assembly programming.
Defining the _start symbol
The first file in our library will be _start.S, where we will define our _start symbol, call our main function, and finally exit from the program.
To define a global symbol, we use the .globl assembly directive, like so
.globl _start
This directive is specific to the GNU assembler (gas) so if you're using something else, be sure to look at its directive syntax (for example, nasm uses "global").
In assembly, to call a function we use the call instruction, like so
call main
And to exit from the program we will make use of the exit(2) syscall through use of the syscall instruction
syscall
However, we must determine how to pass arguments to main, receive its return value, and pass an argument to the exit syscall. These calling conventions are established by the ABI for your architecture and operating system. In the case of Linux on x86_64 we can turn to the System V ABI, which has this to say
Figure 3.4: Register Usage
%rax temporary register; with variable arguments
passes information about the number of vector
registers used; 1st return register
A.2.1 Calling Convention
1. User-level applications use as integer registers for passing the sequence
%rdi, %rsi, %rdx, %rcx, %r8 and %r9. The kernel interface uses %rdi,
%rsi, %rdx, %r10, %r8 and %r9.
2. A system-call is done via the syscall instruction. The kernel destroys
registers %rcx and %r11.
3. The number of the syscall has to be passed in register %rax.
Now that we know how to pass arguments, we will want to call main and exit like so
exit(main(argc, argv));
Where argc and argv are passed to main, whose return value is passed to exit. Before calling main we'll need to place argc into %rdi and argv into %rsi; main will give its return value to us in %rax, which needs to be placed into %rdi; and finally we need to place the syscall number for exit into %rax before issuing the syscall instruction. This leaves us with two questions: where do argc and argv come from, and what is the syscall number for exit?
argc and argv
The ABI tells us where these come from in Figure 3.11: Initial Process Stack
Argument pointers 8 + %rsp
Argument count %rsp
And additionally makes this note about %rsp's initial value
%rsp The stack pointer holds the address of the byte with lowest address
which is part of the stack. It is guaranteed to be 16-byte aligned
at process entry
Alignment will be important in just a moment. Additionally, the pointers in the argv array are located directly on the stack at the location specified above.
At the beginning of _start, %rsp will point to argc and argv will be 8 bytes higher on the stack. We can use pop to grab the value at the location to which %rsp points, and the resultant value of %rsp will be the value we need for argv (remember, argv is an array of pointers and to pass that as an argument in C we pass the location of that array). In assembler this is
pop %rdi
mov %rsp, %rsi
Our stack started off being 16-byte aligned but is now 8-byte aligned due to the pop instruction. This means we need to re-align the stack on a 16-byte boundary because that's what is expected upon entry to main. This can be achieved in a few ways, but we'll be explicit about what we're doing by AND'ing %rsp with a bit mask that will preserve all but the last nibble (-16 in two's complement), guaranteeing 16-byte alignment. Once done, we're ready to call main.
pop %rdi
mov %rsp, %rsi
and $-16, %rsp
call main
exit syscall number
Our last item is to figure out what the syscall number for exit is. 64-bit syscalls on Linux are defined in the header /usr/include/asm/unistd_64.h, which has the following line
#define __NR_exit 60
Now we save off the return value from main and call the exit syscall like so
mov %rax, %rdi
movq $60, %rax
syscall
I was given a protip to specify the size when moving immediate values rather than relying on the default size. This is why movq is used on the immediate value of 60.
_start.S will look something like this
.globl _start # Declare global _start symbol
_start:
pop %rdi # Place argc as first arg to main
mov %rsp, %rsi # Place location of argv[] as second arg to main
and $-16, %rsp # Realign stack on a 16-byte boundary
call main # Call main function
mov %rax, %rdi # Copy return value from main as argument to exit(2)
movq $60, %rax # Set rax to syscall number for exit
syscall # Execute the syscall specified in rax
Assembling assembly
We now have an assembly file that we need to assemble. Let's try assembling it by itself to see what happens
$ gcc -o _start _start.S
/tmp/ccZH6Hhm.o: In function `_start':
(.text+0x0): multiple definition of `_start'
/usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../../../lib64/crt1.o:(.text+0x0): first defined here
/usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../../../lib64/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
/tmp/ccZH6Hhm.o: In function `_start':
(.text+0x1): undefined reference to `main'
collect2: error: ld returned 1 exit status
Here we compiled with glibc (because we left out -nostdlib) and we can see that _start is in fact defined by the normal standard library because gcc tells us that the _start symbol was first defined in the crt1.o file. Hopefully things are starting to make sense as to what glibc is providing and why we get errors when we don't link against it. You can also see there is an undefined reference to main, which should be expected since we call the main function but we haven't provided gcc with the code for it yet.
Now, let's compile without glibc and see what output we get
$ gcc -nostdlib -ffreestanding -o _start _start.S
/tmp/ccDUeM2Z.o: In function `_start':
(.text+0x1): undefined reference to `main'
collect2: error: ld returned 1 exit status
Awesome, we don't get any more errors about multiple definitions of _start and we still get the expected error about main. However, we really want an object file so that we can then link it to the code where main is defined. You can instruct gcc to stop before the linking phase, which will give you an object file.
$ gcc -nostdlib -ffreestanding -c -o _start.o _start.S
Now we don't get any errors since gcc didn't try to resolve any symbols.
We'll do the same thing with our main_test.c file to get an object file with main defined inside of it
$ gcc -nostdlib -ffreestanding -c -o main_test.o main_test.c
Linking objects together
So we're now at the point where we have one file defining the _start symbol and referencing the main symbol, and one file which defines the main symbol. We need to link them together so that the unresolved symbols are fixed up and we can execute our code with no seg faults.
First, let's verify what we have. If we dump the symbol table for _start.o we should see the _start symbol and an undefined main symbol
$ readelf -s _start.o
Symbol table '.symtab' contains 6 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 SECTION LOCAL DEFAULT 1
2: 0000000000000000 0 SECTION LOCAL DEFAULT 3
3: 0000000000000000 0 SECTION LOCAL DEFAULT 4
4: 0000000000000000 0 NOTYPE GLOBAL DEFAULT 1 _start
5: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND main
The "Ndx" column indicates whether the symbol is undefined or not, and as we expected, the main symbol is undefined. Now, let's look at main_test.o
$ readelf -s main_test.o
Symbol table '.symtab' contains 9 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS main_test.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 2
4: 0000000000000000 0 SECTION LOCAL DEFAULT 3
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 SECTION LOCAL DEFAULT 6
7: 0000000000000000 0 SECTION LOCAL DEFAULT 4
8: 0000000000000000 23 FUNC GLOBAL DEFAULT 1 main
Likewise we now have a defined main symbol. Now, let's link them together to get a working binary!
We can pass object files to gcc and it knows not to preprocess, compile, or assemble them. We still need to specify not to use glibc because gcc will automatically link against it otherwise, and in our case we want to link only main_test.o and _start.o:
$ gcc -nostdlib -ffreestanding -o test_welibc _start.o main_test.o
And now let's look at the final symbol table to be sure everything is defined
$ readelf -s test_welibc
Symbol table '.symtab' contains 13 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000400120 0 SECTION LOCAL DEFAULT 1
2: 0000000000400144 0 SECTION LOCAL DEFAULT 2
3: 0000000000400168 0 SECTION LOCAL DEFAULT 3
4: 0000000000400180 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 FILE LOCAL DEFAULT ABS main_test.c
7: 0000000000000000 0 FILE LOCAL DEFAULT ABS
8: 000000000040015c 0 NOTYPE GLOBAL DEFAULT 2 _start
9: 0000000000601000 0 NOTYPE GLOBAL DEFAULT 4 __bss_start
10: 0000000000400144 23 FUNC GLOBAL DEFAULT 2 main
11: 0000000000601000 0 NOTYPE GLOBAL DEFAULT 4 _edata
12: 0000000000601000 0 NOTYPE GLOBAL DEFAULT 4 _end
Removing startup files and standard libraries
We can add two more compiler flags to get rid of a few more system defaults.
First, we'll direct gcc not to use any of the standard system startup files with -nostartfiles. Second, we don't want to use any of the default system libraries since we are writing our own standard C library; this is done with the -nodefaultlibs flag. From the man page's description of the -nostdlib flag we're already using, it seems to be a combination of -nostartfiles and -nodefaultlibs, but we'll include them just to be sure.
Our final compilation looks like
$ gcc -nostdlib -ffreestanding -nostartfiles -nodefaultlibs -o test_welibc _start.o main_test.o
Running glibc free
So, we haven't gotten any errors during preprocessing, compiling, assembling, or linking. It's time to test our binary that doesn't use glibc to see if it runs properly
$ ./test_welibc
$ echo $?
42
Sweet! Now we can continue on with the development of the rest of the library since we can successfully run the most basic code when linking against it.
Project Goals
March 05, 2015
As with any project, you need to have a goal in mind for what the project should accomplish. Here I will discuss the goals of welibc and how I plan to achieve them.
Goal
To create an implementation of the standard C library, as defined in ISO/IEC 9899:1990, which is clear, secure, and well documented.
Scope
The user mode portions of the standard C library will be implemented first. This means that functions like free, fwrite, and malloc will be saved until the end. This is because those functions must implement OS- and architecture-specific code which, by its definition, is not portable and could stall the project. This project is not aimed at providing an implementation which will run on every machine but rather an implementation which can serve as a reference or perhaps the base for other projects with similar goals.
Plan
I don't think that any of the three sub-goals is more important than the others, but it is important to discuss a preliminary plan for achieving them. I think that implementing one function a week is a moderate goal that allows me time to write the function, explore different ways of achieving the same results, write tests for the function, and document the work. Now that I've said that, maybe two weeks is better. We'll see.
Code Clarity
Having code which is easy to read and digest is very important if you plan on maintaining it or passing it on to someone else. It can make all the difference in the world when the previous programmer (who might be you!) has taken the time to write code in a way which expresses their intent such that you can verify it is done correctly. I briefly touched on this before, but I'll give another example. Which code sample below would you prefer to debug?
Sample 1
/* Print the number of command-line arguments passed in */
#include <stdio.h>
int
main(int argc, char *argv[])
{
printf("You passed in %d argument%s\n",argc,(argc!=1)?"s":"");
return 0;
}
Sample 2
/* Print the number of command-line arguments passed in */
#include <stdio.h>
int
main(int argc, char *argv[])
{
int numArgs = argc;
printf("You passed in %d argument", numArgs);
if (1 != numArgs)
{
printf("s");
}
printf("\n");
return 0;
}
Can you spot the error? How would you fix it? These two code examples do exactly the same thing, and while one has fewer lines of code and fewer function calls, I would personally prefer to be debugging the 2nd example.
Using a style guide can help a lot with writing clear code. It helps ensure that your code is written in a consistent manner and that there is a defined naming scheme for files, functions, and variables. Previously I said that I would just use the coding style I developed on my own, which isn't documented, but that seems to go against the goals of the project. So, I'll use the Google Style Guide as a template to write my own style guide which I expect to grow in size and specificity as the project progresses. It would be nice if I could write it in markdown and include it as a part of my directory structure then link a blog post to the file itself, rather than updating a single blog post.
Here are corrected versions of the code samples above. The bug lies in that argc includes the program name in its count, so the number of arguments passed to the program is argc - 1.
Corrected Sample 1
/* Print the number of command-line arguments passed in */
#include <stdio.h>
int
main(int argc, char *argv[])
{
printf("You passed in %d argument%s\n",argc-1,((argc-1)!=1)?"s":"");
return 0;
}
Corrected Sample 2
/* Print the number of command-line arguments passed in */
#include <stdio.h>
int
main(int argc, char *argv[])
{
int numArgs = argc - 1;
printf("You passed in %d argument", numArgs);
if (1 != numArgs)
{
printf("s");
}
printf("\n");
return 0;
}
It's worth noting that the first example requires changing code in two places while the 2nd example only requires a change in one. This can cause problems during maintenance since you rely on the programmer remembering to update every instance of the error.
Writing Secure Code
Some portions of the standard C library will never be secure by their nature, but I want to do as much as I can to create an implementation which is as secure as possible within the confines of the standard.
A great resource for writing secure C code is the Cert C Coding Standard which is a free resource that is continuously updated. If you prefer a hardcopy, they took a snapshot of their standard and printed it; it's about $55 on Amazon. I haven't read the entire standard, but I'll try to abide by its rules as much as possible. Also, the bibliography section has a plethora of other fantastic C resources.
Testing will also be a large component of writing secure code. No matter how skilled a programmer you are, I don't think it would ever be wise to deploy your code without testing it, no matter how sure you are that it's correct. I still have some research to do on testing frameworks for C, but this is something I plan on doing from the beginning. I don't think this will be test driven development, but tests will certainly be included.
There is also the idea of defensive programming, which shapes the way you write code with the goal of producing code that is less likely to contain bugs. This doesn't mean you write code without bugs, but that the bugs you do introduce will reveal themselves sooner rather than later. A good example of this is below:
/* Print the first argument to the screen */
#include <stdio.h>
int
main(int argc, char *argv[])
{
if (argc = 2)
{
printf("Please provide one argument to be printed.\n");
}
else
{
printf("%s", argv[1]);
}
return 0;
}
First, do you see the bug and can you determine what behavior it causes? Compiling this with gcc -o defensive defensive.c will not produce any warnings or errors. But no matter what input you give, it will always print "Please provide one argument to be printed." This is because of the comparison if (argc = 2). This actually assigns the value 2 to the variable argc, and that expression evaluates to 2, which is true in C. I meant to write if (argc != 2). So, we can rewrite this by placing the constant on the left side of the expression:
/* Print the first argument to the screen */
#include <stdio.h>
int
main(int argc, char *argv[])
{
if (2 = argc)
{
printf("Please provide one argument to be printed.\n");
}
else
{
printf("%s", argv[1]);
}
return 0;
}
Now when we compile with gcc -o defensive defensive.c we get the error:
defensive.c: In function ‘main’:
defensive.c:5:11: error: lvalue required as left operand of assignment
if (2 = argc)
^
So, simply by changing the way we write comparisons we can catch bugs like this early since the code won't even compile! See if you can find the other bug in this code snippet (hint).
Another way to program securely is to take advantage of the features of your compiler. For example, compiling the original example with gcc -Wall -o defensive defensive.c would give the following warning:
defensive.c: In function ‘main’:
defensive.c:5:5: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
if (argc = 2)
^
That is better than nothing, but the code still compiles. For this reason, I prefer writing the code defensively because it's not always guaranteed that someone else will be compiling your code with the same settings. Yes, there are makefiles or build settings for a project, but copy/pasting code is very common and doesn't capture your specific build flags.
Documentation
Again, documentation is very important and for that reason I want to provide a standard C library which is well documented. I will be using Doxygen to generate documentation from the source code which dictates that my code be commented in specific ways so that the Doxygen parser can read it. This will force me to comment every file and function appropriately and forces me towards meeting this goal.
In addition to documenting the source code, my plan is to document my progress as I go with blog posts that discuss how I arrived at the implementation of each function. I hope that other programmers will find this documentation useful for answering questions like "How does this work?", "Why did you do it this way?", or "What is the best way to _____?". This will also help me have a log to look back over to see why I chose specific implementations and what worked or didn't work the way I wanted it to.
Project Design Decisions
March 03, 2015
tags: design, license, standards, style
There are a few decisions that you might want to make at the beginning of your project so that you have them answered before you get too deep into things. We'll go over choosing a license, establishing a directory structure, choosing a revision control system, and a few other things.
Software License
If you choose to place your code online where other people can see it then you'll want to choose a license to release your project under. Choosing a license ensures that you have appropriate protection both in how people use your code and what you can be blamed for. A fantastic resource for seeing what types of licenses are out there and what they mean is choosealicense.com. Jeff Atwood wrote a pretty good article on this as well; it's worth a read. Once you've chosen your license, generally you'll need to include a copy of it at the base of your directory structure and possibly at the top of each file.
For welibc I'm choosing the BSD 3-Clause License for its simplicity, the requirement of attribution, and preventing the ability to be held liable.
Directory Structure
Establishing your directory structure at the beginning will help guide your development and keep the layout clean and organized. The idea is to separate distinct pieces and parts of your project into their own directories in a way that makes sense to you and your team. As your project grows and changes you may find that your directory structure is really working out or that it seems to complicate things. Don't be afraid to alter it as you go, but don't compromise the structure for one feature if it complicates a lot of other pieces.
I don't expect welibc to be terribly big, but at this point I see three distinct groups of files: source code, public headers, and test code. Both the source and test code will need access to the headers, so I think it's logical to group them in a separate folder. Here is a sample layout of what I have in mind:
welibc/
├── license.md
├── readme.md
├── include
│ ├── assert.h
│ ├── limits.h
│ ├── stdio.h
│ └── stdlib.h
├── src
│ ├── assert.c
│ ├── stdio.c
│ └── stdlib.c
└── test
├── assert_test.c
├── main.c
├── stdio_test.c
└── stdlib_test.c
Revision Control System
A revision control system (RCS) assists in managing your code as you add features, remove bugs, and implement new functionality. It is an absolutely essential piece of your project. Without an RCS it is likely that you will lose working code, leave commented out code laying around, and make the same mistakes over again.
An RCS provides two incredibly important features: snapshotting your code and attaching messages to your code. If you're unfamiliar, an RCS works similarly to a library in that you check out your code, edit it, then check it back in (or "commit" it) with the changes along with a message of what you did. This also means that it is entirely up to you and your team to make the best use of these features. You could use an RCS but only have one commit with your finalized piece of software; that's not useful. You could put song lyrics in every commit message you make; that's not useful. These features exist for a reason and you should use them. Seth Robertson wrote a great post that discusses this in more depth.
Alright, I'm glad you chose to use a revision control system, but which one should you use? There are a few out there, but there are really about four in major use today.
I don't know that any of them are necessarily better than others, but right now there are two pretty large providers of free source code hosting that both use git: bitbucket and github. They make it easy to share your code as well as browse it through their web interfaces. It's also nice that they are in charge of managing the infrastructure and server side of things; all you have to worry about is writing your code.
For this project I've chosen to host the code at bitbucket using git. I'm not as proficient with git as I would like to be so this will help me become more familiar with it.
Coding Style
The style of your code is often overlooked but can really enhance your project. If all of your code is written the same way, with braces in the same spots, the same variable naming scheme, and consistent spacing, then it's much easier to read the code and spot errors when things get out of place.
What looks better, this?
int greatestcommonDivisor(int a , int b){
int c;
while(a!=0)
{
c = a; a = b % a;
b = c;
}
return b;}
Or this?
int
greatestCommonDivisor(int a, int b)
{
int c;
while (0 != a)
{
c = a;
a = b % a;
b = c;
}
return b;
}
That's an extreme example, but just keep in mind that it'll probably be you coming back later on to fix a bug. Do you really want to make things harder on yourself later?
The style you choose is entirely up to you but I recommend always using the same style throughout all of your code; don't change it between files or folders. There are a few example style guides you could adopt as well.
To get a better idea of what all a style guide encompasses, read the Linux kernel coding style page. It's a pretty fast read and Linus even includes some justifications for why certain styles were chosen.
For welibc I have my own style that I've developed over the past few years which I'll be using. It's not exactly documented somewhere, but maybe I'll do that as part of this project. I'm pretty particular about how my code looks.
Build Automation Software
Unless you're working with just a handful of files, you'll want to pick a build system to handle everything for you. A build system allows you to issue a simple command like "make" or to just click a button and have your entire project built according to the specific settings that you need. There are a few well-known build systems to choose from.
Since I'll be doing this work on Linux and I'm already pretty familiar with it, I'll be using GNU make for this project.
Documentation
Documentation is usually the last thing done and is sometimes left out of a project entirely. Including documentation may seem worthless but it can really pay off when you have to find an obscure bug, when you bring in a new developer, or when an experienced developer leaves the team. If you have appropriate documentation then these things go more smoothly.
You should first start with commenting your code appropriately. This can include file and function comments along with documenting particularly tricky pieces of code when you can't figure out how to simplify them any other way. Once you have a consistent method of documenting things then you could look at documentation generators. These are separate programs that parse your source code and generate documentation which generally looks better and is easier to read than your source code. There are a few documentation generators to consider.
My plan is to use Doxygen since that's really the only one I've used before. If it isn't customizable enough to fit into the style of the rest of the website then I might switch to something else.
Conclusion
This may seem like a lot to decide on and setup before you actually get down to writing any code but a little planning can go a long way in keeping your project organized and easy to understand. Having some of these things in place can really make the difference between looking like a hobby and looking like a professionally produced piece of software.
C Standards
March 01, 2015
To implement the standard C library you need a C standard which tells you what the library contains and how it is intended to work. So, which standard do you choose and where do you get it?
History
There is a bit of history behind how the most current revision of the C standard came to be, so we'll go over it to give some background and context for why they exist and where they came from. The beginnings are somewhat scattered across the Internet, but a lot of this information was obtained from Dennis Ritchie's homepage and the documents mentioned there.
Informal Standards - 1975 - 1989
Originally, there were no C standards, just "The C Reference Manual", a document written by Dennis Ritchie and used within AT&T Bell Laboratories. This was not necessarily written with the purpose of being a standard, but rather documentation of the syntax and use of C. The most recent version of this manual was published in 1975 and included with 6th Edition Unix. It mentions a useful introduction to the C language, "Programming in C - A Tutorial", by Brian Kernighan.
The next iteration of an informal standard came in 1978 with the first version of "The C Programming Language", by Brian Kernighan and Dennis Ritchie. This book replaced "The C Reference Manual", but reproduced most, if not all, of it in the book's Appendix A. This book has become very well known and is usually referred to as just "K&R" and served as the informal standard until 1989.
ANSI C - 1983 - 1990
As C grew in popularity it began to quickly evolve, but the informal standards were not keeping up; "The C Programming Language" no longer reflected the most current version of the language. During the summer of 1983 the American National Standards Institute (ANSI) created the X3J11 committee with the goal "to develop a clear, consistent, and unambiguous Standard for the C programming language which codifies the common, existing definition of C and which promotes the portability of user programs across C language environments." On December 14, 1989 the committee ratified the first official C standard, ANSI X3.159-1989. It is more commonly referred to as "ANSI C" or just "C89".
As far as I know, the closest you can get to a copy of the original C89 standard is a text version of a draft. There is also a PDF version of the rationale which helps explain the reasoning and decision making that goes into developing the standard.
International Standardization - 1990 - 1999
In 1990, the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) created the WG14 working group which is in charge of the international standardization for the programming language C. C was still growing in popularity and ANSI C was really only a standard approved by an American organization - it had no internationally recognized standard. The WG14 working group used the ANSI X3.159-1989 standard to create its own version, ISO/IEC 9899:1990, which was ratified on December 31, 1990, and is more commonly known as "C90". The only difference between C89 and C90 is the formatting, otherwise they are equivalent. You can get a copy of C90 from SAI Global, which only has a hard copy available.
In addition to creating the international standard, ISO/IEC also provides updates to its standards through documents called technical corrigenda, which are meant to correct technical errors or ambiguity and update outdated information. In addition to technical corrigenda, ISO/IEC releases amendments to their standards in order to alter and/or add to previously agreed technical provisions in an existing standard. In 1994, ISO/IEC published Technical Corrigendum 1 for C90 as the first set of corrections/updates for the standard. In 1995, ISO/IEC published "Amendment 1: C integrity" to C90 which was such a significant update to the language that it is known as C95; it can be purchased from SAI Global. The final update to C90 was Technical Corrigendum 2, which was released in 1996. The dates for the technical corrigenda and amendment seem to overlap, but as best I can tell TC1 was published before AMD1.
Updating the International Standard - 1999 - Present
The next major update to the C standard came in 1999 with ISO/IEC 9899:1999, also known as C99. I couldn't find a draft version of just C99 without any of the technical corrigenda included, but a copy is available from SAI Global.
C99 has had three updates: Technical Corrigendum 1 released in 2001, Technical Corrigendum 2 released in 2004, and finally Technical Corrigendum 3 released in 2007. There is a WG14 working paper which includes C99 along with all of its technical corrigenda.
Finally, the most recent C standard is ISO/IEC 9899:2011, also known as C11. A draft version, ISO/IEC 9899:201X, is available for free and according to WG14 "it reflects what was to become the standard at the time of issue." This version of the standard had a technical corrigendum released in 2012. Again, I couldn't find a free draft version of this technical corrigendum, but it can be purchased from SAI Global.
Choosing a standard
It is generally best to use the most up-to-date version of the C standard. However, not all compilers fully support every version, and there may be portability concerns for your specific project which dictate using a specific standard. As well, consider what other code your project will integrate with and which standard that code was written for.
For this project, I will be implementing the standard C library as defined in ISO/IEC 9899:1990. The updated standards usually add functionality on top of the base library as defined in C90, so I think the most basic set of functions is a good place to begin.
First Post
February 20, 2015
Subtitles
This is my first post with markdown, just seeing how things work.
- A
- List
- I think
Need some text here
#include <stdio.h>
int
main(void)
{
printf("Hello, world!");
return 0;
}
How about some math? \(e=mc^2\)
And there we go