memcpy
The inaugural string.h
function is memcpy
which will copy the specified
number of characters from one object to another. It takes two character
pointers, one to the destination object and one to the source object, along with
the number of characters to copy. It returns a pointer to the destination
object.
void *
memcpy(void *s1, const void *s2, size_t n);
As for the return value, the standard makes no indication that an error value
would be returned if something went wrong, so memcpy
lacks a way to do
meaningful error checking. Also note that s2
is a const void *
, meaning that
the data to which it points won't be modifiable through the s2
pointer, but
the pointer itself could be changed.
Local Variables
Since this function isn't very complicated, we will only need two local
variables, a source pointer and a destination pointer. But, isn't that was s1
and s2
already are? Well, they are in a sense, but since they are void
pointers then we can't do anything with them because they don't point to any
specific type. What we need to do is copy those pointers into char
pointers so
that we can read and write characters.
char *pDst = s1;
const char *pSrc = s2;
Note that our source needs to be a const char *
to match the const
that is
attached to s2
. Without this, you should get a warning about discarding the
const
qualifier:
src/string.c: In function ‘memcpy’:
src/string.c:59:21: warning: initialization discards ‘const’ qualifier from
pointer target type [enabled by default]
char *pSrc = s2;
^
Parameter Validation
The first thing we should do is check our inputs to be sure they are valid
before we use them. The basic checks here would be ensuring that we don't have
NULL
pointers and that our size isn't zero:
if ( !s1 ||
!s2 ||
!n))
{
return s1;
}
This will prevent us from having a NULL
pointer dereference when we read or
write a character and prevent us from executing unnecessary instructions if we
didn't need to perform any actual copying.
The standard explicitly states that 'If copying takes place between objects that
overlap, the behavior is undefined.' So, we don't have to verify that s1
isn't
in the range of [s2
, s2 + n - 1
], or that s2
isn't in the range [s1
, s1
+ n - 1
]. Undefined behavior means that the standard allows us to ignore this
behavior, to implement a specific behavior that is then required to
be documented, or to terminate execution completely. For right now, we will just
ignore the behavior.
Function Body
Since we have the number of characters to copy stored in n
, we'll use that as
a loop counter and compare it against 0
to determine when we need to stop
copying. We can then use our source and destination pointers to copy a single
character at a time. Once we're done copying, all we need to do is return the
original destination pointer.
while (0 < n--)
{
*(pDst++) = *(pSrc++);
}
return s1;
Security
Although the standard says we can just ignore overlapping objects pointed to by
s1
and s2
, we should be able to easily verify that the objects don't
overlap with the simple range checks that were mentioned before. Since size_t
is an unsigned type, then we can use suggestions from the INT30-C section
of the CERT C Coding Standard to check for overlap and wrapping:
if ( !s1 ||
!s2 ||
!n ||
/* Check for object overlap */
((((unsigned long) s2) - ((unsigned long) s1)) < n) ||
((((unsigned long) s1) - ((unsigned long) s2)) < n) ||
/* Check for wrapping while copying */
((((unsigned long) -1) - ((unsigned long) s1)) < n) ||
((((unsigned long) -1) - ((unsigned long) s2)) < n))
{
return s1;
}
Due to a limitation of C90, there is no integral type guaranteed to be large
enough to hold the value of a pointer, but in order to perform this error
checking we must convert the pointers to an integral type. The largest possible
type available in C90 is unsigned long
, so that is what we'll convert the
pointers to in order to perform arithmetic on them. We'll add a test later on to
ensure that this is true on the system where this library is tested. As well,
the function comment will include a warning about our assumption.
That's a lot of checking but it provides validation that our copy should succeed. There are other errors that could still happen, like reading outside of a memory boundary, but they are difficult (or impossible) to prevent in a general enough way to be included in the library. There are two camps here: either you prevent the user from doing something bad and let them handle the error checking themselves, or you let them try something bad at the expense of extra cycles spent on error checking in the library itself. Most implementations are done in the first way and I think it would be interesting to do it the latter way.
Backwards copy?
Another use for memcpy
would be to slide a region of data over by some amount.
But, what if we slide the region over by so little that its destination is
actually within the original region? Well, our security checks will prevent that
from happening! This is nice, but in actuality a user may need to do this very
thing and the standard doesn't actually say we have to prevent it. To allow this
we'll first have to remove the overlap checks.
Now that we allow the source and destination regions to overlap, let's look at
what happens when we slide a region to lower memory such that destination region
will overlap with the source. Assume that we have the characters, A
, B
, C
,
and D
stored in memory adjacent to each other at address 0x8000
, and we want
to move it over to address 0x7FFE
.
First, we read an A
from 0x8000
and store it at 0x7FFE
. Next, we read a
B
from 0x8001
and store it at 0x7FFF
. Next we read a C
from 0x8002
and
store it at 0x8000
, overwriting the A
from our source region. Finally, we
read a D
from 0x8003
and store it at 0x8001
, overwriting the B
from our
source region. We end up with A
, B
, C
, D
starting at address 0x7FFE
.
Although we overwrote our source region with new data, it wasn't an issue
because we read from the source locations before overwriting them and our
memcpy
was a success.
Next, let's look at what happens when we slide a region to higher memory such
that the destination region will overlap with the source. We'll use the same
source address and data, but the destination will be address 0x8002
.
First, we read an A
from 0x8000
and store it at 0x8002
, overwriting the
C
from our source region. Next, we read a B
from 0x8001
and store it at
0x8003
, overwriting the D
from our source region. Next, we read an A
from
0x8002
and store it at 0x8004
. Finally, we read a B
from 0x8003
and
store it at 0x8005
. We end up with A
, B
, A
, B
, starting at address
0x8002
. This is the incorrect result and it happened because we overwrote
source locations before reading from them.
So, how can we implement memcpy
in such a way that the source and destination
may overlap but the copy will still happen properly? Well, for the case that we
slide the region in to higher memory, we can simply copy from the end of the
region to the start, rather than the normal method of copying from start to end.
We'll redo our failed copy backwards and see what happens. First, we read a D
from 0x8003
and store it at 0x8005
. Next, we read a C
from 0x8002
and
store it at 0x8004
. Next, we read a B
from 0x8001
and store it at
0x8003
, overwriting the D
from our source region. Finally, we read an A
from 0x8000
and store it to 0x8002
, overwriting the C
from our source
region. We end up with A
, B
, C
, D
, starting at address 0x8002
and our
memcpy
was a success when done backwards.
In the same way that the forward copy failed when you have overlapping regions and move from lower to higher memory, a similar copy from higher to lower memory will fail when done backwards. As such, we must check the source and destination and perform a forward or backward copy as appropriate. The only true way to do this is by having two copies of the same code where one is implemented as normal, while the other moves the pointers to the end of the region and decrements them as it copies.
/* Copy forward or backward to handle object overlap */
if (pDst < pSrc)
{
while (0 < n--)
{
*(pDst++) = *(pSrc++);
}
}
else
{
pDst += n;
pSrc += n;
while (0 < n--)
{
*(--pDst) = *(--pSrc);
}
}
Our backward copy works very similarly to the forward copy with the exception of where our pointers start. Since we use the pre-decrement operator we must start out past the region we want to copy. This will allow us to back up in to the valid region and then perform the copy.
Efficiency
When you look at the complexity of an algorithm you can measure two things to
get an idea of how the algorithm will perform: time and space. We'll look at how
to analyze them both, how to improve the efficiency of memcpy
, and some other
generic notes on efficiency. That being said, keep in mind the famous quote,
"Premature optimization is the root of all evil." Try not to be too clever and
save the hardcore optimizations for when there is a true need for them (i.e.
your algorithm isn't meeting speed requirements or it's using too much memory).
Before getting in to the details of efficiency, it's worth noting that the
algorithm we have at this point is truly as efficient as it can be according to
the standard. This is due to strict aliasing and object access rules which limit
the ways in which we can read or write data. The standard tells us that a
pointer to char
may be used to access any other type of object which is why
those are the only pointers we can use. Unfortunately, this limits us to copying
one byte at a time.
For the sake of exploration we'll look at using other pointer types to perform the copy in a faster way, but remember that this breaks the rules of the standard.
Time
A common way to measure the time of an algorithm is to use Big O Notation which represents a cap on the growth of a function in relation to its input. This is appropriate for C code because each line, comparison, or assignment could actually take multiple instructions to execute depending on what hardware it runs on. Using Big O Notation allows us to normalize our measurements, providing an accurate description of our function regardless of where it runs.
For this memcpy
algorithm, we can see that there is a direct correlation
between the amount of input data, n
, and the number of steps required to
complete the copy. This means our function can be described with \(O(n)\) in
relation to how long it takes to execute.
Space
We can also use Big O Notation to describe the amount of memory, or space,
that a function uses to do its work. With C programming, you may have arrays or
multiple variables used to keep track of state, but the space for those will be
constant and most likely on the stack. There is the concept of dynamic memory
allocation where you use variable amounts of memory from the heap; these will be
indicated by calls to functions like malloc
or calloc
. Since the amount of
memory used with dynamic memory allocation is unknown at run time, this is the
memory we are interested in measuring. Most other uses of memory will be known
at compile time and would remain constant.
Our implementation of memcpy
does not make use of dynamic memory allocation
which means it will use the same amount of memory every time, regardless of the
size of input data. We can describe this behavior with \(O(1)\) indicating
constant memory usage.
Multi-character copy
Copying one character at a time could be pretty slow if we were copying a large
amount of characters. It could be more efficient to do four characters at a time
using an int
pointer (points to four bytes on x86_64), or even eight
characters at a time using a long
pointer (points to eight bytes on x86_64).
This will incur some additional overhead for calculating which type to use with
what values of n
, as well as extra logic to copy the leftover characters if
the chosen type is not a multiple of n
. Let's explore this.
Side-note: copying via a pointer to a non-char type will violate the strict-aliasing rules of C because only pointers to void or char types may alias any other pointer. Since we don't have any type information about the objects being copied to or from, we could only use pointers to char types if we want to stay within the bounds of strict-aliasing rules.
Alignment
The first thing we'll look at is alignment, which deals with how data is
arranged and accessed in memory. Data said to be "n-byte aligned" will always be
at a memory address that is a multiple of n. For example, four-byte aligned data
will always be at an address that ends in 0x0
, 0x4
, 0x8
, or 0xC
, because
those are all multiples of four. This is an important concept when copying
memory because different amounts of data can be accessed depending on the
type used for access, but this only works if the addresses match the alignment
of that type.
Each type in C may have a different alignment due to its size in bytes. Examples
from x86_64 would be char
, short
, int
, and long
, which each have a size
of one, two, four, and eight bytes, respectively, and also have the same values
for their alignment. This means that storing 0xDEADBEEF
at address 0x8000
might look like so:
unsigned char *pChar = (unsigned char *) 0x8000;
unsigned short *pShort = (unsigned short *) 0x8000;
unsigned int *pInt = (unsigned int *) 0x8000;
unsigned long *pLong = (unsigned long *) 0x8000;
*pChar = 0xEF;
*(pChar + 1) = 0xBE;
*(pChar + 2) = 0xAD;
*(pChar + 3) = 0xDE;
*pShort = 0xBEEF;
*(pShort + 1) = 0xDEAD;
*pInt = 0xDEADBEEF;
*pLong = 0xDEADBEEF;
As you can see, the larger the size of a type, the larger the amount of data
that be accessed at one time. However, there could be a point at which you stop
seeing benefits, as can be seen when we used int
and long
pointers.
The address we chose, 0x8000
, is special because it fits the alignment of each
of the four types. This won't always be the case and this is where problems
start to happen. If we had chosen the address 0x8002
then the only valid types
for accessing that address would be char
and short
, because 0x8002
is a
multiple of both one and two, but not four or eight. This will come in to play
when attempting to do multi-byte copies, because both the source and destination
addresses will need to have the same alignment. The convenient part about
copying one char
at a time is that every address is one-byte aligned so there
are no alignment issues to worry about.
The largest amount of data that a processor can access at one time is referred
to as a word, so if we can copy a word at a time then we will theoretically be
copying memory as fast as possible. The word size on each machine will change,
but it will always be a multiple of the largest data type available on that
machine. In terms of C90, this means copying a long
at a time is the best we
can do. The sign of the long
is irrelevant here because an unsigned long
occupies the same amount of memory as a long
.
For more information about alignment, see this article by Johnathan Rentzsch.
Calculating Efficiency Gains
The second thing we'll look at is how to determine which type will be the most efficient at copying a certain amount of data. To do this, we need to consider how many steps are taken to perform a copy. For simplicities sake, we will consider one copy to be one step. If we use different types then that one step may copy one byte of data or it may copy eight.
To take care of any alignment issues (assuming that the source and destination are capable of being aligned), we will assume that our copies start one byte past an alignment boundary, and end one byte before an alignment boundary which will represent the worst-case scenario. For an n-byte aligned type, this means we have \(n-1\) steps before and after we perform the n-byte aligned bulk copy. This also means that an n-byte aligned type will only be used to copy \(3n - 2\) characters, or more. Otherwise, we wouldn't have any characters left to copy after handling alignment.
For x86_64, our thresholds for performing copies with each type are as follows:
char
, 1-byte aligned: \(3(1) - 2 = 3 - 2 = 1\)short
, 2-byte aligned: \(3(2) - 2 = 6 - 2 = 4\)int
, 4-byte aligned: \(3(4) - 2 = 12 - 2 = 10\)long
, 8-byte aligned: \(3(8) - 2 = 24 - 2 = 22\)
As you can see, the thresholds are very close together. To determine how many steps, in the worst case, are used to copy \(x\) number of characters with an \(n\)-byte aligned type, we will derive an equation based on the three parts of the copy: a) getting to an alignment boundary, b) the bulk copy, and c) copying whatever is leftover:
And if we substitute in \(c\) we can simplify:
Then substitute in \(b\) since \(a\) isn't needed anymore:
As an example, let's compare copying 100 characters using short
and long
types.
Considering the range of size_t
, which dictates how many characters get copied
at once, the efficiency gains of using a larger type are noticed for very small
values of n
.
Implementing a typeless memcpy
Next we'll look at implementing a multi-byte memcpy
by starting with the basic
copy from before and building up to a version which will work regardless of the
type used for the "bulk" copy portion. Here was our original copy loop:
while (0 < n--)
{
*(pDst++) = *(pSrc++);
}
This relied on the size of each copy being one byte allowing us to use the post-decrement operator to reduce our counter. More generically, this loop would be:
while (0 < n)
{
*(pDst++) = *(pSrc++);
n -= sizeof(*pDst);
}
Which will accomplish the same thing while working with any type chosen for
pDst
and pSrc
(assuming they're the same). For now we are assuming that both
pDst
and pSrc
point to memory addresses which are aligned and that n
is a
multiple of sizeof(*pDst)
; we'll deal with that later. We need to further
abstract this to account for getting to an aligned boundary before the bulk copy
and then to copy any leftover bytes.
Like we mentioned with our object overlap checks, C90 doesn't have an integral
type that is guaranteed to be large enough to hold the value of a pointer and we
need such a type in order to check the alignment of a pointer. The best we can
do is use the largest type available to us, long
, and note the invalid
assumption that it will hold the value of a pointer. In our tests we can include
a check to ensure that this is true.
char *pCharDst = s1;
const char *pCharSrc = s2;
my_type *pDst = NULL;
const my_type *pSrc = NULL;
/* Error checking */
while ((0 < (((unsigned long) pCharDst) & (sizeof(*pDst) - 1))) &&
(0 < n--))
{
*(pCharDst++) = *(pCharSrc++);
}
pDst = (my_type *) pCharDst;
pSrc = (my_type *) pCharSrc;
while (sizeof(*pDst) <= n)
{
*(pDst++) = *(pSrc++);
n -= sizeof(*pDst);
}
pCharDst = (char *) pDst;
pCharSrc = (char *) pSrc;
while (0 < n--)
{
*(pCharDst++) = *(pCharSrc++);
}
Finally, we need to deal with our alignment assumption by checking the source
and destination addresses for matching alignment. If their offsets from an
alignment boundary do not match then they will never sync up to both match the
alignment of the bulk copy type at the same time. In this case, the best we can
do is use our generic char
copy.
char *pCharDst = s1;
const char *pCharSrc = s2;
my_type *pDst = NULL;
const my_type *pSrc = NULL;
/* Error checking */
if ((((unsigned long) pCharDst) & (sizeof(*pDst) - 1)) ==
(((unsigned long) pCharSrc) & (sizeof(*pSrc) - 1)))
{
while ((0 < (((unsigned long) pCharDst) & (sizeof(*pDst) - 1))) &&
(0 < n--))
{
*(pCharDst++) = *(pCharSrc++);
}
pDst = (my_type *) pCharDst;
pSrc = (my_type *) pCharSrc;
while (sizeof(*pDst) <= n)
{
*(pDst++) = *(pSrc++);
n -= sizeof(*pDst);
}
pCharDst = (char *) pDst;
pCharSrc = (char *) pSrc;
}
while (0 < n--)
{
*(pCharDst++) = *(pCharSrc++);
}
Choosing when to switch types
Now we have all of tools we need to write a multi-byte memcpy
. Before fully
implementing the multi-byte memcpy
, let's look at our thresholds one more time
(again, these are specific to x86_64):
char
, 1-byte aligned - \(3(1) - 2 = 3 - 2 = 1\)short
, 2-byte aligned - \(3(2) - 2 = 6 - 2 = 4\)int
, 4-byte aligned - \(3(4) - 2 = 12 - 2 = 10\)long
, 8-byte aligned - \(3(8) - 2 = 24 - 2 = 22\)
Not only are these the lowest values of n
that are required to use that
specific type, they are also the first values of n
where copying with the
smaller type will take more steps than with the larger. This means that the
minimum count thresholds double as the efficiency thresholds. We could use a
simple check to decide on which type to use for the bulk copy and then use the
above code with the appropriate type substituted for pDst
and pSrc
. Due to
the nature of alignment, ensuring that something is aligned to the largest type
guarantees that alignment is satisfied for any smaller type (thought I wouldn't
be surprised if there was an architecture for which that wasn't true). This
means we can always perform our alignment check using a modulus of the largest
type, sizeof(long)
, and if it passes then we're guaranteed that the source and
destination can ultimately be aligned for any type.
#define MEMCPY_LONG_THRESHOLD (22)
#define MEMCPY_INT_THRESHOLD (10)
#define MEMCPY_SHORT_THRESHOLD (4)
char *pCharDst = s1;
const char *pCharSrc = s2;
short *pShortDst = NULL;
const short *pShortSrc = NULL;
int *pIntDst = NULL;
const int *pIntSrc = NULL;
long *pLongDst = NULL;
const long *pLongSrc = NULL;
/* Error checking */
if ((MEMCPY_SHORT_THRESHOLD <= n) &&
(((unsigned long) pCharDst) & (sizeof(*pLongDst) - 1)) ==
(((unsigned long) pCharSrc) & (sizeof(*pLongSrc) - 1)))
{
if (MEMCPY_LONG_THRESHOLD <= n)
{
while ((0 < (((unsigned long) pCharDst) & (sizeof(*pLongDst) - 1)))
&&
(0 < n--))
{
*(pCharDst++) = *(pCharSrc++);
}
pLongDst = (long *) pCharDst;
pLongSrc = (long *) pCharSrc;
while (sizeof(*pLongDst) <= n)
{
*(pLongDst++) = *(pLongSrc++);
n -= sizeof(*pLongDst);
}
pCharDst = (char *) pLongDst;
pCharSrc = (char *) pLongSrc;
}
if (MEMCPY_INT_THRESHOLD <= n)
{
while ((0 < (((unsigned long) pCharDst) & (sizeof(*pIntDst) - 1)))
&&
(0 < n--))
{
*(pCharDst++) = *(pCharSrc++);
}
pIntDst = (int *) pCharDst;
pIntSrc = (int *) pCharSrc;
while (sizeof(*pIntDst) <= n)
{
*(pIntDst++) = *(pIntSrc++);
n -= sizeof(*pIntDst);
}
pCharDst = (char *) pIntDst;
pCharSrc = (char *) pIntSrc;
}
if (MEMCPY_SHORT_THRESHOLD <= n)
{
while ((0 < (((unsigned long) pCharDst) & (sizeof(*pShortDst) - 1)))
&&
(0 < n--))
{
*(pCharDst++) = *(pCharSrc++);
}
pShortDst = (short *) pCharDst;
pShortSrc = (short *) pCharSrc;
while (sizeof(*pShortDst) <= n)
{
*(pShortDst++) = *(pShortSrc++);
n -= sizeof(*pShortDst);
}
pCharDst = (char *) pShortDst;
pCharSrc = (char *) pShortSrc;
}
}
while (0 < n--)
{
*(pCharDst++) = *(pCharSrc++);
}
You'll notice that we used if
statements for each of the bulk copies, rather
than if else
for the int
and short
copies. This is because we aren't sure
what type of system we will run on and there is a possibility that what is
leftover from one bulk copy may be enough to do a smaller bulk copy. On x86_64
this could be seen when copying 22
bytes when the source and destination
addresses are already aligned to the size of a long
. Two bulk copies would
happen with a long
pointer, followed by three copies with a short
pointer.
While this will mostly guarantee you get the fastest copy for any value of n
,
it has a lot of repetitive code and is overly complicated just to provide faster
copies for a very small window of n
values. In addition, if we look at each
comparison as another step then a copy using short
pointers has completely
lost any efficiency gain it would've had because we spent five steps just to
figure out we should be using short
s. Performing more detailed step or
efficiency analysis really requires you to look at the assembly level and that's
not what we're here to do. With that said, a copy using short
pointers
probably won't have any real benefits over the basic char
copy. Likewise, the
int
copy would show gains for such a small window that it's unnecessary to
implement unless your specific application utilizes a lot of copies in that
range.
Now that we're down to only performing bulk copies with long
pointers, we
should consider the overhead of checking n
, the alignment, and adjusting
pointers as necessary for a successful multi-byte copy. The added overhead means
that we should increase the threshold required to perform the extra work,
otherwise we aren't actually gaining anything. We'll bump up our threshold by an
order of magnitude by setting it to 200. You could more finely tune this
threshold by looking at the assembly output of your memcpy
implementation and
comparing exactly how many instructions are used for the extra checks and the
bulk copy itself. Our final multi-byte implementation (sans error checking) is
below. Each loop here represents each piece of our efficiency equation from
before: \(a\) to get to an alignment boundary, \(b\) to perform the bulk copy, and
\(c\) to copy any leftovers. Lastly, we've added two new variables to track the
alignment offsets for us so we only have to calculate them once.
#define MEMCPY_EFFICIENCY_THRESHOLD (200)
void *
memcpy(void *s1, const void *s2, size_t n)
{
char *pCharDst = s1;
const char *pCharSrc = s2;
long *pLongDst = NULL;
const long *pLongSrc = NULL;
size_t dstOffset = 0;
size_t srcOffset = 0;
/* Error checking */
dstOffset = ((unsigned long) pCharDst) & (sizeof(*pLongDst) - 1);
srcOffset = ((unsigned long) pCharSrc) & (sizeof(*pLongSrc) - 1);
if ((MEMCPY_EFFICIENCY_THRESHOLD <= n) &&
(dstOffset == srcOffset))
{
while ((0 < dstOffset--) &&
(0 < n--))
{
*(pCharDst++) = *(pCharSrc++);
}
pLongDst = (long *) pCharDst;
pLongSrc = (long *) pCharSrc;
while (sizeof(*pLongDst) <= n)
{
*(pLongDst++) = *(pLongSrc++);
n -= sizeof(*pLongDst);
}
pCharDst = (char *) pLongDst;
pCharSrc = (char *) pLongSrc;
}
while (0 < n--)
{
*(pCharDst++) = *(pCharSrc++);
}
return s1;
}
Other Efficiency Notes
Our parameter validation includes a check on n
that is questionable. On one
hand, our loops will check n
and never run, so we don't actually need the
check. On the other hand, we can exit early when n
is zero and save ourselves
from performing the other checks that are irrelevant since no data is being
copied. The chance of (n == 0)
being true is possible, but highly unlikely,
and keeping the check means that we're adding an extra step to every call to
memcpy
. Due to this, we'll favor the calls to memcpy
where (n != 0)
by
removing parameter validation on n
, and let those unlikely calls suffer a few
extra steps.
For error checking, we'll consider each check to be an additional step. So, we're adding just four steps to validate that our objects do not overlap and that we don't wrap during the copy. This is an incredibly small price to pay for the added security and safety that these checks add, so we'll definitely keep those.
Backwards copy revisited
Now that we have a more efficient version of our algorithm, we still need to make it copy backwards to handle the case of object overlap. We've already discussed why this is necessary and how to do it with the simple copy loop and walked through step-by-step the way to expand the simple version into a more efficient version. The pointer arithmetic takes care of a lot of the headache for us. Since this is an extension of what we've done already, the full version is found at the end.
Testing
Normally we could use memcmp
to verify that our memcpy
happened correctly,
but we don't have that function yet. We'll create a simple test function that
checks the boundaries of the copy to ensure we copied up to the boundaries, but
no further. This function takes a control memory region that should be a copy of
the destination region so it can be restored, as well as an offset in to the
region such that we can verify no bytes before the region were touched. Before
doing the actual copy, we verify that the source and destination are actually
different at the critical points, otherwise our tests won't be meaningful. The
last note about this function is with the use of the variables src2
and
src3
. This function will be used to test copying between overlapping objects
and we must preserve the values of the source object that we intend to test,
otherwise they may be overwritten and our test will appear to fail.
int
verifyCopy(
const void * const pVoidCtrl,
const size_t ctrlSize,
void *pVoidDst,
const size_t dstSize,
const void * const pVoidSrc,
const size_t srcSize,
const size_t offset)
{
int ret = 1;
const char * const pCtrl = pVoidCtrl;
char *pDst = pVoidDst;
const char * const pSrc = pVoidSrc;
size_t n = 0;
size_t i1 = offset - 1;
size_t i2 = offset;
size_t i3 = 0;
size_t i4 = 0;
char src2 = 0;
char src3 = 0;
if ((NULL == pCtrl) ||
(NULL == pDst) ||
(NULL == pSrc) ||
(ctrlSize != dstSize) ||
(0 == offset) ||
((2 * offset) >= ctrlSize) ||
((2 * offset) >= dstSize))
{
return ret;
}
/* Copy up to offset bytes from the end of the shortest region */
if (dstSize > srcSize)
{
n = srcSize - (2 * offset);
}
else
{
n = dstSize - (2 * offset);
}
i3 = offset + n - 1;
i4 = offset + n;
src2 = pSrc[i2];
src3 = pSrc[i3];
do
{
/* Verify that we will be able to see a difference */
if ((pDst[i1] == pSrc[i1]) ||
(pDst[i2] == src2) ||
(pDst[i3] == src3) ||
(pDst[i4] == pSrc[i4]))
{
break;
}
memcpy(pDst + offset, pSrc + offset, n);
/* Test the boundaries */
if ((pDst[i1] != pCtrl[i1]) ||
(pDst[i2] != src2) ||
(pDst[i3] != src3) ||
(pDst[i4] != pCtrl[i4]))
{
break;
}
ret = 0;
} while (0);
/* Return the destination to its original form */
memcpy(pDst, pCtrl, dstSize);
return ret;
}
Our main test function will explicitly test the error conditions and then use
the previous function to verify that the memcpy
implementation works properly.
The method by which we verify our alignment handling is by testing every
combination of start and end offset from zero to sizeof(long) - 1
. The last
two tests will ensure that copying between overlapping objects works properly.
int
memcpyTest(void)
{
int ret = 1;
int failed = 0;
size_t i = 0;
size_t j = 0;
char bulk1[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
"ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
"ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
"ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
"ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
"ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ";
char bulk2[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
"ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
"ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
"ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
"ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
"ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ";
char bulk3[] = "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
"ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
"ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
"ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
"ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
"ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
"ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
"ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA";
char bulk4[] = "ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
"ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
"ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
"ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
"ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
"ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
"ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA"
"ZYXWVUTSRQPONMLKJIHGFEDCBAZYXWVUTSRQPONMLKJIHGFEDCBA";
char simple1[] = "This will be a short string to test simple things";
char simple2[] = "This will be a short string to test simple things";
do
{
/* Alignment checking relies on an unsigned long holding a pointer value
* without truncation.
*/
if (sizeof(unsigned long) < sizeof(void *))
{
break;
}
/* No way to verify, we'll just crash here if we don't catch the NULL */
memcpy(NULL, simple1, sizeof(simple1));
/* We may crash here */
memcpy(simple2, NULL, sizeof(simple2));
for (i = 0; i < sizeof(simple2); i++)
{
if (simple1[i] != simple2[i])
{
failed = 1;
break;
}
}
if (1 == failed)
{
break;
}
/* We might crash here too */
memcpy(simple2, (void *) ((unsigned long) -5), sizeof(simple2));
for (i = 0; i < sizeof(simple2); i++)
{
if (simple1[i] != simple2[i])
{
failed = 1;
break;
}
}
if (1 == failed)
{
break;
}
/* And here might be another crash */
memcpy((void *) ((unsigned long) -5), simple1, sizeof(simple1));
/* Verify no characters are copied */
memcpy(bulk2, bulk3, 0);
for (i = 0; i < sizeof(bulk2); i++)
{
if (bulk1[i] != bulk2[i])
{
failed = 1;
break;
}
}
if (1 == failed)
{
break;
}
/* Test each permutation of 0 to sizeof(long) start positions and
* offsets doing a forward copy.
*/
for (i = 0; i < sizeof(long); i++)
{
for (j = 0; j < sizeof(long); j++)
{
if (0 != verifyCopy(bulk1 + i,
sizeof(bulk1) - i,
bulk2 + i,
sizeof(bulk2) - i,
bulk3 + i,
sizeof(bulk3) - i,
j + 5))
{
failed = 1;
break;
}
}
if (1 == failed)
{
break;
}
}
if (1 == failed)
{
break;
}
/* Test each permutation of 0 to sizeof(long) start positions and
* offsets doing a backward copy.
*/
for (i = 0; i < sizeof(long); i++)
{
for (j = 0; j < sizeof(long); j++)
{
if (0 != verifyCopy(bulk3 + i,
sizeof(bulk3) - i,
bulk4 + i,
sizeof(bulk4) - i,
bulk2 + i,
sizeof(bulk2) - i,
j + 5))
{
failed = 1;
break;
}
}
if (1 == failed)
{
break;
}
}
if (1 == failed)
{
break;
}
/* Slide a buffer backward */
if (0 != verifyCopy(bulk1 + 12,
sizeof(bulk1) - 12,
bulk2 + 12,
sizeof(bulk2) - 12,
bulk2 + 12 + (3 * sizeof(long)),
sizeof(bulk2) - 12 - (3 * sizeof(long)),
7))
{
break;
}
/* Slide a buffer forward */
if (0 != verifyCopy(bulk1 + 12 + (3* sizeof(long)),
sizeof(bulk1) - 12 - (3* sizeof(long)),
bulk2 + 12 + (3* sizeof(long)),
sizeof(bulk2) - 12 - (3* sizeof(long)),
bulk2 + 12,
sizeof(bulk2) - 12,
7))
{
break;
}
ret = 0;
} while (0);
return ret;
}
Conclusion
At the core, memcpy
is very simple; just loop over the data and move it from
source to destination. The versions we've implemented have several extra checks
in place to prevent NULL pointer dereferences and reading/writing with an
improperly chosen length. Since copying between overlapping objects is allowed
if implemented, we chose to implement that to provide more functionality.
void *
memcpy(void *s1, const void *s2, size_t n)
{
char *pDst = s1;
const char *pSrc = s2;
if ( !s1 ||
!s2 ||
/* Check for wrapping while copying */
((((unsigned long) -1) - ((unsigned long) s1)) < n) ||
((((unsigned long) -1) - ((unsigned long) s2)) < n))
{
return s1;
}
/* Copy forward or backward to handle object overlap */
if (pDst < pSrc)
{
while (0 < n--)
{
*(pDst++) = *(pSrc++);
}
}
else
{
pDst += n;
pSrc += n;
while (0 < n--)
{
*(--pDst) = *(--pSrc);
}
}
return s1;
}
We also implemented bulk copies when available by using pointers to the largest
type available in C90. Keep in mind that the bulk copy implementation relies on
the assumption that an unsigned long
is at least as large as a pointer type
and that it also violates certain parts of the standard to perform copies in
this way.
#define MEMCPY_EFFICIENCY_THRESHOLD (200)
void *
memcpy(void *s1, const void *s2, size_t n)
{
char *pCharDst = s1;
const char *pCharSrc = s2;
long *pLongDst = NULL;
const long *pLongSrc = NULL;
size_t dstOffset = 0;
size_t srcOffset = 0;
if ( !s1 ||
!s2 ||
/* Check for wrapping while copying */
((((unsigned long) -1) - ((unsigned long) s1)) < n) ||
((((unsigned long) -1) - ((unsigned long) s2)) < n))
{
return s1;
}
/* Copy forward or backward to handle object overlap */
if (pCharDst < pCharSrc)
{
dstOffset = ((unsigned long) pCharDst) & (sizeof(*pLongDst) - 1);
srcOffset = ((unsigned long) pCharSrc) & (sizeof(*pLongSrc) - 1);
if ((MEMCPY_EFFICIENCY_THRESHOLD <= n) &&
(dstOffset == srcOffset))
{
while ((0 < dstOffset--) &&
(0 < n--))
{
*(pCharDst++) = *(pCharSrc++);
}
pLongDst = (long *) pCharDst;
pLongSrc = (long *) pCharSrc;
while (sizeof(*pLongDst) <= n)
{
*(pLongDst++) = *(pLongSrc++);
n -= sizeof(*pLongDst);
}
pCharDst = (char *) pLongDst;
pCharSrc = (char *) pLongSrc;
}
while (0 < n--)
{
*(pCharDst++) = *(pCharSrc++);
}
}
else
{
pCharDst += n;
pCharSrc += n;
dstOffset = ((unsigned long) pCharDst) & (sizeof(*pLongDst) - 1);
srcOffset = ((unsigned long) pCharSrc) & (sizeof(*pLongSrc) - 1);
if ((MEMCPY_EFFICIENCY_THRESHOLD <= n) &&
(dstOffset == srcOffset))
{
while ((0 < dstOffset--) &&
(0 < n--))
{
*(--pCharDst) = *(--pCharSrc);
}
pLongDst = (long *) pCharDst;
pLongSrc = (long *) pCharSrc;
while (sizeof(*pLongDst) <= n)
{
*(--pLongDst) = *(--pLongSrc);
n -= sizeof(*pLongDst);
}
pCharDst = (char *) pLongDst;
pCharSrc = (char *) pLongSrc;
}
while (0 < n--)
{
*(--pCharDst) = *(--pCharSrc);
}
}
return s1;
}