One Function Per File
tags: assembling, compiling, design, linking, make
A common layout model seen in most libc implementations (e.g. glibc, bsdlibc, musl) is to have one source file for every function in libc. For example, memcpy.c and memmove.c reside under a string/ directory. This is in contrast to having one source file corresponding to every libc header (i.e. string.c for string.h). We'll explore why this approach is taken and then demonstrate how to convert to the more common model.
Static Linking
One of the root causes of this layout model is static linking against the standard library. Static linking takes whatever you're linking against, like libc, and embeds those functions into your output binary. This increases the size of your binary somewhat but frees you from worrying about what version of that library may or may not be installed on whatever system your user is running on.
Dynamically linking places some requirements on the version of the library you're linking against. If your user chose to upgrade their standard C library and that library provides more accurate results for sin() than you're expecting then you may run into issues.
This decision means we're trading portability for size or vice versa. Since we're developing a library that will be linked against we want to be considerate to our users so that this decision is easier to make. More specifically, we want the output binary size to be as small as possible when someone statically links against welibc. How do we do that? Place every function in its own file. The real question that we'll be answering is why placing one function per file produces smaller binaries.
Binary Size
To find out why one function per file is better, we need to do a comparison of statically linking against a library written one way versus the other. The test code will use one, but not both, of the functions we have so far in welibc so that we can show when, or if, unnecessary code shows up in our output binary. If you use all of the functions then the approach doesn't matter since you get the entire library anyway. Our test linking program, link.c, will be the following:
#include <string.h>

int
main(int argc, char *argv[])
{
    char src[] = "Hello, world!";
    char dst[20] = { 0 };

    /* memcpy takes the destination first, then the source */
    memcpy(dst, src, sizeof(src));

    return 0;
}
This program will be compiled and linked against welibc in the same way the test code is done:
$ gcc -nostartfiles -nodefaultlibs -nostdinc -nostdlib -ffreestanding -isystem ./include -ansi -c link.c -o link.o
$ gcc -nostartfiles -nodefaultlibs -nostdinc -nostdlib -ffreestanding -isystem ./include -ansi link.o -lwelibc -L. -O0 -o test_link
One modification I made is to remove the use of the -ggdb3 flag so that no debugging information is generated. This will simplify things in a moment.
Comparison
welibc is currently written with one source file per header so we'll look at that before changing anything. Using the above procedure and renaming the output file to "combined", we get an output file size of 2164 bytes.
Separating out memmove to a dedicated C file, rebuilding welibc, and relinking the test code to welibc with a file name of "separate" gives a file size of 1892 bytes.
How do we find out where the 272 byte difference comes from? objdump will tell us exactly what is contained in the binaries and we can diff the output of each binary:
$ objdump -d combined > comb.txt
$ objdump -d separate > sep.txt
$ diff comb.txt sep.txt
2c2
< combined: file format elf64-x86-64
---
> separate: file format elf64-x86-64
101,162d100
<
< 0000000000400282 <memmove>:
< 400282: 55 push %rbp
< 400283: 48 89 e5 mov %rsp,%rbp
< 400286: 48 89 7d e8 mov %rdi,-0x18(%rbp)
...
The truncated output contains the rest of the instructions for the memmove function. The only difference is that the "combined" binary also contains the memmove function (which was never called). Let's see exactly how large the function is by subtracting the address of the first byte of the first instruction (0x400282) from the address of the last byte of the last instruction.
$ diff comb.txt sep.txt | tail
< 40032f: 48 8b 45 f8 mov -0x8(%rbp),%rax
< 400333: 88 10 mov %dl,(%rax)
< 400335: 48 8b 45 d8 mov -0x28(%rbp),%rax
< 400339: 48 8d 50 ff lea -0x1(%rax),%rdx
< 40033d: 48 89 55 d8 mov %rdx,-0x28(%rbp)
< 400341: 48 85 c0 test %rax,%rax
< 400344: 75 d8 jne 40031e <memmove+0x9c>
< 400346: 48 8b 45 e8 mov -0x18(%rbp),%rax
< 40034a: 5d pop %rbp
< 40034b: c3 retq
$ echo $((0x40034b - 0x400282))
201
The difference in the file sizes was 272 bytes, though, so roughly 70 bytes are still unaccounted for (the subtraction above stops at the address of the one-byte retq, so memmove itself occupies 202 bytes). These are attributed to other changes required to facilitate this extra function, which come in the form of added, or slightly changed, instructions in the exception header and frame.
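The size arithmetic can be sketched in C. This is only an illustration: function_size is a hypothetical helper name, not part of welibc, and it assumes you read the address and length of the last instruction off the objdump listing by hand.

```c
/*
 * Number of bytes occupied by a function, given the address of its first
 * instruction, the address of its last instruction, and the length in
 * bytes of that last instruction. (Hypothetical helper for illustration.)
 */
unsigned long
function_size(unsigned long first, unsigned long last, unsigned long last_len)
{
    /* Distance between the two addresses, plus the final instruction */
    return (last - first) + last_len;
}
```

Calling function_size(0x400282, 0x40034b, 1) counts the single c3 byte of retq as well, which is why it comes out one higher than the plain subtraction.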
It's clear that separating functions to their own source file ensures that programs which statically link against welibc will only receive the functions they call. Next we'll discover why this happens.
Translation Units
During the compilation of C programs the compiler will deal in discrete pieces of the larger program which are referred to as translation units. The Standard defines a translation unit as "A source file together with all the headers and source files included via the preprocessing directive #include, less any source lines previously skipped by any of the conditional inclusion preprocessing directives ...".
When all string.h functions are included in a single source file they get lumped into a single translation unit, which the compiler turns into a single object file. A static library is an archive of object files, and by default the linker pulls in whole object files rather than individual functions, so when you make a call into code within a translation unit, the entire unit must be included in the final output executable. When all functions for a given section of the standard library are in the same file (i.e. string.c), you end up including all functions even if only one was called. This is seen with the combined executable including memmove even though only memcpy was called.
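To make the layout concrete, here is a minimal sketch of what a one-function translation unit such as string/memcpy.c might contain. The name sketch_memcpy and the byte-wise loop are assumptions for illustration; welibc's actual implementation may differ, and the function is renamed only so that it doesn't clash with a hosted libc if you compile it on its own.

```c
/* string/memcpy.c (sketch): one function, one translation unit */
#include <stddef.h>

/* Copy n bytes from src to dst; the regions are assumed not to overlap. */
void *
sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n-- > 0) {
        *d++ = *s++;
    }

    return dst;
}
```

Compiled by itself, this file yields an object file that defines a single function, so a linker resolving a call to it has nothing else to drag along.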
Dynamic Linking
Now that we know why separating functions into their own files is beneficial for static linking, we need to explore what effect this will have on dynamic linking.
When dynamic linking is used, the operating system will load a library into an address space the first time an executable needs to use it. That library is then made available to any other executables that need to use it, without the overhead of it being loaded. The operating system takes care of linking the running executables with the library at run time (hence the name dynamic).
When a library is compiled with the intention of being used with dynamic linking, the output size isn't a concern because the price of loading is paid only once. Moving each function to its own file effectively makes no difference for dynamic linking since all code is included anyway.
Reorganizing
The way I'd like to lay out the source code, and the way other projects have also done it, is with directories that correspond to each header file. For example:
└── src/
    ├── assert
    ├── errno
    ├── math
    ├── stddef
    ├── stdio
    ├── stdlib
    └── string
This means that some changes need to be made to the Makefile so that it will find code two directories deep.
Gather Source Directories
First we need to find all directories that contain source code. After creating some new directories and shuffling the files around, the layout looks like so:
src/
├── errno
│   └── errno.c
├── _start.s
└── string
    ├── memcpy.c
    └── memmove.c
Make has a builtin function, wildcard, which will expand each pattern it receives into the files and directories that it finds.
$(wildcard $(SRCDIR)/*/)
With the current layout, this will expand to
src/string/ src/errno/ src/_start.s
We don't want "src/_start.s" in the mix, so we can use the builtin dir function to extract the directory portion of each filename:
$(dir $(wildcard $(SRCDIR)/*/))
Which gives us:
src/string/ src/errno/ src/
Finally, we will use sort to order the names and remove any possible duplicates. This list of directories will be stored in a variable for later use:
SRC_DIRS := $(sort $(dir $(wildcard $(SRCDIR)/*/)))
Gather Source Files
Now that we've found all directories containing source code, we want to gather all source files that exist within those directories. The addsuffix builtin will add the given suffix to every item in the list; this is perfect for using another wildcard to find all files of a given type. Finally, the notdir function is used to extract just the filename from an item.
C_SRCS := $(notdir $(wildcard $(addsuffix *.c,$(SRC_DIRS))))
S_SRCS := $(notdir $(wildcard $(addsuffix *.s,$(SRC_DIRS))))
This gives us a listing of all current source files:
errno.c memmove.c memcmp.c memcpy.c
_start.s
Convert Sources into Objects
Next we will slightly alter the existing pattern substitutions so that the input data comes from $(C_SRCS) and $(S_SRCS).
OBJECTS := $(patsubst %.c,%.o,$(C_SRCS))
OBJECTS += $(patsubst %.s,%.o,$(S_SRCS))
Modify VPATH
Now that we know the names of all source files and the directories in which they exist, we need to modify the VPATH variable so that it contains all of the source directories. This way Make can find the source files when we refer to them by filename only, rather than by full path. This means switching out $(SRCDIR) for $(SRC_DIRS).
VPATH := $(SRC_DIRS):$(TSTDIR)
Conclusion
After seeing the advantages of placing every function in its own file it's clear that this is the way welibc should be organized. This only required a few adjustments to the Makefile and we're all set. Now binaries that link against welibc will be leaner and the code will be easier to browse.