welibc - One Function Per File

A common layout model seen in most libc implementations (e.g. glibc, bsdlibc, musl) is to have one source file for every function in libc. For example, memcpy.c and memmove.c reside under a string/ directory. This is in contrast to having one source file corresponding to every libc header (e.g. string.c for string.h). We'll explore why this approach is taken and then demonstrate how to convert to the more common model.

Static Linking

One of the root causes of this layout model is static linking against the standard library. Static linking takes whatever you're linking against, like libc, and embeds those functions into your output binary. This increases the size of your binary somewhat but frees you from worrying about what version of that library may or may not be installed on whatever system your user is running on.

Dynamic linking, by contrast, places some requirements on the version of the library installed on your user's system. If your user upgrades their standard C library and the new version returns different (even if more accurate) results for sin() than you're expecting, then you may run into issues.

This decision means we're trading portability for size or vice versa. Since we're developing a library that will be linked against we want to be considerate to our users so that this decision is easier to make. More specifically, we want the output binary size to be as small as possible when someone statically links against welibc. How do we do that? Place every function in its own file. The real question that we'll be answering is why placing one function per file produces smaller binaries.

Binary Size

To find out why one function per file is better, we need to do a comparison of statically linking against a library written one way versus the other. The test code will use one, but not both, of the functions we have so far in welibc so that we can show when, or if, unnecessary code shows up in our output binary. If you use all of the functions then the approach doesn't matter since you get the entire library anyways. Our test linking program, link.c, will be the following:

#include <string.h>

int
main(int argc, char *argv[])
{
    char src[] = "Hello, world!";
    char dst[20] = { 0 };

    memcpy(dst, src, sizeof(src));

    return 0;
}

This program will be compiled and linked against welibc in the same way as the test code:

$ gcc -nostartfiles -nodefaultlibs -nostdinc -nostdlib -ffreestanding -isystem ./include -ansi -c link.c -o link.o
$ gcc -nostartfiles -nodefaultlibs -nostdinc -nostdlib -ffreestanding -isystem ./include -ansi link.o -lwelibc -L. -O0 -o test_link

One modification I made is to remove the use of the -ggdb3 flag so that no debugging information is generated. This will simplify things in a moment.

Comparison

welibc is currently written with one source file per header so we'll look at that before changing anything. Using the above procedure and renaming the output file to "combined", we get an output file size of 2164 bytes.

Separating out memmove to a dedicated C file, rebuilding welibc, and relinking the test code to welibc with a file name of "separate" gives a file size of 1892 bytes.
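For illustration, a dedicated file such as string/memmove.c could hold nothing but the one function. The version below is only a sketch under assumptions: it is named sketch_memmove so it does not clash with a real libc, it is not welibc's actual code, and it compares the two pointers directly, which works on flat-memory targets like x86-64 but is not guaranteed by the C standard for unrelated objects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a dedicated string/memmove.c; not welibc's code. */
void *
sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Null pointers are only tolerated when there is nothing to copy. */
    assert(n == 0 || (dst != NULL && src != NULL));

    if (d < s) {
        /* Destination starts before source: copy forward. */
        while (n--) {
            *d++ = *s++;
        }
    } else if (d > s) {
        /*
         * Destination starts after source: copy backward so that
         * overlapping bytes are read before they are overwritten.
         */
        d += n;
        s += n;
        while (n--) {
            *--d = *--s;
        }
    }

    return dst;
}
```

Because the file defines a single external function, the object file built from it contains that function and nothing else.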

How do we find out where the 272-byte difference comes from? objdump will tell us exactly what is contained in the binaries and we can diff the output of each binary:

$ objdump -d combined > comb.txt
$ objdump -d separate > sep.txt
$ diff comb.txt sep.txt
2c2
< combined:     file format elf64-x86-64
---
> separate:     file format elf64-x86-64
101,162d100
<
< 0000000000400282 <memmove>:
<   400282:     55                      push   %rbp
<   400283:     48 89 e5                mov    %rsp,%rbp
<   400286:     48 89 7d e8             mov    %rdi,-0x18(%rbp)
...

The truncated output contains the rest of the instructions for the memmove function. The only difference is that the "combined" binary also contains the memmove function (which was never called). Let's see exactly how large the function is by subtracting the address of the first byte of the first instruction (0x400282) from the address of the last byte of the last instruction.

$ diff comb.txt sep.txt | tail
<   40032f:     48 8b 45 f8             mov    -0x8(%rbp),%rax
<   400333:     88 10                   mov    %dl,(%rax)
<   400335:     48 8b 45 d8             mov    -0x28(%rbp),%rax
<   400339:     48 8d 50 ff             lea    -0x1(%rax),%rdx
<   40033d:     48 89 55 d8             mov    %rdx,-0x28(%rbp)
<   400341:     48 85 c0                test   %rax,%rax
<   400344:     75 d8                   jne    40031e <memmove+0x9c>
<   400346:     48 8b 45 e8             mov    -0x18(%rbp),%rax
<   40034a:     5d                      pop    %rbp
<   40034b:     c3                      retq
$ echo $((0x40034b - 0x400282))
201

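The subtraction gives 201, and since 0x40034b is itself the final byte of the function (the one-byte retq), memmove occupies 202 bytes. The difference in the file sizes was 272 bytes, though, so there are still 70 bytes unaccounted for. These are attributed to other changes required to facilitate this extra function, which come in the form of added, or slightly changed, instructions in the exception header and frame.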
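As a sanity check on the arithmetic, the following C sketch recomputes the numbers. The helper names span_bytes and unexplained_bytes are made up for illustration; the addresses and file sizes are the ones quoted above. Note that span_bytes counts the final byte, so it reports one more than the raw subtraction.

```c
#include <assert.h>

/* Hypothetical helper: number of bytes in the inclusive range [first, last]. */
unsigned long
span_bytes(unsigned long first, unsigned long last)
{
    assert(last >= first);
    return last - first + 1;
}

/* Bytes of the file-size difference not explained by the function itself. */
unsigned long
unexplained_bytes(unsigned long bigger, unsigned long smaller, unsigned long func)
{
    assert(bigger >= smaller && bigger - smaller >= func);
    return (bigger - smaller) - func;
}
```

Calling span_bytes(0x400282, 0x40034b) gives 202, and unexplained_bytes(2164, 1892, 202) gives 70.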

It's clear that separating functions to their own source file ensures that programs which statically link against welibc will only receive the functions they call. Next we'll discover why this happens.

Translation Units

During the compilation of C programs the compiler will deal in discrete pieces of the larger program which are referred to as translation units. The Standard defines a translation unit as "A source file together with all the headers and source files included via the preprocessing directive #include, less any source lines previously skipped by any of the conditional inclusion preprocessing directives ...".

When all string.h functions are included in a single source file they get lumped into a single translation unit, which the compiler turns into a single object file. A static library is an archive of such object files, and when the linker needs a symbol from the archive it pulls in the whole object file that defines it, not just the one function. When all functions for a given section of the standard library are in the same file (e.g. string.c), you end up including all of them even if only one was called. This is seen with the combined executable including memmove even though only memcpy was called.
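To make that selection rule concrete, here is a toy model in C, not a real linker. The struct and function names are invented for illustration, memcpy's size of 100 bytes is a made-up placeholder, and memmove's 202 bytes is the inclusive span 0x400282 through 0x40034b from the disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 4

struct symbol {
    const char *name;   /* function defined in the object file */
    unsigned    size;   /* code size in bytes */
};

struct object {
    const char   *file;             /* archive member, e.g. "string.o" */
    struct symbol syms[MAX_SYMS];   /* functions compiled into it */
    unsigned      nsyms;            /* how many entries of syms are used */
};

/*
 * Toy model of archive linking: find the member that defines `wanted`
 * and return the size of everything in that member, because a member
 * is pulled into the output as a whole.
 */
unsigned
linked_bytes(const struct object *archive, unsigned count, const char *wanted)
{
    unsigned i, j, k, total;

    assert(archive != NULL && wanted != NULL);

    for (i = 0; i < count; i++) {
        for (j = 0; j < archive[i].nsyms; j++) {
            if (strcmp(archive[i].syms[j].name, wanted) == 0) {
                total = 0;
                for (k = 0; k < archive[i].nsyms; k++) {
                    total += archive[i].syms[k].size;
                }
                return total;
            }
        }
    }

    return 0;   /* no member defines the symbol */
}

/* One object per header: both functions share string.o. */
const struct object combined_lib[] = {
    { "string.o", { { "memcpy", 100 }, { "memmove", 202 } }, 2 }
};

/* One object per function. The 100 for memcpy is a placeholder size. */
const struct object separate_lib[] = {
    { "memcpy.o",  { { "memcpy", 100 } },  1 },
    { "memmove.o", { { "memmove", 202 } }, 1 }
};
```

With these tables, linked_bytes(combined_lib, 1, "memcpy") returns 302 while linked_bytes(separate_lib, 2, "memcpy") returns 100: the 202 bytes of memmove only come along when it shares an object file with memcpy.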

Dynamic Linking

Now that we know why separating functions into their own files is beneficial for static linking, we need to explore what effect this will have on dynamic linking.

When dynamic linking is used, the operating system will load a library into an address space the first time an executable needs to use it. That library is then made available to any other executables that need to use it, without the overhead of loading it again. The operating system takes care of linking the running executables with the library at run time (hence the name dynamic).

When a library is compiled with the intention of being used with dynamic linking, the output size isn't a concern because the price of loading is paid only once. Moving each function to its own file effectively makes no difference for dynamic loading since all code is included anyways.

Reorganizing

The way I'd like to lay out the source code, and the way other projects have also done it, is with directories that correspond to each header file. For example:

└── src/
    ├── assert
    ├── errno
    ├── math
    ├── stddef
    ├── stdio
    ├── stdlib
    └── string

This means that some changes need to be made to the Makefile so that it will find code two directories deep.

Gather Source Directories

First we need to find all directories that contain source code. After creating some new directories and shuffling the files around, the layout looks like so:

src/
├── errno
│   └── errno.c
├── _start.s
└── string
    ├── memcpy.c
    └── memmove.c

Make has a builtin function, wildcard, which will expand each pattern it receives into the files and directories that it finds.

$(wildcard $(SRCDIR)/*/)

With the current layout, this will expand to

src/string/ src/errno/ src/_start.s

We don't want "src/_start.s" in the mix, so we can use the builtin dir function to extract the directory portion of each filename:

$(dir $(wildcard $(SRCDIR)/*/))

Which gives us:

src/string/ src/errno/ src/

Finally, we will use sort to order the names and remove any possible duplicates. This list of directories will be stored in a variable for later use:

SRC_DIRS := $(sort $(dir $(wildcard $(SRCDIR)/*/)))

Gather Source Files

Now that we've found all directories containing source code, we want to gather all source files that exist within those directories. The addsuffix builtin will add the given suffix to every item in the list; this is perfect for using another wildcard to find all files of a given type. Finally, the notdir function is used to extract just the filename from an item.

C_SRCS  :=  $(notdir $(wildcard $(addsuffix *.c,$(SRC_DIRS))))
S_SRCS  :=  $(notdir $(wildcard $(addsuffix *.s,$(SRC_DIRS))))

This gives us a listing of all current source files:

errno.c memmove.c memcmp.c memcpy.c
_start.s

Convert Sources into Objects

Next we will slightly alter the existing pattern substitutions so that the input data comes from $(C_SRCS) and $(S_SRCS).

OBJECTS := $(patsubst %.c,%.o,$(C_SRCS))
OBJECTS += $(patsubst %.s,%.o,$(S_SRCS))

Modify VPATH

Now that we know the names of all source files and the directories in which they exist, we need to modify the VPATH variable so that it contains all of the source directories. This way Make can find the source files when we refer to them by filename only, rather than by full path. This means switching out $(SRCDIR) for $(SRC_DIRS).

VPATH   := $(SRC_DIRS):$(TSTDIR)

Conclusion

After seeing the advantages of placing every function in its own file it's clear that this is the way welibc should be organized. This only required a few adjustments to the Makefile and we're all set. Now binaries that link against welibc will be leaner and the code will be easier to browse.
