The stdarg.h header provides the facilities for stepping through a function's arguments when the number of arguments isn't known until run time. When a function has an ellipsis as its last "parameter", it is probably using stdarg.h to process those arguments. The printf function is just one example.
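For a concrete picture, here is a minimal printf-style function. The name format_into is hypothetical. It forwards its variable arguments to the standard vsnprintf, which accepts an already-started va_list.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* A printf-style function: the ellipsis marks where the variable
   arguments begin, and stdarg.h is used to hand them on. */
int format_into(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int written;

    assert(buf != NULL && fmt != NULL);
    va_start(ap, fmt);                 /* fmt is the last named parameter */
    written = vsnprintf(buf, size, fmt, ap);
    va_end(ap);                        /* pair every va_start with va_end */
    return written;                    /* length of the formatted text */
}
```

Calling format_into(buf, sizeof buf, "%d-%s", 7, "x") leaves "7-x" in buf.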

Stack Frames

First, we need to talk about stack frames and how they are constructed. If you've never heard of a stack frame, I recommend reading the Wikipedia article on call frames. This explanation is specific to the x86-64 architecture, so keep in mind that other architectures may handle things differently. A stack frame is the portion of stack memory that belongs to a specific function call. It contains the function's arguments (those that weren't passed in registers), the return address, the saved value of the caller's base pointer, and then the function's local variables. The base pointer is the fixed reference point used to access both the arguments that were passed on the stack and the function's local variables; on x86-64 it is held in the rbp register. Modifying this register without saving it can have serious consequences, because you lose that easy reference point for the contents of your stack frame. For a visual representation, the Wikipedia page mentioned above has a good picture, or you can look at page 16 of the x86_64 ABI. These pictures may seem to contradict each other, but only because the Wikipedia page draws the stack with low memory at the top while the ABI draws it with high memory at the top.
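To see the downward-growing stack on a real machine, the sketch below uses the GCC/Clang builtin __builtin_frame_address, which is a compiler extension and not standard C; the function names are hypothetical. Because each call pushes a return address and the saved base pointer below the caller's frame, the callee's frame address should come out lower than the caller's.

```c
#include <assert.h>
#include <stdint.h>

/* GCC/Clang builtin: the frame (base) pointer of the current function.
   On x86-64 with frame pointers this is the value held in rbp. */
static __attribute__((noinline)) uintptr_t inner_frame(void)
{
    return (uintptr_t)__builtin_frame_address(0);
}

/* Record the callee's frame address, then return our own. noinline
   keeps both functions as real calls with their own frames. */
__attribute__((noinline)) uintptr_t outer_frame(uintptr_t *inner)
{
    *inner = inner_frame();
    return (uintptr_t)__builtin_frame_address(0);
}
```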

va_list

stdarg.h specifies a single type for holding the metadata needed to advance through the argument list. Since this type is architecture specific, the Standard says nothing more about it than that it is "a type suitable for holding the information needed by the macros va_start, va_arg, and va_end." The x86_64 ABI, however, does give an answer.

Each architecture has its own convention for passing arguments. For example, the i386 architecture pushes every argument onto the stack before calling a function. Unfortunately, x86_64 is not so simple. The ABI describes in detail how parameters should be passed and even provides the following definition for the va_list type:

typedef struct {
    unsigned int gp_offset;
    unsigned int fp_offset;
    void *overflow_arg_area;
    void *reg_save_area;
} va_list[1];
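The [1] at the end is easy to miss. It makes va_list an array type, so passing a va_list to another function actually passes a pointer to the caller's structure, and any updates that function makes are seen by the caller; presumably that is why the ABI defines it this way. The sketch below copies the ABI definition under the hypothetical name abi_va_list so it does not clash with the compiler's own va_list.

```c
#include <assert.h>

/* The ABI's va_list definition under a hypothetical name. */
typedef struct {
    unsigned int gp_offset;
    unsigned int fp_offset;
    void *overflow_arg_area;
    void *reg_save_area;
} abi_va_list[1];

/* Because abi_va_list is an array type, this parameter is really a
   pointer to the caller's structure, so the update is visible to it. */
void consume_int_slot(abi_va_list ap)
{
    assert(ap->gp_offset < 48);   /* a GP register copy must remain */
    ap->gp_offset += 8;           /* step past one 8-byte register copy */
}
```

After two calls to consume_int_slot(ap), the caller's own ap->gp_offset has moved from 0 to 16, with no explicit & anywhere.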

Because x86_64 passes different kinds of arguments in different places, va_list has to keep track of a few things. The first is reg_save_area, a pointer to the register save area. The register save area is a block of stack memory holding copies of the argument-passing registers, and therefore of any arguments passed in them; reg_save_area points to the first of these copies, which is always the copy of the rdi register. Only a limited amount of data can be passed in registers, and any excess arguments are passed directly on the stack. overflow_arg_area points to the first of these excess arguments.
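To make the layout concrete, here is a small sketch that computes where each register copy sits relative to reg_save_area. It follows Figure 3.33 of the ABI draft this article uses (six 8-byte general purpose copies, then sixteen 16-byte XMM copies); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Register save area layout per Figure 3.33 of the ABI draft used here. */
static const char *const gp_names[6] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};

unsigned gp_save_offset(unsigned i)    /* i = 0 is rdi */
{
    assert(i < 6);
    return 8u * i;                     /* 8-byte general purpose copies */
}

unsigned xmm_save_offset(unsigned i)   /* i = 0 is xmm0 */
{
    assert(i < 16);
    return 48u + 16u * i;              /* 16-byte XMM copies after the GP ones */
}

void print_save_area_layout(void)
{
    for (unsigned i = 0; i < 6; i++)
        printf("%-5s reg_save_area + %3u\n", gp_names[i], gp_save_offset(i));
    for (unsigned i = 0; i < 16; i++)
        printf("xmm%-2u reg_save_area + %3u\n", i, xmm_save_offset(i));
}
```

The last copy, xmm15, starts at offset 288 and ends at 304, which is where the numbers used below come from.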

Now we have pointers to each group of arguments, but we also need to keep track of which argument comes next. The gp_offset member is the offset from the beginning of reg_save_area to the next argument passed in a general purpose register, so that argument can be read from reg_save_area + gp_offset. What happens when we exhaust the arguments that were passed in registers? gp_offset is set to 48, which indicates that "register arguments have been exhausted", and the next argument is read from overflow_arg_area instead. The value 48 comes from the general purpose registers used for argument passing: rdi, rsi, rdx, rcx, r8, and r9 are six 8-byte registers, so their copies take up 48 bytes. Therefore, once gp_offset is 48 or greater, reg_save_area + gp_offset no longer points at a register copy. overflow_arg_area works differently from reg_save_area: it always points to the next stack argument to be retrieved, so it must be advanced every time an argument is read from it. Lastly, the fp_offset member works the same way as gp_offset, except that it is the offset to the next argument passed in a floating-point (XMM) register. It ranges from 48 to 304, where 304 indicates that all floating-point arguments passed in registers have been exhausted and the next one must be retrieved through overflow_arg_area. Only xmm0 through xmm7 actually carry arguments, though; later revisions of the ABI, and gcc, size the register save area for just those eight registers and use 176 (48 + 8 * 16) as the exhausted value.
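The exhaustion rule can be sketched as a small predicate. The helper int_arg_from_registers is hypothetical: it reports whether the n-th integer argument would be read through gp_offset or through overflow_arg_area, assuming every earlier variadic argument was also an integer-class value and that named_gp named parameters already occupied general purpose registers.

```c
#include <assert.h>

enum { GP_EXHAUSTED = 48 };   /* six GP registers x 8 bytes */

/* Returns 1 if the n-th (0-based) integer-class variadic argument is
   read from reg_save_area + gp_offset, 0 if it comes from
   overflow_arg_area. Assumes earlier variadic arguments were all
   integer-class too. */
int int_arg_from_registers(unsigned named_gp, unsigned n)
{
    unsigned gp_offset;

    assert(named_gp <= 6);
    gp_offset = 8u * (named_gp + n);   /* each register argument is 8 bytes on */
    return gp_offset < GP_EXHAUSTED;
}
```

With no named register parameters, arguments 0 through 5 come from the register save area and argument 6 spills to the stack; a printf-like function whose format string occupies rdi spills one argument earlier.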

That is a little complicated, but overall it's not too bad. The real difficulty comes with passing structures as arguments, because on x86_64 a small structure may be split across general purpose and floating-point registers, so its pieces end up in different parts of the register save area (and if there aren't enough registers left, the whole structure is passed on the stack instead). This requires a lot of extra logic to rebuild the entire structure from its parts. We'll talk more about this later.

va_start

The va_start macro initializes a va_list so that it can be used to retrieve arguments. You must call va_start before accessing any of the variable arguments; otherwise there is no way to know where they are. The x86_64 ABI gives us the information we need to initialize the va_list structure. The macro takes the va_list as well as the identifier of the rightmost parameter in the variable parameter list (i.e. the named parameter directly before the ellipsis).
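When a function has more than one named parameter, only the last one is passed to va_start. The function count_matches below is a hypothetical example with two named parameters.

```c
#include <assert.h>
#include <stdarg.h>

/* va_start must be given `count`, the parameter written immediately
   before the ellipsis, not the first named parameter `target`. */
int count_matches(int target, int count, ...)
{
    va_list ap;
    int matches = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        if (va_arg(ap, int) == target)   /* fetch the next int argument */
            matches++;
    va_end(ap);
    return matches;
}
```

For example, count_matches(4, 5, 4, 1, 4, 4, 2) looks for 4 among five arguments and returns 3.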

First, we need to find the register save area. Figure 3.33 in the ABI shows that the register save area has a total size of 304 bytes: the offset of the last register copy, 288, plus the size of that copy, 16 bytes for a floating-point register. This value should be familiar from fp_offset, since a value of 304 there means no more floating-point arguments can be retrieved from the register save area. The compiler builds the register save area for us, and here we assume it occupies 304 bytes of stack directly between the saved rbp value and the function's local variables. Since the base pointer register, rbp, then holds the address immediately after the register save area, subtracting the size of the register save area gives the location we need:

  • reg_save_area = rbp - 304
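Written out, the arithmetic behind these numbers (following the Figure 3.33 layout of six 8-byte general purpose copies followed by sixteen 16-byte XMM copies) is:

\[
\begin{aligned}
\text{size of register save area} &= \underbrace{6 \times 8}_{\texttt{rdi} \ldots \texttt{r9}} + \underbrace{16 \times 16}_{\texttt{xmm0} \ldots \texttt{xmm15}} = 48 + 256 = 304 = 288 + 16 \\
\texttt{reg\_save\_area} &= \texttt{rbp} - 304
\end{aligned}
\]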

Second, we need the location of the first argument passed on the stack. It sits just above the return address of the current stack frame, which in turn sits just above the saved rbp. Since rbp points at the saved rbp value and each of these slots is 8 bytes, the first stack argument is 16 bytes above rbp:

  • overflow_arg_area = rbp + 16
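The offset of 16 can be checked with a toy model. The sketch below is purely a simulation with hypothetical names: an array of 8-byte slots stands in for the stack (growing toward lower indices), and simulate_call replays the caller pushing one stack argument, the call instruction pushing a return address, and the callee's "push rbp; mov rbp, rsp" prologue.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stack of 8-byte slots; nothing here touches the real stack. */
enum { SLOTS = 16 };

typedef struct {
    uint64_t mem[SLOTS];
    unsigned rsp;      /* index of the current top of stack */
    unsigned rbp;      /* index the base pointer refers to */
} toy_stack;

static void push(toy_stack *s, uint64_t v)
{
    assert(s->rsp > 0);
    s->mem[--s->rsp] = v;        /* the stack grows toward lower indices */
}

void simulate_call(toy_stack *s, uint64_t stack_arg, uint64_t ret_addr)
{
    push(s, stack_arg);          /* caller: first argument passed on the stack */
    push(s, ret_addr);           /* call instruction: return address */
    push(s, s->rbp);             /* prologue: save the caller's rbp */
    s->rbp = s->rsp;             /* prologue: mov rbp, rsp */
}
```

After the call, slot rbp holds the saved rbp, slot rbp + 1 (8 bytes up) the return address, and slot rbp + 2 (16 bytes up) the stack argument.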

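Figure 3.33 in the ABI gives the offsets of the first general purpose register copy and the first floating-point register copy, 0 and 48, respectively. These are the starting values when none of the named parameters were passed in registers:

  • gp_offset = 0
  • fp_offset = 48

In general, gp_offset and fp_offset must point past the registers already used by the named parameters: each named parameter passed in a general purpose register adds 8 to gp_offset, and each one passed in an XMM register adds 16 to fp_offset. For int printf(const char *format, ...), format arrives in rdi, so gp_offset starts at 8. Likewise, overflow_arg_area must skip any named parameters that were passed on the stack.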

That's all we need for va_start to initialize the va_list structure.
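Putting the pieces together, here is a sketch of the va_start arithmetic operating on a look-alike structure. Everything is hypothetical: abi_va_tag mirrors one element of the ABI's va_list, rbp is passed in explicitly as a stand-in for the real base pointer, and the sketch assumes no named parameter was passed on the stack. The real macro cannot be written this way in portable C, which is one more reason the builtins are attractive.

```c
#include <assert.h>

/* Same layout as one element of the ABI's va_list. */
typedef struct {
    unsigned int gp_offset;
    unsigned int fp_offset;
    void *overflow_arg_area;
    void *reg_save_area;
} abi_va_tag;

/* named_gp / named_fp count the named parameters that were passed in
   general purpose / XMM registers. */
void toy_va_start(abi_va_tag *ap, char *rbp, unsigned named_gp, unsigned named_fp)
{
    assert(named_gp <= 6 && named_fp <= 8);
    ap->reg_save_area = rbp - 304;         /* start of the 304-byte area */
    ap->overflow_arg_area = rbp + 16;      /* past saved rbp and return address */
    ap->gp_offset = 8u * named_gp;         /* 0 when no named GP parameters */
    ap->fp_offset = 48u + 16u * named_fp;  /* 48 when no named FP parameters */
}
```

For a printf-like function (one named pointer parameter), toy_va_start(&ap, rbp, 1, 0) yields gp_offset = 8 and fp_offset = 48.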

va_arg

The va_arg macro expands to an expression that has the type and value of the next argument passed to the function. You must give this macro a va_list initialized by a call to va_start, along with the type of the argument you want to retrieve. va_arg updates the va_list so that a subsequent call to va_arg retrieves the following argument.
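Because the type is supplied at each call, a single va_list can walk through arguments of mixed types. In the hypothetical sum_mixed below, a string of type letters ('i' for int, 'd' for double) tells the function what to ask va_arg for next.

```c
#include <assert.h>
#include <stdarg.h>

double sum_mixed(const char *types, ...)
{
    va_list ap;
    double total = 0.0;
    const char *t;

    va_start(ap, types);                  /* types is the last named parameter */
    for (t = types; *t != '\0'; t++) {
        if (*t == 'i')
            total += va_arg(ap, int);     /* fetch an int, advance ap */
        else if (*t == 'd')
            total += va_arg(ap, double);  /* fetch a double, advance ap */
        else
            assert(!"unknown type letter");
    }
    va_end(ap);
    return total;
}
```

Asking va_arg for the wrong type is undefined behavior, which is why printf relies on its format string in the same way.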

Thanks to C's default argument promotions, va_arg never has to deal with char, short, or float arguments: they are promoted to int (or unsigned int) and double before the call. Integer and pointer arguments are fetched through gp_offset, and double arguments through fp_offset. For an int, we read from reg_save_area + gp_offset and add 8 to gp_offset, or, if gp_offset is 48 or greater, read from overflow_arg_area and advance it by 8. Likewise, for a double we read from reg_save_area + fp_offset and add 16 to fp_offset, or use overflow_arg_area if fp_offset is 304 or greater.
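Those two rules can be sketched against fake memory. This is a model, not a usable va_arg: abi_va_tag and the toy_va_arg_* functions are hypothetical, the register save area is an ordinary byte array, and the 304 limit is the value from the ABI draft followed here (newer ABI revisions and gcc use 176). Alignment of larger types, long double, and structures are not handled.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    unsigned int gp_offset;
    unsigned int fp_offset;
    void *overflow_arg_area;
    void *reg_save_area;
} abi_va_tag;

enum { GP_EXHAUSTED = 48, FP_EXHAUSTED = 304 };

/* memcpy avoids alignment and aliasing problems when reading raw bytes. */
int toy_va_arg_int(abi_va_tag *ap)
{
    int v;
    if (ap->gp_offset < GP_EXHAUSTED) {
        memcpy(&v, (char *)ap->reg_save_area + ap->gp_offset, sizeof v);
        ap->gp_offset += 8;                        /* one GP register copy */
    } else {
        memcpy(&v, ap->overflow_arg_area, sizeof v);
        ap->overflow_arg_area = (char *)ap->overflow_arg_area + 8;  /* 8-byte stack slot */
    }
    return v;
}

double toy_va_arg_double(abi_va_tag *ap)
{
    double v;
    if (ap->fp_offset < FP_EXHAUSTED) {
        memcpy(&v, (char *)ap->reg_save_area + ap->fp_offset, sizeof v);
        ap->fp_offset += 16;                       /* one XMM register copy */
    } else {
        memcpy(&v, ap->overflow_arg_area, sizeof v);
        ap->overflow_arg_area = (char *)ap->overflow_arg_area + 8;
    }
    return v;
}
```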

For structures, things get much more complicated: a structure of up to 16 bytes may have some parts passed in general purpose registers and others in floating-point registers, while larger structures (and structures that no longer fit in the remaining registers) are passed on the stack. A structure can contain any number of members of varying types, and this is what makes it difficult to return a structure with va_arg on this architecture. A C library macro has no good way to determine the types of a structure's members at run time, so we need to cheat a little bit for this macro.

The ABI even hints at this, stating that "The va_arg macro is usually implemented as a compiler builtin and expanded in simplified forms for each particular type." gcc provides this builtin as __builtin_va_arg, and we'll use it like so:

#define va_arg(ap, type)    __builtin_va_arg((ap), type)
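As an illustration of what the builtin buys us, the sketch below passes a structure whose first eightbyte is an integer and whose second is a double, so on x86_64 each one travels in one general purpose register and one XMM register until the registers run out. The names struct mixed and total_weight are hypothetical; the point is that va_arg(ap, struct mixed) just works, whether the halves sit in the register save area or on the stack.

```c
#include <assert.h>
#include <stdarg.h>

/* First eightbyte INTEGER class, second eightbyte SSE class. */
struct mixed { long id; double weight; };

double total_weight(int count, ...)
{
    va_list ap;
    double total = 0.0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        /* The compiler reassembles the struct from wherever its parts are. */
        struct mixed m = va_arg(ap, struct mixed);
        total += m.weight * (double)m.id;
    }
    va_end(ap);
    return total;
}
```

With seven such arguments, the last two no longer fit in the remaining general purpose registers and are passed on the stack, yet the loop body does not change.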

va_end

The last macro must ensure that a normal return can occur from a function that called va_start. On x86_64 this macro doesn't actually need to do anything, because va_start and va_arg only modify the va_list structure and not the stack or the registers themselves. However, I would choose to zero out each member of the va_list structure, just to ensure that va_start has to be called again before va_arg is used.
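That optional clean-up might look like the sketch below, written against a hypothetical look-alike structure rather than the compiler's real va_list. With both pointers cleared, a stray va_arg after va_end tends to fault on a null pointer instead of quietly reading stale data.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned int gp_offset;
    unsigned int fp_offset;
    void *overflow_arg_area;
    void *reg_save_area;
} abi_va_tag;

/* Wipe the state so that va_start must be called again before use. */
void toy_va_end(abi_va_tag *ap)
{
    assert(ap != NULL);
    ap->gp_offset = 0;
    ap->fp_offset = 0;
    ap->overflow_arg_area = NULL;
    ap->reg_save_area = NULL;
}
```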

Cheating

As I mentioned in the va_arg section, we need to cheat a little for one of the macros. Since we are cheating on that one, we also need to cheat on the rest so that everything works together properly. gcc provides builtin versions of the va_list type as well as all three macros, so our implementation looks like this:

#ifndef _STDARG_H
#define _STDARG_H

typedef __builtin_va_list va_list;

#define va_start(ap, parmN) __builtin_va_start((ap), (parmN))
#define va_arg(ap, type)    __builtin_va_arg((ap), type)
#define va_end(ap)          __builtin_va_end((ap))

#endif /* _STDARG_H */
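To check the builtins in isolation, the sketch below calls them directly, which is exactly what the header's macros expand to. It also uses __builtin_va_copy, the gcc builtin behind C99's va_copy; the header above targets the three C89 macros, so a C99 version would presumably add something like #define va_copy(dest, src) __builtin_va_copy((dest), (src)). The function sum_twice is hypothetical.

```c
#include <assert.h>

/* Walks the same variable arguments twice using two independent cursors. */
int sum_twice(int count, ...)
{
    __builtin_va_list ap, again;
    int total = 0;

    assert(count >= 0);
    __builtin_va_start(ap, count);
    __builtin_va_copy(again, ap);      /* second cursor at the same position */
    for (int i = 0; i < count; i++)
        total += __builtin_va_arg(ap, int);
    for (int i = 0; i < count; i++)
        total += __builtin_va_arg(again, int);
    __builtin_va_end(again);           /* every copy needs its own va_end */
    __builtin_va_end(ap);
    return total;
}
```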

Freestanding Environment

Now that we've completed the stdarg.h header, we have all the headers necessary for a freestanding environment. A freestanding environment is one in which a C program can run without an underlying operating system. A C89 freestanding implementation is required to provide only the architecture specific headers: float.h, limits.h, stdarg.h, and stddef.h. This means that the rest of the headers (and any backing .c files) can be written in an architecture agnostic way and shouldn't require any assembly.
