When writing C code, it's important to know the ranges of the variables you'll use. If you hit the top end of what a variable can store, an unsigned value "rolls over" to zero, while overflowing a signed value is technically undefined behavior (in practice it usually shows up as a large negative number). The limits.h header contains the ranges for each integer type so that you can use them when necessary.
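To make "roll-over" concrete, here's a small sketch of my own (assuming a typical x86_64 machine) that nudges both an unsigned and a signed char past their limits:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned char u = UCHAR_MAX;    /* 255 on this architecture */
    u = u + 1;                      /* conversion back to unsigned char is modular: 0 */
    printf("UCHAR_MAX + 1 wraps to %d\n", u);

    signed char s = SCHAR_MAX;      /* 127 */
    s = s + 1;                      /* conversion back to signed char is implementation-defined;
                                       on x86_64 it comes out as -128 */
    printf("SCHAR_MAX + 1 becomes %d here\n", s);
    return 0;
}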

About limits.h

This header is architecture specific, and we'll be writing it for an x86_64 machine. Although the values we define are specific to that architecture, the macros themselves are a big part of what makes C portable. As a C programmer you should know these values for the architecture you're targeting, but you should reach for the macros whenever possible so that your code works on any machine with a C compiler. For example, you may know that an unsigned char has a maximum value of 255 on your machine, but your code should say UCHAR_MAX instead of 255.

To know what values to put in limits.h, you'll need the Application Binary Interface (ABI) for your specific architecture. I searched Google for "x86_64 ABI" and found it here. "Figure 3.1: Scalar Types" on page 12 lists each of the C types along with its size, in bytes, on the x86_64 architecture, and that table is what dictates our implementation.
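Before moving on, here's what the "use the macro, not the literal" advice looks like in practice. This byte-frequency counter is my own hypothetical example (not from the ABI or a later part of this series); sizing the table with UCHAR_MAX + 1 instead of a hard-coded 256 keeps it correct on any conforming implementation:

#include <limits.h>
#include <stddef.h>

/* One counter per possible unsigned char value. UCHAR_MAX + 1 entries
 * rather than a hard-coded 256, so the table stays correct even where
 * unsigned char has a different range. */
static size_t counts[UCHAR_MAX + 1];

void count_bytes(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        counts[buf[i]]++;
}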

Byte size

The first macro in limits.h is CHAR_BIT, which establishes the number of bits used to represent a byte. This may seem kind of odd because a byte is always 8 bits! Almost always. Most modern architectures follow this norm, but some systems actually have a different number of bits per byte. This Stack Overflow question goes into more depth, but a few examples would be:

  • older 36-bit machines such as the PDP-10, which worked with 6-, 7-, or 9-bit bytes
  • some Texas Instruments DSPs, where the smallest addressable unit is 16 bits, so CHAR_BIT is 16
  • some Analog Devices SHARC DSPs, where CHAR_BIT is 32

The ABI states that "the term byte refers to an 8-bit object", which tells us that CHAR_BIT should be 8:

#define CHAR_BIT    (8)
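A common use for CHAR_BIT (this snippet is my own illustration, not part of the header) is converting the byte counts that sizeof reports into bit widths:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* sizeof yields a size in bytes; CHAR_BIT scales it to bits. */
    printf("int is %zu bits wide\n", sizeof(int) * CHAR_BIT);
    printf("long is %zu bits wide\n", sizeof(long) * CHAR_BIT);
    return 0;
}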

Integer limits

For the rest of the macros in limits.h we will use 2's complement arithmetic and Figure 3.1 from the ABI to determine our values.

The formulas for determining these numbers are as follows:

  • \(n =\) CHAR_BIT \(*\) number of bytes
  • signed minimum \(= -(2^{n-1})\)
  • signed maximum \(= 2^{n-1} - 1\)
  • unsigned minimum \(= 0\)
  • unsigned maximum \(= 2^{n} - 1\)

For example, the char type is 1 byte and gives us the following values (we'll double-check them in code just after the list):

  • \(n = 8\)
  • signed minimum \(= -(2^{8-1}) = {-128}\)
  • signed maximum \(= 2^{8-1} - 1 = 127\)
  • unsigned minimum \(= 0\)
  • unsigned maximum \(= 2^{8} - 1 = 255\)
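If you'd like to double-check those numbers, the formulas translate directly into shifts. This is a rough sketch of my own for the 1-byte char case only; the shift amounts are small enough that all of the intermediate math fits comfortably in an int:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    int n = CHAR_BIT * 1;              /* char is 1 byte, so n = 8 */
    int smin = -(1 << (n - 1));        /* -(2^(n-1)) = -128 */
    int smax = (1 << (n - 1)) - 1;     /*  2^(n-1) - 1 = 127 */
    int umax = (1 << n) - 1;           /*  2^n - 1 = 255 */

    printf("computed: %d %d %d\n", smin, smax, umax);
    printf("limits.h: %d %d %d\n", SCHAR_MIN, SCHAR_MAX, UCHAR_MAX);
    return 0;
}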

Notice anything odd about these values? The fact that \(|{-128}| \gt |127|\) presents a small issue: we've defined the maximum signed char as \(127\), yet we use an even larger value, in absolute terms, to define the minimum signed char.

If we look at the C standard to understand how integers are parsed during compilation, we'll see that the sign is not part of an integer constant. The standard states, "An integer constant begins with a digit, but has no period or exponent part. It may have a prefix that specifies its base and a suffix that specifies its type." So for the minimum values we compute, the unary minus operator, -, is parsed separately from the positive constant that follows it. For signed char this happens to be harmless (the constant \(128\) is still a perfectly good int), but for int the magnitude \(2147483648\) no longer fits in an int, so writing -2147483648 gives the expression a larger type than the one we're describing. To avoid this we substitute an equivalent expression whose pieces all fit the type, like (-2147483647 - 1), and we'll use the same (-X - 1) format for the signed minimum of every type, including (-127 - 1) for signed char, to keep things consistent.
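You can actually watch this happen with C11's _Generic. The program below is my own sketch (it assumes an LP64 x86_64 system, where int is 4 bytes and long is 8): the naive spelling of the minimum int quietly takes the type long, while the subtraction form stays an int:

#include <stdio.h>

/* Map a few integer types to printable names. */
#define TYPE_NAME(x) _Generic((x), \
    int: "int",                    \
    long: "long",                  \
    long long: "long long",        \
    default: "other")

int main(void)
{
    /* 2147483648 doesn't fit in an int, so the constant takes type long
       and the unary minus keeps that type. */
    printf("-2147483648       has type %s\n", TYPE_NAME(-2147483648));

    /* Both operands fit in an int, so the result is an int equal to INT_MIN. */
    printf("(-2147483647 - 1) has type %s\n", TYPE_NAME(-2147483647 - 1));
    return 0;
}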

Two final things to cover. The standard does not specify whether a plain char is signed or unsigned; however, the ABI specifies that a char is a signed byte, so we will implement it as such. Lastly, the macro MB_LEN_MAX is the maximum number of bytes in a multibyte character for any supported locale. Setting it to 4 allows support for UTF-8, which requires at most 4 bytes for any one character according to its specification.
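As a quick check on that number (my own example, assuming a C11 compiler with u8 string literals), a character outside the Basic Multilingual Plane encodes to exactly four bytes of UTF-8:

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* U+1F600 (a grinning-face emoji) needs the full 4 UTF-8 bytes. */
    const char *s = u8"\U0001F600";
    printf("UTF-8 length: %zu bytes\n", strlen(s));    /* prints 4 */
    return 0;
}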

Our full implementation looks like so:

#define CHAR_BIT    (8)

#define SCHAR_MIN   (-127 - 1)
#define SCHAR_MAX   (+127)
#define UCHAR_MAX   (+255)

#define CHAR_MIN    SCHAR_MIN
#define CHAR_MAX    SCHAR_MAX

#define MB_LEN_MAX  (4)

#define SHRT_MIN    (-32767 - 1)
#define SHRT_MAX    (+32767)
#define USHRT_MAX   (+65535)

#define INT_MIN     (-2147483647 - 1)
#define INT_MAX     (+2147483647)
#define UINT_MAX    (+4294967295U)

#define LONG_MIN    (-9223372036854775807L - 1L)
#define LONG_MAX    (+9223372036854775807L)
#define ULONG_MAX   (+18446744073709551615UL)

You may notice that the last few macros have a letter suffixed to the constant, like U or L (or both). These indicate an unsigned and a long constant, respectively. An unsuffixed integer constant simply takes the first type that can hold its value, so the suffix is how you make the intended type explicit.
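As a final sanity check, a few compile-time assertions can confirm that the values hang together. This is my own sketch; it assumes a C11 compiler and that the header above is the limits.h being picked up:

/* sanity.c - hypothetical self-test for the values defined above. */
#include <limits.h>

_Static_assert(CHAR_BIT == 8, "bytes are 8 bits on x86_64");
_Static_assert(SCHAR_MIN == -SCHAR_MAX - 1, "two's complement signed char");
_Static_assert(UCHAR_MAX == 2 * SCHAR_MAX + 1, "unsigned char covers 2^8 - 1");
_Static_assert(SHRT_MIN == -SHRT_MAX - 1, "two's complement short");
_Static_assert(UINT_MAX == 4294967295U, "int is 4 bytes per Figure 3.1");
_Static_assert(ULONG_MAX == 18446744073709551615UL, "long is 8 bytes per Figure 3.1");

int main(void) { return 0; }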
