limits.h
When writing C code, it's important to know the ranges for the variables that
you'll use. If you go past the top end of what an unsigned variable can store,
its value will "roll over" to zero. Overflowing a signed variable is undefined
behavior, though in practice it often rolls over to a large negative number. The
limits.h file contains the ranges for each type so that you can use them when
necessary.
About limits.h
This header is architecture specific and we'll be writing it for an x86_64
machine. Although the values we use are specific, the macros we use are a huge
part of the portability of the C language itself. As a C programmer, you should
be aware of these values for the architecture you're programming for but you
should use these macros as much as possible so that your code will work on any
machine with a C compiler. For example, you can be aware of the fact that an
unsigned char has a maximum value of 255 on your machine, but you should use
UCHAR_MAX in your code instead of 255.
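As a small illustration of preferring the macro over the literal, here is a minimal sketch of an addition that clamps at the top of the range instead of wrapping. The function name saturating_add_uchar is made up for this example and is not part of limits.h.

```c
#include <limits.h>

/* Hypothetical helper (not part of limits.h): add two unsigned char values,
 * clamping the result at UCHAR_MAX instead of letting it wrap around. */
unsigned char saturating_add_uchar(unsigned char a, unsigned char b)
{
    /* Compare against the macro rather than a hard-coded 255, so the check
     * stays correct on a machine where unsigned char is wider than 8 bits. */
    if (b > UCHAR_MAX - a) {
        return UCHAR_MAX;
    }
    return (unsigned char)(a + b);
}
```

Calling saturating_add_uchar(200, 100) returns UCHAR_MAX rather than a wrapped-around value, and nothing in the function needs to change if UCHAR_MAX is something other than 255.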
In order to know what values to use for limits.h, you'll need the Application
Binary Interface for your specific architecture. I searched Google for
"x86_64 ABI" and found it here. If you look at "Figure 3.1: Scalar Types"
on page 12 then you'll see each of the C types along with their sizes, in bytes,
on the x86_64 architecture. This table is what dictates our implementation.
Byte size
The first macro in limits.h is CHAR_BIT, which establishes the number of bits
used to represent a byte. This may seem kind of odd because a byte is always 8
bits! Almost always. Most modern architectures follow this norm, but some
systems actually have a different number of bits per byte. This Stack
Overflow question goes into some more depth, but a few examples would be:
- 16-bit bytes: TI C54x DSP
- 9-bit bytes: DEC PDP-10
- 6-bit bytes: CDC-6400
The ABI states that "the term byte refers to an 8-bit object", which tells us
that CHAR_BIT should be 8:
#define CHAR_BIT (8)
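One common use of CHAR_BIT is turning a size in bytes into a size in bits. The sketch below assumes a hypothetical BITS_IN macro (the name is ours, not a standard one) and reports the storage width of a type, which includes any padding bits an implementation might have.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical macro: storage width of a type in bits.
 * sizeof gives the size in bytes, CHAR_BIT gives the bits per byte. */
#define BITS_IN(type) (sizeof(type) * CHAR_BIT)

/* Example wrapper: the number of bits an int occupies on this machine. */
size_t bits_in_int(void)
{
    return BITS_IN(int);
}
```

With the x86_64 values used in this article, BITS_IN(char) is 8 and bits_in_int() is 32, but the code itself never hard-codes either number.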
Integer limits
For the rest of the macros in limits.h we will use 2's complement
arithmetic and Figure 3.1 from the ABI to determine our values.
The formulas for determining these numbers are as follows:
- \(n =\) CHAR_BIT \(*\) number of bytes
- signed minimum \(= -(2^{n-1})\)
- signed maximum \(= 2^{n-1} - 1\)
- unsigned minimum \(= 0\)
- unsigned maximum \(= 2^{n} - 1\)
For example, the char type is 1 byte and gives us the following values:
- \(n = 8\)
- signed minimum \(= -(2^{8-1}) = {-128}\)
- signed maximum \(= 2^{8-1} - 1 = 127\)
- unsigned minimum \(= 0\)
- unsigned maximum \(= 2^{8} - 1 = 255\)
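The formulas above can be checked with a short sketch. The function names here are hypothetical, and the code assumes 2's complement with no padding bits, as on x86_64. It derives everything from an all-ones value so that it also works when \(n\) equals the width of the widest type, where computing \(2^{n}\) directly would overflow.

```c
#include <limits.h>

/* Width in bits of the widest unsigned type, used for the arithmetic below. */
#define WIDE_BITS (sizeof(unsigned long long) * CHAR_BIT)

/* unsigned maximum = 2^n - 1.
 * Valid for 1 <= n <= WIDE_BITS; n == 0 would shift by the full width. */
unsigned long long unsigned_max_for_bits(unsigned n)
{
    /* Start from all ones and drop the bits above position n. */
    return ~0ULL >> (WIDE_BITS - n);
}

/* signed maximum = 2^(n-1) - 1, which is the unsigned maximum halved. */
long long signed_max_for_bits(unsigned n)
{
    return (long long)(unsigned_max_for_bits(n) >> 1);
}

/* signed minimum = -(2^(n-1)), written as -max - 1 so that no intermediate
 * value exceeds the signed maximum. */
long long signed_min_for_bits(unsigned n)
{
    return -signed_max_for_bits(n) - 1;
}
```

For \(n = 8\) these return 255, 127, and -128, matching the char values worked out by hand.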
Notice anything odd here? The fact that \(|{-128}| \gt |127|\) presents a small
issue. We've defined that the maximum signed char is \(127\), yet we use an even
larger (absolute) value to define the minimum signed char value.
If we look at the C standard to understand how integers are parsed during
compilation, we'll see that the sign is not a part of an integer constant. The
standard states, "An integer constant begins with a digit, but has no period or
exponent part. It may have a prefix that specifies its base and a suffix which
specifies its type." So for the minimum signed char value we computed, the
unary minus operator, -, is parsed separately from the integer 128, and 128 on
its own is larger than the maximum signed char value. For char this is harmless
in practice because the constant 128 has type int, but the same pattern causes
a real problem for wider types: 2147483648 does not fit in an int, so the
compiler gives the constant a wider type and -2147483648 ends up as a long
rather than an int. To avoid this we can substitute in an expression equal to
\({-128}\), like (-127 - 1). We'll use a similar format for the signed minimum
on other types.
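To see why the form of the expression matters, here is a minimal sketch that compares the two spellings of the minimum int value. It assumes a 32-bit int and a wider long, as in the x86_64 ABI, and the function names are made up for the example.

```c
#include <stddef.h>

/* The constant 2147483648 does not fit in a 32-bit int, so the compiler picks
 * a wider type for it before the unary minus is applied. The result is a
 * wider-than-int expression, which is not what we want for INT_MIN. */
size_t size_of_naive_int_min(void)
{
    return sizeof(-2147483648);
}

/* Both 2147483647 and 1 fit in an int, so the whole expression stays an int. */
size_t size_of_safe_int_min(void)
{
    return sizeof(-2147483647 - 1);
}
```

Under those assumptions size_of_naive_int_min() reports the size of a long while size_of_safe_int_min() reports the size of an int, even though both expressions have the same numeric value.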
Two final things to cover. The standard does not specify whether or not a char
is signed or unsigned. However, the ABI specifies that a char is a signed
byte so we will implement it as such. Lastly, the macro MB_LEN_MAX is for the
number of bytes in a multibyte character for any supported locale. By setting
this to 4 we allow support for UTF-8 which requires a maximum of 4 bytes
for any one character according to its specification.
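Both points can be poked at from code. The sketch below uses two hypothetical helpers: one reports whether plain char is signed by looking at CHAR_MIN, and the other returns how many bytes UTF-8 needs for a given code point, which never exceeds 4.

```c
#include <limits.h>

/* Hypothetical helper: nonzero when plain char is signed on this machine.
 * CHAR_MIN is 0 for an unsigned char and negative for a signed one. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: number of bytes UTF-8 uses to encode a code point,
 * or 0 if the value is not an encodable code point. */
int utf8_encoded_length(unsigned long code_point)
{
    /* Surrogates are reserved for UTF-16 and are not valid in UTF-8. */
    if (code_point >= 0xD800UL && code_point <= 0xDFFFUL) return 0;
    if (code_point <= 0x7FUL)     return 1;  /* ASCII range */
    if (code_point <= 0x7FFUL)    return 2;
    if (code_point <= 0xFFFFUL)   return 3;
    if (code_point <= 0x10FFFFUL) return 4;  /* last valid code point */
    return 0;                                /* beyond the Unicode range */
}
```

With the limits.h from this article plain_char_is_signed() is true, and utf8_encoded_length(0x10FFFFUL) returns 4, which is the case that MB_LEN_MAX has to cover.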
Our full implementation looks like so:
#define CHAR_BIT (8)
#define SCHAR_MIN (-127 - 1)
#define SCHAR_MAX (+127)
#define UCHAR_MAX (+255)
#define CHAR_MIN SCHAR_MIN
#define CHAR_MAX SCHAR_MAX
#define MB_LEN_MAX (4)
#define SHRT_MIN (-32767 - 1)
#define SHRT_MAX (+32767)
#define USHRT_MAX (+65535)
#define INT_MIN (-2147483647 - 1)
#define INT_MAX (+2147483647)
#define UINT_MAX (+4294967295U)
#define LONG_MIN (-9223372036854775807L - 1L)
#define LONG_MAX (+9223372036854775807L)
#define ULONG_MAX (+18446744073709551615UL)
You may notice on the last few macros that the constant expressions have a
letter suffixed to them, like U or L (or both). These are used to indicate
an unsigned value and a long value, respectively, and help communicate your
desired type for integer constants since they don't have type specifiers.
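A quick way to observe what the suffixes do is to ask the compiler about the resulting constants. This is only a sketch with made-up function names: it uses sizeof to tell int from long, which works when the two have different sizes (as on x86_64), and relies on unsigned wrap-around to detect the U suffix.

```c
#include <stddef.h>

/* An unsuffixed constant that fits in an int has type int. */
size_t size_of_plain_constant(void)
{
    return sizeof(1);
}

/* The L suffix asks for (at least) a long. */
size_t size_of_long_constant(void)
{
    return sizeof(1L);
}

/* With the U suffix the subtraction is done in unsigned arithmetic, so 0U - 1
 * wraps around to the largest unsigned int value instead of becoming -1. */
int u_suffix_is_unsigned(void)
{
    return (0U - 1) > 0;
}

/* Without a suffix the same subtraction is signed and yields -1. */
int plain_constant_is_signed(void)
{
    return (0 - 1) < 0;
}
```

On x86_64 the first two functions return 4 and 8, and the last two both return 1.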