This chapter describes the dialect of the C programming language that is implemented by the CompCert C compiler and reference interpreter. It follows very closely the ISO C99 standard [5]. A few features of C99 are not supported at all; some other are supported only if the appropriate language support options are selected on the command line. On the other hand, some extensions to C99 are supported, borrowed from the ISO C2011 standard [6].
In this chapter, we describe both the restrictions and the extensions of CompCert C with respect to the C99 standard. We also document how CompCert implements the behaviors specified as implementation-dependent in C99. The description follows the structure of the C99 standard document [5]. In particular, section numbers (e.g. “§5.1.2.2”) correspond to those of the C99 standard document.
Target architecture | Size of pointers | Integer model |
AArch64, x86 64 bits, RISC-V 64 bits | 64 bits (8 bytes) | I32LP64 |
ARM, PowerPC, x86 32 bits, RISC-V 32 bits | 32 bits (4 bytes) | ILP32LL |
Type | Size | Range of values |
unsigned char | 1 byte | 0 to 255 |
signed char | 1 byte | −128 to 127 |
char | 1 byte | like signed char on x86 |
like unsigned char on PowerPC, ARM, and RISC-V | ||
unsigned short | 2 bytes | 0 to 65535 |
signed short | 2 bytes | −32768 to 32767 |
short | 2 bytes | −32768 to 32767 |
unsigned int | 4 bytes | 0 to 232−1 |
signed int | 4 bytes | −231 to 231−1 |
int | 4 bytes | −231 to 231−1 |
unsigned long | 4 bytes | 0 to 232−1 in the ILP32LL model |
8 bytes | 0 to 264−1 in the I32LP64 model | |
signed long | 4 bytes | −231 to 231−1 in the ILP32LL model |
8 bytes | −263 to 263−1 in the I32LP64 model | |
long | 4 bytes | −231 to 231−1 in the ILP32LL model |
8 bytes | −263 to 263−1 in the I32LP64 model | |
unsigned long long | 8 bytes | 0 to 264−1 |
signed long long | 8 bytes | −263 to 263−1 |
long long | 8 bytes | −263 to 263−1 |
_Bool | 1 byte | 0 or 1 |
Floating-point types follow the IEEE 754-2008 standard [12]:
Type | Representation | Size | Mantissa | Exponent |
float | IEEE 754 single precision | 4 bytes | 23 bits | −126 to 127 |
(binary32) | ||||
double | IEEE 754 double precision | 8 bytes | 52 bits | −1022 to 1023 |
(binary64) | ||||
long double | not supported by default; with -flongdouble option, like double |
During evaluation of floating-point operations, the floating-point format used is that implied by the type, without excess precision and range. This corresponds to a FLT_EVAL_METHOD equal to 0.
Signed integers use two’s-complement representation.
Pointer values can be converted to any pointer type. A pointer value can also be converted to any integer type of the same size and back to the original pointer type: the result is identical to the original pointer value. The type intptr_t and uintptr_t from <stdint.h> are integer types suitable for this purpose, as they are guaranteed to have the same size as all pointer types.
Conversions from double to float rounds the floating-point number to the nearest floating-point number representable in single precision. Conversions from integer types to floating-point types round to the nearest representable floating-point number.
_Alignas _Alignof __attribute__ __attribute __const __const__ __inline __inline__ __restrict __restrict__ __packed__ asm __asm __asm__ _Noreturn _Generic _Static_assert _Float16
The supported range of characters is the Unicode range from 0 to 0x10ffff, but excluding the range of surrogate characters from 0xd800 to 0xdfff.
Literal | Syntax | Type |
Regular character constant | ’...’ | int |
L wide character constant | L’...’ | wchar_t |
u wide character constant | u’...’ | char16_t (unsigned short) |
U wide character constant | U’...’ | char32_t (unsigned int) |
Multibyte character constants such as ’abcd’ are supported for regular character constants only. They support up to four characters/escapes in the range of 0 to 255 which are interpreted as digits of a big-endian number in base 256. For example, ’abcd’ is 0x61626364.
Literal | Syntax | Type | String encoding |
Regular string | "..." | char * | UTF-8 |
UTF-8 string | u8"..." | char * | UTF-8 |
L wide string | L"..." | wchar_t * | UCS-4 if wchar_t is 32-bit wide, |
UTF-16 if wchar_t is 16-bit wide | |||
u wide string | u"..." | char16_t * | UTF-16 |
U wide string | U"..." | char32_t * | UCS-4 |
CompCert C supports all expression operators specified in C99, with the restrictions and extensions described below.
Overflow in arithmetic over signed integer types is defined as taking the mathematically-exact result and reducing it modulo 232 or 264 to the range of representable signed integers. Bitwise operators (&, |, ^, ~, <<, >>) over signed integer types are interpreted following two’s-complement representation.
Floating-point operations round their results to the nearest representable floating point number, breaking ties by rounding to an even mantissa. If the program changes the rounding mode at run-time, it must be compiled with flag -ffloat-const-prop 0 (section 3.2.3). Otherwise, the compiler will perform compile-time evaluations of floating-point operations assuming round-to-nearest mode.
Floating-point intermediate results are computed in single precision if they are of type float (i.e. all arguments have integer or float types) and in double precision if they are of type double (i.e. one argument has type double). This corresponds to FLT_EVAL_METHOD equal to 0.
An integer or floating-point value stored in (part of) an object can be accessed by any lvalue having integer or floating-point type. The effect of such an access is defined taking into account the bit-level representation of the types (two’s complement for integers, IEEE 754 for floats) and the endianness of the target platform (big-endian for PowerPC, little-endian for x86 and RISC-V, and configuration dependent endianness for ARM). In contrast, a pointer value stored in an object can only be accessed by an lvalue having pointer type or integer type with the same size as a pointer type, such as intptr_t and uintptr_t. In other words, while the bit-level, in-memory representation of integers and floats is fully exposed by the CompCert C semantics, the in-memory representation of pointers is kept opaque and cannot be examined at any granularity other than a pointer-sized word.
union u { double d; unsigned int i[2]; unsigned char c[8]; };supports accessing any member after any other member has been stored. On the other hand, consider
union u { char * ptr; unsigned char c[4]; };If a pointer value is stored in the ptr member, accesses to the elements of the c member are not defined.
The type size_t of the result of sizeof and _Alignof is taken to be unsigned long.
Bit fields can be of enumeration type, e.g. enum e x: 2. Such bit fields are treated as unsigned if this enables all values of the enumeration to be represented exactly in the given number of bits, and as signed otherwise.
The members of a structure are laid out consecutively in declaration order, with enough bytes of padding inserted to guarantee that every non-bit-field member is aligned on its natural alignment, and every bit-field member is contained in a naturally-aligned storage unit. The natural alignment of a non-bit-field member can be modified by the _Alignas qualifier. Different layouts can be obtained if the -fpacked-structs option is set (section 3.2.9) and the packed attribute (section 6.2) is used.
Anonymous structures and anonymous unions are supported as in C2011. (See the C2011 standard, §6.7.2.1 paragraph 13 for a description.)
Accesses to objects of a volatile-qualified scalar type are treated as described in paragraph 6 of section 6.7.3: every assignment and dereferencing is treated as an observable event by the CompCert C formal semantics, and therefore is not subject to optimization by the CompCert compiler. Accesses to objects of a volatile-qualified composite type (struct or union type) are treated as regular, non-volatile accesses: no observable event is produced, and the access can be optimized away. The CompCert compiler emits a warning in the latter case.
Following ISO C2011, CompCert supports the _Alignas qualifier on variable or member declarations. This qualifier comes in two forms: _Alignas(N), where N is a compile-time constant integer expression that evaluates to a power of two; and _Alignas(T), where T is a type. The latter form is equivalent to _Alignas(_Alignof(T)).
The effect of the _Alignas(N) qualifier is to change the alignment of the type being qualified, setting the alignment to N. In particular, this affects the layout of struct fields. For instance:
struct s { char c; int _Alignas(8) i; };
The Alignas(8) qualifier changes the alignment of field i from 4 (the natural alignment of type int) to 8. This causes 7 bytes of padding to be inserted between c and i, instead of the normal 3 bytes. This also increases the size of struct s from 8 to 12, and the alignment of struct s from 4 to 8.
The alignment N given in the _Alignas(N) qualifier should normally be greater than or equal to the natural alignment of the modified type. For target platforms that support unaligned memory accesses (x86, PowerPC and RISC-V, but not ARM), N can also be smaller than the natural alignment.
Finally, CompCert C provides limited support for GCC-style attributes (__attribute keyword) used in type qualifier position. See section 6.2.
The asm statement (a popular extension for inline assembly) is not supported by default, but is supported if option -finline-asm is set. See section 6.6 for a complete description of the syntax and semantics of asm statements.
switch (expression) { case expr1: ... case expr2: ... ... default: ... }
In other words, the case and default labels that pertain to a switch must occur at the top-level of the block following the switch. The use of break is not mandatory, i.e. fall-though cases are allowed. Also, the order of the case and default statements is not restricted.
As an extension, CompCert also has full support for “unstructured” switch statements – e.g. the infamous Duff’s device – via an optional source-to-source transformation in the frontend, which must be activated via the command line switch -funstructured-switch.
Functions with a variable number of arguments, as denoted by an ellipsis ... in the prototype, are supported.
The result type of the function must not be a structure or union type, unless the -fstruct-passing option is active (section 3.2.9).
If GCC is used as hosting compiler, the default configuration sets the language mode to C99 via option -std=c99. For Diab as hosting compiler, the default language mode is used.
The #pragma directive and the _Pragma operator are always ignored when they occur within a function definition, as part of a compound statement. They can only be honored if they occur outside external declarations, at the top level of a compilation unit.
The pragmas supported by CompCert C are documented in section 6.1. The standard pragmas STDC FP_CONTRACT, STDC FENV_ACCESS, and STDC CX_LIMITED_RANGE are not supported.
When active, the warning unknown-pragmas causes the compiler to report every use of unsupported pragmas. (See section 3.2.10.)
CompCert defines the following macro names according §6.10.8.3 of the C11 standard to indicate supported languages features:
Furthermore, a definition of __STDC_IEC_559_COMPLEX__ by the hosting toolchain is removed since CompCert does not support Complex numbers.
The CompCert C compiler does not provide its own implementation of the C standard library. It provides a few standard headers and relies on the standard library of the target system for the others. CompCert has been successfully used in conjunction with the GNU glibc standard library. Note, however, the following points: