The CompCert C language

Chapter 5 The CompCert C language

This chapter describes the dialect of the C programming language that is implemented by the CompCert C compiler and reference interpreter. It follows very closely the ISO C99 standard [5]. A few features of C99 are not supported at all; some other are supported only if the appropriate language support options are selected on the command line. On the other hand, some extensions to C99 are supported, borrowed from the ISO C2011 standard [6].

In this chapter, we describe both the restrictions and the extensions of CompCert C with respect to the C99 standard. We also document how CompCert implements the behaviors specified as implementation-dependent in C99. The description follows the structure of the C99 standard document [5]. In particular, section numbers (e.g. “§5.1.2.2”) correspond to those of the C99 standard document.

§5 Environment

§5.1.2.2 Hosted environment.

CompCert C follows the hosted environment model. The function called at program startup is named main. According to the formal semantics, it must be defined without parameters: int main(void) { ... }. The CompCert C compiler also supports the two-parameter form int main(int argc, char *argv[]).

§5.2.1.1 Character sets.

CompCert defines its source character set and its execution character set as UTF-8 encoded Unicode.

§5.2.4.2 Numerical limits.

Depending on the target architecture, integer types follow one of two possible models, the “ILP32LL” model or the “I32LP64” model:

Target architecture	Size of pointers	Integer model
AArch64, x86 64 bits, RISC-V 64 bits	64 bits (8 bytes)	I32LP64
ARM, PowerPC, x86 32 bits, RISC-V 32 bits	32 bits (4 bytes)	ILP32LL

The numerical limits for integers are:

Type	Size	Range of values
unsigned char	1 byte	0 to 255
signed char	1 byte	−128 to 127
char	1 byte	like signed char on x86
		like unsigned char on PowerPC, ARM, and RISC-V
unsigned short	2 bytes	0 to 65535
signed short	2 bytes	−32768 to 32767
short	2 bytes	−32768 to 32767
unsigned int	4 bytes	0 to 2³²−1
signed int	4 bytes	−2³¹ to 2³¹−1
int	4 bytes	−2³¹ to 2³¹−1
unsigned long	4 bytes	0 to 2³²−1 in the ILP32LL model
	8 bytes	0 to 2⁶⁴−1 in the I32LP64 model
signed long	4 bytes	−2³¹ to 2³¹−1 in the ILP32LL model
	8 bytes	−2⁶³ to 2⁶³−1 in the I32LP64 model
long	4 bytes	−2³¹ to 2³¹−1 in the ILP32LL model
	8 bytes	−2⁶³ to 2⁶³−1 in the I32LP64 model
unsigned long long	8 bytes	0 to 2⁶⁴−1
signed long long	8 bytes	−2⁶³ to 2⁶³−1
long long	8 bytes	−2⁶³ to 2⁶³−1
_Bool	1 byte	0 or 1

Floating-point types follow the IEEE 754-2008 standard [12]:

Type	Representation	Size	Mantissa	Exponent
float	IEEE 754 single precision	4 bytes	23 bits	−126 to 127
	(binary32)
double	IEEE 754 double precision	8 bytes	52 bits	−1022 to 1023
	(binary64)
long double	not supported by default; with -flongdouble option, like double

During evaluation of floating-point operations, the floating-point format used is that implied by the type, without excess precision and range. This corresponds to a FLT_EVAL_METHOD equal to 0.

§6 Language

§6.2 Concepts

§6.2.5 Types

CompCert C supports all the types specified in C99, with the following exceptions:

The long double type is not supported by default. If the -flongdouble option is set, it is treated as a synonym for double.
Complex types (double _Complex, etc) are not supported.
The result type and argument types of a function type must not be a structure or union type, unless the -fstruct-passing option is active (section 3.2.9).
Variable-length arrays are not supported. The size N of an array declarator T[N] must always be a compile-time constant expression.
The _Float16 type (specified in C23) is not supported, but recognized by the CompCert frontend for improved compatibility with header files of the hosting compiler. Any occurrence of _Float16 in later translation phases is rejected as an error by CompCert.

§6.2.6 Representation of types

Signed integers use two’s-complement representation.

§6.3 Conversions

Conversion of an integer value to a signed integer type is always defined as reducing the integer value modulo 2^N to the range of values representable by the N-bit signed integer type.

Pointer values can be converted to any pointer type. A pointer value can also be converted to any integer type of the same size and back to the original pointer type: the result is identical to the original pointer value. The type intptr_t and uintptr_t from <stdint.h> are integer types suitable for this purpose, as they are guaranteed to have the same size as all pointer types.

Conversions from double to float rounds the floating-point number to the nearest floating-point number representable in single precision. Conversions from integer types to floating-point types round to the nearest representable floating-point number.

§6.4 Lexical elements

§6.4.1 Keywords.

The following tokens are reserved as additional keywords:

    _Alignas      _Alignof       __attribute__    __attribute
    __const       __const__      __inline         __inline__
    __restrict    __restrict__   __packed__
    asm           __asm          __asm__
    _Noreturn     _Generic       _Static_assert   _Float16

§6.4.2 Identifiers

All characters of an identifier are significant, whether it has external linkage or not. Case is significant even in identifiers with external linkage. The “$” (dollar) character is accepted in identifiers.

§6.4.3 Universal character names

Universal character names are supported in character constants and string literals. They are not supported within identifiers.

The supported range of characters is the Unicode range from 0 to 0x10ffff, but excluding the range of surrogate characters from 0xd800 to 0xdfff.

§6.4.4 Constants

CompCert supports C2011 Unicode character constants and string literals.

Literal	Syntax	Type
Regular character constant	’...’	int
L wide character constant	L’...’	wchar_t
u wide character constant	u’...’	char16_t (unsigned short)
U wide character constant	U’...’	char32_t (unsigned int)

Multibyte character constants such as ’abcd’ are supported for regular character constants only. They support up to four characters/escapes in the range of 0 to 255 which are interpreted as digits of a big-endian number in base 256. For example, ’abcd’ is 0x61626364.

Literal	Syntax	Type	String encoding
Regular string	"..."	char *	UTF-8
UTF-8 string	u8"..."	char *	UTF-8
L wide string	L"..."	wchar_t *	UCS-4 if wchar_t is 32-bit wide,
			UTF-16 if wchar_t is 16-bit wide
u wide string	u"..."	char16_t *	UTF-16
U wide string	U"..."	char32_t *	UCS-4

§6.5 Expressions

CompCert C supports all expression operators specified in C99, with the restrictions and extensions described below.

Overflow in arithmetic over signed integer types is defined as taking the mathematically-exact result and reducing it modulo 2³² or 2⁶⁴ to the range of representable signed integers. Bitwise operators (&, |, ^, ~, <<, >>) over signed integer types are interpreted following two’s-complement representation.

Floating-point operations round their results to the nearest representable floating point number, breaking ties by rounding to an even mantissa. If the program changes the rounding mode at run-time, it must be compiled with flag -ffloat-const-prop 0 (section 3.2.3). Otherwise, the compiler will perform compile-time evaluations of floating-point operations assuming round-to-nearest mode.

Floating-point intermediate results are computed in single precision if they are of type float (i.e. all arguments have integer or float types) and in double precision if they are of type double (i.e. one argument has type double). This corresponds to FLT_EVAL_METHOD equal to 0.

An integer or floating-point value stored in (part of) an object can be accessed by any lvalue having integer or floating-point type. The effect of such an access is defined taking into account the bit-level representation of the types (two’s complement for integers, IEEE 754 for floats) and the endianness of the target platform (big-endian for PowerPC, little-endian for x86 and RISC-V, and configuration dependent endianness for ARM). In contrast, a pointer value stored in an object can only be accessed by an lvalue having pointer type or integer type with the same size as a pointer type, such as intptr_t and uintptr_t. In other words, while the bit-level, in-memory representation of integers and floats is fully exposed by the CompCert C semantics, the in-memory representation of pointers is kept opaque and cannot be examined at any granularity other than a pointer-sized word.

§6.5.2 Postfix operators

If a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is as described in the last paragraph above: the operation is well defined as long as it does not entail accessing a stored pointer value with a type other than a pointer type or an integer type with the same size as a pointer type. For example, the declaration

    union u { double d; unsigned int i[2]; unsigned char c[8]; };

supports accessing any member after any other member has been stored. On the other hand, consider

    union u { char * ptr; unsigned char c[4]; };

If a pointer value is stored in the ptr member, accesses to the elements of the c member are not defined.

§6.5.3 Unary operators

Symmetrically with the sizeof operator, CompCert C supports the _Alignof operator from C2011, which can also be written __alignof__ as in GNU C. This operator applies to a type but not to an expression. It returns the natural alignment, in bytes, of this type.

The type size_t of the result of sizeof and _Alignof is taken to be unsigned long.

§6.5.4 Casts

See the comments on point §6.3 (“Conversions”) above concerning pointer casts and casts to signed integer types.

§6.5.5 Multiplicative operators

Division and remainder are undefined if the second argument is zero. Signed division and remainder are also undefined if the first argument is the smallest representable signed integer (−2³¹ for type int) and the second argument is −1 (the only case where division overflows).

§6.5.6 Additive operators

Adding a pointer and an integer, or subtracting an integer from a pointer, are always defined even if the resulting pointer points outside the bounds of the underlying object. The byte offset with respect to the base of the underlying object is treated modulo 2³² or 2⁶⁴ (depending on the bitsize of pointers on the target processor). Such out-of-bounds pointers cause undefined behavior when dereferenced or compared with other pointers.

§6.5.7 Bitwise shift operators

The right shift operator >> applied to a negative signed integer is specified as shifting in “1” bits from the left.

§6.5.8 Relational operators

Comparison between two non-null pointers is always defined if both pointers fall within the bounds of the underlying objects. Additionally, comparison is also defined if one of the pointers is “one past the end of an object” and the other pointer is identical to the first pointer or falls within the bounds of the same object. Comparison between a pointer “one past” the end of an object and a pointer within a different object is undefined behavior.

§6.5.9 Equality operators

Same remark as in §6.5.8 concerning pointer comparisons.

§6.6 Constant expressions

No differences with C99.

§6.7 Declarations

CompCert C supports all declarations specified in C99, with the restrictions and extensions described below.

§6.7.2 Type specifiers

Complex types (the _Complex specifier) are not supported.

§6.7.2.1 Structure and union specifiers

Bit fields of “plain” type int are treated as signed. In accordance with the ELF Application Binary Interfaces, bit fields within an integer are allocated most significant bits first on the PowerPC and big-endian ARM platforms, and least significant bits first on the AArch64, ARM, RISC-V, and x86 platforms. A bit field never straddles an integer boundary and is always contained within a storage unit of size and alignment determined by its integer type.

Bit fields can be of enumeration type, e.g. enum e x: 2. Such bit fields are treated as unsigned if this enables all values of the enumeration to be represented exactly in the given number of bits, and as signed otherwise.

The members of a structure are laid out consecutively in declaration order, with enough bytes of padding inserted to guarantee that every non-bit-field member is aligned on its natural alignment, and every bit-field member is contained in a naturally-aligned storage unit. The natural alignment of a non-bit-field member can be modified by the _Alignas qualifier. Different layouts can be obtained if the -fpacked-structs option is set (section 3.2.9) and the packed attribute (section 6.2) is used.

Anonymous structures and anonymous unions are supported as in C2011. (See the C2011 standard, §6.7.2.1 paragraph 13 for a description.)

§6.7.2.2 Enums

The values of an enumeration type have type int.

§6.7.3 Type qualifiers

The const and volatile qualifiers are honored, with the restriction below on volatile composite types. The restrict qualifier is accepted but ignored.

Accesses to objects of a volatile-qualified scalar type are treated as described in paragraph 6 of section 6.7.3: every assignment and dereferencing is treated as an observable event by the CompCert C formal semantics, and therefore is not subject to optimization by the CompCert compiler. Accesses to objects of a volatile-qualified composite type (struct or union type) are treated as regular, non-volatile accesses: no observable event is produced, and the access can be optimized away. The CompCert compiler emits a warning in the latter case.

Following ISO C2011, CompCert supports the _Alignas qualifier on variable or member declarations. This qualifier comes in two forms: _Alignas(N), where N is a compile-time constant integer expression that evaluates to a power of two; and _Alignas(T), where T is a type. The latter form is equivalent to _Alignas(_Alignof(T)).

The effect of the _Alignas(N) qualifier is to change the alignment of the type being qualified, setting the alignment to N. In particular, this affects the layout of struct fields. For instance:

    struct s { char c; int _Alignas(8) i; };

The Alignas(8) qualifier changes the alignment of field i from 4 (the natural alignment of type int) to 8. This causes 7 bytes of padding to be inserted between c and i, instead of the normal 3 bytes. This also increases the size of struct s from 8 to 12, and the alignment of struct s from 4 to 8.

The alignment N given in the _Alignas(N) qualifier should normally be greater than or equal to the natural alignment of the modified type. For target platforms that support unaligned memory accesses (x86, PowerPC and RISC-V, but not ARM), N can also be smaller than the natural alignment.

Finally, CompCert C provides limited support for GCC-style attributes (__attribute keyword) used in type qualifier position. See section 6.2.

§6.7.4 Function specifiers

Two function specifiers are supported: inline from ISO C99 and _Noreturn from ISO C2011.

§6.7.5.2 Array declarators

Variable-length arrays are not supported. The only supported array declarators are those of ISO C90, namely [] for an incomplete array type, and [N] where N is a compile-time constant expression for a complete array type.

§6.7.5.3 Function declarators

The result type of a function must not be a structure or union type, unless the -fstruct-passing option is active (section 3.2.9).

§6.7.8 Initialization

Both traditional (ISO C90) and designated initializers are supported, conforming with ISO C99.

Static Assertions

CompCert supports static assertions via the _Static_assert keyword as described in the C2011 standard, §6.7.10.

Generic selection

CompCert supports the C11 _Generic operator for generic selection as described in the C2011 standard, §6.5.1.1.

§6.8 Statements and blocks

All forms of statements specified in C99 are supported in CompCert C, with the exception described below.

The asm statement (a popular extension for inline assembly) is not supported by default, but is supported if option -finline-asm is set. See section 6.6 for a complete description of the syntax and semantics of asm statements.

§6.8.4.2 The switch statement

By default, the switch statement in CompCert C is restricted to a “structured” form as present in Java and mandated by Misra-C. Namely, the switch statement must have the following form:

    switch (expression) {
        case expr1: ...
        case expr2: ...
        ...
        default: ...
    }

In other words, the case and default labels that pertain to a switch must occur at the top-level of the block following the switch. The use of break is not mandatory, i.e. fall-though cases are allowed. Also, the order of the case and default statements is not restricted.

As an extension, CompCert also has full support for “unstructured” switch statements – e.g. the infamous Duff’s device – via an optional source-to-source transformation in the frontend, which must be activated via the command line switch -funstructured-switch.

§6.9 External definitions

Function definitions should be written in modern, prototype form. The compiler accepts traditional, non-prototype function definitions but converts them to prototype form. In particular, T f() {...} is automatically converted to T f(void) {...}.

Functions with a variable number of arguments, as denoted by an ellipsis ... in the prototype, are supported.

The result type of the function must not be a structure or union type, unless the -fstruct-passing option is active (section 3.2.9).

§6.10 Preprocessing directives

The CompCert C compiler does not perform preprocessing itself, but delegates this task to an external C preprocessor, such as that of GCC. The external preprocessor is assumed to conform to the C99 standard.

If GCC is used as hosting compiler, the default configuration sets the language mode to C99 via option -std=c99. For Diab as hosting compiler, the default language mode is used.

§6.10.6 Pragma directive / §6.10.9 Pragma operator

The #pragma directive and the _Pragma operator are always ignored when they occur within a function definition, as part of a compound statement. They can only be honored if they occur outside external declarations, at the top level of a compilation unit.

The pragmas supported by CompCert C are documented in section 6.1. The standard pragmas STDC FP_CONTRACT, STDC FENV_ACCESS, and STDC CX_LIMITED_RANGE are not supported.

When active, the warning unknown-pragmas causes the compiler to report every use of unsupported pragmas. (See section 3.2.10.)

§6.10.8 Predefined macro names

CompCert defines the following macro names according §6.10.8.3 of the C11 standard to indicate supported languages features:

__STDC_NO_ATOMICS__: no support for optional standard atomics,
__STDC_NO_COMPLEX__: no support for optional standard complex numbers,
__STDC_NO_THREADS__: no support for optional standard threads, and
__STDC_NO_VLA__: no support for standard variable length arrays.

Furthermore, a definition of __STDC_IEC_559_COMPLEX__ by the hosting toolchain is removed since CompCert does not support Complex numbers.

§7 Library

The CompCert C compiler does not provide its own implementation of the C standard library. It provides a few standard headers and relies on the standard library of the target system for the others. CompCert has been successfully used in conjunction with the GNU glibc standard library. Note, however, the following points:

§7.6 Floating-point environment <fenv.h>

The CompCert formal semantics and optimization passes assume round-to-nearest behavior in floating-point arithmetic. If the program changes rounding modes during execution using the fesetround function, it must be compiled with option -ffloat-const-prop 0 to turn off certain floating-point optimizations.

§7.12 Mathematics <math.h>

Many versions of <math.h> include long double versions of the math functions. These functions cannot be called by CompCert-compiled code by lack of ABI-conformant support for the long double type.

§7.13 Non-local jumps <setjmp.h>

The CompCert C compiler has no special knowledge of the setjmp and longjmp functions, treating them as ordinary functions that respect control flow. It is therefore not advised to use these two functions in CompCert-compiled code. To prevent misoptimisation, it is crucial that all local variables that are live across an invocation of setjmp be declared with volatile modifier.