Previous Up Next

Chapter 5 The CompCert C language

This chapter describes the dialect of the C programming language that is implemented by the CompCert C compiler and reference interpreter. It follows very closely the ISO C99 standard [5]. A few features of C99 are not supported at all; some other are supported only if the appropriate language support options are selected on the command line. On the other hand, some extensions to C99 are supported, borrowed from the ISO C2011 standard [6].

In this chapter, we describe both the restrictions and the extensions of CompCert C with respect to the C99 standard. We also document how CompCert implements the behaviors specified as implementation-dependent in C99. The description follows the structure of the C99 standard document [5]. In particular, section numbers (e.g. “§5.1.2.2”) correspond to those of the C99 standard document.

§5 Environment

§5.1.2.2 Hosted environment.
CompCert C follows the hosted environment model. The function called at program startup is named main. According to the formal semantics, it must be defined without parameters: int main(void) { ... }. The CompCert C compiler also supports the two-parameter form int main(int argc, char *argv[]).
§5.2.1.1 Character sets.
CompCert defines its source character set and its execution character set as UTF-8 encoded Unicode.
§5.2.4.2 Numerical limits.
Depending on the target architecture, integer types follow one of two possible models, the “ILP32LL” model or the “I32LP64” model:
Target architectureSize of pointersInteger model
AArch64, x86 64 bits, RISC-V 64 bits64 bits (8 bytes)I32LP64
ARM, PowerPC, x86 32 bits, RISC-V 32 bits32 bits (4 bytes)ILP32LL
The numerical limits for integers are:
TypeSizeRange of values
unsigned char1 byte0 to 255
signed char1 byte−128 to 127
char1 bytelike signed char on x86
  like unsigned char on PowerPC, ARM, and RISC-V
unsigned short2 bytes0 to 65535
signed short2 bytes−32768 to 32767
short2 bytes−32768 to 32767
unsigned int4 bytes0 to 232−1
signed int4 bytes−231 to 231−1
int4 bytes−231 to 231−1
unsigned long4 bytes0 to 232−1 in the ILP32LL model
 8 bytes0 to 264−1 in the I32LP64 model
signed long4 bytes−231 to 231−1 in the ILP32LL model
 8 bytes−263 to 263−1 in the I32LP64 model
long4 bytes−231 to 231−1 in the ILP32LL model
 8 bytes−263 to 263−1 in the I32LP64 model
unsigned long long8 bytes0 to 264−1
signed long long8 bytes−263 to 263−1
long long8 bytes−263 to 263−1
_Bool1 byte0 or 1

Floating-point types follow the IEEE 754-2008 standard [12]:

TypeRepresentationSizeMantissaExponent
floatIEEE 754 single precision4 bytes23 bits−126 to 127
 (binary32)   
doubleIEEE 754 double precision8 bytes52 bits−1022 to 1023
 (binary64)   
long doublenot supported by default; with -flongdouble option, like double

During evaluation of floating-point operations, the floating-point format used is that implied by the type, without excess precision and range. This corresponds to a FLT_EVAL_METHOD equal to 0.

§6 Language

§6.2 Concepts
 
§6.2.5 Types
CompCert C supports all the types specified in C99, with the following exceptions:
  • The long double type is not supported by default. If the -flongdouble option is set, it is treated as a synonym for double.
  • Complex types (double _Complex, etc) are not supported.
  • The result type and argument types of a function type must not be a structure or union type, unless the -fstruct-passing option is active (section 3.2.9).
  • Variable-length arrays are not supported. The size N of an array declarator T[N] must always be a compile-time constant expression.
  • The _Float16 type (specified in C23) is not supported, but recognized by the CompCert frontend for improved compatibility with header files of the hosting compiler. Any occurrence of _Float16 in later translation phases is rejected as an error by CompCert.
§6.2.6 Representation of types

Signed integers use two’s-complement representation.

§6.3 Conversions
Conversion of an integer value to a signed integer type is always defined as reducing the integer value modulo 2N to the range of values representable by the N-bit signed integer type.

Pointer values can be converted to any pointer type. A pointer value can also be converted to any integer type of the same size and back to the original pointer type: the result is identical to the original pointer value. The type intptr_t and uintptr_t from <stdint.h> are integer types suitable for this purpose, as they are guaranteed to have the same size as all pointer types.

Conversions from double to float rounds the floating-point number to the nearest floating-point number representable in single precision. Conversions from integer types to floating-point types round to the nearest representable floating-point number.

§6.4 Lexical elements
 
§6.4.1 Keywords.
The following tokens are reserved as additional keywords:
    _Alignas      _Alignof       __attribute__    __attribute
    __const       __const__      __inline         __inline__
    __restrict    __restrict__   __packed__
    asm           __asm          __asm__
    _Noreturn     _Generic       _Static_assert   _Float16
§6.4.2 Identifiers
All characters of an identifier are significant, whether it has external linkage or not. Case is significant even in identifiers with external linkage. The “$” (dollar) character is accepted in identifiers.
§6.4.3 Universal character names
Universal character names are supported in character constants and string literals. They are not supported within identifiers.

The supported range of characters is the Unicode range from 0 to 0x10ffff, but excluding the range of surrogate characters from 0xd800 to 0xdfff.

§6.4.4 Constants
CompCert supports C2011 Unicode character constants and string literals.
LiteralSyntaxType
Regular character constant’...’int
L wide character constantL’...’wchar_t
u wide character constantu’...’char16_t (unsigned short)
U wide character constantU’...’char32_t (unsigned int)

Multibyte character constants such as ’abcd’ are supported for regular character constants only. They support up to four characters/escapes in the range of 0 to 255 which are interpreted as digits of a big-endian number in base 256. For example, ’abcd’ is 0x61626364.

LiteralSyntaxTypeString encoding
Regular string"..."char *UTF-8
UTF-8 stringu8"..."char *UTF-8
L wide stringL"..."wchar_t *UCS-4 if wchar_t is 32-bit wide,
   UTF-16 if wchar_t is 16-bit wide
u wide stringu"..."char16_t *UTF-16
U wide stringU"..."char32_t *UCS-4
§6.5 Expressions

CompCert C supports all expression operators specified in C99, with the restrictions and extensions described below.

Overflow in arithmetic over signed integer types is defined as taking the mathematically-exact result and reducing it modulo 232 or 264 to the range of representable signed integers. Bitwise operators (&, |, ^, ~, <<, >>) over signed integer types are interpreted following two’s-complement representation.

Floating-point operations round their results to the nearest representable floating point number, breaking ties by rounding to an even mantissa. If the program changes the rounding mode at run-time, it must be compiled with flag -ffloat-const-prop 0 (section 3.2.3). Otherwise, the compiler will perform compile-time evaluations of floating-point operations assuming round-to-nearest mode.

Floating-point intermediate results are computed in single precision if they are of type float (i.e. all arguments have integer or float types) and in double precision if they are of type double (i.e. one argument has type double). This corresponds to FLT_EVAL_METHOD equal to 0.

An integer or floating-point value stored in (part of) an object can be accessed by any lvalue having integer or floating-point type. The effect of such an access is defined taking into account the bit-level representation of the types (two’s complement for integers, IEEE 754 for floats) and the endianness of the target platform (big-endian for PowerPC, little-endian for x86 and RISC-V, and configuration dependent endianness for ARM). In contrast, a pointer value stored in an object can only be accessed by an lvalue having pointer type or integer type with the same size as a pointer type, such as intptr_t and uintptr_t. In other words, while the bit-level, in-memory representation of integers and floats is fully exposed by the CompCert C semantics, the in-memory representation of pointers is kept opaque and cannot be examined at any granularity other than a pointer-sized word.

§6.5.2 Postfix operators
If a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is as described in the last paragraph above: the operation is well defined as long as it does not entail accessing a stored pointer value with a type other than a pointer type or an integer type with the same size as a pointer type. For example, the declaration
    union u { double d; unsigned int i[2]; unsigned char c[8]; };
supports accessing any member after any other member has been stored. On the other hand, consider
    union u { char * ptr; unsigned char c[4]; };
If a pointer value is stored in the ptr member, accesses to the elements of the c member are not defined.
§6.5.3 Unary operators
Symmetrically with the sizeof operator, CompCert C supports the _Alignof operator from C2011, which can also be written __alignof__ as in GNU C. This operator applies to a type but not to an expression. It returns the natural alignment, in bytes, of this type.

The type size_t of the result of sizeof and _Alignof is taken to be unsigned long.

§6.5.4 Casts
See the comments on point §6.3 (“Conversions”) above concerning pointer casts and casts to signed integer types.
§6.5.5 Multiplicative operators
Division and remainder are undefined if the second argument is zero. Signed division and remainder are also undefined if the first argument is the smallest representable signed integer (−231 for type int) and the second argument is −1 (the only case where division overflows).
§6.5.6 Additive operators
Adding a pointer and an integer, or subtracting an integer from a pointer, are always defined even if the resulting pointer points outside the bounds of the underlying object. The byte offset with respect to the base of the underlying object is treated modulo 232 or 264 (depending on the bitsize of pointers on the target processor). Such out-of-bounds pointers cause undefined behavior when dereferenced or compared with other pointers.
§6.5.7 Bitwise shift operators
The right shift operator >> applied to a negative signed integer is specified as shifting in “1” bits from the left.
§6.5.8 Relational operators
Comparison between two non-null pointers is always defined if both pointers fall within the bounds of the underlying objects. Additionally, comparison is also defined if one of the pointers is “one past the end of an object” and the other pointer is identical to the first pointer or falls within the bounds of the same object. Comparison between a pointer “one past” the end of an object and a pointer within a different object is undefined behavior.
§6.5.9 Equality operators
Same remark as in §6.5.8 concerning pointer comparisons.
§6.6 Constant expressions
No differences with C99.
§6.7 Declarations
CompCert C supports all declarations specified in C99, with the restrictions and extensions described below.
§6.7.2 Type specifiers
Complex types (the _Complex specifier) are not supported.
§6.7.2.1 Structure and union specifiers
Bit fields of “plain” type int are treated as signed. In accordance with the ELF Application Binary Interfaces, bit fields within an integer are allocated most significant bits first on the PowerPC and big-endian ARM platforms, and least significant bits first on the AArch64, ARM, RISC-V, and x86 platforms. A bit field never straddles an integer boundary and is always contained within a storage unit of size and alignment determined by its integer type.

Bit fields can be of enumeration type, e.g. enum e x: 2. Such bit fields are treated as unsigned if this enables all values of the enumeration to be represented exactly in the given number of bits, and as signed otherwise.

The members of a structure are laid out consecutively in declaration order, with enough bytes of padding inserted to guarantee that every non-bit-field member is aligned on its natural alignment, and every bit-field member is contained in a naturally-aligned storage unit. The natural alignment of a non-bit-field member can be modified by the _Alignas qualifier. Different layouts can be obtained if the -fpacked-structs option is set (section 3.2.9) and the packed attribute (section 6.2) is used.

Anonymous structures and anonymous unions are supported as in C2011. (See the C2011 standard, §6.7.2.1 paragraph 13 for a description.)

§6.7.2.2 Enums
The values of an enumeration type have type int.
§6.7.3 Type qualifiers
The const and volatile qualifiers are honored, with the restriction below on volatile composite types. The restrict qualifier is accepted but ignored.

Accesses to objects of a volatile-qualified scalar type are treated as described in paragraph 6 of section 6.7.3: every assignment and dereferencing is treated as an observable event by the CompCert C formal semantics, and therefore is not subject to optimization by the CompCert compiler. Accesses to objects of a volatile-qualified composite type (struct or union type) are treated as regular, non-volatile accesses: no observable event is produced, and the access can be optimized away. The CompCert compiler emits a warning in the latter case.

Following ISO C2011, CompCert supports the _Alignas qualifier on variable or member declarations. This qualifier comes in two forms: _Alignas(N), where N is a compile-time constant integer expression that evaluates to a power of two; and _Alignas(T), where T is a type. The latter form is equivalent to _Alignas(_Alignof(T)).

The effect of the _Alignas(N) qualifier is to change the alignment of the type being qualified, setting the alignment to N. In particular, this affects the layout of struct fields. For instance:

    struct s { char c; int _Alignas(8) i; };

The Alignas(8) qualifier changes the alignment of field i from 4 (the natural alignment of type int) to 8. This causes 7 bytes of padding to be inserted between c and i, instead of the normal 3 bytes. This also increases the size of struct s from 8 to 12, and the alignment of struct s from 4 to 8.

The alignment N given in the _Alignas(N) qualifier should normally be greater than or equal to the natural alignment of the modified type. For target platforms that support unaligned memory accesses (x86, PowerPC and RISC-V, but not ARM), N can also be smaller than the natural alignment.

Finally, CompCert C provides limited support for GCC-style attributes (__attribute keyword) used in type qualifier position. See section 6.2.

§6.7.4 Function specifiers
Two function specifiers are supported: inline from ISO C99 and _Noreturn from ISO C2011.
§6.7.5.2 Array declarators
Variable-length arrays are not supported. The only supported array declarators are those of ISO C90, namely [] for an incomplete array type, and [N] where N is a compile-time constant expression for a complete array type.
§6.7.5.3 Function declarators
The result type of a function must not be a structure or union type, unless the -fstruct-passing option is active (section 3.2.9).
§6.7.8 Initialization
Both traditional (ISO C90) and designated initializers are supported, conforming with ISO C99.
Static Assertions
CompCert supports static assertions via the _Static_assert keyword as described in the C2011 standard, §6.7.10.
Generic selection
CompCert supports the C11 _Generic operator for generic selection as described in the C2011 standard, §6.5.1.1.
§6.8 Statements and blocks
All forms of statements specified in C99 are supported in CompCert C, with the exception described below.

The asm statement (a popular extension for inline assembly) is not supported by default, but is supported if option -finline-asm is set. See section 6.6 for a complete description of the syntax and semantics of asm statements.

§6.8.4.2 The switch statement
By default, the switch statement in CompCert C is restricted to a “structured” form as present in Java and mandated by Misra-C. Namely, the switch statement must have the following form:
    switch (expression) {
        case expr1: ...
        case expr2: ...
        ...
        default: ...
    }

In other words, the case and default labels that pertain to a switch must occur at the top-level of the block following the switch. The use of break is not mandatory, i.e. fall-though cases are allowed. Also, the order of the case and default statements is not restricted.

As an extension, CompCert also has full support for “unstructured” switch statements – e.g. the infamous Duff’s device – via an optional source-to-source transformation in the frontend, which must be activated via the command line switch -funstructured-switch.

§6.9 External definitions
Function definitions should be written in modern, prototype form. The compiler accepts traditional, non-prototype function definitions but converts them to prototype form. In particular, T f() {...} is automatically converted to T f(void) {...}.

Functions with a variable number of arguments, as denoted by an ellipsis ... in the prototype, are supported.

The result type of the function must not be a structure or union type, unless the -fstruct-passing option is active (section 3.2.9).

§6.10 Preprocessing directives
The CompCert C compiler does not perform preprocessing itself, but delegates this task to an external C preprocessor, such as that of GCC. The external preprocessor is assumed to conform to the C99 standard.

If GCC is used as hosting compiler, the default configuration sets the language mode to C99 via option -std=c99. For Diab as hosting compiler, the default language mode is used.

§6.10.6 Pragma directive / §6.10.9 Pragma operator

The #pragma directive and the _Pragma operator are always ignored when they occur within a function definition, as part of a compound statement. They can only be honored if they occur outside external declarations, at the top level of a compilation unit.

The pragmas supported by CompCert C are documented in section 6.1. The standard pragmas STDC FP_CONTRACT, STDC FENV_ACCESS, and STDC CX_LIMITED_RANGE are not supported.

When active, the warning unknown-pragmas causes the compiler to report every use of unsupported pragmas. (See section 3.2.10.)

§6.10.8 Predefined macro names

CompCert defines the following macro names according §6.10.8.3 of the C11 standard to indicate supported languages features:

  • __STDC_NO_ATOMICS__: no support for optional standard atomics,
  • __STDC_NO_COMPLEX__: no support for optional standard complex numbers,
  • __STDC_NO_THREADS__: no support for optional standard threads, and
  • __STDC_NO_VLA__: no support for standard variable length arrays.

Furthermore, a definition of __STDC_IEC_559_COMPLEX__ by the hosting toolchain is removed since CompCert does not support Complex numbers.

§7 Library

The CompCert C compiler does not provide its own implementation of the C standard library. It provides a few standard headers and relies on the standard library of the target system for the others. CompCert has been successfully used in conjunction with the GNU glibc standard library. Note, however, the following points:

§7.6 Floating-point environment <fenv.h>
The CompCert formal semantics and optimization passes assume round-to-nearest behavior in floating-point arithmetic. If the program changes rounding modes during execution using the fesetround function, it must be compiled with option -ffloat-const-prop 0 to turn off certain floating-point optimizations.
§7.12 Mathematics <math.h>
Many versions of <math.h> include long double versions of the math functions. These functions cannot be called by CompCert-compiled code by lack of ABI-conformant support for the long double type.
§7.13 Non-local jumps <setjmp.h>
The CompCert C compiler has no special knowledge of the setjmp and longjmp functions, treating them as ordinary functions that respect control flow. It is therefore not advised to use these two functions in CompCert-compiled code. To prevent misoptimisation, it is crucial that all local variables that are live across an invocation of setjmp be declared with volatile modifier.

Previous Up Next