The CompCert C compiler
CompCert C is a compiler for the C programming language. Its intended use is the compilation of life-critical and mission-critical software written in C and meeting high levels of assurance. It accepts most of the ISO C 99 language, with some exceptions and a few extensions. It produces machine code for the PowerPC, ARM, RISC-V and x86 (32 and 64 bits) architectures. Performance of the generated code is decent but not outstanding: on PowerPC, about 90% of the performance of GCC version 4 at optimization level 1.
What sets CompCert C apart from any other production compiler, is that it is formally verified, using machine-assisted mathematical proofs, to be exempt from miscompilation issues. In other words, the executable code it produces is proved to behave exactly as specified by the semantics of the source C program. This level of confidence in the correctness of the compilation process is unprecedented and contributes to meeting the highest levels of software assurance. In particular, using the CompCert C compiler is a natural complement to applying formal verification techniques (static analysis, program proof, model checking) at the source code level: the correctness proof of CompCert C guarantees that all safety properties verified on the source code automatically hold as well for the generated executable.
The subset of C supported
CompCert C supports all of ISO C 99, with the following exceptions:
-
By default,
switch
statements must be structured as in MISRA-C. Support for unstructuredswitch
, as in Duff's device, can be activated via the command-line option-funstructured-switch
. -
longjmp
andsetjmp
are not guaranteed to work. - Variable-length arrays are not supported.
Consequently, CompCert supports all of the MISRA-C 2004 subset of C, plus many features excluded by MISRA (such as recursive functions and dynamic heap memory allocation).
A number of ISO C 2011 features are also supported:
- The
_Alignof
operator and the_Alignas
attribute. - Anonymous structures and unions.
-
_Static_assert
statements and_Generic
expressions. - Unicode string literals and character constants.
CompCert also supports some extensions to ISO C taken from the GNU and Diab compilers:
- Pragmas and attributes to control alignment and section placement of global variables.
- Attributes to control struct packing, function inlining, etc.
- Built-in functions to give access to some processor-specific instructions.
- Inline assembly code with GNU-style arguments.
Architecture of the compiler
Part 1: Parsing, type-checking, and pre-simplifications.
This first part converts C source code into abstract syntax trees of the CompCert C language. Some constructs not natively supported by CompCert C are expanded away. For example, block-scoped local variables are renamed and lifted to function-local scope; and unstructured switch
statements are rewritten to structured switch
statements with goto
statements. Some other unsupported constructs, such as variable-length arrays, are rejected.
This part of CompCert (transformation of C source text to CompCert C
abstract syntax trees) is not formally verified. However, CompCert C is a
subset of C, and the compiler can output the generated
CompCert C code in C concrete syntax (flag -dc
),
therefore the result of this transformation can be manually inspected.
Moreover, most static analysis and program verification tools for C
operate on a simplified C language similar to CompCert C. By conducting
the analysis or the program verification directly on the CompCert C form,
bugs potentially introduced by this first part of the compiler can be
detected.
Part 2: Compilation of CompCert C AST to assembly AST. This part is the bulk of the compiler and the one that is proved correct in Coq. It is structured in 16 passes and uses 10 intermediate language, as depicted on the following diagram.
All intermediate languages are given formal semantics, and each of the transformation passes is proved to preserve semantics.
Part 3: Assembling and linking.
The abstract syntax tree for PowerPC or ARM or RISC-V or x86 assembly language
produced by part 2 is printed in concrete assembly syntax. The
system's assembler and linker are then called to produce object files
and executable files, respectively. This part is not yet formally
verified. A benefit of using the standard assembler and linker is
that object files produced by CompCert can be linked with existing
libraries compiled with gcc
. This is convenient for
testing, although the formal guarantees of semantic preservation apply
only to whole programs that have been compiled as a whole by CompCert
C.
Using the compiler
The executable for CompCert C is called ccomp
.
It has the standard command-line interface for a Unix C compiler.
For instance, to compile the single-file program src.c
and create an executable called exec
, just do
ccomp -o exec src.c
To compile a two-file program src1.c
and src2.c
, do
ccomp -c src1.c ccomp -c src2.c ccomp -o exec src1.o src2.o
To see the CompCert C code and the assembly code generated for
src1.c
, do
ccomp -dc -S src1.c
The generated assembly code is left in file src1.s
and the generated CompCert C code in file src1.compcert.c
.
For more details, see the CompCert C user's manual.
Performance of the generated code
On PowerPC and ARM, the code generated by CompCert runs at least twice
as fast as the code generated by GCC without optimizations
(gcc -O0
), and approximately 10% slower than GCC 4 at optimization
level 1 (gcc -O1
), 15% slower at optimization level 2
(gcc -O2
) and 20% slower at optimization level 3 (gcc -O3
). These numbers were obtained on the homemade benchmark mix shown in
the graph below. By lack of aggressive loop optimizations,
performance is lower on HPC codes involving lots of matrix
computations.
(The sources for the test programs are available in the
test/c
subdirectory of the CompCert distribution.
Measurements were done on an IBM Power7 running Linux.)