This chapter describes the CompCert C reference interpreter and how to invoke it.
The CompCert C reference interpreter executes the given C source file by interpretation, displaying the outcome of the execution (normal termination or aborting on an undefined behavior), as well as the observable effects (e.g. printf calls) performed during the execution.
The reference interpreter is faithful to the formal semantics of the CompCert C language: all the behaviors that it displays are possible behaviors according to the formal semantics. In particular, the reference interpreter immediately reports and stops when an undefined behavior of the C source program is encountered. This is not the case for the machine code generated by compiling this program: once the undefined behavior is triggered, the machine code can crash, but it can also continue with any other behavior.
The primary use of the reference interpreter is to check whether a particular run of a C program exhibits behaviors that are undefined in CompCert C. If it does, something is wrong with the program, and the program should be fixed. The reference interpreter can also be useful to familiarize oneself with the CompCert C formal semantics, and validate it experimentally by testing.
The reference interpreter is presented as a special mode, -interp, of the ccomp command-line executable of the CompCert C compiler. An invocation of the reference interpreter is of the form
The input C file is preprocessed, elaborated to the CompCert C subset language, then interpreted and its observable effects displayed.
The following limitations apply to the C source files that can be interpreted.
printf | to display formatted text on standard output |
malloc | to dynamically allocate memory |
free | to free memory allocated by malloc |
__builtin_annot | to mark execution points |
__builtin_annot_intval | (likewise) |
__builtin_fabs | floating-point absolute value |
int main(void) { ... } int main(int argc, char ** argv) { ... }
The following options to the ccomp command apply specifically to the reference interpreter.
By default, the reference interpreter prints whatever output is produced by the program via calls to the printf function, plus messages to indicate program termination as well as other observable events.
Like that of C, the semantics of CompCert C is internally nondeterministic: in general, several evaluation orders are possible for a given expression, and different orders can produce different observable behaviors for the program. By default, the interpreter evaluates C expressions following a fixed, left-to-right evaluation order.
In addition, all the options of the CompCert C compiler are recognized (see section 3.2). The ones that make sense in interpreter mode are:
Consider the file fact.c containing the following program:
#include <stdio.h> int fact(int n) { int r = 1; int i; for (i = 2; i <= n; i++) r *= i; return r; } int main(void) { printf("fact(10) = %d\n", fact(10)); return 0; }
Running ccomp -interp fact.c produces the following output:
fact(10) = 3628800 Time 251: observable event: extcall printf(& __stringlit_1, 3628800) -> 0 Time 256: program terminated (exit code = 0)
The first line is the output produced by the printf statement. The other three lines report the two observable effects performed by the program: first, after 251 execution steps, a call to the external function printf; then, after 256 execution steps, successful termination with exit code 0.
To make more sense out of the messages, we can add the -dc option to the command line, then look at the generated fact.compcert.c file, which contains the CompCert C program as the interpreter sees it:
unsigned char const __stringlit_1[15] = "fact(10) = %d\012"; extern int printf(unsigned char *, ...); int fact(int n) { int r; int i; r = 1; for (i = 2; i <= n; i++) { r *= i; } return r; } int main(void) { printf(__stringlit_1, fact(10)); return 0; }
We see that the string literal appearing as first argument to printf was lifted outside the call and bound to a global variable __stringlit_1.
Interpreting fact.c with the -trace option, we obtain a long and detailed trace of the execution, of which we show only a few lines:
Time 0: calling main() --[step_internal_function]--> Time 1: in function main, statement printf(__stringlit_1, fact(10)); return 0; --[step_seq]--> Time 2: in function main, statement printf(__stringlit_1, fact(10)); --[step_do_1]--> Time 3: in function main, expression printf(__stringlit_1, fact(10)) --[red_var_global]--> Time 4: in function main, expression printf(<loc __stringlit_1>, fact(10)) --[red_rvalof]--> Time 5: in function main, expression printf(<ptr __stringlit_1>, fact(10)) [...] Time 254: in function main, statement return 0; --[step_return_1]--> Time 255: in function main, expression 0 --[step_return_2]--> Time 256: returning 0 Time 256: program terminated (exit code = 0)
The labels on the arrows (e.g. step_do_1) are the names of the reduction rules being applied. The reduction rules can be found in the CompCert C formal semantics (module Csem.v of the CompCert sources).
Consider the file outofbounds.c containing the following C code:
#include <stdio.h> int x[2] = { 12, 34 }; int y[1] = { 56 }; int main(void) { int i = 65536 * 65536 + 2; printf("i = %d\n", i); printf("x[i] = %d\n", x[i]); return 0; }
Running it with ccomp -interp -trace outofbounds.c, we obtain the following trace, shortened to focus on the interesting parts:
[...] Time 3: in function main, expression i = 65536 * 65536 + 2 --[red_var_local]--> Time 4: in function main, expression <loc i> = 65536 * 65536 + 2 --[red_binop]--> Time 5: in function main, expression <loc i> = 0 + 2 --[red_binop]--> Time 6: in function main, expression <loc i> = 2 [...] Time 27: in function main, expression printf(<ptr __stringlit_2>, *<ptr x+8>) --[red_deref]--> Time 28: in function main, expression printf(<ptr __stringlit_2>, <loc x+8>) Stuck state: in function main, expression printf(<ptr __stringlit_2>, <loc x+8>) Stuck subexpression: <loc x+8> ERROR: Undefined behavior
We see that the expression 65536 * 65536 + 2 caused an overflow in signed arithmetic. This is an undefined behavior according to the C standards, but the CompCert C semantics fully defines this behavior as computing the result modulo 232. Therefore, the expression evaluates to 2, without error.
On the other hand, the array access x[i] triggers an undefined behavior, since i, which is equal to 2, falls outside the bounds of x, which is of size 2. The interpreter gets stuck when trying to dereference the l-value x + 2 (printed as <loc x+8> to denote the location 8 bytes from that of variable x).
Consider the following C program in file abc.c:
#include <stdio.h> int a() { printf("a "); return 1; } int b() { printf("b "); return 2; } int c() { printf("c "); return 3; } int main () { printf("%d\n", a() + (b() + c())); return 0; }
Interpreting it multiple times with ccomp -interp -quiet -random abc.c produces various outputs among the following six possibilities:
a b c 6 a c b 6 b a c 6 b c a 6 c a b 6 c b a 6
Indeed, according to the C standards and to the CompCert C formal semantics, the calls to functions a(), b() and c() can occur in any order.
On the abc.c example, exploring all evaluation orders with the -all option results in a messy output. Let us do this exploration on a simpler example (file nondet.c):
int x = 0; int f() { return ++x; } int main() { return f() - x; }
Running ccomp -interp -all nondet.c shows the two possible outcomes for this program:
Time 17: program terminated (exit code = 0) Time 17: program terminated (exit code = 1)
The first outcome corresponds to calling f first, setting x to 1 and returning 1, then reading x, obtaining 1. The second outcome corresponds to the other evaluation order: x is read first, producing 0, then f is called, returning 1.
If we add the -trace option, we can follow the breadth-first exploration of evaluation states. At any given time, up to three different states are reachable.
State 0.1: calling main() Transition state 0.1 --[step_internal_function]--> state 1.1 State 1.1: in function main, statement return f() - x; Transition state 1.1 --[step_return_1]--> state 2.1 State 2.1: in function main, expression f() - x Transition state 2.1 --[red_var_global]--> state 3.1 Transition state 2.1 --[red_var_global]--> state 3.2 State 3.1: in function main, expression <loc f>() - x State 3.2: in function main, expression f() - <loc x> Transition state 3.1 --[red_rvalof]--> state 4.1 Transition state 3.1 --[red_var_global]--> state 4.2 Transition state 3.2 --[red_var_global]--> state 4.2 Transition state 3.2 --[red_rvalof]--> state 4.3 State 4.1: in function main, expression <ptr f>() - x State 4.2: in function main, expression <loc f>() - <loc x> State 4.3: in function main, expression f() - 0 Transition state 4.1 --[red_call]--> state 5.1 Transition state 4.1 --[red_var_global]--> state 5.2 Transition state 4.2 --[red_rvalof]--> state 5.2 Transition state 4.2 --[red_rvalof]--> state 5.3 Transition state 4.3 --[red_var_global]--> state 5.3 State 5.1: calling f() State 5.2: in function main, expression <ptr f>() - <loc x> State 5.3: in function main, expression <loc f>() - 0 [...]