Chapter 4 Using the CompCert C interpreter

This chapter describes the CompCert C reference interpreter and how to invoke it.

4.1 Overview

The CompCert C reference interpreter executes the given C source file by interpretation, displaying the outcome of the execution (normal termination or aborting on an undefined behavior), as well as the observable effects (e.g. printf calls) performed during the execution.

The reference interpreter is faithful to the formal semantics of the CompCert C language: all the behaviors that it displays are possible behaviors according to the formal semantics. In particular, the reference interpreter immediately reports and stops when an undefined behavior of the C source program is encountered. This is not the case for the machine code generated by compiling this program: once the undefined behavior is triggered, the machine code can crash, but it can also continue with any other behavior.

The primary use of the reference interpreter is to check whether a particular run of a C program exhibits behaviors that are undefined in CompCert C. If it does, something is wrong with the program, and the program should be fixed. The reference interpreter can also be useful to familiarize oneself with the CompCert C formal semantics, and validate it experimentally by testing.

The reference interpreter is presented as a special mode, -interp, of the ccomp command-line executable of the CompCert C compiler. An invocation of the reference interpreter is of the form

ccomp -interp [option …] input-file.c

The input C file is preprocessed, elaborated to the CompCert C subset language, then interpreted and its observable effects displayed.

4.2 Limitations

The following limitations apply to the C source files that can be interpreted.

The C source file must contain a complete, standalone program, including in particular a main function.

The only external functions available to the program are

printf	to display formatted text on standard output
malloc	to dynamically allocate memory
free	to free memory allocated by malloc
__builtin_annot	to mark execution points
__builtin_annot_intval	(likewise)
__builtin_fabs	floating-point absolute value

The main function must be declared with one of the two types allowed by the C standards, namely:
```
    int main(void) { ... }
    int main(int argc, char ** argv) { ... }
```
In the second form above, main is called with argc equal to zero and argv equal to the NULL pointer. The program does not, therefore, have access to command-line arguments.

4.3 Options

The following options to the ccomp command apply specifically to the reference interpreter.

4.3.1 Controlling the output

By default, the reference interpreter prints whatever output is produced by the program via calls to the printf function, plus messages to indicate program termination as well as other observable events.

-quiet: Do not print any trace of the execution. The only output produced is that of the printf calls contained in the program.
-trace: Print a detailed trace of the execution. At each time step, the interpreter displays the expression or statement or function invocation that it is about to execute.

4.3.2 Controlling execution order

Like that of C, the semantics of CompCert C is internally nondeterministic: in general, several evaluation orders are possible for a given expression, and different orders can produce different observable behaviors for the program. By default, the interpreter evaluates C expressions following a fixed, left-to-right evaluation order.

-random: Randomize execution order. Instead of evaluating expressions left-to-right, the interpreter picks one evaluation order among all those allowed by the semantics of CompCert C. Interpreting the same program in -random mode several times in a row can show that a program is sensitive to evaluation order.
-all: Explore in parallel all evaluation orders allowed by the semantics of CompCert C, displaying all possible outcomes of the input program. This exploration can be very costly and is feasible only for short programs.

4.3.3 Options shared with the compiler

In addition, all the options of the CompCert C compiler are recognized (see section 3.2). The ones that make sense in interpreter mode are:

Preprocessing options (section 3.2.2): -I, -D, -U
Language support options (section 3.2.9): -fall, -fnone, and the various -ffeature and -fno-feature options.
Tracing options (section 3.2.11): -dparse and -dc. The -dc option is particularly useful in conjunction with the interpreter, since it saves in a readable file the exact CompCert C program that the interpreter is running.

4.4 Examples of use

4.4.1 Running a simple program

Consider the file fact.c containing the following program:

    #include <stdio.h>

    int fact(int n)
    {
      int r = 1;
      int i;
      for (i = 2; i <= n; i++) r *= i;
      return r;
    }

    int main(void) {
      printf("fact(10) = %d\n", fact(10));
      return 0;
    }

Running ccomp -interp fact.c produces the following output:

    fact(10) = 3628800
    Time 251: observable event:
                         extcall printf(& __stringlit_1, 3628800) -> 0
    Time 256: program terminated (exit code = 0)

The first line is the output produced by the printf statement. The other three lines report the two observable effects performed by the program: first, after 251 execution steps, a call to the external function printf; then, after 256 execution steps, successful termination with exit code 0.

To make more sense out of the messages, we can add the -dc option to the command line, then look at the generated fact.compcert.c file, which contains the CompCert C program as the interpreter sees it:

    unsigned char const __stringlit_1[15] = "fact(10) = %d\012";

    extern int printf(unsigned char *, ...);

    int fact(int n)
    {
      int r;
      int i;
      r = 1;
      for (i = 2; i <= n; i++) {
        r *= i;
      }
      return r;
    }

    int main(void)
    {
      printf(__stringlit_1, fact(10));
      return 0;
    }

We see that the string literal appearing as first argument to printf was lifted outside the call and bound to a global variable __stringlit_1.

Interpreting fact.c with the -trace option, we obtain a long and detailed trace of the execution, of which we show only a few lines:

    Time 0: calling main()
    --[step_internal_function]-->
    Time 1: in function main, statement
      printf(__stringlit_1, fact(10)); return 0;
    --[step_seq]-->
    Time 2: in function main, statement printf(__stringlit_1, fact(10));
    --[step_do_1]-->
    Time 3: in function main, expression printf(__stringlit_1, fact(10))
    --[red_var_global]-->
    Time 4: in function main, expression printf(<loc __stringlit_1>, fact(10))
    --[red_rvalof]-->
    Time 5: in function main, expression printf(<ptr __stringlit_1>, fact(10))
    [...]
    Time 254: in function main, statement return 0;
    --[step_return_1]-->
    Time 255: in function main, expression 0
    --[step_return_2]-->
    Time 256: returning 0
    Time 256: program terminated (exit code = 0)

The labels on the arrows (e.g. step_do_1) are the names of the reduction rules being applied. The reduction rules can be found in the CompCert C formal semantics (module Csem.v of the CompCert sources).

4.4.2 Exploring undefined behaviors

Consider the file outofbounds.c containing the following C code:

    #include <stdio.h>

    int x[2] = { 12, 34 };
    int y[1] = { 56 };

    int main(void)
    {
      int i = 65536 * 65536 + 2;
      printf("i = %d\n", i);
      printf("x[i] = %d\n", x[i]);
      return 0;
    }

Running it with ccomp -interp -trace outofbounds.c, we obtain the following trace, shortened to focus on the interesting parts:

    [...]
    Time 3: in function main, expression i = 65536 * 65536 + 2
    --[red_var_local]-->
    Time 4: in function main, expression <loc i> = 65536 * 65536 + 2
    --[red_binop]-->
    Time 5: in function main, expression <loc i> = 0 + 2
    --[red_binop]-->
    Time 6: in function main, expression <loc i> = 2
    [...]
    Time 27: in function main, expression printf(<ptr __stringlit_2>,
                                                 *<ptr x+8>)
    --[red_deref]-->
    Time 28: in function main, expression printf(<ptr __stringlit_2>,
                                                 <loc x+8>)
    Stuck state: in function main, expression printf(<ptr __stringlit_2>,
                                                     <loc x+8>)
    Stuck subexpression: <loc x+8>
    ERROR: Undefined behavior

We see that the expression 65536 * 65536 + 2 caused an overflow in signed arithmetic. This is an undefined behavior according to the C standards, but the CompCert C semantics fully defines this behavior as computing the result modulo 2³². Therefore, the expression evaluates to 2, without error.

On the other hand, the array access x[i] triggers an undefined behavior, since i, which is equal to 2, falls outside the bounds of x, which is of size 2. The interpreter gets stuck when trying to dereference the l-value x + 2 (printed as <loc x+8> to denote the location 8 bytes from that of variable x).

4.4.3 Exploring evaluation orders

Consider the following C program in file abc.c:

    #include <stdio.h>

    int a() { printf("a "); return 1; }
    int b() { printf("b "); return 2; }
    int c() { printf("c "); return 3; }

    int main () {
      printf("%d\n", a() + (b() + c()));
      return 0;
    }

Interpreting it multiple times with ccomp -interp -quiet -random abc.c produces various outputs among the following six possibilities:

    a b c 6
    a c b 6
    b a c 6
    b c a 6
    c a b 6
    c b a 6

Indeed, according to the C standards and to the CompCert C formal semantics, the calls to functions a(), b() and c() can occur in any order.

On the abc.c example, exploring all evaluation orders with the -all option results in a messy output. Let us do this exploration on a simpler example (file nondet.c):

    int x = 0;
    int f() { return ++x; }
    int main() { return f() - x; }

Running ccomp -interp -all nondet.c shows the two possible outcomes for this program:

    Time 17: program terminated (exit code = 0)
    Time 17: program terminated (exit code = 1)

The first outcome corresponds to calling f first, setting x to 1 and returning 1, then reading x, obtaining 1. The second outcome corresponds to the other evaluation order: x is read first, producing 0, then f is called, returning 1.

If we add the -trace option, we can follow the breadth-first exploration of evaluation states. At any given time, up to three different states are reachable.

    State 0.1:  calling main()
    Transition state 0.1 --[step_internal_function]--> state 1.1
    State 1.1:  in function main, statement return f() - x;
    Transition state 1.1 --[step_return_1]--> state 2.1
    State 2.1:  in function main, expression f() - x
    Transition state 2.1 --[red_var_global]--> state 3.1
    Transition state 2.1 --[red_var_global]--> state 3.2
    State 3.1:  in function main, expression <loc f>() - x
    State 3.2:  in function main, expression f() - <loc x>
    Transition state 3.1 --[red_rvalof]--> state 4.1
    Transition state 3.1 --[red_var_global]--> state 4.2
    Transition state 3.2 --[red_var_global]--> state 4.2
    Transition state 3.2 --[red_rvalof]--> state 4.3
    State 4.1:  in function main, expression <ptr f>() - x
    State 4.2:  in function main, expression <loc f>() - <loc x>
    State 4.3:  in function main, expression f() - 0
    Transition state 4.1 --[red_call]--> state 5.1
    Transition state 4.1 --[red_var_global]--> state 5.2
    Transition state 4.2 --[red_rvalof]--> state 5.2
    Transition state 4.2 --[red_rvalof]--> state 5.3
    Transition state 4.3 --[red_var_global]--> state 5.3
    State 5.1:  calling f()
    State 5.2:  in function main, expression <ptr f>() - <loc x>
    State 5.3:  in function main, expression <loc f>() - 0
    [...]