Explaining Counterexamples
with Giant-Step Assertion Checking††thanks: This work was partly funded by collaborations between Inria and the companies AdaCore
(Paris, France) and Mitsubishi Electric R&D Centre Europe (Rennes, France).
Abstract
Identifying the cause of a proof failure during deductive verification of programs is hard: it may be due to an incorrectness in the program, an incompleteness in the program annotations, or an incompleteness of the prover. The changes needed to resolve a proof failure depend on its category, but the prover cannot provide any help on the categorisation. When using an SMT solver to discharge a proof obligation, that solver can propose a model from a failed attempt, from which a possible counterexample can be derived. But the counterexample may be invalid, in which case it may add more confusion than help. To check the validity of a counterexample and to categorise the proof failure, we propose the comparison between the run-time assertion-checking (RAC) executions under two different semantics, using the counterexample as an oracle. The first RAC execution follows the normal program semantics, and a violation of a program annotation indicates an incorrectness in the program. The second RAC execution follows a novel “giant-step” semantics that does not execute loops nor function calls but instead retrieves return values and values of modified variables from the oracle. A violation of the program annotations only observed under giant-step execution characterises an incompleteness of the program annotations. We implemented this approach in the Why3 platform for deductive program verification and evaluated it using examples from prior literature.
1 Introduction
Deductive program verification aims at checking that a given program respects a given functional behaviour. The expected behaviour is expressed formally by logical assertions, principally pre-conditions and post-conditions on procedures and functions, and invariants on loops. The verification process consists in generating, from the code and the formal annotations, a set of logic formulas called verification conditions (VCs), typically via a Weakest Precondition Calculus (WP) [Dijkstra76]. If the VCs are proved valid, then the program is guaranteed to satisfy its specification. Deductive verification environments like Dafny [leino14fide], OpenJML [cok14], or Why3 [bobot14sttt], pass the VCs to automatic theorem provers usually based on Satisfiability Modulo Theories (SMT), such as Alt-Ergo [alt-ergo], CVC4 [barrett11cade] and Z3 [demoura08tacas]. Due to the nature of these solvers, the conclusion of each VC is negated to form a proof goal and the background solver is queried for satisfiability. Since these solvers are assumed to be sound when the answer is “unsat”, one can conclude that the VC is valid when that is indeed the answer.
In this paper we address the case when the solver does not answer “unsat”, and provide a method to explain why the proof could not be completed. The solver may give instead several other answers: at best it answers “sat”, possibly with a model, which is a collection of values for the variables in the goal. As displayed in Fig. 1, we rely on an approach by Dailler et al [dailler18jlamp] to turn such a model into a candidate counterexample, which is essentially a collection of triples (variable, source location, value) representing the different values taken by a program variable during an execution. Such a counterexample may indicate two different problems: (i) a non-conformance, where the code does not satisfy one of its annotations; (ii) a subcontract weakness, where the annotations are not appropriate to achieve a proof (typically a weakness of a loop invariant or post-condition). Unfortunately there is no direct way to distinguish these cases. Other answers of the prover are even less informative: the answer “unknown” replaces “sat” when the solver is incomplete, for example in presence of quantified hypotheses or formulas involving non-linear arithmetic, or the solver reaches a time limit or a memory limit. In these cases, the solver may as well propose a model, but without any guarantee about its validity. Summing up, for any other answer than “unsat”, there is a need to validate the resulting candidate counterexample and categorise it as non-conformance or subcontract weakness.
We propose the categorisation of counterexamples using a novel notion of assertion checking, called giant-step assertion checking. Let us illustrate the idea on the following toy program, that operates on a global variable x.
The VC for the function main is where denotes the new value of x after the call to set_x, and the premise of the second implication comes from the post-condition of set_x. The query sent to the solver is , from which the solver would typically answer “sat” with the model , corresponding to a counterexample where x is 0 after the assignment and 4 after the call to set_x. If we proceed to a regular assertion-checking execution of main, no issue is reported: both the post-condition of set_x and the assertion in main are valid. Our giant-step assertion checking executes calls to sub-programs in a single step, selecting the values of modified variables from the candidate counterexample: it handles the call set_x 2 by extracting the new value for x from the counterexample, here , and checking the post-condition, which is correct here. The execution then fails because the assertion is wrong. Since the standard assertion checking is OK but the giant-step one fails, we report a subcontract weakness. This is the expected categorisation, suggesting the user to improve the contract of set_x, in this case by stating a more precise post-condition. Giant-step assertion checking also executes loops by a single giant step to identify subcontract weaknesses in loop invariants.
In Section 2, we explain in more details the concept of giant-step execution, and how we use it to categorise counterexamples. In Section 3, we present our implementation, and experimental results, that were conducted within the Why3 environment [bobot14sttt]. We discuss related work and future work in Section LABEL:sec:concl. The technical details that we skip here due to lack of space are available in a research report [becker21rr].
2 Giant-Step Assertion Checking
We consider here a basic programming language with mutable variables, function calls and while loops, where functions are annotated with contracts (a pre-condition and a post-condition) and loops are annotated with a loop invariant. The language also includes a standard assert statement, as shown in the example above.
Giant-step assertion checking.
It corresponds to standard runtime assertion checking (RAC), in the sense that annotations (i.e., assertions, pre- and post-conditions, and loop invariants) are checked when they are encountered during execution. It differs from the standard RAC in the treatment of function calls and loops: instead of executing their body statements, they only involve a single, “giant” step. Giant-step execution is parameterised by an oracle, from which one retrieves the values of variables that are written by a function call or a loop. These written variables could be automatically inferred, but for simplicity we require them to be indicated by a writes clause. Therefore, a function declaration has the form:
and a loop has the form:
where are fresh variables.
Figure 2 presents a pseudo-code for giant-step evaluation of a program expression (see [becker21rr] for a formal presentation). This execution form is inspired by the weakest precondition calculus, specifically the rules for calls and loops that we remind in Figure 3. The execution may fail or get stuck in a number of situations:
-
•
Line 7: if the pre-condition of a call to is false, the execution must fail (as in standard RAC).
-
•
Line 9: if the post-condition of a call to is false, the values from the oracle are incompatible with the postcondition of . A stuck execution is reported, and the counterexample will be declared invalid.
-
•
Line 12: as in standard assertion checking, a failure is reported for the invariant initialisation.
-
•
Line 14: if the invariant is false, the oracle does not provide valid values to continue the execution. A stuck execution is reported, and the counterexample will be invalid.
-
•
Line 15: if the loop condition is false after setting the values of written variables in the context, the oracle covers an execution that goes beyond the loop, so we just terminate its execution.
-
•
Line 17: if invariant is not preserved, we report an assertion failure.
-
•
Line 18: if invariant is preserved, it means the oracle is not appropriate for identifying any failure, we report a stuck execution.
Categorisation of counterexample.
As shown on Fig. 1, assume that a VC for a program is not validated by some SMT solver, which returns a model, which is turned into a candidate counterexample. We categorise this counterexample as follows (stopping when the first statement is met):
-
1.
Run the standard RAC on the enclosing function of the VC, with arguments and values of global variables taken from the counterexample. If the result is
-
(a)
a failure at the same source location as the VC, we report a non-conformance: the code does not satisfy the corresponding annotation.
-
(b)
a failure at a different source location as the VC, the counterexample is bad (is not suitable to explain the failed proof), although it deserves to be reported as a non-conformance elsewhere in the code: it exposes an execution where the program does not satisfy some annotation.
-
(a)
-
2.
Run the giant-step RAC on the enclosing function of the VC, with inputs and written variables given by the counterexample seen as an oracle. If the result is
-
(a)
a failure, we report a sub-contract weakness: some post-condition or loop invariant in the context is too weak, as in the introductory toy example.
-
(b)
a stuck execution, the counterexample is invalid and discarded: one of the premises of the VC is not satisfied, and we can suspect a prover incompleteness.
-
(c)
a normal execution, the counterexample is discarded: it satisfies the conclusion of the VC.
-
(a)