Proof Repair across Type Equivalences

Talia Ringer University of WashingtonUSA [email protected] , RanDair Porter University of WashingtonUSA [email protected] , Nathaniel Yazdani Northeastern UniversityUSA [email protected] , John Leo Halfaya ResearchUSA [email protected] and Dan Grossman University of WashingtonUSA [email protected]

(2021)

Abstract.

We describe a new approach to automatically repairing broken proofs in the Coq proof assistant in response to changes in types. Our approach combines a configurable proof term transformation with a decompiler from proof terms to suggested tactic scripts. The proof term transformation implements transport across equivalences in a way that removes references to the old version of the changed type and does not rely on axioms beyond those Coq assumes.

We have implemented this approach in Pumpkin Pi, an extension to the Pumpkin Patch Coq plugin suite for proof repair. We demonstrate Pumpkin Pi’s flexibility on eight case studies, including supporting a benchmark from a user study, easing development with dependent types, porting functions and proofs between unary and binary numbers, and supporting an industrial proof engineer to interoperate between Coq and other verification tools more easily.

proof engineering, proof repair, proof reuse

^†^†copyright: acmlicensed^†^†price: 15.00^†^†doi: 10.1145/3453483.3454033^†^†journalyear: 2021^†^†submissionid: pldi21main-p43-p^†^†isbn: 978-1-4503-8391-2/21/06^†^†conference: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation; June 20–25, 2021; Virtual, Canada^†^†booktitle: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI ’21), June 20–25, 2021, Virtual, Canada^†^†ccs: Software and its engineering Formal software verification^†^†ccs: Software and its engineering Software evolution^†^†ccs: Theory of computation Type theory

1. Introduction

Program verification with interactive theorem provers has come a long way since its inception, especially when it comes to the scale of programs that can be verified. The seL4 (Klein et al., 2009) verified operating system kernel, for example, is the effort of a team of proof engineers spanning more than a million lines of proof, costing over 20 person-years. Given a famous 1977 critique of verification (DeMillo et al., 1977) (emphasis ours):

A sufficiently fanatical researcher might be willing to devote two or three years to verifying a significant piece of software if he could be assured that the software would remain stable.

we could argue that, over 40 years, either verification has become easier, or researchers have become more fanatical. Unfortunately, not all has changed (emphasis still ours):

But real-life programs need to be maintained and modified. There is no reason to believe that verifying a modified program is any easier than verifying the original the first time around.

Tools that can automatically refactor or repair proofs (Wibergh, 2019; Whiteside, 2013; Dietrich et al., 2013; Adams, 2015; Bourke et al., 2012; Roe and Smith, 2016; Robert, 2018; Ringer et al., 2018) give us reason to believe that verifying a modified program can sometimes be easier than verifying the original, even when proof engineers do not follow good development processes, or when change occurs outside of proof engineers’ control (Ringer et al., 2019a). Still, maintaining verified programs can be challenging: it means keeping not just the programs, but also specifications and proofs about those programs up-to-date. This remains so difficult that sometimes, even experts give up in the face of change (Ringer et al., 2020).

The problem of automatically updating proofs in response to changes in programs or specifications is known as proof repair (Ringer et al., 2019a; Ringer et al., 2018). While there are many ways proofs need to be repaired, one such need is in response to a changed type definition (Section 3). We make progress on two open challenges in proof repair in response to changes in type definitions:

(1)

Existing work supports very limited classes of these changes like non-structural changes (Ringer et al., 2018) or a predefined set of changes (Robert, 2018; Wibergh, 2019), and these are not informed by the needs of proof engineers (Ringer et al., 2020).
(2)

Proof repair tools are not yet integrated with typical proof engineering workflows (Ringer et al., 2019a; Ringer et al., 2018; Robert, 2018), and may impose additional proof obligations like proving relations corresponding to changes (Ringer et al., 2019b).

The typical proof engineering workflow in Coq is interactive: The proof engineer passes Coq high-level search procedures called tactics (like induction), and Coq responds to each tactic by refining the current goal to some subgoal (like the proof obligation for the base case). This loop of tactics and goals continues until no goals remain, at which point the proof engineer has constructed a sequence of tactics called a proof script. To check this proof script for correctness, Coq compiles it to a low-level representation called a proof term, then checks that the proof term has the expected type.

Our approach to proof repair works at the level of low-level proof terms, then builds back up to high-level proof scripts. In particular, our approach combines a configurable proof term transformation (Section 4) with a prototype decompiler from proof terms back to suggested proof scripts (Section 5). This is implemented (Section 6) in Pumpkin Pi, an extension to the Pumpkin Patch (Ringer et al., 2018) proof repair plugin suite for Coq 8.8 that is available on Github.¹¹1We annotate each claim to which code is relevant with a circled number like . These circled numbers are links to code, and are detailed in GUIDE.md.

Addressing Challenge 1: Flexible Type Support. The case studies in Section 7—summarized in Table 1 on page 1—show that Pumpkin Pi is flexible enough to support a wide range of proof engineering use cases. In general, Pumpkin Pi can support any change described by an equivalence, though it takes the equivalence in a deconstructed form that we call a configuration. The configuration expresses to the proof term transformation how to translate functions and proofs defined over the old version of a type to refer only to the new version, and how to do so in a way that does not break definitional equality. The proof engineer can write this configuration in Coq and feed it to Pumpkin Pi (manual configuration in Table 1), configuring Pumpkin Pi to support the change.

Addressing Challenge 2: Workflow Integration. Research on workflow integration for proof repair tools is in its infancy. Pumpkin Pi is built with workflow integration in mind. For example, Pumpkin Pi is the only proof repair tool we are aware of that produces suggested proof scripts (rather than proof terms) for repaired proofs, a challenge highlighted in existing proof repair work (Ringer et al., 2018; Robert, 2018) and in a survey of proof engineering (Ringer et al., 2019a). In addition, Pumpkin Pi implements search procedures that automatically discover configurations and prove the equivalences they induce for four different classes of changes (automatic configuration in Table 1), decreasing the burden of proof obligations imposed on the proof engineer. Our partnership with an industrial proof engineer has informed other changes to further improve workflow integration (Sections 6.1 and 7).

Refer to caption — Figure 1. The workflow for Pumpkin Pi.

Bringing it Together. Figure 1 shows how this comes together when the proof engineer invokes Pumpkin Pi:

(1)

The proof engineer Configures Pumpkin Pi, either manually or automatically.
(2)

The configured Transform transforms the old proof term into the new proof term.
(3)

Decompile suggests a new proof script.

There are currently four search procedures for automatic configuration implemented in Pumpkin Pi (see Table 1 on page 1). Manual configuration makes it possible for the proof engineer to configure the transformation to any equivalence, even without a search procedure. Section 7 shows examples of both workflows applied to real scenarios.

2. A Simple Motivating Example

⬇

Inductive list (T : Type) : Type :=

| nil : list T

| cons : T

\rightarrow

list T

\rightarrow

list T.

⬇

Inductive list (T : Type) : Type :=

| cons : T

\rightarrow

list T

\rightarrow

list T

| nil : list T.

Figure 2. A change from the old version of list (left) to the new version of list (right). The old version of list is an inductive datatype that is either empty (the nil constructor), or the result of placing an element in front of another list (the cons constructor). The change swaps these two constructors (orange).

⬇ swap T (l : Old.list T) : New.list T := Old.list_rect T (fun (l : Old.list T) => New.list T) New.nil (fun t _ (IHl : New.list T) => New.cons T t IHl) l. Lemma section: $\forall$ T (l : Old.list T), swap^-1 T (swap T l) = l. Proof. intros T l. symmetry. induction l as [ |a l0 H]. - auto. - simpl. rewrite $\leftarrow$ H. auto. Qed. ⬇ swap^-1 T (l : New.list T) : Old.list T := New.list_rect T (fun (l : New.list T) => Old.list T) (fun t _ (IHl : Old.list T) => Old.cons T t IHl) Old.nil l. Lemma retraction: $\forall$ T (l : New.list T), swap T (swap^-1 T l) = l. Proof. intros T l. symmetry. induction l as [t l0 H| ]. - simpl. rewrite $\leftarrow$ H. auto. - auto. Qed.

Figure 3. Two functions between Old.list and New.list (top) that form an equivalence (bottom).

Consider a simple example of using Pumpkin Pi: repairing proofs after swapping the two constructors of the list datatype (Figure 2). This is inspired by a similar change from a user study of proof engineers (Section 7). Even such a simple change can cause trouble, as in this proof from the Coq standard library (comments ours for clarity):²²2We use induction instead of pattern matching.

⬇

Lemma rev_app_distr {A} :

\forall

(x y : list A), rev (x ++ y) = rev y ++ rev x.

Proof. (* by induction over x and y *)

induction x as [| a l IHl].

(* x nil: *) induction y as [| a l IHl].

(* y nil: *) simpl. auto.

(* y cons *) simpl. rewrite app_nil_r; auto.

(* both cons: *) intro y. simpl.

rewrite (IHl y). rewrite app_assoc; trivial.

Qed.

This lemma says that appending (++) two lists and reversing (rev) the result behaves the same as appending the reverse of the second list onto the reverse of the first list. The proof script works by induction over the input lists x and y: In the base case for both x and y, the result holds by reflexivity. In the base case for x and the inductive case for y, the result follows from the existing lemma app_nil_r. Finally, in the inductive case for both x and y, the result follows by the inductive hypothesis and the existing lemma app_assoc.

When we change the list type, this proof no longer works. To repair this proof with Pumpkin Pi, we run this command:

⬇

Repair Old.list New.list in rev_app_distr.

assuming the old and new list types from Figure 2 are in modules Old and New. This suggests a proof script that succeeds (in light blue to denote Pumpkin Pi produces it automatically):

⬇

Proof. (* by induction over x and y *)

intros x. induction x as [a l IHl| ]; intro y0.

- (* both cons: *) simpl. rewrite IHl. simpl.

rewrite app_assoc. auto.

- (* x nil: *) induction y0 as [a l H| ].

+ (* y cons: *) simpl. rewrite app_nil_r. auto.

+ (* y nil: *) auto.

Qed.

where the dependencies (rev, ++, app_assoc, and app_nil_r) have also been updated automatically . If we would like, we can manually modify this to something that more closely matches the style of the original proof script:

⬇

Proof. (* by induction over x and y *)

induction x as [a l IHl|].

(* both cons: *) intro y. simpl.

rewrite (IHl y). rewrite app_assoc; trivial.

(* x nil: *) induction y as [a l IHl|].

(* y cons: *) simpl. rewrite app_nil_r; auto.

(* y nil: *) simpl. auto.

Qed.

We can even repair the entire list module from the Coq standard library all at once by running the Repair module command . When we are done, we can get rid of Old.list.

The key to success is taking advantage of Coq’s structured proof term language: Coq compiles every proof script to a proof term in a rich functional programming language called Gallina—Pumpkin Pi repairs that term. Pumpkin Pi then decompiles the repaired proof term (with optional hints from the original proof script) back to a suggested proof script that the proof engineer can maintain.

In contrast, updating the poorly structured proof script directly would not be straightforward. Even for the simple proof script above, grouping tactics by line, there are $6!=720$ permutations of this proof script. It is not clear which lines to swap since these tactics do not have a semantics beyond the searches their evaluation performs. Furthermore, just swapping lines is not enough: even for such a simple change, we must also swap arguments, so induction x as [| a l IHl] becomes induction x as [a l IHl|]. Robert (2018) describes the challenges of repairing tactics in detail. Pumpkin Pi’s approach circumvents this challenge.

3. Problem Definition

Pumpkin Pi can do much more than permute constructors. Given an equivalence between types $A$ and $B$ , Pumpkin Pi repairs functions and proofs defined over $A$ to instead refer to $B$ (Section 3.1). It does this in a way that allows for removing references to $A$ , which is essential for proof repair, since $A$ may be an old version of an updated type (Section 3.2).

3.1. Scope: Type Equivalences

Pumpkin Pi repairs proofs in response to changes in types that correspond to type equivalences (Univalent Foundations Program, 2013), or pairs of functions that map between two types and are mutual inverses.³³3The adjoint follows, and Pumpkin Pi includes machinery to prove it . When a type equivalence between types $A$ and $B$ exists, those types are equivalent (denoted $A$ $\simeq$ $B$ ). Figure 3 shows a type equivalence between the two versions of list from Figure 2 that Pumpkin Pi discovered and proved automatically .

To give some intuition for what kinds of changes can be described by equivalences, we preview two changes below. See Table 1 on page 1 for more examples.

Factoring out Constructors. Consider changing the type I to the type J in Figure 4. J can be viewed as I with its two constructors A and B pulled out to a new argument of type bool for a single constructor. With Pumpkin Pi, the proof engineer can repair functions and proofs about I to instead use J, as long as she configures Pumpkin Pi to describe which constructor of I maps to true and which maps to false. This information about constructor mappings induces an equivalence I $\simeq$ J across which Pumpkin Pi repairs functions and proofs. File shows an example of this, mapping A to true and B to false, and repairing proofs of De Morgan’s laws.

Adding a Dependent Index. At first glance, the word equivalence may seem to imply that Pumpkin Pi can support only changes in which the proof engineer does not add or remove information. But equivalences are more powerful than they may seem. Consider, for example, changing a list to a length-indexed vector (Figure 5). Pumpkin Pi can repair functions and proofs about lists to functions and proofs about vectors of particular lengths , since $\Sigma$ (l:list T).length l = n $\simeq$ vector T n. From the proof engineer’s perspective, after updating specifications from list to vector, to fix her functions and proofs, she must additionally prove invariants about the lengths of her lists. Pumpkin Pi makes it easy to separate out that proof obligation, then automates the rest.

More generally, in homotopy type theory, with the help of quotient types, it is possible to form an equivalence from a relation, even when the relation is not an equivalence (Angiuli et al., 2020). While Coq lacks quotient types, it is possible to achieve a similar outcome and use Pumpkin Pi for changes that add or remove information when those changes can be expressed as equivalences between $\Sigma$ types or sum types.

3.2. Goal: Transport with a Twist

The goal of Pumpkin Pi is to implement a kind of proof reuse known as transport (Univalent Foundations Program, 2013), but in a way that is suitable for repair. Informally, transport takes a term $t$ and produces a term $t^{\prime}$ that is the same as $t$ modulo an equivalence $A\simeq B$ . If $t$ is a function, then $t^{\prime}$ behaves the same way modulo the equivalence; if $t$ is a proof, then $t^{\prime}$ proves the same theorem the same way modulo the equivalence.

When transport across $A\simeq B$ takes $t$ to $t^{\prime}$ , we say that $t$ and $t^{\prime}$ are equal up to transport across that equivalence (denoted $t\equiv_{A\simeq B}t^{\prime}$ ).⁴⁴4This notation should be interpreted in a metatheory with univalence—a property that Coq lacks—or it should be approximated in Coq. The details of transport with univalence are in Univalent Foundations Program (2013), and an approximation in Coq is in Tabareau et al. (2018). For equivalent $A$ and $B$ , there can be many equivalences $A\simeq B$ . Equality up to transport is across a particular equivalence, but we erase this in the notation. In Section 2, the original append function ++ over Old.list and the repaired append function ++ over New.list that Pumpkin Pi produces are equal up to transport across the equivalence from Figure 3, since (by app_ok ):

⬇

\forall

T (l1 l2 : Old.list T),

swap T (l1 ++ l2) = (swap T l1) ++ (swap T l2).

The original rev_app_distr is equal to the repaired proof up to transport, since both prove the same thing the same way up to the equivalence, and up to the changes in ++ and rev.

Transport typically works by applying the functions that make up the equivalence to convert inputs and outputs between types. This approach would not be suitable for repair, since it does not make it possible to remove the old type $A$ . Pumpkin Pi implements transport in a way that allows for removing references to $A$ —by proof term transformation.

⬇

Inductive I :=

| A : I

| B : I.

⬇

Inductive J :=

| makeJ : bool

\rightarrow

Figure 4. The old type I (left) is either A or B. The new type J (right) is I with A and B factored out to bool (orange).

⬇

Inductive list (T : Type) : Type :=

| nil : list T

| cons : T

\rightarrow

list T

\rightarrow

list T.

⬇

Inductive vector (T : Type) : nat

\rightarrow

Type :=

| nil : vector T O

| cons : T

\rightarrow

\forall

(n : nat), vector T n

\rightarrow

vector T (S n).

Figure 5. A vector (bottom) is a list (top) indexed by its length (orange). Vectors effectively make it possible to enforce length invariants about lists at compile time.

{grammar}

¡i¿ $\in\mathrm{I\!N}$ , ¡v¿ $\in$ Vars, ¡s¿ $\in$ { Prop, Set, Type¡i¿ }

¡t¿ ::= ¡v¿ — ¡s¿ — $\Pi$ (¡v¿ : ¡t¿) . ¡t¿ — $\lambda$ (¡v¿ : ¡t¿) . ¡t¿ — ¡t¿ ¡t¿ — Ind (¡v¿ : ¡t¿){¡t¿,…,¡t¿} — Constr (¡i¿, ¡t¿) — Elim(¡t¿, ¡t¿){¡t¿,…,¡t¿}

Figure 6. Syntax for CIC_ω from Timany and Jacobs (2015) with (from left to right) variables, sorts, dependent types, functions, application, inductive types, inductive constructors, and primitive eliminators.

4. The Transformation

⬇

DepConstr(0, list T) : list T := Constr(0, list T).

DepConstr(1, list T) t l : list T :=

Constr (1, list T) t l.

DepElim(l, P) { p_nil, p_cons } : P l :=

Elim(l, P) { p_nil, p_cons }.

⬇

DepConstr(0, list T) : list T := Constr(1, list T).

DepConstr(1, list T) t l : list T :=

Constr(0, list T) t l.

DepElim(l, P) { p_nil, p_cons } : P l :=

Elim(l, P) { p_cons, p_nil }.

Figure 7. The dependent constructors and eliminators for old (left) and new (right) list, with the difference in orange.

At the heart of Pumpkin Pi is a configurable proof term transformation for transporting proofs across equivalences . It is a generalization of the transformation from an earlier version of Pumpkin Pi called Devoid (Ringer et al., 2019b), which solved this problem a particular class of equivalences.

The transformation takes as input a deconstructed equivalence that we call a configuration. This section introduces the configuration (Section 4.1), defines the transformation that builds on that (Section 4.2), then specifies correctness criteria for the configuration (Section 4.3). Section 6.1 describes the additional work needed to implement this transformation.

Conventions. All terms that we introduce in this section are in the Calculus of Inductive Constructions (CIC_ω), the type theory that Coq’s proof term language Gallina implements. CIC_ω is based on the Calculus of Constructions (CoC), a variant of the lambda calculus with polymorphism (types that depend on types) and dependent types (types that depend on terms) (Coquand and Huet, 1986). CIC_ω extends CoC with inductive types (Coquand and Paulin, 1990). Inductive types are defined solely by their constructors (like nil and cons for list) and eliminators (like the induction principle for list); this section assumes that these eliminators are primitive.

The syntax for CIC_ω with primitive eliminators is in Figure 6. The typing rules are standard. We assume inductive types $\Sigma$ with constructor $\exists$ and projections $\pi_{l}$ and $\pi_{r}$ , and an equality type = with constructor eq_refl. We use $\vec{t}$ and $\{t_{1},\ldots,t_{n}\}$ to denote lists of terms.

4.1. The Configuration

The configuration is the key to building a proof term transformation that implements transport in a way that is suitable for repair. Each configuration corresponds to an equivalence $A$ $\simeq$ $B$ . It deconstructs the equivalence into things that talk about $A$ , and things that talk about $B$ . It does so in a way that hides details specific to the equivalence, like the order or number of arguments to an induction principle or type.

At a high level, the configuration helps the transformation achieve two goals: preserve equality up to transport across the equivalence between $A$ and $B$ , and produce well-typed terms. This configuration is a pair of pairs:

⬇

((DepConstr, DepElim), (Eta, Iota))

each of which corresponds to one of the two goals: DepConstr and DepElim define how to transform constructors and eliminators, thereby preserving the equivalence, and Eta and Iota define how to transform $\eta$ -expansion and $\iota$ -reduction of constructors and eliminators, thereby producing well-typed terms. Each of these is defined in CIC_ω for each equivalence.

Preserving the Equivalence. To preserve the equivalence, the configuration ports terms over $A$ to terms over $B$ by viewing each term of type $B$ as if it were an $A$ . This way, the rest of the transformation can replace values of $A$ with values of $B$ , and inductive proofs about $A$ with inductive proofs about $B$ , all without changing the order or number of arguments.

The configuration parts responsible for this are DepConstr and DepElim (dependent constructors and eliminators). These describe how to construct and eliminate $A$ and $B$ , wrapping the types with a common inductive structure. The transformation requires the same number of dependent constructors and cases in dependent eliminators for $A$ and $B$ , even if $A$ and $B$ are types with different numbers of constructors ( $A$ and $B$ need not even be inductive; see Sections 4.3 and 7).

For the list change from Section 2, the configuration that Pumpkin Pi discovers uses the dependent constructors and eliminators in Figure 7. The dependent constructors for Old.list are the normal constructors with the order unchanged, while the dependent constructors for New.list swap constructors back to the original order. Similarly, the dependent eliminator for Old.list is the normal eliminator for Old.list, while the dependent eliminator for New.list swaps cases.

As the name hints, these constructors and eliminators can be dependent. Consider the type of vectors of some length:

⬇

packed_vect T :=

\Sigma

(n : nat).vector T n.

Pumpkin Pi can port proofs across the equivalence between this type and list T . The dependent constructors Pumpkin Pi discovers pack the index into an existential, like:

⬇

DepConstr(0, packed_vect) : packed_vect T :=

\exists

(Constr(0, nat)) (Constr(0, vector T)).

and the eliminator it discovers eliminates the projections:

⬇

DepElim(s, P) { f₀ f₁ } : P (

\exists

(

\pi_{l}

s) (

\pi_{r}

s)) :=

Elim(

\pi_{r}

\lambda

(n : nat)(v : vector T n).P (

\exists

n v)) {

f₀,

(

\lambda

(t : T)(n : nat)(v : vector T n).f₁ t (

\exists

n v))

In both these examples, the interesting work moves into the configuration: the configuration for the first swaps constructors and cases, and the configuration for the second maps constructors and cases over list to constructors and cases over packed_vect. That way, the transformation need not add, drop, or reorder arguments. Furthermore, both examples use automatic configuration, so Pumpkin Pi’s Configure component discovers DepConstr and DepElim from just the types $A$ and $B$ , taking care of even the difficult work.

Producing Well-Typed Terms. The other configuration parts Eta and Iota deal with producing well-typed terms, in particular by transporting equalities. CIC_ω distinguishes between two important kinds of equality: those that hold by reduction (definitional equality), and those that hold by proof (propositional equality). That is, two terms t and t’ of type T are definitionally equal if they reduce to the same normal form, and propositionally equal if there is a proof that t = t’ using the inductive equality type = at type T. Definitionally equal terms are necessarily propositionally equal, but the converse is not in general true.

When a datatype changes, sometimes, definitional equalities defined over the old version of that type must become propositional. A naive proof term transformation may fail to generate well-typed terms if it does not account for this. Otherwise, if the transformation transforms a term t : T to some t’ : T’, it does not necessarily transform T to T’ (Tabareau et al., 2019).

Eta and Iota describe how to transport equalities. More formally, they define $\eta$ -expansion and $\iota$ -reduction of $A$ and $B$ , which may be propositional rather than definitional, and so must be explicit in the transformation. $\eta$ -expansion describes how to expand a term to apply a constructor to an eliminator in a way that preserves propositional equality, and is important for defining dependent eliminators (nLab authors, 2020b). $\iota$ -reduction ( $\beta$ -reduction for inductive types) describes how to reduce an elimination of a constructor (nLab authors, 2020a).

The configuration for the change from list to packed_vect has propositional Eta. It uses $\eta$ -expansion for $\Sigma$ :

⬇

Eta(packed_vect) :=

\lambda

(s:packed_vect).

\exists

(

\pi_{l}

s) (

\pi_{r}

s).

which is propositional and not definitional in Coq. Thanks to this, we can forego the assumption that our language has primitive projections (definitional $\eta$ for $\Sigma$ ).

⬇

Inductive nat :=

| O : nat

| S : nat

\rightarrow

nat.

⬇

Inductive positive :=

| xI : positive

\rightarrow

positive

| xO : positive

\rightarrow

positive

| xH : positive.

Inductive N :=

| N0 : N

| Npos : positive

\rightarrow

Figure 8. A unary natural number nat (left) is either zero (0) or the successor of some other natural number (S). A binary natural number N (right) is either zero (N0) or a positive binary number (Npos), where a positive binary number is either 1 (xH), or the result of shifting left and adding 1 (xI) or 0 (xO). Unary and binary natural numbers are equivalent, but have different inductive structures. Consequentially, definitional equalities over nat may become propositional over N.

Each Iota—one per constructor—describes and proves the $\iota$ -reduction behavior of DepElim on the corresponding case. This is needed, for example, to port proofs about unary numbers nat to proofs about binary numbers N (Figure 8). While we can define DepConstr and DepElim to induce an equivalence between them , we run into trouble reasoning about applications of DepElim, since proofs about nat that hold by reflexivity do not necessarily hold by reflexivity over N. For example, in Coq, while S (n + m) = S n + m holds by reflexivity over nat, when we define + with DepElim over N, the corresponding theorem over N does not hold by reflexivity.

To transform proofs about nat to proofs about N, we must transform definitional $\iota$ -reduction over nat to propositional $\iota$ -reduction over N. For our choice of DepConstr and DepElim, $\iota$ -reduction is definitional over nat, since a proof of:

⬇

\forall

P p

{}_{\texttt{0}}

{}_{\texttt{S}}

DepElim(DepConstr(1, nat) n, P) { p

{}_{\texttt{0}}

, p

{}_{\texttt{S}}

} =

{}_{\texttt{S}}

n (DepElim(n, P) { p

{}_{\texttt{0}}

, p

{}_{\texttt{S}}

}).

holds by reflexivity. Iota for nat in the S case is a rewrite by that proof by reflexivity , with type:

⬇

\forall

P p

{}_{\texttt{0}}

{}_{\texttt{S}}

n (Q: P (DepConstr(1, nat) n)

\rightarrow

s),

Iota(1, nat, Q) :

Q (p

{}_{\texttt{S}}

n (DepElim(n, P) { p

{}_{\texttt{0}}

, p

{}_{\texttt{S}}

}))

\rightarrow

Q (DepElim(DepConstr(1, nat) n, P) { p

{}_{\texttt{0}}

, p

{}_{\texttt{S}}

}).

In contrast, $\iota$ for N is propositional, since the theorem:

⬇

\forall

P p

{}_{\texttt{0}}

{}_{\texttt{S}}

DepElim(DepConstr(1, N) n, P) { p

{}_{\texttt{0}}

, p

{}_{\texttt{S}}

} =

{}_{\texttt{S}}

n (DepElim(n, P) { p

{}_{\texttt{0}}

, p

{}_{\texttt{S}}

}).

no longer holds by reflexivity. Iota for N is a rewrite by the propositional equality that proves this theorem , with type:

⬇

\forall

P p

{}_{\texttt{0}}

{}_{\texttt{S}}

n (Q: P (DepConstr(1, N) n)

\rightarrow

s),

Iota(1, N, Q) :

Q (p

{}_{\texttt{S}}

n (DepElim(n, P) { p

{}_{\texttt{0}}

, p

{}_{\texttt{S}}

}))

\rightarrow

Q (DepElim(DepConstr(1, N) n, P) { p

{}_{\texttt{0}}

, p

{}_{\texttt{S}}

}).

By replacing Iota over nat with Iota over N, the transformation replaces rewrites by reflexivity over nat to rewrites by propositional equalities over N. That way, DepElim behaves the same over nat and N.

Taken together over both $A$ and $B$ , Iota describes how the inductive structures of $A$ and $B$ differ. The transformation requires that DepElim over $A$ and over $B$ have the same structure as each other, so if $A$ and $B$ themselves have the same inductive structure (if they are ornaments (McBride, 2011)), then if $\iota$ is definitional for $A$ , it will be possible to choose DepElim with definitional $\iota$ for $B$ . Otherwise, if $A$ and $B$ (like nat and N) have different inductive structures, then definitional $\iota$ over one would become propositional $\iota$ over the other.

4.2. The Proof Term Transformation

{mathpar}\mprset

flushleft $\Gamma$ $\vdash$ $t$ $\Uparrow$ $t^{\prime}$

\inferrule

[Dep-Elim] Γ⊢a ⇑b
Γ⊢p_a ⇑p_b
Γ⊢→f_al ⇑→f_b Γ⊢DepElim(a, p_a) →f_a ⇑DepElim(b, p_b) →f_b

\inferrule

[Dep-Constr] Γ⊢→t_a ⇑→t_b Γ⊢DepConstr(j, A) →t_a ⇑DepConstr(j, B) →t_b

\inferrule

[Eta]
Γ⊢Eta(A) ⇑Eta(B)

\inferrule

[Iota] Γ⊢q_A ⇑q_B
Γ⊢→t_A ⇑→t_B Γ⊢Iota(j, A, q_A) →t_A ⇑Iota(j, B, q_B) →t_B

\inferrule

[Equivalence]
Γ⊢A ⇑B

\inferrule

[Constr] Γ⊢T ⇑T’
Γ⊢→t ⇑→t’ Γ⊢Constr(j, T) →t ⇑Constr(j, T’) →t’

\inferrule

[Ind] Γ⊢T ⇑T’
Γ⊢→C ⇑→C’ Γ⊢Ind (Ty : T) →C ⇑Ind (Ty : T’) →C’

\inferrule

[App] Γ⊢f ⇑f’
Γ⊢t ⇑t’ Γ⊢f t ⇑f’ t’

\inferrule

[Elim] Γ⊢c ⇑c’
Γ⊢Q ⇑Q’
Γ⊢→f ⇑→f’ Γ⊢Elim(c, Q) →f ⇑Elim(c’, Q’) →f’

\inferrule

[Lam] Γ⊢t ⇑t’
Γ⊢T ⇑T’
Γ, t : T ⊢b ⇑b’ Γ⊢λ(t : T).b ⇑λ(t’ : T’).b’

\inferrule

[Prod] Γ⊢t ⇑t’
Γ⊢T ⇑T’
Γ, t : T ⊢b ⇑b’ Γ⊢Π(t : T).b ⇑Π(t’ : T’).b’

\inferrule

[Var] v ∈Vars Γ⊢v ⇑v

Figure 9. Transformation for transporting terms across

A\simeq B

with configuration ((DepConstr, DepElim), (Eta, Iota)).

⬇

(* 1: original term *)

\lambda

(T : Type) (l m : Old.list T) .

Elim(l,

\lambda

(l: Old.list T).Old.list T

\rightarrow

Old.list T)) {

(

\lambda

m . m),

(

\lambda

t _ IHl m . Constr(1, Old.list T) t (IHl m))

} m.

(* 2: after unifying with configuration *)

\lambda

(T : Type) (l m : A) .

DepElim(l,

\lambda

(l: A).A

\rightarrow

A)) {

(

\lambda

m . m)

(

\lambda

t _ IHl m . DepConstr(1, A) t (IHl m))

} m.

⬇

(* 4: reduced to final term *)

\lambda

(T : Type) (l m : New.list T) .

Elim(l,

\lambda

(l: New.list T).New.list T

\rightarrow

New.list T)) {

(

\lambda

t _ IHl m . Constr(0, New.list T) t (IHl m)),

(

\lambda

m . m)

} m.

(* 3: after transforming *)

\lambda

(T : Type) (l m : B) .

DepElim(l,

\lambda

(l: B).B

\rightarrow

B)) {

(

\lambda

m . m)

(

\lambda

t _ IHl m . DepConstr(1, B) t (IHl m))

} m.

Figure 10. Swapping cases of the append function, counterclockwise, the input term: 1) unmodified, 2) unified with the configuration, 3) ported to the updated type, and 4) reduced to the output.

Figure 9 shows the proof term transformation $\Gamma\vdash t\Uparrow t^{\prime}$ that forms the core of Pumpkin Pi. The transformation is parameterized over equivalent types $A$ and $B$ (Equivalence) as well as the configuration. It assumes $\eta$ -expanded functions. It implicitly constructs an updated context $\Gamma^{\prime}$ in which to interpret $t^{\prime}$ , but this is not needed for computation.

The proof term transformation is (perhaps deceptively) simple by design: it moves the bulk of the work into the configuration, and represents the configuration explicitly. Of course, typical proof terms in Coq do not apply these configuration terms explicitly. Pumpkin Pi does some additional work using unification heuristics to get real proof terms into this format before running the transformation. It then runs the proof term transformation, which transports proofs across the equivalence that corresponds to the configuration.

Unification Heuristics. The transformation does not fully describe the search procedure for transforming terms that Pumpkin Pi implements. Before running the transformation, Pumpkin Pi unifies subterms with particular $A$ (fixing parameters and indices), and with applications of configuration terms over $A$ . The transformation then transforms configuration terms over $A$ to configuration terms over $B$ . Reducing the result produces the output term defined over $B$ .

Figure 10 shows this with the list append function ++ from Section 2. To update ++ (top left), Pumpkin Pi unifies Old.list T with $A$ , and Constr and Elim with DepConstr and DepElim (bottom left). After unification, the transformation recursively substitutes $B$ for $A$ , which moves DepConstr and DepElim to construct and eliminate over the updated type (bottom right). This reduces to a term with swapped constructors and cases over New.list T (top right).

In this case, unification is straightforward. This can be more challenging when configuration terms are dependent. This is especially pronounced with definitional Eta and Iota, which typically are implicit (reduced) in real code. To handle this, Pumpkin Pi implements custom unification heuristics for each search procedure that unify subterms with applications of configuration terms, and that instantiate parameters and dependent indices in those subterms . The transformation in turn assumes that all existing parameters and indices are determined and instantiated by the time it runs.

Pumpkin Pi falls back to Coq’s unification for manual configuration and when these custom heuristics fail. When even Coq’s unification is not enough, Pumpkin Pi relies on proof engineers to provide hints in the form of annotations .

⬇

section:

\forall

(a : A), g (f a) = a.

retraction:

\forall

(b : B), f (g b) = b.

constr_ok:

\forall

j

\vec{x_{A}}

\vec{x_{B}}

\vec{x_{A}}

\equiv_{A\simeq B}

\vec{x_{B}}

\rightarrow

DepConstr(

j

, A)

\vec{x_{A}}

\equiv_{A\simeq B}

DepConstr(j, B)

\vec{x_{B}}

elim_ok:

\forall

a b P_A P_B

\vec{f_{A}}

\vec{f_{B}}

\equiv_{A\simeq B}

\rightarrow

P_A

\equiv_{(A\rightarrow s)\simeq(B\rightarrow s)}

P_B

\rightarrow

\forall

j

\vec{f_{A}}

[j]

\equiv_{\xi(A,P_{A},j)\simeq\xi(B,P_{B},j)}

\vec{f_{B}}

[j]

\rightarrow

DepElim(a, P_A)

\vec{f_{A}}

\equiv_{(Pa)\simeq(Pb)}

DepElim(b, P_B)

\vec{f_{A}}

⬇

elim_eta(A):

\forall

a P

\vec{f}

, DepElim(a, P)

\vec{f}

: P (Eta(A) a).

eta_ok(A):

\forall

(a : A), Eta(A) a = a.

constr_ok:

\forall

j

\vec{x_{A}}

\vec{x_{B}}

DepConstr(

j

, A)

\vec{x_{A}}

\equiv_{A\simeq B}

DepConstr(j, B)

\vec{x_{B}}

iota_ok(A):

\forall

j

\vec{f}

\vec{x}

(Q: P(Eta(A) (DepConstr(

j

, A)

\vec{x}

))

\rightarrow

s),

Iota(A, j, Q) :

Q (DepElim(DepConstr(j, A)

\vec{x}

, P)

\vec{f}

)

\rightarrow

Q (rew

\leftarrow

eta_ok(A) (DepConstr(j, A)

\vec{x}

) in

(

\vec{f}

[j]

\ldots

(DepElim(IH₀, P)

\vec{f}

)

\ldots

(DepElim(IH_n, P)

\vec{f}

)

\ldots

)).

Figure 11. Correctness criteria for a configuration to ensure that the transformation preserves equivalence (left) coherently with equality (right, shown for

A

;

B

is similar). f and g are defined in text.

s

\vec{f}

\vec{x}

, and

\vec{\mathtt{IH}}

represent sorts, eliminator cases, constructor arguments, and inductive hypotheses.

\xi

(A,

P,

j)

is the type of DepElim(A, P) at DepConstr(j, A) (similarly for

B

Specifying a Correct Transformation. The implementation of this transformation in Pumpkin Pi produces a term that Coq type checks, and so does not add to the trusted computing base. As Pumpkin Pi is an engineering tool, there is no need to formally prove the transformation correct, though doing so would be satisfying. The goal of such a proof would be to show that if $\Gamma\vdash t\Uparrow t^{\prime}$ , then $t$ and $t^{\prime}$ are equal up to transport, and $t^{\prime}$ refers to $B$ in place of $A$ . The key steps in this transformation that make this possible are porting terms along the configuration (Dep-Constr, Dep-Elim, Eta, and Iota). For metatheoretical reasons, without additional axioms, a proof of this theorem in Coq can only be approximated (Tabareau et al., 2018). It would be possible to generate per-transformation proofs of correctness, but this does not serve an engineering need.

4.3. Specifying Correct Configurations

Choosing a configuration necessarily depends in some way on the proof engineer’s intentions: there can be infinitely many equivalences that correspond to a change, only some of which are useful (for example , any $A$ is equivalent to unit refined by $A$ ). And there can be many configurations that correspond to an equivalence, some of which will produce terms that are more useful or efficient than others (consider DepElim converting through several intermediate types).

While we cannot control for intentions, we can specify what it means for a chosen configuration to be correct: Fix a configuration. Let f be the function that uses DepElim to eliminate $A$ and DepConstr to construct $B$ , and let g be similar. Figure 11 specifies the correctness criteria for the configuration. These criteria relate DepConstr, DepElim, Eta, and Iota in a way that preserves equivalence coherently with equality.

Equivalence. To preserve the equivalence (Figure 11, left), DepConstr and DepElim must form an equivalence (section and retraction must hold for f and g). DepConstr over $A$ and $B$ must be equal up to transport across that equivalence (constr_ok), and similarly for DepElim (elim_ok). Intuitively, constr_ok and elim_ok guarantee that the transformation correctly transports dependent constructors and dependent eliminators, as doing so will preserve equality up to transport for those subterms. This makes it possible for the transformation to avoid applying f and g, instead porting terms from $A$ directly to $B$ .

{grammar}

¡v¿ $\in$ Vars, ¡t¿ $\in$ CIC_ω

¡p¿ ::= intro ¡v¿ — rewrite ¡t¿ ¡t¿ — symmetry — apply ¡t¿ — induction ¡t¿ ¡t¿ { ¡p¿, …, ¡p¿ } — split { ¡p¿, ¡p¿ } — left — right — ¡p¿ . ¡p¿

Figure 12. Qtac syntax.

{mathpar}\mprset

flushleft $\Gamma$ $\vdash$ $t$ $\Rightarrow$ $p$

\inferrule

[Intro] Γ, n : T ⊢b ⇒p Γ⊢λ(n : T) . b ⇒intro n. p

\inferrule

[Symmetry] Γ⊢H ⇒p Γ⊢eq_sym H ⇒symmetry. p

\inferrule

[Split] Γ⊢l ⇒p
Γ⊢r ⇒q Γ⊢Constr(0, ∧) l r ⇒split { p, q }.

\inferrule

[Left] Γ⊢H ⇒p Γ⊢Constr(0, ∨) H ⇒left. p

\inferrule

[Right] Γ⊢H ⇒p Γ⊢Constr(1, ∨) H ⇒right. p

\inferrule

[Rewrite] Γ⊢H_1 : x = y
Γ⊢H_2 ⇒p Γ⊢Elim(H_1, P) { x, H_2, y } ⇒symmetry. rewrite P H_1. p

\inferrule

[Induction] Γ⊢→f ⇒→p Γ⊢Elim(t, P) →f ⇒induction P t →p

\inferrule

[Apply] Γ⊢t ⇒p Γ⊢f t ⇒apply f. p

\inferrule

[Base]
Γ⊢t ⇒apply t

Figure 13. Qtac decompiler semantics.

Equality. To ensure coherence with equality (Figure 11, right), Eta and Iota must prove $\eta$ and $\iota$ . That is, Eta must have the same definitional behavior as the dependent eliminator (elim_eta), and must behave like identity (eta_ok). Each Iota must prove and rewrite along the simplification (refolding (Boutillier, 2014)) behavior that corresponds to a case of the dependent eliminator (iota_ok). This makes it possible for the transformation to avoid applying section and retraction.

Correctness. With these correctness criteria for a configuration, we get the completeness result (proven in Coq ) that every equivalence induces a configuration. We also obtain an algorithm for the soundness result that every configuration induces an equivalence.

The algorithm to prove section is as follows (retraction is similar): replace a with Eta(A) a by eta_ok(A). Then, induct using DepElim over $A$ . For each case $i$ , the proof obligation is to show that g (f a) is equal to a, where a is DepConstr(A, i) applied to the non-inductive arguments (by elim_eta(A)). Expand the right-hand side using Iota(A, i), then expand it again using Iota(B, i) (destructing over each eta_ok to apply the corresponding Iota). The result follows by definition of g and f, and by reflexivity.

Automatic Configuration. Pumpkin Pi implements four search procedures for automatic configuration . Three of the four procedures are based on the search procedure from Devoid (Ringer et al., 2019b), while the remaining procedure instantiates the types $A$ and $B$ of a generic configuration that can be defined inside of Coq directly.

The algorithm above is essentially what Configure uses to generate functions f and g for these configurations , and also generate proofs section and retraction that these functions form an equivalence . To minimize dependencies, Pumpkin Pi does not produce proofs of constr_ok and elim_ok directly, as stating these theorems cleanly would require either a special framework (Tabareau et al., 2018) or a univalent type theory (Univalent Foundations Program, 2013). If the proof engineer wishes, it is possible to prove these in individual cases , but this is not necessary in order to use Pumpkin Pi.

5. Decompiling Proof Terms to Tactics

Transform produces a proof term, while the proof engineer typically writes and maintains proof scripts made up of tactics. We improve usability thanks to the realization that, since Coq’s proof term language Gallina is very structured, we can decompile these Gallina terms to suggested Ltac proof scripts for the proof engineer to maintain.

Decompile implements a prototype of this translation : it translates a proof term to a suggested proof script that attempts to prove the same theorem the same way. Note that this problem is not well defined: while there is always a proof script that works (applying the proof term with the apply tactic), the result is often qualitatively unreadable. This is the baseline behavior to which the decompiler defaults. The goal of the decompiler is to improve on that baseline as much as possible, or else suggest a proof script that is close enough to correct that the proof engineer can manually massage it into something that works and is maintainable.

Decompile achieves this in two passes: The first pass decompiles proof terms to proof scripts that use a predefined set of tactics. The second pass improves on suggested tactics by simplifying arguments, substituting tacticals, and using hints like custom tactics and decision procedures.

First Pass: Basic Proof Scripts. The first pass takes Coq terms and produces tactics in Ltac, the proof script language for Coq. Ltac can be confusing to reason about, since Ltac tactics can refer to Gallina terms, and the semantics of Ltac depends both on the semantics of Gallina and on the implementation of proof search procedures written in OCaml. To give a sense of how the first pass works without the clutter of these details, we start by defining a mini decompiler that implements a simplified version of the first pass. Section 6.2 explains how we scale this to the implementation.

The mini decompiler takes CIC_ω terms and produces tactics in a mini version of Ltac which we call Qtac. The syntax for Qtac is in Figure 12. Qtac includes hypothesis introduction (intro), rewriting (rewrite), symmetry of equality (symmetry), application of a term to prove the goal (apply), induction (induction), case splitting of conjunctions (split), constructors of disjunctions (left and right), and composition (.). Unlike in Ltac, induction and rewrite take a motive explicitly (rather than relying on unification), and apply creates a new subgoal for each function argument.

The semantics for the mini decompiler $\Gamma\vdash t\Rightarrow p$ are in Figure 13 (assuming $=$ , eq_sym, $\wedge$ , and $\vee$ are defined as in Coq). As with the real decompiler, the mini decompiler defaults to the proof script that applies the entire proof term with apply (Base). Otherwise, it improves on that behavior by recursing over the proof term and constructing a proof script using a predefined set of tactics.

For the mini decompiler, this is straightforward: Lambda terms become introduction (Intro). Applications of eq_sym become symmetry of equality (Symmetry). Constructors of conjunction and disjunction map to the respective tactics (Split, Left, and Right). Applications of equality eliminators compose symmetry (to orient the rewrite direction) with rewrites (Rewrite), and all other applications of eliminators become induction (Induction). The remaining applications become apply tactics (Apply). In all cases, the decompiler recurses, breaking into cases, until only the Base case holds.

While the mini decompiler is very simple, only a few small changes are needed to move this to Coq. The generated proof term of rev_app_distr from Section 2, for example, consists only of induction, rewriting, simplification, and reflexivity (solved by auto). Figure 14 shows the proof term for the base case of rev_app_distr alongside the proof script that Pumpkin Pi suggests. This script is fairly low-level and close to the proof term, but it is already something that the proof engineer can step through to understand, modify, and maintain. There are few differences from the mini decompiler needed to produce this, for example handling of rewrites in both directions (eq_ind_r as opposed to eq_ind), simplifying rewrites, and turning applications of eq_refl into reflexivity or auto.

Second Pass: Better Proof Scripts. The implementation of Decompile first runs something similar to the mini decompiler, then modifies the suggested tactics to produce a more natural proof script . For example, it cancels out sequences of intros and revert, inserts semicolons, and removes extra arguments to apply and rewrite. It can also take tactics from the proof engineer (like part of the old proof script) as hints, then iteratively replace tactics with those hints, checking for correctness. This makes it possible for suggested scripts to include custom tactics and decision procedures.

6. Implementation

The transformation and mini decompiler abstract many of the challenges of building a tool for proof engineers. This section describes how we solved some of these challenges.

6.1. Implementing the Transformation

Termination. When a subterm unifies with a configuration term, this suggests that Pumpkin Pi can transform the subterm, but it does not necessarily mean that it should. In some cases, doing so would result in nontermination. For example, if $B$ is a refinement of $A$ , then we can always run Equivalence over and over again, forever. We thus include some simple termination checks in our code .

Intent. Even when termination is guaranteed, whether to transform a subterm depends on intent. That is, Pumpkin Pi automates the case of porting every $A$ to $B$ , but proof engineers sometimes wish to port only some $A$ s to $B$ s. Pumpkin Pi has some support for this using an interactive workflow , with plans for automatic support in the future.

From CIC_ω to Coq. The implementation of the transformation handles language differences to scale from CIC_ω to Coq. We use the existing Preprocess (Ringer et al., 2019b) command to turn pattern matching and fixpoints into eliminators. We handle refolding of constants in constructors using DepConstr.

⬇

fun (y0 : list A) 1 =>

list_rect 2 _ _ (fun a l H 2 =>

eq_ind_r 3 _ eq_refl 4 (app_nil_r (rev l) (a::[])) 3 )

eq_refl 5

y0 2

- intro y0. 1 induction y0 as [a l H|]. 2

+ simpl. rewrite app_nil_r. 3 auto. 4

+ auto. 5

Figure 14. Proof term (top) and decompiled proof script (bottom) for the base case of rev_app_distr (Section 2), with corresponding terms and tactics grouped by color & number.

Reaching Real Proof Engineers. Many of our design decisions in implementing Pumpkin Pi were informed by our partnership with an industrial proof engineer (Section 7). For example, the proof engineer rarely had the patience to wait more than ten seconds for Pumpkin Pi to port a term, so we implemented optional aggressive caching, even caching intermediate subterms encountered while running the transformation . We also added a cache to tell Pumpkin Pi not to $\delta$ -reduce certain terms . With these caches, the proof engineer found Pumpkin Pi efficient enough to use on a code base with tens of thousands of lines of code and proof.

The experiences of proof engineers also inspired new features. For example, we implemented a search procedure to generate custom eliminators to help reason about types like $\Sigma$ (l : list T).length l = n by reasoning separately about the projections . We added informative error messages to help the proof engineer distinguish between user errors and bugs. These features helped with workflow integration.

Table 1. Some changes using Pumpkin Pi (left to right): class of changes, kind of configuration, examples, whether using Pumpkin Pi saved development time relative to reference manual repairs ( if yes, if comparable, if no), and Coq tools we know of that support repair along (Repair) or automatic proof of (Search) the equivalence corresponding to each example. Tools considered are Devoid (Ringer et al., 2019b), the Univalent Parametricity (UP) white-box transformation (Tabareau et al., 2019), and the tool from Magaud and Bertot (2000). Pumpkin Pi is the only one that suggests tactics. More nuanced comparisons to these and more are in Section 8.

Class	Config.	Examples	Sav.	Repair Tools	Search Tools
Algebraic Ornaments	Auto	List to Packed Vector, hs-to-coq		Pumpkin Pi, Devoid, UP	Pumpkin Pi, Devoid
Algebraic Ornaments	Auto	List to Packed Vector, Std. Library		Pumpkin Pi, Devoid, UP	Pumpkin Pi, Devoid
Unpack Sigma Types	Auto	Vector of Given Length, hs-to-coq		Pumpkin Pi, UP	Pumpkin Pi
Tuples & Records	Auto	Simple Records		Pumpkin Pi, UP	Pumpkin Pi
		Parameterized Records		Pumpkin Pi, UP	Pumpkin Pi
		Industrial Use		Pumpkin Pi, UP	Pumpkin Pi
Permute Constructors	Auto	List, Standard Library		Pumpkin Pi, UP	Pumpkin Pi
		Modifying PL, REPLica Benchmark		Pumpkin Pi, UP	Pumpkin Pi
		Large Ambiguous Enum		Pumpkin Pi, UP	Pumpkin Pi
Add new Constructors	Mixed	PL Extension, REPLica Benchmark		Pumpkin Pi	Pumpkin Pi (partial)
Factor Constructors	Manual	External Example		Pumpkin Pi, UP	None
Permute Hypotheses	Manual	External Example		Pumpkin Pi, UP	None
Change Ind. Structure	Manual	Unary to Binary, Classic Benchmark		Pumpkin Pi, Magaud	None
Change Ind. Structure	Manual	Vector to Fin. Set, External Example		Pumpkin Pi	None

6.2. Implementing the Decompiler

From Qtac to Ltac. The mini decompiler assumes more predictable versions of rewrite and induction than those in Coq. Decompile includes additional logic to reason about these tactics . For example, Qtac assumes that there is only one rewrite direction. Ltac has two rewrite directions, and so the decompiler infers the direction from the motive.

Qtac also assumes that both tactics take the inductive motive explicitly, while in Coq, both tactics infer the motive automatically. Consequentially, Coq sometimes fails to infer the correct motive. To handle induction, the decompiler strategically uses revert to manipulate the goal so that Coq can better infer the motive. To handle rewrites, it uses simpl to refold the goal before rewriting. Neither of these approaches is guaranteed to work, so the proof engineer may sometimes need to tweak the suggested proof script appropriately. Even if we pass Coq’s induction principle an explicit motive, Coq still sometimes fails due to unrepresented assumptions. Long term, using another tactic like change or refine before applying these tactics may help with cases for which Coq cannot infer the correct motive.

From CIC_ω to Coq. Scaling the decompiler to Coq introduces let bindings, which are generated by tactics like rewrite in, apply in, and pose. Decompile implements support for rewrite in and apply in similarly to how it supports rewrite and apply, except that it ensures that the unmanipulated hypothesis does not occur in the body of the let expression, it swaps the direction of the rewrite, and it recurses into any generated subgoals. In all other cases, it uses pose, a catch-all for let bindings.

Forfeiting Soundness. While there is a way to always produce a correct proof script, Decompile deliberately forfeits soundness to suggest more useful tactics. For example, it may suggest the induction tactic, but leave the step of motive inference to the proof engineer. We have found these suggested tactics easier to work with (Section 7). Note that in the case the suggested proof script is not quite correct, it is still possible to use the generated proof term directly.

Pretty Printing. After decompiling proof terms, Decompile pretty prints the result . Like the mini decompiler, Decompile represents its output using a predefined grammar of Ltac tactics, albeit one that is larger than Qtac, and that also includes tacticals. It maintains the recursive proof structure for formatting. Pumpkin Pi keeps all output terms from Transform in the Coq environment in case the decompiler does not succeed. Once the proof engineer has the new proof, she can remove the old one.

7. Case Studies: Pumpkin Pi Eight Ways

This section summarizes eight case studies using Pumpkin Pi, corresponding to the eight rows in Table 1. These case studies highlight Pumpkin Pi’s flexibility in handling diverse scenarios, the success of automatic configuration for better workflow integration, the preliminary success of the prototype decompiler, and clear paths to better serving proof engineers. Detailed walkthroughs are in the code.

Algebraic Ornaments: Lists to Packed Vectors. The transformation in Pumpkin Pi is a generalization of the transformation from Devoid. Devoid supported proof reuse across algebraic ornaments, which describe relations between two inductive types, where one type is the other indexed by a fold (McBride, 2011). A standard example is the relation between a list and a length-indexed vector (Figure 5).

Pumpkin Pi implements a search procedure for automatic configuration of algebraic ornaments. The result is all functionality from Devoid, plus tactic suggestions. In file , we used this to port functions and a proof from lists to vectors of some length, since list T $\simeq$ packed_vect T. The decompiler helped us write proofs in the order of hours that we had found too hard to write by hand, though the suggested tactics did need massaging.

⬇

Inductive Term : Set :=

| Var : Identifier

\rightarrow

Term

| Int : Z

\rightarrow

Term

| Eq : Term

\rightarrow

Term

\rightarrow

Term

| Plus : Term

\rightarrow

Term

\rightarrow

Term

| Times : Term

\rightarrow

Term

\rightarrow

Term

| Minus : Term

\rightarrow

Term

\rightarrow

Term

| Choose : Identifier

\rightarrow

Term

\rightarrow

Term.

⬇

Inductive Term : Set :=

| Var : Identifier

\rightarrow

Term

| Bool : Identifier

\rightarrow

Term

| Eq : Term

\rightarrow

Term

\rightarrow

Term

| Int : Z

\rightarrow

Term

| Plus : Term

\rightarrow

Term

\rightarrow

Term

| Times : Term

\rightarrow

Term

\rightarrow

Term

| Minus : Term

\rightarrow

Term

\rightarrow

Term

| Choose : Identifier

\rightarrow

Term

\rightarrow

Term.

Figure 15. A simple language (left) and the same language with two swapped constructors and an added constructor (right).

Unpack Sigma Types: Vectors of Particular Lengths. In the same file , we then ported functions and proofs to vectors of a particular length, like vector T n. Devoid had left this step to the proof engineer. We supported this in Pumpkin Pi by chaining the previous change with an automatic configuration for unpacking sigma types. By composition, this transported proofs across the equivalence from Section 3.

Two tricks helped with better workflow integration for this change: 1) have the search procedure view vector T n as $\Sigma$ (v : vector T m).n = m for some m, then let Pumpkin Pi instantiate those equalities via unification heuristics, and 2) generate a custom eliminator for combining list terms with length invariants. The resulting workflow works not just for lists and vectors, but for any algebraic ornament, automating manual effort from Devoid. The suggested tactics were helpful for writing proofs in the order of hours that we had struggled with manually over the course of days, but only after massaging. More effort is needed to improve tactic suggestions for dependent types.

Tuples & Records: Industrial Use. An industrial proof engineer at the company Galois has been using Pumpkin Pi in proving correct an implementation of the TLS handshake protocol. Galois had been using a custom solver-aided verification language to prove correct C programs, but had found that at times, the constraint solvers got stuck. They had built a compiler that translates their language into Coq’s specification language Gallina, that way proof engineers could finish stuck proofs interactively using Coq. However, due to language differences, they had found the generated Gallina programs and specifications difficult to work with.

The proof engineer used Pumpkin Pi to port the automatically generated functions and specifications to more human-readable functions and specifications, wrote Coq proofs about those functions and specifications, then used Pumpkin Pi to port those proofs back to proofs about the original functions and specifications. So far, they have used at least three automatic configurations, but they most often used an automatic configuration for porting compiler-produced anonymous tuples to named records, as in file . The workflow was a bit nonstandard, so there was little need for tactic suggestions. The proof engineer reported an initial time investment learning how to use Pumpkin Pi, followed by later returns.

Permute Constructors: Modifying a Language. The swapping example from Section 2 was inspired by benchmarks from the Replica user study of proof engineers (Ringer et al., 2020). A change from one of the benchmarks is in Figure 15. The proof engineer had a simple language represented by an inductive type Term, as well as some definitions and proofs about the language. The proof engineer swapped two constructors in the language, and added a new constructor Bool.

This case study and the next case study break this change into two parts. In the first part, we used Pumpkin Pi with automatic configuration to repair functions and proofs about the language after swapping the constructors . With a bit of human guidance to choose the permutation from a list of suggestions, Pumpkin Pi repaired everything, though the original tactics would have also worked, so there was not a difference in development time.

Add new Constructors: Extending a Language. We then used Pumpkin Pi to repair functions after adding the new constructor in Figure 15, separating out the proof obligations for the new constructor from the old terms . This change combined manual and automatic configuration. We defined an inductive type Diff and (using partial automation) a configuration to port the terms across the equivalence Old.Term + Diff $\simeq$ New.Term. This resulted in case explosion, but was formulaic, and pointed to a clear path for automation of this class of changes. The repaired functions guaranteed preservation of the behavior of the original functions.

Adding constructors was less simple than swapping. For example, Pumpkin Pi did not yet save us time over the proof engineer from the user study; fully automating the configuration would have helped significantly. In addition, the repaired terms were (unlike in the swap case) inefficient compared to human-written terms. For now, they make good regression tests for the human-written terms—in the future, we hope to automate the discovery of the more efficient terms, or use the refinement framework CoqEAL (Cohen et al., 2013) to get between proofs of the inefficient and efficient terms.

Factor out Constructors: External Example. The change from Figure 4 came at the request of a non-author. We supported this using a manual configuration that described which constructor to map to true and which constructor to map to false . The configuration was very simple for us to write, and the repaired tactics were immediately useful. The development time savings were on the order of minutes for a small proof development. Since most of the modest development time went into writing the configuration, we expect time savings would increase for a larger development.

Permute Hypotheses: External Example. The change in came at the request of a different non-author (a cubical type theory expert), and shows how to use Pumpkin Pi to swap two hypotheses of a type, since T1 $\rightarrow$ T2 $\rightarrow$ T3 $\simeq$ T2 $\rightarrow$ T1 $\rightarrow$ T3. This configuration was manual. Since neither type was inductive, this change used the generic construction for any equivalence. This worked well, but necessitated some manual annotation due to the lack of custom unification heuristics for manual configuration, and so did not yet save development time, and likely still would not have had the proof development been larger. Supporting custom unification heuristics would improve this workflow.

Change Inductive Structure: Unary to Binary. In , we used Pumpkin Pi to support a classic example of changing inductive structure: updating unary to binary numbers, as in Figure 8. Binary numbers allow for a fast addition function, found in the Coq standard library. In the style of Magaud and Bertot (2000), we used Pumpkin Pi to derive a slow binary addition function that does not refer to nat, and to port proofs from unary to slow binary addition. We then showed that the ported theorems hold over fast binary addition.

The configuration for N used definitions from the Coq standard library for DepConstr and DepElim that had the desired behavior with no changes. Iota over the successor case was a rewrite by a lemma from the standard library that reduced the successor case of the eliminator that we used for DepElim:

⬇

N.peano_rect_succ :

\forall

P pO pS n,

N.peano_rect P pO pS (N.succ n) =

pS n (N.peano_rect P pO pS n).

The need for nontrivial Iota comes from the fact that N and nat have different inductive structures. By writing a manual configuration with this Iota, it was possible for us to implement this transformation that had been its own tool.

While porting addition from nat to N was automatic after configuring Pumpkin Pi, porting proofs about addition took more work. Due to the lack of unification heuristics for manual configuration, we had to annotate the proof term to tell Pumpkin Pi that implicit casts in the inductive cases of proofs were applications of Iota over nat. These annotations were formulaic, but tricky to write. Unification heuristics would go a long way toward improving the workflow.

After annotating, we obtained automatically repaired proofs about slow binary addition, which we found simple to port to fast binary addition. We hope to automate this last step in the future using CoqEAL. Repaired tactics were partially useful, but failed to understand custom eliminators like N.peano_rect, and to generate useful tactics for applications of Iota; both of these are clear paths to more useful tactics. The development time for this proof with Pumpkin Pi was comparable to reference manual repairs by external proof engineers. Custom unification heuristics would help bring returns on investment for experts in this use case.

8. Related Work

Proof Repair. The search procedures in Configure are based partly on ideas from the original Pumpkin Patch prototype (Ringer et al., 2018). The Pumpkin Patch prototype did not apply the patches that it finds, handle changes in structure, or include support for tactics beyond the use of hints.

Proof repair can be viewed as a form of program repair (Monperrus, 2018; Gazzola et al., 2018) for proof assistants. Proof assistants like Coq are a good fit for program repair: A recent paper (Qi et al., 2015) recommends that program repair tools draw on extra information such as specifications or example patches. In Coq, specifications and examples are rich and widely available: specifications thanks to dependent types, and examples thanks to constructivism.

Proof Refactoring. The proof refactoring tool Levity (Bourke et al., 2012) for Isabelle/HOL has seen large-scale industrial use. Levity focuses on a different task: moving lemmas. Chick (Robert, 2018) and RefactorAgda (Wibergh, 2019) are proof refactoring tools in a Gallina-like language and in Agda, respectively. These tools support primarily syntactic changes and do not have tactic support.

A few proof refactoring tools operate directly over tactics: POLAR (Dietrich et al., 2013) refactors proof scripts in languages based on Isabelle/Isar (Wenzel, 2007), CoqPIE (Roe and Smith, 2016) is an IDE with support for simple refactorings of Ltac scripts, and Tactician (Adams, 2015) is a refactoring tool for switching between tactics and tacticals. This approach is not tractable for more complex changes (Robert, 2018).

Proof Reuse. A few proof reuse tools work by proof term transformation and so can be used for repair. Johnsen and Lüth (2004) describes a transformation that generalizes theorems in Isabelle/HOL. Pumpkin Pi generalizes the transformation from Devoid (Ringer et al., 2019b), which transformed proofs along algebraic ornaments (McBride, 2011). Magaud and Bertot (2000) implement a proof term transformation between unary and binary numbers. Both of these fit into Pumpkin Pi configurations, and none suggests tactics in Coq like Pumpkin Pi does. The expansion algorithm from Magaud and Bertot (2000) may help guide the design of unification heuristics in Pumpkin Pi.

The widely used Transfer (Huffman and Kunčar, 2013) package supports proof reuse in Isabelle/HOL. Transfer works by combining a set of extensible transfer rules with a type inference algorithm. Transfer is not yet suitable for repair, as it necessitates maintaining references to both datatypes. One possible path toward implementing proof repair in Isabelle/HOL may be to reify proof terms using something like Isabelle/HOL-Proofs, apply a transformation based on Transfer, and then (as in Pumpkin Pi) decompile those terms to automation that does not apply Transfer or refer to the old datatype in any way.

CoqEAL (Cohen et al., 2013) transforms functions across relations in Coq, and these relations can be more general than Pumpkin Pi’s equivalences. However, while Pumpkin Pi supports both functions and proofs, CoqEAL supports only simple functions due to the problem that Iota addresses. CoqEAL may be most useful to chain with Pumpkin Pi to get faster functions. Both CoqEAL and recent ornaments work (Williams, 2020) may help with better workflow support for changes that do not correspond to equivalences.

The Pumpkin Pi transformation implements transport. Transport is realizable as a function given univalence (Univalent Foundations Program, 2013). UP (Tabareau et al., 2018) approximates it in Coq, only sometimes relying on functional extensionality. While powerful, neither approach removes references to the old type.

Recent work (Tabareau et al., 2019) extends UP with a white-box transformation that may work for repair. This imposes proof obligations on the proof engineer beyond those imposed by Pumpkin Pi, and it includes neither search procedures for equivalences nor tactic script generation. It also does not support changes in inductive structure, instead relying on its original black-box functionality; Iota solves this in Pumpkin Pi. The most fruitful progress may come from combining these tools.

Proof Design. Much work focuses on designing proofs to be robust to change, rather than fixing broken proofs. This can take the form of design principles, like using information hiding techniques (Woos et al., 2016; Klein et al., 2014) or any of the structures (Chrz\kaszcz, 2003; Sozeau and Oury, 2008; Saïbi, 1999) for encoding interfaces in Coq. Design and repair are complementary: design requires foresight, while repair can occur retroactively. Repair can help with changes that occur outside of the proof engineer’s control, or with changes that are difficult to protect against even with informed design.

Another approach to this is to use heavy proof automation, for example through program-specific proof automation (Chlipala, 2013) or general-purpose hammers (Blanchette et al., 2016; Paulson and Blanchette, 2012; Kaliszyk and Urban, 2014; Czajka and Kaliszyk, 2018). The degree to which proof engineers rely on automation varies, as seen in the data from a user study (Ringer et al., 2020). Automation-heavy proof engineering styles localize the burden of change to the automation, but can result in terms that are large and slow to type check, and tactics that can be difficult to debug. While these approaches are complementary, more work is needed for Pumpkin Pi to better support developments in this style.

9. Conclusions & Future Work

We combined search procedures for equivalences, a proof term transformation, and a proof term to tactic decompiler to build Pumpkin Pi, a proof repair tool for changes in datatypes. The proof term transformation implements transport across equivalences in a way that is suitable for repair and that does not compromise definitional equality. The resulting tool is flexible and useful for real proof engineering scenarios.

Future Work. Moving forward, our goal is to make proofs easier to repair and reuse regardless of proof engineering expertise and style. We want to reach more proof engineers, and we want Pumpkin Pi to integrate seamlessly with Coq.

Three problems that we encountered scaling up the Pumpkin Pi transformation were lack of type-directed search, ad hoc termination checks, and inability for proof engineers to add custom unification heuristics. We hope to solve these challenges using e-graphs (Nelson, 1980), a data structure for managing equivalences built with these kinds of problems in mind. E-graphs were recently adapted to express path equality in cubical (Gjørup and Spitters, 2020); we hope to repurpose this insight.

Beyond that, we believe that the biggest gains will come from continuing to improve the prototype decompiler. Two helpful features would be preserving indentation and comments, and automatically using information from the original proof script rather than asking for hints. One promising path toward the latter is to integrate the decompiler with a machine learning tool like TacTok (First et al., 2020) to rank tactic hints. Some improvements could also come from better tactics, or from a more structured tactic language. Integration with version control systems or with integrated development environments could also help. With that, we believe that the future of seamless and powerful proof repair and reuse for all is within reach.

Acknowledgements.

This paper has really been a community effort. Anders Mörtberg and Conor McBride gave us hours worth of detailed feedback that was instrumental to writing this paper. Nicolas Tabareau helped us understand the need to port definitional to propositional equalities. Valentin Robert gave us feedback on usability that informed tool design. Gaëtan Gilbert, James Wilcox, and Jasper Hugunin all at some point helped us write Coq proofs; let this be a record that we owe Gaëtan a beer, and we owe James boba. And of course, we thank our shepherd Gerwin Klein, and we thank all of our reviewers. We got other wonderful feedback on the paper from Cyril Cohen, Tej Chajed, Ben Delaware, Jacob Van Geffen, Janno, James Wilcox, Chandrakana Nandi, Martin Kellogg, Audrey Seo, James Decker, Ben Kushigian, John Regehr, and Justus Adam. The Coq developers have for years given us frequent and efficient feedback on plugin APIs for tool implementation. The programming languages community on Twitter (yes, seriously) has also been essential to this effort. Especially during a pandemic. And we’d like to extend a special thank you to Talia’s mentor, Derek Dreyer. We have also gotten a lot of feedback on future ideas that we are excited to pursue. Carlo Angiuli has helped us understand some beautiful theory beneath our implementation (spoiler: we believe the analogy connecting DepConstr and DepElim to constructors and eliminators has formal meaning in terms of initial algebras), and we are excited to integrate these insights into future papers and use them to generalize our insights. Alex Polozov helped us sketch out ideas for future work with the decompiler. And we got wonderful feedback on e-graph integration for future work from Max Willsey, Chandrakana Nandi, Remy Wang, Zach Tatlock, Bas Spitters, Steven Lyubomirsky, Andrew Liu, Mike He, Ben Kushigian, Gus Smith, and Bill Zorn. This material is based upon work supported by the Sponsor National Science Foundation http://dx.doi.org/10.13039/100000001 Graduate Research Fellowship under Grant No. Grant #DGE-1256082. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

(1)
Adams (2015) Mark Adams. 2015. Refactoring Proofs with Tactician. In Software Engineering and Formal Methods, Domenico Bianculli, Radu Calinescu, and Bernhard Rumpe (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 53–67. https://doi.org/10.1007/978-3-662-49224-6_6
Angiuli et al. (2020) Carlo Angiuli, Evan Cavallo, Anders Mörtberg, and Max Zeuner. 2020. Internalizing Representation Independence with Univalence. arXiv:cs.PL/2009.05547
Blanchette et al. (2016) Jasmin Christian Blanchette, David Greenaway, Cezary Kaliszyk, Daniel Kühlwein, and Josef Urban. 2016. A Learning-Based Fact Selector for Isabelle/HOL. Journal of Automated Reasoning 57, 3 (01 Oct 2016), 219–244. https://doi.org/10.1007/s10817-016-9362-8
Bourke et al. (2012) Timothy Bourke, Matthias Daum, Gerwin Klein, and Rafal Kolanski. 2012. Challenges and Experiences in Managing Large-Scale Proofs. In Intelligent Computer Mathematics. Springer, Berlin, Heidelberg, 32–48. https://doi.org/10.1007/978-3-642-31374-5_3
Boutillier (2014) Pierre Boutillier. 2014. De nouveaux outils pour calculer avec des inductifs en Coq. Theses. Université Paris-Diderot - Paris VII. https://tel.archives-ouvertes.fr/tel-01054723
Chlipala (2013) Adam Chlipala. 2013. Certified Programming with Dependent Types: A Pragmatic Introduction to the Coq Proof Assistant. The MIT Press.
Chrz\kaszcz (2003) Jacek Chrz\kaszcz. 2003. Implementing Modules in the Coq System. In Theorem Proving in Higher Order Logics, David Basin and Burkhart Wolff (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 270–286. https://doi.org/10.1007/10930755_18
Cohen et al. (2013) Cyril Cohen, Maxime Dénès, and Anders Mörtberg. 2013. Refinements for Free!. In Certified Programs and Proofs. Melbourne, Australia, 147 – 162. https://doi.org/10.1007/978-3-319-03545-1_10
Coquand and Huet (1986) T. Coquand and Gérard Huet. 1986. The calculus of constructions. Technical Report RR-0530. INRIA. https://hal.inria.fr/inria-00076024
Coquand and Paulin (1990) Thierry Coquand and Christine Paulin. 1990. Inductively defined types. In COLOG-88, Per Martin-Löf and Grigori Mints (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 50–66.
Czajka and Kaliszyk (2018) Łukasz Czajka and Cezary Kaliszyk. 2018. Hammer for Coq: Automation for Dependent Type Theory. Journal of Automated Reasoning 61, 1 (01 Jun 2018), 423–453. https://doi.org/10.1007/s10817-018-9458-4
DeMillo et al. (1977) Richard A. DeMillo, Richard J. Lipton, and Alan J. Perlis. 1977. Social Processes and Proofs of Theorems and Programs. In Proceedings of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL ’77). ACM, New York, NY, USA, 206–214. https://doi.org/10.1145/512950.512970
Dietrich et al. (2013) Dominik Dietrich, Iain Whiteside, and David Aspinall. 2013. Polar: A Framework for Proof Refactoring. In Logic for Programming, Artificial Intelligence, and Reasoning. Springer, Berlin, Heidelberg, 776–791. https://doi.org/10.1007/978-3-642-45221-5_52
First et al. (2020) Emily First, Yuriy Brun, and Arjun Guha. 2020. TacTok: Semantics-Aware Proof Synthesis. Proc. ACM Program. Lang. 4, OOPSLA, Article 231 (Nov. 2020), 31 pages. https://doi.org/10.1145/3428299
Gazzola et al. (2018) Luca Gazzola, Daniela Micucci, and Leonardo Mariani. 2018. Automatic Software Repair: A Survey. In Proceedings of the 40th International Conference on Software Engineering (ICSE ’18). ACM, New York, NY, USA, 1219–1219. https://doi.org/10.1145/3180155.3182526
Gjørup and Spitters (2020) Emil Holm Gjørup and Bas Spitters. 2020. Congruence Closure in Cubical Type Theory. In Workshop on Homotopy Type Theory / Univalent Foundations. https://www.cs.au.dk/~spitters/Emil.pdf
Huffman and Kunčar (2013) Brian Huffman and Ondřej Kunčar. 2013. Lifting and Transfer: A Modular Design for Quotients in Isabelle/HOL. In Certified Programs and Proofs: Third International Conference (CPP 2013). Springer International Publishing, Cham, 131–146. https://doi.org/10.1007/978-3-319-03545-1_9
Johnsen and Lüth (2004) Einar Broch Johnsen and Christoph Lüth. 2004. Theorem Reuse by Proof Term Transformation. In Theorem Proving in Higher Order Logics: 17th International Conference, TPHOLs 2004, Park City, Utah, USA, September 14-17, 2004. Proceedings. Springer, Berlin, Heidelberg, 152–167. https://doi.org/10.1007/978-3-540-30142-4_12
Kaliszyk and Urban (2014) Cezary Kaliszyk and Josef Urban. 2014. Learning-Assisted Automated Reasoning with Flyspeck. Journal of Automated Reasoning 53, 2 (01 Aug 2014), 173–213. https://doi.org/10.1007/s10817-014-9303-3
Klein et al. (2014) Gerwin Klein, June Andronick, Kevin Elphinstone, Toby Murray, Thomas Sewell, Rafal Kolanski, and Gernot Heiser. 2014. Comprehensive Formal Verification of an OS Microkernel. ACM Trans. Comput. Syst. 32, 1, Article 2 (Feb. 2014), 70 pages. https://doi.org/10.1145/2560537
Klein et al. (2009) Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, Thomas Sewell, Harvey Tuch, and Simon Winwood. 2009. seL4: Formal Verification of an OS Kernel. In Proceedings of the ACM SIGOPS 22Nd Symposium on Operating Systems Principles (SOSP ’09). ACM, New York, NY, USA, 207–220. https://doi.org/10.1145/1629575.1629596
Magaud and Bertot (2000) Nicolas Magaud and Yves Bertot. 2000. Changing data structures in type theory: A study of natural numbers. In International Workshop on Types for Proofs and Programs. Springer, 181–196.
McBride (2011) Conor McBride. 2011. Ornamental algebras, algebraic ornaments. http://plv.mpi-sws.org/plerg/papers/mcbride-ornaments-2up.pdf
Monperrus (2018) Martin Monperrus. 2018. Automatic Software Repair: A Bibliography. ACM Comput. Surv. 51, 1, Article 17 (Jan. 2018), 24 pages. https://doi.org/10.1145/3105906
Nelson (1980) Charles Gregory Nelson. 1980. Techniques for Program Verification. Ph.D. Dissertation. Stanford, CA, USA. AAI8011683.
nLab authors (2020a) nLab authors. 2020a. beta-reduction. http://ncatlab.org/nlab/show/beta-reduction. Revision 6.
nLab authors (2020b) nLab authors. 2020b. eta-conversion. http://ncatlab.org/nlab/show/eta-conversion. Revision 12.
Paulson and Blanchette (2012) Lawrence C. Paulson and Jasmin Christian Blanchette. 2012. Three years of experience with Sledgehammer, a practical link between automatic and interactive theorem provers. In International Workshop on the Implementation of Logics (IWIL 2010) (EPiC Series), G. Sutcliffe, S. Schulz, and E. Ternovska (Eds.), Vol. 2. EasyChair, 1–11.
Qi et al. (2015) Zichao Qi, Fan Long, Sara Achour, and Martin Rinard. 2015. An Analysis of Patch Plausibility and Correctness for Generate-and-validate Patch Generation Systems. In Proceedings of the 2015 International Symposium on Software Testing and Analysis (ISSTA 2015). ACM, New York, NY, USA, 24–36. https://doi.org/10.1145/2771783.2771791
Ringer et al. (2019a) Talia Ringer, Karl Palmskog, Ilya Sergey, Milos Gligoric, and Zachary Tatlock. 2019a. QED at Large: A Survey of Engineering of Formally Verified Software. Foundations and Trends® in Programming Languages 5, 2-3 (2019), 102–281. https://doi.org/10.1561/2500000045
Ringer et al. (2020) Talia Ringer, Alex Sanchez-Stern, Dan Grossman, and Sorin Lerner. 2020. REPLica: REPL instrumentation for Coq analysis. In Proceedings of the 9th ACM SIGPLAN International Conference on Certified Programs and Proofs (CPP 2020). 99–113. https://doi.org/10.1145/3372885.3373823
Ringer et al. (2018) Talia Ringer, Nathaniel Yazdani, John Leo, and Dan Grossman. 2018. Adapting Proof Automation to Adapt Proofs. In Proceedings of the 7th ACM SIGPLAN International Conference on Certified Programs and Proofs (CPP 2018). Association for Computing Machinery, New York, NY, USA, 115–129. https://doi.org/10.1145/3167094
Ringer et al. (2019b) Talia Ringer, Nathaniel Yazdani, John Leo, and Dan Grossman. 2019b. Ornaments for Proof Reuse in Coq. In 10th International Conference on Interactive Theorem Proving (ITP 2019) (Leibniz International Proceedings in Informatics (LIPIcs)), John Harrison, John O’Leary, and Andrew Tolmach (Eds.), Vol. 141. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 26:1–26:19. https://doi.org/10.4230/LIPIcs.ITP.2019.26
Robert (2018) Valentin Robert. 2018. Front-end tooling for building and maintaining dependently-typed functional programs. Ph.D. Dissertation. UC San Diego.
Roe and Smith (2016) Kenneth Roe and Scott Smith. 2016. CoqPIE: An IDE Aimed at Improving Proof Development Productivity. In Interactive Theorem Proving: 7th International Conference, ITP 2016, Nancy, France, August 22-25, 2016, Proceedings. Springer International Publishing, Cham, 491–499. https://doi.org/10.1007/978-3-319-43144-4_32
Saïbi (1999) Amokrane Saïbi. 1999. Outils Génériques de Modélisation et de Démonstration pour la Formalisation des Mathématiques en Théorie des Types: application à la Théorie des Catégories. Ph.D. Dissertation. Université Paris VI, Paris, France.
Sozeau and Oury (2008) Matthieu Sozeau and Nicolas Oury. 2008. First-Class Type Classes. In Theorem Proving in Higher Order Logics: 21st International Conference, TPHOLs 2008, Montreal, Canada, August 18-21, 2008. Proceedings, Otmane Ait Mohamed, César Muñoz, and Sofiène Tahar (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 278–293. https://doi.org/10.1007/978-3-540-71067-7_23
Tabareau et al. (2018) Nicolas Tabareau, Éric Tanter, and Matthieu Sozeau. 2018. Equivalences for Free: Univalent Parametricity for Effective Transport. Proc. ACM Program. Lang. 2, ICFP, Article 92 (July 2018), 29 pages. https://doi.org/10.1145/3236787
Tabareau et al. (2019) Nicolas Tabareau, Éric Tanter, and Matthieu Sozeau. 2019. The Marriage of Univalence and Parametricity. arXiv:cs.PL/1909.05027
Timany and Jacobs (2015) Amin Timany and Bart Jacobs. 2015. First Steps Towards Cumulative Inductive Types in CIC. In ICTAC.
Univalent Foundations Program (2013) The Univalent Foundations Program. 2013. Homotopy Type Theory: Univalent Foundations of Mathematics. https://homotopytypetheory.org/book, Institute for Advanced Study.
Wenzel (2007) Makarius Wenzel. 2007. Isabelle/Isar–a generic framework for human-readable proof documents. From Insight to Proof–Festschrift in Honour of Andrzej Trybulec 10, 23 (2007), 277–298.
Whiteside (2013) Iain Johnston Whiteside. 2013. Refactoring proofs. Ph.D. Dissertation. University of Edinburgh. http://hdl.handle.net/1842/7970
Wibergh (2019) Karin Wibergh. 2019. Automatic refactoring for Agda. Master’s thesis. Chalmers University of Technology and University of Gothenburg.
Williams (2020) Ambre Williams. 2020. Refactoring functional programs with ornaments. Ph.D. Dissertation.
Woos et al. (2016) Doug Woos, James R. Wilcox, Steve Anton, Zachary Tatlock, Michael D. Ernst, and Thomas Anderson. 2016. Planning for Change in a Formal Verification of the Raft Consensus Protocol. In Proceedings of the 5th ACM SIGPLAN Conference on Certified Programs and Proofs (CPP 2016). ACM, New York, NY, USA, 154–165. https://doi.org/10.1145/2854065.2854081