
Model-Driven Synthesis for Programming Tutors

Niek Mulleners, Utrecht University, Utrecht, [email protected]; Johan Jeuring, Utrecht University, Utrecht, [email protected]; and Bastiaan Heeren, Open University of the Netherlands, Heerlen, [email protected]
(2020)
Abstract.

When giving automated feedback to a student working on a beginner’s exercise, many programming tutors run into a completeness problem. On the one hand, we want a student to experiment freely. On the other hand, we want a student to write her program in such a way that we can provide constructive feedback. We propose to investigate how we can overcome this problem by using program synthesis, which we use to generate correct solutions that closely match a student program, and give feedback based on the results.

synthesis, programming tutors
copyright: rights retained
journal year: 2020
conference: HATRA 2020: Human Aspects of Types and Reasoning Assistants; 15–20 November, 2020; Chicago, IL
booktitle: HATRA 2020: Human Aspects of Types and Reasoning Assistants, 15–20 November, 2020, Chicago, IL
ccs: Applied computing → Interactive learning environments
ccs: Software and its engineering → Search-based software engineering

1. Introduction

Programming tutors help students learn to develop programs. Since the 1960s, more than one hundred such tools have been developed (Keuning et al., 2018). Many of these tutors analyse the final program a student submits, but some also analyse a partial, unfinished student program and give feedback on it. Keuning et al. (2018) note that most of these tools give solution error feedback and that relatively few give hints to students who are on their way to a solution. We want to investigate how we can give hints to students while they are developing a program, at a stage where the program need not be complete yet.

Giving hints to students who are halfway towards a solution is challenging. The literature describes several techniques to achieve this. Anderson and Skwarecki (1986) and Gerdes et al. (2017) require a student to follow a path towards a model solution, where there are usually multiple paths possible. Price et al. (2016), Piech et al. (2015), and Lazar and Bratko (2014) use data collected from previous students to generate hints. The disadvantages of these approaches are that they either restrict what a student can do, or that they are incomplete in the sense that a student can submit a program for which no hint can be given. The latter need not be a problem if this happens infrequently, but current practice shows that this happens regularly. We want to investigate techniques for giving hints to students who can develop any program for a problem, and ask for hints at any stage.

We will focus on students working on beginner's exercises in a statically typed functional programming language. Such an exercise is defined by

  • a textual description;

  • a set of model solutions;

  • a set of properties;

  • a library of allowed functions.

We allow students to leave parts of their program unimplemented by introducing holes, denoted by a question mark (?). For an unfinished student program, we want to answer the following questions: is the student still on the right track? If so, how can we give hints on how to proceed? If not, how can we help the student to get back on track?

Feldman et al. (2019) use program synthesis in a dynamically typed functional language to tell students whether they are on the right track, by generating the part of the program the student still needs to write. Singh et al. (2013) and Gulwani et al. (2018) apply program synthesis to imperative languages to give automated feedback on how to repair a program. We want to adapt these approaches to a statically typed functional language and use the synthesis results to give constructive feedback. We propose to investigate to what extent we can use program synthesis to give hints to students on their way to a solution.

2. An example: sorting algorithm

Throughout this proposal, we use the following exercise as an example.

Description

Write a function my_sort :: [Int] -> [Int] that sorts a list of integers.

Model solutions

There are many possible ways to implement a sorting algorithm. For simplicity, we only look at the following model solution, implementing my_sort as insertion sort.

my_sort = foldr insert []
  where
    insert x [] = [x]
    insert x (y:ys) | x < y     = x:y:ys
                    | otherwise = y:insert x ys

Properties

The correctness of my_sort is expressed in terms of the following properties, expressing that the result of sorting a list should be a permutation of the input list and that the result should be in non-descending order.

sort_permutes xs = my_sort xs `permutes` xs
sort_nondescending xs = nondescending (my_sort xs)
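The helper functions permutes and nondescending are only named in this proposal; the following is one plausible sketch of their definitions (the implementations are our assumption, not part of the exercise):

```haskell
import Data.List (sort)

-- Plausible definitions for the helpers used by the properties;
-- the proposal only names them, so these bodies are assumptions.

-- xs `permutes` ys holds when xs is a permutation of ys.
permutes :: [Int] -> [Int] -> Bool
permutes xs ys = sort xs == sort ys

-- nondescending xs holds when xs is in non-descending order.
nondescending :: [Int] -> Bool
nondescending xs = and (zipWith (<=) xs (drop 1 xs))
```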

Library

A suitable prelude containing all functions the students should reasonably be allowed to use.

3. Program Synthesis

Program synthesis is the process of automatically generating programs based on their intended behaviour. Rather than writing programs by hand, the programmer expresses their intention in terms of a specification to which the program should adhere. Such a specification can range from simple input-output examples to a formal logical specification. A program synthesizer then tries to generate a program from this specification, typically by performing a search, enumerating all possible programs within the program space, until one is found that adheres to the specification.
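As a toy illustration of such enumerative search (our own example, not one of the tools discussed below): enumerate a tiny expression grammar breadth-first and return the first program that matches a set of input-output examples.

```haskell
-- A toy enumerative synthesizer over a tiny grammar of Int -> Int
-- programs built from the identity function, +1, and *2.
data Expr = Id | Inc Expr | Dbl Expr deriving Show

eval :: Expr -> Int -> Int
eval Id      x = x
eval (Inc e) x = eval e x + 1
eval (Dbl e) x = eval e x * 2

-- Breadth-first enumeration of the whole program space.
progs :: [Expr]
progs = go [Id]
  where go es = es ++ go (concatMap (\e -> [Inc e, Dbl e]) es)

-- Return the first program that adheres to the specification,
-- here given as input-output examples.
synthesize :: [(Int, Int)] -> Expr
synthesize exs = head [ e | e <- progs, all (\(i, o) -> eval e i == o) exs ]
```

For the examples [(1,3),(2,5)], the search finds Inc (Dbl Id), i.e. \x -> 2*x + 1.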

To verify the correctness of student solutions, we extensively test them against a large set of input-output examples generated from the model solutions. As our synthesizer, we will therefore look at Myth (Osera and Zdancewic, 2015; Frankle et al., 2016), a state-of-the-art example-based synthesizer. During the synthesis process, Myth can only introduce constructs that it can execute, so that the synthesized program can be checked against the input-output examples. When synthesizing a recursive function, Myth can “execute” a recursive call if the input to the recursive call is in the example set, by simply returning the corresponding output. This requires, however, that the example set is trace complete: for every input-output pair in the example set, there should also be an input-output pair for every structurally smaller input, so that the example set contains complete program traces. This requirement is often considered the main limitation of Myth. In the presence of model solutions, however, we can simply execute a model solution at the recursive call, bypassing the need for trace-complete example sets!
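To illustrate trace completeness with a worked example of our own: an example for the input [2,1] must be accompanied by examples for every structurally smaller input, so that recursive calls can be "executed" by lookup.

```haskell
-- A trace-complete example set for my_sort: the pair for [2,1] is
-- backed by pairs for the structurally smaller inputs [1] and [].
examples :: [([Int], [Int])]
examples =
  [ ([],    [])     -- needed for the recursive call on []
  , ([1],   [1])    -- needed for the recursive call on the tail
  , ([2,1], [1,2])  -- the example we actually care about
  ]

-- "Executing" a recursive call by looking up its input in the
-- example set, as Myth does during synthesis.
recCall :: [Int] -> Maybe [Int]
recCall xs = lookup xs examples
```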

Furthermore, we expect correct solutions to resemble our model solutions. We can use this insight to guide the synthesis process by giving priority to constructs appearing in the model solutions. Lee et al. (2018) accelerate search-based synthesis by performing A* search (Hart et al., 1968), with a heuristic based on learned probabilistic models. We intend to use a similar approach to speed up the synthesis, by basing our heuristic on the provided model solutions.

Unfortunately, Myth cannot be applied directly to our setting, because it requires local input-output examples on holes, whereas we only have global input-output examples. As described by Lubin et al. (2020), we can use live bidirectional evaluation to propagate global input-output examples to local input-output examples on holes. First, live evaluation (Omar et al., 2019) is used to partially evaluate a program to a result r by proceeding around the holes. Live unevaluation (Lubin et al., 2020) then checks this result r against an input-output example, generating constraints on the holes in the form of local input-output examples.
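A worked illustration of this pipeline (our own reading of the setup, not Lubin et al.'s implementation): consider the sketch my_sort = foldr ? [] together with the global example my_sort [1] == [1].

```haskell
-- Live evaluation proceeds around the hole:
--   my_sort [1]  ~>  ? 1 (foldr ? [] [])  ~>  ? 1 []
-- Unevaluation then checks this result against the expected output
-- [1], yielding the local example  ? 1 [] == [1]  as a constraint
-- on the hole.

-- The model solution's insert satisfies this local example:
insert :: Int -> [Int] -> [Int]
insert x [] = [x]
insert x (y:ys)
  | x < y     = x : y : ys
  | otherwise = y : insert x ys

satisfiesLocal :: Bool
satisfiesLocal = insert 1 [] == [1]
```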

4. Finishing programs

In this section, we give an overview of how we intend to apply the described techniques to complete unfinished student programs and use the results to give feedback.

4.1. Live bidirectional example checking

Before we can perform program synthesis, we apply live bidirectional example checking to check the student solution against an example set, possibly returning a set of local input-output examples for every hole.

4.1.1. Failure

If live bidirectional example checking fails, it is impossible to finish the student solution by filling its holes. The failure produces a counterexample, which can be checked against the property set to determine which properties are violated.

Finished solution

The student might try to implement my_sort as insertion sort, but incorrectly use the list cons operator instead of the insert function.

my_sort = foldr (:) []

Live evaluation might return the counterexample my_sort [3,2,1] == [3,2,1], violating the property sort_nondescending.

Partial solution

Alternatively, the student might give a partial solution with an incorrect base case, such as the following.

my_sort = foldr ? [0]

Live evaluation gives the counterexample my_sort [] == [0], which violates the property sort_permutes. Note that, in some cases, the resulting counterexample might also contain a hole! For example, take the following partial solution.

my_sort [] = []
my_sort (x:xs) = x : ?

Live evaluation might return the counterexample my_sort [2,1] == 2:?. Using unevaluation, we can determine that there is no hole filling for which the expression 2:? evaluates to the expected result [1,2].

Recovery

Besides presenting the counterexample to the student and showing which properties her implementation violates, we might want to help her get back on track. Each counterexample corresponds to a path in the abstract syntax tree of the program, hinting at which part of the program is incorrect. For example, the counterexample my_sort [3,2,1] == [3,2,1] is introduced by the leaf (:) in the abstract syntax tree of foldr (:) []. We replace this leaf with a hole (resulting in the program foldr ? []) and repeat the synthesis process until we eventually find a correct solution that is closest to the student's attempt.

4.1.2. Success

On successful live bidirectional evaluation, we get a set of local input-output examples on the holes. If the student solution contains no holes, this set of examples is empty and the exercise is finished. If there are holes, the generated examples can be used to perform program synthesis.

4.2. Program synthesis

Now that we have local input-output examples on the holes in our program, we can perform program synthesis to determine whether the student is still on track.

Failure

If a program can no longer be finished, we expect our synthesis to fail. This means that the synthesis has either exhaustively searched the program space or has timed out. Both of these cases can take quite some time. One way to skip this process is to check for conflicting examples. Take, for example, the following attempt at implementing my_sort using map.

my_sort = map ?0

For the input [2,2,1] we expect the output [1,2,2]. As such, for a function f filling the hole we generate the following input-output examples.

f 2 == 2
f 2 == 1
f 1 == 2

These constraints are conflicting, since a function cannot return different outputs for the same input. As such, we can skip the synthesis process. For other incorrect programs, it might not be possible to generate conflicting constraints. Consider the following example:

my_sort = map ? . zip [0..]

Before mapping, this program first pairs every value in the input list with a unique integer. As a result, the list passed to map ? contains only unique values, so no conflicting constraints can be generated. At this point, we will perform program synthesis with the generated constraints, but the process will fail, since map cannot change the order of a list.
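The conflict check sketched above can be implemented directly (a minimal sketch with our own naming): local examples for a hole conflict as soon as the same input is mapped to two different outputs.

```haskell
import qualified Data.Map as Map

-- Detect conflicting input-output examples: the same input
-- appearing with two different expected outputs.
conflicting :: (Ord a, Eq b) => [(a, b)] -> Bool
conflicting = go Map.empty
  where
    go _ [] = False
    go seen ((i, o) : rest) =
      case Map.lookup i seen of
        Just o' | o' /= o -> True  -- contradiction found
        _                 -> go (Map.insert i o seen) rest
```

For map ?, the examples (2,2), (2,1), (1,2) conflict, so synthesis can be skipped; for map ? . zip [0..] all hole inputs are unique, so no conflict arises.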

Recovery

To help the student recover, we propose an approach similar to the one described in section 4.1.1. If the synthesis process fails to fill a hole, we blame its parent node in the abstract syntax tree of the program and replace that node with a hole. For example, for the incorrect program map ? . zip [0..], we replace map ? with a hole, resulting in the program ? . zip [0..]. Like before, we repeat the synthesis process until we find a correct solution.

Success

If a program can still be finished, our synthesis will either time out or return a valid hole filling. In the first case, we cannot tell a student whether she is on the right track, but we can guarantee that her implementation is more complex than required and advise another approach. In the second case, we can give her feedback by using the synthesized programs to calculate specifications for the holes and presenting these in a legible way. For example, take the following partial student program.

my_sort [] = []
my_sort (x:xs) = f x (my_sort xs)
  where f y ys = ?

This program can easily be finished by filling the hole with insert y ys. Knowing this, we can generate a specification for the hole by generating input-output examples using the hole filling. These input-output examples can help the student understand the part of the program they have yet to write. Even more interestingly, we might use a technique such as the one described by Claessen et al. (2010) to automatically generate properties that should hold for insert and therefore for the hole. Ideally, this would allow us to inform the student that ys is a sorted list and that f should return a sorted list containing all elements of ys plus y.
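A minimal sketch of deriving such a specification (the sample inputs are our own choice; insert is repeated from the model solution so the sketch is self-contained): run the synthesized filling on a handful of inputs and present the resulting pairs as examples for the hole.

```haskell
-- The synthesized hole filling, taken from the model solution.
insert :: Int -> [Int] -> [Int]
insert x [] = [x]
insert x (y:ys)
  | x < y     = x : y : ys
  | otherwise = y : insert x ys

-- Input-output examples for the hole f, generated by running the
-- filling on sample inputs (sample inputs chosen by us).
holeExamples :: [((Int, [Int]), [Int])]
holeExamples =
  [ ((y, ys), insert y ys) | (y, ys) <- [(2, []), (1, [2]), (3, [1, 2])] ]
```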

Note that many existing techniques, such as program transformations (Gerdes et al., 2017), can already give relevant feedback on the previous example by comparing it to one of the model solutions. The next example, however, is a lot harder to reason about automatically.

my_sort [] = []
my_sort (x:xs) = foldr ? ? xs

It is far from trivial to automatically prove that this partial program is still on the right track, but we expect our synthesizer to find the correct hole fillings (insert and [x] respectively) with relative ease.

5. Conclusion

To conclude, we believe there is a lot of potential in the application of program synthesis and live bidirectional evaluation to give meaningful feedback to students on their way to a solution.

References

  • Anderson and Skwarecki (1986) John R. Anderson and Edward Skwarecki. 1986. The automated tutoring of introductory computer programming. Commun. ACM 29, 9 (1986), 842–849.
  • Claessen et al. (2010) Koen Claessen, Nicholas Smallbone, and John Hughes. 2010. QuickSpec: Guessing Formal Specifications Using Testing. In Tests and Proofs, Gordon Fraser and Angelo Gargantini (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 6–21.
  • Feldman et al. (2019) Molly Q Feldman, Yiting Wang, William E. Byrd, François Guimbretière, and Erik Andersen. 2019. Towards Answering ”Am I on the Right Track?” Automatically Using Program Synthesis. In Proceedings of the 2019 ACM SIGPLAN Symposium on SPLASH-E (Athens, Greece) (SPLASH-E 2019). Association for Computing Machinery, New York, NY, USA, 13–24. https://doi.org/10.1145/3358711.3361626
  • Frankle et al. (2016) Jonathan Frankle, Peter-Michael Osera, David Walker, and Steve Zdancewic. 2016. Example-directed synthesis: a type-theoretic interpretation. ACM SIGPLAN Notices 51, 1 (April 2016), 802–815. https://doi.org/10.1145/2914770.2837629
  • Gerdes et al. (2017) Alex Gerdes, Bastiaan Heeren, Johan Jeuring, and L. Thomas van Binsbergen. 2017. Ask-Elle: an Adaptable Programming Tutor for Haskell Giving Automated Feedback. International Journal of Artificial Intelligence in Education 27, 1 (01 Mar 2017), 65–100.
  • Gulwani et al. (2018) Sumit Gulwani, Ivan Radiček, and Florian Zuleger. 2018. Automated Clustering and Program Repair for Introductory Programming Assignments. SIGPLAN Not. 53, 4 (June 2018), 465–480. https://doi.org/10.1145/3296979.3192387
  • Hart et al. (1968) P. E. Hart, N. J. Nilsson, and B. Raphael. 1968. A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Transactions on Systems Science and Cybernetics 4, 2 (1968), 100–107. https://doi.org/10.1109/TSSC.1968.300136
  • Keuning et al. (2018) Hieke Keuning, Johan Jeuring, and Bastiaan Heeren. 2018. A Systematic Literature Review of Automated Feedback Generation for Programming Exercises. ACM Trans. Comput. Educ. 19, 1, Article 3 (Sept. 2018), 43 pages. https://doi.org/10.1145/3231711
  • Lazar and Bratko (2014) Timotej Lazar and Ivan Bratko. 2014. Data-Driven Program Synthesis for Hint Generation in Programming Tutors. In Intelligent Tutoring Systems, Stefan Trausan-Matu, Kristy Elizabeth Boyer, Martha Crosby, and Kitty Panourgia (Eds.). Springer International Publishing, Cham, 306–311.
  • Lee et al. (2018) Woosuk Lee, Kihong Heo, Rajeev Alur, and Mayur Naik. 2018. Accelerating Search-Based Program Synthesis Using Learned Probabilistic Models. SIGPLAN Not. 53, 4 (June 2018), 436–449. https://doi.org/10.1145/3296979.3192410
  • Lubin et al. (2020) Justin Lubin, Nick Collins, Cyrus Omar, and Ravi Chugh. 2020. Program Sketching with Live Bidirectional Evaluation. Proc. ACM Program. Lang. 4, ICFP, Article 109 (Aug. 2020), 29 pages. https://doi.org/10.1145/3408991
  • Omar et al. (2019) Cyrus Omar, Ian Voysey, Ravi Chugh, and Matthew A. Hammer. 2019. Live Functional Programming with Typed Holes. Proc. ACM Program. Lang. 3, POPL, Article 14 (Jan. 2019), 32 pages. https://doi.org/10.1145/3290327
  • Osera and Zdancewic (2015) Peter-Michael Osera and Steve Zdancewic. 2015. Type-and-example-directed program synthesis. ACM SIGPLAN Notices 50, 6 (Aug. 2015), 619–630. https://doi.org/10.1145/2813885.2738007
  • Piech et al. (2015) Chris Piech, Mehran Sahami, Jonathan Huang, and Leonidas Guibas. 2015. Autonomously Generating Hints by Inferring Problem Solving Policies. In Proceedings of the Second (2015) ACM Conference on Learning @ Scale (Vancouver, BC, Canada) (L@S ’15). Association for Computing Machinery, New York, NY, USA, 195–204. https://doi.org/10.1145/2724660.2724668
  • Price et al. (2016) Thomas W. Price, Yihuan Dong, and Tiffany Barnes. 2016. Generating Data-Driven Hints for Open-Ended Programming. In Proceedings of the 9th International Conference on Educational Data Mining, Tiffany Barnes, Min Chi, and Mingyu Feng (Eds.). Raleigh, NC, USA, 191–198.
  • Singh et al. (2013) Rishabh Singh, Sumit Gulwani, and Armando Solar-Lezama. 2013. Automated Feedback Generation for Introductory Programming Assignments. SIGPLAN Not. 48, 6 (June 2013), 15–26. https://doi.org/10.1145/2499370.2462195