Faculdade de Ciências
Universidade do Porto, Porto
11email: {joao.barbosa,amflorido,vscosta}@fc.up.pt
Typed SLD-Resolution: Dynamic Typing for Logic Programming
Abstract
The semantic foundations for logic programming are usually separated into two different approaches. The operational semantics, which uses SLD-resolution, the proof method that computes answers in logic programming, and the declarative semantics, which sees logic programs as formulas and its semantics as models. Here, we define a new operational semantics called TSLD-resolution, which stands for Typed SLD-resolution, where we include a value “wrong”, that corresponds to the detection of a type error at run-time. For this we define a new typed unification algorithm. Finally we prove the correctness of TSLD-resolution with respect to a typed declarative semantics.
Keywords:
Logic Programming Operational Semantics Types1 Introduction
Types play an important role in the verification and debugging of programming languages, and have been the subject of significant research in the logic programming community [Zob87, DZ92, Lu01, FSVY91, YFS92, MO84, LR91, SBG08, DMP02, SCWD08, Han14, BFC19, BFC22]. Most research has been driven by the desire to perform compile-time checking. One important line of this work views types as approximation of the program semantics [Zob87, DZ92, YFS92, BJ88, FSVY91]. A different approach relies on asking the user to provide the type information, thus filtering the set of admissible programs [MO84, LR91, SCWD08, BFC19]. In practice, static type-checking is not widely used in actual Prolog systems, but Prolog systems do rely on dynamic typing to ensure that system built-in parameters are called with acceptable arguments, such as is/2. In fact, the Prolog ISO standard defines a set of predefined types and typing violations [DEC96].
Motivated by these observations, we propose a step forward in the dynamic type checking of logic programs: to extend unification with a type checking mechanism. Type checking will thus become a core part of the resolution engine. This extension enables the detection of several bugs in Prolog programs which are rather difficult to capture in the standard untyped language, such as the unintended switch of arguments in a predicate call. This approach can also be used to help in the tracing of bugs in the traditional Prolog Four-Port Model debugging approach. In classical programming languages, type-checking essentially captures the use of functions on arguments of a type different from the expected. The parallel in logic programming is to capture type errors in queries applied to arguments of a different type. This is the essence of our new operational mechanism, here called Typed SLD (TSLD). Following Milner’s argument on wrong programs [Mil78], type errors will be denoted by an extra value, wrong; unification may succeed, fail, or be wrong. As discussed in prior work, a three-valued semantics provides a natural framework for describing the new unification [BFC19]. We name the new evaluation mechanism, that extends unification with type checking thus performing dynamic typing, Typed SLD (TSLD). Let us now present a simple example of the use of TSLD-resolution:
Example 1
Consider the following program consisting of the three facts:
p(0). p(1). p(a).
Let stand for success, false for failure and wrong for a run-time error. Assuming that constants and have different types (in this case, and , respectively), the TSLD-tree for the query is:
In the following we prove that TSLD-resolution is correct with respect to a new typed declarative semantics for logic programming based on the existence of several different semantic domains, for the interpretation of terms, instead of the usual Herbrand universe domain. The use of different domains for the semantics of logic programming is not new [LR91], but before it was used in a prescriptive typing approach where types were mandatory for every syntactic objects: functors, variables, and predicates. Here we assume that only constants and function symbols have predefined types. Type correctness in the form of a reformulation of subject-reduction for SLD-resolution was defined in [DS01] for a typed version of logic programs where also every syntactic objects must be typed statically. To semantically deal with dynamic typing here we define a new version of SLD (Typed SLD) and prove its soundness with respect to a new declarative semantics. In another previous work [SCWD08] dynamic type checking of Prolog predicates was done in two different scenarios: on calls from untyped to typed code using program transformation, and on calls to untyped code from typed code to check whether the untyped code satisfies previously made type annotations. In this previous work type annotations for predicates were necessary in both scenarios and the semantic soundness of these run-time checks was not studied. Here we do a semantic study of dynamic typing and use it to show that a new operational mechanism detecting run-time errors is sound. For this we use predefined types only for constants and function symbols.
The paper is organised as follows. Section 2 reviews some preliminary background concepts with the necessary definitions and results on the theory of types and logic programming, and sets the grounds for subsequent developments. In Section 3 we present the important new notion of typed unification, present the use of different types for constant symbols, and show that typed unification extends standard first order unification in the sense that when it succeeds the result is the same. Section 4 presents TSLD-resolution. We definite the notion of TSLD-derivation step and TSLD-tree, showing how run-time type errors are detected during program evaluation. We also distinguish the notion of a type error in the program from the notion of a type error in the query with respect to a program. In Section 5, we define a new declarative semantics for logic programming based on our previous work [BFC19] using a three-valued logic which uses the value wrong to denote run-time type errors, define ill-typed programs and queries, and show that TSLD-resolution is sound with respect to this declarative semantics. Finally, in section 6, we conclude and point out some research directions for future work.
2 Preliminary Concepts
In this section we will present concepts that are relevant both to the definitions of the operational and declarative semantics for logic programming. These concepts include the three-valued logic that will be used throughout the paper, defined initially in [Kle38], and the definition of our syntax for types, terms, and programs.
2.1 Three-Valued Logic
The three-valued logic used in this paper is the Weak Kleene logic [Kle38], later interpreted by [BB81] and [Bea16]. As in our previous work [BFC19] the third value is called wrong and represents a (dynamic) type error. In this logic, the value wrong propagates through every connective, which is a behaviour we want the type error value to have. In table 1, we describe the connectives in the logic.
true | false | wrong | |
true | true | false | wrong |
false | false | false | wrong |
wrong | wrong | wrong | wrong |
true | false | wrong | |
---|---|---|---|
true | true | true | wrong |
false | true | false | wrong |
wrong | wrong | wrong | wrong |
The negation of logic values is defined as: , and . And implication is defined as: .
Note that whenever the value wrong occurs in any connective in the logic, the result of applying that connective is wrong.
2.2 Types
In this paper we fix the set of base types int, float, atom and string, an enumerable set of compound types , where is a function symbol and are types, and an enumerable set of functional types of the form , where and are types.
We use this specific choice of base types because they correspond to types already present, to some extent, in Prolog. Some built-in predicates already expect integers, floating point numbers, or atoms.
2.3 Terms
The alphabet of logic programming is composed of symbols from disjoint classes. For our language of terms we have an infinite set of variables Var, an infinite set of function symbols Fun, parenthesis and the comma [Apt96].
Terms are defined as follows:
-
•
a variable is a term,
-
•
if is an n-ary function symbol and are terms, then is a term,
-
•
if is a function symbol of arity zero, then is a term and it is called a constant.
A ground term is a term with no variables. In the rest of the paper we assume that ground terms are assumed to be typed, meaning that each constant has associated to it a base type and for any ground compound term , the function symbol (of arity ) has associated to it a functional type of the form . Note that variables are not statically typed, but, as we will see in the forthcoming sections, type checking of the use of variables will be made dynamically through TSLD-resolution.
2.4 Programs and Queries
We now extend our language of terms to a language of programs by adding an infinite set of predicate symbols Pred and the reverse implication .
The definition of atoms, queries, clauses and programs is the usual one [Apt96]:
-
•
an atom is either a predicate symbol with arity , applied to terms , which we write as . We will represent atoms by ;
-
•
a query is a finite sequence of atoms, which we will represent by ;
-
•
a clause is of the form , where is an atom of the form and is a query;
-
•
a program is a finite set of clauses, which we will represent by .
The interpretation of queries and clauses is quantified. Every variable that occurs in a query is assumed to be existentially quantified and every variable that occurs in a clause is universally quantified [Apt96].
3 Typed Unification
Solving equality constraints using a unification algorithm [Rob65, MM82] is the main computational mechanism in logic programming. Logic programming usually uses an untyped term language and assumes a semantic universe composed of all semantic values: the Herbrand universe [Her30, Apt96].
However, in our work, we assume that the semantic values are split among several disjoint semantic domains and thus equality only makes sense inside each domain. Moreover each type will be mapped to a non-empty semantic domain. To reflect this, unification may now return three different outputs. Besides being successful or failing, unification can now return the wrong value. This is the logical value of nonsense and reflects the fact that we are trying to perform unification between terms with different types corresponding to a type error during program evaluation.
A substitution is a mapping from variables to terms, which assigns to each variable in its domain a term . We will represent bindings by , substitutions by symbols such as , and applying a substitution to a term will be represented by . We say is an instance of .
Substitution composition is represented by , i.e., the composition of the substitutions and is denoted and applying corresponds to . We can also calculate substitution composition, i.e., as defined below [Apt96].
Definition 1 (Substitution Composition)
Suppose and are substitutions, such that and . Then, composition is calculated by following these steps:
-
•
remove from the sequence the bindings such that and the elements for which
-
•
form a substitution from the resulting sequence.
A substitution is called a unifier of two terms and iff . If such a substitution exists, we say that the two terms are unifiable. In particular, a unifier is called a most general unifier (mgu) of two terms and if for every other unifier of and , , for some substitution .
First order unification [Rob65] assumes an untyped universe, so unification between any two terms always makes sense. Therefore, it either returns a mgu between the terms, if it exists, or halts with failure.
We argue that typed unification only makes sense between terms of the same type. Here we will extend a previous unification algorithm [MM82] to define a typed unification algorithm, where failure will be separated into , where two terms are not unifiable but may have the same type, and wrong, where the terms cannot have the same type.
Definition 2 (Typed Unification Algorithm)
Let and be two terms, and be a flag that starts . We create the starting set of equations as , and we will rewrite the pair by applying the following rules until it is no longer possible to apply any of them, or until the algorithm halts with wrong. If no rules are applicable, then we output if the flag is , or output the solved set , which can be seen as a substitution.
-
1.
-
2.
, if or
-
3.
-
4.
, if , and and have the same type
-
5.
, if , and and have different types
-
6.
-
7.
-
8.
-
9.
, where is not a variable and is a variable
-
10.
, where does not occur in and occurs in
-
11.
, where occurs in and
Let us illustrate with an example.
Example 2
Let be and be . We generate the pair , and proceed to apply the rewriting rules.
.
The successful cases of this algorithm are the same as for first order unification [MM82]. We will prove this result in the following theorem.
Theorem 3.1 (Conservative with respect to term unification)
Let and be two terms. If we apply the Martelli-Montanari algorithm (MM algorithm) to and and it returns a set of solved equalities , then the typed unification algorithm applied to the same two terms is also successful and returns the same set of equalities.
The following theorem proves typed unification detects run-time type errors.
Theorem 3.2 (Ill-typed unification)
If the output of the typed unification algorithm is wrong, then there is no substitution such that and have the same type.
Example 3
Let and . The typed unification algorithm outputs . We can see in Table 2 that there is no substitution such that , nor any substitution such that has the same type as , since the highlighted terms cannot have the same type for any substitution.
4 Operational Semantics
The operational semantics of logic programming describes how answers are computed. Here we define Typed SDL-resolution (TSLD) which returns the third value wrong whenever it finds a type error. We start by defining a TSLD-derivation step, which is a variation on the basic mechanism for computing answers to queries in the untyped semantics for logic programming, the SLD-derivation step. The major difference is the use of the typed unification algorithm. Then we create TSLD-derivations by iteratively applying these singular steps. After this, we introduce the concept of TSLD-trees and use it to represent the search space for answers in logic programming. Finally, we interpret the contents of the TSLD-tree.
4.1 TSLD-derivation
To compute in logic programming, we need a program and a query . We can interpret as being a set of statements, or rules, and as being a question that will be answered by finding an instance such that follows from . The essence of computation in logic programming is then to find such [Apt96].
In our setting the basic step for computation is the TSLD-derivation step. It corresponds to having a non-empty query and selecting from an atom . If unifies with , where is an input clause, we replace in by and apply an mgu of and to the query.
Definition 3 (TSLD-derivation step)
Consider a non-empty query and a clause of the form . Suppose that unifies (using typed unification) with and let be a mgu of and . is called the selected atom of . Then we write
and call it a TSLD-derivation step. is called its input clause. If typed unification between the selected atom and the input clause outputs (or ) we write the TSLD-derivation step as (or ).
In this definition we assume that is variable disjoint with . It is always possible to rename the variables in in order to achieve this, without loss of generality.
Definition 4 (TSLD-derivation)
Given a program and a query a sequence of TSLD-derivation steps from with input clauses of reaching the empty query, , or , is called a TSLD-derivation of in .
If the program is clear from the context, we speak of a TSLD-derivation of the query and if the input clauses are irrelevant we drop the reference to them. Informally, a TSLD-derivation corresponds to iterating the process of the TSLD-derivation step. We say that a TSLD-derivation is successful if we reach the empty query, further denoted by . The composition of the mgus used in each TSLD-derivation step is the computed answer substitution of the query. A TSLD-derivation that reaches is called a failed derivation and a TSLD-derivation that reaches is called an erroneous derivation.
In a TSLD-derivation, at each TSLD-derivation step we have several choices. We choose an atom from the query, a clause from the program, and a mgu. It is proven in [Apt96] that the choice of mgu does not affect the success or failure of an SLD-derivation, as long as the resulting mgu is idempotent. Since for TSLD-derivations the successes are the same as the ones in a corresponding SLD-derivation, then the result still holds for TSLD.
The selection rule, i.e, how we choose the selected atom in the considered query, does not influence the success of a TSLD-derivation either [Apt96], however if you stopped as soon as unification returns false, it could prevent us from detecting a type error, in a later atom. Let us show this in the following example.
Example 4
Consider the logic program consisting of only one fact, , and the selection rule that chooses the leftmost atom at each step.
Then, if we stopped when reaching , the query would have the TSLD-derivation , since typed unification between and outputs . However, the query has the TSLD-derivation , since typed unification between and outputs .
In fact, as the comma stands for conjunction, and since , we have to continue even if typed unification outputs in a step, and check if we ever reach the value . In general, for any selection rule we can construct a query such that it is necessary to continue when typed unification outputs for some atom in . Therefore, when we reach the value in a TSLD-derivation step, we continue applying steps until either we obtain a value from typed unification or we have no more atoms to select. In this last case, we can safely say that we reached . This guarantees independence of the selection rule. For the following example we use the selection rule that always chooses the leftmost atom in a query, which is the selection rule of Prolog.
Example 5
Let us continue example 4. The TSLD-derivation for is . Let . Then the TSLD-derivation is .
Note that when we get to for a typed unification in a TSLD-derivation, we can only output or , so either way it is not a successful derivation.
The selected clause from the program is another choice point we have at each TSLD-derivation step. We will discuss the impact of this choice in the next section.
4.2 TSLD-tree
When we want to find a successful TSLD-derivation for a query, we need to consider the entire search space, which consists of all possible derivations, choosing all possible clauses for a selected atom. We are considering a fixed selection rule here, so the only thing that changes between derivations is the selected clause. We say that a clause is applicable to an atom if and have the same predicate symbol with the same arity.
Definition 5 (TSLD-tree)
Given a program and a query , a TSLD-tree for is a tree where branches are TSLD-derivations of and every node has a child for each clause from applicable to the selected atom of .
Prolog uses the leftmost selection rule, where one always selects the leftmost atom in the query, and since the selection rule does not change the success of a TSLD-derivation we will use this selection rule in the rest of the paper.
Definition 6 (TSLD-tree classification)
-
•
If a TSLD-tree contains the empty query, we call it successful.
-
•
If a TSLD-tree is finite and all its branches are erroneous TSLD-derivations, we call it finitely erroneous.
-
•
If a TSLD-tree is finite and it is not successful nor finitely erroneous, we say it is finitely failed.
Example 6
Let program be:
p(1). p(2). q(1). q(a). r(X) :- p(X),q(X).
and let query be . The TSLD-tree for and is the following successful TSLD-tree:
We will present two auxiliary definitions which are needed to clearly define the notion of a type error in a program.
Definition 7 (Generic Query)
Let be a query and a program. We say that is a generic query of iff is only composed of an atom of the form for each predicate symbol that occurs in the head of at least one clause in , where are variables that occur only once in the query, and there are no other atoms in .
Example 7
Let be the program defined as follows:
p(X,X). q(X) :- p(1,a).
Then, given the generic query , we have the following TSLD-derivation:
Definition 8 (Blamed Clause)
Given a program and a query , a clause is a blamed clause of the TSLD-tree for if all derivations where is a input clause are erroneous.
The blamed clause is a clause in the program which causes a type error. A similar notion was first defined for functional programming languages with the blame calculus [WF09].
Example 8
Let be the following program, with clauses , , and , respectively:
p(1). q(a). q(X) :- p(a).
Then for the query , we have the following TSLD-tree:
In this case, is a blamed clause, since every derivation that uses it eventually reaches . Note that is not a blamed clause, because the leftmost branch of the TSLD-tree uses but is .
Definition 9 (Type Error in the Program)
Suppose we have a program and a generic query . Then has a type error if there is a blamed clause in the TSLD-tree for .
Note that if a program does not have a type error, then there is no blamed clause in the TSLD-tree. Also note that the order of the atoms in the generic query may change the place where we use the blamed clause, but it will not change the fact that there exists one or not.
Example 9
Assume the same program from Example 8. Let and be generic queries. The TSLD-trees for and are:
Intuitively, having a type error in the program means that somewhere in the program we will perform typed unification between two terms that do not have the same type.
Consider a generic query , where . For some derivation, after one step, we will have , where is a clause in and is a unifier of and . Since , or any other idempotent mgu of and , is a renaming of , where , and since the variables do not occur in because the clause is variable disjoint from the query by definition, nor do they occur in , by the definition of the generic query, then . The argument holds for any other atom in . Thus whenever the selected atom belongs to the original query the result is never and the mgu can be ignored.
After selecting the blamed clause , every TSLD-derivation is such that . Thus, at step , the selected atom comes from the program and every mgu applied up to this point is from substitutions arising from the program itself and not the query. Therefore, the type error was in the program.
Definition 10 (Type Error in the Query)
Let be a program and be a query. If there is no type error in and the TSLD-tree is finitely erroneous, then we say that there is a type error in the query with respect to .
If there is no type error in the program but the TSLD-tree is finitely erroneous, then that error must have occurred in a unification between terms from the query and the program. We then say that the type error is in the query.
Example 10
Now suppose we have the following program:
p(1). q(a). q(X) :- p(X).
Let us name the clauses , , and , respectively. The following trees are the TSLD-tree for a generic query, and the TSLD-tree for the query .
On the left-hand side, we see that for a generic query , the TSLD-tree for has no blamed query, which means that there is no type error in the program. On the right-hand side the tree is finitely erroneous, therefore there is a type error in query .
5 Declarative Semantics
The declarative semantics of logic programming is, in opposition to the operational one, a definition of what the programs compute. The fact that logic programming can be interpreted this way supports the fact that logic programming is declarative [Apt96]. In this section, we will introduce the concept of interpretations, which takes us from the syntactic programs we saw and used so far into the semantic universe, giving them meaning. With this interpretation we will redefine a declarative semantics for logic programming first defined in [BFC19] and prove a connection between both the operational and the declarative semantics.
5.1 Domains
Let be a non-empty set of semantic values, which we will call the universe. We assume that the universe is divided into domains such that each ground type is mapped to a non-empty domain. Thus, is divided into domains as follows: , where is the domain of integer numbers, is the domain of floating point numbers, is the domain of non-numeric constants, is the domain of strings, are domains for trees, where each domain has trees whose root is the same functor symbol and its n-children belong to n domains and is the domain of functions. Moreover, we define as the domain containing and , and as the domain with the single value , corresponding to a run-time error. We will call , , , and the base domains, and the tree domains. In particular, we can see that constants are separated into several predefined base domains, one for each base type, while complex terms, i.e. trees, are separated into domains depending on the principal function symbol (root) and the n-tuple inside the parenthesis (n-children).
5.2 Interpretations
Every constant of type is associated with a semantic value in one of the base domains, , , , or , corresponding to . Every function symbol of arity in our language is associated with a mapping from any n-tuple of base or tree domains to the domain , which is the domain of trees whose root is and the n-children are in the domains .
To define the semantic value for terms, we will first have to define states. States, , are mappings from variables into values of the universe. We also define a function that when applied to a semantic value returns the domain it belongs to. The semantic value of a term is defined as follows:
An interpretation associates every predicate symbol with a function in , such that the output of the function is the domain and the input is a union of tuples of domains. For each tuple that is in its domain, the function either returns or . We will use to denote the semantics of an expression , which can be an atom, a query, or a clause, in an interpretation , and define it as follows:
= if
then
else
Note that if the clause is of the form , then its semantics is equivalent to that of .
5.3 Models
The term language and their semantic values are fixed, thus each interpretation is determined by the interpretation of the predicate symbols. Interpretations differ from each other only in the functions they associate to each predicate defined in . We now define a context as a set of pairs of the form , where is a variable that occurs only once in the set, and is a domain. We say that complies with if every binding in is such that and . An interpretation is a model of in the context iff for every state that complies with , . We will denote this as . Given a program , we say that an interpretation is a model of in context if is a model of every clause in in context . Here we assume, without loss of generality, that all clauses are variable disjoint with each other. If two expressions and are such that every model of in a context is also a model of in the context , then we say that is a semantic consequence of and represent this by . Suppose two interpretations and are models of program in some context . Suppose, in particular, that for some predicate of the associated function is for and for . Let us call the set of tuples of terms for which outputs , and to the set of tuples of terms for which outputs . We say that is smaller than if and, if , then . We say that a model of in context is minimal if for every other model of in context , is smaller than .
Example 11
Consider the program defined below:
father(john,mary). father(phil,john). grandfather(X,Y) :- father(X,Z), father(Z,Y).
Suppose that interpretation is such that is associated with and . Suppose that is associated to in , with the same domain. Suppose, also, that , associated in with , is such that . Let the sets , , and be the sets of accepted tuples for , , and , respectively.
Thus, if these interpretations associate the same function to , and the set of accepted tuples for , then all are models of in context . In fact, all states that comply with are such that is true, for all .
But note that , and , but . In fact, any smaller domain or set would not model . Therefore is the minimal model of .
5.4 Type Errors
To calculate the set of accepted tuples for a given interpretation we will use the immediate consequence operator . The operator is traditionally used in the logic programming literature to iteratively calculate the minimal model of a logic program as presented in [vEK76, Apt96, Llo84].
Since for us interpretations for predicates are typed, does not generate an interpretation by itself. Instead it generates a set of atoms . Then we say that any interpretation derived from is such that for all predicates occurring in , , where for all there is at least one atom such that . Note that these interpretations may not be models of using our new definition of a model. We are now able to define the notion of ill-typed program.
Definition 11 (Ill-typed Program)
Let be a program. If no interpretation derived from is a model of , we say that is an ill-typed program.
Example 12
Let be the program defined as:
p(1). p(a). q(X) :- p(1.1).
Then . So any interpretation derived from is such that . Therefore for any context , for every that complies with , . Therefore no such is a model of .
The reason why is always a minimal model of in the untyped semantics, comes from the fact that whenever a body of a clause is for all states, then the clause is trivially for all states. However in our semantics, since we are separating these cases into and , the ones do not trivially make the formula , making it instead. These are the ill-typed cases.
Lemma 1 (Blamed Clause Type Error)
Suppose there is a type error in the program with blamed clause . Then, such that .
Effectively what this means is that if there is a type error in the program, then the blamed clause is such that it will not be used to calculate the , since at least one of the atoms in its body will never be able to be used in an application of . We can also have a ill-typed query, and we define it as follows.
Definition 12 (Ill-typed Query)
Let be a program. If any interpretation derived from , such that models in some context , is such that is not a model of in the context , then we say that is an ill-typed query with respect to .
5.5 Soundness of TSLD-resolution
In this section we will prove that TSLD-resolution is sound, i.e. if there is a successful derivation of a query in program with a correct answer substitution , then every model of is also a model of ; if there is a type error in the program, then the program is ill-typed; and if there is a type error in the query, the query is ill-typed with respect to the program. To prove this we will introduce the following auxiliary concept.
Definition 13 (Resultant)
Suppose we have a TSLD-derivation step . Then we define the resultant associated to this step as .
Lemma 2 (Soundness of resultants)
Let be a TSLD-derivation step using input clause and be the resultant associated with it. Then:
-
1.
;
-
2.
for any TSLD-derivation of with resultants , (for all ).
Proof of this lemma for the SLD-resolution is in [Apt96]. Since for unifiable terms the typed unification algorithm behaves like first-order unification, the proof still holds.
Theorem 5.1 (Soundness of TSLD-resolution)
Let be a program and a query. Then:
-
1.
Suppose that there exists a successful derivation of , with the correct answer substitution . Then .
-
2.
Suppose there is a type error in the program. Then is ill-typed.
-
3.
Suppose there is a type error in the query. Then is an ill-typed query with respect to .
A short note about completeness. As for untyped SLD-resolution, completeness is related to the search for answers in a TSLD-tree. If we use Prolog sequential, top-down, depth-first search with backtracking, then it may result in incompleteness for same cases where the TSLD-tree is infinite, because the exploration of an infinite computation may defer indefinitely the exploration of some alternative computation capable of yielding a correct answer.
6 Conclusions and Future Work
We presented an operational semantics for logic programming, here called TSLD-resolution, which is sensitive to run-time type errors. In this setting type errors are represented by a new value, here called wrong, which is added to the usual fail and success results of evaluation of a query for a given logic program. We have then adapted a previously defined declarative semantics for typed logic programs using a three-valued logic and proved that TSLD-resolution is sound with respect to this semantics. All these new concepts, TSLD, typed unification and the new declarative semantics, revisit and partially extend well-known concepts form the theory of logic programming.
Specially interpreted functors: in this paper functors are uninterpreted, such as in Prolog, in the sense that they are just symbols used to build new trees. An obvious extension of this work is to extend the system to dynamically detect type errors relating to the semantic interpretation of some specific functors, for instance the list constructor. For this, we would have for the list constructor not the Herbrand-based interpretation , but the following interpretation . Moreover, we would have the empty list with type . This would necessarily change the typed unification algorithm by introducing a new kind of constraints. As an example, consider the unification , where the second argument in both terms is not a list: considering a specially interpreted list constructor the result should be wrong, although the traditional untyped result is true. The same issues appear for arithmetic expressions. Arithmetic interpretations of , and can be introduced in the typed unification algorithm, so that in this context, unifications such as would now return wrong instead of true. These extensions are left for future work.
Using TSLD in practice: we plan to integrate TSLD into the YAP Prolog System [CRD12]. This will enable further applications of the method to large scale Prolog programs. One important point of this integration it to add explicit type declarations to all the implicitly typed Prolog built-in predicates. Finally, one could also wonder how to apply TSLD to the dynamic typing of constraint logic programming modules, adding new types mapped to constraint domains.
References
- [Apt96] Krzysztof R. Apt. From Logic Programming to Prolog. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1996.
- [BB81] D.A. Bochvar and Merrie Bergmann. On a three-valued logical calculus and its application to the analysis of the paradoxes of the classical extended functional calculus. History and Philosophy of Logic, 2(1-2):87–112, 1981.
- [Bea16] Jc Beall. Off-topic: A new interpretation of weak-kleene logic. The Australasian Journal of Logic, 13(6), 2016.
- [BFC19] João Barbosa, Mário Florido, and Vítor Santos Costa. A three-valued semantics for typed logic programming. In Proceedings 35th International Conference on Logic Programming (Technical Communications), ICLP 2019 Technical Communications, Las Cruces, NM, USA, September 20-25, 2019, volume 306 of EPTCS, pages 36–51, 2019.
- [BFC22] João Barbosa, Mário Florido, and Vítor Santos Costa. Data type inference for logic programming. In Emanuele De Angelis and Wim Vanhoof, editors, Logic-Based Program Synthesis and Transformation, pages 16–37, Cham, 2022. Springer International Publishing.
- [BJ88] Maurice Bruynooghe and Gerda Janssens. An instance of abstract interpretation integrating type and mode inferencing. In Fifth International Conference and Symposium, Washington, 1988, pages 669–683, 1988.
- [CRD12] Vítor Santos Costa, Ricardo Rocha, and Luís Damas. The YAP prolog system. Theory Pract. Log. Program., 12(1-2):5–34, 2012.
- [DEC96] Pierre Deransart, AbdelAli Ed-Dbali, and Laurent Cervoni. Prolog - the standard: reference manual. Springer, 1996.
- [DMP02] Włodzimierz Drabent, Jan Małuszyński, and Paweł Pietrzak. Using parametric set constraints for locating errors in CLP programs. Theory and Practice of Logic Programming, 2(4-5):549–610, 2002.
- [DS01] Pierre Deransart and Jan-Georg Smaus. Well-typed logic programs are not wrong. In Herbert Kuchen and Kazunori Ueda, editors, Functional and Logic Programming, 5th International Symposium, FLOPS 2001, Tokyo, Japan, 2001, Proceedings, volume 2024 of Lecture Notes in Computer Science, pages 280–295. Springer, 2001.
- [DZ92] Philip W. Dart and Justin Zobel. A regular type language for logic programs. In Frank Pfenning, editor, Types in Logic Programming, pages 157–187. The MIT Press, 1992.
- [FSVY91] Thom W. Frühwirth, Ehud Y. Shapiro, Moshe Y. Vardi, and Eyal Yardeni. Logic programs as types for logic programs. In Proc. of the Sixth Annual Symposium on Logic in Computer Science (LICS ’91), Netherlands,1991, pages 300–309, 1991.
- [Han14] Michael Hanus. Multiparadigm languages. In Computing Handbook, Third Edition: Computer Science and Software Engineering, pages 66: 1–17. Springer Cham, 2014.
- [Her30] Jacques Herbrand. Recherches sur la théorie de la démonstration. Numdam, 1930.
- [Kle38] S. C. Kleene. On notation for ordinal numbers. Journal Symbolic Logic, 3(4), 1938.
- [Llo84] J. W. Lloyd. Foundations of Logic Programming. Springer-Verlag, Berlin, Heidelberg, 1984.
- [LR91] T. L. Lakshman and Uday S. Reddy. Typed Prolog: A semantic reconstruction of the Mycroft-O’Keefe type system. In Logic Programming, Proceedings of the 1991 International Symposium, San Diego, California, USA, 1991.
- [Lu01] Lunjin Lu. On Dart-Zobel algorithm for testing regular type inclusion. SIGPLAN Not., 36(9):81–85, 2001.
- [Mil78] Robin Milner. A theory of type polymorphism in programming. J. Comput. Syst. Sci., 17(3):348–375, 1978.
- [MM82] Alberto Martelli and Ugo Montanari. An efficient unification algorithm. ACM Trans. Program. Lang. Syst., 4(2):258–282, apr 1982.
- [MO84] Alan Mycroft and Richard A. O’Keefe. A polymorphic type system for Prolog. Artif. Intell., 23(3):295–307, 1984.
- [Rob65] J. A. Robinson. A Machine-Oriented Logic Based on the Resolution Principle. Journal of the ACM, 12(1):23–41, 1965.
- [SBG08] Tom Schrijvers, Maurice Bruynooghe, and John P. Gallagher. From monomorphic to polymorphic well-typings and beyond. In Logic-Based Program Synthesis and Transformation, 18th International Symposium, LOPSTR 2008, Spain, July 17-18, 2008, pages 152–167, 2008.
- [SCWD08] Tom Schrijvers, Vítor Santos Costa, Jan Wielemaker, and Bart Demoen. Towards typed Prolog. In Logic Programming, 24th International Conference, ICLP 2008, Udine, Italy, Proceedings, pages 693–697, 2008.
- [vEK76] Maarten H. van Emden and Robert A. Kowalski. The semantics of predicate logic as a programming language. J. ACM, 23(4):733–742, 1976.
- [WF09] Philip Wadler and Robert Bruce Findler. Well-typed programs can’t be blamed. In Giuseppe Castagna, editor, Programming Languages and Systems, 18th European Symposium on Programming, ESOP 2009, York, UK, March 22-29, 2009. Proceedings, volume 5502 of Lecture Notes in Computer Science, pages 1–16. Springer, 2009.
- [YFS92] Eyal Yardeni, Thom W. Frühwirth, and Ehud Shapiro. Polymorphically typed logic programs. In Frank Pfenning, editor, Types in Logic Programming, pages 63–90. The MIT Press, 1992.
- [Zob87] Justin Zobel. Derivation of polymorphic types for Prolog programs. In Logic Programming, Proceedings of the Fourth International Conference, Melbourne, 1987, 1987.
Appendix 0.A Proofs for Theorems
Theorem 0.A.1 (Equivalence to term unification for success)
Let and be two terms. If we apply the Martelli-Montanari algorithm (MM algorithm) to and and it returns a set of solved equalities , then the typed unification algorithm applied to the same two terms is also successful and returns the same set of equalities.
Proof
The proof follows from induction on . Base cases:
-
•
Suppose we have an equality in of the form . Then the MM algorithm halts with failure, and our algorithm halts with .
-
•
Suppose we have an equality in of the form . Then the MM algorithm halts with failure, and our algorithm either halts with or changes to , depending on the types of and . In either case, it will not be successful.
-
•
Suppose we have an equality in of the form or . Then the MM algorithm halts with failure, and our algorithm halts with .
-
•
Suppose we have an equality in of the form , where occurs in . Then, the MM algorithm halts with failure and our algorithm changes to , so it is never successful.
Inductive cases:
-
•
Suppose we have an equality in of the form . Both algorithms generate the same new equalities and replace the selected one with those in . Then, by the induction hypothesis, if the MM algorithm succeeds and outputs a set of solved equalities , so does our algorithm.
-
•
Suppose we have an equality in of the form . Both algorithms delete this equality from . Then, by the induction hypothesis, if the MM algorithm succeeds and outputs a set of solved equalities , so does our algorithm.
-
•
Suppose we have an equality in of the form . Both algorithms delete this equality from . Then, by the induction hypothesis, if the MM algorithm succeeds and outputs a set of solved equalities , so does our algorithm.
-
•
Suppose we have an equality in of the form , where is not a variable and is a variable. Both algorithms replace this equality from with the same new one. Then, by the induction hypothesis, if the MM algorithm succeeds and outputs a set of solved equalities , so does our algorithm.
-
•
Suppose we have an equality in of the form , where does not occur in and occurs somewhere else in . Both algorithms apply the same substitution to , therefore resulting in the same set of equalities. Then, by the induction hypothesis, if the MM algorithm succeeds and outputs a set of solved equalities , so does our algorithm.
In any other case, none of the algorithms apply. ∎
Theorem 0.A.2 (Ill-typed unification)
If the output of the typed unification algorithm is wrong, then there is no substitution such that and have the same type.
Proof
Suppose that there is a substitution such that . Then, we now that the MM algorithm would output an mgu of and . Therefore so would our algorithm, from theorem 3.1. But the output of our algorithm is wrong, so there cannot be such a . Suppose that there is a substitution such that and have the same type. Then, we will prove that our algorithm would output by induction on and . Base cases:
-
•
Suppose we have and has the same type as . Then the algorithm outputs , not .
-
•
Suppose we have and has a different type from . Then such cannot exist.
-
•
Suppose we have , where occurs in . Then the algorithm outputs , not .
-
•
Suppose we have . Then such cannot exist.
Inductive step:
-
•
Suppose we have . Then, by the induction hypothesis, if the algorithm outputs for the input , we know that no such that and have the same type, exists. Therefore, no such that and have the same type exists either, by the properties of substitution.
-
•
Suppose we have or , then the algorithm outputs , not .
-
•
Suppose we have , then either the algorithm ends with or if occurs in , but never .
-
•
Suppose we have , where is not a variable. Then, the algorithm either halts with success in one step or fails. Either way it does not output .
So, we prove that whenever the typed unification algorithm halts with for some pair of terms and , then there is no substitution such that and have the same type. ∎
Lemma 3 (Blamed Clause Type Error)
Suppose there is a type error in the program with blamed clause . Then, such that .
Proof
We will prove this by contradiction. Suppose that for all there is some such that for some , . Then, there would be a derivation of the form or in the TSLD-tree for , where is a generic query, since the output for the unification between and would not return . But then would not be a blamed clause. Therefore we proved the lemma.
Theorem 0.A.3 (Soundness of TSLD-resolution)
Let be a program and a query. Then:
-
1.
Suppose that there exists a successful derivation of , with the correct answer substitution . Then .
-
2.
Suppose there is a type error in the program. Then is ill-typed.
-
3.
Suppose there is a type error in the query. Then is an ill-typed query with respect to .
Proof
(1) Let be the mgus obtained in a successful derivation. Therefore, . The proof follows directly from lemma 2 applied to .
(2) If there is a type error in the program, then there is a blamed clause in the TSLD-tree for , where is a generic query. Then by lemma 1 we know that there is at least one , in the body of , such that . Any interpretation derived from is such that for any . This implies that no such is a model of and, therefore, no such is a model of , which means P is ill-typed.
(3) This is the case where there is a type error in the query . Now, consider program , where is a predicate that does not occur in . Then note that every TSLD-derivation for a generic query with input clause leads to , so this is a blamed clause. Therefore, from (2), no interpretation derived from models this clause. Since the set of atoms for in is empty, we can give any interpretation to in any interpretation derived from . So we can choose an interpretation derived from such that models in some context . Therefore, as no interpretation derived from models in any context and we can build an interpretation which models predicate in some context, no interpretation can model . Thus, by the definition of ill-typed query, is ill-typed with respect to .
∎