This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

11institutetext: LIACC, INESC, Dep. de Ciência de Computadores
Faculdade de Ciências
Universidade do Porto, Porto
11email: {joao.barbosa,amflorido,vscosta}@fc.up.pt

Typed SLD-Resolution: Dynamic Typing for Logic Programming

João Barbosa    Mário Florido    Vítor Santos Costa
Abstract

The semantic foundations for logic programming are usually separated into two different approaches. The operational semantics, which uses SLD-resolution, the proof method that computes answers in logic programming, and the declarative semantics, which sees logic programs as formulas and its semantics as models. Here, we define a new operational semantics called TSLD-resolution, which stands for Typed SLD-resolution, where we include a value “wrong”, that corresponds to the detection of a type error at run-time. For this we define a new typed unification algorithm. Finally we prove the correctness of TSLD-resolution with respect to a typed declarative semantics.

Keywords:
Logic Programming Operational Semantics Types

1 Introduction

Types play an important role in the verification and debugging of programming languages, and have been the subject of significant research in the logic programming community [Zob87, DZ92, Lu01, FSVY91, YFS92, MO84, LR91, SBG08, DMP02, SCWD08, Han14, BFC19, BFC22]. Most research has been driven by the desire to perform compile-time checking. One important line of this work views types as approximation of the program semantics [Zob87, DZ92, YFS92, BJ88, FSVY91]. A different approach relies on asking the user to provide the type information, thus filtering the set of admissible programs [MO84, LR91, SCWD08, BFC19]. In practice, static type-checking is not widely used in actual Prolog systems, but Prolog systems do rely on dynamic typing to ensure that system built-in parameters are called with acceptable arguments, such as is/2. In fact, the Prolog ISO standard defines a set of predefined types and typing violations [DEC96].

Motivated by these observations, we propose a step forward in the dynamic type checking of logic programs: to extend unification with a type checking mechanism. Type checking will thus become a core part of the resolution engine. This extension enables the detection of several bugs in Prolog programs which are rather difficult to capture in the standard untyped language, such as the unintended switch of arguments in a predicate call. This approach can also be used to help in the tracing of bugs in the traditional Prolog Four-Port Model debugging approach. In classical programming languages, type-checking essentially captures the use of functions on arguments of a type different from the expected. The parallel in logic programming is to capture type errors in queries applied to arguments of a different type. This is the essence of our new operational mechanism, here called Typed SLD (TSLD). Following Milner’s argument on wrong programs [Mil78], type errors will be denoted by an extra value, wrong; unification may succeed, fail, or be wrong. As discussed in prior work, a three-valued semantics provides a natural framework for describing the new unification [BFC19]. We name the new evaluation mechanism, that extends unification with type checking thus performing dynamic typing, Typed SLD (TSLD). Let us now present a simple example of the use of TSLD-resolution:

Example 1

Consider the following program consisting of the three facts:

p(0).
p(1).
p(a).

Let \square stand for success, false for failure and wrong for a run-time error. Assuming that constants 11 and aa have different types (in this case, intint and atomatom, respectively), the TSLD-tree for the query p(1)p(1) is:

p(1)p(1)falsefalse\squarewrongwrong
Figure 1: TSLD-tree

In the following we prove that TSLD-resolution is correct with respect to a new typed declarative semantics for logic programming based on the existence of several different semantic domains, for the interpretation of terms, instead of the usual Herbrand universe domain. The use of different domains for the semantics of logic programming is not new [LR91], but before it was used in a prescriptive typing approach where types were mandatory for every syntactic objects: functors, variables, and predicates. Here we assume that only constants and function symbols have predefined types. Type correctness in the form of a reformulation of subject-reduction for SLD-resolution was defined in [DS01] for a typed version of logic programs where also every syntactic objects must be typed statically. To semantically deal with dynamic typing here we define a new version of SLD (Typed SLD) and prove its soundness with respect to a new declarative semantics. In another previous work [SCWD08] dynamic type checking of Prolog predicates was done in two different scenarios: on calls from untyped to typed code using program transformation, and on calls to untyped code from typed code to check whether the untyped code satisfies previously made type annotations. In this previous work type annotations for predicates were necessary in both scenarios and the semantic soundness of these run-time checks was not studied. Here we do a semantic study of dynamic typing and use it to show that a new operational mechanism detecting run-time errors is sound. For this we use predefined types only for constants and function symbols.

The paper is organised as follows. Section 2 reviews some preliminary background concepts with the necessary definitions and results on the theory of types and logic programming, and sets the grounds for subsequent developments. In Section 3 we present the important new notion of typed unification, present the use of different types for constant symbols, and show that typed unification extends standard first order unification in the sense that when it succeeds the result is the same. Section 4 presents TSLD-resolution. We definite the notion of TSLD-derivation step and TSLD-tree, showing how run-time type errors are detected during program evaluation. We also distinguish the notion of a type error in the program from the notion of a type error in the query with respect to a program. In Section 5, we define a new declarative semantics for logic programming based on our previous work [BFC19] using a three-valued logic which uses the value wrong to denote run-time type errors, define ill-typed programs and queries, and show that TSLD-resolution is sound with respect to this declarative semantics. Finally, in section 6, we conclude and point out some research directions for future work.

2 Preliminary Concepts

In this section we will present concepts that are relevant both to the definitions of the operational and declarative semantics for logic programming. These concepts include the three-valued logic that will be used throughout the paper, defined initially in [Kle38], and the definition of our syntax for types, terms, and programs.

2.1 Three-Valued Logic

The three-valued logic used in this paper is the Weak Kleene logic [Kle38], later interpreted by [BB81] and [Bea16]. As in our previous work [BFC19] the third value is called wrong and represents a (dynamic) type error. In this logic, the value wrong propagates through every connective, which is a behaviour we want the type error value to have. In table 1, we describe the connectives in the logic.

\bm{\wedge} true false wrong
true true false wrong
false false false wrong
wrong wrong wrong wrong
\bm{\vee} true false wrong
true true true wrong
false true false wrong
wrong wrong wrong wrong
Table 1: Connectives of the three-valued logic - conjunction and disjunction

The negation of logic values is defined as: ¬true=false\neg true=false, ¬false=true\neg false=true and ¬wrong=wrong\neg\emph{wrong}=\emph{wrong}. And implication is defined as: p>q(¬p)qp-->q\equiv(\neg p)\vee q.

Note that whenever the value wrong occurs in any connective in the logic, the result of applying that connective is wrong.

2.2 Types

In this paper we fix the set of base types int, float, atom and string, an enumerable set of compound types f(σ_1,,σ_n)f(\sigma_{\_}1,\cdots,\sigma_{\_}n), where ff is a function symbol and σ_i\sigma_{\_}i are types, and an enumerable set of functional types of the form σ_1××σ_nσ\sigma_{\_}1\times\cdots\times\sigma_{\_}n\rightarrow\sigma, where σ_i\sigma_{\_}i and σ\sigma are types.

We use this specific choice of base types because they correspond to types already present, to some extent, in Prolog. Some built-in predicates already expect integers, floating point numbers, or atoms.

2.3 Terms

The alphabet of logic programming is composed of symbols from disjoint classes. For our language of terms we have an infinite set of variables Var, an infinite set of function symbols Fun, parenthesis and the comma [Apt96].

Terms are defined as follows:

  • a variable is a term,

  • if ff is an n-ary function symbol and t_1,,t_nt_{\_}1,\dots,t_{\_}n are terms, then f(t_1,,t_n)f(t_{\_}1,\dots,t_{\_}n) is a term,

  • if ff is a function symbol of arity zero, then ff is a term and it is called a constant.

A ground term is a term with no variables. In the rest of the paper we assume that ground terms are assumed to be typed, meaning that each constant has associated to it a base type and for any ground compound term f(t_1,t_n)f(t_{\_}1,\cdots t_{\_}n), the function symbol ff (of arity n1n\geq 1) has associated to it a functional type of the form σ_1××σ_nf(σ_1,,σ_n)\sigma_{\_}1\times\cdots\times\sigma_{\_}n\rightarrow f(\sigma_{\_}1,\cdots,\sigma_{\_}n). Note that variables are not statically typed, but, as we will see in the forthcoming sections, type checking of the use of variables will be made dynamically through TSLD-resolution.

2.4 Programs and Queries

We now extend our language of terms to a language of programs by adding an infinite set of predicate symbols Pred and the reverse implication \leftarrow.

The definition of atoms, queries, clauses and programs is the usual one [Apt96]:

  • an atom is either a predicate symbol pp with arity nn, applied to terms t_1,,t_nt_{\_}1,\dots,t_{\_}n, which we write as p(t_1,,t_n)p(t_{\_}1,\dots,t_{\_}n). We will represent atoms by H,A,BH,A,B;

  • a query is a finite sequence of atoms, which we will represent by Q,A¯,B¯Q,\bar{A},\bar{B};

  • a clause is of the form HB¯H\leftarrow\bar{B}, where HH is an atom of the form p(t_1,,t_n)p(t_{\_}1,\dots,t_{\_}n) and B¯\bar{B} is a query;

  • a program is a finite set of clauses, which we will represent by PP.

The interpretation of queries and clauses is quantified. Every variable that occurs in a query is assumed to be existentially quantified and every variable that occurs in a clause is universally quantified [Apt96].

3 Typed Unification

Solving equality constraints using a unification algorithm [Rob65, MM82] is the main computational mechanism in logic programming. Logic programming usually uses an untyped term language and assumes a semantic universe composed of all semantic values: the Herbrand universe [Her30, Apt96].

However, in our work, we assume that the semantic values are split among several disjoint semantic domains and thus equality only makes sense inside each domain. Moreover each type will be mapped to a non-empty semantic domain. To reflect this, unification may now return three different outputs. Besides being successful or failing, unification can now return the wrong value. This is the logical value of nonsense and reflects the fact that we are trying to perform unification between terms with different types corresponding to a type error during program evaluation.

A substitution is a mapping from variables to terms, which assigns to each variable XX in its domain a term tt. We will represent bindings by XtX\mapsto t, substitutions by symbols such as θ,η,δ\theta,\eta,\delta\dots, and applying a substitution θ\theta to a term tt will be represented by θ(t)\theta(t). We say θ(t)\theta(t) is an instance of tt.

Substitution composition is represented by \circ, i.e., the composition of the substitutions θ\theta and η\eta is denoted θη\theta\circ\eta and applying (θη)(t)(\theta\circ\eta)(t) corresponds to θ(η(t))\theta(\eta(t)). We can also calculate substitution composition, i.e., δ=θη\delta=\theta\circ\eta as defined below [Apt96].

Definition 1 (Substitution Composition)

Suppose θ\theta and η\eta are substitutions, such that θ=[X_1t_1,,X_nt_n]\theta=[X_{\_}1\mapsto t_{\_}1,\dots,X_{\_}n\mapsto t_{\_}n] and η=[Y_1t_1,,Y_mt_m]\eta=[Y_{\_}1\mapsto t_{\_}1\prime,\dots,Y_{\_}m\mapsto t_{\_}m\prime]. Then, composition ηθ\eta\circ\theta is calculated by following these steps:

  • remove from the sequence X_1η(t_1),,X_nη(t_n),Y_1t_1,,Y_mt_mX_{\_}1\mapsto\eta(t_{\_}1),\dots,X_{\_}n\mapsto\eta(t_{\_}n),Y_{\_}1\mapsto t_{\_}1\prime,\dots,Y_{\_}m\mapsto t_{\_}m\prime the bindings X_iη(t_i)X_{\_}i\mapsto\eta(t_{\_}i) such that X_i=η(t_i)X_{\_}i=\eta(t_{\_}i) and the elements Y_it_iY_{\_}i\mapsto t_{\_}i\prime for which X_j.Y_i=X_j\exists X_{\_}j.Y_{\_}i=X_{\_}j

  • form a substitution from the resulting sequence.

A substitution θ\theta is called a unifier of two terms t_1t_{\_}1 and t_2t_{\_}2 iff θ(t_1)=θ(t_2)\theta(t_{\_}1)=\theta(t_{\_}2). If such a substitution exists, we say that the two terms are unifiable. In particular, a unifier θ\theta is called a most general unifier (mgu) of two terms t_1t_{\_}1 and t_2t_{\_}2 if for every other unifier η\eta of t_1t_{\_}1 and t_2t_{\_}2, η=δθ\eta=\delta\circ\theta, for some substitution δ\delta.

First order unification [Rob65] assumes an untyped universe, so unification between any two terms always makes sense. Therefore, it either returns a mgu between the terms, if it exists, or halts with failure.

We argue that typed unification only makes sense between terms of the same type. Here we will extend a previous unification algorithm [MM82] to define a typed unification algorithm, where failure will be separated into falsefalse, where two terms are not unifiable but may have the same type, and wrong, where the terms cannot have the same type.

Definition 2 (Typed Unification Algorithm)

Let t_1t_{\_}1 and t_2t_{\_}2 be two terms, and FF be a flag that starts truetrue. We create the starting set of equations as S={t_1=t_2}S=\{t_{\_}1=t_{\_}2\}, and we will rewrite the pair (S,F)(S,F) by applying the following rules until it is no longer possible to apply any of them, or until the algorithm halts with wrong. If no rules are applicable, then we output falsefalse if the flag is falsefalse, or output the solved set SS, which can be seen as a substitution.

  1. 1.

    ({f(t_1,,t_n)=f(s_1,,s_n)}Rest,F)({t_1=s_1,,t_n=s_n}Rest,F)(\{f(t_{\_}1,\dots,t_{\_}n)=f(s_{\_}1,\dots,s_{\_}n)\}\cup Rest,F)\to(\{t_{\_}1=s_{\_}1,\dots,t_{\_}n=s_{\_}n\}\cup Rest,F)

  2. 2.

    ({f(t_1,,t_n)=g(s_1,,s_m)}Rest,F)wrong(\{f(t_{\_}1,\dots,t_{\_}n)=g(s_{\_}1,\dots,s_{\_}m)\}\cup Rest,F)\to wrong, if fgf\neq g or nmn\neq m

  3. 3.

    ({c=c}Rest,F)(Rest,F)(\{c=c\}\cup Rest,F)\to(Rest,F)

  4. 4.

    ({c=d}Rest,F)(Rest,false)(\{c=d\}\cup Rest,F)\to(Rest,false), if cdc\neq d, and cc and dd have the same type

  5. 5.

    ({c=d}Rest,F)wrong(\{c=d\}\cup Rest,F)\to wrong, if cdc\neq d, and cc and dd have different types

  6. 6.

    ({c=f(t_1,,t_n)}Rest,F)wrong(\{c=f(t_{\_}1,\dots,t_{\_}n)\}\cup Rest,F)\to wrong

  7. 7.

    ({f(t_1,,t_n)=c}Rest,F)wrong(\{f(t_{\_}1,\dots,t_{\_}n)=c\}\cup Rest,F)\to wrong

  8. 8.

    ({X=X}Rest,F)(Rest,F)(\{X=X\}\cup Rest,F)\to(Rest,F)

  9. 9.

    ({t=X}Rest,F)({X=t}Rest,F)(\{t=X\}\cup Rest,F)\to(\{X=t\}\cup Rest,F), where tt is not a variable and XX is a variable

  10. 10.

    ({X=t}Rest,F)({X=t}[Xt](Rest),F)(\{X=t\}\cup Rest,F)\to(\{X=t\}\cup[X\mapsto t](Rest),F), where XX does not occur in tt and XX occurs in RestRest

  11. 11.

    ({X=t}Rest,F)(Rest,false)(\{X=t\}\cup Rest,F)\to(Rest,false), where XX occurs in tt and XtX\neq t

Let us illustrate with an example.

Example 2

Let t_1t_{\_}1 be g(X,a,f(1))g(X,a,f(1)) and t_2t_{\_}2 be g(b,Y,f(2))g(b,Y,f(2)). We generate the pair ({g(X,a,f(1))=g(b,Y,f(2))},true)(\{g(X,a,f(1))=g(b,Y,f(2))\},true), and proceed to apply the rewriting rules.
({g(X,a,f(1))=g(b,Y,f(2))},true)_1({X=b,a=Y,f(1)=f(2)},true)_10({X=b,Y=a,f(1)=f(2)},true)_1({X=b,Y=a,1=2},true)_5({X=b,Y=a},false)false(\{g(X,a,f(1))=g(b,Y,f(2))\},true)\to_{\_}1\\ (\{X=b,a=Y,f(1)=f(2)\},true)\to_{\_}{10}\\ (\{X=b,Y=a,f(1)=f(2)\},true)\to_{\_}1\\ (\{X=b,Y=a,1=2\},true)\to_{\_}5\\ (\{X=b,Y=a\},false)\to false.

The successful cases of this algorithm are the same as for first order unification [MM82]. We will prove this result in the following theorem.

Theorem 3.1 (Conservative with respect to term unification)

Let t_1t_{\_}1 and t_2t_{\_}2 be two terms. If we apply the Martelli-Montanari algorithm (MM algorithm) to t_1t_{\_}1 and t_2t_{\_}2 and it returns a set of solved equalities SS, then the typed unification algorithm applied to the same two terms is also successful and returns the same set of equalities.

The following theorem proves typed unification detects run-time type errors.

Theorem 3.2 (Ill-typed unification)

If the output of the typed unification algorithm is wrong, then there is no substitution θ\theta such that θ(t_1)\theta(t_{\_}1) and θ(t_2)\theta(t_{\_}2) have the same type.

Example 3

Let t_1=f(1,g(h(X,2)),Y)t_{\_}1=f(1,g(h(X,2)),Y) and t_2=f(Z,g(h(W,a)),1)t_{\_}2=f(Z,g(h(W,a)),1). The typed unification algorithm outputs wrongwrong. We can see in Table 2 that there is no substitution θ\theta such that θ(t_1)=θ(t_2)\theta(t_{\_}1)=\theta(t_{\_}2), nor any substitution θ\theta such that θ(t_1)\theta(t_{\_}1) has the same type as θ(t_2)\theta(t_{\_}2), since the highlighted terms cannot have the same type for any substitution.

ff11gghhXX22YY
ffZZgghhWWaa11
Table 2: Tree representation terms t_1t_{\_}1 and t_2t_{\_}2

4 Operational Semantics

The operational semantics of logic programming describes how answers are computed. Here we define Typed SDL-resolution (TSLD) which returns the third value wrong whenever it finds a type error. We start by defining a TSLD-derivation step, which is a variation on the basic mechanism for computing answers to queries in the untyped semantics for logic programming, the SLD-derivation step. The major difference is the use of the typed unification algorithm. Then we create TSLD-derivations by iteratively applying these singular steps. After this, we introduce the concept of TSLD-trees and use it to represent the search space for answers in logic programming. Finally, we interpret the contents of the TSLD-tree.

4.1 TSLD-derivation

To compute in logic programming, we need a program PP and a query QQ. We can interpret PP as being a set of statements, or rules, and QQ as being a question that will be answered by finding an instance θ(Q)\theta(Q) such that θ(Q)\theta(Q) follows from PP. The essence of computation in logic programming is then to find such θ\theta [Apt96].

In our setting the basic step for computation is the TSLD-derivation step. It corresponds to having a non-empty query QQ and selecting from QQ an atom AA. If AA unifies with HH, where HB¯H\leftarrow\bar{B} is an input clause, we replace AA in QQ by B¯\bar{B} and apply an mgu of AA and HH to the query.

Definition 3 (TSLD-derivation step)

Consider a non-empty query Q=A_1¯,A,A_2¯Q=\bar{A_{\_}1},A,\bar{A_{\_}2} and a clause cc of the form HB¯H\leftarrow\bar{B}. Suppose that AA unifies (using typed unification) with HH and let θ\theta be a mgu of AA and HH. AA is called the selected atom of QQ. Then we write

A_1¯,A,A_2¯𝑐θ(A_1¯,B¯,A_2¯)\bar{A_{\_}1},A,\bar{A_{\_}2}\underset{c}{\implies}\theta(\bar{A_{\_}1},\bar{B},\bar{A_{\_}2})

and call it a TSLD-derivation step. HB¯H\leftarrow\bar{B} is called its input clause. If typed unification between the selected atom AA and the input clause cc outputs wrongwrong (or falsefalse) we write the TSLD-derivation step as QwrongQ\implies wrong (or Qfalse,A_1¯,A_2¯Q\implies false,\bar{A_{\_}1},\bar{A_{\_}2}).

In this definition we assume that AA is variable disjoint with HH. It is always possible to rename the variables in HB¯H\leftarrow\bar{B} in order to achieve this, without loss of generality.

Definition 4 (TSLD-derivation)

Given a program PP and a query QQ a sequence of TSLD-derivation steps from QQ with input clauses of PP reaching the empty query, falsefalse, or wrongwrong, is called a TSLD-derivation of QQ in PP.

If the program is clear from the context, we speak of a TSLD-derivation of the query QQ and if the input clauses are irrelevant we drop the reference to them. Informally, a TSLD-derivation corresponds to iterating the process of the TSLD-derivation step. We say that a TSLD-derivation is successful if we reach the empty query, further denoted by \square. The composition of the mgus θ_1,,θ_n\theta_{\_}1,\dots,\theta_{\_}n used in each TSLD-derivation step is the computed answer substitution of the query. A TSLD-derivation that reaches falsefalse is called a failed derivation and a TSLD-derivation that reaches wrongwrong is called an erroneous derivation.

In a TSLD-derivation, at each TSLD-derivation step we have several choices. We choose an atom from the query, a clause from the program, and a mgu. It is proven in [Apt96] that the choice of mgu does not affect the success or failure of an SLD-derivation, as long as the resulting mgu is idempotent. Since for TSLD-derivations the successes are the same as the ones in a corresponding SLD-derivation, then the result still holds for TSLD.

The selection rule, i.e, how we choose the selected atom in the considered query, does not influence the success of a TSLD-derivation either [Apt96], however if you stopped as soon as unification returns false, it could prevent us from detecting a type error, in a later atom. Let us show this in the following example.

Example 4

Consider the logic program PP consisting of only one fact, p(X,X)p(X,X), and the selection rule that chooses the leftmost atom at each step.

Then, if we stopped when reaching falsefalse, the query Q=p(1,2),p(1,a)Q=p(1,2),p(1,a) would have the TSLD-derivation QfalseQ\implies false, since typed unification between p(X,X)p(X,X) and p(1,2)p(1,2) outputs falsefalse. However, the query Q=p(1,a),p(1,2)Q\prime=p(1,a),p(1,2) has the TSLD-derivation QwrongQ\implies wrong, since typed unification between p(X,X)p(X,X) and p(1,a)p(1,a) outputs wrongwrong.

In fact, as the comma stands for conjunction, and since wrongfalse=falsewrong=wrongwrong\wedge false=false\wedge wrong=wrong, we have to continue even if typed unification outputs falsefalse in a step, and check if we ever reach the value wrongwrong. In general, for any selection rule SS we can construct a query QQ such that it is necessary to continue when typed unification outputs falsefalse for some atom in QQ. Therefore, when we reach the value falsefalse in a TSLD-derivation step, we continue applying steps until either we obtain a value wrongwrong from typed unification or we have no more atoms to select. In this last case, we can safely say that we reached falsefalse. This guarantees independence of the selection rule. For the following example we use the selection rule that always chooses the leftmost atom in a query, which is the selection rule of Prolog.

Example 5

Let us continue example 4. The TSLD-derivation for QQ is Qfalse,p(1,a)wrongQ\implies false,p(1,a)\implies wrong. Let Q=p(1,2),p(1,1)Q\prime\prime=p(1,2),p(1,1). Then the TSLD-derivation is Qfalse,p(1,1)falseQ\prime\prime\implies false,p(1,1)\implies false.

Note that when we get to falsefalse for a typed unification in a TSLD-derivation, we can only output falsefalse or wrongwrong, so either way it is not a successful derivation.

The selected clause from the program is another choice point we have at each TSLD-derivation step. We will discuss the impact of this choice in the next section.

4.2 TSLD-tree

When we want to find a successful TSLD-derivation for a query, we need to consider the entire search space, which consists of all possible derivations, choosing all possible clauses for a selected atom. We are considering a fixed selection rule here, so the only thing that changes between derivations is the selected clause. We say that a clause HB¯H\leftarrow\bar{B} is applicable to an atom AA if HH and AA have the same predicate symbol with the same arity.

Definition 5 (TSLD-tree)

Given a program PP and a query QQ, a TSLD-tree for P{Q}P\cup\{Q\} is a tree where branches are TSLD-derivations of P{Q}P\cup\{Q\} and every node QQ has a child for each clause from PP applicable to the selected atom of QQ.

Prolog uses the leftmost selection rule, where one always selects the leftmost atom in the query, and since the selection rule does not change the success of a TSLD-derivation we will use this selection rule in the rest of the paper.

Definition 6 (TSLD-tree classification)
  • If a TSLD-tree contains the empty query, we call it successful.

  • If a TSLD-tree is finite and all its branches are erroneous TSLD-derivations, we call it finitely erroneous.

  • If a TSLD-tree is finite and it is not successful nor finitely erroneous, we say it is finitely failed.

Example 6

Let program PP be:

p(1).
p(2).
q(1).
q(a).
r(X) :- p(X),q(X).

and let query QQ be r(1)r(1). The TSLD-tree for QQ and PP is the following successful TSLD-tree:

r(1)r(1)p(1),q(1)p(1),q(1)q(1)q(1)\squarewrongwrongfalse,q(1)false,q(1)falsefalsewrongwrong

We will present two auxiliary definitions which are needed to clearly define the notion of a type error in a program.

Definition 7 (Generic Query)

Let QQ be a query and PP a program. We say that QQ is a generic query of PP iff QQ is only composed of an atom of the form p(X_1,,X_n)p(X_{\_}1,\dots,X_{\_}n) for each predicate symbol pp that occurs in the head of at least one clause in PP, where X_1,,X_nX_{\_}1,\dots,X_{\_}n are variables that occur only once in the query, and there are no other atoms in QQ.

Example 7

Let PP be the program defined as follows:

p(X,X).
q(X) :- p(1,a).

Then, given the generic query p(X_1,X_2),q(X_3)p(X_{\_}1,X_{\_}2),q(X_{\_}3), we have the following TSLD-derivation: p(X_1,X_2),q(X_3)q(X_3)p(1,a)wrongp(X_{\_}1,X_{\_}2),q(X_{\_}3)\implies q(X_{\_}3)\implies p(1,a)\implies wrong

Definition 8 (Blamed Clause)

Given a program PP and a query QQ, a clause cc is a blamed clause of the TSLD-tree for P{Q}P\cup\{Q\} if all derivations where cc is a input clause are erroneous.

The blamed clause is a clause in the program which causes a type error. A similar notion was first defined for functional programming languages with the blame calculus [WF09].

Example 8

Let PP be the following program, with clauses c_1c_{\_}1, c_2c_{\_}2, and c_3c_{\_}3, respectively:

p(1).
q(a).
q(X) :- p(a).

Then for the query p(2),q(b)p(2),q(b), we have the following TSLD-tree:

p(2),q(b)p(2),q(b)false,q(b)false,q(b)falsefalsec_2c_{\_}2p(a)p(a)wrongwrongc_1c_{\_}1c_3c_{\_}3c_1c_{\_}1

In this case, c_3c_{\_}3 is a blamed clause, since every derivation that uses it eventually reaches wrongwrong. Note that c_1c_{\_}1 is not a blamed clause, because the leftmost branch of the TSLD-tree uses c_1c_{\_}1 but is falsefalse.

Definition 9 (Type Error in the Program)

Suppose we have a program PP and a generic query QQ. Then PP has a type error if there is a blamed clause in the TSLD-tree for P{Q}P\cup\{Q\}.

Note that if a program does not have a type error, then there is no blamed clause in the TSLD-tree. Also note that the order of the atoms in the generic query may change the place where we use the blamed clause, but it will not change the fact that there exists one or not.

Example 9

Assume the same program from Example 8. Let Q_1=p(X1),q(X2)Q_{\_}1=p(X1),q(X2) and Q_2=q(X1),p(X2)Q_{\_}2=q(X1),p(X2) be generic queries. The TSLD-trees for P{Q_1}P\cup\{Q_{\_}1\} and P{Q_2}P\cup\{Q_{\_}2\} are:

p(X1),q(X2)p(X1),q(X2)q(X2)q(X2)\squarec_2c_{\_}2p(a)p(a)wrongwrongc_1c_{\_}1c_3c_{\_}3c_1c_{\_}1
q(X1),p(X2)q(X1),p(X2)p(X2)p(X2)\squarec_1c_{\_}1c_2c_{\_}2p(a),p(X2)p(a),p(X2)wrongwrongc_1c_{\_}1c_3c_{\_}3

Intuitively, having a type error in the program means that somewhere in the program we will perform typed unification between two terms that do not have the same type.

Consider a generic query Q=A,A¯Q=A,\bar{A}, where A=p(X_1,,X_n)A=p(X_{\_}1,\dots,X_{\_}n). For some derivation, after one step, we will have θ(B¯,A¯)\theta(\bar{B},\bar{A}), where HB¯H\leftarrow\bar{B} is a clause in PP and θ\theta is a unifier of AA and HH. Since θ\theta, or any other idempotent mgu of AA and HH, is a renaming of {X_1t_1,,X_nt_n}\{X_{\_}1\mapsto t_{\_}1,\dots,X_{\_}n\mapsto t_{\_}n\}, where H=p(t_1,,t_n)H=p(t_{\_}1,\dots,t_{\_}n), and since the variables X_1,,X_nX_{\_}1,\dots,X_{\_}n do not occur in B¯\bar{B} because the clause is variable disjoint from the query by definition, nor do they occur in A¯\bar{A}, by the definition of the generic query, then θ(B¯,A¯)=B¯,A¯\theta(\bar{B},\bar{A})=\bar{B},\bar{A}. The argument holds for any other atom in A¯\bar{A}. Thus whenever the selected atom belongs to the original query QQ the result is never wrongwrong and the mgu can be ignored.

After selecting the blamed clause cc, every TSLD-derivation is such that Q_0Q_nwrongQ_{\_}0\implies\dots\implies Q_{\_}n\implies\emph{wrong}. Thus, at step Q_nQ_{\_}n, the selected atom comes from the program and every mgu applied up to this point is from substitutions arising from the program itself and not the query. Therefore, the type error was in the program.

Definition 10 (Type Error in the Query)

Let PP be a program and QQ be a query. If there is no type error in PP and the TSLD-tree is finitely erroneous, then we say that there is a type error in the query QQ with respect to PP.

If there is no type error in the program PP but the TSLD-tree is finitely erroneous, then that error must have occurred in a unification between terms from the query and the program. We then say that the type error is in the query.

Example 10

Now suppose we have the following program:

p(1).
q(a).
q(X) :- p(X).

Let us name the clauses c_1c_{\_}1, c_2c_{\_}2, and c_3c_{\_}3, respectively. The following trees are the TSLD-tree for a generic query, and the TSLD-tree for the query Q=q(1.1)Q\prime=q(1.1).

p(X1),q(X2)p(X1),q(X2)q(X2)q(X2)\squarec_2c_{\_}2p(X)p(X)\squarec_1c_{\_}1c_3c_{\_}3c_1c_{\_}1
q(1.1)q(1.1)wrongwrongc_2c_{\_}2p(1.1)p(1.1)wrongwrongc_1c_{\_}1c_3c_{\_}3

On the left-hand side, we see that for a generic query QQ, the TSLD-tree for P{Q}P\cup\{Q\} has no blamed query, which means that there is no type error in the program. On the right-hand side the tree is finitely erroneous, therefore there is a type error in query QQ\prime.

5 Declarative Semantics

The declarative semantics of logic programming is, in opposition to the operational one, a definition of what the programs compute. The fact that logic programming can be interpreted this way supports the fact that logic programming is declarative [Apt96]. In this section, we will introduce the concept of interpretations, which takes us from the syntactic programs we saw and used so far into the semantic universe, giving them meaning. With this interpretation we will redefine a declarative semantics for logic programming first defined in [BFC19] and prove a connection between both the operational and the declarative semantics.

5.1 Domains

Let UU be a non-empty set of semantic values, which we will call the universe. We assume that the universe is divided into domains such that each ground type is mapped to a non-empty domain. Thus, UU is divided into domains as follows: U=Int+Float+Atom+String+A_1++A_n+F+Bool+WU=Int+Float+Atom+String+A_{\_}1+\dots+A_{\_}n+F+Bool+W, where IntInt is the domain of integer numbers, FloatFloat is the domain of floating point numbers, AtomAtom is the domain of non-numeric constants, StringString is the domain of strings, A_iA_{\_}i are domains for trees, where each domain has trees whose root is the same functor symbol and its n-children belong to n domains and FF is the domain of functions. Moreover, we define BoolBool as the domain containing truetrue and falsefalse, and WW as the domain with the single value wrongwrong, corresponding to a run-time error. We will call IntInt, FloatFloat, AtomAtom, and StringString the base domains, and A_1,,A_nA_{\_}1,\dots,A_{\_}n the tree domains. In particular, we can see that constants are separated into several predefined base domains, one for each base type, while complex terms, i.e. trees, are separated into domains depending on the principal function symbol (root) and the n-tuple inside the parenthesis (n-children).

5.2 Interpretations

Every constant of type TT is associated with a semantic value in one of the base domains, IntInt, FloatFloat, AtomAtom, or StringString, corresponding to TT. Every function symbol ff of arity nn in our language is associated with a mapping f_Uf_{\_}U from any n-tuple of base or tree domains δ_1××δ_n\delta_{\_}1\times\dots\times\delta_{\_}n to the domain F(δ_1,,δ_n)F(\delta_{\_}1,\dots,\delta_{\_}n), which is the domain of trees whose root is ff and the n-children are in the domains δ_i\delta_{\_}i.

To define the semantic value for terms, we will first have to define states. States, Σ\Sigma, are mappings from variables into values of the universe. We also define a function domaindomain that when applied to a semantic value returns the domain it belongs to. The semantic value of a term is defined as follows:

|[X|]_Σ=Σ(X)|[X|]_{\_}{\Sigma}=\Sigma(X)
|[c|]_Σ=c_U|[c|]_{\_}{\Sigma}=c_{\_}U
|[f(t_1,,t_n)|]_Σ=f_U(|[t_1|]_Σ,,|[t_n|]_Σ)|[f(t_{\_}1,\dots,t_{\_}n)|]_{\_}{\Sigma}=f_{\_}U(|[t_{\_}1|]_{\_}{\Sigma},\dots,|[t_{\_}n|]_{\_}{\Sigma})

An interpretation associates every predicate symbol pp with a function p_Up_{\_}U in FF, such that the output of the function p_Up_{\_}U is the domain BoolBool and the input is a union of tuples of domains. For each tuple that is in its domain, the function p_Up_{\_}U either returns truetrue or falsefalse. We will use |[|]_I,Σ|[~{}|]_{\_}{I,\Sigma} to denote the semantics of an expression EE, which can be an atom, a query, or a clause, in an interpretation II, and define it as follows:

|[p(t_1,,t_n)|]_I,Σ|[p(t_{\_}1,\dots,t_{\_}n)|]_{\_}{I,\Sigma} = if (domain(|[t_1|]_Σ),,domain(|[t_n|]_Σ))domain(I(p))(domain(|[t_{\_}1|]_{\_}{\Sigma}),\dots,domain(|[t_{\_}n|]_{\_}{\Sigma}))\subseteq domain(I(p))
                               then I(p)(|[t_1|]_Σ,,|[t_n|]_Σ)I(p)(|[t_{\_}1|]_{\_}{\Sigma},\dots,|[t_{\_}n|]_{\_}{\Sigma})
                               else wrongwrong
|[A_1,,A_n|]_I,Σ=|[A_1|]_I,Σ|[A_n|]_I,Σ|[A_{\_}1,\dots,A_{\_}n|]_{\_}{I,\Sigma}~{}=~{}|[A_{\_}1|]_{\_}{I,\Sigma}\wedge\dots\wedge|[A_{\_}n|]_{\_}{I,\Sigma}
|[q(t_1,,t_n):B¯|]_I,Σ=(|[B¯|]_I,Σ>(|[q(t_1,,t_n)|]_I,Σ))|[q(t_{\_}1,\dots,t_{\_}n):-\bar{B}|]_{\_}{I,\Sigma}~{}=~{}(|[\bar{B}|]_{\_}{I,\Sigma}-->(|[q(t_{\_}1,\dots,t_{\_}n)|]_{\_}{I,\Sigma}))

Note that if the clause is of the form HH\leftarrow, then its semantics is equivalent to that of HH.

5.3 Models

The term language and their semantic values are fixed, thus each interpretation II is determined by the interpretation of the predicate symbols. Interpretations differ from each other only in the functions p_Up_{\_}U they associate to each predicate pp defined in PP. We now define a context as a set Δ\Delta of pairs of the form X:DX:D, where XX is a variable that occurs only once in the set, and DD is a domain. We say that Σ\Sigma complies with Δ\Delta if every binding X:vX:v in Σ\Sigma is such that (X:D)Δ(X:D)\in\Delta and vDv\in D. An interpretation II is a model of EE in the context Δ\Delta iff for every state Σ\Sigma that complies with Δ\Delta, |[E|]_I,Σ=true|[E|]_{\_}{I,\Sigma}=true. We will denote this as Δ|[E|]_I\Delta\models|[E|]_{\_}I. Given a program PP, we say that an interpretation II is a model of PP in context Δ\Delta if II is a model of every clause in PP in context Δ\Delta. Here we assume, without loss of generality, that all clauses are variable disjoint with each other. If two expressions E_1E_{\_}1 and E_2E_{\_}2 are such that every model of E_1E_{\_}1 in a context Δ\Delta is also a model of E_2E_{\_}2 in the context Δ\Delta, then we say that E_2E_{\_}2 is a semantic consequence of E_1E_{\_}1 and represent this by E_1E_2E_{\_}1\models E_{\_}2. Suppose two interpretations I_1I_{\_}1 and I_2I_{\_}2 are models of program PP in some context Δ\Delta. Suppose, in particular, that for some predicate pp of PP the associated function is p_U1p_{\_}{U1} for I_1I_{\_}1 and p_U2p_{\_}{U2} for I_2I_{\_}2. Let us call T_iT_{\_}i the set of tuples of terms for which p_Uip_{\_}{Ui} outputs truetrue, and F_iF_{\_}i to the set of tuples of terms for which p_Uip_{\_}{Ui} outputs falsefalse. We say that I_1I_{\_}1 is smaller than I_2I_{\_}2 if T_1T_2T_{\_}1\subseteq T_{\_}2 and, if T_1=T_2T_{\_}1=T_{\_}2, then F_1F2F_{\_}1\subseteq F2. We say that a model II of PP in context Δ\Delta is minimal if for every other model II\prime of PP in context Δ\Delta, II is smaller than II\prime.

Example 11

Consider the program PP defined below:

father(john,mary).
father(phil,john).
grandfather(X,Y) :- father(X,Z), father(Z,Y).

Suppose that interpretation I_1I_{\_}1 is such that p_U_1p_{\_}{U_{\_}1} is associated with grandfathergrandfather and p_U1::Atom×AtomBoolp_{\_}{U1}::Atom\times Atom\to Bool. Suppose that p_U2p_{\_}{U2} is associated to grandfathergrandfather in I_2I_{\_}2, with the same domain. Suppose, also, that p_U3p_{\_}{U3}, associated in I_3I_{\_}3 with grandfathergrandfather, is such that p_U3::Atom×AtomInt×IntBoolp_{\_}{U3}::Atom\times Atom\cup Int\times Int\to Bool. Let the sets T_1={(phil,mary)}T_{\_}1=\{(phil,mary)\}, T_2={(phil,mary),(john,caroline)}T_{\_}2=\{(phil,mary),(john,caroline)\}, and T_3={(phil,mary)}T_{\_}3=\{(phil,mary)\} be the sets of accepted tuples for p_U1p_{\_}{U1}, p_U2p_{\_}{U2}, and p_U3p_{\_}{U3}, respectively.

Thus, if these interpretations associate the same function q_U::Atom×AtomBoolq_{\_}U::Atom\times Atom\to Bool to fatherfather, and T={(john,mary),(phil,john)}T=\{(john,mary),(phil,john)\} the set of accepted tuples for q_Uq_{\_}U, then all I_iI_{\_}i are models of PP in context Δ={X:Atom,Y:Atom,Z:Atom}\Delta=\{X:Atom,Y:Atom,Z:Atom\}. In fact, all states Σ\Sigma that comply with Δ\Delta are such that |[grandfather(X,Y):father(X,Z),father(Z,Y)|]_I_i,Σ|[grandfather(X,Y):-father(X,Z),father(Z,Y)|]_{\_}{I_{\_}i,\Sigma} is true, for all i=1,2,3i=1,2,3.

But note that T_1T_2T_{\_}1\subseteq T_{\_}2, and T_1=T_3T_{\_}1=T_{\_}3, but F_1F_3F_{\_}1\subseteq F_{\_}3. In fact, any smaller domain or set T_kT_{\_}k would not model PP. Therefore I_1I_{\_}1 is the minimal model of PP.

5.4 Type Errors

To calculate the set of accepted tuples for a given interpretation we will use the immediate consequence operator T_PT_{\_}P. The T_PT_{\_}P operator is traditionally used in the logic programming literature to iteratively calculate the minimal model of a logic program as presented in [vEK76, Apt96, Llo84].

Since for us interpretations for predicates are typed, T_Pω()T_{\_}P\uparrow\omega(\emptyset) does not generate an interpretation by itself. Instead it generates a set of atoms SS. Then we say that any interpretation II derived from SS is such that for all predicates pp occurring in SS, I(p)::(D_(1,1)××D_(1,n))(D_(k,1),,D_(k,n))BoolI(p)::(D_{\_}{(1,1)}\times\dots\times D_{\_}{(1,n)})\cup\dots\cup(D_{\_}{(k,1)},\dots,D_{\_}{(k,n)})\to Bool, where for all ii there is at least one atom p(v_1,,v_n)Sp(v_{\_}1,\dots,v_{\_}n)\in S such that v_1D_(i,1),,v_nD_(i,n)v_{\_}1\in D_{\_}{(i,1)},\dots,v_{\_}n\in D_{\_}{(i,n)}. Note that these interpretations may not be models of PP using our new definition of a model. We are now able to define the notion of ill-typed program.

Definition 11 (Ill-typed Program)

Let PP be a program. If no interpretation derived from T_Pω()T_{\_}P\uparrow\omega(\emptyset) is a model of PP, we say that PP is an ill-typed program.

Example 12

Let PP be the program defined as:

p(1).
p(a).
q(X) :- p(1.1).

Then S=T_Pω()={p(1),p(a)}S=T_{\_}P\uparrow\omega(\emptyset)=\{p(1),p(a)\}. So any interpretation II derived from SS is such that p_I::IntAtomBoolp_{\_}I::Int\cup Atom\to Bool. Therefore for any context Δ\Delta, for every Σ\Sigma that complies with Δ\Delta, |[q(X):p(1.1)|]_I,Σ=wrong|[q(X):-p(1.1)|]_{\_}{I,\Sigma}=wrong. Therefore no such II is a model of PP.

The reason why T_Pω()T_{\_}P\uparrow\omega(\emptyset) is always a minimal model of PP in the untyped semantics, comes from the fact that whenever a body of a clause is falsefalse for all states, then the clause is trivially truetrue for all states. However in our semantics, since we are separating these cases into falsefalse and wrongwrong, the wrongwrong ones do not trivially make the formula truetrue, making it wrongwrong instead. These are the ill-typed cases.

Lemma 1 (Blamed Clause Type Error)

Suppose there is a type error in the program with blamed clause c=HA_1,,A_mc=H\leftarrow A_{\_}1,\dots,A_{\_}m. Then, A_i=p(t_1,,t_n)\exists A_{\_}i=p(t_{\_}1,\dots,t_{\_}n) such that p(s_1,,s_n)T_Pω().Σ.j.domain(|[t_j|]_Σ)=domain(|[s_j|]_Σ)\forall p(s_{\_}1,\dots,s_{\_}n)\in T_{\_}P\uparrow\omega(\emptyset).\forall\Sigma.\exists j.domain(|[t_{\_}j|]_{\_}{\Sigma})=domain(|[s_{\_}j|]_{\_}{\Sigma}).

Effectively what this means is that if there is a type error in the program, then the blamed clause is such that it will not be used to calculate the T_Pω()T_{\_}P\uparrow\omega(\emptyset), since at least one of the atoms in its body will never be able to be used in an application of T_PT_{\_}P. We can also have a ill-typed query, and we define it as follows.

Definition 12 (Ill-typed Query)

Let PP be a program. If any interpretation II derived from T_Pω()T_{\_}P\uparrow\omega(\emptyset), such that II models PP in some context Δ\Delta, is such that II is not a model of QQ in the context Δ\Delta, then we say that QQ is an ill-typed query with respect to PP.

5.5 Soundness of TSLD-resolution

In this section we will prove that TSLD-resolution is sound, i.e. if there is a successful derivation of a query QQ in program PP with a correct answer substitution θ\theta, then every model of PP is also a model of θ(Q)\theta(Q); if there is a type error in the program, then the program is ill-typed; and if there is a type error in the query, the query is ill-typed with respect to the program. To prove this we will introduce the following auxiliary concept.

Definition 13 (Resultant)

Suppose we have a TSLD-derivation step Q_1θ(Q_2)Q_{\_}1\implies\theta(Q_{\_}2). Then we define the resultant associated to this step as θ(Q_1)Q_2\theta(Q_{\_}1)\leftarrow Q_{\_}2.

Lemma 2 (Soundness of resultants)

Let Q_1θ(Q_2)Q_{\_}1\implies\theta(Q_{\_}2) be a TSLD-derivation step using input clause cc and rr be the resultant associated with it. Then:

  1. 1.

    crc\models r;

  2. 2.

    for any TSLD-derivation of P{Q}P\cup\{Q\} with resultants r_1,,r_nr_{\_}1,\dots,r_{\_}n, Pr_iP\models r_{\_}i (for all i0i\geq 0).

Proof of this lemma for the SLD-resolution is in [Apt96]. Since for unifiable terms the typed unification algorithm behaves like first-order unification, the proof still holds.

Theorem 5.1 (Soundness of TSLD-resolution)

Let PP be a program and QQ a query. Then:

  1. 1.

    Suppose that there exists a successful derivation of P{Q}P\cup\{Q\}, with the correct answer substitution θ\theta. Then Pθ(Q)P\models\theta(Q).

  2. 2.

    Suppose there is a type error in the program. Then PP is ill-typed.

  3. 3.

    Suppose there is a type error in the query. Then QQ is an ill-typed query with respect to PP.

A short note about completeness. As for untyped SLD-resolution, completeness is related to the search for answers in a TSLD-tree. If we use Prolog sequential, top-down, depth-first search with backtracking, then it may result in incompleteness for same cases where the TSLD-tree is infinite, because the exploration of an infinite computation may defer indefinitely the exploration of some alternative computation capable of yielding a correct answer.

6 Conclusions and Future Work

We presented an operational semantics for logic programming, here called TSLD-resolution, which is sensitive to run-time type errors. In this setting type errors are represented by a new value, here called wrong, which is added to the usual fail and success results of evaluation of a query for a given logic program. We have then adapted a previously defined declarative semantics for typed logic programs using a three-valued logic and proved that TSLD-resolution is sound with respect to this semantics. All these new concepts, TSLD, typed unification and the new declarative semantics, revisit and partially extend well-known concepts form the theory of logic programming.

Specially interpreted functors: in this paper functors are uninterpreted, such as in Prolog, in the sense that they are just symbols used to build new trees. An obvious extension of this work is to extend the system to dynamically detect type errors relating to the semantic interpretation of some specific functors, for instance the list constructor. For this, we would have for the list constructor not the Herbrand-based interpretation [|]::A,B.A×B[A|B][~{}|~{}]::\forall A,B.A\times B\rightarrow[A|B], but the following interpretation [|]::A.A×list(A)list(A)[~{}|~{}]::\forall A.A\times list(A)\rightarrow list(A). Moreover, we would have the empty list [][~{}] with type D.list(D)\forall D.list(D). This would necessarily change the typed unification algorithm by introducing a new kind of constraints. As an example, consider the unification [1|2]=[1|2][1|2]=[1|2], where the second argument in both terms is not a list: considering a specially interpreted list constructor the result should be wrong, although the traditional untyped result is true. The same issues appear for arithmetic expressions. Arithmetic interpretations of +,,×+,-,\times, and // can be introduced in the typed unification algorithm, so that in this context, unifications such as a+b=a+ba+b=a+b would now return wrong instead of true. These extensions are left for future work.

Using TSLD in practice: we plan to integrate TSLD into the YAP Prolog System [CRD12]. This will enable further applications of the method to large scale Prolog programs. One important point of this integration it to add explicit type declarations to all the implicitly typed Prolog built-in predicates. Finally, one could also wonder how to apply TSLD to the dynamic typing of constraint logic programming modules, adding new types mapped to constraint domains.

References

  • [Apt96] Krzysztof R. Apt. From Logic Programming to Prolog. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1996.
  • [BB81] D.A. Bochvar and Merrie Bergmann. On a three-valued logical calculus and its application to the analysis of the paradoxes of the classical extended functional calculus. History and Philosophy of Logic, 2(1-2):87–112, 1981.
  • [Bea16] Jc Beall. Off-topic: A new interpretation of weak-kleene logic. The Australasian Journal of Logic, 13(6), 2016.
  • [BFC19] João Barbosa, Mário Florido, and Vítor Santos Costa. A three-valued semantics for typed logic programming. In Proceedings 35th International Conference on Logic Programming (Technical Communications), ICLP 2019 Technical Communications, Las Cruces, NM, USA, September 20-25, 2019, volume 306 of EPTCS, pages 36–51, 2019.
  • [BFC22] João Barbosa, Mário Florido, and Vítor Santos Costa. Data type inference for logic programming. In Emanuele De Angelis and Wim Vanhoof, editors, Logic-Based Program Synthesis and Transformation, pages 16–37, Cham, 2022. Springer International Publishing.
  • [BJ88] Maurice Bruynooghe and Gerda Janssens. An instance of abstract interpretation integrating type and mode inferencing. In Fifth International Conference and Symposium, Washington, 1988, pages 669–683, 1988.
  • [CRD12] Vítor Santos Costa, Ricardo Rocha, and Luís Damas. The YAP prolog system. Theory Pract. Log. Program., 12(1-2):5–34, 2012.
  • [DEC96] Pierre Deransart, AbdelAli Ed-Dbali, and Laurent Cervoni. Prolog - the standard: reference manual. Springer, 1996.
  • [DMP02] Włodzimierz Drabent, Jan Małuszyński, and Paweł Pietrzak. Using parametric set constraints for locating errors in CLP programs. Theory and Practice of Logic Programming, 2(4-5):549–610, 2002.
  • [DS01] Pierre Deransart and Jan-Georg Smaus. Well-typed logic programs are not wrong. In Herbert Kuchen and Kazunori Ueda, editors, Functional and Logic Programming, 5th International Symposium, FLOPS 2001, Tokyo, Japan, 2001, Proceedings, volume 2024 of Lecture Notes in Computer Science, pages 280–295. Springer, 2001.
  • [DZ92] Philip W. Dart and Justin Zobel. A regular type language for logic programs. In Frank Pfenning, editor, Types in Logic Programming, pages 157–187. The MIT Press, 1992.
  • [FSVY91] Thom W. Frühwirth, Ehud Y. Shapiro, Moshe Y. Vardi, and Eyal Yardeni. Logic programs as types for logic programs. In Proc. of the Sixth Annual Symposium on Logic in Computer Science (LICS ’91), Netherlands,1991, pages 300–309, 1991.
  • [Han14] Michael Hanus. Multiparadigm languages. In Computing Handbook, Third Edition: Computer Science and Software Engineering, pages 66: 1–17. Springer Cham, 2014.
  • [Her30] Jacques Herbrand. Recherches sur la théorie de la démonstration. Numdam, 1930.
  • [Kle38] S. C. Kleene. On notation for ordinal numbers. Journal Symbolic Logic, 3(4), 1938.
  • [Llo84] J. W. Lloyd. Foundations of Logic Programming. Springer-Verlag, Berlin, Heidelberg, 1984.
  • [LR91] T. L. Lakshman and Uday S. Reddy. Typed Prolog: A semantic reconstruction of the Mycroft-O’Keefe type system. In Logic Programming, Proceedings of the 1991 International Symposium, San Diego, California, USA, 1991.
  • [Lu01] Lunjin Lu. On Dart-Zobel algorithm for testing regular type inclusion. SIGPLAN Not., 36(9):81–85, 2001.
  • [Mil78] Robin Milner. A theory of type polymorphism in programming. J. Comput. Syst. Sci., 17(3):348–375, 1978.
  • [MM82] Alberto Martelli and Ugo Montanari. An efficient unification algorithm. ACM Trans. Program. Lang. Syst., 4(2):258–282, apr 1982.
  • [MO84] Alan Mycroft and Richard A. O’Keefe. A polymorphic type system for Prolog. Artif. Intell., 23(3):295–307, 1984.
  • [Rob65] J. A. Robinson. A Machine-Oriented Logic Based on the Resolution Principle. Journal of the ACM, 12(1):23–41, 1965.
  • [SBG08] Tom Schrijvers, Maurice Bruynooghe, and John P. Gallagher. From monomorphic to polymorphic well-typings and beyond. In Logic-Based Program Synthesis and Transformation, 18th International Symposium, LOPSTR 2008, Spain, July 17-18, 2008, pages 152–167, 2008.
  • [SCWD08] Tom Schrijvers, Vítor Santos Costa, Jan Wielemaker, and Bart Demoen. Towards typed Prolog. In Logic Programming, 24th International Conference, ICLP 2008, Udine, Italy, Proceedings, pages 693–697, 2008.
  • [vEK76] Maarten H. van Emden and Robert A. Kowalski. The semantics of predicate logic as a programming language. J. ACM, 23(4):733–742, 1976.
  • [WF09] Philip Wadler and Robert Bruce Findler. Well-typed programs can’t be blamed. In Giuseppe Castagna, editor, Programming Languages and Systems, 18th European Symposium on Programming, ESOP 2009, York, UK, March 22-29, 2009. Proceedings, volume 5502 of Lecture Notes in Computer Science, pages 1–16. Springer, 2009.
  • [YFS92] Eyal Yardeni, Thom W. Frühwirth, and Ehud Shapiro. Polymorphically typed logic programs. In Frank Pfenning, editor, Types in Logic Programming, pages 63–90. The MIT Press, 1992.
  • [Zob87] Justin Zobel. Derivation of polymorphic types for Prolog programs. In Logic Programming, Proceedings of the Fourth International Conference, Melbourne, 1987, 1987.

Appendix 0.A Proofs for Theorems

Theorem 0.A.1 (Equivalence to term unification for success)

Let t_1t_{\_}1 and t_2t_{\_}2 be two terms. If we apply the Martelli-Montanari algorithm (MM algorithm) to t_1t_{\_}1 and t_2t_{\_}2 and it returns a set of solved equalities SS, then the typed unification algorithm applied to the same two terms is also successful and returns the same set of equalities.

Proof

The proof follows from induction on SS. Base cases:

  • Suppose we have an equality in SS of the form f(t_1,,t_n)=g(s_1,,s_m)f(t_{\_}1,\dots,t_{\_}n)=g(s_{\_}1,\dots,s_{\_}m). Then the MM algorithm halts with failure, and our algorithm halts with wrongwrong.

  • Suppose we have an equality in SS of the form c=dc=d. Then the MM algorithm halts with failure, and our algorithm either halts with wrongwrong or changes FF to falsefalse, depending on the types of cc and dd. In either case, it will not be successful.

  • Suppose we have an equality in SS of the form c=f(t_1,,t_n)c=f(t_{\_}1,\dots,t_{\_}n) or f(t_1,,t_n)=cf(t_{\_}1,\dots,t_{\_}n)=c. Then the MM algorithm halts with failure, and our algorithm halts with wrongwrong.

  • Suppose we have an equality in SS of the form X=tX=t, where XX occurs in tt. Then, the MM algorithm halts with failure and our algorithm changes FF to falsefalse, so it is never successful.

Inductive cases:

  • Suppose we have an equality in SS of the form f(t_1,,t_n)=f(s_1,,s_n)f(t_{\_}1,\dots,t_{\_}n)=f(s_{\_}1,\dots,s_{\_}n). Both algorithms generate the same new equalities and replace the selected one with those in SS. Then, by the induction hypothesis, if the MM algorithm succeeds and outputs a set of solved equalities SS\prime, so does our algorithm.

  • Suppose we have an equality in SS of the form c=cc=c. Both algorithms delete this equality from SS. Then, by the induction hypothesis, if the MM algorithm succeeds and outputs a set of solved equalities SS\prime, so does our algorithm.

  • Suppose we have an equality in SS of the form X=XX=X. Both algorithms delete this equality from SS. Then, by the induction hypothesis, if the MM algorithm succeeds and outputs a set of solved equalities SS\prime, so does our algorithm.

  • Suppose we have an equality in SS of the form t=Xt=X, where tt is not a variable and XX is a variable. Both algorithms replace this equality from SS with the same new one. Then, by the induction hypothesis, if the MM algorithm succeeds and outputs a set of solved equalities SS\prime, so does our algorithm.

  • Suppose we have an equality in SS of the form X=tX=t, where XX does not occur in tt and XX occurs somewhere else in SS. Both algorithms apply the same substitution to S{X=t}S\setminus\{X=t\}, therefore resulting in the same set of equalities. Then, by the induction hypothesis, if the MM algorithm succeeds and outputs a set of solved equalities SS\prime, so does our algorithm.

In any other case, none of the algorithms apply. ∎

Theorem 0.A.2 (Ill-typed unification)

If the output of the typed unification algorithm is wrong, then there is no substitution θ\theta such that θ(t_1)\theta(t_{\_}1) and θ(t_2)\theta(t_{\_}2) have the same type.

Proof

Suppose that there is a substitution θ\theta such that θ(t_1)=θ(t_2)\theta(t_{\_}1)=\theta(t_{\_}2). Then, we now that the MM algorithm would output an mgu of t_1t_{\_}1 and t_2t_{\_}2. Therefore so would our algorithm, from theorem 3.1. But the output of our algorithm is wrong, so there cannot be such a θ\theta. Suppose that there is a substitution θ\theta such that θ(t_1)\theta(t_{\_}1) and θ(t_2)\theta(t_{\_}2) have the same type. Then, we will prove that our algorithm would output falsefalse by induction on t_1t_{\_}1 and t_2t_{\_}2. Base cases:

  • Suppose we have c=dc=d and cc has the same type as dd. Then the algorithm outputs falsefalse, not wrongwrong.

  • Suppose we have c=dc=d and cc has a different type from dd. Then such θ\theta cannot exist.

  • Suppose we have X=tX=t, where XX occurs in tt. Then the algorithm outputs falsefalse, not wrongwrong.

  • Suppose we have f(t_1,,t_n)=g(s_1,,s_m)f(t_{\_}1,\dots,t_{\_}n)=g(s_{\_}1,\dots,s_{\_}m). Then such θ\theta cannot exist.

Inductive step:

  • Suppose we have f(t_1,,t_n)=f(s_1,,s_n)f(t_{\_}1,\dots,t_{\_}n)=f(s_{\_}1,\dots,s_{\_}n). Then, by the induction hypothesis, if the algorithm outputs wrongwrong for the input ({t_1=s_1,,t_n=s_n},true)(\{t_{\_}1=s_{\_}1,\dots,t_{\_}n=s_{\_}n\},true), we know that no θ\theta such that i.θ(t_i)\forall i.\theta(t_{\_}i) and θ(s_i)\theta(s_{\_}i) have the same type, exists. Therefore, no θ\theta such that θ(f(t_1,,t_n))\theta(f(t_{\_}1,\dots,t_{\_}n)) and θ(f(s_1,,s_n))\theta(f(s_{\_}1,\dots,s_{\_}n)) have the same type exists either, by the properties of substitution.

  • Suppose we have c=cc=c or X=XX=X, then the algorithm outputs truetrue, not wrongwrong.

  • Suppose we have X=tX=t, then either the algorithm ends with F=trueF=true or F=falseF=false if XX occurs in tt, but never wrongwrong.

  • Suppose we have t=Xt=X, where tt is not a variable. Then, the algorithm either halts with success in one step or fails. Either way it does not output wrongwrong.

So, we prove that whenever the typed unification algorithm halts with wrongwrong for some pair of terms t_1t_{\_}1 and t_2t_{\_}2, then there is no substitution θ\theta such that θ(t_1)\theta(t_{\_}1) and θ(t_2)\theta(t_{\_}2) have the same type. ∎

Lemma 3 (Blamed Clause Type Error)

Suppose there is a type error in the program with blamed clause c=HA_1,,A_mc=H\leftarrow A_{\_}1,\dots,A_{\_}m. Then, A_i=p(t_1,,t_n)\exists A_{\_}i=p(t_{\_}1,\dots,t_{\_}n) such that p(s_1,,s_n)T_Pω().Σj.domain(|[t_j|]_Σ)=domain(|[s_j|]_Σ)\forall p(s_{\_}1,\dots,s_{\_}n)\in T_{\_}P\uparrow\omega(\emptyset).\forall\Sigma\exists j.domain(|[t_{\_}j|]_{\_}{\Sigma})=domain(|[s_{\_}j|]_{\_}{\Sigma}).

Proof

We will prove this by contradiction. Suppose that for all A_iA_{\_}i there is some p(s_1,,s_n)T_Pω()p(s_{\_}1,\dots,s_{\_}n)\in T_{\_}P\uparrow\omega(\emptyset) such that for some Σ\Sigma, i[1,,n]domain(|[t_i|]_Σ)=domain(|[s_i|]_Σ)\forall i\in[1,\dots,n]domain(|[t_{\_}i|]_{\_}{\Sigma})=domain(|[s_{\_}i|]_{\_}{\Sigma}). Then, there would be a derivation of the form A_1,,A_m,B¯B¯A_{\_}1,\dots,A_{\_}m,\bar{B}\implies\dots\implies\bar{B} or A_1,,A_m,B¯false,B¯A_{\_}1,\dots,A_{\_}m,\bar{B}\implies\dots\implies false,\bar{B} in the TSLD-tree for P{Q}P\cup\{Q\}, where QQ is a generic query, since the output for the unification between A_iA_{\_}i and p(s_1,,s_n)p(s_{\_}1,\dots,s_{\_}n) would not return wrongwrong. But then cc would not be a blamed clause. Therefore we proved the lemma.

Theorem 0.A.3 (Soundness of TSLD-resolution)

Let PP be a program and QQ a query. Then:

  1. 1.

    Suppose that there exists a successful derivation of P{Q}P\cup\{Q\}, with the correct answer substitution θ\theta. Then Pθ(Q)P\models\theta(Q).

  2. 2.

    Suppose there is a type error in the program. Then PP is ill-typed.

  3. 3.

    Suppose there is a type error in the query. Then QQ is an ill-typed query with respect to PP.

Proof

(1) Let θ_1,,θ_n\theta_{\_}1,\dots,\theta_{\_}n be the mgus obtained in a successful derivation. Therefore, θ=θ_nθ_1\theta=\theta_{\_}n\circ\dots\circ\theta_{\_}1. The proof follows directly from lemma 2 applied to Pθ_nθ_1(Q)P\models\theta_{\_}n\circ\dots\circ\theta_{\_}1(Q)\leftarrow\square.

(2) If there is a type error in the program, then there is a blamed clause cc in the TSLD-tree for P{GQ}P\cup\{GQ\}, where GQGQ is a generic query. Then by lemma 1 we know that there is at least one A_i=p(t_1,,t_n)A_{\_}i=p(t_{\_}1,\dots,t_{\_}n), in the body of cc, such that p(s_1,,s_n)T_Pω().i[1,,n].domain(|[t_i|])=domain(|[s_i|])\forall p(s_{\_}1,\dots,s_{\_}n)\in T_{\_}P\uparrow\omega(\emptyset).\exists i\in[1,\dots,n].domain(|[t_{\_}i|])=domain(|[s_{\_}i|]). Any interpretation II derived from T_Pω()T_{\_}P\uparrow\omega(\emptyset) is such that for any Σ\Sigma |[A_i|]_I,Σ=wrong|[A_{\_}i|]_{\_}{I,\Sigma}=wrong. This implies that no such II is a model of cc and, therefore, no such II is a model of PP, which means P is ill-typed.

(3) This is the case where there is a type error in the query QQ. Now, consider program P{p()Q}P\cup\{p()\leftarrow Q\}, where pp is a predicate that does not occur in PP. Then note that every TSLD-derivation for a generic query with input clause p()Qp()\leftarrow Q leads to wrongwrong, so this is a blamed clause. Therefore, from (2), no interpretation II derived from T_Pω()T_{\_}P\uparrow\omega(\emptyset) models this clause. Since the set of atoms for pp in T_Pω()T_{\_}P\uparrow\omega(\emptyset) is empty, we can give any interpretation to pp in any interpretation derived from T_Pω()T_{\_}P\uparrow\omega(\emptyset). So we can choose an interpretation II derived from T_Pω()T_{\_}P\uparrow\omega(\emptyset) such that II models p()p() in some context Δ\Delta. Therefore, as no interpretation derived from T_P()T_{\_}P\uparrow(\emptyset) models P{p()Q}P\cup\{p()\leftarrow Q\} in any context Δ\Delta and we can build an interpretation which models predicate pp in some context, no interpretation can model QQ. Thus, by the definition of ill-typed query, QQ is ill-typed with respect to PP.