Publication year: 2021. Paper number: 2095.

A Non-Deterministic Multiset Query Language

Bartosz Zieliński
Department of Computer Science
Faculty of Physics and Applied Informatics
University of Łódź
Pomorska 149/153, 90-236 Łódź, Poland
[email protected]

Address for correspondence: Department of Computer Science, Faculty of Physics and Applied Informatics, University of Łódź, Pomorska 149/153, 90-236 Łódź, Poland.

Received November 2021; revised November 2021.

Abstract

We develop a multiset query and update language executable in a term rewriting system. Its most remarkable feature, besides a non-standard approach to quantification and to the introduction of fresh values, is non-determinism: a query result is not uniquely determined by the database. We argue that this feature is very useful, e.g., in modelling user choices during simulation or reachability analysis of a data-centric business process, which is the intended application of our work. Query evaluation is implemented by converting the query into a terminating term rewriting system and normalizing the initial term which encapsulates the current database. A normal form encapsulates a query result. We prove that our language can express any relational algebra query. Finally, we present a simple business process specification framework (and an example specification). Both the syntax and the semantics of our query language are implemented in Maude.

keywords:

term rewriting, query languages, business process modelling

Volume: 184. Issue: 2.


1 Introduction

In a data-centric approach to business process modelling (see, e.g., [1, 2]), specification of data transformation during case execution is an integral part of the business process model. This new paradigm requires new tools and formalisms for effective specification, simulation and validation. Task-centric models are commonly formalized using Petri nets (see, e.g., [3, 4]). Adapting Petri net-based formalizations to data-centric models is, however, problematic: while simple transformations on data can be represented directly within a coloured Petri net, Petri nets lack facilities for complex data processing and querying. Even so, there exists a large amount of literature (see, e.g., [5, 6, 7]) devoted to enriching Petri nets with data processing capabilities and to automated verification of their properties. A recent paper [8] introduced DB-Nets, an attempt to integrate coloured Petri nets with relational databases. The use of two separate formalisms complicates verification and simulation (though it corresponds to actual implementations of BPM systems). In the follow-up paper [9], a subset of database operations was implemented inside coloured Petri nets with name creation and transition priorities.

Conditional term rewriting [10] was proposed as an alternative (if less popular) generic framework for specification of dynamic systems [11]. It subsumes a variety of Petri nets [12], and their simulation is one of the popular applications of the term rewriting system Maude [13, 14]. More precisely, it is well known (see, e.g., [12]) that coloured Petri nets can be implemented by multiset rewriting systems, and, conversely, rewriting systems which rewrite multisets of terms representing colour tokens can be interpreted as Petri nets: just identify places with colour tokens, and each rewriting rule of the form

$\displaystyle a_{1}a_{2}\ldots a_{n}\Rightarrow b_{1}b_{2}\ldots b_{m}$

with a transition with input arcs from $a_{1}$, $a_{2}$, $\ldots$, $a_{n}$ and output arcs to $b_{1}$, $b_{2}$, $\ldots$, $b_{m}$. Other constructs, such as inhibitor arcs, can be easily implemented with conditional rewriting rules. Rewriting systems are more general than Petri nets since they are not limited to rewriting multisets (on the other hand, Petri nets are much better supported by tools and programming libraries). However, rewriting systems still share with Petri nets the limitation and inconvenience of not directly supporting bulk, complex operations on data which involve some kind of quantification. For example, suppose that the state of a data-driven business process related to e-commerce is represented by a multiset of terms. In particular, terms of the form $\text{\tt item}(p,c)$ denote the presence of a product $p$ in the basket of a customer $c$. Suppose now that $c$ cancels the case, and so we need to remove all of $c$’s items from the multiset. Describing the removal of a single (nondeterministically chosen) item is easy with the rule $\text{\tt item}(x,c)\Rightarrow\emptyset$, where $x$ is a variable. Specifying in a rewriting system (or a Petri net) that all of them need to be removed before the next business step is more complex (though clearly possible): it would require auxiliary tokens and conditional rewrite rules, corresponding to inhibitor arcs in a Petri net, which prevent other transitions as long as some of $c$’s items are still present in the rewritten multiset. Thus, a high-level query language adding quantifier constructs on top of a conventional rewriting system or a Petri net is clearly desirable, particularly if rewriting systems or Petri nets are to become convenient formalisms for modelling data-driven business processes.
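The contrast between single-item and bulk removal can be illustrated with a small Python sketch, in which `collections.Counter` stands in for a multiset of terms. The tuple encoding `("item", p, c)` and the helper names `remove_one_item` and `remove_all_items` are assumptions made for this illustration only; they are not part of the formalism or of its Maude implementation.

```python
from collections import Counter

def remove_one_item(state, customer):
    """All states reachable by one application of item(x, c) => empty."""
    successors = []
    for fact in state:
        if fact[0] == "item" and fact[2] == customer:
            nxt = state.copy()
            nxt[fact] -= 1
            nxt += Counter()  # in-place add drops entries whose count is zero
            successors.append(nxt)
    return successors

def remove_all_items(state, customer):
    """Bulk removal: the effect a quantified update should have in one step."""
    return Counter({f: n for f, n in state.items()
                    if not (f[0] == "item" and f[2] == customer)})

state = Counter([("item", "p1", "c1"), ("item", "p2", "c1"), ("item", "p3", "c2")])
one_step = remove_one_item(state, "c1")   # two possible successor states
bulk = remove_all_items(state, "c1")      # a single, fully cleaned state
```

A single rule application leaves a nondeterministic choice between successor states, each still containing one of the customer's items, whereas the bulk operation yields one state with none of them.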

In this paper we present an expressive multiset query and update language $\mathcal{Q}_{\Sigma,\mathcal{D}}$, designed to be executable in a term rewriting system, useful for a unified and somewhat “Petri nettish” formalization of data-centric business processes. The connection with Petri nets is admittedly tenuous and follows from the fact that the language acts on data represented as multisets of terms which could be viewed as tokens (see the discussion above). Since this language specifies changes to data, instead of being a reimplementation of relational calculus, it contains linear-like features fitting a term rewriting implementation. Most remarkably, $\mathcal{Q}_{\Sigma,\mathcal{D}}$ is non-deterministic: the result of a query or update is not, in general, uniquely determined by the database. This permits modelling user choices, just like in the case of Petri nets. $\mathcal{Q}_{\Sigma,\mathcal{D}}$ consists of three sublanguages, parametrized with respect to a signature $\Sigma$ and a $\Sigma$-algebra of facts $\mathcal{D}$:

1. Language $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ of conditions (Boolean queries), which can be used independently as constraints, or as components of queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ and $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$.
2. Data manipulation language $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$. A DML query $Q$ in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ defines new facts to be added to the database. Some of the old facts used in constructing the new ones may be deleted.
3. Language $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ of queries which only return facts but do not change the database. Both the syntax and, to some extent, the semantics of $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ are restrictions of the syntax and semantics of $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$.

A query $Q$ in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\alpha}$ , $\alpha\in\{\mathbf{cnd},\mathbf{qry},\mathbf{dml}\}$ , is given semantics by assignment of a rewriting system $\mathcal{R}_{\Sigma,\mathcal{D}}^{\alpha}(Q)$ . To evaluate $Q$ in a database $F$ we start with an initial term $\mathrm{I}_{Q}(F)$ . A normal form of $\mathrm{I}_{Q}(F)$ wraps a result of $Q$ ’s evaluation: a Boolean value indicating validity of a condition, a query answer or a new database resulting from execution of a DML query. As remarked above, while $\mathcal{R}_{\Sigma,\mathcal{D}}^{\alpha}(Q)$ is always terminating (i.e., there are no infinite execution paths, so we always do get some result from evaluating $Q$ ), it is not confluent (i.e., divergent execution paths may not eventually converge) in general, hence we may get distinct results depending on nondeterministic choices. We identify syntactic constraints on queries in each of the sublanguages which ensure confluence for the rewriting system, and hence determinism for the results of the query. $\mathcal{Q}_{\Sigma,\mathcal{D}}$ shares with the language introduced in [15] a non-standard approach to variable binding. The approach avoids problems with capture-free substitutions without dispensing with explicit variables, but at the price of non-compositionality: The surrounding context may determine whether a variable in a subterm is free or bound in this subterm. $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ supports introduction of fresh values to the database, which is used, e.g., to generate identifiers for newly created artifacts or to simulate user input (cf. [7]).

Since queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}$ are converted to a term rewriting system, verification of a business process specified as a set of DML queries (see Section 7) can be assisted with symbolic reachability analysis techniques based on narrowing (see, e.g., [16]). We plan to expand on this idea in future research. Note that our results in this article regarding the confluence of the rewriting systems to which a particular subclass of queries compiles may be relevant for narrowing (see, e.g., [17, 18], cf. [19]). E.g., narrowing a confluent system may provide a more efficient search procedure than in the case of a non-confluent one.

1.1 Prior work

The present paper builds on the previous paper [15] (cf. [20]) where a multiset query language executed in Maude was proposed. $\mathcal{Q}_{\Sigma,\mathcal{D}}$ shares many similarities with the language described in [15], particularly the treatment of quantification. It has, however, distinct syntax (with, e.g., fact markings in quantifiers) and distinct semantics. We consider both languages to be alternatives, each with its own strengths. A side-by-side comparison is presented in Table 1. Observe that while the language described in [15] can be defined both in the set and the multiset setting (in the former case we match multisets of facts modulo idempotence in addition to commutativity, associativity and identity), here we assume the multiset setting exclusively. This is because the language described here is implemented through multiset rewriting. Making the multiset constructor idempotent would make it impossible to consistently replace matched terms, a feature which is crucial for our formalism to behave sensibly. E.g., given a rule $a\Rightarrow b$, the term $ab=_{\mathcal{A}}aab$ rewrites in one step both to $bb=_{\mathcal{A}}b$ (intended) and to $abb=_{\mathcal{A}}ab$ (not intended).
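The effect of idempotence on the rule $a\Rightarrow b$ can be replayed in a few lines of Python. This is only an illustrative sketch: multisets are modelled with `collections.Counter`, sets with `frozenset`, and the helper `rewrite_a_to_b` is a hypothetical name introduced here.

```python
from collections import Counter

def rewrite_a_to_b(multiset):
    """Apply the rule a => b once to a multiset containing at least one a."""
    result = multiset.copy()
    result["a"] -= 1
    result["b"] += 1
    return +result  # unary plus drops entries whose count is zero

# Multiset setting: the single a is consumed, so the result is unique.
multiset_result = rewrite_a_to_b(Counter("ab"))

# Idempotent setting: ab and aab denote the same set, yet rewriting the
# two representatives and collapsing to sets gives two different results.
set_results = {frozenset(rewrite_a_to_b(Counter(rep))) for rep in ("ab", "aab")}
```

In the multiset reading the outcome is the multiset $bb$; in the idempotent reading both $\{b\}$ and $\{a,b\}$ are reachable in one step, which is the unintended behaviour described above.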

Table 1: Comparison between $\mathcal{Q}_{\Sigma,\mathcal{D}}$ and $\mathcal{CF}_{\Sigma,\mathcal{D}}(X)$ from [15]

$\mathcal{Q}_{\Sigma,\mathcal{D}}$	$\mathcal{CF}_{\Sigma,\mathcal{D}}(X)$
Semantics of (possibly) non-deterministic queries is based on translation into term rewriting systems.	Semantics of queries is based on term matching on the metalevel. Queries are always deterministic.
Queries return facts in the same signature as the database against which the query is evaluated.	Queries return objects of an arbitrary signature (as long as it contains a “union” operator).
DML queries construct multisets of facts to be added to the current database. Some of the current facts used in the construction may be deleted.	DML expressions construct a pair of sets or multisets of facts — those to be deleted from and those to be added to the current database.
Fresh values are introduced through “virtual fresh facts”. Actual freshness is ensured through rewrite rules which define the semantics of a DML query.	Attributes of input facts can be marked by special sorts as fresh. The testing framework ensures that values injected in those columns are actually fresh.
It is assumed that the database is a multiset of facts.	The language can be defined both in the set and the multiset setting.
Treatment of variable binding is identical in both languages.

$\mathcal{Q}_{\Sigma,\mathcal{D}}$ has some similarities to matching logic [21, 22], as both are based on term matching. They have different purposes, however, and different syntax and semantics. Unlike matching logic statements, $\mathcal{Q}_{\Sigma,\mathcal{D}}$ queries are non-deterministic. Matching logic is used for software verification, while $\mathcal{Q}_{\Sigma,\mathcal{D}}$ is a query language intended to be a component of the system. Finally, matching logic has conventional quantifiers, whereas we use a non-standard quantification over “relation patterns”.

CINNI [23] is a generic calculus of substitutions implemented in Maude which combines de Bruijn indices with explicit names to solve the problem of capture-free substitutions. To avoid the associated complexity, we decided not to use conventional variable binding implemented, e.g., using CINNI.

Our quantification over “relation patterns” instead of variables resembles quantifier constructs in description logic [24] (we are grateful to Prof. Andrzej Tarlecki for this observation). Our syntactic construct, however, uses explicit variable names and is not limited to binary relations where only the second column is bound by the quantifier.

Data-centric business process models are formalized in a variety of ways. First-order logic and its restricted variations (see, e.g., [25, 26, 27, 28]), datalog [29], and UML [30] are popular choices. Those formalisms are excellent for the specification of data models, and they come with expressive query languages; they are, however, less suited for modelling change, because of frame problems [31], where it is not always obvious what information is modified and what stays the same. Rewriting formalisms [10], which are explicit about the scope of change, have a clear advantage here. On the other hand, a great deal of work has been devoted to formal verification of logic-based business process formalisms, see, e.g., [32, 33] in the context of hierarchical artifact systems. For another example see [34], where a data-aware extension of BPMN was proposed together with SMT-based (satisfiability modulo theories) verification techniques.

In [35] a language called Reseda was introduced for the specification of data-driven business processes. The language integrates data description with behaviour. What makes it relevant as prior work to the present paper is that Reseda’s semantics is defined by associating a transition system with a Reseda program. Such a program can then be executed by rewriting data in accordance with the transition rules, similarly to the execution of the language described here.

As we remarked earlier, a recent paper [8] introduced DB-Nets, which integrate coloured Petri nets with relational databases. Since the use of two separate formalisms complicates verification and simulation, in the follow-up paper [9] a subset of database operations was implemented inside coloured Petri nets with name creation and transition priorities. Thus, the motivation of [9] is analogous to the motivation of this paper, but in the world of Petri nets instead of term rewriting systems. There are, however, two important differences. First, our language is meant to provide a complete specification of data-driven business processes, whereas in [9] the business process is still specified as a Petri net, and there is just an interface between relational queries (perhaps implemented inside the net itself) and the main net describing the process. Secondly (and this is what makes the first point possible), we do not simply implement a conventional relational DML and query language in a rewriting system. Instead, we implement a linear and non-deterministic query language which can emulate user choices and creation of new objects.

1.2 Preliminaries on term rewriting

We recall basic notions related to term rewriting [10, 11] and many-sorted equational logic [36].

Let $S$ be a poset (partially ordered set). A family $X=\{X_{s}\;|\;s\in S\}$ of sets is called an $S$ -sorted set if $X_{s}\subseteq X_{s^{\prime}}$ whenever $s\leq s^{\prime}$ . We abbreviate $x\in\bigcup X$ as $x\in X$ . We write $x:s$ iff $x\in X_{s}$ .

An algebraic signature $\Sigma=(\Sigma_{S},\Sigma_{F})$ consists of a finite poset of sorts $\Sigma_{S}$ and a finite set $\Sigma_{F}$ of function symbols. The set of function symbols $\Sigma_{F}$ is $\Sigma_{S}^{+}$ -sorted, where $\Sigma_{S}^{+}$ is the set of finite, non-empty sequences of elements of $\Sigma_{S}$ partially ordered with $s_{0}\cdots s_{n}\leq t_{0}\cdots t_{m}$ iff $m=n$ , $s_{0}\leq t_{0}$ and $s_{i}\geq t_{i}$ for all $i\in\{1,\ldots,n\}$ . Traditionally we write $f:s_{1}\ldots s_{n}\rightarrow s_{0}$ when $f\in(\Sigma_{F})_{s_{0}\ldots s_{n}}$ , where we denote by $(\Sigma_{F})_{s_{0}\ldots s_{n}}$ the set of function symbols of sort $s_{0}\ldots s_{n}$ . This explains the somewhat confusing ordering on $\Sigma_{S}^{+}$ : we are covariant on return value and contravariant on arguments. Symbols $c:\rightarrow s$ are called constants of sort $s$ . A $\Sigma$ -algebra $\mathbb{A}$ is an assignment of a set $\llbracket s\rrbracket_{\mathbb{A}}$ to each $s\in\Sigma_{S}$ such that $\llbracket s\rrbracket_{\mathbb{A}}\subseteq\llbracket s^{\prime}\rrbracket_{\mathbb{A}}$ if $s\leq s^{\prime}$ , and a function $\llbracket f\rrbracket_{\mathbb{A}}:\llbracket s_{1}\rrbracket_{\mathbb{A}}\times\cdots\times\llbracket s_{n}\rrbracket_{\mathbb{A}}\rightarrow\llbracket s\rrbracket_{\mathbb{A}}$ to each $f:s_{1}\ldots s_{n}\rightarrow s$ in $\Sigma_{F}$ . Let $V:=\{V_{s}\;|\;s\in\Sigma_{S}\}$ be a $\Sigma_{S}$ -sorted set of variables. A term algebra $\mathcal{T}_{\Sigma}(V)$ has “sort-safe” terms as elements and function symbols interpreted by themselves. We denote by $\mathcal{T}_{\Sigma}$ the algebra of ground $\Sigma$ -terms. We often use mixfix syntax where underscores in the function name correspond to consecutive arguments. Thus, if $\Sigma_{F}$ contains $\_+\_:A\;A\rightarrow A$ and $0:\rightarrow A$ then $0+0$ is a ground term of sort $A$ . Positions in a term are denoted by strings of positive integers. 
Denote by $\varepsilon$ the empty string, and by $t|_{\kappa}$ the subterm of $t\in\mathcal{T}_{\Sigma}(V)$ at position $\kappa\in\mathbb{Z}^{*}_{+}$ (if defined), i.e., $t|_{\varepsilon}:=t$ , and $f(t_{1},\ldots,t_{n})|_{k\kappa}:=t_{k}|_{\kappa}$ . Let $\mathit{Pos}(t):=\{\kappa\in\mathbb{Z}^{*}_{+}\;|\;t|_{\kappa}\;\text{is defined}\}$ . If $\kappa\in\mathit{Pos}(t)$ and $u$ is a term of the same sort as $t|_{\kappa}$ , then we denote by $t[u]_{\kappa}$ the result of replacing $t|_{\kappa}$ in $t$ with $u$ . We use a standard notation for substitutions. Let $\vec{a}=a_{1},\ldots,a_{n}$ be a list of terms, $\vec{v}=v_{1},\ldots,v_{n}$ a list of distinct variables. Then we denote $\sigma=\{\vec{a}/\vec{v}\}=\{a_{1}/v_{1},\ldots,a_{n}/v_{n}\}$ when $\sigma(v_{i})=a_{i}$ , $i\in\{1,\ldots,n\}$ , and $\sigma(v)=v$ for any variable $v\notin\{v_{1},\ldots,v_{n}\}$ .
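The operations $t|_{\kappa}$, $t[u]_{\kappa}$ and $\sigma(t)$ can be made concrete with a short Python sketch. The encoding is an assumption of this illustration: a compound term is a tuple `(f, t_1, ..., t_n)`, variables and constants are plain strings, positions are tuples of positive integers, and sorts are ignored altogether.

```python
def subterm(t, pos):
    """t|_pos, where pos is a tuple of positive integers (1-based)."""
    for k in pos:
        t = t[k]  # t[0] is the function symbol, t[k] the k-th argument
    return t

def replace(t, pos, u):
    """t[u]_pos: replace the subterm of t at position pos with u."""
    if not pos:
        return u
    k = pos[0]
    return t[:k] + (replace(t[k], pos[1:], u),) + t[k + 1:]

def substitute(t, sigma):
    """Apply substitution sigma (a dict from variable names to terms)."""
    if isinstance(t, str):
        return sigma.get(t, t)  # variables outside sigma are left unchanged
    return (t[0],) + tuple(substitute(a, sigma) for a in t[1:])

t = ("f", ("g", "x", "a"), "y")
at_12 = subterm(t, (1, 2))             # the subterm at position 1 2
replaced = replace(t, (1,), "b")       # t[b]_1
instance = substitute(t, {"x": "c"})   # sigma = {c/x}
```

The empty tuple plays the role of $\varepsilon$, so `subterm(t, ())` returns `t` itself.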

A $\Sigma$-algebra may be defined as a quotient of $\mathcal{T}_{\Sigma}$ by a congruence generated by a set $A\cup E$ of equalities, where equalities in $A$, referred to as equational attributes, define structural properties such as associativity, commutativity, or identity, and $E$ consists of conditional equalities interpreted as directed simplification rules on $\mathcal{T}_{\Sigma}$. It is assumed that simplifications terminate and are confluent, hence each $t$ has a unique (modulo $A$) irreducible form $t{\downarrow_{E/A}}\in\mathcal{T}_{\Sigma}$ representing the class of $t$ in $\mathcal{T}_{\Sigma}/{=}_{A\cup E}$.

Simplification with respect to equalities computes values. The behaviour is represented with rewritings. A rewriting system $\mathcal{R}=(\Sigma,A,E,R)$ consists of a signature $\Sigma$, a set of equations $A\cup E$ where $E$ defines confluent and terminating (modulo $A$) simplifications on $\mathcal{T}_{\Sigma}$, and a finite set $R$ of conditional rewriting rules of the form $\lambda:t_{1}\Rightarrow t_{2}\ \mathit{if}\ C$, where the optional condition $C$ is a conjunction of equalities, and $\lambda$ is the rule’s label. A one-step rewrite $u\xrightarrow{\lambda}_{\mathcal{R}}u^{\prime}$ from $u$ to $u^{\prime}$ using such a rule is possible if there exist a position $\kappa$, a term $v$, and a substitution $\sigma$ such that $u=_{A}v$, $v|_{\kappa}=_{A}\sigma(t_{1})$, $u^{\prime}=_{A}v[\sigma(t_{2})]_{\kappa}$ and $\sigma(C)$ is satisfied. We write $u\rightarrow_{\mathcal{R}}u^{\prime}$ iff there exist terms $s,s^{\prime}\in\mathcal{T}_{\Sigma}$ and a label $\lambda$ of a rule in $R$ such that $u{\downarrow}_{E/A}=_{A}s$, $s\xrightarrow{\lambda}_{\mathcal{R}}s^{\prime}$, and $s^{\prime}{\downarrow}_{E/A}=_{A}u^{\prime}{\downarrow}_{E/A}$. We denote by $\rightarrow_{\mathcal{R}}^{+}$ and $\rightarrow_{\mathcal{R}}^{*}$ the transitive and reflexive-transitive closures of $\rightarrow_{\mathcal{R}}$. We also write $u\rightarrow_{\mathcal{R}}^{!}u^{\prime}$ if $u\rightarrow_{\mathcal{R}}^{*}u^{\prime}$ and there is no $u^{\prime\prime}$ such that $u^{\prime}\rightarrow_{\mathcal{R}}u^{\prime\prime}$. If $\mathcal{R}$ is implied by the context, we omit $\mathcal{R}$ from arrows.
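As a rough illustration of $\rightarrow$ and $\rightarrow^{!}$, the following Python sketch computes all normal forms of a ground multiset under unconditional ground multiset rules. It makes strong simplifying assumptions that are not in the definition above: there are no variables, no conditions and no simplification equations $E$, matching modulo $A$ is reduced to `collections.Counter` arithmetic, and the system is assumed to terminate. The names `one_step` and `normal_forms` are introduced here for the example.

```python
from collections import Counter

def key(ms):
    """Canonical, hashable representative of a multiset."""
    return tuple(sorted(ms.elements()))

def one_step(state, rules):
    """All u' with state -> u' for ground multiset rules lhs => rhs."""
    out = []
    for lhs, rhs in rules:
        if all(state[x] >= n for x, n in lhs.items()):
            out.append(state - lhs + rhs)
    return out

def normal_forms(state, rules):
    """All u' with state ->! u' (assumes that the system terminates)."""
    seen, stack, result = set(), [state], set()
    while stack:
        s = stack.pop()
        if key(s) in seen:
            continue
        seen.add(key(s))
        succ = one_step(s, rules)
        if not succ:
            result.add(key(s))  # no rule applies: s is a normal form
        stack.extend(succ)
    return result

# Rules  a b => c  and  a => d : terminating, but not confluent.
rules = [(Counter("ab"), Counter("c")), (Counter("a"), Counter("d"))]
nfs = normal_forms(Counter("ab"), rules)
```

Starting from the multiset $ab$, the two rules compete for the single $a$, so the search finds two distinct normal forms, $c$ and $bd$.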

Variants of the following definition and easy-to-prove lemma appear in the literature (see, e.g., [37]):

Definition 1.1

Let $(X,\rightarrow)$ , where $\rightarrow\subseteq X\times X$ , be a transition system. Assume that $\equiv$ is an equivalence on $X$ which is a bisimulation on $(X,\rightarrow)$ . We call $\rightarrow$ semiconfluent at $x\in X$ modulo $\equiv$ if for all $y,y^{\prime}\in X$ such that $y\leftarrow x\rightarrow^{*}y^{\prime}$ there exist $z,z^{\prime}\in X$ such that $y\rightarrow^{*}z\equiv z^{\prime}\;{}^{*}\!\!\leftarrow y^{\prime}$ . We call $\rightarrow$ semiconfluent modulo $\equiv$ if $\rightarrow$ is semiconfluent at all $x\in X$ . We call $\rightarrow$ confluent at $x\in X$ modulo $\equiv$ if for all $y,y^{\prime}\in X$ such that $y\;{}^{*}\!\!\leftarrow x\rightarrow^{*}y^{\prime}$ there exist $z,z^{\prime}\in X$ such that $y\rightarrow^{*}z\equiv z^{\prime}\;{}^{*}\!\!\leftarrow y^{\prime}$ . We call $\rightarrow$ confluent modulo $\equiv$ if $\rightarrow$ is confluent at all $x\in X$ .

Lemma 1.2

Semiconfluence modulo equivalence implies confluence modulo equivalence.
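On a finite transition system both properties can be checked by brute force, which gives a way to experiment with Definition 1.1 and Lemma 1.2. The sketch below takes $\equiv$ to be plain equality (a simplifying assumption) and represents the system as a dictionary mapping every state to the list of its successors; all function names are hypothetical.

```python
def reach(ts, x):
    """Reflexive-transitive closure: all z with x ->* z."""
    seen, stack = {x}, [x]
    while stack:
        for z in ts.get(stack.pop(), ()):
            if z not in seen:
                seen.add(z)
                stack.append(z)
    return seen

def joinable(ts, y, y2):
    """True if y and y2 have a common descendant."""
    return bool(reach(ts, y) & reach(ts, y2))

def semiconfluent(ts):
    # peaks  y <- x ->* y2
    return all(joinable(ts, y, y2)
               for x in ts for y in ts[x] for y2 in reach(ts, x))

def confluent(ts):
    # peaks  y *<- x ->* y2
    return all(joinable(ts, y, y2)
               for x in ts for y in reach(ts, x) for y2 in reach(ts, x))

diamond = {"x": ["y", "z"], "y": ["w"], "z": ["w"], "w": []}
fork = {"x": ["y", "z"], "y": [], "z": []}
```

The `diamond` system passes both checks, while `fork` fails both, in agreement with the lemma.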

2 Multisets of facts, fresh facts and patterns

Our queries are evaluated against, or act on, finite multisets of facts. Duplicate facts can be genuinely useful and removing them is computationally expensive. If necessary, duplicates can be removed explicitly or, better, one can ensure that no duplicates are introduced in the first place by judicious choice of DML operations. In fact, SQL is a multiset query language as well, hence by using multisets we are closer to the actual relational database practice than formal systems based on sets.

$\mathcal{Q}_{\Sigma,\mathcal{D}}$ is parametrized with respect to a signature of facts $\Sigma$ and a $\Sigma$ -algebra of facts $\mathcal{D}$ . $\Sigma_{S}$ must contain sorts Fact and Bool for facts and Booleans, respectively. All constructors for facts are contained in $\Sigma_{F}$ . Facts are reifications of predicate instances. A typical fact has the form $f(a_{1},\ldots,a_{n})$ , where $f:s_{1}\ldots s_{n}\rightarrow\text{\tt Fact}$ is a fact constructor. $\mathcal{D}$ defines all the data types used in facts and is specifiable in terms of directed equations and equational attributes. $\mathcal{D}$ must define Boolean connectives and Boolean-valued equality $\_=\_:s\;s\rightarrow\text{\tt Bool}$ for all $s\in\Sigma_{S,K}$ . We assume that all ground terms of sort Bool simplify to either t or f. This is non-trivial: Define a function $f:\text{\tt Nat}\rightarrow\text{\tt Bool}$ with a single equation $f(0)=\text{\bf t}$ . Then $f(1)$ is fully reduced and distinct from both t and f.

Multisets of facts. The signature of multisets extends $\Sigma_{S}$ with sorts (Ne)FSet of finite (non-empty) multisets of facts. The subsort ordering is given by $\text{\tt Fact}<\text{\tt NeFSet}<\text{\tt FSet}$ . In particular, each fact is a non-empty multiset of facts. Finite multisets of facts are constructed with an associative and commutative binary operator $\_\circ\_:\text{\tt FSet}\;\text{\tt FSet}\rightarrow\text{\tt FSet}$ (cf. [38]) with identity element $\emptyset:\rightarrow\text{\tt FSet}$ . Operator $\_\circ\_$ is subsort overloaded with the additional declaration $\_\circ\_:\text{\tt FSet}\;\text{\tt NeFSet}\rightarrow\text{\tt NeFSet}$ . Thus, a multiset constructed from a multiset and a non-empty multiset is non-empty.

Freshness and nominal sorts. Support for creation of fresh values is a common requirement (cf. [7]): identifiers for new objects must not belong to the present nor to any past active domain of the database. To understand why reusing identifiers from past domains is bad, consider a situation where a new business object is created with the identifier of a previously deleted one. In this case an attempt to verify whether the deleted object is present in the final database may yield an incorrect affirmative answer. We support creation of fresh values of nominal sorts only. Usually this suffices, and freshness for non-nominal data types is problematic (cf. [39]). A sort $s$ is nominal (relative to a $\Sigma$-algebra $\mathcal{D}$) if values of this sort have no non-trivial algebraic or relational structure besides equality. In particular, for nominal $s$, $s^{\prime}\leq s\leq s^{\prime\prime}$ if and only if $s^{\prime}=s=s^{\prime\prime}$. To create values of each nominal sort $s$ we have a constructor $\imath^{s}_{\_}:\text{\tt Nat}\rightarrow s$ which belongs neither to $\Sigma$ nor to the signature of $\mathcal{Q}_{\Sigma,\mathcal{D}}$.

Example 2.1

Consider a client basket database. Identifiers of customers, products and baskets have sorts $\mathfrak{c}$ , $\mathfrak{p}$ , and $\mathfrak{b}$ , respectively. We use two fact constructors: $\_\text{\tt owns}\_:\mathfrak{c}\;\mathfrak{b}\rightarrow\text{\tt Fact}$ and $\_\text{\tt in}\_:\mathfrak{p}\;\mathfrak{b}\rightarrow\text{\tt Fact}$ . Multiset of facts $(\imath^{\mathfrak{c}}_{1}\;\text{\tt owns}\;\imath^{\mathfrak{b}}_{1})\circ(\imath^{\mathfrak{p}}_{2}\;\text{\tt in}\;\imath^{\mathfrak{b}}_{1})\circ(\imath^{\mathfrak{p}}_{3}\;\text{\tt in}\;\imath^{\mathfrak{b}}_{1})$ denotes the state in which customer $\imath^{\mathfrak{c}}_{1}$ is the owner of basket $\imath^{\mathfrak{b}}_{1}$ containing products $\imath^{\mathfrak{p}}_{2}$ and $\imath^{\mathfrak{p}}_{3}$ . Using multisets instead of sets can be useful: multiset $(\imath^{\mathfrak{p}}_{3}\;\text{\tt in}\;\imath^{\mathfrak{b}}_{1})\circ(\imath^{\mathfrak{p}}_{3}\;\text{\tt in}\;\imath^{\mathfrak{b}}_{1})$ denotes the situation where basket $\imath^{\mathfrak{b}}_{1}$ contains two items of $\imath^{\mathfrak{p}}_{3}$ .

Constructing values of nominal sorts from natural numbers simplifies creation of fresh values: to ensure freshness one can construct the new value with the smallest natural number which has not been used so far. To keep track of those “smallest unused naturals”, and to make retrieval of fresh values similar to retrieval of data, we use fresh facts of sort $\text{\tt Fact}^{\mathbf{n}}$, unrelated to Fact. For each nominal sort $s$ we have a single-argument constructor of the form $C_{s}:s\rightarrow\text{\tt Fact}^{\mathbf{n}}$ which wraps a value $\imath^{s}_{n}$ such that for all $m\geq n$, $\imath^{s}_{m}$ was never used before. When a fresh value of sort $s$ is requested, we return $\imath^{s}_{n}$ and update the fresh fact to $C_{s}(\imath^{s}_{n+1})$. Fresh facts are combined into (non-empty) multisets of sort (Ne)$\text{\tt FSet}^{\mathbf{n}}$ using a commutative and associative operator $\_\circ\_:\text{\tt FSet}^{\mathbf{n}}\;\text{\tt FSet}^{\mathbf{n}}\rightarrow\text{\tt FSet}^{\mathbf{n}}$ with identity $\emptyset:\rightarrow\text{\tt FSet}^{\mathbf{n}}$. To facilitate bulk updates of fresh facts needed in the semantics of DML queries, we define the following function:

$\displaystyle\upsilon:\text{\tt FSet}^{\mathbf{n}}\rightarrow\text{\tt FSet}^{\mathbf{n}},\quad\upsilon\bigl(C_{s_{1}}(\imath^{s_{1}}_{m_{1}})\circ\cdots\circ C_{s_{n}}(\imath^{s_{n}}_{m_{n}})\bigr)=C_{s_{1}}(\imath^{s_{1}}_{m_{1}+1})\circ\cdots\circ C_{s_{n}}(\imath^{s_{n}}_{m_{n}+1}).$ (1)
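The bookkeeping behind fresh facts and the function $\upsilon$ of (1) can be sketched in Python as follows. The encoding is an assumption of this illustration: a multiset of fresh facts is a dictionary from sort names to the wrapped index $n$ (so at most one fresh fact per sort is modelled), and a nominal value $\imath^{s}_{n}$ is the pair `(s, n)`.

```python
def fresh_value(fresh, sort):
    """Return a fresh value of the given sort and the updated fresh facts."""
    n = fresh[sort]
    updated = dict(fresh)
    updated[sort] = n + 1  # C_s(i_n) becomes C_s(i_{n+1})
    return (sort, n), updated

def upsilon(fresh):
    """Bulk update (1): advance the index wrapped by every fresh fact."""
    return {sort: n + 1 for sort, n in fresh.items()}

fresh = {"c": 2, "b": 2, "p": 4}
value, fresh_after = fresh_value(fresh, "b")
bumped = upsilon(fresh)
```

Requesting a value touches only the fresh fact of the requested sort, whereas `upsilon` advances all of them at once.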

Patterns. Quantifiers in $\mathcal{Q}_{\Sigma,\mathcal{D}}$ quantify over patterns (of sort Pat) containing non-ground multisets of facts and fresh facts marked by modalities, which control retention of matched facts (i.e., whether upon matching they are removed temporarily, permanently, or not at all from the database), and syntactically wrap fresh facts (i.e., fresh facts can appear in the pattern only inside a specialized modality). Patterns can be preserving (of sort $\text{\tt Pat}^{p}$), semi-terminating (of sort $\text{\tt Pat}^{\text{\it st}}$), terminating (of sort $\text{\tt Pat}^{t}$), terminating and preserving (of sort $\text{\tt Pat}^{\text{\it tp}}$), or neither. The subsort relation is defined by $\text{\tt Pat}^{\text{\it tp}}<\text{\tt Pat}^{t}<\text{\tt Pat}^{\text{\it st}}<\text{\tt Pat}$ and $\text{\tt Pat}^{\text{\it tp}}<\text{\tt Pat}^{p}<\text{\tt Pat}$. Patterns are constructed with modalities

$\displaystyle[\_]_{!}:\text{\tt NeFSet}\rightarrow\text{\tt Pat}^{p},\quad[\_]_{?}:\text{\tt NeFSet}\rightarrow\text{\tt Pat}^{\text{\it tp}},\quad[\_]_{0}:\text{\tt NeFSet}\rightarrow\text{\tt Pat}^{\text{\it st}},\quad[\_]_{\mathbf{n}}:\text{\tt NeFSet}^{\mathbf{n}}\rightarrow\text{\tt Pat}$

and an associative and commutative, subsort-overloaded operator:

	$\displaystyle\_\circ\_:\text{\tt Pat}\;\text{\tt Pat}\rightarrow\text{\tt Pat},\quad\_\circ\_:\text{\tt Pat}\;\text{\tt Pat}^{t}\rightarrow\text{\tt Pat}^{t},\quad\_\circ\_:\text{\tt Pat}^{p}\;\text{\tt Pat}^{p}\rightarrow\text{\tt Pat}^{p},$
	$\displaystyle\_\circ\_:\text{\tt Pat}^{p}\;\text{\tt Pat}^{\text{\it tp}}\rightarrow\text{\tt Pat}^{\text{\it tp}},\quad\_\circ\_:\text{\tt Pat}^{\text{\it st}}\;\text{\tt Pat}\rightarrow\text{\tt Pat}^{\text{\it st}}.$

Thus, a terminating (resp. a semi-terminating) pattern has to contain at least one fact wrapped with $[\_]_{?}$ (resp. with either $[\_]_{?}$ or $[\_]_{0}$ ). A terminating and preserving pattern consists of facts marked only with $[\_]_{!}$ or $[\_]_{?}$ , and it contains at least one fact wrapped with $[\_]_{?}$ . Directed equalities $[F_{1}]_{m}\circ[F_{2}]_{m}=[F_{1}\circ F_{2}]_{m}$ , where $m\in\{!,?,0,\mathbf{n}\}$ , and $F_{1}$ , $F_{2}$ are non-empty multisets of (fresh) facts guarantee that fully reduced patterns have facts gathered in groups of the same modality.

Example 2.2

Let Id be a sort, let $f,g:\text{\tt Id}\;\text{\tt Nat}\rightarrow\text{\tt Fact}$ and $h:\text{\tt Id}\rightarrow\text{\tt Fact}$ be fact constructors and let $x$ , $y$ , $z$ and $t$ be variables. Then

$\displaystyle[f(x,y)\circ h(x)]_{?}\circ[g(x,y)]_{!}\circ[C_{\text{\tt Id}}(t)]_{\mathbf{n}}\circ[g(x,1)]_{0}$

is a pattern. It is terminating because of the presence of the subpattern $[f(x,y)\circ h(x)]_{?}$. It is not, however, preserving, since it contains facts wrapped in $[\_]_{0}$ and $[\_]_{\mathbf{n}}$.
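The way the sort of a pattern follows from the modalities occurring in it can be mimicked with a small Python function. This is an informal reading of the subsort-overloaded declarations, not the Maude sort inference itself; the string encoding of modalities (`"!"`, `"?"`, `"0"`, `"n"`) and the name `classify` are assumptions of the sketch.

```python
def classify(modalities):
    """Properties of a pattern, given the modalities of its groups."""
    ms = set(modalities)
    return {
        "terminating": "?" in ms,                  # at least one [_]? group
        "semi-terminating": bool(ms & {"?", "0"}),  # some [_]? or [_]0 group
        "preserving": ms <= {"!", "?"},            # only [_]! and [_]? groups
    }

# The pattern of Example 2.2 uses the modalities ?, !, n and 0.
example_2_2 = classify(["?", "!", "n", "0"])
only_retained = classify(["!"])
```

A pattern is terminating and preserving exactly when both the `"terminating"` and the `"preserving"` entries are true.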

The informal meaning of modalities is as follows: Facts matched by those marked by $[\_]_{?}$ can be considered at most once during quantifier evaluation, but they are not removed from the database. Facts marked by $[\_]_{0}$ are removed from the database when matched, but they are returned if the computation branch this matching leads to is unsuccessful. Thus, the presence of facts marked by $[\_]_{0}$ in the pattern may not guarantee termination, unless one can prove that the formula under quantifier is always successful. Facts marked by $[\_]_{!}$ are always retained in the database, and $[\_]_{\mathbf{n}}$ wraps fresh facts.

Remark 2.3

In what follows we use the following notation. Let $P$ be a pattern. We denote by $P_{0}$ , $P_{!}$ , $P_{?}$ , $P_{\mathbf{n}}$ the multisets of facts consisting of those facts in $P$ which are wrapped by modalities $[\_]_{0}$ , $[\_]_{!}$ , $[\_]_{?}$ , and $[\_]_{\mathbf{n}}$ , respectively. Thus, e.g., $([F_{1}]_{!}\circ[F_{2}]_{?})_{?}:=F_{2}$ and $([F_{1}]_{!}\circ[F_{2}]_{?})_{\mathbf{n}}:=\emptyset$ .

3 Query and condition languages

This section introduces the three sublanguages of $\mathcal{Q}_{\Sigma,\mathcal{D}}$ , their syntax and informal semantics. Formal semantics based on conditional term rewriting is provided in the subsequent sections.

3.1 Conditions

The language $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ of conditions on finite multisets of facts is analogous to first-order logic with quantification restricted to the active domain.

Definition 3.1

Let $\Sigma$ be a signature and let $\mathcal{D}$ be a $\Sigma$ -algebra of facts. Formulas of $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ are (generally non-ground) terms of sort Cnd constructed with

\displaystyle\bot:\rightarrow\text{\tt Cnd},\ \{\_\}:\text{\tt Bool}\rightarrow\text{\tt Cnd},\ \neg\_:\text{\tt Cnd}\rightarrow\text{\tt Cnd},\ \_\vee\_:\text{\tt Cnd}\;\text{\tt Cnd}\rightarrow\text{\tt Cnd},\ \exists\_.\_:\text{\tt Pat}^{\text{\it tp}}\;\text{\tt Cnd}\rightarrow\text{\tt Cnd}.

Thus, $\bot$ , $\{B\}$ , $\neg\psi$ , $\psi\vee\psi^{\prime}$ and $\exists P\mathbin{.}\psi$ are conditions if $B$ is a term of sort Bool, $P$ is a terminating and preserving pattern, and $\psi$ and $\psi^{\prime}$ are conditions. Consider condition $T:=\exists P\mathbin{.}\psi$ . Existential quantifier $\exists$ binds in $\psi$ all the variables appearing in $P$ which were not bound by the term surrounding $T$ . Thus, the meaning of the formula may change when it is placed in a different context.

Example 3.2

Let $R:\text{\tt Nat}\ \text{\tt Nat}\rightarrow\text{\tt Fact}$ , and suppose that terms $R(t_{1},t_{2})$ represent rows of a relation $\mathbf{R}\subseteq\mathbf{N}\times\mathbf{N}$ . Let $x$ , $y$ , and $z$ be distinct variables. Then condition $\neg\exists[R(x,y)]_{?}.\exists[R(x,z)]_{?}.\neg\{y=z\}$ expresses functional dependency from the first to the second column of $\mathbf{R}$ . The first quantifier binds $x$ and $y$ , the second one binds $z$ . The condition is closed. The subcondition $\exists[R(x,z)]_{?}.\neg\{y=z\}$ taken on its own is open, but only $y$ is free and the quantifier now binds both $x$ and $z$ .

Closed formulas in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ are called sentences. Let $\mathit{Var}(t)$ be the set of variables of $t$, and let $\mathit{cl?}(\phi)$ hold iff the formula $\phi$ in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ is closed. To define $\mathit{cl?}(\phi)$ by structural recursion we need to keep track of the variables bound by the context of $\phi$. Thus, $\mathit{cl?}(\phi):=\mathit{cl?}(\phi,\emptyset)$, where, for any set of variables $V$,

	$\displaystyle\mathit{cl?}(\bot,V)=\text{\bf t},\quad\mathit{cl?}(\neg\phi,V)=\mathit{cl?}(\phi,V),\quad\mathit{cl?}(\{t\},V)=\mathit{Var}(t)\subseteq V$
	$\displaystyle\mathit{cl?}(\phi_{1}\vee\phi_{2},V)=\mathit{cl?}(\phi_{1},V)\wedge\mathit{cl?}(\phi_{2},V),\quad\mathit{cl?}(\exists P\mathbin{.}\phi,V)=\mathit{cl?}\bigl{(}\phi,V\cup\mathit{Var}(P)\bigr{)}.$		(2)
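The recursion in Equation (2) translates almost literally into code. The following Python sketch is only an illustration: the classes `Bot`, `Atom`, `Not`, `Or`, `Exists` and the function `is_closed` are hypothetical names, and Boolean terms and patterns are reduced to their variable sets, since nothing else matters for closedness.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bot:                      # the constant "bottom"
    pass

@dataclass(frozen=True)
class Atom:                     # {t}; only Var(t) is recorded
    variables: frozenset

@dataclass(frozen=True)
class Not:
    body: object

@dataclass(frozen=True)
class Or:
    left: object
    right: object

@dataclass(frozen=True)
class Exists:                   # exists P . body; only Var(P) is recorded
    pattern_vars: frozenset
    body: object

def is_closed(phi, bound=frozenset()):
    """cl?(phi, V) from Equation (2), with V passed as `bound`."""
    if isinstance(phi, Bot):
        return True
    if isinstance(phi, Atom):
        return phi.variables <= bound
    if isinstance(phi, Not):
        return is_closed(phi.body, bound)
    if isinstance(phi, Or):
        return is_closed(phi.left, bound) and is_closed(phi.right, bound)
    if isinstance(phi, Exists):
        return is_closed(phi.body, bound | phi.pattern_vars)
    raise TypeError(f"not a condition: {phi!r}")

# Example 3.2: the functional dependency and its inner subcondition.
inner = Exists(frozenset({"x", "z"}), Not(Atom(frozenset({"y", "z"}))))
fd = Not(Exists(frozenset({"x", "y"}), inner))

print(is_closed(fd), is_closed(inner))
```

The inner subcondition becomes closed again once `y` is supplied as a context-bound variable, i.e. `is_closed(inner, frozenset({"y"}))` holds.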

As syntactic sugar we define operators $\top:\rightarrow\text{\tt Cnd}$, $\_\wedge\_:\text{\tt Cnd}\;\text{\tt Cnd}\rightarrow\text{\tt Cnd}$, and $\forall\_.\_:\text{\tt Pat}^{\text{\it tp}}\;\text{\tt Cnd}\rightarrow\text{\tt Cnd}$ with equalities $\top=\neg\bot$, $\phi_{1}\wedge\phi_{2}=\neg(\neg\phi_{1}\vee\neg\phi_{2})$, $\forall P\mathbin{.}\phi=\neg\exists P\mathbin{.}\neg\phi$. Later we prove that, for any condition $\phi$, $\neg\neg\phi$ is logically equivalent to $\phi$. Hence the functional dependency in Example 3.2 can be equivalently written as $\forall[R(x,y)]_{?}.\forall[R(x,z)]_{?}.\{y=z\}.$
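Unfolding the sugar makes this rewriting of the functional dependency explicit. The derivation below is a sketch: the first step uses only the defining equality for $\forall$ (applied twice), and the second step relies on $\neg\neg\phi\equiv\phi$ together with the fact that logical equivalence is a congruence, both of which are established later (Lemmas 4.7 and 4.11).

```latex
\begin{align*}
\forall[R(x,y)]_{?}.\forall[R(x,z)]_{?}.\{y=z\}
  &= \neg\exists[R(x,y)]_{?}.\neg\bigl(\neg\exists[R(x,z)]_{?}.\neg\{y=z\}\bigr) \\
  &\equiv \neg\exists[R(x,y)]_{?}.\exists[R(x,z)]_{?}.\neg\{y=z\}.
\end{align*}
```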

Definition 3.3

A subcondition $\psi$ of $\phi$ in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ is a subterm of $\phi$ of sort Cnd.

3.2 Syntax of queries and DML queries

Syntactically $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ is a restriction of $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ . Therefore, we define their syntax jointly as follows:

Definition 3.4

Queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ are terms of sort Qy. DML queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ are terms of sort DQy. Success assured (DML) queries are terms of sort $\text{\tt DQy}^{s}$. The sorts are ordered with $\text{\tt Fact}<\text{\tt Qy}<\text{\tt DQy}$ and $\text{\tt Fact}<\text{\tt DQy}^{s}<\text{\tt DQy}$. Thus, every query in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ can also be interpreted as a DML query (which inserts but does not delete). Every fact is also a query. Success assured DML queries are DML queries guaranteed to return some facts (or at least $\surd$). Terms of sort DQy are constructed with

	$\displaystyle\surd:\rightarrow\text{\tt DQy}^{s},\quad\_\rhd\_:\text{\tt DQy}\;\text{\tt DQy}\rightarrow\text{\tt DQy},\quad\_\rhd\_:\text{\tt Qy}\;\text{\tt Qy}\rightarrow\text{\tt Qy},\quad\_\rhd\_:\text{\tt DQy}^{s}\;\text{\tt DQy}\rightarrow\text{\tt DQy}^{s},$
	$\displaystyle\_\rhd\_:\text{\tt DQy}\;\text{\tt DQy}^{s}\rightarrow\text{\tt DQy}^{s},\quad\emptyset:\rightarrow\text{\tt Qy},\quad\_\Rightarrow\_:\text{\tt Cnd}\;\text{\tt DQy}\rightarrow\text{\tt DQy},\quad\_\Rightarrow\_:\text{\tt Cnd}\;\text{\tt Qy}\rightarrow\text{\tt Qy},$
	$\displaystyle\nabla\_.\_:\text{\tt Pat}^{t}\;\text{\tt DQy}\rightarrow\text{\tt DQy},\quad\nabla\_.\_:\text{\tt Pat}^{\text{\it tp}}\;\text{\tt Qy}\rightarrow\text{\tt Qy},\quad\nabla\_.\_:\text{\tt Pat}^{\text{\it st}}\;\text{\tt DQy}^{s}\rightarrow\text{\tt DQy}.$

Thus, $\emptyset$, $f$, $Q\rhd Q^{\prime}$, $\phi\Rightarrow Q$, and $\nabla P\mathbin{.}Q$ are queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ if $f$ is a fact, $P$ is a terminating and preserving pattern, $Q$ and $Q^{\prime}$ are queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$, and $\phi$ is a condition. Similarly, $\emptyset$, $\surd$, $f$, $D\rhd D^{\prime}$, $\phi\Rightarrow D$, and $\nabla P\mathbin{.}D$ are DML queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ if $f$ is a fact, $P$ is a terminating pattern, $D$ and $D^{\prime}$ are DML queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$, and $\phi$ is a condition. In addition, $\nabla P\mathbin{.}D$ is a DML query if $P$ is a more general semi-terminating pattern, provided that $D$ is success assured, i.e., either $\surd$, a fact, or of the form $D_{1}\rhd D_{2}$ where at least one of $D_{1}$, $D_{2}$ is success assured. We force $D$ to be success assured when $P$ is semi-terminating but not terminating, because quantification over a semi-terminating pattern may not terminate if the quantified expression can fail. The informal semantics of (DML) queries is as follows:

1.

Let $f:\text{\tt Fact}$ . A query $f$ returns $f$ . A DML query $f$ adds $f$ to the current database.
2.

$\surd$ is used to mark a branch of a DML query as successful even if it does not add any facts.
3.

$\emptyset$ is a query returning the empty multiset of facts or a DML query which does nothing.
4.

A query $Q\rhd Q^{\prime}$ returns the multiset union of results of $Q$ and $Q^{\prime}$ . A DML query $D\rhd D^{\prime}$ adds to and removes from the database a multiset union of what $D$ and $D^{\prime}$ add and remove, respectively. Since facts are removed immediately, $\_\rhd\_$ is not commutative for DML queries.
5.

A query $\phi\Rightarrow Q$ returns what $Q$ returns if $\phi$ is satisfied. It returns $\emptyset$ otherwise. A DML query $\phi\Rightarrow D$ does what $D$ does if $\phi$ is satisfied. It does nothing otherwise.
6.

Quantifier $\nabla P.Q$ denotes iteration over facts in the database matching $P$ . At each iteration step $\sigma(Q)$ is executed, where $\sigma$ is the matching substitution. If $P$ contains fresh facts, execution of a DML query $\nabla P\mathbin{.}Q$ may introduce fresh values.

Let $\mathit{cl?}(Q)$ hold if and only if the query $Q$ in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ (or $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$) is closed. As in the case of conditions, $\mathit{cl?}(Q):=\mathit{cl?}(Q,\emptyset)$, where, for any set of variables $V$,

	$\displaystyle\mathit{cl?}(\emptyset,V)=\mathit{cl?}(\surd,V)=\text{\bf t},\quad\mathit{cl?}(\phi\Rightarrow Q,V)=\mathit{cl?}(\phi,V)\wedge\mathit{cl?}(Q,V),$
	$\displaystyle\mathit{cl?}(f,V)=\mathit{Var}(f)\subseteq V\ \text{if}\ f:\text{\tt Fact}$
	$\displaystyle\mathit{cl?}(Q_{1}\rhd Q_{2},V)=\mathit{cl?}(Q_{1},V)\wedge\mathit{cl?}(Q_{2},V),\quad\mathit{cl?}(\nabla P\mathbin{.}Q,V)=\mathit{cl?}\bigl{(}Q,V\cup\mathit{Var}(P)\bigr{)}.$

Definition 3.5

A (DML) subquery of a (DML) query $Q$ is a subterm of $Q$ of sort (D)Qy.

Example 3.6

The following query in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ is closed:

\nabla[f(x)\circ g]_{?}\circ[h(x)]_{!}\mathbin{.}\bigl{(}\{x>5\}\Rightarrow f(x+1)\bigr{)}.

It returns a multiset of facts of the form $f(x+1)$, where $f(x)$ and $h(x)$ belong to the multiset we query and $x>5$. During the evaluation of this query each source fact $f(x)$ (such that $h(x)$ is in the multiset and $x>5$) and some fact $g$ are matched only once (on the other hand, facts of the form $h(x)$ can be matched multiple times). Thus, we return at most $n$ facts $f(x+1)$, where $n$ is the number of $g$ facts in the database. For example, for the multiset $f(7)\circ f(8)\circ f(7)\circ f(4)\circ h(8)\circ h(7)\circ g\circ g$ the query returns either $f(8)\circ f(9)$ or $f(8)\circ f(8)$, depending on whether we match $f(7)\circ g$ and $f(8)\circ g$, or we match $f(7)\circ g$ twice (in both cases we then run out of $g$-facts).
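The non-determinism of this example can be explored by brute force. The Python sketch below is specialised to the query of Example 3.6 and assumes the informal reading in which iteration continues until the pattern can no longer be matched; the function name `results` and the encoding of the database by three arguments are hypothetical.

```python
from collections import Counter

def results(f_facts, h_values, g_count):
    """Enumerate every multiset the query of Example 3.6 may return.

    f_facts  : Counter of the arguments x of the f(x) facts in the database
    h_values : set of arguments x such that h(x) is in the database
    g_count  : number of g facts in the database
    Each result is a sorted tuple of the arguments of the returned f facts.
    """
    outcomes = set()

    def step(fs, gs, emitted):
        # f(x) and g are consumed by a match ([_]_?); h(x) is only required ([_]_!).
        candidates = [x for x in fs if fs[x] > 0 and x in h_values] if gs > 0 else []
        if not candidates:
            outcomes.add(tuple(sorted(emitted)))
            return
        for x in candidates:
            rest = fs.copy()
            rest[x] -= 1
            # the guard {x > 5} decides whether f(x + 1) is returned
            out = emitted + [x + 1] if x > 5 else emitted
            step(rest, gs - 1, out)

    step(Counter(f_facts), g_count, [])
    return outcomes

# The multiset from the example: f(7), f(8), f(7), f(4), h(8), h(7), g, g.
example = results(Counter({7: 2, 8: 1, 4: 1}), {7, 8}, 2)
print(sorted(example))  # [(8, 8), (8, 9)]
```

The fact $f(4)$ never participates because no $h(4)$ is present, and the two $g$ facts bound the number of iterations by two.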

Example 3.7

The following (closed) “DML query” is not syntactically correct (it cannot be assigned sort DQy):

\nabla[f(x)]_{0}\mathbin{.}\bigl{(}\{x>5\}\Rightarrow g(x)\bigr{)}

This is because the pattern $[f(x)]_{0}$ is only semiterminating, but not terminating. In this case the subquery in the scope of $\nabla[f(x)]_{0}\mathbin{.}\_$ should be success assured, but $\{x>5\}\Rightarrow g(x)$ clearly is not (if $x\leq 5$ then it fails). If we executed this query against the multiset $f(1)\circ f(6)$, it would loop forever: matching $f(1)$, removing it, failing when executing $\{1>5\}\Rightarrow g(1)$, returning $f(1)$ to the multiset, matching it again, and so on.

As the following example demonstrates, some incorrect DML queries of the form $\nabla P\mathbin{.}D$, where $P$ is semiterminating but not terminating and $D$ is not success assured, provably always terminate. Nevertheless, we prefer to reject them, since making them syntactically correct is usually not difficult:

Example 3.8

Assume that $x$ is a variable of sort Nat (natural numbers). Execution of the following syntactically incorrect DML query always terminates (it replaces each fact of the form $f(x)$ where $x>5$ with fact $g(x)$ , and returns to the multiset all other $f(x)$ ’s unchanged):

\nabla[f(x)]_{0}\mathbin{.}\bigl{(}(\{x>5\}\Rightarrow g(x))\rhd(\{x\leq 5\}\Rightarrow f(x))\bigr{)}.

Termination follows from the fact that $(\{x>5\}\Rightarrow g(x))\rhd(\{x\leq 5\}\Rightarrow f(x))$ always succeeds regardless of which natural number is bound to $x$: depending on whether $x>5$ or $x\leq 5$ (and one of these must be true), either the left or the right argument of $\rhd$ succeeds, and hence the whole subquery succeeds.

Unfortunately, neither $\{x>5\}\Rightarrow g(x)$ nor $\{x\leq 5\}\Rightarrow f(x)$ is syntactically success assured, and hence $(\{x>5\}\Rightarrow g(x))\rhd(\{x\leq 5\}\Rightarrow f(x))$ is also not success assured. However, it is very easy to modify the above query so that it describes the same modification to the database, but is now syntactically correct and can be assigned sort DQy:

\nabla[f(x)]_{0}\mathbin{.}\bigl{(}(\{x>5\}\Rightarrow g(x))\rhd(\{x\leq 5\}\Rightarrow f(x))\rhd\mathit{\surd}\bigr{)}.
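The syntactic check that separates Examples 3.7 and 3.8 from the corrected query can be sketched in Python. Everything below is an illustrative assumption: the class names, the functions `success_assured` and `well_sorted`, and the reduction of a pattern to the set of modalities it uses (a pattern is treated as terminating if it contains $[\_]_{?}$ and as semi-terminating if it contains $[\_]_{?}$ or $[\_]_{0}$). Conditions are kept as opaque strings because they do not influence the sort.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tick:                      # the constant written as a check mark
    pass

@dataclass(frozen=True)
class Empty:                     # the empty query
    pass

@dataclass(frozen=True)
class Fact:
    text: str

@dataclass(frozen=True)
class Seq:                       # D1 |> D2
    left: object
    right: object

@dataclass(frozen=True)
class Guard:                     # phi => D
    condition: str
    body: object

@dataclass(frozen=True)
class Nabla:                     # nabla P . D
    modalities: frozenset
    body: object

def success_assured(d):
    """The check mark, a fact, or D1 |> D2 with at least one assured side."""
    if isinstance(d, (Tick, Fact)):
        return True
    if isinstance(d, Seq):
        return success_assured(d.left) or success_assured(d.right)
    return False

def well_sorted(d):
    """True if d can be assigned sort DQy under the rules of Definition 3.4."""
    if isinstance(d, (Tick, Empty, Fact)):
        return True
    if isinstance(d, Seq):
        return well_sorted(d.left) and well_sorted(d.right)
    if isinstance(d, Guard):
        return well_sorted(d.body)
    if isinstance(d, Nabla):
        if not well_sorted(d.body):
            return False
        if "?" in d.modalities:              # terminating pattern
            return True
        if "0" in d.modalities:              # semi-terminating only
            return success_assured(d.body)
        return False
    raise TypeError(f"not a DML query: {d!r}")

big = Guard("x > 5", Fact("g(x)"))
small = Guard("x <= 5", Fact("f(x)"))
example_3_7 = Nabla(frozenset({"0"}), big)
example_3_8 = Nabla(frozenset({"0"}), Seq(big, small))
corrected = Nabla(frozenset({"0"}), Seq(Seq(big, small), Tick()))

print(well_sorted(example_3_7), well_sorted(example_3_8), well_sorted(corrected))
```

The check is purely syntactic, which is why the query of Example 3.8 is rejected even though it always terminates.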

4 Rewriting semantics of $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$

Semantics of a sentence $\phi$ in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ is given by the rewriting system $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}(\phi)$ . Terms rewritten by $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}(\phi)$ are of the form $\{F,S\}^{c}$ , where $\{\_,\_\}^{c}:\text{\tt FSet}\;\text{\tt Stk}^{c}\rightarrow\text{\tt State}^{c}$ , $F$ is the database of facts on which $\phi$ is checked, and $S$ is a stack of sort $\text{\tt Stk}^{c}$ implementing structural recursion. Normal forms, constructed with $\mathfrak{s}:\text{\tt Bool}\rightarrow\text{\tt State}^{c}$ , encapsulate the result of $\phi$ ’s evaluation. Sort $\text{\tt State}^{c}$ of terms holding the full state of evaluation must be distinct from all the other sorts and it must not have any super- or sub-sort relation to other sorts. This guarantees that there are no constructors accepting terms of sort $\text{\tt State}^{c}$ as arguments. Consequently, if all rewrite rules have terms of this sort on the left-hand side, then no subterms can be rewritten, i.e., the defined rewriting system is top-level. Terms of sort $\text{\tt Stk}^{c}$ are built from frames of sort $\text{\tt Frm}^{c}$ , where $\text{\tt Frm}^{c}<\text{\tt Stk}^{c}$ , using an associative binary operator $\_\_:\text{\tt Stk}^{c}\;\text{\tt Stk}^{c}\rightarrow\text{\tt Stk}^{c}$ with identity $\emptyset$ . Most constructors of frames are indexed by subconditions $\psi$ of $\phi$ , and lists of distinct variable names $\vec{v}:=v_{1},\ldots,v_{n}$ of respective sorts $s_{1},\ldots,s_{n}$ ( $\vec{v}$ can be of any length, even be empty, as long as $\{\vec{v}\}\subseteq\mathit{Var}(\phi)$ and it contains free variables of $\psi$ ):

	$\displaystyle\mathfrak{r}:\text{\tt Bool}\rightarrow\text{\tt Frm}^{c},\quad{\neg}:\rightarrow\text{\tt Frm}^{c},\quad[\_,\ldots,\_]^{\vec{v}|c}_{\psi},[\_,\ldots,\_]^{\vec{v}|c,\downarrow}_{\psi}:s_{1}\ldots s_{n}\rightarrow\text{\tt Frm}^{c},$
	$\displaystyle[\_\;|\;\_,\ldots,\_]^{\vec{v}|c}_{\exists P\mathbin{.}\psi}:\text{\tt FSet}\;s_{1}\ldots s_{n}\rightarrow\text{\tt Frm}^{c}.$

As $\vec{v}$ can be empty, the above signature templates include $[]^{|c}_{\psi},[]^{|c,\downarrow}_{\psi}:\rightarrow\text{\tt Frm}^{c}$ and $[\_|]^{|c}_{\exists P\mathbin{.}\psi}:\text{\tt FSet}\rightarrow\text{\tt Frm}^{c}$. $\mathfrak{r}(B)$ encapsulates the result of evaluation of a subcondition. Frame ${\neg}$ negates the result of the next frame on the stack. Frames of the form $[\vec{a}]^{\vec{v}|c}_{\psi}$, $[\vec{a}]^{\vec{v}|c,\downarrow}_{\psi}$, or $[F^{\prime}|\vec{a}]^{\vec{v}|c}_{\psi}$ are called $(\psi,\sigma)$-frames, where $\sigma:=\{\vec{a}/\vec{v}\}$ is the current substitution. They are related to evaluation of $\sigma(\psi)$. Marked frames $[\vec{a}]^{\vec{v}|c,\downarrow}_{\psi}$ occur in evaluation of disjunctions $\_\vee\_$. Iterator frames $[F^{\prime}|\vec{a}]^{\vec{v}|c}_{\exists P\mathbin{.}\psi}$ represent iterative evaluation of quantifiers. Multiset $F^{\prime}$, called the iterator state, contains facts available for matching with $P_{!}\circ P_{?}$. Given a database $F$, to evaluate sentence $\phi$ we rewrite the state term $\mathrm{I}_{\phi}(F):=\{F,[]^{|c}_{\phi}\}^{c}$ until a normal form $\mathfrak{s}(B)$ is reached. If $B=\text{\bf t}$ then $\phi$ is satisfied in $F$.

Now we present rule schemata for $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}(\phi)$ instantiated for a given formula $\phi$ in $\mathcal{Q}_{\Sigma,\mathcal{D}}$ . It is important to distinguish between object variables substituted when applying actual rules, and metavariables used to define rule templates. Below, $F,F^{\prime}:\text{\tt FSet}$ , $S:\text{\tt Stk}^{c}$ , and $B:\text{\tt Bool}$ are object variables. We also denote by $\vec{v}:=v_{1},\ldots,v_{n}$ and $\vec{w}=w_{1},\ldots,w_{m}$ sequences of object variables in $\mathit{Var}(\phi)$ . Metavariables, $\psi$ , $\psi_{1}$ , and $\psi_{2}$ stand for arbitrary subconditions of $\phi$ , metavariable $\mathcal{B}$ stands for Boolean subterms of $\phi$ . Metavariable $P$ stands for arbitrary patterns in $\phi$ , and $P_{!}$ and $P_{?}$ denote multisets of facts in the instance of pattern $P$ marked by respective modalities (see Remark 2.3).

Constant $\bot$ evaluates to f, and $\{\mathcal{B}\}$ evaluates to $\sigma(\mathcal{B})$ , where $\sigma$ is the current substitution of variables $\vec{v}$ ( $\sigma(\mathcal{B})$ is ground since $\{\vec{v}\}$ contains all free variables (i.e., all variables) of $\mathcal{B}$ , hence, by assumption, it simplifies to either f or t):

\lambda_{\bot}:\{F,S[\vec{v}]_{\bot}^{\vec{v}|c}\}^{c}\Rightarrow\{F,S\mathfrak{r}(\text{\bf f})\}^{c},\quad\quad{\lambda_{\{\_\}}}:\{F,S[\vec{v}]_{\{\mathcal{B}\}}^{\vec{v}|c}\}^{c}\Rightarrow\{F,S\mathfrak{r}(\mathcal{B})\}^{c}.

(3)

To evaluate $\sigma(\neg\psi)$ , we “unfold” by replacing the $(\neg\psi,\sigma)$ -frame with $\neg$ and $(\psi,\sigma)$ -frame. When $\sigma(\psi)$ is evaluated we “fold” by negating the result:

{\lambda_{\neg}^{\text{unf}}}:\{F,S[\vec{v}]_{\neg\psi}^{\vec{v}|c}\}^{c}\Rightarrow\{F,S{\neg}[\vec{v}]_{\psi}^{\vec{v}|c}\}^{c},\quad\quad{\lambda_{\neg}^{\text{fld}}}:\{F,S{\neg}\mathfrak{r}(B)\}^{c}\Rightarrow\{F,S\mathfrak{r}(\neg B)\}^{c}.

(4)

Remark 4.1

In our notation the same symbols often play a dual role — as subterms, and as part of function symbols (in sub- and super-scripts). Consider the following instantiation of schema ${\lambda_{\neg}^{\text{unf}}}$ :

{\lambda_{\neg}^{\text{unf}}}:\{F,S[x,y]^{x,y|c}_{\neg\{x=y\}}\}^{c}\Rightarrow\{F,S{\neg}[x,y]^{x,y|c}_{\{x=y\}}\}^{c}

Variables in sub- and super-scripts are never substituted: with the above rule we have a one step rewrite of ground terms $\{\emptyset,[0,1]^{x,y|c}_{\neg\{x=y\}}\}^{c}\xrightarrow{{\lambda_{\neg}^{\text{unf}}}}\{\emptyset,{\neg}[0,1]^{x,y|c}_{\{x=y\}}\}^{c}.$ In schema ${\lambda_{\{\_\}}}$ metavariable $\mathcal{B}$ occurs both as part of a name and as a subterm. An instantiation ${\lambda_{\{\_\}}}:\{F,S[x,y]^{x,y|c}_{\{x=y\}}\}^{c}\Rightarrow\{F,S\mathfrak{r}(x=y)\}^{c}$ yields (with $S={\neg}$ , $x=0$ , and $y=1$ ) a one step rewrite $\{\emptyset,{\neg}[0,1]^{x,y|c}_{\{x=y\}}\}^{c}\xrightarrow{{\lambda_{\{\_\}}}}\{\emptyset,{\neg}\mathfrak{r}(0=1)\}^{c}=\{\emptyset,{\neg}\mathfrak{r}(\text{\bf f})\}^{c}.$

To evaluate disjunction $\psi_{1}\vee\psi_{2}$ we create two frames corresponding to the disjuncts. If $\psi_{2}$ evaluates to t, the frame marked by $\downarrow$ is dropped (disjunctions are short circuited). If $\psi_{2}$ evaluates to f, the frame corresponding to $\psi_{1}$ drops $\downarrow$ and is evaluated normally.

	$\displaystyle{\lambda_{\vee}^{\text{unf}}}:\{F,S[\vec{v}]_{\psi_{1}\vee\psi_{2}}^{\vec{v}|c}\}^{c}\Rightarrow\{F,S[\vec{v}]_{\psi_{1}}^{\vec{v}|c,\downarrow}[\vec{v}]_{\psi_{2}}^{\vec{v}|c}\}^{c},$
	$\displaystyle{\lambda_{\vee;\text{\bf t}}^{\text{fld}}}:\{F,S[\vec{v}]_{\psi}^{\vec{v}|c,\downarrow}\mathfrak{r}(\text{\bf t})\}^{c}\Rightarrow\{F,S\mathfrak{r}(\text{\bf t})\}^{c},\quad{\lambda_{\vee;\text{\bf f}}^{\text{fld}}}:\{F,S[\vec{v}]_{\psi}^{\vec{v}|c,\downarrow}\mathfrak{r}(\text{\bf f})\}^{c}\Rightarrow\{F,S[\vec{v}]_{\psi}^{\vec{v}|c}\}^{c}.$		(5)

Quantifier evaluation is initialized with the whole database available for matching:

{\lambda_{\exists}^{\text{init}}}:\{F,S[\vec{v}]^{\vec{v}|c}_{\exists P\mathbin{.}\psi}\}^{c}\Rightarrow\{F,S[F\;|\;\vec{v}]^{\vec{v}|c}_{\exists P\mathbin{.}\psi}\}^{c}.

(6)

Let $\vec{w}$ be a sequence of all the distinct variables in $\mathit{Var}(P)\setminus\{\vec{v}\}$. Let $\sigma$ be the current substitution. Rule ${\lambda_{\exists}^{\text{unf}}}$ pushes onto the stack a $(\psi,\sigma^{\prime})$-frame, where $\sigma^{\prime}=\sigma\cup\{\vec{b}/\vec{w}\}$ is defined by matching $F^{\prime}\circ\sigma(P_{!}\circ P_{?})$ with the iterator state, and it removes $\sigma^{\prime}(P_{?})$ from the iterator state. We keep applying ${\lambda_{\exists}^{\text{unf}}}$ until $\sigma^{\prime}(\psi)$ evaluates to t or we can no longer match $\sigma(P_{!}\circ P_{?})$ with the iterator state:

	$\displaystyle{\lambda_{\exists}^{\text{unf}}}:\bigl{\{}F,S[F^{\prime}\circ P_{!}\circ P_{?}\;|\;\vec{v}]^{\vec{v}|c}_{\exists P\mathbin{.}\psi}\bigr{\}}^{c}\Rightarrow\bigl{\{}F,S[F^{\prime}\circ P_{!}\;|\;\vec{v}]^{\vec{v}|c}_{\exists P\mathbin{.}\psi}[\vec{v},\vec{w}]^{\vec{v},\vec{w}|c}_{\psi}\bigr{\}}^{c},$		(7)
	$\displaystyle{\lambda_{\exists;\text{\bf f}}^{\text{fld}}}:\bigl{\{}F,S[F^{\prime}\;|\;\vec{v}]^{\vec{v}|c}_{\exists P\mathbin{.}\psi}\mathfrak{r}(\text{\bf f})\bigr{\}}^{c}\Rightarrow\bigl{\{}F,S[F^{\prime}\;|\;\vec{v}]^{\vec{v}|c}_{\exists P\mathbin{.}\psi}\bigr{\}}^{c},\quad{\lambda_{\exists;\text{\bf t}}^{\text{fld}}}:\bigl{\{}F,S[F^{\prime}\;|\;\vec{v}]^{\vec{v}|c}_{\exists P\mathbin{.}\psi}\mathfrak{r}(\text{\bf t})\bigr{\}}^{c}\Rightarrow\bigl{\{}F,S\mathfrak{r}(\text{\bf t})\bigr{\}}^{c}.$

Let Yes? be a sort and let $\text{\it yes}:\rightarrow\text{\tt Yes?}$. For all $\vec{v}\subseteq\mathit{Var}(\phi)$ and patterns $P$ occurring in $\phi$ we define a function $\mu_{P,\vec{v}}:\text{\tt FSet}\;s_{1}\ldots s_{n}\rightarrow\text{\tt Yes?}$ with the single equation

\mu_{P,\vec{v}}(F^{\prime}\circ P_{!}\circ P_{?},\vec{v})=\text{\it yes}.

(8)

Thus, $\mu_{P,\vec{v}}(F,\vec{a})=\text{\it yes}$ if and only if $F$ matches with $F^{\prime}\circ\{\vec{a}/\vec{v}\}(P_{!}\circ P_{?})$. Since facts matched by $\sigma(P_{?})$ are removed from the iterator state, ${\lambda_{\exists}^{\text{unf}}}$ cannot be applied infinitely many times. The following rule schema makes $\sigma(\exists P\mathbin{.}\psi)$ evaluate to f when ${\lambda_{\exists}^{\text{unf}}}$ can no longer be applied:

{\lambda_{\exists}^{\text{end}}}:\bigl{\{}F,S[F^{\prime}\;|\;\vec{v}]^{\vec{v}|c}_{\exists P\mathbin{.}\psi}\bigr{\}}^{c}\Rightarrow\{F,S\mathfrak{r}(\text{\bf f})\}^{c}\quad\text{if}\ (\mu_{P,\vec{v}}(F^{\prime},\vec{v})=\text{\it yes})=\text{\bf f}.

(9)

Finally, the rule $\lambda_{\text{sat}}:\{F,\mathfrak{r}(B)\}^{c}\Rightarrow\mathfrak{s}(B)$ finishes evaluation of $\phi$ .
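The effect of the rule schemata can be prototyped outside Maude. The following Python sketch does not build the stack of frames explicitly; instead it explores, by recursion, every choice that the rules permit and collects the Booleans $B$ for which $\mathfrak{s}(B)$ is reachable. The tuple encoding of conditions, the split of a pattern into `keep` ($P_{!}$) and `once` ($P_{?}$) fact lists, and the names `matches` and `outcomes` are assumptions made for this illustration. The second test condition is a hypothetical example that is not taken from the paper.

```python
from collections import Counter

# Facts are tuples (name, arg1, ...). String arguments are variables, so
# database values must not be strings. Conditions are nested tuples:
#   ("bot",), ("test", fn), ("not", c), ("or", c1, c2),
#   ("exists", keep, once, body)   with keep = P_!, once = P_? (non-empty)

def matches(pattern_facts, db, env):
    """Yield every extension of env embedding pattern_facts into multiset db."""
    if not pattern_facts:
        yield env
        return
    (name, *args), rest = pattern_facts[0], pattern_facts[1:]
    for fact in [f for f in db if db[f] > 0]:
        if fact[0] != name or len(fact) != len(args) + 1:
            continue
        new_env, ok = dict(env), True
        for a, v in zip(args, fact[1:]):
            if isinstance(a, str):
                ok = new_env.setdefault(a, v) == v   # bind or compare
            else:
                ok = a == v
            if not ok:
                break
        if ok:
            remaining = db.copy()
            remaining[fact] -= 1
            yield from matches(rest, remaining, new_env)

def outcomes(cond, db, env=None):
    """Set of Booleans that some evaluation of cond on db can produce."""
    env = env or {}
    kind = cond[0]
    if kind == "bot":
        return {False}
    if kind == "test":
        return {bool(cond[1](env))}
    if kind == "not":
        return {not b for b in outcomes(cond[1], db, env)}
    if kind == "or":
        right = outcomes(cond[2], db, env)   # the right disjunct runs first
        result = {True} if True in right else set()
        if False in right:
            result |= outcomes(cond[1], db, env)
        return result
    if kind == "exists":
        _, keep, once, body = cond
        result = set()

        def iterate(state):                  # state is the iterator state
            found = False
            for env2 in matches(list(keep) + list(once), state, env):
                found = True
                inner = outcomes(body, db, env2)   # body sees the whole database
                if True in inner:
                    result.add(True)
                if False in inner:
                    nxt = state.copy()
                    for f in once:           # drop the matched P_? facts
                        nxt[(f[0],) + tuple(env2.get(a, a) if isinstance(a, str) else a
                                            for a in f[1:])] -= 1
                    iterate(nxt)
            if not found:                    # no match left: evaluate to f
                result.add(False)

        iterate(Counter(db))
        return result
    raise ValueError(kind)

# Functional dependency of Example 3.2.
fd = ("not", ("exists", [], [("R", "x", "y")],
      ("exists", [], [("R", "x", "z")],
       ("not", ("test", lambda e: e["y"] == e["z"])))))
# Hypothetical condition with two possible results on p(0), p(0), p(1).
pair = ("exists", [], [("p", "x"), ("p", "y")], ("test", lambda e: e["x"] == e["y"]))
```

For instance, `outcomes(pair, Counter({("p", 0): 2, ("p", 1): 1}))` contains both Booleans: matching the two copies of $p(0)$ succeeds immediately, whereas matching $p(0)$ with $p(1)$ fails and leaves too few facts for another attempt.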

Theorem 4.2

$\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}(\phi)$ is a terminating rewriting system.

Proof 4.3

It suffices to define a partial well-order $\_<_{c}\_$ on terms of sort $\text{\tt State}^{c}$ which makes rewriting strictly monotonic, i.e., such that $t_{1}\rightarrow t_{2}$ implies $t_{2}<_{c}t_{1}$ for all $t_{1},t_{2}:\text{\tt State}^{c}$ . The order is defined by

\mathfrak{s}(B)<_{c}\{F,S\}^{c},\quad\{F,S\}^{c}<_{c}\{F,S^{\prime}\}^{c}\ \text{iff}\ S<_{s}S^{\prime},

for all $F$ , $S$ , $S^{\prime}$ , $B$ . Here $\_<_{s}\_$ is the lexicographic order on stacks derived from partial order $\_<_{f}\_$ on frames, i.e., for all frames $D_{1},\ldots,D_{n}$ , $E_{1},\ldots,E_{m}$ , $D_{1}\ldots D_{n}<_{s}E_{1}\ldots E_{m}$ iff either (1) for some $k$ , $D_{k}<_{f}E_{k}$ and $D_{i}=E_{i}$ for $i\in\{1,\ldots,k-1\}$ , or (2) $n<m$ and $D_{i}=E_{i}$ for $i\in\{1,\ldots,n\}$ . Frame ordering is defined by $\mathfrak{r}(B)<_{f}{\neg}<_{f}[F\;|\;\vec{a}]^{\vec{v}|c}_{\psi}<_{f}[F^{\prime}\;|\;\vec{a}]^{\vec{v}|c}_{\psi}<_{f}[\vec{b}]_{\psi}^{\vec{w}|c}<_{f}[\vec{b}]_{\psi}^{\vec{w}|c,\downarrow}<_{f}[F^{\prime\prime}\;|\;\vec{c}]^{\vec{x}|c}_{\psi^{\prime}}$ , for all $F$ , $F^{\prime}$ , $F^{\prime\prime}$ , $\psi$ , $\psi^{\prime}$ , $B$ , $\vec{v}$ , $\vec{w}$ , $\vec{x}$ , $\vec{a}$ , $\vec{b}$ , $\vec{c}$ such that $F\subsetneq F^{\prime}$ and $\psi$ is a proper subcondition of $\psi^{\prime}$ . Partial order $\_<_{f}\_$ is Noetherian because multisets of facts $F$ , $F^{\prime}$ and conditions $\psi$ , $\psi^{\prime}$ are finite terms. Hence, if the stacks are of bounded size, also $\_<_{c}\_$ is Noetherian. The size of stacks is bounded because each stack size increasing rule is of the form $\{F,S[\ldots]^{\ldots}_{\psi_{1}}\}^{c}\rightarrow\{F,SA[\ldots]^{\ldots}_{\psi_{2}}\}^{c}$ , where $A$ is a frame and $\psi_{2}$ is a proper subterm of $\psi_{1}$ . Since rules in $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}(\phi)$ are topmost, the rewriting is strictly monotonic because $t_{2}<_{c}t_{1}$ for each rule schema $t_{1}\Rightarrow t_{2}\;\text{if}\;C$ in $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}(\phi)$ .

The following useful observation can be trivially verified by examining the rule schemas:

Lemma 4.4

Let $\psi$ be a subcondition of $\phi$ . For any finite multiset of facts $F$ , stack $S$ , Boolean $B$ , variables $\vec{v}=v_{1},\ldots,v_{n}$ and values $\vec{a}=a_{1},\ldots,a_{n}$ , $\{F,S[\vec{a}]^{\vec{v}|c}_{\psi}\}^{c}\rightarrow^{*}\{F,S\mathfrak{r}(B)\}^{c}$ in $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}(\phi)$ iff $\{F,[]^{|c}_{\sigma(\psi)}\}^{c}\rightarrow^{*}\mathfrak{s}(B)$ in $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}(\sigma(\psi))$ , where substitution $\sigma:=\{\vec{a}/\vec{v}\}$ .

The following example shows verification of a condition in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ for a given multiset of facts using rewriting semantics. It also shows non-confluence of the resulting rewrite system.

Example 4.5

Suppose $p,q:\text{\tt Nat}\rightarrow\text{\tt Fact}$ . Let $\psi:=\exists\;[q(z)]_{?}\mathbin{.}\{z=x\}$ , $K:=p(1)\circ q(0)$ , $L:=p(0)\circ q(0)$ , and $M:=p(0)\circ p(0)\circ p(1)$ . To check if condition

\phi:=\exists[p(x)\circ p(y)]_{?}\circ[q(y)]_{!}\mathbin{.}\bigl{(}\neg\psi\vee\{x=y\}\bigr{)}

is satisfied in a multiset $H:=p(0)\circ p(0)\circ K=L\circ p(0)\circ p(1)=M\circ q(0)$ we normalize

	$\displaystyle\mathrm{I}_{\phi}(H)\xrightarrow{{\lambda_{\exists}^{\text{init}}}}$	$\displaystyle\bigl{\{}H,[H\;|\;]^{|c}_{\phi}\bigr{\}}^{c}\xrightarrow{{\lambda_{\exists}^{\text{unf}}}}\bigl{\{}H,[K\;|\;]^{|c}_{\phi}[0,0]^{x,y|c}_{\neg\psi\vee\{x=y\}}\bigr{\}}^{c}\xrightarrow{{\lambda_{\vee}^{\text{unf}}}}\bigl{\{}H,[K\;|\;]^{|c}_{\phi}[0,0]^{x,y|c,\downarrow}_{\neg\psi}[0,0]^{x,y|c}_{\{x=y\}}\bigr{\}}^{c}$
	$\displaystyle\xrightarrow{{\lambda_{\{\_\}}}}$	$\displaystyle\bigl{\{}H,[K\;|\;]^{|c}_{\phi}[0,0]^{x,y|c,\downarrow}_{\neg\psi}\mathfrak{r}(0=0)\bigr{\}}^{c}\xrightarrow{{\lambda_{\vee;\text{\bf t}}^{\text{fld}}}}\bigl{\{}H,[K\;|\;]^{|c}_{\phi}\mathfrak{r}(\text{\bf t})\bigr{\}}^{c}\xrightarrow{{\lambda_{\exists;\text{\bf t}}^{\text{fld}}}}\bigl{\{}H,\mathfrak{r}(\text{\bf t})\bigr{\}}^{c}\xrightarrow{\lambda_{\text{sat}}}\mathfrak{s}(\text{\bf t}).$

Thus, $\phi$ is satisfied in $H$. However, we also have a normalizing sequence ending with $\mathfrak{s}(\text{\bf f})$:

	$\displaystyle\mathrm{I}_{\phi}(H)\xrightarrow{{\lambda_{\exists}^{\text{init}}}}$	$\displaystyle\bigl{\{}H,[H\;|\;]^{|c}_{\phi}\bigr{\}}^{c}\xrightarrow{{\lambda_{\exists}^{\text{unf}}}}\bigl{\{}H,[L\;|\;]^{|c}_{\phi}[0,1]^{x,y|c}_{\neg\psi\vee\{x=y\}}\bigr{\}}^{c}\rightarrow^{*}\bigl{\{}H,[L\;|\;]^{|c}_{\phi}[0,1]^{x,y|c}_{\neg\psi}\bigr{\}}^{c}$
	$\displaystyle\rightarrow^{*}$	$\displaystyle\bigl{\{}H,[L\;|\;]^{|c}_{\phi}{\neg}[H\;|\;0,1]^{x,y|c}_{\psi}\bigr{\}}^{c}\xrightarrow{{\lambda_{\exists}^{\text{unf}}}}\bigl{\{}H,[L\;|\;]^{|c}_{\phi}{\neg}[M\;|\;0,1]^{x,y|c}_{\psi}[0,1,0]^{x,y,z|c}_{\{z=x\}}\bigr{\}}^{c}$
	$\displaystyle\rightarrow^{*}$	$\displaystyle\bigl{\{}H,[L\;|\;]^{|c}_{\phi}{\neg}\mathfrak{r}(\text{\bf t})\bigr{\}}^{c}\rightarrow^{!}\mathfrak{s}(\text{\bf f}).$

Thus, evaluation of conditions does not necessarily lead to a unique result (the rewriting system is not confluent). This requires making the definition of logical equivalence bisimulation-like:

Definition 4.6

Let $\phi_{1}$ and $\phi_{2}$ be conditions in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ . We say that $\phi_{1}$ is logically equivalent to $\phi_{2}$ , writing $\phi_{1}\equiv\phi_{2}$ , if and only if for all ground multisets of facts $F$ , ground substitutions $\sigma$ such that $\sigma(\phi_{1})$ and $\sigma(\phi_{2})$ are closed, and a Boolean $B\in\{\text{\bf t},\text{\bf f}\}$ we have

\mathrm{I}_{\sigma(\phi_{1})}(F)\rightarrow^{!}\mathfrak{s}(B)\quad\text{if and only if}\quad\mathrm{I}_{\sigma(\phi_{2})}(F)\rightarrow^{!}\mathfrak{s}(B).

The following result is an immediate consequence of Lemma 4.4:

Lemma 4.7

Logical equivalence on conditions in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ is an equivalence and a congruence, i.e., if $\kappa$ is a position in a condition $\phi$ in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ such that $\phi|_{\kappa}$ is a condition, and $\psi\equiv\phi|_{\kappa}$ , then $\phi\equiv\phi[\psi]_{\kappa}$ .

A renaming is an injective substitution $\sigma$ mapping variables to variables.

Lemma 4.8

For any closed condition $\phi$ in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ , and any renaming $\sigma$ , $\phi\equiv\sigma(\phi)$ .

The following result clarifies elements of rewriting semantics of sentences in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ :

Lemma 4.9

For each ground multiset of facts $F$ , and all sentences $\phi$ , $\phi_{1}$ , $\phi_{2}$ and $\exists P\mathbin{.}\psi$ :

1.

$\mathrm{I}_{\phi}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf t})$ or $\mathrm{I}_{\phi}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf f})$ , and these are the only possible normal forms of $\mathrm{I}_{\phi}(F)$ .
2.

$\mathrm{I}_{\bot}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf f})$ and never $\mathrm{I}_{\bot}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf t})$ .
3.

$\mathrm{I}_{\phi}(F)\rightarrow^{!}\mathfrak{s}(B)$ iff $\mathrm{I}_{\neg\phi}(F)\rightarrow^{!}\mathfrak{s}(\neg B)$ for all $B:\text{\tt Bool}$ .
4.

$\mathrm{I}_{\phi_{1}\vee\phi_{2}}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf t})$ iff $\mathrm{I}_{\phi_{1}}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf t})$ or $\mathrm{I}_{\phi_{2}}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf t})$ .
5.

$\mathrm{I}_{\phi_{1}\vee\phi_{2}}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf f})$ iff $\mathrm{I}_{\phi_{1}}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf f})$ and $\mathrm{I}_{\phi_{2}}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf f})$ .
6.

$\mathrm{I}_{\exists P\mathbin{.}\psi}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf t})$ iff there exists a substitution $\sigma$ , and a multiset $F^{\prime}$ such that $F^{\prime}\circ\sigma(P_{?}\circ P_{!})=F$ and $\mathrm{I}_{\sigma(\psi)}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf t})$ .
7.

$\mathrm{I}_{\exists P\mathbin{.}\psi}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf f})$ iff there exist two sequences of ground multisets of facts $F_{0},F_{1},\ldots,F_{n}$ and $G_{0},G_{1},\ldots,G_{n-1}$, and a sequence of substitutions $\sigma_{0},\sigma_{1},\ldots,\sigma_{n-1}$ such that

(a) $F_{0}=F$, $F_{i+1}=G_{i}\circ\sigma_{i}(P_{!})$, and $F_{i}=G_{i}\circ\sigma_{i}(P_{!}\circ P_{?})$, for all $i\in\{0,\ldots,n-1\}$,

(b) $\mathrm{I}_{\sigma_{i}(\psi)}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf f})$, for all $i\in\{0,\ldots,n-1\}$,

(c) there exists no substitution $\sigma_{n}$ and multiset of facts $G_{n}$ such that $F_{n}=G_{n}\circ\sigma_{n}(P_{!}\circ P_{?})$.
8.

$\mathrm{I}_{\phi\vee\neg\phi}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf t})$ . If both $\mathrm{I}_{\phi}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf t})$ and $\mathrm{I}_{\phi}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf f})$ then also $\mathrm{I}_{\phi\vee\neg\phi}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf f})$ .

Proof 4.10

The first point is verified by structural recursion using Lemma 4.4 and rules in Equations (3)–(9). Points 2–7 are verified using rules in Equations (3)–(9). Point 8 is verified using points 3-5.

Lemma 4.11

The following logical equivalences hold between conditions in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ :

	$\displaystyle\phi\vee\bot\equiv\phi,\quad\phi_{1}\vee\phi_{2}\equiv\phi_{2}\vee\phi_{1},\quad\phi_{1}\vee(\phi_{2}\vee\phi_{3})\equiv(\phi_{1}\vee\phi_{2})\vee\phi_{3},$
	$\displaystyle\neg\neg\phi\equiv\phi,\quad\exists P\mathbin{.}(\phi_{1}\vee\phi_{2})\equiv(\exists P\mathbin{.}\phi_{1})\vee(\exists P\mathbin{.}\phi_{2}).$

Proof 4.12

The above equivalences can be proven using points 1–7 in Lemma 4.9. Only the proof of the last equivalence is non-trivial. Let $F$ be a ground multiset of facts and let $\sigma$ be a ground substitution such that $\sigma(\exists P\mathbin{.}(\phi_{1}\vee\phi_{2}))$ (or, equivalently, $\sigma((\exists P\mathbin{.}\phi_{1})\vee(\exists P\mathbin{.}\phi_{2}))$ ) is closed. Denote $Q:=\sigma(P)$ , $\psi_{i}:=\sigma(\phi_{i})$ , for $i\in\{1,2\}$ . Using Lemma 4.9, p. 6, we see that $\mathrm{I}_{\exists Q\mathbin{.}(\psi_{1}\vee\psi_{2})}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf t})$ iff there exists a substitution $\sigma^{\prime}$ , and a multiset of facts $F^{\prime}$ such that $F^{\prime}\circ\sigma^{\prime}(Q_{?}\circ Q_{!})=F$ and $\mathrm{I}_{\sigma^{\prime}(\psi_{1})\vee\sigma^{\prime}(\psi_{2})}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf t})$ . The latter holds iff there exists $i\in\{1,2\}$ such that $\mathrm{I}_{\sigma^{\prime}(\psi_{i})}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf t})$ , by Lemma 4.9, p. 4. It follows, again using Lemma 4.9, p. 6, that $\mathrm{I}_{\exists Q\mathbin{.}(\psi_{1}\vee\psi_{2})}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf t})$ iff $\mathrm{I}_{\exists Q\mathbin{.}\psi_{i}}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf t})$ for some $i\in\{1,2\}$ , i.e., iff (by Lemma 4.9, p. 4) $\mathrm{I}_{(\exists Q\mathbin{.}\psi_{1})\vee(\exists Q\mathbin{.}\psi_{2})}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf t})$ .

The part of the proof concerning falsity is more complex. Using Lemma 4.9, p. 7, we see that $\mathrm{I}_{\exists Q\mathbin{.}(\psi_{1}\vee\psi_{2})}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf f})$ iff there exist two sequences of ground multisets of facts $F_{0},\ldots,F_{n}$ and $G_{0},\ldots,G_{n-1}$ , and a sequence of substitutions $\sigma_{0},\ldots,\sigma_{n-1}$ such that (a) $F_{0}=F$ , $F_{i+1}=G_{i}\circ\sigma_{i}(Q_{!})$ and $F_{i}=G_{i}\circ\sigma_{i}(Q_{!}\circ Q_{?})$ , for all $i\in\{0,\ldots,n-1\}$ , (b) $\mathrm{I}_{\sigma_{i}(\psi_{1})\vee\sigma_{i}(\psi_{2})}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf f})$ , for all $i\in\{0,\ldots,n-1\}$ , (c) there exists no substitution $\sigma_{n}$ and multiset of facts $G_{n}$ such that $F_{n}=G_{n}\circ\sigma_{n}(Q_{!}\circ Q_{?})$ . Then, by Lemma 4.9, p. 5, (b) holds iff, for $j\in\{1,2\}$ , $(b_{j})$ $\mathrm{I}_{\sigma_{i}(\psi_{j})}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf f})$ , for all $i\in\{0,\ldots,n-1\}$ . Then (by Lemma 4.9, p. 7) (a), $(b_{1})$ , $(b_{2})$ and (c) hold iff $\mathrm{I}_{\exists Q\mathbin{.}\psi_{j}}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf f})$ for $j\in\{1,2\}$ iff, by Lemma 4.9, p. 5, $\mathrm{I}_{(\exists Q\mathbin{.}\psi_{1})\vee(\exists Q\mathbin{.}\psi_{2})}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf f})$ .

Non-confluence of $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}(\phi)$ in Example 4.5 depended on patterns in $\phi$ with more than one fact. It turns out that $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}(\phi)$ may be non-confluent even if $\phi$ contains only single-fact patterns:

Example 4.13

Let $r:\text{\tt Nat}\;\text{\tt Nat}\rightarrow\text{\tt Fact}$ be commutative. Consider condition $\phi:=\exists[r(x,y)]_{?}\mathbin{.}\{x=1\}$ evaluated in a database $F=r(1,2)$ . Since $r$ is commutative, there are two distinct substitutions $\{1/x,2/y\}$ and $\{2/x,1/y\}$ which match $r(x,y)$ with $r(1,2)$ . Consequently, there are two distinct paths of evaluating $\phi$ : $\mathrm{I}_{\phi}(F)\rightarrow^{*}\bigl{\{}F,[F|]_{\phi}^{|c}\bigr{\}}^{c}\xrightarrow{{\lambda_{\exists}^{\text{unf}}}}\left\{\begin{array}[]{l}\bigl{\{}F,[\emptyset|]_{\phi}^{|c}[1,2]^{x,y|c}_{\{x=1\}}\bigr{\}}^{c}\rightarrow^{!}\mathfrak{s}(\text{\bf t})\\ \bigl{\{}F,[\emptyset|]_{\phi}^{|c}[2,1]^{x,y|c}_{\{x=1\}}\bigr{\}}^{c}\rightarrow^{!}\mathfrak{s}(\text{\bf f})\end{array}\right..$

Definition 4.14

A fully reduced term $t:\text{\tt Fact}$ is said to have a unique matching property iff for any ground, fully reduced term $t^{\prime}:\text{\tt Fact}$ there exists at most one substitution $\sigma$ such that $\sigma(t)=_{A}t^{\prime}$ .

Definition 4.15

A condition $\phi$ in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ is called deterministic if and only if all quantification patterns in $\phi$ contain only single facts with unique matching property.

The following theorem states that, while evaluation of a deterministic condition is not itself deterministic, its results are.

Theorem 4.16

Let $\phi$ be a deterministic condition in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ . Then $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}(\phi)$ is confluent. In particular, given a ground multiset of facts $F$ , there is a unique $B\in\{\text{\bf t},\text{\bf f}\}$ such that $\mathrm{I}_{\phi}(F)\rightarrow^{!}\mathfrak{s}(B)$ .

Proof 4.17

We argue by induction on the complexity of formulas indexing frames on the top of a stack. First observe that only terms of the form $t:=\bigl{\{}F,S[F^{\prime}\;|\;\vec{a}]^{\vec{v}|c}_{\exists P\mathbin{.}\psi}\bigr{\}}^{c}$ can be rewritten in a single step into two distinct terms. As semiconfluence implies confluence, it suffices to prove that if $t^{\prime}\leftarrow t\rightarrow^{+}t^{\prime\prime}$ then there exists $s$ such that $t^{\prime}\rightarrow^{*}s\;{}^{*}\!\!\leftarrow t^{\prime\prime}$ . By Lemmas 4.4 and 4.9, p. 1, a rewrite sequence $t\rightarrow^{+}t^{\prime\prime}$ must either (1) contain $\{F,S\mathfrak{r}(B)\}^{c}$ , or (2) be extendable to a sequence ending in $\{F,S\mathfrak{r}(B)\}^{c}$ . If we prove that $t^{\prime}\rightarrow^{*}\{F,S\mathfrak{r}(B)\}^{c}$ then we can set $s=t^{\prime\prime}$ (if (1)) or $s=\{F,S\mathfrak{r}(B)\}^{c}$ (if (2)). It remains to prove that $t^{\prime}\rightarrow^{*}\{F,S\mathfrak{r}(B)\}^{c}$ . Under the theorem’s assumption, $P=[f]_{?}$ , where $f$ is a fact with a unique matching property (Definition 4.14). Thus, if for some fact $g$ in $F^{\prime}$ there exists a substitution $\sigma_{g}$ extending $\{\vec{a}/\vec{v}\}$ such that $g=\sigma_{g}(f)$ and $\mathrm{I}_{\sigma_{g}(\psi)}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf t})$ (which implies, by inductive assumption, that $\mathrm{I}_{\sigma_{g}(\psi)}(F)\not\rightarrow^{!}\mathfrak{s}(\text{\bf f})$ ), it is unique, and cannot be missed during evaluation of the iterator frame. Either such $g$ exists, and then $t\rightarrow^{*}\{F,S\mathfrak{r}(\text{\bf t})\}^{c}$ but $t\not\rightarrow^{*}\{F,S\mathfrak{r}(\text{\bf f})\}^{c}$ (hence necessarily both $B=\text{\bf t}$ and $t^{\prime}\rightarrow^{*}\{F,S\mathfrak{r}(\text{\bf t})\}^{c}$ ), or it does not, and hence both $B=\text{\bf f}$ and $t^{\prime}\rightarrow^{*}\{F,S\mathfrak{r}(\text{\bf f})\}^{c}$ .

Example 4.18

Suppose $p:\text{\tt Nat}\rightarrow\text{\tt Fact}$ , $q:\rightarrow\text{\tt Fact}$ , and let $r:\text{\tt Nat}\;\text{\tt Nat}\rightarrow\text{\tt Fact}$ be a commutative operator (i.e., $r(x,y)=_{A}r(y,x)$ ). Clearly, $p(x)$ and $q$ have unique matching property (Definition 4.14), while $r(x,y)$ does not, since, e.g., if $\sigma_{1}:=\{0/x,1/y\}$ and $\sigma_{2}:=\{1/x,0/y\}$ then $\sigma_{1}(r(x,y))=_{A}\sigma_{2}(r(x,y))=_{A}r(0,1)$ . Let

\phi_{1}:=\exists[p(x)]_{?}\mathbin{.}\{x=0\},\quad\phi_{2}:=\exists[p(x)\circ q]_{?}\mathbin{.}\{x=0\},\quad\phi_{3}:=\exists[r(x,y)]_{?}\mathbin{.}\{x=0\}.

Then $\phi_{1}$ is deterministic (Definition 4.15), while $\phi_{2}$ and $\phi_{3}$ are not deterministic. Let $F:=p(0)\circ p(1)\circ q\circ r(0,1)$ . We now consider evaluation of all the $\phi_{i}$ ’s on $F$ . First, the reader will easily verify that while evaluating the existential quantifier in $\phi_{1}$ we can either first match $p(x)$ with $p(1)$ and then, upon failure, with $p(0)$ , or first match with $p(0)$ . Eventually, both paths yield satisfaction of $\phi_{1}$ on $F$ (although the first path is longer). On the other hand, evaluation of $\phi_{2}$ and $\phi_{3}$ demonstrates two ways in which non-determinism occurs in evaluation of conditions in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ . First, consider two non-convergent paths of rewriting $\mathrm{I}_{\phi_{2}}(F)$ : one first matches $p(x)\circ q$ with $p(0)\circ q$ and ends in $\mathfrak{s}(\text{\bf t})$ , the other first matches it with $p(1)\circ q$ and ends in $\mathfrak{s}(\text{\bf f})$ .

Here the reason for non-determinism which ultimately leads to non-convergent paths of execution is that when evaluating the existential quantifier at each attempt we have to consume both a $p(x)$ -fact and a $q$ -fact: since there is only one $q$ -fact, if we start from the wrong $p(x)$ -fact (i.e., $p(1)$ ), we do not get a second chance.

Denote for brevity $K:=p(0)\circ p(1)\circ q$ . Recall that $r(x,y)=_{A}r(y,x)$ . In particular, $r(0,1)=_{A}r(1,0)$ . Consider now two non-convergent paths of rewriting $\mathrm{I}_{\phi_{3}}(F)$ : matching $r(x,y)$ with $r(0,1)$ via $\{0/x,1/y\}$ leads to $\mathfrak{s}(\text{\bf t})$ , whereas matching via $\{1/x,0/y\}$ leaves the iterator state $K$ without any $r$ -fact and leads to $\mathfrak{s}(\text{\bf f})$ .

Thus, in this case the reason for non-determinism leading to non-convergent paths was the possibility of two distinct matchings of $r(0,1)$ with $r(x,y)$ given by $\{0/x,1/y\}$ and $\{1/x,0/y\}$ .
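The two sources of non-determinism discussed in Example 4.18 can also be explored mechanically. The following Python sketch is not the Maude implementation; it is a simplified model which assumes that patterns have the form $[\ldots]_{?}$ only (every matched fact is consumed), that the sub-condition is an atomic test given as a Python predicate, and that the hypothetical set `COMMUTATIVE` lists the commutative operators. It tries every possible choice of the next match and collects the truth values that can be reached.

```python
from itertools import permutations

# Assumptions of this sketch: facts are tuples (name, arg, ...); pattern
# variables are strings, constants are ints; "r" is the commutative operator.
COMMUTATIVE = {"r"}

def match_fact(pat, fact, subst):
    """Yield every extension of subst under which pat equals fact."""
    if pat[0] != fact[0] or len(pat) != len(fact):
        return
    # For a commutative operator every argument order is a candidate.
    orders = set(permutations(fact[1:])) if pat[0] in COMMUTATIVE else {fact[1:]}
    for args in orders:
        s = dict(subst)
        if all(s.setdefault(p, a) == a if isinstance(p, str) else p == a
               for p, a in zip(pat[1:], args)):
            yield s

def match_pattern(pats, state, subst=None):
    """Yield (substitution, remaining facts) for each way of matching pats."""
    subst = subst or {}
    if not pats:
        yield subst, state
        return
    for i, fact in enumerate(state):
        for s in match_fact(pats[0], fact, subst):
            yield from match_pattern(pats[1:], state[:i] + state[i + 1:], s)

def exists(pats, pred, db):
    """Truth values reachable when evaluating 'exists [pats]_? . pred' on db."""
    def run(state):
        found = list(match_pattern(pats, state))
        if not found:                 # no match left: the quantifier fails
            return {False}
        out = set()
        for s, rest in found:         # each match is one possible next step
            out |= {True} if pred(s) else run(rest)
        return out
    return run(tuple(db))

F = [("p", 0), ("p", 1), ("q",), ("r", 0, 1)]
x_is_0 = lambda s: s["x"] == 0
phi1 = exists([("p", "x")], x_is_0, F)            # single fact, unique matching
phi2 = exists([("p", "x"), ("q",)], x_is_0, F)    # two facts in the pattern
phi3 = exists([("r", "x", "y")], x_is_0, F)       # commutative operator
```

Under these assumptions `phi1` contains only `True`, whereas `phi2` and `phi3` contain both truth values, in agreement with the discussion above.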

5 Rewriting semantics of $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$

Let $Q$ be a query in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ . We associate with $Q$ the rewriting system $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}(Q)$ . Terms rewritten with the rules of the rewriting system $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}(Q)$ are of the form $\{F,F^{\prime},S\}^{q}$ , where $\{\_,\_,\_\}^{q}:\text{\tt FSet}\;\text{\tt FSet}\;\text{\tt Stk}^{q}\rightarrow\text{\tt State}^{q}$ , $F$ is a database of facts against which we issue the query, $F^{\prime}$ is a partial answer (i.e., an answer built so far in the rewriting process), and $S$ is a stack of sort $\text{\tt Stk}^{q}$ which simulates structural recursion. Normal forms encapsulating an answer to $Q$ are constructed with $\mathfrak{a}:\text{\tt FSet}\rightarrow\text{\tt State}^{q}$ . Terms of sort $\text{\tt Stk}^{q}$ are constructed from local computation frames of sort $\text{\tt Frm}^{q}$ , where $\text{\tt Frm}^{q}<\text{\tt Stk}^{q}$ , using an associative binary operator $\_\_:\text{\tt Stk}^{q}\;\text{\tt Stk}^{q}\rightarrow\text{\tt Stk}^{q}$ with identity element $\emptyset$ . Constructors of frames are indexed by sub-queries $R$ of $Q$ , and lists of distinct variable names $\vec{v}:=v_{1},\ldots,v_{n}$ of respective sorts $s_{1},\ldots,s_{n}$ ( $\vec{v}$ can be of any length as long as $\{\vec{v}\}\subseteq\mathit{Var}(Q)$ and it contains all free variables of $R$ ):

	$\displaystyle[\_,\ldots,\_]^{\vec{v}|q}_{R}:s_{1}\ldots s_{n}\rightarrow\text{\tt Frm}^{q},\quad[\_,\ldots,\_|\_]^{\vec{v}|q}_{R}:s_{1}\ldots s_{n}\;\text{\tt Stk}^{c}\rightarrow\text{\tt Frm}^{q},$
	$\displaystyle[\_|\_,\ldots,\_]^{\vec{v}|q}_{R}:\text{\tt FSet}\;s_{1}\ldots s_{n}\rightarrow\text{\tt Frm}^{q}\ \text{if}\ R=\nabla P\mathbin{.}R^{\prime}.$

As $\vec{v}$ can be empty, the above signature templates include $[]^{|q}_{R}:\rightarrow\text{\tt Frm}^{q}$ , etc. As in the case of $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ , variables in super- and sub-scripts are part of function names and are never matched or substituted; Remark 4.1 applies here and in the next section. Frames of the form $[\vec{a}]^{\vec{v}|q}_{R}$ , $[\vec{a}|S]^{\vec{v}|q}_{R}$ , or $[F^{\prime}|\vec{a}]^{\vec{v}|q}_{R}$ are called $(R,\sigma)$ -frames, where $\sigma:=\{\vec{a}/\vec{v}\}$ is the current substitution. They indicate evaluation of $\sigma(R)$ . Conditional frames $[\vec{a}|S]^{\vec{v}|q}_{R}$ are used in evaluation of conditionals $\phi\Rightarrow R$ , where $S:\text{\tt Stk}^{c}$ represents evaluation of $\phi$ . Iterator frames $[F^{\prime}|\vec{a}]^{\vec{v}|q}_{\nabla P\mathbin{.}R}$ represent iterative evaluation of $\sigma(\nabla P\mathbin{.}R)$ . The multiset $F^{\prime}$ , called the iterator state, contains facts available for matching with $P_{!}\circ P_{?}$ . Given a database $F$ , to evaluate a closed query $Q$ we rewrite a state term $\mathrm{I}_{Q}(F):=\{F,\emptyset,[]^{|q}_{Q}\}^{q}$ until a normal form $\mathfrak{a}(F^{\prime})$ is reached. Then we conclude that evaluation of $Q$ on $F$ yields $F^{\prime}$ as an answer.

Now we are ready to define the rules of $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}(Q)$ . Literal facts are added to the partial answer multiset after applying the current substitution, and empty queries return nothing:

\lambda_{\text{fact}}:\bigl{\{}F,F^{\prime},S[\vec{v}]^{\vec{v}|q}_{f}\bigr{\}}^{q}\Rightarrow\bigl{\{}F,F^{\prime}\circ f,S\bigr{\}}^{q},\quad\lambda_{\emptyset}:\bigl{\{}F,F^{\prime},S[\vec{v}]^{\vec{v}|q}_{\emptyset}\bigr{\}}^{q}\Rightarrow\bigl{\{}F,F^{\prime},S\bigr{\}}^{q}.

(10)

Evaluation of “union” $\_\rhd\_$ is implemented by replacing the frame corresponding to $R_{1}\rhd R_{2}$ with two frames corresponding to $R_{1}$ and $R_{2}$ , respectively:

\lambda_{\rhd}^{\text{unf}}:\ \bigl{\{}F,F^{\prime},S[\vec{v}]^{\vec{v}|q}_{R_{1}\rhd R_{2}}\bigr{\}}^{q}\Rightarrow\bigl{\{}F,F^{\prime},S[\vec{v}]^{\vec{v}|q}_{R_{2}}[\vec{v}]^{\vec{v}|q}_{R_{1}}\bigr{\}}^{q}

(11)

To compute a conditional $\phi\Rightarrow R$ we first embed a stack representing computation of condition $\phi$ within the frame corresponding to the conditional. Once this condition is evaluated, we either evaluate $R$ if the condition is satisfied, or drop the conditional if it is not:

$\displaystyle\lambda_{\text{cond}}^{\text{unf}}:$	$\displaystyle\ \bigl{\{}F,F^{\prime},S[\vec{v}]^{\vec{v}|q}_{\phi\Rightarrow R}\bigr{\}}^{q}\Rightarrow\bigl{\{}F,F^{\prime},S\bigl{[}\vec{v}\;|\;[\vec{v}]^{\vec{v}|c}_{\phi}\bigr{]}^{\vec{v}|q}_{R}\bigr{\}}^{q},$
$\displaystyle\lambda_{\text{cond};\text{\bf f}}^{\text{unf}}:$	$\displaystyle\ \bigl{\{}F,F^{\prime},S[\vec{v}\;|\;\mathfrak{r}(\text{\bf f})]^{\vec{v}|q}_{R}\bigr{\}}^{q}\Rightarrow\{F,F^{\prime},S\}^{q},$
$\displaystyle\lambda_{\text{cond};\text{\bf t}}^{\text{unf}}:$	$\displaystyle\ \bigl{\{}F,F^{\prime},S[\vec{v}\;|\;\mathfrak{r}(\text{\bf t})]^{\vec{v}|q}_{R}\bigr{\}}^{q}\Rightarrow\bigl{\{}F,F^{\prime},S[\vec{v}]_{R}^{\vec{v}|q}\bigr{\}}^{q}.$	(12)

To compute $\phi$ we add, for every rule $\lambda:\{F,S^{\prime}\}^{c}\Rightarrow\{F,S^{\prime\prime}\}^{c}\ \text{if}\ C$ in $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}(\phi)$ the rule schema

\lambda^{q}:\{F,F^{\prime},S[\vec{v}\;|\;S^{\prime}]^{\vec{v}|q}_{R}\}^{q}\Rightarrow\{F,F^{\prime},S[\vec{v}\;|\;S^{\prime\prime}]^{\vec{v}|q}_{R}\}^{q}\ \text{if}\ C

(13)

Evaluation of $\nabla\_.\_$ subquery is initialized with the whole database available for matching:

{\lambda_{\nabla}^{\text{init}}}:\bigl{\{}F,F^{\prime},S[\vec{v}]^{\vec{v}|q}_{\nabla P\mathbin{.}R}\bigr{\}}^{q}\Rightarrow\bigl{\{}F,F^{\prime},S[F\;|\;\vec{v}]^{\vec{v}|q}_{\nabla P\mathbin{.}R}\bigr{\}}^{q}

(14)

{\lambda_{\nabla}^{\text{unf}}}:\bigl{\{}F,F^{\prime},S\bigl{[}F^{\prime\prime}\circ P_{!}\circ P_{?}\;|\;\vec{v}\bigr{]}^{\vec{v}|q}_{\nabla P\mathbin{.}R}\bigr{\}}^{q}\Rightarrow\bigl{\{}F,F^{\prime},S\bigl{[}F^{\prime\prime}\circ P_{!}\;|\;\vec{v}\bigr{]}^{\vec{v}|q}_{\nabla P\mathbin{.}R}[\vec{v},\vec{w}]^{\vec{v},\vec{w}|q}_{R}\bigr{\}}^{q}

(15)

We keep applying ${\lambda_{\nabla}^{\text{unf}}}$ until we cannot match $F^{\prime\prime}\circ\sigma(P_{!}\circ P_{?})$ with the iterator state. Then we remove the iterator frame from the stack. To prevent premature application, the rule schema ${\lambda_{\nabla}^{\text{end}}}$ is conditional, where the condition uses functions $\mu_{P,\vec{v}}:\text{\tt Pat}^{\text{\it tp}}\;s_{1}\ldots s_{n}\rightarrow\text{\tt Yes?}$ defined for each $\vec{v}\subseteq\mathit{Var}(Q)$ and pattern $P$ occurring in $Q$ with the single equation $\mu_{P,\vec{v}}(F\circ P_{!}\circ P_{?},\vec{v})=\text{\it yes}$ (cf. Equation (8)):

{\lambda_{\nabla}^{\text{end}}}:\{F,F^{\prime},S[F^{\prime\prime}\;|\;\vec{v}]^{\vec{v}|q}_{\nabla P\mathbin{.}R}\}^{q}\Rightarrow\{F,F^{\prime},S\}^{q}\quad\text{if}\ (\mu_{P,\vec{v}}(F^{\prime\prime},\vec{v})=\text{\it yes})=\text{\bf f}.

(16)

Finally, the rule $\lambda_{\text{ans}}:\{F,F^{\prime},\emptyset\}^{q}\Rightarrow\mathfrak{a}(F^{\prime})$ finishes evaluation of $Q$ .

The following result can be proven similarly to Theorem 4.2:

Theorem 5.1

$\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}(R)$ is a terminating rewriting system.

The following useful observation can be trivially verified by examining the rule schemas:

Lemma 5.2

Let $R$ be a subquery of $Q$ . Then, for all multisets of facts $F$ , $F^{\prime}$ , $F^{\prime\prime}$ , stacks $S$ , lists of variables $\vec{v}=v_{1},\ldots,v_{n}$ and values $\vec{a}=a_{1},\ldots,a_{n}$ , $\{F,F^{\prime},S[\vec{a}]^{\vec{v}|q}_{R}\}^{q}\rightarrow^{*}\{F,F^{\prime}\circ F^{\prime\prime},S\}^{q}$ in $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}(Q)$ if and only if $\{F,\emptyset,[]^{|q}_{\sigma(R)}\}^{q}\rightarrow^{!}\mathfrak{a}(F^{\prime\prime})$ in $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}(\sigma(R))$ , where substitution $\sigma:=\{\vec{a}/\vec{v}\}$ .

The following example shows that evaluation of queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ does not, in general, return a unique answer, and, in particular, that $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}(Q)$ for general queries $Q$ is not confluent.

Example 5.3

Let $\sharp:\rightarrow\text{\tt Fact}$ and $b:\text{\tt Nat}\rightarrow\text{\tt Fact}$ . Consider the query $Q:=\nabla[\sharp]_{?}\circ[b(x)]_{!}\mathbin{.}b(x)$ executed against database $F:=\sharp\circ b(1)\circ b(2)$ . Then $\mathrm{I}_{Q}(F)$ can be normalized in two ways:

\mathrm{I}_{Q}(F)\xrightarrow{{\lambda_{\nabla}^{\text{init}}}}\bigl{\{}F,\emptyset,[F\;|\;]^{|q}_{Q}\bigr{\}}^{q}\xrightarrow{{\lambda_{\nabla}^{\text{unf}}}}\left\{\begin{array}[]{l}\bigl{\{}F,\emptyset,[b(1)\circ b(2)\;|\;]^{|q}_{Q}[1]^{x|q}_{b(x)}\bigr{\}}^{q}\rightarrow^{!}\mathfrak{a}(b(1))\\ \bigl{\{}F,\emptyset,[b(1)\circ b(2)\;|\;]^{|q}_{Q}[2]^{x|q}_{b(x)}\bigr{\}}^{q}\rightarrow^{!}\mathfrak{a}(b(2))\end{array}\right..

Queries like $Q$ are useful as a simulation of a non-deterministic choice (say, by a human agent) of a subset of values stored in the database with a fixed maximal cardinality. E.g., $\nabla[b(x)]_{?}\mathbin{.}b(x)$ returns all “ $b$ -facts” stored in the database. Query $Q$ defined above, however, chooses (with repetitions) at most as many $b$ -facts as there are tokens $\sharp$ . Query $\nabla[\sharp\circ b(x)]_{?}\mathbin{.}b(x)$ avoids repetitions.
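The behaviour of these three queries can be reproduced with a small simulation. The Python sketch below is only an illustrative model, not the rewriting system itself: it assumes that the query body is a deterministic function from substitutions to tuples of facts, that no operator is commutative, and that every pattern consumes at least one fact (otherwise the iteration would not stop). The names `match_fact`, `match_pattern` and `nabla` are hypothetical helpers introduced here.

```python
def match_fact(pat, fact, subst):
    """Return subst extended so that pat equals fact, or None if impossible."""
    if pat[0] != fact[0] or len(pat) != len(fact):
        return None
    s = dict(subst)
    for p, a in zip(pat[1:], fact[1:]):
        if isinstance(p, str):            # variable: bind it or compare
            if s.setdefault(p, a) != a:
                return None
        elif p != a:                      # constant: must coincide
            return None
    return s

def match_pattern(pats, state, subst, kept=()):
    """Yield (substitution, next iterator state) for each match of pats.

    pats is a list of (fact pattern, consumed) pairs: consumed=True models
    [..]_?, consumed=False models [..]_! (the fact stays available)."""
    if not pats:
        yield subst, state + kept
        return
    (pat, consumed), tail = pats[0], pats[1:]
    for i, fact in enumerate(state):
        s = match_fact(pat, fact, subst)
        if s is not None:
            rest = state[:i] + state[i + 1:]
            yield from match_pattern(tail, rest, s,
                                     kept if consumed else kept + (fact,))

def nabla(pats, body, db):
    """Set of answers (as sorted tuples) reachable for 'nabla pats . body'."""
    def run(state):
        found = list(match_pattern(pats, state, {}))
        if not found:                     # iterator exhausted: nothing to add
            return {()}
        return {tuple(sorted(body(s) + tail))
                for s, nxt in found for tail in run(nxt)}
    return run(tuple(db))

b_of_x = lambda s: (("b", s["x"]),)
TOKEN, KEEP_B, TAKE_B = (("#",), True), (("b", "x"), False), (("b", "x"), True)
F1 = [("#",), ("b", 1), ("b", 2)]
F2 = [("#",), ("#",), ("b", 1), ("b", 2)]
one_token = nabla([TOKEN, KEEP_B], b_of_x, F1)    # the query Q of Example 5.3
two_tokens = nabla([TOKEN, KEEP_B], b_of_x, F2)   # repetitions are possible
no_repeat = nabla([TOKEN, TAKE_B], b_of_x, F2)    # b-facts are consumed too
all_b = nabla([TAKE_B], b_of_x, F1)               # returns every b-fact
```

With one token the model yields the two answers $b(1)$ and $b(2)$ of Example 5.3; with two tokens the query $Q$ may return the same $b$ -fact twice, while the variant that consumes $b(x)$ cannot.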

The following is the bisimulation-like definition of logical equivalence between queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ :

Definition 5.4

Let $Q_{1}$ and $Q_{2}$ be two queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ . Recall that $\mathrm{I}_{Q}(F):=\{F,\emptyset,[]^{|q}_{Q}\}^{q}$ . We say that $Q_{1}$ is logically equivalent to $Q_{2}$ , writing $Q_{1}\equiv Q_{2}$ , if and only if, for all ground multisets of facts $F$ and $F^{\prime}$ , and ground substitutions $\sigma$ such that $\sigma(Q_{1})$ and $\sigma(Q_{2})$ are closed, we have

\mathrm{I}_{\sigma(Q_{1})}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime})\quad\text{iff}\quad\mathrm{I}_{\sigma(Q_{2})}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime}).

In other words, queries are equivalent if they can match each other’s answers. The following result is an immediate consequence of Lemma 5.2:

Lemma 5.5

Logical equivalence on queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ is an equivalence relation and a congruence, i.e., if $\kappa$ is a position in a query $Q$ in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ such that $Q|_{\kappa}$ is a query, and $R\equiv Q|_{\kappa}$ , then $Q\equiv Q[R]_{\kappa}$ .

We leave proof of the next observation to the reader:

Lemma 5.6

For any closed query $Q$ in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ , and any renaming $\sigma$ , $Q\equiv\sigma(Q)$ .

The following clarification of semantics of queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ is proven similarly to Lemma 4.9:

Lemma 5.7

For all ground multisets of facts $F$ , $F^{\prime}$ and $F^{\prime\prime}$ , all closed queries $Q$ , $Q_{1}$ , $Q_{2}$ , and $\nabla P\mathbin{.}R$ in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ , and all closed conditions $\phi$ in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ , the following statements hold:

1.

If $\mathrm{I}_{Q}(F)\rightarrow^{!}\Gamma$ then $\Gamma=\mathfrak{a}(G)$ for some ground multiset of facts $G$ .
2.

Let $f$ be a fact. If $\mathrm{I}_{f}(F)\rightarrow^{!}\Gamma$ then $\Gamma=\mathfrak{a}(f)$ . If $\mathrm{I}_{\emptyset}(F)\rightarrow^{!}\Gamma$ then $\Gamma=\mathfrak{a}(\emptyset)$ .
3.

Let $F^{\prime}\neq\emptyset$ . In this case $\mathrm{I}_{\phi\Rightarrow Q}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime})$ iff $\mathrm{I}_{\phi}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf t})$ and $\mathrm{I}_{Q}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime})$ .
4.

$\mathrm{I}_{\phi\Rightarrow Q}(F)\rightarrow^{!}\mathfrak{a}(\emptyset)$ iff either (non-exclusively) $\mathrm{I}_{\phi}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf f})$ or $\mathrm{I}_{Q}(F)\rightarrow^{!}\mathfrak{a}(\emptyset)$ .
5.

$\mathrm{I}_{Q_{1}\rhd Q_{2}}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime})$ iff $\mathrm{I}_{Q_{1}}(F)\rightarrow^{!}\mathfrak{a}(F_{1})$ and $\mathrm{I}_{Q_{2}}(F)\rightarrow^{!}\mathfrak{a}(F_{2})$ for some multisets $F_{1}$ and $F_{2}$ such that $F^{\prime}=F_{1}\circ F_{2}$ .
6.
$\mathrm{I}_{\nabla P\mathbin{.}R}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime})$ iff there exist lists of ground multisets of facts $F_{0},\ldots,F_{n}$ , $G_{0},\ldots,G_{n-1}$ , and $H_{0},\ldots,H_{n-1}$ , and a sequence of substitutions $\sigma_{0},\sigma_{1},\ldots,\sigma_{n-1}$ such that
  (a) $F_{0}=F$ , $F_{i+1}=G_{i}\circ\sigma_{i}(P_{!})$ , and $F_{i}=G_{i}\circ\sigma_{i}(P_{!}\circ P_{?})$ , for all $i\in\{0,\ldots,n-1\}$ ,
  (b) $\mathrm{I}_{\sigma_{i}(R)}(F)\rightarrow^{!}\mathfrak{a}(H_{i})$ , for all $i\in\{0,\ldots,n-1\}$ ,
  (c) there exists no substitution $\sigma_{n}$ and multiset of facts $G_{n}$ such that $F_{n}=G_{n}\circ\sigma_{n}(P_{!}\circ P_{?})$ ,
  (d) $F^{\prime}=H_{0}\circ H_{1}\circ\cdots\circ H_{n-1}$ .

Lemma 5.8

The following logical equivalences hold between queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ :

	$\displaystyle\emptyset\rhd Q\equiv Q,\quad Q_{1}\rhd Q_{2}\equiv Q_{2}\rhd Q_{1},\quad Q_{1}\rhd(Q_{2}\rhd Q_{3})\equiv(Q_{1}\rhd Q_{2})\rhd Q_{3},$
	$\displaystyle\bot\Rightarrow R\equiv\emptyset,\quad\neg\bot\Rightarrow R\equiv R,\quad\nabla P\mathbin{.}\emptyset\equiv\emptyset.$
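
As an illustration of how such equivalences can be checked against Lemma 5.7, consider $\emptyset\rhd Q\equiv Q$ . The following is only a sketch for closed $Q$ ; the general case follows by applying it to $\sigma(Q)$ for each closing substitution $\sigma$ . Let $F$ and $F^{\prime}$ be ground multisets of facts. By Lemma 5.7, p. 5,

\mathrm{I}_{\emptyset\rhd Q}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime})\quad\text{iff}\quad\exists F_{1},F_{2}\mathbin{.}\bigl{(}\mathrm{I}_{\emptyset}(F)\rightarrow^{!}\mathfrak{a}(F_{1})\wedge\mathrm{I}_{Q}(F)\rightarrow^{!}\mathfrak{a}(F_{2})\wedge F^{\prime}=F_{1}\circ F_{2}\bigr{)}.

By Theorem 5.1 the term $\mathrm{I}_{\emptyset}(F)$ has a normal form, and by Lemma 5.7, p. 2, this normal form is $\mathfrak{a}(\emptyset)$ , so necessarily $F_{1}=\emptyset$ and $F^{\prime}=F_{2}$ . The right-hand side thus reduces to $\mathrm{I}_{Q}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime})$ , which is exactly the condition required by Definition 5.4.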

The next results show that non-confluence of queries makes some natural equivalences invalid:

Lemma 5.9

Let $\phi$ be a condition in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ and let $Q_{1}$ , $Q_{2}$ be queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ . For all ground multisets of facts $F$ , $F^{\prime}$ and all substitutions $\sigma$ such that $\sigma(\phi)$ , $\sigma(Q_{1})$ and $\sigma(Q_{2})$ are closed, we have

\mathrm{I}_{\sigma((\phi\Rightarrow Q_{1})\rhd(\phi\Rightarrow Q_{2}))}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime})\quad\text{if}\quad\mathrm{I}_{\sigma(\phi\Rightarrow(Q_{1}\rhd Q_{2}))}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime}),

(17)

however, the inverse implication does not hold in general. If $\phi$ is deterministic (Definition 4.15), then

(\phi\Rightarrow Q_{1})\rhd(\phi\Rightarrow Q_{2})\equiv\phi\Rightarrow(Q_{1}\rhd Q_{2}).

(18)

Proof 5.10

To prove the implication (17) assume $\mathrm{I}_{\sigma(\phi\Rightarrow(Q_{1}\rhd Q_{2}))}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime})$ . Either $F^{\prime}\neq\emptyset$ or $F^{\prime}=\emptyset$ . In the first case, by Lemma 5.7, p. 3, our assumption is equivalent to $\mathrm{I}_{\sigma(\phi)}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf t})$ and $\mathrm{I}_{\sigma(Q_{1}\rhd Q_{2})}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime})$ , the latter of which, in turn, is equivalent, by Lemma 5.7, p. 5, to $\mathrm{I}_{\sigma(Q_{1})}(F)\rightarrow^{!}\mathfrak{a}(F_{1})$ and $\mathrm{I}_{\sigma(Q_{2})}(F)\rightarrow^{!}\mathfrak{a}(F_{2})$ for some multisets $F_{1}$ and $F_{2}$ such that $F^{\prime}=F_{1}\circ F_{2}$ . But this is equivalent, by Lemma 5.7, p. 3 and 5, to $\mathrm{I}_{\sigma((\phi\Rightarrow Q_{1})\rhd(\phi\Rightarrow Q_{2}))}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime}).$ The case $F^{\prime}=\emptyset$ is dealt with similarly.

To show that, in general, the inverse implication does not hold, consider $\phi:=\exists[a\circ b(x)]_{?}\mathbin{.}\{x=1\}$ , $Q:=\phi\Rightarrow(c\rhd d)$ , $Q^{\prime}:=(\phi\Rightarrow c)\rhd(\phi\Rightarrow d)$ , where $a,c,d:\rightarrow\text{\tt Fact}$ and $b:\text{\tt Nat}\rightarrow\text{\tt Fact}$ . Let $F:=a\circ b(1)\circ b(2)$ . Since $\mathrm{I}_{\phi}(F)\rightarrow^{!}\mathfrak{s}(B)$ for both $B\in\{\text{\bf t},\text{\bf f}\}$ , we have $\mathrm{I}_{Q}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime})$ iff $F^{\prime}\in\{\emptyset,c\circ d\}$ , whereas $\mathrm{I}_{Q^{\prime}}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime})$ iff $F^{\prime}\in\{\emptyset,c,d,c\circ d\}$ .

Assume that $\phi$ is deterministic. By Theorem 4.16, either $\mathrm{I}_{\sigma(\phi)}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf t})$ or $\mathrm{I}_{\sigma(\phi)}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf f})$ , but not both. Let $R_{1}:=\sigma\bigl{(}(\phi\Rightarrow Q_{1})\rhd(\phi\Rightarrow Q_{2})\bigr{)}$ , $R_{2}:=\sigma\bigl{(}\phi\Rightarrow(Q_{1}\rhd Q_{2})\bigr{)}$ . If $\mathrm{I}_{\sigma(\phi)}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf f})$ then $\mathrm{I}_{R_{i}}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime})$ iff $F^{\prime}=\emptyset$ for $i\in\{1,2\}$ . If $\mathrm{I}_{\sigma(\phi)}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf t})$ then, for $i\in\{1,2\}$ , $\mathrm{I}_{R_{i}}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime})$ iff $F^{\prime}=F_{1}\circ F_{2}$ for some multisets $F_{1}$ and $F_{2}$ such that $\mathrm{I}_{\sigma(Q_{j})}(F)\rightarrow^{!}\mathfrak{a}(F_{j})$ , $j\in\{1,2\}$ .
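
The counterexample and the deterministic case can be checked at the level of answer sets. The Python sketch below does not run the rewriting system; it only encodes the characterisations of Lemma 5.7, p. 3–5, under the assumption that the set of truth values reachable for $\phi$ and the sets of answers reachable for the sub-queries are already known. The helper names `cond` and `union` are hypothetical, and answers are modelled as sorted tuples of fact names.

```python
from itertools import product

def cond(phi_outcomes, answers):
    """Answers of 'phi => Q' (Lemma 5.7, points 3 and 4): the answers of Q
    when phi can yield t, and the empty answer when phi can yield f."""
    return {a for b in phi_outcomes for a in (answers if b else {()})}

def union(answers1, answers2):
    """Answers of 'Q1 |> Q2' (Lemma 5.7, point 5): all pairwise sums."""
    return {tuple(sorted(x + y)) for x, y in product(answers1, answers2)}

c, d = {("c",)}, {("d",)}
both = {True, False}          # phi can evaluate to t and to f on F
Q = cond(both, union(c, d))                       # phi => (c |> d)
Q_prime = union(cond(both, c), cond(both, d))     # (phi => c) |> (phi => d)
```

Here `Q` consists of the empty answer and $c\circ d$ , while `Q_prime` additionally contains $c$ and $d$ alone. If `both` is replaced by `{True}` or by `{False}`, the two constructions coincide, as in Equation (18).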

Example 5.11

In general, equivalence $\nabla P\mathbin{.}(Q_{1}\rhd Q_{2})\equiv(\nabla P\mathbin{.}Q_{1})\rhd(\nabla P\mathbin{.}Q_{2})$ is not valid. Indeed, let $a:\rightarrow\text{\tt Fact}$ and $b,c:\text{\tt Nat}\rightarrow\text{\tt Fact}$ , and let $P:=[a\circ b(x)]_{?}$ , $Q:=\nabla P\mathbin{.}(b(x)\rhd c(x))$ , $Q^{\prime}:=(\nabla P\mathbin{.}b(x))\rhd(\nabla P\mathbin{.}c(x))$ . Suppose that $F:=a\circ b(1)\circ b(2)$ . Then $\mathrm{I}_{Q}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime})$ iff $F^{\prime}\in\bigl{\{}b(i)\circ c(i)\;|\;i\in\{1,2\}\bigr{\}}$ . However, $\mathrm{I}_{Q^{\prime}}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime})$ iff $F^{\prime}\in\bigl{\{}b(i)\circ c(j)\;|\;i,j\in\{1,2\}\bigr{\}}$ .

Here we define a class of queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ which evaluate to unique answers:

Definition 5.12

A query $Q$ in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ is called deterministic if all quantification patterns in $Q$ (including those inside conditions) contain only single facts with unique matching property.

Theorem 5.13

Let $Q$ be a deterministic query in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ . Then $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}(Q)$ is confluent, and, in particular, given a ground multiset of facts $F$ , there is a unique multiset of facts $F^{\prime}$ such that $\mathrm{I}_{Q}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime})$ .

Proof 5.14

As semiconfluence implies confluence, to show confluence at $t:\text{\tt State}^{q}$ it suffices to prove that if $t^{\prime}\leftarrow t\rightarrow^{+}t^{\prime\prime}$ for some $t^{\prime}$ and $t^{\prime\prime}$ , then there exists $s$ such that $t^{\prime}\rightarrow^{*}s\;{}^{*}\!\!\leftarrow t^{\prime\prime}$ . Semiconfluence is immediate at irreducible terms $\mathfrak{a}(F^{\prime})$ , as well as terms $\{F,F^{\prime},\emptyset\}^{q}$ which can only rewrite to $\mathfrak{a}(F^{\prime})$ . Let $t:=\{F,F^{\prime},SA\}^{q}$ where $S:\text{\tt Stk}^{q}$ , and $A:\text{\tt Frm}^{q}$ . If there exists a unique multiset of facts $K$ such that every rewrite sequence starting with $t$ either (1) contains $h:=\{F,F^{\prime}\circ K,S\}^{q}$ or (2) can be extended to reach $h$ , then semiconfluence holds at $t$ . Indeed, in this case, either $s=t^{\prime\prime}$ witnesses the semiconfluence (if (1) holds for $t\rightarrow^{+}t^{\prime\prime}$ ) or $s=h$ does (if (2) holds for $t\rightarrow^{+}t^{\prime\prime}$ ). It remains to prove the existence of the unique multiset $K$ . We argue by induction on the structure of formulas indexing the frame $A$ on the top of the stack. Most cases are dealt with by a trivial application of Lemmas 5.7 and 5.2. For frames related to conditionals $\_\Rightarrow\_$ we also have to use Theorem 4.16. The only non-trivial part of the proof concerns frames $A$ of the form $[F^{\prime\prime}\;|\;\vec{a}]^{\vec{v}|q}_{\nabla P\mathbin{.}R}$ .

Under the theorem’s assumption, $P=[f]_{?}$ , where $f$ is a fact with unique matching property (Definition 4.14). Let $\vec{w}$ be a sequence of all variables in $\mathit{Var}(f)\setminus\{\vec{v}\}$ . If there is no fact $f^{\prime}$ in the iterator state $F^{\prime\prime}$ such that $f^{\prime}$ matches $\{\vec{a}/\vec{v}\}(f)$ then necessarily $K=\emptyset$ . Otherwise $F^{\prime\prime}=G\circ f_{1}\circ\cdots\circ f_{n}$ for some $n>0$ , where (1) for all $i\in\{1,\ldots,n\}$ there exists a unique substitution $\sigma_{i}=\{\vec{a}/\vec{v},\vec{b^{i}}/\vec{w}\}$ such that $f_{i}=\sigma_{i}(f)$ , (2) there is no fact $f^{\prime}$ in $G$ such that $f^{\prime}$ matches $\{\vec{a}/\vec{v}\}(f)$ . In this case, necessarily, by Lemma 5.7, p. 6, $K=K_{1}\circ\cdots\circ K_{n}$ , where, for all $i\in\{1,\ldots,n\}$ , $K_{i}$ is the unique multiset of facts (uniqueness and existence follow from the inductive assumption) such that $\mathrm{I}_{\sigma_{i}(R)}(F)\rightarrow^{!}\mathfrak{a}(K_{i})$ .

There is a useful relationship between queries and conditions:

Lemma 5.15

Let $r:s_{1}\ldots s_{n}\rightarrow\text{\tt Fact}$ , let $Q$ be a query in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ , and let $\vec{x}:=x_{1},\ldots,x_{n}$ be a list of $n$ distinct variables such that $\{\vec{x}\}\cap\mathit{Var}(Q)=\emptyset$ and $x_{i}:s_{i}$ for $i\in\{1,\ldots,n\}$ . Then, there exists a condition $\phi_{Q}^{\vec{x}|r}$ in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ such that, for all ground multisets of facts $F$ , and for all ground substitutions $\sigma:=\{\vec{t}/\vec{x},\vec{a}/\vec{v}\}$ such that $\{\vec{a}/\vec{v}\}(Q)$ is closed, $\sigma(\phi_{Q}^{\vec{x}|r})$ is closed, and, moreover,

	$\displaystyle\mathrm{I}_{\sigma(\phi_{Q}^{\vec{x}|r})}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf t})\quad\text{iff}\quad\exists F^{\prime}\mathbin{.}\bigl{(}\mathrm{I}_{Q}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime}\circ r(\vec{t}))\bigr{)},$
	$\displaystyle\mathrm{I}_{\sigma(\phi_{Q}^{\vec{x}|r})}(F)\rightarrow^{!}\mathfrak{s}(\text{\bf f})\quad\text{iff}\quad\exists F^{\prime}\mathbin{.}\bigl{(}r(\vec{t})\notin F^{\prime}\wedge\mathrm{I}_{Q}(F)\rightarrow^{!}\mathfrak{a}(F^{\prime})\bigr{)},$		(19)

i.e., $\sigma(\phi_{Q}^{\vec{x}|r})$ evaluates to t (resp. f) on $F$ if and only if there is some $F^{\prime}$ returned by $Q$ when evaluated against $F$ , such that $r(\vec{t})\in F^{\prime}$ (resp. $r(\vec{t})\notin F^{\prime}$ ). Moreover, if $Q$ is deterministic, then so is $\phi_{Q}^{\vec{x}|r}$ .

Proof 5.16

We define $\phi_{Q}^{\vec{x}|r}$ by recursion on the structure of a query $Q$ :

\displaystyle\phi_{\emptyset}^{\vec{x}|r}=\bot,\ \phi_{h(\vec{t})}^{\vec{x}|r}=\{r(\vec{x})=h(\vec{t})\},\ \phi_{Q_{1}\rhd Q_{2}}^{\vec{x}|r}=\phi_{Q_{1}}^{\vec{x}|r}\vee\phi_{Q_{2}}^{\vec{x}|r},\ \phi_{\psi\Rightarrow R}^{\vec{x}|r}=\psi\wedge\phi_{R}^{\vec{x}|r},\ \phi_{\nabla P\mathbin{.}R}^{\vec{x}|r}=\exists P.\phi_{R}^{\vec{x}|r}.

The easy if laborious proof that $\phi_{Q}^{\vec{x}|r}$ really satisfies all the conditions in the statement is left to the reader.

Example 5.17

Consider the following query in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ (where $x$ and $y$ are distinct variables):

Q:=\nabla[f(x,y)]_{?}\mathbin{.}\bigl{(}(\{x=y\}\Rightarrow r(x))\rhd r(s(x))\bigr{)}.

Thus, for any fact of the form $f(x,y)$ in the database, $Q$ will output $r(s(x))$ , and, if $x=y$ , also $r(x)$ . Using the recursive formula from the proof of Lemma 5.15 we easily see that

\phi^{z|r}_{Q}:=\exists[f(x,y)]_{?}\mathbin{.}\bigl{(}(\{x=y\}\wedge\{r(z)=r(x)\})\vee\{r(z)=r(s(x))\}\bigr{)}.
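The recursion from the proof of Lemma 5.15 can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the paper: queries and conditions are encoded as nested tuples with hypothetical tags (`"empty"`, `"fact"`, `"union"`, `"cond"`, `"nabla"` for queries; `"bot"`, `"eq"`, `"or"`, `"and"`, `"exists"` for conditions), and patterns and argument terms are kept as opaque strings.

```python
def phi(q, xs, r):
    """Build the condition phi_Q^{xs|r} by recursion on the query q.

    The tuple encoding of queries and conditions is a hypothetical one,
    chosen only for this sketch.
    """
    tag = q[0]
    if tag == "empty":                      # phi_emptyset = bot
        return ("bot",)
    if tag == "fact":                       # phi_{h(t)} = {r(xs) = h(t)}
        _, h, args = q
        return ("eq", (r, tuple(xs)), (h, tuple(args)))
    if tag == "union":                      # phi_{Q1 |> Q2} = phi_Q1 or phi_Q2
        return ("or", phi(q[1], xs, r), phi(q[2], xs, r))
    if tag == "cond":                       # phi_{psi => R} = psi and phi_R
        return ("and", q[1], phi(q[2], xs, r))
    if tag == "nabla":                      # phi_{nabla P . R} = exists P . phi_R
        return ("exists", q[1], phi(q[2], xs, r))
    raise ValueError(f"unknown query constructor: {tag!r}")


# The query Q of Example 5.17, with the pattern kept as an opaque string.
example_q = ("nabla", "[f(x,y)]_?",
             ("union",
              ("cond", ("eq", "x", "y"), ("fact", "r", ("x",))),
              ("fact", "r", ("s(x)",))))
example_phi = phi(example_q, ["z"], "r")
```

Applied to the query of Example 5.17, the function returns a nested tuple with the same shape as the condition displayed above: an existential over the pattern whose body is a disjunction of a conjunction and an equality atom.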

The next result shows that $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ can emulate relational algebra.

Theorem 5.18

Denote by $\text{RelAlg}(\mathcal{S},\mathcal{D})$ the relational algebra over the relational schema $\mathcal{S}$ and the domain $\mathcal{D}$ of atomic values (which we silently identify with its algebraic representation, where types are sorts and predicates are represented with equationally defined operators into the Bool sort). Furthermore, assume that for each relational algebra expression $R$ in $\text{RelAlg}(\mathcal{S},\mathcal{D})$ a function symbol $\mathcal{R}_{R}:s_{1}\ldots s_{n}\rightarrow\text{\tt Fact}$ is in $\Sigma_{F}$ , where $s_{i}$ is the sort (domain) of the $i$ -th column of $R$ and $n$ is the arity of $R$ . For any relational database $I$ with schema $\mathcal{S}$ , let $\overline{\mathit{tr}}(I)$ be a multiset corresponding to the database $I$ . More precisely,

\overline{\mathit{tr}}(I):=\circ\{\mathcal{R}_{r}(\vec{t})\;|\;\text{$r$ is a relation symbol in $\mathcal{S}$ and\ }(\vec{t})\in r^{I}\},

where $r^{I}$ is the set of tuples of $r$ in $I$ . Then, for all formulas $R$ in $\text{RelAlg}(\mathcal{S},\mathcal{D})$ , there exists a closed query $\mathit{tr}(R)$ in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ such that for all relational databases $I$ with schema $\mathcal{S}$ ,

\exists F\mathbin{.}\bigl{(}\mathrm{I}_{\mathit{tr}(R)}(\overline{\mathit{tr}}(I))\rightarrow^{!}\mathfrak{a}(F\circ\mathcal{R}_{R}(\vec{t}))\bigr{)}\quad\text{iff}\quad(\vec{t})\in\mathit{eval}_{I}(R),

(20)

where $\mathit{eval}_{I}(R)$ is the set of tuples obtained by evaluation of relational query $R$ against the relational database $I$ . Furthermore, for any relational expression $R$ , $\mathit{tr}(R)$ is deterministic.

Proof 5.19

To prove the theorem we define $\mathit{tr}(R)$ by recursion on the structure of $R$ . We consider relational algebra expressions to be constructed with base relations, projections, selections, set unions, Cartesian products, and set differences. Other well known relational operators such as joins can be defined in terms of basic operators mentioned above. Attribute renaming is not relevant in our case, since we represent relations with positional arguments only. We denote set unions, Cartesian products, and set differences with the usual mathematical notation (e.g., $R\cup S$ , $R\times S$ and $R\setminus S$ , respectively). Notation for projections and selections is less standardised (and needs to be adapted to relations with positional arguments). We denote by $\pi_{i_{1},\ldots,i_{k}}(R)$ the projection of $R$ onto $i_{1}$ ’th, $i_{2}$ ’th, $\ldots$ , and $i_{k}$ ’th “leg” (i.e., on a single tuple, $\pi_{i_{1},\ldots,i_{k}}((t_{1},t_{2},\ldots,t_{n})):=(t_{i_{1}},t_{i_{2}},\ldots,t_{i_{k}})$ ). We denote by $\sigma_{\phi}(R)$ the selection with condition $\phi$ applied to $R$ (i.e., it returns those tuples of $R$ which satisfy $\phi$ ). If $R$ is $n$ -ary, then we will assume that in $\phi$ expression $\$_{i}$ corresponds to the $i$ -th column of $R$ , for $i\in\{1,\ldots,n\}$ .

Let $r$ be a base relation of arity $k$ (for simplicity we omit the typing information), and let $\vec{x}$ be a list of $k$ distinct variables. Then $\mathit{tr}(r):=\nabla[\mathcal{R}_{r}(\vec{x})]_{?}\mathbin{.}\mathcal{R}_{r}(\vec{x}).$ We now consider selected relational operators:

1.

Let $R$ be a relational formula defining an $n$ -ary relation, and let $i_{1},\ldots,i_{k}$ be a subsequence of $1,\ldots,n$ . We define $\mathit{tr}(\pi_{i_{1},\ldots,i_{k}}(R))$ to be $\mathit{tr}(R)$ with each subquery of the form $\mathcal{R}_{R}(t_{1},\ldots,t_{n})$ replaced with $\mathcal{R}_{\pi_{i_{1},\ldots,i_{k}}(R)}(t_{i_{1}},\ldots,t_{i_{k}})$ .
2.

Let $R$ be a relational formula of arity $n$ , and let $\phi$ be a term of sort Bool representing a condition on rows of $R$ (where in $\phi$ the special variable $\$_{i}$ , $i\in\{1,\ldots,n\}$ , corresponds to the $i$ -th “attribute” of $R$ ). We define $\mathit{tr}(\sigma_{\phi}(R))$ to be $\mathit{tr}(R)$ with each subquery of the form $\mathcal{R}_{R}(\vec{t})$ replaced with $\bigl{\{}\{\vec{t}/\vec{\$}\}(\phi)\bigr{\}}\Rightarrow\mathcal{R}_{\sigma_{\phi}(R)}(\vec{t})$ .
3.

$\mathit{tr}(R_{1}\cup R_{2}):=\mathit{tr}(R_{1})\rhd\mathit{tr}(R_{2})$ .
4.

Let $R$ and $S$ be relational formulas. Let $\mathit{tr}(S)^{\prime}:=\sigma(\mathit{tr}(S))$ for some renaming $\sigma$ such that $\mathit{Var}(\mathit{tr}(R))\cap\mathit{Var}(\mathit{tr}(S)^{\prime})=\emptyset$ ( $\mathit{tr}(S)\equiv\mathit{tr}(S)^{\prime}$ by Lemma 5.6 since $\mathit{tr}(S)$ is closed). Further, let $\alpha(\vec{t})$ be $\mathit{tr}(S)^{\prime}$ with each subquery of the form $\mathcal{R}_{S}(\vec{s})$ replaced with $\mathcal{R}_{R\times S}(\vec{t},\vec{s})$ . Then we define $\mathit{tr}(R\times S)$ to be $\mathit{tr}(R)$ with each subquery of the form $\mathcal{R}_{R}(\vec{t})$ replaced with $\alpha(\vec{t})$ .
5.

Let $R$ and $S$ be relational formulas of arity $k$ . Let $\mathit{tr}(S)^{\prime}$ be as in the previous point. Let $\vec{x}$ be a list of $k$ distinct variables such that $\{\vec{x}\}\cap(\mathit{Var}(\mathit{tr}(R))\cup\mathit{Var}(\mathit{tr}(S)^{\prime}))=\emptyset$ . We define $\mathit{tr}(R\setminus S)$ to be $\mathit{tr}(R)$ with each subquery of the form $\mathcal{R}_{R}(\vec{t})$ replaced with $\neg\{\vec{t}/\vec{x}\}\bigl{(}\phi_{\mathit{tr}(S)^{\prime}}^{\vec{x}|\mathcal{R}_{S}}\bigr{)}\Rightarrow\mathcal{R}_{R\setminus S}(\vec{t})$ (see Lemma 5.15).

An easy induction on the structure of $R$ shows that $\mathit{tr}(R)$ is closed and deterministic, and that Equation (20) is satisfied (in the induction step for set difference we also use Lemma 5.15).

Remark 5.20

Relational queries $R$ evaluate to sets of tuples. However, $\mathit{tr}(R)$ may evaluate to a multiset of facts — e.g., when evaluating unions duplicate facts are not removed.

Example 5.21

Let $r$ and $s$ be binary relations. Consider the following relational algebra expression:

R:=\pi_{1,4}\bigl{(}\sigma_{\$_{2}=\$_{3}}(r\times s)\bigr{)}.

Thus, $(x,y)\in R$ iff $(x,z)\in r$ and $(z,y)\in s$ for some $z$ . Let us represent $R$ as a query in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ using the definition of $\mathit{tr}(\_)$ from the proof of Theorem 5.18. First,

\mathit{tr}(r)=\nabla[\mathcal{R}_{r}(x_{1},x_{2})]_{?}\mathbin{.}\mathcal{R}_{r}(x_{1},x_{2}),\quad\mathit{tr}(s)=\nabla[\mathcal{R}_{s}(y_{1},y_{2})]_{?}\mathbin{.}\mathcal{R}_{s}(y_{1},y_{2}),

where $x_{1},x_{2},y_{1},y_{2}$ are distinct variables. Then, by point 4 in the proof,

\mathit{tr}(r\times s)=\nabla[\mathcal{R}_{r}(x_{1},x_{2})]_{?}\mathbin{.}\nabla[\mathcal{R}_{s}(y_{1},y_{2})]_{?}\mathbin{.}\mathcal{R}_{r\times s}(x_{1},x_{2},y_{1},y_{2}).

Finally, to deal with selection and projection we apply points 2 and 1, respectively, from the proof:

\mathit{tr}\bigl{(}\pi_{1,4}\bigl{(}\sigma_{\$_{2}=\$_{3}}(r\times s)\bigr{)}\bigr{)}\\ =\nabla[\mathcal{R}_{r}(x_{1},x_{2})]_{?}\mathbin{.}\nabla[\mathcal{R}_{s}(y_{1},y_{2})]_{?}\mathbin{.}\bigl{(}\{x_{2}=y_{1}\}\Rightarrow\mathcal{R}_{\pi_{1,4}(\sigma_{\$_{2}=\$_{3}}(r\times s))}(x_{1},y_{2})\bigr{)}.
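To make Example 5.21 and Remark 5.20 concrete, the following Python sketch compares the set-based evaluation of $R$ with a direct reading of the translated query as two nested iterations followed by a conditional output. It is a simplification, not the rewriting semantics itself: it assumes each $\nabla$ visits every matching fact exactly once, and the function names and sample relations are made up for illustration.

```python
from collections import Counter
from itertools import product


def eval_relational(r, s):
    """Set semantics of pi_{1,4}(sigma_{$2=$3}(r x s))."""
    joined = {(a, b, c, d) for (a, b), (c, d) in product(r, s)}
    selected = {t for t in joined if t[1] == t[2]}
    return {(t[0], t[3]) for t in selected}


def eval_translated(r, s):
    """Multiset of output facts of the translated query.

    Mirrors the nested nabla-iterations over R_r(x1, x2) and R_s(y1, y2)
    with the guard {x2 = y1}; duplicates are NOT removed.
    """
    out = Counter()
    for (x1, x2) in r:
        for (y1, y2) in s:
            if x2 == y1:
                out[(x1, y2)] += 1
    return out


# Hypothetical sample data: two different join partners yield the same pair.
r = {(1, 2), (1, 3)}
s = {(2, 5), (3, 5)}
as_set = eval_relational(r, s)
as_multiset = eval_translated(r, s)
```

On this sample data the relational result contains the tuple `(1, 5)` once, whereas the translated query outputs the corresponding fact twice; the underlying sets agree, as Equation (20) requires.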

6 Rewriting semantics of $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$

We associate with a DML query $Q$ in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ the rewriting system $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}(Q)$ . Terms rewritten with the rules of $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}(Q)$ are of the form $\{F,F^{\prime},F_{\mathbf{n}},S\}^{d}$ , where $\{\_,\_,\_,\_\}^{d}:\text{\tt FSet}\;\text{\tt FSet}\;\text{\tt FSet}^{\mathbf{n}}\;\text{\tt Stk}^{d}\rightarrow\text{\tt State}^{d}$ . $F$ is the database of facts against which we issue the DML query. $F$ changes during execution of the query as facts are removed from it. A multiset $F^{\prime}$ , expanded during query execution, contains new facts to be added to the database. $F_{\mathbf{n}}$ is a multiset of fresh facts (see Section 2) from which fresh values are drawn. $S$ is a stack of sort $\text{\tt Stk}^{d}$ which simulates structural recursion. Normal forms encapsulating a new database and a new multiset of fresh facts after successful or, respectively, failing execution of a DML query, are constructed with $\mathfrak{n},\mathfrak{f}:\text{\tt FSet}\;\text{\tt FSet}^{\mathbf{n}}\rightarrow\text{\tt State}^{d}$ . We consider only terms $t$ of sort $\text{\tt State}^{d}$ which satisfy the following freshness condition:

Definition 6.1

A term $t$ of sort $\text{\tt State}^{d}$ satisfies the freshness condition if and only if $m>n$ for all positions $\kappa$ , $\kappa^{\prime}$ in $t$ such that $\kappa^{\prime}\neq\kappa 1$ (i.e., $\kappa^{\prime}$ is not the position of the argument of the fresh fact at $\kappa$ ), $t|_{\kappa}=C_{s}(\imath^{s}_{m})$ and $t|_{\kappa^{\prime}}=\imath^{s}_{n}$ .

Example 6.2

Consider the following term of $\text{\tt State}^{d}$ sort:

\mathfrak{n}\bigl{(}f(\imath^{s}_{10})\circ f(\imath^{s}_{3}),C_{s}(\imath^{s}_{7})\bigr{)}.

It does not satisfy the freshness condition, since it has as subterms both $C_{s}(\imath^{s}_{7})$ and $\imath^{s}_{10}$ with $10>7$ . Our general assumption which justifies the freshness condition is that a fresh fact of the form $C_{s}(\imath^{s}_{m})$ means that we never before used any value of the form $\imath^{s}_{n}$ with $n\geq m$ . Clearly, terms which do not satisfy the freshness condition (like the term above) violate this assumption.
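The freshness condition is easy to check mechanically. The Python sketch below is one possible reading of Definition 6.1, under assumptions that are ours: terms are nested tuples `(operator, arg1, ..., argn)`, a nominal value $\imath^{s}_{n}$ is encoded as `("iota", s, n)`, a fresh fact $C_{s}(\imath^{s}_{m})$ as `("C", ("iota", s, m))`, and the side condition on positions is read as excluding only the argument of the fresh fact itself.

```python
def iota(sort, n):
    """Hypothetical encoding of the nominal value iota^sort_n."""
    return ("iota", sort, n)


def is_iota(t):
    return isinstance(t, tuple) and len(t) == 3 and t[0] == "iota"


def positions(t, pos=()):
    """Yield (position, subterm) pairs; positions are tuples of 1-based indices."""
    yield pos, t
    if isinstance(t, tuple) and not is_iota(t):
        for i, arg in enumerate(t[1:], start=1):
            yield from positions(arg, pos + (i,))


def satisfies_freshness(t):
    """Check m > n for every fresh fact C_s(iota_m) and every other iota_n of sort s."""
    subterms = list(positions(t))
    fresh = [(p, s[1]) for p, s in subterms
             if isinstance(s, tuple) and not is_iota(s)
             and s[0] == "C" and len(s) == 2 and is_iota(s[1])]
    values = [(p, s) for p, s in subterms if is_iota(s)]
    for kappa, (_, sort_m, m) in fresh:
        for kappa2, (_, sort_n, n) in values:
            if kappa2 == kappa + (1,):
                continue  # the argument of the fresh fact itself is exempt
            if sort_n == sort_m and not m > n:
                return False
    return True


# The term of Example 6.2: n(f(iota_10) o f(iota_3), C_s(iota_7)).
example_term = ("n",
                ("o", ("f", iota("s", 10)), ("f", iota("s", 3))),
                ("C", iota("s", 7)))
# A variant in which the fresh fact is ahead of all used values.
repaired_term = ("n",
                 ("o", ("f", iota("s", 10)), ("f", iota("s", 3))),
                 ("C", iota("s", 11)))
```

The check rejects the term of Example 6.2 and accepts the variant in which the fresh fact is $C_{s}(\imath^{s}_{11})$ .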

Terms of sort $\text{\tt Stk}^{d}$ are constructed from local computation frames of sort $\text{\tt Frm}^{d}$ , where $\text{\tt Frm}^{d}<\text{\tt Stk}^{d}$ , using an associative binary operator $\_\_:\text{\tt Stk}^{d}\;\text{\tt Stk}^{d}\rightarrow\text{\tt Stk}^{d}$ with identity $\emptyset$ . Most constructors of frames are indexed by DML sub-queries $R$ of $Q$ , and lists of distinct variable names $\vec{v}:=v_{1},\ldots,v_{n}$ of respective sorts $s_{1},\ldots,s_{n}$ such that $\{\vec{v}\}\subseteq\mathit{Var}(Q)$ contains all free variables of $R$ :

	$\displaystyle[\_,\ldots,\_]^{\vec{v}|d}_{R},[\_,\ldots,\_]^{\vec{v}|d,\downarrow}_{R}:s_{1}\ldots s_{n}\rightarrow\text{\tt Frm}^{d},\quad[\_,\ldots,\_\;|\;\_]^{\vec{v}|d}_{R}:s_{1}\ldots s_{n}\;\text{\tt Stk}^{c}\rightarrow\text{\tt Frm}^{d},$
	$\displaystyle\mathit{\surd}:\rightarrow\text{\tt Frm}^{d},\quad[\_\;|\;\_,\ldots,\_\;|\;\_]^{\vec{v}|d}_{\nabla P\mathbin{.}R}:\text{\tt FSet}\;s_{1}\ldots s_{n}\;\text{\tt Bool}\rightarrow\text{\tt Frm}^{d},$
	$\displaystyle[\_\;|\;\_,\ldots,\_\;|\;\_,\_]^{\vec{v}|d}_{\nabla P\mathbin{.}R}:\text{\tt FSet}\;s_{1}\ldots s_{n}\;\text{\tt Bool}\;\text{\tt FSet}\rightarrow\text{\tt Frm}^{d}.$

As $\vec{v}$ can be empty, the above signature templates include $[]^{\vec{v}|d}_{R},[]_{R}^{\vec{v}|d,\downarrow}:\rightarrow\text{\tt Frm}^{d}$ , etc. A constant $\mathit{\surd}$ marks successful branches of computation, i.e., those which created either new facts or $\surd:\text{\tt DQy}$ . Marking such branches is necessary as facts deleted by unsuccessful branches are restored as soon as the branch finishes. Frames of the form $[\vec{a}]^{\vec{v}|d}_{R}$ , $[\vec{a}]^{\vec{v}|d,\downarrow}_{R}$ , $[\vec{a}|S]^{\vec{v}|d}_{R}$ , $[F^{\prime}|\vec{a}|B]^{\vec{v}|d}_{R}$ , or $[F^{\prime}|\vec{a}|B,F^{\prime\prime}]^{\vec{v}|d}_{R}$ are called $(R,\sigma)$ -frames, where $\sigma:=\{\vec{a}/\vec{v}\}$ is the current substitution. They are related to evaluation of $\sigma(R)$ . Marked frames $[\vec{a}]^{\vec{v}|d,\downarrow}_{R}$ occur in the execution of “unions” $\_\rhd\_$ . Conditional frames $[\vec{a}|S]^{\vec{v}|d}_{R}$ are used in execution of conditionals $\phi\Rightarrow R$ , where $S:\text{\tt Stk}^{c}$ represents evaluation of $\phi$ . Iterator frames $[F^{\prime}|\vec{a}|B]^{\vec{v}|d}_{\nabla P\mathbin{.}R}$ and $[F^{\prime}|\vec{a}|B,F^{\prime\prime}]^{\vec{v}|d}_{\nabla P\mathbin{.}R}$ represent iterative execution of $\sigma(\nabla P\mathbin{.}R)$ . Multiset $F^{\prime}$ , called iterator state, contains facts available for matching with $P_{!}\circ P_{?}\circ P_{0}$ (cf. Remark 2.3). Iteration status $B$ is equal to t iff the iteration already generated either new facts or $\surd$ (i.e., if the branch related to $\sigma(\nabla P\mathbin{.}R)$ was successful). “Tentative” iterator frames $[F^{\prime}|\vec{a}|B,F^{\prime\prime}]^{\vec{v}|d}_{\nabla P\mathbin{.}R}$ store multiset $F^{\prime\prime}$ of facts deleted from the database in the present step, so that they can be restored if the step is unsuccessful. 
Given a database $F$ , in order to execute a closed DML query $Q$ we rewrite a state term $\mathrm{I}_{Q}(F,F_{\mathbf{n}}):=\{F,\emptyset,F_{\mathbf{n}},[]^{|d}_{Q}\}^{d}$ until a normal form $\mathfrak{n}(F^{\prime},F_{\mathbf{n}}^{{}^{\prime}})$ or $\mathfrak{f}(F^{\prime},F_{\mathbf{n}}^{{}^{\prime}})$ is reached, indicating that a successful or, respectively, unsuccessful execution of $Q$ in the database $F$ yielded a new database $F^{\prime}$ , and a new multiset of fresh facts $F_{\mathbf{n}}^{{}^{\prime}}$ . Now we are ready to define rule schemas of $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}(Q)$ .

Two consecutive occurrences of $\mathit{\surd}$ are collapsed, and $(\mathit{\surd},\sigma)$ -frames are replaced with $\mathit{\surd}$ :

\lambda_{\text{col}}:\{F,F^{\prime},F_{\mathbf{n}},S\mathit{\surd}\mathit{\surd}\bigr{\}}^{d}\Rightarrow\!\bigl{\{}F,F^{\prime},F_{\mathbf{n}},S\mathit{\surd}\}^{d},\;\;\lambda_{\surd}:\{F,F^{\prime},F_{\mathbf{n}},S[\vec{v}]^{\vec{v}|d}_{\surd}\}^{d}\Rightarrow\!\{F,F^{\prime},F_{\mathbf{n}},S\mathit{\surd}\}^{d}.

(21)

An $(\emptyset,\sigma)$ -frame is removed from the stack. An $(f,\sigma)$ -frame, where $f$ is a fact and $\sigma$ is the current substitution, is replaced by $\mathit{\surd}$ and $\sigma(f)$ is added to $F^{\prime}$ (since $\mathit{Var}(f)\subseteq\{\vec{v}\}$ , $\sigma(f)$ is closed):

\lambda_{\emptyset}:\bigl{\{}F,F^{\prime}\!,F_{\mathbf{n}},S[\vec{v}]^{\vec{v}|d}_{\emptyset}\bigr{\}}^{d}\Rightarrow\bigl{\{}F,F^{\prime}\!,F_{\mathbf{n}},S\bigr{\}}^{d},\ \lambda_{\text{fact}}:\bigl{\{}F,F^{\prime}\!,F_{\mathbf{n}},S[\vec{v}]^{\vec{v}|d}_{f}\bigr{\}}^{d}\Rightarrow\bigl{\{}F,F^{\prime}\circ f,F_{\mathbf{n}},S\mathit{\surd}\bigr{\}}^{d}.

(22)

An $(R_{1}\rhd R_{2},\sigma)$ -frame is split into the $(R_{1},\sigma)$ -frame and the $(R_{2},\sigma)$ -frame. The $(R_{2},\sigma)$ -frame is marked with $\downarrow$ so that the evaluation of $\sigma(R_{1}\rhd R_{2})$ can be marked as successful when at least one of the branches is successful. When both branches are successful, this can produce two consecutive $\mathit{\surd}$ constants on the stack, which are then collapsed using the $\lambda_{\text{col}}$ rule in Equation (21).

$\displaystyle\lambda_{\rhd}^{\text{unf}}:$	$\displaystyle\ \bigl{\{}F,F^{\prime},F_{\mathbf{n}},S[\vec{v}]^{\vec{v}|d}_{R_{1}\rhd R_{2}}\bigr{\}}^{d}\Rightarrow\bigl{\{}F,F^{\prime},F_{\mathbf{n}},S[\vec{v}]^{\vec{v}|d,\downarrow}_{R_{2}}[\vec{v}]^{\vec{v}|d}_{R_{1}}\bigr{\}}^{d},$
$\displaystyle\lambda_{\rhd;\mathit{\surd}}^{\text{fld}}:$	$\displaystyle\ \bigl{\{}F,F^{\prime},F_{\mathbf{n}},S[\vec{v}]^{\vec{v}|d,\downarrow}_{R_{2}}\mathit{\surd}\bigr{\}}^{d}\Rightarrow\bigl{\{}F,F^{\prime},F_{\mathbf{n}},S\mathit{\surd}[\vec{v}]^{\vec{v}|d}_{R_{2}}\bigr{\}}^{d},$
$\displaystyle\lambda_{\rhd;\emptyset}^{\text{fld}}:$	$\displaystyle\ \bigl{\{}F,F^{\prime},F_{\mathbf{n}},S[\vec{v}]^{\vec{v}|d,\downarrow}_{R_{2}}\bigr{\}}^{d}\Rightarrow\bigl{\{}F,F^{\prime},F_{\mathbf{n}},S[\vec{v}]^{\vec{v}|d}_{R_{2}}\bigr{\}}^{d}.$	(23)

Rule schemas for execution of conditionals $\phi\Rightarrow Q$ are similar to those in Equations (12), (13):

$\displaystyle\lambda_{\text{cond}}^{\text{unf}}:$	$\displaystyle\bigl{\{}F,F^{\prime},F_{\mathbf{n}},S[\vec{v}]^{\vec{v}|d}_{\phi\Rightarrow R}\bigr{\}}^{d}\Rightarrow\bigl{\{}F,F^{\prime},F_{\mathbf{n}},S\bigl{[}\vec{v}\;|\;[\vec{v}]^{\vec{v}|c}_{\phi}\bigr{]}^{\vec{v}|d}_{R}\bigr{\}}^{d},$
$\displaystyle\lambda_{\text{cond};\text{\bf f}}^{\text{unf}}:$	$\displaystyle\bigl{\{}F,F^{\prime},F_{\mathbf{n}},S[\vec{v}\;|\;\mathfrak{r}(\text{\bf f})]^{\vec{v}|d}_{R}\bigr{\}}^{d}\Rightarrow\{F,F^{\prime},F_{\mathbf{n}},S\}^{d},$
$\displaystyle\lambda_{\text{cond};\text{\bf t}}^{\text{unf}}:$	$\displaystyle\bigl{\{}F,F^{\prime},F_{\mathbf{n}},S[\vec{v}\;|\;\mathfrak{r}(\text{\bf t})]^{\vec{v}|d}_{R}\bigr{\}}^{d}\Rightarrow\bigl{\{}F,F^{\prime},F_{\mathbf{n}},S[\vec{v}]_{R}^{\vec{v}|d}\bigr{\}}^{d},$
$\displaystyle\lambda^{d}:$	$\displaystyle\{F,F^{\prime},F_{\mathbf{n}},S[\vec{v}\;|\;S^{\prime}]^{\vec{v}|d}_{R}\}^{d}\Rightarrow\{F,F^{\prime},F_{\mathbf{n}},S[\vec{v}\;|\;S^{\prime\prime}]^{\vec{v}|d}_{R}\}^{d}\ \text{if}\ C$
	$\displaystyle\quad\quad\text{for all}\ \lambda:\{F,S^{\prime}\}^{c}\Rightarrow\{F,S^{\prime\prime}\}^{c}\ \text{if}\ C\ \text{in}\ \mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}(\phi).$	(24)

Evaluation of a $\nabla\_.\_$ subquery is initialized with the whole database available for matching. Iteration status is f since evaluation is not successful yet.

{\lambda_{\nabla}^{\text{init}}}:\quad\bigl{\{}F,F^{\prime},F_{\mathbf{n}},S[\vec{v}]^{\vec{v}|d}_{\nabla P\mathbin{.}R}\bigr{\}}^{d}\Rightarrow\bigl{\{}F,F^{\prime},F_{\mathbf{n}},S[F\;|\;\vec{v}\;|\;\text{\bf f}]^{\vec{v}|d}_{\nabla P\mathbin{.}R}\bigr{\}}^{d}.

(25)

Let $\vec{w}$ be a sequence of all the distinct variables in $\mathit{Var}(P)\setminus\{\vec{v}\}$ . Let $\sigma$ be the current substitution. Rule ${\lambda_{\nabla}^{\text{unf}}}$ pushes onto the stack an $(R,\sigma^{\prime})$ -frame, where $\sigma^{\prime}=\sigma\cup\{\vec{b}/\vec{w}\}$ is defined by matching $F^{\prime\prime}\circ\sigma(P_{!}\circ P_{?}\circ P_{0})$ with the iterator state and $P_{\mathbf{n}}$ with the multiset of fresh facts $F_{\mathbf{n}}$ . It also removes $\sigma^{\prime}(P_{?}\circ P_{0})$ from the iterator state and $\sigma^{\prime}(P_{0})$ from the database state, and, finally, it updates fresh facts using $\upsilon$ defined in Equation (1). Removed facts $\sigma^{\prime}(P_{0})$ are stored in the tentative iterator frame:

	$\displaystyle{\lambda_{\nabla}^{\text{unf}}}:$	$\displaystyle\quad\bigl{\{}F\circ P_{!}\circ P_{?}\circ P_{0},F^{\prime},F_{\mathbf{n}}\circ P_{\mathbf{n}},S\bigl{[}F^{\prime\prime}\circ P_{!}\circ P_{?}\circ P_{0}\;|\;\vec{v}\;|\;B\bigr{]}^{\vec{v}|d}_{\nabla P\mathbin{.}R}\bigr{\}}^{d}$
		$\displaystyle\quad\Rightarrow\bigl{\{}F\circ P_{!}\circ P_{?},F^{\prime},F_{\mathbf{n}}\circ\upsilon(P_{\mathbf{n}}),S\bigl{[}F^{\prime\prime}\circ P_{!}\;|\;\vec{v}\;|\;B,P_{0}\bigr{]}^{\vec{v}|d}_{\nabla P\mathbin{.}R}[\vec{v},\vec{w}]^{\vec{v},\vec{w}|d}_{R}\bigr{\}}^{d}.$		(26)

If execution of $\sigma^{\prime}(R)$ proves unsuccessful, removed facts can be restored both to iterator state and database. Otherwise, we discard them and set the iteration status to t:

	$\displaystyle\lambda^{\text{fld}}_{\nabla;\emptyset}:$	$\displaystyle\ \bigl{\{}F,F^{\prime},F_{\mathbf{n}},S\bigl{[}F^{\prime\prime}\;|\;\vec{v}\;|\;B,F_{0}\bigr{]}^{\vec{v}|d}_{\nabla P\mathbin{.}R}\bigr{\}}^{d}\Rightarrow\bigl{\{}F\circ F_{0},F^{\prime},F_{\mathbf{n}},S\bigl{[}F^{\prime\prime}\circ F_{0}\;|\;\vec{v}\;|\;B\bigr{]}^{\vec{v}|d}_{\nabla P\mathbin{.}R}\bigr{\}}^{d},$
	$\displaystyle\lambda^{\text{fld}}_{\nabla;\mathit{\surd}}:$	$\displaystyle\ \bigl{\{}F,F^{\prime},F_{\mathbf{n}},S\bigl{[}F^{\prime\prime}\;|\;\vec{v}\;|\;B,F_{0}\bigr{]}^{\vec{v}|d}_{\nabla P\mathbin{.}R}\mathit{\surd}\bigr{\}}^{d}\Rightarrow\bigl{\{}F,F^{\prime},F_{\mathbf{n}},S\bigl{[}F^{\prime\prime}\;|\;\vec{v}\;|\;\text{\bf t}\bigr{]}^{\vec{v}|d}_{\nabla P\mathbin{.}R}\bigr{\}}^{d}.$		(27)

We keep applying ${\lambda_{\nabla}^{\text{unf}}}$ until we cannot match $\sigma(P_{!}\circ P_{?}\circ P_{0})$ with iterator state. Then we replace the iterator frame with $\delta(B)$ where $B$ is the iteration status, $\delta(\text{\bf t})=\mathit{\surd}$ , and $\delta(\text{\bf f})=\emptyset$ . To prevent premature application, rule schema ${\lambda_{\nabla}^{\text{end}}}$ is conditional, where the condition uses functions $\mu_{P,\vec{v}}:\text{\tt Pat}\;s_{1}\ldots s_{n}\rightarrow\text{\tt Yes?}$ defined for each $\vec{v}\subseteq\mathit{Var}(Q)$ and pattern $P$ occurring in $Q$ with the single equation $\mu_{P,\vec{v}}(F\circ P_{!}\circ P_{?}\circ P_{0},\vec{v})=\text{\it yes}$ (cf. Equation (8)):

{\lambda_{\nabla}^{\text{end}}}:\bigl{\{}F,F^{\prime},F_{\mathbf{n}},S\bigl{[}F^{\prime\prime}\;|\;\vec{v}\;|\;B\bigr{]}^{\vec{v}|d}_{\nabla P\mathbin{.}R}\bigr{\}}^{d}\Rightarrow\{F,F^{\prime},F_{\mathbf{n}},S\delta(B)\}^{d}\ \text{if}\ (\mu_{P,\vec{v}}(F^{\prime\prime},\vec{v})=\text{\it yes})=\text{\bf f}.

(28)

The last two rules reduce $\text{\tt State}^{d}$ -terms whose stack contains only the $\mathit{\surd}$ constant, or is empty, into a term constructed with $\mathfrak{n}$ or $\mathfrak{f}$ , respectively:

\lambda_{\text{dml}}^{\mathit{\surd}}:\{F,F^{\prime},F_{\mathbf{n}},\mathit{\surd}\}^{d}\Rightarrow\mathfrak{n}(F\circ F^{\prime},F_{\mathbf{n}}),\quad\lambda_{\text{dml}}^{\emptyset}:\{F,F^{\prime},F_{\mathbf{n}},\emptyset\}^{d}\Rightarrow\mathfrak{f}(F\circ F^{\prime},F_{\mathbf{n}}).

(29)

Theorem 6.3

$\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}(Q)$ is a terminating rewriting system.

Proof 6.4

The proof is similar to the proof of Theorem 4.2. Only subqueries of the form $\nabla P\mathbin{.}R$ , where $P$ is semiterminating but not terminating (i.e., $P_{0}\neq\emptyset$ , but $P_{?}=\emptyset$ ) require special care. In this case, the signature of $\nabla\_.\_$ forces $R$ to be success assured, i.e., $R$ ’s evaluation always succeeds, and hence removed facts matching $P_{0}$ are never restored. This ensures termination of $\nabla P\mathbin{.}R$ ’s execution.

The following useful observation can be trivially verified by examining the rule schemas:

Lemma 6.5

Let $R$ be a DML subquery of $Q$ . Then, for all multisets of facts $F$ , $F^{\prime}$ , $G$ , $G^{\prime}$ , multisets $F_{\mathbf{n}}$ , $G_{\mathbf{n}}$ of fresh facts, stacks $S$ , lists of variables $\vec{v}=v_{1},\ldots,v_{n}$ and of values $\vec{a}=a_{1},\ldots,a_{n}$ ,

1.

$\{F,F^{\prime},F_{\mathbf{n}},S[\vec{a}]^{\vec{v}|d}_{R}\}^{d}\rightarrow^{*}_{\mathcal{R}_{1}}\{G,F^{\prime}\circ G^{\prime},G_{\mathbf{n}},S\mathit{\surd}\}^{d}$ iff $\{F,\emptyset,F_{\mathbf{n}},[]^{|d}_{\sigma(R)}\}^{d}\rightarrow^{*}_{\mathcal{R}_{2}}\mathfrak{n}(G\circ G^{\prime},G_{\mathbf{n}})$ ,
2.

$\{F,F^{\prime},F_{\mathbf{n}},S[\vec{a}]^{\vec{v}|d}_{R}\}^{d}\rightarrow^{*}_{\mathcal{R}_{1}}\{F,F^{\prime},G_{\mathbf{n}},S\}^{d}$ iff $\{F,\emptyset,F_{\mathbf{n}},[]^{|d}_{\sigma(R)}\}^{d}\rightarrow^{*}_{\mathcal{R}_{2}}\mathfrak{f}(F,G_{\mathbf{n}})$ .

where $\sigma:=\{\vec{a}/\vec{v}\}$ , $\mathcal{R}_{1}:=\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}(Q)$ , and $\mathcal{R}_{2}:=\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}(\sigma(R))$ .

As in the case of pure queries, we consider two DML queries equivalent if and only if they can match their results. We should not, however, distinguish results differing only by the choice of fresh values. Let $\mathcal{S}_{\mathbf{n}}$ denote the set of nominal sorts in $\Sigma_{S}$ . Let $\mathit{nom}(t)$ be the $\mathcal{S}_{\mathbf{n}}$ -sorted set of nominal values contained in term $t$ . For any $\mathcal{S}_{\mathbf{n}}$ -sorted bijection $\alpha:X\rightarrow Y$ between sets of nominal values we denote by $\hat{\alpha}$ the natural extension of $\alpha$ to terms $t$ such that $\mathit{nom}(t)\subseteq X$ . More precisely, $\hat{\alpha}(x)=\alpha(x)$ if $x\in X$ , $\hat{\alpha}(c)=c$ if $c$ is a constant of non-nominal sort or a variable, and $\hat{\alpha}(f(t_{1},\ldots,t_{n}))=f(\hat{\alpha}(t_{1}),\ldots,\hat{\alpha}(t_{n}))$ if $f(t_{1},\ldots,t_{n})$ is of non-nominal sort. With those notions we define equivalence on DML queries as follows:

Definition 6.6

Let $Q_{1}$ and $Q_{2}$ be two DML queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ . We say that $Q_{1}$ is logically equivalent to $Q_{2}$ , writing $Q_{1}\equiv Q_{2}$ , if and only if, for all ground substitutions $\sigma$ such that $\sigma(Q_{1})$ and $\sigma(Q_{2})$ are closed, all ground multisets of facts $F$ , $G$ , all ground multisets of fresh facts $F_{\mathbf{n}}$ , $G_{\mathbf{n}}$ , and all $i\in\{1,2\}$

1.

if $\mathrm{I}_{\sigma(Q_{i})}(F,F_{\mathbf{n}})\rightarrow^{!}\mathfrak{n}(G,G_{\mathbf{n}})$ then there exist multisets of fresh facts $F^{{}^{\prime}}_{\mathbf{n}}$ , $G^{{}^{\prime}}_{\mathbf{n}}$ , multisets of facts $F^{\prime}$ , $G^{\prime}$ , and an $\mathcal{S}_{\mathbf{n}}$ -sorted bijection $\alpha:\mathit{nom}(F)\cup\mathit{nom}(G)\rightarrow\mathit{nom}(F^{\prime})\cup\mathit{nom}(G^{\prime})$ such that $F^{\prime}=\hat{\alpha}(F)$ , $G^{\prime}=\hat{\alpha}(G)$ , and $\mathrm{I}_{\sigma(Q_{3-i})}(F^{\prime},F^{{}^{\prime}}_{\mathbf{n}})\rightarrow^{!}\mathfrak{n}(G^{\prime},G^{{}^{\prime}}_{\mathbf{n}})$ .
2.

if $\mathrm{I}_{\sigma(Q_{i})}(F,F_{\mathbf{n}})\rightarrow^{!}\mathfrak{f}(F,G_{\mathbf{n}})$ then there exist multisets of fresh facts $F^{{}^{\prime}}_{\mathbf{n}}$ , $G^{{}^{\prime}}_{\mathbf{n}}$ , a multiset of facts $F^{\prime}$ , and an $\mathcal{S}_{\mathbf{n}}$ -sorted bijection $\alpha:\mathit{nom}(F)\rightarrow\mathit{nom}(F^{\prime})$ such that $F^{\prime}=\hat{\alpha}(F)$ and $\mathrm{I}_{\sigma(Q_{3-i})}(F^{\prime},F^{{}^{\prime}}_{\mathbf{n}})\rightarrow^{!}\mathfrak{f}(F^{\prime},G^{{}^{\prime}}_{\mathbf{n}})$ .

The following result is an immediate consequence of Lemma 6.5:

Lemma 6.7

Logical equivalence on queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ is an equivalence relation and a congruence, i.e., if $\kappa$ is a position in a DML query $Q$ such that $Q|_{\kappa}$ is a DML query, and $R\equiv Q|_{\kappa}$ , then $Q\equiv Q[R]_{\kappa}$ .

Lemma 6.8

For any closed DML query $Q$ in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ , and any renaming $\sigma$ , $Q\equiv\sigma(Q)$ .

Lemma 6.9, proven similarly to Lemma 4.9, clarifies elements of the rewriting semantics of DML queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ (we leave out $\nabla\_.\_$ evaluation, where we are better off with the rewriting definition).

Lemma 6.9

For all ground multisets of facts $F$ , $G$ and of fresh facts $F_{\mathbf{n}}$ , $G_{\mathbf{n}}$ , as well as closed DML queries $Q$ , $Q_{1}$ , $Q_{2}$ and all sentences $\phi$ , the following holds:

1.

If $\mathrm{I}_{Q}(F,F_{\mathbf{n}})\rightarrow^{!}\Gamma$ then $\Gamma=\mathfrak{n}(G,G_{\mathbf{n}})$ or $\Gamma=\mathfrak{f}(F,G_{\mathbf{n}})$ for some ground multiset of facts $G$ and ground multiset of fresh facts $G_{\mathbf{n}}$ . If $\mathrm{I}_{Q}(F,F_{\mathbf{n}})\rightarrow^{!}\mathfrak{f}(G,G_{\mathbf{n}})$ then $F=G$ .
2.

If $\mathrm{I}_{f}(F,F_{\mathbf{n}})\rightarrow^{!}\Gamma$ then $\Gamma=\mathfrak{n}(F\circ f,F_{\mathbf{n}})$ . If $\mathrm{I}_{\emptyset}(F,F_{\mathbf{n}})\rightarrow^{!}\Gamma$ then $\Gamma=\mathfrak{f}(F,F_{\mathbf{n}})$ .
3.

$\mathrm{I}_{\phi\Rightarrow Q}(F,F_{\mathbf{n}})\rightarrow^{!}\mathfrak{n}(G,G_{\mathbf{n}})$ iff $\mathrm{I}_{\phi}(F)\rightarrow^{!}\mathfrak{r}(\text{\bf t})$ and $\mathrm{I}_{Q}(F,F_{\mathbf{n}})\rightarrow^{!}\mathfrak{n}(G,G_{\mathbf{n}})$ .
4.

$\mathrm{I}_{\phi\Rightarrow Q}(F,F_{\mathbf{n}})\rightarrow^{!}\mathfrak{f}(F,G_{\mathbf{n}})$ iff $\mathrm{I}_{\phi}(F)\rightarrow^{!}\mathfrak{r}(\text{\bf f})$ or $\mathrm{I}_{Q}(F,F_{\mathbf{n}})\rightarrow^{!}\mathfrak{f}(F,G_{\mathbf{n}})$ .
5.

$\mathrm{I}_{Q_{1}\rhd Q_{2}}(F,F_{\mathbf{n}})\rightarrow^{!}\mathfrak{f}(F,G_{\mathbf{n}})$ iff there exists a multiset of fresh facts $G^{{}^{\prime}}_{\mathbf{n}}$ such that $\mathrm{I}_{Q_{1}}(F,F_{\mathbf{n}})\rightarrow^{!}\mathfrak{f}(F,G^{{}^{\prime}}_{\mathbf{n}})$ and $\mathrm{I}_{Q_{2}}(F,G^{{}^{\prime}}_{\mathbf{n}})\rightarrow^{!}\mathfrak{f}(F,G_{\mathbf{n}})$ .
6.

$\mathrm{I}_{Q_{1}\rhd Q_{2}}(F,F_{\mathbf{n}})\rightarrow^{!}\mathfrak{n}(G,G_{\mathbf{n}})$ iff, for some ground multisets of facts $F^{\prime}$ , $F^{\prime\prime}$ , $G^{\prime}$ , $G^{\prime\prime}$ such that $G=G^{\prime\prime}\circ F^{\prime}\circ F^{\prime\prime}$ , a ground multiset of fresh facts $G_{\mathbf{n}}^{{}^{\prime}}$ , and stacks $S,S^{\prime}\in\{\emptyset,\mathit{\surd}\}$ where $S=\mathit{\surd}$ or $S^{\prime}=\mathit{\surd}$ , we have $\mathrm{I}_{Q_{1}}(F,F_{\mathbf{n}})\rightarrow^{*}\{G^{\prime},F^{\prime},G_{\mathbf{n}}^{{}^{\prime}},S\}^{d}$ and $\mathrm{I}_{Q_{2}}(G^{\prime},G^{{}^{\prime}}_{\mathbf{n}})\rightarrow^{*}\{G^{\prime\prime},F^{\prime\prime},G_{\mathbf{n}},S^{\prime}\}^{d}$ .

Lemma 6.10

The following logical equivalences hold between queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ :

	$\displaystyle\emptyset\rhd Q\equiv Q,\quad Q\rhd\emptyset\equiv Q,\quad Q_{1}\rhd(Q_{2}\rhd Q_{3})\equiv(Q_{1}\rhd Q_{2})\rhd Q_{3},$
	$\displaystyle\bot\Rightarrow R\equiv\emptyset,\quad\neg\bot\Rightarrow R\equiv R,\quad\nabla P\mathbin{.}\emptyset\equiv\emptyset.$

Lemma 6.10 is very similar to Lemma 5.8, except that $\_\rhd\_$ is not commutative. $Q_{1}\rhd Q_{2}$ may fail to be equivalent to $Q_{2}\rhd Q_{1}$ if, say, $Q_{1}$ deletes a fact which is referred to in some pattern in $Q_{2}$ .

We need to generalize the notion of confluence, lest rewriting paths leading to terms that differ only in the choice of fresh values (as in the next example) be considered non-convergent.

Example 6.11

Let I be a nominal sort. Let $r:\text{\tt Nat}\rightarrow\text{\tt Fact}$ , $s:\text{\tt I}\;\text{\tt Nat}\rightarrow\text{\tt Fact}$ . Consider DML query $Q:=\nabla[C_{\text{\tt I}}(x)]_{\mathbf{n}}\circ[r(y)]_{?}\mathbin{.}s(x,y)$ , and let $F:=r(1)\circ r(2)$ . Then

\mathrm{I}_{Q}\bigl{(}F,C_{\text{\tt I}}(\imath^{\text{\tt I}}_{0})\bigr{)}\rightarrow^{*}\bigl{\{}F,\emptyset,C_{\text{\tt I}}(\imath^{\text{\tt I}}_{0}),[F||\text{\bf f}]^{|d}_{Q}\bigr{\}}^{d}\xrightarrow{{\lambda_{\nabla}^{\text{unf}}}}\bigl{\{}F,\emptyset,C_{\text{\tt I}}(\imath^{\text{\tt I}}_{1}),[r(2)||\text{\bf f},\emptyset]^{|d}_{Q}[\imath^{\text{\tt I}}_{0},1]^{x,y|d}_{s(x,y)}\bigr{\}}^{d}\\ \rightarrow^{*}\bigl{\{}F,s(\imath^{\text{\tt I}}_{0},1),C_{\text{\tt I}}(\imath^{\text{\tt I}}_{1}),[r(2)||\text{\bf t}]^{|d}_{Q}\bigr{\}}^{d}\xrightarrow{{\lambda_{\nabla}^{\text{unf}}}}\bigl{\{}F,s(\imath^{\text{\tt I}}_{0},1),C_{\text{\tt I}}(\imath^{\text{\tt I}}_{2}),[\emptyset||\text{\bf t}]^{|d}_{Q}[\imath^{\text{\tt I}}_{1},2]^{x,y|d}_{s(x,y)}\bigr{\}}^{d}\\ \rightarrow^{*}\mathfrak{n}\bigl{(}F\circ s(\imath^{\text{\tt I}}_{0},1)\circ s(\imath^{\text{\tt I}}_{1},2),C_{\text{\tt I}}(\imath^{\text{\tt I}}_{2})\bigr{)},

and, if we match $r(1)$ and $r(2)$ in reverse order in applications of ${\lambda_{\nabla}^{\text{unf}}}$ rule, then

\mathrm{I}_{Q}\bigl{(}F,C_{\text{\tt I}}(\imath^{\text{\tt I}}_{0})\bigr{)}\rightarrow^{*}\mathfrak{n}\bigl{(}F\circ s(\imath^{\text{\tt I}}_{0},2)\circ s(\imath^{\text{\tt I}}_{1},1),C_{\text{\tt I}}(\imath^{\text{\tt I}}_{2})\bigr{)}.
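The dependence of the result on the matching order can be replayed with a short Python sketch. It does not implement the rewriting system; it only mimics the iteration of Example 6.11 under two assumptions of ours: the order in which facts $r(y)$ are matched is passed in explicitly, and $\upsilon$ is taken to replace $C_{\text{\tt I}}(\imath^{\text{\tt I}}_{m})$ with $C_{\text{\tt I}}(\imath^{\text{\tt I}}_{m+1})$ , as the example suggests. The function name `run_example` and the tuple encoding are hypothetical.

```python
def run_example(order, first_fresh=0):
    """Mimic Q := nabla [C_I(x)]_n o [r(y)]_? . s(x, y) on the facts r(y), y in order.

    Returns the sorted list of new facts s(iota_m, y) and the index of the
    fresh fact C_I(iota_m) left after the iteration.
    """
    fresh = first_fresh
    new_facts = []
    for y in order:
        x = ("iota", "I", fresh)   # draw the current fresh value
        fresh += 1                 # assumed effect of upsilon on C_I(iota_m)
        new_facts.append(("s", x, y))
    return sorted(new_facts), fresh


forward = run_example([1, 2])    # match r(1) first, then r(2)
backward = run_example([2, 1])   # match r(2) first, then r(1)
```

The two runs end with the same fresh fact index but pair the fresh values with the facts differently, which is exactly the difference between the two normal forms displayed above.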

Here we define an equivalence relation on terms of sort $\text{\tt State}^{d}$ which is a bisimulation:

Definition 6.12

We say that term $t_{1}$ is nominally equivalent to term $t_{2}$ , in which case we write $t_{1}\equiv_{\mathbf{n}}t_{2}$ , if and only if there exists a bijection $\alpha:\mathit{nom}(t_{1})\rightarrow\mathit{nom}(t_{2})$ such that $\hat{\alpha}(t_{1})=t_{2}$ .
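Definition 6.12 can be tested by brute force on small terms. The Python sketch below is a naive illustration, not an efficient procedure, and rests on our own assumptions: terms are nested tuples, nominal values are encoded as `("iota", sort, n)`, multisets are flat terms with the hypothetical head symbol `"o"` whose arguments are compared up to reordering, and the bijection $\alpha$ is required to preserve sorts.

```python
from itertools import permutations


def is_iota(t):
    return isinstance(t, tuple) and len(t) == 3 and t[0] == "iota"


def nom(t):
    """Set of nominal values occurring in term t."""
    if is_iota(t):
        return {t}
    if isinstance(t, tuple):
        found = set()
        for arg in t[1:]:
            found |= nom(arg)
        return found
    return set()


def apply_alpha(alpha, t):
    """The extension alpha-hat: rename nominal values, keep everything else."""
    if is_iota(t):
        return alpha[t]
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply_alpha(alpha, a) for a in t[1:])
    return t


def canon(t):
    """Sort the arguments of the multiset constructor 'o' (assumed flat)."""
    if is_iota(t) or not isinstance(t, tuple):
        return t
    args = tuple(canon(a) for a in t[1:])
    if t[0] == "o":
        args = tuple(sorted(args, key=repr))
    return (t[0],) + args


def nominally_equivalent(t1, t2):
    """Search for a sort-preserving bijection alpha with alpha-hat(t1) = t2."""
    xs, ys = sorted(nom(t1)), sorted(nom(t2))
    if len(xs) != len(ys):
        return False
    target = canon(t2)
    for perm in permutations(ys):
        if any(x[1] != y[1] for x, y in zip(xs, perm)):
            continue  # alpha must map each value to one of the same sort
        alpha = dict(zip(xs, perm))
        if canon(apply_alpha(alpha, t1)) == target:
            return True
    return False


# The two normal forms of Example 6.11.
i0, i1, i2 = ("iota", "I", 0), ("iota", "I", 1), ("iota", "I", 2)
nf1 = ("n", ("o", ("r", 1), ("r", 2), ("s", i0, 1), ("s", i1, 2)), ("C", i2))
nf2 = ("n", ("o", ("r", 1), ("r", 2), ("s", i0, 2), ("s", i1, 1)), ("C", i2))
```

For the two normal forms of Example 6.11 the search succeeds with the bijection that swaps $\imath^{\text{\tt I}}_{0}$ and $\imath^{\text{\tt I}}_{1}$ . The enumeration of all permutations is exponential in the number of nominal values, so the sketch is meant only for experimenting with small examples.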

Lemma 6.13

Nominal equivalence is an equivalence relation. When restricted to $\text{\tt State}^{d}$ terms satisfying the freshness condition (Definition 6.1), it is also a bisimulation on $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}(Q)$ , for all $Q$ .

We leave the easy proof of Lemma 6.13 to the reader. The restriction to terms satisfying the freshness condition is necessary for nominal equivalence to be a bisimulation, as demonstrated below:

Example 6.14

Let I be a nominal sort. Let $r:\text{\tt I}\rightarrow\text{\tt Fact}$ , $s:\text{\tt I}\;\text{\tt I}\rightarrow\text{\tt Fact}$ . Define $F:=r(\imath^{\text{\tt I}}_{1})\circ r(\imath^{\text{\tt I}}_{2})$ . Consider the DML query $Q:=\nabla[C_{\text{\tt I}}(x)]_{\mathbf{n}}\circ[r(y)]_{?}\mathbin{.}(\{x=y\}\Rightarrow s(x,y)).$ Let $t_{1}:=\mathrm{I}_{Q}(F,C_{\text{\tt I}}(\imath^{\text{\tt I}}_{0}))$ and $t_{2}:=\mathrm{I}_{Q}(F,C_{\text{\tt I}}(\imath^{\text{\tt I}}_{3}))$ . Term $t_{1}$ does not satisfy the freshness condition (Definition 6.1). It is immediate that $t_{1}\equiv_{\mathbf{n}}t_{2}$ and that $t_{1}\rightarrow^{!}t_{3}$ , where $t_{3}:=\mathfrak{n}(s(\imath^{\text{\tt I}}_{1},\imath^{\text{\tt I}}_{1})\circ F,C_{\text{\tt I}}(\imath^{\text{\tt I}}_{2}))$ , but the only normal form of $t_{2}$ is $t_{4}:=\mathfrak{f}(F,C_{\text{\tt I}}(\imath^{\text{\tt I}}_{5}))$ , and $t_{3}\not\equiv_{\mathbf{n}}t_{4}$ .

We now define a class of queries $Q$ for which $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}(Q)$ is confluent modulo nominal equivalence.

Definition 6.15

Let $Q$ be a DML query. We say that $Q$ has no deletion conflicts if and only if for each DML subquery $\nabla P\mathbin{.}R$ of $Q$ , and any subterm $f:\text{\tt Fact}$ of $R$ (resp. $P$ ) occurring inside $[\_]_{0}$ , $P$ (resp. $R$ ) has no subterm $f^{\prime}$ occurring inside $[\_]_{0}$ , $[\_]_{!}$ or $[\_]_{?}$ such that $f$ and $f^{\prime}$ are unifiable.

Definition 6.16

Let $Q$ be a DML query in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ . $Q$ is called deterministic if it has no deletion conflicts and all quantification patterns in $Q$ (including those inside subterms which are conditions) contain only single facts with the unique matching property (but may contain any number of fresh facts).

In Example 4.18 we showed why multiple facts in patterns lead to non-confluence and, as a result, to non-deterministic evaluation of conditions (and queries). However, we have not previously considered deletion conflicts (as they are specific to DML queries). The following example shows why deletion conflicts can prevent confluence:

Example 6.17

Consider the following DML query:

Q:=\nabla[f(x)]_{0}\mathbin{.}\bigl{(}\mathit{\surd}\rhd(\nabla[f(1)]_{?}\mathbin{.}h(x))\bigr{)}.

(30)

Observe that all quantification patterns in $Q$ consist of single facts, but $Q$ does have deletion conflicts (the facts in both patterns are unifiable, and one of them is a $[\_]_{0}$ -pattern which has the second one in its scope). Since the first pattern ( $[f(x)]_{0}$ ) is semi-terminating but not terminating, the DML query in its scope is of the form $\mathit{\surd}\rhd\_$ , which ensures that it is success assured.

Now, let $F:=f(1)\circ f(2)$ . It is easy to see that executing $Q$ against $F$ removes from $F$ both $f$ -facts and either adds a single fact $h(2)$ or nothing depending on whether pattern $[f(x)]_{0}$ first matches $f(2)$ (which makes it possible for the subquery $\nabla[f(1)]_{?}\mathbin{.}h(x)$ to succeed then and return $h(2)$ ) or $f(1)$ (which causes all executions of subquery $\nabla[f(1)]_{?}\mathbin{.}h(x)$ to fail).
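The two outcomes can be reproduced with a small Python sketch. It is a hand-written model of this one query only, under the reading of the patterns given in the paragraph above; it is not the rewriting semantics itself, and the function name `run_query` and the tuple encoding of facts are assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations


def run_query(db, match_order):
    """Model of Q from Example 6.17: nabla [f(x)]_0 . (ok |> (nabla [f(1)]_? . h(x))).

    match_order lists the arguments x in the order in which the outer
    pattern [f(x)]_0 happens to match the f-facts.
    """
    db = Counter(db)
    for x in match_order:
        db[("f", x)] -= 1          # [_]_0: the matched fact is deleted
        if db[("f", 1)] > 0:       # inner pattern [f(1)]_? needs f(1) to still be present
            db[("h", x)] += 1      # the subquery succeeds and emits h(x)
        # otherwise the subquery fails; the "ok |> _" wrapper keeps the step successful
    return +db                     # unary plus drops facts whose count fell to zero


F = Counter({("f", 1): 1, ("f", 2): 1})
for order in permutations([1, 2]):
    print(order, dict(run_query(F, order)))
# (1, 2) {}
# (2, 1) {('h', 2): 1}
```

Matching $f(1)$ first removes the only fact the inner pattern could match, so the result is empty; matching $f(2)$ first leaves $f(1)$ available and yields $h(2)$.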

The following theorem states that while evaluation of a deterministic DML query is not itself deterministic, its results are.

Theorem 6.18

Let $Q$ be a deterministic query in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ . Then $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}(Q)$ is confluent up to a nominal equivalence. In particular, given ground multisets of facts $F$ , and of fresh facts $F_{\mathbf{n}}$ , there is a unique (up to nominal equivalence) term $t$ of the form $\mathfrak{n}(G,G_{\mathbf{n}})$ or $\mathfrak{f}(F,G_{\mathbf{n}})$ such that $\mathrm{I}_{Q}(F,F_{\mathbf{n}})\rightarrow^{!}t$ .

Proof 6.19

The only significant difference between the proof of this theorem and Theorem 5.13 is the presence of deletions and fresh facts. The non-confluence introduced by fresh facts can be absorbed with nominal equivalence. Since $Q$ has no deletion conflicts, when a DML subquery $\nabla P\mathbin{.}R$ is executed, neither deletion of facts through $P$ influences execution of $R$ nor execution of $R$ decreases the pool of facts available for matching with $P$ . Moreover, if $P$ contains $[f]_{0}$ for some fact $f$ , then $f$ is the only fact in $P$ , hence $R$ cannot fail, facts deleted through $P$ are never returned, and $[f]_{0}$ behaves like $[f]_{?}$ .

Let us finish this section with the following remark about expressibility of $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ :

Remark 6.20

A typical formalization of database updates is to use pairs of queries which define facts to be, respectively, deleted from, and added to the current database. This approach can be emulated in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ , using multiple DML queries executed in a sequence. First, let us extend the signature $\Sigma$ with function symbols $f^{d}$ and $f^{a}$ for each fact constructor $f$ . Let $Q_{d}$ and $Q_{a}$ be queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{qry}}$ which return sets of facts to be deleted and added, respectively, to the database. We assume that $Q_{d}$ and $Q_{a}$ contain no subterms of the form $f^{d}(\vec{t})$ or $f^{a}(\vec{t})$ . Let $\hat{Q}_{d}$ and $\hat{Q}_{a}$ be the same as $Q_{d}$ and $Q_{a}$ , respectively, except that all subqueries $f(\vec{t})$ of sort Fact are replaced, respectively, with $f^{d}(\vec{t})$ and $f^{a}(\vec{t})$ . Then to update the database we execute the following DML queries (in this order):

	$\displaystyle\hat{Q}_{d},\quad\hat{Q}_{a},\quad\nabla[f_{1}^{d}(\vec{v}^{1})]_{?}\mathbin{.}\nabla[f_{1}(\vec{v}^{1})]_{0}\mathbin{.}\mathit{\surd},\ldots,\nabla[f_{m}^{d}(\vec{v}^{m})]_{?}\mathbin{.}\nabla[f_{m}(\vec{v}^{m})]_{0}\mathbin{.}\mathit{\surd},$
	$\displaystyle\nabla[f_{1}^{d}(\vec{v}^{1})]_{0}\mathbin{.}\mathit{\surd},\ldots,\nabla[f_{m}^{d}(\vec{v}^{m})]_{0}\mathbin{.}\mathit{\surd},\quad\nabla[f_{1}^{a}(\vec{v}^{1})]_{0}\mathbin{.}f_{1}(\vec{v}^{1}),\ldots,\nabla[f_{m}^{a}(\vec{v}^{m})]_{0}\mathbin{.}f_{m}(\vec{v}^{m}),$

where $f_{1},\ldots,f_{m}$ are fact constructors occurring in $Q_{d}$ and $Q_{a}$ . Thus, because of Theorem 5.18 we can express any relational database update using multiple DML queries in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ .
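The sequence of queries in Remark 6.20 can be mimicked on plain multisets as follows. This is only a sketch: the results of $Q_{d}$ and $Q_{a}$ are assumed to be already computed and passed in as lists of facts, the marker constructors $f^{d}$ and $f^{a}$ are encoded by appending `"^d"` and `"^a"` to the constructor name, and we assume that a $[\_]_{0}$ pattern removes every copy of a matching fact. The function name `apply_update` is hypothetical.

```python
from collections import Counter


def apply_update(db, deleted, added):
    """Emulate the delete/add update of Remark 6.20 on a multiset of facts."""
    db = Counter(db)
    # Steps 1-2: the marked queries emit marker facts f^d(v) and f^a(v).
    for name, *args in deleted:
        db[(name + "^d", *args)] += 1
    for name, *args in added:
        db[(name + "^a", *args)] += 1
    # Step 3: for every marker f^d(v), delete all facts f(v).
    for name, *args in [k for k in db if k[0].endswith("^d")]:
        db.pop((name[:-2], *args), None)
    # Step 4: delete the f^d markers themselves.
    for key in [k for k in db if k[0].endswith("^d")]:
        del db[key]
    # Step 5: replace every marker f^a(v) by the fact f(v), keeping multiplicities.
    for key in [k for k in db if k[0].endswith("^a")]:
        count = db.pop(key)
        db[(key[0][:-2], *key[1:])] += count
    return db


db = Counter({("r", 1): 2, ("r", 2): 1})
print(dict(apply_update(db, deleted=[("r", 1)], added=[("r", 3), ("r", 3)])))
# {('r', 2): 1, ('r', 3): 2}
```

Because the deletions are carried out before the additions, a fact that is both deleted and added ends up in the database, which matches the order of the queries listed above.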

7 Reachability analysis of data-centric business processes

In this section we demonstrate the application of $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ in specification and analysis of data-centric business processes. First, we describe a general reachability and simulation framework, and then devote the rest of the section to an extended example specification.

So far, we have specified the rules for execution of a single DML expression. A business process, in general, executes a sequence of DML expressions according to some orchestration rules. A simple example of such rules which we use here, appropriate for a data-driven process, is that if a DML expression can be executed successfully then it can be chosen (non-deterministically) as the next command to be executed. Usually (cf. [40]) such data modifying operations are launched in response to some events, such as user actions which also provide input parameters for the command. In turn, their execution may trigger further events. Here, taking inspiration from [41, 42], we interpret some of the facts as triggering events, user input, and output events (we describe this in more detail later as a part of the example). This simplifies the simulation.

Thus, we specify a business process simply as a finite set $\Gamma$ of DML expressions in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ . During simulation, at each “business step” a DML expression is chosen non-deterministically and executed. An unsuccessful execution simply leaves the database of facts unchanged. Alternatively, during reachability analysis, which performs a breadth-first search through all possible evolutions of the process, it is more efficient to make the system get stuck on an unsuccessful step. This prunes spurious branches in the search tree.

More precisely, a set of DML expressions $\Gamma$ determines a rewriting system $\mathcal{R}_{\Sigma,\mathcal{D}}(\Gamma)$ defined to be the union of $\mathcal{R}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}(Q)$ ’s, $Q\in\Gamma$ , augmented with two rule schemas

\lambda^{\text{new}}_{Q}:\mathfrak{n}(F,F_{\mathbf{n}})\Rightarrow\mathrm{I}_{Q}(F,F_{\mathbf{n}}),\quad\lambda^{\text{fail}}_{Q}:\mathfrak{f}(F,F_{\mathbf{n}})\Rightarrow\mathrm{I}_{Q}(F,F_{\mathbf{n}}),

(31)

for all $Q\in\Gamma$ .

Rule schema $\lambda^{\text{new}}_{Q}$ chooses non-deterministically a new DML query for execution and rewrites into an initial state for this query if the execution of the previous one was successful. Similarly, $\lambda^{\text{fail}}_{Q}$ chooses a new DML query if the execution of the previous one failed. It is important to note that the failure of a randomly chosen DML expression does not usually mean that the business process execution is faulty: instead, it may simply mean that the given DML expression was not applicable at the moment. Since here the only way to know whether a DML expression is applicable is to run it, the rule $\lambda^{\text{fail}}_{Q}$ is necessary lest the simulation stop prematurely. On the other hand, unsuccessful executions do not change the database state, and thus are spurious, adding no useful information. This is why, when doing reachability analysis, which explores all possible paths of execution using breadth-first search (in contrast to simulations, where each simulation travels just a single execution path), it is better to drop the rule $\lambda^{\text{fail}}_{Q}$ .
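The simulation regime described above can be sketched in Python at the level of business steps. We abstract each DML query as a function returning the list of databases it can produce (an empty list meaning failure); this abstraction, the use of sets instead of multisets, and the toy queries `publish` and `close` are assumptions made purely for illustration.

```python
import random


def simulate(db, gamma, steps, rng):
    """Follow one execution path for a fixed number of business steps."""
    trace = [db]
    for _ in range(steps):
        query = rng.choice(gamma)        # lambda^new / lambda^fail: pick any Q in Gamma
        outcomes = query(db)
        if outcomes:                     # success: move to one of the possible results
            db = rng.choice(outcomes)
        # failure: the database is unchanged and the simulation simply continues
        trace.append(db)
    return trace


def publish(db):
    """Hypothetical query: add the fact offer unless it is already present."""
    return [] if ("offer",) in db else [db | {("offer",)}]


def close(db):
    """Hypothetical query: replace the fact offer by the fact closed."""
    return [(db - {("offer",)}) | {("closed",)}] if ("offer",) in db else []


gamma = [publish, close]
trace = simulate(frozenset(), gamma, steps=5, rng=random.Random(0))
for state in trace:
    print(sorted(state))
```

Without the analogue of $\lambda^{\text{fail}}_{Q}$ (the branch that keeps the database unchanged), the loop would have to stop at the first inapplicable query, even though other queries in $\Gamma$ might still be applicable.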

It is assumed that all DML queries $Q\in\Gamma$ are such that a successful execution of $Q$ consumes and emits a special fact $\sharp$ (of sort Fact) called a token. The token does not denote any real data, but rather facilitates a non-deterministic choice of user input. For instance, if the database contains facts $f(a_{1}),f(a_{2}),\ldots$ , where $a_{1},a_{2},\ldots$ are possible user inputs for some business step, then we can simulate the user’s choice and the execution of a further action $D(x)$ (based on this choice and expressed as a DML query with free variable $x$ storing the user’s decision) by using a DML expression of the form $\nabla[\sharp]_{0}\circ[f(x)]_{?}\mathbin{.}D(x)$ . If the query did not match and remove the token, then the action $D(x)$ would be executed for every possible user input. In the example described in this section, instead of a constant token we use tokens $\sharp:\text{\tt Nat}\rightarrow\text{\tt Fact}$ parametrized by a natural number. All DML queries consume a token with a non-zero parameter and emit a token with the parameter decreased by one. This permits limiting the number of “large business steps”, i.e., executions of DML queries, in the simulation or search procedure. Rewriting systems such as Maude permit limiting the number of rewriting steps in the search procedure. However, the execution of each DML query can take many rewriting steps, the number of which is not easy to estimate. Thus, it is not trivial to pass from the number of rewriting steps to the number of business steps (which are more natural in this context).

Given an initial database $F$ we start reachability analysis from the term $\mathfrak{n}(F\circ\sharp(k),C_{s_{1}}(\imath_{m_{1}}^{s_{1}})\circ\cdots\circ C_{s_{n}}(\imath_{m_{n}}^{s_{n}}))$ , where $k$ is the maximal depth of the search expressed in the number of business steps, $s_{1},\ldots,s_{n}$ are nominal sorts for which we need fresh values, and $m_{1},\ldots,m_{n}$ are such that values $\imath_{p_{i}}^{s_{i}}$ for $p_{i}\geq m_{i}$ , $i\in\{1,\ldots,n\}$ , do not occur in $F$ . In the case of reachability analysis we search for a term of the form $\mathfrak{n}(F,F_{\mathbf{n}})$ where the database $F$ satisfies some condition (either a desirable or an undesirable one).
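A token-bounded breadth-first search of this kind can be sketched as follows. As an illustrative assumption, each DML query is abstracted as a Python function returning the list of databases it can produce (empty on failure), the token $\sharp(k)$ is kept as a separate counter rather than as a fact, and the toy queries `publish` and `close` are hypothetical. Failing steps generate no successor, which corresponds to dropping $\lambda^{\text{fail}}_{Q}$ .

```python
from collections import deque


def reachable(init_db, gamma, k, goal):
    """Return the least number of business steps (at most k) needed to reach
    a database satisfying goal, or None if no such database is reachable."""
    start = (init_db, k)                 # k plays the role of the token parameter
    seen = {start}
    queue = deque([start])
    while queue:
        db, token = queue.popleft()
        if goal(db):
            return k - token
        if token == 0:                   # no token with a non-zero parameter: every query fails
            continue
        for query in gamma:
            for nxt in query(db):        # failed queries contribute no successors
                state = (nxt, token - 1)
                if state not in seen:
                    seen.add(state)
                    queue.append(state)
    return None


def publish(db):
    """Hypothetical query: add the fact offer unless it is already present."""
    return [] if ("offer",) in db else [db | {("offer",)}]


def close(db):
    """Hypothetical query: replace the fact offer by the fact closed."""
    return [(db - {("offer",)}) | {("closed",)}] if ("offer",) in db else []


def has_closed(db):
    return ("closed",) in db


print(reachable(frozenset(), [publish, close], k=7, goal=has_closed))  # 2
print(reachable(frozenset(), [publish, close], k=1, goal=has_closed))  # None
```

Counting depth in business steps, as the token does, sidesteps the problem of translating a bound on rewriting steps into a bound on executed queries.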

7.1 Example specification

We borrow an example from [28, Appendix C] to demonstrate the specification of a business process as a set of DML expressions in $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ . The example concerns the process of selecting and advertising restaurant dinner offers by employees of a mediating agency, and managing the corresponding bookings. The lifecycles of the two key artifact types, Offer and Booking, are presented as finite state machines in Figure 1. Each agent publishes exactly one restaurant offer: either a new one which has just arrived or one which was previously put on hold. The published offer is in the state available. An agent puts the offer he currently publishes on hold (state onHold) when picking up another offer for publication. Dashed arrows in Figure 1 indicate that an artifact entering a given state may trigger a state change in another artifact, e.g., there is a dashed arrow between the available state and the anonymous transition into the onHold state (in a distinct artifact of type Offer). An available offer may get closed (state closed), or be picked up by a customer (transition newBooking to state beingBooked). The latter triggers the creation of a new Booking artifact. A booking starts with a preliminary drafting phase (state drafting) in which the customer chooses dinner hosts (transition addHosts). After draft submission (which changes the state to submitted) the agent computes the price for the offer (transition determineProposal to state finalized) and the customer decides to either accept or reject the proposal, transitioning, respectively, to the accepted or canceled state. The acceptance may in some cases go through the toBeValidated state when additional validation is necessary.

Figure 1: Lifecycles of Offer and Booking artifacts presented as finite state machines [28, Figure 5] (see also [43, Figure 5])

States of offers and bookings are constants of sorts OState and BState, respectively, named like in Figure 1. We use the following nominal sorts for identifiers: Rest for restaurants; Person for customers, agents and hosts; Offer, Book and Url for offers, bookings, and url’s of finalized proposals, respectively. For facts we use the following constructors:

	$\displaystyle\text{\tt Rest}:\text{\tt Rest}\rightarrow\text{\tt Fact},\quad\text{\tt Agent},\text{\tt Cust}:\text{\tt Person}\rightarrow\text{\tt Fact},$
	$\displaystyle\text{\tt Offer}:\text{\tt Offer}\;\text{\tt OState}\;\text{\tt Rest}\;\text{\tt Person}\rightarrow\text{\tt Fact},\quad\text{\tt Book}:\text{\tt Book}\;\text{\tt BState}\;\text{\tt Offer}\;\text{\tt Person}\rightarrow\text{\tt Fact},$
	$\displaystyle\text{\tt Host}:\text{\tt Book}\;\text{\tt Person}\rightarrow\text{\tt Fact},\quad\text{\tt Prop}:\text{\tt Book}\;\text{\tt Url}\rightarrow\text{\tt Fact}.$

where facts $\text{\tt Rest}(r)$ , $\text{\tt Agent}(a)$ , $\text{\tt Cust}(c)$ indicate that $r$ , $a$ , and $c$ are, respectively, identifiers of a registered restaurant, agent, and customer. $\text{\tt Offer}(o,s,r,a)$ means that an offer $o$ in a state $s$ for a restaurant $r$ is managed by an agent $a$ . $\text{\tt Book}(b,s,o,c)$ means that booking $b$ in a state $s$ for a customer $c$ is related to an offer $o$ . A fact $\text{\tt Host}(b,p)$ indicates that a person $p$ is included as a host for booking $b$ . Finally, $\text{\tt Prop}(b,u)$ indicates that finalized proposal for booking $b$ , with details and prices, is available at the url $u$ .

We now specify selected transitions from Figure 1 in detail. Transition newOffer responsible for creation of new offers is implemented with the following DML query:

\nabla[\sharp(s(n))]_{0}\circ[C_{\text{\tt Offer}}(o)]_{\mathbf{n}}\circ[\text{\tt Agent}(a)]_{?}\circ[\text{\tt Rest}(r)]_{!}\mathbin{.}\bigl{(}\forall\;[\text{\tt Offer}(o^{\prime},\text{\tt beingBooked},r^{\prime},a)]_{?}\;.\;\bot\bigr{)}\\ \Rightarrow\bigl{(}\text{\tt Offer}(o,\text{\tt available},r,a)\rhd\bigl{(}\nabla[\text{\tt Offer}(o^{\prime},\text{\tt available},r^{\prime},a)]_{0}\mathbin{.}\text{\tt Offer}(o^{\prime},\text{\tt onHold},r^{\prime},a)\bigr{)}\rhd\sharp(n)\bigr{)}.

Above, $s:\text{\tt Nat}\rightarrow\text{\tt Nat}$ denotes the successor function. We assume that each DML query is executed against a database in which there is exactly one token matching $\sharp(s(n))$ , i.e., a token holding a number greater than zero. Thus, in the above, we first choose a single fresh offer identifier, and, non-deterministically, a single registered agent and a single restaurant. The token is marked with $[\_]_{0}$ , so it is removed from the database after matching (this guarantees that we choose no more than one agent, restaurant, and offer identifier). The query emits back a token with a number decreased by 1 ensuring the possibility (if this number is greater than zero) of executing a next query. The restaurant can be arbitrary (as long as it is registered in the system), however the agent must not manage an offer being booked, as described by the deterministic condition

\forall[\text{\tt Offer}(o^{\prime},\text{\tt beingBooked},r^{\prime},a)]_{?}\;.\;\bot

inside the above DML query. If this condition is not satisfied, the quantifier step fails, the token is returned to the database and a new matching is tried. Since $\text{\tt Agent}(a)$ is marked by $[\_]_{?}$ , we do not try the same agent again. If the correct matching is found, a fact describing new offer is added to the database. We also change the state of any available offer managed by the agent of the new offer to onHold.

An offer which was put on hold may be resumed by any agent who is not managing an offer which is currently being booked. The agent resuming an offer becomes the new manager of the offer:

\nabla[\sharp(s(n))\circ\text{\tt Offer}(o,\text{\tt onHold},r,a)]_{0}\circ[\text{\tt Agent}(a^{\prime})]_{?}\mathbin{.}\bigl{(}\forall\;[\text{\tt Offer}(o^{\prime},\text{\tt beingBooked},r^{\prime},a^{\prime})]_{?}\;.\;\bot\bigr{)}\\ \Rightarrow\bigl{(}\sharp(n)\rhd\text{\tt Offer}(o,\text{\tt available},r,a^{\prime})\\ \rhd\bigl{(}\nabla[\text{\tt Offer}(o^{\prime},\text{\tt available},r^{\prime},a^{\prime})]_{0}\mathbin{.}\text{\tt Offer}(o^{\prime},\text{\tt onHold},r^{\prime},a^{\prime})\bigr{)}\bigr{)}.

With the newBooking transition some offer $o$ changes state from available to beingBooked. It also triggers the creation of a new booking (with a fresh identifier) in the drafting state for the chosen offer $o$ on behalf of some registered customer:

\nabla[\sharp(s(n))\circ\text{\tt Offer}(o,\text{\tt available},r,a)]_{0}\circ[C_{\text{\tt Book}}(b)]_{\mathbf{n}}\circ[\text{\tt Cust}(c)]_{!}\\ \mathbin{.}\bigl{(}\text{\tt Offer}(o,\text{\tt beingBooked},r,a)\rhd\text{\tt Book}(b,\text{\tt drafting},o,c)\rhd\sharp(n)\bigr{)}.

The customer involved in booking can add dinner hosts one by one (see transition addHosts in Figure 1) as long as the booking is in the drafting stage. The added host can be either fresh or be present in the database as a host for another offer. We use separate DML queries for each of those cases. The first case (of a fresh host) is trivial:

\nabla[\sharp(s(n))]_{0}\circ[C_{\text{\tt Person}}(h)]_{\mathbf{n}}\circ[\text{\tt Book}(b,\text{\tt drafting},o,c)]_{!}\mathbin{.}\bigl{(}\sharp(n)\rhd\text{\tt Host}(b,h)\bigr{)}.

In the second case we have to ensure that we are not adding the same person twice:

\nabla[\sharp(s(n))]_{0}\circ[\text{\tt Book}(b,\text{\tt drafting},o,c)]_{!}\circ[\text{\tt Host}(b^{\prime},h)]_{?}\mathbin{.}\bigl{(}(\forall\;[\text{\tt Host}(b,h)]_{?}\mathbin{.}\bot)\Rightarrow(\sharp(n)\rhd\text{\tt Host}(b,h))\bigr{)}.

The submit action simply changes the state of the booking from drafting to submitted. Then, if the customer’s customized booking is infeasible, it can be rejected (reject transition in Figure 1, the implementation of which we omit for brevity’s sake). Otherwise, the final proposal (which includes cost, etc.) to the customer who owns the booking is created (transition determineProposal in Figure 1). The preparation of the proposal is abstracted as (1) creating the fresh url to the proposal, and (2) removing information about hosts (which is available at the new url). As before, we pick the booking non-deterministically using the token and appropriate pattern:

\nabla[\sharp(s(n))\circ\text{\tt Book}(b,\text{\tt submitted},o,c)]_{0}\circ[C_{\text{\tt Url}}(u)]_{\mathbf{n}}\mathbin{.}\\ \bigl{(}\text{\tt Prop}(b,u)\rhd\text{\tt Book}(b,\text{\tt finalized},o,c)\rhd\bigl{(}\nabla[\text{\tt Host}(b,h)]_{0}\mathbin{.}\mathit{\surd}\bigr{)}\rhd\sharp(n)\bigr{)}.

A finalized booking proposal for a restaurant $r$ can be accepted either immediately (with $\text{\tt accept}_{1}$ ) or after an additional confirmation (with $\text{\tt accept}_{2}$ ). The first case applies only to golden customers of a given restaurant $r$ , i.e., those who have successfully booked a dinner in $r$ at least $k$ times, for some fixed $k$ . Accepting a proposal changes the state of the offer to which the booking belongs to closed:

\nabla[\sharp(s(n))\circ\text{\tt Book}(b,\text{\tt finalized},o,c)\circ\text{\tt Offer}(o,\text{\tt beingBooked},r,a)]_{0}\circ[\text{\tt Cust}(c)]_{?}\mathbin{.}\\ \bigl{(}\exists\;[\text{\tt Offer}(o_{1},\text{\tt closed},r,a_{1})\circ\text{\tt Book}(b_{1},\text{\tt accepted},o_{1},c)\\ \circ\cdots\circ\text{\tt Offer}(o_{k},\text{\tt closed},r,a_{k})\circ\text{\tt Book}(b_{k},\text{\tt accepted},o_{k},c)]_{?}\mathbin{.}\top\bigr{)}\\ \Rightarrow\bigl{(}\text{\tt Book}(b,\text{\tt accepted},o,c)\rhd\text{\tt Offer}(o,\text{\tt closed},r,a)\rhd\sharp(n)\bigr{)}.

Remark 7.1

In our earlier work [15] we used almost the same example (with minor modifications) to illustrate an alternative formalism (cf. Section 1.1 of the current paper) in which queries, also implemented in a rewriting system but using meta-level features, are deterministic. This makes them behave like classical queries, but because of determinism it is not possible to simulate user input directly. Instead, a separate mechanism had to be introduced to simulate non-deterministic input choice. Secondly, DML expressions in [15] did not add or, more importantly, delete facts from the database directly. Instead, they returned pairs (which needed to be specified in the DML expression itself) of (multi)sets of facts: those to be deleted and those to be added. However, in this particular example (and we believe it is typical) the facts to be deleted are matched by parts of the patterns in the query. This (in [15]) led to code duplication, suggested the natural use of rewriting rules (which, of course, replace matched subterms), and ultimately led to the formalism described in this paper.

Remark 7.2

We have implemented both the syntax and semantics of $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ and $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ in Maude [44]. The implementation is available on the project’s website [45]. To test the implementation we used the specification of the business process described above (also available from [45]). The specification compiles into a Maude system module which contains definitions of 196 operators, 285 equations and 83 rewriting rules (in the actual implementation we used equations in place of deterministic rewriting rules).

Let us now describe a simple example of a reachability analysis with the specification just described. Let

\text{\tt initDB}:=\text{\tt Agent}(a_{1})\circ\text{\tt Agent}(a_{2})\circ\text{\tt Cust}(c_{1})\circ\text{\tt Cust}(c_{2})\circ\text{\tt Rest}(r_{1})\circ\text{\tt Rest}(r_{2})\circ\sharp(7)

be an initial database of facts. Let

\text{\tt initDBN}:=C(\imath^{\text{\tt Offer}}_{0})\circ C(\imath^{\text{\tt Book}}_{0})\circ C(\imath^{\text{\tt Url}}_{0})\circ C(\imath^{\text{\tt Person}}_{0})

be an initial multiset of fresh facts. Finally, let

\text{\tt initState}:=\mathfrak{n}(\text{\tt initDB},\text{\tt initDBN})

be an initial state. Note that since the token is parametrized by 7, we can make at most seven successful business steps from this state. We are interested in checking whether, starting from initState (whose database contains no bookings and no offers), we can reach in no more than 7 business steps a state in which there exists a closed offer (and an associated accepted booking). Formally, we want to reach a state matching

\mathfrak{n}(F\circ\text{\tt Offer}(o_{1},\text{\tt closed},r_{1},a_{1})\circ\text{\tt Book}(b_{1},\text{\tt accepted},o_{1},c_{1}),F_{\mathbf{n}}).

Using our implementation [45] we can easily check that a matching state is indeed reachable in 6 business steps (Maude reported 525165 actual rewritings in 2528 ms).

8 Conclusion

We have presented a multiset non-deterministic query and data manipulation language $\mathcal{Q}_{\Sigma,\mathcal{D}}$ based on conditional term rewriting. The intended application of this language is in specification, simulation and reachability analysis of data-centric business processes. However, the remarkable features of $\mathcal{Q}_{\Sigma,\mathcal{D}}$ , particularly non-determinism and the non-standard approach to variable binding, make it interesting in its own right. We have shown that non-determinism of queries is useful for simulating user choices, but we have also provided easily identifiable syntactic restrictions which ensure uniqueness of query results. Interestingly, this non-determinism leads to bisimulation-like definitions of logical equivalence between formulas. In the last section we demonstrated how sets of DML queries can be used to specify a business process, and we provided a simple framework for simulation and testing.

$\mathcal{Q}_{\Sigma,\mathcal{D}}$ is a multiset query language. Most formal query languages, including relational calculus and algebra, are based on sets. One under-appreciated fact is that SQL is really a multiset query language, and for a very good reason — removing duplicates is expensive. While this was not our primary reason to use multisets, we believe that using multiset languages encourages query design which avoids unnecessary expensive operations, and takes the complexity of query execution into account better than set-based languages.

The fact that closed $\mathcal{Q}_{\Sigma,\mathcal{D}}$ formulas are compiled to rewriting systems permits their symbolic execution using narrowing [16]. We intend to explore this possibility in future research. This is also one of the reasons why it was important to limit the use of conditional rules as much as possible: many implementations of narrowing (see e.g., [46]) do not permit narrowing with conditions.

We have implemented $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{cnd}}$ , $\mathcal{Q}_{\Sigma,\mathcal{D}}^{\mathbf{dml}}$ and a specification framework extending the one described at the beginning of Section 7 in Maude [46]. The implementation is available from [45]. It differs in a non-essential way from the one described in the present paper, but the code is extensively documented.

Acknowledgements

The author is grateful to the anonymous reviewers for their helpful remarks.

References

[1] Hull R. Artifact-Centric Business Process Models: Brief Survey of Research Results and Challenges. In: Meersman R, Tari Z (eds.), On the Move to Meaningful Internet Systems: OTM 2008. Springer Berlin Heidelberg, Berlin, Heidelberg, 2008 pp. 1152–1163. 10.1007/978-3-540-88873-4_17.
[2] Calvanese D, De Giacomo G, Montali M. Foundations of Data-aware Process Analysis: A Database Theory Perspective. In: Proceedings of the 32Nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS ’13. ACM, New York, NY, USA, 2013 pp. 1–12. 10.1145/2463664.2467796.
[3] van der Aalst WM. The application of Petri nets to workflow management. Journal of circuits, systems, and computers, 1998. 8(01):21–66. 10.1142/S0218126698000043.
[4] van der Aalst WM, Ter Hofstede AH. YAWL: yet another workflow language. Information systems, 2005. 30(4):245–275. 10.1016/j.is.2004.02.002.
[5] Rosa-Velardo F, de Frutos-Escrig D. Decidability and complexity of Petri nets with unordered data. Theoretical Computer Science, 2011. 412(34):4439–4451.
[6] Lasota S. Decidability border for Petri nets with data: WQO dichotomy conjecture. In: International Conference on Application and Theory of Petri Nets and Concurrency. Springer, 2016 pp. 20–36. doi:10.1007/978-3-319-39086-4_3.
[7] Montali M, Rivkin A. Model checking Petri nets with names using data-centric dynamic systems. Formal Aspects of Computing, 2016. 28(4):615–641. 10.1007/s00165-016-0370-6.
[8] Montali M, Rivkin A. DB-Nets: On the Marriage of Colored Petri Nets and Relational Databases. In: Transactions on Petri Nets and Other Models of Concurrency XII. Springer Berlin Heidelberg, Berlin, Heidelberg, 2017 pp. 91–118. 10.1007/978-3-662-55862-1_5.
[9] Montali M, Rivkin A. From DB-nets to Coloured Petri Nets with Priorities. In: International Conference on Applications and Theory of Petri Nets and Concurrency. Springer, 2019 pp. 449–469. doi:10.1007/978-3-030-21571-2_24.
[10] Meseguer J. Conditional rewriting logic as a unified model of concurrency. Theoretical computer science, 1992. 96(1):73–155. 10.1016/0304-3975(92)90182-F.
[11] Meseguer J, Rosu G. The rewriting logic semantics project. Theoretical Computer Science, 2007. 373(3):213 – 237. 10.1016/j.tcs.2006.12.018.
[12] Stehr MO, Meseguer J, Ölveczky PC. Rewriting Logic as a Unifying Framework for Petri Nets. In: Unifying Petri Nets: Advances in Petri Nets. Springer Berlin Heidelberg, Berlin, Heidelberg, 2001 pp. 250–303. 10.1007/3-540-45541-8_9.
[13] Padberg J, Schulz A. Model Checking Reconfigurable Petri Nets with Maude. In: Echahed R, Minas M (eds.), Graph Transformation. Springer International Publishing, Cham, 2016 pp. 54–70. 10.1007/978-3-319-40530-8_4.
[14] Kheldoun A, Barkaoui K, Ioualalen M. Formal verification of complex business processes based on high-level Petri nets. Information Sciences, 2017. 385:39–54. 10.1016/j.ins.2016.12.044.
[15] Zieliński B. A Query Language Based on Term Matching and Rewriting. Fundamenta Informaticae, 2019. 169:237–274. 10.3233/FI-2019-1845.
[16] Meseguer J, Thati P. Symbolic reachability analysis using narrowing and its application to verification of cryptographic protocols. Higher-Order and Symbolic Computation, 2007. 20(1-2):123–160. 10.1007/s10990-007-9000-6.
[17] Fay M. First-order unification in an equational theory. In: Proceedings of the 4th Workshop on Automated Deduction, Austin, Texas, 1979.
[18] Hullot JM. Canonical forms and unification. In: International Conference on Automated Deduction. Springer, 1980 pp. 318–334. doi:10.1007/3-540-10009-1_25.
[19] Alpuente M, Escobar S, Iborra J. Termination of narrowing revisited. Theoretical Computer Science, 2009. 410(46):4608–4625. doi:10.1016/j.tcs.2009.07.037.
[20] Zieliński B, Maślanka P. Relational Transition System in Maude. In: Beyond Databases, Architectures and Structures. Towards Efficient Solutions for Data Analysis and Knowledge Representation: 13th International Conference, BDAS 2017, Ustroń, Poland, May 30 - June 2, 2017, Proceedings. Springer International Publishing, Cham, 2017 pp. 497–511. 10.1007/978-3-319-58274-0_39.
[21] Roşu G, Ellison C, Schulte W. Matching Logic: An Alternative to Hoare/Floyd Logic. In: Algebraic Methodology and Software Technology. Springer Berlin Heidelberg, Berlin, Heidelberg, 2011 pp. 142–162. 10.1007/978-3-642-17796-5_9.
[22] Roşu G. Matching logic. arXiv:1705.06312, 2017.
[23] Stehr MO. CINNI-A Generic Calculus of Explicit Substitutions and its Application to $\lambda$ - $\varsigma$ -and $\pi$ -Calculi. Electronic Notes in Theoretical Computer Science, 2000. 36:70–92. 10.1016/S1571-0661(05)80125-2.
[24] Baader F. The description logic handbook: Theory, implementation and applications. Cambridge University Press, New York, NY, USA, 2003. ISBN:0-521-78176-0.
[25] De Giacomo G, De Masellis R, Rosati R. Verification of conjunctive artifact-centric services. International Journal of Cooperative Information Systems, 2012. 21(02):111–139. doi:10.1142/S0218843012500025.
[26] Hariri BB, Calvanese D, De Giacomo G, De Masellis R, Felli P. Foundations of relational artifacts verification. In: International Conference on Business Process Management. Springer, 2011 pp. 379–395. doi:10.1007/978-3-642-23059-2_28.
[27] Calvanese D, Montali M, Patrizi F, De Giacomo G. Description logic based dynamic systems: modeling, verification, and synthesis. In: Proceedings of the 24th International Conference on Artificial Intelligence. AAAI Press, 2015 pp. 4247–4253.
[28] Abdulla PA, Aiswarya C, Atig MF, Montali M, Rezine O. Recency-Bounded Verification of Dynamic Database-Driven Systems. In: Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS ’16. ACM, New York, NY, USA, 2016 pp. 195–210. doi:10.1145/2902251.2902300.
[29] Chen-Burger YH, Robertson D. Automating business modelling: a guide to using logic to represent informal methods and support reasoning. Springer Science & Business Media, 2006. doi:10.1007/b138799.
[30] Merouani H, Mokhati F, Seridi-Bouchelaghem H. Formalizing Artifact-Centric Business Processes - Towards a Conformance Testing Approach. In: Proceedings of the 16th International Conference on Enterprise Information Systems. 2014 pp. 368–374. doi:10.5220/0004951803680374.
[31] McCarthy J, Hayes PJ. Some philosophical problems from the standpoint of artificial intelligence. Readings in artificial intelligence, 1969. pp. 431–450.
[32] Deutsch A, Li Y, Vianu V. Verifas: a practical verifier for artifact systems. Proceedings of the VLDB Endowment, 2017. 11(3):283–296. doi:10.14778/3157794.3157798.
[33] Deutsch A, Li Y, Vianu V. Verification of hierarchical artifact systems. ACM Transactions on Database Systems (TODS), 2019. 44(3):1–68. doi:10.1145/3321487.
[34] Calvanese D, Ghilardi S, Gianola A, Montali M, Rivkin A. Formal modeling and SMT-based parameterized verification of data-aware BPMN. In: International Conference on Business Process Management. Springer, 2019 pp. 157–175. doi:10.1007/978-3-030-26619-6_12.
[35] Seco JC, Debois S, Hildebrandt T, Slaats T. RESEDA: Declaring live event-driven computations as REactive SEmi-structured DAta. In: 2018 IEEE 22nd International enterprise distributed object computing conference (EDOC). IEEE, 2018 pp. 75–84. doi:10.1109/EDOC.2018.00020.
[36] Meseguer J. Membership algebra as a logical framework for equational specification. In: Recent Trends in Algebraic Development Techniques. Springer Berlin Heidelberg, Berlin, Heidelberg, 1998 pp. 18–61. doi:10.1007/3-540-64299-4_26.
[37] Huet G. Confluent reductions: Abstract properties and applications to term rewriting systems. Journal of the ACM (JACM), 1980. 27(4):797–821.
[38] Thielscher M. Introduction to the Fluent Calculus. Electronic Transactions on Artificial Intelligence, 1998. 2(3-4):179–192.
[39] Ochremiak J. Nominal sets over algebraic atoms. In: International Conference on Relational and Algebraic Methods in Computer Science. Springer, 2014 pp. 429–445. doi:10.1007/978-3-319-06251-8_26.
[40] Hull R, Damaggio E, De Masellis R, Fournier F, Gupta M, Heath III FT, Hobson S, Linehan M, Maradugu S, Nigam A, et al. Business artifacts with guard-stage-milestone lifecycles: managing artifact interactions with conditions and events. In: Proceedings of the 5th ACM international conference on Distributed event-based system. ACM, 2011 pp. 51–62. doi:10.1145/2002259.2002270.
[41] Abiteboul S, Vianu V, Fordham B, Yesha Y. Relational Transducers for Electronic Commerce. In: Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS ’98. ACM, New York, NY, USA. ISBN 0-89791-996-3, 1998 pp. 179–187. doi:10.1145/275487.275507.
[42] Abiteboul S, Vianu V, Fordham B, Yesha Y. Relational transducers for electronic commerce. Journal of Computer and System Sciences, 2000. 61(2):236–269. doi:10.1006/jcss.2000.1708.
[43] Abdulla PA, Aiswarya C, Atig MF, Montali M, Rezine O. Recency-bounded verification of dynamic database-driven systems (extended version). arXiv preprint arXiv:1604.03413, 2016.
[44] Clavel M, Durán F, Eker S, Lincoln P, Martí-Oliet N, Meseguer J, Talcott C. The Maude 2.0 System. In: Nieuwenhuis R (ed.), Rewriting Techniques and Applications (RTA 2003), number 2706 in Lecture Notes in Computer Science. Springer-Verlag, 2003 pp. 76–87. doi:10.1007/3-540-44881-0_7.
[45] Zieliński B. Nondeterministic Rewriting Query Language (NDRQL). Project website, http://ki.wfi.uni.lodz.pl/ndrql/.
[46] Clavel M, Durán F, Eker S, Escobar S, Lincoln P, Martí-Oliet N, Meseguer J, Talcott C. Maude Manual (Version 2.7.1), 2016.
