This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

An Algebraic Graph Transformation Approach
for RDF and SPARQL

Dominique Duval LJK, CNRS and Univ. Grenoble Alpes, Grenoble, France [email protected] LIG, CNRS and Univ. Grenoble Alpes, Grenoble, France LIG, CNRS and Univ. Grenoble Alpes, Grenoble, France    Rachid Echahed LIG, CNRS and Univ. Grenoble Alpes, Grenoble, France [email protected] LIG, CNRS and Univ. Grenoble Alpes, Grenoble, France    Frédéric Prost LIG, CNRS and Univ. Grenoble Alpes, Grenoble, France [email protected]
Abstract

We consider the recommendations of the World Wide Web Consortium (W3C) about RDF framework and its associated query language SPARQL. We propose a new formal framework based on category theory which provides clear and concise formal definitions of the main basic features of RDF and SPARQL. We define RDF graphs as well as SPARQL basic graph patterns as objects of some nested categories. This allows one to clarify, in particular, the role of blank nodes. Furthermore, we consider basic SPARQL CONSTRUCT and SELECT queries and formalize their operational semantics following a novel algebraic graph transformation approach called POIM.

1 Introduction

Mathematical semantics of computer science languages has been advocated since early 1970’s. It allows one to give precise meaning of syntactical objects and paves the way for involved reasoning methods such as modularity, compositionality, security and verification techniques, to quote a few. Nowadays, graph databases are becoming a very influential technology in our society. Mastering programming languages involved in the encoding of such graph data is a necessity to elaborate robust modern data management systems. Relational algebra [7] was the main mathematical foundation underlying oldy SQL-like formalisms for databases. However, with the advent of new graph oriented formalisms such as the most recent recommendations of the World Wide Web Consortium (W3C) about the Resource Description Framework (RDF) [20] and the associated query language SPARQL [19], there is a clear need of an alternative to relational algebra which copes with this change in data encodings, see e.g., [5, 16, 13]. In this paper, we consider RDF and SPARQL languages and propose a new mathematical semantics of a kernel of these formalisms within algebraic graph transformations setting.

RDF graphs are the key data structure in RDF. In [20, Section 3], an RDF graph is defined as a set of RDF triples, where an RDF triple has the form (subject,predicate,object)(subject,predicate,object). The subject is either an IRI (Internationalized Resource Identifier) or a blank node, the predicate is an IRI and the object is either an IRI, a literal (denoting a value such as a string, a number or a date) or a blank node. Blank nodes are arbitrary elements as long as they differ from IRIs and literals and they do not have any internal structure: they are used for indicating the existence of a thing and the blank node identifiers are locally scoped. For instance, let us consider a toy database, TdataT_{data}, consisting of the following four triples Tdata={(Alice,knows,Bob),(Tom,knows,Dave),(Bob,knows,blank1),(blank1,knows,Alice)}T_{data}=\{(Alice,knows,Bob),(Tom,knows,Dave),(Bob,knows,blank_{1}),(blank_{1},knows,Alice)\}. The two first triples say that Alice knows Bob and Tom knows Dave whereas the last two triples say that Bob knows someone, represented by the blank node blank1blank_{1}, who knows Alice. Notice that a predicate in an RDF triple cannot be a blank. For example, a triple such as (Paul,blank2,Henry)(Paul,blank_{2},Henry) standing for “there is some relationship between Paul and Henry” is not allowed in RDF, but only in generalized RDF [20, Section 7]. Following the theoretical point of view we propose in this paper, there is no harm to consider blank predicates within RDF triples. We thus consider data graphs in a more general setting including RDF graphs.

The query language SPARQL for RDF databases is based on basic graph patterns, which are kinds of RDF graphs with variables [19, Section 2]. In this paper, we consider query graphs which generalize basic graph patterns by allowing blanks to be predicates. The SPARQL query processor searches for triples within a given RDF database which match the triple patterns in the given basic graph pattern, and returns a multiset of solutions or an RDF graph. Considering basic graph patterns, one may wonder what is the difference between variables and blank nodes. SPARQL specifications in [19, Section 4.1.4] suggest similarities between them, whereas the opposite is made in [19, Section 16.2]. In the formalization of SPARQL we propose, blank nodes and variables are clearly distinguished by their respective roles in the definition of morphisms.

In the SPARQL recommendation [19], the SELECT query form is described lengthily. This query form can be compared to the SELECT query form of SQL, which returns a multiset of solutions. In contrast, the CONSTRUCT query form returns an RDF graph. Let us consider again the previous toy database TdataT_{data} and assume we formulate a CONSTRUCT query that constructs triples of the form R=(x,acquaintedWith,z)R=(x,acquaintedWith,z) every time there exists a third party yy such that the following condition, which we call LL, is satisfied : (x,knows,y)(x,knows,y) and (y,knows,z)(y,knows,z). Then, LL and RR are query graphs with variables x,yx,y and zz. They intuitively stand for the left-hand and right-hand sides of a rule representing the considered CONSTRUCT query. To perform such query, one should consider all matches, mm, of the condition LL against the database TdataT_{data} and for each match mm, create a new triple (m(x),acquaintedWith,m(z))(m(x),acquaintedWith,m(z)). Starting from database, TdataT_{data}, this process yields the following result

H={(Alice,acquaintedWith,blank1),(blank1,acquaintedWith,Bob),(Bob,acquaintedWith,Alice)}H=\{(Alice,acquaintedWith,blank_{1}),(blank_{1},acquaintedWith,Bob),(Bob,acquaintedWith,Alice)\}

From graph transformation point of view, TdataT_{data} is a host graph to be transformed by a rule whose left-hand and right-hand sides are respectively LL and RR. Notice that the resulting graph HH does not contain the non-matched triple (Tom,knows,Dave)(Tom,knows,Dave). This means that the considered transformations are not necessarily local. In addition, the graph HH gathers all possible results triggered by the matches of LL against the host graph TdataT_{data}. This toy example is considered in Example 28.

Following our formalization, the CONSTRUCT query form, which is described very shortly in [19, Section 16.2], is more fundamental than the SELECT query form. Actually, we start by proposing an operational semantics for CONSTRUCT queries based on a new approach of algebraic graph transformations which we call POIM and we show afterward how SELECT queries can be easily encoded as CONSTRUCT queries. This new POIM approach represents a CONSTRUCT query as a rule of the form LKRL\rightarrow K\leftarrow R where L,KL,K and RR are basic graph patterns, and a rewrite step is made of a pushout followed by an image factorization. The result of a CONSTRUCT query is the outcome of the transformation one obtains when running the above rule against an RDF database. It happens that such rules and rewrite techniques can be used also to encode the solutions computed by SELECT query forms. As said earlier, the involved graph transformations are not local in the sense that only query answers should be computed out of the graph database. All parts which are not matched are deleted. Classical graph transformation techniques such as Double Pushout [9] and Single Pushout [10] or even more RDF oriented transformations like MPOC-PO [6] are not best recommended in this case (cf. Section 6 for a comparison with related work).

The paper is organized as follows. Section 2 defines the objects and the morphisms of the categories of data graphs and query graphs. Section 3 introduces the POIM algebraic transformation. In Section 4, we define two different operational semantics for CONSTRUCT queries and prove their equivalence. We first define a high-level calculus as a mere application of the POIM transformation. Then we propose a low-level calculus which is defined by means of several applications of the POIM transformation followed by a “merging” process. Both calculi implement faithfully the SPARQL semantics for CONSTRUCT queries (Theorem 21). In Section 5, we show how the POIM transformation can be used to define a novel operational semantics of the SELECT queries. This semantics, which is faithful to SPARQL definitions (Theorem 40), is obtained by an original translation of each SELECT query into a CONSTRUCT query. Concluding remarks and related work are discussed in Section 6.

2 Graphs of Triples

The set of IRIs, denoted 𝐼𝑟𝑖\mathit{Iri}, and the set of literals, denoted 𝐿𝑖𝑡\mathit{Lit}, with its usual operations, are defined in [20]. Essentially, an IRI (Internationalized Resource Identifier) is an internet address and a literal denotes a value such as a string, a number or a date. The sets 𝐼𝑟𝑖\mathit{Iri} and 𝐿𝑖𝑡\mathit{Lit} are disjoint. In addition, let BB be a countably infinite set, disjoint from 𝐼𝑟𝑖\mathit{Iri} and 𝐿𝑖𝑡\mathit{Lit}. The elements of BB are called blanks. According to [20, Section 3.1], an RDF graph is a set of RDF triples, where an RDF triple consists of three components: the subject, which is an IRI or a blank node; the predicate, which is an IRI; and the object, which is an IRI, a literal or a blank node. The set of nodes of an RDF graph is the set of subjects and objects of triples in the graph. Using set-theoretic notations, this can be expressed as follows: let 𝑇𝑟=(𝐼𝑟𝑖B)×𝐼𝑟𝑖×(𝐼𝑟𝑖𝐿𝑖𝑡B)\mathit{Tr}=(\mathit{Iri}\cup B)\times\mathit{Iri}\times(\mathit{Iri}\cup\mathit{Lit}\cup B), then an RDF triple is an element of 𝑇𝑟\mathit{Tr} and an RDF graph is a subset of 𝑇𝑟\mathit{Tr}. Let us also consider the following extension of RDF [20, Section 7]: A generalized RDF triple is a triple having a subject, a predicate, and object, where each can be an IRI, a blank node or a literal. A generalized RDF graph is a set of generalized RDF triples. Let I=𝐼𝑟𝑖𝐿𝑖𝑡I=\mathit{Iri}\cup\mathit{Lit}, so that a generalized RDF triple is an element of (IB)3(I\cup B)^{3} and a generalized RDF graph is a subset of (IB)3(I\cup B)^{3}.

Let VV be a countably infinite set disjoint from 𝐼𝑟𝑖\mathit{Iri}, 𝐿𝑖𝑡\mathit{Lit} and BB. The elements of VV are called variables. According to [19, Section 2] a set of triple patterns is called a basic graph pattern, where triple patterns are like RDF triples except that each of the subject, predicate and object may be a variable. Let 𝑇𝑟V=(𝐼𝑟𝑖BV)×(𝐼𝑟𝑖V)×(𝐼𝑟𝑖𝐿𝑖𝑡BV)\mathit{Tr}_{V}=(\mathit{Iri}\cup B\cup V)\times(\mathit{Iri}\cup V)\times(\mathit{Iri}\cup\mathit{Lit}\cup B\cup V), then a triple pattern is an element of 𝑇𝑟V\mathit{Tr}_{V} and a basic graph pattern is a subset of 𝑇𝑟V\mathit{Tr}_{V}. Since 𝑇𝑟V\mathit{Tr}_{V} is a subset of (IBV)3(I\cup B\cup V)^{3}, each basic graph pattern is a subset of (IBV)3(I\cup B\cup V)^{3}.

RDF graphs and basic graph patterns are generalized in Definition 2 as data graphs and query graphs respectively, are both relying on Definition 1.

Definition 1.

For each set AA, the triples on AA are the elements of A3A^{3}. For each triple t=(s,p,o)t=(s,p,o) on AA the elements ss, pp and oo of AA are called respectively the subject, the predicate and the object of tt. A graph on AA is a set of triples on AA, i.e. a subset of A3A^{3}. For each graph TT on AA, the subset of AA made of the subjects, predicates and objects of TT is called the set of attributes of TT and is denoted |T||T|; it follows that TT is a subset of |T|3|T|^{3}. Let TT and TT^{\prime} be two graphs on AA. A morphism a:TTa:T\to T^{\prime} is a map such that there is a map M:|T||T|M:|T|\to|T^{\prime}| such that aa is the restriction of M3M^{3} to TT. Then MM is uniquely determined by aa and will be denoted by |a||a|. This yields the category of graphs on AA, denoted 𝐆(A)\mathbf{G}(A). We say that a morphism a:TTa:T\to T^{\prime} of graphs on AA fixes a subset CC of AA if |a|(x)=x|a|(x)=x for each xx in |T|C|T|\cap C. For each subset CC of AA, the subcategory of 𝐆(A)\mathbf{G}(A) made of the graphs on AA with the morphisms fixing CC is denoted 𝐆C(A)\mathbf{G}_{C}(A).

Thus, by mapping aa to |a||a| we get a one-to-one correspondence between the morphisms a:TTa:T\to T^{\prime} of graphs on AA and the maps M:|T||T|M:|T|\to|T^{\prime}| such that M3(T)TM^{3}(T)\subseteq T^{\prime}.

An isomorphism (i.e., an invertible morphism) in 𝐆(A)\mathbf{G}(A) is a morphism a:TTa:T\to T^{\prime} of graphs on AA such that |a|:|T||T||a|:|T|\to|T^{\prime}| is a bijection and a(T)=Ta(T)=T^{\prime}. A morphism aa fixing CC is determined by the restriction of the map |a||a| to |T|C¯|T|\cap\overline{C}, where C¯=AC\overline{C}=A\setminus C. An isomorphism aa in 𝐆C(A)\mathbf{G}_{C}(A) is a morphism a:TTa:T\to T^{\prime} of graphs on AA such that |a||a| is the identity on |T|C|T|\cap C and a bijection between |T|C¯|T|\cap\overline{C} and |T|C¯|T^{\prime}|\cap\overline{C} and a(T)=Ta(T)=T^{\prime}. The notions of inclusion, subgraph, image and union for graphs on AA are defined as inclusion, subset, image and union for subsets of A3A^{3}.

Definition 2.

Let II, BB and VV be three pairwise distinct countably infinite sets, called respectively the sets of resource identifiers, blanks and variables. Let IB=IB{IB}=I\cup B, IV=IV{IV}=I\cup V and IBV=IBV{IBV}=I\cup B\cup V. The category of data graphs is 𝐃=𝐆(IB)\mathbf{D}=\mathbf{G}({IB}) and for each subset CC of IB{IB} the category of data graphs fixing CC is the subcategory 𝐃C=𝐆C(IB)\mathbf{D}_{C}=\mathbf{G}_{C}({IB}) of 𝐃\mathbf{D}. The category of query graphs is 𝐐=𝐆(IBV)\mathbf{Q}=\mathbf{G}({IBV}) and for each subset CC of IBV{IBV} the category of query graphs fixing CC is the subcategory 𝐐C=𝐆C(IBV)\mathbf{Q}_{C}=\mathbf{G}_{C}({IBV}) of 𝐐\mathbf{Q}.

Thus, since I=𝐼𝑟𝑖𝐿𝑖𝑡I=\mathit{Iri}\cup\mathit{Lit}, the RDF graphs are the data graphs where only nodes can be blanks and only nodes that are not subjects can be literals, and the RDF terms of an RDF graph are its attributes when it is seen as a data graph. Then the isomorphisms of RDF graphs, as defined in [20, Section 3.6.], are the isomorphisms in the category 𝐃I\mathbf{D}_{I} of data graphs fixing II: indeed, two data graphs G1G_{1} and G2G_{2} are isomorphic in 𝐃I\mathbf{D}_{I} if and only if they differ only by the names of their blanks. For each data graph TT, let |T|I=|T|I|T|_{I}=|T|\cap I and |T|B=|T|B|T|_{B}=|T|\cap B, so that |T||T| is the disjoint union of |T|I|T|_{I} and |T|B|T|_{B}. Similarly, the basic graph patterns of SPARQL are the query graphs where only nodes can be blanks and only nodes that are not subjects can be literals. For each query graph TT, let |T|I=|T|I|T|_{I}=|T|\cap I, |T|B=|T|B|T|_{B}=|T|\cap B and |T|V=|T|V|T|_{V}=|T|\cap V, so that |T||T| is the disjoint union of |T|I|T|_{I}, |T|B|T|_{B} and |T|V|T|_{V}.

Morphisms of graphs can be used, for instance, for substituting the variables of a query graph (Definition 3) or for interpreting a data graph in a universe of discourse (Definition 4).

Definition 3.

A match from a query graph LL to a data graph GG is a morphism of query graphs from LL to GG which fixes II. The set of matches from LL to GG is denoted 𝑀𝑎𝑡𝑐ℎ(L,G)\mathit{Match}(L,G) and the set of all matches from LL to any data graph is denoted 𝑀𝑎𝑡𝑐ℎ(L)\mathit{Match}(L).

Thus, a match fixes each resource identifier and it maps a variable or a blank to a resource identifier or a blank.

The interpretations of an RDF graph are also kinds of morphisms, see Definition 4. Note that this will not be used later in this paper. We define an interpretation of a data graph GG in a universe of discourse UU by generalizing the definition of a morphism, according to [20, Section 1.2.]: Any IRI or literal denotes something in the world (the “universe of discourse”). These things are called resources. The predicate itself is an IRI and denotes a property, that is, a resource that can be thought of as a binary relation. Recall that the binary relations on a set RR are the subsets of R2R^{2}. It may happen that a binary relation on RR is itself an element of RR, this is important for understanding Definition 4 and the semantics of RDF in general.

Definition 4.

Given a set RR and a subset PP of R2R^{2} made of binary relations on RR, let UU be the set of triples (s,p,o)(s,p,o) in R3R^{3} such that pPp\in P and (s,o)p(s,o)\in p. The universe of discourse with RR as set of resources and PP as set of properties is the graph UU on RR. Given a universe of discourse UU on a set RR and a map MI:IRM_{I}:I\to R, an interpretation of a data graph GG is a map i:GUi:G\to U such that i=M3i=M^{3} for a map M:|G||U|M:|G|\to|U| which extends MIM_{I}.

In this paper, we consider categories 𝐃C\mathbf{D}_{C} and 𝐐C\mathbf{Q}_{C} for various subsets CC of IB{IB} and IBV{IBV} respectively. It will always be the case that CC contains II, so that we can say that resource identifiers have a “global scope”. In contrast, blanks have a “local scope”: in the basic part of RDF and SPARQL considered in this paper, the scope of a blank node is restricted to one data graph or one query graph. The note about blank node identifiers in [20, Section 3.4] distinguishes two kinds of syntaxes for RDF: an abstract syntax where blank nodes do not have identifiers and concrete syntaxes where blank nodes have identifiers. In our approach a blank is an attribute, which corresponds to a concrete syntax, and the abstract syntax is obtained by considering data graphs as objects of the category 𝐃I\mathbf{D}_{I} up to isomorphism, so that any blank node can be changed for a fresh blank node if needed.

Notation 5.

In the examples variables are denoted as ?x, ?y, … and blanks as _:b, _:c, … The IRIs are written in an abbreviated way as :alice (the address of Alice’s web page), :knows (the address where the “knows” relation is described), … Each triple (s,p,o)(s,p,o) is written as s p o and a dot is used for separating triples inside an RDF graph or a basic graph pattern.

Example 6.

Consider three RDF graphs G1G_{1}, G2G_{2} and G3G_{3} as follows. They are pairwise distinct, thus pairwise non-isomorphic in 𝐃IB\mathbf{D}_{{IB}}. Graphs G1G_{1} and G2G_{2} are isomorphic in 𝐃I\mathbf{D}_{I}. In the RDF semantics the name of blanks does not matter, so that both G1G_{1} and G2G_{2} mean that “Alice knows someone” and “someone knows Bob”. Graph G3G_{3} is not isomorphic to G1G_{1} (thus nor to G2G_{2}) in 𝐃I\mathbf{D}_{I}, it means more precisely that “Alice knows someone who knows Bob”.

 G1G_{1} 

   

  :alice :knows _:b . 

  _:c :knows :bob 

   

 

 G2G_{2} 

   

  :alice :knows _:c . 

  _:b :knows :bob 

   

 

 G3G_{3} 

   

  :alice :knows _:b. 

  _:b :knows :bob 

   

 

Now consider basic graph patterns G3G_{3} to G8G_{8} (each RDF graph can be seen as a basic graph pattern without variables, like G3G_{3} and G5G_{5}). They are pairwise non-isomorphic in 𝐐IBV\mathbf{Q}_{{IBV}} because they are pairwise distinct. In 𝐐IV\mathbf{Q}_{{IV}} only G7G_{7} and G8G_{8} are isomorphic. In 𝐐I\mathbf{Q}_{I} these query graphs belong to two different isomorphism classes: on one side G3G_{3} and G4G_{4} are isomorphic and on the other side G5G_{5}, G6G_{6}, G7G_{7} and G8G_{8} are isomorphic.

 G3G_{3} 

   

  :alice :knows _:b. 

  _:b :knows :bob 

   

 

 G4G_{4} 

   

  :alice :knows ?x. 

  ?x :knows :bob 

   

 

 G5G_{5} 

   

  :alice :knows _:b. 

  _:c :knows :bob 

   

 

 G6G_{6} 

   

  :alice :knows ?x. 

  ?y :knows :bob 

   

 

 G7G_{7} 

   

  :alice :knows ?x. 

  _:b :knows :bob 

   

 

 G8G_{8} 

   

  :alice :knows ?x. 

  _:c :knows :bob 

   

 

Assumption 7.

From now on AA is a countably infinite set, CC a subset of AA, C¯=AC\overline{C}=A\setminus C the complement of CC in AA, and we assume that both CC and C¯\overline{C} are countably infinite.

Remark 8.

Since C¯\overline{C} is countably infinite, when dealing with a finite number of finite graphs on AA it is always possible to find a fresh attribute outside CC, i.e., an element of C¯\overline{C} that is not an attribute of any of the given graphs. We will use repeatedly the following consequence of this fact:
Given a graph TT on AA, if any attribute of TT in C¯\overline{C} is replaced by any fresh attribute outside CC the result is a graph TT^{\prime} on AA that is isomorphic to TT in 𝐆C(A)\mathbf{G}_{C}(A). Such a TT^{\prime} exists when TT is finite.

Now let us focus on some kinds of colimits of graphs on AA: coproducts in Proposition 9 and pushouts in Proposition 10. Recall that colimits in any category are defined up to isomorphism in this category. Remember that a morphism a:TTa:T\to T^{\prime} in 𝐆(A)\mathbf{G}(A) and a map M:|T||T|M:|T|\to|T^{\prime}| are such that M(x)=|a|(x)M(x)=|a|(x) for each attribute x|T|x\in|T| if and only if M3(t)=a(t)M^{3}(t)=a(t) for each triple tTt\in T.

Proposition 9.

Given graphs T1,,TkT_{1},...,T_{k} on AA such that |Ti||Tj|C|T_{i}|\cap|T_{j}|\subseteq C for each iji\neq j, the union T1TkT_{1}\cup...\cup T_{k} is a coproduct of T1,,TkT_{1},...,T_{k} in 𝐆C(A)\mathbf{G}_{C}(A). Given any finite graphs T1,,TkT_{1},...,T_{k} on AA there are graphs T1,,TkT^{\prime}_{1},...,T^{\prime}_{k} on AA such that TiT^{\prime}_{i} is isomorphic to TiT_{i} in 𝐆C(A)\mathbf{G}_{C}(A) for each ii and |Ti||Tj|C|T^{\prime}_{i}|\cap|T^{\prime}_{j}|\subseteq C for each iji\neq j, and then the union T1TkT^{\prime}_{1}\cup...\cup T^{\prime}_{k} is a coproduct of T1,,TkT_{1},...,T_{k} in 𝐆C(A)\mathbf{G}_{C}(A).

Proof.

First, assume that |Ti||Tj|C|T_{i}|\cap|T_{j}|\subseteq C for each iji\neq j. Consider morphisms ai:TiTa_{i}:T_{i}\to T in 𝐆C(A)\mathbf{G}_{C}(A) for i=1,,ki=1,...,k and the maps |ai|:|Ti||T||a_{i}|:|T_{i}|\to|T|. Note that |T1Tk|=|T1||Tk||T_{1}\cup...\cup T_{k}|=|T_{1}|\cup...\cup|T_{k}| and that |T1||Tk||T_{1}|\cup...\cup|T_{k}| is the disjoint union of the sets |Ti|C|T_{i}|\!\setminus\!C for i=1,,ki=1,...,k and (|T1||Tk|)C(|T_{1}|\cup...\cup|T_{k}|)\cap C, because of the assumption |Ti||Tj|C|T_{i}|\cap|T_{j}|\subseteq C for each iji\neq j. Thus we can define a map M:|T1Tk||T|M:|T_{1}\cup...\cup T_{k}|\to|T| by: M(x)=|ai|(x)M(x)=|a_{i}|(x) for each ii and each x|Ti|Cx\in|T_{i}|\!\setminus\!C and M(x)=xM(x)=x for each x(|T1||Tk|)Cx\in(|T_{1}|\cup...\cup|T_{k}|)\cap C. Then MM coincides with |ai||a_{i}| on |Ti||T_{i}| for each ii. Thus for each tTit\in T_{i} we have M3(t)=ai(t)M^{3}(t)=a_{i}(t), which proves that the image of T1TkT_{1}\cup...\cup T_{k} by M3M^{3} is in TT and that the restriction of M3M^{3} defines a morphism a:T1TkTa:T_{1}\cup...\cup T_{k}\to T in 𝐆C(A)\mathbf{G}_{C}(A) which coincides with aia_{i} on TiT_{i} for each ii. Unicity is clear. Now, the last statement, about any finite graphs T1,,TkT_{1},...,T_{k} on AA, is a consequence of Remark 8. ∎

Proposition 10.

Let l:LKl:L\to K and m:LGm:L\to G be morphisms of graphs on AA such that LL and KK are finite, ll is an inclusion and mm fixes CC. Let us assume that |K||G|C|K|\cap|G|\subseteq C (this is always possible up to isomorphism in 𝐆C(A)\mathbf{G}_{C}(A), by Remark 8). Let N:|K||G||KL|N:|K|\to|G|\cup|K\setminus L| be such that N(x)=|m|(x)N(x)=|m|(x) for x|L|x\in|L| and N(x)=xN(x)=x otherwise. Let D=GN3(K)D=G\cup N^{3}(K), let n:KDn:K\to D be the restriction of N3N^{3} and g:GDg:G\to D the inclusion. Then |D|=|G||KL||D|=|G|\cup|K\setminus L| and the square (l,m,n,g)(l,m,n,g) is a pushout square in 𝐆C(A)\mathbf{G}_{C}(A).

This means that DD is a kind of “union of GG and KK over LL”, however it is not the case that DD is the union of GG and KLK\setminus L, in general. It is always the case that D=GD2D=G\cup D_{2} where D2=N3(KL)D_{2}=N^{3}(K\setminus L) but in general N3N^{3} is not the identity on KLK\setminus L and moreover GG and D2D_{2} are not disjoint.

Proof.

From D=GN3(K)D=G\cup N^{3}(K) we get |D|=|G||N3(K)||D|=|G|\cup|N^{3}(K)|, and since |N3(K)|=N(|K|)=N(|L||KL|)=N(|L|)N(|KL|)=|m|(|L|)|KL||N^{3}(K)|=N(|K|)=N(|L|\cup|K\!\setminus\!L|)=N(|L|)\cup N(|K\!\setminus\!L|)=|m|(|L|)\cup|K\!\setminus\!L| with |m|(|L|)|G||m|(|L|)\subseteq|G| we get |D|=|G||KL||D|=|G|\cup|K\!\setminus\!L|. The definition of nn implies that gm=nlg\circ m=n\circ l. Now let a:GTa:G\to T and b:KTb:K\to T be any morphisms in 𝐆C(A)\mathbf{G}_{C}(A) such that am=bla\circ m=b\circ l. First, let us focus on attributes. We have |g||m|=|n||l||g|\circ|m|=|n|\circ|l| and |a||m|=|b||l||a|\circ|m|=|b|\circ|l|. Since |G||KL|C|G|\cap|K\!\setminus\!L|\subseteq C we have |a|(x)=|b|(x)=x|a|(x)=|b|(x)=x for each x|G||KL|x\in|G|\cap|K\!\setminus\!L|. Since |D|=|G||KL||D|=|G|\cup|K\!\setminus\!L| there is a unique map F:|D||T|F:|D|\to|T| such that F(x)=|a|(x)F(x)=|a|(x) for x|G|x\in|G| and F(x)=|b|(x)F(x)=|b|(x) for x|KL|x\in|K\!\setminus\!L|. Thus on the one hand F(|g|(x))=F(x)=|a|(x)F(|g|(x))=F(x)=|a|(x) for each x|G|x\in|G|, so that F|g|=|a|F\circ|g|=|a|. And on the other hand for each x|K|x\in|K|, if x|L|x\in|L| then F(|n|(x))=F(|m|(x))=|a|(|m|(x))=|b|(|l|(x))=|b|(x)F(|n|(x))=F(|m|(x))=|a|(|m|(x))=|b|(|l|(x))=|b|(x), otherwise F(|n|(x))=F(x)=|b|(x)F(|n|(x))=F(x)=|b|(x), so that F|n|=|b|F\circ|n|=|b|. Second, let us consider triples. Since D=GN3(K)D=G\cup N^{3}(K) and F3(G)=a(G)F^{3}(G)=a(G) and F3(N3(K))=F3(n(K))=b(K)F^{3}(N^{3}(K))=F^{3}(n(K))=b(K) we get F3(D)TF^{3}(D)\subseteq T, which means that there is a morphism f:DTf:D\to T of graphs on AA such that |f|=F|f|=F, fg=af\circ g=a and fn=bf\circ n=b. Unicity is clear. ∎

3 The POIM Transformation

A SPARQL query like “CONSTRUCT {RR} WHERE {LL}” is called basic when both RR and LL are basic graph patterns. In such a query, variables with the same name in LL and RR denote the same RDF term, whereas it is not the case for blank nodes. The statement “blank nodes in graph patterns act as variables” in [19, Section 4.1.4] holds for LL, whereas blank nodes in RR give rise to fresh blank nodes in the result of the query as in Examples 18 and 23. Thus, the meaning of blank nodes in LL is unrelated to the meaning of blank nodes in RR, and in both LL and RR each blank can be replaced by a fresh blank.

We generalize this situation in Definition 11 by allowing any data graphs for LL and RR up to isomorphism in 𝐐IV\mathbf{Q}_{{IV}}: the resource identifiers and the variables in LL and RR are fixed but each blank can be replaced by a fresh blank. Thus, without loss of generality, we can assume that |L|B|R|B=|L|_{B}\cap|R|_{B}=\emptyset. Under this assumption, the set of triples K=LRK=L\cup R with the inclusions of LL and RR in KK is a coproduct of LL and RR in the category 𝐐IV\mathbf{Q}_{{IV}}. We also assume that each variable in RR occurs in LL, so that every substitution for the variables in LL provides a substitution for the variables in RR. The relevance of this assumption with respect to SPARQL queries is discussed in Section 4.1. Note that this assumption |R|V|L|V|R|_{V}\subseteq|L|_{V} is equivalent to |K|V=|L|V|K|_{V}=|L|_{V}.

Definition 11.

A basic construct query is a pair of finite query graphs (L,R)(L,R) such that |L|B|R|B=|L|_{B}\cap|R|_{B}=\emptyset and |R|V|L|V|R|_{V}\subseteq|L|_{V}, up to isomorphism in the category 𝐐IV\mathbf{Q}_{{IV}}. The transformation rule of a basic construct query (L,R)(L,R) is the cospan PL,R=(LlKrR)P_{L,R}=(L\stackrel{{\scriptstyle l}}{{\to}}K\stackrel{{\scriptstyle r}}{{\leftarrow}}R) where K=LRK=L\cup R and ll and rr are the inclusions. Its left-hand side is LL and its right-hand side is RR.

PL,R=LlK=LRRrP_{L,R}\;\;=\;\;\lx@xy@svg{\hbox{\raise 0.0pt\hbox{\kern 6.40279pt\hbox{\ignorespaces\ignorespaces\ignorespaces\hbox{\vtop{\kern 0.0pt\offinterlineskip\halign{\entry@#!@&&\entry@@#!@\cr&&\crcr}}}\ignorespaces{\hbox{\kern-6.40279pt\raise 0.0pt\hbox{\hbox{\kern 0.0pt\raise 0.0pt\hbox{\hbox{\kern 3.0pt\raise 0.0pt\hbox{$\textstyle{L\ignorespaces\ignorespaces\ignorespaces\ignorespaces}$}}}}}}}\ignorespaces\ignorespaces\ignorespaces\ignorespaces{}{\hbox{\lx@xy@droprule}}\ignorespaces\ignorespaces\ignorespaces{\hbox{\kern 24.40073pt\raise 5.43056pt\hbox{{}\hbox{\kern 0.0pt\raise 0.0pt\hbox{\hbox{\kern 3.0pt\hbox{\hbox{\kern 0.0pt\raise-2.43056pt\hbox{$\scriptstyle{l}$}}}\kern 3.0pt}}}}}}\ignorespaces\ignorespaces\ignorespaces{\hbox{\kern 22.79169pt\raise-5.7018pt\hbox{{}\hbox{\kern 0.0pt\raise 0.0pt\hbox{\hbox{\kern 3.0pt\hbox{\hbox{\kern 0.0pt\raise-1.75pt\hbox{$\scriptstyle{\subseteq}$}}}\kern 3.0pt}}}}}}\ignorespaces{\hbox{\kern 54.40279pt\raise 0.0pt\hbox{\hbox{\kern 0.0pt\raise 0.0pt\hbox{\lx@xy@tip{1}\lx@xy@tip{-1}}}}}}{\hbox{\lx@xy@droprule}}{\hbox{\lx@xy@droprule}}{\hbox{\kern 54.40279pt\raise 0.0pt\hbox{\hbox{\kern 0.0pt\raise 0.0pt\hbox{\hbox{\kern 3.0pt\raise 0.0pt\hbox{$\textstyle{K=L\cup R}$}}}}}}}{\hbox{\kern 156.53108pt\raise 0.0pt\hbox{\hbox{\kern 0.0pt\raise 0.0pt\hbox{\hbox{\kern 3.0pt\raise 0.0pt\hbox{$\textstyle{\ignorespaces\ignorespaces\ignorespaces\ignorespaces R}$}}}}}}}\ignorespaces\ignorespaces\ignorespaces\ignorespaces{}{\hbox{\lx@xy@droprule}}\ignorespaces\ignorespaces\ignorespaces{\hbox{\kern 130.02466pt\raise 4.50694pt\hbox{{}\hbox{\kern 0.0pt\raise 0.0pt\hbox{\hbox{\kern 3.0pt\hbox{\hbox{\kern 0.0pt\raise-1.50694pt\hbox{$\scriptstyle{r}$}}}\kern 3.0pt}}}}}}\ignorespaces\ignorespaces\ignorespaces{\hbox{\kern 128.9787pt\raise-5.7018pt\hbox{{}\hbox{\kern 0.0pt\raise 0.0pt\hbox{\hbox{\kern 3.0pt\hbox{\hbox{\kern 0.0pt\raise-1.75pt\hbox{$\scriptstyle{\supseteq}$}}}\kern 3.0pt}}}}}}\ignorespaces{\hbox{\kern 108.53108pt\raise 0.0pt\hbox{\hbox{\kern 0.0pt\raise 0.0pt\hbox{\lx@xy@tip{1}\lx@xy@tip{-1}}}}}}{\hbox{\lx@xy@droprule}}{\hbox{\lx@xy@droprule}}\ignorespaces}}}}\ignorespaces
Example 12.

Consider the following SPARQL CONSTRUCT query, based on the examples given in the “CONSTRUCT” section (Section 16.2) of [19]. This query builds a triple ?x :FN ?name for each triple ?x :name ?name :

  Query       CONSTRUCT { ?x :FN ?name } WHERE { ?x :name ?name }     

In the corresponding transformation rule LlKrRL\stackrel{{\scriptstyle l}}{{\to}}K\stackrel{{\scriptstyle r}}{{\leftarrow}}R there are no blanks in LL nor in RR, thus K=LRK=L\cup R and ll and rr are the inclusions.

 LL 

   

  ?x :name ?name 

   

 

l\stackrel{{\scriptstyle l}}{{\to}}  KK        ?x :name ?name .    ?x :FN ?name        r\stackrel{{\scriptstyle r}}{{\leftarrow}}  RR        ?x :FN ?name       

Example 13.

Now the SPARQL CONSTRUCT query from Example 12 is modified by replacing both occurrences of the variable ?x by the blank node _:\_\colon\!x:

  Query       CONSTRUCT { _:x :FN ?name } WHERE { _:x :name ?name }     

In the corresponding transformation rule one blank has been modified so as to ensure that |L|B|R|B|L|_{B}\cap|R|_{B} is empty, so that it is still the case that K=LRK=L\cup R.

 LL 

   

  _:x :name ?name 

   

 

l\stackrel{{\scriptstyle l}}{{\to}}  KK        _:x :name ?name .    _:y :FN ?name        r\stackrel{{\scriptstyle r}}{{\leftarrow}}  RR        _:y :FN ?name       

When a basic SPARQL query “CONSTRUCT {RR} WHERE {LL}” is run against an RDF graph GG, and when there is precisely one match of LL into GG, the result of the query is an RDF graph HH obtained by substituting the variables in RR. This substitution can be seen as a match of RR into HH. We claim that the process of building HH with this match of RR into HH from the match of LL into GG can be seen as a two-step process involving an intermediate match of KK in an RDF graph DD. This claim will be proved, more generally, in Section 4. The definition of this two-step process relies on an algebraic construction that we call the POIM transformation: PO for pushout and IM for image (Definition 14). The POIM transformation is related to a large family of algebraic graph transformations based on pushouts, like the SPO (Simple Pushout) [10], DPO (Double Pushout) [9] or SqPO (Sesqui-Pushout) [8]. In the POIM transformation, the PO step creates fresh blank nodes and instantiates the variables of KK, while the IM step deletes everything that is not in the image of RR, as explained now.

Given a basic construct query (L,R)(L,R) and its transformation rule LlKrRL\stackrel{{\scriptstyle l}}{{\to}}K\stackrel{{\scriptstyle r}}{{\leftarrow}}R, the POIM transformation is defined as a map from the matches of LL to the matches of RR, in two steps: first from the matches of LL to the matches of KK, then from the matches of KK to the matches of RR. Given an inclusion l:LKl:L\to K in 𝐐I\mathbf{Q}_{I}, the cobase change along ll is the map l:𝑀𝑎𝑡𝑐ℎ(L)𝑀𝑎𝑡𝑐ℎ(K)l_{*}:\mathit{Match}(L)\to\mathit{Match}(K) that maps each m:LGm:L\to G to l(m):KDl_{*}(m):K\to D defined from the pushout of ll and mm in 𝐐I\mathbf{Q}_{I}, as described in Proposition 10. Note that DD is a data graph because of the assumption |K|V=|L|V|K|_{V}=|L|_{V}. Given an inclusion r:RKr:R\to K in 𝐐I\mathbf{Q}_{I}, the image factorization along rr is the map r+:𝑀𝑎𝑡𝑐ℎ(K)𝑀𝑎𝑡𝑐ℎ(R)r^{+}:\mathit{Match}(K)\to\mathit{Match}(R) that maps each n:KDn:K\to D to r+(n):RHr^{+}(n):R\to H where HH is the image of RR in DD and r+(n)r^{+}(n) is the restriction of nn and h:HDh:H\to D is the inclusion. This leads to Definition 14 and Proposition 15.

Definition 14.

Let (L,R)(L,R) be a basic construct query and LlKrRL\stackrel{{\scriptstyle l}}{{\to}}K\stackrel{{\scriptstyle r}}{{\leftarrow}}R its transformation rule. The POIM transformation map of (L,R)(L,R) is the map

𝑃𝑜𝐼𝑚L,R=r+l:𝑀𝑎𝑡𝑐ℎ(L)𝑀𝑎𝑡𝑐ℎ(R)\mathit{PoIm}_{L,R}=r^{+}\circ l_{*}:\mathit{Match}(L)\to\mathit{Match}(R)

composed of the cobase change map ll_{*} and the image factorization map r+r^{+}. The result of applying 𝑃𝑜𝐼𝑚L,R\mathit{PoIm}_{L,R} to a match m:LGm:L\to G is the match 𝑃𝑜𝐼𝑚L,R(m):RH\mathit{PoIm}_{L,R}(m):R\to H or simply the query graph HH.

L\textstyle{\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces L\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces}(PO)\scriptstyle{(PO)}l\scriptstyle{l}m\scriptstyle{m}K\textstyle{K\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces}l(m)\scriptstyle{l_{*}(m)}=\scriptstyle{=}n\scriptstyle{n}(IM)\scriptstyle{(IM)}R\textstyle{R\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces}r\scriptstyle{r}r+(n)\scriptstyle{r^{+}(n)}=\scriptstyle{=}p=𝑃𝑜𝐼𝑚L,R(m)\scriptstyle{p=\mathit{PoIm}_{L,R}(m)}G\textstyle{G\ignorespaces\ignorespaces\ignorespaces\ignorespaces}g\scriptstyle{g}D\textstyle{D}H\textstyle{H\ignorespaces\ignorespaces\ignorespaces\ignorespaces}h\scriptstyle{h} (1)

Proposition 15 says that HH is obtained from RR by instantiating all variables as in LL and by renaming blanks in an arbitrary way, as long as this renaming is one-to-one. Note that only the image of the match is transformed and the remaining parts of the RDF graph are deleted.

Proposition 15.

Let (L,R)(L,R) be a basic construct query and m:LGm:L\to G a match. Let P:|R|IBP:|R|\to{IB} be defined by P(x)=|m|(x)P(x)=|m|(x) for x|R|Vx\in|R|_{V} and P(x)=xP(x)=x otherwise. Then, up to isomorphism in 𝐐I\mathbf{Q}_{I}, the result of applying 𝑃𝑜𝐼𝑚L,R\mathit{PoIm}_{L,R} to mm is p:RHp:R\to H where H=P3(R)H=P^{3}(R) and pp is the restriction of P3P^{3}.

Proof.

We use the notations of Diagram (1). Up to isomorphism in 𝐐I\mathbf{Q}_{I} we can assume that all blanks in LL or in RR are distinct from the blanks in GG. Then |G||K|I|G|\cap|K|\subseteq I, so that by Proposition 10 the data graph DD is D=Gn(K)D=G\cup n(K) where nn is such that |n|(x)=|m|(x)|n|(x)=|m|(x) for x|L|x\in|L| and |n|(x)=x|n|(x)=x otherwise. It follows that the restriction of nn to RR is such that |n|(x)=|m|(x)|n|(x)=|m|(x) for x|L||R|x\in|L|\cap|R| and |n|(x)=x|n|(x)=x otherwise. Note that |L||R||L|\cap|R| is the disjoint union of |L|I|R|I|L|_{I}\cap|R|_{I}, that is fixed by all morphisms in 𝐐I\mathbf{Q}_{I}, and |L|V|R|V|L|_{V}\cap|R|_{V}, with |L|V|R|V=|R|V|L|_{V}\cap|R|_{V}=|R|_{V} since |R|V|L|V|R|_{V}\subseteq|L|_{V}. Thus the restriction of nn to RR is such that |n|(x)=|m|(x)|n|(x)=|m|(x) for x|R|Vx\in|R|_{V} and |n|(x)=x|n|(x)=x otherwise. The result follows. ∎

Remark 16.

Each set 𝑀𝑎𝑡𝑐ℎ(X)\mathit{Match}(X) can be seen as a coslice category, then the maps r+r^{+} and ll_{*} can be seen as functors: this could be useful when extending this paper to additional features of SPARQL.

Example 17.

Consider the SPARQL CONSTRUCT query from Example 12:

  Query       CONSTRUCT { ?x :FN ?name } WHERE { ?x :name ?name }     

and let us run this query against the RDF graph GG:

 GG      :alice :name "Alice" . :alice :nick "Lissie"     

There is a single match mm, it is such that m(?𝚡)=:alicem({\tt?x})=\mbox{{\tt:alice}} and m(?name)="Alice"m(\mbox{{\tt?name}})=\mbox{{\tt"Alice"}}. The POIM transformation produces successively the following data graphs DD and HH, where HH is the query result:

 GG 

   

   :alice :name "Alice" . 

   :alice :nick "Lissie" 

   

 

g\stackrel{{\scriptstyle g}}{{\to}}  DD        :alice :name "Alice" .    :alice :nick "Lissie" .    :alice :FN "Alice"        h\stackrel{{\scriptstyle h}}{{\leftarrow}}  HH        :alice :FN "Alice"       

Example 18.

Now consider the SPARQL CONSTRUCT query from Example 13:

  Query       CONSTRUCT { _:x :FN ?name } WHERE { _:x :name ?name }     

Let us run this query against the RDF graph GG from Example 17. There is a single match mm, it is such that m(_:𝚡)=:alicem({\tt\_:x})=\mbox{{\tt:alice}} and m(?name)="Alice"m(\mbox{{\tt?name}})=\mbox{{\tt"Alice"}}. The POIM transformation produces successively the following data graphs DD and HH, where HH is the query result:

 GG 

   

   :alice :name "Alice" . 

   :alice :nick "Lissie" 

   

 

g\stackrel{{\scriptstyle g}}{{\to}}  DD        :alice :name "Alice" .    :alice :nick "Lissie" .    _:b :FN "Alice"        h\stackrel{{\scriptstyle h}}{{\leftarrow}}  HH        _:b :FN "Alice"       

4 Running Basic Construct Queries

This Section begins with our definition of the query result of applying a basic construct query (Definition 11) to a data graph (Definition 2). Theorem 21 in Section 4.1 proves that our query result coincides (up to renaming blanks) with the answer returned by SPARQL when LL and RR are basic graph patterns and GG is an RDF graph. In Section 3, we defined the POIM transformation for running a basic construct query (L,R)(L,R) against a data graph GG, when there is exactly one match from LL to GG. Now, we define two different calculi for running a basic construct query against a data graph GG without any assumption on the number of matches. The high-level calculus (Definition 24) is one “large” application of the POIM transformation. The low-level calculus (Definition 26) consists of several “small” applications of the POIM transformation followed by a “merging” process. The construction of the high-level calculus is simpler, while the low-level calculus better fits with the description of the running process in SPARQL. Proposition 25 in Section 4.2 and Proposition 29 in Section 4.3 prove that both calculi compute the query result.

Definition 19.

Let (L,R)(L,R) be a basic construct query and GG a data graph. Assume (without loss of generality) that |G|B|L|B=|G|_{B}\cap|L|_{B}=\emptyset and |G|B|R|B=|G|_{B}\cap|R|_{B}=\emptyset. Let m1,,mkm_{1},...,m_{k} be the matches from LL to GG. For each i=1,,ki=1,...,k let HiH_{i} be the data graph obtained from RR by replacing each variable xx in RR by mi(x)m_{i}(x) and each blank in RR by a fresh blank (which means: a fresh blank for each blank in RR and each ii in {1,,k}\{1,...,k\}). The query result of applying the basic construct query (L,R)(L,R) to the data graph GG is the data graph H=H1HkH=H_{1}\cup...\cup H_{k}.

4.1 Running Basic Construct Queries in SPARQL

The answer of a SPARQL CONSTRUCT query over an RDF graph is defined in [15, Section 5], based on the seminal paper [16]. Note that in [15] literals are allowed as subjects or predicates in RDF graphs, however for our purpose this does not matter, so that we stick to the definition of an RDF graph from [20], as reminded at the beginning of Section 2. Thus, as I=𝐼𝑟𝑖𝐿𝑖𝑡I=\mathit{Iri}\cup\mathit{Lit}, a data graph GG is an RDF graph if and only if each triple (s,p,o)(s,p,o) in GG is an RDF triple, which means that s𝐼𝑟𝑖Bs\in\mathit{Iri}\cup B and p𝐼𝑟𝑖p\in\mathit{Iri}. Note that for each subset TT of (IBV)3({IBV})^{3} and each subset XX of |T||T|, each map f:XIBVf:X\to{IBV} gives rise to a map f:|T|IBVf^{\prime}:|T|\to{IBV} such that f(x)=f(x)f^{\prime}(x)=f(x) when xXx\in X and f(x)=xf^{\prime}(x)=x otherwise, then f:|T|IBVf^{\prime}:|T|\to{IBV} gives rise to f′′:T(IBV)3f^{\prime\prime}:T\to({IBV})^{3} which is the restriction of (f)3(f^{\prime})^{3} to TT. There will not be any ambiguity in denoting ff not only the given ff but also its extensions ff^{\prime} and f′′f^{\prime\prime}. For simplicity we consider only the SPARQL queries “CONSTRUCT {R}\{R\} WHERE {L}\{L\}” such that each variable in RR occurs in LL. Indeed, variables outside |L|V|L|_{V} cannot be instantiated in the result, and according to [19, Section 16.2], if a triple contains an unbound variable, then that triple is not included in the output RDF graph. Thus, triples involving a variable in |R|V|L|V|R|_{V}\!\setminus\!|L|_{V}, if any, can be dropped. It is assumed in [15] that there is no blank in LL. Indeed, since blank nodes in graph patterns act as variables, each blank in LL can be replaced by a fresh variable.

Definition 20 ([15]).

A solution mapping (or simply a mapping) from a basic graph pattern LL to an RDF graph GG is a map μ:|L|VIB\mu:|L|_{V}\to{IB} such that μ(L)G\mu(L)\subseteq G. When LL and RR are basic graph patterns such that |R|V|L|V|R|_{V}\subseteq|L|_{V}, the answer of the SPARQL query “CONSTRUCT {R}\{R\} WHERE {L}\{L\}” over an RDF graph GG is the set of all RDF triples μ(fμ(t))\mu(f_{\mu}(t)) for all triples tRt\in R and all mappings μ\mu from LL to GG, where for each μ\mu a map fμ:|R|BBf_{\mu}:|R|_{B}\to B is chosen in such a way that the subsets fμ(|R|B)f_{\mu}(|R|_{B}) of BB are pairwise distinct and all of them are distinct from |G|B|G|_{B}.

Theorem 21.

Let LL and RR be basic graph patterns with |L|B=|L|_{B}=\emptyset and |R|V|L|V|R|_{V}\subseteq|L|_{V}. Then (L,R)(L,R) is a basic construct query and the set of RDF triples in the query result of applying (L,R)(L,R) to an RDF graph GG is isomorphic in 𝐃I\mathbf{D}_{I} to the answer of the SPARQL query “CONSTRUCT {R}\{R\} WHERE {L}\{L\}” over GG.

Proof.

Clearly (L,R)(L,R) is a basic construct query and |G|B|L|B=|G|_{B}\cap|L|_{B}=\emptyset. We can assume without loss of generality that |G|B|R|B=|G|_{B}\cap|R|_{B}=\emptyset. The query result HH of applying (L,R)(L,R) to GG is given by Definition 19, based on the set of matches from LL to GG. The Theorem now follows from the fact that the maps μ\mu (or μ′′\mu^{\prime\prime}) on triples which are associated to the mappings μ\mu are precisely the matches from LL to GG. ∎

Example 22.

Consider the SPARQL query from Examples 12 and 17:

  Query       CONSTRUCT { ?x :FN ?name } WHERE { ?x :name ?name }     

and let us run this query against the RDF graph GG:

 GG      :alice :name "Alice" . :alice :nick "Lissie" .   :bob :name "Bob" . :bob :nick "Bobby"     

There are two matches and we get the RDF graphs H1H_{1}, H2H_{2} and the result HH:

 H1H_{1} 

   

  :alice :FN "Alice" 

   

 

 H2H_{2} 

   

  :bob :FN "Bob" 

   

 

 HH 

   

  :alice :FN "Alice" . 

  :bob :FN "Bob" 

   

 

Example 23.

Consider the SPARQL CONSTRUCT query:

  Query       CONSTRUCT { _:c :FN ?name } WHERE { ?x :name ?name }     

Note that this query always returns the same result as the query from Examples 13 and 18. Let us run this query against the RDF graph GG from Example 22. There are two matches and we get the RDF graphs H1H_{1}, H2H_{2} and the result HH:

 H1H_{1} 

   

  _:c :FN "Alice" 

   

 

 H2H_{2} 

   

  _:c :FN "Bob" 

   

 

 HH 

   

  _:c1 :FN "Alice" . 

  _:c2 :FN "Bob" 

   

 

4.2 The High-level Calculus

Let kk be a natural number. According to Proposition 9, for each query graph TT the query graph kTk\,T, coproduct of kk copies of TT in 𝐐I\mathbf{Q}_{I}, can be built (up to isomorphism) as follows: for each i{1,,k}i\in\{1,...,k\} let TiT_{i} be a copy of TT where each blank and variable has been renamed in such a way that there is no blank or variable common to two of the TiT_{i}’s, then the query graph kTk\,T is the union T1TkT_{1}\cup...\cup T_{k}. Now let (L,R)(L,R) be a basic construct query. The transformation rule PL,R=(LlKrR)P_{L,R}=(L\stackrel{{\scriptstyle l}}{{\to}}K\stackrel{{\scriptstyle r}}{{\leftarrow}}R) is a cospan in 𝐐I\mathbf{Q}_{I}, that gives rise to the cospan kPL,R=(kLklkKkrkR)k\,P_{L,R}=(k\,L\stackrel{{\scriptstyle k\,l}}{{\to}}k\,K\stackrel{{\scriptstyle k\,r}}{{\leftarrow}}k\,R). Since ll and rr are inclusions, this renaming can be done simultaneously in the copies of LL, KK and RR, so that kK=kLkRk\,K=k\,L\cup k\,R and klk\,l and krk\,r are the inclusions. Thus, (kL,kR)(k\,L,k\,R) is a basic construct query and PkL,kR=kPL,RP_{k\,L,k\,R}=k\,P_{L,R} is its corresponding transformation rule.

Definition 24.

Let (L,R)(L,R) be a basic construct query and GG a data graph. Let m1,,mkm_{1},...,m_{k} be the matches from LL to GG. Consider the basic construct query (kL,kR)(k\,L,k\,R). Let mm be the match from kLk\,L to GG that coincides with mim_{i} on the ii-th component of kLk\,L. The high-level query result of (L,R)(L,R) against GG is the result Hℎ𝑖𝑔ℎH_{\mathit{high}} of applying the POIM transformation map 𝑃𝑜𝐼𝑚kL,kR\mathit{PoIm}_{k\,L,k\,R} to the match m:kLGm:k\,L\to G, as in Diagram (2).

kL\textstyle{\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces k\,L\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces}(PO)\scriptstyle{(PO)}kl\scriptstyle{k\,l}m\scriptstyle{m}kK\textstyle{\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces k\,K\ignorespaces\ignorespaces\ignorespaces\ignorespaces}(IM)\scriptstyle{(IM)}n\scriptstyle{n}kR\textstyle{k\,R\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces}kr\scriptstyle{k\,r}p\scriptstyle{p}G\textstyle{G\ignorespaces\ignorespaces\ignorespaces\ignorespaces}g\scriptstyle{g}D\textstyle{D}Hℎ𝑖𝑔ℎ\textstyle{H_{\mathit{high}}\ignorespaces\ignorespaces\ignorespaces\ignorespaces}h\scriptstyle{h} (2)
Proposition 25.

Let (L,R)(L,R) be a basic construct query and GG a data graph. The high-level query result of (L,R)(L,R) against GG is isomorphic, in the category 𝐃I\mathbf{D}_{I}, to the query result of (L,R)(L,R) against GG.

Proof.

This is a consequence of the description of the result of a POIM transformation from Proposition 15. ∎

4.3 The Low-level Calculus

The low-level calculus is a two-step process: first one local result is obtained for each match, using a POIM transformation, then the local results are glued together in order to form the low-level query result.

Definition 26.

Let (L,R)(L,R) be a basic construct query and GG a data graph. Let m1,,mkm_{1},...,m_{k} be the matches from LL to GG. For each i=1,,ki=1,...,k let GiG_{i} be the image of mim_{i} and let us still denote mim_{i} the restriction mi:LGim_{i}:L\to G_{i}. The local result HiH_{i} of (L,R)(L,R) against GG along mim_{i} is the result of applying the POIM transformation map 𝑃𝑜𝐼𝑚L,R\mathit{PoIm}_{L,R} to the match mi:LGim_{i}:L\to G_{i}. Let IB(G)=I|G|B{IB}(G)=I\cup|G|_{B}. The low-level query result H𝑙𝑜𝑤H_{\mathit{low}} of (L,R)(L,R) against GG is the coproduct of the HiH_{i}’s in the category 𝐃IB(G)\mathbf{D}_{{IB}(G)} of data graphs with morphisms fixing all resource identifiers and the blanks that are in GG.

L\textstyle{\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces L\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces}(PO)\scriptstyle{(PO)}l\scriptstyle{l}mi\scriptstyle{m_{i}}K\textstyle{\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces K\ignorespaces\ignorespaces\ignorespaces\ignorespaces}(IM)\scriptstyle{(IM)}ni\scriptstyle{n_{i}}R\textstyle{R\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces}r\scriptstyle{r}pi\scriptstyle{p_{i}}Gi\textstyle{G_{i}\ignorespaces\ignorespaces\ignorespaces\ignorespaces}gi\scriptstyle{g_{i}}Di\textstyle{D_{i}}Hi\textstyle{H_{i}\ignorespaces\ignorespaces\ignorespaces\ignorespaces}hi\scriptstyle{h_{i}} (3)
Example 27.

Let us apply the low-level calculus to Example 23. The match m1m_{1} produces G1D1H1G_{1}\to D_{1}\leftarrow H_{1}:

 G1G_{1} 

   

  :alice :name "Alice" 

   

 

g1\stackrel{{\scriptstyle g_{1}}}{{\to}}  D1D_{1}        :alice :name "Alice" .    _:c :FN "Alice"        h1\stackrel{{\scriptstyle h_{1}}}{{\leftarrow}}  H1H_{1}        _:c :FN "Alice"       

and similarly the match m2m_{2} produces G2D2H2G_{2}\to D_{2}\leftarrow H_{2}:

 G2G_{2} 

   

  :bob :name "Bob" 

   

 

g2\stackrel{{\scriptstyle g_{2}}}{{\to}}  D2D_{2}        :bob :name "Bob" .    _:c :FN "Bob"        h2\stackrel{{\scriptstyle h_{2}}}{{\leftarrow}}  H2H_{2}        _:c :FN "Bob"       

Finally the query result H𝑙𝑜𝑤H_{\mathit{low}}, which is the coproduct of H1H_{1} and H2H_{2} in category 𝐃IB(G)\mathbf{D}_{{IB}(G)}, is isomorphic to HH from Example 23.

 H𝑙𝑜𝑤H_{\mathit{low}}      _:c1 :FN "Alice" . _:c2 :FN "Bob"     

Example 28.

This example illustrates how local results are “merged” to compute the result in the low-level calculus. The SPARQL query is the following:

  Query      CONSTRUCT { ?x :acquaintedWith ?z } WHERE { ?x :knows ?y . ?y :knows ?z }     

Its corresponding transformation rule is:

 LL 

   

  ?x :knows ?y . 

  ?y :knows ?z 

   

 

l\stackrel{{\scriptstyle l}}{{\to}}  KK        ?x :knows ?y .    ?y :knows ?z .    ?x :acquaintedWith ?z        r\stackrel{{\scriptstyle r}}{{\leftarrow}}  RR        ?x :acquaintedWith ?z       

This query is applied to the following graph GG:

 GG      :alice :knows :bob . :bob :knows _:c . _:c :knows :alice . :cathy :knows :david     

There are three matches m1m_{1}, m2m_{2}, m3m_{3}, thus three local results H1H_{1}, H2H_{2}, H3H_{3}:

 H1H_{1} 

   

 :alice :acquaintedWith _:c 

   

 

 H2H_{2} 

   

 _:c :acquaintedWith :bob 

   

 

 H3H_{3} 

   

 :bob :acquaintedWith :alice 

   

 

The blank _:c in H1H_{1} and H2H_{2} is not duplicated in the coproduct H𝑙𝑜𝑤H_{\mathit{low}} because it comes from GG. Thus the result is:

 H𝑙𝑜𝑤H_{\mathit{low}}      :alice :acquaintedWith _:c . _:c :acquaintedWith :bob . :bob :acquaintedWith :alice     

Proposition 29.

Let (L,R)(L,R) be a basic construct query and GG a data graph. The low-level query result of (L,R)(L,R) against GG is isomorphic, in the category 𝐃I\mathbf{D}_{I}, to the query result of (L,R)(L,R) against GG.

Proof.

This is a consequence of the description of the result of a POIM transformation from Proposition 15 and the description of coproducts in 𝐃I\mathbf{D}_{I} from Proposition 9. ∎

5 Running Basic Select Queries

The CONSTRUCT query form of SPARQL returns a data graph whereas the SELECT query form returns a table, like the SELECT query form of SQL. Both in SQL and in SPARQL, it is well-known that such tables are not exactly relations in the mathematical sense: in mathematics a relation on X1,,XnX_{1},...,X_{n} is a subset of the cartesian product X1××XnX_{1}\!\times\!...\!\times\!X_{n}, while the result of a SELECT query in SQL or SPARQL is a multiset of elements of X1××XnX_{1}\!\times\!...\!\times\!X_{n}. In order to avoid ambiguities, such a multiset is called a multirelation on X1,,XnX_{1},...,X_{n}. When all XiX_{i}’s are the same set XX it is called a multirelation of arity nn on XX.

A SPARQL query such as “SELECT ?s1,,?sn?s_{1},...,?s_{n} WHERE {LL}” is called basic when LL is a basic graph pattern and ?s1,,?sn?s_{1},...,?s_{n} are distinct variables. We generalize this situation by defining a basic select query as a pair (L,S)(L,S) where LL is a finite query graph and SS is a finite set of variables. Then we associate to each basic select query (L,S)(L,S) a basic construct query (L,𝐺𝑟(S))(L,\mathit{Gr}(S)). Finally we define the result of running the basic select query (L,S)(L,S) against a data graph GG from the data graph HH result of running the basic construct query (L,𝐺𝑟(S))(L,\mathit{Gr}(S)) against GG. This process is first described on an example.

Example 30.

Consider the following SPARQL SELECT query:

  SELECT Query       SELECT ?nameX ?nameY   WHERE { ?x :knows ?y . ?x :name ?nameX . ?y :name ?nameY }     

We associate to this SELECT query the following CONSTRUCT query:

  CONSTRUCT Query       CONSTRUCT { _:r nameX ?nameX . _:r nameY ?nameY }   WHERE { ?x :knows ?y . ?x :name ?nameX . ?y :name ?nameY }     

Let us run this CONSTRUCT query against the RDF graph GG:

 GG      _:alice :name "Alice" . _:bob :name "Bob" . _:bobby :name "Bob" . _:cathy :name "Cathy" .   _:alice :knows _:bob . _:alice :knows _:bobby . _:alice :knows _:cathy     

The result is the RDF graph HH:

 HH      _:l1 nameX "Alice" . _:l1 nameY "Bob" .   _:l2 nameX "Alice" . _:l2 nameY "Bob" .   _:l3 nameX "Alice" . _:l3 nameY "Cathy"     

From the data graph HH we get the following table, by considering each blank _:\_\colon\!lil_{i} in HH as the identifier of a line in the table. Note that the set of triples in HH becomes a multiset of lines in the table. This table is indeed the answer of the SPARQL SELECT query over GG.

nameX nameY
"Alice" "Bob"
"Alice" "Bob"
"Alice" "Cathy"

In order to generalize Example 30 we have to define a transformation from each SELECT query to a CONSTRUCT query and a transformation from the result of this CONSTRUCT query to the result of the given SELECT query. For this purpose, we first define relational data graphs (Definition 31) and relational query graphs (Definition 34).

Definition 31.

A relational data graph on a finite set {s1,,sn}\{s_{1},...,s_{n}\} of resource identifiers is a data graph made of triples (_:li,sj,yi,j)(\_:l_{i},s_{j},y_{i,j}) where the _:li\_:l_{i}’s are pairwise distinct blanks and the yi,jy_{i,j}’s are in IB{IB}, for j{1,,n}j\in\{1,...,n\} and ii in some finite set {1,,k}\{1,...,k\}.

Proposition 32.

Each relational data graph S={(_:li,sj,yi,j)}i{1k},j{1n}S\!=\!\{(\_\!:\!l_{i},s_{j},y_{i,j})\}_{i\in\{1...k\},j\in\{1...n\}} determines a multirelation 𝑅𝑒𝑙(S)={(yi,1,,yi,n)}i{1k}\mathit{Rel}(S)=\{(y_{i,1},...,y_{i,n})\}_{i\in\{1...k\}} of arity nn on IB{IB}.

Proof.

This result is clear from the definitions of relational data graphs and multirelations. ∎

Example 33.

In Example 30 the graph HH is a relational data graph on the set {nameX,nameY}\{\mbox{{\tt nameX}},\mbox{{\tt nameY}}\} and the table result of the SELECT query is its corresponding multirelation.

Assume that each variable in SPARQL is written as “?s?s” for some string ss.

Definition 34.

The relational query graph on a finite set of variables S={?s1,,?sn}S=\{?s_{1},...,?s_{n}\} is the query graph 𝐺𝑟(S)\mathit{Gr}(S) made of the triples (_:r,sj,?sj)(\_:r,s_{j},?s_{j}) where j{1,,n}j\in\{1,...,n\} and _:r\_:r is a blank. Note that 𝐺𝑟(S)\mathit{Gr}(S) is uniquely determined by SS up to isomorphism in 𝐐IV\mathbf{Q}_{{IV}}.

Example 35.

Here is the relational query graph on {?nameX,?nameY}\{\mbox{{\tt?nameX}},\mbox{{\tt?nameY}}\}:

 

   

  _:r nameX ?nameX . _:r nameY ?nameY 

   

 

Below, we show how a basic select query can be encoded as a basic construct query (Definition 36) and we prove that the result of the given select query is easily recovered from the result of its associated construct query (Theorem 40).

Definition 36.

A basic select query is a pair (L,S)(L,S) where LL is a finite query graph and SS is a finite set of variables such that each variable in SS occurs in LL. The basic construct query associated to a basic select query (L,S)(L,S) is (L,𝐺𝑟(S))(L,\mathit{Gr}(S)) where 𝐺𝑟(S)\mathit{Gr}(S) is the relational query graph on SS.

Proposition 37.

Let (L,S)(L,S) be a basic select query and GG a data graph. The query result of (L,𝐺𝑟(S))(L,\mathit{Gr}(S)) against GG is a relational data graph HH. More precisely, let S={?s1,,?sn}S=\{?s_{1},...,?s_{n}\} and let m1,,mkm_{1},...,m_{k} be the matches from LL to GG, then HH is the set of triples (_:li,sj,mi(?sj))(\_:l_{i},s_{j},m_{i}(?s_{j})) where i{1,,k}i\in\{1,...,k\}, j{1,,n}j\in\{1,...,n\}, and the blanks _:l1,,_:lk\_:l_{1},...,\_:l_{k} are pairwise distinct.

Proof.

We have 𝐺𝑟(S)={(_:r,sj,?sj)}j{1,,n}\mathit{Gr}(S)=\{(\_:r,s_{j},?s_{j})\}_{j\in\{1,...,n\}}, so that according to Definition 19 the query result of (L,𝐺𝑟(S))(L,\mathit{Gr}(S)) against GG is H1HkH_{1}\cup...\cup H_{k} where Hi={(_:li,sj,mi(?sj))}j{1,,n}H_{i}=\{(\_:l_{i},s_{j},m_{i}(?s_{j}))\}_{j\in\{1,...,n\}} and the blanks _:l1,,_:lk\_:l_{1},...,\_:l_{k} are pairwise distinct. ∎

Because of Proposition 37 we can state the following definition.

Definition 38.

Let (L,S)(L,S) be a basic select query and GG a data graph. Let HH be the query result of (L,𝐺𝑟(S))(L,\mathit{Gr}(S)) against GG. The query result of (L,S)(L,S) against GG is the multirelation 𝑅𝑒𝑙(H)\mathit{Rel}(H) on IB{IB}.

The answer, or evaluation, of a SPARQL SELECT query over an RDF graph is defined in [13, Section 2.3] as follows.

Definition 39 ([13]).

Let LL be a basic graph pattern of SPARQL, S={?s1,,?sn}S=\{?s_{1},...,?s_{n}\} a finite set of variables included in |L|V|L|_{V} and let GG be an RDF graph. The answer of the SPARQL query “SELECT ?s1,,?sn?s_{1},...,?s_{n} WHERE {LL}” over GG is the multiset with elements the restrictions μ|S\mu|_{S} of the mappings μ\mu from LL to GG to the subset SS of |L|V|L|_{V}, each μ|S\mu|_{S} with multiplicity the number of corresponding μ\mu’s.

Theorem 40.

Let LL be a basic graph pattern of SPARQL and S={?s1,,?sn}S=\{?s_{1},...,?s_{n}\} a finite set of variables included in |L|V|L|_{V} and let GG be an RDF graph. Then the query result of (L,𝐺𝑟(S))(L,\mathit{Gr}(S)) against GG is the answer of the SPARQL query “SELECT ?s1,,?sn?s_{1},...,?s_{n} WHERE {LL}” over GG.

Proof.

Since the mappings from LL to GG correspond bijectively to the matches from LL to GG, the result follows from Proposition 37. ∎

6 Conclusion

In this paper, we bet to base our work entirely on algebraic theories behind graphs and their transformations. Suitable categories of data graphs and query graphs are defined and the definition of morphisms of query graphs clarifies the difference between blank nodes and variables. Besides, we propose to encode CONSTRUCT and SELECT queries as graph rewrite rules, of the form LLRRL\rightarrow L\cup R\leftarrow R, and define their operational semantics following a novel algebraic approach called POIM. From the proposed semantics, blanks in LL play the same role as variables and thus can be replaced by variables, whereas blanks in RR are used for creating new blanks in the result of a CONSTRUCT query. As in [15], we focus on the CONSTRUCT query form as the fundamental query form. In addition we propose a translation of the SELECT queries as CONSTRUCT queries compatible with their operational semantics. One of the benefits of using category theory is that coding of data graphs as sets of triples is not that important. The results we propose hold for all data models which define a category with enough colimits. For intance, one may expect to define data graph categories for the well-known Edge-labelled graphs or Property graphs [17]. The proposed operational semantics can clearly benefit from all results regarding efficient graph matching implementation, see e.g. [12].

Among related works, a category of RDF graphs as well as their transformations have been proposed in [6]. The authors defined objects of RDF graph categories of the form (GBlank,GTriple)(G_{\mathrm{Blank}},G_{\mathrm{Triple}}) where GBlankG_{\mathrm{Blank}} and GTripleG_{\mathrm{Triple}} denote respectively the set of blank nodes and the set of triples of graph GG. In addition, the morphisms of such RDF graphs associate blank nodes to blank nodes. These definitions of object and morphisms are different from ours. Associating a blank node to any element of a triple, as our homomorphisms do, is called instantiation in [6]. The authors did not tackle the problem of answering SPARQL queries but rather proposed an algebraic approach to transform RDF graphs. Their approach, called MPOC-PO, is inspired from DPO where the first square is replaced by a “minimal” pushout complement (MPOC). However, MPOC-PO drastically departs from the POIM transformations we propose. This difference is quite natural since the two approaches have different aims : the POIM approach is dedicated to implement SPARQL queries while the MPOC-PO is intended to transform RDF graphs in general. However, MPOC-PO and DPO approaches are clearly not tailored to implement CONSTRUCT or SELECT queries since the (minimal) pushout complements always include parts of large data graphs which are not matched by the queries while such parts are not involved in the query answers. The same remark applies also to graph transformations where rules are cospans as in [11].

The image factorization part of POIM steps does not yield, in general, the same result as a pushout complement or a pullback complement constructs. For example, let us consider the following query

  Query       CONSTRUCT {?x :pred :bob} WHERE {?x :pred ?y. ?z :pred :bob}     

The POIM rule LKRL\rightarrow K\leftarrow R corresponding to this query and its application to a graph GG consisting of one triple (:alice,:pred,:bob)(:alice,:pred,:bob) are depicted below.

 LL       ?x :pred ?y .   ?z :pred :bob        l\stackrel{{\scriptstyle l}}{{\longrightarrow}}  KK       ?x :pred ?y .   ?z :pred :bob .   ?x :pred :bob        r\stackrel{{\scriptstyle r}}{{\longleftarrow}}  RR       ?x :pred :bob       

m\stackrel{{\scriptstyle\rotatebox{90.0}{{\scriptsize m}}}}{{\longrightarrow}}

n\stackrel{{\scriptstyle\rotatebox{90.0}{{\scriptsize n}}}}{{\longrightarrow}}

p\stackrel{{\scriptstyle\rotatebox{90.0}{{\scriptsize p}}}}{{\longrightarrow}}

 GG       :alice :pred :bob        g\stackrel{{\scriptstyle g}}{{\longrightarrow}}  DD       :alice :pred :bob        h\stackrel{{\scriptstyle h}}{{\longleftarrow}}  HH       :alice :pred :bob       

The reader can easily check that the right square is neither a pushout nor a pullback.

In [2], even if the authors use a categorical setting, their objectives and results depart from ours as they mainly encode every ontology as a category. However, Graph Transformations have already been used in modeling relational databases, see e.g. [4] where a visual and textual hybrid query language has been proposed. In [14], the main features of a data management system based on graphs have been proposed where the underlying typed attributed data graphs are different from those of RDF and SPARQL. In [3], triple graph grammars (TGG) have also been used for data modelling and model transformation rules to be compiled into Graph Data Bases code for execution.

In this paper, we consider basic graphs and queries, which form a significant kernel of RDF and SPARQL. Future work includes the generalization of the present study to other features of RDF and SPARQL in order to encompass general SPARQL queries. We also consider investigating RDF Schema [21] and ontologies from this point of view.

References

  • [1]
  • [2] S. Aliyu, S.B. Junaidu & A. F. Donfack Kana (2015): A Category Theoretic Model of RDF Ontology. International Journal of Web & Semantic Technology 6(3), pp. 41–51, 10.5121/ijwest.2015.6304.
  • [3] Abdullah Alqahtani & Reiko Heckel (2018): Model Based Development of Data Integration in Graph Databases Using Triple Graph Grammars. In: STAF 2018, LNCS 11176, Springer, pp. 399–414, 10.1007/978-3-030-04771-9_29.
  • [4] Marc Andries & Gregor Engels (1996): A Hybrid Query Language for an Extended Entity-Relationship Model. J. Vis. Lang. Comput. 7(3), pp. 321–352, 10.1006/jvlc.1996.0017.
  • [5] Renzo Angles, Marcelo Arenas, Pablo Barceló, Aidan Hogan, Juan L. Reutter & Domagoj Vrgoc (2017): Foundations of Modern Query Languages for Graph Databases. ACM Comput. Surv. 50(5), pp. 68:1–68:40, 10.1145/3104031.
  • [6] Benjamin Braatz & Christoph Brandt (2008): Graph Transformations for the Resource Description Framework. ECEASST 10, 10.14279/tuj.eceasst.10.158.
  • [7] Edgar F. Codd (1990): The relational Model for Database Management (Version 2 ed.). Addison-Wesley.
  • [8] Andrea Corradini, Tobias Heindel, Frank Hermann & Barbara König (2006): Sesqui-Pushout Rewriting. In: ICGT 2006, LNCS 4178, Springer, pp. 30–45, 10.1007/11841883_4.
  • [9] Andrea Corradini, Ugo Montanari, Francesca Rossi, Hartmut Ehrig, Reiko Heckel & Michael Löwe (1997): Algebraic Approaches to Graph Transformation - Part I: Basic Concepts and Double Pushout Approach. In Rozenberg [18], pp. 163–246, 10.1142/3303.
  • [10] Hartmut Ehrig, Reiko Heckel, Martin Korff, Michael Löwe, Leila Ribeiro, Annika Wagner & Andrea Corradini (1997): Algebraic Approaches to Graph Transformation - Part II: Single Pushout Approach and Comparison with Double Pushout Approach. In Rozenberg [18], pp. 247–312, 10.1142/3303.
  • [11] Hartmut Ehrig, Frank Hermann & Ulrike Prange (2009): Cospan DPO Approach: An Alternative for DPO Graph Transformations. Bulletin of the EATCS 98, pp. 139–149.
    Available at https://eatcs.org/images/bulletin/beatcs98.pdf.
  • [12] Wenfei Fan, Jianzhong Li, Shuai Ma, Hongzhi Wang & Yinghui Wu (2010): Graph Homomorphism Revisited for Graph Matching. PVLDB 3(1), pp. 1161–1172, 10.14778/1920841.1920986.
  • [13] Mark Kaminski, Egor V. Kostylev & Bernardo Cuenca Grau (2017): Query Nesting, Assignment, and Aggregation in SPARQL 1.1. ACM Trans. Database Syst. 42(3), pp. 17:1–17:46, 10.1145/3083898.
  • [14] Norbert Kiesel, Andy Schürr & Bernhard Westfechtel (1995): GRAS, a Graph-Oriented (Software) Engineering Database System. Inf. Syst. 20(1), pp. 21–51, 10.1016/0306-4379(95)00002-L.
  • [15] Egor V. Kostylev, Juan L. Reutter & Martín Ugarte (2015): CONSTRUCT Queries in SPARQL. In: 18th International Conference on Database Theory, ICDT 2015, March 23-27, 2015, Brussels, Belgium, pp. 212–229, 10.4230/LIPIcs.ICDT.2015.212.
  • [16] Jorge Pérez, Marcelo Arenas & Claudio Gutiérrez (2009): Semantics and complexity of SPARQL. ACM Trans. Database Syst. 34(3), pp. 16:1–16:45, 10.1145/1567274.1567278.
  • [17] Ian Robinson, Jim Webber & Emil Eifrem (2013): Graph Databases. O’Reilly Media, Inc.
  • [18] Grzegorz Rozenberg, editor (1997): Handbook of Graph Grammars and Computing by Graph Transformations, Volume 1: Foundations. World Scientific, 10.1142/3303.
  • [19] (2013): SPARQL 1.1 Query Language. W3C Recommendation.
    Available at https://www.w3.org/TR/sparql11-query/.
  • [20] (2014): RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation.
    Available at https://www.w3.org/TR/rdf11-concepts/.
  • [21] (2014): RDF Schema 1.1. W3C Recommendation.
    Available at https://www.w3.org/TR/2014/REC-rdf-schema-20140225/.