
Enhancing Linear Algebraic Computation of
Logic Programs Using Sparse Representation

Tuan Nguyen Quoc, Katsumi Inoue
National Institute of Informatics, Tokyo, Japan; The Graduate University for Advanced Studies (SOKENDAI)
{tuannq, inoue}@nii.ac.jp

Chiaki Sakama
Wakayama University, Wakayama, Japan
sakama@wakayama-u.ac.jp
Abstract

Algebraic characterization of logic programs has received increasing attention in recent years. Researchers attempt to exploit connections between linear algebraic computation and symbolic computation in order to perform logical inference in large scale knowledge bases. This paper proposes a further improvement by using sparse matrices to embed logic programs in vector spaces. We show the great computational power of this representation for reaching the fixpoint of the immediate consequence operator from the initial vector. In particular, performance for computing the least models of definite programs is dramatically improved in this way. We also apply the method to the computation of stable models of normal programs, in which the guesses are associated with initial matrices, and verify its effect when the number of negations is small. These results show a good enhancement in performance for computing consequences of programs and highlight the potential of tensorized logic programs.

1 Introduction

For decades, Logic Programming (LP) representation has been considered mainly in the form of symbolic logic [10], which is useful for declarative problem solving and symbolic reasoning. Logic programming has recently gained more attention as a means to build explainable learning models, although it still has limitations in terms of computation. In other words, symbolic computation is not efficient when it needs to be combined with other numerical learning models such as Artificial Neural Networks (ANNs). Recently, several studies have been done on embedding logic programs into numerical spaces so that we can exploit computing resources ranging from multi-threaded CPUs to GPUs. The linear algebraic approach is a robust way to manipulate logic programs in numerical spaces. Because linear algebra is at the heart of many applications of scientific computation, this approach is promising for developing scalable techniques to process huge relational Knowledge Bases (KBs) [19], [13]. In addition, it enables the use of efficient parallel algorithms of numerical linear algebra for computing LP.

In [6], Cohen described a probabilistic deductive database system in which reasoning is performed by a differentiable process. This enables novel gradient-based learning algorithms. In [15], Sato presented the use of first-order logic in vector spaces for Tarskian semantics and demonstrated how tensorization realizes the efficient computation of Datalog. In [16], Sato proposed a linear algebraic approach to Datalog evaluation, in which the least Herbrand model of a Datalog program (DB) is computed via adjacency matrices. He also provided theoretical proofs for translating a program into a system of linear matrix equations. This approach achieves $O(N^3)$ time complexity, where $N$ is the number of variables in a clause. Continuing in this direction, Sato et al. developed a linear algebraic approach to abductive inference in Datalog [17]. They performed empirical experiments on linear and recursive cases and showed that this approach can derive base relations.

Using a linear algebraic method, Sakama et al. [14] define relations between LP and multi-dimensional arrays (tensors), then propose algorithms for the computation of LP models. The representation is obtained by a series of conversions from logical rules to vectors, and the computation is then carried out by matrix multiplication. Later, elimination techniques were applied to reduce the matrix size [12] and gained impressive performance. In [4], a similar idea using 3D-tensors was employed to compute solutions of abductive Horn propositional tasks. In addition, Aspis built upon previous works on the matrix characterization of Horn propositional logic programs to explore how inference from logic programs can be done by linear algebraic algorithms [3]. He also proposed a new algorithm for non-monotonic deduction, based on linear algebraic reducts and differentiable deduction. These works suggest that the linear algebraic method is promising for large-scale logical inference, but, to the best of our knowledge, it has not yet been evaluated extensively in experiments.

In this paper, we continue Sakama et al.'s idea of representing logic programs by tensors [14]. Although the method is well defined, some problems that limit its performance have not been solved. First, the matrix obtained after conversion is sparse, but its sparsity was not analyzed. Second, the experiments were limited to small logic programs, which is not sufficient to demonstrate the robustness of the matrix representation. In this research, we further raise the bar of computing performance by using sparse representation for logic programs in order to reach the fixpoint of the immediate consequence operator from the initial vector. We are able to run experiments on larger logic programs to demonstrate the performance for computing least models of definite programs. Further, we also conduct experiments on the computation of stable models of normal programs with a small number of negations.

Accordingly, the rest of this paper is organized as follows: Section 2 reviews and summarizes definitions and computation algorithms for definite and normal programs, Section 3 discusses the sparsity of tensorized logic programs and proposes a sparse representation method for LP, Section 4 demonstrates experimental results with definite and normal programs, and Section 5 concludes the paper and outlines future work.

2 Preliminaries

2.1 Definite programs

We consider a language $\mathscr{L}$ that contains a finite set of propositional variables. Given a logic program $P$, the set of all propositional variables appearing in $P$ is called the Herbrand base of $P$ (written $B_P$). A definite program is a finite set of rules of the form:

$h \leftarrow b_1 \wedge \cdots \wedge b_m \quad (m \geq 0)$   (1)

where $h$ and $b_i$ are propositional variables (atoms) in $\mathscr{L}$.

A rule $r$ is called an OR-rule if $r$ is of the form:

$h \leftarrow b_1 \vee \cdots \vee b_m \quad (m \geq 0)$   (2)

where $h$ and $b_i$ are propositional variables in $\mathscr{L}$.

A standardized program is a finite set of rules of either the form (1) or (2). Note that the rule (2) is shorthand for the $m$ rules $h \leftarrow b_1$, \ldots, $h \leftarrow b_m$, so a standardized program is considered a definite program.

For each rule $r$ of the form (1) or (2), define $head(r) = h$ and $body(r) = \{b_1, \ldots, b_m\}$ (we assume $b_i \neq b_j$ for all $i \neq j$). A rule $r$ is called a fact if $body(r) = \emptyset$.

Definition 1

Singly-Defined (SD) program: A definite program $P$ is called an SD program if $head(r_1) \neq head(r_2)$ for any two rules $r_1$ and $r_2$ ($r_1 \neq r_2$) in $P$.

Any definite program can be transformed to a standardized program by introducing new propositional variables. That is, if there are several rules with the same head but different bodies, $h \leftarrow body(r_1)$, \ldots, $h \leftarrow body(r_k)$, then we replace these rules by $h \leftarrow b_1 \vee \cdots \vee b_k$, $b_1 \leftarrow body(r_1)$, \ldots, $b_k \leftarrow body(r_k)$, where $b_1, \ldots, b_k$ are new atoms. In this paper, a program means a standardized program unless stated otherwise.

A set $I \subseteq B_P$ is an interpretation of $P$. An interpretation $I$ is a model of a standardized program $P$ if $\{b_1, \ldots, b_m\} \subseteq I$ implies $h \in I$ for every rule (1) in $P$, and $\{b_1, \ldots, b_m\} \cap I \neq \emptyset$ implies $h \in I$ for every rule (2) in $P$. A model $I$ is the least model of $P$ if $I \subseteq J$ for any model $J$ of $P$. A mapping $T_P : 2^{B_P} \rightarrow 2^{B_P}$ (called the $T_P$-operator) is defined as: $T_P(I) = \{\, h \mid h \leftarrow b_1 \wedge \cdots \wedge b_m \in P \ \text{and} \ \{b_1, \ldots, b_m\} \subseteq I \,\} \cup \{\, h \mid h \leftarrow b_1 \vee \cdots \vee b_n \in P \ \text{and} \ \{b_1, \ldots, b_n\} \cap I \neq \emptyset \,\}$.

The powers of $T_P$ are defined as $T_P^{k+1}(I) = T_P(T_P^k(I))$ ($k \geq 0$) and $T_P^0(I) = I$. Given $I \subseteq B_P$, there is a fixpoint $T_P^{n+1}(I) = T_P^n(I)$ ($n \geq 0$). For a definite program $P$, the fixpoint $T_P^n(\emptyset)$ coincides with the least model of $P$ [18].
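For concreteness, the following is a minimal C++ sketch (our own illustration, not the implementation used in the experiments) of this symbolic baseline: it repeatedly applies the rules until no new atom is derived, which reaches the same fixpoint as iterating the $T_P$-operator from the empty interpretation. The Rule struct and atom indexing are hypothetical.

    #include <set>
    #include <vector>

    // Hypothetical rule layout: head <- conjunction of body atoms (atoms are int ids).
    struct Rule { int head; std::vector<int> body; };

    // Apply the rules until no new atom is derived; the result equals T_P^n(empty).
    std::set<int> least_model(const std::vector<Rule>& program) {
        std::set<int> I;                       // current interpretation (starts empty)
        bool changed = true;
        while (changed) {
            changed = false;
            for (const Rule& r : program) {
                bool body_true = true;
                for (int b : r.body)
                    if (!I.count(b)) { body_true = false; break; }
                if (body_true && I.insert(r.head).second)
                    changed = true;            // head newly derived
            }
        }
        return I;                              // the least model of the definite program
    }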

Definition 2

Matrix representation of standardized programs [14]: Let $P$ be a standardized program and $B_P = \{p_1, \ldots, p_n\}$. Then $P$ is represented by a matrix $M_P \in \mathbb{R}^{n \times n}$ such that for each element $a_{ij}$ ($1 \leq i, j \leq n$) in $M_P$,

  1. $a_{i j_k} = \frac{1}{m}$ ($1 \leq k \leq m$; $1 \leq i, j_k \leq n$) if $p_i \leftarrow p_{j_1} \wedge \cdots \wedge p_{j_m}$ is in $P$;

  2. $a_{i j_k} = 1$ ($1 \leq k \leq l$; $1 \leq i, j_k \leq n$) if $p_i \leftarrow p_{j_1} \vee \cdots \vee p_{j_l}$ is in $P$;

  3. $a_{ii} = 1$ if $p_i \leftarrow$ is in $P$;

  4. $a_{ij} = 0$, otherwise.

$M_P$ is called a program matrix. We write ${\sf row}_i(M_P) = p_i$ and ${\sf col}_j(M_P) = p_j$ ($1 \leq i, j \leq n$).

To better understand Definition 2, let’s consider a concrete example.

Example 1

Consider the definite program $P = \{\, p \leftarrow q \wedge r,\; p \leftarrow s \wedge t,\; r \leftarrow s,\; q \leftarrow t,\; s \leftarrow,\; t \leftarrow \,\}$.
$P$ is not an SD program because the two rules $p \leftarrow q \wedge r$ and $p \leftarrow s \wedge t$ have the same head, so $P$ is transformed to the standardized program $P'$ by introducing new atoms $u$ and $v$: $P' = \{\, u \leftarrow q \wedge r,\; v \leftarrow s \wedge t,\; p \leftarrow u \vee v,\; r \leftarrow s,\; q \leftarrow t,\; s \leftarrow,\; t \leftarrow \,\}$. Then, by applying Definition 2, we obtain:

        p     q     r     s     t     u     v
  p     0     0     0     0     0     1     1
  q     0     0     0     0     1     0     0
  r     0     0     0     1     0     0     0
  s     0     0     0     1     0     0     0
  t     0     0     0     0     1     0     0
  u     0    1/2   1/2    0     0     0     0
  v     0     0     0    1/2   1/2    0     0
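As an illustrative sketch of Definition 2 (assuming the atom ordering p, q, r, s, t, u, v above and a hypothetical Rule encoding; this is not the code of [14]), the program matrix of Example 1 could be assembled with Eigen as follows:

    #include <Eigen/Dense>
    #include <vector>

    // Hypothetical encoding: an AND-rule contributes 1/m per body atom,
    // an OR-rule contributes 1 per body atom, a fact puts 1 on the diagonal.
    enum class Kind { And, Or, Fact };
    struct Rule { Kind kind; int head; std::vector<int> body; };

    Eigen::MatrixXd program_matrix(const std::vector<Rule>& rules, int n) {
        Eigen::MatrixXd M = Eigen::MatrixXd::Zero(n, n);
        for (const Rule& r : rules) {
            switch (r.kind) {
                case Kind::Fact: M(r.head, r.head) = 1.0; break;
                case Kind::And:  for (int b : r.body) M(r.head, b) = 1.0 / r.body.size(); break;
                case Kind::Or:   for (int b : r.body) M(r.head, b) = 1.0; break;
            }
        }
        return M;
    }

    int main() {
        // Atom indices: p=0, q=1, r=2, s=3, t=4, u=5, v=6 (Example 1's P').
        std::vector<Rule> P = {
            {Kind::And,  5, {1, 2}},  // u <- q & r
            {Kind::And,  6, {3, 4}},  // v <- s & t
            {Kind::Or,   0, {5, 6}},  // p <- u | v
            {Kind::And,  2, {3}},     // r <- s
            {Kind::And,  1, {4}},     // q <- t
            {Kind::Fact, 3, {}},      // s <-
            {Kind::Fact, 4, {}},      // t <-
        };
        Eigen::MatrixXd M = program_matrix(P, 7);  // matches the matrix shown above
        return 0;
    }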

Sakama et al. further define the representation of an interpretation by an interpretation vector (Definition 3). This vector stores the truth values of all propositions in $P$. The starting point of the computation is the initial vector (Definition 4).

Definition 3

Interpretation vector [14]: Let $P$ be a program and $B_P = \{p_1, \ldots, p_n\}$. Then an interpretation $I \subseteq B_P$ is represented by a vector $v = (a_1, \ldots, a_n)^{\sf T}$ where each element $a_i$ ($1 \leq i \leq n$) represents the truth value of the proposition $p_i$, such that $a_i = 1$ if $p_i \in I$; otherwise, $a_i = 0$. We write ${\sf row}_i(v) = p_i$.

Definition 4

Initial vector: Let $P$ be a program and $B_P = \{p_1, \ldots, p_n\}$. Then the initial vector of $P$ is an interpretation vector $v_0 = (a_1, \ldots, a_n)^{\sf T}$ such that $a_i = 1$ ($1 \leq i \leq n$) if ${\sf row}_i(v_0) = p_i$ and a fact $p_i \leftarrow$ is in $P$; otherwise, $a_i = 0$.

In order to compute the least model in vector space, Sakama et al. proposed an algorithm whose result is equivalent to computing the least model by the $T_P$-operator. This algorithm is presented in Algorithm 1.

Definition 5

$\theta$-thresholding: Given a value $x$, define $\theta(x) = x'$ where $x' = 1$ if $x \geq 1$; otherwise, $x' = 0$.

The $\theta$-thresholding is extended element-wise to vectors and matrices.

Algorithm 1 Matrix computation of least model

Input: a definite program $P$ and its Herbrand base $B_P = \{p_1, p_2, \ldots, p_n\}$
Output: a vector $v$ representing the least model

1:  transform $P$ to a standardized program $P^{\delta} = Q \cup D$ with $B_{P^{\delta}} = \{p_1, p_2, \ldots, p_n, p_{n+1}, \ldots, p_m\}$, where $Q$ is an SD program and $D$ is a set of OR-rules
2:  create the matrix $M_{P^{\delta}} \in \mathbb{R}^{m \times m}$ representing $P^{\delta}$
3:  create the initial vector $v_0 = (v_1, v_2, \ldots, v_m)^{\sf T}$ of $P^{\delta}$
4:  $v = v_0$
5:  $u = \theta(M_{P^{\delta}} v)$ {refer to Definition 5}
6:  while $u \neq v$ do
7:     $v = u$
8:     $u = \theta(M_{P^{\delta}} v)$ {refer to Definition 5}
9:  end while
10: return $v$
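A dense-matrix sketch of Algorithm 1 with Eigen is shown below; it assumes the program matrix and initial vector have already been built (e.g., as in the earlier sketch) and uses the thresholding of Definition 5. It illustrates the loop structure rather than reproducing our exact implementation.

    #include <Eigen/Dense>

    // Theta-thresholding of Definition 5, applied element-wise to a vector.
    static Eigen::VectorXd theta(const Eigen::VectorXd& x) {
        return x.unaryExpr([](double v) { return v >= 1.0 ? 1.0 : 0.0; });
    }

    // Algorithm 1: iterate v -> theta(M_P * v) until a fixpoint is reached.
    Eigen::VectorXd least_model_dense(const Eigen::MatrixXd& M_P,
                                      const Eigen::VectorXd& v0) {
        Eigen::VectorXd v = v0;
        Eigen::VectorXd u = theta(M_P * v);
        while (!(u.array() == v.array()).all()) {  // loop while u != v
            v = u;
            u = theta(M_P * v);
        }
        return v;  // 0/1 vector encoding the least model
    }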

2.2 Normal programs

Normal programs can be transformed to definite programs as introduced in [2]. Therefore, we transform normal programs to definite programs before encoding them in matrices.

Definition 6

Normal program: A normal program is a finite set of normal rules:

$h \leftarrow b_1 \wedge b_2 \wedge \cdots \wedge b_l \wedge \neg b_{l+1} \wedge \cdots \wedge \neg b_m \quad (m \geq l \geq 0)$   (3)

where $h$ and $b_i$ ($1 \leq i \leq m$) are propositional variables (atoms) in $\mathscr{L}$.

$P$ is transformed to a definite program by rewriting the above rule in the following form:

$h \leftarrow b_1 \wedge b_2 \wedge \cdots \wedge b_l \wedge \overline{b}_{l+1} \wedge \cdots \wedge \overline{b}_m \quad (m \geq l \geq 0)$   (4)

where $\overline{b}_i$ is a new proposition associated with $b_i$.

In this part, we denote by $P$ a normal program with an interpretation $I \subseteq B_P$. The positive form $P^+$ of $P$ is obtained by applying the above transformation. Since the definite program $P^+$ can be transformed to its standardized program, we can apply Algorithm 1 to compute its least model. Alferes et al. [2] proved that if $P$ is a normal program, then $I$ is a stable model of $P$ iff $I^+$ is the least model of $P^+ \cup \overline{I}$, where $\overline{I} = \{\, \overline{p} \mid p \in B_P \setminus I \,\}$ and $I^+ = I \cup \overline{I}$. Note that $I^+$ is an interpretation of $P^+$, which is a definite program, so it can be computed by applying Algorithm 1 to $P^+ \cup \overline{I}$.

Definition 7

Matrix representation of normal programs [12]: Let $P$ be a normal program with $B_P = \{p_1, \ldots, p_n\}$ and its positive form $P^+$ with $B_{P^+} = \{p_1, \ldots, p_n, \overline{q}_{n+1}, \ldots, \overline{q}_m\}$.

Then $P^+$ is represented by a matrix $M_P \in \mathbb{R}^{m \times m}$ such that for each element $a_{ij}$ ($1 \leq i, j \leq m$):

  1. $a_{ii} = 1$ for $n+1 \leq i \leq m$;

  2. $a_{ij} = 0$ for $n+1 \leq i \leq m$ and $1 \leq j \leq m$ such that $i \neq j$;

  3. otherwise, $a_{ij}$ ($1 \leq i \leq n$; $1 \leq j \leq m$) is encoded as in Definition 2.

$M_P$ is called a program matrix. We write ${\sf row}_i(M_P) = p_i$ and ${\sf col}_j(M_P) = p_j$ ($1 \leq i, j \leq n$).

Example 2

Consider a program $P = \{\, p \leftarrow q \wedge s,\; q \leftarrow p \wedge t,\; s \leftarrow \neg t,\; t \leftarrow,\; u \leftarrow v \,\}$.
First, we transform $P$ to its positive form $P^+ = \{\, p \leftarrow q \wedge s,\; q \leftarrow p \wedge t,\; s \leftarrow \overline{t},\; t \leftarrow,\; u \leftarrow v \,\}$. Then, applying Definition 7, we obtain:

        p     q     s     t     u     v     t̄
  p     0    1/2   1/2    0     0     0     0
  q    1/2    0     0    1/2    0     0     0
  s     0     0     0     0     0     0     1
  t     0     0     0     1     0     0     0
  u     0     0     0     0     0     1     0
  v     0     0     0     0     0     0     0
  t̄     0     0     0     0     0     0     1

Instead of the initial vector in the case of definite programs, the notion of an initial matrix is introduced to encode positive and negative facts in a program.

Definition 8

Initial matrix [12]: Let $P$ be a normal program with $B_P = \{p_1, \ldots, p_n\}$ and its positive form $P^+$ with $B_{P^+} = \{p_1, \ldots, p_n, \overline{q}_{n+1}, \ldots, \overline{q}_m\}$.
The initial matrix $M_0 \in \mathbb{R}^{m \times h}$ ($1 \leq h \leq 2^{m-n}$) is defined as follows:

  1. each row of $M_0$ corresponds to an element of $B_{P^+}$ in such a way that ${\sf row}_i(M_0) = p_i$ for $1 \leq i \leq n$ and ${\sf row}_i(M_0) = \overline{q}_i$ for $n+1 \leq i \leq m$;

  2. $a_{ij} = 1$ ($1 \leq i \leq n$, $1 \leq j \leq h$) iff a fact $p_i \leftarrow$ is in $P$; otherwise $a_{ij} = 0$;

  3. $a_{ij} = 0$ ($n+1 \leq i \leq m$, $1 \leq j \leq h$) iff a fact $q_i \leftarrow$ is in $P$; otherwise, both $0$ and $1$ are possible for $a_{ij}$, so it is either $0$ or $1$.

Each column of $M_0$ is a potential stable model in the first stage. We update $M_0$ by multiplying it with the program matrix obtained by Definition 7, as $M_{k+1} = \theta(M_P M_k)$. The algorithm for computing the stable models is presented in Algorithm 2.
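The following sketch (our own illustration, not the code of [12]) builds a full initial matrix with $h = 2^{m-n}$ columns by enumerating every truth assignment of the negated atoms whose positive counterparts are not facts; the facts and pos_of arrays are hypothetical bookkeeping for this sketch.

    #include <Eigen/Dense>
    #include <vector>

    // Build the initial matrix M_0 of Definition 8.
    // facts[i] == true iff atom i (0 <= i < n) is a fact in P.
    // pos_of[k] gives the row of the positive atom for negated row n + k.
    Eigen::MatrixXd initial_matrix(int n, int m,
                                   const std::vector<bool>& facts,
                                   const std::vector<int>& pos_of) {
        // Negated-atom rows whose positive atom is NOT a fact may be 0 or 1.
        std::vector<int> free_rows;
        for (int i = n; i < m; ++i)
            if (!facts[pos_of[i - n]]) free_rows.push_back(i);

        const long h = 1L << free_rows.size();        // number of guesses
        Eigen::MatrixXd M0 = Eigen::MatrixXd::Zero(m, h);

        for (int i = 0; i < n; ++i)                   // positive facts: 1 in every column
            if (facts[i]) M0.row(i).setOnes();

        for (long j = 0; j < h; ++j)                  // enumerate all 0/1 guesses
            for (std::size_t k = 0; k < free_rows.size(); ++k)
                M0(free_rows[k], j) = (j >> k) & 1L;
        return M0;
    }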

Algorithm 2 Matrix computation of stable models

Input: a normal program $P$ and its Herbrand base $B_P = \{p_1, p_2, \ldots, p_n\}$
Output: a set of vectors $V$ representing the stable models of $P$

1:  transform $P$ to a standardized program $P^+$ with $B_{P^+} = \{p_1, \ldots, p_n, \overline{q}_{n+1}, \ldots, \overline{q}_m\}$
2:  create the matrix $M_P \in \mathbb{R}^{m \times m}$ representing $P^+$
3:  create the initial matrix $M_0 \in \mathbb{R}^{m \times h}$
4:  $M = M_0$, $U = \theta(M_P M)$ {refer to Definition 5}
5:  while $U \neq M$ do
6:     $M = U$, $U = \theta(M_P M)$ {refer to Definition 5}
7:  end while
8:  $V =$ find stable models of $P$ {refer to Algorithm 3}
9:  return $V$
Algorithm 3 Find stable models of $P$

Input: matrix $M$ (the fixpoint computed in Algorithm 2)
Output: a set of vectors $V$ representing the stable models of $P$

1:  $V = \emptyset$
2:  for $i$ from $1$ to $h$ do
3:     $v = (a_1, \ldots, a_n, a_{n+1}, \ldots, a_m)^{\sf T}$ (the $i^{th}$ column of $M$)
4:     for $j$ from $n+1$ to $m$ do
5:        $\overline{q}_j = {\sf row}_j(M)$
6:        for $l$ from $1$ to $n$ do
7:           if ${\sf row}_l(M) = q_j$ then
8:              if $a_l + a_j \neq 1$ then break;
9:           end if
10:       end for
11:       if $l \leq n$ then break;
12:    end for
13:    if $j \leq m$ then break;
14:    else $V = V \cup \{v\}$
15: end for
16: return $V$
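The consistency test at the core of Algorithm 3 can be sketched as follows (simplified from the pseudo-code above; pos_of is a hypothetical map from each negated-atom row to the row of its positive atom): a column of the fixpoint matrix is kept as a stable-model candidate only if every atom and its negation have complementary truth values.

    #include <Eigen/Dense>
    #include <vector>

    // Keep the columns of the fixpoint matrix M in which every pair
    // (atom, negated atom) satisfies a_l + a_j == 1 (Algorithm 3's test).
    std::vector<Eigen::VectorXd> stable_model_columns(const Eigen::MatrixXd& M,
                                                      int n,
                                                      const std::vector<int>& pos_of) {
        std::vector<Eigen::VectorXd> V;
        const Eigen::Index m = M.rows();
        for (Eigen::Index i = 0; i < M.cols(); ++i) {
            bool consistent = true;
            for (Eigen::Index j = n; j < m && consistent; ++j)   // negated-atom rows
                if (M(pos_of[j - n], i) + M(j, i) != 1.0)         // a_l + a_j must be 1
                    consistent = false;
            if (consistent) V.push_back(M.col(i));                // candidate stable model
        }
        return V;
    }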

This method requires extra steps for transforming the program and finding its stable models. In addition, the initial matrix size grows exponentially with the number of negations $m - n$. Therefore this representation requires a lot of memory, and the algorithm performs considerably slower than the method for definite programs if many negations appear in the program. We will later show, however, that this method is still advantageous when the number of negations is small.

3 Sparse representation of logic programs

Representing logic programs in vector spaces can minimize symbolic computation and exploit better computing performance. However, this method must cope with the curse of dimensionality when the matrix representing a logic program becomes very large. Previous works on this topic only consider dense matrices in their implementations, and the resulting performance is not very impressive even on small datasets [12]. In order to solve this problem, this paper analyzes the sparsity of logic programs in vector spaces and proposes an improvement using sparse representation for logic programs.

3.1 Sparsity of logic programs in vector spaces

A sparse matrix is a matrix in which most of the elements are zero. The level of sparseness is measured by sparsity, which equals the number of zero-valued elements divided by the total number of elements [5]. Because there is a large number of zero elements in a sparse matrix, we can save computation by ignoring these zero values [9]. According to the conversion method of the linear algebraic approach, we can calculate the sparsity of a program $P$ (we only consider the programs in Definition 2 and Definition 7). The calculation counts the number of nonzero elements contributed by each rule in $P$, then subtracts from $1$ the fraction of the number of nonzero elements over the matrix size.

By definition, the sparsity of a program PP is computed by the following equation:

$sparsity(P) = 1 - \dfrac{\sum_{r \in P} |body(r)|}{n^2}$   (5)

where $n$ is the number of elements in $B_P$ and $|body(r)|$ is the length of the body of rule $r$.

Accordingly, the representation matrix reaches a high level of sparsity when the matrix size grows while the rule bodies remain short. In fact, a rule $r$ in a logic program rarely has a body length close to $n$; therefore, $|body(r)| \ll n$. In short, the matrix representation of a logic program in the linear algebraic approach is highly sparse.
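As a quick illustration of Equation (5), a small sketch that estimates the sparsity of a program from its rule bodies might look as follows (the Rule struct is the same hypothetical layout used in the earlier sketches):

    #include <vector>

    struct Rule { int head; std::vector<int> body; };  // hypothetical rule layout

    // Sparsity per Equation (5): 1 - (sum of body lengths) / n^2,
    // where n is the number of atoms in the (standardized) Herbrand base.
    double sparsity(const std::vector<Rule>& program, int n) {
        long nnz = 0;
        for (const Rule& r : program) nnz += static_cast<long>(r.body.size());
        return 1.0 - static_cast<double>(nnz) / (static_cast<double>(n) * n);
    }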

3.2 Converting logic programs to sparse matrices

There are several sparse matrix formats. Among them, Compressed Sparse Row (CSR) is one of the most efficient and robust, and it is widely adopted by many programming libraries [5]. Hence, in this paper, we propose CSR for representing logic programs.

In order to understand the CSR format, we first describe the Coordinate (COO) format, which simply uses two arrays of coordinates and one array of values. The length of these arrays equals the number of nonzero elements. The first array stores the row index of each value, the second array stores the column index of each value, and the third array stores the values of the original matrix. The $i^{th}$ nonzero element of a matrix is thus represented by the 3-tuple extracted from these three arrays at index $i$. Example 3 illustrates the COO representation of the program matrix in Example 1. Note that zero-based indexing is used in Example 3.

Example 3

COO representation for the program in Example 1

Row index 0 0 1 2 3 4 5 5 6 6
Col index 5 6 4 3 3 4 1 2 3 4
Value 1.0 1.0 1.0 1.0 1.0 1.0 0.5 0.5 0.5 0.5

Noticeably, in the row index array, a value can be repeated consecutively because several nonzero elements may appear in the same row. We can reduce the size of the row index array by using the CSR format. In this format, the column index and value arrays remain the same, while the row index array is compressed so that it stores, for each row, only the position at which that row's nonzero elements start. That means we do not need to store two consecutive 0s and two consecutive 5s as in Example 3. Concretely, the first element of the compressed row index array is the starting offset, which is 0, and the last element is an extra entry marking the end of the array, which equals the number of nonzero elements. Two consecutive values of the row index array delimit the nonzero elements of a row: for the $i^{th}$ row, $row\_start_i = row\_index[i]$ and $row\_end_i = row\_index[i+1]$.

Example 4 illustrates this method. For the first row ($i = 0$), we have $row\_start_0 = 0$ and $row\_end_0 = 2$, so the nonzero elements of the first row are found at indices 0 and 1 of the column index and value arrays. These start and end offsets are used to extract the column index and value of each nonzero element. Similarly, for the second row ($i = 1$), we have $row\_start_1 = 2$ and $row\_end_1 = 3$, so there is only one nonzero element, at index 2. Continuing this interpretation until the final row ($i = 6$), we have $row\_start_6 = 8$ and $row\_end_6 = 10$, so the last two nonzero elements, at indices 8 and 9, belong to the final row.

Example 4

CSR representation for the program in Example 1

Row index 0 2 3 4 5 6 8 10
Col index 5 6 4 3 3 4 1 2 3 4
Value 1.0 1.0 1.0 1.0 1.0 1.0 0.5 0.5 0.5 0.5

As we can see in Example 4, the row index array now has only 8 entries rather than the 10 in Example 3. We avoid storing repeated row indices by storing only the positions where each row starts and ends. This does not always yield a reduction in array size: if each row has only one nonzero element, we save nothing with this representation. In fact, for a sparse matrix of size $m \times n$, the CSR format saves memory only when $nnz < (m(n-1)-1)/2$, where $nnz$ is the number of nonzero elements. Fortunately, in the case of the linear algebraic method, each rule normally has several atoms in its body, so the matrix representation has several nonzero elements in a single row. That is why the CSR format can considerably save memory; we can save up to $20\%$ of the size of the row index array using the CSR format. Accordingly, in our method, we propose CSR rather than COO, not only because it saves more memory but also because it is widely adopted by many programming libraries.
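To make the compression step concrete, the following small sketch converts the COO row index array into the CSR row pointer array (assuming, as in Example 3, that the column index and value arrays are ordered by row):

    #include <vector>

    // Convert COO row indices (ordered by row) into a CSR row pointer array of
    // length num_rows + 1; the column index and value arrays are unchanged.
    std::vector<int> coo_to_csr_rowptr(const std::vector<int>& coo_rows, int num_rows) {
        std::vector<int> row_ptr(num_rows + 1, 0);
        for (int r : coo_rows) ++row_ptr[r + 1];      // count nonzeros per row
        for (int i = 0; i < num_rows; ++i)            // prefix sums give start offsets
            row_ptr[i + 1] += row_ptr[i];
        return row_ptr;                               // last entry == number of nonzeros
    }

    // For the COO row array of Example 3, {0,0,1,2,3,4,5,5,6,6} with 7 rows,
    // this yields {0,2,3,4,5,6,8,10}, matching Example 4.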

Because the matrix of a logic program $P$ is highly sparse, applying Algorithm 1 and Algorithm 2 on the sparse representation is remarkably faster than on the dense one. Moreover, the sparse representation also saves memory, which enables dealing with large-scale KBs. We demonstrate the performance gain from sparse representation in the next section.

Note that only the matrix representation of the program is sparse, while the initial vector and the initial matrix are not. Thus, in our method, we keep the dense format for interpretation vectors and matrices.
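A sketch of the sparse variant of Algorithm 1 with Eigen is shown below; a row-major SparseMatrix uses a CSR-like layout, and the interpretation vector stays dense as noted above. This is an illustration of the approach rather than the exact code used in our experiments.

    #include <Eigen/Dense>
    #include <Eigen/Sparse>
    #include <vector>

    // Sparse variant of Algorithm 1: the program matrix is stored in row-major
    // (CSR-like) sparse format, while the interpretation vector stays dense.
    Eigen::VectorXd least_model_sparse(
            const std::vector<Eigen::Triplet<double>>& entries,  // (row, col, value) of M_P
            const Eigen::VectorXd& v0) {
        const Eigen::Index n = v0.size();
        Eigen::SparseMatrix<double, Eigen::RowMajor> M(n, n);
        M.setFromTriplets(entries.begin(), entries.end());        // build the sparse structure once

        auto theta = [](const Eigen::VectorXd& x) {                // Definition 5, element-wise
            return Eigen::VectorXd(x.unaryExpr([](double v) { return v >= 1.0 ? 1.0 : 0.0; }));
        };

        Eigen::VectorXd v = v0;
        Eigen::VectorXd u = theta(M * v);                          // sparse-dense product
        while (!(u.array() == v.array()).all()) {                  // loop until fixpoint
            v = u;
            u = theta(M * v);
        }
        return v;
    }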

4 Experimental results

In this section, we conduct two experiments: finding the least models of definite programs and computing the stable models of normal programs. In order to evaluate the performance of the linear algebraic methods, we compare implementations of Algorithm 1 and Algorithm 2 against (i) the $T_P$-operator and (ii) Clasp (Clingo v5.4.1 running with flag method=clasp). Our implementations use (iii) dense matrices and (iv) sparse matrices. Except for Clasp, all implementations are written in C++ targeting an x64 CPU (we do not use GPU-accelerated code). For matrix representations and operators, we use the Eigen 3 library [8]. The computer running the experiments has the following configuration: CPU: Intel Core i7-4770 (4 cores, 8 threads) @3.4GHz; RAM: 16GB DDR3 @1333MHz; Operating system: Ubuntu 18.04 LTS 64bit.

Focusing on the performance of sparse representation, we first evaluate our method on randomized logic programs. We use the same LP generation method as [12], where the size of a logic program is defined by the size $n = |B_P|$ of the Herbrand base $B_P$ and the number of rules $m = |P|$ in $P$. The rules are generated with body lengths (maximum length 8) distributed according to Table 1.

Table 1: Proportion of rules in $P$ based on the number of propositional variables in their bodies. The proportion for body length 0 is the proportion of facts in $P$.
Body length           0      1    2    3     4     5     6    7    8
Allocated proportion  <n/3   4%   4%   10%   40%   35%   4%   2%   1%

We further generate denser matrices in order to analyze the efficacy of the sparse method. While keeping the same proportion of facts and of rules with body lengths 1 and 2, we generate the remaining $70\sim 80\%$ of rules such that their body length is around $5\%$ of the number of propositions. This leads to a lower sparsity level of the generated matrices, approximately 0.95.

Based on the same generation method as for definite programs, we generate normal programs by randomly changing $k$ ($4 \leq k \leq 8$) literals to negations. The important difference from [12] is that we experiment on much larger $n$ and $m$, because our method, implemented in C++, is dramatically more efficient than Nguyen et al.'s implementation in Maple. The largest logic programs in this experiment reach thousands of propositions and hundreds of thousands of rules. Further, we also compare our method with one of the best Answer Set Programming (ASP) solvers, Clasp [7], running in the same environment. Each method is run 30 times on each LP to obtain mean execution times.

In addition, we conduct a further experiment with definite programs on a non-random problem: transitive closure. The graphs we use are selected from the Koblenz network collection [11]. Each dataset contains binary tuples (edges), and we compute their transitive closure using the following rules: $path(X,Y) \leftarrow edge(X,Y)$ and $path(X,Y) \leftarrow edge(X,Z) \wedge path(Z,Y)$.
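One simple grounding strategy for these two rules is sketched below (our own illustration; the actual grounder and atom indexing used for Table 4 may differ): each edge tuple produces an edge fact, one instance of the first rule, and instances of the second rule over all nodes.

    #include <string>
    #include <utility>
    #include <vector>

    // Atoms are kept as strings here for readability; in practice they are
    // mapped to integer indices before building the program matrix.
    struct GroundRule { std::string head; std::vector<std::string> body; };

    std::vector<GroundRule> ground_transitive_closure(
            const std::vector<std::pair<int, int>>& edges, int num_nodes) {
        auto edge = [](int x, int y) { return "edge(" + std::to_string(x) + "," + std::to_string(y) + ")"; };
        auto path = [](int x, int y) { return "path(" + std::to_string(x) + "," + std::to_string(y) + ")"; };

        std::vector<GroundRule> rules;
        for (auto [x, z] : edges) {
            rules.push_back({edge(x, z), {}});              // edge fact
            rules.push_back({path(x, z), {edge(x, z)}});    // path(X,Y) <- edge(X,Y)
            for (int y = 0; y < num_nodes; ++y)             // path(X,Y) <- edge(X,Z) & path(Z,Y)
                rules.push_back({path(x, y), {edge(x, z), path(z, y)}});
        }
        return rules;
    }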

4.1 Definite programs

The results on definite programs are illustrated in Table 2. As we can see, the Dense matrix method is the slowest and is unable to run with very large programs. Overall, the Sparse matrix method is very efficient, being $10\sim 15$ times faster than Clasp. We should mention that all code is executed on a single CPU thread, without GPU acceleration or any other parallel computing techniques.

Table 2: Experimental results on definite programs for the $T_P$-operator, Clasp, and the linear algebraic methods (with dense and sparse representation). $n'$ indicates the actual matrix size after transformation, i.e., the size of the Herbrand base of the standardized program.
No.  n  m  n'  Sparsity  $T_P$-operator  Clasp  Dense matrix  Sparse matrix
1 1000 5000 5788 0.99 0.0402 0.1680 2.0559 0.0071
2 1000 10000 10799 0.99 0.1226 0.2940 17.9986 0.0127
3 1600 24000 25198 0.99 0.3952 1.8480 73.3541 0.0357
4 1600 30000 31285 0.99 0.4793 2.5360 116.1158 0.0605
5 2000 36000 37596 0.99 0.7511 3.1690 155.4312 0.0692
6 2000 40000 41936 0.99 0.9763 5.1610 187.6549 0.0675
7 10000 120000 127119 0.99 18.5608 9.0720 - 0.3798
8 10000 160000 167504 0.99 25.6532 15.7760 - 0.4832
9 16000 200000 211039 0.99 57.0223 19.9760 - 0.8643
10 16000 220000 231439 0.99 60.4486 24.7860 - 0.9429
11 20000 280000 297293 0.99 104.9978 30.5730 - 0.9048
12 20000 320000 337056 0.99 108.5883 34.4030 - 1.0614
Table 3: Experimental results on definite programs with a lower sparsity level for the $T_P$-operator, Clasp, and the linear algebraic methods (with dense and sparse representation). $n'$ indicates the actual matrix size after transformation.
No.  n  m  n'  Sparsity  $T_P$-operator  Clasp  Dense matrix  Sparse matrix
1 1000 5000 5876 0.95 0.1044 0.3970 2.3102 0.0384
2 1000 10000 10243 0.95 0.3613 0.9160 17.5917 0.0519
3 1600 24000 25712 0.95 0.9478 2.2540 70.0931 0.1634
4 1600 30000 31430 0.95 1.1817 3.0130 120.5195 0.3772
5 2000 36000 36612 0.95 1.7335 4.7810 152.9104 0.5499
6 2000 40000 41509 0.95 2.0378 6.3260 192.3609 0.6284
7 10000 120000 125692 0.95 27.8011 10.8930 - 1.0816
8 10000 160000 166741 0.95 47.2419 18.6050 - 2.2907
9 16000 200000 210526 0.95 89.5501 21.7110 - 3.7931
10 16000 220000 230178 0.95 108.1297 28.5370 - 4.8605
11 20000 280000 298582 0.95 144.8006 35.0920 - 5.3361
12 20000 320000 335918 0.95 183.5328 42.8420 - 5.9182

The benchmark on denser matrices is presented in Table 3. As can be seen, denser matrices require more computation for the Sparse matrix method, while the other competitors are not affected to the same extent. Despite that, the Sparse matrix method still holds first place in this benchmark. When analyzing the sparseness of logic programs, we hardly find a program whose sparsity is less than 0.97. This observation strongly encourages the use of sparse representation for logic programs.

We next show the comparison for computing transitive closure. Each dataset contains edges (tuples of nodes), and we first ground the two rules defining path. The obtained results are shown in Table 4. In this non-randomized problem, the matrix representations are very sparse, so there is no doubt that the Sparse matrix method outperforms the Dense matrix method. Accordingly, we only highlight the efficiency of the sparse representation and omit the dense matrix approach. Surprisingly, the Sparse matrix method surpasses Clasp once again in this experiment by a large margin.

Table 4: Experimental results on the transitive closure problem for the $T_P$-operator, Clasp, and the sparse representation approach. $n'$ indicates the actual matrix size after transformation.
Data name ($|V|$, $|E|$)  n  m  n'  Sparsity  $T_P$-operator  Clasp  Sparse matrix
Club membership (65, 95) 1200 14492 15600 0.99 0.8397 0.3370 0.0255
Cattle (28, 217) 1512 20629 21924 0.99 0.9541 0.5060 0.0365
Windsurfers (43, 336) 4324 99788 103776 0.99 3.6453 3.3690 0.1824
Contiguous USA (49, 107) 4704 113003 117600 0.99 4.2975 3.8830 0.1830
Dolphins (62, 159) 7564 230861 238266 0.99 12.3067 9.3820 0.4019
Train bombing (64, 243) 8064 254259 262080 0.99 15.2257 10.6350 0.4524
Highschool (70, 366) 9660 333636 342930 0.99 19.9622 15.8010 0.6618
Les Miserables (77, 254) 11704 445006 456456 0.99 27.7931 21.9560 0.8300

As witnessed in the results, the Dense matrix method is the slowest, even slower than the $T_P$-operator, because it wastes computation on a huge number of zero elements. This is explained by the high level of sparsity of the logic programs reported in Table 2, Table 3 and Table 4. Moreover, large dense matrices consume a huge amount of memory, so the method is unable to run at large matrix sizes. Overall, the Sparse matrix method is effective in computing the fixpoint of definite programs. The performance could be improved further by using GPU-accelerated code and exploiting parallel computing power. These results indicate the potential of logical inference using an algebraic method.

4.2 Normal programs

In our current method, the number of columns in the initial matrix (Definition 8) grows exponentially with the number of negations, so we limit the number of negations in this benchmark to 8, as specified in the experiment setup.

Table 5: Experimental results on normal programs for the $T_P$-operator, Clasp, and the linear algebraic methods (with dense and sparse representation). $n'$ indicates the actual matrix size after transformation; $k$ is the number of negations.
No.  n  m  n'  k  Sparsity  $T_P$-operator  Clasp  Dense matrix  Sparse matrix
1 1000 5000 6379 8 0.99 0.0472 0.3070 3.9560 0.0119
2 1000 10000 12745 8 0.99 0.1838 1.0920 28.1806 0.0178
3 1600 24000 30061 8 0.99 0.5525 3.2760 105.4931 0.0559
4 1600 30000 36402 7 0.99 0.6801 4.3050 168.8044 0.0832
5 2000 36000 42039 5 0.99 1.2378 6.7180 203.2749 0.0897
6 2000 40000 48187 8 0.99 1.5437 7.1800 256.9701 0.0991
7 10000 120000 171967 6 0.99 27.3162 7.6820 - 0.7124
8 10000 160000 207432 7 0.99 32.5547 24.6990 - 0.8424
9 16000 200000 250194 5 0.99 70.3114 30.7180 - 1.5603
10 16000 220000 278190 6 0.99 86.5192 35.4050 - 1.8314
11 20000 280000 357001 4 0.99 133.7881 50.1970 - 1.9170
12 20000 320000 396128 4 0.99 150.3377 58.6090 - 2.1066
Table 6: Experimental results on normal programs with a lower sparsity level for the $T_P$-operator, Clasp, and the linear algebraic methods (with dense and sparse representation). $n'$ indicates the actual matrix size after transformation; $k$ is the number of negations; $h$ indicates the column size of the initial matrix.
No.  n  m  n'  k  Sparsity  $T_P$-operator  Clasp  Dense matrix  Sparse matrix
1 1000 5000 6385 7 0.95 0.1680 0.3680 3.7791 0.1133
2 1000 10000 12294 8 0.95 0.2453 1.4940 30.0642 0.1867
3 1600 24000 33172 7 0.95 0.6819 3.7830 102.5389 0.2219
4 1600 30000 35091 8 0.95 0.7741 5.9120 174.5192 0.3462
5 2000 36000 44145 8 0.95 2.3194 7.1020 197.3004 0.4131
6 2000 40000 49080 7 0.95 3.2665 8.6690 250.0876 0.4895
7 10000 120000 181550 8 0.95 36.9532 10.4530 - 3.2504
8 10000 160000 203576 6 0.95 54.1106 33.1920 - 4.0186
9 16000 200000 246159 4 0.95 86.3571 48.1860 - 7.2193
10 16000 220000 282734 5 0.95 106.0275 56.9150 - 8.3059
11 20000 280000 365190 4 0.95 163.0558 78.1790 - 9.0177
12 20000 320000 387094 4 0.95 202.5501 84.3270 - 11.5203

First, we benchmark normal programs with a 0.99 sparsity level. Table 5 reports the execution times in detail. As can be seen, the Sparse matrix method is still faster than Clasp, but by a smaller margin than for definite programs. It should be mentioned that the initial matrix is remarkably larger due to the limitation of the representation: we have to initialize all possible truth combinations of the atoms that appear together with their negated form in the program. With a larger number of negations, the space complexity of the linear algebraic method is clearly exponential. Accordingly, the Sparse matrix method only outperforms Clasp when the fraction of negations is small.

In the next experiment, we compare the different methods on denser matrices. Table 6 presents the data for this benchmark. Once again, with a limited number of negations, the Sparse matrix method comes out on top.

Noticeably, execution times on normal programs are generally greater than those on definite programs. This is expected, because the initial matrices are larger and extra computation is needed for transforming the program and finding the least models, as described in Algorithm 2. The weakness of the linear algebraic method is that we have to deal with all combinations of truth assignments in order to compute the stable models. Accordingly, the column size of the initial matrix increases exponentially with the number of negations. Thus, in the benchmark on randomized programs, we limit the number of negations so that the matrix can fit in memory. This limitation becomes more severe in real problems that contain many negations, and it is a major problem we are investigating in further research.

5 Conclusion

In this paper, we analyzed the sparsity of the matrix representation of LP and then proposed an improvement for logic programming in vector spaces using sparse matrix representation. The experimental results on computing the least models of definite programs demonstrate a very significant enhancement in computation performance, even when compared to Clasp. This improvement remarkably reduces the computational burden of previous linear algebraic approaches for representing LP. In finding stable models of normal programs, however, the efficacy of the linear algebraic method is limited because the representation requires a huge amount of memory to store all possible combinations of negated atoms. In spite of this, our method is efficient when the number of negations is small. Matrix computation could be further accelerated using GPUs; we have tested our implementation in this way and obtained the expected results as well.

Sato’s linear algebraic method [16] is based on a completely different idea for representing logic programs, where each predicate is represented by one matrix and an approximation method is used to compute the extension of a target predicate of a recursive program. We should note that this approximation method is limited to a matrix size of 10,000, while our exact method comfortably handles 320,000. Further comparison is a future research topic, yet we expect that Sato’s method can also be enhanced by sparse representation.

These encouraging results open up room for improvement and optimization. In an extended version of this paper, we will add more experimental results, including a comparison on other benchmark programs for ASP. A potential direction for future work is to apply a sampling method to reduce the number of guesses in the initial matrix for normal programs: prepare an initial matrix of manageable size and, if all guesses fail, perform a local search that replaces column vectors with new assignments, repeating until a stable model is found. Further research on implementing disjunctive LP and abductive LP should be considered in order to reveal the applicability of tensor-based approaches to LP. Additionally, more complex types of programs should be taken into account for representation in vector spaces, for instance 3-valued logic programs and answer set programs with aggregates and constraints.

References

  • [2] Jose Julio Alferes, Joao Alexandre Leite, Luis Moniz Pereira, Halina Przymusinska & Teodor C. Przymusinski (2000): Dynamic updates of non-monotonic knowledge bases. The Journal of Logic Programming 45(1-3), pp. 43–70, 10.1016/S0743-1066(99)00065-5.
  • [3] Yaniv Aspis (2018): A Linear Algebraic Approach to Logic Programming. Master thesis at Imperial College London.
  • [4] Yaniv Aspis, Krysia Broda & Alessandra Russo (2018): Tensor-based abduction in horn propositional programs. 2206, CEUR Workshop Proceedings, pp. 68–75, 10.1145/200836.200838.
  • [5] James R Bunch & Donald J Rose (2014): Sparse matrix computations. Academic Press, 10.1016/C2013-0-10439-4.
  • [6] William W Cohen (2016): Tensorlog: A differentiable deductive database. arXiv preprint arXiv:1605.06523.
  • [7] Martin Gebser, Roland Kaminski, Benjamin Kaufmann, Max Ostrowski, Torsten Schaub & Philipp Wanko (2016): Theory solving made easy with clingo 5. In: OASIcs-OpenAccess Series in Informatics, 52, Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 10.4230/OASIcs.ICLP.2016.2.
  • [8] Gaël Guennebaud, Benoît Jacob et al. (2010): Eigen v3. http://eigen.tuxfamily.org.
  • [9] Fred G Gustavson (1978): Two fast algorithms for sparse matrices: Multiplication and permuted transposition. ACM Transactions on Mathematical Software (TOMS) 4(3), pp. 250–269, 10.1145/355791.355796.
  • [10] Robert Kowalski (1974): Logic for problem solving. Department of Computational Logic, Edinburgh University, 10.1145/1005937.1005947.
  • [11] Jérôme Kunegis (2013): Konect: the koblenz network collection. In: Proceedings of the 22nd International Conference on World Wide Web, pp. 1343–1350, 10.1145/2487788.2488173.
  • [12] Hien D. Nguyen, Chiaki Sakama, Taisuke Sato & Katsumi Inoue (2018): Computing Logic Programming Semantics in Linear Algebra. In: Proceedings of the 12th International Conference on Multi-disciplinary Trends in Artificial Intelligence (MIWAI 2018), Springer International Publishing, Lecture Notes in Artificial Intelligence, Vol. 11248, pp. 32–48, 10.1007/978-3-030-03014-83.
  • [13] Tim Rocktäschel, Matko Bosnjak, Sameer Singh & Sebastian Riedel (2014): Low-dimensional embeddings of logic. In: Proceedings of the ACL 2014 Workshop on Semantic Parsing, pp. 45–49, 10.3115/v1/W14-2409.
  • [14] Chiaki Sakama, Katsumi Inoue & Taisuke Sato (2017): Linear Algebraic Characterization of Logic Programs. In: Proceedings of the 10th International Conference on Knowledge Science, Engineering and Management (KSEM 2017), Springer International Publishing, Lecture Notes in Artificial Intelligence, Vol. 10412, pp. 520–533, 10.1007/978-3-319-63558-344.
  • [15] Taisuke Sato (2017): Embedding Tarskian semantics in vector spaces. In: Workshops at the Thirty-First AAAI Conference on Artificial Intelligence.
  • [16] Taisuke Sato (2017): A linear algebraic approach to Datalog evaluation. Theory and Practice of Logic Programming 17(3), pp. 244–265, 10.1017/S1471068417000023.
  • [17] Taisuke Sato, Katsumi Inoue & Chiaki Sakama (2018): Abducing Relations in Continuous Spaces. In: IJCAI, pp. 1956–1962, 10.24963/ijcai.2018/270.
  • [18] Maarten H Van Emden & Robert A Kowalski (1976): The semantics of predicate logic as a programming language. Journal of the ACM (JACM) 23(4), pp. 733–742, 10.1145/321978.321991.
  • [19] Bishan Yang, Scott Wen-tau Yih, Xiaodong He, Jianfeng Gao & Li Deng (2015): Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In: Proceedings of the International Conference on Learning Representations (ICLR) 2015. Available at https://www.microsoft.com/en-us/research/publication/embedding-entities-and-relations-for-learning-and-inference-in-knowledge-bases/.