National Institute of Informatics, Japan
Towards end-to-end ASP computation
Abstract
We propose an end-to-end approach for answer set programming (ASP) that linear algebraically computes stable models satisfying given constraints. The idea is to implement Lin-Zhao's theorem [14] together with constraints directly in vector spaces as numerical minimization of a cost function constructed from a matricized normal logic program, the loop formulas in Lin-Zhao's theorem and the constraints, so that no symbolic ASP or SAT solver is involved in our approach. We also propose precomputation that shrinks the program size and heuristics for loop formulas to reduce computational difficulty. We empirically test our approach with programming examples including the 3-coloring and Hamiltonian cycle problems. As our approach is purely numerical and only contains vector/matrix operations, acceleration by parallel technologies such as many-cores and GPUs is expected.
Keywords:
Answer Set Programming · end-to-end ASP · vector space · cost minimization · loop formula · supported model · stable model
1 Introduction
Computing the stable model semantics [9] lies at the heart of answer set programming (ASP) [19, 16, 11] and a variety of approaches have been proposed so far. Early approaches such as Smodels [20] used backtracking. Then the concept of loop formula was introduced and approaches that use a SAT solver to compute stable models based on Lin-Zhao's theorem [14] were proposed, including ASSAT [14] and Cmodels [10] for example. Later, more elaborate approaches such as Clasp [7, 8] based on conflict-driven nogood learning were developed. While these symbolic approaches continue to predominate in ASP, there has been another trend towards differentiable methods. For example, Differentiable ASP/SAT [18] computes stable models by an ASP solver that utilizes derivatives of a cost function. More recently, NeurASP [29] and SLASH [26] combined deep learning and ASP. In their approaches, deep learning is not used in an end-to-end way to compute stable models but as a component to compute and learn probabilities represented by special atoms interfacing to ASP. A step towards end-to-end computation was taken by Aspis et al. [2] and Takemura and Inoue [27]. They formulated the computation of supported models, a superclass of stable models (supported models and stable models of a propositional normal logic program coincide when the program is tight, i.e. has no infinite call chain through positive goals [5, 4]), entirely as fixedpoint computation in vector spaces and obtained supported models represented by binary vectors [1, 17]. Also, in the context of probabilistic modeling, an end-to-end sampling of supported models was proposed by Sato and Kojima in [24]. However, there still remains a gap between computing stable models and computing supported ones.
In this paper, we propose an end-to-end approach for ASP and compute stable models satisfying given constraints in vector spaces. The idea is simple; we implement Lin-Zhao's theorem [14] together with constraints directly in vector spaces as a cost minimization problem, so that no symbolic ASP or SAT solver is involved. Also, as our approach is purely numerical and only contains vector/matrix operations, acceleration by parallel technologies such as many-cores and GPUs is expected.
Technically, Lin-Zhao's theorem [14] states that a stable model of a ground normal logic program coincides with a supported model which satisfies "loop formulas" associated with the program. They are propositional formulas indicating how to get out of infinite loops of top-down rule invocation. We formulate finding such a model as root finding in a vector space of a non-negative cost function represented in terms of the matricized program and loop formulas. The problem is that, in whatever approach we may take, symbolic or non-symbolic, computing supported models is NP-hard (for example, graph coloring is solved by computing supported models) and there can be exponentially many loop formulas to be satisfied [12]. We reduce this computational difficulty in two ways. One is precomputation, which removes atoms from the search space that are known to be false in any stable model and yields a smaller program. The other is to heuristically choose loop formulas to be satisfied. In a symbolic approach, the latter would mean allowing non-stable model computation, but in our continuous approach, it merely means gracefully degraded gradient information for the continuous search process.
Our end-to-end computing framework differs from those of [2] and [27] in that they basically compute supported models and the computing process itself has no mechanism, such as loop formulas, to exclude non-stable models. Also we impose no restriction on the syntax of programs like the MD condition [2] or the SD condition [27]; any propositional normal logic program is acceptable. More importantly, we incorporate the use of constraints, i.e. headless rules of the form $\leftarrow G$ for a conjunction $G$ of literals, which make ASP programming smooth and practical.
Hence our contributions include:
• a proposal of end-to-end computing of stable models in vector spaces for unrestricted programs,
• augmentation of the above by constraints,
• introduction of precomputation and heuristics to reduce the computational difficulty of stable model computation.
We add that since our primary purpose in this paper is to establish the theoretical feasibility of end-to-end ASP computing in vector spaces, the programming examples are small and the implementation is of a preliminary nature.
In what follows, after preliminaries in Section 2, we formulate the computation of supported models in vector spaces in Section 3 and that of stable models in Section 4. We then show programming examples in Section 5, including ASP programs for the 3-coloring problem and the HC (Hamiltonian cycle) problem, and there compare the performance of precomputation and loop formula heuristics. Section 6 contains related work and Section 7 concludes.
2 Preliminaries
In this paper, we mean by a program a propositional normal logic program $P$, which is a finite set of rules of the form $a \leftarrow G$ where $a$ is an atom (we equate propositional variables with atoms) called the head and $G$ a conjunction of literals (a literal is an atom (positive literal) or its negation (negative literal)) called the body of the rule. We suppose $P$ is written in a given set $A$ of atoms but usually assume $A$ = atom($P$), i.e. the set of atoms occurring in $P$. We use $G^+$ to denote the conjunction of positive literals in $G$. $G$ may be empty; the empty conjunction is always true. We call $a \leftarrow G$ a rule for $a$. Let $a \leftarrow G_1, \ldots, a \leftarrow G_k$ be the rules for $a$ in $P$. When $k \geq 1$, put $iff(a) = (a \Leftrightarrow G_1 \vee \cdots \vee G_k)$. When $k = 0$, i.e. there is no rule for $a$, put $iff(a) = (a \Leftrightarrow \bot)$ where $\bot$ is a special symbol representing the empty disjunction which is always false. We call $iff(a)$ the completed rule for $a$. The completion of $P$, comp($P$), is defined as comp($P$) $= \{\, iff(a) \mid a \in A \,\}$. For a finite set $S$, we denote the number of elements in $S$ by $|S|$. So $|P|$ is the number of rules in the program $P$.
A model (assignment) $I$ over a set $A$ of atoms is a mapping $I: A \to \{1, 0\}$ which determines the truth value of each atom in $A$. The truth value of a formula $F$ is inductively defined as usual, and if $F$ becomes true evaluated by $I$, we say $I$ satisfies $F$, $F$ is true in $I$ or $I$ is a model of $F$, and write $I \models F$. This notation is extended to a set $S$ of formulas by considering $S$ as the conjunction of its elements. For convenience, we always equate $I$ with $\{\, a \in A \mid I \models a \,\}$, i.e. the set of atoms true in $I$. When $I$ satisfies all rules in the program $P$, i.e. $I \models P$, $I$ is said to be a model of $P$. If no rule body contains negative literals, $P$ is said to be a definite program. In that case, $P$ always has the least model (in the sense of set inclusion), i.e. the set of atoms provable from $P$. A model of comp($P$) is called a supported model of $P$ [17]. When $P$ is a definite program, its least model is also a supported model. In general, for non-definite $P$, there can be zero or multiple supported models. Stable models are a subclass of supported models. They are defined as follows. Given a program $P$ and a model $I$, remove all rules from $P$ whose body contains a negative literal false in $I$, then remove all negative literals from the remaining rules. The resulting program, $P^I$, is called the Gelfond-Lifschitz (GL) reduction of $P$ by $I$ or just the reduct of $P$ by $I$. It is a definite program and has the least model. If this least model is identical to $I$, $I$ is called a stable model of $P$ [9]. $P$ may have zero or multiple stable models, like supported models. Since the existence of a stable model is NP-complete [16] and so is that of a supported model, their computation is expected to be hard.
Let $F = d_1 \vee \cdots \vee d_h$ be a Boolean formula in variables (atoms) in $A$ in disjunctive normal form (DNF) where each $d_i$ ($1 \le i \le h$) is a conjunction of literals and is called a disjunct of $F$. When $F$ has no disjunct, $F$ is false. $F$ is called full when every $d_i$ is a conjunction of $|A|$ distinct literals.
A walk in a directed graph is a sequence $(v_1, \ldots, v_k)$ of vertices representing the corresponding non-empty sequence of edges $v_1 \rightarrow v_2, \ldots, v_{k-1} \rightarrow v_k$. When $v_1 = v_k$, it is said to be closed. A cycle is a closed walk in which the vertices are otherwise all distinct. A Hamiltonian cycle is a cycle which visits every vertex once. A path is a walk with no vertex repeated. A directed subgraph is called strongly connected if there are paths from $u$ to $w$ and from $w$ to $u$ for any pair of distinct vertices $u$ and $w$. This "strongly connected" relation induces an equivalence relation over the set of vertices and an induced equivalence class is called a strongly connected component (SCC).
The positive dependency graph of a program $P$ is a directed graph whose vertices are the atoms occurring in $P$ and which has an edge from atom $a$ to atom $b$ if-and-only-if (iff) there is a rule $a \leftarrow G$ in $P$ such that $b$ is a positive literal in $G$. $P$ is said to be tight [5] (in [5], it is called "positive-order-consistent") when its positive dependency graph is acyclic, i.e. has no cycle. A loop in $P$ is a set $L$ of atoms such that for any pair of atoms $a$ and $b$ in $L$ ($a = b$ allowed), there is a path in the positive dependency graph from $a$ to $b$ and also from $b$ to $a$.
We denote vectors by bold italic lower case letters such as $\boldsymbol{v}$ where $\boldsymbol{v}(i)$ represents the $i$-th element of $\boldsymbol{v}$. Vectors are column vectors by default. We use $\boldsymbol{u} \cdot \boldsymbol{v}$ to stand for the inner product (dot product) of vectors $\boldsymbol{u}$ and $\boldsymbol{v}$ of the same dimension. $\|\boldsymbol{v}\|_1$ and $\|\boldsymbol{v}\|_2$ respectively denote the 1-norm and 2-norm of $\boldsymbol{v}$ where $\|\boldsymbol{v}\|_1 = \sum_i |\boldsymbol{v}(i)|$ and $\|\boldsymbol{v}\|_2 = (\sum_i \boldsymbol{v}(i)^2)^{1/2}$. We use $\boldsymbol{1}$ to denote an all-ones vector of appropriate dimension. A model $I$ over a set of ordered atoms $A = \{a_1, \ldots, a_n\}$ is equated with an $n$-dimensional binary vector $\boldsymbol{v}$ such that $\boldsymbol{v}(i) = 1$ if $a_i$ is true in $I$ and $\boldsymbol{v}(i) = 0$ otherwise ($1 \le i \le n$). $\boldsymbol{v}$ is called the vectorized $I$.
Italic capital letters such as $A$ stand for a matrix. We use $A(i,j)$ to denote the $(i,j)$-th element of $A$, $A(i,:)$ the $i$-th row of $A$ and $A(:,j)$ the $j$-th column of $A$, respectively. We often consider one dimensional matrices as (row or column) vectors. $\|A\|_F$ denotes the Frobenius norm of $A$. Let $A$, $B$ be matrices. $A \odot B$ denotes their Hadamard product, i.e., $(A \odot B)(i,j) = A(i,j)B(i,j)$ for all $i, j$. $[A ; B]$ designates the matrix of $A$ stacked onto $B$. We implicitly assume that all dimensions of vectors and matrices in various expressions are compatible. We introduce a piece-wise linear function $\mathrm{min}_1(x) = \min(1, x)$ that returns the lesser of $1$ and $x$ as an activation function; it is related to the popular activation function $\mathrm{ReLU}(x) = \max(0, x)$ by $\mathrm{min}_1(x) = 1 - \mathrm{ReLU}(1 - x)$. $\mathrm{min}_1(A)$ denotes the result of component-wise application of $\mathrm{min}_1$ to a matrix $A$. We also introduce thresholding notation. Suppose $\theta$ is a real number and $\boldsymbol{v}$ an $n$-dimensional vector. Then $[\boldsymbol{v} \le \theta]$ denotes a binary vector obtained by thresholding $\boldsymbol{v}$ at $\theta$ where for $1 \le i \le n$, $[\boldsymbol{v} \le \theta](i) = 1$ if $\boldsymbol{v}(i) \le \theta$ and $0$ otherwise. $[\boldsymbol{v} \ge \theta]$ is treated similarly. We extend thresholding to matrices. Thus $[A \ge \theta]$ means a matrix such that $[A \ge \theta](i,j) = 1$ if $A(i,j) \ge \theta$ and $0$ otherwise. For convenience, we generalize bit inversion to an $n$-dimensional vector $\boldsymbol{v}$ and use the expression $\boldsymbol{1} - \boldsymbol{v}$ to denote the $n$-dimensional vector such that $(\boldsymbol{1} - \boldsymbol{v})(i) = 1 - \boldsymbol{v}(i)$ for $1 \le i \le n$. $\boldsymbol{1} - A$ is treated similarly.
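To make this notation concrete, the following sketch (in Python/NumPy; the experiments in this paper use GNU Octave, so this is an illustrative transcription rather than the actual implementation, and the function names are ours) realizes $\mathrm{min}_1$, thresholding and dualization:

```python
import numpy as np

def min1(x):
    """Component-wise min(1, x); note min1(x) = 1 - relu(1 - x)."""
    return np.minimum(1.0, x)

def threshold_ge(x, theta):
    """[x >= theta]: binary array obtained by thresholding x at theta."""
    return (x >= theta).astype(float)

def dualize(v):
    """v_delta = [v ; 1 - v]: v stacked onto its bit inversion."""
    return np.concatenate([v, 1.0 - v])
```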
3 Computing supported models in vector spaces
In this section, we formulate the semantics of supported models in vector spaces and show how to compute it by cost minimization.
3.1 Matricized programs
First we encode programs as matrices. For concreteness, we explain by an example (generalization is easy). Suppose we are given the program $P_0$ below containing three rules $\{r_1, r_2, r_3\}$ in the set of atoms $A = \{p, q, r\}$.

$$ P_0:\qquad r_1:\ p \leftarrow q \wedge \neg r \qquad r_2:\ p \leftarrow \neg q \qquad r_3:\ q \leftarrow \qquad\qquad (5) $$
Assuming atoms are ordered as $p, q, r$ and correspondingly so are the rules as in (5), we encode $P_0$ as a pair of matrices $(Q, D)$. Here $Q$ represents conjunctions (the bodies of $r_1, r_2, r_3$) and $D$ their disjunctions so that they jointly represent comp($P_0$).

$$ Q = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} \quad (8) \qquad\qquad D = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} \quad (11) $$

The six columns of $Q$ correspond to the literals $p, q, r, \neg p, \neg q, \neg r$ in this order.
As can be seen, $Q$ represents the conjunctions in $P_0$ in such a way that, for example, $Q(1,:) = (0\ 1\ 0\ 0\ 0\ 1)$ represents the conjunction $q \wedge \neg r$ of the first rule in $P_0$ by setting $Q(1,2) = Q(1,6) = 1$, and so on. $D$ represents disjunctions of rule bodies. So $D(1,:) = (1\ 1\ 0)$ means the first atom $p$ in $A$ has two rules, the first rule $r_1$ and the second rule $r_2$, representing the disjunction $(q \wedge \neg r) \vee \neg q$ for $p$.
In general, a program $P$ that has $m$ rules in $n$ atoms is numerically encoded as a pair of binary matrices $Q \in \{0,1\}^{m \times 2n}$ and $D \in \{0,1\}^{n \times m}$, which we call a matricized $P$. $Q$ represents the rule bodies in $P$. Suppose atoms are ordered like $a_1, \ldots, a_n$ and similarly rules are ordered like $r_1, \ldots, r_m$. Then the $i$-th row $Q(i,:)$ ($1 \le i \le m$) encodes the conjunction $G_i$ of the $i$-th rule $r_i$. Write $G_i = b_1 \wedge \cdots \wedge b_s \wedge \neg c_1 \wedge \cdots \wedge \neg c_t$. Then $Q(i,:)$ is a zero vector except for $Q(i,j) = 1$ for each positive literal $a_j$ in $G_i$ and $Q(i, n+j) = 1$ for each negative literal $\neg a_j$ in $G_i$. $D$ combines these conjunctions as a disjunction (DNF) for each atom in $A$. If the $j$-th atom $a_j$ ($1 \le j \le n$) has rules $a_j \leftarrow G_{i_1}, \ldots, a_j \leftarrow G_{i_k}$ in $P$, we put $D(j, i_1) = \cdots = D(j, i_k) = 1$ (and $0$ elsewhere) to represent the disjunction $G_{i_1} \vee \cdots \vee G_{i_k}$, which is the right hand side of the completed rule for $a_j$: $a_j \Leftrightarrow G_{i_1} \vee \cdots \vee G_{i_k}$. If $a_j$ has no rule, we put $D(j,i) = 0$ for all $i$ ($1 \le i \le m$). Thus the matricized $P$ can represent the completed program comp($P$).
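As an illustration of this encoding, the following sketch builds $(Q, D)$ for the three-rule program (5) under the atom order $p, q, r$ (the rule-triple input format is an assumption made for this sketch):

```python
import numpy as np

# rules as (head, positive body atoms, negative body atoms); atoms indexed 0..n-1
atoms = ["p", "q", "r"]                # p=0, q=1, r=2
rules = [(0, [1], [2]),                # r1: p <- q & not r
         (0, [],  [1]),                # r2: p <- not q
         (1, [],  [])]                 # r3: q <-

n, m = len(atoms), len(rules)
Q = np.zeros((m, 2 * n))               # row i encodes the body of rule i
D = np.zeros((n, m))                   # D(j,i) = 1 iff rule i is a rule for atom j
for i, (head, pos, neg) in enumerate(rules):
    for b in pos:
        Q[i, b] = 1                    # positive literal -> left half
    for c in neg:
        Q[i, n + c] = 1                # negative literal -> right half
    D[head, i] = 1
```

Running this reproduces the matrices (8) and (11).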
3.2 Evaluation of formulas and the reduct of a program in vector spaces
Here we explain how propositional formulas and the reduct of a program are evaluated by a model in vector spaces. Let $I$ be a model over a set $A$ of atoms. Recall that $I$ is equated with a subset of $A$. We inductively define the relation "a formula $F$ is true in $I$" (equivalently, $I$ satisfies $F$), $I \models F$ in notation, as follows. For an atom $a$, $I \models a$ iff $a \in I$. For a compound formula $\neg F$, $I \models \neg F$ iff $I \not\models F$. When $F$ is a disjunction $F_1 \vee \cdots \vee F_h$ ($h \ge 0$), $I \models F$ iff there is some $F_i$ ($1 \le i \le h$) s.t. $I \models F_i$. So the empty disjunction ($h = 0$) is always false. We consider a conjunction $F_1 \wedge \cdots \wedge F_h$ as syntax sugar for $\neg(\neg F_1 \vee \cdots \vee \neg F_h)$ using De Morgan's law. Consequently the empty conjunction is always true. Let $P$ be a program having ordered rules $r_1, \ldots, r_m$ in ordered atoms $a_1, \ldots, a_n$ as before and $G_i$ the body of a rule $r_i$ in $P$. By definition, $I \models r_i$ ($r_i$ is true in $I$) iff $I \models$ the head of $r_i$ or $I \not\models G_i$. Also let $a \Leftrightarrow G_{i_1} \vee \cdots \vee G_{i_k}$ be the completed rule for an atom $a$ in $P$. We see $I \models (a \Leftrightarrow G_{i_1} \vee \cdots \vee G_{i_k})$ iff $I \models a$ exactly when $I \models G_{i_1} \vee \cdots \vee G_{i_k}$.
Now we isomorphically embed the above symbolic evaluation into vector spaces. Let $I$ be a model over the ordered atoms $a_1, \ldots, a_n$. We first vectorize $I$ as a binary column vector $\boldsymbol{v}$ such that $\boldsymbol{v}(i) = 1$ if $a_i \in I$ and $\boldsymbol{v}(i) = 0$ otherwise ($1 \le i \le n$), and introduce the dualized $\boldsymbol{v}$, written $\boldsymbol{v}_\delta$, by $\boldsymbol{v}_\delta = [\boldsymbol{v} ; \boldsymbol{1} - \boldsymbol{v}]$. $\boldsymbol{v}_\delta$ is a vertical concatenation of $\boldsymbol{v}$ and the bit inversion $\boldsymbol{1} - \boldsymbol{v}$ of $\boldsymbol{v}$.
Consider a matricized program $(Q, D)$ and its $i$-th rule $r_i$ having a body $G_i$ represented by $Q(i,:)$. Compute $Q(i,:)\,\boldsymbol{v}_\delta$, which is the number of literals of $G_i$ true in $I$, and compare it with the number of literals $\|Q(i,:)\|_1$ in $G_i$. When $Q(i,:)\,\boldsymbol{v}_\delta = \|Q(i,:)\|_1$ holds, all literals in $G_i$ are true in $I$ and hence the body $G_i$ is true in $I$. In this way, we can algebraically compute the truth value of each rule body, but since we consider a conjunction as a negated disjunction, we instead compute $Q(i,:)(\boldsymbol{1} - \boldsymbol{v}_\delta)$, which is the number of false literals in $G_i$. If this number is non-zero, $G_i$ has at least one literal false in $I$, and hence $G_i$ is false in $I$. The converse is also true. The existence of a false literal in $G_i$ is thus computed by $\mathrm{min}_1(Q(i,:)(\boldsymbol{1} - \boldsymbol{v}_\delta))$, which is $1$ if there is a false literal, and $0$ otherwise. Consequently $1 - \mathrm{min}_1(Q(i,:)(\boldsymbol{1} - \boldsymbol{v}_\delta)) = 1$ if there is no false literal in $G_i$ and vice versa. In other words, $1 - \mathrm{min}_1(Q(i,:)(\boldsymbol{1} - \boldsymbol{v}_\delta))$ computes the truth value of $G_i$ in $I$.
Now let $a_j \leftarrow G_{i_1}, \ldots, a_j \leftarrow G_{i_k}$ be an enumeration of the rules for $a_j$ and $G_{i_1} \vee \cdots \vee G_{i_k}$ the disjunction of their rule bodies. $\sum_{l=1}^{k} \big(1 - \mathrm{min}_1(Q(i_l,:)(\boldsymbol{1} - \boldsymbol{v}_\delta))\big)$ is the number of rule bodies among them that are true in $I$. Noting $D(j,i) = 1$ if $i \in \{i_1, \ldots, i_k\}$ and $D(j,i) = 0$ otherwise by construction of $D$, we replace the summation by matrix multiplication and obtain $D(j,:)\big(\boldsymbol{1} - \mathrm{min}_1(Q(\boldsymbol{1} - \boldsymbol{v}_\delta))\big)$. Introduce a column vector $\boldsymbol{u} = D\big(\boldsymbol{1} - \mathrm{min}_1(Q(\boldsymbol{1} - \boldsymbol{v}_\delta))\big)$. We have $\boldsymbol{u}(j)$ = the number of rules for $a_j$ whose bodies are true in $I$ ($1 \le j \le n$).
Proposition 1
Let $(Q, D)$ be a matricized program $P$ in a set $A$ of $n$ atoms and $\boldsymbol{v}$ the vectorized model $I$ over $A$. Put $\boldsymbol{u} = D\big(\boldsymbol{1} - \mathrm{min}_1(Q(\boldsymbol{1} - \boldsymbol{v}_\delta))\big)$. It holds that

$$ I \text{ is a supported model of } P \quad \text{if-and-only-if} \quad \boldsymbol{v} = \mathrm{min}_1(\boldsymbol{u}). \qquad (12) $$

(Proof) Put $\boldsymbol{w} = \mathrm{min}_1(\boldsymbol{u})$. Suppose $I$ is a supported model of $P$ and write $iff(a_j)$, the completed rule for an atom $a_j$ ($1 \le j \le n$), as $a_j \Leftrightarrow G_{i_1} \vee \cdots \vee G_{i_k}$ ($k \ge 0$). We have $I \models iff(a_j)$. So if $\boldsymbol{v}(j) = 1$, then $I \models a_j$, and hence $I \models G_{i_1} \vee \cdots \vee G_{i_k}$, giving $\boldsymbol{u}(j) \ge 1$ because $\boldsymbol{u}(j)$ is the number of rule bodies for $a_j$ that are true in $I$. So $\boldsymbol{w}(j) = 1 = \boldsymbol{v}(j)$ holds. Otherwise if $\boldsymbol{v}(j) = 0$, we have $I \not\models a_j$ and $I \not\models G_{i_1} \vee \cdots \vee G_{i_k}$. Consequently none of the rule bodies are true in $I$ and we have $\boldsymbol{w}(j) = \boldsymbol{u}(j) = 0 = \boldsymbol{v}(j)$. Putting the two together, we have $\boldsymbol{v}(j) = \boldsymbol{w}(j)$. Since $j$ is arbitrary, we conclude $\boldsymbol{v} = \boldsymbol{w}$, or equivalently $\boldsymbol{v} = \mathrm{min}_1(\boldsymbol{u})$. The converse is similarly proved. Q.E.D.
Proposition 1 says that whether $I$ is a supported model of the program $P$ or not is determined by computing $\mathrm{min}_1\big(D(\boldsymbol{1} - \mathrm{min}_1(Q(\boldsymbol{1} - \boldsymbol{v}_\delta)))\big)$ in vector spaces, whose complexity is $O(mn)$ where $m$ is the number of rules in $P$ and $n$ that of the atoms occurring in $P$.
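Proposition 1 is directly executable; a sketch (NumPy, reusing the `Q` and `D` built above) of the supported model check:

```python
import numpy as np

def supported_check(Q, D, v):
    """True iff binary v satisfies v = min1(D (1 - min1(Q (1 - v_delta))))."""
    vd = np.concatenate([v, 1.0 - v])             # dualized v
    false_lits = np.minimum(1.0, Q @ (1.0 - vd))  # 1 iff body i has a false literal
    u = D @ (1.0 - false_lits)                    # rules with true body, per atom
    return np.array_equal(np.minimum(1.0, u), v)

v = np.array([1.0, 1.0, 0.0])                     # I = {p, q} over (p, q, r)
print(supported_check(Q, D, v))                   # True: I is a supported model
```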
In the case of $P_0$ in (5) having three rules $\{r_1, r_2, r_3\}$, take the model $I = \{p, q\}$ over the ordered atom set $\{p, q, r\}$ where $p$ and $q$ are true in $I$ but $r$ is false. Then we have $\boldsymbol{v} = (1\ 1\ 0)^T$, $\boldsymbol{v}_\delta = (1\ 1\ 0\ 0\ 0\ 1)^T$, and finally $Q(\boldsymbol{1} - \boldsymbol{v}_\delta) = (0\ 1\ 0)^T$. The last equation says that the rule bodies of $r_1$, $r_2$ and $r_3$ have respectively zero, one and zero literals false in $I$. Hence $\mathrm{min}_1(Q(\boldsymbol{1} - \boldsymbol{v}_\delta)) = (0\ 1\ 0)^T$ indicates that only the second rule body is false and the other two bodies are true in $I$. So its bit inversion $\boldsymbol{1} - \mathrm{min}_1(Q(\boldsymbol{1} - \boldsymbol{v}_\delta)) = (1\ 0\ 1)^T$ gives the truth values of the rule bodies in $I$. Thus, by combining these truth values in terms of the disjunctions $D$, we obtain $\boldsymbol{u} = D\big(\boldsymbol{1} - \mathrm{min}_1(Q(\boldsymbol{1} - \boldsymbol{v}_\delta))\big) = (1\ 1\ 0)^T$. $\boldsymbol{u}$ denotes, for each atom, the number of rules for it whose body is true in $I$. For example, $\boldsymbol{u}(1) = 1$ means that the first atom $p$ in $A$ has one rule ($r_1$) whose body ($q \wedge \neg r$) is true in $I$. Likewise $\boldsymbol{u}(2) = 1$ means that the second atom $q$ has one rule ($r_3$) whose body (empty) is true in $I$. $\boldsymbol{u}(3) = 0$ indicates that the third atom $r$ has no such rule. Therefore $\mathrm{min}_1(\boldsymbol{u}) = (1\ 1\ 0)^T$ denotes the truth values of the right hand sides of the completed rules evaluated by $I$. Since $\mathrm{min}_1(\boldsymbol{u}) = \boldsymbol{v}$ holds, it follows from Proposition 1 that $I$ is a supported model of $P_0$.
We next show how $P^I$, the reduct of $P$ by $I$, is dealt with in vector spaces. We assume $P$ has $m$ rules in a set $A$ of $n$ ordered atoms as before. We first show the evaluation of the reduct of the matricized program $(Q, D)$ by a vectorized model $\boldsymbol{v}$. Write $Q$ as $Q = [Q^+\ Q^-]$ where $Q^+$ (resp. $Q^-$) is the left half (resp. the right half) of $Q$ representing the positive literals (resp. negative literals) of each rule body in $P$. Compute $\boldsymbol{s} = \mathrm{min}_1(Q^-\boldsymbol{v})$. It is an $m \times 1$ matrix (treated as a column vector here) such that $\boldsymbol{s}(i) = 1$ if the body of $r_i$ contains a negative literal false in $I$ and $\boldsymbol{s}(i) = 0$ otherwise ($1 \le i \le m$). Let $r_i^+$ be the rule $r_i$ with the negative literals in its body deleted. We see that $P^I = \{\, r_i^+ \mid \boldsymbol{s}(i) = 0 \,\}$ and that it is syntactically represented by $(Q^+, D^I)$ where $D^I$ is $D$ with the $i$-th column replaced by the zero column vector if $\boldsymbol{s}(i) = 1$ ($1 \le i \le m$). $D^I(j,:)$ denotes the rule set in $P^I$ for $a_j$. We call $(Q^+, D^I)$ the matricized reduct of $P$ by $I$.
The matricized reduct is evaluated in vector spaces as follows. Compute $\boldsymbol{t} = (\boldsymbol{1} - \boldsymbol{s}) \odot \big(\boldsymbol{1} - \mathrm{min}_1(Q^+(\boldsymbol{1} - \boldsymbol{v}))\big)$ where $\odot$ is the component-wise product. $\boldsymbol{t}$ denotes the truth values of the rule bodies in $P^I$ evaluated by $I$. Thus $\boldsymbol{t}(i) = 1$ ($1 \le i \le m$) if $r_i^+$ is contained in $P^I$ and its body is true in $I$. Otherwise $\boldsymbol{t}(i) = 0$, and $r_i^+$ is not contained in $P^I$ or the body of $r_i^+$ is false in $I$. Introduce $\boldsymbol{u}' = D\,\boldsymbol{t}$. $\boldsymbol{u}'(j)$ ($1 \le j \le n$) is the number of rules in $P^I$ for the $j$-th atom $a_j$ whose bodies are true in $I$.
Proposition 2
Let $(Q, D)$ be a matricized program $P$ in a set $A$ of $n$ ordered atoms and $I$ a model over $A$. Write $Q = [Q^+\ Q^-]$ as above. Let $\boldsymbol{v}$ be the vectorized $I$. Compute $\boldsymbol{s} = \mathrm{min}_1(Q^-\boldsymbol{v})$, $\boldsymbol{t} = (\boldsymbol{1} - \boldsymbol{s}) \odot \big(\boldsymbol{1} - \mathrm{min}_1(Q^+(\boldsymbol{1} - \boldsymbol{v}))\big)$ and $\boldsymbol{u}' = D\,\boldsymbol{t}$. Also compute $\boldsymbol{u} = D\big(\boldsymbol{1} - \mathrm{min}_1(Q(\boldsymbol{1} - \boldsymbol{v}_\delta))\big)$. Then, $\boldsymbol{v} = \mathrm{min}_1(\boldsymbol{u}')$, $\boldsymbol{v} = \mathrm{min}_1(\boldsymbol{u})$, and "$I$ is a supported model of $P$" are all equivalent.
(Proof) We prove $\boldsymbol{u} = \boldsymbol{u}'$ first. Recall that a rule $r_i^+$ in $P^I$ is created by removing the negative literals, all true in $I$, from the body of $r_i$ in $P$. So for any $a_j$, it is immediate that $P^I$ has a rule $r_i^+$ for $a_j$ whose body is true in $I$ iff $P$ has the rule $r_i$ for $a_j$ whose body is true in $I$. Thus $\boldsymbol{u}(j) = \boldsymbol{u}'(j)$ for every $j$ ($1 \le j \le n$), and hence $\boldsymbol{u} = \boldsymbol{u}'$. Consequently, we have $\boldsymbol{v} = \mathrm{min}_1(\boldsymbol{u})$ iff $\boldsymbol{v} = \mathrm{min}_1(\boldsymbol{u}')$. Also "$\boldsymbol{v} = \mathrm{min}_1(\boldsymbol{u}')$ iff $I$ is a supported model of $P^I$" is proved similarly to Proposition 1 (details omitted). The rest follows from Proposition 1. Q.E.D.
From the viewpoint of end-to-end ASP, Proposition 2 means that we can obtain a supported model as a binary solution $\boldsymbol{v}$ of the equation $\boldsymbol{v} = \mathrm{min}_1(\boldsymbol{u})$ derived from $P$, or of $\boldsymbol{v} = \mathrm{min}_1(\boldsymbol{u}')$ derived from the reduct $P^I$. Either equation is possible and gives the same result, but their computation will be different. This is because the former equation is piecewise linear w.r.t. $\boldsymbol{v}$ whereas the latter is piecewise quadratic w.r.t. $\boldsymbol{v}$.
Now look at $P_0$ in (5) and the model $I = \{p, q\}$ again. $P_0^I = \{p \leftarrow q,\ q \leftarrow\}$ is the reduct of $P_0$ by $I$. $P_0^I$ has the least model $\{p, q\}$ that coincides with $I$. So $I$ is a stable model of $P_0$. To simulate the reduction process of $P_0$ in vector spaces, let $(Q, D)$ be the matricized $P_0$. We first decompose $Q$ in (8) as $Q = [Q^+\ Q^-]$ where $Q^+$ is the positive part and $Q^-$ the negative part of $Q$. They are

$$ Q^+ = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \qquad\qquad Q^- = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} $$

Let $\boldsymbol{v} = (1\ 1\ 0)^T$ be the vectorized $I$. We first compute $\boldsymbol{s} = \mathrm{min}_1(Q^-\boldsymbol{v}) = (0\ 1\ 0)^T$ to determine the rules to be removed. Since $\boldsymbol{s}(2) = 1$, the second rule $r_2 = (p \leftarrow \neg q)$, indicated by $\boldsymbol{s}$, is removed from $P_0$, giving $P_0^I = \{p \leftarrow q,\ q \leftarrow\}$. Using $Q^+$ above and $D$ shown in (11), we then compute $\boldsymbol{t} = (\boldsymbol{1} - \boldsymbol{s}) \odot \big(\boldsymbol{1} - \mathrm{min}_1(Q^+(\boldsymbol{1} - \boldsymbol{v}))\big) = (1\ 0\ 1)^T$ and $\boldsymbol{u}' = D\,\boldsymbol{t} = (1\ 1\ 0)^T$. $\boldsymbol{u}'$ denotes the number of rule bodies in $P_0^I$ true in $I$ for each atom. Thus, since $\mathrm{min}_1(\boldsymbol{u}') = \boldsymbol{v}$ holds, $I$ is a supported model of $P_0$ by Proposition 2.
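The reduction process just traced can also be written down directly; a sketch (NumPy, same running example) of the evaluation via the matricized reduct of Proposition 2:

```python
import numpy as np

def reduct_check(Q, D, v):
    """Check v = min1(u') where u' is computed through the GL reduct (Prop. 2)."""
    n = v.shape[0]
    Qp, Qm = Q[:, :n], Q[:, n:]                   # positive / negative halves of Q
    s = np.minimum(1.0, Qm @ v)                   # s(i) = 1 iff rule i is removed
    t = (1.0 - s) * (1.0 - np.minimum(1.0, Qp @ (1.0 - v)))  # body truth in reduct
    u2 = D @ t                                    # true bodies per atom in the reduct
    return np.array_equal(np.minimum(1.0, u2), v)

print(reduct_check(Q, D, np.array([1.0, 1.0, 0.0])))  # True, agreeing with Prop. 2
```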
3.3 Cost minimization for supported models
Having linear algebraically reformulated several concepts in logic programming, we tackle the problem of computing supported models in vector spaces. Although there already exist approaches to this problem, we tackle it without assuming any condition on programs while allowing constraints. Aspis et al. formulated the problem as solving a non-linear equation containing a sigmoid function [2]. They encode normal logic programs differently from ours, based on Sakama's encoding [22], and impose the MD condition on programs, which is rather restrictive. No support is provided for constraints in their approach. Later Takemura and Inoue proposed another approach [27] which encodes a program in terms of a single matrix and evaluates conjunctions based on the number of true literals. They compute supported models by minimizing a non-negative function, not by solving equations as in [2]. Their programs are however restricted to those satisfying the SD condition, and constraints are not considered.
Here we introduce an end-to-end way of computing supported models in
vector spaces through cost minimization of a new cost function based
on the evaluation of disjunction. We impose no syntactic restriction
on programs and allow constraints. We believe that these two features
make our end-to-end ASP approach more feasible.
We can base our supported model computation either on Proposition 1 or on Proposition 2. In the latter case, we have to compute the GL reduction, which requires complicated computation compared to the former case. So for the sake of simplicity, we explain the former. Then our task in vector spaces is to find a binary vector $\boldsymbol{v}$ representing a supported model of a matricized program $(Q, D)$, i.e. one that satisfies $\boldsymbol{v} = \mathrm{min}_1(\boldsymbol{u})$ where $\boldsymbol{u} = D\big(\boldsymbol{1} - \mathrm{min}_1(Q(\boldsymbol{1} - \boldsymbol{v}_\delta))\big)$. For this task, we relax binary $\boldsymbol{v}$ to a real vector and introduce a non-negative cost function $J_{SU}$, given in (27) below.
Proposition 3
Let $J_{SU}$ be defined from a program $P$ as above (see (27)). $J_{SU}(\boldsymbol{v}) = 0$ if-and-only-if $\boldsymbol{v}$ is a binary vector representing a supported model of $P$.
(Proof) Apparently if $J_{SU}(\boldsymbol{v}) = 0$, we have $\mathrm{min}_1(\boldsymbol{u}) = \boldsymbol{v}$ and $\boldsymbol{v} \odot (\boldsymbol{1} - \boldsymbol{v}) = \boldsymbol{0}$. The second equation means $\boldsymbol{v}$ is binary ($\boldsymbol{v}(i) \in \{0, 1\}$ for all $i$), and the first equation means this binary $\boldsymbol{v}$ is a vector representing a supported model of $P$ by Proposition 1. The converse is obvious. Q.E.D.
$J_{SU}$ is piecewise differentiable and we can obtain a supported model of $P$ as a root of $J_{SU} = 0$ by minimizing $J_{SU}$ to zero using Newton's method. The Jacobian required for Newton's method is derived as follows. We assume $P$ is written in the ordered atoms $a_1, \ldots, a_n$ and $\boldsymbol{v}$ represents their continuous truth values where $\boldsymbol{v}(i)$ is the continuous truth value for atom $a_i$ ($1 \le i \le n$). For the derivation, we use the fact that the (piecewise) derivative of $\mathrm{min}_1(x)$ is $[x \le 1]$.
Let $(Q, D)$ be the matricized program and write $Q = [Q^+\ Q^-]$. Introduce $\boldsymbol{z} = Q(\boldsymbol{1} - \boldsymbol{v}_\delta) = Q^+(\boldsymbol{1} - \boldsymbol{v}) + Q^-\boldsymbol{v}$, $\boldsymbol{c} = \boldsymbol{1} - \mathrm{min}_1(\boldsymbol{z})$, $\boldsymbol{u} = D\boldsymbol{c}$, $\boldsymbol{w} = \mathrm{min}_1(\boldsymbol{u})$, and compute $J_{SU}$ by

$$ J_{SU} = \frac{1}{2}\,\|\boldsymbol{w} - \boldsymbol{v}\|_2^2 \;+\; \frac{\ell_2}{2}\,\|\boldsymbol{v} \odot (\boldsymbol{1} - \boldsymbol{v})\|_2^2 \qquad (\ell_2 > 0). \qquad (27) $$
We then compute the Jacobian $J_a^{SU} = \partial J_{SU}/\partial\boldsymbol{v}$ of $J_{SU}$. We first compute $\partial\boldsymbol{w}/\partial\boldsymbol{v}$. Since $\partial\boldsymbol{z}/\partial\boldsymbol{v} = Q^- - Q^+$, we have

$$ \frac{\partial\boldsymbol{w}}{\partial\boldsymbol{v}} = \mathrm{diag}([\boldsymbol{u} \le 1])\; D\; \mathrm{diag}([\boldsymbol{z} \le 1])\,(Q^+ - Q^-). $$

Next we compute the derivative of the penalty term, $\frac{\partial}{\partial\boldsymbol{v}}\frac{\ell_2}{2}\|\boldsymbol{v} \odot (\boldsymbol{1} - \boldsymbol{v})\|_2^2 = \ell_2\,(\boldsymbol{1} - 2\boldsymbol{v}) \odot \boldsymbol{v} \odot (\boldsymbol{1} - \boldsymbol{v})$, and reach

$$ J_a^{SU} = \Big(\frac{\partial\boldsymbol{w}}{\partial\boldsymbol{v}}\Big)^{T}(\boldsymbol{w} - \boldsymbol{v}) - (\boldsymbol{w} - \boldsymbol{v}) + \ell_2\,(\boldsymbol{1} - 2\boldsymbol{v}) \odot \boldsymbol{v} \odot (\boldsymbol{1} - \boldsymbol{v}). \qquad (28) $$
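Under this formulation, $J_{SU}$ and its piecewise gradient can be coded as follows (a sketch; the subgradient convention $[x \le 1]$ at the kinks of $\mathrm{min}_1$ and the penalty weight `l2` are assumptions of this illustration):

```python
import numpy as np

def j_su_and_grad(Q, D, v, l2=1.0):
    """J_SU = 0.5*||min1(u) - v||^2 + 0.5*l2*||v*(1-v)||^2 and its gradient."""
    n = v.shape[0]
    Qp, Qm = Q[:, :n], Q[:, n:]
    z = Qp @ (1.0 - v) + Qm @ v                 # z = Q (1 - v_delta)
    c = 1.0 - np.minimum(1.0, z)                # rule body truth values
    u = D @ c
    w = np.minimum(1.0, u)
    r = w - v
    pen = v * (1.0 - v)
    J = 0.5 * (r @ r) + 0.5 * l2 * (pen @ pen)
    # dw/dv = diag([u<=1]) D diag([z<=1]) (Qp - Qm), as in (28)
    W = (D * (u <= 1.0)[:, None]) @ ((Qp - Qm) * (z <= 1.0)[:, None])
    grad = W.T @ r - r + l2 * pen * (1.0 - 2.0 * v)
    return J, grad
```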
3.4 Adding constraints
A rule which has no head, like $\leftarrow G$, is called a constraint. We oftentimes need supported models which satisfy constraints. Since constraints are just rules without a head, we encode them just like rule bodies in a program, using a binary matrix $C$ which we call a constraint matrix. We introduce $J_C$, a non-negative function of $\boldsymbol{v}$, and $J_C$'s Jacobian $J_a^C$ as follows.
$$ J_C = \boldsymbol{1} \cdot \big( \boldsymbol{1} - \mathrm{min}_1( C(\boldsymbol{1} - \boldsymbol{v}_\delta) ) \big) \qquad (29) $$
$$ J_a^C = \frac{\partial J_C}{\partial \boldsymbol{v}} = (C^+ - C^-)^{T}\, \big[\, C(\boldsymbol{1} - \boldsymbol{v}_\delta) \le 1 \,\big] \qquad \text{where } C = [C^+\ C^-] \text{ is split like } Q \qquad (30) $$
The meaning of $J_C$ and $J_a^C$ is clear when $\boldsymbol{v}$ is binary. Note that any binary $\boldsymbol{v}$ is considered as a model $I$ over a set of ordered atoms in an obvious way. Suppose $h$ constraints are given to be satisfied. Then $C$ is a binary $h \times 2n$ matrix and $C(\boldsymbol{1} - \boldsymbol{v}_\delta)$ is an $h \times 1$ matrix. $C(i,:)(\boldsymbol{1} - \boldsymbol{v}_\delta)$ ($1 \le i \le h$) is the number of literals falsified by $I$ in the conjunction $G_i$ of the $i$-th constraint $\leftarrow G_i$. So $C(i,:)(\boldsymbol{1} - \boldsymbol{v}_\delta) = 0$, or equivalently $1 - \mathrm{min}_1(C(i,:)(\boldsymbol{1} - \boldsymbol{v}_\delta)) = 1$, implies $G_i$ has no false literal, i.e. $I \models G_i$ and the constraint is violated, and vice versa. Hence $J_C$ equals the number of violated constraints. Consequently, when $\boldsymbol{v}$ is binary, we can say that $J_C = 0$ iff all constraints are satisfied by $I$.
When $\boldsymbol{v}$ is not binary but just a real vector, $J_C$ and $J_a^C$ are thought of as continuous approximations of their binary counterparts. Since $J_C$ is a piecewise differentiable non-negative function of $\boldsymbol{v}$, the approximation error can be minimized to zero by Newton's method using $J_a^C$ in (30) (the derivation of (30) is straightforward and omitted).
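For binary $\boldsymbol{v}$ the constraint cost literally counts violated constraints; a sketch consistent with (29):

```python
import numpy as np

def j_c(C, v):
    """Continuous count of violated constraints; exact when v is binary (eq. (29))."""
    vd = np.concatenate([v, 1.0 - v])
    false_lits = np.minimum(1.0, C @ (1.0 - vd))  # 1 iff constraint i has a false literal
    return float(np.sum(1.0 - false_lits))        # constraints with no false literal
```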
3.5 An algorithm for computing supported models with constraints
Here we present a minimization algorithm for computing supported models of the matricized program $(Q, D)$ which satisfy the constraints represented by a constraint matrix $C$. We first combine $J_{SU}$ and $J_C$ into $J_{SU+C}$:

$$ J_{SU+C} = J_{SU} + \ell_1\, J_C \qquad (\ell_1 > 0). \qquad (32) $$
The next proposition is immediate from Proposition 3.
Proposition 4
$J_{SU+C}(\boldsymbol{v}) = 0$ if-and-only-if $\boldsymbol{v}$ represents a supported model of $P$ satisfying the constraint matrix $C$.
We compute $J_{SU}$ in $J_{SU+C}$ by (27) and $J_C$ by (29), and their Jacobians $J_a^{SU}$ by (28) and $J_a^C$ by (30) respectively. We minimize the non-negative $J_{SU+C}$ to zero by Newton's method using Algorithm 1. It finds a solution of $J_{SU+C} = 0$ which represents a supported model of $P$ satisfying the constraint matrix $C$. The updating formula is derived from the first order Taylor expansion of $J_{SU+C}$ by solving $J_{SU+C} + J_a^{SU+C} \cdot (\boldsymbol{v}_{new} - \boldsymbol{v}) = 0$ w.r.t. $\boldsymbol{v}_{new}$. We use it with a learning rate $\alpha$ as follows.

$$ \boldsymbol{v}_{new} = \boldsymbol{v} - \alpha\, \frac{J_{SU+C}}{\|J_a^{SU+C}\|_2^2}\; J_a^{SU+C} \qquad (33) $$
Algorithm 1 is a double loop algorithm where the inner $j$-loop repeatedly updates $\boldsymbol{v}$ to minimize $J_{SU+C}$ while thresholding $\boldsymbol{v}$ into a binary solution candidate. The outer $i$-loop is for retry when the inner loop fails to find a solution. The initialization at line 3 is carried out by sampling $\boldsymbol{v}(i) \sim \mathcal{N}(0, 1)$ ($1 \le i \le n$) where $\mathcal{N}(0, 1)$ is the standard normal distribution. Lines 6, 7 and 8 collectively perform thresholding of $\boldsymbol{v}$ into a binary vector. As the inner loop repeats, $J_{SU+C}$ becomes smaller and smaller, and so do $J_{SU}$ and $J_C$ in $J_{SU+C}$. $J_{SU}$ being small means $\mathrm{min}_1(\boldsymbol{u})$ is close to $\boldsymbol{v}$, i.e. $\boldsymbol{v}$ is close to a supported model of $P$, while the penalty term being small means each element of $\boldsymbol{v}$ is close to $0$ or $1$. So binarization with an appropriate threshold $\theta$ (currently, given $\boldsymbol{v}$, we divide the interval $[\min_i \boldsymbol{v}(i), \max_i \boldsymbol{v}(i)]$ into 20 equally distributed notches and use each notch as a threshold value $\theta$) has a good chance of yielding a binary $\boldsymbol{v}$ representing a supported model of $P$ satisfying the constraints represented by $C$. It may happen that the inner loop fails to find a solution. In such a case, we retry another $j$-loop with $\boldsymbol{v}$ perturbed at line 12; there $\boldsymbol{v}$ is perturbed by adding Gaussian noise before the next $j$-loop.
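A compact sketch of the double loop of Algorithm 1 follows, reusing `j_su_and_grad` and `j_c` from the sketches above (the parameter names `max_try`, `max_itr`, `alpha`, `sigma` are hypothetical, and for brevity the constraint part of the Jacobian (30) is left out of the update):

```python
import numpy as np

def algorithm1(Q, D, C, max_try=10, max_itr=500, alpha=0.9, sigma=0.3, l1=1.0, l2=1.0):
    n = D.shape[0]
    rng = np.random.default_rng()
    v = rng.standard_normal(n)                      # line 3: random initialization
    for i in range(max_try):                        # outer i-loop: retries
        for j in range(max_itr):                    # inner j-loop: updates
            for theta in np.linspace(v.min(), v.max(), 20):   # 20 threshold notches
                vb = (v >= theta).astype(float)
                if j_su_and_grad(Q, D, vb, l2)[0] + l1 * j_c(C, vb) == 0.0:
                    return vb                       # binary root of J_{SU+C} found
            J, g = j_su_and_grad(Q, D, v, l2)
            J += l1 * j_c(C, v)                     # constraint gradient omitted here
            if g @ g > 0:
                v = v - alpha * (J / (g @ g)) * g   # update rule (33)
        v = v + sigma * rng.standard_normal(n)      # line 12: perturbation, retry
    return None                                     # failure
```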
3.6 Connection to neural network computation
At this point, it is quite interesting to see the connection of our approach to neural network computation. In (27), we compute $\boldsymbol{c} = \boldsymbol{1} - \mathrm{min}_1(Q(\boldsymbol{1} - \boldsymbol{v}_\delta))$ and $\boldsymbol{w} = \mathrm{min}_1(D\boldsymbol{c})$. We point out that the computation of this $\boldsymbol{w}$ is nothing but the output of a forward process of a single layer ReLU network from an input vector $\boldsymbol{v}$. Consider the computation of $\boldsymbol{c}$. We rewrite it using $\mathrm{min}_1(x) = 1 - \mathrm{ReLU}(1 - x)$ to

$$ \boldsymbol{c} = \mathrm{ReLU}\big( (Q^+ - Q^-)\,\boldsymbol{v} + (\boldsymbol{1} - Q^+\boldsymbol{1}) \big). $$

So $\boldsymbol{c}$ is the output of a ReLU network having the weight matrix $Q^+ - Q^-$ and the bias vector $\boldsymbol{1} - Q^+\boldsymbol{1}$. Then $\boldsymbol{w} = \mathrm{min}_1(D\boldsymbol{c})$ is the output of a ReLU network with a single hidden layer and an output layer represented by $D$ having $\mathrm{min}_1$ as activation function. Also when we compute a stable model, we minimize (27), which contains an MSE error term $\frac{1}{2}\|\boldsymbol{w} - \boldsymbol{v}\|_2^2$, using the update rule (33). This is precisely back propagation with learning data $\boldsymbol{v}$.
Thus we may say that our approach is an integration of ASP semantics and neural computation and provides a neuro-symbolic [23] way of ASP computation. Nonetheless, there is a big difference: in standard neural network architecture, a weight matrix and a bias vector are independent, whereas in our setting they are interdependent and faithfully reflect the logical structure of a program.
4 Computing stable models in vector spaces
4.1 Loop formulas and stable models
Let $(Q, D)$ be a matricized program $P$ in a set $A$ of $n$ atoms having $m$ rules, where $Q \in \{0,1\}^{m \times 2n}$ and $D \in \{0,1\}^{n \times m}$. We assume atoms and rules are ordered as indicated.
Computing a supported model of $P$ is equivalent to computing a binary fixedpoint $\boldsymbol{v} = \mathrm{min}_1\big(D(\boldsymbol{1} - \mathrm{min}_1(Q(\boldsymbol{1} - \boldsymbol{v}_\delta)))\big)$ in vector spaces and in this sense it is conceptually simple (though NP-hard). Contrastingly, since stable models are a proper subclass of supported models, if one wishes to obtain precisely stable models through fixedpoint computation, the exclusion of non-stable models is necessary. Lin-Zhao's theorem [14] states that $I$ is a stable model of $P$ iff $I$ is a supported model of $P$ and satisfies a set of formulas called loop formulas associated with $P$.
Let $L = \{a_1, \ldots, a_k\}$ be a loop in $P$. Recall that $L$ is a set of atoms which are strongly connected in the positive dependency graph of $P$ (in the case of a singleton loop $\{a\}$, we specifically require, following [14], that $a$ has a self-loop, i.e. there must be a rule for $a$ in $P$ whose body contains $a$ positively). A support rule for $L$ is a rule $a \leftarrow G$ such that $a \in L$ and no atom of $L$ occurs in $G^+$. Its body $G$ is called a support body for $L$. Introduce a loop formula $LF(L)$ for $L$ (in the original form, the antecedent of $LF(L)$ is the disjunction $a_1 \vee \cdots \vee a_k$ [14]; later it was shown that Lin-Zhao's theorem also holds for the AND type [6]; we choose the AND type as it is easier to satisfy) by

$$ LF(L) \;=\; a_1 \wedge \cdots \wedge a_k \;\rightarrow\; G_1 \vee \cdots \vee G_s \qquad (34) $$

where $G_1, \ldots, G_s$ are the support bodies for $L$. Then define the loop formulas associated with $P$ as $LF(P) = \{\, LF(L) \mid L \text{ is a loop in } P \,\}$. Logically $LF(P)$ is treated as the conjunction of its elements and we sometimes call it the loop formula associated with $P$. Now we evaluate $LF(P)$ by a real vector $\boldsymbol{v}$. Suppose there are $p$ loops $L_1, \ldots, L_p$ in $P$. Introduce an external support matrix $E \in \{0,1\}^{p \times m}$ by $E(k, i) = 1$ if the body of rule $r_i$ is a support body for $L_k$, else $E(k, i) = 0$ ($1 \le k \le p$, $1 \le i \le m$). Also introduce a loop matrix $M \in \{0,1\}^{p \times n}$ such that $M(k, j) = 1$ if the $k$-th loop $L_k$ contains the atom $a_j$, else $M(k, j) = 0$. Evaluate $LF(P)$ by $\boldsymbol{v}$ as follows, where $\boldsymbol{c} = \boldsymbol{1} - \mathrm{min}_1(Q(\boldsymbol{1} - \boldsymbol{v}_\delta))$ is the vector of rule body truth values as before.

$$ J_{LF} \;=\; \sum_{k=1}^{p} \big( 1 - \mathrm{min}_1( M(k,:)(\boldsymbol{1} - \boldsymbol{v}) ) \big)\, \big( 1 - \mathrm{min}_1( E(k,:)\, \boldsymbol{c} ) \big) \qquad (35) $$
$J_{LF}$ is a non-negative piecewise differentiable function of $\boldsymbol{v}$.
Proposition 5
Let $J_{LF}$ be defined as above. When $\boldsymbol{v}$ is binary, it holds that $J_{LF} = 0$ if-and-only-if $I \models LF(P)$.
(Proof) Suppose $J_{LF} = 0$ and $\boldsymbol{v}$ is binary. A summand in (35) corresponds to the $k$-th loop $L_k$ and is non-negative. Consider the antecedent of $LF(L_k)$ as a negated disjunction $\neg(\neg a_1 \vee \cdots \vee \neg a_k)$. Then $J_{LF} = 0$ implies that every summand is zero, or equivalently that $\mathrm{min}_1(M(k,:)(\boldsymbol{1} - \boldsymbol{v})) = 1$ or $\mathrm{min}_1(E(k,:)\boldsymbol{c}) = 1$. Consequently, as $\boldsymbol{v}$ is binary, we have $M(k,:)(\boldsymbol{1} - \boldsymbol{v}) \ge 1$ or $E(k,:)\boldsymbol{c} \ge 1$. The former means some atom of $L_k$ is false in $I$, i.e. $I \not\models a_1 \wedge \cdots \wedge a_k$. The latter, $E(k,:)\boldsymbol{c} \ge 1$, means $I \models G_1 \vee \cdots \vee G_s$. This is because $E(k,:)\boldsymbol{c}$ is the number of support rules for $L_k$ whose bodies are true in $I$, and hence $E(k,:)\boldsymbol{c} \ge 1$ means some support body $G_i$ ($1 \le i \le s$) for $L_k$ is true in $I$. So in either case $I \models LF(L_k)$. Since $k$ is arbitrary, we have $I \models LF(P)$. The converse is straightforward and omitted. Q.E.D.
The Jacobian $J_a^{LF} = \partial J_{LF}/\partial\boldsymbol{v}$ of $J_{LF}$ is computed as follows. Put $\alpha_k = 1 - \mathrm{min}_1(M(k,:)(\boldsymbol{1} - \boldsymbol{v}))$ and $\beta_k = 1 - \mathrm{min}_1(E(k,:)\boldsymbol{c})$, which are computed in (35). Then

$$ J_a^{LF} \;=\; \sum_{k=1}^{p} \beta_k\, \frac{\partial \alpha_k}{\partial \boldsymbol{v}} + \alpha_k\, \frac{\partial \beta_k}{\partial \boldsymbol{v}} \qquad (36) $$

where $\partial\alpha_k/\partial\boldsymbol{v} = [\,M(k,:)(\boldsymbol{1} - \boldsymbol{v}) \le 1\,]\; M(k,:)^{T}$ and $\partial\beta_k/\partial\boldsymbol{v} = -[\,E(k,:)\boldsymbol{c} \le 1\,]\;(Q^+ - Q^-)^{T}\,\mathrm{diag}([\boldsymbol{z} \le 1])\, E(k,:)^{T}$ with $\boldsymbol{z} = Q(\boldsymbol{1} - \boldsymbol{v}_\delta)$. Finally we combine all cost functions and their Jacobians into

$$ J_{ALL} = J_{SU} + \ell_1\, J_C + \ell_3\, J_{LF} \qquad (\ell_1, \ell_3 > 0) \qquad (37) $$
$$ J_a^{ALL} = J_a^{SU} + \ell_1\, J_a^{C} + \ell_3\, J_a^{LF}. \qquad (38) $$
Proposition 6
$I$ is a stable model of $P$ satisfying the constraints represented by $C$ if-and-only-if $\boldsymbol{v}$ is a root of $J_{ALL} = 0$.
We compute such $\boldsymbol{v}$ by Newton's method using Algorithm 1 with the update rule (33) modified so that $J_{SU+C}$ and $J_a^{SU+C}$ are replaced by $J_{ALL}$ and $J_a^{ALL}$ respectively.
When a program $P$ is tight [5], for example when its rules have no positive literal in their bodies, $P$ has no loop and hence $LF(P)$ is empty. In such a case, we directly minimize $J_{SU+C}$ instead of using $J_{ALL}$ with the empty $LF(P)$.
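Under the formulation of (35), the loop formula cost is also directly computable; a sketch (the loop matrix `M` and external support matrix `E` are as defined above):

```python
import numpy as np

def j_lf(Q, M, E, v):
    """Sum over loops of (loop-atom conjunction true) * (no support body true)."""
    n = v.shape[0]
    vd = np.concatenate([v, 1.0 - v])
    c = 1.0 - np.minimum(1.0, Q @ (1.0 - vd))    # rule body truth values, as in (27)
    ante = 1.0 - np.minimum(1.0, M @ (1.0 - v))  # antecedents: loop atoms all true
    supp = 1.0 - np.minimum(1.0, E @ c)          # 1 iff no support body is true
    return float(ante @ supp)                    # eq. (35)
```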
4.2 LF heuristics
Minimizing $J_{ALL}$ is a general way of computing stable models under constraints. It is applicable to any program and gives us a theoretical framework for computing stable models in an end-to-end way without depending on symbolic systems. However, there can be exponentially many loops [12] and they make the computation of (35) extremely difficult or practically impossible. To mitigate this seemingly insurmountable difficulty, we propose two heuristics which use only a subset of loop formulas.
• $LF_{SCC}$: The first heuristic considers only the set $LF_{SCC}$ of loop formulas associated with SCCs in the positive dependency graph of a program $P$. In the case of a singleton SCC $\{a\}$, $a$ must have a self-loop in the graph. We compute SCCs in linear time by Tarjan's algorithm [28] (see the sketch after the remark below).
• $LF_{cycle}$: In this heuristic, instead of SCCs (maximal strongly connected subgraphs), we choose minimal strongly connected subgraphs, i.e. cycle graphs. Denote by $LF_{cycle}$ the set of loop formulas associated with cycle graphs in the positive dependency graph. We use an enumeration algorithm described in [15] to enumerate cycles and construct $LF_{cycle}$, due to its simplicity.
We remark that although $LF_{SCC}$ and $LF_{cycle}$ can exclude (some of the) non-stable models, they do not necessarily exclude all non-stable models. However, the role of loop formulas in our framework is entirely different from their role in symbolic ASP. Namely, the role of LF in our framework is not to logically reject non-stable models but to guide the search process by gradient information in the continuous search space. Hence we expect, as actually observed in the experiments in the next section, that some loop formulas have the power to guide the search process to a root of $J_{ALL}$.
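The SCCs needed for $LF_{SCC}$ can be computed in linear time; a sketch of extracting them from the positive dependency graph, with SciPy's strongly connected components standing in for Tarjan's algorithm (an assumption of this sketch):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def positive_dependency_sccs(Q, D):
    """SCCs of the positive dependency graph: edge a -> b iff some rule a <- ... b ..."""
    n, m = D.shape
    Qp = Q[:, :n]
    adj = (D @ Qp > 0).astype(int)     # adj[a,b] = 1 iff a has a rule with b positive
    k, label = connected_components(csr_matrix(adj), directed=True, connection='strong')
    sccs = [np.flatnonzero(label == c) for c in range(k)]
    # keep a singleton only if it has a self-loop, as required for singleton loops
    return [s for s in sccs if len(s) > 1 or adj[s[0], s[0]] == 1]
```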
4.3 Precomputation
We introduce here precomputation. The idea is to remove atoms from the search space which are false in every stable model. It downsizes the program and realizes faster model computation.
When a program $P$ in a set $A$ = atom($P$) is given, we transform $P$ to a definite program $P^+$ by removing all negative literals from the rule bodies in $P$. Since $P^I \subseteq P^+$ holds as a set of rules for any model $I$, we have $LM(P^I) \subseteq LM(P^+)$, where $LM(R)$ denotes the least model of a definite program $R$. When $I$ is a stable model, $I = LM(P^I)$ holds and we have $I \subseteq LM(P^+)$. By taking the complements of both sides, we can say that if an atom $a$ is outside of $LM(P^+)$, i.e. if $a$ is false in $LM(P^+)$, so is $a$ in any stable model of $P$. Thus, by precomputing the least model $LM(P^+)$, we can remove the set of atoms $A_{false} = A \setminus LM(P^+)$ from our consideration as they are known to be false in any stable model. We call them stable false atoms. Of course, this precomputation needs the additional computation of $LM(P^+)$, but it can be done in time linear in the size of $P^+$, i.e. the total number of occurrences of atoms in $P^+$ [3] (we implemented the linear time algorithm in [3] linear algebraically using vectors and matrices and confirmed its linearity). Accordingly, precomputing the least model makes sense if the benefit of removing stable false atoms from the search space outweighs the linear time computation of $LM(P^+)$, which is likely to happen when we deal with programs with positive literals in the rule bodies.
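For illustration, $LM(P^+)$ can be obtained by a naive matrix fixpoint (quadratic in the worst case, unlike the linear-time algorithm of [3], but sufficient to convey the idea; a sketch):

```python
import numpy as np

def least_model(Q, D):
    """Least model of P+ (P with negative literals dropped) by naive iteration."""
    n, m = D.shape
    Qp = Q[:, :n]
    body_size = Qp.sum(axis=1)                       # positive literals per rule
    v = np.zeros(n)
    while True:
        fired = (Qp @ v >= body_size).astype(float)  # bodies whose atoms are all derived
        new_v = np.minimum(1.0, D @ fired)           # heads of fired rules
        if np.array_equal(new_v, v):
            return v                                 # fixpoint = least model of P+
        v = new_v
```

Atoms $a_j$ with `least_model(Q, D)[j] == 0` are exactly the stable false atoms.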
More concretely, given a program $P$ and a set of constraints $C$, we can obtain downsized ones, $P'$ and $C'$, as follows.
Step 1: Compute the least model $LM(P^+)$ and the set of stable false atoms $A_{false} = A \setminus LM(P^+)$.
Step 2: Define

$P' = \{\, a \leftarrow G' \mid (a \leftarrow G) \in P,\ a \notin A_{false},\ G^+$ contains no atom of $A_{false}$, and $G'$ is $G$ with the negative literals $\neg c$ ($c \in A_{false}$) removed$\,\}$ (39)
$C' = \{\, \leftarrow G' \mid (\leftarrow G) \in C,\ G^+$ contains no atom of $A_{false}$, and $G'$ is the conjunction $G$ with the negative literals $\neg c$ ($c \in A_{false}$) removed$\,\}$ (40)
Proposition 7
Let $P'$ and $C'$ be respectively the program (39) and the constraints (40). Also let $I'$ be a model over atom($P'$). Expand $I'$ to a model $I$ over atom($P$) by assuming every atom in $A_{false}$ is false in $I$. Then $I'$ is a stable model of $P'$ satisfying the constraints $C'$ if-and-only-if $I$ is a stable model of $P$ satisfying the constraints $C$.
(Proof) We first prove that $I'$ is a stable model of $P'$ if-and-only-if $I$ is a stable model of $P$. To prove it, we prove $LM(P'^{I'}) = LM(P^I)$ as sets.
Let $a \leftarrow G'^+$ be an arbitrary rule in $P'^{I'}$. Correspondingly there is a rule $a \leftarrow G'$ in $P'$ such that $I' \models G'^-$. So there is a rule $a \leftarrow G$ in $P$ such that $G'^+ = G^+$ and $G'^-$ is $G^-$ with the negative literals $\neg c$ ($c \in A_{false}$) removed. $I' \models G'^-$ implies $I \models G^-$ by construction of $I$ from $I'$. So $a \leftarrow G^+$ is contained in $P^I$, which means $a \leftarrow G'^+$ is contained in $P^I$ because $G'^+ = G^+$ (recall that $a \leftarrow G$ and $a \leftarrow G'$ have the same set of positive literals). Thus, since $a \leftarrow G'^+$ is an arbitrary rule, we conclude $P'^{I'} \subseteq P^I$, and hence $LM(P'^{I'}) \subseteq LM(P^I)$.
Now consider $b \in LM(P^I)$. There is an SLD derivation for $b$ from $P^I$. Let $a \leftarrow G^+$ be a rule used in the derivation which is derived from the rule $a \leftarrow G$ in $P$ such that $I \models G^-$. Since $LM(P^I) \subseteq LM(P^+)$, we have $a \in LM(P^+)$ and hence $a \notin A_{false}$, i.e., the head contains no stable false atom. So $G^+$ contains no atom of $A_{false}$ either, because every atom in the SLD derivation must belong to $LM(P^+)$. Accordingly $a \leftarrow G$ yields a rule $a \leftarrow G'$ in $P'$. On the other hand, $I \models G^-$ implies $I' \models G'^-$. So $a \leftarrow G'$ is in $P'$ and $a \leftarrow G'^+$ is in $P'^{I'}$. Therefore $a \leftarrow G^+$ is in $P'^{I'}$ because $G'^+ = G^+$. Thus every rule used in the derivation for $b$ from $P^I$ is also a rule contained in $P'^{I'}$, which means $b \in LM(P'^{I'})$. Since $b$ is arbitrary, it follows that $LM(P^I) \subseteq LM(P'^{I'})$. By putting the two inclusions together, we conclude $LM(P'^{I'}) = LM(P^I)$.
Then, if $I$ is a stable model of $P$, we have $I = LM(P^I)$ as a set. Since $I = I'$ as a set (every atom of $A_{false}$ is false in $I$), we have $I' = LM(P'^{I'})$ as a set, which means $I'$ is a stable model of $P'$. Likewise, when $I'$ is a stable model of $P'$, we have $I' = LM(P'^{I'})$ and $I = I'$ as sets. So $I = LM(P^I)$ as a set and $I$ is a stable model of $P$. We can also similarly prove that $I$ satisfies $C$ if-and-only-if $I'$ satisfies $C'$. So we are done. Q.E.D.
5 Programming examples
In this section, we apply our ASP approach to examples as a proof of concept and examine the effectiveness of precomputation and LF heuristics. Since large scale computing is out of the scope of this paper, the program size is mostly small (the matricized programs in this paper are all written in GNU Octave 6.4.0 and run on a PC with an Intel(R) Core(TM) CPU and 26GB memory).
5.1 The 3-coloring problem
We first deal with the 3-coloring problem. Suppose we are given a graph. The task is to color the vertices of the graph blue, red and green so that no two adjacent vertices have the same color, like (b) in Fig. 1.
[Fig. 1: an example graph (a) and its 3-coloring (b)]
There are four nodes in the graph. We assign a set of three color atoms (Boolean variables) to each node $i$ to represent its color. For example, node 1 is assigned the three color atoms $\{b_1, r_1, g_1\}$. We need to represent two facts by these atoms.
• Each node has a unique color chosen from {blue, red, green}. So the color atoms assigned to each node are in an XOR relation. We represent this fact by a tight program $P_{3col}$ below containing three rules for each node $i$ ($1 \le i \le 4$).

$$ b_i \leftarrow \neg r_i \wedge \neg g_i \qquad r_i \leftarrow \neg b_i \wedge \neg g_i \qquad g_i \leftarrow \neg b_i \wedge \neg r_i \qquad (45) $$

• Two nodes connected by an edge must have a different color. We represent this fact in terms of constraints: for every edge $(i, j)$ of the graph,

$$ \leftarrow b_i \wedge b_j \qquad \leftarrow r_i \wedge r_j \qquad \leftarrow g_i \wedge g_j \qquad (51) $$
Assuming an ordering of atoms $b_1, r_1, g_1, \ldots, b_4, r_4, g_4$, the normal logic program shown in (45) is matricized to $(Q_{3col}, D_{3col})$ where $D_{3col}$ is a $12 \times 12$ binary identity matrix (because there are 12 atoms and each atom has just one rule) and $Q_{3col}$ is a $12 \times 24$ binary matrix (54) whose rows encode the negative bodies in (45). The constraints listed in (51) are matricized to a constraint matrix $C_{3col}$ (57) with one row per constraint, the columns being grouped per node into triples $(b_i\ r_i\ g_i)$ and their negated counterparts.
We run Algorithm 1 on the program $(Q_{3col}, D_{3col})$ with the constraints $C_{3col}$ to find a supported model (solution) of $P_{3col}$ satisfying the constraints (since $P_{3col}$ is a tight program, every supported model of $P_{3col}$ is a stable model and vice versa).
Table 1:
time(s) | #solutions
6.7(0.7) | 5.2(0.9)
To measure the time to find a model, we conduct ten trials (one trial consists of at most a fixed number of parameter updates) of running Algorithm 1 and take the average. The result is 0.104s(0.070) on average. Also, to check the ability of finding different solutions, we perform ten trials of Algorithm 1 (without the another solution constraint introduced in Section 5.2) and count the number of different solutions among the returned solutions. #solutions in Table 1 is the average of ten such measurements. Due to the naive implementation, computation is slow, but the number of different solutions, 5.2 on average, seems rather high considering there are six solutions.
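For completeness, a sketch of how (45) and (51) can be generated mechanically for an arbitrary graph (the atom order $b_i, r_i, g_i$ per node follows the text; the concrete edge list below is a hypothetical stand-in for the graph of Fig. 1):

```python
import numpy as np

def three_coloring_matrices(n_nodes, edges):
    """Build (Q, D) for the XOR rules (45) and the constraint matrix C for (51)."""
    n = 3 * n_nodes                              # atoms b_i, r_i, g_i per node
    Q = np.zeros((n, 2 * n))                     # one rule per atom, so D = I
    for i in range(n_nodes):
        for c in range(3):                       # e.g. b_i <- not r_i & not g_i
            for other in range(3):
                if other != c:
                    Q[3 * i + c, n + 3 * i + other] = 1   # negative literals
    D = np.eye(n)
    C = np.zeros((3 * len(edges), 2 * n))
    for e, (i, j) in enumerate(edges):
        for c in range(3):                       # <- color_i & color_j per color
            C[3 * e + c, 3 * i + c] = 1
            C[3 * e + c, 3 * j + c] = 1
    return Q, D, C

# a 4-node instance with exactly six 3-colorings (K4 minus one edge; hypothetical)
Q3, D3, C3 = three_coloring_matrices(4, [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)])
```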
Next we check the scalability of our approach by a simple problem: the 3-coloring of a cycle graph like (a) in Fig. 2. In general, given a cycle graph that has $n$ nodes, we encode its 3-coloring problem as in the previous example by a matricized program $(Q, D)$ and a constraint matrix $C$ where $D$ is an identity matrix and $Q$ and $C$ represent respectively the rules and the constraints. There are $2^n + 2(-1)^n$ solutions in the $2^{3n}$ possible assignments for the $3n$ atoms. So the problem becomes exponentially more difficult as $n$ goes up.
[Fig. 2: (a) a cycle graph, (b) a convergence curve of $J_{SU+C}$, (c) time to find a solution vs. $n$]
The graph (b) in Fig. 2 is an example of a convergence curve of $J_{SU+C}$ produced by Algorithm 1. The curve tells us that in the first cycle of the $j$-loop (the inner for loop of Algorithm 1), no solution is found after the maximum number of updates of the continuous assignment vector $\boldsymbol{v}$. Then a perturbation is given to $\boldsymbol{v}$, which causes a small jump of $J_{SU+C}$, and the second cycle of the $j$-loop starts; this time a solution is found after dozens of updates, by thresholding $\boldsymbol{v}$ into a binary vector.
The graph (c) in Fig. 2 shows the scalability of the computation time to find a solution as $n$ grows. We plot the average of ten measurements of the time to find a solution. The graph seems to indicate good linearity w.r.t. $n$ over the tested range.
5.2 The Hamiltonian cycle problem, precomputation and another solution constraint
A Hamiltonian cycle (HC) is a cycle in a graph that visits every vertex exactly once, and the Hamiltonian cycle problem is to determine if an HC exists in a given graph. It is an NP-complete problem and has been used as a programming example since the early days of ASP. Initially it was encoded by a non-tight program containing positive recursion [19]. Later, a way of encoding by a program that is not tight but tight on its completion models was proposed [13]. We here introduce yet another encoding by a tight ground program inspired by the SAT encoding proposed in [30], where Zhou showed that the problem is solvable by translating the six conditions listed in Fig. 3 into a SAT problem (actually he listed seven conditions to be encoded as a SAT problem; however, one of them was found to be redundant and we use the remaining six conditions).
Fig. 3: the six conditions and their meanings
(1): one of the outgoing edges from vertex $i$ is in an HC
(2): if edge $(i,j)$ is in an HC and vertex $i$ is visited at time $t$, vertex $j$ is visited at time $t+1$
(3): vertex 1 is visited at time 1 (starting point)
(4): one of the incoming edges to vertex $j$ is in an HC
(5): if edge $(i,j)$ is in an HC, vertex $j$ is visited at some time
(6): vertex $i$ is visited once ($1 \le i \le n$)
In what follows, we assume vertices are numbered from 1 to $n$ = the number of vertices in a graph. We use $(i,j)$ to denote an edge from vertex $i$ to vertex $j$ and $hc(i,j)$ to indicate that there exists an edge from $i$ to $j$ in an HC. $u_{it}$ means vertex $i$ is visited at time $t$ ($1 \le i, t \le n$) and $u_i$ means that one of $u_{i1}, \ldots, u_{in}$ is true. We translate these conditions into a program $P_{hc}$ and constraints $C_{hc}$. To be more precise, the first condition is translated into a tight program just like the program (45). The positive conditions constitute a tight definite program. Constraints are encoded as a set of implications of the form $\leftarrow l_1 \wedge \cdots \wedge l_k$ where $l_1, \ldots, l_k$ are literals. The set of atoms contained in a stable model of $P_{hc}$ satisfying $C_{hc}$ gives an HC.
We apply the above encoding to a simple Hamiltonian cycle problem for the graph in Fig. 4 (the graph is taken from Section 4.2 in A User's Guide to gringo, clasp, clingo, and iclingo, http://wp.doc.ic.ac.uk/arusso/wp-content/uploads/sites/47/2015/01/clingo_guide.pd). There are six vertices and six HCs. To solve this HC problem, we matricize $P_{hc}$ and $C_{hc}$. There are 36 atoms $hc(i,j)$ ($1 \le i, j \le 6$) and 36 atoms $u_{it}$ ($1 \le i, t \le 6$). So there are 72 atoms in total. $P_{hc}$ contains 197 rules in these 72 atoms and we translate $P_{hc}$ into a pair of matrices $(Q_{hc}, D_{hc})$ where $D_{hc}$ is a binary matrix for disjunctions (for example, condition (2) generates six rules for each edge $(i,j)$) and $Q_{hc}$ is a binary matrix for conjunctions (rule bodies). Likewise $C_{hc}$ is translated into a constraint matrix which is a binary matrix. Then our task is to find a root of the cost function $J_{SU+C}$ of Section 3.5 constructed from these $Q_{hc}$, $D_{hc}$ and the constraint matrix in a 72 dimensional vector space by minimizing it to zero.
We apply the precomputation of the previous section to $P_{hc}$ and $C_{hc}$ to reduce the program size. It takes 2.3ms and detects 32 stable false atoms. It outputs a precomputed program and a constraint matrix that are much smaller than the originals; precomputation removes 32 of the 72 atoms from the search space and returns much smaller matrices.
[Fig. 4: (a) the example graph, (b) results table: time(s) and #solutions with/without precomputation]
We run Algorithm 1 on $(Q_{hc}, D_{hc})$ with $C_{hc}$ (no precomputation) and also on the precomputed program with its constraint matrix (precomputation), and measure the time to find a solution, i.e. a stable model satisfying the constraints. The result is shown in Table (b) in Fig. 4 as time(s), an average over ten trials. The figures in the table, 2.08s vs. 0.66s, clearly demonstrate the usefulness of precomputation.
In addition to computation time, we examine the power of our approach to search for different solutions by measuring the number of obtainable solutions. More concretely, we run Algorithm 1 seven times, and each time a stable model is obtained, we add to the previous constraints a new constraint negating the conjunction of literals describing that model, thereby forcibly computing a new stable model in the next trial. We call such use of constraints the another solution constraint. Since there are at most six solutions, the number of solutions obtained by seven trials is at most six. We repeat a batch of seven trials ten times and take the average of the number of solutions obtained by each batch. The average is denoted as #solutions in Table (b), which indicates that 5.7 solutions out of 6, almost all solutions, are obtained by seven trials using the another solution constraint.
Summing up, the figures in Table (b) exemplify the effectiveness of precomputation, which significantly reduces computation time and yields a greater variety of solutions when combined with the another solution constraint.
5.3 LF heuristics and precomputation on loopy programs
So far we have been dealing with tight programs which have no loop and hence no loop formulas. We here deal with non-tight programs containing loops and examine how the LF heuristics $LF_{SCC}$ and $LF_{cycle}$, introduced in the previous section, work. We use an artificial non-tight program $P_n$ (with no constraint) that has exponentially many loops [12].
For an even $n$, the program $P_n$ has $n$ atoms $p_1, \ldots, p_n$, multiple supported models and one stable model. There are minimal loops and a maximal loop in its positive dependency graph, and the sets of loop formulas for the LF heuristics, $LF_{cycle}$ and $LF_{SCC}$, are computed from them respectively. Note that although there are multiple supported models, there is only one stable model. So $LF_{SCC}$ and $LF_{cycle}$ are expected to exclude supported models.
After translating $P_n$ into a matricized program $(Q_n, D_n)$, where $Q_n$ is a binary matrix for rule bodies and $D_n$ a binary matrix for disjunctions, we compute a stable model of $P_n$ for various $n$ by Algorithm 1, minimizing $J_{ALL}$ in (37) with the coefficient $\ell_1$ for the constraint term set to zero (because of no use of constraints) and using the Jacobian (38).
Since all supported models of $P_n$ except the one stable model are non-stable, even if $LF_{SCC}$ and $LF_{cycle}$ are used to guide the search process towards a stable model, Algorithm 1 is likely to return a non-stable model. We can avoid such a situation by the use of the another solution constraint.
Table 2: effect of the another solution constraint
another solution constraint | time(s) | #trials
not used | (no stable model found) | (all trials exhausted)
used | 0.09 | 3.5
To verify it, we examine the pure effect of the another solution constraint, which guides the search process to compute a model different from the previous ones. Without using the $LF_{SCC}$ or $LF_{cycle}$ heuristics, we repeatedly run Algorithm 1 with/without the another solution constraint for a fixed maximum number of trials, measure the time to find a stable model and count the number of trials until then. We repeat this experiment ten times and take the average. The result is shown in Table 2.
The figures in Table 2 in the case of no use of the another solution constraint mean that Algorithm 1 always exhausts all trials without finding a stable model (due to an implicit bias in Algorithm 1). When the another solution constraint is used, however, it finds a stable model in 0.09s after 3.5 trials on average. Thus Table 2 demonstrates the necessity and effectiveness of the another solution constraint to efficiently explore the search space.
We next compare the effectiveness of the LF heuristics and that of precomputation under the another solution constraint. For various $n$, we repeatedly run Algorithm 1 on the matricized $P_n$ (and no constraint matrix) to compute supported (stable) models, with the coefficients in $J_{ALL}$ fixed across cases. To be more precise, for each $n$ and for each case of $LF_{SCC}$, $LF_{cycle}$, precomputation (without LF) and no LF, we run Algorithm 1 for at most 100 trials to measure the time to find a stable model and count the number of supported models computed till then. We repeat this computation ten times, take the average, and obtain the graphs in Fig. 5.
[Fig. 5: (a) time to find a stable model, (b) number of redundantly computed non-stable models]
In Fig. 5, no_LF means no use of LF heuristics. Also, no_LF_pre means no_LF applied to the precomputed $P_n$ (precomputation takes 0.006s and removes only one stable false atom, so precomputation is not helpful in the current case).
We can see from graph (a) in Fig. 5 how the computation time compares among the four cases. It means that using LF heuristics is not necessarily a good policy: they might cause extra computation to reach the same model.
Concerning the number of non-stable models computed redundantly, graph (b) in Fig. 5 tells us that one of the heuristics allows computing redundant non-stable models, whereas the other, together with no_LF and no_LF_pre, returns a stable model without computing redundant non-stable models. This shows first that the latter heuristic works correctly to suppress the computation of non-stable models and second that the former heuristic works adversely, i.e. guides the search process away from the stable model. This somewhat unexpected result indicates the need for an (empirical) choice of LF heuristics.
Finally, to examine the effectiveness of precomputation more precisely, we apply precomputation to a more complex program $P'_n$. It is a modification of $P_n$ obtained by adding self-loops of atoms as illustrated by (a) in Fig. 6. The addition of self-loops causes the choice of each affected atom being true or false in the search process. $P'_n$ has multiple supported models but just one stable model.
[Fig. 6: (a) the program $P'_n$ with self-loops added, (b) time to find a stable model, compared with Clingo]
We compute a stable model by running Algorithm 1 on the precomputed $P'_n$ without using LF heuristics. When precomputation is applied to $P'_n$, it detects the stable false atoms and downsizes the matrices of the matricized $P'_n$ considerably; the precomputed $P'_n$ is downsized to 1/3 of the original $P'_n$.
We run Algorithm 1 on the precomputed $P'_n$ for at most 100 trials to measure the time to find a stable model, ten times for each $n$, and take the average. At the same time, we also run Clingo (version 5.6.2) on $P'_n$ and similarly measure time. Graph (b) in Fig. 6 is the result. It shows that, as far as computing a stable model of $P'_n$ is concerned, our approach comes close to Clingo. However, this is due to a very specific situation in which precomputation removes all false atoms in the stable model of $P'_n$, so that Algorithm 1 run on the precomputed $P'_n$ detects the stable model by thresholding alone, before starting any update of $\boldsymbol{v}$. So what graph (b) really suggests seems to be the importance of program optimization like precomputation, which is to be developed further in our approach.
6 Related work
The most closely related work is [2] and [27]. As mentioned in Section 1, our approach differs from them in three points: (1) theoretically, the exclusion of non-stable models by loop formulas, (2) syntactically, no restriction on acceptable programs, and (3) practically, the incorporation of constraints. Concerning performance, they happen to use the same $n$-negative-loops program, which consists of copies (alphabetic variants) of a small program. According to [2], the success rate w.r.t. $n$ of returning a supported model goes from one initially to almost zero as $n$ grows in [2], while it stays at one in [27] over the range tested there. We tested the same program and observed that the success rate likewise stays at one over that range.
Although our approach is non-probabilistic, i.e. purely linear algebraic, there are probabilistic differentiable approaches to ASP. Differentiable ASP/SAT [18] iteratively samples a stable model by an ASP solver a la ASSAT [14]. The solver decides the next decision literal based on the derivatives of a cost function which is the MSE between the target probabilities and the predicted probabilities computed from the sampled stable models via parameters associated with "parameter atoms" in a program.
NeurASP [29] uses an ASP solver to obtain stable models including ”neural atoms” for a program. They are associated with probabilities learned by deep learning and the likelihood of an observation (a set of ASP constraints) is computed from them. The whole learning is carried out by backpropagating the likelihood to neural atoms to parameters in a neural network.
Similarly to NeurASP, SLASH [26] uses an ASP solver to compute stable models for a program containing ”neural probabilistic predicates”. Their probabilities are dealt with by neural networks and probabilistic circuits. The latter makes it possible to compute a joint distribution of the class category and data.
Sato and Kojima developed another differentiable approach which, without using ASP solvers, samples supported models of probabilistic normal logic programs [24]. They encode programs by matrices and realize the sampling of supported models in vector spaces by repeatedly computing a fixedpoint of some differentiable equations.
Independently of the differentiable approaches mentioned so far, Tuan et al. adopted matrix encoding for propositional normal logic programs based on [22] and proposed to compute stable models in vector spaces in a generate-and-test manner using sparse representation [21].
Also, though not computing stable models, Sato et al. developed a 2-stage approach for supported models: it first computes the least 3-valued model of a matricized program in vector spaces, which may contain undefined atoms, and then assigns true or false to them to obtain a supported model of the original program [25].
7 Conclusion
We proposed an end-to-end approach for computing stable models satisfying given constraints. We matricized a program and constraints and formulated stable model computation as a minimization problem in vector spaces of a non-negative cost function. We obtain a stable model satisfying constraints as a root of the cost function by Newton's method.
By incorporating all loop formula constraints introduced in Lin-Zhao’s
theorem [14] into the cost function to be minimized, we can
prevent redundant computation of non-stable models, at the cost of
processing exponentially many loop formulas. Hence, we introduced
precomputation which downsizes a program while preserving stable model
semantics and also two heuristics that selectively use loop formulas.
Then we confirmed the effectiveness of our approach including
precomputation and loop formula heuristics by simple examples.
Acknowledgments
This work is supported by JSPS KAKENHI Grant Number JP21H04905 and JST CREST Grant Number JPMJCR22D3.
References
- [1] Apt, K.R., Blair, H.A., Walker, A.: Foundations of deductive databases and logic programming. chap. Towards a Theory of Declarative Knowledge, pp. 89–148 (1988)
- [2] Aspis, Y., Broda, K., Russo, A., Lobo, J.: Stable and supported semantics in continuous vector spaces. In: Calvanese, D., Erdem, E., Thielscher, M. (eds.) Proceedings of the 17th International Conference on Principles of Knowledge Representation and Reasoning, KR 2020, Rhodes, Greece, September 12-18, 2020. pp. 59–68 (2020)
- [3] Dowling, W.F., Gallier, J.H.: Linear-time Algorithms for Testing the Satisfiability of Propositional Horn Formula. Journal of Logic Programming 3, 267–284 (1984)
- [4] Erdem, E., Lifschitz, V.: Tight Logic Programs. Theory and Practice of Logic Programming (TPLP) 3(4–5), 499–518 (2003)
- [5] Fages, F.: Consistency of Clark's completion and existence of stable models. Journal of Methods of Logic in Computer Science 1, 51–60 (1994)
- [6] Ferraris, P., Lee, J., Lifschitz, V.: A generalization of the lin-zhao theorem. Annals of Mathematics and Artificial Intelligence 47, 79–101 (2006)
- [7] Gebser, M., Kaufmann, B., Neumann, A., Schaub, T.: clasp: A conflict-driven answer set solver. In: LPNMR. Lecture Notes in Computer Science, vol. 4483, pp. 260–265. Springer (2007)
- [8] Gebser, M., Kaufmann, B., Schaub, T.: Conflict-driven answer set solving: From theory to practice. Artif. Intell. 187, 52–89 (2012)
- [9] Gelfond, M., Lifschitz, V.: The stable model semantics for logic programming. In: Proceedings of the Fifth International Conference and Symposium on Logic Programming (ICLP/SLP 1988). pp. 1070–1080 (1988)
- [10] Lierler, Y.: Cmodels: Sat-based disjunctive answer set solver. In: Proceedings of the 8th International Conference on Logic Programming and Nonmonotonic Reasoning. p. 447–451. LPNMR’05, Springer-Verlag, Berlin, Heidelberg (2005)
- [11] Lifschitz, V.: What is answer set programming? In: Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 3. p. 1594–1597. AAAI’08, AAAI Press (2008)
- [12] Lifschitz, V., Razborov, A.: Why are there so many loop formulas? ACM Trans. Comput. Logic 7(2), 261–268 (2006). https://doi.org/10.1145/1131313.1131316
- [13] Lin, F., Zhao, J.: On tight logic programs and yet another translation from normal logic programs to propositional logic. In: Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI’03). p. 853–858. Morgan Kaufmann Publishers Inc. (2003)
- [14] Lin, F., Zhao, Y.: Assat: computing answer sets of a logic program by sat solvers. Artificial Intelligence 157(1), 115–137 (2004)
- [15] Liu, H., Wang, J.: A new way to enumerate cycles in graph. Advanced Int’l Conference on Telecommunications and Int’l Conference on Internet and Web Applications and Services (AICT-ICIW’06) pp. 57–59 (2006)
- [16] Marek, V.W., Truszczyński, M.: Stable Models and an Alternative Logic Programming Paradigm, pp. 375–398. Springer Berlin Heidelberg (1999). https://doi.org/10.1007/978-3-642-60085-2_17
- [17] Marek, W., V.S.Subrahmanian: The relationship between stable, supported, default and autoepistemic semantics for general logic programs. Theoretical Computer Science 103(2), 365–386 (1992)
- [18] Nickles, M.: Differentiable SAT/ASP. In: Proceedings of the 5th International Workshop on Probabilistic Logic Programming, PLP 2018. pp. 62–74 (2018)
- [19] Niemelä, I.: Logic programs with stable model semantics as a constraint programming paradigm. Annals of Mathematics and Artificial Intelligence 25(3–4), 241–273 (1999)
- [20] Niemelä, I., Simons, P.: Smodels — an implementation of the stable model and well-founded semantics for normal logic programs. In: Dix, J., Furbach, U., Nerode, A. (eds.) Logic Programming And Nonmonotonic Reasoning. pp. 420–429. Springer Berlin Heidelberg (1997)
- [21] Quoc, T.N., Inoue, K., Sakama, C.: Enhancing linear algebraic computation of logic programs using sparse representation. New Gen. Comput. 40(1), 225–254 (2022). https://doi.org/10.1007/s00354-021-00142-2
- [22] Sakama, C., Inoue, K., Sato, T.: Linear Algebraic Characterization of Logic Programs. In: Proceedings of the 10th International Conference on Knowledge Science, Engineering and Management (KSEM2017), LNAI 10412, Springer-Verlag. pp. 520–533 (2017)
- [23] Sarker, M.K., Zhou, L., Eberhart, A., Hitzler, P.: Neuro-symbolic artificial intelligence: Current trends (2021)
- [24] Sato, T., Kojima, R.: Logical inference as cost minimization in vector spaces. In: Artificial Intelligence IJCAI 2019 International Workshops, Revised Selected Best Papers (Amal El Fallah Seghrouchni, David Sarne (Eds.)), Lecture Notes in Computer Science, vol. 12158, pp. 239–252. Springer (2020). https://doi.org/10.1007/978-3-030-56150-5
- [25] Sato, T., Sakama, C., Inoue, K.: From 3-valued semantics to supported model computation for logic programs in vector spaces. In: Rocha, A.P., Steels, L., van den Herik, H.J. (eds.) Proceedings of the 12th International Conference on Agents and Artificial Intelligence, ICAART 2020, Volume 2, Valletta, Malta, February 22-24, 2020. pp. 758–765. SCITEPRESS (2020). https://doi.org/10.5220/0009093407580765
- [26] Skryagin, A., Stammer, W., Ochs, D., Dhami, D.S., Kersting, K.: Neural-Probabilistic Answer Set Programming. In: Proceedings of the 19th International Conference on Principles of Knowledge Representation and Reasoning. pp. 463–473 (2022)
- [27] Takemura, A., Inoue, K.: Gradient-based supported model computation in vector spaces. In: Gottlob, G., Inclezan, D., Maratea, M. (eds.) Logic Programming and Nonmonotonic Reasoning - 16th International Conference, LPNMR 2022, Genova, Italy, September 5-9, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13416, pp. 336–349. Springer (2022)
- [28] Tarjan, R.E.: Depth-first search and linear graph algorithms. SIAM Journal on Computing 1(2), 146–160 (1972)
- [29] Yang, Z., Ishay, A., Lee, J.: Neurasp: Embracing neural networks into answer set programming. In: Bessiere, C. (ed.) Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20. pp. 1755–1762. International Joint Conferences on Artificial Intelligence Organization (2020)
- [30] Zhou, N.F.: In pursuit of an efficient SAT encoding for the Hamiltonian cycle problem. In: Proceedings of the 26th International Conference on Principles and Practice of Constraint Programming (CP 2020). pp. 585–602. Springer-Verlag (2020)