¹¹institutetext: Skema Business School, Montreal, Canada

Answer-Set Programs for Repair Updates and Counterfactual Interventions

Leopoldo Bertossi Millennium Institute for Foundational Research on Data (IMFD, Chile) & Professor Emeritus Carleton University, Canada. [email protected]

Abstract

We briefly describe -mainly through very simple examples- different kinds of answer-set programs with annotations that have been proposed for specifying: database repairs and consistent query answering; secrecy view and query evaluation with them; counterfactual interventions for causality in databases; and counterfactual-based explanations in machine learning.

Repair Programs and Consistent Query Answering.

When a relational database fails to satisfy a given set of integrity constraints (ICs) and one wants to obtain semantically correct query answers, one can specify and simultaneously query virtually repaired, i.e. consistent, versions of the database. The so obtained certain answers are considered to be the consistent answers [1, 5]. Answer-set programs (ASPs) with annotations can be used for this task [4, 17]. The motivation for this approach comes from earlier work where annotated predicate logic, a form of multi-valued logic extended with non-monotonic negation, that was used for this task [2].

Example 1

The following is a denial constraint (DC), i.e. it prohibits combinations or joins of database atoms: $\kappa\!:\neg\exists x\exists y(S(x)\wedge R(x,y)\wedge S(y))$ . The following database instance $D$ violates $\kappa$ .

$R$	A	B
$\iota_{1}$	$a_{4}$	$a_{3}$
$\iota_{2}$	$a_{2}$	$a_{1}$
$\iota_{3}$	$a_{3}$	$a_{3}$

$S$	A
$\iota_{4}$	$a_{4}$
$\iota_{5}$	$a_{2}$
$\iota_{6}$	$a_{3}$

We use global tuple identifiers (tids) to refer to individual tuples. They appear in predicates’ first arguments followed by a semicolon.

Subset-repairs (S-repairs) of $D$ are consistent w.r.t. $\kappa$ , and minimally differ from $D$ under set inclusion. They are: $D_{1}=$ $\{R(\iota_{1};a_{4},a_{3}),$ $R(\iota_{2};a_{2},a_{1}),$ $R(\iota_{3};a_{3},a_{3}),$ $S(\iota_{4};a_{4}),S(\iota_{5};a_{2})\}$ , $D_{2}=\{R(\iota_{2};a_{2},a_{1}),S(\iota_{4};a_{4}),$ $S(\iota_{5};a_{2}),$ $S(\iota_{6};a_{3})\}$ , and $D_{3}=$ $\{R(\iota_{1};a_{4},a_{3}),R(\iota_{2};a_{2},a_{1}),S(\iota_{5};a_{2}),S(\iota_{6};a_{3})\}$ .

The repair program contains the atoms in $D$ , and the rules:

$\displaystyle S^{\prime}(t_{1};x,{\sf d})\vee R^{\prime}(t_{2};x,y,{\sf d})\vee S^{\prime}(t_{3};y,{\sf d})$	$\displaystyle\leftarrow$	$\displaystyle S(t_{1};x),R(t_{2};x,y),S(t_{3};y).$
$\displaystyle S^{\prime}(t;x,{\sf s})$	$\displaystyle\leftarrow$	$\displaystyle S(t;x),\ {\it not}\ S^{\prime}(t;x,{\sf d}).$
$\displaystyle R^{\prime}(t;x,y,{\sf s})$	$\displaystyle\leftarrow$	$\displaystyle R(t;x,y),{\it not}\ R^{\prime}(t;x,y,{\sf d}).$

Here, $t_{1},\ldots,t,x,y,\ldots$ are variables, but the annotation ${\sf d}$ is a constant indicating that the tuple is deleted from $D$ . Annotation constant ${\sf s}$ indicates that the tuple stays in the repair. Here, the first rule captures in its body (i.e. antecedent) a violation of $\kappa$ , and the head (i.e. the consequent) offers the alternative tuple deletions that can solve the violation. The last two rules specify that repairs keep the original tuples that have not been deleted. Predicates $R^{\prime}$ and $S^{\prime}$ are nicknames for $R$ and $S$ , with an extra argument for the annotation.

This repair-program has three stable models, with repair $D_{1}$ corresponding to the model $M_{1}=\{R^{\prime}(\iota_{1};a_{4},a_{3},{\sf s}),R^{\prime}(\iota_{2};a_{2},a_{1},{\sf s}),$ $R^{\prime}(\iota_{3};a_{3},a_{3},{\sf s}),S^{\prime}(\iota_{4};a_{4},{\sf s}),$ $S^{\prime}(\iota_{5};a_{2},{\sf s}),$ $S^{\prime}(\iota_{6};a_{3},{\sf d})\}\cup D$ , in the sense that $D_{1}$ can be read off from $M_{1}$ by keeping only the tuples annotated with s. $\Box$

If there are interacting ICs, for which repair actions for one of them may affect satisfaction of another one, it is necessary to use a couple of extra annotations to capture a transition process [17]. Doing CQA becomes skeptical (or certain) reasoning with the repair program. A system for CQA based on repair programs that runs on the DLV system [23] is described in [17].

In some cases, it is more convenient to specify and compute S-repairs that are also cardinality repairs (C-repairs), i.e. that differ from the original instance by a minimum number of tuples [5]. In Example 1, (only) they can be obtained by adding to the program the following weak program constraints (WCs): $\sim R(t,\bar{x}),R^{\prime}(t,\bar{x},{\sf d})$ and $\sim S(t,\bar{x}),S^{\prime}(t,\bar{x},{\sf d})$ . Violations, i.e. instantiations of what comes after $\sim$ , should not happen. However, they are accepted as long as they are kept to a minimum [23]. With these WCs the number of deleted tuples is minimized.

Repair programs are quite general and can be produced for any set of first-order ICs. Their complexity and expressive power are just right for the intrinsic complexity of repair-related computational tasks and CQA [17, 5]. Repair programs have been extended for CQA from virtual data integration systems [14], and to peer data exchange systems [7]. Other ASP-based approaches to database repairs and CQA can be found in [12, 21, 19].

Virtual Updates for Secrecy Views.

Repairs that replace attribute values by null values that behave as in SQL [6, 12] can also be used to hide the contents a view. Intuitively, a database, for which the contents a view is expected to stay hidden, offers as query answers only the certain (or consistent) answers from the set of secrecy instances (the repairs) that have the view empty or containing tuples with nulls only. Details about these repair and secrecy programs can be found in [6, 9].

Example 2

(example 1 cont.) Consider the same database ${D}$ , and $V_{\!\kappa}(x,y):S(x)\land R(x,y)\land S(y)$ defining the view whose contents we want to hide from users. In order to to do this, we repair w.r.t. the associated DC $\kappa$ by “minimally” changing attribute values by a constant NULL. Since it behaves as in SQL, it cannot be used to satisfy a join. There are two repairs.

$R$	A	B
$\iota_{1}$	$a_{4}$	$a_{3}$
$\iota_{2}$	$a_{2}$	$a_{1}$
$\iota_{3}$	$a_{3}$	$a_{3}$

$S$	A
$\iota_{4}$	$a_{4}$
$\iota_{5}$	$a_{2}$
$\iota_{6}$	${\sf NULL}$

$R$	A	B
$\iota_{1}$	$a_{4}$	${\sf NULL}$
$\iota_{2}$	$a_{2}$	$a_{1}$
$\iota_{3}$	$a_{3}$	${\sf NULL}$

$S$	A
$\iota_{4}$	$a_{4}$
$\iota_{5}$	$a_{2}$
$\iota_{6}$	$a_{3}$

In each of them NULL is preventing the satisfaction of a join in $V_{\!\kappa}$ . In both cases, the set of value changes is minimal under set inclusion.

A program rule that achieves this result is: $S^{\prime}(t_{1},{\sf NULL})\vee R^{\prime}(t_{2},{\sf NULL},y)\vee R^{\prime}(t_{2},x,{\sf NULL})\vee S^{\prime}(t_{3},{\sf NULL})\leftarrow S(t_{1},x),R(t_{2},x,y),S(t_{3},y)$ . $\Box$

Counterfactual Programs for Causality in Databases.

In [24], actual causality [22] was applied to define and compute database tuples that are causes for a query to be true. Furthermore, causal responsibility [18] was used to assign numerical scores to causes, to reflect their strength as such. A detailed analysis of causality in databases was carried out in [8], where a useful connection with database repairs was unveiled. As consequence, repair program can be used to specify and compute causes [9].

Example 3

(example 1 cont.) With the same database ${D}$ , the query ${\mathcal{Q}_{\kappa}\!:\ \exists x\exists y(S(x)\land R(x,y)\land S(y))}$ is true in $D$ . Tuple $t_{6}$ is a counterfactual cause for $\mathcal{Q}_{\!\kappa}$ to be true in ${D}$ , in the sense that if $t_{6}$ is removed from ${D}$ , $\mathcal{Q}_{\!\kappa}$ is no longer true. Its responsibility is $1$ , the highest possible. Tuple $t_{1}$ is an actual cause since deleting it from $D$ together with its contingent tuple $t_{3}$ makes the query false. Its responsibility is $\frac{1}{1+|\Gamma|}=\frac{1}{1+1}=\frac{1}{2}$ , with $\Gamma$ the smallest set of its contingent tuples. Similarly, $t_{3}$ and $t_{4}$ are actual causes, with responsibility ${\frac{1}{2}}$ .

When $Q_{\!\kappa}$ is true, equivalently the IC $\kappa$ is false, in order to obtain causes, tuples have to be deleted from $D$ , for which a repair program for $\kappa$ can be used. Causes’ tids can be retrieved by means of the rules: ${\it Cause}(t)\leftarrow R^{\prime}(t,x,y,{\mathsf{d}})$ and ${\it Cause}(t)\leftarrow S^{\prime}(t,x,{\mathsf{d}})$ . In order to obtain their contingency tuples, one can use, e.g. ${\it Cont}(t,t^{\prime})\leftarrow R^{\prime}(t,x,y,{\mathsf{d}}),R^{\prime}(t^{\prime},u,v,{\mathsf{d}}),t\neq t^{\prime}$ , collecting tids that have to de deleted together with $t$ . Using set-building and aggregations, one can compute contingency sets and their cardinalities [15, 16], and then, also responsibilities. With WCs one can concentrate on minimum cardinality contingency sets [9]. $\Box$

Counterfactual Programs for Explainable ML.

Consider entity records represented by atomic formulas $E(t;\bar{e},\mathsf{o})$ , with an id $t$ , a sequence of feature values $\bar{e}$ , and an annotation o for “original record”. They are labeled, say with $0$ or $1$ , by a black-box or open-box classifier represented by a predicate ${\it Cl}(t,\ell)$ : record with id $t$ received label $\ell$ . A particular entity $E(\mathbf{t};\mathbf{\bar{e}};\mathsf{o})$ receives label $1$ , i.e. ${\it Cl}(\mathbf{t},1)$ is true, and we want to explain this by counterfactually intervening $\mathbf{\bar{e}}$ , changing feature values, trying to obtain label $0$ . For each feature we do this, but value changes for other features may be necessary (as with actual causality above). A responsibility score can be assigned to each feature value that depends on the additional required changes (similar but much more general than causal responsibility above) [10, 11]. These are also called score-based attributive or contrastive explanations in explainable AI.

The process of iteratively intervening an entity until it switches label can be specified by means of counterfactual ASPs [12, 13]. For the gist, we give some of the rules in such a program. Since an entity may go trough an iteration of feature value changes, we need an annotation, $\mathfrak{\star}$ , to indicate it is in transition, with the rules: $E(\mathbf{e};\bar{x};\mathfrak{\star})\leftarrow E(\mathbf{e};\bar{x};\mbox{\sf o})$ and $E(\mathbf{e};\bar{x};\mathfrak{\star})\leftarrow E(\mathbf{e};\bar{x};\mbox{\sf do})$ , where annotation do indicates a single counterfactual change.

The main rule, $\mbox{\phantom{ooo}}E(t;x_{1}^{\prime},x_{2},\ldots,x_{n};\mbox{\sf do})\vee\cdots\vee E(t;x_{1},x_{2},\ldots,x_{n}^{\prime};\mbox{\sf do})\ \longleftarrow$ $E(t;\bar{x};\mathfrak{\star}),{\it Cl}[t;1],{\it Dom}_{1}(x_{1}^{\prime}),\ldots,{\it Dom}_{n}(x_{n}^{\prime}),x_{1}^{\prime}\neq x_{1},\ldots,x_{n}^{\prime}\neq x_{n},{\it choice}(\bar{x};x_{1}^{\prime}),$ $\ldots,{\it choice}(\bar{x};x_{n}^{\prime})$ , specifies that while the label is not switched, a single feature value is non-deterministically [20] replaced by a new one from the feature domain. One eventually stops when the label has been switched: $E(t;\bar{x};\mbox{\sf s})\ \leftarrow\ E(t;\bar{x};\mbox{\sf do}),{\it Cl}(t,0).$

Acknowledgments: Part of this work was funded by “ANID - Millennium Science Initiative Program” - Code ICN17002; and “Centro Nacional de Inteligencia Artificial” CENIA, FB210017, Financiamiento Basal para Centros Científicos y Tecnológicos de Excelencia de ANID, Chile.

References

[1] Arenas, M., Bertossi, L. and Chomicki, J. Consistent Query Answers in Inconsistent Databases. In Proc. ACM PODS 1999, pp. 68-79.
[2] Arenas, M., Bertossi, L. and Kifer, M. Applications of Annotated Predicate Calculus to Querying Inconsistent Databases. Proc. Computational Logic 2000, Springer LNCS 1861, pp. 926-941.
[3] Arenas, M., Bertossi, L. and Chomicki, J. Answer Sets for Consistent Query Answers. Theory and Practice of Logic Programming, 2003, 3(4&5):393-424.
[4] Barcelo, P. and Bertossi, L. Logic Programs for Querying Inconsistent Databases. Proc. PADL 2003, Springer LNCS 2562, pp. 208-222.
[5] Bertossi. L. Database Repairing and Consistent Query Answering. Synthesis Lectures in Data Management. Morgan & Claypool, 2011.
[6] Bertossi, L. and Li, L. Achieving Data Privacy through Secrecy Views and Null-Based Virtual Updates. IEEE Trans. Knowledge and Data Engineering, 2013, 25(5):987-1000.
[7] Bertossi, L. and Bravo, L. Consistency and Trust in Peer Data Exchange Systems. Theory and Practice of Logic Programming, 2017, 17(2):148-204. Extended version as Corr Arxiv Paper cs.DB/1606.01930.
[8] Bertossi, L. and Salimi, B. From Causes for Database Queries to Repairs and Model-Based Diagnosis and Back. Theory of Computing Systems, 2017, 61(1):191-232.
[9] Bertossi, L. Specifying and Computing Causes for Query Answers in Databases via Database Repairs and Repair Programs. Knowledge and Information Systems, 2021, 63(1):199-231.
[10] Bertossi, L., Li, J., Schleich, M., Suciu, D. and Vagena, Z. Causality-Based Explanation of Classification Outcomes. In Proceedings of the Fourth Workshop on Data Management for End-To-End Machine Learning, DEEM@SIGMOD 2020, pages 6:1-6:10, 2020.
[11] Bertossi, L. Score-Based Explanations in Data Management and Machine Learning: An Answer-Set Programming Approach to Counterfactual Analysis. In Reasoning Web. Declarative Artificial Intelligence 2021, Springer LNCS 13100, 2022, pp. 145-184.
[12] Bertossi, L. Declarative Approaches to Counterfactual Explanations for Classification. Theory and Practice of Logic Programming. 2021. https://doi.org/10.1017/S1471068421000582. arXiv Paper 2011.07423.
[13] Bertossi, L. and Reyes, G. Answer-Set Programs for Reasoning about Counterfactual Interventions and Responsibility Scores for Classification. In Inductive Logic Programming, Proc. 1st International Joint Conference on Learning and Reasoning (IJCLR’21), Springer LNAI 13191, 2022, pp. 41-56.
[14] Bravo, L. and Bertossi, L. Disjunctive Deductive Databases for Computing Certain and Consistent Answers to Queries from Mediated Data Integration Systems. Journal of Applied Logic, 2005, 3(2):329-367.
[15] Calimeri, F., Cozza, S., Ianni, G. and Leone, N. Computable Functions in ASP: Theory and Implementation. Proc. ICLP 2008, Springer LNCS 5366, pp. 407-424.
[16] Calimeri, F., Cozza, S., Ianni, G. and Leone, N. An ASP System with Functions, Lists,and Sets. Proc. LPNMR 2009, Springer LNCS 5753, pp. 483-489.
[17] Caniupan, M. and Bertossi, L. The Consistency Extractor System: Answer Set Programs for Consistent Query Answering in Databases. Data & Knowledge Engineering, 2010, 69(6):545-572.
[18] Chockler, H. and Halpern, J. Responsibility and Blame: A Structural-Model Approach. J. Artif. Intell. Res., 2004, 22:93-115.
[19] Eiter, T., Fink, M., Greco, G. and Lembo, D. Repair Localization for Query Answering from Inconsistent Databases. ACM Trans. Database Syst., 2008, 33(2):10:1-10:51.
[20] Giannotti, F., Greco, S., Sacca, D. and Zaniolo, C. Programming with Non-Determinism in Deductive Databases. Annals of Mathematics in Artificial Intelligence, 1997, 19(1-2):97-125.
[21] Greco, G., Greco, S. and Zumpano, E. A Logical Framework for Querying and Repairing Inconsistent Databases. IEEE Trans. Knowl. Data Eng., 2003, 15(6):1389-1408.
[22] Halpern, J. and Pearl, J. Causes and Explanations: A Structural-Model Approach. Part I: Causes. The British Journal for the Philosophy of Science, 2005, 56(4):843-887.
[23] Leone, N., Pfeifer, G., Faber, W., Eiter, T., Gottlob, G., Perri, S. and Scarcello, F. The DLV System for Knowledge Representation and Reasoning. ACM Transactions on Computational Logic, 2006, 7(3):499-562.
[24] Meliou, A., Gatterbauer, W., Moore, K. F. and Suciu, D. The Complexity of Causality and Responsibility for Query Answers and Non-Answers. Proc. VLDB 2010, pp. 34-41.