
Declarative Approaches to Counterfactual Explanations for Classification

(In memory of Prof. Jack Minker (1927-2021), a scientist, a scholar, a visionary; a generous, wise and committed man.)

LEOPOLDO BERTOSSI
Universidad Adolfo Ibáñez, Faculty of Engineering and Sciences
and
Millennium Institute on Foundations of Data (IMFD)
Santiago, Chile
[email protected]
Abstract

We propose answer-set programs that specify and compute counterfactual interventions on entities that are input to a classification model. In relation to the outcome of the model, the resulting counterfactual entities serve as a basis for the definition and computation of causality-based explanation scores for the feature values in the entity under classification, namely responsibility scores. The approach and the programs can be applied with black-box models, and also with models that can be specified as logic programs, such as rule-based classifiers. The main focus of this work is on the specification and computation of best counterfactual entities, i.e. those that lead to maximum responsibility scores. From them one can read off the explanations as the maximum-responsibility feature values in the original entity. We also extend the programs to bring semantic or domain knowledge into the picture. We show how the approach could be extended by means of probabilistic methods, and how the underlying probability distributions could be modified through the use of constraints. Several examples of programs written in the syntax of the DLV ASP-solver, and run with it, are shown.

keywords:
Classification, explanations, counterfactuals, causality, answer-set programming, constraints

1 Introduction

Counterfactual Explanations.

Providing explanations for results obtained from machine-learning (ML) models has been recognized as critical in many applications. It has also become an active research direction in explainable ML [Molnar (2020)], and in the broader area of explainable AI. Explanations become particularly relevant when decisions are automatically made by those models, possibly with serious consequences for stakeholders. Since most of those models are algorithms learned from training data, providing explanations may not be easy or even possible. These models are, or can be seen as, black-box models. Even if the components of the models, say their structure, mathematical functions, and parameters, are relatively clear and accessible, characterizing and measuring the relevance of a particular feature value for the classification of an entity may not be that clear from a mere inspection of the model.

In AI, explanations have been investigated, among other areas, under actual causality [Halpern and Pearl (2005)], where counterfactual interventions on a causal structural model are central. They are hypothetical updates of the model's variables, used to explore if and how the outcome of the model changes. In this way, explanations for an original output are defined and computed. Counterfactual interventions have been used, explicitly or implicitly, with ML models, in particular with classification models [Martens and Provost (2014), Wachter et al. (2017), Russell (2019), Karimi et al. (2020a), Datta et al. (2016), Bertossi et al. (2020)]. Counterfactual explanations for query answers from databases have also been investigated [Meliou et al. (2010), Bertossi and Salimi (2017a), Bertossi (2021)], an area that is not unrelated to classification results (c.f. Section 4).

In this work we start by introducing and formalizing the notion of counterfactual explanation for the classification of an entity $\mathbf{e}$ (think of a finite record of feature values) with a certain label $\ell$. Counterfactual explanations are alternative versions, $\mathbf{e}'$, of $\mathbf{e}$ that differ from $\mathbf{e}$ in a set of feature values, but are classified with a different label $\ell'$. In this work, "best counterfactual explanations" come in two forms, as s-explanations and c-explanations. The former minimize, under set inclusion, the set of features whose values are changed; the latter minimize the number of changed feature values.

Explanation Scores and Responsibility.

We use counterfactual explanations to define the  x-Resp score,  an explanation responsibility score for a feature value of the entity under classification. The idea is to identify and compute the most responsible feature values for the outcome. This score is adapted from the general notion of responsibility used in actual causality [Chockler and Halpern (2004)].

More specifically, in this work we concentrate mainly on specifying and computing, for a given and fixed classified entity $\mathbf{e}$, all the best explanations, represented as counterfactual entities (that have feature values changed and a different label), where "best" for a counterfactual entity means that the number of changes of feature values (with respect to $\mathbf{e}$) is minimum (the c-explanations mentioned above). At the same time, we are also interested in the maximum-responsibility feature values. These two goals are directly related: the maximum-responsibility feature values (in the original entity $\mathbf{e}$) are exactly those that appear as changed in a best counterfactual entity.

The x-Resp score belongs to the family of feature attribution scores, among which one finds the popular Shap score [Lundberg et al. (2020)]. While Shap is based on the Shapley value that is widely used in game theory, the x-Resp score is based on actual causality. Experimental results with an extended version (that we briefly describe) of the responsibility score investigated in this work, and comparisons with other scores, in particular, with Shap, are reported in [Bertossi et al. (2020)]. However, only a fully procedural approach to the x-Resp score was followed in [Bertossi et al. (2020)]. Furthermore, for performance and comparison-related reasons, the number of counterfactually changeable feature values was bounded a priori.

Specifying Counterfactuals and Reasoning.

Answer-set programming (ASP) is an elegant and powerful logic programming paradigm that allows for declarative specifications of a domain or a problem. From those specifications one can do logical reasoning, and quite usefully, non-monotonic reasoning [Brewka et al. (2011), Gelfond and Kahl (2014)]. ASP has been applied to the specification and solution of hard combinatorial problems, and also to different tasks related to databases. Among the latter, we find the specification of repairs of databases w.r.t. integrity constraints, virtual data integration, specification of privacy views [Bertossi (2011), Bertossi (2019)], and causality in databases [Bertossi (2021)].

In this work we also introduce Counterfactual Intervention Programs (CIPs), which are answer-set programs (ASPs) that specify counterfactual interventions and counterfactual explanations, and also allow us to specify and compute the responsibility scores. More specifically, CIPs are designed to compute: (a) counterfactual explanations (best or not), (b) best counterfactual explanations (i.e. c-explanations), and (c) highest-responsibility feature values (in the original entity), that is, those with maximum x-Resp score.

As already mentioned above, c-explanations lead to maximum responsibility scores, and maximum responsibility scores are associated with c-explanations. However, in order to compute a maximum responsibility score (or a feature value that attains it), we may not need all the best counterfactual entities that "contain" that particular feature value; a single one of them would be enough. Still, our CIPs compute all the counterfactual entities, and all (and only) the best counterfactual entities if so required.

In this regard, it is worth mentioning that counterfactual explanations, best or not, are interesting per se, in that they could show what could be done differently in order to obtain a different classification [Schleich et al. (2021)]. For example, someone, represented as an entity $\mathbf{e}$, who applies for a loan at a bank, and is deemed by the bank's classifier as unworthy of the loan, could benefit from knowing that a reduction of the number of credit cards he/she owns might lead to a positive assessment. This counterfactual explanation would be considered as actionable (or feasible, or a recourse) [Ustun et al. (2019), Karimi et al. (2020b)].

Furthermore, having all (or only the best) counterfactual explanations, we could think of doing different kinds of meta-analysis and meta-analytics on top of them. In some cases, query answering on the CIP can be leveraged, e.g. to know if a highest responsibility feature value appears as changed in every best counterfactual entity (c.f. Section 9).

CIPs can be applied to black-box models that can be invoked from the program, and also to any classifier that can be specified within the program, such as a rule-based classifier. Decision trees for classification can be expressed by means of rules. As recent research shows, other established classifiers, such as random forests, Bayesian networks, and neural networks, can be compiled into Boolean circuits [Shi et al. (2020), Shih et al. (2018), Choi et al. (2020), Narodytska et al. (2010)], opening the possibility of specifying them by means of ASPs.

Answer-Set Programming and Semantic Knowledge.

Our declarative approach to counterfactual interventions is particularly appropriate for bringing additional, declarative, semantic knowledge into the game. As we show, we can easily accommodate constraints when computing counterfactual entities and scores. In particular, these constraints can be used to concentrate only on actionable counterfactuals (as defined by the constraints). In the loan example, we could omit counterfactual entities or explanations that require the applicant to reduce his/her age. Imposing constraints in combination with purely procedural approaches is much more complicated. In our case, we can easily and seamlessly integrate logic-based semantic specifications with declarative programs, and use the generic and optimized solvers behind ASP implementations. In this work we include several examples of CIPs specified in the language of the DLV system [Leone et al. (2006)] and its extensions, and run with it.

We establish that ASPs are the right declarative and computational tool for our problem since they can be used to: (a) compute all the counterfactual explanations, in particular the best ones; (b) provide, through each model of the program, all the information about a particular counterfactual explanation; (c) compute maximum responsibility scores and the counterfactual explanations associated with them; (d) in many cases, specify as a single program the classifier together with the explanation machinery; (e) alternatively, call a classifier from the program as an external predicate defined in, say, Python; (f) bring into the picture additional, logic-based knowledge, such as semantic and actionability constraints; and (g) do query answering, under the skeptical and brave semantics, to analyze at a higher level the outcomes of the explanation process. Furthermore, and quite importantly, the ASPs we use match the intrinsic data complexity of the problem of computing maximum-responsibility feature values, as we establish in this work.

We discuss why and how the responsibility score introduced and investigated earlier in this work has to be extended in some situations, leading to a generalized, probabilistic responsibility score introduced in [Bertossi et al. (2020)]. After doing this, we show how logical constraints that capture domain knowledge can be used to modify the underlying probability distribution on which a score is defined.

Article Structure and Contents.

This paper is structured as follows. In Section 2 we provide background material on classification models and answer-set programs. In Section 3 we introduce counterfactual interventions and the responsibility score. In Section 4 we investigate the data complexity of the computation of maximum-responsibility feature values. In Section 5 we introduce and analyze the CIPs. In Section 6 we consider some extensions of CIPs that can be used to capture additional domain knowledge. In Section 7, and for completeness, we briefly motivate and describe the probabilistic responsibility score previously introduced in [Bertossi et al. (2020)]. In this regard, we also show how to bring logical constraints into the probabilistic setting of the responsibility score. In Section 8 we discuss related work. In Section 9 we conclude with a final discussion.

This paper is an extended version of the conference paper [Bertossi (2020)]. This version, in addition to improving and making more precise the presentation in comparison with the conference version, includes a considerable amount of new material. Section 2, containing background material, is new. In particular, it contains a new example of a decision-tree classifier that is used as an additional running example throughout the paper. Section 4 on the complexity of x-Resp computation is completely new, as is Section 5.2, which revisits complexity in the light of CIPs. This version also contains several examples of the use of the DLV system for specifying and running CIPs; there were no such examples in the conference version. Section 7 is new. It contains two subsections: the first deals with non-binary features and the need to extend, in probabilistic terms, the definition of the x-Resp score; the second, with the combination of probabilistic uncertainty and domain knowledge. We now have the new Section 8 on related work. The final Section 9 has been considerably revised and extended.

2 Background

2.1 Classification models and counterfactuals

In a particular domain we may have a finite set $\mathcal{F} = \{F_1, \ldots, F_n\}$ of features (a.k.a. variables). More precisely, the features $F_i$ are functions that take values in their domains $Dom(F_i)$. (This is customary parlance, but, since a feature is a function, in Mathematics we would call this the range of $F_i$.) As is commonly the case, we will assume that these domains are categorical, i.e. finite and without any a priori order structure. These features are used to describe or represent entities in an underlying population (or application domain), as records (or tuples) $\mathbf{e}$ formed by the values the features take on the entity. Actually, with a bit of abuse of notation, we represent entities directly as these records of feature values: $\mathbf{e} = \langle F_1(\mathbf{e}), \ldots, F_n(\mathbf{e})\rangle$. A feature $F$ is said to be Boolean or propositional if $Dom(F) = \{\mathsf{true}, \mathsf{false}\}$, for which we sometimes use $\{\mathsf{yes}, \mathsf{no}\}$ or, simply, $\{1, 0\}$.

For example, if the underlying population contains people, so that each entity represents a person, the features could be $\mathcal{F} = \{\mathsf{Weight}, \mathsf{Age}, \mathsf{Married}, \mathsf{EdLevel}\}$, with domains: $Dom(\mathsf{Weight}) = \{\mathsf{overweight}, \mathsf{chubby}, \mathsf{fit}, \mathsf{slim}, \mathsf{underweight}\}$; $Dom(\mathsf{Age}) = \{\mathsf{old}, \mathsf{middle}, \mathsf{young}, \mathsf{child}\}$; $Dom(\mathsf{Married}) = \{\mathsf{true}, \mathsf{false}\}$; $Dom(\mathsf{EdLevel}) = \{\mathsf{low}, \mathsf{medium}, \mathsf{high}, \mathsf{top}\}$. A particular entity could be (represented by) $\mathbf{e} = \langle\mathsf{fit}, \mathsf{young}, \mathsf{false}, \mathsf{top}\rangle$. Here, $\mathsf{Married}$ is a propositional (or binary) feature.

A classifier, $\mathcal{C}$, for a domain of entities $\mathcal{E}$ is a computable function $\mathcal{C}: \mathcal{E} \rightarrow L$, where $L = \{\ell_1, \ldots, \ell_k\}$ is a set of labels. We classify an entity by assigning a label to it. We do not have to assume any order structure on $L$. In the example above, we may want to classify entities in the people's population according to how appropriate they are for military service. The set of labels could be $L = \{\mathsf{good}, \mathsf{maybe}, \mathsf{noway}\}$, and the classifier could be such that $\mathcal{C}(\langle\mathsf{fit}, \mathsf{young}, \mathsf{false}, \mathsf{top}\rangle) = \mathsf{good}$, but $\mathcal{C}(\langle\mathsf{overweight}, \mathsf{old}, \mathsf{true}, \mathsf{low}\rangle) = \mathsf{noway}$. When the set of labels has two values, typically $\{\mathsf{yes}, \mathsf{no}\}$, we have a binary or Boolean classifier. In this work we will consider mostly binary classifiers.

Example 2.1

This is a popular example taken from [Mitchell (1997)]. Consider the set of features $\mathcal{F} = \{\mathsf{Outlook}, \mathsf{Humidity}, \mathsf{Wind}\}$, with $Dom(\mathsf{Outlook}) = \{\mathsf{sunny}, \mathsf{overcast}, \mathsf{rain}\}$, $Dom(\mathsf{Humidity}) = \{\mathsf{high}, \mathsf{normal}\}$, $Dom(\mathsf{Wind}) = \{\mathsf{strong}, \mathsf{weak}\}$. An entity under classification has a value for each of the features, e.g. $\mathbf{e} = \mathit{ent}(\mathsf{sunny}, \mathsf{normal}, \mathsf{weak})$. The problem is that of deciding whether to play tennis or not under the conditions represented by that entity, which can be captured as a classification problem, with labels $\mathsf{yes}$ or $\mathsf{no}$ (sometimes we will use $1$ and $0$, resp.).

Figure 1: A Decision Tree

The Boolean classifier is given as a decision tree, as shown in Figure 1. The decision is computed by following the feature values along the branches of the tree. The entity $\mathbf{e}$ at hand gets label $\mathsf{yes}$.
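Anticipating the rule-based classifiers mentioned in Section 1, the following is a minimal sketch of how such a decision tree could be written as rules in DLV syntax. It assumes that the tree in Figure 1 has the usual PlayTennis shape from [Mitchell (1997)] (overcast always yields yes; under sunny the decision depends on Humidity; under rain, on Wind); the predicate names ent and cls are chosen here just for illustration.

% ent(O,H,W): the entity under classification, e.g. e = ent(sunny,normal,weak).
ent(sunny, normal, weak).

% cls(O,H,W,L): label L assigned by the decision tree (assumed shape of Figure 1).
cls(overcast, H, W, yes) :- ent(overcast, H, W).
cls(sunny, normal, W, yes) :- ent(sunny, normal, W).
cls(sunny, high, W, no)    :- ent(sunny, high, W).
cls(rain, H, weak, yes)    :- ent(rain, H, weak).
cls(rain, H, strong, no)   :- ent(rain, H, strong).

Running this sketch with DLV, the single answer set contains cls(sunny, normal, weak, yes), i.e. the entity gets label yes, in agreement with the text.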

Of course, a decision tree could be just arbitrarily given, but the common situation is to learn it from training examples, that is, from a collection of entities that come already with an assigned label. In more general terms, building a classifier, $\mathcal{C}$, from a set of training data, i.e. a set of pairs $T = \{\langle\mathbf{e}_1, \ell(\mathbf{e}_1)\rangle, \ldots, \langle\mathbf{e}_M, \ell(\mathbf{e}_M)\rangle\}$, with $\ell(\mathbf{e}_i) \in L$, is about learning a (computational) model for the label function $L$ for the entire domain of entities, beyond $T$. We say that $L$ "represents" the classifier $\mathcal{C}$. This supervised learning task is one of the most common in machine learning. (C.f. [Mitchell (1997)] or [Flach (2012)] for great introductions to the subject.) Classifiers may take many different internal forms. They could be decision trees, random forests, rule-based classifiers, logistic regression models, neural network-based classifiers, etc.

In this work we are not concerned with learning classifiers. We will assume the classifier is given as an input/output computable relation, or that we know how to specify it, for example, through a logic program. By the same token, we are not dealing here with any kind of program learning.

Some classes of classifiers are more “opaque” than others, i.e. with a more complex and less interpretable internal structure and results [Molnar (2020)]. Hence, the need for explaining the classification outcomes. The decision tree above is clearly a computable function. Since we have the classifier at hand with an explicit and clear specification, we would consider this classifier to be interpretable. In this direction, if we obtain a label for a particular entity at hand, we can inspect the model and explain why we obtained that label, and we could identify the feature values that were relevant for the outcome.

Instead of a decision tree as above, we could have, for example, a very complex neural network. Such a model would be much more difficult to interpret, or to use to explain why we got a particular output for a particular input. This kind of model is considered to be a black-box model (or, at least, opaque) [Rudin (2019)].

Assume for a moment that we do not have the explicit classifier in Example 2.1, but we interact only with the box that contains it. We input entity $\mathbf{e} = \mathit{ent}(\mathsf{sunny}, \mathsf{normal}, \mathsf{weak})$, and we obtain the output $\mathsf{yes}$. We want to know which feature values in $\mathbf{e}$ influenced the outcome the most. Actually, we want to obtain a numerical score for each of the entity's feature values, in such a way that the higher the score, the more relevant the feature value is for the outcome.

Different scores may be defined. A popular one is Shap [Lundberg et al. (2020)], which has been investigated in detail for some classes of classifiers [Arenas et al. (2021), Van den Broeck et al. (2021)]. Here we concentrate on x-Resp, a responsibility score that was introduced, in a somewhat ad hoc manner, in [Bertossi et al. (2020)]. In this work, we present x-Resp in a causal setting, on the basis of counterfactual interventions and their responsibilities, following the general approach to causal responsibility in [Chockler and Halpern (2004)].

Just to convey the gist, and in the light of the example at hand, we want to detect and quantify the relevance (technically, the responsibility) of a feature value in $\mathbf{e} = \mathit{ent}(\mathsf{sunny}, \underline{\mathsf{normal}}, \mathsf{weak})$, say for feature $\mathsf{Humidity}$ (underlined), by hypothetically intervening its value; in this case, changing it from $\mathsf{normal}$ to $\mathsf{high}$, obtaining a new entity $\mathbf{e}' = \mathit{ent}(\mathsf{sunny}, \underline{\mathsf{high}}, \mathsf{weak})$, a counterfactual version of $\mathbf{e}$. If we input this entity into the classifier, we now obtain the label $\mathsf{no}$. This is an indication that the original feature value for $\mathsf{Humidity}$ is indeed relevant for the original classification. Through numerical aggregations over the outcomes associated with the alternative, counterfactually intervened entities, we can define and compute the x-Resp score. The details can be found in Section 3. As mentioned in Section 1, these counterfactual entities are also interesting per se, and not only as a basis for the definition and computation of feature attribution scores.

2.2 Answer-set programming

As customary, when we talk about answer-set programs, we refer to disjunctive Datalog programs with weak negation and stable model semantics [Gelfond and Lifschitz (1991)]. Accordingly, an answer-set program $\Pi$ consists of a finite number of rules of the form

$A_1 \vee \ldots \vee A_n \leftarrow P_1, \ldots, P_m, \mathit{not}\ N_1, \ldots, \mathit{not}\ N_k,$   (1)

where $0 \leq n, m, k$, and the $A_i, P_j, N_s$ are (positive) atoms, i.e. of the form $Q(\bar{t})$, where $Q$ is a predicate of a fixed arity, say $\ell$, and $\bar{t}$ is a sequence of length $\ell$ of variables or constants. In rule (1), $A_1, \ldots, \mathit{not}\ N_k$ are called literals, with the $A_i$ positive and the $\mathit{not}\ N_s$ negative. All the variables in the $A_i$ and $N_s$ appear among those in the $P_j$. The left-hand side of a rule is called the head, and the right-hand side, the body. A rule can be seen as a (partial) definition of the predicates in the head (there may be other rules with the same predicates in their heads).

The constants in program $\Pi$ form the (finite) Herbrand universe $H$ of the program. The ground version of program $\Pi$, $gr(\Pi)$, is obtained by instantiating the variables in $\Pi$ in all possible ways using values from $H$. The Herbrand base, $H\!B$, of $\Pi$ contains all the atoms obtained as instantiations of predicates in $\Pi$ with constants in $H$.

A subset $M$ of $H\!B$ is a model of $\Pi$ if it satisfies $gr(\Pi)$, i.e.: for every ground rule $A_1 \vee \ldots \vee A_n \leftarrow P_1, \ldots, P_m, \mathit{not}\ N_1, \ldots, \mathit{not}\ N_k$ of $gr(\Pi)$, if $\{P_1, \ldots, P_m\} \subseteq M$ and $\{N_1, \ldots, N_k\} \cap M = \emptyset$, then $\{A_1, \ldots, A_n\} \cap M \neq \emptyset$. $M$ is a minimal model of $\Pi$ if it is a model of $\Pi$, and $\Pi$ has no model that is properly contained in $M$. $M\!M(\Pi)$ denotes the class of minimal models of $\Pi$. Now, for $S \subseteq H\!B(\Pi)$, transform $gr(\Pi)$ into a new, positive program $gr(\Pi)^{S}$ (i.e. without $\mathit{not}$), as follows: delete every rule $A_1 \vee \ldots \vee A_n \leftarrow P_1, \ldots, P_m, \mathit{not}\ N_1, \ldots, \mathit{not}\ N_k$ for which $\{N_1, \ldots, N_k\} \cap S \neq \emptyset$; next, transform each remaining rule $A_1 \vee \ldots \vee A_n \leftarrow P_1, \ldots, P_m, \mathit{not}\ N_1, \ldots, \mathit{not}\ N_k$ into $A_1 \vee \ldots \vee A_n \leftarrow P_1, \ldots, P_m$. Now, $S$ is a stable model of $\Pi$ if $S \in M\!M(gr(\Pi)^{S})$. Every stable model of $\Pi$ is also a minimal model of $\Pi$. Stable models are also commonly called answer sets, and we will mostly call them that.

A program is unstratified if there is a cyclic, recursive definition of a predicate that involves negation. For example, the program consisting of the rules $a \vee b \leftarrow c, \mathit{not}\ d$; $d \leftarrow e$; and $e \leftarrow b$ is unstratified, because there is a negation in the mutually recursive definitions of $b$ and $e$. The program in Example 2.2 below is not unstratified, i.e. it is stratified. A good property of stratified programs is that their models can be computed upwards, following strata (layers), starting from the facts, that is, from the ground instantiations of rules with empty bodies (in which case the arrow is usually omitted). We refer the reader to [Gelfond and Kahl (2014)] for more details.

Query answering under the ASPs comes in two forms. Under the brave semantics, a query posed to the program obtains as answers those that hold in some model of the program. However, under the skeptical (or cautious) semantics, only the answers that simultaneously hold in all the models are returned. Both are useful depending on the application at hand.

We will use disjunctive programs. However, sometimes it is possible to use instead normal programs, which do not have disjunctions in rule heads, but have the same stable models, in the sense that the disjunctive rules can be transformed into a set of non-disjunctive rules. More precisely, rule (1) can be transformed into the rules:

$A_1 \leftarrow P_1, \ldots, P_m, \mathit{not}\ N_1, \ldots, \mathit{not}\ N_k, \mathit{not}\ A_2, \ldots, \mathit{not}\ A_n$
$\cdots$
$A_n \leftarrow P_1, \ldots, P_m, \mathit{not}\ N_1, \ldots, \mathit{not}\ N_k, \mathit{not}\ A_1, \ldots, \mathit{not}\ A_{n-1}.$

This shift operation is possible if the original program is head-cycle free [Ben-Eliyahu and Dechter (1994), Dantsin et al. (2001)], as we define now. The dependency graph of a program $\Pi$, denoted $DG(\Pi)$, is a directed graph whose nodes are the atoms of the associated ground program $gr(\Pi)$. There is an arc from $L_1$ to $L_2$ iff there is a rule in $gr(\Pi)$ where $L_1$ appears positively in the body and $L_2$ appears in the head. $\Pi$ is head-cycle free (HCF) iff $DG(\Pi)$ has no cycle through two atoms that belong to the head of a same rule.

Example 2.2

Consider the following program $\Pi$ that is already ground.

$a \vee b \leftarrow c$
$d \leftarrow b$
$a \vee b \leftarrow e,\ \mathit{not}\ f$
$e \leftarrow$


The program has two stable models: $S_1 = \{e, a\}$ and $S_2 = \{e, b, d\}$.

Each of them expresses that the atoms in it are true, and any other atom that does not belong to it is false.

These models are incomparable under set inclusion, and they are minimal models in that any proper subset of any of them is not a model of the program.
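For example, to verify that $S_1 = \{e,a\}$ is a stable model, we can compute the reduct defined above:

$gr(\Pi)^{S_1} = \{\ a \vee b \leftarrow c;\ \ d \leftarrow b;\ \ a \vee b \leftarrow e;\ \ e \leftarrow\ \}.$

The only rule with negation, $a \vee b \leftarrow e, \mathit{not}\ f$, is kept with its negative literal dropped, because $f \notin S_1$. The minimal models of this positive program are $\{e,a\}$ and $\{e,b,d\}$: the fact $e$ must hold, the rule $a \vee b \leftarrow e$ then forces $a$ or $b$ (but not both, by minimality), and $c$ is not derivable, so the first rule imposes nothing. Since $S_1 \in M\!M(gr(\Pi)^{S_1})$, $S_1$ is stable; the check for $S_2$ is analogous.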

Figure 2: Dependency graph $DG(\Pi)$

The dependency graph is shown in Figure 2. We can see that the program $\Pi$ is head-cycle free, because there is no cycle involving both $a$ and $b$, the atoms that appear in the disjunctive head. As a consequence, the program can be transformed into the following non-disjunctive program:

$a \leftarrow c,\ \mathit{not}\ b$
$b \leftarrow c,\ \mathit{not}\ a$
$d \leftarrow b$
$a \leftarrow e,\ \mathit{not}\ f,\ \mathit{not}\ b$
$b \leftarrow e,\ \mathit{not}\ f,\ \mathit{not}\ a$
$e \leftarrow$

This program has the same stable models as the original one.
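In the concrete syntax of DLV, the original disjunctive program of this example can be written as follows (disjunction is written v, and rules end with a period); the comments indicate which atoms are brave and cautious consequences, in the sense described above:

% Program of Example 2.2 in DLV syntax.
a v b :- c.
d :- b.
a v b :- e, not f.
e.

% Its stable models are {e,a} and {e,b,d} (cf. above).
% Under the brave semantics, a, b, d and e are all answers to the
% corresponding ground queries, since each holds in some stable model.
% Under the skeptical (cautious) semantics, only e is an answer,
% since it is the only atom holding in both stable models.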

We will return to HCF-programs in Section 5.2.

3 Counterfactual Explanations and the x-Resp Score

We consider classification models, $\mathcal{C}$, that are represented by an input/output relation. Inputs are the so-called entities, $\mathbf{e}$, which are represented as records, $\mathbf{e} = \langle x_1, \ldots, x_n\rangle$, where $x_i$ is the value $F_i(\mathbf{e}) \in Dom_i := Dom(F_i)$ taken on $\mathbf{e}$ by the feature $F_i \in \mathcal{F} = \{F_1, \ldots, F_n\}$. The entities form a population $\mathcal{E}$. The output of a classifier is represented by a label function $L$ that maps entities $\mathbf{e}$ to $0$ or $1$, the binary result of the classification. That is, to simplify the presentation, we concentrate here on binary classifiers, but this is not essential. Furthermore, we consider domains $Dom(F_i)$ with a finite number of categorical values.

In this work, we are not assuming that we have an explicit classification model, and we do not need it. All we need is to be able to invoke it. It could be a “black-box” model.

The problem is the following: given an entity $\mathbf{e}$ that has received the label $L(\mathbf{e})$, provide an "explanation" for this outcome. In order to simplify the presentation, and without loss of generality, we assume that label $1$ is the one that has to be explained. It is the "negative" outcome one has to justify, such as the rejection of a loan application.

Counterfactual explanations are defined in terms of counterfactual interventions that simultaneously change feature values in $\mathbf{e}$, in such a way that the updated record gets a new label. A counterfactual explanation for the classification of $\mathbf{e}$ is, then, an alternative entity $\mathbf{e}'$ that is classified with a label different from that of $\mathbf{e}$. In general, we are interested in counterfactual explanations that are most informative about the role of the feature values in $\mathbf{e}$ that lead to its obtained label. These are the entities that are obtained from $\mathbf{e}$ through a minimal counterfactual intervention. Minimality can be defined in different ways, and we adopt an abstract approach, assuming a partial order relation $\preceq$ on counterfactual interventions.

Definition 3.1

Consider a binary classifier represented by its label function $L$, and a fixed input entity $\mathbf{e} = \langle x_1, \ldots, x_n\rangle$, with $F_i(\mathbf{e}) = x_i$, $1 \leq i \leq n$, and $L(\mathbf{e}) = 1$.

(a) An intervention $\hat{\iota}$ on $\mathbf{e}$ is a set of the form $\{\langle F_{i_1}, x'_{i_1}\rangle, \ldots, \langle F_{i_K}, x'_{i_K}\rangle\}$, with $F_{i_s} \neq F_{i_\ell}$, for $s \neq \ell$, and $x_{i_s} \neq x'_{i_s} \in Dom(F_{i_s})$. We denote with $\hat{\iota}(\mathbf{e})$ the record obtained by applying intervention $\hat{\iota}$ to $\mathbf{e}$, i.e. by replacing in $\mathbf{e}$ every $x_{i_s} = F_{i_s}(\mathbf{e})$, with $F_{i_s}$ appearing in $\hat{\iota}$, by $x'_{i_s}$.

(b) A counterfactual intervention on $\mathbf{e}$ is an intervention $\hat{\iota}$ on $\mathbf{e}$, such that $L(\hat{\iota}(\mathbf{e})) = 0$. The resulting entity $\mathbf{e}' = \hat{\iota}(\mathbf{e})$ is called a counterfactual entity (for $\mathbf{e}$).

(c) A $\preceq$-minimal counterfactual intervention $\hat{\iota}$ is such that there is no counterfactual intervention $\hat{\iota}'$ on $\mathbf{e}$ with $\hat{\iota}' \prec \hat{\iota}$ (i.e. $\hat{\iota}' \preceq \hat{\iota}$, but not $\hat{\iota} \preceq \hat{\iota}'$). The resulting entity $\mathbf{e}' = \hat{\iota}(\mathbf{e})$ is called a $\preceq$-minimal counterfactual entity.

(d) A counterfactual explanation for $L(\mathbf{e})$ is a set of the form $\hat{\epsilon} = \{\langle F_{i_1}, x_{i_1}\rangle, \ldots, \langle F_{i_K}, x_{i_K}\rangle\}$, with $F_{i_j}(\mathbf{e}) = x_{i_j}$, for which there is a counterfactual intervention $\hat{\iota}(\hat{\epsilon}) = \{\langle F_{i_1}, x'_{i_1}\rangle, \ldots, \langle F_{i_K}, x'_{i_K}\rangle\}$ for $\mathbf{e}$.

(e) A counterfactual explanation $\hat{\epsilon}$ for $L(\mathbf{e})$ is $\preceq$-minimal if its associated counterfactual intervention $\hat{\iota}(\hat{\epsilon})$ is a $\preceq$-minimal counterfactual intervention on $\mathbf{e}$.

Remark 3.1

Counterfactual explanations contain feature values of the original entity $\mathbf{e}$. They contain relevant information about the classification result, and interventions are used to characterize and identify them. For this reason, we will usually call an alternative entity $\mathbf{e}'$ obtained from $\mathbf{e}$ through a counterfactual intervention a counterfactual explanation as well: the counterfactual explanation in the sense of Definition 3.1(d) can be read off from $\mathbf{e}'$.

Several minimality criteria can be expressed in terms of partial orders. We will explicitly consider two of them.

Definition 3.2

Let $\hat{\iota}_1$ and $\hat{\iota}_2$ be interventions on an entity $\mathbf{e}$. (a) $\hat{\iota}_1 \leq^s \hat{\iota}_2$ iff $\pi_1(\hat{\iota}_1) \subseteq \pi_1(\hat{\iota}_2)$, with $\pi_1(\hat{\iota})$ the projection of $\hat{\iota}$ on the first position. (b) $\hat{\iota}_1 \leq^c \hat{\iota}_2$ iff $|\hat{\iota}_1| \leq |\hat{\iota}_2|$.

This definition introduces minimality under set inclusion and under cardinality, resp. The former minimizes, under set inclusion, the set of features whose values are changed; the latter, the number of features that see their values changed. In the following, we will consider only these minimality criteria, mostly the second (c.f. Section 9 for a discussion).

Example 3.1

Consider three binary features $\mathcal{F} = \{F_1, F_2, F_3\}$, i.e. they take values $0$ or $1$. The input/output relation of a classifier $\mathcal{C}$ is shown in Table 1. Let $\mathbf{e}$ be $\mathbf{e}_1$ in the table. We want counterfactual explanations for its label $1$. Any other record in the table can be seen as the result of an intervention on $\mathbf{e}_1$. However, only $\mathbf{e}_4, \mathbf{e}_7, \mathbf{e}_8$ are (results of) counterfactual interventions, in that they switch the label to $0$.

entity (id)      $F_1$   $F_2$   $F_3$   $L$
$\mathbf{e}_1$     0       1       1      1
$\mathbf{e}_2$     1       1       1      1
$\mathbf{e}_3$     1       1       0      1
$\mathbf{e}_4$     1       0       1      0
$\mathbf{e}_5$     1       0       0      1
$\mathbf{e}_6$     0       1       0      1
$\mathbf{e}_7$     0       0       1      0
$\mathbf{e}_8$     0       0       0      0

Table 1: Classifier $\mathcal{C}$

For example, $\mathbf{e}_4$ corresponds to the intervention $\hat{\iota}_4 = \{\langle F_1, 1\rangle, \langle F_2, 0\rangle\}$, in that $\mathbf{e}_4$ is obtained from $\mathbf{e}_1$ by changing the values of $F_1, F_2$ into $1$ and $0$, resp. For $\hat{\iota}_4$, $\pi_1(\hat{\iota}_4) = \{F_1, F_2\}$. From $\hat{\iota}_4$ we obtain the counterfactual explanation $\hat{\epsilon}_4 = \{\langle F_1, 0\rangle, \langle F_2, 1\rangle\}$, telling us that the values $F_1(\mathbf{e}_1) = 0$ and $F_2(\mathbf{e}_1) = 1$ are a joint cause for $\mathbf{e}_1$ being classified as $1$.

There are three counterfactual explanations: $\hat{\epsilon}_4 := \{\langle F_1, 0\rangle, \langle F_2, 1\rangle\}$, $\hat{\epsilon}_7 := \{\langle F_2, 1\rangle\}$, and $\hat{\epsilon}_8 := \{\langle F_2, 1\rangle, \langle F_3, 1\rangle\}$. Here, $\mathbf{e}_4$ and $\mathbf{e}_8$ are incomparable under $\preceq^s$, $\mathbf{e}_7 \prec^s \mathbf{e}_4$, $\mathbf{e}_7 \prec^s \mathbf{e}_8$, and $\hat{\epsilon}_7$ turns out to be $\preceq^s$- and $\preceq^c$-minimal (actually, minimum).

Notice that what matters here is what is intervened, and not how. By taking a projection, the partial order $\preceq^s$ does not care about the values that replace the original feature values, as long as the latter are changed. Furthermore, given $\mathbf{e}$, it would be good enough to indicate the features whose values are relevant, e.g. $\hat{\epsilon}_7 = \{F_2\}$ in the previous example. However, the introduced notation emphasizes the fact that the original values are those we concentrate on when providing explanations.

Every $\preceq^c$-minimal explanation is also $\preceq^s$-minimal. However, it is easy to produce an example showing that a $\preceq^s$-minimal explanation may not be $\preceq^c$-minimal.

Notation: An s-explanation for $L(\mathbf{e})$ is a $\preceq^s$-minimal counterfactual explanation for $L(\mathbf{e})$. A c-explanation for $L(\mathbf{e})$ is a $\preceq^c$-minimal counterfactual explanation for $L(\mathbf{e})$. As prescribed in Remark 3.1, we will usually use the terms s-explanation and c-explanation to refer also to the alternative, intervened entities $\mathbf{e}'$ that are associated with s- and c-explanations in the sense of Definition 3.2.

So far, we have characterized explanations as sets of (interventions on) features. However, one would also like to quantify the “causal strength” of a single feature value in a record representing an entity. Something similar has been done for a single tuple in a database as a cause for a query answer [Meliou et al. (2010)], or for a single attribute value in a database tuple [Bertossi and Salimi (2017a), Bertossi (2021)].  Different scores for feature values have been proposed in this direction, e.g. Shap in [Lundberg et al. (2020)] and Resp in [Bertossi et al. (2020)]. Following [Chockler and Halpern (2004)], we will now formulate the latter as the responsibility of a feature value as an actual cause [Halpern and Pearl (2005)] for the observed classification.

Definition 3.3

Consider an entity $\mathbf{e}$ represented as a record of feature values $x_i = F_i(\mathbf{e})$, $F_i \in \mathcal{F}$.

(a) A feature value $v = F(\mathbf{e})$, with $F \in \mathcal{F}$, is a value-explanation for $L(\mathbf{e})$ if there is an s-explanation $\hat{\epsilon}$ for $L(\mathbf{e})$, such that $\langle F, v\rangle \in \hat{\epsilon}$.

(b) The explanatory responsibility of a value-explanation $v = F(\mathbf{e})$ is:

$\mbox{x-Resp}_{\mathbf{e},F}(v) \ := \ \max\{\ \frac{1}{|\hat{\epsilon}|} \ : \ \hat{\epsilon} \mbox{ is an s-explanation with } \langle F, v\rangle \in \hat{\epsilon}\ \}.$

(c) If $v = F(\mathbf{e})$ is not a value-explanation, $\mbox{x-Resp}_{\mathbf{e},F}(v) := 0$.

Notice that (b) can be stated as $\mbox{x-Resp}_{\mathbf{e},F}(v) := \frac{1}{|\hat{\epsilon}^\star|}$, with $\hat{\epsilon}^\star = \mbox{argmin}\{|\hat{\epsilon}| : \hat{\epsilon} \mbox{ is an s-explanation with } \langle F, v\rangle \in \hat{\epsilon}\}$.

Adopting the terminology of actual causality [Halpern and Pearl (2005)], a counterfactual value-explanation for $\mathbf{e}$'s classification is a value-explanation $v$ with $\mbox{x-Resp}_{\mathbf{e},F}(v) = 1$. That is, it suffices, without company from other feature values in $\mathbf{e}$, to explain the classification. Similarly, an actual value-explanation for $\mathbf{e}$'s classification is a value-explanation $v$ with $\mbox{x-Resp}_{\mathbf{e},F}(v) > 0$. That is, $v$ appears in an s-explanation $\hat{\epsilon}$, say as $\langle F, v\rangle$, but possibly in company of other feature values. In this case, $\hat{\epsilon} \smallsetminus \{\langle F, v\rangle\}$ is a contingency set for $v$ (c.f. [Halpern and Pearl (2005)], and [Meliou et al. (2010)] for the case of databases).

Example 3.2

(Example 3.1 continued)  $\hat{\epsilon}_7$, equivalently entity $\mathbf{e}_7$, is the only c-explanation for $\mathbf{e}_1$'s classification. Its value $1$ for $F_2$ is a counterfactual value-explanation, and its explanatory responsibility is $\mbox{x-Resp}_{\mathbf{e}_1,F_2}(1) = 1$. The empty set is its contingency set. Now, entity $\mathbf{e}_4$ shows that the intervened value for $F_2$, i.e. $1$, needs $\{\langle F_1, 0\rangle\}$ as a contingency set for the label to switch to $0$.

In this work we are interested mostly in c-explanations, which are our best explanations, and in maximum-responsibility feature values. Notice that maximum x-Resp scores can be obtained by concentrating only on c-explanations.  Maximum-responsibility value-explanations appear in c-explanations, and only in them. C.f. Section 9 for considerations on s-explanations and features with non-maximum responsibility scores.

4 Complexity of  x-Resp Computation

A natural question is about the complexity of computing the x-Resp score. Not only for the obvious reason of knowing the complexity, but also to determine if the ASP-based approach we will present is justified from the complexity point of view. We can shed some light on this matter by taking advantage of complexity results for actual causality in databases [Meliou et al. (2010), Bertossi and Salimi (2017a)]. It is known that there are Boolean Conjunctive Queries (BCQs), $\mathcal{Q}$, for which deciding if the responsibility of a tuple for $\mathcal{Q}$ being true in a database $D$ is above a given threshold is $N\!P$-complete in the size of $D$ [Bertossi and Salimi (2017a)].

In our case, given a fixed classifier $\mathcal{C}$, the computational problem is about deciding, for an arbitrary entity $\mathbf{e}^\star$ and rational number $v$, whether $\mbox{x-Resp}(\mathbf{e}^\star) > v$. The complexity is measured as a function of $|\mathbf{e}^\star|$, of the size $M$ determined by the underlying domains (defined below), and of the complexity of computing $\mathcal{C}$. (Under our ASP-based approach, $M$ will be associated with the extensional database for the program.)

Given a relational database schema $\mathcal{S} = \{R_1, \ldots, R_k\}$, with predicates of arities $ar(R_i)$, we can see each attribute $A$ as a feature that takes values in a finite domain $Dom(A)$. Without loss of generality, we assume the predicates $R_i$ do not share attributes (but attributes may share a same domain). Then, we have a sequence of attributes $\langle A^1_1, \ldots, A^1_{ar(R_1)}, \ldots, A^k_1, \ldots, A^k_{ar(R_k)}\rangle$. For a database $D$, and for each of the potential database tuples $\tau_1, \ldots, \tau_M$ (in $D$ or not), with $M = |Dom(A^1_1)| \times \cdots \times |Dom(A^1_{ar(R_1)})| + \cdots + |Dom(A^k_1)| \times \cdots \times |Dom(A^k_{ar(R_k)})|$, define binary features $F_j(\tau_j) := 1$ if $\tau_j \in D$, and $0$, otherwise.

Now, define a binary entity $\mathbf{e}^\star = \langle F_j(\tau_j)\rangle_{\tau_j \in D}$. This entity, containing only $1$s, represents the contents of database $D$, and its length coincides with the number of tuples in $D$, say $|D|$. Now, the population of entities is $\mathcal{E} = \{1, 0\}^{|D|}$, i.e. all the binary sequences with the same length as $\mathbf{e}^\star$. Intuitively, each entity in $\mathcal{E}$ represents a sub-instance $D^{\mathbf{e}}$ of $D$, obtained by keeping only the tuples of $D$ that are assigned the value $1$. We do not need to consider the insertion of tuples into $D$, as will become clear shortly.

Now, given the fixed BCQ $\mathcal{Q}$, for which $D \models \mathcal{Q}$, define a binary classifier $\mathcal{C}^{\mathcal{Q}}$ as follows: for $\mathbf{e} \in \mathcal{E}$, $\mathcal{C}^{\mathcal{Q}}(\mathbf{e}) := 1$ iff $D^{\mathbf{e}} \models \mathcal{Q}$. Notice that this classifier runs in polynomial time in the length of $\mathbf{e}$. Also, $\mathcal{C}^{\mathcal{Q}}(\mathbf{e}^\star) = 1$. The monotonicity of $\mathcal{Q}$ allows us to concentrate on sub-instances of $D$: only sub-instances can invalidate the query, while super-instances always make it true. In this way, we have reduced the problem of computing the responsibility of a tuple $\tau_j$ as an actual cause for $D \models \mathcal{Q}$ to the problem of computing $\mbox{x-Resp}_{\mathbf{e}^\star,F_j}(1)$.
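As a simple illustration of this construction, consider the BCQ $\mathcal{Q}: \exists x \exists y (R(x) \wedge S(y))$ and the database $D = \{R(a), R(c), S(b)\}$, with binary features $F_{R(a)}, F_{R(c)}, F_{S(b)}$ and $\mathbf{e}^\star = \langle 1,1,1\rangle$. Here, $\mathcal{C}^{\mathcal{Q}}(\mathbf{e}) = 1$ exactly when the sub-instance $D^{\mathbf{e}}$ keeps at least one $R$-tuple and the $S$-tuple. Flipping $F_{S(b)}$ alone already switches the label, so $\{\langle F_{S(b)}, 1\rangle\}$ is an s-explanation and $\mbox{x-Resp}_{\mathbf{e}^\star,F_{S(b)}}(1) = 1$, matching the fact that $S(b)$ is a counterfactual cause for $D \models \mathcal{Q}$. Flipping $F_{R(a)}$ alone does not ($R(c)$ still satisfies the query), but $\{\langle F_{R(a)}, 1\rangle, \langle F_{R(c)}, 1\rangle\}$ is an s-explanation, so $\mbox{x-Resp}_{\mathbf{e}^\star,F_{R(a)}}(1) = \frac{1}{2}$, matching the database responsibility of $R(a)$ as an actual cause with contingency set $\{R(c)\}$.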

Theorem 1

There is a binary polynomial-time classifier $\mathcal{C}$ over a finite set of binary entities $\mathcal{E}$ for which deciding whether $\mbox{x-Resp}_{\mathbf{e},F}(v)$ is above a certain threshold is $N\!P$-complete in the size of $\mathbf{e}$ plus the size of $\mathcal{E}$.

As in [Bertossi and Salimi (2017a)], several other problems related to responsibility can be investigated; and it is likely that most (if not all) of the results in [Bertossi and Salimi (2017a)] can be used to obtain similar results for the x-Resp score.

5 Counterfactual Intervention Programs

An answer-set program has a possible-world semantics, given in terms of its stable models, which are the intended models of the program [Gelfond and Kahl (2014)]. A program consists of a set of rules with variables, possibly with negated atoms in a rule body (antecedent) and disjunctions in the head (consequent). This negation is non-monotonic, which is particularly useful for doing commonsense reasoning and specifying persistence of properties. A program has an extensional database consisting of ground atoms (the facts). In our case, the facts will be related to the entity under classification, for whose label we want counterfactual explanations. The program specifies the possible interventions. Final, intervened versions of the original entity that have their label switched correspond to different stable models of the program.

Entities will be represented by means of a predicate with $n+2$ arguments, $E(\cdot; \cdots; \cdot)$. The first one holds a record (or entity) id (which may not be needed when dealing with single entities). The next $n$ arguments hold the feature values. (For performance-related reasons, it might be more convenient to use $n$ ternary predicates to represent an entity with an identifier, but the presentation here would become more complicated.) The last argument holds an annotation constant from the set $\{\mbox{o}, \mbox{do}, \star, \mbox{s}\}$. Their semantics will be specified below, by the generic program that uses them.

Initially, a record $\mathbf{e} = \langle f_1, \ldots, f_n\rangle$ has not been subject to interventions, and the corresponding entry in predicate $E$ is of the form $E(\mathbf{e}, \bar{f}, \mbox{o})$, where $\mathbf{e}$ is (with a bit of abuse of notation) a constant used as an entity identifier, $\bar{f}$ is an abbreviation for $f_1, \ldots, f_n$, i.e. the feature values of entity $\mathbf{e}$, and the annotation constant "o" stands for "original entity".

When the classifier gives label $1$ to $\mathbf{e}$, the idea is to start changing feature values, one at a time. The intervened entity then becomes annotated with constant do in the last argument. (The do-operator is commonly used to denote interventions [Pearl (2009)]; here, it is just a constant.) When the resulting intervened entities are classified, we may not have the classifier specified within the program. For this reason, the program uses a special predicate $\mathcal{C}(\cdots, \cdot)$, whose first arguments take the feature values of an entity under classification, and whose last argument returns the binary label. This predicate can be explicitly given as a set of facts (c.f. Example 5.1), or can be specified within the program (c.f. Example 5.7), or can be invoked by the program as an external predicate (c.f. Example 5.8), much in the spirit of HEX-programs [Eiter et al. (2019), Eiter et al. (2017)]. Since the original instance may have to go through several interventions until reaching one that switches the label to $0$, the intermediate entities get the "transition" annotation $\star$. All this is achieved by a generic program.

5.1 Counterfactual intervention programs

The generic Counterfactual Intervention Program (CIP) is as follows:

  • P1.

    The facts of the program are the atoms of the form $Dom_i(c)$, with $c \in Dom_i$, plus the initial entity $E(\mathbf{e}, \bar{f}, \mbox{o})$, with $\bar{f}$ the initial vector of feature values.

  • P2.

    The transition entities are obtained as initial, original entities, or as the result of an intervention. Here, $e$ is a variable standing for a record id.

    $E(e, \bar{x}, \star) \ \longleftarrow \ E(e, \bar{x}, \mbox{o}).$
    $E(e, \bar{x}, \star) \ \longleftarrow \ E(e, \bar{x}, \mbox{do}).$
  • P3.

    The next rule specifies that, every time the entity at hand (original or obtained after a "previous" intervention) is classified with label $1$, a new value has to be picked from a domain and replace the current value. The new value is chosen via the non-deterministic "choice operator", well known in ASP [Giannotti et al. (1997)]. In this case, the values are chosen from the domains, subject to the condition of not being the same as the current value:

$E(e, x'_1, x_2, \ldots, x_n, \mbox{do}) \vee \cdots \vee E(e, x_1, x_2, \ldots, x'_n, \mbox{do}) \ \longleftarrow \ E(e, \bar{x}, \star),\ \mathcal{C}(\bar{x}, 1),$
$\qquad Dom_1(x'_1), \ldots, Dom_n(x'_n),\ x'_1 \neq x_1, \ldots, x'_n \neq x_n,$
$\qquad choice(\bar{x}, x'_1), \ldots, choice(\bar{x}, x'_n).$

In general, for each fixed $\bar{x}$, $choice(\bar{x}, y)$ chooses a unique value $y$, subject to the other conditions in the same rule body. The use of the choice operator can be eliminated by replacing each $choice(\bar{x}, x'_i)$ atom by the atom $Chosen_i(\bar{x}, x'_i)$, and defining each predicate $Chosen_i$ by means of "classical" rules [Giannotti et al. (1997)], as follows. (We emphasize that we are using the "choice operator", which is definable in ASP, as done here, and not the newer choice rules, which could be used for the same purpose, and many more, and are included in the ASP-Core-2 Standard [Calimeri et al. (2020)]. We use the choice operator because most of our programs are run with DLV-Complex, which does not support choice rules.)

$Chosen_i(\bar{x}, y) \ \leftarrow \ E(e, \bar{x}, \star),\ \mathcal{C}(\bar{x}, 1),\ Dom_i(y),\ y \neq x_i,\ \mathit{not}\ \mathit{DiffChoice}_i(\bar{x}, y).$   (2)
$\mathit{DiffChoice}_i(\bar{x}, y) \ \leftarrow \ Chosen_i(\bar{x}, y'),\ y' \neq y.$   (3)
  • P4.

    The following rule specifies that we can "stop", hence the annotation s, when we reach an entity that gets label $0$:

    $E(e, \bar{x}, \mbox{s}) \ \longleftarrow \ E(e, \bar{x}, \mbox{do}),\ \mathcal{C}(\bar{x}, 0).$
  • P5.

    We add a program constraint specifying that we prohibit going back to the original entity via local interventions:

    $\longleftarrow \ E(e, \bar{x}, \mbox{do}),\ E(e, \bar{x}, \mbox{o}).$

  • P6.

    The counterfactual explanations can be collected by means of a predicate $\mathit{Expl}(\cdot,\cdot,\cdot)$, specified as follows:

    $\mathit{Expl}(e,i,x_i)\ \longleftarrow\ E(e,x_1,\ldots,x_n,\mbox{o}),\ E(e,x_1^{\prime},\ldots,x_n^{\prime},\mbox{s}),\ x_i\neq x_i^{\prime},$

    with $i = 1,\ldots,n$.  These rules collect each value that has been changed in the original entity $e$, together with its position in $e$ (the second argument of the predicate). Each of these is a value-explanation.

The program will have several stable models due to the disjunctive rule and the choice operator. Each model will hold intervened versions of the original entity, and hopefully versions for which the label is switched, i.e. those with annotation s. If the classifier never switches the label, despite the fact that local interventions are not restricted, we will not find a model with a version of the initial entity annotated with s.  Due to the constraint in P5., none of the models will have the original entity annotated with do, because those models would be discarded [Leone et al. (2006)]. The definition of the choice operator contains non-stratified negation. The semantics of ASP, which involves model minimality, makes only one of the atoms in a head disjunction true (unless forced otherwise by the program).

Example 5.1

(example 3.1 continued) Most of the CIP above is generic. Here we have the facts  $\mathit{Dom}_1(0)$, $\mathit{Dom}_1(1)$, $\mathit{Dom}_2(0)$, $\mathit{Dom}_2(1)$, $\mathit{Dom}_3(0)$, $\mathit{Dom}_3(1)$, and $E(\mathbf{e}_1,0,1,1,\mbox{o})$, with $\mathbf{e}_1$ a constant, the record id of the first row in Table 1.  The classifier is explicitly given by Table 1. Then, predicate $\mathcal{C}$ can be specified with a set of additional facts: $\mathcal{C}(0,1,1,1)$, $\mathcal{C}(1,1,1,1)$, $\mathcal{C}(1,1,0,1)$, $\mathcal{C}(1,0,1,0)$, $\mathcal{C}(1,0,0,1)$, $\mathcal{C}(0,1,0,1)$, $\mathcal{C}(0,0,1,0)$, $\mathcal{C}(0,0,0,0)$. In them, the last entry corresponds to the label assigned to the entity whose feature values are given in the first three arguments.

The stable models of the program will contain all the facts above. One of them, say $\mathcal{M}_1$, will contain (among others) the facts  $E(\mathbf{e}_1,0,1,1,\mbox{o})$ and $E(\mathbf{e}_1,0,1,1,\star)$.  The presence of the last atom activates rule P3., because $\mathcal{C}(0,1,1,1)$ is true (for $\mathbf{e}_1$ in Table 1). New facts are produced for $\mathcal{M}_1$ (the new value due to an intervention is underlined): $E(\mathbf{e}_1,\underline{1},1,1,\mbox{do})$, $E(\mathbf{e}_1,\underline{1},1,1,\star)$.  Due to the last fact and the true $\mathcal{C}(1,1,1,1)$, rule P3. is activated again. Choosing the value 0 for the second disjunct, the atoms $E(\mathbf{e}_1,\underline{1},\underline{0},1,\mbox{do})$, $E(\mathbf{e}_1,\underline{1},\underline{0},1,\star)$ are generated. For the latter, $\mathcal{C}(1,0,1,0)$ is true (coming from $\mathbf{e}_4$ in Table 1), switching the label to 0. Rule P3. is no longer activated, and we can apply rule P4., obtaining  $E(\mathbf{e}_1,\underline{1},\underline{0},1,\mbox{s})$.

From rule P6., we obtain the explanations $\mathit{Expl}(\mathbf{e}_1,1,0)$, $\mathit{Expl}(\mathbf{e}_1,2,1)$, showing the changed values in $\mathbf{e}_1$. All this happens in model $\mathcal{M}_1$.  There are other models, and one of them contains $E(\mathbf{e}_1,0,\underline{0},1,\mbox{s})$, the minimally intervened version of $\mathbf{e}_1$, i.e. $\mathbf{e}_7$.

In the next example we show how to write and run the counterfactual intervention program for Example 3.1 with the DLV system [Leone et al. (2006)]. (We have experimented, with the examples in this paper and others, with each of DLV, DLV-Complex, and DLV2; they have been extremely useful, and at this moment each of them seems to have some nice features the others lack.) When numerical aggregations and, especially, set operations are needed, we use instead the DLV-Complex system [Calimeri et al. (2009)] (c.f. Section 9). In Example 5.8 we show a program that can be used with the newer version of DLV, DLV2 [Alviano et al. (2017), Alviano et al. (2019)], which follows the ASP-Core-2 standard [Calimeri et al. (2020)].

Example 5.2

(example 5.1 continued) The answer-set program for Examples 3.1 and 5.1, written in the language of the DLV-Complex system, is shown next. (The program portion shown right below would also run with DLV, since it does not contain numerical aggregations.) In it, the annotation “tr” stands for the transition annotation “$\star$” used in Example 5.1, and X, Xp stand for $x, x^{\prime}$, etc. In a DLV program, terms starting with a lower-case letter are constants, and those starting with an upper-case letter are variables.

    #include<ListAndSet>
    % the classifier:
    cls(0,1,1,1). cls(1,1,1,1). cls(1,1,0,1). cls(1,0,1,0). cls(1,0,0,1).
    cls(0,1,0,1). cls(0,0,1,0). cls(0,0,0,0).
    % the domains:
    dom1(0). dom1(1). dom2(0). dom2(1). dom3(0). dom3(1).
    % original entity at hand:
    ent(e,0,1,1,o).

    % transition rules:
    ent(E,X,Y,Z,tr) :- ent(E,X,Y,Z,o).
    ent(E,X,Y,Z,tr) :- ent(E,X,Y,Z,do).

    % admissible counterfactual interventions:
    ent(E,Xp,Y,Z,do) v ent(E,X,Yp,Z,do) v ent(E,X,Y,Zp,do) :- ent(E,X,Y,Z,tr),
                                    cls(X,Y,Z,1), dom1(Xp), dom2(Yp), dom3(Zp),
                                    X != Xp, Y != Yp, Z!= Zp, chosen1(X,Y,Z,Xp),
                                    chosen2(X,Y,Z,Yp), chosen3(X,Y,Z,Zp).

    % definitions of chosen operators as in equations (2) and (3):
    chosen1(X,Y,Z,U) :- ent(E,X,Y,Z,tr), cls(X,Y,Z,1), dom1(U), U != X,
                        not diffchoice1(X,Y,Z,U).
    diffchoice1(X,Y,Z, U) :- chosen1(X,Y,Z,Up), U != Up, dom1(U).
    chosen2(X,Y,Z,U) :- ent(E,X,Y,Z,tr), cls(X,Y,Z,1), dom2(U), U != Y,
                        not diffchoice2(X,Y,Z,U).
    diffchoice2(X,Y,Z, U) :- chosen2(X,Y,Z,Up), U != Up, dom2(U).
    chosen3(X,Y,Z,U) :- ent(E,X,Y,Z,tr), cls(X,Y,Z,1), dom3(U), U != Z,
                        not diffchoice3(X,Y,Z,U).
    diffchoice3(X,Y,Z, U) :- chosen3(X,Y,Z,Up), U != Up, dom3(U).

    % stop when label has been changed:
    ent(E,X,Y,Z,s) :- ent(E,X,Y,Z,do), cls(X,Y,Z,0).

    % hard constraint for not returning to original entity:
    :- ent(E,X,Y,Z,do), ent(E,X,Y,Z,o).

    % auxiliary predicate to avoid unsafe negation in the hard constraint below:
    entAux(E) :- ent(E,X,Y,Z,s).

    % hard constraint for not computing models where label does not change:
    :- ent(E,X,Y,Z,o), not entAux(E).

    % collecting explanatory changes per argument:
    expl(E,1,X) :- ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), X != Xp.
    expl(E,2,Y) :- ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), Y != Yp.
    expl(E,3,Z) :- ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), Z != Zp.
 

If we run this program with DLV-Complex, we obtain the following three models, for which we do not show (most of) the original program facts (their omission can be requested from DLV):


    {ent(e,0,1,1,o), ent(e,0,1,1,tr), chosen1(0,1,1,1), chosen2(0,1,1,0),
     chosen3(0,1,1,0), ent(e,0,0,1,do), ent(e,0,0,1,tr), ent(e,0,0,1,s),
     diffchoice3(0,1,1,1), diffchoice2(0,1,1,1), diffchoice1(0,1,1,0),
     entAux(e), expl(e,2,1)}

    {ent(e,0,1,1,o), ent(e,0,1,1,tr), chosen1(0,1,1,1), chosen2(0,1,1,0),
     chosen3(0,1,1,0), ent(e,0,1,0,do), ent(e,0,1,0,tr), chosen1(0,1,0,1),
     chosen2(0,1,0,0), chosen3(0,1,0,1), ent(e,0,0,0,do), ent(e,0,0,0,tr),
     ent(e,0,0,0,s), diffchoice3(0,1,0,0), diffchoice3(0,1,1,1),
     diffchoice2(0,1,0,1), diffchoice2(0,1,1,1), diffchoice1(0,1,0,0),
     diffchoice1(0,1,1,0), entAux(e), expl(e,2,1), expl(e,3,1)}

    {ent(e,0,1,1,o), ent(e,0,1,1,tr), chosen1(0,1,1,1), chosen2(0,1,1,0),
     chosen3(0,1,1,0), ent(e,1,1,1,do), ent(e,1,1,1,tr), chosen1(1,1,1,0),
     chosen2(1,1,1,0), chosen3(1,1,1,0), ent(e,1,0,1,do), ent(e,1,0,1,tr),
     ent(e,1,0,1,s), diffchoice3(0,1,1,1), diffchoice3(1,1,1,1),
     diffchoice2(0,1,1,1), diffchoice2(1,1,1,1), diffchoice1(0,1,1,0),
     diffchoice1(1,1,1,1), entAux(e), expl(e,1,0), expl(e,2,1)}
 

These models correspond to the counterfactual entities $\mathbf{e}_7, \mathbf{e}_8, \mathbf{e}_4$, resp., in Example 3.1.

Notice that the program, except for the fact ent(e,0,1,1,o) in the 10th line, is completely generic, and can be used with any input entity that has been classified with label 1. (If the initial label were 0 instead, no interventions would be triggered, and the only model would correspond to the initial entity.) We could remove this fact from the program, obtaining the program program2.txt, and run instead

C:\DLV>dlv.exe ent.txt program2.txt > outputFile.txt

where ent.txt is the file containing only the fact ent(e,0,1,1,o).

In the previous example, the classifier was given as an input/output relation, that is, as a set of facts inserted directly in the program. In other situations, we may have the classifier invoked from the program as an external predicate. In others, the classifier can be specified directly in the program, as shown in Example 2.1.

Our CIPs compute all the counterfactual versions (or counterfactual explanations) of the original entity $\mathbf{e}$. Each counterfactual version is represented (or characterized) by at least one of the stable models of the CIP, namely one that contains the counterfactual version of $\mathbf{e}$ annotated with “$\mathsf{s}$”. There may be more than one stable model associated to a counterfactual version $\mathbf{e}^{\prime}$, due to the use of the choice operator: different choices may end up leading to the same $\mathbf{e}^{\prime}$.

The counterfactual explanations obtained through the CIP are not necessarily s-explanations or c-explanations (c.f. Section 3), as Example 5.2 shows. The CIPs presented so far have (only) minimal models with respect to set-inclusion of the extensions of full predicates, whereas when we compare explanations, we do it at the “attribute (or feature, or argument) level”. Of course, s-explanations and c-explanations are all included among the counterfactual explanations, and are represented by stable models of the CIP. We will consider this point in Section 5.3.

Example 5.3

(example 3.1 continued) To show the different kinds of counterfactual versions of an original entity, let us replace the classifier we had before with the one shown in Table 2.

entity (id)   $F_1$   $F_2$   $F_3$   $L$
$\mathbf{e}_1^{\prime}$   0   1   1   1
$\mathbf{e}_2^{\prime}$   1   1   1   1
$\mathbf{e}_3^{\prime}$   1   1   0   0
$\mathbf{e}_4^{\prime}$   1   0   1   1
$\mathbf{e}_5^{\prime}$   1   0   0   0
$\mathbf{e}_6^{\prime}$   0   1   0   1
$\mathbf{e}_7^{\prime}$   0   0   1   0
$\mathbf{e}_8^{\prime}$   0   0   0   1

Table 2:  Classifier  $\mathcal{C}^{\prime}$

The changed values appear underlined.  In this case, we have three counterfactual versions: (a) $\mathbf{e}_7^{\prime}$, which is a c-explanation; only one value is changed to switch the label to 0. (b) $\mathbf{e}_3^{\prime}$, which is an s-explanation, but not a c-explanation; it shows two changes, but not the one for $\mathbf{e}_7^{\prime}$. (c) $\mathbf{e}_5^{\prime}$, which is neither an s- nor a c-explanation; its changes include those for $\mathbf{e}_7^{\prime}$ and those for $\mathbf{e}_3^{\prime}$.

The CIP we have so far would return all three of them.  With the additional elements for CIP programs to be introduced in Section 5.3, we will be able to obtain only c-explanations. In Section 9, we briefly discuss the case of s-explanations (which may not be c-explanations).

5.2 Complexity of CIPs

The complexity result obtained in Section 4 makes us wonder whether using ASP-programs for specifying and computing the x-Resp score is overkill, or, more precisely, whether they have the right and necessary expressive power and complexity to confront our problem. In fact, they do. It is known that reasoning with disjunctive answer-set programs (DASPs) falls in the second level of the polynomial hierarchy (in data complexity) [Dantsin et al. (2001)], and slightly above that with the use of the weak constraints [Leone et al. (2006)] that we will employ in Section 5.3. However, CIPs have the property of being head-cycle free (HCF), which brings down the complexity of a DASP to the first level of the polynomial hierarchy [Dantsin et al. (2001)]. This is in line with the result in Theorem 1.

It is easy to check that CIPs are HCF (c.f. Section 2): the ground chains in the directed graph $\mathit{DG}(\Pi)$ associated to a CIP $\Pi$ are, due to rules P2. and P3., of the form  $E(\mathbf{e},\bar{a},\mbox{do})\rightarrow E(\mathbf{e},\bar{a},\star)\rightarrow E(\mathbf{e},\bar{a}^{\prime},\mbox{do})$, with $\bar{a}\neq\bar{a}^{\prime}$. They never create a cycle through the head of a ground instantiation of the disjunctive rule.

One can also see the HCF property from the fact that the CIPs become repair-programs [Bertossi (2011)] for a database w.r.t. the integrity constraint, actually a denial constraint,  $\forall e\forall\bar{x}\,\neg(E(e,\bar{x})\wedge\mathcal{C}(\bar{x},1))$. Denial constraints are common in databases, and their repairs and repair-programs have been investigated [Caniupan and Bertossi (2010), Bertossi (2021)]. C.f. [Bertossi (2011)] for additional references.

As a consequence of being HCF, a CIP can be transformed, by means of the shift operation, into an equivalent non-disjunctive ASP (c.f. Section 2).

Example 5.4

(example 5.1 continued) The disjunctive rule in Example 5.1 can be replaced by the three rules:

 ent(E,Xp,Y,Z,do) :- ent(E,X,Y,Z,tr), cls(X,Y,Z,1), dom1(Xp), dom2(Yp), dom3(Zp),
                     X != Xp, Y != Yp, Z!= Zp, chosen1(X,Y,Z,Xp), chosen2(X,Y,Z,Yp),
                     chosen3(X,Y,Z,Zp), not ent(E,X,Yp,Z,do), not ent(E,X,Y,Zp,do).

 ent(E,X,Yp,Z,do) :- ent(E,X,Y,Z,tr), cls(X,Y,Z,1), dom1(Xp), dom2(Yp), dom3(Zp),
                     X != Xp, Y != Yp, Z!= Zp, chosen1(X,Y,Z,Xp), chosen2(X,Y,Z,Yp),
                     chosen3(X,Y,Z,Zp), not ent(E,Xp,Y,Z,do), not ent(E,X,Y,Zp,do).

 ent(E,X,Y,Zp,do) :- ent(E,X,Y,Z,tr), cls(X,Y,Z,1), dom1(Xp), dom2(Yp), dom3(Zp),
                     X != Xp, Y != Yp, Z!= Zp, chosen1(X,Y,Z,Xp), chosen2(X,Y,Z,Yp),
                     chosen3(X,Y,Z,Zp), not ent(E,Xp,Y,Z,do), not ent(E,X,Yp,Z,do).

The resulting program has the same answer-sets as the original program.

5.3 C-explanations and maximum responsibility

As discussed at the end of Section 5.1, an intervened entity of the form $E(\mathbf{e},c_1,\ldots,c_n,\mbox{s})$, that is, representing a counterfactual explanation, may not correspond to an s- or a c-explanation. We are interested in obtaining the latter, and only them, because: (a) they are our “best explanations”, and (b) they are used to define and compute the maximum x-Resp scores.

Moving towards computing x-Resp scores, notice that in each of the stable models $\mathcal{M}$ of the CIP, we can collect the corresponding counterfactual explanation for $\mathbf{e}$’s classification as the set  $\hat{\epsilon}^{\mathcal{M}} = \{\langle F_i,c_i\rangle \mid \mathit{Expl}(\mathbf{e},i,c_i)\in\mathcal{M}\}$. This can be done within an ASP system such as DLV, which allows set construction and aggregation, in particular, counting [Leone et al. (2006)]. Actually, counting comes in handy to obtain the cardinality of $\hat{\epsilon}^{\mathcal{M}}$, by means of:

$\mathit{inv\mbox{-}resp}(\mathbf{e},m)\ \longleftarrow\ \#count\{i: \mathit{Expl}(\mathbf{e},i,c_i)\} = m\mathbin{\cdot}$   (4)

For each model $\mathcal{M}$ of the CIP, we will obtain such a value $m(\mathcal{M})$, which gives the number of changes of feature values that lead to the associated counterfactual explanation. Notice that, in each of the models $\mathcal{M}^{o}$ that correspond to c-explanations, these, now minimum, values $m(\mathcal{M}^{o})$ will be the same, say $m^{o}$, and can be used to compute the responsibility of a feature value in $\hat{\epsilon}^{\mathcal{M}^{o}}$, as follows: for $\mathit{Expl}(\mathbf{e},i,c_i)\in\mathcal{M}^{o}$,

$\mbox{x-Resp}_{\mathbf{e},F_i}(c_i) = \frac{1}{|\hat{\epsilon}^{\mathcal{M}}|} = \frac{1}{m^{o}}\mathbin{\cdot}$   (5)
Example 5.5

(example 5.2 continued) Let us add to the CIP above the rule:

    % computing the inverse of x-Resp:
    invResp(E,M) :- #count{I: expl(E,I,_)} = M, #int(M), E = e.

By running DLV-Complex with the new program, we obtain the models above, now extended with atoms that count the changes made to the original entity (we omit most of the old atoms):


    {ent(e,0,1,1,o), ... , ent(e,0,0,1,s), expl(e,2,1), invResp(e,1)}
    {ent(e,0,1,1,o), ..., ent(e,0,0,0,s), expl(e,2,1), expl(e,3,1), invResp(e,2)}
    {ent(e,0,1,1,o), ..., ent(e,1,0,1,s), expl(e,1,0), expl(e,2,1), invResp(e,2)}
 

As before, we obtain three models, and each of them shows, in the last atom, the number of changes that were made to obtain the corresponding counterfactual explanation.  For example, for the last model, say $\mathcal{M}_3$, corresponding to entity $\mathbf{e}_4$, we obtain $m(\mathcal{M}_3) = 2$. Similarly for the second one, corresponding to entity $\mathbf{e}_8$. The first model corresponds to the counterfactual entity $\mathbf{e}_7$, which is a c-explanation; this is reflected by the minimum value “1” that predicate invResp takes in its second argument among all the stable models.

We can see that the first model is the one corresponding to a maximum responsibility feature value.

In order to obtain only the models associated to c-explanations, we add weak program constraints to the CIP. They can be violated by a stable model of the program (unlike strong program constraints), but the number of violations has to be minimized. In this case, for $1\leq i\leq n$, we add to the CIP (the notation follows the standard in [Calimeri et al. (2020), Alviano et al. (2017)]):

$:\sim\ E(e,x_1,\ldots,x_n,\mbox{o}),\ E(e,x_1^{\prime},\ldots,x_n^{\prime},\mbox{s}),\ x_i\neq x_i^{\prime}\mathbin{\cdot}$   (6)

Only the stable models representing an intervened version of $\mathbf{e}$ with a minimum number of value discrepancies with $\mathbf{e}$ will be kept.

Example 5.6

(example 5.5 continued) The best explanations, i.e. the c-explanations, can be obtained by adding weak program constraints to the combined CIP above:

    % weak constraints for minimizing number of changes:
    :~ ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), X != Xp.
    :~ ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), Y != Yp.
    :~ ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), Z != Zp.
  

If we run DLV-Complex with the extended program, we obtain a single model, corresponding to $\mathbf{e}_7$:

   Best model: {ent(e,0,1,1,o), ent(e,0,1,1,tr), chosen1(0,1,1,1), chosen2(0,1,1,0),
                chosen3(0,1,1,0), ent(e,0,0,1,do), ent(e,0,0,1,tr), ent(e,0,0,1,s),
                diffchoice3(0,1,1,1), diffchoice2(0,1,1,1), diffchoice1(0,1,1,0),
                expl(e,2,1), entAux(e), invResp(e,1)}
   

This model shows that the counterfactual entity $\mathbf{e}_7$ has one change in the second attribute w.r.t. the original entity. This new entity gives a minimum inverse responsibility $m^{o} = 1$ to the original value in the second argument of ent, which leads, via (5), to its (maximum) responsibility $\mbox{x-Resp}_{\mathbf{e}_1,F_2}(1) = \frac{1}{m^{o}} = 1$.

Example 5.7

(example 2.1 continued)  We now present the CIP for the classifier based on the decision tree, in DLV-Complex notation. Notice that after the facts, which now do not include the classifier, we find the rule-based specification of the decision tree.


 #include<ListAndSet>

 % facts:
    dom1(sunny). dom1(overcast). dom1(rain). dom2(high). dom2(normal).
    dom3(strong). dom3(weak).
    ent(e,sunny,normal,weak,o).   % original entity at hand

 % spec of the classifier:
    cls(X,Y,Z,1) :- Y = normal, X = sunny, dom1(X), dom3(Z).
    cls(X,Y,Z,1) :- X = overcast, dom2(Y), dom3(Z).
    cls(X,Y,Z,1) :- Z = weak, X = rain, dom2(Y).
    cls(X,Y,Z,0) :- dom1(X), dom2(Y), dom3(Z), not cls(X,Y,Z,1).

 % transition rules:
    ent(E,X,Y,Z,tr) :- ent(E,X,Y,Z,o).
    ent(E,X,Y,Z,tr) :- ent(E,X,Y,Z,do).

 % counterfactual rule
    ent(E,Xp,Y,Z,do) v ent(E,X,Yp,Z,do) v ent(E,X,Y,Zp,do) :-
                            ent(E,X,Y,Z,tr), cls(X,Y,Z,1), dom1(Xp), dom2(Yp),
                            dom3(Zp), X != Xp, Y != Yp, Z!= Zp,
                     chosen1(X,Y,Z,Xp), chosen2(X,Y,Z,Yp), chosen3(X,Y,Z,Zp).

 % definitions of chosen operators:
    chosen1(X,Y,Z,U) :- ent(E,X,Y,Z,tr), cls(X,Y,Z,1), dom1(U), U != X,
                        not diffchoice1(X,Y,Z,U).
    diffchoice1(X,Y,Z, U) :- chosen1(X,Y,Z, Up), U != Up, dom1(U).
    chosen2(X,Y,Z,U) :- ent(E,X,Y,Z,tr), cls(X,Y,Z,1), dom2(U), U != Y,
                        not diffchoice2(X,Y,Z,U).
    diffchoice2(X,Y,Z, U) :- chosen2(X,Y,Z, Up), U != Up, dom2(U).
    chosen3(X,Y,Z,U) :- ent(E,X,Y,Z,tr), cls(X,Y,Z,1), dom3(U), U != Z,
                        not diffchoice3(X,Y,Z,U).
    diffchoice3(X,Y,Z, U) :- chosen3(X,Y,Z, Up), U != Up, dom3(U).

 % Not going back to initial entity (program constraint):
    :- ent(E,X,Y,Z,do), ent(E,X,Y,Z,o).

 % stop when label has been changed:
    ent(E,X,Y,Z,s) :- ent(E,X,Y,Z,do), cls(X,Y,Z,0).

 % collecting changed values for each feature:
    expl(E,outlook,X)   :- ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), X != Xp.
    expl(E,humidity,Y)  :- ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), Y != Yp.
    expl(E,wind,Z)      :- ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), Z != Zp.

    entAux(E) :- ent(E,X,Y,Z,s).        % auxiliary predicate to
                                        % avoid unsafe negation
                                        % in the constraint below
    :- ent(E,X,Y,Z,o), not entAux(E).   % discard models where
                                        % label does not change

 % computing the inverse of x-Resp:
    invResp(E,M) :- #count{I: expl(E,I,_)} = M, #int(M), E = e.

Two counterfactual versions of $\mathbf{e}$ are obtained, as represented by the two essentially different stable models of the program, and determined by the atoms with the annotation s (we keep in them only the most relevant atoms, omitting initial facts and choice-related atoms):

 {ent(e,sunny,normal,weak,o),cls(sunny,normal,strong,1),cls(sunny,normal,weak,1),
  cls(overcast,high,strong,1),cls(overcast,high,weak,1),cls(rain,high,weak,1),
  cls(overcast,normal,weak,1),cls(rain,normal,weak,1),cls(overcast,normal,strong,1),
  cls(sunny,high,strong,0),cls(sunny,high,weak,0),cls(rain,high,strong,0),
  cls(rain,normal,strong,0),ent(e,sunny,high,weak,do),ent(e,sunny,high,weak,tr),
  ent(e,sunny,high,weak,s),expl(e,humidity,normal),invResp(e,1)}

 {ent(e,sunny,normal,weak,o), cls(sunny,normal,strong,1),...,
  cls(rain,normal,strong,0),ent(e,rain,normal,strong,do),ent(e,rain,normal,strong,tr),
  ent(e,rain,normal,strong,s),expl(e,outlook,sunny),expl(e,wind,weak),invResp(e,2)}

The first model shows the classifier as a set of atoms and, in its last line, that ent(e,sunny,high,weak,s) is a counterfactual version, with label 0, of the original entity $\mathbf{e}$; it is obtained from the latter by changing the value of feature $\mathsf{Humidity}$, leading to an inverse score of 1.  The second model shows a different counterfactual version of $\mathbf{e}$, namely ent(e,rain,normal,strong,s), now obtained by changing the values of features $\mathsf{Outlook}$ and $\mathsf{Wind}$, leading to an inverse score of 2.

Let us now add, at the end of the program, the following weak constraints (labeled with (*)):

 % Weak constraints to minimize number of changes:      (*)
    :~ ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), X != Xp.
    :~ ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), Y != Yp.
    :~ ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), Z != Zp.

If we run the program with them, the number of changes is minimized, and we basically obtain only the first model above, corresponding to the counterfactual entity $\mathbf{e}^{\prime} = \mathit{ent}(\mathsf{sunny},\mathsf{high},\mathsf{weak})$.

As can be seen in the light of this example, more complex rule-based classifiers could be defined inside a CIP. It is also possible to invoke the classifier as an external predicate, as the following example shows.

Example 5.8

(example 5.7 continued) The program below calls the classifier through a predicate that has an external extension, as defined by a Python program.  The program has the same facts and the same rules as the program in Example 5.7, except for a new rule that defines the classification predicate, cls here (and $\mathcal{C}$ in the general formulation in Section 5.1), and replaces the internal specification of the classifier:

     cls(X,Y,Z,L) :- &classifier(X,Y,Z;L), dom1(X), dom2(Y), dom3(Z).

Here, the atom &classifier(X,Y,Z;L) corresponds to the invocation of the external classifier with parameters X,Y,Z, which returns a value through variable L. The program was run with the version of DLV2 for Linux that supports interaction with Python, which can be downloaded from:

     https://dlv.demacs.unical.it/publications#h.cgg9mbi41jq9

The program in Python that specifies the classifier is very simple, and it can be invoked in combination with DLV2, as follows:

sudo ./dlv2-python-linux-x86_64 program_dlv2.txt def_class.py.

Here, program_dlv2.txt is the CIP, and def_class.py is the very simple Python program that specifies the classifier, namely

 def classifier(X,Y,Z):
        if (X == "sunny") and (Y == "normal"):
                return 1
        if (X == "overcast"):
                return 1
        if (X == "rain") and (Z == "weak"):
                return 1
        else:
                return 0

We obtain as answer-set the first one in Example 5.7.

6 Semantic Knowledge

Counterfactual interventions in the presence of semantic conditions require special consideration. As the following example shows, not every intervention, or combination of interventions, may be admissible [Bertossi and Geerts (2020)]. In these situations, declarative approaches to counterfactual interventions become particularly useful.

Example 6.1

A moving company makes automated hiring decisions based on feature values in applicants’ records of the form $R = \langle \mathit{appCode}, \mbox{ability to lift}, \mbox{gender}, \mbox{weight}, \mbox{height}, \mathit{age}\rangle$.  Mary, represented by $R^{\star} = \langle 101, 1, F, \mbox{160 pounds}, \mbox{6 feet}, 28\rangle$, applies, but is denied the job, i.e. the classifier returns  $L(R^{\star}) = 1$.  To explain the decision, we can, hypothetically, change Mary’s gender from $F$ into $M$, obtaining record $R^{\star\prime}$, for which we now observe $L(R^{\star\prime}) = 0$. Thus, her value $F$ for gender can be seen as a counterfactual explanation for the initial decision.

As an alternative, we might keep the value of gender, and counterfactually change other feature values. However, we might be constrained or guided by an ontology containing, e.g. the denial semantic constraint  $\neg(R[2] = 1 \wedge R[6] > 80)$ (2 and 6 indicating positions in the record), which prohibits someone over 80 from being qualified as fit for lifting weight.  We could also have a rule, such as  $(R[3] = M \wedge R[4] > 100 \wedge R[6] < 70) \rightarrow R[2] = 1$, specifying that men who weigh over 100 pounds and are younger than 70 are automatically qualified to lift weight.

In situations like this, we could add to the CIPs we had before: (a) program constraints that prohibit certain models, e.g.

$\longleftarrow\ R(e,x,1,y,z,u,w,\star),\ w > 80;$

(b) additional rules, e.g.

$R(e,x,1,M,z,u,w,\star)\ \longleftarrow\ R(e,x,y,M,z,u,w,\star),\ z > 100,\ w < 70,$

that may automatically generate additional interventions. In a similar way, one could accommodate certain preferences using weak program constraints.
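To fix ideas, the following is a minimal DLV-style sketch of these two additions for Example 6.1. The predicate name r, the constant m (for gender M), and the record layout r(EntityId, AppCode, Lift, Gender, Weight, Height, Age, Annotation) are illustrative assumptions only, and the integer comparisons require a suitable #maxint setting when running DLV:

    % semantic hard constraint: nobody over 80 is fit for lifting weight:
    :- r(E,C,1,G,W,H,A,tr), A > 80.

    % semantic rule: men weighing over 100 pounds and younger than 70
    % are automatically qualified to lift weight:
    r(E,C,1,m,W,H,A,tr) :- r(E,C,L,m,W,H,A,tr), W > 100, A < 70.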

Causality and responsibility in databases in the presence of integrity constraints were introduced and investigated in [Bertossi and Salimi (2017b)].

Example 6.2

(example 5.7 continued) It might be the case that, in a particular region, some combinations of weather conditions are never possible, e.g. raining with a strong wind at the same time. When producing counterfactual interventions for the entity $\mathbf{e}$, such a combination should be prohibited. This can be done by imposing a hard program constraint, which we add to the program in Example 5.7:

 % hard constraint disallowing a particular combination    (**)
    :- ent(E,rain,X,strong,tr).
 

If we run in DLV-Complex the program with this constraint, but without the weak constraints labeled with (*) in Example 5.7, we obtain only the first model shown in Example 5.7, corresponding to the counterfactual entity $\mathbf{e}^{\prime} = \mathit{ent}(\mathsf{sunny},\mathsf{high},\mathsf{weak})$.

As the previous example shows, we can easily impose constraints that make the counterfactual entities, or equivalently, the associated explanations, actionable [Ustun et al. (2019), Karimi et al. (2020b)]. As mentioned in Section 1, for the loan application example, we could impose a hard program constraint (i.e. add it to a CIP) of the form

$\longleftarrow\ \mathit{E}(e,\ldots,\mathit{age},\ldots,\mathsf{o}),\ \mathit{E}(e,\ldots,\mathit{age}^{\prime},\ldots,\star),\ \mathit{age}^{\prime} < \mathit{age},$   (7)

which prevents decreasing an applicant’s age.
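In the concrete syntax used in our running programs, and assuming a hypothetical loan record with just three features, the third one being the applicant’s age (the predicate loanEnt and this layout are illustrative only, not taken from the original example), constraint (7) could be written as:

    % prohibit counterfactual versions in which the age has been decreased:
    :- loanEnt(E,X,Y,Age,o), loanEnt(E,Xp,Yp,AgeP,tr), AgeP < Age.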

Logic-based specifications also allow for compilation of constraints into rules. For example, instead of using a hard constraint, such as (7), we could directly impose the condition on a counterfactual age in the disjunctive counterfactual rule P3., of the form

$\cdots \vee E(e,x_1,\ldots,x_a^{\prime},\ldots,\mbox{do}) \vee \cdots\ \longleftarrow\ E(e,\bar{x},\star),\ \mathcal{C}(\bar{x},1),\ \ldots,\ \mathit{Dom}_a(x_a^{\prime}),$
$\ldots,\ x_a < x_a^{\prime},\ \ldots,\ \mathit{choice}(\bar{x},x_a^{\prime}),\ \ldots\mathbin{\cdot}$

Here, the subscript $a$ refers to the domain and variables for the Age feature. On this basis, one could only increase the age. If this intervention leads to a successful counterfactual entity (i.e. with a positive label), we could tell the applicant that he/she has to wait to possibly succeed with the loan application.

We could also think of explicitly specifying actionable counterfactual entities, starting with a rule of the form

$E(e,\bar{x},\mathfrak{a})\ \longleftarrow\ E(e,\bar{x},\mathfrak{s}),\ \cdots,$

where the new annotation $\mathfrak{a}$ stands for “actionable”, and the rule defines an entity as actionable if it is a counterfactual (final) entity that satisfies some extra conditions.

Several possibilities offer themselves in this direction. All of them require simple, symbolic changes in the overall specification. Doing something similar with a purely procedural approach would be much more complex, and would require modifying the underlying code.

Another situation where not all interventions are admissible occurs when features take continuous values, and their domains have to be discretized. The common way of doing this, namely the combination of bucketization and one-hot-encoding, leads to the natural and necessary imposition of additional constraints on interventions, as we will show.  Through bucketization, a feature range is discretized by splitting it into finitely many, say $N$, usually non-overlapping intervals. This makes the feature basically categorical (each interval becoming a categorical value). Next, through one-hot-encoding, the original feature is represented as a vector of length $N$ of indicator functions, one for each categorical value (intervals here) [Bertossi et al. (2020)]. In this way, the original feature gives rise to $N$ binary features. For example, if we have a continuous feature “External Risk Estimate” (ERE), its buckets could be: $[0,64), [64,71), [71,76), [76,81), [81,\infty)$. Accordingly, if for an entity $\mathbf{e}$, $\mbox{ERE}(\mathbf{e}) = 65$, then, after one-hot-encoding, this value is represented as the vector $[0,1,0,0,0,0]$, because 65 falls into the second bucket.

In a case like this, it is clear that counterfactual interventions are constrained by the assumptions behind bucketization and one-hot-encoding. For example, the vector cannot be updated into, say $[0,1,0,1,0,0]$, meaning that the feature value for the entity falls both in the intervals $[64,71)$ and $[76,81)$.  Bucketization and one-hot-encoding can make use of program constraints, such as  $\longleftarrow \mbox{ERE}(e,x,1,y,1,z,w,\star)$, etc.  Of course, admissible interventions on predicate ERE could be easily handled with a disjunctive rule like that in P3., but without the “transition” annotation $\star$. However, the ERE record is commonly a component, or a sub-record, of a larger record containing all the feature values for an entity [Bertossi et al. (2020)]. Hence the need for a more general and uniform form of specification.
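As a minimal sketch of such constraints, assume a hypothetical three-bucket one-hot sub-record ereOH(EntityId, B1, B2, B3, Annotation) (the predicate name and arity are ours, for illustration only). The admissible combinations can then be enforced with hard constraints such as:

    % at most one bucket indicator may take value 1:
    :- ereOH(E,1,1,_,tr).
    :- ereOH(E,1,_,1,tr).
    :- ereOH(E,_,1,1,tr).
    % at least one bucket indicator must take value 1:
    :- ereOH(E,0,0,0,tr).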

Here we are considering the simple scenario in which the values are treated as unordered and categorical (binary) values. In some applications of bucketization and one-hot-encoding, one assumes and takes advantage of an underlying order inherited from the values in the buckets. Such an order could be adopted and brought into this framework by using additional rules that define it. Developing the details is somewhat outside the scope of this work.

7 Beyond Binary Features and Uncertainty

7.1 Expectation over interventions for the  x-Resp score

The x-Resp introduced in Section 3 could be considered as a first approach to quantifying the relevance of a feature value. However, as the following example shows, we might want to go one step further.

Example 7.1

Consider a simple entity $E(\mathbf{e};0,a_1)$, with $0\in\mathit{Dom}(F_1) = \{0,1\}$ and $a_1\in\mathit{Dom}(F_2) = \{a_1,\ldots,a_k\}$. Assume that $\mathcal{C}(E(\mathbf{e};0,a_1)) = 1 = \mathcal{C}(E(\mathbf{e}_i;0,a_i)) = \mathcal{C}(E(\mathbf{e}_j^{\prime};1,a_j))$, for $1\leq i\leq k,\ 1\leq j\leq k-1$, but $\mathcal{C}(E(\mathbf{e}_k^{\prime};1,a_k)) = 0$.

We can see that changing only the original first feature value does not change the label provided by the classifier. Nor does additionally changing the second feature value, except when using the last possible value, $a_k$, for $F_2$. In this case, $\mbox{x-Resp}_{\mathbf{e},F_1}(0) = \frac{1}{2}$, despite the fact that almost all interventions on the second feature value do not change the label.

A similar phenomenon would appear if we had $\mathit{Dom}(F_1) = \{b_1,\ldots,b_k\}$, with large $k$, and $\mathcal{C}(E(\mathbf{e};b_1,a_1)) = 1 = \mathcal{C}(E(\mathbf{e}_i;b_j,a_1))$, for $j = 1,\ldots,k-1$, but $\mathcal{C}(E(\mathbf{e}_k;b_k,a_1)) = 0$. In this case, the value $b_1$ is a counterfactual cause with explanation responsibility 1, despite the fact that most of the interventions on $b_1$ do not switch the label.  A way to compensate for this could be taking the label average over all possible interventions.

In order to extend the definition of the x-Resp by considering all possible interventions, we may consider the average of counterfactual labels over a given population, which would assume all entities are equally likely. In more general terms, we may assume the underlying entity population, $\mathcal{E}$, has a probability distribution, $P$, which we can use to express the extended x-Resp in terms of expected values, as follows.

Consider $\mathbf{e}\in\mathcal{E}$, an entity under classification, for which $L(\mathbf{e}) = 1$, and a feature $F^{\star}\in\mathcal{F}$. Assume we have:

  1. $\Gamma\ \subseteq\ \mathcal{F}\smallsetminus\{F^{\star}\}$,  a set of features that may end up accompanying feature $F^{\star}$.

  2. $\bar{w} = (w_F)_{F\in\Gamma}$,  $w_F\in\mathit{Dom}(F)$,  $w_F\neq\mathbf{e}_F$, i.e. new values for the features in $\Gamma$.

  3. $\mathbf{e}^{\prime} := \mathbf{e}[\Gamma := \bar{w}]$, i.e. reset $\mathbf{e}$’s values for $\Gamma$ as in $\bar{w}$.

  4. $L(\mathbf{e}^{\prime}) = L(\mathbf{e}) = 1$, i.e. there is no label change with $\bar{w}$ (but maybe with an extra change for $F^{\star}$, in the next item).

  5. There is $v\in\mathit{Dom}(F^{\star})$, with  $v\neq F^{\star}(\mathbf{e})$ and $\mathbf{e}^{\prime\prime} := \mathbf{e}[\Gamma := \bar{w}, F^{\star} := v]$.

As in Definition 3.3 and the paragraph that follows it,  if  $L(\mathbf{e})\neq L(\mathbf{e}^{\prime\prime}) = 0$,  $F^{\star}(\mathbf{e})$ is an actual causal explanation for $L(\mathbf{e}) = 1$, with “contingency set” $\langle\Gamma,\mathbf{e}_{\Gamma}\rangle$, where $\mathbf{e}_{\Gamma}$ is the projection of $\mathbf{e}$ on $\Gamma$.

In order to define the “local” responsibility score, make $v$ vary randomly under conditions 1.-5.:

$\mbox{Resp}^{P}(\mathbf{e},F^{\star},\Gamma,\bar{w})\ :=\ \frac{L(\mathbf{e}^{\prime}) - \mathbb{E}[L(\mathbf{e}^{\prime\prime}) \mid \mathbf{e}^{\prime\prime}_{\mathcal{F}\smallsetminus\{F^{\star}\}} = \mathbf{e}^{\prime}_{\mathcal{F}\smallsetminus\{F^{\star}\}}]}{1 + |\Gamma|}\mathbin{\cdot}$   (8)

If, as so far, label 1 is what has to be explained, then $L(\mathbf{e}^{\prime}) = 1$, and the numerator is a number between 0 and 1. Here, $\Gamma$ is fixed. Now we can minimize its size, obtaining the (generalized) responsibility score as the maximum local value; everything relative to the distribution $P$:

$\mbox{Resp}^{P}_{\mathbf{e},F^{\star}}(F^{\star}(\mathbf{e}))\ :=\ \max\ \{\,\mbox{Resp}^{P}(\mathbf{e},F^{\star},\Gamma,\bar{w})\ \mid\ |\Gamma| \mbox{ minimum},\ \mbox{(8)} > 0,\ \langle\Gamma,\bar{w}\rangle \models \mbox{1.-4.}\,\}$
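To illustrate the definition, consider the second scenario in Example 7.1 and assume, only for this illustration, that $P$ is the uniform distribution over the product space $\mathit{Dom}(F_1)\times\mathit{Dom}(F_2)$. Taking $F^{\star} = F_1$ and $\Gamma = \emptyset$ (so that $\mathbf{e}^{\prime} = \mathbf{e}$), the expectation in (8) averages the labels of the $k$ entities that agree with $\mathbf{e}$ on $F_2 = a_1$, of which only one has label 0, giving

$\mbox{Resp}^{P}_{\mathbf{e},F_1}(b_1) = \frac{1 - \frac{k-1}{k}}{1 + 0} = \frac{1}{k},$

which, for large $k$, is much smaller than the value 1 assigned by x-Resp, reflecting the fact that most interventions on $F_1$ do not switch the label.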

This score was introduced, with less motivation and fewer details, and definitely not on a causal basis, in [Bertossi et al. (2020)], where experiments are shown, and different probability distributions are considered.

7.2 Domain knowledge under uncertainty

Different probability distributions on the entity population $\mathcal{E}$ can be considered in the definition and computation of the generalized responsibility score (c.f. Section 7.1). A natural choice is the uniform distribution, $P^{u}$, which gives equal probability, $\frac{1}{|\mathcal{E}|}$, to each entity in $\mathcal{E}$. Another natural distribution is the product distribution, $P^{\times}$, that is obtained, under the assumption of independence, as the product of given marginal distributions, $p_F$, of the features $F\in\mathcal{F}$:  $P^{\times}(f_1,\ldots,f_n) := \Pi_{F_i\in\mathcal{F}}\, p_{F_i}(f_i)$.

We can also assume we have a sample $S\subseteq\mathcal{E}$ from the entity population that is used as a proxy for the latter. In this case, the distributions above become empirical distributions. In the uniform case, it is given by: $\hat{P}^{u}(\mathbf{e}) := \frac{1}{|S|}$ if $\mathbf{e}\in S$, and 0, otherwise.  In the case of the product, $\hat{P}^{\times}(f_1,\ldots,f_n) := \Pi_{F_i\in\mathcal{F}}\,\hat{p}_{F_i}(f_i)$, with $\hat{p}_{F_i}(f_i) := \frac{|\{\mathbf{e}\in S \mid \mathbf{e}_i = f_i\}|}{|S|}$. A discussion on the use of these distributions in the context of explanation scores can be found in [Bertossi et al. (2020)].
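As a toy illustration (not taken from the paper’s running examples), for two binary features and the sample $S = \{(0,1),(0,0),(1,1)\}$, we obtain $\hat{p}_{F_1}(0) = \frac{2}{3}$ and $\hat{p}_{F_2}(1) = \frac{2}{3}$, so that $\hat{P}^{\times}(0,1) = \frac{4}{9}$, whereas $\hat{P}^{u}(0,1) = \frac{1}{3}$; the two empirical distributions may thus differ considerably on the same sample.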

In general, one can say that the uniform distribution may not be appropriate for capturing correlations between feature values. One could argue that certain combinations of feature values may be more likely than others, or that certain correlations among them exist. This situation is aggravated by the product distribution, due to the independence assumption. For these reasons, an empirical distribution may be better for this purpose.

In any case, having chosen a distribution $P^{\star}$ on the population to work with, in particular to compute the expectations needed for the responsibility score in (8), one could consider modifying the probabilities in the hope of capturing correlations and logical relationships between feature values. In particular, one could introduce constraints that prohibit certain combinations of values, in the spirit of denial constraints in databases, but in this case admitting positive and negative atoms. For example, with propositional features Old standing for “is older than 20” and OverDr for “gets an account overdraft above \$50K”, we may want to impose the prohibition $\neg(\overline{\mbox{Old}}\wedge\mathit{OverDr})$, standing for “nobody under 20 gets an overdraft above \$50K”.

These constraints, which are satisfied or violated by a single entity at a time, are of the form:

$\chi:\ \neg(\bigwedge_{F\in\mathcal{F}_1} F\ \wedge \bigwedge_{F^{\prime}\in\mathcal{F}_2} \bar{F^{\prime}}),$   (10)

where $\mathcal{F}_1\cup\mathcal{F}_2\subseteq\mathcal{F}$,  $\mathcal{F}_1\cap\mathcal{F}_2 = \emptyset$, and $F, \bar{F^{\prime}}$ mean that features $F, F^{\prime}$ take values 1 and 0, resp.

The event associated to $\chi$ is  $E(\chi) = \{\mathbf{e}\in\mathcal{E} \mid \mathbf{e}\models\chi\}$, where $\mathbf{e}\models\chi$ has the obvious meaning of satisfaction of $\chi$ by entity $\mathbf{e}$. In order to accommodate the constraint, given the initial probability space $\langle\mathcal{E},P^{\star}\rangle$, we can redefine the probability as follows. For $E\subseteq\mathcal{E}$,

$P^{\star}_{\chi}(E) := P^{\star}(E \mid E(\chi)) = \frac{P^{\star}(E\cap E(\chi))}{P^{\star}(E(\chi))}\mathbin{\cdot}$   (11)

If $\chi$ is consistent with the population, i.e. satisfiable in $\mathcal{E}$, the conditional distribution is well-defined. Now, the probability of $\chi$’s violation set is:

$P^{\star}_{\chi}(\mathcal{E}\smallsetminus E(\chi)) = \frac{P^{\star}(\emptyset)}{P^{\star}(E(\chi))} = 0\mathbin{\cdot}$
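As a small illustration with the propositional features above, suppose $\mathcal{E}$ consists of the four value combinations for $(\mbox{Old},\mathit{OverDr})$ and $P^{\star}$ is uniform. For $\chi:\ \neg(\overline{\mbox{Old}}\wedge\mathit{OverDr})$, the event $E(\chi)$ contains the three combinations other than the one where Old is false and OverDr is true, and (11) gives that excluded combination probability 0, while each entity in $E(\chi)$ now receives probability $\frac{1/4}{3/4} = \frac{1}{3}$.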

This definition can be extended to finite and consistent sets, $\Theta$, of constraints, by using $P^{\star}_{\wedge\Theta}(E)$ in (11), with $\wedge\Theta$ the conjunction of the constraints in $\Theta$.

Of course, one could go beyond constraints of the form (10), applying the same ideas, and consider any propositional formula that is intended to be evaluated on a single entity at a time, as opposed to considering combinations of feature values for different entities.

The resulting modified distribution that accommodates the constraints could be used in the computation of any of the scores expressed in terms of expected values or in probabilistic terms.

An alternative approach consists in restricting the (combinations of) interventions in the definition and computation of the responsibility score (and, as a matter of fact, of any other score based on counterfactual interventions), as suggested in Section 6. It is worth performing experimental comparisons between the two approaches.

8 Related Work

In this work we consider only local methods, in that we are not trying to explain the overall behavior of a classifier, but a particular output on the basis of individual feature values. We also consider model-agnostic methods that can be applied to black-box models, i.e. without the need for access to the internal components of the model. Of course, these approaches can be applied to open models, i.e. models that make all their components transparent for analysis. Actually, in some cases, it is possible to take computational advantage of this additional source of knowledge (more on this below in this section).

There are several approaches to the explanation of outcomes from classification models. These methods can be roughly categorized as those that provide attribution scores and those that provide sufficient explanations. Those in the former class assign a number to each feature value that reflects its relevance for the outcome at hand. x-Resp falls in this category. For comparison with other approaches, we repeat that, in the case of x-Resp, one counterfactually modifies feature values to see if the outcome changes. The score is computed from those changes.

Counterfactual changes are omnipresent in attribution-score-based methods, and they can be used to consider alternative entities to the one under explanation, and a notion of distance between those alternatives and the latter [Martens and Provost (2014), Wachter et al. (2017), Russell (2019), Karimi et al. (2020a), Datta et al. (2016), Bertossi et al. (2020)]. Sometimes the counterfactuals are less explicit, as with the popular Shap score [Lundberg et al. (2020)], which is based on the Shapley value of game theory. It can be seen as a counterfactual-based score in that all possible combinations of feature values (and then, most of the time departing from the initial entity) are considered in a complex aggregation (an average or expected value).

The Shap score is designed as a model-agnostic method. However, for a large class of classifiers whose internal components can be used for the score computation, Shap becomes computable in polynomial time, while its general computational complexity is $\#P$-hard [Arenas et al. (2021), Van den Broeck et al. (2021)]. As we showed in this paper, the computation of x-Resp is $\mathit{NP}$-hard. Actually, for certain classifiers and probability distributions on the underlying population, the generalized, probabilistic extension of x-Resp (c.f. Section 7.1) can be $\#P$-hard to compute [Bertossi et al. (2020)]. An investigation of classes of classifiers for which the x-Resp score (deterministic, as in this work, or probabilistic) can be computed in polynomial time is still open.

The popular LIME score [Ribeiro et al. (2016)] is also an attribution score. It appeals to an explainable model that locally approximates the agnostic model around the entity under classification. From the resulting approximation, feature scores can be computed.

Sufficient-explanation methods try to identify those (combinations of) feature values that alone determine the outcome at hand, in the sense that, by keeping those values and possibly changing all the others, the outcome remains the same [Ribeiro et al. (2018), Wang et al. (2021)]. One could say, in some sense, that the outcome is entailed by the values that appear in a sufficient explanation. Identifying those values is reminiscent of performing an abductive diagnosis task, as done with rule-based specifications [Eiter et al. (1997)]; actually, [Ribeiro et al. (2018)] does appeal to rule-based methods.

There are some approaches to logic-based explanations for classification. They mostly follow the sufficient-explanation paradigm we mentioned above. More specifically, it has been shown how to “reconstruct” certain classes of classifiers, e.g. random forests, Bayesian classifiers, and binary neural networks, as Boolean circuits [Shih et al. (2018), Shi et al. (2020), Choi et al. (2020)]. Once a circuit is available, one can use it in particular to obtain explanations for the outcomes of the model using methods that are, in essence, abductive [Darwiche and Hirth (2020)]. In this context, the work presented in [Ignatiev et al. (2019), Ignatiev (2019), Izza and Marques-Silva (2021)] is also relevant, in that logic-based encodings of neural networks, boosted trees, and random forests are proposed and exploited for explanation purposes; abductive and SAT-based approaches are followed. Notice that abductive methods that generate sufficient explanations can also be the basis for score definitions and their computation. Just for the gist: if a feature value appears in a large number of sufficient explanations, then it could be assigned a large individual score.

To the best of our knowledge, none of the approaches described above (or others we are aware of) is based on logical specifications of the counterfactuals involved in the score definition and computation. Furthermore, ours are specifications that can easily adopt domain or other desirable logical constraints in a seamless manner, and for combined use. Actually, the logic-based representations of complex classifiers that we just mentioned could be the starting point for the use of our approach. For example, a Boolean circuit can be represented as a set of rules that becomes the first component of a CIP that does the counterfactual analysis on that basis.

9 Discussion

This work is about interacting via ASP with possibly external classifiers, and reasoning about their potential inputs and outputs. The classifier is supposed to have been learned by some other means. In particular, this work is not about learning ASPs, which goes in quite a different direction [Law et al. (2019)].

In this work we have treated classifiers as black boxes that are represented by external predicates in the ASP. However, we have also considered the case of a classifier that is specified within the CIP by a set of rules, to define the classification predicate $\mathcal{C}$. This was the case of a deterministic decision tree. Basically, each branch from the root to a label can be represented by a rule, with the branching direction at intermediate nodes represented by values in literals in a rule body, and with the label in the rule head.  Something similar can be done with Boolean circuits used as classifiers. Actually, it is possible to represent more involved classifiers as Boolean circuits (c.f. Section 8).

Our CIPs can be easily enhanced with different extensions. For example, the feature domains can be automatically specified and computed from training or test data [Bertossi et al. (2020)], or other sources. As done in [Bertossi et al. (2020)] for experimental purposes and using purely procedural approaches, it is possible in our ASP setting to restrict the sizes of the contingency sets, e.g. to be of size 2 (c.f. Section 3). This can be easily done by adding a cardinality condition to the body of the disjunctive intervention rule (which is supported by DLV-Complex and DLV2). Doing this would not lead to any loss of best explanations (as long as they fall within the imposed bound), and may reduce the computational work.
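A minimal sketch of one way to impose such a bound, reusing the invResp predicate of Example 5.5 instead of modifying the disjunctive rule itself, is the following hard constraint:

    % discard models whose counterfactual version differs from the
    % original entity in more than 2 feature values:
    :- invResp(E,M), M > 2.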

It is also possible to extend CIPs so that they specify and compute the contingency sets (of feature values) that accompany a particular value that has been counterfactually changed (c.f. Section 3). This requires a set-building operation, which is provided by DLV-Complex. Doing this would allow us to obtain s-explanations, i.e. those with minimal (not necessarily minimum) contingency sets (c.f. Example 5.3). One could test whether, by deleting one element from the contingency set, the label changes or not. Again, DLV-Complex could be used here. In [Bertossi and Reyes (2021)] a detailed example can be found where this is illustrated in the light of the naive Bayes classifier. This approach was also followed in [Bertossi (2021)] to compute contingency sets for individual database tuples as causes for query answers.

Our specification of counterfactual explanations is in some sense ideal, in that the whole product space of the feature domains is considered, together with the applicability of the classifier over that space. This may be impractical or unrealistic. However, we see our proposal as a conceptual and generic specification basis that can be adapted in order to include more specific declarative practices and mechanisms.

For example, restricting the product space can be done in different manners. One can use constraints or additional conditions in rule bodies. A different approach consists in replacing the product space with the entities in a data sample $S\subseteq\Pi_{i=1}^{n}\mathit{Dom}(F_i)$. We could even assume that this sample already comes with classification labels, i.e. $S^{L} = \{\langle\mathbf{e}_1^{\prime},L(\mathbf{e}_1^{\prime})\rangle,\ldots,\langle\mathbf{e}_K^{\prime},L(\mathbf{e}_K^{\prime})\rangle\}$. This dataset may not be disjoint from the training dataset $T$ (c.f. Section 3). The definition of counterfactual explanation and CIPs could be adapted to this new setting without major difficulties.
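A hedged sketch of the latter adaptation, for the binary setting of Example 5.2, could read as follows; the predicate sample and its facts are made up for illustration, with the last argument holding the label:

    % a hypothetical labeled sample, used as a proxy for the product space:
    sample(0,1,1,1). sample(0,0,1,0). sample(1,0,1,0). sample(1,1,0,1).

    % the classification relation is read off the sample:
    cls(X,Y,Z,L) :- sample(X,Y,Z,L).

    % interventions may only produce entities that occur in the sample:
    inSample(X,Y,Z) :- sample(X,Y,Z,L).
    :- ent(E,X,Y,Z,do), not inSample(X,Y,Z).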

The CIPs we have introduced are reminiscent of repair programs that specify and compute the repairs of a database that fails to satisfy the intended integrity constraints [Caniupan and Bertossi (2010)]. Actually, the connection between database repairs and actual causality for query answers was established and exploited in [Bertossi and Salimi (2017a)]. ASPs that compute tuple-level and attribute-level causes for query answers were introduced in [Bertossi (2021)]. Attribute-level causes are close to interventions of feature values, but the ASPs for the former are much simpler than those presented here, because in the database scenario, changing attribute values to nulls is good enough to invalidate the query answer (the “equivalent”, in that scenario, of switching the classification label here). Once a null is introduced, there is no need to take it into account anymore, and single-“step” interventions are good enough.

Our CIPs are designed to obtain general counterfactual explanations, and in particular and mainly, c-explanations. The latter are associated to minimum-size contingency sets of feature values and, at the same time, to maximum-responsibility feature values. This is achieved via the weak constraints in (6). If we wanted to obtain the responsibility of a non-maximum-responsibility feature value, i.e. one associated to an s-explanation that is not a c-explanation, we could remove the weak constraints (thereby keeping all the models of the CIP), and pose a query under the brave semantics about the values of inv-resp in (4). An approach like this was followed in [Bertossi (2021)] for database tuples as causes for query answers.

Apropos query answering, which we have not exploited in this work, several opportunities offer themselves. For example, we could pose a query under the brave semantics to detect whether a particular feature value is ever counterfactually changed, or counterfactually changed in a best explanation. Under the skeptical semantics, we can identify feature values that change in all the counterfactual entities. Fully exploiting query answering is a matter of ongoing work.
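For instance, assuming the usual DLV query interface (the query syntax with “?” and the -FB and -FC front-end flags for brave and cautious reasoning are our assumptions here, not something used earlier in the paper), one could place in a file query.txt the ground query

    expl(e,2,1)?

and run  dlv.exe ent.txt program2.txt query.txt -FB  to check, under the brave semantics, whether the original value in the second argument of e is ever counterfactually changed; replacing -FB by -FC would check it under the skeptical (cautious) semantics.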

In this work we have considered s- and c-explanations, which are associated with two specific and related minimization criteria. However, as done in abstract terms in Section 3, counterfactual explanations could be cast in terms of different optimization criteria [Karimi et al. (2020a), Russell (2019), Schleich et al. (2021)]. One could investigate the specification and implementation of other forms of preference, i.e. of the generic $\preceq$ in Definition 3.1, by using ASPs as in [Gebser et al. (2011), Brewka et al. (2015)].
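As one simple, hedged possibility, not pursued in this work, an alternative preference could already be expressed with DLV's weak constraints, e.g. by penalizing changes to features that are declared costly to act upon; the predicates chg and cost below are hypothetical.

   % cost(F,C): acting on feature F has (integer) cost C.
   % Prefer counterfactual entities with minimum total cost of changes.
   :~ chg(F,V,V1), cost(F,C). [C:1]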

Specifying and computing the generalized, probabilistic responsibility score of Section 7.1 goes beyond the usual capabilities of ASP systems. However, it would be interesting to explore the use of probabilistic ASPs for these tasks [Baral et al. (2009), Lee and Yang (2017)]. Along a similar line, probabilistic ASPs could in principle be used to deal with the integration of semantic constraints with underlying probability distributions on the entity population, as described in Section 7.2. All of this is a matter of ongoing work.

Acknowledgements:  The author has been a member of RelationalAI’s Academic Network, which has been a source of inspiration for this work, and much more. Part of this work was funded by ANID - Millennium Science Initiative Program - Code ICN17002. Help from Jessica Zangari and Mario Alviano with information about DLV2, and from Gabriela Reyes with the DLV program runs, is much appreciated.

References

  • Alviano et al. (2017) Alviano, M., Calimeri, F., Dodaro, C., Fuscà, D., Leone, N., Perri, S., Ricca, F., Veltri, P., and Zangari, J. 2017. The asp system dlv2. In: Balduccini, M., and Janhunen, T. (Eds.). Proceedings of the 14th International Conference on Logic Programming and Nonmonotonic Reasoning, LPNMR 2017. LNCS, vol. 10377, Springer, 215-221.
  • Alviano et al. (2019) Alviano, M., Amendola, G., Dodaro, C., Leone, N., Maratea, M. and Ricca, F. 2019. Evaluation of disjunctive programs in wasp. In: Balduccini, M., Lierler, Y., and Woltran, S. (Eds.). Proceedings of the 15th International Conference on Logic Programming and Nonmonotonic Reasoning, LPNMR 2019. LNCS, vol. 11481, Springer, 241-255.
  • Arenas et al. (2021) Arenas, M., Barceló, P., Bertossi, L. and Monet, M. 2021. The tractability of shap-scores over deterministic and decomposable boolean circuits. In: Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021. AAAI Press, 6670-6678.
  • Baral et al. (2009) Baral, C., Gelfond, M. and Rushton, N. 2009.  Probabilistic reasoning with answer sets.  Theory and Practice of Logic Programming, 9, 1, 57-144.
  • Ben-Eliyahu and Dechter (1994) Ben-Eliyahu, R. and Dechter, R. 1994. Propositional semantics for disjunctive logic programs. Annals of Mathematics and Artificial Intelligence, 12, 53-87.
  • Bertossi (2011) Bertossi, L. 2011. Database Repairing and Consistent Query Answering. Synthesis Lectures on Data Management, Morgan & Claypool.
  • Bertossi (2019) Bertossi, L. 2019. Database repairs and consistent query answering: origins and further developments. Gems of PODS paper. In: Suciu, D., Skritek, S., and Koch, Ch. (Eds.). Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2019. ACM, 48-58.
  • Bertossi and Salimi (2017a) Bertossi, L. and Salimi, B. 2017a. From causes for database queries to repairs and model-based diagnosis and back. Theory of Computing Systems, 61, 1, 191-232.
  • Bertossi and Salimi (2017b) Bertossi, L. and Salimi, B. 2017b. Causes for query answers from databases: datalog abduction, view-updates, and integrity constraints. International Journal of Approximate Reasoning, 90, 226-252.
  • Bertossi (2020) Bertossi, L. 2020.  An asp-based approach to counterfactual explanations for classification. In: Gutiérrez-Basulto, V., Kliegr, T., Soylu, A., Giese, M., and Roman, D. (Eds.). Proceedings “Rules and Reasoning” - 4th International Joint Conference, RuleML+RR 2020. LNCS, vol. 12173, Springer, 70–81.
  • Bertossi (2021) Bertossi, L. 2021.  Characterizing and computing causes for query answers in databases from database repairs and repair programs. Knowledge and Information Systems, 63, 1, 199-231.
  • Bertossi et al. (2020) Bertossi, L., Li, J., Schleich, M., Suciu, D. and Vagena, Z. 2020.  Causality-based explanation of classification outcomes. In: Schelter, S., Whang, S., and Stoyanovich, J. (Eds.). Proceedings of the Fourth Workshop on Data Management for End-To-End Machine Learning, in conjunction with the 2020 ACM SIGMOD/PODS Conference, DEEM@SIGMOD 2020, 6:1–6:10.
  • Bertossi and Geerts (2020) Bertossi, L. and Geerts, F. 2020.  Data quality and explainable ai.   ACM Journal of Data and Information Quality, 12, 2, 1-9.
  • Bertossi and Reyes (2021) Bertossi, L. and Reyes, G. 2021.  Answer-set programs for reasoning about counterfactual interventions and responsibility scores for classification. In: Proceedings of the 1st International Joint Conference on Learning and Reasoning, IJCLR 2021. LNCS, to appear, Springer. Extended version posted as ArXiv 2107.10159.
  • Brewka et al. (2011) Brewka, G., Eiter, T. and Truszczynski, M. 2011. Answer set programming at a glance. Communications of the ACM, 54, 12, 92–103.
  • Brewka et al. (2015) Brewka, G., Delgrande, J., Romero, J. and Schaub, T. 2015. asprin: Customizing answer set preferences without a headache. In: Bonet, B., and Koenig, S. (Eds.). Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI 2015. AAAI Press, 1467–1474.
  • Calimeri et al. (2009) Calimeri, F., Cozza, S., Ianni, G., and Leone, N. 2009. An asp system with functions, lists, and sets. In: Erdem, E., Lin, F., and Schaub, T. (Eds.). Proceedings of the 10th International Conference on Logic Programming and Nonmonotonic Reasoning, LPNMR 2009. LNCS, vol. 5753, Springer, 483–489.
  • Calimeri et al. (2020) Calimeri, F., Faber, W., Gebser, M., Ianni, G., Kaminski, R., Krennwallner, T., Leone, N., Maratea, M., Ricca, F. and Schaub, T. 2020.  Asp-core-2 input language format.  Theory and Practice of Logic Programming, 20, 2, 294-309.
  • Caniupan and Bertossi (2010) Caniupan, M. and Bertossi, L. 2010.  The consistency extractor system: answer set programs for consistent query answering in databases.  Data & Knowledge Engineering, 69, 6, 545-572.
  • Chockler and Halpern (2004) Chockler, H. and Halpern, J. Y. 2004. Responsibility and blame: a structural-model approach. Journal of Artificial Intelligence Research, 22, 93-115.
  • Choi et al. (2020) Choi, A., Shih, A., Goyanka, A. and Darwiche, A. 2020. On symbolically encoding the behavior of random forests. ArXiv 2007.01493, 2020.
  • Dantsin et al. (2001) Dantsin, E., Eiter, T., Gottlob, G. and Voronkov, A. 2001. Complexity and expressive power of logic programming. ACM Computing Surveys, 33, 3, 374-425.
  • Datta et al. (2016) Datta, A., Sen, S. and Zick, Y. 2016.  Algorithmic transparency via quantitative input influence: theory and experiments with learning systems. In: Proceedings of the IEEE Symposium on Security and Privacy, SP 2016. IEEE Computer Society, 598–617.
  • Darwiche and Hirth (2020) Darwiche, A. and Hirth, A. 2020. On the reasons behind decisions. In: De Giacomo, G., Catalá, A., Dilkina, B., Milano, M., Barro, S., Bugarín, A., and Lang, J. (Eds.). Proceedings of the 24th European Conference on Artificial Intelligence, ECAI 2020. IOS Press, 712–720.
  • Eiter et al. (1997) Eiter, T., Gottlob, G. and Leone, N. 1997. Abduction from logic programs: semantics and complexity. Theoretical Computer Science, 189, 1-2, 129-177.
  • Eiter et al. (2019) Eiter, T., Germano, S., Ianni, G., Kaminski, T., Redl, C., Schüller, P. and Weinzierl, A. 2019. The dlvhex system.  Künstliche Intelligenz, 32, 2-3, 187-189.
  • Eiter et al. (2017) Eiter, T., Kaminski, T., Redl, C., Schüller, P. and Weinzierl, A. 2017. Answer set programming with external source access. In: Ianni, G., Lembo, D., Bertossi, L., Faber, W., Glimm, B., Gottlob, G., and Staab, S. (Eds.). Reasoning Web. Semantic Interoperability on the Web - 13th International Summer School 2017, Tutorial Lectures. LNCS, vol. 10370, Springer, 204–275.
  • Flach (2012) Flach, P. 2012. Machine Learning. Cambridge University Press.
  • Gebser et al. (2011) Gebser, M., Kaminski, R. and Schaub, T. 2011. Complex optimization in answer set programming.  Theory and Practice of Logic Programming, 11, 4-5, 821-839.
  • Izza and Marques-Silva (2021) Izza, Y. and Marques-Silva, J. 2021. On explaining random forests with SAT. In: Zhou, Z.-H. (Ed.). Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI 2021, 2584–2591.
  • Gelfond and Kahl (2014) Gelfond, M. and Kahl, Y. 2014. Knowledge Representation and Reasoning, and the Design of Intelligent Agents. Cambridge University Press.
  • Gelfond and Lifschitz (1991) Gelfond, M. and Lifschitz, V. 1991. Classical negation in logic programs and disjunctive databases. New Generation Computing, 9, 365-385.
  • Giannotti et al. (1997) Giannotti, F., Greco, S., Sacca, D. and Zaniolo, C. 1997.  Programming with non-determinism in deductive databases.  Annals of Mathematics and Artificial Intelligence, 19, 1-2, 97-125.
  • Halpern and Pearl (2005) Halpern, J. and Pearl, J. 2005. Causes and explanations: a structural-model approach: part 1. British Journal for the Philosophy of Science, 56, 843-887.
  • Ignatiev et al. (2019) Ignatiev, A., Narodytska, N. and Marques-Silva, J. 2019.  Abduction-based explanations for machine learning models. In: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, AAAI Press, 1511-1519.
  • Ignatiev (2020) Ignatiev, A. 2020.  Towards trustable explainable AI. In: Bessiere, C. (Ed.). Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, 5154-5158.
  • Karimi et al. (2020a) Karimi, A-H., Barthe, G., Balle, B. and Valera, I. 2020a.  Model-agnostic counterfactual explanations for consequential decisions. In: Chiappa, S., and Calandra, R. (Eds.). Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. PMLR, vol. 108, 895–905.
  • Karimi et al. (2020b) Karimi, A-H., von Kügelgen, J., Schölkopf, B. and Valera, I. 2020b.  Algorithmic recourse under imperfect causal knowledge: a probabilistic approach. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.-F., and Lin, H.-T. (Eds.). Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020.
  • Law et al. (2019) Law, M., Russo, A. and Broda K. 2019. Logic-based learning of answer set programs. In: Krötzsch, M., and Stepanova, D. (Eds.). Reasoning Web. Explainable Artificial Intelligence - 15th International Summer School 2019, Tutorial Lectures. LNCS, vol. 11810, Springer, 196–231.
  • Lee and Yang (2017) Lee, J. and Yang, Z. 2017. LPMLN, weak constraints, and p-log. In: Singh, S.P., and Markovitch, S. (Eds.). Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI 2017, AAAI Press, 1170-1177.
  • Leone et al. (2006) Leone, N., Pfeifer, G., Faber, W., Eiter, T., Gottlob, G., Koch, C., Mateis, C., Perri, S. and Scarcello, F. 2006. The dlv system for knowledge representation and reasoning. ACM Transactions on Computational Logic, 7, 3, 499-562.
  • Lundberg et al. (2020) Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J., Nair, B., Katz, R., Himmelfarb, J., Bansal, N. and Lee, S-I. 2020. From local explanations to global understanding with explainable ai for trees.  Nature Machine Intelligence, 2, 1, 56-67.
  • Martens and Provost (2014) Martens, D. and Provost, F. J. 2014. Explaining data-driven document classifications. MIS Quarterly, 38, 1, 73-99.
  • Meliou et al. (2010) Meliou, A., Gatterbauer, W., Moore, K. F. and Suciu, D. 2010. The complexity of causality and responsibility for query answers and non-answers. Proceedings of the VLDB Endowment, 4, 1, 34–45.
  • Mitchell (1997) Mitchell, T. M. 1997. Machine Learning. McGraw-Hill.
  • Molnar (2020) Molnar, C. 2020. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book
  • Narodytska et al. (2019) Narodytska, N., Shrotri, A., Meel, K., Ignatiev, A. and Marques-Silva, J. 2019. Assessing heuristic machine learning explanations with model counting. In: Janota, M., and Lynce, I. (Eds.). Proceedings of the 22nd International Conference on Theory and Applications of Satisfiability Testing, SAT 2019. LNCS, vol. 11628, Springer, 267–278.
  • Pearl (2009) Pearl, J. 2009.  Causality: Models, Reasoning and Inference.  Cambridge University Press, 2nd edition.
  • Rudin (2019) Rudin, C. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1, 206-215.
  • Russell (2019) Russell, C. 2019. Efficient search for diverse coherent explanations. In: Boyd, D., and Morgenstern, J. H. (Eds.). Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT 2019. ACM, 20–28.
  • Ribeiro et al. (2016) Ribeiro, M. T., Singh, S. and Guestrin, C. 2016.  “Why should I trust you?”: explaining the predictions of any classifier. In: Krishnapuram, B., Shah, M., Smola, A. J., Aggarwal, C.C., Shen, D., and Rastogi, R. (Eds.). Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016. ACM, 1135–1144.
  • Ribeiro et al. (2018) Ribeiro, M. T., Singh, S. and Guestrin, C. 2018.  Anchors: high-precision model-agnostic explanations. In: McIlraith, S. A., and Weinberger, K. Q. (Eds.). Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, AAAI 2018. AAAI Press, 1527–1535.
  • Schleich et al. (2021) Schleich, M., Geng, Z., Zhang, Y. and Suciu, D. 2021. GeCo: Quality counterfactual explanations in real time.  Proceedings of the VLDB Endowment, 14, 9, 1681-1693.
  • Shi et al. (2020) Shi, W., Shih, A., Darwiche, A. and Choi, A. 2020. On tractable representations of binary neural networks. In: Calvanese, D., Erdem, E., and Thielscher, M. (Eds.). Proceedings of the 17th International Conference on Principles of Knowledge Representation and Reasoning, KR 2020, 882–892.
  • Shih et al. (2018) Shih, A., Choi, A. and Darwiche, A. 2018. Formal verification of bayesian network classifiers. In: Studený, M., and Kratochvíl, V. (Eds.). Proceedings of the International Conference on Probabilistic Graphical Models, PGM 2018. PLMR, vol. 72, 157–168.
  • Ustun et al. (2019) Ustun, B., Spangher, A. and Liu, Y. 2019. Actionable recourse in linear classification. In: Boyd, D., and Morgenstern, J. H. (Eds.). Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT 2019. ACM, 10-19.
  • Van den Broeck et al. (2021) Van den Broeck, G., Lykov, A., Schleich, M. and Suciu, D. 2021. On the tractability of shap explanations. In: Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021. AAAI Press, 6505-6513.
  • Wachter et al. (2017) Wachter, S., Mittelstadt, B. D. and Russell, C. 2017. Counterfactual explanations without opening the black box: automated decisions and the GDPR.  Harvard Journal of Law & Technology, 31, 841.
  • Wang et al. (2021) Wang, E., Khosravi, P. and Van den Broeck, G. 2021. Probabilistic sufficient explanations. In: Zhou, Z.-H. (Ed.). Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI 2021, 3082–3088.