Declarative Approaches to Counterfactual Explanations for Classification††thanks: In memory of Prof. Jack Minker (1927-2021), a scientist, a scholar, a visionary; a generous, wise and committed man.
Abstract
We propose answer-set programs that specify and compute counterfactual interventions on entities that are input to a classification model. In relation to the outcome of the model, the resulting counterfactual entities serve as a basis for the definition and computation of causality-based explanation scores for the feature values in the entity under classification, namely responsibility scores. The approach and the programs can be applied with black-box models, and also with models that can be specified as logic programs, such as rule-based classifiers. The main focus of this work is on the specification and computation of best counterfactual entities, i.e. those that lead to maximum responsibility scores. From them one can read off the explanations as maximum responsibility feature values in the original entity. We also extend the programs to bring into the picture semantic or domain knowledge. We show how the approach could be extended by means of probabilistic methods, and how the underlying probability distributions could be modified through the use of constraints. Several examples of programs written in the syntax of the DLV ASP-solver, and run with it, are shown.
keywords:
Classification, explanations, counterfactuals, causality, answer-set programming, constraints

1 Introduction
Counterfactual Explanations.
Providing explanations for results obtained from machine-learning (ML) models has been recognized as critical in many applications. It has also become an active research direction in explainable ML [Molnar (2020)], and the broader area of explainable AI. Explanations become particularly relevant when decisions are automatically made by those models, possibly with serious consequences for stakeholders. Since most of those models are algorithms learned from training data, providing explanations may not be easy or possible. These models are, or can be seen as, black-box models. Even if the components of the models, say their structure, mathematical functions, and parameters, are relatively clear and accessible, characterizing and measuring the relevance of a particular feature value for the classification of an entity may not be that clear from a sheer inspection of the model.
In AI, explanations have been investigated, among other areas, under actual causality [Halpern and Pearl (2005)], where counterfactual interventions on a causal structural model are central. They are hypothetical updates on the model’s variables, to explore if and how the outcome of the model changes or not. In this way, explanations for an original output are defined and computed. Counterfactual interventions have been used, explicitly or implicitly, with ML models, in particular with classification models [Martens and Provost (2014), Wachter et al. (2017), Russell (2019), Karimi et al. (2020a), Datta et al. (2016), Bertossi et al. (2020)]. Counterfactual explanations for query answers from databases have also been investigated [Meliou et al. (2010), Bertossi and Salimi (2017a), Bertossi (2021)], an area that is not unrelated to classification results (c.f. Section 4).
In this work we start by introducing and formalizing the notion of counterfactual explanation for the classification of an entity e (think of a finite record of feature values) with a certain label. Counterfactual explanations are alternative versions, e′, of e that differ from e in a set of feature values, but are classified with a different label. In this work, "best counterfactual explanations" come in two forms: s-explanations and c-explanations. The former minimize (under set inclusion) the set of features whose values are changed, while the latter minimize the number of changed feature values.
Explanation Scores and Responsibility.
We use counterfactual explanations to define the x-Resp score, an explanation responsibility score for a feature value of the entity under classification. The idea is to identify and compute the most responsible feature values for the outcome. This score is adapted from the general notion of responsibility used in actual causality [Chockler and Halpern (2004)].
More specifically, in this work we concentrate mainly on specifying and computing, for a given and fixed classified entity e, all the best explanations, represented as counterfactual entities (that have feature values changed and a different label), where "best" for a counterfactual entity means that the number of changes of feature values (with respect to e) takes a minimum value (the c-explanations mentioned above). At the same time, we are also interested in the maximum-responsibility feature values. These two goals are directly related: the maximum-responsibility feature values (in the original entity e) are exactly those that appear as changed in a best counterfactual entity.
The x-Resp score belongs to the family of feature attribution scores, among which one finds the popular Shap score [Lundberg et al. (2020)]. While Shap is based on the Shapley value that is widely used in game theory, the x-Resp score is based on actual causality. Experimental results with an extended version (that we briefly describe) of the responsibility score investigated in this work, and comparisons with other scores, in particular, with Shap, are reported in [Bertossi et al. (2020)]. However, only a fully procedural approach to the x-Resp score was followed in [Bertossi et al. (2020)]. Furthermore, for performance and comparison-related reasons, the number of counterfactually changeable feature values was bounded a priori.
Specifying Counterfactuals and Reasoning.
Answer-set programming (ASP) is an elegant and powerful logic programming paradigm that allows for declarative specifications of a domain or a problem. From those specifications one can do logical reasoning, and quite usefully, non-monotonic reasoning [Brewka et al. (2011), Gelfond and Kahl (2014)]. ASP has been applied to the specification and solution of hard combinatorial problems, and also to different tasks related to databases. Among the latter, we find the specification of repairs of databases w.r.t. integrity constraints, virtual data integration, specification of privacy views [Bertossi (2011), Bertossi (2019)], and causality in databases [Bertossi (2021)].
In this work we also introduce Counterfactual Intervention Programs (CIPs), which are answer-set programs (ASPs) that specify counterfactual interventions and counterfactual explanations, and also allow us to specify and compute the responsibility scores. More specifically, CIPs are designed to be used to compute: (a) counterfactual explanations (best or not), (b) best counterfactual explanations (i.e. c-explanations), and (c) highest-responsibility feature values (in the original entity), that is, those with maximum x-Resp score.
As already mentioned above, c-explanations lead to maximum responsibility scores, and maximum-responsibility scores are associated to c-explanations. However, in order to compute a maximum-responsibility score (or a feature value which attains it), we may not need all the best counterfactual entities that “contain” that particular feature value. Actually, only one of them would be good enough. Still, our CIPs compute all the counterfactual entities, and all (and only) the best counterfactual entities if so required.
In this regard, it is worth mentioning that counterfactual explanations, best or not, are interesting per se, in that they could show what could be done differently in order to obtain a different classification [Schleich et al. (2021)]. For example, someone, represented as an entity e, who applies for a loan at a bank, and is deemed by the bank's classifier as unworthy of the loan, could benefit from knowing that a reduction of the number of credit cards he/she owns might lead to a positive assessment. This counterfactual explanation would be considered as actionable (or feasible, or a recourse) [Ustun et al. (2019), Karimi et al. (2020b)].
Furthermore, having all (or only the best) counterfactual explanations, we could think of doing different kinds of meta-analysis and meta-analytics on top of them. In some cases, query answering on the CIP can be leveraged, e.g. to know if a highest responsibility feature value appears as changed in every best counterfactual entity (c.f. Section 9).
CIPs can be applied to black-box models that can be invoked from the program; and also with any classifier that can be specified within the program, such as a rule-based classifier. Decision trees for classification can be expressed by means of rules. As recent research shows, other established classifiers, such as random forests, Bayesian networks, and neural networks, can be compiled into Boolean circuits [Shi et al. (2020), Shih et al. (2018), Choi et al. (2020), Narodytska et al. (2010)], opening the possibility of specifying them by means of ASPs.
Answer-Set Programming and Semantic Knowledge.
Our declarative approach to counterfactual interventions is particularly appropriate for bringing into the game additional, declarative, semantic knowledge. As we show, we can easily adopt constraints when computing counterfactual entities and scores. In particular, these constraints can be used to concentrate only on actionable counterfactuals (as defined by the constraints). In the loan example, we could omit counterfactual entities or explanations that require the applicant to reduce his/her age. Imposing constraints in combination with purely procedural approaches is much more complicated. In our case, we can easily and seamlessly integrate logic-based semantic specifications with declarative programs, and use the generic and optimized solvers behind ASP implementations. In this work we include several examples of CIPs specified in the language of the DLV system [Leone et al. (2006)] and its extensions, and run with them.
We establish that ASPs are the right declarative and computational tool for our problem since they can be used to: (a) compute all the counterfactual explanations, in particular, the best ones; (b) provide through each model of the program all the information about a particular counterfactual explanation; (c) compute maximum-responsibility scores and the counterfactual explanations associated to them; (d) in many cases, specify as a single program the classifier together with the explanation machinery; (e) alternatively, call a classifier from the program as an external predicate defined in, say, Python; (f) bring into the picture additional, logic-based knowledge, such as semantic and actionability constraints; and (g) do query answering, under the skeptical and brave semantics, to analyze at a higher level the outcomes of the explanation process. Furthermore, and quite importantly, the ASPs we use match the intrinsic data complexity of the problem of computing maximum-responsibility feature values, as we will establish in this work.
We discuss why and how the responsibility score introduced and investigated early in this work has to be extended in some situations, leading to a generalized probabilistic responsibility score introduced in [Bertossi et al. (2020)]. After doing this, we show how logical constraints that capture domain knowledge can be used to modify the underlying probability distribution on which a score is defined.
Article Structure and Contents.
This paper is structured as follows. In Section 2 we provide background material on classification models and answer-set programs. In Section 3 we introduce counterfactual interventions and the responsibility score. In Section 4, we investigate the data complexity of the computation of maximum-responsibility feature values. In Section 5 we introduce and analyze the CIPs. In Section 6, we consider some extensions of CIPs that can be used to capture additional domain knowledge. In Section 7, and for completeness, we briefly motivate and describe the probabilistic responsibility score previously introduced in [Bertossi et al. (2020)]. In this regard, we also show how to bring logical constraints into the probabilistic setting of the responsibility score. In Section 8, we discuss related work. In Section 9, we conclude with a final discussion.
This paper is an extended version of the conference paper [Bertossi (2020)]. The current version, in addition to improving and making more precise the presentation in comparison with the conference version, includes a considerable amount of new material. Section 2, containing background material, is new. In particular, it contains a new example of a decision-tree classifier that is used as an additional running example throughout the paper. Section 4 on the complexity of x-Resp computation is completely new, as is Section 5.2, which revisits complexity in the light of CIPs. This version also contains several examples of the use of the DLV system for specifying and running CIPs; there were no such examples in the conference version. Section 7 is new. It contains two subsections: the first deals with non-binary features and the need to extend, in probabilistic terms, the definition of the x-Resp score; the second, with the combination of probabilistic uncertainty and domain knowledge. There is also a new Section 8 on related work. The final Section 9 has been considerably revised and extended.
2 Background
2.1 Classification models and counterfactuals
In a particular domain we may have a finite set of features F_1, …, F_n (a.k.a. variables). More precisely, the features are functions that take values in their domains Dom(F_i). (This is customary parlance, but, since a feature is a function, in mathematics we would call this the range of the feature.) As is commonly the case, we will assume that these domains are categorical, i.e. finite and without any a priori order structure. These features are used to describe or represent entities in an underlying population (or application domain), as records (or tuples) formed by the values the features take on the entity. Actually, with a bit of abuse of notation, we represent entities directly as these records of feature values: e = ⟨F_1(e), …, F_n(e)⟩. A feature F is said to be Boolean or propositional if its domain has two values, for which we sometimes use true and false, or, simply, 1 and 0.
For example, if the underlying population contains people, and then, each entity represents a person, features could be , with domains: ; ; ; . A particular entity could be (represented by) . Here, is a propositional (or binary) feature.
A classifier, C, for a domain of entities E is a computable function from E into a set L of labels; we classify an entity e by assigning the label C(e) to it. We do not have to assume any order structure on L. In the example above, we may want to classify entities in the people's population according to how appropriate they are for military service: the set of labels would contain a label for those who are considered fit for service and one for those who are not, and the classifier would assign one of them to each entity. When the set of labels has two values, typically {0, 1}, we have a binary or Boolean classifier. In this work we will consider mostly binary classifiers.
Example 2.1
This is a popular example taken from [Mitchell (1997)]. Consider the set of features {Outlook, Humidity, Wind}, with Dom(Outlook) = {sunny, overcast, rain}, Dom(Humidity) = {high, normal}, and Dom(Wind) = {strong, weak}. An entity under classification has a value for each of the features, e.g. e = ⟨sunny, normal, weak⟩. The problem is about deciding whether to play tennis or not under the conditions represented by that entity, which can be captured as a classification problem, with labels yes or no (sometimes we will use 1 and 0, resp.).

The Boolean classifier is given as a decision tree, as shown in Figure 1. The decision is computed by following the feature values along the branches of the tree. The entity at hand gets label yes.
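Since the tree is easiest to grasp as rules, the sketch below renders it in the DLV-style syntax used throughout the paper; it anticipates the rule-based specification given later, in Example 5.7, whose predicates dom1, dom2, dom3 and cls we borrow here:

% domains of the three features (Outlook, Humidity, Wind):
dom1(sunny). dom1(overcast). dom1(rain).
dom2(high). dom2(normal).
dom3(strong). dom3(weak).
% the decision tree, read off branch by branch:
cls(X,Y,Z,1) :- X = sunny, Y = normal, dom1(X), dom2(Y), dom3(Z).
cls(X,Y,Z,1) :- X = overcast, dom1(X), dom2(Y), dom3(Z).
cls(X,Y,Z,1) :- X = rain, Z = weak, dom1(X), dom2(Y), dom3(Z).
% everything else gets label 0 (no):
cls(X,Y,Z,0) :- dom1(X), dom2(Y), dom3(Z), not cls(X,Y,Z,1).

For the entity e = ⟨sunny, normal, weak⟩, the first rule fires, giving label 1 (yes).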
Of course, a decision tree could be just arbitrarily given, but the common situation is to learn it from training examples, that is, from a collection of entities that come already with an assigned label. In more general terms, building a classifier from a set T of training data, i.e. a set of pairs consisting of an entity and its label, is about learning a (computational) model for the label function for the entire domain of entities, beyond T. We say that the model "represents" the classifier. This supervised learning task is one of the most common in machine learning. (C.f. [Mitchell (1997)] or [Flach (2012)] for great introductions to the subject.) Classifiers may take many different internal forms. They could be decision trees, random forests, rule-based classifiers, logistic regression models, neural network-based classifiers, etc.
In this work we are not concerned with learning classifiers. We will assume the classifier is given as an input/output computable relation, or that we know how to specify it, for example, through a logic program. By the same token, we are not dealing here with any kind of program learning.
Some classes of classifiers are more “opaque” than others, i.e. with a more complex and less interpretable internal structure and results [Molnar (2020)]. Hence, the need for explaining the classification outcomes. The decision tree above is clearly a computable function. Since we have the classifier at hand with an explicit and clear specification, we would consider this classifier to be interpretable. In this direction, if we obtain a label for a particular entity at hand, we can inspect the model and explain why we obtained that label, and we could identify the feature values that were relevant for the outcome.
Instead of a decision tree as above, we could have, for example, a very complex neural network. Such a model would be much more difficult to interpret, or to use to explain why we got a particular output for a particular input. These kinds of models are considered to be black-box models (or, at least, opaque) [Rudin (2019)].
Assume for a moment that we do not have the explicit classifier in Example 2.1, but we interact only with the box that contains it. We input entity e = ⟨sunny, normal, weak⟩, and we obtain the output yes. We want to know which feature values in e influenced the outcome the most. Actually, we want to get a numerical score for each of the entity's feature values, in such a way that the higher the score, the more relevant is the feature value for the outcome.
Different scores may be defined. A popular one is Shap [Lundberg et al. (2020)], which has been investigated in detail for some classes of classifiers [Arenas et al. (2021), Van den Broeck et al. (2021)]. In this work we concentrate on x-Resp, a responsibility score that was introduced, in a somewhat ad hoc manner, in [Bertossi et al. (2020)]. In this work, we present x-Resp in a causal setting, on the basis of counterfactual interventions and their responsibilities, following the general approach to causal responsibility in [Chockler and Halpern (2004)].
Just for the gist, and in the light of the example at hand, we want to detect and quantify the relevance (technically, the responsibility) of a feature value in e, say the value normal for feature Humidity, by hypothetically intervening on its value; in this case, changing it from normal to high, obtaining a new entity e′ = ⟨sunny, high, weak⟩, a counterfactual version of e. If we input this entity into the classifier, we now obtain the label no. This is an indication that the original feature value for Humidity is indeed relevant for the original classification. Through numerical aggregations over the outcomes associated to the alternative, counterfactually intervened entities, we can define and compute the x-Resp score. The details can be found in Section 3. As mentioned in Section 1, these counterfactual entities are also interesting per se, and not only as a basis for the definition and computation of feature attribution scores.
2.2 Answer-set programming
As customary, when we talk about answer-set programs, we refer to disjunctive Datalog programs with weak negation and stable model semantics [Gelfond and Lifschitz (1991)]. Accordingly, an answer-set program consists of a finite number of rules of the form
A_1 ∨ ⋯ ∨ A_n ← P_1, …, P_m, not N_1, …, not N_k,        (1)
where the A_i, P_j and N_s are (positive) atoms, i.e. of the form Q(t_1, …, t_l), where Q is a predicate of a fixed arity, say, l, and t_1, …, t_l is a sequence of length l of variables or constants. In rule (1), the P_j and the not N_s are called literals, with the P_j positive, and the not N_s negative. All the variables in the A_i and N_s appear among those in the P_j. The left-hand side of a rule is called the head, and the right-hand side, the body. A rule can be seen as a (partial) definition of the predicates in the head (there may be other rules with the same predicates in the head).
The constants in program P form the (finite) Herbrand universe H of the program. The ground version of program P, gr(P), is obtained by instantiating the variables in P in all possible ways using values from H. The Herbrand base, HB, of P contains all the atoms obtained as instantiations of predicates in P with constants in H.
A subset M of HB is a model of P if it satisfies gr(P), i.e.: for every ground rule A_1 ∨ ⋯ ∨ A_n ← P_1, …, P_m, not N_1, …, not N_k of gr(P), if {P_1, …, P_m} ⊆ M and {N_1, …, N_k} ∩ M = ∅, then {A_1, …, A_n} ∩ M ≠ ∅. M is a minimal model of P if it is a model of P, and P has no model that is properly contained in M. MM(P) denotes the class of minimal models of P. Now, for S ⊆ HB, transform gr(P) into a new, positive program gr(P)^S (i.e. without not), as follows: delete every rule for which {N_1, …, N_k} ∩ S ≠ ∅; next, transform each remaining rule into A_1 ∨ ⋯ ∨ A_n ← P_1, …, P_m, by dropping its negative literals. Now, S is a stable model of P if S ∈ MM(gr(P)^S). Every stable model of P is also a minimal model of P. Stable models are also commonly called answer sets, and we will mostly use the latter term.
A program is unstratified if there is a cyclic, recursive definition of a predicate that involves negation. For example, a program containing the rules p ← not q and q ← not p is unstratified, because there is a negation in the mutually recursive definitions of p and q. The program in Example 2.2 below is not unstratified, i.e. it is stratified. A good property of stratified programs is that their models can be upwardly computed following strata (layers), starting from the facts, that is, from the ground instantiations of rules with empty bodies (in which case the arrow is usually omitted). We refer the reader to [Gelfond and Kahl (2014)] for more details.
Query answering under the ASPs comes in two forms. Under the brave semantics, a query posed to the program obtains as answers those that hold in some model of the program. However, under the skeptical (or cautious) semantics, only the answers that simultaneously hold in all the models are returned. Both are useful depending on the application at hand.
We will use disjunctive programs. However, sometimes it is possible to use instead normal programs, which do not have disjunctions in rule heads, and that have the same stable models, in the sense that disjunctive rules can be transformed into a set of non-disjunctive rules. More precisely, the rule in (1) can be transformed into the rules: A_i ← P_1, …, P_m, not N_1, …, not N_k, not A_1, …, not A_{i−1}, not A_{i+1}, …, not A_n, for i = 1, …, n.
This shift operation is possible if the original program is head-cycle free [Ben-Eliyahu and Dechter (1994), Dantsin et al. (2001)], as we define now. The dependency graph of a program P, denoted DG(P), is a directed graph whose nodes are the atoms of the associated ground program gr(P). There is an arc from atom A to atom B iff there is a rule in gr(P) where A appears positively in the body and B appears in the head. P is head-cycle free (HCF) iff DG(P) has no cycle through two atoms that belong to the head of a same rule.
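As an illustration of the definition (a hypothetical mini-program, not taken from the examples in this paper), the following ground program is not HCF: p and q appear together in a disjunctive head and lie on a common cycle of the dependency graph.

% disjunctive head containing p and q:
p v q.
% mutual positive dependencies: arcs q -> p and p -> q in the dependency graph
p :- q.
q :- p.

This program has the single answer set {p, q}, whereas its shifted version (p :- not q. and q :- not p., together with the last two rules) has no answer sets at all; so outside the HCF class the shift operation is not guaranteed to preserve the semantics.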
Example 2.2
Consider the following program that is already ground.
The program has two stable models: and .
Each of them expresses that the atoms in it are true, and any other atom that does not belong to it, is false.
These models are incomparable under set inclusion, and they are minimal models in that any proper subset of any of them is not a model of the program.

The dependency graph is shown in Figure 2. We can see that the program is head-cycle free, because there is no cycle involving both and , the atoms that appear in the disjunctive head. As a consequence, the program can be transformed into the following non-disjunctive program
This program has the same stable models as the original one.
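For readers who prefer to see the mechanics directly in code, here is a small ground disjunctive program of the kind just discussed, with hypothetical atoms a, b, c (an illustrative sketch, not necessarily the exact program of Example 2.2):

% a fact and a disjunctive rule:
c.
a v b :- c.
% Stable models: {c, a} and {c, b}; each is a minimal model of its own reduct.
% The program is HCF: the only arcs in the dependency graph are c -> a and c -> b,
% so no cycle goes through the head atoms a and b.
% Shifted, non-disjunctive version, with the same stable models:
% a :- c, not b.
% b :- c, not a.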
We will return to HCF-programs in Section 5.2.
3 Counterfactual Explanations and the x-Resp Score
We consider classification models that are represented by an input/output relation. Inputs are the so-called entities, e, which are represented as records, e = ⟨F_1(e), …, F_n(e)⟩, where F_i(e) is the value taken on e by the feature F_i. The entities form a population E. The output of a classifier is represented by a label function L that maps entities to 1 or 0, the binary result of the classification. That is, to simplify the presentation, we concentrate here on binary classifiers, but this is not essential. Furthermore, we consider domains with a finite number of categorical values.
In this work, we are not assuming that we have an explicit classification model, and we do not need it. All we need is to be able to invoke it. It could be a “black-box” model.
The problem is the following: given an entity e that has received the label 1, provide an "explanation" for this outcome. In order to simplify the presentation, and without loss of generality, we assume that label 1 is the one that has to be explained. It is the "negative" outcome one has to justify, such as the rejection of a loan application.
Counterfactual explanations are defined in terms of counterfactual interventions that simultaneously change feature values in e, in such a way that the updated record gets a new label. A counterfactual explanation for the classification of e is, then, an alternative entity e′ that is classified with a label different from that of e. In general, we are interested in counterfactual explanations that are more informative about the role of the feature values in e that lead to its obtained label. These are the entities that are obtained from e through minimal counterfactual interventions. Minimality can be defined in different ways, and we adopt an abstract approach, assuming a partial order relation on counterfactual interventions.
Definition 3.1
Consider a binary classifier represented by its label function L, and a fixed input entity e = ⟨F_1(e), …, F_n(e)⟩, with F_i(e) = v_i, v_i ∈ Dom(F_i), and L(e) = 1.
(a) An intervention on e is a set of the form ι = {⟨F_{i_1}, v′_{i_1}⟩, …, ⟨F_{i_k}, v′_{i_k}⟩}, with v′_{i_j} ∈ Dom(F_{i_j}) and v′_{i_j} ≠ v_{i_j}, for j = 1, …, k. We denote with ι(e) the record obtained by applying to e the intervention ι, i.e. by replacing in e every v_{i_j}, with F_{i_j} appearing in ι, by v′_{i_j}.
(b) A counterfactual intervention on e is an intervention ι on e, such that L(ι(e)) ≠ L(e). The resulting entity ι(e) is called a counterfactual entity (for e).
(c) Given a partial order ⪯ on interventions, a ⪯-minimal counterfactual intervention ι is such that there is no counterfactual intervention ι′ on e with ι′ ≺ ι (i.e. ι′ ⪯ ι, but not ι ⪯ ι′). The resulting entity is called a ⪯-minimal counterfactual entity.
(d) A counterfactual explanation for L(e) = 1 is a set of the form X = {⟨F_{i_1}, v_{i_1}⟩, …, ⟨F_{i_k}, v_{i_k}⟩}, with v_{i_j} = F_{i_j}(e), for which there is a counterfactual intervention for e that changes exactly the values of the features F_{i_1}, …, F_{i_k}.
(e) A counterfactual explanation for L(e) = 1 is ⪯-minimal if its associated counterfactual intervention is a ⪯-minimal counterfactual intervention on e.
Remark 3.1
Counterfactual explanations contain feature values of the original entity e. They contain relevant information about the classification result, and interventions are used to characterize and identify them. For this reason, we will usually call an alternative entity e′ obtained from e through a counterfactual intervention a counterfactual explanation as well: the counterfactual explanation in the sense of Definition 3.1(d) can be read off from e′.
Several minimality criteria can be expressed in terms of partial orders. We will explicitly consider two of them.
Definition 3.2
Let ι_1 and ι_2 be interventions on an entity e. (a) ι_1 ⊆ ι_2 iff π_1(ι_1) ⊆ π_1(ι_2), with π_1(ι) the projection of ι on the first position, i.e. the set of features whose values ι changes. (b) ι_1 ≤ ι_2 iff |π_1(ι_1)| ≤ |π_1(ι_2)|.
This definition introduces minimality under set inclusion and cardinality, resp. The former minimizes, under set inclusion, the set of features whose values are changed; the latter, the number of features that see their values changed. In the following, we will consider only these minimality criteria, mostly the second (c.f. Section 9 for a discussion).
Example 3.1
Consider three binary features F_1, F_2, F_3, i.e. they take values 0 or 1. The input/output relation of a classifier is shown in Table 1. Let e be the entity e_1 = ⟨0, 1, 1⟩ in the table. We want counterfactual explanations for its label 1. Any other record in the table can be seen as the result of an intervention on e. However, only e_4, e_7 and e_8 are (results of) counterfactual interventions, in that they switch the label to 0.
entity (id) | F_1 | F_2 | F_3 | L
---|---|---|---|---
e_1 | 0 | 1 | 1 | 1
e_2 | 1 | 1 | 1 | 1
e_3 | 1 | 1 | 0 | 1
e_4 | 1 | 0 | 1 | 0
e_5 | 1 | 0 | 0 | 1
e_6 | 0 | 1 | 0 | 1
e_7 | 0 | 0 | 1 | 0
e_8 | 0 | 0 | 0 | 0

Table 1: Classifier
For example, e_4 = ⟨1, 0, 1⟩ corresponds to the intervention ι_1 = {⟨F_1, 1⟩, ⟨F_2, 0⟩}, in that e_4 is obtained from e by changing the values of F_1 and F_2 into 1 and 0, resp. For this intervention, ι_1(e) = e_4. From e_4 we obtain the counterfactual explanation {⟨F_1, 0⟩, ⟨F_2, 1⟩}, telling us that the values F_1(e) = 0 and F_2(e) = 1 are a joint cause for e to be classified as 1.
There are three counterfactual explanations, associated to e_4, e_7 and e_8: X_1 = {⟨F_1, 0⟩, ⟨F_2, 1⟩}, X_2 = {⟨F_2, 1⟩}, and X_3 = {⟨F_2, 1⟩, ⟨F_3, 1⟩}. Here, X_1 and X_3 are incomparable under ⊆ and have the same cardinality, while X_2 turns out to be both ⊆- and ≤-minimal (actually, minimum).
Notice that what matters here is what is intervened, and not how. By taking a projection, the partial order does not care about the values that replace the original feature values, as long as the latter are changed. Furthermore, given e, it would be good enough to indicate the features whose values are relevant, e.g. {F_1, F_2} in the previous example. However, the introduced notation emphasizes the fact that the original values are those we concentrate on when providing explanations.
Every ≤-minimal explanation is also ⊆-minimal. However, it is easy to produce an example showing that a ⊆-minimal explanation may not be ≤-minimal.
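For the second claim, a hypothetical binary classifier (not the one in Table 1) over an entity e = ⟨1, 1, 1⟩ with L(e) = 1 suffices; assume that
\[
L(\langle 1,0,1\rangle)=0,\qquad L(\langle 0,1,1\rangle)=L(\langle 1,1,0\rangle)=1,\qquad L(\langle 0,1,0\rangle)=0 .
\]
Then ι_1 = {⟨F_2, 0⟩} and ι_2 = {⟨F_1, 0⟩, ⟨F_3, 0⟩} are both ⊆-minimal counterfactual interventions (no proper subset of ι_2 switches the label), but only ι_1 is ≤-minimal.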
Notation: An s-explanation for L(e) = 1 is a ⊆-minimal counterfactual explanation for it. A c-explanation for L(e) = 1 is a ≤-minimal (i.e. cardinality-minimal) counterfactual explanation for it. As prescribed in Remark 3.1, we will usually use the terms s-explanation and c-explanation also to refer to the alternative, intervened entities that are associated to s- and c-explanations in the sense of Definition 3.2.
So far, we have characterized explanations as sets of (interventions on) features. However, one would also like to quantify the “causal strength” of a single feature value in a record representing an entity. Something similar has been done for a single tuple in a database as a cause for a query answer [Meliou et al. (2010)], or for a single attribute value in a database tuple [Bertossi and Salimi (2017a), Bertossi (2021)]. Different scores for feature values have been proposed in this direction, e.g. Shap in [Lundberg et al. (2020)] and Resp in [Bertossi et al. (2020)]. Following [Chockler and Halpern (2004)], we will now formulate the latter as the responsibility of a feature value as an actual cause [Halpern and Pearl (2005)] for the observed classification.
Definition 3.3
Consider an entity e represented as a record of feature values ⟨F_1(e), …, F_n(e)⟩, with L(e) = 1.
(a) A feature value v = F_i(e) is a value-explanation for L(e) = 1 if there is an s-explanation X for L(e) = 1, such that ⟨F_i, v⟩ ∈ X.
(b) The explanatory responsibility of a value-explanation v = F_i(e) is x-Resp_e(v) := 1/|X|, where X is an s-explanation of minimum cardinality with ⟨F_i, v⟩ ∈ X.
(c) If v is not a value-explanation, x-Resp_e(v) := 0.
Notice that (b) can be stated as x-Resp_e(v) = 1/(1 + |Γ|), with Γ := X ∖ {⟨F_i, v⟩}.
Adopting the terminology of actual causality [Halpern and Pearl (2005)], a counterfactual value-explanation for e's classification is a value-explanation v with x-Resp_e(v) = 1. That is, it suffices, without company from other feature values in e, to explain the classification. Similarly, an actual value-explanation for e's classification is a value-explanation v with 0 < x-Resp_e(v) ≤ 1. That is, v appears in an s-explanation X, say as ⟨F_i, v⟩, but possibly in company of other feature values. In this case, Γ = X ∖ {⟨F_i, v⟩} is a contingency set for v (c.f. [Halpern and Pearl (2005)], and [Meliou et al. (2010)] for the case of databases).
Example 3.2
(example 3.1 continued) X_2 = {⟨F_2, 1⟩}, equivalently entity e_7 = ⟨0, 0, 1⟩, is the only c-explanation for e's classification. Its value for F_2, i.e. F_2(e) = 1, is a counterfactual value-explanation, and its explanatory responsibility is 1. The empty set is its contingency set. Now, entity e_8 = ⟨0, 0, 0⟩ shows that the intervened value for F_2, i.e. F_2(e) = 1, admits {⟨F_3, 1⟩} as a (non-empty) contingency set for the label to switch to 0.
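Spelling out the computation with the formula in Definition 3.3 (a worked instance under the notation above):
\[
x\mbox{-}Resp_{e}(F_2(e)) \;=\; \frac{1}{1+|\emptyset|} \;=\; 1,
\qquad \mbox{whereas the larger explanation associated to } e_8 \mbox{ would only give } \frac{1}{1+|\{\langle F_3,1\rangle\}|} \;=\; \frac{1}{2},
\]
and the score is determined by a contingency set of minimum size, hence the value 1.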
In this work we are interested mostly in c-explanations, which are our best explanations, and in maximum-responsibility feature values. Notice that maximum x-Resp scores can be obtained by concentrating only on c-explanations. Maximum-responsibility value-explanations appear in c-explanations, and only in them. C.f. Section 9 for considerations on s-explanations and features with non-maximum responsibility scores.
4 Complexity of x-Resp Computation
A natural question is about the complexity of computing the x-Resp score; not only for the obvious reason of knowing the complexity, but also to determine if the ASP-based approach we will present is justified from the complexity point of view. We can shed some light on this matter by taking advantage of complexity results for actual causality in databases [Meliou et al. (2010), Bertossi and Salimi (2017a)]. It is known that there are Boolean Conjunctive Queries (BCQs), Q, for which deciding if the responsibility of a tuple for Q being true in a database D is above a given threshold is NP-complete, in the size of D [Bertossi and Salimi (2017a)].
In our case, given a fixed classifier, the computational problem is about deciding, for an arbitrary entity e and rational number v, if x-Resp_e(F_i(e)) > v. The complexity is measured as a function of the size of e, the size of the Cartesian product of the underlying domains, and the complexity of computing the label function. (Under our ASP-based approach, this input will be associated to the extensional database for the program.)
Given a relational database schema, with predicates R_1, …, R_m of arities a_1, …, a_m, we can see each attribute A as a feature that takes values in a finite domain Dom(A). Without loss of generality, we assume the predicates do not share attributes (but attributes may share a same domain). Then, we have a sequence of all the attributes of the schema. For a database D, and for each of its tuples t_1, …, t_N (which a sub-instance of D may or may not keep), define binary features X_{t_i}, with X_{t_i} := 1 if t_i is kept, and X_{t_i} := 0, otherwise.
Now, define a binary entity e_D := ⟨1, …, 1⟩. This entity, containing only 1s, represents the contents of database D, and its length coincides with the number of tuples in D, say N. Now, the population of entities is E := {0, 1}^N, i.e. all the binary sequences with the same length as e_D. Intuitively, each entity in E represents a sub-instance of D, by keeping only the tuples of D that are assigned the value 1. We do not need to consider the insertion of tuples into D, as will be clear soon.
Now, given the fixed BCQ Q, for which D ⊨ Q, define a binary classifier as follows: for e′ ∈ E, L(e′) := 1 if the sub-instance of D represented by e′ satisfies Q, and L(e′) := 0, otherwise. Notice that this classifier runs in polynomial time in the length of e′. Also, L(e_D) = 1. The monotonicity of Q allows us to concentrate on sub-instances of D: only sub-instances can invalidate the query, and super-instances will always make it true. In this way, we have reduced the problem of computing the responsibility of a tuple as an actual cause for Q to the problem of computing the x-Resp score.
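In symbols, and under the assumptions just introduced (with D_{e′} denoting the sub-instance of D encoded by e′, and ρ_{D,Q}(t) the causal responsibility of tuple t for Q being true in D), the reduction can be summarized as the sketch
\[
L(e') := \begin{cases} 1 & \mbox{if } D_{e'} \models \mathcal{Q},\\ 0 & \mbox{otherwise,} \end{cases}
\qquad\qquad
\rho_{D,\mathcal{Q}}(t_i) \;=\; x\mbox{-}Resp_{\,e_D}(X_{t_i}(e_D)),
\]
the right-hand equality being exactly what the construction above is designed to achieve.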
Theorem 1
There is a binary polynomial-time classifier over a finite set of binary entities for which deciding if the x-Resp score of a feature value of an entity e is above a certain threshold is NP-complete in the size of e plus the size of the underlying population of entities.
As in [Bertossi and Salimi (2017a)], several other problems in relation to responsibility can be investigated; and it is likely that most (if not all) of the results in [Bertossi and Salimi (2017a)] can be used to obtain similar results for the x-Resp score.
5 Counterfactual Intervention Programs
An answer-set program has a possible-world semantics, given in terms of its stable models, which are the intended models of the program [Gelfond and Kahl (2014)]. A program consists of a set of rules with variables, possibly with negated atoms in a rule body (the antecedent) and disjunctions in the head (the consequent). This negation is non-monotonic, which is particularly useful for doing commonsense reasoning and specifying persistence of properties. A program has an extensional database consisting of ground atoms (the facts). In our case, the facts will be related to the entity under classification, for whose label we want counterfactual explanations. The program specifies the possible interventions. Final, intervened versions of the original entity, that have their label switched, correspond to different stable models of the program.
Entities will be represented by means of a predicate with n + 2 arguments. The first one holds a record (or entity) id (which may not be needed when dealing with single entities). The next n arguments hold the feature values. (For performance-related reasons, it might be more convenient to use 3-ary predicates to represent an entity with an identifier, but the presentation here would be more complicated.) The last argument holds an annotation constant from the set {o, do, tr, s}. Their semantics will be specified below, by the generic program that uses them.
Initially, a record has not been subject to interventions, and the corresponding entry in the predicate is of the form ent(e, x̄, o), where e is (with a bit of abuse of notation) a constant used as an entity identifier, x̄ is an abbreviation for the tuple of feature values F_1(e), …, F_n(e) of the entity; and the annotation constant "o" stands for "original entity".
When the classifier gives label 1 to e, the idea is to start changing feature values, one at a time. The intervened entity then becomes annotated with the constant do in the last argument. (The do-operator is commonly used to denote interventions [Pearl (2009)]; here, it is just a constant.) When the resulting intervened entities are classified, we may not have the classifier specified within the program. For this reason, the program uses a special predicate, cls in our examples, where the first n arguments take the feature values of an entity under classification, and the last argument returns the binary label. This predicate can be explicitly given as a set of facts (c.f. Example 5.1), or can be specified within the program (c.f. Example 5.7), or can be invoked by the program as an external predicate (c.f. Example 5.8), much in the spirit of HEX-programs [Eiter et al. (2019), Eiter et al. (2017)]. Since the original instance may have to go through several interventions until reaching one that switches the label to 0, the intermediate entities get the "transition" annotation tr. This is achieved by a generic program.
5.1 Counterfactual intervention programs
The generic Counterfactual Intervention Program (CIP) is as follows:
P1. The facts of the program are the atoms of the form dom_i(c), with c ∈ Dom(F_i), plus the initial entity ent(e, x̄, o), with x̄ the initial vector of feature values.
P2. The transition entities are obtained as initial, original entities, or as the result of an intervention:
ent(E, x̄, tr) :- ent(E, x̄, o).      ent(E, x̄, tr) :- ent(E, x̄, do).
Here, E is a variable standing for a record id.
P3. A program rule specifying that, every time the entity at hand (original or obtained after a "previous" intervention) is classified with label 1, a new value has to be picked from the corresponding domain and replace the current value. The new value is chosen via the non-deterministic "choice operator", well-known in ASP [Giannotti et al. (1997)]. In this case, the values are chosen from the domains, subject to the condition of not being the same as the current value:
ent(E, x′_1, x_2, …, x_n, do) v ⋯ v ent(E, x_1, …, x_{n−1}, x′_n, do) :- ent(E, x_1, …, x_n, tr), cls(x_1, …, x_n, 1), dom_1(x′_1), …, dom_n(x′_n), x_1 ≠ x′_1, …, x_n ≠ x′_n, chosen_1(x̄, x′_1), …, chosen_n(x̄, x′_n).
In general, for each fixed x̄, chosen_i(x̄, x′_i) chooses a unique value x′_i subject to the other conditions in the same rule body. The use of the choice operator can be eliminated by replacing each choice atom by the atom chosen_i(x̄, x′_i), and defining each predicate chosen_i by means of "classical" rules [Giannotti et al. (1997)], as follows. (We emphasize that we are using here the "choice operator", which is definable in ASP (as done here), and not the newer choice rules, which could be used here for the same purpose (and many more) and are included in the ASP-Core-2 Standard [Calimeri et al. (2020)]. We use the choice operator because most of our programs are run with DLV-Complex, which does not support choice rules.)
chosen_i(x̄, u) :- ent(E, x̄, tr), cls(x̄, 1), dom_i(u), u ≠ x_i, not diffchoice_i(x̄, u).      (2)
diffchoice_i(x̄, u) :- chosen_i(x̄, u′), u ≠ u′, dom_i(u).      (3)
P4. The following rule specifies that we can "stop", hence the annotation s, when we reach an entity that gets label 0:
ent(E, x̄, s) :- ent(E, x̄, do), cls(x̄, 0).
P5. We add a program constraint specifying that we prohibit going back to the original entity via local interventions:
:- ent(E, x̄, do), ent(E, x̄, o).
P6. The counterfactual explanations can be collected by means of a predicate expl, specified by means of:
expl(E, i, x_i) :- ent(E, x_1, …, x_n, o), ent(E, x′_1, …, x′_n, s), x_i ≠ x′_i,
with i = 1, …, n. These rules collect each value x_i that has been changed in the original instance e, with its position i in e (the second argument of the predicate). Actually, each of these is a value-explanation.
The program will have several stable models due to the disjunctive rule and the choice operator. Each model will hold intervened versions of the original entity, and hopefully versions for which the label is switched, i.e. those with annotation s. If the classifier never switches the label, despite the fact that local interventions are not restricted, we will not find a model with a version of the initial entity annotated with s. Due to the constraint in P5., none of the models will have the original entity annotated with do, because those models would be discarded [Leone et al. (2006)]. The definition of the choice operator contains non-stratified negation. The semantics of ASP, which involves model minimality, makes only one of the atoms in a head disjunction true (unless forced otherwise by the program).
Example 5.1
(example 3.1 continued) Most of the CIP above is generic. Here we have the facts dom_i(0), dom_i(1), for i = 1, 2, 3, and ent(e, 0, 1, 1, o), with e a constant, the record id of the first row in Table 1. The classifier is explicitly given by Table 1. Then, predicate cls can be specified with a set of additional facts: cls(0,1,1,1), cls(1,1,1,1), …, cls(0,0,0,0). In them, the last entry corresponds to the label assigned to the entity whose feature values are given in the first three arguments.
The stable models of the program will contain all the facts above. One of them, say M, will contain (among others) the facts ent(e, 0, 1, 1, o) and ent(e, 0, 1, 1, tr). The presence of the last atom activates rule P3., because cls(0, 1, 1, 1) is true (for e_1 in Table 1). New facts are produced for M (the value changed by the intervention is that of F_3): ent(e, 0, 1, 0, do), ent(e, 0, 1, 0, tr). Due to the last fact and the true cls(0, 1, 0, 1), rule P3. is activated again. Choosing the value 0 for the second disjunct, atoms ent(e, 0, 0, 0, do), ent(e, 0, 0, 0, tr) are generated. For the latter, cls(0, 0, 0, 0) is true (coming from e_8 in Table 1), switching the label to 0. Rule P3. is no longer activated, and we can apply rule P4., obtaining ent(e, 0, 0, 0, s).
From rule P6., we obtain the explanation atoms expl(e, 2, 1) and expl(e, 3, 1), showing the changed values in e. All this in model M. There are other models, and one of them contains ent(e, 0, 0, 1, s), the minimally intervened version of e, i.e. e_7 = ⟨0, 0, 1⟩.
In the next example we will show how to write and run the counterfactual intervention program for Example 3.1 with the DLV system [Leone et al. (2006)]. (We have experimented, with the examples in this paper and others, with each of DLV, DLV-Complex, and DLV2. They have been extremely useful. At this moment, each of them seems to have some nice features the others lack.) When numerical aggregations and, especially, set operations are needed, we use instead the DLV-Complex system [Calimeri et al. (2009)] (c.f. Section 9). In Example 5.8 we show a program that can be used with the newer version, DLV2 [Alviano et al. (2017), Alviano et al. (2019)], of DLV, which follows the ASP-Core-2 standard [Calimeri et al. (2020)].
Example 5.2
(example 5.1 continued) The answer-set program for Examples 3.1 and 5.1, written in the language of the DLV-Complex system, is shown next. (The program portion shown right below would also run with DLV, since it does not contain numerical aggregations.) In it, the annotation "tr" stands for the transition annotation used in Example 5.1, and X, Xp stand for x, x′, etc. In a DLV program, terms starting with a lower-case letter are constants; those starting with an upper-case letter are variables.
#include<ListAndSet>
% the classifier:
cls(0,1,1,1). cls(1,1,1,1). cls(1,1,0,1). cls(1,0,1,0).
cls(1,0,0,1). cls(0,1,0,1). cls(0,0,1,0). cls(0,0,0,0).
% the domains:
dom1(0). dom1(1). dom2(0). dom2(1). dom3(0). dom3(1).
% original entity at hand:
ent(e,0,1,1,o).
% transition rules:
ent(E,X,Y,Z,tr) :- ent(E,X,Y,Z,o).
ent(E,X,Y,Z,tr) :- ent(E,X,Y,Z,do).
% admissible counterfactual interventions:
ent(E,Xp,Y,Z,do) v ent(E,X,Yp,Z,do) v ent(E,X,Y,Zp,do) :-
        ent(E,X,Y,Z,tr), cls(X,Y,Z,1), dom1(Xp), dom2(Yp), dom3(Zp),
        X != Xp, Y != Yp, Z != Zp,
        chosen1(X,Y,Z,Xp), chosen2(X,Y,Z,Yp), chosen3(X,Y,Z,Zp).
% definitions of chosen operators as in equations (2) and (3):
chosen1(X,Y,Z,U) :- ent(E,X,Y,Z,tr), cls(X,Y,Z,1), dom1(U), U != X, not diffchoice1(X,Y,Z,U).
diffchoice1(X,Y,Z,U) :- chosen1(X,Y,Z,Up), U != Up, dom1(U).
chosen2(X,Y,Z,U) :- ent(E,X,Y,Z,tr), cls(X,Y,Z,1), dom2(U), U != Y, not diffchoice2(X,Y,Z,U).
diffchoice2(X,Y,Z,U) :- chosen2(X,Y,Z,Up), U != Up, dom2(U).
chosen3(X,Y,Z,U) :- ent(E,X,Y,Z,tr), cls(X,Y,Z,1), dom3(U), U != Z, not diffchoice3(X,Y,Z,U).
diffchoice3(X,Y,Z,U) :- chosen3(X,Y,Z,Up), U != Up, dom3(U).
% stop when label has been changed:
ent(E,X,Y,Z,s) :- ent(E,X,Y,Z,do), cls(X,Y,Z,0).
% hard constraint for not returning to original entity:
:- ent(E,X,Y,Z,do), ent(E,X,Y,Z,o).
% auxiliary predicate to avoid unsafe negation in the hard constraint below:
entAux(E) :- ent(E,X,Y,Z,s).
% hard constraint for not computing models where label does not change:
:- ent(E,X,Y,Z,o), not entAux(E).
% collecting explanatory changes per argument:
expl(E,1,X) :- ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), X != Xp.
expl(E,2,Y) :- ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), Y != Yp.
expl(E,3,Z) :- ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), Z != Zp.
If we run this program with DLV-Complex, we obtain the following three models; for which we do not show (most of) the original program facts, as can be requested from DLV:
{ent(e,0,1,1,o), ent(e,0,1,1,tr), chosen1(0,1,1,1), chosen2(0,1,1,0), chosen3(0,1,1,0),
 ent(e,0,0,1,do), ent(e,0,0,1,tr), ent(e,0,0,1,s), diffchoice3(0,1,1,1), diffchoice2(0,1,1,1),
 diffchoice1(0,1,1,0), entAux(e), expl(e,2,1)}

{ent(e,0,1,1,o), ent(e,0,1,1,tr), chosen1(0,1,1,1), chosen2(0,1,1,0), chosen3(0,1,1,0),
 ent(e,0,1,0,do), ent(e,0,1,0,tr), chosen1(0,1,0,1), chosen2(0,1,0,0), chosen3(0,1,0,1),
 ent(e,0,0,0,do), ent(e,0,0,0,tr), ent(e,0,0,0,s), diffchoice3(0,1,0,0), diffchoice3(0,1,1,1),
 diffchoice2(0,1,0,1), diffchoice2(0,1,1,1), diffchoice1(0,1,0,0), diffchoice1(0,1,1,0),
 entAux(e), expl(e,2,1), expl(e,3,1)}

{ent(e,0,1,1,o), ent(e,0,1,1,tr), chosen1(0,1,1,1), chosen2(0,1,1,0), chosen3(0,1,1,0),
 ent(e,1,1,1,do), ent(e,1,1,1,tr), chosen1(1,1,1,0), chosen2(1,1,1,0), chosen3(1,1,1,0),
 ent(e,1,0,1,do), ent(e,1,0,1,tr), ent(e,1,0,1,s), diffchoice3(0,1,1,1), diffchoice3(1,1,1,1),
 diffchoice2(0,1,1,1), diffchoice2(1,1,1,1), diffchoice1(0,1,1,0), diffchoice1(1,1,1,1),
 entAux(e), expl(e,1,0), expl(e,2,1)}
These models correspond to the counterfactual entities e_7 = ⟨0, 0, 1⟩, e_8 = ⟨0, 0, 0⟩, and e_4 = ⟨1, 0, 1⟩, resp., in Example 3.1.
Notice that the program, except for the fact ent(e,0,1,1,o) giving the original entity at hand, is completely generic, and can be used with any input entity that has been classified with label 1. (If the initial label is 0 instead, no interventions would be triggered, and the only model would correspond to the initial entity.) We could remove that fact from the program, obtaining a program file, say program2.txt, and run instead
C:\DLV>dlv.exe ent.txt program2.txt > outputFile.txt
where ent.txt is the file containing only the fact ent(e,0,1,1,o).
In the previous example, the classifier was given as an input/output relation, that is, as a set of facts inserted directly in the program. In other situations, we may have the classifier invoked from the program as an external predicate. In others, the classifier can be specified directly in the program, as shown in Example 2.1.
Our CIPs compute all the counterfactual versions (or counterfactual explanations) of the original entity e. Each counterfactual version is represented (or characterized) by at least one of the stable models of the CIP; one that contains the counterfactual version of e annotated with "s". There may be more than one stable model associated to a counterfactual version, due to the use of the choice operator: different choices may end up leading to the same counterfactual entity.
The counterfactual explanations obtained through the CIP are not necessarily s-explanations or c-explanations (c.f. Section 3), as Example 5.2 shows. The CIPs presented so far have (only) minimal models with respect to set-inclusion of the extensions of full predicates, whereas when we compare explanations, we do it at the “attribute (or feature, or argument) level”. Of course, s-explanations and c-explanations are all included among the counterfactual explanations, and are represented by stable models of the CIP. We will consider this point in Section 5.3.
Example 5.3
(example 3.1 continued) To show the different kinds of counterfactual versions of an original entity, let us replace the classifier we had before with the one shown in Table 2.
entity (id) | F_1 | F_2 | F_3 | L
---|---|---|---|---
e_1 | 0 | 1 | 1 | 1
e_2 | 1 | 1 | 1 | 1
e_3 | 1 | 1 | 0 | 0
e_4 | 1 | 0 | 1 | 1
e_5 | 1 | 0 | 0 | 0
e_6 | 0 | 1 | 0 | 1
e_7 | 0 | 0 | 1 | 0
e_8 | 0 | 0 | 0 | 1

Table 2: Classifier
The labels that changed with respect to Table 1 are those of e_3, e_4, e_5 and e_8. In this case, we have three counterfactual versions of e = e_1: (a) e_7 = ⟨0, 0, 1⟩, which is a c-explanation; only one value is changed to switch the label to 0. (b) e_3 = ⟨1, 1, 0⟩ is an s-explanation, but not a c-explanation; it shows two changes, but not the one for F_2. (c) e_5 = ⟨1, 0, 0⟩ is neither an s- nor a c-explanation; its changes include those for (a) and those for (b).
5.2 Complexity of CIPs
The complexity result obtained in Section 4 makes us wonder whether using ASP-programs for specifying and computing the x-Resp score is an overkill, or, more precisely, whether they have the right and necessary expressive power and complexity to confront our problem. In fact they do. It is known that reasoning with disjunctive answer-set programs (DASP) falls in the second level of the polynomial hierarchy (in data complexity) [Dantsin et al. (2001)], and slightly above that by the use of weak constraints [Leone et al. (2006)] that we will use in Section 5.3. However, CIPs have the property of being head-cycle free (HCF), which brings down the complexity of a DASP to the first level of the polynomial hierarchy [Dantsin et al. (2001)]. This is in line with the result in Theorem 1.
It is easy to check that CIPs are HCF (c.f. Section 2): the ground chains in the directed graph associated to a CIP are, due to rules P2. and P3., of the form ent(e, x̄, o) → ent(e, x̄, tr) → ent(e, x̄′, do) → ent(e, x̄′, tr) → ⋯, with each intervened tuple of values differing from the previous one. They never create a cycle through two atoms in the head of a ground instantiation of the disjunctive rule.
One can also see the HCF property from the fact that the CIPs become repair-programs [Bertossi (2011)] for a database w.r.t. an integrity constraint, actually a denial constraint. Denial constraints are common in databases, and their repairs and repair-programs have been investigated [Caniupan and Bertossi (2010), Bertossi (2021)]. C.f. [Bertossi (2011)] for additional references.
As a consequence of being HCF, a CIP can be transformed, by means of the shift operation, into an equivalent non-disjunctive ASP (c.f. Section 2).
Example 5.4
(example 5.1 continued) The disjunctive rule in Example 5.1 can be replaced by the three rules:
ent(E,Xp,Y,Z,do) :- ent(E,X,Y,Z,tr), cls(X,Y,Z,1), dom1(Xp), dom2(Yp), dom3(Zp),
        X != Xp, Y != Yp, Z != Zp,
        chosen1(X,Y,Z,Xp), chosen2(X,Y,Z,Yp), chosen3(X,Y,Z,Zp),
        not ent(E,X,Yp,Z,do), not ent(E,X,Y,Zp,do).
ent(E,X,Yp,Z,do) :- ent(E,X,Y,Z,tr), cls(X,Y,Z,1), dom1(Xp), dom2(Yp), dom3(Zp),
        X != Xp, Y != Yp, Z != Zp,
        chosen1(X,Y,Z,Xp), chosen2(X,Y,Z,Yp), chosen3(X,Y,Z,Zp),
        not ent(E,Xp,Y,Z,do), not ent(E,X,Y,Zp,do).
ent(E,X,Y,Zp,do) :- ent(E,X,Y,Z,tr), cls(X,Y,Z,1), dom1(Xp), dom2(Yp), dom3(Zp),
        X != Xp, Y != Yp, Z != Zp,
        chosen1(X,Y,Z,Xp), chosen2(X,Y,Z,Yp), chosen3(X,Y,Z,Zp),
        not ent(E,Xp,Y,Z,do), not ent(E,X,Yp,Z,do).
The resulting program has the same answer-sets as the original program.
5.3 C-explanations and maximum responsibility
As discussed at the end of Section 5.1, an intervened entity of the form ent(e, x̄′, s), that is, one representing a counterfactual explanation, may not correspond to an s- or a c-explanation. We are interested in obtaining the latter, and only them, because: (a) they are our "best explanations", and (b) they are used to define and compute the maximum x-Resp scores.
Moving towards computing x-Resp scores, notice that in each of the stable models of the CIP, we can collect the corresponding counterfactual explanation for e's classification as the set of atoms of the form expl(e, i, x_i) in the model. This can be done within an ASP system such as DLV, which allows set construction and aggregation, in particular, counting [Leone et al. (2006)]. Actually, counting comes in handy to obtain the cardinality of this set, by means of:
invResp(E, M) :- #count{I : expl(E, I, _)} = M.      (4)
For each model of the CIP, we will obtain such a value M that shows the number of changes of feature values that lead to the associated counterfactual explanation. Notice that, in each of the models that correspond to c-explanations, these, now minimum, values will be the same, say m, and can be used to compute the responsibility of a feature value v_i in e, as follows: for a value v_i = F_i(e) that is changed in a c-explanation,
x-Resp_e(v_i) := 1/m.      (5)
Example 5.5
(example 5.2 continued) Let us add to the CIP above the rule:
% computing the inverse of x-Resp:
invResp(E,M) :- #count{I: expl(E,I,_)} = M, #int(M), E = e.
By running DLV-Complex with the new program, we obtain the models above extended with atoms representing the changes in arguments of the original entity (we omit most of the old atoms):
{ent(e,0,1,1,o), ... , ent(e,0,0,1,s), expl(e,2,1), invResp(e,1)}
{ent(e,0,1,1,o), ..., ent(e,0,0,0,s), expl(e,2,1), expl(e,3,1), invResp(e,2)}
{ent(e,0,1,1,o), ..., ent(e,1,0,1,s), expl(e,1,0), expl(e,2,1), invResp(e,2)}
As before, we obtain three models, and each of them shows, in the last atom, the number of changes that were made to obtain the corresponding counterfactual explanation.
For example, for the last model, say M_3, corresponding to entity ⟨1, 0, 1⟩, we obtain invResp(e, 2). Similarly for the second one, corresponding to entity ⟨0, 0, 0⟩. The first model corresponds to the counterfactual entity ⟨0, 0, 1⟩ that is a c-explanation, which is shown by the minimum value "1" that predicate invResp takes in its second argument among all the stable models. We can see that the first model is the one corresponding to a maximum-responsibility feature value.
In order to obtain only the models associated to c-explanations, we add weak program constraints to the CIP. They can be violated by a stable model of the program (unlike strong program constraints), but the number of violations has to be minimized. In this case, for i = 1, …, n, we add to the CIP (this notation follows the standard in [Calimeri et al. (2020), Alviano et al. (2017)]):
:~ ent(E, x_1, …, x_n, o), ent(E, x′_1, …, x′_n, s), x_i ≠ x′_i.      (6)
Only the stable models representing an intervened version of e with a minimum number of value discrepancies with e will be kept.
Example 5.6
(example 5.5 continued) The best explanations, i.e. the c-explanations, can be obtained by adding weak program constraints to the combined CIP above:
% weak constraints for minimizing number of changes:
:~ ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), X != Xp.
:~ ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), Y != Yp.
:~ ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), Z != Zp.
If we run DLV-Complex with the extended program, we obtain a single model, corresponding to the counterfactual entity ⟨0, 0, 1⟩:
Best model:
{ent(e,0,1,1,o), ent(e,0,1,1,tr), chosen1(0,1,1,1), chosen2(0,1,1,0), chosen3(0,1,1,0),
 ent(e,0,0,1,do), ent(e,0,0,1,tr), ent(e,0,0,1,s), diffchoice3(0,1,1,1), diffchoice2(0,1,1,1),
 diffchoice1(0,1,1,0), expl(e,2,1), entAux(e), invResp(e,1)}
This model shows that the counterfactual entity ⟨0, 0, 1⟩ has one change, in the second attribute, w.r.t. the original entity. This new entity gives a minimum inverse responsibility of 1 to the original value in the second argument of ent, which leads, via (5), to its (maximum) responsibility 1.
Example 5.7
(example 2.1 continued) We now present the CIP for the classifier based on the decision tree, in DLV-Complex notation. Notice that after the facts, which now do not include the classifier, we find the rule-based specification of the decision tree.
#include<ListAndSet>
% facts:
dom1(sunny). dom1(overcast). dom1(rain).
dom2(high). dom2(normal).
dom3(strong). dom3(weak).
ent(e,sunny,normal,weak,o).   % original entity at hand
% spec of the classifier:
cls(X,Y,Z,1) :- Y = normal, X = sunny, dom1(X), dom3(Z).
cls(X,Y,Z,1) :- X = overcast, dom2(Y), dom3(Z).
cls(X,Y,Z,1) :- Z = weak, X = rain, dom2(Y).
cls(X,Y,Z,0) :- dom1(X), dom2(Y), dom3(Z), not cls(X,Y,Z,1).
% transition rules:
ent(E,X,Y,Z,tr) :- ent(E,X,Y,Z,o).
ent(E,X,Y,Z,tr) :- ent(E,X,Y,Z,do).
% counterfactual rule:
ent(E,Xp,Y,Z,do) v ent(E,X,Yp,Z,do) v ent(E,X,Y,Zp,do) :-
        ent(E,X,Y,Z,tr), cls(X,Y,Z,1), dom1(Xp), dom2(Yp), dom3(Zp),
        X != Xp, Y != Yp, Z != Zp,
        chosen1(X,Y,Z,Xp), chosen2(X,Y,Z,Yp), chosen3(X,Y,Z,Zp).
% definitions of chosen operators:
chosen1(X,Y,Z,U) :- ent(E,X,Y,Z,tr), cls(X,Y,Z,1), dom1(U), U != X, not diffchoice1(X,Y,Z,U).
diffchoice1(X,Y,Z,U) :- chosen1(X,Y,Z,Up), U != Up, dom1(U).
chosen2(X,Y,Z,U) :- ent(E,X,Y,Z,tr), cls(X,Y,Z,1), dom2(U), U != Y, not diffchoice2(X,Y,Z,U).
diffchoice2(X,Y,Z,U) :- chosen2(X,Y,Z,Up), U != Up, dom2(U).
chosen3(X,Y,Z,U) :- ent(E,X,Y,Z,tr), cls(X,Y,Z,1), dom3(U), U != Z, not diffchoice3(X,Y,Z,U).
diffchoice3(X,Y,Z,U) :- chosen3(X,Y,Z,Up), U != Up, dom3(U).
% Not going back to initial entity (program constraint):
:- ent(E,X,Y,Z,do), ent(E,X,Y,Z,o).
% stop when label has been changed:
ent(E,X,Y,Z,s) :- ent(E,X,Y,Z,do), cls(X,Y,Z,0).
% collecting changed values for each feature:
expl(E,outlook,X) :- ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), X != Xp.
expl(E,humidity,Y) :- ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), Y != Yp.
expl(E,wind,Z) :- ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), Z != Zp.
% auxiliary predicate to avoid unsafe negation in the constraint below:
entAux(E) :- ent(E,X,Y,Z,s).
% discard models where label does not change:
:- ent(E,X,Y,Z,o), not entAux(E).
% computing the inverse of x-Resp:
invResp(E,M) :- #count{I: expl(E,I,_)} = M, #int(M), E = e.
Two counterfactual versions of $\mathbf{e}$ are obtained, as represented by the two essentially different stable models of the program, and determined by the atoms with the annotation s (we keep in them only the most relevant atoms, omitting initial facts and choice-related atoms):
{ent(e,sunny,normal,weak,o), cls(sunny,normal,strong,1), cls(sunny,normal,weak,1),
 cls(overcast,high,strong,1), cls(overcast,high,weak,1), cls(rain,high,weak,1),
 cls(overcast,normal,weak,1), cls(rain,normal,weak,1), cls(overcast,normal,strong,1),
 cls(sunny,high,strong,0), cls(sunny,high,weak,0), cls(rain,high,strong,0),
 cls(rain,normal,strong,0), ent(e,sunny,high,weak,do), ent(e,sunny,high,weak,tr),
 ent(e,sunny,high,weak,s), expl(e,humidity,normal), invResp(e,1)}

{ent(e,sunny,normal,weak,o), cls(sunny,normal,strong,1), ..., cls(rain,normal,strong,0),
 ent(e,rain,normal,strong,do), ent(e,rain,normal,strong,tr), ent(e,rain,normal,strong,s),
 expl(e,outlook,sunny), expl(e,wind,weak), invResp(e,2)}
The first model shows the classifier as a set of atoms and, in its last line, that ent(e,sunny,high,weak,s) is a counterfactual version, with label 0, of the original entity $\mathbf{e}$, obtained from the latter by changing the value of feature humidity, leading to an inverse score of 1. The second model shows a different counterfactual version of $\mathbf{e}$, namely ent(e,rain,normal,strong,s), now obtained by changing values for features outlook and wind, leading to an inverse score of 2.
Let us now add, at the end of the program, the following weak constraints (labeled with (*)):
% Weak constraints to minimize number of changes: (*)
:~ ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), X != Xp.
:~ ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), Y != Yp.
:~ ent(E,X,Y,Z,o), ent(E,Xp,Yp,Zp,s), Z != Zp.
If we run the program with them, the number of changes is minimized, and we basically obtain only the first model above, corresponding to the counterfactual entity ent(e,sunny,high,weak,s).
As can be seen in the light of this example, more complex rule-based classifiers could be defined inside a CIP. It is also possible to invoke the classifier as an external predicate, as the following example shows.
Example 5.8
(example 5.7 continued) The program below calls the classifier through a predicate that has an external extension, as defined by a Python program.
The program has the same facts and the same rules as the program in Example 5.7, except for a new rule that defines the classification predicate, cls here (and in the general formulation in Section 5.1), and replaces the internal specification of the classifier:
cls(X,Y,Z,L) :- &classifier(X,Y,Z;L), dom1(X), dom2(Y), dom3(Z).
Here, the atom &classifier(X,Y,Z;L) corresponds to the invocation of the external classifier with parameters X,Y,Z, which gets an external value back through variable L. The program was run with the version of DLV2 for Linux that supports interaction with Python, which can be downloaded from:
https://dlv.demacs.unical.it/publications#h.cgg9mbi41jq9
The program in Python that specifies the classifier is very simple, and it can be invoked in combination with DLV2, as follows:
sudo ./dlv2-python-linux-x86_64 program_dlv2.txt def_class.py
Here, program_dlv2.txt is the CIP, and def_class.py is the simple Python program that specifies the classifier, namely:
def classifier(X,Y,Z):
    if (X == "sunny") and (Y == "normal"):
        return 1
    if (X == "overcast"):
        return 1
    if (X == "rain") and (Z == "weak"):
        return 1
    else:
        return 0
We obtain as answer set the first one in Example 5.7.
6 Semantic Knowledge
Counterfactual interventions in the presence of semantic conditions require special consideration. As the following example shows, not every intervention, or combination of interventions, may be admissible [Bertossi and Geerts (2020)]. In these situations, declarative approaches to counterfactual interventions become particularly useful.
Example 6.1
A moving company makes automated hiring decisions based on feature values in applicants’ records. Mary, represented by one such record, applies, but is denied the job, i.e. the classifier returns label 0. To explain the decision, we can, hypothetically, change Mary’s gender, from female to male, obtaining a new record for which we now observe label 1. Thus, her value for gender can be seen as a counterfactual explanation for the initial decision.
As an alternative, we might keep the value of gender and counterfactually change other feature values. However, we might be constrained or guided by an ontology containing, e.g., a denial semantic constraint (expressed on specific positions of the record) that prohibits someone over 80 from being qualified as fit for lifting weight. We could also have a rule specifying that men who weigh over 100 pounds and are younger than 70 are automatically qualified to lift weight.
In situations like this, we could add to the CIPs we had before: (a) program constraints that prohibit certain models, e.g. a constraint discarding counterfactual entities in which someone over 80 is qualified as fit for lifting weight; and (b) additional rules, e.g. a rule capturing the automatic qualification above, that may automatically generate additional interventions. Both are sketched below. In a similar way, one could accommodate certain preferences using weak program constraints.
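Just for the gist, and only as a sketch in DLV notation, assume a hypothetical record layout ent(E,Gender,Weight,Age,Lift,Ann) for the hiring example, with Lift taking values yes/no, Ann the usual annotation, integer ages and weights, and a suitable #maxint; all predicate and attribute names here are illustrative, not part of the programs above.

% (a) no intervened entity may qualify someone over 80 as fit to lift weight:
:- ent(E,G,W,A,yes,do), A > 80.
% (b) men weighing over 100 pounds and younger than 70 are automatically
%     qualified to lift weight (this may trigger an additional intervention):
ent(E,m,W,A,yes,do) :- ent(E,m,W,A,no,tr), W > 100, A < 70.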
Causality and responsibility in databases in the presence of integrity constraints was introduced and investigated in [Bertossi and Salimi (2017b)].
Example 6.2
(example 5.7 continued) It might be the case that, in a particular region, some combinations of weather conditions are never possible, e.g. raining with a strong wind at the same time. When producing counterfactual interventions for the entity $\mathbf{e}$ at hand, such a combination should be prohibited. This can be done by imposing a hard program constraint, which we add to the program in Example 5.7:
% hard constraint disallowing a particular combination (**)
:- ent(E,rain,X,strong,tr).
As the previous example shows, we can easily impose constraints that make the counterfactual entities, or equivalently, the associated explanations, actionable [Ustun et al. (2019), Karimi et al. (2020b)]. As mentioned in Section 1, for the loan application example, we could impose a hard program constraint (i.e. add it to a CIP) of the form
\[ \leftarrow\ \ \mathit{Ent}(\mathbf{e},\bar{x}',\mathsf{do}),\ \mathit{Ent}(\mathbf{e},\bar{x},\mathsf{o}),\ x'_{\mathit{Age}} < x_{\mathit{Age}}. \qquad\qquad (7) \]
which prevents decreasing an applicant’s age.
Logic-based specifications also allow for the compilation of constraints into rules. For example, instead of using a hard constraint such as (7), we could directly impose the condition on a counterfactual age in the disjunctive counterfactual rule P3., by restricting the domain and variables for the Age feature so that only values larger than the original one are offered (a sketch is given below). On this basis, one could only increase the age. If this intervention leads to a successful counterfactual entity (i.e. with a positive label), we could tell the applicant that he/she has to wait to possibly succeed with the loan application.
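Only as an illustration, and following the style of the counterfactual rule in Example 5.7, for a hypothetical loan record ent(E,X,Y,Age,Ann) whose third feature is the age (the domain and chosen predicates for it are ours), the rule could become:

ent(E,Xp,Y,Age,do) v ent(E,X,Yp,Age,do) v ent(E,X,Y,Agep,do) :- ent(E,X,Y,Age,tr),
        cls(X,Y,Age,1), dom1(Xp), dom2(Yp), domAge(Agep),
        X != Xp, Y != Yp, Agep > Age,
        chosen1(X,Y,Age,Xp), chosen2(X,Y,Age,Yp), chosenAge(X,Y,Age,Agep).

The only difference with the original rule is the comparison Agep > Age, which replaces the inequality Agep != Age, so that a counterfactual age can only be larger than the original one.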
We could also think of explicitly specifying actionable counterfactual entities, starting with a rule that introduces a new annotation, a, standing for “actionable”, and defines an entity as actionable if it is a counterfactual (final) entity that satisfies some extra conditions (see the sketch below).
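A minimal sketch of such a rule, in the notation of Example 5.7 and with a purely illustrative actionability condition (the third attribute, an age, may not decrease):

% counterfactual (final) entities that do not decrease the age are actionable:
ent(E,X,Y,Agep,a) :- ent(E,X,Y,Agep,s), ent(E,Xo,Yo,Age,o), Agep >= Age.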
Several possibilities offer themselves in this direction. All of them require simple, symbolic changes in the overall specification. Doing something similar with a purely procedural approach would be much more complex, and would require modifying the underlying code.
Another situation where not all interventions are admissible occurs when features take continuous values, and their domains have to be discretized. The common way of doing this, namely the combination of bucketization and one-hot-encoding, leads to the natural and necessary imposition of additional constraints on interventions, as we will show. Through bucketization, a feature range is discretized by splitting it into finitely many, say $N$, usually non-overlapping intervals. This makes the feature basically categorical (each interval becoming a categorical value). Next, through one-hot-encoding, the original feature is represented as a vector of length $N$ of indicator functions, one for each categorical value (intervals here) [Bertossi et al. (2020)]. In this way, the original feature gives rise to $N$ binary features. For example, a continuous feature “External Risk Estimate” (ERE) could be split into, say, four buckets covering its range. Accordingly, if the ERE value of an entity $\mathbf{e}$ falls into the second bucket, then, after one-hot-encoding, this value is represented as the vector $\langle 0,1,0,0\rangle$.
In a case like this, it is clear that counterfactual interventions are constrained by the assumptions behind bucketization and one-hot-encoding. For example, the indicator vector cannot be updated into one with two components set to 1, which would mean that the feature value for the entity falls into two different intervals at the same time. Bucketization and one-hot-encoding can thus make use of program constraints that prohibit such combinations (see the sketch below). Of course, admissible interventions on predicate ERE could be easily handled with a disjunctive rule like that in P3., but without the “transition” annotation tr. However, the ERE record is commonly a component, or a sub-record, of a larger record containing all the feature values for an entity [Bertossi et al. (2020)]. Hence the need for a more general and uniform form of specification.
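For instance, and only as a sketch, if the ERE sub-record of an entity is kept in a hypothetical predicate ere(E,B1,B2,B3,B4,Ann), with one 0/1 indicator per bucket, the admissible combinations can be enforced with program constraints such as:

% no two bucket indicators may be set at the same time:
:- ere(E,1,1,_,_,tr).   :- ere(E,1,_,1,_,tr).   :- ere(E,1,_,_,1,tr).
:- ere(E,_,1,1,_,tr).   :- ere(E,_,1,_,1,tr).   :- ere(E,_,_,1,1,tr).
% at least one bucket indicator must be set:
:- ere(E,0,0,0,0,tr).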
Here we are considering the simple scenario in which the values are treated as unordered, categorical (binary) values. In some applications of bucketization and one-hot-encoding, one assumes and takes advantage of an underlying order inherited from the values in the buckets. Such an order could be adopted and brought into this framework by using additional rules that define it; a minimal sketch is given below. Developing the details is somewhat outside the scope of this work.
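Just to convey the idea, and with hypothetical bucket constants ere1, ..., ere4 listed from lower to higher ERE values, the order could be specified with a few facts and a transitive rule:

bucket(ere1). bucket(ere2). bucket(ere3). bucket(ere4).
precedes(ere1,ere2). precedes(ere2,ere3). precedes(ere3,ere4).
leq(B,B)   :- bucket(B).
leq(B1,B2) :- precedes(B1,B2).
leq(B1,B3) :- leq(B1,B2), precedes(B2,B3).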
7 Beyond Binary Features and Uncertainty
7.1 Expectation over interventions for the x-Resp score
The x-Resp introduced in Section 3 could be considered as a first approach to quantifying the relevance of a feature value. However, as the following example shows, we might want to go one step further.
Example 7.1
Consider a simple entity $\mathbf{e}$ with two features, $F_1$ and $F_2$, where $F_2$ has domain values $v_1, \ldots, v_K$. Assume that changing only the original first feature value does not change the label provided by the classifier, and that neither does additionally changing the second feature value, except when using the last possible value, $v_K$, for $F_2$. In this case, the x-Resp score of the first feature value is $\frac{1}{2}$, despite the fact that almost all interventions on the second feature value do not change the label.
A similar phenomenon would appear if we had a large domain for $F_2$, say with $L$ values for a large $L$, and changing the second feature value alone did not switch the label, except when using the last of those values. In this case, the second feature value is a counterfactual cause with explanation responsibility $1$, despite the fact that most of the interventions on it do not switch the label.
A way to compensate for this could be to take the average of the labels over all possible interventions.
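To illustrate, as a sketch only and under the assumptions of the second scenario above, suppose that there are $m$ admissible alternative values for the feature and exactly one of them switches the label from $1$ to $0$ (the symbol $m$ is ours). Then the average counterfactual label is
\[
\frac{(m-1)\cdot 1 \;+\; 1\cdot 0}{m} \;=\; 1-\frac{1}{m},
\]
so the average drop in the label is only $1/m$, which vanishes as $m$ grows, whereas the deterministic x-Resp score of that feature value stays at $1$.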
In order to extend the definition of the x-Resp score by considering all possible interventions, we may consider the average of counterfactual labels over a given population, which would assume that all entities are equally likely. In more general terms, we may assume that the underlying entity population, $\mathcal{E}$, has a probability distribution, $P$, which we can use to express the extended x-Resp in terms of expected values, as follows.
Consider $\mathbf{e}$, an entity under classification, for which $L(\mathbf{e}) = 1$, and a feature $F \in \mathcal{F}$. Assume we have:
1. $\Gamma \subseteq \mathcal{F} \smallsetminus \{F\}$, a set of features that may end up accompanying feature $F$.
2. $\bar{w} = (w_G)_{G \in \Gamma}$, with $w_G \in \mathit{Dom}(G)$, i.e. new values for the features in $\Gamma$.
3. $\mathbf{e}' := \mathbf{e}[\Gamma := \bar{w}]$, i.e. reset $\mathbf{e}$'s values for $\Gamma$ as in $\bar{w}$.
4. $L(\mathbf{e}') = L(\mathbf{e})$, i.e. there is no label change with $\mathbf{e}'$ (but maybe with an extra change for $F$, in the next item).
5. There is $v \in \mathit{Dom}(F)$, with $\mathbf{e}'' := \mathbf{e}'[F := v]$ and $L(\mathbf{e}'') \neq L(\mathbf{e})$.
As in Definition 3.3 and the paragraph that follows it, if conditions 1.–5. hold, $F(\mathbf{e})$, the value of $F$ in $\mathbf{e}$, is an actual causal explanation for $L(\mathbf{e}) = 1$, with “contingency set” $\Gamma(\mathbf{e})$, where $\Gamma(\mathbf{e})$ is the projection of $\mathbf{e}$ on $\Gamma$.
In order to define the “local” responsibility score, we make $\mathbf{e}''$ vary randomly under conditions 1.–5.:
\[ \mathit{Resp}(\mathbf{e},F,\Gamma,\bar{w}) \ := \ \frac{L(\mathbf{e}) \ - \ \mathbb{E}\big(L(\mathbf{e}'') \ \big|\ \mathbf{e}''_{\mathcal{F} \smallsetminus \{F\}} = \mathbf{e}'_{\mathcal{F} \smallsetminus \{F\}}\big)}{1 + |\Gamma|} \qquad\qquad (8) \]
If, as so far, label $1$ is what has to be explained, then $L(\mathbf{e}) = 1$, and the numerator is a number between $0$ and $1$. Here, $\Gamma$ is fixed. Now we can minimize its size, obtaining the (generalized) responsibility score as the maximum local value; everything relative to the distribution $P$:
\[ \mathit{Resp}^{P}(\mathbf{e},F) \ := \ \max \ \{\, \mathit{Resp}(\mathbf{e},F,\Gamma,\bar{w}) \ :\ \Gamma, \bar{w} \mbox{ satisfy conditions 1.–5., with } |\Gamma| \mbox{ minimum} \,\} \qquad (9) \]
This score was introduced, with less motivation and fewer details, and definitely not on a causal basis, in [Bertossi et al. (2020)], where experiments are shown, and different probability distributions are considered.
7.2 Domain knowledge under uncertainty
Different probability distributions on the entity population can be considered in the definition and computation of the generalized responsibility score (c.f. Section 7.1). A natural choice is the uniform distribution, which gives equal probability, $1/|\mathcal{E}|$, to each entity in the population $\mathcal{E}$. Another natural choice is the product distribution, obtained, under an independence assumption, as the product of given marginal distributions, one for each feature.
We can also assume we have a sample $S \subseteq \mathcal{E}$ from the entity population that is used as a proxy for the latter. In this case, the distributions above become empirical distributions. In the uniform case, an entity receives probability $1/|S|$ if it belongs to $S$, and $0$ otherwise. In the case of the product distribution, the marginals are estimated as the relative frequencies of the feature values in $S$. A discussion on the use of these distributions in the context of explanation scores can be found in [Bertossi et al. (2020)].
In general, one can say that the uniform distribution may not be appropriate for capturing correlations between feature values: certain combinations of feature values may be more likely than others, or certain correlations among them may exist. This situation is aggravated by the product distribution, due to the independence assumption. For these reasons, an empirical distribution may be better for this purpose.
In any case, having chosen a distribution $P$ on the population to work with, in particular to compute the expectations needed for the responsibility score in (8), one could consider modifying the probabilities in the hope of capturing correlations and logical relationships between feature values. In particular, one could introduce constraints that prohibit certain combinations of values, in the spirit of denial constraints in databases, but in this case admitting positive and negative atoms. For example, with propositional features Old, standing for “is older than 20”, and OverDr, standing for “gets an account overdraft above $50K”, we may want to impose the prohibition $\neg(\mathit{OverDr} \wedge \neg \mathit{Old})$, standing for “nobody under 20 gets an overdraft above $50K”.
These constraints, which are satisfied or violated by a single entity at a time, are of the form:
\[ \chi:\ \ \neg\,\big(F_{i_1} = v_1 \ \wedge \cdots \wedge\ F_{i_k} = v_k \ \wedge\ F_{j_1} \neq w_1 \ \wedge \cdots \wedge\ F_{j_l} \neq w_l\big) \qquad\qquad (10) \]
where $F_i = v$ and $F_j \neq w$ mean that feature $F_i$ takes value $v$, and feature $F_j$ takes a value different from $w$, resp.
The event associated to $\chi$ is $E_\chi := \{\mathbf{e} \in \mathcal{E} : \mathbf{e} \models \chi\}$, where $\mathbf{e} \models \chi$ has the obvious meaning of satisfaction of $\chi$ by entity $\mathbf{e}$. In order to accommodate the constraint, given the initial probability space $\langle \mathcal{E}, P\rangle$, we can redefine the probability as follows. For $\mathbf{e} \in \mathcal{E}$,
\[ P^{\chi}(\mathbf{e}) \ := \ P(\mathbf{e} \mid E_\chi) \ = \ \begin{cases} \ \dfrac{P(\mathbf{e})}{P(E_\chi)} & \mbox{if } \ \mathbf{e} \models \chi,\\[4pt] \ \ 0 & \mbox{otherwise} \end{cases} \qquad\qquad (11) \]
If $\chi$ is consistent with the population, i.e. satisfiable in $\mathcal{E}$, the conditional distribution is well-defined. Now, the probability of $\chi$'s violation set becomes $P^{\chi}(\mathcal{E} \smallsetminus E_\chi) = 0$.
This definition can be extended to finite and consistent sets, $\Theta$, of constraints, by using in (11), instead of $\chi$, the conjunction of the constraints in $\Theta$.
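As a small illustration of (11), a sketch only, assume $P$ is the uniform distribution on a population of $N$ entities, of which $m \geq 1$ satisfy $\chi$ (the symbols $N$ and $m$ are ours). Then every satisfying entity receives probability
\[
P^{\chi}(\mathbf{e}) \;=\; \frac{P(\mathbf{e})}{P(E_\chi)} \;=\; \frac{1/N}{m/N} \;=\; \frac{1}{m},
\]
while every entity violating $\chi$ receives probability $0$.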
Of course, one could go beyond constraints of the form (10), applying the same ideas, and consider any propositional formula that is intended to be evaluated on a single entity at a time, as opposed to considering combinations of feature values for different entities.
The resulting modified distribution that accommodates the constraints could be used in the computation of any of the scores expressed in terms of expected values or in probabilistic terms.
An alternative approach consists in restricting the (combinations of) interventions in the definition and computation of the responsibility score (and of any other score based on counterfactual interventions, for that matter), as suggested in Section 6. It is worth performing experimental comparisons between the two approaches.
8 Related Work
In this work we consider only local methods, in that we are not trying to explain the overall behavior of a classifier, but a particular output on the basis of individual feature values. We also consider model-agnostic methods that can be applied to black-box models, i.e. without the need to access the internal components of the model. Of course, these approaches can also be applied to open models, i.e. those that make all their components transparent for analysis. Actually, in some cases, it is possible to take computational advantage of this additional source of knowledge (more on this below in this section).
There are several approaches to the explanation of outcomes from classification models. These methods can be roughly categorized as those that provide attribution scores and those that provide sufficient explanations. Those in the former class assign a number to each feature value that reflects its relevance for the outcome at hand. x-Resp falls in this category. For comparison with other approaches, we repeat that, in the case of x-Resp, one counterfactually modifies feature values to see if the outcome changes. The score is computed from those changes.
Counterfactual changes are omnipresent in attribution-score-based methods, and they can be used to consider alternative entities to the one under explanation, and a notion of distance between those alternatives and the latter [Martens and Provost (2014), Wachter et al. (2017), Russell (2019), Karimi et al. (2020a), Datta et al. (2016), Bertossi et al. (2020)]. Sometimes the counterfactuals are less explicit, as with the popular Shap score [Lundberg et al. (2020)], which is based on the Shapley value of game theory. It can be seen as a counterfactual-based score in that all possible combinations of feature values (and then, most of the time, departing from the initial entity) are considered in a complex aggregation (an average or expected value).
The Shap score is designed as a model-agnostic method. However, for a large class of classifiers whose internal components can be used for the score computation, Shap becomes computable in polynomial time, while its general computational complexity is #P-hard [Arenas et al. (2021), Van den Broeck et al. (2021)]. As we showed in this paper, the computation of x-Resp is also intractable. Actually, the generalized, probabilistic extension of x-Resp (c.f. Section 7.1) can, for certain classifiers and probability distributions on the underlying population, be #P-hard [Bertossi et al. (2020)]. An investigation of classes of classifiers for which the x-Resp score (deterministic as in this work, or probabilistic) can be computed in polynomial time is still open.
The popular LIME score [Ribeiro et al. (2016)] is also an attribution score. It appeals to an explainable model that locally approximates the agnostic model around the entity under classification. From the resulting approximation feature scores can be computed.
Sufficient-explanation methods try to identify those (combinations of) feature values that alone determine the outcome at hand, in the sense that, by keeping those values and possibly changing all the others, the outcome remains the same [Ribeiro et al. (2018), Wang et al. (2021)]. One could say, in some sense, that the outcome is entailed by the values that appear in a sufficient explanation. Identifying those values is reminiscent of performing an abductive diagnosis task, as done with rule-based specifications [Eiter et al. (1997)]; actually, [Ribeiro et al. (2018)] does appeal to rule-based methods.
There are some approaches to logic-based explanations for classification. They mostly follow the sufficient-explanation paradigm we mentioned above. More specifically, it has been shown how to “reconstruct” certain classes of classifiers, e.g. random forests, Bayesian classifiers, and binary neural networks, as Boolean circuits [Shih et al. (2018), Shi et al. (2020), Choi et al. (2020)]. Once a circuit is available, one can use it, in particular, to obtain explanations for the outcomes of the model using methods that are, in essence, abductive [Darwiche and Hirth (2020)]. In this context, the work presented in [Ignatiev et al. (2019), Ignatiev (2019), Izza and Marques-Silva (2021)] is also relevant, in that logic-based encodings of neural networks, boosted trees, and random forests are proposed and exploited for explanation purposes. Abductive and SAT-based approaches are followed. Notice that abductive methods that generate sufficient explanations can also be the basis for score definitions and their computation. Just to convey the gist: if a feature value appears in a large number of sufficient explanations, then it could be assigned a large individual score.
To the best of our knowledge, none of the approaches described above, or others we are aware of, is based on logical specifications of the counterfactuals involved in the score definition and computation. Furthermore, ours are specifications that can easily adopt domain or otherwise desirable logical constraints in a seamless manner, and for combined use. Actually, the logic-based representations of complex classifiers that we just mentioned could be the starting point for the use of our approach. For example, a Boolean circuit can be represented as a set of rules that becomes the first component of a CIP that carries out the counterfactual analysis on that basis.
9 Discussion
This work is about interacting via ASP with possibly external classifiers, and reasoning about their potential inputs and outputs. The classifier is supposed to have been learned by some other means. In particular, this work is not about learning ASPs, which goes in quite a different direction [Law et al. (2019)].
In this work we have treated classifiers as black boxes that are represented by external predicates in the ASP. However, we have also considered the case of a classifier that is specified within the CIP by a set of rules, to define the classification predicate cls. This was the case of a deterministic decision tree. Basically, each branch from the root to a label can be represented by a rule, with the branching direction at intermediate nodes represented by values in literals in the rule body, and with the label in the rule head. Something similar can be done with Boolean circuits used as classifiers. Actually, it is possible to represent more involved classifiers as Boolean circuits (c.f. Section 8).
Our CIPs can be easily enhanced with different extensions. For example, the feature domains can be automatically specified and computed from training or test data [Bertossi et al. (2020)], or other sources. As done in [Bertossi et al. (2020)] for experimental purposes, and using purely procedural approaches there, it is possible in our ASP setting to restrict the sizes of the contingency sets, e.g. to be at most of a certain fixed size (c.f. Section 3). This can be easily done by adding a cardinality condition to the body of the disjunctive intervention rule (which is supported by DLV-Complex and DLV2); a sketch is given below. Doing this would not lead to any loss of best explanations (as long as they fall within the imposed bound), and may reduce the computational work.
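For instance, and only as a sketch using the predicates of Example 5.7, the same effect can be approximated with a program constraint on the number of collected changes, here with an illustrative bound of 2; alternatively, the corresponding cardinality condition could be placed directly in the body of the intervention rule.

% discard models in which more than two feature values are changed:
:- invResp(E,M), M > 2.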
It is also possible to extend CIPs to make them specify and compute the contingency sets (of feature values) that accompany a particular value that has been counterfactually changed (c.f. Section 3). This requires a set-building operation, which is provided by DLV-Complex. Doing this would allow us to obtain s-explanations, i.e. with minimal (not necessarily minimum) contingency sets (c.f. Example 5.3). One could test whether, by deleting one element from the contingency set, the label changes or not. Again, DLV-Complex could be used here. In [Bertossi and Reyes (2021)] a detailed example can be found where this is illustrated in the light of the naive Bayes classifier. This approach was also followed in [Bertossi (2021)] to compute contingency sets for individual database tuples as causes for query answers.
Our specification of counterfactual explanations is in some sense ideal, in that the whole product space of the feature domains is considered, together with the applicability of the classifier over that space. This may be impractical or unrealistic. However, we see our proposal as a conceptual and generic specification basis that can be adapted in order to include more specific declarative practices and mechanisms.
For example, restricting the product space can be done in different manners. One can use constraints or additional conditions in rule bodies. A different approach consists in replacing the product space with the entities in a data sample $S$. We could even assume that this sample already comes with classification labels, i.e. with a label for each of its entities. This dataset may not be disjoint from the training dataset (c.f. Section 3). The definition of counterfactual explanation and the CIPs could be adapted to this new setting without major difficulties.
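A sketch of the latter option, in the notation of Example 5.7 and with a hypothetical predicate sample(X,Y,Z,L) holding a labeled sample: intervened entities are admitted only if they occur in the sample.

inSample(X,Y,Z) :- sample(X,Y,Z,L).
:- ent(E,X,Y,Z,do), not inSample(X,Y,Z).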
The CIPs we have introduced are reminiscent of repair programs that specify and compute the repairs of a database that fails to satisfy the intended integrity constraints [Caniupan and Bertossi (2010)]. Actually, the connection between database repairs and actual causality for query answers was established and exploited in [Bertossi and Salimi (2017a)]. ASPs that compute tuple-level and attribute-level causes for query answers were introduced in [Bertossi (2021)]. Attribute-level causes are close to interventions on feature values, but the ASPs for the former are much simpler than those presented here, because in the database scenario, changing attribute values into nulls is good enough to invalidate the query answer (the “equivalent” in that scenario to switching the classification label here). Once a null is introduced, there is no need to take it into account anymore, and single-“step” interventions are good enough.
Our CIPs are designed to obtain general counterfactual explanations, and, in particular and mainly, c-explanations. The latter are associated with minimum-size contingency sets of feature values, and, at the same time, with maximum-responsibility feature values. This is achieved via the weak constraints in (6). If we want to obtain the responsibility of a non-maximum-responsibility feature value, which is associated with an s-explanation that is not a c-explanation, we can remove the weak constraints (thereby keeping all the models of the CIP), and pose a query under the brave semantics about the values of inv-resp in (4). An approach like this was followed in [Bertossi (2021)] for database tuples as causes for query answers.
Apropos of query answering, which we have not exploited in this work, several opportunities offer themselves. For example, we could pose a query under the brave semantics to detect if a particular feature value is ever counterfactually changed, or counterfactually changed in a best explanation. Under the skeptical semantics, we can identify feature values that change in all the counterfactual entities. Fully exploiting query answering is a matter of ongoing work.
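For instance, and only as a sketch in the notation of Example 5.7, the first kind of question could be posed as the non-ground query below, evaluated under the brave semantics (e.g. with DLV's brave-reasoning front-end); the same query under the cautious semantics identifies values that are changed in every counterfactual entity.

% is the value of feature outlook ever counterfactually changed?
expl(e,outlook,X)?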
In this work we have considered s- and c-explanations, which are associated with two specific and related minimization criteria. However, as done in abstract terms in Section 3, counterfactual explanations could be cast in terms of different optimization criteria [Karimi et al. (2020a), Russell (2019), Schleich et al. (2021)]. One could investigate the specification and implementation of other forms of preference, such as the generic preference relation in Definition 3.1, by using ASPs as in [Gebser et al. (2011), Brewka et al. (2015)].
Specifying and computing the generalized, probabilistic responsibility score of Section 7.1 goes beyond the usual capabilities of ASP systems. However, it would be interesting to explore the use of probabilistic ASPs for these tasks [Baral et al. (2009), Lee and Yang (2017)]. Along a similar line, probabilistic ASPs could in principle be used to deal with the integration of semantic constraints with underlying probability distributions on the entity population, as described in Section 7.2. This is all a matter of ongoing work.
Acknowledgements: The author has been a member of RelationalAI’s Academic Network, which has been a source of inspiration for this work, and much more. Part of this work was funded by ANID - Millennium Science Initiative Program - Code ICN17002. Help from Jessica Zangari and Mario Alviano with information about DLV2, and from Gabriela Reyes with the DLV program runs, is much appreciated.
References
- Alviano et al. (2017) Alviano, M., Calimeri, F., Dodaro, C., Fuscà, D., Leone, N., Perri, S., Ricca, F., Veltri, P., and Zangari, J. 2017. The asp system dlv2. In: Balduccini, M., and Janhunen, T. (Eds.). Proceedings of the 14th International Conference on Logic Programming and Nonmonotonic Reasoning, LPNMR 2017. LNCS, vol. 10377, Springer, 215-221.
- Alviano et al. (2019) Alviano, M., Amendola, G., Dodaro, C., Leone, N., Maratea, M. and Ricca, F. 2019. Evaluation of disjunctive programs in wasp. In: Balduccini, M., Lierler, Y., and Woltran, S. (Eds.). Proceedings of the 15th International Conference on Logic Programming and Nonmonotonic Reasoning, LPNMR 2019. LNCS, vol. 11481, Springer, 241-255.
- Arenas et al. (2021) Arenas, M., Barceló, P., Bertossi, L. and Monet, M. 2021. The tractability of shap-scores over deterministic and decomposable boolean circuits. In: Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021. AAAI Press, 6670-6678.
- Baral et al. (2009) Baral, C., Gelfond, M. and Rushton, N. 2009. Probabilistic reasoning with answer sets. Theory and Practice of Logic Programming, 9, 1, 57-144.
- Ben-Eliyahu and Dechter (1994) Ben-Eliyahu, R. and Dechter, R. 1994. Propositional semantics for disjunctive logic programs. Annals of Mathematics and Artificial Intelligence, 12, 53-87.
- Bertossi (2011) Bertossi, L. 2011. Database Repairing and Consistent Query Answering. Synthesis Lectures in Data Management, Morgan & Claypool.
- Bertossi (2019) Bertossi, L. 2019. Database repairs and consistent query answering: origins and further developments. Gems of PODS paper. In: Suciu, D., Skritek, S., and Koch, Ch. (Eds.). Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2019. ACM, 48-58.
- Bertossi and Salimi (2017a) Bertossi, L. and Salimi, B. 2017a. From causes for database queries to repairs and model-based diagnosis and back. Theory of Computing Systems, 61, 1, 191-232.
- Bertossi and Salimi (2017b) Bertossi, L. and Salimi, B. 2017b. Causes for query answers from databases: datalog abduction, view-updates, and integrity constraints. International Journal of Approximate Reasoning, 90, 226-252.
- Bertossi (2020) Bertossi, L. 2020. An asp-based approach to counterfactual explanations for classification. In: Gutiérrez-Basulto, V., Kliegr, T., Soylu, A. Giese, M., and Roman, D. (Eds.). Proceedings “Rules and Reasoning” - 4th International Joint Conference, RuleML+RR 2020. LNCS vol. 12173, Springer, 70–81.
- Bertossi (2021) Bertossi, L. 2021. Characterizing and computing causes for query answers in databases from database repairs and repair programs. Knowledge and Information Systems, 63, 1, 199-231.
- Bertossi et al. (2020) Bertossi, L., Li, J., Schleich, M., Suciu, D. and Vagena, Z. 2020. Causality-based explanation of classification outcomes. In: Schelter, S., Whang, S., and Stoyanovich, J. (Eds.). Proceedings of the Fourth Workshop on Data Management for End-To-End Machine Learning, in conjunction with the 2020 ACM SIGMOD/PODS Conference, DEEM@SIGMOD 2020, 6:1–6:10.
- Bertossi and Geerts (2020) Bertossi, L. and Geerts, F. 2020. Data quality and explainable ai. ACM Journal of Data and Information Quality, 12, 2, 1-9.
- Bertossi and Reyes (2021) Bertossi, L. and Reyes, G. 2021. Answer-set programs for reasoning about counterfactual interventions and responsibility scores for classification. In: Proceedings of the 1st International Joint Conference on Learning and Reasoning, IJCLR 2021. LNCS, to appear, Springer. Extended version posted as ArXiv 2107.10159.
- Brewka et al. (2011) Brewka, G., Eiter, T. and Truszczynski, M. 2011. Answer set programming at a glance. Communications of the ACM, 54, 12, 92–103.
- Brewka et al. (2015) Brewka, G., Delgrande, J., Romero, J. and Schaub, T. 2015. asprin: Customizing answer set preferences without a headache. In: Bonet, B., and Koenig, S. (Eds.). Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI 2015. AAAI Press, 1467–1474.
- Calimeri et al. (2009) Calimeri, F., Cozza, S., Ianni, G., and Leone, N. 2009. An asp system with functions, lists, and sets. In: Erdem, E., Lin, F., and Schaub, T. (Eds.). Proceedings of the 10th International Conference on Logic Programming and Nonmonotonic Reasoning, LPNMR 2009. LNCS, vol. 5753, Springer, 483–489.
- Calimeri et al. (2020) Calimeri, F., Faber, W., Gebser, M., Ianni, G., Kaminski, R., Krennwallner, T., Leone, N., Maratea, M., Ricca, F. and Schaub, T. 2020. Asp-core-2 input language format. Theory and Practice of Logic Programming, 20, 2, 294-309.
- Caniupan and Bertossi (2010) Caniupan, M. and Bertossi, L. 2010. The consistency extractor system: answer set programs for consistent query answering in databases. Data & Knowledge Engineering, 69, 6, 545-572.
- Chockler and Halpern (2004) Chockler, H. and Halpern, J. Y. 2004. Responsibility and blame: a structural-model approach. Journal of Artificial Intelligence Research, 22, 93-115.
- Choi et al. (2020) Choi, A., Shih, A., Goyanka, A. and Darwiche, A. 2020. On symbolically encoding the behavior of random forests. ArXiv 2007.01493, 2020.
- Dantsin et al. (2001) Dantsin, E., Eiter, T., Gottlob, G. and Voronkov, A. 2001. Complexity and expressive power of logic programming. ACM Computing Surveys, 33, 3, 374-425.
- Datta et al. (2016) Datta, A., Sen, S. and Zick, Y. 2016. Algorithmic transparency via quantitative input influence: theory and experiments with learning systems. In: Proceedings of the IEEE Symposium on Security and Privacy, SP 2016. IEEE Computer Society, 598–617.
- Darwiche and Hirth (2020) Darwiche, A. and Hirth, A. 2020. On the reasons behind decisions. In: De Giacomo, G., Catalá, A., Dilkina, B., Milano, M., Barro, S., Bugarín, B., and Lang, J. (Eds.). Proceedings of the 24th European Conference on Artificial Intelligence, ECAI 2020. IOS Press, 712–720.
- Eiter et al. (1997) Eiter, T., Gottlob, G. and Leone, N. 1997. Abduction from logic programs: semantics and complexity. Theoretical Computer Science, 189, 1-2, 129-177.
- Eiter et al. (2019) Eiter, T., Germano, S., Ianni, G., Kaminski, T., Redl, C., Schüller, P. and Weinzierl, A. 2019. The dlvhex system. Künstliche Intelligenz, 32, 2-3, 187-189.
- Eiter et al. (2017) Eiter, T., Kaminski, T., Redl, C., Schüller, P. and Weinzierl, A. 2017. Answer set programming with external source access. In: Ianni, G., Lembo, D., Bertossi, L., Faber, W., Glimm, B., Gottlob, G., and Staab, S. (Eds.). Reasoning Web. Semantic Interoperability on the Web - 13th International Summer School 2017, Tutorial Lectures. LNCS, vol. 10370, Springer, 204–275.
- Flach (2012) Flach, P. 2012. Machine Learning. Cambridge University Press.
- Gebser et al. (2011) Gebser, M., Kaminski, R. and Schaub, T. 2011. Complex optimization in answer set programming. Theory and Practice of Logic Programming, 11, 4-5, 821-839.
- Izza and Marques-Silva (2021) Izza, Y. and Marques-Silva, J. 2021. On explaining random forests with SAT. In: Zhou, Z.-H. (Ed.). Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI 2021, 2584–2591.
- Gelfond and Kahl (2014) Gelfond, M. and Kahl, Y. 2014. Knowledge Representation and Reasoning, and the Design of Intelligent Agents. Cambridge University Press.
- Gelfond and Lifschitz (1991) Gelfond, M. and Lifschitz, V. 1991. Classical negation in logic programs and disjunctive databases. New Generation Computing, 9, 365-385.
- Giannotti et al. (1997) Giannotti, F., Greco, S., Sacca, D. and Zaniolo, C. 1997. Programming with non-determinism in deductive databases. Annals of Mathematics and Artificial Intelligence, 19, 1-2, 97-125.
- Halpern and Pearl (2005) Halpern, J. and Pearl, J. 2005. Causes and explanations: a structural-model approach: part 1. British J. Philosophy of Science, 56, 843-887.
- Ignatiev et al. (2019) Ignatiev, A., Narodytska, N. and Marques-Silva, J. 2019. Abduction-based explanations for machine learning models. In: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, AAAI Press, 1511-1519.
- Ignatiev (2019) Ignatiev, A. 2020. Towards trustable explainable AI. In: Bessiere, C. (Ed.). Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, 5154-5158.
- Karimi et al. (2020a) Karimi, A-H., Barthe, G., Balle, B. and Valera, I. 2020a. Model-agnostic counterfactual explanations for consequential decisions. In: Chiappa, S., and Calandra, R. (Eds.). Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. PMLR, vol. 108, 895–905.
- Karimi et al. (2020b) Karimi, A-H., von Kügelgen, J., Schölkopf, B. and Valera, I. 2020b. Algorithmic recourse under imperfect causal knowledge: a probabilistic approach. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.-F., and Lin, H.-T. (Eds.). Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020.
- Law et al. (2019) Law, M., Russo, A. and Broda K. 2019. Logic-based learning of answer set programs. In: Krötzsch, M., and Stepanova, D. (Eds.). Reasoning Web. Explainable Artificial Intelligence - 15th International Summer School 2019, Tutorial Lectures. LNCS, vol. 11810, Springer, 196–231.
- Lee and Yang (2017) Lee, J. and Yang, Z. 2017. LPMLN, weak constraints, and p-log. In: Singh, S.P., and Markovitch, S. (Eds.). Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI 2017, AAAI Press, 1170-1177.
- Leone et al. (2006) Leone, N., Pfeifer, G., Faber, W., Eiter, T., Gottlob, G., Koch, C., Mateis, C., Perri, S. and Scarcello, F. 2006. The dlv system for knowledge representation and reasoning. ACM Transactions on Computational Logic, 7, 3, 499-562.
- Lundberg et al. (2020) Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J., Nair, B., Katz, R., Himmelfarb, J., Bansal, N. and Lee, S-I. 2020. From local explanations to global understanding with explainable ai for trees. Nature Machine Intelligence, 2, 1, 56-67.
- Martens and Provost (2014) Martens, D. and Provost, F. J. 2014. Explaining data-driven document classifications. MIS Quarterly, 38, 1, 73-99.
- Meliou et al. (2010) Meliou, A., Gatterbauer, W., Moore, K. F. and Suciu, D. 2010. The complexity of causality and responsibility for query answers and non-answers. Proceedings of the VLDB Endowment, 4, 1, 34–45.
- Mitchell (1997) Mitchell, T. M. 1997. Machine Learning. McGraw-Hill.
- Molnar (2020) Molnar, C. 2020. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book
- Narodytska et al. (2019) Narodytska, N., Shrotri, A., Meel, K., Ignatiev, A. and Marques-Silva, J. 2019. Assessing heuristic machine learning explanations with model counting. In: Janota, M., and Lynce, I. (Eds.). Proceedings of the 22nd International Conference on Theory and Applications of Satisfiability Testing, SAT 2019. LNCS, vol. 11628, Springer, 267–278.
- Pearl (2009) Pearl, J. 2009. Causality: Models, Reasoning and Inference. Cambridge University Press, 2nd edition.
- Rudin (2019) Rudin, C. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1, 206-215.
- Russell (2019) Russell, Ch. 2019. Efficient search for diverse coherent explanations. In: Boyd, D., and Morgenstern, J. H. (Eds.). Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT∗ 2019. ACM, 20–28.
- Ribeiro et al. (2016) Ribeiro, M. T., Singh, S. and Guestrin, C. 2016. “Why should I trust you?”: explaining the predictions of any classifier. In: Krishnapuram, B., Shah, M., Smola, A. J., Aggarwal, C.C., Shen, D., and Rastogi, R. (Eds.). Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016. ACM, 1135–1144.
- Ribeiro et al. (2018) Ribeiro, M. T., Singh, S. and Guestrin, C. 2018. Anchors: high-precision model-agnostic explanations. In: McIlraith, S. A., and Weinberger, K. Q. (Eds.). Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, AAAI 2018. AAAI Press, 1527–1535.
- Schleich et al. (2021) Schleich, M., Geng, Z., Zhang, Y. and Suciu, D. 2021. GeCo: Quality counterfactual explanations in real time. Proceedings of the VLDB Endowment, 14, 9, 1681-1693.
- Shi et al. (2020) Shi, W., Shih, A., Darwiche, A. and Choi, A. 2020. On tractable representations of binary neural networks. In: Calvanese, D., Erdem, E., and Thielscher, M. (Eds.). Proceedings of the 17th International Conference on Principles of Knowledge Representation and Reasoning, KR 2020, 882–892.
- Shih et al. (2018) Shih, A., Choi, A. and Darwiche, A. 2018. Formal verification of bayesian network classifiers. In: Studený, M., and Kratochvíl, V. (Eds.). Proceedings of the International Conference on Probabilistic Graphical Models, PGM 2018. PLMR, vol. 72, 157–168.
- Ustun et al. (2019) Ustun, B., Spangher, A. and Liu, Y. 2019. Actionable recourse in linear classification. In: Boyd, D., and Morgenstern, J. H. (Eds.). Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT∗ 2019. ACM, 10-19.
- Van den Broeck et al. (2021) Van den Broeck, G., Lykov, A., Schleich, M. and Suciu, D. 2021. On the tractability of shap explanations. In: Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021. AAAI Press, 6505-6513.
- Wachter et al. (2017) Wachter, S., Mittelstadt, B. D. and Russell, C. 2017. Counterfactual explanations without opening the black box: automated decisions and the GDPR. Harvard Journal of Law & Technology, 31, 841.
- Wang et al. (2021) Wang, E., Khosravi, P. and Van den Broeck, G. 2021. Probabilistic sufficient explanations. In: Zhou, Z.-H. (Ed.). Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI 2021, 3082–3088.