
CoGS: Causality Constrained Counterfactual Explanations using Goal-directed ASP

Sopam Dasgupta¹, Joaquín Arias², Elmer Salazar¹, Gopal Gupta¹

¹ The University of Texas at Dallas, Richardson TX 75080, USA
² Universidad Rey Juan Carlos, 28933 Móstoles, Madrid, Spain

(Authors supported by US NSF Grants IIS 1910131, US DoD, and industry grants.)
Abstract

Machine learning models are increasingly used in areas such as loan approvals and hiring, yet they often function as black boxes, obscuring their decision-making processes. Transparency is crucial: individuals need explanations to understand decisions, especially decisions they did not desire. Ethical and legal considerations further require informing individuals of the changes in input attribute values (features) that would lead to their desired outcome. Our work generates counterfactual explanations while considering causal dependencies between features. We present the CoGS (Counterfactual Generation with s(CASP)) framework, which uses the goal-directed Answer Set Programming system s(CASP) to generate counterfactuals from rule-based machine learning models, specifically those learned by the FOLD-SE algorithm. CoGS computes realistic and causally consistent changes to attribute values, and finds a path of such changes from an undesired outcome to a desired one. We present the details of the CoGS framework along with its evaluation.

Keywords:
Causal reasoning · Counterfactual reasoning · Default Logic · Goal-directed Answer Set Programming · Planning problem.

1 Introduction

Predictive models are widely used in automated decision-making processes, such as job-candidate filtering or loan approvals. These models often function as black boxes, making it difficult to understand their internal reasoning. The decisions made by these models can have significant consequences, leading individuals to seek satisfactory explanations, especially for an unfavorable decision. This desire for transparency is crucial whether the decision is made by an automated system or by humans, and explaining these decisions presents a significant challenge. Additionally, users want to know what changes they must make to flip an undesired (negative) decision into a desired (positive) one.

In this work we follow Wachter et al.'s [19] approach, where counterfactuals are employed to explain the reasoning behind a prediction made by a machine learning model. Counterfactuals help answer the question: “What changes should be made to input attributes or features to flip an undesired outcome to a desired one?” Counterfactuals also serve as a good candidate for explaining a prediction. Wachter et al. [19] use statistical techniques that examine the proximity of points in the N-dimensional feature space to find counterfactuals. We present the Counterfactual Generation with s(CASP) (CoGS) framework, which generates counterfactual explanations from rule-based machine learning (RBML) algorithms such as FOLD-SE [20]. Our framework makes two advances over Wachter et al.'s work: (i) it computes counterfactuals using RBML algorithms and ASP rather than statistical techniques, and (ii) it takes causal dependencies among features into account when computing these counterfactuals. Another novelty of the CoGS framework is that it further leverages the FOLD-SE algorithm [20] to automatically discover potential dependencies between features, which are subsequently approved by a user.

Our approach models various scenarios (or worlds): the initial state i, representing a negative outcome, and the goal state g, representing a positive outcome. A state is represented as a set of feature-value pairs. CoGS finds a path from the initial state i to the goal state g by performing interventions (or transitions), where each intervention corresponds to changing a feature value while taking causal dependencies among features into account. These interventions ensure realistic and achievable changes that take us from state i to g. CoGS relies on common-sense reasoning, implemented through answer set programming (ASP) [10], specifically using the goal-directed s(CASP) ASP system [3]. The problem of finding these interventions can be viewed as a planning problem [10], except that, unlike in the planning problem, the moves (interventions) that take us from one state to another are not mutually independent.

2 Background

2.0.1 Counterfactual Reasoning:

Explanations help us understand decisions and inform actions. Wachter et al. [19] advocated using counterfactual explanations (CFE) to explain individual decisions, offering insight into how to achieve a desired outcome. For instance, a counterfactual explanation for a loan denial might state: if John had good credit, his loan application would be approved. This involves imagining alternate (reasonably plausible) scenarios where the desired outcome is achievable. For a binary classifier f: X → {0,1}, we define the set of counterfactual explanations for a factual input x ∈ X as CF_f(x) = {x̂ ∈ X | f(x) ≠ f(x̂)}, i.e., all inputs x̂ that lead to a different prediction than the original input x under f. We use the s(CASP) query-driven predicate ASP system [3] for counterfactual reasoning, incorporating causal dependencies between features. s(CASP) computes dual rules (see below), allowing negated queries and constructing the alternate worlds/states that lead to counterfactual explanations while respecting causal dependencies.
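To make the definition concrete, here is a minimal Python sketch (ours, not part of CoGS) that enumerates CF_f(x) for a toy loan classifier over a small finite feature space; the feature domains and the rule inside f are hypothetical.

# A toy binary classifier and its counterfactual set
# CF_f(x) = {x_hat in X | f(x) != f(x_hat)}.
from itertools import product

# Hypothetical feature domains: (bank balance, credit score), coarsened.
X = list(product([40000, 60000], [599, 620]))

def f(x):
    """Toy loan classifier: 1 = approve, 0 = reject."""
    balance, score = x
    return 1 if balance >= 60000 and score >= 600 else 0

def counterfactuals(x):
    """All points of X whose prediction differs from f(x)."""
    return [x_hat for x_hat in X if f(x_hat) != f(x)]

print(counterfactuals((40000, 599)))  # -> [(60000, 620)]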

Causality Considerations: Causality relates to cause-effect relationships among predicates. P is the cause of Q if (P ⇒ Q) ∧ (¬P ⇒ ¬Q) [15]; we say that Q is causally dependent on P. For generating realistic counterfactuals, causal dependencies among features must be taken into account. For example, in a loan approval scenario, raising the credit score to 'high' while debt obligations keep increasing is unrealistic, due to the causal dependency between debt and credit score. Ignoring such dependencies can lead to invalid counterfactuals that do not achieve the desired outcome. Realistic counterfactual explanations must therefore model these causal relationships to account for the downstream effects of feature changes.

ASP, s(CASP) and Commonsense Reasoning: Answer Set Programming (ASP) is a paradigm for knowledge representation and reasoning [8, 4, 10], widely used in automating commonsense reasoning. We employ ASP to encode feature knowledge, decision-making rules, and causal rules, enabling the automatic generation of counterfactual explanations from this symbolic knowledge. s(CASP) is a goal-directed ASP system that executes answer set programs in a top-down manner without grounding [3]. Its query-driven nature aids commonsense and counterfactual reasoning, utilizing proof trees for justification. To incorporate negation-as-failure, s(CASP) adopts program completion, turning "if" rules into "if and only if" rules: for every rule in an s(CASP) program, its corresponding dual rule(s) are computed. Details can be found elsewhere [3].

FOLD-SE: Wang and Gupta [20] developed FOLD-SE, an efficient, explainable rule-based machine learning (RBML) algorithm for classification tasks. FOLD-SE generates default rules (a stratified normal logic program) as an explainable model from a given input dataset; both numerical and categorical features are allowed. The generated rules symbolically represent the machine learning model that predicts a label for a given data record. FOLD-SE can also be used to learn rules capturing causal dependencies among features in a dataset. FOLD-SE maintains scalability and explainability, learning a relatively small number of rules and literals regardless of dataset size while retaining classification accuracy competitive with state-of-the-art machine learning methods. In CoGS, FOLD-SE is used to learn causal rules that accurately model feature dependencies and that are subsequently used to generate realistic counterfactual explanations.

The Planning Problem: Planning involves finding a sequence of transitions from an initial state to a goal state while adhering to constraints. In ASP, this problem is encoded in a logic program with rules defining transitions and constraints restricting the allowed transitions [10]. Solutions are represented as a series of transitions through intermediate states, where each state is a set of facts or logical predicates. Solving the planning problem amounts to searching for a path of transitions that meets the goal conditions within the constraints. CoGS can be thought of as a framework that finds a plan, i.e., a series of interventions that change feature values, taking us from the initial state to the final goal state. However, unlike the planning domain, the interventions (moves) are not independent of each other, due to causal dependencies among features.

3 Overview

3.1 The Problem

When an individual (represented as a set of features) receives an undesired negative decision (e.g., a loan denial), they can seek the changes needed to flip it to a positive outcome. CoGS identifies these changes automatically. For example, if John is denied a loan (initial state i), CoGS models the (positive) scenarios (goal set G) in which he obtains the loan. Obviously, the (negative) decision of the initial state i should not apply to any scenario in the goal set G. The query goal '?- reject_loan(john)' should be True in the initial state i and False for all goals in the goal set G. The problem is to find a series of interventions, namely changes to feature values, that take us from i to some g ∈ G.

3.2 Solution: CoGS Approach

Inspired by the planning problem, the CoGS approach casts the solution as traversing from an initial state to a goal state, where states are represented as sets of feature-value pairs (e.g., credit score: 600; age: 24). There can be multiple goal states representing the positive outcome (the set of goal states G). The objective is to turn a negative decision (initial state i) into a positive one (goal state g) through necessary changes to feature values, so that the query goal '?- not reject_loan(john)' succeeds for g ∈ G. As mentioned earlier, the CoGS framework solves a variant of the planning problem due to the dependencies between features.

CoGS models two scenarios: 1) the negative outcome world (e.g., loan denial, initial state i), and 2) the positive outcome world (e.g., loan approval, goal state g) reached through specific interventions. The initial state i and goal state g are defined by specific attribute values (e.g., loan approval requires a credit score >= 600). CoGS symbolically computes the interventions needed to find a path to the goal state g, representing a flipped decision.

Given an instance where the decision query (e.g., '?- reject_loan/1') succeeds (negative outcome), CoGS finds the state where this query fails (i.e., where the query '?- not reject_loan/1' succeeds), which constitutes the goal state g. In terms of ASP, the problem of finding interventions can be cast as follows: given a possible world where a query succeeds, compute changes to the feature values (while taking their causal dependencies into account) that will reach a possible world where the negation of the query succeeds. Each of the intermediate possible worlds we traverse must be a viable world with respect to the rules. We use the s(CASP) query-driven predicate ASP system [3] for this purpose: s(CASP) automatically generates dual rules, allowing us to execute negated queries (such as '?- not reject_loan/1') constructively.

CoGS employs two kinds of actions: 1) direct actions: directly changing a feature value, and 2) causal actions: changing other features so as to cause the target feature to change, utilizing the causal dependencies between features. These actions guide the individual from the initial state i to the goal state g through (consistent) intermediate states, suggesting realistic and achievable changes. Unlike CoGS, Wachter et al.'s approach [19] can output non-viable solutions.

3.2.1 Example 1: Using direct actions to reach the counterfactual state

Consider a loan application scenario with 4 feature-domain pairs: 1) Age: {1 year, ..., 99 years}; 2) Debt: {$1, ..., $1000000}; 3) Bank Balance: {$0, ..., $1 billion}; and 4) Credit Score: {300 points, ..., 850 points}.

John (31 years, $5000, $40000, 599 points) applies for a loan. The bank's rule is to deny loans to individuals with a bank balance of less than $60000; hence, John is denied a loan (negative outcome). To get his loan approved (positive outcome), CoGS suggests the following: Initial state: John (31 years, $5000, $40000, 599 points) is denied a loan. Goal state: John (31 years, $5000, $60000, 599 points) is approved for a loan. Intervention: suggest to John that he change his bank balance to $60000. As shown in Fig. 1 (top), the direct action of increasing the bank balance to $60000 flips the decision: John becomes eligible for the loan, no longer matching the bank's rejection criterion.
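The following minimal sketch (hypothetical, for illustration) renders Example 1 in code: a single direct action on the bank balance flips the bank's rejection rule.

def reject_loan(state):
    """Bank's rule: deny loans when bank balance < $60000."""
    return state["balance"] < 60000

john = {"age": 31, "debt": 5000, "balance": 40000, "credit_score": 599}
assert reject_loan(john)  # negative outcome in the initial state i

# Direct action: change one feature value directly.
goal = dict(john, balance=60000)
assert not reject_loan(goal)  # the decision is flipped in goal state g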

Figure 1: Top: Example 1 shows how John goes from being rejected for a loan to having his loan approved. Here the bank only considers the bank balance for loan approval; John performs a direct action to increase his bank balance to $60000. Bottom: Example 2 shows how John goes from being rejected for a loan to having his loan approved when the bank considers both bank balance and credit score. While John alters the bank balance directly, altering the credit score requires him to first alter his debt obligations directly. After he clears his debt, the causal effect of having $0 debt increases John's credit score to 620 points. This is the causal action.

Example 2: Highlighting the utility of causal actions: This example demonstrates the advantage of incorporating causal rules. Consider a scenario where the bank has two rejection rules: 1) deny loans to individuals with a bank balance of less than $60000, and 2) deny loans to individuals with a credit score below 600. John (31 years, $5000, $40000, 599 points) is denied a loan (negative outcome) but wishes to get it approved (positive outcome). Without knowledge of causal dependencies, the solution would be: Initial state: John (31 years, $5000, $40000, 599 points) is denied a loan. Goal state: John (31 years, $5000, $60000, 620 points) is approved for a loan. Interventions: 1) change the bank balance to $60000, and 2) change the credit score to 620 points. However, the credit score cannot be changed directly.

To realistically increase the credit score, the bank's guidelines suggest having no debt, which leads to a causal dependency between debt and credit score. Incorporating it, CoGS provides: Initial state: John (31 years, $5000, $40000, 599 points) is denied a loan. Goal state: John (31 years, $0, $60000, 620 points) is approved for a loan. Interventions: 1) John changes his bank balance to $60000, and 2) he reduces his debt to $0 to increase his credit score.

As shown in Figure 1 (bottom), by clearing the debt (direct action), John's credit score increases (causal effect), making him eligible for the loan. Intermediate states, such as John with $5000 in debt and John with $0 in debt, are part of the path to the goal state. This example illustrates how using causal dependencies between features allows desired outcomes to be achieved realistically.
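The sketch below (again hypothetical) renders Example 2 in code: the causal rule that propagates a debt change to the credit score is an assumption mirroring the bank guideline described above.

def apply_causal_rules(state):
    """Propagate causal effects: clearing debt lifts the credit score."""
    s = dict(state)
    if s["debt"] == 0 and s["credit_score"] < 620:
        s["credit_score"] = 620  # downstream effect of the debt change
    return s

def reject_loan(state):
    return state["balance"] < 60000 or state["credit_score"] < 600

john = {"age": 31, "debt": 5000, "balance": 40000, "credit_score": 599}

# Intervention 1 (direct): raise the bank balance.
s1 = apply_causal_rules(dict(john, balance=60000))
# Intervention 2 (direct action with a causal effect): clear the debt.
s2 = apply_causal_rules(dict(s1, debt=0))

assert reject_loan(john) and reject_loan(s1) and not reject_loan(s2)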

Hence, we have demonstrated how utilizing causal dependencies between features allows us to realistically achieve desired outcomes by making appropriate changes. The challenge is to accomplish this automatically, i.e., to (i) identify causal dependencies automatically: we use a rule-based machine learning algorithm (FOLD-SE) for this purpose; and (ii) compute the sequence of necessary interventions automatically: in particular, we want to avoid repeating states, a known issue in the planning domain. Our CoGS approach addresses these challenges. It generates the path from the initial state i to the counterfactual goal state g using the find_path algorithm (Algorithm 1), explained in Section 4.1.

4 Methodology

We next outline the methodology used by CoGS to generate paths from the initial state (negative outcome) to the goal state (positive outcome). Unlike traditional planning problems, where actions are typically independent, our approach involves interdependent actions governed by causal rules C. This ensures that the effect of one action can influence subsequent actions, making interventions realistic and causally consistent. Note that the CoGS framework uses the FOLD-SE RBML algorithm [20] to automatically compute causal dependency rules. These rules have to be either verified by a human, or verified automatically using commonsense knowledge. This is important, as RBML algorithms can mistake a correlation for a causal dependency. CoGS uses the former approach; the latter is a subject of future research. We next define specific terms.

Definition 1 (State Space (S))

S represents all combinations of feature values. For domains D_1, ..., D_n of the features F_1, ..., F_n, S is the set of possible states s, where each state is a tuple of feature values V_1, ..., V_n:

s ∈ S, where S = {(V_1, V_2, ..., V_n) | V_i ∈ D_i for each i in 1, ..., n}   (1)

For example, a state s can be: s = (31 years, $5000, $40000, 599 points).

Definition 2 (Causally Consistent State Space (SCS_{C}))

S_C is the subset of S in which all causal rules are satisfied. C represents a set of causal rules over the features of the state space S. Then θ_C : P(S) → P(S) (where P(S) is the power set of S) is the function that selects the subset of a given state sub-space S' ⊆ S that satisfies all causal rules in C:

θ_C(S') = {s ∈ S' | s satisfies all causal rules in C}   (2)

S_C = θ_C(S)   (3)

E.g., if a causal rule states that when debt is 0 the credit score should be above 599, then the instance s_1 = (31 years, $0, $40000, 620 points) is causally consistent, and the instance s_2 = (31 years, $0, $40000, 400 points) is causally inconsistent.

In a traditional planning problem, the allowed actions in a given state are independent, i.e., the result of one action does not influence another. In CoGS, causal actions are interdependent, governed by C.
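A minimal sketch (ours) of θ_C from Definition 2, using the debt/credit-score rule of the example:

def satisfies_causal_rules(state):
    """Assumed causal rule: debt 0 implies credit score above 599."""
    age, debt, balance, score = state
    return not (debt == 0 and score <= 599)

def theta_C(states):
    """theta_C(S') = {s in S' | s satisfies all causal rules in C}."""
    return {s for s in states if satisfies_causal_rules(s)}

s1 = (31, 0, 40000, 620)
s2 = (31, 0, 40000, 400)
print(theta_C({s1, s2}))  # -> {(31, 0, 40000, 620)}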

Definition 3 (Decision Consistent State Space (SQS_{Q}))

S_Q is the subset of S_C in which the decision rules are satisfied. Q represents a set of rules that compute some external decision for a given state. θ_Q : P(S) → P(S) is the function that selects the subset of a causally consistent state space S' ⊆ S_C that also satisfies the decision rules in Q:

θ_Q(S') = {s ∈ S' | s satisfies any decision rule in Q}   (4)

Given S_C and θ_Q, we define the decision consistent state space S_Q as

S_Q = θ_Q(S_C) = θ_Q(θ_C(S))   (5)

For example, an individual John whose loan has been rejected: s = (31 years, $0, $40000, 620 points), where s ∈ S_Q.
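Analogously, a minimal sketch (ours) of θ_Q from Definition 3, with the bank's rejection rules of Example 2 playing the role of Q:

def decision_rules(state):
    """Q: the external decision. True when any rejection rule applies."""
    age, debt, balance, score = state
    return balance < 60000 or score < 600

def theta_Q(states):
    """theta_Q(S') = {s in S' | s satisfies any decision rule in Q}."""
    return {s for s in states if decision_rules(s)}

S_C = {(31, 0, 40000, 620), (31, 5000, 60000, 620)}
print(theta_Q(S_C))  # -> {(31, 0, 40000, 620)}: rejected, hence in S_Q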

Definition 4 (Initial State (ii))

i is the starting point, with an undesired outcome. The initial state i is an element of the causally consistent state space S_C:

i ∈ S_C   (6)

For example, i = (31 years, $0, $40000, 620 points).

Definition 5 (Actions)

The set of actions A includes all possible interventions (actions) that can transition one state to another within the state space. Each action a ∈ A is a function that maps a state s to a new state s':

a : S → S, where a ∈ A   (7)

Actions are divided into: 1) Direct actions: directly change the value of a single feature of a state s, e.g., increase the bank balance from $40000 to $60000. 2) Causal actions: change the value of a target feature by altering related features, based on causal dependencies; this results in a causally consistent state with respect to C, e.g., reduce debt to increase the credit score.
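The following sketch (ours; the causal rule is an assumption from the running example) illustrates the two kinds of actions as functions from states to states:

def set_balance(state, value):
    """Direct action: alter a single feature value."""
    return dict(state, balance=value)

def clear_debt(state):
    """Causal action: alter debt; the causal rule lifts the credit score."""
    s = dict(state, debt=0)
    s["credit_score"] = max(s["credit_score"], 620)  # assumed causal rule
    return s

john = {"age": 31, "debt": 5000, "balance": 40000, "credit_score": 599}
print(clear_debt(set_balance(john, 60000)))
# -> {'age': 31, 'debt': 0, 'balance': 60000, 'credit_score': 620}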

Definition 6 (Transition Function)

A transition function δ : S_C × A → S_C maps a causally consistent state and an action to an allowable causally consistent state reachable in a single step, and is defined as:

δ(s, a) = a(s), if a(s) ∈ S_C; otherwise δ(a(s), a') for some a' ∈ A

δ models a function that repeatedly takes actions until a causally consistent state is reached. In Example 1, δ suggests changing the bank balance from $40000 to $60000: δ((31 years, $5000, $40000, 599), a) = (31 years, $5000, $60000, 599).
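A minimal sketch (ours) of δ as just described: apply the chosen action, then keep applying further actions until a causally consistent state is reached.

def consistent(state):
    # Assumed causal rule: debt 0 implies credit score >= 620.
    return not (state["debt"] == 0 and state["credit_score"] < 620)

def delta(state, action, further_actions):
    """delta(s, a) = a(s) if consistent; otherwise keep taking actions."""
    s = action(state)
    while not consistent(s):
        s = further_actions.pop(0)(s)
    return s

john = {"age": 31, "debt": 5000, "balance": 40000, "credit_score": 599}
print(delta(john, lambda s: dict(s, debt=0),
            [lambda s: dict(s, credit_score=620)]))
# -> {'age': 31, 'debt': 0, 'balance': 40000, 'credit_score': 620}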

Definition 7 (Counterfactual Generation (CFG) Problem)

A counterfactual generation (CFG) problem is a 4-tuple (S_C, S_Q, I, δ), where S_C is the causally consistent state space, S_Q is the decision consistent state space, I ∈ S_C is the initial state, and δ is the transition function.

Definition 8 (Goal Set)

The goal set G is the set of desired outcomes, i.e., states that do not satisfy the decision rules Q. For a CFG problem (S_C, S_Q, I, δ), G ⊆ S_C:

G = {s ∈ S_C | s ∉ S_Q}   (8)

G includes all states in S_C that are not in S_Q. For Example 1, an example goal state g ∈ G is g = (31 years, $5000, $60000, 599 points).

Definition 9 (Solution Path)

A solution to the problem (S_C, S_Q, I, δ) with goal set G is a path:

s_0, s_1, ..., s_m, where s_j ∈ S_C for all j ∈ {0, ..., m}, such that s_0, ..., s_{m-1} ∉ G; s_{i+1} ∈ δ(s_i) for i ∈ {0, ..., m-1}; s_0 = I; and s_m ∈ G   (9)

For Example 1, individuals with less than $60000 in their account are ineligible for a loan; thus the state of an ineligible individual s ∈ S_Q might be s = (31 years, $5000, $40000, 599 points). The goal set has only one goal state g ∈ G, given by g = (31 years, $5000, $60000, 599 points). The path from s to g is {(31 years, $5000, $40000, 599 points) → Direct → (31 years, $5000, $60000, 599 points)}. Here, the path has only 2 states, as changing the bank balance to $60000 is all that is needed to reach the goal state.

4.1 Algorithm

We next describe our algorithm to find the goal states and compute the solution paths. The algorithm makes use of the following functions: (i) not_member: checks that an element is a) not a member of a list, and b) given a list of tuples, not a member of any tuple in the list; (ii) drop_inconsistent: given a list of states [s_0, ..., s_k] and a set of causal rules C, drops all inconsistent states, resulting in a list of states consistent with respect to C; (iii) get_last: returns the last member of a list; (iv) pop: removes and returns the last member of a list; (v) is_counterfactual: returns TRUE if the input state is a causally consistent counterfactual solution (see supplement for details [2]).

Algorithm 1 find_path: Obtain a path to the counterfactual state
0:  Initial State (i), States S, Causal Rules C, Decision Rules Q, is_counterfactual (Algorithm 3), intervene (Algorithm 4), Actions a ∈ A:
1:  Create an empty list visited_states that tracks the states traversed (so that we avoid revisiting them).
2:  Append (i, [ ]) to visited_states
3:  while is_counterfactual(get_last(visited_states), C, Q) is FALSE do
4:     Set visited_states = intervene(visited_states, C, A)
5:  end while
6:  candidate_path = drop_inconsistent(visited_states)
7:  Return candidate_path

Find Path: Function 'find_path' implements the solution path P of Definition 9. Its purpose is to find a path to the counterfactual state. Algorithm 1 provides the pseudo-code for 'find_path', which takes as input an initial state i, a set of causal rules C, decision rules Q, and actions A. It returns a path to the counterfactual/goal state g ∈ G for the given i as a list 'visited_states'. Unrealistic states are removed from 'visited_states' to obtain a 'candidate_path'.

Initially, s = i. The function checks whether the current state s is a counterfactual. If s is already a counterfactual, 'find_path' returns a list containing s. If not, the algorithm moves from s = i to a new causally consistent state s' using the 'intervene' function, updating 'visited_states' with s'. It then checks whether s' is a counterfactual using 'is_counterfactual'. If TRUE, the algorithm drops all inconsistent states from 'visited_states' and returns the 'candidate_path' from i to s'. If not, it updates the current state to s' and repeats until reaching a counterfactual/goal state g. The algorithm ends when 'is_counterfactual' is satisfied, i.e., s' = g, where g ∈ G:

s' ∈ G by definition, with visited states s_0, ..., s_k, s', where s_0, ..., s_k ∉ G   (10)
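The following is a minimal Python sketch (ours, not the paper's implementation) of the find_path loop; is_counterfactual and intervene are toy stand-ins for Algorithms 3 and 4, instantiated here for Example 1.

def find_path(i, is_counterfactual, intervene, consistent):
    """Sketch of Algorithm 1: intervene until a goal state is reached,
    then drop causally inconsistent states (drop_inconsistent)."""
    visited_states = [(i, [])]
    while not is_counterfactual(visited_states[-1][0]):
        visited_states = intervene(visited_states)
    return [s for s, _ in visited_states if consistent(s)]

# Toy instantiation mirroring Example 1 (balance decides the outcome).
is_cf = lambda s: s["balance"] >= 60000
def intervene(visited):
    s, acts = visited[-1]
    return visited + [(dict(s, balance=60000), acts + ["raise_balance"])]

john = {"age": 31, "debt": 5000, "balance": 40000, "credit_score": 599}
print(find_path(john, is_cf, intervene, lambda s: True))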

Intervene: Function 'intervene' implements the transition function δ from Definition 6. It is called by 'find_path' in line 4 of Algorithm 1. The primary purpose of 'intervene' is to transition from the current state to the next state, ensuring that actions are not repeated and states are not revisited. Its pseudo-code and a detailed exploration are available in the supplement.

Algorithm 2 make_consistent: reaches a consistent state
0:  State s, Causal rules C, List visited_states, List actions_taken, Actions a ∈ A:
1:  while s does not satisfy all rules in C do
2:     Try to select a causal action a ensuring not_member(a(s), visited_states) and not_member(a, actions_taken) are TRUE
3:     if causal action a exists then
4:        Set (s, actions_taken), visited_states = update(s, visited_states, actions_taken, a)
5:     else
6:        Try to select a direct action a ensuring not_member(a(s), visited_states) and not_member(a, actions_taken) are TRUE
7:        if direct action a exists then
8:           Set (s, actions_taken), visited_states = update(s, visited_states, actions_taken, a)
9:        else
10:           //Backtracking
11:           if visited_states is empty then
12:              EXIT with Failure
13:           end if
14:           Set (s, actions_taken) = pop(visited_states)
15:        end if
16:     end if
17:  end while
18:  Return (s, actions_taken), visited_states.

Make Consistent: The pseudo-code for 'make_consistent' is given in Algorithm 2. It takes as arguments a current state s, a list actions_taken, a list visited_states, a set of causal rules C, and a set of actions A. Called by 'intervene' in line 12 of Algorithm 4, 'make_consistent' transitions from the current state to a new, causally consistent state.
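Below is a minimal Python sketch (ours) of Algorithm 2: causal actions are tried before direct actions, and the algorithm backtracks when no untried action applies. The action generators in the demo are toy stand-ins.

def make_consistent(s, actions_taken, visited, consistent, causal, direct):
    """Sketch of Algorithm 2: try causal actions first, then direct
    actions; backtrack when no untried action applies."""
    while not consistent(s):
        candidates = [a for a in causal + direct
                      if a not in actions_taken
                      and a(s) not in [v for v, _ in visited]]
        preferred = [a for a in candidates if a in causal] or candidates
        if preferred:
            a = preferred[0]
            actions_taken = actions_taken + [a]       # update(...)
            visited = visited + [(s, actions_taken)]
            s = a(s)
        else:
            if not visited:
                raise RuntimeError("EXIT with Failure")  # backtracking fails
            (s, actions_taken), visited = visited[-1], visited[:-1]  # pop
    return (s, actions_taken), visited

# Toy demo: a state is consistent only when debt is 0; clearing debt is causal.
clear_debt = lambda st: dict(st, debt=0)
print(make_consistent({"debt": 5000}, [], [],
                      lambda st: st["debt"] == 0, [clear_debt], []))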

4.1.1 Update

Function ‘update’ tracks the list of actions taken and states visited to avoid repeating actions and revisiting states. Its pseudo-code is provided and explored in detail in the supplement.

Discussion: A few points should be highlighted: (i) Certain feature values may be immutable or restricted: for example, age cannot decrease, and credit score cannot be directly altered. To respect these restrictions, we introduce plausibility constraints. These constraints apply to the actions in our algorithms, ensuring realistic changes to the features. Since they do not add new states but only restrict reachable states, they are represented through the set of available actions in Algorithms 1, 2, 4, and 5. (ii) CoGS can also bound the path length of candidate solutions. Starting with a minimal path length of 1, CoGS can identify solutions requiring only a single change; if no solution exists, it incrementally increases the path length until a solution is found. This ensures that the generated counterfactuals are both minimal and causally consistent, enhancing their practicality and interpretability. This is achieved via constraints on path length.
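A minimal sketch (ours) of this iterative-deepening idea; find_paths_of_length is a hypothetical helper that enumerates causally consistent counterfactual paths of exactly the given length.

def minimal_counterfactual(i, find_paths_of_length, max_len=10):
    """Search path lengths 1, 2, ... so the first hit is minimal."""
    for length in range(1, max_len + 1):
        paths = find_paths_of_length(i, length)  # hypothetical helper
        if paths:
            return paths[0]  # no shorter counterfactual path exists
    return None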

4.2 Soundness

Definition 10 (CFG Implementation)

When Algorithm 1 is executed with the inputs Initial State i (Definition 4), State Space S (Definition 1), Set of Causal Rules C (Definition 2), Set of Decision Rules Q (Definition 3), and Set of Actions A (Definition 5), a CFG problem (S_C, S_Q, I, δ) (Definition 7) is constructed, with causally consistent state space S_C (Definition 2), decision consistent state space S_Q (Definition 3), initial state I (Definition 4), and transition function δ (Definition 6).

Definition 11 (Candidate path)

Given the CFG problem (S_C, S_Q, I, δ) constructed from a run of Algorithm 1, the return value (the candidate path) is the list obtained by removing all elements containing states s' ∉ S_C.

Definition 10 maps the input of Algorithm 1 to a CFG problem (Definition 7). The candidate path maps the result of Algorithm 1 to a possible solution (Definition 9) of the corresponding CFG problem. From Theorem 7.1 (proof in the supplement [2]), the candidate path (Definition 11) is a solution to the corresponding CFG problem implementation (Definition 10).

Theorem 1 (Soundness): Given a CFG problem X = (S_C, S_Q, I, δ) constructed from a run of Algorithm 1 and a corresponding candidate path P, P is a solution path for X. Proof: given in the supplementary document [2].

5 Experiments

We applied the CoGS methodology to rules generated by the FOLD-SE algorithm (code on GitHub [2]). Our experiments use the German dataset [11], the Adult dataset [5], and the Car Evaluation dataset [7], all popular datasets from the UCI Machine Learning Repository [1]. The German dataset contains demographic data with labels for credit risk ('good' or 'bad'), with records labeled 'good' greatly outnumbering those labeled 'bad'. The Adult dataset includes demographic information with labels indicating income ('=<$50k/year' or '>$50k/year'). The Car Evaluation dataset provides information on the acceptability of purchasing a used car. We relabelled the Car Evaluation dataset to 'acceptable' and 'unacceptable' in order to generate the counterfactuals.

For the (imbalanced) German dataset, the learned FOLD-SE rules determine a 'good' credit rating, with the undesired outcome being a 'good' rating, since the aim is to identify the criteria that make someone a credit risk ('bad' rating). Additionally, causal rules are learnt using FOLD-SE and then verified (for example, if feature 'Job' has value 'unemployed', then feature 'Present employment since' should have the value 'unemployed/unskilled-non-resident'). We learn these rules to verify assumptions about cause-effect dependencies.

Using the rules that identify individuals with a 'good' rating, we found a path to the counterfactuals, thereby depicting the steps that take an individual from a 'good' to a 'bad' rating (Table 1). Similarly, we learn the causal rules as well as the rules for the undesired outcome for the Adult dataset (undesired outcome: '=<$50k/year'). For the Car Evaluation dataset (undesired outcome: 'unacceptable'), we only learn the rules for the undesired outcome, as there are no causal dependencies (FOLD-SE did not generate any either). Table 1 shows a path to the counterfactual goal state for a specific instance of each dataset, along with the execution time for finding the counterfactuals. While we show only specific paths in Table 1, the CoGS methodology can generate all possible paths from an original instance to a counterfactual. Note that each path may represent a set of counterfactuals, because numerical features may range over an interval. Thus, CoGS generates 240 sets of counterfactuals for the German dataset, 112 for the Adult dataset, and 78 for the Car Evaluation dataset (see Table 2 in the supplement [2]).

German dataset (time: 3236 ms):
Feature | Initial State | Action | Goal State
Checking account status | >= 200 | N/A | >= 200
Credit history | no credits taken/all credits paid back duly | N/A | no credits taken/all credits paid back duly
Property | real estate | Direct | car or other
Duration months | 7 | N/A | 7
Credit amount | 500 | N/A | 500
Job | unemployed | N/A | unemployed
Present Employment Since | unemployed/unskilled-non-resident | N/A | unemployed/unskilled-non-resident

Car Evaluation dataset (time: 1221 ms):
Feature | Initial State | Action | Goal State
persons | 4 | N/A | 4
maint | low | Direct | medium
buying | medium | N/A | medium
safety | medium | N/A | medium

Adult dataset (time: 1126 ms):
Feature | Initial State | Action | Intermediate State | Action | Goal State
Marital_Status | never_married | N/A | never_married | Causal | married_civ_spouse
Capital Gain | $6000 | N/A | N/A | N/A | >6849 and <=99999
Education_num | 7 | N/A | N/A | N/A | 7
Relationship | unmarried | Direct | husband | N/A | husband
Sex | male | N/A | N/A | N/A | male
Age | 28 | N/A | N/A | N/A | 28

Table 1: Paths showing transitions to goal states, along with the time taken, across the different datasets: 1) German: the value of Property changes from real estate to car or other. 2) Car Evaluation: the value of maint goes from low to medium. 3) Adult: the value of Relationship changes from unmarried to husband, which has the causal effect of altering Marital_Status to married_civ_spouse.

6 Related Work and Conclusion

Various methods for generating counterfactual explanations in machine learning have been proposed. Wachter et al. [19] aimed to provide transparency in automated decision-making by suggesting changes individuals could make to achieve desired outcomes; however, they ignored causal dependencies, resulting in unrealistic suggestions. Ustun et al. [18] introduced algorithmic recourse, offering actionable paths to desired outcomes but assuming feature independence, which is often unrealistic. CoGS rectifies this by incorporating causal dependencies. Karimi et al. [12] focused on feature immutability and diverse counterfactuals, ensuring that features like gender or age are not altered while maintaining model-agnosticism; however, this method also assumes feature independence, limiting realism. Other existing approaches are model-specific and optimization-based [17, 16]. White et al. [21] showed how counterfactuals can enhance model performance and explanation accuracy. Karimi et al. [13] further emphasized incorporating causal rules in counterfactual generation for realistic and achievable interventions; however, their method did not use the 'if and only if' property, which is vital for incorporating the effects of causal dependence. Bertossi and Reyes [6] rectified this by utilizing Answer Set Programming (ASP), but they relied on grounding, which can disconnect variables from their associations. In contrast, CoGS does not require grounding, as it leverages s(CASP) to generate counterfactual explanations, providing a clear path from undesired to desired outcomes.

Eiter et al. [9] proposed a framework for generating contrastive explanations in the context of ASP, focusing on why one particular outcome occurred instead of another. While contrastive explanations identify and compare the assumptions leading to different outcomes, CoGS goes further by incorporating causal dependencies, ensuring that the generated counterfactuals are realistic and achievable.

The main contribution of this paper is the Counterfactual Generation with s(CASP) (CoGS) framework for automatically generating counterfactuals, taking causal dependencies into account, to flip a negative outcome into a positive one. CoGS can find minimal paths by iteratively adjusting the path length, ensuring that explanations are both minimal and causally consistent. CoGS is flexible: it generates counterfactuals irrespective of the underlying rule-based machine learning (RBML) algorithm, and the causal dependencies can be learned from data using any RBML algorithm, such as FOLD-SE. The goal-directed s(CASP) ASP system plays a crucial role, as it allows us to compute a possible world in which a query Q fails by finding the world in which the query not Q succeeds. CoGS advances the state of the art by combining counterfactual reasoning, causal modeling, and ASP-based planning, offering a robust framework for realistic and actionable counterfactual explanations. Our experimental results show that counterfactuals can be computed for complex models in a reasonable amount of time. Future work will explore computing counterfactuals for image classification tasks, inspired by Padalkar et al. [14].

References

  • [1] UCI machine learning repository, https://archive.ics.uci.edu/
  • [2] Supplement: CoGS: Causality Constrained Counterfactual Explanations using Goal-directed ASP (2024), https://github.com/sopam/Supplementary
  • [3] Arias, J., Carro, M., Salazar, E., Marple, K., Gupta, G.: Constraint Answer Set Programming without Grounding. TPLP 18(3-4), 337–354 (2018)
  • [4] Baral, C.: Knowledge representation, reasoning and declarative problem solving. Cambridge University Press (2003)
  • [5] Becker, B., Kohavi, R.: Adult. UCI Machine Learning Repository (1996), DOI: https://doi.org/10.24432/C5XW20
  • [6] Bertossi, L.E., Reyes, G.: Answer-set programs for reasoning about counterfactual interventions and responsibility scores for classification. In: Proc. ILP. LNCS
  • [7] Bohanec, M.: Car Evaluation. UCI Machine Learning Repository (1997), DOI: https://doi.org/10.24432/C5JP48
  • [8] Brewka, G., Eiter, T., Truszczynski, M.: Answer set programming at a glance. Commun. ACM 54(12), 92–103 (2011)
  • [9] Eiter, T., Geibinger, T., Oetsch, J.: Contrastive explanations for answer-set programs. In: Logics in Artificial Intelligence JELIA. pp. 73–89. LNCS (2023)
  • [10] Gelfond, M., Kahl, Y.: Knowledge representation, reasoning, and the design of intelligent agents: Answer Set Programming approach. Cambridge Univ. Press (2014)
  • [11] Hofmann, H.: Statlog (German Credit Data). UCI Machine Learning Repository (1994), DOI: https://doi.org/10.24432/C5NC77
  • [12] Karimi, A., Barthe, G., Balle, B., Valera, I.: Model-agnostic counterfactual explanations for consequential decisions. In: AISTATS. PMLR (2020)
  • [13] Karimi, A., Schölkopf, B., Valera, I.: Algorithmic recourse: from counterfactual explanations to interventions. In: Proc. ACM FAccT. pp. 353–362 (2021)
  • [14] Padalkar, P., Wang, H., Gupta, G.: NeSyFOLD: A framework for interpretable image classification. In: Proc. AAAI (2024)
  • [15] Pearl, J.: Causal inference in statistics: An overview. Statistics Surveys 3(none), 96 – 146 (2009), https://doi.org/10.1214/09-SS057
  • [16] Russell, C.: Efficient search for diverse coherent explanations. In: Proc. ACM FAT. p. 20–28 (2019)
  • [17] Tolomei, G., Silvestri, F., Haines, A., Lalmas, M.: Interpretable predictions of tree-based ensembles via actionable feature tweaking. In: Proc. ACM SIGKDD
  • [18] Ustun, B., Spangher, A., Liu, Y.: Actionable recourse in linear classification. In: FAT. ACM (2019)
  • [19] Wachter, S., Mittelstadt, B.D., Russell, C.: Counterfactual explanations without opening the black box: Automated decisions and the GDPR. CoRR
  • [20] Wang, H., Gupta, G.: FOLD-SE: an efficient rule-based machine learning algorithm with scalable explainability. In: Proc. PADL 2024. pp. 37–53. Springer LNCS 14512
  • [21] White, A., d’Avila Garcez, A.S.: Measurable counterfactual local explanations for any classifier. In: Proc. ECAI. vol. 325, pp. 2529–2535 (2020)

7 Supplementary Material

7.1 Methodology: Details

7.1.1 is_counterfactual: Checks for Goal State/ Counterfactual State

Algorithm 3 is_counterfactual: checks if a state is a counterfactual/goal state
0:  State s ∈ S, Set of Causal rules C, Set of Decision rules Q
1:  if s satisfies ALL rules in C AND s satisfies NO rules in Q then
2:     Return TRUE.
3:  else
4:     Return FALSE.
5:  end if

The function is_counterfactual is our algorithmic implementation of the check s ∈ G from Definition 8. In Algorithm 3, we specify the pseudo-code for is_counterfactual, which takes as arguments a state s ∈ S, a set of causal rules C, and a set of decision rules Q, and checks whether s is a counterfactual/goal state. By definition, is_counterfactual is TRUE for a state s that is causally consistent with all c ∈ C and does not agree with any decision rule q ∈ Q:

is_counterfactual(s, C, Q) = TRUE, where s agrees with C and s disagrees with Q   (11)
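A minimal Python sketch (ours) of this check, with rules modeled as boolean predicates on states; the concrete rules below are assumptions taken from the running loan example.

def is_counterfactual(s, C, Q):
    """Sketch of Algorithm 3: TRUE iff s satisfies all causal rules in C
    and no decision rule in Q."""
    return all(c(s) for c in C) and not any(q(s) for q in Q)

# Assumed toy rules from the running loan example:
C = [lambda s: not (s["debt"] == 0 and s["credit_score"] <= 599)]
Q = [lambda s: s["balance"] < 60000, lambda s: s["credit_score"] < 600]

print(is_counterfactual(
    {"debt": 0, "balance": 60000, "credit_score": 620}, C, Q))  # True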

7.1.2 Intervene: Transition Function for moving from the current state to the new state

Algorithm 4 intervene: reach a causally consistent state from a causally consistent current state
0:  Causal rules C, List visited_states, List actions_taken, Actions a ∈ A:
  • Causal Action: s gets altered to a causally consistent new state s' = a(s). OR
  • Direct Action: new state s' = a(s) is obtained by altering 1 feature value of s.
1:  Set (s, actions_taken) = pop(visited_states)
2:  Try to select an action a ∈ A ensuring not_member(a(s), visited_states) and not_member(a, actions_taken) are TRUE
3:  if a exists then
4:     Set (s, actions_taken), visited_states = update(s, visited_states, actions_taken, a)
5:  else
6:     //Backtracking
7:     if visited_states is empty then
8:        EXIT with Failure
9:     end if
10:     Set (s, actions_taken) = pop(visited_states)
11:  end if
12:  Set (s, actions_taken), visited_states = make_consistent(s, actions_taken, visited_states, C, A)
13:  Append (s, actions_taken) to visited_states
14:  Return visited_states.

The function intervene is our algorithmic implementation of the transition function δ from Definition 6. In Algorithm 4, we specify the pseudo-code for intervene, which takes as arguments the list visited_states (whose last element is the current, causally consistent state), a set of causal rules C, and a set of actions A. It is called by find_path in line 4 of Algorithm 1.

The function intervene acts as a transition function: it takes the list visited_states, whose last element is the current state s, and returns the new state s' by appending it to visited_states. The new state s' is the state to which the current state s transitions. Additionally, intervene ensures that no states are revisited. If traversing from s to s' passes through a series of intermediate states that are not causally consistent, these are also included in visited_states, thereby recording how one causally consistent state is reached from another.
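Below is a minimal Python sketch (ours) of this transition step; the make_consistent argument stands in for Algorithm 2, and the bookkeeping mirrors the pseudo-code above.

def intervene(visited_states, actions, make_consistent):
    """Sketch of Algorithm 4: apply an untried action, then repair."""
    s, actions_taken = visited_states.pop()
    untried = [a for a in actions
               if a not in actions_taken
               and a(s) not in [v for v, _ in visited_states]]
    if untried:
        a = untried[0]
        actions_taken = actions_taken + [a]
        visited_states.append((s, actions_taken))  # update(...)
        s = a(s)
    else:
        if not visited_states:
            raise RuntimeError("EXIT with Failure")  # backtracking fails
        s, actions_taken = visited_states.pop()     # backtrack
    (s, actions_taken), visited_states = make_consistent(
        s, actions_taken, visited_states)
    visited_states.append((s, actions_taken))
    return visited_states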

7.1.3 Update

Algorithm 5 update: Updates the list actions_taken with the planned action. Then updates the current state.
0:  State s, List visited_states, List actions_taken, Action a ∈ A:
  • Causal Action: s gets altered to a causally consistent new state s' = a(s). OR
  • Direct Action: new state s' = a(s) is obtained by altering 1 feature value of s.
1:  Append a to actions_taken.
2:  Append (s, actions_taken) to visited_states.
3:  Set s = a(s).
4:  return (s, [ ]), visited_states

In Algorithm 5, we specify the pseudo-code for the function update: given a state s, a list actions_taken, a list visited_states, and an action a, it appends a to actions_taken, appends (s, actions_taken) to visited_states, and sets the current state to a(s). The list actions_taken tracks all the actions attempted from the current state, so as to avoid repeating them. The function update is called by both intervene and make_consistent.

7.1.4 Candidate Path

Given the CFG problem (S_C, S_Q, I, δ) constructed from a run of Algorithm 1 and its return value r (a list, when the algorithm succeeds), let r' be the list obtained by removing from r all elements containing states s' ∉ S_C. We construct the corresponding candidate path as follows: let r'_i denote the i-th element of the list r'. The candidate path is the sequence of states s_0, ..., s_{m-1}, where m is the length of r' and s_i is the state corresponding to r'_i for 0 ≤ i < m.

7.2 Proofs

Theorem 7.1

Soundness Theorem
Given a CFG problem X = (S_C, S_Q, I, δ) constructed from a run of Algorithm 1 and a corresponding candidate path P, P is a solution path for X.

Proof

Let G be a goal set for X. By Definition 11, P = s_0, ..., s_m, where m ≥ 0. By Definition 9, we must show that P has the following properties.

1) s_0 = I

2) s_m ∈ G

3) s_j ∈ S_C for all j ∈ {0, ..., m}

4) s_0, ..., s_{m-1} ∉ G

5) s_{i+1} ∈ δ(s_i) for i ∈ {0, ..., m-1}
1) By Definition 4, i is causally consistent and cannot be removed from the candidate path. Hence, I must be in the candidate path and is its first state, as per line 2 of Algorithm 1. Therefore s_0 must be i.
2) The while loop in Algorithm 1 ends if and only if is_counterfactual(s, C, Q) is TRUE. From Theorem 7.2, is_counterfactual(s, C, Q) is TRUE only for a goal state. Hence s_m ∈ G.
3) By Definition 11 of the candidate path, s_j ∈ S_C for all j ∈ {0, ..., m}.
4) By Theorem 7.4, s_0, ..., s_{m-1} ∉ G.
5) By Theorem 7.3, s_{i+1} ∈ δ(s_i) for i ∈ {0, ..., m-1}.
Hence, the candidate path P (Definition 11) is a solution path (Definition 9).

Theorem 7.2

Given a CFG problem X = (S_C, S_Q, I, δ) constructed from a run of Algorithm 1, with goal set G, and s ∈ S_C, is_counterfactual(s, C, Q) is TRUE if and only if s ∈ G.

Proof

By the definition of the goal set G we have

G = {s ∈ S_C | s ∉ S_Q}   (12)

For is_counterfactual, which takes as input the state s, the set of causal rules C, and the set of decision rules Q (Algorithm 3), we see from line 1 of Algorithm 3 that it returns TRUE if s satisfies all rules in C and no rules in Q.

By Definition 3, s ∈ S_Q if and only if it satisfies a rule in Q. Therefore, is_counterfactual(s, C, Q) is TRUE if and only if s ∉ S_Q; and since s ∈ S_C and s ∉ S_Q, we have s ∈ G.

Theorem 7.3

Given a CFG problem X = (S_C, S_Q, I, δ) constructed from a run of Algorithm 1 and a corresponding candidate path P = s_0, ..., s_m, we have s_{i+1} ∈ δ(s_i) for i ∈ {0, ..., m-1}.

Proof

This property can be proven by induction on the length of the list visited_states obtained from Algorithms 1, 2, and 4.

Base case: the list visited_states from Algorithm 1 has length 1, i.e., [s_0]. The property s_{i+1} ∈ δ(s_i) for i ∈ {0, ..., m-1} is trivially true, as there is no preceding transition to check.

Inductive hypothesis: we have a list [s_0, ..., s_{n-1}] of length n, generated from 0 or more iterations of the function intervene (Algorithm 4), that satisfies the claim s_{i+1} ∈ δ(s_i) for i ∈ {0, ..., n-2}.

Inductive step: suppose we have a list [s_0, ..., s_{n-1}] of length n and we obtain element s_n by running another iteration of the function intervene (Algorithm 4). Since [s_0, ..., s_{n-1}] satisfies the property by the inductive hypothesis, it is sufficient to show that s_n ∈ δ(s_{n-1}).

Going from s_{n-1} to s_n involves calling the function intervene (Algorithm 4), which in turn calls the function make_consistent (Algorithm 2).

Function make_consistent (Algorithm 2) takes as input the state s, the list of actions taken actions_taken, the list of visited states visited_states, the set of causal rules C, and the set of possible actions A. It returns visited_states with the new causally consistent state as the last element. From line 1, if the input state is already causally consistent, make_consistent does nothing. If the input state is causally inconsistent, it takes actions to reach a new state; the while loop in line 1 checks whether the action taken results in a causally consistent state and, if so, returns it. Hence, the moment a causally consistent state is encountered in make_consistent, no further states are added.

Function intervene (Algorithm 4) takes as input the list of visited states visited_states, which contains the current state as the last element, the set of causal rules C, and the set of possible actions A. It returns visited_states with the new causally consistent state as the last element, calling make_consistent in the process. In line 1, it obtains the current state (here s_{n-1}) from visited_states. In line 2, an action a is selected:

1) Case 1: if a causal action is taken, then the call to make_consistent (Algorithm 2) does nothing, as causal actions by definition result in causally consistent states.

2) Case 2: if a direct action is taken, then the new state, which may or may not be causally consistent, is appended to visited_states. The call to make_consistent then appends zero or more states, with only the final appended state being causally consistent.

Hence, the moment a causally consistent state is appended in intervene, no further states are added. This causally consistent state is s_n. In both cases, s_n = δ(s_{n-1}, a) for the action a taken, as constructed in Definition 10, and thus s_n ∈ δ(s_{n-1}).

Theorem 7.4

Given a CFG problem X = (S_C, S_Q, I, δ) constructed from a run of Algorithm 1, with goal set G, and a corresponding candidate path P = s_0, ..., s_m with m ≥ 0, we have s_0, ..., s_{m-1} ∉ G.

Proof

This property can be proven by induction on the length of the list visited_states obtained from Algorithms 1, 2, and 4.

Base case: visited_states has length 1. The property s_0, ..., s_{m-1} ∉ G (with m ≥ 0) is trivially true, as no state s_j with j < 0 exists.

Inductive hypothesis: we have a list [s_0, ..., s_{n-1}] of length n, generated from 0 or more iterations of the function intervene (Algorithm 4), that satisfies the claim s_0, ..., s_{n-2} ∉ G.

Inductive step: suppose we have a list [s_0, ..., s_{n-1}] of length n and we append the (n+1)-th element (state s_n) by calling the function intervene; we wish to show that the resulting list satisfies the claim s_0, ..., s_{n-1} ∉ G. The first n-1 elements (s_0, ..., s_{n-2}) are not in G by the inductive hypothesis.

From line 3 of the function find_path (Algorithm 1), we see that for intervene to be called another time, the current state (here s_{n-1}) cannot be a counterfactual, by Theorem 7.2. Hence s_{n-1} ∉ G.

Therefore, by induction, the claim s_0, ..., s_{n-1} ∉ G holds.

7.3 Experimental Setup

7.3.1 Counterfactuals

Algorithm 3 (is_counterfactual) returns TRUE for all states consistent with the causal rules C that disagree with the decision rules Q. Given our state space S, from is_counterfactual we obtain all states that are realistic counterfactuals. Table 2 shows the number of counterfactuals obtained using is_counterfactual.

Dataset | # of Features Used | # of Counterfactuals
Adult | 6 | 112
Cars | 4 | 78
German | 7 | 240
Table 2: Number of counterfactuals produced by the is_counterfactual function given all possible states.

7.3.2 Dataset: Cars

The Car Evaluation dataset provides information on car purchasing acceptability. We relabelled the Car Evaluation dataset to ‘acceptable’ and ‘unacceptable’ in order to generate the counterfactuals. We applied the CoGS methodology to rules generated by the FOLD-SE algorithm. These rules indicate whether a car is acceptable to buy or should be rejected, with the undesired outcome being rejection.

For the Car Evaluation dataset, Table 1 shows a path from the initial rejection to the changes that make the car acceptable for purchase.

We run the FOLD-SE algorithm to produce the following rules:
label(X,‘negative’) :- persons(X,‘2’).

label(X,‘negative’) :- safety(X,‘low’).

label(X,‘negative’) :- buying(X,‘vhigh’), maint(X,‘vhigh’).

label(X,‘negative’) :- not buying(X,‘low’), not buying(X,‘med’),
maint(X,‘vhigh’).

label(X,‘negative’) :- buying(X,‘vhigh’), maint(X,‘high’).

The rules above indicate whether the purchase of a car is rejected.

1. Accuracy: 93.9%
2. Precision: 100%
3. Recall: 91.3%

2) Features and Feature Values used:

  • Feature: persons

  • Feature: safety

  • Feature: buying

  • Feature: maint

7.3.3 Dataset: Adult

The Adult dataset includes demographic information with labels indicating income ('=<$50k/year' or '>$50k/year'). We applied the CoGS methodology to rules generated by the FOLD-SE algorithm on the Adult dataset [5]. These rules indicate whether someone makes '=<$50k/year'.

Additionally, causal rules are also learnt and verified (for example, if feature 'Marital Status' has value 'Never Married', then feature 'Relationship' should not have the value 'husband' or 'wife'). We learn these rules to verify assumptions about cause-effect dependencies.

The goal of CoGS is to find a path to a counterfactual instance where a person makes '>$50k/year'.

For the Adult dataset, Table 1 shows a path from making '=<$50k/year' to '>$50k/year'.

We run the FOLD-SE algorithm to produce the following decision making rules:
label(X,‘<=50K’) :- not marital_status(X,‘Married-civ-spouse’),

capital_gain(X,N1), N1=<6849.0.

label(X,‘<=50K’) :- marital_status(X,‘Married-civ-spouse’),

capital_gain(X,N1), N1=<5013.0,

education_num(X,N2), N2=<12.0.

1. Accuracy: 84.5%
2. Precision: 86.5%
3. Recall: 94.6%

2) FOLD-SE gives Causal rules for the ‘marital_status’ feature having value ‘never_married’:

marital_status(X,‘Never-married’) :- not relationship(X,‘Husband’),

not relationship(X,‘Wife’), age(X,N1), N1=<29.0.

1. Accuracy: 86.4%
2. Precision: 89.2%
3. Recall: 76.4%

3) FOLD-SE gives Causal rules for the ‘marital_status’ feature having value ‘Married-civ-spouse’:

marital_status(X,‘Married-civ-spouse’) :- relationship(X,‘Husband’).

marital_status(X,‘Married-civ-spouse’) :- relationship(X,‘Wife’).

1. Accuracy: 99.1%
2. Precision: 99.9%
3. Recall: 98.2%

4) For values of the feature 'marital_status' that are neither 'Married-civ-spouse' nor 'never_married', which we call 'neither', a user-defined rule is used:

marital_status(X,neither) :- not relationship(X,‘Husband’),

not relationship(X,‘Wife’).

5) FOLD-SE gives Causal rules for the ‘relationship’ feature having value ‘husband’:

relationship(X,‘Husband’) :- sex(X,‘Male’) ,

age(X,N1), not(N1=<27.0).

1. Accuracy: 82.3%
2. Precision: 71.3%
3. Recall: 93.2%

6) For the 'relationship' feature value 'wife', a user-defined rule is used:

relationship(X,‘Wife’) :- sex(X,‘Female’).

7) Features used in generating the counterfactual path:

  • Feature: marital_status

  • Feature: relationship

  • Feature: sex

  • Feature: capital_gain

  • Feature: education_num

  • Feature: age

7.3.4 Dataset: German

We run the FOLD-SE algorithm to produce the following decision making rules:

label(X,‘good’) :- checking_account_status(X,‘no_checking_account’).

label(X,‘good’) :- not checking_account_status(X,‘no_checking_account’),
not credit_history(X,‘all_dues_atbank_cleared’),
duration_months(X,N1), N1=<21.0,
credit_amount(X,N2), not(N2=<428.0),
not ab1(X,‘True’).

ab1(X,‘True’):-property(X,‘car or other’),
credit_amount(X,N2), N2=<1345.0.

1. Accuracy: 77%
2. Precision: 83%
3. Recall: 84.2%

2) FOLD-SE gives causal rules for the ‘present_employment_since’ feature having value ‘employed’, where ‘employed’ is a placeholder for all feature values other than ‘unemployed’:

present_employment_since(X,‘employed’) :-
not job(X,‘unemployed/unskilled-non_resident’).

1. Accuracy: 95%
2. Precision: 96.4%
3. Recall: 98.4%

3) For values of the feature ‘present_employment_since’ that are ‘unemployed’, a user-defined rule is used:

present_employment_since(X,‘unemployed’) :-
job(X,‘unemployed/unskilled-non_resident’).

4) Features used in generating the counterfactual path:

  • checking_account_status

  • credit_history

  • property

  • duration_months

  • credit_amount

  • present_employment_since

  • job