Lookahead Counterfactual Fairness
Abstract
As machine learning (ML) algorithms are used in applications that involve humans, concerns have arisen that these algorithms may be biased against certain social groups. Counterfactual fairness (CF) is a fairness notion proposed in Kusner et al. (2017) that measures the unfairness of ML predictions; it requires that the prediction perceived by an individual in the real world has the same marginal distribution as it would in a counterfactual world, in which the individual belongs to a different group. Although CF ensures fair ML predictions, it fails to consider the downstream effects of ML predictions on individuals. Since humans are strategic and often adapt their behaviors in response to the ML system, predictions that satisfy CF may not lead to a fair future outcome for the individuals. In this paper, we introduce lookahead counterfactual fairness (LCF), a fairness notion accounting for the downstream effects of ML models, which requires the individual future status to be counterfactually fair. We theoretically identify conditions under which LCF can be satisfied and propose an algorithm based on the theorems. We also extend the concept to path-dependent fairness. Experiments on both synthetic and real data validate the proposed method. (The code for this paper is available at https://github.com/osu-srml/LCF.)
1 Introduction
The integration of machine learning (ML) into high-stakes domains (e.g., lending, hiring, college admissions, healthcare) has the potential to enhance traditional human-driven processes. However, it may introduce the risk of perpetuating biases and unfair treatment of protected groups. For instance, the violence risk assessment tool SAVRY has been shown to discriminate against males and foreigners (Tolan et al., 2019); Amazon’s previous hiring system exhibited gender bias (Dastin, 2018); and the accuracy of a computer-aided clinical diagnostic system varies significantly across patients from different racial groups (Daneshjou et al., 2021). Numerous fairness notions have been proposed in the literature to address unfairness issues, including unawareness fairness, which prevents the explicit use of demographic attributes in the decision-making process; parity-based fairness, which equalizes certain statistics (e.g., accuracy, true/false positive rates) across different groups (Hardt et al., 2016b; Khalili et al., 2023; 2021b; 2021a; Abroshan et al., 2024); and preference-based fairness, which ensures that a group of individuals, as a whole, regards the results or consequences they receive from the ML system more favorably than those received by another group (Zafar et al., 2017; Do et al., 2022). Unlike these notions, which overlook the underlying causal structures among different variables, Kusner et al. (2017) introduced the concept of counterfactual fairness (CF), which requires that an individual receive a consistent treatment distribution in a counterfactual world where their sensitive attribute differs. Since then, many approaches have been developed to train ML models that satisfy CF (Chiappa, 2019; Zuo et al., 2022; Wu et al., 2019; Xu et al., 2019; Ma et al., 2023; Zuo et al., 2023; Abroshan et al., 2022).
However, CF is primarily studied in static settings without considering the downstream impacts ML decisions may have on individuals. Because humans in practice often adapt their behaviors in response to the ML system, their future status may be significantly impacted by ML decisions (Miller et al., 2020; Shavit et al., 2020; Hardt et al., 2016a). For example, individuals receiving approvals in loan applications may have more resources and be better equipped to improve their future creditworthiness (Zhang et al., 2020). Content recommended in digital platforms can steer consumer behavior and reshape their preferences (Dean & Morgenstern, 2022; Carroll et al., 2022). As a result, a model that satisfies CF in a static setting without accounting for such downstream effects may lead to unexpected adverse outcomes.
Although the downstream impacts of fair ML have also been studied in prior works (Henzinger et al., 2023a; Xie & Zhang, 2024a; Ge et al., 2021; Henzinger et al., 2023b; Liu et al., 2018; Zhang et al., 2020; Xie et al., 2024), the impact of counterfactually fair decisions remains relatively unexplored. The work most related to this study is Hu & Zhang (2022), which considers sequential interactions between individuals and an ML system over time, with the goal of ensuring that ML decisions satisfy a path-specific counterfactual fairness constraint throughout the sequential interactions. However, Hu & Zhang (2022) still focuses on the fairness of ML decisions, not the fairness of the individual’s actual status. Indeed, it has been well-evidenced that ML decisions satisfying certain fairness constraints during model deployment may reshape the population and unintentionally exacerbate group disparity (Liu et al., 2018; Zhang et al., 2019; 2020). A prime example is Liu et al. (2018), which studied the lending problem and showed that lending decisions satisfying statistical parity or equal opportunity fairness (Hardt et al., 2016b) may actually cause harm to disadvantaged groups by lowering their future credit scores, resulting in amplified group disparity. Tang et al. (2023) considered sequential interactions between ML decisions and individuals and studied the impact of counterfactually fair predictions on statistical fairness, but their goal is still to ensure parity-based fairness at the group level.
In this work, we focus on counterfactual fairness evaluated over individual future status (label), which accounts for the downstream effects of ML decisions on individuals. We aim to examine under what conditions and by what algorithms the disparity between individual future status in factual and counterfactual worlds can be mitigated after deploying ML decisions. To this end, we first introduce a new fairness notion called "lookahead counterfactual fairness (LCF)". Unlike the original counterfactual fairness proposed by Kusner et al. (2017), which requires the ML predictions received by individuals to be the same as those in the counterfactual world, LCF takes one step further by enforcing the individual future status (after responding to ML predictions) to be the same.
Given the definition of LCF, we then develop algorithms that learn ML models under LCF. To model the effects of ML decisions on individuals, we focus on scenarios where individuals subject to certain ML decisions adapt their behaviors strategically by increasing their chances of receiving favorable decisions; this can be mathematically formulated as modifying their features toward the direction of the gradient of the decision function (Rosenfeld et al., 2020; Xie & Zhang, 2024b). We first theoretically identify conditions under which an ML model can satisfy LCF, and then develop an algorithm for training ML models under LCF. We also extend the algorithm and theorems to path-dependent LCF, which only considers unfairness incurred by the causal effect from the sensitive attribute to the outcome along certain paths.
Our contributions can be summarized as follows:
• We propose lookahead counterfactual fairness (LCF), a novel fairness notion that evaluates counterfactual fairness over individual future status (i.e., actual labels after responding to ML systems). Unlike the original CF notion that focuses on current ML predictions, LCF accounts for the subsequent impacts of ML decisions and aims to ensure fairness over individual actual future status. We also extend the definition to path-dependent LCF.
• For scenarios where individuals respond to ML models by changing features toward the direction of the gradient of decision functions, we theoretically identify conditions under which an ML model can satisfy LCF. We further develop an algorithm for training ML models under LCF.
• We conduct extensive experiments on both synthetic and real data to validate the proposed algorithm. Results show that compared to conventional counterfactually fair predictors, our method can reduce disparity with respect to the individual actual future status.
2 Related Work
Causal fairness has been explored in many respects in recent years. Kilbertus et al. (2017) point out that no observational criterion can distinguish scenarios that are determined by different causal mechanisms but have the same observational distribution. They propose the definitions of unresolved discrimination and proxy discrimination based on the intuition that some of the paths from the sensitive attribute to the prediction can be acceptable. Nabi & Shpitser (2018) argue that fair causal inference on the outcome can be obtained by solving a constrained optimization problem. These notions are based on defining constraints on interventional distributions.
Counterfactual fairness (Kusner et al., 2017) requires the prediction of the target variable to have the same distribution in the factual world and the counterfactual world. Many extensions of traditional statistical fairness notions have been proposed, such as fair on average causal effect (FACE), which can be regarded as counterfactual demographic parity (Khademi et al., 2019); fair on average causal effect on the treated (FACT), which can be regarded as counterfactual equalized odds (Khademi et al., 2019); and CAPI fairness (counterfactual individual fairness) (Ehyaei et al., 2024). Path-specific counterfactual fairness (Chiappa, 2019) considers path-specific causal effects. However, these notions focus on fairness in static settings and do not consider future effects. Most recent work on counterfactual fairness concerns achieving it in different applications, such as graph data (Wang et al., 2024a; b), medical LLMs (Poulain et al., 2024), or software debugging (Xiao et al., 2024). Some works use counterfactual fairness for explanations (Goethals et al., 2024). Connecting counterfactual fairness with group fairness notions (Anthis & Veitch, 2024) and exploring counterfactual fairness with partial knowledge of the causal model (Shao et al., 2024; Pinto et al., 2024; Duong et al., 2024; Zhou et al., 2024) have also received much attention. Machado et al. (2024) propose an interpretable form of counterfactual fairness by deriving counterfactuals with optimal transport (De Lara et al., 2024). However, extending the definition of counterfactual fairness to account for downstream effects has received less attention.
Several studies in the literature consider the downstream effects of ML predictions on fairness. There are two kinds of objectives in this line of work: ensuring fair predictions in the future or ensuring a fair true status in the future. The two works most related to ours are Hu & Zhang (2022) and Tang et al. (2023). Hu & Zhang (2022) consider the problem of ensuring that ML predictions satisfy path-specific counterfactual fairness over time, after interactions between individuals and an ML system. Tang et al. (2023) study the impact of ML predictions on the future true status. Even though they consider the impact of a counterfactually fair predictor, their goal is still to ensure parity-based fairness. Therefore, existing works do not consider ensuring counterfactual fairness over the true label after individuals respond to the current ML prediction; our paper aims to address this problem.
3 Problem Formulation
Consider a supervised learning problem with a training dataset consisting of triples (A, X, Y), where A is a sensitive attribute distinguishing individuals from multiple groups (e.g., race, gender), X is a d-dimensional feature vector, and Y is the target variable indicating the individual's underlying status (e.g., in lending Y identifies an applicant's ability to repay the loan; in healthcare Y may represent a patient's insulin spike level). The goal is to learn a predictor from the training data that can predict Y given inputs X and A. Let Ŷ denote the output of the predictor.
We assume the data is associated with a structural causal model (SCM) (Pearl et al., 2000) M = (V, U, F), where V = {A, X, Y} represents the observable variables, U includes the unobservable (exogenous) variables that are not caused by any variable in V, and F is a set of functions called structural equations that determine how each observable variable is constructed. More precisely, we have the following structural equations,
X = f_X(PA_X, U_X),  Y = f_Y(PA_Y, U_Y),  A = f_A(PA_A, U_A),   (1)
where PA_X, PA_Y, and PA_A are observable variables that are the parents of X, Y, and A, respectively. U_X denotes the unobservable variables that are the parents of X. Similarly, we denote the unobservable variables U_Y and U_A as the parents of Y and A, respectively.
3.1 Background: counterfactuals
If the probability density functions of the unobserved variables U are known, we can leverage the structural equations in the SCM to find the marginal distribution of any observable variable and even study how intervening on certain observable variables impacts other variables. Specifically, an intervention on a variable V_i is equivalent to replacing its structural equation V_i = f_i(PA_i, U_i) with the equation V_i = v for some value v. Given the new structural equation and the other unchanged structural equations, we can find out how the distribution of the other observable variables changes as we change the value v.
In addition to understanding the impact of an intervention, an SCM can further facilitate counterfactual inference, which aims to answer the question "what would be the value of Z if W had taken value w, in the presence of evidence O = o (both Z and W being observable variables)?" The answer to this question is denoted by Z_{W←w}(U), where U follows the conditional distribution of U given O = o. Given O = o and the structural equations F, the counterfactual value of Z can be computed by replacing the structural equation of W with W = w and replacing U with its posterior given O = o in the rest of the structural equations. Such a counterfactual is typically denoted by Z_{W←w}(U) | O = o. Given evidence O = o, the distribution of the counterfactual value can be calculated as follows (given the structural equations and the marginal distribution of U, the conditional distribution of U given O = o can be calculated using the change-of-variables technique and the Jacobian factor, so the counterfactual distribution can also be calculated),
P(Z_{W←w}(U) = z | O = o) = Σ_u P(Z_{W←w}(u) = z) · P(U = u | O = o).   (2)
Example 3.1 (Law school success).
Consider two groups of college students distinguished by gender, whose first-year average (FYA) in college is denoted by Y. The FYA of each student is causally related to their (observable) grade-point average (GPA) before entering college, entrance exam score (LSAT), and gender A. Suppose there are two unobservable variables; e.g., one may be interpreted as the student's knowledge. Consider the following structural equations:
where the coefficients are known parameters of the causal model. Given an observation, the counterfactual value of FYA can be calculated with an abduction-action-prediction procedure (Glymour et al., 2016): (i) abduction, which finds the posterior distribution of the unobservable variables given the observation; (ii) action, which performs the intervention by replacing the structural equation of the intervened variable (here, gender); (iii) prediction, which computes the distribution of the counterfactual FYA using the new structural equations and the posterior.
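To make the abduction-action-prediction steps concrete, here is a minimal Python sketch that applies them to a toy one-feature linear SCM. The functional form and the parameters w_a, w_x, w_y are hypothetical stand-ins for the law-school equations above, not the paper's actual model.

```python
# Toy linear SCM used only for illustration (hypothetical parameters):
#   X = w_a * A + U   and   Y = w_x * X + w_y * U
w_a, w_x, w_y = 1.5, 0.8, 0.5

def counterfactual_y(a_obs, x_obs, a_cf):
    """Abduction-action-prediction for the toy SCM above."""
    # (i) Abduction: recover the exogenous variable from the observation
    #     (exact here because the SCM is deterministic and invertible in U).
    u = x_obs - w_a * a_obs
    # (ii) Action: intervene on the sensitive attribute, A <- a_cf.
    x_cf = w_a * a_cf + u
    # (iii) Prediction: push the recovered U through the unchanged equations.
    return w_x * x_cf + w_y * u

# Factual vs. counterfactual outcome for one individual with A = 1, X = 2.0.
u_obs = 2.0 - w_a * 1
y_factual = w_x * 2.0 + w_y * u_obs
y_counterfactual = counterfactual_y(a_obs=1, x_obs=2.0, a_cf=0)
print(y_factual, y_counterfactual)
```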
3.2 Counterfactual Fairness
Counterfactual Fairness (CF) was first proposed by Kusner et al. (2017); it requires that for an individual with A = a, the prediction in the factual world should be the same as that in the counterfactual world in which the individual belongs to a different group. Mathematically, CF requires that for all y, x, a, and a':
P(Ŷ_{A←a}(U) = y | X = x, A = a) = P(Ŷ_{A←a'}(U) = y | X = x, A = a).
While the CF notion has been widely used in the literature, it does not account for the downstream impacts of ML prediction on individuals in factual and counterfactual worlds. To illustrate the importance of considering such impacts, we provide an example below.
Example 3.2.
Consider automatic lending where an ML model is used to decide whether to issue a loan to an applicant based on credit score and sensitive attribute . As highlighted in Liu et al. (2018), issuing loans to unqualified people who cannot repay the loan may hurt them by worsening their future credit scores. Assume an applicant in the factual world is qualified for the loan and does not default. But in a counterfactual world where the applicant belongs to another group, he/she is not qualified. Under counterfactually fair predictions, both individuals in the factual and counterfactual worlds should receive the loan with the same probability. Suppose both are issued a loan, then the one in the counterfactual world would have a worse credit score in the future. Thus, it is crucial to consider the downstream effects when learning a fair ML model.
3.3 Characterize downstream effects
Motivated by Example 3.2, this work studies CF in a dynamic setting where the deployed ML decisions may affect individual behavior and change individuals' future features and statuses. Formally, we distinguish an individual's future feature vector and future status from their current counterparts, and we use an individual response to capture the impact of the ML prediction on individuals, as defined below.
Definition 3.1 (Individual response).
An individual response is a map from the current exogenous variables, endogenous variables, and the ML prediction to the future exogenous and endogenous variables.
4 Lookahead Counterfactual Fairness
[Figure 1: Causal graph used in Example 4.1.]
We consider fairness over the individual's future outcome. Given the structural causal model, the individual response, and the observed data, we define lookahead counterfactual fairness below.
Definition 4.1.
We say an ML model satisfies lookahead counterfactual fairness (LCF) under a response if the following holds for all y, x, a, and a':
P(Y'_{A←a}(U) = y | X = x, A = a) = P(Y'_{A←a'}(U) = y | X = x, A = a),   (3)
where Y' denotes the individual's future status after responding to the ML prediction.
LCF implies that the subsequent consequence of ML decisions for a given individual in the factual world should be the same as that in the counterfactual world where the individual belongs to another demographic group. Note that CF may contradict LCF: even under a counterfactually fair predictor, individuals in the factual and counterfactual worlds may end up with very different future statuses. We show this with an example below.
Example 4.1.
Consider the causal graph in Figure 1 and the structural functions as follows:
Based on Kusner et al. (2017), a predictor that only uses the unobservable variables as input is counterfactually fair (note that the unobservable variables can be generated for each sample; see Section 4.1 of Kusner et al. (2017) for more details). Therefore, such a predictor satisfies CF. Let the unobservable variables be uniformly distributed. Note that the response implies that individuals make efforts to change their feature vectors through changing the unobservable variables, which results in a higher status in the future. It is easy to see which parameters of a CF predictor minimize the MSE loss. However, for that MSE-optimal CF predictor, we have:
This shows that although the decisions in the factual and counterfactual worlds are the same, the future statuses are still different, and Definition 4.1 does not hold.
Theorem 4.1 below identifies more general scenarios under which LCF can be violated with a CF predictor.
Theorem 4.1 (Violation of LCF under CF predictors).
Consider a causal model and individual response in the following form:
If the response is a function and the statuses in the factual and counterfactual worlds have different distributions, i.e.,
then imposing any arbitrary model that satisfies CF will violate LCF, i.e.,
5 Learning under LCF
This section introduces an algorithm for learning a predictor under LCF. In particular, we focus on a special case with the causal model and the individual response defined below.
Given sets of unobservable variables and observable variables , we consider causal model with the following structural functions:
(4)
where the structural function for the features is invertible (several works in causal inference also consider invertible structural functions, e.g., the bijective causal models introduced in Nasr-Esfahany et al. (2023)), and the structural function for the outcome is invertible w.r.t. its exogenous argument. After receiving the ML prediction, the individual's future features and status change accordingly. Specifically, we consider scenarios where the individual's unobservable variables change based on the following
(5)
and the future attributes and status also change accordingly, i.e.,
(6)
The above scenario implies that individuals respond to the ML model by strategically moving their features toward the direction that increases their chances of receiving favorable decisions; the step size controls the magnitude of the data change and can be interpreted as the effort budget individuals have for changing their data. Note that this type of response has been widely studied in the strategic classification literature (Rosenfeld et al., 2020; Hardt et al., 2016a). The above process is visualized in Figure 2.
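The following is a minimal sketch of this response process, assuming a hypothetical linear SCM (the exact structural equations of this section appear in Theorem 5.1 and are not reproduced here); the names individual_response, W_a, theta, and b are illustrative only.

```python
import numpy as np

def individual_response(u, a, grad_f_u, eta, W_a, theta, b):
    """u: current exogenous variables; grad_f_u(u, a): gradient of the
    predictor w.r.t. u; eta: effort budget (step size)."""
    u_next = u + eta * grad_f_u(u, a)   # Eq. 5: move U along the gradient
    x_next = W_a * a + u_next           # Eq. 6: future features (hypothetical linear SCM)
    y_next = theta @ x_next + b * a     # Eq. 6: future status (hypothetical linear SCM)
    return u_next, x_next, y_next

# Example with a simple linear predictor f(u, a) = v . u, whose gradient is v.
v = np.array([0.3, 0.7])
u_new, x_new, y_new = individual_response(
    u=np.array([0.1, -0.2]), a=1,
    grad_f_u=lambda u, a: v, eta=0.5,
    W_a=np.array([1.0, 0.5]), theta=np.array([0.6, 0.4]), b=0.2)
```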
[Figure 2: Illustration of how the individual response to the ML prediction changes the unobservable variables and, in turn, the future features and status.]
Our goal is to train an ML model under LCF constraint. Before presenting our method, we first define the notion of counterfactual random variables.
Definition 5.1 (Counterfactual random variable).
Let and be realizations of random variables and , and . We say and are the counterfactual random variables associated with if follows the conditional distribution as given by the causal model . The realizations of , are denoted by and .
The following theorem constructs a predictor that satisfies LCF, i.e., deploying the predictor in Theorem 5.1 ensures the future status is counterfactually fair.
Theorem 5.1 (Predictor with perfect LCF).
Consider causal model , where , , and the structural equations are given by,
(7)
where ⊙ denotes the element-wise product. Then, the following predictor satisfies LCF,
(8)
where the counterfactual random variable is the one associated with the observation. Here, the remaining parameters and the function are arbitrary and can be trained to improve prediction performance.
Proof Sketch.
For any given , we can find the conditional distribution of . For a sample drawn from the distribution, we can compute and . Then we have
From this, we can compute the gradient of the predictor w.r.t. the exogenous variables. Using the response function, we obtain the future exogenous variables in the factual and counterfactual worlds, and the structural functions then give the corresponding future statuses. So, we have
Because we know that , we have
By the law of total probability, LCF is satisfied. ∎
The above theorem implies that the predictor should be constructed based on the counterfactual random variable. Even though the exogenous variable is unobserved, it can be obtained from the inverse of the structural equations. The quantity in Theorem 5.1 depends on the step size in the individual response and on the parameters of the structural functions; when it is set to the appropriate value, we can achieve perfect LCF.
It is worth noting that Definition 4.1 can be a very strong constraint, and requiring the factual and counterfactual future statuses to have the same distribution may degrade the performance of the predictor significantly. To tackle this, we consider a weaker version of LCF.
Definition 5.2 (Relaxed LCF).
We say Relaxed LCF holds if we have:
(9)
Definition 5.2 implies that after individuals respond to the ML model, the difference between the future statuses in the factual and counterfactual worlds should be smaller than the difference between the original statuses in the factual and counterfactual worlds. In other words, the disparity between the factual and counterfactual worlds must decrease over time. In Section 7, we empirically show that the constraint in equation 9 is weaker than the constraint in equation 3 and can lead to better prediction performance.
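As a rough illustration, the inequality in Definition 5.2 can be checked empirically on sampled data; the sketch below compares average gaps, which is one plausible reading of the definition rather than its exact formal statement.

```python
import numpy as np

def relaxed_lcf_gap(y_fact, y_cf, y_future_fact, y_future_cf):
    """Empirical check in the spirit of Definition 5.2: the disparity between
    factual and counterfactual *future* statuses should not exceed the
    disparity between the *current* statuses."""
    current_gap = np.abs(np.mean(y_fact - y_cf))
    future_gap = np.abs(np.mean(y_future_fact - y_future_cf))
    return future_gap <= current_gap, current_gap, future_gap
```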
Corollary 5.1 (Relaxed LCF with predictor in equation 8).
Proof Sketch.
When , from
we have
Therefore . With law of total probability, the Relaxed LCF is satisfied. ∎
Apart from relaxing the weight in the predictor in equation 8, we can also relax the form of the predictor to satisfy Relaxed LCF, as shown in Theorem 5.2.
Theorem 5.2 (Predictor under Relaxed LCF).
Consider the same causal model defined in Theorem 5.1. A predictor satisfies Relaxed LCF if has the following three properties:
(i) is strictly convex in .
(ii) can be expressed as .
(iii) The derivative of w.r.t. is -Lipschitz continuous in with , i.e.,
Proof Sketch.
When satisfies property (ii), we can prove that
still holds. Because of property (i), we have
Therefore, when , , which is guaranteed by property (iii). ∎
Theorems 5.1 and 5.2 provide insights on designing algorithms to train a predictor with perfect or Relaxed LCF. Specifically, given the training data, we first estimate the structural equations. Then, we choose a parameterized predictor that satisfies the conditions in Theorem 5.1 or 5.2. An example is shown in Algorithm 1, which finds an optimal predictor of the form in equation 8 under LCF, with a trainable function and two additional trainable parameters. Under Algorithm 1, we can find the optimal values of these parameters using the training data. If we only want to satisfy Relaxed LCF (Definition 5.2), the weight can also be treated as a training parameter.
Input: Training data , response parameter .
Output:
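Since the listing of Algorithm 1 is only given in outline above, the following is a minimal PyTorch training-loop sketch in its spirit. It assumes the exogenous variables and their counterfactuals have already been recovered by abduction, and it uses the hypothetical predictor form y_hat = w1*g(u) + w2*g(u_cf) as a stand-in for equation 8; in the actual algorithm, the weights are fixed (perfect LCF) or constrained (Relaxed LCF) according to the theorems.

```python
import torch

def train_lcf_predictor(u, u_cf, y, epochs=200, lr=1e-2, learn_weights=False):
    """u, u_cf: float tensors of recovered factual/counterfactual exogenous
    variables; y: observed statuses. learn_weights=False keeps the mixing
    weights fixed (perfect-LCF flavor); True lets them be trained (Relaxed LCF)."""
    g = torch.nn.Linear(u.shape[1], 1)                 # trainable function g
    w = torch.nn.Parameter(torch.tensor([0.5, 0.5]), requires_grad=learn_weights)
    params = list(g.parameters()) + ([w] if learn_weights else [])
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        y_hat = (w[0] * g(u) + w[1] * g(u_cf)).squeeze(-1)
        loss = torch.mean((y_hat - y) ** 2)            # MSE objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return g, w
```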
It is worth noting that the results in Theorems 5.1 and 5.2 are for linear causal models. When the causal model is non-linear, it is hard to construct a model satisfying perfect LCF in Definition 4.1. Nonetheless, we can still show that it is possible to satisfy Relaxed LCF (Definition 5.2) for certain non-linear causal models. Theorem 5.3 below focuses on a special case where the outcome does not depend linearly on its parents, and it identifies the conditions under which Relaxed LCF can be guaranteed.
Theorem 5.3.
Consider a bijective causal model , where is a scalar exogenous variable, , and the structural equations can be written in the form of
for some function , where is an arbitrary function of . Define function as the multiplication of and its derivative. If has the following properties:
• is monotonic and strictly concave;
• if then ;
• and there exists a constant such that is -Lipschitz continuous.
Then, the following predictor satisfies Relaxed LCF,
where , are learnable parameters, is the counterfactual random variable associated with , and can be an arbitrary monotonic function that is increasing (resp. decreasing) when is increasing (resp. decreasing).
Proof Sketch.
For a sample drawn from the conditional distribution of given , we can compute , , and get
The specific formulas of , , and can be seen in the full proof. With the mean value theorem, we know that
where , and
where . The three properties ensure that
Therefore, . With law of total probability, we have the Relaxed LCF satisfied. ∎
Theorems 5.2 and 5.3 show that designing a predictor under Relaxed LCF highly depends on the form of causal structure and structural equations. To wrap up this section, we would like to identify conditions under which Relaxed LCF holds in a causal graph that is determined by the product of and .
Theorem 5.4.
Consider a non-linear causal model , where , , is a binary sensitive attribute. Assume that the structural functions are given by,
(10) |
where ⊙ denotes the element-wise product. A predictor satisfies Relaxed LCF if it and the causal model have the following three properties.
(i) The value domain of satisfies .
(ii) is strictly convex.
(iii) The derivative of is -Lipschitz continuous with , i.e.,
Proof Sketch.
For a sample drawn from the conditional distribution of given , we can compute the and and get
The definition of the relevant quantity can be seen in the full proof. Because of properties (i) and (ii),
Property (iii) ensures that . Therefore, . ∎
Although the structural equation associated with the outcome is still linear in the features and the sensitive attribute, we emphasize that such a linear assumption has been very common in the literature due to the complex nature of strategic classification (Zhang et al., 2022; Liu et al., 2020; Bechavod et al., 2022). For instance, Bechavod et al. (2022) assumed the actual status of individuals is a linear function of the features. Zhang et al. (2022) assumed that the features themselves may be non-linear in some underlying traits of the individuals, but the relationship between the features and the status is still linear. Indeed, due to individuals' strategic responses, conducting a theoretical analysis accounting for such responses can be highly challenging. Nonetheless, it is worthwhile to extend LCF to non-linear settings, and we leave this for future work.
6 Path-dependent LCF
An extension of counterfactual fairness called path-dependent counterfactual fairness was introduced in Kusner et al. (2017). In this section, we introduce an analogous extension of LCF called path-dependent LCF and modify Algorithm 1 to satisfy it.
We start by introducing the notion of path-dependent counterfactuals. In a causal model associated with a causal graph , we denote as a set of unfair paths from sensitive attribute to . We define as the set of features that are not present in any of the unfair paths. Under observation , we call path-dependent counterfactual random variable for , and its distribution can be calculated as follows:
For simplicity, we use a shorthand to represent a path-dependent counterfactual and its corresponding realization. We consider the same kind of causal model described in Section 5, where the future attributes and outcome are determined by equations 5 and 6. We formally define path-dependent LCF in the following definition.
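To illustrate how a path-dependent counterfactual differs from an ordinary one, here is a toy sketch for a hypothetical two-feature SCM in which only one feature lies on an unfair path; the function and coefficients are illustrative assumptions, not the paper's model.

```python
def path_dependent_counterfactual(a_obs, x_unfair, x_fair, a_cf, w_u=1.0, w_f=1.0):
    """Toy sketch: x_unfair lies on an unfair path A -> X_u -> Y, while x_fair
    does not, so only x_unfair is recomputed under the flipped attribute."""
    # Abduction: recover the exogenous part of the unfair-path feature.
    u_unfair = x_unfair - w_u * a_obs
    # Flip A only along the unfair path; the fair feature keeps its factual value.
    x_unfair_cf = w_u * a_cf + u_unfair
    x_fair_cf = x_fair
    return x_unfair_cf, x_fair_cf

# Example: an individual with A = 1, X_u = 2.0, X_f = 1.0 and counterfactual A = 0.
print(path_dependent_counterfactual(a_obs=1, x_unfair=2.0, x_fair=1.0, a_cf=0))
```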
Definition 6.1.
We say an ML model satisfies path-dependent lookahead counterfactual fairness w.r.t. the unfair path set if the following holds :
Then we have the following theorem.
Theorem 6.1.
Consider a causal model and structural equations defined in Theorem 5.1. If we denote the features on unfair path as and remaining features as , we can re-write structural equations as
Then, the following predictor satisfies path-dependent LCF,
where with
and are learnable parameters to improve prediction performance, and is an arbitrary function.
Proof Sketch.
Consider a sample drawn from the conditional distribution; we can compute the factual and path-dependent counterfactual quantities and obtain an equation similar to that in Theorem 5.1. The form of the predictor then yields the desired equality, and by the law of total probability, path-dependent LCF is satisfied. ∎
7 Experiment
We conduct experiments on both synthetic and real data to validate the proposed method.
7.1 Synthetic Data
We generate the synthetic data based on the causal model described in Theorem 5.1. We assume the exogenous variables follow a uniform distribution and the sensitive attribute is a Bernoulli random variable. Then, we generate the features and outcomes using the structural functions described in Theorem 5.1 (the exact parameter values can be found in Appendix B). Based on the causal model, the conditional distributions of the exogenous variables given an observation are as follows,
(11)
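A minimal sketch of the data-generating process is shown below, with hypothetical linear structural equations standing in for those of Theorem 5.1 (the actual parameter values are listed in Appendix B).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 2                       # hypothetical sizes, for illustration only

A = rng.binomial(1, 0.5, size=n)     # Bernoulli sensitive attribute
U = rng.uniform(-1.0, 1.0, (n, d))   # uniform exogenous variables
W_a = rng.normal(size=d)             # hypothetical SCM coefficients
theta, b = rng.normal(size=d), 0.5

X = U + A[:, None] * W_a             # hypothetical linear structural eq. for X
Y = X @ theta + b * A                # hypothetical linear structural eq. for Y
```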
Baselines.
We use two baselines for comparison: (i) the unfair predictor (UF), a linear model without any fairness constraint that takes the features as input and predicts the outcome; and (ii) the counterfactually fair predictor (CF) proposed by Kusner et al. (2017), which only takes the unobservable variables as input.
Implementation Details.
To find a predictor satisfying Definition 4.1, we train a predictor in the form of Eq. 8, where the trainable function is linear. To train it, we follow Algorithm 1. We randomly split the dataset into training/validation/test sets and repeat the experiment 5 times. We use the validation set to find the optimal number of training epochs and the learning rate; based on our observations, Adam optimization gives us the best performance.
Metrics.
We use three metrics to evaluate the methods. To evaluate predictive performance, we use the mean squared error (MSE). Given a dataset, for each data point we generate samples of the unobservable variables from the posterior distribution, and the MSE can be estimated as follows (see Section 4.1 of Kusner et al. (2017) for details on why equation 12 is an empirical estimate of the MSE),
(12)
where the prediction is computed for each data point under each posterior sample. Note that for the UF baseline, the prediction does not depend on the unobservable variables, so it does not change across posterior samples. To evaluate fairness, we define a metric called the average future causal effect (AFCE),
It is the average difference between the factual and counterfactual future outcomes. To compare methods, we use the unfairness improvement ratio (UIR) defined below; a larger UIR implies a greater improvement in disparity.
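The three metrics can be estimated with a few lines of code; the sketch below reflects one plausible implementation of Eq. 12, AFCE, and UIR under our reading of their definitions.

```python
import numpy as np

def mse_over_posterior(y, y_hat_samples):
    """Empirical MSE in the spirit of Eq. 12: y_hat_samples[i, j] is the
    prediction for data point i under the j-th posterior sample of U."""
    return np.mean((y_hat_samples - y[:, None]) ** 2)

def afce(y_future_factual, y_future_cf):
    """Average future causal effect: one plausible estimate of the average gap
    between factual and counterfactual future outcomes."""
    return np.abs(np.mean(y_future_factual - y_future_cf))

def uir(afce_method, afce_baseline):
    """Unfairness improvement ratio relative to a baseline predictor."""
    return 1.0 - afce_method / afce_baseline
```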
Method | MSE | AFCE | UIR |
---|---|---|---|
UF | 0.036 ± 0.003 | 1.296 ± 0.000 | 0% ± 0
CF | 0.520 ± 0.045 | 1.296 ± 0.000 | 0% ± 0
Ours () | 0.064 ± 0.001 | 0.000 ± 0.0016 | 100% ± 0
Results.
Table 1 illustrates the results. They show that our method can achieve perfect LCF. Note that our method and UF achieve a similar MSE. Moreover, our method achieves better performance than the CF method because our predictor uses additional predictive information, which can improve performance and decrease the disparity simultaneously. Because neither CF nor UF takes the future outcome into account, their AFCE values are similar, leading to zero UIR.
[Figure 3: MSE vs. AFCE trade-off under different settings on (a) synthetic data and (b) the law school dataset.]
Based on Corollary 5.1, the value of the weighting parameter can impact the strength of fairness. We examine the trade-off between accuracy and fairness by varying its value. Figure 3(a) shows the MSE as a function of AFCE. The results show that we can easily control the accuracy-fairness trade-off in our algorithm by adjusting this parameter, and we can get a high LCF improvement while maintaining a low MSE. To show how our method impacts a specific individual, we choose the first data point in our test dataset and plot the distributions of the factual and counterfactual future statuses for this data point under different methods. Figure 4 illustrates these distributions. The leftmost plot shows an obvious gap between the factual and counterfactual statuses. Neither UF nor CF can decrease this gap for the future outcome. However, with our method, the factual and counterfactual distributions become closer to each other; in the rightmost plot of Figure 4, the two distributions become the same in the factual and counterfactual worlds.
[Figure 4: Distributions of the factual and counterfactual future status for the first test data point under UF, CF, and our method.]
7.2 Real Data: The Law School Success Dataset
[Figure 5: Causal graph for the law school dataset, adopted from Kusner et al. (2017).]
We further measure the performance of our proposed method using the Law School Admission Dataset Wightman (1998). In this experiment, the objective is to forecast the first-year average grades (FYA) of students in law school using their undergraduate GPA and LSAT scores.
Dataset.
The dataset consists of 21,791 records. Each record is characterized by five variables: Sex, Race, UGPA, LSAT, and FYA. Both Sex and Race are categorical in nature. The Sex attribute can be either male or female, while Race can be Amerindian, Asian, Black, Hispanic, Mexican, Puerto Rican, White, or other. UGPA is a continuous variable, LSAT is an integer-valued attribute, and FYA, the target variable for prediction, is a real number (it has been normalized). In this study, we consider Race as the sensitive attribute, while the remaining variables are treated as features.
Causal Model.
We adopt the causal model presented in Kusner et al. (2017), which is visualized in Figure 5. In this causal graph, an unobserved variable represents the student's knowledge. Thus, the model suggests that students' grades (UGPA, LSAT, FYA) are influenced by their sex, race, and underlying knowledge. We assume that the prior distribution of the knowledge variable is a standard normal distribution. We adopt the same structural equations as Kusner et al. (2017):
Implementation Details.
Note that race is an immutable characteristic. Therefore, we assume that individuals only adjust their knowledge in response to the prediction model. In contrast to the synthetic data, the parameters of the structural equations are unknown, and we have to use the training dataset to estimate them. Following the approach of Kusner et al. (2017), we assume that UGPA and FYA follow Gaussian distributions centered at linear functions of their parents, and that LSAT, being an integer, follows a Poisson distribution. Using the Markov chain Monte Carlo (MCMC) method (Geyer, 1992), we estimate the parameters and the conditional distribution of the knowledge variable given the observations. For each data point, we sample several values of the knowledge variable from this conditional distribution. We partitioned the data into training, validation, and test sets.
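As a rough illustration of the abduction step, the following hand-rolled Metropolis-Hastings sketch samples the posterior of the knowledge variable for one student under the Gaussian/Poisson model described above; the paper uses a standard MCMC implementation, and the parameter names below are assumptions made only for this sketch.

```python
import numpy as np

def sample_knowledge_posterior(gpa, lsat, demo, params, n_samples=500, step=0.3, seed=0):
    """Metropolis-Hastings sketch for the posterior of latent knowledge K of one
    student given (GPA, LSAT, demographics). `params` holds structural-equation
    coefficients assumed to be already estimated."""
    rng = np.random.default_rng(seed)
    b_g, w_gk, w_ga, sigma_g, b_l, w_lk, w_la = params

    def log_post(k):
        log_prior = -0.5 * k ** 2                              # K ~ N(0, 1)
        mu_gpa = b_g + w_gk * k + w_ga @ demo
        log_gpa = -0.5 * ((gpa - mu_gpa) / sigma_g) ** 2       # Gaussian GPA term
        lam = np.exp(b_l + w_lk * k + w_la @ demo)
        log_lsat = lsat * np.log(lam) - lam                    # Poisson LSAT term
        return log_prior + log_gpa + log_lsat

    k, samples = 0.0, []
    for _ in range(n_samples):
        proposal = k + step * rng.normal()
        if np.log(rng.uniform()) < log_post(proposal) - log_post(k):
            k = proposal                                       # accept
        samples.append(k)
    return np.array(samples)
```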
Method | MSE | AFCE | UIR |
---|---|---|---|
UF | 0.393 ± 0.046 | 0.026 ± 0.003 | 0% ± 0
CF | 0.496 ± 0.051 | 0.026 ± 0.003 | 0% ± 0
Ours () | 0.493 ± 0.049 | 0.013 ± 0.002 | 50% ± 0
Ours () | 0.529 ± 0.049 | 0.000 ± 0.000 | 100% ± 0
[Figure 6: Factual and counterfactual distributions of the current and future status for the first test data point under different methods.]
Results.
Table 2 illustrates the results. They show that our method achieves a similar MSE to the CF predictor while improving AFCE significantly compared to the baselines. Figure 6 shows the distributions of the current and future status for the first data point in the test set in the factual and counterfactual worlds. Under the UF and CF predictors, the disparity between the factual and counterfactual future statuses remains similar to the disparity between the factual and counterfactual current statuses. On the other hand, under our algorithms, the disparity between the factual and counterfactual worlds improves for the future status. Figure 3(b) demonstrates that for the law school dataset, the trade-off between MSE and AFCE can be adjusted by changing the weighting hyperparameter. Overall, our method is the only one that noticeably decreases the gap between the factual and counterfactual future statuses.
8 Conclusion
This work studied the impact of ML decisions on individuals’ future status using a counterfactual inference framework. We observed that imposing the CF predictor may not decrease the group disparity in individuals’ future status. We thus introduced the lookahead counterfactual fairness (LCF) notion, which takes into account the downstream effects of ML models and requires the individual future status to be counterfactually fair. We proposed a method to train an ML model under LCF and evaluated the method through empirical studies on synthetic and real data.
Acknowledgements
This material is based upon work supported by the U.S. National Science Foundation under award IIS-2202699, IIS-2416895, IIS-2301599, and CMMI-2301601, and by OSU President’s Research Excellence Accelerator Grant, and grants from the Ohio State University’s Translational Data Analytics Institute and College of Engineering Strategic Research Initiative.
References
- (1) Loan Prediction Problem Dataset — kaggle.com. https://www.kaggle.com/datasets/altruistdelhite04/loan-prediction-problem-dataset. [Accessed 20-10-2024].
- Abroshan et al. (2022) Mahed Abroshan, Mohammad Mahdi Khalili, and Andrew Elliott. Counterfactual fairness in synthetic data generation. In NeurIPS Workshop on Synthetic Data for Empowering ML Research, 2022.
- Abroshan et al. (2024) Mahed Abroshan, Andrew Elliott, and Mohammad Mahdi Khalili. Imposing fairness constraints in synthetic data generation. In International Conference on Artificial Intelligence and Statistics, pp. 2269–2277. PMLR, 2024.
- Anthis & Veitch (2024) Jacy Anthis and Victor Veitch. Causal context connects counterfactual fairness to robust prediction and group fairness. Advances in Neural Information Processing Systems, 36, 2024.
- Bechavod et al. (2022) Yahav Bechavod, Chara Podimata, Steven Wu, and Juba Ziani. Information discrepancy in strategic learning. In International Conference on Machine Learning, pp. 1691–1715, 2022.
- Carroll et al. (2022) Micah D Carroll, Anca Dragan, Stuart Russell, and Dylan Hadfield-Menell. Estimating and penalizing induced preference shifts in recommender systems. In International Conference on Machine Learning, pp. 2686–2708. PMLR, 2022.
- Chiappa (2019) Silvia Chiappa. Path-specific counterfactual fairness. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pp. 7801–7808, 2019.
- Daneshjou et al. (2021) Roxana Daneshjou, Kailas Vodrahalli, Weixin Liang, Roberto A Novoa, Melissa Jenkins, Veronica Rotemberg, Justin Ko, Susan M Swetter, Elizabeth E Bailey, Olivier Gevaert, et al. Disparities in dermatology ai: Assessments using diverse clinical images. arXiv preprint arXiv:2111.08006, 2021.
- Dastin (2018) Jeffrey Dastin. Amazon scraps secret ai recruiting tool that showed bias against women. http://reut.rs/2MXzkly, 2018.
- De Lara et al. (2024) Lucas De Lara, Alberto González-Sanz, Nicholas Asher, Laurent Risser, and Jean-Michel Loubes. Transport-based counterfactual models. Journal of Machine Learning Research, 25(136):1–59, 2024.
- Dean & Morgenstern (2022) Sarah Dean and Jamie Morgenstern. Preference dynamics under personalized recommendations. In Proceedings of the 23rd ACM Conference on Economics and Computation, pp. 795–816, 2022.
- Do et al. (2022) Virginie Do, Sam Corbett-Davies, Jamal Atif, and Nicolas Usunier. Online certification of preference-based fairness for personalized recommender systems. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 6532–6540, 2022.
- Duong et al. (2024) Tri Dung Duong, Qian Li, and Guandong Xu. Achieving counterfactual fairness with imperfect structural causal model. Expert Systems with Applications, 240:122411, 2024.
- Ehyaei et al. (2024) Ahmad-Reza Ehyaei, Kiarash Mohammadi, Amir-Hossein Karimi, Samira Samadi, and Golnoosh Farnadi. Causal adversarial perturbations for individual fairness and robustness in heterogeneous data spaces. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pp. 11847–11855, 2024.
- Ge et al. (2021) Yingqiang Ge, Shuchang Liu, Ruoyuan Gao, Yikun Xian, Yunqi Li, Xiangyu Zhao, Changhua Pei, Fei Sun, Junfeng Ge, Wenwu Ou, et al. Towards long-term fairness in recommendation. In Proceedings of the 14th ACM international conference on web search and data mining, pp. 445–453, 2021.
- Geyer (1992) Charles J Geyer. Practical markov chain monte carlo. Statistical science, pp. 473–483, 1992.
- Glymour et al. (2016) Madelyn Glymour, Judea Pearl, and Nicholas P Jewell. Causal inference in statistics: A primer. John Wiley & Sons, 2016.
- Goethals et al. (2024) Sofie Goethals, David Martens, and Toon Calders. Precof: counterfactual explanations for fairness. Machine Learning, 113(5):3111–3142, 2024.
- Hardt et al. (2016a) Moritz Hardt, Nimrod Megiddo, Christos Papadimitriou, and Mary Wootters. Strategic classification. In Proceedings of the 2016 ACM conference on innovations in theoretical computer science, pp. 111–122, 2016a.
- Hardt et al. (2016b) Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. Advances in neural information processing systems, 29:3315–3323, 2016b.
- Henzinger et al. (2023a) Thomas Henzinger, Mahyar Karimi, Konstantin Kueffner, and Kaushik Mallik. Runtime monitoring of dynamic fairness properties. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pp. 604–614, 2023a.
- Henzinger et al. (2023b) Thomas A Henzinger, Mahyar Karimi, Konstantin Kueffner, and Kaushik Mallik. Monitoring algorithmic fairness. arXiv preprint arXiv:2305.15979, 2023b.
- Hu & Zhang (2022) Yaowei Hu and Lu Zhang. Achieving long-term fairness in sequential decision making. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 9549–9557, 2022.
- Khademi et al. (2019) Aria Khademi, Sanghack Lee, David Foley, and Vasant Honavar. Fairness in algorithmic decision making: An excursion through the lens of causality. In The World Wide Web Conference, pp. 2907–2914, 2019.
- Khalili et al. (2021a) Mohammad Mahdi Khalili, Xueru Zhang, and Mahed Abroshan. Fair sequential selection using supervised learning models. Advances in Neural Information Processing Systems, 34:28144–28155, 2021a.
- Khalili et al. (2021b) Mohammad Mahdi Khalili, Xueru Zhang, Mahed Abroshan, and Somayeh Sojoudi. Improving fairness and privacy in selection problems. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp. 8092–8100, 2021b.
- Khalili et al. (2023) Mohammad Mahdi Khalili, Xueru Zhang, and Mahed Abroshan. Loss balancing for fair supervised learning. In International Conference on Machine Learning, pp. 16271–16290. PMLR, 2023.
- Kilbertus et al. (2017) Niki Kilbertus, Mateo Rojas Carulla, Giambattista Parascandolo, Moritz Hardt, Dominik Janzing, and Bernhard Schölkopf. Avoiding discrimination through causal reasoning. Advances in neural information processing systems, 30, 2017.
- Kusner et al. (2017) Matt J Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. Advances in neural information processing systems, 30, 2017.
- Liu et al. (2018) Lydia T Liu, Sarah Dean, Esther Rolf, Max Simchowitz, and Moritz Hardt. Delayed impact of fair machine learning. In International Conference on Machine Learning, pp. 3150–3158. PMLR, 2018.
- Liu et al. (2020) Lydia T Liu, Ashia Wilson, Nika Haghtalab, Adam Tauman Kalai, Christian Borgs, and Jennifer Chayes. The disparate equilibria of algorithmic decision making when individuals invest rationally. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 381–391, 2020.
- Ma et al. (2023) Jing Ma, Ruocheng Guo, Aidong Zhang, and Jundong Li. Learning for counterfactual fairness from observational data. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 1620–1630, 2023.
- Machado et al. (2024) Agathe Fernandes Machado, Arthur Charpentier, and Ewen Gallic. Sequential conditional transport on probabilistic graphs for interpretable counterfactual fairness. arXiv preprint arXiv:2408.03425, 2024.
- Miller et al. (2020) John Miller, Smitha Milli, and Moritz Hardt. Strategic classification is causal modeling in disguise. In International Conference on Machine Learning, pp. 6917–6926. PMLR, 2020.
- Nabi & Shpitser (2018) Razieh Nabi and Ilya Shpitser. Fair inference on outcomes. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
- Nasr-Esfahany et al. (2023) Arash Nasr-Esfahany, Mohammad Alizadeh, and Devavrat Shah. Counterfactual identifiability of bijective causal models. arXiv preprint arXiv:2302.02228, 2023.
- Pearl et al. (2000) Judea Pearl et al. Models, reasoning and inference. Cambridge, UK: Cambridge University Press, 19(2), 2000.
- Pinto et al. (2024) Mariana Pinto, Andre V Carreiro, Pedro Madeira, Alberto Lopez, and Hugo Gamboa. The matrix reloaded: Towards counterfactual group fairness in machine learning. Journal of Data-centric Machine Learning Research, 2024.
- Poulain et al. (2024) Raphael Poulain, Hamed Fayyaz, and Rahmatollah Beheshti. Aligning (medical) llms for (counterfactual) fairness. arXiv preprint arXiv:2408.12055, 2024.
- Rosenfeld et al. (2020) Nir Rosenfeld, Anna Hilgard, Sai Srivatsa Ravindranath, and David C Parkes. From predictions to decisions: Using lookahead regularization. Advances in Neural Information Processing Systems, 33:4115–4126, 2020.
- Shao et al. (2024) Pengyang Shao, Le Wu, Kun Zhang, Defu Lian, Richang Hong, Yong Li, and Meng Wang. Average user-side counterfactual fairness for collaborative filtering. ACM Transactions on Information Systems, 42(5):1–26, 2024.
- Shavit et al. (2020) Yonadav Shavit, Benjamin Edelman, and Brian Axelrod. Causal strategic linear regression. In International Conference on Machine Learning, pp. 8676–8686. PMLR, 2020.
- Tang et al. (2023) Zeyu Tang, Yatong Chen, Yang Liu, and Kun Zhang. Tier balancing: Towards dynamic fairness over underlying causal factors. arXiv preprint arXiv:2301.08987, 2023.
- Tolan et al. (2019) Songül Tolan, Marius Miron, Emilia Gómez, and Carlos Castillo. Why machine learning may lead to unfairness: Evidence from risk assessment for juvenile justice in catalonia. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law, pp. 83–92, 2019.
- Wang et al. (2024a) Zichong Wang, Zhibo Chu, Ronald Blanco, Zhong Chen, Shu-Ching Chen, and Wenbin Zhang. Advancing graph counterfactual fairness through fair representation learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 40–58. Springer, 2024a.
- Wang et al. (2024b) Zichong Wang, Meikang Qiu, Min Chen, Malek Ben Salem, Xin Yao, and Wenbin Zhang. Toward fair graph neural networks via real counterfactual samples. Knowledge and Information Systems, 66(11):6617–6641, 2024b.
- Wightman (1998) Linda F Wightman. Lsac national longitudinal bar passage study. lsac research report series. 1998.
- Wu et al. (2019) Yongkai Wu, Lu Zhang, and Xintao Wu. Counterfactual fairness: Unidentification, bound and algorithm. In Proceedings of the twenty-eighth international joint conference on Artificial Intelligence, 2019.
- Xiao et al. (2024) Ying Xiao, Jie M Zhang, Yepang Liu, Mohammad Reza Mousavi, Sicen Liu, and Dingyuan Xue. Mirrorfair: Fixing fairness bugs in machine learning software via counterfactual predictions. Proceedings of the ACM on Software Engineering, 1(FSE):2121–2143, 2024.
- Xie & Zhang (2024a) Tian Xie and Xueru Zhang. Automating data annotation under strategic human agents: Risks and potential solutions. arXiv preprint arXiv:2405.08027, 2024a.
- Xie & Zhang (2024b) Tian Xie and Xueru Zhang. Non-linear welfare-aware strategic learning. arXiv preprint arXiv:2405.01810, 2024b.
- Xie et al. (2024) Tian Xie, Zhiqun Zuo, Mohammad Mahdi Khalili, and Xueru Zhang. Learning under imitative strategic behavior with unforeseeable outcomes. Transactions on Machine Learning Research, 2024. ISSN 2835-8856. URL https://openreview.net/forum?id=82bNZGMNZa.
- Xu et al. (2019) Depeng Xu, Shuhan Yuan, and Xintao Wu. Achieving differential privacy and fairness in logistic regression. In Companion Proceedings of The 2019 World Wide Web Conference, pp. 594–599, 2019.
- Zafar et al. (2017) Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th international conference on world wide web, pp. 1171–1180, 2017.
- Zhang et al. (2019) Xueru Zhang, Mohammadmahdi Khaliligarekani, Cem Tekin, et al. Group retention when using machine learning in sequential decision making: the interplay between user dynamics and fairness. Advances in Neural Information Processing Systems, 32:15269–15278, 2019.
- Zhang et al. (2020) Xueru Zhang, Ruibo Tu, Yang Liu, Mingyan Liu, Hedvig Kjellstrom, Kun Zhang, and Cheng Zhang. How do fair decisions fare in long-term qualification? Advances in Neural Information Processing Systems, 33:18457–18469, 2020.
- Zhang et al. (2022) Xueru Zhang, Mohammad Mahdi Khalili, Kun Jin, Parinaz Naghizadeh, and Mingyan Liu. Fairness interventions as (Dis)Incentives for strategic manipulation. In Proceedings of the 39th International Conference on Machine Learning, pp. 26239–26264, 2022.
- Zhou et al. (2024) Zeyu Zhou, Ruqi Bai, and David I Inouye. Improving practical counterfactual fairness with limited causal knowledge. In ICLR 2024 Workshop on Navigating and Addressing Data Problems for Foundation Models, 2024.
- Zuo et al. (2022) Aoqi Zuo, Susan Wei, Tongliang Liu, Bo Han, Kun Zhang, and Mingming Gong. Counterfactual fairness with partially known causal graph. Advances in Neural Information Processing Systems, 35:1238–1252, 2022.
- Zuo et al. (2023) Zhiqun Zuo, Mohammad Mahdi Khalili, and Xueru Zhang. Counterfactually fair representation. Advances in Neural Information Processing Systems, 36:12124–12140, 2023.
Appendix A Proofs
A.1 Proof of Theorem 5.1 and Theorem 5.2
Proof.
For any given , we can find the conditional distribution and based on causal model . Consider sample drawn from this conditional distribution. For this sample, we have,
Since is also a function of , utilizing that
the gradient of w.r.t. are
The response function is defined as
can be calculated using response as follows,
(13)
Similarly, we can calculate counterfactual value as follows,
(14)
Note that the following hold for ,
(15)
(16)
Thus,
(17)
Given above equation, now we can prove Theorem 5.1, Corollary 5.1, and Theorem 5.2,
• For in Theorem 5.1 and Corollary 5.1, we have
(18)
The partial derivative of can be computed as
(19)
We denote . From the theorem, we know that . Therefore, we have
Since, for any realization of , the above equation holds, we can conclude that the following holds,
When , we have that
Because
and
we have . With law of total probability, we have
• For in Theorem 5.2, since is strictly convex in , we have,
Note that derivative of with respect to is -Lipschitz continuous in ,
we have that
Therefore,
So we have
∎
A.2 Theorem 5.2 for a non-binary sensitive attribute
Let be a set of all possible values for . Let be the counterfactual random variable associated with given observation and . Then, satisfies LCF, where satisfies the properties in Theorem 5.2.
Proof.
For any given , we assume the set of counterfactual is . Consider a sample drawn from the condition distribution of and , with a predictor , use the same way in A.1, we can get
with . When , we have
and when ,
Because is strictly convex and Lipschitz continuous, we have
and
Therefore,
So we proved that, for any
∎
A.3 Proof of Theorem 5.3
Proof.
We start from the case when is increasing. For any given , we can find the conditional distribution based on the causal model . Consider a sample drawn from this conditional distribution. For this sample, we have
So, the gradient of w.r.t. is
Therefore, can be calculated using response as follows,
Similarly, we have the future counterfactual status as
When , it is obvious that since . When , because is increasing, we have
Because is increasing, we have and
We further denote
We already have
There are three cases for the relationship between , , and .
• Case 1: In this case, from the mean value theorem,
where , and
where . Because
and
we have
Because is strictly concave, is strictly decreasing, and we have,
Therefore,
• Case 2: In this case,
From the mean value theorem,
where and
where . Because is decreasing, we have
Because
we have that . Therefore, .
• Case 3: In this case, we have
From the mean value theorem,
where , and
where . Because
and
when ,
Because , we have .
In conclusion, we prove that for every sample . With law of total probability, we have
For the case when is decreasing, we can consider , then we have
which is to say
∎
A.4 Proof of Theorem 5.4
Proof.
From the causal functions defined in Theorem 5.4, given any , we can find the conditional distribution and . Similar to the proof of Theorem 5.2, we have
Because
the gradient of w.r.t are
The response function is defined as
can be calculated using the response as follows,
In the counterfactual world,
So we have,
We denote that . Because is a binary attribute, we have
From the property of that is strictly convex, we have
Note that the derivative of is -Lipschitz continuous,
we have
Therefore, for every sampled from the conditional distribution, . So we proved
∎
A.5 Proof of Theorem 6.1
Proof.
For any given observation, we can find the conditional distribution based on the causal model. Consider a sample drawn from this conditional distribution. For this sample, we have,
So, the gradient of w.r.t. are
The response function is defined as
can be calculated using response as follows,
Similarly, we can calculate path-dependent counterfactual value as follows,
Thus,
Denote . Since the partial gradient of w.r.t. is , we know that . Since for any realization of , the equation holds, we can conclude that the path-dependent LCF holds. ∎
A.6 Proof of Theorem 4.1
Proof.
Since is determined by , we denote the causal function from to as
Suppose the conditional distribution of given can be denoted as ; then we have
Because
we have,
(20)
Because the predictor satisfies CF,
the future outcome could be written as
From Eq.20, we have
which is to say
∎
Appendix B Parameters for Synthetic Data Simulation
When generating the synthetic data, we used
These values are generated randomly.
Appendix C Empirical Evaluation of Theorem 5.2
In this section, we use the same synthetic dataset generated in Section 7.1 to validate Theorem 5.2. We keep all the experimental settings the same as in Section 7.1 but use a different predictor. The form of the predictor is
with . It is obvious that this predictor satisfies properties (i) and (ii) in Theorem 5.2. Because, in the synthetic causal model, the relevant quantity is always larger than 0, property (iii) is also satisfied.
Method | MSE | AFCE | UIR |
---|---|---|---|
UF | 0.036 ± 0.003 | 1.296 ± 0.000 | 0% ± 0
CF | 0.520 ± 0.045 | 1.296 ± 0.000 | 0% ± 0
Ours | 0.064 ± 0.001 | 0.930 ± 0.001 | 28.2% ± 0
Appendix D Empirical Evaluation of Theorem 5.3
We generate a synthetic dataset with the structural function:
The domain of is . is sampled from a uniform distribution and set to 0.5987 in this experiment. In this case, . Therefore, properties (i) and (ii) are satisfied. Since , we have . We use a predictor
and choose .
Method | MSE | AFCE | UIR |
---|---|---|---|
UF | 0.012 ± 0.001 | 1.084 ± 0.000 | 0% ± 0
CF | 0.329 ± 0.019 | 0.932 ± 0.007 | 14.0% ± 0.65%
Ours | 5.298 ± 1.704 | 0.124 ± 0.086 | 88.6% ± 7.93%
Appendix E Empirical Evaluation of Theorem 5.4
We generate a synthetic dataset of 1000 samples following the structural functions in equation 10. The parameters used in the structural functions are displayed as follows.
Method | MSE | AFCE | UIR |
---|---|---|---|
UF | 0.036 ± 0.002 | 17.480 ± 0.494 | 0% ± 0
CF | 1.400 ± 0.098 | 11.193 ± 1.019 | 35.9% ± 11.6%
Ours | 1.068 ± 0.432 | 0.000 ± 0.000 | 100% ± 0
To construct a predictor that satisfies properties (ii) and (iii) described in Theorem 5.4, we set
with . Table 5 displays the experimental results. In this case, CF improved LCF; however, there is no guarantee that a CF predictor will always improve LCF. Our method not only provides a theoretical guarantee but also achieves a better MSE-LCF trade-off.
Appendix F Empirical Evaluation On Real-world (Loan) Dataset
We measure the performance of our proposed method using the Loan Prediction Problem Dataset (kag). In this experiment, the objective is to forecast the loan amount of individuals using their gender, income, co-applicant income, marital status, and the area of the owned property.
The causal model behind the dataset assumes there exists an exogenous variable that represents the hidden financial status of the person. The structural functions are given as
and the remaining variables have no parent nodes. We use the same implementation as in the experiments for the Law School Success dataset.
Method | MSE () | AFCE | UIR |
---|---|---|---|
UF | 1.352 ± 0.835 | 11.751 ± 0.848 | 3.49% ± 7.19%
CF | 2.596 ± 0.255 | 11.792 ± 0.790 | 0% ± 0
Ours () | 2.646 ± 0.540 | 5.896 ± 0.395 | 50% ± 3.35%
Ours () | 2.733 ± 0.197 | 0.001 ± 0.000 | 100% ± 0
Table 6 shows the results of our method compared to the baselines. Again, our method can achieve perfect LCF with an appropriate setting of the weight. Compared to the CF predictor, our method has only a slightly larger MSE, but LCF is greatly improved.