Kernel-Distance-Based Covariate Balancing
Abstract
A common concern in observational studies focuses on properly evaluating the causal effect, which usually refers to the average treatment effect or the average treatment effect on the treated. In this paper, we propose a data preprocessing method, the Kernel-distance-based covariate balancing, for observational studies with binary treatments. This proposed method yields a set of unit weights for the treatment and control groups, respectively, such that the reweighted covariate distributions can satisfy a set of pre-specified balance conditions. This preprocessing methodology can effectively reduce confounding bias of subsequent estimation of causal effects. We demonstrate the implementation and performance of Kernel-distance-based covariate balancing with Monte Carlo simulation experiments and a real data analysis.
Key words: Observational Studies; Causal Effect; Average Treatment Effect; Covariate Balance; Kernel Distance
1 Introduction
In causal inference, there are two crucial concepts. (i) The propensity score (Rosenbaum and Rubin, 1983): the probability of assignment to the treatment group given the observed covariates. (ii) The potential outcomes (Neyman, 1923; Rubin, 1974): the possible outcomes under different treatment decisions or interventions. The potential outcomes framework, also known as the Neyman-Rubin Potential Outcomes, was first proposed in completely randomized experiments by Neyman (1923) and was extended to both experimental and observational studies by Rubin (2005). An overview for the implementation of the potential outcomes framework can be found in Imbens and Rubin (2015). Both the two concepts are related to evaluate the causal effect, a central topic in causan inference. Based on various purpose, researchers may prefer estimating the average treatment effect (ATE), the average treatment effect on the treated (ATT), the average treatment effect on the control (ATC), or all of them.
In observational studies, however, it is not valid to evaluate the causal effect by merely computing the mean-difference in response between different experimental groups due to the confounding variables and the non-random assignment mechanism. One method proposed by Rubin (2007) to solve this problem is to balance the distributions between the treated and the control groups. Two popular preprocessing methods, matching and propensity score methods, have been widely used in observational studies. In causal inference, weighting methods that adjust for observed covariates enjoy substantial popularity. In this paper, we focus on weighting methods.
The weights derived from adjusting the empirical distributions of the observed covariates will be applied to the outcomes such that the reweighted data appear randomized, which can yield stable estimates for the parameters of interest when there is no unmeasured confounders. Generally speaking, inverse probability weighting is less robust and works no better than moment balancing methods. Therefore, we propose a weighting framework, namely the kernel distance-based (KDB) covariate balance, and use the first-moment balance of all covariates as a constraint for the optimization problem. In this article, the KDB method is a preprocessing technique to achieve covariate balance in observational studies with a binary treatment. We give estimators of ATE and ATT based on the KDB framework, among which our primary focus is ATE estimation. Within the same framework, the optimization problem can be flexibly changed to estimate ATT or ATC.
Subsequently, we further explore their asymptotic properties and compare our proposed method with other preprocessing methods, such as the inverse probability weighting (IPW) (Horvitz and Thompson, 1952; Hirano et al., 2003), the covariate balancing propensity score (CBPS) (Imai and Ratkovic, 2014), the entropy balancing (EB) (Hainmueller, 2012), and non-parametric calibration weighting (CAL)(Chan et al., 2016). It can be shown that this proposed framework has several attractive features. Most importantly, the KDB method can automatically match the balance involve the second and possibly higher moments of the covariate distributions as well as interactions. Besides, the KDB estimator for ATE shows greater stability in nonlinear simulation scenarios compared to existing weighting estimators.
The remainder of this article is organized as follows. Section 2 introduces the definitions, notation, and assumptions used throughout the article. Section 3 reviews the main results of four weighting methods, namely the IPW, CBPS, EB, and CAL. Section 4 introduces the weighted kernel distance in detail and describes the theoretical framework of the KDB that we proposed. Section 5 illustrates the proposed method and compares its performance with other weighting methods in two Monte Carlo simulations and a real data analysis. Section 6 concludes this article and discusses further explorations.
2 Notation and Model Setup
2.1 Notation
We use the symbol ‘’ to distinguish definitions from equalities. Suppose the observed dataset consists of a random sample or finite population of units, each assigned to one of the groups, treatment or control, for which covariate-balanced comparisons are of interest. For each unit , we can observe of , where is an outcome variable, is a treatment indicator and with expectation and covariance matrix is a set of covariates. Every takes values 1 or 0, which correspondingly indicates the unit is assigned to the treatment or the control group. We define the sample size in the treatment and control group respectively as and . The observed data in the treatment and control groups are denoted as and , respectively, where
The probability of assignment to the treatment group given the covariates is the so-called propensity score (Rosenbaum and Rubin, 1983)
which plays a central role in causal inference.
Under the potential outcome framework (Neyman, 1923; Rubin, 1974), we define a pair of potential outcomes for every unit under treatment 0 or 1 respectively. The observed oucome can be expressed by the pair of potential outcomes under the consistency assumption:
In this paper, we focus on two causal estimands: the average treatment effect (ATE), that is
where , and the average treatment effect on the treated (ATT), that is
where for . The expectations for the potential outcomes given the covariates are
For causal comparisons, we make the following assumptions (Rosenbaum and Rubin, 1983) throughout this article so that unbiased estimators of ATE and ATT are available in observational studies.
2.2 Assumptions
Assumption 1 (Strong Ignorability):
where denotes independence.
We assume the potential outcomes are independent of the treatment indicator given covariates , which implies there is no unmeasured confounders that may cause the selection bias. Formally, as shown by Rosenbaum and Rubin (1983), Assumption 1 also implies
and
Under Assumption 1, we can investigate ATE through the value of
Assumption 2 (Overlap):
To ensure that there are useful observations for estimating the causal effect, we assume the probability that a subject is assigned to the treatment group is limited to range between 0 and 1. Assumption 2 also helps to ensure that the bias-correction information is available in the entire domain of (Zhao and Percival, 2016) and equivalently requires that the covariate distribution provides sufficient overlap between the treatment and the control group.
Reminding that under Assumption 1, the ATE can be expressed as
(2.1) | ||||
where indicates the true distribution of covariates. Within a weighting framework, we denote the weighted sample average treatment effect (SATE) with weights , , as
and the weigthed empirical distributions of the treatment and control groups as
where is an indicator function. Therefore, we have
(2.2) | ||||
According to Equation (2.2), we know that the bias of weighted SATE depends entirely on the three items
Here, the second item directly shows the importance of balancing the distributions of the covariates between the treated and control groups, which is the goal of our proposed weighting framework. To further ensure , we need the following Assumption 3.
Assumption 3 (Constant Conditional ATE):
With Assumption 3, we assume a constant causal effect, so the ATE is numerically the same as the ATT.
3 Existing Weighting Methods for Covariate Balancing
We next give a brief overview of some existing approaches.
3.1 IPW: Inverse Probability Weighting
According to Ma and Wang (2010), the IPW assigns a weight
to (the outcome in the treatment group), , and a weight
to (the outcome in the control group), , to estimate ATE, thereby deriving a natural estimator of ATE, that is,
Besides, a natural estimator of ATT using inverse probability weighting is
Therefore, the weights assigned to for estimating ATT is
and the weights assigned to is
3.2 EB: Entropy Balancing (Hainmueller, 2012)
The entropy balancing (EB) scheme by Hainmueller (2012) considered:
The weights and of EB method can be directly calculated with the R package “ebal”.
3.3 CBPS: Covariate Balance Propensity Score (Imai and Ratkovic, 2014)
Imai and Ratkovic (2014) proposed a relatively more stable method, namely the Covariate Balance Propensity Score (CBPS), for estimating propensity score so that covariate balance is optimized. CBPS can significantly improve the poor empirical performance of propensity score matching and weighting methods. The key idea is using propensity score weighting to characterize the covariate balance:
where is an D-dimensional vector-valued measurable function of specified by the researcher. Equation (3.3) holds for any function of covariates. Selecting a specific , for example, setting , Equation (3.3) helps to balance both the first and second moments of all covariates.
The CBPS method can be implemented through the open-source R package “CBPS”.
3.4 CAL: Calibration Estimator (Chan et al., 2016)
A general class of calibration (CAL) estimators of ATE is established with a wide class, such as exponential tilting, of calibration weights. The weights are constructed to achieve an exact moment balance of observed covariates among the treated, the control, and the combined group. Global semiparametric efficiency has been established for the CAL estimators.
For a fixed , let denote a continuously differentiable distance measure for . Being nonnegative and strictly convex in , if and only if . The general idea of calibration as in Deville and Särndal (1992) is to minimize the aggregate distance between the final weights to a given vector of design weights subject to moment constraints. Avoiding estimating the design weights, CAL first used a vector of misspecified uniform design weights , and constructed the final weights by solving the following constrained optimization problem:
(3.1) |
This CAL method can be implemented through an open-source R package “ATE” for estimating both the ATE and ATT.
4 The Weighted Kernel Covariate Balancing
4.1 Kernel Distance
Let and be random samples from two unknown probability measures, and , then researchers are often interested in detecting the difference and estimating the discrepency between them. In this paper, we use a probability metric with -structure (Zolotarev, 1983)
where indicates a class of functions, to measure the distance between the treatment and the control covariates.
As for the implementation of , we can define to be a norm of a Reproducing Kernel Hilbert Spaces (RKHS) and set , by which we can get a “kernelized” version of the total variation distance. Restricting to be a strictly positive definite kernel function corresponding to RKHS , then we can get a closed-form solution for (Sriperumbudur et al., 2012), that is
where and respectively denote the empirical measure of and , and is the sample weight for the unit . Let us define if and if , then we can get a simple as well as closed-form analytical solution for the empirical estimate of the probability metric under the assumption that the function class represents an RKHS. The computation of is but it is also independent of the dimension of the confounders.
We can use as a diagnostic for covariate balance, the smaller values of which denote better performance.
4.2 The Integral Probability Metric (IPM)
Given two probability measures, and defined on a measurable space S, the integral probability metric (IPM) (Sriperumbudur et al., 2012), also called probability metrics with a -structure, between and is defined as
(4.1) |
where is a class of real-valued bounded measurable functions on S. The choice of functions is the crucial distinction between different IPMs and various popular distance measures, for example, the Kantorovich metric and the total variation distance, can be obtained by choosing appropriate in (4.1). When , the corresponding is called kernel distance or maximum mean discrepancy, where represents a reproducing kernel Hilbert space (RKHS) with as its reproducing kernel (r.k.). In this situation, we can rewrite the function space as and the IPM as . The kernel distance has been widely used in statistical applications, including homogeneity testing, independence testing, conditional independence testing, and mixture density estimation.
4.3 Estimators of IPM
Let and be i.i.d. samples randomly drawn from and , respectively, the empirical estimator of is given by
(4.2) |
where and represent the empirical distributions of and , respectively, is an indicator function, , when for and when for .
Correspondingly, we define the weighted estimator of of as
(4.3) |
where and represent the wighted distributions of and , respectively, when for and when for . and are the balancing weights derived frome some weighting methods.
4.4 Weighted Estimator of the Kernel Distance
Theorem 4.1.
Let be a strictly positive definite kernel, i.e., for all , and all mutually distinct , . Then, for , the following function is the unique solution to (4.3):
(4.4) |
and the corresponding weighted estimator of the kernel distance is
(4.5) |
Proof. Consider , which can be written as
where we have used the reproducing property of , i.e., .
Following the Cauchy-Schwartz inequality, which states that for all vectors and of an inner product space it is true that
(4.6) |
where is the inner product, we have
Therefore, we can set
Since is strictly positive definite, if and only if , which therefore ensures that (4.4) is well-defined.
4.5 The Gaussian Kernel and the Information Matrix
According to Rasmussen (2003), one of the choice of kernel for covariate balance is the Gaussian kernel, that is
where the parameter determines the width of the Gaussian kernel and it can either be fixed or estimated from the data. In this article, we estimate with the median of all possible pairwise squared Euclidean distances between all pairs of subjects.
The Information Matrix corresponding to the Gaussian kernel is
(4.7) | ||||
4.6 The Kernel-distance-based (KDB) Covariate Balance
In this article, we propose a preprocessing procedure, the kernel-distance-based covariate balancing method, to create balanced samples for evaluating the treatment effects. In this procedure, we create two groups of scalar weights for the treatment and the control groups so that the reweighted treatment and control groups can match precisely at the specified moments.
4.6.1 ATE Estimation
In this section, we introduce the estimation of ATE
with the weighted mean difference between the treatment and the control outcomes, that is
The weights and derive from the following KDB reweighting scheme:
(4.8) | ||||
where is an indicator function, and indicates a set of constraint functions imposed on the covariate moment. Here, for convenience, we set to be the sample mean function at all covariates so that the reweighted treatment and control groups can match on the first moment, that is
where . To express concisely, we define , , and
where refers to the total number of covariates, and respectively on behalf of the covariate in the treatment and control group, and then the optimization problem for estimating ATE can be expressed as
(4.9) |
Here, the first constraints are treated as equality constraints, and all further as inequality constraints.
4.6.2 Stable Weight Estimation of ATE
To further construct stable weights for estimating the ATE, we can make a small modification to the optimization problem in (4.9). Compared to the reweighting scheme in section 4.6.1, we only add a tuning parameter to the diagonal of in order to control the variance of , while keeping all the other settings and parameters the same. Defining , the reweighting scheme for stable ATE estimation is:
(4.10) |
It is easy to prove that the optimization problem (4.10) satisfying constraints above is equivalent to
(4.11) |
4.6.3 ATT Estimation
In this section, we introduce the KDB reweighting scheme for estimating the ATT
We estimate with the mean difference between the treatment outcomes and the weighted control outcomes, that is,
where derive‘ from the following reweighted scheme:
(4.12) | ||||
Just like the ATE estimation, we define and reset , , respectively to be
where and refers to the sample mean of the covariate in the treatment group. Therefore, we can also simplify the ATT optimization problem above into a matrix form
(4.13) | ||||
Here, the first constraints are treated as equality constraints, and all further as inequality constraints.
4.7 Propositions of the KDB Covariate Balance
Proposition 1.
The Gaussian Information Matrix is positive semi-definite.
Proof. Based on the propertied of kernel, i.e, , for any , we have
∎
as required.
Optimization Problem:
We aim to solve the following minimization problem:
(4.14) | |||
where , and .
Proposition 2.
(4.14) is a standard convex optimization problem.
The Dual of Above Convex Problem:
The Lagrange function is given by
and the dual function is given by . Note
(4.15) | ||||
The dual problem of (4.14) is . Let be the optimal value of (4.14), we have . Since the inequality constraints are all affine, from the weak Slater condition we know that . Then from KKT conditions we have,
(4.16) |
and
Assuming is positive difinite. Solving (4.16) we have . Therefore,
The dual problem is
(4.17) | |||
Solving for we have
(4.18) |
Substitute with (4.18), (4.17) becomes
(4.19) |
Suppose , we have . For simplicity, substitute with and write , (4.19) is converted to
(4.20) |
5 Simulation Studies
In this section, we perform simulation studies for evaluating the performance of different weighting methods, including the inverse probability weighting (IPW), the covariate balancing propensity score (CBPS) (Imai and Ratkovic, 2014), the entropy balance (EB) (Hainmueller, 2012), the calibration estimator (CAL) (Chan et al., 2016) and our proposed KDB Covariate Balance. We use the oracle and unadjusted (UnAD) estimators as benchmarks. In all the following simulation experiments, we assume a constant causal effect, so the ATE is numerically the same as the ATT. For each simulation experiment, we construct Monte Carlo datasets.
In addition to simulation studies, we also conduct a real data analysis with the ‘National supported work’ (NSW) dataset.
5.1 Evaluation Metrics and Balancing Statistics
To evaluate the performance of estimating ATE or ATT, we use the metric Bias, % Bias, empirical standard deviation (SD), and RMSE.
Bias is the difference between the average estimate and the true value of ATE/ATT. % Bias is the bias as a percentage of the estimate’s standard deviation. A useful rule of thumb (Kang et al., 2007) is that the performance of interval estimates and test statistics begins to deteriorate when the bias of the point estimate exceeds about 40% of its standard deviation. The empirical SD for an estimator is the square root of average squared differences between all estimates from the mean. Here, we use the empirical SD corresponding to the average value of ATE/ATT estimates, and thus it is the empirical standard deviation of all estimated values divided by the square root of sample size . RMSE is the root-mean-square error for measuring the differences between values predicted by a weighting method and the observed values.
To further compare the power of balancing covariates, we use the balancing statistics , kernel distance (KD), the maximum, average, and median value of absolute standardized mean difference (maxASMD, meanASMD, medASMD), the average Kolmogorov-Smirnov statistic (meanKS) and the average T statistic (meanT). We introduce each balancing statistic with more details as follows.
For every set of sample weights and obtained by different balancing methods for each Monte Carlo dataset, we calculate as
(5.1) | ||||
For a simulation experiment with a total of iterations, we calculate the average value of all statistics of each balancing method and compare them. KD, the square root of , has been widely used in statistical applications (Sriperumbudur et al., 2012) and can be regarded as a reasonable measure of covariates balance (Xie, 2018).
In estimating ATE, the ASMD of covariate with balancing weights and is defined as (Zhang et al., 2019):
where and respectively refer to the weighted sample mean of covariate in the treatment and control group
and and denote the sample variance of covariate in the treatment and control group without any adjustment
In estimating ATT, the ASMD of covariate with balancing weights and is defined as (Xie et al., 2019):
where denotes the sample standard deviation of covariate in the treatment group without any adjustment
For each Monte Carlo dataset, we calculte the ASMD values over all covariates and mark down the maximum, mean and median values.
A Kolmogorov-Smirnov test can be applied to test whether two underlying one-dimensional probability distributions significantly differ and the KS statistic quantifies a distance between the empirical distribution functions of two samples. We calculate the KS statistic between the weighted samples in treatment and control group over all covariates and take the average as one balancing statistic.
where . and are the empirical distribution functions of the weighted sample and respectively, and is the supremum function.
A t-test is commonly applied to determine if the means of two sets of data are significantly different from each other. We conduct a Welch’s t-test between the weighted means in the treatment and control group over all covariates and take the average as one balance statistic.
where , and . and are the unbiased estimators of the variances of the two weighted samples and respectively.
5.2 Benchmark Estimators
In different simulation scenarios, sample oracle estimators, sample unadjusted estimators, and their biases will be served as benchmarks for comparing different weighting methods.
5.2.1 Oracle Estimator
The sample oracle estimator of the ATE and ATT are respectively defined as
5.2.2 UnAD: Unadjusted Estimator
Without any adjustment, we assigned weight for each unit in the treatment group, and for each unit in the control group. From this, we can get the sample unadjusted estimator of the ATE, that is
5.3 Simulation Studies
For convenience, we denote
-
•
KDBC:
the proposed KDB covariate balancing with only the constraints:
-
•
KDM1:
the proposed KDB covariate balancing with the constraints:
5.3.1 Simulation 1: Kang and Schafer (Kang et al., 2007; Josey et al., 2020)
In this section, we follow the simulation example in Josey et al. (2020), an extensive set of experimental scenarios adapted from Kang et al. (2007), to compare different weighting methods.
We first generate the observed covariates , where , and then make the following transformations to build the hidden variables
In subsequent applications, we standardize so that it has a mean of zero and marginal variances of one. The propensity score model is defined as
where refers to the generative process that determines the treatment assignment. Here, indicates the log odds of the propensity score is linear with , while implies a nonlinear relaltionship between the log odds of the propensity score and . As for the outcome propcess, we know that and construct the pair potential outcomes for unit with a bivariate model
where is the error variance, is the correlation between the potential outcomes, represents the treatment effect and indicates the covariates, whether or , that participate in the outcome process. For example, if , we have . It should be emphasized that although we may use to construct variable or , it cannot be observed and the covariates to be balanced is always . In the other words, the observed simulated data consists of .
Based on this data generation mechanism, we can know that
-
•
ATE:
-
•
ATT:
To better understand the role of different parameters on estimating ATE, we choose the sample size in , the error variance in , the correlation between the potential outcomes in , the generative process that determines the treatment assignment in {: is generated with covariates ; : is generated with covariates }, and the outcome process in {: is generated with covariates ; : is generated with covariates }, so there are a total of 72 different simulaiton scenarios. We change only one parameter in different simulation scenarios and run 500 Monte Carlo datasets for each setting.
Fixing and ({ = X, = X}, { = X, = U}, { = U, = X}, { = U, = U}, we run 1000 Monte Carlo datasets to compare the performance of four weighting methods (IPW, CBPS, EB, KDBC) on estimating ATE under different combinations of , and . The results are briefly shown in Figure [1, 2, 3, 4]. For every Figure corresponds to a different combination of and , the value of corresponds to each row from top to bottom are , and the value of corresponds to each column from left to right are . It can be found that the sample size and the error variance mainly influence the standard errors of ATE estimators, while the correlation between potential outcomes and has almost no effect. This observation is consistent with Josey et al. (2020) and thus, we focus on the simulation scenario with parameters , , and .
We run 500 Monte Carlo experiments for estimating both the ATE and ATT. The results of ATE estimation are shown in Figure [5] and Table [1,2], and the results of ATT estimation are shown in Table [3,4]. Table [1,3] together show that the KDBC estimators, both for ATE and ATT, can perform as well as the other weighting estimators when all the true predictors are included in the observed dataset. While in both scenarios of { = X, = U} and { = U, = U} where the key information of the true covariates are missed, the KDBC estimators can perform much better than other weighting estimators. This reflects the high stability of our proposed KDB covariate balance in estimating causal effects even when the observed dataset misses some key information about the real generation mechanism.
To better observe the adjustment effect of our proposed method, we draw the cumulative distribution functions (CDF) of all covariates without any adjustment and that of the weighted covariates under KDBC. It can be seen from Figure [6] that regardless of the value of and , under our proposed KDB covariate balancing with only regularization constraints (KDBC), the CDFs of the weighted covariates between the treatment and control groups can always match each other well. This means that our proposed method can effectively achieve the covariate balance between the treatment and control groups.





Treatment Assignment Scenario | Outcome Scenario | Metric | Benchmark | Balancing Methods | ||||||||||||||||
Oracle | UnAD | IPW | CBPS | EB | CAL | KDBC | KDM1 | |||||||||||||
X | X | ATE | 20.02181 | -0.22254 | 19.42157 | 19.99264 | 19.97454 | 20.00381 | 19.99452 | 20.01701 | ||||||||||
Bias | 0.02181 | -20.22254 | -0.57843 | -0.00736 | -0.02546 | 0.00381 | -0.00548 | 0.01701 | ||||||||||||
sd | 0.01358 | 0.22983 | 0.16986 | 0.02390 | 0.02298 | 0.02302 | 0.03207 | 0.03171 | ||||||||||||
RMSE | 0.01360 | 0.93307 | 0.17165 | 0.02388 | 0.02299 | 0.02299 | 0.03204 | 0.03168 | ||||||||||||
X | U | ATE | 20.01750 | 7.76609 | 19.28762 | 19.67198 | 19.74617 | 19.76634 | 19.95198 | 19.97879 | ||||||||||
Bias | 0.01750 | -12.23391 | -0.71238 | -0.32802 | -0.25383 | -0.23366 | -0.04802 | -0.02121 | ||||||||||||
sd | 0.01428 | 0.23171 | 0.25003 | 0.17427 | 0.12919 | 0.12925 | 0.03289 | 0.03279 | ||||||||||||
RMSE | 0.01428 | 0.59407 | 0.25181 | 0.17471 | 0.12956 | 0.12954 | 0.03293 | 0.03277 | ||||||||||||
U | X | ATE | 19.99365 | 4.149422956 | 20.37174 | 19.99844 | 19.95353 | 19.99821 | 19.98244 | 19.99608 | ||||||||||
Bias | -0.00635 | -15.85057704 | 0.37174 | -0.00156 | -0.04647 | -0.00179 | -0.01756 | -0.00392 | ||||||||||||
sd | 0.01431 | 0.215249227 | 0.14953 | 0.02376 | 0.02326 | 0.02298 | 0.03060 | 0.03047 | ||||||||||||
RMSE | 0.01429 | 0.740757147 | 0.15030 | 0.02374 | 0.02333 | 0.02295 | 0.03058 | 0.03044 | ||||||||||||
U | U | ATE | 19.99174 | 5.308382057 | 15.30943 | 15.19822 | 15.83007 | 15.86088 | 19.89995 | 19.91026 | ||||||||||
Bias | -0.00826 | -14.69161794 | -4.69057 | -4.80178 | -4.16993 | -4.13912 | -0.10005 | -0.08974 | ||||||||||||
sd | 0.01474 | 0.223602406 | 0.17365 | 0.14242 | 0.11789 | 0.11796 | 0.03044 | 0.03037 | ||||||||||||
RMSE | 0.01473 | 0.69396 | 0.27221 | 0.25760 | 0.22056 | 0.21943 | 0.03074 | 0.03061 |
Treatment Assignment Scenario | Outcome Scenario | Balancing Metric | Benchmark | Balancing Methods | ||||||||||||||||
Oracle | UnAD | IPW | CBPS | EB | CAL | KDBC | KDM1 | |||||||||||||
X | X | KD | 0.00100 | 0.17749 | 0.03689 | 0.02457 | 0.02138 | 0.02140 | 0.00036 | 0.00035 | ||||||||||
maxASMD | 0.00000 | 0.00879 | 0.00129 | 0.00015 | 0.00012 | 0.00012 | 0.00017 | 0.00017 | ||||||||||||
meanASMD | 0.00000 | 0.00409 | 0.00068 | 0.00008 | 0.00007 | 0.00007 | 0.00009 | 0.00009 | ||||||||||||
medASMD | 0.00000 | 0.00327 | 0.00061 | 0.00007 | 0.00006 | 0.00006 | 0.00008 | 0.00008 | ||||||||||||
meanKS | 0.00000 | 0.22341 | 0.21086 | 0.21025 | 0.20455 | 0.20455 | 0.20812 | 0.20635 | ||||||||||||
meanT | 0.00000 | -1.30843 | -0.05517 | 0.00350 | 0.00279 | 0.00439 | -0.00104 | -0.00060 | ||||||||||||
X | U | KD | 0.00100 | 0.17657 | 0.03709 | 0.02459 | 0.02145 | 0.02147 | 0.00036 | 0.00036 | ||||||||||
maxASMD | 0.00000 | 0.00874 | 0.00129 | 0.00016 | 0.00013 | 0.00013 | 0.00018 | 0.00018 | ||||||||||||
meanASMD | 0.00000 | 0.00404 | 0.00069 | 0.00009 | 0.00007 | 0.00007 | 0.00010 | 0.00010 | ||||||||||||
medASMD | 0.00000 | 0.00324 | 0.00062 | 0.00008 | 0.00006 | 0.00006 | 0.00009 | 0.00009 | ||||||||||||
meanKS | 0.00000 | 0.22388 | 0.21115 | 0.21048 | 0.20520 | 0.20520 | 0.20888 | 0.20717 | ||||||||||||
meanT | 0.00000 | -1.28633 | -0.03970 | 0.00588 | 0.00577 | 0.00748 | 0.00066 | 0.00144 | ||||||||||||
U | X | KD | 0.00100 | 0.15692 | 0.03542 | 0.02824 | 0.02555 | 0.02557 | 0.00036 | 0.00035 | ||||||||||
maxASMD | 0.00000 | 0.00837 | 0.00101 | 0.00014 | 0.00012 | 0.00011 | 0.00016 | 0.00016 | ||||||||||||
meanASMD | 0.00000 | 0.00359 | 0.00053 | 0.00008 | 0.00006 | 0.00006 | 0.00009 | 0.00009 | ||||||||||||
medASMD | 0.00000 | 0.00265 | 0.00047 | 0.00007 | 0.00006 | 0.00005 | 0.00008 | 0.00008 | ||||||||||||
meanKS | 0.00000 | 0.20399 | 0.19083 | 0.19077 | 0.18849 | 0.18850 | 0.19127 | 0.19017 | ||||||||||||
meanT | 0.00000 | -0.85087 | 0.02321 | 0.00628 | -0.00105 | 0.00097 | 0.00171 | 0.00224 | ||||||||||||
U | U | KD | 0.00100 | 0.15523 | 0.03651 | 0.02824 | 0.02557 | 0.02558 | 0.00035 | 0.00036 | ||||||||||
maxASMD | 0.00000 | 0.00834 | 0.00109 | 0.00015 | 0.00012 | 0.00012 | 0.00017 | 0.00017 | ||||||||||||
meanASMD | 0.00000 | 0.00354 | 0.00057 | 0.00008 | 0.00007 | 0.00006 | 0.00009 | 0.00009 | ||||||||||||
medASMD | 0.00000 | 0.00258 | 0.00050 | 0.00007 | 0.00006 | 0.00006 | 0.00009 | 0.00009 | ||||||||||||
meanKS | 0.00000 | 0.20276 | 0.18890 | 0.18875 | 0.18557 | 0.18556 | 0.18917 | 0.18809 | ||||||||||||
meanT | 0.00000 | -0.85721 | 0.03375 | 0.00702 | -0.00030 | 0.00165 | 0.00152 | 0.00210 |
Treatment Assignment Scenario | Outcome Scenario | Metric | Benchmark | Balancing Methods | ||||||||||||||
Oracle | IPW | CBPS | EB | CAL | KDM1 | KDBC | ||||||||||||
X | X | ATT | 20.02181 | 18.94827 | 20.01480 | 19.94777 | 20.01839 | 20.01063 | 19.48173 | |||||||||
Bias | 0.02181 | -1.05173 | 0.01480 | -0.05223 | 0.01839 | 0.01063 | -0.51827 | |||||||||||
sd | 0.01358 | 0.22240 | 0.02916 | 0.02931 | 0.02924 | 0.04480 | 0.04704 | |||||||||||
RMSE | 0.01360 | 0.22710 | 0.02914 | 0.02938 | 0.02922 | 0.04476 | 0.05240 | |||||||||||
X | U | ATT | 20.01750 | 19.69548 | 20.77572 | 20.75489 | 20.77844 | 20.41813 | 19.82211 | |||||||||
Bias | 0.01750 | -0.30452 | 0.77572 | 0.75489 | 0.77844 | 0.41813 | -0.17789 | |||||||||||
sd | 0.01428 | 0.24458 | 0.15395 | 0.15362 | 0.15389 | 0.05750 | 0.05277 | |||||||||||
RMSE | 0.01428 | 0.24471 | 0.15766 | 0.15714 | 0.15763 | 0.06041 | 0.05332 | |||||||||||
U | X | ATT | 19.99365 | 25.08391 | 19.99283 | 19.94235 | 19.99311 | 19.97513 | 19.77988 | |||||||||
Bias | -0.00635 | 5.08391 | -0.00717 | -0.05765 | -0.00689 | -0.02487 | -0.22012 | |||||||||||
sd | 0.01431 | 0.25533 | 0.02654 | 0.02661 | 0.02653 | 0.04041 | 0.04144 | |||||||||||
RMSE | 0.01429 | 0.34170 | 0.02652 | 0.02671 | 0.02650 | 0.04038 | 0.04255 | |||||||||||
U | U | ATT | 19.99174 | 19.07117 | 16.39781 | 16.37561 | 16.39804 | 19.96664 | 19.74746 | |||||||||
Bias | -0.00826 | -0.92883 | -3.60219 | -3.62439 | -3.60196 | -0.03336 | -0.25254 | |||||||||||
sd | 0.01474 | 0.25013 | 0.15951 | 0.15934 | 0.15953 | 0.04586 | 0.04628 | |||||||||||
RMSE | 0.01473 | 0.25331 | 0.22659 | 0.22718 | 0.22660 | 0.04584 | 0.04759 |
Treatment Assignment Scenario | Outcome Scenario | Balancing Metric | Benchmark | Balancing Methods | ||||||||||||||
Oracle | IPW | CBPS | EB | CAL | KDM1 | KDBC | ||||||||||||
X | X | KD | 0.00142 | 0.05126 | 0.02618 | 0.02612 | 0.02619 | 0.00725 | 0.00568 | |||||||||
maxASMD | 0.00000 | 0.00198 | 0.00050 | 0.00050 | 0.00050 | 0.00050 | 0.00054 | |||||||||||
meanASMD | 0.00000 | 0.00107 | 0.00024 | 0.00024 | 0.00024 | 0.00024 | 0.00027 | |||||||||||
medASMD | 0.00000 | 0.00098 | 0.00020 | 0.00020 | 0.00020 | 0.00020 | 0.00023 | |||||||||||
meanKS | 0.00000 | 0.25172 | 0.26461 | 0.26408 | 0.26467 | 0.45139 | 0.43743 | |||||||||||
meanT | 0.00000 | -0.09005 | -0.00207 | -0.00467 | -0.00190 | -0.00225 | -0.01696 | |||||||||||
X | U | KD | 0.00142 | 0.05214 | 0.02675 | 0.02669 | 0.02676 | 0.00713 | 0.00556 | |||||||||
maxASMD | 0.00000 | 0.00205 | 0.00051 | 0.00051 | 0.00051 | 0.00051 | 0.00055 | |||||||||||
meanASMD | 0.00000 | 0.00110 | 0.00024 | 0.00024 | 0.00024 | 0.00024 | 0.00027 | |||||||||||
medASMD | 0.00000 | 0.00100 | 0.00020 | 0.00020 | 0.00020 | 0.00020 | 0.00022 | |||||||||||
meanKS | 0.00000 | 0.25165 | 0.26293 | 0.26249 | 0.26299 | 0.44922 | 0.43451 | |||||||||||
meanT | 0.00000 | -0.05648 | -0.00233 | -0.00499 | -0.00215 | -0.00118 | -0.01506 | |||||||||||
U | X | KD | 0.00140 | 0.05612 | 0.02820 | 0.02814 | 0.02820 | 0.00501 | 0.00412 | |||||||||
maxASMD | 0.00000 | 0.00232 | 0.00041 | 0.00040 | 0.00041 | 0.00041 | 0.00041 | |||||||||||
meanASMD | 0.00000 | 0.00123 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00019 | |||||||||||
medASMD | 0.00000 | 0.00110 | 0.00014 | 0.00014 | 0.00014 | 0.00014 | 0.00016 | |||||||||||
meanKS | 0.00000 | 0.22078 | 0.21968 | 0.21934 | 0.21968 | 0.41261 | 0.39862 | |||||||||||
meanT | 0.00000 | 0.30059 | 0.00518 | 0.00423 | 0.00518 | 0.00598 | 0.00305 | |||||||||||
U | U | KD | 0.00141 | 0.05644 | 0.02832 | 0.02827 | 0.02832 | 0.00489 | 0.00403 | |||||||||
maxASMD | 0.00000 | 0.00237 | 0.00044 | 0.00044 | 0.00044 | 0.00044 | 0.00044 | |||||||||||
meanASMD | 0.00000 | 0.00123 | 0.00020 | 0.00020 | 0.00020 | 0.00020 | 0.00021 | |||||||||||
medASMD | 0.00000 | 0.00108 | 0.00015 | 0.00015 | 0.00015 | 0.00015 | 0.00017 | |||||||||||
meanKS | 0.00000 | 0.21859 | 0.21740 | 0.21706 | 0.21741 | 0.41061 | 0.39707 | |||||||||||
meanT | 0.00000 | 0.31149 | 0.00658 | 0.00557 | 0.00660 | 0.00625 | 0.00418 |

5.3.2 Simulation 2
In this section, we perform a simulation experiment with a parameter setting different from Simulation 1 (section 5.3.1). We wish to explore the mechanism of the KDB method in mining covariate information as well as check the performance of the KDB stable ATE estimation.
We first construct the treatment indicator with a binomial distribution . Here, we set , the probability for unit being assigned to the treatment group, to be 0.5. Next, we used two different multivariate normal distributions to generate the covariates and for the treatment group and the control group, respectively. We have
and
where . Let and to be distributer as , we continue to construct the covariates and . The vector is subsequently standardized to have a mean of zero and marginal variances of one. The observed covariates are . For the outcome variable, we use the bivariate model
where , , and with . To explore whether KDB stable ATE estimation can effectively reduce the variance of ATE estimator, we add a tuning parameter to the diagonal of and change the value to see see how the estimation result changes.
Setting the sample size , we run 500 Monte Carlo datasets for each value of in . Results of ATE estimation with are shown in Table [5]. In each table for a specific value of , the rows are arranged from top to bottom according to the absolute value of bias.
Comparing the results in all tables together, we can find that the UnAD estimator always performs the worst, and is far from the true value. This shows the importance of covariate balance, especially when the real generation mechanism of the data involves nonlinear information. As the value of increases, the performance of KDBC gradually deteriorates, but the performance of KDM1 has always been in the forefront, which indicates the necessity to add the first-moment balance to the balance constraints of the KDB method. Furthermore, adding a non-zero do help reduce the variance of the KDM1 estimator of ATE compared to the results when , the mechanism behind which requires to be further explored.
When changes from 0 to 1, the performance of KDM1 is not as good as CBPS and CAL. However, when we increase the value from 2 to 100, KDM1 again becomes the best weighting method with the smallest absolute value of bias, the mechanism behind which needs further exploration. This probably implies that there is a threshold for to induce better performance, and it may be related to the information matrix (section 4.5) of the observed dataset.
Considering the ASMD values of and under different weighting methods in Table [5], we know that the KDB method including the KDBC and KDM1 can always perform the best. Reviewing the performance of the KDB covariate balance in simulation 1 (section 5.3.1) with { = X, = U} and { = U, = U}, we can conclude that the KDB covariate balance may automatically discover and include some latent variables, such as interaction terms between different covariates or high-order terms of covariates, to analyze and estimate the ATE.
lambda = 0 | |||||||||||||
ATE | abs(Bias) | sd | RMSE | rw | KD | maxASMD | meanASMD | medASMD | X5ASMD | X6ASMD | meanKS | meanT | |
Oracle | 9.98058 | 0.01942 | 0.01427 | 0.01428 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
KDM1 | 9.97253 | 0.02747 | 0.02993 | 0.02993 | 0.00000 | 0.00034 | 0.00179 | 0.00039 | 0.00000 | 0.00068 | 0.00167 | 0.23125 | -0.01172 |
KDBC | 9.95925 | 0.04075 | 0.02982 | 0.02985 | 0.00000 | 0.00034 | 0.00203 | 0.00078 | 0.00061 | 0.00128 | 0.00166 | 0.23229 | -0.01277 |
CBPS | 9.93864 | 0.06136 | 0.02546 | 0.02559 | 0.00087 | 0.02829 | 0.15146 | 0.03478 | 0.00001 | 0.08006 | 0.12859 | 0.23308 | -0.00203 |
CAL | 9.92957 | 0.07043 | 0.02471 | 0.02488 | 0.00063 | 0.02436 | 0.13794 | 0.03100 | 0.00000 | 0.06410 | 0.12190 | 0.23103 | -0.00062 |
IPW | 9.87181 | 0.12819 | 0.13501 | 0.13500 | 0.00155 | 0.03605 | 0.18029 | 0.07931 | 0.06737 | 0.11296 | 0.13136 | 0.23359 | -0.05327 |
EB | 9.86523 | 0.13477 | 0.02452 | 0.02522 | 0.00063 | 0.02435 | 0.13861 | 0.03212 | 0.00218 | 0.06553 | 0.12184 | 0.23103 | -0.00725 |
UnAD | -13.16629 | 23.16629 | 0.15097 | 1.04695 | 0.04264 | 0.20455 | 0.89249 | 0.46910 | 0.46869 | 0.87363 | 0.11004 | 0.24557 | -2.83703 |
lambda = 1 | |||||||||||||
ATE | abs(Bias) | sd | RMSE | rw | KD | maxASMD | meanASMD | medASMD | X5ASMD | X6ASMD | meanKS | meanT | |
Oracle | 9.98058 | 0.01942 | 0.01427 | 0.01428 | 0.00000 | NA | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
CBPS | 9.93864 | 0.06136 | 0.02546 | 0.02559 | 0.00087 | 0.02829 | 0.15146 | 0.03478 | 0.00001 | 0.08006 | 0.12859 | 0.23308 | -0.00203 |
CAL | 9.92957 | 0.07043 | 0.02471 | 0.02488 | 0.00063 | 0.02436 | 0.13794 | 0.03100 | 0.00000 | 0.06410 | 0.12190 | 0.23103 | -0.00062 |
KDM1 | 9.92614 | 0.07386 | 0.02230 | 0.02252 | 0.00027 | 0.01609 | 0.09301 | 0.02067 | 0.00000 | 0.04032 | 0.08370 | 0.22262 | -0.02399 |
IPW | 9.87181 | 0.12819 | 0.13501 | 0.13500 | 0.00155 | 0.03605 | 0.18029 | 0.07931 | 0.06737 | 0.11296 | 0.13136 | 0.23359 | -0.05327 |
EB | 9.86523 | 0.13477 | 0.02452 | 0.02522 | 0.00063 | 0.02435 | 0.13861 | 0.03212 | 0.00218 | 0.06553 | 0.12184 | 0.23103 | -0.00725 |
KDBC | 7.70684 | 2.29316 | 0.03258 | 0.10759 | 0.00068 | 0.02577 | 0.13520 | 0.06471 | 0.06194 | 0.11691 | 0.08285 | 0.22935 | -0.29548 |
UnAD | -13.16629 | 23.16629 | 0.15097 | 1.04695 | 0.04264 | 0.20455 | 0.89249 | 0.46910 | 0.46869 | 0.87363 | 0.11004 | 0.24557 | -2.83703 |
lambda = 2 | |||||||||||||
ATE | abs(Bias) | sd | RMSE | rw | KD | maxASMD | meanASMD | medASMD | X5ASMD | X6ASMD | meanKS | meanT | |
Oracle | 9.99897 | 0.00103 | 0.01403 | 0.01402 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
CAL | 9.97598 | 0.02402 | 0.02712 | 0.02711 | 0.00061 | 0.02399 | 0.14136 | 0.03183 | 0.00000 | 0.06461 | 0.12634 | 0.23060 | 0.00053 |
KDM1 | 9.97495 | 0.02505 | 0.02545 | 0.02544 | 0.00034 | 0.01796 | 0.11197 | 0.02460 | 0.00000 | 0.04486 | 0.10274 | 0.22157 | -0.02585 |
CBPS | 9.96688 | 0.03312 | 0.02809 | 0.02810 | 0.00085 | 0.02795 | 0.15344 | 0.03519 | 0.00001 | 0.08018 | 0.13097 | 0.23332 | -0.00194 |
EB | 9.91031 | 0.08969 | 0.02739 | 0.02766 | 0.00061 | 0.02397 | 0.14194 | 0.03297 | 0.00220 | 0.06612 | 0.12629 | 0.23060 | -0.00639 |
IPW | 9.86231 | 0.13769 | 0.14389 | 0.14388 | 0.00165 | 0.03577 | 0.18388 | 0.07980 | 0.06765 | 0.11501 | 0.13453 | 0.23366 | -0.06557 |
KDBC | 5.90692 | 4.09308 | 0.04408 | 0.18827 | 0.00172 | 0.04099 | 0.20144 | 0.10423 | 0.10464 | 0.18551 | 0.09985 | 0.23222 | -0.52996 |
UnAD | -13.21379 | 23.21379 | 0.15533 | 1.04968 | 0.04286 | 0.20487 | 0.89554 | 0.47282 | 0.47129 | 0.87486 | 0.11403 | 0.24615 | -2.86241 |
lambda = 5 | |||||||||||||
ATE | abs(Bias) | sd | RMSE | rw | KD | maxASMD | meanASMD | medASMD | X5ASMD | X6ASMD | meanKS | meanT | |
Oracle | 9.99204 | 0.00796 | 0.01455 | 0.01454 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
KDM1 | 9.97729 | 0.02271 | 0.02752 | 0.02751 | 0.00042 | 0.02001 | 0.12573 | 0.02743 | 0.00000 | 0.04826 | 0.11631 | 0.22127 | -0.02422 |
CAL | 9.97347 | 0.02653 | 0.02839 | 0.02839 | 0.00063 | 0.02423 | 0.14342 | 0.03196 | 0.00000 | 0.06393 | 0.12784 | 0.23043 | -0.00077 |
CBPS | 9.96713 | 0.03287 | 0.02915 | 0.02916 | 0.00085 | 0.02797 | 0.15523 | 0.03541 | 0.00001 | 0.07816 | 0.13429 | 0.23273 | -0.00133 |
EB | 9.90745 | 0.09255 | 0.02870 | 0.02897 | 0.00063 | 0.02421 | 0.14408 | 0.03316 | 0.00223 | 0.06564 | 0.12778 | 0.23045 | -0.00754 |
IPW | 9.71187 | 0.28813 | 0.12863 | 0.12914 | 0.00154 | 0.03547 | 0.18570 | 0.07895 | 0.06512 | 0.11247 | 0.13634 | 0.23305 | -0.06307 |
KDBC | 1.99376 | 8.00624 | 0.07158 | 0.36512 | 0.00566 | 0.07427 | 0.34591 | 0.18240 | 0.18200 | 0.33050 | 0.10952 | 0.23495 | -1.02101 |
UnAD | -13.08489 | 23.08489 | 0.15413 | 1.04381 | 0.04235 | 0.20387 | 0.89282 | 0.47006 | 0.46973 | 0.87439 | 0.11324 | 0.24486 | -2.82364 |
lambda = 10 | |||||||||||||
ATE | abs(Bias) | sd | RMSE | rw | KD | maxASMD | meanASMD | medASMD | X5ASMD | X6ASMD | meanKS | meanT | |
Oracle | 10.00065 | 0.00065 | 0.01403 | 0.01402 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
KDM1 | 9.98450 | 0.01550 | 0.02582 | 0.02581 | 0.00044 | 0.02045 | 0.12810 | 0.02795 | 0.00000 | 0.04786 | 0.11985 | 0.22178 | -0.02839 |
CAL | 9.97525 | 0.02475 | 0.02652 | 0.02652 | 0.00060 | 0.02373 | 0.13972 | 0.03153 | 0.00000 | 0.06203 | 0.12714 | 0.23110 | -0.00074 |
CBPS | 9.97272 | 0.02728 | 0.02745 | 0.02745 | 0.00081 | 0.02729 | 0.14996 | 0.03443 | 0.00001 | 0.07602 | 0.13054 | 0.23393 | -0.00384 |
EB | 9.91211 | 0.08789 | 0.02633 | 0.02660 | 0.00060 | 0.02370 | 0.14022 | 0.03266 | 0.00214 | 0.06360 | 0.12710 | 0.23110 | -0.00735 |
IPW | 9.57129 | 0.42871 | 0.11484 | 0.11632 | 0.00140 | 0.03458 | 0.17781 | 0.07669 | 0.06409 | 0.11294 | 0.13090 | 0.23390 | -0.07975 |
KDBC | -1.93687 | 11.93687 | 0.09537 | 0.54227 | 0.01188 | 0.10779 | 0.48651 | 0.25751 | 0.25467 | 0.47040 | 0.11033 | 0.23875 | -1.53728 |
UnAD | -13.28347 | 23.28347 | 0.15303 | 1.05243 | 0.04300 | 0.20539 | 0.89664 | 0.47257 | 0.47019 | 0.87847 | 0.11167 | 0.24624 | -2.87076 |
lambda = 100 | |||||||||||||
ATE | abs(Bias) | sd | RMSE | rw | KD | maxASMD | meanASMD | medASMD | X5ASMD | X6ASMD | meanKS | meanT | |
Oracle | 9.99204 | 0.00796 | 0.01455 | 0.01454 | 0.00000 | NA | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
KDM1 | 9.97934 | 0.02066 | 0.02821 | 0.02820 | 0.00050 | 0.02168 | 0.13688 | 0.02977 | 0.00000 | 0.05193 | 0.12669 | 0.21941 | -0.02443 |
CAL | 9.97347 | 0.02653 | 0.02839 | 0.02839 | 0.00063 | 0.02423 | 0.14342 | 0.03196 | 0.00000 | 0.06393 | 0.12784 | 0.23043 | -0.00077 |
CBPS | 9.96713 | 0.03287 | 0.02915 | 0.02916 | 0.00085 | 0.02797 | 0.15523 | 0.03541 | 0.00001 | 0.07816 | 0.13429 | 0.23273 | -0.00133 |
EB | 9.90745 | 0.09255 | 0.02870 | 0.02897 | 0.00063 | 0.02421 | 0.14408 | 0.03316 | 0.00223 | 0.06564 | 0.12778 | 0.23045 | -0.00754 |
IPW | 9.71187 | 0.28813 | 0.12863 | 0.12914 | 0.00154 | 0.03547 | 0.18570 | 0.07895 | 0.06512 | 0.11247 | 0.13634 | 0.23305 | -0.06307 |
KDBC | -11.07394 | 21.07394 | 0.14527 | 0.95356 | 0.03548 | 0.18653 | 0.82059 | 0.43223 | 0.43112 | 0.80211 | 0.11307 | 0.24343 | -2.61005 |
UnAD | -13.08489 | 23.08489 | 0.15413 | 1.04381 | 0.04235 | 0.20387 | 0.89282 | 0.47006 | 0.46973 | 0.87439 | 0.11324 | 0.24486 | -2.82364 |
5.4 Real Data Analysis
To further compare different weighting methods, we investigate the causal effect of a real data set - the labour training program data that previously analyzed in LaLonde (1986) and Dehejia and Wahba (1999), among many others.
The dataset used here is a subset containing information on earnings in 1974 (RE74) of the National Supported Work Demonstration as used in LaLonde (1986). The National Supported Work Demonstration is an evaluation of the National Supported Work Demonstration project, a transitional, subsidized work experience program for four target groups of people with longstanding employment problems: ex-offenders, former drug addicts, women who were long-term recipients of welfare benefits, and school dropouts, many with criminal records. It is designed to test whether and to what extent 12 to 18 months of employment in a supportive but performance-oriented environment would equip hard-to-employ people to get and hold regular, unsubsidized jobs.
Metric | UnAD | IPW | CBPS | EB | CAL | KDBC | KDps | KDM1 | ||||||||
ATE | 1804.78064 | 1641.77770 | 1637.56504 | 1612.77550 | 1612.44289 | 1774.39211 | 2311.99732 | 1672.08084 | ||||||||
sd | 28.78274 | 29.73068 | 29.73643 | 30.01748 | 30.01800 | 29.48966 | 56.44856 | 29.32907 | ||||||||
Balancing Metric | UnAD | IPW | CBPS | EB | CAL | KDBC | KDps | KDM1 | ||||||||
KD | 0.00051 | 0.00012 | 0.00004 | 0.00004 | 0.00004 | 0.00002 | 0.00002 | 0.00001 | ||||||||
maxASMD | 0.00862 | 0.00802 | 0.00801 | 0.00800 | 0.00800 | 0.00864 | 0.00928 | 0.00803 | ||||||||
meanASMD | 0.00480 | 0.00445 | 0.00445 | 0.00445 | 0.00445 | 0.00482 | 0.00476 | 0.00445 | ||||||||
medASMD | 0.00498 | 0.00460 | 0.00460 | 0.00460 | 0.00460 | 0.00499 | 0.00469 | 0.00464 | ||||||||
meanKS | 0.57085 | 0.42960 | 0.42476 | 0.40371 | 0.40311 | 0.54584 | 0.22214 | 0.43027 | ||||||||
meanT | 9.81470 | 6.94619 | 6.86728 | 6.79432 | 6.78668 | 8.98585 | 2.07824 | 7.34122 |
Variables included in the dataset are: treatment indicator (1 if treated, 0 if not treated), age, education, Black (1 if black, 0 otherwise), Hispanic (1 if Hispanic, 0 otherwise), married (1 if married, 0 otherwise), nodegree (1 if no degree, 0 otherwise), RE74 (earnings in 1974), RE75 (earnings in 1975), and RE78 (earnings in 1978). We compared the weighting estimates for earnings in 1978 between the treatment and control groups, which were estimates for the treatment effects. We compare the proposed KDB covariate balance with the IPW, CBPS, EB, CAL, and take the UnAD estimator as the benchmark. We set the balance restrictions in the KDB method as follows: regularization constraint only, regularization constraints plus propensity score balance, and regularization constraints plus the first-moment balance. Each scenario is denoted by “KDBC”, “KDps”, and “KDM1”, respectively. We obtain 500 bootstrap datasets using repeatable sampling, with which we estimate the ATE. The results are shown in Table [6], and all numerical results are rounded to five decimal places. As can be seen from Table [6], the estimation results of all weighting methods are significantly different from those of UnAD estimates. Except for the KDB method (KDBC, KDps, KDM1), the estimation results of other weighting methods are very similar. Both KDBC and KDM1 have the smallest standard deviation (sd), so they show outstanding stability in estimating ATE. Considering the specific estimates of ATE, however, only adding regularization constraints may not be enough to achieve a good balance of covariates between the treatment and control group. In addition, the results of KDPS estimation are completely different from those of other methods including UnAD, and its sd is very large. Therefore, it seems inappropriate to use regularization constraints plus propensity score balance as balance restrictions in the KDB method. This may be an important hint to remind us to carefully set balance restrictions, which is an important exploration focus.
We plot the probability density functions (PDF) of covariates “age” (denoted by ) and “education” (denoted by ), as well as PDFs of and weighted by different methods (UnAD, IPW, CBPS, EB, CAL, KDBC) to compare performance. Both the first PDF plot in Figure [7] and Figure [8] indicate that and have almost the same distributions in the treatment and control groups, which is consistent with the randomization setting in NSW data. In Figure [7,8], the relative relationship between the PDFs of and weighted by the IPW, CBPS, EB, and CAL methods in the treatment and control groups is very similar, which may be able to explain why the estimation results of these weighting methods in Table [6] are highly similar. Although all methods have widened the gap between the probability density functions of the treatment group and the control group, the KDBC method seems to better retain the characteristics of the original PDF, such as bimodal.


6 Conclusions and Further Discussions
6.1 Conclusions
With the goal of creating balanced samples, we proposed a kernel-distance-based covariate balancing method with the Gaussian kernel in this article. The proposed method works for calculating sample weights for the treatment and control groups in order to yield proper estimator for the causal effect. It seems to perform well in discovering the latent variable, such as the interaction of different predictors, and yield stable estimate for the ATE.
6.2 Further Discussions
There are two main directions for our proposed method to develop. (i) In addition to setting the kernel distance as the loss function, we can also choose other commonly used covariate balance metrics as the loss function, and use the kernel distance as an optimization constraint, so that it is possible to control the balance performance from different dimensions at the same time. (ii) We look forward to exploring the further applications of kernel distance in high-dimensional space.
Acknowledgments
References
- Chan et al. (2016) Chan, K. C. G., S. C. P. Yam, and Z. Zhang (2016): “Globally efficient non-parametric inference of average treatment effects by empirical balancing calibration weighting,” Journal of the Royal Statistical Society. Series B, Statistical methodology, 78, 673.
- Dehejia and Wahba (1999) Dehejia, R. H. and S. Wahba (1999): “Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs,” Journal of the American statistical Association, 94, 1053–1062.
- Deville and Särndal (1992) Deville, J.-C. and C.-E. Särndal (1992): “Calibration estimators in survey sampling,” Journal of the American statistical Association, 87, 376–382.
- Erlander (1977) Erlander, S. (1977): “Entropy in linear programs: an approach to planning" report LiTH-MAT-R-77-3,” Department of Mathematics, Linkoping University, Linkoping, Sweden.
- Hainmueller (2012) Hainmueller, J. (2012): “Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies,” Political Analysis, 20, 25–46.
- Hirano et al. (2003) Hirano, K., G. W. Imbens, and G. Ridder (2003): “Efficient estimation of average treatment effects using the estimated propensity score,” Econometrica, 71, 1161–1189.
- Horvitz and Thompson (1952) Horvitz, D. G. and D. J. Thompson (1952): “A generalization of sampling without replacement from a finite universe,” Journal of the American statistical Association, 47, 663–685.
- Imai and Ratkovic (2014) Imai, K. and M. Ratkovic (2014): “Covariate balancing propensity score,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76, 243–263.
- Imbens and Rubin (2015) Imbens, G. W. and D. B. Rubin (2015): Causal inference in statistics, social, and biomedical sciences, Cambridge University Press.
- Josey et al. (2020) Josey, K. P., E. Juarez-Colunga, F. Yang, and D. Ghosh (2020): “A framework for covariate balance using Bregman distances,” Scandinavian Journal of Statistics.
- Kang et al. (2007) Kang, J. D., J. L. Schafer, et al. (2007): “Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data,” Statistical science, 22, 523–539.
- LaLonde (1986) LaLonde, R. J. (1986): “Evaluating the econometric evaluations of training programs with experimental data,” The American economic review, 604–620.
- Ma and Wang (2010) Ma, X. and J. Wang (2010): “Robust inference using inverse probability weighting,” Journal of the American Statistical Association, 1–10.
- Newey and Smith (2004) Newey, W. K. and R. J. Smith (2004): “Higher order properties of GMM and generalized empirical likelihood estimators,” Econometrica, 72, 219–255.
- Neyman (1923) Neyman, J. S. (1923): “On the application of probability theory to agricultural experiments. essay on principles. section 9.(tlanslated and edited by dm dabrowska and tp speed, statistical science (1990), 5, 465-480),” Annals of Agricultural Sciences, 10, 1–51.
- Rasmussen (2003) Rasmussen, C. E. (2003): “Gaussian processes in machine learning,” in Summer School on Machine Learning, Springer, 63–71.
- Rosenbaum and Rubin (1983) Rosenbaum, P. R. and D. B. Rubin (1983): “The central role of the propensity score in observational studies for causal effects,” Biometrika, 70, 41–55.
- Rubin (1974) Rubin, D. B. (1974): “Estimating causal effects of treatments in randomized and nonrandomized studies.” Journal of educational Psychology, 66, 688.
- Rubin (1978) ——— (1978): “Bayesian inference for causal effects: The role of randomization,” The Annals of statistics, 34–58.
- Rubin (2005) ——— (2005): “Causal inference using potential outcomes: Design, modeling, decisions,” Journal of the American Statistical Association, 100, 322–331.
- Rubin (2007) ——— (2007): “The design versus the analysis of observational studies for causal effects: parallels with the design of randomized trials,” Statistics in medicine, 26, 20–36.
- Sriperumbudur et al. (2012) Sriperumbudur, B. K., K. Fukumizu, A. Gretton, B. Schölkopf, G. R. Lanckriet, et al. (2012): “On the empirical estimation of integral probability metrics,” Electronic Journal of Statistics, 6, 1550–1599.
- Wong and Chan (2018) Wong, R. K. and K. C. G. Chan (2018): “Kernel-based covariate functional balancing for observational studies,” Biometrika, 105, 199–213.
- Xie (2018) Xie, Y. (2018): “Causal Inference with Covariate Balance Optimization,” Ph.D. thesis, University of Waterloo.
- Xie et al. (2019) Xie, Y., Y. Zhu, C. A. Cotton, and P. Wu (2019): “A model averaging approach for estimating propensity scores by optimizing balance,” Statistical methods in medical research, 28, 84–101.
- Yeying et al. (2018) Yeying, Z., G. Debashis, et al. (2018): “A Kernel-Based Metric for Balance Assessment,” Journal of Causal Inference, 6.
- Zhang et al. (2019) Zhang, Z., H. J. Kim, G. Lonjon, Y. Zhu, et al. (2019): “Balance diagnostics after propensity score matching,” Annals of translational medicine, 7.
- Zhao and Percival (2016) Zhao, Q. and D. Percival (2016): “Entropy balancing is doubly robust,” Journal of Causal Inference, 5.
- Zhu et al. (2018) Zhu, Y., J. S. Savage, and D. Ghosh (2018): “A kernel-based metric for balance assessment,” Journal of causal inference, 6.
- Zolotarev (1983) Zolotarev, V. M. (1983): “Probability metrics,” Teoriya Veroyatnostei i ee Primeneniya, 28, 264–287.