The oracle property of the generalized outcome adaptive lasso
Ismaila Baldé
(April 6, 2025)
Abstract
The generalized outcome-adaptive lasso (GOAL) is a variable selection method for high-dimensional causal inference proposed by Baldé et al. [2023, Biometrics 79(1), 514–520]. When the dimension is high, it is now well established that an ideal variable selection method should have the oracle property to ensure optimal large-sample performance. However, the oracle property of GOAL has not been proven. In this paper, we show that the GOAL estimator enjoys the oracle property. Our simulation study shows that GOAL deals with collinearity better than the oracle-like outcome-adaptive lasso (OAL).
1 Introduction
Let $Y$ be a continuous outcome variable, $X$ a matrix of potential confounders and $A$ a binary treatment. Assume that all covariates $X_j$, $j = 1, \ldots, p$, are measured prior to the treatment $A$, which in turn is measured prior to the outcome $Y$. We assume the propensity score (PS) model is defined as:
$$ \operatorname{logit}\{ P(A = 1 \mid X) \} = \sum_{j = 1}^{p} \alpha_j X_j. $$
Let $\mathcal{C}$ and $\mathcal{P}$ denote the indices of confounders (covariates related to both the outcome and the treatment) and of pure predictors of the outcome, respectively.
Our objective is to estimate the following PS model:
$$ \operatorname{logit}\{ P(A = 1 \mid X) \} = \sum_{j \in \mathcal{C} \cup \mathcal{P}} \alpha_j X_j. $$
The negative log-likelihood function of $\alpha = (\alpha_1, \ldots, \alpha_p)^{\top}$ is given by
$$ \ell_n(\alpha) = \sum_{i = 1}^{n} \left\{ -A_i \left( \sum_{j = 1}^{p} \alpha_j X_{ij} \right) + \log\left( 1 + \exp\left( \sum_{j = 1}^{p} \alpha_j X_{ij} \right) \right) \right\}. $$
Baldé et al. (2023) proposed the generalized outcome-adaptive lasso (GOAL):
$$ \hat{\alpha}(\text{GOAL}) = \operatorname*{arg\,min}_{\alpha} \left\{ \ell_n(\alpha) + \lambda_1 \sum_{j = 1}^{p} \hat{w}_j |\alpha_j| + \lambda_2 \sum_{j = 1}^{p} \alpha_j^{2} \right\}, $$
where $\hat{w}_j = |\tilde{\beta}_j|^{-\gamma}$ such that $\gamma > 1$ and $\tilde{\beta}_j$ is a consistent estimate of the coefficient of $X_j$ in the outcome model.
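For concreteness, a minimal computational sketch of a GOAL-type fit is given below. It is written in Python; the function name, the ridge-based outcome regression used to form the adaptive weights, and the proximal-gradient (ISTA) solver are illustrative choices of ours and do not reproduce the implementation of Baldé et al. (2023).

import numpy as np

def goal_fit(X, A, y, lam1=1.0, lam2=1.0, gamma=2.0, n_iter=2000, step=None):
    """Sketch of a GOAL-type fit: outcome-adaptive weights, then a
    weighted-L1 + ridge penalized logistic regression for the PS,
    solved by proximal gradient (ISTA)."""
    n, p = X.shape

    # Outcome-adaptive weights w_j = |beta_j|^(-gamma); beta_j is taken here
    # from a ridge regression of y on (A, X) so that the sketch also runs
    # when p is close to n (an illustrative choice, not the paper's).
    Z = np.column_stack([A, X])
    beta = np.linalg.solve(Z.T @ Z + 1e-2 * np.eye(p + 1), Z.T @ y)[1:]
    w = np.abs(beta) ** (-gamma)

    # Proximal gradient on l_n(alpha) + lam2 * sum(alpha_j^2) (smooth part),
    # with the weighted L1 term handled by soft-thresholding.
    alpha = np.zeros(p)
    if step is None:
        # Step size from a Lipschitz bound: ||X||_2^2 / 4 for the logistic
        # loss gradient, plus 2 * lam2 for the ridge part.
        step = 1.0 / (0.25 * np.linalg.norm(X, 2) ** 2 + 2.0 * lam2)
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ alpha))        # fitted propensity scores
        grad = X.T @ (pi - A) + 2.0 * lam2 * alpha   # gradient of the smooth part
        alpha = alpha - step * grad
        alpha = np.sign(alpha) * np.maximum(np.abs(alpha) - step * lam1 * w, 0.0)
    return alpha

In Shortreed and Ertefaie (2017), the tuning parameter is chosen with a covariate-balance criterion (the weighted absolute mean difference); that selection step is omitted from the sketch.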
GOAL is designed to improve on OAL (Shortreed and Ertefaie, 2017) for high-dimensional data analysis. Baldé (2022) conjectured that GOAL satisfies the oracle property.
In this paper, we show that GOAL enjoys the oracle property with a proper choice of $\lambda_1$, $\lambda_2$ and $\gamma$. That is, GOAL performs as well as if the true underlying model were known in advance. The oracle property is particularly important for a variable selection method in high dimensions, as it ensures optimal large-sample performance (Zou, 2006; Zou and Zhang, 2009).
2 Statistical theory
Let $\mathcal{A} = \mathcal{C} \cup \mathcal{P}$ be the indices of desirable covariates to include in the estimated PS. Let $\mathcal{A}^{c} = \mathcal{I} \cup \mathcal{S}$ be the indices of covariates to exclude, where $\mathcal{I}$ and $\mathcal{S}$ are the indices of pure predictors of treatment and of spurious covariates (covariates that are unrelated to both outcome and treatment), respectively. We write the Fisher information (FI) matrix as
$$ I(\alpha^{*}) = \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix}, \qquad (1) $$
where $\alpha^{*}$ is the true parameter value and $I_{11}$ is the sub-matrix corresponding to the covariates indexed by $\mathcal{A}$.
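For reference, under the logistic PS model introduced above, the FI matrix takes the familiar form
$$ I(\alpha^{*}) = E\left[ \pi(X)\{1 - \pi(X)\}\, X X^{\top} \right], \qquad \pi(X) = \operatorname{expit}\left( \sum_{j = 1}^{p} \alpha^{*}_j X_j \right); $$
this display is a standard fact about logistic regression stated here only as a reading aid, with $X$ understood as the covariate vector of a single observation.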
Theorem 1.
Assume the following regularity conditions:
(C.1)
The Fisher information matrix defined in Equation 1 is finite and positive definite.
(C.2)
For each $j, k, l$, there exists a function $M_{jkl}(\cdot)$ such that, for $\alpha \in \Omega$ in a neighborhood of $\alpha^{*}$, we have:
$$ \left| \frac{\partial^{3} \ell(\alpha; a, x)}{\partial \alpha_j \, \partial \alpha_k \, \partial \alpha_l} \right| \le M_{jkl}(x) $$
such that
$$ E\left[ M_{jkl}(X) \right] < \infty, $$
where $\Omega$ is an open parameter space for $\alpha$.
(C.3)
$\lambda_1 / \sqrt{n} \to 0$ and $\lambda_1 n^{(\gamma - 1)/2} \to \infty$, for $\gamma > 1$.
(C.4)
$\lambda_2 / \sqrt{n} \to 0$.
Then, under conditions (C.1)–(C.4), the generalized outcome-adaptive lasso estimator $\hat{\alpha}(\text{GOAL})$ satisfies the following:
1.
Consistency in variable selection: $\lim_{n \to \infty} P\big( \{ j : \hat{\alpha}_j(\text{GOAL}) \neq 0 \} = \mathcal{A} \big) = 1$;
2.
Asymptotic normality: $\sqrt{n}\, \big( \hat{\alpha}(\text{GOAL})_{\mathcal{A}} - \alpha^{*}_{\mathcal{A}} \big) \xrightarrow{d} N\big( 0,\, I_{11}^{-1} \big)$.
Proof of Theorem 1.
The ideas of the proof are taken from Zou (2006), Khalili and Chen (2007), Slawski et al. (2010), and Shortreed and Ertefaie (2017).
First, we prove the asymptotic normality.
Let . Then
define by
For , we have:
Define . Thus
where .
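As a reading aid, the local decomposition analyzed in this step can be sketched as follows; the notation $u$ and $V_n$ follows Zou (2006) and is ours rather than the paper's:
$$ V_n(u) = \left[ \ell_n\left( \alpha^{*} + \frac{u}{\sqrt{n}} \right) - \ell_n(\alpha^{*}) \right] + \lambda_1 \sum_{j = 1}^{p} \hat{w}_j \left( \left| \alpha^{*}_j + \frac{u_j}{\sqrt{n}} \right| - |\alpha^{*}_j| \right) + \lambda_2 \sum_{j = 1}^{p} \left\{ \left( \alpha^{*}_j + \frac{u_j}{\sqrt{n}} \right)^{2} - (\alpha^{*}_j)^{2} \right\}, $$
whose minimizer satisfies $\hat{u}_n = \sqrt{n}\,\big( \hat{\alpha}(\text{GOAL}) - \alpha^{*} \big)$.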
By using the second-order Taylor expansion of around , we have:
Thus, we can rewrite as
with
By applying the central limit theorem and laws of large numbers, we have:
For the term , we observe that
By the condition (C.2), we observe that
(2)
where is between and . Equation 2 shows that is bounded.
The behavior of and depends on the covariate type. If a covariate is a confounder ($j \in \mathcal{C}$) or a pure predictor of the outcome ($j \in \mathcal{P}$), this is since . If , then we have:
with , and
By Slutsky's theorem, we have:
For the behavior of , we have
since by assumption and , and then using Slutsky's theorem.
Using the convexity of and following the epi-convergence results of Geyer (1994), we have:
Thus, again by Slutsky's theorem, we have:
This completes the proof of the asymptotic normality part.
Now we show the consistency in variable selection part, i.e., that $\lim_{n \to \infty} P\big( \{ j : \hat{\alpha}_j(\text{GOAL}) \neq 0 \} = \mathcal{A} \big) = 1$. Let and define the penalized negative log-likelihood function as
Thus, to prove sparsity it suffices to show that
with probability tending to 1 as $n \to \infty$. We observe that
By the mean value theorem, we have:
for some .
By the mean value theorem again, we have:
Thus, the limiting behavior of depends on whether or .
If , we have . Thus,
If , we have . Thus,
Then, for , we have
Hence, we have:
Thus,
(3)
By shrinking the neighborhood of , in probability. This shows that with probability tending to 1 as $n \to \infty$.
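The rate comparison behind Equation 3 can be sketched as follows, under the assumptions that $\hat{w}_j = |\tilde{\beta}_j|^{-\gamma}$ with $\sqrt{n}\,\tilde{\beta}_j = O_p(1)$ for $j \in \mathcal{A}^{c}$ and that condition (C.3) holds:
$$ \frac{\lambda_1 \hat{w}_j}{\sqrt{n}} = \frac{\lambda_1}{\sqrt{n}}\, \big| \tilde{\beta}_j \big|^{-\gamma} = \lambda_1\, n^{(\gamma - 1)/2}\, \big| \sqrt{n}\, \tilde{\beta}_j \big|^{-\gamma} \xrightarrow{P} \infty, $$
so that, for $j \in \mathcal{A}^{c}$, the penalty contribution dominates the $O_p(1)$ score term and forces $\hat{\alpha}_j(\text{GOAL}) = 0$ with probability tending to one.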
Let be the minimizer of the penalized negative log-likelihood function , where is a function of . Now it suffices to prove
.
By adding and subtracting , we have:
(4)
(5)
By the result in Equation 3, the right-hand side of Equation 5 is positive with probability tending to 1 as $n \to \infty$. This completes the proof of the sparsity part.
3 Numerical example
In this section, we present a simulation study of the finite-sample performance of GOAL. Unlike the simulation studies in Baldé et al. (2023) and Shortreed and Ertefaie (2017), which considered a fixed intrinsic dimension, we follow Zou and Zhang (2009) and allow the intrinsic dimension (the size of $\mathcal{A}$) to diverge with the sample size as well, which makes our numerical study more challenging. We considered three methods: GOAL, OAL and Lasso. We use these methods because OAL is an oracle-like method (Shortreed and Ertefaie, 2017), while Lasso does not have the oracle property (Zou, 2006; Zou and Zhang, 2009).
Now, we describe the simulation setup used to generate the data $(X, A, Y)$, denoting the covariate matrix, the treatment and the outcome, respectively. The covariate vector $X_i$, for $i = 1, \ldots, n$, is simulated from a multivariate standard Gaussian distribution with pairwise correlation $\rho$. The binary treatment $A_i$ is simulated from a Bernoulli distribution with success probability given by the logistic PS model. Given $X_i$ and $A_i$, the continuous outcome $Y_i$ is simulated as $Y_i = \eta A_i + X_i^{\top}\beta + \epsilon_i$, where $\epsilon_i$ is a Gaussian error term. The true ATE is $\eta$. We considered two different correlations: independent covariates ($\rho = 0$) and strongly correlated covariates. Let for . The true coefficients are , with , and denote a -vector of ’s/’s.
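A minimal Python sketch of this data-generating process is given below; the function and argument names are ours, the coefficient values and correlation level are left as inputs to be set to the values described above, and the error is taken to be standard normal for illustration.

import numpy as np

def simulate_goal_data(n, p, rho, eta, beta, alpha, rng=None):
    """Sketch of the data-generating process described in the text.
      X : n x p Gaussian covariates, equicorrelated with correlation rho
      A : Bernoulli treatment with logistic propensity score X @ alpha
      Y : continuous outcome  eta * A + X @ beta + Gaussian noise
    The true ATE equals eta."""
    rng = np.random.default_rng() if rng is None else rng
    # Equicorrelation covariance: 1 on the diagonal, rho off the diagonal.
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    ps = 1.0 / (1.0 + np.exp(-X @ alpha))   # true propensity scores
    A = rng.binomial(1, ps)
    Y = eta * A + X @ beta + rng.standard_normal(n)
    return X, A, Y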
We follow Zou and Zhang (2009) to set $\gamma$ for computing the adaptive weights in GOAL and OAL. For estimation accuracy, we used the bias, standard error (SE) and mean squared error (MSE) to compare methods. For variable selection, we evaluated the methods based on the proportion of times each variable was selected for inclusion in the PS model, where a variable was considered selected if its estimated coefficient exceeded the tolerance used in Shortreed and Ertefaie (2017).
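A brief Python sketch of the IPTW estimator of the ATE and of the selection rule is given below; the normalized (Hajek-type) weighting and the tolerance value are illustrative choices of ours, not necessarily those used in the papers cited above.

import numpy as np

def iptw_ate(Y, A, ps, eps=1e-6):
    """Normalized (Hajek-type) IPTW estimate of the ATE from estimated
    propensity scores `ps`; clipping avoids division by zero."""
    ps = np.clip(ps, eps, 1.0 - eps)
    w1 = A / ps
    w0 = (1.0 - A) / (1.0 - ps)
    return np.sum(w1 * Y) / np.sum(w1) - np.sum(w0 * Y) / np.sum(w0)

def selected(alpha_hat, tol=1e-8):
    """Selection indicator: a covariate enters the PS model if the absolute
    value of its estimated coefficient exceeds the tolerance `tol`."""
    return np.abs(alpha_hat) > tol

One replication then amounts to drawing data with simulate_goal_data, fitting the PS model (for instance with the goal_fit sketch above), and passing the fitted propensity scores to iptw_ate; tuning-parameter selection is omitted here.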
Figure 1: Probability of each covariate being included in the PS model for estimating the average treatment effect (ATE), under independent covariates ($\rho = 0$; row 1) and strongly correlated covariates (row 2).
Table 1: Bias, standard error (SE) and mean squared error (MSE) of the IPTW estimator of the average treatment effect (ATE) for GOAL, OAL and Lasso, based on replications.

                               No correlation              Strong correlation
  n     p   |A|   Model     Bias    SE     MSE           Bias     SE      MSE
 100   35     6   GOAL      0.13   0.32    0.12          0.32    0.68     0.57
                  OAL       0.17   0.32    0.13          1.40    0.80     2.60
                  Lasso     0.69   0.32    0.58          2.93    0.63     8.97
 200   51    10   GOAL      0.12   0.26    0.08          0.51    0.84     0.95
                  OAL       0.15   0.27    0.10          2.96    1.14    10.07
                  Lasso     0.92   0.27    0.92          5.51    0.67    30.79
 400   75    16   GOAL      0.15   0.20    0.06          1.00    1.13     2.27
                  OAL       0.19   0.22    0.08          5.56    1.68    33.74
                  Lasso     1.24   0.24    1.59          9.48    0.78    90.43
Table 1 and Figure 1 present the simulation results for the combinations $(n, p, |\mathcal{A}|) = (100, 35, 6)$, $(200, 51, 10)$ and $(400, 75, 16)$, for both independent and strongly correlated covariates.
Table 1 presents the bias, standard error (SE) and mean squared error (MSE) of the GOAL, OAL and Lasso estimators of the ATE. In all considered scenarios, GOAL and OAL performed much better than Lasso. For independent covariates ($\rho = 0$), the MSE of GOAL and OAL decreased as the sample size increased, and GOAL exhibited the smallest bias and MSE for all combinations of $(n, p, |\mathcal{A}|)$. When the covariates were strongly correlated, the performance of OAL deteriorated as the sample size increased. Lasso performed the worst in all settings we considered.
Figure 1 reports the proportion of times each covariate was selected for inclusion in the PS model when Lasso, OAL and GOAL were used to fit the PS model to estimate the ATE. OAL and GOAL included covariates at similar rates: confounders and pure predictors of the outcome with high probability (about 100%), and pure predictors of treatment and spurious covariates with relatively small probability, for all combinations considered. In contrast, Lasso selected the confounders and pure predictors of treatment with high probability and tended to exclude the pure predictors of the outcome and the spurious covariates.
4 Discussion
In this paper, we studied the statistical properties of the GOAL method. We compared GOAL, OAL and Lasso in a simulation study where both the number of parameters and the intrinsic dimension diverge with the sample size. A distinctive feature of our simulation scenarios, compared to those of many existing variable selection methods for causal inference, including Baldé et al. (2023) and Shortreed and Ertefaie (2017), is that the dimension of the active set ($\mathcal{A}$) diverges with the sample size. This makes our numerical example more challenging and more appropriate for high-dimensional data analysis. GOAL and OAL outperformed Lasso in all scenarios considered. The two oracle-like methods (GOAL and OAL) performed best when the covariates were independent ($\rho = 0$) and the sample size was large ($n = 400$). This result is expected according to the asymptotic theory for oracle-like methods (Zou and Zhang, 2009). However, OAL performed worse than GOAL when the correlation was strong. GOAL had the best performance for every combination of $(n, p, |\mathcal{A}|)$. As a result, GOAL has much better finite-sample performance than the oracle-like method OAL.
Data availability
No data was used for the research described in the article.
Acknowledgements
This work was funded by grants from the New Brunswick Innovation Foundation (NBIF).
References
Baldé, I., 2022. Algorithmes de sélection de confondants en petite et grande dimensions : contextes d'application conventionnels et pour l'analyse de la médiation. Doctoral thesis in mathematics, Université du Québec à Montréal, Montréal, Canada.
Baldé, I., Yang, A. Y., Lefebvre, G., 2023. Reader reaction to “Outcome-adaptive lasso: Variable selection for causal inference” by Shortreed and Ertefaie (2017). Biometrics 79(1), 514–520.
Geyer, C. J., 1994. On the asymptotics of constrained M-estimation. The Annals of Statistics 22(4), 1993–2010.
Khalili, A., Chen, J., 2007. Variable selection in finite mixture of regression models. Journal of the American Statistical Association 102, 1025–1038.
Shortreed, S. M., Ertefaie, A., 2017. Outcome-adaptive lasso: Variable selection for causal inference. Biometrics 73(4), 1111–1122.
Slawski, M., zu Castell, W., Tutz, G., 2010. Feature selection guided by structural information. The Annals of Applied Statistics 4(2), 1056–1080.
Zou, H., 2006. The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101, 1418–1429.
Zou, H., Zhang, H. H., 2009. On the adaptive elastic-net with a diverging number of parameters. The Annals of Statistics 37, 1733–1751.