This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Bayesian Nonparametric Trees for Principal Causal Effects

Abstract

Principal stratification analysis evaluates how causal effects of a treatment on a primary outcome vary across strata of units defined by their treatment effect on some intermediate quantity. This endeavor is substantially challenged when the intermediate variable is continuously scaled and there are infinitely many basic principal strata. We employ a Bayesian nonparametric approach to flexibly evaluate treatment effects across flexibly-modeled principal strata. The approach uses Bayesian Causal Forests (BCF) to simultaneously specify two Bayesian Additive Regression Tree models; one for the principal stratum membership and one for the outcome, conditional on principal strata. We show how the capability of BCF for capturing treatment effect heterogeneity is particularly relevant for assessing how treatment effects vary across the surface defined by continuously-scaled principal strata, in addition to other benefits relating to targeted selection and regularization-induced confounding. The capabilities of the proposed approach are illustrated with a simulation study, and the methodology is deployed to investigate how causal effects of power plant emissions control technologies on ambient particulate pollution vary as a function of the technologies’ impact on sulfur dioxide emissions.

Chanmin Kim1,∗ Corwin Zigler2
1Department of Statistics, Sungkyunkwan University, Seoul, Korea
2Department of Statistics and Data Science, The University of Texas, Austin, TX 78712, USA

1. Introduction

Many scientific questions relate to how causal effects of a treatment on a primary outcome relate to the treatment’s effect on some intermediate quantity. Complications to causal inference with such intermediate outcomes are well recognized, and have motivated a variety of analysis approaches. In particular, principal stratification (Frangakis and Rubin, 2002) formulates how causal effects of a treatment on an outcome may vary as a function of the treatment’s effect on the intermediate quantity. This has proven useful in evaluating causal treatment effects when the intermediate is a surrogate for the primary outcome (Gilbert and Hudgens, 2008), a measure of treatment compliance (Frumento et al., 2012), a measure of local response to a national regulation (Zigler et al., 2018), or time to an event in a study of semicompeting risks (Comment et al., 2019; Nevo and Gorfine, 2022; Lyu et al., 2023).

The objective of principal stratification is to estimate the average causal effect of a treatment, AA, on an outcome, YY, across “principal strata” of the population, where these strata are defined based on the extent to which AA causally affects some intermediate variable, MM (Frangakis and Rubin, 2002). Thus, there is a sense in which the presence of treatment effect heterogeneity with respect to the principal strata is a key feature of principal strataification analysis. Owing to the structure of potential outcomes, which are only observed for each unit under that unit’s observed treatment status, units’ membership in principal strata are unknown, with allocation of units to these strata a central analysis challenge. When the intermediate MM is binary, this amounts to allocating units to one of only four principal strata, and there are numerous available methods (Page, 2012; Frumento et al., 2012; Miratrix et al., 2018). When MM takes on continuous values, there are infinitely many principal strata and the problem is substantially more complex. Early work in Jin and Rubin (2008) offered a Bayesian parametric approach for continuous intermediate variables. This perspective was extended in Schwartz et al. (2011), who offered a Bayesian nonparametric Drichlet process mixture (DPM) model for the intermediate outcomes, the primary virtue of which was to estimate a complex joint distribution of potential continuous intermediates that corresponded to very flexibly modeled principal strata. Causal effects conditional on these strata were modeled with a parametric outcome model. Kim et al. (2019) further extended the flexibility of a DPM to model to both the distributions of potential intermediate and outcome variables, using a Gaussian copula model to combine all marginal distributions into a coherent joint distribution. This approach facilitated estimation of the potentially complex joint distribution of intermediate and outcome variables but, owing to the assumption of normality required in using the Gaussian copula model, the marginal distributions of potential intermediate and outcome variables remained linearly connected, imposing some parametric structure for characterizing principal causal effects. Note that the related framework of causal mediation analysis (Pearl, 2001; Robins and Greenland, 1992), with its own Bayesian nonparametric extensions (Kim et al., 2019), is designed to more specifically establish causal mechanism of how a treatment effect is mediated through the intermediate outcome, requiring different types of identifying assumptions (Imai et al., 2010). Relationships between principal stratification and causal mediation analysis are thoroughly explored in (VanderWeele, 2008; Mealli and Mattei, 2012; Kim et al., 2019), but are not the focus here.

In this paper, we continue development of flexible Bayesian nonparametric principal stratification analysis with continuous intermediate variables and continuous outcomes. Specifically, we offer an approach that utilizes Bayesian Additive Regression Trees (BART, Chipman et al. (2010)) to flexibly model both intermediate and outcome variables. Methods based on BART have been extensively studied in the causal inference community since Hill (2011), owing to their promising predictive performance. Green and Kern (2012) used BART in survey experiments to estimate conditional treatment effects. To evaluate heterogeneous treatment effects, Zeldow et al. (2019) proposed a Bayesian structural mean model (SMM) based on two BART priors. See Tan and Roy (2019) and Hill et al. (2020) for a detailed discussion of the BART method and its application in causal inference. Beyond the general goal of incorporating model flexibility and providing a unified estimating procedure for continuous intermediate and outcome variables, we adopt an approach based on BART for reasons specific to the goal of principal stratification. Since the goal of principal stratification is to evaluate how causal effects of AA on YY vary across principal strata, we utilize the BART-derived approach of Bayesian Causal Forests (Hahn et al., 2020) for its targeted ability to capture flexible forms of treatment effect heterogeneity, in addition to its ability to adjust for complex confounding structures. We propose joint model that specifies separate Bayesian causal forest models for continuous intermediate values and primary outcomes, designed specifically to (1) characterize complex principal strata structures and (2) use the capacity of BCF to capture how treatment effects vary across these strata. A Markov chain Monte Carlo procedure is proposed to sample from the posterior predictive distribution of unobserved potential intermediate variables and estimate principal causal effects.

The proposed methodology is motivated by and illustrated with an investigation of emission-reduction strategies installed at coal-fired power plants in the United States. Such strategies have played a significant role in the marked reduction of air pollution and corresponding health burden associated with coal power generation in the U.S. (Henneman et al., 2023). The specific treatment of interest is whether a coal-fired electricity generating unit (EGU) has a flue-gas desulfurization scrubber installed to reduce emissions of sulfur dioxide (SO2) a major byproduct of coal combustion. Interest lies in the extent to which the causal effect of such scrubbers on ambient fine particulate pollution (PM2.5, YY) vary as a function of the scrubber’s impact on SO2  emissions (MM). In particular, the effectiveness of scrubbers in reducing ambient PM2.5  may vary with respect to myriad power plant features and circumstances that dictate scrubber impacts on SO2  such as other changes in operation. A more complete understanding of the effectiveness of established control strategies, particularly as overall levels of this type of pollution have drastically reduced in the U.S. (Henneman et al., 2023) is essential and represents a needed advance over similar investigations conducted during earlier time periods (Kim et al., 2019, 2020).

2. Model and Methods

2.1. Notation and Causal Estimands

The approach is formalized within a potential outcomes framework (Rubin, 1974). Let (Ai,Mi,Yi)(A_{i},M_{i},Y_{i}) denote the exposure, intermediate, and outcome variables of interest, and 𝑿i\bm{X}_{i} be a set of confounders for observation ii, all conceived as fixed pre-treatment features. Throughout the paper, we assume the binary exposure. Let Mi(𝑨)M_{i}(\bm{A}) denote the potential value of the intermediate variable that would be observed under the vector of exposures 𝑨=(A1,,An)\bm{A}=(A_{1},\ldots,A_{n}) for subjects i=1,,ni=1,\ldots,n. Similarly, Yi(𝑨)Y_{i}(\bm{A}) denotes the potential value of the outcome that would be observed under the vector of exposures 𝑨=(A1,,An)\bm{A}=(A_{1},\ldots,A_{n}). Under the stable unit treatment value assumption (SUTVA, Rubin (1980)), potential intermediate and outcome values for subject ii do not depend on exposures assigned to other subjects (i.e., Mi(𝑨)=Mi(Ai)M_{i}(\bm{A})=M_{i}(A_{i}) and Yi(𝑨)=Yi(Ai)Y_{i}(\bm{A})=Y_{i}(A_{i})) and are essentially equal to the observed intermediate and outcome values (i.e., Mi=Mi(Ai)M_{i}=M_{i}(A_{i}) and Yi=Yi(Ai)Y_{i}=Y_{i}(A_{i})). The exposure (AA) in the motivating air quality study, is the presence of a flue-gas desulfurization scrubber (henceforth, “scrubber”), measured at each coal-fired power plant in 2014; the intermediate variable (MM) is the amount of SO2  emissions from the power plant during 2014; and the outcome of interest is the annual ambient average PM2.5  concentration within a 50km radius of the power plant (YY). Mi(Ai)M_{i}(A_{i}) and Yi(Ai)Y_{i}(A_{i}) are the potential SO2  emissions and surrounding PM2.5  values based on the state of scrubber installation (AiA_{i}) at the ii-th power plant.

Principal stratification (Frangakis and Rubin, 2002) partitions the population based on how AiA_{i} affects the intermediate variable, encoded as membership in a principal stratum defined as Si=(Mi(0),Mi(1))S_{i}=(M_{i}(0),M_{i}(1)). This principal stratum membership SiS_{i} is a latent variable that is unaffected by exposure and can be considered the baseline characteristic of the ii-th subject, encoding features that dictate how the primary treatment impacts the intermediate variable. In the power plant case, SiS_{i} denotes a combination of the effectiveness of scrubbers for reducing SO2  as well as other plant characteristics dictating variability in this effect that may also relate to the formation of ambient PM2.5  such as patterns of operation, responses to regional and seasonal regulatory programs, or atmospheric and weather conditions surrounding the plant. Our goal is to examine causal effects of AiA_{i} on YiY_{i} across principal strata defined by different values of (Mi(0),Mi(1))(M_{i}(0),M_{i}(1)), known as principal causal effects (PCEs, Frangakis and Rubin (2002)).

PCEs can be described as a surface of 𝔼[Y(1)Y(0)|M(0),M(1)]\mathbb{E}[Y(1)-Y(0)|M(0),M(1)] across values of (M(0),M(1))(M(0),M(1)), which has been termed the “causal effect predictiveness” surface (Gilbert and Hudgens, 2008). Summarizing this surface can be challenging with continuously-scaled MM, and deriving summary estimands will depend on context. Expected dissociative effects have been proposed and used to summarize average causal effects of AA on YY among principal strata for which AA has little or no impact on MM, representing something similar to a “direct effect” of AA on YY when MM remains unaffected. Expected associative effects can summarize average effects of AA on YY across strata where M(0)M(1)M(0)\neq M(1). Here, we extend previous work in Kim et al. (2020) to define multiple average principal effects based on a pre-determined threshold vectors, ϵ=(ϵlower,ϵupper)\bm{\epsilon}=(\epsilon_{\text{lower}},\epsilon_{\text{upper}}) and define expected PCEs as:

ΔPCE(ϵ)\displaystyle\Delta_{\text{PCE}}(\bm{\epsilon}) =\displaystyle= 𝔼[Yi(1)Yi(0)|ϵlower<Mi(1)Mi(0)<ϵupper],\displaystyle\mathbb{E}[Y_{i}(1)-Y_{i}(0)\,|\,\epsilon_{\text{lower}}<M_{i}(1)-M_{i}(0)<\epsilon_{\text{upper}}],

where various values of ϵ\bm{\epsilon} will define PCEs among strata with different impacts of AA on MM. For example, specifying |ϵlower|=|ϵupper|0|\epsilon_{\text{lower}}|=|\epsilon_{\text{upper}}|\approx 0 would correspond to an expected dissociative effect among strata of power plants for which scrubbers have little to no effect on SO2  emissions. If we consider the sample average estimand as the target estimand, the expression for ΔPCE(ϵ)\Delta_{\text{PCE}}(\bm{\epsilon}) is as follows:

ΔPCE(ϵ)\displaystyle\Delta_{\text{PCE}}(\bm{\epsilon}) =\displaystyle= 1nϵi=1n𝕀{ϵlower<Mi(1)Mi(0)<ϵupper}[Yi(1)Yi(0)],\displaystyle\frac{1}{n_{\epsilon}}\sum_{i=1}^{n}\mathbb{I}\{\epsilon_{\text{lower}}<M_{i}(1)-M_{i}(0)<\epsilon_{\text{upper}}\}[Y_{i}(1)-Y_{i}(0)],

where nϵ=𝕀{i=1nϵlower<Mi(1)Mi(0)<ϵupper}n_{\epsilon}=\mathbb{I}\{\sum_{i=1}^{n}\epsilon_{\text{lower}}<M_{i}(1)-M_{i}(0)<\epsilon_{\text{upper}}\} and 𝕀{}\mathbb{I}\{\} is an indicator function.

Note that the fundamental problem of causal inference applies in this case to both the values of Yi(A)Y_{i}(A) and Mi(A)M_{i}(A). Thus, the principal stratum membership SiS_{i} is unobserved and quantities such as ΔPCE(ϵ)\Delta_{\text{PCE}}(\bm{\epsilon}) are not identified based only on observed data. Unobserved values Mi(1Ai)M_{i}(1-A_{i}) will determine the value of SiS_{i} and must be estimated along with the unobserved values of Yi(1Ai)Y_{i}(1-A_{i}).

2.2. Identifying Assumptions

To estimate the principal causal effects, we first formulate an assignment mechanism, the distribution of 𝑨\bm{A} conditional on the potential (intermediate) outcomes (𝒀(0),𝒀(1),𝑴(0),𝑴(1)(\bm{Y}(0),\bm{Y}(1),\bm{M}(0),\bm{M}(1)) and observed covariates where 𝒀(a)=(Y1(a),,Yn(a))\bm{Y}(a)=(Y_{1}(a),\ldots,Y_{n}(a))^{\prime} and 𝑴(a)=(M1(a),,Mn(a))\bm{M}(a)=(M_{1}(a),\ldots,M_{n}(a))^{\prime} for a=0,1a=0,1. Under the assumption that the exposure assignment for each unit is independent of information from other units, the exposure assignment we consider for unit ii has the following form: Pr(Ai|Yi(0),Yi(1),Mi(0),Mi(1),𝑿i)\text{Pr}(A_{i}|Y_{i}(0),Y_{i}(1),M_{i}(0),M_{i}(1),\bm{X}_{i}). The ignorability assumption (Rosenbaum and Rubin, 1983) states that with a proper set of covariates 𝑿i\bm{X}_{i}, the exposure assignment mechanism can be reduced to a simpler form.

Assumption 1.

Strongly ignorable treatment assignment (unconfoundedness)

{Yi(0),Yi(1),Mi(0),Mi(1)}Ai|𝑿𝒊=𝒙,\{Y_{i}(0),Y_{i}(1),M_{i}(0),M_{i}(1)\}\perp\!\!\!\perp A_{i}\,|\,\bm{X_{i}}=\bm{x},

for all 𝐱𝒳\bm{x}\in\mathcal{X}. Equivalently, it can be restated as Pr(Ai|Yi(1),Yi(0),Mi(1),Mi(0),𝐗i)=Pr(Ai|𝐗i)Pr(A_{i}|Y_{i}(1),Y_{i}(0),M_{i}(1),M_{i}(0),\bm{X}_{i})=Pr(A_{i}|\bm{X}_{i}) where 0<Pr(Ai|𝐗i)<10<Pr(A_{i}|\bm{X}_{i})<1 for i=1,,ni=1,\ldots,n.

Here, the “proper” set of covariates implies that covariate vector 𝑿i\bm{X}_{i} includes all common causes (confounders) of AMA-M and AYA-Y relationships. Ignorability implies that, unlike observed MiobsM_{i}^{\text{obs}}, principal stratum membership Si=(Mi(0),Mi(1))S_{i}=(M_{i}(0),M_{i}(1)) is not affected by the exposure, and average comparisons between Yi(0)Y_{i}(0) and Yi(1)Y_{i}(1) within levels of SiS_{i} can be interpreted as (principal) causal effects.

Our target estimands are sample average PCEs for the nn sample units. Unlike corresponding population average estimands, the parameters governing associations between Yi(0)Y_{i}(0) and Yi(1)Y_{i}(1) and Mi(0)M_{i}(0) and Mi(1)M_{i}(1) cannot be ignored (Schwartz et al., 2011; Ding and Li, 2018; Kim et al., 2019; Li et al., 2023) and we need to apply the notion of inference used in Jin and Rubin (2008). With the Bayesian causal forest approach described in the next section, we adopt the following conditional independence assumptions to specify two joint conditional distributions of (Yi(1),Yi(0))(Y_{i}(1),Y_{i}(0)) and (Mi(1),Mi(0))(M_{i}(1),M_{i}(0)): Yi(1Ai)Yi(Ai)|Ai,Mi(0),Mi(1),𝑿iY_{i}(1-A_{i})\perp Y_{i}(A_{i})|A_{i},M_{i}(0),M_{i}(1),\bm{X}_{i} and Mi(1Ai)Mi(Ai)|Ai,𝑿iM_{i}(1-A_{i})\perp M_{i}(A_{i})|A_{i},\bm{X}_{i}. Then, this assumption also further factorizes the joint conditional distributions of potential (intermediate) outcomes as follows: Pr(Yi(0),Yi(1)|Mi(0),Mi(1),𝑿i,ϕ)=Pr(Yi(0)|Mi(0),Mi(1),𝑿i,ϕ)Pr(Yi(1)|Mi(0),Mi(1),𝑿i,ϕ)Pr(Y_{i}(0),Y_{i}(1)|M_{i}(0),M_{i}(1),\bm{X}_{i},\bm{\phi})=Pr(Y_{i}(0)|M_{i}(0),M_{i}(1),\bm{X}_{i},\bm{\phi})Pr(Y_{i}(1)|M_{i}(0),M_{i}(1),\bm{X}_{i},\bm{\phi}) and Pr(Mi(0),Mi(1)|𝑿i,ϕ)=Pr(Mi(0)|𝑿i,ϕ)Pr(Mi(1)|𝑿i,ϕ),Pr(M_{i}(0),M_{i}(1)|\bm{X}_{i},\bm{\phi})=Pr(M_{i}(0)|\bm{X}_{i},\bm{\phi})Pr(M_{i}(1)|\bm{X}_{i},\bm{\phi}), where AiA_{i} is omitted in both sides of the equations under the unconfoundedness assumption. This assumption is conservative in that it is stronger than assuming that (Mi(0),Mi(1))(M_{i}(0),M_{i}(1)), and/or (Yi(0),Yi(1))(Y_{i}(0),Y_{i}(1)) are conditionally associated.

2.3. Bayesian Causal Forest Overview

Among many causal inference methods based on BART, Hahn et al. (2020) proposed the Bayesian causal forest model (BCF) to accurately measure heterogeneous effects with the goal of minimizing confounding. Ignoring for the moment the presence of the intermediate variable, MM, the BCF method specifies the response surface as

𝔼(Yi|𝑿i=𝒙i,Ai=ai)=μ(𝒙i,π^(𝒙i))+τ(𝒙i)ai,\mathbb{E}(Y_{i}\,|\,\bm{X}_{i}=\bm{x}_{i},A_{i}=a_{i})=\mu(\bm{x}_{i},\hat{\pi}(\bm{x}_{i}))+\tau(\bm{x}_{i})a_{i}, (1)

where independent BART priors are specified on the functions μ\mu and τ\tau. Here, π^(𝒙i)\hat{\pi}(\bm{x}_{i}) is an estimated propensity score, which corresponds to the exposure assignment mechanism under Assumption 1.

A key point of the BCF specification is the role of the μ\mu function, also known as the prognostic function, which is equivalent to the expected outcomes of the unexposed (i.e., μ(𝒙)=𝔼(Y(0)|𝒙)\mu(\bm{x})=\mathbb{E}(Y(0)|\bm{x})). If the predicted value of the outcome in the absence of exposure relates to the assignment of exposure, the prognostic function will exhibit some dependence on the propensity score π^(𝒙i)\hat{\pi}(\bm{x}_{i}). This phenomenon is referred to as targeted selection (Hahn et al., 2020). One issue with targeted selection is that it can lead standard regularization methods - including the regularization implied by BART - to present regularization-induced confounding (RIC), particularly when conditional outcome distributions depend more on covariates in 𝑿\bm{X} than they do on the treatment, AA. The inclusion of the estiamted propensity score within the function μ\mu is designed specifically to mitigate the threat of RIC, and is a key feature of the BCF model in (1). In the power plant setting, we expect targeted selection to arise because scrubbers are more likely to be installed in power plants emitting a significant amount of pollutants (SO2) and in areas in danger of violating ambient air quality standards, that is, in areas with the higher values of PM2.5  in the absence of emissions controls.

The other main component of BCF is its inclusion of the function τ(𝒙i)\tau(\bm{x}_{i}), which is the feature of the model specifically targeted to a characterization of treatment effect heterogeneity. Specifically, a flexibly-modeled τ(𝒙i)\tau(\bm{x}_{i}) would represent a complex function of covariates that modify the relationship between AiA_{i} and the expected outcome. Estimation in the BCF approach is achieved by specifying an independent BART prior for the μ(𝒙i)\mu(\bm{x}_{i}) and τ(𝒙i)\tau(\bm{x}_{i}) functions. We provide further details within the context of the proposed model, which incorporates intermediate variables, in a later section.

2.4. Observed Data Model

The observed data models of YY and MM can be expressed using the following form of BCF:

Yi|𝑿i=𝒙i,Ai=ai,Mi(1)=m1i,Mi(0)=m0i\displaystyle Y_{i}|\bm{X}_{i}=\bm{x}_{i},A_{i}=a_{i},M_{i}(1)=m_{1i},M_{i}(0)=m_{0i} \displaystyle\sim μY(𝒙i,π^(𝒙i))+τY(𝒙i,m1i,m0i)ai+ϵi,\displaystyle\mu_{Y}(\bm{x}_{i},\hat{\pi}(\bm{x}_{i}))+\tau_{Y}(\bm{x}_{i},m_{1i},m_{0i})a_{i}+\epsilon_{i},
Mi|𝑿i=𝒙i,Ai=ai\displaystyle M_{i}|\bm{X}_{i}=\bm{x}_{i},A_{i}=a_{i} \displaystyle\sim μM(𝒙i,π^(𝒙i))+τM(𝒙i)ai+ϵi,\displaystyle\mu_{M}(\bm{x}_{i},\hat{\pi}(\bm{x}_{i}))+\tau_{M}(\bm{x}_{i})a_{i}+\epsilon_{i}^{\prime},

where ϵiN(0,σy2)\epsilon_{i}\sim N(0,\sigma_{y}^{2}) and ϵiN(0,σm2)\epsilon^{\prime}_{i}\sim N(0,\sigma_{m}^{2}). This essentially corresponds to specifying two BCF models; one for the intermediate, MM, and another for the outcome, YY, conditional on both potential intermediate values. These two BCF models correspond to the two joint distributions described as consequences of the conditional independence assumptions described in Section 2.2.: Pr(Yi(0),Yi(1)|Mi(0),Mi(1),𝑿i,ϕ)Pr(Y_{i}(0),Y_{i}(1)|M_{i}(0),M_{i}(1),\bm{X}_{i},\bm{\phi}), and Pr(Mi(0),Mi(1)|𝑿i,ϕ)Pr(M_{i}(0),M_{i}(1)|\bm{X}_{i},\bm{\phi}). Here, independent BART priors are specified for the functions μM\mu_{M}, μY\mu_{Y}, τM\tau_{M} and τY\tau_{Y}, and π^(𝒙𝒊)\hat{\pi}(\bm{x_{i}}) represents the estimated propensity score π^(𝒙i,ϕA)=Pr(Ai=1|𝒙i,ϕA)\hat{\pi}(\bm{x}_{i},\bm{\phi}_{A})=Pr(A_{i}=1\,|\,\bm{x}_{i},\bm{\phi}_{A}) for each ii-th subject where ϕA\bm{\phi}_{A} is a subvector of the global parameter ϕ\bm{\phi}. Key points to highlight include: (1) the flexibility of the outcome and intermediate models in mitigating the threat of RIC while effectively capturing the causal effects of AA on both YY and MM; (2) the reliance of the outcome model on potential intermediate variables Mi(1)M_{i}(1) and Mi(0)M_{i}(0) rather than an observed intermediate variable MiobsM_{i}^{\text{obs}}; and (3) the importance of the function τY\tau_{Y} in the outcome model, which allows the treatment effect of AA on YY to be modified by a complex function of principal stratum membership Si=(Mi(0),Mi(1))S_{i}=(M_{i}(0),M_{i}(1)) and covariates. The simulation study compares against an approach that excludes 𝑿i\bm{X}_{i} from the specification of τY\tau_{Y}. In either case, estimated PCEs will marginalize over values of 𝑿i\bm{X}_{i} within principal strata defined by SiS_{i}.

By incorporating BART priors, the entire set of observed data models can be reformulated:

Yi\displaystyle Y_{i} =\displaystyle= h=1HμYμY(𝑿i,π^(𝑿i,ϕA);𝒯hμY,hμY)+aih=1HτYτY(𝑿i,Mi(1),Mi(0);𝒯hτY,hτY)+ϵi,\displaystyle\sum_{h=1}^{H_{\mu_{Y}}}\mathcal{F}_{\mu_{Y}}(\bm{X}_{i},\hat{\pi}(\bm{X}_{i},\bm{\phi}_{A});\mathcal{T}_{h}^{\mu_{Y}},\mathcal{M}_{h}^{\mu_{Y}})+a_{i}\sum_{h=1}^{H_{\tau_{Y}}}\mathcal{F}_{\tau_{Y}}(\bm{X}_{i},M_{i}(1),M_{i}(0);\mathcal{T}_{h}^{\tau_{Y}},\mathcal{M}_{h}^{\tau_{Y}})+\epsilon_{i},
Mi\displaystyle M_{i} =\displaystyle= h=1HμMμM(𝑿i,π^(𝑿i,ϕA);𝒯hμM,hμM)+aih=1HτMτM(𝑿i;𝒯hτM,hτM)+ϵi,\displaystyle\sum_{h=1}^{H_{\mu_{M}}}\mathcal{F}_{\mu_{M}}(\bm{X}_{i},\hat{\pi}(\bm{X}_{i},\bm{\phi}_{A});\mathcal{T}_{h}^{\mu_{M}},\mathcal{M}_{h}^{\mu_{M}})+a_{i}\sum_{h=1}^{H_{\tau_{M}}}\mathcal{F}_{\tau_{M}}(\bm{X}_{i};\mathcal{T}_{h}^{\tau_{M}},\mathcal{M}_{h}^{\tau_{M}})+\epsilon_{i}^{\prime},

where, for each f{μY,τY,μM,τM}f\in\{\mu_{Y},\tau_{Y},\mu_{M},\tau_{M}\}, each of HfH_{f} distinct tree structures is denoted by 𝒯hf(h=1,,Hf)\mathcal{T}_{h}^{f}(h=1,\cdots,H_{f}) and the parameters at the terminal nodes of the hh-th tree are denoted by hf={μh,1f,,μh,nhf}\mathcal{M}_{h}^{f}=\{\mu_{h,1}^{f},\cdots,\mu_{h,n_{h}}^{f}\} where nhn_{h} is the number of terminal nodes of 𝒯hf\mathcal{T}_{h}^{f}. And f(𝒙,𝒯hf,hf)\mathcal{F}_{f}(\bm{x},\mathcal{T}_{h}^{f},\mathcal{M}_{h}^{f}) represents μt,ηfhf\mu_{t,\eta}^{f}\in\mathcal{M}_{h}^{f} if 𝒙\bm{x} is associated wtih the η\eta-th terminal node in tree 𝒯hf\mathcal{T}_{h}^{f}.

As suggested in Hahn et al. (2020), we specify different BART priors on μ\mu and τ\tau functions. In general, the default parameter setting in Chipman et al. (2010) is used: (1) a node at depth dd splits with the probability α(1+d)β\alpha(1+d)^{-\beta} (α=0.95\alpha=0.95 and β=2\beta=2); (2) for τY\tau_{Y} in the outcome model, independent priors μh,jτYN(0,σ02)\mu_{h,j}^{\tau_{Y}}\sim N(0,\sigma_{0}^{2}) are assigned to the leaf parameters where σ0=σy/HτY\sigma_{0}=\sigma_{y}/\sqrt{H_{\tau_{Y}}}; and (3) for τM\tau_{M} in the intermediate model, independent priors μh,jτMN(0,σ02)\mu_{h,j}^{\tau_{M}}\sim N(0,\sigma_{0}^{2}) are assigned to the leaf parameters where σ0=σm/HτM\sigma_{0}=\sigma_{m}/\sqrt{H_{\tau_{M}}}. For μY\mu_{Y} and μM\mu_{M}, however, we specify a weakly informative half-Cauchy prior on the scale parameter of the leaf parameters. We set the prior third quartile to twice the standard deviation of the observed YY (or MM).

3. Estimation

3.1. Posterior Computation

The detailed Markov chain Monte Carlo (MCMC) approach for posterior computation process is illustrated in the pseudo-code (Algorithm 1) in the Appendix. As shown in Algorithm 1 in the Appendix, a total of four BART priors (two for MM model and two for YY model) were updated using “Bayesian backfitting” (Hastie and Tibshirani, 2000) as a Metropolis-within-Gibbs sampler in which each tree is fit sequentially through the unexplained responses. The number of trees included in each BART prior is set differently, as suggested by Hahn et al. (2020). The BART priors assigned to the prognostic functions μY\mu_{Y} and μM\mu_{M} that adjust for confounders consist of a large number of trees (e.g., HμY=HμM=200H_{\mu_{Y}}=H_{\mu_{M}}=200), whereas the BART priors assigned to τY\tau_{Y} and τM\tau_{M} representing the heterogeneous effects consist of a small number of trees that regularizes towards the homogeneous effects (e.g., Hτy=Hτm=50H_{\tau_{y}}=H_{\tau_{m}}=50).

To summarize the entire procedure, for the rr-th MCMC iteration, the BART priors assigned to the μM\mu_{M} function (prognostic function) and the τM\tau_{M} function in the MM model are sequentially updated. Thereafter, for each i=1,,ni=1,...,n, MimisM_{i}^{\text{mis}} is generated, which is an unobservable Mi(1Ai)M_{i}(1-A_{i}) value. Through this, the value of the principal stratum membership of the rr-th MCMC iteration Si(r)=(Mi(1),Mi(0))S_{i}^{(r)}=(M_{i}(1),M_{i}(0)) is obtained for each ii-th observation. Similarly, the BART priors corresponding to the YY model are updated. The two potential intermediates, Mi(Ai)M_{i}(A_{i}) and Mi(1Ai)M_{i}(1-A_{i}) (forming the principal stratum membership variable, Si(r)S_{i}^{(r)}), along with 𝑿i\bm{X}_{i}, serve as independent variables for splitting the tree during the growth of the τY\tau_{Y} function. After updating all of the BART parameters, the rr-th iteration is completed by updating the variance parameters (σm2,σy2)(\sigma_{m}^{2},\sigma_{y}^{2}) with the Gibbs sampler. It is worth noting that the sampling for potential intermediate variables incorporated during the MCMC takes into account the densities for both the YY and MM models. Thus, when sampling MimisM_{i}^{\text{mis}}, it is done conditionally based on the structure of the YY model, which includes the pair of potential intermediate variables. The R code for posterior computation is available at https://github.com/lit777/BPCF.

3.2. Causal Effect Estimation

With RR of posterior samples for each parameter and unobserved potential outcome, the posterior means of the principal causal effects for a specific principal stratum {Si:ϵlower<Mi(1)Mi(0)<ϵupper}\{S_{i}:\epsilon_{\text{lower}}<M_{i}(1)-M_{i}(0)<\epsilon_{\text{upper}}\} can be estimated as follows:

𝔼^(ΔPCE(ϵ)|Data)\displaystyle\hat{\mathbb{E}}(\Delta_{\text{PCE}}(\bm{\epsilon})|\text{Data}) =\displaystyle= r=1Ri=1n[Yi(r)(1)Yi(r)(0)]𝕀(ϵlower<Mi(r)(1)Mi(r)(0)<ϵupper)r=1Ri=1n𝕀(ϵlower<Mi(r)(1)Mi(r)(0)<ϵupper),\displaystyle\frac{\sum_{r=1}^{R}\sum_{i=1}^{n}\left[Y_{i}^{(r)}(1)-Y_{i}^{(r)}(0)\right]\mathbb{I}\left(\epsilon_{\text{lower}}<M_{i}^{(r)}(1)-M_{i}^{(r)}(0)<\epsilon_{\text{upper}}\right)}{\sum_{r=1}^{R}\sum_{i=1}^{n}\mathbb{I}\left(\epsilon_{\text{lower}}<M_{i}^{(r)}(1)-M_{i}^{(r)}(0)<\epsilon_{\text{upper}}\right)},

where (Mi(r)(0),Mi(r)(1))(M_{i}^{(r)}(0),M_{i}^{(r)}(1)) and (Yi(r)(0),Yi(r)(1))(Y_{i}^{(r)}(0),Y_{i}^{(r)}(1)) are the rr-th MCMC samples of potential intermediate and outcome variables, respectively.

Multiple values of ϵ\bm{\epsilon} can be specified to investigate how the effect of AA on YY varies as a function of AA on MM by analyzing the outcome effects within groups categorized based on the strength of the exposure-intermediate relationship. For example, a sequence of ΔPCE(ϵ)\Delta_{\text{PCE}}(\bm{\epsilon}) estimate to increase in magnitude as |ϵupperϵlower||\epsilon_{\text{upper}}-\epsilon_{\text{lower}}| grows would indicate that larger effects on the intermediate are associated with larger impacts on the outcome. We will motivate and explore a series of ΔPCE(ϵ)\Delta_{\text{PCE}}(\bm{\epsilon}) in the context of the analysis of power plant data.

4. Simulation

We examine the model performance based on simulated datasets with N=300N=300 observations. We consider two different scenarios: (Scenario I) seven confounders (X1X7X_{1}-X_{7}) are independently generated from N(0,1)N(0,1). YY, MM, and AA are confounded through X1X7X_{1}-X_{7}; (Scenario II) two confounders (X1X2X_{1}-X_{2}) are independently generated from Unif(0,1)Unif(0,1) and three confounders are independently generated from normal distributions. YY, MM and AA are confounded through X1X5X_{1}-X_{5}. The second scenario represents the targeted selection situation where individuals select into exposure based on a prediction of the unexposed outcome (Hahn et al., 2020). The data generating process and results for Scenario II are in the Appendix. In both scenarios, confounding is strong and effect heterogeneity exists through interactions between AXA-X in the MM model and between ASA-S in the YY model. We generate m=200m=200 simulated data under each scenario. The MCMC chain runs for 10,000 iterations, and the first 5,000 iterations are discarded as burn-in.

The number of trees in our proposed model (denoted by the Bayesian principal causal forest; BPCF) was set to 150 for the prognostic function of the intermediate/outcome models, and 50 for the modifier function in the intermediate/outcome models. The prognostic and modifier functions of the intermediate model contain the same set of covariates, whereas the outcome model’s modifier function contains the principal stratum membership Si=(Mi(0),Mi(1))S_{i}=(M_{i}(0),M_{i}(1)) along with the full set of covariates. A node at depth dd splits with the probability α(1+d)β(α=0.95 and β=2)\alpha(1+d)^{-\beta}(\alpha=0.95\text{ and }\beta=2) for the prognostic function but with the probability α(1+d)β(α=0.25 and β=3)\alpha(1+d)^{-\beta}(\alpha=0.25\text{ and }\beta=3) for the modifier function to shrink more towards homogeneous effects. To compare our model performance to other methods, we consider three competing models: (1) seperate BART models for {M(0),M(1),Y(0),Y(1)}\{M(0),M(1),Y(0),Y(1)\}, which is denoted by ‘BARTpce{}_{\text{pce}}’; (2) Dirichlet process mixture (DPM) model from Schwartz et al. (2011), which is denoted by ‘DPpce{}_{\text{pce}}’; and (3) the proposed BPCF without covariates 𝑿\bm{X} in the modifier function of the outcome model (i.e., the modifier function τY\tau_{Y} only contains SiS_{i}), which is denoted by ‘BPCFm only{}_{\text{m only}}’. The specific forms of the BARTpce{}_{\text{pce}} and DPpce{}_{\text{pce}} models are included in the Appendix.

4.1. Simulation I

For the first simulation scenario, we consider the following data generating process:

Yi\displaystyle Y_{i} =\displaystyle= h1(xi1)+1.5h2(xi2)(Mi(1)Mi(0))2×Ai+2|xi3+1|+2xi4\displaystyle h_{1}(x_{i1})+1.5h_{2}(x_{i2})-(M_{i}(1)-M_{i}(0))^{2}\times A_{i}+2|x_{i3}+1|+2x_{i4}
+exp(0.5xi5)0.5|xi6||xi7+1|+ϵi,\displaystyle+\exp(0.5x_{i5})-0.5|x_{i6}|-|x_{i7}+1|+\epsilon_{i},
Mi\displaystyle M_{i} =\displaystyle= 0.5h1(xi1)+0.5h2(xi2)+2Ai+|xi3+1|+1.5xi4\displaystyle 0.5h_{1}(x_{i1})+0.5h_{2}(x_{i2})+2A_{i}+|x_{i3}+1|+1.5x_{i4}
exp(0.3xi5)+Ai×|xi5|+ψi,\displaystyle-\exp(0.3x_{i5})+A_{i}\times|x_{i5}|+\psi_{i},
𝔼(Ai|𝒙i)\displaystyle\mathbb{E}(A_{i}|\bm{x}_{i}) =\displaystyle= Φ(0.5+h1(xi1)+h2(xi2)0.5|xi31|+1.5xi4×xi5),\displaystyle\Phi\left(0.5+h_{1}(x_{i1})+h_{2}(x_{i2})-0.5|x_{i3}-1|+1.5x_{i4}\times x_{i5}\right),
ϵi\displaystyle\epsilon_{i} \displaystyle\sim N(0,0.32),ψiN(0,0.12),xijN(0,1) for j=1,,7,\displaystyle N(0,0.3^{2}),\,\,\,\psi_{i}\sim N(0,0.1^{2}),\,\,\,x_{ij}\sim N(0,1)\text{ for }j=1,\ldots,7,

where h1(x)=(1)I(x0)h_{1}(x)=(-1)^{I(x\geq 0)} and h2(x)=(1)I(x<0)h_{2}(x)=(-1)^{I(x<0)}. This simulation scenario is developed to evaluate the accuracy of PCE estimation when the YY and MM models are given a complex structure.

In this simulation study, target PCEs were defined based on five different intervals. These intervals were determined using quintiles of observed intermediate values obtained through the data generating process. Specifically, the first interval was (2.00, 2.26), the second interval was (2.26, 2.53), the third interval was (2.53, 2.84), and the fourth interval was (2.84, 3.29), and the last interval was (3.29, 5.09). The results are shown in Table 1. Relative bias and MSE were compared, and the proposed BPCF performed relatively better across all PCEs. In particular, it demonstrated very high accuracy in estimating the causal effect for intermediate variables [Mi(1)Mi(0)][M_{i}(1)-M_{i}(0)], and it can be argued that it showed high performance in estimating PCEs by estimating accurate principal stratum membership based on precise [Mi(1)Mi(0)][M_{i}(1)-M_{i}(0)] estimates.

A causal effect predictiveness surface (Gilbert and Hudgens, 2008) was created to help interpret the results of each model more precisely. Figure 1 depicts the predictiveness surfaces created using four different methods, including true surface plots. The xyx-y plane has two axes representing M(1)M(1) and M(0)M(0) variables, and the zz-axis represents the corresponding Y(1)Y(0)Y(1)-Y(0) value. In terms of the overall response slope, the DP method differed significantly from the actual shape. The dots on the xyx-y plane represent M(1)M(1) and M(0)M(0) sample values from one MCMC iteration. Based on this, it is reasonable to interpret that the BART and BPCF methods form more accurate principal strata than does DP. In particular, it can be observed that the DP method failed to accurately estimate Si=(Mi(1),Mi(0))S_{i}=(M_{i}(1),M_{i}(0)). The reason for this phenomenon occurring is that the data generating model for MM has a complex nonlinear structure that includes the interaction of AA and 𝑿\bm{X}. In contrast, the DP method (based on Schwartz et al. (2011)) is designed to vary only the intercept parameters of M(1)M(1) and M(0)M(0). Extensions of Schwartz et al. (2011) to extend the DP model to additional model parameters might improve performance in a scenario where interactions between AA and 𝑿\bm{X} dominate the data generating model for MM. On the other hand, the BPCFm only{}_{\text{m only}} method relatively accurately estimated the values of SiS_{i}, but the response surface showed deviations from the true values. This can be interpreted as the tree structure shrinking towards a more homogeneous effect when the modifier function only includes SiS_{i} and not all covariates.

Table 1: Simulation results (relative bias (rBias) and mean square error(MSE)) under Scenario I
Category True Effect BPCF BPCFm only{}_{\text{m only}} BARTpce{}_{\text{pce}} DPpce{}_{\text{pce}}
rBias MSE rBias MSE rBias MSE rBias MSE
E[M(1)M(0)]E[M(1)-M(0)] 2.79 -0.002 0.001 0.001 0.001 -0.006 0.001 -0.014 0.0014
E[Y(1)Y(0)| interval 1]E[Y(1)-Y(0)|\text{ interval 1}] -4.54 0.14 0.51 0.40 3.66 0.30 1.94 0.63 11.25
E[Y(1)Y(0)| interval 2]E[Y(1)-Y(0)|\text{ interval 2}] -5.71 0.02 0.09 0.19 1.39 0.09 0.29 0.26 3.84
E[Y(1)Y(0)| interval 3]E[Y(1)-Y(0)|\text{ interval 3}] -7.18 -0.03 0.13 0.04 0.31 -0.05 0.19 -0.02 0.74
E[Y(1)Y(0)| interval 4]E[Y(1)-Y(0)|\text{ interval 4}] -9.28 -0.06 0.40 -0.09 1.16 -0.14 1.86 -0.24 5.39
E[Y(1)Y(0)| interval 5]E[Y(1)-Y(0)|\text{ interval 5}] -14.11 -0.10 2.42 -0.27 15.36 -0.23 10.47 -0.48 48.35
(a) Truth
Refer to caption
(b) BPCF
Refer to caption
(c) BPCFm only{}_{\text{m only}}
Refer to caption
(d) BARTpce{}_{\text{pce}}
Refer to caption
(e) DPpce{}_{\text{pce}}
Refer to caption
Figure 1: [Simulation I] Average surface plots for all combinations of (M(0),M(1))(M(0),M(1)) from four different models: (a) Truth, (b) BPCF, (c) BPCFm only{}_{\text{m only}}, (d) BARTpce{}_{\text{pce}}, and (e) DPpce{}_{\text{pce}}. The points in XYX-Y plane are averages of one set of MCMC samples. As a response surface, the corresponding value of the effect AA on the outcome E[Y(1)Y(0)]E[Y(1)-Y(0)] is plotted.

5. Evaluation of Power Plant Scrubber Impacts on Emissions and Ambient Air Quality

The plan to reduce US SO2  emissions to 1980 levels through the Acid Rain Program established by Title IV of the Clean Air Act achieved its goal primarily by reducing emissions from fossil fuel power plants, or, more formally, electricity-generating units (EGUs). The total estimated annualized human health benefits from ARP range from $50 billion to $100 billion (Chestnut and Mills, 2005), but these estimates are heavily reliant on assumed relationships, particularly those between the program, power plant emissions, and ambient PM2.5. A recent study (Kim et al., 2019) discovered a significant relationship among them using data from 2005. Recent data-driven studies, on the other hand, are scarce.

In Kim et al. (2019), the impact of emission-control technology on ambient air quality was analyzed using a flexible statistical model that considered multiple pollutant emissions. By estimating the direct and indirect effect of scrubbers on PM2.5  a significant associative effect associated with SO2  emissions was determined. It is worth noting that these findings were documented in 2005, and no subsequent studies extending to time periods where this type of pollution was comparatively low have been conducted. Additionally, the study had limited consideration for other factors, such as social and environmental factors, that may influence the selection of a scrubber among various methods for SO2  reduction.

5.1. Data

We collected data on 385 coal fuel power plants located in the eastern United States and operating during 2014, incorporating information from the EPA Air Markets Program Data and the Energy Information Administration. The data encompassed plant characteristics, installed emissions control technologies (if applicable), and emissions of various pollutants including SO2. Additionally, predictions of annual PM2.5  concentrations at the zip code level were obtained (Wei et al., 2022). Our analysis specifically centered around the year 2014. In terms of confounders, weather variables are major factors that can affect the operation of power plants and the concentration of ambient PM2.5. We characterized meteorological conditions at the zip code level by measuring relative humidity, temperature, and precipitation at the zip code’s centroid (Tec et al., 2022). Non-local meteorological information from surrounding areas is also a crucial factor that affects local air quality. Recent work in Tec et al. (2022), proposed the use of self-supervised learned latent representations of non-local meteorological information surrounding each zip code as important confounding variables in studies of air pollution. We use the learned representations of non-local meteorological variables within a 100km radius of each zip code as additional confounders which can be extracted from https://huggingface.co/spaces/mauriciogtec/w2vec-app. Furthermore, the 2010 US Census data provided critical demographic information (i.e., total population) for a potential confounder surrounding each power plant.

Using the centroids of the zip codes, we established connections between each power plant and all zip code locations within a specified radius. This approach enabled us to link the average ambient PM2.5  concentration and the average value of local confounders to each power plant. We used a 50km radius in the main manuscript, with an additional radius (60km) being examined in the Appendix. As of 2014, a power plant was categorized as treated if over half of its EGUs had implemented any control techniques for sulfur dioxide (SO2). Additionally, only power plants with a total heat input greater than 0 were considered, and power plants with missing values for important confounding factors (especially power plant characteristics) were excluded. Table 2 shows the summary statistics for each variable in the two treatment groups. Before presenting the analytical results, MCMC convergence was assessed with plots of the MCMC log-marginal likelihood (Figure S2 in the Appendix).

Table 2: Summary statistics for covariates, intermediate and outcomes available for the analysis. All variables represent annual averages, with Census variables relying on annual data starting from the year 2010. Note that there are also 10 additional learned latent representations of neighboring weather information included in the analysis.
Have SO2  control (n=74) Have no SO2  control (n=87)
Mean SD Mean SD
Air Quality and Weather Data
Ambient PM2.5 (μm\mu\text{m}) 9.15 (1.70) 8.89 (1.64)
Relative Humidity(%) 74.32 (3.88) 73.95 (5.00))
Temperature at 2m(C{}^{\circ}C) 11.86 (3.49) 11.62 (4.10)
Total Precipitation(kg/m2kg/m^{2}) 2.73 (0.52) 2.70 (0.49)
Power Plant Data
Total SO2  Emission (tons) 713.95 (818.80) 1063.70 (1353.82)
Total NOx  Emission (tons; past year) 542.47 (400.97) 365.96 (365.69)
Total CO2  Emission (1000 tons; past year) 704.16 (424.70) 334.89 (381.71)
Average Heat Input (10000 MMBtu) 8139.89 (4825.14) 4446.26 (3870.44)
Phase of Participation in the ARP (Phase I or II) 0.62 (0.48) 0.75 (0.44)
Total Operating Time (hours ×\times # units) 19329.67 (9843.08) 18464.60 (10605.52)
Sulfur Content in Coal (lb/MMBtu) 2.09 (1.23) 0.89 (0.96)
Num. of NOx  Controls (# units) 4.97 (2.93) 4.09 (2.45)
Num. of Units 2.85 (1.49) 3.31 (1.71)
Gross Load (1000MWh) 8455.37 (5042.04) 4603.13 (4111.47)
Capacity (%) 58.22 (11.55) 47.48 (16.68)
Census Data
Population 2010 8604.69 (6612.86) 10077.72 (7377.57)

5.2. Preliminary Analysis

A simple average comparison of the two groups showed that the concentration of ambient PM2.5  was approximately 0.25 (μm/m3\mu\text{m}/\text{m}^{3}) higher within 50 km of the power plants where SO2  control technologies were installed. The average SO2  pollution emissions from the power plants with and without any SO2  control techniques were 713.95 tons and 1063.70 tons, respectively, indicating that the group of power plants without SO2  control technologies emitted more (roughly 349.76 tons more) SO2  pollution. Propensity score estimates were obtained using logistic regression with the confounding variables mentioned in the previous section.

Subsequent estimates are reported as posterior means along with 95% credible intervals. The estimated causal effect of SO2  control strategies on SO2  emissions was -480.09 (-702.55, -378.54) tons, indicating a significant reduction in SO2  emissions. Similarly, the estimated causal effect of control strategies on ambient PM2.5  concentrations was -0.60 (-0.83, -0.37), showing a significant reduction in PM2.5  concentrations within a 50-kilometer radius. These results coincide with the expectation that introducing SO2  reduction technology in power plants would consistently result in reduced SO2  emissions and subsequently lower ambient PM2.5  levels. The principal stratification approach will more thoroughly examine the influence of power plants’ varying response in SO2  emissions on the downstream impacts on ambient PM2.5 .

Notably, Figure 2 indicates that, in both the intermediate and outcome models, there is a positive relationship between propensity score estimates and prognostic function estimates. Particularly for the outcome variable (PM2.5), an increasing relationship between μ\mu and π\pi indicates that targeted selection bias may exist. This serves as one key motivation for the BCF-based approach.

Refer to caption
Figure 2: For the estimated prognostic values μ(𝒙i)\mu(\bm{x}_{i}), the estimated propensity scores π(𝒙i)\pi(\bm{x}_{i}) are plotted. The right plot for PM2.5  concentrations (YY) demonstrates a moderate monotone relationship between the propensity score and the prognostic function.
Refer to caption
Figure 3: Posterior mean estimates of principal causal effects for six different strata. These six strata are defined by intervals 1, 2, 3, 4, 5, and 6. The size of the circle is proportional to the number of plants in that stratum, and numbers listed are posterior mean of the effects, standard deviation and the number of subjects, respectively.

5.3. Principal Causal Effects

Principal strata chosen for summarizing the CEP surface were constructed based on the estimated variability among potential intermediate variables. The posterior predictive mean values for each ii-th Mi(1)Mi(0)M_{i}(1)-M_{i}(0) was estimated, and multiplying the standard deviation of these values by 0.2 and 0.5 resulted in values of 320 and 798, respectively. Using this information, six principal strata were constructed as follows.: interval 1 is (<< -798), interval 2 is [-798, -320), interval 3 is [-320, 0), interval 4 is [0, 320), interval 5 is [320, 798), and interval 6 is (\geq 798). The strata defined by the first three intervals demonstrate how the PM2.5  concentrations vary within units where SO2  emissions decrease due to SO2  control installation. The last three intervals explain the opposite impact of SO2  control techniques on PM2.5  concentrations in situations where SO2  emissions actually increase.

In the case of the principal causal effect for interval 1 (ΔPCE1\Delta_{\text{PCE}1}), 52.3 power plants were assigned to the subgroup on average, and the estimated value of the causal effect was -0.59 (95% CI: -0.84, -0.31). In the case of the principal causal effect for interval 2 (ΔPCE2\Delta_{\text{PCE}2}), 34.9 power plants were assigned to the subgroup on average, and the estimated value of the causal effect was -0.69 (95% CI: -1.05, -0.32). For interval 3(ΔPCE3\Delta_{\text{PCE}3}), 34.2 power plants were assigned to the subgroup on average, and the estimated value of the causal effect was -0.67 (95% CI: -1.13, -0.22). On the other hand, in the case of the principal causal effects for intervals 4-6 (ΔPCE4\Delta_{\text{PCE}4},ΔPCE5\Delta_{\text{PCE}5},ΔPCE6\Delta_{\text{PCE}6}), the groups with increased SO2  emissions had averages of 19.5, 8.8, 11.4 power plants and estimated effects of -0.61 (95% CI: -1.10, -0.11), -0.24 (95% CI: -1.47, 0.51), -0.45 (95% CI: -1.36, 0.57), respectively. The first three PCEs displayed significant effect estimates, with more power plants assigned to those strata. This suggests that the implementation of SO2  control techniques led to a substantial reduction in SO2  emissions and consequently lowered PM2.5  concentrations. Specifically, based on the fact that the highest number of power plants was assigned to the first principal stratum, it can be inferred that a significant number of power plants achieved a reduction in SO2  emissions of 798 tons or more through the installation of SO2  control techniques. In contrast, the last three principal strata, indicating a causal increase in PM2.5  concentration due to the installation of SO2  control, did not show significant PCEs except for interval 4. Moreover, the number of power plants within these strata was observed to be relatively small. This suggests that the installation of SO2  controls tends to not increase SO2  emissions and, furthermore, exhibits little evidence of an impact on PM2.5  concentrations.

These results are well illustrated in Figure 3 and Figure 4. Figure 3 displays the principal causal effects corresponding to intervals 1-6 from left to right. The center point of each circle represents the posterior mean of the principal causal effect, and the vertical line represents the 95% credible interval. The size of each circle represents the average value of subjects within the respective stratum, and the numbers below indicate the posterior mean, posterior standard deviation (SD), and average number of subjects in sequential order. The xx-axis represents the causal effect on SO2  emissions within each interval.

Figure 4 presents the average response surface. The points on the xyx-y axis represent the posterior predictive mean values for Si=(Mi(1),Mi(0))S_{i}=(M_{i}(1),M_{i}(0)). It can be observed that the slope of the response surface generally decreases in regions where M(1)M(1) values are lower than M(0)M(0) values. This indicates that the reduction in SO2  emissions corresponds to a decrease in PM2.5  concentrations, with steeper impacts in areas of the largest SO2  reductions.

Refer to caption
Figure 4: Average surface plot of the causal effect on PM2.5  for different values of (M(0),M(1))(M(0),M(1)) by tons. Posterior predictive values of (M(0),M(1))(M(0),M(1)) are plotted on the xyx-y plane. The corresponding value of the causal effect of SO2  control techniques installation on PM2.5, E[Y(1)Y(0)]E[Y(1)-Y(0)], is plotted on the zz-axis.

5.3.1 Additional Analysis

In the main analysis, each power plant utilized the average PM2.5  concentrations of zip codes within a 50km radius of the zipcode centroid as the outcome variable. However, considering the atmospheric movement, the area directly influenced by each power plant’s air quality may be different. Therefore, an additional analysis was conducted by obtaining the average PM2.5  concentration within a 60km radius and utilizing them as outcome variables. This range is set to prevent excessive overlap between each power plant’s sphere of influence. The appendix provides detailed information on the the result. When comparing these results to those within a 50km radius, it was confirmed that there is not a significant difference between the two sets of results.

6. Discussion

This paper presented a novel principal causal effect estimation method that can investigate the role of intermediate variables in estimating causal effects. To estimate the principal causal effect, which varies with changes in principal stratum membership, intermediate, and outcome variable models were created using the Bayesian causal forest model, which is known to accurately estimate the heterogeneous effect. This resulted in a more detailed characterization of principal strata, which were plugged into the modifier function in the outcome model as an effect modifier. The proposed method provides protection against the evident threat of targeted selection and regularization-induced confounding. The air quality study in Section 5 demonstrated the phenomenon of targeted selection in relation to the effect of SO2  emission reduction technology on PM2.5  concentrations, proving the efficacy of the proposed method.

In this study, a priori estimated propensity scores were utilized as fixed covariates, which means that the uncertainty surrounding the propensity score estimation was not considered in the final analysis. As claimed in Hahn et al. (2020), incorporating propensity score estimates is merely a means to control regularization by transforming covariates. Therefore, a priori estimation is not a major concern. Jointly modeling the propensity score model with the intermediate and outcome models is not a straightforward task, primarily due to the feedback between these models. To address this limitation and avoid the feedback between the propensity score model (i.e., treatment assignment model) and the intermediate/outcome models, a two-step Bayesian approach (Lunn et al., 2009; Zigler, 2016) could be employed. This approach allows for the prevention of feedback by decoupling the models and analyzing them in separate steps.

Further extension research covers a wide range of topics. If there are multiple intermediate variables, the intermediate model can be constructed using a multivariate response, and a multivariate principal stratum can be designed accordingly. In this case, the principal strata data entered the outcome model’s modifier function as multivariate data. Considering the nature of air quality research, several variables (various weather variables, regional/geographical information, social information, etc.) can be considered as potential confounders where it is important to select the confounders that will actually be used. Using a recently proposed method (Kim et al., 2023), it will be possible to connect a common selection probability prior to the prognostic function of each Bayesian causal forest model for the selection of the important confounders. Moreover, the outcome variable, ambient PM2.5 concentrations in the surrounding area, exhibits spatial correlation. Nevertheless, the covariate information employed in our application is anticipated to mitigate this residual spatial correlation. The Moran’s I value (0.12, p-value 0.37), computed using an inverse distance weight matrix, also suggests an absence of remaining spatial dependence. However, in alternative applications, introducing a modification to include a spatial random effect may prove beneficial.

Acknowledgements

The authors gratefully acknowledge that this work is supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (NRF-2020R1F1A1A01048168, NRF-2022R1F1A1062904) and the US National Institutes of Health (R01ES034803 and R01ES035131).

Data Availability Statement

The data that support the findings of this paper are openly available in the GitHub repository at https://github.com/lit777/BPCF.

References

  • Chestnut and Mills (2005) Chestnut, Lauraine G and Mills, David Mauthor (collaboration) (2005), “A fresh look at the benefits and costs of the US acid rain program,” Journal of environmental management, 77, 252–266.
  • Chipman et al. (2010) Chipman, Hugh A and George, Edward I and McCulloch, Robert E— (2010), “BART: Bayesian additive regression trees,” The Annals of Applied Statistics, 4, 266–298.
  • Comment et al. (2019) Comment, Leah and Mealli, Fabrizia and Haneuse, Sebastien and Zigler, Corwin— (2019), “Survivor average causal effects for continuous time: a principal stratification approach to causal inference with semicompeting risks,” arXiv preprint arXiv:1902.09304.
  • Ding and Li (2018) Ding, Peng and Li, Fan— (2018), “Causal inference: A Missing Data Perspective,” Statistical Science, 33, 214–237.
  • Frangakis and Rubin (2002) Frangakis, Constantine E and Rubin, Donald B— (2002), “Principal stratification in causal inference,” Biometrics, 58, 21–29.
  • Frumento et al. (2012) Frumento, Paolo and Mealli, Fabrizia and Pacini, Barbara and Rubin, Donald B— (2012), “Evaluating the effect of training on wages in the presence of noncompliance, nonemployment, and missing outcome data,” Journal of the American Statistical Association, 107, 450–466.
  • Gilbert and Hudgens (2008) Gilbert, Peter B and Hudgens, Michael G— (2008), “Evaluating candidate principal surrogate endpoints,” Biometrics, 64, 1146–1154.
  • Green and Kern (2012) Green, Donald P and Kern, Holger L— (2012), “Modeling heterogeneous treatment effects in survey experiments with Bayesian additive regression trees,” Public opinion quarterly, 76, 491–511.
  • Hahn et al. (2020) Hahn, P Richard and Murray, Jared S and Carvalho, Carlos M— (2020), “Bayesian regression tree models for causal inference: Regularization, confounding, and heterogeneous effects (with discussion),” Bayesian Analysis, 15, 965–1056.
  • Hastie and Tibshirani (2000) Hastie, Trevor and Tibshirani, Robert— (2000), “Bayesian backfitting (with comments and a rejoinder by the authors,” Statistical Science, 15, 196–223.
  • Henneman et al. (2023) Henneman, Lucas and Choirat, Christine and Dedoussi, Irene and Dominici, Francesca and Roberts, Jessica and Zigler, Corwin— (2023), “Mortality risk from United States coal electricity generation,” Science, 382, 941–946.
  • Hill et al. (2020) Hill, Jennifer and Linero, Antonio and Murray, Jared— (2020), “Bayesian additive regression trees: A review and look forward,” Annual Review of Statistics and Its Application, 7.
  • Hill (2011) Hill, Jennifer L— (2011), “Bayesian nonparametric modeling for causal inference,” Journal of Computational and Graphical Statistics, 20, 217–240.
  • Imai et al. (2010) Imai, Kosuke and Keele, Luke and Yamamoto, Teppei— (2010), “Identification, inference and sensitivity analysis for causal mediation effects,” Statistical science, 25, 51–71.
  • Jin and Rubin (2008) Jin, Hui and Rubin, Donald B— (2008), “Principal stratification for causal inference with extended partial compliance,” Journal of the American Statistical Association, 103, 101–111.
  • Kim et al. (2019) Kim, Chanmin and Daniels, Michael J and Hogan, Joseph W and Choirat, Christine and Zigler, Corwin M— (2019), “Bayesian methods for multiple mediators: Relating principal stratification and causal mediation in the analysis of power plant emission controls,” The annals of applied statistics, 13, 1927.
  • Kim et al. (2020) Kim, Chanmin and Henneman, Lucas RF and Choirat, Christine and Zigler, Corwin M— (2020), “Health effects of power plant emissions through ambient air quality,” Journal of the Royal Statistical Society: Series A (Statistics in Society), 183, 1677–1703.
  • Kim et al. (2023) Kim, Chanmin and Tec, Mauricio and Zigler, Corwin— (2023), “Bayesian nonparametric adjustment of confounding,” Biometrics, 79, 3252–3265.
  • Li et al. (2023) Li, Fan and Ding, Peng and Mealli, Fabrizia— (2023), “Bayesian causal inference: a critical review,” Philosophical Transactions of the Royal Society A, 381, 20220153.
  • Lunn et al. (2009) Lunn, David and Best, Nicky and Spiegelhalter, David and Graham, Gordon and Neuenschwander, Beat— (2009), “Combining MCMC with ‘sequential’PKPD modelling,” Journal of Pharmacokinetics and Pharmacodynamics, 36, 19–38.
  • Lyu et al. (2023) Lyu, Tianmeng and Bornkamp, Björn and Mueller-Velten, Guenther and Schmidli, Heinz— (2023), “Bayesian inference for a principal stratum estimand on recurrent events truncated by death,” Biometrics.
  • Mealli and Mattei (2012) Mealli, Fabrizia and Mattei, Alessandra— (2012), “A refreshing account of principal stratification,” The international journal of biostatistics, 8.
  • Miratrix et al. (2018) Miratrix, Luke and Furey, Jane and Feller, Avi and Grindal, Todd and Page, Lindsay C— (2018), “Bounding, an accessible method for estimating principal causal effects, examined and explained,” Journal of Research on Educational Effectiveness, 11, 133–162.
  • Nevo and Gorfine (2022) Nevo, Daniel and Gorfine, Malka— (2022), “Causal inference for semi-competing risks data,” Biostatistics, 23, 1115–1132.
  • Page (2012) Page, Lindsay C— (2012), “Understanding the impact of career academy attendance: An application of the principal stratification framework for causal effects accounting for partial compliance,” Evaluation Review, 36, 99–132.
  • Pearl (2001) Pearl, J.— (2001), “Direct and indirect effects,” in Proceedings of the seventeenth conference on uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc., pp. 411–420.
  • Robins and Greenland (1992) Robins, J. M. and Greenland, S.— (1992), “Identifiability and exchangeability for direct and indirect effects,” Epidemiology, 143–155.
  • Rosenbaum and Rubin (1983) Rosenbaum, Paul R and Rubin, Donald B— (1983), “The central role of the propensity score in observational studies for causal effects,” Biometrika, 70, 41–55.
  • Rubin (1974) Rubin, D. B.— (1974), “Estimating causal effects of treatments in randomized and nonrandomized studies,” Journal of educational Psychology, 66, 688.
  • Rubin (1980) Rubin, Donald B— (1980), “Randomization analysis of experimental data: The Fisher randomization test comment,” Journal of the American statistical association, 75, 591–593.
  • Schwartz et al. (2011) Schwartz, Scott L and Li, Fan and Mealli, Fabrizia— (2011), “A Bayesian semiparametric approach to intermediate variables in causal inference,” Journal of the American Statistical Association, 106, 1331–1344.
  • Tan and Roy (2019) Tan, Yaoyuan Vincent and Roy, Jason— (2019), “Bayesian additive regression trees and the General BART model,” Statistics in medicine, 38, 5048–5069.
  • Tec et al. (2022) Tec, Mauricio and Scott, James and Zigler, Corwin— (2022), “Weather2vec: Representation Learning for Causal Inference with Non-Local Confounding in Air Pollution and Climate Studies,” arXiv preprint arXiv:2209.12316.
  • VanderWeele (2008) VanderWeele, Tyler J— (2008), “Simple relations between principal stratification and direct and indirect effects,” Statistics & Probability Letters, 78, 2957–2962.
  • Wei et al. (2022) Wei, Y. and Xing, X. and Shtein, A. and Casto, E. and Hultquist, C. and Yazdi, M. D. and Li, L. and Schwartz, J.— (2022), “Daily and Annual PM2.5, O3, and NO2 Concentrations at ZIP Codes for the Contiguous U.S., 2000-2016, v1.0,” .
  • Zeldow et al. (2019) Zeldow, Bret and Re III, Vincent Lo and Roy, Jason— (2019), “A semiparametric modeling approach using Bayesian Additive Regression Trees with an application to evaluate heterogeneous treatment effects,” The annals of applied statistics, 13, 1989.
  • Zigler (2016) Zigler, Corwin Matthew— (2016), “The central role of Bayes’ theorem for joint estimation of causal effects and propensity scores,” The American Statistician, 70, 47–54.
  • Zigler et al. (2018) Zigler, Corwin M and Choirat, Christine and Dominici, Francesca— (2018), “Impact of National Ambient Air Quality Standards nonattainment designations on particulate pollution and health,” Epidemiology (Cambridge, Mass.), 29, 165–174.

Supporting Information

Web Appendix A-E referenced in Sections 2, 3, 4 and 5 are available with this paper at the Biometrics website on Wiley Online Library. R code files for the simulation studies and data analysis are available with this paper at the Biometrics website on Wiley Online Library.