Estimating the treatment effect for adherers using multiple imputation

Junxiang Luo
200 Technology Square, Cambridge, MA 02139, USA
Email: junxiang.luo@Modernatx.com
Stephen J. Ruberg
Analytix Thinking, LLC, 11121 Bentgrass Court, Indianapolis, IN 46236, USA
Email: AnalytixThinking@gmail.com
Yongming Qu*
Department of Statistics, Data and Analytics, Eli Lilly and Company, Indianapolis, IN 46285, USA
Email: qu_yongming@lilly.com

Abstract

Randomized controlled trials are considered the gold standard for evaluating the treatment effect (estimand) for efficacy and safety. According to the recent International Council on Harmonisation (ICH) E9 (R1) addendum, intercurrent events (ICEs) need to be considered when defining an estimand, and principal stratification is one of the five strategies for handling ICEs. Qu et al. (2020, Statistics in Biopharmaceutical Research 12:1-18) proposed estimators for the adherer average causal effect (AdACE), the treatment difference for those who adhere to one or both treatments, based on the causal-inference framework, and demonstrated the consistency of those estimators; however, this method requires complex custom programming related to high-dimensional numeric integrations. In this article, we implement the AdACE estimators using multiple imputation (MI) and construct confidence intervals (CIs) through bootstrapping. A simulation study showed that the MI-based estimators were consistent, with CIs attaining the nominal coverage probabilities, for the treatment difference in the adherent populations of interest. As an illustrative example, the new method was applied to data from a real clinical trial comparing 2 types of basal insulin for patients with type 1 diabetes.
Keywords: Adherer average causal effect, counterfactual effect, principal stratum, tripartite estimands.

*Correspondence: Yongming Qu, Department of Statistics, Data and Analytics, Eli Lilly and Company, Lilly Corporate Center, Indianapolis, IN 46285, U.S.A. Email: qu_yongming@lilly.com.

1 Introduction

Choosing and defining estimands and constructing corresponding estimators are integral parts of randomized controlled clinical trials. The International Council on Harmonisation (ICH) provides a general framework for choosing and defining estimands using a few key attributes: treatment of interest, population, handling intercurrent events (ICEs), endpoint, and population-level summary [1]. There are three possible populations in defining estimands: the whole targeted population, a subset of population based on baseline covariates, and a principal stratum defined by post-baseline variables [2, 3]. A principal stratum is a subset of the study population defined by the potential outcome of one or more post-randomization variables [4, 5, 6]. The most used estimands in clinical trials are to estimate the treatment effect for the whole targeted study population. The whole targeted study population in defining estimands is widely used as it theoretically maintains randomization, but the difficulty arises when, as inevitably happens, some patients do not provide complete efficacy response data relevant to the study objective due to intercurrent events. ICH E9 (R1) defines ICEs as “Events occurring after treatment initiation that affect either the interpretation or the existence of the measurements associated with the clinical question of interest. It is necessary to address intercurrent events when describing the clinical question of interest in order to precisely define the treatment effect that is to be estimated.” Deciding which strategy to use in handling ICEs can lead to different estimands. This is central to the development of ICH E9 (R1) which provides a framework and a language for defining the estimand first, and subsequently, an appropriate data handling and analysis approach. As mentioned in ICH E9 (R1), the definition of the estimand should be guided by the study objective. 
Estimating the treatment effect for the whole study population addresses an important question, but it may not be the only or the most important question. Recent research [7, 8, 9] argues for the importance of the treatment effect for the principal stratum of adherers, defined as the population of patients without ICEs. The treatment effect for the whole study population answers the question, “what is the overall expected treatment effect that would be communicated to an individual patient before that patient takes the medication?”, which is an unconditional expectation regardless of adherence status. The treatment effect for the adherers answers the question, “what is the treatment effect in a patient who can adhere to the treatment?”, which is an expectation conditional on adherence status.

The estimation of the treatment effect for a principal stratum has attracted interest for data analysis in clinical trials previously [10, 11, 12, 13, 14]. There are a variety of approaches for constructing estimators of the treatment effect in a principal stratum. Most methods require the monotonicity assumption and/or using the principal score [13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]. Monotonicity basically assumes the potential outcome for the stratification indicator is a monotone function of the treatment indicator. The monotonicity assumption imposes a deterministic relationship on the potential outcomes (random variables) of the principal stratum variable(s). Qu et al. [26] demonstrate the implausibility of such an assumption in many situations from a theoretic perspective and illustrate this implausibility using a real data example in a cross-over study. The principal score is the probability of a subject belonging to the principal stratum, modeled via baseline covariates. As an apparent drawback, methods based on the principal score assume the principal stratum can be fully modeled through baseline covariates. Recently, Louizos et al. [27] proposed methods directly estimating the potential outcome of the response variable under the alternative treatment if the principal stratum can be observed in one treatment group. Qu et al. [8] developed estimators for the adherer average causal effect (AdACE) based on the causal inference framework for the treatment effect for those who can adhere to one or both treatments by modeling the potential outcome of the response variable and/or the principal score via baseline covariates and potential post-baseline intermediate measurements. 
It is important to include post-baseline intermediate measurements since patients and their treating physicians in clinical trials most often make decisions about whether or not to adhere to the randomized study treatment based on their efficacy and safety responses to the assigned treatment. Details on the implementation of estimators for the AdACE are given in Qu et al. [9]. Barriers to a wide application of these estimators include the complex estimation process, the time-consuming computation, and the requirement of customized programs for individual clinical trials.

Multiple imputation (MI), proposed by Rubin [28], is widely used in handling missing values and could be an alternative approach to construct estimators for the AdACE. In this article, we propose using MI to construct the estimators for the AdACE. The advantage of this approach is that, once the potential outcomes are imputed, existing estimation procedures for “complete” data can be used to estimate the treatment effect in the stratum of interest. In Section 2, we will review the theoretical framework and outline the process of the MI-based estimators for adherers. The definition of adherence may be study specific, but generally we consider a patient to be adherent if the patient predominantly takes their estimand-defined study treatment (e.g., no intercurrent events) throughout the intended duration of the trial. As we are interested in estimating the potential outcome under the intended treatment regimen for adherers, the (potential) outcomes under a treatment regimen deviating from the randomized treatment (e.g., treatment discontinuation or treatment switch) are not relevant in our estimation. In Section 3, simulations are conducted to evaluate the performance of the MI-based estimator. In Section 4, the application of the MI-based estimator is illustrated with a real clinical study. Finally, Section 5 serves as the summary and discussion.

2 Methods

Let $(\mbox{\boldmath$X$}_{ij},\mbox{\boldmath$Z$}_{ij},Y_{ij},\mbox{\boldmath$I$}_{ij})$ denote the data for assigned treatment $i\;(i=0,1)$ and subject $j\;(1\leq j\leq n_{i})$ , where $\mbox{\boldmath$X$}_{ij}$ is a vector of baseline covariates, $\mbox{\boldmath$Z$}_{ij}=(Z_{ij}^{(1)},Z_{ij}^{(2)},\ldots,Z_{ij}^{(K-1)})^{\prime}$ is a vector of intermediate repeated measurements, $\mbox{\boldmath$I$}_{ij}=(I_{ij}^{(1)},I_{ij}^{(2)},\ldots,I_{ij}^{(K-1)})^{\prime}$ , and $I_{ij}^{(k)}$ ( $1\leq k\leq K-1$ ) is the indicator variable for whether a patient is adherent to treatment after intermediate time point $k$ . Note $Z$ can be intermediate measurements of the same variable as $Y$ , or intermediate outcomes of other ancillary variables, or include both. Then, the adherence indicator for the study treatment is given by:

A_{ij}=\prod_{k=1}^{K-1}I_{ij}^{(k)}.

We use “ $(t)$ ” following the variable name to denote the potential outcome under the hypothetical treatment $t\;(t=0,1)$ . For example, $Y_{ij}(t)$ denotes the potential outcome for subject $j$ randomized to treatment $i$ if taking treatment $t$ . Generally, $Y_{ij}(i)$ can be observed but $Y_{ij}(1-i)$ cannot be observed in parallel studies.

Four principal strata were discussed in Qu et al. [8]:

$\bullet$ The whole study population: $S_{**}=\{(i,j):A_{ij}(0)\in\{0,1\},A_{ij}(1)\in\{0,1\}\}$ .
$\bullet$ Patients who can adhere to the experimental treatment: $S_{*+}=\{(i,j):A_{ij}(1)=1\}$ .
$\bullet$ Patients who can adhere to the control treatment: $S_{+*}=\{(i,j):A_{ij}(0)=1\}$ .
$\bullet$ Patients who can adhere to both treatments: $S_{++}=\{(i,j):A_{ij}(0)=1,A_{ij}(1)=1\}$ .
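As a minimal illustration of these definitions (a sketch, not the authors' SAS implementation), the adherence indicator and stratum membership can be computed as follows once both potential adherence values are available for a patient, for example after imputation. The function names `adherence` and `strata` are hypothetical.

```python
from math import prod


def adherence(indicators):
    """A = product of the per-visit indicators I^(1), ..., I^(K-1)."""
    return prod(indicators)


def strata(a0, a1):
    """Labels of the principal strata a patient belongs to, given the
    potential adherence a0 = A(0) and a1 = A(1)."""
    labels = ["S**"]  # every patient is in the whole study population
    if a1 == 1:
        labels.append("S*+")  # can adhere to the experimental treatment
    if a0 == 1:
        labels.append("S+*")  # can adhere to the control treatment
    if a0 == 1 and a1 == 1:
        labels.append("S++")  # can adhere to both treatments
    return labels
```

In a parallel study only one of `a0` and `a1` is observed for a given patient, so the other must come from a model.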

Qu et al. [8] provide estimators for $S_{*+}$ , $S_{+*}$ , and $S_{++}$ under the following assumptions:

A1: $Y=Y(1)T+Y(0)(1-T)$
A2: $Z=Z(1)T+Z(0)(1-T)$
A3: $A=A(1)T+A(0)(1-T)$
A4: $T\perp\{Y(1),A(1),Z(1),Y(0),A(0),Z(0)\}|X$
A5: $A(i)\perp\{Y(1),Y(0),Z(1-i)\}|\{X,Z(i)\},\quad\forall i=0,1$
A6: $Y(i)\perp Z(1-i)|\{X,Z(i)\},\quad\forall i=0,1$
A7: $Z(0)\perp Z(1)|X$

The estimators provided in Qu et al. [8] were rather complex; however, the idea is to estimate the potential response $Y_{ij}(1-i)$ and/or the potential adherence status $A_{ij}(1-i)$ under the alternative treatment.

Alternatively, estimators can be obtained naturally with MI. For a subject $j$ in assigned treatment $i$ , the potential outcomes $Y_{ij}(1-i)$ , $\mbox{\boldmath$Z$}_{ij}(1-i)$ , and $\mbox{\boldmath$I$}_{ij}(1-i)$ under the alternative treatment $1-i$ are unobserved and can be imputed using a model estimated from all data from treatment $1-i$ , i.e., $\{(\mbox{\boldmath$X$}_{1-i,j},\mbox{\boldmath$Z$}_{1-i,j},Y_{1-i,j},\mbox{\boldmath$I$}_{1-i,j}):1\leq j\leq n_{1-i}\}$ , and the subject’s own baseline value $\mbox{\boldmath$X$}_{ij}$ . Let $\{(\mbox{\boldmath$Z$}_{ij}(1-i)^{(m)},Y_{ij}(1-i)^{(m)},\mbox{\boldmath$I$}_{ij}(1-i)^{(m)}),1\leq m\leq M\}$ be the $M$ imputed values for the potential outcomes under treatment $1-i$ for subjects assigned to treatment $i$ . Note that the unobserved outcomes under treatment $i$ for those assigned to treatment $i$ (due to random missingness or non-adherence) can also be imputed simultaneously, denoted by $\{(\mbox{\boldmath$Z$}_{ij}(i)^{(m)},Y_{ij}(i)^{(m)}),1\leq m\leq M\}$ . The potential adherence indicator under the alternative treatment based on the imputed values is calculated as:

A_{ij}(1-i)^{(m)}=\prod_{k=1}^{K-1}I_{ij}(1-i)^{(k,m)}.

Table 1: Illustration of MI to impute the potential outcome under treatment $T=t$ for patients assigned to treatment $T=1-t$

Subject	Randomized Treatment	$X$	$Z^{(1)}$	$Z^{(2)}$	$Z^{(3)}$	$I^{(1)}$	$I^{(2)}$	$I^{(3)}$	$Y$
001	$t$	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$	1	1	1	$\checkmark$
002	$t$	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$	1	1	0	$\cdot$
003	$t$	$\checkmark$	$\checkmark$	$\checkmark$	$\cdot$	1	0	0	$\cdot$
$\cdots$
101	$1-t$	$\checkmark$	$\cdot$	$\cdot$	$\cdot$	$\cdot$	$\cdot$	$\cdot$	$\cdot$
102	$1-t$	$\checkmark$	$\cdot$	$\cdot$	$\cdot$	$\cdot$	$\cdot$	$\cdot$	$\cdot$
103	$1-t$	$\checkmark$	$\cdot$	$\cdot$	$\cdot$	$\cdot$	$\cdot$	$\cdot$	$\cdot$
Abbreviations: MI, multiple imputation; “ $\checkmark$ ”, observed data; “ $\cdot$ ”, unobserved data.

After imputations, for each patient the (potential) outcome $Y$ and adherence $A$ under both treatments are available. Then, the mean response for each treatment can be calculated by simply taking the average of the (potential) outcome of $Y$ for the (potential) adherers. The estimation of the treatment effect for the whole study population using MI has been extensively studied in the literature [29], so we will not discuss it here. Essentially, the estimators for $S_{*+}$ and $S_{+*}$ can be constructed in the exact same way by symmetry. Therefore, we only provide the estimators for the mean response in each treatment on populations $S_{*+}$ and $S_{++}$ (Table 2), which are most relevant for placebo-controlled trials and active-comparator trials, respectively, as argued by Qu et al. [8]. For each treatment, the estimator can be constructed using patients randomly assigned to treatment $T=0$ , $T=1$ , and all patients. Generally, it is preferable to use all patients in constructing the estimators as this most closely represents the study population. The treatment difference can be calculated easily based on the estimators for individual treatments.

Table 2: Estimators for the mean response on principal strata defined by treatment adherence

PS	Treatment	Patient	Estimator
		$E_{0}$	$\frac{1}{M}\sum\limits_{m=1}^{M}\left\{\frac{\sum_{j=1}^{n_{0}}A_{0j}(1)^{(m)}\left(A_{0j}Y_{0j}+(1-A_{0j})Y_{0j}(0)^{(m)}\right)}{\sum_{j=1}^{n_{0}}A_{0j}(1)^{(m)}}\right\}$
	$T=0$	$E_{1}$	$\frac{1}{M}\sum\limits_{m=1}^{M}\left\{\frac{\sum_{j=1}^{n_{1}}A_{1j}Y_{1j}(0)^{(m)}}{\sum_{j=1}^{n_{1}}A_{1j}}\right\}$
$S_{*+}$		$E_{0}\cup E_{1}$	$\frac{1}{M}\sum\limits_{m=1}^{M}\left\{\frac{\sum\limits_{j=1}^{n_{0}}A_{0j}(1)^{(m)}\left(A_{0j}Y_{0j}+(1-A_{0j})Y_{0j}(0)^{(m)}\right)+\sum\limits_{j=1}^{n_{1}}A_{1j}Y_{1j}(0)^{(m)}}{\sum\limits_{j=1}^{n_{0}}A_{0j}(1)^{(m)}+\sum\limits_{j=1}^{n_{1}}A_{1j}}\right\}$
		$E_{0}$	$\frac{1}{M}\sum\limits_{m=1}^{M}\left\{\frac{\sum_{j=1}^{n_{0}}A_{0j}(1)^{(m)}Y_{0j}(1)^{(m)}}{\sum_{j=1}^{n_{0}}A_{0j}(1)^{(m)}}\right\}$
	$T=1$	$E_{1}$	$\frac{\sum_{j=1}^{n_{1}}A_{1j}Y_{1j}}{\sum_{j=1}^{n_{1}}A_{1j}}$
		$E_{0}\cup E_{1}$	$\frac{1}{M}\sum\limits_{m=1}^{M}\left\{\frac{\sum_{j=1}^{n_{0}}A_{0j}(1)^{(m)}Y_{0j}(1)^{(m)}+\sum_{j=1}^{n_{1}}A_{1j}Y_{1j}}{\sum_{j=1}^{n_{0}}A_{0j}(1)^{(m)}+\sum_{j=1}^{n_{1}}A_{1j}}\right\}$
		$E_{0}$	$\frac{1}{M}\sum\limits_{m=1}^{M}\left\{\frac{\sum_{j=1}^{n_{0}}A_{0j}A_{0j}(1)^{(m)}Y_{0j}}{\sum_{j=1}^{n_{0}}A_{0j}A_{0j}(1)^{(m)}}\right\}$
	$T=0$	$E_{1}$	$\frac{1}{M}\sum_{m=1}^{M}\left\{\frac{\sum_{j=1}^{n_{1}}A_{1j}A_{1j}(0)^{(m)}Y_{1j}(0)^{(m)}}{\sum_{j=1}^{n_{1}}A_{1j}A_{1j}(0)^{(m)}}\right\}$
		$E_{0}\cup E_{1}$	$\frac{1}{M}\sum\limits_{m=1}^{M}\left\{\frac{\sum_{j=1}^{n_{0}}A_{0j}A_{0j}(1)^{(m)}Y_{0j}+\sum_{j=1}^{n_{1}}A_{1j}A_{1j}(0)^{(m)}Y_{1j}(0)^{(m)}}{\sum_{j=1}^{n_{0}}A_{0j}A_{0j}(1)^{(m)}+\sum_{j=1}^{n_{1}}A_{1j}A_{1j}(0)^{(m)}}\right\}$
$S_{++}$		$E_{0}$	$\frac{1}{M}\sum\limits_{m=1}^{M}\left\{\frac{\sum_{j=1}^{n_{0}}A_{0j}A_{0j}(1)^{(m)}Y_{0j}(1)^{(m)}}{\sum_{j=1}^{n_{0}}A_{0j}A_{0j}(1)^{(m)}}\right\}$
	$T=1$	$E_{1}$	$\frac{1}{M}\sum\limits_{m=1}^{M}\left\{\frac{\sum_{j=1}^{n_{1}}A_{1j}A_{1j}(0)^{(m)}Y_{1j}}{\sum_{j=1}^{n_{1}}A_{1j}A_{1j}(0)^{(m)}}\right\}$
		$E_{0}\cup E_{1}$	$\frac{1}{M}\sum\limits_{m=1}^{M}\left\{\frac{\sum_{j=1}^{n_{0}}A_{0j}A_{0j}(1)^{(m)}Y_{0j}(1)^{(m)}+\sum_{j=1}^{n_{1}}A_{1j}A_{1j}(0)^{(m)}Y_{1j}}{\sum_{j=1}^{n_{0}}A_{0j}A_{0j}(1)^{(m)}+\sum_{j=1}^{n_{1}}A_{1j}A_{1j}(0)^{(m)}}\right\}$
Abbreviation: PS, principal stratum; $E_{i}(i=0,1)$ is the subset of patients randomized to treatment $i$ .
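The $E_{0}\cup E_{1}$ rows of Table 2 reduce to a simple calculation once every patient has a completed record. The sketch below is illustrative only and is not the SAS program from the supplemental material. It assumes each completed data set stores, per patient, the (observed or imputed) adherence `a0`, `a1` and outcome `y0`, `y1` under both treatments; these field names and the function `stratum_means` are hypothetical.

```python
def stratum_means(imputations, stratum):
    """Average, over M completed data sets, of the within-imputation mean
    outcome under T=0 and T=1 among (potential) adherers.

    imputations: list of M completed data sets, each a list of dicts with
        a0, a1: (potential) adherence under T=0 and T=1, coded 0/1
        y0, y1: (potential) outcome under T=0 and T=1; values for patients
                outside the stratum are never read and may be None
    stratum: "*+" (adherent under T=1) or "++" (adherent under both)
    Returns (mean under T=0, mean under T=1, difference T=1 minus T=0).
    """
    if stratum not in ("*+", "++"):
        raise ValueError("stratum must be '*+' or '++'")
    mu0 = mu1 = 0.0
    for completed in imputations:
        if stratum == "*+":
            members = [p for p in completed if p["a1"] == 1]
        else:
            members = [p for p in completed if p["a0"] == 1 and p["a1"] == 1]
        # Ratio estimator within one imputation (the braces in Table 2).
        mu0 += sum(p["y0"] for p in members) / len(members)
        mu1 += sum(p["y1"] for p in members) / len(members)
    m = len(imputations)
    return mu0 / m, mu1 / m, (mu1 - mu0) / m
```

The sketch does not guard against an imputation in which no patient falls in the stratum; a production implementation would need to handle that case.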

The SAS program implementing the MI-based AdACE estimator is provided in the supplemental material.

3 Simulations

In this section, we consider a two-arm, parallel, randomized trial in diabetes with the simulation settings as described in Qu et al. [8]. The simulated data are denoted by $(X_{j},\mbox{\boldmath$Z$}_{j},Y_{j},\mbox{\boldmath$I$}_{j})$ for subject $j$ , where $Y_{j}$ is the primary outcome of HbA1c at Week 24 of treatment, $X_{j}$ is baseline HbA1c, $\mbox{\boldmath$Z$}_{j}=(Z_{j}^{(1)},Z_{j}^{(2)},Z_{j}^{(3)})^{\prime}$ is a vector of intermediate repeated measurements of HbA1c reading at Weeks 6, 12, and 18, and $\mbox{\boldmath$I$}_{j}=(I_{j}^{(1)},I_{j}^{(2)},I_{j}^{(3)})^{\prime}$ denotes the adherence to treatment after Weeks 6, 12, and 18, respectively. The data are simulated for treatments $T=0,1$ , respectively.

The baseline value $X_{j}$ , intermediate readings $\mbox{\boldmath$Z$}_{j}$ , and primary outcome $Y_{j}$ are generated by:

X_{j}\sim NID(\mu_{x},\sigma_{x}^{2}),\qquad(1)

Z_{j}^{(k)}=\alpha_{0k}+\alpha_{1k}X_{j}+\alpha_{2k}T_{j}+\eta_{j}^{(k)},\quad 1\leq k\leq 3,\qquad(2)

and:

Y_{j}=\beta_{0}+\beta_{1}X_{j}+\beta_{2}T_{j}+\sum_{k=1}^{3}\beta_{3k}Z_{j}^{(k)}+\epsilon_{j},\qquad(3)

where NID means normally independently distributed, $\eta_{j}^{(k)}\sim NID(0,\sigma_{\eta}^{2})$ and $\epsilon_{j}\sim NID(0,\sigma_{\epsilon}^{2})$ , and $\eta_{j}^{(k)}$ ’s and $\epsilon_{j}$ are independent.

We assume patients can drop out of the study after the collection of clinical data at time point $k$ . The adherence indicator after time point $k(1\leq k\leq 3)$ is generated from a logistic model:

{\mathrm{logit}}\{\Pr(I_{j}^{(k)}=1|I_{j}^{(k-1)}=1,X_{j},Z_{j}^{(k)})\}=\gamma_{0}+\gamma_{1}X_{j}+\gamma_{3k}Z_{j}^{(k)},\qquad(4)

where ${\mathrm{logit}}(p)=\log(p/(1-p))$ , and by convention we set $I_{j}^{(0)}=1$ . If the adherence indicator at any time point is 0, then the data after this time point are set to be missing.

To mimic the response and treatment adherence rates in a real clinical trial of anti-diabetes treatments, we consider two simulation settings with the following parameters: $\mu_{x}=8.0$ , $\sigma_{x}=1.0$ , $\alpha_{01}=\alpha_{02}=\alpha_{03}=2.3$ , $\alpha_{11}=\alpha_{12}=\alpha_{13}=-0.3$ , $\alpha_{21}=-0.4$ , $\alpha_{22}=-0.9$ , $\alpha_{23}=-1.2$ , $\beta_{0}=0.2$ , $\beta_{1}=-0.02$ , $\beta_{2}=-0.2$ , $\beta_{31}=0.2$ , $\beta_{32}=0.4$ , $\beta_{33}=0.7$ , $\sigma_{\eta}=0.4$ , $\sigma_{\epsilon}=0.3$ , $\gamma_{0}=3$ , $\gamma_{1}=-0.1$ (Setting 1) or $-0.25$ (Setting 2), $\gamma_{31}=-1$ , $\gamma_{32}=-2$ , $\gamma_{33}=-2.5$ , and $j=1,\ldots,150$ .
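The data-generating model in Equations (1) to (4) can be sketched in a few lines. This is an illustrative reimplementation using only the Python standard library, not the authors' SAS simulation code. It assumes that 150 is the number of patients per arm and that $Z^{(k)}$ is recorded only when the patient was still adherent after visit $k-1$, which is how we read Table 1; the function and field names are hypothetical.

```python
import math
import random

# Parameter values from the simulation settings in the text.
ALPHA0 = [2.3, 2.3, 2.3]
ALPHA1 = [-0.3, -0.3, -0.3]
ALPHA2 = [-0.4, -0.9, -1.2]
BETA0, BETA1, BETA2 = 0.2, -0.02, -0.2
BETA3 = [0.2, 0.4, 0.7]
GAMMA0 = 3.0
GAMMA3 = [-1.0, -2.0, -2.5]


def simulate_patient(t, gamma1, rng):
    """Generate one patient under treatment t (0 or 1)."""
    x = rng.gauss(8.0, 1.0)                                   # Equation (1)
    z = [ALPHA0[k] + ALPHA1[k] * x + ALPHA2[k] * t + rng.gauss(0.0, 0.4)
         for k in range(3)]                                   # Equation (2)
    y = (BETA0 + BETA1 * x + BETA2 * t
         + sum(b * zk for b, zk in zip(BETA3, z))
         + rng.gauss(0.0, 0.3))                               # Equation (3)
    indicators, adherent = [], 1
    for k in range(3):                                        # Equation (4)
        if adherent:
            eta = GAMMA0 + gamma1 * x + GAMMA3[k] * z[k]
            adherent = int(rng.random() < 1.0 / (1.0 + math.exp(-eta)))
        indicators.append(adherent)
    # Data collected after the first non-adherent time point are missing.
    z_obs = [z[0]] + [z[k] if indicators[k - 1] else None for k in (1, 2)]
    y_obs = y if indicators[2] else None
    return {"t": t, "x": x, "z": z_obs, "i": indicators, "y": y_obs}


def simulate_trial(gamma1, n_per_arm=150, seed=0):
    """Simulate both arms; gamma1 = -0.1 (Setting 1) or -0.25 (Setting 2)."""
    rng = random.Random(seed)
    return [simulate_patient(t, gamma1, rng)
            for t in (0, 1) for _ in range(n_per_arm)]
```

Calling `simulate_trial(-0.1)` returns one simulated data set for Setting 1 as a list of patient records.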

To impute the potential outcomes $(\mbox{\boldmath$Z$}_{ij}(1-i),Y_{ij}(1-i),\mbox{\boldmath$I$}_{ij}(1-i))$ under treatment $(1-i)$ for patient $j$ assigned to treatment group $i$ , we need to use the data of $X_{ij}$ and $(X_{1-i,j},\mbox{\boldmath$Z$}_{1-i,j},Y_{1-i,j},\mbox{\boldmath$I$}_{1-i,j})$ . We first create missing records for $(\mbox{\boldmath$Z$}_{ij}(1-i),Y_{ij}(1-i),\mbox{\boldmath$I$}_{ij}(1-i))$ and then apply an MI procedure to impute these “missing” values. Note that the truly missing values in treatment group $(1-i)$ resulting from dropout are also imputed simultaneously.

The following three steps are then implemented to impute the data: 1) use regression models to impute $\mbox{\boldmath$Z$}_{ij}(1-i)|X_{ij}$ based on the relationship between $\mbox{\boldmath$Z$}_{1-i,j}$ and $X_{1-i,j}$ , 2) impute $Y_{ij}(1-i)|(X_{ij},\mbox{\boldmath$Z$}_{ij}(1-i))$ per the regression model of $Y_{1-i,j}\sim X_{1-i,j}+\mbox{\boldmath$Z$}_{1-i,j}$ , and 3) impute $\mbox{\boldmath$I$}_{ij}(1-i)|(X_{ij},\mbox{\boldmath$Z$}_{ij}(1-i))$ based on the relationship between $\mbox{\boldmath$I$}_{1-i,j}$ and $(X_{1-i,j},\mbox{\boldmath$Z$}_{1-i,j})$ through multiple logistic regressions. These three steps can be easily implemented with the SAS MI procedure (PROC MI). SAS code for the simulation is provided as a supplementary document.
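To make step 1 concrete, the sketch below imputes a single intermediate measurement for patients in one arm from a regression fitted in the other arm. It is a simplified stand-in written for illustration, not a reproduction of PROC MI. It assumes one baseline covariate, a normal linear model, and an approximate posterior draw of the regression parameters under a noninformative prior; the function name `impute_from_other_arm` is hypothetical. Steps 2 and 3 would follow the same pattern with additional predictors and with logistic models for the adherence indicators.

```python
import random


def impute_from_other_arm(x_donor, z_donor, x_target, rng):
    """Draw one set of imputed Z values for the target patients.

    x_donor, z_donor: baseline covariate and observed Z in arm 1-i
    x_target: baseline covariate of patients in arm i, whose Z(1-i) is
              unobserved
    rng: a random.Random instance
    """
    n = len(x_donor)
    x_bar = sum(x_donor) / n
    z_bar = sum(z_donor) / n
    sxx = sum((x - x_bar) ** 2 for x in x_donor)
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in zip(x_donor, z_donor))
    slope_hat = sxz / sxx
    sse = sum((z - z_bar - slope_hat * (x - x_bar)) ** 2
              for x, z in zip(x_donor, z_donor))
    # Residual variance draw: SSE divided by a chi-square(n - 2) variate,
    # generated as Gamma(shape=(n - 2) / 2, scale=2).
    sigma2 = sse / rng.gammavariate((n - 2) / 2.0, 2.0)
    # With x centered, intercept and slope are independent given sigma2.
    intercept = rng.gauss(z_bar, (sigma2 / n) ** 0.5)
    slope = rng.gauss(slope_hat, (sigma2 / sxx) ** 0.5)
    # Imputed value = drawn regression line plus a fresh residual.
    return [intercept + slope * (x - x_bar) + rng.gauss(0.0, sigma2 ** 0.5)
            for x in x_target]
```

Calling the function $M$ times with the same inputs yields $M$ imputations, each based on a fresh parameter draw, so that the imputations reflect parameter uncertainty as well as residual noise.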

Table 3: Summary of the simulation results for the estimators of treatment effect in two populations of adherers (based on 3,000 simulated samples)

Abbreviations: CP, coverage probability of the 95% confidence interval; SE, standard error; $\mu_{0,*+}$ , population mean for the control group for $S_{*+}$ ; $\mu_{1,*+}$ , population mean for the treatment group for $S_{*+}$ ; $\mu_{d,*+}$ , population mean for the treatment difference between treatment and control groups for $S_{*+}$ ; $\mu_{0,++}$ , population mean for the control group for $S_{++}$ ; $\mu_{1,++}$ , population mean for the treatment group for $S_{++}$ ; $\mu_{d,++}$ , population mean for the treatment difference between treatment and control groups for $S_{++}$ .
Setting	Parameter	True Value	Estimate	Bias	Bootstrap SE	Bootstrap CP	Rubin SE	Rubin CP
1	$\mu_{0,*+}$	-0.102	-0.102	-0.001	0.049	0.951	0.052	0.965
1	$\mu_{1,*+}$	-1.588	-1.587	0.001	0.046	0.940	0.047	0.950
1	$\mu_{d,*+}$	-1.487	-1.485	0.002	0.057	0.944	0.060	0.962
1	$\mu_{0,++}$	-0.192	-0.191	0.001	0.052	0.949	0.058	0.971
1	$\mu_{1,++}$	-1.638	-1.640	-0.002	0.050	0.944	0.058	0.979
1	$\mu_{d,++}$	-1.446	-1.449	-0.003	0.057	0.945	0.065	0.973
2	$\mu_{0,*+}$	-0.107	-0.108	-0.001	0.063	0.939	0.069	0.962
2	$\mu_{1,*+}$	-1.606	-1.601	0.004	0.052	0.937	0.053	0.948
2	$\mu_{d,*+}$	-1.499	-1.494	0.006	0.069	0.941	0.075	0.961
2	$\mu_{0,++}$	-0.272	-0.263	0.009	0.071	0.939	0.088	0.980
2	$\mu_{1,++}$	-1.679	-1.678	0.000	0.064	0.941	0.088	0.990
2	$\mu_{d,++}$	-1.406	-1.415	-0.009	0.069	0.944	0.095	0.990

With imputed data, the estimators can easily be calculated through equations provided in Table 2. The true mean responses for each treatment and the treatment difference for $S_{*+}$ and $S_{++}$ can be calculated by numerical integration as described in Qu et al. [8]. To adjust for baseline covariates, the set of $E_{0}\cup E_{1}$ (all patients) is used in the calculation, where $E_{i}$ is the set of patients randomized to treatment $i$ . The point estimates can be obtained by averaging the mean estimates from multiple imputed samples. One method to estimate the variance is to combine the within- and between-imputation variability by Rubin [30] and Barnard and Rubin [31]; it has been reported that these methods may provide conservative coverage probability [32, 33, 34]. Our simulations also demonstrate that the variability achieved through the methods of Barnard and Rubin [31] is too conservative and the coverage probability of the confidence interval is larger than its nominal value. Therefore, we also use a bootstrap method to estimate the variance of the estimator. The bootstrap samples were first created and the imputation procedure was applied to the bootstrap samples. More discussions on various bootstrap methods can be found in Bartlett and Hughes [34, 35].
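For reference, Rubin's combining rules take the following standard form. The symbols $\hat{\theta}^{(m)}$ and $U^{(m)}$, denoting the estimate and its estimated variance from the $m$-th completed data set, are notation introduced here for illustration and do not appear in the original text.

```latex
\bar{\theta} = \frac{1}{M}\sum_{m=1}^{M}\hat{\theta}^{(m)}, \qquad
\bar{U} = \frac{1}{M}\sum_{m=1}^{M}U^{(m)}, \qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{\theta}^{(m)}-\bar{\theta}\right)^{2},
\qquad
T = \bar{U} + \left(1+\frac{1}{M}\right)B .
```

Here $\bar{U}$ is the within-imputation variance, $B$ the between-imputation variance, and $T$ the total variance used for interval construction; the Barnard and Rubin adjustment additionally modifies the degrees of freedom of the reference $t$ distribution.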

Table 3 shows the simulation results based on 3,000 simulated samples. For each simulated sample, 200 imputations were generated using the multiple imputation procedure described earlier. For the bootstrap approach, 50 bootstrap samples were generated to estimate the variance, and 95% confidence intervals were then calculated based on a normal approximation, which is appropriate in the simulation since the simulated data are generated from a normal distribution. The percentile bootstrap was not used for constructing the confidence interval because it requires a large number of bootstrap samples and is very time consuming. The estimates under both settings have little bias, and the empirical coverage probability of the 95% confidence intervals is close to the nominal level with the bootstrap approach. Comparing the two settings, Setting 2, which has lower adherence than Setting 1, has slightly lower but acceptable coverage. The 95% confidence interval based on Rubin’s method has much higher coverage probability than the nominal level of 0.95. The much wider confidence interval of Rubin’s method is likely due to the uncongeniality between the analysis model and the imputation model [34, 36, 37].
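The bootstrap-then-impute interval described above can be sketched generically. In this illustration, `estimator` is a hypothetical callable that takes a list of patient records and a random number generator, runs the imputation internally, and returns a single estimate. Resampling all patients together, rather than within each arm, is an assumption made here; the text does not specify which scheme was used.

```python
import random
import statistics


def bootstrap_normal_ci(records, estimator, n_boot=50, seed=0,
                        z_crit=1.959964):
    """Point estimate, bootstrap SE, and normal-approximation 95% CI.

    records: list of patient-level records
    estimator: callable(records, rng) -> float; expected to perform the
               multiple imputation on the data it is given
    """
    rng = random.Random(seed)
    point = estimator(records, rng)
    boot_estimates = []
    for _ in range(n_boot):
        # Resample patients with replacement, then impute and estimate.
        resample = rng.choices(records, k=len(records))
        boot_estimates.append(estimator(resample, rng))
    se = statistics.stdev(boot_estimates)
    return point, se, (point - z_crit * se, point + z_crit * se)
```

Because only the standard deviation of the bootstrap estimates is needed, a modest number of bootstrap samples can suffice, whereas a percentile interval would require many more to estimate the tail quantiles.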

We also compared the performance of the proposed method with that of the previously published principal score method, in which the potential outcome and adherence are modeled through baseline covariates only [16, 19]. For comparability, the principal score method was also implemented using MI. With the same simulation settings, Table 4 shows that the estimates from the principal score method were more biased than those from the proposed method, which utilizes postbaseline intermediate outcomes. For the principal score method, the bias in the estimator for $\mu_{d,++}$ is very small (0.002 and 0.003 in Settings 1 and 2, respectively), whereas the bias in the estimator for $\mu_{d,*+}$ is relatively large (0.045 and 0.099, respectively).

In addition, the type 1 error for the estimate of treatment difference with the proposed method was assessed by updating $\alpha_{2k}=0$ and $\beta_{2}=0$ without changing other parameters in the two simulation settings. Under this scenario (null scenario), the two treatment groups have the same distributions in all variables ( $X$ , $Z$ , $Y$ and $A$ ). Under the null scenario, the true treatment difference $\mu_{d,++}=0$ for the principal stratum $S_{++}$ . The simulation showed that the rejection rate for the null hypothesis is 0.0680 and 0.0683 for Setting 1 and Setting 2, respectively. For principal stratum $S_{*+}$ , the treatment effect $\mu_{d,*+}$ under the null scenario is not equal to 0, which can be seen by Equation (B.6) in Qu et al. [8]. This phenomenon will be discussed further in Section 5. Therefore, the type 1 error for the estimator for $\mu_{d,*+}$ was not assessed.

Table 4: The comparison for the estimators of treatment effect of adherers between the proposed method and the principal scores method with MI imputation based on baseline only (based on 3,000 simulated samples)

Abbreviations: $\mu_{0,*+}$ , population mean for the control group for $S_{*+}$ ; $\mu_{1,*+}$ , population mean for the treatment group for $S_{*+}$ ; $\mu_{d,*+}$ , population mean for the treatment difference between treatment and control groups for $S_{*+}$ ; $\mu_{0,++}$ , population mean for the control group for $S_{++}$ ; $\mu_{1,++}$ , population mean for the treatment group for $S_{++}$ ; $\mu_{d,++}$ , population mean for the treatment difference between treatment and control groups for $S_{++}$ .
Setting	Parameter	True Value	Estimate (proposed method)	Bias (proposed method)	Estimate (principal scores method)	Bias (principal scores method)
1	$\mu_{0,*+}$	-0.102	-0.102	-0.001	-0.154	-0.052
1	$\mu_{1,*+}$	-1.588	-1.587	0.001	-1.596	-0.007
1	$\mu_{d,*+}$	-1.487	-1.485	0.002	-1.442	0.045
1	$\mu_{0,++}$	-0.192	-0.191	0.001	-0.217	-0.026
1	$\mu_{1,++}$	-1.638	-1.640	-0.002	-1.662	-0.024
1	$\mu_{d,++}$	-1.446	-1.449	-0.003	-1.444	0.002
2	$\mu_{0,*+}$	-0.107	-0.108	-0.001	-0.214	-0.108
2	$\mu_{1,*+}$	-1.606	-1.601	0.004	-1.614	-0.008
2	$\mu_{d,*+}$	-1.499	-1.494	0.006	-1.400	0.099
2	$\mu_{0,++}$	-0.272	-0.263	0.009	-0.303	-0.031
2	$\mu_{1,++}$	-1.679	-1.678	0.000	-1.707	-0.028
2	$\mu_{d,++}$	-1.406	-1.415	-0.009	-1.404	0.003

4 Application

The application of the proposed method was based on the IMAGINE-3 Study, which has been used by Bergenstal et al. [38] and allows for direct comparison with previous results. IMAGINE-3 was a 52-week treatment trial in patients with type 1 diabetes mellitus designed to demonstrate that basal insulin peglispro (BIL) was superior to insulin glargine (GL). In this trial, 1,114 adults with type 1 diabetes were randomized to BIL and GL in a 3:2 ratio. The study was conducted in accordance with the International Conference on Harmonisation Guidelines for Good Clinical Practice and the Declaration of Helsinki. This study was registered at clinicaltrials.gov as NCT01454284 and details of the study report have been published.

Of the 1,114 randomized patients, 1,112 patients (663 in BIL, 449 in GL) took at least one dose of study drug. A total of 235 patients permanently discontinued the study treatment early because of lack of efficacy (LoE), adverse events (AE), or administrative reasons, leaving 877 (78.9%) patients adhering to the treatment.

To apply the proposed methods, we consider the following baseline covariates $X$ that could potentially impact treatment adherence: age, gender, HbA1c, low density lipoprotein cholesterol (LDL-C), triglyceride (TG), fasting serum glucose (FSG), and alanine aminotransferase (ALT). The study also collected HbA1c, LDL-C, TG, FSG, and ALT at Week 12 and Week 26, and injection site reaction adverse events (a binary variable) that occurred between randomization and Week 12 and between Week 12 and Week 26. These postbaseline variables were included as intermediate covariates $\mbox{\boldmath$Z$}_{1}$ for Week 12 and $\mbox{\boldmath$Z$}_{2}$ for Week 26, respectively. The primary outcome $Y$ was the HbA1c reading at Week 52.

For each stratum of $S_{*+}$ or $S_{++}$ , 1,000 imputations were generated and the complete data after imputations were used to estimate the mean response for each treatment group and the treatment difference. Due to a large amount of “missing” values (potential outcomes for the alternative treatments were not observed), we need a large number of imputations to achieve good accuracy for the estimates. Based on our investigation, the 1,000 imputed samples made the variance due to imputation random error $<0.1\%$ of the variance of the final estimate (average of the estimates from 1,000 imputations). The variance of the final estimate was estimated using 50 bootstrap samples and the corresponding 95% confidence interval was calculated using the normal approximation with the bootstrap variance. Due to the relatively large sample size, we expect the distributions of the estimators to be approximately Gaussian, which was confirmed by normal Q-Q plots of the 50 bootstrap estimates for all parameters for $S_{*+}$ and $S_{++}$ (data not shown here).
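One way such a check on the number of imputations could be carried out is sketched below. This is an assumption for illustration; the text does not describe the calculation that was used. Let $\hat{\theta}^{(m)}$ denote the estimate from the $m$-th imputed data set, $B$ the sample variance of these $M$ estimates, and $\hat{V}_{\mathrm{boot}}$ the bootstrap variance of the final estimate; all three symbols are introduced here.

```latex
\widehat{\mathrm{Var}}_{\mathrm{MC}}\left(\frac{1}{M}\sum_{m=1}^{M}\hat{\theta}^{(m)}\right)
  = \frac{B}{M},
\qquad
r = \frac{B/M}{\hat{V}_{\mathrm{boot}}} .
```

Under this reading, the reported criterion corresponds to $r<0.001$ with $M=1{,}000$; because $B/M$ shrinks in proportion to $1/M$, increasing the number of imputations reduces $r$ accordingly.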

Table 5: Summary of results of the real data analysis for the estimators of treatment effect in HbA1c for the two populations of adherers using proposed methods

Treatment	$S_{*+}$		$S_{++}$
Treatment	Estimate $\pm$ SE	95% CI	Estimate $\pm$ SE	95% CI
GL	7.59 $\pm$ 0.05	(7.49 , 7.70)	7.54 $\pm$ 0.05	(7.44 , 7.64)
BIL	7.33 $\pm$ 0.04	(7.25 , 7.41)	7.30 $\pm$ 0.04	(7.22 , 7.37)
BIL vs. GL	-0.26 $\pm$ 0.05	(-0.35 , -0.16)	-0.25 $\pm$ 0.05	(-0.34 , -0.15)
Abbreviations: BIL, basal insulin peglispro; CI, confidence interval; GL, basal insulin glargine; SE, standard error

Table 5 shows the estimates for HbA1c at 52 weeks for each treatment group and the treatment difference for the populations $S_{*+}$ and $S_{++}$ using the method based on $E_{0}\cup E_{1}$ . The estimates were similar to those reported in Qu et al. [9]. The results showed that BIL was superior to GL in controlling HbA1c for both principal strata, $S_{*+}$ and $S_{++}$ .

5 Summary and Discussion

In addition to the commonly used estimands for the treatment difference for the whole study population, the treatment difference for adherers (a principal stratum) is also important and plays a primary role in assessing the effect of a treatment as described in the so-called tripartite approach [7, 39]. When making decisions whether to start a new pharmacologic treatment or not, patients and physicians want to know what the effects of that treatment are when the patient takes the medication as prescribed. Careful thought and more sophisticated analyses are required (i.e., not the naïve completers analysis) so that data from randomized clinical trials can be used to assess this important principal stratum and provide an estimate of the causal effect of the treatment.

Furthermore, some aspects of clinical practice remain trial and error; a patient is prescribed a medication and follow-up visits are scheduled to assess the status of the patient’s disease or any resulting side effects. Those observations are used to guide the patient’s subsequent treatment with dosage changes or switching to other treatments. In our interactions with physicians, many start by prescribing a treatment that is highly effective when taken at the recommended dose and frequency and only alter or discontinue that treatment if side effects are not tolerable or if there is insufficient efficacy. This is considered preferable to starting with a treatment of lesser efficacy but perhaps greater adherence due to fewer or more acceptable side effects. Such treatments can always be a “fallback” option.

These considerations suggest that the treatment effect estimate in the principal stratum of patients who can adhere to treatment is at least as important as, if not more important than, the estimate for the whole study population. Furthermore, the Tripartite Approach of Akacha et al. [39] is ideally equipped to provide not only an estimate of the treatment effect in this clinically meaningful principal stratum, but also the likelihood of non-adherence due to adverse events and lack of efficacy. With this information, the patient and their physician can have a meaningful and nuanced discussion about the expected benefits and risks of actually taking the medication.

Finally, side effects are most often analyzed and described in the context of what happens when a medication is taken as prescribed, and we believe this context is most relevant for efficacy as well, especially when assimilating information into benefit-risk assessments.

Qu et al. [8] and Zhang et al. [40] provide the general framework for estimating the AdACE, but the implementation of such estimators is rather complex. In this article, we proposed an MI-based method to construct the estimators, which is much more straightforward than the method originally proposed for constructing the AdACE estimators. We evaluated its performance through simulations, which showed that the new method provides consistent estimators and that the bootstrap confidence interval achieves the nominal coverage probability. We also applied these MI-based estimators to the same data set as in Qu et al. [9] and obtained results similar to the original estimates.

We have focused on the discussion of the proposed method for the estimation of the mean treatment difference for a continuous variable. This method can be easily adapted to different types of variables (e.g., binary or time-to-event variables) as long as the data can be imputed consistently. Imputation of data for binary and time-to-event variables, which may be more challenging, is out of the scope of this article.

The proposed method, which utilizes the intermediate outcomes, is certainly more complex than the traditional method of identifying principal strata or predicting the outcome measurement using only baseline covariates. When adherence depends only on baseline covariates, the proposed method will not provide additional benefit compared to modeling the principal score through the baseline covariates alone, but it will generally not degrade the estimation unless too many intermediate outcomes are included. In practice, baseline covariates and intermediate outcome variables should be carefully selected. In the example in Section 4, the selected variables were direct indicators of efficacy and safety outcomes that could potentially lead to treatment discontinuation.

A set of relatively strong assumptions (A1-A7) is required for the proposed MI-based estimator to be consistent for the treatment effect for adherers. Therefore, sensitivity analyses may be warranted. Such analyses can be easily performed within the imputation framework (e.g., by adding a sensitivity parameter to the imputed values, as in the tipping-point analysis [41]). Future research on sensitivity analyses may be required.
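A tipping-point analysis of the kind mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of reference [41] or of this article: the function name `tipping_point`, the simulated HbA1c-like data, the grid of shifts, and the simple two-sample standard error (used here in place of the bootstrap variance) are all hypothetical. A positive shift `delta` is added only to the imputed values in the experimental arm, which makes them less favorable when lower values are better.

```python
import numpy as np

def tipping_point(y_trt, imputed_trt, y_ctl, deltas, z=1.96):
    """Scan `deltas` in the given order and return the first shift at which the
    upper 95% confidence bound for (treatment - control) is no longer below 0.

    `imputed_trt` flags which treatment-arm values were imputed; only those
    values are shifted. Returns None if no delta on the grid tips the result.
    """
    for delta in deltas:
        shifted = y_trt + delta * imputed_trt  # shift only the imputed values
        diff = shifted.mean() - y_ctl.mean()
        se = np.sqrt(shifted.var(ddof=1) / len(shifted)
                     + y_ctl.var(ddof=1) / len(y_ctl))
        if diff + z * se >= 0:
            return float(delta)
    return None

# Hypothetical completed data: lower values are better, 30% of the
# experimental-arm values are treated as imputed.
rng = np.random.default_rng(1)
y_ctl = rng.normal(7.6, 0.8, size=300)
y_trt = rng.normal(7.2, 0.8, size=300)
imputed_trt = np.zeros(300, dtype=bool)
imputed_trt[:90] = True

tp = tipping_point(y_trt, imputed_trt, y_ctl, deltas=np.arange(0.0, 2.01, 0.1))
no_shift = tipping_point(y_trt, imputed_trt, y_ctl, deltas=[0.0])
```

The size of the smallest shift that overturns the conclusion can then be judged for clinical plausibility: if only an implausibly large penalty on the imputed values tips the result, the finding is relatively robust to departures from the imputation assumptions.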

Using an estimand for a principal stratum requires careful consideration. In general settings (e.g., the setting in this article), the true treatment effect for $S_{*+}$ may not be zero even when there is no treatment effect. For the simulation model in this article, this can be seen from the theoretical treatment difference for $S_{*+}$ derived in Equation (B.6) in Qu et al. [8]. Only in special cases is the true treatment effect for $S_{*+}$ equal to zero under the null hypothesis of no treatment difference between the potential outcomes for the 2 treatments. Further research is required to find the necessary conditions under which $\mu_{d,*+}=0$ under the null hypothesis. This phenomenon provides another reason to use the treatment effect for $S_{++}$, which is always zero under the null hypothesis. In addition, the traditional method of forming the null hypothesis may not work for principal-stratification-based estimands, and the principal stratification variables should be taken into consideration.
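The following toy simulation illustrates how a nonzero effect for $S_{*+}$ can arise under a null hypothesis of equal outcome distributions. It is a hypothetical data-generating mechanism chosen for simplicity, not the simulation model of this article or of Qu et al. [8]: the two potential outcomes are assumed bivariate normal with identical means and variances and correlation `rho`, and a patient is assumed to adhere to a treatment when the outcome under that same treatment falls below `threshold` (lower is better). All variable names and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 200_000
rho = 0.5          # assumed correlation between the two potential outcomes
threshold = 0.5    # assumed adherence cutoff (adhere if the outcome is below it)

# Potential outcomes with identical marginal distributions (the "null").
cov = [[1.0, rho], [rho, 1.0]]
y0, y1 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# Adherence under each treatment depends on the outcome under that treatment.
a0 = y0 < threshold
a1 = y1 < threshold

effect_all = (y1 - y0).mean()                 # whole population
effect_star_plus = (y1 - y0)[a1].mean()       # adherers to the experimental arm
effect_plus_plus = (y1 - y0)[a0 & a1].mean()  # adherers to both arms
```

In this toy setting, `effect_all` and `effect_plus_plus` are close to zero (the latter by symmetry of the assumed mechanism), whereas `effect_star_plus` is about -0.25 in expectation: selecting on adherence to the experimental treatment selects on $Y(1)$ more strongly than on $Y(0)$, so the two conditional means differ even though the marginal distributions are identical.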

In summary, the MI-based estimation proposed in this article will allow for broader adoption and easier estimation of the AdACE, providing estimation for an alternative clinically meaningful estimand for adherers.

Acknowledgements

We would like to thank Dr. Ilya Lipkovich for his scientific review and useful comments. We would also like to thank the Associate Editor and two anonymous referees for their valuable comments, which led to significant improvement of this article.

Conflict of interest

No funding was received for this research other than the time spent as part of employment. For Junxiang Luo, the work was done when he was an employee of Sanofi.

Data sharing

The authors elect not to share data.

References

[1] ICH E9 (R1). Addendum on estimands and sensitivity analysis in clinical trials to the guideline on statistical principles for clinical trials. EMA/CHMP/ICH/436221/2017, Step 5 (Final Version adopted on 17 Feb 2020); 2020.
[2] Scharfstein DO. A constructive critique of the draft ICH E9 Addendum. Clinical Trials. 2019;16(4):375–380.
[3] Qu Y, Lipkovich I. Implementation of ICH E9 (R1): a few points learned during the COVID-19 pandemic. Therapeutic Innovation & Regulatory Science. 2021;55(5):984–988.
[4] Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables. Journal of the American Statistical Association. 1996;91(434):444–455.
[5] Imbens GW, Rubin DB. Bayesian inference for causal effects in randomized experiments with noncompliance. The annals of statistics. 1997;p. 305–327.
[6] Frangakis CE, Rubin DB. Principal stratification in causal inference. Biometrics. 2002;58(1):21–29.
[7] Ruberg S, Akacha M. Considerations for evaluating treatment effects from randomized clinical trials. Clinical Pharmacology & Therapeutics. 2017;102:917–923.
[8] Qu Y, Fu H, Luo J, Ruberg SJ. A general framework for treatment effect estimators considering patient adherence. Statistics in Biopharmaceutical Research. 2020;12(1):1–18.
[9] Qu Y, Luo J, Ruberg SJ. Implementation of tripartite estimands using adherence causal estimators under the causal inference framework. Pharmaceutical Statistics. 2021;20(1):55–67.
[10] Mehrotra DV, Li X, Gilbert PB. A comparison of eight methods for the dual-endpoint evaluation of efficacy in a proof-of-concept HIV vaccine trial. Biometrics. 2006;62(3):893–900.
[11] Shepherd BE, Redman MW, Ankerst DP. Does finasteride affect the severity of prostate cancer? A causal sensitivity analysis. Journal of the American Statistical Association. 2008;103(484):1392–1404.
[12] Permutt T. Effects in adherent subjects. Statistics in Biopharmaceutical Research. 2018;10(3):233–235.
[13] Bornkamp B, Bermann G. Estimating the treatment effect in a subgroup defined by an early post-baseline biomarker measurement in randomized clinical trials with time-to-event endpoint. Statistics in Biopharmaceutical Research. 2020;12(1):19–28.
[14] Bornkamp B, Rufibach K, Lin J, Liu Y, Mehrotra DV, Roychoudhury S, et al. Principal stratum strategy: Potential role in drug development. Pharmaceutical Statistics. 2021.
[15] Zhang JL, Rubin DB. Estimation of causal effects via principal stratification when some outcomes are truncated by “death”. Journal of Educational and Behavioral Statistics. 2003;28(4):353–368.
[16] Hayden D, Pauler DK, Schoenfeld D. An estimator for treatment comparisons among survivors in randomized trials. Biometrics. 2005;61(1):305–310.
[17] Small D, Tan Z. A stochastic monotonicity assumption for the instrumental variables method. Unpublished Working Paper. 2007.
[18] Jin H, Rubin DB. Principal stratification for causal inference with extended partial compliance. Journal of the American Statistical Association. 2008;103(481):101–111.
[19] Jo B, Stuart EA. On the use of propensity scores in principal causal effect estimation. Statistics in medicine. 2009;28(23):2857–2875.
[20] Chiba Y, VanderWeele TJ. A simple method for principal strata effects when the outcome has been truncated due to death. American journal of epidemiology. 2011;173(7):745–751.
[21] Ding P, Geng Z, Yan W, Zhou XH. Identifiability and estimation of causal effects by principal stratification with outcomes truncated by death. Journal of the American Statistical Association. 2011;106(496):1578–1591.
[22] VanderWeele TJ. Principal stratification–uses and limitations. The international journal of biostatistics. 2011;7(1):1–14.
[23] Lu X, Mehrotra D, Shepherd B. Rank-based principal stratum sensitivity analyses. Statistics in medicine. 2013;32(26):4526–4539.
[24] Feller A, Mealli F, Miratrix L. Principal Score Methods: Assumptions and Extensions. arXiv preprint arXiv:1606.02682. 2016.
[25] Magnusson BP, Schmidli H, Rouyrre N, Scharfstein DO. Bayesian inference for a principal stratum estimand to assess the treatment effect in a subgroup characterized by post-randomization events. arXiv preprint arXiv:1809.03741. 2018.
[26] Qu Y, Lipkovich I, Ruberg SJ. Monotonicity assumptions in estimating the treatment effect for a principal stratum. arXiv preprint arXiv:2111.10938. 2021.
[27] Louizos C, Shalit U, Mooij JM, Sontag D, Zemel R, Welling M. Causal effect inference with deep latent-variable models. In: Advances in Neural Information Processing Systems; 2017. p. 6446–6456.
[28] Rubin D. Multiple Imputation for Nonresponse in Surveys. New York: John Wiley & Sons Inc; 1987.
[29] Mallinckrodt C, Lipkovich I. Analyzing longitudinal clinical trial data: a practical guide. CRC Press; 2016.
[30] Rubin DB. Inference and missing data. Biometrika. 1976;63(3):581–592.
[31] Barnard J, Rubin DB. Miscellanea. Small-sample degrees of freedom with multiple imputation. Biometrika. 1999;86(4):948–955.
[32] Robins JM, Wang N. Inference for imputation estimators. Biometrika. 2000;87(1):113–124.
[33] Hughes R, Sterne J, Tilling K. Comparison of imputation variance estimators. Statistical methods in medical research. 2016;25(6):2541–2557.
[34] Bartlett JW, Hughes RA. Bootstrap inference for multiple imputation under uncongeniality and misspecification. Statistical methods in medical research. 2020;29(12):3533–3546.
[35] Schomaker M, Heumann C. Bootstrap inference when using multiple imputation. Statistics in medicine. 2018;37(14):2252–2266.
[36] Meng XL. Multiple-imputation inferences with uncongenial sources of input. Statistical Science. 1994;9:538–558.
[37] Xie X, Meng XL. Dissecting multiple imputation from a multi-phase inference perspective: what happens when God’s, imputer’s and analyst’s models are uncongenial? Statistica Sinica. 2017;27:1485–1545.
[38] Bergenstal R, Lunt H, Franek E, Travert F, Mou J, Qu Y, et al. Randomized, double-blind clinical trial comparing basal insulin peglispro and insulin glargine, in combination with prandial insulin lispro, in patients with type 1 diabetes: IMAGINE 3. Diabetes, Obesity and Metabolism. 2016;18(11):1081–1088.
[39] Akacha M, Bretz F, Ruberg S. Estimands in clinical trials–broadening the perspective. Statistics in medicine. 2017;36(1):5–19.
[40] Zhang Y, Fu H, Ruberg SJ, Qu Y. Statistical inference on the estimators of the adherer average causal effect. Statistics in Biopharmaceutical Research. 2021;p. 1–4. https://doi.org/10.1080/19466315.2021.1891965.
[41] Yan X, Lee S, Li N. Missing data handling methods in medical device clinical trials. Journal of Biopharmaceutical Statistics. 2009;19(6):1085–1098.