This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

\authormark

YOUFEI YU et al

\corres

*Min Zhang, Department of Biostatistics, School of Public Health, University of Michigan, 1420 Washington Heights, Ann Arbor, MI 48109, USA.

An Inverse Probability Weighted Regression Method that Accounts for Right-censoring for Causal Inference with Multiple Treatments and a Binary Outcome.

Youfei Yu    Min Zhang    Bhramar Mukherjee \orgdivDepartment of Biostatistics, School of Public Health, \orgnameUniversity of Michigan, Ann Arbor, \orgaddress\stateMichigan, \countryUSA [email protected]
(26 April 2016; 6 June 2016; 6 June 2016)
Abstract

[Summary]Comparative effectiveness research often involves evaluating the differences in the risks of an event of interest between two or more treatments using observational data. Often, the post-treatment outcome of interest is whether the event happens within a pre-specified time window, which leads to a binary outcome. One source of bias for estimating the causal treatment effect is the presence of confounders, which are usually controlled using propensity score-based methods. An additional source of bias is right-censoring, which occurs when the information on the outcome of interest is not completely available due to dropout, study termination, or treatment switch before the event of interest. We propose an inverse probability weighted regression-based estimator that can simultaneously handle both confounding and right-censoring, calling the method CIPWR, with the letter C highlighting the censoring component. CIPWR estimates the average treatment effects by averaging the predicted outcomes obtained from a logistic regression model that is fitted using a weighted score function. The CIPWR estimator has a double robustness property such that estimation consistency can be achieved when either the model for the outcome or the models for both treatment and censoring are correctly specified. We establish the asymptotic properties of the CIPWR estimator for conducting inference, and compare its finite sample performance with that of several alternatives through simulation studies. The methods under comparison are applied to a cohort of prostate cancer patients from an insurance claims database for comparing the adverse effects of four candidate drugs for advanced stage prostate cancer.

keywords:
causal inference, claims data, double robustness, observational studies, right censoring.
articletype: Article Type

1 Introduction

Data from observational studies, in which treatments are usually not randomly assigned, have been increasingly used to evaluate comparative effectiveness of treatments in the real world. Without randomization, it is difficult to make causal interpretations concerning the effect of a treatment on an outcome of interest, as the estimation of the average treatment effect tends to be biased by the presence of confounders. A commonly used tool that controls for confounding is the propensity score, defined as the conditional probability of treatment given a set of potential confounders.1 There is a large body of work on propensity score-based methods that adjust for the differences in baseline covariates among treatment groups, and readers can find some reviews of these methods by Stuart, 2 Hu et al,3 and Yu et al.4

It usually takes some period of follow-up time, which may or may not be the same for all subjects, to observe the post-treatment outcome of interest after the subjects enter the study. This type of outcome is sometimes referred to as time-lagged response. 5 One example is whether the event of interest happens within a pre-specified time window, which results in a binary outcome that cannot be ascertained if the subject has been censored prior to the end of the window. Censoring occurs when the information about the response is not completely available due to dropout, study termination, or treatment switch. Censoring together with confounding arises in many applications, such as insurance claims databases, where the participants are not randomly enrolled nor randomly assigned to treatments, and could potentially leave the insurance plan or be switched to another treatment prior to the occurrence of the event of interest. In this paper, we consider a study that evaluates the possible adverse effects of four candidate drugs prescribed for metastatic castration-resistant prostate cancer (mCRPC) using data from Optum Clinformatic Data Mart, a national private health insurance network. Some of these drugs are chemotherapy, while some others are hormone therapies, and the prior expectation is that the hormone therapies will lead to less acute adverse events. We separately examine two endpoints: emergency room (ER) visit and all-cause hospitalization post first-line therapy with one of these drugs. We intend to compare patients’ risks of experiencing at least one event of interest (i.e., ER visit or hospitalization) within a 180-day, 270-day, and 360-day time window after treatment initiation. Therefore, the outcome we focus on is binary that is subject to right-censoring.

Leaving out censored observations when conducting a comparative analysis can result in a biased effect estimate even after proper adjustment for confounding. Special techniques that adjust for both confounders and censoring are required to make valid inference on the causal effects. Anstrom and Tsiatis proposed an inverse probability weighted estimator of average treatment effect for the time-lagged response, which can be used to estimate the difference in risks of event occurrence.5 To improve the robustness and statistical efficiency of the estimator of Anstrom and Tsiatis, Wang et al proposed an augmented inverse probability weighted estimator that makes use of the information about the outcome model for censored medical cost data.6 The estimator of Wang et al can conveniently be extended to the case of a binary outcome by replacing the linear regression model for the outcome with a logistic regression model. Since one minus the survival function of the event of interest describes the risk of an event over the duration of follow-up, one can also use methods for time-to-event outcomes. These will include approaches that model the whole survival curve, and obtain the probability of surviving to the end of the pre-specified time window. For example, Zhang and Schaubel proposed an estimator for the cumulative hazard function, which incorporates the information from Cox models for the time-to-event outcome into the estimating equations.7 Their method was originally developed for estimation of restricted mean lifetimes, but can be easily adapted to estimate the average treatment effect on a possibly censored binary outcome, since one can think of the survival probability at a given time point as a fixed time ‘snapshot’ of the whole survival curve. The augmented estimators such as those proposed by Wang et al6 and Zhang and Schaubel7 are known to possess the double robustness property, such that the estimator remains consistent as long as either the model corresponding to the outcome, or the models corresponding to the weights in the estimating equations are correct. Another line of work in causal inference for censored data involves using pseudo-observations in replacement of the original outcomes that are possibly incompletely observed.8 For a binary outcome, the causal treatment effect is estimated by computing pseudo survival probability for each subject, followed by standard causal inference method for completely observed outcomes, such as inverse probability weighting or direct standardization. We leave the details of the aforementioned methods to Section 4. Among these approaches that address right-censoring, some assume conditional independence of the censoring and survival time given treatment only,5, 6, 8 while others require less restrictive conditions such that censoring and survival times are independent given treatment and baseline covariates.7

We propose a method that directly models the binary outcome using logistic regression, with confounding and censoring properly accounted for by weighting. The risk of event occurrence (and therefore the average treatment effect) is estimated based on standardization, which averages the outcome predictions obtained from the logistic regression model across all subjects. The treatment assignment and censoring mechanism together can be viewed as a special case of coarsening, a process that prevents one from observing the desired data structure.9 We explain how our problem can be described using the concept of coarsening later in Section 3. Coarsening is handled by applying inverse probability of not being coarsened as weights to the score function of the logistic regression model. We call the method inverse probability weighted regression-based estimator, CIPWR for short, with the letter C highlighting the censoring component. Specifically, three sets of working models are constructed, one for the treatment assignment, one for the treatment specific censoring distribution, and the other for the outcome of interest. We show that the CIPWR estimator is doubly robust in the sense that consistency of the estimator can be achieved if either the outcome or the coarsening mechanism is correctly modeled. As Zhang and Schaubel,7 this method makes the less restrictive assumption about censoring than Anstrom and Tsiatis,5 Wang et al,6 and Anderson et al.8 Unlike Zhang and Schaubel7 and Wang et al6 that are based on the general approach of augmented inverse probability weighting, our method is a standardization method. Also, unlike Zhang and Schaubel7 that estimates the whole survival curve, this method targets the binary outcome of interest and may lead to improved efficiency in some situations.

The rest of the paper is organized as follows. In Section 2, we introduce the statistical framework and the notations used. In Section 3, we describe our proposed method and establish its asymptotic properties. We compare the finite sample performance of the CIPWR estimator to that of several alternative approaches through simulation studies and results are presented in Section 5. The proposed method is then applied to the prostate cancer treatment comparison example from the insurance claims database in Section 6. Conclusions and discussions for future research are presented in Section 7.

2 Notations and Assumptions

For individual ii, where i=1,,ni=1,\cdots,n, let 𝑿~i\tilde{\boldsymbol{X}}_{i} be a set of baseline variables, and ZiZ_{i} be the treatment received. We assume that ZiZ_{i} is nominal with JJ levels, i.e., Zi=j{1,,J;J2}Z_{i}=j\in\{1,\cdots,J;J\geq 2\}, and let DijI(Zi=j)D_{ij}\equiv I(Z_{i}=j). Let TiT_{i} denote the underlying lag time to the first event of interest, which will always be observed if there were no censoring. In this study, the outcome of interest, denoted by YiY_{i}, is whether the event of interest occurs within a pre-specified time window dd. By this definition, Yi=I(Ti<d)Y_{i}=I(T_{i}<d). We adopt the counterfactual framework to formulate the problem of causal comparison 10. Each individual is associated with a set of potential outcomes {Yi(1),,Yi(J)}\{Y_{i}^{(1)},\cdots,Y_{i}^{(J)}\}, where Yi(j)=I{Ti(j)<d}Y_{i}^{(j)}=I\{T_{i}^{(j)}<d\} and Ti(j)T_{i}^{(j)} is defined as the potentially observed time to the first event of interest had the patient received treatment jj. Under the Stable Unit Treatment Value Assumption (SUTVA, defined later), only the outcome under the actual treatment received, Yi=j=1JDijYi(j)Y_{i}=\sum_{j=1}^{J}D_{ij}Y_{i}^{(j)}, can be observed.

In practice, the time to event TiT_{i} may not be completely observed due to right-censoring, in which case the outcome variable YiY_{i} is therefore subject to coarsening. Let CiC_{i} denote the censoring time and Ri=I{Cimin(Ti,d)}R_{i}=I\left\{C_{i}\geq\min(T_{i},d)\right\}. Then YiY_{i} is observed if the individual has not been censored before dd, i.e., Ri=1R_{i}=1. We further let Δi=I(TiCi)\Delta_{i}=I(T_{i}\leq C_{i}) and Li=min(Ti,Ci,d)L_{i}=\min(T_{i},C_{i},d). Note that the outcomes YiY_{i} of those whose TiT_{i} are censored (Δi=0\Delta_{i}=0) are not necessarily missing at time dd (Ri=0R_{i}=0).

Interest lies in estimating the average treatment effect τ(j,j)=E{Y(j)Y(j)}\tau\left(j,j^{\prime}\right)=E\{Y^{\left(j^{\prime}\right)}-Y^{(j)}\}, which equals the risk difference pr{Y(j)=1}pr{Y(j)=1}pr\{Y^{(j^{\prime})}=1\}-pr\{Y^{(j)}=1\} for a binary outcome. We seek to estimate E{Y(j)}E\{Y^{(j)}\} separately for j=1,,Jj=1,\cdots,J. To connect the counterfactual framework to the observable data and establish a causal interpretation, we make the following assumptions.

  1. I.

    (Random sampling) The individuals in the study are randomly sampled from the population.

  2. II.

    (Stable Unit Treatment Value Assumption, or SUTVA) For any individual ii, i=1,,ni=1,\cdots,n, if Zi=jZ_{i}=j, then Yi=Yi(j)Y_{i}=Y_{i}^{(j)}, for all j=1,,Jj=1,\cdots,J.

  3. III.

    (Unconfoundedness) {Yi(1),,Yi(J)}Zi|𝑿~i\{Y_{i}^{(1)},\cdots,Y_{i}^{(J)}\}\mathrel{\text{\scalebox{1.07}{$\perp\mkern-10.0mu\perp$}}}Z_{i}|\tilde{\boldsymbol{X}}_{i}.

  4. IV.

    (Overlap) For all values of jj and 𝒙~\tilde{\boldsymbol{x}}, 0<πj(𝒙~)<10<\pi_{j}(\tilde{\boldsymbol{x}})<1, where πj(𝒙~)=pr(Zi=j|𝒙~)\pi_{j}(\tilde{\boldsymbol{x}})=pr(Z_{i}=j|\tilde{\boldsymbol{x}}).

  5. V.

    (Censoring at random) Ci{Ti(1),,Ti(J)}|(Zi,𝑿~i)C_{i}\mathrel{\text{\scalebox{1.07}{$\perp\mkern-10.0mu\perp$}}}\{T_{i}^{(1)},\cdots,T_{i}^{(J)}\}\Big{|}(Z_{i},\tilde{\boldsymbol{X}}_{i}).

3 Proposed Method: Inverse Probability Weighted Regression that Accounts for Right-Censoring

We note that instead of directly evaluating E{Yi(j)}E\{Y_{i}^{(j)}\}, it is theoretically more convenient to work with the survival function

μjE[I{Ti(j)d}]=1E{Yi(j)},\displaystyle\mu_{j}\equiv E[I\{T_{i}^{(j)}\geq d\}]=1-E\{Y_{i}^{(j)}\},

and we let Y~i(j)=I{Ti(j)d}\tilde{Y}_{i}^{(j)}=I\{T_{i}^{(j)}\geq d\}. The counterfactual parameter μj\mu_{j} can be represented using the observed data, μj=EX[E{Y~i(j)|𝑿~i}]=EX[E{Y~i|𝑿~i,Zi=j}]\mu_{j}=E_{X}[E\{\tilde{Y}_{i}^{(j)}|\tilde{\boldsymbol{X}}_{i}\}]=E_{X}[E\{\tilde{Y}_{i}|\tilde{\boldsymbol{X}}_{i},Z_{i}=j\}], where the second equation follows from the unconfoundedness assumption III. Had there been no right-censoring, μj\mu_{j} could be estimated by averaging the predicted potential outcomes, μ^j=n1i=1nE^(Y~i|𝑿~i,Zi=j)\hat{\mu}_{j}=n^{-1}\sum_{i=1}^{n}\hat{E}(\tilde{Y}_{i}|\tilde{\boldsymbol{X}}_{i},Z_{i}=j), for j=1,,Jj=1,\cdots,J, where E^(Y~i|𝑿~i,Zi=j)\hat{E}(\tilde{Y}_{i}|\tilde{\boldsymbol{X}}_{i},Z_{i}=j) are usually fitted values in a parametric regression model. For a binary outcome, a logistic regression model is a popular choice to fit Y~\tilde{Y} in group jj, specified as

logit{E(Y~i|𝑿~i,Zi=j)}=𝑿iT𝜷j,\displaystyle\text{logit}\{E(\tilde{Y}_{i}|\tilde{\boldsymbol{X}}_{i},Z_{i}=j)\}=\boldsymbol{X}_{i}^{T}\boldsymbol{\beta}_{j}, (1)

where 𝑿i\boldsymbol{X}_{i} is a vector-valued function of 𝑿~i\tilde{\boldsymbol{X}}_{i} with an intercept and possibly interactions and nonlinear terms. For notational convenience, we define mij(𝜷j)=expit(𝑿iT𝜷j)m_{ij}(\boldsymbol{\beta}_{j})=\text{expit}(\boldsymbol{X}_{i}^{T}\boldsymbol{\beta}_{j}). When the outcome YY (and therefore Y~\tilde{Y}) is completely observed for all individuals in the sample, 𝜷j\boldsymbol{\beta}_{j} is commonly estimated by solving the score equations

i=1n𝑿i{Y~imij(𝜷j)}=0,\displaystyle\sum_{i=1}^{n}\boldsymbol{X}_{i}\{\tilde{Y}_{i}-m_{ij}(\boldsymbol{\beta}_{j})\}=0, (2)

the solution of which is the maximum likelihood estimator.

From the missing data perspective, the potential outcome Yi(j)=I{Ti(j)<d}Y_{i}^{(j)}=I\{T_{i}^{(j)}<d\} would be missing at baseline (t=0t=0) if individual ii were assigned to the treatment other than jj. When censoring comes into play, Yi(j)Y_{i}^{(j)} is subject to missingness at any time 0<t<d0<t<d because of censoring. In this case, we consider the more general notion of coarsening of data 9, 11, 12, which describes the case where one only gets to observe a many-to-one function of the full data for some of the individuals in the sample, and different many-to-one functions are allowed for different individuals. In the context of estimating μj\mu_{j}, the full data one would like to observe for individual ii is {Yi(j),𝑿~i}\{Y_{i}^{(j)},\tilde{\boldsymbol{X}}_{i}\}. When Dij=0D_{ij}=0, Ti(j)T_{i}^{(j)} (and therefore Yi(j)Y_{i}^{(j)}) is completely missing, and we only observe 𝑿~i\tilde{\boldsymbol{X}}_{i}. When Dij=1D_{ij}=1 and Ci=t<min(Ti(j),d)C_{i}=t<\min(T_{i}^{(j)},d), we observe {I(Ti(j)>t),𝑿~i}\{I(T_{i}^{(j)}>t),\tilde{\boldsymbol{X}}_{i}\}, where t<dt<d. When Dij=1D_{ij}=1 and Ci=tmin(Ti(j),d)C_{i}=t\geq\min(T_{i}^{(j)},d), there was no coarsening at all, and we observe the full data {Yi(j),𝑿~i}\{Y_{i}^{(j)},\tilde{\boldsymbol{X}}_{i}\}. In summary, there are two layers of missingness in our setting, one due to treatment assignment and the other due to censoring. Therefore, we can inversely weight the unbiased estimating equations (2) by the probability of not being coarsened to make inference about the target population where no subjects were coarsened. The weighted estimating equations for 𝜷j\boldsymbol{\beta}_{j} is given by

i=1nDijRi𝑿i{Y~imij(𝜷j)}πij(𝜶)exp{Λij(Li)}=0,\displaystyle\sum_{i=1}^{n}\frac{D_{ij}R_{i}\boldsymbol{X}_{i}\{\tilde{Y}_{i}-m_{ij}(\boldsymbol{\beta}_{j})\}}{\pi_{ij}(\boldsymbol{\alpha})\exp\{{-\Lambda_{ij}(L_{i})}\}}=0, (3)

where πij(𝜶)=pr(Zi=j|𝑿~i)\pi_{ij}(\boldsymbol{\alpha})=pr(Z_{i}=j|\tilde{\boldsymbol{X}}_{i}) is the propensity score for treatment jj, and Λij(t)\Lambda_{ij}(t) is the cumulative hazard function of CiC_{i} at tt for treatment jj. We denote the solution to (3) by 𝜷^j\hat{\boldsymbol{\beta}}_{j}. The total weights that account for the coarsening mechanism consist of two weighting components. The first is the propensity of being assigned to treatment jj, and the second is (informally) the conditional probability of not being censored. Therefore, equation (3) can be regarded as weighting the score functions (2) by the inverse probability of observing the complete cases. A regression-based estimator for μj\mu_{j} indicated by (1) is

μ^j=n1i=1nexpit(𝑿iT𝜷^j)=n1i=1nmij(𝜷^j),\displaystyle\hat{\mu}_{j}=n^{-1}\sum_{i=1}^{n}\text{expit}(\boldsymbol{X}_{i}^{T}\hat{\boldsymbol{\beta}}_{j})=n^{-1}\sum_{i=1}^{n}m_{ij}(\hat{\boldsymbol{\beta}}_{j}),

and we call it inverse probability weighted regression-based estimator that accounts for right-censoring (CIPWR).

In the literature, an outcome that is subject to censoring is sometimes handled using survival models, such as Cox regression model.7, 13 CIPWR, on the other hand, directly models the binary outcome of interest using the logistic regression, which is relatively more intuitive and straightforward to implement for empirical researchers. Our proposed estimator can be implemented using standard statistical software, such as the glm function in R with the weights argument being specified.

In practice, πij(𝜶)\pi_{ij}(\boldsymbol{\alpha}) and Λij(t)\Lambda_{ij}(t) in (3) are usually unknown and need to be estimated from the data. We build working models for these two nuisance components. Let 𝑽i\boldsymbol{V}_{i} and 𝑾i\boldsymbol{W}_{i} be vector-valued functions of 𝑿~i\tilde{\boldsymbol{X}}_{i}, which are allowed to be different from 𝑿i\boldsymbol{X}_{i}. We assume that the treatment assignment mechanism is governed by a multinomial logistic regression model

logpr(Zi=j|𝑽i)pr(Zi=J|𝑽i)=𝑽iT𝜶j,j=1,,J1,\displaystyle\log\frac{pr(Z_{i}=j|\boldsymbol{V}_{i})}{pr(Z_{i}=J|\boldsymbol{V}_{i})}=\boldsymbol{V}_{i}^{T}\boldsymbol{\alpha}_{j},\hskip 5.69054ptj=1,\cdots,J-1,

where JJ is the reference treatment level. Let 𝜶=(𝜶1,,𝜶J1)T\boldsymbol{\alpha}=\left(\boldsymbol{\alpha}_{1},\cdots,\boldsymbol{\alpha}_{J-1}\right)^{T}, and its estimated value 𝜶^\hat{\boldsymbol{\alpha}} can be obtained through maximum likelihood estimation. With respect to censoring, for each treatment j=1,,Jj=1,\cdots,J, we assume a Cox proportional hazards model, specified as

λij(t|𝑾i,𝜸j)=λ0j(t)exp(𝑾iT𝜸j),\displaystyle\lambda_{ij}(t|\boldsymbol{W}_{i},\boldsymbol{\gamma}_{j})=\lambda_{0j}(t)\exp(\boldsymbol{W}_{i}^{T}\boldsymbol{\gamma}_{j}),

where λ0j(t)\lambda_{0j}(t) is an unspecified treatment-specific baseline hazard function of CC. The estimates for 𝜸j\boldsymbol{\gamma}_{j} and Λ0j(t)=0tλ0j(s)𝑑s\Lambda_{0j}(t)=\int_{0}^{t}\lambda_{0j}(s)ds can be determined by the maximum partial likelihood estimator, 𝜸^j\hat{\boldsymbol{\gamma}}_{j}, and the Breslow estimator, Λ^0j(t)\hat{\Lambda}_{0j}(t), respectively. Then the probability of remaining uncensored at tt for individual ii is given by exp{Λ^0j(t)exp(𝑾iT𝜸^j)}\exp\{-\hat{\Lambda}_{0j}(t)\exp(\boldsymbol{W}_{i}^{T}\boldsymbol{\hat{\gamma}}_{j})\}. In a typical survival study, one only gets to observe the minimum of CC and TT, which is usually referred to as observation time, and the observed data can be represented as {Δi,min(Ti,Ci)}\{\Delta_{i},\min(T_{i},C_{i})\}. However, for our data example, time to treatment switch or the end of insurance coverage can always be identified from the claims data, regardless of whether the event of interest happens or not. In this case, censoring time CC is always available for all subjects, and Δi=0\Delta_{i}=0 for all i,i=1,,ni,i=1,\cdots,n. Therefore, one can alternatively estimate the probability of remaining uncensored by replacing {Δi,min(Ti,Ci)}\{\Delta_{i},\min(T_{i},C_{i})\} with {0,Ci}\{0,C_{i}\}.

3.1 Consistency and Double Robustness

Under suitable regularity conditions, 𝜶^\hat{\boldsymbol{\alpha}}, 𝜸^j\hat{\boldsymbol{\gamma}}_{j}, and Λ^0j\hat{\Lambda}_{0j} converge in probability to well-defined limits, denoted by 𝜶\boldsymbol{\alpha}^{\ast}, 𝜸j\boldsymbol{\gamma}_{j}^{\ast}, and 𝚲0j\boldsymbol{\Lambda}_{0j}^{\ast}, respectively, which can be different from their corresponding true values 𝜶0\boldsymbol{\alpha}^{0}, 𝜸j0\boldsymbol{\gamma}_{j}^{0}, and 𝚲0j0\boldsymbol{\Lambda}_{0j}^{0} 14, 15. We denote the true values for 𝜷j\boldsymbol{\beta}_{j} and μj\mu_{j} by 𝜷j0\boldsymbol{\beta}_{j}^{0} and μj0\mu_{j}^{0}, respectively. For notational convenience, we also define Λij(t)=Λ0j(t)exp(𝑾iT𝜸j)\Lambda_{ij}^{\ast}(t)=\Lambda_{0j}^{\ast}(t)\exp(\boldsymbol{W}_{i}^{T}\boldsymbol{\gamma}_{j}^{\ast}) and Λij0(t)=Λ0j0(t)exp(𝑾iT𝜸j0)\Lambda_{ij}^{0}(t)=\Lambda_{0j}^{0}(t)\exp(\boldsymbol{W}_{i}^{T}\boldsymbol{\gamma}_{j}^{0}).

We first show that μ^j=n1i=1nmij(𝜷^j)\hat{\mu}_{j}=n^{-1}\sum_{i=1}^{n}m_{ij}(\hat{\boldsymbol{\beta}}_{j}), a function of 𝜷^j\hat{\boldsymbol{\beta}}_{j}, is consistent when the outcome model for treatment jj is correct. Using the theory of M-estimator, the consistency of 𝜷^j\hat{\boldsymbol{\beta}}_{j} can be established by showing that the estimating function is unbiased 16, that is,

0=\displaystyle 0= E[DijRi𝑿i{Y~imij(𝜷j0)}πij(𝜶)exp{Λij(Li)}]\displaystyle E\left[\frac{D_{ij}R_{i}\boldsymbol{X}_{i}\{\tilde{Y}_{i}-m_{ij}(\boldsymbol{\beta}_{j}^{0})\}}{\pi_{ij}(\boldsymbol{\alpha}^{\ast})\exp\{-\Lambda^{\ast}_{ij}(L_{i})\}}\right]
=\displaystyle= E[DijRi𝑿iY~iπij(𝜶)exp{Λij(Li)}]\displaystyle E\left[\frac{D_{ij}R_{i}\boldsymbol{X}_{i}\tilde{Y}_{i}}{\pi_{ij}(\boldsymbol{\alpha}^{\ast})\exp\{-\Lambda^{\ast}_{ij}(L_{i})\}}\right] (4a)
E[DijRi𝑿imij(𝜷j0)πij(𝜶)exp{Λij(Li)}].\displaystyle-E\left[\frac{D_{ij}R_{i}\boldsymbol{X}_{i}m_{ij}(\boldsymbol{\beta}_{j}^{0})}{\pi_{ij}(\boldsymbol{\alpha}^{\ast})\exp\{-\Lambda^{\ast}_{ij}(L_{i})\}}\right]. (4b)

Applying the law of iterated expectation and using the Assumption V,

(4a)\displaystyle(\ref{eq:ef1}) =E{E[DijRi𝑿iY~iπij(𝜶)exp{Λij(Li)}|𝑿i,Zi=j]}\displaystyle=E\left\{E\left[\frac{D_{ij}R_{i}\boldsymbol{X}_{i}\tilde{Y}_{i}}{\pi_{ij}(\boldsymbol{\alpha}^{\ast})\exp\{-\Lambda^{\ast}_{ij}(L_{i})\}}\middle|\boldsymbol{X}_{i},Z_{i}=j\right]\right\}
=E[Dij𝑿iE{I(Ci>d)|𝑿i,Zi=j}πij(𝜶)exp{Λij(d)}E(Y~i|𝑿i,Zi=j)],\displaystyle=E\left[\frac{D_{ij}\boldsymbol{X}_{i}E\left\{I(C_{i}>d)\middle|\boldsymbol{X}_{i},Z_{i}=j\right\}}{\pi_{ij}(\boldsymbol{\alpha}^{\ast})\exp\{-\Lambda^{\ast}_{ij}(d)\}}E(\tilde{Y}_{i}|\boldsymbol{X}_{i},Z_{i}=j)\right],

where the second equation is derived from the formula RiY~i=I{Ci>min(Ti,d)}Y~i=I{Ci>d}Y~iR_{i}\tilde{Y}_{i}=I\{C_{i}>\min(T_{i},d)\}\tilde{Y}_{i}=I\{C_{i}>d\}\tilde{Y}_{i}. Since when Ti>dT_{i}>d, Ri/exp{Λij(Li)}=I(Ci>d)/exp{Λij(d)}R_{i}/\exp\{-\Lambda_{ij}^{\ast}(L_{i})\}=I(C_{i}>d)/\exp\{-\Lambda_{ij}^{\ast}(d)\}, using similar techniques,

(4b)\displaystyle(\ref{eq:ef2}) =E{E[D(j)Ri𝑿imij(𝜷j0)πij(𝜶)exp{Λij(Li)}|𝑿i,Zi=j,Ti>d]}\displaystyle=E\left\{E\left[\frac{D^{(j)}R_{i}\boldsymbol{X}_{i}m_{ij}(\boldsymbol{\beta}_{j}^{0})}{\pi_{ij}(\boldsymbol{\alpha}^{\ast})\exp\{-\Lambda^{\ast}_{ij}(L_{i})\}}\middle|\boldsymbol{X}_{i},Z_{i}=j,T_{i}>d\right]\right\}
=E{Dij𝑿iE[I(Ci>d)|𝑿i,Zi=j]πij(𝜶)exp{Λij(d)}mij(𝜷j0)}.\displaystyle=E\left\{\frac{D_{ij}\boldsymbol{X}_{i}E\left[I(C_{i}>d)\middle|\boldsymbol{X}_{i},Z_{i}=j\right]}{\pi_{ij}(\boldsymbol{\alpha}^{\ast})\exp\{-\Lambda^{\ast}_{ij}(d)\}}m_{ij}(\boldsymbol{\beta}_{j}^{0})\right\}.

Since E(Y~i|𝑿i,Zi=j)=mij(𝜷j0)E(\tilde{Y}_{i}|\boldsymbol{X}_{i},Z_{i}=j)=m_{ij}(\boldsymbol{\beta}_{j}^{0}) when the outcome model is correctly specified, the estimating function is shown to be unbiased, which implies that 𝜷^j\hat{\boldsymbol{\beta}}_{j} obtained by solving (3) converges in probability to the truth 𝜷j0\boldsymbol{\beta}_{j}^{0}. Therefore, μ^j=n1i=1nmij(𝜷^j)𝑝E{mij(𝜷j0)}=μj0\hat{\mu}_{j}=n^{-1}\sum_{i=1}^{n}m_{ij}(\hat{\boldsymbol{\beta}}_{j})\overset{p}{\to}E\left\{m_{ij}\left(\boldsymbol{\beta}_{j}^{0}\right)\right\}=\mu_{j}^{0}.

We then show the consistency of μ^j\hat{\mu}_{j} when the coarsening mechanisms (i.e., treatment and censoring models) are correctly specified, in which case πij(𝜶^)𝑝πij(𝜶0)\pi_{ij}(\hat{\boldsymbol{\alpha}})\overset{p}{\to}\pi_{ij}(\boldsymbol{\alpha}^{0}) and Λ^ij(t)𝑝Λij0(t)\hat{\Lambda}_{ij}(t)\overset{p}{\to}\Lambda_{ij}^{0}(t). Under suitable regularity conditions, 𝜷^j𝑝𝜷j\hat{\boldsymbol{\beta}}_{j}\overset{p}{\to}\boldsymbol{\beta}_{j}^{\ast}, where 𝜷j\boldsymbol{\beta}_{j}^{\ast} is a well-defined limit, and then

μ^j=n1i=1nmij(𝜷^j)𝑝E{mij(𝜷j)}.\displaystyle\hat{\mu}_{j}=n^{-1}\sum_{i=1}^{n}m_{ij}\left(\hat{\boldsymbol{\beta}}_{j}\right)\overset{p}{\to}E\left\{m_{ij}\left(\boldsymbol{\beta}_{j}^{\ast}\right)\right\}.

We consider the intercept term in 𝑿i\boldsymbol{X}_{i} and rearrange equation (3),

n1i=1nDijRiY~iπij(𝜶^)exp{Λ^ij(Li)}=n1i=1nDijRimij(𝜷^j)πij(𝜶^)exp{Λ^ij(Li)}.\displaystyle n^{-1}\sum_{i=1}^{n}\frac{D_{ij}R_{i}\tilde{Y}_{i}}{\pi_{ij}(\hat{\boldsymbol{\alpha}})\exp\{-\hat{\Lambda}_{ij}(L_{i})\}}=n^{-1}\sum_{i=1}^{n}\frac{D_{ij}R_{i}m_{ij}(\hat{\boldsymbol{\beta}}_{j})}{\pi_{ij}(\hat{\boldsymbol{\alpha}})\exp\{-\hat{\Lambda}_{ij}(L_{i})\}}. (5)

The left-hand side of (5) converges in probability to μj0\mu_{j}^{0}, because

n1i=1nDijRiY~iπij(𝜶^)exp{Λ^ij(Li)}\displaystyle n^{-1}\sum_{i=1}^{n}\frac{D_{ij}R_{i}\tilde{Y}_{i}}{\pi_{ij}(\hat{\boldsymbol{\alpha}})\exp\{-\hat{\Lambda}_{ij}(L_{i})\}} p E[DijRiY~iπij(𝜶0)exp{Λij0(Li)}]\displaystyle\xrightarrow{\text{ }p\text{ }}E\left[\frac{D_{ij}R_{i}\tilde{Y}_{i}}{\pi_{ij}(\boldsymbol{\alpha}^{0})\exp\{-\Lambda_{ij}^{0}(L_{i})\}}\right]
=E[DijE{I(Ci>d)Y~i(j)|𝑿~i,Zi=j}πij(𝜶0)exp{Λij0(d)}]\displaystyle=E\left[\frac{D_{ij}E\{I(C_{i}>d)\tilde{Y}_{i}^{(j)}|\tilde{\boldsymbol{X}}_{i},Z_{i}=j\}}{\pi_{ij}(\boldsymbol{\alpha}^{0})\exp\{-\Lambda_{ij}^{0}(d)\}}\right]
=E[DijE{I(Ci>d)|𝑿~i,Zi=j}E{Y~i(j)|𝑿~i,Zi=j}πij(𝜶0)exp{Λij0(d)}]\displaystyle=E\left[\frac{D_{ij}E\{I(C_{i}>d)|\tilde{\boldsymbol{X}}_{i},Z_{i}=j\}E\{\tilde{Y}_{i}^{(j)}|\tilde{\boldsymbol{X}}_{i},Z_{i}=j\}}{\pi_{ij}(\boldsymbol{\alpha}^{0})\exp\{-\Lambda_{ij}^{0}(d)\}}\right] (6)

where (6) follows from Assumption V. With correct specification of the treatment and censoring models, E{I(Ci>d)|𝑿~i,Zi=j}=exp{Λij0(d)}E\{I(C_{i}>d)|\tilde{\boldsymbol{X}}_{i},Z_{i}=j\}=\exp\{-\Lambda_{ij}^{0}(d)\} and E(Dij|𝑿~i)=πij(𝜶0)E(D_{ij}|\tilde{\boldsymbol{X}}_{i})=\pi_{ij}(\boldsymbol{\alpha}_{0}), and therefore (6) can be reduced to E[E{Y~i(j)|𝑿~i,Zi=j}]E[E\{\tilde{Y}_{i}^{(j)}|\tilde{\boldsymbol{X}}_{i},Z_{i}=j\}], which is equivalent to μj0\mu_{j}^{0}.

Using similar techniques, one can show that the right-hand side of (5) converges in probability to E{mij(𝜷j)}E\left\{m_{ij}(\boldsymbol{\beta}_{j}^{\ast})\right\}, since

n1i=1nDijRimij(𝜷^j)πij(𝜶^)exp{Λ^ij(Li)}\displaystyle n^{-1}\sum_{i=1}^{n}\frac{D_{ij}R_{i}m_{ij}(\hat{\boldsymbol{\beta}}_{j})}{\pi_{ij}(\hat{\boldsymbol{\alpha}})\exp\{-\hat{\Lambda}_{ij}(L_{i})\}} p E[DijRimij(𝜷j)πij(𝜶0)exp{Λij0(Li)}]\displaystyle\xrightarrow{\text{ }p\text{ }}E\left[\frac{D_{ij}R_{i}m_{ij}(\boldsymbol{\beta}_{j}^{\ast})}{\pi_{ij}(\boldsymbol{\alpha}^{0})\exp\{-\Lambda_{ij}^{0}(L_{i})\}}\right]
=E[Dijmij(𝜷j)E{I(Ci>d)|𝑿i,Zi=j,Ti>d}πij(𝜶0)exp{Λij0(d)}]\displaystyle=E\left[\frac{D_{ij}m_{ij}(\boldsymbol{\beta}_{j}^{\ast})E\{I(C_{i}>d)|\boldsymbol{X}_{i},Z_{i}=j,T_{i}>d\}}{\pi_{ij}(\boldsymbol{\alpha}^{0})\exp\{-\Lambda_{ij}^{0}(d)\}}\right]
=E{mij(𝜷j)}.\displaystyle=E\left\{m_{ij}(\boldsymbol{\beta}_{j}^{\ast})\right\}.

It follows that μj0=E{mij(𝜷j)}\mu_{j}^{0}=E\left\{m_{ij}(\boldsymbol{\beta}_{j}^{\ast})\right\}, and μ^j𝑝μj0\hat{\mu}_{j}\overset{p}{\to}\mu_{j}^{0} when the treatment and censoring models are correctly specified. Note that when the stronger independence assumption (CT|ZC\mathrel{\text{\scalebox{1.07}{$\perp\mkern-10.0mu\perp$}}}T|Z) holds for survival and censoring times, only the treatment model is required to be correct. We have shown that the proposed estimator exhibits the so-called double robustness property.

3.2 Asymptotic Properties

In this section, we establish the asymptotic properties of our proposed estimator μ^j\hat{\mu}_{j}. For j=1,,Jj=1,\cdots,J, through a Taylor series expansion of μ^j=n1i=1nmij(𝜷^j)\hat{\mu}_{j}=n^{-1}\sum_{i=1}^{n}m_{ij}(\hat{\boldsymbol{\beta}}_{j}) about 𝜷j\boldsymbol{\beta}_{j}^{\ast},

n1/2(μ^jμj0)=n1/2i=1n{mij(𝜷j)μj0}+𝑨j(𝜷j)n1/2(𝜷^j𝜷j)+op(1),\displaystyle n^{1/2}(\hat{\mu}_{j}-\mu_{j}^{0})=n^{-1/2}\sum_{i=1}^{n}\left\{m_{ij}(\boldsymbol{\beta}^{\ast}_{j})-\mu_{j}^{0}\right\}+\boldsymbol{A}_{j}(\boldsymbol{\beta}_{j}^{\ast})n^{1/2}(\hat{\boldsymbol{\beta}}_{j}-\boldsymbol{\beta}_{j}^{\ast})+o_{p}(1), (7)

where 𝑨j(𝜷j)=E[𝑿iTmij(𝜷j){1mij(𝜷j)}]\boldsymbol{A}_{j}(\boldsymbol{\beta}_{j}^{\ast})=E\left[\boldsymbol{X}_{i}^{T}m_{ij}(\boldsymbol{\beta}_{j}^{\ast})\{1-m_{ij}(\boldsymbol{\beta}_{j}^{\ast})\}\right].

Equation (7) indicates that to characterize the asymptotic distribution of n1/2(μ^jμj0)n^{1/2}(\hat{\mu}_{j}-\mu_{j}^{0}), one first needs to identify the asymptotic distribution of n1/2(𝜷^j𝜷j)n^{1/2}(\hat{\boldsymbol{\beta}}_{j}-\boldsymbol{\beta}_{j}^{\ast}), which further depends on the asymptotic results for the parameters of the treatment and censoring models. Under some suitable regularity conditions, 𝜶^l𝑝𝜶l\hat{\boldsymbol{\alpha}}_{l}\overset{p}{\to}\boldsymbol{\alpha}_{l}^{\ast} for l=1,,J1l=1,\cdots,J-1, and the estimator of the treatment model parameter is asymptotically normal with

n1/2(𝜶^l𝜶l)=𝑯l1(𝜶)n1/2i=1n𝑽i{Dilπil(𝜶)}+op(1),\displaystyle n^{1/2}(\hat{\boldsymbol{\alpha}}_{l}-\boldsymbol{\alpha}_{l}^{\ast})=\boldsymbol{H}_{l}^{-1}(\boldsymbol{\alpha}^{\ast})n^{-1/2}\sum_{i=1}^{n}\boldsymbol{V}_{i}\left\{D_{il}-\pi_{il}(\boldsymbol{\alpha}^{\ast})\right\}+o_{p}(1), (8)

where 𝑯l(𝜶)=E[i=1n𝑽i𝑽iTπil(𝜶){1πil(𝜶)}]\boldsymbol{H}_{l}(\boldsymbol{\alpha}^{\ast})=E\left[\sum_{i=1}^{n}\boldsymbol{V}_{i}\boldsymbol{V}_{i}^{T}\pi_{il}(\boldsymbol{\alpha}^{\ast})\left\{1-\pi_{il}(\boldsymbol{\alpha}^{\ast})\right\}\right] with 𝜶=(𝜶1,,𝜶J1)T\boldsymbol{\alpha}^{\ast}=(\boldsymbol{\alpha}_{1}^{\ast},\cdots,\boldsymbol{\alpha}_{J-1}^{\ast})^{T}.

For the asymptotic distributions of the estimators γ^j\hat{\gamma}_{j} and Λ^ij\hat{\Lambda}_{ij}, we define the relevant notations 𝒔j(q)(t;𝜸j)\boldsymbol{s}_{j}^{(q)}(t;\boldsymbol{\gamma}_{j}) for q=0,1,2q=0,1,2, 𝒘¯j(t;𝜸j)\overline{\boldsymbol{w}}_{j}(t;\boldsymbol{\gamma}_{j}), dΛ0j(t)d\Lambda_{0j}^{\ast}(t), and dMij(t)dM_{ij}^{\ast}(t) in section A in the Supplementary Materials available at Biostatistics online. We further denote the counting process by Nij(t)=DijI{min(Ti,Ci)t,Δi=1}N_{ij}(t)=D_{ij}I\{\min(T_{i},C_{i})\leq t,\Delta_{i}=1\} and the at-risk process by Yij(t)=DijI{min(Ti,Ci)t}Y_{ij}(t)=D_{ij}I\{\min(T_{i},C_{i})\geq t\}. Let δ\delta be the time point that satisfies P{min(Ti,Ci)δ}>0P\{\min(T_{i},C_{i})\geq\delta\}>0 for i=1,,ni=1,\cdots,n, which is practically set to the maximum observation time. Lin and Wei15 showed that under some regularity conditions, 𝜸^j𝑝𝜸j\hat{\boldsymbol{\gamma}}_{j}\overset{p}{\to}\boldsymbol{\gamma}_{j}^{\ast}, and n1/2(𝜸^j𝜸j)n^{1/2}(\hat{\boldsymbol{\gamma}}_{j}-\boldsymbol{\gamma}_{j}^{\ast}) converges in distribution to a normal distribution

n1/2(𝜸^j𝜸j)=𝛀j1(𝜸j)n1/2i=1n𝑼ij(𝜸j)+op(1),\displaystyle n^{1/2}(\hat{\boldsymbol{\gamma}}_{j}-\boldsymbol{\gamma}_{j}^{\ast})=\boldsymbol{\Omega}^{-1}_{j}(\boldsymbol{\gamma}_{j}^{\ast})n^{-1/2}\sum_{i=1}^{n}\boldsymbol{U}_{ij}(\boldsymbol{\gamma}_{j}^{\ast})+o_{p}(1), (9)

where 𝛀j(𝜸j)=0δ{𝒔j(2)(t;𝜸j)sj(0)(t;𝜸j)𝒘¯j(t;𝜸j)2}E{Yij(t)λij(t)}𝑑t\boldsymbol{\Omega}_{j}(\boldsymbol{\gamma}_{j}^{\ast})=\int_{0}^{\delta}\left\{\frac{\boldsymbol{s}_{j}^{(2)}(t;\boldsymbol{\gamma}_{j}^{\ast})}{s_{j}^{(0)}(t;\boldsymbol{\gamma}_{j}^{\ast})}-\overline{\boldsymbol{w}}_{j}(t;\boldsymbol{\gamma}_{j})^{\otimes 2}\right\}E\{Y_{ij}(t)\lambda_{ij}(t)\}dt and 𝑼ij(𝜸j)=0δ{𝑾i𝒘¯(t;𝜸j)}𝑑Mij(t)\boldsymbol{U}_{ij}(\boldsymbol{\gamma}_{j}^{\ast})=\int_{0}^{\delta}\{\boldsymbol{W}_{i}-\overline{\boldsymbol{w}}(t;\boldsymbol{\gamma}_{j}^{\ast})\}dM_{ij}^{\ast}(t). Using (9), one can show that

n1/2{Λ^ij(t)Λij(t)}=𝑲ijT(t;𝜸j)𝛀1(𝜸j)n1/2i=1n𝑼ij(𝜸j)+exp(𝑾iT𝜸j)n1/2i=1n0tdMij(u)s(0)(u;𝜸j)+op(1),n^{1/2}\{\hat{\Lambda}_{ij}(t)-\Lambda_{ij}^{\ast}(t)\}=\boldsymbol{K}_{ij}^{T}(t;\boldsymbol{\gamma}_{j}^{\ast})\boldsymbol{\Omega}^{-1}(\boldsymbol{\gamma}_{j}^{\ast})n^{-1/2}\sum_{i=1}^{n}\boldsymbol{U}_{ij}(\boldsymbol{\gamma}_{j}^{\ast})\\ +\exp(\boldsymbol{W}_{i}^{T}\boldsymbol{\gamma}_{j}^{\ast})n^{-1/2}\sum_{i=1}^{n}\int_{0}^{t}\frac{dM_{ij}^{\ast}(u)}{s^{(0)}(u;\boldsymbol{\gamma}_{j}^{\ast})}+o_{p}(1), (10)

where 𝑲ij(t;𝜸j)=0t{𝑾i𝒘¯j(t;𝜸j)}𝑑Λij(u)\boldsymbol{K}_{ij}(t;\boldsymbol{\gamma}_{j}^{\ast})=\int_{0}^{t}\{\boldsymbol{W}_{i}-\overline{\boldsymbol{w}}_{j}(t;\boldsymbol{\gamma}_{j}^{\ast})\}d\Lambda_{ij}^{\ast}(u).

By a sequence of Taylor series expansion of n1i=1nDijRi𝑿i{Y~imij(𝜷^j)}πij(𝜶^)exp{Λ^ij(Li)}n^{-1}\sum_{i=1}^{n}\frac{D_{ij}R_{i}\boldsymbol{X}_{i}\{\tilde{Y}_{i}-m_{ij}(\hat{\boldsymbol{\beta}}_{j})\}}{\pi_{ij}(\hat{\boldsymbol{\alpha}})\exp\{-\hat{\Lambda}_{ij}(L_{i})\}} (see Section A in the Supplementary Materials) and combining the results of (8), (9), and (10), it follows that

n1/2(𝜷^j𝜷j)=𝑩j1(𝜷j,𝜶,Λij)n1/2i=1nDijRi𝑿i{Y~imij(𝜷j)}πij(𝜶)exp{Λij(Li)}+𝑩j1(𝜷j,𝜶,Λij)l=1J1[𝑭jl(𝜷j,𝜶,Λij)𝑯l1(𝜶)n1/2i=1n𝑽i{Dilπil(𝜶)}]+𝑩j1(𝜷j,𝜶,Λij)𝑷j(𝜷j,𝜶,Λij)𝛀j1(𝜸j)n1/2i=1n𝑼ij(𝜸j)+𝑩j1(𝜷j,𝜶,Λij)𝑸j(𝜷j,𝜶,Λij)n1/2i=1n0tdMij(u)s(0)(u;𝜸j)+op(1).n^{1/2}(\hat{\boldsymbol{\beta}}_{j}-\boldsymbol{\beta}_{j}^{\ast})=\boldsymbol{B}_{j}^{-1}(\boldsymbol{\beta}_{j}^{\ast},\boldsymbol{\alpha}^{\ast},\Lambda_{ij}^{\ast})n^{-1/2}\sum_{i=1}^{n}\frac{D_{ij}R_{i}\boldsymbol{X}_{i}\{\tilde{Y}_{i}-m_{ij}(\boldsymbol{\beta}_{j}^{\ast})\}}{\pi_{ij}(\boldsymbol{\alpha}^{\ast})\exp\{-\Lambda_{ij}^{\ast}(L_{i})\}}\\ +\boldsymbol{B}_{j}^{-1}(\boldsymbol{\beta}_{j}^{\ast},\boldsymbol{\alpha}^{\ast},\Lambda_{ij}^{\ast})\sum_{l=1}^{J-1}\left[\boldsymbol{F}_{jl}\left(\boldsymbol{\beta}_{j}^{\ast},\boldsymbol{\alpha}^{\ast},\Lambda_{ij}^{\ast}\right)\boldsymbol{H}_{l}^{-1}(\boldsymbol{\alpha}^{\ast})n^{-1/2}\sum_{i=1}^{n}\boldsymbol{V}_{i}\{D_{il}-\pi_{il}(\boldsymbol{\alpha}^{\ast})\}\right]\\ +\boldsymbol{B}_{j}^{-1}(\boldsymbol{\beta}_{j}^{\ast},\boldsymbol{\alpha}^{\ast},\Lambda_{ij}^{\ast})\boldsymbol{P}_{j}(\boldsymbol{\beta}_{j}^{\ast},\boldsymbol{\alpha}^{\ast},\Lambda_{ij}^{\ast})\boldsymbol{\Omega}_{j}^{-1}(\boldsymbol{\gamma}_{j}^{\ast})n^{-1/2}\sum_{i=1}^{n}\boldsymbol{U}_{ij}(\boldsymbol{\gamma}_{j}^{\ast})\\ +\boldsymbol{B}_{j}^{-1}(\boldsymbol{\beta}_{j}^{\ast},\boldsymbol{\alpha}^{\ast},\Lambda_{ij}^{\ast})\boldsymbol{Q}_{j}(\boldsymbol{\beta}_{j}^{\ast},\boldsymbol{\alpha},\Lambda_{ij}^{\ast})n^{-1/2}\sum_{i=1}^{n}\int_{0}^{t}\frac{dM_{ij}^{\ast}(u)}{s^{(0)}(u;\boldsymbol{\gamma}_{j}^{\ast})}+o_{p}(1). (11)

where 𝑩j\boldsymbol{B}_{j}, 𝑭jl\boldsymbol{F}_{jl}, 𝑷j\boldsymbol{P}_{j}, and 𝑸j\boldsymbol{Q}_{j} are defined in section A in the Supplementary Materials available at Biostatistics online.

Plugging (11) into (7), we can represent n1/2(μ^jμj)n^{1/2}(\hat{\mu}_{j}-\mu_{j}) as n1/2i=1nψij+op(1)n^{-1/2}\sum_{i=1}^{n}\psi_{ij}+o_{p}(1), where

ψij=\displaystyle\psi_{ij}= mij(𝜷j)μj+𝑨j(𝜷j)𝑩j1(𝜷j,𝜶,Λij)DijRi𝑿i{Y~imij(𝜷j)}πij(𝜶)exp{Λij(Li)}\displaystyle m_{ij}(\boldsymbol{\beta}^{\ast}_{j})-\mu_{j}+\boldsymbol{A}_{j}(\boldsymbol{\beta}_{j}^{\ast})\boldsymbol{B}_{j}^{-1}(\boldsymbol{\beta}_{j}^{\ast},\boldsymbol{\alpha}^{\ast},\Lambda_{ij}^{\ast})\frac{D_{ij}R_{i}\boldsymbol{X}_{i}\{\tilde{Y}_{i}-m_{ij}(\boldsymbol{\beta}_{j}^{\ast})\}}{\pi_{ij}(\boldsymbol{\alpha}^{\ast})\exp\{-\Lambda_{ij}^{\ast}(L_{i})\}}
+𝑨j(𝜷j)𝑩j1(𝜷j,𝜶,Λij)l=1J1𝑭jl(𝜷j,𝜶,Λij)𝑯l1(𝜶)𝑽i{Dilπil(𝜶)}\displaystyle+\boldsymbol{A}_{j}(\boldsymbol{\beta}_{j}^{\ast})\boldsymbol{B}_{j}^{-1}(\boldsymbol{\beta}_{j}^{\ast},\boldsymbol{\alpha}^{\ast},\Lambda_{ij}^{\ast})\sum_{l=1}^{J-1}\boldsymbol{F}_{jl}\left(\boldsymbol{\beta}_{j}^{\ast},\boldsymbol{\alpha}^{\ast},\Lambda_{ij}^{\ast}\right)\boldsymbol{H}_{l}^{-1}(\boldsymbol{\alpha}^{\ast})\boldsymbol{V}_{i}\{D_{il}-\pi_{il}(\boldsymbol{\alpha}^{\ast})\}
+𝑨j(𝜷j)𝑩j1(𝜷j,𝜶,Λij)𝑷j(𝜷j,𝜶,Λij)𝛀j1(𝜸j)𝑼ij(𝜸j)\displaystyle+\boldsymbol{A}_{j}(\boldsymbol{\beta}_{j}^{\ast})\boldsymbol{B}_{j}^{-1}(\boldsymbol{\beta}_{j}^{\ast},\boldsymbol{\alpha}^{\ast},\Lambda_{ij}^{\ast})\boldsymbol{P}_{j}(\boldsymbol{\beta}_{j},\boldsymbol{\alpha}^{\ast},\Lambda_{ij}^{\ast})\boldsymbol{\Omega}_{j}^{-1}(\boldsymbol{\gamma}_{j}^{\ast})\boldsymbol{U}_{ij}(\boldsymbol{\gamma}_{j}^{\ast})
+𝑨j(𝜷j)𝑩j1(𝜷j,𝜶,Λij)𝑸j(𝜷j,𝜶,Λij)0LidMij(u)s(0)(u;𝜸j).\displaystyle+\boldsymbol{A}_{j}(\boldsymbol{\beta}_{j}^{\ast})\boldsymbol{B}_{j}^{-1}(\boldsymbol{\beta}_{j}^{\ast},\boldsymbol{\alpha}^{\ast},\Lambda_{ij}^{\ast})\boldsymbol{Q}_{j}(\boldsymbol{\beta}_{j}^{\ast},\boldsymbol{\alpha}^{\ast},\Lambda_{ij}^{\ast})\int_{0}^{L_{i}}\frac{dM_{ij}^{\ast}(u)}{s^{(0)}(u;\boldsymbol{\gamma}_{j}^{\ast})}.

By the central limit theorem, n1/2i=1nψijn^{-1/2}\sum_{i=1}^{n}\psi_{ij} converges in distribution to a normal distribution with mean 0 and variance E(ψij2)E(\psi_{ij}^{2}).

4 Methods under Comparison

We compare our proposed method with several alternative approaches. The first is to leave out censored subjects and apply standard inverse probability weighted (IPW) method to the data with completely observed outcome only. The corresponding estimator for the average potential outcome in treatment group jj is given by

μ^j,IPW=n1i=1nDijRiY~iπij(𝜶^).\displaystyle\hat{\mu}_{j,\text{IPW}}=n^{-1}\sum_{i=1}^{n}\frac{D_{ij}R_{i}\tilde{Y}_{i}}{\pi_{ij}(\hat{\boldsymbol{\alpha}})}.

The second approach considered builds on the IPW estimator and, along the line of Anstrom and Tsiatis,5 further weights the subjects by the inverse probability of not being coarsened, namely, CIPW estimator

μ^j,CIPW=n1i=1nDijRiY~iπij(𝜶^)exp{Λ^ij(Li)}.\displaystyle\hat{\mu}_{j,\text{CIPW}}=n^{-1}\sum_{i=1}^{n}\frac{D_{ij}R_{i}\tilde{Y}_{i}}{\pi_{ij}(\hat{\boldsymbol{\alpha}})\exp\{-\hat{\Lambda}_{ij}(L_{i})\}}.

The third is the estimator of Wang et al,6 which is a doubly robust estimator for average treatment effect using an augmented inverse probability weighted method, and we label it CAIPW-Wang. Let hij(𝝎j)h_{ij}(\boldsymbol{\omega}_{j}) be a posited model, in this case a logistic regression model, for E(Y~i|Zi=j,𝑿~i)E(\tilde{Y}_{i}|Z_{i}=j,\tilde{\boldsymbol{X}}_{i}). The estimates for the parameter 𝝎j\boldsymbol{\omega}_{j}, denoted by 𝝎^j\hat{\boldsymbol{\omega}}_{j}, are obtained by solving the score functions weighted by the inverse probability of not being censored. The final estimator is given by

μ^j,CAIPW-Wang=(i=1nwi)1i=1nwi{DijY~iπij(𝜶^)Dijπij(𝜶^)πij(𝜶^)hij(𝝎^j)},\displaystyle\hat{\mu}_{j,\text{CAIPW-Wang}}=\left(\sum_{i=1}^{n}w_{i}\right)^{-1}\sum_{i=1}^{n}w_{i}\left\{\frac{D_{ij}\tilde{Y}_{i}}{\pi_{ij}(\hat{\boldsymbol{\alpha}})}-\frac{D_{ij}-\pi_{ij}(\hat{\boldsymbol{\alpha}})}{\pi_{ij}(\hat{\boldsymbol{\alpha}})}h_{ij}(\hat{\boldsymbol{\omega}}_{j})\right\},

where wi=i=1n{Δi/j=1JDijK^j[min(Ti,Ci)]}w_{i}=\sum_{i=1}^{n}\{\Delta_{i}/\sum_{j=1}^{J}D_{ij}\hat{K}_{j}[\min(T_{i},C_{i})]\} and K^j(t)\hat{K}_{j}(t) is the treatment-specific Kaplan-Meier (KM) estimator.

The fourth is to apply standard causal inference methods, such as IPW, to pseudo-values of the outcome,8 and we call it Pseudo-IPW. Suppose that the parameter of interest is θ=E{I(Tid)}\theta=E\{I(T_{i}\geq d)\}. The pseudo-observations for subject ii is defined as θi=nθ^(n1)θ^i\theta_{i}=n\hat{\theta}-(n-1)\hat{\theta}^{-i}, where θ^\hat{\theta} is the KM estimator and θ^i\hat{\theta}^{-i} is the estimator applied to the sample from which subject ii is excluded. The method is implemented using the pseudo package in R.17 The pseudo-observations can be viewed as a replacement for the (possibly incompletely observed) outcome variable, and censoring is taken care of in the computation of pseudo-observations. In this case, the pseudo-observations are calculated assuming the independence of TT and CC given ZZ.

The fifth is the estimator of Zhang and Schaubel,7 which is originally designed for estimating the restricted mean lifetimes, and involves modeling the entire survival curve of the event time using Cox proportional hazards models, rather than focusing on a fixed time point as in the aforementioned approaches and our proposed method. It first estimates the cumulative hazard function for the event time TT for each group jj, denoted by Λ^j(t)\hat{\Lambda}_{j}(t), by augmenting an inverse probability weighted estimating equation with additional terms that involve outcome models. Then one can estimate the survival probability μj(t)\mu_{j}(t) at any time point tt by μ^j(t)=eΛ^j(t)\hat{\mu}_{j}(t)=e^{-\hat{\Lambda}_{j}(t)}. Therefore, the method of Zhang and Schaubel can also be used for evaluating the risk at a specific time point dd. We label their approach based on the inverse probability weighted estimating function as CIPW-ZS, and the augmented version as CAIPW-ZS. CAIPW-ZS is doubly robust under Assumption V, such that the estimator is consistent when either the time-to-event or the coarsening mechanism is correctly modeled.

The method of Zhang and Schaubel is developed within the general framework of augmented inverse probability weighting, whereas the proposed method is a standardization method. Other than this, another key difference between these two methods is that the former uses the Cox models but the proposed method uses logistic regression models as working models to improve efficiency of the treatment effect estimator. If the interest only lies in a binary outcome (i.e., the occurrence of the event within a specific time point), theoretically one only needs to model the relationship of the binary outcome with covariates to improve efficiency. The method of Zhang and Schaubel requires modeling the relationship of the hazard function (equivalently, the survival curve), not limited to a specific time point, with covariates. This tends to be an overkill for our purpose and increase the chance of model misspecification. When the Cox model is severely misspecified, CAIPW-ZS may provide little efficiency gain for the treatment effect estimates. We illustrate this point in one of our simulation settings.

We further consider the simple difference in the average outcome of each treatment group, as a benchmark, and call it the Naive estimator. The Naive estimator ignores both confounding and censoring. Sample code for the methods described in this section can be found at https://github.com/youfeiyu/CIPWR.

Among these methods, Naive and IPW estimators fail to account for censoring. Pseudo-IPW and AIPW-Wang assume that TT and CC are independent conditional on ZZ. CIPW-ZS and CAIPW-ZS, along with our proposed method CIPWR, rely on a more relaxed assumption that TT and CC are independent conditional on ZZ and 𝑿~\tilde{\boldsymbol{X}}. CAIPW-Wang, CAIPW-ZS, and CIPWR leverage the information about the outcome model, which asymptotically improves the precision of the estimates. Moreover, the double robustness property of CAIPW-Wang and CAIPW-ZS are provided by the augmentation terms in the estimating equations, while the proposed CIPWR achieves double robustness through standardization of the weighted outcome model.

5 Simulation Studies

We compared the finite sample performance of our proposed method to the six alternative approaches described in Section 4 through simulation studies. Specifically, we considered two settings that varied in degrees of nonproportionality with respect to the hazard functions for our simulation. The first setting assumed a logistic distribution for the time to event such that the hazard functions did not cross. The second setting concerned hazard functions that crossed at a certain time point for subjects with different covariate values, in which case Cox proportional hazards model tended to perform poorly in terms of improving precision.

5.1 Simulation Setting I: Non-crossing Hazards

For the first setting, each simulated dataset contained five baseline covariates. X1X_{1}, X2X_{2}, and X3X_{3} were independently sampled from a standard normal distribution. X4Bernoulli(0.4)X_{4}\sim\text{Bernoulli}(0.4) and X5Uniform(2,2)X_{5}\sim\text{Uniform}(-2,2). The treatment assignment ZZ was simulated from a categorical distribution with the probability of receiving treatment jj being

exp(αj0+αj1X1+αj2X2+αj4X4)z=13exp(αz0+αz1X1+αz2X2+αz4X4)\displaystyle\frac{\exp(\alpha_{j0}+\alpha_{j1}X_{1}+\alpha_{j2}X_{2}+\alpha_{j4}X_{4})}{\sum_{z=1}^{3}\exp(\alpha_{z0}+\alpha_{z1}X_{1}+\alpha_{z2}X_{2}+\alpha_{z4}X_{4})}

for j=1,2,3j=1,2,3. The potential time to event T(j)T^{(j)} was sampled from a logistic distribution with mean function βj0+βj1X1+βj2X2+βj3X3\beta_{j0}+\beta_{j1}X_{1}+\beta_{j2}X_{2}+\beta_{j3}X_{3} and scale parameter s=7s=7. The potential outcome Y(j)Y^{(j)} is defined as Y(j)=I{T(j)<d}Y^{(j)}=I\{T^{(j)}<d\}, where d=130d=130. We generated the censoring time CC using inverse transform sampling.18 In particular, we assumed a Cox proportional hazard model with the baseline hazard following a Weibull distribution,

C(j)={λ1exp(γj0+γj1X1+γj2X2+γj5X5)1logu}1/ν,\displaystyle C^{(j)}=\{\lambda^{-1}\exp(\gamma_{j0}+\gamma_{j1}X_{1}+\gamma_{j2}X_{2}+\gamma_{j5}X_{5})^{-1}\log u\}^{1/\nu},

where the scale parameter λ=0.01\lambda=0.01, the shape parameter ν=7\nu=7, and uu was randomly sampled from a uniform distribution with interval [0,1].

We defined a ‘baseline’ scenario where the outcome was weakly associated with the covariates and the proportion of being censored by d=130d=130 was 30%30\%. Then we varied the corresponding parameters to induce three proportions of being censored by d=130d=130 (20%20\%, 30%30\%, and 40%40\%) and two levels of associations with the outcome (strong and weak). We also considered a scenario where censoring was independent of the covariates, referring to it as the random censoring scenario, with 30% of the subjects being censored by d=130d=130. The values of the parameters chosen in Setting I are listed in Table B.1. The true values for the estimands E{Y(1)}E\{Y^{(1)}\}, E{Y(2)}E\{Y^{(2)}\}, and E{Y(3)}E\{Y^{(3)}\} were respectively 0.36, 0.50, and 0.63 for the scenarios of weak outcome associations, and 0.40, 0.59, and 0.50 when the outcome associations were strong.

The propensity scores were estimated using a multinomial logistic regression model, and the probability of remaining uncensored at dd was estimated by a Cox proportional hazards model. For the random censoring scenario, the probability of remaining uncensored was also estimated using the treatment-specific KM estimator. For the scenario with the largest proportion of censored observations (\sim40%), we further considered the case in which the censoring time was observed for all subjects, as was the case in our data example, and evaluated the performance of CIPWR based on the observed censoring time (in contrast to the observation time). Across the scenarios, we considered three sets of model specifications for the CIPWR estimator: (1) correctly specified models for outcome, treatment, and censoring, (2) correctly specified models for treatment and censoring only, and (3) a correctly specified outcome model only. The misspecification for each model was caused by removing the confounder X2X_{2}. The CAIPW-ZS estimator assumed a Cox proportional hazard model for the survival time, and therefore the outcome model was always misspecified in this case. The CAIPW-Wang method only considered an outcome model and a propensity score model, since the survival function of the censoring time was estimated by the KM estimator.

5.2 Simulation Setting II: Crossing Hazards

Two covariates independently sampled from the standard normal distribution, X1X_{1} and X2X_{2}, were considered for the setting of crossed hazard functions. We assumed a multiphase model for the event time, where the effects of risk factors on the hazards differed by phases. The varying effects over time of risk factors are often seen in the setting of surgery. Specifically, the event time was generated such that the hazard functions crossed at some time point, and the equations we used to obtain the event time are listed in Section C of the Supplementary Materials. The probability of being assigned to treatment jj was exp(αj1X1+αj2X2)/z=13exp(αz1X1+αz2X2)\exp(\alpha_{j1}X_{1}+\alpha_{j2}X_{2})/\sum_{z=1}^{3}\exp(\alpha_{z1}X_{1}+\alpha_{z2}X_{2}). Censoring was generated using C=λ1exp{γ1X1+γ2X2+θ1I(Z=2)+θ2I(Z=3)}1loguC=-\lambda^{-1}\exp\{\gamma_{1}X_{1}+\gamma_{2}X_{2}+\theta_{1}I(Z=2)+\theta_{2}I(Z=3)\}^{-1}\log u. In the first scenario, we assumed that the treatment assignment and censoring time only depended on X1X_{1}, such that 𝜶1=(α11,α12)T=(0,0)T\boldsymbol{\alpha}_{1}=(\alpha_{11},\alpha_{12})^{T}=(0,0)^{T}, 𝜶2=(α21,α22)T=(0.2,0)T\boldsymbol{\alpha}_{2}=(\alpha_{21},\alpha_{22})^{T}=(0.2,0)^{T}, 𝜶3=(α31,α32)T=(0.3,0)T\boldsymbol{\alpha}_{3}=(\alpha_{31},\alpha_{32})^{T}=(0.3,0)^{T}, and (λ,γ1,γ2,θ1,θ2)T=(0.8,1,0,0.2,0.4)T(\lambda,\gamma_{1},\gamma_{2},\theta_{1},\theta_{2})^{T}=(0.8,1,0,0.2,0.4)^{T}. In the second scenario, we let X2X_{2} come into play, such that 𝜶1=(α11,α12)T=(0,0)T\boldsymbol{\alpha}_{1}=(\alpha_{11},\alpha_{12})^{T}=(0,0)^{T}, 𝜶2=(α21,α22)T=(0.2,0.2)T\boldsymbol{\alpha}_{2}=(\alpha_{21},\alpha_{22})^{T}=(0.2,0.2)^{T}, 𝜶3=(α31,α32)T=(0.3,0.3)T\boldsymbol{\alpha}_{3}=(\alpha_{31},\alpha_{32})^{T}=(0.3,0.3)^{T}, and (λ,γ1,γ2,θ1,θ2)T=(0.7,0.5,0.5,0.4,0.2)T(\lambda,\gamma_{1},\gamma_{2},\theta_{1},\theta_{2})^{T}=(0.7,-0.5,0.5,0.4,0.2)^{T}. The cutoff point dd was chosen to be 0.5 and 0.3 for the first and second scenario, respectively, which led to 30.7% and 13.2% of censored observations for the corresponding dd. The true values for E{Y(1)}E\{Y^{(1)}\}, E{Y(2)}E\{Y^{(2)}\}, and E{Y(3)}E\{Y^{(3)}\} were 0.68, 0.62, and 0.52 for the first scenario, and 0.54, 0.45, and 0.41 for the second scenario.

The models for the treatment assignment and censoring were correctly specified in this setting. The logistic regression model and the Cox model for the outcome were misspecified such that squared terms of the covariates and interactions (if applicable) were included in the model.

5.3 Evaluation Metrics

For each scenario, we generated 20002000 Monte Carlo datasets, each with n=1500n=1500 subjects. For CIPWR, the standard errors and 95% confidence intervals (CIs) were estimated using the formula for asymptotic variance derived in Section 3.2. For the other methods considered, 200 bootstrap replicates were used to estimate the standard errors and 95% CI. Simulation results are presented in terms of bias, empirical standard deviation, root mean squared error (RMSE) and coverage rate of 95% CI.

5.4 Simulation Results

The numerical results for the simulation are reported in Tables B.2-B.7 in the Supplementary Materials, and are summarized in Figures 1-3 in the main text and Figures B.1-B.7 in the Supplemental Materials. Figures B.1-B.5 show the results for bias for the methods under comparison. Naive and IPW estimates were expectedly biased away from the true risk differences in all the scenarios in both settings (Figures B.1-B.3 in the Supplemental Materials), as they failed to accommodate censoring. When censoring was unrelated to baseline covariates (the random censoring scenario), the rest of the estimators considered were consistent for the true values (Figure B.1 in the Supplemental Materials). CIPWR estimators had close to zero bias regardless of whether the probability of remaining uncensored was estimated by Cox model or KM estimator (Figures B.4 in the Supplemental Materials). When censoring depended on the covariates, the bias for Pseudo-IPW and CAIPW-Wang estimators, which relied on the restrictive independence assumption CT|ZC\mathrel{\text{\scalebox{1.07}{$\perp\mkern-10.0mu\perp$}}}T|Z, became non-negligible. For example, the relative bias for Pseudo-IPW with a correctly specified propensity score model reached 21.5% in the scenario with 40% censoring (Table B.5 in the Supplemental Materials). CAIPW-ZS and CIPWR with at least one of the models for outcome and coarsening mechanism correctly specified remained unbiased across the scenarios considered in both settings, exhibiting double robustness property (Figures B.1-B.3 in the Supplemental Materials). When the probability of remaining uncensored was estimated based on the observed censoring time rather than the observation time, the empirical bias for CIPWR remained close to zero (Figure B.5 in the Supplemental Materials).

The RMSE results for Setting I are displayed in Figures 1 and 2. We report the ratio of RMSE to the RMSE of CIPW with correctly modeled coarsening mechanism. Figure 1 summarizes the results for different censoring mechanisms and proportions of being censored. In general, CAIPW-ZS and CIPWR had the smallest RMSE among the methods considered across all treatment pair comparisons. As the censoring proportions increased, we observed larger gain in efficiency for CAIPW-ZS and CIPWR over CIPW. Cox model-based and KM estimator-based CIPWR yielded similar RMSE in the random censoring scenario (Figure B.6 in the Supplemental Materials), with the former having slightly smaller RMSE than the latter in some cases. This finding suggests that estimating the probability of remaining uncensored from the data, even if the value is known, may actually lead to smaller variance for the CIPWR estimator than using the true value, which is consistent with the theoretical results in the literature.9 Moreover, estimating the probability of remaining uncensored using observation time tended to reduce the RMSE compared with using observed censoring time (Figure B.7 in the Supplemental Materials). Figure 2 displays the RMSE results for different levels of outcome associations. When the associations between the covariates and the outcome were weak, we observed 2.6%-3.7% reduction in RMSE for CIPWR and CAIPW-ZS. Greater reduction (7.6%-10.5%) was noted as the associations became stronger. The RMSE results for Setting II where crossing hazards existed are presented in Figure 3. Note that in this setting both the logistic regression model and the Cox model for the outcome were misspecified. Again, we observed lower RMSE for CIPWR and CAIPW-ZS compared to CIPW. Furthermore, CAIPW-ZS produced larger variability (and therefore larger RMSE) than CIPWR when the hazard functions had a crossing (Figure 3 and Table B.7). For example, the ratios of the variance of CAIPW-ZS to that of CIPWR ranged from 1 to 1.06 in the first scenario, and from 1.05 to 1.08 in the second scenario.

Refer to caption

Figure 1: RMSE over RMSE of CIPW with correctly specified propensity and censoring models for different proportions of censoring in Setting I. For CAIPW-Wang, the first letter and second letter denote the specification of the propensity and outcome model, respectively. For CIPWR and CAIPW-ZS, the first and second letter in the parentheses correspond to the model for coarsening mechanism and outcome, respectively. The outcome model in CAIPW-ZS is always misspecified, and we use c to denote the case where the true predictors for the outcome were included in the model. Propensity model is correctly specified for IPW, Pseudo-IPW, CIPW, and CIPW-ZS. Numbers that fall outside the range of x-axis are labeled in the figure. Sample size was 1500. Results were obtained using 2000 simulated datasets.

Refer to caption

Figure 2: RMSE over RMSE of CIPW with correctly specified propensity and censoring models for different levels of outcome associations in Setting I. Censoring depended on covariates and censoring proportion was 30%. For CAIPW-Wang, the first letter and second letter denote the specification of the propensity and outcome model, respectively. For CIPWR and CAIPW-ZS, the first and second letter in the parentheses correspond to the model for coarsening mechanism and outcome, respectively. The outcome model in CAIPW-ZS is always misspecified, and we use c to denote the case where the true predictors for the outcome were included in the model. Propensity model is correctly specified for IPW, Pseudo-IPW, CIPW, and CIPW-ZS. Numbers that fall outside the range of x-axis are labeled in the figure. Sample size was 1500. Results were obtained using 2000 simulated datasets.

Refer to caption

Figure 3: RMSE over RMSE of CIPW with correctly specified propensity and censoring models in the presence of crossing hazards in Setting II. The models for the coarsening mechanism were correctly specified. The outcome model was always misspecified in this setting. Numbers that fall outside the range of x-axis are labeled in the figure. Sample size was 1500. Results were obtained using 2000 simulated datasets.

All methods that were approximately unbiased in the presence of nonrandom censoring (i.e., CIPW, CIPW-ZS, CAIPW-ZS, and CIPWR) achieved close to nominal coverage of 95% (Tables B.2-B.7 in the Supplemental Materials).

6 Application to Comparison of Treatments for Prostate Cancer using Medical Claims

6.1 Data Analysis Methods

We applied our proposed method to a data set comprised of patients with metastatic castration-resistant prostate cancer (mCRPC), which was obtained from a large national private health insurance network (Optum Clinformatic Data Mart). The study cohort included patients who used at least one of the six drugs (docetaxel, abiraterone, enzalutamide, sipuleucel-T, cabazitaxel, and radium-223) approved to treat mCRPC from January 1, 2014, to December 31, 2019. Among these drugs, docetaxel and cabazitaxel are chemotherapies, abiraterone and enzalutamide are oral hormone therapies, sipuleucel-T is an immunotherapy, and radium-223 is a radioactive drug. We excluded the patients who received cabazitaxel (n=56n=56) or radium-223 (n=28n=28) as their first-line therapy from our analysis, since there were much fewer samples in these two groups than the other four. We examined the occurrence of ER visits and all-cause hospitalization within 180, 270, and 360 days of treatment initiation, respectively. Patients who switched to another treatment or dropped out of the insurance plan prior to the event of interest within the pre-specified time window were considered as being censored.

The treatment was modeled using a multinomial logistic regression adjusting for age, race, education level, household income, geographic region, insurance product type, whether the insurance plan is administrative services only (ASO), metastatic status of cancer, year of first prescription, comorbid conditions, and provider type.19 All covariates were binary or categorical. To improve the common support of the covariate distributions, we followed the criteria discussed in Lopez and Gutman20 and discarded the tails of the propensity score distributions. The outcome model and the censoring model adjusted for the same set of covariates that were controlled for in the treatment model. The CIs were obtained using (1) Wald-type CIs based on original data for the Naive method, and (2) bootstrap standard errors based on 200 bootstrap samples for the rest of the methods.

6.2 Data Analysis Results

Patients who had less than 180 days of continuous enrollment prior to the first prescription, had missing covariates, or experienced the event of interest on the same day as the first prescription were removed from the analysis. In the end, we identified 7678 and 7709 mCRPC patients for ER visit and hospitalization, respectively, and calling them ER visit cohort and hospitalization cohort. The sample sizes of the two cohorts differed because ER visit and initial treatment prescription for the first time were more likely to occur on the same day than subsequent hospitalization. The proportions of overall and cause-specific censoring within each specified time window for the two outcomes are reported in Table B.8 in the Supplementary Materials. In general, hospitalization (24.6%-40.6%) was associated with greater overall proportion of censored observations than ER visits (20.8%-32.6%). The proportions of patients being censored and the unadjusted risks ignoring censored patients for each treatment group are presented in Table B.9 in the Supplementary Materials. In particular, Sipuleucel-T group had larger percentage of censored patients than the other three groups. Docetaxel group had the highest crude risks of ER visits (53.6%, 64.6%, and 71.8% within 180, 270, and 360-day time windows, respectively) and hospitalization (41.1%, 52.3%, and 60.2% within 180, 270, and 360-day time windows, respectively). The baseline demographic and clinical characteristics of the ER visit cohort stratified by treatment groups are presented in Table B.10 in the Supplemental Materials. The covariate distributions of the hospitalization cohort were close to those of the ER visit cohort (data not shown).

To improve the covariate overlap among the treatment groups for the comparative analysis, we applied data trimming with criteria discussed in Lopez and Gutman,20 which left us with 7003 and 7045 patients for ER visit and hospitalization, respectively.

Figure 4 shows the differences in 360-day risks among the four treatment groups for both outcomes of interest. Results for 180-day and 270-day risks are presented in Figures B.8 and B.9 in the Supplemental Materials, respectively. When confounding and censoring were both ignored, docetaxel users had significantly higher risk of at least one ER visit within 360 days of treatment initiation than users of abiraterone, enzalutamide, and sipuleucel-T. Similar directional results for docetaxel vs. abiraterone and docetaxel vs. enzalutamide comparisons were noted for methods that accounts for both confounding and censoring, and the corresponding 95% CIs consistently excluded zero across the methods. For example, the risk difference estimated by the CIPWR estimator using observation time (CIPWR1) was -0.082 (95% CI [-0.118, -0.046]) for docetaxel vs. abiraterone comparison, and -0.156 (95% CI [-0.198, -0.114]) for docetaxel vs. enzalutamide comparison. These findings agree with the clinical evidence that oral therapies abiraterone and enzalutamide tend to have fewer side effects than docetaxel, a chemotherapy.21 Sipeucel-T was identified to have lower risk of ER visit than docetaxel, though the differences were not significant for some of the methods (e.g., risk difference=-0.158, 95% CI [-0.295, -0.021] for CAIPW-ZS; risk difference=-0.073, 95% CI [-0.172, 0.025] for CIPWR1). For the two oral drugs, Enzalutamide was identified to have lower risk of ER visit than Abiraterone within each specified specified period of time when both confounding and censoring were accounted for. The Naive and IPW methods indicated significantly higher 360-day risk of ER visit for Enzalutamide than Siputeucel-T, while the methods that account for both confounding and censoring showed that there was no significant difference. Similar patterns were observed for the risks of all-cause hospitalization for each time window considered.

Refer to caption

Figure 4: Differences in 360-day risks of experiencing at least one emergency room visit or hospitalization among the four focus drugs and the associated 95% confidence intervals. Data were obtained from Optum Clinformative Data Mart. Total sample size was N=7003N=7003 (NA=2458N_{A}=2458, ND=2162N_{D}=2162, NE=1833N_{E}=1833, NS=550N_{S}=550) for ER visits, and N=7045N=7045 (NA=2474N_{A}=2474, ND=2172N_{D}=2172, NE=1843N_{E}=1843, NS=556N_{S}=556) for hospitalization. CIPWR1 was based on observation time, and CIPWR2 was based on observed censoring time. Confidence intervals that exclude zero are highlighted in orange. Abbreviations: A, abiraterone; D, docetaxel; E, enzalutamide; S, sipuleucel-T; ER, emergency room visit.

In general, CIPWR based on observed censoring time (CIPWR2) tended to have wider CIs than CIPWR using observation time (CIPWR1), which is consistent with our simulation results (Figure B.7 in the Supplementary Materials). For example, the ratios of confidence widths of CIPWR1 over CIPWR2 for 360-day hospitalization ranged from 54.9% to 86.9%. Greater differences in the width of CIs were noted as the duration of time window became longer and the proportion of censored patients increased (Figures B.8 and B.9 in the Supplementary Materials and Figure 4). In most cases, CAIPW-ZS and CIPWR yielded narrower CIs than the CIPW estimator. For example, the percentage of reduction in width ranged from 58.8% to 92.4% for the risk of ER visits estimated by CIPWR1. CAIPW-ZS was noted to have wider CIs than CIPWR for treatment pairs that involved Sipuleucel-T for ER visits (Figure 4). Greater differences in point estimates between CAIPW-ZS and CIPWR, the former of which modeled the entire survival curve over time, was noted as the ending time point moved farther away from the treatment initiation.

Differences between methods that ignore and account for censoring increased as the time window was extended. Results for methods that rely on different independence assumptions on the censoring mechanism were similar. One possible reason is that censoring may only depend on the treatment in this cohort, and the restrictive version of the assumption is satisfied, as the censored patients within each time window exhibited similar demographic and baseline clinical characteristics to uncensored ones (Table B.11 in the Supplemental Materials), and most covariates were not significantly associated with censoring (Table B.12 in the Supplemental Materials).

7 Discussion

We present an inverse probability weighted regression-based estimator, CIPWR, for average treatment effect for a binary outcome that is subject to right-censoring. This method is based on the intuitively simple standardization idea, where we model the binary outcome given the observed covariates using the familiar logistic regression model for each treatment separately and then averaging predictions for all patients. The CIPWR method improves robustness by accounting for confounding due to nonrandomized treatment and censoring using the inverse probability weighting approach. Therefore, the proposed method is a hybrid of the two general approaches (standardization and weighting) in the missing data and causal inference literature that handle missingness, confounding and censoring. Like the well-studied augmented inverse probability weighting approach (e.g., CAIPW-Wang and CAIPW-ZS), the proposed method enjoys a double robustness property such that the estimator is consistent if either the (binary) outcome, or both treatment assignment and censoring are correctly modeled. However, in this method the double robustness and improvement in efficiency are not through direct augmentation. Instead, it achieves double robustness by combining two approaches in different stpdf, with each step based on popular models and the natural ideas of standardization and weighting. The proposed method is conceptually straightforward to understand and easy to implement using standard statistical software for practitioners.

Simulation studies show that in finite sample, CIPWR yielded approximately unbiased estimates and close to nominal coverage of 95% across the scenarios considered, particularly when censoring depends on the covariates. CIPWR also provides efficiency gain over CIPW by exploiting the information from the outcome model. The proposed method was applied to claims data for comparing the average treatment effects of multiple treatments.

Time-to-event data are often analyzed using approaches that model the whole survival curve from baseline to the end of follow-up, such as the Cox proportional hazards model used for CAIPW-ZS.7 In the case where interest only lies in the risk difference over a pre-specified period of time (e.g., 180-day risk of ER visit), a method that directly targets the survival function at the fixed time point can lead to better efficiency than general methods that estimate the whole survival curve. The problem can be reduced to estimating the marginal expectation of a (possibly censored) binary indicator of event occurrence, which we propose to solve by utilizing logistic regression that directly targets the binary outcome. In general, the difficulty of correctly modeling the time-to-event outcome given observed covariates increases as the time window becomes longer, and the incorrect model for the survival time tends to result in more severe problems than misspecification of the outcome model at a fixed time point. In our simulation setting where the hazard functions did not cross, CIPWR, which directly modeled the binary indicator of event occurrence, performed similarly to CAIPW-ZS, which utilized outcome information accumulated over time, in terms of RMSE. In the presence of crossing hazards, when both the logistic regression model and Cox model were misspecified, CIPWR realized more efficiency gain over CIPW than CAIPW-ZS, as the latter failed to capture the associations between the outcome and covariates. The efficiency gain of CIPWR over CAIPW-ZS was also observed in the data example, where CAIPW-ZS had wider CI than CIPWR for some treatment pairs (Figure 4), and the difference in confidence width increased as the time window became larger (Figure 4 and Figure B.8 in the Supplementary Materials).

In the presence of right censoring, discarding the censored observations may result in biased estimates for the treatment effects, even if the censoring was completely random, as shown in our simulation studies (Figure B.1 in the Supplementary Materials). In this paper, we assumed the independence of survival and censoring time conditional on treatment and baseline covariates, and used Cox model to estimate the probability of remaining uncensored, which was then inverted and used as weights in the estimating equation. Under the more restrictive conditional independence assumption given treatment only, it is sufficient to use the non-parametric KM estimator for estimating the probability of remaining uncensored. However, simulation results showed that when censoring was random, CIPWR based on Cox model was still unbiased for the treatment effect, and could possibly be more efficient than CIPWR based on KM estimator.

In this study, we focus on low-dimensional covariates and only consider simple parametric working models for the outcome and treatment. As researchers gain increasing access to large databases with a substantial collection of covariates, variable selection techniques for causal inference has been an emerging topic of interest.22, 23 Another possible extension to our method is to replace the parametric models with modern machine learning methods that can capture potential nonlinearities and nonadditivities. For example, neural networks and methods based on recursive partitioning have been suggested as promising alternatives to logistic regression for estimating propensity scores when the true model structure is complex.24, 25 In addition, treatment switching was treated in the same way as dropout and study termination. That is, an observation was considered censored when someone switched treatment, which may not be optimal and studying treatment sequences will be another challenge.

Acknowledgments

The research of BM was supported by NSF DMS 1712933.

Conflict of interest

The authors declare no potential conflict of interests.

Data Availability Statement

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

Supporting information

Additional supporting information may be found online in the Supporting Information section at the end of this article.

References

  • 1 Rosenbaum PR, Rubin DB. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 1983; 70(1): 41–55. doi: 10.1093/biomet/70.1.41
  • 2 Stuart EA. Matching Methods for Causal Inference: A Review and a Look Forward. Statistical science : a review journal of the Institute of Mathematical Statistics 2010; 25(1): 1–21. doi: 10.1214/09-STS313
  • 3 Hu L, Gu C, Lopez M, Ji J, Wisnivesky J. Estimation of Causal Effects of Multiple Treatments in Observational Studies with a Binary Outcome. Statistical Methods in Medical Research 2020; 29(11): 3218–3234. doi: 10.1177/0962280220921909
  • 4 Yu Y, Zhang M, Shi X, Caram MEV, Little RJA, Mukherjee B. A Comparison of Parametric Propensity Score-Based Methods for Causal Inference with Multiple Treatments and a Binary Outcome. Statistics in Medicine 2021; 40(7): 1653–1677. doi: 10.1002/sim.8862
  • 5 Anstrom KJ, Tsiatis AA. Utilizing Propensity Scores to Estimate Causal Treatment Effects with Censored Time-Lagged Data. Biometrics 2001; 57(4): 1207–1218. doi: 10.1111/j.0006-341X.2001.01207.x
  • 6 Wang X, Beste LA, Maier MM, Zhou XH. Double Robust Estimator of Average Causal Treatment Effect for Censored Medical Cost Data. Statistics in Medicine 2016; 35(18): 3101–3116. doi: 10.1002/sim.6876
  • 7 Zhang M, Schaubel DE. Double-Robust Semiparametric Estimator for Differences in Restricted Mean Lifetimes in Observational Studies. Biometrics 2012; 68(4): 999–1009. doi: 10.1111/j.1541-0420.2012.01759.x
  • 8 Andersen PK, Syriopoulou E, Parner ET. Causal Inference in Survival Analysis Using Pseudo-Observations. Statistics in Medicine 2017; 36(17): 2669–2681. doi: 10.1002/sim.7297
  • 9 Tsiatis A. Semiparametric Theory and Missing Data. Springer Series in StatisticsNew York: Springer-Verlag . 2006
  • 10 Rubin DB. Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies. Journal of Educational Psychology 1974; 66(5): 688–701. doi: 10.1037/h0037350
  • 11 Heitjan DF, Rubin DB. Ignorability and Coarse Data. The Annals of Statistics 1991; 19(4): 2244–2253. doi: 10.1214/aos/1176348396
  • 12 Gill RD, van der Laan MJ, Robins JM. Coarsening at Random: Characterizations, Conjectures, Counter-Examples. In: Lin DY, Fleming TR. , eds. Proceedings of the First Seattle Symposium in Biostatistics Lecture Notes in Statistics. Springer US; 1997; New York, NY: 255–294
  • 13 Cox DR. Regression Models and Life-Tables. Journal of the Royal Statistical Society: Series B (Methodological) 1972; 34(2): 187–202. doi: 10.1111/j.2517-6161.1972.tb00899.x
  • 14 Casella G, Berger RL. Statistical Inference. Thomson Learning . 2002.
  • 15 Lin DY, Wei LJ. The Robust Inference for the Cox Proportional Hazards Model. Journal of the American Statistical Association 1989; 84(408): 1074–1078. doi: 10.1080/01621459.1989.10478874
  • 16 Boos DD, Stefanski LA. M-Estimation (Estimating Equations). In: New York, NY: Springer New York. 2013 (pp. 297–337)
  • 17 Perme MP, Gerster M, Rodrigues K. Pseudo: Computes Pseudo-Observations for Modeling.; 2017.
  • 18 Bender R, Augustin T, Blettner M. Generating Survival Times to Simulate Cox Proportional Hazards Models. Statistics in Medicine 2005; 24(11): 1713–1723. doi: 10.1002/sim.2059
  • 19 Caram MEV, Ross R, Lin P, Mukherjee B. Factors Associated With Use of Sipuleucel-T to Treat Patients With Advanced Prostate Cancer. JAMA network open 2019; 2(4): e192589. doi: 10.1001/jamanetworkopen.2019.2589
  • 20 Lopez MJ, Gutman R. Estimation of Causal Effects with Multiple Treatments: A Review and New Ideas. Statistical Science 2017; 32(3): 432–454. doi: 10.1214/17-STS612
  • 21 Tonyali S, Haberal HB, Sogutdelen E. Toxicity, Adverse Events, and Quality of Life Associated with the Treatment of Metastatic Castration-Resistant Prostate Cancer. Current Urology 2017; 10(4): 169–173. doi: 10.1159/000447176
  • 22 Shortreed SM, Ertefaie A. Outcome-Adaptive Lasso: Variable Selection for Causal Inference. Biometrics 2017; 73(4): 1111–1122. doi: 10.1111/biom.12679
  • 23 Athey S, Imbens GW, Wager S. Approximate Residual Balancing: Debiased Inference of Average Treatment Effects in High Dimensions. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2018; 80(4): 597–623. doi: 10.1111/rssb.12268
  • 24 Setoguchi S, Schneeweiss S, Brookhart MA, Glynn RJ, Cook EF. Evaluating Uses of Data Mining Techniques in Propensity Score Estimation: A Simulation Study. Pharmacoepidemiology and Drug Safety 2008; 17(6): 546–555. doi: 10.1002/pds.1555
  • 25 Lee BK, Lessler J, Stuart EA. Improving Propensity Score Weighting Using Machine Learning. Statistics in medicine 2010; 29(3): 337–346. doi: 10.1002/sim.3782