
Nonparametric assessment of regimen response curve estimators

Cuong Pham1, Benjamin R. Baer1,2, and Ashkan Ertefaie1
1Department of Biostatistics and Computational Biology,
University of Rochester Medical Center, Rochester, NY, USA
2School of Mathematics and Statistics,
University of St Andrews, St Andrews, Scotland
Abstract

Marginal structural models have been widely used in causal inference to estimate mean outcomes under either a static or a prespecified set of treatment decision rules. This approach requires imposing a working model for the mean outcome given a sequence of treatments and possibly baseline covariates. In this paper, we introduce a dynamic marginal structural model that can be used to estimate an optimal decision rule within a class of parametric rules. Specifically, we estimate the mean outcome as a function of the parameters in the class of decision rules, referred to as a regimen-response curve. In general, misspecification of the working model may lead to a biased estimate with questionable causal interpretability. To mitigate this issue, we leverage risk to assess the "goodness-of-fit" of the imposed working model. We consider the counterfactual risk as our target parameter and derive inverse probability weighting and canonical gradients to map it to the observed data. We provide asymptotic properties of the resulting risk estimators, considering both fixed and data-dependent target parameters. We show that the inverse probability weighting estimator can be efficient and asymptotically linear when the weight functions are estimated using a sieve-based estimator. The proposed method is implemented on the LS1 study to estimate a regimen-response curve for patients with Parkinson's disease.

1 Introduction

Many diseases and disorders, including cancer, Parkinson’s, substance abuse, and depression, require a sequence of treatments to address the evolving characteristics of the disease and the patient. Dynamic treatment regimes offer personalized treatment sequences to cater to these specific needs (Robins, 1986, 1989, 1997; Murphy et al., 2001). Such regimes consist of a predefined set of decision rules established before treatment initiation, which map the patient’s time-varying characteristics to appropriate treatment options at each decision point.

A key step toward estimating optimal treatment regimes is to estimate the mean potential outcome under a given regime. Marginal structural models have been widely used for this purpose, where a working model is imposed for the mean outcome given a sequence of treatments and possibly baseline covariates. However, the majority of the current literature focuses on finding an optimal regime among a prespecified set of regimes (Robins et al., 2000; Murphy et al., 2001; Shortreed and Moodie, 2012; Ertefaie et al., 2016). Dynamic marginal structural models are alternatives that can be used to estimate optimal regimes within a class of parametric decision rules (Orellana et al., 2010; Duque et al., 2021). This approach estimates the mean outcome as a function of the parameters in the class of decision rules, referred to as a regimen-response curve (Ertefaie et al., 2023). The quality of the estimated optimal regime and its causal interpretability depend on how closely the working model approximates the true regimen-response curve (Neugebauer and van der Laan, 2007).

Cross-validation provides a powerful tool for model selection both in parametric and nonparametric settings. The asymptotic and finite sample properties of cross-validated risk have been extensively studied (van der Laan and Dudoit, 2003; van der Vaart et al., 2006). Brookhart and van der Laan (2006) illustrated how to utilize cross-validated risk for selecting the best nuisance parameters in a semi-parametric model. In this paper, we will use cross-validated risk to assess the performance of an imposed working model to estimate a regimen-response curve. In this context, risk quantifies the proximity of the working model to the actual regimen-response curve. It is defined as the Euclidean distance between the working model and the potential mean outcome under all treatment regimes. The challenge lies in the fact that we do not observe the potential mean outcome under a treatment regime. Hence, an inverse probability weighting (IPW) mapping is often utilized to estimate the risk of a working model.

The consistency of the IPW estimators relies on a correctly specified propensity score model (Cole and Hernán, 2008; Mortimer et al., 2005; Howe et al., 2011). This requirement is not always guaranteed when parametric methods are used. A solution to this problem is to use nonparametric methods. However, naively utilizing nonparametric methods to estimate the propensity score often leads to risk estimators that are not $\sqrt{n}$-consistent and suffer from poor finite-sample performance (Ertefaie et al., 2023). This is because the convergence rate of an IPW estimator relies on the convergence rate of the propensity score estimator, which is slower than the desired root-$n$ rate when nonparametric models are used. Additionally, inefficiency is a notable drawback of IPW estimators. In clinical trials that involve human subjects, efficiency is a critical consideration, as it is usually costly to have a large sample size.

Numerous approaches have emerged to address limitations of the IPW estimator. For instance, Imai and Ratkovic (2015) proposed the covariate balancing propensity score, while Yu and van der Laan (2006) introduced a double robust estimator. However, these methods were tailored to estimating a marginal structural model in the static treatment setting and are not applicable to identifying an optimal regime in the dynamic treatment setting. We expand the literature by deriving the canonical gradient of the risk function and proposing a multiply robust estimator of the risk of a dynamic marginal structural working model. This estimator remains consistent even when some of the nuisance parameters are misspecified (Rotnitzky et al., 1998; van der Laan and Robins, 2003), and it achieves efficiency when all nuisance parameters are correctly specified. Furthermore, the multiply robust estimator allows for a slower convergence rate of the propensity score model, enabling the use of more flexible nonparametric methods for its modeling. However, a drawback of the multiply robust estimator is its reliance on estimating numerous nuisance parameters. Employing parametric methods for their estimation can lead to inconsistency in some parameters, impacting the estimator's efficiency. Conversely, nonparametric methods ensure consistency but may become computationally burdensome, potentially yielding irregular estimators when some parameters are inconsistently estimated (van der Laan, 2014; Benkeser et al., 2017). Therefore, we also introduce a technique to refine the IPW estimator, enhancing its efficiency. This technique was initially introduced by Ertefaie et al. (2023) for estimating causal treatment effects in point-exposure settings; we extend it to the longitudinal setting.

The contributions of our proposed method beyond the existing literature can be summarized as follows. First, we derive the canonical gradient of the risk function corresponding to a dynamic marginal structural model in longitudinal settings. Our derivation leads to a multiply robust estimator of the risk function. Second, we show that when the propensity score is estimated using a sieve-based approach, the resulting IPW estimator solves the efficient influence function. This implies that the proposed IPW estimator achieves the nonparametric efficiency bound. Third, we show that, under certain rate conditions, the proposed multiply robust and sieve-based estimators are asymptotically linear for both fixed and data-dependent risk functions.

2 Problem Setting

2.1 Notation and setup

Our data consist of $n$ i.i.d. samples $O_{i}=\{S_{1i},A_{1i},\dots,S_{Ti},A_{Ti},Y_{i}\}_{i=1}^{n}\sim\mathbb{P}_{0}\in\mathcal{M}$, where $\mathcal{M}$ is a nonparametric model. We denote $S_{t}$ as the vector of patient characteristics at time $t$ and $V\subseteq S_{1}$ as a subset of the baseline characteristics. The observed treatment is denoted $A_{t}\in\{0,1\}$ at time $t=1,\dots,T$, and the outcome of interest is denoted $Y$. We use the overbar notation to denote the history of a variable. For example, $\bar{S}_{t}=(S_{1},\dots,S_{t})$ and $\bar{A}_{t}=(A_{1},\dots,A_{t})$ are the covariates and past treatments up to time $t$. We let $\bar{H}_{t}=(\bar{A}_{t-1},\bar{S}_{t})$ be all the information up to time $t$. We denote $\mathbb{P}_{n}$ as the empirical distribution and define $\mathbb{P}f=\int f(o)\,\mathrm{d}\mathbb{P}(o)$ for a given function $f(o)$ and distribution $\mathbb{P}$.

A dynamic regime consists of a set of decision rules that assign the treatment at each stage based on the information at and before that stage. We define the deterministic dynamic treatment rule $d_{t}^{\theta}=I(\theta_{t}^{T}S_{t}>0)$ to be the decision rule at time $t$. Let $\pi_{0}^{t}\equiv P(A_{t}=d_{t}^{\theta}\mid\bar{H}_{t})$ be the true propensity score at time $t$, let $\pi_{n}^{t}$ denote the corresponding estimator, and let $Y^{\theta}$ denote the potential outcome under the regime indexed by $\theta$. Suppose that we have the full data $X=\{(Y^{\theta}:d^{\theta}\in\mathcal{D}),V\}\sim P_{X}\in\mathcal{M}^{F}$, where $\mathcal{M}^{F}$ is a nonparametric full data model. We can then define the regimen-response curve $m_{0}(\theta,V):=E_{P_{X}}(Y^{\theta}\mid V)$.

2.2 Cross-validated risk

A dynamic marginal structural model can be used to approximate $m_{0}(\theta,V)$ by imposing a working model. We quantify the quality of the imposed working model using a cross-validated risk function. Let $B_{n}\in\{0,1\}^{n}$ denote the random partition of $\{1,\dots,n\}$ into two folds. A realization of $B_{n}=(B_{n,1},B_{n,2},\dots,B_{n,n})$ defines a particular split of the sample of $n$ observations into a training set $\{i\in\{1,\dots,n\}:B_{n,i}=0\}$ and a validation set $\{i\in\{1,\dots,n\}:B_{n,i}=1\}$. Denote by $\mathbb{P}_{n,b}^{0}$ and $\mathbb{P}_{n,b}^{1}$ the empirical distributions of the training sample and the validation sample, respectively. The type of cross-validation procedure is determined by the specific distribution of $B_{n}$. For example, in Monte Carlo cross-validation, where the sample is divided into training and validation sets of sizes $n(1-p)$ and $np$, the distribution of $B_{n}$ puts mass $1/\binom{n}{np}$ on each such binary realization of $B_{n}$. Another example is $B$-fold cross-validation, in which the distribution of $B_{n}$ puts mass $1/B$ on each of the $B$ possible realizations. Throughout, we assume that this randomness is independent of all data.
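As a concrete illustration of the partition variable $B_{n}$ (our own sketch, not part of the paper's methodology; the function names are ours), the two cross-validation schemes can be simulated as follows.

```python
import numpy as np

def bfold_partitions(n, B, rng):
    """All B realizations of B_n under B-fold cross-validation: row b is a
    0/1 vector whose 1's mark the validation set of fold b."""
    idx = rng.permutation(n)
    folds = np.array_split(idx, B)
    parts = np.zeros((B, n), dtype=int)
    for b, val in enumerate(folds):
        parts[b, val] = 1
    return parts

def monte_carlo_partition(n, p, rng):
    """One draw of B_n under Monte Carlo cross-validation: a uniformly random
    validation set of size round(n * p)."""
    part = np.zeros(n, dtype=int)
    val = rng.choice(n, size=int(round(n * p)), replace=False)
    part[val] = 1
    return part

rng = np.random.default_rng(0)
parts = bfold_partitions(n=10, B=5, rng=rng)
assert (parts.sum(axis=0) == 1).all()   # each i is validated exactly once
```

The rows of `parts` are the $B$ equally likely realizations of $B_{n}$; averaging an estimator over them implements the expectation $\mathbb{E}_{B_{n}}$ used below.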

Consider an estimation rule $m_{n}(\theta,V)\equiv m_{n}(\theta,V\mid\mathbb{P})$ that is fit using the distribution $\mathbb{P}$; we refer to $m_{n}$ as a marginal structural model (MSM) to be consistent with the literature, although it is the fit from a working model rather than a model itself. Let $m_{n,b}(\theta,V)\equiv m_{n}(\theta,V\mid\mathbb{P}_{n,b}^{0})$ denote the MSM $m_{n}(\theta,V)$ evaluated at the empirical distribution $\mathbb{P}_{n,b}^{0}$, i.e., the training data. We denote $m^{*}(\theta,V)$ as the pointwise probability limit of $m_{n,b}(\theta,V)$. Note that $m^{*}$ does not necessarily equal the true regimen-response curve $m_{0}$.

Define the full data conditional risk based on training observations as

$$\Psi_{0}(m_{n})\equiv\mathbb{E}_{B_{n}}\left[\mathbb{P}_{0}\left\{\int\{Y^{\theta}-m_{n,b}(\theta,V)\}^{2}\,\mathrm{d}F(\theta)\right\}\right],\qquad(1)$$

where $\mathbb{E}_{B_{n}}$ denotes an expectation over possible partitions and $F$ is a user-specified distribution function over the parameter $\theta$. The averaging over the partitions in (1) prevents the estimand from depending on any randomness in the selection of training data, thus making $\Psi_{0}(m_{n})$ a data-adaptive estimand (Hubbard et al., 2016). We define the full-data non-data-adaptive risk for a given regimen-response curve $m$ as $\Psi_{0,\text{nda}}(m)\equiv\mathbb{P}_{0}\left\{\int\{Y^{\theta}-m(\theta,V)\}^{2}\,\mathrm{d}F(\theta)\right\}$. The full data conditional risk $\Psi_{0}(m_{n})$ can be expressed in terms of the non-data-adaptive risk $\Psi_{0,\text{nda}}(m)$ through the relation $\Psi_{0}(m_{n})=\mathbb{E}_{B_{n}}[\Psi_{0,\text{nda}}(m_{n,b})]$. In Section 3.1, we develop efficiency theory for $\Psi_{0,\text{nda}}(m)$ of a given regimen-response curve $m$. Then, we use those results to construct nonparametric asymptotically linear estimators for $\Psi_{0}(m_{n})$.

2.3 Inverse-probability weighted identification

For a given marginal structural model $m_{n}$, the definition of the risk $\Psi_{0}(m_{n})$ involves the potential outcome $Y^{\theta}$, which is not observed. However, under the following causal assumptions, it is identifiable using the observed data. Recall that $\pi_{0}^{t}\equiv P(A_{t}=d_{t}^{\theta}\mid\bar{H}_{t})$ is the true propensity score at time $t$.

Assumption 1.

(Causal assumptions)

  a. Consistency: $Y=Y^{\theta}$ if $\bar{A}_{T}=\bar{d}^{\theta}_{T}$ for all $\theta$;

  b. Exchangeability: $A_{t}\perp\!\!\!\perp Y^{\theta}\mid\bar{H}_{t}$ for all $\theta$ and all $t=1,\dots,T$;

  c. Positivity: $\prod_{t=1}^{T}\pi_{0}^{t}(A_{t}=d^{\theta}_{t}\mid\bar{H}_{t})>\epsilon$ almost surely, for all $\theta$ and some $\epsilon>0$.

Assumption 1a states that the potential outcome under an observed regime equals the observed value. This assumption can be violated if the treatment assignment of a unit impacts the outcome of others (Hernán and Robins, 2023; Kennedy, 2019). Assumption 1b states that there are no unmeasured confounders. This assumption holds in standard randomized studies. However, in observational studies it requires a rich enough set of characteristics $S_{1},\dots,S_{T}$. Assumption 1c states that regardless of a subject's medical history, that person will have a nonzero probability of following any given treatment regime.

Lemma 1.

Let $m_{n}$ be an estimation rule for a regimen-response curve. If Assumption 1 holds, then the estimand is an observed-data functional; that is,

$$\Psi_{0}(m_{n})=\mathbb{E}_{B_{n}}\mathbb{P}_{0}\left[\int\left\{\prod_{t=1}^{T}\frac{I(A_{t}=d_{t}^{\theta})}{\pi_{0}^{t}(A_{t}=d_{t}^{\theta}\mid\bar{H}_{t})}\right\}\left\{Y-m_{n,b}(\theta,V)\right\}^{2}\,\mathrm{d}F(\theta)\right].$$

2.4 Inverse-probability weighted estimator

Applying the plug-in approach to Lemma 1, we obtain the inverse probability weighting (IPW) estimator of the full data conditional risk of a working model $m_{n}(\theta,V)$:

$$\Psi_{n}^{\mathrm{IPW}}(m_{n},\pi_{n})=\mathbb{E}_{B_{n}}\mathbb{P}^{1}_{n,b}\left[\int\left\{\prod_{t=1}^{T}\frac{I(A_{t}=d_{t}^{\theta})}{\pi^{t}_{n,b}(A_{t}=d_{t}^{\theta}\mid\bar{A}_{t-1},\bar{S}_{t})}\right\}\left\{Y-m_{n,b}(\theta,V)\right\}^{2}\,\mathrm{d}F(\theta)\right],$$

where $\pi^{t}_{n,b}$ is the estimate of the propensity score fit on the training sample for the $b$th sample split. The consistency and convergence rate of $\Psi_{n}^{\mathrm{IPW}}(m_{n},\pi_{n})$ depend entirely on the consistency and convergence rate of the estimators $\pi^{t}_{n}$ of $\pi^{t}_{0}$ for each time point $t=1,\dots,T$. It is common to estimate the propensity score using parametric models, which have a convergence rate of $O_{p}(n^{-1/2})$. Alternatively, data-adaptive methods can be used to increase the flexibility of the model. However, due to the slow convergence rate of nonparametric methods, the IPW estimator can be biased. We can see this through the following decomposition.
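To make the estimator concrete, here is a minimal sketch (ours, on simulated data) of the IPW risk for a rule $d_{t}^{\theta}=I(\theta S_{t}>0)$ with a scalar $S_{t}$, a known propensity score of $0.5$ (as in a randomized study), a constant working model, and $F$ uniform on a grid of $\theta$ values; the sample splitting over $B_{n}$ is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 500, 2

# Toy longitudinal data: one scalar covariate per stage, randomized treatment,
# so the true propensity score is 0.5 at every stage.
S = rng.normal(size=(n, T))
A = rng.binomial(1, 0.5, size=(n, T))
Y = S[:, 0] + A.sum(axis=1) + rng.normal(size=n)

theta_grid = np.linspace(-1.0, 1.0, 5)   # F puts mass 1/5 on each theta

def d_theta(theta, s):
    # decision rule d_t^theta = I(theta * S_t > 0) for scalar S_t
    return (theta * s > 0).astype(int)

def ipw_risk(m, pi_hat=0.5):
    """Plug-in IPW estimate of the risk of a working model m(theta)
    (constant in V here), averaging over theta per the distribution F."""
    risks = []
    for theta in theta_grid:
        w = np.ones(n)
        for t in range(T):
            w *= (A[:, t] == d_theta(theta, S[:, t])) / pi_hat
        risks.append(np.mean(w * (Y - m(theta)) ** 2))
    return float(np.mean(risks))

risk = ipw_risk(m=lambda theta: 1.0)
```

In practice the weights would use fitted, history-dependent propensity scores; the known constant here is only to keep the sketch short.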

For a given marginal structural model $m(\theta,V)$, define the loss as

$$L_{m}(\pi)=\int\left\{\prod_{t=1}^{T}\frac{I(A_{t}=d_{t}^{\theta})}{\pi^{t}}\right\}\left\{Y-m(\theta,V)\right\}^{2}\,\mathrm{d}F(\theta).$$

Then, we have

$$\begin{aligned}\Psi_{n}^{\mathrm{IPW}}(m,\pi_{n})-\Psi_{0}(m,\pi_{0})&=\mathbb{P}_{n}L_{m}(\pi_{n})-\mathbb{P}_{0}L_{m}(\pi_{0})\\&=(\mathbb{P}_{n}-\mathbb{P}_{0})L_{m}(\pi_{0})+\mathbb{P}_{0}\{L_{m}(\pi_{n})-L_{m}(\pi_{0})\}+(\mathbb{P}_{n}-\mathbb{P}_{0})\{L_{m}(\pi_{n})-L_{m}(\pi_{0})\}\\&=(I)+(II)+(III).\end{aligned}$$

Assuming that $\pi_{n}$ is a consistent estimator of $\pi_{0}$ and that a Donsker class condition holds, standard empirical process theory shows that term (III) converges to 0 at rate $n^{-1/2}$ (van der Vaart, 1998). Hence, $\Psi_{n}^{\mathrm{IPW}}$ is asymptotically linear only if term (II) also converges to 0 at the rate $n^{-1/2}$. Since that rate is not attainable using nonparametric methods, term (II) will dominate the right-hand side, causing $\Psi_{n}^{\mathrm{IPW}}$ to be biased. The same issue arises when $m_{n}(\theta,V)$ is a cross-fitted estimator of the underlying marginal structural model.

2.5 The highly adaptive lasso

Our proposed approach for refining the inverse probability weighting estimator of $\Psi_{0}(m_{n})$ utilizes the highly adaptive lasso (HAL). The details of this process are discussed in Section 4.1. To provide context, in this section we give a concise introduction to the HAL model and highlight its key characteristics. For an in-depth exploration of HAL, we refer interested readers to Benkeser and van der Laan (2016) and Ertefaie et al. (2023).

The highly adaptive lasso is a nonparametric regression method that estimates infinite-dimensional functions by constructing a linear combination of indicator functions that minimizes a loss function subject to a constraint on the $L_{1}$-norm complexity of the model. Bibaut and van der Laan (2019) show that highly adaptive lasso estimators have a fast convergence rate of $n^{-1/3}$ (up to a log factor).

Let $\mathbb{D}[0,\tau]$ be the Banach space of $d$-variate real-valued cadlag functions on a cube $[0,\tau]\subset\mathbb{R}^{d}$. For a function $f\in\mathbb{D}[0,\tau]$, the sectional variation norm is defined as

$$\|f\|_{\nu}=|f(0)|+\sum_{r\subset\{1,\ldots,d\}}\int_{0_{r}}^{\tau_{r}}\left|\mathrm{d}f_{r}(u)\right|,$$

where $f_{r}(u)=f\{u_{1}I(1\in r),\dots,u_{d}I(d\in r)\}$ is the $r$th section of $f$ and the summation is over all subsets of $\{1,\dots,d\}$. The term $\int_{0_{r}}^{\tau_{r}}|\mathrm{d}f_{r}(u)|$ is the $r$-specific variation norm.

For any $f\in\mathbb{D}[0,\tau]$ with $\|f\|_{\nu}<\infty$, Gill et al. (1995) show that $f$ can be represented as a linear combination of indicator functions:

$$\begin{aligned}f(x)&=f(0)+\sum_{r\subset\{1,\ldots,d\}}\int_{0_{r}}^{x_{r}}\mathrm{d}f_{r}(u)\\&=f(0)+\sum_{r\subset\{1,\ldots,d\}}\int_{0_{r}}^{\tau_{r}}I\left(u_{r}\leq x_{r}\right)\mathrm{d}f_{r}(u).\qquad(2)\end{aligned}$$

The latter integral in Equation (2) can be approximated using a discrete measure $F_{n}$ putting mass on the observed values of $X_{r}=(X_{j}:j\in r)$, denoted $x_{r,i}$ for the $i$th subject. We let $\beta_{r,i}$ denote the mass placed on each observation $x_{r,i}$ and $\phi_{r,i}=I(x_{i,r}\leq c_{r})$. Then the approximation of $f$ is

$$\begin{aligned}\tilde{f}(x)&=f(0)+\sum_{r\subset\{1,\ldots,d\}}\sum_{i=1}^{n}I\left(x_{r,i}\leq c_{r}\right)f_{r,n}\left(dx_{i,r}\right)\\&=\beta_{0}+\sum_{r\subset\{1,\ldots,d\}}\sum_{i=1}^{n}\beta_{r,i}\phi_{r,i}.\end{aligned}$$

In this formulation, $|\beta_{0}|+\sum_{r\subset\{1,\ldots,d\}}\sum_{i=1}^{n}|\beta_{r,i}|$ is an approximation of the sectional variation norm of $f$. Consequently, the minimum loss based highly adaptive lasso estimator $\beta_{n}$ is defined as

$$\beta_{n}=\arg\min_{\beta:\left|\beta_{0}\right|+\sum_{r\subset\{1,\ldots,d\}}\sum_{i=1}^{n}\left|\beta_{r,i}\right|<\lambda}\mathbb{P}_{n}L(\tilde{f}),$$

where $L(\cdot)$ is a given loss function. This is a constrained minimization that corresponds to lasso linear regression with up to $n\times(2^{d}-1)$ parameters (i.e., the $\beta_{r,i}$). Each value of $\lambda$ gives a unique HAL estimator.
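To fix ideas, the following sketch (ours) builds the zero-order HAL design matrix of indicators $\phi_{r,i}$ for $d=2$ and fits it with scikit-learn's `Lasso`; the penalized (Lagrangian) form with a fixed penalty is used as a stand-in for the constrained minimization above, whereas dedicated implementations such as the `hal9001` R package select the penalty by cross-validation.

```python
import itertools
import numpy as np
from sklearn.linear_model import Lasso

def hal_basis(X_knots, X_eval):
    """Zero-order HAL design: one indicator column per subset r of the
    coordinates and per observed knot, phi_{r,i}(x) = I(x_r >= x_{r,i})
    componentwise (i.e., the evaluation point dominates the knot on r)."""
    n_knots, d = X_knots.shape
    cols = []
    for size in range(1, d + 1):
        for r in itertools.combinations(range(d), size):
            r = list(r)
            for i in range(n_knots):
                cols.append(np.all(X_eval[:, r] >= X_knots[i, r], axis=1).astype(float))
    return np.column_stack(cols)      # up to n * (2^d - 1) columns

rng = np.random.default_rng(2)
n, d = 200, 2
X = rng.uniform(size=(n, d))
y = np.sin(3 * X[:, 0]) + X[:, 1] + 0.1 * rng.normal(size=n)

H = hal_basis(X, X)
# Lagrangian lasso with a fixed penalty as a crude stand-in for the
# sectional-variation-norm constraint on the coefficients.
fit = Lasso(alpha=0.01, max_iter=10000).fit(H, y)
pred = fit.predict(H)
```

The L1 penalty zeroes out most of the $n\times(2^{d}-1)$ coefficients, which is what keeps the fit's sectional variation norm under control.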

3 Efficient multiply-robust estimation

In this section, we develop a multiply robust (MR) estimator for the cross-validated risk $\Psi_{0}(m_{n})$ of an estimation rule, or working model, $m_{n}$.

3.1 The Canonical Gradient of $\Psi_{0,\text{nda}}(m)$

To construct the MR estimator, we need to find the efficient influence function of $\Psi_{0,\text{nda}}(m)$. Influence functions are important because a regular and asymptotically linear estimator can be expressed as the empirical mean of its influence function plus a negligible term that converges to 0 at the rate $n^{-1/2}$ or faster (van der Laan and Robins, 2003; Tsiatis, 2006). The efficient influence function is the influence function whose variance attains the efficiency bound. In the nonparametric setting the influence function is unique and equals the efficient influence function. The multiply robust estimator is defined using the efficient influence function, and we will show that it is asymptotically efficient when all the nuisance parameters converge to their true values at a rate of $n^{-1/4}$. This slower-than-parametric rate allows us to estimate the nuisance parameters with flexible nonparametric methods. Although the MR estimator is not efficient when the propensity score model is misspecified, it is still consistent when the outcome regression is correctly specified, or vice versa.

The following result provides the nonparametric efficient influence function for the inner term in the definition of the cross-validated risk $\Psi_{0}(m_{n})$.

Theorem 1.

Let $m$ be a given regimen-response curve. If the identification result of Lemma 1 holds, the nonparametric efficient influence function for

$$\Psi_{0,\text{nda}}(m)\equiv\mathbb{P}_{0}\int\{Y^{\theta}-m(\theta,V)\}^{2}\,\mathrm{d}F(\theta)$$

is $\varphi(\pi_{0},Q_{0,m};m)-\Psi_{0,\text{nda}}(m)$, where the uncentered term $\varphi(\pi_{0},Q_{0,m};m)$ is

$$\int\left[\left\{\prod_{t=1}^{T}\frac{I(A_{t}=d_{t}^{\theta})}{\pi^{t}_{0}}\right\}\{Y-m(\theta,V)\}^{2}-\sum_{t=1}^{T}\frac{I(A_{t}=d^{\theta}_{t})-\pi^{t}_{0}}{\pi^{t}_{0}}\,Q^{t}_{0,m}\prod_{j=1}^{t-1}\frac{I(A_{j}=d_{j}^{\theta})}{\pi^{j}_{0}}\right]\mathrm{d}F(\theta),$$

and the nuisance parameter

$$Q^{t}_{0,m}=\mathbb{P}_{0}\left[\left\{\prod_{j=t+1}^{T}\frac{I(A_{j}=d_{j}^{\theta})}{\pi^{j}_{0}}\right\}\{Y-m(\theta,V)\}^{2}\,\middle|\,\bar{S}_{t},\bar{A}_{t-1},A_{t}=d^{\theta}_{t}\right].$$

The structure of the influence function in Theorem 1 involves a weighted average over $\theta$ of an inverse probability weighting term and an augmentation term for each of the time points. The function $Q^{t}_{0,m}$ can be represented via a recursive sequential regression formula, which is useful in practice.

Remark 1.

The function $Q^{t}_{0,m}$ can be expressed recursively as

$$Q^{t}_{0,m}=\mathbb{P}_{0}\left(Q^{t+1}_{0,m}\mid\bar{S}_{t},\bar{A}_{t-1},A_{t}=d^{\theta}_{t}\right),$$

for $t=1,\dots,T$, where $Q^{T+1}_{0,m}=\{Y-m(\theta,V)\}^{2}$.
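In practice, the recursion can be implemented by regressing a running pseudo-outcome backward in time among subjects who followed the rule at that stage. The sketch below (ours, on simulated data, with linear regressions standing in for flexible learners) takes the terminal pseudo-outcome to be the squared residual $\{Y-m(\theta,V)\}^{2}$, matching the definition of $Q^{t}_{0,m}$ in Theorem 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n, T, theta = 500, 2, 0.5

S = rng.normal(size=(n, T))
A = rng.binomial(1, 0.5, size=(n, T))
Y = S[:, 0] + A.sum(axis=1) + rng.normal(size=n)
m_theta = 1.0                                   # working-model value m(theta, V)

d = (theta * S > 0).astype(int)                 # d_t^theta for each subject/time

# Backward recursion: regress the running pseudo-outcome on the history among
# subjects who followed the rule at time t, then predict for everyone,
# which evaluates the regression at A_t = d_t^theta.
Q = (Y - m_theta) ** 2                          # terminal pseudo-outcome Q^{T+1}
for t in reversed(range(T)):
    hist = np.column_stack([S[:, :t + 1], A[:, :t]])   # (S_bar_t, A_bar_{t-1})
    on_rule = A[:, t] == d[:, t]
    reg = LinearRegression().fit(hist[on_rule], Q[on_rule])
    Q = reg.predict(hist)
Q1 = Q                                          # estimate of Q^1_{0,m}
```

Fitting only among the on-rule subjects and predicting for all subjects is what encodes the conditioning event $A_{t}=d^{\theta}_{t}$ at each step.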

3.2 The estimator

The form of the efficient influence function for $\Psi_{0,\text{nda}}$ may be used to construct an estimator for $\Psi_{0}(m_{n})$ that is asymptotically efficient for certain choices of nuisance parameter estimators (Bickel et al., 1993). The estimator construction is more delicate than usual since the efficient influence function is for $\Psi_{0,\text{nda}}(m)$, with $m$ a given regimen-response curve, while the estimand is $\Psi_{0}(m_{n})=\mathbb{E}_{B_{n}}\{\Psi_{0,\text{nda}}(m_{n}(\cdot;\mathbb{P}^{0}_{n,b}))\}$. We first develop estimators as if $m$ were given; then we discuss how to adapt these estimators to instead target our estimand $\Psi_{0}(m_{n})$.

Consider estimation of $\Psi_{0,\text{nda}}(m)$ using the efficient influence function as a guide. Two common estimation approaches are one-step estimators, which add the estimated efficient influence function to an initial estimator, and estimating-equation estimators, which treat the efficient influence function as an estimating function (Tsiatis, 2006; Kennedy, 2022). Since the efficient influence function in Theorem 1 depends on $\Psi_{0,\text{nda}}(m)$ only through its subtraction, the two estimation strategies are equivalent here. Thus we initially consider the estimator

$$\mathbb{P}_{n}\{\varphi(\pi_{n},Q_{n,m};m)\},\qquad(3)$$

where $\pi_{n}$ and $Q_{n,m}$ are estimators for $\pi_{0}$ and $Q_{0,m}$, respectively. Assuming these nuisance parameter estimators satisfy certain conditions, we can show that the above estimator is asymptotically efficient, i.e., has the efficient influence function of $\Psi_{0,\text{nda}}(m)$, found in Theorem 1, as its influence function.

Now consider estimation of our estimand $\Psi_{0}(m_{n})=\mathbb{E}_{B_{n}}[\Psi_{0,\text{nda}}\{m_{n}(\cdot\mid\mathbb{P}^{0}_{n,b})\}]$; we construct our estimator by adapting the estimator in (3) for $\Psi_{0,\text{nda}}(m)$. A plugin-like approach readily produces the estimator

$$\mathbb{E}_{B_{n}}\{\mathbb{P}_{n}\varphi(\pi_{n,b},Q_{n,b,m_{n,b}};m_{n,b})\},\qquad(4)$$

in which an outer average over folds is included, each fixed regimen-response curve $m$ is replaced with $m_{n,b}$, which is estimated on the training data $\mathbb{P}^{0}_{n,b}$ of the fold, and each nuisance parameter is replaced with an estimator fit on the training data.

The estimator in (4) is almost a suitable estimator for the cross-validated risk $\Psi_{0}(m_{n})$; however, it still has one major deficiency which harms its performance. It averages over all observations, including those in the training set $\mathbb{P}_{n,b}^{0}$ as well as the validation set $\mathbb{P}_{n,b}^{1}$, which biases the estimator downwards. A corrected estimator, which we study throughout this paper, is

$$\Psi_{n}^{\mathrm{MR}}(m_{n},\pi_{n},Q_{n,m_{n}})=\mathbb{E}_{B_{n}}\mathbb{P}^{1}_{n,b}\varphi(\pi_{n,b},Q_{n,b,m_{n,b}};m_{n,b}),\qquad(5)$$

where $\pi_{n,b}$ and $Q_{n,b,m_{n,b}}$ are estimates of the nuisance parameters $\pi_{0}$ and $Q_{0,m_{n,b}}$ calculated with the training sample for the $b$th sample split.
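For intuition, the uncentered term $\varphi$ appearing in (5) can be computed from fitted nuisance values as below (our sketch for a single $\theta$, with simulated randomized data and placeholder nuisance fits; the sample splitting over $B_{n}$ and the integral over $F$ are omitted).

```python
import numpy as np

def phi_uncentered(A, S, Y, d, pi, Q, m_theta):
    """Uncentered efficient-influence-function term of Theorem 1 for one theta:
    the IPW residual term minus the T augmentation terms.  pi[:, t] holds the
    propensity scores P(A_t = d_t | history) and Q[:, t] the fitted Q^t."""
    n, T = A.shape
    ipw = np.ones(n)
    for t in range(T):
        ipw *= (A[:, t] == d[:, t]) / pi[:, t]
    out = ipw * (Y - m_theta) ** 2
    cum = np.ones(n)                      # prod_{j < t} I(A_j = d_j) / pi_j
    for t in range(T):
        out -= ((A[:, t] == d[:, t]) - pi[:, t]) / pi[:, t] * Q[:, t] * cum
        cum *= (A[:, t] == d[:, t]) / pi[:, t]
    return out

rng = np.random.default_rng(4)
n, T = 400, 2
S = rng.normal(size=(n, T))
A = rng.binomial(1, 0.5, size=(n, T))
Y = S[:, 0] + A.sum(axis=1) + rng.normal(size=n)
d = (0.5 * S > 0).astype(int)            # d_t^theta for theta = 0.5
pi = np.full((n, T), 0.5)                # known randomization probabilities
Q = np.ones((n, T))                      # placeholder nuisance fits (e.g. HAL)
est = phi_uncentered(A, S, Y, d, pi, Q, m_theta=1.0).mean()
```

Setting `Q` to zero recovers the pure IPW term, which is one way to see that the augmentation terms have mean zero at the true propensity score and act only to reduce variance and add robustness.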

The next result establishes that $\Psi_{n}^{\mathrm{MR}}$ is an asymptotically efficient estimator of the estimand $\Psi_{0}(m_{n})$. Before stating it, we make two assumptions.

Assumption 2.

For all $t=1,\dots,T$, the nuisance parameter estimators $\pi^{t}_{n}$ and $Q^{t}_{n,m_{n,b}}$ satisfy $\mathbb{E}_{B_{n}}\int\|\pi^{t}_{0}-\pi^{t}_{n,b}\|_{L_{2}(\mathbb{P}_{0})}\|Q^{t}_{0,m_{n,b}}-Q^{t}_{n,m_{n,b}}\|_{L_{2}(\mathbb{P}_{0})}\,\mathrm{d}F(\theta)=o_{p}(n^{-1/2})$.

Assumption 3.

The estimator $m_{n,b}$ satisfies $\int\|m_{n,b}(\theta;V)-m^{*}(\theta;V)\|_{L_{2}(\mathbb{P}_{0})}\,\mathrm{d}F(\theta)=o_{p}(n^{-1/4})$.

Assumption 2 states, roughly, that the estimators $\pi_{n}^{t}$ and $Q^{t}_{n,m_{n,b}}$ need to converge to their true values at a rate of $o_{p}(n^{-1/4})$ for any $\theta$. This assumption holds when these nuisance parameters are estimated using the highly adaptive lasso. Moreover, if either $\pi_{0}^{t}$ or $Q^{t}_{0,m_{n,b}}$ is estimated at the rate $O_{p}(n^{-1/2})$, we only require consistency for the other nuisance parameter; the $O_{p}(n^{-1/2})$ rate can be achieved using correctly specified parametric methods. Assumption 3 is also relatively mild. It only requires that the fitted marginal structural model $m_{n,b}$ have a limit $m^{*}$ and converge to $m^{*}$ at the rate $o_{p}(n^{-1/4})$, enabling the use of both parametric and nonparametric methods to estimate the marginal structural model.

Theorem 2.

If Assumptions 1 and 3 hold, then the estimator $\Psi_{n}^{\mathrm{MR}}(m_{n},\pi_{n},Q_{n,m_{n}})$ satisfies

$$\Psi_{n}^{\mathrm{MR}}(m_{n},\pi_{n},Q_{n,m_{n}})-\Psi_{0}(m_{n})=(\mathbb{P}_{n}-\mathbb{P}_{0})\varphi(\pi_{0},Q_{0,m^{*}};m^{*})+R_{n},$$

where $R_{n}$ is defined in Section 10 of the Supplementary Material. If Assumption 2 additionally holds, then $R_{n}=o_{p}(n^{-1/2})$ and

$$\sqrt{n}\left\{\Psi_{n}^{\mathrm{MR}}(m_{n},\pi_{n},Q_{n,m_{n}})-\Psi_{0}(m_{n})\right\}\rightsquigarrow\mathcal{N}\left[0,\mathbb{V}_{0}\{\varphi(\pi_{0},Q_{0,m^{*}};m^{*})\}\right].$$

Theorem 2 shows that, as the fitted working marginal structural model $m_{n,b}$ converges to its limiting value $m^{*}$ at the rate $o_{p}(n^{-1/4})$, the multiply robust estimator is asymptotically linear with respect to the limit of $m_{n}$. The centering term in Theorem 2 is the data-adaptive risk $\Psi_{0}(m_{n})$, which depends on the estimator $m_{n}$, the true propensity score, and the outcome regression.

When two working models are under consideration, the results of Theorem 2 allow us to compare their estimators, $m_{n,1}$ and $m_{n,2}$, by applying the asymptotic linearity result once for each estimator. We may ascertain the quality of the estimators by comparing their risks $\Psi_{0}(m_{n,1})$ and $\Psi_{0}(m_{n,2})$, for example by testing the null hypothesis that these risks are equal.

In Theorem 2, we study the difference $\Psi^{\mathrm{MR}}_{n}(m_{n},\pi_{n},Q_{n,m_{n}})-\Psi_{0}(m_{n})$. This is because $\Psi^{\mathrm{MR}}_{n}(m_{n},\pi_{n},Q_{n,m_{n}})-\Psi_{0}(m^{*})$ is asymptotically linear only under much more restrictive assumptions. In the following result, we study the asymptotics of $\Psi^{\mathrm{MR}}_{n}(m_{n},\pi_{n},Q_{n,m_{n}})-\Psi_{0}(m^{*})$ to replace the data-adaptive target parameter $\Psi_{0}(m_{n})$ with the fixed target parameter $\Psi_{0}(m^{*})$ in the conclusions of Theorem 2.

Theorem 3.

If Assumptions 1 and 3 hold and additionally $m^{*}$, the limit of $m_{n}$, equals the true minimizer of $\Psi_{0}(m)$ over $m\in\mathcal{M}$ (i.e., $m^{*}=m_{0}$), then

$$\Psi^{\mathrm{MR}}_{n}(m_{n},\pi_{n},Q_{n,m_{n}})-\Psi_{0}(m^{*})=(\mathbb{P}_{n}-\mathbb{P}_{0})\varphi(\pi_{0},Q_{0,m_{0}};m_{0})+R_{n},$$

for $R_{n}$ defined in Section 11 of the Supplementary Materials. If additionally Assumption 2 holds, then $R_{n}=o_{p}(n^{-1/2})$ and

$$\sqrt{n}\left\{\Psi_{n}^{\mathrm{MR}}(m_{n},\pi_{n},Q_{n,m_{n}})-\Psi_{0}(m^{*})\right\}\rightsquigarrow\mathcal{N}\left[0,\mathbb{V}_{0}\{\varphi(\pi_{0},Q_{0,m^{*}};m^{*})\}\right].$$

The result of Theorem 3 can be used to compare two different modeling approaches known to be consistent with $\|m_{n,1}-m_{0}\|_{2}=o_{p}(n^{-1/4})$ and $\|m_{n,2}-m_{0}\|_{2}=o_{p}(n^{-1/4})$. Although the two estimators share the same asymptotic variance, the estimated variances based on $\varphi(\pi_{0},Q_{0,m_{n,1}};m_{n,1})$ and $\varphi(\pi_{0},Q_{0,m_{n,2}};m_{n,2})$ may differ in finite samples. As an illustration, one could compare a super learner whose library includes the highly adaptive lasso (HAL) alongside other parametric and nonparametric methods against a HAL-only model.

4 Efficient plugin estimation

4.1 The estimator

Although doubly robust procedures facilitate the use of data-adaptive techniques for modeling nuisance parameters, the resulting estimator can be irregular, with large bias and a slow rate of convergence, when either of the nuisance parameters is inconsistently estimated (van der Laan, 2014; Benkeser et al., 2017). To mitigate this issue, we propose a sieve-based IPW estimator, called UIPW, in which the propensity scores are estimated using an undersmoothed highly adaptive lasso. Define the estimator as

ΨnUIPW(mn,πn,λ)=𝔼Bnn,b1[{t=1TI(At=dtθ)πn,b,λt(At=dtθH¯t)}{Ymn,b(θ,V)}2dF(θ)],\displaystyle\Psi_{n}^{\mathrm{UIPW}}(m_{n},\pi_{n,\lambda})=\mathbb{E}_{B_{n}}\mathbb{P}^{1}_{n,b}\left[\int\left\{\prod_{t=1}^{T}\frac{I(A_{t}=d_{t}^{\theta})}{\pi^{t}_{n,b,\lambda}(A_{t}=d_{t}^{\theta}\mid\bar{H}_{t})}\right\}\left\{Y-m_{n,b}(\theta,V)\right\}^{2}\,\mathrm{d}F(\theta)\right], (6)

where $\pi^{t}_{n,b,\lambda}$ is the highly adaptive lasso estimate of $\pi^{t}_{0}$ obtained using the $b$th sample split for a given tuning parameter $\lambda$. Smaller values of $\lambda$ imply a weaker penalty and thereby undersmooth the fit.
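As a minimal computational sketch of the weighted risk in (6) for a single fold, assuming the propensities, rule-recommended treatments, and working-model predictions have already been evaluated on a grid of $\theta$ draws (array names and shapes are our assumptions, not the paper's notation):

```python
import numpy as np

def uipw_risk(Y, A, prob_treat, d_rules, m_vals):
    """Monte Carlo evaluation of the weighted risk in (6) for one fold:
    Y is (n,), A is (n, T) observed treatments, prob_treat is
    (n, T, n_theta) fitted propensities for following the rule,
    d_rules is (n, T, n_theta) rule-recommended treatments, and
    m_vals is (n, n_theta) working-model predictions m(theta, V).
    The integral dF(theta) is approximated by averaging over draws."""
    follow = (A[:, :, None] == d_rules)              # I(A_t = d_t^theta)
    weights = np.where(follow, 1.0, 0.0) / prob_treat
    w = weights.prod(axis=1)                         # product over t = 1..T
    loss = (Y[:, None] - m_vals) ** 2                # squared error per theta draw
    return (w * loss).mean()                         # average over n and theta
```

With $T=1$, a rule that everyone follows, and unit propensities, the expression collapses to the empirical mean squared error, which provides a quick correctness check.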

Our theoretical results show that, for specific choices of $\lambda$, the estimator $\Psi_{m_{n},n}^{\mathrm{UIPW}}(\pi_{n,\lambda})$ solves the efficient influence function presented in Theorem 1, and hence achieves nonparametric efficiency. This result relies on two key assumptions:

Assumption 4 (Regularity of Functions).

Q0,mtQ^{t}_{0,m^{*}} and π0t\pi^{t}_{0} for any 1tT1\leq t\leq T are cadlag functions with finite sectional variation norm.

Assumption 5 (Approximation of Functions).

For $1\leq t\leq T$, let $f_{t}=\frac{Q^{t}_{0,m^{*}}}{\pi^{t}_{0}}\prod_{j=1}^{t-1}\frac{I(A_{j}=d^{\theta}_{j})}{\pi^{j}_{0}}$, and let $\tilde{f}_{t}$ be the projection of $f_{t}$ onto the linear span of the basis functions $\phi^{t}_{r,i}$ in $L^{2}(P)$. Here, $\phi^{t}_{r,i}$ denotes the basis functions generated by the highly adaptive lasso fit of $\pi^{t}_{n,b,\lambda}$. We assume that $\|f_{t}-\tilde{f}_{t}\|_{L_{2}(\mathbb{P}_{0})}=O_{p}(n^{-1/4})$.

Assumption 4 typically holds in practical applications because the class of cadlag functions with finite sectional variation norm is extensive. Assumption 5, on the other hand, posits that the basis functions generated in the fit $\pi^{t}_{n,b,\lambda}$ are rich enough to approximate $f_{t}$ within an $O_{p}(n^{-1/4})$ neighborhood. Without undersmoothing, the generated basis functions can effectively approximate $f_{t}$ when $\pi^{t}_{0}$ is as complex as $f_{t}$; however, when the complexity of $f_{t}$ surpasses that of $\pi^{t}_{0}$, more undersmoothing is necessary.

Theorem 4 (Efficiency of UIPW estimator).

Let

\displaystyle D^{t}_{CAR}(\pi,Q)=\int\left\{\frac{I(A_{t}=d^{\theta}_{t})-\pi^{t}}{\pi^{t}}Q^{t}_{m}\prod_{j=1}^{t-1}\frac{I(A_{j}=d_{j}^{\theta})}{\pi^{j}}\right\}dF(\theta).

Under Assumptions 1, 4, and 5, and provided $\mathbb{E}_{B_{n}}\int\|\pi_{n,b}-\pi_{0}\|_{2}\cdot\left(\|\pi_{n,b}-\pi_{0}\|_{2}+\|m_{n,b}-m^{*}\|_{2}\right)dF(\theta)=o_{p}(n^{-1/2})$, there exists a set $\lambda_{opt}=\{\lambda^{1}_{opt},\ldots,\lambda^{T}_{opt}\}$ such that $\mathbb{E}_{B_{n}}\mathbb{P}_{n}\sum_{t=1}^{T}D^{t}_{CAR}(\pi_{b,\lambda_{opt}},Q_{0,m^{*}})=o_{p}(n^{-1/2})$, and the estimator $\Psi^{\mathrm{UIPW}}_{n}(m_{n},\pi_{n,\lambda_{opt}})$ is asymptotically efficient and satisfies

ΨnUIPW(mn,πn,λopt)Ψ0(mn,π0)=(n0)φ(π0,Qm,0;m)+op(n1/2).\Psi^{UIPW}_{n}(m_{n},\pi_{n,\lambda_{opt}})-\Psi_{0}(m_{n},\pi_{0})=(\mathbb{P}_{n}-\mathbb{P}_{0})\varphi(\pi_{0},Q_{m^{*},0};m^{*})+o_{p}(n^{-1/2}).

4.2 Undersmoothing criteria in practice

As discussed in Section 2.4, without undersmoothing the estimator $\Psi_{m_{n},n}^{\mathrm{UIPW}}$ may fail to be asymptotically linear. An IPW estimator is efficient if it solves the efficient influence function. Theorem 4 shows that the latter is achieved when $\mathbb{P}_{n}\sum_{t=1}^{T}D^{t}_{CAR}(\pi_{n,\lambda},Q_{0,m^{*}})=o_{p}(n^{-1/2})$. The theoretical rate alone may not provide practical guidance because of the unknown constant. However, given the finite sample at hand, one can achieve the best performance by selecting the tuning parameter that minimizes $\mathbb{P}_{n}\sum_{t=1}^{T}D^{t}_{CAR}(\pi_{n,\lambda},Q_{0,m^{*}})$. Specifically, we define

λn,t=argminλ|B1b=1Bn,b1DCARt(πn,b,λ,Qn,b,mb)|,\displaystyle\lambda_{n,t}=\operatorname*{argmin}_{\lambda}\left|B^{-1}\sum_{b=1}^{B}\mathbb{P}^{1}_{n,b}D^{t}_{CAR}(\pi_{n,b,\lambda},Q_{n,b,m_{b}})\right|, (7)

where $Q_{n,b,m_{b}}$ is a cross-validated highly adaptive lasso estimate of $Q_{0,m^{*}}$ with the $L_{1}$-norm bound chosen by the global cross-validation selector.
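Operationally, criterion (7) reduces to a grid search over candidate penalties; a minimal sketch, assuming the fold-averaged per-observation $D_{CAR}$ values have already been computed for each candidate $\lambda$:

```python
import numpy as np

def select_lambda_dcar(dcar_by_lambda):
    """Criterion (7): choose the lambda whose absolute empirical mean
    of sum_t D_CAR^t, averaged across folds, is closest to zero.
    dcar_by_lambda maps each candidate lambda to a 1-d array of
    fold-averaged, per-observation D_CAR values."""
    crit = {lam: abs(np.mean(vals)) for lam, vals in dcar_by_lambda.items()}
    return min(crit, key=crit.get)
```

In practice, the grid of candidate penalties would range from the cross-validated choice down to substantially smaller (more undersmoothed) values.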

The criterion (7) relies on knowledge about the efficient influence function. However, in some cases, deriving this function can be intricate (e.g., longitudinal settings with many decision points) or may not have a closed-form solution (e.g., bivariate survival probability in bivariate right-censored data) (van der Laan, 1996). To this end, we propose an alternative criterion, drawing from the approach in Ertefaie et al. (2023), that does not require the efficient influence function. Define

λ~n,t=argminλB1b=1B[(r,i)𝒥n1βt,n,λ,bL1|n,b1Ω~r,i(ϕ,πn,b,λt)|],\tilde{\lambda}_{n,t}=\operatorname*{argmin}_{\lambda}B^{-1}\sum_{b=1}^{B}\left[\sum_{(r,i)\in\mathcal{J}_{n}}\frac{1}{\lVert\beta_{t,n,\lambda,b}\rVert_{L_{1}}}\bigg{\lvert}\mathbb{P}_{n,b}^{1}\tilde{\Omega}_{r,i}(\phi,\pi^{t}_{n,b,\lambda})\bigg{\rvert}\right], (8)

in which $\lVert\beta_{t,n,\lambda,b}\rVert_{L_{1}}=\lvert\beta_{t,n,\lambda,0}\rvert+\sum_{r\subset\{1,\ldots,d\}}\sum_{i=1}^{n}\lvert\beta_{t,n,\lambda,r,i}\rvert$ is the $L_{1}$-norm of the coefficients $\beta_{t,n,\lambda,r,i}$ in the highly adaptive lasso estimator $\pi^{t}_{n,\lambda}$ for a given $\lambda$, and $\tilde{\Omega}_{r,i}(\phi,\pi^{t}_{n,\lambda,v})=\phi_{r,i}(\bar{H}_{t})\{A_{t}-\pi^{t}_{n,\lambda,v}(1\mid\bar{H}_{t})\}\{\pi^{t}_{n,\lambda,v}(1\mid\bar{H}_{t})\}^{-1}$.
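For a single stage and a single candidate $\lambda$, the inner quantity of criterion (8) can be sketched as follows (the inputs `phi`, `pi_hat`, and `beta` are our hypothetical names for the HAL basis matrix, the fitted propensities, and the lasso coefficients):

```python
import numpy as np

def score_criterion(phi, A, pi_hat, beta):
    """Inner sum of criterion (8) for one stage and one lambda:
    averages the basis-specific score terms Omega-tilde_{r,i} over
    observations and scales by the L1 norm of the lasso coefficients."""
    resid = (A - pi_hat) / pi_hat                # {A_t - pi(1|H)} / pi(1|H)
    scores = phi * resid[:, None]                # Omega-tilde per basis function
    l1 = np.abs(beta).sum()                      # |beta_0| + sum |beta_{r,i}|
    return np.abs(scores.mean(axis=0)).sum() / l1
```

A fully score-solving fit drives each averaged term toward zero, which is why smaller values of this criterion indicate more undersmoothing.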

The main drawback of undersmoothing is the decreased convergence rate of the highly adaptive lasso fit. Our asymptotic linearity proof for $\Psi_{m_{n},n}^{\mathrm{UIPW}}(\pi_{n,\lambda})$ requires $\int\|\pi^{t}_{n}-\pi^{t}_{0}\|_{2}\,dF(\theta)=o_{p}(n^{-1/4})$, which is slower than the $O_{p}(n^{-1/3})$ rate of the cross-validated highly adaptive lasso fit. This gap ensures that we can tolerate a certain degree of undersmoothing without compromising the desired asymptotic properties. Let $J$ be the number of features included in the fit (i.e., $J=|\mathcal{J}_{n}|$). van der Laan (2023) showed that the convergence rate of a highly adaptive lasso is $(J/n)^{1/2}$. Hence, to ensure the required $o_{p}(n^{-1/4})$ rate, we must choose $J$ such that $J<n^{1/2}$; let $\lambda_{n,J}$ be the corresponding $L_{1}$-norm bound. We then propose our score-based criterion as

λn,t=max(λn,J,λ~n,t).\displaystyle\lambda^{\prime}_{n,t}=\max(\lambda_{n,J},\tilde{\lambda}_{n,t}). (9)

In practice, the score-based criterion tends to undersmooth too much (i.e., it picks a very small $\tilde{\lambda}_{n,t}$), and the max operator mitigates this issue.

5 Simulation Studies

5.1 Setting

In our simulation study, we evaluate and compare the performance of our proposed estimators, MR and UIPW, with that of the conventional IPW estimator. We consider a two-stage treatment ($T=2$). The baseline covariates $S_{1,t=1},S_{2,t=1}$ are generated from a normal distribution with mean 0 and standard deviation 0.5. The time-dependent covariates $S_{1,t},S_{2,t}$ are functions of the preceding $S_{1,t-1},S_{2,t-1}$. In particular,

S1,t=Max[Min{0.9(2At1)S1,t1+0.05S1,t1S2,t1+N(0,0.5),8},4]\displaystyle S_{1,t}=\text{Max}[\text{Min}\{0.9(2A_{t-1})S_{1,t-1}+0.05S_{1,t-1}S_{2,t-1}+N(0,0.5),8\},-4]
S2,t=Max[Min{0.9(12At1)S1,t1+0.05S1,t1S2,t1+N(0,0.5),8},4]\displaystyle S_{2,t}=\text{Max}[\text{Min}\{0.9(1-2A_{t-1})S_{1,t-1}+0.05S_{1,t-1}S_{2,t-1}+N(0,0.5),8\},-4]

The outcome $Y=S_{1,T}-S_{2,T}+A_{T}S_{1,T}+N(0,0.1^{2})$ is a function of $S_{1},S_{2}$ in the last stage. The decision rule is $d^{\theta}_{t}=I\{S_{1,t}<\theta\}$, which depends only on $S_{1,t}$. We generate $\theta$ from a normal distribution $f(\theta)=N(0,0.1^{2})$. The treatment $A_{t}$ is generated from a Bernoulli distribution with $P(A_{t}=1\mid S_{1,t},S_{2,t})=\mathrm{expit}(0.1+0.2S_{1,t})$. Additional simulations in which $A_{t}$ is randomized can be found in Section 13 of the Supplementary Material.
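A sketch of this data-generating process, reading the displayed formulas literally (in particular the factor $2A_{t-1}$ and the truncation bounds $-4$ and $8$):

```python
import numpy as np

def simulate(n, rng=None):
    """Draw one dataset from the two-stage (T = 2) simulation design
    described above."""
    rng = rng or np.random.default_rng(0)
    # baseline covariates ~ N(0, 0.5^2)
    S1 = rng.normal(0.0, 0.5, n)
    S2 = rng.normal(0.0, 0.5, n)
    # stage-1 treatment: P(A_1 = 1 | S) = expit(0.1 + 0.2 S_1)
    A1 = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.1 + 0.2 * S1))))
    # truncated covariate updates between stages
    S1_2 = np.clip(0.9 * (2 * A1) * S1 + 0.05 * S1 * S2
                   + rng.normal(0, 0.5, n), -4, 8)
    S2_2 = np.clip(0.9 * (1 - 2 * A1) * S1 + 0.05 * S1 * S2
                   + rng.normal(0, 0.5, n), -4, 8)
    # stage-2 treatment and final outcome
    A2 = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.1 + 0.2 * S1_2))))
    Y = S1_2 - S2_2 + A2 * S1_2 + rng.normal(0.0, 0.1, n)
    return S1_2, S2_2, A1, A2, Y
```

This sketch fixes $T=2$ for concreteness; extending it to more stages only requires looping the covariate update and treatment draw.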

We consider two variants of the UIPW estimator, differing in how the tuning parameter $\lambda$ is selected: UIPW-Dcar chooses $\lambda$ based on criterion (7), while UIPW-Score chooses $\lambda$ based on criterion (9). The nuisance parameters $Q^{t}_{n,m_{n}}$ and $\pi^{t}_{n}$ are estimated using the highly adaptive lasso. We impose the parametric working model $m=E[Y\mid\theta]=\beta_{0}+\beta_{1}\theta$. We perform 360 simulations at sample sizes 250, 500, 850, and 1000. In Section 13 of the Supplementary Material, we present additional simulations in which the working model $m$ is estimated using other regression models, both parametric and nonparametric.

5.2 Efficiency of the Proposed Estimators

We showcase the effectiveness of our proposed estimators via coverage plots, as seen in the right panel of Figure 1. Notably, the coverage of IPW falls below 77% at a sample size of 250 and diminishes further with larger sample sizes. Conversely, the MR and UIPW-Dcar estimators maintain stable coverage around 90% at small sample sizes and achieve 93% and 95% at sample sizes of 850 and 1000, respectively. The coverage of UIPW-Score hovers around 85% at sample sizes of 250 and 500 but climbs rapidly to approximately 92% at a sample size of 1000.

The left panel of Figure 1 provides insight into the scaled bias, calculated by multiplying the bias by the square root of the sample size. The MR estimator displays the smallest scaled bias. Additionally, as the sample size increases, our proposed estimators exhibit a declining trend in scaled bias, indicating convergence towards the true value at a rate of op(n1/2)o_{p}(n^{-1/2}).

Conversely, the scaled bias of the IPW estimator shows an upward trend with increasing sample size, signifying a convergence rate slower than $n^{-1/2}$. This slower convergence is attributable to the bias term of the IPW estimator, which does not diminish quickly enough when the propensity score model is estimated nonparametrically.
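The scaled bias and coverage summaries behind Figure 1 amount to elementary computations over simulation replicates; a sketch assuming each replicate yields a point estimate and a standard error:

```python
import numpy as np

def summarize_replicates(estimates, ses, truth, n):
    """Monte Carlo summaries of the kind plotted in Figure 1: the bias
    of the replicate estimates scaled by sqrt(n), and the empirical
    coverage of nominal 95% Wald intervals."""
    estimates, ses = np.asarray(estimates, float), np.asarray(ses, float)
    scaled_bias = np.sqrt(n) * (estimates.mean() - truth)
    covered = ((estimates - 1.96 * ses <= truth)
               & (truth <= estimates + 1.96 * ses))
    return scaled_bias, covered.mean()
```

A flat or declining scaled bias across sample sizes is the diagnostic for bias vanishing faster than $n^{-1/2}$.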

Figure 1: Bias and Coverage plots

6 Data Application

We implement our proposed methods to determine the optimal treatment strategy for individuals with Parkinson’s disease, focusing on the LS1 study. The LS1 study enrolled patients diagnosed with Parkinson’s disease within the previous five years who were being treated with either Levodopa or a drug in the DRA class (Kieburtz et al., 2015). Given the prevalent use of Levodopa as the first-line therapy, we concentrate solely on patients already on Levodopa before the study. The study includes follow-ups at three months initially and annually thereafter, with our primary outcome ($Y$) being the UPDRS-III score two years after the start of the LS1 study. UPDRS-III is a measure of a patient’s motor function, ranging from 0 to 100.

Within our sample of 238 patients, the UPDRS-III score ranges from 3.00 to 68.00, with mean and median values around 20.00 and first and third quartiles at 13.00 and 26.00, respectively. We focus on two decision points for treatment adjustment: at three months and at one year into the study (i.e., $T=2$). Treatment decisions involve high (total Levodopa $>$ 300 mg/day) or low ($\leq$ 300 mg/day) doses (i.e., $A_{t}\in\{\text{high},\text{low}\}$). Our decision rule, $d^{\theta}$, assigns a high or low dose based on whether the UPDRS-III score exceeds the threshold $\theta$, i.e., $d^{\theta}=I(\theta>\text{UPDRS-III})$.

We seek to find θopt=argminθ[0,30]E[Yθ]\theta_{opt}=\operatorname*{argmin}_{\theta\in[0,30]}E[Y^{\theta}] where the choice of θ\theta’s range [0,30] aligns with our data support. We consider three marginal structural models:

Model 1: Quadratic regression in $\theta$, i.e., $E[Y\mid\theta]=\beta_{0}+\beta_{1}\theta+\beta_{2}\theta^{2}$.

Model 2: Highly adaptive lasso model with zero-order splines.

Model 3: Highly adaptive lasso model with smoothing.

We employ cross-validation to prevent overfitting. For risk calculation, we use four estimators: DR, IPW, and the two UIPWs (Dcar and Score). These estimators leverage the highly adaptive lasso to fit both the propensity score model $\pi^{t}$ and the nuisance parameter $Q^{t}$, using multiple covariates.

The risk estimates in Table 1 indicate that the quadratic regression model (Model 1) consistently yields the lowest risk among all estimators. However, the estimated risks of the highly adaptive lasso models (Models 2 and 3) are close to those of the quadratic regression model. Examining Table 2, the optimal $\theta$ values across folds mostly converge around 30, suggesting minimal improvement with higher Levodopa doses. Figure 2 presents the fitted marginal structural model curves, indicating minimal or no marginal benefit with increased Levodopa doses.

Dyskinesia and motor fluctuation are two major side effects of Levodopa treatment. To examine the association between Levodopa dose (high vs. low) and these side effects, we fit a logistic model that includes the binary dosing of Levodopa as the dependent variable and the presence of dyskinesia or motor fluctuation, recorded in UPDRS-IV one year after treatment initiation, as the independent variable. Our analysis indicates that the odds of developing adverse events in the high-dose group are 1.6 times those in the low-dose group. This finding is consistent with other studies reporting that a high dose of Levodopa is associated with dyskinesia and motor fluctuation (Thanvi et al., 2007). Hence, our analysis suggests favoring low-dose Levodopa for early-stage Parkinson’s patients, as a high dose of Levodopa yields minimal improvement in UPDRS-III scores while increasing the likelihood of adverse events.
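Because the fitted model is a univariate logistic regression with binary variables, the reported quantity equals the sample odds ratio of the corresponding 2×2 table; a sketch with hypothetical input arrays:

```python
import numpy as np

def odds_ratio(high_dose, adverse):
    """Sample odds ratio of an adverse event (dyskinesia or motor
    fluctuation) for high- vs. low-dose Levodopa; equivalent to the
    exponentiated slope of a univariate logistic regression."""
    high_dose = np.asarray(high_dose, bool)
    adverse = np.asarray(adverse, bool)
    a = np.sum(high_dose & adverse)     # high dose, adverse event
    b = np.sum(high_dose & ~adverse)    # high dose, no event
    c = np.sum(~high_dose & adverse)    # low dose, adverse event
    d = np.sum(~high_dose & ~adverse)   # low dose, no event
    return (a * d) / (b * c)            # cross-product ratio
```

Because the 2×2 odds ratio is symmetric in its two variables, the same number is obtained whichever binary variable is treated as the outcome.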

Table 1: Estimated risk with different risk estimators
Model DR UIPW-Dcar IPW UIPW-Score
Model 1 114.19 199.33 199.33 199.33
Model 2 125.64 204.16 204.16 204.16
Model 3 121.94 203.02 203.02 203.02
Table 2: Optimal points with each fold
Fold Model 1 Model 2 Model 3
Fold 1 19.5 0.0 30.0
Fold 2 30.0 18.5 30.0
Fold 3 30.0 18.5 22.5
Fold 4 17.0 21.5 25.0
Fold 5 30.0 18.5 30.0
Figure 2: Working models estimated using 5-fold cross validation

7 Discussion

This paper introduces two efficient estimators for assessing the risk associated with a marginal structural model. The first is a multiply robust estimator, shown to achieve efficiency when all nuisance parameters are estimated consistently; it remains consistent whenever either the propensity score or the mean outcome model is correctly specified. The second is a sieve-based IPW estimator that leverages an undersmoothed highly adaptive lasso to attain efficiency.

Our work has several promising directions for extension. While our method is adaptable to studies with multiple stages, the positivity assumption often fails in scenarios with a large number of stages. A potential modification involves transitioning from a deterministic to a stochastic decision rule, in which a probability is assigned to each possible treatment such that higher probabilities reflect a higher chance of being the optimal treatment. Stochastic rules are of interest because they carry more information about the relative optimality of a treatment compared with the other treatment choices.

To facilitate the comparison of several marginal structural models, one can use our asymptotic linearity results to identify a set of best models that contains the true best model with a given probability. This would require generalizing the multiple-comparison-with-the-best approach to our setting. Dynamic marginal structural models can also be used with time-to-event data by considering structural Cox or accelerated failure time models for the potential outcomes.

Acknowledgements

CP, BRB, and AE were supported by the National Institute of Neurological Disorders and Stroke (R61/R33 NS120240).

References

  • Benkeser and van der Laan [2016] David Benkeser and Mark van der Laan. The highly adaptive lasso estimator. In IEEE International Conference on Data Science and Advanced Analytics, pages 689–696. IEEE, 2016.
  • Benkeser et al. [2017] David Benkeser, Marco Carone, Mark J van der Laan, and Peter B Gilbert. Doubly robust nonparametric inference on the average treatment effect. Biometrika, 104(4):863–880, 2017.
  • Bibaut and van der Laan [2019] Aurelien F Bibaut and Mark J van der Laan. Fast rates for empirical risk minimization over cadlag functions with bounded sectional variation norm. arXiv preprint arXiv:1907.09244, 2019.
  • Bickel et al. [1993] Peter J Bickel, Chris AJ Klaassen, Ya’acov Ritov, and Jon A Wellner. Efficient and Adaptive Estimation for Semiparametric Models. Springer, New York, 1993.
  • Brookhart and van der Laan [2006] M Alan Brookhart and Mark J van der Laan. A semiparametric model selection criterion with applications to the marginal structural model. Computational Statistics & Data Analysis, 50(2):475–498, 2006.
  • Cole and Hernán [2008] Stephen R Cole and Miguel A Hernán. Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology, 168(6):656–664, 2008.
  • Duque et al. [2021] Daniel Rodriguez Duque, David A Stephens, and Erica EM Moodie. Estimation of optimal dynamic treatment regimes using gaussian process emulation. arXiv preprint arXiv:2105.12259, 2021.
  • Ertefaie et al. [2016] Ashkan Ertefaie, Tianshuang Wu, Kevin G Lynch, and Inbal Nahum-Shani. Identifying a set that contains the best dynamic treatment regimes. Biostatistics, 17(1):135–148, 2016.
  • Ertefaie et al. [2023] Ashkan Ertefaie, Nima S Hejazi, and Mark J van der Laan. Nonparametric inverse-probability-weighted estimators based on the highly adaptive lasso. Biometrics, 79(2):1029–1041, 2023.
  • Gill et al. [1995] Richard D Gill, Mark J van der Laan, and Jon A Wellner. Inefficient estimators of the bivariate survival function for three models. In Annales de l’IHP Probabilités et statistiques, volume 31, pages 545–597, 1995.
  • Hernán and Robins [2023] Miguel A. Hernán and James M. Robins. Causal Inference: What If. Chapman & Hall/CRC, Boca Raton, 2023.
  • Howe et al. [2011] Chanelle J Howe, Stephen R Cole, Joan S Chmiel, and Alvaro Munoz. Limitation of inverse probability-of-censoring weights in estimating survival in the presence of strong selection bias. American Journal of Epidemiology, 173(5):569–577, 2011.
  • Hubbard et al. [2016] Alan E Hubbard, Sara Kherad-Pajouh, and Mark J van der Laan. Statistical inference for data adaptive target parameters. The International Journal of Biostatistics, 12(1):3–19, 2016.
  • Imai and Ratkovic [2015] Kosuke Imai and Marc Ratkovic. Robust estimation of inverse probability weights for marginal structural models. Journal of the American Statistical Association, 110(511):1013–1023, 2015.
  • Kennedy [2019] Edward H Kennedy. Nonparametric causal effects based on incremental propensity score interventions. Journal of the American Statistical Association, 114(526):645–656, 2019.
  • Kennedy [2022] Edward H Kennedy. Semiparametric doubly robust targeted double machine learning: a review. arXiv preprint arXiv:2203.06469, 2022.
  • Kieburtz et al. [2015] Karl Kieburtz, Barbara C Tilley, Jordan J Elm, Debra Babcock, Robert Hauser, G Webster Ross, Alicia H Augustine, Erika U Augustine, Michael J Aminoff, Ivan G Bodis-Wollner, et al. Effect of creatine monohydrate on clinical progression in patients with parkinson disease: a randomized clinical trial. Journal of the American Medical Association, 313(6):584–593, 2015.
  • Mortimer et al. [2005] Kathleen M Mortimer, Romain Neugebauer, Mark van der Laan, and Ira B Tager. An application of model-fitting procedures for marginal structural models. American Journal of Epidemiology, 162(4):382–388, 2005.
  • Murphy et al. [2001] Susan A Murphy, Mark J van der Laan, James M Robins, and Conduct Problems Prevention Research Group. Marginal mean models for dynamic regimes. Journal of the American Statistical Association, 96(456):1410–1423, 2001.
  • Neugebauer and van der Laan [2007] Romain Neugebauer and Mark van der Laan. Nonparametric causal effects based on marginal structural models. Journal of Statistical Planning and Inference, 137(2):419–434, 2007.
  • Orellana et al. [2010] Liliana Orellana, Andrea Rotnitzky, and James M Robins. Dynamic regime marginal structural mean models for estimation of optimal dynamic treatment regimes, part i: main content. The International Journal of Biostatistics, 6(2), 2010.
  • Robins [1986] James Robins. A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Mathematical Modelling, 7(9-12):1393–1512, 1986.
  • Robins [1989] James M Robins. The analysis of randomized and non-randomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. In L Sechrest, H Freeman, and A Mulley, editors, Health Service Research Methodology: A Focus on AIDS, pages 113–159, Washington, D.C., 1989. U.S. Public Health Service, National Center for Health Services Research.
  • Robins [1997] James M Robins. Causal inference from complex longitudinal data. In M Berkane, editor, Latent Variable Modeling and Applications to Causality, pages 69–117, New York, 1997. Springer.
  • Robins et al. [2000] James M Robins, Miguel Angel Hernan, and Babette Brumback. Marginal structural models and causal inference in epidemiology. Epidemiology, pages 550–560, 2000.
  • Rotnitzky et al. [1998] Andrea Rotnitzky, James M Robins, and Daniel O Scharfstein. Semiparametric regression for repeated outcomes with nonignorable nonresponse. Journal of the American Statistical Association, 93(444):1321–1339, 1998.
  • Shortreed and Moodie [2012] Susan M Shortreed and Erica EM Moodie. Estimating the optimal dynamic antipsychotic treatment regime: evidence from the sequential multiple-assignment randomized clinical antipsychotic trials of intervention and effectiveness schizophrenia study. Journal of the Royal Statistical Society Series C: Applied Statistics, 61(4):577–599, 2012.
  • Thanvi et al. [2007] Bhomraj Thanvi, Nelson Lo, and Tom Robinson. Levodopa-induced dyskinesia in parkinson’s disease: clinical features, pathogenesis, prevention and treatment. Postgraduate Medical Journal, 83(980):384–388, 2007.
  • Tsiatis [2006] Anastasios A Tsiatis. Semiparametric Theory and Missing Data. Springer, New York, 2006.
  • van der Laan [2023] Mark van der Laan. Higher order spline highly adaptive lasso estimators of functional parameters: Pointwise asymptotic normality and uniform convergence rates. arXiv preprint arXiv:2301.13354, 2023.
  • van der Laan [1996] Mark J van der Laan. Efficient estimation in the bivariate censoring model and repairing npmle. The Annals of Statistics, 24(2):596–627, 1996.
  • van der Laan [2014] Mark J van der Laan. Targeted estimation of nuisance parameters to obtain valid statistical inference. The International Journal of Biostatistics, 10(1):29–57, 2014.
  • van der Laan and Dudoit [2003] Mark J van der Laan and Sandrine Dudoit. Unified cross-validation methodology for selection among estimators and a general cross-validated adaptive epsilon-net estimator: Finite sample oracle inequalities and examples. U.C. Berkeley Division of Biostatistics Working Paper Series, 2003. Working Paper 130.
  • van der Laan and Robins [2003] Mark J van der Laan and James M Robins. Unified Methods for Censored Longitudinal Data and Causality. Springer, New York, 2003.
  • van der Vaart [1998] Aad W van der Vaart. Asymptotic Statistics. Cambridge University Press, Cambridge, 1998.
  • van der Vaart et al. [2006] Aad W van der Vaart, Sandrine Dudoit, and Mark J van der Laan. Oracle inequalities for multi-fold cross validation. Statistics & Decisions, 24(3):351–371, 2006.
  • Yu and van der Laan [2006] Zhuo Yu and Mark van der Laan. Double robust estimation in longitudinal marginal structural models. Journal of Statistical Planning and Inference, 136(3):1061–1089, 2006.

SUPPLEMENTARY MATERIAL

8 Proof of Lemma 1

The lemma follows in the usual way. For any given regimen-response curve mm, we see that

0[t=1TI(At=dtθ)π0t(At=dtθH¯t){Ym(θ,V)}2dF(θ)]\displaystyle\mathbb{P}_{0}\left[\int\prod_{t=1}^{T}\frac{I(A_{t}=d_{t}^{\theta})}{\pi^{t}_{0}(A_{t}=d_{t}^{\theta}\mid\bar{H}_{t})}\left\{Y-m(\theta,V)\right\}^{2}\,\mathrm{d}F(\theta)\right]
=0[t=1TI(At=dtθ)π0t(At=dtθH¯t){Yθm(θ,V)}2dF(θ)]\displaystyle\hskip 42.67912pt=\mathbb{P}_{0}\left[\int\prod_{t=1}^{T}\frac{I(A_{t}=d_{t}^{\theta})}{\pi^{t}_{0}(A_{t}=d_{t}^{\theta}\mid\bar{H}_{t})}\left\{Y_{\theta}-m(\theta,V)\right\}^{2}\,\mathrm{d}F(\theta)\right]
=0[π0T(At=dtθH¯t,Yθ)π0T(At=dtθH¯t)t=1T1I(At=dtθ)π0t(At=dtθH¯t){Yθm(θ,V)}2dF(θ)]\displaystyle\hskip 42.67912pt=\mathbb{P}_{0}\left[\int\frac{\pi^{T}_{0}(A_{t}=d_{t}^{\theta}\mid\bar{H}_{t},Y_{\theta})}{\pi^{T}_{0}(A_{t}=d_{t}^{\theta}\mid\bar{H}_{t})}\prod_{t=1}^{T-1}\frac{I(A_{t}=d_{t}^{\theta})}{\pi^{t}_{0}(A_{t}=d_{t}^{\theta}\mid\bar{H}_{t})}\left\{Y_{\theta}-m(\theta,V)\right\}^{2}\,\mathrm{d}F(\theta)\right]
=0[t=1T1I(At=dtθ)π0t(At=dtθH¯t){Yθm(θ,V)}2dF(θ)]\displaystyle\hskip 42.67912pt=\mathbb{P}_{0}\left[\int\prod_{t=1}^{T-1}\frac{I(A_{t}=d_{t}^{\theta})}{\pi^{t}_{0}(A_{t}=d_{t}^{\theta}\mid\bar{H}_{t})}\left\{Y_{\theta}-m(\theta,V)\right\}^{2}\,\mathrm{d}F(\theta)\right]
=\displaystyle\hskip 42.67912pt=\dots
=0[{Yθm(θ,V)}2dF(θ)],\displaystyle\hskip 42.67912pt=\mathbb{P}_{0}\left[\int\left\{Y_{\theta}-m(\theta,V)\right\}^{2}\,\mathrm{d}F(\theta)\right],

where the second equality applies iterated expectation, the third uses the sequential randomization assumption (so that the leading ratio equals one), and the dots formally follow by induction over the remaining stages.

Now, recall that

Ψ0(mn)=𝔼Bn0{Yθmn,b(θ,V)}2dF(θ)\Psi_{0}(m_{n})=\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\{Y^{\theta}-m_{n,b}(\theta,V)\}^{2}\,\mathrm{d}F(\theta)

is an average over folds. Taking the regimen-response curve mm to be the curve mn,bm_{n,b} fitted to the training data n,b1\mathbb{P}^{1}_{n,b} and 0\mathbb{P}_{0} to be the expectation with respect to a (new) test point, our previous calculation shows that

0[t=1TI(At=dtθ)π0t(At=dtθH¯t){Ymn,b(θ,V)}2dF(θ)]=0[{Yθmn,b(θ,V)}2dF(θ)].\displaystyle\mathbb{P}_{0}\left[\int\prod_{t=1}^{T}\frac{I(A_{t}=d_{t}^{\theta})}{\pi^{t}_{0}(A_{t}=d_{t}^{\theta}\mid\bar{H}_{t})}\left\{Y-m_{n,b}(\theta,V)\right\}^{2}\,\mathrm{d}F(\theta)\right]=\mathbb{P}_{0}\left[\int\left\{Y_{\theta}-m_{n,b}(\theta,V)\right\}^{2}\,\mathrm{d}F(\theta)\right].

The result now follows by averaging over folds.
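The identity can also be checked numerically. The following one-stage ($T=1$) example with a trivial working model $m\equiv 0$ is entirely our construction and is not the simulation design of Section 5; it verifies that the IPW-weighted observed risk agrees with the counterfactual risk when the true propensity score is used.

```python
import numpy as np

# One-stage numerical check of the identity proved above.
rng = np.random.default_rng(1)
n = 200_000
S = rng.normal(0, 1, n)
pi = 1 / (1 + np.exp(-0.5 * S))              # true propensity P(A = 1 | S)
A = rng.binomial(1, pi)
Y1 = S + 1 + rng.normal(0, 1, n)             # potential outcome under a = 1
Y0 = S + rng.normal(0, 1, n)                 # potential outcome under a = 0
Y = np.where(A == 1, Y1, Y0)                 # observed outcome (consistency)
d = (S < 0).astype(int)                      # rule d^theta with theta = 0
m = 0.0                                      # trivial working model
follow = (A == d)
w = follow / np.where(A == 1, pi, 1 - pi)    # I(A = d) / pi(A = d | S)
ipw_risk = np.mean(w * (Y - m) ** 2)         # left-hand side of the identity
Y_d = np.where(d == 1, Y1, Y0)               # counterfactual outcome under the rule
true_risk = np.mean((Y_d - m) ** 2)          # right-hand side of the identity
```

Up to Monte Carlo error, the two means coincide, as the displayed chain of equalities predicts.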

9 Proof of Theorem 1

The target parameter Ψ0,nda(m)\Psi_{0,\text{nda}}(m) is a pathwise differentiable function. Hence, the canonical gradient of Ψ0,nda(m)\Psi_{0,\text{nda}}(m) in a nonparametric model \mathcal{M} corresponds to the efficient influence function of a regular, asymptotically linear (RAL) estimator. Let Ψϵ,nda(m)=PϵLm(π)\Psi_{\epsilon,\text{nda}}(m)=P_{\epsilon}L_{m}(\pi) where Pϵ=(1+ϵ𝒮)P0P_{\epsilon}=(1+\epsilon\mathcal{S})P_{0} is a perturbed data distribution. The canonical gradient φ\varphi is the one that satisfies the following equation:

ϵΨϵ,nda(m)ϵ=0=E{φ(m)𝒮(O;ϵ)}ϵ=0,\frac{\partial}{\partial\epsilon}\Psi_{\epsilon,\text{nda}}(m)\mid_{\epsilon=0}=E\{\varphi(m)\mathcal{S}(O;\epsilon)\}\mid_{\epsilon=0},

where 𝒮(O;ϵ)=ϵlogf(O;ϵ)\mathcal{S}(O;\epsilon)=\frac{\partial}{\partial\epsilon}\log f(O;\epsilon).

Let πϵ\pi_{\epsilon} denote the perturbed propensity score. Then,

ϵΨϵ,nda(m)ϵ=0\displaystyle\frac{\partial}{\partial\epsilon}\Psi_{\epsilon,\text{nda}}(m)\mid_{\epsilon=0} =𝔼{t=1TI(At=dtθ)π0t(At=dtθ|H¯t){Ym(θ,V)}2𝒮(𝒪)dF(θ)}\displaystyle=\mathbb{E}\int\Big{\{}\prod_{t=1}^{T}\frac{I(A_{t}=d_{t}^{\theta})}{\pi_{0}^{t}(A_{t}=d_{t}^{\theta}|\bar{H}_{t})}\{Y-m(\theta,V)\}^{2}\mathcal{S}(\mathcal{O})\mathrm{d}F(\theta)\Big{\}}
+𝔼[ϵ{t=1TI(At=dtθ)πϵt(At=dtθH¯t){Ym(θ,V)}2}ϵ=0dF(θ)]\displaystyle+\mathbb{E}\left[\int\frac{\partial}{\partial\epsilon}\left\{\prod_{t=1}^{T}\frac{I(A_{t}=d^{\theta}_{t})}{\pi^{t}_{\epsilon}(A_{t}=d^{\theta}_{t}\mid\bar{H}_{t})}\{Y-m(\theta,V)\}^{2}\right\}_{\epsilon=0}\mathrm{d}F(\theta)\right]
=𝔼{t=1TI(At=dtθ)π0t(At=dtθ|H¯t){Ym(θ,V)}2𝒮(𝒪)dF(θ)}\displaystyle=\mathbb{E}\Big{\{}\int\prod_{t=1}^{T}\frac{I(A_{t}=d_{t}^{\theta})}{\pi_{0}^{t}(A_{t}=d_{t}^{\theta}|\bar{H}_{t})}\{Y-m(\theta,V)\}^{2}\mathcal{S}(\mathcal{O})\mathrm{d}F(\theta)\Big{\}}
\displaystyle\quad-\mathbb{E}\int\Big[\sum_{t=1}^{T}\Big\{\prod_{j\neq t}\frac{I(A_{j}=d_{j}^{\theta})}{\pi^{j}_{0}(A_{j}=d^{\theta}_{j}\mid\bar{H}_{j})}\Big\}\frac{I(A_{t}=d_{t}^{\theta})\{Y-m(\theta,V)\}^{2}}{\pi^{t}_{0}(A_{t}=d^{\theta}_{t}\mid\bar{H}_{t})^{2}}\frac{\partial}{\partial\epsilon}\pi_{\epsilon}(A_{t}=d^{\theta}_{t}\mid\bar{H}_{t})\,\mathrm{d}F(\theta)\Big] (10)

We derive the form of (10) by focusing on a given tt which then easily generalizes to all 1tT1\leq t\leq T. We have

𝔼[jtT{I(Aj=djθ)π0j(Aj=djθ|H¯j)}I(At=dtθ)π0t(At=dtθH¯t)2{Ym(θ,V)}2ϵπϵ(At=dtθH¯t)dF(θ)]\displaystyle\mathbb{E}\Big{[}\int\prod_{j\neq t}^{T}\Big{\{}\frac{I(A_{j}=d_{j}^{\theta})}{\pi^{j}_{0}(A_{j}=d^{\theta}_{j}|\bar{H}_{j})}\Big{\}}\frac{I(A_{t}=d_{t}^{\theta})}{\pi^{t}_{0}(A_{t}=d^{\theta}_{t}\mid\bar{H}_{t})^{2}}\{Y-m(\theta,V)\}^{2}\frac{\partial}{\partial\epsilon}\pi_{\epsilon}(A_{t}=d^{\theta}_{t}\mid\bar{H}_{t})\mathrm{d}F(\theta)\Big{]}
=𝔼[I(At=dtθ)π0t{At=dtθH¯t}𝔼[jtTI(Aj=djθ)π0j{Aj=djθ|H¯j}{Ym(θ,V)}2S¯t,A¯t]𝒮(𝒪)dF(θ)]\displaystyle=\mathbb{E}\left[\int\frac{I(A_{t}=d_{t}^{\theta})}{\pi^{t}_{0}\left\{A_{t}=d^{\theta}_{t}\mid\bar{H}_{t}\right\}}\mathbb{E}\left[\prod_{j\neq t}^{T}\frac{I(A_{j}=d_{j}^{\theta})}{\pi^{j}_{0}\left\{A_{j}=d^{\theta}_{j}|\bar{H}_{j}\right\}}\{Y-m(\theta,V)\}^{2}\mid\bar{S}_{t},\bar{A}_{t}\right]\mathcal{S}(\mathcal{O})\mathrm{d}F(\theta)\right]
\displaystyle-\mathbb{E}\Bigg[\int\mathbb{E}\Big[\prod_{j\neq t}^{T}\frac{I(A_{j}=d_{j}^{\theta})}{\pi^{j}_{0}\{A_{j}=d^{\theta}_{j}\mid\bar{H}_{j}\}}\{Y-m(\theta,V)\}^{2}\mid\bar{S}_{t},\bar{A}_{t-1},A_{t}=d^{\theta}_{t}\Big]\mathcal{S}(\mathcal{O})\mathrm{d}F(\theta)\Bigg]
\displaystyle=\mathbb{E}\Bigg[\int\frac{I(A_{t}=d_{t}^{\theta})}{\pi^{t}_{0}\{A_{t}=d^{\theta}_{t}\mid\bar{H}_{t}\}}\mathbb{E}\Big[\prod_{j\neq t}^{T}\frac{I(A_{j}=d_{j}^{\theta})}{\pi_{0}^{j}\{A_{j}=d^{\theta}_{j}\mid\bar{H}_{j}\}}\{Y-m(\theta,V)\}^{2}\mid\bar{S}_{t},\bar{A}_{t-1},A_{t}=d^{\theta}_{t}\Big]\mathcal{S}(\mathcal{O})\mathrm{d}F(\theta)\Bigg]
\displaystyle-\mathbb{E}\Bigg[\int\mathbb{E}\Big[\prod_{j\neq t}^{T}\frac{I(A_{j}=d_{j}^{\theta})}{\pi_{0}^{j}\{A_{j}=d^{\theta}_{j}\mid\bar{H}_{j}\}}\{Y-m(\theta,V)\}^{2}\mid\bar{S}_{t},\bar{A}_{t-1},A_{t}=d^{\theta}_{t}\Big]\mathcal{S}(\mathcal{O})\mathrm{d}F(\theta)\Bigg]
\displaystyle=\mathbb{E}\Bigg[\int\Big\{\prod_{i=1}^{t-1}\frac{I(A_{i}=d^{\theta}_{i})}{\pi^{i}_{0}(A_{i}=d^{\theta}_{i}\mid\bar{H}_{i})}\Big\}Q^{t}_{0,m}\Big[\frac{I(A_{t}=d_{t}^{\theta})}{\pi_{0}^{t}\{A_{t}=d^{\theta}_{t}\mid\bar{H}_{t}\}}-1\Big]\mathcal{S}(\mathcal{O})\mathrm{d}F(\theta)\Bigg]
\displaystyle=\mathbb{E}\Bigg[\int\Big\{\prod_{i=1}^{t-1}\frac{I(A_{i}=d^{\theta}_{i})}{\pi^{i}_{0}(A_{i}=d^{\theta}_{i}\mid\bar{H}_{i})}\Big\}Q^{t}_{0,m}\Big[\frac{I(A_{t}=d_{t}^{\theta})-\pi_{0}^{t}\{A_{t}=d^{\theta}_{t}\mid\bar{H}_{t}\}}{\pi_{0}^{t}\{A_{t}=d^{\theta}_{t}\mid\bar{H}_{t}\}}\Big]\mathcal{S}(\mathcal{O})\mathrm{d}F(\theta)\Bigg],

where

Q^{t}_{0,m}=\mathbb{E}\Big[\prod_{j=t+1}^{T}\frac{I(A_{j}=d_{j}^{\theta})}{\pi_{0}^{j}\{A_{j}=d^{\theta}_{j}\mid\bar{H}_{j}\}}\{Y-m(\theta,V)\}^{2}\mid\bar{H}_{t},A_{t}=d^{\theta}_{t}\Big].

Hence, the un-centered canonical gradient of Ψ0,nda(m)\Psi_{0,\text{nda}}(m) is

\displaystyle\varphi_{m}(\pi,Q_{0,m})=\int\Bigg\{\prod_{t=1}^{T}\frac{I(A_{t}=d_{t}^{\theta})}{\pi^{t}_{0}(A_{t}=d^{\theta}_{t}\mid\bar{H}_{t})}\{Y-m(\theta,V)\}^{2}
\displaystyle-\sum_{t=1}^{T}\frac{I(A_{t}=d^{\theta}_{t})-\pi^{t}_{0}(A_{t}=d^{\theta}_{t}\mid\bar{H}_{t})}{\pi^{t}_{0}(A_{t}=d^{\theta}_{t}\mid\bar{H}_{t})}Q^{t}_{0,m}\prod_{j=1}^{t-1}\frac{I(A_{j}=d_{j}^{\theta})}{\pi^{j}_{0}(A_{j}=d^{\theta}_{j}\mid\bar{H}_{j})}\Bigg\}\,dF(\theta). (11)

10 Proof of Theorem 2

We let πn,b\pi_{n,b} be the estimate of π0\pi_{0} obtained using the bthb^{th} sample split. Similarly, Qn,b,mQ_{n,b,m} is the estimate of Q0,mQ_{0,m} and mbm_{b} is the estimate of m0m_{0} obtained using the bthb^{th} sample split.

ΨnMR(mn,πn,Qn,mn)Ψ0(mn)\displaystyle\Psi^{MR}_{n}(m_{n},\pi_{n},Q_{n,m_{n}})-\Psi_{0}(m_{n})
=\displaystyle= ΨnMR(mn,πn,Qn,mn)Ψ0(mn)±ΨnMR(mn,π0,Q0,mn)\displaystyle\Psi^{MR}_{n}(m_{n},\pi_{n},Q_{n,m_{n}})-\Psi_{0}(m_{n})\pm\Psi_{n}^{MR}(m_{n},\pi_{0},Q_{0,m_{n}})
=\displaystyle= 𝔼Bnn,b1φmn,b(πn,b,Qn,b,mn,b)𝔼Bn0φmn,b(π0,Q0,mn,b)±𝔼Bnn,b1φmn,b(π0,Q0,mn,b)\displaystyle\mathbb{E}_{B_{n}}\mathbb{P}^{1}_{n,b}\varphi_{m_{n,b}}(\pi_{n,b},Q_{n,b,m_{n,b}})-\mathbb{E}_{B_{n}}\mathbb{P}_{0}\varphi_{m_{n,b}}(\pi_{0},Q_{0,m_{n,b}})\pm\mathbb{E}_{B_{n}}\mathbb{P}^{1}_{n,b}\varphi_{m_{n,b}}(\pi_{0},Q_{0,m_{n,b}})
=\displaystyle= 𝔼Bn(n,b10)φmn,b(π0,Q0,mn,b)+𝔼Bnn,b1φmn,b(πn,b,Qn,b,mn,b)𝔼Bnn,b1φmn,b(π0,Q0,mn,b)\displaystyle\mathbb{E}_{B_{n}}(\mathbb{P}^{1}_{n,b}-\mathbb{P}_{0})\varphi_{m_{n,b}}\left(\pi_{0},Q_{0,m_{n,b}}\right)+\mathbb{E}_{B_{n}}\mathbb{P}^{1}_{n,b}\varphi_{m_{n,b}}(\pi_{n,b},Q_{n,b,m_{n,b}})-\mathbb{E}_{B_{n}}\mathbb{P}^{1}_{n,b}\varphi_{m_{n,b}}(\pi_{0},Q_{0,m_{n,b}})
=\displaystyle= 𝔼Bn(n,b10)φmn,b(π0,Q0,mn,b)+𝔼Bnn,b1{φmn,b(πn,b,Qn,b,mn,b)φmn,b(π0,Q0,mn,b)}\displaystyle\mathbb{E}_{B_{n}}(\mathbb{P}^{1}_{n,b}-\mathbb{P}_{0})\varphi_{m_{n,b}}\left(\pi_{0},Q_{0,m_{n,b}}\right)+\mathbb{E}_{B_{n}}\mathbb{P}^{1}_{n,b}\left\{\varphi_{m_{n,b}}(\pi_{n,b},Q_{n,b,m_{n,b}})-\varphi_{m_{n,b}}(\pi_{0},Q_{0,m_{n,b}})\right\}
=\displaystyle= 𝔼Bn(n,b10)φm(π0,Q0,m)+𝔼Bn0{φmn,b(πn,b,Qn,b,mn,b)φmn,b(π0,Q0,mn,b)}+op(n1/2)\displaystyle\mathbb{E}_{B_{n}}(\mathbb{P}^{1}_{n,b}-\mathbb{P}_{0})\varphi_{m^{*}}\left(\pi_{0},Q_{0,m^{*}}\right)+\mathbb{E}_{B_{n}}\mathbb{P}_{0}\left\{\varphi_{m_{n,b}}(\pi_{n,b},Q_{n,b,m_{n,b}})-\varphi_{m_{n,b}}(\pi_{0},Q_{0,m_{n,b}})\right\}+o_{p}(n^{-1/2})
=\displaystyle= (n0)φm(π0,Q0,m)+𝔼Bn0{φmn,b(πn,b,Qn,b,mn,b)φmn,b(π0,Q0,mn,b)}+op(n1/2)\displaystyle(\mathbb{P}_{n}-\mathbb{P}_{0})\varphi_{m^{*}}\left(\pi_{0},Q_{0,m^{*}}\right)+\mathbb{E}_{B_{n}}\mathbb{P}_{0}\left\{\varphi_{m_{n,b}}(\pi_{n,b},Q_{n,b,m_{n,b}})-\varphi_{m_{n,b}}(\pi_{0},Q_{0,m_{n,b}})\right\}+o_{p}(n^{-1/2})

We need to show that 𝔼Bn0{φmn,b(πn,b,Qn,b,mn,b)φmn,b(π0,Q0,mn,b)}=op(n1/2)\mathbb{E}_{B_{n}}\mathbb{P}_{0}\left\{\varphi_{m_{n,b}}(\pi_{n,b},Q_{n,b,m_{n,b}})-\varphi_{m_{n,b}}(\pi_{0},Q_{0,m_{n,b}})\right\}=o_{p}(n^{-1/2}).

𝔼Bn0{φmn,b(πn,b,Qn,b,mn,b)φmn,b(π0,Q0,mn,b)}\displaystyle\mathbb{E}_{B_{n}}\mathbb{P}_{0}\{\varphi_{m_{n,b}}(\pi_{n,b},Q_{n,b,m_{n,b}})-\varphi_{m_{n,b}}(\pi_{0},Q_{0,m_{n,b}})\}
\displaystyle=\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\left[\prod_{t=1}^{T}\frac{I\{A_{t}=d^{\theta}_{t}\}}{\pi^{t}_{n,b}}\{Y-m_{n,b}(\theta,V)\}^{2}-\prod_{t=1}^{T}\frac{I\{A_{t}=d^{\theta}_{t}\}}{\pi^{t}_{0}}\{Y-m_{n,b}(\theta,V)\}^{2}\right]dF(\theta)
+t=1T𝔼Bn0[i=1tI{Ai=diθ}π0iQ0,mn,bti=1tI{Ai=diθ}πn,biQn,b,mn,bt]𝑑F(θ)\displaystyle+\sum_{t=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\left[\prod_{i=1}^{t}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{0}}Q^{t}_{0,m_{n,b}}-\prod_{i=1}^{t}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{n,b}}Q^{t}_{n,b,m_{n,b}}\right]dF(\theta)
+t=1T𝔼Bn0[i=1t1I{Ai=diθ}πn,biQb,mn,bti=1t1I{Ai=diθ}π0iQ0,mn,bt]𝑑F(θ)\displaystyle+\sum_{t=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\left[\prod_{i=1}^{t-1}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{n,b}}Q^{t}_{b,m_{n,b}}-\prod_{i=1}^{t-1}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{0}}Q^{t}_{0,m_{n,b}}\right]dF(\theta)
=(I)+(II)+(III)\displaystyle=(I)+(II)+(III)

For the rest of this calculation, we use the color brown to indicate that a term converges to 0 at the n^{1/2} rate under the required assumptions.
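Each of the three terms is handled with the same telescoping identity for differences of products: for any sequences (a_{t})_{t=1}^{T} and (b_{t})_{t=1}^{T},

```latex
\prod_{t=1}^{T}a_{t}-\prod_{t=1}^{T}b_{t}
  =\sum_{k=1}^{T}\Big(\prod_{t=1}^{k-1}a_{t}\Big)(a_{k}-b_{k})\Big(\prod_{t=k+1}^{T}b_{t}\Big).
```

The \pm steps in the expansions of (I), (II), and (III) below instantiate this identity with a_{t}=I\{A_{t}=d^{\theta}_{t}\}/\pi^{t}_{n,b} and b_{t}=I\{A_{t}=d^{\theta}_{t}\}/\pi^{t}_{0} (or the analogous ratios), so each summand carries exactly one \pi^{k}_{0}-\pi^{k}_{n,b} factor.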

\displaystyle(I)=\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\prod_{t=1}^{T}\frac{I\{A_{t}=d^{\theta}_{t}\}}{\pi^{t}_{n,b}}\{Y-m_{n,b}(\theta,V)\}^{2}-\prod_{t=1}^{T}\frac{I\{A_{t}=d^{\theta}_{t}\}}{\pi^{t}_{0}}\{Y-m_{n,b}(\theta,V)\}^{2}dF(\theta)
±t=1T𝔼Bn0[i=1TI{Ai=diθ}(i=1t1πn,bi)(i=tTπ0i){Ymn,b(θ,V)}2]𝑑F(θ)\displaystyle\pm\sum_{t=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\left[\frac{\prod_{i=1}^{T}I\{A_{i}=d^{\theta}_{i}\}}{\left(\prod_{i=1}^{t-1}\pi^{i}_{n,b}\right)\left(\prod_{i=t}^{T}\pi^{i}_{0}\right)}\left\{Y-m_{n,b}(\theta,V)\right\}^{2}\right]dF(\theta)
\displaystyle=\sum_{t=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{i=1}^{T}I\{A_{i}=d^{\theta}_{i}\}}{\left(\prod_{i=1}^{t}\pi^{i}_{n,b}\right)\left(\prod_{i=t+1}^{T}\pi^{i}_{0}\right)}\left\{Y-m_{n,b}(\theta,V)\right\}^{2}\left\{\frac{\pi^{t}_{0}-\pi^{t}_{n,b}}{\pi^{t}_{0}}\right\}dF(\theta)
\displaystyle={\color[rgb]{.75,.5,.25}\sum_{t=1}^{T}\sum_{l=1}^{t-1}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{i=1}^{T}I\{A_{i}=d^{\theta}_{i}\}}{\left(\prod_{i=1}^{l-1}\pi^{i}_{n,b}\right)\left(\prod_{i=l+1}^{T}\pi^{i}_{0}\right)}\{Y-m_{n,b}(\theta,V)\}^{2}\,\frac{\pi^{l}_{0}-\pi^{l}_{n,b}}{\pi^{l}_{n,b}\pi^{l}_{0}}\,\frac{\pi^{t}_{0}-\pi^{t}_{n,b}}{\pi^{t}_{n,b}}dF(\theta)}
+t=1T𝔼Bn0(i=1TI{Ai=diθ}π0i{Ymn,b(θ,V)}2π0tπn,btπn,bt)𝑑F(θ)\displaystyle+\sum_{t=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\left(\prod_{i=1}^{T}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{0}}\left\{Y-m_{n,b}(\theta,V)\right\}^{2}\frac{\pi^{t}_{0}-\pi^{t}_{n,b}}{\pi^{t}_{n,b}}\right)dF(\theta)
=\displaystyle= t=1T𝔼Bn0(i=1TI{Ai=diθ}π0i{Ymn,b(θ,V)}2π0tπn,btπn,bt)𝑑F(θ)+op(n1/2)\displaystyle\sum_{t=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\left(\prod_{i=1}^{T}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{0}}\left\{Y-m_{n,b}(\theta,V)\right\}^{2}\frac{\pi^{t}_{0}-\pi^{t}_{n,b}}{\pi^{t}_{n,b}}\right)dF(\theta)+o_{p}(n^{-1/2})
=\displaystyle= t=1T𝔼Bn0i=1tI{Ai=diθ}π0i(Q0,mn,bt)π0tπn,btπn,btdF(θ)+op(n1/2)\displaystyle\sum_{t=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\prod_{i=1}^{t}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{0}}\left(Q^{t}_{0,m_{n,b}}\right)\frac{\pi^{t}_{0}-\pi^{t}_{n,b}}{\pi^{t}_{n,b}}dF(\theta)+o_{p}(n^{-1/2})

We continue with Term (II)

(II)=\displaystyle(II)= t=1T𝔼Bn0[i=1tI{Ai=diθ}π0iQ0,mn,bti=1tI{Ai=diθ}πn,biQn,b,mn,bt]𝑑F(θ)\displaystyle\sum_{t=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\left[\prod_{i=1}^{t}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{0}}Q^{t}_{0,m_{n,b}}-\prod_{i=1}^{t}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{n,b}}Q^{t}_{n,b,m_{n,b}}\right]dF(\theta)
=\displaystyle= t=1T𝔼Bn0[i=1tI{Ai=diθ}π0iQ0,mn,bti=1tI{Ai=diθ}πn,biQn,b,mn,bt]𝑑F(θ)\displaystyle\sum_{t=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\left[\prod_{i=1}^{t}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{0}}Q^{t}_{0,m_{n,b}}-\prod_{i=1}^{t}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{n,b}}Q^{t}_{n,b,m_{n,b}}\right]dF(\theta)
±t=1Tk=1t𝔼Bn0i=1tI{Ai=diθ}i=1k1πn,bii=ktπ0iQn,b,mn,bt\displaystyle\pm\sum_{t=1}^{T}\sum_{k=1}^{t}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{i=1}^{t}I\{A_{i}=d^{\theta}_{i}\}}{\prod_{i=1}^{k-1}\pi^{i}_{n,b}\prod_{i=k}^{t}\pi^{i}_{0}}Q^{t}_{n,b,m_{n,b}}
=\displaystyle= t=1T𝔼Bn0i=1tI{Ai=diθ}π0i(Q0,mn,btQn,b,mn,bt)dF(θ)\displaystyle\sum_{t=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\prod_{i=1}^{t}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{0}}\left(Q^{t}_{0,m_{n,b}}-Q^{t}_{n,b,m_{n,b}}\right)dF(\theta)
+t=1Ti=1t𝔼Bn0i=1tI{Ai=diθ}k=1i1(πn,bk)l=it(π0l)(Qn,b,mn,btQ0,mn,bt)(πn,biπ0iπn,bi)𝑑F(θ)\displaystyle+{\color[rgb]{.75,.5,.25}\sum_{t=1}^{T}\sum_{i=1}^{t}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{i=1}^{t}I\{A_{i}=d^{\theta}_{i}\}}{\prod_{k=1}^{i-1}\left(\pi^{k}_{n,b}\right)\prod_{l=i}^{t}\left(\pi^{l}_{0}\right)}\left(Q^{t}_{n,b,m_{n,b}}-Q^{t}_{0,m_{n,b}}\right)\left(\frac{\pi^{i}_{n,b}-\pi^{i}_{0}}{\pi^{i}_{n,b}}\right)dF(\theta)}
+t=1Ti=1t𝔼Bn0i=1tI{Ai=diθ}k=1i1(πn,bk)l=it(π0l)(πn,biπ0iπn,bi)Q0,mn,bt𝑑F(θ)\displaystyle+\sum_{t=1}^{T}\sum_{i=1}^{t}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{i=1}^{t}I\{A_{i}=d^{\theta}_{i}\}}{\prod_{k=1}^{i-1}\left(\pi^{k}_{n,b}\right)\prod_{l=i}^{t}\left(\pi^{l}_{0}\right)}\left(\frac{\pi^{i}_{n,b}-\pi^{i}_{0}}{\pi^{i}_{n,b}}\right)Q^{t}_{0,m_{n,b}}dF(\theta)
=\displaystyle= t=1T𝔼Bn0i=1tI{Ai=diθ}π0i(Qn,b,mn,btQ0,mn,bt)dF(θ)\displaystyle\sum_{t=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\prod_{i=1}^{t}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{0}}\left(Q^{t}_{n,b,m_{n,b}}-Q^{t}_{0,m_{n,b}}\right)dF(\theta)
+t=1Ti=1t𝔼Bn0i=1tI{Ai=diθ}i=1t(π0i)(πn,biπ0iπn,bi)Q0,mn,bt𝑑F(θ)\displaystyle+\sum_{t=1}^{T}\sum_{i=1}^{t}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{i=1}^{t}I\{A_{i}=d^{\theta}_{i}\}}{\prod_{i=1}^{t}\left(\pi^{i}_{0}\right)}\left(\frac{\pi^{i}_{n,b}-\pi^{i}_{0}}{\pi^{i}_{n,b}}\right)Q^{t}_{0,m_{n,b}}dF(\theta)
+t=1T1klt𝔼Bn0i=1tI{Ai=diθ}i=1t(π0i)(πn,blπ0lπn,bl)(π0kπn,bkπn,bk)Q0,mbt𝑑F(θ)+op(n1/2)\displaystyle+{\color[rgb]{.75,.5,.25}\sum_{t=1}^{T}\sum_{1\leq k\leq l\leq t}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{i=1}^{t}I\{A_{i}=d^{\theta}_{i}\}}{\prod_{i=1}^{t}\left(\pi^{i}_{0}\right)}\left(\frac{\pi^{l}_{n,b}-\pi^{l}_{0}}{\pi^{l}_{n,b}}\right)\left(\frac{\pi^{k}_{0}-\pi^{k}_{n,b}}{\pi^{k}_{n,b}}\right)Q^{t}_{0,m_{b}}dF(\theta)}+o_{p}(n^{-1/2})
=\displaystyle= t=1T𝔼Bn0i=1tI{Ai=diθ}π0i(Qn,b,mbtQ0,mbt)dF(θ)\displaystyle\sum_{t=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\prod_{i=1}^{t}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{0}}\left(Q^{t}_{n,b,m_{b}}-Q^{t}_{0,m_{b}}\right)dF(\theta)
+t=1Ti=1t𝔼Bn0i=1tI{Ai=diθ}i=1t(π0i)(πn,biπ0iπn,bi)Q0,mbt𝑑F(θ)+op(n1/2)\displaystyle+\sum_{t=1}^{T}\sum_{i=1}^{t}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{i=1}^{t}I\{A_{i}=d^{\theta}_{i}\}}{\prod_{i=1}^{t}\left(\pi^{i}_{0}\right)}\left(\frac{\pi^{i}_{n,b}-\pi^{i}_{0}}{\pi^{i}_{n,b}}\right)Q^{t}_{0,m_{b}}dF(\theta)+o_{p}(n^{-1/2})

Lastly, we consider the third term, (III).

(III)=\displaystyle(III)= t=1T𝔼Bn0[i=1t1I{Ai=diθ}πn,biQn,b,mn,bti=1t1I{Ai=diθ}πn,biQ0,mn,bt]𝑑F(θ)\displaystyle\sum_{t=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\left[\prod_{i=1}^{t-1}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{n,b}}Q^{t}_{n,b,m_{n,b}}-\prod_{i=1}^{t-1}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{n,b}}Q^{t}_{0,m_{n,b}}\right]dF(\theta)
=\displaystyle= t=1T𝔼Bn0[i=1t1I{Ai=diθ}πn,biQn,b,mn,bti=1t1I{Ai=diθ}πn,biQ0,mn,bt]𝑑F(θ)\displaystyle\sum_{t=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\left[\prod_{i=1}^{t-1}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{n,b}}Q^{t}_{n,b,m_{n,b}}-\prod_{i=1}^{t-1}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{n,b}}Q^{t}_{0,m_{n,b}}\right]dF(\theta)
±t=1Tk=1t1𝔼Bn0i=1t1I{Ai=diθ}i=1k1πn,bii=kt1π0iQn,b,mn,bt\displaystyle\pm\sum_{t=1}^{T}\sum_{k=1}^{t-1}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{i=1}^{t-1}I\{A_{i}=d^{\theta}_{i}\}}{\prod_{i=1}^{k-1}\pi^{i}_{n,b}\prod_{i=k}^{t-1}\pi^{i}_{0}}Q^{t}_{n,b,m_{n,b}}
=\displaystyle= t=1T𝔼Bn0i=1t1I{Ai=diθ}π0i(Qn,b,mn,btQ0,mn,bt)dF(θ)\displaystyle\sum_{t=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\prod_{i=1}^{t-1}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{0}}\left(Q^{t}_{n,b,m_{n,b}}-Q^{t}_{0,m_{n,b}}\right)dF(\theta)
+t=1Ti=1t1𝔼Bn0i=1t1I{Ai=diθ}k=1i(πn,bk)l=it1(π0i)(Q0,mn,btQn,b,mn,bt)(π0lπn,blπn,bl)𝑑F(θ)\displaystyle+{\color[rgb]{.75,.5,.25}\sum_{t=1}^{T}\sum_{i=1}^{t-1}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{i=1}^{t-1}I\{A_{i}=d^{\theta}_{i}\}}{\prod_{k=1}^{i}\left(\pi^{k}_{n,b}\right)\prod_{l=i}^{t-1}\left(\pi^{i}_{0}\right)}\left(Q^{t}_{0,m_{n,b}}-Q^{t}_{n,b,m_{n,b}}\right)\left(\frac{\pi^{l}_{0}-\pi^{l}_{n,b}}{\pi^{l}_{n,b}}\right)dF(\theta)}
+t=1T1lt1𝔼Bn0i=1t1I{Ai=diθ}k=1i(πn,bi)l=it1(π0i)(π0lπn,blπn,bl)Q0,mn,bt𝑑F(θ)\displaystyle+\sum_{t=1}^{T}\sum_{1\leq l\leq t-1}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{i=1}^{t-1}I\{A_{i}=d^{\theta}_{i}\}}{\prod_{k=1}^{i}\left(\pi^{i}_{n,b}\right)\prod_{l=i}^{t-1}\left(\pi^{i}_{0}\right)}\left(\frac{\pi^{l}_{0}-\pi^{l}_{n,b}}{\pi^{l}_{n,b}}\right)Q^{t}_{0,m_{n,b}}dF(\theta)
=\displaystyle= t=1T𝔼Bn0i=1t1I{Ai=diθ}π0i(Q0,mn,btQn,b,mn,bt)dF(θ)\displaystyle\sum_{t=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\prod_{i=1}^{t-1}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{0}}\left(Q^{t}_{0,m_{n,b}}-Q^{t}_{n,b,m_{n,b}}\right)dF(\theta)
+t=1Ti=1t1𝔼Bn0i=1tI{Ai=diθ}i=1t(π0i)(π0iπn,biπn,bi)Q0,mn,bt𝑑F(θ)\displaystyle+\sum_{t=1}^{T}\sum_{i=1}^{t-1}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{i=1}^{t}I\{A_{i}=d^{\theta}_{i}\}}{\prod_{i=1}^{t}\left(\pi^{i}_{0}\right)}\left(\frac{\pi^{i}_{0}-\pi^{i}_{n,b}}{\pi^{i}_{n,b}}\right)Q^{t}_{0,m_{n,b}}dF(\theta)
+t=1T1klt1𝔼Bn0i=1t1I{Ai=diθ}i=1t1(π0i)(π0lπn,blπn,bl)(πn,bkπ0kπn,bk)Qn,b,mn,bt𝑑F(θ)+op(n1/2)\displaystyle+{\color[rgb]{.75,.5,.25}\sum_{t=1}^{T}\sum_{1\leq k\leq l\leq t-1}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{i=1}^{t-1}I\{A_{i}=d^{\theta}_{i}\}}{\prod_{i=1}^{t-1}\left(\pi^{i}_{0}\right)}\left(\frac{\pi^{l}_{0}-\pi^{l}_{n,b}}{\pi^{l}_{n,b}}\right)\left(\frac{\pi^{k}_{n,b}-\pi^{k}_{0}}{\pi^{k}_{n,b}}\right)Q^{t}_{n,b,m_{n,b}}dF(\theta)}+o_{p}(n^{-1/2})
=\displaystyle= t=1T𝔼Bn0i=1t1I{Ai=diθ}π0i(Q0,mn,btQn,b,mn,bt)dF(θ)\displaystyle\sum_{t=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\prod_{i=1}^{t-1}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{0}}\left(Q^{t}_{0,m_{n,b}}-Q^{t}_{n,b,m_{n,b}}\right)dF(\theta)
+t=1Ti=1t1𝔼Bn0i=1t1I{Ai=diθ}i=1t1π0i(π0iπn,biπn,bi)Q0,mn,bt𝑑F(θ)+op(n1/2)\displaystyle+\sum_{t=1}^{T}\sum_{i=1}^{t-1}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{i=1}^{t-1}I\{A_{i}=d^{\theta}_{i}\}}{\prod_{i=1}^{t-1}\pi^{i}_{0}}\left(\frac{\pi^{i}_{0}-\pi^{i}_{n,b}}{\pi^{i}_{n,b}}\right)Q^{t}_{0,m_{n,b}}dF(\theta)+o_{p}(n^{-1/2})

We combine all the remaining terms in (I), (II), and (III)

(I)+(II)+(III)\displaystyle(I)+(II)+(III)
\displaystyle=\sum_{t=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\prod_{i=1}^{t-1}\frac{I\{A_{i}=d^{\theta}_{i}\}}{\pi^{i}_{0}}\left(Q^{t}_{n,b,m_{n,b}}-Q^{t}_{0,m_{n,b}}\right)\left(\frac{I\{A_{t}=d^{\theta}_{t}\}-\pi^{t}_{0}}{\pi^{t}_{0}}\right)dF(\theta)
+t=1Ti=1t𝔼Bn0i=1tI{Ai=diθ}i=1t(π0i)(πn,biπ0iπn,bi)Q0,mn,bt𝑑F(θ)+op(n1/2)\displaystyle+\sum_{t=1}^{T}\sum_{i=1}^{t}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{i=1}^{t}I\{A_{i}=d^{\theta}_{i}\}}{\prod_{i=1}^{t}\left(\pi^{i}_{0}\right)}\left(\frac{\pi^{i}_{n,b}-\pi^{i}_{0}}{\pi^{i}_{n,b}}\right)Q^{t}_{0,m_{n,b}}dF(\theta)+o_{p}(n^{-1/2})
+t=1Ti=1t𝔼Bn0i=1tI{Ai=diθ}i=1t(π0i)(π0iπn,biπn,bi)Q0,mn,bt𝑑F(θ)+op(n1/2)\displaystyle+\sum_{t=1}^{T}\sum_{i=1}^{t}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{i=1}^{t}I\{A_{i}=d^{\theta}_{i}\}}{\prod_{i=1}^{t}\left(\pi^{i}_{0}\right)}\left(\frac{\pi^{i}_{0}-\pi^{i}_{n,b}}{\pi^{i}_{n,b}}\right)Q^{t}_{0,m_{n,b}}dF(\theta)+o_{p}(n^{-1/2})
=op(n1/2)\displaystyle=o_{p}(n^{-1/2})

Hence,

ΨnMR(mn,πn,Qn,mn)Ψ0(mn)=(n0)φm(π0,Q0,m)+op(n1/2)\Psi^{MR}_{n}(m_{n},\pi_{n},Q_{n,m_{n}})-\Psi_{0}(m_{n})=(\mathbb{P}_{n}-\mathbb{P}_{0})\varphi_{m^{*}}(\pi_{0},Q_{0,m^{*}})+o_{p}(n^{-1/2})

11 Proof of Theorem 3

In the calculation below, we show that if we consider \Psi^{MR}_{n}(m_{n},\pi_{n},Q_{n,m_{n}})-\Psi_{0}(m^{*}) instead of \Psi^{MR}_{n}(m_{n},\pi_{n},Q_{n,m_{n}})-\Psi_{0}(m_{n}), a bias term appears.

ΨnMR\displaystyle\Psi^{MR}_{n} (mn,πn,Qn,mn)Ψ0(m)\displaystyle(m_{n},\pi_{n},Q_{n,m_{n}})-\Psi_{0}(m^{*})
=\displaystyle= ΨnMR(mn,πn,Qn,mn)Ψ0(m)±ΨnMR(m,π0,Q0,m)\displaystyle\Psi^{MR}_{n}(m_{n},\pi_{n},Q_{n,m_{n}})-\Psi_{0}(m^{*})\pm\Psi^{MR}_{n}(m^{*},\pi_{0},Q_{0,m^{*}})
\displaystyle=\mathbb{E}_{B_{n}}\mathbb{P}^{1}_{n,b}\varphi_{m_{n,b}}(\pi_{n,b},Q_{n,b,m_{n,b}})-\mathbb{P}_{0}\varphi_{m^{*}}(\pi_{0},Q_{0,m^{*}})\pm\mathbb{E}_{B_{n}}\mathbb{P}^{1}_{n,b}\varphi_{m^{*}}(\pi_{0},Q_{0,m^{*}})
\displaystyle=(\mathbb{P}_{n}-\mathbb{P}_{0})\varphi_{m^{*}}(\pi_{0},Q_{0,m^{*}})+\mathbb{E}_{B_{n}}\mathbb{P}^{1}_{n,b}\left\{\varphi_{m_{n,b}}(\pi_{n,b},Q_{n,b,m_{n,b}})-\varphi_{m^{*}}(\pi_{0},Q_{0,m^{*}})\right\}
\displaystyle\pm\mathbb{E}_{B_{n}}\mathbb{P}^{1}_{n,b}\varphi_{m_{n,b}}(\pi_{0},Q_{0,m_{n,b}})
\displaystyle=(\mathbb{P}_{n}-\mathbb{P}_{0})\varphi_{m^{*}}(\pi_{0},Q_{0,m^{*}})+\mathbb{E}_{B_{n}}\mathbb{P}^{1}_{n,b}\left\{\varphi_{m_{n,b}}(\pi_{n,b},Q_{n,b,m_{n,b}})-\varphi_{m_{n,b}}(\pi_{0},Q_{0,m_{n,b}})\right\}
\displaystyle+\mathbb{E}_{B_{n}}\mathbb{P}^{1}_{n,b}\left\{\varphi_{m_{n,b}}(\pi_{0},Q_{0,m_{n,b}})-\varphi_{m^{*}}(\pi_{0},Q_{0,m^{*}})\right\}
=\displaystyle= (n0)φm(π0,Q0,m)+𝔼Bn0{φmn,b(πn,b,Qn,b,mn,b)φmn,b(π0,Q0,mn,b)}\displaystyle(\mathbb{P}_{n}-\mathbb{P}_{0})\varphi_{m^{*}}(\pi_{0},Q_{0,m^{*}})+\mathbb{E}_{B_{n}}\mathbb{P}_{0}\left\{\varphi_{m_{n,b}}(\pi_{n,b},Q_{n,b,m_{n,b}})-\varphi_{m_{n,b}}(\pi_{0},Q_{0,m_{n,b}})\right\}
+𝔼Bn0{φmn,b(π0,Q0,mn,b)φm(π0,Q0,m)}\displaystyle+\mathbb{E}_{B_{n}}\mathbb{P}_{0}\{\varphi_{m_{n,b}}(\pi_{0},Q_{0,m_{n,b}})-\varphi_{m^{*}}(\pi_{0},Q_{0,m^{*}})\}

In the proof of Theorem 2, we have shown that
𝔼Bn0{φmn,b(πn,b,Qn,b,mn,b)φmn,b(π0,Q0,mn,b)}=op(n1/2)\mathbb{E}_{B_{n}}\mathbb{P}_{0}\left\{\varphi_{m_{n,b}}(\pi_{n,b},Q_{n,b,m_{n,b}})-\varphi_{m_{n,b}}(\pi_{0},Q_{0,m_{n,b}})\right\}=o_{p}(n^{-1/2}). We will continue to examine the third term.

𝔼Bn0\displaystyle\mathbb{E}_{B_{n}}\mathbb{P}_{0} [φmn,b(π0,Q0,mn,b)φm(π0,Q0,m)]\displaystyle[\varphi_{m_{n,b}}(\pi_{0},Q_{0,m_{n,b}})-\varphi_{m^{*}}(\pi_{0},Q_{0,m^{*}})]
\displaystyle=\mathbb{E}_{B_{n}}\int\mathbb{P}_{0}\prod_{t=1}^{T}\frac{I(A_{t}=d^{\theta}_{t})}{\pi^{t}_{0}}\left[\{Y-m_{n,b}(\theta,V)\}^{2}-\{Y-m^{*}(\theta,V)\}^{2}\right]dF(\theta)
\displaystyle=\mathbb{E}_{B_{n}}\int\mathbb{P}_{0}\prod_{t=1}^{T}\frac{I(A_{t}=d^{\theta}_{t})}{\pi^{t}_{0}}\left\{Y^{2}-2Ym_{n,b}(\theta,V)+m_{n,b}^{2}(\theta,V)-Y^{2}+2Ym^{*}(\theta,V)-m^{*}(\theta,V)^{2}\right\}dF(\theta)
\displaystyle=\mathbb{E}_{B_{n}}\int\mathbb{P}_{0}\prod_{t=1}^{T}\frac{I(A_{t}=d^{\theta}_{t})}{\pi^{t}_{0}}\big\{Y^{2}-2m_{0}(\theta,V)m_{n,b}(\theta,V)+m_{n,b}^{2}(\theta,V)-Y^{2}
\displaystyle\hskip 128.0374pt+2m_{0}(\theta,V)m^{*}(\theta,V)-m^{*}(\theta,V)^{2}\big\}dF(\theta)
\displaystyle=\mathbb{E}_{B_{n}}\int\mathbb{P}_{0}\prod_{t=1}^{T}\frac{I(A_{t}=d^{\theta}_{t})}{\pi^{t}_{0}}\left[\{m_{0}(\theta,V)-m_{n,b}(\theta,V)\}^{2}-\{m^{*}(\theta,V)-m_{0}(\theta,V)\}^{2}\right]dF(\theta)
\displaystyle=\mathbb{E}_{B_{n}}\int\mathbb{P}_{0}\prod_{t=1}^{T}\frac{I(A_{t}=d^{\theta}_{t})}{\pi^{t}_{0}}\big[\{m_{0}(\theta,V)-m^{*}(\theta,V)+m^{*}(\theta,V)-m_{n,b}(\theta,V)\}^{2}
\displaystyle\hskip 128.0374pt-\{m^{*}(\theta,V)-m_{0}(\theta,V)\}^{2}\big]dF(\theta)
\displaystyle=\mathbb{E}_{B_{n}}\int\mathbb{P}_{0}\prod_{t=1}^{T}\frac{I(A_{t}=d^{\theta}_{t})}{\pi^{t}_{0}}\big[\{m^{*}(\theta,V)-m_{n,b}(\theta,V)\}^{2}
\displaystyle\hskip 128.0374pt-2\{m^{*}(\theta,V)-m_{0}(\theta,V)\}\{m^{*}(\theta,V)-m_{n,b}(\theta,V)\}\big]dF(\theta)

where m0(θ,V)=E[Y|θ,V]m_{0}(\theta,V)=E[Y|\theta,V] is the true minimizer of the risk function and mn(θ,V)m(θ,V)m_{n}(\theta,V)\xrightarrow{}m^{*}(\theta,V).

Hence 𝔼Bn0[φmn,b(π0,Q0,mn,b)φm(π0,Q0,m)]=op(n1/2)\mathbb{E}_{B_{n}}\mathbb{P}_{0}[\varphi_{m_{n,b}}(\pi_{0},Q_{0,m_{n,b}})-\varphi_{m^{*}}(\pi_{0},Q_{0,m^{*}})]=o_{p}(n^{-1/2}) if and only if m0(θ,V)=m(θ,V)m_{0}(\theta,V)=m^{*}(\theta,V).
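To see why only the cross term can be first order, note that by iterated expectations the inverse probability weights integrate out: for any fixed function g of (\theta,V),

```latex
\mathbb{P}_{0}\prod_{t=1}^{T}\frac{I(A_{t}=d^{\theta}_{t})}{\pi^{t}_{0}(A_{t}=d^{\theta}_{t}\mid\bar{H}_{t})}\,g(\theta,V)
  =\mathbb{E}\{g(\theta,V)\}.
```

Applied to the last display, the first term is second order in m_{n,b}-m^{*}, while the cross term equals -2\mathbb{E}_{B_{n}}\int\mathbb{E}[\{m^{*}(\theta,V)-m_{0}(\theta,V)\}\{m^{*}(\theta,V)-m_{n,b}(\theta,V)\}]dF(\theta), which is first order in m_{n,b}-m^{*} and hence generally not o_{p}(n^{-1/2}) unless m^{*}=m_{0}.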

12 Proof of Theorem 4

In this section, we show that by undersmoothing the propensity score estimator, the IPW estimator becomes asymptotically linear. We start by considering the difference below.

ΨnUIPW(mn,πn)Ψ0(mn)=\displaystyle\Psi^{UIPW}_{n}(m_{n},\pi_{n})-\Psi_{0}(m_{n})= 𝔼Bn0{Lmn,b(πn,b,λ)Lmn,b(π0)}\displaystyle\mathbb{E}_{B_{n}}\mathbb{P}_{0}\left\{L_{m_{n,b}}\left(\pi_{n,b,\lambda}\right)-L_{m_{n,b}}\left(\pi_{0}\right)\right\}
+𝔼Bn(n,b10)Lmn,b(πn,b,λ)\displaystyle+\mathbb{E}_{B_{n}}(\mathbb{P}^{1}_{n,b}-\mathbb{P}_{0})L_{m_{n,b}}\left(\pi_{n,b,\lambda}\right) (12)

The second term on the right-hand side of (12) can be represented as

\displaystyle\mathbb{E}_{B_{n}}(\mathbb{P}^{1}_{n,b}-\mathbb{P}_{0})L_{m_{n,b}}\left(\pi_{n,b,\lambda}\right)=\mathbb{E}_{B_{n}}(\mathbb{P}^{1}_{n,b}-\mathbb{P}_{0})\left\{L_{m_{n,b}}\left(\pi_{n,b,\lambda}\right)-L_{m^{*}}\left(\pi_{0}\right)\right\}
\displaystyle+\mathbb{E}_{B_{n}}(\mathbb{P}_{n,b}^{1}-\mathbb{P}_{0})L_{m^{*}}\left(\pi_{0}\right) (13)
=\displaystyle= 𝔼Bn(n0)Lm(π0)+op(n1/2)\displaystyle\mathbb{E}_{B_{n}}(\mathbb{P}_{n}-\mathbb{P}_{0})L_{m^{*}}\left(\pi_{0}\right)+o_{p}(n^{-1/2}) (14)

The only remaining term to study is the first term in (12).

𝔼Bn\displaystyle\mathbb{E}_{B_{n}} 0{Lmn,b(πn,b,λ)Lmn,b(π0)}\displaystyle\mathbb{P}_{0}\left\{L_{m_{n,b}}\left(\pi_{n,b,\lambda}\right)-L_{m_{n,b}}\left(\pi_{0}\right)\right\}
=\displaystyle= 𝔼Bn0{Lmn,b(πn,b,λ)Lmn,b(π0)}𝔼Bn0{Lm(πn,b,λ)Lm(π0)}\displaystyle\mathbb{E}_{B_{n}}\mathbb{P}_{0}\left\{L_{m_{n,b}}\left(\pi_{n,b,\lambda}\right)-L_{m_{n,b}}\left(\pi_{0}\right)\right\}-\mathbb{E}_{B_{n}}\mathbb{P}_{0}\left\{L_{m^{*}}\left(\pi_{n,b,\lambda}\right)-L_{m^{*}}\left(\pi_{0}\right)\right\}
+𝔼Bn0{Lm(πn,b,λ)Lm(π0)}\displaystyle+\mathbb{E}_{B_{n}}\mathbb{P}_{0}\left\{L_{m^{*}}\left(\pi_{n,b,\lambda}\right)-L_{m^{*}}\left(\pi_{0}\right)\right\}

The terms in the first integral can be written as

𝔼Bn0{Lmn,b(πn,b,λ)Lmn,b(π0)}=𝔼Bn𝔼{(Ymn,b(θ,V))2H¯T}π0(π0πn,b,λπn,b,λπ0)𝑑F(θ)𝑑P(o)\displaystyle\mathbb{E}_{B_{n}}\mathbb{P}_{0}\{L_{m_{n,b}}\left(\pi_{n,b,\lambda}\right)-L_{m_{n,b}}\left(\pi_{0}\right)\}=\mathbb{E}_{B_{n}}\int\int\mathbb{E}\left\{\left(Y-m_{n,b}(\theta,V)\right)^{2}\mid\bar{H}_{T}\right\}\pi_{0}\left(\frac{\pi_{0}-\pi_{n,b,\lambda}}{\pi_{n,b,\lambda}\pi_{0}}\right)dF(\theta)dP\left(o\right)
𝔼Bn0{Lm(πn,b,λ)Lm(π0)}=𝔼Bn𝔼{(Ym(θ,V))2H¯T}π0(π0πn,b,λπn,b,λπ0)𝑑F(θ)𝑑P(o).\displaystyle\mathbb{E}_{B_{n}}\mathbb{P}_{0}\{L_{m^{*}}\left(\pi_{n,b,\lambda}\right)-L_{m^{*}}\left(\pi_{0}\right)\}=\mathbb{E}_{B_{n}}\int\int\mathbb{E}\left\{\left(Y-m^{*}(\theta,V)\right)^{2}\mid\bar{H}_{T}\right\}\pi_{0}\left(\frac{\pi_{0}-\pi_{n,b,\lambda}}{\pi_{n,b,\lambda}\pi_{0}}\right)dF(\theta)dP\left(o\right).

where π0=t=1Tπ0t\pi_{0}=\prod_{t=1}^{T}\pi_{0}^{t} and πn,b\pi_{n,b} is an estimate of π0\pi_{0} using the bthb^{th} sample split.

Combining these two terms, we have

\displaystyle\mathbb{E}_{B_{n}}\mathbb{P}_{0}\left[\left\{L_{m_{n,b}}\left(\pi_{n,b,\lambda}\right)-L_{m_{n,b}}\left(\pi_{0}\right)\right\}-\left\{L_{m^{*}}\left(\pi_{n,b,\lambda}\right)-L_{m^{*}}\left(\pi_{0}\right)\right\}\right]=
[{m(θ,V)mn,b(θ,V)}{2Ymn,b(θ,V)m(θ,V)}]π0(π0πn,b,λπn,b,λπ0)𝑑F(θ)𝑑P(o)=op(n1/2).\displaystyle\int\int\left[\{m^{*}(\theta,V)-m_{n,b}(\theta,V)\}\{2Y-m_{n,b}(\theta,V)-m^{*}(\theta,V)\}\right]\pi_{0}\left(\frac{\pi_{0}-\pi_{n,b,\lambda}}{\pi_{n,b,\lambda}\pi_{0}}\right)dF(\theta)dP\left(o\right)=o_{p}\left(n^{-1/2}\right).

The last equality applies the Cauchy–Schwarz inequality and the assumption that \int\|\pi_{n}-\pi_{0}\|_{2}\cdot\|m_{n}-m^{*}\|_{2}\,dF(\theta)=o_{p}(n^{-1/2}). Gathering the relevant terms, we have

ΨnUIPW(mn,πn)Ψ0(mn)=\displaystyle\Psi^{UIPW}_{n}(m_{n},\pi_{n})-\Psi_{0}(m_{n})= 1ni=1n{Lm(π0)Ψ0(m)}\displaystyle\frac{1}{n}\sum_{i=1}^{n}\left\{L_{m^{*}}\left(\pi_{0}\right)-\Psi_{0}(m^{*})\right\}
\displaystyle+\mathbb{E}_{B_{n}}\mathbb{P}_{0}\left\{L_{m^{*}}\left(\pi_{n,b,\lambda}\right)-L_{m^{*}}\left(\pi_{0}\right)\right\}+o_{p}\left(n^{-1/2}\right).

We now consider the second term on the right-hand side.

𝔼Bn0{Lm(πn,b,λ)Lm(π0)}\displaystyle\mathbb{E}_{B_{n}}\mathbb{P}_{0}\left\{L_{m^{*}}\left(\pi_{n,b,\lambda}\right)-L_{m^{*}}\left(\pi_{0}\right)\right\}
=𝔼Bn0t=1TI(At=dtθ){Ym(θ,V)}2πn,b,λtdF(θ)0t=1TI(At=dtθ){Ym(θ,V)}2π0tdF(θ)\displaystyle=\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\prod_{t=1}^{T}\frac{I(A_{t}=d^{\theta}_{t})\{Y-m^{*}(\theta,V)\}^{2}}{\pi^{t}_{n,b,\lambda}}dF(\theta)-\mathbb{P}_{0}\int\prod_{t=1}^{T}\frac{I(A_{t}=d^{\theta}_{t})\{Y-m^{*}(\theta,V)\}^{2}}{\pi^{t}_{0}}dF(\theta)
\displaystyle\pm\sum_{k=2}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{t=1}^{T}I(A_{t}=d^{\theta}_{t})\{Y-m^{*}(\theta,V)\}^{2}}{\prod_{t=1}^{k-1}(\pi^{t}_{n,b,\lambda})\prod_{t=k}^{T}(\pi^{t}_{0})}dF(\theta)
=k=1T𝔼Bn0t=1TI(At=dtθ){Ym(θ,V)}2t=1k(πn,b,λt)t=k+1T(π0t)𝑑F(θ)\displaystyle=\sum_{k=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{t=1}^{T}I(A_{t}=d_{t}^{\theta})\{Y-m^{*}(\theta,V)\}^{2}}{\prod_{t=1}^{k}(\pi^{t}_{n,b,\lambda})\prod_{t=k+1}^{T}(\pi^{t}_{0})}dF(\theta)
\displaystyle=\sum_{k=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{t=1}^{k-1}I(A_{t}=d^{\theta}_{t})Q^{k}_{0,m^{*}}}{\prod_{t=1}^{k-1}(\pi^{t}_{n,b,\lambda})}\frac{\pi^{k}_{0}(\pi^{k}_{0}-\pi^{k}_{n,b,\lambda})}{\pi^{k}_{0}\pi^{k}_{n,b,\lambda}}dF(\theta)
\displaystyle=\sum_{k=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{t=1}^{k-1}I(A_{t}=d^{\theta}_{t})Q^{k}_{0,m^{*}}}{\prod_{t=1}^{k-1}(\pi^{t}_{n,b,\lambda})}\frac{\pi^{k}_{0}-\pi^{k}_{n,b,\lambda}}{\pi^{k}_{0}}-\frac{\prod_{t=1}^{k-1}I(A_{t}=d^{\theta}_{t})Q^{k}_{0,m^{*}}}{\prod_{t=1}^{k-1}(\pi^{t}_{n,b,\lambda})}\frac{(\pi^{k}_{0}-\pi^{k}_{n,b,\lambda})^{2}}{\pi^{k}_{0}\pi^{k}_{n,b,\lambda}}dF(\theta)
=k=1T𝔼Bn0t=1k1I(At=dtθ)Q0,mkt=1k1(πn,b,λt)π0kπn,b,λkπ0k+op(n1/2)\displaystyle=\sum_{k=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{t=1}^{k-1}I(A_{t}=d^{\theta}_{t})Q^{k}_{0,m^{*}}}{\prod_{t=1}^{k-1}(\pi^{t}_{n,b,\lambda})}\frac{\pi^{k}_{0}-\pi^{k}_{n,b,\lambda}}{\pi^{k}_{0}}+o_{p}(n^{-1/2})
=k=1T𝔼Bn0t=1k1I(At=dtθ)Q0,mkt=1k1(πn,b,λt)I(Ak=dkθ)πn,b,λkπ0k+op(n1/2).\displaystyle=\sum_{k=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{0}\int\frac{\prod_{t=1}^{k-1}I(A_{t}=d^{\theta}_{t})Q^{k}_{0,m^{*}}}{\prod_{t=1}^{k-1}(\pi^{t}_{n,b,\lambda})}\frac{I(A_{k}=d^{\theta}_{k})-\pi^{k}_{n,b,\lambda}}{\pi^{k}_{0}}+o_{p}(n^{-1/2}). (15)

Let

DCARk(Q0,π0,πn,b,λ)=t=1k1I(At=dtθ)Q0,mkt=1k1πn,b,λtI(Ak=dkθ)πn,b,λkπ0k\displaystyle D_{CAR}^{k}(Q_{0},\pi_{0},\pi_{n,b,\lambda})=\int\frac{\prod_{t=1}^{k-1}I(A_{t}=d^{\theta}_{t})Q^{k}_{0,m^{*}}}{\prod_{t=1}^{k-1}\pi^{t}_{n,b,\lambda}}\frac{I(A_{k}=d^{\theta}_{k})-\pi^{k}_{n,b,\lambda}}{\pi^{k}_{0}} (16)
D_{CAR}^{k}(Q_{0},\pi_{0})=\int\frac{\prod_{t=1}^{k-1}I(A_{t}=d^{\theta}_{t})Q^{k}_{0,m^{*}}}{\prod_{t=1}^{k-1}\pi^{t}_{0}}\frac{I(A_{k}=d^{\theta}_{k})-\pi^{k}_{0}}{\pi^{k}_{0}} (17)

and DCAR(Q0,π0,πn,b,λ)=k=1TDCARk(Q0,π0,πn,b,λ)D_{CAR}(Q_{0},\pi_{0},\pi_{n,b,\lambda})=\sum_{k=1}^{T}D_{CAR}^{k}(Q_{0},\pi_{0},\pi_{n,b,\lambda}) and DCAR(Q0,π0)=k=1TDCARk(Q0,π0)D_{CAR}(Q_{0},\pi_{0})=\sum_{k=1}^{T}D_{CAR}^{k}(Q_{0},\pi_{0}). We then have

(15)\displaystyle(\ref{eq:dcar1}) =𝔼Bn0DCAR(Q0,π0,πn,b,λ)+op(n1/2)\displaystyle=\mathbb{E}_{B_{n}}\mathbb{P}_{0}D_{CAR}(Q_{0},\pi_{0},\pi_{n,b,\lambda})+o_{p}(n^{-1/2})
=\displaystyle= 𝔼Bn0DCAR(Q0,π0,πn,b,λ)𝔼Bnn,b1DCAR(Q0,π0,πn,b,λ)+𝔼Bnn,b1DCAR(Q0,π0,πn,b,λ)+op(n1/2)\displaystyle\mathbb{E}_{B_{n}}\mathbb{P}_{0}D_{CAR}(Q_{0},\pi_{0},\pi_{n,b,\lambda})-\mathbb{E}_{B_{n}}\mathbb{P}_{n,b}^{1}D_{CAR}(Q_{0},\pi_{0},\pi_{n,b,\lambda})+\mathbb{E}_{B_{n}}\mathbb{P}_{n,b}^{1}D_{CAR}(Q_{0},\pi_{0},\pi_{n,b,\lambda})+o_{p}(n^{-1/2})
=\displaystyle= 𝔼Bn(0n,b1)DCAR(Q0,π0,πn,b,λ)+𝔼Bnn,b1DCAR(Q0,π0,πn,b,λ)+op(n1/2)\displaystyle\mathbb{E}_{B_{n}}(\mathbb{P}_{0}-\mathbb{P}_{n,b}^{1})D_{CAR}(Q_{0},\pi_{0},\pi_{n,b,\lambda})+\mathbb{E}_{B_{n}}\mathbb{P}_{n,b}^{1}D_{CAR}(Q_{0},\pi_{0},\pi_{n,b,\lambda})+o_{p}(n^{-1/2})
=\displaystyle= 𝔼Bn(0n,b1)[DCAR(Q0,π0,πn,b,λ)DCAR(Q0,π0)]+𝔼Bn(0n,b1)DCAR(Q0,π0)\displaystyle\mathbb{E}_{B_{n}}(\mathbb{P}_{0}-\mathbb{P}_{n,b}^{1})[D_{CAR}(Q_{0},\pi_{0},\pi_{n,b,\lambda})-D_{CAR}(Q_{0},\pi_{0})]+\mathbb{E}_{B_{n}}(\mathbb{P}_{0}-\mathbb{P}_{n,b}^{1})D_{CAR}(Q_{0},\pi_{0})
+𝔼Bnn,b1DCAR(Q0,π0,πn,b,λ)+op(n1/2)\displaystyle+\mathbb{E}_{B_{n}}\mathbb{P}_{n,b}^{1}D_{CAR}(Q_{0},\pi_{0},\pi_{n,b,\lambda})+o_{p}(n^{-1/2})
=\displaystyle= (0n)DCAR(Q0,π0)+𝔼Bnn,b1DCAR(Q0,π0,πn,b,λ)+op(n1/2).\displaystyle(\mathbb{P}_{0}-\mathbb{P}_{n})D_{CAR}(Q_{0},\pi_{0})+\mathbb{E}_{B_{n}}\mathbb{P}_{n,b}^{1}D_{CAR}(Q_{0},\pi_{0},\pi_{n,b,\lambda})+o_{p}(n^{-1/2}).
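The centering in D_{CAR}(Q_{0},\pi_{0}) is what makes the leading empirical-process term above well behaved: since \mathbb{E}\{I(A_{k}=d^{\theta}_{k})\mid\bar{H}_{k}\}=\pi^{k}_{0}(A_{k}=d^{\theta}_{k}\mid\bar{H}_{k}), each summand is conditionally mean zero given the history (and the training folds used to construct the fitted nuisances),

```latex
\mathbb{E}\big\{D^{k}_{CAR}(Q_{0},\pi_{0})\mid\bar{H}_{k},\bar{A}_{k-1}\big\}=0,
```

so \mathbb{P}_{0}D_{CAR}(Q_{0},\pi_{0})=0 and (\mathbb{P}_{0}-\mathbb{P}_{n})D_{CAR}(Q_{0},\pi_{0}) is a centered term of order O_{p}(n^{-1/2}).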

Putting everything together, we have

\displaystyle\Psi^{UIPW}_{n}(m_{n},\pi_{n})-\Psi_{0}(m_{n})= (\mathbb{P}_{n}-\mathbb{P}_{0})\{L_{m^{*}}(\pi_{0})-D_{CAR}(Q_{0},\pi_{0})\}
+k=1T𝔼Bnn,b1DCARk(Q0,π0,πb,λopt)+op(n1/2),\displaystyle+\sum_{k=1}^{T}\mathbb{E}_{B_{n}}\mathbb{P}_{n,b}^{1}D^{k}_{CAR}(Q_{0},\pi_{0},\pi_{b,\lambda_{opt}})+o_{p}(n^{-1/2}), (18)
=\displaystyle= (n0)φm(π0,Q0,m)\displaystyle(\mathbb{P}_{n}-\mathbb{P}_{0})\varphi_{m^{*}}(\pi_{0},Q_{0,m^{*}}) (19)

so it remains to show the second term on the right-hand side is asymptotically negligible.

For any given kk and any given collection of folds, we have

n,b1DCARk(Q0,π0,πb,λopt)\displaystyle\mathbb{P}_{n,b}^{1}D^{k}_{CAR}(Q_{0},\pi_{0},\pi_{b,\lambda_{opt}}) =n,b1t=1k1I(At=dtθ)Q0,mkt=1k1(πb,λoptt)I(Ak=dkθ)πb,λoptkπ0k\displaystyle=\mathbb{P}_{n,b}^{1}\int\frac{\prod_{t=1}^{k-1}I(A_{t}=d^{\theta}_{t})Q^{k}_{0,m^{*}}}{\prod_{t=1}^{k-1}(\pi^{t}_{b,\lambda_{opt}})}\frac{I(A_{k}=d^{\theta}_{k})-\pi^{k}_{b,\lambda_{opt}}}{\pi^{k}_{0}}
=op(n1/2),\displaystyle=o_{p}(n^{-1/2}),

by applying Lemma 1 of Ertefaie et al. [2023], which relies on our assumptions (their Assumptions 1 and 2 and their Equation 4, applied at each k). The conclusion that

\[
\Psi^{UIPW}_{n}(m_{n},\pi_{n,\lambda_{opt}})-\Psi_{0}(m_{n})=(\mathbb{P}_{n}-\mathbb{P}_{0})\varphi_{m^{*}}(\pi_{0},Q_{0,m^{*}})+o_{p}(n^{-1/2})
\]

then follows by summing over $k$.

13 Additional Simulations

We consider the case where the treatment $A_t$ is randomized (Figure 3). Figure 3 shows that both our proposed estimators and the IPW estimator are asymptotically linear; however, the IPW estimator is not efficient, whereas our proposed estimator is. The coverage of our proposed estimators approaches 95% as the sample size increases, while the coverage of the IPW estimator decreases to roughly 80%.
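To make the randomized-treatment setting concrete, the following is a minimal sketch of an IPW estimate of a counterfactual mean outcome under a decision rule, with a known propensity of 0.5. The data-generating process, sample size, and rule $d(X)=I(X>0)$ are hypothetical illustrations, not the estimators or scenarios of our simulation study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process with a randomized binary treatment.
X = rng.normal(size=n)                      # baseline covariate
A = rng.binomial(1, 0.5, size=n)            # randomized treatment, known propensity 0.5
Y = 1.0 + 0.5 * X + A * (1.0 + X) + rng.normal(size=n)

# IPW estimate of the mean outcome had everyone followed the rule d(X) = I(X > 0):
# weight each subject whose observed treatment agrees with the rule by 1/0.5.
d = (X > 0).astype(int)
w = (A == d) / 0.5
psi_ipw = np.mean(w * Y)
print(psi_ipw)
```

Because the propensity is known and constant, the weights are exact; in the observational setting of the main text they must be estimated, which is where the efficiency comparison above becomes relevant.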

Figure 3: Randomized treatment: scaled bias and coverage plots

We also present simulation scenarios where the marginal structural model is estimated using

  1. a linear model with a second-order term (i.e., $E[Y\mid\theta]=\beta_{0}+\beta_{1}\theta+\beta_{2}\theta^{2}$) (Figure 4),

  2. a random forest (Figure 5), and

  3. the highly adaptive lasso (Figure 6).
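As an illustration of fitting such working models for the regimen-response curve, here is a hedged sketch. The simulated pairs $(\theta_i, Y_i)$ and all model settings are hypothetical; scikit-learn's `PolynomialFeatures`/`LinearRegression` and `RandomForestRegressor` serve as stand-ins for the first two scenarios (the highly adaptive lasso, which requires a dedicated implementation, is omitted).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
theta = rng.uniform(-1, 1, size=(1000, 1))                       # hypothetical regimen index
y = 0.5 - theta[:, 0] ** 2 + rng.normal(scale=0.3, size=1000)    # hypothetical outcomes

# (1) Linear working model with a second-order term:
#     E[Y | theta] = b0 + b1*theta + b2*theta^2.
quad = PolynomialFeatures(degree=2, include_bias=False)
lm = LinearRegression().fit(quad.fit_transform(theta), y)

# (2) Random forest working model.
rf = RandomForestRegressor(n_estimators=200, random_state=1).fit(theta, y)

# Evaluate both fitted regimen-response curves on a grid of theta values.
grid = np.linspace(-1, 1, 5).reshape(-1, 1)
print(lm.predict(quad.transform(grid)))
print(rf.predict(grid))
```

Under this toy curve, the quadratic model is correctly specified while the random forest fits the same shape nonparametrically, mirroring the parametric-versus-nonparametric contrast in the scenarios above.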

While our proposed estimators may display some variability in scaled bias at small sample sizes when the marginal structural model is estimated using nonparametric methods, the bias exhibits a downward trend as the sample size increases, suggesting the asymptotic linearity property. In contrast, the scaled bias of the inverse probability weighting (IPW) estimator shows an upward trend, indicating potential bias when the propensity model is estimated using nonparametric methods. Notably, the UIPW-score estimator outperforms the UIPW-Dcar estimator in various scenarios: its scaled bias is consistently smaller than that of UIPW-Dcar and, in some instances, even smaller than that of the MR estimator. Furthermore, its coverage is closer to 0.95 than that of UIPW-Dcar at smaller sample sizes, especially when the marginal structural model is estimated using nonparametric methods.

Figure 4: The marginal structural model is estimated using linear regression with a second-order term
Figure 5: The marginal structural model is estimated using a random forest
Figure 6: The marginal structural model is estimated using the highly adaptive lasso