
Regression-Adjusted Estimation of Quantile Treatment Effects under Covariate-Adaptive Randomizations

Liang Jiang (Fanhai International School of Finance, Fudan University, 220 Handan Rd, Shanghai, China 200437); Peter C. B. Phillips (School of Economics, Singapore Management University, 90 Stamford Rd, Singapore 178903; Yale University, New Haven, CT 06520-8281, United States of America; University of Auckland, 12 Grafton Rd, Auckland Central, Auckland 1010, New Zealand; University of Southampton, University Rd, Southampton SO17 1BJ, United Kingdom); Yubo Tao (corresponding author, email: [email protected]; Department of Economics, University of Macau, Avenida da Universidade, Taipa, Macao SAR, China); Yichong Zhang (School of Economics, Singapore Management University, 90 Stamford Rd, Singapore 178903)
(March 10, 2025)
Abstract

Datasets from field experiments with covariate-adaptive randomizations (CARs) usually contain extra covariates in addition to the strata indicators. We propose to incorporate these additional covariates via auxiliary regressions in the estimation and inference of unconditional quantile treatment effects (QTEs) under CARs. We establish the consistency and limit distribution of the regression-adjusted QTE estimator and prove that the use of multiplier bootstrap inference is non-conservative under CARs. The auxiliary regression may be estimated parametrically, nonparametrically, or via regularization when the data are high-dimensional. Even when the auxiliary regression is misspecified, the proposed bootstrap inferential procedure still achieves the nominal rejection probability in the limit under the null. When the auxiliary regression is correctly specified, the regression-adjusted estimator achieves the minimum asymptotic variance. We also discuss forms of adjustments that can improve the efficiency of the QTE estimators. The finite sample performance of the new estimation and inferential methods is studied in simulations, and an empirical application to a well-known dataset on the effect of expanding access to basic bank accounts on savings is reported.


Keywords: Covariate-adaptive randomization, High-dimensional data, Regression adjustment, Quantile treatment effects.

JEL codes: C14, C21, D14, G21

1 Introduction

Covariate-adaptive randomizations (CARs) have recently seen growing use in a wide variety of randomized experiments in economic research. Examples include Chong et al. (2016), Greaney et al. (2016), Jakiela and Ozier (2016), Burchardi et al. (2019), and Anderson and McKenzie (2021), among many others. Under CARs, units are first stratified using some baseline covariates, and then, within each stratum, treatment status is assigned (independently of covariates) to achieve balance between the numbers of treated and control units.

In many empirical studies, apart from the average treatment effect (ATE), researchers are often interested in using randomized experiments to estimate quantile treatment effects (QTEs). The QTE has a useful role as a robustness check for the ATE and characterizes any heterogeneity that may be present in the sign and magnitude of the treatment effects according to their position within the distribution of outcomes. See, for example, Bitler et al. (2006), Muralidharan and Sundararaman (2011), Duflo et al. (2013), Banerjee et al. (2015), Crépon et al. (2015), and Campos et al. (2017).

Two practical issues arise in estimation and inference concerning QTEs under CARs. First, covariates in addition to the strata indicators are collected during the experiment. It is possible to incorporate these covariates in the estimation of treatment effects to reduce variance and improve efficiency. In the estimation of the ATE, the usual practice is to run a simple ordinary least squares (OLS) regression of the outcome on treatment status, strata indicators, and additional covariates, as in the analysis of covariance (ANCOVA). Freedman (2008a, b) pointed out that such an OLS regression adjustment can degrade the precision of the ATE estimator. Lin (2013) reexamined Freedman's critique and showed that, in order to improve efficiency, the linear regression adjustment should include a full set of interactions between the treatment status and covariates. However, because the quantile function is a nonlinear operator, even when the assignment of treatment status is completely random, a similar linear quantile regression with a full set of interaction terms is unable to provide a consistent estimate of the unconditional QTE, let alone improve estimation efficiency. Second, in order to balance the respective numbers of treated and control units within each stratum, treatment statuses under CARs usually exhibit (negative) cross-sectional dependence. Standard inference procedures that rely on cross-sectional independence are therefore conservative and lack power. (For example, Bugni et al. (2018) and Zhang and Zheng (2020) have shown that the usual two-sample $t$-test for inference concerning the ATE and multiplier bootstrap inference concerning the QTE are in general conservative under CARs.) These two issues raise the questions of how to use the additional covariates to consistently and more efficiently estimate the QTE in CAR settings and how to conduct valid statistical procedures that mitigate conservatism in inference.

The present paper addresses these issues by proposing a regression-adjusted estimator of the QTE, deriving its limit theory, and establishing the validity of multiplier bootstrap inference under CARs. Even under potential misspecification of the auxiliary regressions, the proposed QTE estimator is shown to maintain its consistency, and the multiplier bootstrap procedure is shown to have an asymptotic size equal to the nominal level under the null. When the auxiliary regression is correctly specified, the QTE estimator achieves minimum asymptotic variance.

We further investigate the efficiency gains that materialize from regression adjustments in three scenarios: (1) parametric regressions, (2) nonparametric regressions, and (3) regressions with regularization in high-dimensional settings. Specifically, for parametric regressions with a potentially misspecified linear probability model, we propose to compute the optimal linear coefficient by minimizing the variance of the QTE estimator. Such an adjustment is optimal within the class of linear adjustments but does not necessarily achieve the global minimum asymptotic variance. However, because no adjustment is a special case of a linear adjustment with all coefficients equal to zero, our optimal linear adjustment is guaranteed to be weakly more efficient than the QTE estimator with no adjustments, which addresses Freedman's critique. We also consider a potentially misspecified logistic regression with fixed-dimensional regressors and strata- and quantile-specific regression coefficients, which is estimated by quasi maximum likelihood (QMLE). Although the QMLE does not necessarily minimize the asymptotic variance of the QTE estimator, such a flexible logistic model can closely approximate the true specification; in practice, the corresponding regression-adjusted QTE estimator therefore usually has a smaller variance than the unadjusted one. Last, we propose to treat the logistic QMLE adjustments as new linear regressors and re-construct the corresponding optimal linear adjustments. We then show that the QTE estimator with the new adjustments is weakly more efficient than that with either the original logistic QMLE adjustments or no adjustments.

In nonparametric regressions, we further justify the QMLE by letting the regressors in the logistic regression be a set of sieve basis functions with increasing dimension and show how such a nonparametric regression-adjusted QTE estimator can achieve the global minimum asymptotic variance. For high-dimensional regressions with regularization, we consider logistic regression under $\ell_1$ penalization, an approach that also achieves the global minimum asymptotic variance. All the limit theories hold uniformly over a compact set of quantile indices, implying that our multiplier bootstrap procedure can be used to conduct inference on QTEs involving single, multiple, or a continuum of quantile indices.

These results, including the limit distribution of the regression-adjusted QTE estimator and the validity of the multiplier bootstrap, provide novel contributions to the literature in three respects. First, the data generated under CARs are different from observational data as the observed outcomes and treatment statuses are cross-sectionally dependent due to the randomization schemes. Recently Bugni et al. (2018) established a rigorous asymptotic framework to study the ATE estimator under CARs and pointed out the conservatism of the two-sample t-test except for some special cases. (See Bugni et al. (2018, Remark 4.2) for more detail.) Our analysis follows this new framework, which departs from the literature of causal inference under an i.i.d. treatment structure.

Second, we contribute to the literature on causal inference under CARs by developing a new methodology that includes additional covariates in the estimation of the unconditional QTE and by establishing a general theory for regression adjustments that allows for parametric, nonparametric, and regularized estimation of the auxiliary regressions. As mentioned earlier, unlike ATE estimation, the naive linear quantile regression with additional covariates cannot even produce a consistent estimator of the QTE. Instead, we propose a new way to incorporate additional covariates based on the Neyman orthogonal moment and investigate the asymptotic properties and the efficiency gains of the proposed regression-adjusted estimator under CARs. This new machinery allows us to study the QTE regression, which is nonparametrically specified, with both linear (linear probability model) and nonlinear (logit and probit models) regression adjustments. To clarify this contribution to the literature, we note that Hu and Hu (2012); Ma et al. (2015, 2020); Olivares (2021); Shao and Yu (2013); Zhang and Zheng (2020); Ye (2018); Ye and Shao (2020) considered inference on various causal parameters under CARs but without taking into account additional covariates. Bugni et al. (2018), Bugni et al. (2019), and Bugni and Gao (2021) considered saturated regressions for the ATE and local ATE, which can be viewed as regression adjustments where strata indicators are interacted with the treatment or instrument. Shao et al. (2010) showed that if a test statistic is constructed based on the correctly specified model between the outcome and additional covariates, and the covariates used for the CAR are functions of the additional covariates, then the test statistic is valid conditional on the additional covariates. Bloniarz et al. (2016); Fogarty (2018); Lin (2013); Lu (2016); Lei and Ding (2021); Li and Ding (2020); Liu et al. (2020); Liu and Yang (2020); Negi and Wooldridge (2020); Ye et al. (2022); Zhao and Ding (2021) studied various estimation methods based on regression adjustments, but these studies all focused on ATE estimation. Specifically, Liu et al. (2020) considered linear adjustments for the ATE under CARs in which the covariates can be high-dimensional and the adjustments can be estimated by Lasso. Ansel et al. (2018) considered regression adjustment using additional covariates for the ATE and local ATE. We differ from these studies by considering the QTE with nonlinear adjustments such as the logistic Lasso.

Third, we establish the validity of the multiplier bootstrap inference for the regression-adjusted QTE estimator under CARs. To the best of our knowledge, Shao et al. (2010) and Zhang and Zheng (2020) are the only works in the literature that studied bootstrap inference under CARs. Shao et al. (2010) considered the covariate-adaptive bootstrap for the linear regression model. Zhang and Zheng (2020) proposed to bootstrap the inverse propensity score weighted (IPW) QTE estimator with the estimated target fraction of treatment even when the truth is known. They showed that the asymptotic variance of the IPW estimator is the same under various CARs. Thus, even though the bootstrap sample ignores the cross-sectional dependence and behaves as if the randomization scheme were simple random sampling, the asymptotic variance of the bootstrap analogue is still the same. We complement this research by studying the validity of multiplier bootstrap inference for our regression-adjusted QTE estimator. We establish analytically that the multiplier bootstrap with the estimated fraction of treatment is not conservative in the sense that it achieves an asymptotic size equal to the nominal level under the null even when the auxiliary regressions are misspecified.

The present paper also comes under the umbrella of a growing literature that has addressed estimation and inference in randomized experiments. In this connection, we mention the studies of Hahn et al. (2011); Athey and Imbens (2017); Abadie et al. (2018); Tabord-Meehan (2021); Bai et al. (2021); Bai (2020); Jiang et al. (2021), among many others. Bai (2020) showed that an 'optimal' matched-pair design can minimize the mean-squared error of the difference-in-means estimator for the ATE, conditional on covariates. Tabord-Meehan (2021) designed an adaptive randomization procedure which can minimize the variance of the weighted estimator for the ATE. Both works rely on a pilot experiment to design the optimal randomization. In contrast, we take the randomization scheme (i.e., CARs) as given and search for new estimators (other than difference-in-quantile and weighted estimators) for the QTE that have smaller variance. In addition, our approach does not require a pilot experiment. Therefore, depending on the definition of 'optimality' and the data available, our methods and theirs apply to different scenarios and thus complement each other.

From a practical perspective, our estimation and inferential methods have four advantages. First, they allow for common choices of auxiliary regressions such as linear probability, logit, and probit regressions, even though these regressions may be misspecified. Second, the methods can be implemented without tuning parameters. Third, our (bootstrap) estimator can be computed directly via the subgradient condition, and the auxiliary regressions need not be re-estimated in the bootstrap procedure, both of which save considerable computation time. Last, our estimation and inference methods can be implemented without knowledge of the exact treatment assignment rule used in the experiment. This advantage is especially useful in subsample analysis, where sub-groups are defined using variables other than those used to form the strata, so that the treatment assignment rule for each sub-group is unknown. See, for example, the anemic subsample analysis in Chong et al. (2016) and Zhang and Zheng (2020). These last three points carry over from Zhang and Zheng (2020) and are logically independent of the regression adjustments. One of our contributions is to show that these results still hold for our regression-adjusted estimator.

The remainder of the paper is organized as follows. Section 2 describes the model setup and notation. Section 3 develops the asymptotic properties of our regression-adjusted QTE estimator. Section 4 studies the validity of the multiplier bootstrap inference. Section 5 considers parametric, nonparametric, and regularized estimation of the auxiliary regressions. Section 6 reports simulation results, and an empirical application of our methods to the impact of expanding access to basic bank accounts on savings is provided in Section 7. Section 8 concludes. Proofs of all results and some additional simulation results are given in the Online Supplement.

2 Setup and Notation

Potential outcomes for treated and control groups are denoted by $Y(1)$ and $Y(0)$, respectively. Treatment status is denoted by $A$, with $A=1$ indicating treated and $A=0$ untreated. The stratum indicator is denoted by $S$, based on which the researcher implements the covariate-adaptive randomization. The support of $S$ is denoted by $\mathcal{S}$, a finite set. After randomization, the researcher observes the data $\{Y_i,S_i,A_i,X_i\}_{i\in[n]}$, where $[n]=\{1,2,\ldots,n\}$, $Y_i=Y_i(1)A_i+Y_i(0)(1-A_i)$ is the observed outcome, and $X_i$ contains extra covariates besides $S_i$ in the dataset. The support of $X$ is denoted $\text{Supp}(X)$. In this paper, we allow $X_i$ and $S_i$ to be dependent. For $i\in[n]$, let $p(s)=\mathbb{P}(S_i=s)$, $n(s)=\sum_{i\in[n]}1\{S_i=s\}$, $n_1(s)=\sum_{i\in[n]}A_i1\{S_i=s\}$, and $n_0(s)=n(s)-n_1(s)$. We make the following assumptions on the data generating process (DGP) and the treatment assignment rule.
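To fix ideas, the stratum-level quantities just defined can be computed directly from the data. The following Python sketch uses simulated placeholder data and variable names of our own choosing to illustrate $n(s)$, $n_1(s)$, $n_0(s)$, and the within-stratum treated fraction $n_1(s)/n(s)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
S = rng.integers(0, 3, size=n)   # stratum indicator S_i with support {0, 1, 2}
A = rng.integers(0, 2, size=n)   # treatment status A_i (placeholder assignment)

# Stratum-level counts from the text: n(s), n_1(s), n_0(s)
strata = np.unique(S)
n_s = {s: int(np.sum(S == s)) for s in strata}        # n(s)
n1_s = {s: int(np.sum(A[S == s])) for s in strata}    # n_1(s)
n0_s = {s: n_s[s] - n1_s[s] for s in strata}          # n_0(s) = n(s) - n_1(s)
pi_hat = {s: n1_s[s] / n_s[s] for s in strata}        # fraction of treated in stratum s
```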

Assumption 1.
  1. (i)

    $\{Y_i(1),Y_i(0),S_i,X_i\}_{i\in[n]}$ is i.i.d.

  2. (ii)

    $\{Y_i(1),Y_i(0),X_i\}_{i\in[n]}\perp\!\!\!\perp\{A_i\}_{i\in[n]}\mid\{S_i\}_{i\in[n]}$.

  3. (iii)

    Suppose $p(s)$ is fixed with respect to (w.r.t.) $n$ and is positive for every $s\in\mathcal{S}$.

  4. (iv)

    Let $\pi(s)$ denote the target fraction of treatment for stratum $s$. Then, $c<\min_{s\in\mathcal{S}}\pi(s)\leq\max_{s\in\mathcal{S}}\pi(s)<1-c$ for some constant $c\in(0,0.5)$ and $\frac{D_n(s)}{n(s)}=o_p(1)$ for $s\in\mathcal{S}$, where $D_n(s)=\sum_{i\in[n]}(A_i-\pi(s))1\{S_i=s\}$.

Several remarks are in order. First, Assumption 1(i) imposes i.i.d.-ness only on $\{Y_i(1),Y_i(0),S_i,X_i\}_{i\in[n]}$, thereby allowing for cross-sectional dependence among the treatment statuses $\{A_i\}_{i\in[n]}$ and accommodating many covariate-adaptive randomization schemes as discussed below. Second, although treatment statuses are cross-sectionally dependent, they are independent of the potential outcomes and additional covariates conditional on the stratum indicator $S$. Therefore, the data are still experimental rather than observational. Third, Assumption 1(iii) requires the size of each stratum to be proportional to the sample size. Fourth, we can view $\pi(s)$ as the target fraction of treated units in stratum $s$. Similar to Bugni et al. (2019), we allow the target fractions to differ across strata. Just as for the overlapping support condition in an observational study, the target fractions are assumed to be bounded away from zero and one. In randomized experiments, this condition usually holds because investigators can determine $\pi(s)$ in the design stage; in fact, in most CARs, $\pi(s)=0.5$ for $s\in\mathcal{S}$. Fifth, $D_n(s)$ represents the degree of imbalance between the realized and target fractions of treated units in the $s$th stratum. Bugni et al. (2018) show that Assumption 1(iv) holds under several covariate-adaptive treatment assignment rules such as simple random sampling (SRS), biased-coin design (BCD), adaptive biased-coin design (WEI), and stratified block randomization (SBR). For completeness, we briefly repeat their descriptions below. Note we only require $D_n(s)/n(s)=o_p(1)$, which is weaker than the assumption imposed by Bugni et al. (2018) but the same as that imposed by Bugni et al. (2019) and Zhang and Zheng (2020).

Example 1 (SRS).

Let $\{A_i\}_{i\in[n]}$ be drawn independently across $i$, conditionally on $\{S_i\}_{i\in[n]}$, as Bernoulli random variables with success rate $\pi(S_i)$, i.e., for $k=1,\cdots,n$,

$\mathbb{P}\left(A_k=1\mid\{S_i\}_{i\in[n]},\{A_j\}_{j\in[k-1]}\right)=\pi(S_k).$
Example 2 (WEI).

This design was first proposed by Wei (1978). Let $n_{k-1}(S_k)=\sum_{i\in[k-1]}1\{S_i=S_k\}$, $D_{k-1}(s)=\sum_{i\in[k-1]}\left(A_i-\frac{1}{2}\right)1\{S_i=s\}$, and

$\mathbb{P}\left(A_k=1\mid\{S_i\}_{i\in[k]},\{A_i\}_{i\in[k-1]}\right)=\phi\left(\frac{2D_{k-1}(S_k)}{n_{k-1}(S_k)}\right),$

where $\phi(\cdot):[-1,1]\mapsto[0,1]$ is a pre-specified non-increasing function satisfying $\phi(-x)=1-\phi(x)$, and $\frac{D_0(S_1)}{0}$ is understood to be zero.

Example 3 (BCD).

The treatment status is determined sequentially for $1\leq k\leq n$ as

$\mathbb{P}\left(A_k=1\mid\{S_i\}_{i\in[k]},\{A_i\}_{i\in[k-1]}\right)=\begin{cases}\frac{1}{2}&\text{if }D_{k-1}(S_k)=0\\ \lambda&\text{if }D_{k-1}(S_k)<0\\ 1-\lambda&\text{if }D_{k-1}(S_k)>0,\end{cases}$

where $D_{k-1}(s)$ is defined as above and $\frac{1}{2}<\lambda\leq 1$.

Example 4 (SBR).

For each stratum, $\lfloor\pi(s)n(s)\rfloor$ units are assigned to treatment and the rest are assigned to control.
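For concreteness, the four assignment rules in Examples 1-4 can be simulated as follows. This is a minimal sketch of our own, not the paper's code: the WEI and BCD branches are written for the common case $\pi(s)=1/2$ (matching the $A_i-\frac{1}{2}$ centering above), and the WEI rule uses $\phi(x)=(1-x)/2$, one admissible choice of the allocation function.

```python
import numpy as np

def assign_car(S, pi, scheme="SBR", lam=0.75, rng=None):
    """Sketch of the covariate-adaptive assignment rules in Examples 1-4.
    S: array of stratum labels; pi: dict mapping stratum -> target fraction.
    WEI and BCD below assume pi(s) = 1/2 for all strata."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n = len(S)
    A = np.zeros(n, dtype=int)
    if scheme == "SRS":                  # Example 1: independent Bernoulli(pi(S_k))
        for k in range(n):
            A[k] = rng.random() < pi[S[k]]
    elif scheme in ("WEI", "BCD"):       # sequential rules driven by D_{k-1}(s)
        D = {s: 0.0 for s in set(S)}     # running imbalance per stratum
        nS = {s: 0 for s in set(S)}      # running stratum size
        for k in range(n):
            s = S[k]
            if scheme == "WEI":
                x = 2 * D[s] / nS[s] if nS[s] > 0 else 0.0
                p = (1 - x) / 2          # phi(x) = (1 - x)/2: non-increasing, phi(-x) = 1 - phi(x)
            else:                        # BCD: lean against the current imbalance
                p = 0.5 if D[s] == 0 else (lam if D[s] < 0 else 1 - lam)
            A[k] = rng.random() < p
            D[s] += A[k] - 0.5
            nS[s] += 1
    elif scheme == "SBR":                # Example 4: floor(pi(s) n(s)) treated per stratum
        for s in set(S):
            idx = np.flatnonzero(S == s)
            m = int(np.floor(pi[s] * len(idx)))
            A[rng.permutation(idx)[:m]] = 1
    return A
```

Under SBR the within-stratum imbalance $D_n(s)$ is at most one by construction, while SRS, WEI, and BCD satisfy $D_n(s)/n(s)=o_p(1)$ only in the limit.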

Denote the $\tau$th quantile of $Y(a)$ by $q_a(\tau)$ for $a=0,1$. We are interested in estimating and inferring the $\tau$th quantile treatment effect defined as $q(\tau)=q_1(\tau)-q_0(\tau)$. The testing problems of interest involve single, multiple, or even a continuum of quantile indices, as in the following null hypotheses:

$\mathcal{H}_0:q(\tau)=\underline{q}\quad\text{versus}\quad q(\tau)\neq\underline{q},$
$\mathcal{H}_0:q(\tau_1)-q(\tau_2)=\underline{q}\quad\text{versus}\quad q(\tau_1)-q(\tau_2)\neq\underline{q},\ \text{and}$
$\mathcal{H}_0:q(\tau)=\underline{q}(\tau)\ \forall\tau\in\Upsilon\quad\text{versus}\quad q(\tau)\neq\underline{q}(\tau)\ \text{for some}\ \tau\in\Upsilon,$

for some pre-specified value $\underline{q}$ or function $\underline{q}(\tau)$, where $\Upsilon$ is a compact subset of $(0,1)$. We can also test a constant QTE by letting $\underline{q}(\tau)$ in the last hypothesis be a constant $\underline{q}$.

3 Estimation

Define $m_a(\tau,s,x)=\tau-\mathbb{P}(Y_i(a)\leq q_a(\tau)\mid S_i=s,X_i=x)$ for $a=0,1$, which are the true specifications but unknown to researchers. Instead, researchers specify working models $\{\overline{m}_a(\tau,s,x)\}_{a=0,1}$ for the true specification, which can be misspecified. (We view $\overline{m}_a(\cdot)$ as some function with inputs $\tau,s,x$. For example, researchers can specify a linear probability model with $\overline{m}_a(\tau,s,x)=\tau-x^\top\beta_{a,s}$, where $\beta_{a,s}$ is the linear coefficient that varies across treatment status $a$ and stratum $s$.) Last, the researchers estimate the (potentially misspecified) working models via some form of regression, and the estimators are denoted as $\{\widehat{m}_a(\tau,s,x)\}_{a=0,1}$. We also refer to $\overline{m}_a(\cdot)$ as the auxiliary regression.

Our regression-adjusted estimator of $q_1(\tau)$, denoted as $\hat{q}_1^{adj}(\tau)$, can be defined as

$\hat{q}_1^{adj}(\tau)=\operatorname*{arg\,min}_q\sum_{i\in[n]}\left[\frac{A_i}{\hat{\pi}(S_i)}\rho_\tau(Y_i-q)+\frac{(A_i-\hat{\pi}(S_i))}{\hat{\pi}(S_i)}\widehat{m}_1(\tau,S_i,X_i)q\right],$ (3.1)

where $\rho_\tau(u)=u(\tau-1\{u\leq 0\})$ is the usual check function and $\hat{\pi}(s)=n_1(s)/n(s)$. We emphasize that $\widehat{m}_1(\cdot)$ may not consistently estimate the true specification $m_1(\cdot)$. Similarly, we can define

$\hat{q}_0^{adj}(\tau)=\operatorname*{arg\,min}_q\sum_{i\in[n]}\left[\frac{1-A_i}{1-\hat{\pi}(S_i)}\rho_\tau(Y_i-q)-\frac{(A_i-\hat{\pi}(S_i))}{1-\hat{\pi}(S_i)}\widehat{m}_0(\tau,S_i,X_i)q\right].$ (3.2)

Then, our regression-adjusted QTE estimator is

$\hat{q}^{adj}(\tau)=\hat{q}_1^{adj}(\tau)-\hat{q}_0^{adj}(\tau).$ (3.3)

Several remarks are in order. First, in observational studies with i.i.d. data and $A_i\perp\!\!\!\perp X_i\mid S_i$, Firpo (2007), Belloni et al. (2017), and Kallus et al. (2020) showed that the doubly robust moment for $q_1(\tau)$ is

$\mathbb{E}\left[\frac{A_i(\tau-1\{Y_i(1)\leq q\})}{\overline{\pi}(S_i)}-\frac{A_i-\overline{\pi}(S_i)}{\overline{\pi}(S_i)}\overline{m}_1(\tau,S_i,X_i)\right]=0,$ (3.4)

where $\overline{\pi}(s)$ and $\overline{m}_1(\tau,s,x)$ are the working models for the target fraction $\pi(s)$ and the conditional probability $m_1(\tau,s,x)$, respectively. Our estimator is motivated by this doubly robust moment, but our analysis differs from that for observational data because CARs introduce cross-sectional dependence among observations. Second, as our target fraction estimator $\hat{\pi}(s)=n_1(s)/n(s)$ is consistent, $\overline{\pi}(s)$ is correctly specified as $\pi(s)$. Then, due to the double robustness, our regression-adjusted estimator is consistent even when $\overline{m}_a(\cdot)$ is misspecified and $\widehat{m}_a(\cdot)$ is an inconsistent estimator of $m_a(\cdot)$. Third, we use the estimated target fraction $\hat{\pi}(s)$ even when $\pi(s)$ is known because this guarantees that the bootstrap inference is not conservative. Further discussion is provided after Theorem 5.
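To make the estimator concrete, note that the objective in (3.1) is convex and piecewise linear in $q$, so its minimum is attained at one of the observed outcomes. The following Python sketch (our own illustration with hypothetical inputs; `m1_hat` holds pre-computed working-model values $\widehat{m}_1(\tau,S_i,X_i)$) evaluates the objective at each candidate kink and returns the argmin. With `m1_hat` identically zero, it reduces to the unadjusted inverse-propensity-weighted quantile of the treated outcomes.

```python
import numpy as np

def q1_adj(Y, A, S, m1_hat, tau):
    """Regression-adjusted estimator of q_1(tau) from (3.1), computed by
    evaluating the piecewise-linear convex objective at its kinks (the Y_i)."""
    Y, A, S, m1_hat = map(np.asarray, (Y, A, S, m1_hat))
    # pi_hat(S_i) = n_1(s)/n(s) evaluated at each observation's stratum
    pi_by_s = {s: A[S == s].mean() for s in np.unique(S)}
    pi_hat = np.array([pi_by_s[s] for s in S])

    def objective(q):
        rho = (Y - q) * (tau - (Y - q <= 0))   # check function rho_tau(Y_i - q)
        return np.sum(A / pi_hat * rho + (A - pi_hat) / pi_hat * m1_hat * q)

    candidates = np.unique(Y)
    return candidates[np.argmin([objective(q) for q in candidates])]
```

The estimator $\hat{q}_0^{adj}(\tau)$ in (3.2) follows by the symmetric substitution $A_i\mapsto 1-A_i$, $\hat{\pi}\mapsto 1-\hat{\pi}$, with the sign of the adjustment term flipped.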

Assumption 2.

For $a=0,1$, denote $f_a(\cdot)$, $f_a(\cdot\mid s)$, and $f_a(\cdot\mid x,s)$ as the PDFs of $Y_i(a)$, $Y_i(a)\mid S_i=s$, and $Y_i(a)\mid S_i=s,X_i=x$, respectively.

  1. (i)

    $f_a(q_a(\tau))$ and $f_a(q_a(\tau)\mid s)$ are bounded and bounded away from zero uniformly over $\tau\in\Upsilon$ and $s\in\mathcal{S}$, where $\Upsilon$ is a compact subset of $(0,1)$.

  2. (ii)

    $f_a(\cdot)$ and $f_a(\cdot\mid s)$ are Lipschitz over $\{q_a(\tau):\tau\in\Upsilon\}$.

  3. (iii)

    $\sup_{y\in\Re,x\in\text{Supp}(X),s\in\mathcal{S}}f_a(y\mid x,s)<\infty$.

Assumption 3.
  1. (i)

    For $a=0,1$, there exists a function $\overline{m}_a(\tau,s,x)$ such that, for $\overline{\Delta}_a(\tau,s,X_i)=\widehat{m}_a(\tau,s,X_i)-\overline{m}_a(\tau,s,X_i)$, we have

    $\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_1(s)}\overline{\Delta}_a(\tau,s,X_i)}{n_1(s)}-\frac{\sum_{i\in I_0(s)}\overline{\Delta}_a(\tau,s,X_i)}{n_0(s)}\right|=o_p(n^{-1/2}),$

    where $I_a(s)=\{i\in[n]:A_i=a,S_i=s\}$.

  2. (ii)

    For $a=0,1$, let $\mathcal{F}_a=\{\overline{m}_a(\tau,s,x):\tau\in\Upsilon\}$ with an envelope $F_a(s,x)$. Then, $\max_{s\in\mathcal{S}}\mathbb{E}(|F_a(S_i,X_i)|^q\mid S_i=s)<\infty$ for $q>2$ and there exist fixed constants $(\alpha,v)>0$ such that

    $\sup_Q N(\mathcal{F}_a,e_Q,\varepsilon\|F_a\|_{Q,2})\leq\left(\frac{\alpha}{\varepsilon}\right)^v,\quad\forall\varepsilon\in(0,1],$

    where $N(\cdot)$ denotes the covering number, $e_Q(f,g)=\|f-g\|_{Q,2}$, and the supremum is taken over all finitely discrete probability measures $Q$.

  3. (iii)

    For $a=0,1$ and any $\tau_1,\tau_2\in\Upsilon$, there exists a constant $C>0$ such that

    $\mathbb{E}\left((\overline{m}_a(\tau_2,S_i,X_i)-\overline{m}_a(\tau_1,S_i,X_i))^2\mid S_i=s\right)\leq C|\tau_2-\tau_1|.$

Several remarks are in order. First, Assumption 2 is standard in the quantile regression literature. We do not need $f_a(y\mid x,s)$ to be bounded away from zero because we are interested in the unconditional quantile $q_a(\tau)$, which is uniquely defined as long as the unconditional density $f_a(q_a(\tau))$ is positive. Second, Assumption 3(i) is high-level. If we consider a linear probability model such that $\overline{m}_a(\tau,s,X_i)=\tau-X_i^\top\theta_{a,s}(\tau)$ and $\widehat{m}_a(\tau,s,X_i)=\tau-X_i^\top\hat{\theta}_{a,s}(\tau)$, then Assumption 3(i) is equivalent to

$\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left|\left(\frac{\sum_{i\in I_1(s)}X_i}{n_1(s)}-\frac{\sum_{i\in I_0(s)}X_i}{n_0(s)}\right)^\top\left(\hat{\theta}_{a,s}(\tau)-\theta_{a,s}(\tau)\right)\right|=o_p(n^{-1/2}),$

which is similar to Liu et al. (2020, Assumption 3) and holds intuitively if $\hat{\theta}_{a,s}(\tau)$ is a consistent estimator of the pseudo-true value $\theta_{a,s}(\tau)$. Third, Assumptions 3(ii) and 3(iii) impose mild regularity conditions on $\overline{m}_a(\cdot)$. Assumption 3(ii) holds automatically if $\Upsilon$ is a finite set. In general, both Assumptions 3(ii) and 3(iii) hold if

$\sup_{a=0,1,s\in\mathcal{S},x\in\text{Supp}(X)}|\overline{m}_a(\tau_2,s,x)-\overline{m}_a(\tau_1,s,x)|\leq L|\tau_2-\tau_1|$

for some constant $L>0$. Such Lipschitz continuity holds for the true specification ($\overline{m}_a(\cdot)=m_a(\cdot)$) under Assumption 2. Fourth, we provide primitive sufficient conditions for Assumption 3 in Section 5.
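The covariate-balance term in the display above is easy to inspect in data. This sketch (our own illustration, not the paper's code) computes, for each stratum, the difference between treated and control covariate means that multiplies $\hat{\theta}_{a,s}(\tau)-\theta_{a,s}(\tau)$:

```python
import numpy as np

def covariate_imbalance(X, A, S):
    """Per-stratum difference of covariate means between treated and control:
    sum_{i in I_1(s)} X_i / n_1(s) - sum_{i in I_0(s)} X_i / n_0(s)."""
    X, A, S = map(np.asarray, (X, A, S))
    out = {}
    for s in np.unique(S):
        treated = X[(S == s) & (A == 1)]
        control = X[(S == s) & (A == 0)]
        out[s] = treated.mean(axis=0) - control.mean(axis=0)
    return out
```

Under CARs this imbalance is of order $O_p(n^{-1/2})$ within each stratum, which, combined with the consistency of $\hat{\theta}_{a,s}(\tau)$, delivers the $o_p(n^{-1/2})$ product rate required by Assumption 3(i).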

Theorem 3.1.

Suppose Assumptions 1-3 hold. Then, uniformly over $\tau\in\Upsilon$,

$\sqrt{n}(\hat{q}^{adj}(\tau)-q(\tau))\rightsquigarrow\mathcal{B}(\tau),$

where $\mathcal{B}(\tau)$ is a tight Gaussian process with covariance kernel $\Sigma(\tau,\tau^\prime)$ defined in Section E of the Online Supplement. In addition, for any finite set of quantile indices $(\tau_1,\cdots,\tau_K)$, the asymptotic covariance matrix of $(\hat{q}^{adj}(\tau_1),\cdots,\hat{q}^{adj}(\tau_K))$ is denoted as $[\Sigma(\tau_k,\tau_l)]_{k,l\in[K]}$, where we use $[U_{kl}]_{k,l\in[K]}$ to denote a $K\times K$ matrix whose $(k,l)$th entry is $U_{kl}$. Then, $[\Sigma(\tau_k,\tau_l)]_{k,l\in[K]}$ is minimized in the matrix sense (for two symmetric matrices $A$ and $B$, we say $A$ is greater than or equal to $B$ if $A-B$ is positive semidefinite) when the auxiliary regressions are correctly specified at $(\tau_1,\cdots,\tau_K)$, i.e., $\overline{m}_a(\tau_k,s,x)=m_a(\tau_k,s,x)$ for $a=0,1$, $k\in[K]$, and all $(s,x)$ in the joint support of $(S_i,X_i)$.

Three remarks are in order. First, the expression for the asymptotic variance of $\hat{q}^{adj}(\tau)$ can be found in the proof of Theorem 3.1. It is the same whether or not the randomization scheme achieves strong balance (we refer readers to Bugni et al. (2018) for the definition of strong balance). This robustness is due to the use of the estimated target fraction $\hat{\pi}(s)$. The same phenomenon was discovered in a simplified setting by Zhang and Zheng (2020). Second, although our estimator remains consistent and asymptotically normal when the auxiliary regression is misspecified, it is meaningful to pursue the correct specification, which achieves the minimum variance. As the estimator with no adjustments can be viewed as a special case of our estimator with $\overline{m}_a(\cdot)=0$, Theorem 3.1 implies that the adjusted estimator with the correctly specified auxiliary regression is more efficient than that with no adjustments. If the auxiliary regression is misspecified, the adjusted estimator can sometimes be less efficient than the unadjusted one, which is known as Freedman's critique. In Section 5, we discuss how to make adjustments that do not harm the precision of the QTE estimator. Third, the asymptotic variance of $\hat{q}^{adj}(\tau)$ depends on $(f_a(q_a(\tau)),m_a(\tau,s,x))_{a=0,1}$, which are infinite-dimensional nuisance parameters. To conduct analytic inference, it is necessary to estimate these nuisance parameters nonparametrically, which requires tuning parameters. Nonparametric estimation can be sensitive to the choice of tuning parameters, and rule-of-thumb tuning parameter selection may not be appropriate for every DGP or every quantile. The use of cross-validation in selecting the tuning parameters is possible in principle but, in practice, time-consuming. These practical difficulties of analytic methods of inference provide strong motivation to investigate bootstrap inference procedures that are much less reliant on tuning parameters.

4 Multiplier Bootstrap Inference

We approximate the asymptotic distributions of q^adj(τ)\hat{q}^{adj}(\tau) via the multiplier bootstrap. Let {ξi}i[n]\{\xi_{i}\}_{i\in[n]} be a sequence of bootstrap weights which will be specified later. Define n1w(s)=i[n]ξiAi1{Si=s}n_{1}^{w}(s)=\sum_{i\in[n]}\xi_{i}A_{i}1\{S_{i}=s\}, n0w(s)=i[n]ξi(1Ai)1{Si=s}n_{0}^{w}(s)=\sum_{i\in[n]}\xi_{i}(1-A_{i})1\{S_{i}=s\}, nw(s)=i[n]ξi1{Si=s}=n1w(s)+n0w(s)n^{w}(s)=\sum_{i\in[n]}\xi_{i}1\{S_{i}=s\}=n_{1}^{w}(s)+n_{0}^{w}(s), and π^w(s)=n1w(s)/nw(s)\hat{\pi}^{w}(s)=n_{1}^{w}(s)/n^{w}(s). The multiplier bootstrap counterpart of q^adj(τ)\hat{q}^{adj}(\tau) is denoted by q^w(τ)\hat{q}^{w}(\tau) and defined as

q^w(τ)=q^1w(τ)q^0w(τ),\displaystyle\hat{q}^{w}(\tau)=\hat{q}_{1}^{w}(\tau)-\hat{q}_{0}^{w}(\tau),

where

q^1w(τ)=\displaystyle\hat{q}_{1}^{w}(\tau)= argminqi[n]ξi[Aiπ^w(Si)ρτ(Yiq)+(Aiπ^w(Si))π^w(Si)m^1(τ,Si,Xi)q],\displaystyle\operatorname*{arg\,min}_{q}\sum_{i\in[n]}\xi_{i}\left[\frac{A_{i}}{\hat{\pi}^{w}(S_{i})}\rho_{\tau}(Y_{i}-q)+\frac{(A_{i}-\hat{\pi}^{w}(S_{i}))}{\hat{\pi}^{w}(S_{i})}\widehat{m}_{1}(\tau,S_{i},X_{i})q\right], (4.1)

and

q^0w(τ)=\displaystyle\hat{q}_{0}^{w}(\tau)= argminqi[n]ξi[1Ai1π^w(Si)ρτ(Yiq)(Aiπ^w(Si))1π^w(Si)m^0(τ,Si,Xi)q].\displaystyle\operatorname*{arg\,min}_{q}\sum_{i\in[n]}\xi_{i}\left[\frac{1-A_{i}}{1-\hat{\pi}^{w}(S_{i})}\rho_{\tau}(Y_{i}-q)-\frac{(A_{i}-\hat{\pi}^{w}(S_{i}))}{1-\hat{\pi}^{w}(S_{i})}\widehat{m}_{0}(\tau,S_{i},X_{i})q\right]. (4.2)

Two comments on implementation are noted here: (i) we do not re-estimate m^a()\widehat{m}_{a}(\cdot) in the bootstrap sample, which is similar to the multiplier bootstrap procedure proposed by Belloni et al. (2017); and (ii) in Section B of the Online Supplement we propose a way to compute (q^aw(τ))a=0,1(\hat{q}_{a}^{w}(\tau))_{a=0,1} directly from the subgradient conditions of (4.1) and (4.2), thereby avoiding the optimization. Both features considerably reduce the computation time of our bootstrap procedure.
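To illustrate the computation, note that the objective in (4.1) is convex and piecewise linear in qq, with kinks only at the treated outcomes, so a minimizer (when attained) lies on that grid. The following Python sketch is our own illustration of this idea for a single quantile index; all function and variable names are hypothetical, and it is not the paper's implementation:

```python
import numpy as np

def q1_boot(Y, A, S, m1hat, xi, tau):
    """Sketch of the bootstrap estimator in (4.1): the weighted objective is
    convex and piecewise linear in q, so we evaluate it on the grid of treated
    outcomes (assuming the minimum is attained there)."""
    # stratum-level bootstrap-weighted target fractions \hat\pi^w(S_i)
    piw = np.array([(xi * A * (S == s)).sum() / (xi * (S == s)).sum()
                    for s in S])

    def rho(u):
        # Koenker-Bassett check function rho_tau(u)
        return u * (tau - (u < 0.0))

    def obj(q):
        # bootstrap-weighted objective of (4.1) at candidate q
        return np.sum(xi * (A / piw * rho(Y - q)
                            + (A - piw) / piw * m1hat * q))

    grid = Y[A == 1]
    return grid[np.argmin([obj(q) for q in grid])]
```

With all units treated, unit weights, and m^1=0\widehat{m}_{1}=0, this reduces to the ordinary sample τ\tauth quantile, which gives a quick sanity check.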

Next, we specify the bootstrap weights.

Assumption 4.

Suppose {ξi}i[n]\{\xi_{i}\}_{i\in[n]} is a sequence of nonnegative i.i.d. random variables with unit expectation and variance and a sub-exponential upper tail.

Assumption 5.

Recall Δ¯a(τ,s,x)\overline{\Delta}_{a}(\tau,s,x) defined in Assumption 3. We have, for a=0,1a=0,1,

supτΥ,s𝒮|iI1(s)ξiΔ¯a(τ,s,Xi)n1w(s)iI0(s)ξiΔ¯a(τ,s,Xi)n0w(s)|=op(n1/2).\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\biggl{|}\frac{\sum_{i\in I_{1}(s)}\xi_{i}\overline{\Delta}_{a}(\tau,s,X_{i})}{n_{1}^{w}(s)}-\frac{\sum_{i\in I_{0}(s)}\xi_{i}\overline{\Delta}_{a}(\tau,s,X_{i})}{n_{0}^{w}(s)}\biggr{|}=o_{p}(n^{-1/2}).

We require the bootstrap weights to be nonnegative so that the objective functions in (4.1) and (4.2) are convex. In practice, we generate ξi\xi_{i} independently from the standard exponential distribution. Assumption 5 is the bootstrap counterpart of Assumption 3. Continuing with the linear model example considered after Assumption 3, Assumption 5 requires

supτΥ,a=0,1,s𝒮|(iI1(s)ξiXin1w(s)iI0(s)ξiXin0w(s))(θ^a,s(τ)θa,s(τ))|=op(n1/2),\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left|\left(\frac{\sum_{i\in I_{1}(s)}\xi_{i}X_{i}}{n_{1}^{w}(s)}-\frac{\sum_{i\in I_{0}(s)}\xi_{i}X_{i}}{n_{0}^{w}(s)}\right)^{\top}\left(\hat{\theta}_{a,s}(\tau)-\theta_{a,s}(\tau)\right)\right|=o_{p}(n^{-1/2}),

which holds if θ^a,s(τ)\hat{\theta}_{a,s}(\tau) is a uniformly consistent estimator of θa,s(τ)\theta_{a,s}(\tau).

Theorem 4.1.

Suppose Assumptions 15 hold. Then, uniformly over τΥ\tau\in\Upsilon,

n(q^w(τ)q^adj(τ))𝜉(τ),\displaystyle\sqrt{n}(\hat{q}^{w}(\tau)-\hat{q}^{adj}(\tau))\underset{\xi}{\overset{\mathbb{P}}{\rightsquigarrow}}\mathcal{B}(\tau),

where (τ)\mathcal{B}(\tau) is the same Gaussian process defined in Theorem 3.555We view n(q^w(τ)q^adj(τ))\sqrt{n}(\hat{q}^{w}(\tau)-\hat{q}^{adj}(\tau)) and (τ)\mathcal{B}(\tau) as two processes indexed by τΥ\tau\in\Upsilon and denote them as GnG_{n} and GG, respectively. Then, following van der Vaart and Wellner (1996, Chapter 2.9), we say Gn𝜉GG_{n}\underset{\xi}{\overset{\mathbb{P}}{\rightsquigarrow}}G uniformly over τΥ\tau\in\Upsilon if suphBL1|𝔼ξh(Gn)𝔼h(G)|p0,\displaystyle\sup_{h\in\text{BL}_{1}}|\mathbb{E}_{\xi}h(G_{n})-\mathbb{E}h(G)|\stackrel{{\scriptstyle p}}{{\longrightarrow}}0, where BL1\text{BL}_{1} is the set of all functions h:(Υ)[0,1]h:\ell^{\infty}(\Upsilon)\mapsto[0,1] such that |h(z1)h(z2)||z1z2||h(z_{1})-h(z_{2})|\leq|z_{1}-z_{2}| for every z1,z2(Υ)z_{1},z_{2}\in\ell^{\infty}(\Upsilon), and 𝔼ξ\mathbb{E}_{\xi} denotes expectation with respect to the bootstrap weights {ξ}i[n]\{\xi\}_{i\in[n]}.

Two remarks are in order. First, Theorem 5 shows that the limit distribution of the bootstrap estimator conditional on the data can approximate that of the original estimator uniformly over τΥ\tau\in\Upsilon. This is the theoretical foundation for the bootstrap confidence intervals and bands described in Section B in the Online Supplement. Specifically, denote {q^w,b(τ)}b[B]\{\hat{q}^{w,b}(\tau)\}_{b\in[B]} as the bootstrap estimates, where BB is the number of bootstrap replications. Let 𝒞^(ν)\widehat{\mathcal{C}}(\nu) and 𝒞(ν)\mathcal{C}(\nu) be the ν\nuth empirical quantile of the sequence {q^w,b(τ)}b[B]\{\hat{q}^{w,b}(\tau)\}_{b\in[B]} and the ν\nuth standard normal critical value, respectively. Then, we suggest using the bootstrap estimates to construct the standard error of q^adj(τ)\hat{q}^{adj}(\tau) as σ^=𝒞^(0.975)𝒞^(0.025)𝒞(0.975)𝒞(0.025)\hat{\sigma}=\frac{\widehat{\mathcal{C}}(0.975)-\widehat{\mathcal{C}}(0.025)}{\mathcal{C}(0.975)-\mathcal{C}(0.025)}. Note that, unlike Hahn and Liao (2021), our bootstrap standard error is not conservative. In our context, the bootstrap estimator of σ2\sigma^{2} considered by Hahn and Liao (2021) is 𝔼[(n(q^w(τ)q^adj(τ)))2]\mathbb{E}^{*}[(\sqrt{n}(\hat{q}^{w}(\tau)-\hat{q}^{adj}(\tau)))^{2}], where 𝔼\mathbb{E}^{*} is the conditional expectation given the data. It is well known that weak convergence does not imply convergence in L2L_{2}-norm, which explains why their estimator is, in general, conservative. Instead, we use a different estimator of the standard error and show that it is consistent given weak convergence. Second, such a bootstrap approximation is consistent under CARs. Zhang and Zheng (2020) showed that, for QTE estimation without regression adjustment, bootstrapping the IPW QTE estimator with the estimated target fraction results in non-conservative inference, while bootstrapping the IPW estimator with the true fraction is conservative under CARs. 
As the estimator considered by Zhang and Zheng (2020) is a special case of our regression-adjusted estimator with m^a()=0\widehat{m}_{a}(\cdot)=0, we conjecture that the same conclusion holds. A proof of conservative bootstrap inference with the true target fraction is not included in the paper for reasons of space.666Full statements and proofs are lengthy because we need to derive the limit distributions of not only the bootstrap estimator but also the original estimator with the true target fraction. Although the negative result is theoretically interesting, we are not aware of any empirical papers that use the true target fraction while making regression adjustments. Moreover, our method is shown to outperform the one with the true target fraction in simulations. The practical value of proving the negative result is therefore limited. Our simulations confirm both the correct size coverage of our inference method using the bootstrap with the estimated target fraction and the conservatism of the bootstrap with the true target fraction. The standard error of the QTE estimator is found to be 34.9% larger on average when the true rather than the estimated target fraction is used in the simulations (see Tables 1 below and 15 in the Online Supplement).
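The standard-error construction suggested above is straightforward to implement. A minimal Python sketch (our own illustration; the function name is hypothetical) rescales the empirical 2.5%/97.5% interquantile range of the bootstrap draws by the matching standard normal interquantile range:

```python
import numpy as np
from statistics import NormalDist

def bootstrap_se(q_boot):
    """Bootstrap standard error of qhat^adj(tau):
    (C_hat(0.975) - C_hat(0.025)) / (C(0.975) - C(0.025)),
    where C_hat are empirical quantiles of the bootstrap draws
    and C are standard normal critical values."""
    c_lo, c_hi = np.quantile(q_boot, [0.025, 0.975])
    z = NormalDist().inv_cdf(0.975)  # C(0.975) = -C(0.025)
    return (c_hi - c_lo) / (2.0 * z)
```

Because it uses an interquantile range rather than a conditional second moment, this estimator inherits consistency from weak convergence alone, which is the point of the contrast with Hahn and Liao (2021) drawn above.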

5 Auxiliary Regressions

In this section, we consider two approaches to estimating the auxiliary regressions: (1) a parametric method and (2) a nonparametric method. In Section A of the Online Supplement, we further consider a regularization method. For the parametric method, we do not require the model to be correctly specified and propose ways to estimate the pseudo true value of the auxiliary regression. For the other two methods, we (nonparametrically) estimate the true model so that the asymptotic variance of q^adj(τ)\hat{q}^{adj}(\tau) achieves its minimum based on Theorem 3. For all three methods, we verify Assumptions 3 and 5.

5.1 Parametric method

In this section, we consider the case where XiX_{i} is finite-dimensional. Recall ma(τ,s,x)τ(Yi(a)qa(τ)|Xi=x,Si=s)m_{a}(\tau,s,x)\equiv\tau-\mathbb{P}\left(Y_{i}(a)\leq q_{a}(\tau)|X_{i}=x,S_{i}=s\right) for a=0,1a=0,1. We propose to model (Yi(a)qa(τ)|Xi,Si=s)\mathbb{P}\left(Y_{i}(a)\leq q_{a}(\tau)|X_{i},S_{i}=s\right) as Λτ,s(Xi,θa,s(τ))\Lambda_{\tau,s}(X_{i},\theta_{a,s}(\tau)), where θa,s(τ)\theta_{a,s}(\tau) is a finite dimensional parameter that depends on (a,s,τ)(a,s,\tau) so that our model for ma(τ,s,Xi)m_{a}(\tau,s,X_{i}) is

m¯a(τ,s,Xi)=τΛτ,s(Xi,θa,s(τ)).\displaystyle\overline{m}_{a}(\tau,s,X_{i})=\tau-\Lambda_{\tau,s}(X_{i},\theta_{a,s}(\tau)). (5.1)

We note that, as we allow for misspecification, the researchers have the freedom to choose any functional forms for Λτ,s()\Lambda_{\tau,s}(\cdot) and any pseudo true values for θa,s(τ)\theta_{a,s}(\tau), both of which can vary with respect to (τ,s)(\tau,s). For example, if we assume a logistic regression with Λτ,s(Xi,θa,s(τ))=λ(Xiθa,s(τ))\Lambda_{\tau,s}(X_{i},\theta_{a,s}(\tau))=\lambda(X_{i}^{\top}\theta_{a,s}(\tau)), where λ()\lambda(\cdot) is the logistic CDF, then there are various choices of θa,s(τ)\theta_{a,s}(\tau) such as the maximizer of the population pseudo likelihood, the maximizer of the population version of the least squares objective function, or the minimizer of the asymptotic variance of the adjusted QTE estimator. As the logistic model is potentially misspecified, these three pseudo true values are not necessarily the same and can lead to different adjustments, and thus, different asymptotic variances of the corresponding adjusted QTE estimators.

We first state a general result for generic choices of Λτ,s()\Lambda_{\tau,s}(\cdot) and θa,s(τ)\theta_{a,s}(\tau). Suppose we estimate θa,s(τ)\theta_{a,s}(\tau) by θ^a,s(τ)\hat{\theta}_{a,s}(\tau). Then, the corresponding m^a(τ,s,Xi)\widehat{m}_{a}(\tau,s,X_{i}) can be written as

m^a(τ,s,Xi)=τΛτ,s(Xi,θ^a,s(τ)).\displaystyle\widehat{m}_{a}(\tau,s,X_{i})=\tau-\Lambda_{\tau,s}(X_{i},\hat{\theta}_{a,s}(\tau)). (5.2)
Assumption 6.
  1. (i)

    Suppose there exist a positive random variable LiL_{i} and a positive constant C>0C>0 such that

    supτ1,τ2Υ,s𝒮,θCΛτ1,s(Xi,θ)Λτ2,s(Xi,θ)2Li|τ1τ2|,supτΥ,s𝒮,θC|Λτ,s(Xi,θ)|Li,\displaystyle\sup_{\tau_{1},\tau_{2}\in\Upsilon,s\in\mathcal{S},||\theta||\leq C}||\Lambda_{\tau_{1},s}(X_{i},\theta)-\Lambda_{\tau_{2},s}(X_{i},\theta)||_{2}\leq L_{i}|\tau_{1}-\tau_{2}|,\quad\sup_{\tau\in\Upsilon,s\in\mathcal{S},||\theta||\leq C}|\Lambda_{\tau,s}(X_{i},\theta)|\leq L_{i},
    supτΥ,s𝒮,θC|θΛτ,s(Xi,θ)|Li,and𝔼(Lid|Si=s)C<for some d>2.\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S},||\theta||\leq C}|\partial_{\theta}\Lambda_{\tau,s}(X_{i},\theta)|\leq L_{i},\quad\text{and}\quad\mathbb{E}(L_{i}^{d}|S_{i}=s)\leq C<\infty\quad\text{for some $d>2$.}
  2. (ii)

    supτ1,τ2Υ,a=0,1,s𝒮|θa,s(τ1)θa,s(τ2)|C|τ1τ2|\sup_{\tau_{1},\tau_{2}\in\Upsilon,a=0,1,s\in\mathcal{S}}|\theta_{a,s}(\tau_{1})-\theta_{a,s}(\tau_{2})|\leq C|\tau_{1}-\tau_{2}|.

  3. (iii)

    supτΥ,a=0,1,s𝒮θ^a,s(τ)θa,s(τ)2p0\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}||\hat{\theta}_{a,s}(\tau)-\theta_{a,s}(\tau)||_{2}\stackrel{{\scriptstyle p}}{{\longrightarrow}}0.

Three remarks are in order. First, common choices for auxiliary regressions are linear probability, logistic, and probit regressions, corresponding to Λτ,s(Xi,θa,s(τ))=Xiθa,s(τ)\Lambda_{\tau,s}(X_{i},\theta_{a,s}(\tau))=X_{i}^{\top}\theta_{a,s}(\tau), λ(Xiθa,s(τ))\lambda(\vec{X}_{i}^{\top}\theta_{a,s}(\tau)), and Φ(Xiθa,s(τ))\Phi(\vec{X}_{i}^{\top}\theta_{a,s}(\tau)), respectively, where Φ()\Phi(\cdot) is the standard normal CDF and Xi=(1,Xi)\vec{X}_{i}=(1,X_{i}^{\top})^{\top}. For these models, the functional form Λτ,s()\Lambda_{\tau,s}(\cdot) does not depend on (τ,s)(\tau,s), and Assumption 6(i) holds automatically. For the linear regression case, we do not include the intercept because our regression adjusted estimators ((3.1) and (3.2)) and their bootstrap counterparts ((4.1) and (4.2)) are numerically invariant to location shift of the auxiliary regressions. Second, it is also important to allow the functional form Λτ,s()\Lambda_{\tau,s}(\cdot) to vary across τ\tau to incorporate the case in which the regressor XiX_{i} in the linear, logistic, and probit regressions is replaced by Wi,s(τ)W_{i,s}(\tau), a function of XiX_{i} that depends on (τ,s)(\tau,s). We give a concrete example for this situation in Section 5.1.3. Third, Assumption 6(ii) also holds automatically if Υ\Upsilon is finite. When Υ\Upsilon is infinite, this condition is still mild.

Theorem 5.1.

Denote q^par(τ)\hat{q}^{par}(\tau) and q^par,w(τ)\hat{q}^{par,w}(\tau) as the τ\tauth QTE estimator and its multiplier bootstrap counterpart defined in Sections 3 and 4, respectively, with m¯a(τ,Si,Xi)\overline{m}_{a}(\tau,S_{i},X_{i}) and m^a(τ,Si,Xi)\widehat{m}_{a}(\tau,S_{i},X_{i}) defined in (5.1) and (5.2), respectively. Suppose Assumptions 1, 2, 4, and 6 hold. Then, Assumptions 3 and 5 hold, which further implies Theorems 3 and 5 hold for q^par(τ)\hat{q}^{par}(\tau) and q^par,w(τ)\hat{q}^{par,w}(\tau), respectively.

Theorem 5.1 shows that, as long as the estimator of the pseudo true value (θ^a,s(τ)\hat{\theta}_{a,s}(\tau)) is uniformly consistent, under mild regularity conditions, all the general estimation and bootstrap inference results established in Sections 3 and 4 hold.

5.1.1 Linear probability model

In this section, we consider linear adjustment with parameter ta,s(τ)t_{a,s}(\tau) such that

Λτ,s(Xi,ta,s(τ))=Wi,s(τ)ta,s(τ)andm¯a(τ,s,Xi)=τWi,s(τ)ta,s(τ),\displaystyle\Lambda_{\tau,s}(X_{i},t_{a,s}(\tau))=W_{i,s}^{\top}(\tau)t_{a,s}(\tau)\quad\text{and}\quad\overline{m}_{a}(\tau,s,X_{i})=\tau-W_{i,s}^{\top}(\tau)t_{a,s}(\tau), (5.3)

where the regressor Wi,s(τ)W_{i,s}(\tau) is a function of XiX_{i} but the functional form may vary across s,τs,\tau. For example, we can consider Wi,s(τ)=XiW_{i,s}(\tau)=X_{i}, the transformations of XiX_{i} such as quadratic and interaction terms, and some prediction of (1{Yi(1)q1(τ)},1{Yi(0)q0(τ)})(1\{Y_{i}(1)\leq q_{1}(\tau)\},1\{Y_{i}(0)\leq q_{0}(\tau)\}) given XiX_{i} and Si=sS_{i}=s. The last example is further explained in Section 5.1.3.

We note that the asymptotic variance (denoted as σ2\sigma^{2}) of q^adj(τ)\hat{q}^{adj}(\tau) is a function of the working model (m¯a(τ,s,)\overline{m}_{a}(\tau,s,\cdot)), which is further indexed by its parameters (denoted as {ta,s(τ)}a=0,1,s𝒮\{t_{a,s}(\tau)\}_{a=0,1,s\in\mathcal{S}}), i.e., σ2=σ2({m¯a(τ,s,;ta,s)}a=0,1,s𝒮)\sigma^{2}=\sigma^{2}(\{\overline{m}_{a}(\tau,s,\cdot;t_{a,s})\}_{a=0,1,s\in\mathcal{S}}). Our optimal linear adjustment corresponds to the parameter value θa,s(τ)\theta_{a,s}(\tau) that minimizes σ2({m¯a(τ,s,;ta,s)}a=0,1,s𝒮)\sigma^{2}(\{\overline{m}_{a}(\tau,s,\cdot;t_{a,s})\}_{a=0,1,s\in\mathcal{S}}), i.e.,

{θa,s(τ)}a=0,1,s𝒮argminta,s:a=0,1,s𝒮σ2({m¯a(τ,s,;ta,s)}a=0,1,s𝒮).\displaystyle\{\theta_{a,s}(\tau)\}_{a=0,1,s\in\mathcal{S}}\in\operatorname*{arg\,min}_{t_{a,s}:a=0,1,s\in\mathcal{S}}\sigma^{2}(\{\overline{m}_{a}(\tau,s,\cdot;t_{a,s})\}_{a=0,1,s\in\mathcal{S}}).
Assumption 7.

Define W~i,s(τ)=Wi,s(τ)𝔼(Wi,s(τ)|Si=s)\tilde{W}_{i,s}(\tau)=W_{i,s}(\tau)-\mathbb{E}(W_{i,s}(\tau)|S_{i}=s). There exist constants 0<c<C<0<c<C<\infty such that

c<infa=0,1,s𝒮,τΥλmin(𝔼W~i,s(τ)W~i,s(τ)|Si=s)supa=0,1,s𝒮,τΥλmax(𝔼W~i,s(τ)W~i,s(τ)|Si=s)C\displaystyle c<\inf_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon}\lambda_{\min}(\mathbb{E}\tilde{W}_{i,s}(\tau)\tilde{W}_{i,s}(\tau)^{\top}|S_{i}=s)\leq\sup_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon}\lambda_{\max}(\mathbb{E}\tilde{W}_{i,s}(\tau)\tilde{W}_{i,s}(\tau)^{\top}|S_{i}=s)\leq C

and 𝔼(W~i,s2d|Si=s)C\mathbb{E}(||\tilde{W}_{i,s}||_{2}^{d}|S_{i}=s)\leq C for some d>2d>2, where for a generic symmetric matrix UU, λmin(U)\lambda_{\min}(U) and λmax(U)\lambda_{\max}(U) denote the minimal and maximal eigenvalues of UU, respectively.

The next theorem derives the closed-form expression for the optimal linear coefficient.

Theorem 5.2.

Suppose Assumptions 1, 2, 4, 6, 7 hold, and Λτ,s()\Lambda_{\tau,s}(\cdot) is defined in (5.3). Further denote the asymptotic covariance matrix of (q^par(τ1),,q^par(τK))(\hat{q}^{par}(\tau_{1}),\cdots,\hat{q}^{par}(\tau_{K})) for any finite set of quantile indices (τ1,,τK)(\tau_{1},\cdots,\tau_{K}) as [ΣLP(τk,τl)]k,l[K][\Sigma^{\textit{LP}}(\tau_{k},\tau_{l})]_{k,l\in[K]}. Then, [ΣLP(τk,τl)]k,l[K][\Sigma^{\textit{LP}}(\tau_{k},\tau_{l})]_{k,l\in[K]} is minimized in the matrix sense at (θ1,s(τk),θ0,s(τk))k[K]\left(\theta_{1,s}(\tau_{k}),\theta_{0,s}(\tau_{k})\right)_{k\in[K]} such that

θ1,s(τk)f1(q1(τk))+π(s)θ0,s(τk)(1π(s))f0(q0(τk))=θ1,sLP(τk)f1(q1(τk))+π(s)θ0,sLP(τk)(1π(s))f0(q0(τk)),k[K],\displaystyle\frac{\theta_{1,s}(\tau_{k})}{f_{1}(q_{1}(\tau_{k}))}+\frac{\pi(s)\theta_{0,s}(\tau_{k})}{(1-\pi(s))f_{0}(q_{0}(\tau_{k}))}=\frac{\theta_{1,s}^{\textit{LP}}(\tau_{k})}{f_{1}(q_{1}(\tau_{k}))}+\frac{\pi(s)\theta_{0,s}^{\textit{LP}}(\tau_{k})}{(1-\pi(s))f_{0}(q_{0}(\tau_{k}))},\leavevmode\nobreak\ k\in[K],

where for τ=τ1,,τK\tau=\tau_{1},\cdots,\tau_{K} and a=0,1a=0,1,

θa,sLP(τ)=[𝔼(W~i,s(τ)W~i,s(τ)|Si=s)]1𝔼[W~i,s(τ)1{Yi(a)qa(τ)}|Si=s].\displaystyle\theta_{a,s}^{\textit{LP}}(\tau)=\left[\mathbb{E}(\tilde{W}_{i,s}(\tau)\tilde{W}_{i,s}(\tau)^{\top}|S_{i}=s)\right]^{-1}\mathbb{E}\left[\tilde{W}_{i,s}(\tau)1\{Y_{i}(a)\leq q_{a}(\tau)\}|S_{i}=s\right].

Four remarks are in order. First, the optimal linear coefficients {θa,s(τ)}a=0,1,s𝒮\{\theta_{a,s}(\tau)\}_{a=0,1,s\in\mathcal{S}} are not uniquely defined. In order to achieve the minimal variance, we only need to consistently estimate one of the minimizers. We choose

(θ1,s(τ),θ0,s(τ))=(θ1,sLP(τ),θ0,sLP(τ)),s𝒮,\displaystyle(\theta_{1,s}(\tau),\theta_{0,s}(\tau))=(\theta_{1,s}^{\textit{LP}}(\tau),\theta_{0,s}^{\textit{LP}}(\tau)),\leavevmode\nobreak\ s\in\mathcal{S},

as this choice avoids estimation of the densities f1(q1(τ))f_{1}(q_{1}(\tau)) and f0(q0(τ))f_{0}(q_{0}(\tau)). In Theorem 5.3 below, we propose estimators of θ1,sLP(τ)\theta_{1,s}^{\textit{LP}}(\tau) and θ0,sLP(τ)\theta_{0,s}^{\textit{LP}}(\tau) and show they are consistent uniformly over ss and τ\tau. Second, note that the case of no adjustment is nested by our linear adjustment with zero coefficients. Due to the optimality result established in Theorem 5.2, our regression-adjusted QTE estimator with (consistent estimators of) {θa,sLP(τ)}a=0,1,s𝒮\{\theta_{a,s}^{\textit{LP}}(\tau)\}_{a=0,1,s\in\mathcal{S}} is more efficient than that with no adjustments. Third, we also need to clarify that the optimality of {θa,sLP(τ)}a=0,1,s𝒮\{\theta_{a,s}^{\textit{LP}}(\tau)\}_{a=0,1,s\in\mathcal{S}} is only within the class of linear adjustments. It is possible that the QTE estimator with some nonlinear adjustment is more efficient than that with the optimal linear adjustment, especially because the linear probability model is likely misspecified. Fourth, the optimal linear coefficients {θa,sLP(τ)}a=0,1,s𝒮,τΥ\{\theta_{a,s}^{\textit{LP}}(\tau)\}_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon} minimize (over the class of linear adjustments) not only the asymptotic variance of q^par(τ)\hat{q}^{par}(\tau) but also the covariance matrix of (q^par(τ1),,q^par(τK))(\hat{q}^{par}(\tau_{1}),\cdots,\hat{q}^{par}(\tau_{K})) for any finite set of quantile indices (τ1,,τK)(\tau_{1},\cdots,\tau_{K}). This implies we can use the same (estimators of) optimal linear coefficients for hypothesis testing involving single, multiple, or even a continuum of quantile indices.

In the rest of this subsection, we focus on the estimation of {θa,sLP(τ)}a=0,1,s𝒮\{\theta_{a,s}^{\textit{LP}}(\tau)\}_{a=0,1,s\in\mathcal{S}}. Note that θa,sLP(τ)\theta_{a,s}^{\textit{LP}}(\tau) is the projection coefficient of 1{Yiqa(τ)}1\{Y_{i}\leq q_{a}(\tau)\} on W~i,s(τ)\tilde{W}_{i,s}(\tau) for the sub-population with Si=sS_{i}=s and Ai=aA_{i}=a. We estimate these coefficients by their sample analogs. Specifically, the parameter qa(τ)q_{a}(\tau) is unknown and is replaced by a n\sqrt{n}-consistent estimator denoted by q^a(τ)\hat{q}_{a}(\tau).

Assumption 8.

Assume that supτΥ,a=0,1|q^a(τ)qa(τ)|=Op(n1/2)\sup_{\tau\in\Upsilon,a=0,1}|\hat{q}_{a}(\tau)-q_{a}(\tau)|=O_{p}(n^{-1/2}).

In practice, we compute {q^a(τ)}a=0,1\{\hat{q}_{a}(\tau)\}_{a=0,1} based on (3.1) and (3.2) by setting m^a(τ,Si,Xi)0\widehat{m}_{a}(\tau,S_{i},X_{i})\equiv 0. Then, Assumption 8 holds automatically by Theorem 3 with m^a(τ,Si,Xi)=m¯a(τ,Si,Xi)=0\widehat{m}_{a}(\tau,S_{i},X_{i})=\overline{m}_{a}(\tau,S_{i},X_{i})=0. Analysis throughout this section takes into account that the estimator q^a(τ)\hat{q}_{a}(\tau) is used in place of qa(τ)q_{a}(\tau).

Next, we define the estimator of θa,sLP(τ)\theta_{a,s}^{\textit{LP}}(\tau). Recall Ia(s)I_{a}(s) is defined in Assumption 3. Let

m¯a(τ,s,Xi)=τWi,s(τ)θa,sLP(τ),\displaystyle\overline{m}_{a}(\tau,s,X_{i})=\tau-W_{i,s}^{\top}(\tau)\theta_{a,s}^{\textit{LP}}(\tau), (5.4)
m^a(τ,s,Xi)=τWi,s(τ)θ^a,sLP(τ),\displaystyle\widehat{m}_{a}(\tau,s,X_{i})=\tau-W_{i,s}^{\top}(\tau)\hat{\theta}_{a,s}^{\textit{LP}}(\tau), (5.5)
W˙i,a,s(τ)=Wi,s(τ)1na(s)iIa(s)Wi,s(τ),\displaystyle\dot{W}_{i,a,s}(\tau)=W_{i,s}(\tau)-\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}W_{i,s}(\tau), (5.6)

and

θ^a,sLP(τ)=[1na(s)iIa(s)W˙i,a,s(τ)W˙i,a,s(τ)]1[1na(s)iIa(s)W˙i,a,s(τ)1{Yiq^a(τ)}].\displaystyle\hat{\theta}_{a,s}^{\textit{LP}}(\tau)=\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{W}_{i,a,s}(\tau)\dot{W}_{i,a,s}^{\top}(\tau)\right]^{-1}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{W}_{i,a,s}(\tau)1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\right]. (5.7)
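As a concrete illustration of (5.6)-(5.7), the estimator θ^a,sLP(τ)\hat{\theta}_{a,s}^{\textit{LP}}(\tau) within a single (a,s)(a,s) cell is an OLS projection of the indicator 1{Yiq^a(τ)}1\{Y_{i}\leq\hat{q}_{a}(\tau)\} on the within-cell demeaned regressors. The following Python sketch is our own illustration (array and function names are hypothetical), not the paper's implementation:

```python
import numpy as np

def theta_lp_hat(Y, W, q_hat):
    """Sample analog (5.7) within one (a, s) cell: project
    1{Y_i <= qhat_a(tau)} on the within-cell demeaned regressors."""
    W_dot = W - W.mean(axis=0)            # \dot W_{i,a,s}(tau), eq. (5.6)
    D = (Y <= q_hat).astype(float)        # 1{Y_i <= qhat_a(tau)}
    n = len(Y)
    # solve [W_dot'W_dot / n] theta = [W_dot'D / n]
    return np.linalg.solve(W_dot.T @ W_dot / n, W_dot.T @ D / n)
```

Because the regressors are demeaned within the cell, the result coincides with the slope coefficients from an OLS regression of the indicator on the raw regressors plus an intercept, which provides a simple numerical check.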
Assumption 9.

Suppose there exist a positive random variable LiL_{i} and a positive constant C>0C>0 such that

supτ1,τ2Υ,a=0,1,s𝒮Wi,s(τ1)Wi,s(τ2)2Li|τ1τ2|,supτΥ,a=0,1,s𝒮Wi,s(τ)2Li,\displaystyle\sup_{\tau_{1},\tau_{2}\in\Upsilon,a=0,1,s\in\mathcal{S}}||W_{i,s}(\tau_{1})-W_{i,s}(\tau_{2})||_{2}\leq L_{i}|\tau_{1}-\tau_{2}|,\quad\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}||W_{i,s}(\tau)||_{2}\leq L_{i},

and 𝔼(Lid|Si=s)C<\mathbb{E}(L_{i}^{d}|S_{i}=s)\leq C<\infty for some d>2d>2.

We note that Assumption 9 holds automatically if the regressor Wi,s(τ)W_{i,s}(\tau) does not depend on τ\tau.

Theorem 5.3.

Suppose Assumptions 1, 2, 79 hold. Then Assumption 6 holds for (θa,s(τ),θ^a,s(τ))=(θa,sLP(τ),θ^a,sLP(τ)),a=0,1,s𝒮,τΥ(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))=(\theta_{a,s}^{\textit{LP}}(\tau),\hat{\theta}_{a,s}^{\textit{LP}}(\tau)),\leavevmode\nobreak\ a=0,1,s\in\mathcal{S},\tau\in\Upsilon.

We refer to the QTE estimator adjusted by this linear probability model with optimal linear coefficients θa,sLP(τ)\theta_{a,s}^{\textit{LP}}(\tau) and estimators θ^a,sLP(τ)\hat{\theta}_{a,s}^{\textit{LP}}(\tau) as the LP estimator and denote it and its bootstrap counterpart as q^LP(τ)\hat{q}^{\textit{LP}}(\tau) and q^LP,w(τ)\hat{q}^{\textit{LP,w}}(\tau), respectively. Theorem 5.3 verifies Assumption 6 for the proposed estimator of the optimal linear coefficient. Then, by Theorem 5.1, Theorems 3 and 5 hold for q^LP(τ)\hat{q}^{\textit{LP}}(\tau) and q^LP,w(τ)\hat{q}^{\textit{LP,w}}(\tau), which implies all the estimation and inference methods established in the paper are valid for the LP estimator. Theorem 5.2 further shows q^LP(τ)\hat{q}^{\textit{LP}}(\tau) is the estimator with the optimal linear adjustment and weakly more efficient than the QTE estimator with no adjustments.

5.1.2 Logistic probability model

It is also common to consider the logistic regression as the adjustment and estimate the model by maximum likelihood (ML). The main goal of the working model is to approximate the true model as closely as possible. It is, therefore, useful to include additional technical regressors such as interactions in the logistic regression. The set of regressors used is defined as Hi=H(Xi)H_{i}=H(X_{i}), which is allowed to contain the intercept. Let θ^a,sML(τ)\hat{\theta}_{a,s}^{\textit{ML}}(\tau) and θa,sML(τ)\theta_{a,s}^{\textit{ML}}(\tau) be the quasi-ML estimator and its corresponding pseudo true value, respectively, i.e.,

θ^a,sML(τ)=argmaxθa1na(s)iIa(s)[1{Yiq^a(τ)}log(λ(Hiθa))+1{Yi>q^a(τ)}log(1λ(Hiθa))],\displaystyle\hat{\theta}_{a,s}^{\textit{ML}}(\tau)=\operatorname*{arg\,max}_{\theta_{a}}\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\left[1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\log(\lambda(H_{i}^{\top}\theta_{a}))+1\{Y_{i}>\hat{q}_{a}(\tau)\}\log(1-\lambda(H_{i}^{\top}\theta_{a}))\right], (5.8)

and

θa,sML(τ)=argmaxθa𝔼[1{Yi(a)qa(τ)}log(λ(Hiθa))+1{Yi(a)>qa(τ)}log(1λ(Hiθa))|Si=s].\displaystyle\theta_{a,s}^{\textit{ML}}(\tau)=\operatorname*{arg\,max}_{\theta_{a}}\mathbb{E}\left[1\{Y_{i}(a)\leq q_{a}(\tau)\}\log(\lambda(H_{i}^{\top}\theta_{a}))+1\{Y_{i}(a)>q_{a}(\tau)\}\log(1-\lambda(H_{i}^{\top}\theta_{a}))|S_{i}=s\right]. (5.9)

We then define

m¯a(τ,s,Xi)=τλ(Hiθa,sML(τ))andm^a(τ,s,Xi)=τλ(Hiθ^a,sML(τ)).\displaystyle\overline{m}_{a}(\tau,s,X_{i})=\tau-\lambda(H_{i}^{\top}\theta_{a,s}^{\textit{ML}}(\tau))\quad\text{and}\quad\widehat{m}_{a}(\tau,s,X_{i})=\tau-\lambda(H_{i}^{\top}\hat{\theta}_{a,s}^{\textit{ML}}(\tau)). (5.10)

In addition to the inclusion of technical regressors, we allow the pseudo true value (θa,sML(τ)\theta_{a,s}^{\textit{ML}}(\tau)) to vary across quantiles τ\tau, giving another layer of flexibility to the model. Such a model is called the distribution regression and was first proposed by Chernozhukov et al. (2013). We emphasize here that, although we aim to make the regression model as flexible as possible, our theory and results do not require the model to be correctly specified.
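For concreteness, the cell-level quasi-ML fit in (5.8) is a logistic regression of Di=1{Yiq^a(τ)}D_{i}=1\{Y_{i}\leq\hat{q}_{a}(\tau)\} on the technical regressors HiH_{i}. The following Newton-Raphson sketch is our own illustration under the assumption of a well-conditioned design; all names are hypothetical:

```python
import numpy as np

def theta_ml_hat(Y, H, q_hat, steps=25):
    """Quasi-ML fit (5.8) within one (a, s) cell: logistic (distribution)
    regression of D_i = 1{Y_i <= qhat_a(tau)} on H_i by Newton-Raphson.
    H should include an intercept column if one is desired."""
    D = (Y <= q_hat).astype(float)
    theta = np.zeros(H.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-H @ theta))          # lambda(H_i' theta)
        grad = H.T @ (D - p)                          # score of the log-likelihood
        hess = -(H * (p * (1.0 - p))[:, None]).T @ H  # negative definite Hessian
        theta = theta - np.linalg.solve(hess, grad)   # Newton ascent step
    return theta
```

When the logistic model happens to be correctly specified, this recovers the true index coefficients; under misspecification it converges to the pseudo true value θa,sML(τ)\theta_{a,s}^{\textit{ML}}(\tau) of (5.9), consistent with the discussion above.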

Assumption 10.

Suppose θa,sML(τ)\theta_{a,s}^{\textit{ML}}(\tau) is the unique maximizer defined in (5.9) for a=0,1a=0,1.

Theorem 5.4.

Suppose Assumptions 1, 2, 8, 10 hold and there exist constants c,Cc,C such that

0<cλmin(𝔼HiHi)λmax(𝔼HiHi)C<,\displaystyle 0<c\leq\lambda_{\min}(\mathbb{E}H_{i}H_{i}^{\top})\leq\lambda_{\max}(\mathbb{E}H_{i}H_{i}^{\top})\leq C<\infty,

then Assumption 6(iii) holds for (θa,s(τ),θ^a,s(τ))=(θa,sML(τ),θ^a,sML(τ)),a=0,1,s𝒮,τΥ(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))=(\theta_{a,s}^{\textit{ML}}(\tau),\hat{\theta}_{a,s}^{\textit{ML}}(\tau)),\leavevmode\nobreak\ a=0,1,s\in\mathcal{S},\tau\in\Upsilon.

Four remarks are in order. First, we refer to the QTE estimator adjusted by the logistic model with QMLE as the ML estimator and denote it and its bootstrap counterpart as q^ML(τ)\hat{q}^{\textit{ML}}(\tau) and q^ML,w(τ)\hat{q}^{\textit{ML,w}}(\tau), respectively. Assumption 6(i) holds automatically for the logistic regression. If we further impose Assumption 6(ii), then Theorem 5.4 implies that all the estimation and bootstrap inference methods established in the paper are valid for the ML estimator. Second, we take into account that θ^a,sML(τ)\hat{\theta}_{a,s}^{\textit{ML}}(\tau) is computed with the true qa(τ)q_{a}(\tau) replaced by its estimator q^a(τ)\hat{q}_{a}(\tau) and derive the results in Theorem 5.4 under Assumption 8. Third, the ML estimator is not guaranteed to be optimal or more efficient than the QTE estimator with no adjustments. On the other hand, as we can include additional technical terms in the regression and allow the regression coefficients to vary across τ\tau, the logistic model can be close to the true model ma(τ,s,Xi)m_{a}(\tau,s,X_{i}), which achieves the global minimum asymptotic variance based on Theorem 3. Fourth, in Section 5.2, we further justify the use of the ML estimator with a flexible logistic model by letting the number of technical terms (or equivalently, the dimension of HiH_{i}) diverge to infinity, showing by this means that the ML estimator can indeed consistently estimate the true model and thereby achieve the global minimum covariance matrix of the adjusted QTE estimator.

5.1.3 Further improved logistic model

Although in simulations we could not find a DGP in which the QTE estimator with logistic adjustment is less efficient than that with no adjustments, theoretically such a scenario still exists. In this section, we follow the idea of Cohen and Fogarty (2020) and construct an estimator that is weakly more efficient than both the ML estimator and the estimator with no adjustments. We denote Wi,s(τ)=(λ(Hiθ1,sML(τ)),λ(Hiθ0,sML(τ)))W_{i,s}(\tau)=(\lambda(H_{i}^{\top}\theta_{1,s}^{\textit{ML}}(\tau)),\lambda(H_{i}^{\top}\theta_{0,s}^{\textit{ML}}(\tau)))^{\top} and treat it as the regressor in a linear adjustment, i.e., we define m¯a(τ,s,Xi)=τWi,s(τ)ta,s(τ)\overline{m}_{a}(\tau,s,X_{i})=\tau-W_{i,s}^{\top}(\tau)t_{a,s}(\tau). Then, the logistic adjustment in Section 5.1.2 and no adjustments correspond to ta,s(τ)=a(1,0)+(1a)(0,1)t_{a,s}(\tau)=a(1,0)^{\top}+(1-a)(0,1)^{\top} and ta,s(τ)=(0,0)t_{a,s}(\tau)=(0,0)^{\top} for a=0,1a=0,1, respectively. However, following Theorem 5.2, the optimal linear coefficient with regressor Wi,s(τ)W_{i,s}(\tau) is

θa,sLPML(τ)=[𝔼(W~i,s(τ)W~i,s(τ)|Si=s)]1𝔼[W~i,s(τ)1{Yi(a)qa(τ)}|Si=s],\displaystyle\theta_{a,s}^{\textit{LPML}}(\tau)=\left[\mathbb{E}(\tilde{W}_{i,s}(\tau)\tilde{W}_{i,s}^{\top}(\tau)|S_{i}=s)\right]^{-1}\mathbb{E}\left[\tilde{W}_{i,s}(\tau)1\{Y_{i}(a)\leq q_{a}(\tau)\}|S_{i}=s\right], (5.11)

where W~i,s(τ)=Wi,s(τ)𝔼(Wi,s(τ)|Si=s)\tilde{W}_{i,s}(\tau)=W_{i,s}(\tau)-\mathbb{E}(W_{i,s}(\tau)|S_{i}=s). Using the adjustment term m¯a(τ,s,Xi)=τWi,s(τ)ta,s(τ)\overline{m}_{a}(\tau,s,X_{i})=\tau-W_{i,s}^{\top}(\tau)t_{a,s}(\tau) with ta,s(τ)=θa,sLPML(τ)t_{a,s}(\tau)=\theta_{a,s}^{\textit{LPML}}(\tau) is asymptotically weakly more efficient than any other choices of ta,s(τ)t_{a,s}(\tau). In practice, we do not observe Wi,s(τ)W_{i,s}(\tau), but can replace it by its feasible version W^i,s(τ)=(λ(Hiθ^1,sML(τ)),λ(Hiθ^0,sML(τ)))\hat{W}_{i,s}(\tau)=(\lambda(H_{i}^{\top}\hat{\theta}_{1,s}^{\textit{ML}}(\tau)),\lambda(H_{i}^{\top}\hat{\theta}_{0,s}^{\textit{ML}}(\tau)))^{\top}. We then define

m¯a(τ,s,Xi)=τWi,s(τ)θa,sLPML(τ),\displaystyle\overline{m}_{a}(\tau,s,X_{i})=\tau-W_{i,s}^{\top}(\tau)\theta_{a,s}^{\textit{LPML}}(\tau), (5.12)
m^a(τ,s,Xi)=τW^i,s(τ)θ^a,sLPML(τ),\displaystyle\widehat{m}_{a}(\tau,s,X_{i})=\tau-\hat{W}_{i,s}^{\top}(\tau)\hat{\theta}_{a,s}^{\textit{LPML}}(\tau), (5.13)
W˘i,a,s(τ)=W^i,s(τ)1na(s)iIa(s)W^i,s(τ),\displaystyle\breve{W}_{i,a,s}(\tau)=\hat{W}_{i,s}(\tau)-\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\hat{W}_{i,s}(\tau), (5.14)

and

θ^a,sLPML(τ)=[1na(s)iIa(s)W˘i,a,s(τ)W˘i,a,s(τ)]1[1na(s)iIa(s)W˘i,a,s(τ)1{Yiq^a(τ)}].\displaystyle\hat{\theta}_{a,s}^{\textit{LPML}}(\tau)=\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{W}_{i,a,s}(\tau)\breve{W}_{i,a,s}^{\top}(\tau)\right]^{-1}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{W}_{i,a,s}(\tau)1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\right]. (5.15)
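Given the fitted logistic predictions, steps (5.14)-(5.15) reduce to one further linear projection. In the Python sketch below (our own illustration; names are hypothetical), `W_hat` is assumed to already stack the cell's fitted values (λ(Hiθ^1,sML(τ)),λ(Hiθ^0,sML(τ)))(\lambda(H_{i}^{\top}\hat{\theta}_{1,s}^{\textit{ML}}(\tau)),\lambda(H_{i}^{\top}\hat{\theta}_{0,s}^{\textit{ML}}(\tau))) from the preceding quasi-ML fits:

```python
import numpy as np

def theta_lpml_hat(Y, W_hat, q_hat):
    """Sample analog (5.15) within one (a, s) cell: project
    1{Y_i <= qhat_a(tau)} on the demeaned fitted logistic predictions.
    The 1/n_a(s) factors in (5.15) cancel and are omitted."""
    W_breve = W_hat - W_hat.mean(axis=0)  # \breve W_{i,a,s}(tau), eq. (5.14)
    D = (Y <= q_hat).astype(float)
    return np.linalg.solve(W_breve.T @ W_breve, W_breve.T @ D)
```

As with the LP estimator, demeaning within the cell makes the result coincide with the slope coefficients from an intercept-included OLS regression of the indicator on `W_hat`.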
Assumption 11.
  1. (i)

    There exist constants c,Cc,C such that

    0<c<\displaystyle 0<c< infa=0,1,s𝒮,τΥλmin(𝔼W~i,s(τ)W~i,s(τ)|Si=s)\displaystyle\inf_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon}\lambda_{\min}(\mathbb{E}\tilde{W}_{i,s}(\tau)\tilde{W}_{i,s}^{\top}(\tau)|S_{i}=s)
    \displaystyle\leq supa=0,1,s𝒮,τΥλmax(𝔼W~i,s(τ)W~i,s(τ)|Si=s)C<.\displaystyle\sup_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon}\lambda_{\max}(\mathbb{E}\tilde{W}_{i,s}(\tau)\tilde{W}_{i,s}^{\top}(\tau)|S_{i}=s)\leq C<\infty.
  2. (ii)

    Suppose

    supτ1,τ2Υ,a=0,1,s𝒮θa,sML(τ1)θa,sML(τ2)2C|τ1τ2|\displaystyle\sup_{\tau_{1},\tau_{2}\in\Upsilon,a=0,1,s\in\mathcal{S}}||\theta_{a,s}^{\textit{ML}}(\tau_{1})-\theta_{a,s}^{\textit{ML}}(\tau_{2})||_{2}\leq C|\tau_{1}-\tau_{2}|
    supτ1,τ2Υ,a=0,1,s𝒮θa,sLPML(τ1)θa,sLPML(τ2)2C|τ1τ2|.\displaystyle\sup_{\tau_{1},\tau_{2}\in\Upsilon,a=0,1,s\in\mathcal{S}}||\theta_{a,s}^{\textit{LPML}}(\tau_{1})-\theta_{a,s}^{\textit{LPML}}(\tau_{2})||_{2}\leq C|\tau_{1}-\tau_{2}|.
Theorem 5.5.

Denote q^LPML(τ)\hat{q}^{\textit{LPML}}(\tau) and q^LPML,w(τ)\hat{q}^{\textit{LPML,w}}(\tau) as the τ\tauth QTE estimator and its multiplier bootstrap counterpart defined in Sections 3 and 4, respectively, with m¯a(τ,s,Xi)\overline{m}_{a}(\tau,s,X_{i}) and m^a(τ,s,Xi)\widehat{m}_{a}(\tau,s,X_{i}) defined in (5.12) and (5.13), respectively. Suppose Assumptions 1, 2, 8, 10, and 11 hold, and there exist constants c,Cc,C such that

0<cλmin(𝔼HiHi)λmax(𝔼HiHi)C<.\displaystyle 0<c\leq\lambda_{\min}(\mathbb{E}H_{i}H_{i}^{\top})\leq\lambda_{\max}(\mathbb{E}H_{i}H_{i}^{\top})\leq C<\infty.

Then, Assumptions 3 and 5 hold, which further implies that Theorems 3 and 5 hold for q^LPML(τ)\hat{q}^{\textit{LPML}}(\tau) and q^LPML,w(τ)\hat{q}^{\textit{LPML,w}}(\tau), respectively. Further denote the asymptotic covariance matrices of (q^J(τ1),,q^J(τK))(\hat{q}^{\textit{J}}(\tau_{1}),\cdots,\hat{q}^{\textit{J}}(\tau_{K})) for any finite set of quantile indices (τ1,,τK)(\tau_{1},\cdots,\tau_{K}) as [ΣJ(τk,τl)]k,l[K][\Sigma^{\textit{J}}(\tau_{k},\tau_{l})]_{k,l\in[K]} for J{LPML,ML,NA}J\in\{\text{LPML,ML,NA}\}, where q^NA(τ)\hat{q}^{\textit{NA}}(\tau) is the τ\tauth QTE estimator without adjustments. Then we have

[ΣLPML(τk,τl)]k,l[K][ΣML(τk,τl)]k,l[K]and[ΣLPML(τk,τl)]k,l[K][ΣNA(τk,τl)]k,l[K]\displaystyle[\Sigma^{\textit{LPML}}(\tau_{k},\tau_{l})]_{k,l\in[K]}\leq[\Sigma^{\textit{ML}}(\tau_{k},\tau_{l})]_{k,l\in[K]}\quad\text{and}\quad[\Sigma^{\textit{LPML}}(\tau_{k},\tau_{l})]_{k,l\in[K]}\leq[\Sigma^{\textit{NA}}(\tau_{k},\tau_{l})]_{k,l\in[K]}

in the matrix sense.

In practice, when nn is small, W˘i,a,s(τ)\breve{W}_{i,a,s}(\tau) may be nearly multicollinear within some stratum, which can lead to size distortion in inference concerning the QTE. We therefore suggest first normalizing W˘i,a,s(τ)\breve{W}_{i,a,s}(\tau) by its standard deviation (denoting the normalized W˘i,a,s(τ)\breve{W}_{i,a,s}(\tau) as W¨i,a,s(τ)\ddot{W}_{i,a,s}(\tau)) and then running a ridge regression

θ~a,sLPML(τ)=[1na(s)iIa(s)W¨i,a,s(τ)W¨i,a,s(τ)+δnI2]1[1na(s)iIa(s)W¨i,a,s(τ)1{Yiq^a(τ)}],\displaystyle\tilde{\theta}_{a,s}^{\textit{LPML}}(\tau)=\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\ddot{W}_{i,a,s}(\tau)\ddot{W}_{i,a,s}^{\top}(\tau)+\delta_{n}I_{2}\right]^{-1}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\ddot{W}_{i,a,s}(\tau)1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\right],

where I2I_{2} is the two-dimensional identity matrix and δn=1/n\delta_{n}=1/n. Then, the final regression adjustment is

m^a(τ,s,Xi)=τW¨i,a,s(τ)θ~a,sLPML(τ).\displaystyle\widehat{m}_{a}(\tau,s,X_{i})=\tau-\ddot{W}_{i,a,s}^{\top}(\tau)\tilde{\theta}_{a,s}^{\textit{LPML}}(\tau).

Given Assumption 11, such a ridge penalty is asymptotically negligible and all the results in Theorem 5.5 still hold.777In unreported simulations, we find that when n800n\geq 800, the ridge regularization is unnecessary and the original adjustment (i.e., (5.13)) has no size distortion, implying that near-multicollinearity is indeed just a finite-sample issue.
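The normalize-then-ridge variant just described can be sketched as follows, with the penalty δ_n = 1/n; as before, array and function names are illustrative:

```python
import numpy as np

def lpml_ridge(W_hat, below_qhat):
    """Ridge-regularized version of (5.15): demean, scale each column
    to unit standard deviation, then shrink with delta_n = 1/n as
    suggested for small samples."""
    n = len(W_hat)
    W_breve = W_hat - W_hat.mean(axis=0)
    W_ddot = W_breve / W_breve.std(axis=0)        # normalize columns
    gram = W_ddot.T @ W_ddot / n + (1.0 / n) * np.eye(W_ddot.shape[1])
    cross = W_ddot.T @ below_qhat / n
    return np.linalg.solve(gram, cross)
```

The small diagonal term keeps the system well conditioned even when the two fitted-probability columns are nearly collinear within a stratum.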

5.2 Nonparametric method

This section considers nonparametric estimation of ma(τ,s,Xi)m_{a}(\tau,s,X_{i}) when the dimension of XiX_{i} is fixed as dxd_{x}. For ease of notation, we assume all coordinates of XiX_{i} are continuously distributed. If in an application some elements of XX are discrete, the dimension dxd_{x} is interpreted as the dimension of the continuous covariates. All results in this section can then be extended in a conceptually straightforward manner by using the continuous covariates only within samples that are homogeneous in discrete covariates.

As ma(τ,s,Xi)m_{a}(\tau,s,X_{i}) is nonparametrically estimated, we have m¯a(τ,s,Xi)=ma(τ,s,Xi)=τ(Yi(a)qa(τ)|Si=s,Xi)\overline{m}_{a}(\tau,s,X_{i})=m_{a}(\tau,s,X_{i})=\tau-\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}). We estimate (Yi(a)qa(τ)|Si=s,Xi)\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}) by the sieve method of fitting a logistic model, as studied by Hirano et al. (2003). Specifically, recall λ()\lambda(\cdot) is the logistic CDF and denote the number of sieve bases by hnh_{n}, which depends on the sample size nn and can grow to infinity as nn\rightarrow\infty. Let Hhn(x)=(b1n(x),,bhnn(x))H_{h_{n}}(x)=(b_{1n}(x),\cdots,b_{h_{n}n}(x))^{\top} where {bhn(x)}h[hn]\{b_{hn}(x)\}_{h\in[h_{n}]} is an hnh_{n} dimensional basis of a linear sieve space. More details on the sieve space are given in Section B of the Online Supplement. Denote

m^a(τ,s,Xi)=τλ(Hhn(Xi)θ^a,sNP(τ))and\displaystyle\widehat{m}_{a}(\tau,s,X_{i})=\tau-\lambda(H_{h_{n}}^{\top}(X_{i})\hat{\theta}_{a,s}^{\textit{NP}}(\tau))\quad\text{and } (5.16)
θ^a,sNP(τ)=argmaxθa1na(s)iIa(s)[1{Yiq^a(τ)}log(λ(Hhn(Xi)θa))\displaystyle\hat{\theta}_{a,s}^{\textit{NP}}(\tau)=\operatorname*{arg\,max}_{\theta_{a}}\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\biggl{[}1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\log(\lambda(H_{h_{n}}^{\top}(X_{i})\theta_{a}))
+1{Yi>q^a(τ)}log(1λ(Hhn(Xi)θa))].\displaystyle\qquad\qquad\qquad\qquad\qquad\qquad+1\{Y_{i}>\hat{q}_{a}(\tau)\}\log(1-\lambda(H_{h_{n}}^{\top}(X_{i})\theta_{a}))\biggr{]}. (5.17)

We refer to the QTE estimator with the nonparametric adjustment as the NP estimator. Note that we use the estimator q^a(τ)\hat{q}_{a}(\tau) of qa(τ)q_{a}(\tau) in (5.17), where q^a(τ)\hat{q}_{a}(\tau) satisfies Assumption 8. All the analysis in this section takes account of the fact that q^a(τ)\hat{q}_{a}(\tau) instead of qa(τ)q_{a}(\tau) is used.
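The sieve logistic fit in (5.17) is an ordinary logistic MLE with the basis H_{h_n}(X_i) as regressors. A minimal numpy sketch via Newton-Raphson, assuming the basis matrix and the indicator 1{Y_i ≤ q̂_a(τ)} are given (a simplified illustration, not the paper's implementation):

```python
import numpy as np

def sieve_logit(H, below_qhat, n_iter=50):
    """Logistic MLE in (5.17): Newton-Raphson for the regression of
    1{Y_i <= qhat_a(tau)} on the sieve basis H (n x h_n)."""
    theta = np.zeros(H.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-H @ theta))       # lambda(H' theta)
        grad = H.T @ (below_qhat - p)              # score
        hess = (H * (p * (1 - p))[:, None]).T @ H  # observed information
        theta = theta + np.linalg.solve(hess, grad)
    return theta

def m_hat(H, theta, tau):
    """Nonparametric adjustment (5.16): tau - lambda(H' theta)."""
    return tau - 1.0 / (1.0 + np.exp(-H @ theta))
```

In a real application the fit is done separately for each arm a, stratum s, and quantile index τ, with q̂_a(τ) estimated first.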

Assumption 12.
  1. (i)

    There exist constants 0<κ1<κ2<0<\kappa_{1}<\kappa_{2}<\infty such that with probability approaching one,

    κ1λmin(1na(s)iIa(s)Hhn(Xi)Hhn(Xi))λmax(1na(s)iIa(s)Hhn(Xi)Hhn(Xi))κ2,\kappa_{1}\leq\lambda_{\min}\left(\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}H_{h_{n}}(X_{i})H_{h_{n}}^{\top}(X_{i})\right)\leq\lambda_{\max}\left(\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}H_{h_{n}}(X_{i})H_{h_{n}}^{\top}(X_{i})\right)\leq\kappa_{2},

    and

    κ1λmin(𝔼(Hhn(Xi)Hhn(Xi)|Si=s))λmax(𝔼(Hhn(Xi)Hhn(Xi)|Si=s))κ2.\kappa_{1}\leq\lambda_{\min}\left(\mathbb{E}(H_{h_{n}}(X_{i})H_{h_{n}}^{\top}(X_{i})|S_{i}=s)\right)\leq\lambda_{\max}\left(\mathbb{E}(H_{h_{n}}(X_{i})H_{h_{n}}^{\top}(X_{i})|S_{i}=s)\right)\leq\kappa_{2}.
  2. (ii)

    For a=0,1a=0,1, there exists an hn×1h_{n}\times 1 vector θa,sNP(τ)\theta_{a,s}^{\textit{NP}}(\tau) such that for Ra(τ,s,x)=(Yi(a)qa(τ)|Si=s,Xi=x)λ(Hhn(x)θa,sNP(τ))R_{a}(\tau,s,x)=\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}=x)-\lambda(H_{h_{n}}^{\top}(x)\theta_{a,s}^{\textit{NP}}(\tau)), we have supa=0,1,s𝒮,τΥ,xSupp(X)|Ra(τ,s,x)|=o(1)\sup_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon,x\in\text{Supp}(X)}|R_{a}(\tau,s,x)|=o(1),

    supa=0,1,τΥ,s𝒮1na(s)iIa(s)Ra2(τ,s,Xi)=Op(hnlog(n)n),\displaystyle\sup_{a=0,1,\tau\in\Upsilon,s\in\mathcal{S}}\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}R_{a}^{2}(\tau,s,X_{i})=O_{p}\left(\frac{h_{n}\log(n)}{n}\right),

    and

    supa=0,1,τΥ,s𝒮𝔼(Ra2(τ,s,Xi)|Si=s)=O(hnlog(n)n).\displaystyle\sup_{a=0,1,\tau\in\Upsilon,s\in\mathcal{S}}\mathbb{E}(R_{a}^{2}(\tau,s,X_{i})|S_{i}=s)=O\left(\frac{h_{n}\log(n)}{n}\right).
  3. (iii)

    For a=0,1a=0,1, there exists a constant c(0,0.5)c\in(0,0.5) such that

    c\displaystyle c\leq infa=0,1,s𝒮,τΥ,xSupp(X)(Yi(a)qa(τ)|Si=s,Xi=x)\displaystyle\inf_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon,x\in\text{Supp}(X)}\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}=x)
    \displaystyle\leq supa=0,1,s𝒮,τΥ,xSupp(X)(Yi(a)qa(τ)|Si=s,Xi=x)1c.\displaystyle\sup_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon,x\in\text{Supp}(X)}\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}=x)\leq 1-c.
  4. (iv)

    Suppose 𝔼(Hhn,h2(Xi)|Si=s)C<\mathbb{E}(H^{2}_{h_{n},h}(X_{i})|S_{i}=s)\leq C<\infty for some constant C>0C>0, supxSupp(X)Hhn(x)2ζ(hn)\sup_{x\in\text{Supp}(X)}||H_{h_{n}}(x)||_{2}\leq\zeta(h_{n}), ζ2(hn)hnlog(n)=o(n)\zeta^{2}(h_{n})h_{n}\log(n)=o(n), and hn2log2(n)=o(n)h_{n}^{2}\log^{2}(n)=o(n), where Hhn,h(Xi)H_{h_{n},h}(X_{i}) denotes the hhth coordinate of Hhn(Xi)H_{h_{n}}(X_{i}).

Four remarks are in order. First, Assumption 12(i) is standard in the sieve literature. Second, Assumption 12(ii) means the approximation error of the sieve logistic model vanishes asymptotically, which holds given sufficient smoothness of (Yi(a)qa(τ)|Si=s,Xi=x)\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}=x) in xx. Third, Assumption 12(iii) usually holds when Supp(X)\text{Supp}(X) is compact. This condition is also assumed by Hirano et al. (2003). Fourth, the quantity ζ(hn)\zeta(h_{n}) in Assumption 12(iv) depends on the choice of basis functions. For example, ζ(hn)=O(hn1/2)\zeta(h_{n})=O(h_{n}^{1/2}) for splines and ζ(hn)=O(hn)\zeta(h_{n})=O(h_{n}) for power series. Taking splines as an example, Assumption 12(iv) requires hn=o(n1/2)h_{n}=o(n^{1/2}).

Theorem 5.6.

Denote q^NP(τ)\hat{q}^{\textit{NP}}(\tau) and q^NP,w(τ)\hat{q}^{\textit{NP,w}}(\tau) as the τ\tauth QTE estimator and its multiplier bootstrap counterpart defined in Sections 3 and 4, respectively, with m¯a(τ,Si,Xi)=ma(τ,Si,Xi)\overline{m}_{a}(\tau,S_{i},X_{i})=m_{a}(\tau,S_{i},X_{i}) and m^a(τ,Si,Xi)\widehat{m}_{a}(\tau,S_{i},X_{i}) defined in (5.16). Further suppose Assumptions 1, 2, 4, 8, and 12 hold. Then, Assumptions 3 and 5 hold, which further implies that Theorems 3 and 5 hold for q^NP(τ)\hat{q}^{\textit{NP}}(\tau) and q^NP,w(τ)\hat{q}^{\textit{NP,w}}(\tau), respectively. In addition, for any finite set of quantile indices (τ1,,τK)(\tau_{1},\cdots,\tau_{K}), the covariance matrix of (q^NP(τ1),,q^NP(τK))(\hat{q}^{\textit{NP}}(\tau_{1}),\cdots,\hat{q}^{\textit{NP}}(\tau_{K})) achieves the minimum (in the matrix sense) as characterized in Theorem 3.

Three remarks are in order. First, as the nonparametric regression consistently estimates the true specifications {ma()}a=0,1\{m_{a}(\cdot)\}_{a=0,1}, the QTE estimator adjusted by the nonparametric regression achieves the global minimum asymptotic variance, and thus is weakly more efficient than QTE estimation with the linear and logistic adjustments studied in the previous section. Second, the practical implementations of the NP and ML methods are the same, given that they share the same set of covariates (basis functions). Therefore, even if we include a small number of basis functions so that hnh_{n} is better treated as fixed, the proposed estimation and inference methods for the regression-adjusted QTE estimator are still valid, although they may not be optimal. Third, in Section A of the Online Supplement, we consider computing m^a(τ,s,x)\widehat{m}_{a}(\tau,s,x) via an 1\ell_{1}-penalized logistic regression when the dimension of the regressors is comparable to or even larger than the sample size. We then provide primitive conditions under which we verify Assumptions 3 and 5.

6 Simulations

6.1 Data generating processes

Two DGPs are used to assess the finite sample performance of the estimation and inference methods introduced in the paper. We consider the outcome equation

Yi=α(Xi)+γZi+μ(Xi)Ai+ηi,\displaystyle Y_{i}=\alpha(X_{i})+\gamma Z_{i}+\mu(X_{i})A_{i}+\eta_{i}, (6.1)

where γ=4\gamma=4 for all cases while α(Xi)\alpha(X_{i}), μ(Xi)\mu(X_{i}), and ηi\eta_{i} are separately specified as follows.

  1. (i)

    Let ZZ be standardized Beta(2,2)(2,2) distributed, Si=j=141{Zigj}S_{i}=\sum_{j=1}^{4}1\{Z_{i}\leq g_{j}\}, and (g1,,g4)=(0.2520,0,0.2520,0.520)(g_{1},\cdots,g_{4})=(-0.25\sqrt{20},0,0.25\sqrt{20},0.5\sqrt{20}). XiX_{i} contains two covariates (X1i,X2i)(X_{1i},X_{2i})^{\top}, where X1iX_{1i} follows a uniform distribution on [2,2][-2,2], X2iX_{2i} follows a standard normal distribution, and X1iX_{1i} and X2iX_{2i} are independent. Further define α(Xi)=1+X2i\alpha(X_{i})=1+X_{2i}, μ(Xi)=1+Xiβ\mu(X_{i})=1+X_{i}^{\top}\beta, β=(3,3)\beta=(3,3)^{\top}, and ηi=(0.25+X1i2)Aiε1i+(1Ai)ε2i\eta_{i}=(0.25+X_{1i}^{2})A_{i}\varepsilon_{1i}+(1-A_{i})\varepsilon_{2i}, where (ε1i,ε2i)(\varepsilon_{1i},\varepsilon_{2i}) are jointly standard normal.

  2. (ii)

    Let ZZ be uniformly distributed on [2,2][-2,2], Si=j=141{Zigj}S_{i}=\sum_{j=1}^{4}1\{Z_{i}\leq g_{j}\}, and (g1,,g4)=(1,0,1,2)(g_{1},\cdots,g_{4})=(-1,0,1,2). Let Xi=(X1i,X2i)X_{i}=(X_{1i},X_{2i})^{\top} be the same as defined in DGP (i). Further define α(Xi)=1+X1i+X2i\alpha(X_{i})=1+X_{1i}+X_{2i}, μ(Xi)=1+X1i+X2i+14(Xiβ)2\mu(X_{i})=1+X_{1i}+X_{2i}+\frac{1}{4}(X_{i}^{\top}\beta)^{2} with β=(2,2)\beta=(2,2)^{\top}, and ηi=2(1+Zi2)Aiε1i+(1+Zi2)(1Ai)ε2i\eta_{i}=2(1+Z_{i}^{2})A_{i}\varepsilon_{1i}+(1+Z_{i}^{2})(1-A_{i})\varepsilon_{2i}, where (ε1i,ε2i)(\varepsilon_{1i},\varepsilon_{2i}) are mutually independently T(5)/5T(5)/\sqrt{5} distributed.
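As an illustration, DGP (i) can be simulated as follows. This numpy sketch assigns treatment by simple random sampling with π(s)=0.5; the CAR schemes listed below would replace that step. The independence of (ε1i, ε2i) in the draw is an implementation choice here:

```python
import numpy as np

def dgp1(n, rng):
    """One sample from DGP (i): outcome equation (6.1) with gamma = 4,
    alpha(X) = 1 + X2, mu(X) = 1 + 3*X1 + 3*X2, and heteroskedastic
    noise. Treatment is assigned by SRS with pi(s) = 0.5."""
    Z = (rng.beta(2, 2, n) - 0.5) * np.sqrt(20)        # standardized Beta(2,2)
    g = np.array([-0.25, 0.0, 0.25, 0.5]) * np.sqrt(20)
    S = (Z[:, None] <= g).sum(axis=1)                  # S_i = sum_j 1{Z_i <= g_j}
    X1 = rng.uniform(-2, 2, n)
    X2 = rng.standard_normal(n)
    A = (rng.random(n) < 0.5).astype(float)            # SRS assignment
    e1, e2 = rng.standard_normal(n), rng.standard_normal(n)
    eta = (0.25 + X1**2) * A * e1 + (1 - A) * e2
    Y = (1 + X2) + 4 * Z + (1 + 3 * X1 + 3 * X2) * A + eta
    return Y, A, S, np.column_stack([X1, X2])
```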

For each DGP, we consider the following four randomization schemes as in Zhang and Zheng (2020) with π(s)=0.5\pi(s)=0.5 for s𝒮s\in\mathcal{S}:

  1. (i)

    SRS: Treatment assignment is generated as in Example 1.

  2. (ii)

    WEI: Treatment assignment is generated as in Example 2 with ϕ(x)=(1x)/2\phi(x)=(1-x)/2.

  3. (iii)

    BCD: Treatment assignment is generated as in Example 3 with λ=0.75\lambda=0.75.

  4. (iv)

    SBR: Treatment assignment is generated as in Example 4.
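Example 3 is defined earlier in the paper; as an illustration only, the sketch below implements the classical Efron (1971) biased-coin rule within each stratum, which is the standard form such a design takes, with λ=0.75:

```python
import numpy as np

def bcd_assign(S, lam=0.75, rng=None):
    """Sequential within-stratum biased-coin assignment (Efron, 1971):
    toss a fair coin when the stratum is balanced so far; otherwise
    assign the under-represented arm with probability lam."""
    if rng is None:
        rng = np.random.default_rng()
    imbalance = {}                     # running D(s) = #treated - #control
    A = np.empty(len(S), dtype=int)
    for i, s in enumerate(S):
        d = imbalance.get(s, 0)
        p = 0.5 if d == 0 else (lam if d < 0 else 1 - lam)
        A[i] = int(rng.random() < p)
        imbalance[s] = d + 2 * A[i] - 1
    return A
```

With λ>0.5 the within-stratum imbalance behaves like a random walk drifting back toward zero, so treated fractions stay close to the target π(s)=0.5.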

We assess the empirical size and power of the tests for n=200n=200 and n=400n=400. We compute the true QTEs and their differences by simulation with a sample size of 10,000 and 1,000 replications. To compute power, we perturb the true values by Δ=1.5\Delta=1.5. We examine three null hypotheses:

  1. (i)

    Pointwise test

H0:q(τ)=truthvs.H1:q(τ)=truth+Δ,τ=0.25,0.5,0.75;H_{0}:q(\tau)=\text{truth}\quad\text{vs.}\quad H_{1}:q(\tau)=\text{truth}+\Delta,\quad\tau=0.25,0.5,0.75;
  2. (ii)

    Test for the difference

H0:q(0.75)q(0.25)=truthvs.H1:q(0.75)q(0.25)=truth+Δ;H_{0}:q(0.75)-q(0.25)=\text{truth}\quad\text{vs.}\quad H_{1}:q(0.75)-q(0.25)=\text{truth}+\Delta;
  3. (iii)

    Uniform test

H0:q(τ)=truth(τ)vs.H1:q(τ)=truth(τ)+Δ,τ[0.25,0.75].H_{0}:q(\tau)=\text{truth}(\tau)\quad\text{vs.}\quad H_{1}:q(\tau)=\text{truth}(\tau)+\Delta,\quad\tau\in[0.25,0.75].

For the pointwise test, we report the results for the median (τ=0.5\tau=0.5) in the main text and give the cases τ=0.25\tau=0.25 and τ=0.75\tau=0.75 in the Online Supplement.

6.2 Estimation methods

We consider the following estimation methods of the auxiliary regression.

  1. (i)

    NA: the estimator with no adjustments, i.e., setting m^a()=m¯a()=0\widehat{m}_{a}(\cdot)=\overline{m}_{a}(\cdot)=0.

  2. (ii)

    LP: the linear probability model with regressors XiX_{i} and the pseudo true value estimated by θ^a,sLP(τ)\hat{\theta}_{a,s}^{\textit{LP}}(\tau) defined in (5.7).

  3. (iii)

    ML: the logistic model with regressor Hi=(1,X1i,X2i)H_{i}=(1,X_{1i},X_{2i})^{\top} and the pseudo true value estimated by θ^a,sML(τ)\hat{\theta}_{a,s}^{\textit{ML}}(\tau) defined in (5.8).

  4. (iv)

    LPML: the logistic model with regressor Hi=(1,X1i,X2i)H_{i}=(1,X_{1i},X_{2i})^{\top} and the pseudo true value estimated by θ^a,sLPML(τ)\hat{\theta}_{a,s}^{\textit{LPML}}(\tau) defined in (5.15).

  5. (v)

    MLX: the logistic model with regressor Hi=(1,X1i,X2i,X1iX2i)H_{i}=(1,X_{1i},X_{2i},X_{1i}X_{2i})^{\top} and the pseudo true value estimated by θ^a,sML(τ)\hat{\theta}_{a,s}^{\textit{ML}}(\tau) defined in (5.8).

  6. (vi)

    LPMLX: the logistic model with regressor Hi=(1,X1i,X2i,X1iX2i)H_{i}=(1,X_{1i},X_{2i},X_{1i}X_{2i})^{\top} and the pseudo true value estimated by θ^a,sLPML(τ)\hat{\theta}_{a,s}^{\textit{LPML}}(\tau) defined in (5.15).

  7. (vii)

    NP: the logistic model with regressor Hhn(Xi)=(1,X1i,X2i,X1iX2i,X1i1{X1i>t1}X2i1{X2i>t2})H_{h_{n}}(X_{i})=(1,X_{1i},X_{2i},X_{1i}X_{2i},X_{1i}1\{X_{1i}>t_{1}\}X_{2i}1\{X_{2i}>t_{2}\})^{\top} where t1t_{1} and t2t_{2} are the sample medians of {X1i}i[n]\{X_{1i}\}_{i\in[n]} and {X2i}i[n]\{X_{2i}\}_{i\in[n]}, respectively. The pseudo true value is estimated by θ^a,sNP(τ)\hat{\theta}_{a,s}^{\textit{NP}}(\tau) defined in (5.17).
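The regressor vectors used by these adjustments can be assembled as follows (a small numpy sketch mirroring the definitions above; function names are illustrative):

```python
import numpy as np

def basis_ml(X):
    """H_i = (1, X1, X2) for the ML/LPML adjustments."""
    return np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1]])

def basis_mlx(X):
    """H_i = (1, X1, X2, X1*X2) for the MLX/LPMLX adjustments."""
    return np.column_stack([basis_ml(X), X[:, 0] * X[:, 1]])

def basis_np(X):
    """NP basis: adds X1*1{X1>t1} * X2*1{X2>t2} with t1, t2 the
    sample medians, as in the simulation design."""
    t1, t2 = np.median(X[:, 0]), np.median(X[:, 1])
    extra = X[:, 0] * (X[:, 0] > t1) * X[:, 1] * (X[:, 1] > t2)
    return np.column_stack([basis_mlx(X), extra])
```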

6.3 Simulation results

Table 1 presents the empirical size and power for the pointwise test with τ=0.5\tau=0.5 under DGPs (i) and (ii). We make six observations. First, none of the auxiliary regressions is correctly specified, but test sizes are all close to the nominal level 5%, confirming that estimation and inference are robust to misspecification. Second, the inclusion of auxiliary regressions improves the efficiency of the QTE estimator, as the powers for method “NA” are the lowest among all the methods for both DGPs and all randomization schemes. This finding is consistent with theory because methods “LP”, “LPML”, “LPMLX”, and “NP” are guaranteed to be weakly more efficient than “NA”. Third, the powers of methods “LPML” and “LPMLX” are higher than those of methods “ML” and “MLX”, respectively. This is consistent with our theory that methods “LPML” and “LPMLX” further improve “ML” and “MLX”, respectively. In addition, methods “MLX” and “LPMLX” fit a flexible distribution regression that can approximate the true DGP well. Therefore, the powers of “MLX” and “LPMLX” are respectively much larger than those of “ML” and “LPML”. For the same reason, we observe that the power of “LPMLX” is close to that of “NP”.888The results in Section C of the Online Supplement show that “LPMLX” has much smaller bias than “NP” and a variance similar to “NP”, which makes “LPMLX” preferable in practice. Fourth, the powers of method “NP” are the best because it estimates the true specification and achieves the minimum asymptotic variance of q^adj(τ)\hat{q}^{adj}(\tau) as shown in Theorem 5.1. Fifth, when the sample size is 200, the method “NP” slightly over-rejects, but its size becomes closer to the nominal level when the sample size increases to 400. Sixth, the improvement in power of the “LPMLX” estimator over “NA” (i.e., with no adjustments) reflects an average 12-15% reduction in the standard error of the QTE estimator.999The biases and standard errors are reported in Section C of the Online Supplement.

Table 1: Pointwise Test (τ=0.5\tau=0.5)
Size Power
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.055 0.054 0.050 0.054 0.051 0.054 0.051 0.051 0.404 0.406 0.403 0.406 0.665 0.676 0.681 0.681
LP 0.052 0.050 0.049 0.052 0.048 0.053 0.051 0.052 0.491 0.497 0.502 0.492 0.779 0.788 0.790 0.791
ML 0.053 0.050 0.049 0.055 0.051 0.050 0.052 0.052 0.472 0.478 0.483 0.473 0.759 0.768 0.775 0.773
LPML 0.054 0.052 0.052 0.057 0.052 0.054 0.051 0.053 0.506 0.509 0.523 0.513 0.802 0.812 0.814 0.809
MLX 0.056 0.059 0.055 0.057 0.055 0.054 0.055 0.058 0.475 0.479 0.486 0.482 0.752 0.759 0.760 0.760
LPMLX 0.060 0.058 0.059 0.058 0.054 0.055 0.054 0.054 0.506 0.513 0.521 0.512 0.802 0.810 0.813 0.811
NP 0.063 0.059 0.062 0.064 0.055 0.054 0.054 0.056 0.523 0.523 0.531 0.526 0.804 0.811 0.814 0.809
Panel B: DGP (ii)
NA 0.046 0.051 0.045 0.047 0.047 0.045 0.048 0.047 0.479 0.489 0.500 0.490 0.773 0.775 0.774 0.782
LP 0.049 0.051 0.050 0.050 0.045 0.048 0.050 0.045 0.572 0.581 0.589 0.579 0.851 0.856 0.857 0.854
ML 0.051 0.058 0.050 0.054 0.049 0.046 0.050 0.048 0.524 0.534 0.541 0.539 0.812 0.810 0.807 0.807
LPML 0.051 0.058 0.054 0.053 0.050 0.049 0.053 0.047 0.574 0.581 0.588 0.580 0.862 0.863 0.863 0.863
MLX 0.058 0.059 0.056 0.059 0.051 0.049 0.051 0.050 0.566 0.574 0.583 0.573 0.826 0.824 0.827 0.827
LPMLX 0.057 0.062 0.057 0.060 0.052 0.050 0.053 0.052 0.615 0.620 0.630 0.627 0.878 0.878 0.880 0.879
NP 0.063 0.066 0.062 0.062 0.056 0.055 0.056 0.051 0.622 0.625 0.632 0.628 0.883 0.880 0.882 0.879

Tables 2 and 3 present sizes and powers of inference on q(0.75)q(0.25)q(0.75)-q(0.25) and on q(τ)q(\tau) uniformly over τ[0.25,0.75]\tau\in[0.25,0.75], respectively, for DGPs (i) and (ii) and the four randomization schemes. All the observations made above apply to these results. The improvement in power of the “LPMLX” estimator over “NA” (i.e., with no adjustments) reflects an average 9% reduction in the standard error of the difference of the QTE estimators. In Section C of the Online Supplement, we provide additional simulation results such as the empirical sizes and powers for the pointwise test with τ=0.25\tau=0.25 and 0.750.75, the bootstrap inference with the true target fraction, and the adjusted QTE estimator when the DGP contains high-dimensional covariates and the adjustments are computed via logistic Lasso. We also report the biases and standard errors of the adjusted QTE estimators.

Table 2: Test for Differences (τ1=0.25\tau_{1}=0.25, τ2=0.75\tau_{2}=0.75)
Size Power
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.043 0.045 0.040 0.041 0.044 0.043 0.041 0.043 0.214 0.216 0.209 0.203 0.387 0.389 0.383 0.365
LP 0.045 0.048 0.043 0.045 0.045 0.047 0.043 0.045 0.246 0.242 0.234 0.248 0.424 0.422 0.422 0.421
ML 0.045 0.045 0.043 0.042 0.046 0.047 0.040 0.048 0.234 0.233 0.231 0.239 0.415 0.422 0.417 0.426
LPML 0.044 0.049 0.045 0.045 0.049 0.049 0.044 0.047 0.250 0.250 0.248 0.259 0.451 0.453 0.450 0.459
MLX 0.046 0.052 0.046 0.047 0.047 0.047 0.044 0.049 0.232 0.234 0.229 0.241 0.415 0.415 0.404 0.416
LPMLX 0.049 0.055 0.047 0.047 0.049 0.050 0.047 0.047 0.247 0.249 0.249 0.258 0.445 0.453 0.445 0.453
NP 0.050 0.054 0.050 0.051 0.052 0.052 0.047 0.048 0.246 0.248 0.245 0.257 0.444 0.444 0.442 0.450
Panel B: DGP (ii)
NA 0.039 0.044 0.040 0.038 0.044 0.041 0.039 0.047 0.211 0.225 0.217 0.194 0.399 0.396 0.392 0.383
LP 0.043 0.048 0.045 0.040 0.045 0.044 0.042 0.047 0.244 0.255 0.251 0.245 0.447 0.440 0.441 0.455
ML 0.049 0.046 0.046 0.043 0.044 0.045 0.042 0.048 0.217 0.228 0.213 0.212 0.379 0.386 0.386 0.396
LPML 0.047 0.051 0.048 0.043 0.047 0.045 0.047 0.048 0.253 0.258 0.253 0.252 0.456 0.451 0.454 0.468
MLX 0.047 0.051 0.047 0.047 0.046 0.046 0.045 0.049 0.226 0.240 0.228 0.223 0.394 0.392 0.391 0.399
LPMLX 0.053 0.056 0.051 0.048 0.051 0.049 0.045 0.050 0.261 0.272 0.265 0.263 0.467 0.460 0.460 0.477
NP 0.056 0.058 0.053 0.052 0.051 0.052 0.045 0.050 0.266 0.275 0.266 0.270 0.469 0.459 0.461 0.479
Table 3: Uniform Test (τ[0.25,0.75]\tau\in[0.25,0.75])
Size Power
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.048 0.044 0.044 0.045 0.047 0.049 0.045 0.048 0.450 0.451 0.455 0.454 0.765 0.770 0.769 0.770
LP 0.045 0.044 0.043 0.045 0.047 0.051 0.047 0.046 0.589 0.588 0.589 0.581 0.902 0.901 0.904 0.900
ML 0.047 0.044 0.043 0.045 0.044 0.051 0.045 0.047 0.570 0.577 0.582 0.568 0.887 0.889 0.893 0.890
LPML 0.046 0.046 0.045 0.047 0.046 0.050 0.046 0.051 0.603 0.605 0.616 0.607 0.916 0.917 0.915 0.915
MLX 0.052 0.049 0.048 0.048 0.046 0.053 0.050 0.050 0.582 0.582 0.595 0.576 0.889 0.893 0.891 0.889
LPMLX 0.053 0.047 0.049 0.052 0.047 0.053 0.050 0.050 0.612 0.614 0.619 0.610 0.915 0.919 0.919 0.913
NP 0.056 0.055 0.054 0.055 0.050 0.057 0.052 0.054 0.633 0.627 0.633 0.629 0.916 0.919 0.918 0.915
Panel B: DGP (ii)
NA 0.038 0.039 0.039 0.038 0.045 0.039 0.040 0.045 0.572 0.571 0.579 0.574 0.878 0.882 0.879 0.879
LP 0.041 0.044 0.045 0.041 0.044 0.043 0.039 0.042 0.704 0.708 0.710 0.700 0.953 0.955 0.956 0.955
ML 0.044 0.043 0.048 0.041 0.047 0.045 0.043 0.044 0.661 0.660 0.664 0.655 0.931 0.931 0.933 0.935
LPML 0.047 0.046 0.048 0.044 0.047 0.046 0.041 0.046 0.723 0.714 0.720 0.714 0.964 0.963 0.965 0.964
MLX 0.052 0.050 0.052 0.049 0.048 0.046 0.045 0.045 0.703 0.710 0.708 0.704 0.946 0.949 0.946 0.951
LPMLX 0.056 0.054 0.054 0.051 0.052 0.048 0.046 0.048 0.761 0.761 0.766 0.754 0.972 0.972 0.972 0.974
NP 0.060 0.060 0.062 0.058 0.055 0.052 0.047 0.051 0.770 0.771 0.773 0.765 0.973 0.974 0.972 0.974

6.4 Practical recommendations

When XX is finite-dimensional, we suggest using the LPMLX adjustment in which the logistic model includes interaction terms and the regression coefficients are allowed to depend on (τ,a,s)(\tau,a,s). When XX is high-dimensional, we suggest using the logistic Lasso to estimate the regression adjustment.101010The relevant theory and simulation results on high-dimensional covariates are provided in Section A of the Online Supplement.

7 Empirical Application

Undersaving has been found to have important individual and social welfare consequences (Karlan et al., 2014). Does expanding access to bank accounts for the poor lead to an overall increase in savings? To answer the question, Dupas et al. (2018) conducted a covariate-adaptive randomized experiment in Uganda, Malawi, and Chile to study the impact of a bank account subsidy on savings. In their paper, the authors examined the ATEs as well as the QTEs of the subsidy. This section reports an application of our methods to the same dataset to examine the QTEs of the subsidy on household total savings in Uganda.

The sample consists of 2160 households in Uganda.111111We filter out observations with missing values. Our final sample contains 1952 households. Within each of 41 strata defined by gender, occupation, and bank branch, 50 percent of the households in the sample were randomly assigned to receive the bank account subsidy, and the rest were assigned to the control group. This is a stratified block randomization design with 41 strata, which satisfies Assumption 1 in Section 2. The target fraction of treated units is 1/2. Statements (i), (ii), and (iii) in Assumption 1 clearly hold. Because maxs𝒮|Dn(s)n(s)|0.056\max_{s\in\mathcal{S}}|\frac{D_{n}(s)}{n(s)}|\approx 0.056, it is reasonable to claim that Assumption 1(iv) is also satisfied in our analysis.
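The imbalance diagnostic reported here can be computed directly from the assignment and stratum vectors. A one-function sketch, assuming the usual definition in this literature, D_n(s) = Σ_{i: S_i=s}(A_i − π(s)):

```python
import numpy as np

def max_relative_imbalance(A, S, pi=0.5):
    """max over strata of |D_n(s)| / n(s), taking D_n(s) to be the sum
    over {i : S_i = s} of (A_i - pi) (an assumed definition)."""
    return max(abs((A[S == s] - pi).mean()) for s in np.unique(S))
```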

After the randomization and the intervention, the authors conducted three rounds of follow-up surveys in Uganda (see Dupas et al. (2018) for a detailed description). In this section, we focus on the first-round follow-up survey to examine the impact of the bank account subsidy on total savings.

Tables 4 and 5 present the QTE estimates and their standard errors (in parentheses) estimated by different methods at quantile indices 0.25, 0.5, and 0.75. The description of these estimators is similar to that in Section 6.121212Specifically, we have: (i) NA: the estimator with no adjustments. (ii) LP: the linear probability model. When there is only one auxiliary regressor, Hi=(1,X1i)H_{i}=(1,X_{1i})^{\top}, and when there are four auxiliary regressors, Hi=(1,X1i,X2i,X3i,X4i)H_{i}=(1,X_{1i},X_{2i},X_{3i},X_{4i})^{\top}, where X1i,X2i,X3i,X4iX_{1i},X_{2i},X_{3i},X_{4i} represent the four covariates used in the regression adjustment. (iii) ML: the logistic probability model with regressor HiH_{i}, where HiH_{i} is the same as that in the LP model. (iv) LPML: the further improved logistic probability model with regressor HiH_{i}, where HiH_{i} is the same as that in the LP model. (v) MLX: the logistic probability model with interaction terms. MLX is only applied to the case with four auxiliary regressors, with Hi=(1,X1i,X2i,X3i,X4i,X1iX2i,X2iX3i)H_{i}=(1,X_{1i},X_{2i},X_{3i},X_{4i},X_{1i}X_{2i},X_{2i}X_{3i})^{\top}. (vi) LPMLX: the further improved logistic probability model with interaction terms. LPMLX is only applied to the case with four auxiliary regressors, with the same HiH_{i} as that used in the MLX model. (vii) NP: the nonparametric logistic probability model with regressor HhnH_{h_{n}}. NP is only applied to the case with four auxiliary regressors, with Hhn=(1,X1i,X2i,X3i,X4i,X1iX2i,X2iX3i,X1i1{X1i>t1},X2i1{X2i>t2},X1i1{X1i>t1}X2i1{X2i>t2})H_{h_{n}}=(1,X_{1i},X_{2i},X_{3i},X_{4i},X_{1i}X_{2i},X_{2i}X_{3i},X_{1i}1\{X_{1i}>t_{1}\},X_{2i}1\{X_{2i}>t_{2}\},X_{1i}1\{X_{1i}>t_{1}\}X_{2i}1\{X_{2i}>t_{2}\})^{\top} where t1t_{1} and t2t_{2} are the sample medians of {X1i}i[n]\{X_{1i}\}_{i\in[n]} and {X2i}i[n]\{X_{2i}\}_{i\in[n]}, respectively.
(viii) Lasso: the logistic probability model with regressor HpnH_{p_{n}} and post-Lasso coefficient estimator θ^a,spost(τ)\hat{\theta}_{a,s}^{post}(\tau). Lasso is only applied to the case with four auxiliary regressors, with Hpn(Xi)=(1,X1i,X2i,X3i,X4i,X1i2,X2i2,X3i2,X1iX2i,X2iX3i,X1i1{X1i>t1},X2i1{X2i>t2},X1i1{X1i>t1}X2i1{X2i>t2})H_{p_{n}}(X_{i})=(1,X_{1i},X_{2i},X_{3i},X_{4i},X_{1i}^{2},X_{2i}^{2},X_{3i}^{2},X_{1i}X_{2i},X_{2i}X_{3i},X_{1i}1\{X_{1i}>t_{1}\},X_{2i}1\{X_{2i}>t_{2}\},X_{1i}1\{X_{1i}>t_{1}\}X_{2i}1\{X_{2i}>t_{2}\})^{\top}. The post-Lasso estimator θ^a,spost(τ)\hat{\theta}_{a,s}^{post}(\tau) is defined in (A.2). The choice of tuning parameter and the estimation procedure are detailed in Section B.3. In the analysis, we focus on two sets of additional baseline variables: the baseline value of total savings only (one auxiliary regressor), and the baseline value of total savings, household size, age, and a married female dummy (four auxiliary regressors). The first set of regressors follows Dupas et al. (2018). The second is used to illustrate all the methods discussed in the paper. Tables 4 and 5 report the results with one and four auxiliary regressors, respectively.

Table 4: QTEs on Total Savings (one auxiliary regressor)
NA LP ML LPML
25% 1.105 1.105 1.105 1.105
(0.564) (0.564) (0.470) (0.470)
50% 3.682 3.682 3.682 3.682
(1.010) (1.080) (1.146) (1.033)
75% 7.363 9.204 9.204 9.204
(3.757) (4.227) (3.616) (3.757)

Notes: The table presents the QTE estimates of the effect of the bank account subsidy on household total savings at quantiles 25%, 50%, and 75% when only one auxiliary regressor (baseline value of total savings) is used in the regression adjustment models. Standard errors are in parentheses.

Table 5: QTEs on Total Savings (four auxiliary regressors)
NA LP ML LPML MLX LPMLX NP Lasso
25% 1.105 1.473 1.105 1.105 1.105 1.105 1.105 1.105
(0.564) (0.564) (0.564) (0.564) (0.357) (0.319) (0.188) (0.564)
50% 3.682 3.682 3.682 3.682 3.682 3.682 3.682 3.682
(1.010) (1.033) (0.939) (0.939) (0.958) (1.033) (0.939) (0.939)
75% 7.363 8.100 7.363 7.363 7.363 7.363 7.363 7.363
(3.757) (3.757) (3.757) (3.569) (3.757) (3.663) (3.663) (3.757)

Notes: The table shows QTE estimates of the effect of the bank account subsidy on household total savings at quantiles 25%, 50%, and 75% when four auxiliary regressors (baseline value of total savings, household size, age, and married female dummy) are used in the regression adjustment models. Standard errors are in parentheses.

Table 6: Test for the Difference between Two QTEs on Total Savings
NA LP ML LPML MLX LPMLX NP Lasso
50%25%50\%-25\% 2.577 2.209 2.577 2.577 2.577 2.577 2.577 2.577
(0.939) (1.104) (0.939) (0.939) (0.958) (1.033) (0.845) (0.911)
75%50%75\%-50\% 3.682 4.418 3.682 3.682 3.682 3.682 3.682 3.682
(3.757) (3.663) (3.663) (3.287) (3.475) (3.287) (3.663) (3.757)
75%25%75\%-25\% 6.259 6.627 6.259 6.259 6.259 6.259 6.259 6.259
(3.851) (3.757) (3.757) (3.695) (3.588) (3.569) (3.287) (3.832)

Notes: The table presents tests for the difference between two QTE estimates of the effect of the bank account subsidy on household total savings when there are four auxiliary regressors: baseline value of total savings, household size, age, and married female dummy. Standard errors are in parentheses.

Figure 1: Quantile Treatment Effects on the Distribution of Total Savings

Notes: The graphs in each panel of the figure plot the QTE estimates of the effect of the bank account subsidy on the distribution of household total savings when there are four auxiliary regressors: baseline value of total savings, household size, age, and married female dummy. The shadowed areas display 95% confidence regions.

The results in Tables 4-5 prompt two observations. First, consistent with the theoretical and simulation results, the standard errors for the regression-adjusted QTEs are mostly lower than those for the QTE estimate without adjustment. This observation holds for most specifications and estimation methods of the auxiliary regression.131313The efficiency gain from the “NP” adjustment is not the only reason for its small standard error at the 25% QTE. Another reason is that the treated outcomes around this percentile themselves exhibit little variation. For example, in Table 4, the standard error for the “LPML” QTE estimate at the 25th percentile is 16.7% smaller than that for the QTE estimate without adjustment. Similarly, in Table 5, at the 25th percentile, the standard error for the “LPMLX” QTE estimate is 43.4% smaller than that for the QTE estimate without adjustment, and at the median, the standard error for the “LPML” QTE estimate is 7% smaller than that for the QTE estimate without adjustment.

Second, there is substantial heterogeneity in the impact of the subsidy on total savings. In particular, we observe larger effects as the quantile indexes increase, which is consistent with the findings in Dupas et al. (2018). For example, Table 5 shows that, although the treatment effects are all positive and significantly different from zero at quantiles 25%, 50%, and 75%, the magnitude of the effects increases by over 200% from the 25th percentile to the median and by around 100% from the median to the 75th percentile.

The second observation suggests that the heterogeneous effects of the subsidy on savings are economically sizable. To evaluate whether these effects are statistically significant, we report tests for the heterogeneity of the QTEs in Table 6. Specifically, we test the null hypotheses H_{0}:q(0.5)-q(0.25)=0, H_{0}:q(0.75)-q(0.5)=0, and H_{0}:q(0.75)-q(0.25)=0. Table 6 shows that only the difference between the 50% and 25% QTEs is statistically significant at the 5% significance level.

How does the impact of the subsidy vary across the distribution of total savings? The QTEs on the distribution of savings are plotted in Figure 1, where the shaded areas represent the 95% confidence region. The figure shows that the QTEs are insignificantly different from zero below roughly the 20th percentile. From there up to about the 80th percentile, treatment-group savings exceed control-group savings at an accelerating rate, yielding increasingly significant positive QTEs. Beyond the 80th percentile, the QTEs again become insignificantly different from zero. These findings point to notable distributional heterogeneity in the impact of the subsidy on savings.

8 Conclusion

This paper proposes the use of auxiliary regressions to incorporate additional covariates into estimation and inference relating to unconditional QTEs under CARs. The auxiliary regression model may be estimated parametrically, nonparametrically, or via regularization when the covariates are high-dimensional. Both the estimation and bootstrap inference methods are robust to potential misspecification of the auxiliary model and do not suffer from conservatism due to the CAR. It is shown that efficiency can be improved by including extra covariates, and when the auxiliary regression is correctly specified, the regression-adjusted estimator further achieves the minimum asymptotic variance. In both the simulations and the empirical application, the proposed regression-adjusted QTE estimator performs well. These results and the robustness of the methods to auxiliary model misspecification reflect the aphorism widespread in scientific modeling that all models may be wrong, but some are useful.14 (The aphorism “all models are wrong, but some are useful” is often attributed to the statistician George Box (1976), but the notion has many antecedents, including a particularly apposite remark made in 1947 by John von Neumann (2019), in an essay on the empirical origins of mathematical ideas, to the effect that “truth … is much too complicated to allow anything but approximations”.)

Acknowledgements

We thank the Managing Editor, Elie Tamer, the Associate Editor and three anonymous referees for many useful comments that helped to improve this paper. We are also grateful to Michael Qingliang Fan and seminar participants from the 2022 Econometric Society Australasian Meeting, the 2021 Nanyang Econometrics Workshop, University of California, Irvine, and University of Sydney for their comments.

Funding:

Yichong Zhang acknowledges financial support from the Singapore Ministry of Education under Tier 2 grant No. MOE2018-T2-2-169, the NSFC under grant No. 72133002, and a Lee Kong Chian fellowship. Peter C. B. Phillips acknowledges support from NSF Grant No. SES 18-50860, a Kelly Fellowship at the University of Auckland, and a Lee Kong Chian Fellowship. Yubo Tao acknowledges financial support from the Start-up Research Grant of the University of Macau (SRG2022-00016-FSS). Liang Jiang acknowledges support from the MOE (Ministry of Education in China) Project of Humanities and Social Sciences (Project No. 18YJC790063).

Appendix A Regularization Method for Regression Adjustments

This section considers estimation of m_{a}(\tau,s,X) in a high-dimensional environment. Let H_{p_{n}}(X_{i}) be the regressors with dimension p_{n}, which may exceed the sample size. When the number of raw controls is comparable to or exceeds the sample size, we can simply let H_{p_{n}}(X_{i})=X_{i}. Alternatively, H_{p_{n}}(X_{i}) may be composed of a large dictionary of sieve bases derived from a fixed-dimensional vector X_{i} through suitable transformations such as powers and interactions, so high dimensionality in H_{p_{n}}(X_{i}) can arise from the desire to flexibly approximate nuisance functions. In our approach we follow Belloni et al. (2017) and implement a logistic regression with \ell_{1}-penalization. In their notation, we view m_{a}(\tau,s,x) as a function of q_{a}(\tau), i.e., m_{a}(\tau,s,x)=\tau-\mathcal{M}_{a}(q_{a}(\tau),s,x), where \mathcal{M}_{a}(q,s,x)=\mathbb{P}(Y_{i}(a)\leq q|S_{i}=s,X_{i}=x). We estimate \mathcal{M}_{a}(q_{a}(\tau),s,X_{i}) by \lambda(H_{p_{n}}(X_{i})^{\top}\hat{\theta}_{a,s}^{\textit{HD}}(\hat{q}_{a}(\tau))), where \hat{q}_{a}(\tau) is defined in Assumption 8,

\hat{\theta}_{a,s}^{\textit{HD}}(q)=\operatorname*{arg\,min}_{\theta_{a}}\frac{-1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\biggl[1\{Y_{i}\leq q\}\log(\lambda(H_{p_{n}}(X_{i})^{\top}\theta_{a}))
+1\{Y_{i}>q\}\log(1-\lambda(H_{p_{n}}(X_{i})^{\top}\theta_{a}))\biggr]+\frac{\varrho_{n,a}(s)}{n_{a}(s)}||\hat{\Omega}\theta_{a}||_{1}, (A.1)

\varrho_{n,a}(s) is a tuning parameter, and \hat{\Omega}=\text{diag}(\hat{\omega}_{1},\cdots,\hat{\omega}_{p_{n}}) is a diagonal matrix of data-dependent penalty loadings. We specify \varrho_{n,a}(s) and \hat{\Omega} in Section B. Post-Lasso estimation is also considered. Let \hat{\mathbb{S}}_{a,s}(q)=\{h\in[p_{n}]:\hat{\theta}_{a,s,h}^{\textit{HD}}(q)\neq 0\} be the support of \hat{\theta}_{a,s}^{\textit{HD}}(q), where \hat{\theta}_{a,s,h}^{\textit{HD}}(q) is the hth coordinate of \hat{\theta}_{a,s}^{\textit{HD}}(q). We can complement \hat{\mathbb{S}}_{a,s}(q) with additional variables in \hat{\mathbb{S}}^{+}_{a,s}(q) that researchers want to control for and define the enlarged set of variables as \tilde{\mathbb{S}}_{a,s}(q)=\hat{\mathbb{S}}_{a,s}(q)\cup\hat{\mathbb{S}}^{+}_{a,s}(q). We compute the post-Lasso estimator \hat{\theta}_{a,s}^{post}(q) as

\hat{\theta}_{a,s}^{post}(q)=\operatorname*{arg\,min}_{\theta_{a}\in\tilde{\mathbb{S}}_{a,s}(q)}\frac{-1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\biggl[1\{Y_{i}\leq q\}\log(\lambda(H_{p_{n}}(X_{i})^{\top}\theta_{a}))
+1\{Y_{i}>q\}\log(1-\lambda(H_{p_{n}}(X_{i})^{\top}\theta_{a}))\biggr]. (A.2)

Finally, we compute the auxiliary model as

\widehat{m}_{a}(\tau,s,X_{i})=\lambda(H_{p_{n}}^{\top}(X_{i})\hat{\theta}_{a,s}^{\textit{HD}}(\hat{q}_{a}(\tau)))\quad\text{or}\quad\widehat{m}_{a}(\tau,s,X_{i})=\lambda(H_{p_{n}}^{\top}(X_{i})\hat{\theta}_{a,s}^{post}(\hat{q}_{a}(\tau))). (A.3)

We refer to the QTE estimator with the regularized adjustment as the HD estimator. Note that we use the estimator \hat{q}_{a}(\tau) of q_{a}(\tau) in (A.3), where \hat{q}_{a}(\tau) satisfies Assumption 8. All the analysis in this section takes account of the fact that \hat{q}_{a}(\tau), rather than q_{a}(\tau), is used.
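To make the post-Lasso step (A.2)-(A.3) concrete, here is a minimal numpy sketch, not the paper's code: an unpenalized logistic regression is refit on a hypothetical selected support and the fitted probabilities are plugged into the adjustment. The Newton solver, the toy data, and the chosen support set are all illustrative assumptions.

```python
import numpy as np

def lam(z):
    # logistic link lambda(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def logistic_mle(H, D, n_iter=25):
    # Unpenalized logistic fit by Newton-Raphson, standing in for the post-Lasso refit.
    theta = np.zeros(H.shape[1])
    for _ in range(n_iter):
        p = lam(H @ theta)
        grad = H.T @ (D - p)
        hess = H.T @ (H * (p * (1 - p))[:, None]) + 1e-8 * np.eye(H.shape[1])
        theta = theta + np.linalg.solve(hess, grad)
    return theta

rng = np.random.default_rng(7)
n, p_dim = 500, 40
H = rng.normal(size=(n, p_dim))                     # H_{p_n}(X_i), one row per unit
D = (rng.uniform(size=n) < lam(H[:, 0] - 0.5 * H[:, 2])).astype(float)  # 1{Y_i <= q}

support = np.array([0, 2, 5])                       # stand-in for the enlarged support set
theta_post = np.zeros(p_dim)
theta_post[support] = logistic_mle(H[:, support], D)

tau = 0.5
m_hat = tau - lam(H @ theta_post)                   # regularized adjustment, as in (A.3)
```

When the support is correctly selected, the unpenalized refit removes the shrinkage bias that the Lasso penalty induces in the retained coefficients.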

Assumption 13.
  (i)

    Let \mathcal{Q}^{\varepsilon}_{a}=\{q:\inf_{\tau\in\Upsilon}|q-q_{a}(\tau)|\leq\varepsilon\} for a=0,1. Suppose \mathbb{P}(Y_{i}(a)\leq q|S_{i}=s,X_{i})=\lambda(H_{p_{n}}(X_{i})^{\top}\theta^{\textit{HD}}_{a,s}(q))+r_{a}(q,s,X_{i}) such that \sup_{a=0,1,q\in\mathcal{Q}^{\varepsilon}_{a},s\in\mathcal{S}}||\theta^{\textit{HD}}_{a,s}(q)||_{0}\leq h_{n}.

  (ii)

    Suppose \sup_{i\in[n]}||H_{p_{n}}(X_{i})||_{\infty}\leq\zeta_{n} and \sup_{h\in[p_{n}]}\mathbb{E}(|H_{p_{n},h}(X_{i})|^{d}\,|\,S_{i}=s)<\infty for some d>2.

  (iii)

    Suppose

    \sup_{a=0,1,q\in\mathcal{Q}^{\varepsilon}_{a},s\in\mathcal{S}}\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}r^{2}_{a}(q,s,X_{i})=O_{p}(h_{n}\log(p_{n})/n),
    \sup_{a=0,1,q\in\mathcal{Q}^{\varepsilon}_{a},s\in\mathcal{S}}\mathbb{E}(r^{2}_{a}(q,s,X_{i})|S_{i}=s)=O(h_{n}\log(p_{n})/n),

    and

    \sup_{a=0,1,q\in\mathcal{Q}^{\varepsilon}_{a},s\in\mathcal{S},x\in\mathcal{X}}|r_{a}(q,s,x)|=O(\sqrt{\zeta_{n}^{2}h_{n}^{2}\log(p_{n})/n}).
  (iv)

    \frac{\log(p_{n})\zeta_{n}^{2}h_{n}^{2}}{n}\rightarrow 0, \frac{\log^{2}(p_{n})\log^{2}(n)h_{n}^{2}}{n}\rightarrow 0, and \sup_{a=0,1,q\in\mathcal{Q}_{a}^{\varepsilon},s\in\mathcal{S}}|\hat{\mathbb{S}}^{+}_{a,s}(q)|=O_{p}(h_{n}), where |\hat{\mathbb{S}}^{+}_{a,s}(q)| denotes the number of elements in \hat{\mathbb{S}}^{+}_{a,s}(q).

  (v)

    There exists a constant c(0,0.5)c\in(0,0.5) such that

    c\leq\inf_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon,x\in\text{Supp}(X)}\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}=x)
    \leq\sup_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon,x\in\text{Supp}(X)}\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}=x)\leq 1-c.
  (vi)

    Let \ell_{n} be a sequence that diverges to infinity. Then there exist two constants \kappa_{1} and \kappa_{2} such that, with probability approaching one,

    0<\kappa_{1}\leq\inf_{a=0,1,s\in\mathcal{S},||v||_{0}\leq h_{n}\ell_{n}}\frac{v^{\top}\left(\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}H_{p_{n}}(X_{i})H_{p_{n}}(X_{i})^{\top}\right)v}{||v||_{2}^{2}}
    \leq\sup_{a=0,1,s\in\mathcal{S},||v||_{0}\leq h_{n}\ell_{n}}\frac{v^{\top}\left(\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}H_{p_{n}}(X_{i})H_{p_{n}}(X_{i})^{\top}\right)v}{||v||_{2}^{2}}\leq\kappa_{2}<\infty,

    and

    0<\kappa_{1}\leq\inf_{a=0,1,s\in\mathcal{S},||v||_{0}\leq h_{n}\ell_{n}}\frac{v^{\top}\mathbb{E}(H_{p_{n}}(X_{i})H_{p_{n}}(X_{i})^{\top}|S_{i}=s)v}{||v||_{2}^{2}}
    \leq\sup_{a=0,1,s\in\mathcal{S},||v||_{0}\leq h_{n}\ell_{n}}\frac{v^{\top}\mathbb{E}(H_{p_{n}}(X_{i})H_{p_{n}}(X_{i})^{\top}|S_{i}=s)v}{||v||_{2}^{2}}\leq\kappa_{2}<\infty,

    where ||v||_{0} denotes the number of nonzero components of v.

  (vii)

    For a=0,1, let \varrho_{n,a}(s)=c\sqrt{n_{a}(s)}\Phi^{-1}\left(1-\frac{0.1}{4\log(n_{a}(s))p_{n}}\right), where \Phi(\cdot) is the standard normal CDF and c>0 is a constant.

Assumption 13 is standard in the literature, and we refer interested readers to Belloni et al. (2017) for further discussion. Assumption 13(i) implies that the logistic model is approximately correctly specified. As the approximation is assumed to be sparse, the condition is not innocuous in the high-dimensional setting. Because our method is valid even when the auxiliary model is misspecified, we conjecture that Assumption 13(i) can be relaxed; this connects to the recent literature on regularized estimation in high-dimensional settings under misspecification, for example, Bradic et al. (2019) and Tan (2020) and the references therein. An interesting topic for future work is to study misspecification-robust high-dimensional estimators of the conditional probability model and their use in adjusting the QTE estimator under CARs based on (3.1) and (3.2). The following theorem shows that all the estimation and inference results in Theorems 3 and 5 hold for the HD estimator.

Theorem A.1.

Let \hat{q}^{\textit{HD}}(\tau) and \hat{q}^{\textit{HD,w}}(\tau) denote the \tauth QTE estimator and its multiplier bootstrap counterpart defined in Sections 3 and 4, respectively, with \overline{m}_{a}(\tau,S_{i},X_{i})=m_{a}(\tau,S_{i},X_{i}) and \widehat{m}_{a}(\tau,S_{i},X_{i}) defined in (A.3). Further suppose Assumptions 1, 2, 4, 8, and 13 hold. Then Assumptions 3 and 5 hold, which in turn implies that Theorems 3 and 5 hold for \hat{q}^{\textit{HD}}(\tau) and \hat{q}^{\textit{HD,w}}(\tau), respectively. In addition, for any finite-dimensional set of quantile indices (\tau_{1},\cdots,\tau_{K}), the covariance matrix of (\hat{q}^{\textit{HD}}(\tau_{1}),\cdots,\hat{q}^{\textit{HD}}(\tau_{K})) achieves the minimum (in the matrix sense) characterized in Theorem 3.

Appendix B Practical Guidance and Computation

B.1 Procedures for estimation and bootstrap inference

We can compute (\hat{q}_{1}^{adj}(\tau),\hat{q}_{0}^{adj}(\tau)) by solving the subgradient conditions of (3.1) and (3.2), respectively. Specifically, we have (\hat{q}_{1}^{adj}(\tau),\hat{q}_{0}^{adj}(\tau))=(Y_{i_{1}},Y_{i_{0}}) such that A_{i_{1}}=1, A_{i_{0}}=0,

\tau\left(\sum_{i=1}^{n}\frac{A_{i}}{\hat{\pi}(S_{i})}\right)-\sum_{i=1}^{n}\left(\frac{(A_{i}-\hat{\pi}(S_{i}))}{\hat{\pi}(S_{i})}\widehat{m}_{1}(\tau,S_{i},X_{i})\right)
\geq\sum_{i=1}^{n}\frac{A_{i}}{\hat{\pi}(S_{i})}1\{Y_{i}<Y_{i_{1}}\}
\geq\tau\left(\sum_{i=1}^{n}\frac{A_{i}}{\hat{\pi}(S_{i})}\right)-\frac{1}{\hat{\pi}(S_{i_{1}})}-\sum_{i=1}^{n}\left(\frac{(A_{i}-\hat{\pi}(S_{i}))}{\hat{\pi}(S_{i})}\widehat{m}_{1}(\tau,S_{i},X_{i})\right), (B.1)

and

\tau\left(\sum_{i=1}^{n}\frac{1-A_{i}}{1-\hat{\pi}(S_{i})}\right)+\sum_{i=1}^{n}\left(\frac{(A_{i}-\hat{\pi}(S_{i}))}{1-\hat{\pi}(S_{i})}\widehat{m}_{0}(\tau,S_{i},X_{i})\right)
\geq\sum_{i=1}^{n}\frac{1-A_{i}}{1-\hat{\pi}(S_{i})}1\{Y_{i}<Y_{i_{0}}\}
\geq\tau\left(\sum_{i=1}^{n}\frac{1-A_{i}}{1-\hat{\pi}(S_{i})}\right)-\frac{1}{1-\hat{\pi}(S_{i_{0}})}+\sum_{i=1}^{n}\left(\frac{(A_{i}-\hat{\pi}(S_{i}))}{1-\hat{\pi}(S_{i})}\widehat{m}_{0}(\tau,S_{i},X_{i})\right). (B.2)

We note that (i_{1},i_{0}) are uniquely defined as long as all the inequalities in (B.1) and (B.2) are strict, which is usually the case. If instead we have

\tau\left(\sum_{i=1}^{n}\frac{A_{i}}{\hat{\pi}(S_{i})}\right)-\sum_{i=1}^{n}\left(\frac{(A_{i}-\hat{\pi}(S_{i}))}{\hat{\pi}(S_{i})}\widehat{m}_{1}(\tau,S_{i},X_{i})\right)=\sum_{i=1}^{n}\frac{A_{i}}{\hat{\pi}(S_{i})}1\{Y_{i}\leq Y_{i_{1}}\},

then both i_{1} and i_{1}^{\prime} satisfy (B.1), where i_{1}^{\prime} is the index such that A_{i_{1}^{\prime}}=1 and Y_{i_{1}^{\prime}} is the smallest observation in the treatment group that is larger than Y_{i_{1}}. In this case, we let \hat{q}_{1}^{adj}(\tau)=Y_{i_{1}}.15 (In this case, any value in [Y_{i_{1}},Y_{i_{1}^{\prime}}] can be viewed as a solution; because |Y_{i_{1}}-Y_{i_{1}^{\prime}}|=O_{p}(1/n), all choices are asymptotically equivalent.) Similarly, by solving the subgradient conditions of (B.3) and (B.4), we have (\hat{q}_{1}^{w}(\tau),\hat{q}_{0}^{w}(\tau))=(Y_{i_{1}^{w}},Y_{i_{0}^{w}}) such that A_{i_{1}^{w}}=1, A_{i_{0}^{w}}=0,

\tau\left(\sum_{i=1}^{n}\frac{\xi_{i}A_{i}}{\hat{\pi}^{w}(S_{i})}\right)-\sum_{i=1}^{n}\left(\frac{\xi_{i}(A_{i}-\hat{\pi}^{w}(S_{i}))}{\hat{\pi}^{w}(S_{i})}\widehat{m}_{1}(\tau,S_{i},X_{i})\right)
\geq\sum_{i=1}^{n}\frac{\xi_{i}A_{i}}{\hat{\pi}^{w}(S_{i})}1\{Y_{i}<Y_{i_{1}^{w}}\}
\geq\tau\left(\sum_{i=1}^{n}\frac{\xi_{i}A_{i}}{\hat{\pi}^{w}(S_{i})}\right)-\frac{\xi_{i_{1}^{w}}}{\hat{\pi}^{w}(S_{i_{1}^{w}})}-\sum_{i=1}^{n}\left(\frac{\xi_{i}(A_{i}-\hat{\pi}^{w}(S_{i}))}{\hat{\pi}^{w}(S_{i})}\widehat{m}_{1}(\tau,S_{i},X_{i})\right) (B.3)

and

\tau\left(\sum_{i=1}^{n}\frac{\xi_{i}(1-A_{i})}{1-\hat{\pi}^{w}(S_{i})}\right)+\sum_{i=1}^{n}\left(\frac{\xi_{i}(A_{i}-\hat{\pi}^{w}(S_{i}))}{1-\hat{\pi}^{w}(S_{i})}\widehat{m}_{0}(\tau,S_{i},X_{i})\right)
\geq\sum_{i=1}^{n}\frac{\xi_{i}(1-A_{i})}{1-\hat{\pi}^{w}(S_{i})}1\{Y_{i}<Y_{i_{0}^{w}}\}
\geq\tau\left(\sum_{i=1}^{n}\frac{\xi_{i}(1-A_{i})}{1-\hat{\pi}^{w}(S_{i})}\right)-\frac{\xi_{i_{0}^{w}}}{1-\hat{\pi}^{w}(S_{i_{0}^{w}})}+\sum_{i=1}^{n}\left(\frac{\xi_{i}(A_{i}-\hat{\pi}^{w}(S_{i}))}{1-\hat{\pi}^{w}(S_{i})}\widehat{m}_{0}(\tau,S_{i},X_{i})\right). (B.4)

The inequalities in (B.3) and (B.4) are strict with probability one if \xi_{i} is continuously distributed, in which case (\hat{q}_{1}^{w}(\tau),\hat{q}_{0}^{w}(\tau)) are uniquely defined with probability one.
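Operationally, (B.1)-(B.4) say that each adjusted quantile is a weighted order statistic of one arm's outcomes with a recentered target. The following numpy sketch, an illustration under simplifying assumptions rather than the paper's code, solves the analogue of (B.1) for the treated arm by scanning the sorted treated outcomes; the data-generating choices are arbitrary.

```python
import numpy as np

def adjusted_treated_quantile(Y, A, pi_hat, m1_hat, tau):
    """Solve the analogue of (B.1): find the treated outcome whose cumulative
    weight brackets the recentered target tau * sum_i A_i / pi_hat(S_i)."""
    w = A / pi_hat                                       # weight 1/pi for treated units
    target = tau * w.sum() - ((A - pi_hat) / pi_hat * m1_hat).sum()
    order = np.argsort(Y[A == 1])
    y_t = Y[A == 1][order]
    w_t = (1.0 / pi_hat[A == 1])[order]
    below = np.concatenate(([0.0], np.cumsum(w_t)[:-1]))  # weight strictly below each y
    k = np.searchsorted(below + w_t, target)             # first index with inclusive cum >= target
    k = min(k, len(y_t) - 1)
    return y_t[k]

# toy example: simple random sampling, pi = 0.5, adjustment switched off (m1_hat = 0)
rng = np.random.default_rng(1)
n = 1001
A = rng.integers(0, 2, size=n).astype(float)
Y = rng.normal(size=n) + A                               # treated outcomes shifted up by 1
pi = np.full(n, 0.5)
q50 = adjusted_treated_quantile(Y, A, pi, np.zeros(n), 0.5)
```

With m1_hat set to zero this reduces to the unadjusted weighted sample quantile of the treated arm; a nonzero m1_hat only shifts the target, which is how the regression adjustment enters.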

We summarize the steps in the bootstrap procedure as follows.

  1. Let \mathcal{G} be a set of quantile indices. For \tau\in\mathcal{G}, compute \hat{q}_{1}(\tau) and \hat{q}_{0}(\tau) following (B.1) and (B.2) with \widehat{m}_{1}(\tau,S_{i},X_{i}) and \widehat{m}_{0}(\tau,S_{i},X_{i}) replaced by zero.

  2. Compute \widehat{m}_{a}(\tau,S_{i},X_{i}) for a=0,1 and \tau\in\mathcal{G} using \hat{q}_{1}(\tau) and \hat{q}_{0}(\tau).

  3. Compute the original estimator \hat{q}^{adj}(\tau)=\hat{q}^{adj}_{1}(\tau)-\hat{q}^{adj}_{0}(\tau) following (B.1) and (B.2) for \tau\in\mathcal{G}.

  4. Let B be the number of bootstrap replications. For each b\in[B], generate multiplier weights \{\xi_{i}\}_{i\in[n]} and compute \hat{q}^{w,b}(\tau)=\hat{q}_{1}^{w,b}(\tau)-\hat{q}_{0}^{w,b}(\tau) for \tau\in\mathcal{G} following (B.3) and (B.4).

  5. Collect the B bootstrap estimates of the QTE, \{\hat{q}^{w,b}(\tau)\}_{b\in[B],\tau\in\mathcal{G}}.
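The steps above can be sketched end-to-end in a toy run. This simplification (mine, not the paper's code) uses simple random sampling with π = 1/2 and switches the regression adjustment off, so each arm's quantile reduces to a ξ-weighted sample quantile; the full procedure would recenter by \widehat{m}_{a} as in (B.3) and (B.4).

```python
import numpy as np

rng = np.random.default_rng(2)

def weighted_quantile(y, w, tau):
    # tau-th quantile of y under weights w (inverse-CDF / subgradient form)
    order = np.argsort(y)
    y, w = y[order], w[order]
    cum = np.cumsum(w)
    return y[np.searchsorted(cum, tau * w.sum())]

# toy data: SRS with pi = 0.5, no covariate adjustment
n, B, tau = 800, 200, 0.5
A = rng.integers(0, 2, size=n).astype(float)
Y = rng.normal(size=n) + 2.0 * A                     # true QTE = 2 at every quantile

# step 3: point estimate of the QTE
q_hat = (weighted_quantile(Y[A == 1], np.ones(int(A.sum())), tau)
         - weighted_quantile(Y[A == 0], np.ones(int(n - A.sum())), tau))

# steps 4-5: multiplier bootstrap draws
boot = np.empty(B)
for b in range(B):
    xi = rng.exponential(size=n)                     # multiplier weights xi_i
    boot[b] = (weighted_quantile(Y[A == 1], xi[A == 1], tau)
               - weighted_quantile(Y[A == 0], xi[A == 0], tau))
```

The collected draws in `boot` are the inputs to the confidence-interval constructions of Section B.2.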

B.2 Bootstrap confidence intervals

Given the bootstrap estimates, we next discuss how to conduct bootstrap inference for null hypotheses involving a single quantile index, multiple quantile indices, and a continuum of quantile indices.

Case (1). We test the single null hypothesis \mathcal{H}_{0}:q(\tau)=\underline{q} vs. q(\tau)\neq\underline{q}. Set \mathcal{G}=\{\tau\} in the procedure described above, and let \widehat{\mathcal{C}}(\nu) and \mathcal{C}(\nu) be the \nuth empirical quantile of the sequence \{\hat{q}^{w,b}(\tau)\}_{b\in[B]} and the \nuth standard normal critical value, respectively. Let \alpha\in(0,1) be the significance level. We suggest using the bootstrap estimates to construct the standard error of \hat{q}^{adj}(\tau) as \hat{\sigma}=\frac{\widehat{\mathcal{C}}(0.975)-\widehat{\mathcal{C}}(0.025)}{\mathcal{C}(0.975)-\mathcal{C}(0.025)}. Then the valid confidence interval and Wald test using this standard error are

CI(\alpha)=(\hat{q}^{adj}(\tau)+\mathcal{C}(\alpha/2)\hat{\sigma},\ \hat{q}^{adj}(\tau)+\mathcal{C}(1-\alpha/2)\hat{\sigma}),

and 1\big\{\big|\frac{\hat{q}^{adj}(\tau)-\underline{q}}{\hat{\sigma}}\big|\geq\mathcal{C}(1-\alpha/2)\big\}, respectively.16 (It is asymptotically valid to use standard and percentile bootstrap confidence intervals, but in simulations we found that the confidence interval proposed in the paper has better finite-sample coverage under the null.)
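As a numerical illustration of Case (1), with simulated stand-ins for the bootstrap draws (the real draws come from the procedure in Section B.1):

```python
import numpy as np

def bootstrap_se_and_ci(q_adj, boot_draws, alpha=0.05):
    """Standard error from the 95% interquantile range of the bootstrap draws,
    scaled by the matching standard normal interquantile range."""
    z = 1.959963985                        # C(0.975) = -C(0.025) for N(0,1)
    c_lo, c_hi = np.quantile(boot_draws, [0.025, 0.975])
    se = (c_hi - c_lo) / (2 * z)
    ci = (q_adj - z * se, q_adj + z * se)  # CI(alpha) with alpha = 0.05
    return se, ci

rng = np.random.default_rng(3)
q_adj = 1.2
boot_draws = rng.normal(loc=q_adj, scale=0.3, size=2000)  # stand-in bootstrap draws
se, ci = bootstrap_se_and_ci(q_adj, boot_draws)
reject = abs((q_adj - 0.0) / se) >= 1.959963985           # Wald test of H0: q(tau) = 0
```

The interquantile-range construction makes the standard error robust to a few extreme bootstrap draws, which a plain bootstrap standard deviation is not.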

Case (2). We test the null hypothesis \mathcal{H}_{0}:q(\tau_{1})-q(\tau_{2})=\underline{q} vs. q(\tau_{1})-q(\tau_{2})\neq\underline{q}. In this case, we set \mathcal{G}=\{\tau_{1},\tau_{2}\} in the procedure described in Section B.1. Further, let \widehat{\mathcal{C}}(\nu) be the \nuth empirical quantile of the sequence \{\hat{q}^{w,b}(\tau_{1})-\hat{q}^{w,b}(\tau_{2})\}_{b\in[B]}, and let \alpha\in(0,1) be the significance level. We suggest using the bootstrap standard error to construct the valid confidence interval and Wald test as

CI(\alpha)=(\hat{q}^{adj}(\tau_{1})-\hat{q}^{adj}(\tau_{2})+\mathcal{C}(\alpha/2)\hat{\sigma},\ \hat{q}^{adj}(\tau_{1})-\hat{q}^{adj}(\tau_{2})+\mathcal{C}(1-\alpha/2)\hat{\sigma}),

and 1\big\{\big|\frac{\hat{q}^{adj}(\tau_{1})-\hat{q}^{adj}(\tau_{2})-\underline{q}}{\hat{\sigma}}\big|\geq\mathcal{C}(1-\alpha/2)\big\}, respectively, where \hat{\sigma}=\frac{\widehat{\mathcal{C}}(0.975)-\widehat{\mathcal{C}}(0.025)}{\mathcal{C}(0.975)-\mathcal{C}(0.025)}.

Case (3). We test the null hypothesis that

\mathcal{H}_{0}:q(\tau)=\underline{q}(\tau)\ \forall\tau\in\Upsilon\quad\text{vs.}\quad q(\tau)\neq\underline{q}(\tau)\ \exists\tau\in\Upsilon.

In theory, we should let \mathcal{G}=\Upsilon. In practice, we let \mathcal{G}=\{\tau_{1},\cdots,\tau_{G}\} be a fine grid on \Upsilon, where G should be as large as is computationally feasible. Further, let \widehat{\mathcal{C}}_{\tau}(\nu) denote the \nuth empirical quantile of the sequence \{\hat{q}^{w,b}(\tau)\}_{b\in[B]} for \tau\in\mathcal{G}. Compute the standard error of \hat{q}^{adj}(\tau) as

\hat{\sigma}_{\tau}=\frac{\widehat{\mathcal{C}}_{\tau}(0.975)-\widehat{\mathcal{C}}_{\tau}(0.025)}{\mathcal{C}(0.975)-\mathcal{C}(0.025)}.

The uniform confidence band at significance level \alpha is constructed as

CB(\alpha)=\{\hat{q}^{adj}(\tau)-\widetilde{\mathcal{C}}_{\alpha}\hat{\sigma}_{\tau},\ \hat{q}^{adj}(\tau)+\widetilde{\mathcal{C}}_{\alpha}\hat{\sigma}_{\tau}:\tau\in\mathcal{G}\},

where the critical value 𝒞~α\widetilde{\mathcal{C}}_{\alpha} is computed as

\widetilde{\mathcal{C}}_{\alpha}=\inf\left\{z:\frac{1}{B}\sum_{b=1}^{B}1\left\{\sup_{\tau\in\mathcal{G}}\left|\frac{\hat{q}^{w,b}(\tau)-\tilde{q}(\tau)}{\hat{\sigma}_{\tau}}\right|\leq z\right\}\geq 1-\alpha\right\},

and \tilde{q}(\tau) is first-order equivalent to \hat{q}^{adj}(\tau) in the sense that \sup_{\tau\in\Upsilon}|\tilde{q}(\tau)-\hat{q}^{adj}(\tau)|=o_{p}(1/\sqrt{n}). We suggest choosing \tilde{q}(\tau)=\widehat{\mathcal{C}}_{\tau}(0.5) over other choices, such as \tilde{q}(\tau)=\hat{q}^{adj}(\tau), because of its better finite sample performance. We reject \mathcal{H}_{0} at significance level \alpha if \underline{q}(\cdot)\notin CB(\alpha).
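The sup-t construction of Case (3) can be sketched as follows, again with simulated stand-ins for the bootstrap draws; \tilde{q}(\tau) is taken to be the pointwise bootstrap median, as suggested above.

```python
import numpy as np

def uniform_band(q_adj, boot, alpha=0.05):
    """Sup-t uniform confidence band over a grid of quantile indices.
    q_adj: (G,) point estimates; boot: (B, G) bootstrap draws."""
    z = 1.959963985
    c_lo = np.quantile(boot, 0.025, axis=0)
    c_hi = np.quantile(boot, 0.975, axis=0)
    sigma = (c_hi - c_lo) / (2 * z)                # per-tau bootstrap standard errors
    center = np.quantile(boot, 0.5, axis=0)        # q_tilde(tau): bootstrap median
    sup_t = np.max(np.abs((boot - center) / sigma), axis=1)
    crit = np.quantile(sup_t, 1 - alpha)           # critical value C_tilde_alpha
    return q_adj - crit * sigma, q_adj + crit * sigma, crit

rng = np.random.default_rng(4)
G, B = 11, 1000
q_adj = np.linspace(0.5, 1.5, G)                   # stand-in QTE estimates on a grid
boot = q_adj + rng.normal(scale=0.2, size=(B, G))  # stand-in bootstrap draws
lo, hi, crit = uniform_band(q_adj, boot)
```

Because the band must cover the whole grid simultaneously, the critical value exceeds the pointwise 1.96, so the uniform band is wider than the pointwise intervals.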

B.3 Computation of Auxiliary Regressions

Parametric regressions.

For the linear probability model, we compute the LP estimator via (5.7). For the logistic model, we consider the ML and LPML estimators. First, we compute the ML estimator \hat{\theta}_{a,s}^{\textit{ML}}(\tau) as in (5.9), which is the quasi maximum likelihood estimator of a flexible distribution regression. Second, we compute the logistic function values at \hat{\theta}_{a,s}^{\textit{ML}}(\tau) and treat them as regressors in a linear adjustment to further improve the ML estimate.

Sieve logistic regressions.

We provide more detail on the sieve basis. Recall H_{h_{n}}(x)\equiv(b_{1n}(x),\cdots,b_{h_{n}n}(x))^{\top}, where \{b_{hn}(\cdot)\}_{h\in[h_{n}]} are h_{n} basis functions of a linear sieve space, denoted \mathbb{B}. Given that all d_{x} elements of X are continuously distributed, the sieve space \mathbb{B} can be constructed as follows.

  1.

    For each element X^{(l)} of X, l=1,\cdots,d_{x}, let \mathbb{B}_{l} be a univariate sieve space of dimension J_{n}. One example of \mathbb{B}_{l} is the linear span of polynomials up to degree J_{n}, given by

    \mathbb{B}_{l}=\biggl\{\sum_{k=0}^{J_{n}}\alpha_{k}x^{k},\ x\in\text{Supp}(X^{(l)}),\ \alpha_{k}\in\Re\biggr\};

    Another is the linear span of rr-order splines with JnJ_{n} nodes given by

    \mathbb{B}_{l}=\biggl\{\sum_{k=0}^{r-1}\alpha_{k}x^{k}+\sum_{j=1}^{J_{n}}b_{j}[\max(x-t_{j},0)]^{r-1},\ x\in\text{Supp}(X^{(l)}),\ \alpha_{k},b_{j}\in\Re\biggr\},

    where the grid -\infty=t_{0}\leq t_{1}\leq\cdots\leq t_{J_{n}}\leq t_{J_{n}+1}=\infty partitions \text{Supp}(X^{(l)}) into J_{n}+1 subsets I_{j}=[t_{j},t_{j+1})\cap\text{Supp}(X^{(l)}), j=1,\cdots,J_{n}-1, I_{0}=(t_{0},t_{1})\cap\text{Supp}(X^{(l)}), and I_{J_{n}}=(t_{J_{n}},t_{J_{n}+1})\cap\text{Supp}(X^{(l)}).

  2.

    Let \mathbb{B} be the tensor product of \{\mathbb{B}_{l}\}_{l=1}^{d_{x}}, defined as the linear space spanned by the functions \prod_{l=1}^{d_{x}}g_{l}, where g_{l}\in\mathbb{B}_{l}. The dimension of \mathbb{B} is then K\equiv d_{x}J_{n}.

We refer interested readers to Hirano et al. (2003) and Chen (2007) for more details on the implementation of sieve estimation. Given the sieve basis, we can compute \widehat{m}_{a}(\tau,s,X_{i}) following (5.16) and (5.17).
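A sketch of the tensor-product construction with polynomial bases (splines are built analogously). Note that the full tensor product of (J_n+1)-dimensional univariate bases has (J_n+1)^{d_x} elements, so in practice one typically works with a lower-dimensional subset; the helper below is illustrative, not the paper's code.

```python
import numpy as np
from itertools import product

def poly_sieve_basis(X, J):
    """Tensor-product polynomial sieve: all products of per-coordinate
    powers x_l^k, k = 0..J, across the d_x coordinates."""
    n, dx = X.shape
    # per-coordinate power matrices, columns x^0, x^1, ..., x^J
    pows = [np.vander(X[:, l], J + 1, increasing=True) for l in range(dx)]
    cols = []
    for ks in product(range(J + 1), repeat=dx):
        col = np.ones(n)
        for l, k in enumerate(ks):
            col = col * pows[l][:, k]
        cols.append(col)
    return np.column_stack(cols)

rng = np.random.default_rng(5)
X = rng.uniform(size=(50, 2))
H = poly_sieve_basis(X, J=2)    # (J+1)^dx = 9 basis functions, including the constant
```

The first column is the constant function, and interaction terms such as x_1 x_2 arise automatically from the product over coordinates.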

Logistic regressions with an 1\ell_{1} penalization.

We follow the estimation procedure and the choice of tuning parameter proposed by Belloni et al. (2017), and we provide the details below for completeness. Recall \varrho_{n,a}(s)=c\sqrt{n_{a}(s)}\Phi^{-1}\left(1-\frac{0.1}{4\log(n_{a}(s))p_{n}}\right) from Assumption 13(vii). We set c=1.1 following Belloni et al. (2017). We then implement the following algorithm to compute \hat{\theta}_{a,s}^{\textit{HD}}(\hat{q}_{a}(\tau)) for \tau\in\Upsilon:

  (i)

    Let \hat{\sigma}_{h}^{(0)}=\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(1\{Y_{i}\leq\hat{q}_{a}(\tau)\}-\bar{Y}_{a,s}(\tau))^{2}H_{p_{n},h}^{2}(X_{i}) for h\in[p_{n}], where \bar{Y}_{a,s}(\tau)=\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}1\{Y_{i}\leq\hat{q}_{a}(\tau)\}. Estimate

    \hat{\theta}_{a,s}^{\textit{HD,0}}(\hat{q}_{a}(\tau))=\operatorname*{arg\,min}_{\theta_{a}}\frac{-1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\biggl[1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\log(\lambda(H_{p_{n}}(X_{i})^{\top}\theta_{a}))
    +1\{Y_{i}>\hat{q}_{a}(\tau)\}\log(1-\lambda(H_{p_{n}}(X_{i})^{\top}\theta_{a}))\biggr]+\frac{\varrho_{n,a}(s)}{n_{a}(s)}\sum_{h\in[p_{n}]}\hat{\sigma}_{h}^{(0)}|\theta_{a,h}|.
  (ii)

    For k=1,\cdots,K, obtain \hat{\sigma}_{h}^{(k)}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(H_{p_{n},h}(X_{i})\hat{\varepsilon}_{i}^{(k)})^{2}}, where \hat{\varepsilon}_{i}^{(k)}=1\{Y_{i}\leq\hat{q}_{a}(\tau)\}-\lambda(H_{p_{n}}^{\top}(X_{i})\hat{\theta}_{a,s}^{\textit{HD,k-1}}(\hat{q}_{a}(\tau))). Estimate

    \hat{\theta}_{a,s}^{\textit{HD,k}}(\hat{q}_{a}(\tau))=\operatorname*{arg\,min}_{\theta_{a}}\frac{-1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\biggl[1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\log(\lambda(H_{p_{n}}(X_{i})^{\top}\theta_{a}))
    +1\{Y_{i}>\hat{q}_{a}(\tau)\}\log(1-\lambda(H_{p_{n}}(X_{i})^{\top}\theta_{a}))\biggr]+\frac{\varrho_{n,a}(s)}{n_{a}(s)}\sum_{h\in[p_{n}]}\hat{\sigma}_{h}^{(k)}|\theta_{a,h}|.
  (iii)

    Let \hat{\theta}_{a,s}^{\textit{HD}}(\hat{q}_{a}(\tau))=\hat{\theta}_{a,s}^{\textit{HD,K}}(\hat{q}_{a}(\tau)).

  (iv)

    Repeat the above procedure for each \tau\in\mathcal{G}.

Appendix C Additional Simulation Results

C.1 Pointwise tests

Additional simulation results are provided for the pointwise tests at the 25% and 75% quantiles. The results are summarized in Tables 7 and 8; the simulation settings are the same as those for the pointwise tests in Section 6 of the main paper.

Table 7: Pointwise Test (\tau=0.25)
Size Power
N=200 N=400 N=200 N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.053 0.057 0.052 0.052 0.056 0.054 0.053 0.052 0.334 0.341 0.346 0.344 0.579 0.591 0.595 0.607
LP 0.054 0.056 0.052 0.054 0.053 0.057 0.053 0.052 0.391 0.406 0.405 0.394 0.683 0.696 0.693 0.694
ML 0.057 0.055 0.055 0.054 0.053 0.055 0.049 0.051 0.380 0.387 0.389 0.378 0.673 0.678 0.683 0.674
LPML 0.056 0.056 0.057 0.057 0.052 0.056 0.055 0.056 0.410 0.418 0.417 0.409 0.714 0.722 0.717 0.715
MLX 0.058 0.060 0.057 0.057 0.051 0.055 0.053 0.057 0.387 0.394 0.398 0.388 0.668 0.674 0.677 0.677
LPMLX 0.060 0.059 0.060 0.060 0.052 0.060 0.056 0.057 0.414 0.423 0.431 0.415 0.718 0.722 0.725 0.718
NP 0.061 0.065 0.063 0.063 0.056 0.060 0.058 0.057 0.432 0.444 0.442 0.427 0.724 0.728 0.730 0.724
Panel B: DGP (ii)
NA 0.044 0.047 0.048 0.046 0.054 0.049 0.044 0.046 0.457 0.457 0.466 0.475 0.741 0.751 0.752 0.760
LP 0.050 0.050 0.050 0.049 0.054 0.052 0.045 0.044 0.541 0.542 0.545 0.538 0.824 0.830 0.831 0.825
ML 0.049 0.049 0.052 0.054 0.050 0.051 0.049 0.045 0.478 0.477 0.480 0.474 0.757 0.761 0.760 0.761
LPML 0.053 0.052 0.056 0.054 0.053 0.050 0.048 0.044 0.542 0.540 0.544 0.536 0.832 0.837 0.840 0.833
MLX 0.055 0.057 0.057 0.058 0.054 0.053 0.048 0.045 0.507 0.504 0.505 0.500 0.765 0.771 0.771 0.770
LPMLX 0.055 0.061 0.061 0.057 0.057 0.054 0.050 0.046 0.572 0.567 0.572 0.563 0.848 0.850 0.852 0.845
NP 0.063 0.065 0.063 0.061 0.058 0.056 0.052 0.051 0.575 0.571 0.576 0.572 0.847 0.852 0.854 0.847
Table 8: Pointwise Test (τ=0.75\tau=0.75)
Size Power
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.054 0.055 0.056 0.054 0.057 0.052 0.051 0.051 0.348 0.342 0.352 0.338 0.583 0.594 0.601 0.576
LP 0.055 0.052 0.056 0.052 0.050 0.053 0.053 0.049 0.428 0.418 0.424 0.432 0.686 0.698 0.698 0.697
ML 0.051 0.053 0.054 0.051 0.050 0.051 0.054 0.054 0.402 0.395 0.403 0.402 0.658 0.667 0.666 0.667
LPML 0.056 0.056 0.058 0.056 0.050 0.053 0.056 0.051 0.428 0.425 0.437 0.435 0.706 0.711 0.711 0.709
MLX 0.059 0.056 0.057 0.057 0.054 0.054 0.053 0.055 0.404 0.399 0.404 0.406 0.654 0.663 0.662 0.657
LPMLX 0.058 0.058 0.061 0.057 0.052 0.056 0.056 0.053 0.425 0.421 0.432 0.432 0.702 0.711 0.710 0.710
NP 0.061 0.061 0.065 0.062 0.056 0.057 0.058 0.057 0.441 0.435 0.441 0.447 0.706 0.710 0.711 0.711
Panel B: DGP (ii)
NA 0.052 0.054 0.053 0.050 0.047 0.047 0.050 0.051 0.325 0.331 0.325 0.311 0.557 0.550 0.546 0.538
LP 0.055 0.055 0.062 0.053 0.051 0.053 0.053 0.054 0.389 0.402 0.396 0.394 0.626 0.626 0.634 0.634
ML 0.056 0.055 0.057 0.055 0.049 0.050 0.052 0.054 0.350 0.357 0.346 0.348 0.563 0.575 0.574 0.569
LPML 0.054 0.057 0.060 0.055 0.053 0.051 0.055 0.053 0.390 0.400 0.394 0.388 0.635 0.640 0.639 0.644
MLX 0.058 0.057 0.059 0.059 0.053 0.053 0.052 0.055 0.371 0.387 0.377 0.373 0.590 0.595 0.595 0.597
LPMLX 0.062 0.060 0.065 0.058 0.055 0.055 0.057 0.056 0.416 0.425 0.420 0.421 0.658 0.663 0.663 0.668
NP 0.068 0.068 0.067 0.063 0.056 0.054 0.058 0.059 0.429 0.436 0.427 0.426 0.663 0.670 0.670 0.672

C.2 Estimation biases and standard errors

In this section we report the biases and standard errors of our regression-adjusted estimators under three test settings. Specifically, the biases and standard errors for pointwise tests are summarized in Tables 9-11. Table 12 reports the biases and standard errors for estimating the difference of QTEs. Table 13 provides the average estimation bias and standard errors over the interval τ[0.25,0.75]\tau\in[0.25,0.75].

Table 9: Estimation Bias and Standard Errors for Pointwise Tests (τ=0.25\tau=0.25)
Bias Standard Error
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.010 0.004 -0.010 -0.023 0.011 -0.001 -0.001 -0.026 0.984 0.975 0.972 0.975 0.688 0.685 0.685 0.686
LP 0.024 0.013 0.005 0.036 0.016 0.006 0.004 0.009 0.882 0.874 0.872 0.872 0.610 0.607 0.608 0.607
ML 0.024 0.022 0.014 0.037 0.010 0.004 -0.002 0.012 0.902 0.894 0.892 0.891 0.620 0.617 0.619 0.618
LPML 0.011 0.010 0.002 0.028 0.005 0.000 -0.009 0.006 0.867 0.863 0.857 0.860 0.596 0.592 0.595 0.592
MLX -0.001 -0.006 -0.015 0.019 0.003 -0.002 -0.016 0.000 0.904 0.896 0.893 0.894 0.626 0.624 0.626 0.624
LPMLX 0.005 -0.002 -0.012 0.015 0.000 -0.007 -0.014 0.000 0.867 0.861 0.857 0.858 0.594 0.591 0.593 0.592
NP -0.037 -0.045 -0.053 -0.021 -0.014 -0.018 -0.027 -0.013 0.869 0.862 0.858 0.859 0.592 0.590 0.592 0.591
Panel B: DGP (ii)
NA -0.004 0.003 -0.007 -0.043 -0.001 -0.008 -0.001 -0.019 0.824 0.820 0.816 0.819 0.574 0.572 0.571 0.571
LP -0.038 -0.036 -0.040 -0.034 -0.022 -0.021 -0.015 -0.008 0.754 0.751 0.747 0.752 0.523 0.522 0.521 0.521
ML -0.032 -0.027 -0.036 -0.033 -0.023 -0.020 -0.013 -0.011 0.816 0.813 0.808 0.814 0.573 0.571 0.570 0.570
LPML -0.013 -0.007 -0.013 -0.003 -0.008 -0.009 -0.002 0.004 0.741 0.739 0.734 0.740 0.513 0.511 0.511 0.511
MLX -0.063 -0.056 -0.060 -0.065 -0.033 -0.038 -0.032 -0.035 0.803 0.800 0.796 0.802 0.571 0.568 0.569 0.568
LPMLX -0.061 -0.054 -0.057 -0.050 -0.032 -0.035 -0.028 -0.022 0.735 0.733 0.729 0.734 0.510 0.509 0.508 0.508
NP -0.068 -0.066 -0.072 -0.062 -0.039 -0.041 -0.033 -0.026 0.734 0.732 0.729 0.733 0.510 0.509 0.508 0.508
Table 10: Estimation Bias and Standard Errors for Pointwise Tests (τ=0.5\tau=0.5)
Bias Standard Error
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA -0.001 0.020 0.016 0.043 -0.003 -0.006 -0.005 0.019 0.979 0.976 0.976 0.973 0.688 0.685 0.685 0.687
LP -0.021 -0.005 -0.006 -0.017 -0.009 -0.008 -0.006 -0.010 0.875 0.873 0.871 0.871 0.610 0.606 0.607 0.609
ML 0.006 0.023 0.013 0.014 0.006 0.004 0.005 0.004 0.897 0.894 0.893 0.893 0.627 0.623 0.623 0.625
LPML -0.004 0.013 0.001 -0.002 0.001 -0.001 0.003 -0.004 0.860 0.858 0.856 0.856 0.597 0.592 0.592 0.595
MLX -0.004 0.016 0.006 0.003 0.003 -0.001 0.006 0.000 0.898 0.894 0.894 0.894 0.631 0.627 0.628 0.630
LPMLX 0.008 0.025 0.011 0.005 0.010 0.006 0.008 0.003 0.860 0.858 0.855 0.855 0.594 0.592 0.591 0.593
NP -0.021 -0.006 -0.014 -0.028 0.003 0.002 0.004 -0.003 0.859 0.858 0.855 0.855 0.593 0.589 0.589 0.592
Panel B: DGP (ii)
NA 0.032 0.017 0.032 0.091 0.003 0.013 0.019 0.039 1.036 1.026 1.023 1.025 0.723 0.720 0.718 0.718
LP -0.012 -0.034 -0.020 -0.015 -0.012 -0.003 0.001 -0.008 0.944 0.937 0.932 0.936 0.660 0.658 0.656 0.655
ML 0.025 0.010 0.026 0.029 0.010 0.009 0.015 0.010 1.006 1.000 0.997 0.998 0.709 0.705 0.702 0.702
LPML 0.020 0.013 0.029 0.031 0.016 0.019 0.028 0.013 0.930 0.920 0.919 0.924 0.644 0.640 0.638 0.637
MLX -0.008 -0.038 -0.021 -0.001 -0.015 -0.012 -0.008 -0.008 0.981 0.972 0.969 0.970 0.692 0.690 0.688 0.688
LPMLX -0.021 -0.041 -0.025 -0.024 -0.017 -0.013 -0.006 -0.017 0.916 0.912 0.906 0.908 0.636 0.635 0.633 0.631
NP -0.043 -0.059 -0.042 -0.041 -0.026 -0.019 -0.012 -0.023 0.910 0.903 0.901 0.903 0.634 0.632 0.631 0.630
Table 11: Estimation Bias and Standard Errors for Pointwise Tests (τ=0.75\tau=0.75)
Bias Standard Error
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.011 0.010 0.001 0.003 0.004 0.004 -0.004 -0.008 0.885 0.882 0.883 0.883 0.622 0.619 0.618 0.619
LP 0.002 0.002 -0.006 0.011 0.005 0.003 0.000 -0.008 0.779 0.776 0.775 0.776 0.545 0.542 0.541 0.542
ML 0.012 0.012 0.000 0.018 0.009 0.007 0.002 -0.006 0.795 0.791 0.790 0.793 0.557 0.554 0.553 0.553
LPML 0.004 0.006 -0.007 0.009 0.004 0.005 0.002 -0.004 0.759 0.756 0.755 0.757 0.528 0.525 0.524 0.525
MLX 0.000 0.002 -0.006 0.002 0.006 0.001 0.003 -0.004 0.796 0.791 0.790 0.793 0.561 0.559 0.558 0.558
LPMLX 0.004 0.007 -0.007 0.007 0.006 0.006 0.001 -0.004 0.758 0.755 0.753 0.756 0.527 0.524 0.523 0.524
NP -0.023 -0.017 -0.028 -0.015 0.004 0.004 0.000 -0.005 0.758 0.755 0.753 0.755 0.526 0.524 0.523 0.524
Panel B: DGP (ii)
NA 0.024 0.019 0.007 0.025 0.005 0.006 0.014 0.004 0.783 0.777 0.777 0.777 0.547 0.545 0.545 0.546
LP -0.009 -0.012 -0.029 -0.011 -0.013 -0.008 -0.004 -0.010 0.711 0.707 0.707 0.707 0.495 0.494 0.493 0.494
ML 0.009 0.009 -0.008 0.005 -0.008 0.001 0.003 0.001 0.744 0.740 0.739 0.740 0.523 0.522 0.522 0.522
LPML 0.026 0.025 0.013 0.021 0.011 0.018 0.020 0.017 0.697 0.696 0.696 0.693 0.481 0.480 0.479 0.482
MLX -0.037 -0.039 -0.049 -0.037 -0.028 -0.019 -0.018 -0.018 0.727 0.724 0.723 0.723 0.517 0.516 0.515 0.516
LPMLX -0.045 -0.043 -0.058 -0.046 -0.027 -0.019 -0.015 -0.019 0.686 0.685 0.684 0.684 0.479 0.476 0.476 0.479
NP -0.056 -0.061 -0.074 -0.061 -0.036 -0.027 -0.023 -0.027 0.685 0.681 0.682 0.681 0.474 0.473 0.473 0.473
Table 12: Estimation Bias and Standard Errors for Tests of Differences (τ1=0.25\tau_{1}=0.25, τ2=0.75\tau_{2}=0.75)
Bias Standard Error
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA -0.011 0.015 0.029 0.067 -0.013 -0.007 0.000 0.045 1.310 1.301 1.299 1.299 0.911 0.905 0.906 0.909
LP -0.044 -0.020 -0.007 -0.052 -0.025 -0.015 -0.007 -0.019 1.255 1.248 1.246 1.245 0.869 0.864 0.865 0.867
ML -0.029 -0.012 -0.005 -0.034 -0.009 -0.006 0.001 -0.015 1.253 1.243 1.241 1.241 0.865 0.860 0.862 0.862
LPML -0.028 -0.008 -0.010 -0.043 -0.010 -0.007 0.009 -0.015 1.201 1.194 1.190 1.191 0.823 0.819 0.820 0.822
MLX -0.012 0.000 0.012 -0.016 0.003 0.000 0.024 -0.001 1.250 1.244 1.239 1.241 0.870 0.865 0.867 0.868
LPMLX -0.019 0.006 0.006 -0.029 0.007 0.011 0.025 0.002 1.197 1.190 1.187 1.187 0.822 0.817 0.818 0.819
NP -0.010 0.014 0.018 -0.026 0.009 0.011 0.028 0.002 1.198 1.192 1.188 1.189 0.819 0.814 0.816 0.818
Panel B: DGP (ii)
NA 0.038 0.016 0.040 0.135 0.007 0.022 0.021 0.060 1.280 1.270 1.264 1.268 0.886 0.882 0.881 0.880
LP 0.029 0.002 0.021 0.020 0.012 0.018 0.017 0.000 1.201 1.192 1.186 1.191 0.831 0.829 0.827 0.826
ML 0.053 0.040 0.048 0.061 0.034 0.030 0.029 0.026 1.286 1.279 1.270 1.276 0.898 0.894 0.892 0.893
LPML 0.030 0.014 0.038 0.027 0.024 0.027 0.031 0.010 1.180 1.169 1.164 1.172 0.811 0.808 0.806 0.806
MLX 0.054 0.032 0.043 0.054 0.022 0.024 0.012 0.016 1.258 1.253 1.247 1.252 0.889 0.884 0.884 0.882
LPMLX 0.028 0.003 0.023 0.013 0.013 0.021 0.021 0.004 1.165 1.159 1.152 1.157 0.804 0.803 0.801 0.799
NP 0.019 0.001 0.023 0.010 0.014 0.021 0.021 0.003 1.160 1.153 1.149 1.152 0.804 0.801 0.799 0.799
Table 13: Average Estimation Bias and Standard Errors for Uniform Tests (τ[0.25,0.75]\tau\in[0.25,0.75])
Bias Standard Error
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.003 0.012 0.000 0.010 0.002 0.000 -0.004 -0.004 0.931 0.927 0.925 0.925 0.653 0.651 0.651 0.652
LP 0.001 0.004 -0.003 0.008 0.002 0.001 -0.001 -0.002 0.829 0.825 0.824 0.824 0.576 0.575 0.575 0.575
ML 0.012 0.016 0.005 0.019 0.005 0.006 0.001 0.003 0.851 0.847 0.845 0.846 0.593 0.591 0.591 0.591
LPML 0.002 0.008 -0.003 0.008 0.002 0.001 -0.002 -0.002 0.814 0.811 0.808 0.810 0.562 0.560 0.560 0.560
MLX -0.002 0.006 -0.008 0.009 0.000 0.001 -0.002 -0.003 0.850 0.847 0.845 0.846 0.596 0.594 0.594 0.594
LPMLX 0.003 0.010 -0.004 0.007 0.003 0.001 -0.001 -0.002 0.813 0.810 0.808 0.808 0.561 0.559 0.559 0.560
NP -0.024 -0.017 -0.029 -0.019 -0.002 -0.003 -0.005 -0.006 0.813 0.810 0.808 0.808 0.560 0.558 0.558 0.558
Panel B: DGP (ii)
NA 0.012 0.013 0.010 0.024 0.002 0.003 0.010 0.005 0.850 0.843 0.841 0.843 0.593 0.591 0.590 0.590
LP -0.024 -0.022 -0.027 -0.017 -0.015 -0.011 -0.007 -0.010 0.772 0.767 0.765 0.766 0.537 0.535 0.534 0.534
ML 0.001 0.000 0.000 0.009 -0.003 -0.001 0.005 0.001 0.820 0.816 0.814 0.816 0.579 0.577 0.575 0.575
LPML 0.015 0.016 0.015 0.023 0.010 0.014 0.018 0.014 0.761 0.757 0.755 0.757 0.525 0.523 0.522 0.522
MLX -0.035 -0.038 -0.038 -0.030 -0.022 -0.024 -0.019 -0.020 0.801 0.797 0.795 0.796 0.569 0.567 0.566 0.566
LPMLX -0.039 -0.040 -0.039 -0.034 -0.021 -0.020 -0.015 -0.018 0.752 0.748 0.745 0.746 0.521 0.519 0.518 0.519
NP -0.055 -0.059 -0.057 -0.052 -0.033 -0.030 -0.024 -0.028 0.746 0.742 0.741 0.742 0.516 0.515 0.514 0.514

C.3 Naïve Bootstrap Inference

In this section we report the size and power of our regression-adjusted estimators when the estimated propensity score π^(s)\hat{\pi}(s) is replaced by the true propensity score 1/21/2. We then consider the multiplier bootstrap as defined in the main text, again with π^(s)\hat{\pi}(s) replaced by the true propensity score 1/21/2. We refer to this procedure as naive bootstrap inference; the simulation results below show that it is conservative. Specifically, we report additional simulation results for the pointwise tests with τ=0.25\tau=0.25, 0.50.5 and 0.750.75 (Tables 14-16), tests for differences (Table 17), and uniform tests (Table 18).
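To make the comparison concrete, the following minimal numpy sketch (our own illustration, not the paper's code) computes an inverse-propensity-weighted median QTE once with the stratum-level estimated propensity score π^(s)=n1(s)/n(s)\hat{\pi}(s)=n_{1}(s)/n(s) and once with the true score 1/21/2. The helper names `weighted_quantile` and `ipw_qte` are ours, and the sketch omits both the regression adjustment and the multiplier bootstrap.

```python
import numpy as np

rng = np.random.default_rng(0)


def weighted_quantile(y, w, tau):
    """Smallest y whose cumulative weight share reaches tau."""
    order = np.argsort(y)
    y_sorted, w_sorted = y[order], w[order]
    cum_share = np.cumsum(w_sorted) / w_sorted.sum()
    return y_sorted[np.searchsorted(cum_share, tau)]


def ipw_qte(y, a, s, tau, pi=None):
    """Inverse-propensity-weighted QTE with stratum-level propensity scores.

    pi maps stratum label -> propensity score; None means estimate n_1(s)/n(s).
    Passing the true score 1/2 mimics the naive plug-in studied in this section.
    """
    if pi is None:
        pi = {k: a[s == k].mean() for k in np.unique(s)}
    p = np.array([pi[k] for k in s])
    q1 = weighted_quantile(y[a == 1], 1.0 / p[a == 1], tau)
    q0 = weighted_quantile(y[a == 0], 1.0 / (1.0 - p[a == 0]), tau)
    return q1 - q0


# Toy data: two strata, constant treatment effect 1, SRS with probability 1/2.
n = 4000
s = rng.integers(0, 2, size=n)
a = rng.integers(0, 2, size=n)
y = rng.normal(size=n) + s + 1.0 * a

est = ipw_qte(y, a, s, 0.5)                      # estimated propensity score
naive = ipw_qte(y, a, s, 0.5, {0: 0.5, 1: 0.5})  # true propensity score 1/2
```

Both point estimators are consistent for the median QTE; the contrast documented in the tables below concerns the bootstrap standard errors, not the point estimates themselves.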

Comparing these results with those in Section 6, we see that when the true, rather than the estimated, propensity score is used, the multiplier bootstrap inference becomes conservative for the randomization schemes “WEI”, “BCD”, and “SBR”. Specifically, the sizes are much smaller than the nominal rate (5%5\%), and the powers are lower than their counterparts in Section 6. The improvement in power of the “LPMLX” estimator with the estimated propensity score over the “LPMLX” estimator with the true propensity score is due to the 31–38% reduction in the standard errors. This outcome is consistent with the findings in Bugni et al. (2018) and Zhang and Zheng (2020) that naive inference methods under CARs are conservative.

Table 14: Pointwise Tests with Naïve Bootstrap Inference (τ=0.25\tau=0.25, π^(s)=0.5\hat{\pi}(s)=0.5)
Size Power
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.049 0.032 0.022 0.023 0.056 0.032 0.023 0.023 0.255 0.234 0.224 0.220 0.454 0.454 0.456 0.465
LP 0.048 0.018 0.008 0.006 0.053 0.019 0.006 0.007 0.222 0.173 0.149 0.113 0.399 0.391 0.372 0.331
ML 0.048 0.034 0.032 0.031 0.048 0.044 0.037 0.037 0.330 0.292 0.279 0.253 0.622 0.614 0.602 0.592
LPML 0.051 0.034 0.031 0.029 0.050 0.041 0.038 0.038 0.346 0.311 0.296 0.270 0.643 0.630 0.622 0.606
MLX 0.049 0.039 0.033 0.034 0.053 0.046 0.041 0.042 0.334 0.300 0.294 0.265 0.621 0.617 0.608 0.597
LPMLX 0.052 0.040 0.033 0.032 0.049 0.046 0.042 0.041 0.353 0.326 0.308 0.280 0.656 0.647 0.640 0.629
NP 0.054 0.045 0.037 0.035 0.053 0.049 0.045 0.044 0.370 0.347 0.327 0.302 0.679 0.672 0.663 0.650
Panel B: DGP (ii)
NA 0.050 0.019 0.008 0.009 0.053 0.019 0.007 0.008 0.276 0.247 0.225 0.237 0.498 0.498 0.501 0.520
LP 0.052 0.013 0.002 0.002 0.053 0.011 0.001 0.002 0.238 0.178 0.148 0.109 0.432 0.413 0.389 0.363
ML 0.046 0.041 0.041 0.038 0.045 0.039 0.037 0.033 0.443 0.443 0.446 0.435 0.720 0.726 0.726 0.724
LPML 0.047 0.039 0.037 0.033 0.047 0.038 0.031 0.030 0.444 0.431 0.440 0.435 0.720 0.735 0.736 0.741
MLX 0.052 0.046 0.044 0.045 0.048 0.044 0.041 0.038 0.469 0.470 0.472 0.464 0.732 0.744 0.743 0.748
LPMLX 0.053 0.049 0.052 0.047 0.053 0.047 0.042 0.041 0.531 0.527 0.531 0.527 0.818 0.829 0.834 0.830
NP 0.058 0.056 0.056 0.057 0.052 0.053 0.050 0.049 0.548 0.553 0.556 0.553 0.841 0.848 0.850 0.842
Table 15: Pointwise Tests with Naïve Estimator (τ=0.5\tau=0.5, π^(s)=0.5\hat{\pi}(s)=0.5)
Size Power
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.056 0.026 0.015 0.019 0.053 0.028 0.014 0.015 0.291 0.255 0.242 0.236 0.492 0.501 0.503 0.504
LP 0.049 0.008 0.001 0.001 0.052 0.008 0.001 0.000 0.181 0.103 0.053 0.033 0.314 0.254 0.191 0.158
ML 0.050 0.018 0.008 0.007 0.051 0.018 0.006 0.007 0.273 0.226 0.198 0.166 0.479 0.479 0.479 0.458
LPML 0.049 0.017 0.006 0.006 0.048 0.017 0.006 0.005 0.276 0.217 0.198 0.155 0.495 0.492 0.491 0.476
MLX 0.050 0.018 0.008 0.008 0.052 0.021 0.008 0.007 0.270 0.229 0.211 0.169 0.482 0.472 0.473 0.455
LPMLX 0.053 0.017 0.007 0.007 0.050 0.018 0.006 0.006 0.284 0.232 0.209 0.169 0.500 0.498 0.498 0.479
NP 0.055 0.020 0.008 0.008 0.052 0.021 0.006 0.007 0.291 0.243 0.227 0.184 0.507 0.507 0.507 0.490
Panel B: DGP (ii)
NA 0.051 0.017 0.005 0.005 0.048 0.016 0.006 0.006 0.284 0.244 0.223 0.211 0.499 0.491 0.486 0.494
LP 0.047 0.006 0.000 0.000 0.048 0.004 0.000 0.000 0.171 0.087 0.041 0.019 0.315 0.229 0.150 0.116
ML 0.050 0.021 0.013 0.011 0.047 0.026 0.019 0.018 0.315 0.254 0.216 0.187 0.588 0.552 0.528 0.509
LPML 0.049 0.016 0.009 0.006 0.045 0.017 0.010 0.009 0.296 0.229 0.188 0.166 0.526 0.484 0.462 0.440
MLX 0.056 0.021 0.014 0.013 0.052 0.027 0.019 0.021 0.336 0.275 0.243 0.213 0.606 0.564 0.554 0.535
LPMLX 0.055 0.020 0.013 0.011 0.054 0.021 0.017 0.016 0.341 0.280 0.240 0.210 0.598 0.565 0.550 0.525
NP 0.059 0.023 0.015 0.014 0.056 0.028 0.024 0.023 0.368 0.303 0.269 0.237 0.651 0.614 0.602 0.578
Table 16: Pointwise Tests with Naïve Estimator (τ=0.75\tau=0.75, π^(s)=0.5\hat{\pi}(s)=0.5)
Size Power
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.054 0.033 0.024 0.023 0.054 0.030 0.022 0.019 0.265 0.241 0.232 0.218 0.458 0.462 0.462 0.441
LP 0.015 0.002 0.000 0.000 0.029 0.002 0.000 0.000 0.089 0.022 0.002 0.001 0.162 0.073 0.012 0.006
ML 0.028 0.003 0.001 0.000 0.044 0.005 0.001 0.000 0.127 0.050 0.015 0.008 0.228 0.137 0.067 0.053
LPML 0.028 0.003 0.000 0.000 0.045 0.005 0.000 0.000 0.126 0.046 0.013 0.008 0.232 0.138 0.063 0.045
MLX 0.028 0.003 0.001 0.000 0.045 0.005 0.000 0.000 0.127 0.050 0.016 0.008 0.228 0.141 0.066 0.053
LPMLX 0.028 0.003 0.000 0.000 0.045 0.005 0.000 0.000 0.127 0.049 0.014 0.009 0.232 0.140 0.064 0.047
NP 0.030 0.003 0.000 0.000 0.044 0.006 0.000 0.001 0.134 0.052 0.017 0.009 0.234 0.145 0.069 0.053
Panel B: DGP (ii)
NA 0.051 0.027 0.019 0.016 0.050 0.028 0.016 0.019 0.239 0.210 0.192 0.169 0.409 0.392 0.384 0.371
LP 0.009 0.001 0.000 0.000 0.024 0.002 0.000 0.000 0.067 0.018 0.002 0.000 0.144 0.056 0.007 0.002
ML 0.021 0.003 0.000 0.000 0.040 0.004 0.001 0.000 0.103 0.040 0.011 0.005 0.191 0.102 0.041 0.031
LPML 0.027 0.003 0.000 0.000 0.041 0.004 0.000 0.000 0.101 0.034 0.008 0.004 0.185 0.094 0.030 0.026
MLX 0.023 0.003 0.000 0.000 0.040 0.005 0.000 0.000 0.107 0.044 0.011 0.005 0.195 0.107 0.044 0.035
LPMLX 0.024 0.003 0.000 0.000 0.041 0.004 0.000 0.000 0.110 0.043 0.011 0.004 0.199 0.109 0.042 0.032
NP 0.027 0.002 0.000 0.000 0.040 0.003 0.000 0.000 0.113 0.045 0.013 0.005 0.205 0.113 0.044 0.035
Table 17: Tests for Differences with Naïve Estimator (τ1=0.25\tau_{1}=0.25, τ2=0.75\tau_{2}=0.75, π^(s)=0.5\hat{\pi}(s)=0.5)
Size Power
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.035 0.033 0.028 0.029 0.043 0.035 0.030 0.033 0.187 0.184 0.170 0.159 0.345 0.343 0.335 0.311
LP 0.010 0.006 0.001 0.001 0.029 0.008 0.003 0.003 0.081 0.059 0.034 0.026 0.203 0.149 0.110 0.097
ML 0.027 0.008 0.003 0.002 0.042 0.009 0.002 0.002 0.105 0.060 0.030 0.025 0.190 0.127 0.080 0.072
LPML 0.027 0.009 0.002 0.002 0.043 0.009 0.002 0.003 0.105 0.055 0.025 0.023 0.195 0.128 0.079 0.068
MLX 0.027 0.008 0.004 0.003 0.042 0.010 0.002 0.002 0.101 0.059 0.029 0.026 0.188 0.125 0.079 0.072
LPMLX 0.028 0.009 0.003 0.003 0.042 0.009 0.002 0.003 0.108 0.058 0.028 0.025 0.197 0.128 0.079 0.070
NP 0.028 0.009 0.002 0.003 0.044 0.010 0.002 0.003 0.110 0.057 0.027 0.025 0.198 0.130 0.078 0.070
Panel B: DGP (ii)
NA 0.037 0.026 0.020 0.021 0.040 0.028 0.023 0.026 0.167 0.165 0.152 0.122 0.330 0.318 0.306 0.294
LP 0.005 0.003 0.001 0.001 0.024 0.006 0.001 0.000 0.053 0.038 0.018 0.009 0.174 0.106 0.062 0.050
ML 0.022 0.004 0.002 0.001 0.035 0.005 0.002 0.001 0.081 0.035 0.014 0.006 0.162 0.086 0.045 0.033
LPML 0.026 0.005 0.001 0.000 0.039 0.004 0.001 0.001 0.081 0.032 0.009 0.005 0.157 0.070 0.025 0.021
MLX 0.023 0.005 0.001 0.001 0.034 0.007 0.001 0.001 0.082 0.038 0.013 0.008 0.165 0.092 0.043 0.038
LPMLX 0.023 0.005 0.001 0.000 0.037 0.005 0.001 0.001 0.084 0.037 0.012 0.007 0.170 0.089 0.037 0.031
NP 0.024 0.005 0.001 0.000 0.037 0.005 0.001 0.001 0.091 0.038 0.013 0.007 0.175 0.093 0.042 0.036
Table 18: Uniform Tests with Naïve Estimator (τ[0.25,0.75]\tau\in[0.25,0.75], π^(s)=0.5\hat{\pi}(s)=0.5)
Size Power
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.044 0.020 0.012 0.011 0.048 0.025 0.014 0.011 0.187 0.184 0.170 0.159 0.566 0.569 0.567 0.562
LP 0.027 0.004 0.001 0.001 0.040 0.005 0.000 0.001 0.081 0.059 0.034 0.026 0.347 0.295 0.234 0.195
ML 0.031 0.011 0.007 0.006 0.043 0.016 0.009 0.010 0.105 0.060 0.030 0.025 0.630 0.588 0.562 0.541
LPML 0.034 0.010 0.007 0.007 0.043 0.016 0.008 0.009 0.105 0.055 0.025 0.023 0.636 0.608 0.583 0.562
MLX 0.032 0.012 0.008 0.007 0.046 0.018 0.009 0.011 0.101 0.059 0.029 0.026 0.629 0.591 0.560 0.548
LPMLX 0.033 0.013 0.010 0.008 0.045 0.019 0.011 0.011 0.108 0.058 0.028 0.025 0.653 0.623 0.602 0.582
NP 0.034 0.013 0.008 0.009 0.047 0.021 0.012 0.012 0.110 0.057 0.027 0.025 0.673 0.646 0.624 0.604
Panel B: DGP (ii)
NA 0.043 0.013 0.005 0.003 0.050 0.011 0.003 0.004 0.308 0.251 0.213 0.208 0.562 0.570 0.566 0.572
LP 0.029 0.002 0.000 0.000 0.035 0.002 0.000 0.000 0.171 0.082 0.041 0.025 0.358 0.282 0.207 0.177
ML 0.035 0.012 0.015 0.011 0.039 0.019 0.016 0.015 0.454 0.401 0.387 0.368 0.826 0.800 0.789 0.787
LPML 0.030 0.011 0.010 0.010 0.036 0.016 0.009 0.010 0.444 0.388 0.374 0.355 0.804 0.771 0.762 0.755
MLX 0.037 0.017 0.019 0.014 0.040 0.022 0.017 0.019 0.500 0.450 0.439 0.418 0.853 0.830 0.826 0.819
LPMLX 0.038 0.018 0.019 0.016 0.040 0.023 0.018 0.019 0.534 0.492 0.482 0.462 0.889 0.870 0.864 0.860
NP 0.041 0.023 0.025 0.023 0.045 0.028 0.023 0.024 0.573 0.539 0.533 0.513 0.919 0.906 0.903 0.899

C.4 High-dimensional covariates

To assess the finite sample performance of the estimation and inference methods introduced in Section A, we consider the outcome equation

Yi=α(Xi)+γZi+μ(Xi)Ai+ηi,\displaystyle Y_{i}=\alpha(X_{i})+\gamma Z_{i}+\mu(X_{i})A_{i}+\eta_{i}, (C.1)

where γ=4\gamma=4 for all cases while α(Xi)\alpha(X_{i}), μ(Xi)\mu(X_{i}), and ηi\eta_{i} are separately specified as follows.

Let ZZ follow the standardized Beta(2, 2) distribution, Si=j=141{Zigj}S_{i}=\sum_{j=1}^{4}1\{Z_{i}\leq g_{j}\}, and (g1,,g4)=(0.55,0,0.55,5)(g_{1},\cdots,g_{4})=(-0.5\sqrt{5},0,0.5\sqrt{5},\sqrt{5}). Further suppose that XiX_{i} contains twenty covariates (X1i,,X20,i)(X_{1i},\cdots,X_{20,i})^{\top}, where X=Φ(W)X=\Phi(W) with WN(020×1,Ω)W\sim N(0_{20\times 1},\Omega) and the variance matrix Ω\Omega is the Toeplitz matrix

Ω=(10.50.520.5190.510.50.5180.520.510.5170.5190.5180.5171)\displaystyle\Omega=\begin{pmatrix}1&0.5&0.5^{2}&\cdots&0.5^{19}\\ 0.5&1&0.5&\cdots&0.5^{18}\\ 0.5^{2}&0.5&1&\cdots&0.5^{17}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 0.5^{19}&0.5^{18}&0.5^{17}&\cdots&1\end{pmatrix}

Further define α(Xi)=1\alpha(X_{i})=1, μ(Xi)=1+k=120Xkiβk\mu(X_{i})=1+\sum_{k=1}^{20}X_{ki}\beta_{k} with βk=4/k2\beta_{k}=4/k^{2}, and ηi=2Aiε1i+(1Ai)ε2i\eta_{i}=2A_{i}\varepsilon_{1i}+(1-A_{i})\varepsilon_{2i}, where (ε1i,ε2i)(\varepsilon_{1i},\varepsilon_{2i}) are jointly standard normal.
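For concreteness, the data generating process above can be simulated as follows. This is a sketch under our own naming, with Φ\Phi applied elementwise and, purely for illustration, treatment assigned by simple random sampling with probability 1/21/2 rather than by the WEI, BCD, or SBR schemes.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
n, d = 400, 20

# Z: standardized Beta(2,2); Beta(2,2) has mean 1/2 and variance 1/20,
# so Z = (B - 1/2) * sqrt(20) has support [-sqrt(5), sqrt(5)].
B = rng.beta(2.0, 2.0, size=n)
Z = (B - 0.5) * sqrt(20.0)

# Strata S_i = sum_j 1{Z_i <= g_j}, cut points (-0.5*sqrt(5), 0, 0.5*sqrt(5), sqrt(5)).
g = np.array([-0.5 * sqrt(5.0), 0.0, 0.5 * sqrt(5.0), sqrt(5.0)])
S = (Z[:, None] <= g[None, :]).sum(axis=1)  # takes values in {1, 2, 3, 4}

# Covariates: X = Phi(W) elementwise, W ~ N(0, Omega), Omega the 0.5^{|i-j|} Toeplitz matrix.
Omega = 0.5 ** np.abs(np.subtract.outer(np.arange(d), np.arange(d)))
W = rng.multivariate_normal(np.zeros(d), Omega, size=n)
Phi = np.vectorize(lambda w: 0.5 * (1.0 + erf(w / sqrt(2.0))))  # standard normal CDF
X = Phi(W)

# Outcome equation (C.1): alpha(X) = 1, gamma = 4, mu(X) = 1 + sum_k X_k * 4/k^2.
A = rng.integers(0, 2, size=n)  # simple random sampling stand-in for the CAR schemes
beta = 4.0 / np.arange(1, d + 1) ** 2
mu = 1.0 + X @ beta
eta = 2.0 * A * rng.normal(size=n) + (1.0 - A) * rng.normal(size=n)
Y = 1.0 + 4.0 * Z + mu * A + eta
```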

We consider the post-Lasso estimator θ^a,spost(q^a(τ))\hat{\theta}_{a,s}^{post}(\hat{q}_{a}(\tau)) as defined in (A.2) with Hpn(Xi)=(1,Xi)H_{p_{n}}(X_{i})=(1,X_{i}^{\top})^{\top} and 𝕊^a,s+(q)={2}\hat{\mathbb{S}}^{+}_{a,s}(q)=\{2\}. The choice of tuning parameter and the estimation procedure are detailed in Section B.3. We assess the empirical size and power of the tests for n=200n=200 and n=400n=400. All simulations are replicated 10,000 times, with a bootstrap sample size of 1,000. We compute the true QTEs and QTE differences by simulation with a sample size of 10,000 and 1,000 replications. To compute power, we perturb the true values by 1.5.

In Table 19, we report the empirical size and power of all three testing scenarios in the high-dimensional setting. In particular, we compare the “NA” method with our post-Lasso estimator and the oracle estimator. Evidently, the sizes of all methods approach the nominal level as the sample size increases. The post-Lasso method dominates “NA” in all tests with superior power performance. The improvement in power of the “Post-Lasso” estimator upon “NA” (i.e., with no adjustments) is due to a 2.5% reduction, on average, of the standard error of the difference of the QTE estimators, as shown in Table 20. This result is consistent with the theory given in Theorem A.1. The powers of the “Post-Lasso” and “Oracle” estimators are similar, which also confirms that the “Post-Lasso” estimator achieves the minimum asymptotic variance.

Table 19: Empirical Size and Power for High-dimensional Covariates
Size Power
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Cases SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: NA
τ=0.25\tau=0.25 0.049 0.046 0.046 0.050 0.050 0.049 0.046 0.045 0.649 0.646 0.645 0.660 0.915 0.911 0.915 0.917
τ=0.50\tau=0.50 0.047 0.044 0.045 0.043 0.042 0.044 0.044 0.046 0.732 0.732 0.726 0.736 0.955 0.960 0.960 0.960
τ=0.75\tau=0.75 0.050 0.045 0.046 0.047 0.044 0.046 0.051 0.047 0.620 0.635 0.638 0.627 0.895 0.904 0.903 0.898
Diff 0.038 0.041 0.038 0.039 0.041 0.042 0.042 0.037 0.365 0.373 0.369 0.351 0.643 0.644 0.644 0.628
Uniform 0.035 0.034 0.036 0.036 0.040 0.038 0.040 0.040 0.852 0.860 0.857 0.865 0.994 0.996 0.994 0.994
Panel B: Post-Lasso
τ=0.25\tau=0.25 0.060 0.055 0.058 0.058 0.054 0.054 0.048 0.052 0.661 0.655 0.659 0.656 0.923 0.916 0.924 0.918
τ=0.50\tau=0.50 0.056 0.054 0.056 0.052 0.048 0.050 0.051 0.049 0.739 0.744 0.728 0.741 0.960 0.964 0.963 0.963
τ=0.75\tau=0.75 0.059 0.055 0.055 0.056 0.052 0.050 0.056 0.055 0.627 0.644 0.648 0.643 0.902 0.911 0.907 0.904
Diff 0.052 0.050 0.048 0.050 0.048 0.046 0.046 0.043 0.377 0.380 0.381 0.373 0.657 0.665 0.655 0.659
Uniform 0.051 0.051 0.048 0.046 0.049 0.046 0.048 0.049 0.872 0.881 0.877 0.882 0.996 0.998 0.996 0.996
Panel C: Oracle
τ=0.25\tau=0.25 0.048 0.043 0.045 0.048 0.048 0.047 0.040 0.046 0.668 0.663 0.660 0.660 0.925 0.921 0.929 0.922
τ=0.50\tau=0.50 0.041 0.044 0.044 0.040 0.042 0.044 0.043 0.044 0.745 0.746 0.739 0.749 0.962 0.967 0.967 0.967
τ=0.75\tau=0.75 0.049 0.044 0.046 0.045 0.046 0.046 0.050 0.047 0.640 0.652 0.648 0.649 0.907 0.916 0.914 0.911
Diff 0.052 0.050 0.048 0.050 0.041 0.041 0.042 0.038 0.387 0.390 0.392 0.385 0.661 0.663 0.656 0.655
Uniform 0.041 0.044 0.044 0.040 0.040 0.038 0.040 0.041 0.873 0.881 0.883 0.883 0.997 0.998 0.997 0.997
Table 20: Estimation Bias and Standard Errors for High-dimensional Covariates
Bias Standard Error
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Cases SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: NA
τ=0.25\tau=0.25 -0.009 -0.006 -0.001 -0.033 -0.005 0.000 -0.005 -0.018 0.652 0.649 0.647 0.651 0.456 0.455 0.454 0.456
τ=0.50\tau=0.50 0.004 0.005 0.018 -0.004 0.004 -0.002 -0.002 -0.001 0.588 0.584 0.583 0.583 0.408 0.407 0.407 0.407
τ=0.75\tau=0.75 0.022 0.006 0.016 0.028 0.017 0.001 0.007 0.016 0.652 0.651 0.650 0.648 0.457 0.456 0.454 0.456
Diff 0.011 0.022 0.012 0.067 0.007 0.008 0.013 0.032 0.922 0.917 0.916 0.917 0.644 0.642 0.641 0.643
Uniform 0.008 0.000 -0.004 -0.002 0.000 -0.001 0.000 0.008 0.624 0.621 0.620 0.621 0.436 0.435 0.435 0.435
Panel B: Post-Lasso
τ=0.25\tau=0.25 0.004 0.005 0.005 -0.004 -0.002 0.004 -0.003 -0.002 0.639 0.636 0.633 0.637 0.446 0.445 0.445 0.446
τ=0.50\tau=0.50 0.014 0.010 0.029 0.011 0.007 0.001 0.003 0.005 0.576 0.573 0.572 0.571 0.400 0.398 0.398 0.399
τ=0.75\tau=0.75 0.039 0.021 0.026 0.026 0.023 0.008 0.014 0.015 0.639 0.638 0.636 0.635 0.447 0.446 0.445 0.446
Diff 0.013 0.025 0.008 0.039 0.010 0.007 0.018 0.012 0.907 0.901 0.900 0.902 0.632 0.630 0.630 0.631
Uniform 0.020 0.011 0.006 0.010 0.004 0.005 0.006 0.014 0.611 0.609 0.608 0.608 0.427 0.426 0.426 0.426
Panel C: Oracle
τ=0.25\tau=0.25 -0.010 -0.003 -0.001 -0.003 -0.006 -0.001 -0.007 -0.004 0.633 0.631 0.629 0.633 0.445 0.443 0.443 0.444
τ=0.50\tau=0.50 0.001 0.003 0.022 0.006 0.002 -0.003 -0.002 0.004 0.572 0.569 0.568 0.568 0.397 0.396 0.396 0.396
τ=0.75\tau=0.75 0.022 0.008 0.020 0.022 0.018 0.001 0.009 0.013 0.637 0.635 0.633 0.632 0.446 0.445 0.443 0.445
Diff 0.017 0.026 0.013 0.040 0.008 0.009 0.016 0.014 0.900 0.894 0.894 0.895 0.629 0.628 0.627 0.628
Uniform 0.007 0.002 -0.002 0.004 -0.001 0.000 0.000 0.011 0.607 0.604 0.603 0.604 0.425 0.424 0.424 0.424

Appendix D Additional Notation

Throughout the supplement, the collection (ξis,Xis,Yis(1),Yis(0))i[n](\xi_{i}^{s},X_{i}^{s},Y_{i}^{s}(1),Y_{i}^{s}(0))_{i\in[n]} denotes an i.i.d. sequence with marginal distribution equal to the conditional distribution of (ξi,Xi,Yi(1),Yi(0))(\xi_{i},X_{i},Y_{i}(1),Y_{i}(0)) given Si=sS_{i}=s. In addition, {(ξis,Xis,Yis(1),Yis(0))i[n]}s𝒮\{(\xi_{i}^{s},X_{i}^{s},Y_{i}^{s}(1),Y_{i}^{s}(0))_{i\in[n]}\}_{s\in\mathcal{S}} are independent across ss and independent of {Ai,Si}i[n]\{A_{i},S_{i}\}_{i\in[n]}. We further let \mathcal{F} denote a generic class of functions, which may differ across contexts, with envelope FF. We say \mathcal{F} is of VC-type with coefficients (αn,vn)(\alpha_{n},v_{n}) if

supQN(,eQ,εFQ,2)(αnε)vn,ε(0,1],\displaystyle\sup_{Q}N\left(\mathcal{F},e_{Q},\varepsilon||F||_{Q,2}\right)\leq\left(\frac{\alpha_{n}}{\varepsilon}\right)^{v_{n}},\quad\forall\varepsilon\in(0,1],

where N()N(\cdot) denotes the covering number, eQ(f,g)=fgQ,2e_{Q}(f,g)=||f-g||_{Q,2}, and the supremum is taken over all finitely discrete probability measures.

Appendix E Proof of Theorem 3

The following notation is used throughout the proof.

na(s)n_{a}(s): for a{0,1}a\in\{0,1\} and s𝒮s\in\mathcal{S}, the number of individuals with Ai=aA_{i}=a in stratum ss.

n(s)n(s): for s𝒮s\in\mathcal{S}, the number of individuals in stratum ss.

π^(s)\hat{\pi}(s): for s𝒮s\in\mathcal{S}, π^(s)=n1(s)/n(s)\hat{\pi}(s)=n_{1}(s)/n(s).

q^aadj(τ)\hat{q}_{a}^{adj}(\tau): for a{0,1}a\in\{0,1\} and τΥ\tau\in\Upsilon, the regression-adjusted estimator of qa(τ)q_{a}(\tau) with a generic regression adjustment.

ma(τ,s,x)m_{a}(\tau,s,x): for a{0,1}a\in\{0,1\}, s𝒮s\in\mathcal{S}, τΥ\tau\in\Upsilon, and xSupp(X)x\in\text{Supp}(X), ma(τ,s,x)=τ(Yi(a)qa(τ)|Si=s,Xi=x)m_{a}(\tau,s,x)=\tau-\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}=x), the true specification.

m¯a(τ,s,x)\overline{m}_{a}(\tau,s,x): for a{0,1}a\in\{0,1\}, s𝒮s\in\mathcal{S}, τΥ\tau\in\Upsilon, and xSupp(X)x\in\text{Supp}(X), the model for ma(τ,s,x)m_{a}(\tau,s,x) specified by researchers.

ma(τ,s)m_{a}(\tau,s): for a{0,1}a\in\{0,1\}, s𝒮s\in\mathcal{S}, and τΥ\tau\in\Upsilon, ma(τ,s)=𝔼(ma(τ,Si,Xi)|Si=s)m_{a}(\tau,s)=\mathbb{E}(m_{a}(\tau,S_{i},X_{i})|S_{i}=s).

m¯a(τ,s)\overline{m}_{a}(\tau,s): for a{0,1}a\in\{0,1\}, s𝒮s\in\mathcal{S}, and τΥ\tau\in\Upsilon, m¯a(τ,s)=𝔼(m¯a(τ,Si,Xi)|Si=s)\overline{m}_{a}(\tau,s)=\mathbb{E}(\overline{m}_{a}(\tau,S_{i},X_{i})|S_{i}=s).

ηi,a(τ,s)\eta_{i,a}(\tau,s): for a{0,1}a\in\{0,1\}, s𝒮s\in\mathcal{S}, and τΥ\tau\in\Upsilon, ηi,a(τ,s)=τ1{Yi(a)qa(τ)}ma(τ,s)\eta_{i,a}(\tau,s)=\tau-1\{Y_{i}(a)\leq q_{a}(\tau)\}-m_{a}(\tau,s).

fa()f_{a}(\cdot): for a{0,1}a\in\{0,1\}, the density of Y(a)Y(a).

Dn(s)D_{n}(s): for s𝒮s\in\mathcal{S}, Dn(s)=i=1n(Aiπ(s))1{Si=s}D_{n}(s)=\sum_{i=1}^{n}(A_{i}-\pi(s))1\{S_{i}=s\}, the imbalance in stratum ss.

Δ¯a(τ,s,x)\overline{\Delta}_{a}(\tau,s,x): for a{0,1}a\in\{0,1\}, s𝒮s\in\mathcal{S}, τΥ\tau\in\Upsilon, and xSupp(X)x\in\text{Supp}(X), Δ¯a(τ,s,x)=m^a(τ,s,x)m¯a(τ,s,x)\overline{\Delta}_{a}(\tau,s,x)=\widehat{m}_{a}(\tau,s,x)-\overline{m}_{a}(\tau,s,x).

We first derive the linear expansion of q^1adj(τ)\hat{q}_{1}^{adj}(\tau). By Knight’s identity (Knight, 1998), we have

Ln(u,τ)=\displaystyle L_{n}(u,\tau)= i=1n[Aiπ^(Si)[ρτ(Yiq1(τ)u/n)ρτ(Yiq1(τ))]+(Aiπ^(Si))π^(Si)nm^1(τ,Si,Xi)u]\displaystyle\sum_{i=1}^{n}\left[\frac{A_{i}}{\hat{\pi}(S_{i})}\left[\rho_{\tau}(Y_{i}-q_{1}(\tau)-u/\sqrt{n})-\rho_{\tau}(Y_{i}-q_{1}(\tau))\right]+\frac{(A_{i}-\hat{\pi}(S_{i}))}{\hat{\pi}(S_{i})\sqrt{n}}\widehat{m}_{1}(\tau,S_{i},X_{i})u\right]
\displaystyle\equiv L1,n(τ)u+L2,n(u,τ),\displaystyle-L_{1,n}(\tau)u+L_{2,n}(u,\tau),

where

L1,n(τ)=1ni=1n[Aiπ^(Si)(τ1{Yiq1(τ)})(Aiπ^(Si))π^(Si)m^1(τ,Si,Xi)]\displaystyle L_{1,n}(\tau)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left[\frac{A_{i}}{\hat{\pi}(S_{i})}(\tau-1\{Y_{i}\leq q_{1}(\tau)\})-\frac{(A_{i}-\hat{\pi}(S_{i}))}{\hat{\pi}(S_{i})}\widehat{m}_{1}(\tau,S_{i},X_{i})\right]

and

L2,n(u,τ)=i=1nAiπ^(Si)0un(1{Yiq1(τ)+v}1{Yiq1(τ)})𝑑v.\displaystyle L_{2,n}(u,\tau)=\sum_{i=1}^{n}\frac{A_{i}}{\hat{\pi}(S_{i})}\int_{0}^{\frac{u}{\sqrt{n}}}\left(1\{Y_{i}\leq q_{1}(\tau)+v\}-1\{Y_{i}\leq q_{1}(\tau)\}\right)dv.
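For completeness, the version of Knight's identity applied above states that, for any real ww and vv,

```latex
\rho_{\tau}(w-v)-\rho_{\tau}(w)
  = -v\left(\tau-1\{w\leq 0\}\right)
  + \int_{0}^{v}\left(1\{w\leq s\}-1\{w\leq 0\}\right)ds.
```

Taking w=Yiq1(τ)w=Y_{i}-q_{1}(\tau) and v=u/nv=u/\sqrt{n}, the linear term produces the τ1{Yiq1(τ)}\tau-1\{Y_{i}\leq q_{1}(\tau)\} part of L1,n(τ)L_{1,n}(\tau) and the integral produces L2,n(u,τ)L_{2,n}(u,\tau).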

By change of variables, we have

n(q^1adj(τ)q1(τ))=argminuLn(u,τ).\displaystyle\sqrt{n}(\hat{q}_{1}^{adj}(\tau)-q_{1}(\tau))=\operatorname*{arg\,min}_{u}L_{n}(u,\tau).

Note that L2,n(u,τ)L_{2,n}(u,\tau) is exactly the same as the corresponding term considered in the proof of Theorem 3.2 in Zhang and Zheng (2020), and by their result we have

supτΥ|L2,n(u,τ)f1(q1(τ))u22|=op(1).\displaystyle\sup_{\tau\in\Upsilon}\left|L_{2,n}(u,\tau)-\frac{f_{1}(q_{1}(\tau))u^{2}}{2}\right|=o_{p}(1).

Next, consider L1,n(τ)L_{1,n}(\tau). Denote m1(τ,s)=𝔼(m1(τ,Si,Xi)|Si=s)m_{1}(\tau,s)=\mathbb{E}(m_{1}(\tau,S_{i},X_{i})|S_{i}=s), ηi,1(τ,s)=τ1{Yiq1(τ)}m1(τ,s)\eta_{i,1}(\tau,s)=\tau-1\{Y_{i}\leq q_{1}(\tau)\}-m_{1}(\tau,s), and

L1,n(τ)=\displaystyle L_{1,n}(\tau)= 1ni=1n[Aiπ^(Si)(τ1{Yiq1(τ)})]1ni=1n[(Aiπ^(Si))π^(Si)m^1(τ,Si,Xi)]\displaystyle\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left[\frac{A_{i}}{\hat{\pi}(S_{i})}(\tau-1\{Y_{i}\leq q_{1}(\tau)\})\right]-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left[\frac{(A_{i}-\hat{\pi}(S_{i}))}{\hat{\pi}(S_{i})}\widehat{m}_{1}(\tau,S_{i},X_{i})\right]
\displaystyle\equiv L1,1,n(τ)L1,2,n(τ).\displaystyle L_{1,1,n}(\tau)-L_{1,2,n}(\tau).

First, note that π^(s)π(s)=Dn(s)n(s)\hat{\pi}(s)-\pi(s)=\frac{D_{n}(s)}{n(s)}. Therefore,

L1,1,n(τ)\displaystyle L_{1,1,n}(\tau) =1ni=1ns𝒮Aiπ(s)1{Si=s}(τ1{Yi(1)q1(τ)})\displaystyle=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{A_{i}}{\pi(s)}1\{S_{i}=s\}(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\})
i=1ns𝒮Ai1{Si=s}(π^(s)π(s))nπ^(s)π(s)(τ1{Yi(1)q1(τ)})\displaystyle-\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{A_{i}1\{S_{i}=s\}(\hat{\pi}(s)-\pi(s))}{\sqrt{n}\hat{\pi}(s)\pi(s)}(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\})
=1ni=1ns𝒮Aiπ(s)1{Si=s}(τ1{Yi(1)q1(τ)})\displaystyle=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{A_{i}}{\pi(s)}1\{S_{i}=s\}(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\})
i=1ns𝒮Ai1{Si=s}Dn(s)n(s)nπ^(s)π(s)ηi,1(τ,s)s𝒮Dn(s)m1(τ,s)n(s)nπ^(s)π(s)Dn(s)s𝒮Dn(s)m1(τ,s)nπ^(s)\displaystyle-\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{A_{i}1\{S_{i}=s\}D_{n}(s)}{n(s)\sqrt{n}\hat{\pi}(s)\pi(s)}\eta_{i,1}(\tau,s)-\sum_{s\in\mathcal{S}}\frac{D_{n}(s)m_{1}(\tau,s)}{n(s)\sqrt{n}\hat{\pi}(s)\pi(s)}D_{n}(s)-\sum_{s\in\mathcal{S}}\frac{D_{n}(s)m_{1}(\tau,s)}{\sqrt{n}\hat{\pi}(s)}
=s𝒮1ni=1nAi1{Si=s}π(s)ηi,1(τ,s)+s𝒮Dn(s)nπ(s)m1(τ,s)+i=1nm1(τ,Si)n\displaystyle=\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{A_{i}1\{S_{i}=s\}}{\pi(s)}\eta_{i,1}(\tau,s)+\sum_{s\in\mathcal{S}}\frac{D_{n}(s)}{\sqrt{n}\pi(s)}m_{1}(\tau,s)+\sum_{i=1}^{n}\frac{m_{1}(\tau,S_{i})}{\sqrt{n}}
i=1ns𝒮Ai1{Si=s}Dn(s)n(s)nπ^(s)π(s)ηi,1(τ,s)s𝒮Dn(s)m1(τ,s)n(s)nπ^(s)π(s)Dn(s)s𝒮Dn(s)m1(τ,s)nπ^(s)\displaystyle-\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{A_{i}1\{S_{i}=s\}D_{n}(s)}{n(s)\sqrt{n}\hat{\pi}(s)\pi(s)}\eta_{i,1}(\tau,s)-\sum_{s\in\mathcal{S}}\frac{D_{n}(s)m_{1}(\tau,s)}{n(s)\sqrt{n}\hat{\pi}(s)\pi(s)}D_{n}(s)-\sum_{s\in\mathcal{S}}\frac{D_{n}(s)m_{1}(\tau,s)}{\sqrt{n}\hat{\pi}(s)}
=s𝒮1ni=1nAi1{Si=s}π(s)ηi,1(τ,s)+i=1nm1(τ,Si)n+R1,1(τ),\displaystyle=\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{A_{i}1\{S_{i}=s\}}{\pi(s)}\eta_{i,1}(\tau,s)+\sum_{i=1}^{n}\frac{m_{1}(\tau,S_{i})}{\sqrt{n}}+R_{1,1}(\tau), (E.1)

where

R1,1(τ)=\displaystyle R_{1,1}(\tau)= i=1ns𝒮Ai1{Si=s}Dn(s)n(s)nπ^(s)π(s)ηi,1(τ,s)s𝒮Dn(s)m1(τ,s)n(s)nπ^(s)π(s)Dn(s)\displaystyle-\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{A_{i}1\{S_{i}=s\}D_{n}(s)}{n(s)\sqrt{n}\hat{\pi}(s)\pi(s)}\eta_{i,1}(\tau,s)-\sum_{s\in\mathcal{S}}\frac{D_{n}(s)m_{1}(\tau,s)}{n(s)\sqrt{n}\hat{\pi}(s)\pi(s)}D_{n}(s)
+s𝒮Dn(s)m1(τ,s)n(1π(s)1π^(s))\displaystyle+\sum_{s\in\mathcal{S}}\frac{D_{n}(s)m_{1}(\tau,s)}{\sqrt{n}}\left(\frac{1}{\pi(s)}-\frac{1}{\hat{\pi}(s)}\right)
=i=1ns𝒮Ai1{Si=s}Dn(s)n(s)nπ^(s)π(s)ηi,1(τ,s).\displaystyle=-\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{A_{i}1\{S_{i}=s\}D_{n}(s)}{n(s)\sqrt{n}\hat{\pi}(s)\pi(s)}\eta_{i,1}(\tau,s).

In addition, note that

{τ1{Yi(1)q1(τ)}m1(τ,Si):τΥ}\displaystyle\{\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\}-m_{1}(\tau,S_{i}):\tau\in\Upsilon\}

is of the VC-type with fixed coefficients (α,v)(\alpha,v) and bounded envelope, and 𝔼(τ1{Yi(1)q1(τ)}m1(τ,Si)|Si=s)=0\mathbb{E}(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\}-m_{1}(\tau,S_{i})|S_{i}=s)=0. Therefore, Lemma N.2 implies

supτΥ,s𝒮|1ni=1nAi1{Si=s}ηi,1(τ,s)|=Op(1).\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}\eta_{i,1}(\tau,s)\right|=O_{p}(1).

By Assumption 1 we have maxs𝒮|Dn(s)/n(s)|=op(1)\max_{s\in\mathcal{S}}|D_{n}(s)/n(s)|=o_{p}(1), maxs𝒮|π^(s)π(s)|=op(1)\max_{s\in\mathcal{S}}|\hat{\pi}(s)-\pi(s)|=o_{p}(1), and mins𝒮π(s)>c>0\min_{s\in\mathcal{S}}\pi(s)>c>0, which imply supτΥ|R1,1(τ)|=op(1)\sup_{\tau\in\Upsilon}|R_{1,1}(\tau)|=o_{p}(1).

Next, denote m¯1(τ,s)=𝔼(m¯1(τ,s,Xi)|Si=s)\overline{m}_{1}(\tau,s)=\mathbb{E}(\overline{m}_{1}(\tau,s,X_{i})|S_{i}=s). Then

L1,2,n\displaystyle L_{1,2,n} =1ns𝒮i=1nAiπ^(s)m¯1(τ,s,Xi)1{Si=s}1ni=1nm¯1(τ,Si,Xi)\displaystyle=\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\sum_{i=1}^{n}\frac{A_{i}}{\hat{\pi}(s)}\overline{m}_{1}(\tau,s,X_{i})1\{S_{i}=s\}-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\overline{m}_{1}(\tau,S_{i},X_{i})
+1ns𝒮1π^(s)i=1n(Aiπ^(s))(m^1(τ,s,Xi)m¯1(τ,s,Xi))1{Si=s}\displaystyle+\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\frac{1}{\hat{\pi}(s)}\sum_{i=1}^{n}(A_{i}-\hat{\pi}(s))\left(\widehat{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s,X_{i})\right)1\{S_{i}=s\}
=1ns𝒮i=1nAiπ^(s)(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s}\displaystyle=\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\sum_{i=1}^{n}\frac{A_{i}}{\hat{\pi}(s)}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}
1ni=1n(m¯1(τ,Si,Xi)m¯1(τ,Si))\displaystyle-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(\overline{m}_{1}(\tau,S_{i},X_{i})-\overline{m}_{1}(\tau,S_{i})\right)
+1ns𝒮1π^(s)i=1n(Aiπ^(s))(m^1(τ,s,Xi)m¯1(τ,s,Xi))1{Si=s}\displaystyle+\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\frac{1}{\hat{\pi}(s)}\sum_{i=1}^{n}(A_{i}-\hat{\pi}(s))\left(\widehat{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s,X_{i})\right)1\{S_{i}=s\}
=1ns𝒮i=1nAiπ(s)(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s}\displaystyle=\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\sum_{i=1}^{n}\frac{A_{i}}{\pi(s)}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}
1ni=1n(m¯1(τ,Si,Xi)m¯1(τ,Si))\displaystyle-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(\overline{m}_{1}(\tau,S_{i},X_{i})-\overline{m}_{1}(\tau,S_{i})\right)
+1ns𝒮(π(s)π^(s)π^(s)π(s))(i=1nAi(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s})\displaystyle+\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\left(\frac{\pi(s)-\hat{\pi}(s)}{\hat{\pi}(s)\pi(s)}\right)\left(\sum_{i=1}^{n}A_{i}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}\right)
+1ns𝒮1π^(s)i=1n(Aiπ^(s))(m^1(τ,s,Xi)m¯1(τ,s,Xi))1{Si=s}\displaystyle+\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\frac{1}{\hat{\pi}(s)}\sum_{i=1}^{n}(A_{i}-\hat{\pi}(s))\left(\widehat{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s,X_{i})\right)1\{S_{i}=s\}
1ns𝒮i=1nAiπ(s)(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s}\displaystyle\equiv\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\sum_{i=1}^{n}\frac{A_{i}}{\pi(s)}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}
1ni=1n(m¯1(τ,Si,Xi)m¯1(τ,Si))+R2(τ),\displaystyle-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(\overline{m}_{1}(\tau,S_{i},X_{i})-\overline{m}_{1}(\tau,S_{i})\right)+R_{2}(\tau), (E.2)

where the second equality holds because

s𝒮i[n]Aiπ^(s)m¯1(τ,s)1{Si=s}=s𝒮n(s)m¯1(τ,s)=i=1nm¯1(τ,Si).\displaystyle\sum_{s\in\mathcal{S}}\sum_{i\in[n]}\frac{A_{i}}{\hat{\pi}(s)}\overline{m}_{1}(\tau,s)1\{S_{i}=s\}=\sum_{s\in\mathcal{S}}n(s)\overline{m}_{1}(\tau,s)=\sum_{i=1}^{n}\overline{m}_{1}(\tau,S_{i}).
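This identity is exact in finite samples because $\sum_{i}A_{i}1\{S_{i}=s\}=n_{1}(s)=\hat{\pi}(s)n(s)$. A quick numerical check (hypothetical data; `mbar` stands in for the stratum-level values $\overline{m}_{1}(\tau,s)$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
S = rng.integers(0, 3, size=n)
A = rng.integers(0, 2, size=n)
mbar = rng.normal(size=3)   # hypothetical values of m-bar_1(tau, s), s = 0, 1, 2

n_s = np.array([np.sum(S == s) for s in range(3)])
pi_hat = np.array([A[S == s].mean() for s in range(3)])

# sum_s sum_i (A_i / pi-hat(s)) m-bar_1(tau, s) 1{S_i = s}
lhs = sum(np.sum(A[S == s]) / pi_hat[s] * mbar[s] for s in range(3))
mid = np.sum(n_s * mbar)    # sum_s n(s) m-bar_1(tau, s)
rhs = np.sum(mbar[S])       # sum_i m-bar_1(tau, S_i)
```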

For the first term of R2(τ)R_{2}(\tau), we have

supτΥ|1ns𝒮(π(s)π^(s)π^(s)π(s))(i=1nAi(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s})|\displaystyle\sup_{\tau\in\Upsilon}\left|\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\left(\frac{\pi(s)-\hat{\pi}(s)}{\hat{\pi}(s)\pi(s)}\right)\left(\sum_{i=1}^{n}A_{i}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}\right)\right|
s𝒮|Dn(s)n1(s)π(s)|supτΥ,s𝒮|1ni=1nAi1{Si=s}(m¯1(τ,s,Xi)m¯1(τ,s))|.\displaystyle\leq\sum_{s\in\mathcal{S}}\left|\frac{D_{n}(s)}{n_{1}(s)\pi(s)}\right|\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))\right|.

Assumption 3 implies

={m¯1(τ,s,Xi)m¯1(τ,s):τΥ}\displaystyle\mathcal{F}=\{\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s):\tau\in\Upsilon\}

is of the VC-type with fixed coefficients (α,v)(\alpha,v) and an envelope FiF_{i} such that 𝔼(|Fi|d|Si=s)<\mathbb{E}(|F_{i}|^{d}|S_{i}=s)<\infty for d>2d>2. Therefore,

supτΥ,s𝒮|1ni=1nAi1{Si=s}(m¯1(τ,s,Xi)m¯1(τ,s))|=Op(n1/2).\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))\right|=O_{p}(n^{-1/2}).

In addition, as shown above, $D_{n}(s)/n(s)=o_{p}(1)$ and $n(s)/n_{1}(s)\stackrel{p}{\longrightarrow}1/\pi(s)<\infty$. Therefore, we have

supτΥ|1ns𝒮(π(s)π^(s)π^(s)π(s))(i=1nAi(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s})|=op(1).\displaystyle\sup_{\tau\in\Upsilon}\left|\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\left(\frac{\pi(s)-\hat{\pi}(s)}{\hat{\pi}(s)\pi(s)}\right)\left(\sum_{i=1}^{n}A_{i}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}\right)\right|=o_{p}(1).

Recall Δ¯1(τ,s,Xi)=m^1(τ,s,Xi)m¯1(τ,s,Xi)\overline{\Delta}_{1}(\tau,s,X_{i})=\widehat{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s,X_{i}). Then, for the second term of R2(τ)R_{2}(\tau), we have

\sup_{\tau\in\Upsilon}\left|\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\frac{1}{\hat{\pi}(s)}\sum_{i=1}^{n}(A_{i}-\hat{\pi}(s))\overline{\Delta}_{1}(\tau,s,X_{i})1\{S_{i}=s\}\right|
\leq\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}n_{0}(s)\sup_{\tau\in\Upsilon}\left|\frac{\sum_{i\in I_{1}(s)}\overline{\Delta}_{1}(\tau,s,X_{i})}{n_{1}(s)}-\frac{\sum_{i\in I_{0}(s)}\overline{\Delta}_{1}(\tau,s,X_{i})}{n_{0}(s)}\right|=o_{p}(1),

where the last equality holds by Assumption 3(i). Therefore, we have

\sup_{\tau\in\Upsilon}|R_{2}(\tau)|=o_{p}(1).
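The bound on the second term of $R_{2}(\tau)$ rests on the exact finite-sample rearrangement $\hat{\pi}(s)^{-1}\sum_{i}(A_{i}-\hat{\pi}(s))\overline{\Delta}_{1}(\tau,s,X_{i})1\{S_{i}=s\}=n_{0}(s)\bigl(n_{1}(s)^{-1}\sum_{i\in I_{1}(s)}\overline{\Delta}_{1}-n_{0}(s)^{-1}\sum_{i\in I_{0}(s)}\overline{\Delta}_{1}\bigr)$, which can be checked numerically (hypothetical data; `delta` stands in for $\overline{\Delta}_{1}(\tau,s,X_{i})$ at a fixed $\tau$):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
S = rng.integers(0, 2, size=n)
A = rng.integers(0, 2, size=n)
delta = rng.normal(size=n)   # stand-in for Delta-bar_1(tau, s, X_i)

lhs_vals, rhs_vals = [], []
for s in range(2):
    idx = np.flatnonzero(S == s)
    a, d = A[idx], delta[idx]
    n1, n0 = a.sum(), (1 - a).sum()
    pi_hat = n1 / idx.size
    # (1 / pi-hat(s)) * sum_i (A_i - pi-hat(s)) Delta-bar 1{S_i = s}
    lhs_vals.append(np.sum((a - pi_hat) * d) / pi_hat)
    # n_0(s) * (treated-group mean minus control-group mean of Delta-bar)
    rhs_vals.append(n0 * (d[a == 1].mean() - d[a == 0].mean()))
```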

Combining (E.1) and (E.2), we have

L1,n(τ)\displaystyle L_{1,n}(\tau) =s𝒮1ni=1nAi1{Si=s}[ηi,1(τ,s)π(s)+(11π(s))(m¯1(τ,s,Xi)m¯1(τ,s))]\displaystyle=\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}\left[\frac{\eta_{i,1}(\tau,s)}{\pi(s)}+\left(1-\frac{1}{\pi(s)}\right)\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\right]
+s𝒮1ni=1n(1Ai)1{Si=s}(m¯1(τ,s,Xi)m¯1(τ,s))\displaystyle+\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(1-A_{i})1\{S_{i}=s\}\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)
+\sum_{i=1}^{n}\frac{m_{1}(\tau,S_{i})}{\sqrt{n}}+R_{1,1}(\tau)-R_{2}(\tau).

Note by Assumption 3 that the classes of functions

{ηi,1(τ,s)π(s)+(11π(s))(m¯1(τ,s,Xi)m¯1(τ,s)):τΥ}\displaystyle\left\{\frac{\eta_{i,1}(\tau,s)}{\pi(s)}+\left(1-\frac{1}{\pi(s)}\right)\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right):\tau\in\Upsilon\right\}

and

{m¯1(τ,s,Xi)m¯1(τ,s):τΥ}\displaystyle\left\{\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s):\tau\in\Upsilon\right\}

are of the VC-type with fixed coefficients and envelopes belonging to L,dL_{\mathbb{P},d}. In addition,

𝔼[ηi,1(τ,s)π(s)+(11π(s))(m¯1(τ,s,Xi)m¯1(τ,s))|Si=s]=0\displaystyle\mathbb{E}\left[\frac{\eta_{i,1}(\tau,s)}{\pi(s)}+\left(1-\frac{1}{\pi(s)}\right)\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\bigg{|}S_{i}=s\right]=0

and

𝔼(m¯1(τ,s,Xi)m¯1(τ,s)|Si=s)=0.\displaystyle\mathbb{E}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)|S_{i}=s)=0.

Therefore, Lemma N.2 implies,

supτΥ|s𝒮1ni=1nAi1{Si=s}[ηi,1(τ,s)π(s)+(11π(s))(m¯1(τ,s,Xi)m¯1(τ,s))]|=Op(1)\displaystyle\sup_{\tau\in\Upsilon}\left|\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}\left[\frac{\eta_{i,1}(\tau,s)}{\pi(s)}+\left(1-\frac{1}{\pi(s)}\right)\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\right]\right|=O_{p}(1)

and

supτΥ|s𝒮1ni=1n(1Ai)1{Si=s}(m¯1(τ,s,Xi)m¯1(τ,s))|=Op(1).\displaystyle\sup_{\tau\in\Upsilon}\left|\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(1-A_{i})1\{S_{i}=s\}\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\right|=O_{p}(1).

This implies supτΥ|L1,n(τ)|=Op(1)\sup_{\tau\in\Upsilon}|L_{1,n}(\tau)|=O_{p}(1). Then by Kato (2009, Theorem 2), we have

n(q^1adj(τ)q1(τ))\displaystyle\sqrt{n}(\hat{q}^{adj}_{1}(\tau)-q_{1}(\tau))
=1f1(q1(τ)){s𝒮1ni=1nAi1{Si=s}[ηi,1(τ,s)π(s)+(11π(s))(m¯1(τ,s,Xi)m¯1(τ,s))]\displaystyle=\frac{1}{f_{1}(q_{1}(\tau))}\biggl{\{}\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}\left[\frac{\eta_{i,1}(\tau,s)}{\pi(s)}+\left(1-\frac{1}{\pi(s)}\right)\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\right]
+s𝒮1ni=1n(1Ai)1{Si=s}(m¯1(τ,s,Xi)m¯1(τ,s))+i=1nm1(τ,Si)n}+Rq,1(τ),\displaystyle+\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(1-A_{i})1\{S_{i}=s\}\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)+\sum_{i=1}^{n}\frac{m_{1}(\tau,S_{i})}{\sqrt{n}}\biggr{\}}+R_{q,1}(\tau),

where supτΥ|Rq,1(τ)|=op(1)\sup_{\tau\in\Upsilon}|R_{q,1}(\tau)|=o_{p}(1). Similarly, we have

n(q^0adj(τ)q0(τ))\displaystyle\sqrt{n}(\hat{q}^{adj}_{0}(\tau)-q_{0}(\tau))
=\displaystyle= 1f0(q0(τ)){s𝒮1ni=1n(1Ai)1{Si=s}[ηi,0(τ,s)1π(s)+(111π(s))(m¯0(τ,s,Xi)m¯0(τ,s))]\displaystyle\frac{1}{f_{0}(q_{0}(\tau))}\biggl{\{}\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(1-A_{i})1\{S_{i}=s\}\left[\frac{\eta_{i,0}(\tau,s)}{1-\pi(s)}+\left(1-\frac{1}{1-\pi(s)}\right)\left(\overline{m}_{0}(\tau,s,X_{i})-\overline{m}_{0}(\tau,s)\right)\right]
+s𝒮1ni=1nAi1{Si=s}(m¯0(τ,s,Xi)m¯0(τ,s))+i=1nm0(τ,Si)n}+Rq,0(τ),\displaystyle+\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}\left(\overline{m}_{0}(\tau,s,X_{i})-\overline{m}_{0}(\tau,s)\right)+\sum_{i=1}^{n}\frac{m_{0}(\tau,S_{i})}{\sqrt{n}}\biggr{\}}+R_{q,0}(\tau),

where ηi,0(τ,s)=τ1{Yi(0)q0(τ)}m0(τ,s)\eta_{i,0}(\tau,s)=\tau-1\{Y_{i}(0)\leq q_{0}(\tau)\}-m_{0}(\tau,s) and supτΥ|Rq,0(τ)|=op(1)\sup_{\tau\in\Upsilon}|R_{q,0}(\tau)|=o_{p}(1). Taking the difference of the above two displays gives

n(q^adj(τ)q(τ))\displaystyle\sqrt{n}(\hat{q}^{adj}(\tau)-q(\tau))
=s𝒮1ni=1nAi1{Si=s}[ηi,1(τ,s)(1π(s))(m¯1(τ,s,Xi)m¯1(τ,s))π(s)f1(q1(τ))(m¯0(τ,s,Xi)m¯0(τ,s))f0(q0(τ))]\displaystyle=\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}\left[\frac{\eta_{i,1}(\tau,s)-(1-\pi(s))\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)}{\pi(s)f_{1}(q_{1}(\tau))}-\frac{\left(\overline{m}_{0}(\tau,s,X_{i})-\overline{m}_{0}(\tau,s)\right)}{f_{0}(q_{0}(\tau))}\right]
s𝒮1ni=1n(1Ai)1{Si=s}[ηi,0(τ,s)π(s)(m¯0(τ,s,Xi)m¯0(τ,s))(1π(s))f0(q0(τ))(m¯1(τ,s,Xi)m¯1(τ,s))f1(q1(τ))]\displaystyle-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(1-A_{i})1\{S_{i}=s\}\left[\frac{\eta_{i,0}(\tau,s)-\pi(s)\left(\overline{m}_{0}(\tau,s,X_{i})-\overline{m}_{0}(\tau,s)\right)}{(1-\pi(s))f_{0}(q_{0}(\tau))}-\frac{\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)}{f_{1}(q_{1}(\tau))}\right]
+1ni=1n(m1(τ,Si)f1(q1(τ))m0(τ,Si)f0(q0(τ)))+Rq(τ)\displaystyle+\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(\frac{m_{1}(\tau,S_{i})}{f_{1}(q_{1}(\tau))}-\frac{m_{0}(\tau,S_{i})}{f_{0}(q_{0}(\tau))}\right)+R_{q}(\tau)
s𝒮1ni=1nAi1{Si=s}ϕ1(τ,s,Yi(1),Xi)s𝒮1ni=1n(1Ai)1{Si=s}ϕ0(τ,s,Yi(0),Xi)\displaystyle\equiv\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}\phi_{1}(\tau,s,Y_{i}(1),X_{i})-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(1-A_{i})1\{S_{i}=s\}\phi_{0}(\tau,s,Y_{i}(0),X_{i})
+1ni=1nϕs(τ,Si)+Rq(τ),\displaystyle+\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\phi_{s}(\tau,S_{i})+R_{q}(\tau),

where supτΥ|Rq(τ)|=op(1)\sup_{\tau\in\Upsilon}|R_{q}(\tau)|=o_{p}(1). Lemma N.3 shows that, uniformly over τΥ\tau\in\Upsilon,

n(q^adj(τ)q(τ))(τ),\displaystyle\sqrt{n}(\hat{q}^{adj}(\tau)-q(\tau))\rightsquigarrow\mathcal{B}(\tau),

where (τ)\mathcal{B}(\tau) is a Gaussian process with covariance kernel

Σ(τ,τ)\displaystyle\Sigma(\tau,\tau^{\prime}) =𝔼π(Si)ϕ1(τ,Si,Yi(1),Xi)ϕ1(τ,Si,Yi(1),Xi)\displaystyle=\mathbb{E}\pi(S_{i})\phi_{1}(\tau,S_{i},Y_{i}(1),X_{i})\phi_{1}(\tau^{\prime},S_{i},Y_{i}(1),X_{i})
+𝔼(1π(Si))ϕ0(τ,Si,Yi(0),Xi)ϕ0(τ,Si,Yi(0),Xi)\displaystyle+\mathbb{E}(1-\pi(S_{i}))\phi_{0}(\tau,S_{i},Y_{i}(0),X_{i})\phi_{0}(\tau^{\prime},S_{i},Y_{i}(0),X_{i})
+𝔼ϕs(τ,Si)ϕs(τ,Si).\displaystyle+\mathbb{E}\phi_{s}(\tau,S_{i})\phi_{s}(\tau^{\prime},S_{i}).

For the second result in Theorem 3, we denote

δa(τ,Si,Xi)=ma(τ,Si,Xi)ma(τ,Si)andδ¯a(τ,Si,Xi)=(m¯a(τ,Si,Xi)m¯a(τ,Si)),a=0,1.\displaystyle\delta_{a}(\tau,S_{i},X_{i})=m_{a}(\tau,S_{i},X_{i})-m_{a}(\tau,S_{i})\quad\text{and}\quad\overline{\delta}_{a}(\tau,S_{i},X_{i})=(\overline{m}_{a}(\tau,S_{i},X_{i})-\overline{m}_{a}(\tau,S_{i})),\leavevmode\nobreak\ a=0,1. (E.3)

Then

𝔼π(Si)ϕ1(τ,Si,Yi(1),Xi)ϕ1(τ,Si,Yi(1),Xi)\displaystyle\mathbb{E}\pi(S_{i})\phi_{1}(\tau,S_{i},Y_{i}(1),X_{i})\phi_{1}(\tau^{\prime},S_{i},Y_{i}(1),X_{i})
=𝔼1π(Si)[(τ1{Yi(1)q1(τ)}m1(τ,Si,Xi))f1(q1(τ))][(τ1{Yi(1)q1(τ)}m1(τ,Si,Xi))f1(q1(τ))]\displaystyle=\mathbb{E}\frac{1}{\pi(S_{i})}\left[\frac{(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\}-m_{1}(\tau,S_{i},X_{i}))}{f_{1}(q_{1}(\tau))}\right]\left[\frac{(\tau^{\prime}-1\{Y_{i}(1)\leq q_{1}(\tau^{\prime})\}-m_{1}(\tau^{\prime},S_{i},X_{i}))}{f_{1}(q_{1}(\tau^{\prime}))}\right]
+𝔼π(Si)[δ1(τ,Si,Xi)δ¯1(τ,Si,Xi)π(Si)f1(q1(τ))+(δ¯1(τ,Si,Xi)f1(q1(τ))δ¯0(τ,Si,Xi)f0(q0(τ)))]\displaystyle+\mathbb{E}\pi(S_{i})\left[\frac{\delta_{1}(\tau,S_{i},X_{i})-\overline{\delta}_{1}(\tau,S_{i},X_{i})}{\pi(S_{i})f_{1}(q_{1}(\tau))}+\left(\frac{\overline{\delta}_{1}(\tau,S_{i},X_{i})}{f_{1}(q_{1}(\tau))}-\frac{\overline{\delta}_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right)\right]
×[δ1(τ,Si,Xi)δ¯1(τ,Si,Xi)π(Si)f1(q1(τ))+(δ¯1(τ,Si,Xi)f1(q1(τ))δ¯0(τ,Si,Xi)f0(q0(τ)))],\displaystyle\times\left[\frac{\delta_{1}(\tau^{\prime},S_{i},X_{i})-\overline{\delta}_{1}(\tau^{\prime},S_{i},X_{i})}{\pi(S_{i})f_{1}(q_{1}(\tau^{\prime}))}+\left(\frac{\overline{\delta}_{1}(\tau^{\prime},S_{i},X_{i})}{f_{1}(q_{1}(\tau^{\prime}))}-\frac{\overline{\delta}_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right)\right],
\mathbb{E}(1-\pi(S_{i}))\phi_{0}(\tau,S_{i},Y_{i}(0),X_{i})\phi_{0}(\tau^{\prime},S_{i},Y_{i}(0),X_{i})
=\mathbb{E}\frac{1}{1-\pi(S_{i})}\left[\frac{\tau-1\{Y_{i}(0)\leq q_{0}(\tau)\}-m_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right]\left[\frac{\tau^{\prime}-1\{Y_{i}(0)\leq q_{0}(\tau^{\prime})\}-m_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right]
+𝔼(1π(Si))[δ0(τ,Si,Xi)δ¯0(τ,Si,Xi)(1π(Si))f0(q0(τ))(δ¯1(τ,Si,Xi)f1(q1(τ))δ¯0(τ,Si,Xi)f0(q0(τ)))]\displaystyle+\mathbb{E}(1-\pi(S_{i}))\left[\frac{\delta_{0}(\tau,S_{i},X_{i})-\overline{\delta}_{0}(\tau,S_{i},X_{i})}{(1-\pi(S_{i}))f_{0}(q_{0}(\tau))}-\left(\frac{\overline{\delta}_{1}(\tau,S_{i},X_{i})}{f_{1}(q_{1}(\tau))}-\frac{\overline{\delta}_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right)\right]
×[δ0(τ,Si,Xi)δ¯0(τ,Si,Xi)(1π(Si))f0(q0(τ))(δ¯1(τ,Si,Xi)f1(q1(τ))δ¯0(τ,Si,Xi)f0(q0(τ)))],\displaystyle\times\left[\frac{\delta_{0}(\tau^{\prime},S_{i},X_{i})-\overline{\delta}_{0}(\tau^{\prime},S_{i},X_{i})}{(1-\pi(S_{i}))f_{0}(q_{0}(\tau^{\prime}))}-\left(\frac{\overline{\delta}_{1}(\tau^{\prime},S_{i},X_{i})}{f_{1}(q_{1}(\tau^{\prime}))}-\frac{\overline{\delta}_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right)\right],

and

𝔼ϕs(τ,Si)ϕs(τ,Si)\displaystyle\mathbb{E}\phi_{s}(\tau,S_{i})\phi_{s}(\tau^{\prime},S_{i}) =𝔼(m1(τ,Si)f1(q1(τ))m0(τ,Si)f0(q0(τ)))(m1(τ,Si)f1(q1(τ))m0(τ,Si)f0(q0(τ)))\displaystyle=\mathbb{E}\left(\frac{m_{1}(\tau,S_{i})}{f_{1}(q_{1}(\tau))}-\frac{m_{0}(\tau,S_{i})}{f_{0}(q_{0}(\tau))}\right)\left(\frac{m_{1}(\tau^{\prime},S_{i})}{f_{1}(q_{1}(\tau^{\prime}))}-\frac{m_{0}(\tau^{\prime},S_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right)
=𝔼(m1(τ,Si,Xi)f1(q1(τ))m0(τ,Si,Xi)f0(q0(τ)))(m1(τ,Si,Xi)f1(q1(τ))m0(τ,Si,Xi)f0(q0(τ)))\displaystyle=\mathbb{E}\left(\frac{m_{1}(\tau,S_{i},X_{i})}{f_{1}(q_{1}(\tau))}-\frac{m_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right)\left(\frac{m_{1}(\tau^{\prime},S_{i},X_{i})}{f_{1}(q_{1}(\tau^{\prime}))}-\frac{m_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right)
𝔼(δ1(τ,Si,Xi)f1(q1(τ))δ0(τ,Si,Xi)f0(q0(τ)))(δ1(τ,Si,Xi)f1(q1(τ))δ0(τ,Si,Xi)f0(q0(τ))).\displaystyle-\mathbb{E}\left(\frac{\delta_{1}(\tau,S_{i},X_{i})}{f_{1}(q_{1}(\tau))}-\frac{\delta_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right)\left(\frac{\delta_{1}(\tau^{\prime},S_{i},X_{i})}{f_{1}(q_{1}(\tau^{\prime}))}-\frac{\delta_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right).

Let

Σ(τ,τ)\displaystyle\Sigma^{*}(\tau,\tau^{\prime})
=𝔼1π(Si)[(τ1{Yi(1)q1(τ)}m1(τ,Si,Xi))f1(q1(τ))][(τ1{Yi(1)q1(τ)}m1(τ,Si,Xi))f1(q1(τ))]\displaystyle=\mathbb{E}\frac{1}{\pi(S_{i})}\left[\frac{(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\}-m_{1}(\tau,S_{i},X_{i}))}{f_{1}(q_{1}(\tau))}\right]\left[\frac{(\tau^{\prime}-1\{Y_{i}(1)\leq q_{1}(\tau^{\prime})\}-m_{1}(\tau^{\prime},S_{i},X_{i}))}{f_{1}(q_{1}(\tau^{\prime}))}\right]
+\mathbb{E}\frac{1}{1-\pi(S_{i})}\left[\frac{\tau-1\{Y_{i}(0)\leq q_{0}(\tau)\}-m_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right]\left[\frac{\tau^{\prime}-1\{Y_{i}(0)\leq q_{0}(\tau^{\prime})\}-m_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right]
+𝔼(m1(τ,Si,Xi)f1(q1(τ))m0(τ,Si,Xi)f0(q0(τ)))(m1(τ,Si,Xi)f1(q1(τ))m0(τ,Si,Xi)f0(q0(τ))),\displaystyle+\mathbb{E}\left(\frac{m_{1}(\tau,S_{i},X_{i})}{f_{1}(q_{1}(\tau))}-\frac{m_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right)\left(\frac{m_{1}(\tau^{\prime},S_{i},X_{i})}{f_{1}(q_{1}(\tau^{\prime}))}-\frac{m_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right),

which does not rely on the working models. Then,

Σ(τ,τ)Σ(τ,τ)\displaystyle\Sigma(\tau,\tau^{\prime})-\Sigma^{*}(\tau,\tau^{\prime})
=𝔼π(Si)[δ1(τ,Si,Xi)δ¯1(τ,Si,Xi)π(Si)f1(q1(τ))+(δ¯1(τ,Si,Xi)f1(q1(τ))δ¯0(τ,Si,Xi)f0(q0(τ)))]\displaystyle=\mathbb{E}\pi(S_{i})\left[\frac{\delta_{1}(\tau,S_{i},X_{i})-\overline{\delta}_{1}(\tau,S_{i},X_{i})}{\pi(S_{i})f_{1}(q_{1}(\tau))}+\left(\frac{\overline{\delta}_{1}(\tau,S_{i},X_{i})}{f_{1}(q_{1}(\tau))}-\frac{\overline{\delta}_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right)\right]
×[δ1(τ,Si,Xi)δ¯1(τ,Si,Xi)π(Si)f1(q1(τ))+(δ¯1(τ,Si,Xi)f1(q1(τ))δ¯0(τ,Si,Xi)f0(q0(τ)))]\displaystyle\times\left[\frac{\delta_{1}(\tau^{\prime},S_{i},X_{i})-\overline{\delta}_{1}(\tau^{\prime},S_{i},X_{i})}{\pi(S_{i})f_{1}(q_{1}(\tau^{\prime}))}+\left(\frac{\overline{\delta}_{1}(\tau^{\prime},S_{i},X_{i})}{f_{1}(q_{1}(\tau^{\prime}))}-\frac{\overline{\delta}_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right)\right]
+𝔼(1π(Si))[δ0(τ,Si,Xi)δ¯0(τ,Si,Xi)(1π(Si))f0(q0(τ))(δ¯1(τ,Si,Xi)f1(q1(τ))δ¯0(τ,Si,Xi)f0(q0(τ)))]\displaystyle+\mathbb{E}(1-\pi(S_{i}))\left[\frac{\delta_{0}(\tau,S_{i},X_{i})-\overline{\delta}_{0}(\tau,S_{i},X_{i})}{(1-\pi(S_{i}))f_{0}(q_{0}(\tau))}-\left(\frac{\overline{\delta}_{1}(\tau,S_{i},X_{i})}{f_{1}(q_{1}(\tau))}-\frac{\overline{\delta}_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right)\right]
×[δ0(τ,Si,Xi)δ¯0(τ,Si,Xi)(1π(Si))f0(q0(τ))(δ¯1(τ,Si,Xi)f1(q1(τ))δ¯0(τ,Si,Xi)f0(q0(τ)))]\displaystyle\times\left[\frac{\delta_{0}(\tau^{\prime},S_{i},X_{i})-\overline{\delta}_{0}(\tau^{\prime},S_{i},X_{i})}{(1-\pi(S_{i}))f_{0}(q_{0}(\tau^{\prime}))}-\left(\frac{\overline{\delta}_{1}(\tau^{\prime},S_{i},X_{i})}{f_{1}(q_{1}(\tau^{\prime}))}-\frac{\overline{\delta}_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right)\right]
𝔼(δ1(τ,Si,Xi)f1(q1(τ))δ0(τ,Si,Xi)f0(q0(τ)))(δ1(τ,Si,Xi)f1(q1(τ))δ0(τ,Si,Xi)f0(q0(τ)))\displaystyle-\mathbb{E}\left(\frac{\delta_{1}(\tau,S_{i},X_{i})}{f_{1}(q_{1}(\tau))}-\frac{\delta_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right)\left(\frac{\delta_{1}(\tau^{\prime},S_{i},X_{i})}{f_{1}(q_{1}(\tau^{\prime}))}-\frac{\delta_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right)
=𝔼[1π(Si)π(Si)δ1(τ,Si,Xi)δ¯1(τ,Si,Xi)f1(q1(τ))+π(Si)1π(Si)δ0(τ,Si,Xi)δ¯0(τ,Si,Xi)f0(q0(τ))]\displaystyle=\mathbb{E}\left[\sqrt{\frac{1-\pi(S_{i})}{\pi(S_{i})}}\frac{\delta_{1}(\tau,S_{i},X_{i})-\overline{\delta}_{1}(\tau,S_{i},X_{i})}{f_{1}(q_{1}(\tau))}+\sqrt{\frac{\pi(S_{i})}{1-\pi(S_{i})}}\frac{\delta_{0}(\tau,S_{i},X_{i})-\overline{\delta}_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right]
×[1π(Si)π(Si)δ1(τ,Si,Xi)δ¯1(τ,Si,Xi)f1(q1(τ))+π(Si)1π(Si)δ0(τ,Si,Xi)δ¯0(τ,Si,Xi)f0(q0(τ))]\displaystyle\times\left[\sqrt{\frac{1-\pi(S_{i})}{\pi(S_{i})}}\frac{\delta_{1}(\tau^{\prime},S_{i},X_{i})-\overline{\delta}_{1}(\tau^{\prime},S_{i},X_{i})}{f_{1}(q_{1}(\tau^{\prime}))}+\sqrt{\frac{\pi(S_{i})}{1-\pi(S_{i})}}\frac{\delta_{0}(\tau^{\prime},S_{i},X_{i})-\overline{\delta}_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right]
𝔼ui(τ)ui(τ),\displaystyle\equiv\mathbb{E}u_{i}(\tau)u_{i}(\tau^{\prime}),

where

ui(τ)=1π(Si)π(Si)(δ1(τ,Si,Xi)δ¯1(τ,Si,Xi))f1(q1(τ))+π(Si)1π(Si)(δ0(τ,Si,Xi)δ¯0(τ,Si,Xi))f0(q0(τ)).u_{i}(\tau)=\sqrt{\frac{1-\pi(S_{i})}{\pi(S_{i})}}\frac{\left(\delta_{1}(\tau,S_{i},X_{i})-\overline{\delta}_{1}(\tau,S_{i},X_{i})\right)}{f_{1}(q_{1}(\tau))}+\sqrt{\frac{\pi(S_{i})}{1-\pi(S_{i})}}\frac{\left(\delta_{0}(\tau,S_{i},X_{i})-\overline{\delta}_{0}(\tau,S_{i},X_{i})\right)}{f_{0}(q_{0}(\tau))}.

Further, denote $\vec{u}_{i}=(u_{i}(\tau_{1}),\cdots,u_{i}(\tau_{K}))^{\top}$, write the asymptotic variance-covariance matrix of $(\hat{q}^{adj}(\tau_{1}),\cdots,\hat{q}^{adj}(\tau_{K}))$ as $[\Sigma_{kl}]_{k,l\in[K]}$, and write the optimal variance-covariance matrix as $[\Sigma^{*}_{kl}]_{k,l\in[K]}$. We have

[Σkl]k,l[K][Σkl]k,l[K]=[𝔼ui(τk)ui(τl)]k,l[K]=𝔼uiui,\displaystyle[\Sigma_{kl}]_{k,l\in[K]}-[\Sigma^{*}_{kl}]_{k,l\in[K]}=[\mathbb{E}u_{i}(\tau_{k})u_{i}(\tau_{l})]_{k,l\in[K]}=\mathbb{E}\vec{u}_{i}\vec{u}_{i}^{\top},

which is positive semidefinite. In addition, 𝔼uiui=0\mathbb{E}\vec{u}_{i}\vec{u}_{i}^{\top}=0 if m¯a(τ,s,x)=ma(τ,s,x)\overline{m}_{a}(\tau,s,x)=m_{a}(\tau,s,x) for a=0,1a=0,1, τ{τ1,,τK}\tau\in\{\tau_{1},\cdots,\tau_{K}\}, and (s,x)(s,x) in the joint support of (Si,Xi)(S_{i},X_{i}). This concludes the proof.
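The positive semidefiniteness of $[\Sigma_{kl}]-[\Sigma^{*}_{kl}]=\mathbb{E}\vec{u}_{i}\vec{u}_{i}^{\top}$ is automatic because any matrix of Gram form $\mathbb{E}\vec{u}\vec{u}^{\top}$ is positive semidefinite. The sketch below illustrates this with a sample analogue (arbitrary correlated draws standing in for $\vec{u}_{i}$, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 5000, 3
# Draws of u_i = (u_i(tau_1), ..., u_i(tau_K)), correlated across tau's.
U = rng.normal(size=(n, K)) @ rng.normal(size=(K, K))
G = U.T @ U / n                 # sample analogue of E[u_i u_i^T]
eigs = np.linalg.eigvalsh(G)    # all eigenvalues are nonnegative (up to fp error)

# Correct specification: u_i(tau) = 0 identically, so the variance gap vanishes.
U0 = np.zeros_like(U)
G0 = U0.T @ U0 / n
```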

Appendix F Proof of Theorem 5

$n_{a}^{w}(s)$: for $a\in\{0,1\}$ and $s\in\mathcal{S}$, $n_{1}^{w}(s)=\sum_{i=1}^{n}\xi_{i}A_{i}1\{S_{i}=s\}$ and $n_{0}^{w}(s)=\sum_{i=1}^{n}\xi_{i}(1-A_{i})1\{S_{i}=s\}$, where $\{\xi_{i}\}_{i\in[n]}$ are the bootstrap weights.
$n^{w}(s)$: for $s\in\mathcal{S}$, $n^{w}(s)=\sum_{i=1}^{n}\xi_{i}1\{S_{i}=s\}$.
$\hat{\pi}^{w}(s)$: for $s\in\mathcal{S}$, $\hat{\pi}^{w}(s)=n_{1}^{w}(s)/n^{w}(s)$.
$\hat{q}_{a}^{w}(\tau)$: for $a\in\{0,1\}$ and $\tau\in\Upsilon$, the bootstrap estimator of $q_{a}(\tau)$ with a generic regression adjustment.
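As an illustration of the bootstrap notation (hypothetical data, with exponential multipliers as one admissible choice of $\xi_{i}$), the sketch below computes $n^{w}(s)$, $n_{1}^{w}(s)$, $\hat{\pi}^{w}(s)$, and $D_{n}^{w}(s)$; note the exact identity $D_{n}^{w}(s)=(\hat{\pi}^{w}(s)-\pi(s))n^{w}(s)$ used later in the proof.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 800
S = rng.integers(0, 2, size=n)
A = rng.integers(0, 2, size=n)
pi = 0.5                          # here pi(s) = 1/2 in both strata
xi = rng.exponential(size=n)      # hypothetical positive bootstrap multipliers

nw_s = np.array([xi[S == s].sum() for s in range(2)])                 # n^w(s)
n1w_s = np.array([xi[(S == s) & (A == 1)].sum() for s in range(2)])   # n_1^w(s)
piw_hat = n1w_s / nw_s                                                # pi-hat^w(s)
Dw_n = np.array([np.sum(xi * (A - pi) * (S == s)) for s in range(2)]) # D_n^w(s)
```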

We focus on deriving the linear expansion of $\hat{q}^{w}_{1}(\tau)$. Let

Lnw(u,τ)=\displaystyle L_{n}^{w}(u,\tau)= i=1nξi[Aiπ^w(Si)[ρτ(Yiq1(τ)u/n)ρτ(Yiq1(τ))]+(Aiπ^w(Si))π^w(Si)nm^1(τ,Si,Xi)u]\displaystyle\sum_{i=1}^{n}\xi_{i}\left[\frac{A_{i}}{\hat{\pi}^{w}(S_{i})}\left[\rho_{\tau}(Y_{i}-q_{1}(\tau)-u/\sqrt{n})-\rho_{\tau}(Y_{i}-q_{1}(\tau))\right]+\frac{(A_{i}-\hat{\pi}^{w}(S_{i}))}{\hat{\pi}^{w}(S_{i})\sqrt{n}}\widehat{m}_{1}(\tau,S_{i},X_{i})u\right]
\displaystyle\equiv L1,nw(τ)u+L2,nw(u,τ),\displaystyle-L_{1,n}^{w}(\tau)u+L_{2,n}^{w}(u,\tau),

where

L1,nw(τ)=1ni=1nξi[Aiπ^w(Si)(τ1{Yiq1(τ)})(Aiπ^w(Si))π^w(Si)m^1(τ,Si,Xi)],\displaystyle L_{1,n}^{w}(\tau)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}\left[\frac{A_{i}}{\hat{\pi}^{w}(S_{i})}(\tau-1\{Y_{i}\leq q_{1}(\tau)\})-\frac{(A_{i}-\hat{\pi}^{w}(S_{i}))}{\hat{\pi}^{w}(S_{i})}\widehat{m}_{1}(\tau,S_{i},X_{i})\right],

and

L_{2,n}^{w}(u,\tau)=\sum_{i=1}^{n}\frac{\xi_{i}A_{i}}{\hat{\pi}^{w}(S_{i})}\int_{0}^{u/\sqrt{n}}\left(1\{Y_{i}\leq q_{1}(\tau)+v\}-1\{Y_{i}\leq q_{1}(\tau)\}\right)dv.

By change of variables, we have

n(q^1w(τ)q1(τ))=argminuLnw(u,τ).\displaystyle\sqrt{n}(\hat{q}_{1}^{w}(\tau)-q_{1}(\tau))=\operatorname*{arg\,min}_{u}L_{n}^{w}(u,\tau).
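The bootstrap estimator can be sketched exactly as the original one: reweight every summand by $\xi_{i}$ and minimize over a grid. Again this is a hypothetical illustration with $\widehat{m}_{1}\equiv 0$ and, for brevity, a single stratum, so $\hat{\pi}^{w}$ is a scalar:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000
A = rng.integers(0, 2, size=n)
Y = rng.normal(size=n) + A * 0.5
xi = rng.exponential(size=n)            # multiplier bootstrap weights
tau = 0.5

pi_w = np.sum(xi * A) / np.sum(xi)      # pi-hat^w with a single stratum

def rho(u, tau):
    return u * (tau - (u <= 0))

def objective_w(q):
    # xi-weighted analogue of the original check-function objective (m-hat = 0)
    return np.sum(xi * (A / pi_w) * rho(Y - q, tau))

grid = np.linspace(-1, 2, 3001)
q1_w = grid[np.argmin([objective_w(q) for q in grid])]
```

The bootstrap draw $\hat{q}_{1}^{w}(\tau)$ fluctuates around $\hat{q}_{1}^{adj}(\tau)$, and repeating this over many weight draws delivers the multiplier bootstrap distribution.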

Note that $L_{2,n}^{w}(u,\tau)$ is exactly the same as the corresponding term considered in the proof of Theorem 3.2 in Zhang and Zheng (2020), and by their result we have

\sup_{\tau\in\Upsilon}\left|L_{2,n}^{w}(u,\tau)-\frac{f_{1}(q_{1}(\tau))u^{2}}{2}\right|=o_{p}(1).

Next consider L1,nw(τ)L_{1,n}^{w}(\tau). Recall m1(τ,s)=𝔼(m1(τ,Si,Xi)|Si=s)m_{1}(\tau,s)=\mathbb{E}(m_{1}(\tau,S_{i},X_{i})|S_{i}=s) and ηi,1(τ,s)=τ1{Yiq1(τ)}m1(τ,s)\eta_{i,1}(\tau,s)=\tau-1\{Y_{i}\leq q_{1}(\tau)\}-m_{1}(\tau,s). Denote

L1,nw(τ)\displaystyle L_{1,n}^{w}(\tau) =1ni=1nξi[Aiπ^w(Si)(τ1{Yiq1(τ)})]1ni=1nξi[(Aiπ^w(Si))π^w(Si)m^1(τ,Si,Xi)]\displaystyle=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}\left[\frac{A_{i}}{\hat{\pi}^{w}(S_{i})}(\tau-1\{Y_{i}\leq q_{1}(\tau)\})\right]-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}\left[\frac{(A_{i}-\hat{\pi}^{w}(S_{i}))}{\hat{\pi}^{w}(S_{i})}\widehat{m}_{1}(\tau,S_{i},X_{i})\right]
L1,1,nw(τ)L1,2,nw(τ).\displaystyle\equiv L_{1,1,n}^{w}(\tau)-L_{1,2,n}^{w}(\tau).

First, note that

L1,1,nw(τ)=1ni=1ns𝒮ξiAiπ(s)1{Si=s}(τ1{Yi(1)q1(τ)})\displaystyle L_{1,1,n}^{w}(\tau)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{\xi_{i}A_{i}}{\pi(s)}1\{S_{i}=s\}(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\})
i=1ns𝒮ξiAi1{Si=s}(π^w(s)π(s))nπ^w(s)π(s)(τ1{Yi(1)q1(τ)})\displaystyle-\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{\xi_{i}A_{i}1\{S_{i}=s\}(\hat{\pi}^{w}(s)-\pi(s))}{\sqrt{n}\hat{\pi}^{w}(s)\pi(s)}(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\})
=1ni=1ns𝒮ξiAiπ(s)1{Si=s}(τ1{Yi(1)q1(τ)})\displaystyle=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{\xi_{i}A_{i}}{\pi(s)}1\{S_{i}=s\}(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\})
i=1ns𝒮ξiAi1{Si=s}Dnw(s)nw(s)nπ^w(s)π(s)ηi,1(τ,s)s𝒮Dnw(s)m1(τ,s)nw(s)nπ^w(s)π(s)Dnw(s)s𝒮Dnw(s)m1(τ,s)nπ^w(s)\displaystyle-\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{\xi_{i}A_{i}1\{S_{i}=s\}D_{n}^{w}(s)}{n^{w}(s)\sqrt{n}\hat{\pi}^{w}(s)\pi(s)}\eta_{i,1}(\tau,s)-\sum_{s\in\mathcal{S}}\frac{D_{n}^{w}(s)m_{1}(\tau,s)}{n^{w}(s)\sqrt{n}\hat{\pi}^{w}(s)\pi(s)}D_{n}^{w}(s)-\sum_{s\in\mathcal{S}}\frac{D_{n}^{w}(s)m_{1}(\tau,s)}{\sqrt{n}\hat{\pi}^{w}(s)}
=s𝒮1ni=1nξiAi1{Si=s}π(s)ηi,1(τ,s)+s𝒮Dwn(s)nπ(s)m1(τ,s)+i=1nξim1(τ,Si)n\displaystyle=\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\xi_{i}A_{i}1\{S_{i}=s\}}{\pi(s)}\eta_{i,1}(\tau,s)+\sum_{s\in\mathcal{S}}\frac{D^{w}_{n}(s)}{\sqrt{n}\pi(s)}m_{1}(\tau,s)+\sum_{i=1}^{n}\frac{\xi_{i}m_{1}(\tau,S_{i})}{\sqrt{n}}
i=1ns𝒮ξiAi1{Si=s}Dnw(s)nw(s)nπ^w(s)π(s)ηi,1(τ,s)s𝒮Dnw(s)m1(τ,s)nw(s)nπ^w(s)π(s)Dwn(s)s𝒮Dwn(s)m1(τ,s)nπ^w(s)\displaystyle-\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{\xi_{i}A_{i}1\{S_{i}=s\}D_{n}^{w}(s)}{n^{w}(s)\sqrt{n}\hat{\pi}^{w}(s)\pi(s)}\eta_{i,1}(\tau,s)-\sum_{s\in\mathcal{S}}\frac{D_{n}^{w}(s)m_{1}(\tau,s)}{n^{w}(s)\sqrt{n}\hat{\pi}^{w}(s)\pi(s)}D^{w}_{n}(s)-\sum_{s\in\mathcal{S}}\frac{D^{w}_{n}(s)m_{1}(\tau,s)}{\sqrt{n}\hat{\pi}^{w}(s)}
=s𝒮1ni=1nξiAi1{Si=s}π(s)ηi,1(τ,s)+i=1nξim1(τ,Si)n+R1,1w(τ),\displaystyle=\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\xi_{i}A_{i}1\{S_{i}=s\}}{\pi(s)}\eta_{i,1}(\tau,s)+\sum_{i=1}^{n}\frac{\xi_{i}m_{1}(\tau,S_{i})}{\sqrt{n}}+R_{1,1}^{w}(\tau), (F.1)

where $D_{n}^{w}(s)=\sum_{i=1}^{n}\xi_{i}(A_{i}-\pi(S_{i}))1\{S_{i}=s\}=(\hat{\pi}^{w}(s)-\pi(s))n^{w}(s)$,

R1,1w(τ)=i=1ns𝒮ξiAi1{Si=s}Dwn(s)nw(s)nπ^w(s)π(s)ηi,1(τ,s)s𝒮Dwn(s)m1(τ,s)nw(s)nπ^w(s)π(s)Dwn(s)\displaystyle R_{1,1}^{w}(\tau)=-\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{\xi_{i}A_{i}1\{S_{i}=s\}D^{w}_{n}(s)}{n^{w}(s)\sqrt{n}\hat{\pi}^{w}(s)\pi(s)}\eta_{i,1}(\tau,s)-\sum_{s\in\mathcal{S}}\frac{D^{w}_{n}(s)m_{1}(\tau,s)}{n^{w}(s)\sqrt{n}\hat{\pi}^{w}(s)\pi(s)}D^{w}_{n}(s)
+s𝒮Dwn(s)m1(τ,s)n(1π(s)1π^w(s))=i=1ns𝒮ξiAi1{Si=s}Dwn(s)nw(s)nπ^w(s)π(s)ηi,1(τ,s).\displaystyle+\sum_{s\in\mathcal{S}}\frac{D^{w}_{n}(s)m_{1}(\tau,s)}{\sqrt{n}}\left(\frac{1}{\pi(s)}-\frac{1}{\hat{\pi}^{w}(s)}\right)=-\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{\xi_{i}A_{i}1\{S_{i}=s\}D^{w}_{n}(s)}{n^{w}(s)\sqrt{n}\hat{\pi}^{w}(s)\pi(s)}\eta_{i,1}(\tau,s).

Note that

{ξi(τ1{Yi(1)q1(τ)}m1(τ,Si)):τΥ}\displaystyle\{\xi_{i}(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\}-m_{1}(\tau,S_{i})):\tau\in\Upsilon\}

is of the VC-type with fixed coefficients (α,v)(\alpha,v) and the envelope Fi=ξiF_{i}=\xi_{i}, and

𝔼[ξi(τ1{Yi(1)q1(τ)}m1(τ,Si))|Si=s]=0.\mathbb{E}\left[\xi_{i}(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\}-m_{1}(\tau,S_{i}))|S_{i}=s\right]=0.

In addition, we can take $\sigma_{n}^{2}=\mathbb{E}(F_{i}^{2}|S_{i}=s)$, which is bounded by a constant $C<\infty$. Then, Lemma N.2 implies

supτΥ,s𝒮|1ni=1nAi1{Si=s}ξiηi,1(τ,s)|=Op(1).\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}\xi_{i}\eta_{i,1}(\tau,s)\right|=O_{p}(1).

In addition, Lemma N.4 implies maxs𝒮|Dnw(s)/nw(s)|=op(1)\max_{s\in\mathcal{S}}|D_{n}^{w}(s)/n^{w}(s)|=o_{p}(1), which further implies maxs𝒮|π^w(s)π(s)|=op(1)\max_{s\in\mathcal{S}}|\hat{\pi}^{w}(s)-\pi(s)|=o_{p}(1). Combining these results, we have

supτΥ|Rw1,1(τ)|=op(1).\displaystyle\sup_{\tau\in\Upsilon}|R^{w}_{1,1}(\tau)|=o_{p}(1).
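As an illustrative numerical sanity check (not part of the proof), the exact identity $D_{n}^{w}(s)=(\hat{\pi}^{w}(s)-\pi(s))n^{w}(s)$ that drives the remainder bound above can be verified on simulated data; the simulation design and all variable names below are hypothetical, and $\hat{\pi}^{w}(s)$ is taken to be the multiplier-weighted treated fraction in stratum $s$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
strata = [0, 1, 2]
pi = {0: 0.5, 1: 0.5, 2: 0.5}                      # target assignment probabilities pi(s)

S = rng.integers(0, 3, size=n)                     # stratum indicators S_i
A = (rng.random(n) < 0.5).astype(float)            # treatment indicators A_i
xi = rng.exponential(1.0, size=n)                  # bootstrap multiplier weights xi_i

for s in strata:
    in_s = (S == s)
    n_w = np.sum(xi[in_s])                         # n^w(s) = sum_i xi_i 1{S_i=s}
    n1_w = np.sum(xi[in_s] * A[in_s])              # n_1^w(s) = sum_i xi_i A_i 1{S_i=s}
    pi_hat_w = n1_w / n_w                          # weighted treated fraction, hat{pi}^w(s)
    D_w = np.sum(xi[in_s] * (A[in_s] - pi[s]))     # D_n^w(s)
    # exact algebraic identity: D_n^w(s) = (hat{pi}^w(s) - pi(s)) * n^w(s)
    assert np.isclose(D_w, (pi_hat_w - pi[s]) * n_w)
```

With $n$ large, the simulated $\hat{\pi}^{w}(s)$ is also close to $\pi(s)$ in every stratum, which is the content of $\max_{s}|\hat{\pi}^{w}(s)-\pi(s)|=o_{p}(1)$.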

Next, recall m¯1(τ,s)=𝔼(m¯1(q1(τ),Xi,Si)|Si=s)\overline{m}_{1}(\tau,s)=\mathbb{E}(\overline{m}_{1}(q_{1}(\tau),X_{i},S_{i})|S_{i}=s). Then

L1,2,nw\displaystyle L_{1,2,n}^{w} =1ns𝒮i=1nξiAiπ^w(s)m¯1(τ,s,Xi)1{Si=s}1ni=1nξim¯1(τ,Si,Xi)\displaystyle=\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\sum_{i=1}^{n}\frac{\xi_{i}A_{i}}{\hat{\pi}^{w}(s)}\overline{m}_{1}(\tau,s,X_{i})1\{S_{i}=s\}-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}\overline{m}_{1}(\tau,S_{i},X_{i})
+1ns𝒮1π^w(s)i=1nξi(Aiπ^w(s))(m^1(τ,s,Xi)m¯1(τ,s,Xi))1{Si=s}\displaystyle+\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\frac{1}{\hat{\pi}^{w}(s)}\sum_{i=1}^{n}\xi_{i}(A_{i}-\hat{\pi}^{w}(s))\left(\widehat{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s,X_{i})\right)1\{S_{i}=s\}
=1ns𝒮i=1nξiAiπ^w(s)(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s}\displaystyle=\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\sum_{i=1}^{n}\frac{\xi_{i}A_{i}}{\hat{\pi}^{w}(s)}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}
1ni=1nξi(m¯1(τ,Si,Xi)m¯1(τ,Si))\displaystyle-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}\left(\overline{m}_{1}(\tau,S_{i},X_{i})-\overline{m}_{1}(\tau,S_{i})\right)
+1ns𝒮1π^w(s)i=1nξi(Aiπ^w(s))(m^1(τ,s,Xi)m¯1(τ,s,Xi))1{Si=s}\displaystyle+\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\frac{1}{\hat{\pi}^{w}(s)}\sum_{i=1}^{n}\xi_{i}(A_{i}-\hat{\pi}^{w}(s))\left(\widehat{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s,X_{i})\right)1\{S_{i}=s\}
=1ns𝒮i=1nξiAiπ(s)(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s}\displaystyle=\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\sum_{i=1}^{n}\frac{\xi_{i}A_{i}}{\pi(s)}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}
1ni=1nξi(m¯1(τ,Si,Xi)m¯1(τ,Si))\displaystyle-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}\left(\overline{m}_{1}(\tau,S_{i},X_{i})-\overline{m}_{1}(\tau,S_{i})\right)
+1ns𝒮(π(s)π^w(s)π^w(s)π(s))(i=1nξiAi(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s})\displaystyle+\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\left(\frac{\pi(s)-\hat{\pi}^{w}(s)}{\hat{\pi}^{w}(s)\pi(s)}\right)\left(\sum_{i=1}^{n}\xi_{i}A_{i}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}\right)
+1ns𝒮1π^w(s)i=1nξi(Aiπ^w(s))(m^1(τ,s,Xi)m¯1(τ,s,Xi))1{Si=s}\displaystyle+\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\frac{1}{\hat{\pi}^{w}(s)}\sum_{i=1}^{n}\xi_{i}(A_{i}-\hat{\pi}^{w}(s))\left(\widehat{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s,X_{i})\right)1\{S_{i}=s\}
1ns𝒮i=1nξiAiπ(s)(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s}\displaystyle\equiv\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\sum_{i=1}^{n}\frac{\xi_{i}A_{i}}{\pi(s)}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}
1ni=1nξi(m¯1(τ,Si,Xi)m¯1(τ,Si))+R1,2w(τ),\displaystyle-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}\left(\overline{m}_{1}(\tau,S_{i},X_{i})-\overline{m}_{1}(\tau,S_{i})\right)+R_{1,2}^{w}(\tau), (F.2)

where the second equality holds because

s𝒮i=1nξiAi1{Si=s}m¯1(τ,s)π^w(s)=\displaystyle\sum_{s\in\mathcal{S}}\sum_{i=1}^{n}\xi_{i}A_{i}1\{S_{i}=s\}\frac{\overline{m}_{1}(\tau,s)}{\hat{\pi}^{w}(s)}= s𝒮n1w(s)m¯1(τ,s)π^w(s)=s𝒮nw(s)m¯1(τ,s)\displaystyle\sum_{s\in\mathcal{S}}n_{1}^{w}(s)\frac{\overline{m}_{1}(\tau,s)}{\hat{\pi}^{w}(s)}=\sum_{s\in\mathcal{S}}n^{w}(s)\overline{m}_{1}(\tau,s)
=\displaystyle= i=1ns𝒮ξi1{Si=s}m¯1(τ,Si)=i=1nξim¯1(τ,Si).\displaystyle\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\xi_{i}1\{S_{i}=s\}\overline{m}_{1}(\tau,S_{i})=\sum_{i=1}^{n}\xi_{i}\overline{m}_{1}(\tau,S_{i}).
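The display above is an exact finite-sample identity, not an asymptotic statement: weighting each stratum's treated sum by $1/\hat{\pi}^{w}(s)$ recovers the full weighted sum. A small simulation (illustrative only, with hypothetical names; the values standing in for $\overline{m}_{1}(\tau,s)$ are arbitrary per-stratum constants) confirms it:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
S = rng.integers(0, 4, size=n)                     # stratum indicators S_i
A = (rng.random(n) < 0.6).astype(float)            # treatment indicators A_i
xi = rng.exponential(1.0, size=n)                  # bootstrap multiplier weights xi_i
m_bar = rng.normal(size=4)                         # stand-ins for m_bar_1(tau, s), one per stratum

lhs = 0.0
for s in range(4):
    in_s = (S == s)
    pi_hat_w = np.sum(xi[in_s] * A[in_s]) / np.sum(xi[in_s])   # hat{pi}^w(s)
    # sum over treated units in stratum s, reweighted by 1 / hat{pi}^w(s)
    lhs += np.sum(xi[in_s] * A[in_s]) * m_bar[s] / pi_hat_w

rhs = np.sum(xi * m_bar[S])                        # sum_i xi_i m_bar_1(tau, S_i)
assert np.isclose(lhs, rhs)
```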

For the first term in $R_{1,2}^{w}(\tau)$, we have

supτΥ|1ns𝒮(π(s)π^w(s)π^w(s)π(s))(i=1nAiξi(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s})|\displaystyle\sup_{\tau\in\Upsilon}\left|\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\left(\frac{\pi(s)-\hat{\pi}^{w}(s)}{\hat{\pi}^{w}(s)\pi(s)}\right)\left(\sum_{i=1}^{n}A_{i}\xi_{i}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}\right)\right|
s𝒮|Dnw(s)nw(s)π^w(s)π(s)|supτΥ,s𝒮|1ni=1nξiAi1{Si=s}(m¯1(τ,s,Xi)m¯1(τ,s))|=op(1),\displaystyle\leq\sum_{s\in\mathcal{S}}\left|\frac{D_{n}^{w}(s)}{n^{w}(s)\hat{\pi}^{w}(s)\pi(s)}\right|\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}A_{i}1\{S_{i}=s\}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))\right|=o_{p}(1),

where the last equality holds due to Lemmas N.2 and N.4, and the fact that ={ξ(m¯1(τ,s,Xi)m¯1(τ,s)):τΥ}\mathcal{F}=\{\xi(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)):\tau\in\Upsilon\} is of the VC-type with fixed coefficients (α,v)(\alpha,v) and envelope ξiFi\xi_{i}F_{i} such that 𝔼((ξiFi)q|Si=s)<\mathbb{E}((\xi_{i}F_{i})^{q}|S_{i}=s)<\infty for q>2.q>2.

For the second term in R1,2w(τ)R_{1,2}^{w}(\tau), recall Δ¯1(τ,s,Xi)=m^1(τ,s,Xi)m¯1(τ,s,Xi)\overline{\Delta}_{1}(\tau,s,X_{i})=\widehat{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s,X_{i}). Then

\displaystyle\sup_{\tau\in\Upsilon}\left|\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\frac{1}{\hat{\pi}^{w}(s)}\sum_{i=1}^{n}\xi_{i}(A_{i}-\hat{\pi}^{w}(s))\overline{\Delta}_{1}(\tau,s,X_{i})1\{S_{i}=s\}\right|
\displaystyle\leq\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}n_{0}^{w}(s)\sup_{\tau\in\Upsilon}\biggl{|}\frac{\sum_{i\in I_{1}(s)}\xi_{i}\overline{\Delta}_{1}(\tau,s,X_{i})}{n_{1}^{w}(s)}-\frac{\sum_{i\in I_{0}(s)}\xi_{i}\overline{\Delta}_{1}(\tau,s,X_{i})}{n_{0}^{w}(s)}\biggr{|}=o_{p}(1),

where the last equality holds by Assumption 3. Therefore, we have

supτΥ|R1,2w(τ)|=op(1).\displaystyle\sup_{\tau\in\Upsilon}|R_{1,2}^{w}(\tau)|=o_{p}(1).

Combining (F.1) and (F.2), we have

L1,nw(τ)=\displaystyle L_{1,n}^{w}(\tau)= s𝒮1ni=1nξiAi1{Si=s}[ηi,1(τ,s)π(s)+(11π(s))(m¯1(τ,s,Xi)m¯1(τ,s))]\displaystyle\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}A_{i}1\{S_{i}=s\}\left[\frac{\eta_{i,1}(\tau,s)}{\pi(s)}+\left(1-\frac{1}{\pi(s)}\right)\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\right]
+s𝒮1ni=1nξi(1Ai)1{Si=s}(m¯1(τ,s,Xi)m¯1(τ,s))\displaystyle+\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}(1-A_{i})1\{S_{i}=s\}\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)
+i=1nξim1(τ,Si)n+R1,1w(τ)R1,2w(τ),\displaystyle+\sum_{i=1}^{n}\xi_{i}\frac{m_{1}(\tau,S_{i})}{\sqrt{n}}+R_{1,1}^{w}(\tau)-R_{1,2}^{w}(\tau),

where supτΥ(|R1,1w(τ)|+|R1,2w(τ)|)=op(1)\sup_{\tau\in\Upsilon}(|R_{1,1}^{w}(\tau)|+|R_{1,2}^{w}(\tau)|)=o_{p}(1). In addition, Assumption 3 implies that the classes of functions

{ξi[ηi,1(τ,s)π(s)+(11π(s))(m¯1(τ,s,Xi)m¯1(τ,s))]:τΥ}\displaystyle\left\{\xi_{i}\left[\frac{\eta_{i,1}(\tau,s)}{\pi(s)}+\left(1-\frac{1}{\pi(s)}\right)\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\right]:\tau\in\Upsilon\right\}

and

{ξi[m¯1(τ,s,Xi)m¯1(τ,s)]:τΥ}\displaystyle\left\{\xi_{i}[\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)]:\tau\in\Upsilon\right\}

are of the VC-type with fixed coefficients and envelopes belonging to L,dL_{\mathbb{P},d}. In addition,

𝔼[ξi(ηi,1(τ,s)π(s)+(11π(s))(m¯1(τ,s,Xi)m¯1(τ,s)))|Si=s]=0,\displaystyle\mathbb{E}\left[\xi_{i}\left(\frac{\eta_{i,1}(\tau,s)}{\pi(s)}+\left(1-\frac{1}{\pi(s)}\right)\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\right)\biggl{|}S_{i}=s\right]=0,

and

𝔼[ξi(m¯1(τ,s,Xi)m¯1(τ,s))|Si=s]=0.\displaystyle\mathbb{E}\left[\xi_{i}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))|S_{i}=s\right]=0.

Therefore, Lemma N.2 implies,

\displaystyle\sup_{\tau\in\Upsilon}\left|\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}A_{i}1\{S_{i}=s\}\left[\frac{\eta_{i,1}(\tau,s)}{\pi(s)}+\left(1-\frac{1}{\pi(s)}\right)\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\right]\right|=O_{p}(1),

and

\displaystyle\sup_{\tau\in\Upsilon}\left|\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}(1-A_{i})1\{S_{i}=s\}\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\right|=O_{p}(1).

This implies supτΥ|Lw1,n(τ)|=Op(1)\sup_{\tau\in\Upsilon}|L^{w}_{1,n}(\tau)|=O_{p}(1). Then by Kato (2009, Theorem 2) we have

n(q^w1(τ)q1(τ))\displaystyle\sqrt{n}(\hat{q}^{w}_{1}(\tau)-q_{1}(\tau))
=1f1(q1(τ)){s𝒮1ni=1nξiAi1{Si=s}[ηi,1(τ,s)π(s)+(11π(s))(m¯1(τ,s,Xi)m¯1(τ,s))]\displaystyle=\frac{1}{f_{1}(q_{1}(\tau))}\biggl{\{}\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}A_{i}1\{S_{i}=s\}\left[\frac{\eta_{i,1}(\tau,s)}{\pi(s)}+\left(1-\frac{1}{\pi(s)}\right)\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\right]
+s𝒮1ni=1nξi(1Ai)1{Si=s}(m¯1(τ,s,Xi)m¯1(τ,s))+i=1nξim1(τ,Si)n}+Rq,1w(τ),\displaystyle+\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}(1-A_{i})1\{S_{i}=s\}\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)+\sum_{i=1}^{n}\xi_{i}\frac{m_{1}(\tau,S_{i})}{\sqrt{n}}\biggr{\}}+R_{q,1}^{w}(\tau),

where supτΥ|Rq,1w(τ)|=op(1)\sup_{\tau\in\Upsilon}|R_{q,1}^{w}(\tau)|=o_{p}(1). Similarly, we have

n(q^w0(τ)q0(τ))\displaystyle\sqrt{n}(\hat{q}^{w}_{0}(\tau)-q_{0}(\tau))
=1f0(q0(τ)){s𝒮1ni=1nξi(1Ai)1{Si=s}[ηi,0(τ,s)1π(s)+(111π(s))(m¯0(τ,s,Xi)m¯0(τ,s))]\displaystyle=\frac{1}{f_{0}(q_{0}(\tau))}\biggl{\{}\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}(1-A_{i})1\{S_{i}=s\}\left[\frac{\eta_{i,0}(\tau,s)}{1-\pi(s)}+\left(1-\frac{1}{1-\pi(s)}\right)\left(\overline{m}_{0}(\tau,s,X_{i})-\overline{m}_{0}(\tau,s)\right)\right]
\displaystyle+\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}A_{i}1\{S_{i}=s\}\left(\overline{m}_{0}(\tau,s,X_{i})-\overline{m}_{0}(\tau,s)\right)+\sum_{i=1}^{n}\xi_{i}\frac{m_{0}(\tau,S_{i})}{\sqrt{n}}\biggr{\}}+R_{q,0}^{w}(\tau),

where $\sup_{\tau\in\Upsilon}|R_{q,0}^{w}(\tau)|=o_{p}(1)$. Taking the difference of the above two displays, we obtain

n(q^w(τ)q(τ))\displaystyle\sqrt{n}(\hat{q}^{w}(\tau)-q(\tau))
=s𝒮1ni=1nξiAi1{Si=s}\displaystyle=\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}A_{i}1\{S_{i}=s\}
×[ηi,1(τ,s)(1π(s))(m¯1(τ,s,Xi)m¯1(τ,s))π(s)f1(q1(τ))(m¯0(τ,s,Xi)m¯0(τ,s))f0(q0(τ))]\displaystyle\times\left[\frac{\eta_{i,1}(\tau,s)-(1-\pi(s))\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)}{\pi(s)f_{1}(q_{1}(\tau))}-\frac{\left(\overline{m}_{0}(\tau,s,X_{i})-\overline{m}_{0}(\tau,s)\right)}{f_{0}(q_{0}(\tau))}\right]
s𝒮1ni=1nξi(1Ai)1{Si=s}\displaystyle-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}(1-A_{i})1\{S_{i}=s\}
×[ηi,0(τ,s)π(s)(m¯0(τ,s,Xi)m¯0(τ,s))(1π(s))f0(q0(τ))(m¯1(τ,s,Xi)m¯1(τ,s))f1(q1(τ))]\displaystyle\times\left[\frac{\eta_{i,0}(\tau,s)-\pi(s)\left(\overline{m}_{0}(\tau,s,X_{i})-\overline{m}_{0}(\tau,s)\right)}{(1-\pi(s))f_{0}(q_{0}(\tau))}-\frac{\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)}{f_{1}(q_{1}(\tau))}\right]
+1ni=1nξi(m1(τ,Si)f1(q1(τ))m0(τ,Si)f0(q0(τ)))+Rwq(τ)\displaystyle+\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}\left(\frac{m_{1}(\tau,S_{i})}{f_{1}(q_{1}(\tau))}-\frac{m_{0}(\tau,S_{i})}{f_{0}(q_{0}(\tau))}\right)+R^{w}_{q}(\tau)
=s𝒮1ni=1nξiAi1{Si=s}ϕ1(τ,s,Yi(1),Xi)s𝒮1ni=1nξi(1Ai)1{Si=s}ϕ0(τ,s,Yi(0),Xi)\displaystyle=\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}A_{i}1\{S_{i}=s\}\phi_{1}(\tau,s,Y_{i}(1),X_{i})-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}(1-A_{i})1\{S_{i}=s\}\phi_{0}(\tau,s,Y_{i}(0),X_{i})
+1ni=1nξiϕs(τ,Si)+Rqw(τ),\displaystyle+\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}\phi_{s}(\tau,S_{i})+R_{q}^{w}(\tau),

where supτΥ|Rqw(τ)|=op(1)\sup_{\tau\in\Upsilon}|R_{q}^{w}(\tau)|=o_{p}(1) and (ϕ1(),ϕ0(),ϕs())(\phi_{1}(\cdot),\phi_{0}(\cdot),\phi_{s}(\cdot)) are defined in Section E. Recalling the linear expansion of n(q^adj(τ)q(τ))\sqrt{n}(\hat{q}^{adj}(\tau)-q(\tau)) established in Section E, we have

n(q^w(τ)q^adj(τ))=s𝒮1ni=1n(ξi1)Ai1{Si=s}ϕ1(τ,s,Yi(1),Xi)\displaystyle\sqrt{n}(\hat{q}^{w}(\tau)-\hat{q}^{adj}(\tau))=\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\xi_{i}-1)A_{i}1\{S_{i}=s\}\phi_{1}(\tau,s,Y_{i}(1),X_{i})
s𝒮1ni=1n(ξi1)(1Ai)1{Si=s}ϕ0(τ,s,Yi(0),Xi)+1ni=1n(ξi1)ϕs(τ,Si)+R(τ)\displaystyle-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\xi_{i}-1)(1-A_{i})1\{S_{i}=s\}\phi_{0}(\tau,s,Y_{i}(0),X_{i})+\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\xi_{i}-1)\phi_{s}(\tau,S_{i})+R(\tau)
\displaystyle=\varpi_{n,1}^{w}(\tau)+\varpi_{n,2}^{w}(\tau)+R(\tau),

where supτΥ|R(τ)|=op(1)\sup_{\tau\in\Upsilon}|R(\tau)|=o_{p}(1),

ϖn,1w(τ)=\displaystyle\varpi_{n,1}^{w}(\tau)= s𝒮1ni=1n(ξi1)Ai1{Si=s}ϕ1(τ,s,Yi(1),Xi)\displaystyle\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\xi_{i}-1)A_{i}1\{S_{i}=s\}\phi_{1}(\tau,s,Y_{i}(1),X_{i})
s𝒮1ni=1n(ξi1)(1Ai)1{Si=s}ϕ0(τ,s,Yi(0),Xi),\displaystyle-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\xi_{i}-1)(1-A_{i})1\{S_{i}=s\}\phi_{0}(\tau,s,Y_{i}(0),X_{i}),

and

ϖn,2w(τ)=1ni=1n(ξi1)ϕs(τ,Si).\displaystyle\varpi_{n,2}^{w}(\tau)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\xi_{i}-1)\phi_{s}(\tau,S_{i}).

Lemma N.5 shows that, uniformly over $\tau\in\Upsilon$ and conditional on the data,

ϖn,1w(τ)+ϖn,2w(τ)(τ),\displaystyle\varpi_{n,1}^{w}(\tau)+\varpi_{n,2}^{w}(\tau)\rightsquigarrow\mathcal{B}(\tau),

where (τ)\mathcal{B}(\tau) is the Gaussian process with covariance kernel

Σ(τ,τ)=𝔼π(Si)ϕ1(τ,Si,Yi(1),Xi)ϕ1(τ,Si,Yi(1),Xi)\displaystyle\Sigma(\tau,\tau^{\prime})=\mathbb{E}\pi(S_{i})\phi_{1}(\tau,S_{i},Y_{i}(1),X_{i})\phi_{1}(\tau^{\prime},S_{i},Y_{i}(1),X_{i})
+𝔼(1π(Si))ϕ0(τ,Si,Yi(0),Xi)ϕ0(τ,Si,Yi(0),Xi)+𝔼ϕs(τ,Si)ϕs(τ,Si),\displaystyle+\mathbb{E}(1-\pi(S_{i}))\phi_{0}(\tau,S_{i},Y_{i}(0),X_{i})\phi_{0}(\tau^{\prime},S_{i},Y_{i}(0),X_{i})+\mathbb{E}\phi_{s}(\tau,S_{i})\phi_{s}(\tau^{\prime},S_{i}),

as defined in Theorem 3. This concludes the proof.

Appendix G Proof of Theorem 5.1

Notation for this proof:

$N(s)$: for $s\in\mathcal{S}$, $N(s)=\sum_{i=1}^{n}1\{S_{i}<s\}$.

$\Lambda_{\tau,s}(x,\theta_{a,s}(\tau))$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, $\tau\in\Upsilon$, and $x\in\text{Supp}(X)$, a parametric model for $\mathbb{P}(Y(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}=x)$ with pseudo true value $\theta_{a,s}(\tau)$.

$\hat{\theta}_{a,s}(\tau)$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, a consistent estimator of $\theta_{a,s}(\tau)$.

The proof is divided into two steps. In the first step, we verify Assumption 5; Assumption 3(i) can be verified in the same manner, so the argument is omitted. In the second step, we establish Assumptions 3(ii) and 3(iii).

Step 1. Recall $\widehat{m}_{a}(\tau,s,X_{i})=\tau-\Lambda_{\tau,s}(X_{i},\hat{\theta}_{a,s}(\tau))$, $\overline{m}_{a}(\tau,s,X_{i})=\tau-\Lambda_{\tau,s}(X_{i},\theta_{a,s}(\tau))$,

Δ¯a(τ,s,Xi)=m^a(τ,s,Xi)m¯a(τ,s,Xi)=Λτ,s(Xi,θa,s(τ))Λτ,s(Xi,θ^a,s(τ)),\displaystyle\overline{\Delta}_{a}(\tau,s,X_{i})=\widehat{m}_{a}(\tau,s,X_{i})-\overline{m}_{a}(\tau,s,X_{i})=\Lambda_{\tau,s}(X_{i},\theta_{a,s}(\tau))-\Lambda_{\tau,s}(X_{i},\hat{\theta}_{a,s}(\tau)),

and that $\{X_{i}^{s},\xi^{s}_{i}\}_{i\in[n]}$ is an i.i.d. sequence drawn from the joint distribution of $(X_{i},\xi_{i})$ given $S_{i}=s$, and hence is independent of $\{A_{i},S_{i}\}_{i\in[n]}$. Let $\Psi_{\tau,s}(\theta_{1},\theta_{2})=\mathbb{E}[\Lambda_{\tau,s}(X_{i},\theta_{1})-\Lambda_{\tau,s}(X_{i},\theta_{2})|S_{i}=s]=\mathbb{E}[\Lambda_{\tau,s}(X_{i}^{s},\theta_{1})-\Lambda_{\tau,s}(X_{i}^{s},\theta_{2})]$. We have

supτΥ,s𝒮|iI1(s)ξiΔ¯a(τ,s,Xi)n1w(s)iI0(s)ξiΔ¯a(τ,s,Xi)n0w(s)|\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\biggl{|}\frac{\sum_{i\in I_{1}(s)}\xi_{i}\overline{\Delta}_{a}(\tau,s,X_{i})}{n_{1}^{w}(s)}-\frac{\sum_{i\in I_{0}(s)}\xi_{i}\overline{\Delta}_{a}(\tau,s,X_{i})}{n_{0}^{w}(s)}\biggr{|}
(maxs𝒮n1(s)n1w(s))supτΥ,s𝒮|iI1(s)ξi[Δ¯a(τ,s,Xi)Ψτ,s(θa,s(τ),θ^a,s(τ))]n1(s)|\displaystyle\leq\left(\max_{s\in\mathcal{S}}\frac{n_{1}(s)}{n_{1}^{w}(s)}\right)\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{a}(\tau,s,X_{i})-\Psi_{\tau,s}(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))]}{n_{1}(s)}\right|
+(maxs𝒮n0(s)n0w(s))supτΥ,s𝒮|iI0(s)ξi[Δ¯a(τ,s,Xi)Ψτ,s(θa,s(τ),θ^a,s(τ))]n0(s)|\displaystyle+\left(\max_{s\in\mathcal{S}}\frac{n_{0}(s)}{n_{0}^{w}(s)}\right)\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{0}(s)}\xi_{i}[\overline{\Delta}_{a}(\tau,s,X_{i})-\Psi_{\tau,s}(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))]}{n_{0}(s)}\right|
=op(n1/2).\displaystyle=o_{p}(n^{-1/2}). (G.1)

To see the last equality, we note that, for any ε>0\varepsilon>0, with probability approaching one (w.p.a.1), we have

supτΥ,s𝒮||θ^a,s(τ)θa,s(τ)||ε.\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}||\hat{\theta}_{a,s}(\tau)-\theta_{a,s}(\tau)||\leq\varepsilon.

Therefore, on the event

𝒜n(ε){supτΥ,s𝒮||θ^a,s(τ)θa,s(τ)||ε,supτΥmax(||θ^a,s(τ)||,||θa,s(τ)||)C,mins𝒮n1(s)εn}\displaystyle\mathcal{A}_{n}(\varepsilon)\equiv\left\{\sup_{\tau\in\Upsilon,s\in\mathcal{S}}||\hat{\theta}_{a,s}(\tau)-\theta_{a,s}(\tau)||\leq\varepsilon,\sup_{\tau\in\Upsilon}\max(||\hat{\theta}_{a,s}(\tau)||,||\theta_{a,s}(\tau)||)\leq C,\min_{s\in\mathcal{S}}n_{1}(s)\geq\varepsilon n\right\}

we have

supτΥ|iI1(s)ξi[Δ¯a(τ,s,Xi)Ψτ,s(θa,s(τ),θ^a,s(τ))]n1(s)||{Ai,Si}i[n]\displaystyle\sup_{\tau\in\Upsilon}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{a}(\tau,s,X_{i})-\Psi_{\tau,s}(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))]}{n_{1}(s)}\right|\biggl{|}\{A_{i},S_{i}\}_{i\in[n]}
=dsupτΥ|i=N(s)+1N(s)+n1(s)ξis[Δ¯a(τ,s,Xis)Ψτ,s(θa,s(τ),θ^a,s(τ))]n1(s)||{Ai,Si}i[n]\displaystyle\stackrel{{\scriptstyle d}}{{=}}\sup_{\tau\in\Upsilon}\left|\frac{\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}\xi_{i}^{s}[\overline{\Delta}_{a}(\tau,s,X_{i}^{s})-\Psi_{\tau,s}(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))]}{n_{1}(s)}\right|\biggl{|}\{A_{i},S_{i}\}_{i\in[n]}
||n1(s)|||{Ai,Si}i[n],\displaystyle\leq||\mathbb{P}_{n_{1}(s)}-\mathbb{P}||_{\mathcal{F}}\biggl{|}\{A_{i},S_{i}\}_{i\in[n]},

where N(s)=i=1n1{Si<s}N(s)=\sum_{i=1}^{n}1\{S_{i}<s\} and

={ξis[Λτ,s(Xis,θ1)Λτ,s(Xis,θ2)Ψτ,s(θ1,θ2)]:τΥ,||θ1θ2||ε,max(||θ1||,||θ2||)C}.\displaystyle\mathcal{F}=\left\{\xi_{i}^{s}[\Lambda_{\tau,s}(X_{i}^{s},\theta_{1})-\Lambda_{\tau,s}(X_{i}^{s},\theta_{2})-\Psi_{\tau,s}(\theta_{1},\theta_{2})]:\tau\in\Upsilon,||\theta_{1}-\theta_{2}||\leq\varepsilon,\max(||\theta_{1}||,||\theta_{2}||)\leq C\right\}.

By Assumption 6, \mathcal{F} is a VC-class with a fixed VC index and envelope LiL_{i}. In addition,

\displaystyle\sup_{f\in\mathcal{F}}\mathbb{P}f^{2}\leq\mathbb{E}L_{i}^{2}||\theta_{1}-\theta_{2}||^{2}\leq C\varepsilon^{2}.

Therefore, for any δ>0\delta>0 we have

(supτΥ,s𝒮|iI1(s)ξi[Δ¯a(τ,s,Xi)Ψτ,s(θa,s(τ),θ^a,s(τ))]n1(s)|δn1/2)\displaystyle\mathbb{P}\left(\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{a}(\tau,s,X_{i})-\Psi_{\tau,s}(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))]}{n_{1}(s)}\right|\geq\delta n^{-1/2}\right)
(supτΥ,s𝒮|iI1(s)ξi[Δ¯a(τ,s,Xi)Ψτ,s(θa,s(τ),θ^a,s(τ))]n1(s)|δn1/2,𝒜n(ε))+(𝒜nc(ε))\displaystyle\leq\mathbb{P}\left(\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{a}(\tau,s,X_{i})-\Psi_{\tau,s}(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))]}{n_{1}(s)}\right|\geq\delta n^{-1/2},\mathcal{A}_{n}(\varepsilon)\right)+\mathbb{P}(\mathcal{A}_{n}^{c}(\varepsilon))
𝔼[(supτΥ,s𝒮|iI1(s)ξi[Δ¯a(τ,s,Xi)Ψτ,s(θa,s(τ),θ^a,s(τ))]n1(s)|δn1/2,𝒜n(ε)|{Ai,Si}i[n])]\displaystyle\leq\mathbb{E}\left[\mathbb{P}\left(\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{a}(\tau,s,X_{i})-\Psi_{\tau,s}(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))]}{n_{1}(s)}\right|\geq\delta n^{-1/2},\mathcal{A}_{n}(\varepsilon)\biggl{|}\{A_{i},S_{i}\}_{i\in[n]}\right)\right]
+(𝒜nc(ε))\displaystyle+\mathbb{P}(\mathcal{A}_{n}^{c}(\varepsilon))
s𝒮𝔼[(||n1(s)||δn1/2|{Ai,Si}i[n])1{n1(s)nε}]+(𝒜nc(ε))\displaystyle\leq\sum_{s\in\mathcal{S}}\mathbb{E}\left[\mathbb{P}\left(||\mathbb{P}_{n_{1}(s)}-\mathbb{P}||_{\mathcal{F}}\geq\delta n^{-1/2}\biggl{|}\{A_{i},S_{i}\}_{i\in[n]}\right)1\{n_{1}(s)\geq n\varepsilon\}\right]+\mathbb{P}(\mathcal{A}_{n}^{c}(\varepsilon))
s𝒮𝔼{n1/2𝔼[||n1(s)|||{Ai,Si}i[n]]1{n1(s)nε}δ}+(𝒜nc(ε)).\displaystyle\leq\sum_{s\in\mathcal{S}}\mathbb{E}\left\{\frac{n^{1/2}\mathbb{E}\left[||\mathbb{P}_{n_{1}(s)}-\mathbb{P}||_{\mathcal{F}}|\{A_{i},S_{i}\}_{i\in[n]}\right]1\{n_{1}(s)\geq n\varepsilon\}}{\delta}\right\}+\mathbb{P}(\mathcal{A}_{n}^{c}(\varepsilon)).

By Chernozhukov et al. (2014, Corollary 5.1),

n1/2𝔼[||n1(s)|||{Ai,Si}i[n]]1{n1(s)nε}\displaystyle n^{1/2}\mathbb{E}\left[||\mathbb{P}_{n_{1}(s)}-\mathbb{P}||_{\mathcal{F}}|\{A_{i},S_{i}\}_{i\in[n]}\right]1\{n_{1}(s)\geq n\varepsilon\}
C(nn1(s)ε2+n1/2n11/d1(s))1{n1(s)nε}\displaystyle\leq C(\sqrt{\frac{n}{n_{1}(s)}\varepsilon^{2}}+n^{1/2}n_{1}^{1/d-1}(s))1\{n_{1}(s)\geq n\varepsilon\}
C(ε1/2+n1/d1/2ε1/d1).\displaystyle\leq C(\varepsilon^{1/2}+n^{1/d-1/2}\varepsilon^{1/d-1}).

Therefore,

𝔼{n1/2𝔼[||n1(s)|||{Ai,Si}i[n]]1{n1(s)nε}δ}C𝔼(ε1/2+n1/d1/2ε1/d1)/δ.\displaystyle\mathbb{E}\left\{\frac{n^{1/2}\mathbb{E}\left[||\mathbb{P}_{n_{1}(s)}-\mathbb{P}||_{\mathcal{F}}|\{A_{i},S_{i}\}_{i\in[n]}\right]1\{n_{1}(s)\geq n\varepsilon\}}{\delta}\right\}\leq C\mathbb{E}\left(\varepsilon^{1/2}+n^{1/d-1/2}\varepsilon^{1/d-1}\right)/\delta.

By letting nn\rightarrow\infty followed by ε0\varepsilon\rightarrow 0, we have

\displaystyle\lim_{n\rightarrow\infty}\mathbb{P}\left(\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{a}(\tau,s,X_{i})-\Psi_{\tau,s}(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))]}{n_{1}(s)}\right|\geq\delta n^{-1/2}\right)=0.

In addition,

\displaystyle\max_{s\in\mathcal{S}}|n_{1}^{w}(s)/n_{1}(s)-1|=\max_{s\in\mathcal{S}}|(D_{n}^{w}(s)-D_{n}(s))/(\pi(s)n(s)+D_{n}(s))|\stackrel{p}{\longrightarrow}0,

as Lemma N.4 shows that maxs𝒮|(Dnw(s)Dn(s))/n(s)|=op(1)\max_{s\in\mathcal{S}}|(D_{n}^{w}(s)-D_{n}(s))/n(s)|=o_{p}(1).

Therefore,

supτΥ,s𝒮|iI1(s)ξi[Δ¯a(τ,s,Xi)Ψτ,s(θa,s(τ),θ^a,s(τ))]nw1(s)|=op(n1/2).\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{a}(\tau,s,X_{i})-\Psi_{\tau,s}(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))]}{n^{w}_{1}(s)}\right|=o_{p}(n^{-1/2}).

For the same reason, we have

supτΥ,s𝒮|iI0(s)ξi[Δ¯a(τ,s,Xi)Ψτ,s(θa,s(τ),θ^a,s(τ))]nw0(s)|=op(n1/2),\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{0}(s)}\xi_{i}[\overline{\Delta}_{a}(\tau,s,X_{i})-\Psi_{\tau,s}(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))]}{n^{w}_{0}(s)}\right|=o_{p}(n^{-1/2}),

and (G.1) holds.

Step 2. By Assumption 6,

\displaystyle|\overline{m}_{a}(\tau_{2},S_{i},X_{i})-\overline{m}_{a}(\tau_{1},S_{i},X_{i})|
\displaystyle\leq|\tau_{2}-\tau_{1}|+|\Lambda_{\tau_{1},s}(X_{i},\theta_{a,s}(\tau_{1}))-\Lambda_{\tau_{2},s}(X_{i},\theta_{a,s}(\tau_{2}))|
\displaystyle\leq(1+L_{i})|\tau_{2}-\tau_{1}|+L_{i}||\theta_{a,s}(\tau_{1})-\theta_{a,s}(\tau_{2})||\leq(CL_{i}+1)|\tau_{2}-\tau_{1}|.

This implies Assumption 3(iii). Furthermore, by Assumption 6 we can take the envelope for the class of functions $\mathcal{F}=\{\overline{m}_{a}(\tau,S_{i},X_{i}):\tau\in\Upsilon\}$ to be $F_{i}=\max(C,1)L_{i}+1$, where the constant $C$ is the one in the display above. Then, we have

supQN(,eQ,ε||F||Q,2)N(Υ,d,ε)1/ε,\displaystyle\sup_{Q}N(\mathcal{F},e_{Q},\varepsilon||F||_{Q,2})\leq N(\Upsilon,d,\varepsilon)\leq 1/\varepsilon,

where d(τ1,τ2)=|τ1τ2|d(\tau_{1},\tau_{2})=|\tau_{1}-\tau_{2}|. This verifies Assumption 3(ii).

Appendix H Proof of Theorem 5.2

Notation for this proof:

$W_{i,s}(\tau)$: for $i\in[n]$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, the linear regressor in the linear adjustment, so that $\Lambda_{\tau,s}(X_{i},\theta_{a,s}(\tau))=W_{i,s}^{\top}(\tau)\theta_{a,s}(\tau)$.

$\tilde{W}_{i,s}(\tau)$: for $i\in[n]$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $\tilde{W}_{i,s}(\tau)=W_{i,s}(\tau)-\mathbb{E}(W_{i,s}(\tau)|S_{i}=s)$.

$\theta_{a,s}(\tau)$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, the pseudo true value in the linear adjustment.

Let ΣLP(τk,τl)\Sigma^{\textit{LP}}(\tau_{k},\tau_{l}) be the asymptotic covariance matrix of q^adj(τk)\hat{q}^{adj}(\tau_{k}) and q^adj(τl)\hat{q}^{adj}(\tau_{l}) with a linear adjustment and pseudo true values (θ1,s(τk),θ0,s(τk))k[K](\theta_{1,s}(\tau_{k}),\theta_{0,s}(\tau_{k}))_{k\in[K]}. Then, we have

ΣLP(τk,τl)=ΣLP1(τk,τl)+s𝒮p(s)𝔼[(W~i,s(τk)βs(τk)y¯i,s(τk))(W~i,s(τl)βs(τl)y¯i,s(τl))|Si=s],\displaystyle\Sigma^{\textit{LP}}(\tau_{k},\tau_{l})=\Sigma^{\textit{LP}}_{1}(\tau_{k},\tau_{l})+\sum_{s\in\mathcal{S}}p(s)\mathbb{E}\left[(\tilde{W}_{i,s}^{\top}(\tau_{k})\beta_{s}(\tau_{k})-\overline{y}_{i,s}(\tau_{k}))(\tilde{W}_{i,s}^{\top}(\tau_{l})\beta_{s}(\tau_{l})-\overline{y}_{i,s}(\tau_{l}))|S_{i}=s\right],

where

\displaystyle\Sigma^{\textit{LP}}_{1}(\tau_{k},\tau_{l})=\biggl{\{}\mathbb{E}\left[\frac{(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\}-m_{1}(\tau,S_{i},X_{i}))^{2}}{\pi(S_{i})f_{1}^{2}(q_{1}(\tau))}\right]
\displaystyle+\mathbb{E}\left[\frac{(\tau-1\{Y_{i}(0)\leq q_{0}(\tau)\}-m_{0}(\tau,S_{i},X_{i}))^{2}}{(1-\pi(S_{i}))f_{0}^{2}(q_{0}(\tau))}\right]
+𝔼(m1(τ,Si,Xi)f1(q1(τ))m0(τ,Si,Xi)f0(q0(τ)))2},\displaystyle+\mathbb{E}\left(\frac{m_{1}(\tau,S_{i},X_{i})}{f_{1}(q_{1}(\tau))}-\frac{m_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right)^{2}\biggr{\}},
βs(τ)\displaystyle\beta_{s}(\tau) =1π(s)π(s)θ1,s(τ)f1(q1(τ))+π(s)1π(s)θ0,s(τ)f0(q0(τ)),and\displaystyle=\sqrt{\frac{1-\pi(s)}{\pi(s)}}\frac{\theta_{1,s}(\tau)}{f_{1}(q_{1}(\tau))}+\sqrt{\frac{\pi(s)}{1-\pi(s)}}\frac{\theta_{0,s}(\tau)}{f_{0}(q_{0}(\tau))},\quad\text{and}
y¯i,s(τ)\displaystyle\overline{y}_{i,s}(\tau) =1π(s)π(s)[(Yi(1)q1(τ)|Xi,Si=s)(Yi(1)q1(τ)|Si=s)]f1(q1(τ))\displaystyle=\sqrt{\frac{1-\pi(s)}{\pi(s)}}\frac{\left[\mathbb{P}(Y_{i}(1)\leq q_{1}(\tau)|X_{i},S_{i}=s)-\mathbb{P}(Y_{i}(1)\leq q_{1}(\tau)|S_{i}=s)\right]}{f_{1}(q_{1}(\tau))}
+π(s)1π(s)[(Yi(0)q0(τ)|Xi,Si=s)(Yi(0)q0(τ)|Si=s)]f0(q0(τ)).\displaystyle+\sqrt{\frac{\pi(s)}{1-\pi(s)}}\frac{\left[\mathbb{P}(Y_{i}(0)\leq q_{0}(\tau)|X_{i},S_{i}=s)-\mathbb{P}(Y_{i}(0)\leq q_{0}(\tau)|S_{i}=s)\right]}{f_{0}(q_{0}(\tau))}.

Minimizing $[\Sigma^{\textit{LP}}(\tau_{k},\tau_{l})]_{k,l\in[K]}$ (in the matrix sense) is equivalent to minimizing

[𝔼[(W~i,s(τk)βs(τk)y¯i,s(τk))(W~i,s(τl)βs(τl)y¯i,s(τl))|Si=s]]k,l[K]\left[\mathbb{E}\left[(\tilde{W}_{i,s}^{\top}(\tau_{k})\beta_{s}(\tau_{k})-\overline{y}_{i,s}(\tau_{k}))(\tilde{W}_{i,s}^{\top}(\tau_{l})\beta_{s}(\tau_{l})-\overline{y}_{i,s}(\tau_{l}))|S_{i}=s\right]\right]_{k,l\in[K]}

for each s𝒮s\in\mathcal{S}, which is achieved if

βs(τk)=[𝔼W~i,s(τk)W~i,s(τk)|Si=s]1𝔼[W~i,s(τk)y¯i,s(τk)|Si=s].\displaystyle\beta_{s}(\tau_{k})=[\mathbb{E}\tilde{W}_{i,s}(\tau_{k})\tilde{W}_{i,s}^{\top}(\tau_{k})|S_{i}=s]^{-1}\mathbb{E}[\tilde{W}_{i,s}(\tau_{k})\overline{y}_{i,s}(\tau_{k})|S_{i}=s]. (H.1)
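The optimality condition (H.1) is a population least-squares projection. The following sketch (with a hypothetical design, and sample analogues standing in for the population moments $\mathbb{E}[\tilde{W}\tilde{W}^{\top}|S_{i}=s]$ and $\mathbb{E}[\tilde{W}\overline{y}|S_{i}=s]$) illustrates that the projection coefficient minimizes the mean squared residual, the summand that (H.1) minimizes stratum by stratum:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
W = rng.normal(size=(n, 2))                              # stand-in for centered regressors W_tilde
y = 0.7 * W[:, 0] - 0.3 * W[:, 1] + rng.normal(size=n)   # stand-in for ybar_{i,s}(tau_k)

# sample analogue of (H.1): beta = (E[W W'])^{-1} E[W y]
beta_star = np.linalg.solve(W.T @ W / n, W.T @ y / n)

def msr(b):
    # (sample) mean squared residual E(W'b - y)^2
    return np.mean((W @ b - y) ** 2)

# any perturbation of beta_star weakly increases the mean squared residual
for delta in (np.array([0.1, 0.0]), np.array([0.0, -0.1]), np.array([0.05, 0.05])):
    assert msr(beta_star) <= msr(beta_star + delta)
```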

Because 𝔼[W~i,s(τ)(Yi(a)qa(τ)|Si=s)|Si=s]=0\mathbb{E}[\tilde{W}_{i,s}(\tau)\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s)|S_{i}=s]=0 for a=0,1a=0,1, (H.1) implies

1π(s)π(s)θ1,s(τk)f1(q1(τk))+π(s)1π(s)θ0,s(τk)f0(q0(τk))\displaystyle\sqrt{\frac{1-\pi(s)}{\pi(s)}}\frac{\theta_{1,s}(\tau_{k})}{f_{1}(q_{1}(\tau_{k}))}+\sqrt{\frac{\pi(s)}{1-\pi(s)}}\frac{\theta_{0,s}(\tau_{k})}{f_{0}(q_{0}(\tau_{k}))}
=1π(s)π(s)θLP1,s(τk)f1(q1(τk))+π(s)1π(s)θLP0,s(τk)f0(q0(τk)),\displaystyle=\sqrt{\frac{1-\pi(s)}{\pi(s)}}\frac{\theta^{\textit{LP}}_{1,s}(\tau_{k})}{f_{1}(q_{1}(\tau_{k}))}+\sqrt{\frac{\pi(s)}{1-\pi(s)}}\frac{\theta^{\textit{LP}}_{0,s}(\tau_{k})}{f_{0}(q_{0}(\tau_{k}))},

or equivalently,

θ1,s(τk)f1(q1(τk))+π(s)1π(s)θ0,s(τk)f0(q0(τk))=θLP1,s(τk)f1(q1(τk))+π(s)1π(s)θLP0,s(τk)f0(q0(τk)).\displaystyle\frac{\theta_{1,s}(\tau_{k})}{f_{1}(q_{1}(\tau_{k}))}+\frac{\pi(s)}{1-\pi(s)}\frac{\theta_{0,s}(\tau_{k})}{f_{0}(q_{0}(\tau_{k}))}=\frac{\theta^{\textit{LP}}_{1,s}(\tau_{k})}{f_{1}(q_{1}(\tau_{k}))}+\frac{\pi(s)}{1-\pi(s)}\frac{\theta^{\textit{LP}}_{0,s}(\tau_{k})}{f_{0}(q_{0}(\tau_{k}))}.

This concludes the proof.

Appendix I Proof of Theorem 5.3

Notation for this proof:

$q_{a}(\tau)$: for $a=0,1$ and $\tau\in\Upsilon$, the $\tau$th quantile of $Y(a)$.

$\theta_{a,s}^{\textit{LP}}(\tau)$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $\theta_{a,s}^{\textit{LP}}(\tau)=\left[\mathbb{E}(\tilde{W}_{i,s}(\tau)\tilde{W}_{i,s}^{\top}(\tau)|S_{i}=s)\right]^{-1}\mathbb{E}\left[\tilde{W}_{i,s}(\tau)1\{Y_{i}(a)\leq q_{a}(\tau)\}|S_{i}=s\right]$ is the optimal linear coefficient.

$\dot{W}_{i,a,s}(\tau)$: for $a\in\{0,1\}$, $i\in[n]$, and $s\in\mathcal{S}$, $\dot{W}_{i,a,s}(\tau)=W_{i,s}(\tau)-\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}W_{i,s}(\tau)$, where $I_{a}(s)=\{i\in[n]:A_{i}=a,S_{i}=s\}$.

$\hat{\theta}_{a,s}^{\textit{LP}}(\tau)$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $\hat{\theta}_{a,s}^{\textit{LP}}(\tau)$ is defined in (5.7).

$\hat{q}_{a}(\tau)$: for $a\in\{0,1\}$ and $\tau\in\Upsilon$, the estimator of $q_{a}(\tau)$ without any adjustments.

Assumption 6(i) holds by Assumption 7. In addition, by Assumption 2, we have supτΥ|τθa,sLP(τ)|<\sup_{\tau\in\Upsilon}|\partial_{\tau}\theta_{a,s}^{\textit{LP}}(\tau)|<\infty. This implies Assumption 6(ii). Next, we aim to show

supτΥ,a=0,1,s𝒮|θ^a,sLP(τ)θa,sLP(τ)|=Op(n1/2).\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}|\hat{\theta}_{a,s}^{\textit{LP}}(\tau)-\theta_{a,s}^{\textit{LP}}(\tau)|=O_{p}(n^{-1/2}).

Focusing on $\hat{\theta}_{1,s}^{\textit{LP}}(\tau)$, we have

θ^1,sLP(τ)θ1,sLP(τ)\displaystyle\hat{\theta}_{1,s}^{\textit{LP}}(\tau)-\theta_{1,s}^{\textit{LP}}(\tau) =[1n1(s)iI1(s)W˙i,1,s(τ)W˙i,1,s(τ)]1\displaystyle=\left[\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\dot{W}_{i,1,s}(\tau)\dot{W}_{i,1,s}^{\top}(\tau)\right]^{-1}
×[1n1(s)iI1(s)W˙i,1,s(τ)(1{Yiq^1(τ)}W˙i,1,s(τ)θ1,sLP(τ))].\displaystyle\times\left[\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\dot{W}_{i,1,s}(\tau)(1\{Y_{i}\leq\hat{q}_{1}(\tau)\}-\dot{W}_{i,1,s}^{\top}(\tau)\theta_{1,s}^{\textit{LP}}(\tau))\right]. (I.1)
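The decomposition (I.1) is an exact algebraic identity for the least-squares estimator, not an approximation. A quick numerical check (with stand-in data and hypothetical names; for simplicity the regressors are demeaned over the whole sample rather than within $I_{1}(s)$, which does not affect the identity) confirms this:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
W = rng.normal(size=(n, 3))
Wdot = W - W.mean(axis=0)                          # demeaned regressors, stand-in for W_dot
z = (rng.normal(size=n) < 0.3).astype(float)       # stand-in for the indicators 1{Y_i <= qhat}
theta = np.array([0.2, -0.1, 0.4])                 # stand-in for the pseudo true value

G = Wdot.T @ Wdot / n
theta_hat = np.linalg.solve(G, Wdot.T @ z / n)     # the least-squares coefficient

# the identity behind (I.1): theta_hat - theta equals the Gram inverse applied
# to the average of W_dot times the regression residual at theta
lhs = theta_hat - theta
rhs = np.linalg.solve(G, Wdot.T @ (z - Wdot @ theta) / n)
assert np.allclose(lhs, rhs)
```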

For the first term in (I.1), we have

1n1(s)iI1(s)W˙i,1,s(τ)W˙i,1,s(τ)=d1n1(s)i=N(s)+1N(s)+n1(s)W˙i,1,ss(τ)W˙i,1,ss(τ),\displaystyle\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\dot{W}_{i,1,s}(\tau)\dot{W}_{i,1,s}^{\top}(\tau)\stackrel{{\scriptstyle d}}{{=}}\frac{1}{n_{1}(s)}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}\dot{W}_{i,1,s}^{s}(\tau)\dot{W}_{i,1,s}^{s\top}(\tau),

where W˙i,1,ss(τ)=Wi,ss(τ)1n1(s)i=N(s)+1N(s)+n1(s)Wi,ss(τ)\dot{W}_{i,1,s}^{s}(\tau)=W_{i,s}^{s}(\tau)-\frac{1}{n_{1}(s)}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}W_{i,s}^{s}(\tau) and Wi,ss(τ)W_{i,s}^{s}(\tau) is i.i.d. across ii with common distribution equal to the conditional distribution of Wi,s(τ)W_{i,s}(\tau) given Si=sS_{i}=s and independent of N(s),n1(s)N(s),n_{1}(s). Therefore, by Assumption 9, we have

sups𝒮,τΥ1n1(s)i=N(s)+1N(s)+n1(s)W˙i,1,ss(τ)W˙i,1,ss(τ)𝔼(Wi,ss(τ)𝔼Wi,ss(τ))(Wi,ss(τ)𝔼Wi,ss(τ))F\displaystyle\sup_{s\in\mathcal{S},\tau\in\Upsilon}\left\|\frac{1}{n_{1}(s)}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}\dot{W}_{i,1,s}^{s}(\tau)\dot{W}_{i,1,s}^{s\top}(\tau)-\mathbb{E}(W_{i,s}^{s}(\tau)-\mathbb{E}W_{i,s}^{s}(\tau))(W_{i,s}^{s}(\tau)-\mathbb{E}W_{i,s}^{s}(\tau))^{\top}\right\|_{F}
=sups𝒮,τΥ1n1(s)i=N(s)+1N(s)+n1(s)W˙i,1,ss(τ)W˙i,1,ss(τ)𝔼(W~i,s(τ)W~i,s(τ)|Si=s)F=op(1),\displaystyle=\sup_{s\in\mathcal{S},\tau\in\Upsilon}\left\|\frac{1}{n_{1}(s)}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}\dot{W}_{i,1,s}^{s}(\tau)\dot{W}_{i,1,s}^{s\top}(\tau)-\mathbb{E}(\tilde{W}_{i,s}(\tau)\tilde{W}_{i,s}^{\top}(\tau)|S_{i}=s)\right\|_{F}=o_{p}(1),

where ||||F||\cdot||_{F} denotes the Frobenius norm and W~i,s(τ)=Wi,s(τ)𝔼(Wi,s(τ)|Si=s)\tilde{W}_{i,s}(\tau)=W_{i,s}(\tau)-\mathbb{E}(W_{i,s}(\tau)|S_{i}=s). For the second term in (I.1), we have

1n1(s)iI1(s)W˙i,1,s(τ)(1{Yiq^1(τ)}W˙i,1,s(τ)θ1,sLP(τ))\displaystyle\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\dot{W}_{i,1,s}(\tau)\left(1\{Y_{i}\leq\hat{q}_{1}(\tau)\}-\dot{W}_{i,1,s}^{\top}(\tau)\theta_{1,s}^{\textit{LP}}(\tau)\right)
=1n1(s)iI1(s)W~i,s(τ)(1{Yiq^1(τ)}W˙i,1,s(τ)θ1,sLP(τ))+R1(τ)\displaystyle=\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\tilde{W}_{i,s}(\tau)\left(1\{Y_{i}\leq\hat{q}_{1}(\tau)\}-\dot{W}_{i,1,s}^{\top}(\tau)\theta_{1,s}^{\textit{LP}}(\tau)\right)+R_{1}(\tau)
=1n1(s)iI1(s)W~i,s(τ)(1{Yiq^1(τ)}W~i,s(τ)θ1,sLP(τ))+R1(τ)+R2(τ)\displaystyle=\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\tilde{W}_{i,s}(\tau)\left(1\{Y_{i}\leq\hat{q}_{1}(\tau)\}-\tilde{W}_{i,s}^{\top}(\tau)\theta_{1,s}^{\textit{LP}}(\tau)\right)+R_{1}(\tau)+R_{2}(\tau)
=[1n1(s)iI1(s)W~i,s(τ)(1{Yiq^1(τ)}1{Yiq1(τ)})]\displaystyle=\left[\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\tilde{W}_{i,s}(\tau)(1\{Y_{i}\leq\hat{q}_{1}(\tau)\}-1\{Y_{i}\leq q_{1}(\tau)\})\right]
+[1n1(s)iI1(s)W~i,s(τ)(1{Yiq1(τ)}W~i,s(τ)θ1,sLP(τ))]+R1(τ)+R2(τ)\displaystyle+\left[\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\tilde{W}_{i,s}(\tau)(1\{Y_{i}\leq q_{1}(\tau)\}-\tilde{W}_{i,s}^{\top}(\tau)\theta_{1,s}^{\textit{LP}}(\tau))\right]+R_{1}(\tau)+R_{2}(\tau)
I(τ)+II(τ)+R1(τ)+R2(τ),\displaystyle\equiv I(\tau)+II(\tau)+R_{1}(\tau)+R_{2}(\tau),

where

R1(τ)=\displaystyle R_{1}(\tau)= (1n1(s)iI1(s)Wi,s(τ)𝔼(Wi,s(τ)|Si=s))(1n1(s)iI1(s)(1{Yiq^1(τ)}W˙i,1,s(τ)θ1,sLP(τ))),\displaystyle-\left(\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}W_{i,s}(\tau)-\mathbb{E}(W_{i,s}(\tau)|S_{i}=s)\right)\left(\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}(1\{Y_{i}\leq\hat{q}_{1}(\tau)\}-\dot{W}_{i,1,s}^{\top}(\tau)\theta_{1,s}^{\textit{LP}}(\tau))\right),

and

R2(τ)=(1n1(s)iI1(s)Wi,s(τ)𝔼(Wi,s(τ)|Si=s))(1n1(s)iI1(s)W~i,s(τ)θ1,sLP(τ)).\displaystyle R_{2}(\tau)=\left(\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}W_{i,s}(\tau)-\mathbb{E}(W_{i,s}(\tau)|S_{i}=s)\right)\left(\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\tilde{W}_{i,s}^{\top}(\tau)\theta_{1,s}^{\textit{LP}}(\tau)\right).

By Assumption 7 we can show that supτΥ|θ1,sLP(τ)|C<\sup_{\tau\in\Upsilon}|\theta_{1,s}^{\textit{LP}}(\tau)|\leq C<\infty for some constant C>0C>0. Therefore, we have

supτΥ|R1(τ)|\displaystyle\sup_{\tau\in\Upsilon}|R_{1}(\tau)| =supτΥ|1n1(s)iI1(s)Wi,s(τ)𝔼(Wi,s(τ)|Si=s)||1n1(s)iI1(s)(1{Yiq^1(τ)}W˙i,1,s(τ)θ1,sLP(τ))|\displaystyle=\sup_{\tau\in\Upsilon}\left|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}W_{i,s}(\tau)-\mathbb{E}(W_{i,s}(\tau)|S_{i}=s)\right|\left|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}(1\{Y_{i}\leq\hat{q}_{1}(\tau)\}-\dot{W}_{i,1,s}^{\top}(\tau)\theta_{1,s}^{\textit{LP}}(\tau))\right|
=Op(n1/2),\displaystyle=O_{p}(n^{-1/2}),

and

supτΥ|R2(τ)|\displaystyle\sup_{\tau\in\Upsilon}|R_{2}(\tau)| =supτΥ|1n1(s)iI1(s)Wi,s(τ)𝔼(Wi,s(τ)|Si=s)||1n1(s)iI1(s)W~i,s(τ)θ1,sLP(τ)|\displaystyle=\sup_{\tau\in\Upsilon}\left|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}W_{i,s}(\tau)-\mathbb{E}(W_{i,s}(\tau)|S_{i}=s)\right|\left|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\tilde{W}_{i,s}^{\top}(\tau)\theta_{1,s}^{\textit{LP}}(\tau)\right|
=Op(n1/2),\displaystyle=O_{p}(n^{-1/2}),

where we use the fact that, by Assumption 9,

\displaystyle\sup_{\tau\in\Upsilon}\left|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}W_{i,s}(\tau)-\mathbb{E}(W_{i,s}(\tau)|S_{i}=s)\right|\stackrel{{\scriptstyle d}}{{=}}\sup_{\tau\in\Upsilon}\left|\frac{1}{n_{1}(s)}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}(W_{i,s}^{s}(\tau)-\mathbb{E}W_{i,s}^{s}(\tau))\right|=O_{p}(n^{-1/2}).
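The $O_{p}(n^{-1/2})$ rate of a centered sample mean used in the display above can be illustrated numerically: quadrupling the sample size should roughly halve the root-mean-square deviation of the mean. A sketch with simulated i.i.d. draws (illustrative only; the constants are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
reps = 2000
n = 400

# RMS deviation of the centered sample mean at sample sizes n and 4n:
# an n^{-1/2} rate implies the ratio of the two RMS values is about 2.
dev_n = np.array([rng.normal(size=n).mean() for _ in range(reps)])
dev_4n = np.array([rng.normal(size=4 * n).mean() for _ in range(reps)])
rms_n = np.sqrt(np.mean(dev_n ** 2))
rms_4n = np.sqrt(np.mean(dev_4n ** 2))
ratio = rms_n / rms_4n   # should be close to 2
```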

Next, note that $\sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|=O_{p}(n^{-1/2})$, which means that for any $\varepsilon>0$, there exists a constant $M>0$ such that $\sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|\leq Mn^{-1/2}$ with probability greater than $1-\varepsilon$. On the event that $\sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|\leq Mn^{-1/2}$, we have

\displaystyle\sup_{\tau\in\Upsilon}|I(\tau)|\leq\sup_{\tau\in\Upsilon}\biggl|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\tilde{W}_{i,s}(\tau)\bigl(1\{Y_{i}(1)\leq\hat{q}_{1}(\tau)\}-1\{Y_{i}(1)\leq q_{1}(\tau)\}
\displaystyle-\mathbb{P}(Y_{i}(1)\leq\hat{q}_{1}(\tau)|X_{i},S_{i}=s)+\mathbb{P}(Y_{i}(1)\leq q_{1}(\tau)|X_{i},S_{i}=s)\bigr)\biggr|
\displaystyle+\sup_{\tau\in\Upsilon}\biggl|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\tilde{W}_{i,s}(\tau)(\mathbb{P}(Y_{i}(1)\leq\hat{q}_{1}(\tau)|X_{i},S_{i}=s)-\mathbb{P}(Y_{i}(1)\leq q_{1}(\tau)|X_{i},S_{i}=s))\biggr|
\displaystyle\leq\sup_{\tau\in\Upsilon,|q-q^{\prime}|\leq Mn^{-1/2}}\biggl|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\tilde{W}_{i,s}(\tau)\bigl(1\{Y_{i}(1)\leq q\}-1\{Y_{i}(1)\leq q^{\prime}\}
\displaystyle-\mathbb{P}(Y_{i}(1)\leq q|X_{i},S_{i}=s)+\mathbb{P}(Y_{i}(1)\leq q^{\prime}|X_{i},S_{i}=s)\bigr)\biggr|+C\sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|
\displaystyle\leq\sup_{\tau\in\Upsilon,|q-q^{\prime}|\leq Mn^{-1/2}}\biggl|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\tilde{W}_{i,s}(\tau)\bigl(1\{Y_{i}(1)\leq q\}-1\{Y_{i}(1)\leq q^{\prime}\}
\displaystyle-\mathbb{P}(Y_{i}(1)\leq q|X_{i},S_{i}=s)+\mathbb{P}(Y_{i}(1)\leq q^{\prime}|X_{i},S_{i}=s)\bigr)\biggr|+Cn^{-1/2}
\displaystyle=O_{p}(n^{-1/2}),

where the first inequality is due to the triangle inequality, the second inequality uses the event bound $\sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|\leq Mn^{-1/2}$ together with the boundedness of $f_{1}(\cdot|X_{i},S_{i}=s)$, and the third inequality again uses the event bound. To see the last equality in the above display, we define

\displaystyle\mathcal{F}=\left\{(W_{i,s}(\tau,j)-\mathbb{E}(W_{i,s}(\tau,j)|S_{i}=s))\bigl(1\{Y_{i}(1)\leq q\}-1\{Y_{i}(1)\leq q^{\prime}\}-\mathbb{P}(Y_{i}(1)\leq q|X_{i},S_{i}=s)+\mathbb{P}(Y_{i}(1)\leq q^{\prime}|X_{i},S_{i}=s)\bigr):\tau\in\Upsilon,\ |q-q^{\prime}|\leq Mn^{-1/2}\right\}

with envelope Fi=2Li+𝔼(Li|Si=s)L,dF_{i}=2L_{i}+\mathbb{E}(L_{i}|S_{i}=s)\in L_{\mathbb{P},d} for some d>2d>2, where Wi,s(τ,j)W_{i,s}(\tau,j) is the jjth coordinate of Wi,s(τ)W_{i,s}(\tau). Clearly \mathcal{F} is of the VC-type with fixed coefficients (α,v)(\alpha,v). In addition,

supff2Cn1/2σn2.\displaystyle\sup_{f\in\mathcal{F}}\mathbb{P}f^{2}\leq Cn^{-1/2}\equiv\sigma_{n}^{2}.

Therefore, Lemma N.2 implies that $\sup_{\tau\in\Upsilon}|I(\tau)|=O_{p}(n^{-1/2})$. By the usual maximal inequality (e.g., van der Vaart and Wellner (1996), Theorem 2.14.1), we can show that

supτΥ|II(τ)|=Op(n1/2).\displaystyle\sup_{\tau\in\Upsilon}|II(\tau)|=O_{p}(n^{-1/2}).

Combining these results, we conclude that

supτΥ,s𝒮|1n1(s)iI1(s)W˙i,1,s(τ)(1{Yiq^1(τ)}W˙i,1,s(τ)θ1,sLP(τ))|=Op(n1/2),\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\dot{W}_{i,1,s}(\tau)(1\{Y_{i}\leq\hat{q}_{1}(\tau)\}-\dot{W}_{i,1,s}^{\top}(\tau)\theta_{1,s}^{\textit{LP}}(\tau))\right|=O_{p}(n^{-1/2}),

and hence

supτΥ,s𝒮|θ^1,sLP(τ)θ1,sLP(τ)|=Op(n1/2).\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}|\hat{\theta}_{1,s}^{\textit{LP}}(\tau)-\theta_{1,s}^{\textit{LP}}(\tau)|=O_{p}(n^{-1/2}).

Appendix J Proof of Theorem 5.4

$H_{i}$: for $i\in[n]$, $H_{i}=H(X_{i})$ for some function $H$.
$\theta_{a,s}^{\textit{ML}}(\tau)$: for $a\in\{0,1\}$ and $s\in\mathcal{S}$, the pseudo true value defined in (5.9).
$\hat{\theta}_{a,s}^{\textit{ML}}(\tau)$: for $a\in\{0,1\}$ and $s\in\mathcal{S}$, the estimator of $\theta_{a,s}^{\textit{ML}}(\tau)$ in (5.8).

Let dHd_{H} be the dimension of HiH_{i}, udHu\in\Re^{d_{H}}, and

Qn(τ,s,q,u)=1na(s)iIa(s)[1{Yiq}log(λ(Hi(θaML(τ)+u)))+1{Yi>q}log(1λ(Hi(θaML(τ)+u)))],\displaystyle Q_{n}(\tau,s,q,u)=\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}[1\{Y_{i}\leq q\}\log(\lambda(H_{i}^{\top}(\theta_{a}^{\textit{ML}}(\tau)+u)))+1\{Y_{i}>q\}\log(1-\lambda(H_{i}^{\top}(\theta_{a}^{\textit{ML}}(\tau)+u)))],

and

Q(τ,s,q,u)=𝔼[1{Yi(a)q}log(λ(Hi(θaML(τ)+u)))+1{Yi(a)>q}log(1λ(Hi(θaML(τ)+u)))|Si=s].\displaystyle Q(\tau,s,q,u)=\mathbb{E}[1\{Y_{i}(a)\leq q\}\log(\lambda(H_{i}^{\top}(\theta_{a}^{\textit{ML}}(\tau)+u)))+1\{Y_{i}(a)>q\}\log(1-\lambda(H_{i}^{\top}(\theta_{a}^{\textit{ML}}(\tau)+u)))|S_{i}=s].
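The maximum-likelihood step behind $Q_{n}$ is an ordinary logistic regression of the indicator $1\{Y_{i}\leq q\}$ on $H_{i}$ within a cell, with $\lambda(\cdot)$ the logistic CDF. A minimal Newton-iteration sketch on synthetic data (all data-generating choices here are hypothetical, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

X = rng.normal(size=n)
Y = X + rng.normal(size=n)            # synthetic outcome within one cell
q = np.median(Y)                      # a fixed quantile level q_a(tau)
D = (Y <= q).astype(float)            # the indicator 1{Y_i <= q}
H = np.column_stack([np.ones(n), X])  # H_i = H(X_i), here (1, X_i)

def lam(z):                           # logistic CDF lambda(.)
    return 1.0 / (1.0 + np.exp(-z))

# Newton iterations maximizing the sample log-likelihood Q_n
theta = np.zeros(2)
for _ in range(25):
    p = lam(H @ theta)
    grad = H.T @ (D - p) / n
    hess = -(H * (p * (1 - p))[:, None]).T @ H / n
    theta -= np.linalg.solve(hess, grad)

grad_norm = np.linalg.norm(H.T @ (lam(H @ theta) - D) / n)
```

At convergence the score is (numerically) zero; here the fitted slope is negative, since $1\{Y\leq q\}$ is decreasing in $X$ under this design.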

We note that

\displaystyle\mathcal{F}=\left\{1\{Y_{i}\leq q\}\log(\lambda(H_{i}^{\top}(\theta_{a}^{\textit{ML}}(\tau)+u)))+1\{Y_{i}>q\}\log(1-\lambda(H_{i}^{\top}(\theta_{a}^{\textit{ML}}(\tau)+u)))-Q(\tau,s,q,u):\tau\in\Upsilon,\ s\in\mathcal{S},\ q\in\Re,\ ||u||_{2}\leq\delta\right\}

is a VC class with a fixed VC index. Then, Lemma N.2 implies

supτΥ,s𝒮,q,||u||2δ|Qn(τ,s,q,u)Q(τ,s,q,u)|=op(1).\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S},q\in\Re,||u||_{2}\leq\delta}|Q_{n}(\tau,s,q,u)-Q(\tau,s,q,u)|=o_{p}(1).

In addition,

supτΥ,s𝒮,q,||u||2δ|qQ(τ,s,q,u)|C,\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S},q\in\Re,||u||_{2}\leq\delta}|\partial_{q}Q(\tau,s,q,u)|\leq C,

and supτΥ|q^a(τ)qa(τ)|=Op(n1/2)\sup_{\tau\in\Upsilon}|\hat{q}_{a}(\tau)-q_{a}(\tau)|=O_{p}(n^{-1/2}). Therefore,

ΔnsupτΥ,s𝒮,||u||2δ|Qn(τ,s,q^a(τ),u)Q(τ,s,qa(τ),u)|\displaystyle\Delta_{n}\equiv\sup_{\tau\in\Upsilon,s\in\mathcal{S},||u||_{2}\leq\delta}|Q_{n}(\tau,s,\hat{q}_{a}(\tau),u)-Q(\tau,s,q_{a}(\tau),u)|
supτΥ,s𝒮,q,||u||2δ|Qn(τ,s,q,u)Q(τ,s,q,u)|\displaystyle\leq\sup_{\tau\in\Upsilon,s\in\mathcal{S},q\in\Re,||u||_{2}\leq\delta}|Q_{n}(\tau,s,q,u)-Q(\tau,s,q,u)|
+supτΥ,s𝒮,||u||2δ|Q(τ,s,q^a(τ),u)Q(τ,s,qa(τ),u)|=op(1).\displaystyle+\sup_{\tau\in\Upsilon,s\in\mathcal{S},||u||_{2}\leq\delta}|Q(\tau,s,\hat{q}_{a}(\tau),u)-Q(\tau,s,q_{a}(\tau),u)|=o_{p}(1). (J.1)

Further note that $Q_{n}(\tau,s,\hat{q}_{a}(\tau),u)$ is concave in $u$ for fixed $\tau$. Therefore, for $v\in S^{d_{H}-1}$, where $S^{d_{H}-1}=\{v\in\Re^{d_{H}}:||v||_{2}=1\}$, and for $l>\delta$,

Qn(τ,s,q^a(τ),δv)δlQn(τ,s,q^a(τ),lv)+(1δl)Qn(τ,s,q^a(τ),0),\displaystyle Q_{n}(\tau,s,\hat{q}_{a}(\tau),\delta v)\geq\frac{\delta}{l}Q_{n}(\tau,s,\hat{q}_{a}(\tau),lv)+\left(1-\frac{\delta}{l}\right)Q_{n}(\tau,s,\hat{q}_{a}(\tau),0),

which implies

δl(Qn(τ,s,q^a(τ),lv)Qn(τ,s,q^a(τ),0))\displaystyle\frac{\delta}{l}\left(Q_{n}(\tau,s,\hat{q}_{a}(\tau),lv)-Q_{n}(\tau,s,\hat{q}_{a}(\tau),0)\right) Qn(τ,s,q^a(τ),δv)Qn(τ,s,q^a(τ),0)\displaystyle\leq Q_{n}(\tau,s,\hat{q}_{a}(\tau),\delta v)-Q_{n}(\tau,s,\hat{q}_{a}(\tau),0)
Q(τ,s,qa(τ),δv)Q(τ,s,qa(τ),0)+2Δn.\displaystyle\leq Q(\tau,s,q_{a}(\tau),\delta v)-Q(\tau,s,q_{a}(\tau),0)+2\Delta_{n}.
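The first step above is the chord inequality for concave functions, $f(\delta v)\geq(\delta/l)f(lv)+(1-\delta/l)f(0)$ for $l>\delta>0$. It can be checked numerically with any concave stand-in for $Q_{n}$ (an illustrative sketch; the quadratic below is an arbitrary concave choice):

```python
import numpy as np

rng = np.random.default_rng(4)

def f(u):                            # a concave function standing in for Q_n
    return -np.sum(u ** 2)

delta, l = 0.5, 3.0
checks = []
for _ in range(100):
    v = rng.normal(size=4)
    v /= np.linalg.norm(v)           # v on the unit sphere
    lhs = f(delta * v)
    rhs = (delta / l) * f(l * v) + (1 - delta / l) * f(0 * v)
    checks.append(lhs >= rhs - 1e-12)
all_hold = all(checks)
```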

Because Q(τ,s,qa(τ),δv)Q(τ,s,qa(τ),0)Q(\tau,s,q_{a}(\tau),\delta v)-Q(\tau,s,q_{a}(\tau),0) is continuous in (τ,v)Υ×SdH1(\tau,v)\in\Upsilon\times S^{d_{H}-1}, Υ×SdH1\Upsilon\times S^{d_{H}-1} is compact, and 0 is the unique maximizer of Q(τ,s,qa(τ),u)Q(\tau,s,q_{a}(\tau),u), we have

\displaystyle\sup_{(\tau,v)\in\Upsilon\times S^{d_{H}-1}}\left[Q(\tau,s,q_{a}(\tau),\delta v)-Q(\tau,s,q_{a}(\tau),0)\right]\leq-\eta,

for some $\eta>0$. In addition, if $\sup_{\tau\in\Upsilon}||\hat{\theta}_{a,s}^{\textit{ML}}(\tau)-\theta_{a,s}^{\textit{ML}}(\tau)||_{2}>\delta$, then there exists $(\tau,l,v)\in\Upsilon\times(\delta,\infty)\times S^{d_{H}-1}$ such that

δl(Qn(τ,s,q^a(τ),lv)Qn(τ,s,q^a(τ),0))0.\displaystyle\frac{\delta}{l}\left(Q_{n}(\tau,s,\hat{q}_{a}(\tau),lv)-Q_{n}(\tau,s,\hat{q}_{a}(\tau),0)\right)\geq 0.

Therefore,

(supτΥ||θ^a,sML(τ)θa,sML(τ)||2>δ)(η2Δn)0,\displaystyle\mathbb{P}\left(\sup_{\tau\in\Upsilon}||\hat{\theta}_{a,s}^{\textit{ML}}(\tau)-\theta_{a,s}^{\textit{ML}}(\tau)||_{2}>\delta\right)\leq\mathbb{P}(\eta\leq 2\Delta_{n})\rightarrow 0,

where the last step is due to (J.1). This implies

supτΥ||θ^a,sML(τ)θa,sML(τ)||2=op(1).\displaystyle\sup_{\tau\in\Upsilon}||\hat{\theta}_{a,s}^{\textit{ML}}(\tau)-\theta_{a,s}^{\textit{ML}}(\tau)||_{2}=o_{p}(1).

Appendix K Proof of Theorem 5.5

$\lambda(\cdot)$: the logistic CDF.
$\theta_{a,s}^{\textit{LPML}}(\tau)$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, the pseudo true value defined in (5.11).
$\hat{\theta}_{a,s}^{\textit{LPML}}(\tau)$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, the estimator of $\theta_{a,s}^{\textit{LPML}}(\tau)$ in (5.15).
$\omega_{i,a,s}(\tau)$: for $a=0,1$, $i\in[n]$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $\omega_{i,a,s}(\tau)=\lambda(H_{i}^{\top}\theta_{a,s}^{\textit{ML}}(\tau))$.
$W_{i,s}(\tau)$: for $i\in[n]$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $W_{i,s}(\tau)=(\omega_{i,1,s}(\tau),\omega_{i,0,s}(\tau))^{\top}$.
$\omega_{i,a,a^{\prime},s}(\tau)$: for $a,a^{\prime}\in\{0,1\}$, $i\in[n]$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $\omega_{i,a,a^{\prime},s}(\tau)=\omega_{i,a^{\prime},s}(\tau)-\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\omega_{i,a^{\prime},s}(\tau)$.
$\dot{W}_{i,a,s}(\tau)$: for $a=0,1$, $i\in[n]$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $\dot{W}_{i,a,s}(\tau)=W_{i,s}(\tau)-\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}W_{i,s}(\tau)=(\omega_{i,a,1,s}(\tau),\omega_{i,a,0,s}(\tau))^{\top}$.
$\hat{\omega}_{i,a,s}(\tau)$: for $a=0,1$, $i\in[n]$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $\hat{\omega}_{i,a,s}(\tau)=\lambda(H_{i}^{\top}\hat{\theta}_{a,s}^{\textit{ML}}(\tau))$.
$\hat{W}_{i,s}(\tau)$: for $i\in[n]$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $\hat{W}_{i,s}(\tau)=(\hat{\omega}_{i,1,s}(\tau),\hat{\omega}_{i,0,s}(\tau))^{\top}$.
$\hat{\omega}_{i,a,a^{\prime},s}(\tau)$: for $a,a^{\prime}\in\{0,1\}$, $i\in[n]$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $\hat{\omega}_{i,a,a^{\prime},s}(\tau)=\hat{\omega}_{i,a^{\prime},s}(\tau)-\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\hat{\omega}_{i,a^{\prime},s}(\tau)$.
$\breve{W}_{i,a,s}(\tau)$: for $a=0,1$, $i\in[n]$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $\breve{W}_{i,a,s}(\tau)=\hat{W}_{i,s}(\tau)-\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\hat{W}_{i,s}(\tau)=(\hat{\omega}_{i,a,1,s}(\tau),\hat{\omega}_{i,a,0,s}(\tau))^{\top}$.

Recall m^a(τ,s,Xi)=τW^i,s(τ)θ^a,sLPML(τ)\widehat{m}_{a}(\tau,s,X_{i})=\tau-\hat{W}_{i,s}^{\top}(\tau)\hat{\theta}_{a,s}^{\textit{LPML}}(\tau) and m¯a(τ,s,Xi)=τWi,s(τ)θa,sLPML(τ)\overline{m}_{a}(\tau,s,X_{i})=\tau-W_{i,s}^{\top}(\tau)\theta_{a,s}^{\textit{LPML}}(\tau). Let dHd_{H} be the dimension of HiH_{i}. Then, we have

Δ¯a(τ,s,Xi)\displaystyle\overline{\Delta}_{a}(\tau,s,X_{i}) =m^a(τ,s,Xi)m¯a(τ,s,Xi)\displaystyle=\widehat{m}_{a}(\tau,s,X_{i})-\overline{m}_{a}(\tau,s,X_{i})
Λτ,s(Xi,θa,s(τ))Λτ,s(Xi,θ^a,s(τ)),\displaystyle\equiv\Lambda_{\tau,s}(X_{i},\theta_{a,s}(\tau))-\Lambda_{\tau,s}(X_{i},\hat{\theta}_{a,s}(\tau)),

where the functional form Λτ,s()\Lambda_{\tau,s}(\cdot) is invariant to (τ,s)(\tau,s),

Λτ,s(Xi,θ)=Λ(Xi,θ)(λ(H(Xi)θ1),λ(H(Xi)θ2))θ3,θ1,θ2dH,θ32,\displaystyle\Lambda_{\tau,s}(X_{i},\theta)=\Lambda(X_{i},\theta)\equiv(\lambda(H(X_{i})^{\top}\theta_{1}),\lambda(H(X_{i})^{\top}\theta_{2}))\theta_{3},\quad\theta_{1},\theta_{2}\in\Re^{d_{H}},\quad\theta_{3}\in\Re^{2},
θa,s(τ)=((θ1,sML(τ)),(θ0,sML(τ)),(θa,sLPML(τ))),and\displaystyle\theta_{a,s}(\tau)=((\theta_{1,s}^{\textit{ML}}(\tau))^{\top},(\theta_{0,s}^{\textit{ML}}(\tau))^{\top},(\theta_{a,s}^{\textit{LPML}}(\tau))^{\top})^{\top},\quad\text{and}
θ^a,s(τ)=((θ^1,sML(τ)),(θ^0,sML(τ)),(θ^a,sLPML(τ))).\displaystyle\hat{\theta}_{a,s}(\tau)=((\hat{\theta}_{1,s}^{\textit{ML}}(\tau))^{\top},(\hat{\theta}_{0,s}^{\textit{ML}}(\tau))^{\top},(\hat{\theta}_{a,s}^{\textit{LPML}}(\tau))^{\top})^{\top}.

Suppose

supτΥ,a=0,1,s𝒮|θ^a,sLPML(τ)θa,sLPML(τ)|=op(1),\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}|\hat{\theta}_{a,s}^{\textit{LPML}}(\tau)-\theta_{a,s}^{\textit{LPML}}(\tau)|=o_{p}(1), (K.1)

and we also have

supτΥ,a=0,1,s𝒮|θ^a,sML(τ)θa,sML(τ)|=op(1)\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}|\hat{\theta}_{a,s}^{\textit{ML}}(\tau)-\theta_{a,s}^{\textit{ML}}(\tau)|=o_{p}(1)

by Theorem 5.4. Then Assumption 6(iii) holds for θ^a,s(τ)\hat{\theta}_{a,s}(\tau). Assumption 6(i) holds automatically as Λ()\Lambda(\cdot) does not depend on τ\tau, and Assumption 6(ii) holds by Assumption 11. Then, Theorem 5.1 implies Assumptions 3 and 5 hold. In addition, Theorem 5.2 implies [ΣLPML(τk,τl)]k,l[K][\Sigma^{\textit{LPML}}(\tau_{k},\tau_{l})]_{k,l\in[K]} is the smallest among all linear adjustments with Wi,s(τ)=(ωi,1,s(τ),ωi,0,s(τ))W_{i,s}(\tau)=(\omega_{i,1,s}(\tau),\omega_{i,0,s}(\tau))^{\top} with ωi,a,s(τ)=λ(Hiθa,sML(τ))\omega_{i,a,s}(\tau)=\lambda(H_{i}^{\top}\theta_{a,s}^{\textit{ML}}(\tau)) as the regressors.

Therefore, it remains to establish (K.1). First, denote

ω˙i,a,a,s(τ)=ωi,a,s(τ)1na(s)iIa(s)ωi,a,s(τ),W˙i,a,s(τ)=(ω˙i,a,1,s(τ),ω˙i,a,0,s(τ)),and\displaystyle\dot{\omega}_{i,a,a^{\prime},s}(\tau)=\omega_{i,a^{\prime},s}(\tau)-\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\omega_{i,a^{\prime},s}(\tau),\quad\dot{W}_{i,a,s}(\tau)=(\dot{\omega}_{i,a,1,s}(\tau),\dot{\omega}_{i,a,0,s}(\tau))^{\top},\quad\text{and}
θ˘a,sLPML(τ)=[1na(s)iIa(s)W˙i,a,s(τ)W˙i,a,s(τ)]1[1na(s)iIa(s)W˙i,a,s(τ)1{Yiq^a(τ)}].\displaystyle\breve{\theta}_{a,s}^{\textit{LPML}}(\tau)=\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{W}_{i,a,s}(\tau)\dot{W}_{i,a,s}(\tau)^{\top}\right]^{-1}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{W}_{i,a,s}(\tau)1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\right].

We note that Assumption 9 holds with Wi,s(τ)=(λ(Hiθ1,sML(τ)),λ(Hiθ0,sML(τ)))W_{i,s}(\tau)=(\lambda(H_{i}^{\top}\theta_{1,s}^{\textit{ML}}(\tau)),\lambda(H_{i}^{\top}\theta_{0,s}^{\textit{ML}}(\tau)))^{\top} by Assumption 11(ii). Then, following the same argument as in the proof of Theorem 5.3, we can show that

supτΥ,a=0,1,s𝒮||θ˘a,sLPML(τ)θa,sLPML(τ)||2=op(1).\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}||\breve{\theta}_{a,s}^{\textit{LPML}}(\tau)-\theta_{a,s}^{\textit{LPML}}(\tau)||_{2}=o_{p}(1).

Therefore, it suffices to show

supτΥ,a=0,1,s𝒮||θ˘a,sLPML(τ)θ^a,sLPML(τ)||2=op(1).\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}||\breve{\theta}_{a,s}^{\textit{LPML}}(\tau)-\hat{\theta}_{a,s}^{\textit{LPML}}(\tau)||_{2}=o_{p}(1).

Denote W^i,s(τ)=(ω^i,1,s(τ),ω^i,0,s(τ))\hat{W}_{i,s}(\tau)=(\hat{\omega}_{i,1,s}(\tau),\hat{\omega}_{i,0,s}(\tau))^{\top} and ω^i,a,s(τ)=λ(Hiθ^a,sML(τ))\hat{\omega}_{i,a,s}(\tau)=\lambda(H_{i}^{\top}\hat{\theta}_{a,s}^{\textit{ML}}(\tau)). We have

supτΥ,a=0,1,s𝒮[1na(s)iIa(s)(ωi,1,s(τ)ω^i,1,s(τ))2]1/2\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(\omega_{i,1,s}(\tau)-\hat{\omega}_{i,1,s}(\tau))^{2}\right]^{1/2}
maxa=0,1,s𝒮λmax1/2(1na(s)iIa(s)HiHi)supτΥ,a=0,1,s𝒮||θ^1,sML(τ)θ1,sML(τ)||2=op(1).\displaystyle\leq\max_{a=0,1,s\in\mathcal{S}}\lambda_{\max}^{1/2}\left(\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}H_{i}H_{i}^{\top}\right)\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}||\hat{\theta}_{1,s}^{\textit{ML}}(\tau)-\theta_{1,s}^{\textit{ML}}(\tau)||_{2}=o_{p}(1).

In addition, denote ω˘i,a,a,s(τ)=ω^i,a,s(τ)1na(s)iIa(s)ω^i,a,s(τ)\breve{\omega}_{i,a,a^{\prime},s}(\tau)=\hat{\omega}_{i,a^{\prime},s}(\tau)-\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\hat{\omega}_{i,a^{\prime},s}(\tau) and W˘i,a,s(τ)=(ω˘i,a,1,s(τ),ω˘i,a,0,s(τ))\breve{W}_{i,a,s}(\tau)=(\breve{\omega}_{i,a,1,s}(\tau),\breve{\omega}_{i,a,0,s}(\tau))^{\top}. We first consider the case a=1a^{\prime}=1. We have

supτΥ,a=0,1,s𝒮|[1na(s)iIa(s)ω˙i,a,1,s2(τ)]1/2[1na(s)iIa(s)ω˘i,a,1,s2(τ)]1/2|\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left|\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{\omega}_{i,a,1,s}^{2}(\tau)\right]^{1/2}-\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{\omega}_{i,a,1,s}^{2}(\tau)\right]^{1/2}\right|
\displaystyle\leq\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(\dot{\omega}_{i,a,1,s}(\tau)-\breve{\omega}_{i,a,1,s}(\tau))^{2}\right]^{1/2}
supτΥ,a=0,1,s𝒮[1na(s)iIa(s)(ωi,1,s(τ)ω^i,1,s(τ))2]1/2+supτΥ,a=0,1,s𝒮|1na(s)iIa(s)(ωi,1,s(τ)ω^i,1,s(τ))|\displaystyle\leq\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(\omega_{i,1,s}(\tau)-\hat{\omega}_{i,1,s}(\tau))^{2}\right]^{1/2}+\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left|\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(\omega_{i,1,s}(\tau)-\hat{\omega}_{i,1,s}(\tau))\right|
2supτΥ,a=0,1,s𝒮[1na(s)iIa(s)(ωi,1,s(τ)ω^i,1,s(τ))2]1/2=op(1).\displaystyle\leq 2\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(\omega_{i,1,s}(\tau)-\hat{\omega}_{i,1,s}(\tau))^{2}\right]^{1/2}=o_{p}(1).

In addition, Assumption 11 implies $\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{\omega}_{i,a,1,s}^{2}(\tau)\right]^{1/2}\leq C<\infty$. Therefore, we have

supτΥ,a=0,1,s𝒮|[1na(s)iIa(s)ω˙i,a,1,s2(τ)][1na(s)iIa(s)ω˘i,a,1,s2(τ)]|=op(1).\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left|\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{\omega}_{i,a,1,s}^{2}(\tau)\right]-\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{\omega}_{i,a,1,s}^{2}(\tau)\right]\right|=o_{p}(1).

Similarly, we have

supτΥ,a=0,1,s𝒮|[1na(s)iIa(s)ω˙i,a,0,s2(τ)][1na(s)iIa(s)ω˘i,a,0,s2(τ)]|=op(1).\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left|\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{\omega}_{i,a,0,s}^{2}(\tau)\right]-\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{\omega}_{i,a,0,s}^{2}(\tau)\right]\right|=o_{p}(1).

Last,

supτΥ,a=0,1,s𝒮|[1na(s)iIa(s)ω˙i,a,0,s(τ)ω˙i,a,1,s(τ)][1na(s)iIa(s)ω˘i,a,0,s(τ)ω˘i,a,1,s(τ)]|\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left|\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{\omega}_{i,a,0,s}(\tau)\dot{\omega}_{i,a,1,s}(\tau)\right]-\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{\omega}_{i,a,0,s}(\tau)\breve{\omega}_{i,a,1,s}(\tau)\right]\right|
supτΥ,a=0,1,s𝒮[(1na(s)iIa(s)ω˙i,a,0,s2(τ))1/2+(1na(s)iIa(s)ω˘i,a,1,s2(τ))1/2]\displaystyle\leq\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left[\left(\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{\omega}_{i,a,0,s}^{2}(\tau)\right)^{1/2}+\left(\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{\omega}_{i,a,1,s}^{2}(\tau)\right)^{1/2}\right]
×{[1na(s)iIa(s)(ω˙i,a,1,s(τ)ω˘i,a,1,s(τ))2]1/2+[1na(s)iIa(s)(ω˙i,a,0,s(τ)ω˘i,a,0,s(τ))2]1/2}=op(1).\displaystyle\times\biggl{\{}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(\dot{\omega}_{i,a,1,s}(\tau)-\breve{\omega}_{i,a,1,s}(\tau))^{2}\right]^{1/2}+\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(\dot{\omega}_{i,a,0,s}(\tau)-\breve{\omega}_{i,a,0,s}(\tau))^{2}\right]^{1/2}\biggr{\}}=o_{p}(1).
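The last bound rests on the elementary inequality $|ab-a^{\prime}b^{\prime}|\leq(|a|+|b^{\prime}|)(|a-a^{\prime}|+|b-b^{\prime}|)$, applied together with the Cauchy–Schwarz inequality in the sample. A quick numerical check of the scalar inequality (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(5)
holds = []
for _ in range(10_000):
    a, b, a2, b2 = rng.normal(size=4)
    # |ab - a'b'| = |a(b - b') + b'(a - a')| <= |a||b-b'| + |b'||a-a'|
    lhs = abs(a * b - a2 * b2)
    rhs = (abs(a) + abs(b2)) * (abs(a - a2) + abs(b - b2))
    holds.append(lhs <= rhs + 1e-12)
all_hold = all(holds)
```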

This implies

supτΥ,a=0,1,s𝒮|[1na(s)iIa(s)W˙i,a,s(τ)W˙i,a,s(τ)][1na(s)iIa(s)W˘i,a,s(τ)W˘i,a,s(τ)]|\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left|\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{W}_{i,a,s}(\tau)\dot{W}_{i,a,s}(\tau)^{\top}\right]-\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{W}_{i,a,s}(\tau)\breve{W}_{i,a,s}(\tau)^{\top}\right]\right|
=supτΥ,a=0,1,s𝒮|[1na(s)iIa(s)(ω˙i,a,1,s2(τ)ω˙i,a,1,s(τ)ω˙i,a,0,s(τ)ω˙i,a,0,s(τ)ω˙i,a,1,s(τ)ω˙i,a,0,s2(τ))]\displaystyle=\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\biggl{|}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\begin{pmatrix}\dot{\omega}_{i,a,1,s}^{2}(\tau)&\dot{\omega}_{i,a,1,s}(\tau)\dot{\omega}_{i,a,0,s}(\tau)\\ \dot{\omega}_{i,a,0,s}(\tau)\dot{\omega}_{i,a,1,s}(\tau)&\dot{\omega}_{i,a,0,s}^{2}(\tau)\end{pmatrix}\right]
[1na(s)iIa(s)(ω˘i,a,1,s2(τ)ω˘i,a,1,s(τ)ω˘i,a,0,s(τ)ω˘i,a,0,s(τ)ω˘i,a,1,s(τ)ω˘i,a,0,s2(τ))]|=op(1),\displaystyle-\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\begin{pmatrix}\breve{\omega}_{i,a,1,s}^{2}(\tau)&\breve{\omega}_{i,a,1,s}(\tau)\breve{\omega}_{i,a,0,s}(\tau)\\ \breve{\omega}_{i,a,0,s}(\tau)\breve{\omega}_{i,a,1,s}(\tau)&\breve{\omega}_{i,a,0,s}^{2}(\tau)\end{pmatrix}\right]\biggr{|}=o_{p}(1),

and thus,

supτΥ,a=0,1,s𝒮|[1na(s)iIa(s)W˙i,a,s(τ)W˙i,a,s(τ)]1[1na(s)iIa(s)W˘i,a,s(τ)W˘i,a,s(τ)]1|=op(1).\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left|\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{W}_{i,a,s}(\tau)\dot{W}_{i,a,s}(\tau)^{\top}\right]^{-1}-\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{W}_{i,a,s}(\tau)\breve{W}_{i,a,s}(\tau)^{\top}\right]^{-1}\right|=o_{p}(1).

In addition,

\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left\|\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(\dot{W}_{i,a,s}(\tau)-\breve{W}_{i,a,s}(\tau))1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\right\|_{2}
supτΥ,a=0,1,s𝒮1na(s)iIa(s)W˙i,a,s(τ)W˘i,a,s(τ)2\displaystyle\leq\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\left\|\dot{W}_{i,a,s}(\tau)-\breve{W}_{i,a,s}(\tau)\right\|_{2}
2supτΥ,a=0,1,s𝒮1na(s)iIa(s)Wi,s(τ)W^i,s(τ)2=op(1).\displaystyle\leq 2\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\left\|W_{i,s}(\tau)-\hat{W}_{i,s}(\tau)\right\|_{2}=o_{p}(1).

Therefore, we have

supτΥ,a=0,1,s𝒮||θ˘a,sLPML(τ)θ^a,sLPML(τ)||2\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}||\breve{\theta}_{a,s}^{\textit{LPML}}(\tau)-\hat{\theta}_{a,s}^{\textit{LPML}}(\tau)||_{2}
=supτΥ,a=0,1,s𝒮[1na(s)iIa(s)W˙i,a,s(τ)W˙i,a,s(τ)]1[1na(s)iIa(s)W˙i,a,s(τ)1{Yiq^a(τ)}]\displaystyle=\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\biggl{\|}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{W}_{i,a,s}(\tau)\dot{W}_{i,a,s}(\tau)^{\top}\right]^{-1}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{W}_{i,a,s}(\tau)1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\right]
[1na(s)iIa(s)W˘i,a,s(τ)W˘i,a,s(τ)]1[1na(s)iIa(s)W˘i,a,s(τ)1{Yiq^a(τ)}]2\displaystyle-\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{W}_{i,a,s}(\tau)\breve{W}_{i,a,s}(\tau)^{\top}\right]^{-1}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{W}_{i,a,s}(\tau)1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\right]\biggr{\|}_{2}
=op(1),\displaystyle=o_{p}(1),

which concludes the proof.

Appendix L Proof of Theorem 5.6

$\theta_{a,s}^{\textit{NP}}(\tau)$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, the pseudo true value defined in Assumption 12(ii).
$\hat{\theta}_{a,s}^{\textit{NP}}(\tau)$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, the estimator of $\theta_{a,s}^{\textit{NP}}(\tau)$ in (5.17).

The proof strategy follows Belloni et al. (2017) and details are given here for completeness. We divide the proof into three steps. In the first step, we show

supτΥ||θ^a,sNP(τ)θa,sNP(τ)||2=Op(hnlog(n)/n).\sup_{\tau\in\Upsilon}||\hat{\theta}_{a,s}^{\textit{NP}}(\tau)-\theta_{a,s}^{\textit{NP}}(\tau)||_{2}=O_{p}(\sqrt{h_{n}\log(n)/n}).

In the second step, we establish Assumption 5. By a similar argument, we can establish Assumption 3(i). In the third step, we establish Assumptions 3(ii) and 3(iii).

Step 1. Let U^τ=θ^NPa,s(τ)θNPa,s(τ)\hat{U}_{\tau}=\hat{\theta}^{\textit{NP}}_{a,s}(\tau)-\theta^{\textit{NP}}_{a,s}(\tau),

Qn(τ,s,q,θ)\displaystyle Q_{n}(\tau,s,q,\theta) =1na(s)iIa(s)[1{Yiq}log(λ(Hhn(Xi)θa))+1{Yi>q}log(1λ(Hhn(Xi)θa))]\displaystyle=\frac{-1}{n_{a}(s)}\sum_{i\in I_{a}(s)}[1\{Y_{i}\leq q\}\log(\lambda(H_{h_{n}}^{\top}(X_{i})\theta_{a}))+1\{Y_{i}>q\}\log(1-\lambda(H_{h_{n}}^{\top}(X_{i})\theta_{a}))]
=1na(s)iIa(s)[log(1+exp(Hhn(Xi)θa))1{Yiq}Hhn(Xi)θa],\displaystyle=\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}[\log\left(1+\exp(H_{h_{n}}^{\top}(X_{i})\theta_{a})\right)-1\{Y_{i}\leq q\}H_{h_{n}}^{\top}(X_{i})\theta_{a}],

and for an arbitrary UτhnU_{\tau}\in\Re^{h_{n}},

i(t)=log(1+exp(Hhn(Xi)(θNPa,s(τ)+tUτ))).\displaystyle\ell_{i}(t)=\log(1+\exp(H_{h_{n}}^{\top}(X_{i})(\theta^{\textit{NP}}_{a,s}(\tau)+tU_{\tau}))).
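The second expression for $Q_{n}$ above rewrites the logistic log-likelihood using $\log\lambda(z)=z-\log(1+e^{z})$ and $\log(1-\lambda(z))=-\log(1+e^{z})$. This algebraic identity can be verified numerically (an illustrative sketch with arbitrary synthetic values):

```python
import numpy as np

rng = np.random.default_rng(6)

def lam(z):                                  # logistic CDF
    return 1.0 / (1.0 + np.exp(-z))

z = rng.normal(size=1000) * 3                # index H(X_i)' theta
d = (rng.random(1000) < 0.5).astype(float)   # indicator 1{Y_i <= q}

# negative log-likelihood, two algebraically equal forms
nll_a = -(d * np.log(lam(z)) + (1 - d) * np.log(1 - lam(z)))
nll_b = np.log1p(np.exp(z)) - d * z

max_gap = np.max(np.abs(nll_a - nll_b))
```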

Then, we have

U^τ=argminUτQn(τ,s,q^a(τ),θNPa,s(τ)+Uτ)Qn(τ,s,q^a(τ),θNPa,s(τ)),\displaystyle\hat{U}_{\tau}=\operatorname*{arg\,min}_{U_{\tau}}Q_{n}(\tau,s,\hat{q}_{a}(\tau),\theta^{\textit{NP}}_{a,s}(\tau)+U_{\tau})-Q_{n}(\tau,s,\hat{q}_{a}(\tau),\theta^{\textit{NP}}_{a,s}(\tau)),
tQn(τ,s,q^a(τ),θNPa,s(τ)+tUτ)|t=0=1na(s)iIa(s)(1{Yiq^a(τ)}λ(Hhn(Xi)θNPa,s(τ)))Hhn(Xi)Uτ,\displaystyle\partial_{t}Q_{n}(\tau,s,\hat{q}_{a}(\tau),\theta^{\textit{NP}}_{a,s}(\tau)+tU_{\tau})|_{t=0}=\frac{-1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\left(1\{Y_{i}\leq\hat{q}_{a}(\tau)\}-\lambda(H_{h_{n}}^{\top}(X_{i})\theta^{\textit{NP}}_{a,s}(\tau))\right)H_{h_{n}}^{\top}(X_{i})U_{\tau},

and

\displaystyle Q_{n}(\tau,s,\hat{q}_{a}(\tau),\theta^{\textit{NP}}_{a,s}(\tau)+U_{\tau})-Q_{n}(\tau,s,\hat{q}_{a}(\tau),\theta^{\textit{NP}}_{a,s}(\tau))-\partial_{t}Q_{n}(\tau,s,\hat{q}_{a}(\tau),\theta^{\textit{NP}}_{a,s}(\tau)+tU_{\tau})|_{t=0}
=\displaystyle= 1na(s)iIa(s)[i(1)i(0)i(0)].\displaystyle\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\left[\ell_{i}(1)-\ell_{i}(0)-\ell_{i}^{\prime}(0)\right].

In addition

|i(t)||i(t)||Hhn(Xi)Uτ|.\displaystyle|\ell_{i}^{{}^{\prime\prime\prime}}(t)|\leq|\ell_{i}^{{}^{\prime\prime}}(t)||H_{h_{n}}^{\top}(X_{i})U_{\tau}|.

Therefore, there exists a constant c¯>0\underline{c}>0 such that

i(1)i(0)i(0)\displaystyle\ell_{i}(1)-\ell_{i}(0)-\ell^{\prime}_{i}(0)
i(0)(Hhn(Xi)Uτ)2[exp(|Hhn(Xi)Uτ|)+|Hhn(Xi)Uτ|1]\displaystyle\geq\frac{\ell_{i}^{{}^{\prime\prime}}(0)}{(H_{h_{n}}^{\top}(X_{i})U_{\tau})^{2}}\left[\exp(-|H_{h_{n}}^{\top}(X_{i})U_{\tau}|)+|H_{h_{n}}^{\top}(X_{i})U_{\tau}|-1\right]
\displaystyle=\lambda(H_{h_{n}}^{\top}(X_{i})\theta^{\textit{NP}}_{a,s}(\tau))(1-\lambda(H_{h_{n}}^{\top}(X_{i})\theta^{\textit{NP}}_{a,s}(\tau)))\left[\exp(-|H_{h_{n}}^{\top}(X_{i})U_{\tau}|)+|H_{h_{n}}^{\top}(X_{i})U_{\tau}|-1\right]
c¯[exp(|Hhn(Xi)Uτ|)+|Hhn(Xi)Uτ|1]\displaystyle\geq\underline{c}\left[\exp(-|H_{h_{n}}^{\top}(X_{i})U_{\tau}|)+|H_{h_{n}}^{\top}(X_{i})U_{\tau}|-1\right]
c¯((Hhn(Xi)Uτ)22|Hhn(Xi)Uτ|36),\displaystyle\geq\underline{c}\left(\frac{(H_{h_{n}}^{\top}(X_{i})U_{\tau})^{2}}{2}-\frac{|H_{h_{n}}^{\top}(X_{i})U_{\tau}|^{3}}{6}\right),

where the first inequality is due to Bach (2010, Lemma 1) and the third inequality holds because

ex+x1x22x36,x>0.\displaystyle e^{-x}+x-1\geq\frac{x^{2}}{2}-\frac{x^{3}}{6},\;x>0.
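As a purely illustrative numerical sanity check (not part of the proof), the two elementary inequalities invoked here, $e^{-x}+x-1\geq 0$ and $e^{-x}+x-1\geq x^{2}/2-x^{3}/6$ for $x>0$, can be verified on a grid:

```python
import math

# Check e^{-x} + x - 1 >= x^2/2 - x^3/6 and e^{-x} + x - 1 >= 0 on a
# grid over (0, 20]; the gap is smallest near x = 0, where it behaves
# like x^4/24.
xs = [k / 100.0 for k in range(1, 2001)]
worst_gap = min((math.exp(-x) + x - 1) - (x * x / 2 - x ** 3 / 6) for x in xs)
assert worst_gap >= 0.0
assert all(math.exp(-x) + x - 1 >= 0.0 for x in xs)
```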

To see the second inequality, note that ex+x10e^{-x}+x-1\geq 0 for x0x\geq 0 and by Assumption 12,

infa=0,1,s𝒮,τΥ,xSupp(X)λ(Hhn(x)θNPa,s(τ))\displaystyle\inf_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon,x\in\text{Supp}(X)}\lambda(H_{h_{n}}^{\top}(x)\theta^{\textit{NP}}_{a,s}(\tau))
=infa=0,1,s𝒮,τΥ,xSupp(X)((Yi(a)qa(τ)|Si=s,Xi=x)Ra(τ,s,x))c/2,\displaystyle=\inf_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon,x\in\text{Supp}(X)}(\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}=x)-R_{a}(\tau,s,x))\geq c/2,

and

supa=0,1,s𝒮,τΥ,xSupp(X)λ(Hhn(x)θNPa,s(τ))\displaystyle\sup_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon,x\in\text{Supp}(X)}\lambda(H_{h_{n}}^{\top}(x)\theta^{\textit{NP}}_{a,s}(\tau))
=\displaystyle= supa=0,1,s𝒮,τΥ,xSupp(X)((Yi(a)qa(τ)|Si=s,Xi=x)+Ra(τ,s,x))1c/2.\displaystyle\sup_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon,x\in\text{Supp}(X)}(\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}=x)+R_{a}(\tau,s,x))\leq 1-c/2.

This implies

infτΥλ(Hhn(Xi)θNPa,s(τ))(1Λ(Hhn(Xi)θNPa,s(τ)))c¯>0,\displaystyle\inf_{\tau\in\Upsilon}\lambda(H_{h_{n}}^{\top}(X_{i})\theta^{\textit{NP}}_{a,s}(\tau))(1-\Lambda(H_{h_{n}}^{\top}(X_{i})\theta^{\textit{NP}}_{a,s}(\tau)))\geq\underline{c}>0,

and thus,

Gn(Uτ)\displaystyle G_{n}(U_{\tau}) Qn(τ,s,q^a(τ),θNPa,s(τ)+Uτ)Qn(τ,s,q^a(τ),θNPa,s(τ))tQn(τ,s,q^a(τ),θNPa,s(τ)+tUτ)|t=0\displaystyle\equiv Q_{n}(\tau,s,\hat{q}_{a}(\tau),\theta^{\textit{NP}}_{a,s}(\tau)+U_{\tau})-Q_{n}(\tau,s,\hat{q}_{a}(\tau),\theta^{\textit{NP}}_{a,s}(\tau))-\partial_{t}Q_{n}^{\top}(\tau,s,\hat{q}_{a}(\tau),\theta^{\textit{NP}}_{a,s}(\tau)+tU_{\tau})|_{t=0}
c¯na(s)iIa(s)((Hhn(Xi)Uτ)22|Hhn(Xi)Uτ|36).\displaystyle\geq\frac{\underline{c}}{n_{a}(s)}\sum_{i\in I_{a}(s)}\left(\frac{(H_{h_{n}}^{\top}(X_{i})U_{\tau})^{2}}{2}-\frac{|H_{h_{n}}^{\top}(X_{i})U_{\tau}|^{3}}{6}\right).

Let

¯=infUhn[1na(s)iIa(s)(Hhn(Xi)U)2]3/21na(s)iIa(s)|Hhn(Xi)U|3.\displaystyle\overline{\ell}=\inf_{U\in\Re^{h_{n}}}\frac{\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})U)^{2}\right]^{3/2}}{\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}|H_{h_{n}}^{\top}(X_{i})U|^{3}}. (L.1)

If [1na(s)iIa(s)(Hhn(Xi)U^τ)2]1/2¯\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{1/2}\leq\overline{\ell}, then

\displaystyle\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}
\displaystyle=\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{-1/2}\frac{\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{3/2}}{\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}|H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau}|^{3}}\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}|H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau}|^{3}
\displaystyle\geq\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}|H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau}|^{3},

and thus

\displaystyle G_{n}(\hat{U}_{\tau})\geq\frac{\underline{c}}{n_{a}(s)}\sum_{i\in I_{a}(s)}\left(\frac{(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}}{2}-\frac{|H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau}|^{3}}{6}\right)\geq\frac{\underline{c}}{n_{a}(s)}\sum_{i\in I_{a}(s)}\frac{(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}}{3}.

On the other hand, if [1na(s)iIa(s)(Hhn(Xi)U^τ)2]1/2>¯\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{1/2}>\overline{\ell}, we can denote U¯τ=¯U^τ[1na(s)iIa(s)(Hhn(Xi)U^τ)2]1/2\overline{U}_{\tau}=\frac{\overline{\ell}\hat{U}_{\tau}}{\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{1/2}} such that

[1na(s)iIa(s)(Hhn(Xi)U¯τ)2]1/2¯.\displaystyle\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\overline{U}_{\tau})^{2}\right]^{1/2}\leq\overline{\ell}.
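The rescaling above is simply positive homogeneity of the quadratic seminorm: dividing $\hat{U}_{\tau}$ by its normalized quadratic norm and multiplying by $\overline{\ell}$ makes the constraint hold with equality. A minimal numerical sketch, in which the design $H$, the direction $\hat{U}_{\tau}$, and the threshold are hypothetical stand-ins:

```python
import math
import random

random.seed(0)
n, h = 50, 3
# Hypothetical regressors playing the role of H_{h_n}(X_i).
H = [[random.gauss(0.0, 1.0) for _ in range(h)] for _ in range(n)]
U_hat = [2.0, -1.0, 0.5]  # plays the role of \hat{U}_tau
ell_bar = 0.1             # plays the role of \bar{ell}

def quad_norm(U):
    """[ (1/n) sum_i (H_i' U)^2 ]^{1/2}; degree-1 homogeneous in U."""
    return math.sqrt(sum(sum(a * b for a, b in zip(Hi, U)) ** 2 for Hi in H) / n)

assert quad_norm(U_hat) > ell_bar  # the second case of the argument
U_bar = [ell_bar * u / quad_norm(U_hat) for u in U_hat]
# The rescaled direction satisfies the constraint with equality.
assert abs(quad_norm(U_bar) - ell_bar) < 1e-12
```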

Further, because Gn(Uτ)G_{n}(U_{\tau}) is convex in UτU_{\tau} we have

Gn(U^τ)\displaystyle G_{n}(\hat{U}_{\tau}) =Gn([1na(s)iIa(s)(Hhn(Xi)U^τ)2]1/2¯U¯τ)\displaystyle=G_{n}\left(\frac{\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{1/2}}{\overline{\ell}}\overline{U}_{\tau}\right)
[1na(s)iIa(s)(Hhn(Xi)U^τ)2]1/2¯Gn(U¯τ)\displaystyle\geq\frac{\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{1/2}}{\overline{\ell}}G_{n}(\overline{U}_{\tau})
[1na(s)iIa(s)(Hhn(Xi)U^τ)2]1/2¯c¯na(s)iIa(s)(Hhn(Xi)U¯τ)23\displaystyle\geq\frac{\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{1/2}}{\overline{\ell}}\frac{\underline{c}}{n_{a}(s)}\sum_{i\in I_{a}(s)}\frac{(H_{h_{n}}^{\top}(X_{i})\overline{U}_{\tau})^{2}}{3}
=c¯¯3[1na(s)iIa(s)(Hhn(Xi)U^τ)2]1/2.\displaystyle=\frac{\underline{c}\overline{\ell}}{3}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{1/2}.

Therefore, for some constant c¯\overline{c} that only depends on c¯\underline{c} and κ1\kappa_{1}, we have

Gn(U^τ)\displaystyle G_{n}(\hat{U}_{\tau}) min(c¯na(s)iIa(s)(Hhn(Xi)U^τ)23,c¯¯3[1na(s)iIa(s)(Hhn(Xi)U^τ)2]1/2)\displaystyle\geq\min\left(\frac{\underline{c}}{n_{a}(s)}\sum_{i\in I_{a}(s)}\frac{(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}}{3},\frac{\underline{c}\overline{\ell}}{3}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{1/2}\right)
c¯3min(||U^τ||22,¯||U^τ||2).\displaystyle\geq\frac{\overline{c}}{3}\min(||\hat{U}_{\tau}||_{2}^{2},\overline{\ell}||\hat{U}_{\tau}||_{2}). (L.2)

In addition, by construction,

Gn(U^τ)\displaystyle G_{n}(\hat{U}_{\tau}) |tQn(τ,s,q^a(τ),θNPa,s(τ)+tUτ)|t=0|\displaystyle\leq|\partial_{t}Q_{n}^{\top}(\tau,s,\hat{q}_{a}(\tau),\theta^{\textit{NP}}_{a,s}(\tau)+tU_{\tau})|_{t=0}|
=|1na(s)iIa(s)(1{Yiq^a(τ)}λ(Hhn(Xi)θa,sNP(τ)))Hhn(Xi)U^τ|\displaystyle=\left|\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(1\{Y_{i}\leq\hat{q}_{a}(\tau)\}-\lambda(H_{h_{n}}^{\top}(X_{i})\theta_{a,s}^{\textit{NP}}(\tau)))H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau}\right|
1na(s)iIa(s)(1{Yiq^a(τ)}ma(τ,s,Xi))Hhn(Xi)U^τ1\displaystyle\leq\left\|\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(1\{Y_{i}\leq\hat{q}_{a}(\tau)\}-m_{a}(\tau,s,X_{i}))H_{h_{n}}(X_{i})\right\|_{\infty}\left\|\hat{U}_{\tau}\right\|_{1}
\displaystyle+\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}R_{a}^{2}(\tau,s,X_{i})\right]^{1/2}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{1/2}
hn1/2na(s)iIa(s)(1{Yiq^a(τ)}ma(τ,s,Xi))Hhn(Xi)U^τ2\displaystyle\leq\left\|\frac{h_{n}^{1/2}}{n_{a}(s)}\sum_{i\in I_{a}(s)}(1\{Y_{i}\leq\hat{q}_{a}(\tau)\}-m_{a}(\tau,s,X_{i}))H_{h_{n}}(X_{i})\right\|_{\infty}\left\|\hat{U}_{\tau}\right\|_{2}
+[κ2na(s)iIa(s)Ra2(τ,s,Xi)]1/2||U^τ||2.\displaystyle+\left[\frac{\kappa_{2}}{n_{a}(s)}\sum_{i\in I_{a}(s)}R_{a}^{2}(\tau,s,X_{i})\right]^{1/2}||\hat{U}_{\tau}||_{2}. (L.3)

Combining (L.2) and (L.3), we have

c¯3min(||U^τ||2,¯)hn1/2na(s)iIa(s)(1{Yiq^a(τ)}ma(τ,s,Xi))Hhn(Xi)+[κ2na(s)iIa(s)Ra2(τ,s,Xi)]1/2.\displaystyle\frac{\overline{c}}{3}\min(||\hat{U}_{\tau}||_{2},\overline{\ell})\leq\left\|\frac{h_{n}^{1/2}}{n_{a}(s)}\sum_{i\in I_{a}(s)}(1\{Y_{i}\leq\hat{q}_{a}(\tau)\}-m_{a}(\tau,s,X_{i}))H_{h_{n}}(X_{i})\right\|_{\infty}+\left[\frac{\kappa_{2}}{n_{a}(s)}\sum_{i\in I_{a}(s)}R_{a}^{2}(\tau,s,X_{i})\right]^{1/2}.

Taking supτΥ\sup_{\tau\in\Upsilon} on both sides, we have

c¯3min(supτΥ||U^τ||2,¯)\displaystyle\frac{\overline{c}}{3}\min(\sup_{\tau\in\Upsilon}||\hat{U}_{\tau}||_{2},\overline{\ell})
supτΥhn1/2na(s)iIa(s)(1{Yiq^a(τ)}ma(τ,s,Xi))Hhn(Xi)+supτΥ[κ2na(s)iIa(s)Ra2(τ,s,Xi)]1/2\displaystyle\leq\sup_{\tau\in\Upsilon}\left\|\frac{h_{n}^{1/2}}{n_{a}(s)}\sum_{i\in I_{a}(s)}(1\{Y_{i}\leq\hat{q}_{a}(\tau)\}-m_{a}(\tau,s,X_{i}))H_{h_{n}}(X_{i})\right\|_{\infty}+\sup_{\tau\in\Upsilon}\left[\frac{\kappa_{2}}{n_{a}(s)}\sum_{i\in I_{a}(s)}R_{a}^{2}(\tau,s,X_{i})\right]^{1/2}
=Op(hnlog(n)n),\displaystyle=O_{p}(\sqrt{\frac{h_{n}\log(n)}{n}}),

where the last line holds due to Assumption 12 and Lemma N.6. Finally, Lemma N.7 shows that ¯/hnlog(n)n\overline{\ell}/\sqrt{\frac{h_{n}\log(n)}{n}}\rightarrow\infty, which implies

supτΥ||U^τ||2=Op(hnlog(n)n).\displaystyle\sup_{\tau\in\Upsilon}||\hat{U}_{\tau}||_{2}=O_{p}\left(\sqrt{\frac{h_{n}\log(n)}{n}}\right).

Step 2. Recall

Δ¯a(τ,s,Xi)\displaystyle\overline{\Delta}_{a}(\tau,s,X_{i}) =m^a(τ,s,Xi)m¯a(τ,s,Xi)\displaystyle=\widehat{m}_{a}(\tau,s,X_{i})-\overline{m}_{a}(\tau,s,X_{i})
=(Yi(a)qa(τ)|Xi,Si=s)λ(Hhn(Xi)θ^NPa,s(τ))\displaystyle=\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|X_{i},S_{i}=s)-\lambda(H_{h_{n}}^{\top}(X_{i})\hat{\theta}^{\textit{NP}}_{a,s}(\tau))
=λ(Hhn(Xi)θNPa,s(τ))λ(Hhn(Xi)θ^NPa,s(τ))+Ra(τ,s,Xi),\displaystyle=\lambda(H_{h_{n}}^{\top}(X_{i})\theta^{\textit{NP}}_{a,s}(\tau))-\lambda(H_{h_{n}}^{\top}(X_{i})\hat{\theta}^{\textit{NP}}_{a,s}(\tau))+R_{a}(\tau,s,X_{i}),

and that $\{X_{i}^{s},\xi^{s}_{i}\}_{i\in[n]}$ is an i.i.d. sequence drawn from the joint distribution of $(X_{i},\xi_{i})$ given $S_{i}=s$, and is therefore independent of $\{A_{i},S_{i}\}_{i\in[n]}$. Let

H(θ1,θ2)=𝔼[λ(Hhn(Xi)θ1)λ(Hhn(Xi)θ2)|Si=s]=𝔼[λ(Hhn(Xis)θ1)λ(Hhn(Xis)θ2)].H(\theta_{1},\theta_{2})=\mathbb{E}[\lambda(H_{h_{n}}^{\top}(X_{i})\theta_{1})-\lambda(H_{h_{n}}^{\top}(X_{i})\theta_{2})|S_{i}=s]=\mathbb{E}[\lambda(H_{h_{n}}^{\top}(X_{i}^{s})\theta_{1})-\lambda(H_{h_{n}}^{\top}(X_{i}^{s})\theta_{2})].

We have

supτΥ,s𝒮|iI1(s)ξiΔ¯1(τ,s,Xi)n1w(s)iI0(s)ξiΔ¯1(τ,s,Xi)n0w(s)|\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\biggl{|}\frac{\sum_{i\in I_{1}(s)}\xi_{i}\overline{\Delta}_{1}(\tau,s,X_{i})}{n_{1}^{w}(s)}-\frac{\sum_{i\in I_{0}(s)}\xi_{i}\overline{\Delta}_{1}(\tau,s,X_{i})}{n_{0}^{w}(s)}\biggr{|}
supτΥ,s𝒮|iI1(s)ξi[Δ¯1(τ,s,Xi)H(θ1,sNP(τ),θ^NP1,s(τ))𝔼(R1(τ,s,Xi)|Si=s)]n1w(s)|\displaystyle\leq\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{1}(\tau,s,X_{i})-H(\theta_{1,s}^{\textit{NP}}(\tau),\hat{\theta}^{\textit{NP}}_{1,s}(\tau))-\mathbb{E}(R_{1}(\tau,s,X_{i})|S_{i}=s)]}{n_{1}^{w}(s)}\right|
+supτΥ,s𝒮|iI0(s)ξi[Δ¯1(τ,s,Xi)H(θNP1,s(τ),θ^NP1,s(τ))𝔼(R1(τ,s,Xi)|Si=s)]n0w(s)|\displaystyle+\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{0}(s)}\xi_{i}[\overline{\Delta}_{1}(\tau,s,X_{i})-H(\theta^{\textit{NP}}_{1,s}(\tau),\hat{\theta}^{\textit{NP}}_{1,s}(\tau))-\mathbb{E}(R_{1}(\tau,s,X_{i})|S_{i}=s)]}{n_{0}^{w}(s)}\right| (L.4)

We aim to bound the first term on the RHS of (L.4). Note for any ε>0\varepsilon>0, there exists a constant M>0M>0 such that

(supτΥ||θ^a,sNP(τ)θa,sNP(τ)||2Mhnlog(n)/n)1ε.\displaystyle\mathbb{P}\left(\sup_{\tau\in\Upsilon}||\hat{\theta}_{a,s}^{\textit{NP}}(\tau)-\theta_{a,s}^{\textit{NP}}(\tau)||_{2}\leq M\sqrt{h_{n}\log(n)/n}\right)\geq 1-\varepsilon.

On the set 𝒜(ε)={supτΥ||θ^a,sNP(τ)θa,sNP(τ)||2Mhnlog(n)/n}\mathcal{A}(\varepsilon)=\{\sup_{\tau\in\Upsilon}||\hat{\theta}_{a,s}^{\textit{NP}}(\tau)-\theta_{a,s}^{\textit{NP}}(\tau)||_{2}\leq M\sqrt{h_{n}\log(n)/n}\}, we have

supτΥ,s𝒮|iI1(s)ξi[Δ¯1(τ,s,Xi)H(θ1,sNP(τ),θ^NP1,s(τ))𝔼(R1(τ,s,Xi)|Si=s)]n1w(s)|\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{1}(\tau,s,X_{i})-H(\theta_{1,s}^{\textit{NP}}(\tau),\hat{\theta}^{\textit{NP}}_{1,s}(\tau))-\mathbb{E}(R_{1}(\tau,s,X_{i})|S_{i}=s)]}{n_{1}^{w}(s)}\right|
supτΥ,s𝒮|iI1(s)ξi[λ(Hhn(Xi)θ1,sNP(τ))λ(Hhn(Xi)θ^1,sNP(τ))H(θ1,sNP(τ),θ^NP1,s(τ))]n1(s)|\displaystyle\leq\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\lambda(H_{h_{n}}^{\top}(X_{i})\theta_{1,s}^{\textit{NP}}(\tau))-\lambda(H_{h_{n}}^{\top}(X_{i})\hat{\theta}_{1,s}^{\textit{NP}}(\tau))-H(\theta_{1,s}^{\textit{NP}}(\tau),\hat{\theta}^{\textit{NP}}_{1,s}(\tau))]}{n_{1}(s)}\right|
+supτΥ,s𝒮|iI1(s)ξi[R1(τ,s,Xi)𝔼(R1(τ,s,Xi)|Si=s)]n1w(s)|\displaystyle+\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}\left[R_{1}(\tau,s,X_{i})-\mathbb{E}(R_{1}(\tau,s,X_{i})|S_{i}=s)\right]}{n_{1}^{w}(s)}\right|
n1(s)n1w(s)[sups𝒮,θ1,θ2hn,||θ1θ2||2Mhnlog(n)/n|iI1(s)ξi[λ(Hhn(Xi)θ1)λ(Hhn(Xi)θ2)H(θ1,θ2)]n1(s)|\displaystyle\leq\frac{n_{1}(s)}{n_{1}^{w}(s)}\biggl{[}\sup_{s\in\mathcal{S},\theta_{1},\theta_{2}\in\Re^{h_{n}},||\theta_{1}-\theta_{2}||_{2}\leq M\sqrt{h_{n}\log(n)/n}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\lambda(H_{h_{n}}^{\top}(X_{i})\theta_{1})-\lambda(H_{h_{n}}^{\top}(X_{i})\theta_{2})-H(\theta_{1},\theta_{2})]}{n_{1}(s)}\right|
+supτΥ,s𝒮|iI1(s)ξi[R1(τ,s,Xi)𝔼(R1(τ,s,Xi)|Si=s)]n1(s)|]\displaystyle+\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}\left[R_{1}(\tau,s,X_{i})-\mathbb{E}(R_{1}(\tau,s,X_{i})|S_{i}=s)\right]}{n_{1}(s)}\right|\biggr{]}
n1(s)n1w(s)(D1+D2).\displaystyle\equiv\frac{n_{1}(s)}{n_{1}^{w}(s)}(D_{1}+D_{2}).

For D1D_{1}, we have

D1|{Ai,Si}i[n]\displaystyle D_{1}|\{A_{i},S_{i}\}_{i\in[n]}
\displaystyle\stackrel{{\scriptstyle d}}{{=}}\sup\left|\frac{\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}\xi_{i}^{s}[\lambda(H_{h_{n}}^{\top}(X^{s}_{i})\theta_{1})-\lambda(H_{h_{n}}^{\top}(X^{s}_{i})\theta_{2})-H(\theta_{1},\theta_{2})]}{n_{1}(s)}\right||\{A_{i},S_{i}\}_{i\in[n]}
=d||n1(s)|||{Ai,Si}i[n],\displaystyle\stackrel{{\scriptstyle d}}{{=}}||\mathbb{P}_{n_{1}(s)}-\mathbb{P}||_{\mathcal{F}}|\{A_{i},S_{i}\}_{i\in[n]},

where the supremum in the first equality is taken over {s𝒮,θ1,θ2hn,||θ1θ2||2Mhnlog(n)/n}\{s\in\mathcal{S},\theta_{1},\theta_{2}\in\Re^{h_{n}},||\theta_{1}-\theta_{2}||_{2}\leq M\sqrt{h_{n}\log(n)/n}\} and

={ξis[λ(Hhn(Xsi)θ1)λ(Hhn(Xsi)θ2)H(θ1,θ2)]:s𝒮,θ1,θ2hn,||θ1θ2||2Mhnlog(n)/n}\mathcal{F}=\begin{Bmatrix}&\xi_{i}^{s}[\lambda(H_{h_{n}}^{\top}(X^{s}_{i})\theta_{1})-\lambda(H_{h_{n}}^{\top}(X^{s}_{i})\theta_{2})-H(\theta_{1},\theta_{2})]:\\ &s\in\mathcal{S},\theta_{1},\theta_{2}\in\Re^{h_{n}},||\theta_{1}-\theta_{2}||_{2}\leq M\sqrt{h_{n}\log(n)/n}\end{Bmatrix}

with the envelope F=2ξisF=2\xi_{i}^{s}. We further note that ||maxi[n]2ξis||,2Clog(n)||\max_{i\in[n]}2\xi_{i}^{s}||_{\mathbb{P},2}\leq C\log(n),

supf𝔼f2sup𝔼(Hhn(Xsi)(θ1θ2))2κ2M2hnlog(n)/n,\displaystyle\sup_{f\in\mathcal{F}}\mathbb{E}f^{2}\leq\sup\mathbb{E}(H_{h_{n}}^{\top}(X^{s}_{i})(\theta_{1}-\theta_{2}))^{2}\leq\kappa_{2}M^{2}h_{n}\log(n)/n,

and

supQN(,eQ,ε||F||Q,2)(aε)chn,\displaystyle\sup_{Q}N(\mathcal{F},e_{Q},\varepsilon||F||_{Q,2})\leq\left(\frac{a}{\varepsilon}\right)^{ch_{n}},

where a,ca,c are two fixed constants. Therefore, by Chernozhukov et al. (2014, Corollary 5.1),

𝔼[||n1(s)|||{Ai,Si}i[n]]=Op(hnlog(n)/n+hnlog2(n)n)=op(n1/2),\displaystyle\mathbb{E}\left[||\mathbb{P}_{n_{1}(s)}-\mathbb{P}||_{\mathcal{F}}|\{A_{i},S_{i}\}_{i\in[n]}\right]=O_{p}\left(h_{n}\log(n)/n+\frac{h_{n}\log^{2}(n)}{n}\right)=o_{p}(n^{-1/2}),

which implies D1=op(n1/2)D_{1}=o_{p}(n^{-1/2}).
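For completeness, the rate calculation behind this display can be sketched by plugging the bounds recorded above into the maximal inequality (schematically, suppressing constants, with $v_{n}\asymp h_{n}$, $\sigma_{n}^{2}\asymp h_{n}\log(n)/n$, $||\max_{i\in[n]}F_{i}||_{\mathbb{P},2}\lesssim\log(n)$, and the logarithmic factor of order $\log(n)$):

```latex
\mathbb{E}\big[\|\mathbb{P}_{n_{1}(s)}-\mathbb{P}\|_{\mathcal{F}}\big]
\lesssim \sqrt{\frac{v_{n}\sigma_{n}^{2}\log(n)}{n}}
 +\frac{v_{n}\|\max_{i\in[n]}F_{i}\|_{\mathbb{P},2}\log(n)}{n}
\lesssim \sqrt{\frac{h_{n}\cdot\frac{h_{n}\log(n)}{n}\cdot\log(n)}{n}}
 +\frac{h_{n}\log^{2}(n)}{n}
=\frac{h_{n}\log(n)}{n}+\frac{h_{n}\log^{2}(n)}{n},
```

which is $o(n^{-1/2})$ provided $h_{n}\log^{2}(n)/\sqrt{n}\rightarrow 0$, as the maintained rate conditions ensure.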

Similarly, we have

\displaystyle D_{2}|\{A_{i},S_{i}\}_{i\in[n]}\stackrel{{\scriptstyle d}}{{=}}\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}\xi^{s}_{i}\left[R_{1}(\tau,s,X^{s}_{i})-\mathbb{E}(R_{1}(\tau,s,X^{s}_{i}))\right]}{n_{1}(s)}\right||\{A_{i},S_{i}\}_{i\in[n]}
\displaystyle=||\mathbb{P}_{n_{1}(s)}-\mathbb{P}||_{\mathcal{F}}|\{A_{i},S_{i}\}_{i\in[n]},

where ={ξis[τm1(τ,s,Xis)λ(Hhn(Xsi)θ1,sNP(τ))]:τΥ}\mathcal{F}=\{\xi_{i}^{s}[\tau-m_{1}(\tau,s,X_{i}^{s})-\lambda(H_{h_{n}}^{\top}(X^{s}_{i})\theta_{1,s}^{\textit{NP}}(\tau))]:\tau\in\Upsilon\} with an envelope F=ξisF=\xi_{i}^{s}. In addition, we note \mathcal{F} is nested in

~={ξis[τm1(τ,s,Xis)λ(Hhn(Xsi)θ1)]:τΥ,θ1hn},\displaystyle\widetilde{\mathcal{F}}=\{\xi_{i}^{s}[\tau-m_{1}(\tau,s,X_{i}^{s})-\lambda(H_{h_{n}}^{\top}(X^{s}_{i})\theta_{1})]:\tau\in\Upsilon,\theta_{1}\in\Re^{h_{n}}\},

so that

supQN(,eQ,ε||F||Q,2)supQN(~,eQ,ε||F||Q,2)(aε)chn.\displaystyle\sup_{Q}N(\mathcal{F},e_{Q},\varepsilon||F||_{Q,2})\leq\sup_{Q}N(\widetilde{\mathcal{F}},e_{Q},\varepsilon||F||_{Q,2})\leq\left(\frac{a}{\varepsilon}\right)^{ch_{n}}.

Last,

supf𝔼f2=supτΥ,a=0,1,s𝒮𝔼R12(τ,s,Xis)=O(hnlog(n)/n).\displaystyle\sup_{f\in\mathcal{F}}\mathbb{E}f^{2}=\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\mathbb{E}R_{1}^{2}(\tau,s,X_{i}^{s})=O(h_{n}\log(n)/n).

Then, by Chernozhukov et al. (2014, Corollary 5.1),

𝔼[||n1(s)|||{Ai,Si}i[n]]=Op(hnlog(n)/n+hnlog2(n)n)=op(n1/2),\displaystyle\mathbb{E}\left[||\mathbb{P}_{n_{1}(s)}-\mathbb{P}||_{\mathcal{F}}|\{A_{i},S_{i}\}_{i\in[n]}\right]=O_{p}\left(h_{n}\log(n)/n+\frac{h_{n}\log^{2}(n)}{n}\right)=o_{p}(n^{-1/2}),

which implies D2=op(n1/2)D_{2}=o_{p}(n^{-1/2}). This leads to (L.4).

Step 3. Note |ma(τ1,s,Xi)|1|m_{a}(\tau_{1},s,X_{i})|\leq 1 and

|ma(τ1,s,Xi)ma(τ2,s,Xi)|\displaystyle|m_{a}(\tau_{1},s,X_{i})-m_{a}(\tau_{2},s,X_{i})|
\displaystyle\leq |τ1τ2|+|(Yi(a)qa(τ1)|Xi,Si=s)(Yi(a)qa(τ2)|Xi,Si=s)|\displaystyle|\tau_{1}-\tau_{2}|+|\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau_{1})|X_{i},S_{i}=s)-\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau_{2})|X_{i},S_{i}=s)|
\displaystyle\leq (1+supyfa(y|Xi,Si=s)infτΥfa(qa(τ)))|τ1τ2|\displaystyle\left(1+\frac{\sup_{y}f_{a}(y|X_{i},S_{i}=s)}{\inf_{\tau\in\Upsilon}f_{a}(q_{a}(\tau))}\right)|\tau_{1}-\tau_{2}|
\displaystyle\leq C|τ1τ2|.\displaystyle C|\tau_{1}-\tau_{2}|.

This implies Assumptions 3(ii) and 3(iii).
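The Lipschitz property of $\tau\mapsto m_{a}(\tau,s,X_{i})$ used in Step 3 can also be checked numerically in a toy model; everything below (the model $Y(a)=X+\varepsilon$ with standard normal $X$ and $\varepsilon$, and the grid over $\Upsilon=[0.25,0.75]$) is a purely illustrative assumption:

```python
import math
from statistics import NormalDist

# Toy model: marginally Y(a) ~ N(0, 2); conditionally on X = x, Y(a) ~ N(x, 1).
margY = NormalDist(0.0, math.sqrt(2.0))
q = margY.inv_cdf                                  # tau -> q_a(tau)
m = lambda tau, x: NormalDist(x, 1.0).cdf(q(tau))  # m_a(tau, s, x)

taus = [0.25 + 0.01 * k for k in range(51)]        # grid over Upsilon
sup_cond_dens = 1.0 / math.sqrt(2.0 * math.pi)     # sup_y f_a(y | x)
inf_marg_dens = min(margY.pdf(q(t)) for t in taus) # inf_tau f_a(q_a(tau))
C = 1.0 + sup_cond_dens / inf_marg_dens            # the constant in the bound

for x in (-2.0, 0.0, 1.5):
    assert all(abs(m(t1, x) - m(t2, x)) <= C * abs(t1 - t2) + 1e-12
               for t1 in taus for t2 in taus)
```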

Appendix M Proof of Theorem A.1

Notation used in this appendix:

$H_{p_{n}}(X_{i})$: high-dimensional regressor constructed based on $X_{i}$ with dimension $p_{n}$.

$\theta_{a,s}^{\textit{HD}}(q)$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, and $q\in\mathcal{Q}_{a}^{\varepsilon}$, the pseudo-true value defined in Assumption 13(i).

$\hat{\theta}_{a,s}^{\textit{HD}}(q)$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, and $q\in\mathcal{Q}_{a}^{\varepsilon}$, the estimator of $\theta_{a,s}^{\textit{HD}}(q)$ in (A.1).

$\varrho_{n,a}(s)$: Lasso penalty defined after (A.1).

$\hat{\Omega}$: Lasso penalty loading matrix defined after (A.1).

$\mathcal{M}_{a}(q,s,x)$: for $a\in\{0,1\}$, $q\in\Re$, $s\in\mathcal{S}$, and $x\in\text{Supp}(X)$, $\mathcal{M}_{a}(q,s,x)=\mathbb{P}(Y_{i}(a)\leq q|S_{i}=s,X_{i}=x)$.

We focus on the case with a=1a=1. Note

\displaystyle\{X_{i},Y_{i}(1)\}_{i\in I_{1}(s)}|\{A_{i},S_{i}\}_{i\in[n]}\stackrel{{\scriptstyle d}}{{=}}\{X_{i}^{s},Y_{i}^{s}(1)\}_{i=N(s)+1}^{N(s)+n_{1}(s)}|\{A_{i},S_{i}\}_{i\in[n]},

where $\{X_{i}^{s},Y_{i}^{s}(1)\}_{i\in[n]}$ is an i.i.d. sequence that is independent of $\{A_{i},S_{i}\}_{i\in[n]}$. Therefore,

\displaystyle\hat{\theta}_{1,s}^{\textit{HD}}(q)|\{A_{i},S_{i}\}_{i\in[n]}\stackrel{{\scriptstyle d}}{{=}}\operatorname*{arg\,min}_{\theta_{a}}\frac{-1}{n_{1}(s)}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}\biggl{[}1\{Y_{i}^{s}(1)\leq q\}\log(\lambda(H_{p_{n}}^{\top}(X_{i}^{s})\theta_{a}))
+1{Yis(1)>q}log(1λ(Hpn(Xsi)θa))]+ϱn,1(s)n1(s)||Ω^θa||1|{Ai,Si}i[n],\displaystyle+1\{Y_{i}^{s}(1)>q\}\log(1-\lambda(H_{p_{n}}^{\top}(X^{s}_{i})\theta_{a}))\biggr{]}+\frac{\varrho_{n,1}(s)}{n_{1}(s)}||\hat{\Omega}\theta_{a}||_{1}\biggl{|}\{A_{i},S_{i}\}_{i\in[n]},

and Assumption 13(vi) implies

0<κ1\displaystyle 0<\kappa_{1}\leq infa=0,1,s𝒮,|v|0hnnvT(1n1(s)i=N(s)+1N(s)+n1(s)Hpn(Xsi)Hpn(Xsi))v||v||22\displaystyle\inf_{a=0,1,s\in\mathcal{S},|v|_{0}\leq h_{n}\ell_{n}}\frac{v^{T}\left(\frac{1}{n_{1}(s)}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}H_{p_{n}}(X^{s}_{i})H_{p_{n}}^{\top}(X^{s}_{i})\right)v}{||v||_{2}^{2}}
\displaystyle\leq supa=0,1,s𝒮,|v|0hnnvT(1n1(s)i=N(s)+1N(s)+n1(s)Hpn(Xsi)Hpn(Xsi))v||v||22κ2<,\displaystyle\sup_{a=0,1,s\in\mathcal{S},|v|_{0}\leq h_{n}\ell_{n}}\frac{v^{T}\left(\frac{1}{n_{1}(s)}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}H_{p_{n}}(X^{s}_{i})H_{p_{n}}^{\top}(X^{s}_{i})\right)v}{||v||_{2}^{2}}\leq\kappa_{2}<\infty,

and

0<κ1\displaystyle 0<\kappa_{1}\leq infa=0,1,s𝒮,|v|0hnnvT𝔼(Hpn(Xsi)Hpn(Xsi))v||v||22\displaystyle\inf_{a=0,1,s\in\mathcal{S},|v|_{0}\leq h_{n}\ell_{n}}\frac{v^{T}\mathbb{E}(H_{p_{n}}(X^{s}_{i})H_{p_{n}}^{\top}(X^{s}_{i}))v}{||v||_{2}^{2}}
\displaystyle\leq supa=0,1,s𝒮,|v|0hnnvT𝔼(Hpn(Xsi)Hpn(Xsi))v||v||22κ2<.\displaystyle\sup_{a=0,1,s\in\mathcal{S},|v|_{0}\leq h_{n}\ell_{n}}\frac{v^{T}\mathbb{E}(H_{p_{n}}(X^{s}_{i})H_{p_{n}}^{\top}(X^{s}_{i}))v}{||v||_{2}^{2}}\leq\kappa_{2}<\infty.

In addition, we have n1(s)/na.s.π(s)p(s)>0n_{1}(s)/n\stackrel{{\scriptstyle a.s.}}{{\rightarrow}}\pi(s)p(s)>0. Therefore, based on the results established by Belloni et al. (2017), we have, conditionally on {Ai,Si}i[n]\{A_{i},S_{i}\}_{i\in[n]}, and thus, unconditionally,

supa=0,1,q𝒬εa,s𝒮||θ^a,sHD(q)θa,sHD(q)||2=Op(hnlog(pn)n),\displaystyle\sup_{a=0,1,q\in\mathcal{Q}^{\varepsilon}_{a},s\in\mathcal{S}}||\hat{\theta}_{a,s}^{\textit{HD}}(q)-\theta_{a,s}^{\textit{HD}}(q)||_{2}=O_{p}\left(\sqrt{\frac{h_{n}\log(p_{n})}{n}}\right),
supa=0,1,q𝒬εa,s𝒮||θ^a,spost(q)θa,sHD(q)||2=Op(hnlog(pn)n),\displaystyle\sup_{a=0,1,q\in\mathcal{Q}^{\varepsilon}_{a},s\in\mathcal{S}}||\hat{\theta}_{a,s}^{post}(q)-\theta_{a,s}^{\textit{HD}}(q)||_{2}=O_{p}\left(\sqrt{\frac{h_{n}\log(p_{n})}{n}}\right),
\displaystyle\sup_{a=0,1,q\in\mathcal{Q}^{\varepsilon}_{a},s\in\mathcal{S}}||\hat{\theta}_{a,s}^{\textit{HD}}(q)||_{0}=O_{p}(h_{n}),

and

\displaystyle\sup_{a=0,1,q\in\mathcal{Q}^{\varepsilon}_{a},s\in\mathcal{S}}||\hat{\theta}_{a,s}^{post}(q)||_{0}=O_{p}(h_{n}).

In the following, we prove the results when θ^a,sHD(q)\hat{\theta}_{a,s}^{\textit{HD}}(q) is used. The results corresponding to θ^a,spost(q)\hat{\theta}_{a,s}^{post}(q) can be proved in the same manner and are therefore omitted. Recall

Δ¯1(τ,s,Xi)=m^1(τ,s,Xi)m¯1(τ,s,Xi)\displaystyle\overline{\Delta}_{1}(\tau,s,X_{i})=\widehat{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s,X_{i})
=(Yi(1)q1(τ)|Xi,Si=s)λ(Hpn(Xi)θ^HD1,s(q^1(τ)))\displaystyle=\mathbb{P}(Y_{i}(1)\leq q_{1}(\tau)|X_{i},S_{i}=s)-\lambda(H_{p_{n}}(X_{i})^{\top}\hat{\theta}^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)))
=[1(q1(τ),s,Xi)1(q^1(τ),s,Xi)+ra(q^1(τ),s,Xi)]\displaystyle=\left[\mathcal{M}_{1}(q_{1}(\tau),s,X_{i})-\mathcal{M}_{1}(\hat{q}_{1}(\tau),s,X_{i})+r_{a}(\hat{q}_{1}(\tau),s,X_{i})\right]
+[λ(Hpn(Xi)θHD1,s(q^1(τ)))λ(Hpn(Xi)θ^HD1,s(q^1(τ)))]\displaystyle+\left[\lambda(H_{p_{n}}(X_{i})^{\top}\theta^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)))-\lambda(H_{p_{n}}(X_{i})^{\top}\hat{\theta}^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)))\right]
a,s(q1(τ),q^1(τ),Xi)+λ(Hpn(Xi)θHD1,s(q^1(τ)))λ(Hpn(Xi)θ^HD1,s(q^1(τ))),\displaystyle\equiv\mathcal{R}_{a,s}(q_{1}(\tau),\hat{q}_{1}(\tau),X_{i})+\lambda(H_{p_{n}}(X_{i})^{\top}\theta^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)))-\lambda(H_{p_{n}}(X_{i})^{\top}\hat{\theta}^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau))),

where

a,s(q,q,Xi)=1(q,s,Xi)1(q,s,Xi)+ra(q,s,Xi).\displaystyle\mathcal{R}_{a,s}(q,q^{\prime},X_{i})=\mathcal{M}_{1}(q,s,X_{i})-\mathcal{M}_{1}(q^{\prime},s,X_{i})+r_{a}(q^{\prime},s,X_{i}).

Let

Hλ(θ1,θ2,s)=𝔼[λ(Hpn(Xi)θ1)λ(Hpn(Xi)θ2)|Si=s],H_{\lambda}(\theta_{1},\theta_{2},s)=\mathbb{E}[\lambda(H_{p_{n}}(X_{i})^{\top}\theta_{1})-\lambda(H_{p_{n}}(X_{i})^{\top}\theta_{2})|S_{i}=s],

and

HR(q,q,s)=𝔼(a,s(q,q,Xi)|Si=s).\displaystyle H_{R}(q,q^{\prime},s)=\mathbb{E}(\mathcal{R}_{a,s}(q,q^{\prime},X_{i})|S_{i}=s).

Then, we have

supτΥ,s𝒮|iI1(s)ξiΔ¯1(τ,s,Xi)n1w(s)iI0(s)ξiΔ¯1(τ,s,Xi)n0w(s)|\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\biggl{|}\frac{\sum_{i\in I_{1}(s)}\xi_{i}\overline{\Delta}_{1}(\tau,s,X_{i})}{n_{1}^{w}(s)}-\frac{\sum_{i\in I_{0}(s)}\xi_{i}\overline{\Delta}_{1}(\tau,s,X_{i})}{n_{0}^{w}(s)}\biggr{|}
supτΥ,s𝒮|iI1(s)ξi[Δ¯1(τ,s,Xi)Hλ(θ^HD1,s(q^1(τ)),θHD1,s(q^1(τ)),s)HR(q1(τ),q^1(τ),s)]n1w(s)|\displaystyle\leq\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{1}(\tau,s,X_{i})-H_{\lambda}(\hat{\theta}^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)),\theta^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)),s)-H_{R}(q_{1}(\tau),\hat{q}_{1}(\tau),s)]}{n_{1}^{w}(s)}\right|
+supτΥ,s𝒮|iI0(s)ξi[Δ¯1(τ,s,Xi)Hλ(θ^HD1,s(q^1(τ)),θHD1,s(q^1(τ)),s)HR(q1(τ),q^1(τ),s)]n0w(s)|\displaystyle+\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{0}(s)}\xi_{i}[\overline{\Delta}_{1}(\tau,s,X_{i})-H_{\lambda}(\hat{\theta}^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)),\theta^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)),s)-H_{R}(q_{1}(\tau),\hat{q}_{1}(\tau),s)]}{n_{0}^{w}(s)}\right| (M.1)

We aim to bound the first term on the RHS of (M.1). Note for any ε>0\varepsilon>0, there exists a constant M>0M>0 such that

(supq𝒬1ε||θ^1,sHD(q)θ1,sHD(q)||2Mhnlog(pn)n,supq𝒬1ε||θ^1,sHD(q)||0Mhn,supτΥ|q^1(τ)q1(τ)|Mn1/2)1ε.\displaystyle\mathbb{P}\begin{pmatrix}\sup_{q\in\mathcal{Q}_{1}^{\varepsilon}}||\hat{\theta}_{1,s}^{\textit{HD}}(q)-\theta_{1,s}^{\textit{HD}}(q)||_{2}\leq M\sqrt{\frac{h_{n}\log(p_{n})}{n}},\leavevmode\nobreak\ \sup_{q\in\mathcal{Q}_{1}^{\varepsilon}}||\hat{\theta}_{1,s}^{\textit{HD}}(q)||_{0}\leq Mh_{n},\\ \sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|\leq Mn^{-1/2}\end{pmatrix}\geq 1-\varepsilon.

On the set

𝒜(ε)={supq𝒬1ε||θ^1,sHD(q)θ1,sHD(q)||2Mhnlog(pn)n,supq𝒬1ε||θ^1,sHD(q)||0Mhn,supτΥ|q^1(τ)q1(τ)|Mn1/2},\mathcal{A}(\varepsilon)=\begin{Bmatrix}\sup_{q\in\mathcal{Q}_{1}^{\varepsilon}}||\hat{\theta}_{1,s}^{\textit{HD}}(q)-\theta_{1,s}^{\textit{HD}}(q)||_{2}\leq M\sqrt{\frac{h_{n}\log(p_{n})}{n}},\leavevmode\nobreak\ \sup_{q\in\mathcal{Q}_{1}^{\varepsilon}}||\hat{\theta}_{1,s}^{\textit{HD}}(q)||_{0}\leq Mh_{n},\\ \sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|\leq Mn^{-1/2}\end{Bmatrix},

we have

supτΥ,s𝒮|iI1(s)ξi[Δ¯1(τ,s,Xi)Hλ(θ^HD1,s(q^1(τ)),θHD1,s(q^1(τ)),s)HR(q1(τ),q^1(τ),s)]n1w(s)|\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{1}(\tau,s,X_{i})-H_{\lambda}(\hat{\theta}^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)),\theta^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)),s)-H_{R}(q_{1}(\tau),\hat{q}_{1}(\tau),s)]}{n_{1}^{w}(s)}\right|
\displaystyle\leq\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\lambda(H_{p_{n}}^{\top}(X_{i})\theta_{1,s}^{\textit{HD}}(\hat{q}_{1}(\tau)))-\lambda(H_{p_{n}}^{\top}(X_{i})\hat{\theta}_{1,s}^{\textit{HD}}(\hat{q}_{1}(\tau)))-H_{\lambda}(\theta_{1,s}^{\textit{HD}}(\hat{q}_{1}(\tau)),\hat{\theta}^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)),s)]}{n_{1}(s)}\right|
+supτΥ,s𝒮|iI1(s)ξi[1,s(q1(τ),q^1(τ),Xi)𝔼(1,s(q1(τ),q^1(τ),Xi)|Si=s)]n1w(s)|\displaystyle+\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}\left[\mathcal{R}_{1,s}(q_{1}(\tau),\hat{q}_{1}(\tau),X_{i})-\mathbb{E}(\mathcal{R}_{1,s}(q_{1}(\tau),\hat{q}_{1}(\tau),X_{i})|S_{i}=s)\right]}{n_{1}^{w}(s)}\right|
\displaystyle\leq\frac{n_{1}(s)}{n_{1}^{w}(s)}\biggl{[}\sup\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\lambda(H_{p_{n}}^{\top}(X_{i})\theta_{1})-\lambda(H_{p_{n}}^{\top}(X_{i})\theta_{2})-H_{\lambda}(\theta_{1},\theta_{2},s)]}{n_{1}(s)}\right|
+supq,q𝒬1ε,|qq|Mn1/2,s𝒮|iI1(s)ξi[1,s(q,q,Xi)𝔼(1,s(q,q,Xi)|Si=s)]n1(s)|]\displaystyle+\sup_{q,q^{\prime}\in\mathcal{Q}_{1}^{\varepsilon},|q-q^{\prime}|\leq Mn^{-1/2},s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}\left[\mathcal{R}_{1,s}(q,q^{\prime},X_{i})-\mathbb{E}(\mathcal{R}_{1,s}(q,q^{\prime},X_{i})|S_{i}=s)\right]}{n_{1}(s)}\right|\biggr{]}
n1(s)n1w(s)(D1+D2),\displaystyle\equiv\frac{n_{1}(s)}{n_{1}^{w}(s)}(D_{1}+D_{2}),

where the first supremum in the second inequality is taken over $\{s\in\mathcal{S},\theta_{1},\theta_{2}\in\Re^{p_{n}},||\theta_{1}-\theta_{2}||_{2}\leq M\sqrt{h_{n}\log(p_{n})/n},||\theta_{1}||_{0}+||\theta_{2}||_{0}\leq Mh_{n}\}$. Denote

\mathcal{F}=\begin{Bmatrix}\xi_{i}^{s}[\lambda(H_{p_{n}}^{\top}(X^{s}_{i})\theta_{1})-\lambda(H_{p_{n}}^{\top}(X^{s}_{i})\theta_{2})-H_{\lambda}(\theta_{1},\theta_{2},s)]:\\ s\in\mathcal{S},\theta_{1},\theta_{2}\in\Re^{p_{n}},||\theta_{1}-\theta_{2}||_{2}\leq M\sqrt{h_{n}\log(p_{n})/n},||\theta_{1}||_{0}+||\theta_{2}||_{0}\leq Mh_{n}\end{Bmatrix}

with the envelope F=2ξisF=2\xi_{i}^{s}. We further note that ||maxi[n]2ξis||,2Clog(n)||\max_{i\in[n]}2\xi_{i}^{s}||_{\mathbb{P},2}\leq C\log(n),

\displaystyle\sup_{f\in\mathcal{F}}\mathbb{E}f^{2}\leq\sup\mathbb{E}(H_{p_{n}}^{\top}(X^{s}_{i})(\theta_{1}-\theta_{2}))^{2}\leq\kappa_{2}M^{2}h_{n}\log(p_{n})/n,

and

supQN(,eQ,ε||F||Q,2)(c1pnε)c2hn,\displaystyle\sup_{Q}N(\mathcal{F},e_{Q},\varepsilon||F||_{Q,2})\leq\left(\frac{c_{1}p_{n}}{\varepsilon}\right)^{c_{2}h_{n}},

where c1,c2c_{1},c_{2} are two fixed constants. Therefore, Lemma N.2 implies

D1=Op(hnlog(pn)n+hnlog(n)log(pn)n)=op(n1/2).\displaystyle D_{1}=O_{p}\left(\frac{h_{n}\log(p_{n})}{n}+\frac{h_{n}\log(n)\log(p_{n})}{n}\right)=o_{p}(n^{-1/2}).

Similarly, denote

={ξsi[1(q,s,Xi)1(q,s,Xi)+ra(q,s,Xi)]:q,q𝒬1ε,|qq|Mn1/2,s𝒮},\displaystyle\mathcal{F}=\begin{Bmatrix}\xi^{s}_{i}\left[\mathcal{M}_{1}(q,s,X_{i})-\mathcal{M}_{1}(q^{\prime},s,X_{i})+r_{a}(q^{\prime},s,X_{i})\right]:q,q^{\prime}\in\mathcal{Q}_{1}^{\varepsilon},|q-q^{\prime}|\leq Mn^{-1/2},s\in\mathcal{S}\end{Bmatrix},

with an envelope F=ξisF=\xi_{i}^{s}. In addition, note that \mathcal{F} is nested in

\displaystyle\widetilde{\mathcal{F}}=\{\xi_{i}^{s}[\mathcal{M}_{1}(q,s,X_{i}^{s})-\lambda(H_{p_{n}}^{\top}(X^{s}_{i})\theta_{1})]:q\in\mathcal{Q}_{1}^{\varepsilon},\theta_{1}\in\Re^{p_{n}},||\theta_{1}||_{0}\leq h_{n}\},

with the same envelope. Hence,

supQN(,eQ,ε||F||Q,2)supQN(~,eQ,ε||F||Q,2)(apnε)chn.\displaystyle\sup_{Q}N(\mathcal{F},e_{Q},\varepsilon||F||_{Q,2})\leq\sup_{Q}N(\widetilde{\mathcal{F}},e_{Q},\varepsilon||F||_{Q,2})\leq\left(\frac{ap_{n}}{\varepsilon}\right)^{ch_{n}}.

Last,

supf𝔼f2Csupq,q𝒬1ε,|qq|Mn1/2,s𝒮(|qq|2+𝔼ra2(q,s,Xis))=O(hnlog(pn)/n).\displaystyle\sup_{f\in\mathcal{F}}\mathbb{E}f^{2}\leq C\sup_{q,q^{\prime}\in\mathcal{Q}_{1}^{\varepsilon},|q-q^{\prime}|\leq Mn^{-1/2},s\in\mathcal{S}}(|q-q^{\prime}|^{2}+\mathbb{E}r_{a}^{2}(q^{\prime},s,X_{i}^{s}))=O(h_{n}\log(p_{n})/n).

Therefore, Lemma N.2 implies

D2=Op(hnlog(pn)n+hnlog(n)log(pn)n)=op(n1/2).\displaystyle D_{2}=O_{p}\left(\frac{h_{n}\log(p_{n})}{n}+\frac{h_{n}\log(n)\log(p_{n})}{n}\right)=o_{p}(n^{-1/2}).

This leads to (M.1). We can establish Assumption 3(i) in the same manner. Assumptions 3(ii) and 3(iii) can be established by the same argument used in Step 3 of the proof of Theorem 5.6. This concludes the proof of Theorem A.1.

Appendix N Technical Lemmas

The first lemma was established in Zhang and Zheng (2020).

Lemma N.1.

Let $S_{k}$ be the $k$-th partial sum of Banach-space-valued independent and identically distributed random variables. Then

(max1kn||Sk||ε)3max1kn(||Sk||ε/3)9(||Sn||ε/30).\displaystyle\mathbb{P}(\max_{1\leq k\leq n}||S_{k}||\geq\varepsilon)\leq 3\max_{1\leq k\leq n}\mathbb{P}(||S_{k}||\geq\varepsilon/3)\leq 9\mathbb{P}(||S_{n}||\geq\varepsilon/30).
Proof.

The first inequality is due to Zhang and Zheng (2020, Lemma E.1) and the second inequality is due to Montgomery-Smith (1993, Theorem 1). ∎
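As an illustration (not part of the proof), the reflection-type bound in Lemma N.1 can be checked by simulation with hypothetical Rademacher partial sums:

```python
import random

random.seed(42)
n, reps, eps = 100, 5000, 8.0

max_exceed = end_exceed = 0
for _ in range(reps):
    s = running_max = 0
    for _ in range(n):
        s += random.choice((-1, 1))  # i.i.d. Rademacher increments
        running_max = max(running_max, abs(s))
    max_exceed += running_max >= eps
    end_exceed += abs(s) >= eps / 30

# Empirical frequencies should respect
# P(max_k |S_k| >= eps) <= 9 P(|S_n| >= eps/30).
assert max_exceed / reps <= 9 * end_exceed / reps
```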

The next lemma is due to Chernozhukov et al. (2014) with our modification of their maximal inequality to the case with covariate-adaptive randomization.

Lemma N.2.

Suppose Assumption 1 holds. Let $w_{i}=1$ or $\xi_{i}$ defined in Assumption 4. Denote by $\mathcal{F}$ a class of measurable functions of the form $f(x,y_{1},y_{0})$ such that $\mathbb{E}(f(X_{i},Y_{i}(1),Y_{i}(0))|S_{i}=s)=0$. Further suppose $\max_{s\in\mathcal{S}}\mathbb{E}(|F_{i}|^{q}|S_{i}=s)<\infty$ for some $q\geq 2$, where

Fi=supf|wif(Xi,Yi(1),Yi(0))|,F_{i}=\sup_{f\in\mathcal{F}}|w_{i}f(X_{i},Y_{i}(1),Y_{i}(0))|,

\mathcal{F} is of the VC-type with coefficients (αn,vn)>0(\alpha_{n},v_{n})>0, and supf𝔼(f2|S=s)σn2\sup_{f\in\mathcal{F}}\mathbb{E}(f^{2}|S=s)\leq\sigma_{n}^{2}. Then,

supf,s𝒮1n|i=1nAi1{Si=s}wif(Xi,Yi(1),Yi(0))|\displaystyle\sup_{f\in\mathcal{F},s\in\mathcal{S}}\frac{1}{\sqrt{n}}\left|\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}w_{i}f(X_{i},Y_{i}(1),Y_{i}(0))\right|
=Op(vnσn2log(αn||F||,2σ)+vn||maxi[n]Fi||,2log(αn||F||,2σ)n),\displaystyle=O_{p}\left(\sqrt{v_{n}\sigma_{n}^{2}\log\left(\frac{\alpha_{n}||F||_{\mathbb{P},2}}{\sigma}\right)}+\frac{v_{n}||\max_{i\in[n]}F_{i}||_{\mathbb{P},2}\log\left(\frac{\alpha_{n}||F||_{\mathbb{P},2}}{\sigma}\right)}{\sqrt{n}}\right),

and

supf,s𝒮1n|i=1n(1Ai)1{Si=s}wif(Xi,Yi(1),Yi(0))|\displaystyle\sup_{f\in\mathcal{F},s\in\mathcal{S}}\frac{1}{\sqrt{n}}\left|\sum_{i=1}^{n}(1-A_{i})1\{S_{i}=s\}w_{i}f(X_{i},Y_{i}(1),Y_{i}(0))\right|
=Op(vnσn2log(αn||F||,2σ)+vn||maxi[n]Fi||,2log(αn||F||,2σ)n).\displaystyle=O_{p}\left(\sqrt{v_{n}\sigma_{n}^{2}\log\left(\frac{\alpha_{n}||F||_{\mathbb{P},2}}{\sigma}\right)}+\frac{v_{n}||\max_{i\in[n]}F_{i}||_{\mathbb{P},2}\log\left(\frac{\alpha_{n}||F||_{\mathbb{P},2}}{\sigma}\right)}{\sqrt{n}}\right).
Proof.

We focus on establishing the first statement. The proof of the second statement is similar and is omitted. Following Bugni et al. (2018), we define the sequence of i.i.d. random variables $\{(w_{i}^{s},X_{i}^{s},Y_{i}^{s}(1),Y_{i}^{s}(0)):1\leq i\leq n\}$ with marginal distribution equal to that of $(w_{i},X_{i},Y_{i}(1),Y_{i}(0))|S_{i}=s$. The distribution of $\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}w_{i}f(X_{i},Y_{i}(1),Y_{i}(0))$ is the same as its counterpart with units ordered by strata and, within each stratum, with $A_{i}=1$ units first and $A_{i}=0$ units second, i.e.,

\displaystyle\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}w_{i}f(X_{i},Y_{i}(1),Y_{i}(0))\stackrel{d}{=}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}w^{s}_{i}f(X^{s}_{i},Y^{s}_{i}(1),Y^{s}_{i}(0))\equiv\Gamma_{n}^{s}(N(s)+n_{1}(s),f)-\Gamma_{n}^{s}(N(s),f),

where $N(s)=\sum_{i=1}^{n}1\{S_{i}<s\}$ and

\displaystyle\Gamma_{n}^{s}(k,f)=\sum_{i\in[k]}w^{s}_{i}f(X^{s}_{i},Y^{s}_{i}(1),Y^{s}_{i}(0)).

Let $\mu_{n}=\sqrt{v_{n}\sigma_{n}^{2}\log\left(\frac{\alpha_{n}||F||_{\mathbb{P},2}}{\sigma_{n}}\right)}+\frac{v_{n}||\max_{i\in[n]}F_{i}||_{\mathbb{P},2}\log\left(\frac{\alpha_{n}||F||_{\mathbb{P},2}}{\sigma_{n}}\right)}{\sqrt{n}}$. Then, for some constant $C>0$, we have

\displaystyle\mathbb{P}\left(\sup_{f\in\mathcal{F},s\in\mathcal{S}}\frac{1}{\sqrt{n}}\left|\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}w_{i}f(X_{i},Y_{i}(1),Y_{i}(0))\right|\geq t\mu_{n}\right)
=\mathbb{P}\left(\sup_{f\in\mathcal{F},s\in\mathcal{S}}\frac{1}{\sqrt{n}}\left|\Gamma_{n}^{s}(N(s)+n_{1}(s),f)-\Gamma_{n}^{s}(N(s),f)\right|\geq t\mu_{n}\right)
\leq\sum_{s\in\mathcal{S}}\mathbb{P}\left(\max_{1\leq k\leq n}\sup_{f\in\mathcal{F}}\left|\frac{1}{\sqrt{n}}\Gamma_{n}^{s}(k,f)\right|\geq t\mu_{n}/2\right)
\leq\sum_{s\in\mathcal{S}}9\,\mathbb{P}\left(\sup_{f\in\mathcal{F}}\left|\frac{1}{\sqrt{n}}\Gamma_{n}^{s}(n,f)\right|\geq t\mu_{n}/60\right)
\leq\sum_{s\in\mathcal{S}}\frac{540\,\mathbb{E}\left(\sup_{f\in\mathcal{F}}\left|\frac{1}{\sqrt{n}}\Gamma_{n}^{s}(n,f)\right|\right)}{t\mu_{n}}
=\sum_{s\in\mathcal{S}}\frac{540\,\mathbb{E}\left(\sqrt{n}||\mathbb{P}^{s}_{n}-\mathbb{P}^{s}||_{\mathcal{F}}\right)}{t\mu_{n}}
\leq C/t,

where $\mathbb{P}_{n}^{s}$ and $\mathbb{P}^{s}$ are the empirical measure and the expectation w.r.t. the i.i.d. data $\{w_{i}^{s},X_{i}^{s},Y_{i}^{s}(1),Y_{i}^{s}(0)\}_{i\in[n]}$, respectively, the second inequality is due to Lemma N.1, the last equality is due to the fact that

\displaystyle\mathbb{E}w_{i}^{s}f(X_{i}^{s},Y_{i}^{s}(1),Y_{i}^{s}(0))=\mathbb{E}\left(w_{i}f(X_{i},Y_{i}(1),Y_{i}(0))|S_{i}=s\right)=0,

and the last inequality is due to the fact that, by Chernozhukov et al. (2014, Corollary 5.1),

\displaystyle\mathbb{E}\left(\sqrt{n}||\mathbb{P}^{s}_{n}-\mathbb{P}^{s}||_{\mathcal{F}}\right)\leq C\mu_{n}.

Then, for any $\varepsilon>0$, we can choose $t\geq C/\varepsilon$ so that

\displaystyle\mathbb{P}\left(\sup_{f\in\mathcal{F},s\in\mathcal{S}}\frac{1}{\sqrt{n}}\left|\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}w_{i}f(X_{i},Y_{i}(1),Y_{i}(0))\right|\geq t\mu_{n}\right)\leq\varepsilon,

which implies the desired result. ∎
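The key reordering step in the proof, replacing the within-stratum sum over treated units by a partial sum of i.i.d. draws from the conditional distribution, can be illustrated numerically. The sketch below is hypothetical (binary strata, normal covariates, simple randomization with $\pi=1/2$, and the centered function $f(x)=x-1$ on stratum $s=1$); it only checks that the two constructions agree in mean and variance up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, pi = 50, 20000, 0.5

# Construction 1: sum_i A_i 1{S_i = 1} f(X_i), with S_i ~ Bern(0.5),
# A_i ~ Bern(pi) independent of everything, and X | S = s ~ N(s, 1).
S = rng.integers(0, 2, size=(reps, n))
A = rng.binomial(1, pi, size=(reps, n))
X = rng.standard_normal((reps, n)) + S          # X | S = s ~ N(s, 1)
T1 = (A * (S == 1) * (X - 1)).sum(axis=1)

# Construction 2: given n_1(1) treated units in stratum 1, the same sum is
# distributed as a sum of n_1(1) i.i.d. draws of f(X^1) = X^1 - 1 ~ N(0, 1).
n1 = (A * (S == 1)).sum(axis=1)
Z = rng.standard_normal((reps, n))
mask = np.arange(n) < n1[:, None]
T2 = (Z * mask).sum(axis=1)

print(T1.mean(), T2.mean(), T1.var(), T2.var())
```

Both constructions have mean zero and variance $\mathbb{E}[n_{1}(1)]\cdot\mathrm{Var}(f(X)|S=1)=50\cdot 0.25=12.5$ in this design, so the printed moments should match up to simulation noise.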

The next lemma is similar to Zhang and Zheng (2020, Lemma E.2) but with additional covariates and regression adjustments. It is retained in the Supplement to make the paper self-contained.

Lemma N.3.

Suppose the assumptions in Theorem 3 hold. Denote

\displaystyle\varpi_{n,1}(\tau)=\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}\phi_{1}(\tau,s,Y_{i}(1),X_{i})-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(1-A_{i})1\{S_{i}=s\}\phi_{0}(\tau,s,Y_{i}(0),X_{i}),

and

\displaystyle\varpi_{n,2}(\tau)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\phi_{s}(\tau,S_{i}).

Then, uniformly over $\tau\in\Upsilon$,

\displaystyle(\varpi_{n,1}(\tau),\varpi_{n,2}(\tau))\rightsquigarrow(\mathcal{B}_{1}(\tau),\mathcal{B}_{2}(\tau)),

where $(\mathcal{B}_{1}(\tau),\mathcal{B}_{2}(\tau))$ are two independent Gaussian processes with covariance kernels $\Sigma_{1}(\tau,\tau^{\prime})$ and $\Sigma_{2}(\tau,\tau^{\prime})$, respectively, such that

\displaystyle\Sigma_{1}(\tau,\tau^{\prime})=\mathbb{E}\pi(S_{i})\phi_{1}(\tau,S_{i},Y_{i}(1),X_{i})\phi_{1}(\tau^{\prime},S_{i},Y_{i}(1),X_{i})+\mathbb{E}(1-\pi(S_{i}))\phi_{0}(\tau,S_{i},Y_{i}(0),X_{i})\phi_{0}(\tau^{\prime},S_{i},Y_{i}(0),X_{i})

and

\displaystyle\Sigma_{2}(\tau,\tau^{\prime})=\mathbb{E}\phi_{s}(\tau,S_{i})\phi_{s}(\tau^{\prime},S_{i}).
Proof.

We follow the general argument in the proof of Bugni et al. (2018, Lemma B.2). We divide the proof into two steps. In the first step, we show that

\displaystyle(\varpi_{n,1}(\tau),\varpi_{n,2}(\tau))\stackrel{d}{=}(\varpi^{\star}_{n,1}(\tau),\varpi_{n,2}(\tau))+o_{p}(1),

where the $o_{p}(1)$ term holds uniformly over $\tau\in\Upsilon$, $\varpi^{\star}_{n,1}(\tau)\perp\!\!\!\perp\varpi_{n,2}(\tau)$, and, uniformly over $\tau\in\Upsilon$,

\displaystyle\varpi^{\star}_{n,1}(\tau)\rightsquigarrow\mathcal{B}_{1}(\tau).

In the second step, we show that

\displaystyle\varpi_{n,2}(\tau)\rightsquigarrow\mathcal{B}_{2}(\tau)

uniformly over $\tau\in\Upsilon$.

Step 1. Recall that we define $\{(X_{i}^{s},Y_{i}^{s}(1),Y_{i}^{s}(0)):1\leq i\leq n\}$ as a sequence of i.i.d. random variables with marginal distribution equal to that of $(X_{i},Y_{i}(1),Y_{i}(0))|S_{i}=s$, and $N(s)=\sum_{i=1}^{n}1\{S_{i}<s\}$. The distribution of $\varpi_{n,1}(\tau)$ is the same as its counterpart with units ordered by strata and, within each stratum, with $A_{i}=1$ units first and $A_{i}=0$ units second, i.e.,

\displaystyle\varpi_{n,1}(\tau)|\{(A_{i},S_{i})_{i\in[n]}\}\stackrel{d}{=}\widetilde{\varpi}_{n,1}(\tau)|\{(A_{i},S_{i})_{i\in[n]}\},

where

\displaystyle\widetilde{\varpi}_{n,1}(\tau)\equiv\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}\phi_{1}(\tau,s,Y^{s}_{i}(1),X^{s}_{i})-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=N(s)+n_{1}(s)+1}^{N(s)+n(s)}\phi_{0}(\tau,s,Y^{s}_{i}(0),X^{s}_{i}),

with

\displaystyle\phi_{1}(\tau,s,Y^{s}_{i}(1),X^{s}_{i})=\frac{\tau-1\{Y_{i}^{s}(1)\leq q_{1}(\tau)\}-m_{1}(\tau,s)-(1-\pi(s))\left(\overline{m}_{1}(\tau,s,X^{s}_{i})-\overline{m}_{1}(\tau,s)\right)}{\pi(s)f_{1}(q_{1}(\tau))}-\frac{\overline{m}_{0}(\tau,s,X^{s}_{i})-\overline{m}_{0}(\tau,s)}{f_{0}(q_{0}(\tau))},

and

\displaystyle\phi_{0}(\tau,s,Y^{s}_{i}(0),X^{s}_{i})=\frac{\tau-1\{Y_{i}^{s}(0)\leq q_{0}(\tau)\}-m_{0}(\tau,s)-\pi(s)\left(\overline{m}_{0}(\tau,s,X^{s}_{i})-\overline{m}_{0}(\tau,s)\right)}{(1-\pi(s))f_{0}(q_{0}(\tau))}-\frac{\overline{m}_{1}(\tau,s,X^{s}_{i})-\overline{m}_{1}(\tau,s)}{f_{1}(q_{1}(\tau))}.

As $\varpi_{n,2}(\tau)$ is only a function of $\{S_{i}\}_{i\in[n]}$, we have

\displaystyle(\varpi_{n,1}(\tau),\varpi_{n,2}(\tau))\stackrel{d}{=}(\widetilde{\varpi}_{n,1}(\tau),\varpi_{n,2}(\tau)).

Let $F(s)=\mathbb{P}(S_{i}<s)$, $p(s)=\mathbb{P}(S_{i}=s)$, and

\displaystyle\varpi^{\star}_{n,1}(\tau)\equiv\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=\lfloor nF(s)\rfloor+1}^{\lfloor n(F(s)+\pi(s)p(s))\rfloor}\phi_{1}(\tau,s,Y^{s}_{i}(1),X^{s}_{i})-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=\lfloor n(F(s)+\pi(s)p(s))\rfloor+1}^{\lfloor n(F(s)+p(s))\rfloor}\phi_{0}(\tau,s,Y^{s}_{i}(0),X^{s}_{i}).

Note that $\varpi^{\star}_{n,1}(\tau)$ is a function of $(Y_{i}^{s}(1),Y_{i}^{s}(0),X_{i}^{s})_{i\in[n],s\in\mathcal{S}}$, which is independent of $\{A_{i},S_{i}\}_{i\in[n]}$ by construction. Therefore,

\displaystyle\varpi^{\star}_{n,1}(\tau)\perp\!\!\!\perp\varpi_{n,2}(\tau).

Note that

\displaystyle\frac{N(s)}{n}\stackrel{p}{\longrightarrow}F(s),\quad\frac{n_{1}(s)}{n}\stackrel{p}{\longrightarrow}\pi(s)p(s),\quad\text{and}\quad\frac{n(s)}{n}\stackrel{p}{\longrightarrow}p(s).

Denote $\Gamma_{n,a}(s,t,\tau)=\frac{1}{\sqrt{n}}\sum_{i=1}^{\lfloor nt\rfloor}\phi_{a}(\tau,s,Y_{i}^{s}(a),X_{i}^{s})$ for $a=0,1$. In order to show that $\sup_{\tau\in\Upsilon}|\widetilde{\varpi}_{n,1}(\tau)-\varpi^{\star}_{n,1}(\tau)|=o_{p}(1)$ and $\varpi_{n,1}^{\star}(\tau)\rightsquigarrow\mathcal{B}_{1}(\tau)$, it suffices to show that (1) for $a=0,1$ and $s\in\mathcal{S}$, the stochastic process

\displaystyle\{\Gamma_{n,a}(s,t,\tau):t\in(0,1),\tau\in\Upsilon\}

is stochastically equicontinuous, and (2) $\varpi_{n,1}^{\star}(\tau)$ converges to $\mathcal{B}_{1}(\tau)$ in finite dimensions.

Claim (1). We want to bound

\displaystyle\sup|\Gamma_{n,a}(s,t_{2},\tau_{2})-\Gamma_{n,a}(s,t_{1},\tau_{1})|,

where the supremum is taken over $0<t_{1}<t_{2}<t_{1}+\varepsilon<1$ and $\tau_{1}<\tau_{2}<\tau_{1}+\varepsilon$ such that $\tau_{1},\tau_{1}+\varepsilon\in\Upsilon$. Note that

\displaystyle\sup|\Gamma_{n,a}(s,t_{2},\tau_{2})-\Gamma_{n,a}(s,t_{1},\tau_{1})|\leq\sup_{0<t_{1}<t_{2}<t_{1}+\varepsilon<1,\tau\in\Upsilon}|\Gamma_{n,a}(s,t_{2},\tau)-\Gamma_{n,a}(s,t_{1},\tau)|+\sup_{t\in(0,1),\tau_{1},\tau_{2}\in\Upsilon,\tau_{1}<\tau_{2}<\tau_{1}+\varepsilon}|\Gamma_{n,a}(s,t,\tau_{2})-\Gamma_{n,a}(s,t,\tau_{1})|. (N.1)

Then, for an arbitrary $\delta>0$, by taking $\varepsilon=\delta^{4}$, we have

\displaystyle\mathbb{P}\left(\sup_{0<t_{1}<t_{2}<t_{1}+\varepsilon<1,\tau\in\Upsilon}|\Gamma_{n,a}(s,t_{2},\tau)-\Gamma_{n,a}(s,t_{1},\tau)|\geq\delta\right)
=\mathbb{P}\left(\sup_{0<t_{1}<t_{2}<t_{1}+\varepsilon<1,\tau\in\Upsilon}|\sum_{i=\lfloor nt_{1}\rfloor+1}^{\lfloor nt_{2}\rfloor}\phi_{a}(\tau,s,Y_{i}^{s}(a),X_{i}^{s})|\geq\sqrt{n}\delta\right)
=\mathbb{P}\left(\sup_{0<t\leq\varepsilon,\tau\in\Upsilon}|\sum_{i=1}^{\lfloor nt\rfloor}\phi_{a}(\tau,s,Y_{i}^{s}(a),X_{i}^{s})|\geq\sqrt{n}\delta\right)
\leq\mathbb{P}\left(\max_{1\leq k\leq\lfloor n\varepsilon\rfloor}\sup_{\tau\in\Upsilon}|S_{k}(\tau)|\geq\sqrt{n}\delta\right)
\leq\frac{270\,\mathbb{E}\sup_{\tau\in\Upsilon}|\sum_{i=1}^{\lfloor n\varepsilon\rfloor}\phi_{a}(\tau,s,Y_{i}^{s}(a),X_{i}^{s})|}{\sqrt{n}\delta}
\lesssim\frac{\sqrt{n\varepsilon}}{\sqrt{n}\delta}\lesssim\delta,

where in the first inequality, $S_{k}(\tau)=\sum_{i=1}^{k}\phi_{a}(\tau,s,Y_{i}^{s}(a),X_{i}^{s})$, and the second inequality holds due to Lemma N.1. To see the third inequality, denote

\displaystyle\mathcal{F}=\{\phi_{a}(\tau,s,Y_{i}^{s}(a),X_{i}^{s}):\tau\in\Upsilon\}

with an envelope function $F_{i}$ such that, by Assumption 3, $||F_{i}||_{\mathbb{P},q}<\infty$. In addition, by Assumption 3 again and the fact that

\displaystyle\{\tau-1\{Y_{i}^{s}(a)\leq q_{a}(\tau)\}-m_{a}(\tau,s):\tau\in\Upsilon\}

is of the VC-type with fixed coefficients $(\alpha,v)$, so is $\mathcal{F}$. Then, we have

\displaystyle J(1,\mathcal{F})<\infty,

where

\displaystyle J(\delta,\mathcal{F})=\sup_{Q}\int_{0}^{\delta}\sqrt{1+\log N(\mathcal{F},L_{2}(Q),\varepsilon||F||_{Q,2})}d\varepsilon,

$N(\mathcal{F},L_{2}(Q),\varepsilon||F||_{Q,2})$ is the covering number, and the supremum is taken over all discrete probability measures $Q$. Therefore, by van der Vaart and Wellner (1996, Theorem 2.14.1),

\displaystyle\frac{270\,\mathbb{E}\sup_{\tau\in\Upsilon}|\sum_{i=1}^{\lfloor n\varepsilon\rfloor}\phi_{a}(\tau,s,Y_{i}^{s}(a),X_{i}^{s})|}{\sqrt{n}\delta}\lesssim\frac{\sqrt{\lfloor n\varepsilon\rfloor}\left[\mathbb{E}\sqrt{\lfloor n\varepsilon\rfloor}||\mathbb{P}_{\lfloor n\varepsilon\rfloor}-\mathbb{P}||_{\mathcal{F}}\right]}{\sqrt{n}\delta}\lesssim\frac{\sqrt{\lfloor n\varepsilon\rfloor}J(1,\mathcal{F})}{\sqrt{n}\delta}.

For the second term on the RHS of (N.1), by taking $\varepsilon=\delta^{4}$, we have

\displaystyle\mathbb{P}\left(\sup_{t\in(0,1),\tau_{1},\tau_{2}\in\Upsilon,\tau_{1}<\tau_{2}<\tau_{1}+\varepsilon}|\Gamma_{n,a}(s,t,\tau_{2})-\Gamma_{n,a}(s,t,\tau_{1})|\geq\delta\right)
=\mathbb{P}\left(\max_{1\leq k\leq n}\sup_{\tau_{1},\tau_{2}\in\Upsilon,\tau_{1}<\tau_{2}<\tau_{1}+\varepsilon}|S_{k}(\tau_{1},\tau_{2})|\geq\sqrt{n}\delta\right)
\leq\frac{270\,\mathbb{E}\sup_{\tau_{1},\tau_{2}\in\Upsilon,\tau_{1}<\tau_{2}<\tau_{1}+\varepsilon}|\sum_{i=1}^{n}(\phi_{a}(\tau_{2},s,Y_{i}^{s}(a),X_{i}^{s})-\phi_{a}(\tau_{1},s,Y_{i}^{s}(a),X_{i}^{s}))|}{\sqrt{n}\delta}
\lesssim\delta\sqrt{\log\left(\frac{C}{\delta^{2}}\right)},

where in the first equality, $S_{k}(\tau_{1},\tau_{2})=\sum_{i=1}^{k}(\phi_{a}(\tau_{2},s,Y_{i}^{s}(a),X_{i}^{s})-\phi_{a}(\tau_{1},s,Y_{i}^{s}(a),X_{i}^{s}))$, and the first inequality is due to Lemma N.1. To see the last inequality, denote

\displaystyle\mathcal{F}=\{\phi_{a}(\tau_{2},s,Y_{i}^{s}(a),X_{i}^{s})-\phi_{a}(\tau_{1},s,Y_{i}^{s}(a),X_{i}^{s}):\tau_{1},\tau_{2}\in\Upsilon,\tau_{1}<\tau_{2}<\tau_{1}+\varepsilon\}

with a constant envelope function $F_{i}$ such that $||F_{i}||_{\mathbb{P},q}<\infty$. In addition, due to Assumptions 2.2 and 3.3, one can show that

\displaystyle\sup_{f\in\mathcal{F}}\mathbb{E}f^{2}\leq c\varepsilon\equiv\sigma^{2}

for some constant $c>0$. Last, due to Assumption 3.2, $\mathcal{F}$ is of the VC-type with fixed coefficients $(\alpha,v)$. Therefore, by Chernozhukov et al. (2014, Corollary 5.1),

\displaystyle\frac{270\,\mathbb{E}\sup_{\tau_{1},\tau_{2}\in\Upsilon,\tau_{1}<\tau_{2}<\tau_{1}+\varepsilon}|\sum_{i=1}^{n}(\phi_{a}(\tau_{2},s,Y_{i}^{s}(a),X_{i}^{s})-\phi_{a}(\tau_{1},s,Y_{i}^{s}(a),X_{i}^{s}))|}{\sqrt{n}\delta}\lesssim\frac{\sqrt{n}\mathbb{E}||\mathbb{P}_{n}-\mathbb{P}||_{\mathcal{F}}}{\delta}\lesssim\sqrt{\frac{\sigma^{2}\log(\frac{C}{\sigma})}{\delta^{2}}}+\frac{C\log(\frac{C}{\sigma})}{\sqrt{n}\delta}\lesssim\delta\sqrt{\log\left(\frac{C}{\delta^{2}}\right)},

where the last inequality holds for sufficiently large $n$. Note that $\delta\sqrt{\log(\frac{C}{\delta^{2}})}\rightarrow 0$ as $\delta\rightarrow 0$. This concludes the proof of Claim (1).

Claim (2). For a single $\tau$, by the triangular-array CLT,

\displaystyle\varpi_{n,1}^{\star}(\tau)\rightsquigarrow N(0,\Sigma_{1}(\tau,\tau)),

where

\displaystyle\Sigma_{1}(\tau,\tau)=\lim_{n\rightarrow\infty}\sum_{s\in\mathcal{S}}\frac{\lfloor n(F(s)+\pi(s)p(s))\rfloor-\lfloor nF(s)\rfloor}{n}\mathbb{E}\phi_{1}^{2}(\tau,s,Y_{i}^{s}(1),X_{i}^{s})+\lim_{n\rightarrow\infty}\sum_{s\in\mathcal{S}}\frac{\lfloor n(F(s)+p(s))\rfloor-\lfloor n(F(s)+\pi(s)p(s))\rfloor}{n}\mathbb{E}\phi_{0}^{2}(\tau,s,Y_{i}^{s}(0),X_{i}^{s})
=\sum_{s\in\mathcal{S}}p(s)\mathbb{E}(\pi(s)\phi_{1}^{2}(\tau,S_{i},Y_{i}(1),X_{i})+(1-\pi(s))\phi_{0}^{2}(\tau,S_{i},Y_{i}(0),X_{i})|S_{i}=s)
=\mathbb{E}\pi(S_{i})\phi_{1}^{2}(\tau,S_{i},Y_{i}(1),X_{i})+\mathbb{E}(1-\pi(S_{i}))\phi_{0}^{2}(\tau,S_{i},Y_{i}(0),X_{i}).

Finite-dimensional convergence is proved by the Cramér-Wold device. In particular, we can show that the covariance kernel is

\displaystyle\Sigma_{1}(\tau,\tau^{\prime})=\mathbb{E}\pi(S_{i})\phi_{1}(\tau,S_{i},Y_{i}(1),X_{i})\phi_{1}(\tau^{\prime},S_{i},Y_{i}(1),X_{i})+\mathbb{E}(1-\pi(S_{i}))\phi_{0}(\tau,S_{i},Y_{i}(0),X_{i})\phi_{0}(\tau^{\prime},S_{i},Y_{i}(0),X_{i}).

This concludes the proof of Claim (2), and thereby leads to the desired results in Step 1.

Step 2. As $m_{a}(\tau,S_{i})=\tau-\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i})$ is Lipschitz continuous in $\tau$ with a bounded Lipschitz constant, $\{m_{a}(\tau,S_{i}):\tau\in\Upsilon\}$ is of the VC-type with fixed coefficients $(\alpha,v)$ and a constant envelope function. Therefore, $\{\frac{m_{1}(\tau,S_{i})}{f_{1}(q_{1}(\tau))}-\frac{m_{0}(\tau,S_{i})}{f_{0}(q_{0}(\tau))}:\tau\in\Upsilon\}$ is a Donsker class and we have

\displaystyle\varpi_{n,2}(\tau)\rightsquigarrow\mathcal{B}_{2}(\tau),

where $\mathcal{B}_{2}(\tau)$ is a Gaussian process with covariance kernel

\displaystyle\Sigma_{2}(\tau,\tau^{\prime})=\mathbb{E}\left(\frac{m_{1}(\tau,S_{i})}{f_{1}(q_{1}(\tau))}-\frac{m_{0}(\tau,S_{i})}{f_{0}(q_{0}(\tau))}\right)\left(\frac{m_{1}(\tau^{\prime},S_{i})}{f_{1}(q_{1}(\tau^{\prime}))}-\frac{m_{0}(\tau^{\prime},S_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right)\equiv\mathbb{E}\phi_{s}(\tau,S_{i})\phi_{s}(\tau^{\prime},S_{i}).

This concludes the proof. ∎
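Step 2's conclusion, that the variance of the normalized stratum-level sum equals $\Sigma_{2}(\tau,\tau)=\mathbb{E}\phi_{s}^{2}(\tau,S_{i})$ for a fixed $\tau$, can be checked numerically. The stratum function `g` below is a hypothetical stand-in for $\phi_{s}(\tau,\cdot)$; the only features used are that the $S_{i}$ are i.i.d. over finitely many strata and that `g` is bounded and centered.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 20000
p = np.array([0.3, 0.5, 0.2])          # p(s) for strata s = 0, 1, 2
g = np.array([1.0, -0.2, -1.0])
g = g - p @ g                           # center so that E g(S) = 0

S = rng.choice(3, size=(reps, n), p=p)
W = g[S].sum(axis=1) / np.sqrt(n)       # n^{-1/2} sum_i g(S_i)

Sigma2 = p @ (g ** 2)                   # E g^2(S), the target kernel at (tau, tau)
print(W.mean(), W.var(), Sigma2)
```

The Monte Carlo variance of `W` should match `Sigma2` up to simulation noise, since $\mathrm{Var}(n^{-1/2}\sum_{i}g(S_{i}))=\mathrm{Var}(g(S_{i}))$ exactly for i.i.d. draws.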

Lemma N.4.

Suppose the Assumptions in Theorem 5 hold and recall $D_{n}^{w}(s)=\sum_{i=1}^{n}\xi_{i}(A_{i}-\pi(S_{i}))1\{S_{i}=s\}$. Then, $\max_{s\in\mathcal{S}}|(D_{n}^{w}(s)-D_{n}(s))/n(s)|=o_{p}(1)$ and $\max_{s\in\mathcal{S}}|D_{n}^{w}(s)/n^{w}(s)|=o_{p}(1)$.

Proof.

We note that $n^{w}(s)/n(s)\stackrel{p}{\longrightarrow}1$ and $D_{n}(s)/n(s)\stackrel{p}{\longrightarrow}0$. Therefore, we only need to show

\displaystyle\frac{D_{n}^{w}(s)-D_{n}(s)}{n(s)}=\sum_{i=1}^{n}\frac{(\xi_{i}-1)(A_{i}-\pi(s))1\{S_{i}=s\}}{n(s)}\stackrel{p}{\longrightarrow}0.

As $n(s)\rightarrow\infty$ a.s., conditionally on the data,

\displaystyle\frac{1}{n(s)}\sum_{i=1}^{n}(A_{i}-\pi(s))^{2}1\{S_{i}=s\}=\frac{1}{n(s)}\sum_{i=1}^{n}\left(A_{i}-\pi(s)-2\pi(s)(A_{i}-\pi(s))+\pi(s)-\pi^{2}(s)\right)1\{S_{i}=s\}=\frac{D_{n}(s)-2\pi(s)D_{n}(s)}{n(s)}+\pi(s)(1-\pi(s))\stackrel{p}{\longrightarrow}\pi(s)(1-\pi(s)).

Then, by the Lindeberg CLT, conditionally on the data,

\displaystyle\frac{1}{\sqrt{n(s)}}\sum_{i=1}^{n}(\xi_{i}-1)(A_{i}-\pi(s))1\{S_{i}=s\}\rightsquigarrow N(0,\pi(s)(1-\pi(s)))=O_{p}(1),

and thus

\displaystyle\frac{D_{n}^{w}(s)-D_{n}(s)}{n(s)}=O_{p}(n^{-1/2}(s))=o_{p}(1).

∎
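Lemma N.4's conclusion can also be seen by simulation (this is an illustration, not part of the proof): with unit-mean, unit-variance bootstrap weights (Exp(1) here, an illustrative choice; any weights satisfying Assumption 4 would do), the normalized difference $(D_{n}^{w}(s)-D_{n}(s))/n(s)$ shrinks at rate $n(s)^{-1/2}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, pi = 20000, 0.5
S = rng.integers(0, 2, size=n)          # two strata, equally likely
A = rng.binomial(1, pi, size=n)         # simple randomization within strata
xi = rng.exponential(1.0, size=n)       # bootstrap weights with mean 1, var 1

# (D_n^w(s) - D_n(s)) / n(s) = average of (xi_i - 1)(A_i - pi) within stratum s
diffs = []
for s in (0, 1):
    idx = S == s
    diffs.append(abs(((xi[idx] - 1) * (A[idx] - pi)).sum() / idx.sum()))
print(diffs)
```

With roughly 10,000 units per stratum, each within-stratum average has standard deviation about $\sqrt{0.25/10^{4}}=0.005$, so both entries of `diffs` should be close to zero.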

Lemma N.5.

Suppose the Assumptions in Theorem 5 hold. Then, uniformly over $\tau\in\Upsilon$,

\displaystyle\varpi_{n,1}^{w}(\tau)+\varpi_{n,2}^{w}(\tau)\underset{\xi}{\overset{\mathbb{P}}{\rightsquigarrow}}\mathcal{B}(\tau),

where $\mathcal{B}(\tau)$ is a Gaussian process with the covariance kernel

\displaystyle\Sigma(\tau,\tau^{\prime})=\mathbb{E}\pi(S_{i})\phi_{1}(\tau,S_{i},Y_{i}(1),X_{i})\phi_{1}(\tau^{\prime},S_{i},Y_{i}(1),X_{i})+\mathbb{E}(1-\pi(S_{i}))\phi_{0}(\tau,S_{i},Y_{i}(0),X_{i})\phi_{0}(\tau^{\prime},S_{i},Y_{i}(0),X_{i})+\mathbb{E}\phi_{s}(\tau,S_{i})\phi_{s}(\tau^{\prime},S_{i}).
Proof.

We divide the proof into two steps. In the first step, we show the conditional stochastic equicontinuity of $\varpi_{n,1}^{w}(\tau)$ and $\varpi_{n,2}^{w}(\tau)$. In the second step, we show the finite-dimensional convergence of $\varpi_{n,1}^{w}(\tau)+\varpi_{n,2}^{w}(\tau)$ conditionally on the data.

Step 1. Following the same idea as in the proof of Lemma N.3, we define $\{(\xi_{i}^{s},X_{i}^{s},Y_{i}^{s}(1),Y_{i}^{s}(0)):1\leq i\leq n\}$ as a sequence of i.i.d. random variables with marginal distribution equal to that of $(\xi_{i},X_{i},Y_{i}(1),Y_{i}(0))|S_{i}=s$, and $N(s)=\sum_{i=1}^{n}1\{S_{i}<s\}$. The distribution of $\varpi_{n,1}^{w}(\tau)$ is the same as its counterpart with units ordered by strata and, within each stratum, with $A_{i}=1$ units first and $A_{i}=0$ units second, i.e.,

\displaystyle\varpi_{n,1}^{w}(\tau)|\{(A_{i},S_{i})_{i\in[n]}\}\stackrel{d}{=}\widetilde{\varpi}^{w}_{n,1}(\tau)|\{(A_{i},S_{i})_{i\in[n]}\},

and thus,

\displaystyle\varpi_{n,1}^{w}(\tau)\stackrel{d}{=}\widetilde{\varpi}^{w}_{n,1}(\tau), (N.2)

where

\displaystyle\widetilde{\varpi}_{n,1}^{w}(\tau)\equiv\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}(\xi_{i}^{s}-1)\phi_{1}(\tau,s,Y^{s}_{i}(1),X^{s}_{i})-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=N(s)+n_{1}(s)+1}^{N(s)+n(s)}(\xi_{i}^{s}-1)\phi_{0}(\tau,s,Y^{s}_{i}(0),X^{s}_{i}).

In addition, let

\displaystyle\widetilde{\varpi}^{w\star}_{n,1}(\tau)\equiv\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=\lfloor nF(s)\rfloor+1}^{\lfloor n(F(s)+\pi(s)p(s))\rfloor}(\xi_{i}^{s}-1)\phi_{1}(\tau,s,Y^{s}_{i}(1),X^{s}_{i})-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=\lfloor n(F(s)+\pi(s)p(s))\rfloor+1}^{\lfloor n(F(s)+p(s))\rfloor}(\xi_{i}^{s}-1)\phi_{0}(\tau,s,Y^{s}_{i}(0),X^{s}_{i}).

Following exactly the same argument as in the proof of Lemma N.3, we have

\displaystyle\sup_{\tau\in\Upsilon}|\widetilde{\varpi}_{n,1}^{w}(\tau)-\widetilde{\varpi}^{w\star}_{n,1}(\tau)|=o_{p}(1), (N.3)

and $\widetilde{\varpi}^{w\star}_{n,1}(\tau)$ is unconditionally stochastically equicontinuous, i.e., for any $\varepsilon>0$, as $n\rightarrow\infty$ followed by $\delta\rightarrow 0$, we have

\displaystyle\mathbb{E}\mathbb{P}_{\xi}\left(\sup_{\tau_{1},\tau_{2}\in\Upsilon,\tau_{1}<\tau_{2}<\tau_{1}+\delta}|\widetilde{\varpi}^{w\star}_{n,1}(\tau_{1})-\widetilde{\varpi}^{w\star}_{n,1}(\tau_{2})|\geq\varepsilon\right)=\mathbb{P}\left(\sup_{\tau_{1},\tau_{2}\in\Upsilon,\tau_{1}<\tau_{2}<\tau_{1}+\delta}|\widetilde{\varpi}^{w\star}_{n,1}(\tau_{1})-\widetilde{\varpi}^{w\star}_{n,1}(\tau_{2})|\geq\varepsilon\right)\rightarrow 0,

where $\mathbb{P}_{\xi}$ denotes the probability operator with respect to the bootstrap weights $\{\xi_{i}\}_{i\in[n]}$, conditional on the data. This implies the unconditional stochastic equicontinuity of $\varpi^{w}_{n,1}(\tau)$ due to (N.2) and (N.3), which further implies the conditional stochastic equicontinuity of $\varpi^{w}_{n,1}(\tau)$, i.e., for any $\varepsilon>0$, as $n\rightarrow\infty$ followed by $\delta\rightarrow 0$,

\displaystyle\mathbb{P}_{\xi}\left(\sup_{\tau_{1},\tau_{2}\in\Upsilon,\tau_{1}<\tau_{2}<\tau_{1}+\delta}|\widetilde{\varpi}^{w\star}_{n,1}(\tau_{1})-\widetilde{\varpi}^{w\star}_{n,1}(\tau_{2})|\geq\varepsilon\right)\stackrel{p}{\longrightarrow}0.

By a similar but simpler argument, the conditional stochastic equicontinuity of ϖn,2w(τ)\varpi_{n,2}^{w}(\tau) holds as well. This concludes the first step.

Step 2. We first show the asymptotic normality of $\varpi_{n,1}^{w}(\tau)+\varpi_{n,2}^{w}(\tau)$ conditionally on the data for a fixed $\tau$. Note

\displaystyle\varpi_{n,1}^{w}(\tau)+\varpi_{n,2}^{w}(\tau)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\xi_{i}-1)\left[A_{i}\phi_{1}(\tau,S_{i},Y_{i}(1),X_{i})-(1-A_{i})\phi_{0}(\tau,S_{i},Y_{i}(0),X_{i})+\phi_{s}(\tau,S_{i})\right]\equiv\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\xi_{i}-1)\mathcal{J}_{n,i}(\tau).

Conditionally on the data, $\{(\xi_{i}-1)\mathcal{J}_{n,i}(\tau)\}_{i\in[n]}$ is a sequence of independent but non-identically distributed random variables. In order to apply the Lindeberg-Feller central limit theorem, we only need to show that (1)

\displaystyle\frac{1}{n}\sum_{i=1}^{n}\mathcal{J}^{2}_{n,i}(\tau)\stackrel{p}{\longrightarrow}\Sigma(\tau,\tau),

where $\Sigma(\tau,\tau)$ is defined in Theorem 3, and (2) the Lindeberg condition holds, i.e.,

\displaystyle\frac{1}{n}\sum_{i=1}^{n}\mathcal{J}_{n,i}^{2}(\tau)\mathbb{E}(\xi_{i}-1)^{2}1\{|(\xi_{i}-1)\mathcal{J}_{n,i}(\tau)|\geq\sqrt{n}\varepsilon\}\stackrel{p}{\longrightarrow}0.

For part (1), we have

\displaystyle\frac{1}{n}\sum_{i=1}^{n}\mathcal{J}^{2}_{n,i}(\tau)=\sigma_{1}^{2}+2\sigma_{12}+\sigma_{2}^{2},

where

\displaystyle\sigma_{1}^{2}=\frac{1}{n}\sum_{i=1}^{n}\left[A_{i}\phi_{1}(\tau,S_{i},Y_{i}(1),X_{i})-(1-A_{i})\phi_{0}(\tau,S_{i},Y_{i}(0),X_{i})\right]^{2},
\displaystyle\sigma_{12}=\frac{1}{n}\sum_{i=1}^{n}\left[A_{i}\phi_{1}(\tau,S_{i},Y_{i}(1),X_{i})-(1-A_{i})\phi_{0}(\tau,S_{i},Y_{i}(0),X_{i})\right]\phi_{s}(\tau,S_{i}),

and

\displaystyle\sigma_{2}^{2}=\frac{1}{n}\sum_{i=1}^{n}\phi_{s}^{2}(\tau,S_{i}).

Note

\displaystyle\sigma_{1}^{2}=\frac{1}{n}\sum_{i=1}^{n}A_{i}\phi^{2}_{1}(\tau,S_{i},Y_{i}(1),X_{i})+\frac{1}{n}\sum_{i=1}^{n}(1-A_{i})\phi_{0}^{2}(\tau,S_{i},Y_{i}(0),X_{i})
\stackrel{d}{=}\frac{1}{n}\sum_{s\in\mathcal{S}}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}\phi^{2}_{1}(\tau,s,Y_{i}^{s}(1),X^{s}_{i})+\frac{1}{n}\sum_{s\in\mathcal{S}}\sum_{i=N(s)+n_{1}(s)+1}^{N(s)+n(s)}\phi_{0}^{2}(\tau,s,Y^{s}_{i}(0),X^{s}_{i})
\stackrel{p}{\longrightarrow}\sum_{s\in\mathcal{S}}p(s)\left[\pi(s)\mathbb{E}\phi^{2}_{1}(\tau,s,Y_{i}^{s}(1),X^{s}_{i})+(1-\pi(s))\mathbb{E}\phi^{2}_{0}(\tau,s,Y_{i}^{s}(0),X^{s}_{i})\right]
=\mathbb{E}\left[\pi(S_{i})\phi^{2}_{1}(\tau,S_{i},Y_{i}(1),X_{i})+(1-\pi(S_{i}))\phi^{2}_{0}(\tau,S_{i},Y_{i}(0),X_{i})\right],

where the convergence holds because $N(s)/n\stackrel{p}{\longrightarrow}F(s)$, $n_{1}(s)/n\stackrel{p}{\longrightarrow}\pi(s)p(s)$, $n(s)/n\stackrel{p}{\longrightarrow}p(s)$, and the partial sum process converges uniformly. Similarly,

\displaystyle\sigma_{12}\stackrel{d}{=}\frac{1}{n}\sum_{s\in\mathcal{S}}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}\phi_{1}(\tau,s,Y_{i}^{s}(1),X^{s}_{i})\phi_{s}(\tau,s)-\frac{1}{n}\sum_{s\in\mathcal{S}}\sum_{i=N(s)+n_{1}(s)+1}^{N(s)+n(s)}\phi_{0}(\tau,s,Y^{s}_{i}(0),X^{s}_{i})\phi_{s}(\tau,s)
\stackrel{p}{\longrightarrow}\sum_{s\in\mathcal{S}}p(s)\left[\pi(s)\mathbb{E}\phi_{1}(\tau,s,Y_{i}^{s}(1),X^{s}_{i})-(1-\pi(s))\mathbb{E}\phi_{0}(\tau,s,Y^{s}_{i}(0),X^{s}_{i})\right]\phi_{s}(\tau,s)=0,

where we use the fact that

\displaystyle\mathbb{E}\phi_{1}(\tau,s,Y_{i}^{s}(1),X^{s}_{i})=\mathbb{E}\phi_{0}(\tau,s,Y^{s}_{i}(0),X^{s}_{i})=0.

By the standard weak law of large numbers, we have

\[
\sigma_{2}^{2}\stackrel{p}{\longrightarrow}\mathbb{E}\phi_{s}^{2}(\tau,S_{i}).
\]

Therefore,

\[
\frac{1}{n}\sum_{i=1}^{n}\mathcal{J}^{2}_{n,i}(\tau)\stackrel{p}{\longrightarrow}\mathbb{E}\left[\pi(S_{i})\phi^{2}_{1}(\tau,S_{i},Y_{i}(1),X_{i})+(1-\pi(S_{i}))\phi^{2}_{0}(\tau,S_{i},Y_{i}(1),X_{i})\right]+\mathbb{E}\phi_{s}^{2}(\tau,S_{i})=\Sigma(\tau,\tau).
\]
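This limit combines the three terms above via the exact sample decomposition $\frac{1}{n}\sum_{i}\mathcal{J}_{n,i}^{2}(\tau)=\sigma_{1}^{2}+2\sigma_{12}+\sigma_{2}^{2}$, which holds pointwise because $A_{i}\in\{0,1\}$, so the $A_{i}(1-A_{i})$ cross terms vanish. A minimal numerical sketch, with Gaussian draws standing in as placeholders for the influence-function values, verifies the decomposition:

```python
import numpy as np

# Toy arrays standing in for phi_1, phi_0, phi_s; the Gaussian draws are
# placeholders, not the paper's actual influence functions.
rng = np.random.default_rng(1)
n = 10_000
A = rng.integers(0, 2, size=n)
phi1 = rng.normal(size=n)
phi0 = rng.normal(size=n)
phis = rng.normal(size=n)

# J_{n,i}(tau) = A_i phi_1 + (1 - A_i) phi_0 + phi_s, with A_i in {0,1}
J = A * phi1 + (1 - A) * phi0 + phis

lhs = np.mean(J**2)                                     # (1/n) sum_i J_{n,i}^2
sigma1_sq = np.mean(A * phi1**2 + (1 - A) * phi0**2)    # sample analogue of sigma_1^2
sigma12 = np.mean((A * phi1 + (1 - A) * phi0) * phis)   # sample analogue of sigma_12
sigma2_sq = np.mean(phis**2)                            # sample analogue of sigma_2^2
```

Because $A_{i}^{2}=A_{i}$ and $(1-A_{i})^{2}=1-A_{i}$, the identity `lhs == sigma1_sq + 2*sigma12 + sigma2_sq` holds up to floating-point error for any inputs.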

To verify the Lindeberg condition, we note that

\begin{align*}
&\frac{1}{n}\sum_{i=1}^{n}\mathcal{J}_{n,i}^{2}(\tau)\mathbb{E}(\xi_{i}-1)^{2}1\{|(\xi_{i}-1)\mathcal{J}_{n,i}(\tau)|\geq\sqrt{n}\varepsilon\}\\
&\leq\frac{1}{n(\sqrt{n}\varepsilon)^{q-2}}\sum_{i=1}^{n}|\mathcal{J}_{n,i}(\tau)|^{q}\mathbb{E}|\xi_{i}-1|^{q}\\
&\leq\frac{c}{n(\sqrt{n}\varepsilon)^{q-2}}\sum_{i=1}^{n}\left[|\phi_{1}(\tau,S_{i},Y_{i}(1),X_{i})|^{q}+|\phi_{0}(\tau,S_{i},Y_{i}(1),X_{i})|^{q}+|\phi_{s}(\tau,S_{i})|^{q}\right]=o_{p}(1),
\end{align*}

where the last equality is due to Assumption 3(ii) and the fact that $\eta_{i,a}(\tau,s)$ is bounded.
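The Markov-type truncation step in the first inequality above can be checked numerically. The sketch below is an illustration under stated assumptions: it uses standard exponential multipliers (so $\mathbb{E}\xi_{i}=1$ and $\mathbb{E}(\xi_{i}-1)^{2}=1$) and Gaussian placeholders for $\mathcal{J}_{n,i}(\tau)$, and confirms that the Monte Carlo left-hand side never exceeds the right-hand side:

```python
import numpy as np

# Numerical check of the Markov-type truncation bound behind the Lindeberg
# condition; J values and exponential multipliers xi are illustrative stand-ins.
rng = np.random.default_rng(2)
n, q, eps = 500, 4, 0.05
J = rng.normal(size=n)                    # stand-in for J_{n,i}(tau)
xi = rng.exponential(size=(10_000, 1))    # bootstrap multipliers with E xi = 1
thresh = np.sqrt(n) * eps

# LHS: (1/n) sum_i J_i^2 E (xi-1)^2 1{|(xi-1) J_i| >= sqrt(n) eps},
# with the expectation over xi computed by Monte Carlo
trunc = np.mean((xi - 1) ** 2 * (np.abs((xi - 1) * J) >= thresh), axis=0)
lhs = np.mean(J**2 * trunc)

# RHS: (1/(n (sqrt(n) eps)^{q-2})) sum_i |J_i|^q E |xi-1|^q
rhs = np.mean(np.abs(J) ** q) * np.mean(np.abs(xi - 1) ** q) / thresh ** (q - 2)
```

The inequality $x^{2}1\{|x|\geq t\}\leq|x|^{q}/t^{q-2}$ holds pointwise for each multiplier draw, so `lhs <= rhs` holds deterministically on the simulation grid.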

Finite-dimensional convergence of $\varpi_{n,1}^{w}(\tau)+\varpi_{n,2}^{w}(\tau)$ across $\tau$ can be established in the same manner using the Cramér–Wold device; the details are omitted. By the same calculation as above, the covariance kernel is shown to be

\begin{align*}
\lim_{n\rightarrow\infty}\frac{1}{n}\sum_{i=1}^{n}\mathcal{J}_{n,i}(\tau_{1})\mathcal{J}_{n,i}(\tau_{2})
&=\mathbb{E}\left[\pi(S_{i})\phi_{1}(\tau_{1},S_{i},Y_{i}(1),X_{i})\phi_{1}(\tau_{2},S_{i},Y_{i}(1),X_{i})\right]\\
&\quad+\mathbb{E}\left[(1-\pi(S_{i}))\phi_{0}(\tau_{1},S_{i},Y_{i}(1),X_{i})\phi_{0}(\tau_{2},S_{i},Y_{i}(1),X_{i})\right]\\
&\quad+\mathbb{E}\phi_{s}(\tau_{1},S_{i})\phi_{s}(\tau_{2},S_{i})=\Sigma(\tau_{1},\tau_{2}),
\end{align*}

which concludes the proof. ∎

Lemma N.6.

Suppose the Assumptions in Theorem 5.6 hold. Then,

\[
\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left\|\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(1\{Y_{i}\leq\hat{q}_{a}(\tau)\}-m_{a}(\tau,s,X_{i}))H_{h_{n}}(X_{i})\right\|_{\infty}=O_{p}\left(\sqrt{\frac{\log(n)}{n}}\right).
\]
Proof.

We focus on the case $a=1$; the argument for $a=0$ is the same. We have

\begin{align}
&\sup_{\tau\in\Upsilon}\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}(1\{Y_{i}\leq\hat{q}_{1}(\tau)\}-m_{1}(\tau,s,X_{i}))H_{h_{n}}(X_{i})\right\|_{\infty}\nonumber\\
&\leq\sup_{\tau\in\Upsilon}\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}(1\{Y_{i}\leq\hat{q}_{1}(\tau)\}-\mathbb{P}(Y_{i}(1)\leq\hat{q}_{1}(\tau)|X_{i},S_{i}=s))H_{h_{n}}(X_{i})\right\|_{\infty}\nonumber\\
&\quad+\sup_{\tau\in\Upsilon}\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}(\mathbb{P}(Y_{i}(1)\leq\hat{q}_{1}(\tau)|X_{i},S_{i}=s)-m_{1}(\tau,s,X_{i}))H_{h_{n}}(X_{i})\right\|_{\infty}\nonumber\\
&\leq\sup_{q\in\Re}\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}(1\{Y_{i}\leq q\}-\mathbb{P}(Y_{i}(1)\leq q|X_{i},S_{i}=s))H_{h_{n}}(X_{i})\right\|_{\infty}\nonumber\\
&\quad+\sup_{\tau\in\Upsilon}\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}(\mathbb{P}(Y_{i}(1)\leq\hat{q}_{1}(\tau)|X_{i},S_{i}=s)-m_{1}(\tau,s,X_{i}))H_{h_{n}}(X_{i})\right\|_{\infty}.\tag{N.4}
\end{align}

Define

\[
\mathcal{F}_{h}=\{1\{Y_{i}^{s}(1)\leq q\}H_{h_{n},h}(X_{i}):q\in\Re\},\qquad\mathcal{F}=\cup_{h\in[h_{n}]}\mathcal{F}_{h},
\]

where $H_{h_{n},h}(X_{i})$ denotes the $h$-th coordinate of $H_{h_{n}}(X_{i})$. For each $h\in[h_{n}]$, $\mathcal{F}_{h}$ is of the VC-type with fixed coefficients $(\alpha,v)$ and a common envelope $F_{i}=\|H_{h_{n}}(X_{i})\|_{2}\leq\zeta(h_{n})$, i.e.,

\[
\sup_{Q}N(\mathcal{F}_{h},e_{Q},\varepsilon\|F\|_{Q,2})\leq\left(\frac{\alpha}{\varepsilon}\right)^{v},\quad\forall\varepsilon\in(0,1],
\]

where the supremum is taken over all finitely discrete probability measures. This implies

\[
\sup_{Q}N(\mathcal{F},e_{Q},\varepsilon\|F\|_{Q,2})\leq\sum_{h\in[h_{n}]}\sup_{Q}N(\mathcal{F}_{h},e_{Q},\varepsilon\|F\|_{Q,2})\leq\left(\frac{\alpha h_{n}}{\varepsilon}\right)^{v},\quad\forall\varepsilon\in(0,1],
\]

i.e., \mathcal{F} is also of the VC-type with coefficients (αhn,v)(\alpha h_{n},v). In addition,

\[
\sup_{f\in\mathcal{F}}\mathbb{E}f^{2}\leq\max_{h\in[h_{n}]}\mathbb{E}H_{h_{n},h}^{2}(X_{i})\leq C<\infty.
\]

Then, Lemma N.2 implies

\begin{align*}
&\sup_{q\in\Re}\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}(1\{Y_{i}\leq q\}-\mathbb{P}(Y_{i}(1)\leq q|X_{i},S_{i}=s))H_{h_{n}}(X_{i})\right\|_{\infty}\\
&=O_{p}\left(\sqrt{\frac{\log(h_{n}\zeta(h_{n}))}{n}}+\frac{\zeta(h_{n})\log(\zeta(h_{n}))}{n}\right)=O_{p}\left(\sqrt{\frac{\log(n)}{n}}\right).
\end{align*}
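The $O_{p}(\sqrt{\log(n)/n})$ rate of such weighted empirical processes can be illustrated on a toy example. The sketch below is an assumption-laden stand-in: $Y$ is uniform so that $\mathbb{P}(Y\leq q)=q$, and a two-column basis replaces $H_{h_{n}}$; the supremum over $q\in\Re$ is evaluated at the order statistics, where the piecewise-linear process attains its extremes:

```python
import numpy as np

# Illustrative check of the sup_q rate for a weighted empirical process.
rng = np.random.default_rng(3)
n = 20_000
X = rng.uniform(size=n)
H = np.column_stack([np.ones(n), X])    # toy two-column basis in place of H_{h_n}
Y = rng.uniform(size=n)                 # uniform, so P(Y <= q) = q on [0, 1]

order = np.argsort(Y)
Ys = Y[order]
csum = np.cumsum(H[order], axis=0) / n  # (1/n) sum_{Y_i <= q} H(X_i) at q = Y_(k)
Hbar = H.mean(axis=0)                   # (1/n) sum_i H(X_i)

# sup over q of |(1/n) sum_i (1{Y_i <= q} - q) H(X_i)|, coordinate-wise
sup_stat = np.max(np.abs(csum - Ys[:, None] * Hbar))
rate = np.sqrt(np.log(n) / n)
```

For the intercept column this is the classical Kolmogorov–Smirnov statistic, whose magnitude is comfortably below a small multiple of $\sqrt{\log(n)/n}$ at this sample size.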

For the second term of (N.4), because $\sup_{q\in\Re,x\in\text{Supp}(X),s\in\mathcal{S}}f_{1}(q|x,s)<\infty$, we have

\begin{align*}
&\sup_{\tau\in\Upsilon}\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}(\mathbb{P}(Y_{i}(1)\leq\hat{q}_{1}(\tau)|X_{i},S_{i}=s)-m_{1}(\tau,s,X_{i}))H_{h_{n}}(X_{i})\right\|_{\infty}\\
&\leq C\sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}|H_{h_{n}}(X_{i})|\right\|_{\infty}\\
&\leq C\sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\left[|H_{h_{n}}(X^{s}_{i})|-\mathbb{E}(|H_{h_{n}}(X_{i})|\,|S_{i}=s)\right]\right\|_{\infty}\\
&\quad+C\sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|\left\|\mathbb{E}(|H_{h_{n}}(X_{i})|\,|S_{i}=s)\right\|_{\infty}\\
&=C\sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\left[|H_{h_{n}}(X^{s}_{i})|-\mathbb{E}(|H_{h_{n}}(X_{i})|\,|S_{i}=s)\right]\right\|_{\infty}+O_{p}(n^{-1/2})\\
&=O_{p}(n^{-1/2}),
\end{align*}

where the second-to-last equality holds by Assumption 8 and the bound $\left\|\mathbb{E}(|H_{h_{n}}(X_{i})|\,|S_{i}=s)\right\|_{\infty}\leq C<\infty$, and the last equality holds because, by an argument similar to the one used in bounding the first term on the RHS of (N.4), we can show that

\[
\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\left[|H_{h_{n}}(X^{s}_{i})|-\mathbb{E}(|H_{h_{n}}(X_{i})|\,|S_{i}=s)\right]\right\|_{\infty}=O_{p}\left(\sqrt{\frac{\log(n)}{n}}\right).
\]

This concludes the proof. ∎

Lemma N.7.

Suppose the Assumptions in Theorem 5.6 hold and recall $\overline{\ell}$ defined in (L.1). Then $\overline{\ell}\big/\sqrt{h_{n}\log(n)/n}\rightarrow\infty$ w.p.a.1.

Proof.

Note that w.p.a.1,

\begin{align*}
\overline{\ell} &=\inf_{U\in\Re^{h_{n}}}\frac{\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})U)^{2}\right]^{3/2}}{\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}|H_{h_{n}}^{\top}(X_{i})U|^{3}}\\
&\geq\inf_{U\in\Re^{h_{n}}}\frac{\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})U)^{2}\right]^{1/2}}{\sup_{x\in\mathcal{X}}\|H_{h_{n}}(x)\|_{2}\|U\|_{2}}\geq\frac{\kappa_{1}^{1/2}}{\zeta(h_{n})},
\end{align*}

where the last inequality is due to Assumption 12. Therefore,

\[
\overline{\ell}\Big/\sqrt{h_{n}\log(n)/n}\geq\sqrt{\frac{\kappa_{1}n}{\zeta^{2}(h_{n})h_{n}\log(n)}}\rightarrow\infty\quad\text{w.p.a.1.}
\]
∎
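The chain of inequalities bounding $\overline{\ell}$ from below is deterministic for every direction $U$, so it can be verified directly on a toy design (the basis matrix, dimensions, and sampled directions below are illustrative, with $\kappa_{1}$ taken as the minimum eigenvalue of the empirical Gram matrix and $\zeta$ as the largest basis norm over the sample):

```python
import numpy as np

# Check of the lower bound ell_bar >= sqrt(kappa_1)/zeta on a toy design.
rng = np.random.default_rng(4)
n, h = 500, 3
Hmat = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, h - 1))])
G = Hmat.T @ Hmat / n
kappa1 = np.linalg.eigvalsh(G)[0]            # minimum eigenvalue of the Gram matrix
zeta = np.max(np.linalg.norm(Hmat, axis=1))  # largest basis norm over the sample

# ell_bar is an infimum over U; since the bound holds for every U,
# it also holds for the minimum over any set of sampled directions.
ratios = []
for _ in range(200):
    U = rng.normal(size=h)
    v = Hmat @ U
    ratios.append(np.mean(v**2) ** 1.5 / np.mean(np.abs(v) ** 3))
min_ratio = min(ratios)
bound = np.sqrt(kappa1) / zeta
```

For each sampled $U$, `ratio >= bound` follows from $\frac{1}{n}\sum_i|v_i|^{3}\leq\max_i|v_i|\cdot\frac{1}{n}\sum_i v_i^{2}$ and $\frac{1}{n}\sum_i v_i^{2}\geq\kappa_{1}\|U\|_{2}^{2}$, exactly as in the display above.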
