
Regression-Adjusted Estimation of Quantile Treatment Effects under Covariate-Adaptive Randomizations

Liang Jiang (Fanhai International School of Finance, Fudan University, 220 Handan Rd, Shanghai, China 200437); Peter C. B. Phillips (School of Economics, Singapore Management University, 90 Stamford Rd, Singapore 178903; Yale University, New Haven, CT 06520-8281, United States of America; University of Auckland, 12 Grafton Rd, Auckland Central, Auckland 1010, New Zealand; University of Southampton, University Rd, Southampton SO17 1BJ, United Kingdom); Yubo Tao (corresponding author, email: [email protected]; Department of Economics, University of Macau, Avenida da Universidade, Taipa, Macao SAR, China); Yichong Zhang (School of Economics, Singapore Management University, 90 Stamford Rd, Singapore 178903)
(March 10, 2025)
Abstract

Datasets from field experiments with covariate-adaptive randomizations (CARs) usually contain extra covariates in addition to the strata indicators. We propose to incorporate these additional covariates via auxiliary regressions in the estimation and inference of unconditional quantile treatment effects (QTEs) under CARs. We establish the consistency and limit distribution of the regression-adjusted QTE estimator and prove that the use of multiplier bootstrap inference is non-conservative under CARs. The auxiliary regression may be estimated parametrically, nonparametrically, or via regularization when the data are high-dimensional. Even when the auxiliary regression is misspecified, the proposed bootstrap inferential procedure still achieves the nominal rejection probability in the limit under the null. When the auxiliary regression is correctly specified, the regression-adjusted estimator achieves the minimum asymptotic variance. We also discuss forms of adjustments that can improve the efficiency of the QTE estimators. The finite sample performance of the new estimation and inferential methods is studied in simulations, and an empirical application to a well-known dataset on the effect of expanding access to basic bank accounts on savings is reported.


Keywords: Covariate-adaptive randomization, High-dimensional data, Regression adjustment, Quantile treatment effects.

JEL codes: C14, C21, D14, G21

1 Introduction

Covariate-adaptive randomizations (CARs) have recently seen growing use in a wide variety of randomized experiments in economic research. Examples include Chong et al. (2016), Greaney et al. (2016), Jakiela and Ozier (2016), Burchardi et al. (2019), and Anderson and McKenzie (2021), among many others. Under CARs, units are first stratified using some baseline covariates, and then, within each stratum, treatment status is assigned (independently of covariates) to achieve balance between the numbers of treated and control units.

In many empirical studies, apart from the average treatment effect (ATE), researchers are often interested in using randomized experiments to estimate quantile treatment effects (QTEs). The QTE has a useful role as a robustness check for the ATE and characterizes any heterogeneity that may be present in the sign and magnitude of the treatment effects according to their position within the distribution of outcomes. See, for example, Bitler et al. (2006), Muralidharan and Sundararaman (2011), Duflo et al. (2013), Banerjee et al. (2015), Crépon et al. (2015), and Campos et al. (2017).

Two practical issues arise in estimation and inference concerning QTEs under CARs. First, covariates in addition to the strata indicators are collected during the experiment. It is possible to incorporate these covariates in the estimation of treatment effects to reduce variance and improve efficiency. In the estimation of the ATE, the usual practice is to run a simple ordinary least squares (OLS) regression of the outcome on treatment status, strata indicators, and additional covariates, as in the analysis of covariance (ANCOVA). Freedman (2008a, b) pointed out that such an OLS regression adjustment can degrade the precision of the ATE estimator. Lin (2013) reexamined Freedman's critique and showed that, in order to improve efficiency, the linear regression adjustment should include a full set of interactions between the treatment status and covariates. However, because the quantile function is a nonlinear operator, even when the assignment of treatment status is completely random, a similar linear quantile regression with a full set of interaction terms is unable to provide a consistent estimate of the unconditional QTE, let alone improve estimation efficiency. Second, in order to balance the respective numbers of treated and control units within each stratum, treatment statuses under CARs usually exhibit (negative) cross-sectional dependence. Standard inference procedures that rely on cross-sectional independence are therefore conservative and lack power. (For example, Bugni et al. (2018) and Zhang and Zheng (2020) have shown that the usual two-sample $t$-test for inference concerning the ATE and multiplier bootstrap inference concerning the QTE are in general conservative under CARs.) These two issues raise the questions of how to use the additional covariates to consistently and more efficiently estimate the QTE in CAR settings and how to conduct valid statistical procedures that mitigate conservatism in inference.

The present paper addresses these issues by proposing a regression-adjusted estimator of the QTE, deriving its limit theory, and establishing the validity of multiplier bootstrap inference under CARs. Even under potential misspecification of the auxiliary regressions, the proposed QTE estimator is shown to maintain its consistency, and the multiplier bootstrap procedure is shown to have an asymptotic size equal to the nominal level under the null. When the auxiliary regression is correctly specified, the QTE estimator achieves minimum asymptotic variance.

We further investigate the efficiency gains that materialize from regression adjustments in three scenarios: (1) parametric regressions, (2) nonparametric regressions, and (3) regressions with regularization in high-dimensional settings. Specifically, for parametric regressions with a potentially misspecified linear probability model, we propose to compute the optimal linear coefficient by minimizing the variance of the QTE estimator. Such an adjustment is optimal within the class of linear adjustments but does not necessarily achieve the global minimum asymptotic variance. However, because no adjustment is a special case of a linear adjustment with all coefficients equal to zero, our optimal linear adjustment is guaranteed to be weakly more efficient than the QTE estimator with no adjustments, which addresses Freedman's critique. We also consider a potentially misspecified logistic regression with fixed-dimensional regressors and strata- and quantile-specific regression coefficients, which is estimated by quasi maximum likelihood (QMLE). Although the QMLE does not necessarily minimize the asymptotic variance of the QTE estimator, such a flexible logistic model can closely approximate the true specification; in practice, the corresponding regression-adjusted QTE estimator therefore usually has a smaller variance than the unadjusted one. Last, we propose to treat the logistic QMLE adjustments as new linear regressors and re-construct the corresponding optimal linear adjustments. We then show that the QTE estimator with the new adjustments is weakly more efficient than that with either the original logistic QMLE adjustments or no adjustments.

In nonparametric regressions, we further justify the QMLE by letting the regressors in the logistic regression be a set of sieve basis functions with increasing dimension and show how such a nonparametric regression-adjusted QTE estimator can achieve the global minimum asymptotic variance. For high-dimensional regressions with regularization, we consider logistic regression under $\ell_1$ penalization, an approach that also achieves the global minimum asymptotic variance. All the limit theories hold uniformly over a compact set of quantile indices, implying that our multiplier bootstrap procedure can be used to conduct inference on QTEs involving single, multiple, or a continuum of quantile indices.

These results, including the limit distribution of the regression-adjusted QTE estimator and the validity of the multiplier bootstrap, provide novel contributions to the literature in three respects. First, the data generated under CARs are different from observational data as the observed outcomes and treatment statuses are cross-sectionally dependent due to the randomization schemes. Recently Bugni et al. (2018) established a rigorous asymptotic framework to study the ATE estimator under CARs and pointed out the conservatism of the two-sample t-test except for some special cases. (See Bugni et al. (2018, Remark 4.2) for more detail.) Our analysis follows this new framework, which departs from the literature of causal inference under an i.i.d. treatment structure.

Second, we contribute to the literature on causal inference under CARs by developing a new methodology that includes additional covariates in the estimation of the unconditional QTE and by establishing a general theory for regression adjustments that allows for parametric, nonparametric, and regularized estimation of the auxiliary regressions. As mentioned earlier, unlike ATE estimation, the naive linear quantile regression with additional covariates cannot even produce a consistent estimator of the QTE. Instead, we propose a new way to incorporate additional covariates based on the Neyman orthogonal moment and investigate the asymptotic properties and the efficiency gains of the proposed regression-adjusted estimator under CARs. This new machinery allows us to study the QTE regression, which is nonparametrically specified, with both linear (linear probability model) and nonlinear (logit and probit models) regression adjustments. To clarify this contribution to the literature, we note that Hu and Hu (2012); Ma et al. (2015, 2020); Olivares (2021); Shao and Yu (2013); Zhang and Zheng (2020); Ye (2018); Ye and Shao (2020) considered inference on various causal parameters under CARs but without taking into account additional covariates. Bugni et al. (2018), Bugni et al. (2019), and Bugni and Gao (2021) considered saturated regressions for the ATE and local ATE, which can be viewed as regression adjustments where strata indicators are interacted with the treatment or instrument. Shao et al. (2010) showed that if a test statistic is constructed based on the correctly specified model between the outcome and additional covariates, and the covariates used for the CAR are functions of the additional covariates, then the test statistic is valid conditional on the additional covariates. Bloniarz et al. (2016); Fogarty (2018); Lin (2013); Lu (2016); Lei and Ding (2021); Li and Ding (2020); Liu et al. (2020); Liu and Yang (2020); Negi and Wooldridge (2020); Ye et al. (2022); Zhao and Ding (2021) studied various estimation methods based on regression adjustments, but these studies all focused on ATE estimation. Specifically, Liu et al. (2020) considered linear adjustments for the ATE under CARs in which the covariates can be high-dimensional and the adjustments can be estimated by Lasso. Ansel et al. (2018) considered regression adjustment using additional covariates for the ATE and local ATE. We differ from these studies by considering the QTE with nonlinear adjustments such as the logistic Lasso.

Third, we establish the validity of the multiplier bootstrap inference for the regression-adjusted QTE estimator under CARs. To the best of our knowledge, Shao et al. (2010) and Zhang and Zheng (2020) are the only works in the literature that studied bootstrap inference under CARs. Shao et al. (2010) considered the covariate-adaptive bootstrap for the linear regression model. Zhang and Zheng (2020) proposed to bootstrap the inverse propensity score weighted (IPW) QTE estimator with the estimated target fraction of treatment even when the truth is known. They showed that the asymptotic variance of the IPW estimator is the same under various CARs. Thus, even though the bootstrap sample ignores the cross-sectional dependence and behaves as if the randomization scheme were simple random sampling, the asymptotic variance of the bootstrap analogue is still the same. We complement this research by studying the validity of multiplier bootstrap inference for our regression-adjusted QTE estimator. We establish analytically that the multiplier bootstrap with the estimated fraction of treatment is not conservative in the sense that it achieves an asymptotic size equal to the nominal level under the null even when the auxiliary regressions are misspecified.

The present paper also comes under the umbrella of a growing literature that has addressed estimation and inference in randomized experiments. In this connection, we mention the studies of Hahn et al. (2011); Athey and Imbens (2017); Abadie et al. (2018); Tabord-Meehan (2021); Bai et al. (2021); Bai (2020); Jiang et al. (2021), among many others. Bai (2020) showed that an 'optimal' matched-pair design can minimize the mean-squared error of the difference-in-means estimator for the ATE, conditional on covariates. Tabord-Meehan (2021) designed an adaptive randomization procedure which can minimize the variance of the weighted estimator for the ATE. Both works rely on a pilot experiment to design the optimal randomization. In contrast, we take the randomization scheme (i.e., CARs) as given and search for new estimators (other than difference-in-quantile and weighted estimators) for the QTE that have smaller variance. In addition, our approach does not require a pilot experiment. Therefore, depending on the definition of 'optimality' and the data available, our methods and theirs apply to different scenarios and thus complement each other.

From a practical perspective, our estimation and inferential methods have four advantages. First, they allow for common choices of auxiliary regressions such as linear probability, logit, and probit regressions, even though these regressions may be misspecified. Second, the methods can be implemented without tuning parameters. Third, our (bootstrap) estimator can be computed directly via the subgradient condition, and the auxiliary regressions need not be re-estimated in the bootstrap procedure, both of which save considerable computation time. Last, our estimation and inference methods can be implemented without knowledge of the exact treatment assignment rule used in the experiment. This advantage is especially useful in subsample analysis, where sub-groups are defined using variables other than those used to form the strata, so that the treatment assignment rule for each sub-group is unknown. See, for example, the anemic subsample analysis in Chong et al. (2016) and Zhang and Zheng (2020). These last three points carry over from Zhang and Zheng (2020) and are logically independent of the regression adjustments. One of our contributions is to show that these results still hold for our regression-adjusted estimator.

The remainder of the paper is organized as follows. Section 2 describes the model setup and notation. Section 3 develops the asymptotic properties of our regression-adjusted QTE estimator. Section 4 studies the validity of the multiplier bootstrap inference. Section 5 considers parametric, nonparametric, and regularized estimation of the auxiliary regressions. Section 6 reports simulation results, and an empirical application of our methods to the impact of expanding access to basic bank accounts on savings is provided in Section 7. Section 8 concludes. Proofs of all results and some additional simulation results are given in the Online Supplement.

2 Setup and Notation

Potential outcomes for treated and control groups are denoted by $Y(1)$ and $Y(0)$, respectively. Treatment status is denoted by $A$, with $A=1$ indicating treated and $A=0$ untreated. The stratum indicator is denoted by $S$, based on which the researcher implements the covariate-adaptive randomization. The support of $S$ is denoted by $\mathcal{S}$, a finite set. After randomization, the researcher observes the data $\{Y_i,S_i,A_i,X_i\}_{i\in[n]}$, where $[n]=\{1,2,\ldots,n\}$, $Y_i=Y_i(1)A_i+Y_i(0)(1-A_i)$ is the observed outcome, and $X_i$ contains extra covariates besides $S_i$ in the dataset. The support of $X$ is denoted $\text{Supp}(X)$. In this paper, we allow $X_i$ and $S_i$ to be dependent. For $i\in[n]$, let $p(s)=\mathbb{P}(S_i=s)$, $n(s)=\sum_{i\in[n]}1\{S_i=s\}$, $n_1(s)=\sum_{i\in[n]}A_i1\{S_i=s\}$, and $n_0(s)=n(s)-n_1(s)$. We make the following assumptions on the data generating process (DGP) and the treatment assignment rule.
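To fix ideas, the stratum-level quantities just defined can be computed directly from the data. The following Python sketch uses simulated placeholder data and variable names of our own choosing to illustrate $n(s)$, $n_1(s)$, $n_0(s)$, and the within-stratum treated fraction $n_1(s)/n(s)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
S = rng.integers(0, 3, size=n)   # stratum indicator S_i with support {0, 1, 2}
A = rng.integers(0, 2, size=n)   # treatment status A_i (placeholder assignment)

# Stratum-level counts from the text: n(s), n_1(s), n_0(s)
strata = np.unique(S)
n_s = {s: int(np.sum(S == s)) for s in strata}        # n(s)
n1_s = {s: int(np.sum(A[S == s])) for s in strata}    # n_1(s)
n0_s = {s: n_s[s] - n1_s[s] for s in strata}          # n_0(s) = n(s) - n_1(s)
pi_hat = {s: n1_s[s] / n_s[s] for s in strata}        # fraction of treated in stratum s
```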

Assumption 1.
  1. (i)

    $\{Y_i(1),Y_i(0),S_i,X_i\}_{i\in[n]}$ is i.i.d.

  2. (ii)

    $\{Y_i(1),Y_i(0),X_i\}_{i\in[n]}\perp\!\!\!\perp\{A_i\}_{i\in[n]}\mid\{S_i\}_{i\in[n]}$.

  3. (iii)

    Suppose $p(s)$ is fixed with respect to (w.r.t.) $n$ and is positive for every $s\in\mathcal{S}$.

  4. (iv)

    Let $\pi(s)$ denote the target fraction of treatment for stratum $s$. Then, $c<\min_{s\in\mathcal{S}}\pi(s)\leq\max_{s\in\mathcal{S}}\pi(s)<1-c$ for some constant $c\in(0,0.5)$ and $\frac{D_n(s)}{n(s)}=o_p(1)$ for $s\in\mathcal{S}$, where $D_n(s)=\sum_{i\in[n]}(A_i-\pi(s))1\{S_i=s\}$.

Several remarks are in order. First, Assumption 1(i) imposes i.i.d.-ness only on $\{Y_i(1),Y_i(0),S_i,X_i\}_{i\in[n]}$, thereby allowing for cross-sectional dependence among the treatment statuses $\{A_i\}_{i\in[n]}$ and accommodating many covariate-adaptive randomization schemes as discussed below. Second, although treatment statuses are cross-sectionally dependent, they are independent of the potential outcomes and additional covariates conditional on the stratum indicator $S$. Therefore, the data are still experimental rather than observational. Third, Assumption 1(iii) requires the size of each stratum to be proportional to the sample size. Fourth, we can view $\pi(s)$ as the target fraction of treated units in stratum $s$. Similar to Bugni et al. (2019), we allow the target fractions to differ across strata. Just as for the overlapping support condition in an observational study, the target fractions are assumed to be bounded away from zero and one. In randomized experiments, this condition usually holds because investigators can determine $\pi(s)$ in the design stage; in fact, in most CARs, $\pi(s)=0.5$ for $s\in\mathcal{S}$. Fifth, $D_n(s)$ represents the degree of imbalance between the realized and target fractions of treated units in the $s$th stratum. Bugni et al. (2018) show that Assumption 1(iv) holds under several covariate-adaptive treatment assignment rules such as simple random sampling (SRS), biased-coin design (BCD), adaptive biased-coin design (WEI), and stratified block randomization (SBR). For completeness, we briefly repeat their descriptions below. Note we only require $D_n(s)/n(s)=o_p(1)$, which is weaker than the assumption imposed by Bugni et al. (2018) but the same as that imposed by Bugni et al. (2019) and Zhang and Zheng (2020).

Example 1 (SRS).

Let $\{A_i\}_{i\in[n]}$ be drawn independently across $i$, conditionally on $\{S_i\}_{i\in[n]}$, as Bernoulli random variables with success rate $\pi(S_i)$, i.e., for $k=1,\cdots,n$,

$\mathbb{P}\left(A_k=1\mid\{S_i\}_{i\in[n]},\{A_j\}_{j\in[k-1]}\right)=\pi(S_k).$
Example 2 (WEI).

This design was first proposed by Wei (1978). Let $n_{k-1}(S_k)=\sum_{i\in[k-1]}1\{S_i=S_k\}$, $D_{k-1}(s)=\sum_{i\in[k-1]}\left(A_i-\frac{1}{2}\right)1\{S_i=s\}$, and

$\mathbb{P}\left(A_k=1\mid\{S_i\}_{i\in[k]},\{A_i\}_{i\in[k-1]}\right)=\phi\left(\frac{2D_{k-1}(S_k)}{n_{k-1}(S_k)}\right),$

where $\phi(\cdot):[-1,1]\mapsto[0,1]$ is a pre-specified non-increasing function satisfying $\phi(-x)=1-\phi(x)$, and $\frac{D_0(S_1)}{0}$ is understood to be zero.

Example 3 (BCD).

The treatment status is determined sequentially for $1\leq k\leq n$ as

$\mathbb{P}\left(A_k=1\mid\{S_i\}_{i\in[k]},\{A_i\}_{i\in[k-1]}\right)=\begin{cases}\frac{1}{2}&\text{if }D_{k-1}(S_k)=0\\ \lambda&\text{if }D_{k-1}(S_k)<0\\ 1-\lambda&\text{if }D_{k-1}(S_k)>0,\end{cases}$

where $D_{k-1}(s)$ is defined as above and $\frac{1}{2}<\lambda\leq 1$.

Example 4 (SBR).

For each stratum, $\lfloor\pi(s)n(s)\rfloor$ units are assigned to treatment and the rest are assigned to control.
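For concreteness, the four assignment rules in Examples 1-4 can be simulated as follows. This is a minimal sketch of our own, not the paper's code: the WEI and BCD branches are written for the common case $\pi(s)=1/2$ (matching the $A_i-\frac{1}{2}$ centering above), and the WEI rule uses $\phi(x)=(1-x)/2$, one admissible choice of the allocation function.

```python
import numpy as np

def assign_car(S, pi, scheme="SBR", lam=0.75, rng=None):
    """Sketch of the covariate-adaptive assignment rules in Examples 1-4.
    S: array of stratum labels; pi: dict mapping stratum -> target fraction.
    WEI and BCD below assume pi(s) = 1/2 for all strata."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n = len(S)
    A = np.zeros(n, dtype=int)
    if scheme == "SRS":                  # Example 1: independent Bernoulli(pi(S_k))
        for k in range(n):
            A[k] = rng.random() < pi[S[k]]
    elif scheme in ("WEI", "BCD"):       # sequential rules driven by D_{k-1}(s)
        D = {s: 0.0 for s in set(S)}     # running imbalance per stratum
        nS = {s: 0 for s in set(S)}      # running stratum size
        for k in range(n):
            s = S[k]
            if scheme == "WEI":
                x = 2 * D[s] / nS[s] if nS[s] > 0 else 0.0
                p = (1 - x) / 2          # phi(x) = (1 - x)/2: non-increasing, phi(-x) = 1 - phi(x)
            else:                        # BCD: lean against the current imbalance
                p = 0.5 if D[s] == 0 else (lam if D[s] < 0 else 1 - lam)
            A[k] = rng.random() < p
            D[s] += A[k] - 0.5
            nS[s] += 1
    elif scheme == "SBR":                # Example 4: floor(pi(s) n(s)) treated per stratum
        for s in set(S):
            idx = np.flatnonzero(S == s)
            m = int(np.floor(pi[s] * len(idx)))
            A[rng.permutation(idx)[:m]] = 1
    return A
```

Under SBR the within-stratum imbalance $D_n(s)$ is at most one by construction, while SRS, WEI, and BCD satisfy $D_n(s)/n(s)=o_p(1)$ only in the limit.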

Denote the $\tau$th quantile of $Y(a)$ by $q_a(\tau)$ for $a=0,1$. We are interested in estimating and inferring the $\tau$th quantile treatment effect defined as $q(\tau)=q_1(\tau)-q_0(\tau)$. The testing problems of interest involve single, multiple, or even a continuum of quantile indices, as in the following null hypotheses:

$\mathcal{H}_0:q(\tau)=\underline{q}\quad\text{versus}\quad q(\tau)\neq\underline{q},$
$\mathcal{H}_0:q(\tau_1)-q(\tau_2)=\underline{q}\quad\text{versus}\quad q(\tau_1)-q(\tau_2)\neq\underline{q},\ \text{and}$
$\mathcal{H}_0:q(\tau)=\underline{q}(\tau)\ \forall\tau\in\Upsilon\quad\text{versus}\quad q(\tau)\neq\underline{q}(\tau)\ \text{for some}\ \tau\in\Upsilon,$

for some pre-specified value $\underline{q}$ or function $\underline{q}(\tau)$, where $\Upsilon$ is a compact subset of $(0,1)$. We can also test a constant QTE by letting $\underline{q}(\tau)$ in the last hypothesis be a constant $\underline{q}$.

3 Estimation

Define $m_a(\tau,s,x)=\tau-\mathbb{P}(Y_i(a)\leq q_a(\tau)\mid S_i=s,X_i=x)$ for $a=0,1$, which are the true specifications but unknown to researchers. Instead, researchers specify working models $\{\overline{m}_a(\tau,s,x)\}_{a=0,1}$ for the true specification, which can be misspecified. (We view $\overline{m}_a(\cdot)$ as some function with inputs $\tau,s,x$. For example, researchers can specify a linear probability model with $\overline{m}_a(\tau,s,x)=\tau-x^\top\beta_{a,s}$, where $\beta_{a,s}$ is the linear coefficient that varies across treatment status $a$ and stratum $s$.) Last, the researchers estimate the (potentially misspecified) working models via some form of regression, and the estimators are denoted as $\{\widehat{m}_a(\tau,s,x)\}_{a=0,1}$. We also refer to $\overline{m}_a(\cdot)$ as the auxiliary regression.

Our regression-adjusted estimator of $q_1(\tau)$, denoted as $\hat{q}_1^{adj}(\tau)$, can be defined as

$\hat{q}_1^{adj}(\tau)=\operatorname*{arg\,min}_q\sum_{i\in[n]}\left[\frac{A_i}{\hat{\pi}(S_i)}\rho_\tau(Y_i-q)+\frac{(A_i-\hat{\pi}(S_i))}{\hat{\pi}(S_i)}\widehat{m}_1(\tau,S_i,X_i)q\right],$ (3.1)

where $\rho_\tau(u)=u(\tau-1\{u\leq 0\})$ is the usual check function and $\hat{\pi}(s)=n_1(s)/n(s)$. We emphasize that $\widehat{m}_1(\cdot)$ may not consistently estimate the true specification $m_1(\cdot)$. Similarly, we can define

$\hat{q}_0^{adj}(\tau)=\operatorname*{arg\,min}_q\sum_{i\in[n]}\left[\frac{1-A_i}{1-\hat{\pi}(S_i)}\rho_\tau(Y_i-q)-\frac{(A_i-\hat{\pi}(S_i))}{1-\hat{\pi}(S_i)}\widehat{m}_0(\tau,S_i,X_i)q\right].$ (3.2)

Then, our regression-adjusted QTE estimator is

$\hat{q}^{adj}(\tau)=\hat{q}_1^{adj}(\tau)-\hat{q}_0^{adj}(\tau).$ (3.3)

Several remarks are in order. First, in observational studies with i.i.d. data and $A_i\perp\!\!\!\perp X_i\mid S_i$, Firpo (2007), Belloni et al. (2017), and Kallus et al. (2020) showed that the doubly robust moment for $q_1(\tau)$ is

$\mathbb{E}\left[\frac{A_i(\tau-1\{Y_i(1)\leq q\})}{\overline{\pi}(S_i)}-\frac{A_i-\overline{\pi}(S_i)}{\overline{\pi}(S_i)}\overline{m}_1(\tau,S_i,X_i)\right]=0,$ (3.4)

where $\overline{\pi}(s)$ and $\overline{m}_1(\tau,s,x)$ are the working models for the target fraction $\pi(s)$ and the conditional probability $m_1(\tau,s,x)$, respectively. Our estimator is motivated by this doubly robust moment, but our analysis differs from that for observational data because CARs introduce cross-sectional dependence among observations. Second, as our target fraction estimator $\hat{\pi}(s)=n_1(s)/n(s)$ is consistent, $\overline{\pi}(s)$ is correctly specified as $\pi(s)$. Then, due to the double robustness, our regression-adjusted estimator is consistent even when $\overline{m}_a(\cdot)$ is misspecified and $\widehat{m}_a(\cdot)$ is an inconsistent estimator of $m_a(\cdot)$. Third, we use the estimated target fraction $\hat{\pi}(s)$ even when $\pi(s)$ is known because this guarantees that the bootstrap inference is not conservative. Further discussion is provided after Theorem 5.
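To make the estimator concrete, note that the objective in (3.1) is convex and piecewise linear in $q$, so its minimum is attained at one of the observed outcomes. The following Python sketch (our own illustration with hypothetical inputs; `m1_hat` holds pre-computed working-model values $\widehat{m}_1(\tau,S_i,X_i)$) evaluates the objective at each candidate kink and returns the argmin. With `m1_hat` identically zero, it reduces to the unadjusted inverse-propensity-weighted quantile of the treated outcomes.

```python
import numpy as np

def q1_adj(Y, A, S, m1_hat, tau):
    """Regression-adjusted estimator of q_1(tau) from (3.1), computed by
    evaluating the piecewise-linear convex objective at its kinks (the Y_i)."""
    Y, A, S, m1_hat = map(np.asarray, (Y, A, S, m1_hat))
    # pi_hat(S_i) = n_1(s)/n(s) evaluated at each observation's stratum
    pi_by_s = {s: A[S == s].mean() for s in np.unique(S)}
    pi_hat = np.array([pi_by_s[s] for s in S])

    def objective(q):
        rho = (Y - q) * (tau - (Y - q <= 0))   # check function rho_tau(Y_i - q)
        return np.sum(A / pi_hat * rho + (A - pi_hat) / pi_hat * m1_hat * q)

    candidates = np.unique(Y)
    return candidates[np.argmin([objective(q) for q in candidates])]
```

The estimator $\hat{q}_0^{adj}(\tau)$ in (3.2) follows by the symmetric substitution $A_i\mapsto 1-A_i$, $\hat{\pi}\mapsto 1-\hat{\pi}$, with the sign of the adjustment term flipped.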

Assumption 2.

For $a=0,1$, denote $f_a(\cdot)$, $f_a(\cdot\mid s)$, and $f_a(\cdot\mid x,s)$ as the PDFs of $Y_i(a)$, $Y_i(a)\mid S_i=s$, and $Y_i(a)\mid S_i=s,X_i=x$, respectively.

  1. (i)

    $f_a(q_a(\tau))$ and $f_a(q_a(\tau)\mid s)$ are bounded and bounded away from zero uniformly over $\tau\in\Upsilon$ and $s\in\mathcal{S}$, where $\Upsilon$ is a compact subset of $(0,1)$.

  2. (ii)

    $f_a(\cdot)$ and $f_a(\cdot\mid s)$ are Lipschitz over $\{q_a(\tau):\tau\in\Upsilon\}$.

  3. (iii)

    $\sup_{y\in\Re,x\in\text{Supp}(X),s\in\mathcal{S}}f_a(y\mid x,s)<\infty$.

Assumption 3.
  1. (i)

    For $a=0,1$, there exists a function $\overline{m}_a(\tau,s,x)$ such that, for $\overline{\Delta}_a(\tau,s,X_i)=\widehat{m}_a(\tau,s,X_i)-\overline{m}_a(\tau,s,X_i)$, we have

    $\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_1(s)}\overline{\Delta}_a(\tau,s,X_i)}{n_1(s)}-\frac{\sum_{i\in I_0(s)}\overline{\Delta}_a(\tau,s,X_i)}{n_0(s)}\right|=o_p(n^{-1/2}),$

    where $I_a(s)=\{i\in[n]:A_i=a,S_i=s\}$.

  2. (ii)

    For $a=0,1$, let $\mathcal{F}_a=\{\overline{m}_a(\tau,s,x):\tau\in\Upsilon\}$ with an envelope $F_a(s,x)$. Then, $\max_{s\in\mathcal{S}}\mathbb{E}(|F_a(S_i,X_i)|^q\mid S_i=s)<\infty$ for $q>2$ and there exist fixed constants $(\alpha,v)>0$ such that

    $\sup_Q N(\mathcal{F}_a,e_Q,\varepsilon\|F_a\|_{Q,2})\leq\left(\frac{\alpha}{\varepsilon}\right)^v,\quad\forall\varepsilon\in(0,1],$

    where $N(\cdot)$ denotes the covering number, $e_Q(f,g)=\|f-g\|_{Q,2}$, and the supremum is taken over all finitely discrete probability measures $Q$.

  3. (iii)

    For $a=0,1$ and any $\tau_1,\tau_2\in\Upsilon$, there exists a constant $C>0$ such that

    $\mathbb{E}\left((\overline{m}_a(\tau_2,S_i,X_i)-\overline{m}_a(\tau_1,S_i,X_i))^2\mid S_i=s\right)\leq C|\tau_2-\tau_1|.$

Several remarks are in order. First, Assumption 2 is standard in the quantile regression literature. We do not need $f_a(y\mid x,s)$ to be bounded away from zero because we are interested in the unconditional quantile $q_a(\tau)$, which is uniquely defined as long as the unconditional density $f_a(q_a(\tau))$ is positive. Second, Assumption 3(i) is high-level. If we consider a linear probability model such that $\overline{m}_a(\tau,s,X_i)=\tau-X_i^\top\theta_{a,s}(\tau)$ and $\widehat{m}_a(\tau,s,X_i)=\tau-X_i^\top\hat{\theta}_{a,s}(\tau)$, then Assumption 3(i) is equivalent to

$\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left|\left(\frac{\sum_{i\in I_1(s)}X_i}{n_1(s)}-\frac{\sum_{i\in I_0(s)}X_i}{n_0(s)}\right)^\top\left(\hat{\theta}_{a,s}(\tau)-\theta_{a,s}(\tau)\right)\right|=o_p(n^{-1/2}),$

which is similar to Liu et al. (2020, Assumption 3) and holds intuitively if $\hat{\theta}_{a,s}(\tau)$ is a consistent estimator of the pseudo-true value $\theta_{a,s}(\tau)$. Third, Assumptions 3(ii) and 3(iii) impose mild regularity conditions on $\overline{m}_a(\cdot)$. Assumption 3(ii) holds automatically if $\Upsilon$ is a finite set. In general, both Assumptions 3(ii) and 3(iii) hold if

$\sup_{a=0,1,s\in\mathcal{S},x\in\text{Supp}(X)}|\overline{m}_a(\tau_2,s,x)-\overline{m}_a(\tau_1,s,x)|\leq L|\tau_2-\tau_1|$

for some constant $L>0$. Such Lipschitz continuity holds for the true specification ($\overline{m}_a(\cdot)=m_a(\cdot)$) under Assumption 2. Fourth, we provide primitive sufficient conditions for Assumption 3 in Section 5.
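The covariate-balance term in the display above is easy to inspect in data. This sketch (our own illustration, not the paper's code) computes, for each stratum, the difference between treated and control covariate means that multiplies $\hat{\theta}_{a,s}(\tau)-\theta_{a,s}(\tau)$:

```python
import numpy as np

def covariate_imbalance(X, A, S):
    """Per-stratum difference of covariate means between treated and control:
    sum_{i in I_1(s)} X_i / n_1(s) - sum_{i in I_0(s)} X_i / n_0(s)."""
    X, A, S = map(np.asarray, (X, A, S))
    out = {}
    for s in np.unique(S):
        treated = X[(S == s) & (A == 1)]
        control = X[(S == s) & (A == 0)]
        out[s] = treated.mean(axis=0) - control.mean(axis=0)
    return out
```

Under CARs this imbalance is of order $O_p(n^{-1/2})$ within each stratum, which, combined with the consistency of $\hat{\theta}_{a,s}(\tau)$, delivers the $o_p(n^{-1/2})$ product rate required by Assumption 3(i).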

Theorem 3.1.

Suppose Assumptions 1-3 hold. Then, uniformly over $\tau\in\Upsilon$,

$\sqrt{n}(\hat{q}^{adj}(\tau)-q(\tau))\rightsquigarrow\mathcal{B}(\tau),$

where $\mathcal{B}(\tau)$ is a tight Gaussian process with covariance kernel $\Sigma(\tau,\tau^\prime)$ defined in Section E of the Online Supplement. In addition, for any finite set of quantile indices $(\tau_1,\cdots,\tau_K)$, the asymptotic covariance matrix of $(\hat{q}^{adj}(\tau_1),\cdots,\hat{q}^{adj}(\tau_K))$ is denoted as $[\Sigma(\tau_k,\tau_l)]_{k,l\in[K]}$, where we use $[U_{kl}]_{k,l\in[K]}$ to denote a $K\times K$ matrix whose $(k,l)$th entry is $U_{kl}$. Then, $[\Sigma(\tau_k,\tau_l)]_{k,l\in[K]}$ is minimized in the matrix sense (for two symmetric matrices $A$ and $B$, we say $A$ is greater than or equal to $B$ if $A-B$ is positive semidefinite) when the auxiliary regressions are correctly specified at $(\tau_1,\cdots,\tau_K)$, i.e., $\overline{m}_a(\tau_k,s,x)=m_a(\tau_k,s,x)$ for $a=0,1$, $k\in[K]$, and all $(s,x)$ in the joint support of $(S_i,X_i)$.

Three remarks are in order. First, the expression for the asymptotic variance of $\hat{q}^{adj}(\tau)$ can be found in the proof of Theorem 3.1. It is the same whether or not the randomization scheme achieves strong balance (we refer readers to Bugni et al. (2018) for the definition of strong balance). This robustness is due to the use of the estimated target fraction $\hat{\pi}(s)$. The same phenomenon was discovered in a simplified setting by Zhang and Zheng (2020). Second, although our estimator remains consistent and asymptotically normal when the auxiliary regression is misspecified, it is meaningful to pursue the correct specification, which achieves the minimum variance. As the estimator with no adjustments can be viewed as a special case of our estimator with $\overline{m}_a(\cdot)=0$, Theorem 3.1 implies that the adjusted estimator with the correctly specified auxiliary regression is more efficient than that with no adjustments. If the auxiliary regression is misspecified, the adjusted estimator can sometimes be less efficient than the unadjusted one, which is known as Freedman's critique. In Section 5, we discuss how to make adjustments that do not harm the precision of the QTE estimator. Third, the asymptotic variance of $\hat{q}^{adj}(\tau)$ depends on $(f_a(q_a(\tau)),m_a(\tau,s,x))_{a=0,1}$, which are infinite-dimensional nuisance parameters. To conduct analytic inference, it is necessary to estimate these nuisance parameters nonparametrically, which requires tuning parameters. Nonparametric estimation can be sensitive to the choice of tuning parameters, and rule-of-thumb tuning parameter selection may not be appropriate for every DGP or every quantile. The use of cross-validation in selecting the tuning parameters is possible in principle but, in practice, time-consuming. These practical difficulties of analytic methods of inference provide strong motivation to investigate bootstrap inference procedures that are much less reliant on tuning parameters.

4 Multiplier Bootstrap Inference

We approximate the asymptotic distributions of q^adj(τ)\hat{q}^{adj}(\tau) via the multiplier bootstrap. Let {ξi}i[n]\{\xi_{i}\}_{i\in[n]} be a sequence of bootstrap weights which will be specified later. Define n1w(s)=i[n]ξiAi1{Si=s}n_{1}^{w}(s)=\sum_{i\in[n]}\xi_{i}A_{i}1\{S_{i}=s\}, n0w(s)=i[n]ξi(1Ai)1{Si=s}n_{0}^{w}(s)=\sum_{i\in[n]}\xi_{i}(1-A_{i})1\{S_{i}=s\}, nw(s)=i[n]ξi1{Si=s}=n1w(s)+n0w(s)n^{w}(s)=\sum_{i\in[n]}\xi_{i}1\{S_{i}=s\}=n_{1}^{w}(s)+n_{0}^{w}(s), and π^w(s)=n1w(s)/nw(s)\hat{\pi}^{w}(s)=n_{1}^{w}(s)/n^{w}(s). The multiplier bootstrap counterpart of q^adj(τ)\hat{q}^{adj}(\tau) is denoted by q^w(τ)\hat{q}^{w}(\tau) and defined as

q^w(τ)=q^1w(τ)q^0w(τ),\displaystyle\hat{q}^{w}(\tau)=\hat{q}_{1}^{w}(\tau)-\hat{q}_{0}^{w}(\tau),

where

q^1w(τ)=\displaystyle\hat{q}_{1}^{w}(\tau)= argminqi[n]ξi[Aiπ^w(Si)ρτ(Yiq)+(Aiπ^w(Si))π^w(Si)m^1(τ,Si,Xi)q],\displaystyle\operatorname*{arg\,min}_{q}\sum_{i\in[n]}\xi_{i}\left[\frac{A_{i}}{\hat{\pi}^{w}(S_{i})}\rho_{\tau}(Y_{i}-q)+\frac{(A_{i}-\hat{\pi}^{w}(S_{i}))}{\hat{\pi}^{w}(S_{i})}\widehat{m}_{1}(\tau,S_{i},X_{i})q\right], (4.1)

and

q^0w(τ)=\displaystyle\hat{q}_{0}^{w}(\tau)= argminqi[n]ξi[1Ai1π^w(Si)ρτ(Yiq)(Aiπ^w(Si))1π^w(Si)m^0(τ,Si,Xi)q].\displaystyle\operatorname*{arg\,min}_{q}\sum_{i\in[n]}\xi_{i}\left[\frac{1-A_{i}}{1-\hat{\pi}^{w}(S_{i})}\rho_{\tau}(Y_{i}-q)-\frac{(A_{i}-\hat{\pi}^{w}(S_{i}))}{1-\hat{\pi}^{w}(S_{i})}\widehat{m}_{0}(\tau,S_{i},X_{i})q\right]. (4.2)

Two comments on implementation are noted here: (i) we do not re-estimate m^a()\widehat{m}_{a}(\cdot) in the bootstrap sample, which is similar to the multiplier bootstrap procedure proposed by Belloni et al. (2017); and (ii) in Section B of the Online Supplement we propose a way to compute (q^aw(τ))a=0,1(\hat{q}_{a}^{w}(\tau))_{a=0,1} directly from the subgradient conditions of (4.1) and (4.2), thereby avoiding the optimization. Both features considerably reduce the computation time of our bootstrap procedure.
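To illustrate the computation, note that the objective in (4.1) is convex and piecewise linear in qq, with kinks only at the treated outcomes, so a minimizer (when attained) lies on that grid. The following Python sketch is our own illustration of this idea for a single quantile index; all function and variable names are hypothetical, and it is not the paper's implementation:

```python
import numpy as np

def q1_boot(Y, A, S, m1hat, xi, tau):
    """Sketch of the bootstrap estimator in (4.1): the weighted objective is
    convex and piecewise linear in q, so we evaluate it on the grid of treated
    outcomes (assuming the minimum is attained there)."""
    # stratum-level bootstrap-weighted target fractions \hat\pi^w(S_i)
    piw = np.array([(xi * A * (S == s)).sum() / (xi * (S == s)).sum()
                    for s in S])

    def rho(u):
        # Koenker-Bassett check function rho_tau(u)
        return u * (tau - (u < 0.0))

    def obj(q):
        # bootstrap-weighted objective of (4.1) at candidate q
        return np.sum(xi * (A / piw * rho(Y - q)
                            + (A - piw) / piw * m1hat * q))

    grid = Y[A == 1]
    return grid[np.argmin([obj(q) for q in grid])]
```

With all units treated, unit weights, and m^1=0\widehat{m}_{1}=0, this reduces to the ordinary sample τ\tauth quantile, which gives a quick sanity check.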

Next, we specify the bootstrap weights.

Assumption 4.

Suppose {ξi}i[n]\{\xi_{i}\}_{i\in[n]} is a sequence of nonnegative i.i.d. random variables with unit expectation and variance and a sub-exponential upper tail.

Assumption 5.

Recall Δ¯a(τ,s,x)\overline{\Delta}_{a}(\tau,s,x) defined in Assumption 3. We have, for a=0,1a=0,1,

supτΥ,s𝒮|iI1(s)ξiΔ¯a(τ,s,Xi)n1w(s)iI0(s)ξiΔ¯a(τ,s,Xi)n0w(s)|=op(n1/2).\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\biggl{|}\frac{\sum_{i\in I_{1}(s)}\xi_{i}\overline{\Delta}_{a}(\tau,s,X_{i})}{n_{1}^{w}(s)}-\frac{\sum_{i\in I_{0}(s)}\xi_{i}\overline{\Delta}_{a}(\tau,s,X_{i})}{n_{0}^{w}(s)}\biggr{|}=o_{p}(n^{-1/2}).

We require the bootstrap weights to be nonnegative so that the objective functions in (4.1) and (4.2) are convex. In practice, we generate ξi\xi_{i} independently from the standard exponential distribution. Assumption 5 is the bootstrap counterpart of Assumption 3. Continuing with the linear model example considered after Assumption 3, Assumption 5 requires

supτΥ,a=0,1,s𝒮|(iI1(s)ξiXin1w(s)iI0(s)ξiXin0w(s))(θ^a,s(τ)θa,s(τ))|=op(n1/2),\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left|\left(\frac{\sum_{i\in I_{1}(s)}\xi_{i}X_{i}}{n_{1}^{w}(s)}-\frac{\sum_{i\in I_{0}(s)}\xi_{i}X_{i}}{n_{0}^{w}(s)}\right)^{\top}\left(\hat{\theta}_{a,s}(\tau)-\theta_{a,s}(\tau)\right)\right|=o_{p}(n^{-1/2}),

which holds if θ^a,s(τ)\hat{\theta}_{a,s}(\tau) is a uniformly consistent estimator of θa,s(τ)\theta_{a,s}(\tau).

Theorem 4.1.

Suppose Assumptions 15 hold. Then, uniformly over τΥ\tau\in\Upsilon,

n(q^w(τ)q^adj(τ))𝜉(τ),\displaystyle\sqrt{n}(\hat{q}^{w}(\tau)-\hat{q}^{adj}(\tau))\underset{\xi}{\overset{\mathbb{P}}{\rightsquigarrow}}\mathcal{B}(\tau),

where (τ)\mathcal{B}(\tau) is the same Gaussian process defined in Theorem 3.555We view n(q^w(τ)q^adj(τ))\sqrt{n}(\hat{q}^{w}(\tau)-\hat{q}^{adj}(\tau)) and (τ)\mathcal{B}(\tau) as two processes indexed by τΥ\tau\in\Upsilon and denote them as GnG_{n} and GG, respectively. Then, following van der Vaart and Wellner (1996, Chapter 2.9), we say Gn𝜉GG_{n}\underset{\xi}{\overset{\mathbb{P}}{\rightsquigarrow}}G uniformly over τΥ\tau\in\Upsilon if suphBL1|𝔼ξh(Gn)𝔼h(G)|p0,\displaystyle\sup_{h\in\text{BL}_{1}}|\mathbb{E}_{\xi}h(G_{n})-\mathbb{E}h(G)|\stackrel{{\scriptstyle p}}{{\longrightarrow}}0, where BL1\text{BL}_{1} is the set of all functions h:(Υ)[0,1]h:\ell^{\infty}(\Upsilon)\mapsto[0,1] such that |h(z1)h(z2)||z1z2||h(z_{1})-h(z_{2})|\leq|z_{1}-z_{2}| for every z1,z2(Υ)z_{1},z_{2}\in\ell^{\infty}(\Upsilon), and 𝔼ξ\mathbb{E}_{\xi} denotes expectation with respect to the bootstrap weights {ξ}i[n]\{\xi\}_{i\in[n]}.

Two remarks are in order. First, Theorem 5 shows that the limit distribution of the bootstrap estimator conditional on the data can approximate that of the original estimator uniformly over τΥ\tau\in\Upsilon. This is the theoretical foundation for the bootstrap confidence intervals and bands described in Section B in the Online Supplement. Specifically, denote {q^w,b(τ)}b[B]\{\hat{q}^{w,b}(\tau)\}_{b\in[B]} as the bootstrap estimates, where BB is the number of bootstrap replications. Let 𝒞^(ν)\widehat{\mathcal{C}}(\nu) and 𝒞(ν)\mathcal{C}(\nu) be the ν\nuth empirical quantile of the sequence {q^w,b(τ)}b[B]\{\hat{q}^{w,b}(\tau)\}_{b\in[B]} and the ν\nuth standard normal critical value, respectively. Then, we suggest using the bootstrap estimates to construct the standard error of q^adj(τ)\hat{q}^{adj}(\tau) as σ^=𝒞^(0.975)𝒞^(0.025)𝒞(0.975)𝒞(0.025)\hat{\sigma}=\frac{\widehat{\mathcal{C}}(0.975)-\widehat{\mathcal{C}}(0.025)}{\mathcal{C}(0.975)-\mathcal{C}(0.025)}. Note that, unlike Hahn and Liao (2021), our bootstrap standard error is not conservative. In our context, the bootstrap estimator of σ2\sigma^{2} considered by Hahn and Liao (2021) is 𝔼[(n(q^w(τ)q^adj(τ)))2]\mathbb{E}^{*}[(\sqrt{n}(\hat{q}^{w}(\tau)-\hat{q}^{adj}(\tau)))^{2}], where 𝔼\mathbb{E}^{*} is the conditional expectation given the data. It is well known that weak convergence does not imply convergence in L2L_{2}-norm, which explains why their estimator is, in general, conservative. Instead, we use a different estimator of the standard error and show that it is consistent given weak convergence. Second, such a bootstrap approximation is consistent under CARs. Zhang and Zheng (2020) showed that, for QTE estimation without regression adjustment, bootstrapping the IPW QTE estimator with the estimated target fraction results in non-conservative inference, while bootstrapping the IPW estimator with the true fraction is conservative under CARs. 
As the estimator considered by Zhang and Zheng (2020) is a special case of our regression-adjusted estimator with m^a()=0\widehat{m}_{a}(\cdot)=0, we conjecture that the same conclusion holds. A proof of conservative bootstrap inference with the true target fraction is not included in the paper for reasons of space.666Full statements and proofs are lengthy because we need to derive the limit distributions of not only the bootstrap estimator but also the original estimator with the true target fraction. Although the negative result is theoretically interesting, we are not aware of any empirical papers that use the true target fraction while making regression adjustments. Moreover, our method is shown to outperform the one with the true target fraction in simulations. The practical value of proving the negative result is therefore limited. Our simulations confirm both the correct size coverage of our inference method using the bootstrap with the estimated target fraction and the conservatism of the bootstrap with the true target fraction. The standard error of the QTE estimator is found to be 34.9% larger on average when the true rather than the estimated target fraction is used in the simulations (see Tables 1 below and 15 in the Online Supplement).
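The standard-error construction suggested above is straightforward to implement. A minimal Python sketch (our own illustration; the function name is hypothetical) rescales the empirical 2.5%/97.5% interquantile range of the bootstrap draws by the matching standard normal interquantile range:

```python
import numpy as np
from statistics import NormalDist

def bootstrap_se(q_boot):
    """Bootstrap standard error of qhat^adj(tau):
    (C_hat(0.975) - C_hat(0.025)) / (C(0.975) - C(0.025)),
    where C_hat are empirical quantiles of the bootstrap draws
    and C are standard normal critical values."""
    c_lo, c_hi = np.quantile(q_boot, [0.025, 0.975])
    z = NormalDist().inv_cdf(0.975)  # C(0.975) = -C(0.025)
    return (c_hi - c_lo) / (2.0 * z)
```

Because it uses an interquantile range rather than a conditional second moment, this estimator inherits consistency from weak convergence alone, which is the point of the contrast with Hahn and Liao (2021) drawn above.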

5 Auxiliary Regressions

In this section, we consider two approaches to estimating the auxiliary regressions: (1) a parametric method and (2) a nonparametric method. In Section A of the Online Supplement, we further consider a regularization method. For the parametric method, we do not require the model to be correctly specified and propose ways to estimate the pseudo true value of the auxiliary regression. For the other two methods, we (nonparametrically) estimate the true model so that the asymptotic variance of q^adj(τ)\hat{q}^{adj}(\tau) achieves its minimum based on Theorem 3. For all three methods, we verify Assumptions 3 and 5.

5.1 Parametric method

In this section, we consider the case where XiX_{i} is finite-dimensional. Recall ma(τ,s,x)τ(Yi(a)qa(τ)|Xi=x,Si=s)m_{a}(\tau,s,x)\equiv\tau-\mathbb{P}\left(Y_{i}(a)\leq q_{a}(\tau)|X_{i}=x,S_{i}=s\right) for a=0,1a=0,1. We propose to model (Yi(a)qa(τ)|Xi,Si=s)\mathbb{P}\left(Y_{i}(a)\leq q_{a}(\tau)|X_{i},S_{i}=s\right) as Λτ,s(Xi,θa,s(τ))\Lambda_{\tau,s}(X_{i},\theta_{a,s}(\tau)), where θa,s(τ)\theta_{a,s}(\tau) is a finite dimensional parameter that depends on (a,s,τ)(a,s,\tau) so that our model for ma(τ,s,Xi)m_{a}(\tau,s,X_{i}) is

m¯a(τ,s,Xi)=τΛτ,s(Xi,θa,s(τ)).\displaystyle\overline{m}_{a}(\tau,s,X_{i})=\tau-\Lambda_{\tau,s}(X_{i},\theta_{a,s}(\tau)). (5.1)

We note that, as we allow for misspecification, the researchers have the freedom to choose any functional forms for Λτ,s()\Lambda_{\tau,s}(\cdot) and any pseudo true values for θa,s(τ)\theta_{a,s}(\tau), both of which can vary with respect to (τ,s)(\tau,s). For example, if we assume a logistic regression with Λτ,s(Xi,θa,s(τ))=λ(Xiθa,s(τ))\Lambda_{\tau,s}(X_{i},\theta_{a,s}(\tau))=\lambda(X_{i}^{\top}\theta_{a,s}(\tau)), where λ()\lambda(\cdot) is the logistic CDF, then there are various choices of θa,s(τ)\theta_{a,s}(\tau) such as the maximizer of the population pseudo likelihood, the maximizer of the population version of the least squares objective function, or the minimizer of the asymptotic variance of the adjusted QTE estimator. As the logistic model is potentially misspecified, these three pseudo true values are not necessarily the same and can lead to different adjustments, and thus, different asymptotic variances of the corresponding adjusted QTE estimators.

We first state a general result for generic choices of Λτ,s()\Lambda_{\tau,s}(\cdot) and θa,s(τ)\theta_{a,s}(\tau). Suppose we estimate θa,s(τ)\theta_{a,s}(\tau) by θ^a,s(τ)\hat{\theta}_{a,s}(\tau). Then, the corresponding m^a(τ,s,Xi)\widehat{m}_{a}(\tau,s,X_{i}) can be written as

m^a(τ,s,Xi)=τΛτ,s(Xi,θ^a,s(τ)).\displaystyle\widehat{m}_{a}(\tau,s,X_{i})=\tau-\Lambda_{\tau,s}(X_{i},\hat{\theta}_{a,s}(\tau)). (5.2)
Assumption 6.
  1. (i)

    Suppose there exist a positive random variable LiL_{i} and a positive constant C>0C>0 such that

    supτ1,τ2Υ,s𝒮,θCΛτ1,s(Xi,θ)Λτ2,s(Xi,θ)2Li|τ1τ2|,supτΥ,s𝒮,θC|Λτ,s(Xi,θ)|Li,\displaystyle\sup_{\tau_{1},\tau_{2}\in\Upsilon,s\in\mathcal{S},||\theta||\leq C}||\Lambda_{\tau_{1},s}(X_{i},\theta)-\Lambda_{\tau_{2},s}(X_{i},\theta)||_{2}\leq L_{i}|\tau_{1}-\tau_{2}|,\quad\sup_{\tau\in\Upsilon,s\in\mathcal{S},||\theta||\leq C}|\Lambda_{\tau,s}(X_{i},\theta)|\leq L_{i},
    supτΥ,s𝒮,θC|θΛτ,s(Xi,θ)|Li,and𝔼(Lid|Si=s)C<for some d>2.\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S},||\theta||\leq C}|\partial_{\theta}\Lambda_{\tau,s}(X_{i},\theta)|\leq L_{i},\quad\text{and}\quad\mathbb{E}(L_{i}^{d}|S_{i}=s)\leq C<\infty\quad\text{for some $d>2$.}
  2. (ii)

    supτ1,τ2Υ,a=0,1,s𝒮|θa,s(τ1)θa,s(τ2)|C|τ1τ2|\sup_{\tau_{1},\tau_{2}\in\Upsilon,a=0,1,s\in\mathcal{S}}|\theta_{a,s}(\tau_{1})-\theta_{a,s}(\tau_{2})|\leq C|\tau_{1}-\tau_{2}|.

  3. (iii)

    supτΥ,a=0,1,s𝒮θ^a,s(τ)θa,s(τ)2p0\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}||\hat{\theta}_{a,s}(\tau)-\theta_{a,s}(\tau)||_{2}\stackrel{{\scriptstyle p}}{{\longrightarrow}}0.

Three remarks are in order. First, common choices for auxiliary regressions are linear probability, logistic, and probit regressions, corresponding to Λτ,s(Xi,θa,s(τ))=Xiθa,s(τ)\Lambda_{\tau,s}(X_{i},\theta_{a,s}(\tau))=X_{i}^{\top}\theta_{a,s}(\tau), λ(Xiθa,s(τ))\lambda(\vec{X}_{i}^{\top}\theta_{a,s}(\tau)), and Φ(Xiθa,s(τ))\Phi(\vec{X}_{i}^{\top}\theta_{a,s}(\tau)), respectively, where Φ()\Phi(\cdot) is the standard normal CDF and Xi=(1,Xi)\vec{X}_{i}=(1,X_{i}^{\top})^{\top}. For these models, the functional form Λτ,s()\Lambda_{\tau,s}(\cdot) does not depend on (τ,s)(\tau,s), and Assumption 6(i) holds automatically. For the linear regression case, we do not include the intercept because our regression adjusted estimators ((3.1) and (3.2)) and their bootstrap counterparts ((4.1) and (4.2)) are numerically invariant to location shift of the auxiliary regressions. Second, it is also important to allow the functional form Λτ,s()\Lambda_{\tau,s}(\cdot) to vary across τ\tau to incorporate the case in which the regressor XiX_{i} in the linear, logistic, and probit regressions is replaced by Wi,s(τ)W_{i,s}(\tau), a function of XiX_{i} that depends on (τ,s)(\tau,s). We give a concrete example for this situation in Section 5.1.3. Third, Assumption 6(ii) also holds automatically if Υ\Upsilon is finite. When Υ\Upsilon is infinite, this condition is still mild.

Theorem 5.1.

Denote q^par(τ)\hat{q}^{par}(\tau) and q^par,w(τ)\hat{q}^{par,w}(\tau) as the τ\tauth QTE estimator and its multiplier bootstrap counterpart defined in Sections 3 and 4, respectively, with m¯a(τ,Si,Xi)\overline{m}_{a}(\tau,S_{i},X_{i}) and m^a(τ,Si,Xi)\widehat{m}_{a}(\tau,S_{i},X_{i}) defined in (5.1) and (5.2), respectively. Suppose Assumptions 1, 2, 4, and 6 hold. Then, Assumptions 3 and 5 hold, which further implies Theorems 3 and 5 hold for q^par(τ)\hat{q}^{par}(\tau) and q^par,w(τ)\hat{q}^{par,w}(\tau), respectively.

Theorem 5.1 shows that, as long as the estimator of the pseudo true value (θ^a,s(τ)\hat{\theta}_{a,s}(\tau)) is uniformly consistent, under mild regularity conditions, all the general estimation and bootstrap inference results established in Sections 3 and 4 hold.

5.1.1 Linear probability model

In this section, we consider linear adjustment with parameter ta,s(τ)t_{a,s}(\tau) such that

Λτ,s(Xi,ta,s(τ))=Wi,s(τ)ta,s(τ)andm¯a(τ,s,Xi)=τWi,s(τ)ta,s(τ),\displaystyle\Lambda_{\tau,s}(X_{i},t_{a,s}(\tau))=W_{i,s}^{\top}(\tau)t_{a,s}(\tau)\quad\text{and}\quad\overline{m}_{a}(\tau,s,X_{i})=\tau-W_{i,s}^{\top}(\tau)t_{a,s}(\tau), (5.3)

where the regressor Wi,s(τ)W_{i,s}(\tau) is a function of XiX_{i} but the functional form may vary across s,τs,\tau. For example, we can consider Wi,s(τ)=XiW_{i,s}(\tau)=X_{i}, the transformations of XiX_{i} such as quadratic and interaction terms, and some prediction of (1{Yi(1)q1(τ)},1{Yi(0)q0(τ)})(1\{Y_{i}(1)\leq q_{1}(\tau)\},1\{Y_{i}(0)\leq q_{0}(\tau)\}) given XiX_{i} and Si=sS_{i}=s. The last example is further explained in Section 5.1.3.

We note that the asymptotic variance (denoted as σ2\sigma^{2}) of q^adj(τ)\hat{q}^{adj}(\tau) is a function of the working model (m¯a(τ,s,)\overline{m}_{a}(\tau,s,\cdot)), which is further indexed by its parameters (denoted as {ta,s(τ)}a=0,1,s𝒮\{t_{a,s}(\tau)\}_{a=0,1,s\in\mathcal{S}}), i.e., σ2=σ2({m¯a(τ,s,;ta,s)}a=0,1,s𝒮)\sigma^{2}=\sigma^{2}(\{\overline{m}_{a}(\tau,s,\cdot;t_{a,s})\}_{a=0,1,s\in\mathcal{S}}). Our optimal linear adjustment corresponds to the parameter value θa,s(τ)\theta_{a,s}(\tau) that minimizes σ2({m¯a(τ,s,;ta,s)}a=0,1,s𝒮)\sigma^{2}(\{\overline{m}_{a}(\tau,s,\cdot;t_{a,s})\}_{a=0,1,s\in\mathcal{S}}), i.e.,

{θa,s(τ)}a=0,1,s𝒮argminta,s:a=0,1,s𝒮σ2({m¯a(τ,s,;ta,s)}a=0,1,s𝒮).\displaystyle\{\theta_{a,s}(\tau)\}_{a=0,1,s\in\mathcal{S}}\in\operatorname*{arg\,min}_{t_{a,s}:a=0,1,s\in\mathcal{S}}\sigma^{2}(\{\overline{m}_{a}(\tau,s,\cdot;t_{a,s})\}_{a=0,1,s\in\mathcal{S}}).
Assumption 7.

Define W~i,s(τ)=Wi,s(τ)𝔼(Wi,s(τ)|Si=s)\tilde{W}_{i,s}(\tau)=W_{i,s}(\tau)-\mathbb{E}(W_{i,s}(\tau)|S_{i}=s). There exist constants 0<c<C<0<c<C<\infty such that

c<infa=0,1,s𝒮,τΥλmin(𝔼W~i,s(τ)W~i,s(τ)|Si=s)supa=0,1,s𝒮,τΥλmax(𝔼W~i,s(τ)W~i,s(τ)|Si=s)C\displaystyle c<\inf_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon}\lambda_{\min}(\mathbb{E}\tilde{W}_{i,s}(\tau)\tilde{W}_{i,s}(\tau)^{\top}|S_{i}=s)\leq\sup_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon}\lambda_{\max}(\mathbb{E}\tilde{W}_{i,s}(\tau)\tilde{W}_{i,s}(\tau)^{\top}|S_{i}=s)\leq C

and 𝔼(W~i,s2d|Si=s)C\mathbb{E}(||\tilde{W}_{i,s}||_{2}^{d}|S_{i}=s)\leq C for some d>2d>2, where for a generic symmetric matrix UU, λmin(U)\lambda_{\min}(U) and λmax(U)\lambda_{\max}(U) denote the minimal and maximal eigenvalues of UU, respectively.

The next theorem derives the closed-form expression for the optimal linear coefficient.

Theorem 5.2.

Suppose Assumptions 1, 2, 4, 6, 7 hold, and Λτ,s()\Lambda_{\tau,s}(\cdot) is defined in (5.3). Further denote the asymptotic covariance matrix of (q^par(τ1),,q^par(τK))(\hat{q}^{par}(\tau_{1}),\cdots,\hat{q}^{par}(\tau_{K})) for any finite set of quantile indices (τ1,,τK)(\tau_{1},\cdots,\tau_{K}) as [ΣLP(τk,τl)]k,l[K][\Sigma^{\textit{LP}}(\tau_{k},\tau_{l})]_{k,l\in[K]}. Then, [ΣLP(τk,τl)]k,l[K][\Sigma^{\textit{LP}}(\tau_{k},\tau_{l})]_{k,l\in[K]} is minimized in the matrix sense at (θ1,s(τk),θ0,s(τk))k[K]\left(\theta_{1,s}(\tau_{k}),\theta_{0,s}(\tau_{k})\right)_{k\in[K]} such that

θ1,s(τk)f1(q1(τk))+π(s)θ0,s(τk)(1π(s))f0(q0(τk))=θ1,sLP(τk)f1(q1(τk))+π(s)θ0,sLP(τk)(1π(s))f0(q0(τk)),k[K],\displaystyle\frac{\theta_{1,s}(\tau_{k})}{f_{1}(q_{1}(\tau_{k}))}+\frac{\pi(s)\theta_{0,s}(\tau_{k})}{(1-\pi(s))f_{0}(q_{0}(\tau_{k}))}=\frac{\theta_{1,s}^{\textit{LP}}(\tau_{k})}{f_{1}(q_{1}(\tau_{k}))}+\frac{\pi(s)\theta_{0,s}^{\textit{LP}}(\tau_{k})}{(1-\pi(s))f_{0}(q_{0}(\tau_{k}))},\leavevmode\nobreak\ k\in[K],

where for τ=τ1,,τK\tau=\tau_{1},\cdots,\tau_{K} and a=0,1a=0,1,

θa,sLP(τ)=[𝔼(W~i,s(τ)W~i,s(τ)|Si=s)]1𝔼[W~i,s(τ)1{Yi(a)qa(τ)}|Si=s].\displaystyle\theta_{a,s}^{\textit{LP}}(\tau)=\left[\mathbb{E}(\tilde{W}_{i,s}(\tau)\tilde{W}_{i,s}(\tau)^{\top}|S_{i}=s)\right]^{-1}\mathbb{E}\left[\tilde{W}_{i,s}(\tau)1\{Y_{i}(a)\leq q_{a}(\tau)\}|S_{i}=s\right].

Four remarks are in order. First, the optimal linear coefficients {θa,s(τ)}a=0,1,s𝒮\{\theta_{a,s}(\tau)\}_{a=0,1,s\in\mathcal{S}} are not uniquely defined. In order to achieve the minimal variance, we only need to consistently estimate one of the minimizers. We choose

(θ1,s(τ),θ0,s(τ))=(θ1,sLP(τ),θ0,sLP(τ)),s𝒮,\displaystyle(\theta_{1,s}(\tau),\theta_{0,s}(\tau))=(\theta_{1,s}^{\textit{LP}}(\tau),\theta_{0,s}^{\textit{LP}}(\tau)),\leavevmode\nobreak\ s\in\mathcal{S},

as this choice avoids estimation of the densities f1(q1(τ))f_{1}(q_{1}(\tau)) and f0(q0(τ))f_{0}(q_{0}(\tau)). In Theorem 5.3 below, we propose estimators of θ1,sLP(τ)\theta_{1,s}^{\textit{LP}}(\tau) and θ0,sLP(τ)\theta_{0,s}^{\textit{LP}}(\tau) and show they are consistent uniformly over ss and τ\tau. Second, note that the case of no adjustment is nested by our linear adjustment with zero coefficients. Due to the optimality result established in Theorem 5.2, our regression-adjusted QTE estimator with (consistent estimators of) {θa,sLP(τ)}a=0,1,s𝒮\{\theta_{a,s}^{\textit{LP}}(\tau)\}_{a=0,1,s\in\mathcal{S}} is more efficient than that with no adjustments. Third, we also need to clarify that the optimality of {θa,sLP(τ)}a=0,1,s𝒮\{\theta_{a,s}^{\textit{LP}}(\tau)\}_{a=0,1,s\in\mathcal{S}} is only within the class of linear adjustments. It is possible that the QTE estimator with some nonlinear adjustment is more efficient than that with the optimal linear adjustment, especially because the linear probability model is likely misspecified. Fourth, the optimal linear coefficients {θa,sLP(τ)}a=0,1,s𝒮,τΥ\{\theta_{a,s}^{\textit{LP}}(\tau)\}_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon} minimize (over the class of linear adjustments) not only the asymptotic variance of q^par(τ)\hat{q}^{par}(\tau) but also the covariance matrix of (q^par(τ1),,q^par(τK))(\hat{q}^{par}(\tau_{1}),\cdots,\hat{q}^{par}(\tau_{K})) for any finite set of quantile indices (τ1,,τK)(\tau_{1},\cdots,\tau_{K}). This implies we can use the same (estimators of) optimal linear coefficients for hypothesis testing involving single, multiple, or even a continuum of quantile indices.

In the rest of this subsection, we focus on the estimation of {θa,sLP(τ)}a=0,1,s𝒮\{\theta_{a,s}^{\textit{LP}}(\tau)\}_{a=0,1,s\in\mathcal{S}}. Note that θa,sLP(τ)\theta_{a,s}^{\textit{LP}}(\tau) is the projection coefficient of 1{Yiqa(τ)}1\{Y_{i}\leq q_{a}(\tau)\} on W~i,s(τ)\tilde{W}_{i,s}(\tau) for the sub-population with Si=sS_{i}=s and Ai=aA_{i}=a. We estimate these coefficients by their sample analogs. Specifically, the parameter qa(τ)q_{a}(\tau) is unknown and is replaced by a n\sqrt{n}-consistent estimator denoted by q^a(τ)\hat{q}_{a}(\tau).

Assumption 8.

Assume that supτΥ,a=0,1|q^a(τ)qa(τ)|=Op(n1/2)\sup_{\tau\in\Upsilon,a=0,1}|\hat{q}_{a}(\tau)-q_{a}(\tau)|=O_{p}(n^{-1/2}).

In practice, we compute {q^a(τ)}a=0,1\{\hat{q}_{a}(\tau)\}_{a=0,1} based on (3.1) and (3.2) by setting m^a(τ,Si,Xi)0\widehat{m}_{a}(\tau,S_{i},X_{i})\equiv 0. Then, Assumption 8 holds automatically by Theorem 3 with m^a(τ,Si,Xi)=m¯a(τ,Si,Xi)=0\widehat{m}_{a}(\tau,S_{i},X_{i})=\overline{m}_{a}(\tau,S_{i},X_{i})=0. Analysis throughout this section takes into account that the estimator q^a(τ)\hat{q}_{a}(\tau) is used in place of qa(τ)q_{a}(\tau).

Next, we define the estimator of θa,sLP(τ)\theta_{a,s}^{\textit{LP}}(\tau). Recall Ia(s)I_{a}(s) is defined in Assumption 3. Let

m¯a(τ,s,Xi)=τWi,s(τ)θa,sLP(τ),\displaystyle\overline{m}_{a}(\tau,s,X_{i})=\tau-W_{i,s}^{\top}(\tau)\theta_{a,s}^{\textit{LP}}(\tau), (5.4)
m^a(τ,s,Xi)=τWi,s(τ)θ^a,sLP(τ),\displaystyle\widehat{m}_{a}(\tau,s,X_{i})=\tau-W_{i,s}^{\top}(\tau)\hat{\theta}_{a,s}^{\textit{LP}}(\tau), (5.5)
W˙i,a,s(τ)=Wi,s(τ)1na(s)iIa(s)Wi,s(τ),\displaystyle\dot{W}_{i,a,s}(\tau)=W_{i,s}(\tau)-\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}W_{i,s}(\tau), (5.6)

and

θ^a,sLP(τ)=[1na(s)iIa(s)W˙i,a,s(τ)W˙i,a,s(τ)]1[1na(s)iIa(s)W˙i,a,s(τ)1{Yiq^a(τ)}].\displaystyle\hat{\theta}_{a,s}^{\textit{LP}}(\tau)=\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{W}_{i,a,s}(\tau)\dot{W}_{i,a,s}^{\top}(\tau)\right]^{-1}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{W}_{i,a,s}(\tau)1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\right]. (5.7)
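As a concrete illustration of (5.6)-(5.7), the estimator θ^a,sLP(τ)\hat{\theta}_{a,s}^{\textit{LP}}(\tau) within a single (a,s)(a,s) cell is an OLS projection of the indicator 1{Yiq^a(τ)}1\{Y_{i}\leq\hat{q}_{a}(\tau)\} on the within-cell demeaned regressors. The following Python sketch is our own illustration (array and function names are hypothetical), not the paper's implementation:

```python
import numpy as np

def theta_lp_hat(Y, W, q_hat):
    """Sample analog (5.7) within one (a, s) cell: project
    1{Y_i <= qhat_a(tau)} on the within-cell demeaned regressors."""
    W_dot = W - W.mean(axis=0)            # \dot W_{i,a,s}(tau), eq. (5.6)
    D = (Y <= q_hat).astype(float)        # 1{Y_i <= qhat_a(tau)}
    n = len(Y)
    # solve [W_dot'W_dot / n] theta = [W_dot'D / n]
    return np.linalg.solve(W_dot.T @ W_dot / n, W_dot.T @ D / n)
```

Because the regressors are demeaned within the cell, the result coincides with the slope coefficients from an OLS regression of the indicator on the raw regressors plus an intercept, which provides a simple numerical check.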
Assumption 9.

Suppose there exist a positive random variable LiL_{i} and a positive constant C>0C>0 such that

supτ1,τ2Υ,a=0,1,s𝒮Wi,s(τ1)Wi,s(τ2)2Li|τ1τ2|,supτΥ,a=0,1,s𝒮Wi,s(τ)2Li,\displaystyle\sup_{\tau_{1},\tau_{2}\in\Upsilon,a=0,1,s\in\mathcal{S}}||W_{i,s}(\tau_{1})-W_{i,s}(\tau_{2})||_{2}\leq L_{i}|\tau_{1}-\tau_{2}|,\quad\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}||W_{i,s}(\tau)||_{2}\leq L_{i},

and 𝔼(Lid|Si=s)C<\mathbb{E}(L_{i}^{d}|S_{i}=s)\leq C<\infty for some d>2d>2.

We note that Assumption 9 holds automatically if the regressor Wi,s(τ)W_{i,s}(\tau) does not depend on τ\tau.

Theorem 5.3.

Suppose Assumptions 1, 2, 79 hold. Then Assumption 6 holds for (θa,s(τ),θ^a,s(τ))=(θa,sLP(τ),θ^a,sLP(τ)),a=0,1,s𝒮,τΥ(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))=(\theta_{a,s}^{\textit{LP}}(\tau),\hat{\theta}_{a,s}^{\textit{LP}}(\tau)),\leavevmode\nobreak\ a=0,1,s\in\mathcal{S},\tau\in\Upsilon.

We refer to the QTE estimator adjusted by this linear probability model with optimal linear coefficients θa,sLP(τ)\theta_{a,s}^{\textit{LP}}(\tau) and estimators θ^a,sLP(τ)\hat{\theta}_{a,s}^{\textit{LP}}(\tau) as the LP estimator and denote it and its bootstrap counterpart as q^LP(τ)\hat{q}^{\textit{LP}}(\tau) and q^LP,w(τ)\hat{q}^{\textit{LP,w}}(\tau), respectively. Theorem 5.3 verifies Assumption 6 for the proposed estimator of the optimal linear coefficient. Then, by Theorem 5.1, Theorems 3 and 5 hold for q^LP(τ)\hat{q}^{\textit{LP}}(\tau) and q^LP,w(τ)\hat{q}^{\textit{LP,w}}(\tau), which implies all the estimation and inference methods established in the paper are valid for the LP estimator. Theorem 5.2 further shows q^LP(τ)\hat{q}^{\textit{LP}}(\tau) is the estimator with the optimal linear adjustment and weakly more efficient than the QTE estimator with no adjustments.

5.1.2 Logistic probability model

It is also common to consider the logistic regression as the adjustment and estimate the model by maximum likelihood (ML). The main goal of the working model is to approximate the true model as closely as possible. It is, therefore, useful to include additional technical regressors such as interactions in the logistic regression. The set of regressors used is defined as Hi=H(Xi)H_{i}=H(X_{i}), which is allowed to contain the intercept. Let θ^a,sML(τ)\hat{\theta}_{a,s}^{\textit{ML}}(\tau) and θa,sML(τ)\theta_{a,s}^{\textit{ML}}(\tau) be the quasi-ML estimator and its corresponding pseudo true value, respectively, i.e.,

θ^a,sML(τ)=argmaxθa1na(s)iIa(s)[1{Yiq^a(τ)}log(λ(Hiθa))+1{Yi>q^a(τ)}log(1λ(Hiθa))],\displaystyle\hat{\theta}_{a,s}^{\textit{ML}}(\tau)=\operatorname*{arg\,max}_{\theta_{a}}\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\left[1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\log(\lambda(H_{i}^{\top}\theta_{a}))+1\{Y_{i}>\hat{q}_{a}(\tau)\}\log(1-\lambda(H_{i}^{\top}\theta_{a}))\right], (5.8)

and

θa,sML(τ)=argmaxθa𝔼[1{Yi(a)qa(τ)}log(λ(Hiθa))+1{Yi(a)>qa(τ)}log(1λ(Hiθa))|Si=s].\displaystyle\theta_{a,s}^{\textit{ML}}(\tau)=\operatorname*{arg\,max}_{\theta_{a}}\mathbb{E}\left[1\{Y_{i}(a)\leq q_{a}(\tau)\}\log(\lambda(H_{i}^{\top}\theta_{a}))+1\{Y_{i}(a)>q_{a}(\tau)\}\log(1-\lambda(H_{i}^{\top}\theta_{a}))|S_{i}=s\right]. (5.9)

We then define

m¯a(τ,s,Xi)=τλ(Hiθa,sML(τ))andm^a(τ,s,Xi)=τλ(Hiθ^a,sML(τ)).\displaystyle\overline{m}_{a}(\tau,s,X_{i})=\tau-\lambda(H_{i}^{\top}\theta_{a,s}^{\textit{ML}}(\tau))\quad\text{and}\quad\widehat{m}_{a}(\tau,s,X_{i})=\tau-\lambda(H_{i}^{\top}\hat{\theta}_{a,s}^{\textit{ML}}(\tau)). (5.10)

In addition to the inclusion of technical regressors, we allow the pseudo true value (θa,sML(τ)\theta_{a,s}^{\textit{ML}}(\tau)) to vary across quantiles τ\tau, giving another layer of flexibility to the model. Such a model is called the distribution regression and was first proposed by Chernozhukov et al. (2013). We emphasize here that, although we aim to make the regression model as flexible as possible, our theory and results do not require the model to be correctly specified.
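For concreteness, the cell-level quasi-ML fit in (5.8) is a logistic regression of Di=1{Yiq^a(τ)}D_{i}=1\{Y_{i}\leq\hat{q}_{a}(\tau)\} on the technical regressors HiH_{i}. The following Newton-Raphson sketch is our own illustration under the assumption of a well-conditioned design; all names are hypothetical:

```python
import numpy as np

def theta_ml_hat(Y, H, q_hat, steps=25):
    """Quasi-ML fit (5.8) within one (a, s) cell: logistic (distribution)
    regression of D_i = 1{Y_i <= qhat_a(tau)} on H_i by Newton-Raphson.
    H should include an intercept column if one is desired."""
    D = (Y <= q_hat).astype(float)
    theta = np.zeros(H.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-H @ theta))          # lambda(H_i' theta)
        grad = H.T @ (D - p)                          # score of the log-likelihood
        hess = -(H * (p * (1.0 - p))[:, None]).T @ H  # negative definite Hessian
        theta = theta - np.linalg.solve(hess, grad)   # Newton ascent step
    return theta
```

When the logistic model happens to be correctly specified, this recovers the true index coefficients; under misspecification it converges to the pseudo true value θa,sML(τ)\theta_{a,s}^{\textit{ML}}(\tau) of (5.9), consistent with the discussion above.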

Assumption 10.

Suppose θa,sML(τ)\theta_{a,s}^{\textit{ML}}(\tau) is the unique maximizer defined in (5.9) for a=0,1a=0,1.

Theorem 5.4.

Suppose Assumptions 1, 2, 8, 10 hold and there exist constants c,Cc,C such that

0<cλmin(𝔼HiHi)λmax(𝔼HiHi)C<,\displaystyle 0<c\leq\lambda_{\min}(\mathbb{E}H_{i}H_{i}^{\top})\leq\lambda_{\max}(\mathbb{E}H_{i}H_{i}^{\top})\leq C<\infty,

then Assumption 6(iii) holds for (θa,s(τ),θ^a,s(τ))=(θa,sML(τ),θ^a,sML(τ)),a=0,1,s𝒮,τΥ(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))=(\theta_{a,s}^{\textit{ML}}(\tau),\hat{\theta}_{a,s}^{\textit{ML}}(\tau)),\leavevmode\nobreak\ a=0,1,s\in\mathcal{S},\tau\in\Upsilon.

Four remarks are in order. First, we refer to the QTE estimator adjusted by the logistic model with QMLE as the ML estimator and denote it and its bootstrap counterpart as q^ML(τ)\hat{q}^{\textit{ML}}(\tau) and q^ML,w(τ)\hat{q}^{\textit{ML,w}}(\tau), respectively. Assumption 6(i) holds automatically for the logistic regression. If we further impose Assumption 6(ii), then Theorem 5.4 implies that all the estimation and bootstrap inference methods established in the paper are valid for the ML estimator. Second, we take into account that θ^a,sML(τ)\hat{\theta}_{a,s}^{\textit{ML}}(\tau) is computed with the true qa(τ)q_{a}(\tau) replaced by its estimator q^a(τ)\hat{q}_{a}(\tau) and derive the results in Theorem 5.4 under Assumption 8. Third, the ML estimator is not guaranteed to be optimal or more efficient than the QTE estimator with no adjustments. On the other hand, as we can include additional technical terms in the regression and allow the regression coefficients to vary across τ\tau, the logistic model can be close to the true model ma(τ,s,Xi)m_{a}(\tau,s,X_{i}), which achieves the global minimum asymptotic variance based on Theorem 3. Fourth, in Section 5.2, we further justify the use of the ML estimator with a flexible logistic model by letting the number of technical terms (or equivalently, the dimension of HiH_{i}) diverge to infinity, showing by this means that the ML estimator can indeed consistently estimate the true model and thereby achieve the global minimum covariance matrix of the adjusted QTE estimator.

5.1.3 Further improved logistic model

Although in simulations we could not find a DGP in which the QTE estimator with logistic adjustment is less efficient than that with no adjustments, theoretically such a scenario still exists. In this section, we follow the idea of Cohen and Fogarty (2020) and construct an estimator that is weakly more efficient than both the ML estimator and the estimator with no adjustments. We denote Wi,s(τ)=(λ(Hiθ1,sML(τ)),λ(Hiθ0,sML(τ)))W_{i,s}(\tau)=(\lambda(H_{i}^{\top}\theta_{1,s}^{\textit{ML}}(\tau)),\lambda(H_{i}^{\top}\theta_{0,s}^{\textit{ML}}(\tau)))^{\top} and treat it as the regressor in a linear adjustment, i.e., we define m¯a(τ,s,Xi)=τWi,s(τ)ta,s(τ)\overline{m}_{a}(\tau,s,X_{i})=\tau-W_{i,s}^{\top}(\tau)t_{a,s}(\tau). Then, the logistic adjustment in Section 5.1.2 and no adjustments correspond to ta,s(τ)=a(1,0)+(1a)(0,1)t_{a,s}(\tau)=a(1,0)^{\top}+(1-a)(0,1)^{\top} and ta,s(τ)=(0,0)t_{a,s}(\tau)=(0,0)^{\top} for a=0,1a=0,1, respectively. However, following Theorem 5.2, the optimal linear coefficient with regressor Wi,s(τ)W_{i,s}(\tau) is

θa,sLPML(τ)=[𝔼(W~i,s(τ)W~i,s(τ)|Si=s)]1𝔼[W~i,s(τ)1{Yi(a)qa(τ)}|Si=s],\displaystyle\theta_{a,s}^{\textit{LPML}}(\tau)=\left[\mathbb{E}(\tilde{W}_{i,s}(\tau)\tilde{W}_{i,s}^{\top}(\tau)|S_{i}=s)\right]^{-1}\mathbb{E}\left[\tilde{W}_{i,s}(\tau)1\{Y_{i}(a)\leq q_{a}(\tau)\}|S_{i}=s\right], (5.11)

where W~i,s(τ)=Wi,s(τ)𝔼(Wi,s(τ)|Si=s)\tilde{W}_{i,s}(\tau)=W_{i,s}(\tau)-\mathbb{E}(W_{i,s}(\tau)|S_{i}=s). Using the adjustment term m¯a(τ,s,Xi)=τWi,s(τ)ta,s(τ)\overline{m}_{a}(\tau,s,X_{i})=\tau-W_{i,s}^{\top}(\tau)t_{a,s}(\tau) with ta,s(τ)=θa,sLPML(τ)t_{a,s}(\tau)=\theta_{a,s}^{\textit{LPML}}(\tau) is asymptotically weakly more efficient than any other choices of ta,s(τ)t_{a,s}(\tau). In practice, we do not observe Wi,s(τ)W_{i,s}(\tau), but can replace it by its feasible version W^i,s(τ)=(λ(Hiθ^1,sML(τ)),λ(Hiθ^0,sML(τ)))\hat{W}_{i,s}(\tau)=(\lambda(H_{i}^{\top}\hat{\theta}_{1,s}^{\textit{ML}}(\tau)),\lambda(H_{i}^{\top}\hat{\theta}_{0,s}^{\textit{ML}}(\tau)))^{\top}. We then define

m¯a(τ,s,Xi)=τWi,s(τ)θa,sLPML(τ),\displaystyle\overline{m}_{a}(\tau,s,X_{i})=\tau-W_{i,s}^{\top}(\tau)\theta_{a,s}^{\textit{LPML}}(\tau), (5.12)
m^a(τ,s,Xi)=τW^i,s(τ)θ^a,sLPML(τ),\displaystyle\widehat{m}_{a}(\tau,s,X_{i})=\tau-\hat{W}_{i,s}^{\top}(\tau)\hat{\theta}_{a,s}^{\textit{LPML}}(\tau), (5.13)
W˘i,a,s(τ)=W^i,s(τ)1na(s)iIa(s)W^i,s(τ),\displaystyle\breve{W}_{i,a,s}(\tau)=\hat{W}_{i,s}(\tau)-\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\hat{W}_{i,s}(\tau), (5.14)

and

θ^a,sLPML(τ)=[1na(s)iIa(s)W˘i,a,s(τ)W˘i,a,s(τ)]1[1na(s)iIa(s)W˘i,a,s(τ)1{Yiq^a(τ)}].\displaystyle\hat{\theta}_{a,s}^{\textit{LPML}}(\tau)=\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{W}_{i,a,s}(\tau)\breve{W}_{i,a,s}^{\top}(\tau)\right]^{-1}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{W}_{i,a,s}(\tau)1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\right]. (5.15)
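Given the fitted logistic predictions, steps (5.14)-(5.15) reduce to one further linear projection. In the Python sketch below (our own illustration; names are hypothetical), `W_hat` is assumed to already stack the cell's fitted values (λ(Hiθ^1,sML(τ)),λ(Hiθ^0,sML(τ)))(\lambda(H_{i}^{\top}\hat{\theta}_{1,s}^{\textit{ML}}(\tau)),\lambda(H_{i}^{\top}\hat{\theta}_{0,s}^{\textit{ML}}(\tau))) from the preceding quasi-ML fits:

```python
import numpy as np

def theta_lpml_hat(Y, W_hat, q_hat):
    """Sample analog (5.15) within one (a, s) cell: project
    1{Y_i <= qhat_a(tau)} on the demeaned fitted logistic predictions.
    The 1/n_a(s) factors in (5.15) cancel and are omitted."""
    W_breve = W_hat - W_hat.mean(axis=0)  # \breve W_{i,a,s}(tau), eq. (5.14)
    D = (Y <= q_hat).astype(float)
    return np.linalg.solve(W_breve.T @ W_breve, W_breve.T @ D)
```

As with the LP estimator, demeaning within the cell makes the result coincide with the slope coefficients from an intercept-included OLS regression of the indicator on `W_hat`.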
Assumption 11.
  1. (i)

    There exist constants c,Cc,C such that

    0<c<\displaystyle 0<c< infa=0,1,s𝒮,τΥλmin(𝔼W~i,s(τ)W~i,s(τ)|Si=s)\displaystyle\inf_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon}\lambda_{\min}(\mathbb{E}\tilde{W}_{i,s}(\tau)\tilde{W}_{i,s}^{\top}(\tau)|S_{i}=s)
    \displaystyle\leq supa=0,1,s𝒮,τΥλmax(𝔼W~i,s(τ)W~i,s(τ)|Si=s)C<.\displaystyle\sup_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon}\lambda_{\max}(\mathbb{E}\tilde{W}_{i,s}(\tau)\tilde{W}_{i,s}^{\top}(\tau)|S_{i}=s)\leq C<\infty.
  2. (ii)

    Suppose

    supτ1,τ2Υ,a=0,1,s𝒮θa,sML(τ1)θa,sML(τ2)2C|τ1τ2|\displaystyle\sup_{\tau_{1},\tau_{2}\in\Upsilon,a=0,1,s\in\mathcal{S}}||\theta_{a,s}^{\textit{ML}}(\tau_{1})-\theta_{a,s}^{\textit{ML}}(\tau_{2})||_{2}\leq C|\tau_{1}-\tau_{2}|
    supτ1,τ2Υ,a=0,1,s𝒮θa,sLPML(τ1)θa,sLPML(τ2)2C|τ1τ2|.\displaystyle\sup_{\tau_{1},\tau_{2}\in\Upsilon,a=0,1,s\in\mathcal{S}}||\theta_{a,s}^{\textit{LPML}}(\tau_{1})-\theta_{a,s}^{\textit{LPML}}(\tau_{2})||_{2}\leq C|\tau_{1}-\tau_{2}|.
Theorem 5.5.

Denote q^LPML(τ)\hat{q}^{\textit{LPML}}(\tau) and q^LPML,w(τ)\hat{q}^{\textit{LPML,w}}(\tau) as the τ\tauth QTE estimator and its multiplier bootstrap counterpart defined in Sections 3 and 4, respectively, with m¯a(τ,s,Xi)\overline{m}_{a}(\tau,s,X_{i}) and m^a(τ,s,Xi)\widehat{m}_{a}(\tau,s,X_{i}) defined in (5.12) and (5.13), respectively. Suppose Assumptions 1, 2, 8, 10, and 11 hold, and there exist constants c,Cc,C such that

0<cλmin(𝔼HiHi)λmax(𝔼HiHi)C<.\displaystyle 0<c\leq\lambda_{\min}(\mathbb{E}H_{i}H_{i}^{\top})\leq\lambda_{\max}(\mathbb{E}H_{i}H_{i}^{\top})\leq C<\infty.

Then, Assumptions 3 and 5 hold, which further implies that Theorems 3 and 5 hold for q^LPML(τ)\hat{q}^{\textit{LPML}}(\tau) and q^LPML,w(τ)\hat{q}^{\textit{LPML,w}}(\tau), respectively. Further denote the asymptotic covariance matrices of (q^J(τ1),,q^J(τK))(\hat{q}^{\textit{J}}(\tau_{1}),\cdots,\hat{q}^{\textit{J}}(\tau_{K})) for any finite set of quantile indices (τ1,,τK)(\tau_{1},\cdots,\tau_{K}) as [ΣJ(τk,τl)]k,l[K][\Sigma^{\textit{J}}(\tau_{k},\tau_{l})]_{k,l\in[K]} for J{LPML,ML,NA}J\in\{\text{LPML,ML,NA}\}, where q^NA(τ)\hat{q}^{\textit{NA}}(\tau) is the τ\tauth QTE estimator without adjustments. Then we have

[ΣLPML(τk,τl)]k,l[K][ΣML(τk,τl)]k,l[K]and[ΣLPML(τk,τl)]k,l[K][ΣNA(τk,τl)]k,l[K]\displaystyle[\Sigma^{\textit{LPML}}(\tau_{k},\tau_{l})]_{k,l\in[K]}\leq[\Sigma^{\textit{ML}}(\tau_{k},\tau_{l})]_{k,l\in[K]}\quad\text{and}\quad[\Sigma^{\textit{LPML}}(\tau_{k},\tau_{l})]_{k,l\in[K]}\leq[\Sigma^{\textit{NA}}(\tau_{k},\tau_{l})]_{k,l\in[K]}

in the matrix sense.

In practice, when nn is small, W˘i,a,s(τ)\breve{W}_{i,a,s}(\tau) may be nearly multicollinear within some stratum, which can lead to size distortion in inference concerning the QTE. We therefore suggest first normalizing W˘i,a,s(τ)\breve{W}_{i,a,s}(\tau) by its standard deviation (denoting the normalized W˘i,a,s(τ)\breve{W}_{i,a,s}(\tau) as W¨i,a,s(τ)\ddot{W}_{i,a,s}(\tau)) and then running a ridge regression

θ~a,sLPML(τ)=[1na(s)iIa(s)W¨i,a,s(τ)W¨i,a,s(τ)+δnI2]1[1na(s)iIa(s)W¨i,a,s(τ)1{Yiq^a(τ)}],\displaystyle\tilde{\theta}_{a,s}^{\textit{LPML}}(\tau)=\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\ddot{W}_{i,a,s}(\tau)\ddot{W}_{i,a,s}^{\top}(\tau)+\delta_{n}I_{2}\right]^{-1}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\ddot{W}_{i,a,s}(\tau)1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\right],

where I2I_{2} is the two-dimensional identity matrix and δn=1/n\delta_{n}=1/n. Then, the final regression adjustment is

m^a(τ,s,Xi)=τW¨i,a,s(τ)θ~a,sLPML(τ).\displaystyle\widehat{m}_{a}(\tau,s,X_{i})=\tau-\ddot{W}_{i,a,s}^{\top}(\tau)\tilde{\theta}_{a,s}^{\textit{LPML}}(\tau).

Given Assumption 11, such a ridge penalty is asymptotically negligible and all the results in Theorem 5.5 still hold.777In unreported simulations, we find that when n800n\geq 800, the ridge regularization is unnecessary and the original adjustment (i.e., (5.13)) has no size distortion, implying that near-multicollinearity is indeed just a finite-sample issue.
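The normalize-then-ridge variant just described can be sketched as follows, with the penalty δ_n = 1/n; as before, array and function names are illustrative:

```python
import numpy as np

def lpml_ridge(W_hat, below_qhat):
    """Ridge-regularized version of (5.15): demean, scale each column
    to unit standard deviation, then shrink with delta_n = 1/n as
    suggested for small samples."""
    n = len(W_hat)
    W_breve = W_hat - W_hat.mean(axis=0)
    W_ddot = W_breve / W_breve.std(axis=0)        # normalize columns
    gram = W_ddot.T @ W_ddot / n + (1.0 / n) * np.eye(W_ddot.shape[1])
    cross = W_ddot.T @ below_qhat / n
    return np.linalg.solve(gram, cross)
```

The small diagonal term keeps the system well conditioned even when the two fitted-probability columns are nearly collinear within a stratum.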

5.2 Nonparametric method

This section considers nonparametric estimation of ma(τ,s,Xi)m_{a}(\tau,s,X_{i}) when the dimension of XiX_{i} is fixed as dxd_{x}. For ease of notation, we assume all coordinates of XiX_{i} are continuously distributed. If in an application some elements of XX are discrete, the dimension dxd_{x} is interpreted as the dimension of the continuous covariates. All results in this section can then be extended in a conceptually straightforward manner by using the continuous covariates only within samples that are homogeneous in discrete covariates.

As ma(τ,s,Xi)m_{a}(\tau,s,X_{i}) is nonparametrically estimated, we have m¯a(τ,s,Xi)=ma(τ,s,Xi)=τ(Yi(a)qa(τ)|Si=s,Xi)\overline{m}_{a}(\tau,s,X_{i})=m_{a}(\tau,s,X_{i})=\tau-\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}). We estimate (Yi(a)qa(τ)|Si=s,Xi)\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}) by the sieve method of fitting a logistic model, as studied by Hirano et al. (2003). Specifically, recall λ()\lambda(\cdot) is the logistic CDF and denote the number of sieve bases by hnh_{n}, which depends on the sample size nn and can grow to infinity as nn\rightarrow\infty. Let Hhn(x)=(b1n(x),,bhnn(x))H_{h_{n}}(x)=(b_{1n}(x),\cdots,b_{h_{n}n}(x))^{\top} where {bhn(x)}h[hn]\{b_{hn}(x)\}_{h\in[h_{n}]} is an hnh_{n} dimensional basis of a linear sieve space. More details on the sieve space are given in Section B of the Online Supplement. Denote

m^a(τ,s,Xi)=τλ(Hhn(Xi)θ^a,sNP(τ))and\displaystyle\widehat{m}_{a}(\tau,s,X_{i})=\tau-\lambda(H_{h_{n}}^{\top}(X_{i})\hat{\theta}_{a,s}^{\textit{NP}}(\tau))\quad\text{and } (5.16)
θ^a,sNP(τ)=argmaxθa1na(s)iIa(s)[1{Yiq^a(τ)}log(λ(Hhn(Xi)θa))\displaystyle\hat{\theta}_{a,s}^{\textit{NP}}(\tau)=\operatorname*{arg\,max}_{\theta_{a}}\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\biggl{[}1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\log(\lambda(H_{h_{n}}^{\top}(X_{i})\theta_{a}))
+1{Yi>q^a(τ)}log(1λ(Hhn(Xi)θa))].\displaystyle\qquad\qquad\qquad\qquad\qquad\qquad+1\{Y_{i}>\hat{q}_{a}(\tau)\}\log(1-\lambda(H_{h_{n}}^{\top}(X_{i})\theta_{a}))\biggr{]}. (5.17)

We refer to the QTE estimator with the nonparametric adjustment as the NP estimator. Note that we use the estimator q^a(τ)\hat{q}_{a}(\tau) of qa(τ)q_{a}(\tau) in (5.17), where q^a(τ)\hat{q}_{a}(\tau) satisfies Assumption 8. All the analysis in this section takes account of the fact that q^a(τ)\hat{q}_{a}(\tau) instead of qa(τ)q_{a}(\tau) is used.
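The sieve logistic fit in (5.17) is an ordinary logistic MLE with the basis H_{h_n}(X_i) as regressors. A minimal numpy sketch via Newton-Raphson, assuming the basis matrix and the indicator 1{Y_i ≤ q̂_a(τ)} are given (a simplified illustration, not the paper's implementation):

```python
import numpy as np

def sieve_logit(H, below_qhat, n_iter=50):
    """Logistic MLE in (5.17): Newton-Raphson for the regression of
    1{Y_i <= qhat_a(tau)} on the sieve basis H (n x h_n)."""
    theta = np.zeros(H.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-H @ theta))       # lambda(H' theta)
        grad = H.T @ (below_qhat - p)              # score
        hess = (H * (p * (1 - p))[:, None]).T @ H  # observed information
        theta = theta + np.linalg.solve(hess, grad)
    return theta

def m_hat(H, theta, tau):
    """Nonparametric adjustment (5.16): tau - lambda(H' theta)."""
    return tau - 1.0 / (1.0 + np.exp(-H @ theta))
```

In a real application the fit is done separately for each arm a, stratum s, and quantile index τ, with q̂_a(τ) estimated first.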

Assumption 12.
  1. (i)

    There exist constants 0<κ1<κ2<0<\kappa_{1}<\kappa_{2}<\infty such that with probability approaching one,

    κ1λmin(1na(s)iIa(s)Hhn(Xi)Hhn(Xi))λmax(1na(s)iIa(s)Hhn(Xi)Hhn(Xi))κ2,\kappa_{1}\leq\lambda_{\min}\left(\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}H_{h_{n}}(X_{i})H_{h_{n}}^{\top}(X_{i})\right)\leq\lambda_{\max}\left(\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}H_{h_{n}}(X_{i})H_{h_{n}}^{\top}(X_{i})\right)\leq\kappa_{2},

    and

    κ1λmin(𝔼(Hhn(Xi)Hhn(Xi)|Si=s))λmax(𝔼(Hhn(Xi)Hhn(Xi)|Si=s))κ2.\kappa_{1}\leq\lambda_{\min}\left(\mathbb{E}(H_{h_{n}}(X_{i})H_{h_{n}}^{\top}(X_{i})|S_{i}=s)\right)\leq\lambda_{\max}\left(\mathbb{E}(H_{h_{n}}(X_{i})H_{h_{n}}^{\top}(X_{i})|S_{i}=s)\right)\leq\kappa_{2}.
  2. (ii)

    For a=0,1a=0,1, there exists an hn×1h_{n}\times 1 vector θa,sNP(τ)\theta_{a,s}^{\textit{NP}}(\tau) such that for Ra(τ,s,x)=(Yi(a)qa(τ)|Si=s,Xi=x)λ(Hhn(x)θa,sNP(τ))R_{a}(\tau,s,x)=\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}=x)-\lambda(H_{h_{n}}^{\top}(x)\theta_{a,s}^{\textit{NP}}(\tau)), we have supa=0,1,s𝒮,τΥ,xSupp(X)|Ra(τ,s,x)|=o(1)\sup_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon,x\in\text{Supp}(X)}|R_{a}(\tau,s,x)|=o(1),

    supa=0,1,τΥ,s𝒮1na(s)iIa(s)Ra2(τ,s,Xi)=Op(hnlog(n)n),\displaystyle\sup_{a=0,1,\tau\in\Upsilon,s\in\mathcal{S}}\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}R_{a}^{2}(\tau,s,X_{i})=O_{p}\left(\frac{h_{n}\log(n)}{n}\right),

    and

    supa=0,1,τΥ,s𝒮𝔼(Ra2(τ,s,Xi)|Si=s)=O(hnlog(n)n).\displaystyle\sup_{a=0,1,\tau\in\Upsilon,s\in\mathcal{S}}\mathbb{E}(R_{a}^{2}(\tau,s,X_{i})|S_{i}=s)=O\left(\frac{h_{n}\log(n)}{n}\right).
  3. (iii)

    For a=0,1a=0,1, there exists a constant c(0,0.5)c\in(0,0.5) such that

    c\displaystyle c\leq infa=0,1,s𝒮,τΥ,xSupp(X)(Yi(a)qa(τ)|Si=s,Xi=x)\displaystyle\inf_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon,x\in\text{Supp}(X)}\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}=x)
    \displaystyle\leq supa=0,1,s𝒮,τΥ,xSupp(X)(Yi(a)qa(τ)|Si=s,Xi=x)1c.\displaystyle\sup_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon,x\in\text{Supp}(X)}\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}=x)\leq 1-c.
  4. (iv)

    Suppose 𝔼(Hhn,h2(Xi)|Si=s)C<\mathbb{E}(H^{2}_{h_{n},h}(X_{i})|S_{i}=s)\leq C<\infty for some constant C>0C>0, supxSupp(X)Hhn(x)2ζ(hn)\sup_{x\in\text{Supp}(X)}||H_{h_{n}}(x)||_{2}\leq\zeta(h_{n}), ζ2(hn)hnlog(n)=o(n)\zeta^{2}(h_{n})h_{n}\log(n)=o(n), and hn2log2(n)=o(n)h_{n}^{2}\log^{2}(n)=o(n), where Hhn,h(Xi)H_{h_{n},h}(X_{i}) denotes the hhth coordinate of Hhn(Xi)H_{h_{n}}(X_{i}).

Four remarks are in order. First, Assumption 12(i) is standard in the sieve literature. Second, Assumption 12(ii) means the approximation error of the sieve logistic model vanishes asymptotically, which holds given sufficient smoothness of (Yi(a)qa(τ)|Si=s,Xi=x)\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}=x) in xx. Third, Assumption 12(iii) usually holds when Supp(X)\text{Supp}(X) is compact. This condition is also assumed by Hirano et al. (2003). Fourth, the quantity ζ(hn)\zeta(h_{n}) in Assumption 12(iv) depends on the choice of basis functions. For example, ζ(hn)=O(hn1/2)\zeta(h_{n})=O(h_{n}^{1/2}) for splines and ζ(hn)=O(hn)\zeta(h_{n})=O(h_{n}) for power series. Taking splines as an example, Assumption 12(iv) requires hn=o(n1/2)h_{n}=o(n^{1/2}).

Theorem 5.6.

Denote q^NP(τ)\hat{q}^{\textit{NP}}(\tau) and q^NP,w(τ)\hat{q}^{\textit{NP,w}}(\tau) as the τ\tauth QTE estimator and its multiplier bootstrap counterpart defined in Sections 3 and 4, respectively, with m¯a(τ,Si,Xi)=ma(τ,Si,Xi)\overline{m}_{a}(\tau,S_{i},X_{i})=m_{a}(\tau,S_{i},X_{i}) and m^a(τ,Si,Xi)\widehat{m}_{a}(\tau,S_{i},X_{i}) defined in (5.16). Further suppose Assumptions 1, 2, 4, 8, and 12 hold. Then, Assumptions 3 and 5 hold, which further implies that Theorems 3 and 5 hold for q^NP(τ)\hat{q}^{\textit{NP}}(\tau) and q^NP,w(τ)\hat{q}^{\textit{NP,w}}(\tau), respectively. In addition, for any finite set of quantile indices (τ1,,τK)(\tau_{1},\cdots,\tau_{K}), the covariance matrix of (q^NP(τ1),,q^NP(τK))(\hat{q}^{\textit{NP}}(\tau_{1}),\cdots,\hat{q}^{\textit{NP}}(\tau_{K})) achieves the minimum (in the matrix sense) as characterized in Theorem 3.

Three remarks are in order. First, as the nonparametric regression consistently estimates the true specifications {ma()}a=0,1\{m_{a}(\cdot)\}_{a=0,1}, the QTE estimator adjusted by the nonparametric regression achieves the global minimum asymptotic variance, and thus is weakly more efficient than QTE estimation with the linear and logistic adjustments studied in the previous section. Second, the practical implementations of the NP and ML methods are the same, given that they share the same set of covariates (basis functions). Therefore, even if we include a small number of basis functions so that hnh_{n} is better treated as fixed, the proposed estimation and inference methods for the regression-adjusted QTE estimator are still valid, although they may not be optimal. Third, in Section A of the Online Supplement, we consider computing m^a(τ,s,x)\widehat{m}_{a}(\tau,s,x) via an 1\ell_{1}-penalized logistic regression when the dimension of the regressors is comparable to or even larger than the sample size. We then provide primitive conditions under which we verify Assumptions 3 and 5.

6 Simulations

6.1 Data generating processes

Two DGPs are used to assess the finite sample performance of the estimation and inference methods introduced in the paper. We consider the outcome equation

Yi=α(Xi)+γZi+μ(Xi)Ai+ηi,\displaystyle Y_{i}=\alpha(X_{i})+\gamma Z_{i}+\mu(X_{i})A_{i}+\eta_{i}, (6.1)

where γ=4\gamma=4 for all cases while α(Xi)\alpha(X_{i}), μ(Xi)\mu(X_{i}), and ηi\eta_{i} are separately specified as follows.

  1. (i)

    Let ZZ be standardized Beta(2,2)(2,2) distributed, Si=j=141{Zigj}S_{i}=\sum_{j=1}^{4}1\{Z_{i}\leq g_{j}\}, and (g1,,g4)=(0.2520,0,0.2520,0.520)(g_{1},\cdots,g_{4})=(-0.25\sqrt{20},0,0.25\sqrt{20},0.5\sqrt{20}). XiX_{i} contains two covariates (X1i,X2i)(X_{1i},X_{2i})^{\top}, where X1iX_{1i} follows a uniform distribution on [2,2][-2,2], X2iX_{2i} follows a standard normal distribution, and X1iX_{1i} and X2iX_{2i} are independent. Further define α(Xi)=1+X2i\alpha(X_{i})=1+X_{2i}, μ(Xi)=1+Xiβ\mu(X_{i})=1+X_{i}^{\top}\beta, β=(3,3)\beta=(3,3)^{\top}, and ηi=(0.25+X1i2)Aiε1i+(1Ai)ε2i\eta_{i}=(0.25+X_{1i}^{2})A_{i}\varepsilon_{1i}+(1-A_{i})\varepsilon_{2i}, where (ε1i,ε2i)(\varepsilon_{1i},\varepsilon_{2i}) are jointly standard normal.

  2. (ii)

    Let ZZ be uniformly distributed on [2,2][-2,2], Si=j=141{Zigj}S_{i}=\sum_{j=1}^{4}1\{Z_{i}\leq g_{j}\}, and (g1,,g4)=(1,0,1,2)(g_{1},\cdots,g_{4})=(-1,0,1,2). Let Xi=(X1i,X2i)X_{i}=(X_{1i},X_{2i})^{\top} be the same as defined in DGP (i). Further define α(Xi)=1+X1i+X2i\alpha(X_{i})=1+X_{1i}+X_{2i}, μ(Xi)=1+X1i+X2i+14(Xiβ)2\mu(X_{i})=1+X_{1i}+X_{2i}+\frac{1}{4}(X_{i}^{\top}\beta)^{2} with β=(2,2)\beta=(2,2)^{\top}, and ηi=2(1+Zi2)Aiε1i+(1+Zi2)(1Ai)ε2i\eta_{i}=2(1+Z_{i}^{2})A_{i}\varepsilon_{1i}+(1+Z_{i}^{2})(1-A_{i})\varepsilon_{2i}, where (ε1i,ε2i)(\varepsilon_{1i},\varepsilon_{2i}) are mutually independently T(5)/5T(5)/\sqrt{5} distributed.
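As an illustration, DGP (i) can be simulated as follows. This numpy sketch assigns treatment by simple random sampling with π(s)=0.5; the CAR schemes listed below would replace that step. The independence of (ε1i, ε2i) in the draw is an implementation choice here:

```python
import numpy as np

def dgp1(n, rng):
    """One sample from DGP (i): outcome equation (6.1) with gamma = 4,
    alpha(X) = 1 + X2, mu(X) = 1 + 3*X1 + 3*X2, and heteroskedastic
    noise. Treatment is assigned by SRS with pi(s) = 0.5."""
    Z = (rng.beta(2, 2, n) - 0.5) * np.sqrt(20)        # standardized Beta(2,2)
    g = np.array([-0.25, 0.0, 0.25, 0.5]) * np.sqrt(20)
    S = (Z[:, None] <= g).sum(axis=1)                  # S_i = sum_j 1{Z_i <= g_j}
    X1 = rng.uniform(-2, 2, n)
    X2 = rng.standard_normal(n)
    A = (rng.random(n) < 0.5).astype(float)            # SRS assignment
    e1, e2 = rng.standard_normal(n), rng.standard_normal(n)
    eta = (0.25 + X1**2) * A * e1 + (1 - A) * e2
    Y = (1 + X2) + 4 * Z + (1 + 3 * X1 + 3 * X2) * A + eta
    return Y, A, S, np.column_stack([X1, X2])
```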

For each DGP, we consider the following four randomization schemes as in Zhang and Zheng (2020) with π(s)=0.5\pi(s)=0.5 for s𝒮s\in\mathcal{S}:

  1. (i)

    SRS: Treatment assignment is generated as in Example 1.

  2. (ii)

    WEI: Treatment assignment is generated as in Example 2 with ϕ(x)=(1x)/2\phi(x)=(1-x)/2.

  3. (iii)

    BCD: Treatment assignment is generated as in Example 3 with λ=0.75\lambda=0.75.

  4. (iv)

    SBR: Treatment assignment is generated as in Example 4.
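Example 3 is defined earlier in the paper; as an illustration only, the sketch below implements the classical Efron (1971) biased-coin rule within each stratum, which is the standard form such a design takes, with λ=0.75:

```python
import numpy as np

def bcd_assign(S, lam=0.75, rng=None):
    """Sequential within-stratum biased-coin assignment (Efron, 1971):
    toss a fair coin when the stratum is balanced so far; otherwise
    assign the under-represented arm with probability lam."""
    if rng is None:
        rng = np.random.default_rng()
    imbalance = {}                     # running D(s) = #treated - #control
    A = np.empty(len(S), dtype=int)
    for i, s in enumerate(S):
        d = imbalance.get(s, 0)
        p = 0.5 if d == 0 else (lam if d < 0 else 1 - lam)
        A[i] = int(rng.random() < p)
        imbalance[s] = d + 2 * A[i] - 1
    return A
```

With λ>0.5 the within-stratum imbalance behaves like a random walk drifting back toward zero, so treated fractions stay close to the target π(s)=0.5.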

We assess the empirical size and power of the tests for n=200n=200 and n=400n=400. We compute the true QTEs and their differences by simulation with a sample size of 10,000 and 1,000 replications. To compute power, we perturb the true values by Δ=1.5\Delta=1.5. We examine three null hypotheses:

  1. (i)

    Pointwise test

H0:q(τ)=truthvs.H1:q(τ)=truth+Δ,τ=0.25,0.5,0.75;H_{0}:q(\tau)=\text{truth}\quad\text{vs.}\quad H_{1}:q(\tau)=\text{truth}+\Delta,\quad\tau=0.25,0.5,0.75;
  2. (ii)

    Test for the difference

H0:q(0.75)q(0.25)=truthvs.H1:q(0.75)q(0.25)=truth+Δ;H_{0}:q(0.75)-q(0.25)=\text{truth}\quad\text{vs.}\quad H_{1}:q(0.75)-q(0.25)=\text{truth}+\Delta;
  3. (iii)

    Uniform test

H0:q(τ)=truth(τ)vs.H1:q(τ)=truth(τ)+Δ,τ[0.25,0.75].H_{0}:q(\tau)=\text{truth}(\tau)\quad\text{vs.}\quad H_{1}:q(\tau)=\text{truth}(\tau)+\Delta,\quad\tau\in[0.25,0.75].

For the pointwise test, we report the results for the median (τ=0.5\tau=0.5) in the main text and give the cases τ=0.25\tau=0.25 and τ=0.75\tau=0.75 in the Online Supplement.

6.2 Estimation methods

We consider the following estimation methods of the auxiliary regression.

  1. (i)

    NA: the estimator with no adjustments, i.e., setting m^a()=m¯a()=0\widehat{m}_{a}(\cdot)=\overline{m}_{a}(\cdot)=0.

  2. (ii)

    LP: the linear probability model with regressors XiX_{i} and the pseudo true value estimated by θ^a,sLP(τ)\hat{\theta}_{a,s}^{\textit{LP}}(\tau) defined in (5.7).

  3. (iii)

    ML: the logistic model with regressor Hi=(1,X1i,X2i)H_{i}=(1,X_{1i},X_{2i})^{\top} and the pseudo true value estimated by θ^a,sML(τ)\hat{\theta}_{a,s}^{\textit{ML}}(\tau) defined in (5.8).

  4. (iv)

    LPML: the logistic model with regressor Hi=(1,X1i,X2i)H_{i}=(1,X_{1i},X_{2i})^{\top} and the pseudo true value estimated by θ^a,sLPML(τ)\hat{\theta}_{a,s}^{\textit{LPML}}(\tau) defined in (5.15).

  5. (v)

    MLX: the logistic model with regressor Hi=(1,X1i,X2i,X1iX2i)H_{i}=(1,X_{1i},X_{2i},X_{1i}X_{2i})^{\top} and the pseudo true value estimated by θ^a,sML(τ)\hat{\theta}_{a,s}^{\textit{ML}}(\tau) defined in (5.8).

  6. (vi)

    LPMLX: the logistic model with regressor Hi=(1,X1i,X2i,X1iX2i)H_{i}=(1,X_{1i},X_{2i},X_{1i}X_{2i})^{\top} and the pseudo true value estimated by θ^a,sLPML(τ)\hat{\theta}_{a,s}^{\textit{LPML}}(\tau) defined in (5.15).

  7. (vii)

    NP: the logistic model with regressor Hhn(Xi)=(1,X1i,X2i,X1iX2i,X1i1{X1i>t1}X2i1{X2i>t2})H_{h_{n}}(X_{i})=(1,X_{1i},X_{2i},X_{1i}X_{2i},X_{1i}1\{X_{1i}>t_{1}\}X_{2i}1\{X_{2i}>t_{2}\})^{\top} where t1t_{1} and t2t_{2} are the sample medians of {X1i}i[n]\{X_{1i}\}_{i\in[n]} and {X2i}i[n]\{X_{2i}\}_{i\in[n]}, respectively. The pseudo true value is estimated by θ^a,sNP(τ)\hat{\theta}_{a,s}^{\textit{NP}}(\tau) defined in (5.17).
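The regressor vectors used by these adjustments can be assembled as follows (a small numpy sketch mirroring the definitions above; function names are illustrative):

```python
import numpy as np

def basis_ml(X):
    """H_i = (1, X1, X2) for the ML/LPML adjustments."""
    return np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1]])

def basis_mlx(X):
    """H_i = (1, X1, X2, X1*X2) for the MLX/LPMLX adjustments."""
    return np.column_stack([basis_ml(X), X[:, 0] * X[:, 1]])

def basis_np(X):
    """NP basis: adds X1*1{X1>t1} * X2*1{X2>t2} with t1, t2 the
    sample medians, as in the simulation design."""
    t1, t2 = np.median(X[:, 0]), np.median(X[:, 1])
    extra = X[:, 0] * (X[:, 0] > t1) * X[:, 1] * (X[:, 1] > t2)
    return np.column_stack([basis_mlx(X), extra])
```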

6.3 Simulation results

Table 1 presents the empirical size and power for the pointwise test with τ=0.5\tau=0.5 under DGPs (i) and (ii). We make six observations. First, none of the auxiliary regressions is correctly specified, but test sizes are all close to the nominal level 5%, confirming that estimation and inference are robust to misspecification. Second, the inclusion of auxiliary regressions improves the efficiency of the QTE estimator, as the powers for method “NA” are the lowest among all the methods for both DGPs and all randomization schemes. This finding is consistent with theory because methods “LP”, “LPML”, “LPMLX”, and “NP” are guaranteed to be weakly more efficient than “NA”. Third, the powers of methods “LPML” and “LPMLX” are higher than those of methods “ML” and “MLX”, respectively. This is consistent with our theory that methods “LPML” and “LPMLX” further improve “ML” and “MLX”, respectively. In addition, methods “MLX” and “LPMLX” fit a flexible distribution regression that can approximate the true DGP well. Therefore, the powers of “MLX” and “LPMLX” are respectively much larger than those of “ML” and “LPML”. For the same reason, we observe that the power of “LPMLX” is close to that of “NP”.888The results in Section C of the Online Supplement show that “LPMLX” has much smaller bias than “NP” and a variance similar to “NP”, which makes “LPMLX” preferable in practice. Fourth, the powers of method “NP” are the best because it estimates the true specification and achieves the minimum asymptotic variance of q^adj(τ)\hat{q}^{adj}(\tau) as shown in Theorem 5.1. Fifth, when the sample size is 200, the method “NP” slightly over-rejects, but its size becomes closer to the nominal level when the sample size increases to 400. Sixth, the improvement in power of the “LPMLX” estimator over “NA” (i.e., with no adjustments) reflects an average 12-15% reduction in the standard error of the QTE estimator.999The biases and standard errors are reported in Section C of the Online Supplement.

Table 1: Pointwise Test (τ=0.5\tau=0.5)
Size Power
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.055 0.054 0.050 0.054 0.051 0.054 0.051 0.051 0.404 0.406 0.403 0.406 0.665 0.676 0.681 0.681
LP 0.052 0.050 0.049 0.052 0.048 0.053 0.051 0.052 0.491 0.497 0.502 0.492 0.779 0.788 0.790 0.791
ML 0.053 0.050 0.049 0.055 0.051 0.050 0.052 0.052 0.472 0.478 0.483 0.473 0.759 0.768 0.775 0.773
LPML 0.054 0.052 0.052 0.057 0.052 0.054 0.051 0.053 0.506 0.509 0.523 0.513 0.802 0.812 0.814 0.809
MLX 0.056 0.059 0.055 0.057 0.055 0.054 0.055 0.058 0.475 0.479 0.486 0.482 0.752 0.759 0.760 0.760
LPMLX 0.060 0.058 0.059 0.058 0.054 0.055 0.054 0.054 0.506 0.513 0.521 0.512 0.802 0.810 0.813 0.811
NP 0.063 0.059 0.062 0.064 0.055 0.054 0.054 0.056 0.523 0.523 0.531 0.526 0.804 0.811 0.814 0.809
Panel B: DGP (ii)
NA 0.046 0.051 0.045 0.047 0.047 0.045 0.048 0.047 0.479 0.489 0.500 0.490 0.773 0.775 0.774 0.782
LP 0.049 0.051 0.050 0.050 0.045 0.048 0.050 0.045 0.572 0.581 0.589 0.579 0.851 0.856 0.857 0.854
ML 0.051 0.058 0.050 0.054 0.049 0.046 0.050 0.048 0.524 0.534 0.541 0.539 0.812 0.810 0.807 0.807
LPML 0.051 0.058 0.054 0.053 0.050 0.049 0.053 0.047 0.574 0.581 0.588 0.580 0.862 0.863 0.863 0.863
MLX 0.058 0.059 0.056 0.059 0.051 0.049 0.051 0.050 0.566 0.574 0.583 0.573 0.826 0.824 0.827 0.827
LPMLX 0.057 0.062 0.057 0.060 0.052 0.050 0.053 0.052 0.615 0.620 0.630 0.627 0.878 0.878 0.880 0.879
NP 0.063 0.066 0.062 0.062 0.056 0.055 0.056 0.051 0.622 0.625 0.632 0.628 0.883 0.880 0.882 0.879

Tables 2 and 3 present sizes and powers of inference on q(0.75)q(0.25)q(0.75)-q(0.25) and on q(τ)q(\tau) uniformly over τ[0.25,0.75]\tau\in[0.25,0.75], respectively, for DGPs (i) and (ii) and the four randomization schemes. All the observations made above apply to these results. The improvement in power of the “LPMLX” estimator over “NA” (i.e., with no adjustments) reflects an average 9% reduction in the standard error of the difference of the QTE estimators. In Section C of the Online Supplement, we provide additional simulation results such as the empirical sizes and powers for the pointwise test with τ=0.25\tau=0.25 and 0.750.75, the bootstrap inference with the true target fraction, and the adjusted QTE estimator when the DGP contains high-dimensional covariates and the adjustments are computed via logistic Lasso. We also report the biases and standard errors of the adjusted QTE estimators.

Table 2: Test for Differences (τ1=0.25\tau_{1}=0.25, τ2=0.75\tau_{2}=0.75)
Size Power
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.043 0.045 0.040 0.041 0.044 0.043 0.041 0.043 0.214 0.216 0.209 0.203 0.387 0.389 0.383 0.365
LP 0.045 0.048 0.043 0.045 0.045 0.047 0.043 0.045 0.246 0.242 0.234 0.248 0.424 0.422 0.422 0.421
ML 0.045 0.045 0.043 0.042 0.046 0.047 0.040 0.048 0.234 0.233 0.231 0.239 0.415 0.422 0.417 0.426
LPML 0.044 0.049 0.045 0.045 0.049 0.049 0.044 0.047 0.250 0.250 0.248 0.259 0.451 0.453 0.450 0.459
MLX 0.046 0.052 0.046 0.047 0.047 0.047 0.044 0.049 0.232 0.234 0.229 0.241 0.415 0.415 0.404 0.416
LPMLX 0.049 0.055 0.047 0.047 0.049 0.050 0.047 0.047 0.247 0.249 0.249 0.258 0.445 0.453 0.445 0.453
NP 0.050 0.054 0.050 0.051 0.052 0.052 0.047 0.048 0.246 0.248 0.245 0.257 0.444 0.444 0.442 0.450
Panel B: DGP (ii)
NA 0.039 0.044 0.040 0.038 0.044 0.041 0.039 0.047 0.211 0.225 0.217 0.194 0.399 0.396 0.392 0.383
LP 0.043 0.048 0.045 0.040 0.045 0.044 0.042 0.047 0.244 0.255 0.251 0.245 0.447 0.440 0.441 0.455
ML 0.049 0.046 0.046 0.043 0.044 0.045 0.042 0.048 0.217 0.228 0.213 0.212 0.379 0.386 0.386 0.396
LPML 0.047 0.051 0.048 0.043 0.047 0.045 0.047 0.048 0.253 0.258 0.253 0.252 0.456 0.451 0.454 0.468
MLX 0.047 0.051 0.047 0.047 0.046 0.046 0.045 0.049 0.226 0.240 0.228 0.223 0.394 0.392 0.391 0.399
LPMLX 0.053 0.056 0.051 0.048 0.051 0.049 0.045 0.050 0.261 0.272 0.265 0.263 0.467 0.460 0.460 0.477
NP 0.056 0.058 0.053 0.052 0.051 0.052 0.045 0.050 0.266 0.275 0.266 0.270 0.469 0.459 0.461 0.479
Table 3: Uniform Test (τ[0.25,0.75]\tau\in[0.25,0.75])
Size Power
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.048 0.044 0.044 0.045 0.047 0.049 0.045 0.048 0.450 0.451 0.455 0.454 0.765 0.770 0.769 0.770
LP 0.045 0.044 0.043 0.045 0.047 0.051 0.047 0.046 0.589 0.588 0.589 0.581 0.902 0.901 0.904 0.900
ML 0.047 0.044 0.043 0.045 0.044 0.051 0.045 0.047 0.570 0.577 0.582 0.568 0.887 0.889 0.893 0.890
LPML 0.046 0.046 0.045 0.047 0.046 0.050 0.046 0.051 0.603 0.605 0.616 0.607 0.916 0.917 0.915 0.915
MLX 0.052 0.049 0.048 0.048 0.046 0.053 0.050 0.050 0.582 0.582 0.595 0.576 0.889 0.893 0.891 0.889
LPMLX 0.053 0.047 0.049 0.052 0.047 0.053 0.050 0.050 0.612 0.614 0.619 0.610 0.915 0.919 0.919 0.913
NP 0.056 0.055 0.054 0.055 0.050 0.057 0.052 0.054 0.633 0.627 0.633 0.629 0.916 0.919 0.918 0.915
Panel B: DGP (ii)
NA 0.038 0.039 0.039 0.038 0.045 0.039 0.040 0.045 0.572 0.571 0.579 0.574 0.878 0.882 0.879 0.879
LP 0.041 0.044 0.045 0.041 0.044 0.043 0.039 0.042 0.704 0.708 0.710 0.700 0.953 0.955 0.956 0.955
ML 0.044 0.043 0.048 0.041 0.047 0.045 0.043 0.044 0.661 0.660 0.664 0.655 0.931 0.931 0.933 0.935
LPML 0.047 0.046 0.048 0.044 0.047 0.046 0.041 0.046 0.723 0.714 0.720 0.714 0.964 0.963 0.965 0.964
MLX 0.052 0.050 0.052 0.049 0.048 0.046 0.045 0.045 0.703 0.710 0.708 0.704 0.946 0.949 0.946 0.951
LPMLX 0.056 0.054 0.054 0.051 0.052 0.048 0.046 0.048 0.761 0.761 0.766 0.754 0.972 0.972 0.972 0.974
NP 0.060 0.060 0.062 0.058 0.055 0.052 0.047 0.051 0.770 0.771 0.773 0.765 0.973 0.974 0.972 0.974

6.4 Practical recommendations

When XX is finite-dimensional, we suggest using the LPMLX adjustment in which the logistic model includes interaction terms and the regression coefficients are allowed to depend on (τ,a,s)(\tau,a,s). When XX is high-dimensional, we suggest using the logistic Lasso to estimate the regression adjustment.101010The relevant theory and simulation results on high-dimensional covariates are provided in Section A of the Online Supplement.

7 Empirical Application

Undersaving has been found to have important individual and social welfare consequences (Karlan et al., 2014). Does expanding access to bank accounts for the poor lead to an overall increase in savings? To answer the question, Dupas et al. (2018) conducted a covariate-adaptive randomized experiment in Uganda, Malawi, and Chile to study the impact of a bank account subsidy on savings. In their paper, the authors examined the ATEs as well as the QTEs of the subsidy. This section reports an application of our methods to the same dataset to examine the QTEs of the subsidy on household total savings in Uganda.

The sample consists of 2160 households in Uganda.111111We filter out observations with missing values. Our final sample contains 1952 households. Within each of 41 strata defined by gender, occupation, and bank branch, 50 percent of the households in the sample were randomly assigned to receive the bank account subsidy, and the rest were assigned to the control group. This is a stratified block randomization design with 41 strata, which satisfies Assumption 1 in Section 2. The target fraction of treated units is 1/2. Statements (i), (ii), and (iii) in Assumption 1 clearly hold. Because maxs𝒮|Dn(s)n(s)|0.056\max_{s\in\mathcal{S}}|\frac{D_{n}(s)}{n(s)}|\approx 0.056, it is reasonable to claim that Assumption 1(iv) is also satisfied in our analysis.
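The imbalance diagnostic reported here can be computed directly from the assignment and stratum vectors. A one-function sketch, assuming the usual definition in this literature, D_n(s) = Σ_{i: S_i=s}(A_i − π(s)):

```python
import numpy as np

def max_relative_imbalance(A, S, pi=0.5):
    """max over strata of |D_n(s)| / n(s), taking D_n(s) to be the sum
    over {i : S_i = s} of (A_i - pi) (an assumed definition)."""
    return max(abs((A[S == s] - pi).mean()) for s in np.unique(S))
```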

After the randomization and the intervention, the authors conducted three rounds of follow-up surveys in Uganda (see Dupas et al. (2018) for a detailed description). In this section, we focus on the first-round follow-up survey to examine the impact of the bank account subsidy on total savings.

Tables 4 and 5 present the QTE estimates and their standard errors (in parentheses) estimated by different methods at quantile indices 0.25, 0.5, and 0.75. The description of these estimators is similar to that in Section 6.121212Specifically, we have: (i) NA: the estimator with no adjustments. (ii) LP: the linear probability model. When there is only one auxiliary regressor, Hi=(1,X1i)H_{i}=(1,X_{1i})^{\top}, and when there are four auxiliary regressors, Hi=(1,X1i,X2i,X3i,X4i)H_{i}=(1,X_{1i},X_{2i},X_{3i},X_{4i})^{\top}, where X1i,X2i,X3i,X4iX_{1i},X_{2i},X_{3i},X_{4i} represent the four covariates used in the regression adjustment. (iii) ML: the logistic probability model with regressor HiH_{i}, where HiH_{i} is the same as that in the LP model. (iv) LPML: the further improved logistic probability model with regressor HiH_{i}, where HiH_{i} is the same as that in the LP model. (v) MLX: the logistic probability model with interaction terms. MLX is only applied to the case with four auxiliary regressors, with Hi=(1,X1i,X2i,X3i,X4i,X1iX2i,X2iX3i)H_{i}=(1,X_{1i},X_{2i},X_{3i},X_{4i},X_{1i}X_{2i},X_{2i}X_{3i})^{\top}. (vi) LPMLX: the further improved logistic probability model with interaction terms. LPMLX is only applied to the case with four auxiliary regressors, with the same HiH_{i} as that used in the MLX model. (vii) NP: the nonparametric logistic probability model with regressor HhnH_{h_{n}}. NP is only applied to the case with four auxiliary regressors, with Hhn=(1,X1i,X2i,X3i,X4i,X1iX2i,X2iX3i,X1i1{X1i>t1},X2i1{X2i>t2},X1i1{X1i>t1}X2i1{X2i>t2})H_{h_{n}}=(1,X_{1i},X_{2i},X_{3i},X_{4i},X_{1i}X_{2i},X_{2i}X_{3i},X_{1i}1\{X_{1i}>t_{1}\},X_{2i}1\{X_{2i}>t_{2}\},X_{1i}1\{X_{1i}>t_{1}\}X_{2i}1\{X_{2i}>t_{2}\})^{\top} where t1t_{1} and t2t_{2} are the sample medians of {X1i}i[n]\{X_{1i}\}_{i\in[n]} and {X2i}i[n]\{X_{2i}\}_{i\in[n]}, respectively.
(viii) Lasso: the logistic probability model with regressor HpnH_{p_{n}} and post-Lasso coefficient estimator θ^a,spost(τ)\hat{\theta}_{a,s}^{post}(\tau). Lasso is only applied to the case with four auxiliary regressors, with Hpn(Xi)=(1,X1i,X2i,X3i,X4i,X1i2,X2i2,X3i2,X1iX2i,X2iX3i,X1i1{X1i>t1},X2i1{X2i>t2},X1i1{X1i>t1}X2i1{X2i>t2})H_{p_{n}}(X_{i})=(1,X_{1i},X_{2i},X_{3i},X_{4i},X_{1i}^{2},X_{2i}^{2},X_{3i}^{2},X_{1i}X_{2i},X_{2i}X_{3i},X_{1i}1\{X_{1i}>t_{1}\},X_{2i}1\{X_{2i}>t_{2}\},X_{1i}1\{X_{1i}>t_{1}\}X_{2i}1\{X_{2i}>t_{2}\})^{\top}. The post-Lasso estimator θ^a,spost(τ)\hat{\theta}_{a,s}^{post}(\tau) is defined in (A.2). The choice of tuning parameter and the estimation procedure are detailed in Section B.3. In the analysis, we focus on two sets of additional baseline variables: the baseline value of total savings only (one auxiliary regressor), and the baseline value of total savings, household size, age, and a married female dummy (four auxiliary regressors). The first set of regressors follows Dupas et al. (2018). The second is used to illustrate all the methods discussed in the paper. Tables 4 and 5 report the results with one and four auxiliary regressors, respectively.

Table 4: QTEs on Total Savings (one auxiliary regressor)
NA LP ML LPML
25% 1.105 1.105 1.105 1.105
(0.564) (0.564) (0.470) (0.470)
50% 3.682 3.682 3.682 3.682
(1.010) (1.080) (1.146) (1.033)
75% 7.363 9.204 9.204 9.204
(3.757) (4.227) (3.616) (3.757)

Notes: The table presents the QTE estimates of the effect of the bank account subsidy on household total savings at quantiles 25%, 50%, and 75% when only one auxiliary regressor (baseline value of total savings) is used in the regression adjustment models. Standard errors are in parentheses.

Table 5: QTEs on Total Savings (four auxiliary regressors)
NA LP ML LPML MLX LPMLX NP Lasso
25% 1.105 1.473 1.105 1.105 1.105 1.105 1.105 1.105
(0.564) (0.564) (0.564) (0.564) (0.357) (0.319) (0.188) (0.564)
50% 3.682 3.682 3.682 3.682 3.682 3.682 3.682 3.682
(1.010) (1.033) (0.939) (0.939) (0.958) (1.033) (0.939) (0.939)
75% 7.363 8.100 7.363 7.363 7.363 7.363 7.363 7.363
(3.757) (3.757) (3.757) (3.569) (3.757) (3.663) (3.663) (3.757)

Notes: The table shows QTE estimates of the effect of the bank account subsidy on household total savings at quantiles 25%, 50%, and 75% when four auxiliary regressors (baseline value of total savings, household size, age, and married female dummy) are used in the regression adjustment models. Standard errors are in parentheses.

Table 6: Test for the Difference between Two QTEs on Total Savings
NA LP ML LPML MLX LPMLX NP Lasso
50%25%50\%-25\% 2.577 2.209 2.577 2.577 2.577 2.577 2.577 2.577
(0.939) (1.104) (0.939) (0.939) (0.958) (1.033) (0.845) (0.911)
75%50%75\%-50\% 3.682 4.418 3.682 3.682 3.682 3.682 3.682 3.682
(3.757) (3.663) (3.663) (3.287) (3.475) (3.287) (3.663) (3.757)
75%25%75\%-25\% 6.259 6.627 6.259 6.259 6.259 6.259 6.259 6.259
(3.851) (3.757) (3.757) (3.695) (3.588) (3.569) (3.287) (3.832)

Notes: The table presents tests for the difference between two QTE estimates of the effect of the bank account subsidy on household total savings when there are four auxiliary regressors: baseline value of total savings, household size, age, and married female dummy. Standard errors are in parentheses.

Figure 1: Quantile Treatment Effects on the Distribution of Total Savings

Notes: The graphs in each panel of the figure plot the QTE estimates of the effect of the bank account subsidy on the distribution of household total savings when there are four auxiliary regressors: baseline value of total savings, household size, age, and married female dummy. The shadowed areas display 95% confidence regions.

The results in Tables 4-5 prompt two observations. First, consistent with the theoretical and simulation results, the standard errors for the regression-adjusted QTEs are mostly lower than those for the QTE estimate without adjustment. This observation holds for most specifications and estimation methods of the auxiliary regression.131313The efficiency gain from the “NP” adjustment is not the only reason for its small standard error at the 25% QTE. Another reason is that the treated outcomes around this percentile themselves exhibit little variation. For example, in Table 4, the standard error for the “LPML” QTE estimate at the 25th percentile is 16.7% smaller than that for the QTE estimate without adjustment. Similarly, in Table 5, at the 25th percentile, the standard error for the “LPMLX” QTE estimate is 43.4% smaller than that for the QTE estimate without adjustment, and at the median, the standard error for the “LPML” QTE estimate is 7% smaller than that for the QTE estimate without adjustment.

Second, there is substantial heterogeneity in the impact of the subsidy on total savings. In particular, we observe larger effects as the quantile indexes increase, which is consistent with the findings in Dupas et al. (2018). For example, Table 5 shows that, although the treatment effects are all positive and significantly different from zero at quantiles 25%, 50%, and 75%, the magnitude of the effects increases by over 200% from the 25th percentile to the median and by around 100% from the median to the 75th percentile.

The second observation suggests that the heterogeneous effects of the subsidy on savings are economically sizable. To evaluate whether these effects are statistically significant, we report tests for the heterogeneity of the QTEs in Table 6. Specifically, we test the null hypotheses H_{0}:q(0.5)-q(0.25)=0, H_{0}:q(0.75)-q(0.5)=0, and H_{0}:q(0.75)-q(0.25)=0. Table 6 shows that only the difference between the 50% and 25% QTEs is statistically significant at the 5% significance level.

How does the impact of the subsidy vary across the distribution of total savings? The QTEs on the distribution of savings are plotted in Figure 1, where the shaded areas represent the 95% confidence region. The figure shows that the QTEs are insignificantly different from zero below roughly the 20th percentile. From there up to about the 80th percentile, treatment-group savings exceed control-group savings at an accelerating rate, yielding increasingly significant positive QTEs. Beyond the 80th percentile, the QTEs again become insignificantly different from zero. These findings point to notable distributional heterogeneity in the impact of the subsidy on savings.

8 Conclusion

This paper proposes the use of auxiliary regressions to incorporate additional covariates into estimation and inference relating to unconditional QTEs under CARs. The auxiliary regression model may be estimated parametrically, nonparametrically, or via regularization when the covariates are high-dimensional. Both the estimation and bootstrap inference methods are robust to potential misspecification of the auxiliary model and do not suffer from conservatism due to the CAR. It is shown that efficiency can be improved by including extra covariates, and when the auxiliary regression is correctly specified, the regression-adjusted estimator further achieves the minimum asymptotic variance. In both the simulations and the empirical application, the proposed regression-adjusted QTE estimator performs well. These results and the robustness of the methods to auxiliary model misspecification reflect the aphorism widespread in scientific modeling that all models may be wrong, but some are useful.14 (The aphorism “all models are wrong, but some are useful” is often attributed to the statistician George Box (1976), but the notion has many antecedents, including a particularly apposite remark made in 1947 by John von Neumann (2019), in an essay on the empirical origins of mathematical ideas, to the effect that “truth … is much too complicated to allow anything but approximations”.)

Acknowledgements

We thank the Managing Editor, Elie Tamer, the Associate Editor and three anonymous referees for many useful comments that helped to improve this paper. We are also grateful to Michael Qingliang Fan and seminar participants from the 2022 Econometric Society Australasian Meeting, the 2021 Nanyang Econometrics Workshop, University of California, Irvine, and University of Sydney for their comments.

Funding:

Yichong Zhang acknowledges financial support from the Singapore Ministry of Education under Tier 2 grant No. MOE2018-T2-2-169, the NSFC under grant No. 72133002, and a Lee Kong Chian fellowship. Peter C. B. Phillips acknowledges support from NSF Grant No. SES 18-50860, a Kelly Fellowship at the University of Auckland, and a Lee Kong Chian Fellowship. Yubo Tao acknowledges financial support from the Start-up Research Grant of the University of Macau (SRG2022-00016-FSS). Liang Jiang acknowledges support from the MOE (Ministry of Education in China) Project of Humanities and Social Sciences (Project No. 18YJC790063).

Appendix A Regularization Method for Regression Adjustments

This section considers estimation of m_{a}(\tau,s,X) in a high-dimensional environment. Let H_{p_{n}}(X_{i}) be the regressors with dimension p_{n}, which may exceed the sample size. When the number of raw controls is comparable to or exceeds the sample size, we can simply let H_{p_{n}}(X_{i})=X_{i}. Alternatively, H_{p_{n}}(X_{i}) may be composed of a large dictionary of sieve bases derived from a fixed-dimensional vector X_{i} through suitable transformations such as powers and interactions, so high dimensionality in H_{p_{n}}(X_{i}) can arise from the desire to flexibly approximate nuisance functions. In our approach we follow Belloni et al. (2017) and implement a logistic regression with \ell_{1}-penalization. In their notation, we view m_{a}(\tau,s,x) as a function of q_{a}(\tau), i.e., m_{a}(\tau,s,x)=\tau-\mathcal{M}_{a}(q_{a}(\tau),s,x), where \mathcal{M}_{a}(q,s,x)=\mathbb{P}(Y_{i}(a)\leq q|S_{i}=s,X_{i}=x). We estimate \mathcal{M}_{a}(q_{a}(\tau),s,X_{i}) by \lambda(H_{p_{n}}(X_{i})^{\top}\hat{\theta}_{a,s}^{\textit{HD}}(\hat{q}_{a}(\tau))), where \hat{q}_{a}(\tau) is defined in Assumption 8,

\hat{\theta}_{a,s}^{\textit{HD}}(q)=\operatorname*{arg\,min}_{\theta_{a}}\frac{-1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\biggl[1\{Y_{i}\leq q\}\log(\lambda(H_{p_{n}}(X_{i})^{\top}\theta_{a}))
+1\{Y_{i}>q\}\log(1-\lambda(H_{p_{n}}(X_{i})^{\top}\theta_{a}))\biggr]+\frac{\varrho_{n,a}(s)}{n_{a}(s)}||\hat{\Omega}\theta_{a}||_{1}, (A.1)

\varrho_{n,a}(s) is a tuning parameter, and \hat{\Omega}=\text{diag}(\hat{\omega}_{1},\cdots,\hat{\omega}_{p_{n}}) is a diagonal matrix of data-dependent penalty loadings. We specify \varrho_{n,a}(s) and \hat{\Omega} in Section B. Post-Lasso estimation is also considered. Let \hat{\mathbb{S}}_{a,s}(q)=\{h\in[p_{n}]:\hat{\theta}_{a,s,h}^{\textit{HD}}(q)\neq 0\} be the support of \hat{\theta}_{a,s}^{\textit{HD}}(q), where \hat{\theta}_{a,s,h}^{\textit{HD}}(q) is the hth coordinate of \hat{\theta}_{a,s}^{\textit{HD}}(q). We can complement \hat{\mathbb{S}}_{a,s}(q) with additional variables in \hat{\mathbb{S}}^{+}_{a,s}(q) that researchers want to control for and define the enlarged set of variables as \tilde{\mathbb{S}}_{a,s}(q)=\hat{\mathbb{S}}_{a,s}(q)\cup\hat{\mathbb{S}}^{+}_{a,s}(q). We compute the post-Lasso estimator \hat{\theta}_{a,s}^{post}(q) as

\hat{\theta}_{a,s}^{post}(q)=\operatorname*{arg\,min}_{\theta_{a}\in\tilde{\mathbb{S}}_{a,s}(q)}\frac{-1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\biggl[1\{Y_{i}\leq q\}\log(\lambda(H_{p_{n}}(X_{i})^{\top}\theta_{a}))
+1\{Y_{i}>q\}\log(1-\lambda(H_{p_{n}}(X_{i})^{\top}\theta_{a}))\biggr]. (A.2)

Finally, we compute the auxiliary model as

\widehat{m}_{a}(\tau,s,X_{i})=\lambda(H_{p_{n}}^{\top}(X_{i})\hat{\theta}_{a,s}^{\textit{HD}}(\hat{q}_{a}(\tau)))\quad\text{or}\quad\widehat{m}_{a}(\tau,s,X_{i})=\lambda(H_{p_{n}}^{\top}(X_{i})\hat{\theta}_{a,s}^{post}(\hat{q}_{a}(\tau))). (A.3)

We refer to the QTE estimator with the regularized adjustment as the HD estimator. Note that we use the estimator \hat{q}_{a}(\tau) of q_{a}(\tau) in (A.3), where \hat{q}_{a}(\tau) satisfies Assumption 8. All the analysis in this section takes account of the fact that \hat{q}_{a}(\tau), rather than q_{a}(\tau), is used.
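To make the post-Lasso step (A.2)-(A.3) concrete, here is a minimal numpy sketch, not the paper's code: an unpenalized logistic regression is refit on a hypothetical selected support and the fitted probabilities are plugged into the adjustment. The Newton solver, the toy data, and the chosen support set are all illustrative assumptions.

```python
import numpy as np

def lam(z):
    # logistic link lambda(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def logistic_mle(H, D, n_iter=25):
    # Unpenalized logistic fit by Newton-Raphson, standing in for the post-Lasso refit.
    theta = np.zeros(H.shape[1])
    for _ in range(n_iter):
        p = lam(H @ theta)
        grad = H.T @ (D - p)
        hess = H.T @ (H * (p * (1 - p))[:, None]) + 1e-8 * np.eye(H.shape[1])
        theta = theta + np.linalg.solve(hess, grad)
    return theta

rng = np.random.default_rng(7)
n, p_dim = 500, 40
H = rng.normal(size=(n, p_dim))                     # H_{p_n}(X_i), one row per unit
D = (rng.uniform(size=n) < lam(H[:, 0] - 0.5 * H[:, 2])).astype(float)  # 1{Y_i <= q}

support = np.array([0, 2, 5])                       # stand-in for the enlarged support set
theta_post = np.zeros(p_dim)
theta_post[support] = logistic_mle(H[:, support], D)

tau = 0.5
m_hat = tau - lam(H @ theta_post)                   # regularized adjustment, as in (A.3)
```

When the support is correctly selected, the unpenalized refit removes the shrinkage bias that the Lasso penalty induces in the retained coefficients.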

Assumption 13.
  (i)

    Let \mathcal{Q}^{\varepsilon}_{a}=\{q:\inf_{\tau\in\Upsilon}|q-q_{a}(\tau)|\leq\varepsilon\} for a=0,1. Suppose \mathbb{P}(Y_{i}(a)\leq q|S_{i}=s,X_{i})=\lambda(H_{p_{n}}(X_{i})^{\top}\theta^{\textit{HD}}_{a,s}(q))+r_{a}(q,s,X_{i}) such that \sup_{a=0,1,q\in\mathcal{Q}^{\varepsilon}_{a},s\in\mathcal{S}}||\theta^{\textit{HD}}_{a,s}(q)||_{0}\leq h_{n}.

  (ii)

    Suppose \sup_{i\in[n]}||H_{p_{n}}(X_{i})||_{\infty}\leq\zeta_{n} and \sup_{h\in[p_{n}]}\mathbb{E}(|H_{p_{n},h}(X_{i})|^{d}\,|\,S_{i}=s)<\infty for some d>2.

  (iii)

    Suppose

    \sup_{a=0,1,q\in\mathcal{Q}^{\varepsilon}_{a},s\in\mathcal{S}}\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}r^{2}_{a}(q,s,X_{i})=O_{p}(h_{n}\log(p_{n})/n),
    \sup_{a=0,1,q\in\mathcal{Q}^{\varepsilon}_{a},s\in\mathcal{S}}\mathbb{E}(r^{2}_{a}(q,s,X_{i})|S_{i}=s)=O(h_{n}\log(p_{n})/n),

    and

    \sup_{a=0,1,q\in\mathcal{Q}^{\varepsilon}_{a},s\in\mathcal{S},x\in\mathcal{X}}|r_{a}(q,s,x)|=O(\sqrt{\zeta_{n}^{2}h_{n}^{2}\log(p_{n})/n}).
  (iv)

    \frac{\log(p_{n})\zeta_{n}^{2}h_{n}^{2}}{n}\rightarrow 0, \frac{\log^{2}(p_{n})\log^{2}(n)h_{n}^{2}}{n}\rightarrow 0, and \sup_{a=0,1,q\in\mathcal{Q}_{a}^{\varepsilon},s\in\mathcal{S}}|\hat{\mathbb{S}}^{+}_{a,s}(q)|=O_{p}(h_{n}), where |\hat{\mathbb{S}}^{+}_{a,s}(q)| denotes the number of elements in \hat{\mathbb{S}}^{+}_{a,s}(q).

  (v)

    There exists a constant c(0,0.5)c\in(0,0.5) such that

    c\leq\inf_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon,x\in\text{Supp}(X)}\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}=x)
    \leq\sup_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon,x\in\text{Supp}(X)}\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}=x)\leq 1-c.
  (vi)

    Let \ell_{n} be a sequence that diverges to infinity. Then there exist two constants \kappa_{1} and \kappa_{2} such that, with probability approaching one,

    0<\kappa_{1}\leq\inf_{a=0,1,s\in\mathcal{S},||v||_{0}\leq h_{n}\ell_{n}}\frac{v^{\top}\left(\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}H_{p_{n}}(X_{i})H_{p_{n}}(X_{i})^{\top}\right)v}{||v||_{2}^{2}}
    \leq\sup_{a=0,1,s\in\mathcal{S},||v||_{0}\leq h_{n}\ell_{n}}\frac{v^{\top}\left(\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}H_{p_{n}}(X_{i})H_{p_{n}}(X_{i})^{\top}\right)v}{||v||_{2}^{2}}\leq\kappa_{2}<\infty,

    and

    0<\kappa_{1}\leq\inf_{a=0,1,s\in\mathcal{S},||v||_{0}\leq h_{n}\ell_{n}}\frac{v^{\top}\mathbb{E}(H_{p_{n}}(X_{i})H_{p_{n}}(X_{i})^{\top}|S_{i}=s)v}{||v||_{2}^{2}}
    \leq\sup_{a=0,1,s\in\mathcal{S},||v||_{0}\leq h_{n}\ell_{n}}\frac{v^{\top}\mathbb{E}(H_{p_{n}}(X_{i})H_{p_{n}}(X_{i})^{\top}|S_{i}=s)v}{||v||_{2}^{2}}\leq\kappa_{2}<\infty,

    where ||v||_{0} denotes the number of nonzero components of v.

  (vii)

    For a=0,1, let \varrho_{n,a}(s)=c\sqrt{n_{a}(s)}\Phi^{-1}\left(1-\frac{0.1}{4\log(n_{a}(s))p_{n}}\right), where \Phi(\cdot) is the standard normal CDF and c>0 is a constant.

Assumption 13 is standard in the literature, and we refer interested readers to Belloni et al. (2017) for further discussion. Assumption 13(i) implies that the logistic model is approximately correctly specified. As the approximation is assumed to be sparse, the condition is not innocuous in the high-dimensional setting. Because our method is valid even when the auxiliary model is misspecified, we conjecture that Assumption 13(i) can be relaxed; this connects to the recent literature on regularized estimation in high-dimensional settings under misspecification, for example, Bradic et al. (2019) and Tan (2020) and the references therein. An interesting topic for future work is to study misspecification-robust high-dimensional estimators of the conditional probability model and their use in adjusting the QTE estimator under CARs based on (3.1) and (3.2). The following theorem shows that all the estimation and inference results in Theorems 3 and 5 hold for the HD estimator.

Theorem A.1.

Let \hat{q}^{\textit{HD}}(\tau) and \hat{q}^{\textit{HD,w}}(\tau) denote the \tauth QTE estimator and its multiplier bootstrap counterpart defined in Sections 3 and 4, respectively, with \overline{m}_{a}(\tau,S_{i},X_{i})=m_{a}(\tau,S_{i},X_{i}) and \widehat{m}_{a}(\tau,S_{i},X_{i}) defined in (A.3). Further suppose Assumptions 1, 2, 4, 8, and 13 hold. Then Assumptions 3 and 5 hold, which in turn implies that Theorems 3 and 5 hold for \hat{q}^{\textit{HD}}(\tau) and \hat{q}^{\textit{HD,w}}(\tau), respectively. In addition, for any finite-dimensional set of quantile indices (\tau_{1},\cdots,\tau_{K}), the covariance matrix of (\hat{q}^{\textit{HD}}(\tau_{1}),\cdots,\hat{q}^{\textit{HD}}(\tau_{K})) achieves the minimum (in the matrix sense) characterized in Theorem 3.

Appendix B Practical Guidance and Computation

B.1 Procedures for estimation and bootstrap inference

We can compute (\hat{q}_{1}^{adj}(\tau),\hat{q}_{0}^{adj}(\tau)) by solving the subgradient conditions of (3.1) and (3.2), respectively. Specifically, we have (\hat{q}_{1}^{adj}(\tau),\hat{q}_{0}^{adj}(\tau))=(Y_{i_{1}},Y_{i_{0}}) such that A_{i_{1}}=1, A_{i_{0}}=0,

\tau\left(\sum_{i=1}^{n}\frac{A_{i}}{\hat{\pi}(S_{i})}\right)-\sum_{i=1}^{n}\left(\frac{(A_{i}-\hat{\pi}(S_{i}))}{\hat{\pi}(S_{i})}\widehat{m}_{1}(\tau,S_{i},X_{i})\right)
\geq\sum_{i=1}^{n}\frac{A_{i}}{\hat{\pi}(S_{i})}1\{Y_{i}<Y_{i_{1}}\}
\geq\tau\left(\sum_{i=1}^{n}\frac{A_{i}}{\hat{\pi}(S_{i})}\right)-\frac{1}{\hat{\pi}(S_{i_{1}})}-\sum_{i=1}^{n}\left(\frac{(A_{i}-\hat{\pi}(S_{i}))}{\hat{\pi}(S_{i})}\widehat{m}_{1}(\tau,S_{i},X_{i})\right), (B.1)

and

\tau\left(\sum_{i=1}^{n}\frac{1-A_{i}}{1-\hat{\pi}(S_{i})}\right)+\sum_{i=1}^{n}\left(\frac{(A_{i}-\hat{\pi}(S_{i}))}{1-\hat{\pi}(S_{i})}\widehat{m}_{0}(\tau,S_{i},X_{i})\right)
\geq\sum_{i=1}^{n}\frac{1-A_{i}}{1-\hat{\pi}(S_{i})}1\{Y_{i}<Y_{i_{0}}\}
\geq\tau\left(\sum_{i=1}^{n}\frac{1-A_{i}}{1-\hat{\pi}(S_{i})}\right)-\frac{1}{1-\hat{\pi}(S_{i_{0}})}+\sum_{i=1}^{n}\left(\frac{(A_{i}-\hat{\pi}(S_{i}))}{1-\hat{\pi}(S_{i})}\widehat{m}_{0}(\tau,S_{i},X_{i})\right). (B.2)

We note that (i_{1},i_{0}) are uniquely defined as long as all the inequalities in (B.1) and (B.2) are strict, which is usually the case. If instead we have

\tau\left(\sum_{i=1}^{n}\frac{A_{i}}{\hat{\pi}(S_{i})}\right)-\sum_{i=1}^{n}\left(\frac{(A_{i}-\hat{\pi}(S_{i}))}{\hat{\pi}(S_{i})}\widehat{m}_{1}(\tau,S_{i},X_{i})\right)=\sum_{i=1}^{n}\frac{A_{i}}{\hat{\pi}(S_{i})}1\{Y_{i}\leq Y_{i_{1}}\},

then both i_{1} and i_{1}^{\prime} satisfy (B.1), where i_{1}^{\prime} is the index such that A_{i_{1}^{\prime}}=1 and Y_{i_{1}^{\prime}} is the smallest observation in the treatment group that is larger than Y_{i_{1}}. In this case, we let \hat{q}_{1}^{adj}(\tau)=Y_{i_{1}}.15 (In this case, any value in [Y_{i_{1}},Y_{i_{1}^{\prime}}] can be viewed as a solution; because |Y_{i_{1}}-Y_{i_{1}^{\prime}}|=O_{p}(1/n), all choices are asymptotically equivalent.) Similarly, by solving the subgradient conditions of (B.3) and (B.4), we have (\hat{q}_{1}^{w}(\tau),\hat{q}_{0}^{w}(\tau))=(Y_{i_{1}^{w}},Y_{i_{0}^{w}}) such that A_{i_{1}^{w}}=1, A_{i_{0}^{w}}=0,

\tau\left(\sum_{i=1}^{n}\frac{\xi_{i}A_{i}}{\hat{\pi}^{w}(S_{i})}\right)-\sum_{i=1}^{n}\left(\frac{\xi_{i}(A_{i}-\hat{\pi}^{w}(S_{i}))}{\hat{\pi}^{w}(S_{i})}\widehat{m}_{1}(\tau,S_{i},X_{i})\right)
\geq\sum_{i=1}^{n}\frac{\xi_{i}A_{i}}{\hat{\pi}^{w}(S_{i})}1\{Y_{i}<Y_{i_{1}^{w}}\}
\geq\tau\left(\sum_{i=1}^{n}\frac{\xi_{i}A_{i}}{\hat{\pi}^{w}(S_{i})}\right)-\frac{\xi_{i_{1}^{w}}}{\hat{\pi}^{w}(S_{i_{1}^{w}})}-\sum_{i=1}^{n}\left(\frac{\xi_{i}(A_{i}-\hat{\pi}^{w}(S_{i}))}{\hat{\pi}^{w}(S_{i})}\widehat{m}_{1}(\tau,S_{i},X_{i})\right) (B.3)

and

\tau\left(\sum_{i=1}^{n}\frac{\xi_{i}(1-A_{i})}{1-\hat{\pi}^{w}(S_{i})}\right)+\sum_{i=1}^{n}\left(\frac{\xi_{i}(A_{i}-\hat{\pi}^{w}(S_{i}))}{1-\hat{\pi}^{w}(S_{i})}\widehat{m}_{0}(\tau,S_{i},X_{i})\right)
\geq\sum_{i=1}^{n}\frac{\xi_{i}(1-A_{i})}{1-\hat{\pi}^{w}(S_{i})}1\{Y_{i}<Y_{i_{0}^{w}}\}
\geq\tau\left(\sum_{i=1}^{n}\frac{\xi_{i}(1-A_{i})}{1-\hat{\pi}^{w}(S_{i})}\right)-\frac{\xi_{i_{0}^{w}}}{1-\hat{\pi}^{w}(S_{i_{0}^{w}})}+\sum_{i=1}^{n}\left(\frac{\xi_{i}(A_{i}-\hat{\pi}^{w}(S_{i}))}{1-\hat{\pi}^{w}(S_{i})}\widehat{m}_{0}(\tau,S_{i},X_{i})\right). (B.4)

The inequalities in (B.3) and (B.4) are strict with probability one if \xi_{i} is continuously distributed, in which case (\hat{q}_{1}^{w}(\tau),\hat{q}_{0}^{w}(\tau)) are uniquely defined with probability one.
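Operationally, (B.1)-(B.4) say that each adjusted quantile is a weighted order statistic of one arm's outcomes with a recentered target. The following numpy sketch, an illustration under simplifying assumptions rather than the paper's code, solves the analogue of (B.1) for the treated arm by scanning the sorted treated outcomes; the data-generating choices are arbitrary.

```python
import numpy as np

def adjusted_treated_quantile(Y, A, pi_hat, m1_hat, tau):
    """Solve the analogue of (B.1): find the treated outcome whose cumulative
    weight brackets the recentered target tau * sum_i A_i / pi_hat(S_i)."""
    w = A / pi_hat                                       # weight 1/pi for treated units
    target = tau * w.sum() - ((A - pi_hat) / pi_hat * m1_hat).sum()
    order = np.argsort(Y[A == 1])
    y_t = Y[A == 1][order]
    w_t = (1.0 / pi_hat[A == 1])[order]
    below = np.concatenate(([0.0], np.cumsum(w_t)[:-1]))  # weight strictly below each y
    k = np.searchsorted(below + w_t, target)             # first index with inclusive cum >= target
    k = min(k, len(y_t) - 1)
    return y_t[k]

# toy example: simple random sampling, pi = 0.5, adjustment switched off (m1_hat = 0)
rng = np.random.default_rng(1)
n = 1001
A = rng.integers(0, 2, size=n).astype(float)
Y = rng.normal(size=n) + A                               # treated outcomes shifted up by 1
pi = np.full(n, 0.5)
q50 = adjusted_treated_quantile(Y, A, pi, np.zeros(n), 0.5)
```

With m1_hat set to zero this reduces to the unadjusted weighted sample quantile of the treated arm; a nonzero m1_hat only shifts the target, which is how the regression adjustment enters.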

We summarize the steps in the bootstrap procedure as follows.

  1. Let \mathcal{G} be a set of quantile indices. For \tau\in\mathcal{G}, compute \hat{q}_{1}(\tau) and \hat{q}_{0}(\tau) following (B.1) and (B.2) with \widehat{m}_{1}(\tau,S_{i},X_{i}) and \widehat{m}_{0}(\tau,S_{i},X_{i}) replaced by zero.

  2. Compute \widehat{m}_{a}(\tau,S_{i},X_{i}) for a=0,1 and \tau\in\mathcal{G} using \hat{q}_{1}(\tau) and \hat{q}_{0}(\tau).

  3. Compute the original estimator \hat{q}^{adj}(\tau)=\hat{q}^{adj}_{1}(\tau)-\hat{q}^{adj}_{0}(\tau) following (B.1) and (B.2) for \tau\in\mathcal{G}.

  4. Let B be the number of bootstrap replications. For each b\in[B], generate multiplier weights \{\xi_{i}\}_{i\in[n]} and compute \hat{q}^{w,b}(\tau)=\hat{q}_{1}^{w,b}(\tau)-\hat{q}_{0}^{w,b}(\tau) for \tau\in\mathcal{G} following (B.3) and (B.4).

  5. Collect the B bootstrap estimates of the QTE, \{\hat{q}^{w,b}(\tau)\}_{b\in[B],\tau\in\mathcal{G}}.
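The steps above can be sketched end-to-end in a toy run. This simplification (mine, not the paper's code) uses simple random sampling with π = 1/2 and switches the regression adjustment off, so each arm's quantile reduces to a ξ-weighted sample quantile; the full procedure would recenter by \widehat{m}_{a} as in (B.3) and (B.4).

```python
import numpy as np

rng = np.random.default_rng(2)

def weighted_quantile(y, w, tau):
    # tau-th quantile of y under weights w (inverse-CDF / subgradient form)
    order = np.argsort(y)
    y, w = y[order], w[order]
    cum = np.cumsum(w)
    return y[np.searchsorted(cum, tau * w.sum())]

# toy data: SRS with pi = 0.5, no covariate adjustment
n, B, tau = 800, 200, 0.5
A = rng.integers(0, 2, size=n).astype(float)
Y = rng.normal(size=n) + 2.0 * A                     # true QTE = 2 at every quantile

# step 3: point estimate of the QTE
q_hat = (weighted_quantile(Y[A == 1], np.ones(int(A.sum())), tau)
         - weighted_quantile(Y[A == 0], np.ones(int(n - A.sum())), tau))

# steps 4-5: multiplier bootstrap draws
boot = np.empty(B)
for b in range(B):
    xi = rng.exponential(size=n)                     # multiplier weights xi_i
    boot[b] = (weighted_quantile(Y[A == 1], xi[A == 1], tau)
               - weighted_quantile(Y[A == 0], xi[A == 0], tau))
```

The collected draws in `boot` are the inputs to the confidence-interval constructions of Section B.2.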

B.2 Bootstrap confidence intervals

Given the bootstrap estimates, we next discuss how to conduct bootstrap inference for null hypotheses involving a single quantile index, multiple quantile indices, and a continuum of quantile indices.

Case (1). We test the single null hypothesis \mathcal{H}_{0}:q(\tau)=\underline{q} vs. q(\tau)\neq\underline{q}. Set \mathcal{G}=\{\tau\} in the procedure described above, and let \widehat{\mathcal{C}}(\nu) and \mathcal{C}(\nu) be the \nuth empirical quantile of the sequence \{\hat{q}^{w,b}(\tau)\}_{b\in[B]} and the \nuth standard normal critical value, respectively. Let \alpha\in(0,1) be the significance level. We suggest using the bootstrap estimates to construct the standard error of \hat{q}^{adj}(\tau) as \hat{\sigma}=\frac{\widehat{\mathcal{C}}(0.975)-\widehat{\mathcal{C}}(0.025)}{\mathcal{C}(0.975)-\mathcal{C}(0.025)}. Then the valid confidence interval and Wald test using this standard error are

CI(\alpha)=(\hat{q}^{adj}(\tau)+\mathcal{C}(\alpha/2)\hat{\sigma},\ \hat{q}^{adj}(\tau)+\mathcal{C}(1-\alpha/2)\hat{\sigma}),

and 1\big\{\big|\frac{\hat{q}^{adj}(\tau)-\underline{q}}{\hat{\sigma}}\big|\geq\mathcal{C}(1-\alpha/2)\big\}, respectively.16 (It is asymptotically valid to use standard and percentile bootstrap confidence intervals, but in simulations we found that the confidence interval proposed in the paper has better finite-sample coverage under the null.)
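As a numerical illustration of Case (1), with simulated stand-ins for the bootstrap draws (the real draws come from the procedure in Section B.1):

```python
import numpy as np

def bootstrap_se_and_ci(q_adj, boot_draws, alpha=0.05):
    """Standard error from the 95% interquantile range of the bootstrap draws,
    scaled by the matching standard normal interquantile range."""
    z = 1.959963985                        # C(0.975) = -C(0.025) for N(0,1)
    c_lo, c_hi = np.quantile(boot_draws, [0.025, 0.975])
    se = (c_hi - c_lo) / (2 * z)
    ci = (q_adj - z * se, q_adj + z * se)  # CI(alpha) with alpha = 0.05
    return se, ci

rng = np.random.default_rng(3)
q_adj = 1.2
boot_draws = rng.normal(loc=q_adj, scale=0.3, size=2000)  # stand-in bootstrap draws
se, ci = bootstrap_se_and_ci(q_adj, boot_draws)
reject = abs((q_adj - 0.0) / se) >= 1.959963985           # Wald test of H0: q(tau) = 0
```

The interquantile-range construction makes the standard error robust to a few extreme bootstrap draws, which a plain bootstrap standard deviation is not.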

Case (2). We test the null hypothesis \mathcal{H}_{0}:q(\tau_{1})-q(\tau_{2})=\underline{q} vs. q(\tau_{1})-q(\tau_{2})\neq\underline{q}. In this case, we set \mathcal{G}=\{\tau_{1},\tau_{2}\} in the procedure described in Section B.1. Further, let \widehat{\mathcal{C}}(\nu) be the \nuth empirical quantile of the sequence \{\hat{q}^{w,b}(\tau_{1})-\hat{q}^{w,b}(\tau_{2})\}_{b\in[B]}, and let \alpha\in(0,1) be the significance level. We suggest using the bootstrap standard error to construct the valid confidence interval and Wald test as

CI(\alpha)=(\hat{q}^{adj}(\tau_{1})-\hat{q}^{adj}(\tau_{2})+\mathcal{C}(\alpha/2)\hat{\sigma},\ \hat{q}^{adj}(\tau_{1})-\hat{q}^{adj}(\tau_{2})+\mathcal{C}(1-\alpha/2)\hat{\sigma}),

and 1\big\{\big|\frac{\hat{q}^{adj}(\tau_{1})-\hat{q}^{adj}(\tau_{2})-\underline{q}}{\hat{\sigma}}\big|\geq\mathcal{C}(1-\alpha/2)\big\}, respectively, where \hat{\sigma}=\frac{\widehat{\mathcal{C}}(0.975)-\widehat{\mathcal{C}}(0.025)}{\mathcal{C}(0.975)-\mathcal{C}(0.025)}.

Case (3). We test the null hypothesis that

\mathcal{H}_{0}:q(\tau)=\underline{q}(\tau)\ \forall\tau\in\Upsilon\quad\text{vs.}\quad q(\tau)\neq\underline{q}(\tau)\ \exists\tau\in\Upsilon.

In theory, we should let \mathcal{G}=\Upsilon. In practice, we let \mathcal{G}=\{\tau_{1},\cdots,\tau_{G}\} be a fine grid on \Upsilon, where G should be as large as is computationally feasible. Further, let \widehat{\mathcal{C}}_{\tau}(\nu) denote the \nuth empirical quantile of the sequence \{\hat{q}^{w,b}(\tau)\}_{b\in[B]} for \tau\in\mathcal{G}. Compute the standard error of \hat{q}^{adj}(\tau) as

\hat{\sigma}_{\tau}=\frac{\widehat{\mathcal{C}}_{\tau}(0.975)-\widehat{\mathcal{C}}_{\tau}(0.025)}{\mathcal{C}(0.975)-\mathcal{C}(0.025)}.

The uniform confidence band at significance level \alpha is constructed as

CB(\alpha)=\{\hat{q}^{adj}(\tau)-\widetilde{\mathcal{C}}_{\alpha}\hat{\sigma}_{\tau},\ \hat{q}^{adj}(\tau)+\widetilde{\mathcal{C}}_{\alpha}\hat{\sigma}_{\tau}:\tau\in\mathcal{G}\},

where the critical value 𝒞~α\widetilde{\mathcal{C}}_{\alpha} is computed as

\widetilde{\mathcal{C}}_{\alpha}=\inf\left\{z:\frac{1}{B}\sum_{b=1}^{B}1\left\{\sup_{\tau\in\mathcal{G}}\left|\frac{\hat{q}^{w,b}(\tau)-\tilde{q}(\tau)}{\hat{\sigma}_{\tau}}\right|\leq z\right\}\geq 1-\alpha\right\},

and \tilde{q}(\tau) is first-order equivalent to \hat{q}^{adj}(\tau) in the sense that \sup_{\tau\in\Upsilon}|\tilde{q}(\tau)-\hat{q}^{adj}(\tau)|=o_{p}(1/\sqrt{n}). We suggest choosing \tilde{q}(\tau)=\widehat{\mathcal{C}}_{\tau}(0.5) over other choices, such as \tilde{q}(\tau)=\hat{q}^{adj}(\tau), because of its better finite sample performance. We reject \mathcal{H}_{0} at significance level \alpha if \underline{q}(\cdot)\notin CB(\alpha).
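The sup-t construction of Case (3) can be sketched as follows, again with simulated stand-ins for the bootstrap draws; \tilde{q}(\tau) is taken to be the pointwise bootstrap median, as suggested above.

```python
import numpy as np

def uniform_band(q_adj, boot, alpha=0.05):
    """Sup-t uniform confidence band over a grid of quantile indices.
    q_adj: (G,) point estimates; boot: (B, G) bootstrap draws."""
    z = 1.959963985
    c_lo = np.quantile(boot, 0.025, axis=0)
    c_hi = np.quantile(boot, 0.975, axis=0)
    sigma = (c_hi - c_lo) / (2 * z)                # per-tau bootstrap standard errors
    center = np.quantile(boot, 0.5, axis=0)        # q_tilde(tau): bootstrap median
    sup_t = np.max(np.abs((boot - center) / sigma), axis=1)
    crit = np.quantile(sup_t, 1 - alpha)           # critical value C_tilde_alpha
    return q_adj - crit * sigma, q_adj + crit * sigma, crit

rng = np.random.default_rng(4)
G, B = 11, 1000
q_adj = np.linspace(0.5, 1.5, G)                   # stand-in QTE estimates on a grid
boot = q_adj + rng.normal(scale=0.2, size=(B, G))  # stand-in bootstrap draws
lo, hi, crit = uniform_band(q_adj, boot)
```

Because the band must cover the whole grid simultaneously, the critical value exceeds the pointwise 1.96, so the uniform band is wider than the pointwise intervals.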

B.3 Computation of Auxiliary Regressions

Parametric regressions.

For the linear probability model, we compute the LP estimator via (5.7). For the logistic model, we consider the ML and LPML estimators. First, we compute the ML estimator \hat{\theta}_{a,s}^{\textit{ML}}(\tau) as in (5.9), which is the quasi maximum likelihood estimator of a flexible distribution regression. Second, we compute the logistic function values at \hat{\theta}_{a,s}^{\textit{ML}}(\tau) and treat them as regressors in a linear adjustment to further improve the ML estimate.

Sieve logistic regressions.

We provide more detail on the sieve basis. Recall H_{h_{n}}(x)\equiv(b_{1n}(x),\cdots,b_{h_{n}n}(x))^{\top}, where \{b_{hn}(\cdot)\}_{h\in[h_{n}]} are h_{n} basis functions of a linear sieve space, denoted \mathbb{B}. Given that all d_{x} elements of X are continuously distributed, the sieve space \mathbb{B} can be constructed as follows.

  1.

    For each element X^{(l)} of X, l=1,\cdots,d_{x}, let \mathbb{B}_{l} be a univariate sieve space of dimension J_{n}. One example of \mathbb{B}_{l} is the linear span of polynomials up to degree J_{n}, given by

    \mathbb{B}_{l}=\biggl\{\sum_{k=0}^{J_{n}}\alpha_{k}x^{k},\ x\in\text{Supp}(X^{(l)}),\ \alpha_{k}\in\Re\biggr\};

    Another is the linear span of rr-order splines with JnJ_{n} nodes given by

    \mathbb{B}_{l}=\biggl\{\sum_{k=0}^{r-1}\alpha_{k}x^{k}+\sum_{j=1}^{J_{n}}b_{j}[\max(x-t_{j},0)]^{r-1},\ x\in\text{Supp}(X^{(l)}),\ \alpha_{k},b_{j}\in\Re\biggr\},

    where the grid -\infty=t_{0}\leq t_{1}\leq\cdots\leq t_{J_{n}}\leq t_{J_{n}+1}=\infty partitions \text{Supp}(X^{(l)}) into J_{n}+1 subsets I_{j}=[t_{j},t_{j+1})\cap\text{Supp}(X^{(l)}), j=1,\cdots,J_{n}-1, I_{0}=(t_{0},t_{1})\cap\text{Supp}(X^{(l)}), and I_{J_{n}}=(t_{J_{n}},t_{J_{n}+1})\cap\text{Supp}(X^{(l)}).

  2.

    Let \mathbb{B} be the tensor product of \{\mathbb{B}_{l}\}_{l=1}^{d_{x}}, defined as the linear space spanned by the functions \prod_{l=1}^{d_{x}}g_{l}, where g_{l}\in\mathbb{B}_{l}. The dimension of \mathbb{B} is then K\equiv d_{x}J_{n}.

We refer interested readers to Hirano et al. (2003) and Chen (2007) for more details on the implementation of sieve estimation. Given the sieve basis, we can compute \widehat{m}_{a}(\tau,s,X_{i}) following (5.16) and (5.17).
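A sketch of the tensor-product construction with polynomial bases (splines are built analogously). Note that the full tensor product of (J_n+1)-dimensional univariate bases has (J_n+1)^{d_x} elements, so in practice one typically works with a lower-dimensional subset; the helper below is illustrative, not the paper's code.

```python
import numpy as np
from itertools import product

def poly_sieve_basis(X, J):
    """Tensor-product polynomial sieve: all products of per-coordinate
    powers x_l^k, k = 0..J, across the d_x coordinates."""
    n, dx = X.shape
    # per-coordinate power matrices, columns x^0, x^1, ..., x^J
    pows = [np.vander(X[:, l], J + 1, increasing=True) for l in range(dx)]
    cols = []
    for ks in product(range(J + 1), repeat=dx):
        col = np.ones(n)
        for l, k in enumerate(ks):
            col = col * pows[l][:, k]
        cols.append(col)
    return np.column_stack(cols)

rng = np.random.default_rng(5)
X = rng.uniform(size=(50, 2))
H = poly_sieve_basis(X, J=2)    # (J+1)^dx = 9 basis functions, including the constant
```

The first column is the constant function, and interaction terms such as x_1 x_2 arise automatically from the product over coordinates.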

Logistic regressions with an 1\ell_{1} penalization.

We follow the estimation procedure and the choice of tuning parameter proposed by Belloni et al. (2017), and we provide the details below for completeness. Recall \varrho_{n,a}(s)=c\sqrt{n_{a}(s)}\Phi^{-1}\left(1-\frac{0.1}{4\log(n_{a}(s))p_{n}}\right) from Assumption 13(vii). We set c=1.1 following Belloni et al. (2017). We then implement the following algorithm to compute \hat{\theta}_{a,s}^{\textit{HD}}(\hat{q}_{a}(\tau)) for \tau\in\Upsilon:

  (i)

    Let \hat{\sigma}_{h}^{(0)}=\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(1\{Y_{i}\leq\hat{q}_{a}(\tau)\}-\bar{Y}_{a,s}(\tau))^{2}H_{p_{n},h}^{2}(X_{i}) for h\in[p_{n}], where \bar{Y}_{a,s}(\tau)=\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}1\{Y_{i}\leq\hat{q}_{a}(\tau)\}. Estimate

    \hat{\theta}_{a,s}^{\textit{HD,0}}(\hat{q}_{a}(\tau))=\operatorname*{arg\,min}_{\theta_{a}}\frac{-1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\biggl[1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\log(\lambda(H_{p_{n}}(X_{i})^{\top}\theta_{a}))
    +1\{Y_{i}>\hat{q}_{a}(\tau)\}\log(1-\lambda(H_{p_{n}}(X_{i})^{\top}\theta_{a}))\biggr]+\frac{\varrho_{n,a}(s)}{n_{a}(s)}\sum_{h\in[p_{n}]}\hat{\sigma}_{h}^{(0)}|\theta_{a,h}|.
  (ii)

    For k=1,\cdots,K, obtain \hat{\sigma}_{h}^{(k)}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(H_{p_{n},h}(X_{i})\hat{\varepsilon}_{i}^{(k)})^{2}}, where \hat{\varepsilon}_{i}^{(k)}=1\{Y_{i}\leq\hat{q}_{a}(\tau)\}-\lambda(H_{p_{n}}^{\top}(X_{i})\hat{\theta}_{a,s}^{\textit{HD,k-1}}(\hat{q}_{a}(\tau))). Estimate

    \hat{\theta}_{a,s}^{\textit{HD,k}}(\hat{q}_{a}(\tau))=\operatorname*{arg\,min}_{\theta_{a}}\frac{-1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\biggl[1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\log(\lambda(H_{p_{n}}(X_{i})^{\top}\theta_{a}))
    +1\{Y_{i}>\hat{q}_{a}(\tau)\}\log(1-\lambda(H_{p_{n}}(X_{i})^{\top}\theta_{a}))\biggr]+\frac{\varrho_{n,a}(s)}{n_{a}(s)}\sum_{h\in[p_{n}]}\hat{\sigma}_{h}^{(k)}|\theta_{a,h}|.
  (iii)

    Let \hat{\theta}_{a,s}^{\textit{HD}}(\hat{q}_{a}(\tau))=\hat{\theta}_{a,s}^{\textit{HD,K}}(\hat{q}_{a}(\tau)).

  (iv)

    Repeat the above procedure for each \tau\in\mathcal{G}.

Appendix C Additional Simulation Results

C.1 Pointwise tests

Additional simulation results are provided for the pointwise tests at the 25% and 75% quantiles. The results are summarized in Tables 7 and 8; the simulation settings are the same as those for the pointwise tests in Section 6 of the main paper.

Table 7: Pointwise Test (\tau=0.25)
Size Power
N=200 N=400 N=200 N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.053 0.057 0.052 0.052 0.056 0.054 0.053 0.052 0.334 0.341 0.346 0.344 0.579 0.591 0.595 0.607
LP 0.054 0.056 0.052 0.054 0.053 0.057 0.053 0.052 0.391 0.406 0.405 0.394 0.683 0.696 0.693 0.694
ML 0.057 0.055 0.055 0.054 0.053 0.055 0.049 0.051 0.380 0.387 0.389 0.378 0.673 0.678 0.683 0.674
LPML 0.056 0.056 0.057 0.057 0.052 0.056 0.055 0.056 0.410 0.418 0.417 0.409 0.714 0.722 0.717 0.715
MLX 0.058 0.060 0.057 0.057 0.051 0.055 0.053 0.057 0.387 0.394 0.398 0.388 0.668 0.674 0.677 0.677
LPMLX 0.060 0.059 0.060 0.060 0.052 0.060 0.056 0.057 0.414 0.423 0.431 0.415 0.718 0.722 0.725 0.718
NP 0.061 0.065 0.063 0.063 0.056 0.060 0.058 0.057 0.432 0.444 0.442 0.427 0.724 0.728 0.730 0.724
Panel B: DGP (ii)
NA 0.044 0.047 0.048 0.046 0.054 0.049 0.044 0.046 0.457 0.457 0.466 0.475 0.741 0.751 0.752 0.760
LP 0.050 0.050 0.050 0.049 0.054 0.052 0.045 0.044 0.541 0.542 0.545 0.538 0.824 0.830 0.831 0.825
ML 0.049 0.049 0.052 0.054 0.050 0.051 0.049 0.045 0.478 0.477 0.480 0.474 0.757 0.761 0.760 0.761
LPML 0.053 0.052 0.056 0.054 0.053 0.050 0.048 0.044 0.542 0.540 0.544 0.536 0.832 0.837 0.840 0.833
MLX 0.055 0.057 0.057 0.058 0.054 0.053 0.048 0.045 0.507 0.504 0.505 0.500 0.765 0.771 0.771 0.770
LPMLX 0.055 0.061 0.061 0.057 0.057 0.054 0.050 0.046 0.572 0.567 0.572 0.563 0.848 0.850 0.852 0.845
NP 0.063 0.065 0.063 0.061 0.058 0.056 0.052 0.051 0.575 0.571 0.576 0.572 0.847 0.852 0.854 0.847
Table 8: Pointwise Test (τ=0.75\tau=0.75)
Size Power
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.054 0.055 0.056 0.054 0.057 0.052 0.051 0.051 0.348 0.342 0.352 0.338 0.583 0.594 0.601 0.576
LP 0.055 0.052 0.056 0.052 0.050 0.053 0.053 0.049 0.428 0.418 0.424 0.432 0.686 0.698 0.698 0.697
ML 0.051 0.053 0.054 0.051 0.050 0.051 0.054 0.054 0.402 0.395 0.403 0.402 0.658 0.667 0.666 0.667
LPML 0.056 0.056 0.058 0.056 0.050 0.053 0.056 0.051 0.428 0.425 0.437 0.435 0.706 0.711 0.711 0.709
MLX 0.059 0.056 0.057 0.057 0.054 0.054 0.053 0.055 0.404 0.399 0.404 0.406 0.654 0.663 0.662 0.657
LPMLX 0.058 0.058 0.061 0.057 0.052 0.056 0.056 0.053 0.425 0.421 0.432 0.432 0.702 0.711 0.710 0.710
NP 0.061 0.061 0.065 0.062 0.056 0.057 0.058 0.057 0.441 0.435 0.441 0.447 0.706 0.710 0.711 0.711
Panel B: DGP (ii)
NA 0.052 0.054 0.053 0.050 0.047 0.047 0.050 0.051 0.325 0.331 0.325 0.311 0.557 0.550 0.546 0.538
LP 0.055 0.055 0.062 0.053 0.051 0.053 0.053 0.054 0.389 0.402 0.396 0.394 0.626 0.626 0.634 0.634
ML 0.056 0.055 0.057 0.055 0.049 0.050 0.052 0.054 0.350 0.357 0.346 0.348 0.563 0.575 0.574 0.569
LPML 0.054 0.057 0.060 0.055 0.053 0.051 0.055 0.053 0.390 0.400 0.394 0.388 0.635 0.640 0.639 0.644
MLX 0.058 0.057 0.059 0.059 0.053 0.053 0.052 0.055 0.371 0.387 0.377 0.373 0.590 0.595 0.595 0.597
LPMLX 0.062 0.060 0.065 0.058 0.055 0.055 0.057 0.056 0.416 0.425 0.420 0.421 0.658 0.663 0.663 0.668
NP 0.068 0.068 0.067 0.063 0.056 0.054 0.058 0.059 0.429 0.436 0.427 0.426 0.663 0.670 0.670 0.672

C.2 Estimation biases and standard errors

In this section we report the biases and standard errors of our regression-adjusted estimators under three test settings. Specifically, the biases and standard errors for pointwise tests are summarized in Tables 9-11. Table 12 reports the biases and standard errors for estimating the difference of QTEs. Table 13 provides the average estimation bias and standard errors over the interval τ[0.25,0.75]\tau\in[0.25,0.75].

Table 9: Estimation Bias and Standard Errors for Pointwise Tests (τ=0.25\tau=0.25)
Bias Standard Error
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.010 0.004 -0.010 -0.023 0.011 -0.001 -0.001 -0.026 0.984 0.975 0.972 0.975 0.688 0.685 0.685 0.686
LP 0.024 0.013 0.005 0.036 0.016 0.006 0.004 0.009 0.882 0.874 0.872 0.872 0.610 0.607 0.608 0.607
ML 0.024 0.022 0.014 0.037 0.010 0.004 -0.002 0.012 0.902 0.894 0.892 0.891 0.620 0.617 0.619 0.618
LPML 0.011 0.010 0.002 0.028 0.005 0.000 -0.009 0.006 0.867 0.863 0.857 0.860 0.596 0.592 0.595 0.592
MLX -0.001 -0.006 -0.015 0.019 0.003 -0.002 -0.016 0.000 0.904 0.896 0.893 0.894 0.626 0.624 0.626 0.624
LPMLX 0.005 -0.002 -0.012 0.015 0.000 -0.007 -0.014 0.000 0.867 0.861 0.857 0.858 0.594 0.591 0.593 0.592
NP -0.037 -0.045 -0.053 -0.021 -0.014 -0.018 -0.027 -0.013 0.869 0.862 0.858 0.859 0.592 0.590 0.592 0.591
Panel B: DGP (ii)
NA -0.004 0.003 -0.007 -0.043 -0.001 -0.008 -0.001 -0.019 0.824 0.820 0.816 0.819 0.574 0.572 0.571 0.571
LP -0.038 -0.036 -0.040 -0.034 -0.022 -0.021 -0.015 -0.008 0.754 0.751 0.747 0.752 0.523 0.522 0.521 0.521
ML -0.032 -0.027 -0.036 -0.033 -0.023 -0.020 -0.013 -0.011 0.816 0.813 0.808 0.814 0.573 0.571 0.570 0.570
LPML -0.013 -0.007 -0.013 -0.003 -0.008 -0.009 -0.002 0.004 0.741 0.739 0.734 0.740 0.513 0.511 0.511 0.511
MLX -0.063 -0.056 -0.060 -0.065 -0.033 -0.038 -0.032 -0.035 0.803 0.800 0.796 0.802 0.571 0.568 0.569 0.568
LPMLX -0.061 -0.054 -0.057 -0.050 -0.032 -0.035 -0.028 -0.022 0.735 0.733 0.729 0.734 0.510 0.509 0.508 0.508
NP -0.068 -0.066 -0.072 -0.062 -0.039 -0.041 -0.033 -0.026 0.734 0.732 0.729 0.733 0.510 0.509 0.508 0.508
Table 10: Estimation Bias and Standard Errors for Pointwise Tests (τ=0.5\tau=0.5)
Bias Standard Error
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA -0.001 0.020 0.016 0.043 -0.003 -0.006 -0.005 0.019 0.979 0.976 0.976 0.973 0.688 0.685 0.685 0.687
LP -0.021 -0.005 -0.006 -0.017 -0.009 -0.008 -0.006 -0.010 0.875 0.873 0.871 0.871 0.610 0.606 0.607 0.609
ML 0.006 0.023 0.013 0.014 0.006 0.004 0.005 0.004 0.897 0.894 0.893 0.893 0.627 0.623 0.623 0.625
LPML -0.004 0.013 0.001 -0.002 0.001 -0.001 0.003 -0.004 0.860 0.858 0.856 0.856 0.597 0.592 0.592 0.595
MLX -0.004 0.016 0.006 0.003 0.003 -0.001 0.006 0.000 0.898 0.894 0.894 0.894 0.631 0.627 0.628 0.630
LPMLX 0.008 0.025 0.011 0.005 0.010 0.006 0.008 0.003 0.860 0.858 0.855 0.855 0.594 0.592 0.591 0.593
NP -0.021 -0.006 -0.014 -0.028 0.003 0.002 0.004 -0.003 0.859 0.858 0.855 0.855 0.593 0.589 0.589 0.592
Panel B: DGP (ii)
NA 0.032 0.017 0.032 0.091 0.003 0.013 0.019 0.039 1.036 1.026 1.023 1.025 0.723 0.720 0.718 0.718
LP -0.012 -0.034 -0.020 -0.015 -0.012 -0.003 0.001 -0.008 0.944 0.937 0.932 0.936 0.660 0.658 0.656 0.655
ML 0.025 0.010 0.026 0.029 0.010 0.009 0.015 0.010 1.006 1.000 0.997 0.998 0.709 0.705 0.702 0.702
LPML 0.020 0.013 0.029 0.031 0.016 0.019 0.028 0.013 0.930 0.920 0.919 0.924 0.644 0.640 0.638 0.637
MLX -0.008 -0.038 -0.021 -0.001 -0.015 -0.012 -0.008 -0.008 0.981 0.972 0.969 0.970 0.692 0.690 0.688 0.688
LPMLX -0.021 -0.041 -0.025 -0.024 -0.017 -0.013 -0.006 -0.017 0.916 0.912 0.906 0.908 0.636 0.635 0.633 0.631
NP -0.043 -0.059 -0.042 -0.041 -0.026 -0.019 -0.012 -0.023 0.910 0.903 0.901 0.903 0.634 0.632 0.631 0.630
Table 11: Estimation Bias and Standard Errors for Pointwise Tests (τ=0.75\tau=0.75)
Bias Standard Error
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.011 0.010 0.001 0.003 0.004 0.004 -0.004 -0.008 0.885 0.882 0.883 0.883 0.622 0.619 0.618 0.619
LP 0.002 0.002 -0.006 0.011 0.005 0.003 0.000 -0.008 0.779 0.776 0.775 0.776 0.545 0.542 0.541 0.542
ML 0.012 0.012 0.000 0.018 0.009 0.007 0.002 -0.006 0.795 0.791 0.790 0.793 0.557 0.554 0.553 0.553
LPML 0.004 0.006 -0.007 0.009 0.004 0.005 0.002 -0.004 0.759 0.756 0.755 0.757 0.528 0.525 0.524 0.525
MLX 0.000 0.002 -0.006 0.002 0.006 0.001 0.003 -0.004 0.796 0.791 0.790 0.793 0.561 0.559 0.558 0.558
LPMLX 0.004 0.007 -0.007 0.007 0.006 0.006 0.001 -0.004 0.758 0.755 0.753 0.756 0.527 0.524 0.523 0.524
NP -0.023 -0.017 -0.028 -0.015 0.004 0.004 0.000 -0.005 0.758 0.755 0.753 0.755 0.526 0.524 0.523 0.524
Panel B: DGP (ii)
NA 0.024 0.019 0.007 0.025 0.005 0.006 0.014 0.004 0.783 0.777 0.777 0.777 0.547 0.545 0.545 0.546
LP -0.009 -0.012 -0.029 -0.011 -0.013 -0.008 -0.004 -0.010 0.711 0.707 0.707 0.707 0.495 0.494 0.493 0.494
ML 0.009 0.009 -0.008 0.005 -0.008 0.001 0.003 0.001 0.744 0.740 0.739 0.740 0.523 0.522 0.522 0.522
LPML 0.026 0.025 0.013 0.021 0.011 0.018 0.020 0.017 0.697 0.696 0.696 0.693 0.481 0.480 0.479 0.482
MLX -0.037 -0.039 -0.049 -0.037 -0.028 -0.019 -0.018 -0.018 0.727 0.724 0.723 0.723 0.517 0.516 0.515 0.516
LPMLX -0.045 -0.043 -0.058 -0.046 -0.027 -0.019 -0.015 -0.019 0.686 0.685 0.684 0.684 0.479 0.476 0.476 0.479
NP -0.056 -0.061 -0.074 -0.061 -0.036 -0.027 -0.023 -0.027 0.685 0.681 0.682 0.681 0.474 0.473 0.473 0.473
Table 12: Estimation Bias and Standard Errors for Tests of Differences (τ1=0.25\tau_{1}=0.25, τ2=0.75\tau_{2}=0.75)
Bias Standard Error
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA -0.011 0.015 0.029 0.067 -0.013 -0.007 0.000 0.045 1.310 1.301 1.299 1.299 0.911 0.905 0.906 0.909
LP -0.044 -0.020 -0.007 -0.052 -0.025 -0.015 -0.007 -0.019 1.255 1.248 1.246 1.245 0.869 0.864 0.865 0.867
ML -0.029 -0.012 -0.005 -0.034 -0.009 -0.006 0.001 -0.015 1.253 1.243 1.241 1.241 0.865 0.860 0.862 0.862
LPML -0.028 -0.008 -0.010 -0.043 -0.010 -0.007 0.009 -0.015 1.201 1.194 1.190 1.191 0.823 0.819 0.820 0.822
MLX -0.012 0.000 0.012 -0.016 0.003 0.000 0.024 -0.001 1.250 1.244 1.239 1.241 0.870 0.865 0.867 0.868
LPMLX -0.019 0.006 0.006 -0.029 0.007 0.011 0.025 0.002 1.197 1.190 1.187 1.187 0.822 0.817 0.818 0.819
NP -0.010 0.014 0.018 -0.026 0.009 0.011 0.028 0.002 1.198 1.192 1.188 1.189 0.819 0.814 0.816 0.818
Panel B: DGP (ii)
NA 0.038 0.016 0.040 0.135 0.007 0.022 0.021 0.060 1.280 1.270 1.264 1.268 0.886 0.882 0.881 0.880
LP 0.029 0.002 0.021 0.020 0.012 0.018 0.017 0.000 1.201 1.192 1.186 1.191 0.831 0.829 0.827 0.826
ML 0.053 0.040 0.048 0.061 0.034 0.030 0.029 0.026 1.286 1.279 1.270 1.276 0.898 0.894 0.892 0.893
LPML 0.030 0.014 0.038 0.027 0.024 0.027 0.031 0.010 1.180 1.169 1.164 1.172 0.811 0.808 0.806 0.806
MLX 0.054 0.032 0.043 0.054 0.022 0.024 0.012 0.016 1.258 1.253 1.247 1.252 0.889 0.884 0.884 0.882
LPMLX 0.028 0.003 0.023 0.013 0.013 0.021 0.021 0.004 1.165 1.159 1.152 1.157 0.804 0.803 0.801 0.799
NP 0.019 0.001 0.023 0.010 0.014 0.021 0.021 0.003 1.160 1.153 1.149 1.152 0.804 0.801 0.799 0.799
Table 13: Average Estimation Bias and Standard Errors for Uniform Tests (τ[0.25,0.75]\tau\in[0.25,0.75])
Bias Standard Error
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.003 0.012 0.000 0.010 0.002 0.000 -0.004 -0.004 0.931 0.927 0.925 0.925 0.653 0.651 0.651 0.652
LP 0.001 0.004 -0.003 0.008 0.002 0.001 -0.001 -0.002 0.829 0.825 0.824 0.824 0.576 0.575 0.575 0.575
ML 0.012 0.016 0.005 0.019 0.005 0.006 0.001 0.003 0.851 0.847 0.845 0.846 0.593 0.591 0.591 0.591
LPML 0.002 0.008 -0.003 0.008 0.002 0.001 -0.002 -0.002 0.814 0.811 0.808 0.810 0.562 0.560 0.560 0.560
MLX -0.002 0.006 -0.008 0.009 0.000 0.001 -0.002 -0.003 0.850 0.847 0.845 0.846 0.596 0.594 0.594 0.594
LPMLX 0.003 0.010 -0.004 0.007 0.003 0.001 -0.001 -0.002 0.813 0.810 0.808 0.808 0.561 0.559 0.559 0.560
NP -0.024 -0.017 -0.029 -0.019 -0.002 -0.003 -0.005 -0.006 0.813 0.810 0.808 0.808 0.560 0.558 0.558 0.558
Panel B: DGP (ii)
NA 0.012 0.013 0.010 0.024 0.002 0.003 0.010 0.005 0.850 0.843 0.841 0.843 0.593 0.591 0.590 0.590
LP -0.024 -0.022 -0.027 -0.017 -0.015 -0.011 -0.007 -0.010 0.772 0.767 0.765 0.766 0.537 0.535 0.534 0.534
ML 0.001 0.000 0.000 0.009 -0.003 -0.001 0.005 0.001 0.820 0.816 0.814 0.816 0.579 0.577 0.575 0.575
LPML 0.015 0.016 0.015 0.023 0.010 0.014 0.018 0.014 0.761 0.757 0.755 0.757 0.525 0.523 0.522 0.522
MLX -0.035 -0.038 -0.038 -0.030 -0.022 -0.024 -0.019 -0.020 0.801 0.797 0.795 0.796 0.569 0.567 0.566 0.566
LPMLX -0.039 -0.040 -0.039 -0.034 -0.021 -0.020 -0.015 -0.018 0.752 0.748 0.745 0.746 0.521 0.519 0.518 0.519
NP -0.055 -0.059 -0.057 -0.052 -0.033 -0.030 -0.024 -0.028 0.746 0.742 0.741 0.742 0.516 0.515 0.514 0.514

C.3 Naïve Bootstrap Inference

In this section we report the size and power of our regression-adjusted estimators when the estimated propensity score π^(s)\hat{\pi}(s) is replaced by the true propensity score 1/21/2. We then consider the multiplier bootstrap as defined in the main text, again with π^(s)\hat{\pi}(s) replaced by the true propensity score 1/21/2. We refer to this procedure as naive bootstrap inference; the simulation results below show that it is conservative. Specifically, we report additional simulation results for the pointwise tests with τ=0.25\tau=0.25, 0.50.5 and 0.750.75 (Tables 14-16), tests for differences (Table 17), and uniform tests (Table 18).
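To make the comparison concrete, the following minimal numpy sketch (our own illustration, not the paper's code) computes an inverse-propensity-weighted median QTE once with the stratum-level estimated propensity score π^(s)=n1(s)/n(s)\hat{\pi}(s)=n_{1}(s)/n(s) and once with the true score 1/21/2. The helper names `weighted_quantile` and `ipw_qte` are ours, and the sketch omits both the regression adjustment and the multiplier bootstrap.

```python
import numpy as np

rng = np.random.default_rng(0)


def weighted_quantile(y, w, tau):
    """Smallest y whose cumulative weight share reaches tau."""
    order = np.argsort(y)
    y_sorted, w_sorted = y[order], w[order]
    cum_share = np.cumsum(w_sorted) / w_sorted.sum()
    return y_sorted[np.searchsorted(cum_share, tau)]


def ipw_qte(y, a, s, tau, pi=None):
    """Inverse-propensity-weighted QTE with stratum-level propensity scores.

    pi maps stratum label -> propensity score; None means estimate n_1(s)/n(s).
    Passing the true score 1/2 mimics the naive plug-in studied in this section.
    """
    if pi is None:
        pi = {k: a[s == k].mean() for k in np.unique(s)}
    p = np.array([pi[k] for k in s])
    q1 = weighted_quantile(y[a == 1], 1.0 / p[a == 1], tau)
    q0 = weighted_quantile(y[a == 0], 1.0 / (1.0 - p[a == 0]), tau)
    return q1 - q0


# Toy data: two strata, constant treatment effect 1, SRS with probability 1/2.
n = 4000
s = rng.integers(0, 2, size=n)
a = rng.integers(0, 2, size=n)
y = rng.normal(size=n) + s + 1.0 * a

est = ipw_qte(y, a, s, 0.5)                      # estimated propensity score
naive = ipw_qte(y, a, s, 0.5, {0: 0.5, 1: 0.5})  # true propensity score 1/2
```

Both point estimators are consistent for the median QTE; the contrast documented in the tables below concerns the bootstrap standard errors, not the point estimates themselves.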

Comparing these results with those in Section 6, we see that when the true, rather than the estimated, propensity score is used, the multiplier bootstrap inference becomes conservative for the randomization schemes “WEI”, “BCD”, and “SBR”. Specifically, the sizes are much smaller than the nominal rate (5%5\%), and the powers are lower than their counterparts in Section 6. The improvement in power of the “LPMLX” estimator with the estimated propensity score over the “LPMLX” estimator with the true propensity score is due to the 31–38% reduction in the standard errors. This outcome is consistent with the findings in Bugni et al. (2018) and Zhang and Zheng (2020) that naive inference methods under CARs are conservative.

Table 14: Pointwise Tests with Naïve Bootstrap Inference (τ=0.25\tau=0.25, π^(s)=0.5\hat{\pi}(s)=0.5)
Size Power
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.049 0.032 0.022 0.023 0.056 0.032 0.023 0.023 0.255 0.234 0.224 0.220 0.454 0.454 0.456 0.465
LP 0.048 0.018 0.008 0.006 0.053 0.019 0.006 0.007 0.222 0.173 0.149 0.113 0.399 0.391 0.372 0.331
ML 0.048 0.034 0.032 0.031 0.048 0.044 0.037 0.037 0.330 0.292 0.279 0.253 0.622 0.614 0.602 0.592
LPML 0.051 0.034 0.031 0.029 0.050 0.041 0.038 0.038 0.346 0.311 0.296 0.270 0.643 0.630 0.622 0.606
MLX 0.049 0.039 0.033 0.034 0.053 0.046 0.041 0.042 0.334 0.300 0.294 0.265 0.621 0.617 0.608 0.597
LPMLX 0.052 0.040 0.033 0.032 0.049 0.046 0.042 0.041 0.353 0.326 0.308 0.280 0.656 0.647 0.640 0.629
NP 0.054 0.045 0.037 0.035 0.053 0.049 0.045 0.044 0.370 0.347 0.327 0.302 0.679 0.672 0.663 0.650
Panel B: DGP (ii)
NA 0.050 0.019 0.008 0.009 0.053 0.019 0.007 0.008 0.276 0.247 0.225 0.237 0.498 0.498 0.501 0.520
LP 0.052 0.013 0.002 0.002 0.053 0.011 0.001 0.002 0.238 0.178 0.148 0.109 0.432 0.413 0.389 0.363
ML 0.046 0.041 0.041 0.038 0.045 0.039 0.037 0.033 0.443 0.443 0.446 0.435 0.720 0.726 0.726 0.724
LPML 0.047 0.039 0.037 0.033 0.047 0.038 0.031 0.030 0.444 0.431 0.440 0.435 0.720 0.735 0.736 0.741
MLX 0.052 0.046 0.044 0.045 0.048 0.044 0.041 0.038 0.469 0.470 0.472 0.464 0.732 0.744 0.743 0.748
LPMLX 0.053 0.049 0.052 0.047 0.053 0.047 0.042 0.041 0.531 0.527 0.531 0.527 0.818 0.829 0.834 0.830
NP 0.058 0.056 0.056 0.057 0.052 0.053 0.050 0.049 0.548 0.553 0.556 0.553 0.841 0.848 0.850 0.842
Table 15: Pointwise Tests with Naïve Estimator (τ=0.5\tau=0.5, π^(s)=0.5\hat{\pi}(s)=0.5)
Size Power
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.056 0.026 0.015 0.019 0.053 0.028 0.014 0.015 0.291 0.255 0.242 0.236 0.492 0.501 0.503 0.504
LP 0.049 0.008 0.001 0.001 0.052 0.008 0.001 0.000 0.181 0.103 0.053 0.033 0.314 0.254 0.191 0.158
ML 0.050 0.018 0.008 0.007 0.051 0.018 0.006 0.007 0.273 0.226 0.198 0.166 0.479 0.479 0.479 0.458
LPML 0.049 0.017 0.006 0.006 0.048 0.017 0.006 0.005 0.276 0.217 0.198 0.155 0.495 0.492 0.491 0.476
MLX 0.050 0.018 0.008 0.008 0.052 0.021 0.008 0.007 0.270 0.229 0.211 0.169 0.482 0.472 0.473 0.455
LPMLX 0.053 0.017 0.007 0.007 0.050 0.018 0.006 0.006 0.284 0.232 0.209 0.169 0.500 0.498 0.498 0.479
NP 0.055 0.020 0.008 0.008 0.052 0.021 0.006 0.007 0.291 0.243 0.227 0.184 0.507 0.507 0.507 0.490
Panel B: DGP (ii)
NA 0.051 0.017 0.005 0.005 0.048 0.016 0.006 0.006 0.284 0.244 0.223 0.211 0.499 0.491 0.486 0.494
LP 0.047 0.006 0.000 0.000 0.048 0.004 0.000 0.000 0.171 0.087 0.041 0.019 0.315 0.229 0.150 0.116
ML 0.050 0.021 0.013 0.011 0.047 0.026 0.019 0.018 0.315 0.254 0.216 0.187 0.588 0.552 0.528 0.509
LPML 0.049 0.016 0.009 0.006 0.045 0.017 0.010 0.009 0.296 0.229 0.188 0.166 0.526 0.484 0.462 0.440
MLX 0.056 0.021 0.014 0.013 0.052 0.027 0.019 0.021 0.336 0.275 0.243 0.213 0.606 0.564 0.554 0.535
LPMLX 0.055 0.020 0.013 0.011 0.054 0.021 0.017 0.016 0.341 0.280 0.240 0.210 0.598 0.565 0.550 0.525
NP 0.059 0.023 0.015 0.014 0.056 0.028 0.024 0.023 0.368 0.303 0.269 0.237 0.651 0.614 0.602 0.578
Table 16: Pointwise Tests with Naïve Estimator (τ=0.75\tau=0.75, π^(s)=0.5\hat{\pi}(s)=0.5)
Size Power
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.054 0.033 0.024 0.023 0.054 0.030 0.022 0.019 0.265 0.241 0.232 0.218 0.458 0.462 0.462 0.441
LP 0.015 0.002 0.000 0.000 0.029 0.002 0.000 0.000 0.089 0.022 0.002 0.001 0.162 0.073 0.012 0.006
ML 0.028 0.003 0.001 0.000 0.044 0.005 0.001 0.000 0.127 0.050 0.015 0.008 0.228 0.137 0.067 0.053
LPML 0.028 0.003 0.000 0.000 0.045 0.005 0.000 0.000 0.126 0.046 0.013 0.008 0.232 0.138 0.063 0.045
MLX 0.028 0.003 0.001 0.000 0.045 0.005 0.000 0.000 0.127 0.050 0.016 0.008 0.228 0.141 0.066 0.053
LPMLX 0.028 0.003 0.000 0.000 0.045 0.005 0.000 0.000 0.127 0.049 0.014 0.009 0.232 0.140 0.064 0.047
NP 0.030 0.003 0.000 0.000 0.044 0.006 0.000 0.001 0.134 0.052 0.017 0.009 0.234 0.145 0.069 0.053
Panel B: DGP (ii)
NA 0.051 0.027 0.019 0.016 0.050 0.028 0.016 0.019 0.239 0.210 0.192 0.169 0.409 0.392 0.384 0.371
LP 0.009 0.001 0.000 0.000 0.024 0.002 0.000 0.000 0.067 0.018 0.002 0.000 0.144 0.056 0.007 0.002
ML 0.021 0.003 0.000 0.000 0.040 0.004 0.001 0.000 0.103 0.040 0.011 0.005 0.191 0.102 0.041 0.031
LPML 0.027 0.003 0.000 0.000 0.041 0.004 0.000 0.000 0.101 0.034 0.008 0.004 0.185 0.094 0.030 0.026
MLX 0.023 0.003 0.000 0.000 0.040 0.005 0.000 0.000 0.107 0.044 0.011 0.005 0.195 0.107 0.044 0.035
LPMLX 0.024 0.003 0.000 0.000 0.041 0.004 0.000 0.000 0.110 0.043 0.011 0.004 0.199 0.109 0.042 0.032
NP 0.027 0.002 0.000 0.000 0.040 0.003 0.000 0.000 0.113 0.045 0.013 0.005 0.205 0.113 0.044 0.035
Table 17: Tests for Differences with Naïve Estimator (τ1=0.25\tau_{1}=0.25, τ2=0.75\tau_{2}=0.75, π^(s)=0.5\hat{\pi}(s)=0.5)
Size Power
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.035 0.033 0.028 0.029 0.043 0.035 0.030 0.033 0.187 0.184 0.170 0.159 0.345 0.343 0.335 0.311
LP 0.010 0.006 0.001 0.001 0.029 0.008 0.003 0.003 0.081 0.059 0.034 0.026 0.203 0.149 0.110 0.097
ML 0.027 0.008 0.003 0.002 0.042 0.009 0.002 0.002 0.105 0.060 0.030 0.025 0.190 0.127 0.080 0.072
LPML 0.027 0.009 0.002 0.002 0.043 0.009 0.002 0.003 0.105 0.055 0.025 0.023 0.195 0.128 0.079 0.068
MLX 0.027 0.008 0.004 0.003 0.042 0.010 0.002 0.002 0.101 0.059 0.029 0.026 0.188 0.125 0.079 0.072
LPMLX 0.028 0.009 0.003 0.003 0.042 0.009 0.002 0.003 0.108 0.058 0.028 0.025 0.197 0.128 0.079 0.070
NP 0.028 0.009 0.002 0.003 0.044 0.010 0.002 0.003 0.110 0.057 0.027 0.025 0.198 0.130 0.078 0.070
Panel B: DGP (ii)
NA 0.037 0.026 0.020 0.021 0.040 0.028 0.023 0.026 0.167 0.165 0.152 0.122 0.330 0.318 0.306 0.294
LP 0.005 0.003 0.001 0.001 0.024 0.006 0.001 0.000 0.053 0.038 0.018 0.009 0.174 0.106 0.062 0.050
ML 0.022 0.004 0.002 0.001 0.035 0.005 0.002 0.001 0.081 0.035 0.014 0.006 0.162 0.086 0.045 0.033
LPML 0.026 0.005 0.001 0.000 0.039 0.004 0.001 0.001 0.081 0.032 0.009 0.005 0.157 0.070 0.025 0.021
MLX 0.023 0.005 0.001 0.001 0.034 0.007 0.001 0.001 0.082 0.038 0.013 0.008 0.165 0.092 0.043 0.038
LPMLX 0.023 0.005 0.001 0.000 0.037 0.005 0.001 0.001 0.084 0.037 0.012 0.007 0.170 0.089 0.037 0.031
NP 0.024 0.005 0.001 0.000 0.037 0.005 0.001 0.001 0.091 0.038 0.013 0.007 0.175 0.093 0.042 0.036
Table 18: Uniform Tests with Naïve Estimator (τ[0.25,0.75]\tau\in[0.25,0.75], π^(s)=0.5\hat{\pi}(s)=0.5)
Size Power
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Methods SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: DGP (i)
NA 0.044 0.020 0.012 0.011 0.048 0.025 0.014 0.011 0.187 0.184 0.170 0.159 0.566 0.569 0.567 0.562
LP 0.027 0.004 0.001 0.001 0.040 0.005 0.000 0.001 0.081 0.059 0.034 0.026 0.347 0.295 0.234 0.195
ML 0.031 0.011 0.007 0.006 0.043 0.016 0.009 0.010 0.105 0.060 0.030 0.025 0.630 0.588 0.562 0.541
LPML 0.034 0.010 0.007 0.007 0.043 0.016 0.008 0.009 0.105 0.055 0.025 0.023 0.636 0.608 0.583 0.562
MLX 0.032 0.012 0.008 0.007 0.046 0.018 0.009 0.011 0.101 0.059 0.029 0.026 0.629 0.591 0.560 0.548
LPMLX 0.033 0.013 0.010 0.008 0.045 0.019 0.011 0.011 0.108 0.058 0.028 0.025 0.653 0.623 0.602 0.582
NP 0.034 0.013 0.008 0.009 0.047 0.021 0.012 0.012 0.110 0.057 0.027 0.025 0.673 0.646 0.624 0.604
Panel B: DGP (ii)
NA 0.043 0.013 0.005 0.003 0.050 0.011 0.003 0.004 0.308 0.251 0.213 0.208 0.562 0.570 0.566 0.572
LP 0.029 0.002 0.000 0.000 0.035 0.002 0.000 0.000 0.171 0.082 0.041 0.025 0.358 0.282 0.207 0.177
ML 0.035 0.012 0.015 0.011 0.039 0.019 0.016 0.015 0.454 0.401 0.387 0.368 0.826 0.800 0.789 0.787
LPML 0.030 0.011 0.010 0.010 0.036 0.016 0.009 0.010 0.444 0.388 0.374 0.355 0.804 0.771 0.762 0.755
MLX 0.037 0.017 0.019 0.014 0.040 0.022 0.017 0.019 0.500 0.450 0.439 0.418 0.853 0.830 0.826 0.819
LPMLX 0.038 0.018 0.019 0.016 0.040 0.023 0.018 0.019 0.534 0.492 0.482 0.462 0.889 0.870 0.864 0.860
NP 0.041 0.023 0.025 0.023 0.045 0.028 0.023 0.024 0.573 0.539 0.533 0.513 0.919 0.906 0.903 0.899

C.4 High-dimensional covariates

To assess the finite sample performance of the estimation and inference methods introduced in Section A, we consider the outcome equation

Yi=α(Xi)+γZi+μ(Xi)Ai+ηi,\displaystyle Y_{i}=\alpha(X_{i})+\gamma Z_{i}+\mu(X_{i})A_{i}+\eta_{i}, (C.1)

where γ=4\gamma=4 for all cases while α(Xi)\alpha(X_{i}), μ(Xi)\mu(X_{i}), and ηi\eta_{i} are separately specified as follows.

Let ZZ follow the standardized Beta(2, 2) distribution, Si=j=141{Zigj}S_{i}=\sum_{j=1}^{4}1\{Z_{i}\leq g_{j}\}, and (g1,,g4)=(0.55,0,0.55,5)(g_{1},\cdots,g_{4})=(-0.5\sqrt{5},0,0.5\sqrt{5},\sqrt{5}). Further suppose that XiX_{i} contains twenty covariates (X1i,,X20,i)(X_{1i},\cdots,X_{20,i})^{\top}, where X=Φ(W)X=\Phi(W) with WN(020×1,Ω)W\sim N(0_{20\times 1},\Omega) and the variance matrix Ω\Omega is the Toeplitz matrix

Ω=(10.50.520.5190.510.50.5180.520.510.5170.5190.5180.5171)\displaystyle\Omega=\begin{pmatrix}1&0.5&0.5^{2}&\cdots&0.5^{19}\\ 0.5&1&0.5&\cdots&0.5^{18}\\ 0.5^{2}&0.5&1&\cdots&0.5^{17}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 0.5^{19}&0.5^{18}&0.5^{17}&\cdots&1\end{pmatrix}

Further define α(Xi)=1\alpha(X_{i})=1, μ(Xi)=1+k=120Xkiβk\mu(X_{i})=1+\sum_{k=1}^{20}X_{ki}\beta_{k} with βk=4/k2\beta_{k}=4/k^{2}, and ηi=2Aiε1i+(1Ai)ε2i\eta_{i}=2A_{i}\varepsilon_{1i}+(1-A_{i})\varepsilon_{2i}, where (ε1i,ε2i)(\varepsilon_{1i},\varepsilon_{2i}) are jointly standard normal.
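For concreteness, the data generating process above can be simulated as follows. This is a sketch under our own naming, with Φ\Phi applied elementwise and, purely for illustration, treatment assigned by simple random sampling with probability 1/21/2 rather than by the WEI, BCD, or SBR schemes.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
n, d = 400, 20

# Z: standardized Beta(2,2); Beta(2,2) has mean 1/2 and variance 1/20,
# so Z = (B - 1/2) * sqrt(20) has support [-sqrt(5), sqrt(5)].
B = rng.beta(2.0, 2.0, size=n)
Z = (B - 0.5) * sqrt(20.0)

# Strata S_i = sum_j 1{Z_i <= g_j}, cut points (-0.5*sqrt(5), 0, 0.5*sqrt(5), sqrt(5)).
g = np.array([-0.5 * sqrt(5.0), 0.0, 0.5 * sqrt(5.0), sqrt(5.0)])
S = (Z[:, None] <= g[None, :]).sum(axis=1)  # takes values in {1, 2, 3, 4}

# Covariates: X = Phi(W) elementwise, W ~ N(0, Omega), Omega the 0.5^{|i-j|} Toeplitz matrix.
Omega = 0.5 ** np.abs(np.subtract.outer(np.arange(d), np.arange(d)))
W = rng.multivariate_normal(np.zeros(d), Omega, size=n)
Phi = np.vectorize(lambda w: 0.5 * (1.0 + erf(w / sqrt(2.0))))  # standard normal CDF
X = Phi(W)

# Outcome equation (C.1): alpha(X) = 1, gamma = 4, mu(X) = 1 + sum_k X_k * 4/k^2.
A = rng.integers(0, 2, size=n)  # simple random sampling stand-in for the CAR schemes
beta = 4.0 / np.arange(1, d + 1) ** 2
mu = 1.0 + X @ beta
eta = 2.0 * A * rng.normal(size=n) + (1.0 - A) * rng.normal(size=n)
Y = 1.0 + 4.0 * Z + mu * A + eta
```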

We consider the post-Lasso estimator θ^a,spost(q^a(τ))\hat{\theta}_{a,s}^{post}(\hat{q}_{a}(\tau)) as defined in (A.2) with Hpn(Xi)=(1,Xi)H_{p_{n}}(X_{i})=(1,X_{i}^{\top})^{\top} and 𝕊^a,s+(q)={2}\hat{\mathbb{S}}^{+}_{a,s}(q)=\{2\}. The choice of tuning parameter and the estimation procedure are detailed in Section B.3. We assess the empirical size and power of the tests for n=200n=200 and n=400n=400. All simulations are replicated 10,000 times, with a bootstrap sample size of 1,000. We compute the true QTEs and QTE differences by simulation with a sample size of 10,000 and 1,000 replications. To compute power, we perturb the true values by 1.5.

In Table 19, we report the empirical size and power of all three testing scenarios in the high-dimensional setting. In particular, we compare the “NA” method with our post-Lasso estimator and the oracle estimator. Evidently, the sizes of all methods approach the nominal level as the sample size increases. The post-Lasso method dominates “NA” in all tests with superior power performance. The improvement in power of the “Post-Lasso” estimator upon “NA” (i.e., with no adjustments) is due to a 2.5% reduction, on average, of the standard error of the difference of the QTE estimators, as shown in Table 20. This result is consistent with the theory given in Theorem A.1. The powers of the “Post-Lasso” and “Oracle” estimators are similar, which also confirms that the “Post-Lasso” estimator achieves the minimum asymptotic variance.

Table 19: Empirical Size and Power for High-dimensional Covariates
Size Power
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Cases SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: NA
τ=0.25\tau=0.25 0.049 0.046 0.046 0.050 0.050 0.049 0.046 0.045 0.649 0.646 0.645 0.660 0.915 0.911 0.915 0.917
τ=0.50\tau=0.50 0.047 0.044 0.045 0.043 0.042 0.044 0.044 0.046 0.732 0.732 0.726 0.736 0.955 0.960 0.960 0.960
τ=0.75\tau=0.75 0.050 0.045 0.046 0.047 0.044 0.046 0.051 0.047 0.620 0.635 0.638 0.627 0.895 0.904 0.903 0.898
Diff 0.038 0.041 0.038 0.039 0.041 0.042 0.042 0.037 0.365 0.373 0.369 0.351 0.643 0.644 0.644 0.628
Uniform 0.035 0.034 0.036 0.036 0.040 0.038 0.040 0.040 0.852 0.860 0.857 0.865 0.994 0.996 0.994 0.994
Panel B: Post-Lasso
τ=0.25\tau=0.25 0.060 0.055 0.058 0.058 0.054 0.054 0.048 0.052 0.661 0.655 0.659 0.656 0.923 0.916 0.924 0.918
τ=0.50\tau=0.50 0.056 0.054 0.056 0.052 0.048 0.050 0.051 0.049 0.739 0.744 0.728 0.741 0.960 0.964 0.963 0.963
τ=0.75\tau=0.75 0.059 0.055 0.055 0.056 0.052 0.050 0.056 0.055 0.627 0.644 0.648 0.643 0.902 0.911 0.907 0.904
Diff 0.052 0.050 0.048 0.050 0.048 0.046 0.046 0.043 0.377 0.380 0.381 0.373 0.657 0.665 0.655 0.659
Uniform 0.051 0.051 0.048 0.046 0.049 0.046 0.048 0.049 0.872 0.881 0.877 0.882 0.996 0.998 0.996 0.996
Panel C: Oracle
τ=0.25\tau=0.25 0.048 0.043 0.045 0.048 0.048 0.047 0.040 0.046 0.668 0.663 0.660 0.660 0.925 0.921 0.929 0.922
τ=0.50\tau=0.50 0.041 0.044 0.044 0.040 0.042 0.044 0.043 0.044 0.745 0.746 0.739 0.749 0.962 0.967 0.967 0.967
τ=0.75\tau=0.75 0.049 0.044 0.046 0.045 0.046 0.046 0.050 0.047 0.640 0.652 0.648 0.649 0.907 0.916 0.914 0.911
Diff 0.052 0.050 0.048 0.050 0.041 0.041 0.042 0.038 0.387 0.390 0.392 0.385 0.661 0.663 0.656 0.655
Uniform 0.041 0.044 0.044 0.040 0.040 0.038 0.040 0.041 0.873 0.881 0.883 0.883 0.997 0.998 0.997 0.997
Table 20: Estimation Bias and Standard Errors for High-dimensional Covariates
Bias Standard Error
N=200N=200 N=400N=400 N=200N=200 N=400N=400
Cases SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR SRS WEI BCD SBR
Panel A: NA
τ=0.25\tau=0.25 -0.009 -0.006 -0.001 -0.033 -0.005 0.000 -0.005 -0.018 0.652 0.649 0.647 0.651 0.456 0.455 0.454 0.456
τ=0.50\tau=0.50 0.004 0.005 0.018 -0.004 0.004 -0.002 -0.002 -0.001 0.588 0.584 0.583 0.583 0.408 0.407 0.407 0.407
τ=0.75\tau=0.75 0.022 0.006 0.016 0.028 0.017 0.001 0.007 0.016 0.652 0.651 0.650 0.648 0.457 0.456 0.454 0.456
Diff 0.011 0.022 0.012 0.067 0.007 0.008 0.013 0.032 0.922 0.917 0.916 0.917 0.644 0.642 0.641 0.643
Uniform 0.008 0.000 -0.004 -0.002 0.000 -0.001 0.000 0.008 0.624 0.621 0.620 0.621 0.436 0.435 0.435 0.435
Panel B: Post-Lasso
τ=0.25\tau=0.25 0.004 0.005 0.005 -0.004 -0.002 0.004 -0.003 -0.002 0.639 0.636 0.633 0.637 0.446 0.445 0.445 0.446
τ=0.50\tau=0.50 0.014 0.010 0.029 0.011 0.007 0.001 0.003 0.005 0.576 0.573 0.572 0.571 0.400 0.398 0.398 0.399
τ=0.75\tau=0.75 0.039 0.021 0.026 0.026 0.023 0.008 0.014 0.015 0.639 0.638 0.636 0.635 0.447 0.446 0.445 0.446
Diff 0.013 0.025 0.008 0.039 0.010 0.007 0.018 0.012 0.907 0.901 0.900 0.902 0.632 0.630 0.630 0.631
Uniform 0.020 0.011 0.006 0.010 0.004 0.005 0.006 0.014 0.611 0.609 0.608 0.608 0.427 0.426 0.426 0.426
Panel C: Oracle
τ=0.25\tau=0.25 -0.010 -0.003 -0.001 -0.003 -0.006 -0.001 -0.007 -0.004 0.633 0.631 0.629 0.633 0.445 0.443 0.443 0.444
τ=0.50\tau=0.50 0.001 0.003 0.022 0.006 0.002 -0.003 -0.002 0.004 0.572 0.569 0.568 0.568 0.397 0.396 0.396 0.396
τ=0.75\tau=0.75 0.022 0.008 0.020 0.022 0.018 0.001 0.009 0.013 0.637 0.635 0.633 0.632 0.446 0.445 0.443 0.445
Diff 0.017 0.026 0.013 0.040 0.008 0.009 0.016 0.014 0.900 0.894 0.894 0.895 0.629 0.628 0.627 0.628
Uniform 0.007 0.002 -0.002 0.004 -0.001 0.000 0.000 0.011 0.607 0.604 0.603 0.604 0.425 0.424 0.424 0.424

Appendix D Additional Notation

Throughout the supplement, the collection (ξis,Xis,Yis(1),Yis(0))i[n](\xi_{i}^{s},X_{i}^{s},Y_{i}^{s}(1),Y_{i}^{s}(0))_{i\in[n]} denotes an i.i.d. sequence with marginal distribution equal to the conditional distribution of (ξi,Xi,Yi(1),Yi(0))(\xi_{i},X_{i},Y_{i}(1),Y_{i}(0)) given Si=sS_{i}=s. In addition, {(ξis,Xis,Yis(1),Yis(0))i[n]}s𝒮\{(\xi_{i}^{s},X_{i}^{s},Y_{i}^{s}(1),Y_{i}^{s}(0))_{i\in[n]}\}_{s\in\mathcal{S}} are independent across ss and independent of {Ai,Si}i[n]\{A_{i},S_{i}\}_{i\in[n]}. We further let \mathcal{F} denote a generic class of functions, which may differ across contexts, with envelope FF. We say \mathcal{F} is of VC-type with coefficients (αn,vn)(\alpha_{n},v_{n}) if

supQN(,eQ,εFQ,2)(αnε)vn,ε(0,1],\displaystyle\sup_{Q}N\left(\mathcal{F},e_{Q},\varepsilon||F||_{Q,2}\right)\leq\left(\frac{\alpha_{n}}{\varepsilon}\right)^{v_{n}},\quad\forall\varepsilon\in(0,1],

where N()N(\cdot) denotes the covering number, eQ(f,g)=fgQ,2e_{Q}(f,g)=||f-g||_{Q,2}, and the supremum is taken over all finitely discrete probability measures.

Appendix E Proof of Theorem 3

The following notation is used throughout the proof.

na(s)n_{a}(s): for a{0,1}a\in\{0,1\} and s𝒮s\in\mathcal{S}, the number of individuals with Ai=aA_{i}=a in stratum ss.

n(s)n(s): for s𝒮s\in\mathcal{S}, the number of individuals in stratum ss.

π^(s)\hat{\pi}(s): for s𝒮s\in\mathcal{S}, π^(s)=n1(s)/n(s)\hat{\pi}(s)=n_{1}(s)/n(s).

q^aadj(τ)\hat{q}_{a}^{adj}(\tau): for a{0,1}a\in\{0,1\} and τΥ\tau\in\Upsilon, the regression-adjusted estimator of qa(τ)q_{a}(\tau) with a generic regression adjustment.

ma(τ,s,x)m_{a}(\tau,s,x): for a{0,1}a\in\{0,1\}, s𝒮s\in\mathcal{S}, τΥ\tau\in\Upsilon, and xSupp(X)x\in\text{Supp}(X), ma(τ,s,x)=τ(Yi(a)qa(τ)|Si=s,Xi=x)m_{a}(\tau,s,x)=\tau-\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}=x), the true specification.

m¯a(τ,s,x)\overline{m}_{a}(\tau,s,x): for a{0,1}a\in\{0,1\}, s𝒮s\in\mathcal{S}, τΥ\tau\in\Upsilon, and xSupp(X)x\in\text{Supp}(X), the model for ma(τ,s,x)m_{a}(\tau,s,x) specified by researchers.

ma(τ,s)m_{a}(\tau,s): for a{0,1}a\in\{0,1\}, s𝒮s\in\mathcal{S}, and τΥ\tau\in\Upsilon, ma(τ,s)=𝔼(ma(τ,Si,Xi)|Si=s)m_{a}(\tau,s)=\mathbb{E}(m_{a}(\tau,S_{i},X_{i})|S_{i}=s).

m¯a(τ,s)\overline{m}_{a}(\tau,s): for a{0,1}a\in\{0,1\}, s𝒮s\in\mathcal{S}, and τΥ\tau\in\Upsilon, m¯a(τ,s)=𝔼(m¯a(τ,Si,Xi)|Si=s)\overline{m}_{a}(\tau,s)=\mathbb{E}(\overline{m}_{a}(\tau,S_{i},X_{i})|S_{i}=s).

ηi,a(τ,s)\eta_{i,a}(\tau,s): for a{0,1}a\in\{0,1\}, s𝒮s\in\mathcal{S}, and τΥ\tau\in\Upsilon, ηi,a(τ,s)=τ1{Yi(a)qa(τ)}ma(τ,s)\eta_{i,a}(\tau,s)=\tau-1\{Y_{i}(a)\leq q_{a}(\tau)\}-m_{a}(\tau,s).

fa()f_{a}(\cdot): for a{0,1}a\in\{0,1\}, the density of Y(a)Y(a).

Dn(s)D_{n}(s): for s𝒮s\in\mathcal{S}, Dn(s)=i=1n(Aiπ(s))1{Si=s}D_{n}(s)=\sum_{i=1}^{n}(A_{i}-\pi(s))1\{S_{i}=s\}, the imbalance in stratum ss.

Δ¯a(τ,s,x)\overline{\Delta}_{a}(\tau,s,x): for a{0,1}a\in\{0,1\}, s𝒮s\in\mathcal{S}, τΥ\tau\in\Upsilon, and xSupp(X)x\in\text{Supp}(X), Δ¯a(τ,s,x)=m^a(τ,s,x)m¯a(τ,s,x)\overline{\Delta}_{a}(\tau,s,x)=\widehat{m}_{a}(\tau,s,x)-\overline{m}_{a}(\tau,s,x).

We first derive the linear expansion of q^1adj(τ)\hat{q}_{1}^{adj}(\tau). By Knight’s identity (Knight, 1998), we have

Ln(u,τ)=\displaystyle L_{n}(u,\tau)= i=1n[Aiπ^(Si)[ρτ(Yiq1(τ)u/n)ρτ(Yiq1(τ))]+(Aiπ^(Si))π^(Si)nm^1(τ,Si,Xi)u]\displaystyle\sum_{i=1}^{n}\left[\frac{A_{i}}{\hat{\pi}(S_{i})}\left[\rho_{\tau}(Y_{i}-q_{1}(\tau)-u/\sqrt{n})-\rho_{\tau}(Y_{i}-q_{1}(\tau))\right]+\frac{(A_{i}-\hat{\pi}(S_{i}))}{\hat{\pi}(S_{i})\sqrt{n}}\widehat{m}_{1}(\tau,S_{i},X_{i})u\right]
\displaystyle\equiv L1,n(τ)u+L2,n(u,τ),\displaystyle-L_{1,n}(\tau)u+L_{2,n}(u,\tau),

where

L1,n(τ)=1ni=1n[Aiπ^(Si)(τ1{Yiq1(τ)})(Aiπ^(Si))π^(Si)m^1(τ,Si,Xi)]\displaystyle L_{1,n}(\tau)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left[\frac{A_{i}}{\hat{\pi}(S_{i})}(\tau-1\{Y_{i}\leq q_{1}(\tau)\})-\frac{(A_{i}-\hat{\pi}(S_{i}))}{\hat{\pi}(S_{i})}\widehat{m}_{1}(\tau,S_{i},X_{i})\right]

and

L2,n(u,τ)=i=1nAiπ^(Si)0un(1{Yiq1(τ)+v}1{Yiq1(τ)})𝑑v.\displaystyle L_{2,n}(u,\tau)=\sum_{i=1}^{n}\frac{A_{i}}{\hat{\pi}(S_{i})}\int_{0}^{\frac{u}{\sqrt{n}}}\left(1\{Y_{i}\leq q_{1}(\tau)+v\}-1\{Y_{i}\leq q_{1}(\tau)\}\right)dv.
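For completeness, the version of Knight's identity applied above states that, for any real ww and vv,

```latex
\rho_{\tau}(w-v)-\rho_{\tau}(w)
  = -v\left(\tau-1\{w\leq 0\}\right)
  + \int_{0}^{v}\left(1\{w\leq s\}-1\{w\leq 0\}\right)ds.
```

Taking w=Yiq1(τ)w=Y_{i}-q_{1}(\tau) and v=u/nv=u/\sqrt{n}, the linear term produces the τ1{Yiq1(τ)}\tau-1\{Y_{i}\leq q_{1}(\tau)\} part of L1,n(τ)L_{1,n}(\tau) and the integral produces L2,n(u,τ)L_{2,n}(u,\tau).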

By change of variables, we have

n(q^1adj(τ)q1(τ))=argminuLn(u,τ).\displaystyle\sqrt{n}(\hat{q}_{1}^{adj}(\tau)-q_{1}(\tau))=\operatorname*{arg\,min}_{u}L_{n}(u,\tau).

Note that L2,n(u,τ)L_{2,n}(u,\tau) is exactly the same as the corresponding term considered in the proof of Theorem 3.2 in Zhang and Zheng (2020), and by their result we have

supτΥ|L2,n(u,τ)f1(q1(τ))u22|=op(1).\displaystyle\sup_{\tau\in\Upsilon}\left|L_{2,n}(u,\tau)-\frac{f_{1}(q_{1}(\tau))u^{2}}{2}\right|=o_{p}(1).

Next, consider L1,n(τ)L_{1,n}(\tau). Denote m1(τ,s)=𝔼(m1(τ,Si,Xi)|Si=s)m_{1}(\tau,s)=\mathbb{E}(m_{1}(\tau,S_{i},X_{i})|S_{i}=s), ηi,1(τ,s)=τ1{Yiq1(τ)}m1(τ,s)\eta_{i,1}(\tau,s)=\tau-1\{Y_{i}\leq q_{1}(\tau)\}-m_{1}(\tau,s), and

L1,n(τ)=\displaystyle L_{1,n}(\tau)= 1ni=1n[Aiπ^(Si)(τ1{Yiq1(τ)})]1ni=1n[(Aiπ^(Si))π^(Si)m^1(τ,Si,Xi)]\displaystyle\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left[\frac{A_{i}}{\hat{\pi}(S_{i})}(\tau-1\{Y_{i}\leq q_{1}(\tau)\})\right]-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left[\frac{(A_{i}-\hat{\pi}(S_{i}))}{\hat{\pi}(S_{i})}\widehat{m}_{1}(\tau,S_{i},X_{i})\right]
\displaystyle\equiv L1,1,n(τ)L1,2,n(τ).\displaystyle L_{1,1,n}(\tau)-L_{1,2,n}(\tau).

First, note that π^(s)π(s)=Dn(s)n(s)\hat{\pi}(s)-\pi(s)=\frac{D_{n}(s)}{n(s)}. Therefore,

L1,1,n(τ)\displaystyle L_{1,1,n}(\tau) =1ni=1ns𝒮Aiπ(s)1{Si=s}(τ1{Yi(1)q1(τ)})\displaystyle=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{A_{i}}{\pi(s)}1\{S_{i}=s\}(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\})
i=1ns𝒮Ai1{Si=s}(π^(s)π(s))nπ^(s)π(s)(τ1{Yi(1)q1(τ)})\displaystyle-\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{A_{i}1\{S_{i}=s\}(\hat{\pi}(s)-\pi(s))}{\sqrt{n}\hat{\pi}(s)\pi(s)}(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\})
=1ni=1ns𝒮Aiπ(s)1{Si=s}(τ1{Yi(1)q1(τ)})\displaystyle=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{A_{i}}{\pi(s)}1\{S_{i}=s\}(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\})
i=1ns𝒮Ai1{Si=s}Dn(s)n(s)nπ^(s)π(s)ηi,1(τ,s)s𝒮Dn(s)m1(τ,s)n(s)nπ^(s)π(s)Dn(s)s𝒮Dn(s)m1(τ,s)nπ^(s)\displaystyle-\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{A_{i}1\{S_{i}=s\}D_{n}(s)}{n(s)\sqrt{n}\hat{\pi}(s)\pi(s)}\eta_{i,1}(\tau,s)-\sum_{s\in\mathcal{S}}\frac{D_{n}(s)m_{1}(\tau,s)}{n(s)\sqrt{n}\hat{\pi}(s)\pi(s)}D_{n}(s)-\sum_{s\in\mathcal{S}}\frac{D_{n}(s)m_{1}(\tau,s)}{\sqrt{n}\hat{\pi}(s)}
=s𝒮1ni=1nAi1{Si=s}π(s)ηi,1(τ,s)+s𝒮Dn(s)nπ(s)m1(τ,s)+i=1nm1(τ,Si)n\displaystyle=\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{A_{i}1\{S_{i}=s\}}{\pi(s)}\eta_{i,1}(\tau,s)+\sum_{s\in\mathcal{S}}\frac{D_{n}(s)}{\sqrt{n}\pi(s)}m_{1}(\tau,s)+\sum_{i=1}^{n}\frac{m_{1}(\tau,S_{i})}{\sqrt{n}}
i=1ns𝒮Ai1{Si=s}Dn(s)n(s)nπ^(s)π(s)ηi,1(τ,s)s𝒮Dn(s)m1(τ,s)n(s)nπ^(s)π(s)Dn(s)s𝒮Dn(s)m1(τ,s)nπ^(s)\displaystyle-\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{A_{i}1\{S_{i}=s\}D_{n}(s)}{n(s)\sqrt{n}\hat{\pi}(s)\pi(s)}\eta_{i,1}(\tau,s)-\sum_{s\in\mathcal{S}}\frac{D_{n}(s)m_{1}(\tau,s)}{n(s)\sqrt{n}\hat{\pi}(s)\pi(s)}D_{n}(s)-\sum_{s\in\mathcal{S}}\frac{D_{n}(s)m_{1}(\tau,s)}{\sqrt{n}\hat{\pi}(s)}
=s𝒮1ni=1nAi1{Si=s}π(s)ηi,1(τ,s)+i=1nm1(τ,Si)n+R1,1(τ),\displaystyle=\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{A_{i}1\{S_{i}=s\}}{\pi(s)}\eta_{i,1}(\tau,s)+\sum_{i=1}^{n}\frac{m_{1}(\tau,S_{i})}{\sqrt{n}}+R_{1,1}(\tau), (E.1)

where

R1,1(τ)=\displaystyle R_{1,1}(\tau)= i=1ns𝒮Ai1{Si=s}Dn(s)n(s)nπ^(s)π(s)ηi,1(τ,s)s𝒮Dn(s)m1(τ,s)n(s)nπ^(s)π(s)Dn(s)\displaystyle-\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{A_{i}1\{S_{i}=s\}D_{n}(s)}{n(s)\sqrt{n}\hat{\pi}(s)\pi(s)}\eta_{i,1}(\tau,s)-\sum_{s\in\mathcal{S}}\frac{D_{n}(s)m_{1}(\tau,s)}{n(s)\sqrt{n}\hat{\pi}(s)\pi(s)}D_{n}(s)
+s𝒮Dn(s)m1(τ,s)n(1π(s)1π^(s))\displaystyle+\sum_{s\in\mathcal{S}}\frac{D_{n}(s)m_{1}(\tau,s)}{\sqrt{n}}\left(\frac{1}{\pi(s)}-\frac{1}{\hat{\pi}(s)}\right)
=i=1ns𝒮Ai1{Si=s}Dn(s)n(s)nπ^(s)π(s)ηi,1(τ,s).\displaystyle=-\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{A_{i}1\{S_{i}=s\}D_{n}(s)}{n(s)\sqrt{n}\hat{\pi}(s)\pi(s)}\eta_{i,1}(\tau,s).

In addition, note that

{τ1{Yi(1)q1(τ)}m1(τ,Si):τΥ}\displaystyle\{\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\}-m_{1}(\tau,S_{i}):\tau\in\Upsilon\}

is of the VC-type with fixed coefficients (α,v)(\alpha,v) and bounded envelope, and 𝔼(τ1{Yi(1)q1(τ)}m1(τ,Si)|Si=s)=0\mathbb{E}(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\}-m_{1}(\tau,S_{i})|S_{i}=s)=0. Therefore, Lemma N.2 implies

supτΥ,s𝒮|1ni=1nAi1{Si=s}ηi,1(τ,s)|=Op(1).\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}\eta_{i,1}(\tau,s)\right|=O_{p}(1).

By Assumption 1 we have maxs𝒮|Dn(s)/n(s)|=op(1)\max_{s\in\mathcal{S}}|D_{n}(s)/n(s)|=o_{p}(1), maxs𝒮|π^(s)π(s)|=op(1)\max_{s\in\mathcal{S}}|\hat{\pi}(s)-\pi(s)|=o_{p}(1), and mins𝒮π(s)>c>0\min_{s\in\mathcal{S}}\pi(s)>c>0, which imply supτΥ|R1,1(τ)|=op(1)\sup_{\tau\in\Upsilon}|R_{1,1}(\tau)|=o_{p}(1).

Next, denote m¯1(τ,s)=𝔼(m¯1(τ,s,Xi)|Si=s)\overline{m}_{1}(\tau,s)=\mathbb{E}(\overline{m}_{1}(\tau,s,X_{i})|S_{i}=s). Then

L1,2,n\displaystyle L_{1,2,n} =1ns𝒮i=1nAiπ^(s)m¯1(τ,s,Xi)1{Si=s}1ni=1nm¯1(τ,Si,Xi)\displaystyle=\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\sum_{i=1}^{n}\frac{A_{i}}{\hat{\pi}(s)}\overline{m}_{1}(\tau,s,X_{i})1\{S_{i}=s\}-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\overline{m}_{1}(\tau,S_{i},X_{i})
+1ns𝒮1π^(s)i=1n(Aiπ^(s))(m^1(τ,s,Xi)m¯1(τ,s,Xi))1{Si=s}\displaystyle+\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\frac{1}{\hat{\pi}(s)}\sum_{i=1}^{n}(A_{i}-\hat{\pi}(s))\left(\widehat{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s,X_{i})\right)1\{S_{i}=s\}
=1ns𝒮i=1nAiπ^(s)(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s}\displaystyle=\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\sum_{i=1}^{n}\frac{A_{i}}{\hat{\pi}(s)}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}
1ni=1n(m¯1(τ,Si,Xi)m¯1(τ,Si))\displaystyle-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(\overline{m}_{1}(\tau,S_{i},X_{i})-\overline{m}_{1}(\tau,S_{i})\right)
+1ns𝒮1π^(s)i=1n(Aiπ^(s))(m^1(τ,s,Xi)m¯1(τ,s,Xi))1{Si=s}\displaystyle+\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\frac{1}{\hat{\pi}(s)}\sum_{i=1}^{n}(A_{i}-\hat{\pi}(s))\left(\widehat{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s,X_{i})\right)1\{S_{i}=s\}
=1ns𝒮i=1nAiπ(s)(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s}\displaystyle=\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\sum_{i=1}^{n}\frac{A_{i}}{\pi(s)}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}
1ni=1n(m¯1(τ,Si,Xi)m¯1(τ,Si))\displaystyle-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(\overline{m}_{1}(\tau,S_{i},X_{i})-\overline{m}_{1}(\tau,S_{i})\right)
+1ns𝒮(π(s)π^(s)π^(s)π(s))(i=1nAi(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s})\displaystyle+\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\left(\frac{\pi(s)-\hat{\pi}(s)}{\hat{\pi}(s)\pi(s)}\right)\left(\sum_{i=1}^{n}A_{i}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}\right)
+1ns𝒮1π^(s)i=1n(Aiπ^(s))(m^1(τ,s,Xi)m¯1(τ,s,Xi))1{Si=s}\displaystyle+\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\frac{1}{\hat{\pi}(s)}\sum_{i=1}^{n}(A_{i}-\hat{\pi}(s))\left(\widehat{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s,X_{i})\right)1\{S_{i}=s\}
1ns𝒮i=1nAiπ(s)(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s}\displaystyle\equiv\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\sum_{i=1}^{n}\frac{A_{i}}{\pi(s)}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}
1ni=1n(m¯1(τ,Si,Xi)m¯1(τ,Si))+R2(τ),\displaystyle-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(\overline{m}_{1}(\tau,S_{i},X_{i})-\overline{m}_{1}(\tau,S_{i})\right)+R_{2}(\tau), (E.2)

where the second equality holds because

s𝒮i[n]Aiπ^(s)m¯1(τ,s)1{Si=s}=s𝒮n(s)m¯1(τ,s)=i=1nm¯1(τ,Si).\displaystyle\sum_{s\in\mathcal{S}}\sum_{i\in[n]}\frac{A_{i}}{\hat{\pi}(s)}\overline{m}_{1}(\tau,s)1\{S_{i}=s\}=\sum_{s\in\mathcal{S}}n(s)\overline{m}_{1}(\tau,s)=\sum_{i=1}^{n}\overline{m}_{1}(\tau,S_{i}).
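This identity is exact in finite samples because $\sum_{i}A_{i}1\{S_{i}=s\}=n_{1}(s)=\hat{\pi}(s)n(s)$. A quick numerical check (hypothetical data; `mbar` stands in for the stratum-level values $\overline{m}_{1}(\tau,s)$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
S = rng.integers(0, 3, size=n)
A = rng.integers(0, 2, size=n)
mbar = rng.normal(size=3)   # hypothetical values of m-bar_1(tau, s), s = 0, 1, 2

n_s = np.array([np.sum(S == s) for s in range(3)])
pi_hat = np.array([A[S == s].mean() for s in range(3)])

# sum_s sum_i (A_i / pi-hat(s)) m-bar_1(tau, s) 1{S_i = s}
lhs = sum(np.sum(A[S == s]) / pi_hat[s] * mbar[s] for s in range(3))
mid = np.sum(n_s * mbar)    # sum_s n(s) m-bar_1(tau, s)
rhs = np.sum(mbar[S])       # sum_i m-bar_1(tau, S_i)
```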

For the first term of R2(τ)R_{2}(\tau), we have

supτΥ|1ns𝒮(π(s)π^(s)π^(s)π(s))(i=1nAi(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s})|\displaystyle\sup_{\tau\in\Upsilon}\left|\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\left(\frac{\pi(s)-\hat{\pi}(s)}{\hat{\pi}(s)\pi(s)}\right)\left(\sum_{i=1}^{n}A_{i}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}\right)\right|
s𝒮|Dn(s)n1(s)π(s)|supτΥ,s𝒮|1ni=1nAi1{Si=s}(m¯1(τ,s,Xi)m¯1(τ,s))|.\displaystyle\leq\sum_{s\in\mathcal{S}}\left|\frac{D_{n}(s)}{n_{1}(s)\pi(s)}\right|\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))\right|.

Assumption 3 implies

={m¯1(τ,s,Xi)m¯1(τ,s):τΥ}\displaystyle\mathcal{F}=\{\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s):\tau\in\Upsilon\}

is of the VC-type with fixed coefficients (α,v)(\alpha,v) and an envelope FiF_{i} such that 𝔼(|Fi|d|Si=s)<\mathbb{E}(|F_{i}|^{d}|S_{i}=s)<\infty for d>2d>2. Therefore,

supτΥ,s𝒮|1ni=1nAi1{Si=s}(m¯1(τ,s,Xi)m¯1(τ,s))|=Op(n1/2).\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))\right|=O_{p}(n^{-1/2}).

In addition, as shown above, $D_{n}(s)/n(s)=o_{p}(1)$ and $n(s)/n_{1}(s)\stackrel{p}{\longrightarrow}1/\pi(s)<\infty$. Therefore, we have

supτΥ|1ns𝒮(π(s)π^(s)π^(s)π(s))(i=1nAi(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s})|=op(1).\displaystyle\sup_{\tau\in\Upsilon}\left|\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\left(\frac{\pi(s)-\hat{\pi}(s)}{\hat{\pi}(s)\pi(s)}\right)\left(\sum_{i=1}^{n}A_{i}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}\right)\right|=o_{p}(1).

Recall Δ¯1(τ,s,Xi)=m^1(τ,s,Xi)m¯1(τ,s,Xi)\overline{\Delta}_{1}(\tau,s,X_{i})=\widehat{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s,X_{i}). Then, for the second term of R2(τ)R_{2}(\tau), we have

\sup_{\tau\in\Upsilon}\left|\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\frac{1}{\hat{\pi}(s)}\sum_{i=1}^{n}(A_{i}-\hat{\pi}(s))\overline{\Delta}_{1}(\tau,s,X_{i})1\{S_{i}=s\}\right|
\leq\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}n_{0}(s)\sup_{\tau\in\Upsilon}\left|\frac{\sum_{i\in I_{1}(s)}\overline{\Delta}_{1}(\tau,s,X_{i})}{n_{1}(s)}-\frac{\sum_{i\in I_{0}(s)}\overline{\Delta}_{1}(\tau,s,X_{i})}{n_{0}(s)}\right|=o_{p}(1),

where the last equality holds by Assumption 3(i). Therefore, we have

\sup_{\tau\in\Upsilon}|R_{2}(\tau)|=o_{p}(1).
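The bound on the second term of $R_{2}(\tau)$ rests on the exact finite-sample rearrangement $\hat{\pi}(s)^{-1}\sum_{i}(A_{i}-\hat{\pi}(s))\overline{\Delta}_{1}(\tau,s,X_{i})1\{S_{i}=s\}=n_{0}(s)\bigl(n_{1}(s)^{-1}\sum_{i\in I_{1}(s)}\overline{\Delta}_{1}-n_{0}(s)^{-1}\sum_{i\in I_{0}(s)}\overline{\Delta}_{1}\bigr)$, which can be checked numerically (hypothetical data; `delta` stands in for $\overline{\Delta}_{1}(\tau,s,X_{i})$ at a fixed $\tau$):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
S = rng.integers(0, 2, size=n)
A = rng.integers(0, 2, size=n)
delta = rng.normal(size=n)   # stand-in for Delta-bar_1(tau, s, X_i)

lhs_vals, rhs_vals = [], []
for s in range(2):
    idx = np.flatnonzero(S == s)
    a, d = A[idx], delta[idx]
    n1, n0 = a.sum(), (1 - a).sum()
    pi_hat = n1 / idx.size
    # (1 / pi-hat(s)) * sum_i (A_i - pi-hat(s)) Delta-bar 1{S_i = s}
    lhs_vals.append(np.sum((a - pi_hat) * d) / pi_hat)
    # n_0(s) * (treated-group mean minus control-group mean of Delta-bar)
    rhs_vals.append(n0 * (d[a == 1].mean() - d[a == 0].mean()))
```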

Combining (E.1) and (E.2), we have

L1,n(τ)\displaystyle L_{1,n}(\tau) =s𝒮1ni=1nAi1{Si=s}[ηi,1(τ,s)π(s)+(11π(s))(m¯1(τ,s,Xi)m¯1(τ,s))]\displaystyle=\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}\left[\frac{\eta_{i,1}(\tau,s)}{\pi(s)}+\left(1-\frac{1}{\pi(s)}\right)\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\right]
+s𝒮1ni=1n(1Ai)1{Si=s}(m¯1(τ,s,Xi)m¯1(τ,s))\displaystyle+\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(1-A_{i})1\{S_{i}=s\}\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)
+\sum_{i=1}^{n}\frac{m_{1}(\tau,S_{i})}{\sqrt{n}}+R_{1,1}(\tau)-R_{2}(\tau).

Note by Assumption 3 that the classes of functions

{ηi,1(τ,s)π(s)+(11π(s))(m¯1(τ,s,Xi)m¯1(τ,s)):τΥ}\displaystyle\left\{\frac{\eta_{i,1}(\tau,s)}{\pi(s)}+\left(1-\frac{1}{\pi(s)}\right)\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right):\tau\in\Upsilon\right\}

and

{m¯1(τ,s,Xi)m¯1(τ,s):τΥ}\displaystyle\left\{\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s):\tau\in\Upsilon\right\}

are of the VC-type with fixed coefficients and envelopes belonging to L,dL_{\mathbb{P},d}. In addition,

𝔼[ηi,1(τ,s)π(s)+(11π(s))(m¯1(τ,s,Xi)m¯1(τ,s))|Si=s]=0\displaystyle\mathbb{E}\left[\frac{\eta_{i,1}(\tau,s)}{\pi(s)}+\left(1-\frac{1}{\pi(s)}\right)\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\bigg{|}S_{i}=s\right]=0

and

𝔼(m¯1(τ,s,Xi)m¯1(τ,s)|Si=s)=0.\displaystyle\mathbb{E}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)|S_{i}=s)=0.

Therefore, Lemma N.2 implies,

supτΥ|s𝒮1ni=1nAi1{Si=s}[ηi,1(τ,s)π(s)+(11π(s))(m¯1(τ,s,Xi)m¯1(τ,s))]|=Op(1)\displaystyle\sup_{\tau\in\Upsilon}\left|\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}\left[\frac{\eta_{i,1}(\tau,s)}{\pi(s)}+\left(1-\frac{1}{\pi(s)}\right)\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\right]\right|=O_{p}(1)

and

supτΥ|s𝒮1ni=1n(1Ai)1{Si=s}(m¯1(τ,s,Xi)m¯1(τ,s))|=Op(1).\displaystyle\sup_{\tau\in\Upsilon}\left|\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(1-A_{i})1\{S_{i}=s\}\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\right|=O_{p}(1).

This implies supτΥ|L1,n(τ)|=Op(1)\sup_{\tau\in\Upsilon}|L_{1,n}(\tau)|=O_{p}(1). Then by Kato (2009, Theorem 2), we have

n(q^1adj(τ)q1(τ))\displaystyle\sqrt{n}(\hat{q}^{adj}_{1}(\tau)-q_{1}(\tau))
=1f1(q1(τ)){s𝒮1ni=1nAi1{Si=s}[ηi,1(τ,s)π(s)+(11π(s))(m¯1(τ,s,Xi)m¯1(τ,s))]\displaystyle=\frac{1}{f_{1}(q_{1}(\tau))}\biggl{\{}\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}\left[\frac{\eta_{i,1}(\tau,s)}{\pi(s)}+\left(1-\frac{1}{\pi(s)}\right)\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\right]
+s𝒮1ni=1n(1Ai)1{Si=s}(m¯1(τ,s,Xi)m¯1(τ,s))+i=1nm1(τ,Si)n}+Rq,1(τ),\displaystyle+\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(1-A_{i})1\{S_{i}=s\}\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)+\sum_{i=1}^{n}\frac{m_{1}(\tau,S_{i})}{\sqrt{n}}\biggr{\}}+R_{q,1}(\tau),

where supτΥ|Rq,1(τ)|=op(1)\sup_{\tau\in\Upsilon}|R_{q,1}(\tau)|=o_{p}(1). Similarly, we have

n(q^0adj(τ)q0(τ))\displaystyle\sqrt{n}(\hat{q}^{adj}_{0}(\tau)-q_{0}(\tau))
=\displaystyle= 1f0(q0(τ)){s𝒮1ni=1n(1Ai)1{Si=s}[ηi,0(τ,s)1π(s)+(111π(s))(m¯0(τ,s,Xi)m¯0(τ,s))]\displaystyle\frac{1}{f_{0}(q_{0}(\tau))}\biggl{\{}\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(1-A_{i})1\{S_{i}=s\}\left[\frac{\eta_{i,0}(\tau,s)}{1-\pi(s)}+\left(1-\frac{1}{1-\pi(s)}\right)\left(\overline{m}_{0}(\tau,s,X_{i})-\overline{m}_{0}(\tau,s)\right)\right]
+s𝒮1ni=1nAi1{Si=s}(m¯0(τ,s,Xi)m¯0(τ,s))+i=1nm0(τ,Si)n}+Rq,0(τ),\displaystyle+\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}\left(\overline{m}_{0}(\tau,s,X_{i})-\overline{m}_{0}(\tau,s)\right)+\sum_{i=1}^{n}\frac{m_{0}(\tau,S_{i})}{\sqrt{n}}\biggr{\}}+R_{q,0}(\tau),

where ηi,0(τ,s)=τ1{Yi(0)q0(τ)}m0(τ,s)\eta_{i,0}(\tau,s)=\tau-1\{Y_{i}(0)\leq q_{0}(\tau)\}-m_{0}(\tau,s) and supτΥ|Rq,0(τ)|=op(1)\sup_{\tau\in\Upsilon}|R_{q,0}(\tau)|=o_{p}(1). Taking the difference of the above two displays gives

n(q^adj(τ)q(τ))\displaystyle\sqrt{n}(\hat{q}^{adj}(\tau)-q(\tau))
=s𝒮1ni=1nAi1{Si=s}[ηi,1(τ,s)(1π(s))(m¯1(τ,s,Xi)m¯1(τ,s))π(s)f1(q1(τ))(m¯0(τ,s,Xi)m¯0(τ,s))f0(q0(τ))]\displaystyle=\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}\left[\frac{\eta_{i,1}(\tau,s)-(1-\pi(s))\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)}{\pi(s)f_{1}(q_{1}(\tau))}-\frac{\left(\overline{m}_{0}(\tau,s,X_{i})-\overline{m}_{0}(\tau,s)\right)}{f_{0}(q_{0}(\tau))}\right]
s𝒮1ni=1n(1Ai)1{Si=s}[ηi,0(τ,s)π(s)(m¯0(τ,s,Xi)m¯0(τ,s))(1π(s))f0(q0(τ))(m¯1(τ,s,Xi)m¯1(τ,s))f1(q1(τ))]\displaystyle-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(1-A_{i})1\{S_{i}=s\}\left[\frac{\eta_{i,0}(\tau,s)-\pi(s)\left(\overline{m}_{0}(\tau,s,X_{i})-\overline{m}_{0}(\tau,s)\right)}{(1-\pi(s))f_{0}(q_{0}(\tau))}-\frac{\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)}{f_{1}(q_{1}(\tau))}\right]
+1ni=1n(m1(τ,Si)f1(q1(τ))m0(τ,Si)f0(q0(τ)))+Rq(τ)\displaystyle+\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(\frac{m_{1}(\tau,S_{i})}{f_{1}(q_{1}(\tau))}-\frac{m_{0}(\tau,S_{i})}{f_{0}(q_{0}(\tau))}\right)+R_{q}(\tau)
s𝒮1ni=1nAi1{Si=s}ϕ1(τ,s,Yi(1),Xi)s𝒮1ni=1n(1Ai)1{Si=s}ϕ0(τ,s,Yi(0),Xi)\displaystyle\equiv\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}\phi_{1}(\tau,s,Y_{i}(1),X_{i})-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(1-A_{i})1\{S_{i}=s\}\phi_{0}(\tau,s,Y_{i}(0),X_{i})
+1ni=1nϕs(τ,Si)+Rq(τ),\displaystyle+\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\phi_{s}(\tau,S_{i})+R_{q}(\tau),

where supτΥ|Rq(τ)|=op(1)\sup_{\tau\in\Upsilon}|R_{q}(\tau)|=o_{p}(1). Lemma N.3 shows that, uniformly over τΥ\tau\in\Upsilon,

n(q^adj(τ)q(τ))(τ),\displaystyle\sqrt{n}(\hat{q}^{adj}(\tau)-q(\tau))\rightsquigarrow\mathcal{B}(\tau),

where (τ)\mathcal{B}(\tau) is a Gaussian process with covariance kernel

Σ(τ,τ)\displaystyle\Sigma(\tau,\tau^{\prime}) =𝔼π(Si)ϕ1(τ,Si,Yi(1),Xi)ϕ1(τ,Si,Yi(1),Xi)\displaystyle=\mathbb{E}\pi(S_{i})\phi_{1}(\tau,S_{i},Y_{i}(1),X_{i})\phi_{1}(\tau^{\prime},S_{i},Y_{i}(1),X_{i})
+𝔼(1π(Si))ϕ0(τ,Si,Yi(0),Xi)ϕ0(τ,Si,Yi(0),Xi)\displaystyle+\mathbb{E}(1-\pi(S_{i}))\phi_{0}(\tau,S_{i},Y_{i}(0),X_{i})\phi_{0}(\tau^{\prime},S_{i},Y_{i}(0),X_{i})
+𝔼ϕs(τ,Si)ϕs(τ,Si).\displaystyle+\mathbb{E}\phi_{s}(\tau,S_{i})\phi_{s}(\tau^{\prime},S_{i}).

For the second result in Theorem 3, we denote

δa(τ,Si,Xi)=ma(τ,Si,Xi)ma(τ,Si)andδ¯a(τ,Si,Xi)=(m¯a(τ,Si,Xi)m¯a(τ,Si)),a=0,1.\displaystyle\delta_{a}(\tau,S_{i},X_{i})=m_{a}(\tau,S_{i},X_{i})-m_{a}(\tau,S_{i})\quad\text{and}\quad\overline{\delta}_{a}(\tau,S_{i},X_{i})=(\overline{m}_{a}(\tau,S_{i},X_{i})-\overline{m}_{a}(\tau,S_{i})),\leavevmode\nobreak\ a=0,1. (E.3)

Then

𝔼π(Si)ϕ1(τ,Si,Yi(1),Xi)ϕ1(τ,Si,Yi(1),Xi)\displaystyle\mathbb{E}\pi(S_{i})\phi_{1}(\tau,S_{i},Y_{i}(1),X_{i})\phi_{1}(\tau^{\prime},S_{i},Y_{i}(1),X_{i})
=𝔼1π(Si)[(τ1{Yi(1)q1(τ)}m1(τ,Si,Xi))f1(q1(τ))][(τ1{Yi(1)q1(τ)}m1(τ,Si,Xi))f1(q1(τ))]\displaystyle=\mathbb{E}\frac{1}{\pi(S_{i})}\left[\frac{(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\}-m_{1}(\tau,S_{i},X_{i}))}{f_{1}(q_{1}(\tau))}\right]\left[\frac{(\tau^{\prime}-1\{Y_{i}(1)\leq q_{1}(\tau^{\prime})\}-m_{1}(\tau^{\prime},S_{i},X_{i}))}{f_{1}(q_{1}(\tau^{\prime}))}\right]
+𝔼π(Si)[δ1(τ,Si,Xi)δ¯1(τ,Si,Xi)π(Si)f1(q1(τ))+(δ¯1(τ,Si,Xi)f1(q1(τ))δ¯0(τ,Si,Xi)f0(q0(τ)))]\displaystyle+\mathbb{E}\pi(S_{i})\left[\frac{\delta_{1}(\tau,S_{i},X_{i})-\overline{\delta}_{1}(\tau,S_{i},X_{i})}{\pi(S_{i})f_{1}(q_{1}(\tau))}+\left(\frac{\overline{\delta}_{1}(\tau,S_{i},X_{i})}{f_{1}(q_{1}(\tau))}-\frac{\overline{\delta}_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right)\right]
×[δ1(τ,Si,Xi)δ¯1(τ,Si,Xi)π(Si)f1(q1(τ))+(δ¯1(τ,Si,Xi)f1(q1(τ))δ¯0(τ,Si,Xi)f0(q0(τ)))],\displaystyle\times\left[\frac{\delta_{1}(\tau^{\prime},S_{i},X_{i})-\overline{\delta}_{1}(\tau^{\prime},S_{i},X_{i})}{\pi(S_{i})f_{1}(q_{1}(\tau^{\prime}))}+\left(\frac{\overline{\delta}_{1}(\tau^{\prime},S_{i},X_{i})}{f_{1}(q_{1}(\tau^{\prime}))}-\frac{\overline{\delta}_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right)\right],
\mathbb{E}(1-\pi(S_{i}))\phi_{0}(\tau,S_{i},Y_{i}(0),X_{i})\phi_{0}(\tau^{\prime},S_{i},Y_{i}(0),X_{i})
=\mathbb{E}\frac{1}{1-\pi(S_{i})}\left[\frac{\tau-1\{Y_{i}(0)\leq q_{0}(\tau)\}-m_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right]\left[\frac{\tau^{\prime}-1\{Y_{i}(0)\leq q_{0}(\tau^{\prime})\}-m_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right]
+𝔼(1π(Si))[δ0(τ,Si,Xi)δ¯0(τ,Si,Xi)(1π(Si))f0(q0(τ))(δ¯1(τ,Si,Xi)f1(q1(τ))δ¯0(τ,Si,Xi)f0(q0(τ)))]\displaystyle+\mathbb{E}(1-\pi(S_{i}))\left[\frac{\delta_{0}(\tau,S_{i},X_{i})-\overline{\delta}_{0}(\tau,S_{i},X_{i})}{(1-\pi(S_{i}))f_{0}(q_{0}(\tau))}-\left(\frac{\overline{\delta}_{1}(\tau,S_{i},X_{i})}{f_{1}(q_{1}(\tau))}-\frac{\overline{\delta}_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right)\right]
×[δ0(τ,Si,Xi)δ¯0(τ,Si,Xi)(1π(Si))f0(q0(τ))(δ¯1(τ,Si,Xi)f1(q1(τ))δ¯0(τ,Si,Xi)f0(q0(τ)))],\displaystyle\times\left[\frac{\delta_{0}(\tau^{\prime},S_{i},X_{i})-\overline{\delta}_{0}(\tau^{\prime},S_{i},X_{i})}{(1-\pi(S_{i}))f_{0}(q_{0}(\tau^{\prime}))}-\left(\frac{\overline{\delta}_{1}(\tau^{\prime},S_{i},X_{i})}{f_{1}(q_{1}(\tau^{\prime}))}-\frac{\overline{\delta}_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right)\right],

and

𝔼ϕs(τ,Si)ϕs(τ,Si)\displaystyle\mathbb{E}\phi_{s}(\tau,S_{i})\phi_{s}(\tau^{\prime},S_{i}) =𝔼(m1(τ,Si)f1(q1(τ))m0(τ,Si)f0(q0(τ)))(m1(τ,Si)f1(q1(τ))m0(τ,Si)f0(q0(τ)))\displaystyle=\mathbb{E}\left(\frac{m_{1}(\tau,S_{i})}{f_{1}(q_{1}(\tau))}-\frac{m_{0}(\tau,S_{i})}{f_{0}(q_{0}(\tau))}\right)\left(\frac{m_{1}(\tau^{\prime},S_{i})}{f_{1}(q_{1}(\tau^{\prime}))}-\frac{m_{0}(\tau^{\prime},S_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right)
=𝔼(m1(τ,Si,Xi)f1(q1(τ))m0(τ,Si,Xi)f0(q0(τ)))(m1(τ,Si,Xi)f1(q1(τ))m0(τ,Si,Xi)f0(q0(τ)))\displaystyle=\mathbb{E}\left(\frac{m_{1}(\tau,S_{i},X_{i})}{f_{1}(q_{1}(\tau))}-\frac{m_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right)\left(\frac{m_{1}(\tau^{\prime},S_{i},X_{i})}{f_{1}(q_{1}(\tau^{\prime}))}-\frac{m_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right)
𝔼(δ1(τ,Si,Xi)f1(q1(τ))δ0(τ,Si,Xi)f0(q0(τ)))(δ1(τ,Si,Xi)f1(q1(τ))δ0(τ,Si,Xi)f0(q0(τ))).\displaystyle-\mathbb{E}\left(\frac{\delta_{1}(\tau,S_{i},X_{i})}{f_{1}(q_{1}(\tau))}-\frac{\delta_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right)\left(\frac{\delta_{1}(\tau^{\prime},S_{i},X_{i})}{f_{1}(q_{1}(\tau^{\prime}))}-\frac{\delta_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right).

Let

Σ(τ,τ)\displaystyle\Sigma^{*}(\tau,\tau^{\prime})
=𝔼1π(Si)[(τ1{Yi(1)q1(τ)}m1(τ,Si,Xi))f1(q1(τ))][(τ1{Yi(1)q1(τ)}m1(τ,Si,Xi))f1(q1(τ))]\displaystyle=\mathbb{E}\frac{1}{\pi(S_{i})}\left[\frac{(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\}-m_{1}(\tau,S_{i},X_{i}))}{f_{1}(q_{1}(\tau))}\right]\left[\frac{(\tau^{\prime}-1\{Y_{i}(1)\leq q_{1}(\tau^{\prime})\}-m_{1}(\tau^{\prime},S_{i},X_{i}))}{f_{1}(q_{1}(\tau^{\prime}))}\right]
+\mathbb{E}\frac{1}{1-\pi(S_{i})}\left[\frac{\tau-1\{Y_{i}(0)\leq q_{0}(\tau)\}-m_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right]\left[\frac{\tau^{\prime}-1\{Y_{i}(0)\leq q_{0}(\tau^{\prime})\}-m_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right]
+𝔼(m1(τ,Si,Xi)f1(q1(τ))m0(τ,Si,Xi)f0(q0(τ)))(m1(τ,Si,Xi)f1(q1(τ))m0(τ,Si,Xi)f0(q0(τ))),\displaystyle+\mathbb{E}\left(\frac{m_{1}(\tau,S_{i},X_{i})}{f_{1}(q_{1}(\tau))}-\frac{m_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right)\left(\frac{m_{1}(\tau^{\prime},S_{i},X_{i})}{f_{1}(q_{1}(\tau^{\prime}))}-\frac{m_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right),

which does not rely on the working models. Then,

Σ(τ,τ)Σ(τ,τ)\displaystyle\Sigma(\tau,\tau^{\prime})-\Sigma^{*}(\tau,\tau^{\prime})
=𝔼π(Si)[δ1(τ,Si,Xi)δ¯1(τ,Si,Xi)π(Si)f1(q1(τ))+(δ¯1(τ,Si,Xi)f1(q1(τ))δ¯0(τ,Si,Xi)f0(q0(τ)))]\displaystyle=\mathbb{E}\pi(S_{i})\left[\frac{\delta_{1}(\tau,S_{i},X_{i})-\overline{\delta}_{1}(\tau,S_{i},X_{i})}{\pi(S_{i})f_{1}(q_{1}(\tau))}+\left(\frac{\overline{\delta}_{1}(\tau,S_{i},X_{i})}{f_{1}(q_{1}(\tau))}-\frac{\overline{\delta}_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right)\right]
×[δ1(τ,Si,Xi)δ¯1(τ,Si,Xi)π(Si)f1(q1(τ))+(δ¯1(τ,Si,Xi)f1(q1(τ))δ¯0(τ,Si,Xi)f0(q0(τ)))]\displaystyle\times\left[\frac{\delta_{1}(\tau^{\prime},S_{i},X_{i})-\overline{\delta}_{1}(\tau^{\prime},S_{i},X_{i})}{\pi(S_{i})f_{1}(q_{1}(\tau^{\prime}))}+\left(\frac{\overline{\delta}_{1}(\tau^{\prime},S_{i},X_{i})}{f_{1}(q_{1}(\tau^{\prime}))}-\frac{\overline{\delta}_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right)\right]
+𝔼(1π(Si))[δ0(τ,Si,Xi)δ¯0(τ,Si,Xi)(1π(Si))f0(q0(τ))(δ¯1(τ,Si,Xi)f1(q1(τ))δ¯0(τ,Si,Xi)f0(q0(τ)))]\displaystyle+\mathbb{E}(1-\pi(S_{i}))\left[\frac{\delta_{0}(\tau,S_{i},X_{i})-\overline{\delta}_{0}(\tau,S_{i},X_{i})}{(1-\pi(S_{i}))f_{0}(q_{0}(\tau))}-\left(\frac{\overline{\delta}_{1}(\tau,S_{i},X_{i})}{f_{1}(q_{1}(\tau))}-\frac{\overline{\delta}_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right)\right]
×[δ0(τ,Si,Xi)δ¯0(τ,Si,Xi)(1π(Si))f0(q0(τ))(δ¯1(τ,Si,Xi)f1(q1(τ))δ¯0(τ,Si,Xi)f0(q0(τ)))]\displaystyle\times\left[\frac{\delta_{0}(\tau^{\prime},S_{i},X_{i})-\overline{\delta}_{0}(\tau^{\prime},S_{i},X_{i})}{(1-\pi(S_{i}))f_{0}(q_{0}(\tau^{\prime}))}-\left(\frac{\overline{\delta}_{1}(\tau^{\prime},S_{i},X_{i})}{f_{1}(q_{1}(\tau^{\prime}))}-\frac{\overline{\delta}_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right)\right]
𝔼(δ1(τ,Si,Xi)f1(q1(τ))δ0(τ,Si,Xi)f0(q0(τ)))(δ1(τ,Si,Xi)f1(q1(τ))δ0(τ,Si,Xi)f0(q0(τ)))\displaystyle-\mathbb{E}\left(\frac{\delta_{1}(\tau,S_{i},X_{i})}{f_{1}(q_{1}(\tau))}-\frac{\delta_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right)\left(\frac{\delta_{1}(\tau^{\prime},S_{i},X_{i})}{f_{1}(q_{1}(\tau^{\prime}))}-\frac{\delta_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right)
=𝔼[1π(Si)π(Si)δ1(τ,Si,Xi)δ¯1(τ,Si,Xi)f1(q1(τ))+π(Si)1π(Si)δ0(τ,Si,Xi)δ¯0(τ,Si,Xi)f0(q0(τ))]\displaystyle=\mathbb{E}\left[\sqrt{\frac{1-\pi(S_{i})}{\pi(S_{i})}}\frac{\delta_{1}(\tau,S_{i},X_{i})-\overline{\delta}_{1}(\tau,S_{i},X_{i})}{f_{1}(q_{1}(\tau))}+\sqrt{\frac{\pi(S_{i})}{1-\pi(S_{i})}}\frac{\delta_{0}(\tau,S_{i},X_{i})-\overline{\delta}_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right]
×[1π(Si)π(Si)δ1(τ,Si,Xi)δ¯1(τ,Si,Xi)f1(q1(τ))+π(Si)1π(Si)δ0(τ,Si,Xi)δ¯0(τ,Si,Xi)f0(q0(τ))]\displaystyle\times\left[\sqrt{\frac{1-\pi(S_{i})}{\pi(S_{i})}}\frac{\delta_{1}(\tau^{\prime},S_{i},X_{i})-\overline{\delta}_{1}(\tau^{\prime},S_{i},X_{i})}{f_{1}(q_{1}(\tau^{\prime}))}+\sqrt{\frac{\pi(S_{i})}{1-\pi(S_{i})}}\frac{\delta_{0}(\tau^{\prime},S_{i},X_{i})-\overline{\delta}_{0}(\tau^{\prime},S_{i},X_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right]
𝔼ui(τ)ui(τ),\displaystyle\equiv\mathbb{E}u_{i}(\tau)u_{i}(\tau^{\prime}),

where

ui(τ)=1π(Si)π(Si)(δ1(τ,Si,Xi)δ¯1(τ,Si,Xi))f1(q1(τ))+π(Si)1π(Si)(δ0(τ,Si,Xi)δ¯0(τ,Si,Xi))f0(q0(τ)).u_{i}(\tau)=\sqrt{\frac{1-\pi(S_{i})}{\pi(S_{i})}}\frac{\left(\delta_{1}(\tau,S_{i},X_{i})-\overline{\delta}_{1}(\tau,S_{i},X_{i})\right)}{f_{1}(q_{1}(\tau))}+\sqrt{\frac{\pi(S_{i})}{1-\pi(S_{i})}}\frac{\left(\delta_{0}(\tau,S_{i},X_{i})-\overline{\delta}_{0}(\tau,S_{i},X_{i})\right)}{f_{0}(q_{0}(\tau))}.

Further, denote $\vec{u}_{i}=(u_{i}(\tau_{1}),\cdots,u_{i}(\tau_{K}))^{\top}$, write the asymptotic variance-covariance matrix of $(\hat{q}^{adj}(\tau_{1}),\cdots,\hat{q}^{adj}(\tau_{K}))$ as $[\Sigma_{kl}]_{k,l\in[K]}$, and write the optimal variance-covariance matrix as $[\Sigma^{*}_{kl}]_{k,l\in[K]}$. We have

[Σkl]k,l[K][Σkl]k,l[K]=[𝔼ui(τk)ui(τl)]k,l[K]=𝔼uiui,\displaystyle[\Sigma_{kl}]_{k,l\in[K]}-[\Sigma^{*}_{kl}]_{k,l\in[K]}=[\mathbb{E}u_{i}(\tau_{k})u_{i}(\tau_{l})]_{k,l\in[K]}=\mathbb{E}\vec{u}_{i}\vec{u}_{i}^{\top},

which is positive semidefinite. In addition, 𝔼uiui=0\mathbb{E}\vec{u}_{i}\vec{u}_{i}^{\top}=0 if m¯a(τ,s,x)=ma(τ,s,x)\overline{m}_{a}(\tau,s,x)=m_{a}(\tau,s,x) for a=0,1a=0,1, τ{τ1,,τK}\tau\in\{\tau_{1},\cdots,\tau_{K}\}, and (s,x)(s,x) in the joint support of (Si,Xi)(S_{i},X_{i}). This concludes the proof.
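The positive semidefiniteness of $[\Sigma_{kl}]-[\Sigma^{*}_{kl}]=\mathbb{E}\vec{u}_{i}\vec{u}_{i}^{\top}$ is automatic because any matrix of Gram form $\mathbb{E}\vec{u}\vec{u}^{\top}$ is positive semidefinite. The sketch below illustrates this with a sample analogue (arbitrary correlated draws standing in for $\vec{u}_{i}$, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 5000, 3
# Draws of u_i = (u_i(tau_1), ..., u_i(tau_K)), correlated across tau's.
U = rng.normal(size=(n, K)) @ rng.normal(size=(K, K))
G = U.T @ U / n                 # sample analogue of E[u_i u_i^T]
eigs = np.linalg.eigvalsh(G)    # all eigenvalues are nonnegative (up to fp error)

# Correct specification: u_i(tau) = 0 identically, so the variance gap vanishes.
U0 = np.zeros_like(U)
G0 = U0.T @ U0 / n
```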

Appendix F Proof of Theorem 5

$n_{a}^{w}(s)$: for $a\in\{0,1\}$ and $s\in\mathcal{S}$, $n_{1}^{w}(s)=\sum_{i=1}^{n}\xi_{i}A_{i}1\{S_{i}=s\}$ and $n_{0}^{w}(s)=\sum_{i=1}^{n}\xi_{i}(1-A_{i})1\{S_{i}=s\}$, where $\{\xi_{i}\}_{i\in[n]}$ are the bootstrap weights.
$n^{w}(s)$: for $s\in\mathcal{S}$, $n^{w}(s)=\sum_{i=1}^{n}\xi_{i}1\{S_{i}=s\}$.
$\hat{\pi}^{w}(s)$: for $s\in\mathcal{S}$, $\hat{\pi}^{w}(s)=n_{1}^{w}(s)/n^{w}(s)$.
$\hat{q}_{a}^{w}(\tau)$: for $a\in\{0,1\}$ and $\tau\in\Upsilon$, the bootstrap estimator of $q_{a}(\tau)$ with a generic regression adjustment.
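As an illustration of the bootstrap notation (hypothetical data, with exponential multipliers as one admissible choice of $\xi_{i}$), the sketch below computes $n^{w}(s)$, $n_{1}^{w}(s)$, $\hat{\pi}^{w}(s)$, and $D_{n}^{w}(s)$; note the exact identity $D_{n}^{w}(s)=(\hat{\pi}^{w}(s)-\pi(s))n^{w}(s)$ used later in the proof.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 800
S = rng.integers(0, 2, size=n)
A = rng.integers(0, 2, size=n)
pi = 0.5                          # here pi(s) = 1/2 in both strata
xi = rng.exponential(size=n)      # hypothetical positive bootstrap multipliers

nw_s = np.array([xi[S == s].sum() for s in range(2)])                 # n^w(s)
n1w_s = np.array([xi[(S == s) & (A == 1)].sum() for s in range(2)])   # n_1^w(s)
piw_hat = n1w_s / nw_s                                                # pi-hat^w(s)
Dw_n = np.array([np.sum(xi * (A - pi) * (S == s)) for s in range(2)]) # D_n^w(s)
```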

We focus on deriving the linear expansion of $\hat{q}^{w}_{1}(\tau)$. Let

Lnw(u,τ)=\displaystyle L_{n}^{w}(u,\tau)= i=1nξi[Aiπ^w(Si)[ρτ(Yiq1(τ)u/n)ρτ(Yiq1(τ))]+(Aiπ^w(Si))π^w(Si)nm^1(τ,Si,Xi)u]\displaystyle\sum_{i=1}^{n}\xi_{i}\left[\frac{A_{i}}{\hat{\pi}^{w}(S_{i})}\left[\rho_{\tau}(Y_{i}-q_{1}(\tau)-u/\sqrt{n})-\rho_{\tau}(Y_{i}-q_{1}(\tau))\right]+\frac{(A_{i}-\hat{\pi}^{w}(S_{i}))}{\hat{\pi}^{w}(S_{i})\sqrt{n}}\widehat{m}_{1}(\tau,S_{i},X_{i})u\right]
\displaystyle\equiv L1,nw(τ)u+L2,nw(u,τ),\displaystyle-L_{1,n}^{w}(\tau)u+L_{2,n}^{w}(u,\tau),

where

L1,nw(τ)=1ni=1nξi[Aiπ^w(Si)(τ1{Yiq1(τ)})(Aiπ^w(Si))π^w(Si)m^1(τ,Si,Xi)],\displaystyle L_{1,n}^{w}(\tau)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}\left[\frac{A_{i}}{\hat{\pi}^{w}(S_{i})}(\tau-1\{Y_{i}\leq q_{1}(\tau)\})-\frac{(A_{i}-\hat{\pi}^{w}(S_{i}))}{\hat{\pi}^{w}(S_{i})}\widehat{m}_{1}(\tau,S_{i},X_{i})\right],

and

L_{2,n}^{w}(u,\tau)=\sum_{i=1}^{n}\frac{\xi_{i}A_{i}}{\hat{\pi}^{w}(S_{i})}\int_{0}^{u/\sqrt{n}}\left(1\{Y_{i}\leq q_{1}(\tau)+v\}-1\{Y_{i}\leq q_{1}(\tau)\}\right)dv.

By change of variables, we have

n(q^1w(τ)q1(τ))=argminuLnw(u,τ).\displaystyle\sqrt{n}(\hat{q}_{1}^{w}(\tau)-q_{1}(\tau))=\operatorname*{arg\,min}_{u}L_{n}^{w}(u,\tau).
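The bootstrap estimator can be sketched exactly as the original one: reweight every summand by $\xi_{i}$ and minimize over a grid. Again this is a hypothetical illustration with $\widehat{m}_{1}\equiv 0$ and, for brevity, a single stratum, so $\hat{\pi}^{w}$ is a scalar:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000
A = rng.integers(0, 2, size=n)
Y = rng.normal(size=n) + A * 0.5
xi = rng.exponential(size=n)            # multiplier bootstrap weights
tau = 0.5

pi_w = np.sum(xi * A) / np.sum(xi)      # pi-hat^w with a single stratum

def rho(u, tau):
    return u * (tau - (u <= 0))

def objective_w(q):
    # xi-weighted analogue of the original check-function objective (m-hat = 0)
    return np.sum(xi * (A / pi_w) * rho(Y - q, tau))

grid = np.linspace(-1, 2, 3001)
q1_w = grid[np.argmin([objective_w(q) for q in grid])]
```

The bootstrap draw $\hat{q}_{1}^{w}(\tau)$ fluctuates around $\hat{q}_{1}^{adj}(\tau)$, and repeating this over many weight draws delivers the multiplier bootstrap distribution.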

Note that $L_{2,n}^{w}(u,\tau)$ is exactly the same as the corresponding term considered in the proof of Theorem 3.2 in Zhang and Zheng (2020), and by their result we have

\sup_{\tau\in\Upsilon}\left|L_{2,n}^{w}(u,\tau)-\frac{f_{1}(q_{1}(\tau))u^{2}}{2}\right|=o_{p}(1).

Next consider L1,nw(τ)L_{1,n}^{w}(\tau). Recall m1(τ,s)=𝔼(m1(τ,Si,Xi)|Si=s)m_{1}(\tau,s)=\mathbb{E}(m_{1}(\tau,S_{i},X_{i})|S_{i}=s) and ηi,1(τ,s)=τ1{Yiq1(τ)}m1(τ,s)\eta_{i,1}(\tau,s)=\tau-1\{Y_{i}\leq q_{1}(\tau)\}-m_{1}(\tau,s). Denote

L1,nw(τ)\displaystyle L_{1,n}^{w}(\tau) =1ni=1nξi[Aiπ^w(Si)(τ1{Yiq1(τ)})]1ni=1nξi[(Aiπ^w(Si))π^w(Si)m^1(τ,Si,Xi)]\displaystyle=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}\left[\frac{A_{i}}{\hat{\pi}^{w}(S_{i})}(\tau-1\{Y_{i}\leq q_{1}(\tau)\})\right]-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}\left[\frac{(A_{i}-\hat{\pi}^{w}(S_{i}))}{\hat{\pi}^{w}(S_{i})}\widehat{m}_{1}(\tau,S_{i},X_{i})\right]
L1,1,nw(τ)L1,2,nw(τ).\displaystyle\equiv L_{1,1,n}^{w}(\tau)-L_{1,2,n}^{w}(\tau).

First, note that

L1,1,nw(τ)=1ni=1ns𝒮ξiAiπ(s)1{Si=s}(τ1{Yi(1)q1(τ)})\displaystyle L_{1,1,n}^{w}(\tau)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{\xi_{i}A_{i}}{\pi(s)}1\{S_{i}=s\}(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\})
i=1ns𝒮ξiAi1{Si=s}(π^w(s)π(s))nπ^w(s)π(s)(τ1{Yi(1)q1(τ)})\displaystyle-\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{\xi_{i}A_{i}1\{S_{i}=s\}(\hat{\pi}^{w}(s)-\pi(s))}{\sqrt{n}\hat{\pi}^{w}(s)\pi(s)}(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\})
=1ni=1ns𝒮ξiAiπ(s)1{Si=s}(τ1{Yi(1)q1(τ)})\displaystyle=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{\xi_{i}A_{i}}{\pi(s)}1\{S_{i}=s\}(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\})
i=1ns𝒮ξiAi1{Si=s}Dnw(s)nw(s)nπ^w(s)π(s)ηi,1(τ,s)s𝒮Dnw(s)m1(τ,s)nw(s)nπ^w(s)π(s)Dnw(s)s𝒮Dnw(s)m1(τ,s)nπ^w(s)\displaystyle-\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{\xi_{i}A_{i}1\{S_{i}=s\}D_{n}^{w}(s)}{n^{w}(s)\sqrt{n}\hat{\pi}^{w}(s)\pi(s)}\eta_{i,1}(\tau,s)-\sum_{s\in\mathcal{S}}\frac{D_{n}^{w}(s)m_{1}(\tau,s)}{n^{w}(s)\sqrt{n}\hat{\pi}^{w}(s)\pi(s)}D_{n}^{w}(s)-\sum_{s\in\mathcal{S}}\frac{D_{n}^{w}(s)m_{1}(\tau,s)}{\sqrt{n}\hat{\pi}^{w}(s)}
=s𝒮1ni=1nξiAi1{Si=s}π(s)ηi,1(τ,s)+s𝒮Dwn(s)nπ(s)m1(τ,s)+i=1nξim1(τ,Si)n\displaystyle=\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\xi_{i}A_{i}1\{S_{i}=s\}}{\pi(s)}\eta_{i,1}(\tau,s)+\sum_{s\in\mathcal{S}}\frac{D^{w}_{n}(s)}{\sqrt{n}\pi(s)}m_{1}(\tau,s)+\sum_{i=1}^{n}\frac{\xi_{i}m_{1}(\tau,S_{i})}{\sqrt{n}}
i=1ns𝒮ξiAi1{Si=s}Dnw(s)nw(s)nπ^w(s)π(s)ηi,1(τ,s)s𝒮Dnw(s)m1(τ,s)nw(s)nπ^w(s)π(s)Dwn(s)s𝒮Dwn(s)m1(τ,s)nπ^w(s)\displaystyle-\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{\xi_{i}A_{i}1\{S_{i}=s\}D_{n}^{w}(s)}{n^{w}(s)\sqrt{n}\hat{\pi}^{w}(s)\pi(s)}\eta_{i,1}(\tau,s)-\sum_{s\in\mathcal{S}}\frac{D_{n}^{w}(s)m_{1}(\tau,s)}{n^{w}(s)\sqrt{n}\hat{\pi}^{w}(s)\pi(s)}D^{w}_{n}(s)-\sum_{s\in\mathcal{S}}\frac{D^{w}_{n}(s)m_{1}(\tau,s)}{\sqrt{n}\hat{\pi}^{w}(s)}
=s𝒮1ni=1nξiAi1{Si=s}π(s)ηi,1(τ,s)+i=1nξim1(τ,Si)n+R1,1w(τ),\displaystyle=\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\xi_{i}A_{i}1\{S_{i}=s\}}{\pi(s)}\eta_{i,1}(\tau,s)+\sum_{i=1}^{n}\frac{\xi_{i}m_{1}(\tau,S_{i})}{\sqrt{n}}+R_{1,1}^{w}(\tau), (F.1)

where $D_{n}^{w}(s)=\sum_{i=1}^{n}\xi_{i}(A_{i}-\pi(S_{i}))1\{S_{i}=s\}=(\hat{\pi}^{w}(s)-\pi(s))n^{w}(s)$,

R1,1w(τ)=i=1ns𝒮ξiAi1{Si=s}Dwn(s)nw(s)nπ^w(s)π(s)ηi,1(τ,s)s𝒮Dwn(s)m1(τ,s)nw(s)nπ^w(s)π(s)Dwn(s)\displaystyle R_{1,1}^{w}(\tau)=-\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{\xi_{i}A_{i}1\{S_{i}=s\}D^{w}_{n}(s)}{n^{w}(s)\sqrt{n}\hat{\pi}^{w}(s)\pi(s)}\eta_{i,1}(\tau,s)-\sum_{s\in\mathcal{S}}\frac{D^{w}_{n}(s)m_{1}(\tau,s)}{n^{w}(s)\sqrt{n}\hat{\pi}^{w}(s)\pi(s)}D^{w}_{n}(s)
+s𝒮Dwn(s)m1(τ,s)n(1π(s)1π^w(s))=i=1ns𝒮ξiAi1{Si=s}Dwn(s)nw(s)nπ^w(s)π(s)ηi,1(τ,s).\displaystyle+\sum_{s\in\mathcal{S}}\frac{D^{w}_{n}(s)m_{1}(\tau,s)}{\sqrt{n}}\left(\frac{1}{\pi(s)}-\frac{1}{\hat{\pi}^{w}(s)}\right)=-\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\frac{\xi_{i}A_{i}1\{S_{i}=s\}D^{w}_{n}(s)}{n^{w}(s)\sqrt{n}\hat{\pi}^{w}(s)\pi(s)}\eta_{i,1}(\tau,s).

Note that

{ξi(τ1{Yi(1)q1(τ)}m1(τ,Si)):τΥ}\displaystyle\{\xi_{i}(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\}-m_{1}(\tau,S_{i})):\tau\in\Upsilon\}

is of the VC-type with fixed coefficients (α,v)(\alpha,v) and the envelope Fi=ξiF_{i}=\xi_{i}, and

𝔼[ξi(τ1{Yi(1)q1(τ)}m1(τ,Si))|Si=s]=0.\mathbb{E}\left[\xi_{i}(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\}-m_{1}(\tau,S_{i}))|S_{i}=s\right]=0.

In addition, we can take $\sigma_{n}^{2}=\mathbb{E}(F_{i}^{2}|S_{i}=s)$, which is bounded by a constant $C<\infty$. Then, Lemma N.2 implies

supτΥ,s𝒮|1ni=1nAi1{Si=s}ξiηi,1(τ,s)|=Op(1).\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}\xi_{i}\eta_{i,1}(\tau,s)\right|=O_{p}(1).

In addition, Lemma N.4 implies maxs𝒮|Dnw(s)/nw(s)|=op(1)\max_{s\in\mathcal{S}}|D_{n}^{w}(s)/n^{w}(s)|=o_{p}(1), which further implies maxs𝒮|π^w(s)π(s)|=op(1)\max_{s\in\mathcal{S}}|\hat{\pi}^{w}(s)-\pi(s)|=o_{p}(1). Combining these results, we have

supτΥ|Rw1,1(τ)|=op(1).\displaystyle\sup_{\tau\in\Upsilon}|R^{w}_{1,1}(\tau)|=o_{p}(1).
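As an illustrative numerical sanity check (not part of the proof), the exact identity $D_{n}^{w}(s)=(\hat{\pi}^{w}(s)-\pi(s))n^{w}(s)$ that drives the remainder bound above can be verified on simulated data; the simulation design and all variable names below are hypothetical, and $\hat{\pi}^{w}(s)$ is taken to be the multiplier-weighted treated fraction in stratum $s$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
strata = [0, 1, 2]
pi = {0: 0.5, 1: 0.5, 2: 0.5}                      # target assignment probabilities pi(s)

S = rng.integers(0, 3, size=n)                     # stratum indicators S_i
A = (rng.random(n) < 0.5).astype(float)            # treatment indicators A_i
xi = rng.exponential(1.0, size=n)                  # bootstrap multiplier weights xi_i

for s in strata:
    in_s = (S == s)
    n_w = np.sum(xi[in_s])                         # n^w(s) = sum_i xi_i 1{S_i=s}
    n1_w = np.sum(xi[in_s] * A[in_s])              # n_1^w(s) = sum_i xi_i A_i 1{S_i=s}
    pi_hat_w = n1_w / n_w                          # weighted treated fraction, hat{pi}^w(s)
    D_w = np.sum(xi[in_s] * (A[in_s] - pi[s]))     # D_n^w(s)
    # exact algebraic identity: D_n^w(s) = (hat{pi}^w(s) - pi(s)) * n^w(s)
    assert np.isclose(D_w, (pi_hat_w - pi[s]) * n_w)
```

With $n$ large, the simulated $\hat{\pi}^{w}(s)$ is also close to $\pi(s)$ in every stratum, which is the content of $\max_{s}|\hat{\pi}^{w}(s)-\pi(s)|=o_{p}(1)$.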

Next, recall m¯1(τ,s)=𝔼(m¯1(q1(τ),Xi,Si)|Si=s)\overline{m}_{1}(\tau,s)=\mathbb{E}(\overline{m}_{1}(q_{1}(\tau),X_{i},S_{i})|S_{i}=s). Then

L1,2,nw\displaystyle L_{1,2,n}^{w} =1ns𝒮i=1nξiAiπ^w(s)m¯1(τ,s,Xi)1{Si=s}1ni=1nξim¯1(τ,Si,Xi)\displaystyle=\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\sum_{i=1}^{n}\frac{\xi_{i}A_{i}}{\hat{\pi}^{w}(s)}\overline{m}_{1}(\tau,s,X_{i})1\{S_{i}=s\}-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}\overline{m}_{1}(\tau,S_{i},X_{i})
+1ns𝒮1π^w(s)i=1nξi(Aiπ^w(s))(m^1(τ,s,Xi)m¯1(τ,s,Xi))1{Si=s}\displaystyle+\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\frac{1}{\hat{\pi}^{w}(s)}\sum_{i=1}^{n}\xi_{i}(A_{i}-\hat{\pi}^{w}(s))\left(\widehat{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s,X_{i})\right)1\{S_{i}=s\}
=1ns𝒮i=1nξiAiπ^w(s)(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s}\displaystyle=\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\sum_{i=1}^{n}\frac{\xi_{i}A_{i}}{\hat{\pi}^{w}(s)}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}
1ni=1nξi(m¯1(τ,Si,Xi)m¯1(τ,Si))\displaystyle-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}\left(\overline{m}_{1}(\tau,S_{i},X_{i})-\overline{m}_{1}(\tau,S_{i})\right)
+1ns𝒮1π^w(s)i=1nξi(Aiπ^w(s))(m^1(τ,s,Xi)m¯1(τ,s,Xi))1{Si=s}\displaystyle+\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\frac{1}{\hat{\pi}^{w}(s)}\sum_{i=1}^{n}\xi_{i}(A_{i}-\hat{\pi}^{w}(s))\left(\widehat{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s,X_{i})\right)1\{S_{i}=s\}
=1ns𝒮i=1nξiAiπ(s)(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s}\displaystyle=\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\sum_{i=1}^{n}\frac{\xi_{i}A_{i}}{\pi(s)}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}
1ni=1nξi(m¯1(τ,Si,Xi)m¯1(τ,Si))\displaystyle-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}\left(\overline{m}_{1}(\tau,S_{i},X_{i})-\overline{m}_{1}(\tau,S_{i})\right)
+1ns𝒮(π(s)π^w(s)π^w(s)π(s))(i=1nξiAi(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s})\displaystyle+\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\left(\frac{\pi(s)-\hat{\pi}^{w}(s)}{\hat{\pi}^{w}(s)\pi(s)}\right)\left(\sum_{i=1}^{n}\xi_{i}A_{i}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}\right)
+1ns𝒮1π^w(s)i=1nξi(Aiπ^w(s))(m^1(τ,s,Xi)m¯1(τ,s,Xi))1{Si=s}\displaystyle+\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\frac{1}{\hat{\pi}^{w}(s)}\sum_{i=1}^{n}\xi_{i}(A_{i}-\hat{\pi}^{w}(s))\left(\widehat{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s,X_{i})\right)1\{S_{i}=s\}
1ns𝒮i=1nξiAiπ(s)(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s}\displaystyle\equiv\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\sum_{i=1}^{n}\frac{\xi_{i}A_{i}}{\pi(s)}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}
1ni=1nξi(m¯1(τ,Si,Xi)m¯1(τ,Si))+R1,2w(τ),\displaystyle-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}\left(\overline{m}_{1}(\tau,S_{i},X_{i})-\overline{m}_{1}(\tau,S_{i})\right)+R_{1,2}^{w}(\tau), (F.2)

where the second equality holds because

s𝒮i=1nξiAi1{Si=s}m¯1(τ,s)π^w(s)=\displaystyle\sum_{s\in\mathcal{S}}\sum_{i=1}^{n}\xi_{i}A_{i}1\{S_{i}=s\}\frac{\overline{m}_{1}(\tau,s)}{\hat{\pi}^{w}(s)}= s𝒮n1w(s)m¯1(τ,s)π^w(s)=s𝒮nw(s)m¯1(τ,s)\displaystyle\sum_{s\in\mathcal{S}}n_{1}^{w}(s)\frac{\overline{m}_{1}(\tau,s)}{\hat{\pi}^{w}(s)}=\sum_{s\in\mathcal{S}}n^{w}(s)\overline{m}_{1}(\tau,s)
=\displaystyle= i=1ns𝒮ξi1{Si=s}m¯1(τ,Si)=i=1nξim¯1(τ,Si).\displaystyle\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\xi_{i}1\{S_{i}=s\}\overline{m}_{1}(\tau,S_{i})=\sum_{i=1}^{n}\xi_{i}\overline{m}_{1}(\tau,S_{i}).
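The display above is an exact finite-sample identity, not an asymptotic statement: weighting each stratum's treated sum by $1/\hat{\pi}^{w}(s)$ recovers the full weighted sum. A small simulation (illustrative only, with hypothetical names; the values standing in for $\overline{m}_{1}(\tau,s)$ are arbitrary per-stratum constants) confirms it:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
S = rng.integers(0, 4, size=n)                     # stratum indicators S_i
A = (rng.random(n) < 0.6).astype(float)            # treatment indicators A_i
xi = rng.exponential(1.0, size=n)                  # bootstrap multiplier weights xi_i
m_bar = rng.normal(size=4)                         # stand-ins for m_bar_1(tau, s), one per stratum

lhs = 0.0
for s in range(4):
    in_s = (S == s)
    pi_hat_w = np.sum(xi[in_s] * A[in_s]) / np.sum(xi[in_s])   # hat{pi}^w(s)
    # sum over treated units in stratum s, reweighted by 1 / hat{pi}^w(s)
    lhs += np.sum(xi[in_s] * A[in_s]) * m_bar[s] / pi_hat_w

rhs = np.sum(xi * m_bar[S])                        # sum_i xi_i m_bar_1(tau, S_i)
assert np.isclose(lhs, rhs)
```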

For the first term in $R_{1,2}^{w}(\tau)$, we have

supτΥ|1ns𝒮(π(s)π^w(s)π^w(s)π(s))(i=1nAiξi(m¯1(τ,s,Xi)m¯1(τ,s))1{Si=s})|\displaystyle\sup_{\tau\in\Upsilon}\left|\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\left(\frac{\pi(s)-\hat{\pi}^{w}(s)}{\hat{\pi}^{w}(s)\pi(s)}\right)\left(\sum_{i=1}^{n}A_{i}\xi_{i}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))1\{S_{i}=s\}\right)\right|
s𝒮|Dnw(s)nw(s)π^w(s)π(s)|supτΥ,s𝒮|1ni=1nξiAi1{Si=s}(m¯1(τ,s,Xi)m¯1(τ,s))|=op(1),\displaystyle\leq\sum_{s\in\mathcal{S}}\left|\frac{D_{n}^{w}(s)}{n^{w}(s)\hat{\pi}^{w}(s)\pi(s)}\right|\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}A_{i}1\{S_{i}=s\}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))\right|=o_{p}(1),

where the last equality holds due to Lemmas N.2 and N.4, and the fact that ={ξ(m¯1(τ,s,Xi)m¯1(τ,s)):τΥ}\mathcal{F}=\{\xi(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)):\tau\in\Upsilon\} is of the VC-type with fixed coefficients (α,v)(\alpha,v) and envelope ξiFi\xi_{i}F_{i} such that 𝔼((ξiFi)q|Si=s)<\mathbb{E}((\xi_{i}F_{i})^{q}|S_{i}=s)<\infty for q>2.q>2.

For the second term in R1,2w(τ)R_{1,2}^{w}(\tau), recall Δ¯1(τ,s,Xi)=m^1(τ,s,Xi)m¯1(τ,s,Xi)\overline{\Delta}_{1}(\tau,s,X_{i})=\widehat{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s,X_{i}). Then

\displaystyle\sup_{\tau\in\Upsilon}\left|\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}\frac{1}{\hat{\pi}^{w}(s)}\sum_{i=1}^{n}\xi_{i}(A_{i}-\hat{\pi}^{w}(s))\overline{\Delta}_{1}(\tau,s,X_{i})1\{S_{i}=s\}\right|
\displaystyle\leq\frac{1}{\sqrt{n}}\sum_{s\in\mathcal{S}}n_{0}^{w}(s)\sup_{\tau\in\Upsilon}\biggl{|}\frac{\sum_{i\in I_{1}(s)}\xi_{i}\overline{\Delta}_{1}(\tau,s,X_{i})}{n_{1}^{w}(s)}-\frac{\sum_{i\in I_{0}(s)}\xi_{i}\overline{\Delta}_{1}(\tau,s,X_{i})}{n_{0}^{w}(s)}\biggr{|}=o_{p}(1),

where the last equality holds by Assumption 3. Therefore, we have

supτΥ|R1,2w(τ)|=op(1).\displaystyle\sup_{\tau\in\Upsilon}|R_{1,2}^{w}(\tau)|=o_{p}(1).

Combining (F.1) and (F.2), we have

L1,nw(τ)=\displaystyle L_{1,n}^{w}(\tau)= s𝒮1ni=1nξiAi1{Si=s}[ηi,1(τ,s)π(s)+(11π(s))(m¯1(τ,s,Xi)m¯1(τ,s))]\displaystyle\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}A_{i}1\{S_{i}=s\}\left[\frac{\eta_{i,1}(\tau,s)}{\pi(s)}+\left(1-\frac{1}{\pi(s)}\right)\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\right]
+s𝒮1ni=1nξi(1Ai)1{Si=s}(m¯1(τ,s,Xi)m¯1(τ,s))\displaystyle+\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}(1-A_{i})1\{S_{i}=s\}\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)
+i=1nξim1(τ,Si)n+R1,1w(τ)R1,2w(τ),\displaystyle+\sum_{i=1}^{n}\xi_{i}\frac{m_{1}(\tau,S_{i})}{\sqrt{n}}+R_{1,1}^{w}(\tau)-R_{1,2}^{w}(\tau),

where supτΥ(|R1,1w(τ)|+|R1,2w(τ)|)=op(1)\sup_{\tau\in\Upsilon}(|R_{1,1}^{w}(\tau)|+|R_{1,2}^{w}(\tau)|)=o_{p}(1). In addition, Assumption 3 implies that the classes of functions

{ξi[ηi,1(τ,s)π(s)+(11π(s))(m¯1(τ,s,Xi)m¯1(τ,s))]:τΥ}\displaystyle\left\{\xi_{i}\left[\frac{\eta_{i,1}(\tau,s)}{\pi(s)}+\left(1-\frac{1}{\pi(s)}\right)\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\right]:\tau\in\Upsilon\right\}

and

{ξi[m¯1(τ,s,Xi)m¯1(τ,s)]:τΥ}\displaystyle\left\{\xi_{i}[\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)]:\tau\in\Upsilon\right\}

are of the VC-type with fixed coefficients and envelopes belonging to L,dL_{\mathbb{P},d}. In addition,

𝔼[ξi(ηi,1(τ,s)π(s)+(11π(s))(m¯1(τ,s,Xi)m¯1(τ,s)))|Si=s]=0,\displaystyle\mathbb{E}\left[\xi_{i}\left(\frac{\eta_{i,1}(\tau,s)}{\pi(s)}+\left(1-\frac{1}{\pi(s)}\right)\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\right)\biggl{|}S_{i}=s\right]=0,

and

𝔼[ξi(m¯1(τ,s,Xi)m¯1(τ,s))|Si=s]=0.\displaystyle\mathbb{E}\left[\xi_{i}(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s))|S_{i}=s\right]=0.

Therefore, Lemma N.2 implies,

\displaystyle\sup_{\tau\in\Upsilon}\left|\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}A_{i}1\{S_{i}=s\}\left[\frac{\eta_{i,1}(\tau,s)}{\pi(s)}+\left(1-\frac{1}{\pi(s)}\right)\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\right]\right|=O_{p}(1),

and

\displaystyle\sup_{\tau\in\Upsilon}\left|\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}(1-A_{i})1\{S_{i}=s\}\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\right|=O_{p}(1).

This implies supτΥ|Lw1,n(τ)|=Op(1)\sup_{\tau\in\Upsilon}|L^{w}_{1,n}(\tau)|=O_{p}(1). Then by Kato (2009, Theorem 2) we have

n(q^w1(τ)q1(τ))\displaystyle\sqrt{n}(\hat{q}^{w}_{1}(\tau)-q_{1}(\tau))
=1f1(q1(τ)){s𝒮1ni=1nξiAi1{Si=s}[ηi,1(τ,s)π(s)+(11π(s))(m¯1(τ,s,Xi)m¯1(τ,s))]\displaystyle=\frac{1}{f_{1}(q_{1}(\tau))}\biggl{\{}\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}A_{i}1\{S_{i}=s\}\left[\frac{\eta_{i,1}(\tau,s)}{\pi(s)}+\left(1-\frac{1}{\pi(s)}\right)\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)\right]
+s𝒮1ni=1nξi(1Ai)1{Si=s}(m¯1(τ,s,Xi)m¯1(τ,s))+i=1nξim1(τ,Si)n}+Rq,1w(τ),\displaystyle+\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}(1-A_{i})1\{S_{i}=s\}\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)+\sum_{i=1}^{n}\xi_{i}\frac{m_{1}(\tau,S_{i})}{\sqrt{n}}\biggr{\}}+R_{q,1}^{w}(\tau),

where supτΥ|Rq,1w(τ)|=op(1)\sup_{\tau\in\Upsilon}|R_{q,1}^{w}(\tau)|=o_{p}(1). Similarly, we have

n(q^w0(τ)q0(τ))\displaystyle\sqrt{n}(\hat{q}^{w}_{0}(\tau)-q_{0}(\tau))
=1f0(q0(τ)){s𝒮1ni=1nξi(1Ai)1{Si=s}[ηi,0(τ,s)1π(s)+(111π(s))(m¯0(τ,s,Xi)m¯0(τ,s))]\displaystyle=\frac{1}{f_{0}(q_{0}(\tau))}\biggl{\{}\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}(1-A_{i})1\{S_{i}=s\}\left[\frac{\eta_{i,0}(\tau,s)}{1-\pi(s)}+\left(1-\frac{1}{1-\pi(s)}\right)\left(\overline{m}_{0}(\tau,s,X_{i})-\overline{m}_{0}(\tau,s)\right)\right]
\displaystyle+\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}A_{i}1\{S_{i}=s\}\left(\overline{m}_{0}(\tau,s,X_{i})-\overline{m}_{0}(\tau,s)\right)+\sum_{i=1}^{n}\xi_{i}\frac{m_{0}(\tau,S_{i})}{\sqrt{n}}\biggr{\}}+R_{q,0}^{w}(\tau),

where $\sup_{\tau\in\Upsilon}|R_{q,0}^{w}(\tau)|=o_{p}(1)$. Taking the difference of the above two displays, we obtain

n(q^w(τ)q(τ))\displaystyle\sqrt{n}(\hat{q}^{w}(\tau)-q(\tau))
=s𝒮1ni=1nξiAi1{Si=s}\displaystyle=\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}A_{i}1\{S_{i}=s\}
×[ηi,1(τ,s)(1π(s))(m¯1(τ,s,Xi)m¯1(τ,s))π(s)f1(q1(τ))(m¯0(τ,s,Xi)m¯0(τ,s))f0(q0(τ))]\displaystyle\times\left[\frac{\eta_{i,1}(\tau,s)-(1-\pi(s))\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)}{\pi(s)f_{1}(q_{1}(\tau))}-\frac{\left(\overline{m}_{0}(\tau,s,X_{i})-\overline{m}_{0}(\tau,s)\right)}{f_{0}(q_{0}(\tau))}\right]
s𝒮1ni=1nξi(1Ai)1{Si=s}\displaystyle-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}(1-A_{i})1\{S_{i}=s\}
×[ηi,0(τ,s)π(s)(m¯0(τ,s,Xi)m¯0(τ,s))(1π(s))f0(q0(τ))(m¯1(τ,s,Xi)m¯1(τ,s))f1(q1(τ))]\displaystyle\times\left[\frac{\eta_{i,0}(\tau,s)-\pi(s)\left(\overline{m}_{0}(\tau,s,X_{i})-\overline{m}_{0}(\tau,s)\right)}{(1-\pi(s))f_{0}(q_{0}(\tau))}-\frac{\left(\overline{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s)\right)}{f_{1}(q_{1}(\tau))}\right]
+1ni=1nξi(m1(τ,Si)f1(q1(τ))m0(τ,Si)f0(q0(τ)))+Rwq(τ)\displaystyle+\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}\left(\frac{m_{1}(\tau,S_{i})}{f_{1}(q_{1}(\tau))}-\frac{m_{0}(\tau,S_{i})}{f_{0}(q_{0}(\tau))}\right)+R^{w}_{q}(\tau)
=s𝒮1ni=1nξiAi1{Si=s}ϕ1(τ,s,Yi(1),Xi)s𝒮1ni=1nξi(1Ai)1{Si=s}ϕ0(τ,s,Yi(0),Xi)\displaystyle=\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}A_{i}1\{S_{i}=s\}\phi_{1}(\tau,s,Y_{i}(1),X_{i})-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}(1-A_{i})1\{S_{i}=s\}\phi_{0}(\tau,s,Y_{i}(0),X_{i})
+1ni=1nξiϕs(τ,Si)+Rqw(τ),\displaystyle+\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}\phi_{s}(\tau,S_{i})+R_{q}^{w}(\tau),

where supτΥ|Rqw(τ)|=op(1)\sup_{\tau\in\Upsilon}|R_{q}^{w}(\tau)|=o_{p}(1) and (ϕ1(),ϕ0(),ϕs())(\phi_{1}(\cdot),\phi_{0}(\cdot),\phi_{s}(\cdot)) are defined in Section E. Recalling the linear expansion of n(q^adj(τ)q(τ))\sqrt{n}(\hat{q}^{adj}(\tau)-q(\tau)) established in Section E, we have

n(q^w(τ)q^adj(τ))=s𝒮1ni=1n(ξi1)Ai1{Si=s}ϕ1(τ,s,Yi(1),Xi)\displaystyle\sqrt{n}(\hat{q}^{w}(\tau)-\hat{q}^{adj}(\tau))=\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\xi_{i}-1)A_{i}1\{S_{i}=s\}\phi_{1}(\tau,s,Y_{i}(1),X_{i})
s𝒮1ni=1n(ξi1)(1Ai)1{Si=s}ϕ0(τ,s,Yi(0),Xi)+1ni=1n(ξi1)ϕs(τ,Si)+R(τ)\displaystyle-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\xi_{i}-1)(1-A_{i})1\{S_{i}=s\}\phi_{0}(\tau,s,Y_{i}(0),X_{i})+\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\xi_{i}-1)\phi_{s}(\tau,S_{i})+R(\tau)
\displaystyle=\varpi_{n,1}^{w}(\tau)+\varpi_{n,2}^{w}(\tau)+R(\tau),

where supτΥ|R(τ)|=op(1)\sup_{\tau\in\Upsilon}|R(\tau)|=o_{p}(1),

ϖn,1w(τ)=\displaystyle\varpi_{n,1}^{w}(\tau)= s𝒮1ni=1n(ξi1)Ai1{Si=s}ϕ1(τ,s,Yi(1),Xi)\displaystyle\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\xi_{i}-1)A_{i}1\{S_{i}=s\}\phi_{1}(\tau,s,Y_{i}(1),X_{i})
s𝒮1ni=1n(ξi1)(1Ai)1{Si=s}ϕ0(τ,s,Yi(0),Xi),\displaystyle-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\xi_{i}-1)(1-A_{i})1\{S_{i}=s\}\phi_{0}(\tau,s,Y_{i}(0),X_{i}),

and

ϖn,2w(τ)=1ni=1n(ξi1)ϕs(τ,Si).\displaystyle\varpi_{n,2}^{w}(\tau)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\xi_{i}-1)\phi_{s}(\tau,S_{i}).

Lemma N.5 shows that, uniformly over $\tau\in\Upsilon$ and conditional on the data,

ϖn,1w(τ)+ϖn,2w(τ)(τ),\displaystyle\varpi_{n,1}^{w}(\tau)+\varpi_{n,2}^{w}(\tau)\rightsquigarrow\mathcal{B}(\tau),

where (τ)\mathcal{B}(\tau) is the Gaussian process with covariance kernel

Σ(τ,τ)=𝔼π(Si)ϕ1(τ,Si,Yi(1),Xi)ϕ1(τ,Si,Yi(1),Xi)\displaystyle\Sigma(\tau,\tau^{\prime})=\mathbb{E}\pi(S_{i})\phi_{1}(\tau,S_{i},Y_{i}(1),X_{i})\phi_{1}(\tau^{\prime},S_{i},Y_{i}(1),X_{i})
+𝔼(1π(Si))ϕ0(τ,Si,Yi(0),Xi)ϕ0(τ,Si,Yi(0),Xi)+𝔼ϕs(τ,Si)ϕs(τ,Si),\displaystyle+\mathbb{E}(1-\pi(S_{i}))\phi_{0}(\tau,S_{i},Y_{i}(0),X_{i})\phi_{0}(\tau^{\prime},S_{i},Y_{i}(0),X_{i})+\mathbb{E}\phi_{s}(\tau,S_{i})\phi_{s}(\tau^{\prime},S_{i}),

as defined in Theorem 3. This concludes the proof.

Appendix G Proof of Theorem 5.1

Notation for this proof:

$N(s)$: for $s\in\mathcal{S}$, $N(s)=\sum_{i=1}^{n}1\{S_{i}<s\}$.

$\Lambda_{\tau,s}(x,\theta_{a,s}(\tau))$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, $\tau\in\Upsilon$, and $x\in\text{Supp}(X)$, a parametric model for $\mathbb{P}(Y(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}=x)$ with pseudo true value $\theta_{a,s}(\tau)$.

$\hat{\theta}_{a,s}(\tau)$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, a consistent estimator of $\theta_{a,s}(\tau)$.

The proof is divided into two steps. In the first step, we verify Assumption 5; Assumption 3(i) can be verified in the same manner, so the argument is omitted. In the second step, we establish Assumptions 3(ii) and 3(iii).

Step 1. Recall $\widehat{m}_{a}(\tau,s,X_{i})=\tau-\Lambda_{\tau,s}(X_{i},\hat{\theta}_{a,s}(\tau))$, $\overline{m}_{a}(\tau,s,X_{i})=\tau-\Lambda_{\tau,s}(X_{i},\theta_{a,s}(\tau))$,

Δ¯a(τ,s,Xi)=m^a(τ,s,Xi)m¯a(τ,s,Xi)=Λτ,s(Xi,θa,s(τ))Λτ,s(Xi,θ^a,s(τ)),\displaystyle\overline{\Delta}_{a}(\tau,s,X_{i})=\widehat{m}_{a}(\tau,s,X_{i})-\overline{m}_{a}(\tau,s,X_{i})=\Lambda_{\tau,s}(X_{i},\theta_{a,s}(\tau))-\Lambda_{\tau,s}(X_{i},\hat{\theta}_{a,s}(\tau)),

and that $\{X_{i}^{s},\xi^{s}_{i}\}_{i\in[n]}$ is an i.i.d. sequence drawn from the joint distribution of $(X_{i},\xi_{i})$ given $S_{i}=s$, and hence is independent of $\{A_{i},S_{i}\}_{i\in[n]}$. Let $\Psi_{\tau,s}(\theta_{1},\theta_{2})=\mathbb{E}[\Lambda_{\tau,s}(X_{i},\theta_{1})-\Lambda_{\tau,s}(X_{i},\theta_{2})|S_{i}=s]=\mathbb{E}[\Lambda_{\tau,s}(X_{i}^{s},\theta_{1})-\Lambda_{\tau,s}(X_{i}^{s},\theta_{2})]$. We have

supτΥ,s𝒮|iI1(s)ξiΔ¯a(τ,s,Xi)n1w(s)iI0(s)ξiΔ¯a(τ,s,Xi)n0w(s)|\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\biggl{|}\frac{\sum_{i\in I_{1}(s)}\xi_{i}\overline{\Delta}_{a}(\tau,s,X_{i})}{n_{1}^{w}(s)}-\frac{\sum_{i\in I_{0}(s)}\xi_{i}\overline{\Delta}_{a}(\tau,s,X_{i})}{n_{0}^{w}(s)}\biggr{|}
(maxs𝒮n1(s)n1w(s))supτΥ,s𝒮|iI1(s)ξi[Δ¯a(τ,s,Xi)Ψτ,s(θa,s(τ),θ^a,s(τ))]n1(s)|\displaystyle\leq\left(\max_{s\in\mathcal{S}}\frac{n_{1}(s)}{n_{1}^{w}(s)}\right)\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{a}(\tau,s,X_{i})-\Psi_{\tau,s}(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))]}{n_{1}(s)}\right|
+(maxs𝒮n0(s)n0w(s))supτΥ,s𝒮|iI0(s)ξi[Δ¯a(τ,s,Xi)Ψτ,s(θa,s(τ),θ^a,s(τ))]n0(s)|\displaystyle+\left(\max_{s\in\mathcal{S}}\frac{n_{0}(s)}{n_{0}^{w}(s)}\right)\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{0}(s)}\xi_{i}[\overline{\Delta}_{a}(\tau,s,X_{i})-\Psi_{\tau,s}(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))]}{n_{0}(s)}\right|
=op(n1/2).\displaystyle=o_{p}(n^{-1/2}). (G.1)

To see the last equality, we note that, for any ε>0\varepsilon>0, with probability approaching one (w.p.a.1), we have

supτΥ,s𝒮||θ^a,s(τ)θa,s(τ)||ε.\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}||\hat{\theta}_{a,s}(\tau)-\theta_{a,s}(\tau)||\leq\varepsilon.

Therefore, on the event

𝒜n(ε){supτΥ,s𝒮||θ^a,s(τ)θa,s(τ)||ε,supτΥmax(||θ^a,s(τ)||,||θa,s(τ)||)C,mins𝒮n1(s)εn}\displaystyle\mathcal{A}_{n}(\varepsilon)\equiv\left\{\sup_{\tau\in\Upsilon,s\in\mathcal{S}}||\hat{\theta}_{a,s}(\tau)-\theta_{a,s}(\tau)||\leq\varepsilon,\sup_{\tau\in\Upsilon}\max(||\hat{\theta}_{a,s}(\tau)||,||\theta_{a,s}(\tau)||)\leq C,\min_{s\in\mathcal{S}}n_{1}(s)\geq\varepsilon n\right\}

we have

supτΥ|iI1(s)ξi[Δ¯a(τ,s,Xi)Ψτ,s(θa,s(τ),θ^a,s(τ))]n1(s)||{Ai,Si}i[n]\displaystyle\sup_{\tau\in\Upsilon}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{a}(\tau,s,X_{i})-\Psi_{\tau,s}(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))]}{n_{1}(s)}\right|\biggl{|}\{A_{i},S_{i}\}_{i\in[n]}
=dsupτΥ|i=N(s)+1N(s)+n1(s)ξis[Δ¯a(τ,s,Xis)Ψτ,s(θa,s(τ),θ^a,s(τ))]n1(s)||{Ai,Si}i[n]\displaystyle\stackrel{{\scriptstyle d}}{{=}}\sup_{\tau\in\Upsilon}\left|\frac{\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}\xi_{i}^{s}[\overline{\Delta}_{a}(\tau,s,X_{i}^{s})-\Psi_{\tau,s}(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))]}{n_{1}(s)}\right|\biggl{|}\{A_{i},S_{i}\}_{i\in[n]}
||n1(s)|||{Ai,Si}i[n],\displaystyle\leq||\mathbb{P}_{n_{1}(s)}-\mathbb{P}||_{\mathcal{F}}\biggl{|}\{A_{i},S_{i}\}_{i\in[n]},

where N(s)=i=1n1{Si<s}N(s)=\sum_{i=1}^{n}1\{S_{i}<s\} and

={ξis[Λτ,s(Xis,θ1)Λτ,s(Xis,θ2)Ψτ,s(θ1,θ2)]:τΥ,||θ1θ2||ε,max(||θ1||,||θ2||)C}.\displaystyle\mathcal{F}=\left\{\xi_{i}^{s}[\Lambda_{\tau,s}(X_{i}^{s},\theta_{1})-\Lambda_{\tau,s}(X_{i}^{s},\theta_{2})-\Psi_{\tau,s}(\theta_{1},\theta_{2})]:\tau\in\Upsilon,||\theta_{1}-\theta_{2}||\leq\varepsilon,\max(||\theta_{1}||,||\theta_{2}||)\leq C\right\}.

By Assumption 6, \mathcal{F} is a VC-class with a fixed VC index and envelope LiL_{i}. In addition,

\displaystyle\sup_{f\in\mathcal{F}}\mathbb{P}f^{2}\leq\mathbb{E}L_{i}^{2}||\theta_{1}-\theta_{2}||^{2}\leq C\varepsilon^{2}.

Therefore, for any δ>0\delta>0 we have

(supτΥ,s𝒮|iI1(s)ξi[Δ¯a(τ,s,Xi)Ψτ,s(θa,s(τ),θ^a,s(τ))]n1(s)|δn1/2)\displaystyle\mathbb{P}\left(\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{a}(\tau,s,X_{i})-\Psi_{\tau,s}(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))]}{n_{1}(s)}\right|\geq\delta n^{-1/2}\right)
(supτΥ,s𝒮|iI1(s)ξi[Δ¯a(τ,s,Xi)Ψτ,s(θa,s(τ),θ^a,s(τ))]n1(s)|δn1/2,𝒜n(ε))+(𝒜nc(ε))\displaystyle\leq\mathbb{P}\left(\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{a}(\tau,s,X_{i})-\Psi_{\tau,s}(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))]}{n_{1}(s)}\right|\geq\delta n^{-1/2},\mathcal{A}_{n}(\varepsilon)\right)+\mathbb{P}(\mathcal{A}_{n}^{c}(\varepsilon))
𝔼[(supτΥ,s𝒮|iI1(s)ξi[Δ¯a(τ,s,Xi)Ψτ,s(θa,s(τ),θ^a,s(τ))]n1(s)|δn1/2,𝒜n(ε)|{Ai,Si}i[n])]\displaystyle\leq\mathbb{E}\left[\mathbb{P}\left(\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{a}(\tau,s,X_{i})-\Psi_{\tau,s}(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))]}{n_{1}(s)}\right|\geq\delta n^{-1/2},\mathcal{A}_{n}(\varepsilon)\biggl{|}\{A_{i},S_{i}\}_{i\in[n]}\right)\right]
+(𝒜nc(ε))\displaystyle+\mathbb{P}(\mathcal{A}_{n}^{c}(\varepsilon))
s𝒮𝔼[(||n1(s)||δn1/2|{Ai,Si}i[n])1{n1(s)nε}]+(𝒜nc(ε))\displaystyle\leq\sum_{s\in\mathcal{S}}\mathbb{E}\left[\mathbb{P}\left(||\mathbb{P}_{n_{1}(s)}-\mathbb{P}||_{\mathcal{F}}\geq\delta n^{-1/2}\biggl{|}\{A_{i},S_{i}\}_{i\in[n]}\right)1\{n_{1}(s)\geq n\varepsilon\}\right]+\mathbb{P}(\mathcal{A}_{n}^{c}(\varepsilon))
s𝒮𝔼{n1/2𝔼[||n1(s)|||{Ai,Si}i[n]]1{n1(s)nε}δ}+(𝒜nc(ε)).\displaystyle\leq\sum_{s\in\mathcal{S}}\mathbb{E}\left\{\frac{n^{1/2}\mathbb{E}\left[||\mathbb{P}_{n_{1}(s)}-\mathbb{P}||_{\mathcal{F}}|\{A_{i},S_{i}\}_{i\in[n]}\right]1\{n_{1}(s)\geq n\varepsilon\}}{\delta}\right\}+\mathbb{P}(\mathcal{A}_{n}^{c}(\varepsilon)).

By Chernozhukov et al. (2014, Corollary 5.1),

n1/2𝔼[||n1(s)|||{Ai,Si}i[n]]1{n1(s)nε}\displaystyle n^{1/2}\mathbb{E}\left[||\mathbb{P}_{n_{1}(s)}-\mathbb{P}||_{\mathcal{F}}|\{A_{i},S_{i}\}_{i\in[n]}\right]1\{n_{1}(s)\geq n\varepsilon\}
C(nn1(s)ε2+n1/2n11/d1(s))1{n1(s)nε}\displaystyle\leq C(\sqrt{\frac{n}{n_{1}(s)}\varepsilon^{2}}+n^{1/2}n_{1}^{1/d-1}(s))1\{n_{1}(s)\geq n\varepsilon\}
C(ε1/2+n1/d1/2ε1/d1).\displaystyle\leq C(\varepsilon^{1/2}+n^{1/d-1/2}\varepsilon^{1/d-1}).

Therefore,

𝔼{n1/2𝔼[||n1(s)|||{Ai,Si}i[n]]1{n1(s)nε}δ}C𝔼(ε1/2+n1/d1/2ε1/d1)/δ.\displaystyle\mathbb{E}\left\{\frac{n^{1/2}\mathbb{E}\left[||\mathbb{P}_{n_{1}(s)}-\mathbb{P}||_{\mathcal{F}}|\{A_{i},S_{i}\}_{i\in[n]}\right]1\{n_{1}(s)\geq n\varepsilon\}}{\delta}\right\}\leq C\mathbb{E}\left(\varepsilon^{1/2}+n^{1/d-1/2}\varepsilon^{1/d-1}\right)/\delta.

By letting nn\rightarrow\infty followed by ε0\varepsilon\rightarrow 0, we have

\displaystyle\lim_{n\rightarrow\infty}\mathbb{P}\left(\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{a}(\tau,s,X_{i})-\Psi_{\tau,s}(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))]}{n_{1}(s)}\right|\geq\delta n^{-1/2}\right)=0.

In addition,

\displaystyle\max_{s\in\mathcal{S}}|n_{1}^{w}(s)/n_{1}(s)-1|=\max_{s\in\mathcal{S}}|(D_{n}^{w}(s)-D_{n}(s))/(\pi(s)n(s)+D_{n}(s))|\stackrel{p}{\longrightarrow}0,

as Lemma N.4 shows that maxs𝒮|(Dnw(s)Dn(s))/n(s)|=op(1)\max_{s\in\mathcal{S}}|(D_{n}^{w}(s)-D_{n}(s))/n(s)|=o_{p}(1).

Therefore,

supτΥ,s𝒮|iI1(s)ξi[Δ¯a(τ,s,Xi)Ψτ,s(θa,s(τ),θ^a,s(τ))]nw1(s)|=op(n1/2).\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{a}(\tau,s,X_{i})-\Psi_{\tau,s}(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))]}{n^{w}_{1}(s)}\right|=o_{p}(n^{-1/2}).

For the same reason, we have

supτΥ,s𝒮|iI0(s)ξi[Δ¯a(τ,s,Xi)Ψτ,s(θa,s(τ),θ^a,s(τ))]nw0(s)|=op(n1/2),\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{0}(s)}\xi_{i}[\overline{\Delta}_{a}(\tau,s,X_{i})-\Psi_{\tau,s}(\theta_{a,s}(\tau),\hat{\theta}_{a,s}(\tau))]}{n^{w}_{0}(s)}\right|=o_{p}(n^{-1/2}),

and (G.1) holds.

Step 2. By Assumption 6,

\displaystyle|\overline{m}_{a}(\tau_{2},S_{i},X_{i})-\overline{m}_{a}(\tau_{1},S_{i},X_{i})|
\displaystyle\leq|\tau_{2}-\tau_{1}|+|\Lambda_{\tau_{1},s}(X_{i},\theta_{a,s}(\tau_{1}))-\Lambda_{\tau_{2},s}(X_{i},\theta_{a,s}(\tau_{2}))|
\displaystyle\leq(1+L_{i})|\tau_{2}-\tau_{1}|+L_{i}||\theta_{a,s}(\tau_{1})-\theta_{a,s}(\tau_{2})||\leq(CL_{i}+1)|\tau_{2}-\tau_{1}|.

This implies Assumption 3(iii). Furthermore, by Assumption 6 we can take the envelope for the class of functions $\mathcal{F}=\{\overline{m}_{a}(\tau,S_{i},X_{i}):\tau\in\Upsilon\}$ to be $F_{i}=\max(C,1)L_{i}+1$, where the constant $C$ is the one in the display above. Then, we have

supQN(,eQ,ε||F||Q,2)N(Υ,d,ε)1/ε,\displaystyle\sup_{Q}N(\mathcal{F},e_{Q},\varepsilon||F||_{Q,2})\leq N(\Upsilon,d,\varepsilon)\leq 1/\varepsilon,

where d(τ1,τ2)=|τ1τ2|d(\tau_{1},\tau_{2})=|\tau_{1}-\tau_{2}|. This verifies Assumption 3(ii).

Appendix H Proof of Theorem 5.2

Notation for this proof:

$W_{i,s}(\tau)$: for $i\in[n]$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, the linear regressor in the linear adjustment, so that $\Lambda_{\tau,s}(X_{i},\theta_{a,s}(\tau))=W_{i,s}^{\top}(\tau)\theta_{a,s}(\tau)$.

$\tilde{W}_{i,s}(\tau)$: for $i\in[n]$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $\tilde{W}_{i,s}(\tau)=W_{i,s}(\tau)-\mathbb{E}(W_{i,s}(\tau)|S_{i}=s)$.

$\theta_{a,s}(\tau)$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, the pseudo true value in the linear adjustment.

Let ΣLP(τk,τl)\Sigma^{\textit{LP}}(\tau_{k},\tau_{l}) be the asymptotic covariance matrix of q^adj(τk)\hat{q}^{adj}(\tau_{k}) and q^adj(τl)\hat{q}^{adj}(\tau_{l}) with a linear adjustment and pseudo true values (θ1,s(τk),θ0,s(τk))k[K](\theta_{1,s}(\tau_{k}),\theta_{0,s}(\tau_{k}))_{k\in[K]}. Then, we have

ΣLP(τk,τl)=ΣLP1(τk,τl)+s𝒮p(s)𝔼[(W~i,s(τk)βs(τk)y¯i,s(τk))(W~i,s(τl)βs(τl)y¯i,s(τl))|Si=s],\displaystyle\Sigma^{\textit{LP}}(\tau_{k},\tau_{l})=\Sigma^{\textit{LP}}_{1}(\tau_{k},\tau_{l})+\sum_{s\in\mathcal{S}}p(s)\mathbb{E}\left[(\tilde{W}_{i,s}^{\top}(\tau_{k})\beta_{s}(\tau_{k})-\overline{y}_{i,s}(\tau_{k}))(\tilde{W}_{i,s}^{\top}(\tau_{l})\beta_{s}(\tau_{l})-\overline{y}_{i,s}(\tau_{l}))|S_{i}=s\right],

where

\displaystyle\Sigma^{\textit{LP}}_{1}(\tau_{k},\tau_{l})=\biggl{\{}\mathbb{E}\left[\frac{(\tau-1\{Y_{i}(1)\leq q_{1}(\tau)\}-m_{1}(\tau,S_{i},X_{i}))^{2}}{\pi(S_{i})f_{1}^{2}(q_{1}(\tau))}\right]
\displaystyle+\mathbb{E}\left[\frac{(\tau-1\{Y_{i}(0)\leq q_{0}(\tau)\}-m_{0}(\tau,S_{i},X_{i}))^{2}}{(1-\pi(S_{i}))f_{0}^{2}(q_{0}(\tau))}\right]
+𝔼(m1(τ,Si,Xi)f1(q1(τ))m0(τ,Si,Xi)f0(q0(τ)))2},\displaystyle+\mathbb{E}\left(\frac{m_{1}(\tau,S_{i},X_{i})}{f_{1}(q_{1}(\tau))}-\frac{m_{0}(\tau,S_{i},X_{i})}{f_{0}(q_{0}(\tau))}\right)^{2}\biggr{\}},
βs(τ)\displaystyle\beta_{s}(\tau) =1π(s)π(s)θ1,s(τ)f1(q1(τ))+π(s)1π(s)θ0,s(τ)f0(q0(τ)),and\displaystyle=\sqrt{\frac{1-\pi(s)}{\pi(s)}}\frac{\theta_{1,s}(\tau)}{f_{1}(q_{1}(\tau))}+\sqrt{\frac{\pi(s)}{1-\pi(s)}}\frac{\theta_{0,s}(\tau)}{f_{0}(q_{0}(\tau))},\quad\text{and}
y¯i,s(τ)\displaystyle\overline{y}_{i,s}(\tau) =1π(s)π(s)[(Yi(1)q1(τ)|Xi,Si=s)(Yi(1)q1(τ)|Si=s)]f1(q1(τ))\displaystyle=\sqrt{\frac{1-\pi(s)}{\pi(s)}}\frac{\left[\mathbb{P}(Y_{i}(1)\leq q_{1}(\tau)|X_{i},S_{i}=s)-\mathbb{P}(Y_{i}(1)\leq q_{1}(\tau)|S_{i}=s)\right]}{f_{1}(q_{1}(\tau))}
+π(s)1π(s)[(Yi(0)q0(τ)|Xi,Si=s)(Yi(0)q0(τ)|Si=s)]f0(q0(τ)).\displaystyle+\sqrt{\frac{\pi(s)}{1-\pi(s)}}\frac{\left[\mathbb{P}(Y_{i}(0)\leq q_{0}(\tau)|X_{i},S_{i}=s)-\mathbb{P}(Y_{i}(0)\leq q_{0}(\tau)|S_{i}=s)\right]}{f_{0}(q_{0}(\tau))}.

Minimizing $[\Sigma^{\textit{LP}}(\tau_{k},\tau_{l})]_{k,l\in[K]}$ (in the matrix sense) is equivalent to minimizing

[𝔼[(W~i,s(τk)βs(τk)y¯i,s(τk))(W~i,s(τl)βs(τl)y¯i,s(τl))|Si=s]]k,l[K]\left[\mathbb{E}\left[(\tilde{W}_{i,s}^{\top}(\tau_{k})\beta_{s}(\tau_{k})-\overline{y}_{i,s}(\tau_{k}))(\tilde{W}_{i,s}^{\top}(\tau_{l})\beta_{s}(\tau_{l})-\overline{y}_{i,s}(\tau_{l}))|S_{i}=s\right]\right]_{k,l\in[K]}

for each s𝒮s\in\mathcal{S}, which is achieved if

βs(τk)=[𝔼W~i,s(τk)W~i,s(τk)|Si=s]1𝔼[W~i,s(τk)y¯i,s(τk)|Si=s].\displaystyle\beta_{s}(\tau_{k})=[\mathbb{E}\tilde{W}_{i,s}(\tau_{k})\tilde{W}_{i,s}^{\top}(\tau_{k})|S_{i}=s]^{-1}\mathbb{E}[\tilde{W}_{i,s}(\tau_{k})\overline{y}_{i,s}(\tau_{k})|S_{i}=s]. (H.1)
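The optimality condition (H.1) is a population least-squares projection. The following sketch (with a hypothetical design, and sample analogues standing in for the population moments $\mathbb{E}[\tilde{W}\tilde{W}^{\top}|S_{i}=s]$ and $\mathbb{E}[\tilde{W}\overline{y}|S_{i}=s]$) illustrates that the projection coefficient minimizes the mean squared residual, the summand that (H.1) minimizes stratum by stratum:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
W = rng.normal(size=(n, 2))                              # stand-in for centered regressors W_tilde
y = 0.7 * W[:, 0] - 0.3 * W[:, 1] + rng.normal(size=n)   # stand-in for ybar_{i,s}(tau_k)

# sample analogue of (H.1): beta = (E[W W'])^{-1} E[W y]
beta_star = np.linalg.solve(W.T @ W / n, W.T @ y / n)

def msr(b):
    # (sample) mean squared residual E(W'b - y)^2
    return np.mean((W @ b - y) ** 2)

# any perturbation of beta_star weakly increases the mean squared residual
for delta in (np.array([0.1, 0.0]), np.array([0.0, -0.1]), np.array([0.05, 0.05])):
    assert msr(beta_star) <= msr(beta_star + delta)
```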

Because 𝔼[W~i,s(τ)(Yi(a)qa(τ)|Si=s)|Si=s]=0\mathbb{E}[\tilde{W}_{i,s}(\tau)\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s)|S_{i}=s]=0 for a=0,1a=0,1, (H.1) implies

1π(s)π(s)θ1,s(τk)f1(q1(τk))+π(s)1π(s)θ0,s(τk)f0(q0(τk))\displaystyle\sqrt{\frac{1-\pi(s)}{\pi(s)}}\frac{\theta_{1,s}(\tau_{k})}{f_{1}(q_{1}(\tau_{k}))}+\sqrt{\frac{\pi(s)}{1-\pi(s)}}\frac{\theta_{0,s}(\tau_{k})}{f_{0}(q_{0}(\tau_{k}))}
=1π(s)π(s)θLP1,s(τk)f1(q1(τk))+π(s)1π(s)θLP0,s(τk)f0(q0(τk)),\displaystyle=\sqrt{\frac{1-\pi(s)}{\pi(s)}}\frac{\theta^{\textit{LP}}_{1,s}(\tau_{k})}{f_{1}(q_{1}(\tau_{k}))}+\sqrt{\frac{\pi(s)}{1-\pi(s)}}\frac{\theta^{\textit{LP}}_{0,s}(\tau_{k})}{f_{0}(q_{0}(\tau_{k}))},

or equivalently,

θ1,s(τk)f1(q1(τk))+π(s)1π(s)θ0,s(τk)f0(q0(τk))=θLP1,s(τk)f1(q1(τk))+π(s)1π(s)θLP0,s(τk)f0(q0(τk)).\displaystyle\frac{\theta_{1,s}(\tau_{k})}{f_{1}(q_{1}(\tau_{k}))}+\frac{\pi(s)}{1-\pi(s)}\frac{\theta_{0,s}(\tau_{k})}{f_{0}(q_{0}(\tau_{k}))}=\frac{\theta^{\textit{LP}}_{1,s}(\tau_{k})}{f_{1}(q_{1}(\tau_{k}))}+\frac{\pi(s)}{1-\pi(s)}\frac{\theta^{\textit{LP}}_{0,s}(\tau_{k})}{f_{0}(q_{0}(\tau_{k}))}.

This concludes the proof.

Appendix I Proof of Theorem 5.3

Notation for this proof:

$q_{a}(\tau)$: for $a=0,1$ and $\tau\in\Upsilon$, the $\tau$th quantile of $Y(a)$.

$\theta_{a,s}^{\textit{LP}}(\tau)$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $\theta_{a,s}^{\textit{LP}}(\tau)=\left[\mathbb{E}(\tilde{W}_{i,s}(\tau)\tilde{W}_{i,s}^{\top}(\tau)|S_{i}=s)\right]^{-1}\mathbb{E}\left[\tilde{W}_{i,s}(\tau)1\{Y_{i}(a)\leq q_{a}(\tau)\}|S_{i}=s\right]$ is the optimal linear coefficient.

$\dot{W}_{i,a,s}(\tau)$: for $a\in\{0,1\}$, $i\in[n]$, and $s\in\mathcal{S}$, $\dot{W}_{i,a,s}(\tau)=W_{i,s}(\tau)-\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}W_{i,s}(\tau)$, where $I_{a}(s)=\{i\in[n]:A_{i}=a,S_{i}=s\}$.

$\hat{\theta}_{a,s}^{\textit{LP}}(\tau)$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $\hat{\theta}_{a,s}^{\textit{LP}}(\tau)$ is defined in (5.7).

$\hat{q}_{a}(\tau)$: for $a\in\{0,1\}$ and $\tau\in\Upsilon$, the estimator of $q_{a}(\tau)$ without any adjustments.

Assumption 6(i) holds by Assumption 7. In addition, by Assumption 2, we have supτΥ|τθa,sLP(τ)|<\sup_{\tau\in\Upsilon}|\partial_{\tau}\theta_{a,s}^{\textit{LP}}(\tau)|<\infty. This implies Assumption 6(ii). Next, we aim to show

supτΥ,a=0,1,s𝒮|θ^a,sLP(τ)θa,sLP(τ)|=Op(n1/2).\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}|\hat{\theta}_{a,s}^{\textit{LP}}(\tau)-\theta_{a,s}^{\textit{LP}}(\tau)|=O_{p}(n^{-1/2}).

Focusing on $\hat{\theta}_{1,s}^{\textit{LP}}(\tau)$, we have

θ^1,sLP(τ)θ1,sLP(τ)\displaystyle\hat{\theta}_{1,s}^{\textit{LP}}(\tau)-\theta_{1,s}^{\textit{LP}}(\tau) =[1n1(s)iI1(s)W˙i,1,s(τ)W˙i,1,s(τ)]1\displaystyle=\left[\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\dot{W}_{i,1,s}(\tau)\dot{W}_{i,1,s}^{\top}(\tau)\right]^{-1}
×[1n1(s)iI1(s)W˙i,1,s(τ)(1{Yiq^1(τ)}W˙i,1,s(τ)θ1,sLP(τ))].\displaystyle\times\left[\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\dot{W}_{i,1,s}(\tau)(1\{Y_{i}\leq\hat{q}_{1}(\tau)\}-\dot{W}_{i,1,s}^{\top}(\tau)\theta_{1,s}^{\textit{LP}}(\tau))\right]. (I.1)
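The decomposition (I.1) is an exact algebraic identity for the least-squares estimator, not an approximation. A quick numerical check (with stand-in data and hypothetical names; for simplicity the regressors are demeaned over the whole sample rather than within $I_{1}(s)$, which does not affect the identity) confirms this:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
W = rng.normal(size=(n, 3))
Wdot = W - W.mean(axis=0)                          # demeaned regressors, stand-in for W_dot
z = (rng.normal(size=n) < 0.3).astype(float)       # stand-in for the indicators 1{Y_i <= qhat}
theta = np.array([0.2, -0.1, 0.4])                 # stand-in for the pseudo true value

G = Wdot.T @ Wdot / n
theta_hat = np.linalg.solve(G, Wdot.T @ z / n)     # the least-squares coefficient

# the identity behind (I.1): theta_hat - theta equals the Gram inverse applied
# to the average of W_dot times the regression residual at theta
lhs = theta_hat - theta
rhs = np.linalg.solve(G, Wdot.T @ (z - Wdot @ theta) / n)
assert np.allclose(lhs, rhs)
```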

For the first term in (I.1), we have

1n1(s)iI1(s)W˙i,1,s(τ)W˙i,1,s(τ)=d1n1(s)i=N(s)+1N(s)+n1(s)W˙i,1,ss(τ)W˙i,1,ss(τ),\displaystyle\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\dot{W}_{i,1,s}(\tau)\dot{W}_{i,1,s}^{\top}(\tau)\stackrel{{\scriptstyle d}}{{=}}\frac{1}{n_{1}(s)}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}\dot{W}_{i,1,s}^{s}(\tau)\dot{W}_{i,1,s}^{s\top}(\tau),

where W˙i,1,ss(τ)=Wi,ss(τ)1n1(s)i=N(s)+1N(s)+n1(s)Wi,ss(τ)\dot{W}_{i,1,s}^{s}(\tau)=W_{i,s}^{s}(\tau)-\frac{1}{n_{1}(s)}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}W_{i,s}^{s}(\tau) and Wi,ss(τ)W_{i,s}^{s}(\tau) is i.i.d. across ii with common distribution equal to the conditional distribution of Wi,s(τ)W_{i,s}(\tau) given Si=sS_{i}=s and independent of N(s),n1(s)N(s),n_{1}(s). Therefore, by Assumption 9, we have

sups𝒮,τΥ1n1(s)i=N(s)+1N(s)+n1(s)W˙i,1,ss(τ)W˙i,1,ss(τ)𝔼(Wi,ss(τ)𝔼Wi,ss(τ))(Wi,ss(τ)𝔼Wi,ss(τ))F\displaystyle\sup_{s\in\mathcal{S},\tau\in\Upsilon}\left\|\frac{1}{n_{1}(s)}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}\dot{W}_{i,1,s}^{s}(\tau)\dot{W}_{i,1,s}^{s\top}(\tau)-\mathbb{E}(W_{i,s}^{s}(\tau)-\mathbb{E}W_{i,s}^{s}(\tau))(W_{i,s}^{s}(\tau)-\mathbb{E}W_{i,s}^{s}(\tau))^{\top}\right\|_{F}
=sups𝒮,τΥ1n1(s)i=N(s)+1N(s)+n1(s)W˙i,1,ss(τ)W˙i,1,ss(τ)𝔼(W~i,s(τ)W~i,s(τ)|Si=s)F=op(1),\displaystyle=\sup_{s\in\mathcal{S},\tau\in\Upsilon}\left\|\frac{1}{n_{1}(s)}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}\dot{W}_{i,1,s}^{s}(\tau)\dot{W}_{i,1,s}^{s\top}(\tau)-\mathbb{E}(\tilde{W}_{i,s}(\tau)\tilde{W}_{i,s}^{\top}(\tau)|S_{i}=s)\right\|_{F}=o_{p}(1),

where ||||F||\cdot||_{F} denotes the Frobenius norm and W~i,s(τ)=Wi,s(τ)𝔼(Wi,s(τ)|Si=s)\tilde{W}_{i,s}(\tau)=W_{i,s}(\tau)-\mathbb{E}(W_{i,s}(\tau)|S_{i}=s). For the second term in (I.1), we have

1n1(s)iI1(s)W˙i,1,s(τ)(1{Yiq^1(τ)}W˙i,1,s(τ)θ1,sLP(τ))\displaystyle\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\dot{W}_{i,1,s}(\tau)\left(1\{Y_{i}\leq\hat{q}_{1}(\tau)\}-\dot{W}_{i,1,s}^{\top}(\tau)\theta_{1,s}^{\textit{LP}}(\tau)\right)
=1n1(s)iI1(s)W~i,s(τ)(1{Yiq^1(τ)}W˙i,1,s(τ)θ1,sLP(τ))+R1(τ)\displaystyle=\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\tilde{W}_{i,s}(\tau)\left(1\{Y_{i}\leq\hat{q}_{1}(\tau)\}-\dot{W}_{i,1,s}^{\top}(\tau)\theta_{1,s}^{\textit{LP}}(\tau)\right)+R_{1}(\tau)
=1n1(s)iI1(s)W~i,s(τ)(1{Yiq^1(τ)}W~i,s(τ)θ1,sLP(τ))+R1(τ)+R2(τ)\displaystyle=\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\tilde{W}_{i,s}(\tau)\left(1\{Y_{i}\leq\hat{q}_{1}(\tau)\}-\tilde{W}_{i,s}^{\top}(\tau)\theta_{1,s}^{\textit{LP}}(\tau)\right)+R_{1}(\tau)+R_{2}(\tau)
=[1n1(s)iI1(s)W~i,s(τ)(1{Yiq^1(τ)}1{Yiq1(τ)})]\displaystyle=\left[\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\tilde{W}_{i,s}(\tau)(1\{Y_{i}\leq\hat{q}_{1}(\tau)\}-1\{Y_{i}\leq q_{1}(\tau)\})\right]
+[1n1(s)iI1(s)W~i,s(τ)(1{Yiq1(τ)}W~i,s(τ)θ1,sLP(τ))]+R1(τ)+R2(τ)\displaystyle+\left[\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\tilde{W}_{i,s}(\tau)(1\{Y_{i}\leq q_{1}(\tau)\}-\tilde{W}_{i,s}^{\top}(\tau)\theta_{1,s}^{\textit{LP}}(\tau))\right]+R_{1}(\tau)+R_{2}(\tau)
I(τ)+II(τ)+R1(τ)+R2(τ),\displaystyle\equiv I(\tau)+II(\tau)+R_{1}(\tau)+R_{2}(\tau),

where

R1(τ)=\displaystyle R_{1}(\tau)= (1n1(s)iI1(s)Wi,s(τ)𝔼(Wi,s(τ)|Si=s))(1n1(s)iI1(s)(1{Yiq^1(τ)}W˙i,1,s(τ)θ1,sLP(τ))),\displaystyle-\left(\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}W_{i,s}(\tau)-\mathbb{E}(W_{i,s}(\tau)|S_{i}=s)\right)\left(\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}(1\{Y_{i}\leq\hat{q}_{1}(\tau)\}-\dot{W}_{i,1,s}^{\top}(\tau)\theta_{1,s}^{\textit{LP}}(\tau))\right),

and

R2(τ)=(1n1(s)iI1(s)Wi,s(τ)𝔼(Wi,s(τ)|Si=s))(1n1(s)iI1(s)W~i,s(τ)θ1,sLP(τ)).\displaystyle R_{2}(\tau)=\left(\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}W_{i,s}(\tau)-\mathbb{E}(W_{i,s}(\tau)|S_{i}=s)\right)\left(\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\tilde{W}_{i,s}^{\top}(\tau)\theta_{1,s}^{\textit{LP}}(\tau)\right).

By Assumption 7 we can show that supτΥ|θ1,sLP(τ)|C<\sup_{\tau\in\Upsilon}|\theta_{1,s}^{\textit{LP}}(\tau)|\leq C<\infty for some constant C>0C>0. Therefore, we have

supτΥ|R1(τ)|\displaystyle\sup_{\tau\in\Upsilon}|R_{1}(\tau)| =supτΥ|1n1(s)iI1(s)Wi,s(τ)𝔼(Wi,s(τ)|Si=s)||1n1(s)iI1(s)(1{Yiq^1(τ)}W˙i,1,s(τ)θ1,sLP(τ))|\displaystyle=\sup_{\tau\in\Upsilon}\left|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}W_{i,s}(\tau)-\mathbb{E}(W_{i,s}(\tau)|S_{i}=s)\right|\left|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}(1\{Y_{i}\leq\hat{q}_{1}(\tau)\}-\dot{W}_{i,1,s}^{\top}(\tau)\theta_{1,s}^{\textit{LP}}(\tau))\right|
=Op(n1/2),\displaystyle=O_{p}(n^{-1/2}),

and

supτΥ|R2(τ)|\displaystyle\sup_{\tau\in\Upsilon}|R_{2}(\tau)| =supτΥ|1n1(s)iI1(s)Wi,s(τ)𝔼(Wi,s(τ)|Si=s)||1n1(s)iI1(s)W~i,s(τ)θ1,sLP(τ)|\displaystyle=\sup_{\tau\in\Upsilon}\left|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}W_{i,s}(\tau)-\mathbb{E}(W_{i,s}(\tau)|S_{i}=s)\right|\left|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\tilde{W}_{i,s}^{\top}(\tau)\theta_{1,s}^{\textit{LP}}(\tau)\right|
=Op(n1/2),\displaystyle=O_{p}(n^{-1/2}),

where we use the fact that, by Assumption 9,

\displaystyle\sup_{\tau\in\Upsilon}\left|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}W_{i,s}(\tau)-\mathbb{E}(W_{i,s}(\tau)|S_{i}=s)\right|\stackrel{{\scriptstyle d}}{{=}}\sup_{\tau\in\Upsilon}\left|\frac{1}{n_{1}(s)}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}(W_{i,s}^{s}(\tau)-\mathbb{E}W_{i,s}^{s}(\tau))\right|=O_{p}(n^{-1/2}).
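The $O_{p}(n^{-1/2})$ rate of a centered sample mean used in the display above can be illustrated numerically: quadrupling the sample size should roughly halve the root-mean-square deviation of the mean. A sketch with simulated i.i.d. draws (illustrative only; the constants are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
reps = 2000
n = 400

# RMS deviation of the centered sample mean at sample sizes n and 4n:
# an n^{-1/2} rate implies the ratio of the two RMS values is about 2.
dev_n = np.array([rng.normal(size=n).mean() for _ in range(reps)])
dev_4n = np.array([rng.normal(size=4 * n).mean() for _ in range(reps)])
rms_n = np.sqrt(np.mean(dev_n ** 2))
rms_4n = np.sqrt(np.mean(dev_4n ** 2))
ratio = rms_n / rms_4n   # should be close to 2
```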

Next, note that $\sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|=O_{p}(n^{-1/2})$, which means that for any $\varepsilon>0$, there exists a constant $M>0$ such that $\sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|\leq Mn^{-1/2}$ with probability greater than $1-\varepsilon$. On the event that $\sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|\leq Mn^{-1/2}$, we have

\displaystyle\sup_{\tau\in\Upsilon}|I(\tau)|\leq\sup_{\tau\in\Upsilon}\biggl|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\tilde{W}_{i,s}(\tau)\bigl(1\{Y_{i}(1)\leq\hat{q}_{1}(\tau)\}-1\{Y_{i}(1)\leq q_{1}(\tau)\}
\displaystyle-\mathbb{P}(Y_{i}(1)\leq\hat{q}_{1}(\tau)|X_{i},S_{i}=s)+\mathbb{P}(Y_{i}(1)\leq q_{1}(\tau)|X_{i},S_{i}=s)\bigr)\biggr|
\displaystyle+\sup_{\tau\in\Upsilon}\biggl|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\tilde{W}_{i,s}(\tau)(\mathbb{P}(Y_{i}(1)\leq\hat{q}_{1}(\tau)|X_{i},S_{i}=s)-\mathbb{P}(Y_{i}(1)\leq q_{1}(\tau)|X_{i},S_{i}=s))\biggr|
\displaystyle\leq\sup_{\tau\in\Upsilon,|q-q^{\prime}|\leq Mn^{-1/2}}\biggl|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\tilde{W}_{i,s}(\tau)\bigl(1\{Y_{i}(1)\leq q\}-1\{Y_{i}(1)\leq q^{\prime}\}
\displaystyle-\mathbb{P}(Y_{i}(1)\leq q|X_{i},S_{i}=s)+\mathbb{P}(Y_{i}(1)\leq q^{\prime}|X_{i},S_{i}=s)\bigr)\biggr|+C\sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|
\displaystyle\leq\sup_{\tau\in\Upsilon,|q-q^{\prime}|\leq Mn^{-1/2}}\biggl|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\tilde{W}_{i,s}(\tau)\bigl(1\{Y_{i}(1)\leq q\}-1\{Y_{i}(1)\leq q^{\prime}\}
\displaystyle-\mathbb{P}(Y_{i}(1)\leq q|X_{i},S_{i}=s)+\mathbb{P}(Y_{i}(1)\leq q^{\prime}|X_{i},S_{i}=s)\bigr)\biggr|+Cn^{-1/2}
\displaystyle=O_{p}(n^{-1/2}),

where the first inequality is due to the triangle inequality, the second inequality uses the event bound $\sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|\leq Mn^{-1/2}$ together with the boundedness of $f_{1}(\cdot|X_{i},S_{i}=s)$, and the third inequality again uses the event bound. To see the last equality in the above display, we define

\displaystyle\mathcal{F}=\left\{(W_{i,s}(\tau,j)-\mathbb{E}(W_{i,s}(\tau,j)|S_{i}=s))\bigl(1\{Y_{i}(1)\leq q\}-1\{Y_{i}(1)\leq q^{\prime}\}-\mathbb{P}(Y_{i}(1)\leq q|X_{i},S_{i}=s)+\mathbb{P}(Y_{i}(1)\leq q^{\prime}|X_{i},S_{i}=s)\bigr):\tau\in\Upsilon,\ |q-q^{\prime}|\leq Mn^{-1/2}\right\}

with envelope Fi=2Li+𝔼(Li|Si=s)L,dF_{i}=2L_{i}+\mathbb{E}(L_{i}|S_{i}=s)\in L_{\mathbb{P},d} for some d>2d>2, where Wi,s(τ,j)W_{i,s}(\tau,j) is the jjth coordinate of Wi,s(τ)W_{i,s}(\tau). Clearly \mathcal{F} is of the VC-type with fixed coefficients (α,v)(\alpha,v). In addition,

supff2Cn1/2σn2.\displaystyle\sup_{f\in\mathcal{F}}\mathbb{P}f^{2}\leq Cn^{-1/2}\equiv\sigma_{n}^{2}.

Therefore, Lemma N.2 implies that $\sup_{\tau\in\Upsilon}|I(\tau)|=O_{p}(n^{-1/2})$. By the usual maximal inequality (e.g., van der Vaart and Wellner (1996), Theorem 2.14.1), we can show that

supτΥ|II(τ)|=Op(n1/2).\displaystyle\sup_{\tau\in\Upsilon}|II(\tau)|=O_{p}(n^{-1/2}).

Combining these results, we conclude that

supτΥ,s𝒮|1n1(s)iI1(s)W˙i,1,s(τ)(1{Yiq^1(τ)}W˙i,1,s(τ)θ1,sLP(τ))|=Op(n1/2),\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\dot{W}_{i,1,s}(\tau)(1\{Y_{i}\leq\hat{q}_{1}(\tau)\}-\dot{W}_{i,1,s}^{\top}(\tau)\theta_{1,s}^{\textit{LP}}(\tau))\right|=O_{p}(n^{-1/2}),

and hence

supτΥ,s𝒮|θ^1,sLP(τ)θ1,sLP(τ)|=Op(n1/2).\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}|\hat{\theta}_{1,s}^{\textit{LP}}(\tau)-\theta_{1,s}^{\textit{LP}}(\tau)|=O_{p}(n^{-1/2}).

Appendix J Proof of Theorem 5.4

$H_{i}$: for $i\in[n]$, $H_{i}=H(X_{i})$ for some function $H$.
$\theta_{a,s}^{\textit{ML}}(\tau)$: for $a\in\{0,1\}$ and $s\in\mathcal{S}$, the pseudo true value defined in (5.9).
$\hat{\theta}_{a,s}^{\textit{ML}}(\tau)$: for $a\in\{0,1\}$ and $s\in\mathcal{S}$, the estimator of $\theta_{a,s}^{\textit{ML}}(\tau)$ in (5.8).

Let dHd_{H} be the dimension of HiH_{i}, udHu\in\Re^{d_{H}}, and

Qn(τ,s,q,u)=1na(s)iIa(s)[1{Yiq}log(λ(Hi(θaML(τ)+u)))+1{Yi>q}log(1λ(Hi(θaML(τ)+u)))],\displaystyle Q_{n}(\tau,s,q,u)=\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}[1\{Y_{i}\leq q\}\log(\lambda(H_{i}^{\top}(\theta_{a}^{\textit{ML}}(\tau)+u)))+1\{Y_{i}>q\}\log(1-\lambda(H_{i}^{\top}(\theta_{a}^{\textit{ML}}(\tau)+u)))],

and

Q(τ,s,q,u)=𝔼[1{Yi(a)q}log(λ(Hi(θaML(τ)+u)))+1{Yi(a)>q}log(1λ(Hi(θaML(τ)+u)))|Si=s].\displaystyle Q(\tau,s,q,u)=\mathbb{E}[1\{Y_{i}(a)\leq q\}\log(\lambda(H_{i}^{\top}(\theta_{a}^{\textit{ML}}(\tau)+u)))+1\{Y_{i}(a)>q\}\log(1-\lambda(H_{i}^{\top}(\theta_{a}^{\textit{ML}}(\tau)+u)))|S_{i}=s].
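The maximum-likelihood step behind $Q_{n}$ is an ordinary logistic regression of the indicator $1\{Y_{i}\leq q\}$ on $H_{i}$ within a cell, with $\lambda(\cdot)$ the logistic CDF. A minimal Newton-iteration sketch on synthetic data (all data-generating choices here are hypothetical, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

X = rng.normal(size=n)
Y = X + rng.normal(size=n)            # synthetic outcome within one cell
q = np.median(Y)                      # a fixed quantile level q_a(tau)
D = (Y <= q).astype(float)            # the indicator 1{Y_i <= q}
H = np.column_stack([np.ones(n), X])  # H_i = H(X_i), here (1, X_i)

def lam(z):                           # logistic CDF lambda(.)
    return 1.0 / (1.0 + np.exp(-z))

# Newton iterations maximizing the sample log-likelihood Q_n
theta = np.zeros(2)
for _ in range(25):
    p = lam(H @ theta)
    grad = H.T @ (D - p) / n
    hess = -(H * (p * (1 - p))[:, None]).T @ H / n
    theta -= np.linalg.solve(hess, grad)

grad_norm = np.linalg.norm(H.T @ (lam(H @ theta) - D) / n)
```

At convergence the score is (numerically) zero; here the fitted slope is negative, since $1\{Y\leq q\}$ is decreasing in $X$ under this design.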

We note that

\displaystyle\mathcal{F}=\left\{1\{Y_{i}\leq q\}\log(\lambda(H_{i}^{\top}(\theta_{a}^{\textit{ML}}(\tau)+u)))+1\{Y_{i}>q\}\log(1-\lambda(H_{i}^{\top}(\theta_{a}^{\textit{ML}}(\tau)+u)))-Q(\tau,s,q,u):\tau\in\Upsilon,\ s\in\mathcal{S},\ q\in\Re,\ ||u||_{2}\leq\delta\right\}

is a VC class with a fixed VC index. Then, Lemma N.2 implies

supτΥ,s𝒮,q,||u||2δ|Qn(τ,s,q,u)Q(τ,s,q,u)|=op(1).\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S},q\in\Re,||u||_{2}\leq\delta}|Q_{n}(\tau,s,q,u)-Q(\tau,s,q,u)|=o_{p}(1).

In addition,

supτΥ,s𝒮,q,||u||2δ|qQ(τ,s,q,u)|C,\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S},q\in\Re,||u||_{2}\leq\delta}|\partial_{q}Q(\tau,s,q,u)|\leq C,

and supτΥ|q^a(τ)qa(τ)|=Op(n1/2)\sup_{\tau\in\Upsilon}|\hat{q}_{a}(\tau)-q_{a}(\tau)|=O_{p}(n^{-1/2}). Therefore,

ΔnsupτΥ,s𝒮,||u||2δ|Qn(τ,s,q^a(τ),u)Q(τ,s,qa(τ),u)|\displaystyle\Delta_{n}\equiv\sup_{\tau\in\Upsilon,s\in\mathcal{S},||u||_{2}\leq\delta}|Q_{n}(\tau,s,\hat{q}_{a}(\tau),u)-Q(\tau,s,q_{a}(\tau),u)|
supτΥ,s𝒮,q,||u||2δ|Qn(τ,s,q,u)Q(τ,s,q,u)|\displaystyle\leq\sup_{\tau\in\Upsilon,s\in\mathcal{S},q\in\Re,||u||_{2}\leq\delta}|Q_{n}(\tau,s,q,u)-Q(\tau,s,q,u)|
+supτΥ,s𝒮,||u||2δ|Q(τ,s,q^a(τ),u)Q(τ,s,qa(τ),u)|=op(1).\displaystyle+\sup_{\tau\in\Upsilon,s\in\mathcal{S},||u||_{2}\leq\delta}|Q(\tau,s,\hat{q}_{a}(\tau),u)-Q(\tau,s,q_{a}(\tau),u)|=o_{p}(1). (J.1)

Further note that $Q_{n}(\tau,s,\hat{q}_{a}(\tau),u)$ is concave in $u$ for fixed $\tau$. Therefore, for $v\in S^{d_{H}-1}$, where $S^{d_{H}-1}=\{v\in\Re^{d_{H}}:||v||_{2}=1\}$, and for $l>\delta$,

Qn(τ,s,q^a(τ),δv)δlQn(τ,s,q^a(τ),lv)+(1δl)Qn(τ,s,q^a(τ),0),\displaystyle Q_{n}(\tau,s,\hat{q}_{a}(\tau),\delta v)\geq\frac{\delta}{l}Q_{n}(\tau,s,\hat{q}_{a}(\tau),lv)+\left(1-\frac{\delta}{l}\right)Q_{n}(\tau,s,\hat{q}_{a}(\tau),0),

which implies

δl(Qn(τ,s,q^a(τ),lv)Qn(τ,s,q^a(τ),0))\displaystyle\frac{\delta}{l}\left(Q_{n}(\tau,s,\hat{q}_{a}(\tau),lv)-Q_{n}(\tau,s,\hat{q}_{a}(\tau),0)\right) Qn(τ,s,q^a(τ),δv)Qn(τ,s,q^a(τ),0)\displaystyle\leq Q_{n}(\tau,s,\hat{q}_{a}(\tau),\delta v)-Q_{n}(\tau,s,\hat{q}_{a}(\tau),0)
Q(τ,s,qa(τ),δv)Q(τ,s,qa(τ),0)+2Δn.\displaystyle\leq Q(\tau,s,q_{a}(\tau),\delta v)-Q(\tau,s,q_{a}(\tau),0)+2\Delta_{n}.
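The first step above is the chord inequality for concave functions, $f(\delta v)\geq(\delta/l)f(lv)+(1-\delta/l)f(0)$ for $l>\delta>0$. It can be checked numerically with any concave stand-in for $Q_{n}$ (an illustrative sketch; the quadratic below is an arbitrary concave choice):

```python
import numpy as np

rng = np.random.default_rng(4)

def f(u):                            # a concave function standing in for Q_n
    return -np.sum(u ** 2)

delta, l = 0.5, 3.0
checks = []
for _ in range(100):
    v = rng.normal(size=4)
    v /= np.linalg.norm(v)           # v on the unit sphere
    lhs = f(delta * v)
    rhs = (delta / l) * f(l * v) + (1 - delta / l) * f(0 * v)
    checks.append(lhs >= rhs - 1e-12)
all_hold = all(checks)
```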

Because Q(τ,s,qa(τ),δv)Q(τ,s,qa(τ),0)Q(\tau,s,q_{a}(\tau),\delta v)-Q(\tau,s,q_{a}(\tau),0) is continuous in (τ,v)Υ×SdH1(\tau,v)\in\Upsilon\times S^{d_{H}-1}, Υ×SdH1\Upsilon\times S^{d_{H}-1} is compact, and 0 is the unique maximizer of Q(τ,s,qa(τ),u)Q(\tau,s,q_{a}(\tau),u), we have

\displaystyle\sup_{(\tau,v)\in\Upsilon\times S^{d_{H}-1}}\left[Q(\tau,s,q_{a}(\tau),\delta v)-Q(\tau,s,q_{a}(\tau),0)\right]\leq-\eta,

for some $\eta>0$. In addition, if $\sup_{\tau\in\Upsilon}||\hat{\theta}_{a,s}^{\textit{ML}}(\tau)-\theta_{a,s}^{\textit{ML}}(\tau)||_{2}>\delta$, then there exists $(\tau,l,v)\in\Upsilon\times(\delta,\infty)\times S^{d_{H}-1}$ such that

δl(Qn(τ,s,q^a(τ),lv)Qn(τ,s,q^a(τ),0))0.\displaystyle\frac{\delta}{l}\left(Q_{n}(\tau,s,\hat{q}_{a}(\tau),lv)-Q_{n}(\tau,s,\hat{q}_{a}(\tau),0)\right)\geq 0.

Therefore,

(supτΥ||θ^a,sML(τ)θa,sML(τ)||2>δ)(η2Δn)0,\displaystyle\mathbb{P}\left(\sup_{\tau\in\Upsilon}||\hat{\theta}_{a,s}^{\textit{ML}}(\tau)-\theta_{a,s}^{\textit{ML}}(\tau)||_{2}>\delta\right)\leq\mathbb{P}(\eta\leq 2\Delta_{n})\rightarrow 0,

where the last step is due to (J.1). This implies

supτΥ||θ^a,sML(τ)θa,sML(τ)||2=op(1).\displaystyle\sup_{\tau\in\Upsilon}||\hat{\theta}_{a,s}^{\textit{ML}}(\tau)-\theta_{a,s}^{\textit{ML}}(\tau)||_{2}=o_{p}(1).

Appendix K Proof of Theorem 5.5

$\lambda(\cdot)$: the logistic CDF.
$\theta_{a,s}^{\textit{LPML}}(\tau)$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, the pseudo true value defined in (5.11).
$\hat{\theta}_{a,s}^{\textit{LPML}}(\tau)$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, the estimator of $\theta_{a,s}^{\textit{LPML}}(\tau)$ in (5.15).
$\omega_{i,a,s}(\tau)$: for $a=0,1$, $i\in[n]$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $\omega_{i,a,s}(\tau)=\lambda(H_{i}^{\top}\theta_{a,s}^{\textit{ML}}(\tau))$.
$W_{i,s}(\tau)$: for $i\in[n]$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $W_{i,s}(\tau)=(\omega_{i,1,s}(\tau),\omega_{i,0,s}(\tau))^{\top}$.
$\omega_{i,a,a^{\prime},s}(\tau)$: for $a,a^{\prime}\in\{0,1\}$, $i\in[n]$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $\omega_{i,a,a^{\prime},s}(\tau)=\omega_{i,a^{\prime},s}(\tau)-\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\omega_{i,a^{\prime},s}(\tau)$.
$\dot{W}_{i,a,s}(\tau)$: for $a=0,1$, $i\in[n]$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $\dot{W}_{i,a,s}(\tau)=W_{i,s}(\tau)-\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}W_{i,s}(\tau)=(\omega_{i,a,1,s}(\tau),\omega_{i,a,0,s}(\tau))^{\top}$.
$\hat{\omega}_{i,a,s}(\tau)$: for $a=0,1$, $i\in[n]$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $\hat{\omega}_{i,a,s}(\tau)=\lambda(H_{i}^{\top}\hat{\theta}_{a,s}^{\textit{ML}}(\tau))$.
$\hat{W}_{i,s}(\tau)$: for $i\in[n]$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $\hat{W}_{i,s}(\tau)=(\hat{\omega}_{i,1,s}(\tau),\hat{\omega}_{i,0,s}(\tau))^{\top}$.
$\hat{\omega}_{i,a,a^{\prime},s}(\tau)$: for $a,a^{\prime}\in\{0,1\}$, $i\in[n]$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $\hat{\omega}_{i,a,a^{\prime},s}(\tau)=\hat{\omega}_{i,a^{\prime},s}(\tau)-\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\hat{\omega}_{i,a^{\prime},s}(\tau)$.
$\breve{W}_{i,a,s}(\tau)$: for $a=0,1$, $i\in[n]$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, $\breve{W}_{i,a,s}(\tau)=\hat{W}_{i,s}(\tau)-\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\hat{W}_{i,s}(\tau)=(\hat{\omega}_{i,a,1,s}(\tau),\hat{\omega}_{i,a,0,s}(\tau))^{\top}$.

Recall m^a(τ,s,Xi)=τW^i,s(τ)θ^a,sLPML(τ)\widehat{m}_{a}(\tau,s,X_{i})=\tau-\hat{W}_{i,s}^{\top}(\tau)\hat{\theta}_{a,s}^{\textit{LPML}}(\tau) and m¯a(τ,s,Xi)=τWi,s(τ)θa,sLPML(τ)\overline{m}_{a}(\tau,s,X_{i})=\tau-W_{i,s}^{\top}(\tau)\theta_{a,s}^{\textit{LPML}}(\tau). Let dHd_{H} be the dimension of HiH_{i}. Then, we have

Δ¯a(τ,s,Xi)\displaystyle\overline{\Delta}_{a}(\tau,s,X_{i}) =m^a(τ,s,Xi)m¯a(τ,s,Xi)\displaystyle=\widehat{m}_{a}(\tau,s,X_{i})-\overline{m}_{a}(\tau,s,X_{i})
Λτ,s(Xi,θa,s(τ))Λτ,s(Xi,θ^a,s(τ)),\displaystyle\equiv\Lambda_{\tau,s}(X_{i},\theta_{a,s}(\tau))-\Lambda_{\tau,s}(X_{i},\hat{\theta}_{a,s}(\tau)),

where the functional form Λτ,s()\Lambda_{\tau,s}(\cdot) is invariant to (τ,s)(\tau,s),

Λτ,s(Xi,θ)=Λ(Xi,θ)(λ(H(Xi)θ1),λ(H(Xi)θ2))θ3,θ1,θ2dH,θ32,\displaystyle\Lambda_{\tau,s}(X_{i},\theta)=\Lambda(X_{i},\theta)\equiv(\lambda(H(X_{i})^{\top}\theta_{1}),\lambda(H(X_{i})^{\top}\theta_{2}))\theta_{3},\quad\theta_{1},\theta_{2}\in\Re^{d_{H}},\quad\theta_{3}\in\Re^{2},
θa,s(τ)=((θ1,sML(τ)),(θ0,sML(τ)),(θa,sLPML(τ))),and\displaystyle\theta_{a,s}(\tau)=((\theta_{1,s}^{\textit{ML}}(\tau))^{\top},(\theta_{0,s}^{\textit{ML}}(\tau))^{\top},(\theta_{a,s}^{\textit{LPML}}(\tau))^{\top})^{\top},\quad\text{and}
θ^a,s(τ)=((θ^1,sML(τ)),(θ^0,sML(τ)),(θ^a,sLPML(τ))).\displaystyle\hat{\theta}_{a,s}(\tau)=((\hat{\theta}_{1,s}^{\textit{ML}}(\tau))^{\top},(\hat{\theta}_{0,s}^{\textit{ML}}(\tau))^{\top},(\hat{\theta}_{a,s}^{\textit{LPML}}(\tau))^{\top})^{\top}.

Suppose

supτΥ,a=0,1,s𝒮|θ^a,sLPML(τ)θa,sLPML(τ)|=op(1),\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}|\hat{\theta}_{a,s}^{\textit{LPML}}(\tau)-\theta_{a,s}^{\textit{LPML}}(\tau)|=o_{p}(1), (K.1)

and we also have

supτΥ,a=0,1,s𝒮|θ^a,sML(τ)θa,sML(τ)|=op(1)\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}|\hat{\theta}_{a,s}^{\textit{ML}}(\tau)-\theta_{a,s}^{\textit{ML}}(\tau)|=o_{p}(1)

by Theorem 5.4. Then Assumption 6(iii) holds for θ^a,s(τ)\hat{\theta}_{a,s}(\tau). Assumption 6(i) holds automatically as Λ()\Lambda(\cdot) does not depend on τ\tau, and Assumption 6(ii) holds by Assumption 11. Then, Theorem 5.1 implies Assumptions 3 and 5 hold. In addition, Theorem 5.2 implies [ΣLPML(τk,τl)]k,l[K][\Sigma^{\textit{LPML}}(\tau_{k},\tau_{l})]_{k,l\in[K]} is the smallest among all linear adjustments with Wi,s(τ)=(ωi,1,s(τ),ωi,0,s(τ))W_{i,s}(\tau)=(\omega_{i,1,s}(\tau),\omega_{i,0,s}(\tau))^{\top} with ωi,a,s(τ)=λ(Hiθa,sML(τ))\omega_{i,a,s}(\tau)=\lambda(H_{i}^{\top}\theta_{a,s}^{\textit{ML}}(\tau)) as the regressors.

Therefore, it remains to establish (K.1). First, denote

ω˙i,a,a,s(τ)=ωi,a,s(τ)1na(s)iIa(s)ωi,a,s(τ),W˙i,a,s(τ)=(ω˙i,a,1,s(τ),ω˙i,a,0,s(τ)),and\displaystyle\dot{\omega}_{i,a,a^{\prime},s}(\tau)=\omega_{i,a^{\prime},s}(\tau)-\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\omega_{i,a^{\prime},s}(\tau),\quad\dot{W}_{i,a,s}(\tau)=(\dot{\omega}_{i,a,1,s}(\tau),\dot{\omega}_{i,a,0,s}(\tau))^{\top},\quad\text{and}
θ˘a,sLPML(τ)=[1na(s)iIa(s)W˙i,a,s(τ)W˙i,a,s(τ)]1[1na(s)iIa(s)W˙i,a,s(τ)1{Yiq^a(τ)}].\displaystyle\breve{\theta}_{a,s}^{\textit{LPML}}(\tau)=\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{W}_{i,a,s}(\tau)\dot{W}_{i,a,s}(\tau)^{\top}\right]^{-1}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{W}_{i,a,s}(\tau)1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\right].

We note that Assumption 9 holds with Wi,s(τ)=(λ(Hiθ1,sML(τ)),λ(Hiθ0,sML(τ)))W_{i,s}(\tau)=(\lambda(H_{i}^{\top}\theta_{1,s}^{\textit{ML}}(\tau)),\lambda(H_{i}^{\top}\theta_{0,s}^{\textit{ML}}(\tau)))^{\top} by Assumption 11(ii). Then, following the same argument as in the proof of Theorem 5.3, we can show that

supτΥ,a=0,1,s𝒮||θ˘a,sLPML(τ)θa,sLPML(τ)||2=op(1).\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}||\breve{\theta}_{a,s}^{\textit{LPML}}(\tau)-\theta_{a,s}^{\textit{LPML}}(\tau)||_{2}=o_{p}(1).

Therefore, it suffices to show

supτΥ,a=0,1,s𝒮||θ˘a,sLPML(τ)θ^a,sLPML(τ)||2=op(1).\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}||\breve{\theta}_{a,s}^{\textit{LPML}}(\tau)-\hat{\theta}_{a,s}^{\textit{LPML}}(\tau)||_{2}=o_{p}(1).

Denote W^i,s(τ)=(ω^i,1,s(τ),ω^i,0,s(τ))\hat{W}_{i,s}(\tau)=(\hat{\omega}_{i,1,s}(\tau),\hat{\omega}_{i,0,s}(\tau))^{\top} and ω^i,a,s(τ)=λ(Hiθ^a,sML(τ))\hat{\omega}_{i,a,s}(\tau)=\lambda(H_{i}^{\top}\hat{\theta}_{a,s}^{\textit{ML}}(\tau)). We have

supτΥ,a=0,1,s𝒮[1na(s)iIa(s)(ωi,1,s(τ)ω^i,1,s(τ))2]1/2\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(\omega_{i,1,s}(\tau)-\hat{\omega}_{i,1,s}(\tau))^{2}\right]^{1/2}
maxa=0,1,s𝒮λmax1/2(1na(s)iIa(s)HiHi)supτΥ,a=0,1,s𝒮||θ^1,sML(τ)θ1,sML(τ)||2=op(1).\displaystyle\leq\max_{a=0,1,s\in\mathcal{S}}\lambda_{\max}^{1/2}\left(\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}H_{i}H_{i}^{\top}\right)\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}||\hat{\theta}_{1,s}^{\textit{ML}}(\tau)-\theta_{1,s}^{\textit{ML}}(\tau)||_{2}=o_{p}(1).

In addition, denote ω˘i,a,a,s(τ)=ω^i,a,s(τ)1na(s)iIa(s)ω^i,a,s(τ)\breve{\omega}_{i,a,a^{\prime},s}(\tau)=\hat{\omega}_{i,a^{\prime},s}(\tau)-\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\hat{\omega}_{i,a^{\prime},s}(\tau) and W˘i,a,s(τ)=(ω˘i,a,1,s(τ),ω˘i,a,0,s(τ))\breve{W}_{i,a,s}(\tau)=(\breve{\omega}_{i,a,1,s}(\tau),\breve{\omega}_{i,a,0,s}(\tau))^{\top}. We first consider the case a=1a^{\prime}=1. We have

supτΥ,a=0,1,s𝒮|[1na(s)iIa(s)ω˙i,a,1,s2(τ)]1/2[1na(s)iIa(s)ω˘i,a,1,s2(τ)]1/2|\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left|\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{\omega}_{i,a,1,s}^{2}(\tau)\right]^{1/2}-\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{\omega}_{i,a,1,s}^{2}(\tau)\right]^{1/2}\right|
\displaystyle\leq\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(\dot{\omega}_{i,a,1,s}(\tau)-\breve{\omega}_{i,a,1,s}(\tau))^{2}\right]^{1/2}
supτΥ,a=0,1,s𝒮[1na(s)iIa(s)(ωi,1,s(τ)ω^i,1,s(τ))2]1/2+supτΥ,a=0,1,s𝒮|1na(s)iIa(s)(ωi,1,s(τ)ω^i,1,s(τ))|\displaystyle\leq\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(\omega_{i,1,s}(\tau)-\hat{\omega}_{i,1,s}(\tau))^{2}\right]^{1/2}+\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left|\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(\omega_{i,1,s}(\tau)-\hat{\omega}_{i,1,s}(\tau))\right|
2supτΥ,a=0,1,s𝒮[1na(s)iIa(s)(ωi,1,s(τ)ω^i,1,s(τ))2]1/2=op(1).\displaystyle\leq 2\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(\omega_{i,1,s}(\tau)-\hat{\omega}_{i,1,s}(\tau))^{2}\right]^{1/2}=o_{p}(1).

In addition, Assumption 11 implies $\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{\omega}_{i,a,1,s}^{2}(\tau)\right]^{1/2}\leq C<\infty$. Therefore, we have

supτΥ,a=0,1,s𝒮|[1na(s)iIa(s)ω˙i,a,1,s2(τ)][1na(s)iIa(s)ω˘i,a,1,s2(τ)]|=op(1).\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left|\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{\omega}_{i,a,1,s}^{2}(\tau)\right]-\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{\omega}_{i,a,1,s}^{2}(\tau)\right]\right|=o_{p}(1).

Similarly, we have

supτΥ,a=0,1,s𝒮|[1na(s)iIa(s)ω˙i,a,0,s2(τ)][1na(s)iIa(s)ω˘i,a,0,s2(τ)]|=op(1).\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left|\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{\omega}_{i,a,0,s}^{2}(\tau)\right]-\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{\omega}_{i,a,0,s}^{2}(\tau)\right]\right|=o_{p}(1).

Last,

supτΥ,a=0,1,s𝒮|[1na(s)iIa(s)ω˙i,a,0,s(τ)ω˙i,a,1,s(τ)][1na(s)iIa(s)ω˘i,a,0,s(τ)ω˘i,a,1,s(τ)]|\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left|\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{\omega}_{i,a,0,s}(\tau)\dot{\omega}_{i,a,1,s}(\tau)\right]-\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{\omega}_{i,a,0,s}(\tau)\breve{\omega}_{i,a,1,s}(\tau)\right]\right|
supτΥ,a=0,1,s𝒮[(1na(s)iIa(s)ω˙i,a,0,s2(τ))1/2+(1na(s)iIa(s)ω˘i,a,1,s2(τ))1/2]\displaystyle\leq\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left[\left(\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{\omega}_{i,a,0,s}^{2}(\tau)\right)^{1/2}+\left(\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{\omega}_{i,a,1,s}^{2}(\tau)\right)^{1/2}\right]
×{[1na(s)iIa(s)(ω˙i,a,1,s(τ)ω˘i,a,1,s(τ))2]1/2+[1na(s)iIa(s)(ω˙i,a,0,s(τ)ω˘i,a,0,s(τ))2]1/2}=op(1).\displaystyle\times\biggl{\{}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(\dot{\omega}_{i,a,1,s}(\tau)-\breve{\omega}_{i,a,1,s}(\tau))^{2}\right]^{1/2}+\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(\dot{\omega}_{i,a,0,s}(\tau)-\breve{\omega}_{i,a,0,s}(\tau))^{2}\right]^{1/2}\biggr{\}}=o_{p}(1).
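The last bound rests on the elementary inequality $|ab-a^{\prime}b^{\prime}|\leq(|a|+|b^{\prime}|)(|a-a^{\prime}|+|b-b^{\prime}|)$, applied together with the Cauchy–Schwarz inequality in the sample. A quick numerical check of the scalar inequality (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(5)
holds = []
for _ in range(10_000):
    a, b, a2, b2 = rng.normal(size=4)
    # |ab - a'b'| = |a(b - b') + b'(a - a')| <= |a||b-b'| + |b'||a-a'|
    lhs = abs(a * b - a2 * b2)
    rhs = (abs(a) + abs(b2)) * (abs(a - a2) + abs(b - b2))
    holds.append(lhs <= rhs + 1e-12)
all_hold = all(holds)
```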

This implies

supτΥ,a=0,1,s𝒮|[1na(s)iIa(s)W˙i,a,s(τ)W˙i,a,s(τ)][1na(s)iIa(s)W˘i,a,s(τ)W˘i,a,s(τ)]|\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left|\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{W}_{i,a,s}(\tau)\dot{W}_{i,a,s}(\tau)^{\top}\right]-\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{W}_{i,a,s}(\tau)\breve{W}_{i,a,s}(\tau)^{\top}\right]\right|
=supτΥ,a=0,1,s𝒮|[1na(s)iIa(s)(ω˙i,a,1,s2(τ)ω˙i,a,1,s(τ)ω˙i,a,0,s(τ)ω˙i,a,0,s(τ)ω˙i,a,1,s(τ)ω˙i,a,0,s2(τ))]\displaystyle=\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\biggl{|}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\begin{pmatrix}\dot{\omega}_{i,a,1,s}^{2}(\tau)&\dot{\omega}_{i,a,1,s}(\tau)\dot{\omega}_{i,a,0,s}(\tau)\\ \dot{\omega}_{i,a,0,s}(\tau)\dot{\omega}_{i,a,1,s}(\tau)&\dot{\omega}_{i,a,0,s}^{2}(\tau)\end{pmatrix}\right]
[1na(s)iIa(s)(ω˘i,a,1,s2(τ)ω˘i,a,1,s(τ)ω˘i,a,0,s(τ)ω˘i,a,0,s(τ)ω˘i,a,1,s(τ)ω˘i,a,0,s2(τ))]|=op(1),\displaystyle-\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\begin{pmatrix}\breve{\omega}_{i,a,1,s}^{2}(\tau)&\breve{\omega}_{i,a,1,s}(\tau)\breve{\omega}_{i,a,0,s}(\tau)\\ \breve{\omega}_{i,a,0,s}(\tau)\breve{\omega}_{i,a,1,s}(\tau)&\breve{\omega}_{i,a,0,s}^{2}(\tau)\end{pmatrix}\right]\biggr{|}=o_{p}(1),

and thus,

supτΥ,a=0,1,s𝒮|[1na(s)iIa(s)W˙i,a,s(τ)W˙i,a,s(τ)]1[1na(s)iIa(s)W˘i,a,s(τ)W˘i,a,s(τ)]1|=op(1).\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left|\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{W}_{i,a,s}(\tau)\dot{W}_{i,a,s}(\tau)^{\top}\right]^{-1}-\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{W}_{i,a,s}(\tau)\breve{W}_{i,a,s}(\tau)^{\top}\right]^{-1}\right|=o_{p}(1).

In addition,

\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left\|\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(\dot{W}_{i,a,s}(\tau)-\breve{W}_{i,a,s}(\tau))1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\right\|_{2}
supτΥ,a=0,1,s𝒮1na(s)iIa(s)W˙i,a,s(τ)W˘i,a,s(τ)2\displaystyle\leq\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\left\|\dot{W}_{i,a,s}(\tau)-\breve{W}_{i,a,s}(\tau)\right\|_{2}
2supτΥ,a=0,1,s𝒮1na(s)iIa(s)Wi,s(τ)W^i,s(τ)2=op(1).\displaystyle\leq 2\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\left\|W_{i,s}(\tau)-\hat{W}_{i,s}(\tau)\right\|_{2}=o_{p}(1).

Therefore, we have

supτΥ,a=0,1,s𝒮||θ˘a,sLPML(τ)θ^a,sLPML(τ)||2\displaystyle\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}||\breve{\theta}_{a,s}^{\textit{LPML}}(\tau)-\hat{\theta}_{a,s}^{\textit{LPML}}(\tau)||_{2}
=supτΥ,a=0,1,s𝒮[1na(s)iIa(s)W˙i,a,s(τ)W˙i,a,s(τ)]1[1na(s)iIa(s)W˙i,a,s(τ)1{Yiq^a(τ)}]\displaystyle=\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\biggl{\|}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{W}_{i,a,s}(\tau)\dot{W}_{i,a,s}(\tau)^{\top}\right]^{-1}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\dot{W}_{i,a,s}(\tau)1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\right]
[1na(s)iIa(s)W˘i,a,s(τ)W˘i,a,s(τ)]1[1na(s)iIa(s)W˘i,a,s(τ)1{Yiq^a(τ)}]2\displaystyle-\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{W}_{i,a,s}(\tau)\breve{W}_{i,a,s}(\tau)^{\top}\right]^{-1}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\breve{W}_{i,a,s}(\tau)1\{Y_{i}\leq\hat{q}_{a}(\tau)\}\right]\biggr{\|}_{2}
=op(1),\displaystyle=o_{p}(1),

which concludes the proof.

Appendix L Proof of Theorem 5.6

$\theta_{a,s}^{\textit{NP}}(\tau)$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, the pseudo true value defined in Assumption 12(ii).
$\hat{\theta}_{a,s}^{\textit{NP}}(\tau)$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, and $\tau\in\Upsilon$, the estimator of $\theta_{a,s}^{\textit{NP}}(\tau)$ in (5.17).

The proof strategy follows Belloni et al. (2017) and details are given here for completeness. We divide the proof into three steps. In the first step, we show

supτΥ||θ^a,sNP(τ)θa,sNP(τ)||2=Op(hnlog(n)/n).\sup_{\tau\in\Upsilon}||\hat{\theta}_{a,s}^{\textit{NP}}(\tau)-\theta_{a,s}^{\textit{NP}}(\tau)||_{2}=O_{p}(\sqrt{h_{n}\log(n)/n}).

In the second step, we establish Assumption 5. By a similar argument, we can establish Assumption 3(i). In the third step, we establish Assumptions 3(ii) and 3(iii).

Step 1. Let U^τ=θ^NPa,s(τ)θNPa,s(τ)\hat{U}_{\tau}=\hat{\theta}^{\textit{NP}}_{a,s}(\tau)-\theta^{\textit{NP}}_{a,s}(\tau),

Qn(τ,s,q,θ)\displaystyle Q_{n}(\tau,s,q,\theta) =1na(s)iIa(s)[1{Yiq}log(λ(Hhn(Xi)θa))+1{Yi>q}log(1λ(Hhn(Xi)θa))]\displaystyle=\frac{-1}{n_{a}(s)}\sum_{i\in I_{a}(s)}[1\{Y_{i}\leq q\}\log(\lambda(H_{h_{n}}^{\top}(X_{i})\theta_{a}))+1\{Y_{i}>q\}\log(1-\lambda(H_{h_{n}}^{\top}(X_{i})\theta_{a}))]
=1na(s)iIa(s)[log(1+exp(Hhn(Xi)θa))1{Yiq}Hhn(Xi)θa],\displaystyle=\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}[\log\left(1+\exp(H_{h_{n}}^{\top}(X_{i})\theta_{a})\right)-1\{Y_{i}\leq q\}H_{h_{n}}^{\top}(X_{i})\theta_{a}],

and for an arbitrary UτhnU_{\tau}\in\Re^{h_{n}},

i(t)=log(1+exp(Hhn(Xi)(θNPa,s(τ)+tUτ))).\displaystyle\ell_{i}(t)=\log(1+\exp(H_{h_{n}}^{\top}(X_{i})(\theta^{\textit{NP}}_{a,s}(\tau)+tU_{\tau}))).
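The second expression for $Q_{n}$ above rewrites the logistic log-likelihood using $\log\lambda(z)=z-\log(1+e^{z})$ and $\log(1-\lambda(z))=-\log(1+e^{z})$. This algebraic identity can be verified numerically (an illustrative sketch with arbitrary synthetic values):

```python
import numpy as np

rng = np.random.default_rng(6)

def lam(z):                                  # logistic CDF
    return 1.0 / (1.0 + np.exp(-z))

z = rng.normal(size=1000) * 3                # index H(X_i)' theta
d = (rng.random(1000) < 0.5).astype(float)   # indicator 1{Y_i <= q}

# negative log-likelihood, two algebraically equal forms
nll_a = -(d * np.log(lam(z)) + (1 - d) * np.log(1 - lam(z)))
nll_b = np.log1p(np.exp(z)) - d * z

max_gap = np.max(np.abs(nll_a - nll_b))
```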

Then, we have

U^τ=argminUτQn(τ,s,q^a(τ),θNPa,s(τ)+Uτ)Qn(τ,s,q^a(τ),θNPa,s(τ)),\displaystyle\hat{U}_{\tau}=\operatorname*{arg\,min}_{U_{\tau}}Q_{n}(\tau,s,\hat{q}_{a}(\tau),\theta^{\textit{NP}}_{a,s}(\tau)+U_{\tau})-Q_{n}(\tau,s,\hat{q}_{a}(\tau),\theta^{\textit{NP}}_{a,s}(\tau)),
tQn(τ,s,q^a(τ),θNPa,s(τ)+tUτ)|t=0=1na(s)iIa(s)(1{Yiq^a(τ)}λ(Hhn(Xi)θNPa,s(τ)))Hhn(Xi)Uτ,\displaystyle\partial_{t}Q_{n}(\tau,s,\hat{q}_{a}(\tau),\theta^{\textit{NP}}_{a,s}(\tau)+tU_{\tau})|_{t=0}=\frac{-1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\left(1\{Y_{i}\leq\hat{q}_{a}(\tau)\}-\lambda(H_{h_{n}}^{\top}(X_{i})\theta^{\textit{NP}}_{a,s}(\tau))\right)H_{h_{n}}^{\top}(X_{i})U_{\tau},

and

\displaystyle Q_{n}(\tau,s,\hat{q}_{a}(\tau),\theta^{\textit{NP}}_{a,s}(\tau)+U_{\tau})-Q_{n}(\tau,s,\hat{q}_{a}(\tau),\theta^{\textit{NP}}_{a,s}(\tau))-\partial_{t}Q_{n}(\tau,s,\hat{q}_{a}(\tau),\theta^{\textit{NP}}_{a,s}(\tau)+tU_{\tau})|_{t=0}
=\displaystyle= 1na(s)iIa(s)[i(1)i(0)i(0)].\displaystyle\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}\left[\ell_{i}(1)-\ell_{i}(0)-\ell_{i}^{\prime}(0)\right].

In addition

|i(t)||i(t)||Hhn(Xi)Uτ|.\displaystyle|\ell_{i}^{{}^{\prime\prime\prime}}(t)|\leq|\ell_{i}^{{}^{\prime\prime}}(t)||H_{h_{n}}^{\top}(X_{i})U_{\tau}|.

Therefore, there exists a constant c¯>0\underline{c}>0 such that

i(1)i(0)i(0)\displaystyle\ell_{i}(1)-\ell_{i}(0)-\ell^{\prime}_{i}(0)
i(0)(Hhn(Xi)Uτ)2[exp(|Hhn(Xi)Uτ|)+|Hhn(Xi)Uτ|1]\displaystyle\geq\frac{\ell_{i}^{{}^{\prime\prime}}(0)}{(H_{h_{n}}^{\top}(X_{i})U_{\tau})^{2}}\left[\exp(-|H_{h_{n}}^{\top}(X_{i})U_{\tau}|)+|H_{h_{n}}^{\top}(X_{i})U_{\tau}|-1\right]
\displaystyle=\lambda(H_{h_{n}}^{\top}(X_{i})\theta^{\textit{NP}}_{a,s}(\tau))(1-\lambda(H_{h_{n}}^{\top}(X_{i})\theta^{\textit{NP}}_{a,s}(\tau)))\left[\exp(-|H_{h_{n}}^{\top}(X_{i})U_{\tau}|)+|H_{h_{n}}^{\top}(X_{i})U_{\tau}|-1\right]
c¯[exp(|Hhn(Xi)Uτ|)+|Hhn(Xi)Uτ|1]\displaystyle\geq\underline{c}\left[\exp(-|H_{h_{n}}^{\top}(X_{i})U_{\tau}|)+|H_{h_{n}}^{\top}(X_{i})U_{\tau}|-1\right]
c¯((Hhn(Xi)Uτ)22|Hhn(Xi)Uτ|36),\displaystyle\geq\underline{c}\left(\frac{(H_{h_{n}}^{\top}(X_{i})U_{\tau})^{2}}{2}-\frac{|H_{h_{n}}^{\top}(X_{i})U_{\tau}|^{3}}{6}\right),

where the first inequality is due to Bach (2010, Lemma 1) and the third inequality holds because

ex+x1x22x36,x>0.\displaystyle e^{-x}+x-1\geq\frac{x^{2}}{2}-\frac{x^{3}}{6},\;x>0.
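As a purely illustrative numerical sanity check (not part of the proof), the two elementary inequalities invoked here, $e^{-x}+x-1\geq 0$ and $e^{-x}+x-1\geq x^{2}/2-x^{3}/6$ for $x>0$, can be verified on a grid:

```python
import math

# Check e^{-x} + x - 1 >= x^2/2 - x^3/6 and e^{-x} + x - 1 >= 0 on a
# grid over (0, 20]; the gap is smallest near x = 0, where it behaves
# like x^4/24.
xs = [k / 100.0 for k in range(1, 2001)]
worst_gap = min((math.exp(-x) + x - 1) - (x * x / 2 - x ** 3 / 6) for x in xs)
assert worst_gap >= 0.0
assert all(math.exp(-x) + x - 1 >= 0.0 for x in xs)
```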

To see the second inequality, note that ex+x10e^{-x}+x-1\geq 0 for x0x\geq 0 and by Assumption 12,

infa=0,1,s𝒮,τΥ,xSupp(X)λ(Hhn(x)θNPa,s(τ))\displaystyle\inf_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon,x\in\text{Supp}(X)}\lambda(H_{h_{n}}^{\top}(x)\theta^{\textit{NP}}_{a,s}(\tau))
=infa=0,1,s𝒮,τΥ,xSupp(X)((Yi(a)qa(τ)|Si=s,Xi=x)Ra(τ,s,x))c/2,\displaystyle=\inf_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon,x\in\text{Supp}(X)}(\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}=x)-R_{a}(\tau,s,x))\geq c/2,

and

supa=0,1,s𝒮,τΥ,xSupp(X)λ(Hhn(x)θNPa,s(τ))\displaystyle\sup_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon,x\in\text{Supp}(X)}\lambda(H_{h_{n}}^{\top}(x)\theta^{\textit{NP}}_{a,s}(\tau))
=\displaystyle= supa=0,1,s𝒮,τΥ,xSupp(X)((Yi(a)qa(τ)|Si=s,Xi=x)+Ra(τ,s,x))1c/2.\displaystyle\sup_{a=0,1,s\in\mathcal{S},\tau\in\Upsilon,x\in\text{Supp}(X)}(\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i}=s,X_{i}=x)+R_{a}(\tau,s,x))\leq 1-c/2.

This implies

infτΥλ(Hhn(Xi)θNPa,s(τ))(1Λ(Hhn(Xi)θNPa,s(τ)))c¯>0,\displaystyle\inf_{\tau\in\Upsilon}\lambda(H_{h_{n}}^{\top}(X_{i})\theta^{\textit{NP}}_{a,s}(\tau))(1-\Lambda(H_{h_{n}}^{\top}(X_{i})\theta^{\textit{NP}}_{a,s}(\tau)))\geq\underline{c}>0,

and thus,

Gn(Uτ)\displaystyle G_{n}(U_{\tau}) Qn(τ,s,q^a(τ),θNPa,s(τ)+Uτ)Qn(τ,s,q^a(τ),θNPa,s(τ))tQn(τ,s,q^a(τ),θNPa,s(τ)+tUτ)|t=0\displaystyle\equiv Q_{n}(\tau,s,\hat{q}_{a}(\tau),\theta^{\textit{NP}}_{a,s}(\tau)+U_{\tau})-Q_{n}(\tau,s,\hat{q}_{a}(\tau),\theta^{\textit{NP}}_{a,s}(\tau))-\partial_{t}Q_{n}^{\top}(\tau,s,\hat{q}_{a}(\tau),\theta^{\textit{NP}}_{a,s}(\tau)+tU_{\tau})|_{t=0}
c¯na(s)iIa(s)((Hhn(Xi)Uτ)22|Hhn(Xi)Uτ|36).\displaystyle\geq\frac{\underline{c}}{n_{a}(s)}\sum_{i\in I_{a}(s)}\left(\frac{(H_{h_{n}}^{\top}(X_{i})U_{\tau})^{2}}{2}-\frac{|H_{h_{n}}^{\top}(X_{i})U_{\tau}|^{3}}{6}\right).

Let

¯=infUhn[1na(s)iIa(s)(Hhn(Xi)U)2]3/21na(s)iIa(s)|Hhn(Xi)U|3.\displaystyle\overline{\ell}=\inf_{U\in\Re^{h_{n}}}\frac{\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})U)^{2}\right]^{3/2}}{\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}|H_{h_{n}}^{\top}(X_{i})U|^{3}}. (L.1)

If [1na(s)iIa(s)(Hhn(Xi)U^τ)2]1/2¯\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{1/2}\leq\overline{\ell}, then

\displaystyle\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}
\displaystyle=\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{-1/2}\frac{\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{3/2}}{\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}|H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau}|^{3}}\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}|H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau}|^{3}
\displaystyle\geq\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}|H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau}|^{3},

and thus

\displaystyle G_{n}(\hat{U}_{\tau})\geq\frac{\underline{c}}{n_{a}(s)}\sum_{i\in I_{a}(s)}\left(\frac{(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}}{2}-\frac{|H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau}|^{3}}{6}\right)\geq\frac{\underline{c}}{n_{a}(s)}\sum_{i\in I_{a}(s)}\frac{(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}}{3}.

On the other hand, if [1na(s)iIa(s)(Hhn(Xi)U^τ)2]1/2>¯\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{1/2}>\overline{\ell}, we can denote U¯τ=¯U^τ[1na(s)iIa(s)(Hhn(Xi)U^τ)2]1/2\overline{U}_{\tau}=\frac{\overline{\ell}\hat{U}_{\tau}}{\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{1/2}} such that

[1na(s)iIa(s)(Hhn(Xi)U¯τ)2]1/2¯.\displaystyle\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\overline{U}_{\tau})^{2}\right]^{1/2}\leq\overline{\ell}.
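The rescaling above is simply positive homogeneity of the quadratic seminorm: dividing $\hat{U}_{\tau}$ by its normalized quadratic norm and multiplying by $\overline{\ell}$ makes the constraint hold with equality. A minimal numerical sketch, in which the design $H$, the direction $\hat{U}_{\tau}$, and the threshold are hypothetical stand-ins:

```python
import math
import random

random.seed(0)
n, h = 50, 3
# Hypothetical regressors playing the role of H_{h_n}(X_i).
H = [[random.gauss(0.0, 1.0) for _ in range(h)] for _ in range(n)]
U_hat = [2.0, -1.0, 0.5]  # plays the role of \hat{U}_tau
ell_bar = 0.1             # plays the role of \bar{ell}

def quad_norm(U):
    """[ (1/n) sum_i (H_i' U)^2 ]^{1/2}; degree-1 homogeneous in U."""
    return math.sqrt(sum(sum(a * b for a, b in zip(Hi, U)) ** 2 for Hi in H) / n)

assert quad_norm(U_hat) > ell_bar  # the second case of the argument
U_bar = [ell_bar * u / quad_norm(U_hat) for u in U_hat]
# The rescaled direction satisfies the constraint with equality.
assert abs(quad_norm(U_bar) - ell_bar) < 1e-12
```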

Further, because Gn(Uτ)G_{n}(U_{\tau}) is convex in UτU_{\tau} we have

Gn(U^τ)\displaystyle G_{n}(\hat{U}_{\tau}) =Gn([1na(s)iIa(s)(Hhn(Xi)U^τ)2]1/2¯U¯τ)\displaystyle=G_{n}\left(\frac{\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{1/2}}{\overline{\ell}}\overline{U}_{\tau}\right)
[1na(s)iIa(s)(Hhn(Xi)U^τ)2]1/2¯Gn(U¯τ)\displaystyle\geq\frac{\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{1/2}}{\overline{\ell}}G_{n}(\overline{U}_{\tau})
[1na(s)iIa(s)(Hhn(Xi)U^τ)2]1/2¯c¯na(s)iIa(s)(Hhn(Xi)U¯τ)23\displaystyle\geq\frac{\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{1/2}}{\overline{\ell}}\frac{\underline{c}}{n_{a}(s)}\sum_{i\in I_{a}(s)}\frac{(H_{h_{n}}^{\top}(X_{i})\overline{U}_{\tau})^{2}}{3}
=c¯¯3[1na(s)iIa(s)(Hhn(Xi)U^τ)2]1/2.\displaystyle=\frac{\underline{c}\overline{\ell}}{3}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{1/2}.

Therefore, for some constant c¯\overline{c} that only depends on c¯\underline{c} and κ1\kappa_{1}, we have

Gn(U^τ)\displaystyle G_{n}(\hat{U}_{\tau}) min(c¯na(s)iIa(s)(Hhn(Xi)U^τ)23,c¯¯3[1na(s)iIa(s)(Hhn(Xi)U^τ)2]1/2)\displaystyle\geq\min\left(\frac{\underline{c}}{n_{a}(s)}\sum_{i\in I_{a}(s)}\frac{(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}}{3},\frac{\underline{c}\overline{\ell}}{3}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{1/2}\right)
c¯3min(||U^τ||22,¯||U^τ||2).\displaystyle\geq\frac{\overline{c}}{3}\min(||\hat{U}_{\tau}||_{2}^{2},\overline{\ell}||\hat{U}_{\tau}||_{2}). (L.2)

In addition, by construction,

Gn(U^τ)\displaystyle G_{n}(\hat{U}_{\tau}) |tQn(τ,s,q^a(τ),θNPa,s(τ)+tUτ)|t=0|\displaystyle\leq|\partial_{t}Q_{n}^{\top}(\tau,s,\hat{q}_{a}(\tau),\theta^{\textit{NP}}_{a,s}(\tau)+tU_{\tau})|_{t=0}|
=|1na(s)iIa(s)(1{Yiq^a(τ)}λ(Hhn(Xi)θa,sNP(τ)))Hhn(Xi)U^τ|\displaystyle=\left|\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(1\{Y_{i}\leq\hat{q}_{a}(\tau)\}-\lambda(H_{h_{n}}^{\top}(X_{i})\theta_{a,s}^{\textit{NP}}(\tau)))H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau}\right|
1na(s)iIa(s)(1{Yiq^a(τ)}ma(τ,s,Xi))Hhn(Xi)U^τ1\displaystyle\leq\left\|\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(1\{Y_{i}\leq\hat{q}_{a}(\tau)\}-m_{a}(\tau,s,X_{i}))H_{h_{n}}(X_{i})\right\|_{\infty}\left\|\hat{U}_{\tau}\right\|_{1}
\displaystyle+\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}R_{a}^{2}(\tau,s,X_{i})\right]^{1/2}\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})\hat{U}_{\tau})^{2}\right]^{1/2}
hn1/2na(s)iIa(s)(1{Yiq^a(τ)}ma(τ,s,Xi))Hhn(Xi)U^τ2\displaystyle\leq\left\|\frac{h_{n}^{1/2}}{n_{a}(s)}\sum_{i\in I_{a}(s)}(1\{Y_{i}\leq\hat{q}_{a}(\tau)\}-m_{a}(\tau,s,X_{i}))H_{h_{n}}(X_{i})\right\|_{\infty}\left\|\hat{U}_{\tau}\right\|_{2}
+[κ2na(s)iIa(s)Ra2(τ,s,Xi)]1/2||U^τ||2.\displaystyle+\left[\frac{\kappa_{2}}{n_{a}(s)}\sum_{i\in I_{a}(s)}R_{a}^{2}(\tau,s,X_{i})\right]^{1/2}||\hat{U}_{\tau}||_{2}. (L.3)

Combining (L.2) and (L.3), we have

c¯3min(||U^τ||2,¯)hn1/2na(s)iIa(s)(1{Yiq^a(τ)}ma(τ,s,Xi))Hhn(Xi)+[κ2na(s)iIa(s)Ra2(τ,s,Xi)]1/2.\displaystyle\frac{\overline{c}}{3}\min(||\hat{U}_{\tau}||_{2},\overline{\ell})\leq\left\|\frac{h_{n}^{1/2}}{n_{a}(s)}\sum_{i\in I_{a}(s)}(1\{Y_{i}\leq\hat{q}_{a}(\tau)\}-m_{a}(\tau,s,X_{i}))H_{h_{n}}(X_{i})\right\|_{\infty}+\left[\frac{\kappa_{2}}{n_{a}(s)}\sum_{i\in I_{a}(s)}R_{a}^{2}(\tau,s,X_{i})\right]^{1/2}.

Taking supτΥ\sup_{\tau\in\Upsilon} on both sides, we have

c¯3min(supτΥ||U^τ||2,¯)\displaystyle\frac{\overline{c}}{3}\min(\sup_{\tau\in\Upsilon}||\hat{U}_{\tau}||_{2},\overline{\ell})
supτΥhn1/2na(s)iIa(s)(1{Yiq^a(τ)}ma(τ,s,Xi))Hhn(Xi)+supτΥ[κ2na(s)iIa(s)Ra2(τ,s,Xi)]1/2\displaystyle\leq\sup_{\tau\in\Upsilon}\left\|\frac{h_{n}^{1/2}}{n_{a}(s)}\sum_{i\in I_{a}(s)}(1\{Y_{i}\leq\hat{q}_{a}(\tau)\}-m_{a}(\tau,s,X_{i}))H_{h_{n}}(X_{i})\right\|_{\infty}+\sup_{\tau\in\Upsilon}\left[\frac{\kappa_{2}}{n_{a}(s)}\sum_{i\in I_{a}(s)}R_{a}^{2}(\tau,s,X_{i})\right]^{1/2}
=Op(hnlog(n)n),\displaystyle=O_{p}(\sqrt{\frac{h_{n}\log(n)}{n}}),

where the last line holds due to Assumption 12 and Lemma N.6. Finally, Lemma N.7 shows that ¯/hnlog(n)n\overline{\ell}/\sqrt{\frac{h_{n}\log(n)}{n}}\rightarrow\infty, which implies

supτΥ||U^τ||2=Op(hnlog(n)n).\displaystyle\sup_{\tau\in\Upsilon}||\hat{U}_{\tau}||_{2}=O_{p}\left(\sqrt{\frac{h_{n}\log(n)}{n}}\right).

Step 2. Recall

Δ¯a(τ,s,Xi)\displaystyle\overline{\Delta}_{a}(\tau,s,X_{i}) =m^a(τ,s,Xi)m¯a(τ,s,Xi)\displaystyle=\widehat{m}_{a}(\tau,s,X_{i})-\overline{m}_{a}(\tau,s,X_{i})
=(Yi(a)qa(τ)|Xi,Si=s)λ(Hhn(Xi)θ^NPa,s(τ))\displaystyle=\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|X_{i},S_{i}=s)-\lambda(H_{h_{n}}^{\top}(X_{i})\hat{\theta}^{\textit{NP}}_{a,s}(\tau))
=λ(Hhn(Xi)θNPa,s(τ))λ(Hhn(Xi)θ^NPa,s(τ))+Ra(τ,s,Xi),\displaystyle=\lambda(H_{h_{n}}^{\top}(X_{i})\theta^{\textit{NP}}_{a,s}(\tau))-\lambda(H_{h_{n}}^{\top}(X_{i})\hat{\theta}^{\textit{NP}}_{a,s}(\tau))+R_{a}(\tau,s,X_{i}),

and that $\{X_{i}^{s},\xi^{s}_{i}\}_{i\in[n]}$ is an i.i.d. sequence drawn from the joint distribution of $(X_{i},\xi_{i})$ given $S_{i}=s$, and is therefore independent of $\{A_{i},S_{i}\}_{i\in[n]}$. Let

H(θ1,θ2)=𝔼[λ(Hhn(Xi)θ1)λ(Hhn(Xi)θ2)|Si=s]=𝔼[λ(Hhn(Xis)θ1)λ(Hhn(Xis)θ2)].H(\theta_{1},\theta_{2})=\mathbb{E}[\lambda(H_{h_{n}}^{\top}(X_{i})\theta_{1})-\lambda(H_{h_{n}}^{\top}(X_{i})\theta_{2})|S_{i}=s]=\mathbb{E}[\lambda(H_{h_{n}}^{\top}(X_{i}^{s})\theta_{1})-\lambda(H_{h_{n}}^{\top}(X_{i}^{s})\theta_{2})].

We have

supτΥ,s𝒮|iI1(s)ξiΔ¯1(τ,s,Xi)n1w(s)iI0(s)ξiΔ¯1(τ,s,Xi)n0w(s)|\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\biggl{|}\frac{\sum_{i\in I_{1}(s)}\xi_{i}\overline{\Delta}_{1}(\tau,s,X_{i})}{n_{1}^{w}(s)}-\frac{\sum_{i\in I_{0}(s)}\xi_{i}\overline{\Delta}_{1}(\tau,s,X_{i})}{n_{0}^{w}(s)}\biggr{|}
supτΥ,s𝒮|iI1(s)ξi[Δ¯1(τ,s,Xi)H(θ1,sNP(τ),θ^NP1,s(τ))𝔼(R1(τ,s,Xi)|Si=s)]n1w(s)|\displaystyle\leq\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{1}(\tau,s,X_{i})-H(\theta_{1,s}^{\textit{NP}}(\tau),\hat{\theta}^{\textit{NP}}_{1,s}(\tau))-\mathbb{E}(R_{1}(\tau,s,X_{i})|S_{i}=s)]}{n_{1}^{w}(s)}\right|
+supτΥ,s𝒮|iI0(s)ξi[Δ¯1(τ,s,Xi)H(θNP1,s(τ),θ^NP1,s(τ))𝔼(R1(τ,s,Xi)|Si=s)]n0w(s)|\displaystyle+\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{0}(s)}\xi_{i}[\overline{\Delta}_{1}(\tau,s,X_{i})-H(\theta^{\textit{NP}}_{1,s}(\tau),\hat{\theta}^{\textit{NP}}_{1,s}(\tau))-\mathbb{E}(R_{1}(\tau,s,X_{i})|S_{i}=s)]}{n_{0}^{w}(s)}\right| (L.4)

We aim to bound the first term on the RHS of (L.4). Note for any ε>0\varepsilon>0, there exists a constant M>0M>0 such that

(supτΥ||θ^a,sNP(τ)θa,sNP(τ)||2Mhnlog(n)/n)1ε.\displaystyle\mathbb{P}\left(\sup_{\tau\in\Upsilon}||\hat{\theta}_{a,s}^{\textit{NP}}(\tau)-\theta_{a,s}^{\textit{NP}}(\tau)||_{2}\leq M\sqrt{h_{n}\log(n)/n}\right)\geq 1-\varepsilon.

On the set 𝒜(ε)={supτΥ||θ^a,sNP(τ)θa,sNP(τ)||2Mhnlog(n)/n}\mathcal{A}(\varepsilon)=\{\sup_{\tau\in\Upsilon}||\hat{\theta}_{a,s}^{\textit{NP}}(\tau)-\theta_{a,s}^{\textit{NP}}(\tau)||_{2}\leq M\sqrt{h_{n}\log(n)/n}\}, we have

supτΥ,s𝒮|iI1(s)ξi[Δ¯1(τ,s,Xi)H(θ1,sNP(τ),θ^NP1,s(τ))𝔼(R1(τ,s,Xi)|Si=s)]n1w(s)|\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{1}(\tau,s,X_{i})-H(\theta_{1,s}^{\textit{NP}}(\tau),\hat{\theta}^{\textit{NP}}_{1,s}(\tau))-\mathbb{E}(R_{1}(\tau,s,X_{i})|S_{i}=s)]}{n_{1}^{w}(s)}\right|
supτΥ,s𝒮|iI1(s)ξi[λ(Hhn(Xi)θ1,sNP(τ))λ(Hhn(Xi)θ^1,sNP(τ))H(θ1,sNP(τ),θ^NP1,s(τ))]n1(s)|\displaystyle\leq\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\lambda(H_{h_{n}}^{\top}(X_{i})\theta_{1,s}^{\textit{NP}}(\tau))-\lambda(H_{h_{n}}^{\top}(X_{i})\hat{\theta}_{1,s}^{\textit{NP}}(\tau))-H(\theta_{1,s}^{\textit{NP}}(\tau),\hat{\theta}^{\textit{NP}}_{1,s}(\tau))]}{n_{1}(s)}\right|
+supτΥ,s𝒮|iI1(s)ξi[R1(τ,s,Xi)𝔼(R1(τ,s,Xi)|Si=s)]n1w(s)|\displaystyle+\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}\left[R_{1}(\tau,s,X_{i})-\mathbb{E}(R_{1}(\tau,s,X_{i})|S_{i}=s)\right]}{n_{1}^{w}(s)}\right|
n1(s)n1w(s)[sups𝒮,θ1,θ2hn,||θ1θ2||2Mhnlog(n)/n|iI1(s)ξi[λ(Hhn(Xi)θ1)λ(Hhn(Xi)θ2)H(θ1,θ2)]n1(s)|\displaystyle\leq\frac{n_{1}(s)}{n_{1}^{w}(s)}\biggl{[}\sup_{s\in\mathcal{S},\theta_{1},\theta_{2}\in\Re^{h_{n}},||\theta_{1}-\theta_{2}||_{2}\leq M\sqrt{h_{n}\log(n)/n}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\lambda(H_{h_{n}}^{\top}(X_{i})\theta_{1})-\lambda(H_{h_{n}}^{\top}(X_{i})\theta_{2})-H(\theta_{1},\theta_{2})]}{n_{1}(s)}\right|
+supτΥ,s𝒮|iI1(s)ξi[R1(τ,s,Xi)𝔼(R1(τ,s,Xi)|Si=s)]n1(s)|]\displaystyle+\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}\left[R_{1}(\tau,s,X_{i})-\mathbb{E}(R_{1}(\tau,s,X_{i})|S_{i}=s)\right]}{n_{1}(s)}\right|\biggr{]}
n1(s)n1w(s)(D1+D2).\displaystyle\equiv\frac{n_{1}(s)}{n_{1}^{w}(s)}(D_{1}+D_{2}).

For D1D_{1}, we have

D1|{Ai,Si}i[n]\displaystyle D_{1}|\{A_{i},S_{i}\}_{i\in[n]}
\displaystyle\stackrel{{\scriptstyle d}}{{=}}\sup\left|\frac{\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}\xi_{i}^{s}[\lambda(H_{h_{n}}^{\top}(X^{s}_{i})\theta_{1})-\lambda(H_{h_{n}}^{\top}(X^{s}_{i})\theta_{2})-H(\theta_{1},\theta_{2})]}{n_{1}(s)}\right||\{A_{i},S_{i}\}_{i\in[n]}
=d||n1(s)|||{Ai,Si}i[n],\displaystyle\stackrel{{\scriptstyle d}}{{=}}||\mathbb{P}_{n_{1}(s)}-\mathbb{P}||_{\mathcal{F}}|\{A_{i},S_{i}\}_{i\in[n]},

where the supremum in the first equality is taken over {s𝒮,θ1,θ2hn,||θ1θ2||2Mhnlog(n)/n}\{s\in\mathcal{S},\theta_{1},\theta_{2}\in\Re^{h_{n}},||\theta_{1}-\theta_{2}||_{2}\leq M\sqrt{h_{n}\log(n)/n}\} and

={ξis[λ(Hhn(Xsi)θ1)λ(Hhn(Xsi)θ2)H(θ1,θ2)]:s𝒮,θ1,θ2hn,||θ1θ2||2Mhnlog(n)/n}\mathcal{F}=\begin{Bmatrix}&\xi_{i}^{s}[\lambda(H_{h_{n}}^{\top}(X^{s}_{i})\theta_{1})-\lambda(H_{h_{n}}^{\top}(X^{s}_{i})\theta_{2})-H(\theta_{1},\theta_{2})]:\\ &s\in\mathcal{S},\theta_{1},\theta_{2}\in\Re^{h_{n}},||\theta_{1}-\theta_{2}||_{2}\leq M\sqrt{h_{n}\log(n)/n}\end{Bmatrix}

with the envelope F=2ξisF=2\xi_{i}^{s}. We further note that ||maxi[n]2ξis||,2Clog(n)||\max_{i\in[n]}2\xi_{i}^{s}||_{\mathbb{P},2}\leq C\log(n),

supf𝔼f2sup𝔼(Hhn(Xsi)(θ1θ2))2κ2M2hnlog(n)/n,\displaystyle\sup_{f\in\mathcal{F}}\mathbb{E}f^{2}\leq\sup\mathbb{E}(H_{h_{n}}^{\top}(X^{s}_{i})(\theta_{1}-\theta_{2}))^{2}\leq\kappa_{2}M^{2}h_{n}\log(n)/n,

and

supQN(,eQ,ε||F||Q,2)(aε)chn,\displaystyle\sup_{Q}N(\mathcal{F},e_{Q},\varepsilon||F||_{Q,2})\leq\left(\frac{a}{\varepsilon}\right)^{ch_{n}},

where a,ca,c are two fixed constants. Therefore, by Chernozhukov et al. (2014, Corollary 5.1),

𝔼[||n1(s)|||{Ai,Si}i[n]]=Op(hnlog(n)/n+hnlog2(n)n)=op(n1/2),\displaystyle\mathbb{E}\left[||\mathbb{P}_{n_{1}(s)}-\mathbb{P}||_{\mathcal{F}}|\{A_{i},S_{i}\}_{i\in[n]}\right]=O_{p}\left(h_{n}\log(n)/n+\frac{h_{n}\log^{2}(n)}{n}\right)=o_{p}(n^{-1/2}),

which implies D1=op(n1/2)D_{1}=o_{p}(n^{-1/2}).
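For completeness, the rate calculation behind this display can be sketched by plugging the bounds recorded above into the maximal inequality (schematically, suppressing constants, with $v_{n}\asymp h_{n}$, $\sigma_{n}^{2}\asymp h_{n}\log(n)/n$, $||\max_{i\in[n]}F_{i}||_{\mathbb{P},2}\lesssim\log(n)$, and the logarithmic factor of order $\log(n)$):

```latex
\mathbb{E}\big[\|\mathbb{P}_{n_{1}(s)}-\mathbb{P}\|_{\mathcal{F}}\big]
\lesssim \sqrt{\frac{v_{n}\sigma_{n}^{2}\log(n)}{n}}
 +\frac{v_{n}\|\max_{i\in[n]}F_{i}\|_{\mathbb{P},2}\log(n)}{n}
\lesssim \sqrt{\frac{h_{n}\cdot\frac{h_{n}\log(n)}{n}\cdot\log(n)}{n}}
 +\frac{h_{n}\log^{2}(n)}{n}
=\frac{h_{n}\log(n)}{n}+\frac{h_{n}\log^{2}(n)}{n},
```

which is $o(n^{-1/2})$ provided $h_{n}\log^{2}(n)/\sqrt{n}\rightarrow 0$, as the maintained rate conditions ensure.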

Similarly, we have

\displaystyle D_{2}|\{A_{i},S_{i}\}_{i\in[n]}\stackrel{{\scriptstyle d}}{{=}}\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}\xi^{s}_{i}\left[R_{1}(\tau,s,X^{s}_{i})-\mathbb{E}(R_{1}(\tau,s,X^{s}_{i}))\right]}{n_{1}(s)}\right||\{A_{i},S_{i}\}_{i\in[n]}
\displaystyle=||\mathbb{P}_{n_{1}(s)}-\mathbb{P}||_{\mathcal{F}}|\{A_{i},S_{i}\}_{i\in[n]},

where ={ξis[τm1(τ,s,Xis)λ(Hhn(Xsi)θ1,sNP(τ))]:τΥ}\mathcal{F}=\{\xi_{i}^{s}[\tau-m_{1}(\tau,s,X_{i}^{s})-\lambda(H_{h_{n}}^{\top}(X^{s}_{i})\theta_{1,s}^{\textit{NP}}(\tau))]:\tau\in\Upsilon\} with an envelope F=ξisF=\xi_{i}^{s}. In addition, we note \mathcal{F} is nested in

~={ξis[τm1(τ,s,Xis)λ(Hhn(Xsi)θ1)]:τΥ,θ1hn},\displaystyle\widetilde{\mathcal{F}}=\{\xi_{i}^{s}[\tau-m_{1}(\tau,s,X_{i}^{s})-\lambda(H_{h_{n}}^{\top}(X^{s}_{i})\theta_{1})]:\tau\in\Upsilon,\theta_{1}\in\Re^{h_{n}}\},

so that

supQN(,eQ,ε||F||Q,2)supQN(~,eQ,ε||F||Q,2)(aε)chn.\displaystyle\sup_{Q}N(\mathcal{F},e_{Q},\varepsilon||F||_{Q,2})\leq\sup_{Q}N(\widetilde{\mathcal{F}},e_{Q},\varepsilon||F||_{Q,2})\leq\left(\frac{a}{\varepsilon}\right)^{ch_{n}}.

Last,

supf𝔼f2=supτΥ,a=0,1,s𝒮𝔼R12(τ,s,Xis)=O(hnlog(n)/n).\displaystyle\sup_{f\in\mathcal{F}}\mathbb{E}f^{2}=\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\mathbb{E}R_{1}^{2}(\tau,s,X_{i}^{s})=O(h_{n}\log(n)/n).

Then, by Chernozhukov et al. (2014, Corollary 5.1),

𝔼[||n1(s)|||{Ai,Si}i[n]]=Op(hnlog(n)/n+hnlog2(n)n)=op(n1/2),\displaystyle\mathbb{E}\left[||\mathbb{P}_{n_{1}(s)}-\mathbb{P}||_{\mathcal{F}}|\{A_{i},S_{i}\}_{i\in[n]}\right]=O_{p}\left(h_{n}\log(n)/n+\frac{h_{n}\log^{2}(n)}{n}\right)=o_{p}(n^{-1/2}),

which implies D2=op(n1/2)D_{2}=o_{p}(n^{-1/2}). This leads to (L.4).

Step 3. Note |ma(τ1,s,Xi)|1|m_{a}(\tau_{1},s,X_{i})|\leq 1 and

|ma(τ1,s,Xi)ma(τ2,s,Xi)|\displaystyle|m_{a}(\tau_{1},s,X_{i})-m_{a}(\tau_{2},s,X_{i})|
\displaystyle\leq |τ1τ2|+|(Yi(a)qa(τ1)|Xi,Si=s)(Yi(a)qa(τ2)|Xi,Si=s)|\displaystyle|\tau_{1}-\tau_{2}|+|\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau_{1})|X_{i},S_{i}=s)-\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau_{2})|X_{i},S_{i}=s)|
\displaystyle\leq (1+supyfa(y|Xi,Si=s)infτΥfa(qa(τ)))|τ1τ2|\displaystyle\left(1+\frac{\sup_{y}f_{a}(y|X_{i},S_{i}=s)}{\inf_{\tau\in\Upsilon}f_{a}(q_{a}(\tau))}\right)|\tau_{1}-\tau_{2}|
\displaystyle\leq C|τ1τ2|.\displaystyle C|\tau_{1}-\tau_{2}|.

This implies Assumptions 3(ii) and 3(iii).
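The Lipschitz property of $\tau\mapsto m_{a}(\tau,s,X_{i})$ used in Step 3 can also be checked numerically in a toy model; everything below (the model $Y(a)=X+\varepsilon$ with standard normal $X$ and $\varepsilon$, and the grid over $\Upsilon=[0.25,0.75]$) is a purely illustrative assumption:

```python
import math
from statistics import NormalDist

# Toy model: marginally Y(a) ~ N(0, 2); conditionally on X = x, Y(a) ~ N(x, 1).
margY = NormalDist(0.0, math.sqrt(2.0))
q = margY.inv_cdf                                  # tau -> q_a(tau)
m = lambda tau, x: NormalDist(x, 1.0).cdf(q(tau))  # m_a(tau, s, x)

taus = [0.25 + 0.01 * k for k in range(51)]        # grid over Upsilon
sup_cond_dens = 1.0 / math.sqrt(2.0 * math.pi)     # sup_y f_a(y | x)
inf_marg_dens = min(margY.pdf(q(t)) for t in taus) # inf_tau f_a(q_a(tau))
C = 1.0 + sup_cond_dens / inf_marg_dens            # the constant in the bound

for x in (-2.0, 0.0, 1.5):
    assert all(abs(m(t1, x) - m(t2, x)) <= C * abs(t1 - t2) + 1e-12
               for t1 in taus for t2 in taus)
```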

Appendix M Proof of Theorem A.1

Notation used in this appendix:

$H_{p_{n}}(X_{i})$: high-dimensional regressor constructed based on $X_{i}$ with dimension $p_{n}$.

$\theta_{a,s}^{\textit{HD}}(q)$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, and $q\in\mathcal{Q}_{a}^{\varepsilon}$, the pseudo-true value defined in Assumption 13(i).

$\hat{\theta}_{a,s}^{\textit{HD}}(q)$: for $a\in\{0,1\}$, $s\in\mathcal{S}$, and $q\in\mathcal{Q}_{a}^{\varepsilon}$, the estimator of $\theta_{a,s}^{\textit{HD}}(q)$ in (A.1).

$\varrho_{n,a}(s)$: Lasso penalty defined after (A.1).

$\hat{\Omega}$: Lasso penalty loading matrix defined after (A.1).

$\mathcal{M}_{a}(q,s,x)$: for $a\in\{0,1\}$, $q\in\Re$, $s\in\mathcal{S}$, and $x\in\text{Supp}(X)$, $\mathcal{M}_{a}(q,s,x)=\mathbb{P}(Y_{i}(a)\leq q|S_{i}=s,X_{i}=x)$.

We focus on the case with a=1a=1. Note

\displaystyle\{X_{i},Y_{i}(1)\}_{i\in I_{1}(s)}|\{A_{i},S_{i}\}_{i\in[n]}\stackrel{{\scriptstyle d}}{{=}}\{X_{i}^{s},Y_{i}^{s}(1)\}_{i=N(s)+1}^{N(s)+n_{1}(s)}|\{A_{i},S_{i}\}_{i\in[n]},

where $\{X_{i}^{s},Y_{i}^{s}(1)\}_{i\in[n]}$ is an i.i.d. sequence that is independent of $\{A_{i},S_{i}\}_{i\in[n]}$. Therefore,

\displaystyle\hat{\theta}_{1,s}^{\textit{HD}}(q)|\{A_{i},S_{i}\}_{i\in[n]}\stackrel{{\scriptstyle d}}{{=}}\operatorname*{arg\,min}_{\theta_{a}}\frac{-1}{n_{1}(s)}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}\biggl{[}1\{Y_{i}^{s}(1)\leq q\}\log(\lambda(H_{p_{n}}^{\top}(X_{i}^{s})\theta_{a}))
+1{Yis(1)>q}log(1λ(Hpn(Xsi)θa))]+ϱn,1(s)n1(s)||Ω^θa||1|{Ai,Si}i[n],\displaystyle+1\{Y_{i}^{s}(1)>q\}\log(1-\lambda(H_{p_{n}}^{\top}(X^{s}_{i})\theta_{a}))\biggr{]}+\frac{\varrho_{n,1}(s)}{n_{1}(s)}||\hat{\Omega}\theta_{a}||_{1}\biggl{|}\{A_{i},S_{i}\}_{i\in[n]},

and Assumption 13(vi) implies

0<κ1\displaystyle 0<\kappa_{1}\leq infa=0,1,s𝒮,|v|0hnnvT(1n1(s)i=N(s)+1N(s)+n1(s)Hpn(Xsi)Hpn(Xsi))v||v||22\displaystyle\inf_{a=0,1,s\in\mathcal{S},|v|_{0}\leq h_{n}\ell_{n}}\frac{v^{T}\left(\frac{1}{n_{1}(s)}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}H_{p_{n}}(X^{s}_{i})H_{p_{n}}^{\top}(X^{s}_{i})\right)v}{||v||_{2}^{2}}
\displaystyle\leq supa=0,1,s𝒮,|v|0hnnvT(1n1(s)i=N(s)+1N(s)+n1(s)Hpn(Xsi)Hpn(Xsi))v||v||22κ2<,\displaystyle\sup_{a=0,1,s\in\mathcal{S},|v|_{0}\leq h_{n}\ell_{n}}\frac{v^{T}\left(\frac{1}{n_{1}(s)}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}H_{p_{n}}(X^{s}_{i})H_{p_{n}}^{\top}(X^{s}_{i})\right)v}{||v||_{2}^{2}}\leq\kappa_{2}<\infty,

and

0<κ1\displaystyle 0<\kappa_{1}\leq infa=0,1,s𝒮,|v|0hnnvT𝔼(Hpn(Xsi)Hpn(Xsi))v||v||22\displaystyle\inf_{a=0,1,s\in\mathcal{S},|v|_{0}\leq h_{n}\ell_{n}}\frac{v^{T}\mathbb{E}(H_{p_{n}}(X^{s}_{i})H_{p_{n}}^{\top}(X^{s}_{i}))v}{||v||_{2}^{2}}
\displaystyle\leq supa=0,1,s𝒮,|v|0hnnvT𝔼(Hpn(Xsi)Hpn(Xsi))v||v||22κ2<.\displaystyle\sup_{a=0,1,s\in\mathcal{S},|v|_{0}\leq h_{n}\ell_{n}}\frac{v^{T}\mathbb{E}(H_{p_{n}}(X^{s}_{i})H_{p_{n}}^{\top}(X^{s}_{i}))v}{||v||_{2}^{2}}\leq\kappa_{2}<\infty.

In addition, we have n1(s)/na.s.π(s)p(s)>0n_{1}(s)/n\stackrel{{\scriptstyle a.s.}}{{\rightarrow}}\pi(s)p(s)>0. Therefore, based on the results established by Belloni et al. (2017), we have, conditionally on {Ai,Si}i[n]\{A_{i},S_{i}\}_{i\in[n]}, and thus, unconditionally,

supa=0,1,q𝒬εa,s𝒮||θ^a,sHD(q)θa,sHD(q)||2=Op(hnlog(pn)n),\displaystyle\sup_{a=0,1,q\in\mathcal{Q}^{\varepsilon}_{a},s\in\mathcal{S}}||\hat{\theta}_{a,s}^{\textit{HD}}(q)-\theta_{a,s}^{\textit{HD}}(q)||_{2}=O_{p}\left(\sqrt{\frac{h_{n}\log(p_{n})}{n}}\right),
supa=0,1,q𝒬εa,s𝒮||θ^a,spost(q)θa,sHD(q)||2=Op(hnlog(pn)n),\displaystyle\sup_{a=0,1,q\in\mathcal{Q}^{\varepsilon}_{a},s\in\mathcal{S}}||\hat{\theta}_{a,s}^{post}(q)-\theta_{a,s}^{\textit{HD}}(q)||_{2}=O_{p}\left(\sqrt{\frac{h_{n}\log(p_{n})}{n}}\right),
\displaystyle\sup_{a=0,1,q\in\mathcal{Q}^{\varepsilon}_{a},s\in\mathcal{S}}||\hat{\theta}_{a,s}^{\textit{HD}}(q)||_{0}=O_{p}(h_{n}),

and

\displaystyle\sup_{a=0,1,q\in\mathcal{Q}^{\varepsilon}_{a},s\in\mathcal{S}}||\hat{\theta}_{a,s}^{post}(q)||_{0}=O_{p}(h_{n}).

In the following, we prove the results when θ^a,sHD(q)\hat{\theta}_{a,s}^{\textit{HD}}(q) is used. The results corresponding to θ^a,spost(q)\hat{\theta}_{a,s}^{post}(q) can be proved in the same manner and are therefore omitted. Recall

Δ¯1(τ,s,Xi)=m^1(τ,s,Xi)m¯1(τ,s,Xi)\displaystyle\overline{\Delta}_{1}(\tau,s,X_{i})=\widehat{m}_{1}(\tau,s,X_{i})-\overline{m}_{1}(\tau,s,X_{i})
=(Yi(1)q1(τ)|Xi,Si=s)λ(Hpn(Xi)θ^HD1,s(q^1(τ)))\displaystyle=\mathbb{P}(Y_{i}(1)\leq q_{1}(\tau)|X_{i},S_{i}=s)-\lambda(H_{p_{n}}(X_{i})^{\top}\hat{\theta}^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)))
=[1(q1(τ),s,Xi)1(q^1(τ),s,Xi)+ra(q^1(τ),s,Xi)]\displaystyle=\left[\mathcal{M}_{1}(q_{1}(\tau),s,X_{i})-\mathcal{M}_{1}(\hat{q}_{1}(\tau),s,X_{i})+r_{a}(\hat{q}_{1}(\tau),s,X_{i})\right]
+[λ(Hpn(Xi)θHD1,s(q^1(τ)))λ(Hpn(Xi)θ^HD1,s(q^1(τ)))]\displaystyle+\left[\lambda(H_{p_{n}}(X_{i})^{\top}\theta^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)))-\lambda(H_{p_{n}}(X_{i})^{\top}\hat{\theta}^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)))\right]
a,s(q1(τ),q^1(τ),Xi)+λ(Hpn(Xi)θHD1,s(q^1(τ)))λ(Hpn(Xi)θ^HD1,s(q^1(τ))),\displaystyle\equiv\mathcal{R}_{a,s}(q_{1}(\tau),\hat{q}_{1}(\tau),X_{i})+\lambda(H_{p_{n}}(X_{i})^{\top}\theta^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)))-\lambda(H_{p_{n}}(X_{i})^{\top}\hat{\theta}^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau))),

where

a,s(q,q,Xi)=1(q,s,Xi)1(q,s,Xi)+ra(q,s,Xi).\displaystyle\mathcal{R}_{a,s}(q,q^{\prime},X_{i})=\mathcal{M}_{1}(q,s,X_{i})-\mathcal{M}_{1}(q^{\prime},s,X_{i})+r_{a}(q^{\prime},s,X_{i}).

Let

Hλ(θ1,θ2,s)=𝔼[λ(Hpn(Xi)θ1)λ(Hpn(Xi)θ2)|Si=s],H_{\lambda}(\theta_{1},\theta_{2},s)=\mathbb{E}[\lambda(H_{p_{n}}(X_{i})^{\top}\theta_{1})-\lambda(H_{p_{n}}(X_{i})^{\top}\theta_{2})|S_{i}=s],

and

HR(q,q,s)=𝔼(a,s(q,q,Xi)|Si=s).\displaystyle H_{R}(q,q^{\prime},s)=\mathbb{E}(\mathcal{R}_{a,s}(q,q^{\prime},X_{i})|S_{i}=s).

Then, we have

supτΥ,s𝒮|iI1(s)ξiΔ¯1(τ,s,Xi)n1w(s)iI0(s)ξiΔ¯1(τ,s,Xi)n0w(s)|\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\biggl{|}\frac{\sum_{i\in I_{1}(s)}\xi_{i}\overline{\Delta}_{1}(\tau,s,X_{i})}{n_{1}^{w}(s)}-\frac{\sum_{i\in I_{0}(s)}\xi_{i}\overline{\Delta}_{1}(\tau,s,X_{i})}{n_{0}^{w}(s)}\biggr{|}
supτΥ,s𝒮|iI1(s)ξi[Δ¯1(τ,s,Xi)Hλ(θ^HD1,s(q^1(τ)),θHD1,s(q^1(τ)),s)HR(q1(τ),q^1(τ),s)]n1w(s)|\displaystyle\leq\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{1}(\tau,s,X_{i})-H_{\lambda}(\hat{\theta}^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)),\theta^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)),s)-H_{R}(q_{1}(\tau),\hat{q}_{1}(\tau),s)]}{n_{1}^{w}(s)}\right|
+supτΥ,s𝒮|iI0(s)ξi[Δ¯1(τ,s,Xi)Hλ(θ^HD1,s(q^1(τ)),θHD1,s(q^1(τ)),s)HR(q1(τ),q^1(τ),s)]n0w(s)|\displaystyle+\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{0}(s)}\xi_{i}[\overline{\Delta}_{1}(\tau,s,X_{i})-H_{\lambda}(\hat{\theta}^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)),\theta^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)),s)-H_{R}(q_{1}(\tau),\hat{q}_{1}(\tau),s)]}{n_{0}^{w}(s)}\right| (M.1)

We aim to bound the first term on the RHS of (M.1). Note for any ε>0\varepsilon>0, there exists a constant M>0M>0 such that

(supq𝒬1ε||θ^1,sHD(q)θ1,sHD(q)||2Mhnlog(pn)n,supq𝒬1ε||θ^1,sHD(q)||0Mhn,supτΥ|q^1(τ)q1(τ)|Mn1/2)1ε.\displaystyle\mathbb{P}\begin{pmatrix}\sup_{q\in\mathcal{Q}_{1}^{\varepsilon}}||\hat{\theta}_{1,s}^{\textit{HD}}(q)-\theta_{1,s}^{\textit{HD}}(q)||_{2}\leq M\sqrt{\frac{h_{n}\log(p_{n})}{n}},\leavevmode\nobreak\ \sup_{q\in\mathcal{Q}_{1}^{\varepsilon}}||\hat{\theta}_{1,s}^{\textit{HD}}(q)||_{0}\leq Mh_{n},\\ \sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|\leq Mn^{-1/2}\end{pmatrix}\geq 1-\varepsilon.

On the set

𝒜(ε)={supq𝒬1ε||θ^1,sHD(q)θ1,sHD(q)||2Mhnlog(pn)n,supq𝒬1ε||θ^1,sHD(q)||0Mhn,supτΥ|q^1(τ)q1(τ)|Mn1/2},\mathcal{A}(\varepsilon)=\begin{Bmatrix}\sup_{q\in\mathcal{Q}_{1}^{\varepsilon}}||\hat{\theta}_{1,s}^{\textit{HD}}(q)-\theta_{1,s}^{\textit{HD}}(q)||_{2}\leq M\sqrt{\frac{h_{n}\log(p_{n})}{n}},\leavevmode\nobreak\ \sup_{q\in\mathcal{Q}_{1}^{\varepsilon}}||\hat{\theta}_{1,s}^{\textit{HD}}(q)||_{0}\leq Mh_{n},\\ \sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|\leq Mn^{-1/2}\end{Bmatrix},

we have

supτΥ,s𝒮|iI1(s)ξi[Δ¯1(τ,s,Xi)Hλ(θ^HD1,s(q^1(τ)),θHD1,s(q^1(τ)),s)HR(q1(τ),q^1(τ),s)]n1w(s)|\displaystyle\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\overline{\Delta}_{1}(\tau,s,X_{i})-H_{\lambda}(\hat{\theta}^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)),\theta^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)),s)-H_{R}(q_{1}(\tau),\hat{q}_{1}(\tau),s)]}{n_{1}^{w}(s)}\right|
\displaystyle\leq\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\lambda(H_{p_{n}}^{\top}(X_{i})\theta_{1,s}^{\textit{HD}}(\hat{q}_{1}(\tau)))-\lambda(H_{p_{n}}^{\top}(X_{i})\hat{\theta}_{1,s}^{\textit{HD}}(\hat{q}_{1}(\tau)))-H_{\lambda}(\theta_{1,s}^{\textit{HD}}(\hat{q}_{1}(\tau)),\hat{\theta}^{\textit{HD}}_{1,s}(\hat{q}_{1}(\tau)),s)]}{n_{1}(s)}\right|
+supτΥ,s𝒮|iI1(s)ξi[1,s(q1(τ),q^1(τ),Xi)𝔼(1,s(q1(τ),q^1(τ),Xi)|Si=s)]n1w(s)|\displaystyle+\sup_{\tau\in\Upsilon,s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}\left[\mathcal{R}_{1,s}(q_{1}(\tau),\hat{q}_{1}(\tau),X_{i})-\mathbb{E}(\mathcal{R}_{1,s}(q_{1}(\tau),\hat{q}_{1}(\tau),X_{i})|S_{i}=s)\right]}{n_{1}^{w}(s)}\right|
\displaystyle\leq\frac{n_{1}(s)}{n_{1}^{w}(s)}\biggl{[}\sup\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}[\lambda(H_{p_{n}}^{\top}(X_{i})\theta_{1})-\lambda(H_{p_{n}}^{\top}(X_{i})\theta_{2})-H_{\lambda}(\theta_{1},\theta_{2},s)]}{n_{1}(s)}\right|
+supq,q𝒬1ε,|qq|Mn1/2,s𝒮|iI1(s)ξi[1,s(q,q,Xi)𝔼(1,s(q,q,Xi)|Si=s)]n1(s)|]\displaystyle+\sup_{q,q^{\prime}\in\mathcal{Q}_{1}^{\varepsilon},|q-q^{\prime}|\leq Mn^{-1/2},s\in\mathcal{S}}\left|\frac{\sum_{i\in I_{1}(s)}\xi_{i}\left[\mathcal{R}_{1,s}(q,q^{\prime},X_{i})-\mathbb{E}(\mathcal{R}_{1,s}(q,q^{\prime},X_{i})|S_{i}=s)\right]}{n_{1}(s)}\right|\biggr{]}
n1(s)n1w(s)(D1+D2),\displaystyle\equiv\frac{n_{1}(s)}{n_{1}^{w}(s)}(D_{1}+D_{2}),

where the first supremum in the second inequality is taken over $\{s\in\mathcal{S},\theta_{1},\theta_{2}\in\Re^{p_{n}},||\theta_{1}-\theta_{2}||_{2}\leq M\sqrt{h_{n}\log(p_{n})/n},||\theta_{1}||_{0}+||\theta_{2}||_{0}\leq Mh_{n}\}$. Denote

\mathcal{F}=\begin{Bmatrix}\xi_{i}^{s}[\lambda(H_{p_{n}}^{\top}(X^{s}_{i})\theta_{1})-\lambda(H_{p_{n}}^{\top}(X^{s}_{i})\theta_{2})-H_{\lambda}(\theta_{1},\theta_{2},s)]:\\ s\in\mathcal{S},\theta_{1},\theta_{2}\in\Re^{p_{n}},||\theta_{1}-\theta_{2}||_{2}\leq M\sqrt{h_{n}\log(p_{n})/n},||\theta_{1}||_{0}+||\theta_{2}||_{0}\leq Mh_{n}\end{Bmatrix}

with the envelope F=2ξisF=2\xi_{i}^{s}. We further note that ||maxi[n]2ξis||,2Clog(n)||\max_{i\in[n]}2\xi_{i}^{s}||_{\mathbb{P},2}\leq C\log(n),

\displaystyle\sup_{f\in\mathcal{F}}\mathbb{E}f^{2}\leq\sup\mathbb{E}(H_{p_{n}}^{\top}(X^{s}_{i})(\theta_{1}-\theta_{2}))^{2}\leq\kappa_{2}M^{2}h_{n}\log(p_{n})/n,

and

supQN(,eQ,ε||F||Q,2)(c1pnε)c2hn,\displaystyle\sup_{Q}N(\mathcal{F},e_{Q},\varepsilon||F||_{Q,2})\leq\left(\frac{c_{1}p_{n}}{\varepsilon}\right)^{c_{2}h_{n}},

where c1,c2c_{1},c_{2} are two fixed constants. Therefore, Lemma N.2 implies

D1=Op(hnlog(pn)n+hnlog(n)log(pn)n)=op(n1/2).\displaystyle D_{1}=O_{p}\left(\frac{h_{n}\log(p_{n})}{n}+\frac{h_{n}\log(n)\log(p_{n})}{n}\right)=o_{p}(n^{-1/2}).

Similarly, denote

={ξsi[1(q,s,Xi)1(q,s,Xi)+ra(q,s,Xi)]:q,q𝒬1ε,|qq|Mn1/2,s𝒮},\displaystyle\mathcal{F}=\begin{Bmatrix}\xi^{s}_{i}\left[\mathcal{M}_{1}(q,s,X_{i})-\mathcal{M}_{1}(q^{\prime},s,X_{i})+r_{a}(q^{\prime},s,X_{i})\right]:q,q^{\prime}\in\mathcal{Q}_{1}^{\varepsilon},|q-q^{\prime}|\leq Mn^{-1/2},s\in\mathcal{S}\end{Bmatrix},

with an envelope F=ξisF=\xi_{i}^{s}. In addition, note that \mathcal{F} is nested in

\displaystyle\widetilde{\mathcal{F}}=\{\xi_{i}^{s}[\mathcal{M}_{1}(q,s,X_{i}^{s})-\lambda(H_{p_{n}}^{\top}(X^{s}_{i})\theta_{1})]:q\in\mathcal{Q}_{1}^{\varepsilon},\theta_{1}\in\Re^{p_{n}},||\theta_{1}||_{0}\leq h_{n}\},

with the same envelope. Hence,

supQN(,eQ,ε||F||Q,2)supQN(~,eQ,ε||F||Q,2)(apnε)chn.\displaystyle\sup_{Q}N(\mathcal{F},e_{Q},\varepsilon||F||_{Q,2})\leq\sup_{Q}N(\widetilde{\mathcal{F}},e_{Q},\varepsilon||F||_{Q,2})\leq\left(\frac{ap_{n}}{\varepsilon}\right)^{ch_{n}}.

Last,

supf𝔼f2Csupq,q𝒬1ε,|qq|Mn1/2,s𝒮(|qq|2+𝔼ra2(q,s,Xis))=O(hnlog(pn)/n).\displaystyle\sup_{f\in\mathcal{F}}\mathbb{E}f^{2}\leq C\sup_{q,q^{\prime}\in\mathcal{Q}_{1}^{\varepsilon},|q-q^{\prime}|\leq Mn^{-1/2},s\in\mathcal{S}}(|q-q^{\prime}|^{2}+\mathbb{E}r_{a}^{2}(q^{\prime},s,X_{i}^{s}))=O(h_{n}\log(p_{n})/n).

Therefore, Lemma N.2 implies

D2=Op(hnlog(pn)n+hnlog(n)log(pn)n)=op(n1/2).\displaystyle D_{2}=O_{p}\left(\frac{h_{n}\log(p_{n})}{n}+\frac{h_{n}\log(n)\log(p_{n})}{n}\right)=o_{p}(n^{-1/2}).

This leads to (M.1). We can establish Assumption 3(i) in the same manner. Assumptions 3(ii) and 3(iii) can be established by the same argument used in Step 3 of the proof of Theorem 5.6. This concludes the proof of Theorem A.1.

Appendix N Technical Lemmas

The first lemma was established in Zhang and Zheng (2020).

Lemma N.1.

Let $S_{k}$ be the $k$-th partial sum of Banach-space-valued independent and identically distributed random variables. Then

(max1kn||Sk||ε)3max1kn(||Sk||ε/3)9(||Sn||ε/30).\displaystyle\mathbb{P}(\max_{1\leq k\leq n}||S_{k}||\geq\varepsilon)\leq 3\max_{1\leq k\leq n}\mathbb{P}(||S_{k}||\geq\varepsilon/3)\leq 9\mathbb{P}(||S_{n}||\geq\varepsilon/30).
Proof.

The first inequality is due to Zhang and Zheng (2020, Lemma E.1) and the second inequality is due to Montgomery-Smith (1993, Theorem 1). ∎
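As an illustration (not part of the proof), the reflection-type bound in Lemma N.1 can be checked by simulation with hypothetical Rademacher partial sums:

```python
import random

random.seed(42)
n, reps, eps = 100, 5000, 8.0

max_exceed = end_exceed = 0
for _ in range(reps):
    s = running_max = 0
    for _ in range(n):
        s += random.choice((-1, 1))  # i.i.d. Rademacher increments
        running_max = max(running_max, abs(s))
    max_exceed += running_max >= eps
    end_exceed += abs(s) >= eps / 30

# Empirical frequencies should respect
# P(max_k |S_k| >= eps) <= 9 P(|S_n| >= eps/30).
assert max_exceed / reps <= 9 * end_exceed / reps
```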

The next lemma is due to Chernozhukov et al. (2014) with our modification of their maximal inequality to the case with covariate-adaptive randomization.

Lemma N.2.

Suppose Assumption 1 holds. Let $w_{i}=1$ or $\xi_{i}$ defined in Assumption 4. Denote by $\mathcal{F}$ a class of measurable functions of the form $f(x,y_{1},y_{0})$ such that $\mathbb{E}(f(X_{i},Y_{i}(1),Y_{i}(0))|S_{i}=s)=0$. Further suppose $\max_{s\in\mathcal{S}}\mathbb{E}(|F_{i}|^{q}|S_{i}=s)<\infty$ for some $q\geq 2$, where

Fi=supf|wif(Xi,Yi(1),Yi(0))|,F_{i}=\sup_{f\in\mathcal{F}}|w_{i}f(X_{i},Y_{i}(1),Y_{i}(0))|,

\mathcal{F} is of the VC-type with coefficients (αn,vn)>0(\alpha_{n},v_{n})>0, and supf𝔼(f2|S=s)σn2\sup_{f\in\mathcal{F}}\mathbb{E}(f^{2}|S=s)\leq\sigma_{n}^{2}. Then,

supf,s𝒮1n|i=1nAi1{Si=s}wif(Xi,Yi(1),Yi(0))|\displaystyle\sup_{f\in\mathcal{F},s\in\mathcal{S}}\frac{1}{\sqrt{n}}\left|\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}w_{i}f(X_{i},Y_{i}(1),Y_{i}(0))\right|
=Op(vnσn2log(αn||F||,2σ)+vn||maxi[n]Fi||,2log(αn||F||,2σ)n),\displaystyle=O_{p}\left(\sqrt{v_{n}\sigma_{n}^{2}\log\left(\frac{\alpha_{n}||F||_{\mathbb{P},2}}{\sigma}\right)}+\frac{v_{n}||\max_{i\in[n]}F_{i}||_{\mathbb{P},2}\log\left(\frac{\alpha_{n}||F||_{\mathbb{P},2}}{\sigma}\right)}{\sqrt{n}}\right),

and

supf,s𝒮1n|i=1n(1Ai)1{Si=s}wif(Xi,Yi(1),Yi(0))|\displaystyle\sup_{f\in\mathcal{F},s\in\mathcal{S}}\frac{1}{\sqrt{n}}\left|\sum_{i=1}^{n}(1-A_{i})1\{S_{i}=s\}w_{i}f(X_{i},Y_{i}(1),Y_{i}(0))\right|
=Op(vnσn2log(αn||F||,2σ)+vn||maxi[n]Fi||,2log(αn||F||,2σ)n).\displaystyle=O_{p}\left(\sqrt{v_{n}\sigma_{n}^{2}\log\left(\frac{\alpha_{n}||F||_{\mathbb{P},2}}{\sigma}\right)}+\frac{v_{n}||\max_{i\in[n]}F_{i}||_{\mathbb{P},2}\log\left(\frac{\alpha_{n}||F||_{\mathbb{P},2}}{\sigma}\right)}{\sqrt{n}}\right).
Proof.

We focus on establishing the first statement. The proof of the second statement is similar and is omitted. Following Bugni et al. (2018), we define the sequence of i.i.d. random variables $\{(w_{i}^{s},X_{i}^{s},Y_{i}^{s}(1),Y_{i}^{s}(0)):1\leq i\leq n\}$ with marginal distribution equal to that of $(w_{i},X_{i},Y_{i}(1),Y_{i}(0))|S_{i}=s$. The distribution of $\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}w_{i}f(X_{i},Y_{i}(1),Y_{i}(0))$ is the same as its counterpart with units ordered by strata and, within each stratum, with $A_{i}=1$ units first and $A_{i}=0$ units second, i.e.,

\displaystyle\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}w_{i}f(X_{i},Y_{i}(1),Y_{i}(0))\stackrel{d}{=}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}w^{s}_{i}f(X^{s}_{i},Y^{s}_{i}(1),Y^{s}_{i}(0))\equiv\Gamma_{n}^{s}(N(s)+n_{1}(s),f)-\Gamma_{n}^{s}(N(s),f),

where $N(s)=\sum_{i=1}^{n}1\{S_{i}<s\}$ and

\displaystyle\Gamma_{n}^{s}(k,f)=\sum_{i\in[k]}w^{s}_{i}f(X^{s}_{i},Y^{s}_{i}(1),Y^{s}_{i}(0)).

Let $\mu_{n}=\sqrt{v_{n}\sigma_{n}^{2}\log\left(\frac{\alpha_{n}||F||_{\mathbb{P},2}}{\sigma_{n}}\right)}+\frac{v_{n}||\max_{i\in[n]}F_{i}||_{\mathbb{P},2}\log\left(\frac{\alpha_{n}||F||_{\mathbb{P},2}}{\sigma_{n}}\right)}{\sqrt{n}}$. Then, for some constant $C>0$, we have

\displaystyle\mathbb{P}\left(\sup_{f\in\mathcal{F},s\in\mathcal{S}}\frac{1}{\sqrt{n}}\left|\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}w_{i}f(X_{i},Y_{i}(1),Y_{i}(0))\right|\geq t\mu_{n}\right)
=\mathbb{P}\left(\sup_{f\in\mathcal{F},s\in\mathcal{S}}\frac{1}{\sqrt{n}}\left|\Gamma_{n}^{s}(N(s)+n_{1}(s),f)-\Gamma_{n}^{s}(N(s),f)\right|\geq t\mu_{n}\right)
\leq\sum_{s\in\mathcal{S}}\mathbb{P}\left(\max_{1\leq k\leq n}\sup_{f\in\mathcal{F}}\left|\frac{1}{\sqrt{n}}\Gamma_{n}^{s}(k,f)\right|\geq t\mu_{n}/2\right)
\leq\sum_{s\in\mathcal{S}}9\,\mathbb{P}\left(\sup_{f\in\mathcal{F}}\left|\frac{1}{\sqrt{n}}\Gamma_{n}^{s}(n,f)\right|\geq t\mu_{n}/60\right)
\leq\sum_{s\in\mathcal{S}}\frac{540\,\mathbb{E}\left(\sup_{f\in\mathcal{F}}\left|\frac{1}{\sqrt{n}}\Gamma_{n}^{s}(n,f)\right|\right)}{t\mu_{n}}
=\sum_{s\in\mathcal{S}}\frac{540\,\mathbb{E}\left(\sqrt{n}||\mathbb{P}^{s}_{n}-\mathbb{P}^{s}||_{\mathcal{F}}\right)}{t\mu_{n}}
\leq C/t,

where $\mathbb{P}_{n}^{s}$ and $\mathbb{P}^{s}$ are the empirical measure and the expectation w.r.t. the i.i.d. data $\{w_{i}^{s},X_{i}^{s},Y_{i}^{s}(1),Y_{i}^{s}(0)\}_{i\in[n]}$, respectively, the second inequality is due to Lemma N.1, the last equality is due to the fact that

\displaystyle\mathbb{E}w_{i}^{s}f(X_{i}^{s},Y_{i}^{s}(1),Y_{i}^{s}(0))=\mathbb{E}\left(w_{i}f(X_{i},Y_{i}(1),Y_{i}(0))|S_{i}=s\right)=0,

and the last inequality is due to the fact that, by Chernozhukov et al. (2014, Corollary 5.1),

\displaystyle\mathbb{E}\left(\sqrt{n}||\mathbb{P}^{s}_{n}-\mathbb{P}^{s}||_{\mathcal{F}}\right)\leq C\mu_{n}.

Then, for any $\varepsilon>0$, we can choose $t\geq C/\varepsilon$ so that

\displaystyle\mathbb{P}\left(\sup_{f\in\mathcal{F},s\in\mathcal{S}}\frac{1}{\sqrt{n}}\left|\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}w_{i}f(X_{i},Y_{i}(1),Y_{i}(0))\right|\geq t\mu_{n}\right)\leq\varepsilon,

which implies the desired result. ∎
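The key reordering step in the proof, replacing the within-stratum sum over treated units by a partial sum of i.i.d. draws from the conditional distribution, can be illustrated numerically. The sketch below is hypothetical (binary strata, normal covariates, simple randomization with $\pi=1/2$, and the centered function $f(x)=x-1$ on stratum $s=1$); it only checks that the two constructions agree in mean and variance up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, pi = 50, 20000, 0.5

# Construction 1: sum_i A_i 1{S_i = 1} f(X_i), with S_i ~ Bern(0.5),
# A_i ~ Bern(pi) independent of everything, and X | S = s ~ N(s, 1).
S = rng.integers(0, 2, size=(reps, n))
A = rng.binomial(1, pi, size=(reps, n))
X = rng.standard_normal((reps, n)) + S          # X | S = s ~ N(s, 1)
T1 = (A * (S == 1) * (X - 1)).sum(axis=1)

# Construction 2: given n_1(1) treated units in stratum 1, the same sum is
# distributed as a sum of n_1(1) i.i.d. draws of f(X^1) = X^1 - 1 ~ N(0, 1).
n1 = (A * (S == 1)).sum(axis=1)
Z = rng.standard_normal((reps, n))
mask = np.arange(n) < n1[:, None]
T2 = (Z * mask).sum(axis=1)

print(T1.mean(), T2.mean(), T1.var(), T2.var())
```

Both constructions have mean zero and variance $\mathbb{E}[n_{1}(1)]\cdot\mathrm{Var}(f(X)|S=1)=50\cdot 0.25=12.5$ in this design, so the printed moments should match up to simulation noise.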

The next lemma is similar to Zhang and Zheng (2020, Lemma E.2) but with additional covariates and regression adjustments. It is retained in the Supplement to make the paper self-contained.

Lemma N.3.

Suppose the assumptions in Theorem 3 hold. Denote

\displaystyle\varpi_{n,1}(\tau)=\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i}1\{S_{i}=s\}\phi_{1}(\tau,s,Y_{i}(1),X_{i})-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(1-A_{i})1\{S_{i}=s\}\phi_{0}(\tau,s,Y_{i}(0),X_{i}),

and

\displaystyle\varpi_{n,2}(\tau)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\phi_{s}(\tau,S_{i}).

Then, uniformly over $\tau\in\Upsilon$,

\displaystyle(\varpi_{n,1}(\tau),\varpi_{n,2}(\tau))\rightsquigarrow(\mathcal{B}_{1}(\tau),\mathcal{B}_{2}(\tau)),

where $(\mathcal{B}_{1}(\tau),\mathcal{B}_{2}(\tau))$ are two independent Gaussian processes with covariance kernels $\Sigma_{1}(\tau,\tau^{\prime})$ and $\Sigma_{2}(\tau,\tau^{\prime})$, respectively, such that

\displaystyle\Sigma_{1}(\tau,\tau^{\prime})=\mathbb{E}\pi(S_{i})\phi_{1}(\tau,S_{i},Y_{i}(1),X_{i})\phi_{1}(\tau^{\prime},S_{i},Y_{i}(1),X_{i})+\mathbb{E}(1-\pi(S_{i}))\phi_{0}(\tau,S_{i},Y_{i}(0),X_{i})\phi_{0}(\tau^{\prime},S_{i},Y_{i}(0),X_{i})

and

\displaystyle\Sigma_{2}(\tau,\tau^{\prime})=\mathbb{E}\phi_{s}(\tau,S_{i})\phi_{s}(\tau^{\prime},S_{i}).
Proof.

We follow the general argument in the proof of Bugni et al. (2018, Lemma B.2). We divide the proof into two steps. In the first step, we show that

\displaystyle(\varpi_{n,1}(\tau),\varpi_{n,2}(\tau))\stackrel{d}{=}(\varpi^{\star}_{n,1}(\tau),\varpi_{n,2}(\tau))+o_{p}(1),

where the $o_{p}(1)$ term holds uniformly over $\tau\in\Upsilon$, $\varpi^{\star}_{n,1}(\tau)\perp\!\!\!\perp\varpi_{n,2}(\tau)$, and, uniformly over $\tau\in\Upsilon$,

\displaystyle\varpi^{\star}_{n,1}(\tau)\rightsquigarrow\mathcal{B}_{1}(\tau).

In the second step, we show that

\displaystyle\varpi_{n,2}(\tau)\rightsquigarrow\mathcal{B}_{2}(\tau)

uniformly over $\tau\in\Upsilon$.

Step 1. Recall that we define $\{(X_{i}^{s},Y_{i}^{s}(1),Y_{i}^{s}(0)):1\leq i\leq n\}$ as a sequence of i.i.d. random variables with marginal distribution equal to that of $(X_{i},Y_{i}(1),Y_{i}(0))|S_{i}=s$, and $N(s)=\sum_{i=1}^{n}1\{S_{i}<s\}$. The distribution of $\varpi_{n,1}(\tau)$ is the same as its counterpart with units ordered by strata and, within each stratum, with $A_{i}=1$ units first and $A_{i}=0$ units second, i.e.,

\displaystyle\varpi_{n,1}(\tau)|\{(A_{i},S_{i})_{i\in[n]}\}\stackrel{d}{=}\widetilde{\varpi}_{n,1}(\tau)|\{(A_{i},S_{i})_{i\in[n]}\},

where

\displaystyle\widetilde{\varpi}_{n,1}(\tau)\equiv\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}\phi_{1}(\tau,s,Y^{s}_{i}(1),X^{s}_{i})-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=N(s)+n_{1}(s)+1}^{N(s)+n(s)}\phi_{0}(\tau,s,Y^{s}_{i}(0),X^{s}_{i}),

with

\displaystyle\phi_{1}(\tau,s,Y^{s}_{i}(1),X^{s}_{i})=\frac{\tau-1\{Y_{i}^{s}(1)\leq q_{1}(\tau)\}-m_{1}(\tau,s)-(1-\pi(s))\left(\overline{m}_{1}(\tau,s,X^{s}_{i})-\overline{m}_{1}(\tau,s)\right)}{\pi(s)f_{1}(q_{1}(\tau))}-\frac{\overline{m}_{0}(\tau,s,X^{s}_{i})-\overline{m}_{0}(\tau,s)}{f_{0}(q_{0}(\tau))},

and

\displaystyle\phi_{0}(\tau,s,Y^{s}_{i}(0),X^{s}_{i})=\frac{\tau-1\{Y_{i}^{s}(0)\leq q_{0}(\tau)\}-m_{0}(\tau,s)-\pi(s)\left(\overline{m}_{0}(\tau,s,X^{s}_{i})-\overline{m}_{0}(\tau,s)\right)}{(1-\pi(s))f_{0}(q_{0}(\tau))}-\frac{\overline{m}_{1}(\tau,s,X^{s}_{i})-\overline{m}_{1}(\tau,s)}{f_{1}(q_{1}(\tau))}.

As $\varpi_{n,2}(\tau)$ is only a function of $\{S_{i}\}_{i\in[n]}$, we have

\displaystyle(\varpi_{n,1}(\tau),\varpi_{n,2}(\tau))\stackrel{d}{=}(\widetilde{\varpi}_{n,1}(\tau),\varpi_{n,2}(\tau)).

Let $F(s)=\mathbb{P}(S_{i}<s)$, $p(s)=\mathbb{P}(S_{i}=s)$, and

\displaystyle\varpi^{\star}_{n,1}(\tau)\equiv\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=\lfloor nF(s)\rfloor+1}^{\lfloor n(F(s)+\pi(s)p(s))\rfloor}\phi_{1}(\tau,s,Y^{s}_{i}(1),X^{s}_{i})-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=\lfloor n(F(s)+\pi(s)p(s))\rfloor+1}^{\lfloor n(F(s)+p(s))\rfloor}\phi_{0}(\tau,s,Y^{s}_{i}(0),X^{s}_{i}).

Note that $\varpi^{\star}_{n,1}(\tau)$ is a function of $(Y_{i}^{s}(1),Y_{i}^{s}(0),X_{i}^{s})_{i\in[n],s\in\mathcal{S}}$, which is independent of $\{A_{i},S_{i}\}_{i\in[n]}$ by construction. Therefore,

\displaystyle\varpi^{\star}_{n,1}(\tau)\perp\!\!\!\perp\varpi_{n,2}(\tau).

Note that

\displaystyle\frac{N(s)}{n}\stackrel{p}{\longrightarrow}F(s),\quad\frac{n_{1}(s)}{n}\stackrel{p}{\longrightarrow}\pi(s)p(s),\quad\text{and}\quad\frac{n(s)}{n}\stackrel{p}{\longrightarrow}p(s).

Denote $\Gamma_{n,a}(s,t,\tau)=\frac{1}{\sqrt{n}}\sum_{i=1}^{\lfloor nt\rfloor}\phi_{a}(\tau,s,Y_{i}^{s}(a),X_{i}^{s})$ for $a=0,1$. In order to show that $\sup_{\tau\in\Upsilon}|\widetilde{\varpi}_{n,1}(\tau)-\varpi^{\star}_{n,1}(\tau)|=o_{p}(1)$ and $\varpi_{n,1}^{\star}(\tau)\rightsquigarrow\mathcal{B}_{1}(\tau)$, it suffices to show that (1) for $a=0,1$ and $s\in\mathcal{S}$, the stochastic process

\displaystyle\{\Gamma_{n,a}(s,t,\tau):t\in(0,1),\tau\in\Upsilon\}

is stochastically equicontinuous, and (2) $\varpi_{n,1}^{\star}(\tau)$ converges to $\mathcal{B}_{1}(\tau)$ in finite dimensions.

Claim (1). We want to bound

\displaystyle\sup|\Gamma_{n,a}(s,t_{2},\tau_{2})-\Gamma_{n,a}(s,t_{1},\tau_{1})|,

where the supremum is taken over $0<t_{1}<t_{2}<t_{1}+\varepsilon<1$ and $\tau_{1}<\tau_{2}<\tau_{1}+\varepsilon$ such that $\tau_{1},\tau_{1}+\varepsilon\in\Upsilon$. Note that

\displaystyle\sup|\Gamma_{n,a}(s,t_{2},\tau_{2})-\Gamma_{n,a}(s,t_{1},\tau_{1})|\leq\sup_{0<t_{1}<t_{2}<t_{1}+\varepsilon<1,\tau\in\Upsilon}|\Gamma_{n,a}(s,t_{2},\tau)-\Gamma_{n,a}(s,t_{1},\tau)|+\sup_{t\in(0,1),\tau_{1},\tau_{2}\in\Upsilon,\tau_{1}<\tau_{2}<\tau_{1}+\varepsilon}|\Gamma_{n,a}(s,t,\tau_{2})-\Gamma_{n,a}(s,t,\tau_{1})|. (N.1)

Then, for an arbitrary $\delta>0$, by taking $\varepsilon=\delta^{4}$, we have

\displaystyle\mathbb{P}\left(\sup_{0<t_{1}<t_{2}<t_{1}+\varepsilon<1,\tau\in\Upsilon}|\Gamma_{n,a}(s,t_{2},\tau)-\Gamma_{n,a}(s,t_{1},\tau)|\geq\delta\right)
=\mathbb{P}\left(\sup_{0<t_{1}<t_{2}<t_{1}+\varepsilon<1,\tau\in\Upsilon}|\sum_{i=\lfloor nt_{1}\rfloor+1}^{\lfloor nt_{2}\rfloor}\phi_{a}(\tau,s,Y_{i}^{s}(a),X_{i}^{s})|\geq\sqrt{n}\delta\right)
=\mathbb{P}\left(\sup_{0<t\leq\varepsilon,\tau\in\Upsilon}|\sum_{i=1}^{\lfloor nt\rfloor}\phi_{a}(\tau,s,Y_{i}^{s}(a),X_{i}^{s})|\geq\sqrt{n}\delta\right)
\leq\mathbb{P}\left(\max_{1\leq k\leq\lfloor n\varepsilon\rfloor}\sup_{\tau\in\Upsilon}|S_{k}(\tau)|\geq\sqrt{n}\delta\right)
\leq\frac{270\,\mathbb{E}\sup_{\tau\in\Upsilon}|\sum_{i=1}^{\lfloor n\varepsilon\rfloor}\phi_{a}(\tau,s,Y_{i}^{s}(a),X_{i}^{s})|}{\sqrt{n}\delta}
\lesssim\frac{\sqrt{n\varepsilon}}{\sqrt{n}\delta}\lesssim\delta,

where in the first inequality, $S_{k}(\tau)=\sum_{i=1}^{k}\phi_{a}(\tau,s,Y_{i}^{s}(a),X_{i}^{s})$, and the second inequality holds due to Lemma N.1. To see the third inequality, denote

\displaystyle\mathcal{F}=\{\phi_{a}(\tau,s,Y_{i}^{s}(a),X_{i}^{s}):\tau\in\Upsilon\}

with an envelope function $F_{i}$ such that, by Assumption 3, $||F_{i}||_{\mathbb{P},q}<\infty$. In addition, by Assumption 3 again and the fact that

\displaystyle\{\tau-1\{Y_{i}^{s}(a)\leq q_{a}(\tau)\}-m_{a}(\tau,s):\tau\in\Upsilon\}

is of the VC-type with fixed coefficients $(\alpha,v)$, so is $\mathcal{F}$. Then, we have

\displaystyle J(1,\mathcal{F})<\infty,

where

\displaystyle J(\delta,\mathcal{F})=\sup_{Q}\int_{0}^{\delta}\sqrt{1+\log N(\mathcal{F},L_{2}(Q),\varepsilon||F||_{Q,2})}d\varepsilon,

$N(\mathcal{F},L_{2}(Q),\varepsilon||F||_{Q,2})$ is the covering number, and the supremum is taken over all discrete probability measures $Q$. Therefore, by van der Vaart and Wellner (1996, Theorem 2.14.1),

\displaystyle\frac{270\,\mathbb{E}\sup_{\tau\in\Upsilon}|\sum_{i=1}^{\lfloor n\varepsilon\rfloor}\phi_{a}(\tau,s,Y_{i}^{s}(a),X_{i}^{s})|}{\sqrt{n}\delta}\lesssim\frac{\sqrt{\lfloor n\varepsilon\rfloor}\left[\mathbb{E}\sqrt{\lfloor n\varepsilon\rfloor}||\mathbb{P}_{\lfloor n\varepsilon\rfloor}-\mathbb{P}||_{\mathcal{F}}\right]}{\sqrt{n}\delta}\lesssim\frac{\sqrt{\lfloor n\varepsilon\rfloor}J(1,\mathcal{F})}{\sqrt{n}\delta}.

For the second term on the RHS of (N.1), by taking $\varepsilon=\delta^{4}$, we have

\displaystyle\mathbb{P}\left(\sup_{t\in(0,1),\tau_{1},\tau_{2}\in\Upsilon,\tau_{1}<\tau_{2}<\tau_{1}+\varepsilon}|\Gamma_{n,a}(s,t,\tau_{2})-\Gamma_{n,a}(s,t,\tau_{1})|\geq\delta\right)
=\mathbb{P}\left(\max_{1\leq k\leq n}\sup_{\tau_{1},\tau_{2}\in\Upsilon,\tau_{1}<\tau_{2}<\tau_{1}+\varepsilon}|S_{k}(\tau_{1},\tau_{2})|\geq\sqrt{n}\delta\right)
\leq\frac{270\,\mathbb{E}\sup_{\tau_{1},\tau_{2}\in\Upsilon,\tau_{1}<\tau_{2}<\tau_{1}+\varepsilon}|\sum_{i=1}^{n}(\phi_{a}(\tau_{2},s,Y_{i}^{s}(a),X_{i}^{s})-\phi_{a}(\tau_{1},s,Y_{i}^{s}(a),X_{i}^{s}))|}{\sqrt{n}\delta}
\lesssim\delta\sqrt{\log\left(\frac{C}{\delta^{2}}\right)},

where in the first equality, $S_{k}(\tau_{1},\tau_{2})=\sum_{i=1}^{k}(\phi_{a}(\tau_{2},s,Y_{i}^{s}(a),X_{i}^{s})-\phi_{a}(\tau_{1},s,Y_{i}^{s}(a),X_{i}^{s}))$, and the first inequality is due to Lemma N.1. To see the last inequality, denote

\displaystyle\mathcal{F}=\{\phi_{a}(\tau_{2},s,Y_{i}^{s}(a),X_{i}^{s})-\phi_{a}(\tau_{1},s,Y_{i}^{s}(a),X_{i}^{s}):\tau_{1},\tau_{2}\in\Upsilon,\tau_{1}<\tau_{2}<\tau_{1}+\varepsilon\}

with a constant envelope function $F_{i}$ such that $||F_{i}||_{\mathbb{P},q}<\infty$. In addition, due to Assumptions 2.2 and 3.3, one can show that

\displaystyle\sup_{f\in\mathcal{F}}\mathbb{E}f^{2}\leq c\varepsilon\equiv\sigma^{2}

for some constant $c>0$. Last, due to Assumption 3.2, $\mathcal{F}$ is of the VC-type with fixed coefficients $(\alpha,v)$. Therefore, by Chernozhukov et al. (2014, Corollary 5.1),

\displaystyle\frac{270\,\mathbb{E}\sup_{\tau_{1},\tau_{2}\in\Upsilon,\tau_{1}<\tau_{2}<\tau_{1}+\varepsilon}|\sum_{i=1}^{n}(\phi_{a}(\tau_{2},s,Y_{i}^{s}(a),X_{i}^{s})-\phi_{a}(\tau_{1},s,Y_{i}^{s}(a),X_{i}^{s}))|}{\sqrt{n}\delta}\lesssim\frac{\sqrt{n}\mathbb{E}||\mathbb{P}_{n}-\mathbb{P}||_{\mathcal{F}}}{\delta}\lesssim\sqrt{\frac{\sigma^{2}\log(\frac{C}{\sigma})}{\delta^{2}}}+\frac{C\log(\frac{C}{\sigma})}{\sqrt{n}\delta}\lesssim\delta\sqrt{\log\left(\frac{C}{\delta^{2}}\right)},

where the last inequality holds for sufficiently large $n$. Note that $\delta\sqrt{\log(\frac{C}{\delta^{2}})}\rightarrow 0$ as $\delta\rightarrow 0$. This concludes the proof of Claim (1).

Claim (2). For a single $\tau$, by the triangular-array CLT,

\displaystyle\varpi_{n,1}^{\star}(\tau)\rightsquigarrow N(0,\Sigma_{1}(\tau,\tau)),

where

\displaystyle\Sigma_{1}(\tau,\tau)=\lim_{n\rightarrow\infty}\sum_{s\in\mathcal{S}}\frac{\lfloor n(F(s)+\pi(s)p(s))\rfloor-\lfloor nF(s)\rfloor}{n}\mathbb{E}\phi_{1}^{2}(\tau,s,Y_{i}^{s}(1),X_{i}^{s})+\lim_{n\rightarrow\infty}\sum_{s\in\mathcal{S}}\frac{\lfloor n(F(s)+p(s))\rfloor-\lfloor n(F(s)+\pi(s)p(s))\rfloor}{n}\mathbb{E}\phi_{0}^{2}(\tau,s,Y_{i}^{s}(0),X_{i}^{s})
=\sum_{s\in\mathcal{S}}p(s)\mathbb{E}(\pi(s)\phi_{1}^{2}(\tau,S_{i},Y_{i}(1),X_{i})+(1-\pi(s))\phi_{0}^{2}(\tau,S_{i},Y_{i}(0),X_{i})|S_{i}=s)
=\mathbb{E}\pi(S_{i})\phi_{1}^{2}(\tau,S_{i},Y_{i}(1),X_{i})+\mathbb{E}(1-\pi(S_{i}))\phi_{0}^{2}(\tau,S_{i},Y_{i}(0),X_{i}).

Finite-dimensional convergence is proved by the Cramér-Wold device. In particular, we can show that the covariance kernel is

\displaystyle\Sigma_{1}(\tau,\tau^{\prime})=\mathbb{E}\pi(S_{i})\phi_{1}(\tau,S_{i},Y_{i}(1),X_{i})\phi_{1}(\tau^{\prime},S_{i},Y_{i}(1),X_{i})+\mathbb{E}(1-\pi(S_{i}))\phi_{0}(\tau,S_{i},Y_{i}(0),X_{i})\phi_{0}(\tau^{\prime},S_{i},Y_{i}(0),X_{i}).

This concludes the proof of Claim (2), and thereby leads to the desired results in Step 1.

Step 2. As $m_{a}(\tau,S_{i})=\tau-\mathbb{P}(Y_{i}(a)\leq q_{a}(\tau)|S_{i})$ is Lipschitz continuous in $\tau$ with a bounded Lipschitz constant, $\{m_{a}(\tau,S_{i}):\tau\in\Upsilon\}$ is of the VC-type with fixed coefficients $(\alpha,v)$ and a constant envelope function. Therefore, $\{\frac{m_{1}(\tau,S_{i})}{f_{1}(q_{1}(\tau))}-\frac{m_{0}(\tau,S_{i})}{f_{0}(q_{0}(\tau))}:\tau\in\Upsilon\}$ is a Donsker class and we have

\displaystyle\varpi_{n,2}(\tau)\rightsquigarrow\mathcal{B}_{2}(\tau),

where $\mathcal{B}_{2}(\tau)$ is a Gaussian process with covariance kernel

\displaystyle\Sigma_{2}(\tau,\tau^{\prime})=\mathbb{E}\left(\frac{m_{1}(\tau,S_{i})}{f_{1}(q_{1}(\tau))}-\frac{m_{0}(\tau,S_{i})}{f_{0}(q_{0}(\tau))}\right)\left(\frac{m_{1}(\tau^{\prime},S_{i})}{f_{1}(q_{1}(\tau^{\prime}))}-\frac{m_{0}(\tau^{\prime},S_{i})}{f_{0}(q_{0}(\tau^{\prime}))}\right)\equiv\mathbb{E}\phi_{s}(\tau,S_{i})\phi_{s}(\tau^{\prime},S_{i}).

This concludes the proof. ∎
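Step 2's conclusion, that the variance of the normalized stratum-level sum equals $\Sigma_{2}(\tau,\tau)=\mathbb{E}\phi_{s}^{2}(\tau,S_{i})$ for a fixed $\tau$, can be checked numerically. The stratum function `g` below is a hypothetical stand-in for $\phi_{s}(\tau,\cdot)$; the only features used are that the $S_{i}$ are i.i.d. over finitely many strata and that `g` is bounded and centered.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 20000
p = np.array([0.3, 0.5, 0.2])          # p(s) for strata s = 0, 1, 2
g = np.array([1.0, -0.2, -1.0])
g = g - p @ g                           # center so that E g(S) = 0

S = rng.choice(3, size=(reps, n), p=p)
W = g[S].sum(axis=1) / np.sqrt(n)       # n^{-1/2} sum_i g(S_i)

Sigma2 = p @ (g ** 2)                   # E g^2(S), the target kernel at (tau, tau)
print(W.mean(), W.var(), Sigma2)
```

The Monte Carlo variance of `W` should match `Sigma2` up to simulation noise, since $\mathrm{Var}(n^{-1/2}\sum_{i}g(S_{i}))=\mathrm{Var}(g(S_{i}))$ exactly for i.i.d. draws.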

Lemma N.4.

Suppose the Assumptions in Theorem 5 hold and recall $D_{n}^{w}(s)=\sum_{i=1}^{n}\xi_{i}(A_{i}-\pi(S_{i}))1\{S_{i}=s\}$. Then, $\max_{s\in\mathcal{S}}|(D_{n}^{w}(s)-D_{n}(s))/n(s)|=o_{p}(1)$ and $\max_{s\in\mathcal{S}}|D_{n}^{w}(s)/n^{w}(s)|=o_{p}(1)$.

Proof.

We note that $n^{w}(s)/n(s)\stackrel{p}{\longrightarrow}1$ and $D_{n}(s)/n(s)\stackrel{p}{\longrightarrow}0$. Therefore, we only need to show

\displaystyle\frac{D_{n}^{w}(s)-D_{n}(s)}{n(s)}=\sum_{i=1}^{n}\frac{(\xi_{i}-1)(A_{i}-\pi(s))1\{S_{i}=s\}}{n(s)}\stackrel{p}{\longrightarrow}0.

As $n(s)\rightarrow\infty$ a.s., conditionally on the data,

\displaystyle\frac{1}{n(s)}\sum_{i=1}^{n}(A_{i}-\pi(s))^{2}1\{S_{i}=s\}=\frac{1}{n(s)}\sum_{i=1}^{n}\left(A_{i}-\pi(s)-2\pi(s)(A_{i}-\pi(s))+\pi(s)-\pi^{2}(s)\right)1\{S_{i}=s\}=\frac{D_{n}(s)-2\pi(s)D_{n}(s)}{n(s)}+\pi(s)(1-\pi(s))\stackrel{p}{\longrightarrow}\pi(s)(1-\pi(s)).

Then, by the Lindeberg CLT, conditionally on the data,

\displaystyle\frac{1}{\sqrt{n(s)}}\sum_{i=1}^{n}(\xi_{i}-1)(A_{i}-\pi(s))1\{S_{i}=s\}\rightsquigarrow N(0,\pi(s)(1-\pi(s)))=O_{p}(1),

and thus

\displaystyle\frac{D_{n}^{w}(s)-D_{n}(s)}{n(s)}=O_{p}(n^{-1/2}(s))=o_{p}(1).

∎
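Lemma N.4's conclusion can also be seen by simulation (this is an illustration, not part of the proof): with unit-mean, unit-variance bootstrap weights (Exp(1) here, an illustrative choice; any weights satisfying Assumption 4 would do), the normalized difference $(D_{n}^{w}(s)-D_{n}(s))/n(s)$ shrinks at rate $n(s)^{-1/2}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, pi = 20000, 0.5
S = rng.integers(0, 2, size=n)          # two strata, equally likely
A = rng.binomial(1, pi, size=n)         # simple randomization within strata
xi = rng.exponential(1.0, size=n)       # bootstrap weights with mean 1, var 1

# (D_n^w(s) - D_n(s)) / n(s) = average of (xi_i - 1)(A_i - pi) within stratum s
diffs = []
for s in (0, 1):
    idx = S == s
    diffs.append(abs(((xi[idx] - 1) * (A[idx] - pi)).sum() / idx.sum()))
print(diffs)
```

With roughly 10,000 units per stratum, each within-stratum average has standard deviation about $\sqrt{0.25/10^{4}}=0.005$, so both entries of `diffs` should be close to zero.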

Lemma N.5.

Suppose the Assumptions in Theorem 5 hold. Then, uniformly over $\tau\in\Upsilon$,

\displaystyle\varpi_{n,1}^{w}(\tau)+\varpi_{n,2}^{w}(\tau)\underset{\xi}{\overset{\mathbb{P}}{\rightsquigarrow}}\mathcal{B}(\tau),

where $\mathcal{B}(\tau)$ is a Gaussian process with the covariance kernel

\displaystyle\Sigma(\tau,\tau^{\prime})=\mathbb{E}\pi(S_{i})\phi_{1}(\tau,S_{i},Y_{i}(1),X_{i})\phi_{1}(\tau^{\prime},S_{i},Y_{i}(1),X_{i})+\mathbb{E}(1-\pi(S_{i}))\phi_{0}(\tau,S_{i},Y_{i}(0),X_{i})\phi_{0}(\tau^{\prime},S_{i},Y_{i}(0),X_{i})+\mathbb{E}\phi_{s}(\tau,S_{i})\phi_{s}(\tau^{\prime},S_{i}).
Proof.

We divide the proof into two steps. In the first step, we show the conditional stochastic equicontinuity of $\varpi_{n,1}^{w}(\tau)$ and $\varpi_{n,2}^{w}(\tau)$. In the second step, we show the finite-dimensional convergence of $\varpi_{n,1}^{w}(\tau)+\varpi_{n,2}^{w}(\tau)$ conditionally on the data.

Step 1. Following the same idea as in the proof of Lemma N.3, we define $\{(\xi_{i}^{s},X_{i}^{s},Y_{i}^{s}(1),Y_{i}^{s}(0)):1\leq i\leq n\}$ as a sequence of i.i.d. random variables with marginal distribution equal to that of $(\xi_{i},X_{i},Y_{i}(1),Y_{i}(0))|S_{i}=s$, and $N(s)=\sum_{i=1}^{n}1\{S_{i}<s\}$. The distribution of $\varpi_{n,1}^{w}(\tau)$ is the same as its counterpart with units ordered by strata and, within each stratum, with $A_{i}=1$ units first and $A_{i}=0$ units second, i.e.,

\displaystyle\varpi_{n,1}^{w}(\tau)|\{(A_{i},S_{i})_{i\in[n]}\}\stackrel{d}{=}\widetilde{\varpi}^{w}_{n,1}(\tau)|\{(A_{i},S_{i})_{i\in[n]}\},

and thus,

\displaystyle\varpi_{n,1}^{w}(\tau)\stackrel{d}{=}\widetilde{\varpi}^{w}_{n,1}(\tau), (N.2)

where

\displaystyle\widetilde{\varpi}_{n,1}^{w}(\tau)\equiv\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}(\xi_{i}^{s}-1)\phi_{1}(\tau,s,Y^{s}_{i}(1),X^{s}_{i})-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=N(s)+n_{1}(s)+1}^{N(s)+n(s)}(\xi_{i}^{s}-1)\phi_{0}(\tau,s,Y^{s}_{i}(0),X^{s}_{i}).

In addition, let

\displaystyle\widetilde{\varpi}^{w\star}_{n,1}(\tau)\equiv\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=\lfloor nF(s)\rfloor+1}^{\lfloor n(F(s)+\pi(s)p(s))\rfloor}(\xi_{i}^{s}-1)\phi_{1}(\tau,s,Y^{s}_{i}(1),X^{s}_{i})-\sum_{s\in\mathcal{S}}\frac{1}{\sqrt{n}}\sum_{i=\lfloor n(F(s)+\pi(s)p(s))\rfloor+1}^{\lfloor n(F(s)+p(s))\rfloor}(\xi_{i}^{s}-1)\phi_{0}(\tau,s,Y^{s}_{i}(0),X^{s}_{i}).

Following exactly the same argument as in the proof of Lemma N.3, we have

\displaystyle\sup_{\tau\in\Upsilon}|\widetilde{\varpi}_{n,1}^{w}(\tau)-\widetilde{\varpi}^{w\star}_{n,1}(\tau)|=o_{p}(1), (N.3)

and $\widetilde{\varpi}^{w\star}_{n,1}(\tau)$ is unconditionally stochastically equicontinuous, i.e., for any $\varepsilon>0$, as $n\rightarrow\infty$ followed by $\delta\rightarrow 0$, we have

\displaystyle\mathbb{E}\mathbb{P}_{\xi}\left(\sup_{\tau_{1},\tau_{2}\in\Upsilon,\tau_{1}<\tau_{2}<\tau_{1}+\delta}|\widetilde{\varpi}^{w\star}_{n,1}(\tau_{1})-\widetilde{\varpi}^{w\star}_{n,1}(\tau_{2})|\geq\varepsilon\right)=\mathbb{P}\left(\sup_{\tau_{1},\tau_{2}\in\Upsilon,\tau_{1}<\tau_{2}<\tau_{1}+\delta}|\widetilde{\varpi}^{w\star}_{n,1}(\tau_{1})-\widetilde{\varpi}^{w\star}_{n,1}(\tau_{2})|\geq\varepsilon\right)\rightarrow 0,

where $\mathbb{P}_{\xi}$ denotes the probability operator with respect to the bootstrap weights $\{\xi_{i}\}_{i\in[n]}$, conditional on the data. This implies the unconditional stochastic equicontinuity of $\varpi^{w}_{n,1}(\tau)$ due to (N.2) and (N.3), which further implies the conditional stochastic equicontinuity of $\varpi^{w}_{n,1}(\tau)$, i.e., for any $\varepsilon>0$, as $n\rightarrow\infty$ followed by $\delta\rightarrow 0$,

\displaystyle\mathbb{P}_{\xi}\left(\sup_{\tau_{1},\tau_{2}\in\Upsilon,\tau_{1}<\tau_{2}<\tau_{1}+\delta}|\widetilde{\varpi}^{w\star}_{n,1}(\tau_{1})-\widetilde{\varpi}^{w\star}_{n,1}(\tau_{2})|\geq\varepsilon\right)\stackrel{p}{\longrightarrow}0.

By a similar but simpler argument, the conditional stochastic equicontinuity of ϖn,2w(τ)\varpi_{n,2}^{w}(\tau) holds as well. This concludes the first step.

Step 2. We first show the asymptotic normality of $\varpi_{n,1}^{w}(\tau)+\varpi_{n,2}^{w}(\tau)$ conditionally on the data for a fixed $\tau$. Note

\displaystyle\varpi_{n,1}^{w}(\tau)+\varpi_{n,2}^{w}(\tau)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\xi_{i}-1)\left[A_{i}\phi_{1}(\tau,S_{i},Y_{i}(1),X_{i})-(1-A_{i})\phi_{0}(\tau,S_{i},Y_{i}(0),X_{i})+\phi_{s}(\tau,S_{i})\right]\equiv\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\xi_{i}-1)\mathcal{J}_{n,i}(\tau).

Conditionally on the data, $\{(\xi_{i}-1)\mathcal{J}_{n,i}(\tau)\}_{i\in[n]}$ is a sequence of independent but non-identically distributed random variables. In order to apply the Lindeberg-Feller central limit theorem, we only need to show that (1)

\displaystyle\frac{1}{n}\sum_{i=1}^{n}\mathcal{J}^{2}_{n,i}(\tau)\stackrel{p}{\longrightarrow}\Sigma(\tau,\tau),

where $\Sigma(\tau,\tau)$ is defined in Theorem 3, and (2) the Lindeberg condition holds, i.e.,

\displaystyle\frac{1}{n}\sum_{i=1}^{n}\mathcal{J}_{n,i}^{2}(\tau)\mathbb{E}(\xi_{i}-1)^{2}1\{|(\xi_{i}-1)\mathcal{J}_{n,i}(\tau)|\geq\sqrt{n}\varepsilon\}\stackrel{p}{\longrightarrow}0.

For part (1), we have

\displaystyle\frac{1}{n}\sum_{i=1}^{n}\mathcal{J}^{2}_{n,i}(\tau)=\sigma_{1}^{2}+2\sigma_{12}+\sigma_{2}^{2},

where

\displaystyle\sigma_{1}^{2}=\frac{1}{n}\sum_{i=1}^{n}\left[A_{i}\phi_{1}(\tau,S_{i},Y_{i}(1),X_{i})-(1-A_{i})\phi_{0}(\tau,S_{i},Y_{i}(0),X_{i})\right]^{2},
\displaystyle\sigma_{12}=\frac{1}{n}\sum_{i=1}^{n}\left[A_{i}\phi_{1}(\tau,S_{i},Y_{i}(1),X_{i})-(1-A_{i})\phi_{0}(\tau,S_{i},Y_{i}(0),X_{i})\right]\phi_{s}(\tau,S_{i}),

and

\displaystyle\sigma_{2}^{2}=\frac{1}{n}\sum_{i=1}^{n}\phi_{s}^{2}(\tau,S_{i}).

Note

\displaystyle\sigma_{1}^{2}=\frac{1}{n}\sum_{i=1}^{n}A_{i}\phi^{2}_{1}(\tau,S_{i},Y_{i}(1),X_{i})+\frac{1}{n}\sum_{i=1}^{n}(1-A_{i})\phi_{0}^{2}(\tau,S_{i},Y_{i}(0),X_{i})
\stackrel{d}{=}\frac{1}{n}\sum_{s\in\mathcal{S}}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}\phi^{2}_{1}(\tau,s,Y_{i}^{s}(1),X^{s}_{i})+\frac{1}{n}\sum_{s\in\mathcal{S}}\sum_{i=N(s)+n_{1}(s)+1}^{N(s)+n(s)}\phi_{0}^{2}(\tau,s,Y^{s}_{i}(0),X^{s}_{i})
\stackrel{p}{\longrightarrow}\sum_{s\in\mathcal{S}}p(s)\left[\pi(s)\mathbb{E}\phi^{2}_{1}(\tau,s,Y_{i}^{s}(1),X^{s}_{i})+(1-\pi(s))\mathbb{E}\phi^{2}_{0}(\tau,s,Y_{i}^{s}(0),X^{s}_{i})\right]
=\mathbb{E}\left[\pi(S_{i})\phi^{2}_{1}(\tau,S_{i},Y_{i}(1),X_{i})+(1-\pi(S_{i}))\phi^{2}_{0}(\tau,S_{i},Y_{i}(0),X_{i})\right],

where the convergence holds because $N(s)/n\stackrel{p}{\longrightarrow}F(s)$, $n_{1}(s)/n\stackrel{p}{\longrightarrow}\pi(s)p(s)$, $n(s)/n\stackrel{p}{\longrightarrow}p(s)$, and the partial sum process converges uniformly. Similarly,

\displaystyle\sigma_{12}\stackrel{d}{=}\frac{1}{n}\sum_{s\in\mathcal{S}}\sum_{i=N(s)+1}^{N(s)+n_{1}(s)}\phi_{1}(\tau,s,Y_{i}^{s}(1),X^{s}_{i})\phi_{s}(\tau,s)-\frac{1}{n}\sum_{s\in\mathcal{S}}\sum_{i=N(s)+n_{1}(s)+1}^{N(s)+n(s)}\phi_{0}(\tau,s,Y^{s}_{i}(0),X^{s}_{i})\phi_{s}(\tau,s)
\stackrel{p}{\longrightarrow}\sum_{s\in\mathcal{S}}p(s)\left[\pi(s)\mathbb{E}\phi_{1}(\tau,s,Y_{i}^{s}(1),X^{s}_{i})-(1-\pi(s))\mathbb{E}\phi_{0}(\tau,s,Y^{s}_{i}(0),X^{s}_{i})\right]\phi_{s}(\tau,s)=0,

where we use the fact that

\displaystyle\mathbb{E}\phi_{1}(\tau,s,Y_{i}^{s}(1),X^{s}_{i})=\mathbb{E}\phi_{0}(\tau,s,Y^{s}_{i}(0),X^{s}_{i})=0.

By the standard weak law of large numbers, we have

\[
\sigma_{2}^{2}\stackrel{p}{\longrightarrow}\mathbb{E}\phi_{s}^{2}(\tau,S_{i}).
\]

Therefore,

\[
\frac{1}{n}\sum_{i=1}^{n}\mathcal{J}^{2}_{n,i}(\tau)\stackrel{p}{\longrightarrow}\mathbb{E}\left[\pi(S_{i})\phi^{2}_{1}(\tau,S_{i},Y_{i}(1),X_{i})+(1-\pi(S_{i}))\phi^{2}_{0}(\tau,S_{i},Y_{i}(1),X_{i})\right]+\mathbb{E}\phi_{s}^{2}(\tau,S_{i})=\Sigma(\tau,\tau).
\]
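This limit combines the three terms above via the exact sample decomposition $\frac{1}{n}\sum_{i}\mathcal{J}_{n,i}^{2}(\tau)=\sigma_{1}^{2}+2\sigma_{12}+\sigma_{2}^{2}$, which holds pointwise because $A_{i}\in\{0,1\}$, so the $A_{i}(1-A_{i})$ cross terms vanish. A minimal numerical sketch, with Gaussian draws standing in as placeholders for the influence-function values, verifies the decomposition:

```python
import numpy as np

# Toy arrays standing in for phi_1, phi_0, phi_s; the Gaussian draws are
# placeholders, not the paper's actual influence functions.
rng = np.random.default_rng(1)
n = 10_000
A = rng.integers(0, 2, size=n)
phi1 = rng.normal(size=n)
phi0 = rng.normal(size=n)
phis = rng.normal(size=n)

# J_{n,i}(tau) = A_i phi_1 + (1 - A_i) phi_0 + phi_s, with A_i in {0,1}
J = A * phi1 + (1 - A) * phi0 + phis

lhs = np.mean(J**2)                                     # (1/n) sum_i J_{n,i}^2
sigma1_sq = np.mean(A * phi1**2 + (1 - A) * phi0**2)    # sample analogue of sigma_1^2
sigma12 = np.mean((A * phi1 + (1 - A) * phi0) * phis)   # sample analogue of sigma_12
sigma2_sq = np.mean(phis**2)                            # sample analogue of sigma_2^2
```

Because $A_{i}^{2}=A_{i}$ and $(1-A_{i})^{2}=1-A_{i}$, the identity `lhs == sigma1_sq + 2*sigma12 + sigma2_sq` holds up to floating-point error for any inputs.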

To verify the Lindeberg condition, we note that

\begin{align*}
&\frac{1}{n}\sum_{i=1}^{n}\mathcal{J}_{n,i}^{2}(\tau)\mathbb{E}(\xi_{i}-1)^{2}1\{|(\xi_{i}-1)\mathcal{J}_{n,i}(\tau)|\geq\sqrt{n}\varepsilon\}\\
&\leq\frac{1}{n(\sqrt{n}\varepsilon)^{q-2}}\sum_{i=1}^{n}|\mathcal{J}_{n,i}(\tau)|^{q}\mathbb{E}|\xi_{i}-1|^{q}\\
&\leq\frac{c}{n(\sqrt{n}\varepsilon)^{q-2}}\sum_{i=1}^{n}\left[|\phi_{1}(\tau,S_{i},Y_{i}(1),X_{i})|^{q}+|\phi_{0}(\tau,S_{i},Y_{i}(1),X_{i})|^{q}+|\phi_{s}(\tau,S_{i})|^{q}\right]=o_{p}(1),
\end{align*}

where the last equality is due to Assumption 3(ii) and the fact that $\eta_{i,a}(\tau,s)$ is bounded.
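The Markov-type truncation step in the first inequality above can be checked numerically. The sketch below is an illustration under stated assumptions: it uses standard exponential multipliers (so $\mathbb{E}\xi_{i}=1$ and $\mathbb{E}(\xi_{i}-1)^{2}=1$) and Gaussian placeholders for $\mathcal{J}_{n,i}(\tau)$, and confirms that the Monte Carlo left-hand side never exceeds the right-hand side:

```python
import numpy as np

# Numerical check of the Markov-type truncation bound behind the Lindeberg
# condition; J values and exponential multipliers xi are illustrative stand-ins.
rng = np.random.default_rng(2)
n, q, eps = 500, 4, 0.05
J = rng.normal(size=n)                    # stand-in for J_{n,i}(tau)
xi = rng.exponential(size=(10_000, 1))    # bootstrap multipliers with E xi = 1
thresh = np.sqrt(n) * eps

# LHS: (1/n) sum_i J_i^2 E (xi-1)^2 1{|(xi-1) J_i| >= sqrt(n) eps},
# with the expectation over xi computed by Monte Carlo
trunc = np.mean((xi - 1) ** 2 * (np.abs((xi - 1) * J) >= thresh), axis=0)
lhs = np.mean(J**2 * trunc)

# RHS: (1/(n (sqrt(n) eps)^{q-2})) sum_i |J_i|^q E |xi-1|^q
rhs = np.mean(np.abs(J) ** q) * np.mean(np.abs(xi - 1) ** q) / thresh ** (q - 2)
```

The inequality $x^{2}1\{|x|\geq t\}\leq|x|^{q}/t^{q-2}$ holds pointwise for each multiplier draw, so `lhs <= rhs` holds deterministically on the simulation grid.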

Finite-dimensional convergence of $\varpi_{n,1}^{w}(\tau)+\varpi_{n,2}^{w}(\tau)$ across $\tau$ can be established in the same manner using the Cramér–Wold device; the details are omitted. By the same calculation as above, the covariance kernel is shown to be

\begin{align*}
\lim_{n\rightarrow\infty}\frac{1}{n}\sum_{i=1}^{n}\mathcal{J}_{n,i}(\tau_{1})\mathcal{J}_{n,i}(\tau_{2})
&=\mathbb{E}\left[\pi(S_{i})\phi_{1}(\tau_{1},S_{i},Y_{i}(1),X_{i})\phi_{1}(\tau_{2},S_{i},Y_{i}(1),X_{i})\right]\\
&\quad+\mathbb{E}\left[(1-\pi(S_{i}))\phi_{0}(\tau_{1},S_{i},Y_{i}(1),X_{i})\phi_{0}(\tau_{2},S_{i},Y_{i}(1),X_{i})\right]\\
&\quad+\mathbb{E}\phi_{s}(\tau_{1},S_{i})\phi_{s}(\tau_{2},S_{i})=\Sigma(\tau_{1},\tau_{2}),
\end{align*}

which concludes the proof. ∎

Lemma N.6.

Suppose the Assumptions in Theorem 5.6 hold. Then,

\[
\sup_{\tau\in\Upsilon,a=0,1,s\in\mathcal{S}}\left\|\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(1\{Y_{i}\leq\hat{q}_{a}(\tau)\}-m_{a}(\tau,s,X_{i}))H_{h_{n}}(X_{i})\right\|_{\infty}=O_{p}\left(\sqrt{\frac{\log(n)}{n}}\right).
\]
Proof.

We focus on the case $a=1$; the argument for $a=0$ is the same. We have

\begin{align}
&\sup_{\tau\in\Upsilon}\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}(1\{Y_{i}\leq\hat{q}_{1}(\tau)\}-m_{1}(\tau,s,X_{i}))H_{h_{n}}(X_{i})\right\|_{\infty}\nonumber\\
&\leq\sup_{\tau\in\Upsilon}\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}(1\{Y_{i}\leq\hat{q}_{1}(\tau)\}-\mathbb{P}(Y_{i}(1)\leq\hat{q}_{1}(\tau)|X_{i},S_{i}=s))H_{h_{n}}(X_{i})\right\|_{\infty}\nonumber\\
&\quad+\sup_{\tau\in\Upsilon}\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}(\mathbb{P}(Y_{i}(1)\leq\hat{q}_{1}(\tau)|X_{i},S_{i}=s)-m_{1}(\tau,s,X_{i}))H_{h_{n}}(X_{i})\right\|_{\infty}\nonumber\\
&\leq\sup_{q\in\Re}\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}(1\{Y_{i}\leq q\}-\mathbb{P}(Y_{i}(1)\leq q|X_{i},S_{i}=s))H_{h_{n}}(X_{i})\right\|_{\infty}\nonumber\\
&\quad+\sup_{\tau\in\Upsilon}\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}(\mathbb{P}(Y_{i}(1)\leq\hat{q}_{1}(\tau)|X_{i},S_{i}=s)-m_{1}(\tau,s,X_{i}))H_{h_{n}}(X_{i})\right\|_{\infty}.\tag{N.4}
\end{align}

Define

\[
\mathcal{F}_{h}=\{1\{Y_{i}^{s}(1)\leq q\}H_{h_{n},h}(X_{i}):q\in\Re\},\qquad\mathcal{F}=\cup_{h\in[h_{n}]}\mathcal{F}_{h},
\]

where $H_{h_{n},h}(X_{i})$ denotes the $h$-th coordinate of $H_{h_{n}}(X_{i})$. For each $h\in[h_{n}]$, $\mathcal{F}_{h}$ is of the VC-type with fixed coefficients $(\alpha,v)$ and a common envelope $F_{i}=\|H_{h_{n}}(X_{i})\|_{2}\leq\zeta(h_{n})$, i.e.,

\[
\sup_{Q}N(\mathcal{F}_{h},e_{Q},\varepsilon\|F\|_{Q,2})\leq\left(\frac{\alpha}{\varepsilon}\right)^{v},\quad\forall\varepsilon\in(0,1],
\]

where the supremum is taken over all finitely discrete probability measures. This implies

\[
\sup_{Q}N(\mathcal{F},e_{Q},\varepsilon\|F\|_{Q,2})\leq\sum_{h\in[h_{n}]}\sup_{Q}N(\mathcal{F}_{h},e_{Q},\varepsilon\|F\|_{Q,2})\leq\left(\frac{\alpha h_{n}}{\varepsilon}\right)^{v},\quad\forall\varepsilon\in(0,1],
\]

i.e., \mathcal{F} is also of the VC-type with coefficients (αhn,v)(\alpha h_{n},v). In addition,

\[
\sup_{f\in\mathcal{F}}\mathbb{E}f^{2}\leq\max_{h\in[h_{n}]}\mathbb{E}H_{h_{n},h}^{2}(X_{i})\leq C<\infty.
\]

Then, Lemma N.2 implies

\begin{align*}
&\sup_{q\in\Re}\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}(1\{Y_{i}\leq q\}-\mathbb{P}(Y_{i}(1)\leq q|X_{i},S_{i}=s))H_{h_{n}}(X_{i})\right\|_{\infty}\\
&=O_{p}\left(\sqrt{\frac{\log(h_{n}\zeta(h_{n}))}{n}}+\frac{\zeta(h_{n})\log(\zeta(h_{n}))}{n}\right)=O_{p}\left(\sqrt{\frac{\log(n)}{n}}\right).
\end{align*}
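The $O_{p}(\sqrt{\log(n)/n})$ rate of such weighted empirical processes can be illustrated on a toy example. The sketch below is an assumption-laden stand-in: $Y$ is uniform so that $\mathbb{P}(Y\leq q)=q$, and a two-column basis replaces $H_{h_{n}}$; the supremum over $q\in\Re$ is evaluated at the order statistics, where the piecewise-linear process attains its extremes:

```python
import numpy as np

# Illustrative check of the sup_q rate for a weighted empirical process.
rng = np.random.default_rng(3)
n = 20_000
X = rng.uniform(size=n)
H = np.column_stack([np.ones(n), X])    # toy two-column basis in place of H_{h_n}
Y = rng.uniform(size=n)                 # uniform, so P(Y <= q) = q on [0, 1]

order = np.argsort(Y)
Ys = Y[order]
csum = np.cumsum(H[order], axis=0) / n  # (1/n) sum_{Y_i <= q} H(X_i) at q = Y_(k)
Hbar = H.mean(axis=0)                   # (1/n) sum_i H(X_i)

# sup over q of |(1/n) sum_i (1{Y_i <= q} - q) H(X_i)|, coordinate-wise
sup_stat = np.max(np.abs(csum - Ys[:, None] * Hbar))
rate = np.sqrt(np.log(n) / n)
```

For the intercept column this is the classical Kolmogorov–Smirnov statistic, whose magnitude is comfortably below a small multiple of $\sqrt{\log(n)/n}$ at this sample size.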

For the second term of (N.4), because $\sup_{q\in\Re,x\in\text{Supp}(X),s\in\mathcal{S}}f_{1}(q|x,s)<\infty$, we have

\begin{align*}
&\sup_{\tau\in\Upsilon}\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}(\mathbb{P}(Y_{i}(1)\leq\hat{q}_{1}(\tau)|X_{i},S_{i}=s)-m_{1}(\tau,s,X_{i}))H_{h_{n}}(X_{i})\right\|_{\infty}\\
&\leq C\sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}|H_{h_{n}}(X_{i})|\right\|_{\infty}\\
&\leq C\sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\left[|H_{h_{n}}(X^{s}_{i})|-\mathbb{E}(|H_{h_{n}}(X_{i})|\,|S_{i}=s)\right]\right\|_{\infty}\\
&\quad+C\sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|\left\|\mathbb{E}(|H_{h_{n}}(X_{i})|\,|S_{i}=s)\right\|_{\infty}\\
&=C\sup_{\tau\in\Upsilon}|\hat{q}_{1}(\tau)-q_{1}(\tau)|\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\left[|H_{h_{n}}(X^{s}_{i})|-\mathbb{E}(|H_{h_{n}}(X_{i})|\,|S_{i}=s)\right]\right\|_{\infty}+O_{p}(n^{-1/2})\\
&=O_{p}(n^{-1/2}),
\end{align*}

where the second-to-last equality holds by Assumption 8 and the bound $\left\|\mathbb{E}(|H_{h_{n}}(X_{i})|\,|S_{i}=s)\right\|_{\infty}\leq C<\infty$, and the last equality holds because, by an argument similar to the one used in bounding the first term on the RHS of (N.4), we can show that

\[
\left\|\frac{1}{n_{1}(s)}\sum_{i\in I_{1}(s)}\left[|H_{h_{n}}(X^{s}_{i})|-\mathbb{E}(|H_{h_{n}}(X_{i})|\,|S_{i}=s)\right]\right\|_{\infty}=O_{p}\left(\sqrt{\frac{\log(n)}{n}}\right).
\]

This concludes the proof. ∎

Lemma N.7.

Suppose the Assumptions in Theorem 5.6 hold and recall $\overline{\ell}$ defined in (L.1). Then $\overline{\ell}\big/\sqrt{h_{n}\log(n)/n}\rightarrow\infty$ w.p.a.1.

Proof.

Note that w.p.a.1,

\begin{align*}
\overline{\ell} &=\inf_{U\in\Re^{h_{n}}}\frac{\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})U)^{2}\right]^{3/2}}{\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}|H_{h_{n}}^{\top}(X_{i})U|^{3}}\\
&\geq\inf_{U\in\Re^{h_{n}}}\frac{\left[\frac{1}{n_{a}(s)}\sum_{i\in I_{a}(s)}(H_{h_{n}}^{\top}(X_{i})U)^{2}\right]^{1/2}}{\sup_{x\in\mathcal{X}}\|H_{h_{n}}(x)\|_{2}\|U\|_{2}}\geq\frac{\kappa_{1}^{1/2}}{\zeta(h_{n})},
\end{align*}

where the last inequality is due to Assumption 12. Therefore,

\[
\overline{\ell}\Big/\sqrt{h_{n}\log(n)/n}\geq\sqrt{\frac{\kappa_{1}n}{\zeta^{2}(h_{n})h_{n}\log(n)}}\rightarrow\infty\quad\text{w.p.a.1.}
\]
∎
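The chain of inequalities bounding $\overline{\ell}$ from below is deterministic for every direction $U$, so it can be verified directly on a toy design (the basis matrix, dimensions, and sampled directions below are illustrative, with $\kappa_{1}$ taken as the minimum eigenvalue of the empirical Gram matrix and $\zeta$ as the largest basis norm over the sample):

```python
import numpy as np

# Check of the lower bound ell_bar >= sqrt(kappa_1)/zeta on a toy design.
rng = np.random.default_rng(4)
n, h = 500, 3
Hmat = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, h - 1))])
G = Hmat.T @ Hmat / n
kappa1 = np.linalg.eigvalsh(G)[0]            # minimum eigenvalue of the Gram matrix
zeta = np.max(np.linalg.norm(Hmat, axis=1))  # largest basis norm over the sample

# ell_bar is an infimum over U; since the bound holds for every U,
# it also holds for the minimum over any set of sampled directions.
ratios = []
for _ in range(200):
    U = rng.normal(size=h)
    v = Hmat @ U
    ratios.append(np.mean(v**2) ** 1.5 / np.mean(np.abs(v) ** 3))
min_ratio = min(ratios)
bound = np.sqrt(kappa1) / zeta
```

For each sampled $U$, `ratio >= bound` follows from $\frac{1}{n}\sum_i|v_i|^{3}\leq\max_i|v_i|\cdot\frac{1}{n}\sum_i v_i^{2}$ and $\frac{1}{n}\sum_i v_i^{2}\geq\kappa_{1}\|U\|_{2}^{2}$, exactly as in the display above.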
