
Two Sample Unconditional Quantile Effect

Atsushi Inoue
Department of Economics, Vanderbilt University
Tong Li
Department of Economics, Vanderbilt University
Qi Xu
Department of Economics, Vanderbilt University
Abstract

This paper proposes a new framework to evaluate unconditional quantile effects (UQE) in a data combination model. The UQE measures the effect of a marginal counterfactual change in the unconditional distribution of a covariate on quantiles of the unconditional distribution of a target outcome. Under rank similarity and conditional independence assumptions, we provide a set of identification results for UQEs when the target covariate is continuously distributed and when it is discrete, respectively. Based on these identification results, we propose semiparametric estimators and establish their large sample properties under primitive conditions. Applying our method to a variant of Mincer’s earnings function, we study the counterfactual quantile effect of actual work experience on income.


Keywords: Counterfactual policy effect, unconditional quantile effect, data combination model, Mincer regression.

1 Introduction

Missing data is a ubiquitous problem in empirical studies. Consider the scenario where a researcher is interested in conducting counterfactual analysis on a target variable, but it is entirely missing from the dataset of interest. In such circumstances, counterfactual policy effects cannot be identified from the primary dataset alone, and therefore, external information and/or stronger identifying assumptions are necessary. In this paper, we utilize both to achieve identification. Specifically, we focus on the situation where the missing variable is available in another dataset, whose information can be used to recover the target policy parameters in the population of interest under a set of commonly assumed restrictions on both the data structure and the model primitives.

To fix ideas, consider the following example. Suppose we are interested in studying the effect of a counterfactual change in the distribution of actual labor market experience on some distributional feature of yearly earnings. Our main dataset does not record respondents’ work history, and therefore, we cannot recover their actual labor market experience. Suppose the variable is available from a second dataset, but it may not be a reliable source of information on income or it may not be representative of the target population we aim to analyze. In this case, we would benefit from combining information from both samples to identify and estimate our parameter of interest.

Research on counterfactual policy effects under data combination is scarce. Our paper fills this gap by proposing a new framework that accommodates such a data structure. In this paper, we focus on one particular type of counterfactual policy effect, the unconditional quantile effect (UQE). It measures the effect of a marginal change in the unconditional distribution of a single covariate on the quantiles of a target outcome. We provide identification results for UQE under various types of marginal distributional change. The key insight of our identification strategy is that some covariates present in both datasets can be excluded from the outcome equation. These covariates provide a source of exogenous variation that allows us to recover the joint distribution of the missing variables, which is otherwise not identified using the two samples separately.

The second contribution of the paper is to propose novel semiparametric estimators based on these identification results. The literature on the estimation of counterfactual quantile effects (see, e.g., Firpo et al. (2009b) and Sasaki et al. (2020)) focuses primarily on the marginal location shift (MLS) of a covariate. Departing from this literature, we provide estimators of UQE under two general types of counterfactual distributional changes, namely the marginal distributional shift (MDS) and the marginal quantile shift (MQS), the latter of which includes MLS as a special case. (The precise definitions of MLS, MDS, and MQS are given in Section 3.) To the best of our knowledge, large sample results for these two cases are new to the literature. We apply these results to study a variant of Mincer’s earnings function. Using data from Integrated Public Use Microdata Sample (IPUMS) as our main data source and the Panel Study of Income Dynamics (PSID) as the auxiliary sample, we investigate the counterfactual income effect of actual work experience. The effect profiles with MDS and MQS are found to be similar in shape.

This paper belongs to the growing literature on the unconditional policy effect. Since Firpo et al. (2009b) introduced the method of unconditional quantile regressions (UQR), the study of unconditional policy effects has gained much attention. In general, this parameter differs from the one identified by the conditional quantile regression (Koenker and Bassett Jr, 1978), where marginal effects on the conditional quantile are the locus of attention. Applied researchers are often interested in the shifts in the quantiles of the unconditional distribution of a target outcome. For instance, one may take an interest in how the wage distribution changes in response to marginal increases in some characteristics of the labor force, such as education level and experience. Conditional quantile regression cannot be applied to address this type of question, whereas UQR suits the goal.

Rothe (2012) generalizes the method of Firpo et al. (2009b), and analyzes a variety of counterfactual policy effects. He formalizes the idea of ceteris paribus distributional change and provides extensive results for both fixed and marginal policy shifts. Our identification framework is closely related to his treatment of the latter type. Focusing on the special case of quantile effects, we extend his identification results to a data combination setting and provide novel inference theories specifically tailored to the distinct features of combined samples. For recent developments in this literature, see Martínez-Iriarte and Sun (2020), Martínez-Iriarte (2020), and Sasaki et al. (2020). For a comprehensive survey on counterfactual distributions and decomposition methods, see Fortin et al. (2011).

Our paper also builds on the econometric methods of data combination. In economics, this strand of literature stems from the two-sample instrumental variables (TSIV) model that was first introduced by Klevmarken (1982), Angrist and Krueger (1992), and Arellano and Meghir (1992), and was later extended by Ridder and Moffitt (2007) and Inoue and Solon (2010), among others. Conceptually, the semiparametric data combination model we consider here is different from the traditional missing data problem (Robins et al., 1994). It is more closely related to the “verify-out-of-sample” model in Chen et al. (2008), and also to Imbens and Lancaster (1994), Fan et al. (2014), Graham et al. (2016), Hirukawa et al. (2020), and Buchinsky et al. (2021), to name a few.

The paper is organized as follows. In the next section, we describe the model and assumptions on the data structure. In Section 3 we introduce the parameter of interest, and then present identification results for continuously distributed and discrete target covariates, respectively. Section 4 discusses the estimation strategy and large sample results. We apply the method to study the income effect of real labor market experience in Section 5. Section 6 concludes.

2 Setup

The objective of our paper is to analyze the effect of a counterfactual change in the marginal distribution of the covariate of interest, $X$, on the quantiles of the target outcome, $Y$, under data combination. The precise definition of the counterfactual policy effect is provided in Section 3. When $X$ is exogenous, and all the variables relevant for analysis are observed from a single data source, counterfactual policy effects can be analyzed either directly by applying tools from Firpo et al. (2009b) and Rothe (2012), or indirectly by recovering the structural function using standard identification results such as Matzkin (2003) and Matzkin (2007). However, when the variables of interest are scattered among several different data sources, we face a fundamental identification problem: the conditional distribution of $Y$ given $X$ is not identified from any single sample. In this case, existing methods do not provide an immediate solution.

Throughout this paper, we consider the scenario where $Y$ and $X$ are sourced from two different data sets. The outcome is contained in the principal or main sample, $\mathcal{S}_s=\{Y_i,Z_i\}_{i=1}^{n_s}$, from the study population, $\mathcal{P}_s$. The target covariate is missing completely from $\mathcal{S}_s$. However, it is observed in the auxiliary sample, $\mathcal{S}_a=\{X_i,Z_i\}_{i=1}^{n_a}$, from the auxiliary population, $\mathcal{P}_a$, which does not contain observations of $Y$.

We now formally describe our structural model. We allow variables from the two populations to be determined by different mechanisms. For the study population,

$$Y_s = g_s(X_s, Z_1, \epsilon_s), \tag{1}$$
$$X_s = h_s(Z, \eta_s), \tag{2}$$

where $Y_s\in\mathcal{Y}\subset\mathbb{R}$ is the potential outcome in the study population, and $\epsilon_s\in\mathcal{E}\subset\mathbb{R}^{d_\epsilon}$ is a vector of unobserved heterogeneity terms. Equation (1) links the target outcome to a scalar variable, $X_s\in\mathcal{X}\subset\mathbb{R}$, and a vector of exogenous variables, $Z_1$. Here, $X_s$ is the potential covariate of interest in the study population, which is in turn determined by (2). We can think of (2) as the reduced-form relationship between $X_s$ and $Z$, where $Z:=(Z_1',Z_2')'\in\mathcal{Z}:=\mathcal{Z}_1\times\mathcal{Z}_2\subset\mathbb{R}^{d_z}$ includes both the exogenous variables in the outcome equation and a vector of excluded instruments, $Z_2$. The vector of instruments, $Z$, is available in both samples, and therefore serves to establish a link between the two samples. [Footnote 2: The support of $Z$ is not restricted. We allow the distribution of $Z$ to be continuous, discrete, or mixed. Nevertheless, to ease notational burden, we focus on the continuous case exclusively in what follows.] The model in (1) accommodates general nonseparability between covariates and the unobserved heterogeneity. We do not impose any parametric or shape restriction on $g_s$.

Variables in the auxiliary population are determined by

$$Y_a = g_a(X_a, Z_1, \epsilon_a),$$
$$X_a = h_a(Z, \eta_a),$$

where $g_a$ and $h_a$ are generally different from $g_s$ and $h_s$.

Let $R$ denote the sample membership indicator. That is, $R_i=1$ if the $i$-th draw comes from the study population, $i=1,\ldots,n:=n_s+n_a$. Let $Y:=RY_s+(1-R)Y_a$ and $X:=RX_s+(1-R)X_a$. If no variable is missing, we are able to observe $(Y_s,Y_a,X_s,X_a)$. However, in our context, only $RY$ and $(1-R)X$ are observed. We then construct a pseudo-merged sample $\mathcal{S}$ using the two data sources as $\mathcal{S}=\{R_i,R_iY_i,(1-R_i)X_i,Z_i\}_{i=1}^{n}$. Let $A:=(R,RY,(1-R)X,Z)$ and $W:=(X,Z_1)$ collect the observed variables and the covariates in the outcome equation, respectively. Throughout the paper, we arrange the data such that $R_i=1$ for $i=1,\ldots,n_s$ and $R_i=0$ for $i=n_s+1,\ldots,n$. The merged sample may not correspond to any real-world population. We impose the following set of assumptions on the merged sample so that it can mimic a random sample from a pseudo population. These assumptions are largely based on Assumption 1 in Graham et al. (2016).
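As a minimal sketch of how the pseudo-merged sample can be assembled, the two sources are stacked with the study observations first, and the unobserved entries are zeroed out by the products $R_iY_i$ and $(1-R_i)X_i$. The function name `build_merged_sample`, the tuple layout, and the toy records are hypothetical and serve only as an illustration.

```python
def build_merged_sample(study, auxiliary):
    """Stack the two samples into rows (R, R*Y, (1-R)*X, Z).

    study:     list of (y, z) pairs from the study sample (R = 1)
    auxiliary: list of (x, z) pairs from the auxiliary sample (R = 0)
    The names and the tuple layout are illustrative assumptions.
    """
    merged = [(1, y, 0.0, z) for (y, z) in study]        # X unobserved, so (1-R)*X = 0
    merged += [(0, 0.0, x, z) for (x, z) in auxiliary]   # Y unobserved, so R*Y = 0
    return merged


# Toy records: Z is a pair (Z1, Z2) in this illustration.
study = [(2.5, (1.0, 0.3)), (3.1, (0.0, 0.7))]
auxiliary = [(4.0, (1.0, 0.2)), (6.0, (0.0, 0.9)), (5.0, (1.0, 0.5))]

merged = build_merged_sample(study, auxiliary)
q_hat = sum(row[0] for row in merged) / len(merged)      # sample analogue of Q_0
```

Here `q_hat` equals $n_s/(n_s+n_a)$, the finite-sample counterpart of $Q_0$ in Assumption 2.1(b).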

Assumption 2.1 (Data Structure).
  (a) $\operatorname{Supp}(F_{Z|R=1})\subset\operatorname{Supp}(F_{Z|R=0})$.

  (b) (i) $n_s/(n_s+n_a)\to Q_0$; (ii) $R$ follows a Bernoulli distribution, with $\mathbb{E}[R]=Q_0$.

  (c) There is a unique measurable function $r(\cdot):\mathcal{Z}\mapsto[0,1]$, such that for all $z\in\mathcal{Z}$,

    $$\frac{f_{Z|R}(z|1)}{f_{Z|R}(z|0)}=\frac{1-Q_0}{Q_0}\,\frac{r(z)}{1-r(z)}.$$

  (d) (i) $Q_0\in(\epsilon_1,1-\epsilon_1)$ for some $\epsilon_1\in(0,1/2)$; (ii) $\epsilon_2<r(z)<1-\epsilon_2$ for some $\epsilon_2\in(0,1/2)$ and for all $z\in\mathcal{Z}$.

  (e) $(X_s|Z,R=1)\stackrel{d}{=}(X_a|Z,R=0)$.

Assumption 2.1(a) is a support condition on the commonly observed variables. It ensures that, for every observation in the study sample, we can find comparable units in the auxiliary sample. Assumption 2.1(b) imposes a pseudo randomization scheme on $R$, and therefore allows us to view the merged data as a random sample from the pseudo-merged population. Let $\ell(\cdot)$ denote the conditional likelihood ratio of $Z$ across the two populations, i.e. $\ell(z):=f_{Z|R}(z|1)/f_{Z|R}(z|0)$. Assumption 2.1(c) expresses this likelihood ratio as a function of $r(\cdot)$, which plays the role of the “propensity score” function of $R$ given $Z$. In our context, this is the probability that an observation belongs to the study population conditional on the value of the instrumental variables. The first part of Assumption 2.1(d) indicates that $n_s$ grows at the same order of magnitude as $n_a$. The second part of Assumption 2.1(d) ensures that the pseudo-true merged population is not degenerate conditional on any possible value of $Z$. By Assumption 2.1(b)–(d) and Bayes’ Law, we have $r(z)=\mathbb{P}(R=1|Z=z)$, and thus $r(\cdot)$ can be viewed as the propensity score function.
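The Bayes’ Law step can be spelled out as follows. This is a sketch that writes $f_Z$ for the density of $Z$ in the pseudo-merged population (notation introduced here only for illustration) and uses $\mathbb{P}(R=1)=Q_0$ from Assumption 2.1(b).

```latex
\begin{align*}
f_{Z|R}(z|1) &= \frac{\mathbb{P}(R=1|Z=z)\, f_Z(z)}{Q_0}, \qquad
f_{Z|R}(z|0) = \frac{\{1-\mathbb{P}(R=1|Z=z)\}\, f_Z(z)}{1-Q_0}, \\
\ell(z) &= \frac{f_{Z|R}(z|1)}{f_{Z|R}(z|0)}
         = \frac{1-Q_0}{Q_0}\,\frac{\mathbb{P}(R=1|Z=z)}{1-\mathbb{P}(R=1|Z=z)} .
\end{align*}
```

Comparing the last expression with the ratio in Assumption 2.1(c), and noting that $p\mapsto p/(1-p)$ is strictly increasing on $(0,1)$, the unique function satisfying the restriction is the conditional probability of study-sample membership given $Z=z$.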

Assumption 2.1(e) is a rank similarity condition. It requires the conditional distribution of $X_s$ given $Z$ in the principal population to coincide with that of $X_a$ in the auxiliary population. Assumption 2.1(e) is the only cross-population restriction we impose on our data structure, which means that the conditional distribution of $Y$ given $(X,Z)$, and therefore the conditional distribution of $Y$ given $Z$, as well as the marginal distribution of $Z$, are all allowed to differ across $\mathcal{P}_s$ and $\mathcal{P}_a$. This assumption is weaker than Assumption 1(ii) of Graham et al. (2016), as we do not impose a rank similarity condition on the outcome, which would imply $F_{Y_s|ZR=1}(\cdot|\cdot)=F_{Y_a|ZR=0}(\cdot|\cdot)$.

3 Identification

In this section, we first introduce the definition of UQE. Then, we develop a set of identification results for the cases when $X$ is continuously distributed and when it is discrete, respectively.

3.1 Parameter of Interest

Our definition of the unconditional policy effect depends on the notion of a counterfactual experiment, which is formally defined as follows,

Definition 1 (Counterfactual Experiment).

Let $\phi:=(\widetilde{\mathcal{U}}_s,\widetilde{G}_s,\widetilde{Z},\widetilde{R},\widetilde{\epsilon}_s,\widetilde{g}_s):\Omega\mapsto K([0,1])\times\mathcal{D}(\mathcal{X})\times\mathcal{Z}\times\{0,1\}\times\mathcal{E}\times l_2(\mathcal{X},\mathcal{Z}_1,\mathcal{E})$, where $K([0,1])$ is the collection of all non-empty closed subsets of the unit interval, and $\mathcal{D}(\mathcal{X})$ denotes the space of distribution functions on $\mathcal{X}$. We say $\Phi$ is the set of counterfactual experiments if, for all $\phi\in\Phi$, we have (i) $\widetilde{G}_s^{-1}(U_s)=\widetilde{G}_s^{-1}(U'_s)$ almost surely for all $U_s,U'_s\in\widetilde{\mathcal{U}}_s$; (ii) $(\widetilde{\epsilon}_s,\widetilde{Z},\widetilde{R})\stackrel{d}{=}(\epsilon_s,Z,R)$; (iii) $\widetilde{g}_s=g_s$; (iv) for all $U_s\in\mathcal{U}_s$ and $\widetilde{U}_s\in\widetilde{\mathcal{U}}_s$, there exist $\widetilde{U}'_s\in\widetilde{\mathcal{U}}_s$ and $U'_s\in\mathcal{U}_s$, respectively, such that $(\widetilde{U}'_s|\widetilde{Z}_1,\widetilde{R}=1)\stackrel{d}{=}(U_s|Z_1,R=1)$ and $(\widetilde{U}_s|\widetilde{Z}_1,\widetilde{R}=1)\stackrel{d}{=}(U'_s|Z_1,R=1)$, where $\mathcal{U}_s=\{\breve{U}\in\mathcal{U}[0,1]:(F^{-1}_{X_s|R}(\breve{U}_s|1)|Z_1,R=1)\stackrel{d}{=}(X_s|Z_1,R=1)\}$.

The definition of counterfactual experiments does not specify the counterfactual target covariate $\widetilde{X}_s$ directly. It is implicitly defined through the first two elements of $\phi$. The first element, $\widetilde{\mathcal{U}}_s$, is a set of rank variables associated with the counterfactual target covariate, $\widetilde{X}_s$. When $\widetilde{X}_s$ is absolutely continuous, $\widetilde{\mathcal{U}}_s$ becomes a singleton set, but the set is generally not degenerate when the distribution of $\widetilde{X}_s$ contains a mass point. The second component, $\widetilde{G}_s$, is the counterfactual distribution of $\widetilde{X}_s$ conditional on the study population. When the target covariate is continuously distributed, $\widetilde{G}_s$ is continuous and strictly increasing, and therefore $\widetilde{X}_s$ is uniquely determined by $\widetilde{X}_s=\widetilde{G}_s^{-1}(\widetilde{U}_s)$, where $\widetilde{U}_s$ is the only element in $\widetilde{\mathcal{U}}_s$. However, when the target covariate contains mass points, there is a set of counterfactual rank variables that correspond to the same target covariate in the study population. This equivalence class is defined by Condition (i).

Following Rothe (2012), we restrict our attention to counterfactual changes where only the marginal distribution of $X_s$ is changed, while the marginal distribution of $Z$ and the dependence structure between $X_s$ and $Z$ remain unaffected. This notion of a ceteris paribus change is formally characterized by Conditions (ii)–(iv). Condition (ii) implies that the joint distribution of the observed variables $(Z,R)$ and the latent variable $\epsilon_s$ remains unchanged across counterfactual experiments. Under Condition (iii), the structural production function, $g_s$, is also not affected by the counterfactual change. Condition (iv) imposes a rank similarity condition. It says that the conditional rank of the counterfactual target covariate follows the same distribution as under the status quo. Because rank variables may not be unique, the condition is framed as a set equivalence condition. When we restrict attention to absolutely continuous target covariates, both $\mathcal{U}_s$ and $\widetilde{\mathcal{U}}_s$ are singleton sets. Hence, this condition reduces to $(\widetilde{X}_s|\widetilde{Z}_1,\widetilde{R}=1)\stackrel{d}{=}(X_s|Z_1,R=1)$.

Each counterfactual experiment ϕ\phi represents a modification of the underlying economic system. It completely determines the counterfactual outcome in the study population. Yet we remain largely agnostic as to the counterfactual change in the auxiliary population. The definition also leaves the mechanism causing the change in the marginal distribution of the target covariate unspecified.

Remark 1.

Our definition of counterfactual experiments relaxes the rank invariance conditions imposed by Rothe (2012). Instead, counterfactual changes in our context only need to satisfy a rank similarity or copula invariance condition.

With the counterfactual experiments defined, we now construct the counterfactual covariate vector by $\widetilde{W}_G:=(\widetilde{G}_s^{-1}(\widetilde{U}_s),\widetilde{Z}_1')'$. The counterfactual outcome of the study population is then defined as $\widetilde{Y}_s=\widetilde{g}_s(\widetilde{W}_G,\widetilde{\epsilon}_s)$, which follows a marginal distribution, $F_{\widetilde{Y}_s}$, and a conditional distribution restricted to the principal population, $F_{\widetilde{Y}_s|R=1}$. Note that the unconditional distribution is not well-defined, due to the lack of information on counterfactual changes in the auxiliary population. Therefore, we focus exclusively on the counterfactual distribution conditional on the study population in what follows. When $X$ is discrete, a single counterfactual experiment is mapped to a set of counterfactual outcomes, and we denote the corresponding set of counterfactual distributions by $\mathcal{F}_{\widetilde{Y}_s}$.

In our context, the sequence of counterfactual distributions is defined in terms of the “marginal” distribution of the potential covariate $X_s$ in the study population, rather than the true unconditional distribution of the observed $X$. Although $X_s$ is missing from the main dataset, and therefore its marginal distribution cannot be directly identified from the study population, we show in Theorem 3.2 that it can be recovered from the auxiliary data under the rank similarity assumption we impose in Assumption 2.1.

The policy parameter we seek to identify in this paper is the pathwise derivative of counterfactual distributional effect conditional on the study population. It is adapted from the definition of the marginal partial distributional policy effect (MPPE) by Rothe (2012).

Definition 2 (Marginal Partial Distributional Policy Effect).

Let $\Phi^{\ast}:=\{\phi_t\}_{t\geq 0}\subset\Phi$ denote a sequence of counterfactual experiments, such that $\widetilde{G}_{s,t}\to F_{X_s|R=1}$ as $t\downarrow 0$. The MPPE for a given functional $\nu:\mathcal{D}(\mathcal{Y})\to\mathbb{R}$ and a sequence of $\widetilde{F}_{s,t}\in\mathcal{F}_{\widetilde{Y}_{s,t}}$ is defined by

$$MPPE(\nu,\{\widetilde{Y}_{s,t}\}_{t\geq 0}):=\left.\frac{\partial\nu(F_{\widetilde{Y}_{s,t}|R=1})}{\partial t}\right|_{t=0}=\lim_{t\downarrow 0}\frac{\nu(F_{\widetilde{Y}_{s,t}|R=1})-\nu(F_{Y_s|R=1})}{t}.$$

We consider two specific types of counterfactual distributional changes: MDS and MQS. The definition of the former is due to Firpo et al. (2009b). It denotes a small perturbation in the distribution of $X_s$, in the direction of $G$. MQS, on the other hand, considers a minuscule change in the quantiles of $X_s$. This type of policy change includes the MLS, $G^{-1}_{t,ls}(u):=F_{X_s|R}^{-1}(u|1)+t$, as a special case.

Definition 3 (Counterfactual Policy Distributions).
  • Marginal Distributional Shift (MDS): $G_{t,p}(x):=F_{X_s|R}(x|1)+t\,(G(x)-F_{X_s|R}(x|1))$.

  • Marginal Quantile Shift (MQS): $G^{-1}_{t,q}(u):=F_{X_s|R}^{-1}(u|1)+t\,(G^{-1}(u)-F_{X_s|R}^{-1}(u|1))$.

Figure 1: Marginal Distributional Shift and Marginal Quantile Shift. [Figure omitted: two panels, MDS and MQS, each plotting $F(x)$ and $G_t(x)$ against $x$ as $t\downarrow 0$.]

Remark 2.

Figure 1 illustrates how the rates of change between the two types of counterfactuals are related. Under the condition that $F_{X_s|R=1}$ is compactly supported with strictly positive density on $\mathcal{X}$, MQS in the direction of $q(x)$ can be approximated in the limit by MDS with $G(x)=F_{X_s|R}(x|1)-f_{X_s|R}(x|1)\,q(x)$.
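The approximation in Remark 2 can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the paper: it takes a hypothetical status-quo CDF on $[0,1]$ with density $(1+x)/1.5$, reads "MQS in the direction of $q$" as the counterfactual covariate $X+t\,q(X)$ with the hypothetical choice $q(x)=x(1-x)$, and compares the exact MQS distribution with the MDS distribution built from $G=F-fq$. The gap should shrink at rate $t^2$.

```python
def F(x):
    """Hypothetical status-quo CDF on [0, 1] with density f(x) = (1 + x) / 1.5."""
    return (x + 0.5 * x * x) / 1.5


def f(x):
    """Density of the hypothetical status-quo distribution."""
    return (1.0 + x) / 1.5


def q(x):
    """Hypothetical shift direction; it vanishes at 0 and 1, so the support stays [0, 1]."""
    return x * (1.0 - x)


def mqs_cdf(x, t):
    """Exact CDF of X + t*q(X): invert m_t(v) = v + t*q(v) by bisection, then apply F."""
    lo, hi = 0.0, 1.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if mid + t * q(mid) < x:
            lo = mid
        else:
            hi = mid
    return F(0.5 * (lo + hi))


def mds_cdf(x, t):
    """MDS distribution F + t*(G - F) with G = F - f*q, as in Remark 2."""
    G = F(x) - f(x) * q(x)
    return F(x) + t * (G - F(x))


def max_gap(t, grid=200):
    """Largest absolute difference between the two CDFs on an equally spaced grid."""
    return max(abs(mqs_cdf(i / grid, t) - mds_cdf(i / grid, t)) for i in range(grid + 1))


gap_small = max_gap(0.01)
gap_ratio = max_gap(0.02) / max_gap(0.01)  # close to 4 if the gap is of order t^2
```

A ratio near 4 when $t$ doubles indicates that the two counterfactual distributions agree to first order in $t$, which is what matters for the pathwise derivative.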

Turning to the case of quantiles, the quantile operator for a particular $\tau$ is defined by $\nu_{\tau}(F_{Y_s|R=1}):=F_{Y_s|R=1}^{-1}(\tau)$. With the understanding that the MPPE associated with a counterfactual experiment is generally a set when $X$ is discretely valued, we suppress the index with respect to $\{\widetilde{Y}_{s,t}\}_{t\geq 0}$ for notational convenience. We denote the MPPE with MDS and the MPPE with MQS, each of the form $MPPE(\nu_{\tau},\{\widetilde{Y}_{s,t}\}_{t\geq 0})$ for the corresponding sequence of experiments, by $UQE_p(\tau,G)$ and $UQE_q(\tau,G)$, respectively. Here, and in what follows, the qualifier “unconditional” in UQE should be understood as conditional on (or relative to) the study population.
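To make the limit in Definition 2 concrete for $\nu_\tau$, the following sketch approximates the UQE under an MLS by a finite difference. Everything in it is an illustrative assumption: the linear outcome equation with slope `beta`, the standard normal draws, the absence of $Z_1$, and the helper names. Reusing the same draws of the covariate and the error in both scenarios holds ranks fixed, which is a special case of the rank similarity requirement.

```python
import math
import random


def empirical_quantile(values, tau):
    """Empirical tau-quantile: smallest order statistic whose ECDF is at least tau."""
    ordered = sorted(values)
    k = max(0, math.ceil(tau * len(ordered)) - 1)
    return ordered[k]


rng = random.Random(0)
n = 5000
beta = 2.0  # hypothetical slope of the outcome equation
x = [rng.gauss(0.0, 1.0) for _ in range(n)]    # status-quo covariate draws
eps = [rng.gauss(0.0, 1.0) for _ in range(n)]  # unobserved heterogeneity draws


def g(x_value, eps_value):
    """Hypothetical structural function, linear for simplicity."""
    return beta * x_value + eps_value


def finite_difference_uqe(tau, t):
    """(nu_tau(F_t) - nu_tau(F_0)) / t under the location shift X -> X + t."""
    y_status_quo = [g(xi, ei) for xi, ei in zip(x, eps)]
    y_shifted = [g(xi + t, ei) for xi, ei in zip(x, eps)]
    return (empirical_quantile(y_shifted, tau) - empirical_quantile(y_status_quo, tau)) / t


uqe_median = finite_difference_uqe(0.5, 0.01)
```

With a linear `g`, every outcome moves by `beta * t`, so the finite difference equals `beta` at every quantile up to floating-point error. Replacing `g` with a nonlinear function would make the effect vary with $\tau$.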

3.2 Identification of $F_{Y_s|X_sZ_1R=1}$

If there is no missing variable, the joint distribution of $(Y,X,Z)$ is directly identifiable from a random sample. Under data combination, however, only the “marginal” conditional distributions, $F_{Y_s|ZR=1}$ and $F_{X_a|ZR=0}$, can be separately identified from the two samples. The conditional distribution, $F_{Y_s|X_sZ_1R=1}$, is generally not identifiable without further cross-population assumptions.

Instead of seeking identification of the entire conditional distribution, $F_{Y_s|X_sZ_1R=1}(\cdot|\cdot,\cdot,1)$, we demonstrate in Sections 3.3 and 3.4 that UQE, and MPPE in general, can be identified using information at a single point of the distribution. [Footnote 3: Information on the entire distribution is crucial for the conditional quantile regression approach, even if the researcher may be interested in a single fixed quantile.] This allows us to obtain identification under much milder restrictions on the pseudo-merged population. Identification is achieved through the excluded instrumental variables, $Z_2$.

To ease notational burden, define $\Lambda(x,z_1):=F_{Y_s|X_sZ_1R}(q_{\tau}|x,z_1,1)$ for all $x\in\mathcal{X}$ and $z_1\in\mathcal{Z}_1$.

Assumption 3.1.

$\epsilon_s\perp\!\!\!\perp Z_2\,|\,X_s,Z_1,R=1.$

Assumption 3.1 implies that $Z_2$ can be excluded from the outcome equation, and therefore can be used as a source of exogenous variation to proxy for the missing covariate in the study population. Note that $\Lambda$ is generally not identified without an exogenous instrument $Z_2$. We illustrate this point with the example below.

Under Assumptions 2.1 and 3.1, the following moment-matching equation holds for all $z\in\mathcal{Z}$:

$$\mathbb{E}\left[1(Y\leq q_{\tau})\,|\,Z,R=1\right]-\mathbb{E}\left[\Lambda(W)\,|\,Z,R=0\right]=0. \tag{3}$$

Equivalently, $\Lambda$ can be identified based on a likelihood-ratio-weighting equation,

$$\mathbb{E}\left[\left.R\,1(Y\leq q_{\tau})-(1-R)\,\Lambda(W)\,\frac{r(Z)}{1-r(Z)}\,\right|\,Z\right]=0. \tag{4}$$
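The equivalence of the two moment conditions can be verified in two lines. The sketch below uses only the law of iterated expectations and the propensity-score interpretation of $r(\cdot)$ from Assumption 2.1, i.e. that $r(Z)$ is the conditional probability of $R=1$ given $Z$.

```latex
\begin{align*}
\mathbb{E}\left[R\,1(Y\leq q_{\tau})\,\middle|\,Z\right]
  &= r(Z)\,\mathbb{E}\left[1(Y\leq q_{\tau})\,\middle|\,Z,R=1\right], \\
\mathbb{E}\left[(1-R)\,\Lambda(W)\,\frac{r(Z)}{1-r(Z)}\,\middle|\,Z\right]
  &= \{1-r(Z)\}\,\mathbb{E}\left[\Lambda(W)\,\middle|\,Z,R=0\right]\frac{r(Z)}{1-r(Z)}
   = r(Z)\,\mathbb{E}\left[\Lambda(W)\,\middle|\,Z,R=0\right].
\end{align*}
```

Subtracting the second line from the first shows that the left-hand side of the weighted equation equals $r(Z)$ times the left-hand side of the moment-matching equation. Since $r(Z)>\epsilon_2>0$ by Assumption 2.1(d), one vanishes if and only if the other does.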

The next assumption is about the global identification of $\Lambda$.

Assumption 3.2.

$\Lambda$ is the unique solution to (3) or (4) almost surely.

Assumption 3.2 is a high-level condition. It is implied by a bounded completeness condition. [Footnote 4: For two random elements $U$ and $V$, we say $U$ is bounded complete for $V$, relative to a subpopulation $S=s$, if for all bounded measurable functions $\delta(\cdot)$, $\mathbb{E}[\delta(U)|V,S=s]=0$ implies $\delta(U)\equiv 0$ almost surely.] Note that $\Lambda$ is globally identified as long as $\mathbb{E}[\Lambda(W)-\widetilde{\Lambda}(W)|Z,R=0]=0$ implies $\Lambda=\widetilde{\Lambda}$, which follows immediately if $\Lambda$ is measurable with respect to $W$, and $W$ is bounded complete for $Z$, relative to the auxiliary population. Bounded completeness is weaker than the commonly adopted completeness condition appearing in Newey and Powell (2003) and Fan et al. (2014). We refer readers to Hoeffding et al. (1977), Blundell et al. (2007), and Lehmann (1986) for detailed discussions.

In Section 4, we base our estimation and inference on parametric identification of $\Lambda$. In this case, we assume that $F_{Y_s|X_sZ_1R}(q_{\tau}|x,z_1,1)=\Lambda(x,z_1;\beta_0)$, for $\beta_0\in\theta_{\beta}\subset\mathbb{R}^{d_{\beta}}$. In Lemma A.1, we provide a set of sufficient conditions which allow us to establish a global parametric identification condition analogous to Assumption 3.2.

Lemma 3.1.

$F_{Y_s|X_sZ_1R}(q_{\tau}|\cdot,\cdot,1)$ is point identified under Assumptions 2.1–3.2.

Lemma 3.1 establishes the nonparametric identification of $\Lambda$. The proof for the parametric case follows exactly the same lines, so we omit it here. In the next example, we verify the identification assumptions in a conditional normal model.

Example 1 (Conditional Normal Model).

Let the structural equations of the study population be given by

$$Y_s = g_s(X_s,Z_1)+\epsilon_s,$$
$$X_s = h_s(Z)+\eta_s,$$

where $\epsilon_s$ and $\eta_s$ are jointly normally distributed. Specifically, for positive-valued functions $\psi_y(\cdot)$ and $\psi_x(\cdot)$, we have

$$(\epsilon_s,\eta_s)\,|\,Z,R=1\sim N\left(0,\begin{pmatrix}\psi_y(Z_1)&0\\ 0&\psi_x(Z_1)\end{pmatrix}\right).$$

Then, $\Lambda(w)=\Phi(\psi_y(z_1)^{-1/2}(q_{\tau}-g_s(w)))$, where $\Phi(\cdot)$ denotes the CDF of the standard normal distribution. Suppose the reduced form of $X$ given $Z$ in the auxiliary population is $X_a=h_a(Z)+\eta_a$. Then Assumption 2.1(e) is satisfied if $h_s=h_a=h$ and $(\eta_s|Z,R=1)\stackrel{d}{=}(\eta_a|Z,R=0)$. Assumption 3.1 holds if $Z_2\perp\!\!\!\perp\epsilon_s\,|\,X_s,Z_1,R=1$. Assume, in addition, that conditional on $z_1$, $\operatorname{Supp}(Z_2)$ contains an open set and that $h(z_1,\cdot)$ maps open sets of $z_2$ into open sets. Assumption 3.2 then follows by Theorem 2.2 in Newey and Powell (2003).

Turning to the linear case, let $g_s(w)=\gamma_{s_1}x+\gamma_{s_2}'z_1$, $h(z)=\delta_1'z_1+\delta_2'z_2$, $\psi_y=\psi_x=1$, and $\eta_s\perp\!\!\!\perp(\epsilon_s,Z_2)$. Then $\mathbb{E}\left[1(Y\leq q_{\tau})|Z,R=1\right]=\Phi\big((q_{\tau}-(\gamma_{s_1}\delta_1+\gamma_{s_2})'Z_1-\gamma_{s_1}\delta_2'Z_2)/(1+\gamma_{s_1}^{2})^{1/2}\big)$. As a consequence, $(\gamma_{s_1},\gamma_{s_2}')'$ is uniquely determined by (3) or (4) if and only if $\delta_2\neq 0$.
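The closed form in the linear case, and the moment-matching equation (3) behind it, can be checked by simulation. The sketch below is illustrative only: the parameter values, the conditioning point, the scalar $Z_1$ and $Z_2$, and the helper names are all assumptions. It estimates the study-side moment by simulating $Y_s$ given $Z$, estimates the auxiliary-side moment by averaging $\Lambda(X_a,Z_1)$ given $Z$, and compares both with the closed-form expression.

```python
import math
import random


def std_normal_cdf(v):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


# Hypothetical parameter values (scalar Z1 and Z2 for simplicity).
gamma1, gamma2 = 0.8, -0.5   # coefficients on X and Z1 in the outcome equation
delta1, delta2 = 0.4, 1.2    # coefficients on Z1 and Z2 in the reduced form
q_tau = 0.3                  # value playing the role of the tau-quantile
z1, z2 = 0.5, -0.2           # conditioning point Z = (z1, z2)


def lam(x_value, z1_value):
    """Lambda(w) = Phi(q_tau - g_s(w)) when psi_y = 1."""
    return std_normal_cdf(q_tau - gamma1 * x_value - gamma2 * z1_value)


rng = random.Random(12345)
n = 100_000
h = delta1 * z1 + delta2 * z2  # common reduced-form mean of X given Z

# Study side: P(Y_s <= q_tau | Z, R = 1), with X_s = h + eta_s and Y_s = g_s + eps_s.
study_side = sum(
    1.0
    for _ in range(n)
    if gamma1 * (h + rng.gauss(0.0, 1.0)) + gamma2 * z1 + rng.gauss(0.0, 1.0) <= q_tau
) / n

# Auxiliary side: E[Lambda(X_a, Z1) | Z, R = 0], with X_a = h + eta_a.
aux_side = sum(lam(h + rng.gauss(0.0, 1.0), z1) for _ in range(n)) / n

closed_form = std_normal_cdf(
    (q_tau - (gamma1 * delta1 + gamma2) * z1 - gamma1 * delta2 * z2)
    / math.sqrt(1.0 + gamma1 ** 2)
)
```

Both Monte Carlo estimates should agree with `closed_form` up to simulation noise. Setting `delta2 = 0.0` removes the dependence on `z2`, which is the case in which the coefficients are no longer separately recoverable.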

3.3 Identification with Continuously Distributed $X$

In this section, we establish the identification of $UQE_{q}$ and $UQE_{p}$ when the distribution of $X$ is absolutely continuous. Before stating the main result, we need some additional identifying assumptions.

Assumption 3.3.
  (a) (i) $\epsilon_{s}\perp\!\!\!\perp U_{s}\,|\,Z_{1},R=1$; (ii) there exists a $t_{0}$ sufficiently close to 0 such that, for all $t\leq t_{0}$ and $\phi_{t}\in\Phi^{\ast}$, $\widetilde{\epsilon}_{s,t}\perp\!\!\!\perp\widetilde{U}_{s,t}\,|\,\widetilde{Z}_{1,t},\widetilde{R}_{t}=1$.

  (b) $\operatorname{Supp}(G)\subset\operatorname{Supp}(F_{X|ZR}(\cdot|Z,1))$ almost surely.

Assumption 3.4.

$F_{Y|R=1}$ is continuously differentiable in an open neighborhood of $q_{\tau}$ with strictly positive density function $f_{Y|R=1}$.

Assumption 3.3(a)(i) says that, conditional on $Z_{1}$, the structural error $\epsilon_{s}$ is independent of the rank variable $U_{s}$ in the study population. This is much weaker than the commonly assumed strict independence condition that $X_{s}$ is independent of $\epsilon_{s}$ unconditionally. Conditional exogeneity has also been imposed by Firpo et al. (2009b), Rothe (2012), and Chernozhukov, Fernández-Val and Melly (2013), among others. Assumption 3.3(a)(ii) requires the conditional independence condition of part (i) to hold when counterfactual experiments get sufficiently close to the status quo. Under the rank invariance condition imposed by Rothe (2012), it is automatically implied by Assumption 3.3(a)(i). Assumption 3.3(b) ensures that the conditional distribution of $Y_{s}$ given $W$ is identified over the support of $W$. Assumption 3.4 imposes a smoothness condition on the distribution of the target outcome, which implies that $F_{Y|R=1}^{-1}$ is Hadamard differentiable at $F_{Y|R=1}$, tangentially to the set of functions that are continuous at $q_{\tau}$.

The main theoretical result of this section is given as follows.

Theorem 3.2.

Suppose that Assumptions 2.1-3.4 hold and that the distribution of $X$ is absolutely continuous with respect to the Lebesgue measure. Then both $UQE_{p}(\tau,G)$ and $UQE_{q}(\tau,G)$ are identified.

(a) For $UQE_{q}$, we have

$$UQE_{q}(\tau,G)=-\dfrac{1}{f_{Y_{s}|R}(q_{\tau}|1)(1-Q_{0})}\mathbb{E}\left[(1-R)\ell(Z)\Lambda_{x}(X,Z_{1})g_{q}(X)\right],$$

where $g_{q}(x):=G^{-1}(F_{X_{s}|R}(x|1))-x$, and $F_{X_{s}|R}(x|1)=\frac{1}{1-Q_{0}}\mathbb{E}\left[(1-R)\ell(Z)1(X\leq x)\right]$.

(b) Suppose in addition that $\mathcal{X}$ is compact, and $F_{X_{s}|R=1}$ is continuously differentiable on $\mathcal{X}$ with strictly positive density function $f_{X_{s}|R=1}$. Then we have

$$UQE_{p}(\tau,G)=-\dfrac{1}{f_{Y_{s}|R}(q_{\tau}|1)(1-Q_{0})}\mathbb{E}\left[(1-R)\ell(Z)\Lambda_{x}(X,Z_{1})g_{p}(X)\right],$$

where $g_{p}(x):=-\frac{G(x)-F_{X_{s}|R}(x|1)}{f_{X_{s}|R}(x|1)}$, and $f_{X_{s}|R}(x|1)=\frac{1}{1-Q_{0}}\partial\mathbb{E}\left[(1-R)\ell(Z)1(X\leq x)\right]/\partial x$.
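To make the two direction functions concrete, the following sketch evaluates $g_{q}$ and $g_{p}$ in a hypothetical example. Everything specific here is an assumption for illustration: the two populations are taken to be identical so that $\ell\equiv 1$, the status quo distribution of $X$ is standard normal, and $G$ is a normal location shift. The weighted empirical CDF is a sample analog of $F_{X_{s}|R}(x|1)$ computed on the auxiliary observations only.

```python
from statistics import NormalDist
import numpy as np

def weighted_cdf(x, x_aux, ell_aux):
    """Sample analog of F_{X_s|R}(x|1): auxiliary-sample mean of ell(Z_i) * 1(X_i <= x)."""
    return float(np.mean(ell_aux * (x_aux <= x)))

def g_q(x, G_inv, F):
    """g_q(x) = G^{-1}(F(x)) - x."""
    return G_inv(F(x)) - x

def g_p(x, G, F, f):
    """g_p(x) = -(G(x) - F(x)) / f(x)."""
    return -(G(x) - F(x)) / f(x)

rng = np.random.default_rng(1)
x_aux = rng.standard_normal(200_000)   # hypothetical auxiliary draws of X
ell_aux = np.ones_like(x_aux)          # assumption: identical populations, so ell = 1

delta = 0.5                            # hypothetical location shift defining G
G_shift = NormalDist(mu=delta, sigma=1.0)
F_hat = lambda x: weighted_cdf(x, x_aux, ell_aux)
gq_values = [g_q(x, G_shift.inv_cdf, F_hat) for x in (-1.0, 0.0, 1.0)]

# For a very small shift, g_p evaluated with the exact normal CDF and density
# is close to the shift itself in this example.
base = NormalDist()
small = 0.01
gp_at_zero = g_p(0.0, NormalDist(mu=small).cdf, base.cdf, base.pdf)
```

In this example a pure location shift gives $g_{q}(x)\approx\delta$ at every evaluation point, up to sampling error in the empirical CDF.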

Remark 3.

The compactness condition on $\mathcal{X}$ is assumed to ensure the existence of the pathwise derivative of the inverse map. It can be relaxed by imposing a boundary condition on $\Lambda_{x}$. Specifically, we may assume that $\Lambda_{x}$ vanishes when $x\not\in[F_{X_{s}|R=1}(q_{1})+\epsilon,F_{X_{s}|R=1}(q_{2})-\epsilon]$, for $0<q_{1}<q_{2}<1$ and some $\epsilon>0$.

3.4 Identification with Discrete Covariate

Let the support of $X$ be $\{x^{1},\dots,x^{l}\}$. When $X$ is discrete, MQS is not well-defined and we consider MDS only, with counterfactual experiments defined through a fixed discrete distribution, $G$.

Assumption 3.5.
  (a) (i) $\epsilon_{s}\perp\!\!\!\perp U_{s}\,|\,Z_{1},R=1$, for all $U_{s}\in\mathcal{U}_{s}$; (ii) there exists a $t_{0}$ sufficiently close to 0 such that, for all $t\leq t_{0}$ and $\phi_{t}\in\Phi^{\ast}$, $\epsilon_{s,t}\perp\!\!\!\perp\widetilde{U}_{s,t}\,|\,\widetilde{Z}_{1,t},\widetilde{R}_{t}=1$, for all $\widetilde{U}_{s,t}\in\widetilde{\mathcal{U}}_{s,t}$.

  (b) $\operatorname{Supp}(G)\subset\operatorname{Supp}(F_{X_{s}|R=1})$.

  (c) For all $U_{s}\in\mathcal{U}_{s}$, $F_{U_{s}|Z_{1}R}(u_{s}|z_{1},1)$ is continuously differentiable in $u_{s}$, for all $z_{1}\in\mathcal{Z}_{1}$.

Assumption 3.5(a) is the counterpart of Assumption 3.3(a) for discrete covariates. Since the rank variables are no longer uniquely pinned down by strictly increasing quantile functions, we strengthen Assumption 3.3(a) so that conditional independence holds for all the rank variables in the equivalence class. With this identifying assumption in hand, we are ready to present the following identification result. For $j=1,\dots,l$, let the period bound generating function be defined by

$$h_{q_{\tau}}(x^{j},x^{j-1},z_{1}):=-\dfrac{(\Lambda(x^{j-1},z_{1})-\Lambda(x^{j},z_{1}))\cdot(G(x^{j-1})-F_{X_{s}|R}(x^{j-1}|1))}{f_{Y_{s}|R}(q_{\tau}|1)}.$$
Theorem 3.3.

Suppose that Assumptions 2.1-3.2, 3.4, and 3.5 hold. Then $UQE_{p}(\tau,G)$ is partially identified, with

$$UQE_{p}(\tau,G)\in\left[\sum_{j\in\mathcal{J}_{+}}h_{q_{\tau}}(x^{j},x^{j-1},z^{\dagger}_{1,j})+\sum_{j\in\mathcal{J}_{-}}h_{q_{\tau}}(x^{j},x^{j-1},z^{\ast}_{1,j}),\ \sum_{j\in\mathcal{J}_{+}}h_{q_{\tau}}(x^{j},x^{j-1},z^{\ast}_{1,j})+\sum_{j\in\mathcal{J}_{-}}h_{q_{\tau}}(x^{j},x^{j-1},z^{\dagger}_{1,j})\right],$$

where $\mathcal{J}_{+}:=\{j\in\{1,\dots,l\}:G(x^{j-1})\leq F_{X_{s}|R}(x^{j-1}|1)\}$ ($\mathcal{J}_{-}$ is analogously defined), $z_{1,j}^{\ast}:=\arg\sup_{z_{1}\in\mathcal{Z}_{1}}(\Lambda(x^{j-1},z_{1})-\Lambda(x^{j},z_{1}))$, $z_{1,j}^{\dagger}:=\arg\inf_{z_{1}\in\mathcal{Z}_{1}}(\Lambda(x^{j-1},z_{1})-\Lambda(x^{j},z_{1}))$, and $F_{X_{s}|R}(x^{j}|1)=\mathbb{E}[\frac{1-R}{1-Q_{0}}\ell(Z)1(X\leq x^{j})]$, for $j\in\{1,\dots,l\}$.

Theorem 3.3 indicates that $UQE_{p}$ is generally partially identified, with bounds generated by $h_{q_{\tau}}$. In the special case when $\Lambda(x,z_{1})$ is constant in $z_{1}$, the identified set of $UQE_{p}$ reduces to a singleton.

If $X$ is binary and $G_{t,p}(x)=1\{0\leq x<1\}(F_{X_{s}|R}(0|1)-t)+1\{x\geq 1\}$, $h_{q_{\tau}}$ reduces to $-\left(\Lambda(1,z_{1})-\Lambda(0,z_{1})\right)/f_{Y|R}(q_{\tau}|1)$. In this case, Theorem 3.3 corresponds to the two-sample generalization of Theorem 5 in Rothe (2012), when $\nu$ in that paper is the quantile functional.
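The bounds in Theorem 3.3 can be sketched numerically as follows. This is an illustration under explicit assumptions rather than the paper's implementation: the supremum and infimum over $z_{1}$ are taken over a finite grid, all inputs are hypothetical numbers, and the $j=1$ term is dropped on the assumption that $G$ and $F_{X_{s}|R}$ are both zero below the smallest support point. Taking the row-wise minimum and maximum of $h_{q_{\tau}}$ over the grid is equivalent to choosing $z^{\dagger}_{1,j}$ or $z^{\ast}_{1,j}$ according to the sign of $G(x^{j-1})-F_{X_{s}|R}(x^{j-1}|1)$.

```python
import numpy as np

def uqe_p_bounds(Lam, G, F, f_y):
    """Bounds on UQE_p for discrete X.

    Lam : (l, m) array, Lam[j, k] = Lambda(x^{j+1}, z1_k) on a grid of m values of z1.
    G, F: length-l arrays holding G(x^j) and F_{X_s|R}(x^j | 1).
    f_y : density of Y at q_tau in the study population.
    Sums h over j = 2, ..., l (the j = 1 term is assumed to be zero).
    """
    d_lam = Lam[:-1, :] - Lam[1:, :]        # Lambda(x^{j-1}, z1) - Lambda(x^j, z1)
    gap = (G[:-1] - F[:-1])[:, None]        # G(x^{j-1}) - F(x^{j-1} | 1)
    h = -d_lam * gap / f_y                  # h_{q_tau}(x^j, x^{j-1}, z1) on the grid
    return float(h.min(axis=1).sum()), float(h.max(axis=1).sum())

# Hypothetical inputs: three support points, two grid values of z1.
Lam = np.array([[0.6, 0.7],
                [0.4, 0.5],
                [0.2, 0.4]])
G = np.array([0.2, 0.6, 1.0])
F = np.array([0.3, 0.5, 1.0])
lower, upper = uqe_p_bounds(Lam, G, F, f_y=0.5)

# When Lambda does not vary with z1, the two bounds coincide.
Lam_const = np.repeat(Lam[:, :1], 2, axis=1)
lower_c, upper_c = uqe_p_bounds(Lam_const, G, F, f_y=0.5)
```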

4 Estimation and Inference

In this section, we discuss estimation and inference for our two-sample UQE. First, we describe an estimation procedure for $UQE_{q}$ and $UQE_{p}$ as identified in Theorem 3.2. We then show that our estimator is consistent and asymptotically normal in Theorem 4.2.\footnote{Here we focus on the scenario where the distribution of $X$ is absolutely continuous. When $X$ is discrete, the problem features partially identified parameters defined by the intersection bounds. Chernozhukov, Lee and Rosen (2013) provide an extensive treatment of this topic. We omit discussion here and refer readers to Appendix D in Rothe (2012) for a detailed discussion on how to apply their method.}

4.1 Estimation Procedure

Following the discussion in Section 3, we first propose an estimator of the conditional probability, $\Lambda$. Here, we restrict our attention to the parametric setting where $\Lambda$ is indexed by a vector of parameters, $\beta$. We use a moment-matching method based on (4) to estimate $\beta$. The estimation of $\beta$ consists of four steps. In the first step, we estimate $q_{\tau}$ by solving

$$\widehat{q}_{\tau}:=\arg\min_{q\in\mathcal{Y}}\mathbb{E}_{n}[R(\tau-1(Y\leq q))\cdot(Y-q)]. \tag{5}$$
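As a minimal illustration of this first step, the check-function objective in (5) can be minimized by brute force over the observed outcomes of the study sample. The data below are simulated and the function names are hypothetical; the search is quadratic in the sample size and is meant only to show what (5) computes.

```python
import numpy as np

def check_loss(q, y, tau):
    """Sample average of (tau - 1(Y <= q)) * (Y - q) over study-sample outcomes."""
    u = y - q
    return float(np.mean((tau - (u <= 0)) * u))

def quantile_by_check_loss(y, tau):
    """Minimize the check loss over the observed values of Y (brute-force search)."""
    candidates = np.sort(y)
    losses = [check_loss(q, y, tau) for q in candidates]
    return float(candidates[int(np.argmin(losses))])

rng = np.random.default_rng(5)
y_study = rng.normal(size=501)          # hypothetical outcomes for units with R = 1
q_median = quantile_by_check_loss(y_study, 0.5)
q_lower = quantile_by_check_loss(y_study, 0.25)
```

With an odd sample size, the minimizer at $\tau=0.5$ is the sample median, so the result can be compared directly with `np.median`.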

The next three steps closely follow the Auxiliary-to-Study Tilting (AST) method proposed by Graham et al. (2016). Using the AST estimator, $\beta$ and the propensity score can be jointly estimated from the moment restrictions in (4). To implement the estimator, we first estimate the propensity score, $r(z)$. To this end, we assume that the propensity score takes a parametric form, i.e., $r(z)=L(k(z)^{\prime}\gamma)$, where $L(\cdot)$ is any link function that satisfies Assumption 4.1(e). Using $L(\cdot)$, $\widehat{\gamma}$ can be obtained by solving the following problem,

$$\widehat{\gamma}:=\arg\max_{\gamma\in\Theta_{\gamma}}\mathbb{E}_{n}\left[R\log(L(k(Z)^{\prime}\gamma))+(1-R)\log(1-L(k(Z)^{\prime}\gamma))\right]. \tag{6}$$
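A minimal sketch of the maximization in (6) is given below, assuming a logistic link for $L(\cdot)$ and $k(z)=(1,z)^{\prime}$ with scalar $z$. Both choices, the simulated data, and the function names are illustrative assumptions; the text only requires $L(\cdot)$ to satisfy Assumption 4.1(e).

```python
import numpy as np

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))

def fit_logit(R, K, n_iter=25):
    """Maximize mean(R*log L + (1-R)*log(1-L)), L = logistic(K @ gamma), by Newton's method."""
    gamma = np.zeros(K.shape[1])
    for _ in range(n_iter):
        p = logistic(K @ gamma)
        score = K.T @ (R - p) / len(R)                           # gradient of the objective
        hess = -(K * (p * (1.0 - p))[:, None]).T @ K / len(R)    # negative definite Hessian
        gamma = gamma - np.linalg.solve(hess, score)
    return gamma

rng = np.random.default_rng(2)
n = 20_000
z = rng.standard_normal(n)
K = np.column_stack([np.ones(n), z])     # k(Z) = (1, Z)', an illustrative choice
gamma_0 = np.array([0.3, -0.8])          # hypothetical true coefficients
R = (rng.random(n) < logistic(K @ gamma_0)).astype(float)   # R = 1 marks the study sample

gamma_hat = fit_logit(R, K)
score_at_hat = K.T @ (R - logistic(K @ gamma_hat)) / n
```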

The AST estimator augments the conditional maximum likelihood estimator $\widehat{\gamma}$ with tilting parameters. The resulting estimator of $\beta$ is more efficient than the one based on $\widehat{\gamma}$ alone. Let $t(z)$ be a vector of known functions of $z$ with a constant term as the first element. Denote the tilting parameters associated with the auxiliary data and the study sample by $\lambda_{a}$ and $\lambda_{s}$, respectively. They are estimated by solving

\begin{align}
\mathbb{E}_{n}\left[\left(\dfrac{1-R}{1-L(k(Z)^{\prime}\widehat{\gamma}+t(Z)^{\prime}\widehat{\lambda}_{a})}-1\right)L(k(Z)^{\prime}\widehat{\gamma})t(Z)\right]&=0, \tag{7}\\
\mathbb{E}_{n}\left[\left(\dfrac{R}{L(k(Z)^{\prime}\widehat{\gamma}+t(Z)^{\prime}\widehat{\lambda}_{s})}-1\right)L(k(Z)^{\prime}\widehat{\gamma})t(Z)\right]&=0. \tag{8}
\end{align}

Using $\widehat{\lambda}_{s}$ and $\widehat{\lambda}_{a}$, we compute study and auxiliary sample tilts, which are defined as follows

$$\widehat{\pi}_{i}^{s}:=\dfrac{L(k(Z_{i})^{\prime}\widehat{\gamma})}{L(k(Z_{i})^{\prime}\widehat{\gamma}+t(Z_{i})^{\prime}\widehat{\lambda}_{s})},\qquad\qquad\widehat{\pi}_{i}^{a}:=\dfrac{L(k(Z_{i})^{\prime}\widehat{\gamma})}{1-L(k(Z_{i})^{\prime}\widehat{\gamma}+t(Z_{i})^{\prime}\widehat{\lambda}_{a})}. \tag{9}$$
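The tilting equations (7) and (8) can be solved with a few Newton iterations. The sketch below assumes a logistic link, $t(z)=(1,z)^{\prime}$, and, purely for illustration, plugs in the true propensity score in place of the fitted value $L(k(Z)^{\prime}\widehat{\gamma})$. The function `solve_tilt` and all data are hypothetical.

```python
import numpy as np

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))

def solve_tilt(R, p, T, sample, n_iter=50):
    """Solve (8) if sample == "study", otherwise (7), for the tilting parameter.

    p is the fitted propensity score L(k(Z)'gamma_hat); T stacks t(Z_i)' by row.
    """
    v = np.log(p / (1.0 - p))                      # index k(Z)'gamma_hat under a logistic link
    lam = np.zeros(T.shape[1])
    for _ in range(n_iter):
        L = logistic(v + T @ lam)
        if sample == "study":
            resid = R / L - 1.0
            slope = -R * (1.0 - L) / L             # derivative of R / L in the index
        else:
            resid = (1.0 - R) / (1.0 - L) - 1.0
            slope = (1.0 - R) * L / (1.0 - L)      # derivative of (1 - R) / (1 - L) in the index
        moment = T.T @ (resid * p) / len(R)
        jac = (T * (slope * p)[:, None]).T @ T / len(R)
        lam = lam - np.linalg.solve(jac, moment)   # Newton update
    return lam

rng = np.random.default_rng(3)
n = 5_000
z = rng.standard_normal(n)
T = np.column_stack([np.ones(n), z])               # t(Z) = (1, Z)', constant first
p = logistic(0.2 + 0.5 * z)                        # stand-in for the fitted propensity score
R = (rng.random(n) < p).astype(float)
v = np.log(p / (1.0 - p))

lam_s = solve_tilt(R, p, T, "study")
lam_a = solve_tilt(R, p, T, "auxiliary")
pi_s = p / logistic(v + T @ lam_s)                 # study tilt as in (9)
pi_a = p / (1.0 - logistic(v + T @ lam_a))         # auxiliary tilt as in (9)
```

By construction of (7) and (8), the tilted sample moments of $t(Z)$ in either subsample match the propensity-weighted moments computed on the pooled sample, which is what the check on `pi_s` and `pi_a` verifies.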

Also let $e(z)$ be a $d_{\beta}$-dimensional vector of known functions of $z$, and $g(a;\widehat{q}_{\tau},\widehat{\gamma},\widehat{\lambda}_{s},\widehat{\lambda}_{a},\beta):=(\widehat{\pi}^{s}r1(y\leq\widehat{q}_{\tau})-\widehat{\pi}^{a}(1-r)\Lambda(w;\beta))e(z)$. Now, in the last step, $\beta$ can be estimated by

$$\widehat{\beta}:=\arg\inf_{\beta\in\Theta_{\beta}}\widehat{\mathcal{L}}_{n}(\beta), \tag{10}$$

where $\widehat{\mathcal{L}}_{n}(\beta):=\left\lVert\mathbb{E}_{n}[g(A;\widehat{q}_{\tau},\widehat{\gamma},\widehat{\lambda}_{s},\widehat{\lambda}_{a},\beta)]\right\rVert^{2}_{\Omega_{n}}$ and $\left\lVert x\right\rVert^{2}_{\Omega_{n}}:=x^{\prime}\Omega_{n}x$, for a sequence of positive definite weighting matrices $\Omega_{n}$.

Using these quantities, we can obtain $\Lambda_{x}(W;\widehat{\beta}):=\partial\Lambda(W;\widehat{\beta})/\partial x$, and $\widehat{\ell}(z):=\frac{n_{a}}{n_{s}}\cdot\frac{L(k(z)^{\prime}\widehat{\gamma}+t(z)^{\prime}\widehat{\lambda}_{s})}{1-L(k(z)^{\prime}\widehat{\gamma}+t(z)^{\prime}\widehat{\lambda}_{a})}$. Throughout this section, we assume that the counterfactual distribution $G$ is known. In practice, if $G$ is not known, it may be estimated from an independent sample; see e.g. Rothe (2010). Using the above estimates and $\widehat{F}_{X_{s}|R=1}(\cdot):=\mathbb{E}_{n_{a}}[\widehat{\ell}(Z)1(X\leq\cdot)]$, where $\mathbb{E}_{n_{a}}[X]$ denotes $n_{a}^{-1}\sum_{i=n_{s}+1}^{n}X_{i}$, $\widehat{g}_{q}$ can be obtained as the plug-in estimator. For $g_{p}$, we need an estimator for $f_{X|R=1}(\cdot)$. Our identification relies on a compact support condition, and it is well known that the Parzen-Rosenblatt density estimator is not valid near the boundary of the support. To overcome this challenge, we introduce trimming.\footnote{Trimming is widely adopted in the literature; see e.g. Härdle and Stoker (1989), Powell et al. (1989) among others. This specific trimming function is inspired by Guerre et al. (2000) and Li et al. (2002). As an alternative, we can use a local polynomial density estimator that adjusts for the boundary bias adaptively; see Cattaneo et al. (2020) for details.} For a kernel $K_{x}$ with compact support and some bandwidth $b_{x}$, we let

$$\widehat{f}_{X|R}(x|1):=\mathbb{E}_{n_{a}}\left[\widehat{\ell}(Z)I_{b_{x}}K_{b_{x}}\left(X-x\right)\right],$$

where $K_{b_{x}}(\cdot):=b_{x}^{-1}K_{x}(\cdot/b_{x})$. $I_{b_{x}}$ is a trimming indicator, which equals one for $x\in[\underline{x}+\rho_{x}b_{x}/2,\bar{x}-\rho_{x}b_{x}/2]$, where $\underline{x}$, $\bar{x}$, and $\rho_{x}$ are the lower bound of $\mathcal{X}$, the upper bound of $\mathcal{X}$, and the diameter of $\operatorname{Supp}(K_{x})$, respectively. The density $f_{Y|R}$ can also be estimated using a kernel density estimator. Specifically, for any kernel function $K_{y}(\cdot)$ that satisfies Assumption 4.2(b), let $\widehat{f}_{Y|R}(y|1):=\mathbb{E}_{n_{s}}[K_{b_{y}}\left(Y_{i}-y\right)]$, where $\mathbb{E}_{n_{s}}[X]:=n_{s}^{-1}\sum_{i=1}^{n_{s}}X_{i}$ and $K_{b_{y}}(y):=b_{y}^{-1}K_{y}(y/b_{y})$.
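A minimal sketch of the trimmed, reweighted kernel density estimator is shown below. It assumes an Epanechnikov kernel (support $[-1,1]$, so $\rho_{x}=2$), reads the trimming indicator as a function of the evaluation point $x$ as in the definition above, and uses hypothetical data with $\widehat{\ell}\equiv 1$.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel with support [-1, 1]."""
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)

def trimmed_weighted_density(x, x_aux, ell_aux, b, lower, upper):
    """f_hat(x) = I_b(x) * mean_i[ ell_i * K((X_i - x) / b) / b ] over auxiliary observations."""
    rho = 2.0                                     # diameter of the kernel support
    inside = (lower + rho * b / 2.0) <= x <= (upper - rho * b / 2.0)
    if not inside:
        return 0.0                                # evaluation point trimmed away
    return float(np.mean(ell_aux * epanechnikov((x_aux - x) / b) / b))

rng = np.random.default_rng(4)
x_aux = rng.random(100_000)                       # hypothetical X on [0, 1], true density 1
ell_aux = np.ones_like(x_aux)                     # assumption: ell_hat = 1 for illustration

f_mid = trimmed_weighted_density(0.5, x_aux, ell_aux, b=0.1, lower=0.0, upper=1.0)
f_edge = trimmed_weighted_density(0.05, x_aux, ell_aux, b=0.1, lower=0.0, upper=1.0)
```

Points within one half-diameter of the kernel support from either boundary are set to zero instead of being estimated with boundary bias.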

Now, plugging in the estimators of the nuisance quantities, $UQE_{j}(\tau,G)$ can be estimated by

$$\widehat{UQE}_{j}(\tau,G):=-\dfrac{1}{\widehat{f}_{Y|R}(\widehat{q}_{\tau}|1)}\mathbb{E}_{n_{a}}[\widehat{\ell}(Z)\Lambda_{x}(W;\widehat{\beta})\widehat{g}_{j}(X)],\ j=p,q. \tag{11}$$

We summarize the estimation procedure in the following algorithm.

Algorithm 1 (Plug-in Estimator for $\widehat{UQE}$).
  1. Compute the empirical quantile estimator $\widehat{q}_{\tau}$ by solving (5).

  2. Compute the conditional maximum likelihood estimator $\widehat{\gamma}$ by solving (6).

  3. Solve (7) and (8) to get $\widehat{\lambda}_{j}$, and use them to compute $\widehat{\pi}^{j}$, for $j=s,a$, following (9).

  4. Use the above quantities to compute $\widehat{\beta}$ by solving (10).

  5. Compute $\Lambda_{x}(\cdot;\widehat{\beta})$, $\widehat{\ell}(\cdot)$, $\widehat{F}_{X_{s}|R=1}(\cdot)$, and $\widehat{f}_{X_{s}|R=1}(\cdot)$. Use these quantities to compute $\widehat{g}_{j}$, for $j=p,q$.

  6. For $j=p,q$, compute the plug-in estimator $\widehat{UQE}_{j}$, following (11).
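The final step of the algorithm is a simple ratio once the nuisance estimates are available. The sketch below assumes those estimates are already stored as arrays over the auxiliary observations and uses an Epanechnikov kernel for the outcome density. All function names and numbers are hypothetical placeholders, not output of the earlier steps.

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)

def density_at(y0, y_study, b_y):
    """Kernel estimate of f_{Y|R}(y0 | 1) from study-sample outcomes."""
    return float(np.mean(epanechnikov((y_study - y0) / b_y) / b_y))

def uqe_plugin(ell_aux, lambda_x_aux, g_aux, f_y_at_q):
    """Equation (11): minus the auxiliary-sample mean of ell * Lambda_x * g, divided by f_hat."""
    return float(-np.mean(ell_aux * lambda_x_aux * g_aux) / f_y_at_q)

# Hypothetical nuisance estimates evaluated at four auxiliary observations.
ell_aux = np.ones(4)                    # ell_hat(Z_i)
lambda_x_aux = np.full(4, -0.2)         # Lambda_x(W_i; beta_hat)
g_aux = np.full(4, 0.5)                 # g_hat_j(X_i)
uqe_demo = uqe_plugin(ell_aux, lambda_x_aux, g_aux, f_y_at_q=0.4)

# Density step on simulated study outcomes (uniform on [0, 1], true density 1).
rng = np.random.default_rng(6)
y_study = rng.random(100_000)
f_y_demo = density_at(0.5, y_study, b_y=0.1)
```

In this toy input, $\Lambda_{x}<0$ and $g_{j}>0$ produce a positive effect, since $\Lambda$ is the probability that the outcome falls below $q_{\tau}$.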

4.2 Large Sample Results

In this section, we present inference results for the estimators introduced in the previous section. We first establish large sample properties of $\widehat{\beta}$, for which some additional regularity conditions are in order.

Assumption 4.1.
  (a) (i) $\{(R_{i},R_{i}Y_{i},(1-R_{i})X_{i},Z_{i})\}_{i=1}^{n}$ are i.i.d.; (ii) let $\theta:=(\gamma,\lambda_{s},\lambda_{a},\beta)\in\Theta:=\Theta_{\gamma}\times\Theta_{\lambda}^{2}\times\Theta_{\beta}$; then $\Theta$ is compact, and $\theta_{0}$ lies in the interior of $\Theta$.

  (b) $F_{Y|ZR=1}(y|z)$ is absolutely continuous and differentiable in $y\in\mathcal{Y}_{0}$ for all $z\in\mathcal{Z}$, where $\mathcal{Y}_{0}$ is a compact subset of $\mathcal{Y}$, and

    $$\sup_{(y,z)\in\mathcal{Y}_{0}\times\mathcal{Z}}|f_{Y|ZR}(y|z,1)|\leq c_{1}<\infty.$$

  (c) (i) $\Lambda(w;\beta)$ is twice continuously differentiable in $\beta$ with uniformly bounded derivatives, for all $w\in\mathcal{W}$; (ii) $0\leq\inf_{w,\beta}\Lambda(w;\beta)\leq\sup_{w,\beta}\Lambda(w;\beta)\leq 1$; (iii) $\Lambda_{x}(\cdot;\beta)$ is continuously differentiable in $\beta$, and $\sup_{w,\beta}|\Lambda_{x}(w;\beta)|\leq c_{2}<\infty$.

  (d) There exists a symmetric, non-random matrix $\Omega$ such that $||\Omega_{n}-\Omega||=O_{p}(\delta_{\omega,n})$, where $\delta_{\omega,n}=o(1)$, and $c_{3}^{-1}\leq\lambda_{min}(\Omega)\leq\lambda_{max}(\Omega)\leq c_{3}$.

  (e) There is a unique $\gamma_{0}\in\Theta_{\gamma}$ and a known function $L(\cdot)$ such that (i)

    $$\ell(z)=\dfrac{1-Q_{0}}{Q_{0}}\cdot\dfrac{L(k(z)^{\prime}\gamma_{0})}{1-L(k(z)^{\prime}\gamma_{0})};$$

    (ii) $L(\cdot)$ is strictly increasing and twice continuously differentiable, with bounded first and second order derivatives; (iii) $\lim_{x\to-\infty}L(x)=0$ and $\lim_{x\to\infty}L(x)=1$; (iv) $0<c_{4}<L(k(z)^{\prime}\gamma+t(z)^{\prime}\lambda_{j})\leq c_{5}<1$ for all $(\gamma,\lambda_{j})\in\Theta_{\gamma}\times\Theta_{\lambda}$, $j=s,a$, and $z\in\mathcal{Z}$.

  (f) $\mathbb{E}[||j(Z)||^{4}]<\infty$, where $j=k,t,e$.

Assumption 4.1(a) is standard in the microeconometric literature. Assumption 4.1(b) requires the conditional density $f_{Y|ZR}(\cdot|\cdot,1)$ to be bounded uniformly for all $(y,z)\in\mathcal{Y}_{0}\times\mathcal{Z}$. Assumption 4.1(c) imposes mild smoothness conditions on the parametric function $\Lambda(\cdot,\cdot;\cdot)$ and requires it to take values in the unit interval, so that it behaves like a distribution function. Assumption 4.1(d) states that $\Omega_{n}$ is consistent for $\Omega$, which is positive definite. Assumption 4.1(e) implies that the true “propensity score” is known up to the finite dimensional $\gamma_{0}$. It also specifies smoothness and boundedness conditions on the parametric propensity score. Finally, due to the estimation of $q_{\tau}$, we impose a finite fourth moment condition in Assumption 4.1(f), which is stronger than the usual square-integrability condition.

Lemma 4.1.

Suppose that Assumptions 2.1-3.4 and Assumption 4.1 hold. Then (i) $\widehat{\beta}\stackrel{p}{\rightarrow}\beta_{0}$; furthermore, (ii) if the Jacobian, $M_{\Omega}$, as defined in the Supplementary Appendix A, is invertible, then

$$\sqrt{n}(\widehat{\beta}-\beta_{0})=\dfrac{1}{\sqrt{n}}\sum_{i=1}^{n}\psi_{\beta}(A_{i};\theta_{0},q_{\tau})+o_{p}(1),$$

where $\psi_{\beta}(A;\theta_{0},q_{\tau})$ is given in the Supplementary Appendix A, and (iii)

$$\sqrt{n}(\widehat{\beta}-\beta_{0})\stackrel{d}{\rightarrow}N(0,\Sigma_{\beta}),$$

where $\Sigma_{\beta}:=\mathbb{E}[\psi_{\beta}(A;\theta_{0},q_{\tau})\psi_{\beta}(A;\theta_{0},q_{\tau})^{\prime}]$.

Lemma 4.1 shows that the parameters of $F_{Y_{s}|X_{s}ZR=1}$ are consistently estimated by $\widehat{\beta}$. Furthermore, $\widehat{\beta}$ admits an asymptotically linear representation with influence function $\psi_{\beta}(A;\theta_{0},q_{\tau})$, which plays a key role in establishing the large sample properties of $\widehat{UQE}$. To this end, we need the following set of assumptions.

Assumption 4.2.
  (a) (i) $F_{Y|R=1}(\cdot)$ is absolutely continuous and differentiable over $y\in\mathcal{Y}$; (ii) $f_{Y|R=1}(\cdot)$ is uniformly continuous; (iii) the density $f_{Y|R=1}(y)$ is strictly bounded away from 0 and three times continuously differentiable in $y$ with uniformly bounded derivatives for $y$ in $\mathcal{Y}_{0}$, such that $q_{\tau}\in\mathcal{Y}_{0}$.

  (b) The kernel function $K_{y}(\cdot)$ is symmetric, continuous, and bounded, with a compact support, and such that (i) $\int K_{y}(y)dy=1$; (ii) $\int yK_{y}(y)dy=0$.

  (c) $b_{y}\to 0$, $\log(n)n^{-1}b_{y}^{-1}\to 0$, and $nb_{y}^{5}\to c_{6}<\infty$.

  (d) (i) $\mathcal{X}$ is compact; (ii) $G$ is continuously differentiable on $\mathcal{X}$ with strictly positive density.

Assumption 4.2(a) strengthens Assumption 3.4 and imposes stronger smoothness conditions on the distribution of $Y_{s}$. Assumption 4.2(b) states several regularity conditions on the kernel function, which are standard in the literature. Assumption 4.2(c) specifies the admissible rate for the bandwidth parameter. We can choose $b_{y}=O(n_{s}^{-\kappa})$, for $\kappa\in[1/5,1/2)$. Assumption 4.2(d) imposes support and smoothness conditions for the counterfactual target covariate.

Asymptotic properties of $\widehat{UQE}$ are formally characterized in the next theorem.

Theorem 4.2.

Under Assumptions 2.1-3.4, 4.1, and 4.2, (i) the following linear expansion holds,

$$\widehat{UQE}_{q}(\tau,G)-UQE_{q}(\tau,G)=\dfrac{1}{n}\sum_{i=1}^{n}\psi_{q}+B_{q}(\tau,G,b_{y})+o_{p}(n^{-1/2}).$$

(ii) Suppose in addition that Assumption B.1 holds. Then we have

$$\widehat{UQE}_{p}(\tau,G)-UQE_{p}(\tau,G)=\dfrac{1}{n}\sum_{i=1}^{n}\psi_{p}+B_{p}(\tau,G,b_{y})+o_{p}(n^{-1/2}),$$

where $\psi_{j}$, $j=p,q$, is defined in Appendix B, $B_{j}(q_{\tau},G,b_{y}):=\frac{b_{y}^{2}f_{Y|R}^{\prime\prime}(q_{\tau}|1)d_{j}(\theta_{0},G)}{2f_{Y|R}^{2}(q_{\tau}|1)}\cdot\int y^{2}K_{y}(y)dy$, and $d_{j}(\theta_{0},G):=\frac{1}{1-Q_{0}}\mathbb{E}[(1-R)\ell(Z)\Lambda_{x}(X,Z_{1};\beta_{0})g_{j}(X)]$, for $j=p,q$.

(iii) Therefore,

$$\sqrt{nb_{y}}(\widehat{UQE}_{j}(\tau,G)-UQE_{j}(\tau,G)-B_{j}(q_{\tau},G,b_{y}))\stackrel{d}{\to}N(0,\Sigma_{j}),$$

where $\Sigma_{j}:=\frac{d_{j}^{2}(\theta_{0},G)}{f_{Y|R}^{3}(q_{\tau}|1)Q_{0}}\int K_{y}^{2}(y)dy$, for $j=p,q$.

From the linear expansions in Theorem 4.2, we conclude that $\widehat{UQE}$ converges at a rate that is slower than root-$n$. This result is mainly driven by the nonparametric estimation of the density $f_{Y|R=1}$, and therefore, the estimator is nonparametric in essence. Moreover, the asymptotic expansion includes an asymptotic bias term, $B(\tau,G,b_{y})$. If we assume, as in Firpo et al. (2009a), that $nb^{5}_{y}\to 0$, which for $b_{y}=O(n^{-\kappa})$ requires $\kappa>1/5$, the bias vanishes asymptotically.

Remark 4.

Estimators for the asymptotic variance of $UQE_{p}(\tau,G)$ and $UQE_{q}(\tau,G)$ can be constructed using their empirical counterparts. Specifically, let

$$\widehat{\Sigma}_{j}:=\dfrac{\widehat{d}_{j,n}(\widehat{\theta},G)^{2}}{\widehat{f}^{3}_{Y|R}(\widehat{q}_{\tau}|1)\mathbb{E}_{n}[R]}\int K_{y}^{2}(y)dy,$$

where $\widehat{d}_{j,n}(\widehat{\theta},G):=\mathbb{E}_{n_{a}}[\widehat{\ell}(Z)\Lambda_{x}(W;\widehat{\beta})\widehat{g}_{j}(X)]$. Under a suitable rate condition on $b_{y}$, consistency of $\widehat{\Sigma}_{j}$ follows directly from the first two parts of Theorem 4.2. To achieve better finite-sample performance, we can add the root-$n$ terms of the influence functions to the variance estimator. On this basis, we propose the following improved variance estimator,

$$\widehat{\Sigma}_{j,imp}:=b_{y}\mathbb{E}_{n}[\widehat{\psi}_{j}(A;\widehat{\theta},\widehat{q}_{\tau},b_{y})^{2}]. \tag{12}$$

In the above definition, $\widehat{\psi}_{j}(a;\widehat{\theta},\widehat{q}_{\tau},b_{y})$ is a plug-in estimator of the influence function, $\psi_{j}(A;\theta_{0},q_{\tau},b_{y})$, for $j=p,q$. A detailed description of the construction of $\widehat{\psi}$ can be found in the Supplementary Appendix C. As an alternative, we may conduct inference using a bootstrap procedure, e.g. exchangeable bootstrap. It can be shown that the bootstrap approximates not only the leading term but also the higher-order terms in the asymptotic linear representation, which would immediately lead to an improvement over the simple plug-in estimator.
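The simple plug-in variance $\widehat{\Sigma}_{j}$ and a normal confidence interval based on the $\sqrt{nb_{y}}$ rate can be sketched as follows. The sketch assumes an Epanechnikov kernel, for which $\int K_{y}^{2}(y)dy=3/5$, and an undersmoothed bandwidth so that the bias term is ignored. The inputs are hypothetical numbers, and this is not the improved estimator in (12).

```python
import math

def plugin_variance(d_hat, f_hat, share_study, kernel_sq_integral=0.6):
    """Sigma_hat = d_hat^2 / (f_hat^3 * E_n[R]) * int K^2; 0.6 is the Epanechnikov value."""
    return d_hat ** 2 / (f_hat ** 3 * share_study) * kernel_sq_integral

def normal_ci(estimate, sigma_hat, n, b_y, z=1.959963984540054):
    """Two-sided 95% interval using the sqrt(n * b_y) rate, with the bias term ignored."""
    se = math.sqrt(sigma_hat / (n * b_y))
    return estimate - z * se, estimate + z * se

# Hypothetical inputs, for illustration only.
sigma_demo = plugin_variance(d_hat=0.1, f_hat=0.5, share_study=0.5)
ci_low, ci_high = normal_ci(estimate=-0.03, sigma_hat=sigma_demo, n=10_000, b_y=0.1)
```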

Remark 5.

Theorem 4.2 implies that tests of the unconditional quantile effect converge at a nonparametric rate in general. Nonetheless, for the null of zero, positive, and negative effects, we can still construct tests that have power against departures from the null at the parametric rate. For example, testing the null $H_{0}:UQE_{j}(\tau,G)=0$ is equivalent to testing $H^{\prime}_{0}:d_{j}(\beta_{0},G)=0$, as $UQE_{j}(\tau,G)=0\Leftrightarrow d_{j}(\beta_{0},G)=0$, for $j=p,q$. From Theorem 4.2, we know that $\widehat{d}_{j,n}(\widehat{\beta},G)$ converges at the parametric rate. Moreover, we have

$$\sqrt{n}\widehat{V}_{d,j}^{-1/2}(\widehat{d}_{j,n}(\widehat{\theta},G)-d_{j}(\theta_{0},G))\stackrel{d}{\to}N(0,1), \tag{13}$$

where $\widehat{V}_{d,j}$ is an estimator of $V_{d,j}:=\mathbb{E}[\psi_{d,j}(A;\theta_{0},q_{\tau})^{2}]$, with $\psi_{d,j}$, $j=p,q$, defined in Appendix B. The result in (13) can be used to test $H^{\prime}_{0}$, applying standard testing procedures.
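Given $\widehat{d}_{j,n}$ and $\widehat{V}_{d,j}$, a two-sided test of $H^{\prime}_{0}$ based on (13) reduces to a standard normal test. The sketch below uses hypothetical inputs and one possible choice of "standard testing procedure"; the function name is not from the paper.

```python
import math

def zero_effect_test(d_hat, v_hat, n):
    """z-statistic sqrt(n) * d_hat / sqrt(v_hat) and its two-sided normal p-value."""
    z = math.sqrt(n) * d_hat / math.sqrt(v_hat)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical inputs, for illustration only.
z_null, p_null = zero_effect_test(d_hat=0.0, v_hat=1.0, n=100)
z_alt, p_alt = zero_effect_test(d_hat=0.0196, v_hat=1.0, n=10_000)
```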

5 Empirical Application

We apply our identification and estimation methods to a variant of Mincer’s regression. Our main goal here is to demonstrate the bias from using potential instead of actual labor experience in human capital earnings models.

Identifying the causal relationship between earnings and human capital accumulation has been a focus of labor economic studies for decades. Traditionally, Mincer’s regression has been widely used to quantify the link between labor wage, education and labor market experience.

Most datasets do not provide respondents’ actual work histories. Therefore, many researchers choose to proxy the variable with potential work experience. The potential experience measure is usually calculated by subtracting years of schooling plus some constant (typically 6 years) from age. Despite the popularity of this practice, many labor economists believe that the estimated return to actual experience tends to be biased when potential experience is used as a proxy; see e.g. Regan and Oaxaca (2009). One of their main arguments is that any lapse in labor force participation is implicitly assumed away when potential experience is used instead of actual experience. There is little reason to believe that the return to time spent employed is the same as the return to time spent unemployed. Hence, it is still preferable to use actual labor experience.
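The potential experience proxy described above is a one-line computation. In the sketch below, the function name is hypothetical and the constant 6 follows the typical choice mentioned in the text. Because the formula is linear, it can be checked against the sample means of age and education reported in Table 1.

```python
def potential_experience(age, years_of_schooling, school_start_age=6):
    """Potential experience = age - years of schooling - constant (typically 6)."""
    return age - years_of_schooling - school_start_age

# Mean age and mean education from Table 1 (IPUMS and PSID samples).
ipums_mean = potential_experience(35.6, 11.74)
psid_mean = potential_experience(34.87, 12.53)
```

The two values agree with the mean potential experience reported in Table 1, 17.86 for IPUMS and 16.34 for PSID.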

We use the 1970 wave of IPUMS as our main sample. The data is a 1-in-10,000 national random sample of the population. The outcome of interest is the natural log of yearly earnings. The target covariate, actual work experience, is missing from IPUMS. To apply the procedure described in Section 4, we need a dataset where the actual work experience is available. For that purpose, we use the 1972 wave of PSID as cleaned by Hirukawa et al. (2020). Detailed work histories are available in PSID. Therefore, it allows us to recover the actual labor market experience. However, running analysis directly with PSID may not be ideal due to the fact that it is not nationally representative. Our method is able to address this issue by combining information from both samples.

To estimate $F_{Y_{s}|X_{s}Z_{1}R=1}$, we consider the following specification,

$$\begin{aligned}\mathbb{P}(\log(Income)\leq y)=\Lambda(\beta_{0}&+\beta_{1}educ+\beta_{2}black+\beta_{3}south\\&+\beta_{4}married+\beta_{5}exper_{r}+\beta_{6}exper_{r}^{2})\end{aligned} \tag{14}$$

where $exper_{r}$ stands for the individual’s actual or realized work experience, $educ$ denotes the highest grade completed by the respondent, and $black$, $married$, and $south$ are dummy variables which equal one if the person is black, married, and lives in the south, respectively.

Table 1: Summary Statistics

Variable               Mean      St. Dev.   Min    25th Pctl.  Median  75th Pctl.  Max

Data Source A: The IPUMS Sample
Income                 8,856.48  5,846.45   50     5,550       8,050   10,650      50,000
Log(Income)            8.88      0.73       3.91   8.62        8.99    9.27        10.82
Age                    35.6      8.3        23     28          35      43          50
Education              11.74     2.72       5      10          12      13          17
Black                  0.07      0.26       0      0           0       0           1
South                  0.26      0.44       0      0           0       1           1
Married                0.84      0.37       0      1           1       1           1
Potential Experience   17.86     9.17       0      10          17      26          39

Data Source B: The PSID Sample
Income                 9,415.42  5,620.45   50     6,000       8,598   11,721      70,000
Log(Income)            8.98      0.66       3.91   8.7         9.06    9.37        11.16
Age                    34.87     8.41       23     27          34      42          50
Education              12.53     2.93       5      11          12      16          17
Black                  0.26      0.44       0      0           0       1           1
South                  0.4       0.49       0      0           0       1           1
Married                0.9       0.29       0      1           1       1           1
Potential Experience   16.34     9.49       0      8           16      24          39
Actual Experience      15.98     8.39       1      8           16      23          35

Notes: Summary statistics for IPUMS and PSID. The top panel uses the male subsample (aged between 23 and 50) from the 1970 wave of IPUMS with a sample size of 3,504. The bottom panel uses the male subsample (aged between 23 and 50) from the 1972 wave of PSID with a sample size of 1,697.

The actual work experience serves as our $X_{s}$. It enters (14) with linear and quadratic terms. We let $(educ,black,south,married)$ be the set of included instruments $Z_{1}$, and the potential experience, $exper_{p}$, be the excluded instrument $Z_{2}$. We provide estimation results when $\Lambda(\cdot)$ takes either the logistic link or the probit link. To implement the AST estimator, we choose $j(Z)=(Z_{1}^{\prime},Z_{2}^{\prime})^{\prime}$, for $j=k,t,e$. The density of $Y_{s}$ is estimated using $b_{y}=n_{s}^{-0.01}b_{n,0}$, where $b_{n,0}:=1.06\min\{\sigma(Y_{s}),interquartile(Y_{s})\}n_{1}^{-0.2}$ is the usual “rule-of-thumb” bandwidth.
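The bandwidth rule above can be written out as a short computation. The sketch below follows the formula as stated, using the sample standard deviation and the interquartile range, and reads $n_{1}$ as the study-sample size $n_{s}$, which is an assumption. The data are simulated with the IPUMS sample size from Table 1.

```python
import numpy as np

def rule_of_thumb_bandwidth(y):
    """b_{n,0} = 1.06 * min(sd(Y), IQR(Y)) * n^(-0.2), as stated in the text."""
    iqr = np.quantile(y, 0.75) - np.quantile(y, 0.25)
    return float(1.06 * min(np.std(y, ddof=1), iqr) * len(y) ** (-0.2))

def density_bandwidth(y):
    """b_y = n^(-0.01) * b_{n,0}: slightly smaller than the rule-of-thumb bandwidth."""
    return len(y) ** (-0.01) * rule_of_thumb_bandwidth(y)

rng = np.random.default_rng(7)
y_study = rng.normal(size=3504)        # simulated outcomes; 3,504 is the IPUMS sample size
b_rot = rule_of_thumb_bandwidth(y_study)
b_y = density_bandwidth(y_study)
```

The extra factor $n_{s}^{-0.01}$ makes the bandwidth shrink at rate $n_{s}^{-0.21}$, slightly faster than the rule-of-thumb rate $n_{s}^{-0.2}$.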

Figure 2: Distributions of Potential and Actual Experience.

Notes: The (smoothed) empirical distributions of potential and actual labor market experience. The red line depicts the smoothed ECDF of potential experience in the PSID sample (target counterfactual distribution). The green line depicts the ECDF of actual experience in the PSID sample. The blue line depicts the estimated CDF of actual experience in the IPUMS sample. The grey line depicts the ECDF of potential experience in the IPUMS sample.

Table 2: Estimation Results

| Panel | Row | Logit, $\tau=0.25$ | Logit, $\tau=0.5$ | Logit, $\tau=0.75$ | Probit, $\tau=0.25$ | Probit, $\tau=0.5$ | Probit, $\tau=0.75$ |
|---|---|---|---|---|---|---|---|
| MDS | $UQE_{2s}(\tau)$ | -0.0490 | -0.0303 | -0.0153 | -0.0606 | -0.0308 | -0.0157 |
| MDS | (std. error) | (0.0244) | (0.0116) | (0.0071) | (0.0277) | (0.0119) | (0.0071) |
| MDS | $p$-value, $H_{0}:UQE=0$ | 0.0426 | 0.0080 | 0.0292 | 0.0272 | 0.0087 | 0.0260 |
| MQS | $UQE_{2s}(\tau)$ | -0.0458 | -0.0297 | -0.0149 | -0.0575 | -0.0300 | -0.0152 |
| MQS | (std. error) | (0.0112) | (0.0065) | (0.0056) | (0.0128) | (0.0065) | (0.0056) |
| MQS | $p$-value, $H_{0}:UQE=0$ | 0.0000 | 0.0000 | 0.0075 | 0.0000 | 0.0000 | 0.0064 |
| MLS | $UQE_{2s}(\tau)$ | 0.0240 | 0.0189 | 0.0150 | 0.0268 | 0.0191 | 0.0154 |
| MLS | (std. error) | (0.0033) | (0.0019) | (0.0017) | (0.0035) | (0.0019) | (0.0016) |
| MLS | $p$-value, $H_{0}:UQE=0$ | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |

Notes: In each panel, the first two rows report point estimates and standard errors using our two-sample estimator. The last row of each panel reports the $p$-value associated with the Wald test of zero effect.
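
As a rough guide to how the reported $p$-values relate to the estimates and standard errors, the following sketch computes a two-sided Wald $p$-value for a scalar parameter under a standard normal approximation. The function name is hypothetical, and this form of the test is an assumption rather than the paper's stated implementation. Because the table entries are rounded, the output need not reproduce the reported $p$-values exactly.

```python
import math

def wald_p_value(estimate, std_error):
    """Two-sided p-value for H0: parameter = 0, using z = estimate / std_error.

    For a scalar parameter the Wald statistic is z**2, which is chi-squared
    with one degree of freedom under H0; its tail probability equals
    2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))

if __name__ == "__main__":
    # MDS, logit link, first quartile: rounded estimate and standard error from Table 2.
    print(wald_p_value(-0.0490, 0.0244))
```
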
Figure 3: Unconditional Quantile Effect of Actual Experience on Log(Earnings). Panels: (a) MDS; (b) MQS; (c) MLS.

Notes: The UQE of labor market experience on log earnings. The top left panel: results for UQE with MDS. The top right panel: results for UQE with MQS. The bottom panel: results for UQE with MLS. All three plots contain the UQE of potential experience based on IPUMS (solid lines), the UQE of actual experience based on PSID (dotted lines), the two-sample UQE (dashed lines), and the two-sided 95% confidence intervals based on the improved variance estimator (shaded area).

Table 1 reports the descriptive statistics for the two samples. We use only the data on men aged between 23 and 50 when the surveys were taken, to ensure that the common support condition holds. This leaves us with a sample of $n_{s}=3{,}504$ respondents for IPUMS and $n_{a}=1{,}697$ for PSID. There are considerable differences between the two datasets. Individuals who are black, are married, and/or live in the South are over-represented in PSID compared to the nationally representative IPUMS. On average, an individual in PSID has 0.36 years more potential experience than actual experience.

For UQE with MDS and MQS, we choose the smoothed empirical distribution of $exper_{p}$ in the PSID sample (assuming it is fixed and known) as the target counterfactual distribution, i.e., $G=F_{Z_{2}|R=0}$. As depicted in Figure 2, we find that less-experienced workers tend to have even fewer years of experience in the counterfactual scenario than in the status quo, and the opposite is true for workers closer to the right tail of the distribution.

We report estimation results in Table 2 and Figure 3. A few remarks are in order. First, our estimates suggest that the counterfactual effect of a marginal shift in the distribution of actual experience is heterogeneous across income groups. As expected, the effect is larger in magnitude for the lower-income groups. When MDS and MQS are considered, the quantile effects are uniformly negative and the shapes of the effect curves are similar. The marginal shift could decrease (log) earnings by anything between 0.015 and 0.061 across income quantiles. For reference, when the logistic link is assumed, the marginal effect of MDS amounts to a reduction of 4.8% in annual earnings for individuals at the first quartile, 3.0% at the median, and 1.5% at the third quartile. For MLS, the two-sample estimates are bounded between the two sets of one-sample estimates. The marginal upward shift in actual experience would increase a median worker’s income by 1.9%.
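
The percentage figures quoted above are consistent with the usual conversion of a change $\Delta$ in log earnings into a percentage change, $100\cdot(e^{\Delta}-1)$. A minimal sketch using the rounded logit-link entries from Table 2 follows; the function and variable names are illustrative.

```python
import math

def log_points_to_percent(delta_log):
    """Percentage change in the level implied by a change in the log."""
    return 100.0 * (math.exp(delta_log) - 1.0)

# Rounded logit-link estimates copied from Table 2.
mds_logit = {0.25: -0.0490, 0.50: -0.0303, 0.75: -0.0153}
mls_logit_median = 0.0189

if __name__ == "__main__":
    for tau, estimate in mds_logit.items():
        print(f"MDS, tau={tau:.2f}: {log_points_to_percent(estimate):+.1f}%")
    print(f"MLS, tau=0.50: {log_points_to_percent(mls_logit_median):+.1f}%")
```

For effects this small, the conversion differs little from reading the log-point estimate directly as a percentage.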

Next, we consider the bias caused by using potential experience in lieu of actual experience. We note that one-sample UQE estimates based on IPUMS tend to be smaller in magnitude at lower income quantiles than those based on the combined data. Eventually, the two estimators converge at higher income levels. In the context of our application, there does not seem to be sufficient evidence to support the claim that the bias is statistically significant. (Footnote 7: Our analysis is local to the direction of counterfactual change and therefore does not allow the insignificance result to be extrapolated globally. For the same reason, the comparison between $UQE_{2s}$ and $UQE_{psid}$ is not meaningful.)

6 Concluding Remarks

In this paper, we propose a framework to identify and estimate unconditional quantile policy effect under data combination. We establish the identification of UQE under two main conditions: a rank similarity assumption and a conditional independence assumption, based on which, we provide estimators for the identified UQE and derive their large sample properties.

Our current approach can be extended in the following directions. First, although we have restricted our attention to the quantile effect throughout this paper, our results can be easily extended to other statistical functionals such as the mean, the interquartile range, and inequality measures. It would be interesting to see how the identification requirements change with respect to the functional we adopt. Second, we have focused exclusively on pointwise identification and inference. While the extension to uniform results seems straightforward, it comes at the cost of stronger cross-sample restrictions. Under such assumptions, conditional quantile regression is likely feasible. Comparing conditional and unconditional quantile effects, as in Firpo et al. (2009b), under our two-sample structure would also be an interesting direction for future research.

Appendix

Appendix A Proofs of Lemmas and Theorems in Section 3

Proof of Lemma 3.1: We provide proof for the nonparametric identification here. The proof for the parametric case follows along exactly the same line and is omitted. We shall show (3) first,

\begin{align*}
\mathbb{E}[1(Y\leq q_{\tau})|Z,R=1] &= \mathbb{E}[\mathbb{E}[1(Y_{s}\leq q_{\tau})|X_{s},Z,R=1]|Z,R=1] \\
&= \mathbb{E}[\mathbb{E}[1(Y_{s}\leq q_{\tau})|X_{s},Z_{1},R=1]|Z,R=1] \\
&= \mathbb{E}[\Lambda(X_{s},Z_{1})|Z,R=1] \\
&= \mathbb{E}[\Lambda(X_{a},Z_{1})|Z,R=0] \\
&= \mathbb{E}[\Lambda(W)|Z,R=0],
\end{align*}

where the second equality is by Assumption 3.1, and the fourth line follows by Assumption 2.1(e). Likewise for (4),

\begin{align*}
\mathbb{E}[R\,1(Y\leq q_{\tau})|Z] &= \mathbb{E}[1(Y\leq q_{\tau})|Z,R=1]\cdot\mathbb{P}[R=1|Z] \\
&= \mathbb{E}[\Lambda(W)|Z,R=0]\cdot r(Z) \\
&= \mathbb{E}[(1-R)\Lambda(W)|Z]\cdot\dfrac{r(Z)}{1-r(Z)}.
\end{align*}

Thus, Lemma 3.1 follows immediately from (3) (or (4)) and Assumption 3.2. \blacksquare
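
For readers who want the last equality in the display for (4) spelled out, the step can be sketched as follows. It reads $r(Z)$ as $\mathbb{P}[R=1|Z]$, as the display suggests, and assumes $r(Z)<1$, which is presumably covered by the paper's support conditions.

```latex
\begin{align*}
\mathbb{E}[(1-R)\Lambda(W)\mid Z]
  &= \mathbb{E}[\Lambda(W)\mid Z,R=0]\cdot\mathbb{P}[R=0\mid Z] \\
  &= \mathbb{E}[\Lambda(W)\mid Z,R=0]\cdot(1-r(Z)),
\end{align*}
% Dividing by 1 - r(Z) and multiplying by r(Z) gives
\begin{align*}
\mathbb{E}[\Lambda(W)\mid Z,R=0]\cdot r(Z)
  = \mathbb{E}[(1-R)\Lambda(W)\mid Z]\cdot\dfrac{r(Z)}{1-r(Z)}.
\end{align*}
```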

Lemma A.1.

Suppose (i) $\Lambda(w;\beta)$ is measurable with respect to $w$ for all $\beta\in\Theta_{\beta}$; (ii) $W$ is bounded complete for $Z$, relative to the auxiliary population; (iii) $\Lambda(w;\beta)$ is differentiable with respect to $\beta$; and (iv) $\partial\Lambda(\cdot;\beta)/\partial\beta$ is uniformly bounded and $\partial\Lambda(\cdot;\beta)/\partial\beta\not\equiv 0$ for all $\beta\in\Theta_{\beta}$. Then, under Assumptions 2.1 and 3.1, $\beta_{0}$ can be uniquely identified from (3) or (4).

Proof of Lemma A.1: From Lemma 3.1, we know that $\beta_{0}$ solves (3) or (4). It remains to show uniqueness. Suppose there is $\beta_{1}\neq\beta_{0}$ that solves (3). Then $\mathbb{E}[\Lambda(W;\beta_{1})-\Lambda(W;\beta_{0})|Z,R=0]=0$. By the mean value theorem (MVT), this and (iii) imply that $\mathbb{E}[\partial\Lambda(W;\beta)/\partial\beta|_{\beta=\widetilde{\beta}}|Z,R=0](\beta_{1}-\beta_{0})=0$ for some $\widetilde{\beta}$ between $\beta_{0}$ and $\beta_{1}$. Conditions (i), (ii), and (iv) then imply that $\mathbb{E}[\partial\Lambda(W;\beta)/\partial\beta|_{\beta=\widetilde{\beta}}|Z,R=0]\not\equiv 0$, which leads to a contradiction. $\blacksquare$

Proof of Theorem 3.2: We shall first prove the identification result for a fixed counterfactual distribution. Next, we take the derivative of the counterfactual experiments with respect to $t$. The result of Theorem 3.2 then follows from the fact that the Hadamard derivative operator of the quantile functional is linear. For any $t\leq t_{0}$, fix $\epsilon_{t}\in\Phi^{\ast}$, and we have that

\begin{align*}
& F_{\widetilde{Y}_{s,t}|R}(q_{\tau}|1) \\
&=\int P(g_{s}(\widetilde{X}_{s,t},\widetilde{Z}_{1,t},\widetilde{\epsilon}_{s,t})\leq q_{\tau}|\widetilde{X}_{s,t}=x,\widetilde{Z}_{1,t}=z_{1},\widetilde{R}_{t}=1)dF_{\widetilde{X}_{s,t}\widetilde{Z}_{1,t}|\widetilde{R}_{t}}(x,z_{1}|1) \\
&=\int P(g_{s}(G^{-1}_{t}(\widetilde{U}_{s,t}),\widetilde{Z}_{1,t},\widetilde{\epsilon}_{s,t})\leq q_{\tau}|\widetilde{U}_{s,t}=u,\widetilde{Z}_{1,t}=z_{1},\widetilde{R}_{t}=1)dF_{\widetilde{U}_{s,t}\widetilde{Z}_{1,t}|\widetilde{R}_{t}}(u,z_{1}|1) \\
&=\int P(g_{s}(G^{-1}_{t}(u),Z_{1},\epsilon_{s})\leq q_{\tau}|Z_{1}=z_{1},R=1)dF_{U_{s}Z_{1}|R}(u,z_{1}|1) \\
&=\int P(g_{s}(G^{-1}_{t}(u),Z_{1},\epsilon_{s})\leq q_{\tau}|U_{s}=u,Z_{1}=z_{1},R=1)dF_{U_{s}Z_{1}|R}(u,z_{1}|1) \\
&=\int P(g_{s}(X_{s},Z_{1},\epsilon_{s})\leq q_{\tau}|X_{s}=G^{-1}_{t}(u),Z_{1}=z_{1},R=1)dF_{U_{s}Z_{1}|R}(u,z_{1}|1) \\
&=\int P(g_{s}(X_{s},Z_{1},\epsilon_{s})\leq q_{\tau}|X_{s}=G^{-1}_{t}(u),Z_{1}=z_{1},R=1)dF_{U_{s}Z|R}(u,z|1) \\
&=\int P(g_{s}(X_{s},Z_{1},\epsilon_{s})\leq q_{\tau}|X_{s}=G^{-1}_{t}(F_{X_{s}}(x)),Z_{1}=z_{1},R=1)dF_{X_{s}Z|R}(x,z|1) \\
&=\int F_{Y_{s}|X_{s}Z_{1}R}(q_{\tau}|G^{-1}_{t}(F_{X_{s}|R}(x|1)),Z_{1},1)dF_{X_{s}Z|R}(x,z|1) \\
&=\int F_{Y_{s}|X_{s}Z_{1}R}(q_{\tau}|G^{-1}_{t}(F_{X_{s}|R}(x|1)),Z_{1},1)\dfrac{r(z)(1-Q_{0})}{Q_{0}(1-r(z))}dF_{XZ|R}(x,z|0) \\
&=\mathbb{E}\left[F_{Y_{s}|X_{s}Z_{1}R}(q_{\tau}|G^{-1}_{t}(F_{X_{s}|R}(X|1)),Z_{1},1)\dfrac{r(Z)(1-Q_{0})}{Q_{0}(1-r(Z))}\,\Big|\,R=0\right] \\
&=\dfrac{1}{1-Q_{0}}\mathbb{E}\left[(1-R)\ell(Z)\cdot F_{Y_{s}|X_{s}Z_{1}R}(q_{\tau}|G^{-1}_{t}(F_{X|R}(X|1)),Z_{1},1)\right],
\end{align*}

where the second line follows by the definition of $F_{\widetilde{Y}_{s,t}}(q_{\tau})$, the third one comes from the definition of $\widetilde{U}_{s,t}$ and a change of variable from $x$ to $u$, the fourth equality follows by the construction of $\Phi^{\ast}$ and Assumptions 3.3(a) and (b), the fifth line is again by Assumption 3.3(a), the eighth one follows by the definition of $U_{s}$ and a standard change-of-variable argument, and the tenth line is by Assumptions 2.1(a)–(c) and Bayes’ Law.

To obtain the marginal distributional effect, we take the derivative of $F_{\widetilde{Y}_{s}|R}^{G_{t}}(q_{\tau}|1)$ with respect to $t$ and evaluate it at $t=0$. For the marginal distributional shift,

\begin{align*}
\left.\dfrac{\partial F_{\widetilde{Y}_{s,t}|R}(q_{\tau}|1)}{\partial t}\right|_{t=0}={}& \int\left.\dfrac{\partial F_{Y_{s}|X_{s}Z_{1}R}(q_{\tau}|x,z_{1},1)}{\partial x}\cdot\dfrac{\partial G^{-1}_{t,p}(F_{X_{s}|R}(x|1))}{\partial t}\right|_{t=0} \\
& \qquad\cdot\dfrac{r(z)(1-Q_{0})}{Q_{0}(1-r(z))}dF_{W|R}(w|0) \\
={}& \dfrac{1}{1-Q_{0}}\mathbb{E}\left[(1-R)\ell(Z)\cdot\dfrac{\partial\Lambda(X,Z_{1})}{\partial x}\cdot\left.\dfrac{\partial G^{-1}_{t,p}(F_{X_{s}|R}(x|1))}{\partial t}\right|_{t=0}\right].
\end{align*}

Observe that $\left.\frac{\partial G^{-1}_{t,p}(\cdot)}{\partial t}\right|_{t=0}$ is the pathwise derivative of the inverse map $H\mapsto H^{-1}$ at $F_{X_{s}|R=1}$ in the direction of $G(\cdot)-F_{X_{s}|R=1}$. By Lemma 3.9.23 in Van Der Vaart and Wellner (1996), the inverse map is Hadamard differentiable under the conditions specified in the theorem, with the derivative map given by

\[
\phi\mapsto-(\phi/h)\circ H^{-1},
\]

where $h$ is the first-order derivative of $H$. Letting $\phi(\cdot)=G(\cdot)-F_{X_{s}|R=1}(\cdot)$ and $H=F_{X_{s}|R=1}$, it follows immediately that, for all $u\in[0,1]$,

\[
\left.\frac{\partial G^{-1}_{t,p}(u)}{\partial t}\right|_{t=0}=-\dfrac{G(F_{X_{s}|R=1}^{-1}(u))-u}{f_{X_{s}|R=1}(F_{X_{s}|R=1}^{-1}(u))},
\]

and hence, for all $x\in\mathcal{X}$,

\[
\left.\dfrac{\partial G^{-1}_{t,p}(F_{X_{s}|R}(x|1))}{\partial t}\right|_{t=0}=\dfrac{F_{X_{s}|R}(x|1)-G(x)}{f_{X_{s}|R=1}(x)}.
\]
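
The closed form just derived can be checked numerically. The sketch below makes several assumptions that are not in the paper: the perturbed distribution is the mixture $G_{t,p}=F+t(G-F)$ (consistent with the identity $G_{t,p}-F=t(G-F)$ used in the proof of Theorem 3.3), the status quo $F$ is $N(0,1)$, and the target $G$ is $N(0.5,1)$. It compares a finite-difference derivative of $G_{t,p}^{-1}(F(x))$ at $t=0$ with $(F(x)-G(x))/f(x)$.

```python
import math

def norm_cdf(x, mu=0.0):
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0)))

def norm_pdf(x, mu=0.0):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def F(x):                      # status quo CDF, assumed N(0, 1)
    return norm_cdf(x)

def f(x):                      # its density
    return norm_pdf(x)

def G(x):                      # counterfactual target CDF, assumed N(0.5, 1)
    return norm_cdf(x, mu=0.5)

def G_t(x, t):                 # assumed perturbation path: F + t * (G - F)
    return F(x) + t * (G(x) - F(x))

def invert(cdf, u, lo=-10.0, hi=10.0, iters=200):
    """Invert an increasing CDF by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def analytic_derivative(x):
    """(F(x) - G(x)) / f(x), the closed form in the text."""
    return (F(x) - G(x)) / f(x)

def numeric_derivative(x, t=1e-5):
    """Forward difference of t -> G_t^{-1}(F(x)) at t = 0, where the map equals x."""
    x_t = invert(lambda v: G_t(v, t), F(x))
    return (x_t - x) / t

if __name__ == "__main__":
    for x in (-1.0, 0.0, 1.0):
        print(x, analytic_derivative(x), numeric_derivative(x))
```

Because the assumed target $G$ lies to the right of $F$, the derivative is positive: the perturbed quantile at a fixed rank moves upward.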

Analogously, for the marginal quantile shift,

\begin{align*}
\left.\dfrac{\partial F_{\widetilde{Y}_{s,t}|R}(q_{\tau}|1)}{\partial t}\right|_{t=0}={}& \int\dfrac{\partial F_{Y_{s}|X_{s}Z_{1}R}(q_{\tau}|x,z_{1},1)}{\partial x}\cdot\left.\dfrac{\partial G^{-1}_{t,q}(F_{X_{s}|R}(x|1))}{\partial t}\right|_{t=0} \\
& \qquad\cdot\dfrac{r(z)(1-Q_{0})}{Q_{0}(1-r(z))}dF_{W|R}(w|0) \\
={}& \dfrac{1}{1-Q_{0}}\mathbb{E}\left[(1-R)\ell(Z)\cdot\dfrac{\partial\Lambda(X,Z_{1})}{\partial x}\cdot(G^{-1}(F_{X|R=1}(X))-X)\right],
\end{align*}

where the second equality follows from Lemma 3.1 and the definition of $G^{-1}_{t,q}(\cdot)$.

To identify $F_{X_{s}|R=1}$, we exploit the following fact:

\begin{align*}
F_{X_{s}|R=1}(\cdot) &= \mathbb{E}[1(X\leq\cdot)|R=1] \\
&= \int_{\mathcal{Z}}\int_{\mathcal{X}}1(x\leq\cdot)dF_{X|ZR}(x|z,1)dF_{Z|R}(z|1) \\
&= \int_{\mathcal{Z}}\int_{\mathcal{X}}1(x\leq\cdot)\cdot\dfrac{(1-Q_{0})r(z)}{Q_{0}(1-r(z))}dF_{X|ZR}(x|z,0)dF_{Z|R}(z|0) \\
&= \mathbb{E}\left[\dfrac{(1-Q_{0})r(Z)}{Q_{0}(1-r(Z))}1(X\leq\cdot)\,\Big|\,R=0\right] \\
&= \dfrac{1}{1-Q_{0}}\mathbb{E}\left[(1-R)\ell(Z)1(X\leq\cdot)\right],
\end{align*}

where the third line is due to Assumption 2.1 and Bayes’ Law.
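
The reweighting identity above can be illustrated with a small simulation. Everything in this sketch is hypothetical: a binary $Z$, selection probabilities $r(z)$ treated as known, and $X|Z\sim N(Z,1)$ in both samples. The weight $\ell(z)=(1-Q_{0})r(z)/(Q_{0}(1-r(z)))$ is inferred from the last two lines of the display and is not quoted from the paper's definition.

```python
import math
import random

def norm_cdf(x, mu=0.0):
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0)))

R_GIVEN_Z = {0: 0.3, 1: 0.7}   # r(z) = P(R = 1 | Z = z), hypothetical values
P_Z1 = 0.5                      # P(Z = 1), hypothetical value
Q0 = P_Z1 * R_GIVEN_Z[1] + (1.0 - P_Z1) * R_GIVEN_Z[0]   # P(R = 1)

def ell(z):
    """Weight inferred from the display: (1 - Q0) r(z) / (Q0 (1 - r(z)))."""
    r = R_GIVEN_Z[z]
    return (1.0 - Q0) * r / (Q0 * (1.0 - r))

def simulate(n, c, seed=0):
    """Return (reweighted, unweighted) auxiliary-sample estimates of P(X <= c | R = 1)."""
    rng = random.Random(seed)
    weighted_sum, aux_hits, aux_n = 0.0, 0, 0
    for _ in range(n):
        z = 1 if rng.random() < P_Z1 else 0
        r = 1 if rng.random() < R_GIVEN_Z[z] else 0
        x = rng.gauss(z, 1.0)          # same X | Z law in both samples
        if r == 0:                     # X is used only from the auxiliary sample
            hit = 1 if x <= c else 0
            aux_n += 1
            aux_hits += hit
            weighted_sum += ell(z) * hit
    weighted = weighted_sum / (n * (1.0 - Q0))
    unweighted = aux_hits / aux_n
    return weighted, unweighted

def true_cdf_study(c):
    """Exact P(X <= c | R = 1) in this toy design."""
    p_z1_study = P_Z1 * R_GIVEN_Z[1] / Q0
    return (1.0 - p_z1_study) * norm_cdf(c, 0.0) + p_z1_study * norm_cdf(c, 1.0)

if __name__ == "__main__":
    print(simulate(200000, 0.5), true_cdf_study(0.5))
```

In this design the unweighted auxiliary-sample frequency is far from the study-population value, while the reweighted average is close to it.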

Theorem 3.2 then follows from Assumption 3.4, $q_{\tau}=F_{Y_{s}|R=1}^{-1}(\tau)$, and the fact that the Hadamard derivative of the quantile functional is $\nu_{\tau}^{\prime}(\phi)=-\dfrac{\phi}{f_{Y_{s}|R=1}}\circ F_{Y_{s}|R=1}^{-1}(\tau)$, which is linear in $\phi$. $\blacksquare$

Proof of Theorem 3.3: First, we fix $U_{s}\in\mathcal{U}_{s}$ and $\phi_{t}\in\Phi^{\ast}$, for $t\leq t_{0}$. By construction, there exists $\widetilde{U}_{s,t}\in\widetilde{\mathcal{U}}_{s,t}$ such that $(\widetilde{U}_{s,t}|Z_{1},R=1)\stackrel{d}{=}(U_{s}|Z_{1},R=1)$. Now we rewrite $F_{\widetilde{Y}_{s,t}|R=1}\in\mathcal{F}_{\widetilde{Y}_{s,t}|R=1}$ in terms of $U_{s}$ and $Z_{1}$. Let $x^{0}=-\infty$, and we have that

\begin{align*}
& F_{\widetilde{Y}_{s,t}|R=1}(q_{\tau}) \\
={}& \int P(g_{s}(\widetilde{X}_{s,t},\widetilde{Z}_{1,t},\widetilde{\epsilon}_{s,t})\leq q_{\tau}|\widetilde{X}_{t}=x,\widetilde{Z}_{1,t}=z_{1},\widetilde{R}_{t}=1)dF_{\widetilde{X}_{s,t}\widetilde{Z}_{1,t}|\widetilde{R}_{t}}(x,z_{1}|1) \\
={}& \sum_{j=1}^{l}\int P(g_{s}(\widetilde{X}_{s,t},\widetilde{Z}_{1,t},\widetilde{\epsilon}_{s,t})\leq q_{\tau}|\widetilde{X}_{s,t}=x^{j},\widetilde{Z}_{1,t}=z_{1},\widetilde{R}_{t}=1) \\
& \qquad\cdot P(\widetilde{U}_{s,t}\in(G_{t,p}(x^{j-1}),G_{t,p}(x^{j})]|\widetilde{Z}_{1,t}=z_{1},\widetilde{R}_{t}=1)dF_{\widetilde{Z}_{1,t}|\widetilde{R}_{t}}(z_{1}|1) \\
={}& \sum_{j=1}^{l}\int P(g_{s}(X_{s},Z_{1},\epsilon_{s})\leq q_{\tau}|X=x^{j},Z_{1}=z_{1},R=1) \\
& \qquad\cdot P(U_{s}\in(G_{t,p}(x^{j-1}),G_{t,p}(x^{j})]|Z_{1}=z_{1},R=1)dF_{Z_{1}|R}(z_{1}|1) \\
={}& \sum_{j=1}^{l}\int F_{Y_{s}|X_{s}Z_{1}R}(q_{\tau}|x^{j},z_{1},1)\cdot\Big(P(U_{s}\in(F_{X_{s}|R}(x^{j-1}|1),F_{X_{s}|R}(x^{j}|1)]|Z_{1}=z_{1},R=1) \\
& +P(U_{s}\in(G_{t,p}(x^{j-1}),G_{t,p}(x^{j})]|Z_{1}=z_{1},R=1) \\
& -P(U_{s}\in(F_{X_{s}|R}(x^{j-1}|1),F_{X_{s}|R}(x^{j}|1)]|Z_{1}=z_{1},R=1)\Big)dF_{Z_{1}|R}(z_{1}|1) \\
={}& F_{Y_{s}|R}(q_{\tau}|1)-\sum_{j=1}^{l}\int\Lambda(x^{j},z_{1})\Big(P(U_{s}\in(G_{t,p}(x^{j-1}),G_{t,p}(x^{j})]|Z_{1}=z_{1},R=1) \\
& -P(U_{s}\in(F_{X_{s}|R}(x^{j-1}|1),F_{X_{s}|R}(x^{j}|1)]|Z_{1}=z_{1},R=1)\Big)dF_{Z_{1}|R}(z_{1}|1). \tag{15}
\end{align*}

For the second term on the right-hand side of the last equality, we have

\begin{align*}
& P(U_{s}\in(G_{t,p}(x^{j-1}),G_{t,p}(x^{j})]\mid Z_{1}=z_{1},R=1) \\
& -P(U_{s}\in(F_{X_{s}|R}(x^{j-1}|1),F_{X_{s}|R}(x^{j}|1)]\mid Z_{1}=z_{1},R=1) \\
={}& \left(F_{U_{s}|Z_{1}R}(G_{t,p}(x^{j})\mid z_{1},1)-F_{U_{s}|Z_{1}R}(F_{X_{s}|R}(x^{j}|1)\mid z_{1},1)\right) \\
& -\left(F_{U_{s}|Z_{1}R}(G_{t,p}(x^{j-1})\mid z_{1},1)-F_{U_{s}|Z_{1}R}(F_{X_{s}|R}(x^{j-1}|1)\mid z_{1},1)\right) \\
={}& (G_{t,p}(x^{j})-F_{X_{s}|R}(x^{j}|1))\cdot f_{U_{s}|Z_{1}R}(\widetilde{u}_{j,t}\mid z_{1},1) \\
& -(G_{t,p}(x^{j-1})-F_{X_{s}|R}(x^{j-1}|1))\cdot f_{U_{s}|Z_{1}R}(\widetilde{u}_{j-1,t}\mid z_{1},1) \\
={}& t\cdot(G(x^{j})-F_{X_{s}|R}(x^{j}|1))\cdot f_{U_{s}|Z_{1}R}(\widetilde{u}_{j,t}\mid z_{1},1) \\
& -t\cdot(G(x^{j-1})-F_{X_{s}|R}(x^{j-1}|1))\cdot f_{U_{s}|Z_{1}R}(\widetilde{u}_{j-1,t}\mid z_{1},1),
\end{align*}

where $\widetilde{u}_{j,t}$ is some value between $G_{t,p}(x^{j})$ and $F_{X_{s}|R}(x^{j}|1)$, and is potentially dependent on $z_{1}$. The last equality is due to the MVT. Using the above result, (15) becomes

\begin{align*}
& F_{Y_{s}|R}(q_{\tau}|1)-\sum_{j=1}^{l}t\cdot\int\Lambda(x^{j},z_{1})\cdot\Big((G(x^{j})-F_{X_{s}|R}(x^{j}|1))\cdot f_{U_{s}|Z_{1}R}(\widetilde{u}_{j,t}\mid z_{1},1) \\
& \qquad-(G(x^{j-1})-F_{X_{s}|R}(x^{j-1}|1))\cdot f_{U_{s}|Z_{1}R}(\widetilde{u}_{j-1,t}\mid z_{1},1)\Big)dF_{Z_{1}|R}(z_{1}|1) \\
={}& F_{Y_{s}|R}(q_{\tau}|1)-\sum_{j=2}^{l}t\cdot\int(\Lambda(x^{j-1},z_{1})-\Lambda(x^{j},z_{1})) \\
& \qquad\cdot(G(x^{j-1})-F_{X_{s}|R}(x^{j-1}|1))\cdot f_{U_{s}|Z_{1}R}(\widetilde{u}_{j-1,t}\mid z_{1},1)dF_{Z_{1}|R}(z_{1}|1),
\end{align*}

where the equality follows by rearranging terms, the fact that $G(x^{0})=F_{X_{s}|R}(x^{0}|1)=0$, and that $G(x^{l})=F_{X_{s}|R}(x^{l}|1)=1$. The pathwise derivative can thus be calculated as

\begin{align*}
& \lim_{t\downarrow 0}\dfrac{F_{\widetilde{Y}_{s,t}|R=1}(q_{\tau})-F_{Y_{s}|R}(q_{\tau}|1)}{t} \\
={}& \sum_{j=2}^{l}\int(\Lambda(x^{j-1},z_{1})-\Lambda(x^{j},z_{1})) \\
& \qquad\cdot(G(x^{j-1})-F_{X_{s}|R}(x^{j-1}|1))dF_{Z_{1}|U_{s}R}(z_{1}\mid F_{X_{s}|R}(x^{j-1}|1),1),
\end{align*}

where the second line is due to the dominated convergence theorem, Bayes’ Law, and the fact that $U_{s}|R=1$ follows the standard uniform distribution. Therefore, by Lemma 3.1 and the linearity of $\nu_{\tau}^{\prime}(\cdot)$,

\begin{multline*}
UQE_{p}(\tau,G)\in\left[\inf\limits_{U_{s}\in\mathcal{U}_{s}}\sum_{j=2}^{l}\int h_{q_{\tau}}(x^{j},x^{j-1},z_{1})dF_{Z_{1}|U_{s}R}(z_{1}\mid F_{X_{s}|R}(x^{j-1}|1),1),\right.\\
\left.\sup\limits_{U_{s}\in\mathcal{U}_{s}}\sum_{j=2}^{l}\int h_{q_{\tau}}(x^{j},x^{j-1},z_{1})dF_{Z_{1}|U_{s}R}(z_{1}\mid F_{X_{s}|R}(x^{j-1}|1),1)\right].
\end{multline*}
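
The "rearranging terms" step used earlier in this proof, which passes from a sum over $j=1,\dots,l$ to a sum over $j=2,\dots,l$, is a summation by parts. The sketch below checks the identity $\sum_{j=1}^{l}\Lambda_{j}(a_{j}-a_{j-1})=\sum_{j=2}^{l}(\Lambda_{j-1}-\Lambda_{j})a_{j-1}$ when $a_{0}=a_{l}=0$. Here $\Lambda_{j}$ and $a_{j}$ are shorthand introduced only for this illustration: $a_{j}$ stands for the product $(G(x^{j})-F_{X_{s}|R}(x^{j}|1))\cdot f_{U_{s}|Z_{1}R}(\widetilde{u}_{j,t}\mid z_{1},1)$, which vanishes at $j=0$ and $j=l$ by the boundary facts stated in the proof.

```python
import random

def sum_over_increments(lam, a):
    """sum_{j=1}^{l} lam_j * (a_j - a_{j-1}), with a_0 = 0.

    lam[j-1] holds lam_j and a[j-1] holds a_j for j = 1, ..., l.
    """
    l = len(lam)
    a_ext = [0.0] + list(a)            # a_ext[j] = a_j, including a_0 = 0
    return sum(lam[j - 1] * (a_ext[j] - a_ext[j - 1]) for j in range(1, l + 1))

def sum_by_parts(lam, a):
    """sum_{j=2}^{l} (lam_{j-1} - lam_j) * a_{j-1}."""
    l = len(lam)
    a_ext = [0.0] + list(a)
    return sum((lam[j - 2] - lam[j - 1]) * a_ext[j - 1] for j in range(2, l + 1))

if __name__ == "__main__":
    rng = random.Random(1)
    lam = [rng.random() for _ in range(6)]
    a = [rng.random() for _ in range(5)] + [0.0]   # enforce a_l = 0
    print(sum_over_increments(lam, a), sum_by_parts(lam, a))
```

The two sums agree only when the boundary terms vanish, which is why the proof invokes the values of $G$ and $F_{X_{s}|R}$ at $x^{0}$ and $x^{l}$.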

Using a similar argument as in the proof of Theorem 5 in Rothe (2012), we can show that for $j=1,\dots,l$, $\{F_{Z_{1}|U_{s}R}(z_{1}|U_{s}=F_{X_{s}|R}(x^{j}|1),R=1):U_{s}\in\mathcal{U}_{s}\}$ is the set of all multivariate distribution functions with support equal to $\operatorname{Supp}(F_{Z_{1}|R=1})$. To see this, note that for $j=1,\dots,l$, $F_{Z_{1}|U_{s}R}(\cdot|U_{s}=F_{X_{s}|R}(x^{j}|1),R=1)=C^{U_{s}}_{1}(F_{X_{s}|R}(x^{j}|1),F_{Z_{1}|R}(\cdot|1))$, where the conditional copula, $C^{U_{s}}$, is defined by $C^{U_{s}}(F_{U_{s}|R}(u|1),F_{Z_{1}|R}(z_{1}|1)):=F_{U_{s}Z_{1}|R}(u,z_{1}|1)$, and $C^{U_{s}}_{1}$ is the partial derivative of $C^{U_{s}}$ with respect to the first argument. By the construction of $\Phi$, the set of $C^{U_{s}}(\cdot,\cdot)$ for $U_{s}\in\mathcal{U}_{s}$ is equivalent to the identified set of the conditional copula of $X_{s}$ and $Z_{1}$ given $R=1$, $C^{X_{s}}(\cdot,\cdot)$, where $C^{X_{s}}(F_{X_{s}|R}(x|1),F_{Z_{1}|R}(z_{1}|1)):=F_{X_{s}Z_{1}|R}(x,z|1)$, for all $x\in\{x^{1},\dots,x^{l}\}$. Then, the desired result follows by applying an extension of Theorem 2.2.7 in Nelsen (2007).

Without loss of generality, we focus on the upper bound for now. By appropriately choosing Dirac measures with unit masses on $\{z^{\ast}_{j}\}_{j\in\mathcal{J}_{+}}$ and $\{z^{\dagger}_{j}\}_{j\in\mathcal{J}_{-}}$, it is straightforward to show that

\begin{align*}
& \sup\limits_{U_{s}\in\mathcal{U}_{s}}\sum_{j=2}^{l}\int h_{q_{\tau}}(x^{j},x^{j-1},z_{1})dF_{Z_{1}|U_{s}R}(z_{1}\mid F_{X_{s}|R}(x^{j-1}|1),1) \\
={}& \sum_{j\in\mathcal{J}_{+}}h_{q_{\tau}}(x^{j},x^{j-1},z^{\ast}_{1,j})+\sum_{j\in\mathcal{J}_{-}}h_{q_{\tau}}(x^{j},x^{j-1},z^{\dagger}_{1,j}). \tag{16}
\end{align*}

The right-hand side of (16) is identified under the support condition in Assumption 2.1(a). The proof for the lower bound follows by an analogous argument. $\blacksquare$

Appendix B Asymptotic Linear Representation of UQE Estimators

We specify additional regularity conditions in Theorem 4.2 and provide linear expansions for $\widehat{UQE}_{p}$ and $\widehat{UQE}_{q}$ in this section, the proofs of which are contained in the Supplementary Appendix.

Assumption B.1.

  (a) $f_{X|ZR=0}$ is uniformly bounded and twice continuously differentiable, with uniformly bounded first and second order derivatives on $\mathcal{X}\mathcal{Z}$.

  (b) (i) $K_{x}(\cdot)$ is a second order symmetric kernel function; (ii) $K_{x}(\cdot)$ is continuous and bounded, with compact support, and such that $\int K_{x}(x)dx=1$, $\int xK_{x}(x)dx=0$, $\int x^{2}K_{x}(x)dx>0$, and $\int K_{x}^{2}(x)dx<\infty$.

  (c) (i) $nb_{x}/\log(n)\to\infty$ and (ii) $nb_{x}^{4}\to 0$.
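
As a concrete illustration of parts (b) and (c), the sketch below checks the moment conditions numerically for the Epanechnikov kernel and evaluates the two rate conditions for a bandwidth of the form $b_{x}=n^{-\delta}$. Both choices are assumptions made for this example; the paper does not state which kernel or bandwidth sequence is used for $X$. With $b_{x}=n^{-\delta}$, condition (c) holds exactly when $1/4<\delta<1$.

```python
import math

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def integrate(func, lo=-1.0, hi=1.0, steps=20000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))

def kernel_moments(kernel):
    """The four integrals that appear in part (b)(ii)."""
    return {
        "mass": integrate(kernel),                              # should equal 1
        "first": integrate(lambda u: u * kernel(u)),            # should equal 0
        "second": integrate(lambda u: u * u * kernel(u)),       # should be > 0
        "roughness": integrate(lambda u: kernel(u) ** 2),       # should be finite
    }

def rate_conditions(n, delta):
    """Return (n * b / log n, n * b^4) for b = n^(-delta)."""
    b = n ** (-delta)
    return n * b / math.log(n), n * b ** 4

if __name__ == "__main__":
    print(kernel_moments(epanechnikov))
    for n in (1e3, 1e6):
        print(n, rate_conditions(n, 0.3))
```

For $\delta=0.3$, the first quantity grows and the second shrinks as $n$ increases, as the assumption requires.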

For $j=p,q$, the asymptotic linear representation of $\widehat{UQE}_{j}$ is given as follows:

\begin{align*}
& \widehat{UQE}_{j}(\tau,G)-UQE_{j}(\tau,G)-B_{j}(\tau,d,b_{y}) \\
={}& \dfrac{1}{n}\sum_{i=1}^{n}\left\{\psi_{f_{y},j}(A_{i};\theta_{0},q_{\tau},G)-\dfrac{1}{f_{Y|R}(q_{\tau}|1)}\psi_{d,j}(A_{i};\theta_{0},q_{\tau},G)\right\}+o_{p}(n^{-1/2}b_{y}^{-1/2}+b_{y}^{2}) \\
={}& \dfrac{1}{n}\sum_{i=1}^{n}\psi_{j}(A_{i};\theta_{0},q_{\tau},b_{y})+o_{p}(n^{-1/2}b_{y}^{-1/2}+b_{y}^{2}), \tag{17}
\end{align*}

where

\begin{align*}
\psi_{f_{y},j}(a;\theta_{0},q_{\tau},G):={}& \dfrac{d_{j}(\theta_{0},G)}{f_{Y|R}^{2}(q_{\tau}|1)}\dfrac{r}{Q_{0}}\left(K_{b_{y}}(y-q_{\tau})\right. \\
& \left.-\mathbb{E}[K_{b_{y}}(Y-q_{\tau})|R=1]-\dfrac{(1(y\leq q_{\tau})-\tau)f^{\prime}_{Y|R}(q_{\tau}|1)}{f_{Y|R}(q_{\tau}|1)}\right), \tag{18} \\
\psi_{d,j}(a;\theta_{0},q_{\tau},G):={}& \left(M_{\theta,j}(\theta_{0})^{\prime}\psi_{\theta}(a;\theta_{0},q_{\tau})\right. \\
& \left.+\psi_{g,j}(a;\theta_{0},G)+\dfrac{(1-r)\ell(z)\Lambda_{x}(w;\beta_{0})g_{j}(x)}{1-Q_{0}}-\dfrac{rd_{j}(\theta_{0},G)}{Q_{0}}\right). \tag{19}
\end{align*}

In the above equation, $\psi_{\theta}(a;\theta_{0},q_{\tau})$ is defined in the Supplementary Appendix,

\begin{align*}
M_{\theta,j}(\theta_{0}):={}& \mathbb{E}\left[\dfrac{1-R}{Q_{0}}\begin{pmatrix}\Lambda_{x}(W;\beta_{0})\left(\nabla_{L,\theta_{0}}(Z)\cdot g_{j}(X)+G_{j,\theta_{0}}(X)\right)\\ \dfrac{L_{0}(Z)}{1-L_{0}(Z)}\Lambda_{x,\beta}(W;\beta_{0})g_{j}(X)\end{pmatrix}\right], \tag{20} \\
\nabla_{L,\theta}(z):={}& \begin{pmatrix}\dfrac{L_{s}^{\prime}(z)(1-L_{a}(z))+L_{s}(z)L^{\prime}_{a}(z)}{(1-L_{a}(z))^{2}}\cdot k(z)\\ \dfrac{L^{\prime}_{s}(Z)}{1-L_{a}(Z)}\cdot t(Z)\\ -\dfrac{L_{s}(Z)L_{a}^{\prime}(Z)}{(1-L_{a}(Z))^{2}}\cdot t(Z)\end{pmatrix}, \tag{21}
\end{align*}

for $L_{j}(z):=L(k(z)^{\prime}\gamma+t(z)^{\prime}\lambda_{j})$ and $L_{j}^{\prime}(z):=L^{\prime}(k(z)^{\prime}\gamma+t(z)^{\prime}\lambda_{j})$, $j=s,a$. In addition,

\begin{align*}
G_{q,\theta_{0}}(x):={}& G^{\prime}(G^{-1}(F_{X|R}(x|1)))^{-1}\cdot\mathbb{E}\left[\dfrac{1-R}{Q_{0}}\cdot\nabla_{L,\theta_{0}}(Z)1(X\leq x)\right], \tag{22} \\
G_{p,\theta_{0}}(x):={}& \left.\mathbb{E}\left[\frac{1-R}{Q_{0}}\nabla_{L,\theta_{0}}(Z)1(X\leq x)\right]\middle/f_{X|R}(x|1)\right. \\
& +(G(x)-F_{X|R=1}(x))\left.\mathbb{E}\left[\frac{1-R}{Q_{0}}\nabla_{L,\theta_{0}}(Z)I_{b_{x}}K_{b_{x}}(X-x)\right]\middle/f_{X|R}(x|1)^{2},\right. \\
\psi_{g,q}(a;\theta_{0},G):={}& \mathbb{E}\left[\frac{1-R}{1-Q_{0}}\cdot\frac{\ell(Z)\Lambda_{x}(W;\beta_{0})}{G^{\prime}\left(G^{-1}(F_{X|R}(X|1))\right)}\right. \\
& \left.\cdot\left(\frac{(1-r)\ell(z)1(x\leq X)}{1-Q_{0}}-\frac{rF_{X|R}(X|1)}{Q_{0}}\right)\right], \tag{23} \\
\psi_{g,p}(a;\theta_{0},G):={}& \mathbb{E}\left[\frac{(1-R)\ell(Z)\Lambda_{x}(W;\beta_{0})}{(1-Q_{0})f_{X|R}(X|1)}\cdot\left(\frac{(1-r)\ell(z)1(x\leq X)}{1-Q_{0}}-F_{X|R}(X|1)\right)\right] \\
& +\dfrac{(1-r)\ell(z)}{1-Q_{0}}\pi(x)-\mathbb{E}\left[\dfrac{(1-R)\ell(Z)}{1-Q_{0}}\pi(X)\right] \\
& -\dfrac{r-Q_{0}}{Q_{0}}\cdot\mathbb{E}\left[\dfrac{(1-R)\ell(Z)\Lambda_{x}(W;\beta_{0})}{1-Q_{0}}\cdot\dfrac{G(X)}{f_{X|R}(X|1)}\right], \tag{24} \\
\pi(x):={}& \mathbb{E}\left[\left.\Lambda_{x}(W;\beta_{0})\right|X=x,R=1\right]\dfrac{G(x)-F_{X|R}(x|1)}{f_{X|R}(x|1)}. \tag{25}
\end{align*}

References

  • Angrist and Krueger (1992) Angrist, J. D., and Krueger, A. B. (1992), “The Effect of Age at School Entry on Educational Attainment: an Application of Instrumental Variables with Moments from Two Samples,” Journal of the American Statistical Association, 87(418), 328–336.
  • Arellano and Meghir (1992) Arellano, M., and Meghir, C. (1992), “Female Labour Supply and On-the-Job Search: an Empirical Model Estimated Using Complementary Data Sets,” The Review of Economic Studies, 59(3), 537–559.
  • Blundell et al. (2007) Blundell, R., Chen, X., and Kristensen, D. (2007), “Semi-Nonparametric IV estimation of shape-invariant Engel curves,” Econometrica, 75(6), 1613–1669.
  • Buchinsky et al. (2021) Buchinsky, M., Li, F., and Liao, Z. (2021), “Estimation and Inference of Semiparametric Models Using Data from Several Sources,” Journal of Econometrics, In Press.
  • Cattaneo et al. (2020) Cattaneo, M. D., Jansson, M., and Ma, X. (2020), “Simple Local Polynomial Density Estimators,” Journal of the American Statistical Association, 115(531), 1449–1455.
  • Chen et al. (2008) Chen, X., Hong, H., Tarozzi, A. et al. (2008), “Semiparametric Efficiency in GMM Models with Auxiliary Data,” The Annals of Statistics, 36(2), 808–843.
  • Chernozhukov, Fernández-Val and Melly (2013) Chernozhukov, V., Fernández-Val, I., and Melly, B. (2013), “Inference on Counterfactual Distributions,” Econometrica, 81(6), 2205–2268.
  • Chernozhukov, Lee and Rosen (2013) Chernozhukov, V., Lee, S., and Rosen, A. M. (2013), “Intersection Bounds: Estimation and Inference,” Econometrica, 81(2), 667–737.
  • Fan et al. (2014) Fan, Y., Sherman, R., and Shum, M. (2014), “Identifying Treatment Effects under Data Combination,” Econometrica, 82(2), 811–822.
  • Firpo et al. (2009a) Firpo, S., Fortin, N., and Lemieux, T. (2009a), “Supplement to ‘Unconditional Quantile Regressions’,” Econometrica Supplemental Material, 77.
  • Firpo et al. (2009b) Firpo, S., Fortin, N. M., and Lemieux, T. (2009b), “Unconditional Quantile Regressions,” Econometrica, 77(3), 953–973.
  • Fortin et al. (2011) Fortin, N., Lemieux, T., and Firpo, S. (2011), “Decomposition Methods in Economics,” in Handbook of Labor Economics, Vol. 4, Amsterdam: Elsevier, pp. 1–102.
  • Graham et al. (2016) Graham, B. S., Pinto, C. C. d. X., and Egel, D. (2016), “Efficient Estimation of Data Combination Models by the Method of Auxiliary-to-Study Tilting (AST),” Journal of Business & Economic Statistics, 34(2), 288–301.
  • Guerre et al. (2000) Guerre, E., Perrigne, I., and Vuong, Q. (2000), “Optimal Nonparametric Estimation of First-Price Auctions,” Econometrica, 68(3), 525–574.
  • Härdle and Stoker (1989) Härdle, W., and Stoker, T. M. (1989), “Investigating Smooth Multiple Regression by the Method of Average Derivatives,” Journal of the American Statistical Association, 84(408), 986–995.
  • Hirukawa et al. (2020) Hirukawa, M., Murtazashvili, I., and Prokhorov, A. (2020), “Yet Another Look at the Omitted Variable Bias,” Working Paper.
  • Hoeffding et al. (1977) Hoeffding, W. (1977), “Some Incomplete and Boundedly Complete Families of Distributions,” The Annals of Statistics, 5(2), 278–291.
  • Imbens and Lancaster (1994) Imbens, G. W., and Lancaster, T. (1994), “Combining Micro and Macro Data in Microeconometric Models,” The Review of Economic Studies, 61(4), 655–680.
  • Inoue and Solon (2010) Inoue, A., and Solon, G. (2010), “Two-Sample Instrumental Variables Estimators,” The Review of Economics and Statistics, 92(3), 557–561.
  • Klevmarken (1982) Klevmarken, A. (1982), “Missing Variables and Two-Stage Least-Squares Estimation from More Than One Data Set,” in 1981 Proceedings of the American Statistical Association, Business and Economic Statistics Section.
  • Koenker and Bassett Jr (1978) Koenker, R., and Bassett Jr, G. (1978), “Regression Quantiles,” Econometrica, 46(1), 33–50.
  • Lehmann (1986) Lehmann, E. (1986), Testing Statistical Hypotheses (Second Ed.), New York: Wiley.
  • Li et al. (2002) Li, T., Perrigne, I., and Vuong, Q. (2002), “Structural Estimation of the Affiliated Private Value Auction Model,” RAND Journal of Economics, 33(2), 171–193.
  • Martínez-Iriarte (2020) Martínez-Iriarte, J. (2020), “Sensitivity Analysis in Unconditional Quantile Effects,” Working Paper.
  • Martínez-Iriarte and Sun (2020) Martínez-Iriarte, J., and Sun, Y. (2020), “Identification and Estimation of Unconditional Policy Effects of an Endogenous Binary Treatment,” Working Paper.
  • Matzkin (2003) Matzkin, R. L. (2003), “Nonparametric Estimation of Nonadditive Random Functions,” Econometrica, 71(5), 1339–1375.
  • Matzkin (2007) Matzkin, R. L. (2007), “Nonparametric Identification,” in Handbook of Econometrics, Vol. 6, Amsterdam: Elsevier, pp. 5307–5368.
  • Nelsen (2007) Nelsen, R. B. (2007), An Introduction to Copulas, New York: Springer Science & Business Media.
  • Newey and Powell (2003) Newey, W. K., and Powell, J. L. (2003), “Instrumental Variable Estimation of Nonparametric Models,” Econometrica, 71(5), 1565–1578.
  • Powell et al. (1989) Powell, J. L., Stock, J. H., and Stoker, T. M. (1989), “Semiparametric Estimation of Index Coefficients,” Econometrica, 57(6), 1403–1430.
  • Regan and Oaxaca (2009) Regan, T. L., and Oaxaca, R. L. (2009), “Work Experience as a Source of Specification Error in Earnings Models: Implications for Gender Wage Decompositions,” Journal of Population Economics, 22(2), 463–499.
  • Ridder and Moffitt (2007) Ridder, G., and Moffitt, R. (2007), “The Econometrics of Data Combination,” in Handbook of Econometrics, Vol. 6, Amsterdam: Elsevier, pp. 5469–5547.
  • Robins et al. (1994) Robins, J. M., Rotnitzky, A., and Zhao, L. P. (1994), “Estimation of Regression Coefficients When Some Regressors Are Not Always Observed,” Journal of the American Statistical Association, 89(427), 846–866.
  • Rothe (2010) Rothe, C. (2010), “Nonparametric Estimation of Distributional Policy Effects,” Journal of Econometrics, 155(1), 56–70.
  • Rothe (2012) Rothe, C. (2012), “Partial Distributional Policy Effects,” Econometrica, 80(5), 2269–2301.
  • Sasaki et al. (2020) Sasaki, Y., Ura, T., and Zhang, Y. (2020), “Unconditional Quantile Regression with High Dimensional Data,” Working Paper.
  • Van Der Vaart and Wellner (1996) Van Der Vaart, A. W., and Wellner, J. A. (1996), Weak Convergence and Empirical Processes, New York: Springer.