
MTE with Misspecification

Acknowledgments: We thank Vitor Possebom, Yixiao Sun, Kaspar Wuthrich, and seminar participants at UC San Diego for helpful comments. All remaining errors are ours.

Julián Martínez-Iriarte
Department of Economics
UC Santa Cruz
Email: [email protected]
   Pietro Emilio Spini
Department of Economics
UC San Diego
Email: [email protected]
This draft: November 2021
First draft: June 2021
Abstract

This paper studies the implication of a fraction of the population not responding to the instrument when selecting into treatment. We show that, in general, the presence of non-responders biases the Marginal Treatment Effect (MTE) curve and many of its functionals. Yet, we show that, when the propensity score is fully supported on the unit interval, it is still possible to restore identification of the MTE curve and its functionals with an appropriate re-weighting.

Keywords: Marginal Treatment Effects, Misspecification, Weak Instruments.

1 Introduction

Marginal treatment effects (MTEs) have unified the identification theory of several policy parameters. While the MTE framework is essentially non-parametric (linearity is sometimes assumed to facilitate estimation; see, e.g., Appendix B in Heckman, Urzua, and Vytlacil (2006)), it requires that the recipient’s participation in treatment follows a (generalized) Roy model. This is often referred to as additive separability: an “additive” comparison of costs and benefits determines selection. Identification of the MTE, in turn, is achieved via the local instrumental variable (LIV) approach (Heckman and Vytlacil (2001, 2005)). An excellent survey is provided by Mogstad and Torgovitsky (2018). An early effort to analyze the MTE under misspecification can be found in the appendix of the seminal paper by Heckman and Vytlacil (2001). They consider a case where additive separability in the selection equation does not hold. The most serious consequence is that the LIV approach does not identify the MTE curve.

In this paper we analyze a different type of misspecification. We model a situation in which, under additive separability, a proportion of the population does not take into account the instrumental variable when deciding whether to take up treatment or not. We refer to them as non-responders. To analyze the resulting bias, we define a pseudo-MTE curve which results from the LIV approach. Under no misspecification, the pseudo-MTE curve would coincide with the MTE curve. The resulting bias can be interpreted as a location-scale change of the MTE curve, parameterized by the proportion of non-responders and their propensity score.

We have two main results. The first one shows that the ability to recover the conditional average treatment effect (CATE) for the subpopulation of responders depends on the proportion of non-responders only through the support of the responders’ propensity score. Indeed, when the support of the responders’ propensity score is the unit interval, it is possible to identify the CATE without having to recover the true MTE curve in the first place. In a nutshell, ignoring misspecification and integrating the pseudo-MTE curve over the support of the observed propensity score yields the correct CATE for the subpopulation of responders.

While the previous identification result for the CATE is independent of the proportion of non-responders, this is not true of the MTE curve and other parameters derived from it, such as LATE and MPRTE. However, in our second result, we show how to recover the MTE curve for responders by undoing the location-scale change induced by the presence of non-responders. The correction is based on an estimate of the support of the propensity score and requires only observable data. It gives an estimator of the policy parameter of interest that is simple to implement. Cases where the propensity score is fully supported are relevant in practice. For a recent example, see the survey approach of Briggs, Caplin, Leth-Petersen, Tonetti, and Violante (2020), where the probability of having a child is supported on the full unit interval.

Recently, Acerenza, Ban, and Kedagni (2021) and Possebom (2021) focus on the effect of measurement error in treatment status on the MTE curve. We complement such results by noting that a simple change to our setup can cover the case of misclassification. In a setting where treatment status is misclassified, the observed outcome is generated with the true treatment status. In our setting of misspecification, the observed outcome can be regarded as a mixture of responders and non-responders. The proportion of non-responders is analogous to the proportion of misreporters. Indeed, our results also hold if, instead of having a fraction of non-responders, we have a fraction of misreporters.

Another consequence of the presence of non-responders in the sample is that the effect of the instrumental variable on the propensity score is attenuated. Motivated by this, we model a situation where the proportion of non-responders approaches 1, analogous to the setting of weak instruments of Staiger and Stock (1997). Thus, we can derive weak-instrument-like asymptotic distributions for the parameters derived from the MTE curve.

The rest of the paper is organized as follows: section 2 introduces the model; section 3 contains the main identification results; section 4 provides bounds for the case where the propensity score is not fully supported in the unit interval; section 5 traces the connection to the weak IV literature; and section 6 concludes. While this paper only deals with identification, we expect to extend our results to cover estimation and inference.

2 Misspecification and MTE

In this section we introduce our model for misspecification in the MTE framework (Bjorklund and Moffitt (1987), Heckman and Vytlacil (2001, 2005)). We analyze the consequences of misspecification from the identification point of view.

2.1 The Model

We start with a general non-separable potential outcome model

$$
\begin{aligned}
Y(0) &= h_0(X, U_0), \\
Y(1) &= h_1(X, U_1), \\
Y &= D^* Y(1) + (1 - D^*) Y(0),
\end{aligned}
$$

where $D^*$ is the observed treatment status, $X$ are observable covariates with support denoted by $\mathcal{X}$, and $\{Y(0), Y(1)\}$ and $Y$ are the potential and observed outcomes, respectively. The functions $h_0$ and $h_1$ are unknown.

We model misspecification as a situation where there are two types of individuals: responders and non-responders. Responders select into treatment taking into account the incentives in $Z$. Their selection equation is given by $D = \mathds{1}\{\mu(X,Z) \geq V\}$. Non-responders, on the other hand, do not react to the incentives in $Z$ at all. Their selection equation is given by $\tilde{D} = \mathds{1}\{\tilde{\mu}(X) \geq \tilde{V}\}$. Notice that $Z$ does not enter $\tilde{\mu}(\cdot)$. For the non-responders, $Z$ fails the relevance condition of the standard MTE model.

Let $S$ be the latent status of an individual: $S=1$ for a responder and $S=0$ for a non-responder. The observed treatment status $D^*$ is given by:

$$
D^* = S \cdot D + (1 - S) \cdot \tilde{D}. \tag{1}
$$

We allow the proportion of non-responders to vary with $X$. To this end, we define $\delta_X = \Pr(S=0 \mid X) = \Pr(D^* = \tilde{D} \mid X)$. Thus, for every subpopulation with characteristics $X=x$ there is a proportion $\delta_x = \Pr(S=0 \mid X=x) \in [0,1)$ of non-responders. We consider values where $\sup_{x \in \mathcal{X}} \delta_x < 1$ to avoid a situation where no one responds to the instrumental variable.

Remark 1.

We observe $Y$ according to $Y = D^* Y(1) + (1 - D^*) Y(0)$, which is determined by the actual choice $D^*$. If, instead, we have $Y = D Y(1) + (1 - D) Y(0)$, then we can interpret $D^*$ as a misclassified treatment status. In this case, all individuals decide according to $D = \mathds{1}\{\mu(X,Z) \geq V\}$, but a fraction of them reports according to $\tilde{D} = \mathds{1}\{\tilde{\mu}(X) \geq \tilde{V}\}$. See Acerenza, Ban, and Kedagni (2021) and Possebom (2021) for recent studies on MTE under misclassification.

The econometrician observes a cross section of $(Y_i, D^*_i, X_i, Z_i)$. When $\delta_X = 0$ almost surely, $D^* = D$ and we are in the familiar MTE framework of Heckman and Vytlacil (2001, 2005). Otherwise, if $\delta_X \neq 0$ almost surely, for an observation of $D^*_i$ we do not know whether we are observing the treatment status of a non-responder or of a responder. That is, it is unknown whether we are observing $D_i$ or $\tilde{D}_i$.

Assumption 1.

Type Independence. $S \perp Z \mid X$.

Assumption 1 states that, once we control for $X$, the latent status of an individual does not vary with the instrumental variable $Z$.

Assumption 2.

Relevance and Exogeneity

  1. $\mu(X,Z)$ is a nondegenerate random variable conditional on $X$.

  2. $(U_0, U_1, V, \tilde{V})$ are independent of $Z$ conditional on $X$.

Note that, for the subpopulation of non-responders, the instrument is valid but totally irrelevant. The larger the value of $\delta_x$, the “weaker” the instrument $Z$, since a larger share of participants with $X=x$ are non-responders. With the exception of the requirement that $\tilde{V} \perp Z \mid X$, these are the same conditions as in Heckman and Vytlacil (2001, 2005). Our additional requirement covers the subpopulation of non-responders: neither the “cost” of treatment $\tilde{V}$ nor the “benefit” $\tilde{\mu}(X)$ depends on $Z$ once we condition on $X$.

Example 1.

To fix ideas, we can think of a two-part cost of providing the incentive: a fixed cost associated with targeting a particular subpopulation with covariates $X=x$, and the cost of the incentive itself. If $Z$ is a voucher, there could be administrative costs associated with making it available to the subpopulation $X=x$. For non-responders, who do not redeem the voucher, the cost of the incentive is zero. Such a scenario would satisfy Assumption 2.

The misclassification structure of Equation (1) allows us to define three different propensity scores: an observed/identified one, which is based on the observables $(D^*, X, Z)$, and two latent/unobserved ones, one for the responders and one for the non-responders. Formally, they are given by

$$
\begin{aligned}
P^*(X,Z) &:= \Pr(D^* = 1 \mid X, Z) && \text{(Observed)} \\
P(X,Z) &:= \Pr(D = 1 \mid S = 1, X, Z) && \text{(Responders)} \\
\tilde{P}(X) &:= \Pr(\tilde{D} = 1 \mid S = 0, X) && \text{(Non-responders)}
\end{aligned}
$$

The next result takes (mainly) advantage of Assumption 1 to derive a useful linear relation between them.

Lemma 1.

Under Assumptions 1 and 2.2 we can relate the different propensity scores by

$$
P^*(X,Z) = (1 - \delta_X) \cdot P(X,Z) + \delta_X \cdot \tilde{P}(X). \tag{2}
$$

Proof.

Starting with the model in (1) we can write

$$
\begin{aligned}
\Pr(D^* = 1 \mid X, Z) &= \Pr(S = 1 \mid X, Z) \cdot \Pr(D = 1 \mid S = 1, X, Z) \\
&\quad + \Pr(S = 0 \mid X, Z) \cdot \Pr(\tilde{D} = 1 \mid S = 0, X, Z).
\end{aligned}
$$

Assumption 1 simplifies the mixing probabilities to $\Pr(S=1 \mid X) = 1 - \delta_X$ and $\Pr(S=0 \mid X) = \delta_X$. We obtain

$$
\Pr(D^* = 1 \mid X, Z) = (1 - \delta_X) \cdot \Pr(D = 1 \mid S = 1, X, Z) + \delta_X \cdot \Pr(\tilde{D} = 1 \mid S = 0, X, Z).
$$

To see that $\Pr(\tilde{D} = 1 \mid S = 0, X, Z) = \Pr(\tilde{D} = 1 \mid S = 0, X)$, we note that, by Assumptions 1 and 2.2,

$$
\Pr(\tilde{D} = 1 \mid S = 0, X, Z) = \Pr(\tilde{\mu}(X) \geq \tilde{V} \mid S = 0, X, Z) = \Pr(\tilde{\mu}(X) \geq \tilde{V} \mid X) = \Pr(\tilde{D} = 1 \mid S = 0, X).
$$

Therefore

$$
\begin{aligned}
\Pr(D^* = 1 \mid X, Z) &= (1 - \delta_X) \cdot \Pr(D = 1 \mid S = 1, X, Z) + \delta_X \cdot \Pr(\tilde{D} = 1 \mid S = 0, X) \\
&= (1 - \delta_X) \cdot P(X,Z) + \delta_X \cdot \tilde{P}(X).
\end{aligned}
$$

For a fixed $X=x$, the result in Lemma 1 shows that the observed propensity score (still random through $Z$) is a linear transformation of the propensity score for the responders. Here and below, we write $P^*(x,z)$ for $\Pr(D^* = 1 \mid X = x, Z = z)$ and $P(x,z)$ for $\Pr(D = 1 \mid S = 1, X = x, Z = z)$. If, additionally, we take two different values of $Z$, for example $z$ and $z'$, we can remove the contribution of $\tilde{P}(x)$, which is invariant with respect to $z$, and obtain

$$
P^*(x,z) - P^*(x,z') = (1 - \delta_x) \cdot \left[ P(x,z) - P(x,z') \right]. \tag{3}
$$

Equation (3) says that the changes in the observed propensity score induced by varying $Z$ are proportional to the changes in the true propensity score induced by varying $Z$. Thus, if we knew $\delta_x$, we could recover the change in the propensity score for the responders. When $Z$ is continuous, we can take a limiting version of this argument, e.g., as $z' \to z$, to obtain

$$
\frac{\partial P^*(x,z)}{\partial z} = (1 - \delta_x) \cdot \frac{\partial P(x,z)}{\partial z}. \tag{4}
$$

Both the discrete change (Equation (3)) and the continuous change (Equation (4)) in the propensity score play a role in the relationship between the MTE curve (defined below) and certain parameters of interest.
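The mixture relations in Equations (2) and (3) can be checked with a small simulation. The sketch below is illustrative only: the logistic form of the responders' propensity score, the binary instrument, and the values of `delta` and `p_tilde` are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def logistic(z):
    # Hypothetical responders' propensity score P(x, z) for one fixed cell X = x.
    return 1.0 / (1.0 + np.exp(-z))

def simulate_d_star(z, delta, p_tilde, rng):
    """Draw D* = S*D + (1 - S)*D_tilde as in Equation (1), with S independent of Z."""
    n = z.shape[0]
    s = rng.random(n) >= delta            # S = 1 (responder) with probability 1 - delta
    d = rng.random(n) <= logistic(z)      # responders react to Z
    d_tilde = rng.random(n) <= p_tilde    # non-responders ignore Z
    return np.where(s, d, d_tilde).astype(float)

rng = np.random.default_rng(0)
delta, p_tilde = 0.3, 0.6                 # assumed values, unknown in practice
z = rng.choice([-1.0, 1.0], size=400_000)
d_star = simulate_d_star(z, delta, p_tilde, rng)

p_star_hi = d_star[z == 1.0].mean()       # sample analogue of P*(x, z)
p_star_lo = d_star[z == -1.0].mean()      # sample analogue of P*(x, z')
p_hi, p_lo = logistic(1.0), logistic(-1.0)

lemma1_hi = (1 - delta) * p_hi + delta * p_tilde   # right-hand side of Equation (2)
eq3_rhs = (1 - delta) * (p_hi - p_lo)              # right-hand side of Equation (3)
print(p_star_hi, lemma1_hi)
print(p_star_hi - p_star_lo, eq3_rhs)
```

Up to sampling noise, the two printed pairs should agree: the observed score is an affine function of the responders' score, and differencing over $z$ removes the non-responders' term.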

2.2 The MTE for Responders

For the subpopulation of responders, the standard MTE framework holds. This motivates us to define an MTE curve for this subpopulation. In doing so, we are implicitly assuming that this is our object of interest. The reason is that, in many applications, we can also control the instrumental variable $Z$. Thus, to assess the effects of manipulations of $Z$, we look at the MTE curve for responders.

Let $\mathcal{P}_x$ and $\mathcal{P}^*_x$ denote the supports of $P(x,Z) := \Pr(D = 1 \mid X = x, Z)$ and $P^*(x,Z) := \Pr(D^* = 1 \mid X = x, Z)$, respectively. For the subpopulation of responders, we rewrite the selection equation as $D = \mathds{1}\{P(X,Z) \geq U_D\}$, where $U_D \sim U_{(0,1)}$. (This follows from $D = \mathds{1}\{F_{V|S,X,Z}(\mu(X,Z) \mid 1, X, Z) \geq F_{V|S,X,Z}(V \mid 1, X, Z)\}$. By Assumptions 1 and 2.2, we have $D = \mathds{1}\{P(X,Z) \geq F_{V|S,X}(V \mid 1, X)\}$, and we take $U_D := F_{V|S,X}(V \mid 1, X)$.) Thus, we define the MTE curve for responders as

$$
\text{MTE}(u,x) := \mathbb{E}\left[ Y(1) - Y(0) \mid S = 1, U_D = u, X = x \right].
$$

By the LIV approach we have the following equivalence result (see Heckman and Vytlacil (2001) for sufficient conditions):

$$
\text{MTE}(u,x) = \frac{\partial \mathbb{E}\left[ Y \mid S = 1, P(X,Z) = u, X = x \right]}{\partial u} \quad \text{for } u \in \mathcal{P}_x. \tag{5}
$$

Since we do not observe $P(X,Z)$, this is not an identification result in our setting. In a similar fashion, we define the following pseudo-MTE curve:

$$
\text{MTE}^*(u,x;\delta_x) := \frac{\partial \mathbb{E}\left[ Y \mid P^*(X,Z) = u, X = x \right]}{\partial u} \quad \text{for } u \in \mathcal{P}^*_x. \tag{6}
$$

We emphasize that the pseudo-MTE curve is indexed by $\delta_x$ because it depends implicitly on the proportion of non-responders. From the data alone, we can only compute $\text{MTE}^*(u,x;\delta_x)$, not $\text{MTE}(u,x)$. The pseudo-MTE curve is the curve that would be mistakenly taken to be the MTE curve. Indeed, in the absence of non-responders, $\text{MTE}^*(u,x;0) = \text{MTE}(u,x)$. If non-responders are present in the $X=x$ subpopulation, that is, if $\delta_x > 0$, the observed $\text{MTE}^*(u,x;\delta_x)$ does not identify $\text{MTE}(u,x)$. In other words, the LIV approach is biased. We can now fully characterize the bias induced by $\delta_x$ on the MTE curve.

Lemma 2.

Under Assumptions 1 and 2, we can write

$$
\text{MTE}(v,x) = (1 - \delta_x) \, \text{MTE}^*\left( (1 - \delta_x) v + \delta_x \tilde{P}(x), x; \delta_x \right) \quad \text{for } v \in \mathcal{P}_x. \tag{7}
$$

Proof.

Using (2), for $u \in \mathcal{P}^*_x$, we can write

$$
\begin{aligned}
\mathbb{E}\left[ Y \mid P^*(X,Z) = u, X = x \right] &= \mathbb{E}\left[ Y \mid (1 - \delta_x) \cdot P(X,Z) + \delta_x \cdot \tilde{P}(X) = u, X = x \right] \\
&= \mathbb{E}\left[ Y \,\middle|\, P(X,Z) = \frac{u - \delta_x \tilde{P}(x)}{1 - \delta_x}, X = x \right].
\end{aligned}
$$

Differentiating with respect to $u$, we obtain

$$
\text{MTE}^*(u,x;\delta_x) = \frac{1}{1 - \delta_x} \, \text{MTE}\left( \frac{u - \delta_x \tilde{P}(x)}{1 - \delta_x}, x \right) \quad \text{for } u \in \mathcal{P}^*_x, \tag{8}
$$

since $\frac{u - \delta_x \tilde{P}(x)}{1 - \delta_x} \in \mathcal{P}_x$ by (2). Alternatively, we can write

$$
\text{MTE}(v,x) = (1 - \delta_x) \, \text{MTE}^*\left( (1 - \delta_x) v + \delta_x \tilde{P}(x), x; \delta_x \right) \quad \text{for } v \in \mathcal{P}_x.
$$

Lemma 2 shows that the bias takes the form of both a location and a scale change. Equation (8), which is equivalent to Equation (7) (note the change of domain between (7) and (8)), shows that $\text{MTE}^*$ is obtained by changing the location from $u$ to $u - \delta_x \tilde{P}(x)$ and rescaling by $(1 - \delta_x)^{-1}$. Thus, as in a location-scale family of densities, we can regard $\text{MTE}^*$ as a family of curves, defined over $\mathcal{P}^*_x$, which is indexed by $\delta_x$ and $\tilde{P}(x)$.
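To make the location-scale relation concrete, the following sketch builds a pseudo-MTE curve from a known curve via Equation (8) and then inverts it via Equation (7). The linear curve `mte_true` and the values of `delta` and `p_tilde` are hypothetical choices for illustration; in an application only the pseudo-MTE curve would be available.

```python
import numpy as np

def mte_true(v):
    """Hypothetical responders' MTE curve, used only for illustration."""
    return 2.0 - 3.0 * v

def pseudo_mte(u, delta, p_tilde):
    """Equation (8): location-scale distortion of the true MTE curve."""
    return mte_true((u - delta * p_tilde) / (1.0 - delta)) / (1.0 - delta)

def debiased_mte(v, delta, p_tilde):
    """Equation (7): undo the distortion when delta and p_tilde are known."""
    u = (1.0 - delta) * v + delta * p_tilde
    return (1.0 - delta) * pseudo_mte(u, delta, p_tilde)

delta, p_tilde = 0.25, 0.4                 # assumed values
v_grid = np.linspace(0.0, 1.0, 11)
recovered = debiased_mte(v_grid, delta, p_tilde)
max_gap = float(np.max(np.abs(recovered - mte_true(v_grid))))
print(max_gap)                             # numerically zero
```

The round trip recovers the original curve exactly, which is all that Lemma 2 asserts; the practical difficulty is that $\delta_x$ and $\tilde{P}(x)$ are not known without further assumptions.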

3 Automatic and explicit de-biasing

We now introduce our two main results. We show that, for any subpopulation $X=x$ where the instrument is strong enough to induce a propensity score supported on the full unit interval $[0,1]$, the associated $\text{CATE}(x)$ can be identified for responders. This is true even though the $\text{MTE}^*(u,x;\delta_x)$ curve is biased for $\text{MTE}(u,x)$. We note that the identified $\text{CATE}(x)$ parameter corresponds to the subpopulation of responders.

Assumption 3.

Full Support. The support of $P(x,Z)$ is $\mathcal{P}_x = [0,1]$ for every $x$ in a subset $\mathcal{X}_B \subseteq \mathcal{X}$.

Assumption 3 says that the incentive in the instrument $Z$ is strong enough to induce any individual in the $X=x$ subpopulation into or out of treatment. Perhaps surprisingly, $\text{CATE}(x)$ can be recovered by resorting only to the full support assumption. That is, to correctly compute $\text{CATE}(x)$ we do not need to recover the true MTE curve for responders.

Theorem 1.

Let Assumptions 1, 2, and 3 hold. Then, for any $x \in \mathcal{X}_B$:

$$
\text{CATE}(x) = \int_{\inf \mathcal{P}^*_x}^{\sup \mathcal{P}^*_x} \text{MTE}^*(u,x;\delta_x) \, du.
$$

Proof.

The Conditional Average Treatment Effect, $\text{CATE}(x)$, could be computed using the true MTE curve (if it were observed) as

$$
\text{CATE}(x) = \int_0^1 \text{MTE}(u,x) \, du.
$$

Given that $\mathcal{P}_x = [0,1]$, we have $\mathcal{P}^*_x = [\underline{p_x^*}, \overline{p_x^*}]$, where $\underline{p_x^*} := \inf \mathcal{P}^*_x = \delta_x \tilde{P}(x)$ and $\overline{p_x^*} := \sup \mathcal{P}^*_x = (1 - \delta_x) + \delta_x \tilde{P}(x)$. Consider integrating the pseudo-MTE curve over the support of the observed propensity score:

$$
\int_{\delta_x \tilde{P}(x)}^{(1 - \delta_x) + \delta_x \tilde{P}(x)} \text{MTE}^*(u,x;\delta_x) \, du.
$$

Using (8), we have

$$
\begin{aligned}
\int_{\delta_x \tilde{P}(x)}^{(1 - \delta_x) + \delta_x \tilde{P}(x)} \text{MTE}^*(u,x;\delta_x) \, du &= \int_{\delta_x \tilde{P}(x)}^{(1 - \delta_x) + \delta_x \tilde{P}(x)} \frac{1}{1 - \delta_x} \, \text{MTE}\left( \frac{u - \delta_x \tilde{P}(x)}{1 - \delta_x}, x \right) du \\
&= \int_0^1 \text{MTE}(u,x) \, du \\
&= \text{CATE}(x),
\end{aligned}
$$

where we have done the change of variables

$$
v = \frac{u - \delta_x \tilde{P}(x)}{1 - \delta_x}.
$$

Remark 2.

Theorem 1 states that integrating the observed (and biased) marginal treatment effect curve over the support of the observed (and biased) propensity score yields $\text{CATE}(x)$, provided that the propensity score for responders has full support. Thus, under the type of misspecification described in (1), $\text{CATE}(x)$ is robust to $\delta_x \neq 0$.
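A quick numerical check of this automatic de-biasing is sketched below. The curve `mte_true`, the values of `delta` and `p_tilde`, and the trapezoid helper `integrate` are assumptions introduced for illustration, not part of the paper.

```python
import numpy as np

def mte_true(v):
    """Hypothetical responders' MTE curve on [0, 1]."""
    return 1.0 + np.sin(2.0 * v)

def pseudo_mte(u, delta, p_tilde):
    """Pseudo-MTE curve implied by Equation (8)."""
    return mte_true((u - delta * p_tilde) / (1.0 - delta)) / (1.0 - delta)

def integrate(f, lo, hi, n=20001):
    """Plain trapezoid rule on an evenly spaced grid."""
    grid = np.linspace(lo, hi, n)
    vals = f(grid)
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)

delta, p_tilde = 0.35, 0.7                      # assumed values
p_star_lo = delta * p_tilde                     # lower end of the observed support
p_star_hi = (1.0 - delta) + delta * p_tilde     # upper end of the observed support

cate_true = integrate(mte_true, 0.0, 1.0)
cate_from_pseudo = integrate(lambda u: pseudo_mte(u, delta, p_tilde), p_star_lo, p_star_hi)
print(cate_true, cate_from_pseudo)
```

The two integrals coincide for any admissible `delta` and `p_tilde`, because the rescaling of the curve by $(1-\delta_x)^{-1}$ is exactly offset by the shrinkage of the integration range to length $1-\delta_x$.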

Remark 3.

This result also holds in a setting of misclassification, which was our original motivation. That is, it holds in a setting where, instead of $Y = D^* Y(1) + (1 - D^*) Y(0)$, we have $Y = D Y(1) + (1 - D) Y(0)$ and we interpret $D^*$ as a misclassified treatment status.

Unfortunately, the automatic “de-biasing” in Theorem 1 does not hold for the other policy parameters that can be obtained via the MTE curve. On the other hand, we show that the full support assumption can be used to identify $\delta_x$, which allows an explicit “de-biasing” procedure. Given that $\mathcal{P}^*_x = [\underline{p_x^*}, \overline{p_x^*}] = [\delta_x \tilde{P}(x), (1 - \delta_x) + \delta_x \tilde{P}(x)]$, we can actually identify both $\delta_x$ and $\tilde{P}(x)$. It then follows from Lemma 2 that we can recover the $\text{MTE}(u,x)$ curve.

Proposition 1.

Let Assumptions 1, 2, and 3 hold. Then $\delta_x$ is identified for any $x \in \mathcal{X}_B$ through:

$$
\delta_x = 1 - (\overline{p_x^*} - \underline{p_x^*}).
$$

Proof.

According to Equation (2), the range of the observed propensity score is given by $\mathcal{P}^*_x = [\delta_x \tilde{P}(x), (1 - \delta_x) + \delta_x \tilde{P}(x)]$. For each $x$, the observed propensity score $P^*(\cdot)$ can be viewed as an affine function of $P(\cdot)$. This affine function is parameterized by $\delta_x$ and $\tilde{P}(x)$. For the endpoints $\underline{p_x}$ and $\overline{p_x}$ of the true propensity score, we have the mappings:

$$
\begin{aligned}
\underline{p_x} &\mapsto (1 - \delta_x) \underline{p_x} + \delta_x \tilde{P}(x), \\
\overline{p_x} &\mapsto (1 - \delta_x) \overline{p_x} + \delta_x \tilde{P}(x).
\end{aligned}
$$

The images of these mappings are observed: they are the endpoints of the support of the observed propensity score $P^*(x,Z)$. If the original endpoints of the true $P(\cdot)$ are known to be $\underline{p_x} = 0$ and $\overline{p_x} = 1$, as stated in Assumption 3, the mapping above can be recovered from the following system of two equations in two unknowns, $\tilde{P}(x)$ and $\delta_x$:

$$
\begin{aligned}
\underline{p_x^*} &= \delta_x \tilde{P}(x), \\
\overline{p_x^*} &= (1 - \delta_x) + \delta_x \tilde{P}(x),
\end{aligned}
$$

which implies that

$$
\begin{aligned}
\delta_x &= 1 - (\overline{p_x^*} - \underline{p_x^*}), \\
\tilde{P}(x) &= \underline{p_x^*} \cdot \frac{1}{\delta_x},
\end{aligned}
$$

where the second expression requires $\delta_x > 0$.

The intuition for this result is simple. Because the original propensity score $P(x,Z)$, for any fixed $x$, is supported on the unit interval, the observed support $\mathcal{P}^*_x = [\underline{p_x^*}, \overline{p_x^*}]$ contains enough information to identify $\delta_x$. This is summarized in Figure 1.

Figure 1: Identifying $\delta_x$. The figure shows the link between the non-responders' propensity score, the proportion of non-responders, and the observed propensity score. Because the non-responders' propensity score does not vary with the instrument $Z$ and $\operatorname{supp}(P(x,Z)) = [0,1]$, $\delta_x$ can be recovered from the discrepancy between the observed support of $P^*(x,Z)$ and $[0,1]$. The picture shows one such point, $x_0$.

Having identified $\delta_x$, we then use Equation (8) to identify the MTE curve.

Corollary 1.

Let Assumptions 1, 2, and 3 hold. Then, the MTE curve is identified:

$$
\text{MTE}(v,x) = (\overline{p_x^*} - \underline{p_x^*}) \, \text{MTE}^*\left( (\overline{p_x^*} - \underline{p_x^*}) v + \underline{p_x^*}, x; 1 - (\overline{p_x^*} - \underline{p_x^*}) \right) \quad \text{for } v \in \mathcal{P}_x = [0,1],
$$

where $\underline{p_x^*} = \inf \mathcal{P}^*_x$ and $\overline{p_x^*} = \sup \mathcal{P}^*_x$.
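As a rough illustration of how Proposition 1 and Corollary 1 could be put to work, the sketch below backs out $\delta_x$ and $\tilde{P}(x)$ from the endpoints of an observed propensity score and maps a point $v \in [0,1]$ to the argument at which the pseudo-MTE curve is evaluated. The function names and the example values are hypothetical; with real data, the sample minimum and maximum are only estimates of the support endpoints, and this sketch ignores that estimation error.

```python
import numpy as np

def identify_mixture(p_star_values):
    """Back out (delta, p_tilde) from the endpoints of the observed support (Proposition 1)."""
    lo = float(np.min(p_star_values))
    hi = float(np.max(p_star_values))
    delta = 1.0 - (hi - lo)
    # p_tilde = lo / delta is only defined when delta > 0.
    p_tilde = lo / delta if delta > 0.0 else float("nan")
    return delta, p_tilde, lo, hi

def observed_argument(v, lo, hi):
    """Map v in [0, 1] to the point of the observed support used in Corollary 1."""
    return (hi - lo) * v + lo

# Hypothetical data-generating values, used only to build an example support.
true_delta, true_p_tilde = 0.2, 0.5
p_resp = np.linspace(0.0, 1.0, 101)        # responders' score with full support
p_star = (1.0 - true_delta) * p_resp + true_delta * true_p_tilde

delta_hat, p_tilde_hat, lo, hi = identify_mixture(p_star)
u_mid = observed_argument(0.5, lo, hi)
print(delta_hat, p_tilde_hat, u_mid)
```

The de-biased curve at `v` is then `(hi - lo)` times the pseudo-MTE curve evaluated at `observed_argument(v, lo, hi)`.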

This corollary provides the correct “de-biasing” to be performed on the observed MTE curve to match the true MTE curve. However, it is possible to recover parameters that are based on the MTE curve without having to recover the MTE curve in the first place. We provide two examples.

Example 2 (LATE).

Consider the LATE, for $P(x,z') < P(x,z)$ with $z, z' \in \mathcal{Z}$, which can be obtained from the MTE curve as

$$
\text{LATE}(x, P(x,z), P(x,z')) = \frac{1}{P(x,z) - P(x,z')} \int_{P(x,z')}^{P(x,z)} \text{MTE}(u,x) \, du.
$$

Under misspecification, for the same $z, z' \in \mathcal{Z}$, we have

$$
\begin{aligned}
\text{LATE}^*(x, P^*(x,z), P^*(x,z')) &= \frac{1}{P^*(x,z) - P^*(x,z')} \int_{P^*(x,z')}^{P^*(x,z)} \text{MTE}^*(u,x;\delta_x) \, du \\
&= \frac{(1 - \delta_x)^{-1}}{P(x,z) - P(x,z')} \int_{(1 - \delta_x) P(x,z') + \delta_x \tilde{P}(x)}^{(1 - \delta_x) P(x,z) + \delta_x \tilde{P}(x)} \frac{1}{1 - \delta_x} \, \text{MTE}\left( \frac{u - \delta_x \tilde{P}(x)}{1 - \delta_x}, x \right) du.
\end{aligned}
$$

Note that to go from $\text{MTE}^*$ to $\text{MTE}$ we used Lemma 2. We did not use Corollary 1. Defining the change of variables $\tilde{u} = \frac{u - \delta_x \tilde{P}(x)}{1 - \delta_x}$, we get $(1 - \delta_x) \, d\tilde{u} = du$. We then write

$$
\begin{aligned}
\text{LATE}^*(x, P^*(x,z), P^*(x,z')) &= \frac{(1 - \delta_x)^{-1}}{P(x,z) - P(x,z')} \int_{(1 - \delta_x) P(x,z') + \delta_x \tilde{P}(x)}^{(1 - \delta_x) P(x,z) + \delta_x \tilde{P}(x)} \frac{1}{1 - \delta_x} \, \text{MTE}\left( \frac{u - \delta_x \tilde{P}(x)}{1 - \delta_x}, x \right) du \\
&= \frac{(1 - \delta_x)^{-1}}{P(x,z) - P(x,z')} \int_{P(x,z')}^{P(x,z)} \text{MTE}(u,x) \, du \\
&= \frac{1}{1 - \delta_x} \, \text{LATE}(x, P(x,z), P(x,z')).
\end{aligned}
$$

Now, since $\delta_x = 1 - (\overline{p_x^*} - \underline{p_x^*})$ by Proposition 1, the explicit de-biasing is achieved by

$$
(\overline{p_x^*} - \underline{p_x^*}) \, \text{LATE}^*(x, P^*(x,z), P^*(x,z')) = \text{LATE}(x, P(x,z), P(x,z')).
$$

The left hand side can be computed from the data.
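The LATE correction can be traced through a small worked example. Everything numerical here is assumed for illustration: the linear curve `mte_true`, the values `delta = 0.2` and `p_tilde = 0.5`, and the two responder propensity scores `0.2` and `0.7`.

```python
import numpy as np

def mte_true(v):
    """Hypothetical responders' MTE curve."""
    return 2.0 - 3.0 * v

def pseudo_mte(u, delta, p_tilde):
    """Pseudo-MTE curve implied by Equation (8)."""
    return mte_true((u - delta * p_tilde) / (1.0 - delta)) / (1.0 - delta)

def average_over(f, lo, hi, n=10001):
    """Average of f over [lo, hi] using the trapezoid rule."""
    grid = np.linspace(lo, hi, n)
    vals = f(grid)
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0) / (hi - lo)

delta, p_tilde = 0.2, 0.5                       # assumed values
p_lo, p_hi = 0.2, 0.7                           # P(x, z') and P(x, z) for responders

def to_observed(p):
    # Equation (2): observed score as an affine function of the responders' score.
    return (1.0 - delta) * p + delta * p_tilde

late_true = average_over(mte_true, p_lo, p_hi)
late_star = average_over(lambda u: pseudo_mte(u, delta, p_tilde),
                         to_observed(p_lo), to_observed(p_hi))

# Under full support, the observed support is [to_observed(0), to_observed(1)].
support_width = to_observed(1.0) - to_observed(0.0)
late_debiased = support_width * late_star
print(late_true, late_star, late_debiased)
```

In this example the pseudo-LATE overstates the responders' LATE by the factor $1/(1-\delta_x) = 1.25$, and multiplying by the width of the observed support removes the distortion.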

Example 3 (MPRTE).

The marginal policy relevant treatment effect (MPRTE) is an average of $\text{MTE}(u,x)$ along the margin of indifference, that is, when $U_D = P(X,Z)$. It is given by

$$
\text{MPRTE}(x) = \int_{\mathcal{Z}} \text{MTE}(P(x,z), x) \, \frac{\partial P(x,z)}{\partial z} \left( E\left[ \frac{\partial P(x,Z)}{\partial z} \right] \right)^{-1} f_{Z|X}(z \mid x) \, dz.
$$

Then, using Equations (4) and (7), we get

$$
\begin{aligned}
\text{MPRTE}^*(x) &= \int_{\mathcal{Z}} \text{MTE}^*(P^*(x,z), x; \delta_x) \, \frac{\partial P^*(x,z)}{\partial z} \left( E\left[ \frac{\partial P^*(x,Z)}{\partial z} \right] \right)^{-1} f_{Z|X}(z \mid x) \, dz \\
&= \int_{\mathcal{Z}} \frac{1}{1 - \delta_x} \, \text{MTE}(P(x,z), x) \, \frac{\partial P(x,z)}{\partial z} \left( E\left[ \frac{\partial P(x,Z)}{\partial z} \right] \right)^{-1} f_{Z|X}(z \mid x) \, dz \\
&= \frac{1}{1 - \delta_x} \, \text{MPRTE}(x).
\end{aligned}
$$

Thus, again, by Proposition 1, we obtain

$$
(\overline{p_x^*} - \underline{p_x^*}) \, \text{MPRTE}^*(x) = \text{MPRTE}(x).
$$

In the previous examples, proceeding as if there were no misspecification yields biased parameters. Thus, the automatic “de-biasing” in CATE is the exception rather than the rule.

4 Bounds under limited support

Instead of assuming full support, we now allow for limited support of the propensity score $P(x,Z)$, but we still require that the support be an interval.

Assumption 4.

Limited Support. The support of $P(x,Z)$ is $\mathcal{P}_x = [\underline{p_x}, \overline{p_x}] \subset [0,1]$.

Under Assumption 4, and using (2), we have that the observed support of $P^*(x,Z)$ is

$$
[\underline{p_x^*}, \overline{p_x^*}] = [(1 - \delta_x) \underline{p_x} + \delta_x \tilde{P}(x), (1 - \delta_x) \overline{p_x} + \delta_x \tilde{P}(x)].
$$

Taking the difference, we obtain $\overline{p_x^*} - \underline{p_x^*} = (1 - \delta_x)(\overline{p_x} - \underline{p_x})$. Since $\overline{p_x} - \underline{p_x} \leq 1$, we have $\overline{p_x^*} - \underline{p_x^*} \leq 1 - \delta_x$, so that an upper bound for $\delta_x$ is $\delta_x \leq 1 - (\overline{p_x^*} - \underline{p_x^*})$.

In general, it is not possible to provide a nontrivial lower bound for $\delta_x$. This is similar to the case of misclassification. Following that literature (see Assumption 4 in Acerenza, Ban, and Kedagni (2021), and references therein), one would assume that a bound on $\delta_x$ is known. Note, however, that the inequality $\overline{p_x^*} - \underline{p_x^*} \leq 1 - \delta_x$ gives $\delta_x \leq 1 - (\overline{p_x^*} - \underline{p_x^*})$, so the observed support already bounds $\delta_x$ from above; what the data do not provide is a lower bound other than $\delta_x \geq 0$. The correction factor in Examples 2 and 3 is $(1 - \delta_x)$. It is therefore bounded by $\overline{p_x^*} - \underline{p_x^*} \leq 1 - \delta_x \leq 1$. Thus, when $\text{LATE}^*$ and $\text{MPRTE}^*$ are nonnegative, we can bound both LATE and MPRTE (the inequalities reverse when they are negative):

$$
\begin{aligned}
(\overline{p_x^*} - \underline{p_x^*}) \, \text{LATE}^*(x, P^*(x,z), P^*(x,z')) &\leq \text{LATE}(x, P(x,z), P(x,z')) \\
&\leq \text{LATE}^*(x, P^*(x,z), P^*(x,z')),
\end{aligned}
$$

and

$$
(\overline{p_x^*} - \underline{p_x^*}) \, \text{MPRTE}^*(x) \leq \text{MPRTE}(x) \leq \text{MPRTE}^*(x).
$$

If, in addition, a lower bound $\underline{\delta}_x \leq \delta_x$ is known, the factor $1$ on the right-hand side can be tightened to $1 - \underline{\delta}_x$.

Again, we stress that it is not necessary to bound the MTE curve in the first place. Such a bound can be complicated to obtain since, by Lemma 2, $\delta_x$ enters the observed MTE curve in three different ways.
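The bounding step can be sketched as follows. The sketch takes as its only structural input the relation $\overline{p_x^*} - \underline{p_x^*} = (1-\delta_x)(\overline{p_x} - \underline{p_x})$ with $\overline{p_x} - \underline{p_x} \leq 1$, which places the correction factor $1 - \delta_x$ between the observed support width and one minus any known lower bound on $\delta_x$. The function names, the argument `delta_lower` (defaulting to zero), and the numbers are hypothetical. Because the correction factor is positive, which endpoint is the lower bound depends on the sign of the pseudo-parameter, so the helper sorts the two candidates.

```python
def correction_factor_bounds(p_star_lo, p_star_hi, delta_lower=0.0):
    """Bounds on 1 - delta implied by the observed support and a lower bound on delta."""
    lower = p_star_hi - p_star_lo          # width of the observed support
    upper = 1.0 - delta_lower
    if lower > upper:
        raise ValueError("delta_lower is inconsistent with the observed support width")
    return lower, upper

def bound_parameter(pseudo_value, p_star_lo, p_star_hi, delta_lower=0.0):
    """Bounds on LATE or MPRTE given its pseudo counterpart (LATE* or MPRTE*)."""
    lower, upper = correction_factor_bounds(p_star_lo, p_star_hi, delta_lower)
    candidates = (lower * pseudo_value, upper * pseudo_value)
    return min(candidates), max(candidates)

# Hypothetical inputs: observed support [0.15, 0.75], assumed delta >= 0.2.
bounds_pos = bound_parameter(0.8, 0.15, 0.75, 0.2)
bounds_neg = bound_parameter(-0.8, 0.15, 0.75, 0.2)
print(bounds_pos)   # approximately (0.48, 0.64)
print(bounds_neg)   # approximately (-0.64, -0.48)
```

With `delta_lower` left at zero, the far endpoint is simply the pseudo-parameter itself, so only the observed support width is informative.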

5 Misspecification as a weak instrument

We can frame our model as the triangular scheme of Staiger and Stock (1997) and consider a sequence $\left\{\delta_{x,n}\right\}_{n=1}^{\infty}$ such that $\delta_{x,n}\to 1$ at a certain rate as $n\to\infty$. Thus, as $n\to\infty$, the instrument becomes irrelevant in the model. A possible indicator of a large value of $\delta_{x,n}$ is the average derivative of the observed propensity score, which equals an attenuated version of the average derivative of the true propensity score. For a given value of $\delta_{x,n}$, by equation (4), we have

\[
E\left[\frac{\partial P^{*}(x,Z)}{\partial z}\right]=(1-\delta_{x,n})E\left[\frac{\partial P(x,Z)}{\partial z}\right].
\]
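The attenuation in the display above can be checked numerically. The following is a minimal sketch, not part of the paper's procedure: it assumes a hypothetical responder propensity score $P(x,z)=\Phi(z)$, a standard normal instrument, and illustrative values for $\delta_{x,n}$ and $\tilde{P}(x)$, and it builds the observed score as $(1-\delta_{x,n})P(x,z)+\delta_{x,n}\tilde{P}(x)$. All function and variable names are made up for this illustration.

```python
import math
import random

def true_propensity(z):
    """Hypothetical responder propensity score P(x, z) = Phi(z) (an assumption)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def observed_propensity(z, delta, p_tilde):
    """Observed score P*(x, z) = (1 - delta) * P(x, z) + delta * P~(x)."""
    return (1.0 - delta) * true_propensity(z) + delta * p_tilde

def average_derivative(score, draws, h=1e-5):
    """Average central finite-difference derivative of `score` over the draws of Z."""
    return sum((score(z + h) - score(z - h)) / (2.0 * h) for z in draws) / len(draws)

rng = random.Random(0)
z_draws = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # assumed Z ~ N(0, 1)
delta, p_tilde = 0.8, 0.4  # illustrative share of non-responders and their take-up rate

avg_true = average_derivative(true_propensity, z_draws)
avg_obs = average_derivative(lambda z: observed_propensity(z, delta, p_tilde), z_draws)
ratio = avg_obs / avg_true  # should match 1 - delta up to rounding error
print(f"ratio = {ratio:.6f}, 1 - delta = {1 - delta:.6f}")
```

Because the non-responder term does not vary with $z$, it drops out of the derivative, so the ratio of the two average derivatives recovers $1-\delta_{x,n}$ whatever value is chosen for $\tilde{P}(x)$.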

Thus, a “small” value of the average derivative of the observed propensity score can be an indication that $\delta_{x,n}$ is close to 1. This is similar to a first-stage regression in the linear model. We take the derivative with respect to $z$ to get rid of the propensity score that does not respond to $Z$. We average because this is likely to be a non-linear expression. Thus, $(1-\delta_{x,n})$ can be thought of as the counterpart of $C/\sqrt{T}$ in the notation of Staiger and Stock (1997). Indeed, define

\[
Cov_{x}(Z,D^{*}):=E[ZD^{*}|X=x]-E[Z|X=x]E[D^{*}|X=x].
\]

We have

\[
\begin{aligned}
E[ZD^{*}|X=x] &= E[ZSD|X=x]+E[Z(1-S)\tilde{D}|X=x] \\
&= E[ZSD|X=x]+E[Z|X=x]E[(1-S)\tilde{D}|X=x]
\end{aligned}
\]

and

\[
E[D^{*}|X=x]=E[SD|X=x]+E[(1-S)\tilde{D}|X=x].
\]

Thus,

\[
\begin{aligned}
Cov_{x}(Z,D^{*}) &= E[ZSD|X=x]-E[Z|X=x]E[SD|X=x] \\
&\quad +E[Z|X=x]E[(1-S)\tilde{D}|X=x]-E[Z|X=x]E[(1-S)\tilde{D}|X=x] \\
&= Cov_{x}(Z,SD),
\end{aligned}
\]

which is the covariance between the instrument and treatment status for the responders with $X=x$. To see the role of the rate at which $\delta_{x,n}$ converges to 1, suppose for a moment that we know the functional form of $P^{*}(x,Z)$ and estimate the average derivative using a sample mean:

\[
\hat{E}\left[\frac{\partial P^{*}(x,Z)}{\partial z}\right]=\frac{1}{n}\sum_{i=1}^{n}\frac{\partial P^{*}(x,Z_{i})}{\partial z}=(1-\delta_{x,n})\frac{1}{n}\sum_{i=1}^{n}\frac{\partial P(x,Z_{i})}{\partial z}.
\]

Then

\[
\hat{E}\left[\frac{\partial P^{*}(x,Z)}{\partial z}\right]-E\left[\frac{\partial P^{*}(x,Z)}{\partial z}\right]=(1-\delta_{x,n})\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial P(x,Z_{i})}{\partial z}-E\left[\frac{\partial P(x,Z)}{\partial z}\right]\right).
\]

In order to investigate possible discontinuities in the limiting distributions, we follow Hahn and Kuersteiner (2002) and let $(1-\delta_{x,n})=n^{\nu_{x}}$ for $\nu_{x}<0$. We obtain

\[
\hat{E}\left[\frac{\partial P^{*}(x,Z)}{\partial z}\right]-E\left[\frac{\partial P^{*}(x,Z)}{\partial z}\right]=O_{p}(n^{\nu_{x}-1/2}).
\]

Then, we obtain a degenerate limit:

\[
\sqrt{n}\left(\hat{E}\left[\frac{\partial P^{*}(x,Z)}{\partial z}\right]-E\left[\frac{\partial P^{*}(x,Z)}{\partial z}\right]\right)=o_{p}(1).
\]
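The degenerate limit can be illustrated with a small Monte Carlo experiment. This is only a sketch under stated assumptions: the responder propensity score is taken to be $P(x,z)=\Phi(z)$ with $Z\sim N(0,1)$, so that $\partial P/\partial z=\phi(z)$ and $E[\phi(Z)]=1/(2\sqrt{\pi})$, and the rate is set to $\nu_{x}=-1/4$. These choices and all names below are hypothetical and serve only to show the $\sqrt{n}$-scaled error shrinking as $n$ grows.

```python
import math
import random

def phi(z):
    """Standard normal density: the z-derivative of the assumed score P(x, z) = Phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Population average derivative E[phi(Z)] when Z ~ N(0, 1).
POP_MEAN = 1.0 / (2.0 * math.sqrt(math.pi))

def scaled_error_rmse(n, nu, reps, rng):
    """Root mean square of sqrt(n) * (sample mean - population mean) of dP*/dz."""
    shrink = n ** nu  # plays the role of (1 - delta_{x,n}) = n^{nu}
    total = 0.0
    for _ in range(reps):
        mean_true = sum(phi(rng.gauss(0.0, 1.0)) for _ in range(n)) / n
        err = shrink * (mean_true - POP_MEAN)  # error of the observed average derivative
        total += (math.sqrt(n) * err) ** 2
    return math.sqrt(total / reps)

rng = random.Random(1)
nu = -0.25  # illustrative rate, nu < 0
rmse_by_n = {n: scaled_error_rmse(n, nu, reps=200, rng=rng) for n in (100, 1600)}
for n, rmse in rmse_by_n.items():
    print(n, round(rmse, 4))
```

With $\nu_{x}=-1/4$, the scaled error is of order $n^{-1/4}$, so multiplying the sample size by 16 should roughly halve its root mean square, up to simulation noise.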

Now consider the MPRTE. Recall that, by Example 3, under the full support guaranteed by Assumption 3,

\[
n^{\nu_{x}}\,\text{MPRTE}^{*}(x)=\text{MPRTE}(x).
\]

Assume that, if $\delta_{x}=0$, there exists $\hat{\text{MPRTE}}(x)$, a $\sqrt{n}$-consistent estimator of $\text{MPRTE}(x)$, such that

\[
\hat{\text{MPRTE}}^{*}(x)-\text{MPRTE}^{*}(x)=n^{-\nu_{x}}\left(\hat{\text{MPRTE}}(x)-\text{MPRTE}(x)\right).
\]

Thus, if $\nu_{x}=-1/2$, then $\hat{\text{MPRTE}}^{*}(x)$ does not converge in probability. In future work, we will use these results to construct confidence intervals for the parameters of interest.
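The role of the threshold $\nu_{x}=-1/2$ can be traced by tracking orders of magnitude. The following is a sketch that uses only the relation assumed above together with $\sqrt{n}$-consistency, that is, $\hat{\text{MPRTE}}(x)-\text{MPRTE}(x)=O_{p}(n^{-1/2})$:

```latex
\begin{align*}
\hat{\text{MPRTE}}^{*}(x)-\text{MPRTE}^{*}(x)
  &= n^{-\nu_{x}}\left(\hat{\text{MPRTE}}(x)-\text{MPRTE}(x)\right) \\
  &= n^{-\nu_{x}}\,O_{p}(n^{-1/2}) \\
  &= O_{p}(n^{-\nu_{x}-1/2}).
\end{align*}
```

The estimation error therefore vanishes in probability when $-1/2<\nu_{x}<0$. At $\nu_{x}=-1/2$ it is $O_{p}(1)$, and it does not vanish whenever $\sqrt{n}\,(\hat{\text{MPRTE}}(x)-\text{MPRTE}(x))$ has a non-degenerate limit, which is an additional assumption of this sketch.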

6 Conclusion

In this paper we use the MTE framework to model a proportion of individuals who do not respond to the incentives of the instrumental variable. We show that in the special case where the observed propensity score is fully supported on the unit interval, i) the CATE is automatically identified regardless of the non-responders, and ii) we can identify the proportion of non-responders and use it to recover the MTE curve and any parameter associated with it. We show that for some parameters, such as LATE and MPRTE, it is even possible to bypass the recovery of the MTE curve and recover these parameters directly. Moreover, if the propensity score has limited support, we find bounds for the LATE, the MPRTE, and the MTE curve. When we let the proportion of non-responders approach 1 at a certain rate, the framework resembles that of weak instruments. In future research we hope to leverage the results in this literature to construct valid confidence intervals for the MTE curve and related parameters.

References

  • Acerenza, Ban, and Kedagni (2021) Acerenza, S., K. Ban, and D. Kedagni (2021): “Marginal Treatment Effects with Misclassified Treatment,” Working Paper.
  • Bjorklund and Moffitt (1987) Bjorklund, A., and R. Moffitt (1987): “The Estimation of Wage Gains and Welfare Gains in Self-Selection,” The Review of Economics and Statistics, 69(1), 42–49.
  • Briggs, Caplin, Leth-Petersen, Tonetti, and Violante (2020) Briggs, J., A. Caplin, S. Leth-Petersen, C. Tonetti, and G. Violante (2020): “Estimating Marginal Treatment Effects with Survey Instruments,” Working Paper.
  • Hahn and Kuersteiner (2002) Hahn, J., and G. Kuersteiner (2002): “Discontinuities of weak instrument limiting distributions,” Economics Letters, 75, 325–331.
  • Heckman, Urzua, and Vytlacil (2006) Heckman, J. J., S. Urzua, and E. Vytlacil (2006): “Understanding Instrumental Variables in Models with Essential Heterogeneity,” The Review of Economics and Statistics, 88(3), 389–432.
  • Heckman and Vytlacil (2001) Heckman, J. J., and E. Vytlacil (2001): “Local Instrumental Variables,” in Nonlinear Statistical Modeling: Proceedings of the Thirteenth International Symposium in Economic Theory and Econometrics: Essays in Honor of Takeshi Amemiya, ed. by C. Hsiao, K. Morimune, and J. Powell, pp. 1–46. Cambridge University Press.
  • Heckman and Vytlacil (2005) Heckman, J. J., and E. Vytlacil (2005): “Structural Equations, Treatment Effects, and Econometric Policy Evaluation,” Econometrica, 73(3), 669–738.
  • Mogstad and Torgovitsky (2018) Mogstad, M., and A. Torgovitsky (2018): “Identification and Extrapolation of Causal Effects with Instrumental Variables,” Annual Review of Economics, 10, 577–613.
  • Possebom (2021) Possebom, V. (2021): “Crime and Mismeasured Punishment: Marginal Treatment Effect with Misclassification,” Working Paper.
  • Staiger and Stock (1997) Staiger, D., and J. H. Stock (1997): “Instrumental Variables Regression with Weak Instruments,” Econometrica, 65(3), 557–586.