
A general characterization of optimal tie-breaker designs

Harrison H. Li
Stanford University
   Art B. Owen
Stanford University
(October 2022)
Abstract

Tie-breaker designs trade off a statistical design objective with short-term gain from preferentially assigning a binary treatment to those with high values of a running variable x. The design objective is any continuous function of the expected information matrix in a two-line regression model, and short-term gain is expressed as the covariance between the running variable and the treatment indicator. We investigate how to specify design functions indicating treatment probabilities as a function of x to optimize these competing objectives, under external constraints on the number of subjects receiving treatment. Our results include sharp existence and uniqueness guarantees, while accommodating the ethically appealing requirement that treatment probabilities be non-decreasing in x. Under such a constraint, there always exists an optimal design function that is constant below and above a single discontinuity. When the running variable distribution is not symmetric or the fraction of subjects receiving the treatment is not 1/2, our optimal designs improve upon a D-optimality objective without sacrificing short-term gain, compared to the three level tie-breaker designs of Owen and Varian (2020) that fix treatment probabilities at 0, 1/2, and 1. We illustrate our optimal designs with data from Head Start, an early childhood government intervention program.

1 Introduction

Companies, charitable institutions, and clinicians often have ethical or economic reasons to prefer assigning a binary treatment to certain individuals. If this preference is expressed by the values of a scalar running variable x, a natural decision is to assign the treatment to a subject if and only if their x is at least some threshold t. This is a regression discontinuity design, or RDD (Thistlethwaite and Campbell, 1960). Unfortunately, treatment effect estimates from an RDD analysis typically have very high variance (Jacob et al., 2012; Goldberger, 1972; Gelman and Imbens, 2017), relative to those from a randomized controlled trial (RCT) that does not preferentially treat any individuals. To trade off between these competing statistical and ethical objectives, investigators can use a tie-breaker design (TBD). In a typical tie-breaker design, the top-ranked subjects get the treatment, the lowest-ranked subjects are in the control group, and a cohort in the middle are randomized to treatment or control. The earliest tie-breaker reference that we are aware of is Campbell (1969), where x was discrete and the randomization broke ties among subjects with identical values of x.

Past settings for tie-breaker designs include the offer of remedial English to incoming university students based on their high school English proficiency (Aiken et al., 1998), a diversion program designed to reduce juvenile delinquency (Lipsey et al., 1981), scholarship offerings for two- and four-year colleges based on a judgment of the applicants’ needs and academic strengths (Abdulkadiroglu et al., 2017; Angrist et al., 2020), and clinical trials (Trochim and Cappelleri, 1992), where they are known as cutoff designs.

The tie-breaker design problem is to choose treatment probabilities p_{i} for subjects i=1,\dots,n based on their running variables x_{i}. These probabilities are chosen before observing the response values y_{1},\dots,y_{n} but with the running variables x_{1},\dots,x_{n} known. We assume throughout that p_{i}=p_{i^{\prime}} whenever x_{i}=x_{i^{\prime}}.

As is common in the optimal experimental design literature, the statistical objective is an “efficiency” criterion that measures estimation precision. Specifically, our criterion will be a function \Psi(\cdot) of the information (scaled inverse variance) matrix \mathcal{I}_{n}(p_{1},\dots,p_{n}) for the model parameters \beta=(\beta_{0},\beta_{1},\beta_{2},\beta_{3})^{\top} in a two-line model relating the response y_{i} to the running variable x_{i} and a treatment indicator z_{i}\in\{-1,1\}:

y_{i}=\beta_{0}+\beta_{1}x_{i}+\beta_{2}z_{i}+\beta_{3}x_{i}z_{i}+\varepsilon_{i}. (1)

This simple working model nonetheless poses some challenging design problems. In Section 6 we describe some more general modeling settings for tie-breaker models.

Throughout we assume that the running variable is centered, i.e. (1/n)\sum_{i}x_{i}=0, and that the \varepsilon_{i} have common variance \sigma^{2}. Here z_{i}=1 indicates treatment and so p_{i}=\Pr(z_{i}=1)=(1+\mathbb{E}(z_{i}))/2. For model (1) the information matrix is \mathcal{I}_{n}=\mathbb{E}(\mathcal{X}_{i}\mathcal{X}_{i}^{\top}) where \mathcal{X}_{i}=(1,x_{i},z_{i},x_{i}z_{i})^{\top}\in\mathbb{R}^{4} and the expectation is taken over the treatment assignments z_{i}, conditional on the running variables x_{i} (whose values are known). The ordinary least squares estimate \hat{\beta} of \beta satisfies \mathbb{E}(\mathrm{Var}(\hat{\beta})^{-1})=n\mathcal{I}_{n}/\sigma^{2}. Common examples of efficiency criteria \Psi(\cdot) in the literature, such as the D-optimality criterion \Psi_{D}(\cdot)=\log(\det(\cdot)), are concave in both \mathcal{I}_{n} and p=(p_{1},\ldots,p_{n})^{\top} (Boyd and Vandenberghe, 2004). However, our theoretical results only require continuity of \Psi(\cdot).
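To make the information matrix concrete, here is a small numerical sketch (our own illustrative helper, not code from the paper) that averages \mathbb{E}(\mathcal{X}_{i}\mathcal{X}_{i}^{\top}) over the subjects, using \mathbb{E}(z_{i})=2p_{i}-1 and z_{i}^{2}=1:

```python
import numpy as np

def expected_information(x, p):
    """Expected information matrix for the two-line model (1), averaging
    E(X_i X_i^T) over subjects i, where X_i = (1, x_i, z_i, x_i z_i).
    Only E(z_i) = 2 p_i - 1 and z_i^2 = 1 are needed."""
    x = np.asarray(x, dtype=float)
    ez = 2.0 * np.asarray(p, dtype=float) - 1.0   # E(z_i) given x_i
    n = len(x)
    I = np.zeros((4, 4))
    for xi, ezi in zip(x, ez):
        I += np.array([
            [1.0,        xi,           ezi,      xi * ezi],
            [xi,         xi**2,        xi * ezi, xi**2 * ezi],
            [ezi,        xi * ezi,     1.0,      xi],
            [xi * ezi,   xi**2 * ezi,  xi,       xi**2],
        ]) / n
    return I
```

For an RCT with every p_{i}=1/2 and a centered running variable, all off-diagonal entries vanish and the matrix is diagonal.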

Our preference for treating individuals with higher running variables x is expressed as an equality constraint on the scaled covariance \overline{xp}\equiv(1/n)\sum_{i=1}^{n}x_{i}p_{i} between treatment and the running variable (recall the latter is known, hence viewed as non-random). Under the two-line model (1), this constraint has the following economic interpretation. We take y to be something like economic value or student success, where larger y is better. We expect that \beta_{3}>0 holds in most of our motivating problems. The expected value of y per customer under (1) is then

\mathbb{E}(y_{i})=\beta_{0}+\beta_{2}\cdot(2\bar{p}-1)+2\beta_{3}\cdot\overline{xp} (2)

where \bar{p}\equiv(1/n)\sum_{i=1}^{n}p_{i}. Equation (2) shows that the expected gain is unaffected by \beta_{0} or \beta_{1}. Furthermore, we assume the proportion of treated subjects is fixed by an external budget, i.e., an equality constraint \bar{p}=\widetilde{p} for some \widetilde{p}\in(0,1). For instance, there might be only a set number of scholarships or perks to be given out. The only term affected by the design in (2) is then \beta_{3}\cdot\overline{xp}, as pointed out by Owen and Varian (2020). For \beta_{3}>0, the short term average value per customer grows with \overline{xp} and we would want that value to be large. Similar functionals are also commonly studied as regret functions in bandit problems (Goldenshluger and Zeevi, 2013; Metelkina and Pronzato, 2017).
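A quick numerical illustration of this interpretation, with made-up coefficient values and our own hypothetical helper `expected_gain`: for a centered running variable, \mathbb{E}(z)=2\bar{p}-1 and \mathbb{E}(xz)=2\overline{xp}, so two designs with the same treated fraction differ in expected response only through the \beta_{3} term.

```python
import numpy as np

def expected_gain(x, p, b0, b2, b3):
    """Per-subject expected response under the two-line model with a
    centered running variable: b0 + b2*(2*pbar - 1) + 2*b3*xpbar."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    return b0 + b2 * (2 * p.mean() - 1) + 2 * b3 * np.mean(x * p)

x = np.array([-2.0, -1.0, 1.0, 2.0])     # centered running variables
p_rct = np.full(4, 0.5)                  # RCT treating half the subjects
p_rdd = np.array([0.0, 0.0, 1.0, 1.0])   # RDD treating the top half
g_rct = expected_gain(x, p_rct, b0=1.0, b2=0.2, b3=0.5)
g_rdd = expected_gain(x, p_rdd, b0=1.0, b2=0.2, b3=0.5)
```

Both designs satisfy \bar{p}=1/2, so the \beta_{2} term is identical; the gap comes entirely from \overline{xp}, which is 0 for the RCT and 0.75 for the RDD here.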

We are now ready to formulate the tie-breaker design problem as the following constrained optimization problem. Given real values x_{1}\leqslant x_{2}\leqslant\cdots\leqslant x_{n}:

\begin{array}{ll}\mbox{maximize}&\qquad\Psi(\mathcal{I}_{n}(\bm{p}))\\ \mbox{over}&\qquad\bm{p}=(p_{1},\dots,p_{n})\in\mathcal{A}\\ \mbox{subject to}&\qquad n^{-1}\sum_{i=1}^{n}p_{i}=\widetilde{p}\qquad\mbox{and}\qquad n^{-1}\sum_{i=1}^{n}x_{i}p_{i}=\widetilde{xp}\end{array} (3)

for some constants \widetilde{p} and \widetilde{xp}. The first equality constraint in (3) is a budget constraint due to the cost of treatment, and the second constraint fixes the short-term gain mentioned above. We consider two different sets \mathcal{A} in detail. The first is [0,1]^{n}. The second is \{\bm{p}\in[0,1]^{n}\mid 0\leqslant p_{1}\leqslant p_{2}\leqslant\cdots\leqslant p_{n}\leqslant 1\}, which requires treatment probabilities to be non-decreasing in the running variable x. Such a monotonicity constraint prevents more qualified students from having a lower chance of getting a scholarship than less qualified ones, or more loyal customers having a lower chance of a perk than others. It also eliminates perverse incentives for subjects to lower their x_{i}. To our knowledge, such a monotonicity constraint has not received much attention in the optimal design literature, though it is enormously appealing in our motivating applications.

When the efficiency criterion \Psi(\cdot) is concave in \bm{p}, a solution to (3) can be found numerically via convex optimization, as Morrison and Owen (2022) do for vector-valued x_{i}, and as Metelkina and Pronzato (2017) mention for a similar problem. However, our particular setting with univariate x_{i} is tractable enough to provide a simple yet complete analytical characterization of the optimal p_{i}, even if the efficiency criterion is not concave. We show, under general conditions, that we can always find optimal treatment probabilities that are piecewise constant in x, with the number of pieces small and independent of n.
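For intuition, the concave case of (3) with the D-optimality criterion can be handed to an off-the-shelf solver. The following is a rough sketch under assumed values of \widetilde{p} and \widetilde{xp} (it is not the analytical method developed in this paper):

```python
import numpy as np
from scipy.optimize import minimize

x = np.linspace(-1, 1, 21)               # centered running variables

def info(p):
    """4x4 expected information matrix for model (1) given probs p."""
    ez = 2 * p - 1                       # E(z_i | x_i)
    m = lambda a: np.mean(x**a * ez)     # E(x^a z)
    ex2 = np.mean(x**2)
    return np.array([[1,    0,    m(0), m(1)],
                     [0,    ex2,  m(1), m(2)],
                     [m(0), m(1), 1,    0],
                     [m(1), m(2), 0,    ex2]])

ptil, xptil = 0.5, 0.1                   # assumed budget and gain targets
obj = lambda p: -np.log(np.linalg.det(info(p)))   # maximize log det
cons = [{"type": "eq", "fun": lambda p: p.mean() - ptil},
        {"type": "eq", "fun": lambda p: np.mean(x * p) - xptil}]
p0 = ptil + (xptil / np.mean(x**2)) * x  # feasible linear starting design
res = minimize(obj, p0, method="SLSQP", bounds=[(0.0, 1.0)] * len(x),
               constraints=cons, options={"maxiter": 500})
```

The starting point p0 already satisfies both equality constraints since the x grid is centered; the solver then improves the D-optimality objective while respecting the box constraints p_{i}\in[0,1].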

There is a well-developed literature on optimal experimental design in the presence of multiple objectives. Early examples of a constrained optimization problem of the form (3) were designed to account for several of the standard efficiency objectives simultaneously (Stigler, 1971; Lee, 1987, 1988). Läuter (1974, 1976) proposed maximizing a convex combination of efficiency objectives, a practice now typically referred to as a “compound” design approach. It is now well known (Cook and Wong, 1994; Clyde and Chaloner, 1996) that in many problems with concave objectives, optimal constrained and compound designs are equivalent. In this paper, we provide another approach that reduces the constrained problem (3) to a compound problem and can handle the monotonicity constraint. At the same time, we provide simple ways to compute our optimal designs that are based directly on the parameters \widetilde{p} and \widetilde{xp} in our constrained formulation (3), and do not require specifying the Lagrange multipliers appearing in the corresponding compound problem. Those Lagrange multipliers involve ratios of information gain to economic gain, where each of those quantities is only known up to a multiplicative constant.

Problems similar to (3) have received significant attention in the sequential design of clinical trials. Biased-coin designs, beginning with the simple procedure of Efron (1971), have been developed as a compromise between treatment balance and randomization; see Atkinson (2014) for a review. Covariate-adaptive biased-coin designs often replace the balance objective with an efficiency criterion such as D-optimality (Atkinson, 1982; Rosenberger and Sverdlov, 2008). Response-adaptive designs also optimize for some efficiency objective but simultaneously seek to minimize the number of patients receiving the inferior treatment for ethical reasons (Hu and Rosenberger, 2006). Various authors such as Bandyopadhyay and Biswas (2001) and Hu et al. (2015) propose sequential designs to effectively navigate this trade-off. When such designs also account for covariate information, they are called covariate-adjusted response-adaptive (CARA) designs (Zhang et al., 2007; Zhang and Hu, 2009).

In the CARA literature especially, there has been significant recent interest in optimal design for nonlinear models (Sverdlov et al., 2013; Metelkina and Pronzato, 2017; Biswas and Bhattacharya, 2018). Unlike optimal designs in linear models such as (1), designs in nonlinear models can typically only be locally optimal, meaning that their optimality depends on the values of the unknown parameters (Chernoff, 1953). While we may be able to obtain increasingly reliable estimates of these parameters over time in sequential settings, in non-clinical settings subjects typically enter a tie-breaker study non-sequentially, i.e., we know the running variables for all subjects before designing the experiment. In these applications, such as measuring the impact of a scholarship on future educational attainment, it can take several years to collect a single set of responses on which to compute a parameter estimate. Locally optimal designs are therefore of limited utility in this setting, and so we focus on optimal design under the linear model (1) in a non-sequential setting, which already presents a sufficient challenge.

The existing literature on problems like (3) typically considers the running variable x to be random. For example, Section 7 of Owen and Varian (2020) studies tie-breaker designs under the assumption that the running variable is either uniform or Gaussian and exactly half the subjects are to be treated. They consider the typical three level tie-breaker design where subjects with running variable x above some threshold \Delta always get the treatment, subjects with running variable below -\Delta never get the treatment, and the remaining subjects are randomized into treatment with probability 1/2. They find that a c-optimality criterion of statistical efficiency is monotonically increasing in the width \Delta of the randomization window, with the RCT (\Delta\to\infty) being most efficient and the RDD (\Delta=0) least efficient. Conversely, the short-term gain is decreasing in \Delta. They also show the three level design is optimal for any given level of short-term gain. In this article we show strong advantages to moving away from that three level design when the running variable is not symmetric or we cannot treat half of the subjects.

Metelkina and Pronzato (2017) studied a further generalization of the optimal tie-breaker design problem, motivated by CARA designs. In particular, their Example 1, an illustration of their Corollary 2.1, is similar to a random-x generalization of (3); the minor differences are the lack of a treatment fraction constraint, an inequality constraint on short-term gain as opposed to an equality constraint, and the use of a common intercept for the treated and untreated individuals, i.e., assuming that \beta_{2}=0 in (1). Crucially, however, the proof of their Corollary 2.1 does not generalize to the case where we require the treatment probabilities to be monotone. Even without the monotonicity constraint, we provide a sharper characterization of the solutions to our more specific problem (Section 3.1).

We introduce a random-x generalization of the problem (3) in Section 2. We show it encompasses both (3) and the problem studied by Owen and Varian (2020) as special cases. Then, Section 3 presents the main technical results characterizing the solutions to this more general problem. In particular, Theorem 2 shows that, under the monotonicity constraint, there always exists a solution to (3) corresponding to a simple, two-level stratified design in which all subjects with running variable below some threshold t^{\prime} have the same probability of treatment, all subjects with running variable above t^{\prime} have an identical, higher probability of treatment, and those subjects with x=t^{\prime} have a treatment probability between these two values. Section 4 then presents some results on the trade-off between a D-optimality efficiency criterion and short-term gain for these optimal designs; examples of this trade-off for some specific running variable distributions are given in Section 5. That section also shows how to compute our optimal designs when x is either fixed or random, and includes a fixed-x application based on Head Start, a government assistance program for low-income children. Finally, Section 6 summarizes the main results.

2 Random running variable

Our random-x generalization of (3) assumes the running variables x_{i} are samples from a common distribution F, which we hereafter identify with the corresponding cumulative distribution function. It considers an information matrix that averages over both the random treatment assignments and the randomness in the running variables. This allows us to characterize optimal designs for an as yet unobserved set of running variable values, when their distribution is known.

Before presenting the random-x tie-breaker design problem, we briefly review the standard setting of optimal design in multiple linear regression models; see e.g. Atkinson et al. (2007) for further background. In the simplest case, the user assumes the standard linear model y_{i}=\bm{x}_{i}^{\mathsf{T}}\bm{\beta}+\varepsilon_{i} with the goal of selecting covariate values \bm{x}_{1},\dots,\bm{x}_{n}\in\mathbb{R}^{p} to optimize an efficiency criterion that is a function of the information matrix \mathcal{X}^{\mathsf{T}}\mathcal{X}, where \mathcal{X}\in\mathbb{R}^{n\times p} is the design matrix with i-th row \bm{x}_{i}^{\mathsf{T}}. Perhaps the most common such criterion is D-optimality, which corresponds to maximizing \log(\det(\mathcal{X}^{\mathsf{T}}\mathcal{X})). Another popular choice is c-optimality, which minimizes c^{\mathsf{T}}(\mathcal{X}^{\mathsf{T}}\mathcal{X})^{-1}c for some choice of c\in\mathbb{R}^{p}. This can be interpreted as minimizing \mathrm{Var}(c^{\top}\hat{\beta}\mid\mathcal{X}), where \hat{\beta} is the ordinary least squares estimator. The study of optimal design is often simplified by the use of design measures. A design measure \xi is a probability distribution from which to generate the covariates \bm{x}_{i}. The relaxed optimal design problem involves selecting a design measure \xi instead of a finite number of covariate values \bm{x}_{1},\ldots,\bm{x}_{n}. The objective is to optimize the desired functional of the expected information matrix \mathcal{I}(\xi)\equiv\mathbb{E}_{\xi}[\mathcal{X}^{\mathsf{T}}\mathcal{X}] over some space \Xi of design measures \xi. For instance, a design measure \xi^{*} is D-optimal (for the relaxed problem) if \xi^{*}\in\operatorname*{arg\,max}_{\xi\in\Xi}\det(\mathcal{I}(\xi)), and c-optimal if \xi^{*}\in\operatorname*{arg\,min}_{\xi\in\Xi}c^{\top}\mathcal{I}(\xi)^{-1}c.
The original optimal design problem restricts \Xi to consist only of discrete probability distributions supported on at most n distinct points, with probabilities that are multiples of 1/n.

For the tie-breaker design problem, our regression model (1) includes both the running variables x_{i} and the treatment indicators z_{i} as covariates. But the experimenter does not have control over the entire joint distribution of (x_{i},z_{i}). The running variable is externally determined, so they can only specify the conditional distribution of the treatment indicator z_{i} given the running variable x_{i}. This conditional distribution is specified by a design function p:\mathbb{R}\to[0,1] such that p(x)\equiv\Pr(z_{i}=1\mid x_{i}=x). As mentioned above, we assume x_{i}\sim F for a known, fixed distribution F. This allows us to drop the subscript i when convenient. For any two design functions p and p^{\prime} we say p=p^{\prime} whenever \Pr_{F}(\{x:p(x)=p^{\prime}(x)\})=1. We only need a minimal assumption on F, which can be continuous, discrete, or neither:

Assumption 1.

0<\mathrm{Var}_{F}(x)<\infty with \mathbb{E}_{F}(x)=0.

The mean-centeredness part of Assumption 1 loses no generality, due to the translation invariance of estimation under the two-line model (1). Hereafter we omit the subscript F from all expectations involving x, with the implicit understanding that x\sim F.

The random-x tie-breaker design problem is as follows:

\begin{array}{ll}\mbox{maximize}&\qquad\Psi(\mathcal{I}(p))\\ \mbox{over}&\qquad p\in\mathcal{F}\\ \mbox{subject to}&\qquad\mathbb{E}_{p}(z)=\widetilde{z}\\ \mbox{and}&\qquad\mathbb{E}_{p}(xz)=\widetilde{xz}.\end{array} (4)

Here \widetilde{z} and \widetilde{xz} are constants analogous to \bar{p} and \overline{xp}, respectively, in (3), \mathcal{F} is a collection of design functions, and \mathcal{I}(p) is the expected information matrix under the model (1), averaging over both x\sim F and z\mid x\sim p.

This problem can be viewed as a constrained relaxed optimal design problem under the regression model (1), where the set \Xi of allowable design measures is indexed by the design functions p\in\mathcal{F}.

To interpret the equality constraints in (4) it is helpful to note that

\mathbb{E}_{p}(x^{a}z)=\mathbb{E}(x^{a}\,\mathbb{E}_{p}(z\mid x))=\mathbb{E}(x^{a}(2p(x)-1))=2\mathbb{E}(x^{a}p(x))-\mathbb{E}(x^{a}) (5)

for any a\geqslant 0 with \mathbb{E}(|x|^{a})<\infty. In particular, for each positive integer a, there exists an invertible linear mapping \varphi_{a}:\mathbb{R}^{a+1}\to\mathbb{R}^{a+1} that does not depend on the design p and maps (\mathbb{E}_{p}(z),\ldots,\mathbb{E}_{p}(x^{a}z)) to (\mathbb{E}(p(x)),\ldots,\mathbb{E}(x^{a}p(x))). For example, \varphi_{1}(x,y)=(1/2)(1+x,y). Taking a=0 in (5), we see the constraint \mathbb{E}_{p}(z)=\widetilde{z} in (4) is equivalent to requiring the expected proportion of treated subjects to be (1+\widetilde{z})/2. This proportion is typically determined by the aforementioned budget constraints. Taking a=1 in (5) shows that the second constraint \mathbb{E}_{p}(xz)=\widetilde{xz} in (4) sets the expected level of short-term gain. In Section 2.2, we provide some guidance on how to choose \widetilde{xz} in practice.
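The identity (5) is easy to verify by simulation. The following sanity-check sketch (with an arbitrary design function of our choosing) draws treatment indicators z\in\{-1,1\} with \Pr(z=1\mid x)=p(x) and compares the Monte Carlo moment \mathbb{E}_{p}(x^{a}z) to the right-hand side of (5):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])   # centered support, equal weights
p = 1 / (1 + np.exp(-x))                    # an arbitrary design function p(x)

# draw z in {-1, +1} with P(z = 1 | x_i) = p_i, many replicates per subject
reps = 200_000
z = np.where(rng.random((reps, x.size)) < p, 1.0, -1.0)

errs = {}
for a in (0, 1, 2):
    lhs = np.mean(x**a * z)                       # Monte Carlo E_p(x^a z)
    rhs = 2 * np.mean(x**a * p) - np.mean(x**a)   # right side of (5)
    errs[a] = abs(lhs - rhs)
```

With 10^6 total draws the discrepancy for each a is on the order of the Monte Carlo standard error, well below 0.01.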

From computing the expected information matrix \mathcal{I}(p) in Section 2.3, we will see that the problem (4) reduces to the finite-dimensional problem (3) when F is discrete, placing probability mass n^{-1} on each of the known running variable values x_{1},\ldots,x_{n}. Thus, to solve (3) it suffices to solve the problem (4) for any F satisfying Assumption 1, which holds for any such mean-centered discrete distribution with finite support and at least two distinct support points.

2.1 Some design functions

For convenience, we introduce some notation for certain forms of the design function p. We will commonly encounter designs of the form

p_{A}(x)\equiv\bm{1}(x\in A) (6)

for a set A\subseteq\mathbb{R}. Another important special case consists of the two level designs

p_{\ell,u,t}(x)\equiv\ell\,\bm{1}(x<t)+u\,\bm{1}(x\geqslant t) (7)

for treatment probabilities 0\leqslant\ell\leqslant u\leqslant 1 and a threshold t\in\overline{\mathbb{R}}. For example, p_{0,1,t} is a sharp RDD with threshold t, while for any t, p_{\theta,\theta,t} is an RCT with treatment probability \theta.

The condition \ell\leqslant u ensures that p(x) is nondecreasing in x; we refer to such designs as monotone. Under a monotone design, a subject cannot have a lower treatment probability than another subject with lower x. We also define a symmetric design to be one for which p(-x)=1-p(x); for instance, p might be the cumulative distribution function (CDF) of a symmetric random variable. Finally, the three level tie-breaker design from Owen and Varian (2020) is both monotone and symmetric and is defined for \Delta\in[0,1] by

p_{3,\Delta}(x)\equiv 0.5\,\bm{1}(|x|\leqslant\Delta)+\bm{1}(x>\Delta) (8)

when F is the \mathbb{U}(-1,1) distribution. Note that for all \Delta, p_{3,\Delta} always treats half the subjects, i.e., \widetilde{z}=0. The generalization to other \widetilde{z} and running variable distribution functions F is

p_{3;\widetilde{z},\Delta}(x)=0.5\times\bm{1}(a(\widetilde{z},\Delta)<x<b(\widetilde{z},\Delta))+\bm{1}(x\geqslant b(\widetilde{z},\Delta)) (9)

where a(\widetilde{z},\Delta)=F^{-1}((1-\widetilde{z})/2-\Delta) and b(\widetilde{z},\Delta)=F^{-1}((1-\widetilde{z})/2+\Delta).
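As a concrete check of (9), the sketch below (our own code) takes F=\mathbb{U}(-1,1), for which F^{-1}(u)=2u-1, and approximates \mathbb{E}(p(x)) by averaging over a fine grid; the implied treatment moment 2\mathbb{E}(p(x))-1 should recover \widetilde{z}:

```python
import numpy as np

def p3(x, ztil, Delta, Finv):
    """Three-level design (9): treat w.p. 1/2 on (a, b), w.p. 1 above b."""
    a = Finv((1 - ztil) / 2 - Delta)
    b = Finv((1 - ztil) / 2 + Delta)
    return 0.5 * ((x > a) & (x < b)) + 1.0 * (x >= b)

Finv = lambda u: 2 * u - 1               # quantile function of U(-1, 1)
grid = np.linspace(-1, 1, 400_001)       # fine grid for quadrature
probs = p3(grid, ztil=0.2, Delta=0.15, Finv=Finv)
Ez = 2 * probs.mean() - 1                # E_p(z) = 2 E(p(x)) - 1
```

Here \widetilde{z}=0.2 so the randomization window is (F^{-1}(0.25),F^{-1}(0.55))=(-0.5,0.1), and the computed Ez is 0.2 up to quadrature error. The design is also monotone by construction.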

2.2 Bounds on short-term gain

Before studying optimal designs, we impose lower and upper bounds on the possible short-term gain constraints \widetilde{xz} to consider, for each possible \widetilde{z}\in(-1,1). For an upper bound we use \widetilde{xz}_{\max}(\widetilde{z}), the maximum \widetilde{xz} that can be attained by any design function p satisfying the treatment fraction constraint \mathbb{E}_{p}(z)=\widetilde{z}. It turns out that this upper bound is always uniquely attained; if the running variable distribution F is continuous, it is uniquely attained by a sharp RDD. We remind the reader that uniqueness of a design function satisfying some property means that for any two design functions p and p^{\prime} with that property, we must have \Pr(p(x)=p^{\prime}(x))=1 under x\sim F.

Lemma 1.

For any \widetilde{z}\in[-1,1] and running variable distribution F, there exists a unique design p_{\widetilde{z}} satisfying

\mathbb{E}_{p_{\widetilde{z}}}(z)=\widetilde{z},\quad\text{and} (10)
p_{\widetilde{z}}(x)=\begin{cases}1,&x>t\\ 0,&x<t\end{cases} (11)

for some t\in\mathbb{R}. Any p that satisfies the treatment fraction constraint (10) also satisfies

\mathbb{E}_{p}(xz)\leqslant\mathbb{E}_{p_{\widetilde{z}}}(xz)\equiv\widetilde{xz}_{\max}(\widetilde{z}) (12)

with equality if and only if p=p_{\widetilde{z}}, i.e. \Pr(p(x)=p_{\widetilde{z}}(x))=1 under x\sim F.

Remark 1.

Notice that equation (11) does not specify p_{\widetilde{z}}(x) at x=t. If F is continuous, then any value for p_{\widetilde{z}}(t) yields an equivalent design function, but if F has an atom at x=t then we will require a specific value for p_{\widetilde{z}}(t)\in[0,1]. We must allow F to have atoms to solve the finite dimensional problem (3). While we specify p_{\widetilde{z}}(t) in the proof of Lemma 1 below, later results of this type do not give the values of design functions at such discontinuities.

Remark 2.

If F is continuous, then p_{\widetilde{z}} is an RDD: p_{\widetilde{z}}=p_{0,1,t} for t=F^{-1}((1-\widetilde{z})/2). We call the design p_{\widetilde{z}} a generalized RDD for general F satisfying Assumption 1.

Remark 3.

The threshold t in (11) is essentially unique: if there is an interval (t,s) with \Pr(t<x<s)=0, then all step locations in [t,s) provide equivalent generalized RDDs.

Proof of Lemma 1.

If \widetilde{z}\in\{-1,1\} then the only design functions satisfying (10) (again, up to uniqueness w.p.1 under x\sim F) are the constant functions p(x)=0 and p(x)=1, and the result holds trivially. Thus, we can assume that \widetilde{z}\in(-1,1). By (5), the existence of p_{\widetilde{z}} follows by taking t=\inf\{s:F(s)\geqslant(1-\widetilde{z})/2\} and

p_{\widetilde{z}}(t)=\begin{cases}0,&\text{if }\Pr(x=t)=0\\ \frac{F(t)-(1-\widetilde{z})/2}{\Pr(x=t)},&\text{if }\Pr(x=t)>0.\end{cases}

To show (12), fix any design p satisfying (10) and notice that \mathbb{E}(p(x)-p_{\widetilde{z}}(x))=0 means

\mathbb{E}(p(x)\bm{1}(x<t))+(p(t)-p_{\widetilde{z}}(t))\Pr(x=t)+\mathbb{E}((p(x)-1)\bm{1}(x>t))=0.

Then \mathbb{E}(x(p_{\widetilde{z}}(x)-p(x))) equals

\begin{aligned}
&\ \mathbb{E}(-xp(x)\bm{1}(x<t))+t(p_{\widetilde{z}}(t)-p(t))\Pr(x=t)+\mathbb{E}(x(1-p(x))\bm{1}(x>t))\\
&\geqslant t\bigl[\mathbb{E}(-p(x)\bm{1}(x<t))+(p_{\widetilde{z}}(t)-p(t))\Pr(x=t)+\mathbb{E}((1-p(x))\bm{1}(x>t))\bigr]\\
&=0
\end{aligned}

with equality iff (t-x)p(x)\bm{1}(x<t)=(x-t)(1-p(x))\bm{1}(x>t)=0 for a set of x with probability one under F, i.e., iff p satisfies (11) with probability one under x\sim F. ∎

By symmetry, the design that minimizes \mathbb{E}_{p}(xz) over all designs p with \mathbb{E}_{p}(z)=\widetilde{z} is p_{1,0,s} where s=F^{-1}((1+\widetilde{z})/2). Notice that \mathbb{E}_{p_{1,0,s}}(xz)=\widetilde{xz}_{\min}(\widetilde{z})\equiv 2\mathbb{E}(x\bm{1}(x<s))<0. We impose a stricter lower bound of \widetilde{xz}\geqslant 0 in the context of problem (4). This is motivated by the fact that the running variable x has mean 0 (Assumption 1), so that \mathbb{E}_{p}(xz)=0 whenever the design function p is constant, corresponding to an RCT. Designs with \widetilde{xz}<0 exist for all \widetilde{z}\in(-1,1) but would not be relevant in our motivating applications, as they represent scenarios where subjects with smaller x are more preferentially treated than in an RCT. We hence define the feasible input space \mathcal{J} by

\mathcal{J}\equiv\bigl\{(\widetilde{z},\widetilde{xz})\mid-1<\widetilde{z}<1,\ 0\leqslant\widetilde{xz}\leqslant\widetilde{xz}_{\max}(\widetilde{z})\bigr\}\subseteq\mathbb{R}^{2}. (13)

Any design function p for which the moments (\mathbb{E}_{p}(z),\mathbb{E}_{p}(xz)) lie within the feasible input space \mathcal{J} is referred to as an input-feasible design function.

If the design p is input-feasible, we can write \mathbb{E}_{p}(xz)=\delta\cdot\widetilde{xz}_{\max}(\mathbb{E}_{p}(z)) for some \delta\in[0,1]. The parameter \delta measures the additional short-term gain attained by the design p over an RCT, relative to the additional short-term gain attained by the generalized RDD p_{\mathbb{E}_{p}(z)} that treats the same proportion of subjects as p. For instance, \delta=0.4 means that the design p has a short-term gain that is 40% of the way from that of an RCT to the maximum attainable short-term gain under the treatment fraction constraint.
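For a discrete F, the construction in the proof of Lemma 1 is short enough to sketch directly (our own illustrative helper; `searchsorted` locates the threshold t=\inf\{s:F(s)\geqslant(1-\widetilde{z})/2\}, and the atom at t receives the fractional treatment probability from the proof):

```python
import numpy as np

def generalized_rdd(xs, w, ztil):
    """Generalized RDD p_ztil from Lemma 1 for a discrete running variable
    distribution with sorted support xs and probabilities w."""
    target = (1 - ztil) / 2              # probability mass left untreated
    F = np.cumsum(w)                     # discrete CDF on the support
    j = np.searchsorted(F, target)       # threshold index: t = xs[j]
    p = np.where(np.arange(len(xs)) > j, 1.0, 0.0)
    p[j] = (F[j] - target) / w[j]        # fractional treatment at the atom t
    return p

xs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
w = np.full(5, 0.2)                      # uniform weights, mean zero
p = generalized_rdd(xs, w, ztil=0.0)     # treat half the mass
Ez = np.sum(w * xs**0 * (2 * p - 1))     # E_p(z), should equal ztil
xz_max = np.sum(w * xs * (2 * p - 1))    # the bound (12), E_{p_ztil}(xz)
```

Here the threshold lands on the atom at 0, which is treated with probability 1/2; the resulting design is monotone and attains \widetilde{xz}_{\max}(0)=1.2 for this distribution.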

2.3 Expected information matrix and equivalence of D-optimality and c-optimality

We now explicitly compute the expected information matrix

\mathcal{I}(p)=\sigma^{-2}\mathbb{E}(n^{-1}\mathcal{X}^{\mathsf{T}}\mathcal{X})=\sigma^{-2}\begin{pmatrix}1&0&\mathbb{E}(z)&\mathbb{E}(xz)\\ 0&\mathbb{E}(x^{2})&\mathbb{E}(xz)&\mathbb{E}(x^{2}z)\\ \mathbb{E}(z)&\mathbb{E}(xz)&1&0\\ \mathbb{E}(xz)&\mathbb{E}(x^{2}z)&0&\mathbb{E}(x^{2})\end{pmatrix}=\sigma^{-2}\begin{pmatrix}D&C\\ C&D\end{pmatrix}, (14)

where

C=\begin{pmatrix}\mathbb{E}(z)&\mathbb{E}(xz)\\ \mathbb{E}(xz)&\mathbb{E}(x^{2}z)\end{pmatrix}\quad\text{and}\quad D=\begin{pmatrix}1&0\\ 0&\mathbb{E}(x^{2})\end{pmatrix}

and we have omitted the dependence of the expectations on the design p for brevity. We emphasize that \mathcal{I} depends on F as well, though the experimenter can only control p. Furthermore, when F=(1/n)\sum_{i=1}^{n}\delta_{x_{i}} and the running variable values x_{1},\dots,x_{n} are mean-centered, the expected information matrix \mathcal{I}(p) is precisely the fixed-x information matrix \mathcal{I}_{n}(p_{1},\ldots,p_{n}), identifying p_{i}\equiv p(x_{i}). This shows that the random-x problem (4) is indeed strictly more general than the fixed-x problem (3). Equation (14) also shows that any efficiency objective \Psi(\mathcal{I}(p)) depends on the treatment indicators z only through their marginal distributions conditional on x, and not their joint distribution. In the fixed-x setting, this makes it easier to obey an exact budget constraint n^{-1}\sum_{i=1}^{n}\bm{1}(z_{i}=1)=\widetilde{p} by stratification. For instance, given five subjects with p_{i}=0.4 we could randomly treat exactly two of them, instead of randomizing each subject independently and possibly going over budget.

While we will characterize solutions to the optimal design problem (4) for any continuous efficiency criterion Ψ()\Psi(\cdot), in Section 4 we will prove some additional results for the DD-optimality criterion ΨD()=log(det())\Psi_{D}(\cdot)=\log(\det(\cdot)). We show that DD-optimality is of particular interest in this setting, as it happens to correspond exactly with the cc-optimality efficiency criterion of Owen and Varian, (2020). They aim to minimize the asymptotic variance of β^3\hat{\beta}_{3}, observing that when the (xi,zi)(x_{i},z_{i}) are independent and \mathcal{I} is invertible, the law of large numbers gives

nVar(β^𝒳)=n(𝒳𝖳𝒳)1a.s.1.n\textnormal{Var}(\hat{\beta}\!\mid\!\mathcal{X})=n(\mathcal{X}^{\mathsf{T}}\mathcal{X})^{-1}\,\stackrel{{\scriptstyle\mathrm{a.s.}}}{{\to}}\,\mathcal{I}^{-1}.

We have assumed σ2=1\sigma^{2}=1 without loss of generality, as the optimal design does not depend on σ2\sigma^{2}. Then by standard block inversion formulas

nVar(β^3𝒳)a.s.(1)44=M11(p)det(M(p))n\textnormal{Var}(\hat{\beta}_{3}\!\mid\!\mathcal{X})\stackrel{{\scriptstyle\mathrm{a.s.}}}{{\to}}(\mathcal{I}^{-1})_{44}=\frac{M_{11}(p)}{\det(M(p))} (15)

where

M=M(p)\displaystyle M=M(p) =DCD1C\displaystyle=D-CD^{-1}C
=(1𝔼(z)2𝔼(xz)2𝔼(x2)𝔼(xz)𝔼(z)𝔼(x2z)𝔼(xz)𝔼(x2)𝔼(xz)𝔼(z)𝔼(x2z)𝔼(xz)𝔼(x2)𝔼(x2)𝔼(xz)2𝔼(x2z)2𝔼(x2)).\displaystyle=\begin{pmatrix}1-\mathbb{E}(z)^{2}-\frac{\mathbb{E}(xz)^{2}}{\mathbb{E}(x^{2})}&-\mathbb{E}(xz)\cdot\mathbb{E}(z)-\frac{\mathbb{E}(x^{2}z)\mathbb{E}(xz)}{\mathbb{E}(x^{2})}\\[6.45831pt] -\mathbb{E}(xz)\cdot\mathbb{E}(z)-\frac{\mathbb{E}(x^{2}z)\mathbb{E}(xz)}{\mathbb{E}(x^{2})}&\mathbb{E}(x^{2})-\mathbb{E}(xz)^{2}-\frac{\mathbb{E}(x^{2}z)^{2}}{\mathbb{E}(x^{2})}\end{pmatrix}. (16)

Equation (15) shows that minimizing the asymptotic conditional variance of β^3\hat{\beta}_{3} is equivalent to maximizing Eff(p):=det(M(p))/M11(p)\textnormal{Eff}(p):=\det(M(p))/M_{11}(p), which under the present formalization of the problem is further equivalent to cc-optimality for the expected information matrix (14) with c=(0,0,0,1)c=(0,0,0,1)^{\top}. The following result shows that M11(p)>0M_{11}(p)>0 for any input-feasible design pp, so Eff(p)\textnormal{Eff}(p) is always well-defined and nonnegative.

Corollary 1.

For any (z~,xz~)𝒥(\widetilde{z},\widetilde{xz})\in\mathcal{J}, M11=1z~2(xz~)2/𝔼(x2)>0M_{11}=1-\widetilde{z}^{2}-(\widetilde{xz})^{2}/\mathbb{E}(x^{2})>0.

Proof.

See Appendix A. ∎

On the other hand, because \mathcal{I} is the expected value of a positive semi-definite rank one matrix, it is also positive semi-definite. Thus

0det()=det(D)det(M)=𝔼(x2)det(M).0\leqslant\det(\mathcal{I})=\det(D)\det(M)=\mathbb{E}(x^{2})\det(M).

This shows that det(M)0\det(M)\geqslant 0, with strict inequality if and only if \mathcal{I} is invertible. Additionally, since M11M_{11} only depends on pp through 𝔼p(z)\mathbb{E}_{p}(z) and 𝔼p(xz)\mathbb{E}_{p}(xz), any two input-feasible designs pp and pp^{\prime} satisfying the equality constraints in (4) must have M11(p)=M11(p)M_{11}(p)=M_{11}(p^{\prime}). It follows that the solutions to (4) under the DD-optimality criterion Ψ((p))=ΨD((p))\Psi(\mathcal{I}(p))=\Psi_{D}(\mathcal{I}(p)) and the cc-optimality criterion Ψ((p))=Eff(p)\Psi(\mathcal{I}(p))=\textnormal{Eff}(p) must be identical whenever (z~,xz~)𝒥(\widetilde{z},\widetilde{xz})\in\mathcal{J}.
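The identity (15) behind this equivalence is easy to verify numerically. The following minimal sketch is ours; the moment values are illustrative, corresponding for instance to a uniform running variable on [1,1][-1,1] with design p(x)=1/2+3x/10p(x)=1/2+3x/10 and z{1,1}z\in\{-1,1\}. It builds \mathcal{I} from its blocks and checks that (1)44=M11/det(M)=1/Eff(p)(\mathcal{I}^{-1})_{44}=M_{11}/\det(M)=1/\textnormal{Eff}(p).

```python
import numpy as np

# Illustrative moments for a feasible design with z in {-1, +1}:
# uniform F on [-1, 1] with p(x) = 1/2 + 3x/10.
ez, exz, ex2, ex2z = 0.0, 0.2, 1.0 / 3.0, 0.0

C = np.array([[ez, exz], [exz, ex2z]])
D = np.diag([1.0, ex2])
info = np.block([[D, C], [C, D]])        # expected information, sigma^2 = 1

M = D - C @ np.linalg.inv(D) @ C         # Schur complement, eq. (16)
m11 = M[0, 0]
eff = np.linalg.det(M) / m11             # Eff(p)

# Block inversion: the asymptotic variance of beta3_hat is (I^{-1})_{44}.
assert m11 > 0                           # consistent with Corollary 1
assert np.isclose(np.linalg.inv(info)[3, 3], m11 / np.linalg.det(M))
assert np.isclose(np.linalg.inv(info)[3, 3], 1.0 / eff)
```

Since MM is 2×22\times 2, (M1)22=M11/det(M)(M^{-1})_{22}=M_{11}/\det(M), which is exactly what the block inversion formula delivers for the (4,4) entry of 1\mathcal{I}^{-1}.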

3 Optimal design characterizations

To solve the constrained optimization problem (4), we begin by observing that the expected information matrix (p)\mathcal{I}(p), computed in (14), only depends on the design function pp through the quantities 𝔼p(z)\mathbb{E}_{p}(z), 𝔼p(xz)\mathbb{E}_{p}(xz), and 𝔼p(x2z)\mathbb{E}_{p}(x^{2}z). Then the same is true for any efficiency objective Ψ((p))\Psi(\mathcal{I}(p)). Consequently for any continuous Ψ\Psi we can write Ψ((p))=gΨ(𝔼p(z),𝔼p(xz),𝔼p(x2z))\Psi(\mathcal{I}(p))=g_{\Psi}(\mathbb{E}_{p}(z),\mathbb{E}_{p}(xz),\mathbb{E}_{p}(x^{2}z)) for some continuous gΨ:3g_{\Psi}:\real^{3}\to\real that may depend on the running variable distribution FF.

Fixing (z~,xz~)𝒥(\widetilde{z},\widetilde{xz})\in\mathcal{J} and the set \mathcal{F} of permissible design functions, we say a design pp\in\mathcal{F} is feasible if it satisfies the equality constraints in (4), i.e. 𝔼p(z)=z~\mathbb{E}_{p}(z)=\widetilde{z} and 𝔼p(xz)=xz~\mathbb{E}_{p}(xz)=\widetilde{xz}. Thus, the efficiency criterion Ψ((p))\Psi(\mathcal{I}(p)) can only vary among feasible designs pp through the single quantity 𝔼p(x2z)\mathbb{E}_{p}(x^{2}z). Furthermore, any two feasible designs pp and qq with 𝔼p(x2z)=𝔼q(x2z)\mathbb{E}_{p}(x^{2}z)=\mathbb{E}_{q}(x^{2}z) must have the same efficiency. We can therefore break down the problem (4) into two steps. First, we find a solution

x2z~(z~,xz~;Ψ)argmaxaI(z~,xz~)gΨ(z~,xz~,a)\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\Psi)\in\operatorname*{arg\,max}_{a\in I_{\mathcal{F}}(\widetilde{z},\widetilde{xz})}g_{\Psi}(\widetilde{z},\widetilde{xz},a) (17)

where

I(z~,xz~)={𝔼p(x2z)𝔼p(z)=z~,𝔼p(xz)=xz~ for some p}[𝔼(x2),𝔼(x2)]I_{\mathcal{F}}(\widetilde{z},\widetilde{xz})=\{\mathbb{E}_{p}(x^{2}z)\mid\mathbb{E}_{p}(z)=\widetilde{z},\mathbb{E}_{p}(xz)=\widetilde{xz}\text{ for some $p\in\mathcal{F}$}\}\subseteq[-\mathbb{E}(x^{2}),\mathbb{E}(x^{2})] (18)

is the set of values of 𝔼p(x2z)\mathbb{E}_{p}(x^{2}z) attainable by some feasible design pp\in\mathcal{F}. Then we must find a feasible design pp\in\mathcal{F} that satisfies 𝔼p(x2z)=x2z~(z~,xz~;Ψ)\mathbb{E}_{p}(x^{2}z)=\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\Psi).

The next result shows that when \mathcal{F} is convex, I(z~,xz~)I_{\mathcal{F}}(\widetilde{z},\widetilde{xz}) is an interval. For our two choices of \mathcal{F} of interest, Propositions 1 and 2 will show it is a closed interval, so (17) will always have a solution when Ψ()\Psi(\cdot) is continuous.

Lemma 2.

Suppose feasible designs pp, pp^{\prime}\in\mathcal{F} satisfy 𝔼p(x2z)𝔼p(x2z)\mathbb{E}_{p}(x^{2}z)\leqslant\mathbb{E}_{p^{\prime}}(x^{2}z), where \mathcal{F} is convex. Then if 𝔼p(x2z)γ𝔼p(x2z)\mathbb{E}_{p}(x^{2}z)\leqslant\gamma\leqslant\mathbb{E}_{p^{\prime}}(x^{2}z), there exists feasible p(γ)p^{(\gamma)}\in\mathcal{F} with 𝔼p(γ)(x2z)=γ\mathbb{E}_{p^{(\gamma)}}(x^{2}z)=\gamma.

Proof.

If 𝔼p(x2z)=𝔼p(x2z)\mathbb{E}_{p^{\prime}}(x^{2}z)=\mathbb{E}_{p}(x^{2}z) then either of them is a suitable p(γ)p^{(\gamma)}. Otherwise take λ[0,1]\lambda\in[0,1] so that γ=λ𝔼p(x2z)+(1λ)𝔼p(x2z)\gamma=\lambda\mathbb{E}_{p}(x^{2}z)+(1-\lambda)\mathbb{E}_{p^{\prime}}(x^{2}z). Then p(γ)=λp+(1λ)pp^{(\gamma)}=\lambda p+(1-\lambda)p^{\prime} is in \mathcal{F} by convexity and, by direct computation of the moments 𝔼p(γ)(xaz)\mathbb{E}_{p^{(\gamma)}}(x^{a}z) for a{0,1,2}a\in\{0,1,2\}, feasible with 𝔼p(γ)(x2z)=γ\mathbb{E}_{p^{(\gamma)}}(x^{2}z)=\gamma. ∎

The endpoints of I(z~,xz~)I_{\mathcal{F}}(\widetilde{z},\widetilde{xz}) can be computed as the optimal values of the following constrained optimization problems:

maximize𝔼p(x2z)minimize𝔼p(x2z)overpoverpsubject to𝔼p(z)=z~subject to𝔼p(z)=z~and𝔼p(xz)=xz~and𝔼p(xz)=xz~.\begin{array}[]{lllll}\mbox{maximize}&\mathbb{E}_{p}(x^{2}z)&&\mbox{minimize}&\mathbb{E}_{p}(x^{2}z)\\ \mbox{over}&p\in\mathcal{F}&&\mbox{over}&p\in\mathcal{F}\\ \mbox{subject to}&\mathbb{E}_{p}(z)=\widetilde{z}&&\mbox{subject to}&\mathbb{E}_{p}(z)=\widetilde{z}\\ \mbox{and}&\mathbb{E}_{p}(xz)=\widetilde{xz}&&\mbox{and}&\mathbb{E}_{p}(xz)=\widetilde{xz}.\end{array} (19)

Given solutions pmaxp_{\max} and pminp_{\min} to the problems (19), Lemma 2 shows that the design

popt(x)={pmax(x),𝔼pmax(x2z)x2z~(z~,xz~;Ψ)pmin(x),𝔼pmin(x2z)x2z~(z~,xz~;Ψ)λpmin(x)+(1λ)pmax(x),elsep_{\textnormal{opt}}(x)=\begin{cases}p_{\max}(x),&\mathbb{E}_{p_{\max}}(x^{2}z)\leqslant\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\Psi)\\ p_{\min}(x),&\mathbb{E}_{p_{\min}}(x^{2}z)\geqslant\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\Psi)\\ \lambda p_{\min}(x)+(1-\lambda)p_{\max}(x),&\text{else}\end{cases} (20)

solves the problem (4) for λ=(𝔼pmax(x2z)x2z~(z~,xz~;Ψ))/(𝔼pmax(x2z)𝔼pmin(x2z))\lambda=(\mathbb{E}_{p_{\max}}(x^{2}z)-\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\Psi))/(\mathbb{E}_{p_{\max}}(x^{2}z)-\mathbb{E}_{p_{\min}}(x^{2}z)). If 𝔼pmax(x2z)=𝔼pmin(x2z)\mathbb{E}_{p_{\max}}(x^{2}z)=\mathbb{E}_{p_{\min}}(x^{2}z) then all feasible designs pp have the same efficiency, so any one of them is optimal.
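The interpolation (20) is straightforward to carry out numerically. In the sketch below (our own construction, not from the paper), the two feasible designs are hand-built for a uniform running variable on a grid, with z{1,1}z\in\{-1,1\} and 𝔼(zx)=2p(x)1\mathbb{E}(z\mid x)=2p(x)-1: adding an even, mean-zero bump to a monotone base design changes only the x2x^{2} moment, so mixing the two hits any target value of 𝔼p(x2z)\mathbb{E}_{p}(x^{2}z) in between.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 2001)     # uniform running variable, mean zero

def moments(p):
    z = 2.0 * p - 1.0                # E(z | x) for z coded in {-1, +1}
    return np.array([np.mean(z), np.mean(x * z), np.mean(x * x * z)])

# Two feasible designs sharing (z~, xz~): the even, mean-zero bump leaves
# E(z) and E(xz) unchanged but shifts E(x^2 z).
bump = 0.3 * (x * x - np.mean(x * x))
p_hi = 0.5 + 0.3 * x + bump          # larger E(x^2 z)
p_lo = 0.5 + 0.3 * x - bump          # smaller E(x^2 z)
assert p_hi.min() >= 0 and p_hi.max() <= 1 and p_lo.min() >= 0 and p_lo.max() <= 1

m_hi, m_lo = moments(p_hi), moments(p_lo)
assert np.allclose(m_hi[:2], m_lo[:2])          # same (z~, xz~)

# Interpolate as in (20) to hit a target third moment.
target = 0.02                                    # any value between m_lo[2] and m_hi[2]
lam = (m_hi[2] - target) / (m_hi[2] - m_lo[2])
p_opt = lam * p_lo + (1.0 - lam) * p_hi

m_opt = moments(p_opt)
assert np.allclose(m_opt[:2], m_hi[:2])          # equality constraints preserved
assert np.isclose(m_opt[2], target)              # target E(x^2 z) attained
```

Because all three moments are linear in pp, the mixture's moments are the corresponding convex combinations, which is all that the verification above uses.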

The remainder of this section is concerned with characterizing the solutions to the problems (19) for two specific choices of design function classes \mathcal{F}: the set of all measurable functions into [0,1][0,1], and the set of all such monotone functions. For these two choices of \mathcal{F}, solutions pmaxp_{\max} and pminp_{\min} exist for any (z~,xz~)𝒥(\widetilde{z},\widetilde{xz})\in\mathcal{J} and are unique. Our argument uses extensions of the Neyman-Pearson lemma (Neyman and Pearson,, 1933) in hypothesis testing. These extensions are in Dantzig and Wald, (1951), whose two authors discovered the relevant results independently of each other. We use a modern formulation of their work, adapting the presentation by Lehmann and Romano, (2005):

Lemma 3.

Consider any measurable h1,,hm+1:h_{1},\dots,h_{m+1}:\real\rightarrow\real with 𝔼(|hi(x)|)<,i=1,,m+1\mathbb{E}(|h_{i}(x)|)<\infty,i=1,\dots,m+1. Define SmS\subseteq\real^{m} to be the set of all points c=(c1,,cm)c=(c_{1},\dots,c_{m}) such that

𝔼(p(x)hi(x))=ci,i=1,,m,\mathbb{E}(p(x)h_{i}(x))=c_{i},\quad i=1,\dots,m, (21)

for some pp\in\mathcal{F}, where \mathcal{F} is some collection of measurable functions from \real into [0,1][0,1]. For each cSc\in S let c\mathcal{F}_{c} be the set of all pp\in\mathcal{F} satisfying (21). If \mathcal{F} is such that

S={(c,cm+1)m+1cS,cm+1=𝔼(p(x)hm+1(x)) for some pc}S^{\prime}=\bigl{\{}(c,c_{m+1})\in\real^{m+1}\mid c\in S,\ c_{m+1}=\mathbb{E}(p(x)h_{m+1}(x))\text{ for some $p\in\mathcal{F}_{c}$}\bigr{\}}

is closed and convex and cintSc\in\textnormal{int}\ S, then

  1.

    There exists pcp\in\mathcal{F}_{c} and k1,,kmk_{1},\ldots,k_{m}\in\real such that

    pargmaxq𝔼(q(x)(hm+1(x)i=1mkihi(x))),andp\in\operatorname*{arg\,max}_{q\in\mathcal{F}}\mathbb{E}\biggl{(}q(x)\biggl{(}h_{m+1}(x)-\sum_{i=1}^{m}k_{i}h_{i}(x)\biggr{)}\biggr{)},\quad\text{and} (22)
  2.

    pargmaxqc𝔼(q(x)hm+1(x))p\in\operatorname*{arg\,max}_{q\in\mathcal{F}_{c}}\mathbb{E}(q(x)h_{m+1}(x)) if and only if pcp\in\mathcal{F}_{c} satisfies (22) for some k1,,kmk_{1},\dots,k_{m}.

Proof.

Claim 1 and necessity of (22) in claim 2 follow from the proof of part (iv) of Theorem 3.6.1 in Lehmann and Romano, (2005), which uses the fact that SS^{\prime} is closed and convex to construct a separating hyperplane in m+1\real^{m+1}. Sufficiency of (22) in claim 2 follows from part (ii) of that theorem, and is often called the method of undetermined multipliers. ∎

Lemma 3 equates a constrained optimization problem (item 2) and a compound optimization problem (item 1). Unlike typical equivalence theorems, it does not require \mathcal{F} to be the set of all measurable design functions, and uses an entirely different proof technique. Following Whittle, (1973), equivalence theorems in optimal design are now popularly proven using the concept of Fréchet derivatives on the space of design functions (measures). However, such approaches often do not apply when \mathcal{F} is restricted to be the set of all monotone design functions. Most relevant to our problem, the proof of Corollary 2.1 in the supplement of Metelkina and Pronzato, (2017) involves Fréchet derivatives in the direction of design functions supported at a single value of xx, which are not monotone. On the other hand, the use of Lemma 3 requires an objective linear in pp, whereas typical equivalence theorems only require concavity.

3.1 Globally optimal designs

We now solve the design problem (4) in the case that \mathcal{F} is the set of all measurable functions p:[0,1]p:\real\to[0,1]. We first explain how the results of Metelkina and Pronzato, (2017) do not adequately do so already. Identifying our design functions pp with their design measures ξ\xi, Corollary 2.1 of Metelkina and Pronzato, (2017) does not provide any information about what an optimal solution popt(x)p_{\textnormal{opt}}(x) to (4) would be for values of xx where G1(popt();x)=G2(popt();x)G_{1}(p_{\textnormal{opt}}(\cdot);x)=G_{2}(p_{\textnormal{opt}}(\cdot);x). Here G1G_{1} and G2G_{2} are quantities derived from the aforementioned Fréchet derivatives, depending on the constraints z~\widetilde{z} and xz~\widetilde{xz}. Unfortunately, this lack of information about poptp_{\textnormal{opt}} holds for all xx both in their Example 1 and in our setting. Their example skirts this limitation by noting some moment conditions on poptp_{\textnormal{opt}} implied by the equality G1=G2G_{1}=G_{2} when the running variable is uniform and the efficiency criterion is DD-optimality, and then manually searching for some parametric forms of poptp_{\textnormal{opt}} for which it is possible to satisfy these conditions. By contrast, the results in this section apply Lemma 3 with \mathcal{F} the set of all design functions, and show a simple stratified design function is always optimal for any running variable distribution FF and continuous efficiency criterion. This enables optimal designs to be systematically and efficiently constructed (Section 5).

We will apply Lemma 3 with m=2m=2 constraints pertaining to h1(x)=1h_{1}(x)=1 and h2(x)=xh_{2}(x)=x. Our objective function is based on h3(x)=x2h_{3}(x)=x^{2}. When the running variable distribution FF is continuous, recalling the notation (6), the solutions to (19) take the forms pmax=p[a1,a2]cp_{\max}=p_{[a_{1},a_{2}]^{c}} and pmin=p[b1,b2]p_{\min}=p_{[b_{1},b_{2}]} for some intervals [a1,a2][a_{1},a_{2}] and [b1,b2][b_{1},b_{2}].

Proposition 1.

Let \mathcal{F} be the set of all measurable functions from \real into [0,1][0,1]. For any (z~,xz~)𝒥(\widetilde{z},\widetilde{xz})\in\mathcal{J}, there exist unique solutions pmaxp_{\max} and pminp_{\min} to the optimization problems (19). These solutions are the unique feasible designs satisfying

pmax(x)={1,x[a1,a2]0,x(a1,a2)andpmin(x)={1,x(b1,b2)0,x[b1,b2]p_{\max}(x)=\begin{cases}1,&x\not\in[a_{1},a_{2}]\\ 0,&x\in(a_{1},a_{2})\end{cases}\qquad\text{and}\qquad p_{\min}(x)=\begin{cases}1,&x\in(b_{1},b_{2})\\ 0,&x\not\in[b_{1},b_{2}]\end{cases} (23)

for some a1a2a_{1}\leqslant a_{2} and b1b2b_{1}\leqslant b_{2} which depend on (z~,xz~)(\widetilde{z},\widetilde{xz}) and can be infinite if xz~=xz~max(z~)\widetilde{xz}=\widetilde{xz}_{\max}(\widetilde{z}).

Proof.

If xz~=xz~max(z~)\widetilde{xz}=\widetilde{xz}_{\max}(\widetilde{z}), then the proposition follows by Lemma 1 and taking a1=a_{1}=-\infty, a2=t=b1a_{2}=t=b_{1} and b2=b_{2}=\infty. Thus we can assume that xz~<xz~max(z~)\widetilde{xz}<\widetilde{xz}_{\max}(\widetilde{z}). We give the proof for pmaxp_{\max} in detail. The argument for pminp_{\min} is completely symmetric.

As noted above, we are in the setting of Lemma 3 with m=2m=2, h1(x)=1h_{1}(x)=1, h2(x)=xh_{2}(x)=x, and h3(x)=x2h_{3}(x)=x^{2}. The collection \mathcal{F} here is the set of all measurable functions from \real into [0,1][0,1], so the corresponding SS^{\prime} is closed and convex, as shown in part (iv) of Theorem 3.6.1 in Lehmann and Romano, (2005). By Lemma 1 and (5) we can write intS=φ1(𝒯)\textnormal{int}\ S=\varphi_{1}(\mathcal{T}) where φ1\varphi_{1} is defined in the discussion around (5) and

𝒯={(z~,xz~)1<z~<1,xz~min(z~)<xz~<xz~max(z~)}\mathcal{T}=\{(\widetilde{z},\widetilde{xz})\mid-1<\widetilde{z}<1,\widetilde{xz}_{\min}(\widetilde{z})<\widetilde{xz}<\widetilde{xz}_{\max}(\widetilde{z})\}.

Hence our previous assumption xz~<xz~max(z~)\widetilde{xz}<\widetilde{xz}_{\max}(\widetilde{z}) ensures c=φ1(z~,xz~)intSc=\varphi_{1}(\widetilde{z},\widetilde{xz})\in\textnormal{int}\ S.

With the conditions of Lemma 3 satisfied, we now show that (22) is equivalent to (23) for any feasible pmaxp_{\max}. A feasible design pmaxp_{\max} satisfies (22) iff pmaxargmaxq𝔼(q(x)(x2k1xk2))p_{\max}\in\operatorname*{arg\,max}_{q\in\mathcal{F}}\mathbb{E}(q(x)(x^{2}-k_{1}x-k_{2})) for some k1,k2k_{1},k_{2}, or equivalently

pmax(x)={1,x2k1xk2>00,x2k1xk2<0p_{\max}(x)=\begin{cases}1,&x^{2}-k_{1}x-k_{2}>0\\ 0,&x^{2}-k_{1}x-k_{2}<0\end{cases} (24)

(cf. part (ii) of Theorem 3.6.1 in Lehmann and Romano, (2005)). If x2k1xk2x^{2}-k_{1}x-k_{2} has no real roots then pmax(x)=1p_{\max}(x)=1 for all xx, contradicting z~<1\widetilde{z}<1. Thus we write x2k1xk2=(xa1)(xa2)x^{2}-k_{1}x-k_{2}=(x-a_{1})(x-a_{2}) for some (real) a1a2a_{1}\leqslant a_{2}, showing that (24) is equivalent to (23). We can now conclude, by the second claim in Lemma 3, that the set of optimal solutions to (19) contains precisely those feasible designs satisfying (23). Furthermore, the first claim of Lemma 3 ensures that such a design must exist. It remains to show that only one feasible design can satisfy (23); in Appendix B we provide a direct argument, which does not rely on Lemma 3. ∎
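Proposition 1 also suggests a direct computational check, which is our own illustration rather than part of the paper's development: on a discretized running variable, with z{1,1}z\in\{-1,1\} so that each moment 𝔼p(xaz)\mathbb{E}_{p}(x^{a}z) is affine in pp, the problems (19) become linear programs, and their vertex solutions should recover the two-sided form (23). The sketch below uses SciPy's `linprog` and targets z~=0\widetilde{z}=0, xz~=0.3\widetilde{xz}=0.3, which are interior for a uniform FF on [1,1][-1,1].

```python
import numpy as np
from scipy.optimize import linprog

x = np.linspace(-1.0, 1.0, 201)              # discretized uniform F
n = len(x)
z_til, xz_til = 0.0, 0.3                     # target moments, interior to J

# With z = 2p - 1: E(x^a z) = (2/n) sum_i x_i^a p_i - mean(x^a).
c = -2.0 * x**2 / n                          # maximize E_p(x^2 z)
A_eq = np.vstack([np.full(n, 2.0 / n), 2.0 * x / n])
b_eq = np.array([z_til + 1.0, xz_til + np.mean(x)])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0.0, 1.0)] * n)
assert res.success
p_max = res.x

# Two-sided structure (23): untreated points form a single interior interval.
inside = np.where(p_max < 0.5)[0]
assert inside.size > 0 and np.all(np.diff(inside) == 1)
assert p_max[0] > 0.5 and p_max[-1] > 0.5    # both tails treated
```

For these targets a short calculation for the continuous uniform case gives the untreated interval [0.8,0.2][-0.8,0.2], and the LP solution matches that pattern up to discretization at the interval endpoints.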

Remark 4.

The necessity and sufficiency results of Proposition 1 do follow from Corollary 2.1 of Metelkina and Pronzato, (2017). We again identify our design functions pp with their design measures ξ\xi and take ψ(ξ)=x2dξ(x)\psi(\xi)=\int x^{2}\,\mathrm{d}\xi(x), which can be written as an affine function 𝚿()\bm{\Psi}(\cdot) of the expected information matrix (ξ)\mathcal{I}(\xi). As discussed at the beginning of this section, however, the form of a solution to problem (4) cannot be constructed from their corollary without the reduction to (19) and the application of (20), so that ψ\psi is as above rather than something like DD-optimality. We have also shown a stronger uniqueness result than Section 2.3.3 of Metelkina and Pronzato, (2017), which only applies when the running variable distribution FF has a density with respect to Lebesgue measure. Our Lemma 3 also provides an existence guarantee that does not rely on strict concavity of 𝚿()\bm{\Psi}(\cdot) on the set of positive definite matrices; this is violated by the affine choice we need here.

As we will see in Section 5, when z~<0\widetilde{z}<0 we frequently encounter popt=pmaxp_{\textnormal{opt}}=p_{\max} under DD-optimality. In this case it is intuitive that there is an efficiency advantage in strategically allocating the rare level z=1z=1 to both high and low xx, compared to a three level tie-breaker. But such a design is usually unacceptable in our motivating problems. We will thus constrain \mathcal{F} to the set of monotone design functions in Section 3.2.

Before doing that, we present an alternative solution to (4) assuming the running variable distribution has an additional moment. This result shows that when the running variable is continuous, an optimal design with no randomization always exists. However, randomized assignment becomes essential once we restrict our attention to monotone designs in Section 3.2, as the only non-randomized monotone designs are generalized RDDs (Remark 2).

Theorem 1.

Suppose 𝔼(|x|3)<\mathbb{E}(|x|^{3})<\infty. Then when \mathcal{F} is the set of all measurable design functions, for any (z~,xz~)𝒥(\widetilde{z},\widetilde{xz})\in\mathcal{J} there exists a solution to (4) with

popt(x)={1,x<a10,a1<x<a21,a2<x<a30,x>a3p_{\textnormal{opt}}(x)=\begin{cases}1,&x<a_{1}\\ 0,&a_{1}<x<a_{2}\\ 1,&a_{2}<x<a_{3}\\ 0,&x>a_{3}\end{cases} (25)

for some a1a2a3a_{1}\leqslant a_{2}\leqslant a_{3} which are finite unless poptp_{\textnormal{opt}} is one of the designs pmaxp_{\max} or pminp_{\min} in (23).

Proof.

The solutions to (4) are precisely the feasible design functions pp\in\mathcal{F} where 𝔼p(x2z)\mathbb{E}_{p}(x^{2}z) is a solution to (17). Fix any such solution x2z~(z~,xz~;Ψ)\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\Psi) to (17); it suffices to find a feasible design poptp_{\textnormal{opt}} with 𝔼popt(x2z)=x2z~(z~,xz~;Ψ)\mathbb{E}_{p_{\textnormal{opt}}}(x^{2}z)=\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\Psi). If x2z~(z~,xz~;Ψ)\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\Psi) is the lower (resp. upper) endpoint of the interval I(z~,xz~)I_{\mathcal{F}}(\widetilde{z},\widetilde{xz}), then by Proposition 1, the unique feasible design with 𝔼p(x2z)=x2z~(z~,xz~;Ψ)\mathbb{E}_{p}(x^{2}z)=\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\Psi) is the design pminp_{\min} (resp. pmaxp_{\max}). Then the result follows with a1=a_{1}=-\infty (resp. a3=a_{3}=\infty).

Otherwise, x2z~(z~,xz~;Ψ)\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\Psi) is in the interior of I(z~,xz~)I_{\mathcal{F}}(\widetilde{z},\widetilde{xz}), and we aim to apply Lemma 3 with m=3m=3, hi(x)=xih_{i}(x)=x^{i} for i{1,2,3}i\in\{1,2,3\}, and c=φ2(z~,xz~,x2z~(z~,xz~;Ψ))c=\varphi_{2}(\widetilde{z},\widetilde{xz},\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\Psi)). With SS^{\prime} closed and convex as shown in Proposition 1, we only need to show cintSc\in\textnormal{int}\ S. With the interior of I(z~,xz~)I_{\mathcal{F}}(\widetilde{z},\widetilde{xz}) being nonempty, there is more than one feasible design and the uniqueness result of Lemma 1 indicates that we must have xz~<xz~max(z~)\widetilde{xz}<\widetilde{xz}_{\max}(\widetilde{z}). With z~(1,1)\widetilde{z}\in(-1,1) by assumption and x2z~(z~,xz~;Ψ)intI(z~,xz~)\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\Psi)\in\textnormal{int}\ I_{\mathcal{F}}(\widetilde{z},\widetilde{xz}), indeed cintSc\in\textnormal{int}\ S.

Applying Lemma 3 we know that there exists a feasible design poptp_{\textnormal{opt}} with 𝔼popt(x2z)=x2z~(z~,xz~;Ψ)\mathbb{E}_{p_{\textnormal{opt}}}(x^{2}z)=\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\Psi) and popt(x)argmaxq𝔼(q(x)(x3k1x2k2xk3))p_{\textnormal{opt}}(x)\in\operatorname*{arg\,max}_{q\in\mathcal{F}}\mathbb{E}(q(x)(-x^{3}-k_{1}x^{2}-k_{2}x-k_{3})) for some k1,k2,k3k_{1},k_{2},k_{3}. If f(x;k1,k2,k3)x3k1x2k2xk3f(x;k_{1},k_{2},k_{3})\equiv-x^{3}-k_{1}x^{2}-k_{2}x-k_{3} had only one real root a1a_{1}, then the negative leading coefficient indicates f(x;k1,k2,k3)>0f(x;k_{1},k_{2},k_{3})>0 when x<a1x<a_{1} and f(x;k1,k2,k3)<0f(x;k_{1},k_{2},k_{3})<0 when x>a1x>a_{1}. This would imply popt(x)p_{\textnormal{opt}}(x) is a design that always treats all subjects with x<a1x<a_{1} and never treats any subject with x>a1x>a_{1}, which cannot be input-feasible. We conclude f(x;k1,k2,k3)=(xa1)(xa2)(xa3)f(x;k_{1},k_{2},k_{3})=-(x-a_{1})(x-a_{2})(x-a_{3}) for some finite a1a2a3a_{1}\leqslant a_{2}\leqslant a_{3} which are the roots of f(x;k1,k2,k3)f(x;k_{1},k_{2},k_{3}). This shows the existence of poptp_{\textnormal{opt}} of the form (25). ∎

3.2 Imposing a monotonicity constraint

We now apply Lemma 3 to solve (4) in the case of principal interest, where \mathcal{F} is the set of all monotone design functions. Note that the lower bound xz~0\widetilde{xz}\geqslant 0 that we imposed in Section 2.2 does not exclude any monotone designs. If p(x)p(x) is monotone then xx and zz necessarily have a nonnegative covariance 𝔼p(xz)\mathbb{E}_{p}(xz).

Our argument follows the outline of Section 3.1. Suppose that pmaxp_{\max}^{{\dagger}} and pminp_{\min}^{{\dagger}} are solutions to (19) with \mathcal{F} the set of monotone design functions, which we distinguish from the optimal designs pmaxp_{\max} and pminp_{\min} of Section 3.1. As \mathcal{F} is convex, Lemma 2 applies, and thus a solution poptp_{\textnormal{opt}}^{{\dagger}} to (4) is given by (20), replacing pmaxp_{\max} and pminp_{\min} by pmaxp_{\max}^{{\dagger}} and pminp_{\min}^{{\dagger}}, respectively. Note that x2z~(z~,xz~)\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz}) may differ from its value in Section 3.1 since \mathcal{F} has changed.

We now characterize the designs pmaxp_{\max}^{{\dagger}} and pminp_{\min}^{{\dagger}}. As in Proposition 1, these designs always exist and are unique for any (z~,xz~)𝒥(\widetilde{z},\widetilde{xz})\in\mathcal{J}. When FF is continuous, these are monotone two level designs pmax=p,1,tp_{\max}^{{\dagger}}=p_{\ell,1,t} and pmin=p0,u,sp_{\min}^{{\dagger}}=p_{0,u,s} as defined in (7). For general FF the designs pmaxp_{\max}^{{\dagger}} and pminp_{\min}^{{\dagger}} may differ from these designs at the single discontinuity.

Proposition 2.

For any (z~,xz~)𝒥(\widetilde{z},\widetilde{xz})\in\mathcal{J}, there exist unique solutions pmaxp_{\max}^{{\dagger}} and pminp_{\min}^{{\dagger}} to the optimization problems (19), when \mathcal{F} is the set of all monotone design functions. These solutions are the unique feasible designs satisfying

pmax(x)={,x<t1,x>tandpmin(x)={0,x<su,x>sp_{\max}^{{\dagger}}(x)=\begin{cases}\ell,&x<t\\ 1,&x>t\end{cases}\qquad\text{and}\qquad p_{\min}^{{\dagger}}(x)=\begin{cases}0,&x<s\\ u,&x>s\end{cases} (26)

for some ,u[0,1]\ell,u\in[0,1] and constants s,ts,t, which all depend on (z~,xz~)(\widetilde{z},\widetilde{xz}), where ss and tt may be infinite if xz~=0\widetilde{xz}=0.

Proof.

If xz~=0\widetilde{xz}=0, the only feasible monotone design is the fully randomized design p(x)=(1+z~)/2p(x)=(1+\widetilde{z})/2, and the theorem holds trivially with pmax=pminp_{\max}^{{\dagger}}=p_{\min}^{{\dagger}}, t=s=t=-s=\infty, and =u=(1+z~)/2\ell=u=(1+\widetilde{z})/2. Likewise, if xz~=xz~max(z~)\widetilde{xz}=\widetilde{xz}_{\max}(\widetilde{z}) then the desired results follow by Lemma 1 (take =0\ell=0, u=1u=1, and s=ts=t with tt as in Lemma 1). Thus, we assume that 0<xz~<xz~max(z~)0<\widetilde{xz}<\widetilde{xz}_{\max}(\widetilde{z}). Again, we only write out the argument for pmaxp_{\max}^{{\dagger}}; the proof for pminp_{\min}^{{\dagger}} is completely analogous.

Once again, we are in the setting of Lemma 3 with m=2m=2, h1(x)=1h_{1}(x)=1, h2(x)=xh_{2}(x)=x, and h3(x)=x2h_{3}(x)=x^{2}. The only difference from Proposition 1 is the definition of \mathcal{F}, so we must verify that the conditions on the corresponding SS^{\prime} and SS are satisfied. Since 𝔼(xap(x))\mathbb{E}(x^{a}p(x)) is linear in pp for all aa, and any convex combination of monotone functions is monotone (cf. Lemma 2), SS^{\prime} is convex. Now suppose that c0c_{0} is a limit point of SS^{\prime}. Then there exists a sequence p1,p2,p_{1},p_{2},\dots\in\mathcal{F} with (𝔼(pn(x)),𝔼(xpn(x)),𝔼(x2pn(x)))c0(\mathbb{E}(p_{n}(x)),\mathbb{E}(xp_{n}(x)),\mathbb{E}(x^{2}p_{n}(x)))\to c_{0} as nn\to\infty. As \mathcal{F} is sequentially compact under pointwise convergence (by Helly's selection theorem, since its elements are uniformly bounded and monotone), there exists a subsequence pnip_{n_{i}} and pp\in\mathcal{F} with pnipp_{n_{i}}\to p pointwise. But then (𝔼(p(x)),𝔼(xp(x)),𝔼(x2p(x)))=c0(\mathbb{E}(p(x)),\mathbb{E}(xp(x)),\mathbb{E}(x^{2}p(x)))=c_{0} by dominated convergence, so SS^{\prime} is closed. Finally, intS=φ1(𝒯)\textnormal{int}\ S=\varphi_{1}(\mathcal{T}^{{\dagger}}) where

𝒯{(z~,xz~)1<z~<1,0<xz~<xz~max(z~)}=int𝒥.\mathcal{T}^{{\dagger}}\equiv\{(\widetilde{z},\widetilde{xz})\mid-1<\widetilde{z}<1,0<\widetilde{xz}<\widetilde{xz}_{\max}(\widetilde{z})\}=\textnormal{int}\ \mathcal{J}.

Hence the assumption that 0<xz~<xz~max(z~)0<\widetilde{xz}<\widetilde{xz}_{\max}(\widetilde{z}) ensures that c=φ1(z~,xz~)intSc=\varphi_{1}(\widetilde{z},\widetilde{xz})\in\textnormal{int}\ S.

With the conditions of Lemma 3 once again satisfied, we now show that (22) is equivalent to (26) for any (monotone) feasible pmaxp_{\max}^{{\dagger}}. First assume feasible pmaxp_{\max}^{{\dagger}} satisfies (22), i.e. pmaxargmaxq𝔼(q(x)(x2k1xk2))p_{\max}^{{\dagger}}\in\operatorname*{arg\,max}_{q\in\mathcal{F}}\mathbb{E}(q(x)(x^{2}-k_{1}x-k_{2})) for some k1,k2k_{1},k_{2}. The polynomial x2k1xk2x^{2}-k_{1}x-k_{2} having no real roots would mean this condition is equivalent to pmax(x)=1p_{\max}^{{\dagger}}(x)=1, contradicting z~<1\widetilde{z}<1. Hence we can factor x2k1xk2=(xr)(xt)x^{2}-k_{1}x-k_{2}=(x-r)(x-t) for some rtr\leqslant t. Considering the sign of (xr)(xt)(x-r)(x-t) and monotonicity of any qq\in\mathcal{F} we see

𝔼(q(x)(xr)(xt))\displaystyle\mathbb{E}(q(x)(x-r)(x-t)) 𝔼(q(r)(xr)(xt)𝟏(x<r))+𝔼(q(r)(xr)(xt)𝟏(rx<t))\displaystyle\leqslant\mathbb{E}(q(r)(x-r)(x-t)\bm{1}(x<r))+\mathbb{E}(q(r)(x-r)(x-t)\bm{1}(r\leqslant x<t))
+𝔼((xr)(xt)𝟏(xt)).\displaystyle\phantom{=}\ +\mathbb{E}((x-r)(x-t)\bm{1}(x\geqslant t)).

This inequality is strict unless q(x)=1q(x)=1 for almost every x>tx>t and

(q(r)q(x))(xr)(xt)𝟏(x<r)=(q(x)q(r))(xr)(xt)𝟏(rx<t)=0(q(r)-q(x))(x-r)(x-t)\bm{1}(x<r)=(q(x)-q(r))(x-r)(x-t)\bm{1}(r\leqslant x<t)=0

with probability one under FF, i.e., q(x)=q(r)=q(x)=q(r)=\ell for some [0,1]\ell\in[0,1] and almost every x<tx<t. Therefore any design in argmaxq𝔼(q(x)(x2k1xk2))\operatorname*{arg\,max}_{q\in\mathcal{F}}\mathbb{E}(q(x)(x^{2}-k_{1}x-k_{2})) must satisfy the first condition in (26). Conversely, if a feasible, monotone pmaxp_{\max}^{{\dagger}} satisfies (26) then let r1tr_{1}\leqslant t be such that g(r1)=𝔼((xr1)(xt)𝟏(x<t))=0g(r_{1})=\mathbb{E}((x-r_{1})(x-t)\bm{1}(x<t))=0. Such an r1r_{1} exists since, assuming WLOG that Pr(x<t)>0\Pr(x<t)>0, g()g(\cdot) is continuous on (,t](-\infty,t] with =limr1g(r1)<0g(t)-\infty=\lim_{r_{1}\to-\infty}g(r_{1})<0\leqslant g(t). Considering the signs of (xr1)(xt)(x-r_{1})(x-t) we get for any pp\in\mathcal{F}

𝔼((pmax(x)p(x))(xr1)(xt))\displaystyle\mathbb{E}((p_{\max}^{{\dagger}}(x)-p(x))(x-r_{1})(x-t)) =𝔼((1p(x))(xr1)(xt)𝟏(xt))\displaystyle=\mathbb{E}((1-p(x))(x-r_{1})(x-t)\bm{1}(x\geqslant t))
+𝔼((p(x))(xr1)(xt)𝟏(x<t))\displaystyle\phantom{=}\ +\mathbb{E}((\ell-p(x))(x-r_{1})(x-t)\bm{1}(x<t))
𝔼((1p(x))(xr1)(xt)𝟏(xt))+(p(r1))g(r1)\displaystyle\geqslant\mathbb{E}((1-p(x))(x-r_{1})(x-t)\bm{1}(x\geqslant t))+(\ell-p(r_{1}))g(r_{1})
=𝔼((1p(x))(xr1)(xt)𝟏(xt))0\displaystyle=\mathbb{E}((1-p(x))(x-r_{1})(x-t)\bm{1}(x\geqslant t))\geqslant 0

and so pmaxp_{\max}^{{\dagger}} satisfies (22) with k1=r1+tk_{1}=r_{1}+t, and k2=tr1k_{2}=-tr_{1}. The second claim in Lemma 3 then ensures that the set of optimal solutions to (19) consists of precisely those feasible, monotone designs satisfying (26). Such a design must exist by the first claim of Lemma 3. The remaining uniqueness claims are shown in Appendix C. ∎
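As with Proposition 1, this characterization can be checked numerically; the sketch below is our own illustration, not from the paper. On a sorted grid the monotonicity constraint is the set of linear inequalities pipi+1p_{i}\leqslant p_{i+1}, so computing pmaxp_{\max}^{{\dagger}} is again a linear program (here via SciPy's `linprog`, with z{1,1}z\in\{-1,1\}). For a uniform FF on [1,1][-1,1] with z~=0\widetilde{z}=0 and xz~=0.3\widetilde{xz}=0.3, a short calculation for the continuous case gives the two level solution p,1,tp_{\ell,1,t} with t=0.4t=0.4 and =2/7\ell=2/7.

```python
import numpy as np
from scipy.optimize import linprog

x = np.linspace(-1.0, 1.0, 201)              # sorted grid, uniform F
n = len(x)
z_til, xz_til = 0.0, 0.3

# Same LP as the unconstrained case: with z = 2p - 1, maximize E_p(x^2 z)
# subject to E_p(z) = z_til and E_p(xz) = xz_til ...
c = -2.0 * x**2 / n
A_eq = np.vstack([np.full(n, 2.0 / n), 2.0 * x / n])
b_eq = np.array([z_til + 1.0, xz_til + np.mean(x)])

# ... plus monotonicity: p_i - p_{i+1} <= 0 for consecutive grid points.
A_ub = np.zeros((n - 1, n))
A_ub[np.arange(n - 1), np.arange(n - 1)] = 1.0
A_ub[np.arange(n - 1), np.arange(1, n)] = -1.0
b_ub = np.zeros(n - 1)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0.0, 1.0)] * n)
assert res.success
p = res.x

assert np.all(np.diff(p) >= -1e-6)           # monotone nondecreasing
assert p[-1] > 0.98                          # upper level is 1, as in (26)
assert abs(p[0] - 2.0 / 7.0) < 0.05          # lower level ell near 2/7
```

Up to discretization, the solver returns a design that is constant below a single jump and equal to one above it, which is exactly the structure (26) guarantees.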

Analogous to Theorem 1, if we assume the running variable has a third moment then we have a solution to (4) of a simpler form than (20). When the running variable xx is continuous, this solution will be a two level design p,u,tp_{\ell^{\prime},u^{\prime},t^{\prime}} for some 0u10\leqslant\ell^{\prime}\leqslant u^{\prime}\leqslant 1. In general, when xx is not continuous, we may need a different treatment probability at the discontinuity tt^{\prime}.

Theorem 2.

Suppose 𝔼(|x|3)<\mathbb{E}(|x|^{3})<\infty. Then when \mathcal{F} is the set of all monotone design functions, for any (z~,xz~)𝒥(\widetilde{z},\widetilde{xz})\in\mathcal{J} there exists a solution to (4) with

popt(x)={,x<tu,x>tp_{\textnormal{opt}}^{{\dagger}}(x)=\begin{cases}\ell^{\prime},&x<t^{\prime}\\ u^{\prime},&x>t^{\prime}\end{cases} (27)

for some 0u10\leqslant\ell^{\prime}\leqslant u^{\prime}\leqslant 1 and tt^{\prime}\in\real.

Proof.

The proof structure is similar to that of Theorem 1. Fix any solution x2z~(z~,xz~;Ψ)\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\Psi) to (17). If it is an endpoint of I(z~,xz~)I_{\mathcal{F}}(\widetilde{z},\widetilde{xz}) then the unique solution to (4) is pmaxp_{\max}^{{\dagger}} or pminp_{\min}^{{\dagger}} from Proposition 2, which takes the form (27) with u=1u^{\prime}=1 or =0\ell^{\prime}=0, respectively. Otherwise we apply Lemma 3 with m=3m=3, hi(x)=xih_{i}(x)=x^{i} for i{1,2,3}i\in\{1,2,3\}, and c=φ2(z~,xz~,x2z~(z~,xz~;Ψ))c=\varphi_{2}(\widetilde{z},\widetilde{xz},\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\Psi)). The lemma applies since SS^{\prime} is closed and convex from the proof of Proposition 2, and cintSc\in\textnormal{int}\ S since our assumption that x2z~(z~,xz~;Ψ)intI(z~,xz~)\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\Psi)\in\textnormal{int}\ I_{\mathcal{F}}(\widetilde{z},\widetilde{xz}) indicates there is more than one feasible design, so 0<xz~<xz~max(z~)0<\widetilde{xz}<\widetilde{xz}_{\max}(\widetilde{z}).

Applying Lemma 3 we see that there exists a design poptp_{\textnormal{opt}}^{{\dagger}} that solves (4) with poptargmaxq𝔼(q(x)f(x;k1,k2,k3))p_{\textnormal{opt}}^{{\dagger}}\in\operatorname*{arg\,max}_{q\in\mathcal{F}}\mathbb{E}(q(x)f(x;k_{1},k_{2},k_{3})) for some k1,k2,k3k_{1},k_{2},k_{3}. Here, as in the proof of Theorem 1, f(x)=f(x;k1,k2,k3)(x3+k1x2+k2x+k3)f(x)=f(x;k_{1},k_{2},k_{3})\equiv-(x^{3}+k_{1}x^{2}+k_{2}x+k_{3}). We show this implies poptp_{\textnormal{opt}}^{{\dagger}} is of the form (27) using the following claim.

Claim: Suppose sign(f(x))=𝟏(x<a)𝟏(x>a)\textnormal{sign}(f(x))=\bm{1}(x<a)-\bm{1}(x>a) w.p.1 for some aa\in\mathbb{R} and \mathcal{F} is the set of monotone design functions. Then pargmaxq𝔼(q(x)f(x))p\in\operatorname*{arg\,max}_{q\in\mathcal{F}}\mathbb{E}(q(x)f(x)) implies p(x)=p(a)p(x)=p(a) w.p.1.
Proof of claim: For any monotone design pp we can define p~(x)=min(p(x),p(a))\tilde{p}(x)=\min(p(x),p(a)) so that

𝔼((p~(x)p(x))f(x))\displaystyle\mathbb{E}((\tilde{p}(x)-p(x))f(x)) =𝔼((p~(x)p(x))f(x)𝟏(xa))=𝔼((p(x)p(a))f(x)𝟏(xa))\displaystyle=\mathbb{E}((\tilde{p}(x)-p(x))f(x)\bm{1}(x\geqslant a))=-\mathbb{E}((p(x)-p(a))f(x)\bm{1}(x\geqslant a))

is nonnegative, and zero iff p(x)=p(a)p(x)=p(a) for almost all xax\geqslant a. Similarly by considering p~(x)=max(p(x),p(a))\tilde{p}(x)=\max(p(x),p(a)), we conclude p(x)=p(a)p(x)=p(a) for almost all x<ax<a.

We notice that ff has either one real root a1a_{1} or three real roots a1a2a3a_{1}\leqslant a_{2}\leqslant a_{3}; if two of these roots coincide, the sign of ff has a single crossing and the one-root argument below applies. If a1a_{1} is the only root, we know f(x)<0f(x)<0 when x>a1x>a_{1} and f(x)>0f(x)>0 when x<a1x<a_{1}, since the leading coefficient of ff is negative. Thus we can apply the claim directly to show that poptp_{\textnormal{opt}}^{{\dagger}} is constant, in particular of the form (27) with =u\ell^{\prime}=u^{\prime}. If there are three distinct real roots we show poptp_{\textnormal{opt}}^{{\dagger}} is of this form with t=a2t^{\prime}=a_{2}. Let F<F_{<} (F>F_{>}) be the conditional distribution of xx given x<a2x<a_{2} (x>a2x>a_{2}), so

𝔼F(q(x)f(x))=𝔼F<(q(x)f(x))Pr(x<a2)+𝔼F>(q(x)f(x))Pr(x>a2)\mathbb{E}_{F}(q(x)f(x))=\mathbb{E}_{F_{<}}(q(x)f(x))\Pr(x<a_{2})+\mathbb{E}_{F_{>}}(q(x)f(x))\Pr(x>a_{2})

We conclude the condition poptargmaxq𝔼F(q(x)f(x;k1,k2,k3))p_{\textnormal{opt}}^{{\dagger}}\in\operatorname*{arg\,max}_{q\in\mathcal{F}}\mathbb{E}_{F}(q(x)f(x;k_{1},k_{2},k_{3})) implies popt(x)=popt(a1)p_{\textnormal{opt}}^{{\dagger}}(x)=p_{\textnormal{opt}}^{{\dagger}}(a_{1}) for almost all x<a2x<a_{2} and popt(x)=popt(a3)p_{\textnormal{opt}}^{{\dagger}}(x)=p_{\textnormal{opt}}^{{\dagger}}(a_{3}) for almost all x>a2x>a_{2} by applying the claim twice (once for F<F_{<}, once for F>F_{>}). ∎
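The claim also admits a quick numerical sanity check (ours, not part of the paper). On a finite grid, the monotone designs form the polytope {0⩽q1⩽⋯⩽qn⩽1}, whose extreme points are the step designs qi=𝟏(i⩾k); a linear objective is maximized at an extreme point, so it suffices to compare suffix sums of ff:

```python
# Sanity check of the claim: if sign(f) = 1(x < a) - 1(x > a), the monotone
# design maximizing E(q(x) f(x)) is constant. On a grid, monotone designs form
# the polytope {0 <= q_1 <= ... <= q_n <= 1}; its extreme points are the
# suffix indicators q_i = 1(i >= k), so it suffices to compare the suffix
# sums S_k = sum_{i >= k} f(x_i).

a = 0.3                                       # illustrative sign-change point
xs = [i / 100.0 for i in range(-100, 101)]    # grid on [-1, 1]
f = [a - x for x in xs]                       # sign(f) = 1(x < a) - 1(x > a)

suffix_sums = [0.0]                           # empty suffix: the design q = 0
for fi in reversed(f):
    suffix_sums.append(suffix_sums[-1] + fi)  # last entry: the design q = 1

best = max(suffix_sums)
# The maximum over all step designs is attained at an endpoint, i.e. at a
# constant design (q = 0 or q = 1), matching the claim.
assert abs(best - max(0.0, suffix_sums[-1])) < 1e-12
```

The running sums dip while the negative values of ff (from the right) are added and then climb monotonically, so no intermediate step design beats a constant one.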

In general, the optimal designs derived in Theorems 1 and 2 are not unique when x2z~(z~,xz~;Ψ)\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\Psi) is not on the boundary of I(z~,xz~)I_{\mathcal{F}}(\widetilde{z},\widetilde{xz}). For example, in nondegenerate cases the solution poptp_{\textnormal{opt}}^{{\dagger}} in (27) typically has two levels, while the solution in (20) (with the monotonicity constraint) will have three levels. As another example, the three level tie-breaker found by Owen and Varian, (2020) to be optimal when FF is uniform and z~=0\widetilde{z}=0 does not take the form (25) whenever xz~<xz~max(0)\widetilde{xz}<\widetilde{xz}_{\max}(0). Conversely, Propositions 1 and 2 guarantee a unique optimal design when x2z~(z~,xz~;Ψ)\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\Psi) is one of the endpoints of I(z~,xz~)I_{\mathcal{F}}(\widetilde{z},\widetilde{xz}).

4 Exploration-exploitation trade-off

As discussed in Section 1, Owen and Varian, (2020) showed that when z~=0\widetilde{z}=0 and F𝕌(1,1)F\sim\mathbb{U}(-1,1), the efficiency (under their criterion Eff()\textnormal{Eff}(\cdot)) of the three level tie-breaker (8) is monotonically increasing in the width Δ\Delta of the randomization window. As Δ\Delta is a strictly decreasing function of xz~\widetilde{xz}, and the three level tie-breaker solves (4) for all xz~\widetilde{xz}, they conclude that there is a monotone trade-off between short-term gain and statistical efficiency. In other words, greater statistical efficiency from an optimal design requires giving up short-term gain.

We now extend these results to general z~\widetilde{z} and other running variable distributions. Hereafter popt=popt;z~,xz~p_{\textnormal{opt}}=p_{\textnormal{opt};\widetilde{z},\widetilde{xz}} denotes an optimal design without the monotonicity constraint, to be contrasted with poptp_{\textnormal{opt}}^{{\dagger}} of Section 3.2. Note we have made the dependence of these designs on (z~,xz~)𝒥(\widetilde{z},\widetilde{xz})\in\mathcal{J} explicit. We use the same efficiency criterion Eff()\textnormal{Eff}(\cdot) as Owen and Varian, (2020). Recall this is a cc-optimality criterion corresponding to the scaled asymptotic variance of the OLS estimate for β3\beta_{3} in (1), and equivalent to DD-optimality for our problem (4) by Section 2.3.

Theorem 3.

Suppose the distribution function FF of the running variable has a positive derivative everywhere in II, the smallest open interval with If(x)dx=1\int_{I}f(x)\,\mathrm{d}x=1. If additionally F(x)=1F(x),F(x)=1-F(-x), xI\forall x\in I, then fixing any z~(1,1)\widetilde{z}\in(-1,1), Eff(popt;z~,xz~)\textnormal{Eff}(p_{\textnormal{opt};\widetilde{z},\widetilde{xz}}) is decreasing in xz~\widetilde{xz}.

Proof.

See Appendix D. ∎

It turns out, however, that the gain versus efficiency trade-off is no longer monotone under the monotonicity constraint. Indeed, our next theorem shows that whenever z~0\widetilde{z}\neq 0, if FF is symmetric (or indeed, not extremely skewed), the fully randomized design pθ,θ,0p_{\theta,\theta,0} is inadmissible for any θ1/2\theta\neq 1/2, in the sense that there exists a different monotone design pp with 𝔼p(z)=z~\mathbb{E}_{p}(z)=\widetilde{z} but both Eff(p)>Eff(pθ,θ,0)\textnormal{Eff}(p)>\textnormal{Eff}(p_{\theta,\theta,0}) and 𝔼p(xz)>𝔼pθ,θ,0(xz)\mathbb{E}_{p}(xz)>\mathbb{E}_{p_{\theta,\theta,0}}(xz). In other words, the RCT is no longer admissible under Eff()\textnormal{Eff}(\cdot) when z~0\widetilde{z}\neq 0.

Theorem 4.

Fix z~(1,1){0}\widetilde{z}\in(-1,1)\setminus\{0\}, and assume FF satisfies the conditions of Theorem 3. If z~<0\widetilde{z}<0 assume that 𝔼(x2)<F1(1)2\mathbb{E}(x^{2})<F^{-1}(1)^{2}; otherwise assume that 𝔼(x2)<F1(0)2\mathbb{E}(x^{2})<F^{-1}(0)^{2}. Here F1(1)supIF^{-1}(1)\equiv\sup I and F1(0)infIF^{-1}(0)\equiv\inf I. Let p1=pθ,θ,0p_{1}=p_{\theta,\theta,0} be the fully randomized monotone design with 𝔼p1(z)=z~\mathbb{E}_{p_{1}}(z)=\widetilde{z}, so that θ=(1+z~)/2\theta=(1+\widetilde{z})/2. Then there exists a monotone design p2p_{2} such that 𝔼p2(z)=z~\mathbb{E}_{p_{2}}(z)=\widetilde{z} yet both Eff(p2)>Eff(p1)\textnormal{Eff}(p_{2})>\textnormal{Eff}(p_{1}) and 𝔼p2(xz)>0=𝔼p1(xz)\mathbb{E}_{p_{2}}(xz)>0=\mathbb{E}_{p_{1}}(xz).

Proof.

See Appendix E. ∎

5 Examples

In this section, we compute the optimal exploration-exploitation trade-off curves investigated in Section 4 for several specific running variable distributions FF. We can obtain large gains in efficiency under the criterion Eff()\textnormal{Eff}(\cdot) by moving from the three level tie-breaker design to poptp_{\textnormal{opt}}^{{\dagger}}, without sacrificing short-term gain. We see further (generally smaller) improvements when we remove the monotonicity constraint and move from poptp_{\textnormal{opt}}^{\dagger} to poptp_{\textnormal{opt}}.

To generate these curves we compute optimal designs popt;z~,xz~p_{\textnormal{opt};\widetilde{z},\widetilde{xz}} and popt;z~,xz~p_{\textnormal{opt};\widetilde{z},\widetilde{xz}}^{{\dagger}} and evaluate their efficiency for various fixed z~(1,1)\widetilde{z}\in(-1,1) as we vary the short-term gain constraint xz~\widetilde{xz} over a fine grid covering [0,xz~max(z~)][0,\widetilde{xz}_{\max}(\widetilde{z})]. For interpretability we write xz~=δxz~max(z~)\widetilde{xz}=\delta\cdot\widetilde{xz}_{\max}(\widetilde{z}) and specify short-term gain with the normalized parameter δ[0,1]\delta\in[0,1], as discussed in Section 2.2. When FF is continuous, solutions poptp_{\textnormal{opt}} and poptp_{\textnormal{opt}}^{{\dagger}} to (4) are computed by noting that we can write pmax=p[a1,a2]cp_{\max}=p_{[a_{1},a_{2}]^{c}}, pmin=p[b1,b2]p_{\min}=p_{[b_{1},b_{2}]}, pmax=p,1,tp_{\max}^{{\dagger}}=p_{\ell,1,t}, and pmin=p0,u,sp_{\min}^{{\dagger}}=p_{0,u,s} by Propositions 1 and 2. Each of these designs has two unknown parameters that must be the unique solutions to the two feasibility constraints 𝔼p(z)=z~\mathbb{E}_{p}(z)=\widetilde{z} and 𝔼p(xz)=xz~\mathbb{E}_{p}(xz)=\widetilde{xz}. Given these parameters, we can apply (20) to compute poptp_{\textnormal{opt}} and poptp_{\textnormal{opt}}^{{\dagger}}. If λ{0,1}\lambda\notin\{0,1\} we could also get an optimal design of the form in Theorem 1, as follows. First we compute x2z~(z~,xz~;Eff)\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\textnormal{Eff}) via (17), noting that the endpoints of I(z~,xz~)I_{\mathcal{F}}(\widetilde{z},\widetilde{xz}) are 𝔼pmin(z~,xz~)(x2z)\mathbb{E}_{p_{\min}(\widetilde{z},\widetilde{xz})}(x^{2}z) and 𝔼pmax(z~,xz~)(x2z)\mathbb{E}_{p_{\max}(\widetilde{z},\widetilde{xz})}(x^{2}z). Then (17) is simply maximizing a continuous function over a closed interval, so it can be handled by standard methods such as Brent’s algorithm (Brent,, 1973).
Given x2z~(z~,xz~;Eff)\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\textnormal{Eff}) we can then numerically search for a1a_{1}, a2a_{2}, a3a_{3} such that popt=𝟏(xa1)+𝟏(a2xa3)p_{\textnormal{opt}}=\bm{1}(x\leqslant a_{1})+\bm{1}(a_{2}\leqslant x\leqslant a_{3}) is feasible with 𝔼popt(x2z)=x2z~(z~,xz~;Eff)\mathbb{E}_{p_{\textnormal{opt}}}(x^{2}z)=\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\textnormal{Eff}). By Theorem 1 such a solution will exist and be optimal. We can do a similar search for an optimal two level design poptp_{\textnormal{opt}}^{{\dagger}} under the monotonicity constraint, by Theorem 2.

5.1 Uniform running variable

Table 1: A list of the parameters for the various designs considered in Section 5, for fixed (z~,xz~)𝒥(\widetilde{z},\widetilde{xz})\in\mathcal{J}, when F𝕌(1,1)F\sim\mathbb{U}(-1,1). The values for Δ\Delta are only valid if they are between 0 and min((1z~)/2,(1+z~)/2)\min((1-\widetilde{z})/2,(1+\widetilde{z})/2), inclusive. Otherwise there is no feasible three level tie-breaker design.
Design Parameter Value
p3;z~,Δp_{3;\widetilde{z},\Delta} Δ\Delta (1/2)(1z~22xz~)1/2(1/2)(1-\widetilde{z}^{2}-2\widetilde{xz})^{1/2}
pmax;z~,xz~p_{\max;\widetilde{z},\widetilde{xz}} a1a_{1} xz~/(1z~)(1z~)/2-\widetilde{xz}/(1-\widetilde{z})-(1-\widetilde{z})/2
a2a_{2} xz~/(1z~)+(1z~)/2-\widetilde{xz}/(1-\widetilde{z})+(1-\widetilde{z})/2
pmin;z~,xz~p_{\min;\widetilde{z},\widetilde{xz}} b1b_{1} xz~/(1+z~)(1+z~)/2\widetilde{xz}/(1+\widetilde{z})-(1+\widetilde{z})/2
b2b_{2} xz~/(1+z~)+(1+z~)/2\widetilde{xz}/(1+\widetilde{z})+(1+\widetilde{z})/2
pmax;z~,xz~p_{\max;\widetilde{z},\widetilde{xz}}^{{\dagger}} \ell (1/2)(1z~22xz~)/(1z~xz~)(1/2)(1-\widetilde{z}^{2}-2\widetilde{xz})/(1-\widetilde{z}-\widetilde{xz})
tt 12xz~/(1z~)1-2\widetilde{xz}/(1-\widetilde{z})
pmin;z~,xz~p_{\min;\widetilde{z},\widetilde{xz}}^{{\dagger}} uu (1/2)(1+z~)2/(1+z~xz~)(1/2)(1+\widetilde{z})^{2}/(1+\widetilde{z}-\widetilde{xz})
ss 2xz~/(1+z~)12\widetilde{xz}/(1+\widetilde{z})-1

We begin with the case F𝕌(1,1)F\sim\mathbb{U}(-1,1). This is the distribution most extensively studied by Owen and Varian, (2020), and allows closed form expressions for the parameters in pmaxp_{\max}, pminp_{\min}, pmaxp_{\max}^{{\dagger}}, and pminp_{\min}^{{\dagger}}, given in Table 1. Figure 1 shows plots of Eff(p)1\textnormal{Eff}(p)^{-1} versus xz~\widetilde{xz} for z~{0,0.2,0.5,0.7}\widetilde{z}\in\{0,-0.2,-0.5,-0.7\} under different designs: the three level tie-breaker p3;z~,Δp_{3;\widetilde{z},\Delta}, a globally optimal design popt;z~,xz~p_{\textnormal{opt};\widetilde{z},\widetilde{xz}}, and an optimal monotone design popt;z~,xz~p_{\textnormal{opt};\widetilde{z},\widetilde{xz}}^{{\dagger}}. Since FF is symmetric, the curves would be identical if z~\widetilde{z} were replaced with z~-\widetilde{z}.
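The Table 1 formulas can be checked numerically. The sketch below (ours; the quadrature helper `moments` and the particular feasible pair (z̃, x̃z) = (−0.7, 0.25) are illustrative) verifies that each design satisfies the two feasibility constraints 𝔼p(z)=z̃ and 𝔼p(xz)=x̃z under F∼𝕌(−1,1), using z=2p(x)−1 and 𝔼(x)=0:

```python
# Check the Table 1 closed forms: under F ~ U(-1,1), z = 2p(x) - 1, so
# E_p(z) = 2 E(p(x)) - 1 and E_p(xz) = 2 E(x p(x)) because E(x) = 0.

def moments(p, n=100_000):
    """Midpoint-rule estimates of (E_p(z), E_p(xz)) under U(-1,1)."""
    h = 2.0 / n
    Ep = Exp = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        Ep += p(x) * h / 2.0
        Exp += x * p(x) * h / 2.0
    return 2.0 * Ep - 1.0, 2.0 * Exp

ztil, xztil = -0.7, 0.25                      # a feasible pair (Section 5.1)

a1 = -xztil / (1 - ztil) - (1 - ztil) / 2     # p_max treats outside [a1, a2]
a2 = -xztil / (1 - ztil) + (1 - ztil) / 2
b1 = xztil / (1 + ztil) - (1 + ztil) / 2      # p_min treats inside [b1, b2]
b2 = xztil / (1 + ztil) + (1 + ztil) / 2
l = 0.5 * (1 - ztil**2 - 2 * xztil) / (1 - ztil - xztil)   # p_max^dagger
t = 1 - 2 * xztil / (1 - ztil)
u = 0.5 * (1 + ztil)**2 / (1 + ztil - xztil)               # p_min^dagger
s = 2 * xztil / (1 + ztil) - 1

designs = [
    lambda x: 0.0 if a1 <= x <= a2 else 1.0,  # p_max
    lambda x: 1.0 if b1 <= x <= b2 else 0.0,  # p_min
    lambda x: l if x < t else 1.0,            # p_max^dagger
    lambda x: 0.0 if x < s else u,            # p_min^dagger
]
for p in designs:
    Ez, Exz = moments(p)
    assert abs(Ez - ztil) < 1e-3 and abs(Exz - xztil) < 1e-3
```

All four designs hit the same (z̃, x̃z); they differ only in the third moment 𝔼p(x²z) that the criterion then trades off.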

As shown in Owen and Varian, (2020), under the constraint z~=0\widetilde{z}=0 the three level tie-breaker is optimal for all δ\delta, and thus the three level tie-breaker, poptp_{\textnormal{opt}}, and poptp_{\textnormal{opt}}^{{\dagger}} all attain the optimal efficiency, as can be seen in the top left panel of Figure 1. The proof of Theorem 3 shows this would hold for any continuous, symmetric running variable distribution FF. As z~\widetilde{z} moves away from 0, however, we see that the three level tie-breaker becomes increasingly less efficient relative to both the optimal monotone design and the optimal design. At the same time, the range of short-term gain values xz~\widetilde{xz} attainable by three level tie-breaker designs becomes smaller relative to the full range achievable by arbitrary designs. Note that Figure 1 plots the reciprocal of the efficiency criterion Eff()\textnormal{Eff}(\cdot), so that it can be interpreted as an asymptotic variance for β^3\hat{\beta}_{3} via (15), and compared with the plots in Owen and Varian, (2020).

Figure 1: Left four plots: The inverse efficiencies of the three level tiebreaker p3;z~,Δp_{3;\widetilde{z},\Delta}, the optimal designs poptp_{\textnormal{opt}} in Section 3.1, and the optimal monotone designs poptp_{\textnormal{opt}}^{{\dagger}} in Section 3.2 as a function of the normalized short-term gain parameter δ\delta for fixed z~{0,0.2,0.5,0.7}\widetilde{z}\in\{0,-0.2,-0.5,-0.7\}, when F𝕌(1,1)F\sim\mathbb{U}(-1,1). When z~=0\widetilde{z}=0, all three curves coincide. Note the logarithmic vertical axis for the bottom two plots. Right four plots: The value of 𝔼p(x2z)\mathbb{E}_{p}(x^{2}z) for the designs pp in the left four plots.
Table 2: An extension of Table 2 in Owen and Varian, (2020), showing how poptp_{\textnormal{opt}} and poptp_{\textnormal{opt}}^{{\dagger}} can greatly increase efficiency without sacrificing short-term gain, compared to a three level tie-breaker. All designs pp in this table satisfy 𝔼p(z)=0.7\mathbb{E}_{p}(z)=-0.7 and assume F𝕌(1,1)F\sim\mathbb{U}(-1,1).
Design pp Description Normalized short-term gain δ\delta Eff(p)1\textnormal{Eff}(p)^{-1}
p0,1,0.7p_{0,1,0.7} Sharp RDD 1.0001.000 223.44223.44
p3;0.7,Δp_{3;-0.7,\Delta} 3 level tie-breaker 0.9800.980\phantom{0} 137.56137.56
popt;0.7,0.25p_{\textnormal{opt};-0.7,0.25}^{{\dagger}} Optimal monotone design 0.9800.980\phantom{0} 54.90\phantom{0}54.90
popt;0.7,0.25p_{\textnormal{opt};-0.7,0.25} Optimal design 0.9800.980\phantom{0} 42.37\phantom{0}42.37

Table 2 extends Table 2 of Owen and Varian, (2020), referring to a setting in which only 15% of subjects are to be treated (z~=0.7\widetilde{z}=-0.7). That table shows the inverse efficiency of the sharp RDD p0,1,0.7p_{0,1,0.7} is 223.44, while the three level tie-breaker p3;0.7,0.05p_{3;-0.7,0.05} reduces this by about 40% to 137.56137.56, at the cost of around 2% of the short-term gain of the sharp RDD over the RCT. Then Eff(popt)1=54.90\textnormal{Eff}(p_{\textnormal{opt}}^{{\dagger}})^{-1}=54.90 and Eff(popt)1=42.37\textnormal{Eff}(p_{\textnormal{opt}})^{-1}=42.37, so the designs poptp_{\textnormal{opt}}^{{\dagger}} and poptp_{\textnormal{opt}} further improve efficiency while achieving the same short-term gain as the three level tie-breaker. For this example, we can directly compute with (31) that popt;0.7,0.25=pmax;0.7,0.25=p,1,tp_{\textnormal{opt};-0.7,0.25}^{{\dagger}}=p_{\max;-0.7,0.25}^{{\dagger}}=p_{\ell,1,t} is the unique optimal monotone design, where by Table 1, =0.0034\ell=0.0034 and t=0.7059t=0.7059. In other words, the unique optimal monotone tie-breaker design deterministically assigns treatment to the top 14.7%, and gives the other subjects an equal, small (0.34%) chance of treatment.
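The quoted parameters can be reproduced directly from the Table 1 formulas; a quick check (ours) with z̃=−0.7 and x̃z=0.25:

```python
# Reproduce l = 0.0034 and t = 0.7059 for p_max^dagger at ztil = -0.7,
# xztil = 0.25, using the closed forms in Table 1.
ztil, xztil = -0.7, 0.25
l = 0.5 * (1 - ztil**2 - 2 * xztil) / (1 - ztil - xztil)
t = 1 - 2 * xztil / (1 - ztil)

print(round(l, 4), round(t, 4))     # 0.0034 0.7059
# Fraction treated deterministically: P(x > t) under U(-1,1)
print(round((1 - t) / 2, 3))        # 0.147, i.e. the top 14.7%
```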

A limitation of this analysis is that in many practical settings, the two-line regression model will not fit very well over the entire range of xx values. In that case the investigator might use a narrower data range, essentially fitting a less asymmetric two-line model, as illustrated in Owen and Varian, (2020). This is equivalent to using a local linear regression with a rectangular “boxcar” kernel. In this setting, we know from Figure 1 that when the treatment proportion is not exactly 50%, we can always do better than the three level tie-breaker using a monotone two level design. Even with a small asymmetry, e.g. 40% treatment (z~=0.2\widetilde{z}=-0.2), we see a noticeable efficiency improvement from the three level tie-breaker to an optimal monotone design across all values of δ\delta.

Finally, consistent with the results of Section 4, we observe in Figure 1 that Eff(popt)\textnormal{Eff}(p_{\textnormal{opt}}) decreases with the gain parameter δ\delta for each z~\widetilde{z}, while near δ=0\delta=0, Eff(popt)\textnormal{Eff}(p_{\textnormal{opt}}^{{\dagger}}) increases with δ\delta for all z~0\widetilde{z}\neq 0. This clearly demonstrates the inadmissibility of the fully randomized design from Theorem 4. For example, if we fix z~=0.5\widetilde{z}=-0.5 (so 25% of the subjects are to be treated), the fully randomized design p0.25,0.25,0=popt;0.5,0p_{0.25,0.25,0}=p_{\textnormal{opt};-0.5,0}^{{\dagger}} has efficiency 0.25 and no short-term gain (δ=0\delta=0), while popt;0.5,0.1p_{\textnormal{opt};-0.5,0.1}^{{\dagger}} has higher efficiency (0.28) with short-term gain δ=0.27>0\delta=0.27>0. However, if we remove the monotonicity constraint, by Theorem 3 popt;0.5,0p_{\textnormal{opt};-0.5,0} is the most efficient design over all attainable gain values, attaining efficiency 0.33 with δ=0\delta=0.

5.2 Skewed running variable

Figure 2: Same as Figure 1, except for the case where FF is a centered Weibull distribution (28).

We now repeat the analysis of Section 5.1 for a skewed running variable distribution FF:

F(x)=1exp(x+2)F(x)=1-\exp(-\sqrt{x+2}) (28)

for x(2,)=Ix\in(-2,\infty)=I. This corresponds to a mean-centered Weibull distribution with shape parameter 0.50.5 and scale parameter 11. Figure 2 shows the trade-off curves under this distribution FF. We see, as expected by Theorem 4, that once again the fully randomized design is inadmissible, even within the class of monotone designs, when z~0\widetilde{z}\neq 0.
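As a quick check (ours) that (28) is indeed mean-centered: if ww follows a Weibull distribution with shape 0.5 and scale 1 then 𝔼(w)=Γ(1+1/0.5)=Γ(3)=2, and x=w−2 has the distribution function in (28) with mean zero:

```python
import math

# F(x) = 1 - exp(-sqrt(x+2)) on (-2, inf) is a Weibull(shape 0.5, scale 1)
# variable shifted by its mean E(w) = Gamma(1 + 1/0.5) = Gamma(3) = 2.
assert abs(math.gamma(1 + 1 / 0.5) - 2.0) < 1e-12

def Finv(q):
    """Quantile function of (28): invert q = 1 - exp(-sqrt(x+2))."""
    return math.log(1.0 / (1.0 - q)) ** 2 - 2.0

# E(x) = int_0^1 Finv(q) dq should be 0 for the centered distribution
# (midpoint rule; the grid size is arbitrary).
n = 500_000
mean = sum(Finv((i + 0.5) / n) for i in range(n)) / n
assert abs(mean) < 1e-2
```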

Another notable feature when FF is not symmetric is that the three level tie-breaker is no longer optimal, even in the balanced case z~=0\widetilde{z}=0. While the unconstrained optimal design attains the lower bound 𝔼(x2z)=x2z~(0,xz~;Eff)=0\mathbb{E}(x^{2}z)=\widetilde{x^{2}z}^{*}(0,\widetilde{xz};\textnormal{Eff})=0 for a wide range of short-term gains, Figure 2 shows the three level tie-breaker does not, except in the case z~=xz~=0\widetilde{z}=\widetilde{xz}=0 corresponding to the RCT. In Figure 2, we see the optimal design is over 100 times as efficient as the three level tie-breaker for sufficiently large δ\delta, even in the balanced setting z~=0\widetilde{z}=0. In the unbalanced treatment cases we also see a range of values for which optimal designs with and without the monotonicity constraint attain the same value of 𝔼(x2z)\mathbb{E}(x^{2}z). In those situations there exists a globally optimal design that is also monotone.

5.3 Fixed-xx data example

We now illustrate how to compute optimal designs for the original fixed-xx problem (3) using a real data example. Ludwig and Miller, (2007) used an RDD to analyze the impact of Head Start, a U.S. government program launched in 1965 that provides benefits such as preschool and health services to children in low-income families. When the program was launched, extra grant-writing assistance was provided to the 300 counties with the highest poverty rates in the country. This created a natural discontinuity in the amount of funding to counties as a function of xx, a county poverty index based on the 1960 U.S. Census. The distribution of xx over n=2,804n=2{,}804 counties is shown in Figure 3. The data is made freely available by Cattaneo et al., (2017).

If the government had deemed it ethical to somewhat randomize the 300 counties receiving the grant-writing assistance, it could have more efficiently estimated the causal impact of this assistance using our poptp_{\textnormal{opt}}^{{\dagger}}, while still ensuring poorer counties are preferentially helped, and no county has a lower chance of getting the assistance than a more well-off county. As in the data example of Kluger and Owen, (2021), we do not observe the potential outcomes, so we cannot actually implement such a design and compute any estimators. However, we can still study statistical efficiencies, which depend only on the expected information matrix \mathcal{I}.

Figure 3: Left: A histogram of the mean-centered poverty index xx for n=2,804n=2{,}804 counties used to determine eligibility for additional grant-writing assistance in the Head Start program. The dotted vertical line indicates the eligibility threshold. Right: The exploration-exploitation trade-offs for the Head Start data, comparing the three level tie-breaker with the optimal monotone two level design. The curves intersect at the value Eff1\textnormal{Eff}^{-1} of the RDD.

We fix the treatment fraction at 300/2804300/2804, corresponding to z~0.79\widetilde{z}\approx-0.79. Varying the short-term gain constraint xz~\widetilde{xz} we seek to compute pmax;z~,xz~p_{\max;\widetilde{z},\widetilde{xz}}^{{\dagger}} and pmin;z~,xz~p_{\min;\widetilde{z},\widetilde{xz}}^{{\dagger}}. We describe how to compute the former. Because FF is discrete it suffices to only consider discontinuity points t{x1,,xn}t\in\{x_{1},\dots,x_{n}\} where FF places positive probability mass, as every design of the form of pmaxp_{\max}^{{\dagger}} in (26) has a representation in that form with such tt. Also, given the values of the discontinuity tt and p(t)=ϵp(t)=\epsilon, there is at most one value =(t,ϵ)[0,1]\ell=\ell(t,\epsilon)\in[0,1] such that the resulting design pp in the form of pmaxp_{\max}^{{\dagger}} in (26) satisfies the treatment fraction constraint 𝔼p(z)=z~\mathbb{E}_{p}(z)=\widetilde{z}. When such an \ell exists for some (t,ϵ)(t,\epsilon), call the corresponding design p(t,ϵ)p^{(t,\epsilon)} (note we suppress the dependence on z~\widetilde{z}). From Appendix C we deduce that 𝔼p(t,ϵ)(xz)<𝔼p(t,ϵ)(xz)\mathbb{E}_{p^{(t,\epsilon)}}(xz)<\mathbb{E}_{p^{(t^{\prime},\epsilon^{\prime})}}(xz) and (t,ϵ)>(t,ϵ)\ell(t,\epsilon)>\ell(t^{\prime},\epsilon^{\prime}) if t>tt>t^{\prime}, or if t=tt=t^{\prime} and p(t)<p(t)p(t)<p^{\prime}(t). This shows we can efficiently find the unique (t,ϵ)(t,\epsilon) so that p(t,ϵ)p^{(t,\epsilon)} satisfies the desired short-term gain constraint 𝔼p(t,ϵ)(xz)=xz~\mathbb{E}_{p^{(t,\epsilon)}}(xz)=\widetilde{xz}. In particular we compute t=max{s{x1,,xn}𝔼p(s,1)(xz)xz~}t=\max\{s\in\{x_{1},\ldots,x_{n}\}\mid\mathbb{E}_{p^{(s,1)}}(xz)\geqslant\widetilde{xz}\} via a binary search on {x1,,xn}\{x_{1},\ldots,x_{n}\}, then solve for ϵ\epsilon to satisfy 𝔼p(t,ϵ)(xz)=xz~\mathbb{E}_{p^{(t,\epsilon)}}(xz)=\widetilde{xz}. 
Given sorted xx, this entire procedure computes pmaxp_{\max}^{{\dagger}} in O(n)O(n) operations, as for each (t,ϵ)(t,\epsilon), (t,ϵ)\ell(t,\epsilon) and 𝔼p(t,ϵ)(xz)\mathbb{E}_{p^{(t,\epsilon)}}(xz) can be computed in constant time using (5) given the partial sums {i=1mxi}m=1n\{\sum_{i=1}^{m}x_{i}\}_{m=1}^{n}.
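The procedure just described can be sketched as follows (our Python sketch; the paper's implementation is in R, and the toy Gaussian running variable, assumed to have distinct values, merely stands in for the Head Start data):

```python
# Sketch of the computation of p_max^dagger on discrete data:
# p(x) = l * 1(x < t) + eps * 1(x = t) + 1 * 1(x > t) with t a data point.
import random

random.seed(0)
n = 1000
xs = sorted(random.gauss(0.0, 1.0) for _ in range(n))
mu = sum(xs) / n
xs = [x - mu for x in xs]            # mean-center so E(x) = 0

S = [0.0]                            # partial sums S[m] = x_1 + ... + x_m
for x in xs:
    S.append(S[-1] + x)

ztil = -0.79                         # treatment fraction (1 + ztil)/2
C = n * (1 + ztil) / 2               # required total treatment mass

def design(j, eps):
    """For t = xs[j], solve l from E_p(z) = ztil; return (l, E_p(xz)) or None."""
    if j == 0:
        return None
    l = (C - eps - (n - 1 - j)) / j
    if not 0.0 <= l <= eps:
        return None                  # no valid l for this (t, eps)
    Exp = (l * S[j] + eps * xs[j] + (S[n] - S[j + 1])) / n
    return l, 2.0 * Exp              # E_p(xz) = 2 E(x p(x)) since E(x) = 0

# Smallest feasible discontinuity index; target 80% of the maximal gain.
j_min = next(j for j in range(1, n) if design(j, 1.0) is not None)
xztil = 0.8 * design(j_min, 1.0)[1]

# Binary search for t = max{ s : E_{p(s,1)}(xz) >= xztil }, using that the
# gain with eps = 1 is decreasing in t (Appendix C).
lo, hi = j_min, n - 1
while lo < hi:
    mid = (lo + hi + 1) // 2
    d = design(mid, 1.0)
    if d is not None and d[1] >= xztil:
        lo = mid
    else:
        hi = mid - 1
j = lo

# Solve for eps so that E_p(xz) = xztil; the gain is increasing in eps,
# so bisection suffices for this sketch.
a, b = 0.0, 1.0
for _ in range(60):
    eps = (a + b) / 2.0
    d = design(j, eps)
    if d is None or d[1] < xztil:
        a = eps
    else:
        b = eps
l, gain = design(j, b)
assert 0.0 <= l <= 1.0 and abs(gain - xztil) < 1e-6
```

Here the gain 𝔼p(xz) is linear in ε given t, so the inner solve could be done in closed form as in the text; bisection just keeps the sketch short.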

After computing pminp_{\min}^{{\dagger}} with a similar approach, we can apply (20) to compute an optimal design poptp_{\textnormal{opt}}^{{\dagger}}. As in the continuous case, we can alternatively obtain a solution poptp_{\textnormal{opt}}^{{\dagger}} of the form in Theorem 2 by finding \ell^{\prime}, uu^{\prime}, tt^{\prime}, and ϵ\epsilon^{\prime} such that 𝔼p(z)=z~\mathbb{E}_{p}(z)=\widetilde{z}, 𝔼p(xz)=xz~\mathbb{E}_{p}(xz)=\widetilde{xz}, and 𝔼p(x2z)=x2z~(z~,xz~;Eff)\mathbb{E}_{p}(x^{2}z)=\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\textnormal{Eff}) for p(x)𝟏(x<t)+ϵ𝟏(x=t)+u𝟏(x>t)p(x)\equiv\ell^{\prime}\bm{1}(x<t^{\prime})+\epsilon^{\prime}\bm{1}(x=t^{\prime})+u^{\prime}\bm{1}(x>t^{\prime}). Unlike the continuous case, we now have 4 unknown parameters instead of 3. We can search for an optimal set of these parameters by looping through the finite possible values of tt^{\prime} and then doing a univariate search for ϵ\epsilon^{\prime}, noting that knowledge of tt^{\prime} and ϵ\epsilon^{\prime} determines \ell^{\prime} and uu^{\prime} via the equality constraints 𝔼p(z)=z~\mathbb{E}_{p}(z)=\widetilde{z} and 𝔼p(xz)=xz~\mathbb{E}_{p}(xz)=\widetilde{xz}. We implemented this search, along with the procedure to compute pmaxp_{\max}^{{\dagger}} and pminp_{\min}^{{\dagger}} described above, in the R language (R Core Team,, 2022). The code is freely available online at https://github.com/hli90722/optimal_tiebreaker_designs.

The right panel of Figure 3 shows the inverse efficiency for the three level tie-breaker (9) versus the best two level monotone design obtained by applying the above procedure to the xix_{i} in the Head Start data. It turns out that for these xix_{i} and our choice of z~\widetilde{z}, pmaxp_{\max}^{{\dagger}} is optimal for all xz~\widetilde{xz} (and hence the unique optimal design, by Proposition 2). We note that with a normalized short term gain δ0.958\delta\approx 0.958, which corresponds to random assignment for about 150 counties in the 3-level tie-breaker, the optimal monotone two level design has inverse efficiency 0.030, compared to 0.050 for the three level tie-breaker. That is, confidence intervals for β3\beta_{3} using the three level tie-breaker would be about 29% wider than for the optimal monotone two-level design, without additional short-term gain. The sharp RDD would give 62% wider intervals than the optimal monotone two-level design with only about 4.2% additional short-term gain.

6 Summary

Our results provide a thorough characterization of the solutions to a constrained optimal experiment design problem. Considering a linear regression model for a scalar outcome involving a binary treatment assignment indicator zz, a scalar running variable xx, and their interaction, we seek to specify a randomized treatment assignment scheme based on xx — a tie-breaker design — that optimizes a statistical efficiency criterion that is an arbitrary continuous function of the expected information matrix under this regression model. We have equality constraints on the proportion of subjects receiving treatment due to an external budget, and on the covariance between xx and zz due to a preference for treating subjects with higher values of xx. Critically, our proof techniques, which deviate from those typically used to show equivalence theorems, enable an additional monotonicity constraint. This allows our results to handle the ethical or economic requirement that a subject cannot have a lower chance of receiving the treatment than another subject with a lower value of xx.

In a setting where the running variable xx is viewed as random from some distribution FF — and thus part of the randomness in the expected information matrix defining the efficiency criterion — we prove the existence of constrained optimal designs that stratify xx into a small number of intervals and assign treatment with the same probability to all individuals within each stratum. In particular, with the monotonicity constraint that is essential in our motivating applications, we only need three strata, one of which contains just a single running variable value. We also provide sufficient conditions under which the optimal tie-breaker design is unique. We emphasize the generality of our results, which apply for any continuous efficiency criterion, any running variable distribution FF (subject only to weak moment existence conditions), and the full range of feasible equality constraints. The problem an investigator faces in practice, where there are a finite number of running variable values x1,,xnx_{1},\ldots,x_{n} known (hence non-random) at the time of treatment assignment, is a special case of our more general problem where FF is discrete and takes on values x1,,xnx_{1},\ldots,x_{n} with equal probability. This enables optimal designs to be easily computed in practice, as described in Section 5.3.

We believe that this work provides a useful starting point to study optimal tie-breaker designs. For results on tie-breaker designs beyond the two line parametric regression, see Morrison and Owen, (2022) for a multivariate regression context and Kluger and Owen, (2021) for local linear regression models with a scalar running variable.

Appendix A Proof of Corollary 1

For any z~(1,1)\widetilde{z}\in(-1,1) we have xz~max(z~)=𝔼pz~(xz)=2𝔼(xpz~(x))\widetilde{xz}_{\max}(\widetilde{z})=\mathbb{E}_{p_{\widetilde{z}}}(xz)=2\mathbb{E}(xp_{\widetilde{z}}(x)) where pz~p_{\widetilde{z}} is as in Lemma 1. The desired condition M11>0M_{11}>0 is equivalent to (xz~)2<𝔼(x2)(1z~2)(\widetilde{xz})^{2}<\mathbb{E}(x^{2})(1-\widetilde{z}^{2}) and so it suffices to show 𝔼(xpz~(x))2<𝔼(x2)((1+z~)/2)((1z~)/2)\mathbb{E}(xp_{\widetilde{z}}(x))^{2}<\mathbb{E}(x^{2})((1+\widetilde{z})/2)((1-\widetilde{z})/2).

Applying Cauchy-Schwarz to xpz~(x)1/2×pz~(x)1/2xp_{\widetilde{z}}(x)^{1/2}\times p_{\widetilde{z}}(x)^{1/2} and then x(1pz~(x))1/2×(1pz~(x))1/2x(1-p_{\widetilde{z}}(x))^{1/2}\times(1-p_{\widetilde{z}}(x))^{1/2} yields the two inequalities

𝔼(xpz~(x))2\displaystyle\mathbb{E}(xp_{\widetilde{z}}(x))^{2} <𝔼(x2pz~(x))(1+z~2)\displaystyle<\mathbb{E}(x^{2}p_{\widetilde{z}}(x))\Bigl{(}\frac{1+\widetilde{z}}{2}\Bigr{)}
𝔼(xpz~(x))2=𝔼(x(1pz~(x)))2\displaystyle\mathbb{E}(xp_{\widetilde{z}}(x))^{2}=\mathbb{E}(x(1-p_{\widetilde{z}}(x)))^{2} <𝔼(x2(1pz~(x)))(1z~2)\displaystyle<\mathbb{E}(x^{2}(1-p_{\widetilde{z}}(x)))\Bigl{(}\frac{1-\widetilde{z}}{2}\Bigr{)}

where we have used the fact 𝔼(xpz~(x))=𝔼(x(1pz~(x)))\mathbb{E}(xp_{\widetilde{z}}(x))=-\mathbb{E}(x(1-p_{\widetilde{z}}(x))) since 𝔼(x)=0\mathbb{E}(x)=0. Note both inequalities are strict, since xx cannot equal a scalar multiple of pz~(x)p_{\widetilde{z}}(x) w.p.1. If it did, then 𝔼(kpz~(x)x)=k(1+z~)/2=0\mathbb{E}(kp_{\widetilde{z}}(x)-x)=k(1+\widetilde{z})/2=0 for some kk, implying k=0k=0 and hence x=0x=0 w.p.1, contradicting Var(x)>0\mathrm{Var}(x)>0. As 𝔼(x2)=𝔼(x2pz~(x))+𝔼(x2(1pz~(x)))\mathbb{E}(x^{2})=\mathbb{E}(x^{2}p_{\widetilde{z}}(x))+\mathbb{E}(x^{2}(1-p_{\widetilde{z}}(x))) we know that either 𝔼(x2pz~(x))𝔼(x2)(1z~)/2\mathbb{E}(x^{2}p_{\widetilde{z}}(x))\leqslant\mathbb{E}(x^{2})\cdot(1-\widetilde{z})/2 or 𝔼(x2(1pz~(x)))𝔼(x2)(1+z~)/2\mathbb{E}(x^{2}(1-p_{\widetilde{z}}(x)))\leqslant\mathbb{E}(x^{2})\cdot(1+\widetilde{z})/2. In the first case the first inequality above gives 𝔼(xpz~(x))2<𝔼(x2)((1+z~)/2)((1z~)/2)\mathbb{E}(xp_{\widetilde{z}}(x))^{2}<\mathbb{E}(x^{2})((1+\widetilde{z})/2)((1-\widetilde{z})/2), and in the second case the same bound follows from the second inequality.  \Box

Appendix B Proof of uniqueness in Proposition 1

We show uniqueness for pminp_{\min}. The same argument shows uniqueness for 1pmax1-p_{\max} and hence uniqueness for pmaxp_{\max}. Suppose that p(x)=δ1𝟏(x=b1)+𝟏(b1<x<b2)+δ2𝟏(x=b2)p(x)=\delta_{1}\bm{1}(x=b_{1})+\bm{1}(b_{1}<x<b_{2})+\delta_{2}\bm{1}(x=b_{2}) and p(x)=δ1𝟏(x=b1)+𝟏(b1<x<b2)+δ2𝟏(x=b2)p^{\prime}(x)=\delta^{\prime}_{1}\bm{1}(x=b^{\prime}_{1})+\bm{1}(b^{\prime}_{1}<x<b_{2}^{\prime})+\delta^{\prime}_{2}\bm{1}(x=b^{\prime}_{2}) are both solutions for pminp_{\min}. By symmetry we can assume that either b1<b1b_{1}<b_{1}^{\prime}, or both b1=b1b_{1}=b_{1}^{\prime} and δ1δ1\delta_{1}\leqslant\delta_{1}^{\prime}. Since pp and pp^{\prime} are feasible for (19), we must have 𝔼(p(x)p(x))=0\mathbb{E}(p(x)-p^{\prime}(x))=0 and 𝔼(x(p(x)p(x)))=0\mathbb{E}(x(p(x)-p^{\prime}(x)))=0, in view of (5). We show that p(x)=p(x)p(x)=p^{\prime}(x) w.p.1. under xFx\sim F. Note that we can assume without loss of generality that Pr(x[b1,b1+ϵ))>0\Pr(x\in[b_{1},b_{1}+\epsilon))>0 for any ϵ>0\epsilon>0, because otherwise, we could increase b1b_{1} to b1+sup{ϵ>0Pr(x[b1,b1+ϵ))=0}b_{1}+\sup\{\epsilon>0\mid\Pr(x\in[b_{1},b_{1}+\epsilon))=0\} without changing pp on a set of positive probability. We can similarly assume that Pr(x(b2ϵ,b2])>0\Pr(x\in(b_{2}-\epsilon,b_{2}])>0 for any ϵ>0\epsilon>0. Finally, we impose these two canonicalizing conditions on b1b_{1}^{\prime} and b2b_{2}^{\prime} as well.

Assume first that b2>b1b_{2}>b_{1}. Then we cannot have b1>b1b_{1}^{\prime}>b_{1} because we would then need either b2>b2b_{2}^{\prime}>b_{2} or b2=b2b_{2}^{\prime}=b_{2} with δ2>δ2\delta_{2}^{\prime}>\delta_{2} and Pr(x=b2)>0\Pr(x=b_{2})>0 to enforce 𝔼(p(x)p(x))=0\mathbb{E}(p(x)-p^{\prime}(x))=0 and this would cause 𝔼(x(p(x)p(x)))<0\mathbb{E}(x(p(x)-p^{\prime}(x)))<0. We similarly cannot have b1=b1b_{1}^{\prime}=b_{1} with both δ1>δ1\delta_{1}^{\prime}>\delta_{1} and Pr(x=b1)>0\Pr(x=b_{1})>0. Therefore after canonicalizing, we know that both pp and pp^{\prime} are equivalent to designs of the form given with b1=b1b_{1}=b_{1}^{\prime} and δ1=δ1\delta_{1}=\delta_{1}^{\prime} along with the analogous conditions b2=b2b_{2}=b_{2}^{\prime} and δ2=δ2\delta_{2}=\delta_{2}^{\prime}. Then our canonicalized pp and pp^{\prime} satisfy p(x)=p(x)p(x)=p^{\prime}(x) for all xx and so in particular Pr(p(x)=p(x))=1\Pr(p(x)=p^{\prime}(x))=1.

It remains to handle the case where b1=b2b_{1}=b_{2}. We then have Pr(x=b1)>0\Pr(x=b_{1})>0 since z~>1\widetilde{z}>-1. If b1>b1b_{1}^{\prime}>b_{1} then the support of pp^{\prime} is completely to the right of that of pp which violates 𝔼(x(p(x)p(x)))=0\mathbb{E}(x(p(x)-p^{\prime}(x)))=0. We can similarly rule out b2<b2b_{2}^{\prime}<b_{2}. As a result pp^{\prime} must have b1b1=b2b2b_{1}^{\prime}\leqslant b_{1}=b_{2}\leqslant b_{2}^{\prime}. Then we must have δ1Pr(x=b1)+Pr(b1<x<b1)=0\delta_{1}^{\prime}\Pr(x=b_{1}^{\prime})+\Pr(b_{1}^{\prime}<x<b_{1})=0 or else 𝔼(p(x)p(x))>0\mathbb{E}(p^{\prime}(x)-p(x))>0. For the same reason, we must have Pr(b2<x<b2)+δ2Pr(x=b2)=0\Pr(b_{2}<x<b_{2}^{\prime})+\delta_{2}^{\prime}\Pr(x=b_{2}^{\prime})=0. It then follows that both pp and pp^{\prime} have support {b1}\{b_{1}\} and then 𝔼(p(x))=𝔼(p(x))=(1+z~)/2\mathbb{E}(p(x))=\mathbb{E}(p^{\prime}(x))=(1+\widetilde{z})/2 forces (1+z~)/(2Pr(x=b1))=δ1=p(b1)=p(b1)=δ1(1+\widetilde{z})/(2\Pr(x=b_{1}))=\delta_{1}=p(b_{1})=p^{\prime}(b_{1})=\delta_{1}^{\prime}, so Pr(p(x)=p(x))=1\Pr(p(x)=p^{\prime}(x))=1.

Appendix C Proof of uniqueness in Proposition 2

We focus on pmaxp_{\max}^{{\dagger}} and consider two monotone designs pp and pp^{\prime} satisfying the feasibility constraints 𝔼p(z)=𝔼p(z)=z~\mathbb{E}_{p}(z)=\mathbb{E}_{p^{\prime}}(z)=\widetilde{z} and 𝔼p(xz)=𝔼p(xz)=xz~\mathbb{E}_{p}(xz)=\mathbb{E}_{p^{\prime}}(xz)=\widetilde{xz} along with the characterization of pmaxp_{\max}^{{\dagger}} in (26). Then p(x)=𝟏(x<t)+δ𝟏(x=t)+𝟏(x>t)p(x)=\ell\bm{1}(x<t)+\delta\bm{1}(x=t)+\bm{1}(x>t) and p(x)=𝟏(x<t)+δ𝟏(x=t)+𝟏(x>t)p^{\prime}(x)=\ell^{\prime}\bm{1}(x<t^{\prime})+\delta^{\prime}\bm{1}(x=t^{\prime})+\bm{1}(x>t^{\prime}) for some ,(0,1)\ell,\ell^{\prime}\in(0,1) with δ1\ell\leqslant\delta\leqslant 1 and δ1\ell^{\prime}\leqslant\delta^{\prime}\leqslant 1. Note that the cases {0,1}\ell\in\{0,1\} (and the same for \ell^{\prime}) are excluded by the assumptions that xz~<xz~max(z~)\widetilde{xz}<\widetilde{xz}_{\max}(\widetilde{z}) and z~<1\widetilde{z}<1. Moreover, z~<1\widetilde{z}<1 guarantees min(Pr(xt),Pr(xt))>0\min(\Pr(x\leqslant t),\Pr(x\leqslant t^{\prime}))>0. Finally, note that it suffices to show p(x)=p(x)p(x)=p^{\prime}(x) for almost all xtx\neq t, since then 𝔼(p(x)p(x))=0\mathbb{E}(p(x)-p^{\prime}(x))=0 ensures either p(t)=p(t)p(t)=p^{\prime}(t) or Pr(x=t)=0\Pr(x=t)=0; in either case this gives p=pp=p^{\prime} w.p.1. By symmetry we can assume that ttt\leqslant t^{\prime} with δp(t)p(t)=:δ\delta\equiv p(t)\geqslant p(t^{\prime})=:\delta^{\prime} if t=tt=t^{\prime}. Then p(x)=p(x)p(x)=p^{\prime}(x) for all x>tx>t^{\prime}.

Now we compute

𝔼(x(p(x)p(x)))\displaystyle\mathbb{E}(x(p(x)-p^{\prime}(x))) =𝔼((xt)(p(x)p(x))) since 𝔼(p(x))=𝔼(p(x))\displaystyle=\mathbb{E}((x-t)(p(x)-p^{\prime}(x)))\text{ since $\mathbb{E}(p(x))=\mathbb{E}(p^{\prime}(x))$}
=𝔼((tx)(p(x)p(x))𝟏(x<t))+𝔼((xt)(p(x)p(x))𝟏(t<xt))\displaystyle=\mathbb{E}((t-x)(p^{\prime}(x)-p(x))\bm{1}(x<t))+\mathbb{E}((x-t)(p(x)-p^{\prime}(x))\bm{1}(t<x\leqslant t^{\prime}))
=()𝔼((tx)𝟏(x<t))+𝔼((xt)(1p(x))𝟏(t<xt)).\displaystyle=(\ell^{\prime}-\ell)\mathbb{E}((t-x)\bm{1}(x<t))+\mathbb{E}((x-t)(1-p^{\prime}(x))\bm{1}(t<x\leqslant t^{\prime})).

If t=tt=t^{\prime}, then the right-hand side reduces to just ()𝔼((tx)𝟏(x<t))(\ell^{\prime}-\ell)\mathbb{E}((t-x)\bm{1}(x<t)). This is nonzero unless Pr(x<t)=0\Pr(x<t)=0 or =\ell=\ell^{\prime}. In both cases p(x)=p(x)p(x)=p^{\prime}(x) for almost all xtx\neq t.

If t<tt<t^{\prime}, then we can assume Pr(t<xt)>0\Pr(t<x\leqslant t^{\prime})>0 (otherwise the problem reduces to the case t=tt=t^{\prime}). First suppose \ell\geqslant\ell^{\prime}. Then p(x)p(x)p(x)\geqslant p^{\prime}(x) for all xx and so the treatment fraction constraint would require the identity

𝟏(t<xt)=p(x)𝟏(t<xt)p(x)𝟏(t<xt)=δ𝟏(x=t)+𝟏(t<x<t)\bm{1}(t<x\leqslant t^{\prime})=p(x)\bm{1}(t<x\leqslant t^{\prime})\geqslant p^{\prime}(x)\bm{1}(t<x\leqslant t^{\prime})=\delta^{\prime}\bm{1}(x=t^{\prime})+\ell^{\prime}\bm{1}(t<x<t^{\prime})

to hold with equality w.p.1. But since <1\ell^{\prime}<1, equality w.p.1. can only occur if Pr(t<x<t)=0\Pr(t<x<t^{\prime})=0 and δ=1\delta^{\prime}=1. In that case we immediately see p(x)=p(x)p(x)=p^{\prime}(x) for almost all x>tx>t, but 0=𝔼(x(p(x)p(x)))=()𝔼((tx)𝟏(x<t))0=\mathbb{E}(x(p(x)-p^{\prime}(x)))=(\ell^{\prime}-\ell)\mathbb{E}((t-x)\bm{1}(x<t)) so p(x)=p(x)p(x)=p^{\prime}(x) for almost all x<tx<t as well. Conversely, if we suppose <\ell<\ell^{\prime}, then 𝔼(x(p(x)p(x)))=0\mathbb{E}(x(p(x)-p^{\prime}(x)))=0 requires Pr(x<t)=0\Pr(x<t)=0 and p(x)=1p^{\prime}(x)=1 for almost all x(t,t]x\in(t,t^{\prime}], so once again p(x)=p(x)p(x)=p^{\prime}(x) for almost all xtx\neq t.

Appendix D Proof of Theorem 3

For any feasible design pp, we have

det(M)=(1z~2)(𝔼p(x2z))2𝔼(x2)2z~(xz~)2𝔼p(x2z)𝔼(x2)+𝔼(x2)(1z~2)+(xz~)2((xz~)2𝔼(x2)2).\displaystyle\begin{split}\det(M)&=-\frac{(1-\widetilde{z}^{2})\cdot\bigl{(}\mathbb{E}_{p}(x^{2}z)\bigr{)}^{2}}{\mathbb{E}(x^{2})}-\frac{2\widetilde{z}(\widetilde{xz})^{2}\cdot\mathbb{E}_{p}(x^{2}z)}{\mathbb{E}(x^{2})}+\mathbb{E}(x^{2})\bigl{(}1-\widetilde{z}^{2}\bigr{)}+(\widetilde{xz})^{2}\biggl{(}\frac{(\widetilde{xz})^{2}}{\mathbb{E}(x^{2})}-2\biggr{)}.\end{split} (29)

where M=M(p)M=M(p) as in (2.3). Thus det(M(p))\det(M(p)) is a concave quadratic function of 𝔼p(x2z)\mathbb{E}_{p}(x^{2}z) globally maximized at

a(z~,xz~)z~(xz~)21z~2a^{*}(\widetilde{z},\widetilde{xz})\equiv-\frac{\widetilde{z}\cdot(\widetilde{xz})^{2}}{1-\widetilde{z}^{2}} (30)

It follows that x2z~(z~,xz~;Eff)\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\textnormal{Eff}) is the point in I(z~,xz~)=[I;min(z~,xz~),I;max(z~,xz~)]I_{\mathcal{F}}(\widetilde{z},\widetilde{xz})=[I_{\mathcal{F};\min}(\widetilde{z},\widetilde{xz}),I_{\mathcal{F};\max}(\widetilde{z},\widetilde{xz})] closest to a(z~,xz~)a^{*}(\widetilde{z},\widetilde{xz}), i.e.

x2z~(z~,xz~;Eff)={I;min(z~,xz~)a(z~,xz~)I;min(z~,xz~)a(z~,xz~)I;min(z~,xz~)<a(z~,xz~)<I;max(z~,xz~)I;max(z~,xz~)a(z~,xz~)I;max(z~,xz~)\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\textnormal{Eff})=\begin{cases}I_{\mathcal{F};\min}(\widetilde{z},\widetilde{xz})&a^{*}(\widetilde{z},\widetilde{xz})\leqslant I_{\mathcal{F};\min}(\widetilde{z},\widetilde{xz})\\ a^{*}(\widetilde{z},\widetilde{xz})&I_{\mathcal{F};\min}(\widetilde{z},\widetilde{xz})<a^{*}(\widetilde{z},\widetilde{xz})<I_{\mathcal{F};\max}(\widetilde{z},\widetilde{xz})\\ I_{\mathcal{F};\max}(\widetilde{z},\widetilde{xz})&a^{*}(\widetilde{z},\widetilde{xz})\geqslant I_{\mathcal{F};\max}(\widetilde{z},\widetilde{xz})\end{cases} (31)

The above holds for any choice of \mathcal{F}; for the remainder of this proof we take \mathcal{F} to be the set of all measurable design functions.
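Formula (31) is the familiar fact that a concave quadratic restricted to an interval is maximized at the projection of its unconstrained maximizer onto that interval. A brief numerical sanity check of this reduction (the coefficients and intervals below are arbitrary illustrative values, not tied to any particular F):

```python
# Maximizing the concave quadratic q(u) = -A*u**2 + B*u + C over [lo, hi]
# reduces to clipping the unconstrained maximizer u* = B/(2A) into the
# interval, exactly as in (31).  A, B, C and the intervals are arbitrary.

def q(u, A, B, C):
    return -A * u**2 + B * u + C

def clip_argmax(A, B, lo, hi):
    """Argmax of u -> -A*u**2 + B*u + C over [lo, hi], assuming A > 0."""
    u_star = B / (2.0 * A)            # unconstrained maximizer
    return min(max(u_star, lo), hi)   # projection onto [lo, hi]

A, B, C = 2.0, 3.0, 1.0               # concave since A > 0; u* = 0.75
for lo, hi in [(-1.0, 0.5), (1.0, 2.0), (-2.0, 2.0)]:
    u_clip = clip_argmax(A, B, lo, hi)
    grid = [lo + i * (hi - lo) / 10_000 for i in range(10_001)]
    u_grid = max(grid, key=lambda u: q(u, A, B, C))   # brute-force argmax
    assert abs(u_clip - u_grid) < 1e-3, (lo, hi)
```

The three intervals exercise all three branches of (31): clipping at the upper endpoint, clipping at the lower endpoint, and the interior case.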

We first prove the theorem in the case z~=0\widetilde{z}=0. Note that 𝔼p(x2z)=0=a(0,xz~)\mathbb{E}_{p}(x^{2}z)=0=a^{*}(0,\widetilde{xz}) for any symmetric design pp, by symmetry of the running variable distribution. By continuity, for any xz~[0,xz~max(z~)]\widetilde{xz}\in[0,\widetilde{xz}_{\max}(\widetilde{z})] there exists Δ[0,]\Delta\in[0,\infty] such that the three level tie-breaker p3;z~,Δp_{3;\widetilde{z},\Delta} (which is symmetric and always satisfies 𝔼p3;z~,Δ(z)=0\mathbb{E}_{p_{3;\widetilde{z},\Delta}}(z)=0) satisfies 𝔼p3;z~,Δ(xz)=xz~\mathbb{E}_{p_{3;\widetilde{z},\Delta}}(xz)=\widetilde{xz} too. This shows that for all xz~[0,xz~max(z~)]\widetilde{xz}\in[0,\widetilde{xz}_{\max}(\widetilde{z})], 0I(z~,xz~)0\in I_{\mathcal{F}}(\widetilde{z},\widetilde{xz}) and hence x2z~(0,xz~;Eff)=0\widetilde{x^{2}z}^{*}(0,\widetilde{xz};\textnormal{Eff})=0, meaning any feasible design pp with 𝔼p(x2z)=0\mathbb{E}_{p}(x^{2}z)=0 is optimal. Then by (29)

det(M(popt;0,xz~))=𝔼(x2)+(xz~)2((xz~)2𝔼(x2)2)\det(M(p_{\textnormal{opt};0,\widetilde{xz}}))=\mathbb{E}(x^{2})+(\widetilde{xz})^{2}\biggl{(}\frac{(\widetilde{xz})^{2}}{\mathbb{E}(x^{2})}-2\biggr{)}

which is decreasing in xz~\widetilde{xz} on [0,xz~max(z~)][0,\widetilde{xz}_{\max}(\widetilde{z})], showing the theorem for z~=0\widetilde{z}=0.
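As a concrete check of this monotonicity, fix E(x^2) = 1 purely for illustration; by Cauchy–Schwarz the feasible xz~ values then lie in [0, 1], and the display above simplifies to (1 - xz~^2)^2:

```python
# det(M) at z~ = 0 as a function of c = xz~, fixing E(x^2) = 1 purely for
# illustration; the display above then simplifies to (1 - c^2)^2, which is
# strictly decreasing on [0, 1].  Cauchy-Schwarz gives xz~ <= sqrt(E(x^2)) = 1.

def det_M(c, ex2=1.0):
    return ex2 + c**2 * (c**2 / ex2 - 2.0)

vals = [det_M(i / 100.0) for i in range(101)]      # grid on [0, 1]
assert all(a > b for a, b in zip(vals, vals[1:]))  # strictly decreasing
assert abs(det_M(0.5) - (1.0 - 0.25)**2) < 1e-12   # matches (1 - c^2)^2
```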

For the cases z~<0\widetilde{z}<0 and z~>0\widetilde{z}>0, we begin with the following two claims.

Claim 1.

For any (z~,xz~)𝒥(\widetilde{z},\widetilde{xz})\in\mathcal{J} with z~<0\widetilde{z}<0, we have I;min(z~,xz~)0<a(z~,xz~)I_{\mathcal{F};\min}(\widetilde{z},\widetilde{xz})\leqslant 0<a^{*}(\widetilde{z},\widetilde{xz}).

Claim 2.

For any (z~,xz~)𝒥(\widetilde{z},\widetilde{xz})\in\mathcal{J} with z~>0\widetilde{z}>0, we have I;max(z~,xz~)0>a(z~,xz~)I_{\mathcal{F};\max}(\widetilde{z},\widetilde{xz})\geqslant 0>a^{*}(\widetilde{z},\widetilde{xz}).

Proof of Claims 1 and 2.

We write I;min(z~,xz~)=2𝔼(x2pmin;z~,xz~(x))𝔼(x2)I_{\mathcal{F};\min}(\widetilde{z},\widetilde{xz})=2\mathbb{E}(x^{2}p_{\min;\widetilde{z},\widetilde{xz}}(x))-\mathbb{E}(x^{2}) and similarly rewrite I;max(z~,xz~)I_{\mathcal{F};\max}(\widetilde{z},\widetilde{xz}). For claim 1, we proceed by writing pmin=p[b1,b2]p_{\min}=p_{[b_{1},b_{2}]} by Proposition 1 (suppressing the dependence of b1b_{1} and b2b_{2} on (z~,xz~)(\widetilde{z},\widetilde{xz}) in our notation) and performing casework on the signs of b1b_{1} and b2b_{2} to show that 𝔼(x2pmin(x))𝔼(x2)/2\mathbb{E}(x^{2}p_{\min}(x))\leqslant\mathbb{E}(x^{2})/2 in each case. In the case b1b20b_{1}\leqslant b_{2}\leqslant 0 we have 𝔼(x2pmin(x))𝔼(x2𝟏(x0))=𝔼(x2)/2\mathbb{E}(x^{2}p_{\min}(x))\leqslant\mathbb{E}(x^{2}\bm{1}(x\leqslant 0))=\mathbb{E}(x^{2})/2 by symmetry; similarly if b2b10b_{2}\geqslant b_{1}\geqslant 0 then 𝔼(x2pmin(x))𝔼(x2𝟏(x0))=𝔼(x2)/2\mathbb{E}(x^{2}p_{\min}(x))\leqslant\mathbb{E}(x^{2}\bm{1}(x\geqslant 0))=\mathbb{E}(x^{2})/2. Next, if b10b1b2b_{1}\leqslant 0\leqslant-b_{1}\leqslant b_{2} then F(b2)+F(0)F(b1)=Pr(b1<xb2)+1/2=𝔼(pmin(x))+1/2<1F(b_{2})+F(0)-F(b_{1})=\Pr(b_{1}<x\leqslant b_{2})+1/2=\mathbb{E}(p_{\min}(x))+1/2<1 since z~<0\widetilde{z}<0 implies 𝔼(p(x))<1/2\mathbb{E}(p(x))<1/2 by (5). Therefore

𝔼(x2pmin(x))\displaystyle\mathbb{E}(x^{2}p_{\min}(x)) =𝔼(x2𝟏(b1x0))+𝔼(x2𝟏(0xb2))\displaystyle=\mathbb{E}(x^{2}\bm{1}(b_{1}\leqslant x\leqslant 0))+\mathbb{E}(x^{2}\bm{1}(0\leqslant x\leqslant b_{2}))
b22Pr(b1x0)+𝔼(x2𝟏(0xb2))\displaystyle\leqslant b_{2}^{2}\Pr(b_{1}\leqslant x\leqslant 0)+\mathbb{E}(x^{2}\bm{1}(0\leqslant x\leqslant b_{2}))
=b22Pr(b2xF1(F(b2)+F(0)F(b1)))+𝔼(x2𝟏(0xb2))\displaystyle=b_{2}^{2}\Pr(b_{2}\leqslant x\leqslant F^{-1}(F(b_{2})+F(0)-F(b_{1})))+\mathbb{E}(x^{2}\bm{1}(0\leqslant x\leqslant b_{2}))
𝔼(x2𝟏(0xF1(F(b2)+F(0)F(b1))))𝔼(x2)2\displaystyle\leqslant\mathbb{E}(x^{2}\bm{1}(0\leqslant x\leqslant F^{-1}(F(b_{2})+F(0)-F(b_{1}))))\leqslant\frac{\mathbb{E}(x^{2})}{2}

where the final inequality uses symmetry of FF again. The final case b10b2b1b_{1}\leqslant 0\leqslant b_{2}\leqslant-b_{1} follows by a symmetric argument. The proof of Claim 2 is completely analogous, with pmax=p[a1,a2]cp_{\max}=p_{[a_{1},a_{2}]^{c}} by Proposition 1. ∎
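Claim 1 can also be spot-checked in closed form. If F is uniform on [-1, 1] (an assumption made purely for this sketch), the two constraints defining pmin = p_{[b1,b2]} reduce to b2 - b1 = 1 + z~ and b1 + b2 = 2 xz~/(1 + z~), and the resulting I_min is indeed nonpositive whenever z~ < 0:

```python
# Spot check of Claim 1 with F uniform on [-1, 1] (purely illustrative).
# The constraints F(b2) - F(b1) = (1+z)/2 and E(x 1(b1<x<b2)) = xz/2 give
# b2 - b1 = 1 + z and b1 + b2 = 2*xz/(1+z) in closed form, and Claim 1
# asserts I_min = 2 E(x^2 p_min(x)) - E(x^2) <= 0 whenever z < 0.

def I_min_uniform(z, xz):
    w = 1.0 + z                     # b2 - b1
    s = 2.0 * xz / (1.0 + z)        # b1 + b2
    b1, b2 = (s - w) / 2.0, (s + w) / 2.0
    assert -1.0 <= b1 <= b2 <= 1.0, "infeasible (z, xz) pair"
    # E(x^2 1(b1 < x < b2)) = (b2^3 - b1^3)/6 under Uniform[-1, 1]
    return (b2**3 - b1**3) / 3.0 - 1.0 / 3.0

for z in (-0.9, -0.5, -0.1):
    xz_max = (1.0 - z**2) / 2.0     # the bound xz~_max(z~) for this F
    for i in range(10):
        assert I_min_uniform(z, 0.9 * xz_max * i / 9.0) <= 1e-12
```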

We now proceed to prove the theorem. Given Claim 1, we have x2z~(z~,xz~;Eff)=min(I;max(z~,xz~),a(z~,xz~))\widetilde{x^{2}z}^{*}(\widetilde{z},\widetilde{xz};\textnormal{Eff})=\min(I_{\mathcal{F};\max}(\widetilde{z},\widetilde{xz}),a^{*}(\widetilde{z},\widetilde{xz})) by (31), and hence suppressing some z~\widetilde{z} dependences

h(xz~)det(M(popt;z~,xz~))={h(xz~),g(xz~)a(z~,xz~)det(M(pmax;z~,xz~)),g(xz~)a(z~,xz~)h(\widetilde{xz})\equiv\det(M(p_{\textnormal{opt};\widetilde{z},\widetilde{xz}}))=\begin{cases}h^{*}(\widetilde{xz}),&g(\widetilde{xz})\geqslant a^{*}(\widetilde{z},\widetilde{xz})\\ \det(M(p_{\max;\widetilde{z},\widetilde{xz}})),&g(\widetilde{xz})\leqslant a^{*}(\widetilde{z},\widetilde{xz})\end{cases}

where g(xz~)I;max(z~,xz~)g(\widetilde{xz})\equiv I_{\mathcal{F};\max}(\widetilde{z},\widetilde{xz}) and h(xz~)h^{*}(\widetilde{xz}) is defined by substituting 𝔼p(x2z)=a(z~,xz~)\mathbb{E}_{p}(x^{2}z)=a^{*}(\widetilde{z},\widetilde{xz}) into (29). We must show that h(xz~)h(\widetilde{xz}) is decreasing on xz~>0\widetilde{xz}>0.

First, we compute h(xz~)=(𝔼(x2))2M112/(𝔼(x2)(1z~2))h^{*}(\widetilde{xz})=(\mathbb{E}(x^{2}))^{2}M_{11}^{2}/(\mathbb{E}(x^{2})(1-\widetilde{z}^{2})) and note it is decreasing in xz~\widetilde{xz} since M11M_{11} is positive (Corollary 1) and decreasing in xz~\widetilde{xz} on [0,xz~max(z~)][0,\widetilde{xz}_{\max}(\widetilde{z})]. Next, we show det(M(pmax;z~,xz~))\det(M(p_{\max;\widetilde{z},\widetilde{xz}})) is decreasing in xz~\widetilde{xz}. Note that (a1,a2)(a_{1},a_{2}) is the unique solution to the system

F(a1)+1F(a2)\displaystyle F(a_{1})+1-F(a_{2}) =(1+z~)/2\displaystyle=(1+\widetilde{z})/2
𝔼(x(𝟏(x<a1)+𝟏(x>a2)))\displaystyle\mathbb{E}(x(\bm{1}(x<a_{1})+\bm{1}(x>a_{2}))) =xz~/2\displaystyle=\widetilde{xz}/2

By the implicit function theorem (e.g., De Oliveira, (2018), since we do not require continuity of ff), it follows that a1=a1(xz~)a_{1}=a_{1}(\widetilde{xz}) and a2=a2(xz~)a_{2}=a_{2}(\widetilde{xz}) are differentiable and satisfy

f(a1)a1(xz~)f(a2)a2(xz~)\displaystyle f(a_{1})a_{1}^{\prime}(\widetilde{xz})-f(a_{2})a_{2}^{\prime}(\widetilde{xz}) =0,and\displaystyle=0,\quad\text{and} (32)
a1f(a1)a1(xz~)a2f(a2)a2(xz~)\displaystyle a_{1}f(a_{1})a_{1}^{\prime}(\widetilde{xz})-a_{2}f(a_{2})a_{2}^{\prime}(\widetilde{xz}) =1/2.\displaystyle=1/2. (33)

Equations (32) and (33) imply that

g(xz~)=xz~𝔼p[a1,a2]c(x2z)=2a12f(a1)a1(xz~)2a22f(a2)a2(xz~)=a1+a2<0.g^{\prime}(\widetilde{xz})=\frac{\partial}{\partial\widetilde{xz}}\mathbb{E}_{p_{[a_{1},a_{2}]^{c}}}(x^{2}z)=2a_{1}^{2}f(a_{1})a_{1}^{\prime}(\widetilde{xz})-2a_{2}^{2}f(a_{2})a_{2}^{\prime}(\widetilde{xz})=a_{1}+a_{2}<0.

The inequality follows by the assumption z~<0\widetilde{z}<0 which ensures a1a_{1} and a2a_{2} must have different signs, and then noting that 𝔼pmax(xz)>0\mathbb{E}_{p_{\max}}(xz)>0 requires 𝔼(xpmax(x)𝟏(x>0))>𝔼(xpmax(x)𝟏(x<0))=𝔼(xpmax(x)𝟏(x>0))\mathbb{E}(xp_{\max}(x)\bm{1}(x>0))>-\mathbb{E}(xp_{\max}(x)\bm{1}(x<0))=\mathbb{E}(xp_{\max}(-x)\bm{1}(x>0)), the equality following by symmetry of FF. Thus, for all xz~>0\widetilde{xz}>0 such that g(xz~)a(z~,xz~)=z~(xz~)2/(1z~2)g(\widetilde{xz})\leqslant a^{*}(\widetilde{z},\widetilde{xz})=-\widetilde{z}(\widetilde{xz})^{2}/(1-\widetilde{z}^{2}) we have

𝔼(x2)det(M(pmax))xz~\displaystyle\frac{\mathbb{E}(x^{2})\partial\det(M(p_{\max}))}{\partial\widetilde{xz}} =g(xz~)(2(1z~2)g(xz~)4(xz~z~))\displaystyle=g(\widetilde{xz})\left(-2(1-\widetilde{z}^{2})g^{\prime}(\widetilde{xz})-4(\widetilde{xz}\cdot\widetilde{z})\right)
2(xz~)2z~g(xz~)+4(xz~)((xz~)2𝔼(x2))\displaystyle\phantom{=}\,-2(\widetilde{xz})^{2}\cdot\widetilde{z}g^{\prime}(\widetilde{xz})+4(\widetilde{xz})((\widetilde{xz})^{2}-\mathbb{E}(x^{2}))
4(xz~)3(z~)21z~2+4(xz~)((xz~)2𝔼(x2))=4M11(xz~)𝔼(x2)1z~2\displaystyle\leqslant\frac{4(\widetilde{xz})^{3}(\widetilde{z})^{2}}{1-\widetilde{z}^{2}}+4(\widetilde{xz})((\widetilde{xz})^{2}-\mathbb{E}(x^{2}))=-\frac{4M_{11}\cdot(\widetilde{xz})\mathbb{E}(x^{2})}{1-\widetilde{z}^{2}}

The RHS is negative (Corollary 1), so det(M(pmax;z~,xz~))\det(M(p_{\max;\widetilde{z},\widetilde{xz}})) is in fact decreasing in xz~>0\widetilde{xz}>0.
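The identity g'(xz~) = a1 + a2 derived above can be confirmed by finite differences. With F uniform on [-1, 1] (again purely illustrative) the system defining (a1, a2) is linear, so everything is available in closed form:

```python
# Finite-difference check of g'(xz) = a1 + a2, with F uniform on [-1, 1]
# (purely illustrative).  The defining system F(a1) + 1 - F(a2) = (1+z)/2
# and E(x(1(x<a1) + 1(x>a2))) = xz/2 linearizes to
#     a2 - a1 = 1 - z,    a1 + a2 = -2*xz/(1 - z).

def a1a2(z, xz):
    s = -2.0 * xz / (1.0 - z)       # a1 + a2
    w = 1.0 - z                     # a2 - a1
    return (s - w) / 2.0, (s + w) / 2.0

def g(z, xz):
    # g = 2 E(x^2 p_max(x)) - E(x^2), with E(x^2) = 1/3 for Uniform[-1, 1]
    a1, a2 = a1a2(z, xz)
    return (a1**3 - a2**3 + 1.0) / 3.0

z, xz, h = -0.4, 0.2, 1e-6
fd = (g(z, xz + h) - g(z, xz - h)) / (2.0 * h)   # central difference
a1, a2 = a1a2(z, xz)
assert abs(fd - (a1 + a2)) < 1e-6                # matches a1 + a2
assert a1 + a2 < 0                               # negative, as claimed for z < 0
```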

Finally, we fix 0x1<x2xz~max(z~)0\leqslant x_{1}<x_{2}\leqslant\widetilde{xz}_{\max}(\widetilde{z}) and show h(x1)>h(x2)h(x_{1})>h(x_{2}). Note g¯(xz~)g(xz~)a(z~,xz~)\overline{g}(\widetilde{xz})\equiv g(\widetilde{xz})-a^{*}(\widetilde{z},\widetilde{xz}) is continuous in xz~\widetilde{xz}, and h(xz~)det(M(pmax;z~,xz~))h^{*}(\widetilde{xz})\geqslant\det(M(p_{\max;\widetilde{z},\widetilde{xz}})). We now carry out casework on the signs of g¯(x1)\bar{g}(x_{1}) and g¯(x2)\bar{g}(x_{2}).
g¯(x1)0\overline{g}(x_{1})\geqslant 0: In this case

h(x1)=h(x1)>h(x2)h(x2).h(x_{1})=h^{*}(x_{1})>h^{*}(x_{2})\geqslant h(x_{2}). (34)

g¯(x1)<0\overline{g}(x_{1})<0 and g¯(x2)0\overline{g}(x_{2})\geqslant 0: Define S={x[x1,x2]g¯(x)0}S=\{x\in[x_{1},x_{2}]\mid\overline{g}(x)\geqslant 0\}, which contains x2x_{2}. Letting x3=infS>x1x_{3}=\inf S>x_{1}, we have g¯(x3)=0\overline{g}(x_{3})=0 and g¯(x)0\overline{g}(x)\leqslant 0 for x[x1,x3]x\in[x_{1},x_{3}], so

h(x1)=det(M(pmax;z~,x1))>det(M(pmax;z~,x3))=h(x3)h(x2)=h(x2)h(x_{1})=\det(M(p_{\max;\widetilde{z},x_{1}}))>\det(M(p_{\max;\widetilde{z},x_{3}}))=h^{*}(x_{3})\geqslant h^{*}(x_{2})=h(x_{2}) (35)

g¯(x1)<0\overline{g}(x_{1})<0 and g¯(x2)<0\overline{g}(x_{2})<0: In this case either g¯(x)0\overline{g}(x)\leqslant 0 on [x1,x2][x_{1},x_{2}] (so h(x1)=det(M(pmax;z~,x1))>det(M(pmax;z~,x2))=h(x2)h(x_{1})=\det(M(p_{\max;\widetilde{z},x_{1}}))>\det(M(p_{\max;\widetilde{z},x_{2}}))=h(x_{2})), or SS as defined in the previous case is non-empty with x3=inf(S)x_{3}=\inf(S) and x4=sup(S)x_{4}=\sup(S) satisfying x1<x3x4<x2x_{1}<x_{3}\leqslant x_{4}<x_{2} and g¯(x3)=g¯(x4)=0\overline{g}(x_{3})=\overline{g}(x_{4})=0. Then

h(x1)>(35)h(x3)(34)h(x4)>(35)h(x2)h(x_{1})\stackrel{{\scriptstyle(\ref{eq:h_case2})}}{{>}}h(x_{3})\stackrel{{\scriptstyle(\ref{eq:h_case1})}}{{\geqslant}}h(x_{4})\stackrel{{\scriptstyle(\ref{eq:h_case2})}}{{>}}h(x_{2})

which shows the theorem when z~<0\widetilde{z}<0. The proof of the case z~>0\widetilde{z}>0 is completely symmetric, and relies on Claim 2.

Appendix E Proof of Theorem 4

First we fix z~<0\widetilde{z}<0. It suffices to show that assuming 𝔼(x2)<F1(1)2\mathbb{E}(x^{2})<F^{-1}(1)^{2}, there exists δ>0\delta>0 such that (/xz~)det(M(popt;z~,xz~))>0(\partial/\partial\widetilde{xz})\det\bigl{(}M(p_{\textnormal{opt};\widetilde{z},\widetilde{xz}}^{{\dagger}})\bigr{)}>0 whenever xz~(0,δ)\widetilde{xz}\in(0,\delta), and that det(M(popt;z~,xz~))\det(M(p_{\textnormal{opt};\widetilde{z},\widetilde{xz}}^{{\dagger}})) is continuous in xz~\widetilde{xz} at xz~=0\widetilde{xz}=0.

From the assumed continuity of FF and Proposition 2, we have pmax(x)=p,1,t(x)p_{\max}^{\dagger}(x)=p_{\ell,1,t}(x), with z~>1\widetilde{z}>-1 ensuring F(t)>0F(t)>0. Again, we suppress the dependence of \ell and tt on (z~,xz~)(\widetilde{z},\widetilde{xz}) in our notation for brevity. By the treatment fraction constraint 𝔼p,1,t(z)=z~\mathbb{E}_{p_{\ell,1,t}}(z)=\widetilde{z}, we must have =1(1z~)/(2F(t))\ell=1-(1-\widetilde{z})/(2F(t)). From the short-term gain constraint 𝔼p,1,t(xz)=xz~\mathbb{E}_{p_{\ell,1,t}}(xz)=\widetilde{xz} we see

xz~2\displaystyle\frac{\widetilde{xz}}{2} =(1)𝔼(x𝟏(x<t))=1z~2F(t)𝔼(x𝟏(x<t)).\displaystyle=(\ell-1)\mathbb{E}(x\bm{1}(x<t))=-\frac{1-\widetilde{z}}{2F(t)}\mathbb{E}(x\bm{1}(x<t)).

We know by Proposition 2 and continuity of FF that the two equations above have a unique solution (,t)=((xz~),t(xz~))(\ell,t)=(\ell(\widetilde{xz}),t(\widetilde{xz})) for xz~(0,xz~max(z~))\widetilde{xz}\in(0,\widetilde{xz}_{\max}(\widetilde{z})). Thus, we can differentiate both of the equations above with respect to xz~\widetilde{xz} to see that the derivatives of \ell and tt are given by

t=t(xz~)=F(t)2(1z~)f(t)𝔼((xt)𝟏(x<t))and=(xz~)=12𝔼((xt)𝟏(x<t)).t^{\prime}=t^{\prime}(\widetilde{xz})=\frac{F(t)^{2}}{(1-\widetilde{z})f(t)\mathbb{E}((x-t)\bm{1}(x<t))}\qquad\text{and}\qquad\ell^{\prime}=\ell^{\prime}(\widetilde{xz})=\frac{1}{2\mathbb{E}((x-t)\bm{1}(x<t))}.
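These derivative formulas can be checked by finite differences. With F uniform on [-1, 1] (a choice made purely for this sketch), the two constraints solve in closed form as t = 1 - 2 xz~/(1 - z~) and ℓ = 1 - (1 - z~)/(t + 1):

```python
# Finite-difference check of t'(xz) and l'(xz), with F uniform on [-1, 1]
# (purely illustrative).  The constraints l = 1 - (1-z)/(2F(t)) and
# xz/2 = (l - 1) E(x 1(x<t)) solve in closed form:

def t_of(z, xz):
    return 1.0 - 2.0 * xz / (1.0 - z)

def l_of(z, xz):
    return 1.0 - (1.0 - z) / (t_of(z, xz) + 1.0)

z, xz, h = -0.3, 0.1, 1e-6
t = t_of(z, xz)
F_t, f_t = (t + 1.0) / 2.0, 0.5                  # uniform cdf and density
E_neg = -(t + 1.0)**2 / 4.0                      # E((x - t) 1(x < t))

t_prime = F_t**2 / ((1.0 - z) * f_t * E_neg)     # formulas from the display
l_prime = 1.0 / (2.0 * E_neg)

assert abs((t_of(z, xz + h) - t_of(z, xz - h)) / (2 * h) - t_prime) < 1e-6
assert abs((l_of(z, xz + h) - l_of(z, xz - h)) / (2 * h) - l_prime) < 1e-6
```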

Then g(xz~)I;max(z~,xz~)=2(1)𝔼(x2𝟏(x<t))+𝔼(x2)g(\widetilde{xz})\equiv I_{\mathcal{F};\max}(\widetilde{z},\widetilde{xz})=2(\ell-1)\mathbb{E}(x^{2}\bm{1}(x<t))+\mathbb{E}(x^{2}) is differentiable as well with

g(xz~)=2𝔼(x2𝟏(x<t))+2(1)t2f(t)t=2(1)t2F(t)2(1z~)𝔼(x2𝟏(x<t))(1z~)𝔼((tx)𝟏(x<t))g^{\prime}(\widetilde{xz})=2\ell^{\prime}\mathbb{E}(x^{2}\bm{1}(x<t))+2(\ell-1)t^{2}f(t)t^{\prime}=\frac{2(1-\ell)t^{2}F(t)^{2}-(1-\widetilde{z})\mathbb{E}(x^{2}\bm{1}(x<t))}{(1-\widetilde{z})\mathbb{E}((t-x)\bm{1}(x<t))}

Next, note that g(0)=z~𝔼(x2)<0=a(z~,0)g(0)=\widetilde{z}\mathbb{E}(x^{2})<0=a^{*}(\widetilde{z},0), in the notation of (30). By differentiability (and thus continuity) of a(z~,)a^{*}(\widetilde{z},\cdot) and gg (the latter due to differentiability of \ell and tt), we conclude that there exists ϵ>0\epsilon>0 such that a(z~,xz~)g(xz~)0a^{*}(\widetilde{z},\widetilde{xz})-g(\widetilde{xz})\geqslant 0 for all xz~[0,ϵ]\widetilde{xz}\in[0,\epsilon]. By (31), this means popt;z~,xz~=pmax,z~,xz~p_{\textnormal{opt};\widetilde{z},\widetilde{xz}}^{{\dagger}}=p_{\max,\widetilde{z},\widetilde{xz}}^{{\dagger}} for all xz~[0,ϵ]\widetilde{xz}\in[0,\epsilon]. Thus, it suffices to show xz~det(M(pmax;z~,xz~))>0\frac{\partial}{\partial\widetilde{xz}}\det(M(p_{\max;\widetilde{z},\widetilde{xz}}^{{\dagger}}))>0 for all xz~(0,δ)\widetilde{xz}\in(0,\delta), for some δϵ\delta\leqslant\epsilon. Continuity of det(M(pmax;z~,xz~))\det(M(p_{\max;\widetilde{z},\widetilde{xz}}^{{\dagger}})) at xz~=0\widetilde{xz}=0 follows immediately from continuity of gg and (29).

As xz~0\widetilde{xz}\downarrow 0 we have t(xz~)F1(1)t(\widetilde{xz})\uparrow F^{-1}(1) and (xz~)(1+z~)/2\ell(\widetilde{xz})\uparrow(1+\widetilde{z})/2 and also 𝔼((tx)𝟏(x<t))=tF(t)𝔼(x𝟏(x<t))F1(1)\mathbb{E}((t-x)\bm{1}(x<t))=tF(t)-\mathbb{E}(x\bm{1}(x<t))\uparrow F^{-1}(1). In the case F1(1)<F^{-1}(1)<\infty we have limxz~0g(xz~)=F1(1)𝔼(x2)/F1(1)>0\lim_{\widetilde{xz}\downarrow 0}g^{\prime}(\widetilde{xz})=F^{-1}(1)-\mathbb{E}(x^{2})/F^{-1}(1)>0 by assumption. If F1(1)=F^{-1}(1)=\infty, then g(xz~)g^{\prime}(\widetilde{xz})\rightarrow\infty as xz~0\widetilde{xz}\downarrow 0. Finally, we substitute into the formula (29) for det(M)\det(M) getting

det(M(pmax;z~,xz~))xz~\displaystyle\frac{\partial\det(M(p_{\max;\widetilde{z},\widetilde{xz}}^{{\dagger}}))}{\partial\widetilde{xz}} =2g(xz~)((1z~2)g(xz~)+z~(xz~)2)𝔼(x2)4g(xz~)z~(xz~)𝔼(x2)+4(xz~)3𝔼(x2)4xz~.\displaystyle=-\frac{2g^{\prime}(\widetilde{xz})\left((1-\widetilde{z}^{2})g(\widetilde{xz})+\widetilde{z}(\widetilde{xz})^{2}\right)}{\mathbb{E}(x^{2})}-\frac{4g(\widetilde{xz})\widetilde{z}(\widetilde{xz})}{\mathbb{E}(x^{2})}+\frac{4(\widetilde{xz})^{3}}{\mathbb{E}(x^{2})}-4\widetilde{xz}.

Since g(xz~)z~𝔼(x2)g(\widetilde{xz})\to\widetilde{z}\cdot\mathbb{E}(x^{2}) as xz~0\widetilde{xz}\downarrow 0, we have (1z~2)g(xz~)+(xz~)2z~𝔼(x2)z~M11<0(1-\widetilde{z}^{2})g(\widetilde{xz})+(\widetilde{xz})^{2}\widetilde{z}\to\mathbb{E}(x^{2})\widetilde{z}M_{11}<0 (Corollary 1). Our analysis of the limiting behavior of g(xz~)g^{\prime}(\widetilde{xz}) then indicates that

limxz~0det(M(pmax;z~,xz~))xz~=2z~M11(F1(1)𝔼(x2)/F1(1))>0\lim_{\widetilde{xz}\downarrow 0}\frac{\partial\det(M(p_{\max;\widetilde{z},\widetilde{xz}}^{{\dagger}}))}{\partial\widetilde{xz}}=-2\widetilde{z}M_{11}(F^{-1}(1)-\mathbb{E}(x^{2})/F^{-1}(1))>0
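The surprising conclusion of Theorem 4, that det(M(pmax†)) initially increases in xz~ when E(x^2) < F^{-1}(1)^2, can be illustrated numerically. With F uniform on [-1, 1] (purely illustrative) we have E(x^2) = 1/3 < 1 = F^{-1}(1)^2, the threshold is t = 1 - 2 xz~/(1 - z~), and substituting the resulting g into (29) shows the determinant rising near xz~ = 0:

```python
# Numerical illustration of Theorem 4 with F uniform on [-1, 1], where
# E(x^2) = 1/3 < F^{-1}(1)^2 = 1 (a purely illustrative choice): the value
# det(M(p_max^dagger)) initially *increases* in xz.

def det_M_dag(z, xz, ex2=1.0 / 3.0):
    t = 1.0 - 2.0 * xz / (1.0 - z)               # closed-form threshold
    # g = E_p(x^2 z) = 2(l - 1) E(x^2 1(x<t)) + E(x^2), simplified:
    g = 1.0 / 3.0 - (1.0 - z) * (t * t - t + 1.0) / 3.0
    return (-(1.0 - z**2) * g**2 / ex2 - 2.0 * z * xz**2 * g / ex2
            + ex2 * (1.0 - z**2) + xz**2 * (xz**2 / ex2 - 2.0))

z = -0.5
vals = [det_M_dag(z, xz) for xz in (0.0, 0.01, 0.02, 0.03)]
assert all(b > a for a, b in zip(vals, vals[1:]))   # increasing near xz = 0
```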

The proof for the case z~>0\widetilde{z}>0 is completely analogous. We first show that popt;z~,xz~=pmin;z~,xz~p_{\textnormal{opt};\widetilde{z},\widetilde{xz}}^{{\dagger}}=p_{\min;\widetilde{z},\widetilde{xz}}^{{\dagger}} whenever xz~\widetilde{xz} is sufficiently close to 0. Then we note (u,s)(u,s) is the unique solution to the equations u=(1+z~)/(2(1F(s)))u=(1+\widetilde{z})/(2(1-F(s))) and xz~/2=(1+z~)𝔼(x𝟏(xs))/(2(1F(s)))\widetilde{xz}/2=(1+\widetilde{z})\mathbb{E}(x\bm{1}(x\geqslant s))/(2(1-F(s))) to compute the derivatives u(xz~)u^{\prime}(\widetilde{xz}) and s(xz~)s^{\prime}(\widetilde{xz}). This enables us to show

limxz~0(/xz~)det(M(pmin;z~,xz~))>0\lim_{\widetilde{xz}\downarrow 0}(\partial/\partial\widetilde{xz})\det(M(p_{\min;\widetilde{z},\widetilde{xz}}^{{\dagger}}))>0

under the condition 𝔼(x2)<F1(0)2\mathbb{E}(x^{2})<F^{-1}(0)^{2}.

Acknowledgments

This work was supported by the US National Science Foundation under grants IIS-1837931 and DMS-2152780. The authors would like to thank Kevin Guo, Dan Kluger, Tim Morrison, and several anonymous reviewers for helpful comments.

References

  • Abdulkadiroglu et al., (2017) Abdulkadiroglu, A., Angrist, J. D., Narita, Y., and Pathak, P. A. (2017). Impact evaluation in matching markets with general tie-breaking. Technical report, National Bureau of Economic Research.
  • Aiken et al., (1998) Aiken, L. S., West, S. G., Schwalm, D. E., Carroll, J. L., and Hsiung, S. (1998). Comparison of a randomized and two quasi-experimental designs in a single outcome evaluation: Efficacy of a university-level remedial writing program. Evaluation Review, 22(2):207–244.
  • Angrist et al., (2020) Angrist, J., Autor, D., and Pallais, A. (2020). Marginal effects of merit aid for low-income students. Technical report, National Bureau of Economic Research.
  • Atkinson et al., (2007) Atkinson, A., Donev, A., and Tobias, R. (2007). Optimum experimental designs, with SAS. Oxford University Press.
  • Atkinson, (1982) Atkinson, A. C. (1982). Optimum biased coin designs for sequential clinical trials with prognostic factors. Biometrika, 69(1):61–67.
  • Atkinson, (2014) Atkinson, A. C. (2014). Selecting a biased-coin design. Statistical Science, 29(1):144–163.
  • Bandyopadhyay and Biswas, (2001) Bandyopadhyay, U. and Biswas, A. (2001). Adaptive designs for normal responses with prognostic factors. Biometrika, 88(2):409–419.
  • Biswas and Bhattacharya, (2018) Biswas, A. and Bhattacharya, R. (2018). A class of covariate-adjusted response-adaptive allocation designs for multitreatment binary response trials. Journal of Biopharmaceutical Statistics, 28(5):809–823.
  • Boyd and Vandenberghe, (2004) Boyd, S. and Vandenberghe, L. (2004). Convex optimization. Cambridge University Press, Cambridge.
  • Brent, (1973) Brent, R. P. (1973). Algorithms for minimization without derivatives. Prentice-Hall, Inc., Englewood Cliffs, NJ.
  • Campbell, (1969) Campbell, D. T. (1969). Reforms as experiments. American Psychologist, 24(4):409.
  • Cattaneo et al., (2017) Cattaneo, M. D., Titiunik, R., and Vazquez-Bare, G. (2017). Comparing inference approaches for RD designs: A reexamination of the effect of head start on child mortality. Journal of Policy Analysis and Management, 36(3):643–681.
  • Chernoff, (1953) Chernoff, H. (1953). Locally optimal designs for estimating parameters. The Annals of Mathematical Statistics, pages 586–602.
  • Clyde and Chaloner, (1996) Clyde, M. and Chaloner, K. (1996). The equivalence of constrained and weighted designs in multiple objective design problems. Journal of the American Statistical Association, 91(435):1236–1244.
  • Cook and Wong, (1994) Cook, R. D. and Wong, W. K. (1994). On the equivalence of constrained and compound optimal designs. Journal of the American Statistical Association, 89(426):687–692.
  • Dantzig and Wald, (1951) Dantzig, G. B. and Wald, A. (1951). On the fundamental lemma of Neyman and Pearson. The Annals of Mathematical Statistics, 22(1):87–93.
  • De Oliveira, (2018) De Oliveira, O. (2018). The implicit function theorem for maps that are only differentiable: An elementary proof. Real Analysis Exchange, 43(2):429–444.
  • Efron, (1971) Efron, B. (1971). Forcing a sequential experiment to be balanced. Biometrika, 58(3):403–417.
  • Gelman and Imbens, (2017) Gelman, A. and Imbens, G. (2017). Why high-order polynomials should not be used in regression discontinuity designs. Journal of Business & Economic Statistics, 37(3):447–456.
  • Goldberger, (1972) Goldberger, A. S. (1972). Selection bias in evaluating treatment effects: Some formal illustrations. Technical Report Discussion paper 128–72, Institute for Research on Poverty, University of Wisconsin–Madison.
  • Goldenshluger and Zeevi, (2013) Goldenshluger, A. and Zeevi, A. (2013). A linear response bandit problem. Stochastic Systems, 3(1):230–261.
  • Hu and Rosenberger, (2006) Hu, F. and Rosenberger, W. F. (2006). The theory of response-adaptive randomization in clinical trials. John Wiley & Sons.
  • Hu et al., (2015) Hu, J., Zhu, H., and Hu, F. (2015). A unified family of covariate-adjusted response-adaptive designs based on efficiency and ethics. Journal of the American Statistical Association, 110(509):357–367.
  • Jacob et al., (2012) Jacob, R., Zhu, P., Somers, M.-A., and Bloom, H. (2012). A practical guide to regression discontinuity. MDRC.
  • Kluger and Owen, (2021) Kluger, D. and Owen, A. B. (2021). Tie-breaker designs provide more efficient kernel estimates than regression discontinuity designs. Technical Report arXiv:2101.09605, Stanford University.
  • Läuter, (1974) Läuter, E. (1974). Experimental design in a class of models. Mathematische Operationsforschung und Statistik, 5(4-5):379–398.
  • Läuter, (1976) Läuter, E. (1976). Optimal multipurpose designs for regression models. Mathematische Operationsforschung und Statistik, 7(1):51–68.
  • Lee, (1987) Lee, C. M.-S. (1987). Constrained optimal designs for regression models. Communications in Statistics-Theory and Methods, 16(3):765–783.
  • Lee, (1988) Lee, C. M.-S. (1988). Constrained optimal designs. Journal of Statistical Planning and Inference, 18(3):377–389.
  • Lehmann and Romano, (2005) Lehmann, E. L. and Romano, J. P. (2005). Testing statistical hypotheses, volume 3. Springer, New York.
  • Lipsey et al., (1981) Lipsey, M. W., Cordray, D. S., and Berger, D. E. (1981). Evaluation of a juvenile diversion program: Using multiple lines of evidence. Evaluation Review, 5(3):283–306.
  • Ludwig and Miller, (2007) Ludwig, J. and Miller, D. L. (2007). Does Head Start improve children’s life chances? Evidence from a regression discontinuity design. The Quarterly Journal of Economics, 122(1):159–208.
  • Metelkina and Pronzato, (2017) Metelkina, A. and Pronzato, L. (2017). Information-regret compromise in covariate-adaptive treatment allocation. The Annals of Statistics, 45(5):2046–2073.
  • Morrison and Owen, (2022) Morrison, T. P. and Owen, A. B. (2022). Optimality in multivariate tie-breaker designs. Technical report, Stanford University. arXiv:2202.10030.
  • Neyman and Pearson, (1933) Neyman, J. and Pearson, E. S. (1933). IX. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London. Series A, 231(694-706):289–337.
  • Owen and Varian, (2020) Owen, A. B. and Varian, H. (2020). Optimizing the tie-breaker regression discontinuity design. Electronic Journal of Statistics, 14(2):4004–4027.
  • R Core Team, (2022) R Core Team (2022). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
  • Rosenberger and Sverdlov, (2008) Rosenberger, W. F. and Sverdlov, O. (2008). Handling covariates in the design of clinical trials. Statistical Science, 23(3):404–419.
  • Stigler, (1971) Stigler, S. M. (1971). Optimal experimental design for polynomial regression. Journal of the American Statistical Association, 66(334):311–318.
  • Sverdlov et al., (2013) Sverdlov, O., Rosenberger, W. F., and Ryeznik, Y. (2013). Utility of covariate-adjusted response-adaptive randomization in survival trials. Statistics in Biopharmaceutical Research, 5(1):38–53.
  • Thistlethwaite and Campbell, (1960) Thistlethwaite, D. L. and Campbell, D. T. (1960). Regression-discontinuity analysis: An alternative to the ex post facto experiment. Journal of Educational Psychology, 51(6):309.
  • Trochim and Cappelleri, (1992) Trochim, W. M. and Cappelleri, J. C. (1992). Cutoff assignment strategies for enhancing randomized clinical trials. Controlled Clinical Trials, 13(3):190–212.
  • Whittle, (1973) Whittle, P. (1973). Some general points in the theory of optimal experimental design. Journal of the Royal Statistical Society: Series B (Methodological), 35(1):123–130.
  • Zhang et al., (2007) Zhang, L.-X., Hu, F., Cheung, S. H., and Chan, W. S. (2007). Asymptotic properties of covariate-adjusted response-adaptive designs. The Annals of Statistics, 35(3):1166–1182.
  • Zhang and Hu, (2009) Zhang, L.-X. and Hu, F.-f. (2009). A new family of covariate-adjusted response adaptive designs and their properties. Applied Mathematics-A Journal of Chinese Universities, 24(1):1–13.