A general characterization of optimal tie-breaker designs
Abstract
Tie-breaker designs trade off a statistical design objective with short-term gain from preferentially assigning a binary treatment to those with high values of a running variable . The design objective is any continuous function of the expected information matrix in a two-line regression model, and short-term gain is expressed as the covariance between the running variable and the treatment indicator. We investigate how to specify design functions indicating treatment probabilities as a function of to optimize these competing objectives, under external constraints on the number of subjects receiving treatment. Our results include sharp existence and uniqueness guarantees, while accommodating the ethically appealing requirement that treatment probabilities are non-decreasing in . Under such a constraint, there always exists an optimal design function that is constant below and above a single discontinuity. When the running variable distribution is not symmetric or the fraction of subjects receiving the treatment is not , our optimal designs improve upon a -optimality objective without sacrificing short-term gain, compared to the three level tie-breaker designs of Owen and Varian (2020) that fix treatment probabilities at , , and . We illustrate our optimal designs with data from Head Start, an early childhood government intervention program.
1 Introduction
Companies, charitable institutions, and clinicians often have ethical or economic reasons to prefer assigning a binary treatment to certain individuals. If this preference is expressed by the values of a scalar running variable , a natural decision is to assign the treatment to a subject if and only if their is at least some threshold . This is a regression discontinuity design, or RDD (Thistlethwaite and Campbell, 1960). Unfortunately, treatment effect estimates from an RDD analysis typically have very high variance (Jacob et al., 2012; Goldberger, 1972; Gelman and Imbens, 2017), relative to those from a randomized controlled trial (RCT) that does not preferentially treat any individuals. To trade off between these competing statistical and ethical objectives, investigators can use a tie-breaker design (TBD). In a typical tie-breaker design, the top ranked subjects get the treatment, the lowest ranked subjects are in the control group, and a cohort in the middle is randomized to treatment or control. The earliest tie-breaker reference that we are aware of is Campbell (1969), where was discrete and the randomization broke ties among subjects with identical values of .
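As a rough illustration of the typical tie-breaker design just described, the following Python sketch assigns treatment probabilities from a running variable. The function names, the two cutoffs `lower` and `upper`, and the middle probability `p_mid` are hypothetical choices made for this example only; they are not notation from the paper.

```python
import random

def tie_breaker_probs(x, lower, upper, p_mid=0.5):
    """Treatment probability for each running-variable value in x.

    Hypothetical helper: values at or above `upper` are always treated,
    values below `lower` are never treated, and values in between are
    randomized with probability `p_mid`.
    """
    probs = []
    for xi in x:
        if xi >= upper:
            probs.append(1.0)
        elif xi < lower:
            probs.append(0.0)
        else:
            probs.append(p_mid)
    return probs

def assign(probs, rng):
    """Draw independent 0/1 treatment indicators with the given probabilities."""
    return [1 if rng.random() < p else 0 for p in probs]

x = [-2.0, -1.0, -0.3, 0.2, 0.4, 1.5]
probs = tie_breaker_probs(x, lower=-0.5, upper=0.5)
z = assign(probs, random.Random(0))
```

Setting `lower == upper` in this sketch recovers a sharp RDD, while pushing both cutoffs beyond the range of the data recovers an RCT.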
Past settings for tie-breaker designs include the offer of remedial English to incoming university students based on their high school English proficiency (Aiken et al., 1998), a diversion program designed to reduce juvenile delinquency (Lipsey et al., 1981), scholarship offerings for two and four year colleges based on a judgment of the applicants’ needs and academic strengths (Abdulkadiroglu et al., 2017; Angrist et al., 2020), and clinical trials (Trochim and Cappelleri, 1992), where they are known as cutoff designs.
The tie-breaker design problem is to choose treatment probabilities for subjects based on their running variables . These probabilities are chosen before observing the response values but with the running variables known. We assume throughout that whenever .
As is common in the optimal experimental design literature, the statistical objective is an “efficiency” criterion that measures estimation precision. Specifically, our criterion will be a function of the information (scaled inverse variance) matrix for the model parameters in a two line model relating the response to the running variable and a treatment indicator :
$$Y_i = \beta_0 + \beta_1 x_i + \beta_2 Z_i + \beta_3 x_i Z_i + \varepsilon_i \tag{1}$$
This simple working model nonetheless poses some challenging design problems. In Section 6 we describe some more general modeling settings for tie-breaker models.
Throughout we assume that the running variable is centered, i.e. , and that the have common variance . Here indicates treatment and so ). For model (1) the information matrix is where and the expectation is taken over the treatment assignments , conditional on the running variables (whose values are known). The ordinary least squares estimate of satisfies . Common examples of efficiency criteria in the literature, such as the D-optimality criterion , are concave in both and (Boyd and Vandenberghe, 2004). However, our theoretical results only require continuity of .
Our preference for treating individuals with higher running variables is expressed as an equality constraint on the scaled covariance between treatment and the running variable (recall the latter is known, hence viewed as non-random). Under the two-line model (1), this constraint has the following economic interpretation. We take to be something like economic value or student success, where larger is better. We expect that holds in most of our motivating problems. The expected value of per customer under (1) is then
(2) |
where . Equation (2) shows that the expected gain is unaffected by or . Furthermore, we assume the proportion of treated subjects is fixed by an external budget, i.e., an equality constraint for some . For instance, there might be only a set number of scholarships or perks to be given out. The only term affected by the design in (2) is then , as pointed out by Owen and Varian (2020). For , the short term average value per customer grows with and we would want that value to be large. Similar functionals are also commonly studied as regret functions in bandit problems (Goldenshluger and Zeevi, 2013; Metelkina and Pronzato, 2017).
We are now ready to formulate the tie-breaker design problem as the following constrained optimization problem. Given real values :
(3) |
for some constants and . The first equality constraint in (3) is a budget constraint due to the cost of treatment and the second constraint is on the short term gain mentioned above. We consider two different sets in detail. The first is . The second is which requires treatment probabilities to be non-decreasing in the running variable . Such a monotonicity constraint prevents more qualified students from having a lower chance of getting a scholarship than less qualified ones, or more loyal customers from having a lower chance for a perk than others. It also eliminates perverse incentives for subjects to lower their . To our knowledge, such a monotonicity constraint has not received much attention in the optimal design literature, though it is enormously appealing in our motivating applications.
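The quantities entering the constraints of (3) can be sketched numerically. The Python sketch below assumes a centered running variable and a treatment indicator coded as +1 (treated) or -1 (control), so that its expectation is 2p - 1 for a subject with treatment probability p; this coding and all function names are assumptions of the example, and a 0/1 coding would only rescale the quantities linearly.

```python
def treated_fraction(p):
    """Expected fraction of subjects treated (budget constraint)."""
    return sum(p) / len(p)

def short_term_gain(x, p):
    """Average of x_i * E[Z_i], with E[Z_i] = 2 p_i - 1 under the assumed +/-1 coding."""
    return sum(xi * (2.0 * pi - 1.0) for xi, pi in zip(x, p)) / len(x)

def is_monotone(x, p):
    """True if treatment probabilities are non-decreasing in the running variable."""
    pairs = sorted(zip(x, p))
    return all(p1 <= p2 for (_, p1), (_, p2) in zip(pairs, pairs[1:]))

x = [-2.0, -1.0, 0.0, 1.0, 2.0]           # centered running variable
p_tie_breaker = [0.0, 0.0, 0.5, 1.0, 1.0]
p_rct = [0.5] * 5

frac_tb = treated_fraction(p_tie_breaker)
gain_tb = short_term_gain(x, p_tie_breaker)
gain_rct = short_term_gain(x, p_rct)
```

In this toy example both designs treat half the subjects in expectation, but only the tie-breaker attains a positive short-term gain; both are monotone.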
When the efficiency criterion is concave in , a solution to (3) can be found numerically via convex optimization, as Morrison and Owen (2022) do for vector valued , and as Metelkina and Pronzato (2017) mention for a similar problem. However, our particular setting with univariate is tractable enough to provide a simple yet complete analytical characterization of the optimal , even if the efficiency criterion is not concave. We show, under general conditions, that we can always find optimal treatment probabilities that are piecewise constant in , with the number of pieces small and independent of .
There is a well-developed literature for optimal experiment design in the presence of multiple objectives. Early examples of a constrained optimization problem of the form (3) were designed to account for several of the standard efficiency objectives simultaneously (Stigler, 1971; Lee, 1987, 1988). Läuter (1974, 1976) proposed maximizing a convex combination of efficiency objectives, a practice now typically referred to as a “compound” design approach. It is now well known (Cook and Wong, 1994; Clyde and Chaloner, 1996) that in many problems with concave objectives, optimal constrained and compound designs are equivalent. In this paper, we provide another approach to reduce the constrained problem (3) to a compound problem that can handle the monotonicity constraint. At the same time, we provide simple ways to compute our optimal designs that are based directly on the parameters and in our constrained formulation (3), and do not require specifying the Lagrange multipliers appearing in the corresponding compound problem. Those Lagrange multipliers involve ratios of information gain to economic gain where each of those quantities is only known up to a multiplicative constant.
Problems similar to (3) have received significant attention in the sequential design of clinical trials. Biased-coin designs, beginning with the simple procedure of Efron (1971), have been developed as a compromise between treatment balance and randomization; see Atkinson (2014) for a review. Covariate-adaptive biased-coin designs often replace the balance objective with an efficiency criterion such as D-optimality (Atkinson, 1982; Rosenberger and Sverdlov, 2008). Response-adaptive designs also optimize for some efficiency objective but simultaneously seek to minimize the number of patients receiving the inferior treatment for ethical reasons (Hu and Rosenberger, 2006). Various authors such as Bandyopadhyay and Biswas (2001) and Hu et al. (2015) propose sequential designs to effectively navigate this trade-off. When they also account for covariate information, they are called covariate-adjusted response-adaptive (CARA) designs (Zhang et al., 2007; Zhang and Hu, 2009).
In the CARA literature especially, there has been significant recent interest in optimal design for nonlinear models (Sverdlov et al., 2013; Metelkina and Pronzato, 2017; Biswas and Bhattacharya, 2018). Unlike optimal designs in linear models such as (1), designs in nonlinear models can typically only be locally optimal, meaning that their optimality depends on the values of the unknown parameters (Chernoff, 1953). While we may be able to obtain increasingly reliable estimates of these parameters over time in sequential settings, in non-clinical settings subjects typically enter a tie-breaker study non-sequentially, i.e., we know the running variables for all subjects before designing the experiment. In these applications, such as measuring the impact of a scholarship on future educational attainment, it can take several years to collect a single set of responses on which to compute a parameter estimate. Locally optimal designs are therefore of limited utility in this setting, and so we focus on optimal design under the linear model (1) in a non-sequential setting, which already presents a sufficient challenge.
The existing literature on problems like (3) typically considers the running variable to be random. For example, Owen and Varian (2020, Section 7) study tie-breaker designs under the assumption that the running variable is either uniform or Gaussian, and exactly half the subjects are to be treated. They consider the typical three level tie-breaker design where subjects with running variable above some threshold always get the treatment, subjects with running variable below never get the treatment, and the remaining subjects are randomized into treatment with probability . They find that a -optimality criterion of statistical efficiency is monotonically increasing in the width of the randomization window, with the RCT () being most efficient and the RDD () least efficient. Conversely, the short-term gain is decreasing in . They also show the three level design is optimal for any given level of short-term gain. In this article we show strong advantages to moving away from that three level design when the running variable is not symmetric, or we cannot treat half of the subjects.
Metelkina and Pronzato (2017) studied a further generalization of the optimal tie-breaker design problem, motivated by CARA designs. In particular, their Example 1, an illustration of their Corollary 2.1, is similar to a random- generalization of (3). (There are minor differences, such as the lack of a treatment fraction constraint, an inequality constraint on short-term gain as opposed to an equality constraint, and the use of a common intercept for the treated and untreated individuals, i.e., assuming that in (1).) Crucially, however, the proof for their Corollary 2.1 does not generalize to the case where we require the treatment probabilities be monotone. Even without the monotonicity constraint, we provide a sharper characterization of the solutions to our more specific problem (Section 3.1).
We introduce a random- generalization of the problem (3) in Section 2. We show it encompasses both (3) and the problem studied by Owen and Varian (2020) as special cases. Then, Section 3 presents the main technical results characterizing the solutions to this more general problem. In particular, Theorem 2 shows that, under the monotonicity constraint, there always exists a solution to (3) corresponding to a simple, two-level stratified design where all subjects with running variable below some threshold have the same probability of treatment, all subjects with running variable above have an identical, higher probability of treatment, and those subjects with have a treatment probability between these two values. Section 4 then presents some results on the trade-off between a -optimality efficiency criterion and short-term gain for these optimal designs; examples of this trade-off for some specific running variable distributions are given in Section 5. That section also includes a fixed- application based on Head Start, a government assistance program for low-income children. It also shows how to compute our optimal designs when is either fixed or random. Finally, Section 6 summarizes the main results.
2 Random running variable
Our random- generalization of (3) assumes the running variables are samples from a common distribution , which we hereafter identify with the corresponding cumulative distribution function. It considers an information matrix that averages over both the random treatment assignments and randomness in the running variables. This allows us to characterize optimal designs for an as yet unobserved set of running variable values, when their distribution is known.
Before presenting the random- tie-breaker design problem, we briefly review the standard setting of optimal design in multiple linear regression models; see e.g. Atkinson et al. (2007) for further background. In the simplest case, the user assumes the standard linear model with the goal of selecting covariate values to optimize an efficiency criterion that is a function of the information matrix , where is the design matrix with -th row . Perhaps the most common such criterion is D-optimality, which corresponds to maximizing . Another popular choice is -optimality, which minimizes for some choice of . This can be interpreted as minimizing , where is the ordinary least squares estimator. The study of optimal design is often simplified by the use of design measures. A design measure is a probability distribution from which to generate the covariates . The relaxed optimal design problem involves selecting a design measure instead of a finite number of covariate values . The objective is to optimize for the desired functional of the expected information matrix over some space of design measures . For instance, a design measure is D-optimal (for the relaxed problem) if , and -optimal if . The original optimal design problem restricts to only consist of discrete probability distributions supported on at most distinct points with probabilities that are multiples of .
For the tie-breaker design problem, our regression model (1) includes both the running variables and the treatment indicators as covariates. But the experimenter does not have control over the entire joint distribution of . The running variable is externally determined, so they can only specify the conditional distribution of the treatment indicator given the running variable . This conditional distribution is specified by a design function such that . As mentioned above, we assume for a known, fixed distribution . This allows us to drop subscripts when convenient. For any two design functions and we say whenever . We only need a minimal assumption on , which can be continuous, discrete, or neither:
Assumption 1.
with .
The mean-centeredness part of Assumption 1 loses no generality, due to the translation invariance of estimation under the two-line model (1). Hereafter, all expectations involving omit the subscript , with the implicit understanding that .
The random- tie-breaker design problem is as follows:
(4) |
Here and are constants analogous to and , respectively, in (3), is a collection of design functions, and is the expected information matrix under the model (1), averaging over both and .
This problem can be viewed as a constrained relaxed optimal design problem under the regression model (1) where the set of allowable design measures is indexed by the design functions .
To interpret the equality constraints in (4) it is helpful to note that
(5) |
for any with . In particular, for each positive integer , there exists an invertible linear mapping that does not depend on the design and maps to . For example, . Taking in (5), we see the constraint in (4) is equivalent to requiring the expected proportion of subjects to be treated to be . This proportion is typically determined by the aforementioned budget constraints. Taking in (5) shows that the second constraint in (4) sets the expected level of short-term gain. In Section 2.2, we provide some guidance on how to choose in practice.
From computing the expected information matrix in Section 2.3, we will see the problem (4) reduces to the finite-dimensional problem (3) when is discrete, placing probability mass on each of the known running variable values . Thus, to solve (3) it suffices to solve the problem (4) for any satisfying Assumption 1, which must hold for any discrete distribution with finite support.
2.1 Some design functions
For convenience, we introduce some notation for certain forms of the design function . We will commonly encounter designs of the form
(6) |
for a set . Another important special case consists of two level designs
(7) |
for treatment probabilities and a threshold . For example, is a sharp RDD with threshold , while for any , is an RCT with treatment probability .
The condition ensures that is nondecreasing in ; we refer to such designs as monotone. Under a monotone design, a subject cannot have a lower treatment probability than another subject with lower . We also define a symmetric design to be one for which ; for instance, might be the cumulative distribution function (CDF) of a symmetric random variable. Finally, the three level tie-breaker design from Owen and Varian (2020) is both monotone and symmetric and defined for by
(8) |
when is the distribution. Note that for all , always treats half the subjects, i.e., . The generalization to other and running variable distribution functions is
(9) |
where and .
2.2 Bounds on short-term gain
Before studying optimal designs, we impose lower and upper bounds on the possible short-term gain constraints to consider, for each possible . For an upper bound we use , the maximum that can be attained by any design function satisfying the treatment fraction constraint . It turns out that this upper bound is always uniquely attained. If the running variable distribution is continuous, it is uniquely attained by a sharp RDD. We remind the reader that uniqueness of a design function satisfying some property means that for any two design functions and with that property, we must have under .
Lemma 1.
For any and running variable distribution , there exists a unique design satisfying
(10) | ||||
(11) |
for some . Any that satisfies the treatment fraction constraint (10) also satisfies
(12) |
with equality if and only if , i.e. under .
Remark 1.
Notice that equation (11) does not specify at . If is continuous, then any value for yields an equivalent design function, but if has an atom at then we will require a specific value for . We must allow to have atoms to solve the finite dimensional problem (3). While we specify in the proof of Lemma 1 below, later results of this type do not give the values of design functions at such discontinuities.
Remark 2.
If is continuous, then is an RDD: for . We call the design a generalized RDD for general satisfying Assumption 1.
Remark 3.
The threshold in (11) is essentially unique. If there is an interval with then all step locations in provide equivalent generalized RDDs.
Proof of Lemma 1.
If then the only design functions (again, up to uniqueness w.p.1 under ) are the constant functions and , and the result holds trivially. Thus, we can assume that . By (5), the existence of follows by taking and
To show (12), fix any design satisfying (10) and notice that means
Then equals
with equality iff for a set of with probability one under , i.e., iff satisfies (11) with probability one under . ∎
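For a discrete running variable, the generalized RDD of Lemma 1 can be sketched as a greedy allocation: treat the largest values with probability one until the budget runs out, and split whatever budget remains evenly across the atom at the threshold. The Python sketch below is only an illustration under the assumptions of a centered running variable and +/-1 treatment coding; the function names are hypothetical.

```python
from collections import Counter

def generalized_rdd(x, frac_treated):
    """Treatment probabilities that spend the budget on the largest x first.

    Subjects sharing the threshold value split the leftover budget equally,
    which is where a treatment probability strictly between 0 and 1 can appear.
    """
    counts = Counter(x)
    budget = frac_treated * len(x)      # expected number of treated subjects
    prob_at = {}
    for value in sorted(counts, reverse=True):
        c = counts[value]
        take = min(c, max(budget, 0.0))
        prob_at[value] = take / c
        budget -= take
    return [prob_at[xi] for xi in x]

def short_term_gain(x, p):
    """Average of x_i * (2 p_i - 1), assuming +/-1 treatment coding."""
    return sum(xi * (2.0 * pi - 1.0) for xi, pi in zip(x, p)) / len(x)

x = [-2.0, -1.0, 0.0, 0.0, 1.0, 2.0]      # centered, with an atom at 0
p_rdd = generalized_rdd(x, 0.5)
p_alt = [0.0, 0.5, 0.5, 0.5, 0.5, 1.0]    # same treated fraction, less gain
gain_rdd = short_term_gain(x, p_rdd)
gain_alt = short_term_gain(x, p_alt)
```

Here the two subjects tied at zero each receive probability one half, and the resulting gain exceeds that of the alternative design with the same treated fraction, in line with (12).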
By symmetry, the design that minimizes over all designs with is where . Notice that . We impose a stricter lower bound of in the context of problem (4). This is motivated by the fact that the running variable has mean 0 (Assumption 1), meaning that whenever the design function is constant, corresponding to an RCT. Designs with exist for all but would not be relevant in our motivating applications, as they represent scenarios where subjects with smaller are more preferentially treated than in an RCT. We hence define the feasible input space by
(13) |
Any design function for which the moments lie within the feasible input space is referred to as an input-feasible design function.
If the design is input-feasible, we can write for some . The parameter corresponds to the amount of additional short-term gain attained by the design over an RCT, relative to the amount of additional short-term gain attained by the generalized RDD that treats the same proportion of subjects as . For instance, means that the design has a short-term gain that is 40% of the way from that of an RCT to the maximum attainable short-term gain under the treatment fraction constraint.
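One simple way to see that every intermediate level of short-term gain is attainable is to mix the generalized RDD with an RCT that treats the same fraction of subjects. The Python sketch below does this for a toy discrete running variable; the names `mix_with_rct` and `delta` are hypothetical, the +/-1 treatment coding is an assumption, and the mixed design is merely input-feasible, not claimed to be optimal.

```python
def short_term_gain(x, p):
    """Average of x_i * (2 p_i - 1), assuming +/-1 treatment coding."""
    return sum(xi * (2.0 * pi - 1.0) for xi, pi in zip(x, p)) / len(x)

def mix_with_rct(p_rdd, frac_treated, delta):
    """Convex combination of a generalized RDD and an RCT with the same budget."""
    return [delta * pi + (1.0 - delta) * frac_treated for pi in p_rdd]

x = [-2.0, -1.0, 0.0, 0.0, 1.0, 2.0]       # centered running variable
p_rdd = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]     # generalized RDD treating half
p_40 = mix_with_rct(p_rdd, 0.5, 0.4)

# Relative position between the RCT (gain 0 for centered x) and the maximum.
delta_attained = short_term_gain(x, p_40) / short_term_gain(x, p_rdd)
frac_40 = sum(p_40) / len(p_40)
```

Because both the treated fraction and the short-term gain are linear in the design, the mixture keeps the treated fraction at one half and attains 40% of the maximum gain.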
2.3 Expected information matrix and equivalence of -optimality and -optimality
We now explicitly compute the expected information matrix
(14) |
where
and we have omitted the dependence of the expectations on the design for brevity. We emphasize that depends on as well, though the experimenter can only control . Furthermore, when and the running variable values are mean-centered, the expected information matrix is precisely the fixed- information matrix , identifying . This shows that indeed, the random- problem (4) is strictly more general than the fixed- problem (3). Equation (14) also shows that any efficiency objective only depends on the treatment indicators through their marginal distributions conditional on , and not their joint distribution. In the fixed- setting, this makes it easier to obey an exact budget constraint by stratification. For instance, given five subjects with we could randomly treat exactly two of them, instead of randomizing each subject independently and possibly going over budget.
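The two observations above, that the expected information matrix depends only on marginal treatment probabilities and that stratification can meet a budget exactly, can be sketched as follows. The sketch assumes regressors (1, x, Z, xZ) with Z coded +1 or -1, and a common treatment probability of 0.4 for the five subjects; both are assumptions made for illustration, as are the function names.

```python
import random

def expected_information(x, p):
    """Average over subjects of E[v v^T] for v = (1, x, Z, x Z), with Z = +/-1."""
    n = len(x)
    info = [[0.0] * 4 for _ in range(4)]
    for xi, pi in zip(x, p):
        ez = 2.0 * pi - 1.0               # E[Z_i]; note that Z_i^2 = 1
        block = [
            [1.0,     xi,           ez,      xi * ez],
            [xi,      xi * xi,      xi * ez, xi * xi * ez],
            [ez,      xi * ez,      1.0,     xi],
            [xi * ez, xi * xi * ez, xi,      xi * xi],
        ]
        for r in range(4):
            for c in range(4):
                info[r][c] += block[r][c] / n
    return info

def treat_exactly(indices, k, rng):
    """Stratified assignment: exactly k of the given subjects get Z = +1."""
    chosen = set(rng.sample(indices, k))
    return {i: (1 if i in chosen else -1) for i in indices}

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
p = [0.4] * 5
info = expected_information(x, p)
z = treat_exactly(list(range(5)), 2, random.Random(1))
```

Each subject in the stratum is still treated with marginal probability 2/5, so the expected information matrix is unchanged, while the realized number of treated subjects is always exactly two.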
While we will characterize solutions to the optimal design problem (4) for any continuous efficiency criterion , in Section 4 we will prove some additional results for the -optimality criterion . We show that -optimality is of particular interest in this setting, as it happens to correspond exactly with the -optimality efficiency criterion of Owen and Varian (2020). They aim to minimize the asymptotic variance of and do so by observing that if is invertible, then when are independent, by the law of large numbers
We have assumed WLOG as the -optimal design does not depend on . Then by standard block inversion formulas
(15) |
where
(16) |
Equation (15) shows that minimizing the asymptotic conditional variance of is equivalent to maximizing , which under the present formalization of the problem is further equivalent to -optimality for the expected information matrix (14) with . The following result shows that for any input-feasible design . It follows that is always well-defined and nonnegative for any input-feasible design.
Corollary 1.
For any , .
Proof.
See Appendix A. ∎
On the other hand, because is the expected value of a positive semi-definite rank one matrix, it is also positive semi-definite. Thus
This shows that with inequality iff is invertible. Additionally, since only depends on through and , any two input-feasible designs and satisfying the equality constraints in (4) must have . It follows that the solutions to (4) under the -optimality criterion and the -optimality criterion must be identical whenever .
3 Optimal design characterizations
To solve the constrained optimization problem (4), we begin by observing that the expected information matrix , computed in (14), only depends on the design function through the quantities , , and . Then the same is true for any efficiency objective . Consequently for any continuous we can write for some continuous that may depend on the running variable distribution .
Fixing and the set of permissible design functions, we say a feasible design is one that satisfies the equality constraints in (4), i.e. and . Thus, the efficiency criterion can only vary among feasible designs through the single quantity . Furthermore, any two feasible designs and with must have the same efficiency. Thus, we can break down the problem (4) into two steps. First, we find a solution
(17) |
where
(18) |
is the set of values of attainable by some feasible design . Then we must find a feasible design that satisfies .
The next result shows that when is convex, is an interval. For our two choices of of interest, Propositions 1 and 2 will show it is a closed interval, so (17) will always have a solution when is continuous.
Lemma 2.
Suppose feasible designs , satisfy , where is convex. Then if , there exists feasible with .
Proof.
If then either of them is a suitable . Otherwise take so that . Then is in by convexity and, by direct computation of the moments for , feasible with . ∎
The endpoints of can be computed as the optimal values of the following constrained optimization problems:
(19) |
Given solutions and to the problems (19), Lemma 2 shows that the design
(20) |
solves the problem (4) for . If then all feasible designs have the same efficiency, so any one of them is optimal.
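The interpolation argument of Lemma 2 and (20) can be checked numerically on a toy example. The Python sketch below takes the three design-dependent quantities to be the averages of p, x p, and x^2 p, which is what the two-line model with +/-1 coding implies; that identification, the function names, and the example designs are assumptions of this illustration.

```python
def moments(x, p):
    """Averages of p_i, x_i p_i, and x_i^2 p_i over subjects."""
    n = len(x)
    return tuple(sum((xi ** k) * pi for xi, pi in zip(x, p)) / n for k in range(3))

def interpolate(x, p_a, p_b, target):
    """Mix two feasible designs so the third moment hits `target`."""
    m_a, m_b = moments(x, p_a)[2], moments(x, p_b)[2]
    lam = (target - m_b) / (m_a - m_b)
    if not 0.0 <= lam <= 1.0:
        raise ValueError("target lies outside the attainable interval")
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_a, p_b)]

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
p_a = [1.0, 0.0, 0.0, 0.0, 1.0]   # treats the extremes
p_b = [0.0, 1.0, 0.0, 1.0, 0.0]   # treats the near-center values
p_mix = interpolate(x, p_a, p_b, target=1.0)
m_mix = moments(x, p_mix)
```

Both input designs share the same first two moments, so the mixture does too, while its third moment lands on the requested value between the two endpoints. These example designs are not monotone, so the sketch illustrates the unconstrained class only.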
The remainder of this section is concerned with characterizing the solutions to the problems (19) for two specific choices of design function classes : the set of all measurable functions into , and the set of all such monotone functions. For these two choices of , solutions and exist for any and are unique. Our argument uses extensions of the Neyman-Pearson lemma (Neyman and Pearson, 1933) in hypothesis testing. These extensions are in Dantzig and Wald (1951), whose two authors discovered the relevant results independently of each other. We use a modern formulation of their work, adapting the presentation by Lehmann and Romano (2005):
Lemma 3.
Consider any measurable with . Define to be the set of all points such that
(21) |
for some , where is some collection of measurable functions from into . For each let be the set of all satisfying (21). If is such that
is closed and convex and , then
1. There exists and such that
(22)
2. if and only if satisfies (22) for some .
Proof.
Claim 1 and the necessity of (22) in claim 2 follow from the proof of part (iv) of Theorem 3.6.1 in Lehmann and Romano (2005), which uses the fact that is closed and convex to construct a separating hyperplane in $\mathbb{R}^{m+1}$. Sufficiency of (22) in claim 2 follows from part (ii) of that theorem, and is often called the method of undetermined multipliers. ∎
Lemma 3 equates a constrained optimization problem (item 2) and a compound optimization problem (item 1). Unlike typical equivalence theorems, it does not require to be the set of all measurable design functions, and uses an entirely different proof technique. Following Whittle (1973), equivalence theorems in optimal design are now popularly proven using the concept of Fréchet derivatives on the space of design functions (measures). However, such approaches often do not apply when is restricted to be the set of all monotone design functions. Most relevant to our problem, the proof of Corollary 2.1 in the supplement of Metelkina and Pronzato (2017) involves Fréchet derivatives in the direction of design functions supported at a single value of , which are not monotone. However, the use of Lemma 3 requires an objective linear in , where typical equivalence theorems only require concavity.
3.1 Globally optimal designs
We now solve the design problem (4) in the case that is the set of all measurable functions . We first explain why the results of Metelkina and Pronzato (2017) do not already do so adequately. Identifying our design functions with their design measures , Corollary 2.1 of Metelkina and Pronzato (2017) does not provide any information about what an optimal solution to (4) would be for values of where . Here and are quantities derived from the aforementioned Fréchet derivatives, depending on the constraints and . Unfortunately, this lack of information about holds for all both in their Example 1 and in our setting. Their example skirts this limitation by noting some moment conditions on implied by the equality when the running variable is uniform and the efficiency criterion is -optimality, and then manually searching for some parametric forms of for which it is possible to satisfy these conditions. By contrast, the results in this section apply Lemma 3 with the set of all design functions, and show a simple stratified design function is always optimal for any running variable distribution and continuous efficiency criterion. This enables optimal designs to be systematically and efficiently constructed (Section 5).
We will apply Lemma 3 with constraints pertaining to and . Our objective function is based on . When the running variable distribution is continuous, recalling the notation (6) the solutions to (19) take the forms and for some intervals and .
Proposition 1.
Let be the set of all measurable functions from into . For any , there exist unique solutions and to the optimization problems (19). These solutions are the unique feasible designs satisfying
(23) |
for some and which depend on and can be infinite if .
Proof.
If , then the proposition follows by Lemma 1 and taking , and . Thus we can assume that . We give the proof for in detail. The argument for is completely symmetric.
As noted above, we are in the setting of Lemma 3 with , , , and . The collection here is the set of all measurable functions from into , so the corresponding is closed and convex, as shown in part (iv) of Theorem 3.6.1 in Lehmann and Romano (2005). By Lemma 1 and (5) we can write where is defined in the discussion around (5) and
Hence our previous assumption ensures .
With the conditions of Lemma 3 satisfied, we now show that (22) is equivalent to (23) for any feasible . A feasible design satisfies (22) iff for some , or equivalently
(24) |
(cf. part (ii) of Theorem 3.6.1 in Lehmann and Romano (2005)). If has no real roots then for all , contradicting . Thus we write for some (real) , showing that (24) is equivalent to (23). We can now conclude, by the second claim in Lemma 3, that the set of optimal solutions to (19) contains precisely those feasible designs satisfying (23). Furthermore, the first claim of Lemma 3 ensures that such a design must exist. It remains to show only one feasible design can satisfy (23); in Appendix B we provide a direct argument, which does not rely on Lemma 3. ∎
Remark 4.
The necessity and sufficiency results of Proposition 1 do follow from Corollary 2.1 of Metelkina and Pronzato (2017). We again identify our design functions with their design measures and take , which can be written as an affine function of the expected information matrix . As discussed at the beginning of this section, however, the form of a solution to problem (4) cannot be constructed from their Corollary without the reduction to (19) and applying (20), so that is as above rather than something like -optimality. We have also shown a stronger uniqueness result than Section 2.3.3 of Metelkina and Pronzato (2017), which only applies when the running variable distribution has a density with respect to Lebesgue measure. Our Lemma 3 also provides an existence guarantee that does not rely on strict concavity of on the set of positive definite matrices; this is violated by the affine choice we need here.
As we will see in Section 5, when we frequently encounter under -optimality. In this case it is intuitive that there is an efficiency advantage in strategically allocating the rare level at both high and low , compared to a three level tie-breaker. But such a design is usually unacceptable in our motivating problems. We will thus constrain to the set of monotone design functions in Section 3.2.
Before doing that, we present an alternative solution to (4) assuming the running variable distribution has an additional moment. This result shows that when the running variable is continuous, an optimal design with no randomization always exists. However, randomized assignment becomes essential once we restrict our attention to monotone designs in Section 3.2, as the only non-randomized monotone designs are generalized RDDs (Remark 2).
Theorem 1.
Proof.
The solutions to (4) are precisely the feasible design functions where is a solution to (17). Fix any such solution to (17); it suffices to find a feasible design with . If is the lower (resp. upper) endpoint of the interval , then by Proposition 1, the unique feasible design with is the design (resp. ). Then the result follows with (resp. ).
Otherwise, is in the interior of , and we aim to apply Lemma 3 with , for , and . With closed and convex as shown in Proposition 1, we only need to show . With the interior of being nonempty, there is more than one feasible design and the uniqueness result of Lemma 1 indicates that we must have . With by assumption and , indeed .
Applying Lemma 3 we know that there exists a feasible design with and for some . If had only one real root , then the negative leading coefficient indicates when and when . This would imply is a design that always treats all subjects with and never treats any subject with , which cannot be input-feasible. We conclude for some finite which are the roots of . This shows the existence of of the form (25). ∎
3.2 Imposing a monotonicity constraint
We now apply Lemma 3 to solve (4) in the case of principal interest, where is the set of all monotone design functions. Note that the lower bound that we imposed in Section 2.2 does not exclude any monotone designs. If is monotone then and necessarily have a nonnegative covariance .
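The nonnegative covariance claim can be checked numerically. The sketch below is only an illustration under stated assumptions: the running variable is uniform over a finite sample, the treatment indicator is coded as +1 or -1, and the helper names `cov_x_z` and `random_monotone_design` are hypothetical. It draws random non-decreasing designs and records the smallest covariance observed.

```python
import random

def cov_x_z(xs, probs):
    """Covariance of x and z when x is uniform over xs and
    z = +1 with probability probs[i], z = -1 otherwise (assumed coding)."""
    n = len(xs)
    mean_x = sum(xs) / n
    ez = [2.0 * p - 1.0 for p in probs]  # E[z | x_i]
    mean_z = sum(ez) / n
    return sum((x - mean_x) * (e - mean_z) for x, e in zip(xs, ez)) / n

def random_monotone_design(n, rng):
    """Non-decreasing treatment probabilities in [0, 1], one per sorted x."""
    return sorted(rng.random() for _ in range(n))

rng = random.Random(0)
xs = sorted(rng.gauss(0.0, 1.0) for _ in range(200))

# Smallest covariance seen over 100 random monotone designs.
worst = min(
    cov_x_z(xs, random_monotone_design(len(xs), rng)) for _ in range(100)
)
```

Because both the sorted running variable values and the treatment probabilities are non-decreasing sequences, every sampled covariance should be nonnegative, in line with the statement above.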
Our argument follows the outline of Section 3.1. Suppose that and are solutions to (19) with the set of monotone design functions, which we distinguish from the optimal designs and of Section 3.1. As is convex, Lemma 2 applies, and thus a solution to (4) is given by (20), replacing and by and , respectively. Note that may differ from its value in Section 3.1 since has changed.
We now characterize the designs and . As in Proposition 1, these designs always exist and are unique for any . When is continuous, these are monotone two level designs and as defined in (7). For general the designs and may differ from these designs at the single discontinuity.
Proposition 2.
For any , there exist unique solutions and to the optimization problems (19), when is the set of all monotone design functions. These solutions are the unique feasible designs satisfying
(26) |
for some and constants , which all depend on , where and may be infinite if .
Proof.
If , the only feasible monotone design is the fully randomized design , and the proposition holds trivially with , , and . Likewise, if then the desired results follow by Lemma 1 (take , , and with as in Lemma 1). Thus, we assume that . Again, we only write out the argument for ; the proof for is completely analogous.
Once again, we are in the setting of Lemma 3 with , , , and . The only difference from Proposition 1 is the definition of , so we must verify that the conditions on the corresponding and are satisfied. Since is linear in for all , and any convex combination of monotone functions is monotone (cf. Lemma 2), is convex. Now suppose that is a limit point of . Then there exists a sequence with as . As is sequentially compact, there exists a subsequence and with pointwise. But then by dominated convergence, so is closed. Finally, where
Hence the assumption that ensures that .
With the conditions of Lemma 3 once again satisfied, we now show that (22) is equivalent to (26) for any (monotone) feasible . First assume feasible satisfies (22), i.e. for some . The polynomial having no real roots would mean this condition is equivalent to , contradicting . Hence we can factor for some . Considering the sign of and monotonicity of any we see
This inequality is strict unless for almost every and
with probability one under , i.e., for some and almost every . Therefore any design in must satisfy the first condition in (26). Conversely, if a feasible, monotone satisfies (26) then let be such that . Such exists since assuming WLOG that , is continuous on with . Considering the signs of we get for any
and so satisfies (22) with , and . The second claim in Lemma 3 then ensures that the set of optimal solutions to (19) consists of precisely those feasible, monotone designs satisfying (26). Such a design must exist by the first claim of Lemma 3. The remaining uniqueness claims are shown in Appendix C. ∎
Analogous to Theorem 1, if we assume the running variable has a third moment then we have a solution to (4) of a simpler form than (20). When the running variable is continuous, this solution will be a two level design for some . In general, when is not continuous, we may need a different treatment probability at the discontinuity .
Theorem 2.
Suppose . Then when is the set of all monotone design functions, for any there exists a solution to (4) with
(27) |
for some and .
Proof.
The proof structure is similar to that of Theorem 1. Fix any solution to (17). If it is an endpoint of then the unique solution to (4) is or from Proposition 2, which takes the form (27) with or , respectively. Otherwise we apply Lemma 3 with , for , and . The lemma applies since is closed and convex from the proof of Proposition 2, and since our assumption that indicates there is more than one feasible design, so .
Applying Lemma 3 we see that there exists a design that solves (4) with for some . Here, as in the proof of Theorem 1, . We show this implies is of the form (27) using the following claim.
Claim: Suppose w.p.1 for some and is the set of monotone design functions. Then implies w.p.1.
Proof of claim: For any monotone design we can define so that is nonnegative, and zero iff for almost all . Similarly by considering , we conclude for almost all .
We notice that has either one real root or three real roots . If is the only root, we know when and when , since the leading coefficient of is negative. Thus we can apply the claim directly to show that is constant, in particular of the form (27) with . If there are three real roots we show is of this form with . Let () be the conditional distribution of given (), so
We conclude the condition implies for almost all and for almost all by applying the claim twice (once for , once for ). ∎
In general, the optimal designs derived in Theorems 1 and 2 are not unique when is not on the boundary of . For example, in nondegenerate cases the solution in (27) typically has two levels, while the solution in (20) (with the monotonicity constraint) will have three levels. As another example, the three level tie-breaker found by Owen and Varian, (2020) to be optimal when is uniform and does not take the form (25) whenever . Conversely, Propositions 1 and 2 guarantee a unique optimal design when is one of the endpoints of .
4 Exploration-exploitation trade-off
As discussed in Section 1, Owen and Varian, (2020) showed that when and , the efficiency (under their criterion ) of the three level tie-breaker (8) is monotonically increasing in the width of the randomization window. As is a strictly decreasing function of , and the three level tie-breaker solves (4) for all , they conclude that there is a monotone trade-off between short-term gain and statistical efficiency. In other words, greater statistical efficiency from an optimal design requires giving up short-term gain.
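The dependence of short-term gain on the randomization window can be illustrated with a small sketch. It rests on assumptions not restated in this section: the running variable is uniform on [-1, 1], the treatment indicator is coded as +1 or -1, the three level tie-breaker treats with probability 0, 1/2, and 1 below, inside, and above a window of half-width `delta`, and short-term gain is measured by E[xz]. The function names are hypothetical. Under these assumptions the gain is (1 - delta^2) / 2, which the code checks against a midpoint-rule integral.

```python
def three_level_prob(x, delta):
    """Treatment probability of a three level tie-breaker (assumed form)."""
    if x < -delta:
        return 0.0
    if x > delta:
        return 1.0
    return 0.5

def gain_closed_form(delta):
    """E[x z] for x ~ Uniform(-1, 1) and z in {-1, +1} (assumed coding)."""
    return (1.0 - delta ** 2) / 2.0

def gain_numeric(delta, n=20000):
    """Midpoint-rule approximation of E[x (2 p(x) - 1)] with density 1/2."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        total += x * (2.0 * three_level_prob(x, delta) - 1.0) * 0.5 * h
    return total

# Gain at a few window half-widths: widening the window lowers the gain.
gains = [gain_closed_form(d) for d in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

With `delta = 0` the design is the sharp RDD and the gain is largest; with `delta = 1` every subject is randomized and the gain is zero, which matches the strictly decreasing relationship described above.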
We now extend these results to general and other running variable distributions. Hereafter denotes an optimal design without the monotonicity constraint, to be contrasted with of Section 3.2. Note we have made the dependence of these designs on explicit. We use the same efficiency criterion as Owen and Varian, (2020). Recall this is a -optimality criterion corresponding to the scaled asymptotic variance of the OLS estimate for in (1), and equivalent to -optimality for our problem (4) by Section 2.3.
Theorem 3.
Suppose the distribution function of the running variable has a positive derivative everywhere in , the smallest open interval with . If additionally , then fixing any , is decreasing in .
Proof.
See Appendix D. ∎
It turns out, however, that the gain versus efficiency trade-off is no longer monotone under the monotonicity constraint. Indeed, our next theorem shows that whenever , if is symmetric (or indeed, not extremely skewed), the fully randomized design is inadmissible for any , in the sense that there exists a different monotone design with but both and . In other words, the RCT is no longer admissible under when .
Theorem 4.
Fix , and assume satisfies the conditions of Theorem 3. If assume that ; otherwise assume that . Here and . Let be the fully randomized monotone design with , so that . Then there exists a monotone design such that yet both and .
Proof.
See Appendix E. ∎
5 Examples
In this section, we compute the optimal exploration-exploitation trade-off curves investigated in Section 4 for several specific running variable distributions . We can obtain large gains in efficiency under the criterion by moving from the three level tie-breaker design to , without sacrificing short-term gain. We see further (generally smaller) improvements when we remove the monotonicity constraint and move from to .
To generate these curves we compute optimal designs and and evaluate their efficiency for various fixed as we vary the short-term gain constraint over a fine grid covering . For interpretability we write and specify short-term gain with the normalized parameter , as discussed in Section 2.2. When is continuous, solutions and to (4) are computed by noting that we can write , , , and by Propositions 1 and 2. Each of these designs has two unknown parameters that must be the unique solutions to the two feasibility constraints and . Given these parameters, we can apply (20) to compute and . If we could also get an optimal design of the form in Theorem 1. First we compute via (17), noting that the endpoints of are and . Then (17) is simply maximizing a continuous function over a closed interval, so it can be handled by standard methods such as Brent’s algorithm (Brent,, 1973). Given we can then numerically search for , , such that is feasible with . By Theorem 1 such a solution will exist and be optimal. We can do a similar search for an optimal two level design under the monotonicity constraint, by Theorem 2.
5.1 Uniform running variable
Design | Parameter | Value |
---|---|---|
We begin with the case . This is the distribution most extensively studied by Owen and Varian, (2020), and allows closed form expressions for the parameters in , , , and , given in Table 1. Figure 1 shows plots of versus for under different designs: the three level tie-breaker , a globally optimal design , and an optimal monotone design . Since is symmetric, the curves would be identical if were replaced with .
As shown in Owen and Varian, (2020), under the constraint the three level tie-breaker is optimal for all , and thus the three level tie-breaker, , and all attain the optimal efficiency, as can be seen in the top left panel of Figure 1. The proof of Theorem 3 shows this would hold for any continuous, symmetric running variable distribution . As moves away from 0, however, we see that the three level tie-breaker becomes increasingly less efficient relative to both the optimal monotone design and the optimal design. At the same time, the range of short-term gain values attainable by three level tie-breaker designs becomes smaller relative to the full range achievable by arbitrary designs. Note that Figure 1 plots the reciprocal of the efficiency criterion , so that it can be interpreted as an asymptotic variance for via (15), and compared with the plots in Owen and Varian, (2020).


Design | Description | Normalized short-term gain | |
---|---|---|---|
Sharp RDD | |||
3 level tie-breaker | |||
Optimal monotone design | |||
Optimal design |
Table 2 extends Table 2 of Owen and Varian, (2020), referring to a setting in which only 15% of subjects are to be treated (). That table shows the inverse efficiency of the sharp RDD is 223.44, while the three level tie-breaker reduces this by about 40% to , at the cost of around 2% of the short-term gain of the sharp RDD over the RCT. Then and , further improving efficiency for designs and achieving the same short-term gain as the three level tie-breaker. For this example, we can directly compute with (31) that is the unique optimal monotone design, where by Table 1, and . In other words, the unique optimal monotone tie-breaker design deterministically assigns treatment to the top 14.7%, and gives the other subjects an equal, small (0.34%) chance of treatment.
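As a sanity check on the rounded figures quoted above, the following sketch recomputes the treated fraction and the relative loss in short-term gain. It assumes a running variable uniform on [-1, 1] and a treatment indicator coded as +1 or -1; the function name `two_level_gain` is hypothetical, and the inputs 14.7% and 0.34% are the rounded values from the text, so the results are approximate.

```python
top_fraction = 0.147  # subjects treated with probability one (from the text)
low_prob = 0.0034     # treatment probability for everyone else (from the text)

# Overall fraction treated: the top group plus a small share of the rest.
treated_fraction = top_fraction + (1.0 - top_fraction) * low_prob

def two_level_gain(threshold, low_prob):
    """E[x z] for x ~ Uniform(-1, 1) when p(x) = low_prob below the
    threshold and 1 at or above it, with z in {-1, +1} (assumed coding)."""
    return (1.0 - low_prob) * (1.0 - threshold ** 2) / 2.0

# Under the uniform assumption the top 14.7% lies above 1 - 2 * 0.147.
design_gain = two_level_gain(1.0 - 2.0 * top_fraction, low_prob)
# The sharp RDD treats exactly the top 15% and nobody else.
sharp_gain = two_level_gain(1.0 - 2.0 * 0.15, 0.0)
relative_loss = 1.0 - design_gain / sharp_gain
```

Under these assumptions `treated_fraction` comes out very close to 0.15 and `relative_loss` close to 0.02, consistent with the 15% treatment budget and the roughly 2% loss in short-term gain reported above.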
A limitation of this analysis is that in many practical settings, the two-line regression model will not fit very well over the entire range of values. In that case the investigator might use a narrower data range, essentially fitting a less asymmetric two-line model, as illustrated in Owen and Varian, (2020). This is equivalent to using a local linear regression with a rectangular “boxcar” kernel. In this setting, we know from Figure 1 that when the treatment proportion is not exactly 50%, we can always do better than the three level tie-breaker using a monotone two level design. Even with a small asymmetry, e.g. 40% treatment (), we see a noticeable efficiency increase between the three level tie-breaker and an optimal monotone design across all values of .
Finally, consistent with the results of Section 4, we observe in Figure 1 that decreases with the gain parameter for each , while near , increases with for all . This clearly demonstrates the inadmissibility of the fully randomized design from Theorem 4. For example, if we fix (so 25% of the subjects are to be treated), the fully randomized design has efficiency 0.25 and no short-term gain (), while has higher efficiency (0.28) with short-term gain . However, if we remove the monotonicity constraint, by Theorem 3 is the most efficient design over all attainable gain values, attaining efficiency 0.33 with .
5.2 Skewed running variable


We now repeat the analysis of Section 5.1 for a skewed running variable distribution :
(28) |
for . This corresponds to a mean-centered Weibull distribution with shape parameter and scale parameter . Figure 2 shows the trade-off curves under this distribution . We see, as expected by Theorem 4, that once again the fully randomized design is inadmissible, even within the class of monotone designs, when .
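For readers who want to experiment with a distribution of this type, here is a minimal sketch of a mean-centered Weibull distribution function and its quantile function. The shape and scale are left as function arguments; the values used in the usage lines are purely illustrative and are not claimed to match the parameters behind (28). All function names are hypothetical.

```python
import math

def weibull_mean(shape, scale):
    """Mean of a Weibull(shape, scale) distribution."""
    return scale * math.gamma(1.0 + 1.0 / shape)

def centered_weibull_cdf(x, shape, scale):
    """Distribution function of a Weibull variable minus its mean."""
    shifted = x + weibull_mean(shape, scale)
    if shifted <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((shifted / scale) ** shape))

def centered_weibull_quantile(u, shape, scale):
    """Inverse of centered_weibull_cdf for 0 <= u < 1."""
    raw = scale * (-math.log(1.0 - u)) ** (1.0 / shape)
    return raw - weibull_mean(shape, scale)

# Illustrative values only (hypothetical, not the parameters of (28)).
example_shape, example_scale = 2.0, 1.0
median = centered_weibull_quantile(0.5, example_shape, example_scale)
```

Since the centered distribution has mean zero, a median different from zero is a quick indicator of skewness: it is negative when the distribution is right-skewed.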
Another notable feature when is not symmetric is that the three level tie-breaker is no longer optimal, even in the balanced case . While the unconstrained optimal design attains the lower bound for a wide range of short-term gains, Figure 2 shows the three level tie-breaker does not, except in the case corresponding to the RCT. In Figure 2, we see the optimal design is over 100 times as efficient as the three level tie-breaker for sufficiently large , even in the balanced setting . In the unbalanced treatment cases we also see a range of values for which optimal designs with and without the monotonicity constraint attain the same value of . In those situations there exists a globally optimal design that is also monotone.
5.3 Fixed- data example
We now illustrate how to compute optimal designs for the original fixed- problem (3) using a real data example. Ludwig and Miller, (2007) used an RDD to analyze the impact of Head Start, a U.S. government program launched in 1965 that provides benefits such as preschool and health services to children in low-income families. When the program was launched, extra grant-writing assistance was provided to the 300 counties with the highest poverty rates in the country. This created a natural discontinuity in the amount of funding to counties as a function of , a county poverty index based on the 1960 U.S. Census. The distribution of over counties is shown in Figure 3. The data is made freely available by Cattaneo et al., (2017).
If the government had deemed it ethical to somewhat randomize the 300 counties receiving the grant-writing assistance, it could have more efficiently estimated the causal impact of this assistance using our , while still ensuring poorer counties are preferentially helped, and no county has a lower chance of getting the assistance than a more well-off county. As in the data example of Kluger and Owen, (2021), we do not observe the potential outcomes, so we cannot actually implement such a design and compute any estimators. However, we can still study statistical efficiencies, which depend only on the expected information matrix .

We fix the treatment fraction at , corresponding to . Varying the short-term gain constraint we seek to compute and . We describe how to compute the former. Because is discrete it suffices to only consider discontinuity points where places positive probability mass, as every design of the form of in (26) has a representation in that form with such . Also, given the values of the discontinuity and , there is at most one value such that the resulting design in the form of in (26) satisfies the treatment fraction constraint . When such an exists for some , call the corresponding design (note we suppress the dependence on ). From Appendix C we deduce that and if , or if and . This shows we can efficiently find the unique so that satisfies the desired short-term gain constraint . In particular we compute via a binary search on , then solve for to satisfy . Given sorted , this entire procedure computes in operations, as for each , and can be computed in constant time using (5) given the partial sums .
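The role of the partial sums can be sketched as follows. This is not the authors' implementation (theirs is in R); it is a minimal illustration that assumes distinct running variable values with equal probability mass, a treatment indicator coded as +1 or -1, and a generic step design with one level below a discontinuity, one level at it, and one level above it. The class `StepDesignMoments` and its method names are hypothetical. After a single pass to build prefix sums, the two constrained moments of any such design are available in constant time, and the level at the discontinuity that meets a treatment fraction target follows from a linear equation.

```python
from itertools import accumulate

class StepDesignMoments:
    """Constant-time moments of step designs over sorted, equally weighted x."""

    def __init__(self, xs):
        self.xs = sorted(xs)
        self.n = len(self.xs)
        # prefix[k] is the sum of the k smallest x values.
        self.prefix = [0.0] + list(accumulate(self.xs))

    def moments(self, k, lo, mid, hi):
        """Return (E[z], E[x z]) for the design with probability lo below
        index k, mid at index k, and hi above index k."""
        n, s = self.n, self.prefix
        mean_p = (lo * k + mid + hi * (n - k - 1)) / n
        mean_xp = (lo * s[k] + mid * self.xs[k] + hi * (s[n] - s[k + 1])) / n
        return 2.0 * mean_p - 1.0, 2.0 * mean_xp - s[n] / n

    def solve_mid(self, k, lo, hi, target_e_z):
        """Level at index k that attains E[z] = target_e_z, or None if
        no value in [0, 1] works for this discontinuity."""
        target_mean_p = (target_e_z + 1.0) / 2.0
        mid = self.n * target_mean_p - lo * k - hi * (self.n - k - 1)
        return mid if -1e-12 <= mid <= 1.0 + 1e-12 else None

design = StepDesignMoments([-2, -1, 0, 1, 2])
e_z, e_xz = design.moments(k=2, lo=0.0, mid=0.5, hi=1.0)
```

A binary search over the index `k` can then call `moments` at each step without rescanning the data, which is what keeps the overall cost low once the values are sorted.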
After computing with a similar approach, we can apply (20) to compute an optimal design . As in the continuous case, we can alternately obtain a solution of the form in Theorem 2 by finding , , , and such that , , and for . Unlike the continuous case, we now have 4 unknown parameters instead of 3. We can search for an optimal set of these parameters by looping through the finite possible values of and then doing a univariate search for , noting that knowledge of and determines and by the equality constraint parameters . We implemented this search, along with the procedure to compute and described above, in the R language (R Core Team,, 2022). The code is freely available online at https://github.com/hli90722/optimal_tiebreaker_designs.
The right panel of Figure 3 shows the inverse efficiency for the three level tie-breaker (9) versus the best two level monotone design obtained by applying the above procedure to the in the Head Start data. It turns out that for these and our choice of , is optimal for all (and hence the unique optimal design, by Proposition 2). We note that with a normalized short-term gain , which corresponds to random assignment for about 150 counties in the three level tie-breaker, the optimal monotone two level design has inverse efficiency 0.030, compared to 0.050 for the three level tie-breaker. That is, confidence intervals for using the three level tie-breaker would be about 29% wider than for the optimal monotone two level design, without additional short-term gain. The sharp RDD would give 62% wider intervals than the optimal monotone two level design with only about 4.2% additional short-term gain.
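The 29% figure follows from the fact that confidence interval width scales with the square root of the asymptotic variance, which the inverse efficiency represents here. A minimal check, with a hypothetical helper name and the rounded inverse efficiencies quoted above as inputs:

```python
import math

def relative_ci_width(inv_eff_a, inv_eff_b):
    """Ratio of confidence interval widths for design a relative to design b,
    assuming width is proportional to the square root of inverse efficiency."""
    return math.sqrt(inv_eff_a / inv_eff_b)

# Three level tie-breaker (0.050) versus optimal monotone design (0.030).
ratio = relative_ci_width(0.050, 0.030)  # about 1.29

# Inverse efficiency implied for the sharp RDD by "62% wider intervals";
# this is a back-calculation from rounded numbers, not a reported value.
implied_rdd_inv_eff = 0.030 * 1.62 ** 2
```

The same relationship run in reverse suggests an inverse efficiency of roughly 0.079 for the sharp RDD on these data, subject to rounding in the quoted percentages.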
6 Summary
Our results provide a thorough characterization of the solutions to a constrained optimal experiment design problem. Considering a linear regression model for a scalar outcome involving a binary treatment assignment indicator , a scalar running variable , and their interaction, we seek to specify a randomized treatment assignment scheme based on — a tie-breaker design — that optimizes a statistical efficiency criterion that is an arbitrary continuous function of the expected information matrix under this regression model. We have equality constraints on the proportion of subjects receiving treatment due to an external budget, and on the covariance between and due to a preference for treating subjects with higher values of . Critically, our proof techniques, which deviate from those typically used to show equivalence theorems, enable an additional monotonicity constraint. This allows our results to handle the ethical or economic requirement that a subject cannot have a lower chance of receiving the treatment than another subject with a lower value of .
In a setting where the running variable is viewed as random from some distribution (and thus part of the randomness in the expected information matrix defining the efficiency criterion), we prove the existence of constrained optimal designs that stratify into a small number of intervals and assign treatment with the same probability to all individuals within each stratum. In particular, with the monotonicity constraint that is essential in our motivating applications, we only need three strata, one of which only contains a single running variable value. We also provide strong conditions under which the optimal tie-breaker design is unique. We emphasize the generality of our results, which apply for any continuous efficiency criterion, any running variable distribution (subject only to weak moment existence conditions), and the full range of feasible equality constraints. The problem an investigator faces in practice, where there are a finite number of running variable values known (hence non-random) at the time of treatment assignment, is a special case of our more general problem where is discrete and takes on values with equal probability. This enables optimal designs to be easily computed in practice, as described in Section 5.3.
We believe that this work provides a useful starting point to study optimal tie-breaker designs. For results on tie-breaker designs beyond the two line parametric regression, see Morrison and Owen, (2022) for a multivariate regression context and Kluger and Owen, (2021) for local linear regression models with a scalar running variable.
Appendix A Proof of Corollary 1
For any we have where is as in Lemma 1. The desired condition is equivalent to and so it suffices to show .
Applying Cauchy-Schwarz to and then yields the two equations
where we have used the fact since . Note both inequalities are strict, since cannot equal a scalar multiple of w.p.1. If it did, then for some , implying and hence w.p.1, contradicting . As we know that either or .
Appendix B Proof of uniqueness in Proposition 1
We show uniqueness for . The same argument shows uniqueness for and hence uniqueness for . Suppose that and are both solutions for . By symmetry we can assume that either , or both and . Since and are feasible for (19), we must have and , in view of (5). We show that w.p.1. under . Note that we can assume without loss of generality that for any , because otherwise, we could increase to without changing on a set of positive probability. We can similarly assume that for any . Finally, we impose these two canonicalizing conditions on and as well.
Assume first that . Then we cannot have because we would then need either or with and to enforce and this would cause . We similarly cannot have with both and . Therefore after canonicalizing, we know that both and are equivalent to designs of the form given with and along with the analogous conditions and . Then our canonicalized and satisfy for all and so in particular .
It remains to handle the case where . We then have since . If then the support of is completely to the right of that of which violates . We can similarly rule out . As a result must have . Then we must have or else . For the same reason, we must have . It then follows that both and have support and then forces , so .
Appendix C Proof of uniqueness in Proposition 2
We focus on and consider two monotone designs and satisfying the feasibility constraints and along with the characterization of in (26). Then and for some with and . Note the cases (and the same for ) are excluded by the assumptions that and . Also, also guarantees . Finally, we note that we only have to show for almost all , since then ensures either or ; in either case this gives w.p.1. By symmetry we can assume that with if . Then for all .
Now we compute
If , then the right-hand side reduces to just . This is nonzero unless or . In both cases for almost all .
If , then we can assume (otherwise the problem reduces to the case ). First suppose . Then for all and so the treatment fraction constraint would require the identity
to hold with equality w.p.1. But since , equality w.p.1. can only occur if and . In that case we immediately see for almost all , but so for almost all as well. Conversely, if we suppose , then requires and for almost all , so once again for almost all .
Appendix D Proof of Theorem 3
For any feasible design , we have
(29) |
where as in (2.3). Thus is a concave quadratic function of globally maximized at
(30) |
It follows that is the point in closest in absolute value to , i.e.
(31) |
The above holds for any choice of ; for the remainder of this proof we take to be the set of all measurable design functions.
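The step from (30) to (31) uses a general fact: a concave quadratic restricted to a closed interval is maximized at the point of the interval closest to its unconstrained maximizer. The sketch below illustrates this with hypothetical coefficients `a` and `b`; it does not reproduce the specific coefficients of (29).

```python
def argmax_concave_quadratic(a, b, lo, hi):
    """Maximizer of f(t) = -a * t**2 + b * t + c over [lo, hi] with a > 0.
    The constant c does not affect the maximizer."""
    if a <= 0:
        raise ValueError("a must be positive for the quadratic to be concave")
    if lo > hi:
        raise ValueError("interval endpoints must satisfy lo <= hi")
    vertex = b / (2.0 * a)  # unconstrained maximizer
    # Clip the vertex to the interval: the closest feasible point is optimal.
    return min(max(vertex, lo), hi)

inside = argmax_concave_quadratic(1.0, 0.5, -1.0, 1.0)   # vertex 0.25 is feasible
clipped = argmax_concave_quadratic(1.0, 4.0, -1.0, 1.0)  # vertex 2.0 clips to 1.0
```

Clipping works because a concave quadratic is increasing to the left of its vertex and decreasing to the right, so moving toward the vertex never lowers the objective.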
We first show the case where . Note that for any symmetric design , by symmetry of the running variable distribution. By continuity, for any there exists such that the three level tie-breaker (which is symmetric and always satisfies ) satisfies too. This shows that for all , and hence , meaning any feasible design with is optimal. Then by (29)
which is decreasing in on , showing the theorem for .
For the cases and , we begin with the following two claims.
Claim 1.
For any with , we have .
Claim 2.
For any with , we have .
Proof of Claims 1 and 2.
We write and similarly rewrite . For claim 1, we proceed by writing by Proposition 1 (suppressing the dependence of and on in our notation) and performing casework on the signs of and to show that in each case. In the case we have by symmetry; similarly if then . Next, if then since implies by (5). Therefore
where the final inequality uses symmetry of again. The final case follows by a symmetric argument. The proof of Claim 2 is completely analogous, with by Proposition 1. ∎
We now proceed to prove the theorem. Given Claim 1, we have by (31), and hence suppressing some dependences
where and is defined by substituting into (29). We must show that is decreasing on .
First, we compute and note it is decreasing in since is positive (Corollary 1) and decreasing in on . Next, we show is decreasing in . Note are the unique solutions to the system
By the implicit function theorem (e.g., De Oliveira, (2018), since we do not require continuity of ), it follows that and are differentiable and satisfy
(32) |
(33) |
Equations (32) and (33) imply that
The inequality follows by the assumption which ensures and must have different signs, and then noting that requires , the equality following by symmetry of . Thus, for all such that we have
The RHS is negative (Corollary 1), so is in fact decreasing in .
Finally, we fix and show .
Note is continuous in ,
and . We now carry out casework on the signs of and .
: In this case
(34) |
and : Define , which contains . Letting , we have and for , so
(35) |
and : In this case either on (so ), or as defined in the previous case is non-empty with and satisfying and . Then
which shows the theorem when . The proof of the case is completely symmetric, and relies on Claim 2.
Appendix E Proof of Theorem 4
First we fix . It suffices to show that assuming , there exists such that whenever , and that is continuous in at .
From the assumed continuity of and Proposition 2, we have , with ensuring . Again, we suppress the dependence of and on in our notation for brevity. By the treatment fraction constraint , we must have . From the short-term gain constraint we see
We know by Proposition 2 and continuity of that the two equations above have a unique solution for . Thus, we can differentiate both of the equations above with respect to to see that the derivatives of and are given by
Then is differentiable as well with
Next, note that , in the notation of (30). By differentiability (and thus continuity) of and (the latter due to differentiability of and ), we conclude that there exists such that for all . By (31), this means for all . Thus, it suffices to show for all , for some . Continuity of at follows immediately from continuity of and (29).
As we have and and also . In the case we have by assumption. If , then as . Finally, we substitute into the formula (29) for getting
Since as , we have (Corollary 1). Our analysis of the limiting behavior on then indicates that
The proof for the case is completely analogous. We first show that whenever is sufficiently close to 0. Then we note is the unique solution to the equations and to compute the derivatives and . This enables us to show
under the condition .
Acknowledgments
This work was supported by the US National Science Foundation under grants IIS-1837931 and DMS-2152780. The authors would like to thank Kevin Guo, Dan Kluger, Tim Morrison, and several anonymous reviewers for helpful comments.
References
- Abdulkadiroglu et al., (2017) Abdulkadiroglu, A., Angrist, J. D., Narita, Y., and Pathak, P. A. (2017). Impact evaluation in matching markets with general tie-breaking. Technical report, National Bureau of Economic Research.
- Aiken et al., (1998) Aiken, L. S., West, S. G., Schwalm, D. E., Carroll, J. L., and Hsiung, S. (1998). Comparison of a randomized and two quasi-experimental designs in a single outcome evaluation: Efficacy of a university-level remedial writing program. Evaluation Review, 22(2):207–244.
- Angrist et al., (2020) Angrist, J., Autor, D., and Pallais, A. (2020). Marginal effects of merit aid for low-income students. Technical report, National Bureau of Economic Research.
- Atkinson et al., (2007) Atkinson, A., Donev, A., and Tobias, R. (2007). Optimum experimental designs, with SAS. Oxford University Press.
- Atkinson, (1982) Atkinson, A. C. (1982). Optimum biased coin designs for sequential clinical trials with prognostic factors. Biometrika, 69(1):61–67.
- Atkinson, (2014) Atkinson, A. C. (2014). Selecting a biased-coin design. Statistical Science, 29(1):144–163.
- Bandyopadhyay and Biswas, (2001) Bandyopadhyay, U. and Biswas, A. (2001). Adaptive designs for normal responses with prognostic factors. Biometrika, 88(2):409–419.
- Biswas and Bhattacharya, (2018) Biswas, A. and Bhattacharya, R. (2018). A class of covariate-adjusted response-adaptive allocation designs for multitreatment binary response trials. Journal of Biopharmaceutical Statistics, 28(5):809–823.
- Boyd and Vandenberghe, (2004) Boyd, S. and Vandenberghe, L. (2004). Convex optimization. Cambridge University Press, Cambridge.
- Brent, (1973) Brent, R. P. (1973). Algorithms for minimization without derivatives. Prentice-Hall, Inc., Englewood Cliffs, NJ.
- Campbell, (1969) Campbell, D. T. (1969). Reforms as experiments. American Psychologist, 24(4):409.
- Cattaneo et al., (2017) Cattaneo, M. D., Titiunik, R., and Vazquez-Bare, G. (2017). Comparing inference approaches for RD designs: A reexamination of the effect of Head Start on child mortality. Journal of Policy Analysis and Management, 36(3):643–681.
- Chernoff, (1953) Chernoff, H. (1953). Locally optimal designs for estimating parameters. The Annals of Mathematical Statistics, pages 586–602.
- Clyde and Chaloner, (1996) Clyde, M. and Chaloner, K. (1996). The equivalence of constrained and weighted designs in multiple objective design problems. Journal of the American Statistical Association, 91(435):1236–1244.
- Cook and Wong, (1994) Cook, R. D. and Wong, W. K. (1994). On the equivalence of constrained and compound optimal designs. Journal of the American Statistical Association, 89(426):687–692.
- Dantzig and Wald, (1951) Dantzig, G. B. and Wald, A. (1951). On the fundamental lemma of Neyman and Pearson. The Annals of Mathematical Statistics, 22(1):87–93.
- De Oliveira, (2018) De Oliveira, O. (2018). The implicit function theorem for maps that are only differentiable: An elementary proof. Real Analysis Exchange, 43(2):429–444.
- Efron, (1971) Efron, B. (1971). Forcing a sequential experiment to be balanced. Biometrika, 58(3):403–417.
- Gelman and Imbens, (2017) Gelman, A. and Imbens, G. (2017). Why high-order polynomials should not be used in regression discontinuity designs. Journal of Business & Economic Statistics, 37(3):447–456.
- Goldberger, (1972) Goldberger, A. S. (1972). Selection bias in evaluating treatment effects: Some formal illustrations. Technical Report Discussion paper 128–72, Institute for Research on Poverty, University of Wisconsin–Madison.
- Goldenshluger and Zeevi, (2013) Goldenshluger, A. and Zeevi, A. (2013). A linear response bandit problem. Stochastic Systems, 3(1):230–261.
- Hu and Rosenberger, (2006) Hu, F. and Rosenberger, W. F. (2006). The theory of response-adaptive randomization in clinical trials. John Wiley & Sons.
- Hu et al., (2015) Hu, J., Zhu, H., and Hu, F. (2015). A unified family of covariate-adjusted response-adaptive designs based on efficiency and ethics. Journal of the American Statistical Association, 110(509):357–367.
- Jacob et al., (2012) Jacob, R., Zhu, P., Somers, M.-A., and Bloom, H. (2012). A practical guide to regression discontinuity. MDRC.
- Kluger and Owen, (2021) Kluger, D. and Owen, A. B. (2021). Tie-breaker designs provide more efficient kernel estimates than regression discontinuity designs. Technical Report arXiv:2101.09605, Stanford University.
- Läuter, (1974) Läuter, E. (1974). Experimental design in a class of models. Mathematische Operationsforschung und Statistik, 5(4-5):379–398.
- Läuter, (1976) Läuter, E. (1976). Optimal multipurpose designs for regression models. Mathematische Operationsforschung und Statistik, 7(1):51–68.
- Lee, (1987) Lee, C. M.-S. (1987). Constrained optimal designs for regression models. Communications in Statistics-Theory and Methods, 16(3):765–783.
- Lee, (1988) Lee, C. M.-S. (1988). Constrained optimal designs. Journal of Statistical Planning and Inference, 18(3):377–389.
- Lehmann and Romano, (2005) Lehmann, E. L. and Romano, J. P. (2005). Testing statistical hypotheses, volume 3. Springer, New York.
- Lipsey et al., (1981) Lipsey, M. W., Cordray, D. S., and Berger, D. E. (1981). Evaluation of a juvenile diversion program: Using multiple lines of evidence. Evaluation Review, 5(3):283–306.
- Ludwig and Miller, (2007) Ludwig, J. and Miller, D. L. (2007). Does Head Start improve children’s life chances? Evidence from a regression discontinuity design. The Quarterly Journal of Economics, 122(1):159–208.
- Metelkina and Pronzato, (2017) Metelkina, A. and Pronzato, L. (2017). Information-regret compromise in covariate-adaptive treatment allocation. The Annals of Statistics, 45(5):2046–2073.
- Morrison and Owen, (2022) Morrison, T. P. and Owen, A. B. (2022). Optimality in multivariate tie-breaker designs. Technical Report arXiv:2202.10030, Stanford University.
- Neyman and Pearson, (1933) Neyman, J. and Pearson, E. S. (1933). IX. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London. Series A, 231(694-706):289–337.
- Owen and Varian, (2020) Owen, A. B. and Varian, H. (2020). Optimizing the tie-breaker regression discontinuity design. Electronic Journal of Statistics, 14(2):4004–4027.
- R Core Team, (2022) R Core Team (2022). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
- Rosenberger and Sverdlov, (2008) Rosenberger, W. F. and Sverdlov, O. (2008). Handling covariates in the design of clinical trials. Statistical Science, 23(3):404–419.
- Stigler, (1971) Stigler, S. M. (1971). Optimal experimental design for polynomial regression. Journal of the American Statistical Association, 66(334):311–318.
- Sverdlov et al., (2013) Sverdlov, O., Rosenberger, W. F., and Ryeznik, Y. (2013). Utility of covariate-adjusted response-adaptive randomization in survival trials. Statistics in Biopharmaceutical Research, 5(1):38–53.
- Thistlethwaite and Campbell, (1960) Thistlethwaite, D. L. and Campbell, D. T. (1960). Regression-discontinuity analysis: An alternative to the ex post facto experiment. Journal of Educational Psychology, 51(6):309.
- Trochim and Cappelleri, (1992) Trochim, W. M. and Cappelleri, J. C. (1992). Cutoff assignment strategies for enhancing randomized clinical trials. Controlled Clinical Trials, 13(3):190–212.
- Whittle, (1973) Whittle, P. (1973). Some general points in the theory of optimal experimental design. Journal of the Royal Statistical Society: Series B (Methodological), 35(1):123–130.
- Zhang et al., (2007) Zhang, L.-X., Hu, F., Cheung, S. H., and Chan, W. S. (2007). Asymptotic properties of covariate-adjusted response-adaptive designs. The Annals of Statistics, 35(3):1166–1182.
- Zhang and Hu, (2009) Zhang, L.-X. and Hu, F.-f. (2009). A new family of covariate-adjusted response adaptive designs and their properties. Applied Mathematics-A Journal of Chinese Universities, 24(1):1–13.