
Degrees of Randomness in Rerandomization Procedures

Connor T. Jerzak Assistant Professor, Government Department, The University of Texas at Austin. Email: [email protected] URL: ConnorJerzak.com ORCID: 0000-0001-9858-2050    Rebecca Goldstein Assistant Professor of Law, Jurisprudence and Social Policy Program, UC Berkeley School of Law, 2240 Piedmont Avenue, Berkeley CA 94720. Email: [email protected] URL: RebeccasGoldstein.com ORCID: 0000-0002-9944-8440

Abstract

Randomized controlled experiments are susceptible to imbalance on covariates predictive of the outcome. Rerandomization and deterministic treatment assignment are two proposed solutions. This paper explores the relationship between rerandomization and deterministic assignment, showing how deterministic assignment is an extreme case of rerandomization. The paper argues that in small experiments, both fully randomized and fully deterministic assignment have limitations. Instead, the researcher should consider setting the rerandomization acceptance probability based on an analysis of covariates and assumptions about the data structure to achieve an optimal alignment between randomness and balance. This allows for the calculation of minimum $p$-values along with valid permutation tests and fiducial intervals. The paper also introduces tools, including a new, open-source R package named fastrerandomize, to implement rerandomization and explore options for optimal rerandomization acceptance thresholds.
Keywords: Rerandomization; Design of experiments; Design-based inference; Optimally balanced randomization

1 Introduction

Although randomized controlled experiments are now a cornerstone of causal inference in the medical and social sciences, researchers have known since at least [14] that in finite samples, random assignment will often result in imbalance between the treated and control groups on covariates predictive of the outcome. This imbalance will generate conditional bias on estimates of the average treatment effect while adding noise to treatment effect estimates.

In a 2012 paper, Morgan and Rubin propose a solution: rerandomization [22]. They argue that if experimenters provide a quantitative definition of imbalance in advance, they can safely discard randomizations that are too imbalanced and obtain more precise estimates of treatment effects. After rerandomization, researchers adjust exact significance tests and fiducial intervals by excluding randomizations that would not have been accepted according to the initial acceptance threshold. Researchers can also calculate fiducial intervals by exploiting the duality between intervals and tests. Rerandomization has already gained traction as a best practice for field researchers: it is included as a recommended practice for minimizing covariate imbalance in the Handbook of Field Experiments [12].

At the same time, [17] revived a decades-old debate between Gossett (writing as “Student”) and Fisher by arguing that “experimenters might not want to randomize in general” because “[t]he treatment assignment that minimizes conditional expected loss is in general unique if there are continuous covariates, so that a deterministic assignment strictly dominates all randomized assignments.” A similar argument is made in [27, 3, 16] and proven in a theoretical framework by [2], who show that Bayesians and near-Bayesians should not randomize.

In this paper, we begin by formalizing the idea that these two solutions to the problem of covariate imbalance in randomized controlled trials – rerandomization (the Morgan and Rubin approach) and deterministic choice of an optimally balanced treatment assignment vector (the Kasy approach)—are closely related but very much in conflict. ([17] discusses the relationship between the two ideas in concept, writing that “I very much agree with this [Morgan and Rubin’s] argument; one way to think of the present paper is that it provides a formal foundation for this argument and takes it to its logical conclusion.”) In particular, we discuss how the deterministic approach in [17] is a case of the [22] approach, but with the threshold for acceptable randomizations set so high that there is only one acceptable treatment assignment vector—and thus an undefined minimum possible $p$-value and no way to carry out a randomization test.

This discussion is particularly useful in the context of very small experiments, which can arise in contexts such as clinical medicine and development economics. As [10] points out, if there is an experiment in which two villages are to be assigned to treatment and two to control, there are only six possible allocations of the treatment conditions, one of which would have been the self-selected allocation. Even with “hundreds of villages,” Deaton writes, “whether or not balance happens depends on how many factors have to be balanced, and nothing stops the actual allocation being the self-selected allocation that we would like to avoid.” Rerandomization, though, would stop the actual allocation from being a specific unbalanced allocation that we would like to avoid, and we formalize an approach to arriving at an optimal rerandomization that achieves covariate balance without being overly deterministic—thereby allowing use of randomization tests even with close-to-optimal balance.

The issues discussed here have resonance with long-standing issues raised by Neyman’s foundational 1923 paper. As Donald Rubin wrote in 1990, this paper “represents the first attempt to evaluate, formally or informally, the repeated-sampling properties over their nonnull randomization distributions…this contribution is uniquely and distinctly Neyman’s” (Rubin 1990). The insight that variance could be—and indeed needs to be—calculated in part based on variations over these distributions is a key conceptual piece of the modern potential outcomes framework.

One component of variance calculations over randomization distributions is expressed, in early form, in Neyman (1923)’s description of “true yields,” which Neyman describes as “repeat[ed]…measurement of the yield on the same fixed plot under the same conditions” (Neyman 1923). Rubin characterizes this as a way of describing “[v]ariation in efficacy of randomly chosen versions of the same treatment” (Rubin 1990), and this idea—that there can be variation in outcomes over randomly chosen versions of the same treatment—has its echo in current work on rerandomization. Our contribution here is to probe some of the consequences of researcher choices in the rerandomization context for exact significance tests for treatment effects, focusing on the ways in which there may or may not be an optimal degree of randomness in treatment assignment.

In the remainder of the paper, we answer the question of where experimenters should set the acceptance threshold for rerandomization, especially in small experiments. Our argument proceeds in three parts. First, we discuss how the minimum $p$-value achievable under a rerandomization scheme is a function of the acceptance probability, $p_a$. Then, we discuss how the minimum $p$-value determines the width of fiducial intervals when these intervals are generated using permutation-based tests that account for the rerandomization procedure. Finally, we propose a simple pre-experiment analysis to explore the optimal degree of randomness under varying assumptions about the data-generating process. In brief, the experimenter can use knowledge of the covariates and prior assumptions about the data structure to optimally select the degree of randomness allowed via the rerandomization threshold.

We make these rerandomization tools available in open-source software. The repository, available at github.com/cjerzak/fastrerandomize-software, contains tutorials on how to deploy this package on real data.

Criteria | Neyman | Fisher | Rerandomization
Objective | Estimate causal effects and obtain unbiased estimates | Randomize to ensure the validity of significance tests | Balance covariates to improve causal effect precision
Statistical assumption | Potential outcomes are fixed; random assignment of $\mathbf{W}$ | Random assignment of $\mathbf{W}$ | Random assignment of $\mathbf{W}$
Pre-design data requirements | Knowledge of number of observations (if completely randomizing) | Knowledge of number of observations (if completely randomizing) | Knowledge of number of observations (if completely randomizing); baseline pre-treatment covariates
Efficiency | Focused on deriving efficient estimators | Focused on exactness | Can improve efficiency, maintain exactness
Balance | Often achieved via stratification | Not guaranteed unless blocking; depends on sample size | Explicitly aims for balance
Table 1: Comparison of three paradigms in experimental design: Neymanian, Fisherian, and rerandomization-based inference.

2 Application to a Modern Agricultural Experiment: An Introduction

We motivate methodological problems discussed in this paper using a recent randomized experiment on agricultural tenancy contracts [7]. In this study, researchers randomized the assignment of five different tenancy contracts across three treatment arms to 304 tenant farmers in 237 Ugandan villages. The impetus for this study comes from the longstanding hypothesis in microeconomics that one reason for low agricultural production among tenant farmers is that tenancy contracts which turn over a large share (e.g., half) of a tenant farmer’s crops to the landowner effectively incentivize low output [21]. This hypothesis is most prominently due to Alfred Marshall and thus is sometimes called Marshallian inefficiency.

To test this empirically, the authors randomized tenants into three treatment arms reflecting three different tenancy contracts: one where tenants kept 50% of their crop (the standard tenancy contract in Uganda, labeled the Control arm in the study), one where they kept 75% (labeled Treatment 1 in the study), and one where they kept 50% of their output but earned an additional fixed payment—either as a cash transfer (labeled Treatment 2A) or as part of a lottery (labeled Treatment 2B).

Consistent with the idea of Marshallian inefficiency, [7] find that, when tenants are able to keep more of their own output (75% versus 50%, Treatment 1 and Control in the study), they generate 60% greater agricultural output. Treatment 2—the additional payment—does not lead to greater output compared to the control tenancy contract.

This experiment has much in common with the experiments that Neyman was researching in the early twentieth century. Its aim is to improve agricultural yields, just as in many of the classic experiments that both Neyman and Fisher designed. And because there is variation between farms, plots, and farmer characteristics, it is critical that, at baseline (pre-experimentation), the studied units are balanced on covariates that could affect farm yield.

This study had a large number of units—large enough that any rerandomization strategy would probably not meaningfully have changed the main results (since rerandomization and t-tests are asymptotically equivalent [25]). However, it was carried out in partnership with one of the largest non-governmental organizations in Uganda, and researchers unable to work with such a large and well-funded partner might not have the opportunity to carry out such a large trial. Small trials where rerandomization is critical to finding an unbiased treatment effect with a narrow confidence interval are not uncommon in general, especially in clinical medicine [25].

Figure 1 displays imbalance in the full data and in a randomly selected 10% subset of the Treatment 1 and Control units. As we see, with a larger sample size, finite-sample balance is approximately maintained, but with a small experiment, there are larger differences between the treatment and control groups. These imbalances contribute to increased variance in ATE estimates, as an uncertainty estimate for the ATE in the full sample (s.e. = 0.05) is much smaller than in the reduced sample (s.e. = 0.29). Adding to this some of the issues associated with small samples regarding degrees of randomness that we shall discuss, we see that the study of rerandomization in small studies is of practical importance.

Figure 1: Left panel: Difference in covariate means in the full data. Right panel: Difference in covariate means for a randomly selected 10% sample of the full data.

In this context, could we have improved precision of the treatment effect estimates for a 10% sample of the data using rerandomization-based approaches? Does rerandomization provide a route to precise treatment effect estimates in the case of possible covariate imbalance? If so, how should the rerandomization process proceed? We turn to this issue in the next section.

3 Rerandomization in Small Samples

3.1 Notation & Assumptions

Before we present our main arguments, it is useful to introduce the notation we will use going forward. Following the notation in [22], we assume that the experiment of interest is $2^1$, meaning there is one experimental factor with two levels (i.e., a treatment and control level). Let $W_i = 1$ if unit $i$ is treated and $W_i = 0$ if unit $i$ is control, with $i \in \{1, \ldots, n\}$. The vector $W_{n \times 1}$ denotes treatment assignment. Let $\mathbf{X}_{n \times k}$ denote the data matrix containing $k$ pre-treatment covariates relevant to the outcome of interest. We are interested in the outcome matrix $\mathbf{Y}_{n \times 2}$, where $Y(0)_{n \times 1}$ denotes the complete potential outcomes under control and $Y(1)_{n \times 1}$ denotes the complete potential outcomes under treatment. If $\left(Y_{\textrm{Obs}}(W)\right)_{n \times 1}$ denotes the vector of observed outcome values, then $Y_{\textrm{Obs},i} = Y_i(1) W_i + Y_i(0)(1 - W_i)$. If we assume the sharp null hypothesis (zero treatment effect for every unit), then by leaving “$Y_{\textrm{Obs}}$ fixed and simulating many acceptable randomization assignments, $W$, we can empirically create the distribution of any estimator,” $g(\mathbf{X}, W, Y_{\textrm{Obs}}(W))$, conditional on the null hypothesis [22]. We make the additional assumption that the experimenter is interested in using Fisherian inference and that the treatment effects are constant and additive. Next, we define randomization-based fiducial intervals (see Appendix §9.1) because these intervals are the core of both [22]’s insight and our contribution to the literature. Researchers can also exploit the duality between intervals and tests to form a fiducial interval from a randomization test.

3.2 Rerandomization and the Limits of Randomization Inference

Is it possible to take rerandomization “too far” in the sense that exact tests are no longer informative? We argue that it is indeed possible, in the sense of accepting so few randomizations that it is no longer possible to perform a meaningful randomization test. This important method of assessing significance in exact tests becomes non-informative.

Consider the following example: there are 8 experimental units, and we would like to test the effect of one factor at two levels (one treatment group and one control group). There is one background covariate, $h$, which denotes the hour a test is conducted. For simplicity, assume that there are 8 observed values for $h$, ranging from 1 to 8. We can quantify the randomization balance with the square root of the quadratic loss:

$$m = \sqrt{\left[ (\bar{h} \mid W_i = 1) - (\bar{h} \mid W_i = 0) \right]^2} \qquad (1)$$

where smaller values of $m$ are more desirable because they indicate better balance on $h$. If the experiment is completely randomized and no rerandomization is done, the smallest possible $p$-value is $1/\binom{8}{4} = 1/70 \approx 0.014$ and the average $m$ value is $1.4$. If the experiment is pair-matched and complete randomization occurs within pairs for whom $h \in \{1,2\}, \ldots, h \in \{7,8\}$, the average $m$ improves to $0.375$. However, the minimum possible $p$-value in randomization inference is now $1/\binom{2}{1}^4 = 1/16 \approx 0.063$. The minimum possible $p$-value does not vary linearly with the maximum acceptable $m$ in completely randomized experiments (see Table 2).

Table 2: A simple example: How does one’s balance threshold influence exact $p$-values?
Maximum acceptable balance score | Minimum $p$-value
4 | 0.014
3 | 0.015
2 | 0.018
1 | 0.028
0 | 0.125
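As a check on this example, the following minimal base R sketch (our own illustration, not part of the paper’s replication materials) enumerates all $\binom{8}{4} = 70$ complete randomizations of $h = 1, \ldots, 8$, computes the balance score $m$ for each, and reports the minimum exact $p$-value, one over the number of acceptable randomizations, at each threshold in Table 2.

## Enumerate all 70 ways to assign 4 of the 8 units to treatment
h <- 1:8
treated_sets <- combn(8, 4)
m_values <- apply(treated_sets, 2, function(idx) {
  abs(mean(h[idx]) - mean(h[-idx]))    # |mean(h | W = 1) - mean(h | W = 0)|
})

mean(m_values)                         # average m under complete randomization (= 1.4)
for (a in 4:0) {                       # maximum acceptable balance score
  n_acceptable <- sum(m_values <= a)
  cat("threshold", a, ": minimum p-value =", round(1 / n_acceptable, 3), "\n")
}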

This example illustrates that low acceptance thresholds can invalidate the estimation of uncertainty in the sense that the minimum possible $p$-value will eventually increase above $0.05$. In addition, the minimum possible $p$-value varies nonlinearly with the level of strictness the researcher adopts in accepting randomizations. These considerations suggest that it may be possible to systematically establish an acceptance criterion that will minimize the minimum possible $p$-value while maximizing the balance improvement from rerandomization.

This insight formalizes the intuition, described in [10], that including every possible combination of treatment allocations will inevitably involve including a large number of irrelevant treatment allocations that do not help the researcher understand a possible treatment effect. “Randomization, after all, is random,” Deaton writes, “and searching for solutions at random is inefficient because it considers so many irrelevant possibilities.” One intuitive way to understand our approach to optimal rerandomization is that it has the advantage of, as Deaton writes, “provid[ing] the basis for making probabilistic statements about whether or not the difference [between treatment and control groups] arose by chance,” without the disadvantage of including many irrelevant treatment allocations, including ones which the researcher would specifically like to avoid.

3.3 Randomness in Rerandomization and Implications for Exact Randomization Tests

An important theoretical principle here is that

$$\text{Minimum } p\text{-value} = \frac{1}{\#\text{ acceptable randomizations}}.$$

This expression is true because the exact $p$-value is defined as the fraction of hypothetical randomizations showing results as or more extreme than the observed value. If there are $r$ acceptable randomizations, then at least 1 of the $r$ randomizations is as or more extreme than the observed value, so the minimum fraction is $1/r$. When the set of acceptable randomizations gets smaller, the minimum $p$-value invariably gets larger.
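A small base R sketch of this relationship (an illustration under our own choice of $n_{\textrm{Cand}}$) approximates the number of acceptable randomizations as $\lfloor p_a \, n_{\textrm{Cand}} \rfloor$ (with a floor of one) and takes its reciprocal:

n_cand <- choose(10, 5)                        # 252 candidate randomizations
p_a    <- c(1, 0.5, 0.1, 0.05, 0.01, 1 / n_cand)
n_acc  <- pmax(1, floor(p_a * n_cand))         # approximate size of the acceptable set
data.frame(p_a = p_a, n_acceptable = n_acc, min_p = round(1 / n_acc, 3))

The nonlinearity is visible directly: halving $p_a$ from 1 to 0.5 changes the minimum $p$-value only slightly in absolute terms, while pushing $p_a$ toward $1/n_{\textrm{Cand}}$ drives it to 1.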

Setting $n_{\textrm{Cand}} = \binom{10}{5}$ in Figure 2, we see the clear non-linear relationship between $p_a$ and the minimum $p$-value. This figure is consistent with the later Monte Carlo results of Figure 3.

Figure 2: With $n_{\textrm{Cand}}$ held fixed, we can plot the relationship between $p_a$ and the minimum $p$-value.

The preceding expression shows that there is a point after which, as experimenters further reduce $p_a$, fiducial intervals get wider as well, since it becomes more and more difficult to reject $H_0: \tau = \tau_0$ for any $\tau_0$. In the extreme, if the minimum $p$-value is above $\alpha$, the randomization-based fiducial interval becomes non-informative at $(-\infty, \infty)$.

For the sake of illustration, we briefly examine an example from simulations that we will later explore more carefully (see §5.1 for design information). For each candidate randomization, we calculate $M$, where

$$M := \left(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C\right)' \left[\textrm{Cov}\left(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C\right)^{-1}\right] \left(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C\right) = n p_w (1 - p_w) \left(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C\right)' \textrm{Cov}(\mathbf{x})^{-1} \left(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C\right),$$

where $p_w$ denotes the fixed proportion of treated units and $\textrm{Cov}(\mathbf{x})$ denotes the sample covariance matrix of $\mathbf{X}$. We then accept the top $p_a$-th fraction of randomizations (in terms of balance) across all possible randomizations. (Note: we could also accept $M$ with probability $p_a$ based on the inverse CDF of $M$; due to the multivariate normality of $\mathbf{X}$, we have $M \sim \chi^2_k$, and results are similar across approaches.)
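The following base R sketch (our own, with illustrative values of $n$, $k$, and $p_a$; not the fastrerandomize implementation) carries out this acceptance rule: it computes $M$ for every complete randomization and keeps the best $p_a$ fraction by balance.

set.seed(1)
n <- 12; k <- 3; p_a <- 0.1
X <- matrix(rnorm(n * k), nrow = n)        # covariates, multivariate normal
S_inv <- solve(cov(X))                     # inverse sample covariance of X
p_w <- 0.5                                 # fixed proportion treated

candidates <- combn(n, n * p_w)            # all complete randomizations with n/2 treated
M <- apply(candidates, 2, function(treated) {
  d <- colMeans(X[treated, , drop = FALSE]) - colMeans(X[-treated, , drop = FALSE])
  n * p_w * (1 - p_w) * drop(t(d) %*% S_inv %*% d)
})

accepted <- candidates[, M <= quantile(M, p_a)]   # keep the top p_a fraction by balance
ncol(accepted)                                    # size of the acceptable randomization set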

We see in Figure 3 how, in small samples, different rerandomization acceptance thresholds induce different expected exact $p$-values, where the expectation averages across the data-generating process. More stringent acceptance probabilities reduce imbalance and therefore finite-sample bias and sampling variability of estimated treatment effects (see [19], also [6]). However, at a certain point, the size of the acceptable rerandomization set no longer supports significant results, as eventually there is only one acceptable randomization (leading to a minimum and expected $p$-value of 1).

Figure 3: Illustrating the relationship between acceptance probability and expected $p$-value in small samples. Error bars are 95% confidence intervals across simulations.

As a consequence of this discussion, as the acceptance threshold gets more stringent, despite reductions in the sampling variability of the treatment effect estimate, exact fiducial intervals will eventually become non-informative as the interval width goes to $\infty$, because the size of the rerandomization set can only yield a $p$-value above 0.05. The exact threshold at which non-informativity is reached will vary with the experimental design, but the general pattern is theoretically inevitable: if we shrink the size of the set in which a random choice gets made, we also shrink the reference set for assessing uncertainty. This simulation shows that the interval widths do not necessarily get monotonically smaller until the point of non-informativity but can start to increase well before that point (the exact change point will require future theoretical elucidation). In this context, how can we characterize “optimal” ways to rerandomize? What sources of uncertainty affect this optimization?

Figure 4: As the acceptance probability decreases, the minimum $p$-value increases until no statistical significance at $\alpha = 0.05$ is possible.

We can state these ideas more formally as follows.

Let $\mathcal{R}$ denote the set of all possible ways of assigning $N$ observational units to treatment or control. Let $\mathcal{A}$ denote the set of all algorithms that take $\mathcal{R}$ as an input and give as an output a reduced set of acceptable assignments, $\mathcal{R}_A$. A subset of $\mathcal{A}$, denoted $\mathcal{A}^1$, produces an $\mathcal{R}_A$ such that $|\mathcal{R}_A| = 1$. This subset contains [17]’s algorithm as a member, as well as any algorithm for returning a strictly optimal treatment vector (according to some imbalance measure).

Observation 1. Any member of the set $\mathcal{A}^1$ yields infinitely wide fiducial intervals and has no ability to reject any null hypothesis using randomization tests.

Proof of Observation 1. By construction, the minimum possible $p$-value of non-parametric randomization tests for any member of $\mathcal{A}^1$ is

$$\text{Minimum } p\text{-value} = \frac{1}{\#\text{ acceptable randomizations}}.$$

When $|\mathcal{R}_A| = 1$ by definition, we have

$$\text{Minimum } p\text{-value when } |\mathcal{R}_A| = 1 \;=\; \frac{1}{1} = 1 > 0.05.$$

Thus, all $p$-values will be greater than $0.05$ for $\mathcal{A}^1$, directly implying no ability to reject any exact null hypothesis. Moreover, by the construction of intervals from tests, it follows that every candidate endpoint of the fiducial interval will fail to be rejected. As a consequence, the fiducial interval will span the entire range of the outcome variable and will be infinitely wide for continuous outcomes. In sum, optimal treatment assignment leads to non-informative exact tests and confidence intervals. (We note that the problem of obtaining Neymanian randomization-based distributions for the test statistic is different and outside the scope of this work; see the discussion in [15, 4].)

There is an interesting connection between the discussion of randomness in rerandomization approaches and the literature on matching, which contains a non-informativity problem of a similar nature. In that context, researchers must decide how much dissimilarity to allow between matches. This dissimilarity is often set with a caliper. [9], [1], and others have made recommendations about the optimal caliper size in the context of propensity score matching. These recommendations seek to reduce the imbalance as much as possible without ending up with no actual matches: if the caliper were exactly 0 and the sample space were continuous, the analysis would be non-informative as no units could be matched. In both matching and rerandomization, then, we would like to create a situation where treated and control units are as similar as possible to one another. However, whereas achieving no acceptable matches affects point estimates, achieving a single acceptable randomization affects the evaluation of null hypotheses using exact inference.

This section illustrates an important tradeoff in rerandomization involving small samples: more stringent acceptance thresholds improve balance and reduce sampling variability in treatment effect estimates, but overly stringent thresholds lead to non-informative hypothesis tests. Balancing these considerations, the choice of the rerandomization acceptance probability $p_a$ (or, equivalently, the threshold $a$) is critical. In the next section, we explore several approaches for setting the acceptance threshold.

4 Approaches to Choice of Rerandomization Threshold

The choice of $p_a$ involves considering multiple factors, including improved covariate balance, the number of acceptable randomizations, and the computational time to randomization acceptance. The first two issues have been discussed already. The last issue is relevant because the waiting time until an acceptable randomization is distributed according to a Geometric($p_a$) distribution in the multivariate normal case. In general, we would expect to generate $1/p_a$ randomizations before accepting a single one. In the context of this tradeoff, there are several possible approaches.

4.1 An a priori threshold

As Morgan and Rubin write, “for small samples, care should be taken to ensure the number of acceptable randomizations does not become too small, for example, less than 1000” (p. 7, [22]). This view suggests that $p_a$ should be set as low as possible so long as the number of acceptable randomizations is greater than a fixed threshold that is determined a priori. In this sense, one decision rule might be to set $p_a$ such that

$$\text{minimum } p\text{-value given } p_a = \beta.$$

The value of $p_a$ which yields $\beta$ can easily be found by inverting the formula for the minimum $p$-value. On the one hand, it is difficult to know whether a given threshold is reasonable given the number of units available, the number of covariates observed pre-treatment, and the computational resources at researchers’ disposal. On the other hand, a threshold is easy to interpret and does not involve additional optimization steps.
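A minimal sketch of this inversion in base R, under the complete-randomization setup used elsewhere in the paper (the values of $\beta$ and $n$ are our own illustrative choices):

beta     <- 0.001                      # target minimum p-value
n        <- 16                         # units, half treated
n_cand   <- choose(n, n / 2)           # 12,870 complete randomizations
n_needed <- ceiling(1 / beta)          # acceptable randomizations required to reach beta
stopifnot(n_needed <= n_cand)          # otherwise the target beta is infeasible
p_a      <- n_needed / n_cand          # smallest p_a whose minimum p-value is <= beta
c(n_cand = n_cand, n_needed = n_needed, p_a = round(p_a, 4))

With $\beta = 0.001$, roughly the Morgan and Rubin guideline of at least 1000 acceptable randomizations, the implied acceptance probability is about 0.078 for $n = 16$.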

4.2 Heuristic Tradeoff

Our simulations illustrated how the fiducial interval widths do not get monotonically smaller with $p_a$ until the point of non-informativity; rather, they can start to increase before that point in small samples. Thus, in selecting $p_a$, there is a tradeoff between the variance gains from rerandomization and the decreasing size of the acceptable randomization set. We therefore propose an alternative procedure to determine a $p_a$ value that explicitly trades off the costs and benefits of rerandomization.

Following [22], let $e_{p_a}(p_a) = e_a(a) := \text{minimum } p\text{-value given } p_a$. Let $v_{p_a}(p_a) = v_a(a) := v_a$, where $v_a$ loosely denotes the remaining variance between treatment and control covariate means on a percentage basis. This notation allows us to emphasize the duality between setting $a$ and $p_a$, where $p_a$ is defined as the acceptance probability and $a$ is defined as the acceptance threshold for $M$.

We want both the minimum $p$-value given $p_a$ and $v_a$ to be small: when the minimum $p$-value is small, we can form tight fiducial intervals; when $v_a$ is small, we achieve a large reduction in variance for each covariate. Furthermore, both $e_{p_a}(p_a)$ and $v_{p_a}(p_a)$ are bounded between 0 and 1. Thus, we want to set $p_a$ (or, equivalently, $a$) such that

$$p_a^* := \arg\min_{p_a} \;\; \lambda \cdot e_{p_a}(p_a) + (1 - \lambda) \cdot v_{p_a}(p_a),$$

where $\lambda \in [0, 1]$ determines the tradeoff between the variance reduction and the minimum possible $p$-value. It is possible to derive an efficient solution to this optimization problem using gradient descent.

The optimization framework described here has several attractive features. First, given a minimum $p$-value, we can also back out the $\lambda$ value that it implies; this framing therefore allows for the evaluation of threshold choices. Second, when $w = 0.5$ and there is only one covariate, the minimum $p$-value will always be less than 0.05. Third, conditional on the number of covariates, $p_a^*$ describes the same relative place on the objective function no matter the sample size, so in this sense, it is comparable across experiments. Finally, $p_a^*$ has an intuitive interpretation in that it captures the point at which we have gained most of the variance-reducing benefits of rerandomization but have not incurred significant inferential costs.
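The sketch below implements the objective with a grid search over $p_a$ (for a single scalar, a grid is an adequate stand-in for the gradient descent mentioned above). Under the $\chi^2_k$ approximation for $M$, we use the remaining-variance factor $v_a = P(\chi^2_{k+2} \leq a)/P(\chi^2_k \leq a)$ implied by the normal approximation (cf. [22]); the values of $n_{\textrm{Cand}}$, $k$, and $\lambda$ are our own illustrative choices.

n_cand <- choose(12, 6)                        # candidate complete randomizations
k      <- 3                                    # number of covariates
lambda <- 0.5                                  # tradeoff weight
p_grid <- seq(1 / n_cand, 1, length.out = 500) # candidate acceptance probabilities

e_pa  <- 1 / pmax(1, floor(p_grid * n_cand))   # minimum p-value given p_a
a_cut <- qchisq(p_grid, df = k)                # acceptance threshold a implied by p_a
v_pa  <- pchisq(a_cut, df = k + 2) / pchisq(a_cut, df = k)   # remaining variance, in (0, 1]

objective <- lambda * e_pa + (1 - lambda) * v_pa
p_grid[which.min(objective)]                   # p_a^* under this lambda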

4.3 Optimal Rerandomization Threshold with Prior Design Information

The preceding discussion of heuristic tradeoffs describes dynamics regarding the choice of randomization threshold when no prior information is available. However, when prior design information on units is accessible, we can incorporate that information explicitly to help improve exact tests in a design-based manner.

Considerations of optimal precision have a long history in the analysis of treatment effects. For example, [23] and [24] discuss minimizing confidence interval width via stratum-size informed stratified sampling [18]. In general, in the Neymanian framework, optimal experimental design is that which generates unbiased and minimal variance estimates, usually under the condition of no prior assumptions on the data-generating process.

While power calculations in the rerandomization framework are developed in [5] and the weighting of covariates in rerandomization is considered in [20], there is less guidance in the literature about how to consider optimality in randomization choice from a Neymanian-informed Fisherian perspective—a task we turn to in this section.

Because rerandomization-based inference is simulation-based, the a priori calculation of the optimal acceptable balance threshold is difficult (although, as mentioned, asymptotic results are available [19]). We here show how we can use (a) prior information on background covariates and (b) prior information about the plausible range of treatment effects to select the covariate balance threshold, that is, the acceptable randomization threshold that minimizes the expected $p$-value. The expectation is taken over the aforementioned sources of prior information, with the intuition being that when we have an estimate for how prior information is relevant for predicting the outcome, we can use this estimate to determine how to select an optimal point on the tradeoff between better balance and sufficient randomness to perform exact rerandomization inference.

Formally, we select the rerandomization threshold using knowledge of the design—in particular, the covariates and prior assumptions on plausible distributions over treatment effects and the relationships between the baseline covariates and outcome:

$$p_a^* = \arg\min_{p_a} \;\; \mathbb{E}_{\mathbf{T} \sim \mathbf{D}(p_a),\, \beta,\, \boldsymbol{\tau}}\left[ p\text{-value}\left(\widetilde{\mathbf{Y}}(\mathbf{X}, \mathbf{T}, \boldsymbol{\tau}), \mathbf{T}\right)\right],$$

with $\boldsymbol{\tau}$ being the vector of individual treatment effects, $\beta$ defining the parameters of the potential outcome model, denoted using $\widetilde{\mathbf{Y}}$, and $\mathbf{D}(p_a)$ defining the distribution over the treatment assignment vector as a function of the acceptance probability, $p_a$. The approach here involves simulating potential outcomes under prior knowledge, calculating the expected $p$-value under various acceptance thresholds, and selecting the threshold minimizing this quantity (thereby maximizing power to reject the exact null).
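A hedged base R sketch of this pre-design procedure follows. Everything below (the Gaussian priors over $\beta$ and $\tau$, the linear outcome model, and the values of $n$ and $k$) is an illustrative assumption rather than the paper’s exact specification. For each candidate $p_a$, we (i) draw parameters from the prior, (ii) draw an assignment uniformly from the acceptable set, and (iii) record the exact $p$-value of the rerandomization test of the sharp null, averaging over many such draws.

set.seed(2)
n <- 10; k <- 2
X <- matrix(rnorm(n * k), nrow = n)

cand  <- combn(n, n / 2)                        # all complete randomizations, half treated
S_inv <- solve(cov(X))
M <- apply(cand, 2, function(tr) {
  d <- colMeans(X[tr, , drop = FALSE]) - colMeans(X[-tr, , drop = FALSE])
  drop(t(d) %*% S_inv %*% d)
})

expected_p <- function(p_a, n_sims = 200) {
  keep <- which(M <= quantile(M, p_a))          # acceptable randomizations under p_a
  mean(replicate(n_sims, {
    beta <- rnorm(k); tau <- rnorm(1, mean = 0.5)   # prior draws (illustrative)
    Y0 <- drop(X %*% beta) + rnorm(n, sd = 0.1)
    Y1 <- Y0 + tau
    obs  <- keep[sample(length(keep), 1)]       # realized acceptable assignment
    W    <- as.integer(seq_len(n) %in% cand[, obs])
    Yobs <- ifelse(W == 1, Y1, Y0)
    t_obs <- mean(Yobs[W == 1]) - mean(Yobs[W == 0])
    t_ref <- apply(cand[, keep, drop = FALSE], 2,     # rerandomization reference set
                   function(tr) mean(Yobs[tr]) - mean(Yobs[-tr]))
    mean(abs(t_ref) >= abs(t_obs))              # exact two-sided p-value
  }))
}

p_grid <- c(0.01, 0.05, 0.1, 0.25, 0.5, 1)
sapply(p_grid, expected_p)                      # pick the p_a minimizing this curve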

By deciding on the acceptance threshold before analyzing the data, we use the principles of design-based inference pioneered by Neyman and contemporaries. And, by incorporating rerandomization, we improve treatment effect estimates while maintaining the ability to calculate valid exact intervals in a way that incorporates information about the structure of randomness in the randomization inference.

In summary, selecting the best rerandomization threshold is difficult to do in small experiments without guidance from asymptotic results. However, we can use simulation-based methods to form an approximation for how $p$-values will behave on average under the prior design knowledge. These approximations are then useful in pre-analysis design decisions, especially as investigators can select the acceptance threshold minimizing a priori expected $p$-values.

5 Exploring Degrees of Randomness in Rerandomization via Simulation

5.1 Simulation Design

We explore the impact of varying levels of randomness in rerandomization on inference through simulation. We simulate potential outcomes under a linear model:

$$Y_i(0) = \mathbf{X}_i' \boldsymbol{\beta} + \epsilon_i^{Y(0)}, \qquad Y_i(1) = \tau + \mathbf{X}_i' \boldsymbol{\beta} + \epsilon_i^{Y(1)},$$

where the covariates $\mathbf{X}_i$ are drawn from a multivariate Gaussian with diagonal covariance and the $\epsilon_i^{Y(t)}$ are independent Gaussian error terms. We study an environment where errors are $N(0, 0.1)$, allowing us to assess robustness to noise in the potential outcomes. We also study results across low and high treatment effect strength conditions (where $\tau \in \{0.1, 1\}$). Finally, we vary the number of observations, $n \in \{6, 12, 18\}$.

We then generate a sequence of acceptable randomization sets, ranging in size from 10 to the full set of all possible completely randomized treatment vectors, to evaluate performance at different rerandomization acceptance thresholds. We run the pre-analysis design procedure described in §4.3, which returns an estimated expected $p$-value using the prior design information. We compute the estimated expected $p$-value from each Monte Carlo iteration. We then compare the range of estimates against the true expected $p$-value minimizer, analyzing both bias and relative RMSE, where for interpretability the RMSE estimate has been normalized by the expected RMSE under uniform selection of the acceptable randomization threshold, $p_a$. We assess performance averaging over randomness in the covariates and outcome.
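A short sketch of this data-generating process in base R (the number of covariates and the coefficient values are our own illustrative choices; we treat the stated $N(0, 0.1)$ errors as having standard deviation 0.1):

simulate_po <- function(n, tau, k = 2, beta = rep(1, k), sd_eps = 0.1) {
  X  <- matrix(rnorm(n * k), nrow = n)          # multivariate Gaussian, diagonal covariance
  Y0 <- drop(X %*% beta) + rnorm(n, sd = sd_eps)
  Y1 <- tau + drop(X %*% beta) + rnorm(n, sd = sd_eps)
  list(X = X, Y0 = Y0, Y1 = Y1)
}

## One draw per design cell: n in {6, 12, 18} crossed with tau in {0.1, 1}
designs <- expand.grid(n = c(6, 12, 18), tau = c(0.1, 1))
sims <- Map(simulate_po, n = designs$n, tau = designs$tau)
length(sims)   # six simulated potential-outcome schedules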

5.2 Simulation Results

Figure 5 displays the main results. In the left panel, we see that the Bayesian selection procedure for the rerandomization threshold described in §4.3 is somewhat downwardly biased under the non-informative priors specified here, both in the small (“S”) and large (“L”) treatment effect cases. In other words, it yields an acceptance probability that is somewhat lower than what we find to be optimal via Monte Carlo. However, in the right panel, we see that the procedure generates lower relative RMSE compared to uniform selection of $p_a$ across the interval $[0, 1]$. This finding, again robust to treatment effect size, indicates that there is a systematic structure in the randomness surrounding the rerandomization problem that can be leveraged in pre-design analysis decisions.

Figure 5: Left panel: Bias, where the target estimand is the optimal $p_a$ for minimizing the expected $p$-value. “S” denotes the small treatment effect size case; “L” denotes the large treatment effect size case. Right panel: Relative RMSE, where raw RMSEs have been made relative using the RMSE under uniform selection of $p_a$ as a baseline.

Overall, we find a large decrease in relative root mean squared error (RMSE) when using the pre-analysis rerandomization threshold selection procedure described in §4.3. Rerandomization allows robust causal inference in cases with high noise levels and smaller sample sizes.

6 Application to a Modern Agricultural Experiment, Revisited

Having shown the possible improvements with our optimal rerandomization approach via simulation, we now explore the same dynamics using data from the real agricultural experiment on tenancy contracts discussed in §2.

To do this, we first take a random 10% subsample of the units in the experiment, to illustrate what it would have been like to carry out a much smaller, less costly experiment than the one conducted by the original authors. We then carry out a semi-synthetic simulation involving the experimental data.

We first fit an OLS model on the full data to estimate outcome model parameters (along with their variance-covariance matrix). We then sample these coefficients from a multivariate Gaussian to simulate counterfactual outcomes for the potential outcomes of each unit in random 10% subsamples. We need this semi-synthetic simulation protocol to generate the complete potential outcomes table. We use the fixed covariate profiles of units and average over uncertainty in the outcome imputations when assessing performance.

With this semi-synthetic setup where true causal effects are known (and non-zero), we find, as shown in Figure 6, that the true acceptance probability that minimizes the expected $p$-value is 0.008. This is quite close to the selected $p_a$ value from the pre-design procedure described in §4.3 (value near 0.008; non-informative Gaussian priors used for the assumed structural parameters). The implication of these results for practice is that investigators should in this case employ some rerandomization so that there are about 128 acceptable randomizations in terms of balance out of the set of all possible 184,756 complete randomizations. When they do so, they would, based on the prior design structure, expect a $p$-value of 0.027 compared to 0.10 without any rerandomization.

Figure 6: The acceptance probability minimizing the expected $p$-value is quite close to the one selected by the choice procedure described in §4.3.

We next observe in Figure 7 the degree to which the difference-in-means estimates for the ATE fluctuate across repeated samples in relation to the acceptance probability. When the acceptance probability remains high, there is a corresponding increase in the average imbalance—and therefore sampling variability. As the acceptance probability diminishes, estimation variance shrinks in tandem with improved balance. The selection procedure described previously selects an acceptance probability that not only ensures a precisely estimated ATE (where precision is meant in terms of sampling variability) but also a testing framework with improved power to reject the null, evident from the minimum expected $p$-value we observed in Figure 6.

Without rerandomization, the finite-sample $\hat{\tau}$’s deviate 14% or more from the true $\tau$ in this semi-synthetic design, whereas with the rerandomization specified by the pre-experimental design analysis, they deviate by less than 5%.

Figure 7: As we decrease the acceptance probability, the expected $\hat{\tau}$ gets closer to the true value (indicated by the gray horizontal line). The acceptance probability indicated by the aforementioned procedure weighs the expected precision gains in $\hat{\tau}$ against the uncertainty estimates in the exact $p$-values.

7 Discussion of Limitations & Future Work

The discussion presented in this paper relies on several key assumptions that are worth highlighting.

First, we focus our discussion on Fisherian permutation-based inference. As mentioned, the story of how significance tests interact with rerandomization threshold choice may differ for Neymanian repeated sampling inference. However, we note that some of the principles we have discussed here apply (at least in the extreme) to other kinds of randomization-based inference that test null hypotheses beyond the exact null—e.g., the conditional independence null used in [8], where randomization inference is performed using draws from the assignment distribution over the treatment vector.

Second, our simulation results assume additivity and constancy of treatment effects. This rules out treatment effect heterogeneity or interactions between covariates and treatment. While rerandomization can still be beneficial under weaker assumptions, the relationship between the acceptance probability and inference may differ in more complex settings. Investigating rerandomization with heterogeneous effects is an important area for continued work.

Finally, we note that the task of achieving very low rerandomization thresholds is computationally intensive for experiments involving many units. While the computational tools introduced in the fastrerandomize package accompanying this paper make possible the analysis of full permutation sets over treatment vectors ranging in size into the hundreds of millions using an accelerated linear algebra compiler, further computational effort to adaptively build this set would enable researchers to achieve better balance while maintaining use of exact inference or conditional randomization tests.

8 Conclusion

In this paper, we formalize the idea that rerandomization can be taken “too far,” in the sense that the variance-reduction benefits of rerandomization can be outweighed by the costs of having undesirably wide permutation-based fiducial intervals when acceptance probabilities are low. We accomplish this formalization by deriving the minimum possible $p$-value, which turns out to be a function of the acceptance threshold and the number of units.

In this light, we presented a unified approach to the problem of determining an optimal acceptance threshold. While the decision for a rerandomization threshold traditionally operates in the absence of prior information, the availability of prior design details helps structure the randomness in a way that can be leveraged in the design phase. By leveraging this information, researchers can hone experimental precision in a design-centric fashion, a concept rooted in the foundations set in [23] and [24]. Drawing from the Neymanian ethos, the ideal experimental design offers unbiased and variance-minimized estimates, typically devoid of data-generating presumptions. Notably, as the rerandomization framework is inherently simulation-based, determining an a priori optimal balance threshold can be intricate, despite some extant asymptotic results [19].

Although Fisherian and Neymanian approaches to experimental treatment effect estimation differ, and although rerandomization—because it is grounded in randomization inference—is more aligned with a Fisherian approach, both researchers were concerned with the efficient use of data. Rerandomization can make better use of available data by improving covariate balance, thereby serving the interests of efficiency from both Fisherian and Neymanian perspectives.

We have contributed to the understanding of rerandomization as a kind of slippery slope: at the top of this slope, all randomizations are accepted and, at the bottom, only one randomization is accepted but it becomes impossible to estimate uncertainty using important non-parametric tests. Our paper thus helps experimenters to understand where they are on this slope, where they might most prefer to be, and how they can arrive there.

Over thirty years ago, [26] pointed out that, even though Fisher’s and Neyman’s approaches to hypothesis testing were distinct, they were “complementary.” While Fisher relied on a sharp null and Neyman took a repeated-sampling approach over nonnull distributions, both have the intuition that examining all ways that treatments could be assigned can help researchers understand how the treatment changed outcomes for treated units relative to similarly situated control units. In this paper, we hope to suggest one method that uses these same intuitions to make a methodological contribution in today’s very different world of ubiquitous randomized experiments and immense computing power. By strategically winnowing down the set of possible randomizations to an optimum number, we contribute an approach that emphasizes balance while reaping the benefits of random variation.

Data availability statement: Data and code to replicate results, as well as a tutorial in installing and using the fastrerandomize package, are or will be available at github.com/cjerzak/fastrerandomize-software.

References

  • 1 Peter C Austin “Optimal Caliper Widths for Propensity-Score Matching When Estimating Differences in Means and Differences in Proportions in Observational Studies” In Pharmaceutical Statistics 10.2, 2011, pp. 150–161 DOI: 10.1002/pst.433
  • 2 Abhijit Banerjee, Sylvain Chassang, Sergio Montero and Erik Snowberg “A Theory of Experimenters”, 2017
  • 3 Dimitris Bertsimas, Mac Johnson and Nathan Kallus “The Power of Optimization Over Randomization in Designing Experiments Involving Small Samples” In Operations Research 63.4 INFORMS, 2015, pp. 868–876
  • 4 Zach Branson and Tirthankar Dasgupta “Sampling-Based Randomized Designs for Causal Inference Under the Potential Outcomes Framework” In arXiv preprint arXiv:1808.01691, 2018
  • 5 Zach Branson, Xinran Li and Peng Ding “Power and Sample Size Calculations for Rerandomization” In arXiv preprint arXiv:2201.02486, 2022
  • 6 Zach Branson and Luke Miratrix “Randomization Tests that Condition on Non-Categorical Covariate Balance” In Journal of Causal Inference 7.1, 2019
  • 7 Konrad B Burchardi, Selim Gulesci, Benedetta Lerva and Munshi Sulaiman “Moral Hazard: Experimental Evidence from Tenancy Contracts” In The Quarterly Journal of Economics 134.1 Oxford University Press, 2019, pp. 281–347
  • 8 Emmanuel Candes, Yingying Fan, Lucas Janson and Jinchi Lv “Panning for Gold: Model-X Knockoffs for High Dimensional Controlled Variable Selection” In Journal of the Royal Statistical Society Series B: Statistical Methodology 80.3 Oxford University Press, 2018, pp. 551–577
  • 9 William G. Cochran and Donald B. Rubin “Controlling Bias in Observational Studies: A Review” In Sankhya: The Indian Journal of Statistics, Series A (1961-2002) 35.4, 1973, pp. 417–446 URL: http://www.jstor.org/stable/25049893
  • 10 Angus Deaton “Randomization in the Tropics Revisited: A Theme and Eleven Variations”, 2020
  • 11 Peng Ding and Xinran Li “General Forms of Finite Population Central Limit Theorems with Applications to Causal Inference” In Journal of the American Statistical Association 112.520, 2017, pp. 1759–1796
  • 12 Esther Duflo and Abhijit Banerjee “Handbook of Field Experiments” Elsevier, 2017
  • 13 Paul H. Garthwaite “Confidence Intervals from Randomization Tests” In Biometrics 52.4, 1996, pp. 1387–1393 DOI: 10.2307/2532852
  • 14 William S. Gosset “Comparison Between Balanced and Random Arrangements of Field Plots” In Biometrika JSTOR, 1938, pp. 363–378
  • 15 Guido W. Imbens and Donald B. Rubin “Causal Inference for Statistics, Social, and Biomedical Sciences” Cambridge: Cambridge University Press, 2015 URL: http://www.cambridge.org/US/academic/subjects/statistics-probability/statistical-theory-and-methods/causal-inference-statistics-social-and-biomedical-sciences-introduction
  • 16 Nathan Kallus “Optimal A Priori Balance in the Design of Controlled Experiments” In Journal of the Royal Statistical Society: Series B (Statistical Methodology) 80.1 Wiley Online Library, 2018, pp. 85–112
  • 17 Maximilian Kasy “Why Experimenters Might Not Always Want to Randomize, and What They Could Do Instead” In Political Analysis 24.3, 2016, pp. 324–338
  • 18 Adam P Kubiak and Paweł Kawalec “Prior Information in Frequentist Research Designs: The Case of Neyman’s Sampling Theory” In Journal for General Philosophy of Science 53.4 Springer, 2022, pp. 381–402
  • 19 Xinran Li, Peng Ding and Donald B. Rubin “Asymptotic Theory of Rerandomization in Treatment-Control Experiments” In Proceedings of the National Academy of Sciences 115.37 National Academy of Sciences, 2018, pp. 9157–9162 DOI: 10.1073/pnas.1808191115
  • 20 Zhaoyang Liu, Tingxuan Han, Donald B Rubin and Ke Deng “Bayesian Criterion for Re-Randomization” In arXiv preprint arXiv:2303.07904, 2023
  • 21 Alfred Marshall “The Principles of Economics”, 1890
  • 22 Kari Lock Morgan and Donald B. Rubin “Rerandomization to Improve Covariate Balance in Experiments” In The Annals of Statistics 40.2, 2012, pp. 1263–1282 URL: http://projecteuclid.org.ezp-prod1.hul.harvard.edu/euclid.aos/1342625468
  • 23 Jerzy Neyman “Zarys Teorji I Praktyki Badania Struktury Ludności Metodą Reprezentacyjna” Z zasiłkiem Kasy im Mianowskiego Instytutu Popierania Nauki, 1933
  • 24 Jerzy Neyman “Recognition of Priority” In Jour. Roy. Stat. Soc 115, 1952, pp. 602
  • 25 Michael A Proschan and Lori E Dodd “Re-randomization Tests in Clinical Trials” In Statistics in medicine 38.12 Wiley Online Library, 2019, pp. 2292–2302
  • 26 Donald B Rubin “Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies” In Statistical Science 5.4 Citeseer, 1990, pp. 472–480
  • 27 Leonard Jimmie Savage “Subjective Probability and Statistical Practice” Mathematical Sciences Directorate, Office of Scientific Research, US Air Force, 1959
  • 28 Jose F. Soares and C.F.J. Wu “Optimality of Random Allocation Design for the Control of Accidental Bias in Sequential Experiments” In Journal of Statistical Planning and Inference 11.1, 1985, pp. 81–87 DOI: 10.1016/0378-3758(85)90027-8

9 Appendix

9.1 Defining Randomization-based Fiducial Intervals

Randomization intervals incorporate randomness only through variation in the treatment vector, and do not explicitly make a reference to repeated experiments (as in Neyman’s classical confidence intervals). In this case, we can produce an interval by finding the set of all null hypotheses that the observed data would fail to reject. If we recall the constant and additive treatment effect assumption ($Y(1) = Y(0) + \tau$), then the hypotheses can be written as

$$H_0: \tau = \tau_0, \qquad H_1: \tau \neq \tau_0.$$

An $\alpha$-level fiducial interval for $\tau$ consists of the set of $\tau_0$ such that the observed test statistic would not lead to a rejection of the null hypothesis at significance level $\alpha$. Here, when $\tau_0 \neq 0$, the randomization test is conducted by constructing

$$Y_i(0)^* = \left(Y_{\textrm{Obs},i} - \tau_0\right) W_i + Y_{\textrm{Obs},i}(1 - W_i).$$

Then, keeping $Y_i(0)^*$ fixed, we permute the treatment assignment vector and calculate $Y_{\textrm{Obs}}^*$ by adding $\tau_0$ to the treatment group outcomes under the permutation. This procedure generates a distribution of $\hat{\tau}$ under the null $H_0: \tau = \tau_0$. We can use this distribution to calculate a $p$-value for $\hat{\tau}_{\textrm{Obs}}$ under the null. We can form a $100(1 - \alpha)\%$ interval for $\tau$ by finding the values of $\tau_0$ which generate a $p$-value greater than or equal to $\alpha$. [13] discusses an efficient algorithm for obtaining randomization-based fiducial intervals, which searches for the interval endpoints using a procedure based on the Robbins-Monro search process.
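The sketch below constructs such an interval in base R by inverting the test over a grid of $\tau_0$ values (rather than the Robbins-Monro search of [13]). The toy data are placeholders of our own; in a rerandomized experiment, the reference set cand_keep should contain only the acceptable assignments.

set.seed(3)
n <- 10
Yobs <- rnorm(n); W <- rep(c(1, 0), each = n / 2)   # toy observed data
cand_keep <- combn(n, n / 2)                        # here: all complete randomizations

p_value_at <- function(tau0) {
  Y0_star <- ifelse(W == 1, Yobs - tau0, Yobs)      # impute Y(0) under H0: tau = tau0
  t_obs   <- mean(Yobs[W == 1]) - mean(Yobs[W == 0])
  t_ref <- apply(cand_keep, 2, function(tr) {       # re-assign treatment, add tau0 back
    Ystar <- Y0_star; Ystar[tr] <- Ystar[tr] + tau0
    mean(Ystar[tr]) - mean(Ystar[-tr])
  })
  mean(abs(t_ref - tau0) >= abs(t_obs - tau0))      # two-sided p-value centered at tau0
}

tau_grid <- seq(-3, 3, by = 0.05)
inside   <- sapply(tau_grid, p_value_at) >= 0.05    # fail-to-reject region
range(tau_grid[inside])                             # approximate 95% fiducial interval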

9.2 Proof of Observation 1

In this section, we define $V := W^{C}$ for notational convenience. We assume that the covariates are multivariate Gaussian and the imbalance metric is the Mahalanobis distance between treated and control means. Then, we can appeal to the finite-population Central Limit Theorem to assume that $\overline{\mathbf{X}}_T - \overline{\mathbf{X}}_C$ is multivariate normal and therefore that $M(V) \sim \chi^2_k$ [11].

9.2.1 Proof of Observation 1

For rerandomization, each accepted $V$ is equally likely to have been drawn (since each $V$, accepted or not, was drawn with probability $1/n_{\textrm{Cand}}$). Recall that, if all outcomes are equally likely, the probability of the event $\{V = v\}$ given $M(V) \leq a$ is equal to the number of outcomes in which that event occurs (which is 1) divided by the total number of outcomes (which is $|\mathcal{R}_A|$). This means

$$\Pr(V = v \mid M(V) \leq a) = \frac{1}{|\mathcal{R}_A|}.$$

Another way to see this is to consider how

$$\Pr(V = v \mid I\{M(V) \leq a\}) = \Pr(V = v \mid M(V) \leq a) = \frac{\Pr(I\{M(V) \leq a\} \mid V = v) \times \Pr(V = v)}{\Pr(I\{M(V) \leq a\})} = \frac{I\{M(v) \leq a\} \times 2^{-n}}{\frac{|\mathcal{R}_A|}{n_{\textrm{Cand}}}} = \frac{1 \times \frac{1}{n_{\textrm{Cand}}}}{\frac{|\mathcal{R}_A|}{n_{\textrm{Cand}}}} = 1/|\mathcal{R}_A|,$$

where the last line uses the fact that $n_{\textrm{Cand}} = 2^{n}$ when each of the $n$ units is independently assigned to treatment or control. The average waiting time for the first acceptance is governed by a Geometric distribution with probability parameter $p_a$. By the independence of the sampling process, the expected waiting time until the $k^{\textrm{th}}$ acceptance is given by $k/p_a$.

9.3 Kasy (2016) in the Rerandomization Framework

We argued in the above that Kasy’s optimal assignment procedure is a special case of rerandomization in which the only acceptable randomization is the one that provides optimal covariate balance. Let the $\mathbf{x}$ matrix include all observed covariates that the experimenter seeks to balance. Let $\mathbf{W}$ denote the $n$-dimensional treatment assignment vector indicating the treatment group for each unit. In the notation of Morgan and Rubin, the rerandomization criterion can be written as

$$\varphi(\mathbf{x}, \mathbf{W}) = \begin{cases} 1, & \text{if } \mathbf{W} \text{ is an acceptable randomization}, \\ 0, & \text{if } \mathbf{W} \text{ is not an acceptable randomization}. \end{cases}$$

With the appropriate $\varphi(\cdot, \cdot)$, Kasy’s deterministic assignment procedure can be considered a special case of rerandomization. That is, we will obtain Kasy’s optimal assignment if we define $\varphi(\cdot, \cdot)$ as follows, letting $R(\cdot)$ denote either Bayesian or minimax risk, letting $\hat{\beta}$ denote the choice of estimator, and letting $U$ denote a generic randomization procedure independent of $\mathbf{x}$ and $\mathbf{Y}$,

$$\varphi(\mathbf{x}, \mathbf{W}) = \begin{cases} 1, & \text{if } \mathbf{W} \text{ minimizes } R(\mathbf{W}, \hat{\beta} \mid \mathbf{x}, U), \\ 0, & \text{if } \mathbf{W} \text{ does not minimize } R(\mathbf{W}, \hat{\beta} \mid \mathbf{x}, U). \end{cases}$$

With $n$ units, we will obtain a $\mathbf{W}$ such that $\varphi(\mathbf{x}, \mathbf{W}) = 1$ with probability $(1/2)^n$. The expected wait time until obtaining this $\mathbf{W}$ is given by a Geometric distribution with mean $2^n$. This calculation implies that, as soon as $n \geq 20$, the expected wait time will exceed 1 million rerandomization draws if rerandomization is used [28].

This section has put our discussion in closer dialogue with Kasy’s work in order to further illustrate how Kasy’s deterministic procedure for optimal treatment assignment is a special case of the rerandomization approach developed in [22].