The Economics of Partisan Gerrymandering

Anton Kolotilin and Alexander Wolitzky
(Date: 22nd January 2025. First version: 17th September 2020.)

Kolotilin: School of Economics, UNSW Business School. Wolitzky: Department of Economics, MIT.
 
We thank Nikhil Agarwal, Garance Genicot, Ben Golub, Richard Holden, Gary King, Hongyi Li, Nolan McCarty, Stephen Morris, Ben Olken, and Ken Shotts, as well as seminar and conference participants at ASSA, Harvard, MIT, NBER, Peking, Penn State, Rochester, Stanford, and Warwick for helpful comments and suggestions. We thank Eitan Sapiro-Gheiler and Nancy Wang for excellent research assistance. Anton Kolotilin gratefully acknowledges support from the Australian Research Council Discovery Early Career Research Award DE160100964 and from MIT Sloan’s Program on Innovation in Markets and Organizations. Alexander Wolitzky gratefully acknowledges support from NSF CAREER Award 1555071 and Sloan Foundation Fellowship 2017-9633.

We study the problem of a partisan gerrymanderer who assigns voters to equipopulous districts so as to maximize his party’s expected seat share. The designer faces both aggregate uncertainty (how many votes his party will receive) and idiosyncratic, voter-level uncertainty (which voters will vote for his party). We argue that pack-and-pair districting, where weaker districts are “packed” with a single type of voter, while stronger districts contain two voter types, is typically optimal for the gerrymanderer. The optimal form of pack-and-pair districting depends on the relative amounts of aggregate and idiosyncratic uncertainty. When idiosyncratic uncertainty dominates, it is optimal to pack opposing voters and pair more favorable voters; this plan resembles traditional “packing-and-cracking.” When aggregate uncertainty dominates, it is optimal to pack moderate voters and pair extreme voters; this “matching slices” plan has received some attention in the literature. Estimating the model using precinct-level returns from recent US House elections indicates that, in practice, idiosyncratic uncertainty dominates and packing opponents is optimal; moreover, traditional pack-and-crack districting is approximately optimal. We discuss implications for redistricting reform and political polarization. Methodologically, we exploit a formal connection between gerrymandering—partitioning voters into districts—and information design—partitioning states of the world into signals.  

JEL Classification: C78, D72, D82

Keywords: Gerrymandering, pack-and-crack, matching slices, pack-and-pair, information design

1. Introduction

Legislative district boundaries are drawn by political partisans under many electoral systems (Bickerstaff, 2020). In the United States, the importance of districting has accelerated with the rise of computer-assisted districting (Newkirk, 2017), together with intense partisan efforts to gain and exploit control of the districting process. These trends culminated in “The Great Gerrymander of 2012” (McGhee, 2020), where the Republican party’s Redistricting Majority Project (REDMAP), having previously targeted state-level elections that would give Republicans control of redistricting, aggressively redistricted several states, including Michigan, Ohio, Pennsylvania, and Wisconsin. The resulting districting plans are widely viewed as contributing to the outcome of the 2012 general election, where Republican congressional candidates won a 33-seat majority in the House of Representatives with 49.4% of the two-party vote (McGann, Smith, Latner, and Keena, 2016). In light of these developments, along with the Supreme Court ruling in Rucho v. Common Cause (2019) that partisan gerrymanders are not justiciable in federal court and the continued prominence of gerrymandering in the 2020 US redistricting cycle (Rakich and Mejia, 2022), partisan gerrymandering looks likely to remain an important feature of American politics for some time.

This paper studies the problem of a partisan gerrymanderer (the “designer”) who assigns voters to a large number of equipopulous districts so as to maximize his party’s expected seat share. [Footnote 1: Of course, studying this problem does not endorse gerrymandering, any more than studying monopolistic behavior endorses monopoly.] This problem approximates the one facing many partisan gerrymanderers in the United States. In particular, the constraint that districts must be equipopulous is crucial and is strictly enforced by law. [Footnote 2: In Karcher v. Daggett (1983), the Supreme Court rejected a districting plan in New Jersey with less than a 1% deviation from population equality, finding that “there are no de minimis population variations, which could practically be avoided, but which nonetheless meet the standard of Article I, Section 2 [of the U.S. Constitution] without justification.”] In practice, gerrymanderers also face other significant constraints, such as the federal requirements that districts are contiguous and do not discriminate on the basis of race, and various state-level restrictions, such as “compactness” requirements, requirements to respect political sub-divisions such as county lines, requirements to represent racial or ethnic groups or other communities of interest, and so on. While these complex additional constraints are important in some cases, we believe that often they are not as binding as they might seem, and also that they are more productively considered on a case-by-case basis rather than as part of a general theoretical analysis. [Footnote 3: See Friedman and Holden (2008) for more discussion of these constraints. For example, contiguity is not as severe a constraint as it might seem, because contiguous districts can have extremely irregular shapes.] We therefore follow much of the literature (discussed below) in focusing on the simpler problem with only the equipopulation constraint.

When the designer has perfect information, it is well-known that the solution to this problem is pack-and-crack: if the designer’s party is supported by a minority of voters of size $m<1/2$, he “packs” $1-2m$ opposing voters in districts where he receives zero votes, and “cracks” the remaining $2m$ voters in districts which he wins with 50% of the vote. [Footnote 4: If the designer has majority support, he can win all the districts.] We instead consider the more general and realistic case where the designer must allocate a variety of types of voters (or, more realistically, groups of voters such as census blocks or precincts) under uncertainty. The goal of this paper is to characterize optimal partisan gerrymandering in this setting, to compare optimal gerrymandering with simple and realistic forms of packing-and-cracking, and to draw some implications for broader legal and political economy issues.

In outline, our model and results are as follows. We assume that the designer faces both aggregate uncertainty (how many votes his party will receive) and idiosyncratic, voter-level uncertainty (which voters will vote for his party). Aggregate uncertainty is parameterized by a one-dimensional aggregate shock, while voters are parameterized by a one-dimensional type that determines a voter’s probability of voting for the designer’s party for each value of the aggregate shock. We focus on the case where the aggregate shock is unimodal and where moderate voters are “swingier” than more extreme voters, in that their vote probabilities swing more with the aggregate shock. In this case, we argue that a class of districting plans that we call pack-and-pair—which generalize pack-and-crack—are typically optimal for the designer. Under pack-and-pair districting, the designer creates weaker districts that are packed with a single type of voter (which are analogous to the packed districts under pack-and-crack), and stronger districts that contain two voter types (which are analogous to the cracked districts under pack-and-crack).

We further show that the optimal form of pack-and-pair districting depends on the relative amounts of aggregate and idiosyncratic uncertainty. When idiosyncratic uncertainty dominates, it is optimal to pack opposing voters and pair more favorable voters. This pack-opponents-and-pair plan (henceforth, POP) resembles traditional packing-and-cracking. POP also resembles the “$p$-segregation” plan introduced by Gul and Pesendorfer (2010), where opposing voters are segregated and more favorable voters are all pooled together, rather than being paired as they are under POP. When instead aggregate uncertainty dominates, it is optimal to pack moderate voters and pair extreme voters. This pack-moderates-and-pair plan (henceforth, PMP) was proposed under the name “matching slices” by Friedman and Holden (2008) and was applied to redistricting law by Cox and Holden (2011). The pack-and-pair class thus nests the main districting plans proposed in the literature. Our primary theoretical contribution is identifying this class and showing that the optimal plan within this class is determined by the relative amounts of aggregate and idiosyncratic uncertainty.

A rough intuition for these results is that when idiosyncratic uncertainty dominates, the probability that the designer wins a district is approximately determined by the mean voter type in the district, as in probabilistic voting models with partisan taste shocks (e.g., Hinich 1977, Lindbeck and Weibull 1993). With a unimodal aggregate shock, the distribution of district means is then optimized by segregating opposing voters and pooling more favorable voters, as in $p$-segregation. When instead aggregate uncertainty dominates, the probability that the designer wins a district is approximately determined by the median voter type in the district, as in probabilistic voting models with an uncertain median bliss point (e.g., Wittman 1983, Calvert 1985). The distribution of district medians is then optimized by pairing above-population-median and below-population-median voter types, as in matching slices. However, the optimal plans we identify (POP and PMP) are somewhat more intricate than $p$-segregation and the simple form of matching slices emphasized by Friedman and Holden (2008): POP pairs favorable voters, rather than pooling them as in $p$-segregation; and PMP segregates an interval of intermediate voter types, rather than pairing all types as in the simplest form of matching slices.

As we discuss in Section 6, whether optimal districting takes the form of POP or PMP has significant implications for several political and legal issues surrounding redistricting, including redistricting reform and intra- and inter-district political polarization (see also Cox and Holden 2011). It is therefore important to understand whether idiosyncratic or aggregate uncertainty is larger in practice. We answer this question using precinct-level returns from the 2016, 2018, and 2020 US House elections. The data clearly show that idiosyncratic uncertainty is much larger than aggregate uncertainty. Intuitively, this finding results from the simple observation that, in practice, most precinct vote splits are much closer to 50-50 (the vote split under high idiosyncratic uncertainty) than 100-0 or 0-100 (the vote splits under high aggregate uncertainty). [Footnote 5: This observation also implies that models with only two types of voters or precincts (e.g., Owen and Grofman 1988) cannot closely approximate the problem facing actual gerrymanderers, who must decide how to allocate many different types of precincts.] We therefore expect that, in practice, optimal districting takes the form of POP. We also note, however, that the optimal POP plan is close to $p$-segregation under our estimated parameters. Thus, simple $p$-segregation plans are likely approximately optimal in practice. This finding helps explain why actual gerrymandering usually resembles $p$-segregation (or an even simpler form of pack-and-crack, where unfavorable voters are pooled rather than segregated) instead of a more complicated plan like POP.

Methodologically, we establish a formal connection between gerrymandering (partitioning voters into districts) and information design (partitioning states of the world into signals). The partisan gerrymandering problem we study is mathematically equivalent to a general Bayesian persuasion problem with a one-dimensional state, a one-dimensional action for the receiver, and state-independent sender preferences. Most of our results are novel in the context of this persuasion problem. This paper thus directly contributes to information design as well as gerrymandering; more importantly, we establish a strong connection between these two topics. [Footnote 6: Contemporaneous papers by Lagarde and Tomala (2021) and Gomberg, Pancs, and Sharma (2023) also emphasize connections between gerrymandering and information design, albeit in less general models. Lagarde and Tomala assume two voter types, as in Owen and Grofman (1988); Gomberg, Pancs, and Sharma assume no aggregate uncertainty. The closest paper in the persuasion literature is our companion paper, Kolotilin, Corrao, and Wolitzky (2023), which we discuss later on.]

1.1. Related Literature

The most related prior papers on optimal partisan gerrymandering are Owen and Grofman (1988), Friedman and Holden (2008), and Gul and Pesendorfer (2010). Owen and Grofman’s model is equivalent to the special case of our model with two voter types. Gul and Pesendorfer consider competition between two designers who each control districting in some area and aim to win a majority of seats. [Footnote 7: Friedman and Holden (2020) study designer competition in the model of their 2008 paper.] A simplified version of their model with a single designer is equivalent to the special case of our model where vote swings are linear in voter types; we discuss this special case in Section 3.4. Friedman and Holden consider essentially the same model as we do (and in particular allow non-linear swings), but their main results concern the special case where aggregate uncertainty is much larger than idiosyncratic uncertainty. In contrast, we do not restrict the relative amounts of aggregate and idiosyncratic uncertainty, and we show empirically that the practically relevant case is that where idiosyncratic uncertainty dominates (i.e., the opposite of the case emphasized by Friedman and Holden).

The broader literature on gerrymandering and redistricting addresses a wide range of issues, including geographic constraints on gerrymandering (Sherstyuk, 1998; Shotts, 2001; Puppe and Tasnádi, 2009), gerrymandering with heterogeneous voter turnout (Bouton, Genicot, Castanheira, and Stashko, 2023), socially optimal districting (Gilligan and Matsusaka, 2006; Coate and Knight, 2007; Bracco, 2013), measuring district compactness (Chambers and Miller, 2010; Fryer and Holden, 2011; Ely, 2022), the interaction of redistricting and policy choices (Shotts, 2002; Besley and Preston, 2007), measuring gerrymandering (Grofman and King, 2007; McGhee, 2014; Stephanopoulos and McGhee, 2015; Duchin, 2018; Gomberg, Pancs, and Sharma, 2023), and assessing the consequences of redistricting (among many: Gelman and King, 1994b; McCarty, Poole, and Rosenthal, 2009; Hayes and McKee, 2009; Jeong and Shenoy, 2022). As the partisan gerrymandering problem interacts with many of these issues, our analysis may facilitate future research in these areas.

1.2. Outline

The paper is organized as follows: Section 2 presents the model. Section 3 analyzes some benchmark cases. Section 4 contains our main theoretical and numerical results. Section 5 contains our empirical results. Section 6 discusses policy implications of our results. Section 7 concludes. All proofs are deferred to the appendix.

2. Model

We consider a standard electoral model with one-dimensional voter types (parameterizing a voter’s probability of voting for the designer’s party) and one-dimensional aggregate uncertainty (parameterizing the designer’s aggregate vote share).

Voters and Vote Shares. There is a continuum of voters. Each voter has a type $s\in[\underline{s},\overline{s}]$, which is observed by the designer. [Footnote 8: In our empirical implementation, $s$ will correspond to the precinct the voter lives in.] The population distribution of voter types is denoted by $F$. The aggregate shock is denoted by $r\in\mathbb{R}$; its distribution is denoted by $G$. We assume that $F$ and $G$ are sufficiently smooth and that the corresponding densities $f$ and $g$ are strictly positive. [Footnote 9: It suffices that distributions $F$, $G$, and $Q$ (defined below) are four-times differentiable. We also consider discrete distributions in some benchmark cases.]

The share of type-$s$ voters who vote for the designer when the aggregate shock takes value $r$ is deterministic and is denoted by $v(s,r)\in[0,1]$. [Footnote 10: In our empirical implementation, $v(s,r)$ will correspond to the designer’s vote share in precinct $s$ given shock $r$.] The function $v(s,r)$ plays a key role in our analysis. We assume that $v(s,r)$ is strictly increasing in $s$ and strictly decreasing in $r$. Thus, higher voter types are stronger supporters of the designer (i.e., they vote for him with higher probability for every $r$), and higher aggregate shocks are worse for the designer (i.e., they reduce the probability that each voter type votes for him). The model thus lets different voter types “swing” by different amounts in response to an aggregate shock, but it does assume that all types swing in the same direction. We also impose the technical assumptions that $v(s,r)$ is four-times differentiable and satisfies $\lim_{r\to\infty}v(s,r)=0$ and $\lim_{r\to-\infty}v(s,r)=1$ for all $s$.

An interpretation of the vote share function $v(s,r)$ is that each voter is hit by an idiosyncratic “taste shock” $t\in\mathbb{R}$ and votes for the designer if and only if

\[ s-r-t\geq 0. \]

With this interpretation, when the taste shock distribution is $Q$, we have

\[ v(s,r)=Q(s-r)\;\;\text{for all }(s,r). \]

Mathematically, this “additive taste shock” case arises when the function $v(s,r)$ is translation-invariant: i.e., depends only on the difference $s-r$. In this case, the model is parameterized by three distributions: $F$, $G$, and $Q$. However, scaling $s$, $r$, and $t$ by the same constant leaves the model unchanged, so we can normalize the variance of one of these three variables to $1$. We will thus assume, without loss, that the variance of $t$ is $1$. [Footnote 11: Outside of the benchmark case considered in Section 3.3, where $Q$ is degenerate.]

The designer thus faces two kinds of uncertainty: aggregate uncertainty (captured by $r$) and idiosyncratic, voter-level uncertainty (captured by $t$, or more generally by the extent to which $v(s,r)$ lies away from the extremes of $0$ and $1$). Many of our results will involve comparing the “amount” of each kind of uncertainty.
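As a rough illustration of the additive taste shock case, the sketch below evaluates $v(s,r)=Q(s-r)$ under the assumption that $Q$ is the standard normal distribution. The normal form and the function names `normal_cdf` and `vote_share` are illustrative choices made here, not part of the model above, which only requires $Q$ to have unit variance.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF; an assumed (hypothetical) choice for Q."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def vote_share(s: float, r: float) -> float:
    """v(s, r) = Q(s - r): share of type-s voters who vote for the designer."""
    return normal_cdf(s - r)


# Vote shares for three voter types at three values of the aggregate shock.
for s in (-2.0, 0.0, 2.0):
    print(s, [round(vote_share(s, r), 3) for r in (-1.0, 0.0, 1.0)])
```

With this particular $Q$, vote shares rise with $s$ and fall with $r$, and a type near the shock (here $s=0$) swings more than an extreme type over the same range of $r$.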

Districting Plans. The designer allocates voters among a continuum of equipopulous districts based on their types $s$, and thus determines the distribution $P$ of $s$ in each district. [Footnote 12: Since districting plans in the US are drawn at the state level, our continuum model implicitly assumes that each state contains a large number of districts. Obviously, this is a better approximation for state legislative districts and for congressional districts in large states than it is for congressional districts in small states. Introducing integer constraints on the number of districts, while interesting and realistic, would substantially complicate the analysis and would risk obscuring our main insights.] A district is characterized by the distribution $P$ of voter types $s$ it contains. Thus, a districting plan, which specifies the measure of districts with each voter-type distribution $P$, is a distribution $\mathcal{H}$ over distributions $P$ of $s$, such that the population distribution of $s$ is given by $F$: that is, $\mathcal{H}\in\Delta\Delta[\underline{s},\overline{s}]$ and

\[ \int P(s)\,d\mathcal{H}(P)=F(s)\;\;\text{for all }s. \]

For example, under uniform districting, where all districts are the same, $\mathcal{H}$ assigns probability $1$ to $P=F$. In the opposite extreme case of segregation, where each district consists entirely of one type of voter, every distribution $P$ in the support of $\mathcal{H}$ takes the form $P=\delta_{s}$ for some $s\in[\underline{s},\overline{s}]$, where $\delta_{s}$ denotes the degenerate distribution on voter type $s$.

Designer’s Problem. The designer wins a district iff he receives a majority of the district vote. Thus, the designer wins a district with voter type distribution $P$ (henceforth, “district $P$”) iff $r$ satisfies $\int v(s,r)\,dP(s)\geq 1/2$. Since $v(s,r)$ is decreasing in $r$, the designer wins district $P$ iff

\[ r\leq r^{*}(P):=\left\{r:\int v(s,r)\,dP(s)=\frac{1}{2}\right\}. \]

We say that a district $P^{\prime}$ is weaker than another district $P$ if $r^{*}(P^{\prime})<r^{*}(P)$. Note that, whenever the designer wins a district $P$, he also wins all weaker districts $P^{\prime}$. Our model thus reflects what Grofman and King (2007, p. 12) call “a key empirical generalization that applies to all elections in the U.S. and most other democracies: the statewide or nationwide swing in elections is highly variable and difficult to predict, but the approximate rank order of districts is highly regular and stable.”
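To make the threshold $r^{*}(P)$ concrete, the following sketch solves $\int v(s,r)\,dP(s)=1/2$ by bisection for a district with finitely many voter types. It assumes the additive taste shock form $v(s,r)=Q(s-r)$ with $Q$ standard normal; that distributional choice, the discrete representation of $P$, and the names `district_vote_share` and `threshold_shock` are assumptions made for illustration only.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF; an assumed choice for the taste shock distribution Q."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def district_vote_share(types, weights, r):
    """Discrete analogue of the integral of v(s, r) dP(s), with v(s, r) = Q(s - r)."""
    return sum(w * normal_cdf(s - r) for s, w in zip(types, weights))


def threshold_shock(types, weights, tol=1e-10):
    """Find r*(P): the shock at which the designer's district vote share is 1/2.

    The vote share is strictly decreasing in r, so bisection applies.
    """
    lo, hi = min(types) - 10.0, max(types) + 10.0  # share > 1/2 at lo, < 1/2 at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if district_vote_share(types, weights, mid) >= 0.5:
            lo = mid  # the designer still wins at mid, so r* is at least mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# A homogeneous district of type s has r* = s; a lower type gives a weaker district.
print(threshold_shock([1.0], [1.0]), threshold_shock([-1.0], [1.0]))
```

Because the district vote share is monotone in $r$, the designer wins exactly when the realized shock is at or below the returned value, which is the ordering of districts by strength described above.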

We assume that the designer maximizes his party’s expected seat share. [Footnote 13: See Section 7 and Kolotilin and Wolitzky (2020) for discussion of more general designer objectives.] Thus, the designer’s problem is

\[ \max_{\mathcal{H}\in\Delta\Delta[\underline{s},\overline{s}]}\int G(r^{*}(P))\,d\mathcal{H}(P) \quad\text{s.t.}\quad \int P\,d\mathcal{H}(P)=F. \]

This problem nests the partisan gerrymandering problems of Owen and Grofman (1988), Friedman and Holden (2008), and (with a single designer) Gul and Pesendorfer (2010). [Footnote 14: Gul and Pesendorfer (2010) consider a majoritarian objective with district-level uncertainty in addition to aggregate uncertainty. However, after conditioning on the pivotal value of the aggregate shock, district-level uncertainty in Gul and Pesendorfer plays the same role as aggregate uncertainty in our model.] It is also equivalent to a Bayesian persuasion problem, where the designer splits a prior distribution $F$ into posterior distributions $P$, and obtains utility $G(r^{*}(P))$ from inducing posterior $P$. [Footnote 15: Specifically, the designer’s problem is equivalent to the state-independent sender case of the persuasion problem studied in Kolotilin, Corrao, and Wolitzky (2023), which specializes the general Bayesian persuasion problem of Kamenica and Gentzkow (2011) by assuming that the state and the receiver’s action are one-dimensional, the receiver’s utility is supermodular and concave in his action, and the sender’s utility is independent of the state and increasing in the receiver’s action. In the gerrymandering context, state-independent sender preferences reflect the fact that the designer cares only about how many districts he wins and not directly about the composition of these districts.]
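The objective and constraint can be evaluated numerically for a plan with finitely many district types. The sketch below is a minimal illustration under assumptions that are not in the text: $Q$ is standard normal, $G$ is normal with mean zero and standard deviation `sigma_r`, and a plan is a list of `(mass, types, weights)` triples. All function names are hypothetical.

```python
import math


def normal_cdf(x: float, scale: float = 1.0) -> float:
    """CDF of a mean-zero normal with standard deviation `scale` (assumed for Q and G)."""
    return 0.5 * (1.0 + math.erf(x / (scale * math.sqrt(2.0))))


def threshold_shock(types, weights, tol=1e-10):
    """r*(P) for a discrete district P, found by bisection with v(s, r) = Q(s - r)."""
    lo, hi = min(types) - 10.0, max(types) + 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        share = sum(w * normal_cdf(s - mid) for s, w in zip(types, weights))
        if share >= 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def expected_seat_share(plan, sigma_r):
    """Discrete analogue of the integral of G(r*(P)) dH(P)."""
    return sum(mass * normal_cdf(threshold_shock(t, w), sigma_r) for mass, t, w in plan)


def population_distribution(plan):
    """Mass of each voter type implied by the plan; must match F for feasibility."""
    totals = {}
    for mass, types, weights in plan:
        for s, w in zip(types, weights):
            totals[s] = totals.get(s, 0.0) + mass * w
    return totals


# Two feasible plans for a population split equally between types -1 and +1.
uniform = [(1.0, [-1.0, 1.0], [0.5, 0.5])]
segregated = [(0.5, [-1.0], [1.0]), (0.5, [1.0], [1.0])]
print(population_distribution(uniform) == population_distribution(segregated))
print(expected_seat_share(uniform, 1.0), expected_seat_share(segregated, 1.0))
```

In this symmetric two-type example both plans yield an expected seat share of one half, which is a feature of the symmetry rather than a general property; the point is only to show how a candidate plan is checked for feasibility and scored.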

3. Benchmark Cases

We first consider four benchmark cases:

  (1) There is no uncertainty.

  (2) There is idiosyncratic uncertainty but no aggregate uncertainty.

  (3) There is aggregate uncertainty but no idiosyncratic uncertainty.

  (4) Both kinds of uncertainty are present, but swings are linear in voter types.

These cases illustrate the key forces in the model and set up our main analysis. The benchmark cases with only one kind of uncertainty are much more tractable than the general case with both kinds, but they give a good indication of the form of optimal districting plans when both kinds of uncertainty are present but one kind is much “larger” than the other. We will see that this case is relevant in practice, where idiosyncratic uncertainty is much larger than aggregate uncertainty. Similarly, the linear swing case is very tractable and is a good guide to the more realistic case where swings deviate from linearity systematically but by a relatively small amount.

3.1. Perfect Information: Pack-and-Crack

With perfect information, optimal gerrymandering takes a simple and well-known form.

Proposition 1.

Assume there is no uncertainty: there exists $r^{0}$ such that $r=r^{0}$ with certainty, and $v(s,r^{0})=\mathbf{1}\{s\geq r^{0}\}$ for all $s$. Denote the fraction of the designer’s “supporters” by $m=1-F(r^{0})$.

  (1) If $m\geq 1/2$, a districting plan is optimal iff it creates measure $1$ of districts where $\Pr_{P}(s\geq r^{0})\geq 1/2$. Under such a plan, the designer wins all districts.

  (2) If $m<1/2$, a districting plan is optimal iff it creates measure $2m$ of “cracked” districts where $\Pr_{P}(s\geq r^{0})=\Pr_{P}(s<r^{0})=1/2$ and measure $1-2m$ of “packed” districts where $\Pr_{P}(s<r^{0})=1$. Under such a plan, the designer wins the cracked districts.

Case (1) says that a designer with majority support wins all the districts (e.g., with uniform districting). Case (2) says that a designer with minority support $m<1/2$ wins $2m$ districts with 50% of the vote, and gets zero votes in the remaining $1-2m$ districts. We call any optimal plan in case (2) pack-and-crack.
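The arithmetic in Proposition 1 can be sketched in a few lines. The function name `pack_and_crack` and the dictionary it returns are hypothetical conveniences introduced here; the measures themselves follow the two cases of the proposition.

```python
def pack_and_crack(m: float) -> dict:
    """Measures of won and packed districts when a share m of voters are supporters."""
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie in [0, 1]")
    if m >= 0.5:
        # Case (1): every district can be given at least 50% supporters.
        return {"won": 1.0, "packed": 0.0}
    # Case (2): each won district uses supporters for exactly half of its voters.
    return {"won": 2.0 * m, "packed": 1.0 - 2.0 * m}


print(pack_and_crack(0.3))
```

In case (2), the supporters used, half of the `won` measure, add up to exactly $m$, so no supporter is wasted in a district the designer loses.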

When $m<1/2$ and voter types are continuous, there are many pack-and-crack plans. For example, some types of supporters can be assigned to only a subset of cracked districts, and some types of opponents can be assigned only to packed districts. This seemingly pedantic point will become important once we introduce uncertainty, because optimal plans under a small amount of uncertainty will approximate some but not all pack-and-crack plans.

Figure 1. Four Varieties of Pack-and-Crack
Panels: (a) traditional pack-and-crack; (b) pack-opponents-and-pool ($p$-segregation); (c) pack-opponents-and-pair (POP); (d) pack-moderates-and-pair (PMP; matching slices). In panels (a)–(c), the type axis is divided into segments of measure $1-2x^{0}$, $x^{0}$, and $x^{0}$; in panel (d), into segments of measure $x^{0}$, $1-2x^{0}$, and $x^{0}$.
Notes: In each panel, the horizontal axis is the interval of voter types, $s$, where red voters are supporters and blue voters are opponents. The designer wins red districts and loses blue ones. Solid shading indicates pooling; curved lines connecting two voter types indicate pairing; hatched shading indicates segregation.

Figure 1 illustrates four pack-and-crack plans that play important roles in our analysis. Panel (a) is what we call traditional pack-and-crack: the strongest opposing voters are pooled in one type of district, while the remaining voters (a mix of supporters and opponents) are pooled in another type of district. Panel (b) is the same, except now each strong opposing type is segregated in a distinct, homogeneous district. We call this plan pack-opponents-and-pool. This plan was previously studied by Gul and Pesendorfer (2010), who called it “$p$-segregation.” Panel (c) is the same as Panel (b), except now favorable voter types are matched in a negatively assortative manner to form distinct districts. We call this plan pack-opponents-and-pair, or POP. This plan plays a central role in our analysis, as we will see that it is optimal for realistic parameter values; however, we will also see that the simpler traditional pack-and-crack and pack-opponents-and-pool plans are approximately optimal for the same parameters.

Finally, we call the plan in Panel (d), where extreme voter types are matched in a negatively assortative manner and intermediate voter types are segregated, pack-moderates-and-pair, or PMP. This plan was previously studied by Friedman and Holden (2008), who called it “matching slices.” [Footnote 16: Friedman and Holden did not emphasize the possibility of segregating a non-trivial interval of intermediate voter types under matching slices, but their results allow this possibility, and we will see that this is actually the typical case.] We also refer to the extreme form of PMP where the segregation region is degenerate, so that only a single voter type is segregated, as negative assortative districting.

3.2. No Aggregate Uncertainty

We next consider the case with idiosyncratic uncertainty but no aggregate uncertainty. As we will see, this case is fairly realistic, as empirically idiosyncratic uncertainty is much larger than aggregate uncertainty.

Proposition 2.

Assume there is no aggregate uncertainty: there exists $r^{0}$ such that $r=r^{0}$ with certainty.

  (1) If $\int v(s,r^{0})\,dF(s)\geq 1/2$, a districting plan is optimal iff it creates measure $1$ of districts where $\int v(s,r^{0})\,dP(s)\geq 1/2$. Under such a plan, the designer wins all districts.

  (2) If $\int v(s,r^{0})\,dF(s)<1/2$, let $s^{*}$ satisfy $\int_{s^{*}}^{\overline{s}}(v(s,r^{0})-1/2)\,dF(s)=0$. A districting plan is optimal iff it creates measure $1-F(s^{*})$ of cracked districts where $\Pr_{P}(s\geq s^{*})=1$ and $\int_{s^{*}}^{\overline{s}}v(s,r^{0})\,dP(s)=1/2$, and measure $F(s^{*})$ of packed districts where $\Pr_{P}(s<s^{*})=1$. Under such a plan, the designer wins the cracked districts.

In case (1), the designer wins all districts under uniform districting. In case (2), the designer assigns all voter types $s>s^{*}$ to cracked districts that he wins with exactly 50% of the vote, and packs the remaining voters arbitrarily. The intuition is that the designer wins a district iff the mean vote share $v(s,r^{0})$ among voters in the district exceeds 50%, so to win as many districts as possible the designer assigns only voter types above $s^{*}$ to cracked districts. This plan approximates the pack-and-crack vote share pattern as closely as possible, given the uncertainty facing the designer.
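The cutoff logic in Proposition 2 has a simple analogue when voters come in finitely many groups. The sketch below is an illustrative discretization, not the continuous-type statement above: it takes each group's vote share $v(s,r^{0})$ and mass as given, adds groups from strongest to weakest while the pooled vote share stays at or above 50%, and splits the marginal group if needed. The function name `cracked_measure` is hypothetical.

```python
def cracked_measure(vote_shares, masses):
    """Largest measure of districts winnable with exactly-or-above 50% of the vote.

    vote_shares[i] is v(s_i, r0) for group i and masses[i] is its population share.
    """
    groups = sorted(zip(vote_shares, masses), reverse=True)  # strongest groups first
    surplus = 0.0  # accumulated sum of (v - 1/2) * mass over included voters
    cracked = 0.0  # measure of voters placed in cracked (won) districts
    for v, mass in groups:
        margin = v - 0.5
        if margin >= 0.0 or surplus + margin * mass >= 0.0:
            surplus += margin * mass
            cracked += mass
        else:
            # Include only as much of this group as the remaining surplus can absorb.
            cracked += surplus / (-margin)
            break
    return cracked


# Groups with vote shares 0.9, 0.4, 0.1 and masses 0.2, 0.3, 0.5.
print(cracked_measure([0.9, 0.4, 0.1], [0.2, 0.3, 0.5]))
```

In this example the first two groups and an eighth of the population from the weakest group are cracked, for a total measure of 0.625; the votes in those districts sum to 0.3125, exactly half of that measure.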

The optimal plans in Proposition 2 coincide with the subset of optimal perfect-information plans that pack opponents (e.g., the plans in Figure 1(a)–(c)). Hence, pack-and-crack plans that pack opponents can be optimal with idiosyncratic uncertainty but no aggregate uncertainty, but plans that pack moderates (e.g., PMP) cannot. In Sections 4 and 5, we will see that idiosyncratic uncertainty dominates aggregate uncertainty in practice. Hence, any optimal plan in Proposition 2—for example, traditional pack-and-crack—will prove to be approximately optimal for realistic parameters.

3.3. No Idiosyncratic Uncertainty

We now turn to the case with aggregate uncertainty but no idiosyncratic uncertainty.

Proposition 3.

Assume there is no idiosyncratic uncertainty: $v(s,r)=\mathbf{1}\{s\geq r\}$ for all $(s,r)$. Denote the population median voter type by $s^{m}=F^{-1}(1/2)$. A districting plan is optimal iff for $\mathcal{H}$-almost every district $P\in\operatorname{supp}(\mathcal{H})$ there exists a voter type $s^{P}\geq s^{m}$ such that $\Pr_{P}(s=s^{P})=\Pr_{P}(s<s^{m})=1/2$. Under such a plan, the designer wins district $P$ iff $r\leq s^{P}$.

That is, for each voter type $s$ above the population median, the designer creates a district consisting of 50% voters with this type and 50% voters with below-median types. Note that, for every realization of aggregate uncertainty $r\in(\underline{s},\overline{s})$, the designer wins some districts with exactly 50% of the vote and wins zero votes in all other districts. This is precisely the pack-and-crack vote share pattern.

The intuition for Proposition 3 is easy to see with a finite number NN of districts. With no idiosyncratic uncertainty, the probability that the designer wins a given district is determined by the median voter type in that district. The strongest district the designer can possibly create is formed by combining the 1/(2N)1/(2N) highest voter types with any other voters: that is, it is impossible to create a district where the median voter is above the 11/(2N)1-1/(2N) quantile of the population distribution. Similarly, it is impossible to create nn districts where the median voter is everywhere above the 1n/(2N)1-n/(2N) quantile of the population distribution. But, by creating districts one at time by always combining the 1/(2N)1/(2N) highest remaining voters with 1/(2N)1/(2N) below-median voters, the designer ensures that the median voter in the nthn^{\text{th}} strongest district is exactly the 1n/(2N)1-n/(2N) quantile. So this plan is optimal.
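The finite-$N$ construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: voters are a finite list of numeric types that splits evenly into $2N$ slices, and the function names `matching_slices` and `win_threshold` are hypothetical, not taken from the paper.

```python
def matching_slices(types, n_districts):
    """Build districts one at a time: the highest remaining slice of
    above-median voters is combined with a slice of below-median voters."""
    ordered = sorted(types)
    half = len(ordered) // (2 * n_districts)  # voters in each half-district
    if len(ordered) != 2 * n_districts * half:
        raise ValueError("population must split evenly into 2N slices")
    below = ordered[: len(ordered) // 2]   # below-median voters
    above = ordered[len(ordered) // 2:]    # above-median voters
    districts = []
    for k in range(n_districts):
        top_slice = above[len(above) - (k + 1) * half: len(above) - k * half]
        filler = below[k * half: (k + 1) * half]  # which filler is used is immaterial
        districts.append(filler + top_slice)
    return districts


def win_threshold(district):
    """With v(s, r) = 1{s >= r}, the designer wins iff at least half of the
    district has s >= r, i.e. iff r is at most the lowest type in the top half."""
    ordered = sorted(district)
    return ordered[len(ordered) // 2]


# Twelve hypothetical voters with types 0..11, split into N = 3 districts.
plan = matching_slices(list(range(12)), 3)
thresholds = [win_threshold(d) for d in plan]
```

In this toy population the $n^{\text{th}}$ strongest district has its threshold at the voter ranked at the $1-n/(2N)$ quantile, which is the bound derived in the text.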

The optimal plans in Proposition 3 are a subset of optimal perfect-information plans. For example, the PMP plan in Figure 1(d) remains optimal when $v(s,r)=\mathbf{1}\{s\geq r\}$ but $r$ is not degenerate, while the plans in Figures 1(a)–(c) that pack opponents are not optimal in this setting. This result is consistent with Friedman and Holden (2008), who show that matching slices is optimal when idiosyncratic uncertainty is sufficiently small, under some additional assumptions which we discuss in Section 4.1. [Footnote 17: Note that in every optimal plan in Proposition 3, all voters with the highest type $s$ are assigned to the same district: in Friedman and Holden’s words, “one’s most ardent supporters should be grouped together.” This is what Friedman and Holden mean when they write that “cracking is never optimal” and summarize their findings as “sometimes pack, but never crack.”]

3.4. Linear Swing

Our last benchmark case is when vote shares and swings are linear in voter types. There are two equivalent ways to define this case. The simplest definition is that vote shares $v(s,r)$ are linear in $s$:

\[
v(s,r)=\frac{\overline{s}-s}{\overline{s}-\underline{s}}\,v(\underline{s},r)+\frac{s-\underline{s}}{\overline{s}-\underline{s}}\,v(\overline{s},r)\quad\text{for all $(s,r)$}.
\]

An alternative, equivalent definition is that vote swings are linear in $s$. To state this definition, first define the swing of a voter type $s$ when the aggregate shock changes from $r'$ to $r$ by

\[
\Delta_{s}^{r,r'}=v(s,r)-v(s,r').
\]

We then say that swings $\Delta_{s}^{r,r'}$ are linear in $s$ if

\[
\begin{aligned}
\Delta_{s}^{r,r'} &=\rho(s)\,\Delta_{\underline{s}}^{r,r'}+(1-\rho(s))\,\Delta_{\overline{s}}^{r,r'}\quad\text{for all $(s,r,r')$, where}\\
\rho(s) &=\frac{v(\overline{s},r)-v(s,r)}{v(\overline{s},r)-v(\underline{s},r)}=\frac{v(\overline{s},r')-v(s,r')}{v(\overline{s},r')-v(\underline{s},r')}.
\end{aligned}
\]

It is easy to see that, up to a rescaling of $s$, vote shares are linear iff swings are linear.

The linear case nests the uniform swing case where $\Delta_{s}^{r,r'}$ is independent of $s$ (for each $r,r'$), so the aggregate shock shifts the vote share equally for all voter types. Political scientists often assume uniform swing to study how a given districting plan would perform under different electoral outcomes. [Footnote 18: See, e.g., Katz, King, and Rosenblatt (2020) for a recent discussion of this methodology.] The linear case also nests the case where voter types are binary (i.e., $\operatorname{supp}(F)=\{\underline{s},\overline{s}\}$), as well as the no-aggregate-uncertainty case considered in Section 3.2. However, the no-idiosyncratic-uncertainty case considered in Section 3.3 cannot be linear, unless voter types are binary.

The key simplification afforded by linearity is that the threshold shock $r^{*}(P)$ for winning a district $P$ depends only on the district mean voter type $x=\mathbb{E}_{P}[s]$. Under linearity, the designer thus effectively chooses a distribution $H(x)$ of mean types $x$, rather than a distribution $\mathcal{H}(P)$ of distributions of types $P$. With this formulation, the constraint $\int P\,d\mathcal{H}(P)=F$ simplifies to the requirement that $H$ is a mean-preserving contraction of $F$, which we denote by $F\succsim H$. [Footnote 19: One way to see this is by analogy to statistics, where if a state $s$ is distributed according to $F$ then there exists an experiment such that the distribution of posterior expectations of $s$ is given by $H$ iff $H$ is a mean-preserving contraction of $F$ (e.g., Blackwell, 1953; Kolotilin, 2018).]
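The mean-preserving-contraction constraint can be illustrated numerically. The sketch below assumes a finite population of hypothetical voter types split into equal-size districts, and tests the convex order through the function $t\mapsto\mathbb{E}[(x-t)^{+}]$ evaluated at the voter types, which suffices for a finite population. All names (`district_means`, `call_value`, `is_mean_preserving_contraction`) are illustrative.

```python
import random


def district_means(districts):
    """Mean voter type in each (equal-size) district."""
    return [sum(d) / len(d) for d in districts]


def call_value(values, t):
    """E[max(x - t, 0)] under the uniform distribution on `values`."""
    return sum(max(x - t, 0.0) for x in values) / len(values)


def is_mean_preserving_contraction(means, types, tol=1e-12):
    """Check that `means` has the same average as `types` and is smaller in
    the convex order, testing E[(x - t)^+] at every voter type t."""
    same_mean = abs(sum(means) / len(means) - sum(types) / len(types)) <= tol
    dominated = all(
        call_value(means, t) <= call_value(types, t) + tol for t in types
    )
    return same_mean and dominated


random.seed(0)
types = [random.uniform(-1.0, 1.0) for _ in range(60)]  # hypothetical voter types
random.shuffle(types)                                    # an arbitrary assignment
districts = [types[i:i + 10] for i in range(0, 60, 10)]  # six equipopulous districts
means = district_means(districts)
```

Any assignment of voters to equal-size districts passes this check (by Jensen's inequality), whereas a distribution that is more spread out than the voter types, such as the types doubled, does not.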

Slightly abusing notation, the designer wins districts with mean voter type at least $x$ iff $r\leq r^{*}(x)$. The probability of this event is

\[
U(x):=G(r^{*}(x)).
\]

We can interpret $U$ as the distribution of a re-scaled aggregate shock $z$ where the designer wins a district with mean type $x$ iff $z\leq x$. The designer’s problem thus becomes

\[
\max_{H\in\Delta[\underline{s},\overline{s}]}\int U(x)\,dH(x)\quad\text{s.t. $F\succsim H$}.
\]

Clearly, uniform districting is optimal if $U$ is concave, and segregation is optimal if $U$ is convex. However, a more realistic assumption is that $U$ is strictly S-shaped, so the marginal impact of replacing a less favorable voter with a more favorable one on the probability of winning a district is first increasing and then decreasing. Formally, this means that there is an inflection point $x^{i}\in[0,1]$ such that $U$ is strictly convex on $[0,x^{i}]$ and strictly concave on $[x^{i},1]$; equivalently, the re-scaled aggregate shock $z$ is unimodal.
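The concave and convex benchmarks follow from Jensen's inequality, which a toy computation makes concrete. The five voter types and the two shapes of $U$ below are hypothetical choices made only for illustration.

```python
from math import sqrt

types = [0.1, 0.2, 0.4, 0.7, 0.9]  # hypothetical voter types with equal mass


def seat_share_uniform(U, types):
    """Every district has the population mean type."""
    return U(sum(types) / len(types))


def seat_share_segregation(U, types):
    """Every district contains a single voter type."""
    return sum(U(s) for s in types) / len(types)


def concave_U(x):
    return sqrt(x)   # illustrative concave win probability on [0, 1]


def convex_U(x):
    return x * x     # illustrative convex win probability on [0, 1]


uniform_beats_segregation = (
    seat_share_uniform(concave_U, types) > seat_share_segregation(concave_U, types)
)
segregation_beats_uniform = (
    seat_share_segregation(convex_U, types) > seat_share_uniform(convex_U, types)
)
```

With an S-shaped $U$ neither extreme plan wins outright, which is what motivates the intermediate plans studied next.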

We will see that $U$ being S-shaped is closely related to the optimality of pack-opponents-and-pool districting (i.e., $p$-segregation, see Figure 1(b)), where voter types below some cutoff $s^{*}$ are segregated, and voter types above $s^{*}$ are pooled in districts with mean voter type $x^{*}=\mathbb{E}_{F}[s\mid s\geq s^{*}]$. Under pack-opponents-and-pool districting with cutoff $s^{*}$ and pool mean $x^{*}=\mathbb{E}_{F}[s\mid s\geq s^{*}]$, the designer’s expected seat share is

\[
\int_{\underline{s}}^{s^{*}}U(x)\,dF(x)+U(x^{*})(1-F(s^{*})).
\]

The best pack-opponents-and-pool plan is the one where $s^{*}$ is chosen to maximize this expectation. When the optimal value of $s^{*}$ is interior, it is characterized by the first-order condition

\[
u(x^{*})(x^{*}-s^{*})=U(x^{*})-U(s^{*}).
\]

The intuition for this equation is that a marginal increase in $s^{*}$ increases the pool mean, which increases the designer’s expected seat share by $u(x^{*})(1-F(s^{*}))\,dx^{*}/ds^{*}=u(x^{*})(x^{*}-s^{*})f(s^{*})$; but also decreases the mass of pooled voters, which decreases the designer’s expected seat share by $(U(x^{*})-U(s^{*}))f(s^{*})$. The first-order condition equates the marginal benefit and marginal cost. See Figure 2.
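To make the first-order condition concrete, the following sketch solves it numerically in one hypothetical specification: $F$ uniform on $[0,1]$ (so that $x^{*}=(1+s^{*})/2$) and $U$ a normal CDF centered at $0.5$ with standard deviation $0.2$, with $u=U'$. These functional forms and parameter values are assumptions chosen for illustration, not estimates from the paper.

```python
from math import erf, exp, pi, sqrt

SIGMA = 0.2  # hypothetical spread of the re-scaled aggregate shock


def U(x):
    """Hypothetical S-shaped win probability: normal CDF centered at 0.5."""
    return 0.5 * (1.0 + erf((x - 0.5) / (SIGMA * sqrt(2.0))))


def u(x):
    """Density u = U'."""
    z = (x - 0.5) / SIGMA
    return exp(-0.5 * z * z) / (SIGMA * sqrt(2.0 * pi))


def pool_mean(s_star):
    """x* = E[s | s >= s*] when F is uniform on [0, 1]."""
    return 0.5 * (1.0 + s_star)


def foc_residual(s_star):
    """Marginal benefit minus marginal cost of raising the cutoff (per unit of f)."""
    x_star = pool_mean(s_star)
    return u(x_star) * (x_star - s_star) - (U(x_star) - U(s_star))


def solve_cutoff(lo=0.0, hi=0.5, iters=80):
    """Bisection; for these parameters the residual is positive at lo, negative at hi."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if foc_residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def seat_share(s_star, n=2000):
    """Trapezoid-rule integral of U over [0, s*] plus U(x*) times the pooled mass."""
    step = s_star / n
    integral = sum(
        0.5 * (U(i * step) + U((i + 1) * step)) * step for i in range(n)
    )
    return integral + U(pool_mean(s_star)) * (1.0 - s_star)


s_star = solve_cutoff()
```

Note that $s^{*}=\overline{s}$ always satisfies the first-order condition trivially, so the search interval is restricted to cutoffs below the inflection point. In this specification the interior cutoff yields a strictly higher seat share than either uniform districting (`seat_share(0.0)`) or full segregation (`seat_share(1.0)`).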

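[Figure 2. Optimal Pack-Opponents-and-Pool Districting. The figure plots $U$ against $x$ on $[\underline{s},\overline{s}]$, marking $s^{*}$, $x^{*}$, $U(s^{*})$, and $U(x^{*})$, with a segregation region below $s^{*}$ and a pooling region above it.]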

A simple result is that pack-opponents-and-pool is optimal when $U$ is strictly S-shaped.

Proposition 4.

In the linear case where $U$ is strictly S-shaped, pack-opponents-and-pool districting is optimal, and every optimal districting plan has the same distribution of district means.

Intuitively, when $U$ is S-shaped the designer is risk-loving in the pool mean $x$ for $x\in[0,s^{*}]$ and is risk-averse in $x$ “on average” for $x\in[s^{*},1]$, so voters below $s^{*}$ are segregated and voters above $s^{*}$ are pooled. Similar results were established by Gul and Pesendorfer (2010) and, in the persuasion literature, Kolotilin (2018) and Kolotilin, Mylovanov, and Zapechelnyuk (2022).

As aggregate uncertainty vanishes, the best pack-opponents-and-pool plan converges to the plan characterized in Proposition 2 with segregated packed districts. [Footnote 21: Note that as $G$ converges to the step function $\mathbf{1}\{r\geq r^{0}\}$, $U$ converges to the step function $\mathbf{1}\{x\geq x^{i}\}$, where $x^{i}$ is the solution to $v(x^{i},r^{0})=1/2$. The first-order condition then reduces to the condition that $x^{*}=x^{i}$, which yields the same condition for $s^{*}$ as in Proposition 2.] Thus, traditional pack-and-crack (where packed districts are pooled) and pack-opponents-and-pool and POP (where packed districts are segregated) are all optimal without aggregate uncertainty, but only the latter two plans remain optimal with a small amount of aggregate uncertainty. [Footnote 22: Intuitively, the designer optimally segregates packed districts to have a respectable chance of winning the strongest of these districts.] Note that pack-opponents-and-pool and POP induce the same distribution of district mean types, and hence may both be optimal even when the optimal distribution of means is unique. However, the designer’s indifference among different ways of creating cracked districts with the same mean type is not robust to introducing slightly non-linear swings, as we show in the next section.

Remark 1 (Means vs. Medians).

An intuition for why packing opponents is optimal with linear swings and unimodal aggregate shocks (including in the no-aggregate-uncertainty case), while packing moderates is optimal with no idiosyncratic uncertainty, is that the designer targets a distribution of district means in the former case and district medians in the latter case. Optimizing the distribution of district means with unimodal aggregate uncertainty entails packing opponents and cracking moderates and supporters among districts with the same mean type. Optimizing the distribution of district medians entails matching voter types above and below the population median. Loosely speaking, whether packing opponents or moderates is optimal in practice depends on whether reality is closer to the linear/mean-dependent case or the no-idiosyncratic-uncertainty/median-dependent case.

The distinction between mean- and median-dependence can be used to classify several strands of related literature. In gerrymandering, Owen and Grofman (1988) and Gul and Pesendorfer (2010) study the mean-dependent case, while Friedman and Holden (2008) study an approximately median-dependent case. In electoral competition, probabilistic voting models with partisan taste shocks such as Hinich (1977) and Lindbeck and Weibull (1993) are mean-dependent, while stochastic median voter models such as Wittman (1983) and Calvert (1985) are median-dependent. In persuasion, Gentzkow and Kamenica (2016), Kolotilin, Mylovanov, Zapechelnyuk, and Li (2017), Kolotilin (2018), Dworczak and Martini (2019), and Kleiner, Moldovanu, and Strack (2021) study the mean-dependent case, while Kolotilin, Corrao, and Wolitzky (2023) study a general case nesting both the mean- and quantile- (e.g., median-) dependent cases, and Yang and Zentefis (2023) study the quantile-dependent case.

4. General Analysis

We now consider the general case with both idiosyncratic and aggregate uncertainty and non-linear swings. We first impose a natural curvature assumption on swings, and show that it implies that optimal districting is “strictly single-dipped,” in that more extreme voters are assigned to stronger districts. We then argue that optimal strictly single-dipped districting plans typically take a “pack-and-pair” form, where weaker districts are segregated and stronger districts consist of exactly two voter types. POP and PMP are leading examples of pack-and-pair plans. We next provide theoretical and numerical results that delineate the parameter ranges where POP or PMP is optimal. Here we find that POP is optimal when idiosyncratic uncertainty is much larger than aggregate uncertainty, PMP is optimal when aggregate uncertainty is larger than idiosyncratic uncertainty, and mixed versions of POP or PMP are optimal in the intermediate range. Finally, we observe that when idiosyncratic uncertainty is sufficiently dominant (as we will see is the case in practice), the optimal POP plan closely resembles $p$-segregation, and both $p$-segregation and traditional pack-and-crack districting are approximately optimal.

4.1. Swingy Moderates and Single-Dipped Districting

The linear swing case considered in Section 3.4 is a natural benchmark, but it makes the counterfactual prediction that the “swingiest” voters (those with the largest $\Delta_{s}^{r,r'}$) are “extremists” with $s\in\{\underline{s},\overline{s}\}$. In contrast, election forecasters (and presumably sophisticated gerrymanderers) take into account that moderate voters are usually swingier than extremists. As Nathaniel Rakich and Nate Silver put it when describing the “elasticity scores” in the FiveThirtyEight.com forecasting model, “Voters at the extreme end of the spectrum—those who have a close to a 0 percent or a 100 percent chance of voting for one of the parties—don’t swing as much as those in the middle,” (Rakich and Silver, 2018). We provide some evidence for this claim in Section 5.

The following assumption formalizes the idea that moderates are swingier than extremists.

Assumption 1 (Swingy Moderates).

We have

\[
\frac{\partial^{2}}{\partial s\,\partial r}\ln\left(\frac{\partial v(s,r)}{\partial s}\right)>0\quad\text{for all $s$, $r$}. \tag{1}
\]

To see why Assumption 1 corresponds to moderates being swingy, note that integrating (1) gives, for all $s<s'<s''$ and $r<r'$,

\[
\bigl(v(s'',r')-v(s',r')\bigr)\bigl(v(s',r)-v(s,r)\bigr)>\bigl(v(s'',r)-v(s',r)\bigr)\bigl(v(s',r')-v(s,r')\bigr),
\]

or equivalently

\[
\Delta_{s'}^{r,r'}>\frac{v(s'',r)-v(s',r)}{v(s'',r)-v(s,r)}\,\Delta_{s}^{r,r'}+\frac{v(s',r)-v(s,r)}{v(s'',r)-v(s,r)}\,\Delta_{s''}^{r,r'}\quad\text{for all $s<s'<s''$, $r<r'$}. \tag{2}
\]

Recall that the linear case is defined by having equality in (2). Thus, Assumption 1 says that, for any pair of aggregate shocks $r<r'$ and any triple of voter types $s<s'<s''$, when the aggregate shock improves from $r'$ to $r$, type $s'$ voters swing toward the designer more than type $s$ and $s''$ voters, relative to the linear case.
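Inequality (2) can be checked numerically in a specific example. The sketch below assumes vote shares of the additive form $v(s,r)=Q(s-r)$ with $Q$ the standard normal CDF; this functional form and the small grids are illustrative assumptions, and the helper names are hypothetical.

```python
from itertools import combinations
from math import erf, sqrt


def v(s, r):
    """Hypothetical vote share: additive taste shock with standard normal CDF."""
    return 0.5 * (1.0 + erf((s - r) / sqrt(2.0)))


def swing(s, r, r_prime):
    """Swing of type s when the aggregate shock changes from r_prime to r."""
    return v(s, r) - v(s, r_prime)


def moderate_swings_more(s_lo, s_mid, s_hi, r, r_prime):
    """Inequality (2): the middle type's swing exceeds the weighted average
    of the outer types' swings, with the weights used in the linear case."""
    w_lo = (v(s_hi, r) - v(s_mid, r)) / (v(s_hi, r) - v(s_lo, r))
    w_hi = 1.0 - w_lo
    benchmark = w_lo * swing(s_lo, r, r_prime) + w_hi * swing(s_hi, r, r_prime)
    return swing(s_mid, r, r_prime) > benchmark


type_grid = [-2.0, -1.0, 0.0, 1.0, 2.0]   # sorted, so combinations are ordered
shock_grid = [-1.0, 0.0, 1.0]
all_hold = all(
    moderate_swings_more(a, b, c, r, r_prime)
    for a, b, c in combinations(type_grid, 3)
    for r, r_prime in combinations(shock_grid, 2)
)
```

For instance, with types $(-1,0,1)$ and shocks $r=0<r'=1$, the middle type's swing is about $0.34$ against a linear benchmark of about $0.24$.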

We mention an equivalent condition and an implication of Assumption 1.

Proposition 5.

The following hold:

  (1) In the additive taste shock case, Assumption 1 holds iff the density $q$ of the taste shock $t$ is strictly log-concave:

    \[
    \frac{d^{2}}{dt^{2}}\ln\left(q(t)\right)<0\quad\text{for all $t$}.
    \]

  (2) Assumption 1 implies that $\partial v(s,r)/\partial r$ is strictly single-dipped (i.e., decreasing and then increasing) in $s$, for each $r$.

Many common distributions have strictly log-concave densities, including the normal, logistic, and extreme value distributions (see, e.g., Table 1 in Bagnoli and Bergstrom 2005), so part 1 of the proposition shows that Assumption 1 is a standard property. The property in part 2 of the proposition gives another sense in which moderates are swingier than extremists. For example, for any $s<s'<s''$, this property implies that (letting $v_{r}=\partial v/\partial r$) if $v_{r}(s,r)=v_{r}(s'',r)$, then $v_{r}(s',r)<v_{r}(s,r)=v_{r}(s'',r)<0$ (recalling that $v_{r}<0$), so type $s'$ is swingier than types $s$ and $s''$.
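Part 1 of the proposition can be verified in one line, assuming that the additive taste shock case means $v(s,r)=Q(s-r)$ with density $q=Q'$ (the form that also appears in the threshold equation of Section 4.2). The following is a sketch of that calculation, with the normal density as a worked instance:

```latex
\[
\frac{\partial v(s,r)}{\partial s}=q(s-r)
\quad\Longrightarrow\quad
\frac{\partial^{2}}{\partial s\,\partial r}\ln q(s-r)
=\frac{\partial}{\partial r}\,(\ln q)'(s-r)
=-(\ln q)''(s-r),
\]
so condition (1) holds for all $(s,r)$ iff $(\ln q)''(t)<0$ for all $t$.
For the standard normal density, $\ln q(t)=-\tfrac{1}{2}t^{2}-\tfrac{1}{2}\ln(2\pi)$,
so $(\ln q)''(t)=-1<0$ and the cross-partial in (1) equals $1$ everywhere.
```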

We now show that Assumption 1 implies that every optimal districting plan is “strictly single-dipped,” in that more extreme voters are assigned to stronger districts. Formally, a districting plan $\mathcal{H}$ is strictly single-dipped if any district $P\in\operatorname{supp}(\mathcal{H})$ containing any two voter types $s<s''$ is stronger than any district $P'\in\operatorname{supp}(\mathcal{H})$ containing any intervening voter type $s'\in(s,s'')$, in that $r^{*}(P')<r^{*}(P)$. [Footnote 23: Formally, we say that a district $P$ “contains” a voter type $s$ if $s\in\operatorname{supp}(P)$.] Note that if districting is strictly single-dipped then each district consists of at most two distinct voter types.

Proposition 6.

Under Assumption 1, every optimal districting plan is strictly single-dipped.

Similar results were established by Friedman and Holden (2008) and, in the persuasion context, Kolotilin, Corrao, and Wolitzky (2023). [Footnote 24: Assumption 1 is equivalent to Friedman and Holden’s “informative signal property.” Friedman and Holden assume a finite number of districts, and also assume that the median and mode of $Q$ coincide. Kolotilin, Corrao, and Wolitzky (2023) give sufficient conditions for single-dippedness in a more general model that allows state-dependent designer preferences.] To see the intuition, suppose a districting plan creates two districts, 1 and 2, with the same threshold aggregate shock $r^{*}$, but where District 1 consists entirely of moderates and District 2 consists of a mix of left-wing and right-wing extremists. With linear swings, the distributions of vote shares in the two districts are identical. However, under Assumption 1, the vote share is swingier in District 1 than in District 2. Thus, conditional on the aggregate shock being close to $r^{*}$, a marginal voter is more likely to be pivotal in District 2 than in District 1. The designer can then profitably exploit this asymmetry by re-allocating some unfavorable voters to District 1 and re-allocating some favorable voters to District 2, thus weakening the moderate District 1 and strengthening the extreme District 2. Breaking all ties in favor of extreme districts in this manner leads to strictly single-dipped districting.

Proposition 6 implies that, under Assumption 1, the designer should never pool more than two voter types in the same district. Thus, among the plans in Figure 1, only POP and PMP can be optimal under Assumption 1 (and, moreover, more extreme paired districts under these plans must be stronger than more moderate districts). In particular, while pack-opponents-and-pool is optimal with linear swings and unimodal aggregate shocks, if moderates are even slightly swingier than extremists then the designer is better-off splitting the pool into distinct districts each consisting of at most two types such that more extreme districts are strictly stronger.

4.2. Pack-and-Pair Districting

Strict single-dippedness is an important property of a districting plan, but many plans can be strictly single-dipped. This subsection argues that, among strictly single-dipped plans, it is natural to focus on “pack-and-pair” districting, where weaker districts are segregated and stronger districts consist of exactly two voter types. Formally, a strictly single-dipped districting plan $\mathcal{H}$ is pack-and-pair if $\delta_{s}\in\operatorname{supp}(\mathcal{H})$ implies that any $P\in\operatorname{supp}(\mathcal{H})$ such that $r^{*}(P)<r^{*}(\delta_{s})$ takes the form $P=\delta_{s'}$ for some $s'<s$.

For simplicity, for the remainder of the current section, we restrict attention to the additive taste shock case, and assume that the taste shock density is strictly log-concave and symmetric about 0. The symmetry assumption has the convenient implication that the threshold shock to win a packed district $P=\delta_{s}$ is just $r^{*}(P)=s$.

We first show that any pack-and-pair plan $\mathcal{H}$ can be described in a simple way. First, there exists a bifurcation point $r^{b}\in[\underline{s},\overline{s}]$ such that a district $P\in\operatorname{supp}(\mathcal{H})$ is packed if $r^{*}(P)\leq r^{b}$ and is paired if $r^{*}(P)>r^{b}$. The bifurcation point thus divides the packed and paired districts. Second, the assignment of voters to paired districts is described by a decreasing function $s_{1}$ and an increasing function $s_{2}$ where, for each paired district $P$, the two voter types in district $P$ are $s_{1}(r^{*}(P))$ and $s_{2}(r^{*}(P))>s_{1}(r^{*}(P))$. Stronger paired districts thus contain more extreme voters, as single-dippedness requires.

Proposition 7.

For any pack-and-pair districting plan $\mathcal{H}$, there exists a bifurcation point $r^{b}\in[\underline{s},\overline{s}]$, a decreasing function $s_{1}:(r^{b},\overline{s}]\rightarrow[\underline{s},r^{b})$, and an increasing function $s_{2}:(r^{b},\overline{s}]\rightarrow(r^{b},\overline{s}]$ satisfying $s_{1}(r)<r<s_{2}(r)$, such that for each $P\in\operatorname{supp}(\mathcal{H})$, we have $\operatorname{supp}(P)=\{r^{*}(P)\}$ if $r^{*}(P)\leq r^{b}$ and $\operatorname{supp}(P)=\{s_{1}(r^{*}(P)),s_{2}(r^{*}(P))\}$ if $r^{*}(P)>r^{b}$.
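In a paired district, the shares of the two types pin down the threshold shock. The sketch below computes the share $\rho$ of the lower type $s_{1}$ that makes a district $\{s_{1},s_{2}\}$ have threshold exactly $r$, assuming the additive form $v(s,r)=Q(s-r)$ with $Q$ standard normal (an illustrative choice) and solving $\rho Q(s_{1}-r)+(1-\rho)Q(s_{2}-r)=1/2$ for $\rho$. The function names are hypothetical.

```python
from math import erf, sqrt


def Q(t):
    """Standard normal CDF, standing in for the taste shock distribution."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))


def pairing_weight(s1, s2, r):
    """Share rho of type s1 in a paired district {s1, s2} with threshold r."""
    if not s1 < r < s2:
        raise ValueError("a paired district requires s1 < r < s2")
    return (Q(s2 - r) - 0.5) / (Q(s2 - r) - Q(s1 - r))


def vote_share(s1, s2, rho, r):
    """Designer's vote share in the district when the aggregate shock is r."""
    return rho * Q(s1 - r) + (1.0 - rho) * Q(s2 - r)


rho_symmetric = pairing_weight(-1.0, 1.0, 0.0)    # types equidistant from r
rho_asymmetric = pairing_weight(-2.0, 0.5, 0.0)   # lower type farther from r
```

When the two types are equidistant from $r$ the district is split evenly; when the lower type is farther from $r$, it takes a smaller share of the district.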

Examples of pack-and-pair districting include segregation, POP, PMP, and negative assortative districting. Note that segregation and negative assortative districting represent the extreme pack-and-pair plans where, respectively, all voter types are segregated and only a single type is segregated. We first give conditions under which these extreme districting plans are optimal.

Proposition 8.

Negative assortative districting is uniquely optimal if $G$ is concave, and segregation is uniquely optimal if $G$ is “sufficiently convex,” in that there exists a constant $c>0$ such that segregation is uniquely optimal if $g'(r)/g(r)\geq c$ for all $r$.

The intuition for the first part of the result is as follows. First, any strictly single-dipped districting plan that never segregates any two voter types is negative assortative. So, it suffices to show that if $G$ is concave (and the taste shock density is strictly log-concave and symmetric), it is sub-optimal for the designer to segregate any two voter types $s<s'$. To see this, suppose the designer pools a few type-$s$ voters in with the type-$s'$ voters. The marginal effect of this change on the designer’s expected seat share among type-$s$ voters is

\[
G(s')-G(s),
\]

which is the increased probability of winning a type-$s$ voter’s district when she moves from the weak district $\delta_{s}$ to the strong district $\delta_{s'}$. On the other hand, the marginal effect of this change on the designer’s expected seat share among type-$s'$ voters is

\[
\frac{Q(s-s')-\frac{1}{2}}{q(0)}\,g(s').
\]

This follows because the first term is the marginal effect on the threshold shock to win the strong district, where this comes from using the implicit function theorem (and $Q(0)=1/2$) to calculate $dr/d\rho$ at $\rho=0$ from the equation

\[
\rho Q(s-r)+(1-\rho)Q(s'-r)=\frac{1}{2},
\]

and the second term is the density of the aggregate shock at $r^{*}(\delta_{s'})=s'$. Finally, the sum of the two effects is positive, because

\[
\frac{G(s')-G(s)}{g(s')}\geq s'-s>\frac{\frac{1}{2}-Q(s-s')}{q(0)},
\]

where the first inequality is by concavity of $G$, and the second inequality is by symmetry and strict convexity of $Q$ on $(-\infty,0]$ (which follows from strict log-concavity of $q$).
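The implicit-function-theorem step can be cross-checked numerically. The sketch below assumes $Q$ is the standard normal CDF (an illustrative choice), solves the threshold equation for $r$ by bisection at a small $\rho$, and compares the resulting finite-difference slope with the closed form $(Q(s-s')-\tfrac{1}{2})/q(0)$. Function and variable names are hypothetical.

```python
from math import erf, exp, pi, sqrt


def Q(t):
    """Standard normal CDF (illustrative taste shock distribution)."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))


def q(t):
    """Standard normal density."""
    return exp(-0.5 * t * t) / sqrt(2.0 * pi)


def threshold(s, s_prime, rho, iters=200):
    """Solve rho*Q(s - r) + (1 - rho)*Q(s_prime - r) = 1/2 for r, with s < s_prime."""
    lo, hi = s, s_prime
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        share = rho * Q(s - mid) + (1.0 - rho) * Q(s_prime - mid)
        if share > 0.5:
            lo = mid   # designer still wins at mid, so the threshold is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


def threshold_slope(s, s_prime):
    """Closed form for dr/drho at rho = 0 from the implicit function theorem."""
    return (Q(s - s_prime) - 0.5) / q(0.0)


s, s_prime, step = -0.5, 0.5, 1e-6
finite_difference = (threshold(s, s_prime, step) - s_prime) / step
# Magnitude of the slope, which the text bounds strictly above by s_prime - s.
slope_magnitude = (0.5 - Q(s - s_prime)) / q(0.0)
```

For these values the slope is roughly $-0.86$, and its magnitude is below $s'-s=1$, in line with the second inequality in the display above.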

The intuition for the second part of the result is that if $G$ is sufficiently convex then, for any two voter types $s$ and $s'$, we have

\[
\frac{G(s')-G(s)}{g(s')}\leq\frac{Q(s'-s)-\frac{1}{2}}{q(0)},
\]

which by a similar logic as above implies that it is optimal for the designer to separate any two voter types rather than pooling them.

Proposition 8 expresses the intuition that concavity of $G$ favors pooling (which, under strict single-dippedness, takes the form of pairing types, rather than pooling intervals of types), while convexity of $G$ favors segregation. In the realistic case where $G$ is strictly S-shaped (i.e., the aggregate shock is unimodal), segregation and negative assortative districting are both sub-optimal, unless the two parties are substantially asymmetric. [Footnote 25: Proposition 9 can be compared to Proposition 1 of Friedman and Holden (2008). Friedman and Holden show that PMP (“matching slices”) is optimal when idiosyncratic uncertainty is sufficiently small, but their discussion focuses on the extreme case of negative assortative districting, where only a single voter type is segregated. Proposition 9 shows that this extreme case never arises when the distribution of the aggregate shock is unimodal and the two parties are symmetric.]

Proposition 9.

If $G$ is strictly S-shaped with inflection point $r^{*}(F)$, then segregation and negative assortative districting are both sub-optimal.

The intuition is simple. By Proposition 8, the designer prefers pooling any two voter types above the inflection point $r^{*}(F)$, so segregation is suboptimal. Moreover, for any negative assortative districting, there exist nearby voter types that are paired in a district $P$ with $r^{*}(P)<r^{*}(F)$, but the designer prefers segregating such types.

Since convexity of $G$ favors segregation, concavity of $G$ favors pairing, and it is natural to assume that $G$ is S-shaped (first convex, then concave), a natural conjecture is that pack-and-pair districting (first segregation, then pairing) is optimal. We can verify this conjecture numerically (for an extremely wide range of parameters) in the special case where $G$ and $Q$ are both normal. The following proposition states this result, as well as giving a general sufficient condition for pack-and-pair districting to be uniquely optimal.

Proposition 10.

If there do not exist $\underline{s}\leq s<r<s'\leq s''\leq\overline{s}$ satisfying

\[
\begin{gathered}
G(r)+\lambda(r)\left(Q(s-r)-\tfrac{1}{2}\right)\geq G(s)\quad\text{and}\\
G(r)+\lambda(r)\left(Q(s-r)-\tfrac{1}{2}\right)\geq G(s'')+\lambda(s'')\left(Q(s-s'')-\tfrac{1}{2}\right),
\end{gathered}
\tag{3}
\]

where

\[
\lambda(r)=\frac{g(r)\bigl(Q(s'-r)-Q(s-r)\bigr)}{\left(Q(s'-r)-\frac{1}{2}\right)q(s-r)-\left(Q(s-r)-\frac{1}{2}\right)q(s'-r)}\quad\text{and}\quad\lambda(s'')=\frac{g(s'')}{q(0)},
\]

then every optimal districting plan is pack-and-pair. Moreover, when $Q$ is the standard normal distribution and $G$ is the centered normal distribution with standard deviation $\gamma^{-1}$, there do not exist $\gamma\in\{.1,.2,\ldots,99.9,100\}$ and $s<r<s'\leq s''$ with $s,r,s',s''\in\{-5,-4.9,\ldots,4.9,5\}$ that satisfy (3).

Condition (3) can be explained as follows. For any (strictly single-dipped) non-pack-and-pair plan, there exist $s<r<s'\leq s''$ such that voter types $s<s'$ are paired in a district $P$ with $r^{*}(P)=r\in(s,s')$ and voter type $s''$ is segregated. By a similar logic to Proposition 8, if the first inequality in (3) fails, the designer prefers to segregate a few type-$s$ voters from district $P$; and if the second inequality in (3) fails, the designer prefers to move a few type-$s$ voters from district $P$ to district $\delta_{s''}$. Thus, if there do not exist $s<r<s'\leq s''$ that satisfy (3), then any optimal plan must be pack-and-pair.
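A grid check of condition (3) of the kind described in Proposition 10 might be organized as follows. This is a sketch, not the authors' code: it assumes the normal specification $G(r)=Q(\gamma r)$ with $Q$ standard normal, and it reads the numerator of $\lambda(r)$ as $g(r)\,(Q(s'-r)-Q(s-r))$. The names `satisfies_condition_3` and `scan` are hypothetical.

```python
from math import erf, exp, pi, sqrt


def Q(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))


def q(t):
    """Standard normal density."""
    return exp(-0.5 * t * t) / sqrt(2.0 * pi)


def satisfies_condition_3(s, r, s_p, s_pp, gamma):
    """Evaluate both inequalities in (3) for s < r < s_p <= s_pp,
    assuming G(x) = Q(gamma * x) and hence g(x) = gamma * q(gamma * x)."""
    def G(x):
        return Q(gamma * x)

    def g(x):
        return gamma * q(gamma * x)

    lam_r = g(r) * (Q(s_p - r) - Q(s - r)) / (
        (Q(s_p - r) - 0.5) * q(s - r) - (Q(s - r) - 0.5) * q(s_p - r)
    )
    lam_pp = g(s_pp) / q(0.0)
    lhs = G(r) + lam_r * (Q(s - r) - 0.5)
    first = lhs >= G(s)
    second = lhs >= G(s_pp) + lam_pp * (Q(s - s_pp) - 0.5)
    return first and second


def scan(grid, gammas):
    """List every (gamma, s, r, s', s'') on the grid that satisfies (3)."""
    hits = []
    for gamma in gammas:
        for s in grid:
            for r in grid:
                for s_p in grid:
                    for s_pp in grid:
                        if s < r < s_p <= s_pp and satisfies_condition_3(
                            s, r, s_p, s_pp, gamma
                        ):
                            hits.append((gamma, s, r, s_p, s_pp))
    return hits


# One point worked by hand: with gamma = 1 and (s, r, s', s'') = (-1, 0, 1, 1),
# lambda(r) = q(0)/q(1) = e^{1/2}, so the left side of (3) is negative while
# G(s) is about 0.16, and the first inequality fails.
example = satisfies_condition_3(-1.0, 0.0, 1.0, 1.0, 1.0)
```

Calling `scan` on the full grid in Proposition 10 is expensive in pure Python (about $10^{8}$ ordered tuples for each of 1,000 values of $\gamma$), so a coarser grid is a more practical starting point; whether the output matches the reported result depends on the reading of $\lambda(r)$ adopted here.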

4.3. Should Opponents or Moderates be Packed?

Having provided some arguments for pack-and-pair districting, the last part of our analysis compares two key forms of pack-and-pair (POP and PMP) as well as mixed versions of these districting plans. The mixed versions of POP and PMP that we will encounter fall into a class of districting plans that we call “Y-districting.” Formally, a pack-and-pair plan $\mathcal{H}$ is Y-districting if there exists a positive number $\varepsilon>0$ such that

  (1) For all $r\in[r^{b}-\varepsilon,r^{b}+\varepsilon]$ (where $r^{b}$ is the bifurcation point), there exists $P\in\operatorname{supp}(\mathcal{H})$ such that $r^{*}(P)=r$.

  (2) Districts $P\in\operatorname{supp}(\mathcal{H})$ with $r^{*}(P)\in[r^{b}-\varepsilon,r^{b}]$ are segregated (i.e., $\operatorname{supp}(P)=\{r^{*}(P)\}$).

  (3) Districts $P\in\operatorname{supp}(\mathcal{H})$ with $r^{*}(P)\in(r^{b},r^{b}+\varepsilon]$ are paired (i.e., $\operatorname{supp}(P)=\{s_{1}(r^{*}(P)),s_{2}(r^{*}(P))\}$ for some $s_{1}(r^{*}(P))<s_{2}(r^{*}(P))$).

  (4) The functions $s_{1}$ and $s_{2}$ describing the voter types in paired districts are twice differentiable and satisfy $\lim_{r\downarrow r^{b}}s_{1}(r)=\lim_{r\downarrow r^{b}}s_{2}(r)$. [Footnote 26: The differentiability condition is used in the proof of Proposition 11. It may be possible to drop it.]

We will see that Y-districting encompasses a mixed version of POP, where there exists $\hat{s}\in(\underline{s},r^{b})$ such that voter types in $[\underline{s},\hat{s})$ are always segregated and types in $(\hat{s},r^{b})$ are sometimes segregated and sometimes paired, as well as a mixed version of PMP, where there exists $\hat{s}\in(\underline{s},r^{b})$ such that types in $[\underline{s},\hat{s})$ are always paired and types in $(\hat{s},r^{b})$ are sometimes segregated and sometimes paired. (In contrast, recall that under POP there exists $\hat{s}\in(\underline{s},r^{b})$ such that types in $[\underline{s},\hat{s})$ are always segregated and types in $(\hat{s},r^{b})$ are always paired, while under PMP there exists $\hat{s}\in(\underline{s},r^{b})$ such that types in $[\underline{s},\hat{s})$ are always paired and types in $(\hat{s},r^{b})$ are always segregated.) We will give theoretical and numerical results that indicate that POP is optimal when idiosyncratic uncertainty is much larger than aggregate uncertainty, PMP is optimal when aggregate uncertainty is larger than idiosyncratic uncertainty, and Y-districting (and, in particular, mixed POP or mixed PMP) is optimal in the intermediate range.

We first discuss how POP, PMP, and Y-districting relate to the set of all pack-and-pair plans. POP and PMP are both pure districting plans, in that each voter type $s$ is assigned to a single district $P$: formally, for each $s\in[\underline{s},\overline{s}]$, there exists a unique $P\in\operatorname{supp}(\mathcal{H})$ such that $s\in\operatorname{supp}(P)$. They are not the only pure districting plans: for example, a pack-and-pair plan could segregate voter types below a cutoff $s_{0}$ and match slices (including with an intermediate segregation region) among voter types above $s_{0}$. However, POP and PMP are the simplest such plans, as they involve only a single non-degenerate interval of segregated voter types. We are not aware of any parameters for which a more complex pure pack-and-pair plan is optimal.

In contrast, Y-districting plans are mixed, because voter types $s$ just below the bifurcation point are sometimes segregated and sometimes paired with higher types. Somewhat surprisingly, we will see that such plans are uniquely optimal for a range of parameters, even though voter types are continuous. While not every mixed pack-and-pair plan is Y-districting, we will see that, at least numerically, optimal plans always take one of the three forms we consider.

We would like to have general necessary and sufficient conditions for the optimality of POP, PMP, and Y-districting. Unfortunately, this seems very challenging, because the form of optimal districting is driven by global constraints that are difficult to analyze. We instead present a seemingly modest result, which is that if Y-districting is optimal, then the ratio of idiosyncratic uncertainty to aggregate uncertainty must fall in an intermediate range. However, numerically it appears that this result actually characterizes when all three forms of districting are optimal: at least in the case where aggregate and idiosyncratic shocks are both normally distributed, our necessary conditions for optimality of Y-districting are also approximately sufficient, and when the ratio of idiosyncratic uncertainty to aggregate uncertainty is below (resp., above) the range where Y-districting is optimal, then PMP (resp., POP) is optimal.

To facilitate a comparison of the amount of aggregate and idiosyncratic uncertainty, the distributions $G$ and $Q$ should have the same shape. We therefore assume that there exists a parameter $\gamma>0$ such that $G(r)=Q(\gamma r)$ for all $r$. The parameter $\gamma$ thus measures the ratio of the standard deviation of the idiosyncratic shocks (which is normalized to $1$) to that of the aggregate shock (which equals $\gamma^{-1}$). The following is our key result.

Proposition 11.

If Y-districting is optimal, then $r^{b}=0$ and $\gamma\in(1,\sqrt{1+\sqrt{3}}]$, where $\sqrt{1+\sqrt{3}}\approx 1.65$.

The proof of Proposition 11 proceeds by deriving three necessary conditions for optimal districting to involve a bifurcation point at $r$ (which are based on linear programming duality), and then showing that these conditions imply that the bifurcation point must coincide with the inflection point, and the ratio of idiosyncratic to aggregate uncertainty must lie in an intermediate range. The first condition (equation (12) in Appendix B) says that it is optimal to pair voter types just below and just above $r$. The second condition (equation (13)) says that it is optimal to segregate types just below $r$. The third condition (equation (14)) says that the proportions of favorable and unfavorable voters in each district $P$ with $r^{*}(P)=r^{\prime}$ just above $r$ actually generate the desired cutoff $r^{\prime}$. Intuitively, for it to be optimal to pair nearby voter types around $r$, $G$ must be weakly concave at $r$; and for it to be optimal to segregate voter types just below $r$, $G$ must be weakly convex at $r$. Hence, bifurcation can occur only at the inflection point of $G$, which by symmetry equals $0$. Moreover, if we take parameters where Y-districting is optimal and increase aggregate uncertainty, it eventually becomes optimal to always segregate voter types just below $0$ rather than pairing them with higher voter types, at which point optimal districting becomes PMP (with a bifurcation point below $0$). On the other hand, if we take parameters where Y-districting is optimal and decrease aggregate uncertainty, it eventually becomes optimal to always pair voter types just below $0$ with higher voter types rather than segregating them, at which point optimal districting becomes POP (with a bifurcation point above $0$). We discuss the mechanics of the transition from PMP to POP as $\gamma$ increases below.
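As a worked illustration of the curvature argument, consider the normal case used in the numerical results, where $Q=\Phi$ and hence $G(r)=\Phi(\gamma r)$. This specialization is an assumption made here for concreteness; the argument in the text only uses the shape of $G$. Writing $\varphi$ for the standard normal density and using $\varphi^{\prime}(x)=-x\varphi(x)$:

```latex
\begin{align*}
G(r)   &= \Phi(\gamma r), \\
G'(r)  &= \gamma\,\varphi(\gamma r) > 0, \\
G''(r) &= \gamma^{2}\,\varphi'(\gamma r) = -\gamma^{3}\, r\,\varphi(\gamma r).
\end{align*}
```

So $G^{\prime\prime}(r)>0$ for $r<0$, $G^{\prime\prime}(r)<0$ for $r>0$, and $G^{\prime\prime}(0)=0$. In this case $G$ is strictly convex below $0$ and strictly concave above $0$, so weak concavity and weak convexity can hold at the same point only at $r=0$.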

If we take for granted that the condition $\gamma\in(1,1.65)$ is sufficient as well as necessary for Y-districting to be optimal, the above intuition suggests that:

  1. PMP is optimal when $\gamma\leq 1$.

  2. Y-districting is optimal when $\gamma\in(1,1.65)$.

  3. POP is optimal when $\gamma\geq 1.65$.
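
The conjectured mapping from $\gamma$ to the form of optimal districting can be written as a small lookup. This is only a sketch of the heuristic above: the function name `conjectured_regime` is hypothetical, and the treatment of the boundary points follows the list rather than any formal result (Proposition 11 establishes only the necessary condition for Y-districting).

```python
import math

# Upper end of the Y-districting range in Proposition 11: sqrt(1 + sqrt(3)), about 1.65.
GAMMA_UPPER = math.sqrt(1.0 + math.sqrt(3.0))


def conjectured_regime(gamma: float) -> str:
    """Return the conjectured optimal plan type for a given gamma > 0.

    Hypothetical helper: it encodes the three-way heuristic in the text,
    which is verified numerically in the paper only for normal shocks.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    if gamma <= 1.0:
        return "PMP"
    if gamma < GAMMA_UPPER:
        return "Y-districting"
    return "POP"


if __name__ == "__main__":
    # The nine values of gamma shown in Figure 3.
    for g in (0.2, 0.5, 1, 1.2, 1.4, 1.6, 1.7, 3, 6):
        print(g, conjectured_regime(g))
```

Applied to the nine values of $\gamma$ in Figure 3, the lookup returns PMP for the first three, Y-districting for the middle three, and POP for the last three, matching the classification reported for the figure.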

Figure 3 presents numerical solutions that verify this heuristic. In the figure, $Q$ is the standard normal distribution, $G$ is the centered normal distribution with standard deviation $\gamma^{-1}$, and $F$ is the uniform distribution on $[-1,1]$. [Footnote 27: More precisely, we approximate the designer’s problem by a finite-dimensional linear program and then solve it using Gurobi Optimizer. Our approximation specifies that $s$ is uniformly distributed on $\{-1,-.99,\ldots,.99,1\}$ and that the designer is constrained to create districts $P$ satisfying $r^{*}(P)\in\{-1,-.99,\ldots,.99,1\}$.] Voter types are on the $x$-axis, and the threshold shocks to win the districts to which each voter type is assigned are on the $y$-axis. (Thus, packed districts lie on the $45^{\circ}$ line, while paired districts straddle the $45^{\circ}$ line.) For mixed districting plans (i.e., Y-districting, the middle row of the figure), the shading intensity indicates the probability that a voter type is assigned to each district. We see that optimal districting takes exactly the conjectured form: PMP is optimal for $\gamma\in\{0.2,0.5,1\}$, Y-districting is optimal for $\gamma\in\{1.2,1.4,1.6\}$, and POP is optimal for $\gamma\in\{1.7,3,6\}$. The highest value of $\gamma$ in the figure, $\gamma=6$, is the value closest to our empirical estimates. When $\gamma=6$, POP remains optimal but now closely resembles $p$-segregation. Thus, for what we will see is the empirically relevant parameter range, $p$-segregation is approximately optimal.

Figure 3. Optimal Districting as $\gamma$ Varies
Panels: $\gamma=0.2$, $\gamma=0.5$, $\gamma=1$, $\gamma=1.2$, $\gamma=1.4$, $\gamma=1.6$, $\gamma=1.7$, $\gamma=3$, $\gamma=6$.
Notes: The optimal districting plan is PMP for $\gamma\in\{0.2,0.5,1\}$, Y-districting for $\gamma\in\{1.2,1.4,1.6\}$ (and, specifically, mixed PMP for $\gamma\in\{1.2,1.4\}$ and mixed POP for $\gamma=1.6$), and POP for $\gamma\in\{1.7,3,6\}$. Our empirical estimates of $\gamma$ in Section 5 are above $6$.

We can give an intuition for how and why optimal districting transitions from PMP to POP as $\gamma$ increases, as illustrated in Figure 3. Along the way, we also mention some additional features of optimal PMP and POP plans, as well as describing the transition from mixed PMP to mixed POP within the Y-districting regime.

First, recall the extreme cases where $\gamma$ is close to $0$ (almost no idiosyncratic uncertainty) and where $\gamma$ is very large (almost no aggregate uncertainty). When $\gamma$ is close to $0$, PMP is optimal; moreover, when $F$ is symmetric about $0$ as in Figure 3, almost all voters are paired, so optimal districting is approximately negative assortative, which implies that the bifurcation point is below $0$ and the range of values of $r^{*}(P)$ across paired districts $P$ is large. [Footnote 28: Another property of optimal PMP plans is that the left arm of the “Y” is infinitely steep at the bifurcation point, i.e., $\lim_{r\downarrow r^{b}}s_{1}^{\prime}(r)=0$.] When $\gamma$ is very large, POP is optimal; moreover, $p$-segregation is approximately optimal, which implies that the bifurcation point is above $0$ and the range of values of $r^{*}(P)$ across paired districts is very small. [Footnote 29: Another property of optimal POP plans is that pairing at the bifurcation point is smooth, i.e., $\lim_{r\downarrow r^{b}}s_{1}^{\prime}(r)=-\infty$ and $\lim_{r\downarrow r^{b}}s_{2}^{\prime}(r)=\infty$.] Now, when $\gamma$ increases from $0$ toward $1$, the range of $r^{*}(P)$ across paired districts decreases (as the range of probable aggregate shocks decreases), and the proportion of packed districts increases. When $\gamma$ reaches $1$, it becomes optimal to pack voters with $s=0$, the inflection point of $G$. Since it cannot be optimal to pack voters above the inflection point, once $\gamma$ crosses $1$ it becomes optimal to pair voters with $s$ just above $0$ with a few slightly less favorable voters. At this point, districting takes the form of mixed PMP.

As $\gamma$ increases further above $1$, the range of $r^{*}(P)$ across paired districts continues to decrease. This implies a flattening out of the right arm of the “Y” (i.e., an increase in $s_{2}^{\prime}$), which increases the mass of favorable voters assigned to districts where $r^{*}(P)$ is positive but small. To keep $r^{*}(P)$ small in these districts, this effect must be offset by also assigning more unfavorable voters to these districts, which is achieved by assigning more of the “mixed” unfavorable voter types to paired districts rather than packed districts, while the range of unfavorable voter types assigned to each interval of mixed districts actually decreases (i.e., the left arm of the Y gets steeper). [Footnote 30: The proof of Proposition 11 shows that, for all sufficiently small positive $r$, $|s_{1}^{\prime}(r)|$ is decreasing in $\gamma$ (i.e., the left arm gets steeper) and $s_{2}^{\prime}(r)$ is increasing in $\gamma$ (i.e., the right arm gets flatter).] At some point, the right arm of the Y becomes flatter than the left arm so that the most extreme left-wing voters have no right-wing voters to match with, at which point these voters are segregated: this point marks the transition from mixed PMP to mixed POP, which occurs at $\gamma=\sqrt{2}\approx 1.41$ in the uniform case illustrated in Figure 3. [Footnote 31: The transition point $\gamma=\sqrt{2}$ is defined as the unique value of $\gamma$ at which $\lim_{r\downarrow 0}|s_{1}^{\prime}(r)|=\lim_{r\downarrow 0}s_{2}^{\prime}(r)$.] The $\gamma=1.4$ panel in the figure illustrates a point just before this transition occurs. As $\gamma$ increases further, more and more mixed unfavorable voters are assigned to paired districts, until all such voters are assigned to paired districts, at which point optimal districting becomes POP, and the bifurcation point becomes positive. This occurs when $\gamma\approx 1.65$. Finally, as $\gamma$ increases further beyond $1.65$, the range of $r^{*}(P)$ across paired districts continues to decrease, and the optimal POP plan approximates $p$-segregation more and more closely.

Remark 2 (Approximate Optimality of Traditional Pack-and-Crack).

We conclude this section by noting that, for what we will see is the empirically relevant range of parameters, the optimal POP plan closely resembles $p$-segregation, and in fact both $p$-segregation and traditional pack-and-crack districting are approximately optimal. Our central estimates for $\gamma$ in Section 5 are above $6$, and for most states are above $10$. Figure 3 shows that, for these parameters, POP is optimal, and the optimal POP plan closely resembles $p$-segregation. Moreover, for the parameters used in Figure 3 (where the standard deviation of $s$ is fixed at what we will see is a realistic level, while $\gamma^{-1}$, the standard deviation of $r$, varies), we have calculated that the designer’s expected seat share under the optimal districting plan never exceeds his expected seat share under the optimal traditional pack-and-crack plan by more than $1.4\%$ for any value of $\gamma$, or by more than $0.1\%$ for any value of $\gamma$ above $5$. [Footnote 32: Friedman and Holden (2008, p. 129) and Cox and Holden (2011, p. 571) present an example with large aggregate uncertainty ($\gamma=1/\sqrt{2}\approx 0.71$) and a large standard deviation of $s$ (equal to $3$, while our empirical estimate of this parameter is $0.63$) where the designer’s expected seat share is over $20\%$ greater under matching slices than under traditional pack-and-crack. This shows that, when the standard deviations of both $r$ and $s$ are (unrealistically) large, the advantage of optimal districting over traditional pack-and-crack can be significantly larger than the $1.4\%$ upper bound that we obtain by varying the standard deviation of $r$ while fixing the standard deviation of $s$ at a realistic level.] For example, when $\gamma=6$ the optimal expected seat share is approximately $.7087$, while the optimal traditional pack-and-crack plan gives an expected seat share of approximately $.7082$. [Footnote 33: When $\gamma=2$ (an unrealistically low value), the corresponding expected seat shares are $.5392$ and $.5357$. When $\gamma=15$ (close to our central estimate), they are $.8488$ and $.8485$.] An intuition for this result is that in practice aggregate uncertainty is small (relative to both idiosyncratic uncertainty and the range of voter/precinct types $s$), so the no-aggregate-uncertainty case considered in Section 3.2, where traditional pack-and-crack is exactly optimal, is fairly realistic.

5. Estimation

We have argued that the form of optimal districting depends on a comparison of the amount of aggregate and idiosyncratic uncertainty facing the designer, and in particular on the parameter $\gamma$ introduced in the previous section (i.e., the ratio of idiosyncratic to aggregate uncertainty, or equivalently the inverse standard deviation of the aggregate shock $r$, recalling that the standard deviation of the idiosyncratic shocks $t$ is normalized to $1$). We now estimate $\gamma$ using precinct-level returns from recent US House elections, while also providing empirical support for some of our key theoretical assumptions. We first describe our data and empirical model, then present some simple summary statistics and plots, and finally estimate $\gamma$.

5.1. Data and Empirical Model

Our data are the precinct-level returns for the US House elections in 2016, 2018, and 2020, which were recently standardized and made freely available by Baltz et al. (2022). For each precinct $n$ and election $t\in\{2016,2018,2020\}$, we observe the total two-party vote $k_{nt}$ and the share of the two-party vote for the Republican candidate $v_{nt}$. [Footnote 34: A “precinct” is the smallest election-reporting unit in a state, which typically corresponds to a geographic area where all voters vote at the same polling place. Maine and New Jersey report election returns only at the township level, so for these states $n$ indexes townships rather than precincts. Also, for some elections where a nominally third-party candidate runs in place of an official Democratic or Republican candidate, we manually re-label this candidate as a Democrat or Republican. For example, in New York, we re-assign Working Families Party candidates as Democrats and re-assign Conservative Party candidates as Republicans.] The data are a repeated cross-section rather than a panel, because there is no general way to match precincts across elections (for example, because precinct boundaries change frequently; Baltz et al. 2022, p. 6). We drop all districts with an uncontested House race in any of 2016, 2018, or 2020 (which drops 25% of all districts). [Footnote 35: Keeping these districts would bias our estimate of $\gamma$, because the relevant vote shares are for contested elections, and if these districts were contested their vote shares would be different from 0 or 1. Keeping a district with one or two uncontested elections only for the elections where it is contested would also bias our estimate of $\gamma$, by distorting the estimated swing across elections. Dropping uncontested districts does likely bias our estimate of the distribution $F$ of voter types $s$, as uncontested districts are presumably more extreme; however, this bias is irrelevant for our main goal of estimating $\gamma$.] Moreover, for each of the three elections, we drop precincts where there are fewer than 50 total votes (which drops .13% of all votes) or where the Republican vote share is 0 or 1 (which drops an additional .015% of votes).

To take the model to these data, we assume that the designer has voter information at the precinct level. This is a reasonable assumption, since this is the finest level at which election data is available. As a voter type $s$ in the model captures the information available to the designer, we therefore assume that all voters in a given precinct $n$ have the same type $s_{n}$. We will also assume that precincts are relatively large (in the data, the mean precinct vote count is 789 with standard deviation 1,399, after dropping precincts with fewer than 50 total votes or a 0 or 1 vote share), and idiosyncratic taste shocks are normally distributed, so that the designer’s vote share in precinct $n$ in election $t$ is given by

$$v(s_{n},r_{t})=\Phi(s_{n}-r_{t}),$$

where $\Phi$ is the standard normal cdf.
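
To make the vote-share formula concrete, the sketch below evaluates $v(s,r)=\Phi(s-r)$ and solves numerically for the threshold shock $r^{*}(P)$ of a district $P$ made up of finitely many precinct types. It assumes that the designer wins a district when his district-wide vote share is at least $1/2$, so that $r^{*}(P)$ solves $\sum_{i}p_{i}\Phi(s_{i}-r)=1/2$. The function names and the bisection bracket $[-10,10]$ are illustrative choices, not part of the paper.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal; STD_NORMAL.cdf is Phi


def vote_share(s: float, r: float) -> float:
    """Designer's vote share in a precinct of type s under aggregate shock r."""
    return STD_NORMAL.cdf(s - r)


def district_threshold(types, weights, lo=-10.0, hi=10.0, tol=1e-10):
    """Threshold shock r* at which the district-wide vote share equals 1/2.

    `types` are precinct types s_i and `weights` their (unnormalized) voter
    masses. Assumes the root lies in [lo, hi], which holds for types well
    inside that bracket.
    """
    total = sum(weights)

    def share(r):
        return sum(w * vote_share(s, r) for s, w in zip(types, weights)) / total

    # share(r) is strictly decreasing in r, so bisection finds the unique root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if share(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # A packed district containing a single type s has r* = s.
    print(district_threshold([0.3], [1.0]))
    # A paired district with equal masses of types -1 and 1 has r* = 0 by symmetry.
    print(district_threshold([-1.0, 1.0], [0.5, 0.5]))
    # Tilting the masses toward the favorable type raises the threshold.
    print(district_threshold([-1.0, 1.0], [0.25, 0.75]))
```

The first two cases correspond to the geometry of Figure 3: packed districts lie on the $45^{\circ}$ line ($r^{*}=s$), while paired districts have a threshold strictly between their two types.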

While our estimation relies on the assumption that taste shocks are normally distributed, it is important to note that our estimates are quite insensitive to this assumption: because we will find that $\gamma$ is very large, the taste shock distribution is approximately uniform over the relevant range, so specifying any smooth taste shock distribution leaves our estimates almost unchanged.

5.2. Descriptive Figures and Summary Statistics

We first present a histogram (Figure 4(a)) showing the number of voters in the United States who live in a precinct with Republican vote share $v$, with bin breaks $\{0,.05,\ldots,.95,1\}$, averaging over elections $t\in\{2016,2018,2020\}$. The histogram shows that the distribution of $v_{nt}$ is unimodal, with a large majority (74%) of the mass on $v\in[.25,.75]$. This pattern has two simple, but important, implications for our model. First, the distribution of voter/precinct types is far from bimodal: there is a continuum of types, with most mass “toward the middle.” A designer choosing how to partition precincts into districts must thus decide how to allocate a continuum of types, as in our model. [Footnote 36: In practice, the smallest “districtable unit” is not a precinct but a census block, which is the smallest geographic unit for which the US Census tabulates complete data. However, the numbers of voters in a precinct and in a census block are roughly similar (typically around 1,000, albeit with fairly wide variation), so we believe there is little loss in proceeding as if designers partition precincts rather than census blocks.] Second, idiosyncratic uncertainty appears to be large relative to aggregate uncertainty. To see this, note that if idiosyncratic uncertainty were extremely large, Figure 4(a) would show a degenerate distribution at $v=1/2$, while if aggregate uncertainty were extremely large, it would show a bimodal distribution with all mass at $0$ and $1$. The former case is a better approximation, as the actual distribution in Figure 4(a) is unimodal, with 74% of the mass on $v\in[.25,.75]$. While we will quantitatively estimate $\gamma$ in the next subsection, this observation already suggests what we will find, which is that $\gamma$ is much greater than $1$.
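
As a rough consistency check on this reading of the histogram, the sketch below computes the share of mass on $v\in[.25,.75]$ implied by the model $v=\Phi(s-r)$ under two simplifying assumptions that are not made in the paper: precinct types $s$ are normally distributed with mean $0$, and $s$ and $r$ are independent. It uses the standard deviation of $0.63$ for $s$ and the US-wide estimate $\gamma=14.75$ reported in Section 5.3; the function name `central_mass` is hypothetical.

```python
import math
from statistics import NormalDist


def central_mass(sd_s: float, gamma: float, lo: float = 0.25, hi: float = 0.75) -> float:
    """Share of precinct-election pairs with vote share in [lo, hi].

    Assumes (for illustration only) s ~ N(0, sd_s^2) and r ~ N(0, gamma^-2),
    independent, so that s - r is normal with variance sd_s^2 + gamma^-2.
    Since v = Phi(s - r), v lies in [lo, hi] iff s - r lies in
    [Phi^{-1}(lo), Phi^{-1}(hi)].
    """
    std = NormalDist()
    w = NormalDist(0.0, math.sqrt(sd_s ** 2 + gamma ** -2))
    return w.cdf(std.inv_cdf(hi)) - w.cdf(std.inv_cdf(lo))


if __name__ == "__main__":
    # Values taken from Section 5.3: sd of s is 0.63, US-wide gamma is 14.75.
    print(round(central_mass(0.63, 14.75), 3))
```

Under these assumptions the implied share is roughly 0.71, in the same range as the 74% observed in Figure 4(a). Shrinking both standard deviations toward zero pushes the share toward 1 (all mass near $v=1/2$), while making either one very large pushes it toward 0 (mass near $0$ and $1$).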

Figure 4. Distributions of Precinct Vote Shares and District Vote Swings
Panels: (a) Precinct Vote Shares; (b) District Vote Swings.

Next we present another histogram (Figure 4(b)), which shows the number of (district, election) pairs where the district-wide Republican vote share deviated from its mean over the three elections we consider by $x$, with bin breaks $\{-.25,-.225,\ldots,.225,.25\}$. [Footnote 37: This histogram is compiled at the district level because precincts are not matched across elections.] This histogram gives another way of showing that aggregate shocks are small: the distribution is centrally unimodal, and most of the mass (57%) is on $x\in[-.025,.025]$. In contrast, if aggregate shocks were large, we would again have a bimodal distribution with all mass far from $0$.

Finally, we consider the empirical distribution of vote shares $v_{nt}$ across precincts $n$ (weighted by the number of votes in each precinct), for each election $t$. This is shown in Figure 5(a). The S-shaped curve for each election again indicates that most precincts have vote shares relatively close to $1/2$. The ordering of the curves (except for the lowest-vote-share precincts, discussed below) reflects the fact that, among the 2016, 2018, and 2020 elections, 2018 was the best year for Democrats, 2016 was the best year for Republicans, and 2020 was in the middle.

Figure 5. CDF and Normalized CDF of Precinct Vote Shares in 2016, 2018, and 2020
Panels: (a) CDF; (b) Normalized CDF.
Notes: The left panel displays the empirical cdf of the precinct vote share in 2016, 2018, and 2020, which we denote by $J_{t}(v)$ for $t\in\{2016,2018,2020\}$. The right panel displays the curves $J^{-1}_{2016}(J_{2020}(v))$, $J^{-1}_{2018}(J_{2020}(v))$, and $J^{-1}_{2020}(J_{2020}(v))=v$, as well as the first, second, and third quartiles of $J_{2020}(v)$.

We can use these curves to assess the realism of our key assumption that moderates are swingier than extremists (Assumption 1). Figure 5(b) transforms Figure 5(a) by normalizing by the empirical vote-share distribution in 2020. Thus, in Figure 5(b) the blue curve is the $45^{\circ}$ line; the red curve is the 2016 Republican vote share for a precinct with a given 2020 Republican vote share; and the green curve is the analogous curve for 2018. [Footnote 38: Technically, since we cannot match precincts across elections, the red curve is the 2016 Republican vote share for a precinct at the same quantile of the vote share distribution as a precinct with a given 2020 Republican vote share, and similarly for the green curve.] Under our assumptions, the red curve should be concave and everywhere above the blue curve, and the green curve should be convex and everywhere below the blue curve, where these concavity/convexity properties reflect Assumption 1. Figure 5 shows that this is not exactly true in our data, because the green and red curves are “too low” for the left-most districts (a small minority of districts, lying well into the lowest quartile of the vote-share distribution, as indicated in the figure). We believe that this small deviation from Assumption 1 likely reflects an unusually strong performance by Republicans in urban districts in 2020, largely due to a well-documented shift in the Hispanic vote toward Republicans (e.g., Igielnik, Keeter, and Hartig 2021, Kolko and Monkovic 2021). Such demographic-specific shocks are, of course, outside our model, but could be explored in future work. Overall, we believe Figure 5 is well-explained by a combination of our assumptions (including Assumption 1) and an unexpected shift toward Republicans in urban areas in 2020.

5.3. Estimates for $\gamma$

We now estimate the key parameter $\gamma$ under the assumption that aggregate and idiosyncratic shocks are both normally distributed. Since districting plans in the US are drawn at the state level, we estimate $\gamma$ separately for each US state. Without loss, we normalize the variance of the taste shock distribution to $1$, so that $Q=\Phi$, the standard normal cdf, and the aggregate shock distribution $G$ is given by a centered normal cdf with standard deviation $\gamma^{-1}$. Recall that our theoretical and numerical results in Section 4.3 indicate that PMP is optimal if $\gamma\leq 1$, Y-districting is optimal if $\gamma\in(1,1.65)$, and POP is optimal if $\gamma\geq 1.65$. Thus, a key question of interest is which of these three regions contains our estimate of $\gamma$.

We estimate $\gamma$ by method of moments. Recall that $v_{nt}$ is the Republican share of the two-party vote in precinct $n$ and election $t$. Let $w_{nt}=\Phi^{-1}(v_{nt})$, the corresponding quantile of the standard normal distribution. Next, define

$$w_{t}=\frac{\sum_{n}k_{nt}w_{nt}}{\sum_{n}k_{nt}}\quad\text{and}\quad w=\frac{\sum_{t}w_{t}}{T},$$

where the sums over $n$ range over all precincts in a given state. Thus, $w_{t}$ is the average value of $w_{nt}$ over precincts in the state, weighted by the number of votes in each precinct; and $w$ is the average value of $w_{t}$ over elections $t$. It is then easy to show that an unbiased and consistent estimator of $\gamma$ is given by

$$\widehat{\gamma}=1\bigg/\sqrt{\frac{\sum_{t}(w_{t}-w)^{2}}{T-1}},$$

and, for any $\alpha\in(0,1)$, a $1-\alpha$ confidence interval for $\gamma$ is given by

$$\sqrt{\frac{\chi^{2}_{T-1}(\alpha/2)}{T-1}}\,\widehat{\gamma}\leq\gamma\leq\sqrt{\frac{\chi^{2}_{T-1}(1-\alpha/2)}{T-1}}\,\widehat{\gamma},$$

where $\chi^{2}_{T-1}(p)$ denotes the $p$-quantile of the chi-squared distribution with $T-1$ degrees of freedom.
Figure 6. Point Estimates and 90% Confidence Intervals for $\gamma$ by State
Notes: POP is optimal for $\gamma\geq 1.65$. The point estimate and 90% confidence interval for $\gamma$ for the entire US are 14.75 and [3.34, 25.54].

Figure 6 displays the results of this estimation. The figure shows the 90% confidence interval for $\gamma$ for each state. The confidence intervals are extremely wide, because we only have data from three elections, i.e., $T=3$. However, it is clear that the central estimates for $\gamma$, as well as the lower bounds of the 90% confidence intervals for almost all states, are well above the critical value of 1.65. The lowest estimate for $\gamma$ for any state is 5.63, the mean estimate for $\gamma$ (weighted by the number of districts in each state) is 14.32, and the corresponding estimate when we estimate $\gamma$ for the US as a whole is 14.75. These estimates are all far above the critical value of 1.65. Moreover, even with $T=3$, the lower endpoint of the 90% confidence interval is above 1.65 for all states except North Dakota (where the lower endpoint is 1.28), Hawaii (1.6), Alabama (1.61), and Louisiana (1.65). We expect that if we expanded our dataset to include the returns from the 2012 and 2014 elections (thus covering all five congressional elections held under the 2010 districting plans), the lower endpoints of the 90% confidence interval would exceed 1.65 for these states as well. [Footnote 39: Precinct-level returns for 2012 and 2014 have been compiled by Ansolabehere, Palmer, and Lee (2014) but are less complete and less standardized than the Baltz et al. (2022) data we use, which only cover 2016, 2018, and 2020.] The data thus clearly indicate that $\gamma$ is well above 1.65 in practice, at least for the vast majority of states, and probably for all of them. Together with the results in Section 4.3, this provides strong evidence that optimal gerrymandering is given by POP for realistic parameters. [Footnote 40: While it is not relevant for determining the qualitative form of optimal districting, we can also estimate the distribution $F$ of voter types $s$. At the country level, the mean estimate of $F$ (calculated as $w$) is very close to $0$, and the standard deviation estimate of $F$ (calculated as $\sqrt{\sum_{n,t}k_{nt}(w_{nt}-w_{t})^{2}/\sum_{n,t}k_{nt}}$) is $0.63$. These values are similar to those in Figure 3. Note however that these estimates may be biased by dropping uncontested elections (unlike our estimates of $\gamma$, which remain unbiased after dropping any set of districts). We also note that the correlation between our estimates of $\gamma$ and the standard deviation of $F$ at the state level (weighted by the number of districts in each state) is small ($-.28$), which is consistent with varying $\gamma$ in $G(r):=Q(\gamma r)$ for fixed $Q$ and $F$ as in Figure 3. In contrast, for an alternative normalization with $Q(t):=G(t/\gamma)$ for fixed $G$ and $F$, the weighted correlation between our estimates of $\gamma$ and the standard deviation of $F$ is large ($.79$), which would be inconsistent with varying $\gamma$ for fixed $G$ and $F$.]
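
The estimator and confidence interval of this subsection can be sketched in a few lines. Under the model, $w_{nt}=\Phi^{-1}(v_{nt})=s_{n}-r_{t}$, so if the vote-weighted mix of precinct types in a state is stable across elections, $w_{t}$ varies across elections only through $r_{t}$ and its sample standard deviation estimates $\gamma^{-1}$. The function names and input layout below are illustrative assumptions, and the interval is coded only for $T=3$, where the chi-squared distribution with 2 degrees of freedom has the closed-form $p$-quantile $-2\ln(1-p)$.

```python
import math
from statistics import NormalDist, stdev

STD_NORMAL = NormalDist()


def weighted_probit_mean(vote_shares, vote_counts):
    """w_t: vote-weighted mean of w_nt = Phi^{-1}(v_nt) over precincts n.

    Vote shares must lie strictly between 0 and 1 (the paper drops the rest).
    """
    total = sum(vote_counts)
    return sum(k * STD_NORMAL.inv_cdf(v) for v, k in zip(vote_shares, vote_counts)) / total


def estimate_gamma(elections):
    """gamma-hat = 1 / sample sd of w_t, using the T - 1 denominator.

    `elections` is a list of (vote_shares, vote_counts) pairs, one per election.
    """
    w = [weighted_probit_mean(v, k) for v, k in elections]
    return 1.0 / stdev(w)


def gamma_ci_three_elections(gamma_hat, alpha=0.10):
    """1 - alpha confidence interval for gamma when T = 3 (2 degrees of freedom)."""
    q_lo = -2.0 * math.log(1.0 - alpha / 2.0)  # chi2_2 quantile at alpha/2
    q_hi = -2.0 * math.log(alpha / 2.0)        # chi2_2 quantile at 1 - alpha/2
    return math.sqrt(q_lo / 2.0) * gamma_hat, math.sqrt(q_hi / 2.0) * gamma_hat


if __name__ == "__main__":
    # Synthetic state: three precinct types, aggregate shocks -0.1, 0, 0.1.
    types, counts = [-0.5, 0.0, 0.5], [100, 200, 100]
    elections = [([STD_NORMAL.cdf(s - r) for s in types], counts) for r in (-0.1, 0.0, 0.1)]
    print(estimate_gamma(elections))       # sd of the shocks is 0.1, so about 10
    print(gamma_ci_three_elections(14.75))  # compare with the US-wide interval
```

Feeding in the US-wide point estimate of 14.75 gives an interval of roughly [3.34, 25.53], which agrees with the interval reported in the notes to Figure 6 up to rounding of the point estimate. With only two degrees of freedom the upper endpoint is almost eight times the lower one, which is why the state-level intervals are so wide.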

Our estimates for $\gamma$ are so high that not only is POP clearly optimal rather than PMP, but the optimal POP plan is very similar to $p$-segregation, and both $p$-segregation and traditional pack-and-crack districting are approximately optimal. (Recall Figure 3, where POP is already close to $p$-segregation when $\gamma=6$.) This result can rationalize why actual gerrymandered districting plans usually resemble $p$-segregation or traditional pack-and-crack, rather than POP.

6. Discussion: Why Does the Form of Gerrymandering Matter?

Gerrymandering has been a major concern in American politics for many years and has been tied to several important political and legal issues. In this section, we briefly discuss potential implications of our results on the form of optimal partisan gerrymandering—in particular, whether gerrymanderers optimally pack opponents or moderates—for some of these broader issues. We focus on two areas: implications for how regulations and restrictions on districting affect partisan representation, and implications for how gerrymandering affects political competition and polarization.

6.1. Effects of Districting Restrictions on Partisan Representation

American state and federal election laws have long recognized potential harms associated with gerrymandering and have therefore restricted gerrymandering in various ways. At the federal level, the key laws are the Equal Protection Clause of the Fourteenth Amendment and the Voting Rights Act of 1965. These laws have been interpreted as not only prohibiting adverse racial gerrymandering, but also as affirmatively requiring states to create electoral districts where racial or ethnic minority voters form either a majority (a so-called “majority-minority district”) or a large enough minority so as to have a strong opportunity to elect their candidate of choice, perhaps in coalition with some majority voters (often called a “minority opportunity district”) (e.g., Canon 2022). The creation of such districts played a significant role in increasing Black representation in state legislatures and the US Congress from the 1970’s onward, especially in the South (Grofman and Handley 1991, Cox and Holden 2011). However, the overall partisan impact of majority-minority and minority opportunity districts has long been a hotly contested issue, with some observers arguing that these districts effectively pack strong Democratic supporters and thus resemble a component of a Republican-optimal districting plan. This issue came to a head following the 1994 Republican takeover of the US House, which many journalists and political scientists blamed in part on the creation of majority-minority districts in the 1990 redistricting cycle; however, other observers have disputed this narrative (see, e.g., Cox and Holden 2011 and references therein, Cameron, Epstein, and O’Halloran 1996, Washington 2012).

Following Cox and Holden (2011), we argue that whether a requirement to create majority-minority or minority opportunity districts is likely to increase or decrease overall Republican representation hinges to a large degree on whether optimal partisan gerrymandering packs opponents or moderates. The conventional view throughout the 1990’s (what Cox and Holden call the “pack-and-crack consensus”) was that optimal gerrymandering packs opponents, and hence that a requirement to create majority-minority districts that pack strong Democratic supporters may well increase overall Republican representation. [Footnote 41: Minority opportunity districts may or may not raise similar issues, depending on the share of strong Democratic supporters in these districts (Lublin, Handley, Brunell, and Grofman, 2020).] Based on the analysis of Friedman and Holden (2008), Cox and Holden (2011) challenge this consensus by arguing that optimal districting is given by PMP, and thus packs moderates rather than opponents. Since a PMP plan does not create districts packed with strong Democratic supporters, Cox and Holden argue that a requirement to create such districts precludes PMP and is therefore likely to reduce overall Republican representation.

We agree with Cox and Holden that whether optimal districting packs opponents or moderates is likely to be an important determinant of whether a requirement to create majority-minority or minority opportunity districts increases or decreases overall Republican representation. However, Cox and Holden’s argument that PMP is optimal in practice rests on the implicit assumption that the low-idiosyncratic-uncertainty case studied by Friedman and Holden (2008) is representative. For example, Cox and Holden write, “In a world with diverse voter types, however, there is no plausible distribution of African American voters that would make it optimal for Republican redistricting authorities to create districts in which African Americans make up a supermajority of voters. Within the model, packing one’s opponents is never the optimal strategy,” (p. 574). Our results instead indicate that, empirically, idiosyncratic uncertainty is much larger than aggregate uncertainty, and that in this case POP is optimal (and traditional pack-and-crack districting is approximately optimal), so Republicans do benefit from packing strong Democratic voters. Thus, by analyzing a general model that allows diverse voter types but does not restrict the relative amounts of idiosyncratic and aggregate uncertainty, we can let the data determine which form of districting plan is optimal in practice, and we find that POP is optimal for realistic parameters. Overall, our results support the traditional “pack-and-crack consensus”—Republicans benefit from packing strong Democratic voters—over Cox and Holden’s challenge based on the optimality of packing moderates for certain parameter values.

Of course, even if POP is optimal, so that packing strong Democratic voters in the Republican-optimal manner benefits Republicans, whether a requirement to create majority-minority or minority opportunity districts benefits Republicans in practice is an empirical question. A requirement to create a large number of districts with relatively small Democratic majorities can obviously hurt Republicans. Moreover, as emphasized by Shotts (2001), any constraint on districting weakly hurts Republicans in states where Republicans control districting. In general, we believe that understanding the form of partisan-optimal unconstrained districting is useful for assessing the likely impact of restrictions on districting, such as those imposed by the Voting Rights Act, but as a complement to empirical analysis rather than a substitute.

6.2. Effects of Gerrymandering on Political Competition and Polarization

A second area of debate concerns the impact of gerrymandering on the intensity of electoral competition (e.g., the fraction of “competitive” districts or the extent of incumbency advantage) and political polarization. Popular discourse often blames gerrymandering for reducing competition and increasing polarization. While the scholarly literature is generally skeptical of the claim that gerrymandering plays a large role in explaining overall secular trends in competition and polarization (e.g., Gelman and King 1994a, Abramowitz, Alexander, and Gunning 2006, McCarty, Poole, and Rosenthal 2009, Friedman and Holden 2009), some work does find such effects (e.g., Cottrell 2019, Kenny, McCartan, Simko, Kuriwaki, and Imai 2022), and the issue remains contested.

Regardless of the overall effects of gerrymandering on competition and polarization, the nature of these effects likely depends on the form that gerrymandering takes. Roughly speaking, with a right-wing designer, POP (as well as $p$-segregation and traditional pack-and-crack) creates a few strongly left-leaning districts and many slightly right-leaning districts, with a “gap” between the left-leaning and right-leaning districts. Formally, under POP, there is always a gap between the highest value of $r^{*}(P)$ for a district $P$ in the interval of segregated voter types and the lowest value of $r^{*}(P)$ for a district $P$ in the interval of paired types (see, e.g., the last three panels in Figure 3). POP also involves relatively low polarization within each district, since the lowest voter types in cracked districts are “moderates” rather than extreme left-wingers. In contrast, PMP creates a continuum of districts ranging from left-leaning to right-leaning, with less extreme left-leaning districts than under POP; formally, the set $\{r : r = r^{*}(P) \text{ for some } P \in \operatorname{supp}(\mathcal{H})\}$ is an interval (see, e.g., the first three panels in Figure 3). PMP also involves greater within-district polarization than POP, at least in the sense that the maximum range of voter types that are pooled together under PMP is greater than under POP (since this range is as large as possible under PMP, but is strictly smaller under POP).

Our model does not encompass any endogenous political responses to districting, such as effects of districting on which politicians run for office and on what platforms. With this caveat in mind, we can draw some tentative implications of the above features of POP (or $p$-segregation or traditional pack-and-crack) and PMP for political competition and polarization. First, the fact that the distribution of threshold shocks $r^{*}(P)$ has a gap under POP but not under PMP suggests that pack-and-crack plans may lead to a more polarized legislature, where the packed districts elect left-wing representatives, and the cracked districts elect right-leaning representatives. The possibility that packing opponents can increase polarization in this manner is a long-standing political and legal concern (see, e.g., Cox and Holden 2011, p. 595). Coate and Knight (2007), Besley and Preston (2007), and Bracco (2013) develop models with this feature. In contrast, PMP may lead to a less polarized legislature. Second, POP may lead to a larger number of “uncompetitive,” far-left districts. Creating uncompetitive districts is usually viewed as a socially undesirable feature of a districting plan, but see Buchler (2005) and Brunell (2008) for opposing views. Finally, lower within-district polarization under POP may be socially desirable if voters benefit from being ideologically close to their representative, as in Besley and Preston (2007) and Gomberg, Pancs, and Sharma (2023). These and other implications of optimal districting for political processes and outcomes could be studied more fully in a model that endogenized additional aspects of political competition beyond districting. This is a promising direction for future research.

7. Conclusion

This paper has developed a simple and general model of optimal partisan gerrymandering. Our main message has four parts. First, pack-and-pair districting, a generalization of traditional packing-and-cracking, is typically optimal for the gerrymanderer. Second, the optimal form of pack-and-pair depends on the relative amounts of aggregate and idiosyncratic uncertainty facing the gerrymanderer: opposing voters are packed when idiosyncratic uncertainty dominates, while moderate voters are packed when aggregate uncertainty dominates. Third, empirically, idiosyncratic uncertainty dominates, so we expect pack-opponents-and-pair (POP) districting to be optimal. This finding also establishes that the relevant parameter range for future research on gerrymandering (and electoral competition more generally) is that where idiosyncratic uncertainty is much larger than aggregate uncertainty. Fourth, estimated idiosyncratic uncertainty is so large that the optimal POP plan closely resembles a simpler pack-opponents-and-pool plan, where more favorable voters are all pooled together, rather than being paired as they are under POP; moreover, traditional pack-and-crack districting, where less favorable voters are also all pooled together, rather than being segregated, is also approximately optimal. This final observation can rationalize the use of traditional pack-and-crack districting plans in practice.

Methodologically, we develop and exploit a tight connection between gerrymandering and information design. We show that a general model of partisan gerrymandering is equivalent to a general Bayesian persuasion problem where the state of the world and the receiver’s action are both one-dimensional and the sender’s preferences are state-independent. This common framework nests the important prior contributions of Owen and Grofman (1988), Friedman and Holden (2008), and Gul and Pesendorfer (2010), and facilitates a more general and realistic analysis that allows diverse voter types and non-linear vote swings without restricting the relative amounts of aggregate and idiosyncratic uncertainty.

We hope our model can inform future research on various aspects of redistricting. We mention a few directions for future research.

First, we have assumed that the designer maximizes his party’s expected seat share. It may be more realistic to assume that the designer’s utility is non-linear in his party’s seat share, for example because he puts a premium on winning a majority of seats. We examined this case in an earlier version of the current paper (Kolotilin and Wolitzky, 2020). While non-linear designer utility introduces some new complications, it also reinforces the main message of the current paper, in that if the designer’s utility is S-shaped in his party’s seat share (as in the case with a premium on winning a majority), then pack-opponents-and-pool is strictly optimal even with linear swing and uniform aggregate shocks (whereas a designer with linear utility is indifferent among all districting plans in this case).

Second, we have assumed that all voters always vote, or at least always vote at the same rate (as is equivalent). It would be interesting to incorporate heterogeneous turnout in the analysis. A recent contribution by Bouton, Genicot, Castanheira, and Stashko (2023) considers voters with a binary partisan type (as in Owen and Grofman 1988) and a continuous “turnout type,” which captures fixed turnout heterogeneity across voters. An alternative model, which captures variable turnout heterogeneity, would retain one-dimensional voter types but assume that voters abstain when they are close to indifferent between the parties. It would be interesting to compare these models, as in practice turnout heterogeneity has both fixed sources (e.g., education, race) and variable ones (e.g., almost-indifferent voters turn out less).

Third, a robust prediction of our analysis is that there should be greater within-district polarization in districts that are more favorable for the designer’s party. It would be interesting to test this prediction empirically.

Further questions include the following. What does the model imply for political competition and the resulting policy choices? What are the model’s comparative statics (for example, what factors determine the proportion of packed and cracked districts)? [Footnote 42: Kolotilin and Wolitzky (2020) analyze comparative statics with binary voter types.] What does the model imply about how gerrymandering should be measured and regulated? A better understanding of the form of optimal partisan gerrymandering can contribute to the study of these questions and related ones.

References

  • Abramowitz, Alexander, and Gunning (2006) Abramowitz, A. I., B. Alexander, and M. Gunning (2006): “Incumbency, Redistricting, and the Decline of Competition in US House Elections,” Journal of Politics, 68, 75–88.
  • Ansolabehere, Palmer, and Lee (2014) Ansolabehere, S., M. Palmer, and A. Lee (2014): “Precinct-Level Election Data,” Harvard Dataverse.
  • Bagnoli and Bergstrom (2005) Bagnoli, M., and T. Bergstrom (2005): “Log-Concave Probability and Its Applications,” Economic Theory, 26(2), 445–469.
  • Baltz et al. (2022) Baltz, S., et al. (2022): “American Election Results at the Precinct Level,” Scientific Data, 9(651).
  • Besley and Preston (2007) Besley, T., and I. Preston (2007): “Electoral Bias and Policy Choice: Theory and Evidence,” Quarterly Journal of Economics, 122, 1473–1510.
  • Bickerstaff (2020) Bickerstaff, S. (2020): Election Systems and Gerrymandering Worldwide. Springer.
  • Blackwell (1953) Blackwell, D. (1953): “Equivalent Comparisons of Experiments,” Annals of Mathematical Statistics, 24, 265–272.
  • Bouton, Genicot, Castanheira, and Stashko (2023) Bouton, L., G. Genicot, M. Castanheira, and A. Stashko (2023): “Gerrymandering when Turnout Rates Differ,” Georgetown University.
  • Bracco (2013) Bracco, E. (2013): “Optimal Districting with Endogenous Party Platforms,” Journal of Public Economics, 104, 1–13.
  • Brunell (2008) Brunell, T. (2008): Redistricting and Representation: Why Competitive Elections Are Bad for America. Routledge.
  • Buchler (2005) Buchler, J. (2005): “Competition, Representation, and Redistricting: The Case against Competitive Congressional Districts,” Journal of Theoretical Politics, 17, 431–463.
  • Calvert (1985) Calvert, R. L. (1985): “Robustness of the Multidimensional Voting Model: Candidate Motivations, Uncertainty, and Convergence,” American Journal of Political Science, 29(1), 69–95.
  • Cameron, Epstein, and O’Halloran (1996) Cameron, C., D. Epstein, and S. O’Halloran (1996): “Do Majority-Minority Districts Maximize Substantive Black Representation in Congress?,” American Political Science Review, 90, 794–812.
  • Canon (2022) Canon, D. (2022): “Race and Redistricting,” Annual Review of Political Science, 25, 509–528.
  • Chambers and Miller (2010) Chambers, C. P., and A. D. Miller (2010): “A Measure of Bizarreness,” Quarterly Journal of Political Science, 5, 27–44.
  • Coate and Knight (2007) Coate, S., and B. Knight (2007): “Socially Optimal Districting: A Theoretical and Empirical Exploration,” Quarterly Journal of Economics, 122, 1409–1471.
  • Cottrell (2019) Cottrell, D. (2019): “Using Computer Simulations to Measure the Effect of Gerrymandering on Electoral Competition in the U.S. Congress,” Legislative Studies Quarterly, 44, 487–514.
  • Cox and Holden (2011) Cox, A. B., and R. R. Holden (2011): “Reconsidering Racial and Partisan Gerrymandering,” University of Chicago Law Review, 78, 553–604.
  • Duchin (2018) Duchin, M. (2018): “Gerrymandering Metrics: How to Measure? What’s the Baseline?,” Tufts University.
  • Dworczak and Martini (2019) Dworczak, P., and G. Martini (2019): “The Simple Economics of Optimal Persuasion,” Journal of Political Economy, 127(5), 1993–2048.
  • Ely (2022) Ely, J. (2022): “A Cake-Cutting Solution to Gerrymandering,” Northwestern University.
  • Friedman and Holden (2009) Friedman, J. N., and R. Holden (2009): “The Rising Incumbent Reelection Rate: What’s Gerrymandering Got To Do with It?,” Journal of Politics, pp. 593–611.
  • Friedman and Holden (2020) Friedman, J. N., and R. Holden (2020): “Optimal Gerrymandering in a Competitive Environment,” Economic Theory Bulletin, pp. 1–21.
  • Friedman and Holden (2008) Friedman, J. N., and R. T. Holden (2008): “Optimal Gerrymandering: Sometimes Pack, but Never Crack,” American Economic Review, 98(1), 113–44.
  • Fryer and Holden (2011) Fryer, R. G., and R. Holden (2011): “Measuring the Compactness of Political Districting Plans,” Journal of Law and Economics, 54, 493–535.
  • Gelman and King (1994a) Gelman, A., and G. King (1994a): “Enhancing Democracy through Legislative Redistricting,” American Political Science Review, 88, 541–559.
  • Gelman and King (1994b) Gelman, A., and G. King (1994b): “A Unified Method of Evaluating Electoral Systems and Redistricting Plans,” American Journal of Political Science, 38, 514–554.
  • Gentzkow and Kamenica (2016) Gentzkow, M., and E. Kamenica (2016): “A Rothschild-Stiglitz Approach to Bayesian Persuasion,” American Economic Review, Papers & Proceedings, 106, 597–601.
  • Gilligan and Matsusaka (2006) Gilligan, T. W., and J. G. Matsusaka (2006): “Public Choice Principles of Redistricting,” Public Choice, 129, 381–398.
  • Gomberg, Pancs, and Sharma (2023) Gomberg, A., R. Pancs, and T. Sharma (2023): “Electoral Maldistricting,” International Economic Review, Forthcoming.
  • Grofman and Handley (1991) Grofman, B., and L. Handley (1991): “The Impact of the Voting Rights Act on Black Representation in Southern State Legislatures,” Legislative Studies Quarterly, 16, 111–128.
  • Grofman and King (2007) Grofman, B., and G. King (2007): “The Future of Partisan Symmetry as a Judicial Test for Partisan Gerrymandering After LULAC v. Perry,” Election Law Journal, 6, 2–35.
  • Gul and Pesendorfer (2010) Gul, F., and W. Pesendorfer (2010): “Strategic Redistricting,” American Economic Review, 100(4), 1616–1641.
  • Hayes and McKee (2009) Hayes, D., and S. C. McKee (2009): “The Participatory Effects of Redistricting,” American Journal of Political Science, 53, 1006–1023.
  • Hinich (1977) Hinich, M. J. (1977): “Equilibrium in Spatial Voting: The Median Voter Result is an Artifact,” Journal of Economic Theory, 16(2), 208–219.
  • Igielnik, Keeter, and Hartig (2021) Igielnik, R., S. Keeter, and H. Hartig (2021): “Behind Biden’s 2020 Victory,” Pew Research Center.
  • Jeong and Shenoy (2022) Jeong, D., and A. Shenoy (2022): “The Targeting and Impact of Partisan Gerrymandering: Evidence from a Legislative Discontinuity,” Review of Economics and Statistics, Forthcoming.
  • Kamenica and Gentzkow (2011) Kamenica, E., and M. Gentzkow (2011): “Bayesian Persuasion,” American Economic Review, 101, 2590–2615.
  • Katz, King, and Rosenblatt (2020) Katz, J., G. King, and E. Rosenblatt (2020): “Theoretical Foundations and Empirical Evaluations of Partisan Fairness in District-Based Democracies,” American Political Science Review, 114, 164–178.
  • Kenny, McCartan, Simko, Kuriwaki, and Imai (2022) Kenny, C., C. McCartan, T. Simko, S. Kuriwaki, and K. Imai (2022): “Widespread Partisan Gerrymandering Mostly Cancels Nationally, but Reduces Electoral Competition,” Harvard University.
  • Kleiner, Moldovanu, and Strack (2021) Kleiner, A., B. Moldovanu, and P. Strack (2021): “Extreme Points and Majorization: Economic Applications,” Econometrica, 89(4), 1557–1593.
  • Kolko and Monkovic (2021) Kolko, J., and T. Monkovic (2021): “The Places that had the Biggest Swings Toward and Against Trump,” New York Times, https://www.nytimes.com/2020/12/07/upshot/trump-election-vote-shift.
  • Kolotilin (2015) Kolotilin, A. (2015): “Experimental Design to Persuade,” Games and Economic Behavior, 90, 215–226.
  • Kolotilin (2018) Kolotilin, A. (2018): “Optimal Information Disclosure: A Linear Programming Approach,” Theoretical Economics, 13, 607–636.
  • Kolotilin, Corrao, and Wolitzky (2023) Kolotilin, A., R. Corrao, and A. Wolitzky (2023): “Persuasion with Non-Linear Preferences,” MIT.
  • Kolotilin, Mylovanov, and Zapechelnyuk (2022) Kolotilin, A., T. Mylovanov, and A. Zapechelnyuk (2022): “Censorship as Optimal Persuasion,” Theoretical Economics, 17(2), 561–585.
  • Kolotilin, Mylovanov, Zapechelnyuk, and Li (2017) Kolotilin, A., T. Mylovanov, A. Zapechelnyuk, and M. Li (2017): “Persuasion of a Privately Informed Receiver,” Econometrica, 85, 1949–1964.
  • Kolotilin and Wolitzky (2020) Kolotilin, A., and A. Wolitzky (2020): “The Economics of Partisan Gerrymandering,” MIT.
  • Lagarde and Tomala (2021) Lagarde, A., and T. Tomala (2021): “Optimality and Fairness of Partisan Gerrymandering,” Mathematical Programming, pp. 1–37.
  • Lindbeck and Weibull (1993) Lindbeck, A., and J. Weibull (1993): “A Model of Political Equilibrium in a Representative Democracy,” Journal of Public Economics, 51, 195–209.
  • Lublin, Handley, Brunell, and Grofman (2020) Lublin, D., L. Handley, T. Brunell, and B. Grofman (2020): “Minority Success in Non-Majority Minority Districts: Finding the “Sweet Spot”,” Journal of Race, Ethnicity, and Politics, 5, 275–298.
  • McCarty, Poole, and Rosenthal (2009) McCarty, N., K. T. Poole, and H. Rosenthal (2009): “Does Gerrymandering Cause Polarization?,” American Journal of Political Science, 53, 666–680.
  • McGann, Smith, Latner, and Keena (2016) McGann, A. J., C. A. Smith, M. Latner, and A. Keena (2016): Gerrymandering in America: The House of Representatives, the Supreme Court, and the Future of Popular Sovereignty. Cambridge University Press.
  • McGhee (2014) McGhee, E. (2014): “Measuring Partisan Bias in Single-Member District Electoral Systems,” Legislative Studies Quarterly, 39(1), 55–85.
  • McGhee (2020) McGhee, E. (2020): “Partisan Gerrymandering and Political Science,” Annual Review of Political Science, 23, 171–185.
  • Newkirk (2017) Newkirk, V. R. (2017): “How Redistricting Became a Technological Arms Race,” The Atlantic, 28 October.
  • Owen and Grofman (1988) Owen, G., and B. Grofman (1988): “Optimal Partisan Gerrymandering,” Political Geography Quarterly, 7(1), 5–22.
  • Puppe and Tasnádi (2009) Puppe, C., and A. Tasnádi (2009): “Optimal Redistricting under Geographical Constraints: Why “Pack and Crack” Does Not Work,” Economics Letters, 105, 93–96.
  • Rakich and Mejia (2022) Rakich, N., and E. Mejia (2022): “Did Redistricting Cost Democrats the House?,” https://fivethirtyeight.com/features/redistricting-house-2022/.
  • Rakich and Silver (2018) Rakich, N., and N. Silver (2018): “Election Update: The Most (And Least) Elastic States And Districts,” https://fivethirtyeight.com/features/election-update-the-house-districts-that-swing-the-most-and-least-with-the-national-mood/.
  • Sherstyuk (1998) Sherstyuk, K. (1998): “How to Gerrymander: A Formal Analysis,” Public Choice, 95, 27–49.
  • Shotts (2001) Shotts, K. W. (2001): “The Effect of Majority-Minority Mandates on Partisan Gerrymandering,” American Journal of Political Science, pp. 120–135.
  • Shotts (2002) Shotts, K. W. (2002): “Gerrymandering, Legislative Composition, and National Policy Outcomes,” American Journal of Political Science, pp. 398–414.
  • Stephanopoulos and McGhee (2015) Stephanopoulos, N. O., and E. M. McGhee (2015): “Partisan Gerrymandering and the Efficiency Gap,” University of Chicago Law Review, 82, 831–900.
  • Washington (2012) Washington, E. (2012): “Do Majority-Black Districts Limit Blacks’ Representation? The Case of the 1990 Redistricting,” Journal of Law and Economics, 55, 251–274.
  • Wittman (1983) Wittman, D. (1983): “Candidate Motivation: A Synthesis of Alternative Theories,” American Political Science Review, 77(1), 142–157.
  • Yang and Zentefis (2023) Yang, K. H., and A. Zentefis (2023): “Extreme Points of First-Order Stochastic Dominance Intervals: Theory and Applications,” Yale University.

Appendix: Proofs

Given the equivalence between our model and a class of Bayesian persuasion problems described in Section 2, Propositions 1, 2, 4, and 6 follow from prior results in the persuasion literature. For these results, we give references to the literature as well as (mostly) self-contained proofs, for completeness. In contrast, Propositions 3, 5, and 7–11 are new to both the persuasion and gerrymandering literatures. We give complete proofs of these results.

Appendix A Proofs for Section 3

Proof of Proposition 1.

This result is standard (see, e.g., Figure 1 in Owen and Grofman 1988). Case (1) is trivial, as the designer wins all districts if he creates measure $1$ of districts satisfying $\Pr_{P}(s \geq r^{0}) \geq 1/2$ and loses a positive measure of districts otherwise. For case (2), note that since the designer wins a district $P$ iff $\Pr_{P}(s \geq r^{0}) \geq 1/2$, a districting plan can be described by a distribution $H$ over $x = \Pr_{P}(s \geq r^{0})$. The designer’s utility for any feasible $H$ is

$$\int \mathbf{1}\left\{x \geq \tfrac{1}{2}\right\} dH(x) \leq \int 2x \, dH(x) = 2m, \tag{4}$$

where the inequality holds because $\mathbf{1}\{x \geq 1/2\} \leq 2x$ for all $x \in [0,1]$, and the equality holds because $\int x \, dH(x) = m$ for any feasible $H$, by the law of iterated expectations. Thus, any plan that creates measure $2m$ of cracked districts satisfying $\Pr_{P}(s \geq r^{0}) = 1/2$ and measure $1 - 2m$ of packed districts satisfying $\Pr_{P}(s < r^{0}) = 1$ is optimal. Moreover, any other plan creates a positive measure of districts with $\Pr_{P}(s \geq r^{0}) \notin \{0, 1/2\}$ (i.e., $\operatorname{supp}(H) \nsubseteq \{0, 1/2\}$), so that the inequality in (4) is strict, because $\mathbf{1}\{x \geq 1/2\} = 2x$ iff $x \in \{0, 1/2\}$. So any such plan is suboptimal. ∎
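As a numerical illustration of the bound in (4), the following minimal sketch compares three feasible plans for an assumed favorable-voter share $m = 0.3$. The representation of a plan as a list of (district measure, $x$) pairs and the function names are illustrative choices, not part of the paper.

```python
def seat_share(plan):
    """Measure of districts won; plan is a list of (measure, x) pairs,
    where x is the district's share of voters with s >= r0."""
    return sum(w for w, x in plan if x >= 0.5)


def is_feasible(plan, m, tol=1e-9):
    """District measures sum to 1 and favorable shares average to m."""
    total = sum(w for w, _ in plan)
    mean = sum(w * x for w, x in plan)
    return abs(total - 1.0) < tol and abs(mean - m) < tol


def pack_and_crack(m):
    """Measure 2m of districts with x = 1/2, measure 1 - 2m with x = 0."""
    return [(2 * m, 0.5), (1 - 2 * m, 0.0)]


m = 0.3  # assumed population share of favorable voters (m < 1/2)
plans = {
    "pack_and_crack": pack_and_crack(m),
    "uniform": [(1.0, m)],                  # every district mirrors the population
    "overshoot": [(0.5, 0.6), (0.5, 0.0)],  # winning districts waste favorable voters
}
for name, plan in plans.items():
    assert is_feasible(plan, m)
    print(name, seat_share(plan), "bound:", 2 * m)
```

Only the pack-and-crack plan attains the bound $2m$; the other two plans place mass on values of $x$ outside $\{0, 1/2\}$ and fall short of it.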

Proof of Proposition 2.

The proposition can be obtained using the proofs of Lemmas 1 and C1 in Kolotilin (2015). Case (1) is trivial, as the designer wins all districts if he creates measure $1$ of districts satisfying $\int v(s, r^{0}) \, dP(s) \geq 1/2$ and loses a positive measure of districts otherwise. For case (2), note that since $v(s, r^{0})$ is differentiable and strictly increasing in $s$, we can redefine $s$ as $v(s, r^{0})$, so that the redefined $s$ has a full-support density on $[\underline{s}, \overline{s}]$, with $0 \leq \underline{s} < \overline{s} \leq 1$. Assume that $\overline{s} > 1/2$, as otherwise the result is trivial. Since $\mathbb{E}_{F}[s] < 1/2$, there is a unique $s^{*} \in (\underline{s}, \overline{s})$ satisfying $\mathbb{E}_{F}[s \mid s \geq s^{*}] = 1/2$. Define

$$\overline{U}(x) = \begin{cases} 0, & x < s^{*}, \\ \frac{2(x - s^{*})}{1 - 2s^{*}}, & x \geq s^{*}. \end{cases}$$

Since the designer wins a district $P$ iff $\mathbb{E}_{P}[s] \geq 1/2$, his expected seat share under a plan $\mathcal{H}$ is

$$\begin{gathered} \int \mathbf{1}\left\{\mathbb{E}_{P}[s] \geq \tfrac{1}{2}\right\} d\mathcal{H}(P) \leq \int \overline{U}(\mathbb{E}_{P}[s]) \, d\mathcal{H}(P) \leq \iint \overline{U}(s) \, dP(s) \, d\mathcal{H}(P) \\ = \int \overline{U}(s) \, dF(s) = \int_{s^{*}}^{\overline{s}} \frac{2(s - s^{*})}{1 - 2s^{*}} \, dF(s) = 1 - F(s^{*}), \end{gathered} \tag{5}$$

where the first inequality holds because $\mathbf{1}\{x \geq 1/2\} \leq \overline{U}(x)$ for all $x$ (note that $\overline{U}(1/2) = 1$), the second inequality holds because $\overline{U}$ is convex, the first equality holds because $\int P \, d\mathcal{H}(P) = F$, and the last equality holds because $\mathbb{E}_{F}[s \mid s \geq s^{*}] = 1/2$. Thus, a plan $\mathcal{H}$ is optimal iff for all $P \in \operatorname{supp}(\mathcal{H})$ we have: (i) $\mathbb{E}_{P}[s] \leq s^{*}$ or $\mathbb{E}_{P}[s] = 1/2$ (as otherwise the first inequality in (5) is strict), and (ii) $\operatorname{supp}(P) \subset [\underline{s}, s^{*}]$ if $\mathbb{E}_{P}[s] < s^{*}$ and $\operatorname{supp}(P) \subset [s^{*}, \overline{s}]$ if $\mathbb{E}_{P}[s] = 1/2$ (as otherwise the second inequality in (5) is strict). This means that $\mathcal{H}$ contains measure $F(s^{*})$ of districts $P$ where $\Pr_{P}(s < s^{*}) = 1$ and measure $1 - F(s^{*})$ of districts $P$ where $\Pr_{P}(s \geq s^{*}) = 1$ and $\mathbb{E}_{P}[s] = 1/2$. ∎
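To make the cutoff $s^{*}$ concrete, here is a minimal sketch that solves $\mathbb{E}_{F}[s \mid s \geq s^{*}] = 1/2$ by bisection and reports the resulting seat share $1 - F(s^{*})$. It assumes, purely for illustration, that the (redefined) types are uniform on $[0.1, 0.7]$; the function names are hypothetical.

```python
def cond_mean_above(cut, a, b):
    """E[s | s >= cut] for s ~ Uniform[a, b], with a <= cut < b."""
    return (cut + b) / 2


def solve_cutoff(a, b, tol=1e-12):
    """Bisect for s* in (a, b) with E[s | s >= s*] = 1/2.
    Assumes (a + b) / 2 < 1/2 < b, so a root exists and is unique."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cond_mean_above(mid, a, b) < 0.5:
            lo = mid  # pool above the cutoff is still too weak: raise the cutoff
        else:
            hi = mid
    return (lo + hi) / 2


def cdf(s, a, b):
    """F(s) for Uniform[a, b]."""
    return min(max((s - a) / (b - a), 0.0), 1.0)


a, b = 0.1, 0.7                 # assumed support; population mean 0.4 < 1/2
s_star = solve_cutoff(a, b)     # closed form in this uniform case: 1 - b
seats = 1 - cdf(s_star, a, b)   # seat share 1 - F(s*) from the proof
print(s_star, seats)
```

In this uniform case the cutoff has the closed form $s^{*} = 1 - \overline{s}$, which gives a quick check on the bisection.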

Proof of Proposition 3.

For a districting plan $\mathcal{H}$, define $H$ as $H(r) = \Pr_{\mathcal{H}}(r^{*}(P) \leq r)$ for all $r$. The designer thus wins measure $1 - H(r_{-})$ of districts when the realized aggregate shock is $r$. For each realization $r$, the designer wins a district $P$ iff it contains at least measure $1/2$ voters with types $s \geq r$ (i.e., $\Pr_{P}(s \geq r) \geq 1/2$). Since the population has measure $1 - F(r)$ voters with types $s \geq r$, the designer wins at most measure $2(1 - F(r))$ districts, so $1 - H(r_{-}) \leq 2(1 - F(r))$. Since the designer can win at most measure $1$ districts, any feasible $H$ satisfies $H(r_{-}) \geq H^{*}(r)$, where

$$H^{*}(r) = \begin{cases} 0, & \text{if } r \leq s^{m}, \\ 1 - 2(1 - F(r)), & \text{if } r > s^{m}. \end{cases}$$

Thus, the designer’s expected seat share for any feasible $H$ is

$$\int \left(1 - H(r_{-})\right) dG(r) \leq \int \left(1 - H^{*}(r)\right) dG(r),$$

with strict inequality if $H(r_{-}) > H^{*}(r)$ for some $r$ (and thus on some interval $(r, r')$ with $r' > r$, by continuity of $H^{*}$ and monotonicity of $H$), because $G(r)$ is strictly increasing in $r$. Thus, a districting plan $\mathcal{H}$ is optimal iff it induces $H^{*}$, which means that $\mathcal{H}$-almost every district $P$ that the designer wins iff the aggregate shock is at most $r$ satisfies $\Pr_{P}(s = r) = \Pr_{P}(s < s^{m}) = 1/2$. ∎
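The bound $H^{*}$ can be evaluated numerically. The sketch below assumes, for illustration only, that both $F$ and $G$ are uniform on $[0,1]$ (so $s^{m} = 1/2$), and compares the expected seat share under $H^{*}$ with that of a plan in which every district mirrors the population.

```python
def F(s):
    """Assumed type distribution: Uniform[0, 1], so the median s^m is 1/2."""
    return min(max(s, 0.0), 1.0)


def H_star(r):
    """Lower bound on the threshold distribution from the proof."""
    return max(0.0, 1.0 - 2.0 * (1.0 - F(r)))


def H_uniform(r):
    """Every district mirrors the population, so every threshold equals s^m."""
    return 1.0 if r > 0.5 else 0.0


def expected_seats(H, n=100_000):
    """Midpoint Riemann sum of 1 - H(r) against G = Uniform[0, 1] (assumed)."""
    return sum(1.0 - H((i + 0.5) / n) for i in range(n)) / n


optimal = expected_seats(H_star)
baseline = expected_seats(H_uniform)
print(optimal, baseline)
```

Under these assumed distributions the optimal plan wins every district when $r \leq 1/2$ and a fraction $2(1 - r)$ of districts when $r > 1/2$, whereas the population-mirroring plan wins either all districts or none.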

Proof of Proposition 4.

The proposition follows from Theorem 1 in Kolotilin, Mylovanov, and Zapechelnyuk (2022). The proof is similar to the proof of Proposition 2. The most interesting case is where there is an interior cutoff $s^{*}$ and pool mean $x^{*} = \mathbb{E}_{F}[s \mid s \geq s^{*}]$ satisfying $u(x^{*})(x^{*} - s^{*}) = U(x^{*}) - U(s^{*})$. As follows from Figure 2, such $s^{*}$ is unique. Define

$$\overline{U}(x) = \begin{cases} U(x), & x < s^{*}, \\ U(x^{*}) + u(x^{*})(x - x^{*}), & x \geq s^{*}. \end{cases}$$

The designer’s expected seat share under a plan $\mathcal{H}$ is

$$\begin{gathered} \int U(\mathbb{E}_{P}[s]) \, d\mathcal{H}(P) \leq \int \overline{U}(\mathbb{E}_{P}[s]) \, d\mathcal{H}(P) \leq \iint \overline{U}(s) \, dP(s) \, d\mathcal{H}(P) \\ = \int \overline{U}(s) \, dF(s) = \int_{0}^{s^{*}} U(x) \, dF(x) + U(x^{*})(1 - F(s^{*})), \end{gathered} \tag{6}$$

where the first inequality holds by $U \leq \overline{U}$, the second inequality holds by convexity of $\overline{U}$, the first equality holds by $\int P \, d\mathcal{H}(P) = F$, and the second equality holds by the definition of $s^{*}$, $x^{*}$, and $\overline{U}$. Thus, a plan $\mathcal{H}$ is optimal iff for all $P \in \operatorname{supp}(\mathcal{H})$ we have: (i) $\mathbb{E}_{P}[s] \leq s^{*}$ or $\mathbb{E}_{P}[s] = x^{*}$ (as otherwise the first inequality in (6) is strict), and (ii) $P = \delta_{\mathbb{E}_{P}[s]}$ if $\mathbb{E}_{P}[s] < s^{*}$ and $\operatorname{supp}(P) \subset [s^{*}, \overline{s}]$ if $\mathbb{E}_{P}[s] = x^{*}$ (as otherwise the second inequality in (6) is strict). This implies that the distribution of district means induced by pack-opponents-and-pool districting with cutoff $s^{*}$ is uniquely optimal. ∎
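The tangency condition $u(x^{*})(x^{*} - s^{*}) = U(x^{*}) - U(s^{*})$ can be solved numerically once $U$ and $F$ are specified. The sketch below is only an illustration under strong assumptions that are not taken from the paper: $U$ is a logistic function with midpoint $0.6$ and scale $0.1$, $u$ is its derivative (as the tangent-line form of $\overline{U}$ suggests), and $F$ is uniform on $[0,1]$. The function names are hypothetical.

```python
import math


def U(x, c=0.6, k=0.1):
    """Assumed S-shaped function (logistic); an illustrative choice only."""
    return 1.0 / (1.0 + math.exp(-(x - c) / k))


def u(x, c=0.6, k=0.1):
    """Derivative of the logistic U."""
    p = U(x, c, k)
    return p * (1.0 - p) / k


def pool_mean(s):
    """x = E[s' | s' >= s] for the assumed F = Uniform[0, 1]."""
    return (s + 1.0) / 2.0


def tangency_gap(s):
    """u(x)(x - s) - (U(x) - U(s)) at x = pool_mean(s); zero at the cutoff."""
    x = pool_mean(s)
    return u(x) * (x - s) - (U(x) - U(s))


def solve_cutoff(lo=0.0, hi=0.6, tol=1e-12):
    """Bisect on [0, c]; for these parameters the gap changes sign there."""
    assert tangency_gap(lo) > 0 > tangency_gap(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tangency_gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


s_star = solve_cutoff()
x_star = pool_mean(s_star)
print(s_star, x_star, tangency_gap(s_star))
```

The bisection brackets the root on $[0, 0.6]$ because, for cutoffs above the logistic midpoint, $U$ is concave on the pooled region and the gap is negative. The sketch does not verify uniqueness, which the proof obtains from Figure 2.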

Appendix B Proofs for Section 4

We start with a lemma that distills some key results from Kolotilin, Corrao, and Wolitzky (2023).

Lemma 1.

There exists a bounded, measurable function $\lambda : \mathbb{R} \rightarrow \mathbb{R}$ such that, for any optimal districting plan $\mathcal{H}$, the following hold:

  (1) For all $P, P' \in \operatorname{supp}(\mathcal{H})$ and all $s \in \operatorname{supp}(P)$, we have

    $$G(r^{*}(P)) + \lambda(r^{*}(P))\left(v(s, r^{*}(P)) - \tfrac{1}{2}\right) \geq G(r^{*}(P')) + \lambda(r^{*}(P'))\left(v(s, r^{*}(P')) - \tfrac{1}{2}\right).$$

  (2) For all $P \in \operatorname{supp}(\mathcal{H})$, we have

    $$\lambda(r^{*}(P)) = -\frac{g(r^{*}(P))}{\int \frac{\partial v(s, r^{*}(P))}{\partial r} \, dP(s)}.$$

  (3) For any non-degenerate $P \in \operatorname{supp}(\mathcal{H})$, $\lambda$ has a derivative $\lambda'(r^{*}(P))$ at $r^{*}(P)$ satisfying, for all $s \in \operatorname{supp}(P)$,

    $$g(r^{*}(P)) + \lambda(r^{*}(P)) \frac{\partial v(s, r^{*}(P))}{\partial r} + \lambda'(r^{*}(P))\left(v(s, r^{*}(P)) - \tfrac{1}{2}\right) = 0.$$

Intuitively, $\lambda(r^{*}(P))$ is the multiplier on the constraint $\int v(s, r^{*}(P)) \, dP = \frac{1}{2}$. Part 2 of the lemma says that $\lambda(r^{*}(P))$ equals the product of the designer’s marginal utility of increasing $r^{*}(P)$ (which equals $g(r^{*}(P))$) and the rate at which $r^{*}(P)$ increases as the constraint $\int v(s, r^{*}(P)) \, dP = \frac{1}{2}$ is relaxed (which equals $-1 / \int \frac{\partial v(s, r^{*}(P))}{\partial r} \, dP(s)$ by the implicit function theorem). Part 1 of the lemma says that the designer assigns a type-$s$ voter to a district $P$ so as to maximize $G(r^{*}(P)) + \lambda(r^{*}(P))\left(v(s, r^{*}(P)) - \tfrac{1}{2}\right)$. Part 3 says that the first-order condition of this maximization problem with respect to $r$ holds for all non-degenerate $P \in \operatorname{supp}(\mathcal{H})$ and all $s \in \operatorname{supp}(P)$.

Proof.

Any districting plan $\mathcal{H}$ induces a joint distribution $\pi_{\mathcal{H}}$ of voter type $s$ and the threshold aggregate shock $r$ below which the designer wins a district containing voter type $s$. Specifically, denoting $\underline{r} = r^{*}(\delta_{\underline{s}})$ and $\overline{r} = r^{*}(\delta_{\overline{s}})$, $\mathcal{H}$ induces $\pi_{\mathcal{H}}$ given by

$$\pi_{\mathcal{H}}(S, R) := \int_{P : r^{*}(P) \in R} P(S) \, d\mathcal{H}(P) \quad \text{for all measurable } S \subset [\underline{s}, \overline{s}] \text{ and } R \subset [\underline{r}, \overline{r}].$$

Appendix B in Kolotilin, Corrao, and Wolitzky (2023) constructs a suitable bounded, measurable function $\lambda : [\underline{r}, \overline{r}] \rightarrow \mathbb{R}$, and defines the set $\Gamma$ as

$$\Gamma := \left\{ (s, r) \in [\underline{s}, \overline{s}] \times [\underline{r}, \overline{r}] : \sup_{\tilde{r} \in [\underline{r}, \overline{r}]} \left\{ G(\tilde{r}) + \lambda(\tilde{r})\left(v(s, \tilde{r}) - \tfrac{1}{2}\right) \right\} = G(r) + \lambda(r)\left(v(s, r) - \tfrac{1}{2}\right) \right\}.$$

Moreover, they define

RΓ\displaystyle R_{\Gamma} :={r[r¯,r¯]:(s,r)Γ for some s[s¯,s¯]},\displaystyle:=\{r\in[\underline{r},\overline{r}]:(s,r)\in\Gamma\quad\text{ for some }s\in[\underline{s},\overline{s}]\},
Γr\displaystyle\Gamma_{r} :={s[s¯,s¯]:(s,r)Γ}for all r[r¯,r¯].\displaystyle:=\{s\in[\underline{s},\overline{s}]:(s,r)\in\Gamma\}\quad\text{for all $r\in[\underline{r},\overline{r}]$}.

Part 1 of their Theorem 7 shows that the set $\Gamma$ is compact and satisfies

\[\min\Gamma_{r}\leq s^{*}(r)\leq\max\Gamma_{r}\quad\text{for all $r\in R_{\Gamma}$},\tag{7}\]

where $s^{*}(r)$ is defined by $v(s^{*}(r),r)=1/2$. Moreover, the same result shows that

\[\operatorname{supp}(\pi_{\mathcal{H}})\subset\Gamma\quad\text{for each optimal $\mathcal{H}$}.\tag{8}\]

Furthermore, Kolotilin, Corrao, and Wolitzky define the set $\Gamma^{*}\subset\Gamma$ such that

\[\Gamma^{*}_{r}=\begin{cases}\{s^{*}(r)\},&r\in R_{\Gamma}\text{ and }s^{*}(r)\in\{\min\Gamma_{r},\max\Gamma_{r}\},\\ \Gamma_{r},&\text{otherwise},\end{cases}\quad\text{for all $r\in[\underline{r},\overline{r}]$}.\]

Part 2 of their Theorem 7 shows that, if $\Gamma^{*}_{r}=\{s^{*}(r)\}$, then

\[g(r)+\lambda(r)\frac{\partial v(s^{*}(r),r)}{\partial r}=0,\tag{9}\]

and if $\min\Gamma^{*}_{r}<s^{*}(r)<\max\Gamma^{*}_{r}$, then $\lambda$ has a derivative $\lambda'(r)$ at $r$ satisfying, for all $s\in\Gamma^{*}_{r}$,

\[g(r)+\lambda(r)\frac{\partial v(s,r)}{\partial r}+\lambda'(r)\left(v(s,r)-\tfrac{1}{2}\right)=0.\tag{10}\]

Now, consider any optimal $\mathcal{H}$. By (8), we have $\operatorname{supp}(P)\subset\Gamma_{r^{*}(P)}$ for all $P\in\operatorname{supp}(\mathcal{H})$. By the definition of $r^{*}(P)$, we have $\int v(s,r^{*}(P))dP(s)=1/2$, so either $\operatorname{supp}(P)=\{s^{*}(r^{*}(P))\}$ or $\min\operatorname{supp}(P)<s^{*}(r^{*}(P))<\max\operatorname{supp}(P)$. In both cases, we have $\operatorname{supp}(P)\subset\Gamma^{*}_{r^{*}(P)}$, by (7) and the definition of $\Gamma^{*}_{r^{*}(P)}$. Thus, part 1 of the lemma follows from the definition of $\Gamma$. In turn, part 2 follows from (9) when $P$ is degenerate and from integrating (10) over $P$ when $P$ is non-degenerate. Finally, part 3 follows from (10). ∎

Proof of Proposition 5.

Part 1 follows from (1) and $v(s,r)=Q(s-r)$. For part 2, notice that (1) is equivalent to

\[\frac{\partial^{3}v(s,r)}{\partial s^{2}\partial r}\frac{\partial v(s,r)}{\partial s}>\frac{\partial^{2}v(s,r)}{\partial s\partial r}\frac{\partial^{2}v(s,r)}{\partial s^{2}}\quad\text{for all $s$, $r$}.\]

Thus, letting subscripts denote partial derivatives, $v_{sr}(s,r)=0$ implies $v_{ssr}(s,r)>0$, so $v_{sr}(s,r)=0$ implies $v_{sr}(s',r)>0$ for all $s'>s$, showing that $v_{sr}(s,r)$ satisfies strict single crossing in $s$, and hence $v_{r}(s,r)$ is strictly quasi-convex in $s$. ∎

Proof of Proposition 6.

The proposition follows from Theorem 3 in Kolotilin, Corrao, and Wolitzky (2023) for the state-independent sender case, where $V(a,\theta)=V(a)$. We illustrate the proof in the case where $\operatorname{supp}(F)$ and $\operatorname{supp}(\mathcal{H})$ are finite. The general proof has the same logic but involves additional technicalities, which can be handled using Lemma 1. The proof rests on two lemmas.

Lemma 2.

For any optimal $\mathcal{H}$ (with finite support), there do not exist $P,P'\in\operatorname{supp}(\mathcal{H})$ such that $P$ contains types $s<s''$, $P'$ contains a type $s'\in(s,s'')$, and $r^{*}(P)<r^{*}(P')$.

Proof.

Suppose for contradiction that such districts $P$ and $P'$ exist, and denote $r^{*}(P)=r$ and $r^{*}(P')=r'$, with $r<r'$. Consider a perturbation that shifts mass $\rho=(v(s'',r)-v(s',r))\varepsilon$ of type-$s$ voters and mass $\rho''=(v(s',r)-v(s,r))\varepsilon$ of type-$s''$ voters from $P$ to $P'$, and shifts an equal mass $\rho'=\rho+\rho''=(v(s'',r)-v(s,r))\varepsilon$ of type-$s'$ voters from $P'$ to $P$, for a sufficiently small $\varepsilon>0$. Since $v(s,r)$ is strictly increasing in $s$, these masses are strictly positive and thus this perturbation is well-defined. Since the perturbation does not change the mass of voters in $P$ and $P'$, to show that it strictly increases the designer’s expected seat share, it suffices to show that $r^{*}(P)$ does not change and $r^{*}(P')$ strictly increases. First, $r^{*}(P)$ does not change because $\int v(s,r)dP(s)$ does not change, as

\[-v(s,r)\rho+v(s',r)\rho'-v(s'',r)\rho''=0.\]

Second, $r^{*}(P')$ strictly increases because $\int v(s,r')dP'(s)$ strictly increases, as

\[\begin{aligned}
&v(s,r')\rho-v(s',r')\rho'+v(s'',r')\rho''\\
&=\left[(v(s'',r')-v(s',r'))(v(s',r)-v(s,r))-(v(s'',r)-v(s',r))(v(s',r')-v(s,r'))\right]\varepsilon\\
&=\left[\int_{s'}^{s''}\int_{s}^{s'}\frac{\partial v(\tilde{s}',r')}{\partial s}\frac{\partial v(\tilde{s},r)}{\partial s}d\tilde{s}d\tilde{s}'-\int_{s'}^{s''}\int_{s}^{s'}\frac{\partial v(\tilde{s}',r)}{\partial s}\frac{\partial v(\tilde{s},r')}{\partial s}d\tilde{s}d\tilde{s}'\right]\varepsilon\\
&=\left[\int_{s'}^{s''}\int_{s}^{s'}\left(\frac{\partial v(\tilde{s}',r')}{\partial s}\frac{\partial v(\tilde{s},r)}{\partial s}-\frac{\partial v(\tilde{s}',r)}{\partial s}\frac{\partial v(\tilde{s},r')}{\partial s}\right)d\tilde{s}d\tilde{s}'\right]\varepsilon>0,
\end{aligned}\]

where the inequality holds because the integrand is strictly positive for $r<r'$ and $\tilde{s}<\tilde{s}'$ by Assumption 1. ∎
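The perturbation in this proof can be checked numerically. The sketch below is illustrative only: it assumes the additive case $v(s,r)=Q(s-r)$ with a standard normal taste shock (one strictly log-concave choice), and the function name `lemma2_perturbation` and the sample numbers are hypothetical, not taken from the paper.

```python
import math


def Q(x):
    """Standard normal CDF; an assumed taste-shock distribution for this sketch."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def v(s, r):
    """Vote share of type s at aggregate shock r in the additive case."""
    return Q(s - r)


def lemma2_perturbation(s_lo, s_mid, s_hi, r, r_prime, eps=1e-3):
    """Changes in the vote integrals of P (at r) and P' (at r') after the swap."""
    rho = (v(s_hi, r) - v(s_mid, r)) * eps      # type s_lo moved from P to P'
    rho_hi = (v(s_mid, r) - v(s_lo, r)) * eps   # type s_hi moved from P to P'
    rho_mid = rho + rho_hi                      # type s_mid moved from P' to P
    change_P = -v(s_lo, r) * rho + v(s_mid, r) * rho_mid - v(s_hi, r) * rho_hi
    change_P_prime = (v(s_lo, r_prime) * rho
                      - v(s_mid, r_prime) * rho_mid
                      + v(s_hi, r_prime) * rho_hi)
    return change_P, change_P_prime


if __name__ == "__main__":
    d_P, d_P_prime = lemma2_perturbation(-1.0, 0.0, 1.0, r=-0.2, r_prime=0.3)
    print("change for P :", d_P)        # zero up to rounding error
    print("change for P':", d_P_prime)  # strictly positive
```

Under these assumptions the first change vanishes by construction of the masses, while the second is positive because the normal density is strictly log-concave.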

Lemma 3.

For any optimal $\mathcal{H}$ (with finite support) and any $P\in\operatorname{supp}(\mathcal{H})$, we have $|\operatorname{supp}(P)|\leq2$.

Proof.

Suppose for contradiction that there exists a district $P\in\operatorname{supp}(\mathcal{H})$ that contains three types $s<s'<s''$. Denote $r^{*}(P)=r$. Suppose we split district $P$ into two identical equal-sized districts $P'$ and $P''$. Then consider a perturbation that shifts mass $\rho=(v(s'',r)-v(s',r))\varepsilon$ of type-$s$ voters and mass $\rho''=(v(s',r)-v(s,r))\varepsilon$ of type-$s''$ voters from $P'$ to $P''$, and shifts an equal mass $\rho'=\rho+\rho''=(v(s'',r)-v(s,r))\varepsilon$ of type-$s'$ voters from $P''$ to $P'$, for a sufficiently small $\varepsilon>0$. Notice that $r^{*}(P'')=r^{*}(P')=r$, because

\[v(s,r)\rho-v(s',r)\rho'+v(s'',r)\rho''=0.\]

Now consider an additional perturbation that moves an infinitesimal mass $d\rho$ of type-$s$ voters from $P''$ to $P'$ and moves the same mass $d\rho$ of type-$s''$ voters from $P'$ to $P''$. By the implicit function theorem, $r^{*}(P'')=r+dr''+o(dr'')$ and $r^{*}(P')=r-dr'+o(dr')$, where

\[dr''=\frac{v(s'',r)-v(s,r)}{-\int\frac{\partial v(\tilde{s},r)}{\partial r}dP''(\tilde{s})}d\rho\quad\text{and}\quad dr'=\frac{v(s'',r)-v(s,r)}{-\int\frac{\partial v(\tilde{s},r)}{\partial r}dP'(\tilde{s})}d\rho.\]

To show that this perturbation strictly increases the designer’s expected seat share, it suffices to show that $dr''>dr'$, or equivalently $-\int\frac{\partial v(\tilde{s},r)}{\partial r}dP''(\tilde{s})<-\int\frac{\partial v(\tilde{s},r)}{\partial r}dP'(\tilde{s})$. This holds because

\[\begin{aligned}
&-\frac{\partial v(s,r)}{\partial r}\rho+\frac{\partial v(s',r)}{\partial r}\rho'-\frac{\partial v(s'',r)}{\partial r}\rho''\\
&=\left[-\tfrac{\partial v(s,r)}{\partial r}(v(s'',r)-v(s',r))+\tfrac{\partial v(s',r)}{\partial r}(v(s'',r)-v(s,r))-\tfrac{\partial v(s'',r)}{\partial r}(v(s',r)-v(s,r))\right]\varepsilon\\
&=\left[\left(\tfrac{\partial v(s',r)}{\partial r}-\tfrac{\partial v(s,r)}{\partial r}\right)(v(s'',r)-v(s',r))-\left(\tfrac{\partial v(s'',r)}{\partial r}-\tfrac{\partial v(s',r)}{\partial r}\right)(v(s',r)-v(s,r))\right]\varepsilon\\
&=\left[\int_{s}^{s'}\frac{\partial^{2}v(\tilde{s},r)}{\partial s\partial r}d\tilde{s}\int_{s'}^{s''}\frac{\partial v(\tilde{s}',r)}{\partial s}d\tilde{s}'-\int_{s'}^{s''}\frac{\partial^{2}v(\tilde{s}',r)}{\partial s\partial r}d\tilde{s}'\int_{s}^{s'}\frac{\partial v(\tilde{s},r)}{\partial s}d\tilde{s}\right]\varepsilon\\
&<\frac{\frac{\partial^{2}v(s',r)}{\partial s\partial r}}{\frac{\partial v(s',r)}{\partial s}}\left[\int_{s}^{s'}\frac{\partial v(\tilde{s},r)}{\partial s}d\tilde{s}\int_{s'}^{s''}\frac{\partial v(\tilde{s}',r)}{\partial s}d\tilde{s}'-\int_{s'}^{s''}\frac{\partial v(\tilde{s}',r)}{\partial s}d\tilde{s}'\int_{s}^{s'}\frac{\partial v(\tilde{s},r)}{\partial s}d\tilde{s}\right]\varepsilon=0,
\end{aligned}\]

where the inequality follows from Assumption 1, which implies that $\partial\ln(\partial v(s,r)/\partial s)/\partial r$ is strictly increasing in $s$, and thus

\[\frac{\frac{\partial^{2}v(\tilde{s},r)}{\partial s\partial r}}{\frac{\partial v(\tilde{s},r)}{\partial s}}<\frac{\frac{\partial^{2}v(s',r)}{\partial s\partial r}}{\frac{\partial v(s',r)}{\partial s}}<\frac{\frac{\partial^{2}v(\tilde{s}',r)}{\partial s\partial r}}{\frac{\partial v(\tilde{s}',r)}{\partial s}}\quad\text{for $\tilde{s}<s'<\tilde{s}'$}.\] ∎
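The sign claimed in the last display of this proof can be spot-checked on a small grid. This is a minimal sketch, assuming $v(s,r)=Q(s-r)$ with a standard normal taste shock, so that $\partial v/\partial r=-q(s-r)$. The helper names and grid values are hypothetical, and a grid check illustrates the inequality but does not replace the proof.

```python
import itertools
import math


def Q(x):
    """Standard normal CDF (assumed taste-shock distribution)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def q(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def lemma3_expression(s_lo, s_mid, s_hi, r, eps=1.0):
    """-v_r(s)rho + v_r(s')rho' - v_r(s'')rho'' with v_r(s, r) = -q(s - r)."""
    rho = (Q(s_hi - r) - Q(s_mid - r)) * eps
    rho_hi = (Q(s_mid - r) - Q(s_lo - r)) * eps
    rho_mid = rho + rho_hi
    v_r = lambda s: -q(s - r)
    return -v_r(s_lo) * rho + v_r(s_mid) * rho_mid - v_r(s_hi) * rho_hi


def max_over_grid():
    """Largest value of the expression over ordered type triples and shocks."""
    types = [-2.0, -1.0, 0.0, 1.0, 2.0]
    shocks = [-1.0, 0.0, 1.0]
    return max(
        lemma3_expression(a, b, c, r)
        for a, b, c in itertools.combinations(types, 3)  # a < b < c
        for r in shocks
    )


if __name__ == "__main__":
    print("largest value on the grid:", max_over_grid())  # negative
```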

By Lemmas 2 and 3, to show that every optimal districting plan $\mathcal{H}$ (with finite support) is single-dipped, it suffices to show that for any district $P\in\operatorname{supp}(\mathcal{H})$ consisting of voter types $s<s''$ and any district $P'\in\operatorname{supp}(\mathcal{H})$ containing a voter type $s'\in(s,s'')$, we have $r^{*}(P)\neq r^{*}(P')$. But this follows because, if $r^{*}(P)=r^{*}(P')$, then merging districts $P$ and $P'$ into one district would also be optimal, but the merged district would contain three voter types, contradicting Lemma 3. ∎

Proof of Proposition 7.

Let $\mathcal{H}$ be a pack-and-pair districting plan. Since $\mathcal{H}$ is strictly single-dipped, the support of each $P\in\operatorname{supp}(\mathcal{H})$ has at most two elements and thus can be represented as $\{s_{1}(r^{*}(P)),s_{2}(r^{*}(P))\}$ with $s_{1}(r^{*}(P))\leq r^{*}(P)\leq s_{2}(r^{*}(P))$. Moreover, for each $P,P'\in\operatorname{supp}(\mathcal{H})$ with $r^{*}(P)<r^{*}(P')$, we have $s_{2}(r^{*}(P))\leq s_{2}(r^{*}(P'))$, as otherwise we would have $s_{2}(r^{*}(P'))\in(s_{1}(r^{*}(P)),s_{2}(r^{*}(P)))$, contradicting strict single-dippedness of $\mathcal{H}$.

Assume that there exists $P$ such that $s_{1}(r^{*}(P))<s_{2}(r^{*}(P))$, as otherwise the proposition obviously holds with $r^{b}=\overline{s}$. Define $r^{b}=\inf\{r^{*}(\tilde{P}):\tilde{P}\in\operatorname{supp}(\mathcal{H}),\ s_{1}(r^{*}(\tilde{P}))<s_{2}(r^{*}(\tilde{P}))\}$, so that, for each $P\in\operatorname{supp}(\mathcal{H})$ with $r^{*}(P)<r^{b}$, we have $\operatorname{supp}(P)=\{r^{*}(P)\}$. Since $\operatorname{supp}(\mathcal{H})$ is compact, there exists $P^{b}\in\operatorname{supp}(\mathcal{H})$ with $r^{*}(P^{b})=r^{b}$. It follows that $\operatorname{supp}(P^{b})=\{r^{b}\}$, as otherwise (i.e., if $s_{1}(r^{*}(P^{b}))<r^{b}<s_{2}(r^{*}(P^{b}))$) voter types in $(r^{b},s_{2}(r^{*}(P^{b})))$ (which have strictly positive mass since $f$ is strictly positive on $[\underline{s},\overline{s}]$) cannot be segregated, as this would contradict strict single-dippedness of $\mathcal{H}$, and also cannot be paired with other types, as this would contradict either strict single-dippedness of $\mathcal{H}$ or the definition of $r^{b}$.

Finally, we show that, for each $P,P'\in\operatorname{supp}(\mathcal{H})$ with $r^{b}<r^{*}(P)<r^{*}(P')$, we have $s_{1}(r^{*}(P))\geq s_{1}(r^{*}(P'))$. Suppose for contradiction that $s_{1}(r^{*}(P))<s_{1}(r^{*}(P'))$. Since $\mathcal{H}$ is a strictly single-dipped pack-and-pair districting plan, by the definition of $r^{b}$, we have $s_{1}(r^{*}(P))<r^{*}(P)<s_{2}(r^{*}(P))\leq s_{1}(r^{*}(P'))<r^{*}(P')<s_{2}(r^{*}(P'))$. Define $r^{\dagger}=\inf\{r^{*}(\tilde{P}):\tilde{P}\in\operatorname{supp}(\mathcal{H}),\ s_{1}(r^{*}(P'))\leq s_{1}(r^{*}(\tilde{P}))<s_{2}(r^{*}(\tilde{P}))\leq s_{2}(r^{*}(P'))\}\geq s_{1}(r^{*}(P'))$. By the same argument as in the previous paragraph, we have $\delta_{r^{\dagger}}\in\operatorname{supp}(\mathcal{H})$, contradicting that $\mathcal{H}$ is pack-and-pair. ∎

The next lemma restates some results from Kolotilin, Corrao, and Wolitzky (2023), which we use to prove Propositions 8 and 9.

Lemma 4.

Consider the additive taste shock case where the taste shock density is strictly log-concave and symmetric about 0.

  1. If for all $s<r<s'$, we have

     \[G(r)<\frac{Q(s'-r)-\frac{1}{2}}{Q(s'-r)-Q(s-r)}G(s)+\frac{\frac{1}{2}-Q(s-r)}{Q(s'-r)-Q(s-r)}G(s'),\]

     then the unique optimal plan is segregation.

  2. If for all $s<s'$ there exists $r\in(s,s')$ such that

     \[G(r)>\frac{Q(s'-r)-\frac{1}{2}}{Q(s'-r)-Q(s-r)}G(s)+\frac{\frac{1}{2}-Q(s-r)}{Q(s'-r)-Q(s-r)}G(s'),\]

     then the unique optimal plan is negative assortative.

Proof.

By the definition of $r^{*}(P)$, we have

\[r^{*}(\rho\delta_{s}+(1-\rho)\delta_{s'})=r\in(s,s')\iff\rho=\frac{Q(s'-r)-\frac{1}{2}}{Q(s'-r)-Q(s-r)}\in(0,1).\]

Thus, part 1 says that, for any $s<s'$, the designer prefers to separate any district $P=\rho\delta_{s}+(1-\rho)\delta_{s'}$ into districts $\delta_{s}$ and $\delta_{s'}$, and part 2 says that, for any $s<s'$, the designer prefers to pool districts $\delta_{s}$ and $\delta_{s'}$ into some district $P=\rho\delta_{s}+(1-\rho)\delta_{s'}$. Consequently, parts 1 and 2 follow from Theorems 4 and 6 in Kolotilin, Corrao, and Wolitzky (2023). ∎
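The weight formula for a two-type district can be illustrated as follows. The sketch assumes a standard normal taste shock (symmetric and strictly log-concave); `pairing_weight` and `threshold` are hypothetical helper names, and the threshold $r^{*}$ is recovered by plain bisection rather than by any method from the paper.

```python
import math


def Q(x):
    """Standard normal CDF (assumed taste-shock distribution)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def pairing_weight(s_lo, s_hi, r):
    """Weight rho on type s_lo that makes r the threshold of the paired district."""
    return (Q(s_hi - r) - 0.5) / (Q(s_hi - r) - Q(s_lo - r))


def threshold(rho, s_lo, s_hi, iterations=200):
    """Solve rho*Q(s_lo - r) + (1 - rho)*Q(s_hi - r) = 1/2 for r by bisection."""
    def excess(r):
        return rho * Q(s_lo - r) + (1.0 - rho) * Q(s_hi - r) - 0.5

    lo, hi = s_lo, s_hi  # excess(lo) >= 0 >= excess(hi); excess is decreasing in r
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    rho = pairing_weight(-1.0, 2.0, 0.5)
    print("rho =", rho)
    print("recovered threshold =", threshold(rho, -1.0, 2.0))  # close to 0.5
```

Bisection is used here only because the vote share is monotone in $r$; any root finder would do.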

Proof of Proposition 8.

For part 1, by Lemma 4, negative assortative districting is uniquely optimal if for all $s<s'$ there exists $r\in(s,s')$ such that

\[(G(r)-G(s))\left(Q(s'-r)-\tfrac{1}{2}\right)>(G(s')-G(r))\left(\tfrac{1}{2}-Q(s-r)\right),\]

and thus, considering $r\uparrow s'$, if for all $s<s'$, we have

\[(G(s')-G(s))q(0)>g(s')\left(\tfrac{1}{2}-Q(s-s')\right),\]

which holds if $G$ is concave, as shown in the main text.

For part 2, it suffices to show that there exists $c>0$ such that, for all $s\neq r$, we have

\[\frac{G(s)-G(r)}{g(r)}>\frac{Q(s-r)-\frac{1}{2}}{q(0)}.\]

Indeed, this inequality implies that for all $s<r<s'$, we have

\[\frac{G(r)-G(s)}{\frac{1}{2}-Q(s-r)}<\frac{g(r)}{q(0)}<\frac{G(s')-G(r)}{Q(s'-r)-\frac{1}{2}},\tag{11}\]

and hence segregation is uniquely optimal by Lemma 4.

Now, since $g'(r)/g(r)\geq c$ for all $r$, Gronwall’s inequality gives $g(s)/g(r)\geq e^{c(s-r)}$ for all $s>r$ and $g(s)/g(r)\leq e^{c(s-r)}$ for all $s<r$. Hence, for all $s,r$, we have

\[\frac{G(s)-G(r)}{g(r)}=\int_{r}^{s}\frac{g(x)}{g(r)}dx\geq\int_{r}^{s}e^{c(x-r)}dx=\frac{e^{c(s-r)}-1}{c}.\]

Thus, it suffices to show that there exists $c>0$ such that, for all $s\neq r$, we have

\[\frac{e^{c(s-r)}-1}{c}>\frac{Q(s-r)-\frac{1}{2}}{q(0)}.\]

Note that both sides have the same values and the same derivatives at $s=r$. Moreover, at $s=r$, the second derivative of the left-hand side, $c>0$, is greater than the second derivative of the right-hand side, $q'(0)/q(0)=0$. Thus, the inequality holds in some neighborhood $s\in(r-\varepsilon,r)$. Setting $c=q(0)/(1/2-Q(-\varepsilon))>0$ guarantees that the inequality holds for all $s\neq r$. Indeed, for $s\leq r-\varepsilon$, we have

\[\frac{e^{c(s-r)}-1}{c}>-\frac{1}{c}=\frac{Q(-\varepsilon)-\frac{1}{2}}{q(0)}\geq\frac{Q(s-r)-\frac{1}{2}}{q(0)},\]

where the first inequality holds by $e^{c(s-r)}>0$ and the second holds by monotonicity of $Q$. For $s>r$, we have

\[\frac{e^{c(s-r)}-1}{c}>s-r>\frac{Q(s-r)-\frac{1}{2}}{q(0)},\]

where the first inequality holds by strict convexity of $e^{cx}$ in $x$ and the second holds by strict concavity of $Q$ on $[0,+\infty)$. ∎
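The key inequality $(e^{cx}-1)/c>(Q(x)-\tfrac{1}{2})/q(0)$ for $x=s-r\neq0$ can be explored on a grid. The sketch below assumes a standard normal taste shock. The names `gap`, `holds_on_grid`, and `proof_constant`, and the grid bounds, are hypothetical, and a finite grid only illustrates the claim.

```python
import math

Q0 = 1.0 / math.sqrt(2.0 * math.pi)  # q(0) for the standard normal density


def normal_cdf_minus_half(x):
    """Q(x) - 1/2 for the standard normal CDF (assumed taste-shock distribution)."""
    return 0.5 * math.erf(x / math.sqrt(2.0))


def gap(x, c):
    """Left-hand side minus right-hand side of the inequality at x = s - r."""
    return math.expm1(c * x) / c - normal_cdf_minus_half(x) / Q0


def holds_on_grid(c, lo=-10.0, hi=10.0, n=2001):
    """True if the inequality is strict at every nonzero grid point."""
    points = (lo + (hi - lo) * i / (n - 1) for i in range(n))
    return all(gap(x, c) > 0.0 for x in points if x != 0.0)


def proof_constant(eps):
    """The constant c = q(0) / (1/2 - Q(-eps)) used in the proof."""
    return Q0 / (-normal_cdf_minus_half(-eps))


if __name__ == "__main__":
    for c in (0.5, proof_constant(1.0), 2.0):
        print(f"c = {c:.4f}: holds on grid = {holds_on_grid(c)}")
```

With these assumptions the check fails for a small constant such as $c=0.5$, because the left-hand side tends to $-1/c$ as $x\to-\infty$ while the right-hand side tends to $-1/(2q(0))$, and passes for the larger constants shown.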

Proof of Proposition 9.

Since density $q$ is symmetric about 0 and density $f$ is strictly positive on $[\underline{s},\overline{s}]$, we have $\underline{s}<r^{*}(F)<\overline{s}$. Since $G$ is strictly S-shaped with inflection point $r^{*}(F)$, it follows that $G$ is concave on $[r^{*}(F),\overline{s}]$. Thus, by Proposition 8, negative assortative districting is uniquely optimal for types in $[r^{*}(F),\overline{s}]$, showing that segregation cannot be optimal.

Suppose for contradiction that negative assortative districting $\mathcal{H}$ is optimal. By Proposition 7, for each $P\in\operatorname{supp}(\mathcal{H})$ except for $\delta_{r^{b}}$, we have $s_{1}(r^{*}(P))<r^{*}(P)<s_{2}(r^{*}(P))$, where $s_{1}$ is decreasing and $s_{2}$ is increasing. Note that $r^{b}<r^{*}(F)$, because

\[\begin{aligned}
\int Q(s-r^{*}(F))dF(s)=\tfrac{1}{2}&=\iint Q(s-r^{*}(P))dP(s)d\mathcal{H}(P)\\
&<\iint Q(s-r^{b})dP(s)d\mathcal{H}(P)=\int Q(s-r^{b})dF(s),
\end{aligned}\]

where the first two equalities hold by the definition of $r^{*}(F)$ and $r^{*}(P)$, the inequality holds by $r^{*}(P)>r^{b}$ for all $P\in\operatorname{supp}(\mathcal{H})$ except for $\delta_{r^{b}}$, and the last equality holds by $\int Pd\mathcal{H}(P)=F$. Since density $f$ is strictly positive on $[\underline{s},\overline{s}]$, by the same argument as in the proof of Proposition 7, we get $\lim_{r\downarrow r^{b}}s_{1}(r)=\lim_{r\downarrow r^{b}}s_{2}(r)=r^{b}$. Thus, for any $\varepsilon>0$, there exists $P\in\operatorname{supp}(\mathcal{H})$ such that $r^{b}-\varepsilon<s_{1}(r^{*}(P))<s_{2}(r^{*}(P))<r^{b}+\varepsilon$, and all types in $[s_{1}(r^{*}(P)),s_{2}(r^{*}(P))]$ are matched between themselves in a negatively assortative manner. For small enough $\varepsilon>0$ and all $s<r<s'$ in $[s_{1}(r^{*}(P)),s_{2}(r^{*}(P))]$, we have

\[\frac{G(s)-G(r)}{g(r)}>\frac{Q(s-r)-\frac{1}{2}}{q(0)},\]

where the inequality holds because both sides have the same values and the same derivatives at $s=r$, while the second derivative of the left-hand side, $g'(r)/g(r)>0$ (recall that $r^{b}$ is less than the inflection point $r^{*}(F)$ of the strictly S-shaped $G$), is greater than the second derivative of the right-hand side, $q'(0)/q(0)=0$. As follows from (11) in the proof of Proposition 8, segregation is uniquely optimal for types in $[s_{1}(r^{*}(P)),s_{2}(r^{*}(P))]$, showing that $\mathcal{H}$ cannot be optimal. ∎

Proof of Proposition 10.

Suppose for contradiction that there exists an optimal non-pack-and-crack plan $\mathcal{H}$. By Proposition 6, $\mathcal{H}$ is strictly single-dipped. Consequently, since $\mathcal{H}$ is not pack-and-crack, there exist $s<r<s'\leq s''$ and $P,P'\in\operatorname{supp}(\mathcal{H})$ such that $r^{*}(P)=r$, $\operatorname{supp}(P)=\{s,s'\}$, and $\operatorname{supp}(P')=\{s''\}$. By Lemma 1, condition (3) holds. Intuitively, (3) says that the designer prefers not to move a few type-$s$ voters from district $P$ to districts $\delta_{s}$ and $\delta_{s''}$.

We have numerically verified that (3) holds over the specified range of parameters. The code is available on request. ∎

Proof of Proposition 11.

By Lemma 1, $\lambda$ has a derivative $\lambda'(r)$ at each $r\in(r^{b},r^{b}+\varepsilon]$ satisfying

\[\begin{aligned}
g(r)-\lambda(r)q(s_{2}(r)-r)+\lambda'(r)\left(Q(s_{2}(r)-r)-\tfrac{1}{2}\right)&=0,\\
g(r)-\lambda(r)q(s_{1}(r)-r)+\lambda'(r)\left(Q(s_{1}(r)-r)-\tfrac{1}{2}\right)&=0.
\end{aligned}\]

Solving for $\lambda(r)$ and $\lambda'(r)$ yields, for all $r\in(r^{b},r^{b}+\varepsilon]$,

\[\begin{aligned}
\lambda(r)&=\frac{g(r)[Q(s_{2}(r)-r)-Q(s_{1}(r)-r)]}{\left(Q(s_{2}(r)-r)-\frac{1}{2}\right)q(s_{1}(r)-r)-\left(Q(s_{1}(r)-r)-\frac{1}{2}\right)q(s_{2}(r)-r)},\\
\lambda'(r)&=\frac{g(r)[q(s_{2}(r)-r)-q(s_{1}(r)-r)]}{\left(Q(s_{2}(r)-r)-\frac{1}{2}\right)q(s_{1}(r)-r)-\left(Q(s_{1}(r)-r)-\frac{1}{2}\right)q(s_{2}(r)-r)}.
\end{aligned}\]
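As a sanity check on the algebra, the closed forms for $\lambda(r)$ and $\lambda'(r)$ can be substituted back into the two first-order conditions. The sketch assumes a standard normal $Q$ and $G(r)=Q(\gamma r)$, so that $g(r)=\gamma q(\gamma r)$. The function names and the sample point $(r,s_{1},s_{2},\gamma)$ are arbitrary illustrative choices, not values from the paper.

```python
import math


def Q(x):
    """Standard normal CDF (assumed taste-shock distribution)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def q(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def multipliers(r, s1, s2, gamma):
    """Closed-form lambda(r) and lambda'(r) from the two first-order conditions."""
    g = gamma * q(gamma * r)  # g = G' with G(r) = Q(gamma * r)
    a1, a2 = Q(s1 - r) - 0.5, Q(s2 - r) - 0.5
    q1, q2 = q(s1 - r), q(s2 - r)
    denom = a2 * q1 - a1 * q2
    return g * (a2 - a1) / denom, g * (q2 - q1) / denom


def foc_residuals(r, s1, s2, gamma):
    """Residuals of the first-order conditions at s2 and s1 (both should vanish)."""
    lam, lam_prime = multipliers(r, s1, s2, gamma)
    g = gamma * q(gamma * r)
    res2 = g - lam * q(s2 - r) + lam_prime * (Q(s2 - r) - 0.5)
    res1 = g - lam * q(s1 - r) + lam_prime * (Q(s1 - r) - 0.5)
    return res1, res2


if __name__ == "__main__":
    print(multipliers(0.3, -0.5, 1.0, 1.2))
    print(foc_residuals(0.3, -0.5, 1.0, 1.2))  # both zero up to rounding error
```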

Since $\lambda'$ is the derivative of $\lambda$, we have $d\lambda(r)/dr=\lambda'(r)$ for all $r\in(r^{b},r^{b}+\varepsilon]$. Taking into account that $s_{1}$ and $s_{2}$ are twice differentiable and satisfy $\lim_{r\downarrow r^{b}}s_{1}(r)=\lim_{r\downarrow r^{b}}s_{2}(r)=r^{b}$, we can apply L’Hôpital’s rule to evaluate $d\lambda(r)/dr=\lambda'(r)$ in the limit $r\downarrow r^{b}$ to obtain

\[\frac{g'(r^{b})q(0)}{(q(0))^{2}}=\frac{g(r^{b})q'(0)}{(q(0))^{2}},\]

which implies that $r^{b}=0$, because $G(r)=Q(\gamma r)$ for all $r$ and $q'(r)=0$ iff $r=0$. Denote $\lim_{r\downarrow r^{b}}s_{1}'(r)=1-\beta_{1}$ and $\lim_{r\downarrow r^{b}}s_{2}'(r)=1+\beta_{2}$, where $\beta_{1}\geq1$ (because $s_{1}$ is decreasing) and $\beta_{2}\geq0$ (because $s_{2}(r)>r$). Differentiating $d\lambda(r)/dr=\lambda'(r)$ with respect to $r$ and taking the limit $r\downarrow0$, we get

\[\frac{\gamma q''(0)(\gamma^{2}-\beta_{2}\beta_{1})}{q(0)}=\frac{\gamma q''(0)(\beta_{2}-\beta_{1})}{2q(0)},\]

and hence

\[2\gamma^{2}=2\beta_{2}\beta_{1}+\beta_{2}-\beta_{1}.\tag{12}\]

Since, for small enough $r>0$, type $s_{1}(r)$ is assigned to both district $\delta_{s_{1}(r)}$ and district $P$ with $r^{*}(P)=r$ and $\operatorname{supp}(P)=\{s_{1}(r),s_{2}(r)\}$, we must have, by Lemma 1,

\[Q(\gamma s_{1}(r))=Q(\gamma r)+\lambda(r)\left(Q(s_{1}(r)-r)-\tfrac{1}{2}\right).\]

In the limit $r\downarrow0$, the values and the derivatives up to order 2 of both sides always coincide, while the third derivatives coincide iff

\[q''(0)\gamma^{3}(-\beta_{1}+1)^{3}=q''(0)\gamma^{3}-3q''(0)\gamma^{3}\beta_{1}+3q''(0)\gamma\beta_{2}\beta_{1}^{2}-q''(0)\gamma\beta_{1}^{3},\]

which simplifies to

\[-\gamma^{2}\beta_{1}+3\gamma^{2}=3\beta_{2}-\beta_{1}.\tag{13}\]

Since, for small enough $r>0$, type $s_{1}(r)$ is assigned to both district $\delta_{s_{1}(r)}$ and district $P$ with $r^{*}(P)=r$, while type $s_{2}(r)$ is assigned only to district $P$, we have

\[f(s_{1}(r))s_{1}'(r)\left(Q(s_{1}(r)-r)-\tfrac{1}{2}\right)\geq f(s_{2}(r))s_{2}'(r)\left(Q(s_{2}(r)-r)-\tfrac{1}{2}\right).\]

In the limit $r\downarrow0$, both sides are equal, and hence their derivatives must satisfy

\[-f(0)q(0)\beta_{1}(1-\beta_{1})\geq f(0)q(0)\beta_{2}(\beta_{2}+1),\]

which, given that $\beta_{1}+\beta_{2}>0$, simplifies to

\[\beta_{1}\geq\beta_{2}+1.\tag{14}\]

Equations (12) and (13) have two solutions $(\beta_{1},\beta_{2})=\left(3\gamma^{2}/(2(\gamma^{2}-1)),\gamma^{2}/2\right)$ and $(\beta_{1},\beta_{2})=\left(1,(2\gamma^{2}+1)/3\right)$, unless $\gamma^{2}=1$, in which case (12) and (13) have only one solution $(\beta_{1},\beta_{2})=(1,1)$. The solution $(\beta_{1},\beta_{2})=\left(1,(2\gamma^{2}+1)/3\right)$ never satisfies (14) and thus is discarded. Moreover, for the solution $(\beta_{1},\beta_{2})=\left(3\gamma^{2}/(2(\gamma^{2}-1)),\gamma^{2}/2\right)$, condition $\beta_{1}\geq1$ yields $\gamma>1$, and condition (14) yields $\gamma\leq\sqrt{1+\sqrt{3}}$. Thus, for Y-districting to be optimal, we must have $\gamma\in(1,\sqrt{1+\sqrt{3}}]$. Finally, the statement in Footnote 30 holds because

\[\lim_{r\downarrow0}s_{1}'(r)=1-\beta_{1}=-\frac{\gamma^{2}+2}{2(\gamma^{2}-1)}<0\quad\text{and}\quad\lim_{r\downarrow0}s_{2}'(r)=1+\beta_{2}=1+\frac{\gamma^{2}}{2}>0\]

are both strictly increasing in $\gamma$. ∎
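The two candidate solutions of (12) and (13), and the resulting bound $\gamma^{2}\leq1+\sqrt{3}$, can be verified with exact rational arithmetic. This is a minimal sketch in which all function names are hypothetical and $\gamma^{2}$ is passed as a `Fraction` so that the equations can be checked exactly.

```python
from fractions import Fraction


def eq12(gamma_sq, b1, b2):
    """Residual of (12): 2*gamma^2 - (2*b2*b1 + b2 - b1)."""
    return 2 * gamma_sq - (2 * b2 * b1 + b2 - b1)


def eq13(gamma_sq, b1, b2):
    """Residual of (13): -gamma^2*b1 + 3*gamma^2 - (3*b2 - b1)."""
    return -gamma_sq * b1 + 3 * gamma_sq - (3 * b2 - b1)


def candidate_solutions(gamma_sq):
    """Solutions (beta_1, beta_2) of (12) and (13) as stated in the proof."""
    if gamma_sq == 1:
        return [(Fraction(1), Fraction(1))]
    first = (3 * gamma_sq / (2 * (gamma_sq - 1)), gamma_sq / 2)
    second = (Fraction(1), (2 * gamma_sq + 1) / 3)
    return [first, second]


def satisfies_14(b1, b2):
    """Condition (14): beta_1 >= beta_2 + 1."""
    return b1 >= b2 + 1


def y_districting_feasible(gamma_sq):
    """Necessary condition from the proof: gamma > 1 and (14) at the first solution."""
    if gamma_sq <= 1:
        return False
    b1, b2 = candidate_solutions(gamma_sq)[0]
    return satisfies_14(b1, b2)


if __name__ == "__main__":
    g2 = Fraction(2)
    for b1, b2 in candidate_solutions(g2):
        print(b1, b2, eq12(g2, b1, b2), eq13(g2, b1, b2), satisfies_14(b1, b2))
    # 1 + sqrt(3) is about 2.732, so 2.7 passes and 2.8 fails.
    print(y_districting_feasible(Fraction(27, 10)), y_districting_feasible(Fraction(28, 10)))
```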