This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Implications of Distance over Redistricting Maps: Central and Outlier Maps111Readers are encouraged to see the arxiv version with more results: https://arxiv.org/abs/2203.00872

Seyed A. Esmaeili 1, Darshan Chakrabarti2, Hayley Grape3, Brian Brubach3
Abstract

In representative democracy, a redistricting map is chosen to partition an electorate into districts which each elects a representative. A valid redistricting map must satisfy a collection of constraints such as being compact, contiguous, and of almost-equal population. However, these constraints are loose enough to enable an enormous ensemble of valid redistricting maps. This enables a partisan legislature to gerrymander by choosing a map which unfairly favors it. In this paper, we introduce an interpretable and tractable distance measure over redistricting maps which does not use election results and study its implications over the ensemble of redistricting maps. Specifically, we define a central map which may be considered ”most typical” and give a rigorous justification for it by showing that it mirrors the Kemeny ranking in a scenario where we have a committee voting over a collection of redistricting maps to be drawn. We include running time and sample complexity analysis for our algorithms, including some negative results which hold using any algorithm. We further study outlier detection based on this distance measure and show that our framework can detect some gerrymandered maps. More precisely, we show some maps that are widely considered to be gerrymandered that lie very far away from our central maps in comparison to a large ensemble of valid redistricting maps. Since our distance measure does not rely on election results, this gives a significant advantage in gerrymandering detection which is lacking in all previous methods.

1 Introduction

Redistricting is the process of dividing an electorate into districts which each elect a representative. In the United States, this process is used for both federal and state-level representation, and we will use the U.S. House of Representatives as a running example. Subject to both state and federal law, the division of states into congressional districts is not arbitrary and must satisfy a collection of properties such as contiguity and near-equal population. Despite these regulations, redistricting is vulnerable to strategic manipulation in the form of gerrymandering. The body in charge can easily create a map within the legal constraints that leads to election results which favor a particular outcome (e.g., more representatives elected from one political party in the case of partisan gerrymandering). In addition, the ability to draw gerrymandered districts has improved greatly with the aid of computers since the historic salamander-shaped district approved by Massachusetts Governor Elbridge Gerry in 1812. For example, assuming voting consistent with the 2016 election, the state of North Carolina with 13 representatives can be redistricted to elect either 3 Democrats and 10 republicans or 10 Democrats and 3 Republicans.

However, despite this obvious threat to functioning democracy, partisan gerrymandering has often eluded regulation partly because it has been difficult to measure. In response, a recent line of impactful research introduced sampling techniques to randomly generate a large collection of redistricting maps222These are not the truly uniform random samples from the immense and ill-defined space of all possible maps that we ideally want, but they are generally treated as such in courts. (Chikina, Frieze, and Pegden 2017; DeFord, Duchin, and Solomon 2019; Herschlag et al. 2020) and calculate statistics such as a histogram of the number of seats won by each party using this collection. This can show that a proposed or enacted map is an outlier in terms of its election outcome with respect to the sample. In fact, these techniques were used as a key argument in a recent U.S. Supreme Court case on partisan gerrymandering (Rucho v. Common Cause No. 18-422, 588 U.S. 2019) and have supported successful efforts to change redistricting maps in state supreme court cases (LWV vs Commonwealth of Pennsylvania No. 159 MM 2018). More importantly for the present work, at least two states, Michigan and Wisconsin, have used such a sampling tool (DeFord, Duchin, and Solomon 2019) in the recent redistricting process in response to the 2020 U.S. Census  (Chen 2021). Recent papers have even extended these methods to locate which regions in a state are most unfairly impacted by a given map (Lin et al. 2022; Ko et al. 2022).

Despite this progress, all of these methods rely on election outcomes to detect any possible gerrymandering. This is a problem in instances where citizens, courts, and/or legislative bodies request methods which do not take partisanship into account. However, there has not been an effective method for detecting gerrymandering by identifying outliers according to a non-partisan metric. Abrishami et al. (2020) make progress toward this goal, but has limitations such a using a small number of samples and not having a clear outlier score (See Section 2 and Appendix C.2 for more details). In this paper, we introduce a framework that resolves these issues and demonstrates the first effective method for detecting gerrymandering based on a distance (dissimilarity) measure over redistricting maps.

Furthermore, while progress has been made on the problem of detecting/labeling gerrymandering through the use of these sampling techniques, the question of drawing a redistricting map in a way that is “fair” and resists strategic manipulation remains largely unclear. We survey some existing proposals to automate redistricting in more detail in Section 2, but none of them have been adopted in practice thus far. Indeed, the issue of finding the “ideal” redistricting map is elusive and one of the main problems in redistricting and gerrymandering. We make progress in this direction, by introducing a novel method which selects the most “central” map. More specifically, given a collection of maps which are voted on by committee members we select the map with the minimum vote-weighted distance from the collection.

1.1 Our Contributions

Our paper introduces a number of contributions:

  1. 1.

    Distance over Redistricting Maps: We introduce a tractable family of distance measures over redistricting maps which have a simple edit distance interpretation. This family of distance measures can be adjusted to accommodate considerations such as a population or path length between voting blocks (Subsection 3.1).

  2. 2.

    Medoid Map: We introduce the medoid map and show that it mirrors the Kemeny ranking in a setting where committee members vote over a collection of maps. We further characterize the complexity of finding this map and introduce heuristics for finding it (Sections 4, 5, 6).

  3. 3.

    Centroid Map: We introduce the centroid map which is not a valid map, but has interesting properties and implications. We provide algorithms for finding this map and characterize its sample complexity (Sections 5,6).

  4. 4.

    Gerrymandering Detection and Empirical Validation: We show empirically that our framework can be used to detect gerrymandering in some instances. Further, we carry out extensive experimental validation of our pipeline where we ensure convergence and reproducibility by repeating the same experiment using different seeds. Remarkably, we reach the same conclusion across all runs: well-known gerrymandered maps of North Carolina and Pennsylvania have a distance in the 99th percentile away from the centroid map in comparison to an ensemble (Section 6 and Appendix B).

Furthermore, in Appendix C we include more details about gerrymandering detection such as the interpretation of having a large distance from the centroid map and a more detailed comparison to the work of (Abrishami et al. 2020). Finally, we note that our framework can adapt to various considerations such the Voting Rights Act (VRA) (Bickel 1966) and other state-specific redistricting rules by simply modifying the sampling method to adapt to these considerations. Due to the space limits, we delay all proofs to Appendix A.

2 Related Work

Less than a decade ago, several early works ushered in the current era of Markov Chain Monte Carlo (MCMC) sampling techniques for gerrymandering detection (Mattingly and Vaughn 2014; Wu et al. 2015; Fifield et al. 2015). Followup work has both refined these techniques and further analyzed their ability to approximate the target distribution. Authors of these works have been involved in court cases in Pennsylvania (Chikina, Frieze, and Pegden 2017) and North Carolina (Herschlag et al. 2020) with sampling approaches being used to demonstrate that existing maps were outliers as evidence of partisan gerrymandering. One of the most recent works in this area introduces the ReCom\operatorname{\textbf{ReCom}} tool (DeFord, Duchin, and Solomon 2019) which was used by the Wisconsin People’s Maps Commission and the Michigan Independent Citizens Redistricting Commission in the current redistricting cycle following the 2020 U.S. census (Chen 2021). More recently, the works of (Lin et al. 2022; Ko et al. 2022) have made an interesting extension of the previous methods by identifying the voting blocks unfairly impacted in a gerrymandered map. Generally, these techniques have primarily been used to analyze and sometimes reject existing maps rather than draw new maps. However, we may view them as narrowing the search space of maps that can be drawn. Furthermore, it has been shown that even the regulation of gerrymandering via outlier detection is subject to strategic manipulation (Brubach, Srinivasan, and Zhao 2020).

On the automated redistricting side, many map drawing algorithms have favored optimization approaches and in particular, optimizing some notion of compactness while avoiding explicit use of partisan information. Approaches emphasizing compactness include balanced power diagrams (Cohen-Addad, Klein, and Young 2018), a kk-median-based objective (Bycoffe et al. 2018), and minimizing the number of cut edges  (Hettle et al. 2021). Some works include partisan information for the sake of creating competitive districts (districts with narrow margins between the two main parties). The PEAR tool (Liu, Cho, and Wang 2016) balances nonpartisan criteria like compactness (defined by the Polsby-Popper score (Polsby and Popper 1991)) with other criteria such as competitiveness and uses an evolutionary algorithm with some similarity to the random walks taken by MCMC sampling approaches. Other works go further in the explicitly partisan direction such as the game theoretic approach of Pegden, Procaccia, and Yu (2017) which seeks a map that is fair to two parties. Finally, there are methods which prefer simplicity such as the Splitline (Ryan and Smith 2022) algorithm which iteratively splits a state until the desired number of districts is reached.

In all of these approaches, the aim is to automate redistricting, but it is difficult to determine whether the choices made are the “right” or “fairest” decisions. The question of whether optimizing properties such as compactness while ignoring partisan factors could result in partisan bias is a concern. Cho (2019) notes a comment by Justice Scalia suggesting that such a process could be biased against Democratic voters clustered in cities in Vieth v. Jubelirer (Vieth v. Jubelirer No. 02-1580, 541 U.S. 267 (2004). For those that do take partisan bias into account, there are questions of whether purposely drawing competitive districts or giving a fair allocation to two parties are really beneficial to voters.

Finally, the work of Abrishami et al. (2020) introduces a distance measure over redistricting maps. However, our distance is easy to compute and does not require solving a linear program. Further, our focus is on the implications of having a distance measure, i.e. the medoid and centroid maps that will be introduced. Moreover, unlike Abrishami et al. (2020) we can detect gerrymandered maps rigorously by specifying where they lie on a distance histogram without using an embedding method and using 200,000200{,}000 samples instead of only 100100. Finally, we give a clear outlier score for a given map (its percentile distance from the central map) whereas (Abrishami et al. 2020) cannot do that. Therefore, it is difficult to see how their methods would be applied in a real practical setting. See Appendix C.2 for a more detailed discussion.

3 Problem Set-Up

Refer to caption
Figure 1: We are given a hypothetical state consisting of 4 vertices V={v1,v2,v3,v4}V=\{v_{1},v_{2},v_{3},v_{4}\} with M1M_{1} and M2M_{2} being two valid redistricting maps. The adjacency matrices A1A_{1},A2A_{2}, and edit distance interpretation of dΘ(A1,A2)\operatorname{\mathnormal{d_{\Theta}}}(A_{1},A_{2}) are demonstrated. Note that dΘ(A1,A2)=θ(1,2)+θ(3,4)+θ(1,3)+θ(2,4)\operatorname{\mathnormal{d_{\Theta}}}(A_{1},A_{2})=\theta(1,2)+\theta(3,4)+\theta(1,3)+\theta(2,4) which is exactly the minimum sum of edge weights that need to be deleted and added to obtain A2A_{2} from A1A_{1}.

A given state is modelled by a graph G=(V,E)G=(V,E) where each vertex vVv\in V represents a voting block (unit). Each unit vv has a weight w(v)>0w(v)>0 which represents its population. Further, u,vV\forall u,v\in V there is an edge e=(u,v)Ee=(u,v)\in E if and only if the two vertices are connected (geographically this means that units uu and vv share a boundary). The number of units is |V|=n|V|=n. A redistricting (redistricting map or simply map) MM is a partition of VV into k>0k>0 many districts, i.e., V=V1V2VkV=V_{1}\cup V_{2}\dots\cup V_{k} where each ViV_{i} represents a district and i[k],|Vi|0\forall i\in[k],|V_{i}|\neq 0 and i,j[k],ViVj=ifij\forall i,j\in[k],V_{i}\cap V_{j}=\emptyset\ \text{if}\ i\neq j. The redistricting map MM is decided by the induced partition, i.e., M={V1,,Vk}M=\{V_{1},\dots,V_{k}\}. For a redistricting MM to be considered valid, it must satisfy a collection of properties, some of which are specific to the given state. We use the most common properties as stated in (DeFord, Duchin, and Solomon 2019; Hettle et al. 2021): (1) Compactness: The given partitioning should have “compact” districts. Although there is no definitive mathematical criterion which decides compactness for districts, some have used common definitions such as Polsby-Popper or Reock Score (Alexeev and Mixon 2018). Others have used a clustering criterion like the kk-median objective (Cohen-Addad, Klein, and Young 2018) or considered the total number of cuts (number of edges between vertices in different districts) (DeFord, Duchin, and Solomon 2019). (2) Equal Population: To satisfy the desideratum of “one person one vote” each district should have approximately the same number of individuals. I.e., a given district ViV_{i} should satisfy vViw(v)[(1ϵ)vVw(v)k,(1+ϵ)vVw(v)k]\sum_{v\in V_{i}}w(v)\in[(1-\epsilon)\frac{\sum_{v\in V}w(v)}{k},(1+\epsilon)\frac{\sum_{v\in V}w(v)}{k}] where ϵ\epsilon is a non-negative parameter relaxing the equal population constraint. (3) Contiguity: Each district (partition) ViV_{i} should be a connected component, i.e., i[k]\forall i\in[k] and u,vVi\forall u,v\in V_{i}, vv should be reachable from uu through vertices which only belong to ViV_{i}. Our proofs do not rely on these properties and therefore can accommodate additional properties if desired.

Let \operatorname{\mathcal{M}} be the set of all valid maps. Let 𝒟()\operatorname{\mathcal{D}}(\operatorname{\mathcal{M}}) be a distribution over these maps. Furthermore, define a distance function over the maps d:×[0,)d:\operatorname{\mathcal{M}}\times\operatorname{\mathcal{M}}\rightarrow[0,\infty). Then the population medoid map is MM^{*} which is a solution to the following:

M=argminM𝔼M𝒟()[d(M,M)]\displaystyle M^{*}=\operatorname*{arg\,min}\limits_{\operatorname{\mathnormal{M}}\in\operatorname{\mathcal{M}}}\ \underset{\operatorname{\mathnormal{M}}^{\prime}\sim\operatorname{\mathcal{D}}(\operatorname{\mathcal{M}})}{\operatorname{\mathbb{E}}}[d(\operatorname{\mathnormal{M}},\operatorname{\mathnormal{M}}^{\prime})] (1)

In words, the population medoid map is a valid map minimizing the expected sum of distances away from all valid maps according to the distribution 𝒟()\operatorname{\mathcal{D}}(\operatorname{\mathcal{M}}). This serves as a natural way to define a central or most typical map with respect to a given distance metric of interest.

Since we clearly operate over a sample (a finite collection) from 𝒟()\operatorname{\mathcal{D}}(\operatorname{\mathcal{M}}); therefore, we assume that the following condition holds:

Condition 3.1.

We can sample maps from the distribution 𝒟()\operatorname{\mathcal{D}}(\operatorname{\mathcal{M}}) in an independent and identically distributed (iid\operatorname{\textbf{iid}}) manner in polynomial time.

We note that although independence certainly does not hold over the sampling methods of (DeFord, Duchin, and Solomon 2019; Mattingly and Vaughn 2014) since they use MCMC methods, it makes the derivations significantly more tractable. Further, the specific choice of the sampling technique is somewhat immaterial to our objective.

Based on the above condition, we can sample from the distribution \operatorname{\mathcal{M}} efficiently and obtain a finite set of maps T\operatorname{\mathcal{M}_{\mathnormal{T}}} having TT many maps, i.e., |T|=T|\operatorname{\mathcal{M}_{\mathnormal{T}}}|=T.

Now, we define the sample medoid, which is simply the extension of the population medoid, but restricted to the given sample. This leads to the following definition:

M¯=argminMTMTd(M,M)\displaystyle\bar{M}^{*}=\operatorname*{arg\,min}\limits_{\operatorname{\mathnormal{M}}\in\operatorname{\mathcal{M}_{\mathnormal{T}}}}\sum_{M^{\prime}\in\operatorname{\mathcal{M}_{\mathnormal{T}}}}\ d(\operatorname{\mathnormal{M}},\operatorname{\mathnormal{M}}^{\prime}) (2)

3.1 Distance over Redistricting Maps

Before we introduce our distance measure, we note that a given map (partition) MM can be represented using an “adjacency” matrix AA in which A(i,j)=1A(i,j)=1 if and only if VM:i,jV\exists V_{\ell}\in M:i,j\in V_{\ell} otherwise A(i,j)=0A(i,j)=0. We note that this adjacency matrix can be seen as drawing an edge between every two vertices i,ji,j that are in the same district, i.e., where A(i,j)=1A(i,j)=1. It is clear that we can refer to a map by the partition MM or the induced adjacency matrix AA. Accordingly, we refer to the population medoid as MM^{*} or A\operatorname{\mathnormal{A^{*}}} and the sample medoid as M¯\bar{M}^{*} or A¯\operatorname{\mathnormal{\bar{A}^{*}}}

We now introduce our distance family which is parametrized by a weight matrix Θ\Theta and have the following form:

dΘ(A1,A2)=12i,jVθ(i,j)|A1(i,j)A2(i,j)|\displaystyle\operatorname{\mathnormal{d_{\Theta}}}(A_{1},A_{2})=\frac{1}{2}\sum_{i,j\in V}\theta(i,j)|A_{1}(i,j)-A_{2}(i,j)| (3)

where we only require that θ(i,j)>0,i,jV\theta(i,j)>0,\forall i,j\in V where θ(i,j)\theta(i,j) is the (i,j)(i,j) entry of Θ\Theta. For the simple case where θ(i,j)=1,i,jV\theta(i,j)=1,\forall i,j\in V, our distance d1(A1,A2)d_{\textbf{1}}(A_{1},A_{2}) is equivalent to a Hamming distance over adjacency matrices. When θ(i,j)=1,i,jV\theta(i,j)=1,\forall i,j\in V, we refer to the metric as the unweighted distance. We note that such a distance measure was used in previous work that considered adversarial attacks on clustering (Chhabra, Roy, and Mohapatra 2020; Cinà, Torcinovich, and Pelillo 2022).

Another choice of Θ\Theta that leads to a meaningful metric is the population-weighted distance where θ(i,j)=w(i)w(j)\theta(i,j)=w(i)w(j). This leads to dW(A1,A2)=12i,jVw(i)w(j)|A1(i,j)A2(i,j)|d_{W}(A_{1},A_{2})=\frac{1}{2}\sum_{i,j\in V}w(i)w(j)|A_{1}(i,j)-A_{2}(i,j)|. The population-weighted distance takes into account the number of individuals being separated from one another when vertices ii and jj are separated from one another333Recall that each vertex (unit) is a voting block (AKA voter tabulation district) and units may contain different numbers of voters. by assigning a cost of w(i)w(j)w(i)w(j). By contrast, the unweighted distance assigns the same cost regardless of the population values and thus has a uniform weight over the separation of units immaterial of the populations which they include.

Another choice of metric which is meaningful, could be of the form θ(i,j)=f(l(i,j))\theta(i,j)=f(l(i,j)) where l(i,j)l(i,j) is the length of a shortest path between ii and jj and f(.)f(.) is a positive decreasing function such as f(l(i,j))=el(i,j)f(l(i,j))=e^{-l(i,j)}. Such a metric would place a smaller penalty for separating vertices that are far away from each other. In general, our distance has an edit distance interpretation. Specifically, if we were to draw edges between vertices according to the entries with 11 in the adjacency matrix, then given A1A_{1} and A2A_{2}, the distance dΘ(A1,A2)\operatorname{\mathnormal{d_{\Theta}}}(A_{1},A_{2}) simply equals the minimum total weight (according to θ(i,j)\theta(i,j)) of the edges that must be added and deleted to obtain A2A_{2} from A1A_{1}. In the case of the unweighted distance, it is precisely equal to the minimum number of edges that have to be deleted from and added to A1A_{1} to obtain A2A_{2}. See Figure 1 for an illustration.

4 Justification for Choosing a Central Map

Connection to the Kemeny Rule:

We note that the Kemeny rule (Kemeny 1959; Brandt et al. 2016) is the main inspiration behind our proposed framework. More specifically, given a set of alternatives and individuals voting by ranking the alternatives, the Kemeny rule provides a method for aggregating the resulting collection of rankings. This is done by introducing a distance measure over rankings (the Kendall tau distance (Kendall 1938)) and choosing the ranking which minimizes the sum of distances away from the other rankings in the collection as the aggregate ranking.

Although we do not deal with rankings here, we follow a similar approach to the Kemeny rule as we introduce a distance measure over redistricting maps and choose the map which minimizes the sum of the distances as the aggregate map. In fact, recently there has been significant citizen engagement in drawing redistricting maps. For example, in the state of Maryland an executive order from the governor has established a web page to collect citizen submissions of redistricting maps (Commission 2021). If each member of a committee was to vote for exactly one map in the given submitted maps, then if we interpret the probability pMp_{\operatorname{\mathnormal{M}}^{\prime}} for a map M\operatorname{\mathnormal{M}}^{\prime}\in\operatorname{\mathcal{M}} to be the number of votes it received from the total set of votes, then the medoid map MM^{*} (similar to the Kemeny ranking) would be the map which minimizes the weighted sum of distances from the set of maps voted on. We include this result as a proposition and its proof follows directly from the definition we gave above:

Proposition 1.

Suppose we have a committee of 𝒯\mathcal{T} many voters and that each voter votes for one map from a subset of all possible valid maps \operatorname{\mathcal{M}}, then given a map M\operatorname{\mathnormal{M}}^{\prime}, if we assign it a probability pM=τ=1𝒯ντ,M𝒯p_{\operatorname{\mathnormal{M}}^{\prime}}=\frac{\sum_{\tau=1}^{\mathcal{T}}\nu_{\tau,\operatorname{\mathnormal{M}}^{\prime}}}{\mathcal{T}} where ντ,M{0,1}\nu_{\tau,\operatorname{\mathnormal{M}}^{\prime}}\in\{0,1\} is the vote of member τ\tau for map M\operatorname{\mathnormal{M}}^{\prime}, then the medoid map M=argminM𝔼MpM[d(M,M)]M^{*}=\operatorname*{arg\,min}\limits_{\operatorname{\mathnormal{M}}\in\operatorname{\mathcal{M}}}\ \underset{\operatorname{\mathnormal{M}}^{\prime}\sim p_{\operatorname{\mathnormal{M}}^{\prime}}}{\operatorname{\mathbb{E}}}[d(\operatorname{\mathnormal{M}},\operatorname{\mathnormal{M}}^{\prime})] is the map that minimizes the sum of distances from the set of valid maps where the distance to each map is weighted by the total votes it receives.

Connection to Distance and Clustering Based Outlier Detection:

The medoid map, by virtue of minimizing the sum of distances, can be considered a central map. Accordingly, one may consider using the medoid map to test for gerrymandering in a manner similar to distance and clustering based outlier detection (He, Xu, and Deng 2003; Knox and Ng 1998). More specifically, given a large ensemble of maps, if the enacted map is far from the medoid444Our experiments use the centroid instead of the medoid map. in comparison to the ensemble then this suggests possible gerrymandering. In fact, we carry experiments on the states of North Carolina and Pennsylvania (both of which have had enacted maps which were considered gerrymandered) and we indeed find the gerrymandered maps to be faraway whereas the remedial maps are much closer in terms of distance.

5 Algorithms

We show our linear time algorithm for obtaining the sample medoid in Subsection 5.1. In Subsection 5.2, we define the population centroid, derive sample complexity guarantees for obtaining it, and show that its (i,j)(i,j) entry equals the probability of having ii and jj in the same district. Finally, in Subsection 5.3, we discuss obtaining the population medoid and show that in general an arbitrarily large sample is not sufficient to approximate it.

Before we introduce our algorithms, we show that our distance family is a metric (satisfies the properties of a metric):

Proposition 2.

For all Θ\Theta such that i,j,θ(i,j)>0\forall i,j,\theta(i,j)>0, the following distance function is a metric.

dΘ(A1,A2)=12i,jVθ(i,j)|A1(i,j)A2(i,j)|\operatorname{\mathnormal{d_{\Theta}}}(A_{1},A_{2})=\frac{1}{2}\sum_{i,j\in V}\theta(i,j)|A_{1}(i,j)-A_{2}(i,j)|

5.1 Obtaining the Sample Medoid

We note that in general obtaining the sample medoid is not scalable since it usually takes quadratic time (Newling and Fleuret 2017) in the number of samples, i.e. Ω(T2)\Omega(T^{2}). An O(T2)O(T^{2}) run time can be easily obtained through a brute-force algorithm which for every map calculates the sum of the distances from other maps and then selects the map with the minimum sum. However, for our family of distances dΘ(.,.)\operatorname{\mathnormal{d_{\Theta}}}(.,.) we show that the medoid map is the closest map to the centroid map (defined below) and show a simple algorithm that runs in O(T)O(T) time for obtaining the sample medoid. The fundamental cause behind this speed up is an equivalence between the Hamming distance over binary vectors and the square of the Euclidean distance which is still maintained with our generalized distance. Before introducing the theorem we define d2,Θ(A1,A2)=12i,jVθ(i,j)(A1(i,j)A2(i,j))2\operatorname{\mathnormal{d_{2,\Theta}}}(A_{1},A_{2})=\frac{1}{2}\sum_{i,j\in V}\theta(i,j)(A_{1}(i,j)-A_{2}(i,j))^{2} where the absolute has been replaced by a square. Now we state the decomposition theorem:

Theorem 5.1.

Given a collection of redistricting maps A1,,ATA_{1},\dots,A_{T}, the sum of distances of the maps from a fixed redistricting map AA^{\prime} equals the following:

t=1TdΘ(At,A)\displaystyle\sum_{t=1}^{T}\operatorname{\mathnormal{d_{\Theta}}}(A_{t},A^{\prime}) =t=1Td2,Θ(At,A¯c)+Td2,Θ(A¯c,A)\displaystyle=\sum_{t=1}^{T}\operatorname{\mathnormal{d_{2,\Theta}}}(A_{t},\bar{A}_{c})+T\operatorname{\mathnormal{d_{2,\Theta}}}(\bar{A}_{c},A^{\prime}) (4)

where A¯c=1Tt=1At\bar{A}_{c}=\frac{1}{T}\sum_{t=1}A_{t}.

Notice that the above theorem introduces the centroid map A¯c\operatorname{\mathnormal{\bar{A}_{c}}} which is simply equal to the empirical mean of the adjacency maps. It should be clear that with the exception of trivial cases the centroid map A¯c\operatorname{\mathnormal{\bar{A}_{c}}} is not a valid adjacency matrix, since despite being symmetric it would have fractional entries between 0 and 11. Hence, the centroid map also does not lead to a valid partition or districting. Moreover, we note that it is more accurate to call A¯c\operatorname{\mathnormal{\bar{A}_{c}}} the sample centroid, as opposed to the population centroid AcA_{c} (see Subsection 5.2) which we would obtain with an infinite number of samples.

  Input: T={A1,,AT}\operatorname{\mathcal{M}_{\mathnormal{T}}}=\{A_{1},\dots,A_{T}\}, Θ={θ(i,j)>0,i,jV}\Theta=\{\theta(i,j)>0,\forall i,j\in V\}.
  1: Calculate the centroid map A¯c=1Tt=1TAt\operatorname{\mathnormal{\bar{A}_{c}}}=\frac{1}{T}\sum_{t=1}^{T}A_{t}.
   2: Pick the map A¯T\bar{A}^{*}\in\operatorname{\mathcal{M}_{\mathnormal{T}}} which minimizes the d2,Θ\operatorname{\mathnormal{d_{2,\Theta}}} distance from the centroid A¯c\operatorname{\mathnormal{\bar{A}_{c}}}, i.e. A¯=argminATd2,Θ(A,A¯c)\bar{A}^{*}=\operatorname*{arg\,min}_{A\in\operatorname{\mathcal{M}_{\mathnormal{T}}}}\operatorname{\mathnormal{d_{2,\Theta}}}(A,\operatorname{\mathnormal{\bar{A}_{c}}}).
  return  A¯\bar{A}^{*}
Algorithm 1 Finding the Sample Medoid

The above theorem leads to Algorithm 1 with the following remark:

Remark 1.

Algorithm 1 returns the correct sample medoid and runs in O(T)O(T) time.

We note that calculating the sample medoid in algorithm 1 has no dependence on the generating method. Therefore, if a set of maps are produced through any mechanism and are considered to be representative and sufficiently diverse, then algorithm 1 can be used to obtain the sample medoid in time that is linear in the number of samples.

5.2 Sample Complexity for Obtaining the Population Centroid

In the prior section, we introduced the sample centroid A¯c\operatorname{\mathnormal{\bar{A}_{c}}} which is equal to the empirical mean from taking the average of the adjacency matrices, i.e., A¯c=1Tt=1TAt\operatorname{\mathnormal{\bar{A}_{c}}}=\frac{1}{T}\sum_{t=1}^{T}A_{t}. We now consider the population centroid Ac=limTt=1TAt\operatorname{\mathnormal{A_{c}}}=\lim_{T\rightarrow\infty}\sum_{t=1}^{T}A_{t}. Clearly, by the law of large numbers (Zubrzycki 1972), we have Ac(i,j)=𝔼[A(i,j)]\operatorname{\mathnormal{A_{c}}}(i,j)=\operatorname{\mathbb{E}}[A(i,j)] . It is also clear that Ac\operatorname{\mathnormal{A_{c}}} has an interesting property, specifically the (i,j)(i,j)-entry equals the probability that ii and jj are in the same district:

Proposition 3.

Ac(i,j)=Pr[i and j in the same district]\operatorname{\mathnormal{A_{c}}}(i,j)=\Pr[i\text{ and }j\text{ in the same district}].

Now we show that with a sufficient number of samples, the sample centroid converges to the population centroid entry-wise and in terms of the d2,Θ\operatorname{\mathnormal{d_{2,\Theta}}} value. Specifically, we have the following proposition:

Proposition 4.

If we sample T1ϵ2lnnδT\geq\frac{1}{\epsilon^{2}}\ln{\frac{n}{\delta}} iid\operatorname{\textbf{iid}} samples, then with probability at least 1δ1-\delta, we have that i,jV:|A¯c(i,j)Ac(i,j)|ϵ\forall i,j\in V:|\operatorname{\mathnormal{\bar{A}_{c}}}(i,j)-\operatorname{\mathnormal{A_{c}}}(i,j)|\leq\epsilon. Further, let κ=maxi,jVθ(i,j)\kappa=\max_{i,j\in V}\sqrt{\theta(i,j)}, if we have Tκn2ϵlnnδT\geq\frac{\kappa n^{2}}{\epsilon}\ln{\frac{n}{\delta}} iid\operatorname{\textbf{iid}} samples, then d2,Θ(A¯c,Ac)ϵ\operatorname{\mathnormal{d_{2,\Theta}}}(\operatorname{\mathnormal{\bar{A}_{c}}},\operatorname{\mathnormal{A_{c}}})\leq\epsilon with probability at least 1δ1-\delta.

5.3 Obtaining the Population Medoid

Having found the sample centroid A¯c\operatorname{\mathnormal{\bar{A}_{c}}} and shown that it is a good estimate of the population centroid Ac\operatorname{\mathnormal{A_{c}}}, we now show that we can obtain a good estimate of the population medoid by solving an optimization problem. Assuming that we have the population centroid Ac\operatorname{\mathnormal{A_{c}}}, then the population medoid is simply a valid redistricting map AA which has a minimum d2,Θ(A,Ac)\operatorname{\mathnormal{d_{2,\Theta}}}(A,\operatorname{\mathnormal{A_{c}}}) value. This follows immediately from Theorem 5.1. More interestingly, we show that this optimization problem is a constrained instance of the min kk-cut problem:

Theorem 5.2.

Given the population centroid Ac\operatorname{\mathnormal{A_{c}}}, the population medoid A\operatorname{\mathnormal{A^{*}}} can be obtained by solving a constrained min kk-cut problem.

If we have a good estimate A¯c\operatorname{\mathnormal{\bar{A}_{c}}} of the population centroid Ac\operatorname{\mathnormal{A_{c}}}, then we can solve the above optimization using A¯c\operatorname{\mathnormal{\bar{A}_{c}}} instead of Ac\operatorname{\mathnormal{A_{c}}} and obtain an estimate of the population medoid A¯\operatorname{\mathnormal{\bar{A}^{*}}} instead of the true population medoid A\operatorname{\mathnormal{A^{*}}} and bound the error of that estimate. The issue is that the min kk-cut problem is NP-hard (Goldschmidt and Hochbaum 1994; Saran and Vazirani 1995) 555Note that in our case the min kk-cut problem can have negative edge weights while the min kk-cut problem is generally stated with non-negative weights. Nevertheless, we still minimize a cut objective and the non-negative weight min kk-cut instance is trivially reducible to a min kk-cut instance with negative and non-negative weights.. Further, the existing approximation algorithms assume non-negativity of the weights. Even if these approximation algorithms can be tailored to this setting, the additional constraints on the partition being a valid redistricting (each partition being contiguous, of equal population, and compact) make it quite difficult to approximate the objective. In fact, excluding the objective and focusing on the constraint alone, only the work of (Hettle et al. 2021) has produced approximation algorithm for redistricting maps but has done that for the restricted case of grid graphs. Further, while there exists heuristics for solving min kk-cut for redistricting maps they only scale to at most around 500500 vertices (Validi and Buchanan 2022).

Having shown the difficulty in obtaining the population medoid by solving an optimization problem, it is reasonable to wonder whether we can gain any guarantees about the population medoid by sampling. We show the negative result that we cannot guarantee that we can estimate the sample medoid of a distribution with high probability by choosing a sampled map even if we sample an arbitrarily large number of maps. This implies as a corollary that the sample medoid does not converge to the population medoid in contrast to the centroid (see Proposition 4).

Theorem 5.3.

For any arbitrary TT many iid\operatorname{\textbf{iid}} samples {A1,,AT}\{A_{1},\dots,A_{T}\} there exists a distribution over a set of redistricting maps such that: (1) Pr[minA{A1,,AT}d(A,A)0.331]23\Pr[\min_{A\in\{A_{1},\dots,A_{T}\}}d(A,A^{*})\geq 0.331]\geq\frac{2}{3} and (2) Pr[minA{A1,,AT}f(A)1.1f(A)]23\Pr[\min_{A\in\{A_{1},\dots,A_{T}\}}f(A)\geq 1.1f(A^{*})]\geq\frac{2}{3} where f(.)f(.) is the medoid cost function.

We therefore, use a heuristic to find the medoid as mentioned in the next section.

6 Experiments

Refer to caption
Figure 2: Distance histograms for NC using the unweighted distance measure. Different plots correspond to different seeds. For NC the distances of gerrymandered maps are indicated with red markers whereas the distances of the remedial maps are indicated with green markers (the circle and the X are for 2011 and 2016 enacted maps, respectively).

We conduct experiments on three states, North Carolina (NC), Maryland (MD), and Pennsylvania (PA), which have featured in major court cases on partisan gerrymandering (LWV vs Commonwealth of Pennsylvania No. 159 MM 2018; Rucho v. Common Cause No. 18-422, 588 U.S. 2019). The number of voting units (vertices) are around 2,7002{,}700, 1,8001{,}800, and 8,9008{,}900, and the number of districts are 1313, 88, and 1818 for NC, MD, and PA, respectively666For PA, the number of districts was reduced to 17 after the 2020 census, but we are using past election results with 18 districts.. We focus on the results for NC here and discuss the qualitatively-similar results for PA and MD in Appendix B. NC is especially valuable because we have good examples of both gerrymandered and not gerrymandered maps (enacted maps which are widely considered gerrymandered and a remedial map which is not).

To generate a collection of maps, we use the Recombination algorithm, ReCom\operatorname{\textbf{ReCom}}, from DeFord, Duchin, and Solomon (2019) whose implementation is available online. ReCom\operatorname{\textbf{ReCom}} is a Markov Chain Monte Carlo (MCMC) sampling method, and hence, the generated samples are not actually iid\operatorname{\textbf{iid}}. While this means Condition 3.1 does not hold, we believe our theorems still have utility and future work can address more realistic sampling conditions. Moreover, we always exclude the first 2,0002{,}000 samples from any calculation as these are considered to be “burn-in” samples777In MCMC, the chain is supposed to converge to a stationary distribution after some number of steps, called the mixing time. Although the mixing time has not been theoretically calculated for ReCom\operatorname{\textbf{ReCom}}, empirically it seems that 2,0002{,}000 steps are sufficient.. Throughout this section, when we say distance, we mean d2,Θ(.,.)\operatorname{\mathnormal{d_{2,\Theta}}}(.,.) instead of dΘ(.,.)\operatorname{\mathnormal{d_{\Theta}}}(.,.). Further experimental results and figures are included in Appendix B.

Convergence of the Centroid:

Previous work (DeFord, Duchin, and Solomon 2019; DeFord and Duchin 2019) has used the ReCom\operatorname{\textbf{ReCom}} algorithm for estimating statistics such as the histogram of election seats won by a party and determined that using 50,00050{,}000 samples is sufficient for accurate results. However, our setting is more challenging. Specifically, the centroid includes Ω(n)\Omega(n) entries where nn is the total number of voting units (vertices) whereas an election histogram includes only kk entries where kk is the number of districts—usually orders of magnitude smaller than the number of voting units. Thus, we sample 200,000200{,}000 maps instead to estimate the centroid. Here, we emphasize the importance of our linear-time algorithm since using a quadratic-time algorithm on samples of the order of even 50,00050{,}000 could be computationally difficult. Following similar practice to Herschlag et al. (2020) for verifying convergence, we repeat the procedure (sampling using ReCom\operatorname{\textbf{ReCom}} and estimating the centroid) for a total of three times for each state, starting from a different seed map each time and confirming that all three runs result in essentially the same centroid estimate.

To verify the closeness of the different centroid estimates, we calculate the distances between them and compare them to their distances from sampled redistricting maps using ReCom\operatorname{\textbf{ReCom}}. We find that the centroids are orders of magnitude closer to each other than to any other sampled map. For example, the maximum unweighted distance between any two centroids is less than 130130 whereas the minimum unweighted distance between any of the three centroids and any sampled map is more than 100,000100{,}000. Similarly, the maximum weighted distance between any two centroids is less than 1.6×1091.6\times 10^{9} whereas the minimum weighted distance between a sampled map and a centroid is at least 1.3×10121.3\times 10^{12} which is again three orders of magnitude higher.

Distance Histogram and Detecting Gerrymandering:

For each state, we plot the distance histogram from its centroid. More specifically, having estimated the centroid A¯c\operatorname{\mathnormal{\bar{A}_{c}}}, we sample 200,000200{,}000 maps and calculate d2,Θ(A¯c,At)\operatorname{\mathnormal{d_{2,\Theta}}}(\operatorname{\mathnormal{\bar{A}_{c}}},A_{t}) where AtA_{t} is the ttht^{\text{th}} sampled map. Figure 2 shows the unweighted distance histogram for NC888In Appendix B, we show the histogram for PA and MD as well.. The histogram appears like a normal distribution, peaking at the middle (around the mean) and falling almost symmetrically away. This shows that the maps do not concentrate near the centroid even though it minimizes the sum of d2,Θ(A¯c,At)\operatorname{\mathnormal{d_{2,\Theta}}}(\operatorname{\mathnormal{\bar{A}_{c}}},A_{t}) distances. Interestingly, the histogram has a similar shape for both distances (unweighted and weighted), and this shape remains unchanged across the different seeds.

Furthermore, previous work used similar sampling methods to detect gerrymandered maps (Chikina, Frieze, and Pegden 2017; Mattingly and Vaughn 2014; Herschlag et al. 2020). In essence, these papers demonstrated that the election outcome achieved by the enacted map was rare in comparison to a large sampled ensemble of redistricting maps. Using similar logic, we can also detect gerrymandered maps. Specifically, the 2011 and 2016 enacted maps of NC were widely considered to be gerrymandered, and both maps are at the right tail of the histogram, far from the centroid. By contrast, a remedial NC map drawn by a set of retired judges (Herschlag et al. 2020) is much closer to the centroid (see Figure 2 red and green marked symbols). Interestingly, all gerrymandered maps are in the 99th percentile in terms of distance (for both distance measures and across three seeds).

This suggests that our method can detect gerrymandered maps with two advantages over previous methods: it does not use election results or partisan outcomes and it is very interpretable. Thus, a guideline or rule that maps should not be far away from the centroid can exist alongside reforms that prohibit explicit partisan consideration.

Finding the medoid:

Since we have shown in Subsection 5.3 that the medoid cannot be obtained by sampling, we follow a heuristic that consists of these steps: (1) Sample 200,000200{,}000 maps and pick the one closest to the centroid AclosestA_{\text{closest}}. (2) Start the ReCom\operatorname{\textbf{ReCom}} chain from AclosestA_{\text{closest}} but given a specific state (redistricting map) we only allow transitions to new states (maps) that are closer to the centroid, and we do this for 200,000200{,}000 steps to obtain the final estimated medoid A^\hat{A}^{*}. We follow this procedure three times one for each centroid 999As mentioned before we get three centroids each from sampling a chain that starts with a different seed.. Figure 3 (top row) shows the AclosestA_{\text{closest}} medoids from two different runs (each comparing to a different centroid), and it is easy to see they are different. The bottom row shows the final medoids A^\hat{A}^{*} which are visually more similar to each other and also close in distance.

Refer to caption
Figure 3: NC medoids, each column is for a specific seed. Top row: AclosestA_{\text{closest}}, Bottom row: A^\hat{A}^{*}.

7 Conclusion

In this paper we introduced a framework which accounts for the challenge of choosing from a large space of valid maps in the redistricting process. Specifically, we introduced a well-motivated family of distance measures and showed how we can obtain the medoid map according to this measure. Additionally, we showed experimentally that our framework can be used to find outlier (gerrymandered) maps based on their distance from the centroid map. Finally, we believe that there are further applications of having a distance measure over redistricting maps that are interesting to investigate such as using high dimensional visualization methods to gain further insight into the map ensemble.

References

  • Abrishami et al. (2020) Abrishami, T.; Guillen, N.; Rule, P.; Schutzman, Z.; Solomon, J.; Weighill, T.; and Wu, S. 2020. Geometry of graph partitions via optimal transport. SIAM Journal on Scientific Computing, 42(5): A3340–A3366.
  • Alexeev and Mixon (2018) Alexeev, B.; and Mixon, D. G. 2018. An impossibility theorem for gerrymandering. The American Mathematical Monthly, 125: 878–884.
  • Ashtiani, Kushagra, and Ben-David (2016) Ashtiani, H.; Kushagra, S.; and Ben-David, S. 2016. Clustering with same-cluster queries. Advances in neural information processing systems, 29.
  • Bickel (1966) Bickel, A. M. 1966. The Voting Rights Cases. The Supreme Court Review, 1966: 79–102.
  • Borg et al. (2013) Borg, I.; Groenen, P. J.; Mair, P.; Borg, I.; Groenen, P. J.; and Mair, P. 2013. Mds algorithms. Applied Multidimensional Scaling, 81–86.
  • Brandt et al. (2016) Brandt, F.; Conitzer, V.; Endriss, U.; Lang, J.; and Procaccia, A. D. 2016. Handbook of computational social choice. Cambridge University Press.
  • Brubach, Srinivasan, and Zhao (2020) Brubach, B.; Srinivasan, A.; and Zhao, S. 2020. Meddling metrics: the effects of measuring and constraining partisan gerrymandering on voter incentives. In Proceedings of the 21st ACM Conference on Economics and Computation, 815–833.
  • Bycoffe et al. (2018) Bycoffe, A.; Koeze, E.; Wasserman, D.; and Wolfe, J. 2018. The Atlas Of Redistricting. https://projects.fivethirtyeight.com/redistricting-maps/. [Online; published 25-January-2018; accessed 15-August-2019].
  • Chen (2021) Chen, M. 2021. Tufts research lab aids states with redistricting process. The Tufts Daily.
  • Chhabra, Roy, and Mohapatra (2020) Chhabra, A.; Roy, A.; and Mohapatra, P. 2020. Suspicion-free adversarial attacks on clustering algorithms. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 3625–3632.
  • Chikina, Frieze, and Pegden (2017) Chikina, M.; Frieze, A.; and Pegden, W. 2017. Assessing significance in a Markov chain without mixing. Proceedings of the National Academy of Sciences, 114(11): 2860–2864.
  • Cho (2019) Cho, W. K. T. 2019. Technology-Enabled Coin Flips for Judging Partisan Gerrymandering. Southern California law review, 93.
  • Cinà, Torcinovich, and Pelillo (2022) Cinà, A. E.; Torcinovich, A.; and Pelillo, M. 2022. A black-box adversarial attack for poisoning clustering. Pattern Recognition, 122: 108306.
  • Cohen-Addad, Klein, and Young (2018) Cohen-Addad, V.; Klein, P. N.; and Young, N. E. 2018. Balanced centroidal power diagrams for redistricting. In Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 389–396.
  • Commission (2021) Commission, M. C. R. 2021. Redistricting Map Submission Process. https://redistricting.maryland.gov/Pages/plan-proposals.aspx. [Online; accessed 20-October-2021].
  • DeFord and Duchin (2019) DeFord, D.; and Duchin, M. 2019. Redistricting reform in Virginia: Districting criteria in context. Virginia Policy Review, 12(2): 120–146.
  • DeFord, Duchin, and Solomon (2019) DeFord, D.; Duchin, M.; and Solomon, J. 2019. Recombination: A family of Markov chains for redistricting. arXiv preprint arXiv:1911.05725.
  • Duchin and Walch (2021) Duchin, M.; and Walch, O. 2021. Political Geometry.
  • Fifield et al. (2015) Fifield, B.; Higgins, M.; Imai, K.; and Tarr, A. 2015. A new automated redistricting simulator using markov chain monte carlo. Work. Pap., Princeton Univ., Princeton, NJ.
  • Goldschmidt and Hochbaum (1994) Goldschmidt, O.; and Hochbaum, D. S. 1994. A polynomial algorithm for the k-cut problem for fixed k. Mathematics of operations research, 19(1): 24–37.
  • He, Xu, and Deng (2003) He, Z.; Xu, X.; and Deng, S. 2003. Discovering cluster-based local outliers. Pattern recognition letters, 24(9-10): 1641–1650.
  • Herschlag et al. (2020) Herschlag, G.; Kang, H. S.; Luo, J.; Graves, C. V.; Bangia, S.; Ravier, R.; and Mattingly, J. C. 2020. Quantifying gerrymandering in north carolina. Statistics and Public Policy, 7(1): 30–38.
  • Hettle et al. (2021) Hettle, C.; Zhu, S.; Gupta, S.; and Xie, Y. 2021. Balanced Districting on Grid Graphs with Provable Compactness and Contiguity. arXiv preprint arXiv:2102.05028.
  • Kemeny (1959) Kemeny, J. G. 1959. Mathematics without numbers. Daedalus, 88(4): 577–591.
  • Kendall (1938) Kendall, M. G. 1938. A new measure of rank correlation. Biometrika, 30(1/2): 81–93.
  • Knox and Ng (1998) Knox, E. M.; and Ng, R. T. 1998. Algorithms for mining distancebased outliers in large datasets. In Proceedings of the international conference on very large data bases, 392–403. Citeseer.
  • Ko et al. (2022) Ko, S.-H.; Taylor, E.; Agarwal, P.; and Munagala, K. 2022. All Politics is Local: Redistricting via Local Fairness. Advances in Neural Information Processing Systems, 35: 17443–17455.
  • Lin et al. (2022) Lin, J.; Chen, C.; Chmielewski, M.; Zaman, S.; and Fain, B. 2022. Auditing for gerrymandering by identifying disenfranchised individuals. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 1125–1135.
  • Liu, Cho, and Wang (2016) Liu, Y. Y.; Cho, W. K. T.; and Wang, S. 2016. PEAR: a massively parallel evolutionary computation approach for political redistricting optimization and analysis. Swarm and Evolutionary Computation, 30: 78 – 92.
  • LWV vs Commonwealth of Pennsylvania (No. 159 MM 2018) LWV vs Commonwealth of Pennsylvania. No. 159 MM 2018.
  • Mattingly and Vaughn (2014) Mattingly, J. C.; and Vaughn, C. 2014. Redistricting and the Will of the People. arXiv preprint arXiv:1410.8796.
  • Mead (1992) Mead, A. 1992. Review of the development of multidimensional scaling methods. Journal of the Royal Statistical Society: Series D (The Statistician), 41(1): 27–39.
  • Newling and Fleuret (2017) Newling, J.; and Fleuret, F. 2017. A sub-quadratic exact medoid algorithm. In Artificial Intelligence and Statistics, 185–193. PMLR.
  • Pegden, Procaccia, and Yu (2017) Pegden, W.; Procaccia, A. D.; and Yu, D. 2017. A partisan districting protocol with provably nonpartisan outcomes. arXiv preprint arXiv:1710.08781.
  • Polsby and Popper (1991) Polsby, D. D.; and Popper, R. D. 1991. The third criterion: Compactness as a procedural safeguard against partisan gerrymandering. Yale Law & Policy Review, 9(2): 301–353.
  • Rucho v. Common Cause (No. 18-422, 588 U.S. 2019) Rucho v. Common Cause. No. 18-422, 588 U.S. 2019.
  • Ryan and Smith (2022) Ryan, I.; and Smith, W. D. 2022. Splitline districtings of all 50 states + DC + PR. https://rangevoting.org/SplitLR.html. [Online; accessed 15-August-2019].
  • Saran and Vazirani (1995) Saran, H.; and Vazirani, V. V. 1995. Finding k cuts within twice the optimal. SIAM Journal on Computing, 24(1): 101–108.
  • Validi and Buchanan (2022) Validi, H.; and Buchanan, A. 2022. Political districting to minimize cut edges. Mathematical Programming Computation, 1–50.
  • Vieth v. Jubelirer (No. 02-1580, 541 U.S. 267 (2004) Vieth v. Jubelirer. No. 02-1580, 541 U.S. 267 (2004).
  • Wu et al. (2015) Wu, L. C.; Dou, J. X.; Sleator, D.; Frieze, A.; and Miller, D. 2015. Impartial redistricting: A markov chain approach. arXiv preprint arXiv:1510.03247.
  • Zubrzycki (1972) Zubrzycki, S. 1972. Lectures in probability theory and mathematical statistics, volume 38. Elsevier Publishing Company.

Appendix A Omitted Proof

We restate the proposition and give its proof: See 1

Proof.

The first step in the proof follows from the definition of the weighted sum of distances for a map MM from the set of all maps:

M\displaystyle M^{*} =argminMM[(τντ,M)d(M,M)]\displaystyle=\operatorname*{arg\,min}\limits_{\operatorname{\mathnormal{M}}\in\operatorname{\mathcal{M}}}\sum_{M^{\prime}\in\operatorname{\mathcal{M}}}\Big{[}\Big{(}\sum_{\tau}\nu_{\tau,\operatorname{\mathnormal{M}}}\Big{)}d(M,M^{\prime})\Big{]}
=argminMM[(τντ,M𝒯)d(M,M)]\displaystyle=\operatorname*{arg\,min}\limits_{\operatorname{\mathnormal{M}}\in\operatorname{\mathcal{M}}}\sum_{M^{\prime}\in\operatorname{\mathcal{M}}}\Big{[}\Big{(}\frac{\sum_{\tau}\nu_{\tau,\operatorname{\mathnormal{M}}}}{\mathcal{T}}\Big{)}d(M,M^{\prime})\Big{]}
=argminMM[pMd(M,M)]\displaystyle=\operatorname*{arg\,min}\limits_{\operatorname{\mathnormal{M}}\in\operatorname{\mathcal{M}}}\sum_{M^{\prime}\in\operatorname{\mathcal{M}}}\Big{[}p_{\operatorname{\mathnormal{M}}}d(M,M^{\prime})\Big{]}
=argminM𝔼MpM[d(M,M)]\displaystyle=\operatorname*{arg\,min}\limits_{\operatorname{\mathnormal{M}}\in\operatorname{\mathcal{M}}}\underset{\operatorname{\mathnormal{M}}^{\prime}\sim p_{\operatorname{\mathnormal{M}}^{\prime}}}{\operatorname{\mathbb{E}}}[d(\operatorname{\mathnormal{M}},\operatorname{\mathnormal{M}}^{\prime})]

We restate the next proposition and give its proof: See 2

Proof.

Since θ(i,j)>0,i,jV\theta(i,j)>0,\forall i,j\in V, it is clear that dΘ(A1,A2)\operatorname{\mathnormal{d_{\Theta}}}(A_{1},A_{2}) is non-negative, symmetric, and that it equals zero if and only if A1=A2A_{1}=A_{2}. The triangle inequality follows since |A1(i,j)A2(i,j)|=|A1(i,j)A3(i,j)+A3(i,j)A2(i,j)||A1(i,j)A3(i,j)|+|A3(i,j)A2(i,j)||A_{1}(i,j)-A_{2}(i,j)|=|A_{1}(i,j)-A_{3}(i,j)+A_{3}(i,j)-A_{2}(i,j)|\leq|A_{1}(i,j)-A_{3}(i,j)|+|A_{3}(i,j)-A_{2}(i,j)|, therefore we have:

dΘ(A1,A2)=12i,jVθ(i,j)|A1(i,j)A2(i,j)|\displaystyle\operatorname{\mathnormal{d_{\Theta}}}(A_{1},A_{2})=\frac{1}{2}\sum_{i,j\in V}\theta(i,j)|A_{1}(i,j)-A_{2}(i,j)|
12i,jVθ(i,j)(|A1(i,j)A3(i,j)|+|A3(i,j)A2(i,j)|)\displaystyle\leq\frac{1}{2}\sum_{i,j\in V}\theta(i,j)\big{(}|A_{1}(i,j)-A_{3}(i,j)|+|A_{3}(i,j)-A_{2}(i,j)|\big{)}
12i,jVθ(i,j)|A1(i,j)A3(i,j)|\displaystyle\leq\frac{1}{2}\sum_{i,j\in V}\theta(i,j)|A_{1}(i,j)-A_{3}(i,j)|
+12i,jVθ(i,j)|A3(i,j)A2(i,j)|\displaystyle\qquad+\frac{1}{2}\sum_{i,j\in V}\theta(i,j)|A_{3}(i,j)-A_{2}(i,j)|
=dΘ(A1,A3)+dΘ(A2,A3)\displaystyle=\operatorname{\mathnormal{d_{\Theta}}}(A_{1},A_{3})+\operatorname{\mathnormal{d_{\Theta}}}(A_{2},A_{3})

We restate the next theorem and give its proof: See 5.1

Proof.

We begin with the following lemma:

Lemma 1.

For any A1,A2A_{1},A_{2} that are binary matrices (entries either 0 or 11), with d2,Θ(A1,A2)=12i,jVθ(i,j)(A1(i,j)A2(i,j))2\operatorname{\mathnormal{d_{2,\Theta}}}(A_{1},A_{2})=\frac{1}{2}\sum_{i,j\in V}\theta(i,j)(A_{1}(i,j)-A_{2}(i,j))^{2}, then we have that dΘ(A1,A2)=d2,Θ(A1,A2)\operatorname{\mathnormal{d_{\Theta}}}(A_{1},A_{2})=\operatorname{\mathnormal{d_{2,\Theta}}}(A_{1},A_{2}).

Proof.

The proof is immediate since A1A_{1} and A2A_{2} are binary. ∎

It may seem redundant to introduce a new definition d2,Θ(.,.)\operatorname{\mathnormal{d_{2,\Theta}}}(.,.) since by Lemma 1, they are equivalent. However, we will shortly be using d2,Θ(.,.)\operatorname{\mathnormal{d_{2,\Theta}}}(.,.) over matrices which are not necessarily binary, clearly then we might have dΘ(A1,A2)d2,Θ(A1,A2)\operatorname{\mathnormal{d_{\Theta}}}(A_{1},A_{2})\neq\operatorname{\mathnormal{d_{2,\Theta}}}(A_{1},A_{2}).

We then introduce next lemma:

Lemma 2.

For any two matrices (not necessarily binary), the following holds:

d2,Θ(A1,A2)=12A1ΘA2Θ22\displaystyle\operatorname{\mathnormal{d_{2,\Theta}}}(A_{1},A_{2})=\frac{1}{2}\left\lVert A^{\Theta}_{1}-A^{\Theta}_{2}\right\rVert^{2}_{2} (5)

where AsΘ(i,j)=θ(i,j)As(i,j),s{1,2}A^{\Theta}_{s}(i,j)=\sqrt{\theta(i,j)}A_{s}(i,j),\forall s\in\{1,2\} and A1ΘA2Θ22\left\lVert A^{\Theta}_{1}-A^{\Theta}_{2}\right\rVert^{2}_{2} is the square of the 2\ell_{2}-norm of the vectorized form of the matrix (A1ΘA2Θ)(A^{\Theta}_{1}-A^{\Theta}_{2}).

Proof.
d2,Θ(A1,A2)\displaystyle\operatorname{\mathnormal{d_{2,\Theta}}}(A_{1},A_{2}) =12i,jVθ(i,j)(A1(i,j)A2(i,j))2\displaystyle=\frac{1}{2}\sum_{i,j\in V}\theta(i,j)(A_{1}(i,j)-A_{2}(i,j))^{2}
=12i,jV(θ(i,j)A1(i,j)θ(i,j)A2(i,j))2\displaystyle=\frac{1}{2}\sum_{i,j\in V}(\sqrt{\theta(i,j)}A_{1}(i,j)-\sqrt{\theta(i,j)}A_{2}(i,j))^{2}
=12A1ΘA2Θ22\displaystyle=\frac{1}{2}\left\lVert A^{\Theta}_{1}-A^{\Theta}_{2}\right\rVert^{2}_{2}

With the lemmas above, we can now prove the decomposition theorem:

t=1TdΘ(At,A)\displaystyle\sum_{t=1}^{T}\operatorname{\mathnormal{d_{\Theta}}}(A_{t},A^{\prime}) =t=1Td2,Θ(At,A)(using Lemma 1)\displaystyle=\sum_{t=1}^{T}\operatorname{\mathnormal{d_{2,\Theta}}}(A_{t},A^{\prime})\ \ \ \ \ \ \text{(using Lemma \ref{ham_eq_l2})}
=t=1T12AtΘAΘ22(using Lemma 2)\displaystyle=\sum_{t=1}^{T}\frac{1}{2}\left\lVert A^{\Theta}_{t}-{A^{\prime}}^{\Theta}\right\rVert^{2}_{2}\ \ \ \ \ \ \text{(using Lemma \ref{vector_lemma})}
=12t=1TAtΘA¯cΘ+A¯cΘAΘ22\displaystyle=\frac{1}{2}\sum_{t=1}^{T}\left\lVert A^{\Theta}_{t}-\operatorname{\mathnormal{\bar{A}^{\Theta}_{c}}}+\operatorname{\mathnormal{\bar{A}^{\Theta}_{c}}}-{A^{\prime}}^{\Theta}\right\rVert^{2}_{2}
=12t=1T[(AtΘA¯cΘ+A¯cΘAΘ)(AtΘA¯cΘ+A¯cΘAΘ)]\displaystyle=\frac{1}{2}\sum_{t=1}^{T}\Big{[}(A^{\Theta}_{t}-\operatorname{\mathnormal{\bar{A}^{\Theta}_{c}}}+\operatorname{\mathnormal{\bar{A}^{\Theta}_{c}}}-{A^{\prime}}^{\Theta})^{\intercal}(A^{\Theta}_{t}-\operatorname{\mathnormal{\bar{A}^{\Theta}_{c}}}+\operatorname{\mathnormal{\bar{A}^{\Theta}_{c}}}-{A^{\prime}}^{\Theta})\Big{]}
=t=1T12[AtΘA¯cΘ22+A¯cΘAΘ22]+(t=1T(AtΘA¯cΘ))(A¯cΘAΘ)\displaystyle=\sum_{t=1}^{T}\frac{1}{2}\Big{[}\left\lVert A^{\Theta}_{t}-\operatorname{\mathnormal{\bar{A}^{\Theta}_{c}}}\right\rVert^{2}_{2}+\left\lVert\operatorname{\mathnormal{\bar{A}^{\Theta}_{c}}}-{A^{\prime}}^{\Theta}\right\rVert^{2}_{2}\Big{]}+\left(\sum_{t=1}^{T}(A^{\Theta}_{t}-\operatorname{\mathnormal{\bar{A}^{\Theta}_{c}}})\right)^{\intercal}(\operatorname{\mathnormal{\bar{A}^{\Theta}_{c}}}-{A^{\prime}}^{\Theta})
=t=1T12AtΘA¯cΘ22+T2A¯cΘAΘ22+(TA¯cΘTA¯cΘ)(A¯cΘAΘ)\displaystyle=\sum_{t=1}^{T}\frac{1}{2}\left\lVert A^{\Theta}_{t}-\operatorname{\mathnormal{\bar{A}^{\Theta}_{c}}}\right\rVert^{2}_{2}+\frac{T}{2}\left\lVert\operatorname{\mathnormal{\bar{A}^{\Theta}_{c}}}-{A^{\prime}}^{\Theta}\right\rVert^{2}_{2}+(T\operatorname{\mathnormal{\bar{A}^{\Theta}_{c}}}-T\operatorname{\mathnormal{\bar{A}^{\Theta}_{c}}})^{\intercal}(\operatorname{\mathnormal{\bar{A}^{\Theta}_{c}}}-{A^{\prime}}^{\Theta})
=t=1Td2,Θ(At,A¯c)+Td2,Θ(A¯c,A)\displaystyle=\sum_{t=1}^{T}\operatorname{\mathnormal{d_{2,\Theta}}}(A_{t},\operatorname{\mathnormal{\bar{A}_{c}}})+T\operatorname{\mathnormal{d_{2,\Theta}}}(\operatorname{\mathnormal{\bar{A}_{c}}},A^{\prime})

Note that in the fourth line we take the dot product with the matrices being in vectorized form and that A¯cΘ=1Tt=1TAtΘ\operatorname{\mathnormal{\bar{A}^{\Theta}_{c}}}=\frac{1}{T}\sum_{t=1}^{T}A^{\Theta}_{t}. Note that it follows that A¯cΘ(i,j)=θ(i,j)A¯c(i,j)\operatorname{\mathnormal{\bar{A}^{\Theta}_{c}}}(i,j)=\sqrt{\theta(i,j)}\operatorname{\mathnormal{\bar{A}_{c}}}(i,j). ∎

We restate the next proposition and give its proof: See 3

Proof.

By definition of the redistricting adjacency matrices and condition (3.1) for iid\operatorname{\textbf{iid}} sampling we have that:

Ac(i,j)=𝔼At[At(i,j)]\displaystyle\operatorname{\mathnormal{A_{c}}}(i,j)=\underset{A_{t}\sim\operatorname{\mathcal{M}}}{\operatorname{\mathbb{E}}}[A_{t}(i,j)]
=(1)Pr[i and j are in the same district]+(0)Pr[i and j are not in the same district]\displaystyle=(1)\Pr[i\text{ and }j\text{ are in the same district}]+(0)\Pr[i\text{ and }j\text{ are not in the same district}]
=Pr[i and j are in the same district]\displaystyle=\Pr[i\text{ and }j\text{ are in the same district}]

We restate the next proposition and give its proof: See 4

Proof.

For a given i,jVi,j\in V by the Hoeffding bound we have that:

Pr[|A¯c(i,j)Ac(i,j)|ϵ]12e2ϵ2T\displaystyle\Pr[|\operatorname{\mathnormal{\bar{A}_{c}}}(i,j)-\operatorname{\mathnormal{A_{c}}}(i,j)|\leq\epsilon]\geq 1-2e^{-2\epsilon^{2}T}
12e2ϵ21ϵ2lnnδ12(e2lnnδ)112δ2n2\displaystyle\geq 1-2e^{-2\epsilon^{2}\frac{1}{\epsilon^{2}}\ln{\frac{n}{\delta}}}\geq 1-2(e^{2\ln{\frac{n}{\delta}}})^{-1}\geq 1-2\frac{\delta^{2}}{n^{2}}

Now we calculate the following event:

Pr({i,jV:|A¯c(i,j)Ac(i,j)|ϵ})\displaystyle\Pr(\{\forall i,j\in V:|\operatorname{\mathnormal{\bar{A}_{c}}}(i,j)-\operatorname{\mathnormal{A_{c}}}(i,j)|\leq\epsilon\})
=1Pr({i,jV:|A¯c(i,j)Ac(i,j)|>ϵ})\displaystyle=1-\Pr(\{\exists i,j\in V:|\operatorname{\mathnormal{\bar{A}_{c}}}(i,j)-\operatorname{\mathnormal{A_{c}}}(i,j)|>\epsilon\})
1i,jV2δ2n212δ2(n2n2)n21δ21δ( since δ(0,1))\displaystyle\geq 1-\sum_{i,j\in V}2\frac{\delta^{2}}{n^{2}}\geq 1-2\delta^{2}\frac{(\frac{n^{2}-n}{2})}{n^{2}}\geq 1-\delta^{2}\geq 1-\delta\ \ \ \ \ \ \text{( since $\delta\in(0,1)$)}

Now we prove the second part. By applying the previous result with ϵ\epsilon set to ϵρn\frac{\sqrt{\epsilon}}{\sqrt{\rho}n}, we get that with probability at least 1δ1-\delta, |A¯c(i,j)Ac(i,j)|ϵρn2|\operatorname{\mathnormal{\bar{A}_{c}}}(i,j)-\operatorname{\mathnormal{A_{c}}}(i,j)|\leq\frac{\sqrt{\epsilon}}{\sqrt{\rho}n^{2}}. It follows that:

d2,Θ(A¯c,Ac)\displaystyle\operatorname{\mathnormal{d_{2,\Theta}}}(\operatorname{\mathnormal{\bar{A}_{c}}},\operatorname{\mathnormal{A_{c}}}) =12i,jVθ(i,j)(A¯c(i,j)Ac(i,j))2\displaystyle=\frac{1}{2}\sum_{i,j\in V}\theta(i,j)(\operatorname{\mathnormal{\bar{A}_{c}}}(i,j)-\operatorname{\mathnormal{A_{c}}}(i,j))^{2}
12i,jVθ(i,j)(ϵρn)212i,jVϵn2\displaystyle\leq\frac{1}{2}\sum_{i,j\in V}\theta(i,j)\Big{(}\frac{\sqrt{\epsilon}}{\sqrt{\rho}n}\Big{)}^{2}\leq\frac{1}{2}\sum_{i,j\in V}\frac{\epsilon}{n^{2}}
12ϵn2n2n2ϵ\displaystyle\leq\frac{1}{2}\frac{\epsilon}{n^{2}}\frac{n^{2}-n}{2}\leq\epsilon

We restate the next theorem and give its proof: See 5.2

Proof.

From Theorem 5.1, the population medoid is a valid redistricting map AA for which d2,Θ(A,Ac)\operatorname{\mathnormal{d_{2,\Theta}}}(A,\operatorname{\mathnormal{A_{c}}}) is minimized. Note that since AA is a redistricting map, unlike Ac\operatorname{\mathnormal{A_{c}}} it must be a binary matrix. Therefore, |A(i,j)Ac(i,j)|=(1Ac(i,j))+(2Ac(i,j)1)(1A(i,j))|A(i,j)-\operatorname{\mathnormal{A_{c}}}(i,j)|=(1-\operatorname{\mathnormal{A_{c}}}(i,j))+(2\operatorname{\mathnormal{A_{c}}}(i,j)-1)(1-A(i,j)), where this identity can be verified by plugging 0 or 1 for A(i,j)A(i,j) and seeing that it leads to an equality. Define the matrix BB as a “complement” of AA. Specifically, B(i,j)=1A(i,j)B(i,j)=1-A(i,j). It follows that B(i,j)=1B(i,j)=1 if and only if ii and jj are in different partitions and B(i,j)=0B(i,j)=0 otherwise. Clearly, BB is a binary matrix. We can obtain the following:

d2,Θ(A,Ac)=12i,jVθ(i,j)(A(i,j)Ac(i,j))2\displaystyle\operatorname{\mathnormal{d_{2,\Theta}}}(A,\operatorname{\mathnormal{A_{c}}})=\frac{1}{2}\sum_{i,j\in V}\theta(i,j)(A(i,j)-\operatorname{\mathnormal{A_{c}}}(i,j))^{2}
=12i,jVθ(i,j)((1Ac(i,j))+(2Ac(i,j)1)(1A(i,j)))2\displaystyle\quad=\frac{1}{2}\sum_{i,j\in V}\theta(i,j)\big{(}(1-\operatorname{\mathnormal{A_{c}}}(i,j))+(2\operatorname{\mathnormal{A_{c}}}(i,j)-1)(1-A(i,j))\big{)}^{2}
=12i,jVθ(i,j)((1Ac(i,j))+(2Ac(i,j)1)B(i,j))2\displaystyle\quad=\frac{1}{2}\sum_{i,j\in V}\theta(i,j)\big{(}(1-\operatorname{\mathnormal{A_{c}}}(i,j))+(2\operatorname{\mathnormal{A_{c}}}(i,j)-1)B(i,j)\big{)}^{2}
=12i,jVθ(i,j)((1Ac(i,j))2+2(1Ac(i,j))(2Ac(i,j)1)B(i,j)+(2Ac(i,j)1)2B2(i,j))\displaystyle\quad=\frac{1}{2}\sum_{i,j\in V}\theta(i,j)\big{(}(1-\operatorname{\mathnormal{A_{c}}}(i,j))^{2}+2(1-\operatorname{\mathnormal{A_{c}}}(i,j))(2\operatorname{\mathnormal{A_{c}}}(i,j)-1)B(i,j)+(2\operatorname{\mathnormal{A_{c}}}(i,j)-1)^{2}B^{2}(i,j)\big{)}
=12i,jVθ(i,j)((1Ac(i,j))2+2(1Ac(i,j))(2Ac(i,j)1)B(i,j)+(2Ac(i,j)1)2B(i,j))\displaystyle\quad=\frac{1}{2}\sum_{i,j\in V}\theta(i,j)\big{(}(1-\operatorname{\mathnormal{A_{c}}}(i,j))^{2}+2(1-\operatorname{\mathnormal{A_{c}}}(i,j))(2\operatorname{\mathnormal{A_{c}}}(i,j)-1)B(i,j)+(2\operatorname{\mathnormal{A_{c}}}(i,j)-1)^{2}B(i,j)\big{)}
=[12i,jVθ(i,j)(Ac2(i,j)2Ac(i,j)+1)][12i,jVθ(i,j)(12Ac(i,j))B(i,j)]\displaystyle\quad=\Big{[}\frac{1}{2}\sum_{i,j\in V}\theta(i,j)(\operatorname{\mathnormal{A_{c}}}^{2}(i,j)-2\operatorname{\mathnormal{A_{c}}}(i,j)+1)\Big{]}-\Big{[}\frac{1}{2}\sum_{i,j\in V}\theta(i,j)\big{(}1-2\operatorname{\mathnormal{A_{c}}}(i,j)\big{)}B(i,j)\Big{]}

Note that the first sum in the last equality is a constant and has no dependence on BB. Hence to minimize d2,Θ(A,Ac)\operatorname{\mathnormal{d_{2,\Theta}}}(A,\operatorname{\mathnormal{A_{c}}}), we maximize the following:

maxB\displaystyle\max_{B} i,jVs(i,j)B(i,j)\displaystyle\sum_{i,j\in V}s(i,j)B(i,j) (6)
s.t. B is a k partition that leads to a valid redistricting map\displaystyle B\text{ is a $k$ partition that leads to a valid redistricting map} (7)

where the weight s(i,j)s(i,j) is equal to s(i,j)=12θ(i,j)(12Ac(i,j))s(i,j)=\frac{1}{2}\theta(i,j)\big{(}1-2\operatorname{\mathnormal{A_{c}}}(i,j)\big{)}. Clearly, this is a constrained max kk-cut instance where the partition has to be a valid redistricting map.

Theorem 5.3 shows a negative result for obtaining the population medoid of a set of redistricting maps. Before we show its proof, we show the following theorem which proves a similar result over points in a 2-dimensional Euclidean space. We note that both theorems hold even if the population (true) medoid is sampled with non-zero probability:

Theorem A.1.

There exists a distribution over a set of points 𝒫\mathcal{P} in the 2-dimensional Euclidean space such that given TT many iid\operatorname{\textbf{iid}} samples {x1,,xT}\{x_{1},\dots,x_{T}\} the following holds: (1) If Tpoly(ϵ,ln(1/ρ))T\geq\text{poly}(\epsilon,\ln(1/\rho)) then with probability at least 1ρ1-\rho we have 1Tt=1xt𝔼[x]22ϵ\left\lVert\frac{1}{T}\sum_{t=1}x_{t}-\operatorname{\mathbb{E}}[x]\right\rVert^{2}_{2}\leq\epsilon. However, it is also the case that for any number of samples TT, we have: (2) Pr[minx{x1,,xT}d(x,x)0.999]23\Pr[\min_{x\in\{x_{1},\dots,x_{T}\}}d(x,x^{*})\geq 0.999]\geq\frac{2}{3} and (3) Pr[minxf(x)1.2f(x)]23\Pr[\min_{x\in}f(x)\geq 1.2f(x^{*})]\geq\frac{2}{3} where xx^{*} is the population medoid and f(.)f(.) is the medoid cost function.

Proof.
Refer to caption
Figure 4: Points γ1,γ2,γ3\gamma_{1},\gamma_{2},\gamma_{3}, and γ4\gamma_{4} all lie on the circle and the angle between any two adjacent points is 9090^{\circ}. Point γ5\gamma_{5} is a distance \ell from the circle center cc. The center cc is also the origin point of the 2-dd plane (0,0)(0,0).

Consider the set of points 𝒫={γ1,γ2,γ3,γ4,γ5}\mathcal{P}=\{\gamma_{1},\gamma_{2},\gamma_{3},\gamma_{4},\gamma_{5}\} shown in Figure 4. Specifically, points γ1,γ2,γ3\gamma_{1},\gamma_{2},\gamma_{3} and γ4\gamma_{4} lie on a circle of radius 11 at an equal separation. Point γ5\gamma_{5} lies closer to the center of the circle cc at a distance of \ell. Further, each point γi\gamma_{i} has a probability pip_{i} of being sampled. Specifically, p5=δp_{5}=\delta and p1=p2=p3=p4=1δ4p_{1}=p_{2}=p_{3}=p_{4}=\frac{1-\delta}{4}.

The centroid of the distribution is the expected value, i.e. 𝔼[x]=j=15pjγj\operatorname{\mathbb{E}}[x]=\sum_{j=1}^{5}p_{j}\gamma_{j}. It is straightforward to show that given iid\operatorname{\textbf{iid}} samples {x1,x2,,xT}\{x_{1},x_{2},\dots,x_{T}\} where T=Ω(ln(1/ρ)ϵ)T=\Omega(\frac{\ln(1/\rho)}{\epsilon}), then 1Tt=1Tx^t𝔼[x]22ϵ\left\lVert\frac{1}{T}\sum_{t=1}^{T}\hat{x}_{t}-\operatorname{\mathbb{E}}[x]\right\rVert^{2}_{2}\leq\epsilon with probability at least 1ρ1-\rho. This follows by applying the generalized Hoeffding inequality (see (Ashtiani, Kushagra, and Ben-David 2016) for example). This proves the first statement.

Now, given a candidate medoid point γ\gamma^{\prime}, the medoid cost function is f(γ)=j=15pjd(γ,γj)f(\gamma^{\prime})=\sum_{j=1}^{5}p_{j}d(\gamma^{\prime},\gamma_{j}). Note that the medoid is the value of γ\gamma^{\prime} which minimizes f(γ)f(\gamma^{\prime}) and also γ\gamma^{\prime} must be an element of the set, i.e. γ𝒫\gamma^{\prime}\in\mathcal{P}. It is easy to verify that f(γ1)=1δ4[d(γ1,γ1)+d(γ1,γ2)+d(γ1,γ3)+d(γ1,γ4)]+δd(γ1,γ5)>1δ4[0+22+2]=1δ2(1+2)f(\gamma_{1})=\frac{1-\delta}{4}[d(\gamma_{1},\gamma_{1})+d(\gamma_{1},\gamma_{2})+d(\gamma_{1},\gamma_{3})+d(\gamma_{1},\gamma_{4})]+\delta d(\gamma_{1},\gamma_{5})>\frac{1-\delta}{4}[0+2\sqrt{2}+2]=\frac{1-\delta}{2}(1+\sqrt{2}). By symmetry it also follows that f(γ2),f(γ3),f(γ4)f(\gamma_{2}),f(\gamma_{3}),f(\gamma_{4}) are each lower bounded by 1δ2(1+2)\frac{1-\delta}{2}(1+\sqrt{2}).

On the other hand, we have f(γ5)=1δ4j=14d(γ5,γj)1δ4j=14[d(γ5,c)+d(c,γj)]1δ4(4(1+))=(1δ)(1+)f(\gamma_{5})=\frac{1-\delta}{4}\sum_{j=1}^{4}d(\gamma_{5},\gamma_{j})\leq\frac{1-\delta}{4}\sum_{j=1}^{4}[d(\gamma_{5},c)+d(c,\gamma_{j})]\leq\frac{1-\delta}{4}(4(1+\ell))=(1-\delta)(1+\ell). Further, for any i5:f(γi)f(γ5)1δ2(1+2)(1δ)(1+)=(1+2)2(1+11000)>1.2i\neq 5:\frac{f(\gamma_{i})}{f(\gamma_{5})}\geq\frac{\frac{1-\delta}{2}(1+\sqrt{2})}{(1-\delta)(1+\ell)}=\frac{(1+\sqrt{2})}{2(1+\frac{1}{1000})}>1.2 by setting =11000\ell=\frac{1}{1000}, and therefore γ5\gamma_{5} is the medoid. Further, i{1,2,3,4}:d(γi,γ5)d(γi,c)d(c,γ5)10.999\forall i\in\{1,2,3,4\}:d(\gamma_{i},\gamma_{5})\geq d(\gamma_{i},c)-d(c,\gamma_{5})\geq 1-\ell\geq 0.999. Therefore, to prove the second and third statements it is sufficient to prove that γ5\gamma_{5} would be sampled with probability at most 13\frac{1}{3} in TT many iid\operatorname{\textbf{iid}} samples:

Pr[Sampling γ5 in T iid samples]\displaystyle\Pr[\text{Sampling $\gamma_{5}$ in $T$ $\operatorname{\textbf{iid}}$ samples}]
=1Pr[Not sampling γ5 in T iid samples]\displaystyle=1-\Pr[\text{Not sampling $\gamma_{5}$ in $T$ $\operatorname{\textbf{iid}}$ samples}]
=1(1δ)T13\displaystyle=1-(1-\delta)^{T}\leq\frac{1}{3}
δ123T\displaystyle\iff\delta\leq 1-\sqrt[T]{\frac{2}{3}}

Therefore, by setting δ123T\delta\leq 1-\sqrt[T]{\frac{2}{3}} the distribution satisfies all three statement simultaneously. ∎

Now we restate Theorem 5.3 and show its proof: See 5.3

Proof.
Refer to caption
Figure 5: The graph shows a hypothetical state. Blue edges indicate that the vertices are adjacent geographically. All vertices have a weight (population) of 1, except for states {a1,b1,a4,b4}\{a_{1},b_{1},a_{4},b_{4}\} which have a weight of 12\frac{1}{2}.

Consider the hypothetical state shown in Figure 5 where vertices v1v_{1} and v4v_{4} are further subdividing into two vertices each. We wish to divide the state into 22 districts (k=2k=2). Since each vertex has a weight of 11, except vertices {a1,b1,a4,b4}\{a_{1},b_{1},a_{4},b_{4}\} which each have a weight of 12\frac{1}{2}, then each district should have a population of 22 to enforce the equal population rule with a tolerance less than 0.250.25.

Denoting the set of all vertices by VV and letting V={a1,b1,a4,b4}V^{\prime}=\{a_{1},b_{1},a_{4},b_{4}\}, then the weight parameters of our weighted distance measure are defined as follows:

θ(i,j)={ϵif i&jV12if iVV,jV or iV,jVV1otherwise\displaystyle\theta(i,j)=\begin{cases}\epsilon&\text{if $i\ \&\ j\in V^{\prime}$}\\ \frac{1}{2}&\text{if $i\in V-V^{\prime},j\in V^{\prime}$ or $i\in V^{\prime},j\in V-V^{\prime}$}\\ 1&\text{otherwise}\end{cases}
Refer to caption
Figure 6: Maps M1,M2,M3M_{1},M_{2},M_{3}, and M4M_{4}. Vertices in the same district are connected with edges.

Where 0<ϵ10<\epsilon\leq 1. Now, consider the maps M1,M2,M3M_{1},M_{2},M_{3}, and M4M_{4} shown in Figure 6. Based on the definition of the weighted distance measure, it is not difficult to see that given maps MsM_{s} and MtM_{t}, then dΘ(Ms,Mt)\operatorname{\mathnormal{d_{\Theta}}}(M_{s},M_{t}) can be computed visually by drawing the adjacent graphs of MsM_{s} and MtM_{t} and then finding the minimum number of edges that have to be deleted and added to MsM_{s} to produce MtM_{t} and adding the weighted θ(i.j)\theta(i.j) of these edges. By following this procedure, we can show that dΘ(M1,M3)=6+4ϵ\operatorname{\mathnormal{d_{\Theta}}}(M_{1},M_{3})=6+4\epsilon for example as shown in Figure 7. Here we list all distances:

dΘ(M1,M2)=4\displaystyle\operatorname{\mathnormal{d_{\Theta}}}(M_{1},M_{2})=4
dΘ(M1,M4)=dΘ(M2,M4)=2+4ϵ\displaystyle\operatorname{\mathnormal{d_{\Theta}}}(M_{1},M_{4})=\operatorname{\mathnormal{d_{\Theta}}}(M_{2},M_{4})=2+4\epsilon
dΘ(M1,M3)=dΘ(M2,M4)=(2+4ϵ)+4=6+4ϵ\displaystyle\operatorname{\mathnormal{d_{\Theta}}}(M_{1},M_{3})=\operatorname{\mathnormal{d_{\Theta}}}(M_{2},M_{4})=(2+4\epsilon)+4=6+4\epsilon
dΘ(M3,M4)=4\displaystyle\operatorname{\mathnormal{d_{\Theta}}}(M_{3},M_{4})=4

Given a map MM, the medoid cost function is defined as f(M)=MpMd(M,M)f(M)=\sum_{M^{\prime}\in\operatorname{\mathcal{M}}}p_{M^{\prime}}d(M,M^{\prime}). Let the probabilities for the redistricting maps be assigned as follows: p1=p2=p3=1δ3p_{1}=p_{2}=p_{3}=\frac{1-\delta}{3} whereas Pr[Sampling a Map M{M1,M2,M3}]=δ>0\Pr[\text{Sampling a Map $M\notin\{M_{1},M_{2},M_{3}\}$}]=\delta>0. Accordingly, p5δp_{5}\leq\delta.

Further, since θ(i,j)1,i,jV\theta(i,j)\leq 1\ ,\forall i,j\in V, then maximum distance between any redistricting maps DD can be upper bounded by the highest number of edges that can be deleted and added from one map to produce another, therefore D2(|V|2)=|V|(|V|1)=10×9=90D\leq 2{|V|\choose 2}=|V|(|V|-1)=10\times 9=90 since |V|=10|V|=10 (see Figure 5). The medoid cost function can be lower bounded for M1,M2,M_{1},M_{2}, and M3M_{3} and upper bounded for M4M_{4} as shown below:

f(M1)>1δ3[d(M1,M2)+d(M1,M3)]=1δ3(10+4ϵ)\displaystyle f(M_{1})>\frac{1-\delta}{3}[d(M_{1},M_{2})+d(M_{1},M_{3})]=\frac{1-\delta}{3}(10+4\epsilon)
f(M2)>1δ3[d(M1,M2)+d(M2,M3)]=1δ3(10+4ϵ)\displaystyle f(M_{2})>\frac{1-\delta}{3}[d(M_{1},M_{2})+d(M_{2},M_{3})]=\frac{1-\delta}{3}(10+4\epsilon)
f(M3)>1δ3[d(M1,M3)+d(M2,M3)]=1δ3(12+8ϵ)\displaystyle f(M_{3})>\frac{1-\delta}{3}[d(M_{1},M_{3})+d(M_{2},M_{3})]=\frac{1-\delta}{3}(12+8\epsilon)
f(M4)<1δ3[d(M1,M4)+d(M2,M4)+d(M3,M4)]+δD=1δ3(8+8ϵ)+90δ1δ3(9+8ϵ)\displaystyle f(M_{4})<\frac{1-\delta}{3}[d(M_{1},M_{4})+d(M_{2},M_{4})+d(M_{3},M_{4})]+\delta D=\frac{1-\delta}{3}(8+8\epsilon)+90\delta\leq\frac{1-\delta}{3}(9+8\epsilon)

Where the last inequality was obtained by setting δ<1271\delta<\frac{1}{271} since it follows that 90δ<1δ390\delta<\frac{1-\delta}{3}. From the above bounds it follows that the population medoid cannot cannot be M1,M2M_{1},M_{2}, or M3M_{3}.

Set ϵ=11000\epsilon=\frac{1}{1000} and δ11000<1271\delta\leq\frac{1}{1000}<\frac{1}{271} and with 1(p1+p2+p3)=δ1-(p_{1}+p_{2}+p_{3})=\delta, then 1δ3(14ϵ)>0.331\frac{1-\delta}{3}(1-4\epsilon)>0.331 and (10+4ϵ)(9+8ϵ)>1.11\frac{(10+4\epsilon)}{(9+8\epsilon)}>1.11. With the population medoid denoted by MM^{*}, then we have: mini{1,2,3}f(Mi)f(M)mini{1,2,3}f(Mi)f(M4)1δ3(10+4ϵ)1δ3(9+8ϵ)>1.11\min\limits_{i\in\{1,2,3\}}\frac{f(M_{i})}{f(M^{*})}\geq\min\limits_{i\in\{1,2,3\}}\frac{f(M_{i})}{f(M_{4})}\geq\frac{\frac{1-\delta}{3}(10+4\epsilon)}{\frac{1-\delta}{3}(9+8\epsilon)}>1.11. Further, it follows by the triangle inequality that i{1,2,3}:f(Mi)f(M)+d(Mi,M)\forall i\in\{1,2,3\}:f(M_{i})\leq f(M^{*})+d(M_{i},M^{*}), thus d(Mi,M)1δ3(14ϵ)d(M_{i},M^{*})\geq\frac{1-\delta}{3}(1-4\epsilon), since otherwise f(M)+d(Mi,M)f(M4)+d(Mi,M)<1δ3(9+8ϵ)+1δ3(14ϵ)=1δ3(10+4ϵ)f(M^{*})+d(M_{i},M^{*})\leq f(M_{4})+d(M_{i},M^{*})<\frac{1-\delta}{3}(9+8\epsilon)+\frac{1-\delta}{3}(1-4\epsilon)=\frac{1-\delta}{3}(10+4\epsilon) which would be a contradiction.

From the above we have shown that, i{1,2,3}:d(Mi,M)1δ3(14ϵ)>0.331\forall i\in\{1,2,3\}:d(M_{i},M^{*})\geq\frac{1-\delta}{3}(1-4\epsilon)>0.331 and mini{1,2,3}f(Mi)f(M)1.11\min\limits_{i\in\{1,2,3\}}\frac{f(M_{i})}{f(M^{*})}\geq 1.11, therefore to prove parts (1) and (2) of the theorem it is sufficient to upper bound the probability of sampling a map that is not in {M1,M2,M3}\{M_{1},M_{2},M_{3}\} in TT iid samples by 13\frac{1}{3}. This leads to the following:

Pr[Obtaining a map M{M1,M2,M3} in a given T iid samples]\displaystyle\Pr[\text{Obtaining a map $M\notin\{M_{1},M_{2},M_{3}\}$ in a given $T$ $\ \operatorname{\textbf{iid}}$ samples}]
=1Pr[No map M{M1,M2,M3} in the given T iid samples]\displaystyle=1-\Pr[\text{No map $M\in\{M_{1},M_{2},M_{3}\}$ in the given $T$ $\ \operatorname{\textbf{iid}}$ samples}]
=1(1δ)T13\displaystyle=1-(1-\delta)^{T}\leq\frac{1}{3}

Therefore, from the above we should have δ=min{11000,123T}\delta=\min\{\frac{1}{1000},1-\sqrt[T]{\frac{2}{3}}\} to satisfy both parts of the theorem.

Refer to caption
Figure 7: The first map is M1M_{1} and the last is M3M_{3}. The middle map shows the edges the should be deleted from M1M_{1} (marked with X) and the edges should be added to M1M_{1} (dashed green edges) to produce M3M_{3}. The weight of each edge that is deleted or added is shown next to it in blue. By adding the weights we get that dΘ(M1,M3)=6+4ϵ\operatorname{\mathnormal{d_{\Theta}}}(M_{1},M_{3})=6+4\epsilon.

Remark:

In both theorems A.1 and 5.3 the probability of “failure” is set to 23\frac{2}{3} but this is arbitrary as we can make it arbitrarily large by choosing smaller values of δ\delta. But our objective was simply to show that no sampled map would converge to the population medoid or would have a medoid cost function value that converges to the value of medoid cost function of the population medoid. Further, both theorems would hold if the population medoid is sampled with probability zero, but in our proofs we allowed the population medoid to be sampled with non-zero probability to show that the negative result would still hold even if we were to assume that the population medoid is sampled with non-zero probability.

Appendix B Additional Experimental Results

B.1 Convergence to the centroid

As a reminder to further test the robustness of our results and see the convergence, all calculations are done 3 times for each state, each time starting from a specific seed map101010For MD we use only two seeds and do the calculations twice instead of three times.. For example, we estimate the centroid three times, by sampling starting from seed maps s1s_{1}, s2s_{2}, and s3s_{3}111111Seed maps are either already enacted maps or produced using optimization functions from the GerryChain toolbox: https://github.com/mggg/GerryChain. and as a result we end up with three centroids c1c_{1}, c2c_{2}, and c3c_{3}. Here we further verify the convergence of the centroids by showing that the final estimated centroids are close to each other. In fact, if we denote the smallest d2,Θ(.,.)\operatorname{\mathnormal{d_{2,\Theta}}}(.,.) distance between a sampled map and any of the centroids c1c_{1}, c2c_{2}, or c3c_{3} by d2,Θmin\operatorname{\mathnormal{d_{2,\Theta}}}_{\min}. Then we consistently find –for all states (NC,PA,MD) and all distance measures (unweighted and population-weighted)– that for any two estimated centroids cic_{i} and cjc_{j} that d2,Θ(ci,cj)d2,Θmin\operatorname{\mathnormal{d_{2,\Theta}}}(c_{i},c_{j})\leq\operatorname{\mathnormal{d_{2,\Theta}}}_{\min} by at least 22 or 33 orders of magnitude. This is strong evidence that while the differences in the estimation may be large in value, they are very small relative to the distances between other maps. Therefore, at this scale the estimation error is small. We note moreover the while we used 200,000200,000 samples to estimate the centroids for NC, we use only 50,00050,000 for PA and MD in each seed run and interestingly find that 50,00050,000 samples are sufficient. Table 1 shows the maximum d2,Θ(.,.)\operatorname{\mathnormal{d_{2,\Theta}}}(.,.) distance between any two estimated centroid and d2,Θmin\operatorname{\mathnormal{d_{2,\Theta}}}_{\min}.

Table 1: Maximum Distance Between Any Two Centroids vs Minimum Distance between a Centroid and a Sampled Map
STATE Distance Measure maxi,j{1,2,3}d2,Θ(ci,cj)\max\limits_{i,j\in\{1,2,3\}}\operatorname{\mathnormal{d_{2,\Theta}}}(c_{i},c_{j})         d2,Θmin\operatorname{\mathnormal{d_{2,\Theta}}}_{\min}
NC Unweighted 1.2051×1021.2051\times 10^{2}         1.051030×1051.051030\times 10^{5}
NC Population-Weighted 1.5446×1091.5446\times 10^{9}          1.333014×10121.333014\times 10^{12}
PA Unweighted 5.0794×1035.0794\times 10^{3}          1.099175×1061.099175\times 10^{6}
PA Population-Weighted 1.1090×10101.1090\times 10^{10}          2.036627×10122.036627\times 10^{12}
MD Unweighted 2.5067×1022.5067\times 10^{2}          5.033550×1045.033550\times 10^{4}
MD Population-Weighted 2.7067×1092.7067\times 10^{9}          5.015410×10115.015410\times 10^{11}

B.2 Distance Histograms and Detecting Gerrymandered Maps

Here we produce the histogram maps for all states and all distance measures. Each histogram is produced for a given state and distance measure starting from a specific centroid and by sampling from its corresponding seed, e.g. we sample maps starting from seed s1s_{1} and calculate d2,Θ(M,c1)\operatorname{\mathnormal{d_{2,\Theta}}}(M,c_{1}) between a sampled map MM and centroid c1c_{1} to construct one histogram. We find similar observations to Figure 2, indeed we see that the shape of the histograms is stable across all states, distances, and centroids. I.e., all of the distance histograms peak in the middle and fall away from the middle similar to a normal distribution. Further, our observations about the gerrymandered maps of NC (Figure 8) still hold. In addition, we show similar observations in PA where we find that the gerrymandered map that was struck down by the state supreme court (LWV vs Commonwealth of Pennsylvania No. 159 MM 2018) is an outlier in terms of distance. On the other hand, the remedial map has a much smaller distance and appears near the peak of the histogram.

Tables 2, 3, 4, and 5 list percentile distances for the above mentioned maps in NC and PA according to both unweighted and weighted distance measures. We see that the three gerrymander maps (the 2016 and 2011 enacted maps for NC and the 2011 enacted map for PA) are all above the 99th percentile for both distance measures and all three seeds. On the other hand, the NC Judge’s map and PA remedial map are near the peak of the histogram. In addition, the percentiles for these maps are fairly consistent across seeds for any given combination of state and distance measure.

Refer to caption
Figure 8: Distance histograms for NC, the distances of gerrymandered maps are indicated with red markers whereas the distances of the remedial maps are indicated with green markers. The {\color[rgb]{1,0,0}\circ} and the X are for 2011 and 2016 enacted maps, respectively.
Refer to caption
Figure 9: Distance histograms for PA, the distances of gerrymandered maps are indicated with red markers whereas the distances of the remedial maps are indicated with green markers..
Refer to caption
Figure 10: Distance histograms for MD.
Table 2: Percentile distances for special maps in NC using the unweighted distance measure.
Starting Seed Judges’ map 2016 enacted map 2011 enacted map
Seed 1 58.362 99.942 100.0
Seed 2 57.004 99.927 100.0
Seed 3 54.558 99.943 100.0
Table 3: Percentile distances for special maps in NC using the weighted distance measure.
Starting Seed Judges’ map 2016 enacted map 2011 enacted map
Seed 1 65.359 99.999 100.0
Seed 2 64.454 99.997 100.0
Seed 3 63.238 99.996 100.0
Table 4: Percentile distances for special maps in PA using the unweighted distance measure.
Starting Seed Remedial map 2011 enacted map
Seed 1 49.992 100.0
Seed 2 51.875 100.0
Seed 3 48.715 100.0
Table 5: Percentile distances for special maps in PA using the weighted distance measure.
Starting Seed Remedial map 2011 enacted map
Seed 1 22.383 100.0
Seed 2 20.956 100.0
Seed 3 21.813 100.0

B.3 Finding the Medoid

Similar to the previous experiments, we produce different medoids, one for each state, seed, and distance measure. Having chosen the state and distance measure (unweighted or population-weighted), then for a given centroid cic_{i} and its corresponding seed sis_{i}, we start by sampling T1T_{1} maps from the seed and pick the sampled map MM whose distance d2,Θ(M,ci)\operatorname{\mathnormal{d_{2,\Theta}}}(M,c_{i}) is the smallest. We refer to this map as the initial medoid. Clearly, we would have 3 such maps for each state and distance measure. Having found the initial map, we start sampling again but this time starting from the initial map. However, we modify the transition in the ReCom\operatorname{\textbf{ReCom}} chain. Specifically, if we are at a state (map) MM, then we only transition to a new state (map) MM^{\prime} if its closer to the centroid, i.e. d2,Θ(M,ci)<d2,Θ(M,ci)\operatorname{\mathnormal{d_{2,\Theta}}}(M^{\prime},c_{i})<\operatorname{\mathnormal{d_{2,\Theta}}}(M,c_{i}). Doing this for T2T_{2} iterations, we obtain the final medoid. Again we would have one final medoid map for each state, seed, and distance measure. We use T1=T2=200,000T_{1}=T_{2}=200,000 for NC whereas for PA and MD we have T1=50,000T_{1}=50,000 and T2=15,000T_{2}=15,000. Empirically, we find that T1=50,000T_{1}=50,000 and T2=15,000T_{2}=15,000 are sufficient (see Table 6 and the rest of this section for more discussion).

Relative Error Measure:

We have seen in Subsection B.1 that we have convergence in the centroid and therefore we assume that we are dealing with one centroid A¯c\operatorname{\mathnormal{\bar{A}_{c}}} and that A¯c\operatorname{\mathnormal{\bar{A}_{c}}} is a very good approximation of the population centroid Ac\operatorname{\mathnormal{A_{c}}}. Since the medoid is supposed to be the closest valid map to the centroid, for any two given mediods M^1\mathnormal{\hat{M}_{1}} and M^2\mathnormal{\hat{M}_{2}} we calculate the relative error between them as follows:

RE(M1,M2)=|d2,Θ(M^1,A¯c)d2,Θ(M^2,A¯c)|min{d2,Θ(M^1,A¯c),d2,Θ(M^2,A¯c)}\displaystyle\operatorname{\textbf{RE}}(M_{1},M_{2})=\frac{|\operatorname{\mathnormal{d_{2,\Theta}}}(\mathnormal{\hat{M}_{1}},\operatorname{\mathnormal{\bar{A}_{c}}})-\operatorname{\mathnormal{d_{2,\Theta}}}(\mathnormal{\hat{M}_{2}},\operatorname{\mathnormal{\bar{A}_{c}}})|}{\min\{\operatorname{\mathnormal{d_{2,\Theta}}}(\mathnormal{\hat{M}_{1}},\operatorname{\mathnormal{\bar{A}_{c}}}),\operatorname{\mathnormal{d_{2,\Theta}}}(\mathnormal{\hat{M}_{2}},\operatorname{\mathnormal{\bar{A}_{c}}})\}}

We use the relative error measure RE(.,.)\operatorname{\textbf{RE}}(.,.) to see how the final medoids approximate the medoid cost function relative to one another.

General Observations and Conclusion:

Here we note the observations we reach from the following subsections: (1) For a given state and seed, the initial medoids are the same regardless of distance used (unweighted or population-weighted). (2) For NC and MD, the final medoids are visually very similar, however for PA which is much larger we notice significant differences. (3) Medoid maps can lead to different election results even if they are visually very similar and very close in terms of distance dΘ\operatorname{\mathnormal{d_{\Theta}}}. (4) Medoid maps do not lead to rare election outcomes and the election outcome is at the peak of the election histogram or close to it. (5) The relative error measure RE(.,.)\operatorname{\textbf{RE}}(.,.) over final medoid is in general very small, at most equal to 3.87%3.87\%. The fact that we find final medoids with small relative error but that are still different may be explained by the fact that the medoid (or approximate medoids) may not be unique. Regardless, we believe the final medoids maps we obtain are useful as starting maps that can be refined to draw final redistricting maps to be enacted.

Initial Medoids and Final Medoids

Figures 11, 12, and 13 show the initial and final medoids for NC, PA, and MD, respectively. We note that in all figures, that maps in the first column are all using seed s1s_{1} and centroid c1c_{1}, the second column are using seed s2s_{2} and centroid c2c_{2}, and so on. Further, the initial medoid maps are found to be the same for both distances (unweighted or population-weighted). In general, we notice that the NC and MD final medoid maps are much similar to one another than PA as there are large regions in PA that are clearly assigned to different districts.

Refer to caption
Figure 11: NC Medoids. (Top Row): Initial Medoids, (Middle Row): Final Unweighted Distance Medoids, and (Bottom Row): Final Population-Weighted Distance Medoids. Each column is for a specific seed and its associated centroid.
Refer to caption
Figure 12: PA Medoids. (Top Row): Initial Medoids, (Middle Row): Final Unweighted Distance Medoids, and (Bottom Row): Final Population-Weighted Distance Medoids. Each column is for a specific seed and its associated centroid.
Refer to caption
Figure 13: MD Medoids. (Top Row): Initial Medoids, (Middle Row): Final Unweighted Distance Medoids, and (Bottom Row): Final Population-Weighted Distance Medoids. Each column is for a specific seed and its associated centroid.

Relative Error RE(.,.)\operatorname{\textbf{RE}}(.,.) between the Final Medoids

Since we have three final medoids for each state and distance measure, we report the maximum relative error value that can be obtained using any pair of maps. Table 6 shows the maximum relative error percentage value, clearly the largest relative error value is at most around 3.87%3.87\%.

Table 6: Maximum Relative Error RE\operatorname{\textbf{RE}} between final medoids for each state and distance measure
STATE Distance Measure Max RE(.,)\operatorname{\textbf{RE}}(.,) Percentage Value Over Final Medoid Pair
NC Unweighted               0.17010.1701
NC Population-Weighted               3.25063.2506
PA Unweighted               1.34821.3482
PA Population-Weighted               3.52103.5210
MD Unweighted               3.61673.6167
MD Population-Weighted               3.87493.8749

Election Histograms and Medoid Election Results

Here we show the election histogram for each state using votes for a specific election. First, note according to the distribution of votes in each district the number of seats won by a specific party can be calculated. Since we have two-party results, we show only the seats for one party (Democratic party). Finding election histograms using sampling methods is well-established (DeFord, Duchin, and Solomon 2019; Herschlag et al. 2020) therefore we show only one histogram for each state, see Figures 14,15, and 16. We note that the election histograms are the same independent of the choice of seed (as expected). Further, election histograms are not related to a centroid or a chosen distance measure. For each state and distance measure, we record the number of seats for the final medoid maps as shown in the election tables below. In general, medoid maps lead to election results that have high probability, the outcome with the highest probability or close to it. Moreover, we find that the election outcomes of different medoids can significantly overlap.

Refer to caption
Figure 14: NC Election Histograms.
Refer to caption
Figure 15: PA Election Histograms.
Refer to caption
Figure 16: MD Election Histograms.
Table 7: Election results for the unweighted NC medoids.
Starting Seed Governor 16 Senate 16 Presidential 16
Seed 1 5 4 5
Seed 2 6 4 5
Seed 3 6 4 5
Table 8: Election results for the weighted NC medoids.
Starting Seed Governor 16 Senate 16 Presidential 16
Seed 1 5 4 4
Seed 2 6 4 5
Seed 3 6 4 6
Table 9: Election results for the unweighted PA medoids.
Starting Seed Attorney General 16 Governor 14 Presidential 16 Senate 16
Seed 1 8 10 6 6
Seed 2 8 9 7 6
Seed 3 7 10 6 6
Table 10: Election results for the weighted PA medoids.
Starting Seed Attorney General 16 Governor 14 Presidential 16 Senate 16
Seed 1 8 9 7 5
Seed 2 8 9 7 6
Seed 3 8 10 6 6
Table 11: Election results for the unweighted MD medoids.
Starting Seed Attorney General 18 Governor 14 Presidential 16 Senate 16
Seed 1 6 4 6 5
Seed 2 6 4 6 6
Table 12: Election results for the weighted MD medoids.
Starting Seed Attorney General 18 Governor 14 Presidential 16 Senate 16
Seed 1 6 4 6 5
Seed 2 6 4 6 6

Appendix C Further Details on Gerrymandering Detection

C.1 Gerrymandering Interpretation of Distance Outlier Maps

What does it mean for a map AG\operatorname{\mathnormal{A_{G}}} to a have very large distance d2,Θ(AG,A¯c)\operatorname{\mathnormal{d_{2,\Theta}}}(\operatorname{\mathnormal{A_{G}}},\operatorname{\mathnormal{\bar{A}_{c}}}) from the centroid? Here, we show two interesting and mathematically justified interpretations which support the claim that such a map AGA_{G} should be considered gerrymandered. For ease of representation, we assume that we are in the asymptotic regime, i.e. A¯c=Ac\operatorname{\mathnormal{\bar{A}_{c}}}=\operatorname{\mathnormal{A_{c}}}:

(1) AG\operatorname{\mathnormal{A_{G}}} is Highly Dissimilar from the Ensemble.

We show that the distance from the centroid is a direct measure of the average distance from the ensemble of maps. This follows by direct manipulation of the decomposition theorem 5.1:

d2,Θ(AG,A¯c)=1Tt=1TdΘ(AG,At)Average distance between map AG and the ensemble1Tt=1Td2,Θ(At,A¯c)Constant independent of the map AG\displaystyle\operatorname{\mathnormal{d_{2,\Theta}}}(\operatorname{\mathnormal{A_{G}}},\operatorname{\mathnormal{\bar{A}_{c}}})=\underbrace{\frac{1}{T}\sum_{t=1}^{T}\operatorname{\mathnormal{d_{\Theta}}}(\operatorname{\mathnormal{A_{G}}},A_{t})}_{\text{Average distance between map $\operatorname{\mathnormal{A_{G}}}$ and the ensemble}}-\underbrace{\frac{1}{T}\sum_{t=1}^{T}\operatorname{\mathnormal{d_{2,\Theta}}}(A_{t},\operatorname{\mathnormal{\bar{A}_{c}}})}_{\text{Constant independent of the map $\operatorname{\mathnormal{A_{G}}}$}} (8)

Therefore, the above equation shows that a map AG\operatorname{\mathnormal{A_{G}}} with a large distance from the centroid has to be highly dissimilar from the ensemble. This gives further justification for why maps that are outliers from the centroid such as those of NC and PA (marked in the histograms of Figures 8 and 9) should be considered to be gerrymandered. It also helps explain why maps which are outliers in terms of partisan election results would also tend to be outliers according to our non-partisan metric. Intuitively, maps which produce rare election results with respect to the ensemble should have rare structure with respect to the ensemble. Ideally, the metric we propose is then able to detect abnormal maps which might disadvantage groups beyond those harmed by the typical partisan or racial gerrymandering we know to look for.

Importantly, we note a significant computational advantage, since in general finding the average distance of a map AG\operatorname{\mathnormal{A_{G}}} from TT sampled maps would require Ω(T)\Omega(T) time, but with the above equation we only need to find the distance between the map AG\operatorname{\mathnormal{A_{G}}} and the centroid.

(2) AG\operatorname{\mathnormal{A_{G}}} Separates Same District Vertices and Unites Separated Vertices.

Proposition 3 shows that Ac(i,j)=Pr[i and j in the same district]\operatorname{\mathnormal{A_{c}}}(i,j)=\Pr[i\text{ and }j\text{ in the same district}], i.e., the probability that vertices ii and jj are in the same district equals Ac(i,j)\operatorname{\mathnormal{A_{c}}}(i,j). Therefore consider the unweighted d2,Θ\operatorname{\mathnormal{d_{2,\Theta}}} distance, i.e. d2(AG,A¯c)=12i,jV(AG(i,j)A¯c(i,j))2d_{2}(\operatorname{\mathnormal{A_{G}}},\operatorname{\mathnormal{\bar{A}_{c}}})=\frac{1}{2}\sum_{i,j\in V}(\operatorname{\mathnormal{A_{G}}}(i,j)-\operatorname{\mathnormal{\bar{A}_{c}}}(i,j))^{2}. It follows that if d2(AG,A¯c)d_{2}(\operatorname{\mathnormal{A_{G}}},\operatorname{\mathnormal{\bar{A}_{c}}}) is abnormally large, then AG\operatorname{\mathnormal{A_{G}}} is following a different distribution where on average if vertices ii and jj were usually in the same district (i.e., Pr[i and j in the same district]>0.5\Pr[i\text{ and }j\text{ in the same district}]>0.5), they are placed in different districts and if ii and jj were usually separated ( i.e., Pr[i and j in the same district]<0.5\Pr[i\text{ and }j\text{ in the same district}]<0.5), they are placed in the same district. A similar interpretation follows for weighted distances, but with the weight θ(i,j)\theta(i,j) taken into account, i.e., higher cost if θ(i,j)\theta(i,j) is larger.

C.2 Comparison to Previous Work on Detecting Gerrymandering

With the exception of the work of Abrishami et al. (2020), the methods that were introduced such as Herschlag et al. (2020); Chikina, Frieze, and Pegden (2017); Ko et al. (2022); Lin et al. (2022); DeFord, Duchin, and Solomon (2019) to detect gerrymandering have all been based on election outcomes and therefore are orthogonal to our work. Now, we provide more details about Abrishami et al. (2020). The work of Abrishami et al. (2020) introduces a distance measure over redistricting, but finding the distance between two maps requires solving a linear program which is significantly more computationally expensive then our distance measure. Further, Abrishami et al. (2020) did not define a medoid or centroid map. Moreover, their work only considered the 2011 and 2016 gerrymandered maps of North Carolina unlike our work which also considered the 2011 gerrymandered instance of Pennsylvania. More importantly, the method used in Abrishami et al. (2020) to show that the 2011 and 2016 North Carolina maps are gerrymandered amounts to sampling 100 maps and then applying multidimensional scaling (MDS) a classical method for visualizing high dimensional data in lower dimensional space (Borg et al. 2013; Mead 1992). Figure 17 is from (Abrishami et al. 2020) and shows the visualization. Note that the 2012 map in the figure is actually the same as the 2011 map we use (the map was enacted in 2011 but used in 2012). One can see that the gerrymandered maps in red lie further away unlike the judges’ map. Based on the above discussion one can identify the following limitations in (Abrishami et al. 2020):

  1. 1.

    Small Sample Size: In previous work in redistricting such as (Chikina, Frieze, and Pegden 2017; DeFord, Duchin, and Solomon 2019; Herschlag et al. 2020) we have seen tens and even hundreds of thousands of maps used to estimate a quantity. However, (Abrishami et al. 2020) only uses 100 maps which is a significant limitation that could put into question the statistical accuracy. This is an outcome of their distance measure which is computationally expensive.

  2. 2.

    Heuristic Method with No Outlier Score: (Abrishami et al. 2020) uses a high dimensional visualization method (MDS). One necessarily loses information when projecting to a lower dimensional space and the final visualization is not considered to give a necessarily accurate view of the data in higher dimensional space. Furthermore, no outlier score is given for the gerrymandered maps, but rather the observation that they appear further away. Not having a clear numerical value is a clear deficiency in this method. Moreover, in Figure 17 there exists a large number of ordinary maps sampled from ReCom\operatorname{\textbf{ReCom}} that are comparatively as far away at the exterior of the visualization as the red gerrymandered maps of North Carolina.

Our framework resolves both issues above. It can use as much as 200,000200{,}000 maps or more and gives an outlier score in terms of percentile distance.

Refer to caption
Figure 17: Figure from (Abrishami et al. 2020) shown as Figure 15 (b) in the reference. The black points are sampled using ReCom\operatorname{\textbf{ReCom}}.
Limitation of Our Method:

Our method is a distance (dissimilarity) based method. As we have seen, it resolves significant issues and introduces a rigorous and statistically stable framework to detect maps that are outlier in terms of distance. However, this may or may not be an advantage since a gerrymandering legislature may select a map that unfairly favors it that is close to the ensemble in terms of distance (Duchin and Walch 2021). In general, there is no clean cut approach to detect gerrymandering in any instance. In fact, even using election outcomes it is not in general possible to detect a gerrymandered map. For example, consider the case where the election outcomes are uniformly distributed, this would imply that all outcomes are equally rare.