\makesavenoteenv

longtable

Finding Pareto Efficient Redistricting Plans with Short Bursts

Cory McCartan¹¹1 To whom correspondence should be addressed. Email: [email protected]. Website: https://corymccartan.com. Address: 1 Oxford St, Cambridge, MA 02138. The author thanks Christopher Kenny and Tyler Simko for helpful comments and suggestions.
Department of Statistics
Harvard University

(May 27, 2024)

Abstract

Redistricting practitioners must balance many competing constraints and criteria when drawing district boundaries. To aid in this process, researchers have developed many methods for optimizing districting plans according to one or more criteria. This research note extends a recently-proposed single-criterion optimization method, short bursts (Cannon et al.,, 2023), to handle the multi-criterion case, and in doing so approximate the Pareto frontier for any set of constraints. We study the empirical performance of the method in a realistic setting and find it behaves as expected and is not very sensitive to algorithmic parameters. The proposed approach, which is implemented in open-source software, should allow researchers and practitioners to better understand the tradeoffs inherent to the redistricting process.

Keywords redistricting • optimization • Markov chain • Pareto efficiency

1 Introduction

Legislative districts in the U.S. must satisfy a wide variety of constraints and criteria, which vary across states and municipalities, in addition to several federal statutory and constitutional criteria (National Conference of State Legislatures,, 2021). Many recent advances in the quantitative analysis of redistricting have involved introducing techniques for generating sample districting plans which satisfy these constraints. These techniques fall into two broad categories: sampling algorithms and optimization algorithms.

Sampling algorithms aim to generate a random sample of districting plans which meet a set of constraints. These algorithms include ad-hoc methods (Cirincione et al.,, 2000; Chen and Rodden,, 2013; Magleby and Mosesson,, 2018) as well as algorithms that are designed to sample from a specific probability distribution (Mattingly and Vaughn,, 2014; Wu et al.,, 2015; DeFord et al.,, 2021; Carter et al.,, 2019; Fifield et al.,, 2020; McCartan and Imai,, 2020; Cannon et al.,, 2022); most of which are based on Markov chain Monte Carlo techniques.

In contrast, optimization algorithms are designed to produce a districting plan or plans which score well on predefined numerical criteria. Informally, while sampling algorithms can give an idea of a “typical” districting plan, optimization algorithms can find more extreme or even close-to-optimal districting plans along certain dimensions. Many different optimization schemes have been proposed in the literature (Mehrotra et al.,, 1998; Macmillan,, 2001; Bozkaya et al.,, 2003; Altman and McDonald,, 2011; Liu et al.,, 2016; Rincón-García et al.,, 2013; Gurnee and Shmoys,, 2021; Swamy et al.,, 2022). One recent development by Cannon et al., (2023) has been optimization by short bursts, which has shown particular promise in maximizing minority vote share across many districts. The short burst algorithm operates by running a Markov chain on the space of districting plans for a small number of steps, then restarting it from the step which scored highest according to a prespecified criterion.

One limitation of many of these optimization algorithms is that users must specify a univariate scoring function. When multiple constraints must be balanced against each other, practitioners often take a linear combination of multiple scoring functions and carefully tune the weights to achieve their desired goal. An alternative approach, which has been previously proposed, is to use optimization or sampling algorithms to find districting plans on the Pareto frontier for a set of constraints: plans which cannot be improved on any one dimension without sacrificing another dimension (Gerken,, 2010; Altman and McDonald,, 2018). The ability to identify a set of plans lying along or close to the Pareto frontier can be a valuable tool for redistricting researchers and practitioners alike in understanding the compatibility and tradeoffs between various redistricting criteria.

While some optimization algorithms (Rincón-García et al.,, 2013; Swamy et al.,, 2022) are explicitly designed to handle multiple constraints and approximate the Pareto frontier, many, including optimization by short bursts, are not. This research note provides a natural extension of the short burst algorithm to optimize the entire Pareto frontier at once.

Section 2 describes the problem formally and develops the algorithmic extension, which is also implemented in open-source software (Kenny et al.,, 2022). We examine the efficacy of the proposed algorithm in studying congressional redistricting in Iowa in Section 3. We find that the proposed algorithm identifies an expanding Pareto frontier as the number of bursts increases, that the algorithm’s performance is not sensitive to the burst size, and that multiple independent runs of the algorithm for fixed parameters produce similar results. Section 4 concludes and discusses directions for future work.

2 Approximating the Pareto Frontier with Short Bursts

Let $\Xi$ denote the (finite) collection of possible districting plans in a state or city. In most algorithmic redistricting work, $\Xi$ is defined as the set of possible graph partitions of an adjacency graph over geographic units such as precincts or Census tracts (see, e.g., McCartan and Imai,, 2020).

2.1 Pareto ordering and efficiency

Suppose we have a scoring function $f:\Xi\to\mathbb{R}^{J}$ that provides a vector of $J$ numerical scores for a given districting plan. We refer a single element of the vector $f_{j}$ as a districting criterion. Without loss of generality we interpret larger values of each $f_{j}$ to be desirable for the practitioner. We can then define a strict partial order on districting plans $\xi\in\Xi$ as follows: $\xi\prec\xi^{\prime}$ iff for all $j$ , $f_{j}(\xi)\leq f_{j}(\xi^{\prime})$ , and there exists a $j$ with $f_{j}(\xi)<f_{j}(\xi^{\prime})$ . Note that when $J=1$ this is in fact a strict total order. This ordering is called the Pareto ordering; when $\xi\prec\xi^{\prime}$ we say that $\xi^{\prime}$ Pareto dominates $\xi$ .

A plan is Pareto efficient if there does not exist a distinct $\xi^{\prime}\in\Xi$ that dominates it. The Pareto frontier $P(f)$ is the set of all Pareto efficient plans. It is not possible to improve a plan on the Pareto frontier (in the sense of increasing the value of a criterion $f_{j}$ ) without decreasing the value of at least one criterion.

2.2 Pareto optimization by short bursts

The Pareto frontier $P(f)$ for a scoring function $f:\Xi\to\mathbb{R}^{J}$ is easily calculated from an enumeration of all possible districting plans: simply discard plans in $\Xi$ which are Pareto dominated by any other plan. Unfortunately, in most realistic districting problems, $\Xi$ is too large to enumerate, and so analysts must resort to approximations.

Some researchers have generated an approximate Pareto frontier by generating a large number of samples from $\Xi$ using a redistricting sampling algorithm targeting a “neutral” distribution $\pi$ which contains no information about $f$ , and then discarding samples which are Pareto dominated by any other sampled plan (Schutzman,, 2020). While this approach will consistently generate the Pareto frontier as the number of samples goes to infinity, it will not perform well in general in finite samples, because the generation of the sample from $\Xi$ is done without any knowledge of the scoring function $f$ .

One could imagine instead generating samples from a target distribution $\pi_{f}$ which places more probability mass on plans that score well on $f$ , and then discarding Pareto-dominated plans. For example, $\pi_{f}(\xi)=\exp(-\norm{f(\xi)})$ is one such distribution. As Cannon et al., (2023) show, however, a better approach yet might be to generate the Pareto frontier over a series of “short bursts.”

The original short bursts algorithm of Cannon et al., (2023) is described in Algorithm 1. Given an initial plan $\xi_{0}$ , the short burst algorithm runs $b$ steps of some Markov chain $\mathcal{M}$ on districting plans, then picks the best plan from the set of $b+1$ plans (including the initializing plan) to initialize the next “burst.” The chain $\mathcal{M}$ can be any Markov chain on districting plans, even one which does not satisfy detailed balance. Cannon et al., (2023) use the Markov chain of DeFord et al., (2021); here, we use a substantially similar algorithm that is also spanning-tree based and is implemented in the software of Kenny et al., (2022). The short burst algorithm is terminated when $f(\xi_{0})$ reaches a predefined threshold, or when the total number of bursts run hits a specified maximum.

Algorithm 1 Univariate optimization by short bursts

Input: initial plan $\xi_{0}$ , univariate scoring function $f:\Xi\to\mathbb{R}$ , Markov chain $\mathcal{M}$ , and burst size parameter $b$ .

Repeat until termination:

1.

Run $\mathcal{M}$ for $b$ steps starting at $\xi_{0}$ , producing plans $(\xi_{1},\dots,\xi_{b})$ .
2.

Set $k\leftarrow\operatorname*{arg\,max}_{0\leq i\leq b}f(\xi_{i})$ .
3.

Set $\xi_{0}\leftarrow\xi_{k}$ .

For a scoring function $f:\Xi\to\mathbb{R}^{J}$ , the Pareto frontier $P(f)$ can be approximated with Algorithm 1 by running the algorithm to completion on many different scoring functions $f_{\vec{w}}$ , where

f_{\vec{w}}(\xi)=\sum_{j=1}^{J}w_{j}f_{j}(\xi),

for many different combinations of weights $\vec{w}$ lying in a $J$ -simplex. However, this is computationally expensive, especially as $J$ grows.

Instead, we propose a simple modification to Algorithm 1. Rather than selecting the best plan from the previous burst to initialize the next burst, we will keep track of the set of sampled plans which are not Pareto-dominated by any other observed plan, and initializing each burst with a plan sampled from this Pareto efficient set. This yields a generalized short burst algorithm for Pareto optimization which is detailed in Algorithm 2. The univariate Algorithm 1 is a special case of Algorithm 2 with $J=1$ .

Algorithm 2 Pareto optimization by short bursts

Input: set of initial plans $X$ , scoring function $f:\Xi\to\mathbb{R}^{J}$ , Markov chain $\mathcal{M}$ , and burst size parameter $b$ .

Repeat until termination:

1.

Sample $\xi_{0}$ from $X$ uniformly at random.
2.

Run $\mathcal{M}$ for $b$ steps starting at $\xi_{0}$ , producing plans $(\xi_{1},\dots,\xi_{b})$ .
3.

Set $X\leftarrow X\cup\{\xi_{1},\dots,\xi_{b}\}$ .
4.

Remove Pareto-dominated plans from $X$ : set $X\leftarrow\{\xi\in X:\not\exists\ \xi^{\prime}\in X\text{ with }\xi\prec\xi^{\prime}\}$ .

By initializing each burst with a sample from the yet-observed Pareto frontier, Algorithm 2 is able to explore and improve the entire Pareto frontier at any point. One could imagine generalizations of the algorithm which place more weight on plans on the Pareto frontier which the algorithm heuristically believes are easier to further optimize. We leave such improvements to future work.

2.3 Algorithm properties

Letting $X_{n}$ denote the (random) output of Algorithm 2 after $n$ bursts, we can record several easily-verified properties of the proposed algorithm. Proofs are deferred to the appendix.

{restatable}

proppropconv Let $\mathcal{M}^{b}$ be the Markov chain obtained by taking $b$ steps from $\mathcal{M}$ at a time. If the Markov chain $\mathcal{M}^{b}$ has a strictly positive transition probability between any pair of plans, then $X_{n}\xrightarrow{\;\!a.s.\>\!}P(f)$ as $n\to\infty$ .

Unfortunately, all Markov chains developed to date to sample districting plans do not have strictly positive transition probability between all pairs of plans. Weaker conditions such as the irreducibility of $\mathcal{M}$ are not enough for convergence, either, as the following proposition records.

{restatable}

proppropirred For any $\Xi$ , burst size $b\leq|\Xi|-2$ , and scoring function $f:\Xi\to\mathbb{R}$ which takes at least $b+2$ distinct values on $\Xi$ , there exists an irreducible Markov chain $\mathcal{M}$ on $\Xi$ such that $X_{n}\not\xrightarrow{\;\!a.s.\>\!}P(f)$ as $n\to\infty$ for some initializing set $A$ .

While Algorithm 2 is therefore not guaranteed to produce the Pareto frontier in all settings, even with infinite computing time, given the promising results in Cannon et al., (2023), we expect it to perform well in approximating the Pareto frontier in real-world redistricting problems.

One qualitative difference in algorithm performance between the $J=1$ and $J>1$ settings involves local minima. When $J=1$ and the algorithm finds itself in a local minimum, it is often unable to further improve the scoring function. Multiple runs of the algorithm are therefore often required to have confidence that any particular run has avoided such a local minimum. While there is no guarantee of avoiding local minima when $J>1$ , in practice such issues may be mitigated by the fact that the algorithm randomly selects starting points along the currently-estimated Pareto frontier for each burst. If one point along the frontier lies in a local minimum, the algorithm can still improve the overall frontier by starting bursts from other points. Of course, the overall size of the Pareto frontier grows rapidly as $J$ increases (see appendix). As a result, each portion of the frontier may get less overall “attention” per burst, leading to overall slower improvements in any one region of the frontier.

Refer to caption — Figure 1: Pareto frontier estimated by the proposed method across a range of total bursts (colored lines), plotted against 1,000 samples from an MCMC sampling algorithm using the same Markov chain (grey points).

3 Empirical Demonstration

Here we apply the proposed Algorithm 2 to the problem of congressional redistricting in the state of Iowa. Since 2010, Iowa has been apportioned four congressional districts, and by law, these districts must be comprised of whole counties, of which there are 99. Traditionally, maximizing district compactness and minimizing the deviation in populations across districts are far and away the two most important criteria for districting plans in Iowa.

We measure district compactness with the Polsby–Popper score (Polsby and Popper,, 1991), which is proportional to the ratio of a district’s area to its perimeter. To score an entire plan, we record the minimum Polsby–Popper score (least compact) of all the districts:

\text{comp}(\xi)\coloneqq\min_{1\leq i\leq m}4\pi\frac{\text{area}_{i}(\xi)}{\text{perim}_{i}(\xi)^{2}},

where $\text{area}_{i}(\xi)$ and $\text{perim}_{i}(\xi)$ denote the area and perimeter of district $i$ in a plan $\xi$ . We measure population equality with the maximum population deviation score. For districts in a plan $\xi$ with populations $N_{1},\dots,N_{m}$ in a state with total population $N$ , the population deviation score is defined as

\text{dev}(\xi)\coloneqq\max_{1\leq i\leq m}\absolutevalue{\frac{N_{i}-N/m}{N/m}},

that is, the maximum percentage deviation in any district from full population equality. We can therefore write the overall scoring function as $f(\xi)=(-\text{dev}(\xi),\text{comp}(\xi))$ .

First, we run Algorithm 2 with $b=10$ and $\xi_{0}$ set to the enacted 2020 plan over a range of maximum bursts from 10 to 10,000. The estimated Pareto frontiers from each of these runs of the algorithm is plotted in Figure 1. For comparison, we also run the underlying Markov chain for 1,000 steps, and plot the sampled plans’ compactness and deviation scores on the same figure. Unsurprisingly, the estimated Pareto frontier lie outside of (or close to) the convex hull of the sampled plans’ scores, even for small numbers of total bursts,

Figure 2 plots the 11 plans that define the Pareto frontier estimated with 10,000 bursts. The minimum-deviation plan is relatively noncompact compared to the rest of the frontier, though is typical compared to the set of plans sampled directly from the Markov chain. As we travel along the frontier, compactness increases substantially, before reaching a relative plateau. In fact, there appears to be little overall tradeoff between the two criteria, as indicated by the sharp angle in the Pareto frontier. Across most of their range, each criterion can be optimized with minimal effect on the other criterion; only at extreme values must compactness and population equality be weighed against each other. Compared to single-criterion optimizers or existing sampling methods, the proposed algorithm enables qualitative findings like this which may be useful to redistricting practitioners looking to improve a districting plan along multiple dimensions.

Figure 1 plots the Pareto frontier as a function of the total number of bursts. We can also examine the sensitivity of the estimated frontier to the burst size parameter $b$ . For burst sizes $b\in\{5,10,20\}$ , we run 10 replications of Algorithm 2 for 200 bursts each. The resulting estimated Pareto frontiers are shown in Figure 3.

The frontiers are remarkably consistent across both replications and varying burst sizes. The lack of sensitivity to the burst size over a reasonable range of sizes was noted by Cannon et al., (2023), and it is encouraging to see similar results in the multidimensional case. The consistency across replications further provides confidence that running Algorithm 2 a small-to-moderate number of times will generally capture a reasonable Pareto frontier (for that computational budget).

4 Conclusion

We have proposed a natural generalization of the short bursts algorithm of Cannon et al., (2023) which will allow practitioners and researchers to estimate the Pareto frontier induced by a set of criteria on districting plans. In a demonstration application in the state of Iowa, we find that the proposed Pareto optimization by short bursts algorithm is not particularly sensitive to the burst size, and produces relatively consistent results across multiple independent runs.

Future work should compare the performance of both the univariate and multi-criteria short burst algorithms to competing redistricting optimization approaches, including those of Liu et al., (2016) and Swamy et al., (2022). Additional simulation studies examining the scaling behavior of all of these algorithms in the number of districts, number of precincts or counties, and number of redistricting criteria (dimensionality of $f$ ) would be highly valuable as well.

References

Altman and McDonald, (2018) Altman, M. and McDonald, M. (2018). Redistricting by formula: An Ohio reform experiment. American Politics Research, 46(1):103–131.
Altman and McDonald, (2011) Altman, M. and McDonald, M. P. (2011). BARD: Better automated redistricting. Journal of Statistical Software, 42:1–28.
Bozkaya et al., (2003) Bozkaya, B., Erkut, E., and Laporte, G. (2003). A tabu search heuristic and adaptive memory procedure for political districting. European journal of operational research, 144(1):12–26.
Cannon et al., (2022) Cannon, S., Duchin, M., Randall, D., and Rule, P. (2022). Spanning tree methods for sampling graph partitions. arXiv preprint arXiv:2210.01401.
Cannon et al., (2023) Cannon, S., Goldbloom-Helzner, A., Gupta, V., Matthews, J., and Suwal, B. (2023). Voting rights, Markov chains, and optimization by short bursts. Methodology and Computing in Applied Probability, 25(1):36.
Carter et al., (2019) Carter, D., Herschlag, G., Hunter, Z., and Mattingly, J. (2019). A merge-split proposal for reversible Monte Carlo Markov chain sampling of redistricting plans. arXiv preprint arXiv:1911.01503.
Chen and Rodden, (2013) Chen, J. and Rodden, J. (2013). Unintentional gerrymandering: Political geography and electoral bias in legislatures. Quarterly Journal of Political Science, 8(3):239–269.
Cirincione et al., (2000) Cirincione, C., Darling, T. A., and O’Rourke, T. G. (2000). Assessing South Carolina’s 1990s congressional districting. Political Geography, 19(2):189–211.
DeFord et al., (2021) DeFord, D., Duchin, M., and Solomon, J. (2021). Recombination: A family of Markov chains for redistricting. Harvard Data Science Review. https://hdsr.mitpress.mit.edu/pub/1ds8ptxu.
Fifield et al., (2020) Fifield, B., Higgins, M., Imai, K., and Tarr, A. (2020). Automated redistricting simulation using Markov chain Monte Carlo. Journal of Computational and Graphical Statistics, 29(4):715–728.
Gerken, (2010) Gerken, H. K. (2010). Getting from here to there in redistricting reform. Duke J. Const. L. & Pub. Pol’y, 5:1.
Gurnee and Shmoys, (2021) Gurnee, W. and Shmoys, D. B. (2021). Fairmandering: A column generation heuristic for fairness-optimized political districting. In SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), pages 88–99. SIAM.
Kenny et al., (2022) Kenny, C. T., McCartan, C., Fifield, B., and Imai, K. (2022). redist: Simulation Methods for Legislative Redistricting. R package; available at https://alarm-redist.org/redist/.
Liu et al., (2016) Liu, Y. Y., Cho, W. K. T., and Wang, S. (2016). PEAR: a massively parallel evolutionary computation approach for political redistricting optimization and analysis. Swarm and Evolutionary Computation, 30:78–92.
Macmillan, (2001) Macmillan, W. (2001). Redistricting in a GIS environment: An optimisation algorithm using switching-points. Journal of Geographical Systems, 3(2):167–180.
Magleby and Mosesson, (2018) Magleby, D. B. and Mosesson, D. B. (2018). A new approach for developing neutral redistricting plans. Political Analysis, 26(2):147–167.
Mattingly and Vaughn, (2014) Mattingly, J. C. and Vaughn, C. (2014). Redistricting and the will of the people. arXiv preprint arXiv:1410.8796.
McCartan and Imai, (2020) McCartan, C. and Imai, K. (2020). Sequential Monte Carlo for sampling balanced and compact redistricting plans. arXiv preprint arXiv:2008.06131.
Mehrotra et al., (1998) Mehrotra, A., Johnson, E. L., and Nemhauser, G. L. (1998). An optimization based heuristic for political districting. Management Science, 44(8):1100–1114.
National Conference of State Legislatures, (2021) National Conference of State Legislatures (2021). Redistricting criteria. Available at https://www.ncsl.org/research/redistricting/redistricting-criteria.aspx.
Polsby and Popper, (1991) Polsby, D. D. and Popper, R. D. (1991). The third criterion: Compactness as a procedural safeguard against partisan gerrymandering. Yale L. & Pol’y Rev., 9:301.
Rincón-García et al., (2013) Rincón-García, E., Gutiérrez-Andrade, M., De-Los-Cobos-Silva, S., Lara-Velázquez, P., Ponsich, A., and Mora-Gutiérrez, R. (2013). A multiobjective algorithm for redistricting. Journal of applied research and technology, 11(3):324–330.
Schutzman, (2020) Schutzman, Z. (2020). Trade-offs in fair redistricting. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, pages 159–165.
Swamy et al., (2022) Swamy, R., King, D. M., and Jacobson, S. H. (2022). Multiobjective optimization for politically fair districting: A scalable multilevel approach. Operations Research.
Wu et al., (2015) Wu, L. C., Dou, J. X., Sleator, D., Frieze, A., and Miller, D. (2015). Impartial redistricting: A Markov Chain approach. arXiv preprint arXiv:1510.03247.

Appendix \thechapter.A Proofs of Propositions

\propconv

Proof.

Let $\xi\in P(f)$ . Since every pair of plans has strictly positive transition probability under $\mathcal{M}^{b}$ , with probability 1 the algorithm started at any $A$ will eventually transition to $\xi$ within a burst of length $b$ (i.e., $\exists n:\xi\in X_{n}$ ). Since $\xi\in P(f)$ , at the end of the burst, $\xi$ will be added to the approximate Pareto frontier $X$ . Additionally, $\xi$ will never be removed from $X$ , since no plan in $\Xi$ Pareto dominates it. Since this holds for all $\xi\in P(f)$ (a finite set), with probability one all plans in $P(f)$ will eventually belong to $X$ . Then no other plan can belong to $X$ , or else such a plan would not be Pareto dominated by any plan in $\Xi$ , and thus would belong to $P(f)$ itself. So $X_{n}\xrightarrow{\;\!a.s.\>\!}P(f)$ . ∎

\propirred

Proof.

Without loss of generality, number the plans of $\Xi$ such that

f(\xi_{1})=\dots=f(\xi_{i_{1}})>f(\xi_{i_{1}+1})=\dots=f(\xi_{i_{2}})>f(\xi_{i_{2}+1})\dots f(\xi_{i_{b+1}})<f(\xi_{i_{b+1}+1})=\dots=f(\xi_{i_{b+2}})

with $f(\xi_{i_{b}+1})>f(\xi_{1})$ . Then define $\mathcal{M}$ to be a random walk along this ordering; clearly $\mathcal{M}$ is irreducible. However, initializing Algorithm 2 with $A=\{\xi_{1}\}$ , any burst of length $b$ will return a plan $\xi_{l}$ with $1\leq l\leq i_{b+1}$ , which will be Pareto dominated by one of $\{\xi_{1},...\xi_{i+1}\}$ . Since there are at least $b$ plans separating any of $\{\xi_{1},...\xi_{i+1}\}$ from the Pareto frontier $P(f)=\{f(\xi_{i_{b+1}+1}),\dots,f(\xi_{i_{b+2}})\}$ , no burst of length $b$ will transition into the Pareto frontier, and so $X\not\xrightarrow{\;\!a.s.\>\!}P(f)$ . ∎

We expect Proposition 2.3 to hold for $J>1$ as well.

Appendix \thechapter.B Scaling of the Pareto frontier size

Figure \thechapter.B1 shows the scaling behavior of the size of the Pareto frontier. We generate $n$ samples from a $\mathcal{N}(0,I_{J\times J})$ distribution, for a range of $n$ and $J$ .

As both parameters increase, the size of the Pareto frontier grows rapidly; as $n\to\infty$ the growth is exponential in $J$ .