Abstract

Redistricting is the process of partitioning a set of basic units into a given number of larger groups for electoral purposes. These groups must follow federal and state requirements to enhance fairness and minimize the impact of manipulating boundaries for political gain. In redistricting tasks, one of the most important criteria is equal population. As a matter of fact, redistricting plans can be rejected when the population deviation exceeds predefined limits. In the literature, there are several methods to balance population among districts. However, further discussion is needed to assess the effectiveness of these strategies. In this paper, we considered two different strategies, mean deviation and overall range. Additionally, a compactness measure is included to design well-shaped districts. In order to provide a wide set of redistricting plans that achieve good trade-offs between mean deviation, overall range, and compactness, we propose four multiobjective metaheuristic algorithms based on NSGA-II and SPEA-II. The proposed strategies were applied in California, Texas, and New York. Numerical results show that the proposed multiobjective approach can be a very valuable tool in any real redistricting process.

1. Introduction

The zone design problem consists of partitioning a given set of geographic units (GU’s) into k larger groups called zones in order to satisfy several criteria and constraints for a specific context. GU’s may represent diverse geographic spaces such as cities, counties, postal codes, or specially crafted geographic areas of interest for a decision-maker (DM). The criteria can be, among others, the construction of zones with a specific shape and the same number of clients or services. The zone design problem has a broad range of applications such as land use [1], commercial territory design [2], school districting [3], police districting [4], and service and maintenance zones [5]. Due to its complexity (it has been shown to be NP-hard [6]), several heuristic approaches such as genetic algorithms [7], tabu search [8], and GRASP [9] have been used to fulfill its objectives in reasonable time.

Most of the papers in the literature use an objective function that combines all objectives in a weighted sum, where the different weights represent their relative importance. These methods have three main drawbacks: Pareto optimal solutions in nonconvex regions of the front cannot be obtained by minimizing a linear combination of the objectives; the algorithm must be executed many times to find multiple solutions; and the methods are sensitive to several characteristics of the search space such as discontinuity, multimodality, and nonuniformity [10]. Multiobjective algorithms and Pareto-based optimization techniques try to avoid these pitfalls, but their use for the zone design problem is scarce, and their performance on this type of problem has not been fully investigated.

Among those few multiobjective approaches, we can find studies on a GIS-based spatial zoning model [11], the public service districting problem [12], commercial territory design [2], meter reading in power distribution networks [13], the planning of earthquake shelters [14], patrol sectors [15], and health care systems [16].

In this work, we focus on the political districting problem, which is probably the most popular case of zone design due to its influence on democratic processes. It consists of grouping GU’s into a fixed number of zones according to several criteria such as contiguity, population equality, and compactness, to avoid political manipulation or gerrymandering. Several techniques have been reported to solve this problem. However, most of them pose political districting as a single-objective optimization problem [17, 18, 19]. Few studies deal with multiobjective approaches. For example, Guo et al. [20] propose a multiobjective graph partitioning engine integrated with a Geographic Information System and test it in three cities of Australia. Ricca and Simeone [21] solve the political districting problem by a convex combination of the objectives and compare the behavior of four local search metaheuristics: descent, tabu search, simulated annealing, and old bachelor acceptance. Rincón-García et al. [22] present a multiobjective simulated annealing-based algorithm, where a set of weighting vectors is assigned to nondominated solutions, and the rejection of new solutions is based on the Metropolis criterion. Vanneschi et al. [23] provide a hybrid multiobjective method that combines a genetic algorithm with variable neighborhood search and test it on five US states. Caballero et al. [24] propose a multiobjective algorithm based on archived multiobjective simulated annealing (AMOSA) to solve the political redistricting problem and apply it to 12 datasets generated using electoral information from Mexico. It is important to mention that most of the aforementioned strategies consider a biobjective formulation of the problem where population equality and compactness are treated as objectives while contiguity is a side constraint.

Since its introduction in 1964 by the US Supreme Court, the equal population criterion has been widely studied and has played a paramount role in American politics. In many real scenarios, population equality is considered essential, and even small deviations from the ideal have been challenged in court. Much of the redistricting jurisprudence is devoted to this feature, and several court cases have established its precedence over other criteria [25]. Additionally, the legislative body standards are quite strict, requiring equal population “as nearly as is practicable,” so districts must have approximately the same number of inhabitants or the redistricting plan could be rejected. There are several methods for measuring population equality. Generally, it is calculated as the sum of the absolute deviations of all the districts divided by the total number of districts. However, in some states, it is determined by the difference in population between the largest and the smallest district divided by the ideal number of inhabitants. Nevertheless, neither of these two measures on its own provides a full picture of the degree of population equality. For example, if we fix (do not modify) the largest and the smallest districts, very different redistricting plans can be arranged with the same overall range, even though the best solution should be a redistricting plan where all the remaining districts are closely clustered around the ideal population. This analysis can only be done if both measures are examined simultaneously. Therefore, we propose a multiobjective approach where both measures, and compactness, are considered as objectives to optimize, whereas contiguity is incorporated as a constraint. We designed two multiobjective algorithms based on the nondominated sorting genetic algorithm II (NSGA-II) and the strength Pareto evolutionary algorithm II (SPEA-II) and evaluated their performance in three of the most populated states in the United States: California, Texas, and New York.

The remainder of this paper is organized as follows: in Section 2, some relevant multiobjective optimization concepts are defined; the problem definition is provided in Section 3; Section 4 includes a description of the proposed multiobjective heuristic algorithms; Section 5 details their adaptation to the redistricting problem; in Section 6, the computational experiments are explained and discussed. Finally, conclusions and future lines of research are included in Section 7.

2. Multiobjective Optimization

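Many real-world scenarios involve the simultaneous optimization of several objectives in order to solve a certain problem [26, 27, 28]. Formally, a general multiobjective optimization problem (MOP) can be formulated as [29]

$$\min_{x \in \Omega} F(x) = \left(f_1(x), f_2(x), \ldots, f_m(x)\right)^T, \qquad (1)$$

where $\Omega$ is the feasible region in the decision space and $x$ is a vector of decision variables. $F : \Omega \to \mathbb{R}^m$ consists of $m$ objective functions, $f_i : \Omega \to \mathbb{R}$, $i = 1, \ldots, m$, where $\mathbb{R}^m$ is the objective space.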

The objectives in (1) are often in conflict with each other. Improvement of one objective may lead to the deterioration of another. Therefore, a single solution, which can optimize all objectives simultaneously, does not exist. Instead, in an MOP, the goal is to find the best trade-off solutions, called the Pareto optimal solutions, that are important to a DM. To define the concept of optimality for a multiobjective problem, the following definitions are provided.

Definition 1. Let $x$ and $y$ be vectors of decision variables such that $x, y \in \Omega$. We say that $x$ dominates $y$, denoted as $x \prec y$, if and only if $f_i(x) \le f_i(y)$ for all $i \in \{1, \ldots, m\}$ and $F(x) \neq F(y)$.

Definition 2. A feasible solution $x^* \in \Omega$ of problem (1) is called a Pareto optimal solution if and only if there is no other solution $y \in \Omega$ such that $y \prec x^*$. The set of all efficient solutions or Pareto optimal solutions is called the Pareto set (PS), which is denoted as $PS = \{x \in \Omega \mid \nexists\, y \in \Omega,\ y \prec x\}$.

Definition 3. If $x^*$ is Pareto optimal, then $F(x^*)$ is called a nondominated point or an efficient point. The set of all nondominated points is referred to as the nondominated frontier or the Pareto front (PF). The Pareto front is the image of the PS in the objective space, which is denoted as $PF = \{F(x) \mid x \in PS\}$.

The main goal when solving an MOP is to provide the DM with the set of Pareto optimal (or nondominated) solutions.
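
As an illustration of Definitions 1 to 3, the following minimal Python sketch tests dominance between two objective vectors and filters a finite list down to its nondominated points. It assumes that all objectives are minimized; the function names and the sample values are hypothetical and are not taken from the experiments.

```python
from typing import List, Sequence


def dominates(fx: Sequence[float], fy: Sequence[float]) -> bool:
    """Return True if objective vector fx Pareto-dominates fy (minimization)."""
    no_worse = all(a <= b for a, b in zip(fx, fy))
    strictly_better = any(a < b for a, b in zip(fx, fy))
    return no_worse and strictly_better


def nondominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point in the list dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (mean deviation, overall range, compactness) triples.
    plans = [(0.010, 0.04, 9.5), (0.012, 0.03, 9.0), (0.015, 0.05, 9.8)]
    # The third plan is worse than the first in every objective and is dropped.
    print(nondominated(plans))
```

The quadratic filter is enough for small sets of plans; larger populations usually rely on the sorting procedures described in Section 4.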

3. Problem Definition

Political redistricting can be characterized as a multiobjective combinatorial optimization problem with criteria and constraints that fulfill democratic ideals. There are several criteria based on geographic, socioeconomic, and cultural attributes suggested in the literature to avoid political interference or gerrymandering, and some are usually imposed by law in different countries. However, there is a general consensus that population equality, contiguity, and compactness are fundamental in any electoral democratic process. In this work, we use a multiobjective model that considers population equality and compactness as objectives and contiguity as a constraint.

3.1. Constraints

Let us define $X = \{x_1, x_2, \ldots, x_n\}$ as a set of GUs that must be grouped into $k$ zones or districts. Let $Z_i$ be the set of all the GUs that belong to the $i$-th zone, and let $S$ be a districting plan, $S = \{Z_1, Z_2, \ldots, Z_k\}$. Then, each district $Z_i$ can be defined through a set of binary variables $x_{li}$ such that $x_{li} = 1$ if the $l$-th GU belongs to district $Z_i$ and $x_{li} = 0$ otherwise. In order to consider $S$ a feasible districting plan, all zones must attain a contiguity property, meaning that each GU in a district can be connected to every other GU in the zone via GUs that are also in the district. In addition, the following constraints must be satisfied:

$$Z_i \neq \emptyset, \quad i = 1, \ldots, k, \qquad (2)$$

$$Z_i \cap Z_j = \emptyset, \quad i \neq j, \qquad (3)$$

$$\bigcup_{i=1}^{k} Z_i = X. \qquad (4)$$

Constraint (2) implies that each district must be nonempty; in other words, a district must contain at least one GU. Constraints (3) and (4) assure that each GU is assigned to exactly one zone. Finally, constraint (4) indicates that a districting plan must be complete in terms that all units are assigned to a district. Thus, a redistricting plan can be considered feasible if (C1) each district is connected, (C2) the number of districts is equal to k, and (C3) each GU is assigned to exactly one district.
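
The feasibility conditions C1 to C3 can be checked with a short routine. The sketch below is only an illustration under stated assumptions: GUs are numbered from 0, a plan is stored as a list whose l-th entry is the district of the l-th GU (which already gives C3), and `adjacency` is a hypothetical dictionary mapping each GU to the set of GUs that share a border with it.

```python
from collections import deque
from typing import Dict, List, Set


def is_connected(units: Set[int], adjacency: Dict[int, Set[int]]) -> bool:
    """Breadth-first search restricted to the GUs of a single district."""
    if not units:
        return False  # an empty district violates the nonempty requirement
    start = next(iter(units))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v in units and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == units


def is_feasible(assignment: List[int], k: int,
                adjacency: Dict[int, Set[int]]) -> bool:
    """Check C1-C3 for a plan given as assignment[l] = district of GU l."""
    districts: List[Set[int]] = [set() for _ in range(k)]
    for gu, d in enumerate(assignment):
        if not 0 <= d < k:
            return False  # label outside the k allowed districts
        districts[d].add(gu)
    # Every one of the k districts must be nonempty and connected.
    return all(is_connected(z, adjacency) for z in districts)


if __name__ == "__main__":
    # Four GUs arranged as a 2 x 2 grid: 0-1 on top, 2-3 below.
    grid = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}
    print(is_feasible([0, 0, 1, 1], 2, grid))  # two horizontal strips
    print(is_feasible([0, 1, 1, 0], 2, grid))  # diagonal pairs are not adjacent
```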

3.2. Population Equality

Population equality pursues the one man, one vote principle, and it seeks that districts should have about the same population. In this paper, we use two different population equality measures, a mean deviation and an overall range.

The mean deviation is equal to the sum of the absolute deviations of the districts divided by the product of the total number of districts and the ideal population:

$$f_1(S) = \frac{1}{k \bar{P}} \sum_{i=1}^{k} \left| P_i - \bar{P} \right|, \qquad (5)$$

where $P_i$ represents the population of district $i$, $\bar{P}$ is the average district population, and $k$ is the number of districts to be designed.

The overall range is given by the difference in population between the largest and the smallest districts divided by the ideal population:

$$f_2(S) = \frac{P_{\max} - P_{\min}}{\bar{P}}, \qquad (6)$$

where $P_{\max}$ and $P_{\min}$ represent the population of the largest and smallest districts, respectively, and $\bar{P}$ is the average district population.

Equations (5) and (6) measure the level of population deviation among the districts. The lower the value of $f_1$ or $f_2$, the better the population equality. In an ideal case, the number of inhabitants in each district is equal to the average district population, and in this case, equations (5) and (6) reach their lowest possible value of zero.
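
The two measures can be computed directly from the list of district populations. The following sketch follows the verbal definitions above (sum of absolute deviations over k times the ideal population, and largest minus smallest over the ideal population); the function names and the toy populations are hypothetical. The example also shows why the two measures are not interchangeable: both plans share the same extreme districts, yet their mean deviations differ.

```python
from typing import Sequence


def mean_deviation(pops: Sequence[float]) -> float:
    """Sum of absolute deviations divided by (k * ideal population)."""
    k = len(pops)
    ideal = sum(pops) / k
    return sum(abs(p - ideal) for p in pops) / (k * ideal)


def overall_range(pops: Sequence[float]) -> float:
    """(largest - smallest district population) / ideal population."""
    ideal = sum(pops) / len(pops)
    return (max(pops) - min(pops)) / ideal


if __name__ == "__main__":
    plan_a = [90, 100, 100, 110]  # middle districts sit at the ideal
    plan_b = [90, 90, 110, 110]   # every district is off by 10
    for name, pops in (("A", plan_a), ("B", plan_b)):
        print(name, overall_range(pops), mean_deviation(pops))
```

Both toy plans have the same overall range, while plan A has half the mean deviation of plan B, which is the situation that motivates treating the two measures as separate objectives.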

3.3. Compactness

Compactness deals with the promotion of regular shapes among the districts. It is included to avoid the creation of irregular zones for political purposes. Although compactness is considered essential to prevent the creation of unfair political boundaries, there is no consensus on how to define it. There are different measures, or shape indices, proposed in the literature that can be used to quantify the compactness of a district; see [30, 31, 32] for surveys on compactness metrics. We decided to use a simple and widely used compactness measure that compares the area and the perimeter of each district as follows:

$$f_3(S) = \sum_{i=1}^{k} \left( 1 - \frac{4 \pi \, AC_i}{PC_i^2} \right), \qquad (7)$$

where $PC_i$ represents the perimeter and $AC_i$ the area of district $Z_i$.

From equation (7), we can observe that the more compact all the districts are, the closer the cost is to 0.
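
A small numerical check may help to fix ideas. The sketch below assumes that equation (7) is the sum over districts of one minus the ratio 4π·area/perimeter², an assumption that is consistent with the costs quoted later for a circle (zero) and a square (0.2146). The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def district_compactness(perimeter: float, area: float) -> float:
    """Cost of one district: 0 for a circle, positive for any other shape."""
    return 1.0 - 4.0 * math.pi * area / perimeter ** 2


def compactness_cost(shapes: Sequence[Tuple[float, float]]) -> float:
    """Total cost of a plan given (perimeter, area) pairs, one per district."""
    return sum(district_compactness(p, a) for p, a in shapes)


if __name__ == "__main__":
    circle = (2.0 * math.pi, math.pi)  # unit-radius circle
    square = (4.0, 1.0)                # unit-side square
    print(round(district_compactness(*circle), 4))
    print(round(district_compactness(*square), 4))
    print(round(compactness_cost([circle, square, square]), 4))
```

Because the cost is summed over districts, a plan with many districts accumulates a sizeable total even when each district is fairly regular.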

4. Heuristic Algorithms

In this section, we include a description of the proposed multiobjective algorithms for the redistricting problem. These algorithms are able to produce a set of nondominated redistricting plans that satisfy constraints C1–C3, while the objectives $f_1$ (mean deviation), $f_2$ (overall range), and $f_3$ (compactness) are minimized.

In order to obtain a good balanced design of the districts, we use two popular multiobjective evolutionary algorithms, namely, NSGA-II and SPEA-II. We first present the main framework for these multiobjective techniques. We then describe the tailored components for these genetic algorithms such as the solution encoding and genetic operators.

4.1. NSGA-II

The nondominated sorting genetic algorithm II (NSGA-II) is a multiobjective metaheuristic originally proposed by Deb et al. [10]. NSGA-II blends the main characteristics of a genetic algorithm and the concept of Pareto dominance.

Firstly, NSGA-II creates a random parent population, $P_0$, of size N. At each generation, solutions are ranked into several classes or fronts according to their nondomination level. All nondominated solutions are included in front level 1, or front 1. This front represents the best efficient set and is temporarily disregarded from the population. Iteratively, the nondominated solutions of the remaining population are determined and assigned to front level 2, or front 2. This new front represents the second best efficient set. The process is repeated until the population is empty. This procedure is called fast nondominated sort.

In order to maintain population diversity, a second value called crowding distance is calculated for solutions that belong to the same nondominated front. This measure estimates population density around a solution in the objective space. The extreme points of each front are assigned with an infinite distance, so they are preserved and can introduce more dispersion in the population. As a consequence, every chromosome will have two attributes, the nondomination rank and a crowding distance.
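
The ranking and the density estimate can be sketched as follows. This is a deliberately simple version: the fronts are obtained by repeatedly peeling off the nondominated solutions, which yields the same fronts as the fast nondominated sort of Deb et al. [10] but without its bookkeeping, and the crowding distance is normalized by the range of each objective within the front. Function names are hypothetical and all objectives are assumed to be minimized.

```python
import math
from typing import Dict, List, Sequence


def dominates(fx: Sequence[float], fy: Sequence[float]) -> bool:
    return (all(a <= b for a, b in zip(fx, fy))
            and any(a < b for a, b in zip(fx, fy)))


def nondominated_fronts(objs: List[Sequence[float]]) -> List[List[int]]:
    """Return lists of solution indices: front 1, front 2, and so on."""
    remaining = set(range(len(objs)))
    fronts: List[List[int]] = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts


def crowding_distance(objs: List[Sequence[float]],
                      front: List[int]) -> Dict[int, float]:
    """Crowding distance of every solution index in one front."""
    dist = {i: 0.0 for i in front}
    for obj in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][obj])
        low, high = objs[ordered[0]][obj], objs[ordered[-1]][obj]
        # Extreme points receive an infinite distance so they are preserved.
        dist[ordered[0]] = dist[ordered[-1]] = math.inf
        if high == low:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (objs[nxt][obj] - objs[prev][obj]) / (high - low)
    return dist


if __name__ == "__main__":
    objs = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
    fronts = nondominated_fronts(objs)
    print(fronts)
    print(crowding_distance(objs, fronts[0]))
```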

Next, a binary tournament is applied. Two solutions are picked randomly from the population, and the winner is the lowest ranked individual; if the rank is the same for both, the winner is the one with the highest crowding distance. This strategy is applied to select N pairs of parents. Subsequently, the crossover and mutation operators are applied to obtain a new population, $Q_t$. Finally, the fast nondominated sort and the crowding distance are applied to all solutions in $P_t \cup Q_t$, and the N best-ranked solutions are retained for the next population. This process is repeated for a predefined number of generations. The pseudocode of NSGA-II is shown in Algorithm 1.

Algorithm 1: NSGA-II.
Input parameters: N (population size)
Output: A (Pareto front approximation)
(1) Generate an initial population, $P_t$, with $t = 0$
(2) Evaluate objective values of solutions in $P_t$
(3) Use the fast nondominated sort to assign a rank to each solution
(4) Calculate the crowding distance of each solution
(5) while stop criterion is not reached do
(6)  Select parents from $P_t$ using a binary tournament selection based on rank and crowding distance
(7)  Apply crossover and mutation operators to create a set of new solutions, $Q_t$
(8)  Evaluate objective values of solutions in $Q_t$
(9)  Use the fast nondominated sort to assign a rank to each solution in $P_t \cup Q_t$
(10)  Calculate the crowding distance of each solution in $P_t \cup Q_t$
(11)  Replace solutions in $P_t$ with the N best solutions in $P_t \cup Q_t$
(12) end while

4.2. SPEA-II

The strength Pareto evolutionary algorithm II (SPEA-II) proposed by Zitzler et al. [33] is a Pareto-based technique with an elitist strategy for dealing with multiobjective optimization problems. SPEA-II is an enhanced version of its predecessor, SPEA, and has shown competitive results in comparison with other popular multiobjective evolutionary algorithms like PESA [34], VEGA [35], and NSGA-II [10]. In general, SPEA-II follows a generic evolutionary process which involves population initialization, reproduction operators, and a selection scheme.

SPEA-II guides the search towards the Pareto optimal set through three distinctive aspects: (SP1) a fine-grained fitness assignment mechanism such that, for each solution $x$, the number of individuals that dominate it and the number of individuals dominated by it are considered, (SP2) a nearest neighbor density estimation technique, and (SP3) an archive truncation method that ensures border solutions are preserved. The pseudocode of SPEA-II is shown in Algorithm 2.

Algorithm 2: SPEA-II.
Input parameters: N (archive size), T (maximum number of generations)
Output: A (Pareto front approximation)
(1) $t = 0$
(2) Generate an initial population, $P_0$, and external archive, $\bar{P}_0 = \emptyset$
(3) while $t < T$ do
(4)  Calculate fitness values of individuals in $P_t$ and $\bar{P}_t$
(5)  Copy all nondominated individuals in $P_t$ and $\bar{P}_t$ to $\bar{P}_{t+1}$
(6)  if $|\bar{P}_{t+1}| < N$ then
(7)   Fill $\bar{P}_{t+1}$ with dominated solutions in $P_t$ and $\bar{P}_t$
(8)  end if
(9)  if $|\bar{P}_{t+1}| > N$ then
(10)   Reduce $\bar{P}_{t+1}$ by means of the truncation operator
(11)  end if
(12)  Create a mating pool applying binary tournament selection
(13)  Apply crossover and mutation over the mating pool and set $P_{t+1}$ to the resulting population
(14)  $A$ = UpdateParetoFront($P_{t+1}$, $\bar{P}_{t+1}$)
(15)  $t = t + 1$
(16) end while

SPEA-II manages an external archive and a regular population. The external archive maintains a set of nondominated individuals found so far from the initial population and takes part in the genetic operations (crossover and mutation). At each generation, all solutions in the archive and regular population are assigned a fitness value. This assignment of fitness has two main components: the so-called raw fitness, based on the dominance concept, and a density estimation, obtained through an adaptation of the k nearest neighbor method. Additionally, a truncation method is used to keep a constant number of individuals in the archive. SPEA-II steps are briefly discussed here, and a more detailed description can be found in [33].
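
To make the fitness assignment concrete, the sketch below computes the raw fitness and the density term for a list of objective vectors. The specific formulas (strength as the number of dominated individuals, raw fitness as the sum of the strengths of the dominators, density as 1/(σ_k + 2) with k equal to the square root of the sample size) follow the original SPEA-II description in [33] and are not spelled out in this paper; the function name is hypothetical and minimization is assumed.

```python
import math
from typing import List, Sequence


def dominates(fx: Sequence[float], fy: Sequence[float]) -> bool:
    return (all(a <= b for a, b in zip(fx, fy))
            and any(a < b for a, b in zip(fx, fy)))


def spea2_fitness(objs: List[Sequence[float]]) -> List[float]:
    """Fitness = raw fitness + density; lower is better, < 1 if nondominated."""
    n = len(objs)
    # Strength: number of individuals that each solution dominates.
    strength = [sum(dominates(objs[i], objs[j]) for j in range(n))
                for i in range(n)]
    # Raw fitness: sum of the strengths of all dominators of a solution.
    raw = [sum(strength[j] for j in range(n) if dominates(objs[j], objs[i]))
           for i in range(n)]
    k = max(1, int(math.sqrt(n)))
    fitness = []
    for i in range(n):
        dists = sorted(math.dist(objs[i], objs[j]) for j in range(n) if j != i)
        # Distance to the k-th nearest neighbor in objective space.
        sigma_k = dists[min(k, len(dists)) - 1] if dists else 0.0
        fitness.append(raw[i] + 1.0 / (sigma_k + 2.0))
    return fitness


if __name__ == "__main__":
    objs = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
    for point, value in zip(objs, spea2_fitness(objs)):
        print(point, round(value, 3))
```

In a full implementation the list passed to `spea2_fitness` would contain the union of the regular population and the archive.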

5. NSGA-II and SPEA-II Adaptations

The current section is intended to describe in detail the encoding of the candidate solutions and the genetic operators that the proposed methods use for the redistricting problem.

In order to enhance the search capabilities of both algorithms, we designed specific operators that exploit the spatial configuration of the problem such as strategies to generate the initial population, crossover, and mutation. It should be mentioned that these procedures are the same for both techniques.

5.1. Solution Encoding Scheme

To apply genetic operators, a proper encoding scheme must be defined. In this paper, a candidate solution is represented by an integer array $a$ of length equal to the number of GUs, in which the $l$-th element of the array denotes the district number that is assigned to the $l$-th GU. For example, $a_l = 5$ means that the $l$-th GU belongs to district number five. It is worth noting that this encoding strategy ensures that every geographic unit is assigned to exactly one zone.

5.2. Population Initialization

The algorithms begin by creating a set of feasible solutions to form the initial population. In order to generate an initial solution, the following construction strategy was devised. Firstly, all GUs are labeled as available. The algorithms then randomly select k GUs, assign them to different districts, and label them as not available. Consequently, at this moment, each district has only one GU. Finally, each district is iteratively extended by adding an available GU that shares a border with the district in its current shape. Every time a GU is incorporated into a district, it is labeled as not available in order to avoid the construction of overlapping districts. The latter step is performed until all the GUs are labeled as not available. This process ensures that each initial solution consists of k connected districts that include all GUs. This procedure is repeated until the initial population is fully populated.
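
The construction strategy can be sketched as a seeded region-growing procedure. The code below is an illustration only: it assumes GUs numbered from 0, a hypothetical `adjacency` dictionary of neighboring GUs, and a connected adjacency graph. The text does not specify in which order districts are extended, so here a random (available GU, bordering district) pair is chosen at each step.

```python
import random
from typing import Dict, List, Optional, Set


def grow_initial_plan(adjacency: Dict[int, Set[int]], k: int,
                      rng: Optional[random.Random] = None) -> List[int]:
    """Return plan[l] = district of GU l, with k connected districts."""
    rng = rng or random.Random()
    n = len(adjacency)
    plan = [-1] * n  # -1 marks a GU that is still available
    # Seed each district with one randomly selected GU.
    for district, gu in enumerate(rng.sample(range(n), k)):
        plan[gu] = district
    assigned = k
    while assigned < n:
        # Available GUs that border an assigned GU, paired with its district.
        frontier = [(v, plan[u]) for u in range(n) if plan[u] != -1
                    for v in adjacency[u] if plan[v] == -1]
        if not frontier:
            raise ValueError("the adjacency graph is not connected")
        gu, district = rng.choice(frontier)
        plan[gu] = district  # the GU is no longer available
        assigned += 1
    return plan


if __name__ == "__main__":
    # Nine GUs arranged as a 3 x 3 grid, numbered row by row.
    grid = {0: {1, 3}, 1: {0, 2, 4}, 2: {1, 5}, 3: {0, 4, 6},
            4: {1, 3, 5, 7}, 5: {2, 4, 8}, 6: {3, 7}, 7: {4, 6, 8}, 8: {5, 7}}
    print(grow_initial_plan(grid, 3, random.Random(0)))
```

Since a GU is only added to a district it already borders, every district stays connected throughout the construction.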

5.3. Fitness Evaluation

Traditionally, genetic algorithms assess the population based on a fitness function. An individual with a higher fitness value will have a greater chance of being selected for the next generation. As opposed to single-objective optimization, where the objectives are scalarized into a single function that is used directly as the fitness value, NSGA-II and SPEA-II rely on the concept of Pareto optimality to assess the quality of solutions.

In this work, the proposed algorithms simultaneously optimize mean deviation, overall range, and compactness using the built-in fitness schemes described in Sections 4.1 and 4.2. NSGA-II employs the nondominated sorting procedure for fitness assignment and parent selection, while SPEA-II uses the raw fitness and a density measure. More details of these procedures can be found in the original studies [10, 33].

5.4. Crossover Operation

This operator provides a recombination strategy on the chosen parent solutions to generate offspring. To exchange GUs between neighboring districts, we formulated a crossover mechanism inspired by the general path relinking method described in [36]. Let $S_1$ and $S_2$ be two parent solutions selected for recombination (i.e., two districting plans) and $u$ a randomly chosen GU.

Thus, there is a district $Z_i \in S_1$ and a district $Z_j \in S_2$ such that $u \in Z_i \cap Z_j$. Let $H_1$ be the set of GUs in $Z_i$ but not in $Z_j$, and $H_2$ the set of GUs in $Z_j$ but not in $Z_i$.

For the following steps, we only consider the GUs in solution $S_1$. Let us define $G_1$ to be the set of GUs in $H_1$ that can be inserted into a district contiguous to $Z_i$. Conversely, we define $G_2$ to be the set of GUs in $H_2$ that lie on the border with $Z_i$. Then, a GU in $G_1$ is extracted from $Z_i$ and inserted into any randomly chosen district contiguous to $Z_i$, and a GU in $G_2$ is inserted into $Z_i$.

In order to illustrate this process, we include the following example. Let $X$ be a set of twelve GUs that must be divided into two districts ($k = 2$), and let $S_1$ and $S_2$ be two parent solutions (Figure 1). Suppose that we randomly select GU $u$. In this case, district $Z_i$ is represented by the red zone in $S_1$, and district $Z_j$ is represented by the green zone in $S_2$. Using this information, we can set $H_1 = Z_i \setminus Z_j$ and $H_2 = Z_j \setminus Z_i$. Next, we consider the GUs in $H_1$ that can be inserted in a district neighboring $Z_i$, which form $G_1$, and the GUs in $H_2$ that lie on the border with $Z_i$, which form $G_2$. Finally, we can produce a new solution if, for example, we move the GU $s$ to the blue zone and the GU $r$ to the red zone.

Note that these movements can produce a disconnection in district $Z_i$, so a repair process must be applied. The number of connected components in $Z_i$ is counted after the moves previously described. If the number of connected components equals 1, then the district is connected. Otherwise, the algorithms define the connected component that includes GU $u$ (i.e., the GU used within the above-described recombination strategy) as district $Z_i$; subsequently, the remaining components are assigned to other adjacent districts. In this way, constraints C1–C3 are satisfied.

This way, a path can be created from solution $S_1$ to solution $S_2$. However, due to the computational complexity of the full path enumeration, this mechanism is performed only once for each crossover process.

5.5. Mutation Operator

Next, a mutation procedure is applied to increase the diversity of the population by selecting GUs for district reassignment. First, a random district, $Z_i$, is chosen, and a GU in this district is moved to a neighboring district, $Z_j$. This move can produce a disconnection in district $Z_i$, so a repair process must be carried out. The number of connected components in $Z_i$ is counted after the move. If it equals 1, then the district is connected. Otherwise, the algorithms define the connected component that has the most GUs as district $Z_i$, and the remaining components are assigned to $Z_j$. In this way, constraints C1–C3 are preserved. Finally, the specific NSGA-II, or SPEA-II, techniques are applied to select the best solutions for the next generation, and the procedure of crossover, mutation, and selection is repeated.
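
The mutation and its repair step can be sketched as follows. The code is a minimal illustration under stated assumptions: plans use the integer-array encoding of Section 5.1, `adjacency` is a hypothetical dictionary of neighboring GUs, and a district with a single GU is never chosen as the source so that it cannot become empty (a detail the text does not specify).

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Set


def components(units: Set[int], adjacency: Dict[int, Set[int]]) -> List[Set[int]]:
    """Connected components of the subgraph induced by `units`."""
    left, comps = set(units), []
    while left:
        stack = [left.pop()]
        comp = set(stack)
        while stack:
            u = stack.pop()
            for v in adjacency[u]:
                if v in left:
                    left.remove(v)
                    comp.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def mutate(plan: List[int], adjacency: Dict[int, Set[int]],
           rng: Optional[random.Random] = None) -> List[int]:
    """Move one border GU to a neighboring district, then repair the source."""
    rng = rng or random.Random()
    child = list(plan)
    donors = [d for d, size in Counter(child).items() if size > 1]
    rng.shuffle(donors)
    for source in donors:
        # (GU, target district) pairs for GUs of `source` on a shared border.
        moves = [(u, child[v]) for u, d in enumerate(child) if d == source
                 for v in adjacency[u] if child[v] != source]
        if moves:
            break
    else:
        return child  # no admissible move was found
    gu, target = rng.choice(moves)
    child[gu] = target
    # Repair: keep the largest component, hand the others to the target.
    comps = components({u for u, d in enumerate(child) if d == source}, adjacency)
    keep = max(comps, key=len)
    for comp in comps:
        if comp is not keep:
            for u in comp:
                child[u] = target
    return child


if __name__ == "__main__":
    # GU 1 is a hub touching GUs 0, 2 and 3; moving it splits district 0.
    star = {0: {1}, 1: {0, 2, 3}, 2: {1}, 3: {1}}
    print(mutate([0, 0, 0, 1], star, random.Random(0)))
```

The repaired components remain attached to the target district because each of them borders the GU that was just moved there.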

Crossover can produce solutions that are very different from their two parents, whereas mutation produces very small changes. After some experiments, we found that if we apply the mutation operator to solutions in the current population, we can find some improvements that cannot be achieved for new individuals. Thus, a mutation was applied two times, once for new individuals and another for the current population.

We remark that, in some cases, the repair process can be required to reach an optimal solution. As a matter of fact, if disconnected solutions are rejected, instead of repaired, the optimal solution might not be found. In Figure 2, we include an example to illustrate this situation. Suppose that we must generate two zones, and that Figures 2(a) and 2(d) represent the initial and the optimal solutions, respectively. If we move GU A, or B, from its initial state to the red zone, a disconnection is produced in the blue zone, as depicted in Figure 2(b). If these movements are rejected because they produce a disconnection, the optimum will never be reached. On the contrary, if we repair the disconnected solution (Figure 2(c)), the algorithm will be able to find the optimum. This kind of configuration can be found in real cases; in Figure 3, we can see a set of GUs in West Virginia that are distributed similarly to the previous example. When we face this type of scenario, we do not know a priori whether the GUs in blue must be together or in different zones to reach the optimum. Thus, we designed our algorithms so that they can merge or separate any subset of GUs into one or different zones. After this analysis, we concluded that unless a repair process is implemented, the quality of the redistricting plans could heavily depend on initial solutions.

A similar conclusion was reported in [37]. The authors showed the advantages of a strategic oscillation procedure and concluded that transitions between feasible and infeasible space can be useful to reach an optimum. We think that this is a very important discovery for the redistricting problem since depending on the initial solution, the optimum of some instances can only be reached by visiting infeasible regions.

The NSGA-II and SPEA-II implementations proposed within the framework of this study can be seen as an adaptation of the classical multiobjective algorithms for the redistricting problem. Actually, the canonical structures of both strategies have been respected:
(i) For the NSGA-II version, the fast nondominated sort and the conventional crowding distance are used to select the best individuals for the next generation.
(ii) For the SPEA-II version, the characteristics SP1–SP3 are used to update the current population and the external archive.

6. Computational Experiments

This section presents the experiments carried out with the two algorithms described in the previous sections, for the redistricting problem of three of the most populated states in the United States, namely, California, New York, and Texas. The number of inhabitants, according to the census of 2010, and the number of districts for each state are presented in Table 1. As mentioned previously, all redistricting plans must satisfy constraints C1–C3, whereas equations (5)–(7) are considered as objectives to minimize. Additionally, we considered that the maximum allowed population deviation between the largest and the smallest districts should be 5%. Therefore, plans with a greater overall range were considered unacceptable, and they were rejected.

6.1. Experimental Settings

In order to deal with the stochastic effect inherent to heuristic techniques, 30 independent executions were performed for each algorithm. Additionally, the same seeds were used for both algorithms so that both techniques started with the same initial solutions.

Regarding the operating parameter settings, the only parameters to be tuned are the population size and maximum generation number. These latter parameters determine the number of objective function evaluations (OFEs) made during the search and, ideally, both optimization techniques should use the same number of evaluations in order to have a fair comparison basis. After some experiments, we found that both techniques required 1,000,000 OFEs to find good approximations to the Pareto front. In the present work, we chose to set the population size for both algorithms to 20 individuals and 25,000 generations (the mutation operator was applied to the new individuals and to the solutions in the current population).

For each state, NSGA-II and SPEA-II produce 30 approximated Pareto fronts (one for each run), and the nondominated solutions of the 30 fronts are combined to form one global front attained by each technique. Finally, the true Pareto front, $PF_{true}$, is considered as the union of the global fronts obtained with both methods (after removing dominated solutions).

In order to assess the performance of all techniques, we employed the participation in $PF_{true}$ and the hypervolume metric. These measures allow evaluating the quality of the set of nondominated solutions produced by each algorithm. The first indicator simply expresses the proportion of solutions produced by each technique that participates in the global combined Pareto front.

The second quality indicator, denoted as $I_H$, was originally proposed in [38]. The hypervolume of a set of solutions represents the size of the portion of objective space that is dominated by those solutions collectively. It is the only quality indicator known to capture in a single scalar both convergence and diversity of the solutions with respect to the optimal set. Let $\Lambda$ indicate the Lebesgue measure; then $I_H(B)$ is defined as

$$I_H(B) = \Lambda \left( \bigcup_{y \in B} \{ y' \mid y \prec y' \prec y_{ref} \} \right), \qquad (8)$$

where $I_H(B)$ is the volume of the region enclosed between the solution vectors $y \in B$ and $y_{ref}$, that is, the region dominated by $B$ and bounded by a reference point $y_{ref}$. The higher the hypervolume, the better the quality of the nondominated set found. The reference point used to calculate the metric must be chosen in such a way that its coordinates are larger than those of the nadir point in $PF_{true}$.
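
For small fronts, the hypervolume can be computed exactly with a recursive slicing scheme. The sketch below is one simple way of doing so under the assumption that all objectives are minimized; it is not necessarily the procedure of [38] or the one used in the experiments, and its cost grows quickly with the number of points and objectives. The function name and sample points are hypothetical.

```python
from typing import List, Sequence


def hypervolume(points: List[Sequence[float]], ref: Sequence[float]) -> float:
    """Exact hypervolume (minimization) by slicing along the last objective."""
    # Points that do not strictly improve on the reference contribute nothing.
    pts = [tuple(p) for p in points if all(a < r for a, r in zip(p, ref))]
    if not pts:
        return 0.0
    if len(ref) == 1:
        return float(ref[0] - min(p[0] for p in pts))
    pts.sort(key=lambda p: p[-1])
    volume = 0.0
    for i, p in enumerate(pts):
        upper = pts[i + 1][-1] if i + 1 < len(pts) else ref[-1]
        depth = upper - p[-1]
        if depth > 0:
            # Within this slab only the first i + 1 points are active.
            cross_section = [q[:-1] for q in pts[: i + 1]]
            volume += depth * hypervolume(cross_section, ref[:-1])
    return volume


if __name__ == "__main__":
    front = [(1, 3), (2, 2), (3, 1)]
    print(hypervolume(front, (4, 4)))
    print(hypervolume([(0, 0, 1), (1, 1, 0)], (2, 2, 2)))
```

Dominated points may be left in the input: they add no volume because the recursion measures the union of the dominated boxes.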

6.2. Numerical Results and Discussion

It was observed that the resulting redistricting plans generated by both algorithms were always feasible according to constraints C1–C3. However, some solutions were not considered because their overall range exceeded the allowed 5%. Indeed, NSGA-II and SPEA-II were designed to produce a set of well spread nondominated solutions. Therefore, the algorithms will produce some solutions with low compactness cost although their mean deviation and overall range can be affected. After removing these redistricting plans, the remaining solutions were filtered through a Pareto sorting procedure to identify the final nondominated sets. The global fronts produced by NSGA-II and SPEA-II (i.e., the nondominated solutions obtained from the 30 executions of each method) are shown in Figure 4.

First of all, we can see that both algorithms were able to generate redistricting plans with low mean deviation and overall range costs, whereas the value for compactness seems to be high. At this point, we must remark that, according to equation (7), the most compact shape is a circle and even a square, which is also considered a very compact shape, will have a positive compactness cost equal to 0.2146. Thus, in order to reach low compactness costs, we should produce redistricting plans where all districts resemble “almost” perfect circles. Even in this case, which is almost impossible to achieve in real scenarios, the irregular boundaries of states and geographic units may increase the cost of compactness, regardless of the shape of the districts. For example, in Figure 5, we include the compactness cost for a circle, a square, a circle made up of squares, and a circle with an irregular contour. According to equation (7), the perfect circle will have a compactness cost equal to zero, whereas the remaining figures will have a positive cost, which will depend on their area-perimeter ratios. Therefore, the cost of the solutions generated by NSGA-II and SPEA-II does not seem to be so high if we consider that we must sum the compactness of a large number of districts with irregular contours, for example, fifty-three districts in California.

On the other hand, although NSGA-II and SPEA-II generated redistricting plans that satisfy criteria C1–C3, some of these plans have an overall range higher than 5%. In fact, contrary to what was expected, NSGA-II produced a very small number of solutions with an overall range below the allowed threshold. Based on these observations, we decided to create enhanced variants of NSGA-II and SPEA-II, which will be referred to from now on as NSGA-IIHC and SPEA-IIHC.

In order to address the aforementioned obstacles that affect NSGA-II and SPEA-II, we decided to apply the following ideas. First, we defined an additional criterion C4, which requires the overall range to be lower than or equal to 5%. Thus, a redistricting plan can be considered feasible only if it satisfies criteria C1–C3 and C4. Second, since all the solutions generated by our algorithms always satisfy criteria C1–C3, we decided to promote criterion C4 by applying the following rules to compare two solutions [39]:
R1: between two solutions that do not satisfy criterion C4, the one having a lower overall range is preferred
R2: a solution that satisfies criterion C4 is always preferred over a solution that does not
R3: between two solutions that fulfill criterion C4, the one having the better objective function is preferred
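The comparison rules can be sketched as a single function. This is an illustration under stated assumptions rather than the implementation used in the experiments: the `Plan` class and the `preferred` function are hypothetical names, and "better objective function" in R3 is interpreted here as Pareto dominance over the three objectives, which the text does not spell out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MAX_RANGE = 0.05  # criterion C4: overall range of at most 5%


@dataclass(frozen=True)
class Plan:
    """Hypothetical container for the three objective values of a plan."""
    mean_deviation: float
    overall_range: float
    compactness: float

    def objectives(self) -> Tuple[float, float, float]:
        return (self.mean_deviation, self.overall_range, self.compactness)


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def preferred(a: Plan, b: Plan) -> Optional[Plan]:
    """Return the preferred plan under R1-R3, or None if neither is preferred."""
    a_ok = a.overall_range <= MAX_RANGE
    b_ok = b.overall_range <= MAX_RANGE
    if not a_ok and not b_ok:  # R1: both violate C4, lower overall range wins
        if a.overall_range == b.overall_range:
            return None
        return a if a.overall_range < b.overall_range else b
    if a_ok != b_ok:  # R2: the plan satisfying C4 wins
        return a if a_ok else b
    # R3 (assumed interpretation): both satisfy C4, use Pareto dominance
    if dominates(a.objectives(), b.objectives()):
        return a
    if dominates(b.objectives(), a.objectives()):
        return b
    return None
```

Returning `None` when neither plan is preferred leaves the tie to the usual diversity mechanisms of the underlying algorithm.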

This way, at the beginning, when all solutions have a high overall range, the algorithms use rules R1 and R2 to promote a better distribution of the population among the districts. As the algorithms advance, more solutions that satisfy criterion C4 are generated and the techniques explained in Section 4 are applied to select the best options.

Additionally, we can note that the overall range of a solution will change only if the number of inhabitants in the most populated, or the least populated, district is modified. Thus, when a solution is infeasible, the algorithms apply the following strategy: the most (the least) populated district has a higher probability of being chosen to give (to receive) GU’s to (from) its neighbor districts. Therefore, we slightly changed the operation mode of the crossover and mutation. When a solution is infeasible, these operators apply a biased roulette wheel selection, based on the number of inhabitants in each district, to choose the districts that must be modified. To preserve the balance between intensification and exploration, this strategy is not applied when the overall range is lower than 5%.
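A biased roulette wheel of this kind can be sketched as follows. The exact weighting used by the operators is not specified in the text, so this example assumes that a donor district is drawn with probability proportional to its population and a receiver district with probability proportional to the inverse of its population. The function names are hypothetical, populations are assumed to be positive, and the adjacency between districts is ignored for brevity.

```python
import random
from typing import Dict


def pick_donor(populations: Dict[str, int], rng: random.Random) -> str:
    """Draw a district to give up GUs; larger districts are more likely (assumed weights)."""
    districts = list(populations)
    weights = [populations[d] for d in districts]
    return rng.choices(districts, weights=weights, k=1)[0]


def pick_receiver(populations: Dict[str, int], rng: random.Random) -> str:
    """Draw a district to receive GUs; smaller districts are more likely (assumed weights)."""
    districts = list(populations)
    weights = [1.0 / populations[d] for d in districts]
    return rng.choices(districts, weights=weights, k=1)[0]


# Illustrative populations only.
populations = {"A": 900, "B": 500, "C": 100}
rng = random.Random(0)
donor = pick_donor(populations, rng)
receiver = pick_receiver(populations, rng)
```

Because the selection is probabilistic rather than greedy, districts other than the two extremes can still be modified, which keeps some exploration even while the plan is infeasible.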

To sum up, NSGA-IIHC and SPEA-IIHC enforce an additional criterion C4, compare solutions using rules R1–R3, and employ modified crossover and mutation operators based on a biased roulette wheel strategy. The new versions, NSGA-IIHC and SPEA-IIHC, were used to repeat the previously described experiments. Again, we rejected any redistricting plan with an overall range higher than 5%, and the remaining solutions were filtered through a Pareto sorting procedure. We present the Pareto fronts approximated by NSGA-IIHC and SPEA-IIHC in Figure 6. A first and clear observation concerns the number of nondominated solutions provided by each technique: this time, both algorithms found a higher number of feasible solutions.

In Figures 7–9, we include some examples of the redistricting plans generated by the algorithms for California, New York, and Texas, respectively. In these figures, we can see that the algorithms foster the production of regularly shaped districts. In fact, some of the most compact districts closely resemble a square or a rectangle. However, the generation of perfect squares is almost impossible due to the irregular boundaries of the states and their GU’s. In addition, the overall range of these solutions must be lower than 5%; thus, the proposed districts should represent an adequate balance between two competing objectives: population equality and compactness.

Finally, in order to compare the performance of the four algorithms, all the nondominated solutions generated by NSGA-II, SPEA-II, NSGA-IIHC, and SPEA-IIHC were filtered through a Pareto sorting procedure. The true Pareto fronts (i.e., the combination of those four global fronts) are presented in Figure 10. First of all, we want to highlight the fact that none of the solutions produced by NSGA-II, without the constraint-handling strategy, were included in the true Pareto front. This result was to be expected because this algorithm had difficulty generating solutions with a small overall range. In contrast, NSGA-IIHC was able to produce solutions at the extreme points of the true Pareto front. In fact, the most equally populated redistricting plans for Texas and New York were generated by NSGA-IIHC. Nevertheless, a second trend can be extracted. The front illustrations show that NSGA-II had trouble producing a set of evenly distributed solutions in the true Pareto front. This observation indicates that NSGA-II, when trapped in a locally optimal front, has difficulty escaping it and reaching the real nondominated front. Additionally, as we previously said, the main goal of multiobjective optimization is to provide the DM with a wide set of nondominated solutions. From this point of view, SPEA-II and SPEA-IIHC outperform the NSGA-II versions. If we concentrate our analysis on the solutions that lie on the true Pareto front, we can see that the solutions generated by SPEA-II and SPEA-IIHC are much more evenly distributed than those of NSGA-IIHC. Finally, we can note that the most equally populated redistricting plans for California were generated by SPEA-IIHC.

As a consequence of the former observations, the composition of the (approximated) true Pareto front is clearly biased in favor of SPEA-II and SPEA-IIHC. Indeed, Table 2 shows that SPEA-II and SPEA-IIHC contribute more points to the true Pareto front than NSGA-IIHC. From this point of view, the SPEA-II versions are clearly superior to the NSGA-II algorithms.

Regarding the hypervolume metric, Table 3 shows the hypervolume for the nondominated solutions reported by each algorithm after 30 runs. We can see that NSGA-IIHC obtains the best hypervolume for California, although the differences between NSGA-IIHC, SPEA-II, and SPEA-IIHC are marginal. In contrast, in Texas and New York, NSGA-II was outperformed by SPEA-II and SPEA-IIHC. In these states, the two algorithms based on SPEA-II obtain very similar hypervolume values. However, SPEA-II obtains the higher hypervolume in both instances. Up to this point, we cannot say that any of the three algorithms is clearly superior to its counterparts. On the one hand, if we use the data reported in Tables 2 and 3, we can say that SPEA-II seems to be the best option. On the other hand, in Figure 6, we can see that the most equally populated solutions were generated by NSGA-IIHC and SPEA-IIHC. For these reasons, we consider that the three algorithms should be used together to provide a wide set of high-quality solutions.

Finally, all the algorithms were compared in terms of computational efficiency. Table 4 summarizes the average execution time for each technique on California, Texas, and New York. The experiments were performed on an Intel Core i7 (3.4 GHz) computer with 32 GB of memory. We can see that all the techniques showed similar running times for each state. It is worth noting that the execution time of the proposed algorithms also depends on the number of generations and the number of individuals in the population. Increasing these parameters would clearly result in greater execution time.

7. Conclusions

Population equality is one of the most important criteria considered for the design of electoral districts. In this paper, we considered two different measures to promote a balanced district population: the mean deviation and the overall range. These criteria and a compactness measure were included as objectives to minimize in a multicriteria formulation of the redistricting problem. We proposed two algorithms based on a classic implementation of NSGA-II and SPEA-II and two versions that included a constraint-handling strategy. We also introduced effective and efficient population-balancing crossover and mutation operators to improve population deviations in newly created solutions. The four algorithms were applied in California, Texas, and New York.

From preliminary tests, we found that the classical version of NSGA-II struggles to find solutions that adequately satisfy population equality. Hence, we decided to use a constraint-handling strategy to improve its performance, yielding NSGA-IIHC. On the other hand, we observed that the conventional version of SPEA-II was able to produce high-quality solutions. However, in order to make a fair comparison, we implemented a SPEA-II version with a constraint-handling strategy, SPEA-IIHC. The nondominated solutions generated by these techniques and their participation in the true Pareto front were used to compare their performance. It was clear that the classical version of NSGA-II is not a competitive option for this kind of problem. The remaining algorithms seemed to have complementary behavior. Indeed, the solutions generated by different techniques are clearly located in different regions of the true Pareto front. Thus, we cannot conclude that any of these algorithms is better than its counterparts.

However, we consider that the application of NSGA-IIHC, SPEA-II, and SPEA-IIHC can be a very valuable tool in any real redistricting process. This is especially true because some state statutes do not establish a clear limit for population equality and compactness; constitutional language only suggests that districts must be substantially equal in population. In these cases, our proposals provide a wide set of nondominated solutions that can be used as a guide to identify improvements to any new redistricting plan, regardless of whether mean deviation, overall range, or compactness is considered more important. Additionally, the time required by each algorithm is acceptable if we consider that a redistricting plan is used for many years.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.