Abstract

Censuses in Mexico are taken by the National Institute of Statistics and Geography (INEGI). In this paper a Two-Phase Approach (TPA) to optimize the routes of INEGI’s census takers is presented. For each pollster, in the first phase, a route is produced by means of the Simulated Annealing (SA) heuristic, which attempts to minimize the travel distance subject to particular constraints. Whenever the route is unrealizable, it is made realizable in the second phase by constructing a visibility graph for each obstacle and applying Dijkstra’s algorithm to determine the shortest path in this graph. A tuning methodology based on the irace package was used to determine the parameter values for TPA on a subset of 150 instances provided by INEGI. The practical effectiveness of TPA was assessed on another subset of 1962 instances, comparing its performance with that of the in-use heuristic (INEGIH). The results show that TPA clearly outperforms INEGIH, with an average improvement of 47.11%.

1. Introduction

The National Institute of Statistics and Geography (INEGI, its acronym in Spanish) is the government institution responsible for collecting, processing, analyzing, preserving, and publishing the statistical and geographical data that support decision-making processes in Mexico. These activities are carried out continuously, require significant public budgets, and involve specific rules and safeguards to ensure the reliability of the generated information. Among its several functions, the planning and execution of official censuses stand out.

To carry out censuses and polls, INEGI faces the logistic problem of determining a minimal length travel route through the city blocks of the instance assigned to each census taker or pollster. This problem is subject to two constraints: (a) the route must use accessible roads exclusively, and (b) for practical reasons each city block must be completely traversed before passing to the next one.

Presently, INEGI approaches this problem by applying the following rules of thumb (INEGIH) [1]: (a) the first city block to be traversed by the pollster is the one situated in the northwest corner of its assigned instance; (b) each city block must be completely toured, starting from its northwest corner, before going to the next one; (c) once a city block has been toured, the next one is chosen following a zigzag path from north to south (see Figures 1 and 2).

From the combinatorial optimization viewpoint, this problem can be posed as a Minimal Hamiltonian Path Problem (MHPP) with side constraints. Given a graph with weighted edges, MHPP consists in finding a minimal length path that visits each node of the graph exactly once. The similarity between MHPP and the well-known Traveling Salesman Problem (TSP) is clear. Hence, in view of constraint (b) above, INEGI’s problem could be modeled as a directed General Traveling Salesman Problem (GTSP), which, in turn, can be transformed into a directed TSP, allowing us to use the standard solution methods for TSP (see [2] and the references therein).

Also, INEGI’s problem could be seen as a previously unreported member of the family of arc-routing problems, but since its instance data are available only in terms of vertex coordinates, this kind of modeling did not appear straightforward. For a thorough account of arc-routing problems see [3].

The decision was to model INEGI’s problem as an undirected MHPP with side constraint (b) in a geometric realm. The GTSP-to-TSP transformation together with standard solution methods did not seem attractive because, being of general use, it does not take advantage of the geometric aspects of the problem, and its performance depends significantly on the number of vertices of the instances. In contrast, this work proposes a solution method whose performance relies more on the number of blocks than on the number of vertices. Other approaches could be the subject of future research. As MHPP has been classified as NP-complete in complexity theory [4], the proposal is an ad hoc heuristic procedure, called TPA (Two-Phase Approach), composed of two phases. The first one uses an SA algorithm [5] aimed at solving MHPP in a graph whose nodes correspond to the vertices of the polygons defining the city blocks. The second phase is devoted to repairing unrealizable solutions (those with crossings over houses or buildings) obtained by the SA algorithm. This is done by means of computational geometry concepts such as visibility graphs [6, 7] and by finding shortest paths in these graphs with Dijkstra’s algorithm [8].

TPA was assessed on a subset of 1962 real-world instances provided by INEGI. Results show that TPA attains an average improvement of 47.11% with respect to INEGIH.

The rest of the paper has been organized as follows. In Section 2 the problem under study is formally defined. The implementation details of TPA are discussed in Section 3. Then, Section 4 presents computational experiments aimed at determining suitable parameter settings for TPA and at comparing its performance with that of INEGIH. Finally, the main conclusions of this research are provided in Section 5.

2. Problem Formalization

The optimization problem studied in this paper consists in finding a minimal length route that traverses all the city blocks in a given instance, subject to the following specific requirements: (a) each city block must be completely traversed before passing to the next one, and (b) the route must use accessible roads exclusively, namely, without crossing any city block. In this sense, it can be seen as an instance of the MHPP.

Figure 3(a) shows an unrealizable route because of crossings over city blocks. However, crossings can be eliminated by adding vertices to the route (see Figure 3(b)). Most instances have blocks with irregular, asymmetric geometries, and many more vertices.

More formally, let $B = \{b_1, b_2, \ldots, b_n\}$ be the set of city blocks of an instance. Each block $b_i$ is defined as an $m_i$-side polygon with vertex set $V_i = \{v_{i,1}, v_{i,2}, \ldots, v_{i,m_i}\}$ and perimeter

$$per(b_i) = \sum_{j=1}^{m_i} d\big(v_{i,j},\, v_{i,(j \bmod m_i)+1}\big).$$

Let $V = \bigcup_{i=1}^{n} V_i$ be the set composed of all the vertices of city blocks, and let $v = |V|$. Given a subset $S \subseteq V$ that includes at least one vertex from each city block $b_i$, a route $R = (r_1, r_2, \ldots, r_k)$ is an order on $S$ of size $k = |S|$. Then, the total length (cost) of route $R$ is

$$C(R) = \sum_{i=1}^{n} per(b_i) + \sum_{j=1}^{k-1} d(r_j, r_{j+1}), \qquad (1)$$

where $d(r_j, r_{j+1})$ stands for the Euclidean distance between vertices $r_j$ and $r_{j+1}$. The problem consists in finding a route $R^*$ minimizing (1); namely,

$$R^* = \operatorname*{arg\,min}_{R \in \mathcal{R}} C(R), \qquad (2)$$

where $\mathcal{R}$ is the set of all possible routes.
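For illustration, the following Java sketch evaluates formula (1) for a given route. The class and method names (Vertex, RouteCost, cost) are hypothetical and the code is not part of TPA; it only makes explicit the decomposition of the cost into the constant perimeter term and the inter-block travel distances.

// Hypothetical sketch of formula (1): total cost = sum of block perimeters
// plus the Euclidean distances between consecutive connecting vertices.
final class Vertex {
    final double x, y;
    Vertex(double x, double y) { this.x = x; this.y = y; }
    double distanceTo(Vertex o) { return Math.hypot(x - o.x, y - o.y); }
}

final class RouteCost {
    // perimeters[i] : precomputed perimeter of the i-th block of the instance
    // connectors[j] : vertex used to leave/enter the j-th block visited by the route
    static double cost(double[] perimeters, Vertex[] connectors) {
        double total = 0.0;
        for (double per : perimeters) total += per;        // constant term of (1)
        for (int j = 0; j + 1 < connectors.length; j++)    // inter-block distances
            total += connectors[j].distanceTo(connectors[j + 1]);
        return total;
    }
}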

3. TPA

The problem under study was solved by implementing TPA as outlined in Section 1. Its pseudocode is shown below as Algorithm 1. In the first phase of TPA a route is constructed by applying the SA algorithm. The core of SA is a neighborhood function, called N, composed of two carefully designed neighborhood relations (see line 10). The route obtained by SA is expected to be optimal or near-optimal; however, it can be unrealizable because the neighborhood function is likely to generate routes with crossings over city blocks. Thus, the second phase of TPA (see line 20) is devoted to repairing those unrealizable solutions. This is done by adding to the route one or more vertices of some city blocks so as to avoid crossings over them. The implementation details of the TPA algorithm are as follows.

(1)  TPA (I, T0, α)
(2)  begin
(3)  generate a random initial solution s
(4)  s_best ← s
(5)  T ← T0
(6)  repeat
(7)    nbIter ← 0
(8)    while  nbIter < maxIter  do
(9)      nbIter ← nbIter + 1
         /* generate a neighboring solution */
(10)     s′ ← generateNeighbor(s, p)
(11)     Δ ← cost(s′) − cost(s)
(12)     generate a random real number u ∈ [0, 1)
(13)     if (Δ ≤ 0) or (u < e^(−Δ/T)) then
(14)       s ← s′
(15)       if  cost(s) < cost(s_best)  then  s_best ← s
(16)     end
(17)   end
(18)   T ← α · T
(19) until stopCondition(s_best, T)
(20)  repairCrossings(s_best)
(21) return s_best
(22)end

3.1. Internal Representation

During the SA process, a route (a solution) is represented by two vectors B and P, where, for i = 1, …, n, B[i] is the city block visited in the ith step of the route and P[i] is the vertex of block B[i] where the route leaves B[i] to go to block B[i + 1], or, equivalently, where the route arrives at B[i] when coming from block B[i − 1].
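For illustration, a minimal Java sketch of this encoding (class and field names are hypothetical, not taken from the authors' implementation) stores the two vectors as parallel arrays indexed by visit order.

// Hypothetical sketch of the solution encoding described above:
// blockOrder[i] = index of the city block visited in the i-th step of the route,
// usedVertex[i] = index, within that block, of the vertex used to connect it
//                 to the previous/next block of the route.
final class Solution {
    final int[] blockOrder;   // a permutation of 0..n-1
    final int[] usedVertex;   // one connecting vertex per visited block
    double cost;              // cached value of formula (1)

    Solution(int n) {
        blockOrder = new int[n];
        usedVertex = new int[n];
    }
}

Swapping two entries of blockOrder realizes the first neighborhood relation described next, while changing a single entry of usedVertex realizes the second.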

3.2. Neighborhood Function

The main objective of the neighborhood function is to produce new solutions (routes) by making small changes to the current solution. This neighborhood function is composed of two neighborhood relations (see Algorithm 2). The first one, N1, generates a neighboring solution by swapping, in the current solution s, two randomly selected city blocks (uniform distribution). Its range is n(n − 1)/2, the number of unordered pairs of blocks.

(1)   generateNeighbor (s, p)
      /* s is represented using two vectors, one of blocks B and one of vertices P.
      B[i] represents the ith block visited in the path and P[i] the vertex used for this block */
(2)   begin
(3)   generate a random number u ∈ [0, 1)
(4)   if (u < p) then
(5)    select two different blocks at random positions i and j in B
      /* exchange the blocks B[i] and B[j] in the solution */
(6)   s′ ← swapBlocks(s, i, j)
(7)   else
(8)    select a block at a random position i in B, that is, b ← B[i]
(9)    b_next ← B[i + 1]
(10)   minDist ← ∞
      /* look for the vertex of b which is closest to b_next and does not generate crossings */
(11) for  every vertex  w  of block  b  do
(12)    dist ← d(w, P[i + 1])
(13)    cross ← verifyCrossings(w, P[i + 1], b, b_next)
(14)  if (cross = false) and (dist < minDist) then
(15)      minDist ← dist
(16)      w* ← w
(17)  end
(18) end
     /* assign the vertex w* as the one used by block b */
(19)    s′ ← changeCurrentVertex(s, i, w*)
(20) end
(21) return  s′
(22) end

The second neighborhood relation, N2, changes the vertex of one randomly selected city block currently used in route s to produce a neighboring solution with the following procedure. First a city block, say B[i], is randomly selected. Then, the next block in the current route (namely, B[i + 1]) is also considered to identify one of the vertices of B[i] satisfying two conditions: (a) it minimizes the distance between city blocks B[i] and B[i + 1], and (b) it avoids crossings when connecting these two city blocks (see Algorithm 3). The latter condition is ensured by an efficient verification of line intersection through computational geometry procedures [6]; a sketch of such a test is given after Algorithm 3. The range of N2 is bounded by the total number of vertices v.

(1)    verifyCrossings (u, w, b, b_next)
    /*  verifyLineIntersection(e, f) returns true if the two line segments intersect */
(2)    begin
(3)     e ← segment(u, w)
(4)     selfCrossings ← false
(5)     otherCrossings ← false
       /* verify self crossings */
(6)     for  every edge  f  of block  b  do
(7)      selfCrossings ← selfCrossings or verifyLineIntersection(e, f)
(8)     end
(9)     if selfCrossings then return true
       /* verify crossings with block b_next */
(10)  for  every edge  f  of block  b_next  do
(11)   otherCrossings ← otherCrossings or verifyLineIntersection(e, f)
(12)  end
(13)  return (selfCrossings  or  otherCrossings)
(14) end
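The verification in Algorithm 3 ultimately reduces to a segment-intersection primitive. The following Java sketch shows one standard way of implementing verifyLineIntersection with orientation (cross-product) tests, reusing the Vertex class sketched in Section 2; it assumes general position (collinear and touching cases are ignored) and is an illustration rather than the authors' code.

// Hypothetical sketch of verifyLineIntersection based on orientation tests.
// Degenerate collinear or touching configurations are ignored for brevity.
final class Segments {
    // > 0 if (a, b, c) is a left turn, < 0 if a right turn, 0 if collinear.
    static double orient(Vertex a, Vertex b, Vertex c) {
        return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
    }

    // True if segment p1-p2 properly intersects segment q1-q2.
    static boolean intersect(Vertex p1, Vertex p2, Vertex q1, Vertex q2) {
        double d1 = orient(p1, p2, q1);
        double d2 = orient(p1, p2, q2);
        double d3 = orient(q1, q2, p1);
        double d4 = orient(q1, q2, p2);
        return (d1 * d2 < 0) && (d3 * d4 < 0);
    }
}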

During the search process, a combination of the N1 and N2 neighborhood relations is employed. The former is applied with probability p, while the latter is employed at a (1 − p) rate. Thus, the neighborhood function is defined as

$$N(s) = \begin{cases} N_1(s) & \text{if } u < p,\\ N_2(s) & \text{otherwise,} \end{cases}$$

where u is a random number uniformly distributed in the interval [0, 1).
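A minimal Java sketch of this probabilistic composition is given below. It operates on the Solution encoding of Section 3.1; the vertex-change move is deliberately simplified to a random vertex choice (the actual relation N2 selects the closest non-crossing vertex, as in Algorithm 2), and all names are hypothetical.

// Hypothetical sketch of the composed neighborhood: with probability p apply the
// block-swap relation N1, otherwise the vertex-change relation N2 (simplified here).
final class Neighborhood {
    // verticesPerBlock[b] is the number of vertices of block b; assumes n >= 2 blocks.
    static void apply(Solution s, double p, int[] verticesPerBlock, java.util.Random rng) {
        int n = s.blockOrder.length;
        if (rng.nextDouble() < p) {                    // N1: exchange two visit positions
            int i = rng.nextInt(n), j = rng.nextInt(n);
            while (j == i) j = rng.nextInt(n);
            int tb = s.blockOrder[i]; s.blockOrder[i] = s.blockOrder[j]; s.blockOrder[j] = tb;
            int tv = s.usedVertex[i]; s.usedVertex[i] = s.usedVertex[j]; s.usedVertex[j] = tv;
        } else {                                       // N2 (simplified): new connecting vertex
            int i = rng.nextInt(n);
            s.usedVertex[i] = rng.nextInt(verticesPerBlock[s.blockOrder[i]]);
        }
    }
}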

3.3. Evaluation Function

The selection of the evaluation function is a critical aspect of any search procedure. First, to efficiently test each potential solution, the evaluation function must be as simple as possible. Second, it must be sensitive enough to identify promising regions of the solution space. Finally, the evaluation function must be consistent: a solution that is better than another must yield a better value of the objective function [9].

In the TPA implementation, the cost of a solution is calculated using formula (1). To determine the cost of a route with formula (1), the perimeter of every city block and the Euclidean distance between every pair of consecutive vertices in the route must be computed, so this complete evaluation scheme executes a number of instructions proportional to n + k (recall that n represents the number of city blocks in the problem instance and k is the size of an order (route) on S). Nevertheless, TPA employs an incremental evaluation of the neighboring solutions which takes advantage of the fact that the evaluation function contains a constant term, the sum of the block perimeters. This term is precomputed only once and used throughout the search process. Furthermore, when a neighboring solution is generated with relation N1, the incremental evaluation scheme only recomputes the (up to four) Euclidean distances that change in order to obtain the cost of the new route. This is much faster than the roughly n + k operations originally required.
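As an illustration of this incremental scheme for the block-swap relation N1 (a sketch with hypothetical names, not the authors' code), the cost difference of exchanging the connecting vertices at positions i and j can be obtained from the at most four route edges incident to those positions.

// Hypothetical sketch of the delta evaluation for the block-swap move: only the route
// edges adjacent to positions i and j change, so at most four old distances are
// subtracted and at most four new ones are added.
final class IncrementalEval {
    static double deltaSwap(Vertex[] conn, int i, int j) {
        double before = adjacentLength(conn, i) + adjacentLength(conn, j);
        Vertex t = conn[i]; conn[i] = conn[j]; conn[j] = t;   // apply the swap tentatively
        double after = adjacentLength(conn, i) + adjacentLength(conn, j);
        t = conn[i]; conn[i] = conn[j]; conn[j] = t;          // undo the swap
        return after - before;                                // to be added to the cached cost
    }

    // Length of the (at most two) route edges incident to position k.
    static double adjacentLength(Vertex[] conn, int k) {
        double len = 0.0;
        if (k > 0) len += conn[k - 1].distanceTo(conn[k]);
        if (k + 1 < conn.length) len += conn[k].distanceTo(conn[k + 1]);
        return len;
    }
}

The new cost is then the cached cost plus the returned difference, so no full recomputation of formula (1) is necessary.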

A similar incremental evaluation mechanism was implemented for the neighborhood relation N2. As a consequence, TPA analyzes thousands of neighboring solutions employing only a very small fraction of the time that would be required by the complete evaluation scheme. This is possible thanks to the use of appropriate data structures.

3.4. Cooling Schedule

A cooling schedule is defined by the following parameters: initial temperature, maximum number of solutions generated at each temperature value (Markov chain length), rule for decreasing the temperature, and stop condition. The cooling schedule governs the convergence of any SA [10].

In the TPA implementation, a geometric (static) cooling scheme was adopted because of its simplicity, the computational time it requires, and the quality of the solutions produced in preliminary experiments. It starts at an initial temperature T0. Then, the temperature is decreased at each round k by a positive factor α < 1 using the relation Tk+1 = α · Tk. For each temperature Tk, the maximum number of neighboring solutions was fixed to maxIter = C · range(N). Hence, it depends directly on the range of the neighborhood function N and on the constant C, which is called the “moves factor.” This is because more neighboring solutions are required for denser instances.
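Under this geometric rule the temperature after k decrements is Tk = α^k · T0, so the number of temperature levels explored before reaching the final temperature Tf used by the stop condition (Section 3.5) admits a closed form. As a purely illustrative calculation, assuming for instance T0 = 100, Tf = 0.01, and α = 0.95 (values chosen only for the example, not the tuned ones of Section 4):

$$k = \left\lceil \frac{\ln(T_f / T_0)}{\ln \alpha} \right\rceil = \left\lceil \frac{\ln(0.01 / 100)}{\ln 0.95} \right\rceil = \lceil 179.6 \rceil = 180,$$

so at most about 180 · maxIter neighboring solutions would be evaluated before the temperature criterion alone stops the search.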

3.5. Stop Condition

TPA stops if either the best-known solution does not change during a predefined number (maxNoImp) of consecutive temperature decrements or the current temperature falls below a prefixed final value Tf.

3.6. Repair Mechanism

Let R denote the route that SA offers as the optimal solution in the first phase of TPA. Recall that R is defined by the vectors B and P. The second phase of TPA consists in applying the following repair mechanism to R (see Algorithm 4).

(1)   repairCrossings (R)
(2)   begin
    /* list of detected crossings in routing R */
(3)   crossings ← ∅
      /* detect crossings in routing R */
(4)   for  every edge (P[i], P[i + 1]), i = 1, …, k − 1, used in routing  R  do
      /* crossings with block B[i] */
(5)    crossings ← crossings ∪ selfCrossings(P[i], P[i + 1], B[i])
      /* get a list of neighbor blocks */
(6)    NB ← neighborBlocks(B[i])
      /* crossings with neighbor blocks */
(7)    crossings ← crossings ∪ blockCrossings(P[i], P[i + 1], NB)
(8)   end
(9)   R′ ← R
(10) for  every detected crossing (P[i], P[i + 1]) in  crossings  do
     /* get a list of neighbor blocks */
(11)   NB ← neighborBlocks(B[i])
        /* construct a visibility graph */
(12)   G ← buildVisibilityGraph(NB)
       /* find a path without crossings and minimum distance between P[i] and P[i + 1] */
(13)    path ← shortestPath(G, P[i], P[i + 1])
        /* repair routing using the path to avoid a particular crossing */
(14)     R′ ← insertPath(R′, path)
(15) end
     /* return a repaired routing with minimum total distance */
(16) return  R′
(17) end

Let c_i denote the centroid of city block b_i and, for i ≠ j, let δ(b_i, b_j) = d(c_i, c_j), where c_i and c_j are the centroids of the two blocks. Further, for each block b_i, let NB(b_i) be the set of blocks of the instance whose centroids are nearest to c_i under δ, that is, the neighboring blocks of b_i.

Now, let X_i denote the set of blocks in NB(b_i) that are crossed by the line segment going from P[i] to P[i + 1], for i = 1, …, k − 1 (see lines 4–8). A line segment e crosses a city block b if e has a nonempty intersection with the interior of the polygon defining b. The crossings can be of two types: (a) self-crossings, when an edge crosses the city block itself, and (b) neighboring-crossings, those involving blocks in NB(b_i). For example, considering the blocks of Figure 3(a) as a subset of an instance and taking the block from which segment 1-2 departs as the reference block, the line segment 1-2 is a self-crossing, and the segment 2-3 is a neighboring-crossing.

Then, whenever X_i is not empty, a set NB of neighboring blocks is produced (see lines 10–15). The set NB serves to build up a visibility graph G; see [7]. In G, the set of nodes is composed of the vertices of all city blocks in NB, and an edge (u, w) belongs to G if (u, w) is a side of a city block in NB or if the line segment with endpoints u and w does not cross any block in NB (namely, w is visible from u and vice versa). Each edge in G has as weight the Euclidean distance between its endpoints.

Figure 4(a) presents a visibility graph between vertices 1 and 3. Figure 4(b) shows a realizable path between vertices 1 and 3. A shortest path between vertices u and w in G is obtained by means of Dijkstra’s algorithm. Of course, this path, of minimal length, does not cross any block. Dijkstra’s algorithm is a well-known, efficient algorithm for solving the so-called Shortest Path Problem (SPP) in a graph with nonnegative edge costs [8].
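For concreteness, the following Java sketch combines the two steps of the repair: it builds a visibility graph over the vertices of the neighboring blocks and then runs a textbook implementation of Dijkstra’s algorithm between the two endpoints of a crossing edge. It reuses the Vertex and Segments sketches given earlier, the visibility test is simplified (only proper intersections with block sides are checked), and all names are hypothetical rather than taken from the authors' implementation.

// Hypothetical sketch: visibility graph over the vertices of the neighboring blocks,
// followed by Dijkstra's algorithm between the endpoints of a crossing route edge.
final class RepairSketch {
    // verts      : all vertices of the neighboring blocks
    // blockSides : pairs {a, b} of vertex indices forming the sides of those blocks
    static java.util.List<Integer> shortestVisiblePath(Vertex[] verts,
            java.util.List<int[]> blockSides, int source, int target) {
        int n = verts.length;
        double[][] w = new double[n][n];
        for (double[] row : w) java.util.Arrays.fill(row, Double.POSITIVE_INFINITY);

        // Two vertices are joined if the segment between them crosses no block side
        // (block sides themselves pass this test), weighted by Euclidean distance.
        for (int a = 0; a < n; a++)
            for (int b = a + 1; b < n; b++) {
                boolean blocked = false;
                for (int[] e : blockSides)
                    if (e[0] != a && e[0] != b && e[1] != a && e[1] != b
                            && Segments.intersect(verts[a], verts[b], verts[e[0]], verts[e[1]])) {
                        blocked = true; break;
                    }
                if (!blocked) w[a][b] = w[b][a] = verts[a].distanceTo(verts[b]);
            }

        // Textbook Dijkstra on the dense visibility graph.
        double[] dist = new double[n];
        int[] prev = new int[n];
        boolean[] done = new boolean[n];
        java.util.Arrays.fill(dist, Double.POSITIVE_INFINITY);
        java.util.Arrays.fill(prev, -1);
        dist[source] = 0.0;
        for (int round = 0; round < n; round++) {
            int u = -1;
            for (int v = 0; v < n; v++)
                if (!done[v] && (u == -1 || dist[v] < dist[u])) u = v;
            if (dist[u] == Double.POSITIVE_INFINITY) break;
            done[u] = true;
            for (int v = 0; v < n; v++)
                if (dist[u] + w[u][v] < dist[v]) { dist[v] = dist[u] + w[u][v]; prev[v] = u; }
        }

        // Recover the vertex sequence from source to target (the detour inserted into the route).
        java.util.LinkedList<Integer> path = new java.util.LinkedList<>();
        for (int v = target; v != -1; v = prev[v]) path.addFirst(v);
        return path;
    }
}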

4. Computational Experiments

In this section, two computational experiments are presented. The first one was aimed at determining the parameter values of TPA that yield the best trade-off between solution quality and computing time. The second experiment was carried out to compare TPA with INEGIH in solving the route problem. For these experiments, the algorithms were coded in Java SE Development Kit version 7, update 51. They were run sequentially on a cluster equipped with 10 InfiniBand-interconnected nodes, each of which features 8 cores running at 2.66 GHz, has a total of 16 GB of RAM, and uses the CentOS distribution of the Linux operating system.

4.1. TPA Parameter Tuning

Optimizing parameter values is an important activity when implementing efficient algorithms. The literature offers many possibilities to find a combination of parameter values that allows a particular algorithm to furnish the best solution quality in reasonable computational time [11, 12].

In this work, the strategy was to find the most appropriate parameter values of TPA by employing the iterated racing procedure I/F-Race [13]. This offline automatic configuration procedure is implemented in R as part of the irace package [14] and has been successfully used in several research projects [15]. I/F-Race is based on the use of racing [16] and Friedman’s nonparametric two-way analysis of variance by ranks. It consists of the following three steps, executed iteratively until a predefined termination criterion is met: sampling new parameter configurations according to either a normal or a discrete distribution, depending on whether the parameters are numeric or categorical; selecting the best parameter configurations from the newly sampled ones by means of racing; and updating the sampling distribution in order to bias the sampling towards the best configurations. Further details of the I/F-Race iterative procedure and its implementation can be found in [13, 14, 16].

For the experiments INEGI provided a set of 2112 instances, where the number of city blocks varies from 3 to 103 and the total number of vertices reaches up to 8461. From this set a subset of 150 instances served in the tuning experiments, and the remaining 1962 instances were used during the performance evaluation of TPA, as will be detailed later. From preliminary experiments, it was found that TPA required different parameter configurations depending on the density of the instance to be solved. The density of a simple undirected graph G = (V, E) is defined as

$$d(G) = \frac{2|E|}{|V|\,(|V| - 1)}.$$

For this specific problem |V| = v, the total number of vertices, and n is the number of blocks. Then, the total number of block sides in a particular instance can be expressed as |E| = m_1 + m_2 + ⋯ + m_n. Consequently, the density of a particular instance can be computed as

$$d = \frac{2 \sum_{i=1}^{n} m_i}{v\,(v - 1)}.$$

Thus, the decision was to sort the test set obtained from INEGI according to density. Then, five representative density ranks were identified (see Table 1), and 30 instances from each rank were randomly selected for tuning the TPA algorithm. The idea behind this is to have a representative sample of the instances for this experiment and to produce one good parameter configuration for each density rank.
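Under this reconstructed definition, the density of an instance follows directly from the side counts of its blocks and its total number of vertices. The short Java sketch below (hypothetical names, an illustration only) shows the calculation.

// Hypothetical sketch: density of an instance from the number of sides of each block
// (sidesPerBlock[i] = m_i) and the total number of vertices v, following
// d = 2|E| / (|V| (|V| - 1)).
final class Density {
    static double of(int[] sidesPerBlock, int totalVertices) {
        long sides = 0;
        for (int m : sidesPerBlock) sides += m;   // |E| = m_1 + ... + m_n
        return (2.0 * sides) / ((double) totalVertices * (totalVertices - 1));
    }
}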

A sample of 30 statistically representative instances of each rank was used to run the irace package [15]. 1500 independent runs were executed for each instance.

In Table 2, the allowed ranges of the parameter settings used for the tuning process are presented. T0, C, and maxNoImp stand, respectively, for the initial temperature, the moves factor, and the maximum number of consecutive temperature decrements without improvement. For all the ranks, fixed ranges were also allowed for the final temperature Tf, the cooling factor α, and the probability p of using each of the two neighborhood relations.

As a result, the five different parameter configurations depicted in Table 3 were obtained. These parameter configurations are therefore used in the experimentation reported next.

In Table 3, the parameter combinations found with irace for TPA are presented. The initial temperature T0 grows as density grows, whereas maxNoImp does not present a defined trend; its values are, however, relatively similar across ranks. The moves factor C also grows as density grows; however, in the densest rank its value drops significantly to 0.66. In this rank, the instances have up to 103 blocks and 8461 vertices. A smaller C is required because of the large size of the neighborhoods N1 and N2 and the relation maxIter = C · range(N).

4.2. Comparing TPA with INEGIH

The main objective of this section is to compare the average solutions achieved by TPA with those of INEGIH. Table 4 describes the 1962 (2112 − 150) instances used to evaluate the performance of TPA; next, the results are compared with those of INEGIH. Due to the nondeterministic nature of TPA, 30 independent runs were executed for each instance. The parameter values employed by TPA are those previously found in Section 4.1.

The results from this experiment are presented in Table 5, which summarizes, for each rank, the average data obtained by the compared algorithms. The solutions found by INEGIH are listed in column 2. Columns 3 to 6 show, respectively, the best, the average, and the standard deviation of the total travel distance (cost) reached by TPA, and the CPU time needed in seconds. Additionally, the average improvement attained by TPA was computed as the relative cost reduction with respect to INEGIH, averaged over the instances. Table 5 shows that TPA clearly outperforms INEGIH, with an average improvement of 47.11%. This highlights the suitability of TPA. Also, a relatively small standard deviation is an indicator of the algorithm’s precision and robustness. TPA consumes, on average, 0.54 seconds per run. The CPU time consumed by TPA is acceptable and fully justified considering that it outperforms INEGIH in terms of cost. The results achieved by TPA are illustrated in Figure 5, where the subset of 1962 instances is indexed in nonincreasing order of cost improvement.

Regarding the instance with the highest number of blocks (92 blocks, 478 vertices), the improvement was 70.90%. Further, in the instance with the highest number of vertices (51 blocks, 8461 vertices), the improvement was 40.30%. The highest improvement, 98.82%, occurred in an instance of 59 blocks and 519 vertices, while the lowest improvement, 1.18%, occurred in an instance with 6 blocks and 275 vertices. TPA thus seems to be an effective approach to finding good quality solutions for the route problem faced by INEGI.

In Table 5 it can be observed that ranks 2–5 present a relatively small standard deviation. This is an indicator of the algorithm’s precision and robustness, since it shows that, on average, the performance of TPA does not present important fluctuations. However, the standard deviation for rank 1 has a high value, which might be explained by the high variability of the instances’ size. For example, the maximum total perimeter of an instance belonging to this subset is 59.48 times the minimum total perimeter of another one. Further, when comparing the length of the optimal route in any two instances of rank 1, the maximum ratio is as much as 256.74.

5. Conclusions and Future Work

In this paper a Two-Phase Approach, called TPA, was proposed for the solution of a logistic problem faced by INEGI. In the first phase of TPA a route R, of expected minimal length, is produced by an ad hoc implementation of the SA heuristic. Due to the particular implementation, route R is likely to be unrealizable, namely, to have crossings over the city blocks. Hence, the second phase of TPA is aimed at repairing route R, namely, at eliminating the crossings from it. This is done with the use of visibility graphs [6] and Dijkstra’s algorithm [8].

The parameter values for TPA were established through intensive experimentation on a representative subset of 150 instances, applying a tuning methodology based on the irace package. The practical effectiveness of TPA was assessed on another subset of 1962 instances drawn from the test set, by comparing its performance with that of the in-use heuristic (INEGIH). The results show that TPA clearly outperforms INEGIH; the average improvement over the total distance traveled is 47.11% (636.31 versus 328.77).

The problem solved in this work has elements in common with traditional arc-routing problems; however, some details make it original. For example, its constraints are that (a) the route must use accessible roads exclusively, and (b) for practical reasons each city block must be completely traversed before passing to the next one. Constraint (a) implies a combination of an arc-routing problem with geometric aspects. Standard procedures exclude these considerations and therefore cannot be applied directly. The composed neighborhood implemented in the SA algorithm is an ad hoc strategy for this specific problem, designed to face the presence of blocks and vertices with irregular geometries that cause unrealizable solutions. Also, an integral solution to the problem as a whole has not been reported in the literature. In this sense, this work represents an original contribution.

Although outstanding results were obtained by TPA, they may still be improved. Future work will concentrate on designing alternative neighborhood and evaluation functions so as to boost the algorithm’s performance, since it is well known that these two components define the so-called landscape of the search problem and thus greatly impact the overall efficiency [9]. Another line of future work could focus on a different modeling of the problem or on different heuristic approaches. The instances as well as the detailed results of the experiments presented in this paper are available upon request.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The first author would like to thank Daniel Rivera, Héctor Sanvicente, and Joaquín Pérez (Ph.D. tutorial committee); PRODEP. This research was partly supported by the Programa para el Desarrollo Profesional Docente (PRODEP), from Mexico, through Grant 103-5/12/4535; Guillermina Urbano and Paulina Baldo from PRODEP; Mireya Gally from Universidad Politécnica del Estado de Morelos.