Abstract

Finding the shortest path of the traveling salesman problem (TSP) is a typical NP-hard problem and one of the basic optimization problems. TSP in three-dimensional space (3D-TSP) is an extension of TSP. It plays an important role in the fields of 3D path planning and UAV inspection, such as forest fire patrol path planning. Many existing studies have focused on the expected length of the shortest path of TSP in 2D space. The expected length of the shortest path in 3D space has not yet been studied. To fill this gap, this research focuses on developing models to estimate the expected length of the shortest path of 3D-TSP. First, different experimental scenarios are designed by combining different service areas and the number of demand points. Under each scenario, the specified number of demand points is randomly generated, and an improved genetic algorithm and Gurobi are used to find the shortest path. A total of 500 experiments are performed for each scenario, and the average length of the shortest path is calculated. The models to estimate the expected length of the shortest path are proposed. Model parameters are estimated and k-fold cross-validation is used to evaluate the goodness of fit. Results show that all the models fit the data well and the best model is selected. The developed models can be used to estimate the expected length of the shortest path of 3D-TSP and provide important references for many applications.

1. Introduction

The traveling salesman problem (TSP) is a typical combinatorial optimization problem and is also NP-hard [1, 2]. It is explained as a salesman trying to find the shortest path (or minimum travel cost) through a given set of cities, where the travel distances (or travel costs) between cities are also given, under the requirement of visiting each city only once and returning to the starting city [3]. TSP has a wide range of applications in transportation, logistics, computer science, genetics, and other fields [49]. The traveling salesman problem in three-dimensional space (3D-TSP) is a generalization of the classic TSP. 3D-TSP refers to an executor, such as a drone that tries to find the shortest path in the three-dimensional space and returns to the starting point after visiting each of the given demand points. The difference between 3D-TSP and the classic TSP is that the coordinates of the cities or demand points are no longer represented by two-dimensional coordinates but by three-dimensional coordinates. Many problems, such as the unmanned aerial vehicle (UAV) patrolling for inspection, could be regarded as a TSP in 3D space [10]. In these applications, it is important to know the expected length of the shortest path of the 3D-TSP under a given area and the number of demand points in order to determine the zoning strategy of the service area and the number of executors needed.

So far, many scholars have studied the expected length of the shortest path and proposed estimation models that adopt the number of cities and service area parameters as independent variables [1115]. However, models of the expected length of the shortest path of 3D-TSP have not yet been studied. As a result, this study contributes to determining the influence of the number of demand points and the relevant parameters of the service area on the expected length of the shortest path of 3D-TSP and establishing the estimation models of the expected length.

The remainder of this paper is structured as follows. Section 2 provides a review of the related literature on the algorithms to solve TSP and the estimation models of the expected length of the shortest path of TSP. Section 3 gives the formal definition of 3D-TSP, its objective function, and the improved genetic algorithm (GA) used in this paper. Section 4 describes the simulation experiment scenarios used in this paper, proposes the variables that can be used to construct the models, builds the models based on the results of the simulation experiment, and evaluates the goodness of fit of the models. Section 5 provides a summary of the entire study.

2. Literature Review

No research has yet been conducted on the expected length of the shortest path of the 3D TSP. Thus, the studies related to the expected length of the shortest path of the 2D TSP are reviewed.

Beardwood et al. [11] proposed that the expected length of the shortest path through N points in abounded plane region is always proportional to when N is relatively large. In two-dimensional space, this result can be used to estimate the expected length of the shortest path of the 2D TSP. The specific function is provided as Model (1) in Table 1. In the model, L is the expected length of the shortest path of the 2D TSP, A is the area of the service area, N is the number of points in the area, and α is the constant of proportionality, which is not related to the shape of the area. After that, many scholars have estimated the value of [1623].

Based on the study of Beardwood et al., Daganzo [12] considered the influence of shape on the expected length of the shortest route and proposed Model (2), as shown in Table 1. In the model, is an unknown constant, is the density of points (), l is the width of the rectangle, N is the number of points, and A is the area of the service area. In the same year, Daganzo [13] proposed Model (3) to estimate the expected length when there is a fleet of vehicles operating in an area with heterogeneous demand, as shown in Table 1. In the model, is an unknown constant, A is the area of the subregion, N is the number of points in the subregion, D is the average straight-line distance from each point in the subregion to the depot, and C is the maximum number of times that a car can stop.

Chien [14] presented seven functions for the length of the shortest path of TSP on the basis of the actual locations of customers and the depot and conducted a large number of Monte Carlo experiments to determine the values of related parameters. The results show that when the service area and randomly distributed points in the area are provided, Model (4) in Table 1 performs the best and can obtain a highly accurate approximation of the expected length. In this function, and are unknown constants, is the average straight-line distance between the depot (origin) and customers, is the smallest rectangular area that covers all the customers, and N is the size of TSP (number of customers plus 1).

Kwon [15] constructed the regression model and neural network model of the expected length of the shortest path for 2D TSP and compared the results with the model proposed by Beardwood et al., Daganzo, and Robuste et al. [24]. The results show that the model built by Kwon performed best. Different from the previous model, Kwon’s model also considers the effect of the aspect ratio S and of the rectangular service area on the expected length of the shortest path. The specific functions of the regression models established by Kwon are shown in Models (5) and (6) in Table 1. In Models (5) and (6), , , , and are unknown constants; N is the number of points in TSP (the sum of the number of depot and all the customers); S is the ratio of the length to the width of the service area; A is the smallest rectangular area that covers all the points; D is the average straight-line distance from all the customers to the depot. This study primarily refers to Model (5) in Table 1 (proposed by Kwon) to construct our models.

When studying the expected length of the shortest path of the 3D TSP, we need to first solve the given 3D TSP in order to obtain the shortest path. The solution algorithm of 3D-TSP is similar to that of the classical TSP. The only difference is the calculation of the distance matrix. Therefore, this paper adopts the solution algorithm of TSP to find the optimal solution for 3D-TSP. Many scholars have studied various algorithms to solve TSP. Solution algorithms for TSP can be divided into two types: exact algorithm and heuristic algorithm [19]. For small-scale problems, exact algorithms can be used to solve them. For large-scale or medium-scale problems, exact algorithms cannot generate the optimal solution in a reasonably short time, and thus heuristic algorithms are needed to find the approximate optimal solution. Some commonly used algorithms [2528] are shown in Table 2.

For the exact algorithm, the solver Cplex or Gurobiis is usually used when the number of demand points is less than or equal to 100. For heuristics algorithm, GA is a random parallel search algorithm based on natural selection and genetics. It is an efficient method to find the optimal solution without relying on any initial data. Due to its simplicity, GA is widely used to solve TSP. However, it has a certain dependence on the selection of the initial population. As a result, GA is often improved by combining it with other heuristic algorithms. For example, Li et al. [2] proposed to combine GA with other algorithms such as greedy, hill climbing (HC), and simulated annealing (SA) algorithm to improve GA to achieve better performance. Meng et al. [29] proposed a PBIL algorithm combining GA and competitive learning to explore better solutions. Later, Meng et al. [30] also proposed a variable neighborhood search (VNS) approach that utilizes a two-stage greedy initialization algorithm to generate an initial solution and employs a 2-opt method for local search. Xu et al. [31] proposed a biobjective variable neighborhood search (BVNS) approach based on the VNS proposed by Meng et al. Kumar [32] proposed to improve the GA with a greedy algorithm to improve the quality of the initial population. Among them, the combination of GA and the greedy algorithm has generated good results within a reasonable time. Therefore, this paper uses Gurobi and GA combined with the greedy algorithm to obtain the shortest path of the 3D-TSP in different scenarios.

A randomly generated geometric TSP instance [3] assumes each point is independently and randomly generated from the service area based on the uniform distribution. The travel cost between two points is the Euclidean distance between the corresponding points. This study implements relevant experimental designs and randomly generates TSP examples in 3D space. In addition, TSPLIB contains more than 100 TSP instances and 85,900 cities [33]. It provides a testbed for the proposed algorithm.

3. 3D-TSP and Improved GA Algorithm

3.1. 3D-TSP

Let us give a formal definition of the 3D-TSP. Suppose that there is a drone (or performer of action) that needs to visit each of the N demand points once and come back to the origin. It can be formulated over a complete graph , where vertex set ; and each edge in , is associated with a weight , which represents a visit cost (or distance) between two demand points i and j. The objective is to determine the route along which the drone departs from the origin, visits each point once, and returns to the origin with the lowest total travel cost.

The integer programming model of 3D-TSP is presented as follows [4]. Binary variable , if the drone passes through the edge (i, j); and otherwise, . The objective function of the problem is

Subject to the following constraints: first, the drone departs from the demand point numbered 0 (origin) and returns to the origin, that is,

Each demand point except 0 must be visited by the drone exactly once, that is:

For each demand point, introduce a decision variable . is the access order of demand point i. For example, in , it can be understood that point 1 is the fifth point visited from the starting point. M is a very large number. M should be an upper bound of . Some papers pointed out that the effect will be better if the tightest upper bound is taken, so we take M = N [25]. Construct Miller–Tucker–Zemlin (MTZ) constraint to eliminate subtour, that is:

It should be noted that since the classical TSP is NP-hard [1, 2] and 3D-TSP is an extension of the classical TSP in the 3D space, 3D-TSP is also NP-hard. When all the points of the 3D-TSP fall onto the same plane, 3D-TSP turns into the classical TSP.

3.2. Improved GA Algorithm

Due to the complexity of 3D-TSP and the relatively large number of demand points in this study, it is necessary to use heuristic algorithms to solve the 3D-TSPs. As mentioned in the literature review, GA has a good global search ability and is easy to be combined with other algorithms for improvement. Combining the GA with the greedy algorithm can shorten the time to obtain the solution. It is achieved by using the greedy algorithm to improve the initial population of the GA since a high-quality initial population can accelerate the evolution of the GA to quickly reach a satisfactory optimal solution [2, 32]. Therefore, this paper chooses to use the GA combined with the greedy algorithm to find the shortest path of 3D-TSP.

The initial parameters include the population size M, the number of chromosome genes N (that is, the number of demand points), the number of iterations C, the crossover probability Pc, and the mutation probability Pm. The specific steps to perform the algorithm are as follows.

Step 1. Generate the digital codes (chromosomes) of M feasible solutions of 3D-TSP through the greedy algorithm as the initial population (the feasible solution set).

Step 2. Select the inverse of the total length of the path as the fitness function, which indicates the shorter the distance is, the better the fitness function is. The fitness of each individual is calculated by the fitness function.

Step 3. Use the wheel selection mechanism for the selection operation, eliminate individuals with low fitness, and select individuals with high fitness.

Step 4. Randomly select two individuals and perform the partial matching crossover. The specific steps are as follows: first randomly generate two intersection points, determine the matching areas between the two points, and exchange the two matching areas. If the code repetition occurs outside the matching area, it will be replaced based on the corresponding position in the matching area.

Step 5. Randomly select an individual, extract two genes from the individual code, and exchange their positions. In this way, variation of individual code is achieved.

Step 6. The selection, crossover, and mutation of individuals generate a new population for the next generation. After a new population is generated, it is evaluated and then selected, crossed, and mutated. These steps are repeated until the maximum number of iterations or the termination condition of the algorithm is reached. Finally, an approximate optimal solution is found.
The flowchart of the improved GA is as follows (Figure 1).

3.3. Illustration

Figure 2 presents an example of the 3D-TSP. In this example, there are 10 demand points and the volume of the service area is 1000 with equal length, width, and height. The red line in the figure is the obtained shortest path. It can be seen from the figure that the length of the shortest path is 42.0285. The coordinates of each demand point are marked.

4. Experiments and Results

In this section, we first design the simulation experiment scenarios according to the previous literature [14, 15, 34], including the setting of the volume, shape, and the number of demand points of the service area. This section first compares the solution obtained by the GA combined with the greedy algorithm to the solution obtained by the solver Gurobi. The better solutions are selected for further analysis and modeling.

4.1. Settings of the Experimental Scenarios

The volume of the service area, the number of demand points, and the shape of the service area are three important factors that affect the expected length of the shortest path. We only consider the shape of the service area as a cuboid or cube. We assume that the longest side is length, the second longest side is width, and the shortest side is height. This study also assumes that the service area has three parameters: the volume of the service area (V), the ratio of length to width (S1, the ratio of the longest side to the second longest side), and the ratio of length to height (S2, the ratio of the longest side to the shortest side). When V is fixed, the shape of the service area will change with the change in S1 and S2. For example, when , which means that the length, width, and height are equal, the shape of the service area is a cube.

A total of 500 (5·10·10) experimental scenarios are designed in this study. The range of V includes 1000, 8000, 27,000, 64,000, and 125,000. Under the premise of keeping V constant, let S1 and S2 increase from 1 to 4 at an interval of 1, such that the shape of the service area is gradually elongated from a cube to a cuboid. After the service area is set, demand points are randomly generated in each of the 3D service areas. The number of demand points increases from 10 to 100 at intervals of 10. The demand generation process is repeated under each scenario.

4.2. Algorithm Comparison

The computer environment is as follows: Windows 10 (64 bit), CPU is AMD R9-5900HX processor and baseline speed is 3.30 GHz. First of all, in order to verify the effectiveness of the improved GA, this paper selects 4 cases from the TSPLIB database with the number of cities less than or equal to 100. The solution results are shown in Table 3. The test results show that the improved GA algorithm used in this paper generates results that are very close to the solution given by the TSPLIB. The TSPLIB official website states that any solution that is close to or better than the solution they provide can be considered a good solution [33], which verifies the effectiveness of the algorithm.

The improved GA and Gurobi are used to obtain the shortest path lengths of 3D-TSP. The computing time and quality of the two solutions are compared. We set the parameters of the improved GA as M = 200, C = 10,000, Pc = 0.7, Pm = 0.1. 500 simulation experiments were performed for each scenario, and the average value of the length of the shortest path of the 500 experiments was taken as the expected length of the shortest path in this scenario. To prove that 500 simulation experiments for each scenario are sufficient to accurately estimate the expected length of the shortest path, we plotted the relationship between the average length of the shortest path and the number of experiments (see Figure 3). As shown in the figure, the fluctuation is large when the number of simulations is less than 100. When the number of simulations is between 100 and 300, the fluctuation is smaller. When the number of simulations exceeds 300, the average value tends to be stable. To obtain accurate results, we set the number of simulation experiments to 500.

In each of the 50 scenarios with N = 10, we compute the difference between the lengths of the shortest paths obtained by the two methods and divide the difference by the length obtained by Gurobi to get a percentage. Figure 4 shows the distribution of the percentage. Figure 4 shows that 90% of the difference is lower than 1%, and the biggest difference is 1.37%, which is also very small. As a result, the quality of the solutions obtained by the two methods is not much different. At the same time, we found that the improved GA takes much less computing time than using Gurobi. Especially when the value of N is greater than 30, Gurobi cannot generate the results for the 500 scenarios in three days, while the improved GA can. As a result, this paper uses the solution obtained by the improved GA to further establish the estimation models.

4.3. Estimation Models of the Expected Length of the Shortest Path

We start by defining some variables that will be used in the estimation models. We then explore the relationship between these variables and the expected length of the shortest path and propose possible models for estimating the expected length. Finally, the goodness of fit of the proposed models is evaluated.

4.3.1. Variables for Building the Models

We obtain four variables (A, N, S, and L) from the previous literature [1115]. These studies assume that the service area is a rectangle, where A is the area of the service area, N is the number of demand points in the service area, S is the ratio of the length of the service area to its width, and L is the expected length of the shortest path in the service area. Because this study focuses on the 3D space, the number of variables is expanded to five (V, N, S1, S2, and L), where V is the volume of the service area, S1 is the ratio of the length to the width, and S2 is the ratio of the length to the height.

4.3.2. Analysis of the Expected Length of the Shortest Path

Before establishing the estimation models, it is necessary to analyze the relationship between the variables and L based on the experimental data. The results of the expected length of the shortest path for some of the 500 scenarios obtained using the improved GA are shown in Table 4. The demand points are randomly generated three-dimensional coordinates, as explained in Section 4.1. This table shows the change in the mean value of the shortest path length when different values of V, N, S1, and S2 are selected. As indicated in Table 4, the mean value of the shortest path length increases with an increase in V, N, S1, and S2. When V, S1, and S2 are fixed, the mean value of the shortest path length increases as N increases. When N, S1, and S2 are fixed, the mean value of the shortest path length increases as V increases. When V and N are fixed, the mean value of the shortest path length increases gradually as S1 and S2 increase.

We use the data when the shape of the service area is a cube as an example to demonstrate the relationship between the expected length of the shortest path and the volume of the service area and the number of demand points in the service area. We first present the relationship between the volume of the service area (V) and the expected length of the shortest path (L) under the different number of demand points (N) (Figure 5). As shown in Figure 5, when V does not change, L gradually increases as N increases. When N is constant, L gradually increases with an increase in V and there is a nonlinear relationship between the two variables. It can be seen from the figure that L and the cubic root of the volume of the service area () may have a linear relationship. To construct better models, we also explore the relationship between L and .

On the basis of the expected length of the shortest path model () proposed by Beardwood et al. [11], we speculate that the expected length of the shortest path in 3D-TSP may have a linear relationship with or . Therefore, we use the data when the shape of the service area is a cube as an example to visualize the relationship between the variables in the form of a scatterplot. The scatterplot of the expected length of the shortest path and is shown in Figure 6. The scatterplot of the expected length of the shortest path and is presented in Figure 7.

Figure 6 shows an evident linear relationship between the expected length of the shortest path (L) and . By contrast, the linear relationship between L and in Figure 7 is not that obvious.

In addition, this study calculates that when the volume of the service area is fixed to 1000, the ratio of L to and the ratio of L to under different shapes are provided in Tables 5 and 6. As indicated in Table 5, when the volume of the service area (V) and the number of demand points (N) are fixed, the ratio increases as the shape of the service area is elongated (refer to any row in Table 5). When the shape and volume of the service area are fixed, the ratio increases with the number of demand points (refer to any column in Table 5). When the volume of the service area is fixed, the difference between ratios becomes smaller as the number of demand points increases. The conclusions drawn from Table 6 are similar to those drawn from Table 5.

4.3.3. Model Results

From the analysis above, we could build the following models:where the is a linear combination of the independent variables.

This study establishes multiple linear regression equations based on the data obtained in the simulation experiments. Test and analyze the significance of the comprehensive linear effect of each independent variable on the dependent variable through RStudio, select the independent variable that only has a significant effect on the independent variable, and establish the regression equations. Based on correlation analysis, eight different models are considered and presented in Table 7. Models 1 and 2 are extensions of the models proposed by Beardwood [11]. Models 3–6 are extensions of the models proposed by Kwon [15]. In these models, , , , , and are unknown parameters.

After the regression models are determined, the experimental data is used to estimate the unknown parameters of the models, and RStudio is used to analyze the significance of the coefficients of the fitted regression equations. The parameter estimation method used in this study is the least square method. Their adjusted R2 and values of each model are recorded. The results are summarized in Table 8. From Table 8, we can see that the R2 of each model is very high, and the value is very small. That means all the models fit the data well. Model 1 has the highest R2.

However, R2 should not be the only evaluation measure of the model because the models could suffer from an overfitting problem. To deal with the possible overfitting problem, the k-fold cross-validation method is used [35]. The mean absolute error (MAE) and mean absolute percentage error (MAPE) of each model are calculated at the same time to compare the performance of these 6 models.

The k-fold cross-validation is a cross-validation method that can effectively compare models [35]. It first divides the dataset into k groups of mutually exclusive subsets of the same size. Then, it selects one subset each time as the testing set, leaving k−1 subsets as the training set. In this manner, k groups of training and testing sets are obtained. The operation is performed k times. The MAE and MAPE are used to evaluate the model performance. The average values of the measures are calculated for each model. The value of k in this study is set to 10. The results are provided in Table 9. The MAE and MAPE of the testing set are larger than those of the training set, and such findings are reasonable.

Among the 6 models, Model 1 and Model 2 are the simplest models. The MAE and MAPE of Model 1 are smaller than those of Model 2. Model 1 is selected as the representative of the simple models. Models 3 to 6 are more complex and fit the data better in terms of MAE and MAPE. Among them, Model 4 has the smallest MAE and MAPE, followed by Model 3. As a result, Model 4 is selected as the representative of the complex models.

5. Conclusions

In this paper, we intend to study the expected length of the shortest path of the 3D-TSP. By changing the volume, shape, and number of demand points within the service area, 500 experimental scenarios were designed. The improved GA and the solver Gurobi are used to obtain the shortest path of 3D-TSP in each scenario. The results of the two methods are compared and analyzed. Though the Gurobi could generate better results, it takes much longer computing time. Especially when the number of demand points is large, it becomes infeasible to obtain the results in an acceptable time. The shortest path obtained by the improved GA is very similar to that obtained by the Gurobi. As a result, the shortest path obtained by the improved GA is used in this study for further analysis and modeling. Regression models of the expected length of the shortest path are proposed based on data analysis and previous studies. The coefficients of models are fitted using RStudio. The k-fold cross-validation is applied to evaluate the model performance based on MAE and MAPE. All the models proposed in this study fit the training and testing datasets well. The results indicate that Model 1 can accurately estimate the expected length of the shortest path of the 3D-TSP and has a simple formulation. For a more complex model formulation, Model 4 could generate better prediction accuracy.

The results of this study can be used to estimate the expected length of the shortest path of 3D-TSP. Although this study uses linear regression to model the expected length of the shortest path of the 3D-TSP, machine learning methods, such as neural networks, are also worth exploring. The limitation of this study is as follows. Firstly, we obtained the solution using the improved GA. The solution may not be globally optimal. Better algorithms could be explored in the future to make sure the global optimal solution is obtained. Secondly, this study is based on experiments designed under ideal conditions. The applicability of the proposed models in practice should be further verified.

Data Availability

The data used to prove the validity of the algorithm of this study are available from https://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was funded by the National Natural Science Foundation of China (Grant nos. 71704145, 71831006, and 51774241), China Postdoctoral Science Foundation, and Sichuan Youth Science and Technology Innovation Research Team Project (Grant nos. 2019JDTD0002 and 2020JDTD0027).