Research Article  Open Access
Shushan Chai, Qinghuai Liang, "An Improved NSGAII Algorithm for Transit Network Design and Frequency Setting Problem", Journal of Advanced Transportation, vol. 2020, Article ID 2895320, 20 pages, 2020. https://doi.org/10.1155/2020/2895320
An Improved NSGAII Algorithm for Transit Network Design and Frequency Setting Problem
Abstract
The transit network design and frequency setting problem concerns the generation of transit routes together with their frequency schedules. A multiobjective transit network design model is developed that accounts for both the influence of transfers and the delay caused by congestion on passengers’ travel time. The model aims to minimize the travel time of passengers and the number of vehicles used in the network. Solving the model is an NP-hard problem that is intractable because of its high complexity and strict constraints. To obtain better network schemes, a multipopulation genetic algorithm based on the NSGA-II framework is proposed. The algorithm integrates network generation, mode choice, demand assignment, and frequency setting into a single solution procedure. Its effectiveness, namely its high global convergence and its applicability to the problem, is verified by comparison with previous works and by the calculation of a real-size case. The model and algorithm can provide candidate schemes for the sustainable policy formulation of urban transit networks.
1. Introduction
The urban transit planning process includes the following phases: network design, frequency setting, timetable development, and vehicle and driver scheduling [1, 2]. Network design with the corresponding frequency setting determines the configuration of urban transit facilities and the network service level, and thereby affects the development of the city and the evolution of urban space morphology. This paper focuses on the network design and frequency setting problem.
The cross-sectional passenger volume and the congestion of a line, which depend on the frequency setting, should be considered in network design. Furthermore, both the demand of users and the profit of managers should be considered at the same time; the problem is therefore a complicated multiobjective transit network design problem (MTNDP) [3, 4]. The characteristics of the related studies are listed in Table 1. In general, the problem is formulated by minimizing the travel cost [5–8], minimizing the construction and operation cost [7, 9], and maximizing passenger attraction or coverage [5, 6, 9, 10]. Studies of the problem can be classified into the design of an entire network and the design of routes based on an existing network [7]. At present, the multimode network integration method [11] and the multiphase integration method in a network mode [12, 13] are used. The effective estimation of the passenger flow state [14] and the congestion and capacity limitation of the transit network [15] have a great influence on network design. An optimal match between the distribution of passenger flow and the network structure leads to less travel time and higher transport efficiency.

Solving the MTNDP is usually regarded as intractable because of its high complexity, its conflicting objectives, and its strict constraints. The number of constraints and variables explodes as the network grows. With the rapid development of computer technology and operational research, mathematical computing software and heuristic algorithms have been applied in transit network planning [6, 9, 10, 16]. However, for a large-scale network, the nonlinear problem cannot be easily solved by conventional optimization software. As classical approximate methods, heuristic methods are often used, including constructive algorithms, local improvement algorithms, and combinations of them [17–19]. However, many traditional heuristic algorithms have poor global convergence, and the results sometimes depend largely on the initial solutions. If the lines are selected from a given line pool to form the network, the rationality of these lines depends largely on the quality of the line pool [20]. At present, metaheuristic methods, such as the genetic algorithm [8, 21, 22], the simulated annealing algorithm [13], the tabu search algorithm [23], the swarm intelligence algorithm [24], and other metaheuristics [25, 26], are usually applied to the transit network design problem.
However, network generation and frequency setting, which are interrelated, are treated separately in some methods. When exact solutions cannot be obtained, it is particularly important to use an algorithm with high global convergence ability to search for better network scenarios. In this paper, an effective algorithm based on an improved NSGA-II is put forward, in which network generation, mode choice, passenger demand assignment considering the passenger trip rule, and frequency setting are all integrated. The algorithm has high global convergence and good applicability to the problem, as verified by numerical experiments and a case application.
The structure of this paper is as follows: The mathematical model of MTNDP is presented in Section 2. Section 3 describes the multiobjective algorithm for the MTNDP, followed by numerical experiments in Section 4 and case application in Section 5. Section 6 presents the conclusions.
2. Mathematical Model
2.1. Notations
Let be the set of all nodes which represent the stations, be the set of edges which represent the sections in the transit network, represent a section between adjacent stations and , . The notations in the model are defined as follows.
First, variables associated with the connection among nodes in the network are defined as follows.
(i) if line contains the section , 0 otherwise.
(ii) if line contains the station , 0 otherwise.
(iii) if the network contains the section , 0 otherwise.
(iv) if line is included, 0 otherwise.
(v) is an effective path between starting and ending points.
(vi) is the set of effective paths, .
(vii) is the set of neighbour nodes of node .
(viii) is the route cycle time in line .
(ix) is the total number of stations in line .
(x) is the maximum cross-sectional passenger volume of line at peak hour.
(xi) is the average headway of vehicles in line .
Then, the variables associated with the changes of travel path are defined as follows.
(i) if the passenger flow corresponding to the OD pair uses the section of line along the path , 0 otherwise.
(ii) if the passenger flow corresponding to the OD pair transfers from line to line at station along the path , 0 otherwise.
(iii) is the total number of stations in path .
(iv) is the transfer times needed along the path .
(v) is the headway of vehicles at the starting point of path .
(vi) is the headway of vehicles taken by passengers travelling after the transfer in path .
(vii) is the transit passenger volume travelling along the path between the starting point and the ending point .
Finally, the known quantities in the model are defined as follows.
(i) is the number of lines in the transit network.
(ii) is the maximum limited number of lines in the network.
(iii) is the minimum limited number of lines in the network.
(iv) is the maximum limited number of stations in line .
(v) is the minimum limited number of stations in line .
(vi) is the maximum limited length of line .
(vii) is the minimum limited length of line .
(viii) is the maximum limited transfer times in a path.
(ix) is the number of seats of each vehicle in line .
(x) is the maximum limited load factor of the vehicles in line .
(xi) is the transit passenger demand between the starting point and the ending point .
(xii) is the capacity of section .
(xiii) is the length of section .
(xiv) is the shortest distance between the starting point and the ending point .
(xv) is the travel time of section .
(xvi) is the dwell time of vehicles at the non-transfer station in path .
(xvii) is the walking time for the transfer in path .
(xviii) is the minimum limited headway of vehicles.
2.2. Model Formulation
The generalised travel cost function on paths in the transit network is formulated to calculate the passenger travel time, including the waiting time at the starting point, the in-vehicle travel time, and the transfer time. The in-vehicle travel time includes the travel time in sections and the dwell time at intermediate stations. The transfer time includes the walking time and the waiting time at transfer stations. Assuming that all passengers arrive at the stations randomly and are able to board the first vehicle on the route they encounter, the average waiting time of passengers can be taken as half of the headway [27–29]. Let be the in-vehicle travel time and be the total transfer time; they can be expressed by Equations (1) and (2), respectively.
Passengers have different sensitivities to the transfer times in the same path. For the same travel time, the more transfers a path requires, the less likely passengers are to select it. In most previous works, such as [8, 25, 30–32], a fixed penalty coefficient was used to describe the influence of transfers on the travel time. In this paper, the sensitivity coefficient of transfer times and the adjustment parameter are introduced to adjust the transfer penalty, so the total transfer time can be expressed by
where, represents passenger’s sensitivity to the transfer during the travel. is the adjustment parameter, generally greater than 1. The larger the , the greater the difference for sensitivity levels between different transfer times in the path.
Excessive passengers waiting at station and excessive load factor of incoming vehicle will have a great impact on the passenger travel delay, which can cause an increase in travel time. In order to describe the impact of congestion on travel time while entering a station or transferring, the travel delay coefficient of the station is defined, as shown in
where, and are the crosssectional passenger flow and the capacity of next section of the station in path , respectively. and are the adjusting coefficients.
Let be the generalized travel cost of the path between the two points and , then the can be expressed by
The mode choice model is used to calculate the transit passenger demand between OD pair . Assume that any segment of the basic road network can be used by other modes, such as private car, bicycle, and walk. Since the travel cost is usually the most important influencing factor of mode choice, it is assumed that the mode choice probability depends exclusively on the generalised travel costs using different modes. Compared to the absolute utility, the passengers are more concerned about their relative utility. So the choice probability of the transit can be simply calculated by
where is the choice probability of public transport for passengers traveling between the OD pair , is the shortest time traveling between the OD pair using public transport, is the shortest travel time using the mode, is the average shortest travel time of all the modes used.
From the perspective of passengers, minimizing the travel time of passengers is regarded as one of the objectives, to improve the network service level and travel efficiency, as shown in
From the perspective of managers, the minimum number of vehicles used in the network is regarded as another objective to reduce operating costs and improve the economic benefit, which is shown in Equations (8)–(10).
where the headway is restricted to an integer for the convenience of operation and management, as shown in Equation (9), and the maximum cross-sectional passenger volume is calculated by Equation (10).
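As a rough illustration of the relationships described for Equations (8)–(10), the sketch below derives an integer headway from the peak cross-sectional volume and then the number of vehicles from the route cycle time. The equations themselves are not reproduced in this text, so the formulas, the function names, and the 60-minute peak hour are assumptions made for illustration only.

```python
import math


def integer_headway(max_section_volume, seats, max_load_factor, min_headway=1):
    """Largest integer headway (minutes) whose hourly capacity covers the peak volume.

    Assumed relation: vehicles per hour = volume / (seats * load factor),
    headway = 60 / vehicles per hour, rounded down to an integer.
    """
    if max_section_volume <= 0:
        raise ValueError("peak volume must be positive")
    capacity_per_vehicle = seats * max_load_factor
    headway = math.floor(60.0 * capacity_per_vehicle / max_section_volume)
    # Clamp to the minimum headway; if clamping is needed, capacity is insufficient
    # and a real implementation would treat the scheme as infeasible.
    return max(headway, min_headway)


def vehicles_required(cycle_time, headway):
    """Vehicles needed to keep the given headway over one route cycle (assumed ceil rule)."""
    return math.ceil(cycle_time / headway)
```

For example, a peak volume of 600 passengers per hour with 40 seats and a load factor of 1.25 gives a 5-minute headway, and a 48-minute cycle then needs 10 vehicles under these assumptions.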
Multiple constraints are included related to network topology relationship, connectivity between stations, limited transfer times, network size, etc., as shown in
The terms (11)–(15) are the network topology constraints, indicating the relationship between nodes and edges. The term (16) indicates that the flow corresponding to the OD pair can use the section of a line along the path only if the section belongs to the line . The term (17) indicates that line has no branch at point . The terms (18) and (19) indicate that line is continuous and loopfree, where is an arbitrary node set. Constraint (20) makes different lines have different nodes or connections. The term (21) indicates that the isolated nodes are not allowed in the network, where is a candidate node set. The term (22) is the network connectivity constraint, which indicates that if there is passenger demand between two stations, the two stations must be connected in the network. Constraints (23) and (24) limit the transfer times for a path. The term (25) indicates that the passenger flow corresponding to the OD pair only has the inflow at the starting point and the outflow at the ending point , and the inflow is equal to the outflow at the intermediate stations of the path . The term (26) indicates that the passenger flow entering the line is equal to that exiting the line . The term (27) indicates that reflow is not allowed in the path. The term (28) limits the capacity of sections. Constraint (29) limits the number of stations in line . Constraint (30) limits the length of line . Constraint (31) limits the number of lines in the network. Constraint (32) limits the minimum value of the integral headway.
3. Improved NSGA-II Algorithm for the MTNDP
The multiobjective network design model developed in this paper is NP-hard. NSGA-II [33], the nondominated sorting genetic algorithm with an elite strategy, is one of the most effective algorithms for solving multiobjective problems. Compared with NSGA [34], the algorithm reduces the complexity and is more robust. Three important concepts in NSGA-II are explained below. For a detailed description of the algorithm, please refer to Deb et al. [33].
(1) Fast nondominated sort
An individual is a nondominated solution if no other individual is at least as good in all objectives and strictly better in at least one. In the iterative process, all the individuals are assigned to several fronts according to their dominance relationships. The individuals in a front are dominated by those in the previous front, and each individual is assigned a rank value .
(2) Crowding distance
To evaluate how densely other individuals are distributed around a particular individual, the average distance between its two neighbours on either side is calculated; this is called the crowding distance of the individual. The distance of the two boundary individuals is set to infinity, and for the other individuals the distance is calculated by
where, is the crowding distance of the objective of the individual. and are the objective function values of the and individuals, respectively. and are the maximum and minimum values of the objective function of the individual.
(3) Crowded comparison
For two individuals and , if the ranks and are the same, the crowding distances and are compared. In the comparison, if or but , the individual is superior to the individual.
In this paper, a multipopulation genetic algorithm is proposed based on the framework of NSGA-II. Figure 1 presents the flowchart of the algorithm proposed in this paper for the MTNDP.
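The three concepts above can be sketched in a few lines of Python. This is a minimal illustration of the standard NSGA-II building blocks as described by Deb et al. [33], assuming all objectives are minimised; the function names and the tuple representation of objective vectors are choices made here, not the authors' implementation.

```python
import math


def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_nondominated_sort(objs):
    """Return the fronts as lists of indices into objs; front 0 is nondominated."""
    n = len(objs)
    dominated = [[] for _ in range(n)]  # dominated[i]: indices that i dominates
    counts = [0] * n                    # counts[i]: how many individuals dominate i
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objs[i], objs[j]):
                dominated[i].append(j)
            elif dominates(objs[j], objs[i]):
                counts[i] += 1
        if counts[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        next_front = []
        for i in fronts[k]:
            for j in dominated[i]:
                counts[j] -= 1
                if counts[j] == 0:
                    next_front.append(j)
        fronts.append(next_front)
        k += 1
    return fronts[:-1]  # drop the trailing empty front


def crowding_distance(objs, front):
    """Crowding distance of each index in one front (boundary points get infinity)."""
    dist = {i: 0.0 for i in front}
    for k in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][k])
        lo, hi = objs[ordered[0]][k], objs[ordered[-1]][k]
        dist[ordered[0]] = dist[ordered[-1]] = math.inf
        if hi == lo:
            continue  # all values equal: this objective adds nothing
        for p in range(1, len(ordered) - 1):
            gap = objs[ordered[p + 1]][k] - objs[ordered[p - 1]][k]
            dist[ordered[p]] += gap / (hi - lo)
    return dist


def crowded_better(rank_i, dist_i, rank_j, dist_j):
    """Crowded comparison: lower rank wins; on equal rank, larger distance wins."""
    return rank_i < rank_j or (rank_i == rank_j and dist_i > dist_j)
```

With two objectives such as user cost and fleet size, each network scheme would contribute one tuple to `objs`.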
3.1. Multi-Population Dynamic Coding
In the process of chromosome coding, not only the connections among stations but also the lines to which the stations belong should be considered. In order to store the topological information effectively and make coding of transit networks more convenient, a multipopulation dynamic coding method is proposed.
In this paper, the integer coding method is adopted, and the concepts of “main population” and “affiliated population” are proposed. The information about connections among stations is stored in the chromosomes of “main population”, and the information about lines to which the stations belong is stored in the corresponding chromosomes of “affiliated population”, as shown in Figure 2.
In Figure 2, the alleles on a chromosome in the “main population” represent the stations, while the alleles in its “affiliated population” represent the line labels; the stations that belong to the same line are connected sequentially. An individual, which is a characteristic entity of a chromosome, represents a network scheme. If a station appears multiple times in a chromosome of the “main population”, it represents a transfer station. The chromosomes in the same population have different lengths because networks differ in the number of transfer stations. During the iterations, the chromosomes in the “main population” and its “affiliated population”, whose lengths change dynamically, keep a one-to-one correspondence.
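The pairing of a “main” chromosome with its “affiliated” chromosome can be illustrated with two parallel integer lists. The sketch below is one possible reading of the coding scheme, with hypothetical function names and made-up station numbers; it decodes a pair into lines, re-encodes lines into a pair, and finds transfer stations as the stations that occur more than once.

```python
from collections import Counter


def decode(main, affiliated):
    """Group stations by line label, keeping the station order within each line."""
    if len(main) != len(affiliated):
        raise ValueError("main and affiliated chromosomes must have equal length")
    lines = {}
    for station, label in zip(main, affiliated):
        lines.setdefault(label, []).append(station)
    return lines


def encode(lines):
    """Flatten {label: [stations]} back into the two parallel chromosomes."""
    main, affiliated = [], []
    for label, stations in lines.items():
        main.extend(stations)
        affiliated.extend([label] * len(stations))
    return main, affiliated


def transfer_stations(main):
    """Stations that occur more than once in the main chromosome."""
    return {station for station, count in Counter(main).items() if count > 1}
```

Because every extra transfer station adds one allele to both lists, the two chromosomes grow and shrink together, which is the one-to-one correspondence described above.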
3.2. Generation of Initial Populations
For a large network, it is difficult to generate a large number of initial solutions with high diversity under strict constraints. At present, the main initial solution generation procedures can be classified as follows.
Random connection method of adjacent nodes. In this method, the routes are generated by connecting adjacent nodes, as in the probability-based IRSG procedure proposed by Jha et al. [26]. The method usually has high calculation efficiency, but cannot guarantee that the travel time between OD pairs will not be too long along the routes.
Shortest-path-based method. In this method, the routes are generated by searching the shortest paths between OD pairs, and another procedure dealing with constraints is usually needed. Examples include the construction and repair procedures used by Ahmed et al. [35] and Szeto and Jiang [36], the initial solution generation procedure based on a greedy algorithm used by Nikolić and Teodorović [24] and Nayeem et al. [37], and Floyd’s algorithm with a feasibility check used by Chew et al. [22]. With this method, the direct demand between departure and destination stations can be met, but more vehicles will be needed on the network [38].
Multipath-based method. In this method, the routes are generated based on effective paths, as in the RGA and RFA procedures for the initial solution skeleton proposed by Mahdavi Moghaddam et al. [38]. With this method, the initial solutions can be generated from the perspectives of both passengers and managers, but the number of effective paths to be searched between OD pairs has a significant impact on the calculation time of the procedure.
In the existing procedures, isolated nodes (i.e., nodes that are in the connected basic road network but not served by any route) cannot be effectively avoided, and the characteristics of transit network structure are not well reflected in the initial solutions. One such characteristic is the “rich club” phenomenon, in which nodes with larger degree values (“rich nodes”) tend to be connected to each other [39]. In a transit network, the endpoints of lines have the smallest average degree value while the transfer stations have the largest. Therefore, in order to generate initial schemes with actual network characteristics, an initial population generation method is proposed in this paper.
(1) Generation of the endpoint set of lines
A node with a smaller degree is more likely to be selected as an endpoint, which represents the departure or terminal station of a line. In this paper, the line endpoints are randomly selected by the roulette method, and the selection probability can be calculated by
where, is the probability that node is selected as the endpoint of a line, satisfying , is the degree value of candidate node .
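A roulette selection that favours low-degree nodes can be sketched as follows. The probability formula is not reproduced in this text, so the inverse-degree weighting used here is an assumption that merely satisfies the stated property (smaller degree, larger probability; probabilities sum to one). The function names are hypothetical.

```python
import random


def endpoint_probabilities(degree):
    """Map node -> selection probability, assumed proportional to 1 / degree."""
    weights = {node: 1.0 / d for node, d in degree.items()}
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}


def roulette(probabilities, rng=random):
    """Pick one key by roulette-wheel selection over the given probabilities."""
    items = list(probabilities.items())
    if not items:
        raise ValueError("nothing to select from")
    r = rng.random()
    cumulative = 0.0
    for node, p in items:
        cumulative += p
        if r < cumulative:
            return node
    return items[-1][0]  # guard against floating-point rounding
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible.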
(2) Generation of lines
The remaining candidate nodes are randomly selected as intermediate stations of a line after the endpoints are selected. First, a shortest path search in the non-weighted road network is used to get the top paths between the two endpoints of a line. All the top paths should meet the constraint on the number of nodes, which can be expressed by the distance between the two endpoints of each path, as shown in
where is the distance between endpoints and in the non-weighted network.
Then, a path is selected from the paths as the first line according to a certain probability. The larger the average degree value of the intermediate nodes in the path, the greater the probability that the path is selected. The roulette method is also used and the probability can be expressed by
where, is the probability that the candidate path is selected, satisfying and , is the average degree value of intermediate nodes in the path .
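The path selection step can be sketched in the same way. The exact probability expression is not reproduced in this text; the sketch assumes the probability of a candidate path is proportional to the average degree of its intermediate nodes, which matches the stated monotonic relationship. Names and data are illustrative.

```python
import random


def path_probabilities(paths, degree):
    """Selection probability per path, assumed proportional to the mean degree
    of its intermediate nodes (endpoints excluded)."""
    averages = []
    for path in paths:
        inner = path[1:-1]
        averages.append(sum(degree[n] for n in inner) / len(inner) if inner else 0.0)
    total = sum(averages)
    if total == 0:
        return [1.0 / len(paths)] * len(paths)  # fall back to a uniform choice
    return [a / total for a in averages]


def select_path(paths, degree, rng=random):
    """Roulette-wheel choice of one candidate path."""
    return rng.choices(paths, weights=path_probabilities(paths, degree), k=1)[0]
```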
After the first line is generated, the paths between the two endpoints of the next line are searched with the same method to constitute the set . The paths in that contain isolated nodes of the road network are stored in the new set as the candidate paths of the next line. Thus, as more lines are generated, the numbers of candidate paths and isolated nodes decrease gradually.
Finally, if isolated nodes are not allowed and still exist in the generated network, they will be randomly inserted into the generated lines as intermediate nodes. In order to maintain the diversity of species, the repetitive chromosomes in the population are removed to ensure their uniqueness. The pseudo code for the initial population generation algorithm is shown in Algorithm 1.

3.3. Demand Assignment
For comparison, the assumption about path selection is the same as that of Arbex and Cunha [40], that is, passengers only select paths with no more than two transfers. A passenger first selects a direct path to avoid transfers. If no direct path is available, only a single-transfer path will be selected. If there is no single-transfer path, a two-transfer path is selected. Finally, if there is no two-transfer path, the passengers’ travel demand is considered not to be covered.
The multipath incremental assignment method is used in this paper. The OD passenger flow is divided equally into multiple parts, each represented by an OD matrix, and these are assigned to the network successively. Before an OD matrix is assigned, the passengers’ selection of paths and lines is calculated according to the assignment result of the previous OD matrix. In the initial state, there are no passengers on any line and the waiting time is set to an initial nonzero value. If there are multiple paths, the relative utility of the path is used to calculate the selection probability , as shown in
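The transfer hierarchy described above amounts to finding the smallest number of transfers, up to two, that connects an OD pair. A minimal sketch, assuming each line is given as a list of stations and can be travelled in both directions (the function name and the breadth-first search over lines are illustrative choices):

```python
def min_transfers(lines, origin, destination, max_transfers=2):
    """Smallest number of transfers needed, or None if the demand is not covered."""
    station_sets = [set(line) for line in lines]
    current = {i for i, s in enumerate(station_sets) if origin in s}
    targets = {i for i, s in enumerate(station_sets) if destination in s}
    visited = set(current)
    for transfers in range(max_transfers + 1):
        if current & targets:
            return transfers
        # Lines reachable with one more transfer: unvisited lines sharing a station.
        reachable = {
            j
            for j in range(len(station_sets))
            if j not in visited and any(station_sets[i] & station_sets[j] for i in current)
        }
        visited |= reachable
        current = reachable
    return None
```

A result of `0` corresponds to a direct trip, `1` and `2` to single-transfer and two-transfer trips, and `None` to uncovered demand.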
where, is the calibration parameter, is the average travel cost for the effective paths between the OD pair .
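To make the incremental procedure concrete, the sketch below splits one OD demand into equal increments and, before each increment, recomputes path choice probabilities from the current path costs. The probability expression is not reproduced in this text; a logit form on cost relative to the average path cost is assumed here, and `theta`, `cost_fn`, and the function names are hypothetical.

```python
import math


def path_choice_probabilities(costs, theta=3.0):
    """Assumed relative-utility logit: p_k proportional to exp(-theta * c_k / mean cost).

    Costs are assumed to be positive.
    """
    mean_cost = sum(costs) / len(costs)
    weights = [math.exp(-theta * c / mean_cost) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


def incremental_assignment(demand, n_increments, cost_fn, n_paths, theta=3.0):
    """Assign demand to n_paths in equal increments.

    cost_fn(flows) returns the current cost of each path given the flows so far,
    so congestion from earlier increments influences later choices.
    """
    flows = [0.0] * n_paths
    share = demand / n_increments
    for _ in range(n_increments):
        probabilities = path_choice_probabilities(cost_fn(flows), theta)
        for k, p in enumerate(probabilities):
            flows[k] += share * p
    return flows
```

In a full implementation, `cost_fn` would evaluate the generalised travel cost, including waiting time and the congestion delay, for every effective path.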
3.4. Two Types of Crossover
According to the characteristics of the transit network, the crossover operators of chromosomes can be classified into two types: interchromosome crossover and intrachromosome crossover.
(1) Interchromosome crossover
Interchromosome crossover is the replacement and recombination of nodes and connections between different parent individuals. According to the number of changed lines in a chromosome, it can be divided into singleline crossover and multiline crossover, as shown in Figures 3 and 4. Taking the singleline crossover between two parents as an example, the steps are as follows.
Step 1. If there are common endpoints between lines belonging to the two parents, the common endpoints are selected to constitute the set , and the two lines are selected to constitute the set and respectively, then turn to Step 2. Otherwise, turn to Step 5.
Step 2. If the lines and have common intermediate nodes, the intermediate nodes are selected to constitute the set , and turn to Step 3. Otherwise, turn to Step 4.
Step 3. Randomly select two lines and which have the endpoint and intermediate node . All the nodes from to of the two lines exchange with each other, and update the corresponding chromosomal alleles in the “affiliated population”. If the offspring individuals meet the other constraints, the crossover ends. Otherwise, turn to Step 4.
Step 4. The two lines which have common endpoints are exchanged as a whole with each other between the two parents, and update the corresponding chromosomal alleles in the “affiliated population” (as shown in Figure 5). If the offspring individuals meet the other constraints, the crossover ends. Otherwise, turn to Step 5.
Step 5. Randomly select a line from each parent and exchange them as a whole, then update the corresponding chromosomal alleles in the “affiliated population” (as shown in Figure 6).
By executing the interchromosome crossover, not only the network size which includes the total number of stations and the total mileage of the network, but also the line characteristics which include the starting and ending points, the alignment, the mileage, the number of stations of each line and the transfer relationship between lines can be adjusted to get different network schemes.
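Two of the exchanges in the single-line crossover can be sketched on plain station lists: the partial exchange of Step 3 and the whole-line exchange of Steps 4 and 5. This is an illustrative reading with hypothetical function names; constraint checks other than loop-freeness and the update of the “affiliated population” are left out.

```python
def exchange_segment(line_a, line_b, endpoint, node):
    """Step 3 sketch: swap the nodes from the common endpoint up to the common
    intermediate node between two lines taken from different parents."""
    a = line_a if line_a[0] == endpoint else line_a[::-1]
    b = line_b if line_b[0] == endpoint else line_b[::-1]
    if a[0] != endpoint or b[0] != endpoint:
        raise ValueError("endpoint must be a terminal of both lines")
    ia, ib = a.index(node), b.index(node)
    return b[:ib] + a[ia:], a[:ia] + b[ib:]


def exchange_whole_lines(parent_a, parent_b, index_a, index_b):
    """Steps 4-5 sketch: swap one complete line between two parents."""
    child_a = [list(line) for line in parent_a]
    child_b = [list(line) for line in parent_b]
    child_a[index_a], child_b[index_b] = child_b[index_b], child_a[index_a]
    return child_a, child_b


def is_loop_free(line):
    """A line is loop-free if no station is visited twice."""
    return len(set(line)) == len(line)
```

An offspring line that fails `is_loop_free`, or any other constraint, would send the procedure on to the next step, as described above.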
(2) Intrachromosome crossover
Intrachromosome crossover is the replacement and recombination of nodes and connections between lines in the same parent individual, as shown in Figure 7. It cannot change the network size, only the line characteristics. The steps of the intrachromosome crossover in a parent are as follows:
Step 1. Search for the transfer nodes in the parent to constitute the set .
Step 2. Randomly select a node , and search for the lines with node to constitute the set .
Step 3. Randomly select two lines . All the nodes from to an endpoint of the two lines exchange with each other, and update the corresponding chromosomal alleles in the “affiliated population”.
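A minimal sketch of Step 3, assuming the two lines are station lists that share the chosen transfer node and that the segments from that node to the last station are exchanged (the function name is hypothetical):

```python
def swap_tails(line_a, line_b, node):
    """Exchange the segments from the shared transfer node to the end of each line."""
    ia, ib = line_a.index(node), line_b.index(node)
    return line_a[:ia] + line_b[ib:], line_b[:ib] + line_a[ia:]
```

Since the stations are only redistributed between the two lines, the total number of station occurrences in the network is unchanged, which is consistent with the remark that this operator does not alter the network size.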
3.5. Adaptive Mutation
Each iteration can result in Pareto solutions and may contain only partial solutions of the original problem. The nondominated solutions may be dominated by new solutions generated after successive iterations. In this paper, if new nondominated solutions cannot be generated by multiple iterations, the mutation operator will be triggered to jump out of the local optimal solutions, as shown in Figure 8. The steps for mutation of a parent are as follows.
Step 1. Randomly select a line from the parent and three adjacent nodes , , and in order from the selected line.
Step 2. Search for the nodes that can be connected to the nodes and to constitute the set .
Step 3. Randomly select a node to replace the node . The corresponding chromosome in the “affiliated population” remains unchanged.
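The mutation steps can be sketched as follows, assuming the road network is given as an adjacency dictionary of neighbour sets and that the replacement node must be adjacent to both outer nodes of the selected triple and must not already be on the line. These assumptions and the function name are illustrative.

```python
import random


def mutate_line(line, adjacency, rng=random):
    """Replace the middle node of a random triple of adjacent nodes on the line.

    Returns an unchanged copy if the line is too short or no candidate exists.
    """
    if len(line) < 3:
        return list(line)
    i = rng.randrange(1, len(line) - 1)
    before, after = line[i - 1], line[i + 1]
    # Candidates: nodes adjacent to both neighbours and not already on the line.
    candidates = sorted((adjacency[before] & adjacency[after]) - set(line))
    if not candidates:
        return list(line)
    mutated = list(line)
    mutated[i] = rng.choice(candidates)
    return mutated
```

The line keeps its length, so the corresponding chromosome in the “affiliated population” needs no change, as noted in Step 3.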
3.6. Dealing with Strict Constraints
Offspring population is generated by selection, crossover and mutation of parents. It is necessary to ensure the diversity of the offspring. However, the generated offspring may not satisfy the strict constraints after genetic operation. Therefore, an auxiliary algorithm is designed to deal with the strict constraints.
The genetic operators are executed according to certain probabilities. In this paper, intrachromosome crossover is executed when interchromosome crossover finishes without eligible offspring, and the mutation operator is executed when both types of crossover finish without generating any eligible offspring. If the number of offspring does not meet the requirement, the genetic operators are executed cyclically and the new eligible offspring are merged until the required size is reached. The pseudo code for the auxiliary algorithm is shown in Algorithm 2.
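Algorithm 2 is not reproduced in this text, so the following is only one possible reading of the cascade described above: try interchromosome crossover, fall back to intrachromosome crossover, then to mutation, and repeat until enough eligible offspring exist. The operator callbacks, the `feasible` check, and the `max_tries` safeguard are hypothetical placeholders.

```python
import random


def generate_offspring(parents, size, inter, intra, mutate, feasible,
                       p_inter=0.9, p_intra=0.9, p_mut=0.1,
                       rng=random, max_tries=10000):
    """Collect up to `size` distinct feasible offspring using a fallback cascade."""
    offspring = []
    tries = 0
    while len(offspring) < size and tries < max_tries:
        tries += 1
        p1, p2 = rng.sample(parents, 2)
        candidates = []
        if rng.random() < p_inter:
            candidates = [c for c in inter(p1, p2) if feasible(c)]
        # Fall back to intrachromosome crossover when no eligible offspring exist.
        if not candidates and rng.random() < p_intra:
            candidates = [c for c in (intra(p1), intra(p2)) if feasible(c)]
        # Fall back to mutation when both crossover types failed.
        if not candidates and rng.random() < p_mut:
            candidates = [c for c in (mutate(p1), mutate(p2)) if feasible(c)]
        for c in candidates:
            if c not in offspring and len(offspring) < size:
                offspring.append(c)
    return offspring
```

The default probabilities mirror the values 0.9, 0.9, and 0.1 reported in Section 4.1.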

4. Numerical Experiments
4.1. Parameter Values
The road network of Switzerland given by Mandl in [32] is used as the basic road network, as shown in Figure 9. In the road network, each node represents a city and each edge represents a road segment between two cities. The number on each edge represents the travel time between the two cities. The OD matrix of the network is shown in Table 2.

For comparison, the number of seats in each bus, the maximum limited load factor, and the minimum number of stations allowed in a line are the same as those used by Arbex and Cunha [40]. In this calculation, because the passenger demand is static, the mode choice model calculated by Equation (6) is not included. Since the headway of vehicles is not constrained to an integer in the works used for comparison, Equation (9) is calculated without rounding down, and constraint (32) is not considered in this calculation. The parameter values in the model are shown in Table 3. The probabilities of interchromosome crossover, intrachromosome crossover, and mutation are set to 0.9, 0.9, and 0.1, respectively. The maximum number of iterations is set to 1000. Table 4 presents the transfer penalty coefficients under different sensitivities.


4.2. Results and Discussion
Figure 10 presents the comparison among buses, user cost, and direct trips of the networks for the scenarios with 4 lines under different transfer sensitivities . The results show that the larger the , the smaller the passengers’ willingness to transfer in general, and the higher the user cost under the same direct rate. The solutions with better objective values can be obtained when under the given transfer sensitivities, but the direct rates of the solutions are obviously lower. Taking into account the objective values and the direct rates at the same time, the smaller number of buses or the lower user cost can be obtained when under a certain direct rate. Taking the four solutions including the solution with , the solution with , the solution with and the solution with (as shown in Figure 10) as examples, the network scheme corresponding to the solution requires the least number of buses and the lowest user cost, but the direct rate which is 90.95% has the least advantage. There is little difference for the direct rate among the solutions , , and , but the solution requires 3 more buses than the solution , and more cost than the solution .
Therefore, the scenarios with different line numbers are calculated when . Figures 11–14 present the comparison between the optimal Pareto solutions obtained by the algorithm and the results in the previous literature. As can be seen, the solutions obtained by the algorithm dominate the results reported in the previous literature, which indicates that better solutions can be obtained by the algorithm. The objectives obtained by Arbex and Cunha [40] are the closest to the solutions obtained by the algorithm, but as the transfer sensitivity increases, the advantage of the algorithm in user cost and direct rate gradually emerges.
The solutions which dominate the solutions in the previous literature are selected, as shown in Table 5. The following indicators are commonly used in the literature to test the quality of results: : Percentage of demand satisfied without any transfer. : Percentage of demand satisfied with one transfer. : Percentage of demand satisfied with two transfers. : Percentage of demand unsatisfied. : Average in-vehicle travel time in minutes per transit user, . : Average user cost in minutes per transit user , comprising travel time and transfer penalties.

As can be observed from Table 5, the selected nondominated solutions obtained by the algorithm require fewer buses and have a lower user cost and a lower transfer rate. Therefore, the algorithm has high convergence in solving the MTNDP.
5. Case Application with Real-Size Network
The model and the algorithm are applied to a real-size bus network in Baotou city, Inner Mongolia. The predicted long-term population density and the trunk roads in the city centre area are shown in Figure 15. In the centre area, 10 bus lines will be planned on the trunk roads, and the locations of 50 stops on the trunk roads are determined. In this case, the mode choice among bus, private car, bicycle, and walking is included. It should be mentioned that as the travel distance increases, the probability of walking decreases the fastest until it drops to zero. The parameter values used in the case are shown in Table 6.
