Abstract

Accurate prediction of taxi-out time is a significant precondition for improving the operational efficiency of the departure process at an airport, as well as for reducing long taxi-out times, congestion, and excessive emission of greenhouse gases. Unfortunately, several of the traditional methods of predicting taxi-out time perform unsatisfactorily at congested airports. This paper describes and tests three of those conventional methods, namely, the Generalized Linear Model, the Softmax Regression Model, and the Artificial Neural Network, and two improved Support Vector Regression (SVR) approaches based on swarm intelligence algorithm optimization, namely, Particle Swarm Optimization (PSO) and the Firefly Algorithm. In order to improve the global searching ability of the Firefly Algorithm, an adaptive step factor and Lévy flight are implemented simultaneously when updating the location function. Six factors are analysed, of which delay is identified as one significant factor at congested airports. Through a series of specific dynamic analyses, a case study of Beijing International Airport (PEK) is tested with historical data. The performance measures show that the two proposed SVR approaches, especially the Improved Firefly Algorithm (IFA) optimization-based SVR method, not only achieve the best modelling measures and accuracy rates compared with the representative forecast models, but also achieve better predictive performance when dealing with abnormal taxi-out time states.

1. Introduction

The size of fleets at airports is becoming ever larger because of the continuous increase over the past few decades in the demand for air transportation. Consequently, efficiency levels are dropping as managers face more operational eventualities, and airlines have to accommodate higher fuel costs and mounting numbers of customer complaints. In 2016, the passenger traffic volume reached 487.96 million, an increase of 10.7% over the previous year. The current average on-time rate of flights in China is 76.76%, and the average delay time is 16 minutes, which is 5 minutes less than in 2015. The total number of complaints received (all recorded by airlines and airports) in 2016 showed an increase of 84% on the 2015 total [1]. In China, an aircraft is considered to be “on-time” if it takes off (lands) within 15 minutes before or after the scheduled departure (landing) time. The delay time here refers to departure flights and is defined as the difference between the actual departure time and the planned departure time.

Flight delays have a dramatic impact on the movement of taxiing aircraft between gates and runways. Taxi-out time is defined as the time between the actual pushback and wheels-off. Taxi-out time is difficult to predict at hub airports during peak hours. Consequently, very long taxiing times and airport surface congestion occur. Aircraft that taxi for a long time may disrupt the pushback and take-off slots, which not only destroys the balance of the arrival and departure process, but also increases fuel consumption and emissions. Moreover, the workload of controllers increases. The delay is cumulative, but it is both stochastic and controllable in the taxi process. The stochastic characteristic is reflected in uncertain events, such as shifts in the weather, the interaction of departing and arriving aircraft during surface movement, and the human factor. Controllable behaviours such as delays can be adjusted by altering routes and taxiing speed and even by holding at the gate [2].

Better prediction of taxi-out time allows all stakeholders to arrange future activities in airport operation. Efficient taxi-out prediction methods are effective when the aim is to eliminate delays and improve the utilization of resources. Once taxi-out time is predicted in advance, operators gain a flexibility that allows them to adjust the schedule, gate assignment, and pushback plan. This achieves smoother operation of an airport and reduces its surface congestion and fuel-burn costs. The aim of this research is to develop approaches that are more accurate predictors of the taxi-out time of departing aircraft. In this paper, we introduce two methods of predicting taxi-out time, both of which arose from an analysis of the factors extracted from the Aviation System Performance (ASP) data of Beijing International Airport. The proposed models build on soft-computing approaches to predicting taxi-out time: Support Vector Regression based on the Particle Swarm Optimization algorithm and on the Improved Firefly Algorithm. These two intelligent algorithms can search for the optimal parameters of SVR so that the taxi-out time is predicted effectively.

The organization of this paper is as follows. The Literature Review offers a brief overview of previous attempts to analyse taxi-out-time behaviours in the airport departure process and of several prediction methods. This is followed by a description of the research methodology, which includes three traditional prediction methods and two newly proposed, improved swarm intelligence algorithm-based approaches to predicting taxi-out time. The layout of PEK airport is illustrated, along with historical data, and both are used for analysing airport dynamics and traffic situations in the taxiing process. Results obtained from the PEK data and findings are then discussed. The conclusion summarizes the benefits that accrue from these findings, and their implications.

2. Literature Review

Several efforts have been made to address the prediction of taxi-out times. Those efforts have included both historical data-based predictions and queuing-based approaches that consider causal factors. Shumsky deemed aircraft flow and departure demand to be causal factors and used dynamic linear models to predict taxi-out time. He compared static and dynamic linear models and found the dynamic linear model better for predicting taxi-out time in a short time window [3]. Pujet modelled the departure system as queuing servers and derived a stochastic distribution for the taxi-out time. His model captured the details of the departure process to estimate taxi-out time [4]. Idris et al. analysed a number of factors that affect taxi-out time by using the Airline Service Quality Performance (ASQP) data. These factors included the runway configuration, the airline/terminal, the downstream restrictions, and the take-off queue size [5–7].

These researchers developed a queuing model for predicting taxi-out time and drew the conclusion that take-off queue size correlates best with taxi-out time, especially when the queue that each aircraft experiences is measured as the number of take-offs between its pushback time and its take-off time. Carr et al. proposed a simulation-based study of queuing dynamics and traffic rules. They predicted taxi-out time by considering aggregate metrics such as airport throughput and departure congestion [8]. Simaiakis and Balakrishnan proposed a taxi-out time prediction model built on an analytical model of the aircraft departure process, which included an estimate of the distributions of unimpeded taxi-out time and the development of a queuing model of the departure runway system [9, 10].

Several statistical approaches and machine-learning methods have also been applied to the prediction of aircraft taxiing time. Srivastava used high-resolution position updates from the ASDE-X surveillance system of JFK to develop a taxi-out prediction model based on the existing surface traffic conditions and short-term traffic trends [11]. Hebert and Dietz developed a multistage Markov process model of the departure process at LaGuardia airport, based on five days of data, to predict taxi-out time [12]. Balakrishna et al. proposed reinforcement learning algorithms, which could adapt to the stochastic nature of departure operations, to predict average airport taxi-out time trends approximately 30–60 minutes in advance of the given time of day [2, 13]. Ravizza et al. built a combined statistical and ground movement model and used multiple linear regression to find the function that would predict taxiing times more accurately [14]. They also used the same explanatory variables for different approaches, which included multiple linear regression, least median squared linear regression, Support Vector Regression, M5 model trees, Mamdani fuzzy rule-based systems, and TSK fuzzy rule-based systems, to predict taxi-out times and then compared these approaches [15]. Lee et al. used both fast-time simulation and machine-learning techniques to predict taxi-out time and found the prediction method of Support Vector Regression to be better than the linear regression method and the Dead Reckoning method [16].

Unfortunately, the state-of-the-art methods were tested at airports whose conditions limit the generalizability of the findings. These airports have exceptionally favourable taxiing conditions, and their response to clearance and delays is quick. For airports that are large in every respect, these methods are somewhat inadequate, or they do not take some necessary factors into consideration.

3. Taxi-Out Time Prediction Techniques

Several predictive approaches are available, such as Artificial Neural Networks (ANN) [17], Kalman Filtering models [18], Softmax Regression (SR) [19], and Support Vector Regression (SVR) [20]. Among them, methods with reasonable accuracy are essential for estimating taxi-out time at departure.

3.1. Generalized Linear Model

The Generalized Linear Model (GLM), formulated by Nelder and Wedderburn [21], is a flexible generalization of ordinary linear regression that allows for response variables with error-distribution models other than the normal distribution. GLM relates the linear model to the response variables through a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value. The relationship between predicted value and independent variable is defined in

$$E(Y) = g^{-1}(X\beta), \qquad Y \sim F,$$

where $Y$ is the dependent variable, $g$ is the link function, $X$ is a set of independent variables, $\beta$ represents the slope coefficients, and $F$ is the distribution of $Y$. The procedures of GLM for predicting taxi-out time are as follows.

Step 1. Input the training dataset of historical taxi-out time and the corresponding factors; check the distribution of $Y$.

Step 2. Choose the link function $g$ according to the distribution of $Y$.

Step 3. Build the regression model between $Y$ and $X$, calculate the estimated value of the regression parameters $\hat{\beta}$, and implement the significance test.

Step 4. Predict the taxi-out time by using the factors in the test dataset.
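
As a minimal sketch of Steps 3 and 4, the Python fragment below fits the simplest special case (Gaussian response, identity link, one predictor) in closed form and then predicts through an inverse link. The function names, the single-predictor restriction, and the toy numbers are assumptions made for illustration; they are not the model or data used in this study.

```python
import math

def fit_identity_glm(xs, ys):
    """Least-squares fit of y = b0 + b1 * x (Gaussian family, identity link)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def predict(beta, x, inverse_link=lambda eta: eta):
    """Apply the inverse link g^{-1} to the linear predictor b0 + b1 * x."""
    b0, b1 = beta
    return inverse_link(b0 + b1 * x)

# Hypothetical toy data: queue length (x) against taxi-out minutes (y).
beta = fit_identity_glm([0, 1, 2, 3], [5, 7, 9, 11])
identity_prediction = predict(beta, 4)
# With a log link, the inverse link is the exponential function.
log_link_prediction = predict((0.0, 1.0), 0.0, inverse_link=math.exp)
```

Swapping `inverse_link` is the only change needed at prediction time when a different link function is chosen in Step 2; the fitting step would then need an iterative estimator rather than this closed form.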

3.2. Softmax Regression Model

The Softmax Regression (SR) is a generalization of logistic regression capable of handling multiclass problems, that is, admitting more than two possible discrete outcomes [19]. The algorithm includes a training phase for estimating the regressors and a testing phase for extracting the probability of each feature vector, from which the class labels are inferred. Afterwards, the SR selects the value of classified members by calculating the probabilities of

$$P\left(y^{(i)} = j \mid x^{(i)}; \theta\right) = \frac{e^{\theta_j^{T} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{T} x^{(i)}}},$$

and the model parameters are trained to minimize the cost function:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} 1\left\{y^{(i)} = j\right\} \log \frac{e^{\theta_j^{T} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{T} x^{(i)}}},$$

where $k$ is the number of classes, $m$ is the number of training samples, $\theta_1, \ldots, \theta_k$ are the parameters of the SR model, $\theta$ is a $k$-by-$n$ matrix stacking the $\theta_j^{T}$ (with $n$ the length of each feature vector), and $1\{\cdot\}$ is an indicator function. SR predicts the taxi-out time with the following procedures.

Step 1. Input the training dataset of historical taxi-out time and the corresponding factors; identify the number of classes of taxi-out times.

Step 2. Build the exponential distribution family by running a set of independent binary regressions according to the factor vectors of each taxi-out time class; obtain the likelihood function $L(\theta)$.

Step 3. Establish and minimize the cost function $J(\theta)$ to obtain the optimal parameter $\theta$ by using the gradient descent method.

Step 4. Update the likelihood function with the optimal $\theta$, and predict the taxi-out time of the test set by using $P(y = j \mid x; \theta)$.
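
The prediction part of Step 4 can be sketched as follows: compute the softmax probability of each taxi-out class and return the class value with the highest probability. The parameter values, the two-feature input, and the class values (10, 15, and 20 minutes) are hypothetical and serve only to make the sketch executable.

```python
import math

def softmax_probabilities(theta, x):
    """Return P(y = j | x; theta) for every class j.

    theta is a list of k weight vectors (one per class); x is a feature vector.
    """
    scores = [sum(w * xi for w, xi in zip(theta_j, x)) for theta_j in theta]
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(theta, x, class_values):
    """Return the class value whose softmax probability is largest."""
    probs = softmax_probabilities(theta, x)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_values[best]

# Hypothetical parameters: three taxi-out classes, two features.
theta = [[0.2, -0.1], [0.0, 0.3], [-0.4, 0.6]]
x = [1.0, 2.0]
probs = softmax_probabilities(theta, x)
predicted_minutes = predict_class(theta, x, [10, 15, 20])
```

Because the output is always one of the class values, a predictor of this kind returns integer minutes whenever the classes are integer minutes.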

3.3. Artificial Neural Network

Artificial Neural Network (ANN) is a machine-learning method based on a large collection of connected simple units called artificial neurons. The Back-Propagation Neural Network (BPNN), a multilayer feedforward network trained by the error back-propagation algorithm, is one of the most widely used neural network models. Its topology includes an input layer, a hidden layer, and an output layer. In the output layer, the activation of a neuron is determined by

$$a_j = f\left(\sum_{i \in P_j} w_{ij}\, o_i\right),$$

where $a_j$ is the activation of the $j$th neuron, $P_j$ is the set of neurons in the preceding layer, $w_{ij}$ is the weight of the connection between neurons $i$ and $j$, $o_i$ is the output of neuron $i$, and $f$ is the sigmoid function. The BPNN model can learn from the parameter set of taxi-out time and calculate the actual output when implementing the predicting process. If the error between the actual output and the expected output does not meet the accuracy requirements, the learning rule of the BPNN reduces the error by adjusting weights and thresholds until the accuracy requirements are satisfied. The learning process of the BPNN approach can be summarized in the following steps.

Step 1. Initialize the neural network; define the minimum MSE error ($E_{\min}$) and the maximum number of iterations.

Step 2. Input the training set; initialize the weight matrix $W$.

Step 3. Compute the layer response output and the calculated MSE.

Step 4. Compare the calculated MSE and $E_{\min}$; if calculated MSE > $E_{\min}$, continue; else go to Step 6.

Step 5. Calculate the change in weights and update the weights; go to Step 3.

Step 6. Finish training and predict the taxi-out time by using the ANN with the test set.
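
Step 3 (the layer response and the MSE) can be illustrated with a small forward pass. This is only a sketch under assumed values: the layer sizes, weights, and the explicit `biases` argument (standing in for the thresholds mentioned above) are hypothetical, and the weight update of Step 5 is not shown.

```python
import math

def sigmoid(z):
    """Logistic sigmoid used as the activation function f."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Activation of each neuron j: f(sum_i w[j][i] * inputs[i] + b[j])."""
    return [
        sigmoid(sum(w * o for w, o in zip(w_j, inputs)) + b_j)
        for w_j, b_j in zip(weights, biases)
    ]

def mse(predicted, expected):
    """Mean squared error compared against the threshold in Step 4."""
    return sum((p - e) ** 2 for p, e in zip(predicted, expected)) / len(expected)

# Hypothetical 2-2-1 network on normalized inputs.
hidden = layer_forward([0.5, 0.2], [[0.1, 0.4], [-0.3, 0.8]], [0.0, 0.0])
output = layer_forward(hidden, [[0.6, -0.2]], [0.0])
error = mse(output, [0.7])
```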

3.4. Improved Swarm Intelligence Algorithm Based Prediction Approaches
3.4.1. Support Vector Regression

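The Support Vector Regression technique is a nonlinear regression forecasting method. The basic idea is to map the input variables into a high-dimensional linear feature space (Hilbert space), commonly through a kernel function. The Gaussian Radial Basis Function (RBF) kernel $K(x_i, x_j) = \exp\left(-\|x_i - x_j\|^2 / (2\sigma^2)\right)$ is a commonly used kernel function, where $\sigma$ is the parameter to be optimized. In this higher dimensional space, the training data can be approximated by a linear function. Then, the global optimal solution is obtained by training on the finite sample. The regression function for SVR is

$$f(x) = \omega^{T} \varphi(x) + b,$$

where $\omega$ is the weight vector, $b$ is the bias, and the inner product $\varphi(x_i)^{T}\varphi(x_j)$ can be replaced by the kernel function $K(x_i, x_j)$. In $\varepsilon$-SVR, the objective is to estimate a function whose deviations from the output variables of the training data are less than or equal to $\varepsilon$. The $\varepsilon$-value controls the complexity of the approximating functions: small values tend to penalize a large portion of the training data, leading to tight approximating models, and large values tend to free data from penalization, leading to loose approximating models. Therefore, the proper choice of the $\varepsilon$-value is critical for the generalization of regression models [22]. The optimal regression function is determined from the estimation of $\omega$ and $b$ by solving the following optimization problem:

$$\min_{\omega, b, \xi, \xi^{*}} \ \frac{1}{2}\|\omega\|^{2} + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^{*}\right)$$

$$\text{s.t.} \quad y_i - \omega^{T}\varphi(x_i) - b \le \varepsilon + \xi_i, \quad \omega^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \quad \xi_i, \xi_i^{*} \ge 0,$$

where $\xi_i$ and $\xi_i^{*}$ are the slack variables that are introduced to penalize complex fitting functions, and the constant $C$ penalizes the error by determining the tradeoff between the training error and the model complexity. The dual problem is to maximize

$$W(\alpha, \alpha^{*}) = -\frac{1}{2} \sum_{i,j=1}^{n} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\, \varphi(x_i)^{T}\varphi(x_j) - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^{*})$$

$$\text{s.t.} \quad \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*}) = 0, \quad 0 \le \alpha_i, \alpha_i^{*} \le C.$$

The nonlinear regression function is

$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*})\, \varphi(x_i)^{T}\varphi(x) + b.$$

To avoid computing the complex dot product in the high-dimensional feature space explicitly, the kernel function $K$ is used in its place. Thus, the nonlinear regression function can be written as

$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*})\, K(x_i, x) + b.$$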
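
The three building blocks of this subsection (the RBF kernel, the $\varepsilon$-insensitive error, and the kernel form of the regression function) can be sketched in a few lines of Python. The sketch assumes the standard definitions; the dual coefficients passed to `svr_predict` are taken as given (solving the dual problem is not shown), and all names and numbers are hypothetical.

```python
import math

def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors inside the epsilon tube cost nothing; outside, the excess is charged."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

def svr_predict(x, support_vectors, dual_coefs, b, sigma):
    """f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b.

    dual_coefs[i] holds the difference alpha_i - alpha_i* for support vector i.
    """
    return sum(
        coef * rbf_kernel(sv, x, sigma)
        for sv, coef in zip(support_vectors, dual_coefs)
    ) + b

# Hypothetical one-dimensional example with a single support vector.
value = svr_predict([0.0], [[0.0]], [2.0], 1.0, 1.0)
```

The loss function makes the role of $\varepsilon$ concrete: widening the tube frees more training points from penalization, which loosens the fitted model.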

3.4.2. Particle Swarm Optimization

The Particle Swarm Optimization is a swarm intelligence algorithm developed in recent years. It is a metaheuristic global optimization method based on a social-behaviour analogy, such as birds flocking and fish schooling. The PSO method solves an optimization problem by moving the particles (namely, candidate solutions) through the search space, updating their velocities and positions according to simple mathematical formulae. The position of each particle is updated towards better-known positions, driven by its own best performance and by the best performance of its neighbours or of the whole swarm. Thus, in searching for the optimal solution of the problem, the velocity and position of particle $i$ are updated according to the following equations of motion:

$$v_i(t+1) = w\, v_i(t) + c_1 r_1 \left(p_i - x_i(t)\right) + c_2 r_2 \left(g - x_i(t)\right), \qquad x_i(t+1) = x_i(t) + v_i(t+1),$$

where $v_i(t+1)$ is the updated velocity for the $i$th particle, $w$ is the inertia weight, $c_1$ and $c_2$ are the weighting coefficients for the personal best and global best positions, respectively, $x_i(t)$ is the $i$th particle’s position at time $t$, $p_i$ is the $i$th particle’s best known position, $g$ is the best position known to the swarm, and $r_1$ and $r_2$ are uniformly distributed random variables in $[0, 1]$. Variants of this update equation consider best positions within a particle’s local neighbourhood at time $t$.
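
The update rule can be sketched as a small global-best PSO in Python. The population size (20) and iteration count (100) follow the settings reported later in this paper; the inertia weight, the coefficients `c1` and `c2`, the clamping of positions to the bounds, and the sphere test function are assumptions chosen for the sketch.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, n_iterations=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO; returns (best_position, best_cost)."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_cost = [objective(p) for p in positions]
    g = min(range(n_particles), key=personal_cost.__getitem__)
    global_best, global_cost = personal_best[g][:], personal_cost[g]
    for _ in range(n_iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * r1 * (personal_best[i][d] - positions[i][d])
                                    + c2 * r2 * (global_best[d] - positions[i][d]))
                lo, hi = bounds[d]
                # Keep the particle inside the search space.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            cost = objective(positions[i])
            if cost < personal_cost[i]:
                personal_best[i], personal_cost[i] = positions[i][:], cost
                if cost < global_cost:
                    global_best, global_cost = positions[i][:], cost
    return global_best, global_cost

# Hypothetical test objective: the sphere function, minimized at the origin.
bounds = [(-5.0, 5.0)] * 3
best, cost = pso_minimize(lambda p: sum(v * v for v in p), bounds)
```

For SVR tuning, `objective` would be replaced by a validation-error function of the SVR parameters, and `bounds` by their search space.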

3.4.3. Improved Firefly Algorithm Optimization

The Firefly Algorithm (FA), a relatively new swarm-based bionic optimization algorithm, is highly efficient in solving numerous optimization problems and can outperform conventional algorithms such as the Genetic Algorithm (GA). In this algorithm, the fireflies are attracted to each other depending on two elements: their own brightness and attraction. The brightness depends on the location and the target value, and the higher the brightness, the better the location. Fireflies with higher brightness also have a higher degree of attraction. Low-brightness fireflies in the field of vision are attracted by high-brightness fireflies. Fireflies move randomly if they have similar fluorescent brightness.

Regarding the brightness as the objective function, the optimization problem can be seen as a maximization problem. The attractiveness of the fireflies is proportional to the fluorescence intensity of the nearby fireflies and is inversely proportional to the distance. Define the relative fluorescence brightness of the fireflies as $I$ and the attractiveness as $\beta$. The distance between fireflies $i$ and $j$ is $r_{ij}$. Firefly $i$ is attracted by firefly $j$ to update its location; the location update equation is

$$x_i(t+1) = x_i(t) + \beta \left(x_j(t) - x_i(t)\right) + \alpha \cdot \mathrm{rand},$$

where $\gamma$ is the absorption coefficient, $\beta = \beta_0 e^{-\gamma r_{ij}^{2}}$, $\beta_0$ is the attractiveness when $r_{ij} = 0$, $\alpha$ is the step factor for determining random firefly movement, and rand is a random number drawn from a Gaussian distribution, $\mathrm{rand} \sim N(0, 1)$.
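
A single firefly move can be sketched as below, assuming the widely used exponential attractiveness $\beta_0 e^{-\gamma r^2}$ and a standard-normal random term. The function names and parameter values are hypothetical; the sketch covers one pairwise move only, not the full algorithm.

```python
import math
import random

def attractiveness(beta0, gamma, r):
    """Attractiveness decays with distance r; equals beta0 at r = 0."""
    return beta0 * math.exp(-gamma * r * r)

def move_firefly(x_i, x_j, beta0, gamma, alpha, rng):
    """Move firefly i towards the brighter firefly j, plus a random step."""
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))
    beta = attractiveness(beta0, gamma, r)
    return [a + beta * (b - a) + alpha * rng.gauss(0.0, 1.0)
            for a, b in zip(x_i, x_j)]

rng = random.Random(1)
# With gamma = 0 and alpha = 0 the firefly jumps exactly onto its neighbour.
landed = move_firefly([0.0, 0.0], [2.0, 4.0], 1.0, 0.0, 0.0, rng)
# A more typical (still hypothetical) setting.
moved = move_firefly([0.0, 0.0], [2.0, 4.0], 1.0, 0.5, 0.1, rng)
```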

Adaptive Step Factor. The value of the step factor affects the global and local search ability of the algorithm. A large step factor benefits the efficiency of the search for the global optimal solution and thus the convergence efficiency of the optimization algorithm. As the number of iterations increases, gradually reducing the step factor is more conducive to fine tuning within the search space. Thus a monotonically decreasing function is chosen as the step factor, which is written as

$$\alpha(t) = \alpha_0\, \delta^{t},$$

where $\alpha_0$ is the initial step factor, $\delta$ is the controlling parameter, empirically selected as 0.9, and $t$ is the number of iterations.
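
A short sketch of such a schedule follows, assuming a geometric decay of the form $\alpha_0 \delta^{t}$ (one common monotonically decreasing choice, not necessarily the exact function used here). The controlling parameter 0.9 is the value quoted above; the initial value 0.5 is hypothetical.

```python
def adaptive_step(alpha0, delta, t):
    """Step factor after t iterations under geometric decay."""
    return alpha0 * delta ** t

# Large steps early (global exploration), small steps late (fine tuning).
steps = [adaptive_step(0.5, 0.9, t) for t in range(5)]
```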

Lévy Flight. The conventional FA optimization uses a regular random movement in stochastic optimization. This often leads to premature convergence without reaching the global optimal solution when there is a large number of local optimal solutions. In order to reduce the probability that the optimization process falls into a local optimal solution, this paper adopts Lévy flight when updating the distance of fireflies. Lévy flight is a random walk whose step length obeys the Lévy distribution, which is a distribution of a sum of identically and independently distributed random variables. The Fourier transform is $F(k) = \exp\left(-c|k|^{\mu}\right)$, $0 < \mu \le 2$. The step lengths follow the Lévy distribution $L(s) \sim |s|^{-1-\mu}$, where $\mu$ is an index, so that the step length follows a power-law distribution. The distribution has an infinite variance, following

$$\text{Lévy} \sim u = t^{-\lambda}, \qquad 1 < \lambda \le 3.$$
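
One widely used way to draw Lévy-distributed step lengths is Mantegna's algorithm, sketched below. Its use here is an assumption for illustration (the sampling procedure is not specified above), and the index value 1.5 is a conventional default rather than a value taken from this study.

```python
import math
import random

def levy_step(rng, index=1.5):
    """Draw one heavy-tailed step with Mantegna's algorithm: u / |v|^(1/index)."""
    sigma_u = (
        math.gamma(1.0 + index) * math.sin(math.pi * index / 2.0)
        / (math.gamma((1.0 + index) / 2.0) * index * 2.0 ** ((index - 1.0) / 2.0))
    ) ** (1.0 / index)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1.0 / index)

rng = random.Random(42)
samples = [levy_step(rng) for _ in range(1000)]
```

Most draws are small, but occasional very long jumps occur; these long jumps are what help a firefly escape a local optimum.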

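Thus, by replacing the original step factor and random walk with the adaptive step factor and Lévy flight, respectively, the new update equation of IFA is written as

$$x_i(t+1) = x_i(t) + \beta_0 e^{-\gamma r_{ij}^{2}} \left(x_j(t) - x_i(t)\right) + \alpha(t) \oplus \text{Lévy}(\lambda),$$

where the symbol $\oplus$ denotes entry-wise multiplication.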

3.4.4. PSO/IFA Based Support Vector Regression

In this study, identifying the optimal parameters of the SVR model is an optimization problem. Therefore, this study combines swarm intelligence algorithms and SVR to reduce prediction errors. Considering that the number of samples of the learning data is much larger than the number of feature dimensions, the input variables are mapped into Hilbert space through the RBF kernel, which is more promising than other kernels. In order to predict departure taxi-out time more accurately, establishing the SVR models requires determining the penalty factor $C$, the RBF kernel parameter $\sigma$, and the $\varepsilon$-value in advance, by using PSO and IFA optimization, respectively. An inappropriate $C$ affects the training error and model complexity; $\sigma$ defines the nonlinear mapping from the input space to Hilbert space, so an inappropriate $\sigma$ induces overfitting or underfitting; and the $\varepsilon$-value controls the complexity of the approximating functions. The flowchart of the PSO/IFA-based SVR prediction model is shown in Figure 1.

In Figure 1, the optimized SVR prediction model includes three parts: data classification, PSO/IFA optimization, and the SVR prediction model. Historical data are classified into a training set, a validating set, and a test set. The training set is used to adjust weights and biases. The validating set is used to evaluate the performance of the trained SVR model. The test set is used to confirm the predicting accuracy. The optimization process optimizes the parameters of the PSO-SVR and IFA-SVR models, trains and validates the models, and then passes the feedback to the optimization process after evaluating the fitness values, so that the search for the optimal parameters continues until the required accuracy is met. In short, the SVR implements the regression part, whereas PSO and IFA are applied to determine the optimal SVR parameters.

The parameters of the SVR prediction models were evaluated with PSO and IFA, respectively, in order to get the optimum fitness. All prediction processes were performed in MATLAB 2012a. In the parameter optimization with both the PSO and IFA methods, we initialized the maximum population size as 20 and the maximum number of iterations as 100, and each particle is a vector that comprises the SVR parameters; namely, $(C, \sigma, \varepsilon)$. The search space of the SVR parameters is . The termination criteria are fulfilled if there is no improvement in the fitness function and the maximum number of iterations is reached.

3.5. Performance Measures

This research aims to compare the swarm intelligence algorithm based SVR methods and other prediction methods, to evaluate performance by using the prediction accuracy measures in statistics as presented in (15) to (18):

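(1) Root mean square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^{2}}, \tag{15}$$

where $y_i$ is the actual value, $\hat{y}_i$ is the predictive value, and $n$ is the number of data samples.

(2) Mean absolute percentage error (MAPE):

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%. \tag{16}$$

(3) Squared correlation coefficient ($R^2$):

$$R^{2} = \frac{\left(n \sum_{i=1}^{n} \hat{y}_i y_i - \sum_{i=1}^{n} \hat{y}_i \sum_{i=1}^{n} y_i\right)^{2}}{\left(n \sum_{i=1}^{n} \hat{y}_i^{2} - \left(\sum_{i=1}^{n} \hat{y}_i\right)^{2}\right)\left(n \sum_{i=1}^{n} y_i^{2} - \left(\sum_{i=1}^{n} y_i\right)^{2}\right)}. \tag{17}$$

(4) Prediction accuracy (PA): the last set of performance measures is the percentage of predictions whose absolute error lies within a specific bound. This percentage indicates the share of the aircraft in the dataset predicted within 2, 3, and 5 minutes, as presented in (18):

$$\mathrm{PA}_{k} = \frac{1}{n} \sum_{i=1}^{n} 1\left\{\left|\hat{y}_i - y_i\right| \le k\right\} \times 100\%, \qquad k \in \{2, 3, 5\}. \tag{18}$$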
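
The four measures can be computed with a few lines of Python. This is a minimal sketch using the standard definitions of RMSE, MAPE, and the squared correlation coefficient, and counting a prediction as accurate when its absolute error does not exceed the tolerance (the inclusive bound is an assumption). The sample values are made up.

```python
import math

def rmse(actual, predicted):
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

def squared_correlation(actual, predicted):
    n = len(actual)
    sa, sp = sum(actual), sum(predicted)
    sap = sum(a * p for a, p in zip(actual, predicted))
    saa = sum(a * a for a in actual)
    spp = sum(p * p for p in predicted)
    return (n * sap - sa * sp) ** 2 / ((n * spp - sp ** 2) * (n * saa - sa ** 2))

def prediction_accuracy(actual, predicted, tolerance):
    """Percentage of predictions with absolute error <= tolerance (minutes)."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(p - a) <= tolerance)
    return 100.0 * hits / len(actual)

# Hypothetical taxi-out times in minutes.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 40]
scores = (rmse(actual, predicted), mape(actual, predicted),
          prediction_accuracy(actual, predicted, 2))
```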

4. Data Analysis and Observation

4.1. Data Source

The datasets in this study are from the Aviation System Performance (ASP) data of PEK, the second busiest airport in the world, which has a huge traffic volume as well as severe delays. PEK airport comprises three parallel runways: Runway 36L/18R is used for combined arrival and departure operations, Runway 36R/18L is mainly dedicated to departures, and Runway 01/19 is used only for arrivals; all three runways serve both departures and arrivals at traffic rush hours (Civil Aviation Administration of China, 2013) [23].

The days from Oct. 17 to Oct. 30 were used for training, and the days from Nov. 13 to Nov. 15 were used for testing the prediction. ASP data record the following information: scheduled take-off time and scheduled landing time, applied pushback time, actual take-off time, and actual landing time of arrival flights. Using historical data is important to ensure that the results are realistic and can be compared with the status quo at a specific airport, and to estimate the potential situation at other similar airports.

4.2. Data Analysis

In recent years, researchers have found that departure taxi-out time is related to numerous factors, including the number of departing aircraft in the runway queue, the number of arriving aircraft taxiing, the time of day [2, 13], airlines, and taxiing route distance [14, 15]. Departure delay is also a significant factor at some specific airports such as PEK. These elements complicate the development of a methodology for predicting departure taxi-out time. In this research, the various prediction models were used for predicting the taxi-out time of each flight. In order to train the models on the state of flights, several factors were taken into consideration. The state variable set for the prediction was determined by analysing the performance data. The configuration of three parallel runways at PEK airport reduces the influence among the different runways. For a specific flight waiting for departure, the current departure queue length on the taxiway, the potential number of landing aircraft during the taxi-out course, and the distance of the taxi-out route from each gate to the runway are the significant factors affecting taxi-out time. The recorded data include considerable delay information because of the heavy traffic flow. Especially at PEK airport, the second busiest airport in the world, heavy traffic flows induce enormous delays. Delay disrupts the fluency of the departure and arrival processes and influences taxi-out time. Therefore, delay has become a very important indicator. However, the available delay is determined as the mean delay time in the previous hour-bracket, because the real-time delay cannot be obtained before a flight takes off. In addition, taxi-out time changes over the schedule. Thus, the planned take-off time and the actual pushback time are also included as pertinent factors.
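
The "available delay" feature described above can be sketched as follows. The sketch assumes that flights are bucketed into clock hours by pushback minute of the day and that a missing previous hour falls back to a default value; both choices, as well as the function names and toy records, are assumptions rather than details given in this paper.

```python
from collections import defaultdict

def hourly_mean_delay(flights):
    """flights: iterable of (pushback_minute_of_day, delay_minutes).

    Returns {hour_index: mean delay of flights pushed back in that hour}.
    """
    buckets = defaultdict(list)
    for minute, delay in flights:
        buckets[minute // 60].append(delay)
    return {hour: sum(values) / len(values) for hour, values in buckets.items()}

def available_delay(pushback_minute, hourly_means, default=0.0):
    """Mean delay observed in the hour bracket before the flight's own hour."""
    return hourly_means.get(pushback_minute // 60 - 1, default)

# Hypothetical records: two flights in the 07:00 hour, one in the 08:00 hour.
means = hourly_mean_delay([(420, 10), (450, 20), (485, 30)])
feature = available_delay(490, means)  # a flight pushing back at 08:10
```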

Because our dataset lacks certain details of variability, some potential explanatory factors are not listed. The different taxiing behaviours of airlines and pilots cannot be explained from the character of the data. The taxi-out time in the raw PEK performance data is recorded in minutes rather than seconds, while the models we used, except the SR model, have a search precision of $10^{-4}$.

4.3. Observations
4.3.1. Dynamics of Training Days

To build an understanding of the dynamics of PEK airport, a discussion of the departure behaviour actually observed at the airport is presented first. Table 1 shows the statistical values for each variable obtained from the training data. The response is taxi-out time, and the predictor variables are the other attributes. Here, the actual pushback time denotes a “Time Point” rather than a “Time Period”; thus the planned take-off time and the actual pushback time are converted into minutes of the day (e.g., a whole day has 1,440 minutes; 0:00 is recorded as 0 and 6:01 is recorded as 361). To avoid numerical difficulties and abnormal errors, some anomalous data points, such as a very extended delay time, have been eliminated from the datasets, and all data samples are normalized within a range of 0 to 1 for modelling.
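
The two preprocessing steps mentioned above can be sketched in Python as follows. Min-max scaling is assumed as the normalization method, since the text only states that samples are scaled to the range 0 to 1; the function names are hypothetical.

```python
def to_minute_of_day(clock):
    """Convert an 'H:MM' clock time into minutes since midnight."""
    hours, minutes = clock.split(":")
    return int(hours) * 60 + int(minutes)

def min_max_normalize(values):
    """Scale values linearly into [0, 1]; a constant column maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

pushback_minutes = [to_minute_of_day(t) for t in ["0:00", "6:01", "23:59"]]
normalized = min_max_normalize([5, 10, 15])
```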

Figure 2 shows the observed dynamics on a training day at PEK airport. It includes the actual average taxi-out time per quarter (15 min) (Figure 2(a)), departure demand per quarter (Figure 2(b)), and arrival demand per quarter (Figure 2(b)). The average taxi-out time and departure demand have two peak-hour periods: 7:00 AM–9:00 AM and 4:00 PM–6:00 PM. The peak-hour period of the arrival process lasts from about 4:00 PM to 7:00 PM. The overlap of the afternoon departure and arrival peaks contributes to the longer taxi-out time.
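
A per-quarter profile of this kind can be computed by bucketing flights into 15-minute bins. The sketch below assumes that each record carries a pushback minute of the day and a taxi-out time, and it returns both the mean taxi-out time and the number of departures per bin; the record layout is hypothetical.

```python
from collections import defaultdict

def quarter_hour_profile(records):
    """records: iterable of (pushback_minute_of_day, taxi_out_minutes).

    Returns {quarter_index: (mean taxi-out time, number of departures)},
    where quarter_index 0 covers 0:00-0:14, 1 covers 0:15-0:29, and so on.
    """
    buckets = defaultdict(list)
    for minute, taxi_out in records:
        buckets[minute // 15].append(taxi_out)
    return {q: (sum(v) / len(v), len(v)) for q, v in sorted(buckets.items())}

# Hypothetical flights pushing back at 07:00, 07:09, and 07:15.
profile = quarter_hour_profile([(420, 10), (429, 20), (435, 30)])
```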

Figure 3 shows the scatterplot of a training dataset, with the linear fit between taxi-out time and delay. In general, the delay has a positive impact on taxi-out time, and is 0.5374. This is why delay is treated as one of the factors at a busy airport.

4.3.2. Dynamics of Testing Days

The testing days are from Nov. 13 to Nov. 15 in 2013. A set of detailed performance data is shown in Table 2.

Table 2 displays the details of the testing days, which include two normal days (13th and 14th) and a day with excessive delay (15th). In order to validate the gap between normal and abnormal days, a nonparametric statistical test, the Wilcoxon-Mann-Whitney test, is implemented. Two null hypotheses of “the taxi-out time distribution is the same between two days” are tested, for the 13th-14th and the 13th-15th, respectively. The $p$ value for the 13th-14th is 0.081 > 0.05, so the null hypothesis cannot be rejected, while the $p$ value for the 13th-15th is $10^{-5}$ < 0.05, so the null hypothesis is rejected. The test set and the results of the statistical test can be seen in Appendix 1. Thus we can safely conclude that normal and abnormal days are statistically different.
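
For illustration, the test statistic and an approximate two-sided $p$ value can be computed as below. This sketch uses average ranks for ties and the plain normal approximation without tie or continuity correction, which is coarse for small samples; it is not the procedure or software used to obtain the values reported above, and the sample data are made up.

```python
import math

def mann_whitney_u(sample_a, sample_b):
    """Return (U of sample_a, two-sided p value from the normal approximation)."""
    combined = sorted((value, group)
                      for group, sample in enumerate((sample_a, sample_b))
                      for value in sample)
    n = len(combined)
    rank_sum_a = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        # Tied values at sorted positions i..j-1 share the average of ranks i+1..j.
        average_rank = (i + 1 + j) / 2.0
        rank_sum_a += average_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u_a - mean_u) / sd_u
    return u_a, math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical, fully separated samples of taxi-out minutes.
u_stat, p_value = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

A $p$ value below the chosen significance level (0.05 in this study) leads to rejecting the hypothesis that the two days share the same taxi-out time distribution.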

5. Numerical Results

Through prediction on the test data, the performance of each predictive method can be compared. The global optimal parameters found in this research are (16.885, 1.401, 0.028) for PSO-SVR and (36.221, 0.917, 0.020) for IFA-SVR. A visual comparison is made between the mean actual taxi-out time per quarter and the mean predicted taxi-out time per quarter (i.e., on the predicted days). Figure 4 shows the actual and predicted taxi-out time curves on the 14th only.

Figure 4 shows that the PSO-based and IFA-based SVR models fit the actual curve better than the other approaches, especially GLM and SR, which clearly underfit and sometimes fit wrongly, whereas the ANN method also fits very well.

Table 3 shows the first three performance measures for the predicted datasets, and bold numbers highlight the best performance measures (closest to actual values) for each predictive method across the three testing days. The introduced IFA-SVR outperforms other approaches in terms of mean taxi-out time and standard deviation, while IFA-SVR is superior on median taxi-out time. These results are closer to the actual performance of the testing data. As the data on the 15th present very long taxi-out times on the whole, all mean predicted taxi-out times are less than the actual values. The output results of SR are integers, since SR is based on the integer classification of training taxi-out times, which can be seen from the form of the median taxi-out time. However, the standard deviation of the predicted taxi-out times of GLR reveals the weakest sensitivity to the different parameters, and this can also be observed from the underfitting phenomenon in Figure 4. Compared with the results in [2] at Tampa International Airport, these swarm intelligence algorithm based prediction methods show better fault tolerance in mean taxi-out time predictions, especially in excessive traffic or abnormal patterns.

The comparison results of modelling performance for each predictive method can be found in Table 4, and the best performance is also highlighted with bold numbers. Table 4 shows that the highlighted performance measures of IFA-SVR are slightly better than the results of PSO-SVR and significantly outperform the other approaches. Both the newly introduced PSO-SVR and IFA-SVR have a squared correlation coefficient exceeding 90% on both the 13th and 14th, while the values drop on the 15th because of the large number of underestimated taxi-out times on that day, as shown in Figure 5. Figure 5 presents a comparison of taxi-out time prediction accuracy for each predictor on the 14th and 15th, respectively, in which the $x$-axis represents the aircraft, sorted from underestimated to overestimated taxi-out times, and the $y$-axis is the error between predicted and actual taxi-out time, namely, predicted taxi-out time – actual taxi-out time.

The vertical dashed line divides the sorted aircraft into (i) the underestimated taxi-out time region and (ii) the overestimated taxi-out time region. The distance between the dots on each line and the 0-baseline represents the absolute error of the predicted taxi-out time for each aircraft. The number of underestimated taxi-out times in Figure 5(a) is almost in balance with the number of overestimated taxi-out times, whereas in Figure 5(b) the underestimated taxi-out times outnumber the overestimated ones. Figure 5(b) also shows the notable predictive ability of the newly introduced predictors for excessive traffic or abnormal patterns. In addition, all performance measures (except MAPE) of PSO-SVR and IFA-SVR are better on the 13th and 14th than on the 15th because the actual mean taxi-out time on the 15th is greater than on the other days.

Table 5 shows the performance measures of prediction accuracy within 2, 3, and 5 min by measuring absolute error. IFA-SVR still comes out on top among the tested methods. In terms of accuracy within 2 and 5 minutes, the performance of IFA-SVR is inferior to that reported in [15] for Stockholm Arlanda Airport (79.39% versus 86.81% and 95.52% versus 99.08%). This is caused by the different traffic conditions sampled at the different airports. Notice that the accuracy measures of linear regression in [15] are 85.3% and 99.16%, respectively, while the best performance of the TSK model improves the rates by 1.78% and −0.08%, respectively. In this research, IFA-SVR improves the rates of GLR by 97.49% and 24.42%, respectively.

6. Conclusions

When the objective is to improve on-time performance, enhance the utilization of handling personnel and other resources, and reduce delay, congestion, and emissions, an improved taxi-out time prediction method is significant because it can contribute to each decision-support system in departure operations. This paper collected several classical regression and machine-learning methods (including generalized linear regression, Softmax Regression, and Artificial Neural Network) and proposed two improved swarm intelligence algorithm based SVR prediction approaches to test predictive ability. Several potentially significant factors were observed and analysed in the historical data of PEK airport. Queue length, potential landing number, and the distance of the taxiing route were identified and shown to be significant, as was the delay time in the previous hour, which is important at some specific airports and was therefore taken into consideration. Compared with the traditional predicting methods, the two proposed approaches, especially the IFA-SVR method, achieved an accuracy rate of up to 95.52% within 5 minutes and showed a substantial improvement in predictive accuracy. Moreover, the proposed approaches showed commendable ability in dealing with deviant situations. These results could motivate managers to arrange tighter flight schedules and pushback slots.

Although the proposed predictive methods appear to predict taxi-out time accurately, they still have to be improved with respect to the combined statistical factors, because information about the take-off direction over a whole day and about the different taxiing speeds of aircraft types is missing. Future work will focus on considering the different taxiing speeds of aircraft types and on collecting precise taxi-out routes to improve prediction accuracy. In addition, the study of other hub airports is an ongoing research interest.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research has been partially supported by the National Natural Science Foundation of China (no. U1533203 and no. U1233124) and by Air Traffic Management Research Institute (NTU-CAAS) (Grant no. M4061216). The authors would like to thank The Second Research Institute of CAAC for their permission to obtain ASP data.

Supplementary Materials

Supplementary material includes the taxi-out times of the test set and the results of the Wilcoxon-Mann-Whitney test; see Appendix 1. Table 1: taxi-out times in test days. Table 2: results of Wilcoxon-Mann-Whitney test for test days.