Mathematical Problems in Engineering

Volume 2018 (2018), Article ID 7509508, 11 pages

https://doi.org/10.1155/2018/7509508

## Predicting Taxi-Out Time at Congested Airports with Optimization-Based Support Vector Regression Methods

^{1}School of Transportation Science and Engineering, Harbin Institute of Technology, Harbin, China

^{2}School of Mechanical & Aerospace Engineering, Nanyang Technological University, Singapore

^{3}Ground Support Equipment Research Base, Civil Aviation University of China, Tianjin, China

^{4}The Second Research Institute of Civil Aviation Administration of China, Chengdu, China

Correspondence should be addressed to Yaping Zhang; zxlt0905@163.com

Received 9 November 2017; Revised 25 February 2018; Accepted 15 March 2018; Published 22 April 2018

Academic Editor: Ricardo Soto

Copyright © 2018 Guan Lian et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Accurate prediction of taxi-out time is a significant precondition for improving the operability of the departure process at an airport, as well as for reducing long taxi-out times, congestion, and excessive emission of greenhouse gases. Unfortunately, several of the traditional methods of predicting taxi-out time perform unsatisfactorily at congested airports. This paper describes and tests three of those conventional methods, namely, the Generalized Linear Model, the Softmax Regression Model, and the Artificial Neural Network method, together with two improved Support Vector Regression (SVR) approaches based on swarm intelligence algorithm optimization, namely, Particle Swarm Optimization (PSO) and the Firefly Algorithm. In order to improve the global searching ability of the Firefly Algorithm, an adaptive step factor and Lévy flight are implemented simultaneously when updating the location function. Six factors are analysed, of which delay is identified as one significant factor at congested airports. Through a series of specific dynamic analyses, a case study of Beijing International Airport (PEK) is tested with historical data. The performance measures show that the two proposed SVR approaches, especially the Improved Firefly Algorithm (IFA) optimization-based SVR method, not only achieve the best modelling measures and accuracy rate compared with the representative forecast models, but also achieve a better predictive performance when dealing with abnormal taxi-out time states.

#### 1. Introduction

The size of fleets at airports is becoming ever larger because of the continuous increase in the demand for air transportation over the past few decades. Consequently, efficiency levels are dropping as managers face more operational eventualities, and airlines have to accommodate higher fuel costs and mounting numbers of customer complaints. In 2016, the passenger traffic volume reached 487.96 million, an increase of 10.7% over the previous year. The current average on-time rate of flights in China is 76.76%, and the average delay time is 16 minutes, which is 5 minutes less than in 2015. The total number of complaints received (all recorded by airlines and airports) in 2016 showed an increase of 84% on the 2015 total [1]. In China, an aircraft is considered to be “on-time” if it takes off (lands) within 15 minutes before or after the scheduled departure (landing) time. The delay time here refers to departure flights and is defined as the difference between the actual departure time and the planned departure time.

Flight delays have a dramatic impact on the movement of taxiing aircraft between gates and runways. Taxi-out time is defined as the time between the actual pushback and wheels-off. It is difficult to predict at hub airports during peak hours, and the consequences are very long taxiing times and airport surface congestion. Aircraft that taxi for a long time may disrupt the allocation of pushback and take-off slots, which not only destroys the balance of the arrival and departure process but also increases fuel consumption and emissions. Moreover, the workload of controllers increases. The delay is cumulative, but it is both stochastic and controllable in the taxi process. The stochastic characteristic is reflected in uncertain events, such as changes in the weather, the interaction of departing and arriving aircraft during surface movement, and the human factor. The controllable part of the delay can be adjusted by changing routes and taxiing speed and even by holding at the gate [2].
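The definitions used so far (on-time, departure delay, taxi-out time) can be written down in a few lines. The sketch below is only an illustration: the timestamps are invented, the function names are hypothetical, and the pushback time is assumed to stand for the actual departure time, which the text does not state explicitly.

```python
from datetime import datetime, timedelta

ON_TIME_WINDOW = timedelta(minutes=15)  # tolerance quoted in the text


def taxi_out_minutes(pushback, wheels_off):
    """Taxi-out time: actual pushback to wheels-off, in minutes."""
    return (wheels_off - pushback).total_seconds() / 60.0


def departure_delay_minutes(planned_departure, actual_departure):
    """Delay: actual departure time minus planned departure time, in minutes."""
    return (actual_departure - planned_departure).total_seconds() / 60.0


def is_on_time(planned_departure, actual_departure):
    """On-time if the departure falls within 15 minutes of the schedule."""
    return abs(actual_departure - planned_departure) <= ON_TIME_WINDOW


# Hypothetical flight record (all timestamps invented for illustration).
planned = datetime(2016, 7, 1, 8, 0)
pushback = datetime(2016, 7, 1, 8, 20)   # assumed to be the actual departure time
wheels_off = datetime(2016, 7, 1, 8, 47)

print(departure_delay_minutes(planned, pushback))  # delay in minutes
print(is_on_time(planned, pushback))               # outside the 15-minute window
print(taxi_out_minutes(pushback, wheels_off))      # taxi-out time in minutes
```

In this invented record the flight pushes back 20 minutes late and then needs 27 minutes from pushback to wheels-off, so it counts as delayed under the 15-minute rule.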

Better prediction of taxi-out time allows all stakeholders to plan future activities in airport operation. Efficient taxi-out prediction methods are effective when the aim is to eliminate delays and improve the utilization of resources. Once taxi-out time is predicted in advance, operators gain a flexibility that allows them to adjust the schedule, gate assignment, and pushback plan. This makes the operation of an airport smoother and reduces its surface congestion and fuel-burn costs. The aim of this research is to develop approaches that predict the taxi-out time of departing aircraft more accurately. In this paper, we introduce two methods of predicting taxi-out time, both of which arose from an analysis of the factors extracted from the Aviation System Performance (ASP) data of Beijing International Airport. The proposed models build on soft-computing approaches to predicting taxi-out time: Support Vector Regression based on the Particle Swarm Optimization algorithm and on the Improved Firefly Algorithm. These two intelligent algorithms can search for the optimal parameters of SVR so that the taxi-out time is predicted effectively.

The organization of this paper is as follows. The Literature Review offers a brief overview of previous attempts to analyse taxi-out time behaviours in the airport departure process and of several prediction methods. This is followed by a description of the research methodology, which includes three traditional prediction methods and two newly proposed, improved swarm intelligence algorithm-based approaches to predicting taxi-out time. The layout of PEK airport is illustrated, along with historical data, and both are used for analysing airport dynamics and traffic situations in the taxiing process. Results obtained from the PEK data and findings are then discussed. The conclusion summarizes the benefits that accrue from these findings and their implications.

#### 2. Literature Review

Several efforts have been made to address the prediction of taxi-out times. Those efforts have included both predictions based on historical data and queuing-based approaches that consider causal factors. Shumsky deemed aircraft flow and departure demand to be causal factors and used dynamic linear models to predict taxi-out time. He compared static and dynamic linear models and found the dynamic linear model better for predicting taxi-out time in a short time window [3]. Pujet modelled the departure system as queuing servers and derived a stochastic distribution for the taxi-out time. His model captured the details of the departure process to estimate taxi-out time [4]. Idris et al. analysed a number of factors that affect taxi-out time by using the Airline Service Quality Performance (ASQP) data. These factors included the runway configuration, the airline/terminal, the downstream restrictions, and the take-off queue size [5–7].

These researchers developed a queuing model for predicting taxi-out time and concluded that take-off queue size correlates best with taxi-out time, especially when the queue that each aircraft experiences is measured as the number of take-offs between its pushback time and its take-off time. Carr et al. conducted a simulation-based study of queuing dynamics and traffic rules. They predicted taxi-out time by considering aggregate metrics such as airport throughput and departure congestion [8]. Simaiakis and Balakrishnan proposed a taxi-out time prediction model built on an analytical model of the aircraft departure process, which included an estimate of the distributions of unimpeded taxi-out time and the development of a queuing model of the departure runway system [9, 10].

Several statistical approaches and machine-learning methods have been applied to the prediction of aircraft taxiing time. Srivastava used high-resolution position updates from the ASDE-X surveillance system of JFK to develop a taxi-out prediction model based on the existing surface traffic conditions and short-term traffic trends [11]. Hebert and Dietz developed a multistage Markov process model of the departure process at LaGuardia airport, based on five days of data, to predict taxi-out time [12]. Balakrishna et al. proposed reinforcement learning algorithms, which could adapt to the stochastic nature of departure operations, to predict average airport taxi-out time trends approximately 30–60 minutes in advance of the given time of day [2, 13]. Ravizza et al. built a combined statistical and ground movement model and used multiple linear regression to find the function that would predict taxiing times more accurately [14]. They also used the same explanatory variables with different approaches, which included multiple linear regression, least median squared linear regression, Support Vector Regression, M5 model trees, Mamdani fuzzy rule-based systems, and TSK fuzzy rule-based systems, to predict taxi-out times and then compared these approaches [15]. Lee et al. used both fast-time simulation and machine-learning techniques to predict taxi-out time and found Support Vector Regression to be better than the linear regression method and the Dead Reckoning method [16].

Unfortunately, the state-of-the-art methods have been tested at airports whose characteristics limit the generalizability of the findings. These airports have exceptionally favourable taxiing conditions, and they respond quickly to clearances and delays. For airports that are large in every respect, these methods are somewhat inadequate, or they do not take some necessary factors into consideration.

#### 3. Taxi-Out Time Prediction Techniques

There are several predictive approaches, such as Artificial Neural Networks (ANN) [17], Kalman Filtering models [18], Softmax Regression (SR) [19], and Support Vector Regression (SVR) [20]. Methods with reasonable accuracy are essential for estimating taxi-out time at departure, so the techniques compared in this study are described in turn.

##### 3.1. Generalized Linear Model

The Generalized Linear Model (GLM), formulated by Nelder and Wedderburn [21], is a flexible generalization of ordinary linear regression that allows for response variables with error-distribution models other than the normal distribution. GLM relates the linear model to the response variable through a link function and allows the magnitude of the variance of each measurement to be a function of its predicted value. The relationship between the predicted value and the independent variables is defined in

$$g\bigl(E(Y)\bigr) = X\beta, \qquad Y \sim F,$$

where $Y$ is the dependent variable, $g$ is the link function, $X$ is a set of independent variables, $\beta$ represents the slope coefficients, and $F$ is the distribution of $Y$. The procedures of GLM for predicting taxi-out time are as follows.

*Step 1.* Input the training dataset of historical taxi-out time and the corresponding factors; check the distribution of $Y$.

*Step 2.* Choose the link function $g$ according to the distribution of $Y$.

*Step 3.* Build the regression model between $Y$ and $X$, calculate the estimated value of the regression parameters $\hat{\beta}$, and implement the significance test.

*Step 4.* Predict the taxi-out time by using the factors in the test dataset.
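The four steps can be sketched with a small pure-Python example. Everything specific here is an assumption made for illustration: a log link, Poisson-type variance weights, a single explanatory factor, and synthetic noise-free data. The text does not say which link function or distribution was chosen for the PEK data, and `fit_log_link_glm` is a hypothetical helper fitted by iteratively reweighted least squares.

```python
import math


def fit_log_link_glm(xs, ys, n_iter=25):
    """Fit E[y] = exp(b0 + b1 * x) by iteratively reweighted least squares."""
    # Start from a constant model whose mean matches the data.
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(n_iter):
        eta = [b0 + b1 * x for x in xs]               # linear predictor
        mu = [math.exp(e) for e in eta]               # inverse link
        w = mu                                        # Poisson-type weights (assumed)
        z = [e + (y - m) / m for e, y, m in zip(eta, ys, mu)]  # working response
        # Weighted least squares of z on (1, x), solved in closed form.
        sw = sum(w)
        swx = sum(wi * x for wi, x in zip(w, xs))
        swxx = sum(wi * x * x for wi, x in zip(w, xs))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * x * zi for wi, x, zi in zip(w, xs, z))
        det = sw * swxx - swx * swx
        b1 = (sw * swxz - swx * swz) / det
        b0 = (swz - b1 * swx) / sw
    return b0, b1


def predict(b0, b1, x):
    """Step 4: apply the inverse link to the linear predictor."""
    return math.exp(b0 + b1 * x)


# Synthetic training data: x stands for a scaled factor such as queue length.
queue = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
taxi_out = [math.exp(1.0 + 0.5 * q) for q in queue]

b0, b1 = fit_log_link_glm(queue, taxi_out)
print(round(b0, 4), round(b1, 4))
```

Because the synthetic data are generated without noise from the same model family, the fit recovers the generating coefficients; with real data, Step 1 (checking the distribution of the response) decides which link and weights are appropriate.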

##### 3.2. Softmax Regression Model

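The Softmax Regression (SR) is a generalization of logistic regression capable of handling multiclass problems, that is, admitting more than two possible discrete outcomes [19]. The algorithm includes a training phase for estimating the regressors and a testing phase for estimating the probability of each class for every feature vector, from which the class labels are inferred. The SR assigns each sample to a class by calculating the probabilities

$$P\bigl(y^{(i)} = j \mid x^{(i)}; \theta\bigr) = \frac{\exp\bigl(\theta_j^{T} x^{(i)}\bigr)}{\sum_{l=1}^{k} \exp\bigl(\theta_l^{T} x^{(i)}\bigr)}, \qquad j = 1, \ldots, k,$$

and the model parameters are trained to minimize the cost function

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} 1\{y^{(i)} = j\} \log \frac{\exp\bigl(\theta_j^{T} x^{(i)}\bigr)}{\sum_{l=1}^{k} \exp\bigl(\theta_l^{T} x^{(i)}\bigr)},$$

where $k$ is the number of classes, $\theta_1, \ldots, \theta_k$ are the parameters of the SR model, $\theta$ is an $n$-by-$k$ matrix, and $1\{\cdot\}$ is an indicator function. Here $m$ denotes the number of training samples and $n$ the dimension of the feature vector $x^{(i)}$. SR predicts the taxi-out time with the following procedures.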

*Step 1.* Input the training dataset of historical taxi-out time and the corresponding factors; determine the number of taxi-out time classes $k$.

*Step 2.* Build the exponential distribution family by running a set of independent binary regressions according to the factor vectors of each taxi-out time class; obtain the likelihood function $L(\theta)$ to be maximized.

*Step 3.* Establish and minimize the cost function $J(\theta)$ to obtain the optimal parameter $\theta$ by using the gradient descent method.

*Step 4.* Update the likelihood function with the optimal $\theta$, and predict the taxi-out time of the test set by using the resulting class probabilities.
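A minimal sketch of these steps follows, assuming a single scaled factor, three taxi-out classes (short, medium, long), and plain batch gradient descent. The data, the learning rate, and all function names are hypothetical and chosen only so the example is small enough to read.

```python
import math


def softmax(logits):
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(theta, x):
    """theta[j] = [bias_j, weight_j]; x is a single scalar feature."""
    return softmax([t[0] + t[1] * x for t in theta])


def cost(theta, xs, labels):
    """Average negative log-likelihood of the true classes."""
    return -sum(math.log(class_probabilities(theta, x)[y])
                for x, y in zip(xs, labels)) / len(xs)


def train_softmax(xs, labels, k, lr=1.0, n_iter=2000):
    """Step 3: minimise the cost by batch gradient descent."""
    theta = [[0.0, 0.0] for _ in range(k)]
    m = len(xs)
    for _ in range(n_iter):
        grad = [[0.0, 0.0] for _ in range(k)]
        for x, y in zip(xs, labels):
            p = class_probabilities(theta, x)
            for j in range(k):
                err = p[j] - (1.0 if y == j else 0.0)
                grad[j][0] += err / m
                grad[j][1] += err * x / m
        for j in range(k):
            theta[j][0] -= lr * grad[j][0]
            theta[j][1] -= lr * grad[j][1]
    return theta


def predict_class(theta, x):
    """Step 4: pick the class with the highest probability."""
    p = class_probabilities(theta, x)
    return p.index(max(p))


# Invented data: one scaled factor, classes 0/1/2 = short/medium/long taxi-out.
xs = [-1.0, -0.9, -0.8, -0.1, 0.0, 0.1, 0.8, 0.9, 1.0]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
theta = train_softmax(xs, labels, k=3)
print([predict_class(theta, x) for x in (-0.9, 0.0, 0.9)])
```

Note that SR returns a class rather than a continuous value, so the taxi-out times have to be binned before training, which is what Step 1 prepares.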

##### 3.3. Artificial Neural Network

Artificial Neural Network (ANN) is a machine-learning method based on a large collection of connected simple units called artificial neurons. The Back-Propagation Neural Network (BPNN), a multilayer feedforward network trained by the error back-propagation algorithm, is one of the most widely used neural network models. Its topology includes an input layer, a hidden layer, and an output layer. In the output layer, the activation of a neuron is determined by

$$a_j = f\Bigl(\sum_{i \in P_j} w_{ij}\, o_i\Bigr),$$

where $a_j$ is the activation of the $j$th neuron, $P_j$ is the set of neurons in the preceding layer, $w_{ij}$ is the weight of the connection between neurons $i$ and $j$, $o_i$ is the output of neuron $i$, and $f$ is the sigmoid function. The BPNN model learns from the parameter set of taxi-out time and calculates the actual output during prediction. If the error between the actual output and the expected output does not meet the accuracy requirements, the learning rule of the BPNN adjusts the weights and thresholds to reduce the error until the accuracy requirements are satisfied. The learning process of the BPNN approach can be summarized in the following steps.

*Step 1.* Initialize the neural network; define the minimum MSE error ($E_{\min}$) and the maximum number of iterations.

*Step 2.* Input the training set; initialize the weight matrix $W$.

*Step 3.* Compute the layer response output and the calculated MSE.

*Step 4.* Compare the calculated MSE with $E_{\min}$; if the calculated MSE > $E_{\min}$, continue; else go to Step 6.

*Step 5.* Calculate the change in weights and update the weights; go to Step 3.

*Step 6.* Finish training and predict the taxi-out time by using the ANN with the test set.
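The loop in Steps 1 to 6 can be sketched as a tiny one-input network with one hidden layer. This is an illustrative assumption, not the network used in the study: the layer sizes, learning rate, stopping threshold, and the scaled synthetic data are all invented, and the targets are scaled into (0, 1) because the output neuron is a sigmoid.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(net, x):
    """Return hidden activations and the network output for one input."""
    w1, b1, w2, b2 = net
    hidden = [sigmoid(w * x + b) for w, b in zip(w1, b1)]
    out = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, out


def train_bpnn(xs, ys, n_hidden=3, lr=0.5, mse_min=1e-4, max_iter=5000, seed=0):
    rng = random.Random(seed)
    # Steps 1-2: initialise weights and thresholds with small random values.
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = rng.uniform(-0.5, 0.5)
    n = len(xs)
    history = []
    for _ in range(max_iter):
        g_w1, g_b1, g_w2 = [0.0] * n_hidden, [0.0] * n_hidden, [0.0] * n_hidden
        g_b2, sq_err = 0.0, 0.0
        for x, y in zip(xs, ys):
            # Step 3: layer responses and squared error.
            hidden, out = forward((w1, b1, w2, b2), x)
            err = out - y
            sq_err += err * err
            d_out = err * out * (1.0 - out)  # output-layer delta
            for k in range(n_hidden):
                d_hid = d_out * w2[k] * hidden[k] * (1.0 - hidden[k])
                g_w2[k] += d_out * hidden[k]
                g_w1[k] += d_hid * x
                g_b1[k] += d_hid
            g_b2 += d_out
        mse = sq_err / n
        history.append(mse)
        # Step 4: stop once the error target is met.
        if mse <= mse_min:
            break
        # Step 5: update (the constant factor 2 of the MSE gradient is folded into lr).
        for k in range(n_hidden):
            w1[k] -= lr * g_w1[k] / n
            b1[k] -= lr * g_b1[k] / n
            w2[k] -= lr * g_w2[k] / n
        b2 -= lr * g_b2 / n
    return (w1, b1, w2, b2), history


# Invented, scaled data: input factor in [0, 1], target taxi-out in (0, 1).
xs = [i / 10.0 for i in range(11)]
ys = [0.2 + 0.6 * x for x in xs]
net, history = train_bpnn(xs, ys)
# Step 6: predict with the trained network.
print(history[0], history[-1], forward(net, 0.5)[1])
```

The printed pair of MSE values shows the training error before and after learning; in practice the stopping threshold and iteration cap from Step 1 trade accuracy against training time.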

##### 3.4. Improved Swarm Intelligence Algorithm Based Prediction Approaches

###### 3.4.1. Support Vector Regression

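The Support Vector Regression technique is a nonlinear regression forecasting method. The basic idea is mapping the input variables into a high-dimensional linear feature space (Hilbert space), commonly through a kernel function. The Gaussian Radial Basis Function (RBF) kernel $K(x_i, x_j) = \exp\bigl(-\|x_i - x_j\|^2 / (2\sigma^2)\bigr)$ is a commonly used kernel function, where $\sigma$ is the parameter to be optimized. In this higher dimensional space, the training data can be approximated by a linear function. Then, the global optimal solution is obtained by training on the finite sample. The regression function for SVR is

$$f(x) = w \cdot \varphi(x) + b,$$

where $w$ is the weight vector, $b$ is the bias, and inner products of the mapping $\varphi(\cdot)$ can be replaced by the kernel function $K(x_i, x_j)$. In $\varepsilon$-SVR, the objective is to find a function $f(x)$ whose deviations from the training outputs are less than or equal to $\varepsilon$. The $\varepsilon$-value controls the complexity of the approximating functions: small values tend to penalize a large portion of the training data, leading to tight approximating models, and large values tend to free data from penalization, leading to loose approximating models. Therefore, the proper choice of the $\varepsilon$-value is critical for the generalization of regression models [22]. The optimal regression function is determined from the estimation of $w$ and $b$ by solving the following optimization problem:

$$\min_{w,\,b,\,\xi,\,\xi^{*}} \ \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{n} \bigl(\xi_i + \xi_i^{*}\bigr) \quad \text{s.t.} \quad \begin{cases} y_i - w \cdot \varphi(x_i) - b \le \varepsilon + \xi_i, \\ w \cdot \varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\ \xi_i,\ \xi_i^{*} \ge 0, \quad i = 1, \ldots, n, \end{cases}$$

where $n$ is the number of training samples, $\xi_i, \xi_i^{*}$ are the slack variables that are introduced to penalize deviations larger than $\varepsilon$, and the constant $C$ allows for the penalizing of the error by determining the tradeoff between the training error and the model complexity. The dual problem is to maximize

$$W(\alpha, \alpha^{*}) = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\, \varphi(x_i) \cdot \varphi(x_j) - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^{*}) \tag{7}$$

subject to $\sum_{i=1}^{n} (\alpha_i - \alpha_i^{*}) = 0$ and $0 \le \alpha_i, \alpha_i^{*} \le C$, where $\alpha_i$ and $\alpha_i^{*}$ are the Lagrange multipliers.

The nonlinear regression function is

$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*})\, \varphi(x_i) \cdot \varphi(x) + b.$$

The kernel function $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$ avoids computing the complex dot product in the high-dimensional feature space into which the input variables are mapped. Thus, (7) can be written as

$$W(\alpha, \alpha^{*}) = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\, K(x_i, x_j) - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^{*}),$$

and the regression function becomes $f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*})\, K(x_i, x) + b$.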
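To make the roles of the kernel, the $\varepsilon$-tube, and the dual coefficients concrete, the sketch below evaluates a kernelized regression function for hand-picked values. It does not solve the dual problem: the support vectors, the coefficients $(\alpha_i - \alpha_i^{*})$, the bias, and the kernel width are hypothetical numbers, not the output of a trained model.

```python
import math


def rbf_kernel(a, b, sigma):
    """Gaussian RBF kernel exp(-||a - b||^2 / (2 sigma^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the eps-tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def svr_predict(x, support_vectors, dual_coefs, b, sigma):
    """f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b."""
    return sum(c * rbf_kernel(sv, x, sigma)
               for sv, c in zip(support_vectors, dual_coefs)) + b


# Hypothetical, hand-picked values (not the result of solving the dual).
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [1.5, -1.5]   # alpha_i - alpha_i*, chosen to sum to zero
b = 10.0
sigma = 1.0

pred = svr_predict([0.0, 0.0], support_vectors, dual_coefs, b, sigma)
print(pred)
print(eps_insensitive_loss(10.0, 10.05, eps=0.1))  # inside the tube
print(eps_insensitive_loss(10.0, 10.3, eps=0.1))   # outside the tube
```

A small $\sigma$ makes each support vector influence only its immediate neighbourhood, while a large $\sigma$ smooths the prediction, which is why the kernel width is tuned together with $C$ and $\varepsilon$.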

###### 3.4.2. Particle Swarm Optimization

The Particle Swarm Optimization is a swarm intelligence algorithm developed in recent years. It is a metaheuristic global optimization method based on a social-behaviour analogy, such as birds flocking and fish schooling. The PSO method solves an optimization problem by moving particles (namely, candidate solutions) through the search space according to simple mathematical formulae for their velocities and positions. Each particle is drawn towards its own best-known position and towards the best position found by its neighbours or by the whole swarm. Thus, in searching for the optimal solution of the problem, the velocity and position of each particle are updated with the following equations of motion:

$$v_i^{t+1} = \omega v_i^{t} + c_1 r_1 \bigl(p_i - x_i^{t}\bigr) + c_2 r_2 \bigl(p_g - x_i^{t}\bigr), \qquad x_i^{t+1} = x_i^{t} + v_i^{t+1},$$

where $v_i^{t+1}$ is the updated velocity for the $i$th particle, $\omega$ is the inertia weight, $c_1$ and $c_2$ are the weighting coefficients for the personal best and global best positions, respectively, $x_i^{t}$ is the $i$th particle’s position at time $t$, $p_i$ is the $i$th particle’s best known position, $p_g$ is the best position known to the swarm, and $r_1$ and $r_2$ are uniformly distributed random variables in $[0, 1]$. Variants of this update equation consider best positions within a particle’s local neighbourhood at time $t$.
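The velocity and position update described here can be sketched as a small global-best PSO. The inertia weight, acceleration coefficients, swarm size, bound clamping, and the sphere test function are illustrative assumptions and are not taken from the study.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in pos]                   # personal best positions
    p_best_val = [objective(p) for p in pos]
    g_idx = min(range(n_particles), key=lambda i: p_best_val[i])
    g_best, g_best_val = p_best[g_idx][:], p_best_val[g_idx]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (p_best[i][d] - pos[i][d])
                             + c2 * r2 * (g_best[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < p_best_val[i]:
                p_best[i], p_best_val[i] = pos[i][:], val
                if val < g_best_val:
                    g_best, g_best_val = pos[i][:], val
    return g_best, g_best_val


def sphere(p):
    """Simple convex test function with its minimum at the origin."""
    return sum(v * v for v in p)


best_pos, best_val = pso_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_pos, best_val)
```

When PSO is used to tune SVR, the objective would be a validation error as a function of the SVR parameters rather than this test function.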

###### 3.4.3. Improved Firefly Algorithm Optimization

The Firefly Algorithm (FA), as a new group bionic optimization algorithm, has high efficiency in solving numerous optimization problems and can outperform conventional algorithms, such as GA. In this algorithm, the fireflies are attracted to each other depending on the two elements: their own brightness and attraction. The brightness depends on the location and the target value, and the higher the brightness, the better the location. Fireflies with higher brightness at the same time have a higher degree of attraction. Low-brightness fireflies in the field of vision are attracted by high-brightness fireflies. Fireflies would move randomly if they had similar fluorescent brightness.

Regarding the brightness as the objective function, the optimization problem can be seen as a maximization problem. The attractiveness of the fireflies is proportional to the fluorescence intensity of the nearby fireflies and is inversely proportional to the distance. Define the relative fluorescence brightness of the fireflies as $I$ and the attractiveness as $\beta$. The distance between fireflies $i$ and $j$ is $r_{ij}$. Firefly $i$ is attracted by firefly $j$ to update the location; the location update equation is

$$x_i^{t+1} = x_i^{t} + \beta_0 e^{-\gamma r_{ij}^{2}} \bigl(x_j^{t} - x_i^{t}\bigr) + \alpha \cdot \mathrm{rand},$$

where $\gamma$ is the absorption coefficient, $\beta_0$ is the attractiveness when $r_{ij} = 0$, $\alpha$ is the step factor for determining random firefly movement, and rand is a random number drawn from a Gaussian distribution.
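A minimal sketch of the basic firefly move follows, under stated assumptions: brightness is taken as the negated cost so that the problem reads as a maximization, the random term is a standard normal draw scaled by a constant step factor, positions are clamped to the bounds, and a firefly only moves when a brighter one exists. Parameter values and the test function are invented.

```python
import math
import random


def sphere(p):
    return sum(v * v for v in p)


def firefly_minimize(objective, bounds, n_fireflies=15, n_iter=50,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fireflies)]
    # Brightness is the negated cost, so brighter means better.
    light = [-objective(p) for p in pos]
    best_history = [max(light)]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] > light[i]:  # i is attracted by the brighter j
                    r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    for d in range(dim):
                        lo, hi = bounds[d]
                        step = (beta * (pos[j][d] - pos[i][d])
                                + alpha * rng.gauss(0.0, 1.0))
                        pos[i][d] = min(max(pos[i][d] + step, lo), hi)
                    light[i] = -objective(pos[i])
        best_history.append(max(light))
    best = max(range(n_fireflies), key=lambda i: light[i])
    return pos[best], -light[best], best_history


best_pos, best_cost, best_history = firefly_minimize(
    sphere, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_pos, best_cost)
```

Because the brightest firefly never moves in this variant, the best brightness in the population cannot decrease from one iteration to the next; the constant step factor, however, keeps the swarm jittering, which is the weakness that an adaptive step factor is meant to address.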

*Adaptive Step Factor.* The value of the step factor affects the global and local search ability of the algorithm. A large step factor benefits the efficiency of the global search and thus the convergence of the optimization algorithm in early iterations. As the number of iterations increases, gradually reducing the step factor is more conducive to fine tuning within the search space. Thus a monotonically decreasing function is chosen as the step factor, which is written as

$$\alpha(t) = \alpha_0\, \delta^{t},$$

where $\alpha_0$ is the initial coefficient of the step factor, $\delta$ is the controlling parameter, empirically selected as 0.9, and $t$ is the number of iterations.

*Lévy Flight.* The conventional FA optimization uses a regular random movement in stochastic optimization. This often leads to premature convergence without reaching the global optimal solution when the problem has a large number of local optimal solutions. In order to reduce the probability that the optimization process falls into a local optimal solution, this paper adopts Lévy flight when updating the distance of fireflies. Lévy flight is a random walk whose step length obeys the Lévy distribution, which is a distribution of a sum of identically and independently distributed random variables. Its Fourier transform is $F(k) = \exp\bigl(-c|k|^{\mu}\bigr)$, $0 < \mu \le 2$, where $c$ is a scale parameter. The step lengths follow the Lévy distribution $L(s) \sim |s|^{-1-\mu}$, where $\mu$ is an index, and therefore follow a power-law distribution. With $\lambda = 1 + \mu$, the distribution has an infinite variance following

$$\text{Lévy} \sim u = t^{-\lambda}, \qquad 1 < \lambda \le 3.$$

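Thus, by replacing the original step factor and random walk with the adaptive step factor and Lévy flight, respectively, the new update equation of IFA is written as

$$x_i^{t+1} = x_i^{t} + \beta_0 e^{-\gamma r_{ij}^{2}} \bigl(x_j^{t} - x_i^{t}\bigr) + \alpha(t) \oplus \text{Lévy}(\lambda),$$

where the symbol $\oplus$ denotes entry-wise multiplication.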
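The two modifications can be sketched as follows. Several choices here are assumptions rather than the authors' implementation: the step factor is taken to decay geometrically with the iteration count, the Lévy steps are drawn with Mantegna's algorithm (a common way to sample heavy-tailed steps, not mentioned in the text), and the index 1.5 and all other parameter values are illustrative.

```python
import math
import random


def adaptive_step(t, alpha0=0.5, delta=0.9):
    """Assumed geometric decay of the step factor: alpha0 * delta**t."""
    return alpha0 * delta ** t


def mantegna_sigma(mu):
    """Scale of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1.0 + mu) * math.sin(math.pi * mu / 2.0)
    den = math.gamma((1.0 + mu) / 2.0) * mu * 2.0 ** ((mu - 1.0) / 2.0)
    return (num / den) ** (1.0 / mu)


def levy_step(rng, mu=1.5):
    """One heavy-tailed step whose tail decays like |s|**(-1 - mu)."""
    u = rng.gauss(0.0, mantegna_sigma(mu))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1.0 / mu)


def ifa_move(x_i, x_j, t, rng, beta0=1.0, gamma=1.0):
    """Move firefly i towards the brighter firefly j at iteration t."""
    r2 = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    beta = beta0 * math.exp(-gamma * r2)
    a_t = adaptive_step(t)
    # Entry-wise: each coordinate gets its own Levy-distributed perturbation.
    return [xi + beta * (xj - xi) + a_t * levy_step(rng)
            for xi, xj in zip(x_i, x_j)]


rng = random.Random(0)
moved = ifa_move([0.0, 0.0], [1.0, 1.0], t=5, rng=rng)
print(adaptive_step(0), adaptive_step(20))
print(moved)
```

Most Lévy steps are small, but occasional long jumps let a firefly leave a local optimum, while the shrinking step factor damps those jumps in late iterations.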

###### 3.4.4. PSO/IFA Based Support Vector Regression

In this study, identifying the optimal parameters of the SVR model is an optimization problem. Therefore, this study combines swarm intelligence algorithms with SVR to reduce prediction errors. Considering that the number of samples in the learning data is much larger than the number of feature dimensions, the input variables are mapped into Hilbert space through the RBF kernel, which is more promising than other kernels. In order to predict departure taxi-out time more accurately, the SVR models require the penalty factor $C$, the RBF kernel parameter $\sigma$, and the $\varepsilon$-value to be determined in advance, by using PSO and IFA optimization, respectively. An inappropriate $C$ would affect the tradeoff between training error and model complexity; $\sigma$ defines the nonlinear mapping from the input space to Hilbert space, so an inappropriate value would induce overfitting or underfitting; and the $\varepsilon$-value controls the complexity of the approximating functions. The flowchart of the PSO/IFA-based SVR prediction model is shown in Figure 1.
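One way to connect a swarm optimizer to SVR is to let each particle or firefly encode the three parameters and to use a validation error as its cost. The sketch below shows only that glue code. The search ranges, the log-scale encoding, and the names `decode` and `make_fitness` are assumptions, and `toy_validation_error` is a stand-in for actually training an SVR and measuring its error.

```python
import math

# Assumed search ranges (illustrative, not taken from the study).
BOUNDS = {"C": (1e-1, 1e3), "sigma": (1e-2, 1e1), "epsilon": (1e-3, 1.0)}


def decode(position):
    """Map a point of the unit cube to (C, sigma, epsilon) on a log scale."""
    params = {}
    for u, (name, (lo, hi)) in zip(position, BOUNDS.items()):
        u = min(max(u, 0.0), 1.0)  # keep the swarm inside the search box
        log_lo, log_hi = math.log10(lo), math.log10(hi)
        params[name] = 10.0 ** (log_lo + u * (log_hi - log_lo))
    return params


def make_fitness(train_and_validate):
    """Wrap a user-supplied SVR trainer into a cost for PSO/IFA to minimise."""
    def fitness(position):
        return train_and_validate(**decode(position))
    return fitness


def toy_validation_error(C, sigma, epsilon):
    """Stand-in for 'train SVR, return validation error' (hypothetical)."""
    return ((math.log10(C) - 1.0) ** 2
            + math.log10(sigma) ** 2
            + (math.log10(epsilon) + 1.0) ** 2)


fitness = make_fitness(toy_validation_error)
print(decode([0.5, 0.5, 0.5]))
print(fitness([0.5, 2.0 / 3.0, 2.0 / 3.0]))
```

Searching on a log scale is a design choice that gives equal attention to each order of magnitude of $C$, $\sigma$, and $\varepsilon$; replacing the stand-in with a real training routine turns `fitness` into the objective that either optimizer minimizes.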