Journal of Advanced Transportation

Volume 2019, Article ID 3525912, 11 pages

https://doi.org/10.1155/2019/3525912

## Study of Flight Departure Delay and Causal Factor Using Spatial Analysis

¹School of Transportation Science and Technology, Harbin Institute of Technology, Harbin 150001, China

²State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150001, China

³The Second Research Institute of Civil Aviation Administration of China, Chengdu 610041, China

Correspondence should be addressed to Siqi Hao; siqihao47@163.com

Received 15 February 2019; Revised 5 May 2019; Accepted 23 May 2019; Published 4 June 2019

Academic Editor: Eneko Osaba

Copyright © 2019 Shaowu Cheng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Analysis of flight delay and its causal factors is crucial for maintaining airspace efficiency and safety. However, delay samples are not independent, since they typically show a certain aggregation pattern. This study therefore develops a novel spatial analysis approach to explore delay and its causal factors that accounts for this dependence and for the problems it raises, namely error correlation and the lag effect of causal factors on delay. The study first explores the delay aggregation pattern by measuring and quantifying the spatial dependence of delay. The spatial error model (SEM) and the spatial lag model (SLM) are then established to address the error correlation and the variable lag effect, respectively. Results show that the SEM and SLM achieve a better fit than ordinary least squares (OLS) regression, which indicates the effectiveness of accounting for dependence through spatial analysis. Moreover, the outcomes suggest that, in addition to the well-known weather and flow control factors, delay-reduction strategies should pay more attention to reducing the impact of delay at the previous airport.

#### 1. Introduction

With the rapid development of the civil aviation industry, airspace has become increasingly crowded. This crowding causes increasingly frequent delays at most major airports worldwide, seriously affecting airports, airlines, and passengers. From 2007 to 2017, the annual number of flights in China increased steadily from 3.65 million to 10.83 million, with an average annual growth rate of approximately 12.2% over the last five years of that period. Meanwhile, the rate of flights arriving on time decreased from 83.19% in 2007 to 71.67% in 2017. The annual cost of flight delays in China was estimated to exceed $7.4 billion. Such high economic costs call for delay causal factor analysis and delay-reduction strategies.

Several approaches have been taken to analyze the factors that affect flight arrival and departure delay. Allan et al. [1] studied several determining causes of flight delay at Newark International Airport (EWR) using a comprehensive approach. The results show that adverse weather, low ceilings, and low visibility strongly influence flight delays. Similarly, Asfe et al. [2] investigated the major causal factors of flight delays by ranking different factors using the analytic hierarchy process. They found technical failure and delayed entries to be two of the most influential factors. Building on the identification of causal factors, further research explored the quantitative effect of each factor on flight delay. By constructing probability density functions to characterize flight departure and arrival delays, Mueller et al. [3] explored several causal factors of delay, such as traffic volume, aircraft type, aircraft maintenance, airline operations, weather conditions, en route procedure changes, capacity constraints, customer service issues, and late aircraft or crew arrival. The results show that weather contributed to 69% of the delays. Different methods and variables can yield different results: Kwan and Hansen [4] established a series of econometric models to identify the key causal factors of flight delays, including airport congestion, total traffic, and en route weather, and found that airport congestion contributed approximately 32% of the average delays. Beyond identifying causal factors and quantifying their effects, more studies focus on developing models that determine the probability of aircraft delay. Wesonga et al. [5] proposed and evaluated a multiple parametric approach, which includes the apparently significant meteorological and aviation parameters, to predict the probability of aircraft delay. Recent research in delay probability prediction has developed asymmetric Bayesian logit models to account for the asymmetric distribution of the dependent variable (see Perez-Rodriguez et al. [6]). Using data from BTS and IATA, that study corroborated the necessity and superiority of the asymmetric Bayesian logit model and identified new significant factors affecting the probability of arrival delay.

In addition to traditional statistical methods, several studies have used machine learning algorithms. Bayesian networks are commonly used to build delay models that explore delay propagation and estimate delay [7, 8]. Artificial neural networks have also been used to examine the relationship between departure delay and different causal factors, in comparison with linear and nonlinear regressions [9]. Deep learning models have also been investigated for air traffic delay prediction [10]. Moreover, a number of studies attempted to determine the major causal factors of flight delays by detecting trends in time series data. Abdel-Aty et al. [11] applied the “two-stage approach” to detect periods of regularly repeating patterns in their data and to identify the factors correlated with them. Tu et al. [12] employed a smoothing spline model to identify the relationship between seasonal trends, random effects, and daily delay propagation patterns; in that study, the effects of day and time were assumed to be additive, and the residuals were assumed to be independently and identically distributed. Delay propagation has also been investigated in depth to help understand air congestion [13–15] and alleviate flight delay [16, 17].

However, delays show a certain aggregation pattern in the temporal dimension: high delays are normally clustered, and low delays tend to be surrounded by low delays. In other words, delays that are close to each other are normally more similar in value than delays that are far apart. The correlation between two delay values thus depends on their spatial attributes, such as spatial location and spatial distance, and there is consequently a high degree of spatial dependence among delays in a space organized by hour and by day. Because most of the aforementioned methods rest on assumptions that either ignore or simplify the correlation between samples in the dataset, Diana [18] first introduced spatial analysis for delay prediction, which can take spatial dependence in every direction into account. In that study, delay was treated as a spatially distributed variable in a space with day and time as coordinates, and a spatial error model (SEM) was built to account for spatial dependence in the error.

In practice, flight departure delay is a complex problem with many direct causal factors and many hidden indirect ones. Flight departure delay is caused not only by the abovementioned factors but also by earlier flight delays [12], because the operational resources required by the current flight, such as the crew, aircraft, and passenger gates, may have been occupied by previously delayed flights. This shared use of resources can propagate delay through the day. Because aggregation is observed along both the day-of-week and the hour-of-day dimensions, spatial dependence exists in every direction, which probably leads to error correlation and a lag effect of causal factors on delay [19].

Motivated by the need to explore the main causal factors of flight departure delay while accounting for the correlation between delay samples, our study analyzes departure delay as a geographic rather than a purely statistical problem, treating delay as a spatially distributed variable organized by hour and by day. Causal factor analysis using spatial analysis allows for spatial dependence in the variables and thus handles sample correlation across hours and days simultaneously. Specifically, spatial regression models were built to absorb the spatial dependence of delay by adding a spatial term. The spatial lag model (SLM) and spatial error model (SEM) are established in our study to address the variable lag effect and the error correlation, respectively. The SLM, the SEM, and OLS estimation are also compared.

This paper is structured as follows. Section 2 introduces the spatial analysis methodology. Section 3 describes the data sources, defines the variables, and describes the data-processing methodology. Section 4.1 presents an exploratory analysis of flight departure delay with a distribution map and trend analysis. Section 4.2 identifies the delay pattern. Section 4.3 models the semivariogram to quantify the spatial dependence of flight departure delay. Based on the results of Section 4.3, Section 4.4 presents spatial prediction that accounts for spatial autocorrelation using ordinary kriging. Section 4.5 establishes the classical regression model, the SEM, and the SLM and compares the three models to explore the main causal factors of flight departure delay. Finally, the paper concludes with a summary.

#### 2. Methods

This study employs spatial analysis to explore the distribution pattern and causal factors of flight departure delay while considering the spatial dependence of delay. Delay is assumed to be a spatially distributed variable. Spatial analysis is a quantitative technique for studying spatial variables [20].

##### 2.1. Delay Pattern Analysis

*(1) Exploring Delay Distribution*. The first step in analyzing the delay pattern is to explore its distribution. A space is defined with day of week and hour of day as coordinates, and the delay is attached to each hour unit as an attribute. The delay distribution can then be plotted with colors representing delay minutes, so that the times when intense delays occurred can be recognized in the distribution map. 3D trend analysis can be used to visualize the departure delay distribution and trend in the temporal dimension.
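
The aggregation into hour units can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: it assumes each flight record is a hypothetical `(day, hour, delay_minutes)` tuple, and it borrows the 7:00 to 24:00 window and the five-flight threshold described in Section 3.1. The helper names `build_hour_units` and `to_grid` are hypothetical.

```python
from collections import defaultdict


def build_hour_units(flights, min_flights=5, start_hour=7, end_hour=24):
    """Aggregate flight-level departure delays into (day, hour) units.

    flights: iterable of (day, hour, delay_minutes) tuples (hypothetical format).
    Returns {(day, hour): total_delay_minutes} for units with
    start_hour <= hour < end_hour that contain at least `min_flights` departures.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, hour, delay in flights:
        if start_hour <= hour < end_hour:  # keep only the study window
            totals[(day, hour)] += delay
            counts[(day, hour)] += 1
    # Drop sparsely used hour units, whose totals would be unreliable.
    return {unit: totals[unit] for unit in totals if counts[unit] >= min_flights}


def to_grid(units, days, hours):
    """Arrange unit totals as rows = hours, columns = days; None marks missing units."""
    return [[units.get((d, h)) for d in days] for h in hours]
```

The grid returned by `to_grid` is what a heat map of the distribution would color cell by cell; missing cells stay visibly empty instead of being plotted as zero delay.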

*(2) Identifying the Pattern of Delay.* The delay pattern is then identified by calculating Moran’s *I* and General *G* to measure the degree of spatial dependence of delay among observations. Positive autocorrelation suggests that the values of an hour unit and its neighbors are similar. Negative autocorrelation suggests that the values of an hour unit and its neighbors are dissimilar. No autocorrelation suggests that the values are randomly distributed over the space.

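Moran’s *I* is calculated as

$$I=\frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n}w_{ij}}\cdot\frac{\sum_{i=1}^{n}\sum_{j=1}^{n}w_{ij}\left(x_i-\bar{x}\right)\left(x_j-\bar{x}\right)}{\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2},$$

where $I$ is the value of Moran’s *I*, $x_i$ is the total minutes of departure delay during hour unit *i*, $\bar{x}$ is the mean of $x_i$ over the $n$ hour units, and $w_{ij}$ is the element of the spatial weight matrix linking hour units *i* and *j*. The *Z* value, $Z=\left(I-E[I]\right)/\sqrt{\operatorname{Var}(I)}$ with $E[I]=-1/(n-1)$, is generally used to test Moran’s *I*. The null hypothesis is that no spatial autocorrelation exists; a significant *Z* value rejects it.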
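
A minimal NumPy sketch of Moran’s *I* and its *Z* value is shown below. It is not the authors' implementation. The weights are raw inverse Euclidean distances between hour units on a (day, hour) grid, matching the paper's description of its weight matrix. The variance of *I* uses the normality assumption; the paper does not state which variance formula was used, so that choice is an assumption. The names `inverse_distance_weights` and `morans_i` are hypothetical.

```python
import numpy as np


def inverse_distance_weights(coords):
    """Raw inverse-Euclidean-distance weights w_ij = 1 / d_ij, with w_ii = 0."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = np.zeros_like(dist)
    off_diag = dist > 0
    w[off_diag] = 1.0 / dist[off_diag]
    return w


def morans_i(x, w):
    """Global Moran's I, its expectation, and a Z value (normality assumption)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    z = x - x.mean()                      # deviations from the mean delay
    s0 = w.sum()
    i_stat = (n / s0) * (z @ w @ z) / (z @ z)
    expected = -1.0 / (n - 1)             # E[I] under no autocorrelation
    s1 = 0.5 * ((w + w.T) ** 2).sum()
    s2 = ((w.sum(axis=1) + w.sum(axis=0)) ** 2).sum()
    variance = (n**2 * s1 - n * s2 + 3 * s0**2) / ((n**2 - 1) * s0**2) - expected**2
    return i_stat, expected, (i_stat - expected) / np.sqrt(variance)
```

On a grid where high-delay hours sit next to each other, `morans_i` returns a positive *I* with a large *Z*; an alternating (checkerboard) pattern yields a negative *I*.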

Most spatial weight matrices are built on spatial connectivity or spatial distance. The weight matrix in this study is based on the inverse Euclidean distance between two hour units. The value of Moran’s *I* generally ranges from −1 to 1. Moran’s *I* measures the similarity between neighboring hour units and hence the overall spatial pattern of delay, but it cannot distinguish high-value clusters from low-value clusters. General *G* distinguishes these two patterns of spatial clustering; it is computed as

$$G=\frac{\sum_{i=1}^{n}\sum_{j=1}^{n}w_{ij}x_ix_j}{\sum_{i=1}^{n}\sum_{j=1}^{n}x_ix_j},\quad j\neq i,$$

where $x_i$, $x_j$, and $w_{ij}$ are defined as for Moran’s *I*.

When *G* is significant, a value greater than its expected value $E[G]=\sum_{i=1}^{n}\sum_{j\neq i}w_{ij}/\left[n(n-1)\right]$ indicates a cluster of high values, whereas a value less than $E[G]$ indicates a cluster of low values. A value equal to its expectation indicates no autocorrelation.
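
General *G* and its expectation can be sketched the same way. Again this is an illustration under assumptions, not the authors' code: the weights are raw inverse Euclidean distances, the delay totals must be non-negative (as delay minutes are), and the significance test of *G* is omitted. The names `inverse_distance_weights` and `general_g` are hypothetical.

```python
import numpy as np


def inverse_distance_weights(coords):
    """Raw inverse-Euclidean-distance weights w_ij = 1 / d_ij, with w_ii = 0."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = np.zeros_like(dist)
    off_diag = dist > 0
    w[off_diag] = 1.0 / dist[off_diag]
    return w


def general_g(x, w):
    """Getis-Ord General G and its expected value under no spatial association."""
    x = np.asarray(x, dtype=float)
    n = x.size
    w = np.array(w, dtype=float)         # copy so the caller's matrix is untouched
    np.fill_diagonal(w, 0.0)             # both sums run over j != i
    pairs = np.outer(x, x)
    np.fill_diagonal(pairs, 0.0)
    g = (w * pairs).sum() / pairs.sum()
    expected = w.sum() / (n * (n - 1))
    return g, expected
```

If high delays are concentrated in a block of neighboring hour units, `general_g` returns *G* above its expectation (a hot spot); concentrating low delays there instead pushes *G* below its expectation (a cold spot).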

The cluster type of flight departure delay is then identified, and the hot and cold spots of flight departure delay are explored.

*(3) Quantifying Delay Spatial Dependence*. After measuring the degree of spatial dependence, the semivariogram is modeled to quantify the spatial dependence of departure delay and to analyze its random and structural properties. Departure delay is treated as a regionalized variable because it is correlated with the hour and the day. The structural property indicates autocorrelation between the departure delay at location $x$ and at location $x+h$, where $h$ is the distance from $x$. The semivariance is half the average squared difference in departure delay between pairs of hour units separated by a given interval [9]; it is computed as

$$\gamma(h)=\frac{1}{2N(h)}\sum_{i=1}^{N(h)}\left[Z(x_i)-Z(x_i+h)\right]^2,$$

where $Z(x_i)$ is the total minutes of departure delay at location $x_i$, $Z(x_i+h)$ is the total minutes of departure delay at the location at distance $h$ from $x_i$, and $N(h)$ is the number of pairs of locations separated by distance $h$.
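
The empirical semivariance above can be computed directly from the hour-unit coordinates. This sketch assumes pairs are grouped into lag classes of half-width `tol` around each requested lag, which is a hypothetical binning choice because the paper's lag classes are not given here. Fitting a theoretical model (for example, spherical or exponential) to the resulting points is a separate step not shown.

```python
import numpy as np


def empirical_semivariogram(coords, values, lags, tol=0.5):
    """gamma(h) = 1 / (2 N(h)) * sum of squared differences over pairs at distance ~h."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(values.size, k=1)       # each unordered pair counted once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2
    gammas, counts = [], []
    for h in lags:
        in_bin = np.abs(dist - h) <= tol           # pairs whose separation falls in this lag class
        n_h = int(in_bin.sum())                    # N(h)
        counts.append(n_h)
        gammas.append(sq_diff[in_bin].sum() / (2 * n_h) if n_h else np.nan)
    return np.array(gammas), np.array(counts)
```

For example, on a 4 × 4 grid whose values increase by one per day, the lag-1 class contains 24 adjacent pairs, half of which differ by one, so $\gamma(1)=12/(2\cdot 24)=0.25$.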

*(4) Delay Prediction*. Once the spatial dependence structure of a variable is determined, the measured data can be used to estimate the variable at unmeasured locations. This interpolation method is known as kriging. Based on unbiased estimation and the minimum variance principle, kriging weights the known samples according to the spatial dependence between each sample and the estimated point, as described by the statistical characteristics and spatial variation of the samples.
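
Ordinary kriging can be sketched by solving the standard system $\sum_j \lambda_j\gamma(x_i,x_j)+\mu=\gamma(x_i,x_0)$ with $\sum_j\lambda_j=1$. The exponential semivariogram and its `sill` and `rng` values below are hypothetical placeholders, not the model fitted in the paper; in practice they would come from fitting the empirical semivariogram.

```python
import numpy as np


def exponential_variogram(h, sill=1.0, rng=3.0):
    """Hypothetical exponential model gamma(h) = sill * (1 - exp(-h / rng)), no nugget."""
    return sill * (1.0 - np.exp(-np.asarray(h, dtype=float) / rng))


def ordinary_kriging(coords, values, target, variogram=exponential_variogram):
    """Ordinary kriging estimate, kriging variance, and weights at one target location."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = values.size
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = variogram(d)              # gamma between known samples
    a[n, n] = 0.0                         # Lagrange-multiplier corner
    d0 = np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1)
    b = np.ones(n + 1)                    # last entry enforces sum(weights) = 1
    b[:n] = variogram(d0)                 # gamma between samples and the target
    sol = np.linalg.solve(a, b)
    weights, mu = sol[:n], sol[n]
    estimate = weights @ values
    variance = weights @ b[:n] + mu       # ordinary kriging variance
    return estimate, variance, weights
```

Without a nugget term, kriging is an exact interpolator: at a sampled hour unit it returns the observed value with zero variance, and between samples the weights still sum to one.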

##### 2.2. Causal Factor Analysis

After the identification of delay dependence, causal factor analysis is performed using spatial analysis, which allows for spatial dependence in the variables. To explore the causal factors of flight departure delay, spatial econometric models were built to absorb the spatial dependence of delay by adding a spatial term, and the outcomes of the SLM, the SEM, and the classical regression model are compared.

*(1) Classical Regression Model*. A classical regression model can be written as

$$Y=X\beta+\varepsilon,$$

where $Y$ represents the total minutes of departure delay at the target airport and $X$ represents the factor variables. $\beta$ represents the effect of the independent variables on the dependent variable, and $\varepsilon$ is the random error term vector, which follows a normal distribution.
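
A minimal OLS sketch of this model follows. It assumes an intercept column is added to the factor variables (the paper's $X$ may already include one) and reports the Gaussian log-likelihood with the maximum-likelihood variance $e^\top e/n$, which is one common basis for comparing OLS fit with spatial models. The name `fit_ols` is hypothetical.

```python
import numpy as np


def fit_ols(y, x):
    """OLS fit of Y = X beta + eps with an added intercept column.

    Returns (beta_hat, standard_errors, log_likelihood); beta_hat[0] is the intercept.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(y.size), np.asarray(x, dtype=float)])
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2_unbiased = resid @ resid / (n - k)          # used for standard errors
    se = np.sqrt(np.diag(sigma2_unbiased * np.linalg.inv(X.T @ X)))
    sigma2_ml = resid @ resid / n                      # used for the likelihood
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2_ml) + 1.0)
    return beta, se, loglik
```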

*(2) SEM*. The SEM accounts for spatial dependence in the error terms by modeling the error term as spatially autoregressive. The SEM takes the following form:

$$Y=X\beta+u,\qquad u=\lambda Wu+\varepsilon,$$

where $Y$ is the total minutes of departure delay and $X$ is the factor variables. $\beta$ represents the effect of the independent variables on the dependent variable, $u$ is the random error term vector, $\lambda$ is the spatial error coefficient, $W$ is the spatial weight matrix of the error term, generated from the inverse Euclidean distance between two hour units, and $\varepsilon$ is the random error term vector, which follows a normal distribution.
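
The paper does not state how the SEM was estimated. As an assumption, this sketch uses concentrated maximum likelihood with a grid search over $\lambda$: for each candidate, the data are filtered by $(I-\lambda W)$, $\beta$ is estimated by least squares, and the log-likelihood includes the Jacobian term $\log\lvert I-\lambda W\rvert$. It also row-standardizes the inverse-distance weights, a common but not universal choice. The optional `max_dist` cutoff and the names `inverse_distance_weights` and `fit_sem` are hypothetical.

```python
import numpy as np


def inverse_distance_weights(coords, max_dist=None):
    """Row-standardized inverse-Euclidean-distance weights (hypothetical cutoff max_dist)."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = np.zeros_like(dist)
    keep = dist > 0
    if max_dist is not None:
        keep &= dist <= max_dist
    w[keep] = 1.0 / dist[keep]
    return w / w.sum(axis=1, keepdims=True)


def fit_sem(y, x, w, grid=np.linspace(-0.95, 0.95, 191)):
    """Concentrated ML fit of Y = X beta + u, u = lambda W u + eps, by grid search.

    Returns (log_likelihood, lambda_hat, beta_hat, sigma2_hat); beta_hat[0] is an intercept.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    X = np.column_stack([np.ones(n), np.asarray(x, dtype=float)])
    eye = np.eye(n)
    best = None
    for lam in grid:
        a = eye - lam * w                        # spatial filter (I - lambda W)
        sign, logdet = np.linalg.slogdet(a)      # Jacobian term log|I - lambda W|
        if sign <= 0:
            continue
        ys, xs = a @ y, a @ X                    # filtered data have i.i.d. errors
        beta, *_ = np.linalg.lstsq(xs, ys, rcond=None)
        e = ys - xs @ beta
        sigma2 = e @ e / n
        ll = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0) + logdet
        if best is None or ll > best[0]:
            best = (ll, lam, beta, sigma2)
    return best
```

For a few hundred hour units, recomputing the determinant at every grid point is affordable; larger problems usually precompute the eigenvalues of $W$ instead.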

*(3) SLM*. The SLM accounts for spatial autocorrelation in the delay variable by adding a spatially lagged dependent variable as an explanatory variable. The SLM takes the following form:

$$Y=\rho WY+X\beta+\varepsilon,$$

where $Y$ is the total minutes of departure delay and $X$ is the factor variables. $\beta$ represents the effect of the independent variables on the dependent variable; $W$ is the spatial weight matrix of the dependent variable, generated from the inverse Euclidean distance between two hour units; $\rho$ is the spatial regression coefficient, which reflects the effect of the delay in neighboring hours on the delay in a given hour $Y$; and $\varepsilon$ is the random error term vector, which follows a normal distribution.
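
The SLM can be sketched with the same concentrated-likelihood idea, under the same assumptions as for the SEM (estimation method and row standardization are not stated in the paper): for each candidate $\rho$, regress $(I-\rho W)Y$ on $X$ and add $\log\lvert I-\rho W\rvert$ to the log-likelihood. The names `inverse_distance_weights` and `fit_slm` and the `max_dist` cutoff are hypothetical.

```python
import numpy as np


def inverse_distance_weights(coords, max_dist=None):
    """Row-standardized inverse-Euclidean-distance weights (hypothetical cutoff max_dist)."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = np.zeros_like(dist)
    keep = dist > 0
    if max_dist is not None:
        keep &= dist <= max_dist
    w[keep] = 1.0 / dist[keep]
    return w / w.sum(axis=1, keepdims=True)


def fit_slm(y, x, w, grid=np.linspace(-0.95, 0.95, 191)):
    """Concentrated ML fit of Y = rho W Y + X beta + eps by grid search over rho.

    Returns (log_likelihood, rho_hat, beta_hat, sigma2_hat); beta_hat[0] is an intercept.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    X = np.column_stack([np.ones(n), np.asarray(x, dtype=float)])
    wy = w @ y                                   # spatially lagged delay W Y
    eye = np.eye(n)
    best = None
    for rho in grid:
        sign, logdet = np.linalg.slogdet(eye - rho * w)  # log|I - rho W|
        if sign <= 0:
            continue
        ys = y - rho * wy                        # (I - rho W) Y
        beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
        e = ys - X @ beta
        sigma2 = e @ e / n
        ll = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0) + logdet
        if best is None or ll > best[0]:
            best = (ll, rho, beta, sigma2)
    return best
```

Because of the feedback through $(I-\rho W)^{-1}$, the SLM coefficients $\beta$ are not directly the total effects of the factors: a change in one hour also spills over to neighboring hours.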

#### 3. Data Collection

##### 3.1. Data Sample

The data in this study were obtained from the database of an international hub airport in China for June 2016. To protect the privacy of the institution, the name of the airport is not revealed. In June 2016, 8788 flights departed from the target airport. Among them, 18 flights returned and 51 flights were canceled; 5357 flights (60.96%) were delayed by more than 15 minutes, 3180 flights (36.19%) by more than half an hour, 1528 flights (17.39%) by more than one hour, and 489 flights (5.56%) by more than two hours. The most severe delay lasted 888 minutes. Approximately 70% of the delays were within 60 minutes. The data are organized by day of week and hour of day. To demonstrate the spatial dependence of the delay distribution intuitively, the study treats delay as a spatially distributed variable. The space is defined with day of week as the x coordinate and hour of day as the y coordinate. Only 72 of the 8788 flights departed between 0:00 and 7:00, and hour units with fewer than five flights are excluded because such small samples could bias the average. The study area therefore covers 7:00 to 24:00 and includes a total of 510 hour units with departure delays.
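
The reported percentages can be checked with a few lines of arithmetic. The check shows that they reproduce when the denominator is all 8788 departures, so the 18 returned and 51 canceled flights are included in the base. The helper name `shares` is hypothetical.

```python
departed = 8788  # all June 2016 departures, including 18 returns and 51 cancellations
delayed_more_than = {15: 5357, 30: 3180, 60: 1528, 120: 489}  # threshold (min) -> flights


def shares(counts, total):
    """Percentage of all departures exceeding each delay threshold, rounded to 2 dp."""
    return {minutes: round(100 * c / total, 2) for minutes, c in counts.items()}


print(shares(delayed_more_than, departed))
# {15: 60.96, 30: 36.19, 60: 17.39, 120: 5.56}
```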

##### 3.2. Definitions of Variables

The first step of variable construction is to identify the factors affecting flight departure delay. The flight delay determinants considered in previous studies include weather, delay propagation, flight schedule, aircraft shortage, air route, aircraft type, flight order, air traffic flow, hub airport, airline solvency, airline profitability, airline load factor, airline load rate, and other factors [21, 22]. Chinese civil aviation classifies the causes of flight delay into the following factors. Technical failure includes technical failure at the target airport (T) and at the previous airport (TL). Weather refers to weather conditions at the target airport (W), at the previous airport (WL), at the destination airport (WD), and en route (WR). Control factors include flow control (CF) and route restriction (CR). Other factors include the airline (A), airport facility (F), passenger (P), and capacity allocation (D).

Nominal factors are then selected by calculating the frequency and effect of each factor in our dataset. In Table 1, the effect of each factor is measured by the average delay minutes it caused. As shown in Table 1, the flow control factor occurs significantly more often than the others, but the average delay it causes is lower. Conversely, weather at the target airport and technical failure at the previous airport occur significantly less often but cause high average delays.
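
The frequency and average delay per factor can be tabulated with a short sketch. It assumes each delayed flight carries one cause code (such as `CF` or `W`) in a hypothetical `(factor_code, delay_minutes)` record; the toy records in the usage below are illustrative only and are not the Table 1 data. The name `factor_summary` is hypothetical.

```python
from collections import defaultdict


def factor_summary(records):
    """Frequency, share, and mean delay for each delay-cause code.

    records: iterable of (factor_code, delay_minutes) pairs (hypothetical format).
    Returns {code: (count, share_of_delays, mean_delay_minutes)}.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for code, minutes in records:
        counts[code] += 1
        totals[code] += minutes
    n = sum(counts.values())
    return {code: (counts[code], counts[code] / n, totals[code] / counts[code]) for code in counts}


toy = [("CF", 20)] * 3 + [("W", 120)]  # illustrative records, not real data
summary = factor_summary(toy)
```

Sorting the result by mean delay and by count separately shows the pattern described above: a frequent factor can have a low average effect, while a rare factor can have a high one.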