Abstract

In this paper, we develop a route-traffic-based method for detecting community structures in airline networks. Our model is both an application and an extension of the Clauset-Newman-Moore (CNM) modularity maximization algorithm, in that we apply the CNM algorithm to large airline networks, and take both route distance and passenger volumes into account. Therefore, the relationships between airports are defined not only based on the topological structure of the network but also by a traffic-driven indicator. To illustrate our model, two case studies are presented: American Airlines and Southwest Airlines. Results show that the model is effective in exploring the characteristics of the network connections, including the detection of the most influential nodes and communities on the formation of different network structures. This information is important from an airline operation pattern perspective to identify the vulnerability of networks.

1. Introduction

Our world consists of many complex geographical networks, ranging from computer to social networks, to infrastructure or transport-related networks. Many researchers have attempted to unravel the properties associated with these complex networks [1]. In terms of transport, for instance, city streets [2], public transport networks [3, 4] and aviation networks [5, 6] have been investigated. With the discovery of small-world [7] and scale-free [8] properties in many natural and artificial networks, methods and techniques have emerged to improve the understanding of these complex networks. More recently, the study of community structures within networks has gained renewed attention. As defined by Girvan et al. [9] and Chen et al. [1], p. 890, “community structure refers to vertices that are gathered into several groups in which there is a higher density of edges within groups than among groups.” A schematic illustration of such a graph with communities is shown in Figure 1.

Communities, also often referred to as clusters or modules, are groups of vertices that share common properties and/or play similar roles within a network [10]. In other words, apart from the topological structure, the vertex properties made explicit in the communities are also important. This is because being able to detect these communities can help us to understand and utilize these networks more effectively. Moreover, it will allow us to discover hidden relations between vertices [11].

For this paper, we are interested in detecting community structures in airport and airline networks. Much of the existing work in this area is relevant to the development of our modelling approach. Gegov et al. [12], for instance, investigated community structure in a US airport network by considering both the topological properties and the volume of people traveling. Comparing the network structure with migration patterns, the identified relationships showed a clear overlap between US domestic air travel and migration. Guimerà et al. [13] identified communities in the worldwide air transport network, and demonstrated the multi-community structure of this worldwide network. Their analysis showed that the community structure cannot be explained exclusively based on geographical restraints, but that geo-political concerns should also be taken into account. Postorino and Versaci [14] proposed a fuzzy-based procedure to cluster airports using the geometric distance among airports as an intrinsic fuzzy variable. The result showed that this airport selection or classification is more appropriate compared to the classification provided within policy recommendations. Finally, Cong et al. [15] developed a spectral clustering algorithm to categorize airports by analyzing fluctuations in distance correlation. The results showed that there is one category of airports that controls the critical state of the network, and six airports in this group were found to be the most important airports in the Chinese air transport network.

However, few of the previous studies have considered the reality of networks, and communities were often only algorithmically defined [16]. In most networks, only distance is taken into account, and community detection is merely the end product of the algorithm. Without a clear definition of the network, the results can lead to conclusions of fuzzy communities and so-called unstable nodes [17]. Anomalies may arise especially in an air transport network, such as the mismatch between low degree and high betweenness of an individual node [18, 19]. Furthermore, the most connected cities are not necessarily the most central [13]. These anomalies can result in the miscalculation of some marginal vertices, which leads to inaccurate community detection and ignores the effect of these vertices on both internal and external communities.

Essentially, air transport networks are not only spatially constrained, in that the vertices are both strongly topologically coupled and spatially clustered, but their vertices can also be correlated in their properties. Barrat et al. [20] pointed to the clustering coefficient concept. Empirical research on the worldwide air transport network has shown the impact of edge weight on network structure. That is, even with the same topology, the related internal structures and hierarchies of networks would be different due to different weights. Moreover, the detection of important nodes would be affected in differently weighted networks. Techniques used in identifying node importance are distinguished from network metric-based selection and robustness-related node importance perspective, and can be applied to a weighted network.

Sun et al. [21] considered passengers and distance as network weight to investigate the robustness of the worldwide air transportation network. Based on the network metric, 12 different ‘attacking strategies’ were used to analyze airport importance by measuring the unaffected passengers with rerouting. Using different metrics (e.g., node degree, weighted betweenness centrality, size of giant component) not only allowed for the robustness of the network to be evaluated, but also allowed for the comparison of the importance of node connectivity [22, 23]. Motivated by Sun et al. [21], we hypothesized that not only would route distance affect the airport connection in terms of the network spatial aspect, but also that route traffic would affect the capability of a network in terms of passenger rerouting. Therefore, both traffic and distance were considered as network weight in this paper. We aim to investigate the importance of airports in their capability to guarantee more passengers within the shortest route.

Based on route traffic and distance, we propose an improvement of the Clauset-Newman-Moore (CNM) modularity maximization algorithm [1, 24] to detect communities in spatially constrained networks. A core community is not only more compact in space, but the nodes in the same community also have stronger flight correlations than those in a different community. The aim is to identify the regional features of network connections and to detect the most impacted nodes and communities in flight interaction, and how they affect the entire network.

This paper is organized as follows. In Section 2, we introduce the route-traffic-based community detection method, modified to account for route traffic and distance when considering two connected airports. Section 3 illustrates the application of our model using two case studies (American Airlines and Southwest Airlines). In Section 4, core communities and core airports of different airline networks are analyzed and we discuss the results in relation to different operation patterns in identifying the vulnerability of entire networks. Section 5 provides the conclusion and discusses directions for further research.

2. Methodology

The Clauset-Newman-Moore (CNM) modularity maximization algorithm formed the basis of our model for airport networks. We rely in part on the work by Chen et al. [1], and we modify their algorithm by defining the weights of the edges in the network as the function of the route-traffic correlation coefficient between two directly connected airports. The coefficient corresponds to the number of flights between the connected airports, such that the higher the number of flights, the greater the route-traffic correlation coefficient.

2.1. Modularity and CNM Algorithm

The CNM algorithm [24] is a community detection algorithm based on an agglomerative hierarchical clustering method, where groups of vertices are successively joined to form larger communities such that modularity gradually increases after communities are merged. The higher the value of the modularity, the better the clustering of the network. Therefore, the basic concept of the CNM algorithm is the concept of modularity, and the approach is devised to maximize the modularity of the network [25].

The modularity, as pointed out by Wang et al. [26], is based on the idea that a random graph is not expected to have a cluster structure. Therefore, the possible existence of clusters is revealed by comparing the actual density of the edges with the density that one would expect if the vertices of the graph were attached regardless of the community structure. A sound partition of a network should result in a considerably greater number of edges within communities than expected. The mathematical expression of modularity is [1], p. 894:

$$Q = \frac{1}{2m} \sum_{c \in C} \sum_{v, w \in c} \left[ A_{vw} - \frac{k_v k_w}{2m} \right] \tag{1}$$

where $c$ is a community, $C$ is the community set of the network, and $v$ and $w$ are nodes in the community $c$. $A_{vw}$ is an element of the adjacency matrix of the network. If there is an edge between $v$ and $w$, then $A_{vw} = 1$; otherwise, $A_{vw} = 0$. In addition, let $m$ be the total number of edges in the network, and $k_v$ be the degree of node $v$. The higher the value of $Q$, the better the community structure. Therefore, $Q$ is defined as a stop criterion for the community detection algorithm. To simplify the above expression, two additional variables are introduced:

$$e_{ij} = \frac{1}{2m} \sum_{v \in i} \sum_{w \in j} A_{vw}, \qquad a_i = \frac{1}{2m} \sum_{v \in i} k_v \tag{2}$$

where $e_{ij}$ is the fraction of edges that join vertices in community $i$ to vertices in community $j$, and $a_i$ is the fraction of edges that are attached to vertices in community $i$. Thus, in line with [1], p. 895, the function of $Q$ can be transformed into:

$$Q = \sum_{i} \left( e_{ii} - a_i^2 \right) \tag{3}$$

Note that at the beginning, as Chen et al. [1], p. 895 state, “the CNM algorithm regards every vertex as a community and then merges them step-by-step.” Furthermore, merging two communities that are connected by edges leads to a change in modularity, denoted $\Delta Q$. The pair of communities that results in the maximum $\Delta Q$ is then selected to form a new community. When $\Delta Q$ is negative, the process stops, and the network community structure is identified.
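The merging procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper: it recomputes the community fractions at every step instead of using the efficient data structures of the original CNM algorithm, it uses the standard CNM gain for merging communities i and j, ΔQ = 2(e_ij − a_i a_j), and the function names and the toy graph (two triangles joined by one edge) are assumptions introduced here.

```python
from itertools import combinations


def community_fractions(edges, communities):
    """Return (e, a): e[i][j] is the share of edge ends joining communities
    i and j, a[i] is the share of edge ends attached to community i."""
    m = len(edges)
    label = {v: i for i, members in enumerate(communities) for v in members}
    size = len(communities)
    e = [[0.0] * size for _ in range(size)]
    for v, w in edges:
        i, j = label[v], label[w]
        # An undirected edge appears twice in the adjacency matrix.
        e[i][j] += 1.0 / (2 * m)
        e[j][i] += 1.0 / (2 * m)
    a = [sum(row) for row in e]
    return e, a


def modularity(edges, communities):
    """Q = sum over communities of (e_ii - a_i^2)."""
    e, a = community_fractions(edges, communities)
    return sum(e[i][i] - a[i] ** 2 for i in range(len(communities)))


def cnm_communities(edges):
    """Greedy agglomeration: start from single vertices and merge the
    connected pair with the largest gain until the best gain is negative."""
    nodes = sorted({v for edge in edges for v in edge})
    communities = [{v} for v in nodes]
    while len(communities) > 1:
        e, a = community_fractions(edges, communities)
        best_gain, best_pair = None, None
        for i, j in combinations(range(len(communities)), 2):
            if e[i][j] == 0.0:
                continue  # only communities joined by an edge may merge
            gain = 2 * (e[i][j] - a[i] * a[j])
            if best_gain is None or gain > best_gain:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None or best_gain < 0:
            break
        i, j = best_pair
        communities[i] |= communities[j]
        del communities[j]
    return communities


# Toy graph (hypothetical): two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
communities = cnm_communities(edges)
```

On this toy graph the loop stops once the only remaining merge, that of the two triangles, would lower the modularity.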

2.2. Route-Traffic Modularity and Modified Algorithm

As stated earlier, we also alter the CNM algorithm to distinguish a route-traffic-based community. As it is a spatially constrained network, route distance is taken into consideration. In addition, because the traffic in different airports will affect the interaction of connecting airports, we suggest a route-traffic modularity, which adds a weight of traffic correlation coefficient to the edges when calculating modularity. Using the correlation coefficient in probability theory, the correlation of route traffic between two airports can be presented as ( means taking the average):

Following the impact of route traffic and distance decay, the weight of the edges is defined as:

Here, the route-traffic correlation coefficient is computed from the number of flight departures from one airport and arrivals at the other, and the distance between the two airports is normalized. Different values of the power of the distance may lead to different community structures; therefore, we choose (based on the gravity model) to square the distance when considering the spatial limitation [1]. According to Equation (2), we can calculate the element of the weighted matrix:

where is the sum of weighted edges in the network. Let be the sum of elements in each row of the matrix,

the route-traffic modularity can be defined as:

Similar to the CNM algorithm, the route-traffic-based community detection algorithm begins with each airport as a single community. It computes the increase of route-traffic modularity by merging each pair of communities, and the pair that generates the maximum is then merged. In this way, route-traffic communities can be clustered while considering both geographic distance and route traffic.
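The ingredients of the route-traffic modularity can be illustrated with a short Python sketch. It rests on assumptions that go beyond the text: the correlation is taken as the Pearson coefficient of two equally long flight-count series, the edge weight is assumed to be the correlation divided by the squared normalized distance, and the weighted modularity is assumed to mirror the unweighted form with edge counts replaced by weights. The function names and the five-airport toy network are hypothetical, and weights are assumed to be positive.

```python
from math import sqrt


def traffic_correlation(x, y):
    """Pearson correlation of two traffic series, written with averages:
    (<xy> - <x><y>) / sqrt((<x^2> - <x>^2) * (<y^2> - <y>^2))."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum(p * q for p, q in zip(x, y)) / n - mean_x * mean_y
    var_x = sum(p * p for p in x) / n - mean_x ** 2
    var_y = sum(q * q for q in y) / n - mean_y ** 2
    return cov / sqrt(var_x * var_y)


def edge_weight(rho, distance, power=2):
    """Assumed weight: correlation damped by normalized distance^power."""
    return rho / distance ** power


def route_traffic_modularity(weights, communities):
    """Weighted modularity, sum over communities of (e_ii - a_i^2).
    weights maps an undirected airport pair (u, v) to a positive weight."""
    total = sum(weights.values())  # sum of weighted edges in the network
    label = {v: i for i, members in enumerate(communities) for v in members}
    inside = [0.0] * len(communities)    # weighted e_ii
    attached = [0.0] * len(communities)  # weighted a_i
    for (u, v), w in weights.items():
        share = w / (2 * total)
        attached[label[u]] += share
        attached[label[v]] += share
        if label[u] == label[v]:
            inside[label[u]] += 2 * share
    return sum(e - a * a for e, a in zip(inside, attached))


# Hypothetical data: airport M is equally far from hubs A and B,
# but its traffic is far more strongly correlated with A.
rho_am = traffic_correlation([10, 12, 14, 16], [5, 6, 7, 8])
weights = {
    ("A", "M"): edge_weight(0.9, 1.0),
    ("B", "M"): edge_weight(0.2, 1.0),
    ("A", "P"): edge_weight(0.8, 1.0),
    ("B", "R"): edge_weight(0.8, 1.0),
}
q_with_a = route_traffic_modularity(weights, [{"A", "M", "P"}, {"B", "R"}])
q_with_b = route_traffic_modularity(weights, [{"A", "P"}, {"B", "M", "R"}])
```

In this sketch the partition that places M with hub A scores higher than the one that places it with hub B, although both hubs are equally distant. That is the kind of distinction the traffic-based weight is meant to capture.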

3. Applying the Model to Two Case Studies

We test our model using two case studies. The case studies were selected so that we focus on the same geographical area (i.e., the US). While both of the selected airlines have a strong market reach within that area and are in competition with each other, they operate with different strategy patterns. American Airlines (AAL) is a full-service carrier with scheduled domestic flights to 150 cities. After merging with the US Airways Group in 2013, AAL became one of the largest airlines in the world. It has been shown that the network structure created by the merger of these two carriers has affected the hub structures, accessibility and physical coverage of the network [27]. By contrast, Southwest Airlines (SWA) is the largest low-cost carrier in the world and it has the most scheduled domestic flights in the United States. In 2017, the traffic volume of AAL was 144 million and the traffic volume of SWA reached 156 million, which accounted for 17.9% and 18.3% of the air traffic in the US, respectively. Although both airlines are the largest airlines in America, their operation patterns, which determine how they attract passengers and how their networks operate, are vastly different.

Data from https://www.oag.com and https://www.flightstats.com on American Airlines (AAL) and Southwest Airlines (SWA) were collected from May to June 2017, and the networks of these two airlines, based on their scheduled flights, are shown in Figure 2. Using the community detection method discussed above, the community detection maps of these two airlines were produced (Figure 3). Different communities are distinguished by different colors, with core airports denoted by larger circles and names.

Figure 3 shows that the community structures of the two airlines are completely different. The American Airlines network is divided into 5 main communities with 7 core airports, while the Southwest Airlines network is divided into 6 main communities with 8 core airports (core airports are shown by larger dots). In the AAL network, the main airports include LAX, PHX and MIA. In the SWA network, the main airports include SAN and STL.

From the correlation coefficient distribution function of both networks (Figure 4), the route correlation coefficient of AAL has a greater range (ranging from 4.06 to 5.52, with an average value of 0.73) compared to the route correlation coefficient of the SWA network (ranging from −0.83 to 0.98, with an average value of 0.08).

From Figure 4, the AAL network showed the strongest features of connectivity to preferential nodes and is considered to be the closest to the hub-and-spoke network. In fact, it presented a sparse-strong style (i.e., a few edges among the communities with large weights), which means that traffic is concentrated between hub airports and there are fewer routes between communities. Therefore, we can assume a ‘tendency’ towards a hub hierarchy or hub-and-spoke configuration in the AAL network. This is also based on the appearance of nodes such as PHX, LAX, and MIA, which are structured as hubs in the framework of AAL activities.

The traffic in the SWA network showed that flights are scheduled in a more balanced manner over the entire network and presented in a dense-weak style (i.e., many edges among the communities with small weights). This means that more routes are connected for both inside and outside communities, but with traffic distributed sparsely. This characteristic leads to a city-to-city network configuration. This is not surprising given SWA’s low cost operation.
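The sparse-strong and dense-weak styles described above can be told apart by counting the edges that run between communities and averaging their weights. The following is a minimal sketch under stated assumptions: the function name, the edge weights, and the community labels are hypothetical and are not taken from the AAL or SWA data.

```python
def inter_community_profile(weights, label):
    """Return (number of edges between communities, their mean weight).
    weights maps an airport pair (u, v) to an edge weight;
    label maps each airport to its community identifier."""
    bridges = [w for (u, v), w in weights.items() if label[u] != label[v]]
    mean_weight = sum(bridges) / len(bridges) if bridges else 0.0
    return len(bridges), mean_weight


# Hypothetical networks on the same four airports and two communities.
label = {"A": 0, "B": 0, "C": 1, "D": 1}
sparse_strong = {("A", "B"): 0.5, ("C", "D"): 0.5, ("A", "C"): 0.9}
dense_weak = {
    ("A", "B"): 0.5, ("C", "D"): 0.5,
    ("A", "C"): 0.1, ("A", "D"): 0.1, ("B", "C"): 0.1, ("B", "D"): 0.1,
}
strong_count, strong_mean = inter_community_profile(sparse_strong, label)
weak_count, weak_mean = inter_community_profile(dense_weak, label)
```

A network with few but heavy bridging edges corresponds to the sparse-strong style, and one with many light bridging edges to the dense-weak style.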

4. Community Analysis

Given that the position of an airport in the carriers’ hub hierarchies may influence network structures and other market variables such as average airfares [28], core community/airport and their influence should be further explored [29]. In the following sections, we discuss a number of issues related to community analysis. We examine the core communities, the core nodes (airports) and the peripheral nodes (airports).

4.1. Core Community

The link between city size and air service is complex; and the way airlines serve cities of different sizes is worth exploring [30]. The correlation coefficient between airports impacts on how the various nodes relate and link to one another, and ultimately determine the configuration of the entire network structure. In this paper, between two airports is used as an edge weight for the core airport correlation coefficient and the community correlation coefficient is calculated based on a network metric, i.e. weighted betweenness centrality, which indicates a node’s ability to stand between other nodes, and therefore, to control the flows among them. Betweenness centrality of node u is calculated as follows:

$$B(u) = \sum_{s \neq u \neq t} \frac{\sigma_{st}(u)}{\sigma_{st}}$$

where $\sigma_{st}(u)$ is the number of shortest paths from node $s$ to node $t$ that pass through node $u$, and $\sigma_{st}$ is the overall number of shortest paths between nodes $s$ and $t$ [21]. The airport correlation coefficient and the community correlation coefficient can be presented respectively as:

Therefore, it is worth examining the average correlation coefficient of each community in more detail and drawing some conclusions from Tables 1 and 2.
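The weighted betweenness centrality used in this section can be sketched as follows. This is an illustrative brute-force version for small graphs, not the computation used in the paper: it assumes positive edge lengths (how a length is derived from an edge weight, for instance as its reciprocal, is an assumption left to the reader), it counts every ordered pair of endpoints, and the function names and the three-node example are hypothetical.

```python
import heapq
from math import inf, isclose


def shortest_path_counts(graph, source):
    """Dijkstra that also counts the shortest paths from source.
    graph maps each node to a dict {neighbor: positive edge length}."""
    dist = {v: inf for v in graph}
    count = {v: 0 for v in graph}
    dist[source], count[source] = 0.0, 1
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, v = heapq.heappop(heap)
        if v in done:
            continue
        done.add(v)
        for w, length in graph[v].items():
            candidate = d + length
            if isclose(candidate, dist[w]):
                count[w] += count[v]  # another equally short path
            elif candidate < dist[w]:
                dist[w], count[w] = candidate, count[v]
                heapq.heappush(heap, (candidate, w))
    return dist, count


def betweenness(graph):
    """For each node u, sum over ordered pairs (s, t) of the share of
    shortest s-t paths that pass through u."""
    info = {s: shortest_path_counts(graph, s) for s in graph}
    score = {u: 0.0 for u in graph}
    for s in graph:
        dist_s, count_s = info[s]
        for t in graph:
            if t == s or count_s[t] == 0:
                continue
            for u in graph:
                if u == s or u == t:
                    continue
                dist_u, count_u = info[u]
                # u lies on a shortest s-t path when the distances add up.
                if isclose(dist_s[u] + dist_u[t], dist_s[t]):
                    score[u] += count_s[u] * count_u[t] / count_s[t]
    return score


# Hypothetical triangle: the direct a-c edge ties with the route via b.
triangle = {
    "a": {"b": 1.0, "c": 2.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 2.0, "b": 1.0},
}
scores = betweenness(triangle)
```

Because each undirected pair is visited in both directions, halving the scores gives the convention that counts every pair once.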

In Table 1, size refers to the number of airports in a community and degree refers to the degree of a community, which is the average degree of all nodes in this community. The internal and external edges are the numbers of edges inside and outside a community, respectively. The average community correlation coefficient represents the centrality and influence of a community. In Table 1, the core communities of AAL have larger values and fewer core airports. This means that the community is dominated by a small number of hub airports that may lead to larger impacts on other communities. In particular, the largest average community coefficient (0.84) reveals a strong correlation, and hence a strong connection, between the airports in that community. In other words, internal flights of this community are more easily transmitted, and therefore it is the most frequently contacted community.

However, the correlation coefficient of the core community of the SWA network is much smaller than that of the AAL network. Its highest community value is 0.75 and its lowest is 0.43, which shows a weaker connection between airports inside the community. The largest community in the AAL network includes 24 airports, while the largest community of SWA has only 11 airports but is dominated by three key airports.

It can be seen that the core community of SWA has more core airports and lower coefficient values. This is because flights are distributed uniformly across several main airports and more flights are connected outside communities. Therefore, the core community of SWA has less impact on other communities of these airports compared to the core community of AAL.

4.2. Core Airport

The correlation coefficient of a single node can be understood as a measure of centrality and influence, where the highest values are usually matched to the best connected node. This is because the best connected airports have a greater impact on the network, as they can “control” a significant number of flights. Furthermore, geographical distance between nodes focuses on their ‘ease-of-access’ to other nodes. The core airport is determined by the ranking of the airport correlation coefficient, which considers the correlation impact both inside and outside the community. In Table 3, a list of ranked airports in the AAL network and the SWA network is provided to show the connectivity importance of airports inside and outside the community. It also shows the absolute change in the value of the community correlations when each of these airports is “attacked” (removed from the network) in turn, based on the community correlations before and after removal.
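The removal experiment described above can be sketched as follows. Note the main assumption: the community score used here, the mean weight of the edges inside a community, is a hypothetical stand-in chosen only to make the example self-contained, and it is not the community correlation coefficient defined in this paper. The function names and the toy weights are likewise illustrative.

```python
def community_score(weights, members):
    """Hypothetical stand-in score: mean weight of the edges that have
    both endpoints inside the community."""
    inside = [w for (u, v), w in weights.items()
              if u in members and v in members]
    return sum(inside) / len(inside) if inside else 0.0


def attack_impact(weights, members, airport):
    """Absolute change of the community score after removing one airport
    together with all of its edges."""
    before = community_score(weights, members)
    remaining = {pair: w for pair, w in weights.items()
                 if airport not in pair}
    after = community_score(remaining, members - {airport})
    return abs(before - after)


def rank_by_impact(weights, members):
    """Airports sorted from the largest to the smallest removal impact."""
    return sorted(members,
                  key=lambda airport: attack_impact(weights, members, airport),
                  reverse=True)


# Hypothetical community: hub H with two spokes S1 and S2.
weights = {("H", "S1"): 0.9, ("H", "S2"): 0.8, ("S1", "S2"): 0.1}
members = {"H", "S1", "S2"}
ranking = rank_by_impact(weights, members)
```

Each airport is removed separately from the intact network, which matches the one-at-a-time attack described in the text.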

The top 10 airports are the key airports in their own communities. For both the AAL and SWA networks, the removal of a core airport has less impact on the core community, especially for communities with more than two core airports. This is because flight scheduling on these routes is homogeneous and most of the flights can be rerouted. Therefore, core communities often have better robustness. For non-core communities, dominant airports such as DFW and BWI have a greater impact on the overall connectivity of their communities, especially when they are the only leading airport in the community. In addition, the average community correlation decline in the AAL network is 0.131, while it is 0.185 in the SWA network. This indicates that the network structure of AAL is more robust against airport failures. The smaller decrease in correlation value for each airport also indicates that the removal of the key airports might not easily reduce the connectivity of other airports either inside or outside the community. Table 4 shows the results for two specific airports (PHX and LAX) in the AAL network used to investigate the impact of key airports in different communities.

In Table 4, the numbers of airports connected with PHX/LAX inside the same community and outside it are listed separately. The difference in core airports between the two networks is clear. In the AAL network, PHX dominates 42 airports with an average correlation coefficient of 3.49, with 23 airports inside the community and 19 outside. The correlation coefficient between PHX and the 23 connected airports within the community is 0.68 higher than the correlation coefficient outside the community. This shows the dominant position of PHX in its community.

In the SWA network, PHX only dominates 5 airports inside the community but dominates 26 airports outside the community. However, the difference between the average inside and the average outside community correlation coefficient is only 0.02. This means that there is little difference between airports inside or outside the community. In other words, PHX is of the same importance in its own community as well as in the entire network.

As LAX is also a key airport in the same community as PHX in both airline networks, the results were similar to those of PHX. However, the lower coefficient value shows that LAX has a less prominent position compared to PHX.

The results indicate that the core airports in the AAL network have a stronger impact compared to airports in the same community. By contrast, the core airports in the SWA network have a stronger impact compared to airports outside of the community. The different relationships among communities have an important effect on the impact of core airports when the modularity values are equal.

4.3. Airport Dominance and Control

The detection method is also useful for analyzing airports that have an “ambiguous” geographical position (e.g., Kansas City International Airport, MCI). In previous studies, airport communities were divided by distance. However, distance is only one basis for airport clustering. In addition, nearby airports are considered to be in the same community, and therefore airports that are equal in distance to core airports can be easily confused. For example, the route distances of MCI-ORD and MCI-DFW are approximately equal; thus it is difficult to determine whether MCI is dominated by ORD or DFW when only distance is considered. However, results show that the route-traffic-based model can eliminate this confusion.

In the AAL network, MCI is dominated by ORD within its community. This is because flights on the MCI-ORD route are more frequent, and ORD has a larger impact on traffic in MCI.

From Figure 5, it can be noted that as traffic in ORD and DFW increases from 5% to 40%, the difference between the two curves becomes more noticeable. This indicates the control effect of ORD. Furthermore, apart from the core airports, other airports (such as STL and TUL) from the surrounding community have minimal impact on MCI.

In the SWA network, MCI is dominated by STL, which is closer than MDW, the core airport of another community. STL is one of the hub airports of SWA and MDW is a secondary airport in the city of Chicago. However, SWA is easily influenced by MDW due to its frequent traffic flow. Therefore, airports have a stronger connection in the same community in the AAL network, while they are more likely to be influenced by other core airports in surrounding communities within the SWA network structure.

Effective community and airport detection is important for understanding the different patterns in airline network configurations in the consideration of geographical, air transport-political and economic factors such as supply, traffic demand and costs. Based on the results, two possible reasons can be identified for the different community structures and hub airports in the two airline networks. The first reason is the different network configurations. AAL is a full-service carrier, which offers a variety of services and network linkages, while SWA is a typical low-cost carrier, which offers a limited number of services in specific segments of the network, such as regional airports, at low prices and mainly operates on a point-to-point basis. SWA provides more short- and medium-range flight routes, which increases the connectivity of airports on a smaller scale. The second reason is the different operation patterns. The historical operation patterns and company strategies of the airlines have placed limitations on the expansion of their network structures, not only in terms of complementarity and competition, but also in terms of the target passengers that they attract. The higher frequency and flexibility in flight scheduling of SWA enable branch routes to become more important and have a greater impact on the robustness of the entire network.

5. Conclusion

The purpose of this paper is to propose a route-traffic-based method of detecting communities in airline networks that have different operation patterns to identify their community structures that are both highly connected and spatially clustered.

Our model is an improved version of the Clauset-Newman-Moore (CNM) modularity maximization algorithm, where both route distance and passenger volumes are taken into account. This extension is useful because the relations among airports are defined not only based on their topological network structure, but also by traffic between connecting airports. The method was tested using the American Airlines and Southwest Airlines networks. Results show that the model is effective in analyzing the interaction and clustering properties for different airline patterns without diverging from realistic networks or mismatching airports. The three main findings are as follows:

(1) The route-traffic-based method of detecting communities, which considers the actual flight operation (airline passenger flows between routes) of airlines, is more accurate compared to network community detection using only the physical distance between airports. The core airports of the communities are consistent with the actual hub airports of the airlines. Moreover, there is a clearer and more explicit classification especially for those more geographically “ambiguous” (fuzzy) airports (i.e., airports that fall in the catchment area of different dominant airports), which is helpful to further study the relationship between airports.

(2) The differences in operating patterns of airlines lead to different network topological and community structures. This difference is mainly due to flight scheduling factors such as flight focus on trunk route or branch route, rather than objective factors such as route distance. For the overall structure of the network, the traditional full service airlines have fewer communities with a greater number of airports, sparse correlation/bridge edges and higher weights on the trunk route. Low cost airlines tend to have more communities with fewer airports, dense correlation/bridge edges and dispersion route-weight [31].

(3) For the connected airports inside and outside of communities: First, links between core airports (regardless of whether it is inside or outside of the community) are stronger in the network of full service airlines, and the impact of these core airports is greater for both internal and external nodes that are organized as communities. The connection between other nodes in different communities is much weaker, and the influence between core airports is much stronger than that of other airports in the same community. In other words, the connection between core airports is strong, and non-key airports are more likely to be affected by key airports outside of the community. Second, the connections between internal nodes in the same community of a low-cost airline network are closely linked. The airports are mostly “controlled” by the core airports in the same community, while the key airports are more affected by other airports both inside and outside of the community. In other words, the relationship between the key airports is weaker, and the non-key airports are mainly dominated by the key airports of its own community. That is, the key airports are more prominent in the community.

In this study, characteristics of the network connections were investigated. Nodes and communities with the most impact on the development of different network structures were identified, and how these affect the entire network was examined. Furthermore, several drivers on network structuring, including correlation coefficient and degree were analyzed to compare the different relationships between community structures and airline operating patterns.

Two issues have been identified in this study that can be addressed in future research. First, more airline networks can be analyzed to further validate the application of our model. Different airlines have different networks and operation patterns. The effectiveness of the route-traffic-based community detection method can be further tested and refined using different airline networks. Second, more community detection variables can be considered. Although we explored route traffic as the main factor that determines the core airports and their dominance, future research can also consider other factors such as time and flight delay in the model.

Data Availability

The numeric data used to support the findings of this study are available from the corresponding author (Haoyu Zhang, Email: [email protected]) upon request. The data sheets are saved in Excel format and were collected from the historical data on the website https://zh.flightaware.com.

Conflicts of Interest

All authors declare that there is no conflict of interest regarding the publication of this paper.

Authors’ Contributions

Haoyu Zhang wrote the manuscript; Haoyu Zhang, Weiwei Wu and Frank Witlox provided the idea; Shengrun Zhang and Haoyu Zhang provided the data and case study; and Haoyu Zhang, Weiwei Wu, Shengrun Zhang and Frank Witlox revised the manuscript.

Funding

Financial support from the National Natural Science Foundation of China (U1933118, 71731001, 41701120) and the National Key Research and Development Project (2018YFB1601200) is gratefully acknowledged.

Supplementary Materials

The uploaded Excel file contains the data used in the paper.