Complex Network Analysis of Pakistan Railways
We study the structural properties of the Pakistan railway network (PRN), where railway stations are considered as nodes and edges represent trains directly linking two stations. The network displays small world properties and is assortative in nature. Based on the betweenness and closeness centralities of the nodes, the most important cities are identified with respect to connectivity, which could help in identifying potential congestion points in the network.
In recent years there has been rapidly growing interest in investigating the statistical and dynamical properties of networked systems, which consist of a set of items called nodes or vertices and edges representing the interactions between them. Examples include the Internet, the World Wide Web, social networks of acquaintance or other connections between individuals, organizational networks and networks of business relations between companies, neural networks, metabolic networks, food webs, distribution networks such as blood vessels or postal delivery routes, and networks of citations between papers.
Transportation networks are among the most important building blocks in the economic development of a country. The structure and performance of transportation networks reflect the ease of travelling and transferring goods among different parts of a country, thus affecting trade and other aspects of the economy. In recent years, complex network analysis has been used to study several transportation networks. These include airport networks, for instance, the airport network of China [1, 2], the airport network of India, the US airport network, and the worldwide airport network [5, 6], as well as urban road networks [7–9] and railway networks [10–14].
Railways are one of the most important modes of transportation around the world, and the topological properties of railway networks have attracted considerable attention. Sen et al. were among the first to apply complex network theory to a railway network. In studying the statistical properties of the Indian railways, the authors introduced a new topological representation, the P-Space topology, wherein stations or stops are identified as nodes and are connected if at least one train stops at both stations. The authors also introduced a new method to calculate the shortest distance between two stations. Based on these calculations, the small world properties and exponential degree distribution of the Indian railway network were identified. An extension was provided by Majima et al., who applied the same topology to the Japanese railway network and obtained the same statistical results. While these two networks exhibited the same properties in the P-Space representation, the Chinese railway network displayed the small world properties of a short distance between stations and a high clustering coefficient, but with a power-law degree distribution. In another attempt to explain the dynamic nature of the Chinese network, Guo and Cai concluded that the network is scale-free when extracted in the L-Space topology. Similarly, Wang et al. [17, 18] represented the railway network of China in both L-Space and P-Space and successfully fitted a power-law distribution in both cases.
The PRN is a moderately sized railway network with over 620 stations and 7,791 kilometers of track. Railways are the primary mode of intercity transportation in Pakistan, and the network is responsible for transporting a massive number of passengers and a large volume of freight. Even though railways play an important role in shaping the transportation sector of Pakistan, no research has so far examined the complex nature of this network. To the best of our knowledge, this is the first study to apply complex network theory to the PRN.
2. Network Construction
Before analyzing the PRN, we first define the network topology. Two methodologies exist in the current literature for representing a transportation network, Space L [8, 17] and Space P [8, 12, 18, 19] (Figure 1). In Space L, nodes represent cities, bus, metro, or train stops, or sea ports, and a link between two nodes exists if they are consecutive stops on a route. Nodes in Space P are the same as in Space L, but here an edge between two nodes means that there is a direct bus, train, or metro route that links them. In other words, if a route $A$ consists of nodes $a_i$, that is, $A = \{a_1, a_2, \ldots, a_n\}$, then in Space P the nearest neighbors of the node $a_1$ are $a_2, a_3, \ldots, a_n$. The node degree in Space P is the total number of nodes reachable using a single route, and the distance can be interpreted as the number of transfers (plus one) one has to take to get from one stop to another. In Space L, by contrast, the node degree is just the number of directions one can take from a given node, while the distance equals the total number of stops on the path from one node to another [8, 12]. In this study, we use the Space P methodology to represent the PRN, as it has already been used to represent railway networks [2, 12, 14]. The network was constructed from the official “Pakistan railways time table,” kindly provided by Pakistan railways. The timetable contains complete details of the railway stations, the number of trains, and the arrival and departure of each train at each station.
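The difference between the two representations can be illustrated with a short sketch. The routes below are toy data, not entries from the Pakistan railways timetable, and the helper names `build_space_l` and `build_space_p` are hypothetical, introduced only for this illustration.

```python
from itertools import combinations

# Toy routes (hypothetical): each list is the ordered stop sequence of one train.
routes = [
    ["A", "B", "C", "D"],
    ["C", "E", "F"],
]

def build_space_l(routes):
    """Space L: link only consecutive stops on each route."""
    adj = {}
    for stops in routes:
        for s in stops:
            adj.setdefault(s, set())
        for u, v in zip(stops, stops[1:]):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def build_space_p(routes):
    """Space P: link every pair of stops served by the same train."""
    adj = {}
    for stops in routes:
        for s in stops:
            adj.setdefault(s, set())
        for u, v in combinations(set(stops), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

space_l = build_space_l(routes)
space_p = build_space_p(routes)

print(sorted(space_l["C"]))  # ['B', 'D', 'E']
print(sorted(space_p["C"]))  # ['A', 'B', 'D', 'E', 'F']
```

In this toy example station C has three neighbors in Space L (the adjacent stops) but five in Space P (every station reachable without changing trains), which is why degrees and clustering tend to be much higher in Space P.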
3. Topological Properties
Table 1 provides all computed network statistics, from basic network properties such as the number of nodes and edges to the more complex metrics such as clustering and assortativity.
3.1. Degree Distribution
The degree $k$ of a node, a measure of its connectivity, is defined as the number of edges incident to it. Degree is one of the measures of centrality of a node, and it indicates the importance of a node in a network. A commonly accepted rule is that the larger the degree of a node, the more important the node is. The PRN comprises $N$ nodes and $E$ edges, where each edge represents a direct link between two stations (see Table 1 for the values). The average degree of the network is thus $\langle k \rangle = 2E/N$, which indicates the average number of stations reachable from an arbitrary station via a single train.
The degree distribution $P(k)$ is an important feature that reflects the topology of the network and is defined as the fraction of nodes having degree $k$ in the network. However, the cumulative degree distribution is usually preferred: the degree distribution is often noisy, and there are rarely enough high degree nodes to obtain good statistics in the tail of the distribution, whereas the cumulative distribution effectively reduces the statistical errors due to the finite network size. The cumulative degree distribution of the network is provided in Figure 2. As evident from Figure 2, the railway network of Pakistan is a moderately connected network, with the majority of nodes having degrees of 29 or below, whereas a few stations have high degree and act as hubs. Karachi, Lahore, Hyderabad, Kotri, Rawalpindi, and Peshawar are the most connected stations; however, they also pose a threat to the operations of the railway network, as a failure of one of these major stations can bring a major portion of the network to a halt. This has happened several times in the past, when a failure at one major station caused a major halt of railway operations in Pakistan.
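As a minimal sketch of these quantities, the snippet below computes the degree distribution, a cumulative distribution, and the average degree for a toy graph. The graph is hypothetical, and the cumulative distribution is assumed here to be $P(K \geq k)$; the text does not state which convention was used for Figure 2.

```python
from collections import Counter

# Toy undirected graph as an adjacency map (hypothetical, not PRN data).
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

def degree_distribution(adj):
    """Return {k: P(k)}, the fraction of nodes with degree exactly k."""
    n = len(adj)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}

def cumulative_distribution(p):
    """Return {k: P(K >= k)} by summing P(k') over all k' >= k (assumed convention)."""
    return {k: sum(v for kk, v in p.items() if kk >= k) for k in p}

p = degree_distribution(adj)
p_cum = cumulative_distribution(p)

# Average degree <k> = 2E / N, computed as the mean of the node degrees.
mean_degree = sum(len(nbrs) for nbrs in adj.values()) / len(adj)

print(p)            # {1: 0.25, 2: 0.5, 3: 0.25}
print(p_cum)        # {1: 1.0, 2: 0.75, 3: 0.25}
print(mean_degree)  # 2.0
```

Because the cumulative form sums over the tail, a single high degree hub still contributes to every lower value of $k$, which smooths the sparse tail of $P(k)$.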
3.2. Small World Properties
Watts and Strogatz proposed a model of small world networks in the context of various social and biological networks. A small world network is characterized as a network in which most nodes are not neighbors of one another, but most nodes can be reached from every other node in a small number of steps. Stated simply, a small world network has a small average shortest path length and a large clustering coefficient compared to a random network of the same size. We apply the same criteria to check whether small world properties are present in the PRN.
The average shortest path length $L$ of the network (where the shortest path length is the minimum number of edges passed through to get from one node to another) is calculated using the following equation:
$$L = \sum_{s,t \in V} \frac{d(s,t)}{N(N-1)},$$
where $V$ is the set of nodes in the network, $d(s,t)$ is the length of the shortest path from $s$ to $t$, and $N$ is the total number of nodes in the network. A small average path length of two stops or stations means that there is connectivity among almost all the stations of the PRN, regardless of geographical distance. In the Space P representation, a distance of two corresponds to a single transfer between trains. The network also features a small diameter (the maximum shortest path length in the network).
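The average shortest path length and the diameter of an unweighted graph can both be obtained from breadth-first searches started at every node. The following is a sketch under the assumption that the graph is connected; the path graph and the function names `bfs_distances` and `path_stats` are hypothetical.

```python
from collections import deque

# Toy path graph A - B - C - D (hypothetical, not PRN data).
adj = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}

def bfs_distances(adj, source):
    """Return {node: number of edges on a shortest path from source}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_stats(adj):
    """Return (average shortest path length, diameter) of a connected graph."""
    n = len(adj)
    total = 0
    diameter = 0
    for s in adj:
        dist = bfs_distances(adj, s)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        total += sum(dist.values())          # sums d(s, t) over all t
        diameter = max(diameter, max(dist.values()))
    # Every ordered pair (s, t) with s != t is counted once, hence N (N - 1).
    return total / (n * (n - 1)), diameter

avg_length, diameter = path_stats(adj)
print(diameter)  # 3
```

For the four-node path the ordered-pair distances sum to 20, so the average is 20/12, roughly 1.67.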
The clustering coefficient $C_i$ of a node $i$ is defined as the ratio of the number of links shared by its neighboring nodes to the maximum number of possible links among them. The average clustering coefficient is defined as
$$C = \frac{1}{N} \sum_{i \in V} C_i,$$
where $N$ is the total number of nodes in the network.
Using the above equation, the average clustering coefficient of the network is calculated to be 0.97, indicating that the PRN is a highly clustered network. This result is substantially higher than the value for an equivalent Erdős-Rényi random graph. The high clustering coefficient together with the small average path length (see above) indicates that the PRN is indeed a small world network.
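The comparison with a random graph can be sketched as follows. The code computes the average clustering coefficient of a toy graph and the expected clustering of an Erdős-Rényi graph with the same number of nodes and edges, which equals the connection probability $p = 2E / (N(N-1))$. The graph is hypothetical, and assigning a clustering coefficient of zero to nodes with fewer than two neighbors is an assumed convention.

```python
from itertools import combinations

# Toy undirected graph (hypothetical, not PRN data).
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

def local_clustering(adj, node):
    """Links among the neighbors of node divided by the maximum possible."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbors
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

def random_graph_clustering(adj):
    """Expected clustering of an Erdos-Renyi graph with the same N and E."""
    n = len(adj)
    e = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2.0 * e / (n * (n - 1))

c_avg = average_clustering(adj)
c_rand = random_graph_clustering(adj)
print(round(c_avg, 4), round(c_rand, 4))
```

A network is usually regarded as highly clustered when the first value greatly exceeds the second; in a graph as small and dense as this toy example the two are comparable, so the contrast only becomes visible in large, sparse networks.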
3.3. Degree-Degree Correlation
Another important topological characteristic of a network is the degree-degree correlation between connected nodes. A network is said to be assortative if high degree nodes have a tendency to connect to other high degree nodes. Similarly, a network is disassortative if low degree nodes tend to connect to high degree nodes. Newman introduced a summary statistic for assortativity in 2002, defined as the Pearson correlation coefficient of the degrees at either end of an edge. Mathematically, it can be written as
$$r = \frac{1}{\sigma_q^2} \sum_{j,k} jk \left( e_{jk} - q_j q_k \right),$$
where $e_{jk}$ is the joint probability distribution of the remaining degrees of the two nodes at either end of a randomly chosen edge, $q_k$ is the distribution of the remaining degree, and
$$\sigma_q^2 = \sum_k k^2 q_k - \Big[ \sum_k k q_k \Big]^2$$
is the variance of $q_k$.
This statistic lies in the range [−1, 1], where −1 indicates a completely disassortative network and 1 indicates a completely assortative network. For the PRN, the assortativity is measured to be 0.34, indicating that high degree nodes at one end of a link show a preference for high degree nodes at the other end. To corroborate this result, the average degree of the nearest neighbors, $k_{nn}(k)$, of nodes of degree $k$ can be plotted using the following equation:
$$k_{nn}(k) = \sum_{k'} k' P(k' \mid k),$$
where $P(k' \mid k)$ is the conditional probability that a link from a node of degree $k$ points to a node of degree $k'$.
If $k_{nn}(k)$ increases with $k$, the network is assortative. If $k_{nn}(k)$ decreases with $k$, the network is disassortative. Figure 3 shows the average degree of the nearest neighbors, and it can be seen that $k_{nn}(k)$ increases with degree $k$, consistent with a positive assortativity of 0.34.
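Both diagnostics can be sketched in a few lines. The snippet below computes the assortativity as the Pearson correlation of the degrees at the two ends of every edge, and the average nearest-neighbor degree per degree class. It uses a toy star graph rather than PRN data, and the function names are hypothetical; a star is a convenient check because every edge joins the hub to a leaf, so it is perfectly disassortative.

```python
from collections import defaultdict

# Toy star graph: hub H connected to three leaves (hypothetical, not PRN data).
adj = {
    "H": {"X", "Y", "Z"},
    "X": {"H"},
    "Y": {"H"},
    "Z": {"H"},
}

def degree_assortativity(adj):
    """Pearson correlation of the degrees at either end of an edge."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    # Each undirected edge is listed in both directions, which makes the
    # sample symmetric so both ends share the same mean and variance.
    pairs = [(deg[u], deg[v]) for u in adj for v in adj[u]]
    m = len(pairs)
    mean = sum(x for x, _ in pairs) / m
    cov = sum((x - mean) * (y - mean) for x, y in pairs) / m
    var = sum((x - mean) ** 2 for x, _ in pairs) / m
    if var == 0:
        raise ValueError("undefined: all edge ends have the same degree")
    return cov / var

def knn_by_degree(adj):
    """Return {k: average neighbor degree over all nodes of degree k}."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    buckets = defaultdict(list)
    for v, nbrs in adj.items():
        if nbrs:
            buckets[deg[v]].append(sum(deg[u] for u in nbrs) / len(nbrs))
    return {k: sum(vals) / len(vals) for k, vals in sorted(buckets.items())}

r = degree_assortativity(adj)
knn = knn_by_degree(adj)
print(r)    # -1.0
print(knn)  # {1: 3.0, 3: 1.0}
```

Here the average neighbor degree falls from 3 to 1 as the degree rises from 1 to 3, matching the negative coefficient; an assortative network such as the PRN would show the opposite trend.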
3.4. Identifying the Major Stations in the PRN
To identify the stations with high traffic and congestion, betweenness and closeness centralities are used. The betweenness centrality of a node $v$ is defined as the sum of the fractions of all-pairs shortest paths that pass through $v$. Mathematically,
$$C_B(v) = \sum_{s,t \in V} \frac{\sigma(s,t \mid v)}{\sigma(s,t)},$$
where $V$ is the set of nodes, $\sigma(s,t)$ is the total number of shortest paths from $s$ to $t$, and $\sigma(s,t \mid v)$ is the number of those shortest paths passing through $v$. The top ten railway stations by betweenness centrality are given in Table 2. The station of Jacobabad leads the list, as it acts as a link between three different provinces of Pakistan: Sindh, Punjab, and Baluchistan. Similarly, the stations of Kot Addu, Kundian, Rohri, and Raiwind provide access to almost all of Pakistan, as trains from different routes pass through these stations.
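One common way to evaluate this sum for an unweighted graph is Brandes' algorithm, sketched below. The text does not state which algorithm or software was used for Table 2, so this is an illustration only; the toy graph is hypothetical, and the result is left unnormalized with each unordered pair of endpoints counted once.

```python
from collections import deque

# Toy path graph A - B - C - D (hypothetical, not PRN data).
adj = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}

def betweenness_centrality(adj):
    """Unnormalized betweenness of an undirected, unweighted graph."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        # Forward pass: BFS from s, counting shortest paths (sigma)
        # and recording the predecessors of each node on those paths.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s] = 1
        dist[s] = 0
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Backward pass: accumulate dependencies from the farthest nodes.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Each unordered pair was visited from both endpoints, so halve the totals.
    return {v: c / 2.0 for v, c in cb.items()}

bc = betweenness_centrality(adj)
print(bc["B"], bc["C"])  # 2.0 2.0
```

On the path graph the two interior nodes each lie on the shortest paths of two station pairs, while the end nodes lie on none, which mirrors how junction stations accumulate high betweenness.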
Another parameter used to identify the major stations in the PRN is the closeness centrality, defined as the reciprocal of the average shortest distance from a node $u$ to all the other nodes, which reflects how close the node is to the other nodes in the network. The mathematical expression is
$$C_C(u) = \frac{N - 1}{\sum_{v \in V,\, v \neq u} d(u,v)},$$
where $d(u,v)$ is the shortest distance between $u$ and $v$, equal to the minimum number of stations passed from $u$ to $v$ in the network, and $N - 1$ is the normalization factor. Closeness centrality reflects how close one station is to all the other stations in the railway network: the larger the value, the greater the influence of the station and the wider its range of service. The top ten stations based on closeness centrality are listed in Table 3.
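Closeness centrality follows directly from a single breadth-first search per node. The sketch below assumes a connected, unweighted graph and uses the normalization by the number of other nodes; the toy graph and function names are hypothetical.

```python
from collections import deque

# Toy path graph A - B - C - D (hypothetical, not PRN data).
adj = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}

def bfs_distances(adj, source):
    """Return {node: number of edges on a shortest path from source}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness_centrality(adj, u):
    """(N - 1) divided by the sum of shortest distances from u to all others."""
    n = len(adj)
    dist = bfs_distances(adj, u)
    if len(dist) != n:
        raise ValueError("graph is not connected")
    total = sum(dist.values())
    if total == 0:
        raise ValueError("closeness is undefined for a single-node graph")
    return (n - 1) / total

closeness = {v: closeness_centrality(adj, v) for v in adj}
print(closeness["A"], closeness["B"])  # 0.5 0.75
```

The interior node B scores higher than the end node A because its distances to the other stations are shorter on average.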
In this paper we have studied the PRN as an unweighted graph of railway stations. The network clearly displays small world properties and is assortative in nature. The betweenness and closeness centralities of the stations are also computed, and the stations with the highest centralities are identified as potential congestion points. As public transportation, especially railways, provides a crucial mode of movement for passengers, the identification of possible congestion stations may play an important role in identifying the limitations of the network. Although this study contributes a complex network analysis of the physical state of the PRN, it would also be interesting, given the availability of passenger and cargo flow data, to study the weighted network, as it could reveal a clearer picture of the network dynamics in terms of passenger and cargo flow. Such a study would not only reveal the topological aspects but also provide detailed insight into the network dynamics by identifying the stations with greater flow, the correlations of the edge weights with the degrees of the vertices, and especially the eigenvector centrality, where the quality of an edge also matters.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
This research is supported by 2011 Founded Project of National Natural Science Foundation of China (71171084), 2011 Research Fund for the Doctoral Program of Higher Education of China (20110172110010), and the Fundamental Research Funds for the Central Universities (2012, x2gsD2117850).
W. Li and X. Cai, “Statistical analysis of airport network of China,” Physical Review E, vol. 69, no. 4, Article ID 046106, 2004.
H.-K. Liu and T. Zhou, “Empirical study of Chinese city airline network,” Acta Physica Sinica, vol. 56, no. 1, pp. 106–112, 2007.
R. Guimerà, S. Mossa, A. Turtschi, and L. A. N. Amaral, “The worldwide air transportation network: anomalous centrality, community structure, and cities' global roles,” Proceedings of the National Academy of Sciences of the United States of America, vol. 102, no. 22, pp. 7794–7799, 2005.
P. Sen, S. Dasgupta, A. Chatterjee, P. A. Sreeram, G. Mukherjee, and S. S. Manna, “Small-world properties of the Indian railway network,” Physical Review E, vol. 67, no. 3, Article ID 036106, 2003.
T. Majima, M. Katuhara, and K. Takadama, “Analysis on transport networks of railway, subway and waterbus in Japan,” in Emergent Intelligence of Networked Agents, pp. 99–113, Springer, Berlin, Germany, 2007.
D. J. Watts and S. H. Strogatz, “Collective dynamics of 'small-world' networks,” Nature, vol. 393, no. 6684, pp. 440–442, 1998.
M. E. J. Newman, “Assortative mixing in networks,” Physical Review Letters, vol. 89, no. 20, Article ID 208701, 2002.