Special Issue: New Models, New Technologies, New Data and Applications of Urban Complexity from Spatio-temporal Perspectives
Using Open Big Data to Build and Analyze Urban Bus Network Models within and across Administrations
Urban bus networks play an important role, when the capacity of urban public services is evaluated. With recent advancements in Internet and Communication Technologies, there is an emerging interest in building an urban bus network model through open big data. This has rarely been investigated and exposes several challenges in the provision of transportation services in urban planning. On the one hand, it is necessary to combine bus stations based on spatial distance constraints due to their ambiguous definition in open big data; on the other hand, it is difficult and time-consuming to relocate and build new stations, but the optimization of bus lines is relatively easy to implement. This study aimed to develop an explicit methodological framework for building and analyzing two different types of urban bus network model using open big data. Thereafter, the framework was applied in two case studies in China, within a county-level administration and in a region including three county-level administrations. The key result shows that there was a shortage of urban bus services across these different administrations. This paper contributes to the body of research methodologies into public transport networks and to understanding the sharing of urban public services across administrations, improving the management of urban bus networks, and highlighting the importance of examining the characteristics of urban bus network in county-level administrations rather than just in large cities in China.
Prioritizing urban bus networks (UBNs) has become an important way of solving many urban transportation problems, such as alleviating traffic congestion and reducing emissions. In general, the urban bus is the most basic and important mode of transportation for urban and rural residents and is much more environmentally friendly than other modes of transportation. Additionally, improving the service capacity of UBNs has become an important part of the current construction of livable cities promoted by the Chinese government. In this respect, an in-depth topological and statistical analysis of UBNs is of fundamental importance for evaluating the capacity of current urban public services [3, 4]. This has also become an indispensable part of urban planning.
In recent years, complex network theory [5, 6] has become the most common and effective method for the construction and analysis of UBN models, and many metrics have been used to evaluate the characteristics of UBNs. For example, Wang et al. studied the spatial configuration of the UBN in the city of Shenzhen, China, adopting the metrics of degree centrality, betweenness centrality, clustering coefficient, and average path length. Xu et al. found that the degree distributions of all UBNs of 330 cities in China were approximated by exponential distributions. Zhang et al. explored the structural characteristics of dynamic weighted UBNs by using complex network theory. In general, complex network theory has been used widely in previous studies.
The investigation of UBNs has been a long-standing research topic. It has been conducted in various ways, focusing on issues such as spatial configuration, design and optimization [10, 11], and network structure. However, the data used for the construction and analysis of UBNs came mainly from local government agencies, whose data standards might vary from one city to another. This prevents comparison across different cities, although such comparisons are very important in urban planning. Moreover, these data are typically not readily available due to business privacy, which poses another constraint for UBN research. With the expansion of urban transportation systems, UBNs have become incredibly large and complicated, and there are high demands for the automatic construction of UBN models through open big data [3, 13]. Therefore, it is necessary to establish a framework for open big data acquisition and processing in UBN research.
Open big data has become available and attracted intensive attention from the field of urban planning [4, 13], thanks to advances in Internet and Communication Technologies. Many studies have reported fruitful results using various types of data, such as high-speed railway data [14, 15], metropolitan rail transport data, aviation flight data [17–20], patent data, shipping track data [22, 23], and subway credit card data. These data contain nodes with geographic coordinates and information about the strength of the connections between them. However, as for building UBNs based on open big data in China, there are four problems that have not been noticed before but are important for developing the research model. In this respect, this study has two notable features regarding data processing: (1) it has a highly automated capability to collect open big data, which provides a consistent way of analyzing UBNs across different regions; (2) it considers how to merge bus stations according to the requirements of urban planning and the accuracy of data calculation.
Previous studies have mainly focused on the analysis of UBNs in metropolitan areas. For example, two studies [25, 26] pointed out that UBNs operated inefficiently in metropolitan areas. However, there is a lack of evaluation and research on UBNs at the county level, and in particular, very few studies have examined UBNs across different counties. In fact, due to the rapid development of Chinese cities, the trend of integrated development and the travel demand across different regions are becoming more and more important and urgent. Thus, the evaluation and design of the current UBNs at different levels becomes an important aspect of the sustainable development of Chinese cities in the future. In addition, the current UBNs might have a few problems, which are often discussed in line with the issues of sustainable urban development, such as the lack of support for urban commuting and the inefficient layout of feeder bus routes [28–30].
More importantly, this study paid special attention to the analysis of cross-county UBNs, which have rarely been investigated in existing literature. In particular, this study aimed to answer the question of whether the service quality of cross-county UBNs is poor. For example, cross-county UBNs tend to have the problem of “broken links,” which refers to bus lines that do not have access to other established UBNs in a small area. Furthermore, the analysis of UBNs in many large cities of China has been mainly used for network optimization instead of network expansion. In meeting the needs of sustainable urban development, it is challenging to relocate bus stations, but it is relatively convenient to optimize the bus lines [31, 32]. Thus, this study not only focused on the analysis of bus stations but also emphasized the important role of bus lines for the construction and analysis of UBN models.
The remainder of this paper is structured as follows. Sections 2 and 3 provide the introduction of materials and methodology. In Section 4, the results of two case studies are provided. Finally, a discussion of the results and conclusions are presented in Section 5.
2.1. Study Areas
The study areas of the two case studies are shown in Figure 1, and the data were all obtained from Gaode Map (a Chinese map service provider comparable to Google Maps and TomTom) in May 2019. The first case is a UBN analysis in a county-level administration, while the second case is a UBN analysis across different counties.
In the first case, the major features of the UBN are examined within an independent county-level administrative area. The study area is Kunshan, Jiangsu Province, China, which neighbors Shanghai and is one of the ten economically strongest counties in China. It was selected because its UBN covers the entire administrative area. This case is representative for studying the spatial equality of UBNs, so that more people can enjoy better public transport services in small and medium cities.
In the second case, a cross-boundary area with three adjacent county-level administrative units was selected from different provinces in the Yangtze River Delta metropolitan area. They are Jiashan County of Zhejiang Province, Wujiang District of Jiangsu Province, and Qingpu District of Shanghai. It is worth mentioning that the UBN in each administrative unit has been developed independently and the connections across the three administrative districts through their UBNs are relatively weak. The Yangtze River Delta is one of the three most developed urban agglomerations in China and its integrated development across different administrations has been strengthened by market forces and promoted by the central government. Thus, the second case is of great significance to the integrated development of cross-county UBNs in China’s urban planning.
2.2. Data Problems in the Analysis of Urban Bus Networks
As shown in Figure 2, there are four major problems for the analysis of UBNs using open big data.
As shown in Figure 2(a), a bus station may be represented by multiple adjacent geospatial points of the same name. This is particularly true for a transfer station, where two or more bus lines cross each other. In this case, the points obtained for the same transfer station from different bus lines might have different geographical locations that do not completely coincide. In fact, this transfer station should be unique.
As shown in Figure 2(b), some bus stations might be geographically far away from each other in a large study area, although they have the same name. Therefore, these stations have to be treated as different ones.
As shown in Figure 2(c), there are small differences in the names of the same bus station on different bus lines. These bus stations are adjacent or coincident, and they should be treated as the same bus station.
As shown in Figure 2(d), some bus stations are actually not the same station, but they are very close to each other. There are many reasons for this situation. For example, multiple bus stations are built in a high-speed railway station, or they are built in a small area to meet the needs of bus line stops. For urban planning, these stations need to be merged into one bus station to observe the bus service capability of a geospatial entity, such as bus stations in the high-speed railway station areas.
3.1. Two Types of Urban Bus Network Model
The first type of UBN model is shown in panel (a) of Figure 3, which is typically known as the “Line-Station” representation. In this network, bus stations are network nodes, bus lines between bus stations are network edges, and the number of bus lines can be used as the weight value of the network edge. The Line-Station-based UBN model is conventionally used to examine the characteristics of bus stations while ignoring the bus lines, for instance, to identify a bus station with an important transit function in the network. In addition, these studies tend to use the P-space to establish the complex network.
The second type of UBN model is shown in panel (b) of Figure 3, which is known as the “Line-Line” representation. In this network, bus lines are modeled as network nodes, two bus lines passing through the same bus station are connected by a network edge, and the number of bus stations that both lines pass through is taken as the weight value of that edge. The Line-Line-based UBN model is more concerned with the bus lines, and hence it is more practical for the planning and management of bus lines in public transport.
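As a minimal sketch of the two representations, the following Python code builds both edge lists from a hypothetical toy input in which each bus line maps to its ordered list of stations. The function names and the toy data are illustrative only and are not taken from the study. Two simplifying assumptions are made: the Line-Station sketch links consecutive stops on a line (a P-space construction would instead link every pair of stations served by the same line), and the Line-Line sketch is treated as undirected.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: bus line name -> ordered list of station names.
lines = {
    "Line 1": ["A", "B", "C", "D"],
    "Line 2": ["B", "C", "E"],
    "Line 3": ["E", "F"],
}


def build_line_station(lines):
    """Line-Station model: stations are nodes.

    A directed edge joins each pair of consecutive stops on a line,
    and its weight is the number of bus lines that use that edge.
    """
    edges = Counter()
    for stops in lines.values():
        for u, v in zip(stops, stops[1:]):
            edges[(u, v)] += 1
    return dict(edges)


def build_line_line(lines):
    """Line-Line model: bus lines are nodes.

    Two lines are joined when they share at least one station, and
    the edge weight is the number of stations they share.
    """
    edges = {}
    for (a, stops_a), (b, stops_b) in combinations(lines.items(), 2):
        shared = set(stops_a) & set(stops_b)
        if shared:
            edges[(a, b)] = len(shared)
    return edges


station_edges = build_line_station(lines)
line_edges = build_line_line(lines)
```

In this toy input, the edge from B to C is used by two lines, and Line 1 and Line 2 share two stations, so both edges carry a weight of 2 in their respective networks.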
In addition, the “Line-Line” model is often better than the “Line-Station” model in data visualization, because the bus line object is clearer and more significant than the bus station object in graphic representations. In other words, when applying a “Line-Line” model to evaluate UBNs across different counties, it is much easier to detect which bus lines are across different administrations and where the “broken links” exist. In the first case study, because of the high level of urban-rural integrated development in Kunshan, the analysis of the cross-administrative issue is not very important. Therefore, the “Line-Station” model is used for the analysis of the UBN in this case, while the “Line-Line” model is used for the analysis of UBNs in the second case.
3.2. Method of Analyzing Urban Bus Networks Using Open Big Data
As shown in Figure 4, the method for analyzing a UBN comprises three steps.
3.2.1. Step 1: Collecting Data on Bus Lines and Bus Stations Automatically
Firstly, the paper selects an online map provider that can provide abundant point-of-interest (POI) data over the study area. This is because the POI data for a bus station record all bus lines passing through it. Secondly, it collects the POI data of the bus station type via the application programming interfaces (APIs) provided by the map provider. Thirdly, the names of bus lines are extracted from these POIs and are then used to crawl the detailed information of both bus lines and bus stations in a city. In this way, all the information on bus lines and bus stations can be collected automatically by simply providing the city name of the study area to the official data acquisition API.
3.2.2. Step 2: Processing the Collected Bus Network Data Using Spatial Constraints
In this step, this paper proposes an effective solution to cope with the four problems illustrated in Figure 2. As shown in Figure 5, firstly, it merges the bus stations with the same name within a certain distance (e.g., 250 m) of each other into one bus station. Secondly, it distinguishes bus stations with the same name but with a long distance from each other (e.g., >250 m) by adding a suffix identifier. Thereafter, the bus stations should have different names. Thirdly, it merges the bus stations within a certain distance (e.g., 100 m) into one bus station.
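The three merging rules can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: station coordinates are assumed to be already projected to metres, clusters are formed transitively with a union-find structure, both merges are applied in a single pass before names are assigned, and the `#k` suffix format is hypothetical. The thresholds of 250 m and 100 m are the example values given in the text.

```python
from collections import defaultdict
from math import hypot


def merge_stations(records, same_name_m=250.0, any_name_m=100.0):
    """Assign a merged-station label to every input record.

    records: list of (name, x, y) with x, y in projected metres.
    Records are merged when they are within any_name_m of each other,
    or share a name and are within same_name_m of each other.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the lowest index as the cluster root.
            parent[max(ri, rj)] = min(ri, rj)

    for i, (name_i, xi, yi) in enumerate(records):
        for j in range(i + 1, len(records)):
            name_j, xj, yj = records[j]
            d = hypot(xi - xj, yi - yj)
            if d <= any_name_m or (name_i == name_j and d <= same_name_m):
                union(i, j)

    # Name each cluster after its root record; clusters that still share
    # a name are far apart, so they receive a distinguishing suffix.
    roots = sorted({find(i) for i in range(len(records))})
    by_name = defaultdict(list)
    for r in roots:
        by_name[records[r][0]].append(r)
    label = {}
    for name, rs in by_name.items():
        for k, r in enumerate(rs, start=1):
            label[r] = name if len(rs) == 1 else f"{name}#{k}"
    return [label[find(i)] for i in range(len(records))]


# Hypothetical records illustrating the four problems in Figure 2.
records = [
    ("Central Station", 0, 0),
    ("Central Station", 120, 0),   # same name, nearby: merged
    ("Central Station", 5000, 0),  # same name, far away: kept distinct
    ("Central Stn", 30, 0),        # slightly different name, adjacent: merged
    ("Market", 1000, 1000),
]
labels = merge_stations(records)
```

The pairwise loop is quadratic in the number of records, which is acceptable for a sketch; a spatial index would be preferable for a city-scale data set.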
3.2.3. Step 3: Building Two Types of Urban Bus Network Using the Processed Data
In this step, two types of UBN are built, namely, the “Line-Station” network and the “Line-Line” network. It should be noted that the two types of network are modeled as directed weighted networks and they are established using the new data on bus stations and bus lines, which are generated in Step 2.
3.3. Metrics for Analyzing Urban Bus Networks
3.3.1. Node Degree Centrality
The node degree centrality (DC) is defined as the number of edges incident on the current node. It reflects the number of direct neighbors of the current node, and it is mathematically defined in (1), where $L_{ij}$ represents the connection between node $i$ and node $j$ (1 if they are connected and 0 otherwise) and $n$ represents the total number of nodes [35–37]. One has
$$DC_i = \sum_{j=1}^{n} L_{ij}. \tag{1}$$
3.3.2. Node Strength Centrality
The node strength centrality, which is also called “Weighted Degree Centrality” (WDC), is defined as the sum of the weights of the edges incident on the current node. It reflects the intensity of interactions between the current node and its neighboring nodes, and it is mathematically defined in (2), where $w_{ij}$ is the weight of the edge between nodes $i$ and $j$ and $n$ represents the total number of nodes. One has
$$WDC_i = \sum_{j=1}^{n} w_{ij}. \tag{2}$$
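Both degree-based metrics can be computed in one pass over a weighted edge list. The sketch below assumes a hypothetical edge dictionary of the form `{(i, j): weight}` and sums over the edges leaving node i, which is one possible reading for a directed network; counting incoming edges as well would be an equally plausible choice that the text does not settle.

```python
def degree_and_strength(edges):
    """Return (DC, WDC) dictionaries for a directed weighted edge list.

    edges: {(i, j): weight}. DC_i counts the edges leaving node i and
    WDC_i sums their weights (assumption: outgoing edges only).
    """
    nodes = {n for edge in edges for n in edge}
    dc = dict.fromkeys(nodes, 0)
    wdc = dict.fromkeys(nodes, 0)
    for (i, _j), w in edges.items():
        dc[i] += 1
        wdc[i] += w
    return dc, wdc


# Hypothetical toy network.
toy_edges = {("A", "B"): 2, ("A", "C"): 1, ("B", "C"): 3}
dc, wdc = degree_and_strength(toy_edges)
```

Here node A has two outgoing edges with a total weight of 3, while node B has a single outgoing edge that also carries a weight of 3, which illustrates how WDC can rank nodes differently from DC.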
3.3.3. Weighted Betweenness Centrality
Weighted betweenness centrality (WBC) is defined as the number of shortest paths between two nodes that pass through the current node considering the edge weight. It is used to measure the importance of nodes serving as bridges in the network [38, 39], and it is mathematically defined in (3), where $s$ and $t$ form a node pair in the network; $\sigma_{st}$ indicates the number of weighted shortest paths between node $s$ and node $t$; and $\sigma_{st}(i)$ is the number of weighted shortest paths between node $s$ and node $t$ passing through node $i$. One has
$$WBC_i = \sum_{s \neq i \neq t} \frac{\sigma_{st}(i)}{\sigma_{st}}. \tag{3}$$
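A compact way to evaluate this metric is Brandes' accumulation scheme combined with Dijkstra's algorithm, sketched below for a directed network. This is a generic textbook technique and is not claimed to be the implementation used in the study. The function expects edge lengths (positive values where smaller means closer); how a line-count weight is converted into a length, for example by taking its reciprocal, is an assumption left to the caller. The result is the unnormalised sum, over all ordered pairs of other nodes, of the fraction of shortest paths that pass through each node.

```python
import heapq
from collections import defaultdict


def weighted_betweenness(edges):
    """Unnormalised betweenness for a directed graph.

    edges: {(u, v): length} with strictly positive lengths.
    Tied shortest paths are detected by exact equality of path lengths.
    """
    adj = defaultdict(list)
    nodes = set()
    for (u, v), length in edges.items():
        adj[u].append((v, length))
        nodes.update((u, v))

    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        dist = {s: 0.0}      # best known distance from s
        sigma = {s: 1.0}     # number of shortest paths from s
        preds = {s: []}      # predecessors on shortest paths
        done, order = set(), []
        heap = [(0.0, s)]
        while heap:
            d, v = heapq.heappop(heap)
            if v in done:
                continue  # stale heap entry
            done.add(v)
            order.append(v)
            for w, length in adj[v]:
                nd = d + length
                if w not in dist or nd < dist[w]:
                    dist[w], sigma[w], preds[w] = nd, sigma[v], [v]
                    heapq.heappush(heap, (nd, w))
                elif nd == dist[w]:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Back-propagate pair dependencies from the farthest nodes.
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Hypothetical toy networks: a chain with a longer shortcut, then a tie.
chain = {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 3, ("C", "D"): 1}
tied = {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 2, ("C", "D"): 1}
bc_chain = weighted_betweenness(chain)
bc_tied = weighted_betweenness(tied)
```

In the second toy network the direct edge from A to C ties with the route through B, so node B receives only half of the credit for the pairs (A, C) and (A, D).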
3.3.4. Community Detection
The community detection method can partition an entire network into tightly connected subnetworks, and the community can be understood as a class of nodes with similar characteristics. There are many different types of community detection methods in the literature, and the most commonly used one is the modularity-based method. Modularity is a measure of the degree to which a network’s communities may be separated and recombined, which is a commonly used criterion for partitioning a network into a certain number of communities [40, 41]. The larger the modularity value, the better the division of the community structure. In real-world systems, the value of modularity usually ranges from 0.3 to 0.7. This paper employs a novel method based on modularity optimization [43, 44], which partitions the network into a number of distinct modules if there is clear modularity in the network.
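To make the modularity criterion concrete, the sketch below evaluates the modularity of a given partition using the standard Newman formulation for an undirected weighted network, in which each community contributes its share of internal edge weight minus the square of its share of total node strength. It only scores a partition and does not perform the optimization itself; reading the edges as undirected is an assumption made here for simplicity, and the toy network is hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected weighted network.

    edges: {(u, v): weight}, each undirected edge listed once.
    community: {node: community label}.
    Q = sum over communities c of [ W_c / m - (S_c / (2 m))^2 ],
    where m is the total edge weight, W_c the weight of edges inside c,
    and S_c the total strength of the nodes in c.
    """
    m = sum(edges.values())
    internal = defaultdict(float)
    strength = defaultdict(float)
    for (u, v), w in edges.items():
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(
        internal[c] / m - (strength[c] / (2 * m)) ** 2 for c in strength
    )


# Hypothetical toy network: two triangles joined by a single edge.
toy = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = dict.fromkeys("abcdef", 0)
q_two = modularity(toy, two_groups)
q_one = modularity(toy, one_group)
```

Splitting the toy network into its two triangles gives Q = 5/14 (about 0.357), whereas placing all nodes in one community gives Q = 0, so the two-community division is preferred.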
4. Results and Discussion
4.1. Case Study 1: Data Extraction and Analysis of an Urban Bus Network in a Single Administrative Area
This case study used the “Line-Station” type of UBN model, which mainly shows the overall characteristics of the spatial distribution of bus stations in a single administrative area. As shown in Figure 6(a), it displays the spatial distribution of the WDC of bus stations. Specifically, Yushan Town has the strongest bus service capability and has a fusion development trend with an Economic Development Zone, while the bus service capability in the other towns is relatively weak. Figure 6(b) represents the spatial distribution of the WBC of bus stations, which can be used to understand the transit capacity of bus stations in the study area. For instance, bus stations with high WBC values in Yushan Town are likely to be densely distributed, while those with low WBC values in other towns are relatively sparsely distributed. In these respects, the UBN of Kunshan is mainly concentrated and fully developed in the town of Yushan.
Figure 7 displays the community structure of the UBN, which can be used to examine the demarcation of transportation space. From this figure, the UBN is organized into 13 communities with a certain degree of spatial coherence. The modularity value is 0.613, which shows that the community pattern is satisfactory. More importantly, the spatial organization characteristics of the UBN in Kunshan can be seen in Figure 7: two communities in the south of Kunshan can be clearly demarcated, which are related to five towns. For instance, the towns of Zhangpu, Jinxi, and Zhouzhuang are in the same community (community 3), while the towns of Qiandeng and Dianshanhu are in another community (community 10). Communities in the north of Kunshan are intertwined and are covered by six towns. For instance, the community structure is much more intricate around the town of Yushan and its four neighboring towns.
Furthermore, this study calculated the proportion of bus stations of each community in different administrative units. Table 1 shows the distribution of the proportion of bus stations of each community in each administrative unit, and two things become apparent. First, Zhouzhuang Town, Jinxi Town, and Dianshanhu Town in the south of Kunshan are all composed of a single community, and thus the spatial structure of the subnetwork in each administrative unit is relatively simple. Second, other towns are composed of multiple communities, especially Yushan Town and the Economic Development Zone, where the spatial structure of the subnetwork is relatively complicated.
These community patterns may be roughly explained by the socioeconomic development of the towns. The socioeconomic level in the northern part of Kunshan is high, and thus the connections via the UBN among these towns are relatively strong. However, in the south of Kunshan, there are many water towns and tourist towns, which have a relatively lower level of socioeconomic development. Hence, the connections via the UBN are likely to be concentrated on a single or a few towns to satisfy the need of specified industries. At the center of the city, Yushan Town and the Economic Development Zone are composed of many subnetworks to meet the different needs of urban bus travel.
4.2. Case Study 2: Data Extraction and Analysis of Urban Bus Networks across Administrations
This case study used the “Line-Line” type of UBN model, which is suitable for assessing the development of UBNs across different counties. This paper analyzes this issue from four aspects.
First, Figure 8 displays the spatial distribution of the WDC of the bus lines. Bus lines with high WDC values are mainly distributed in Jiashan and Qingpu, while those with weak values are mainly located in Wujiang. Additionally, there are very few bus lines with high WDC values across different administrative counties. That is to say, bus lines with the highest value of WDC tend to be constrained in one independent administrative county, and they can be affected by the spatial layout of the administrative county.
Second, Figure 9 displays the spatial distribution of the WBC of bus lines, which can be used to identify the hub-type bus lines in the study area. Bus lines with a strong hub function are mainly distributed at the junction of the three administrative regions, which indicates that a certain number of hub bus lines across counties have been formed in the study area. However, the number of these bus lines is still very small compared with the majority of nonhub bus lines. Besides, many bus lines are not connected at the junction of administrative districts, which is typically known as the “broken link” phenomenon in urban planning. In particular, there are no bus lines connecting Jiashan and Qingpu, which might hinder integrated regional development.
Third, Figure 10 displays the community structure of the UBNs, which can be used to examine the demarcation of transport space. This figure suggests that the UBNs can be divided into nine communities with a high degree of spatial coherence. The modularity value is 0.758, which shows that the community pattern is reasonable and satisfactory. However, there are no obvious cross-administrative communities, which further indicates that the UBN of each administrative region is relatively independent.
Fourth, to evaluate the service capacity of the UBNs across different counties, this study calculated the proportions of bus lines with respect to different administrative units in each community. As shown in Table 2, there are only two communities covering two or more administrative regions, among which community 1 spans two administrative regions (Qingpu and Wujiang) and community 4 spans all counties. Nonetheless, most of the bus lines are concentrated in one county, namely, 91.49% for community 1 and 94.52% for community 4. Besides, all the bus lines in other communities are constrained in one single county. This further indicates that there are deficiencies in the service capacity of UBN across different counties. Therefore, city managers need to break the barriers of administrative divisions to improve the capacity of the cross-county services of the UBNs.
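The proportions reported for each community can be derived with a simple cross-tabulation. The sketch below assumes that every bus line has already been assigned one community label and one administrative unit; the function name and the toy assignments are hypothetical and do not reproduce the values in Table 2.

```python
from collections import Counter, defaultdict


def unit_shares(assignments):
    """Percentage of each community's bus lines in each administrative unit.

    assignments: iterable of (community, admin_unit) pairs, one per line.
    Returns {community: {admin_unit: percentage}}.
    """
    counts = defaultdict(Counter)
    for community, unit in assignments:
        counts[community][unit] += 1
    return {
        c: {u: 100.0 * n / sum(units.values()) for u, n in units.items()}
        for c, units in counts.items()
    }


# Hypothetical toy assignments: community 1 spans two units,
# community 2 is confined to a single unit.
toy_assignments = [(1, "X")] * 9 + [(1, "Y")] + [(2, "Y")] * 4
shares = unit_shares(toy_assignments)
```

A community whose largest share is close to 100% is effectively confined to one administrative unit, which is the pattern the text interprets as weak cross-county service.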
The contributions and limitations of this study are as follows. Firstly, data acquisition has always been a bottleneck in the study of cross-county UBNs. In other words, how to get reliable data efficiently is a potential problem. To cope with this problem, this paper demonstrated a methodological framework to analyze UBNs using open big data. These open big data are collected from the same data source, which guarantees the reliability of a cross-comparison of the structures and organizations of UBNs in different counties. This is of fundamental importance for urban planning.
Secondly, UBNs were represented as the “Line-Station” model in many previous studies, which took the bus station as the network node. In this study, it is much more useful to represent the UBN with the “Line-Line” model, where the bus line is taken as the network node. This is because it is much easier and more convenient to adjust bus lines than to retrofit or build new bus stations. In addition, this type of UBN representation is much more effective in the analysis of the relationship between different regions in a study area, which can assist a comprehensive and objective judgment on the evaluation of UBNs.
Thirdly, this study explored the application of the methodological framework to two case studies, which might provide explicit implications from the perspective of urban planning. In the first case study, the overall spatial characteristics of bus stations were evaluated in a single county, where the “Line-Station” model was adopted. However, it is very limited on the exploration of the spatial relationship among subnetworks. In the second case study, this study analyzed UBNs across different counties, where the “Line-Line” model was adopted. Specifically, the “Line-Line” model can enhance the understanding of the integration and development of UBNs between different counties.
Furthermore, for urban planning and design, the contributions of this study are mainly focused on the following three aspects: (1) The analytic results can attract more scholars and urban planners to examine the spatial characteristics of UBNs in counties, not just in big cities, using open big data. This can help to improve the spatial equity level of public transport services in many areas of rural China. (2) The study can help urban planners to identify the practical problems of cross-county UBNs through a standardized technical approach. (3) It can also provide data source and methodological support for fine-grained urban management, such as helping government officials to evaluate urban bus stations and lines dynamically and to improve the UBN service quality continuously.
Limitations of this study are also highlighted for future studies. First, this study only analyzed UBNs and did not consider other transport modes, such as subway networks. Second, other types of open big data need to be fused to improve the capacity for capturing urban planning problems. Third, the impact of using different properties as edge weights in the UBN analysis deserves more attention for more diverse applications. Fourth, the accessibility and spatial equality [48, 49] of urban bus stations are also important for future UBN analysis.
Overall, this paper provides a methodological framework for building and analyzing UBN models using open big data, which is valuable for the planning and management of urban public transportation facilities. This framework was applied in two case studies, where the structure and organization of UBNs were examined and analyzed from an urban planning perspective. The analytic results can be valuable for urban planners and government agencies in many aspects of understanding the sharing of public services across different counties, managing UBNs in an effective way, and recognizing the importance of the county-level bus networks in China.
Data are made available upon request to the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
Sheng Wei and Lei Wang contributed equally to this work.
This study was supported by the National Natural Science Foundation of China (41801107 and 41971332) and Natural Science Foundation of Jiangsu Province, China (BK20161088 and BK20191486).
P. K. Agarwal and A. P. Singh, “Performance improvement of urban bus system: issues and solution,” International Journal of Engineering Science and Technology, vol. 2, no. 9, pp. 4759–4766, 2010.
F. Xu, J. Zhu, and J. Miao, “The robustness of high-speed railway and civil aviation compound network based on the complex network theory,” Complex Systems and Complexity Science, vol. 12, pp. 40–45, 2015.
C. Daraio, M. Diana, F. Di Costa, C. Leporelli, G. Matteucci, and A. Nastasi, “Efficiency and effectiveness in the urban public transport sector: a critical review with directions for future research,” European Journal of Operational Research, vol. 248, no. 1, pp. 1–20, 2016.
J. Hong, R. Tamakloe, S. Lee, and D. Park, “Exploring the topological characteristics of complex public transportation networks: focus on variations in both single and integrated systems in the Seoul metropolitan area,” Sustainability, vol. 11, no. 19, p. 5404, 2019.
S. Rui, M. Cordeiro, M. Oliveira, S. Tabassum, and J. Gama, Social Network Analysis in Streaming Call Graphs, Springer, Cham, Switzerland, 2015.