As a sustainable mode of transportation, subways bring great convenience to society. Although many studies have examined the relationship between the built environment and station-level ridership, those studies focused mainly on ridership, which is defined as the number of trips for each station. While ridership is an important indicator for evaluating subway demand, passenger-distance is another critical indicator that incorporates distance into demand evaluation and has not yet been fully explored. To fill this gap, this paper investigates the relationship between the built environment around stations and the station-level passenger-distance (SLPD). As noted in previous studies, the relationship between the built environment and travel demand can vary across space. Therefore, a geographically weighted regression (GWR) model and a mixed geographically weighted regression (MGWR) model are used to explore this spatially varying relationship, using Chengdu, China, as an example case. The results are compared with those of an ordinary least squares (OLS) model. The comparison shows that the MGWR model, which considers both global and local variables, has the best goodness of fit. Results also show that 11 of the 25 potential variables are significantly related to SLPD. The accessibility of the station, station type (transfer or terminal), number of bus stops, number of restaurants, density of building area, density of the national road network, and density of the provincial road network all have a positive correlation with SLPD. Meanwhile, three variables, namely whether the station is newly opened, the density of living points of interest (POIs), and the density of the railroad network, are negatively correlated with SLPD. Ten of the eleven significant variables (all except accessibility) have spatially varying relationships with SLPD. These findings can serve as a useful reference for transportation planners in demand evaluation.

1. Introduction

With urbanization, a growing number of cars are occupying the roadways, which brings along a series of problems, such as vehicular traffic congestion [1], air pollution [2, 3], and fuel consumption [4]. Subway has been considered as a sustainable public transportation mode to alleviate traffic congestion [5]. As early as 1863, the world’s first subway system opened in London, England, with a trunk line of about 6.5 km, using a combination of underground and ground lines (Railway Technology). Construction of China’s subway began in 1965, and in May 2020, 47 cities in China had urban rail transit (China Urban Rail Transit Association). At the end of 2016, China’s urban rail transit construction investment reached 384.7 billion yuan (6.6423 yuan equals one US dollar in 2016), and the total length of urban rail transit lines under construction reached 5636.5 kilometers [6]. In recent years, transit-oriented development (TOD) has become an effective way to alleviate traffic congestion and promote sustainable transportation modes [7, 8], and urban rail transit has been an important part of sustainable development both in developed and developing countries in cities such as New York, Hong Kong, and London [9].

The built environment has been found to have a great impact on travel behavior as well as subway ridership. The concept of built environment variables has been expanded from the original “3D” (density, diversity, and design) (Cervero and Kockelman [10]) to the “6D” (density, diversity, design, destination accessibility, distance to transit, and demand management) [11, 12]. Therefore, it is important for transportation planners to understand the relationship between the built environment and station-level ridership. In the literature, previous studies mainly use ridership as the response variable [8, 13–18]. However, this variable does not take travel distance into consideration. Relying on ridership alone may therefore lead to a partial conclusion: two stations with the same ridership would be treated as having equal demand, even though the station with the longer passenger-distance can be regarded as the station with higher demand. To address this problem, this study takes SLPD as the dependent variable and explores its relationship with the built environment.

Regarding modeling, since previous studies have found that the built environment typically has a spatially varying effect on travel behaviors [17], geographically weighted regression (GWR) models are adopted in this study. These models allow the coefficients of variables to vary across space and thus capture the spatially varying relationship. However, the GWR model relies on the strong assumption that the coefficients of all independent variables vary across space. As the mixed geographically weighted regression (MGWR) model allows variables to be either global or local, this model is also used in this paper to investigate the relationship between the built environment and SLPD. Data from Chengdu, China, are used to perform the analysis.

The remainder of the paper is organized as follows: Section 2 reviews the previous studies on the relationship between the built environment and subway demand; Section 3 provides a description of the data; Section 4 explains the ordinary least squares (OLS) models, the GWR models, and the MGWR models; Section 5 presents model results with interpretations; and Section 6 summarizes the main findings of this paper.

2. Literature Review

There have been some studies exploring the impact of TOD on the behavior of residents around the stations. Some studies have explored the principles of TOD by studying the relationship between ridership and the built environment around subway stations [13, 19, 20], such as population and employment density. In addition, the concept of TOD has also been applied in the field of logistics [21–23].

The built environment is the man-made environment for human activities [24], covering land-use patterns, urban design, and transportation infrastructure. The built environment can play an important role in influencing travel behaviors [25–27]. For example, Ding et al. [28] explored the influence of the “4D” built environment variables, which are density, design, diversity, and distance to the Central Business District (CBD), on subway ridership in Washington, D.C., and concluded that the built environment variables played a significant role with a total contribution of 34% to subway ridership estimation. A wealth of literature has also explored the relationship between the built environment and subway ridership in China. Zhao et al. [16] suggested the significant effects of eleven built environment variables on subway ridership, such as business/office floor area and road length, by studying 55 metro stations in Nanjing, China. Liu et al. [29] found that the type of land use around stations and the accessibility of rail stations had a significant impact on passenger flow by studying the trip data of Guangzhou Subway from 2011 to 2016. Yang et al. [30] explored the synergy between built environment variables such as land use and the city attributes based on Shenzhen subway data, and showed that the Baidu heat index (an indicator for destination popularity) for restaurants and entertainment around subway stations was higher at night than during the day, which triggered interest in exploring the urban “night market economy.” Du and Zheng [31] studied the characteristics of different types of commercial lease-type built environment variables around the Beijing subway network, linking concentrated businesses and dispersed labor through the analysis of metrics such as subway accessibility. Li et al. [32] investigated the impact of the built environment factors on urban rail transit in Guangzhou, China, in a study that integrated multiple sources of spatial big data such as points of interest (POI), high spatial resolution remote sensing images, social media, and building footprint data. However, the above studies rarely investigated the various types of built environment variables comprehensively, such as the POI class, the road network class, and the density class.

When studying the effects of the built environment on the subway demand, most studies use ridership as a response variable. For example, Chen et al. [33] used daily ridership as the dependent variable in their study of daytime patterns of transit riders of the New York City subway system. However, this response variable could not reflect passenger travel distance. Based on previous reviews, there is only one study using passenger trip miles as the response variable. Iseki et al. [34] studied the determinants of passenger miles traveled (PMT) for each origin-destination (OD) station pair in Washington D.C, but the relationship between determinants and the PMT was assumed to be constant across the study area, which may be context-specific.

At present, studies on urban rail transit passengers generally use two kinds of models: spatial models and nonspatial models. Ordinary least squares (OLS) regression models are the most representative nonspatial linear regression models and were initially used to explain the complex relationship between the built environment and subway capacity [15, 35, 36]. They assume that the relationship between the independent variables and the dependent variable is global and not spatially heterogeneous. In terms of spatial models, the most common are spatial error models [37, 38], spatial lag models [39], geographically weighted regression models [32, 40, 41], and derivative models related to geographically weighted regression, such as geographically and temporally weighted regression (GTWR) models [42], geographically weighted negative binomial regression (GWNBR) models [43], geographically weighted Poisson regression (GWPR) models [44], and mixed geographically weighted regression (MGWR) models [13, 18, 45]. For example, Li et al. [32] used a GWR model to refine the study of Guangzhou Subway. Although GWR models yield a satisfactory fit for the study of spatial heterogeneity of subway stations, they treat all variables as local variables, assuming that all built environment variables vary across space, which can lead to biased results. To address this, MGWR has been developed. For example, Jun et al. [13] used a stepwise regression model and the MGWR model in their study of the Seoul subway and showed that the MGWR model fits the data better, which supports the superiority of MGWR models in handling spatial-temporal data. In sum, few previous papers have analyzed the spatially varying relationships between the built environment and the SLPD. Therefore, it would be a worthwhile contribution to the literature to explore the spatially varying impact of the built environment on subway station demand.

3. Data Description

3.1. Study Area

Chengdu is located in Sichuan Province, China, and has been known as the “Land of Heaven,” with a total area of 14,335 square kilometers, a built-up area of 949.6 square kilometers, and a resident population of 16,330,000 (Chengdu Municipal People’s Government). As of March 2018, Chengdu had five subway lines in operation (Line 1, Line 2, Line 3, Line 7, and Line 10), with a total of 136 stations, as shown in Figure 1. In March 2018 alone, the number of subway trips was more than 60 million. This study uses the automatic fare collection data of the Chengdu subway from March 18 to 31, 2018. The average number of trips per day during this period was 2,061,853, the average passenger travel distance was 14.5 km, and the average passenger travel time was 30 minutes.

When constructing the buffer zone, two types of distances are considered: Euclidean distance and network distance. Many studies have used the Euclidean distance [32, 46–49], while some other studies adopted the network distance [50, 51]. Guo and Bhat [52] and Schirmer et al. [53] showed that the modeling results based on buffer zones constructed by network distance and by Euclidean distance are similar. As a result, the Euclidean distance is used in this paper. The choice of the catchment area is an important consideration in spatial modeling. Relevant studies [32, 46, 47] found that rail stations generally had a catchment area with a radius of 800 m. Therefore, a circular buffer zone with a radius of 800 m is used as the catchment area in this paper. Because some buffer zones may overlap, which could lead to double (or multiple) counting of some variables, Thiessen polygons are used to partition the overlapping areas, as shown in Figure 2.
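The catchment rule described above can be illustrated with a minimal sketch. It assumes planar (projected) coordinates in metres and uses hypothetical station and POI locations; the function name `assign_to_catchments` is illustrative and is not part of the ArcGIS workflow used in the paper. The sketch relies on the fact that a point lies in a station's Thiessen (Tyson) cell clipped by the 800 m buffer exactly when that station is the nearest one and lies within 800 m.

```python
import math

def assign_to_catchments(stations, pois, radius_m=800.0):
    """Count POIs per station using Thiessen cells clipped by a circular buffer.

    stations: dict of name -> (x, y) in projected metres (assumed).
    pois: list of (x, y) in the same coordinate system.
    """
    counts = {name: 0 for name in stations}
    for px, py in pois:
        # Thiessen rule: each point belongs to its nearest station only.
        nearest = min(
            stations,
            key=lambda s: math.hypot(px - stations[s][0], py - stations[s][1]),
        )
        sx, sy = stations[nearest]
        # Buffer rule: keep the point only if it is within the catchment radius.
        if math.hypot(px - sx, py - sy) <= radius_m:
            counts[nearest] += 1
    return counts

# Hypothetical example: two stations 1000 m apart, so their buffers overlap.
stations = {"A": (0.0, 0.0), "B": (1000.0, 0.0)}
pois = [(100.0, 0.0), (450.0, 0.0), (600.0, 0.0), (1900.0, 0.0)]
counts = assign_to_catchments(stations, pois)
```

In this example the POI at x = 600 m falls inside both buffers but is counted only for station B, and the POI at x = 1900 m is outside every catchment, so no POI is counted twice.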

3.2. Data

Regarding explanatory variables, 25 built environment variables are selected. One of the major types of variables is the point of interest (POI) variables, which are commonly used in transportation related studies to reflect the land-use characteristics [36, 54, 55]. The POI data of Chengdu with longitude and latitude information have been used in this paper, and ArcGIS is used for spatial processing to retrieve POI building environment variables. The data of POI variables come from Amap (China’s Google Map equivalent) (https://www.amap.com). In addition, OpenStreetMap (https://www.openstreetmap.org/) is used to obtain the variables related to the road network.

Since the automatic fare collection (AFC) system records detailed trip information, including card number, card type, trip date, boarding station name, boarding time, alighting station name, and alighting time, AFC data have been used in this study to estimate passenger travel distances at each station. The distance of each station pair along the metro line has been measured using ArcGIS. Since there are 136 stations, a 136 by 136 distance matrix has been obtained (Table 1). In the data, the origin station and the destination station of each trip have been recorded. This information has been used to calculate the value of the response variable Passenger-distance and the value of the explanatory variable Accessibility.

Further, the Passenger-distance of each station has been obtained by the product of the OD distance and the passenger flow of the station. Since the response variable, passenger-distance, shows a highly left-skewed distribution, the variable is log-transformed to meet the normality assumption for the OLS models. The variables have been described in Table 2.
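As a minimal sketch of this calculation, the passenger-distance of a station can be computed by multiplying each OD flow by the corresponding OD distance and summing over destinations, followed by the log transformation. The small matrices below are hypothetical, and attributing each trip to its origin station is an assumption made here for illustration; the text does not state whether trips are attributed to the boarding or the alighting station.

```python
import math

def station_passenger_distance(distance_km, od_flow):
    """Return passenger-distance (passenger-km) per origin station.

    distance_km[i][j]: along-line distance between stations i and j.
    od_flow[i][j]: number of trips from station i to station j.
    Trips are attributed to the origin station (an assumption).
    """
    n = len(distance_km)
    return [
        sum(od_flow[i][j] * distance_km[i][j] for j in range(n))
        for i in range(n)
    ]

# Hypothetical 3-station system (the paper uses a 136 by 136 matrix).
distance_km = [[0, 2, 5],
               [2, 0, 3],
               [5, 3, 0]]
od_flow = [[0, 10, 4],
           [6, 0, 2],
           [1, 1, 0]]

pd_values = station_passenger_distance(distance_km, od_flow)
# Log transformation applied before fitting the regression models.
log_pd = [math.log(v) for v in pd_values]
```

Two stations with similar trip counts can thus have quite different passenger-distance values when their trips differ in length.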

The accessibility of a station is calculated from its distances to the other stations following [56], where $A_i$ denotes the accessibility of station $i$, $d_{ij}$ denotes the distance from station $i$ to station $j$, and $n$ denotes the number of stations in the subway system.

3.3. Clustering Analysis

The subway stations are clustered using the K-means clustering algorithm based on the 25 explanatory variables. The K-means clustering algorithm is a classic clustering algorithm that was proposed by MacQueen [57] in 1967. When using K-means clustering, the number of clusters, represented by K, needs to be determined. The commonly used methods are the elbow diagram and the silhouette coefficient method [58]. The results of the two methods are shown in Figure 3. In the elbow diagram, the elbow points occur when the number of clusters is 4 or 5. In the silhouette coefficient diagram, the silhouette coefficient is the highest when the number of clusters is 5. As a result, the value of K is set as 5.
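To make the silhouette (contour) coefficient criterion concrete, the following sketch computes the mean silhouette coefficient for a given cluster assignment. The points and labels are hypothetical, the function name is illustrative, and assigning a score of 0 to single-member clusters is a common convention assumed here rather than something stated in the paper. The function expects at least two clusters.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient of a cluster assignment (needs >= 2 clusters)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # convention for single-member clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # matches the true grouping
bad = silhouette_score(points, [0, 1, 0, 1])   # splits each group
```

In practice the score would be evaluated for each candidate K after running K-means, and the K with the highest mean silhouette coefficient would be preferred.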

The results of the clustered stations are shown in Figure 4. Most stations in clusters 1 and 2 are located in areas far away from the city center of Chengdu. Compared to the city center, these areas have a lower density of population and employment, which may result in lower passenger flows. Cluster 3 shows the characteristics of stations in the city center, where the density of population and employment is high. Cluster 4 only has one subway station, Chunxi Road subway station, which is located in a busy commercial corridor. At the same time, this is a typical case of TOD development. Through close integration with commerce, it also drives the traffic of the site. Thus, this station may stand out as a unique station type. Cluster 5 is mainly located around the loop subway line of Line 7. These stations have similar characteristics with high percentages of residential areas.

3.4. Passenger Flow Characteristics

When describing the passenger flow characteristics, generally more attention is paid to the daily passenger flow and the distribution of passengers’ travel time. Figure 5 (unit: person/day) shows that the weekday passenger flow is higher than that on the weekends. After calculation, the average number of trips on weekdays is 2,184,722 per day, and the average number of trips on weekends is 1,756,813 per day. Furthermore, the passenger flow of Line 1 is significantly influenced by the day of the week, indicating that Line 1 passengers are likely mainly commuters. Line 10, on the other hand, is not affected by the day of the week, as it is largely airport traffic. According to the distribution of travel time of subway passengers (Figure 6 (Note: Unit is number of people)), the travel time is mostly within the range of 0–100 minutes (99.65%). The average travel time of passengers is 29.7 minutes. Among them, the highest passenger flow was 758,264 at about 19 minutes.

In the heat map of hourly passenger flow (Figure 7), the vertical axis represents the date, which ranges from the 18th to the 31st, and the horizontal axis represents the time of day, which ranges from 6 to 23. A darker color indicates a higher passenger flow. Differences in passenger flow exist between weekdays and weekends. On weekdays, there are two periods of peak ridership, one in the morning (7:00–8:00) and the other in the evening (17:00–18:00). The average passenger flow during the morning peak hour (7:00–8:00) on weekdays is 247,808.7 per hour, and the average passenger flow during the evening peak hour (17:00–18:00) is 217,769.3 per hour. The ridership in the morning peak is higher than that in the evening peak, which is consistent with the findings of Ma et al. [36]. However, on weekends, the subway ridership does not have a clear peak period, which is consistent with the findings of [40, 59]. When analyzing the hourly subway ridership, it has been found that the trips are mainly concentrated between 6:00 and 23:00, consistent with the operating hours of the Chengdu Subway.

The spatial distribution of Chengdu subway passenger flow shows that trips are concentrated around the city center, the terminal stations, and the southern high-tech zone, which is a residential and employment center in Chengdu. The spatial distribution of the logarithm of the subway ridership (Figure 8) and the passenger-distance (Figure 9) in Chengdu show that the passenger-distance and the ridership at subway stations have the same spatial relationship overall, i.e., the more trips are made, the more passenger miles are traveled. However, there are some stations with inconsistent performance.

4. Models and Methods

4.1. OLS Model

The ordinary least squares (OLS) model is one of the most used models to determine the linear relationship between the explanatory variables and the response variable [60]. Its function is given as

$$y_i = \beta_0 + \sum_{k} \beta_k x_{ik} + \varepsilon_i,$$

where $y_i$ denotes the $i$th observation of the response variable; $\beta_0$ denotes the intercept; $\beta_k$ denotes the coefficient of the $k$th predicting variable; $x_{ik}$ denotes the $k$th predicting variable of the $i$th observation; and $\varepsilon_i$ denotes the random error term.

4.2. GWR Model

The OLS model assumes the relationship between the explanatory variables and the response variable to be consistent across space. Yet, their relationship could vary across space. To overcome this issue, Brunsdon et al. [61] proposed the geographically weighted regression (GWR) model, which is commonly used to study the spatially varying relationships between the response and explanatory variables [62]. The function of the model is given as

$$y_i = \beta_0(u_i, v_i) + \sum_{k} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i,$$

where $y_i$ denotes the $i$th observation of the response variable; $\beta_0(u_i, v_i)$ denotes the local intercept; $\beta_k(u_i, v_i)$ denotes the coefficient of the $k$th predicting variable at the location of the $i$th observation; $x_{ik}$ denotes the $k$th predicting variable of the $i$th observation; $(u_i, v_i)$ denotes the coordinates of the $i$th observation; and $\varepsilon_i$ denotes the random error term.
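The idea of location-specific coefficients can be sketched as a weighted least squares fit repeated at every observation location. The example below is a simplified illustration with a single explanatory variable, a Gaussian distance-decay weight, and hypothetical data; it is not the GWR 4.0 implementation used in this paper, and the function name and bandwidth value are assumptions.

```python
import math

def gwr_local_fit(coords, x, y, bandwidth):
    """Estimate a local (intercept, slope) pair at each observation location."""
    results = []
    for ui, vi in coords:
        # Gaussian weights: nearby observations count more than distant ones.
        w = [
            math.exp(-0.5 * (math.hypot(ui - uj, vi - vj) / bandwidth) ** 2)
            for uj, vj in coords
        ]
        sw = sum(w)
        xbar = sum(wj * xj for wj, xj in zip(w, x)) / sw
        ybar = sum(wj * yj for wj, yj in zip(w, y)) / sw
        # Closed-form weighted least squares for one explanatory variable.
        sxx = sum(wj * (xj - xbar) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - xbar) * (yj - ybar) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx
        results.append((ybar - slope * xbar, slope))
    return results

# Hypothetical data: the slope is 1 in the west and 3 in the east.
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
x = [1, 2, 3, 1, 2, 3]
y = [1, 2, 3, 3, 6, 9]
fits = gwr_local_fit(coords, x, y, bandwidth=1.0)
west_slope = fits[0][1]
east_slope = fits[-1][1]
```

A single global OLS fit on these data would return one compromise slope, whereas the local fits recover a slope close to 1 in the west and close to 3 in the east.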

4.3. MGWR Model

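The MGWR model integrates the OLS model and the GWR model by allowing some variables to have a constant coefficient and other variables to have spatially varying coefficients [13]. The function of the MGWR model is given as

$$y_i = \beta_0 + \sum_{a} \beta_a x_{ia} + \sum_{b} \beta_b(u_i, v_i)\, x_{ib} + \varepsilon_i,$$

where $\beta_0$ denotes the intercept, which can be set as a global intercept $\beta_0$ or a local intercept $\beta_0(u_i, v_i)$; $x_{ia}$ is the $a$th global variable; $x_{ib}$ is the $b$th local variable; $\beta_a$ denotes the regression coefficient of the $a$th global variable; $\beta_b(u_i, v_i)$ is the regression coefficient of the $b$th local variable; $(u_i, v_i)$ denotes the coordinates of the $i$th observation; and $\varepsilon_i$ denotes the random error term.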

There are two main types of weighting functions for estimating the coefficients of the MGWR model: (i) the Gaussian function and (ii) the bi-square function. For searching the bandwidth, there are four main criteria: (i) the corrected Akaike Information Criterion (AICc) for small samples, (ii) the Akaike Information Criterion (AIC), (iii) the Bayesian Information Criterion (BIC), and (iv) cross-validation (CV). After exploring different weighting functions and search criteria, the best model was obtained with the CV search criterion and the Gaussian weighting function.
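The two weighting functions and the CV criterion can be sketched as follows. The Gaussian and bi-square (double square) kernels are written in their common fixed-bandwidth forms, and the CV score is shown for an intercept-only local model to keep the example short; the data, candidate bandwidths, and function names are hypothetical and do not reproduce the GWR 4.0 search procedure.

```python
import math

def gaussian_kernel(d, h):
    """Gaussian weight: decays smoothly and never reaches exactly zero."""
    return math.exp(-0.5 * (d / h) ** 2)

def bisquare_kernel(d, h):
    """Bi-square weight: zero for observations at or beyond the bandwidth."""
    return (1.0 - (d / h) ** 2) ** 2 if d < h else 0.0

def cv_score(coords, y, h, kernel=gaussian_kernel):
    """Leave-one-out CV: sum of squared errors when each y[i] is predicted
    from the kernel-weighted mean of all other observations."""
    total = 0.0
    for i, (ui, vi) in enumerate(coords):
        num = den = 0.0
        for j, (uj, vj) in enumerate(coords):
            if j == i:
                continue  # leave observation i out
            w = kernel(math.hypot(ui - uj, vi - vj), h)
            num += w * y[j]
            den += w
        if den == 0.0:
            return math.inf  # no neighbours inside the bandwidth
        total += (y[i] - num / den) ** 2
    return total

# Hypothetical data with a clear spatial break between x = 2 and x = 3.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
y = [1, 1, 1, 5, 5, 5]
candidates = [0.5, 1.0, 5.0, 50.0]
best_h = min(candidates, key=lambda h: cv_score(coords, y, h))
```

A very large bandwidth makes every local fit approach the global mean, so its CV score is worse here than that of a bandwidth small enough to respect the spatial break.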

5. Results and Discussion

5.1. Model Results
5.1.1. OLS Model Results

Before constructing the OLS model, potential multicollinearity between variables needs to be eliminated. The variance inflation factor (VIF) is used in the paper to measure the multicollinearity among variables. According to Reference [63], when VIF is less than 10, the variables do not have a multicollinearity issue. After calculating the VIF values, the VIF values of the two variables City center and Township road were higher than 10. Thus, the Township road variable was removed, as its VIF value was the highest. To explore the correlation between the response variable and the explanatory variables, the correlation matrix is obtained based on the Pearson’s correlation coefficient, as shown in Figure 10.
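The VIF screening step can be illustrated with a small sketch. It uses the identity that the VIF of variable j equals the jth diagonal element of the inverse of the correlation matrix of the explanatory variables. The data are hypothetical, and the helper names are illustrative rather than taken from the software used in the paper.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    p = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(p)]
        for i in range(p)
    ]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    p = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(p)] for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]

def vif(columns):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

# Hypothetical variables: c is almost the sum of a and b, so it is collinear.
a = [1, 2, 3, 4, 5, 6]
b = [2, 1, 4, 3, 6, 5]
c = [3.1, 2.9, 7.2, 6.8, 11.1, 10.9]
vifs = vif([a, b, c])
flagged = [name for name, v in zip("abc", vifs) if v >= 10]
```

Variables whose VIF reaches the threshold of 10 would be candidates for removal, starting with the one that has the highest value.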

The results of the OLS model are shown in Table 3. It can be seen that 11 of the 25 explanatory variables are significant. In terms of Passenger-distance, significant variables at the 0.01 level include New station, Transfer, Accessibility, Bus, Building, and National highway. Among the significant variables, Terminal, Transfer, Accessibility, Bus, Restaurants, Building, National highway, and Dart are positively related to the SLPD, while New station, Life, and Railway are negatively related. This result was compared with the result of the ridership model. Nine variables significantly affect the ridership at the 0.01 level, including New station, Transfer, Restaurants, Bus, Building, National highway, Railway, and Dart.

From Table 3, it can be observed that the positive and negative effects of the explanatory variables on the two response variables, Passenger-distance and Ridership, are comparable in the OLS model, except for the difference in the significance level of the variables Life and Accessibility. In addition, a higher number of subway transit stations implies more passenger-distance, which is consistent with the findings of previous studies [35, 48].

Furthermore, it is necessary to consider the fact that passengers choose to travel by subway for different purposes. The variable Accessibility has a positive correlation with Passenger-distance. The reason could be that higher accessibility of subway lines is associated with a stronger willingness of passengers to use the station, which increases the number of trips. Consequently, it can have a positive effect on Passenger-distance. Unlike most studies, the Life has a negative effect on Passenger-distance, which may be because the high number of life POIs around the city reduces the need of passengers to travel by subway to reach these POIs. In contrast, the restaurant POIs have a positive effect on Passenger-distance, which may be explained as people in Chengdu are willing to travel for long distances to access different types of restaurants, resulting in more and longer trips to restaurant POIs.

Transportation-related variables have different effects on the travel distance of subway passengers. Generally, the more roads around a subway station, the more convenient it is for people to travel by subway [13]. For example, in the OLS model, the variables National highway and Dart both have a positive effect on Passenger-distance and Ridership. However, the railroad network can play a competing role against the subway, with subway stations near a railroad station more likely to serve as an intermediate destination. Therefore, the more railway stations there are, the lower the passenger-distance that can be expected.

In addition, the subway station attribute variables New station, Terminal, and Transfer have different effects on Passenger-distance. For the variable New station, it has a negative relationship. The reason can be that the newly opened station is not as established as the older sites, which may result in less passenger flow in the short run. Variables Terminal and Transfer both show a positive relationship. In general, the passenger flow at the start and transfer stations of the subway is usually larger than that of other stations. These stations also have a larger passenger flow in the analysis. Therefore, more passenger distance is expected.

5.1.2. GWR and MGWR Model Results

When conventional linear regression models are used, estimating the relationship between variables without considering the spatial autocorrelation of the data yields inefficient regression coefficients and may cause a mismatch between the model and reality [46]. Therefore, the spatial autocorrelation of the explanatory variables needs to be measured using the Moran’s I index and the Z score [64]. The estimation of Moran’s I index with the software GeoDa suggests that the variable New station does not have a strong spatial autocorrelation, and therefore it is treated as a global variable. Since all other variables involved in the calculation exhibit spatial correlation, it can be concluded that the use of GWR and MGWR models is more reliable.
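For reference, the global Moran's I statistic can be computed as in the sketch below. The spatial weight matrix and the values are hypothetical (four stations on a line with adjacency weights), the function name is illustrative, and the Z score reported by GeoDa is not reproduced here.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed at n locations.

    weights[i][j] is the spatial weight between locations i and j,
    with zeros on the diagonal.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    num = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

# Hypothetical adjacency weights for four stations along one line.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 1, 5, 5], weights)    # similar values are neighbours
alternating = morans_i([1, 5, 1, 5], weights)  # neighbours always differ
```

A positive value indicates that similar values tend to be located near each other, while a negative value indicates that neighbouring values tend to differ.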

The GWR 4.0 software has been used to fit the data with the MGWR model and the GWR model. Five summary statistics of the results (Mean, STD, Min, Median, and Max) are presented in Table 4. For the GWR and the MGWR models, the signs of the coefficients are similar to the signs of the corresponding variables in the OLS model. Among the variables, Accessibility is a global variable in the MGWR model and does not vary spatially.

5.1.3. Result Comparison

To further study the spatially varying relationship between the explanatory and the response variables, and address the problem of spatial autocorrelation, the GWR model and the MGWR model have been used to fit the data. In measuring the model performance, the results of the OLS model, GWR model, and MGWR model have been compared and analyzed using AIC, BIC, AICc, CV, R2, and the Adjusted R2. According to Huang et al. [65], the smaller the value of AIC, the more accurate the model is, and for different models, the difference of more than 3 in the value of AIC indicates a significant difference among the models [62]. In GWR 4.0 software, it has been observed that the model calculated by Gaussian fixed and CV is better. As shown in Table 5, the MGWR model has smaller AIC and AICc values than the OLS and GWR models, while the R2 increases to 0.671. By comparison, it has been observed that the MGWR model, which considers both global and local variables, has a better fit compared to the OLS model and the GWR model.

5.2. Spatial Patterns

The spatial patterns of subway stations and their built environment, including distance, design, diversity, density, and accessibility, can have an impact on passenger travel behaviors [11, 40]. The research results show that the spatial distribution of the residuals of the MGWR model has a more random distribution than the GWR and OLS models, with the GWR and OLS models showing patterns of spatial aggregation. Therefore, the GWR model and the OLS model fit the spatial data poorly. According to Figure 11, the spatial distributions of residuals of the MGWR model, GWR model, and OLS model are similar, but the range of residuals of the MGWR model is smaller than that of the GWR and the OLS models.

From the spatial visualization of the local R2 of the MGWR model and the GWR model (Figure 12), the spatial distributions of the local R2 of the two models show differences. However, the difference between the local R2 spatial visualization of the MGWR model and the GWR model is more obvious in the East side of Chengdu. The higher local R2 of the MGWR model is mainly distributed near subway stations in the eastern and southern regions.

From the perspective of multimodal transportation, subway trips are significantly and spatially correlated with bus stop density, which is consistent with the findings of Zhao et al. [15]. Based on the results, it is observed that rail and bus have different cooperative or competitive impacts on Passenger-distance under different circumstances, which is similar to the findings of Chen et al. [5]. The Government of Seoul, South Korea, provides free transfers between subway and bus stations, which greatly encourages public transportation trips [35]. Therefore, it is recommended that Chengdu develop a free or discounted transfer policy between subway and bus to encourage the use of public transportation.

Moving onto station attributes, it is found that the transfer stations are generally recorded with larger passenger flows. Therefore, passenger-distance of transfer stations is higher. Figure 13 shows the spatial distribution of the three variables of station attributes New station, Terminal, and Transfer. The spatial distribution of the New station coefficient indicates a high value in the west and south but a low value in the north. This is likely due to opening of the third phase of Chengdu Subway Line 1 (mainly located in the south) in mid-March 2018 (Chengdu Metro). As a result, the passenger-distance in the west and the surrounding areas was impacted, while the impact in the north was yet to show in the data.

As for roadway accessibility, by comparing the three submaps (a)–(c) in Figure 14, it is found that there is generally a greater demand in the southern zone of Chengdu. Since there are more job opportunities in the high-tech zone of Chengdu, this is likely generating commuter traffic. The variable National Highway has a strong positive correlation with passenger mileage in the eastern and southern regions, while for some stations in the north, it has a negative correlation. The reason can be that Chengdu’s national highways are denser in the northwest; therefore, passengers in these areas are more likely to choose car-based modes such as private or ride-hailing cars. The variable Railway has a strong positive correlation with passenger-distance at the southern subway stations, while it has a negative correlation at the northwestern stations. The reason can be that the railway line is spread out in the northwest part, while in the south and east the stations are more sparsely distributed. The relationship between passenger-distance and the variable Dart is different from that of the previous two variables. It has a higher positive correlation in the western and southern regions. The reason can be that the provincial roadway network in the southern and the western districts is denser, while that in the northern and the eastern districts is sparser.

Regarding POIs, most POI-class built environment variables can serve as substitutes for the equivalent types of land use; for example, commercial POIs can replace commercial land use, and living-class POIs can replace population density [5]. According to Figures 15(c) and 15(d), the variable Bus has a positive correlation with Passenger-distance. The variable Life has a negative correlation with Passenger-distance at most subway stations, but it has a positive correlation at most subway stations in the High-tech Zone (southern region of Chengdu) and the Shuangliu District (southwest of Chengdu). The variable Restaurant shows a different spatial pattern from the first two variables. The coefficients are higher in the eastern region, indicating that the passenger mileage and the number of restaurants in these regions have a strong positive correlation. In addition, the variable Building is positively correlated with Passenger-distance. Spatially, it has a greater impact in the western and southern regions than in the central and eastern regions.

6. Conclusion

6.1. Summary

This paper focuses on the spatial heterogeneity of the relationship between the built environment and subway passenger flow. Taking subway passenger-distance as the measure of demand, the spatial relationship between passenger-distance and key built environment factors has been investigated. To study this spatial heterogeneity at the subway station level, three models were developed for analysis and comparison, namely, the OLS model, the GWR model, and the MGWR model. Our results show that the MGWR model has smaller AIC, AICc, and CV values than the OLS and GWR models, and a greater goodness of fit (i.e., R²). Therefore, the MGWR model, which considers both global and local variables, is the better fit for exploring spatial heterogeneity in our data. The MGWR model also reveals that the relationship between the built environment variables and SLPD exhibits spatial heterogeneity.
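
The comparison criteria above can be illustrated with a minimal sketch. It assumes Gaussian errors and uses the textbook small-sample correction for AICc; the function names and the numbers in the usage lines are hypothetical and are not taken from this study. For GWR-type models, software typically replaces the parameter count k with an effective number of parameters derived from the hat matrix, which this sketch simply takes as an input.

```python
import math


def gaussian_aic(rss, n, k):
    """AIC of a least-squares fit under Gaussian errors.

    rss: residual sum of squares
    n:   number of observations (here, stations)
    k:   number of estimated parameters (assumed to be supplied by the caller)
    """
    # Maximized Gaussian log-likelihood with variance estimate rss / n.
    log_likelihood = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    return 2 * k - 2 * log_likelihood


def gaussian_aicc(rss, n, k):
    """AIC with the standard small-sample correction term."""
    return gaussian_aic(rss, n, k) + 2 * k * (k + 1) / (n - k - 1)


def r_squared(rss, tss):
    """Coefficient of determination from residual and total sums of squares."""
    return 1 - rss / tss


# Hypothetical comparison: the model with the lower AICc is preferred.
n_stations = 100                                  # illustrative value only
baseline = gaussian_aicc(rss=50.0, n=n_stations, k=12)
candidate = gaussian_aicc(rss=30.0, n=n_stations, k=12)
preferred = "candidate" if candidate < baseline else "baseline"
```

With the same number of parameters, the fit with the smaller residual sum of squares receives the lower AICc; the correction term grows as k approaches n, which penalizes heavily parameterized local models on small samples.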

According to the OLS model, among the eleven variables that are significant at the 0.1 level, Terminal, Transfer, Accessibility, Bus, Restaurant, Building, National Highway, and Dart are positively associated with subway SLPD, indicating that the passenger-distance of subway stations increases as the values of these built environment variables increase. In the GWR model, all 11 significant variables are treated as local variables. The MGWR model, in contrast, treats Terminal, Transfer, Bus, Restaurant, Building, Life, National Highway, Dart, and Railway as local variables and Accessibility as a global variable. These findings show that the MGWR model fits the data better.
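
The distinction between a global and a local variable can be sketched with a toy example. This is not the calibration procedure used in this study; it is a minimal geographically weighted fit with one predictor and a Gaussian kernel, on synthetic data in which the slope is +1 in the western half and -1 in the eastern half. All names, coordinates, and the bandwidth values are hypothetical.

```python
import math


def weighted_fit(xs, ys, ws):
    """Weighted least squares for y = a + b * x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def gaussian_weights(coords, target, bandwidth):
    """Gaussian distance-decay weights of all observations around one location."""
    return [math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords]


def local_slope(coords, xs, ys, target, bandwidth):
    """Slope of the regression calibrated at a single target location."""
    return weighted_fit(xs, ys, gaussian_weights(coords, target, bandwidth))[1]


# Synthetic 10 x 10 grid of "stations"; the predictor takes values 0..4.
coords = [(i, j) for i in range(10) for j in range(10)]
xs = [(3 * i + 7 * j) % 5 for i, j in coords]
# Assumed data-generating process: slope +1 in the west, -1 in the east.
ys = [x if i < 5 else -x for (i, _), x in zip(coords, xs)]

global_slope = weighted_fit(xs, ys, [1.0] * len(xs))[1]    # one slope for all
west_slope = local_slope(coords, xs, ys, (1, 5), bandwidth=1.0)
east_slope = local_slope(coords, xs, ys, (8, 5), bandwidth=1.0)
```

In this construction the single global slope is close to zero because the two opposite local effects cancel, while the local fits recover a positive slope in the west and a negative slope in the east. As the bandwidth grows, the weights become nearly equal and the local estimate approaches the global one, which is the intuition behind treating a variable with little spatial variation as global.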

In addition, some policy suggestions can be proposed based on the results of the MGWR model. First, for the subway stations located on the south side of Chengdu, government agencies could increase the density of buildings around the stations, build a denser national highway network, and add more bus stops to increase the accessibility and attractiveness of the subway stations. Second, for the stations located in the southwest of Chengdu, where the number of life-related POIs is positively related to passenger-distance, government agencies are advised to invest in more life-related POIs near the stations, consistent with the TOD strategy. Third, for the stations on the east side of Chengdu, where passenger-distance is positively related to the number of restaurants, more effort should be spent on increasing the number and diversity of restaurants to encourage passengers to travel longer distances by subway. Finally, since the density of provincial highways has a significantly positive impact on the passenger-distance of subway stations in western Chengdu, government agencies could consider expanding these provincial links to increase the demand for public transportation. Although this study focuses on Chengdu, China, its research framework and methodology are applicable to other cities as well. Specifically, future studies could use passenger-distance as the dependent variable and apply MGWR to explore the spatially varying relationship between station-level passenger-distance and its determinants.

6.2. Limitations and Future Research

We acknowledge several limitations of this study. First, data on population density and the regional economy were unavailable, so these variables were omitted from the model, which may bias the results. Second, the catchment area of each subway station was defined using a circular buffer with an 800 m radius together with Thiessen polygons. This choice may introduce bias, especially in the city center, where stations are densely distributed and have smaller coverage areas. Despite these limitations, this study systematically explores the spatial variation patterns of subway SLPD and investigates the influence of the built environment on their spatial heterogeneity. We also provide some suggestions on ridership growth and management. Third, although the MGWR model can be used to explore the spatially varying relationship between the explanatory variables and the response variable, it cannot account for the temporal dimension of the data. This is a potential direction for future work: models that consider both the spatial and temporal scales of the relationship between the built environment and the station-level passenger-distance may yield more accurate results. Finally, the circular buffer is only an approximation of the actual access and egress distance. In reality, territorial discontinuity and barriers can influence the access and egress distance, which in turn influences travelers’ behavior [66]. Future research should consider the territorial discontinuity and barriers around stations to obtain a more accurate estimate of the actual access and egress distance.
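
To make the catchment definition concrete, the following is a minimal sketch of one way an 800 m buffer can be combined with Thiessen polygons. It is an assumption rather than the exact procedure of this study: a POI is counted for a station only if that station is the nearest one (the Thiessen condition) and the POI lies within the buffer radius. Coordinates are assumed to be projected, in meters; the station names and points are hypothetical.

```python
import math


def count_pois_in_catchments(stations, pois, radius_m=800.0):
    """Count POIs per station catchment.

    stations: dict mapping station name -> (x, y) in meters
    pois:     list of (x, y) points in meters
    A POI is assigned to its nearest station (Thiessen rule) and counted
    only if it is also within radius_m of that station (buffer rule).
    """
    counts = {name: 0 for name in stations}
    for poi in pois:
        nearest, distance = min(
            ((name, math.dist(xy, poi)) for name, xy in stations.items()),
            key=lambda item: item[1],
        )
        if distance <= radius_m:
            counts[nearest] += 1
    return counts


# Two hypothetical stations 1 km apart, so their 800 m buffers overlap.
stations = {"A": (0.0, 0.0), "B": (1000.0, 0.0)}
pois = [
    (100.0, 0.0),    # nearest to A and inside its buffer
    (600.0, 0.0),    # inside both buffers, assigned to B by the Thiessen rule
    (500.0, 900.0),  # more than 800 m from both stations, not counted
]
catchment_counts = count_pois_in_catchments(stations, pois)
```

The second point shows why the Thiessen rule matters where stations are close together: without it, the point would be counted twice. It also illustrates the limitation noted above, since straight-line distance ignores barriers that lengthen the actual access and egress path.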

Data Availability

The data used to support the findings of this study are available from OpenStreetMap and Amap: https://www.openstreetmap.org/#map=12/51.0611/-114.1304 and https://www.amap.no/data.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments


This study was funded by the National Natural Science Foundation of China (grant nos. 71704145, 51774241, and 71831006), the Humanity and Social Science Foundation of the Ministry of Education of China (grant no. 18YJCZH138), the China Postdoctoral Science Foundation, and the Sichuan Youth Science and Technology Innovation Research Team Project (grant nos. 2019JDTD0002 and 2020JDTD0027).