Abstract

Stations are being converted into multipurpose living spaces that serve public transportation, work, commerce, and leisure. To satisfy the various requirements and expectations that accompany this functional extension, it is necessary to investigate and understand the phenomena caused by users. This paper proposes a methodology for clustering the characteristics of the pedestrian space of a railway station using pedestrian trajectory data collected from an actual operating station. The spatial usability for the movement and stay of pedestrians was then defined from the clustering results. The procedure for clustering the indoor space characteristics of an urban railway station consists of four steps: data collection, feature vector extraction, K-means clustering, and cluster characteristics analysis. A case study was conducted for Samseong station. The results of the proposed spatial clustering analysis showed that there are several types of spaces depending on the space occupancy characteristics of pedestrians. The proposed methodology could be applied to indoor space diagnosis from the perspective of station monitoring and management. In addition, the station operator could respond flexibly to unexpected events by monitoring the indoor spaces according to whether the flow is normal or suggestive of an emergency.

1. Introduction

The urban railway station consists of various spaces: platform, passage, concourse, complementary, and urban spaces [1]. Rather than being used solely for public transportation, these spaces are being transformed into complex living spaces with various functions, including work, commerce, and leisure. Unlike large-scale traffic-inducing facilities such as marts and department stores, a multi-business district centered on the railway station has a positive effect, improving the urban traffic environment in conjunction with transit-oriented development. To support the various requirements and expectations of this expansion, it is necessary to investigate and understand the various types of spatial utilization patterns generated by users.

Traditionally, evaluations of station facilities have been conducted through levels of service. The Transportation Research Board [2] proposed a method for evaluating the service of space and facilities through measures such as density, speed, and space. The space was divided into concourse, platform, stairs, and passage, with evaluation criteria presented for each facility. However, this evaluation method is limited in that it does not reflect the many phenomena of increasingly various spaces. In addition, the existing approach for designing stations has focused on efficiency for reducing the cost and scale of buildings without considerations of pedestrian flow. Accordingly, a novel strategy based on scientific analysis techniques is required to ensure the service quality provided to pedestrians. You et al. [3] proposed the pedestrian movement-based assessment toolkit for simulation (PATS), utilizing a comprehensive analytic framework that incorporates Big Data platforms such as individual travel records and demographic statistics corresponding to building information. By defining the space with increasingly diverse characteristics, this paper proposes a systematic space management and operation method.

This study classifies the characteristics of indoor spaces using pedestrian travel information collected by advanced sensor technology. A method is presented for clustering the pedestrian space of railway stations using pedestrian trajectory data collected from an actual operating station. Additionally, the method defines the spatial usability in terms of movement and static tendencies through the results of the clustering. The spatial clustering procedure comprises four steps. First, a field experiment was conducted to collect actual pedestrian trajectory data using 2D-LiDAR sensors. Second, cell-based feature vectors related to utilization efficiency, mobility, and comfortability were extracted from the pedestrian trajectories. These feature vectors were used as input variables for clustering the spatial characteristics. Third, a K-means clustering algorithm was used for spatial clustering, deriving spatial classification results from the three feature vectors. Finally, a case study on Samseong station was analyzed. The characteristics of each cluster were identified from the associated feature vectors to examine whether the spatial patterns are differentiated by the feature vectors.

The remainder of this paper is organized as follows. Section 2 reviews the related literature. Section 3 explains the proposed procedure for spatial clustering and introduces the pedestrian trajectory collection system. Section 4 presents an overview of data collection and the analysis results along with a discussion. Finally, Section 5 presents the conclusion with a summary of this study and future research directions based on the limitations identified in this study.

2. Literature Review

A literature review was conducted to identify valuable research opportunities that would differentiate this study from existing studies. The review focuses on existing studies in two major fields that are closely associated with the objective of this study. The first covers the approaches used to obtain pedestrian trajectories and their applications. The second covers investigations of pedestrian movements in indoor spaces for modeling pedestrian behavior.

Recently, advanced sensor and communication technologies have been widely applied to collect data not only on vehicles but also on pedestrians. Significant efforts have been made to detect and to track pedestrians [4-6]. These studies proposed algorithms for detecting and tracking vehicles and pedestrians using advanced sensing technologies such as LiDAR and vision sensors. In addition, several studies were performed to assess the congestion of indoor spaces using pedestrian characteristics data collected from various sensors, video, and Bluetooth [7-10]. In this study, a pedestrian trajectory collection system based on the LiDAR sensor was used to collect pedestrian trajectory data. Rather than proposing a pedestrian trajectory detection and tracking method, this study suggests a method to utilize the collected pedestrian trajectories. The pedestrian detection technologies developed in the previous studies can be combined with the methodology developed in this study to inform the service evaluation and operation plan.

A few existing studies have developed an extended model of pedestrian behavior based on cellular automata and demonstrated the model through simulation analysis [11-13]. Ji et al. [14] presented a cell-based model for aggressive pedestrians weaving their way through a crowd in a corridor. Related studies proposed methodologies that allow existing pedestrian models to better reflect real-world pedestrian behavior. Ma et al. [15] demonstrated that the movement of pedestrian counter flow is caused by interaction with the K nearest neighbors. They also conducted a validation analysis by comparing the lane formation pattern and the fundamental diagram with real pedestrian counter flow. The results indicate that the modeling method proposed therein reproduces traffic conditions more efficiently and accurately. These studies can be applied to pedestrian simulation models for evaluating facilities and services in regard to pedestrian movement in subway stations. The purpose of this study is not to simulate a pedestrian behavior model, but to present the characteristics of the space by aggregating the movement of pedestrians. Thus, a systematic framework for future space management and operation can be suggested.

Many efforts have been made to collect pedestrian data and to investigate pedestrian behavior models. Although several related studies involved pedestrians, we are unaware of any study that incorporates pedestrian movement characteristics into a spatial pattern analysis in a subway station. From this perspective, this study proposes a method to monitor the station that complements the pedestrian detection and tracking technologies of existing studies. A systematic framework for classifying spatial patterns using pedestrian trajectory data is proposed in this study.

3. Methodology

3.1. Overall Procedure of Spatial Clustering Analysis

In the past, the railway station has been defined as a waiting space for pedestrians to use public transportation. More recently, stations have rapidly changed into spaces where pedestrians engage in mobile, sedentary, and commerce-driven behaviors. Because the space impacts the nature of the users’ activity occurring within it, there is a need for systematic analysis of the diversified space.

In this study, spatial clustering was conducted to establish the basis for ensuring the safety and efficiency of railway stations. This was done through the classification of space characteristics for the indoor station. Feature vectors for spatial clustering were extracted using pedestrian trajectory data collected through the pedestrian trajectory collection system (PTCS), and the indoor space was classified using the K-means clustering method. To identify the spatial characteristics, a sequential procedure for clustering the indoor space has been developed, as shown in Figure 1. These steps were applied in a case study at Samseong station.
(i) Step 1: For the study area, the pedestrian trajectory data were collected and used for clustering the spatial characteristics. Samseong station in Seoul, Korea, was selected as the study area, as it is connected to the large complex shopping center, COEX mall. This study used a commercialized pedestrian trajectory collection system based on LiDAR sensors. The pedestrian trajectory data include the two-axis coordinates, speed, acceleration, and direction angle of an individual pedestrian. These values are calculated by the internal algorithm of the commercialized system from the raw data collected by the LiDAR sensors.
(ii) Step 2: The cell-based feature vectors were extracted for spatial clustering. Each feature vector is a value aggregated over 1 m × 1 m cells from the individual pedestrian trajectory data. Features related to the utilization efficiency, mobility, and comfortability of pedestrians were considered. These three features were represented by the number of trajectory points, the average speed, and the standard deviation of the direction angle, respectively.
(iii) Step 3: Spatial clustering was conducted using the K-means clustering method. Because K-means is an unsupervised learning method that requires the number of clusters to be specified in advance, it is necessary to determine the optimal number of clusters. In this step, the silhouette method was used to derive the optimal number of clusters. Then, each cell was assigned to a cluster so as to minimize the objective function, which is based on the Euclidean distance, for the selected number of clusters.
(iv) Step 4: Finally, the cluster results were applied to Samseong station, including an analysis of the characteristics according to clusters and time of day. Based on the clustering results, space characteristics were defined to include moving space, waiting space, and crowded space. Further discussion of Step 4 includes explanations of the methods used in spatial characteristics identification.

3.2. Pedestrian Trajectory Collection System (PTCS)

Recently, advanced sensor and communication technologies have been widely applied to provide various services for transportation users and operators. In this study, the PTCS, based on a 2D-LiDAR sensor, was used to collect individual pedestrian trajectory data in a public facility.

The LiDAR (Light Detection And Ranging) sensor, which has recently been used as a core component of autonomous vehicles, is easy to install and expand and offers a fast data acquisition process. Consisting of a transmitter and a receiver, a LiDAR sensor detects the distance, direction, and speed of an object by measuring the time taken for a short laser pulse to return. The sensor has an excellent range and spatial resolution under various lighting and temperature conditions. The LiDAR sensor can track objects precisely and process data quickly, enabling it to smoothly track a pedestrian’s movement, even in a complex indoor space such as a subway station. More technical details of the LiDAR sensor can be found in the referenced literature [16].

In this study, a commercially developed PTCS based on the LiDAR sensor was used for data collection. The PTCS affords customized communication; each LiDAR sensor has a maximum sensing range of approximately 15 m over 270° and operates at 5 Hz. Continuous pedestrian trajectories can be collected by overlapping the detection areas between LiDAR sensors. The pedestrian trajectory data are collected and recorded every 0.2 s on a server computer connected to the sensors and include the two-axis coordinates, walking speed, and direction angle. Based on the tracking results, the PTCS also provides visualization solutions, such as real-time pedestrian tracking systems and a heat-map. Further information regarding the PTCS can be found in the referenced literature [17].

4. Analysis and Results

4.1. Data Collection

Samseong station in Seoul, Korea, which has a development plan to establish six additional railway lines, was selected as the study area. With this development, Samseong station will be changed into a complex space that pedestrians will use for various purposes. The high-density waiting room connected to the ticket gate in Samseong station was selected as the data collection space. Five LiDAR sensors were used to collect the pedestrian trajectory data, with the sensing range indicated by the red area in Figure 2.

The field experiment for collecting the pedestrian trajectories was conducted from 7 AM to 10 PM on July 12, 2017, in Samseong station. The individual pedestrian trajectory data were collected at 0.2-s intervals via the PTCS and include the two-axis coordinates, speed, acceleration, and direction angle. Table 1 presents a description of the trajectory information collected by the PTCS.

Samseong station is a commercial and business-oriented district, so the pedestrian flow patterns vary according to the time of day, as presented in Figure 3. At the morning peak hours (dotted box ①), there is significant pedestrian flow from the ticket gate to concourse; on the other hand, the opposite pattern appears in the afternoon peak hours (dotted box ③). In addition, the inflow and outflow patterns are similar during nonpeak hours (dotted box ②). In total, 11,007 pedestrian trajectories were collected and used to establish a dataset for indoor spatial clustering. The dataset comprises 5,729; 747; and 4,531 pedestrians in the morning peak hours, nonpeak hours, and afternoon peak hours, respectively [18].

4.2. Extract the Feature Vector

Currently, performance measures for evaluating the level of service provided to pedestrians include pedestrian space, flow rate, density, and travel time [2, 19]. Additionally, significant effort has been made in many countries to develop novel methods for evaluating pedestrian environments [20-23]. In this study, feature vectors were extracted based on indices for evaluating pedestrian environments related to utilization efficiency, mobility, and comfortability. To identify the characteristics of indoor space for an urban railway station, this study used three feature vectors derived from the individual pedestrian trajectory data collected by the PTCS: the sum of the trajectory points, the average speed, and the standard deviation of the direction angle. The study area was divided into 1 m × 1 m cells to reflect spatial characteristics, as shown in Figure 4. The feature vectors are defined as follows.
(i) Sum of the trajectory points (STP, number of points/m²): The STP represents the utilization efficiency of the space. That is, it indicates how many pedestrians used the space in a given time interval. The value is calculated by counting the trajectory points that fall within each 1 m² cell during the given time interval, which is defined as 1 hour in this study.
(ii) Average speed (AS, m/s): The AS represents the mobility of pedestrians in the station and is used to evaluate the mobility of the cell. Each pedestrian trajectory point has a speed value. Speed is the measured value from the PTCS based on the coordinates between two consecutive cells. The cell-based average of the speed values collected from the PTCS is used as the feature vector for mobility.
(iii) Standard deviation of direction angle (SDD, degrees²): The SDD is used as a surrogate measure reflecting how comfortably pedestrians can move in the corresponding cell. When many pedestrians move in various directions in the same cell, many conflicts occur between pedestrians in that cell. The direction angle is the numerical value for the movement direction of the pedestrian. It is calculated from the vector between the coordinates at times “T-1” and “T,” with respect to reference points. The value ranges from −180° to +180°. The direction angle concept is illustrated in Figure 5.
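The three cell-based features can be illustrated with a minimal Python sketch. It is not the PTCS implementation: the function names `direction_angle` and `cell_features`, the tuple layout of a trajectory point, and the use of the population standard deviation are assumptions made for illustration. The sketch also treats the direction angle as a linear quantity, so the wrap-around at ±180° is ignored.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

CELL_SIZE_M = 1.0  # 1 m x 1 m cells, as described in the text


def direction_angle(prev_xy, curr_xy):
    """Movement direction from time T-1 to T, in degrees within [-180, 180]."""
    dx = curr_xy[0] - prev_xy[0]
    dy = curr_xy[1] - prev_xy[1]
    return math.degrees(math.atan2(dy, dx))


def cell_features(points):
    """Aggregate trajectory points of one time interval into per-cell features.

    `points` is an iterable of (x_m, y_m, speed_mps, angle_deg) tuples
    (a hypothetical layout, not the PTCS record format).
    """
    cells = defaultdict(list)
    for x, y, speed, angle in points:
        key = (math.floor(x / CELL_SIZE_M), math.floor(y / CELL_SIZE_M))
        cells[key].append((speed, angle))

    features = {}
    for key, samples in cells.items():
        speeds = [s for s, _ in samples]
        angles = [a for _, a in samples]
        features[key] = {
            "STP": len(samples),     # number of trajectory points in the cell
            "AS": mean(speeds),      # average speed in the cell
            "SDD": pstdev(angles),   # linear SD of direction angle (assumption)
        }
    return features


if __name__ == "__main__":
    sample = [
        (0.2, 0.3, 1.0, 0.0),
        (0.8, 0.1, 1.4, 90.0),
        (1.5, 0.5, 0.5, -45.0),
    ]
    for cell, feats in sorted(cell_features(sample).items()):
        print(cell, feats)
```

Because each cell covers 1 m², the raw point count per interval is numerically equal to the STP in points/m². A circular standard deviation would handle directions near ±180° more faithfully; whether the original analysis used one is not stated in the text.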

4.3. Spatial Clustering Results

The K-means clustering algorithm used in this study is a simple method to partition observations into clusters in which each observation belongs to the cluster with the nearest mean. The objective function to be minimized is the within-cluster variance, that is, the sum of squared distances between each observation and the center of its cluster. The steps for clustering the given data are described below. A more detailed theoretical background can be found in the referenced literature [24, 25].
(i) Step 1: Select the number of clusters, K, and then randomly select K points as the initial cluster centers.
(ii) Step 2: Calculate the distances between each individual data point and the K centers, and assign each data point to the cluster with the closest center.
(iii) Step 3: Set the average of the data belonging to each cluster as the new center of that cluster.
(iv) Step 4: Repeat steps 2-3 until the change in the objective function falls below a predefined threshold.
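The four steps above can be sketched in plain Python as follows. This is a generic illustration of the standard algorithm, not the code used in the study; the function name `kmeans`, the tolerance `tol`, the fixed random seed, and the toy data are all assumptions.

```python
import math
import random


def kmeans(data, k, max_iter=100, tol=1e-6, seed=0):
    """Cluster `data` (a list of equal-length numeric tuples) into k groups."""
    rng = random.Random(seed)
    # Step 1: pick k distinct observations as the initial centers.
    centers = [list(p) for p in rng.sample(data, k)]
    labels = [0] * len(data)
    prev_obj = math.inf

    for _ in range(max_iter):
        # Step 2: assign every observation to its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in data
        ]
        # Step 3: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        # Objective: sum of squared distances to the assigned centers.
        obj = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(data, labels))
        # Step 4: stop once the improvement falls below the threshold.
        if prev_obj - obj < tol:
            break
        prev_obj = obj

    return labels, centers


if __name__ == "__main__":
    # Toy (STP, AS, SDD)-like vectors, invented for illustration only.
    toy = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
           (10.0, 10.0, 10.0), (10.1, 10.0, 10.0), (10.0, 10.1, 10.0)]
    labels, centers = kmeans(toy, k=2)
    print(labels)
```

Because Euclidean distance is sensitive to scale, features with very different ranges (such as point counts and speeds) are commonly standardized before clustering; the text does not state whether this was done, so the sketch leaves the inputs unscaled.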

The cell-based STP, AS, and SDD were used as input variables of the K-means clustering algorithm. Because K-means clustering is an unsupervised learning method, it is necessary to decide the optimal number of clusters. In this study, the optimal number of clusters is determined by the silhouette method.

The silhouette method interprets and validates consistency within clusters of data. The technique provides a concise graphical representation of how well an object lies within the cluster [26]. The silhouette value is a measure of the cohesion of an object within its own cluster compared with its separation from other clusters, as shown in equation (1):

\[ s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} \tag{1} \]

Here, i is each object, s(i) is the silhouette value, a(i) is the average distance between i and all other data within the same cluster, and b(i) is the lowest average distance from i to all points in any other cluster of which i is not a member.

The silhouette value ranges from -1 to +1, where a high value indicates that the data are appropriately clustered. As presented in Figures 6(a) and 6(b), the optimal number of clusters was 6. Figures 6(c) and 6(d) show the distribution of three input variables clustered to six groups, with the minimum distance between the center point and each object.
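The silhouette value defined above can be computed directly from its definition, as in the following Python sketch. The function names `silhouette_values` and `mean_silhouette` and the toy data are assumptions for illustration; the sketch uses Euclidean distance, needs at least two clusters, and follows the common convention that an object in a single-member cluster receives a value of 0.

```python
import math


def silhouette_values(data, labels):
    """Return s(i) for every object; requires at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    values = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            values.append(0.0)  # convention for a single-member cluster
            continue
        # a(i): mean distance to the other members of the same cluster.
        a = sum(math.dist(data[i], data[j]) for j in own) / len(own)
        # b(i): lowest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(data[i], data[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def mean_silhouette(data, labels):
    """Average silhouette value, used to compare candidate clusterings."""
    vals = silhouette_values(data, labels)
    return sum(vals) / len(vals)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(mean_silhouette(pts, [0, 0, 1, 1]))  # compact, well-separated grouping
    print(mean_silhouette(pts, [0, 1, 0, 1]))  # grouping that mixes the two sides
```

To choose the number of clusters, the clustering would be repeated for each candidate K and the K with the highest mean silhouette value retained.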

4.4. Characteristic Analysis by Clusters for Samseong Station

To verify that the clustering results reflect the characteristics of indoor space, a characteristic analysis of the feature vectors was conducted for each of the six clusters. The STP was defined as the utilization efficiency of the space, the AS represented the mobility of pedestrians, and the SDD was used as a surrogate measure for the comfortability of the space. Table 2 presents the descriptions of the feature vectors for each cluster.
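A per-cluster summary of this kind can be produced by averaging each feature over the cells assigned to a cluster. The Python sketch below shows one way to do so; the function name `cluster_profile`, the choice of the mean as the summary statistic, and the numbers in the example are assumptions and do not reproduce Table 2.

```python
from collections import defaultdict
from statistics import mean

FEATURES = ("STP", "AS", "SDD")


def cluster_profile(cell_vectors, labels):
    """Summarize each cluster by its cell count and mean feature values.

    `cell_vectors` holds one (STP, AS, SDD) tuple per cell and `labels`
    holds the cluster assigned to the same cell.
    """
    grouped = defaultdict(list)
    for vec, lab in zip(cell_vectors, labels):
        grouped[lab].append(vec)

    profile = {}
    for lab, vecs in grouped.items():
        summary = {"n_cells": len(vecs)}
        for i, name in enumerate(FEATURES):
            summary[name] = mean([v[i] for v in vecs])
        profile[lab] = summary
    return profile


if __name__ == "__main__":
    # Invented values for illustration only.
    vectors = [(10, 1.2, 20.0), (30, 1.0, 40.0), (200, 0.6, 80.0)]
    for lab, summary in sorted(cluster_profile(vectors, [1, 1, 2]).items()):
        print(lab, summary)
```

Applying the same summary separately to each time period would give the kind of cluster-by-time-of-day comparison discussed for the station.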

The spaces of Samseong station were characterized into six groups, as illustrated in Figure 7. Cluster 1 has the smallest STP, which indicates a lower space utilization frequency, with fewer conflicts, and a higher AS related to mobility. Cluster 1 could be defined as the space through which pedestrians move with more varying speeds because the SDD is higher compared to that in other clusters. According to the results of spatial matching, cluster 1 appears to be a region close to pillars and walls. It has a low utilization rate and few conflicts; thus, it can be identified as a space suitable for high-speed movement. Clusters 2 and 3 have similar STP values; however, the AS of cluster 2 is higher and its SDD lower, characterizing cluster 2 as a space where pedestrians can move stably. In cluster 3, the SDD was high, indicating heightened conflict during walking. For clusters 4-6, the occupancy rate of pedestrians was high. This coincided with the main purpose of these spaces, which was to pass through the station in morning-peak and afternoon-peak hours. In particular, cluster 5 had more conflicts than clusters 4 and 6, which could be explained by the bottleneck caused by passengers getting off the train at peak hours in the morning. The definition of space based on the description of feature vectors is presented in Table 3.

Additionally, the results showed that the characteristics of Samseong station had different patterns depending on the time of day. Figure 8 presents the number of cells by cluster and time of day. For cluster 1, located near the pillars or walls, the characteristics of the space did not change with time. The space connected to both sides of the ticket gate is mainly used as a movement space for pedestrians, with the type of movement changing between the morning-peak and afternoon-peak hours. In the corridor and the vicinity of the ticket gate, which belong to cluster 3, conflicts occur constantly. This calls for additional services, such as separation of pedestrian flow lines and real-time control of the gateway, in order to secure pedestrians’ comfort.

5. Discussion and Conclusion

With the expansion of complex business centers in urban railway stations, a comprehensive and systematic analytical framework is required to inform the complicated procedures of design and operational planning. For this purpose, it is necessary to manage the space according to differentiated spatial characteristics through analysis of movement patterns of pedestrians. This study proposed a spatial clustering methodology based on the complex walking patterns of railway pedestrians.

This study used the pedestrian trajectory data to classify the spatial characteristics of the station by patterns of pedestrian usage of space. The pedestrian trajectory data were collected from the PTCS based on 2D LiDAR sensors. Cell-based surrogate measures, including the sum of the trajectory point, the average speed, and the standard deviation of the direction angle, were used to represent utilization efficiency, mobility, and comfortability, respectively. These were then derived as feature vectors. The K-means clustering algorithm was used to classify the characteristics of the indoor space; by using the silhouette method, the optimal cluster number was determined to be six. A case study was conducted in the Samseong station, and the results showed that the characteristics of the Samseong station had different patterns depending on the time of day. Additionally, it was demonstrated that there are various space types according to the space occupation characteristics of pedestrians. In particular, this emphasized the importance of spatial analysis as a reflection of pedestrian travel behavior.

In related studies, the service level of a station was evaluated using measures such as density, flow rate, and space. In that evaluation method, an investigator directly counted the number of pedestrians at various times. In this study, advanced sensor technologies were used to collect more accurate pedestrian data. Through this, a systematic process was suggested to analyze the usage status and to diagnose indoor space characteristics. Although useful insights were derived from this study, further research must be conducted to achieve more reliable and widely applicable results. First, to analyze the diversity of indoor space, various clustering techniques and feature vectors are needed. Furthermore, it is possible to optimize an indoor space through repetitive learning methods such as deep neural networks. Additionally, spatial analysis using various cell sizes is necessary to investigate the effects of cell sizes on the results. Second, an integrated evaluation study that incorporates methodologies such as space syntax should be conducted, as such methodology is already being used for building design and assessment. Finally, system extensions and a real-time monitoring framework should be developed to improve the practical usability of the proposed methodology. To do this, the methodology proposed in this study should be applied to various stations and the resulting characteristics compared and analyzed. Thus, a novel evaluation criterion for spatial analysis of indoor spaces could be established and utilized in the management and operation of the station.

Through spatial diagnosis of pedestrians’ movement patterns, the proposed methodology can be applied to space management and monitoring. In particular, its application can be expanded to include real time situations involving changes in pedestrian flow caused by the arrival and departure of trains. Furthermore, it can support flexible responses for unexpected events by proactively monitoring the occupation pattern of an indoor space according to the type of flow, whether normal or emergent. Stations’ interest in using such advanced technology is increasing. Thus, the proposed methodology could be extended to other applications: it could be applied not only as a space management technique, but also as an underlying technology for providing various services to users and operators.

Data Availability

The pedestrian trajectory data used to support the findings of this study were supplied by LG Hitachi under license and so cannot be made freely available.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by a grant from R&D Program of the Korea Railroad Research Institute (KRRI), Republic of Korea.