Urban open places with a public service function (e.g., urban parks) are likely to be crowded in peak hours and during public events. To mitigate the risk of overcrowding and stampedes, it is important to obtain a real-time, full-coverage estimate of the population density. The main challenge has been the limited deployment of crowd surveillance detectors in open public spaces, which leads to incomplete data coverage and thus degrades the quality and reliability of the density estimation. To remedy this issue, this paper proposes a modified inverse distance weighting (IDW) method, named the inverse distance weighting based on path selection behavior (IDWPSB) method. The proposed IDWPSB method adjusts the distance decay effect according to visitors’ path selection behavior, which better characterizes the human dynamics in open spaces. Applied to a real-world road network in the Shichahai scenic area in Beijing, China, the new method reduces the absolute deviation by 17.62% relative to the traditional IDW method, supporting its effectiveness for spatial interpolation in open public places. By considering the behavioral factor, the proposed IDWPSB method can provide insights into public safety management with the increasing availability of data derived from location-based services.

1. Introduction

In Chinese cities, open places serving a public service function (e.g., urban parks) are likely to be crowded in peak hours and during public events. The inherent difficulties of surveillance and crowd control in these open spaces highlight the numerous evacuation problems that could arise in an emergency, including the occurrence of stampedes. This is precisely what occurred on February 5, 2004, at the Lantern Festival in Beijing’s Miyun Park. In this instance, a stampede in the primary viewing space caused 37 fatalities even though the total number of visitors was far below the park’s maximum capacity. Hence, controlling the total number of visitors is insufficient for comprehensive safety management in open public places. It is essential to obtain a real-time, full-coverage estimate of the population density to prevent local overcrowding.

To date, many techniques have been applied to crowd surveillance. Video monitoring recognition is one of the most widely used techniques in public security surveillance. With the development of computer vision, studies on crowd analysis [1, 2] and anomaly detection [3–5] based on video monitoring recognition techniques have made significant strides. Despite these improvements, it remains infeasible to count people reliably when the crowd density is extremely high, primarily because of partial occlusions among individuals [6]. This difficulty in counting high-density crowds is a severe issue for safety management in open public places.

There is an opportunity to overcome this issue by using data from location-based services (LBSs), such as cellular phone and Wi-Fi probe data. The popularity of smartphones ensures that cell phone data, including cellular signaling data and call detail record data, can fill the gap of individual-based trajectory tracking in urban open spaces [7–10]. Nonetheless, employing cellular phone data has been considered an invasion of people's privacy [11] and, consequently, its widespread use in exploring individual behavior is restricted. Meanwhile, the increasing ubiquity of public wireless networks in urban environments creates new pathways to understand population dynamics across space and over time [12]. As an emerging location sensor, Wi-Fi probes can conveniently obtain location information from mobile devices [13]. Given the popularity of smartphones today, Wi-Fi probes may prove indispensable in acquiring aggregate movement information in an area of interest [14]. Since Wi-Fi probes merely maintain a unique log of the mobile device, there is little potential for privacy infringement [11]. Wi-Fi probe data (called Wi-Fi data hereafter) are consequently becoming an integral component of crowd surveillance [12, 15, 16].

Regardless of how the data collection process ultimately develops, the age-old tension between expensive detection equipment and limited budgets remains in effect, which leads to the limited deployment of detectors in open public spaces. Moreover, equipment failures and data transmission interruptions diminish the data coverage and severely weaken the quality and reliability of related applications. Thus, the challenge is to realize an acceptable level of data accuracy within the constraints of a limited budget [17]. Spatial interpolation can yield efficient imputations of the missing data while remaining cost-effective, as it requires no additional data collection efforts [18]. To this end, various interpolation methods, including geostatistical and deterministic methods, have been developed in recent decades. The most common geostatistical method applied to traffic interpolation is the Kriging method, which is based on the assumption that the area of interest is an unrestricted region that has spatial autocorrelation with its neighborhoods. Typically, the Kriging method only adopts the Euclidean distance as the distance metric to represent the distance decay effect. Modifications of the Kriging method have adopted various distance metrics, such as the road network distance [17, 19] and the great-circle distance [20], which, to a limited degree, can address the limitations associated with the traditional Kriging method. The inverse distance weighting (IDW) method is the most widely utilized deterministic method and is commonly applied to the interpolation of large datasets, including air quality [21] and noise pollution monitoring [22], and has been implemented as a standard spatial interpolation procedure in many geographic information system (GIS) software packages [23]. The IDW method assumes that each measured point has a local influence on unsampled sites moderated by the distance decay effect.
Compared to the Kriging method, the IDW method has a greater ability to adjust the distance metric as it is not restricted by the covariance and variation functions. However, the primary disadvantage of the existing IDW methods is that they only consider objective factors such as the road network structure and spatial distance as decisive factors in the interpolation process while ignoring the influence of individual travel behavior. In an urban open space, an individual’s travel often matches their activity demands. The seemingly free-flowing crowd actually has a particular preference for certain paths, referred to as the path selection behavior. The path selection behavior becomes more prominent in emergency situations such as overcrowding [24]. This path selection behavior has not been adequately considered in the existing interpolation methods.

To estimate the spatial distribution of the crowd population in open public places, we propose a new interpolation method called the inverse distance weighting method based on path selection behavior (IDWPSB). The path selection behavior is used to adjust the distance decay effect, so that the interpolation result better approximates the actual crowd characteristics. The IDWPSB method is applied to a case study in the Shichahai scenic area in Beijing, China, to verify its computational efficiency and reliability in estimating the crowd population in road networks.

2. Materials

2.1. Description of Wi-Fi Data

A Wi-Fi-enabled mobile device (e.g., a smartphone) can initiate a connection to a Wi-Fi network by continuously broadcasting signals known as probe requests. Each probe request contains a sequence of device information, including the media access control (MAC) address, device type, brand, and manufacturer. For each smart device, the MAC address is unique to the network connection. Since the probe requests are not encrypted, they can be passively captured and decoded with the help of wireless sniffers. In addition, the received signal strength (RSS) of the probe requests can be measured. All these data are uploaded to a server.

The Wi-Fi probe is a type of wireless sniffer. A description of the data collected by the Wi-Fi probe is shown in Table 1. Since there is a clear correspondence between a user and their smart device, the device’s MAC address can be regarded as an identifier of a specific individual who is located within the detection range of the Wi-Fi probe. To avoid violations of privacy, the MAC address is anonymized by only extracting a fragment from the source record. The timestamp records the temporal information of the Wi-Fi connection. The RSS indicates the connection intensity, which mainly depends on the physical distance between the smart device and the Wi-Fi probe.

2.2. Preprocessing of Wi-Fi Data

Before being analyzed, the raw Wi-Fi data extracted from the server require several steps of preprocessing. First, the RSS needs to be resampled at a regular time step due to the bursty nature of the probe requests, as the probe requests sent by a mobile device are not evenly distributed in time. For example, there can be a sequence of probe requests from one smart device within a few hundred milliseconds, followed by a silent period that lasts several seconds. The number of bursts depends on the working condition of the smart device. To mitigate the uncertainty of the RSS captures, we averaged the RSS values received within a one-second interval [25].
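As an illustration of the resampling step, the one-second averaging can be sketched as follows. This is a minimal sketch rather than the authors' implementation: the record layout `(mac, timestamp, rss)` and the function name `resample_rss` are assumptions introduced here.

```python
from collections import defaultdict
from statistics import mean

def resample_rss(records, step=1.0):
    """Average the RSS of each device over fixed time bins.

    records: iterable of (mac, timestamp_seconds, rss_dbm) tuples
             (hypothetical layout, not taken from the source).
    Returns a dict mapping (mac, bin_start_seconds) -> mean RSS in that bin.
    """
    bins = defaultdict(list)
    for mac, ts, rss in records:
        bin_start = (ts // step) * step  # floor the timestamp to its bin
        bins[(mac, bin_start)].append(rss)
    return {key: mean(values) for key, values in bins.items()}

# A burst of three probe requests within one second, then a silent period.
raw = [
    ("a1b2", 10.05, -60.0),
    ("a1b2", 10.20, -64.0),
    ("a1b2", 10.35, -62.0),
    ("a1b2", 14.70, -70.0),
]
resampled = resample_rss(raw)
```

The burst collapses into a single value per second, so a device that happens to transmit many requests in one burst does not dominate later statistics.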

The second step is to uniformize the detection radius among the Wi-Fi probes installed in various environments by data screening. The relationship between the RSS and the physical distance is calculated as

$$d = 10^{(A - \mathrm{RSS})/(10n)}, \tag{1}$$

where $d$ is the physical distance from the smart device to the Wi-Fi probe, $\mathrm{RSS}$ is the received signal strength, $A$ is the calibration parameter of the RSS, and $n$ is the calibration parameter of the distance decay effect. For a given type of Wi-Fi probe, $A$ and $n$ are constants. By setting a maximum detection radius for the Wi-Fi probe, the lower limit of the RSS, $\mathrm{RSS}_{\min}$, can be calculated accordingly, and the request records with RSS values smaller than $\mathrm{RSS}_{\min}$ can be excluded. In addition, to remove individuals who are not outdoors, a threshold is set on the time duration, defined as the end time minus the start time. All records with a time duration larger than the threshold value (i.e., one hour) are eliminated because individuals in an indoor environment stay connected longer.
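The screening step can be sketched as below. The sketch assumes the standard log-distance path loss model, in which the RSS falls by $10n$ dB per decade of distance; this model, the calibration values `A` and `N`, and all function names are assumptions made for illustration and are not confirmed by the source.

```python
import math

def rss_to_distance(rss, a, n):
    """Distance in metres under the assumed log-distance model."""
    return 10 ** ((a - rss) / (10 * n))

def rss_lower_limit(d_max, a, n):
    """RSS expected at the maximum detection radius d_max (metres)."""
    return a - 10 * n * math.log10(d_max)

def screen_records(records, rss_min, max_duration=3600.0):
    """Drop out-of-range records, then devices that stay connected too long.

    records: iterable of (mac, timestamp_seconds, rss_dbm) tuples.
    A device is removed entirely when its last-seen time minus its
    first-seen time exceeds max_duration (one hour by default).
    """
    in_range = [r for r in records if r[2] >= rss_min]
    first, last = {}, {}
    for mac, ts, _ in in_range:
        first[mac] = min(ts, first.get(mac, ts))
        last[mac] = max(ts, last.get(mac, ts))
    return [r for r in in_range if last[r[0]] - first[r[0]] <= max_duration]

A, N = -40.0, 2.0  # hypothetical calibration constants for one probe type
rss_min = rss_lower_limit(50.0, A, N)  # threshold for a 50 m radius
```

With these hypothetical constants the threshold is about -74 dBm; any other pair of calibration constants would simply shift the threshold.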

Third, considering that some individuals, such as children and senior citizens, do not carry a smart device, we adjusted the detected population through the detectable rate of the Wi-Fi probe. The detectable rate of the Wi-Fi probe is defined as

$$\eta = \frac{N^{d}}{N^{a}}, \tag{2}$$

where $\eta$ is the detectable rate, $N^{d}$ is the population detected by the Wi-Fi probe, and $N^{a}$ is the actual population in the detection radius. The detectable rate of a Wi-Fi probe is related to multiple factors, including the installation environment, working conditions of the smart devices, and the population under coverage. We obtained the parameter $\eta$ in a field survey, in which $N^{a}$ was acquired by manually counting the number of people within the scope. By surveying the sampling sites, the detectable rate of each Wi-Fi probe is calibrated. Then, the adjusted detected population can be calculated accordingly as

$$N' = \frac{N^{d}}{\eta}, \tag{3}$$

where $N'$ is the adjusted population within the detection radius of a Wi-Fi probe.
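The calibration and adjustment amount to two divisions, sketched below with hypothetical survey numbers. The function names are illustrative only.

```python
def detectable_rate(detected, actual):
    """Share of the manually counted population that the probe detected."""
    if actual <= 0:
        raise ValueError("actual population must be positive")
    return detected / actual

def adjusted_population(detected, rate):
    """Scale a detected count up by a previously calibrated detectable rate."""
    if rate <= 0:
        raise ValueError("detectable rate must be positive")
    return detected / rate

# Hypothetical field survey: the probe saw 45 devices while 60 people were counted.
rate = detectable_rate(45, 60)
# Later, the same probe detects 90 devices.
estimate = adjusted_population(90, rate)
```

Because the rate is calibrated per probe, each sampling site would carry its own value of `rate`.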

3. Methodology

3.1. Inverse Distance Weighting (IDW) Method

First, the local crowd density at the sampling site is calculated as

$$\rho_i = \frac{N'_i}{S_i}, \tag{4}$$

where $\rho_i$ is the local crowd density and $N'_i$ is the adjusted detected population at sampling site $i$. $S_i$ is the total area of the road segments in the detection radius of Wi-Fi probe $i$ and is calculated as

$$S_i = \sum_{k=1}^{m_i} l_k w_k, \tag{5}$$

where $m_i$ is the number of road segments within the detection radius of Wi-Fi probe $i$, $l_k$ is the length of road segment $k$, and $w_k$ is the average width of road segment $k$.
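A short sketch of the density calculation follows: the covered area is the sum of length times average width over the road segments inside the detection radius, and the density is the adjusted population divided by that area. The segment dimensions and population are hypothetical.

```python
def covered_area(segments):
    """Total road area (m^2) from (length_m, mean_width_m) pairs."""
    return sum(length * width for length, width in segments)

def local_density(adjusted_pop, segments):
    """Persons per square metre within one probe's detection radius."""
    area = covered_area(segments)
    if area <= 0:
        raise ValueError("covered road area must be positive")
    return adjusted_pop / area

# Two hypothetical road segments inside the detection radius.
segments = [(100.0, 4.0), (50.0, 6.0)]
density = local_density(140.0, segments)
```

Using the road area rather than the full detection disc reflects that pedestrians are confined to the road segments.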

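According to the principle of the IDW, the estimated crowd density at an unsampled site (node) $j$ is calculated as

$$\hat{\rho}_j = \frac{\sum_{i} \rho_i\, d_{ij}^{-1}}{\sum_{i} d_{ij}^{-1}}, \tag{6}$$

where $\hat{\rho}_j$ is the crowd density of the unsampled node $j$; $\rho_i$ is the crowd density of the sampling node $i$; and $d_{ij}$ is the shortest path distance from node $i$ to node $j$. The sums run over all sampling nodes.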
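The network-based IDW estimate can be sketched as follows, using Dijkstra's algorithm for the shortest path distances. This is a minimal sketch under stated assumptions: the adjacency-dictionary graph layout, the function names, and the `power` exponent (set to 1 by default) are choices made here and are not taken from the source.

```python
import heapq

def shortest_path_lengths(graph, source):
    """Dijkstra's algorithm.

    graph: {node: {neighbour: edge_length}}, with each undirected edge
    listed in both directions. Returns {node: distance from source}.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def idw_estimate(graph, sampled, target, power=1.0):
    """Inverse-distance-weighted mean of the sampled densities at target.

    sampled: {node: observed density}. power is an assumed exponent.
    """
    if target in sampled:
        return sampled[target]
    dist = shortest_path_lengths(graph, target)
    num = den = 0.0
    for node, density in sampled.items():
        if node not in dist:
            continue  # sampling node unreachable from the target
        weight = dist[node] ** -power
        num += weight * density
        den += weight
    if den == 0.0:
        raise ValueError("no reachable sampling node")
    return num / den

# Straight road A - O - B; O is unsampled.
equal = {"A": {"O": 100.0}, "O": {"A": 100.0, "B": 100.0}, "B": {"O": 100.0}}
unequal = {"A": {"O": 100.0}, "O": {"A": 100.0, "B": 300.0}, "B": {"O": 300.0}}
observed = {"A": 0.2, "B": 0.4}
rho_equal = idw_estimate(equal, observed, "O")
rho_unequal = idw_estimate(unequal, observed, "O")
```

When both sampling nodes are equidistant the estimate is their plain mean; moving B three times farther away pulls the estimate toward the value at A.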

A series of scenarios were set to simulate the interpolation process for open public places and to validate the effectiveness of Equation (6). As shown in Figure 1, we assume that node $O$ is the unsampled site and that nodes $A$ and $B$ are sampling sites. We also assume that the path distance from node $O$ to node $A$ and that to node $B$ are equal. Equation (7) formulates the crowd density at node $O$ as $\rho_O$, which is estimated based on the observed population on nodes $A$ and $B$:

$$\rho_O = \frac{N_A / d_{OA} + N_B / d_{OB}}{1 / d_{OA} + 1 / d_{OB}}, \tag{7}$$

where $N_A$ and $N_B$ are the observed population on node $A$ and node $B$, respectively, and $d_{OA}$ and $d_{OB}$ are the shortest path distances for segment $OA$ and segment $OB$, respectively.

To further explain Equation (7), we have designed three scenarios, as shown in Figure 1. In Figure 1(a), when there are no road branches, the two sampling nodes are weighted equally because they are equidistant from the unsampled node. Now, we add a junction to the road, as shown in Figure 1(b). When individuals select roads randomly (i.e., no path selection behavior), the crowd flow will split evenly between the branches. In this case, the two sampling nodes still contribute equally to the population estimation at the unsampled node. In Figure 1(c), when one node becomes more attractive than the other, we adjust the distance weight according to the path selection behavior to reflect the attractiveness of the road segment.

3.2. Inverse Distance Weighting Based on the Path Selection Behavior

Because of the large number of junctions in the road networks of open public places, the distance decay effect with the spatial distance on a straight road is altered by individual preference. Through the observation of individuals’ activity trajectories derived from the Wi-Fi data, the path selection behavior pattern, denoted as the transition probability, can be extracted. Adjustment of the distance decay effect according to the transition probability of the path selection behavior can improve the interpolation accuracy.

An activity trajectory is defined as a sequence of locations that an individual walks through during a period, denoted as

$$Tr = \{(n_1, t_1), (n_2, t_2), \ldots, (n_K, t_K)\}, \tag{8}$$

where $n_k$ is the $k$th node on the activity trajectory and $t_k$ is the timestamp of node $n_k$, both of which are recorded by a series of Wi-Fi probes located at different sites, and $K$ is the total number of nodes on the activity trajectory. As shown in Figure 2, a cluster of activity trajectories can reveal the diversity of the path selection behavior among the crowd. In a road network, the nodes that appear together in most activity trajectories have a stronger association, regardless of the spatial distance.
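Building activity trajectories from probe records can be sketched as grouping by device and sorting by time. The record layout `(mac, node, timestamp)` and the choice to collapse consecutive detections at the same node into one visit are assumptions made here, not details given in the source.

```python
from collections import defaultdict

def build_trajectories(records):
    """Group probe detections into time-ordered trajectories per device.

    records: iterable of (mac, node, timestamp) tuples (hypothetical layout).
    Returns {mac: [(node, first_timestamp_of_visit), ...]}.
    Consecutive detections at the same node are merged into one visit.
    """
    by_device = defaultdict(list)
    for mac, node, ts in records:
        by_device[mac].append((ts, node))
    trajectories = {}
    for mac, visits in by_device.items():
        visits.sort()  # chronological order
        path = []
        for ts, node in visits:
            if not path or path[-1][0] != node:
                path.append((node, ts))
        trajectories[mac] = path
    return trajectories

detections = [
    ("m1", "p2", 30), ("m1", "p1", 10), ("m1", "p1", 12),
    ("m1", "p3", 50), ("m2", "p3", 5),
]
trajectories = build_trajectories(detections)
```

Each resulting list is one trajectory in the sense of the definition above: an ordered sequence of (node, timestamp) pairs for a single individual.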

The transition probability of the path selection behavior can be calculated from a large number of activity trajectories recorded by Wi-Fi data over a period of time. For a pair of adjacent nodes, $(i, j)$, the transition probability $p_{ij}$ is defined as the proportion of activity trajectories that contain both nodes $i$ and $j$ to those containing node $i$, calculated as

$$p_{ij} = \frac{\left|\{\, m : (i, t) \in Tr_m \text{ and } (j, t') \in Tr_m \,\}\right|}{\left|\{\, m : (i, t) \in Tr_m \,\}\right|}, \quad m = 1, \ldots, M, \tag{9}$$

where $Tr_m$ is the activity trajectory of individual $m$, $M$ is the total number of individuals recorded by the Wi-Fi probes, and $t$ and $t'$ are arbitrary timestamps on an activity trajectory. For nodes that are not adjacent, the transition probability is the product of the transition probabilities of each pair of adjacent nodes on the shortest path between the unsampled node and the nonadjacent sampling node, calculated as

$$p_{ij} = \prod_{k=0}^{q} p_{n_k n_{k+1}}, \quad n_0 = i,\ n_{q+1} = j, \tag{10}$$

where $q$ is the number of intermediate nodes from node $i$ to node $j$. It is worth noting that since an activity trajectory has a movement direction, $p_{ij}$ may not be equal to $p_{ji}$. Thus, the relationship between node $i$ and node $j$ can be represented as the average of the transition probabilities between nodes $i$ and $j$, denoted as

$$\bar{p}_{ij} = \frac{p_{ij} + p_{ji}}{2}. \tag{11}$$
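The three quantities described above (adjacent-pair probability, product along a path, and the two-way average) can be sketched as follows. To reflect the movement direction, this sketch counts a trajectory only when the second node is visited after the first; that ordering rule is an assumption made here, and the function names are illustrative.

```python
def transition_probability(trajectories, i, j):
    """Share of trajectories containing i that also visit j after i.

    trajectories: list of node-id lists in time order.
    The "after i" ordering is an assumed reading of movement direction.
    """
    with_i = both = 0
    for path in trajectories:
        if i in path:
            with_i += 1
            if j in path[path.index(i) + 1:]:
                both += 1
    return both / with_i if with_i else 0.0

def path_probability(trajectories, nodes):
    """Product of adjacent-pair probabilities along an ordered node path."""
    prob = 1.0
    for a, b in zip(nodes, nodes[1:]):
        prob *= transition_probability(trajectories, a, b)
    return prob

def mean_probability(trajectories, nodes):
    """Average of the forward and backward path probabilities."""
    forward = path_probability(trajectories, nodes)
    backward = path_probability(trajectories, nodes[::-1])
    return (forward + backward) / 2

walks = [["p1", "p2", "p3"], ["p1", "p2"], ["p2", "p1"], ["p1", "p4"]]
p_12 = transition_probability(walks, "p1", "p2")
p_21 = transition_probability(walks, "p2", "p1")
p_bar_13 = mean_probability(walks, ["p1", "p2", "p3"])
```

In this toy data, two of the four trajectories through p1 continue to p2, while only one of the three through p2 continues to p1, which illustrates why the two directions are averaged.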

Then, the transition probability of the path selection behavior is used to adjust the distance decay effect in the traditional IDW metric: the higher the transition probability of the path selection behavior is, the smaller the distance decay is. The adjustment is given by Equation (12), where $d'_{ij}$ is the adjusted path distance from node $i$ to node $j$, $d_{ij}$ is the shortest path distance, and $\bar{p}_{ij}$ is the average transition probability between nodes $i$ and $j$.
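To make the adjustment concrete, the sketch below assumes one possible form of the adjusted distance, namely the shortest path distance divided by the average transition probability. This form is an assumption chosen only because it satisfies the stated property (a higher transition probability gives a smaller effective distance); it is not confirmed by the source, and the `power` exponent and function name are likewise hypothetical.

```python
def idwpsb_estimate(densities, distances, probabilities, power=1.0):
    """Behaviour-adjusted IDW estimate at one unsampled node.

    densities, distances, probabilities: dicts keyed by sampling node,
    giving the observed density, the shortest path distance to the
    unsampled node, and the average transition probability.
    ASSUMED adjustment (not from the source): d_adj = d / p_bar.
    """
    num = den = 0.0
    for node, density in densities.items():
        p_bar = probabilities[node]
        if p_bar <= 0:
            continue  # never co-visited: treated as infinitely far away
        adjusted = distances[node] / p_bar
        weight = adjusted ** -power
        num += weight * density
        den += weight
    if den == 0.0:
        raise ValueError("no sampling node with a positive transition probability")
    return num / den

densities = {"A": 0.2, "B": 0.4}
distances = {"A": 100.0, "B": 100.0}
# Same distances, but visitors move between O and A three times as often.
preferred = idwpsb_estimate(densities, distances, {"A": 0.6, "B": 0.2})
neutral = idwpsb_estimate(densities, distances, {"A": 0.5, "B": 0.5})
```

With equal probabilities the result reduces to the traditional IDW value, whereas the preference for A shifts the estimate toward the density observed at A.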

3.3. The Validation Index of the Interpolation

Cross-validation is applied to evaluate the accuracy of the interpolation results by comparing the interpolated crowd density of a sampling node with the adjusted crowd density detected by a Wi-Fi probe. As mentioned before, there is a deviation between the population detected by the Wi-Fi probe and the ground truth data because the detectable rate of the Wi-Fi probe is not 100%. For this reason, field survey experiments were carried out to calibrate the detectable rate of the Wi-Fi probe and to adjust the detected population according to Equation (3), making the adjusted population as close to the ground truth value as possible. Although there is still a deviation between the adjusted population and the ground truth data, assuming that this deviation is a systematic error for each Wi-Fi probe, it is acceptable to use the adjusted population as the ground truth because it does not affect the validation result.

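Two widely used accuracy indices, the mean absolute error (MAE) and the mean bias error (MBE), are used to indicate the absolute deviation and bias of the interpolation results, calculated as

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \rho_i - \hat{\rho}_i \right|, \tag{13}$$

$$\mathrm{MBE} = \frac{1}{n} \sum_{i=1}^{n} \left( \rho_i - \hat{\rho}_i \right), \tag{14}$$

where $\rho_i$ is the adjusted crowd density detected by a Wi-Fi probe on node $i$, which is regarded as the true value; $\hat{\rho}_i$ is the estimated value obtained by the IDWPSB method on node $i$; and $n$ is the number of estimates being compared.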

The MAE represents the mean value of the absolute deviations between the estimated values and the true values. The smaller the MAE is, the closer the estimated values are to the true values. The MBE denotes the mean bias between the estimated values and the true values; the sign of the MBE suggests underestimation (positive value) or overestimation (negative) of the interpolation results.
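The two indices and a leave-one-out cross-validation loop can be sketched as follows. The bias is taken as true minus estimated so that a positive value indicates underestimation, matching the sign convention described above. The `estimator` callback and the toy estimator used in the example are placeholders, not the interpolation method itself.

```python
def mae(true_values, estimates):
    """Mean absolute error between true and estimated values."""
    return sum(abs(t - e) for t, e in zip(true_values, estimates)) / len(true_values)

def mbe(true_values, estimates):
    """Mean bias error as true minus estimated (positive = underestimation)."""
    return sum(t - e for t, e in zip(true_values, estimates)) / len(true_values)

def leave_one_out(sampled, estimator):
    """Estimate each sampling node from all the others.

    sampled: {node: observed density}.
    estimator: callable (others_dict, target_node) -> estimate; any
               interpolation routine can be plugged in here.
    """
    true_values, estimates = [], []
    for node, value in sampled.items():
        others = {k: v for k, v in sampled.items() if k != node}
        true_values.append(value)
        estimates.append(estimator(others, node))
    return true_values, estimates

# Placeholder estimator: the plain mean of the remaining sites.
def mean_of_others(others, _target):
    return sum(others.values()) / len(others)

truth, guess = leave_one_out({"a": 1.0, "b": 2.0, "c": 3.0}, mean_of_others)
```

In this toy case the errors at the two extreme sites cancel in the MBE but not in the MAE, which is why both indices are reported.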

4. Study Area and Data Collection

The Shichahai scenic area is a tourist neighborhood in Northwest Beijing, covering an area of 2.31 million square meters. As an open public place in a metropolitan center, the Shichahai scenic area fulfills multiple public service functions, combining a tourist attraction with business and residential sectors. Because of its multifunctional role, the area is crowded throughout the year. On a typical holiday, the crowd flow can easily reach 20 thousand per hour. Thus, there are considerable safety concerns and a need for crowd control in this area.

As shown in Figure 3, the road network in this area is rather complex and contains many junctions. To select the optimal locations to install the Wi-Fi probes, first, the road segments considered to be crowded were designated by experts, as illustrated by the purple lines in Figure 3. Then, three tourist landmarks, namely the Yinding Bridge, the former residence of Soong Ching Ling, and Prince Gong’s Mansion, illustrated by pink hollow stars in Figure 3, were also selected as candidates for installation. Taking these factors into account, 23 Wi-Fi probes, denoted as p1 to p23, were installed along the main roads to collect the crowd data. By preprocessing the Wi-Fi data, the detection radius of the Wi-Fi probes was uniformized to 50 m.

During the study period from December 23, 2016, to March 16, 2017, the crowd data of more than 6,000,000 individuals were collected by the installed Wi-Fi probes. The mean value of the all-day crowd density during the study period was calculated according to Equation (3) for each sampling site and is shown in Figures 3(a) and 3(b) for a weekday and a holiday. Table 2 shows the exact values of the crowd density at each sampling site and ranks them by sorting the weekday and holiday values in descending order. It can be seen from the table that although the absolute crowd density differs between the two days, there is no significant difference in the ranks of the crowd density. That is, whether on the weekday or the holiday, the crowding patterns are similar in the study area. The most crowded sites are near the Yinding Bridge, including p8, p11, p14, and p12, which are the core scenic spots.

5. Results and Discussion

5.1. Comparing the Global Crowd Density Estimates by the IDWPSB and IDW Methods

As shown in Figure 4, we selected a typical peak time (3 p.m.) on a weekday (February 8, 2017) and a holiday (December 25, 2016) as examples to show the interpolation results of the IDWPSB method. The spatial resolution of the interpolation was 5 m, yielding estimates at 628 unsampled sites in total. Overall, the global crowd density on the holiday was significantly higher than that on the weekday, with more road segments highly populated. The crowd tended to aggregate towards the northeast corner, the Yinding Bridge, on both days; meanwhile, the population was relatively sparse in the southwest quadrant, which is the residential sector.

Additionally, the traditional IDW method was employed to interpolate the population density and was compared with the new IDWPSB method. As shown in Figure 5, the majority of the MAE values at the sampling sites obtained by the IDWPSB method are smaller than those obtained by the IDW method. As shown by the MBE values, the IDWPSB method is less likely to overestimate or underestimate the crowd density in the cross-validation, suggesting a more robust performance than the IDW method. Furthermore, the accuracy improvement ratio, $R$, was calculated as the relative difference between the IDW MAE and the IDWPSB MAE, as shown in

$$R = \frac{\mathrm{MAE}_{\mathrm{IDW}} - \mathrm{MAE}_{\mathrm{IDWPSB}}}{\mathrm{MAE}_{\mathrm{IDW}}} \times 100\%. \tag{15}$$

For the weekday result, the accuracy improvement ratio is 20.49%, and for the holiday, the ratio is 15.69%. Overall, the IDWPSB method improves the interpolation accuracy by 17.62%, indicating a better performance than the IDW method. In terms of computational efficiency, the interpolation process took 52.4 seconds using the IDW method and 74.3 seconds using the IDWPSB method under the same computing environment, an increase of about 42%. These comparisons suggest that including the path selection behavior in the IDWPSB method can improve the interpolation accuracy without significantly increasing the computational demand.
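The improvement ratio is a relative reduction in MAE, and the same arithmetic gives the relative run-time overhead from the two timings reported above. In the sketch below the MAE inputs are hypothetical placeholders, since the underlying MAE values are not listed in the text; only the two run times are taken from the source.

```python
def improvement_ratio(mae_idw, mae_idwpsb):
    """Relative MAE reduction of IDWPSB over IDW, in percent."""
    return (mae_idw - mae_idwpsb) / mae_idw * 100.0

def relative_increase(baseline, new):
    """Relative increase of new over baseline, in percent."""
    return (new - baseline) / baseline * 100.0

# Hypothetical MAE values, used only to show the calculation.
example_ratio = improvement_ratio(0.050, 0.040)

# Run times reported in the text: 52.4 s (IDW) and 74.3 s (IDWPSB).
runtime_overhead = relative_increase(52.4, 74.3)
```

A positive ratio means the IDWPSB error is smaller than the IDW error; a negative value would indicate that the adjustment made the estimate worse.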

5.2. Factors Influencing the Interpolation Accuracy

Several factors may affect the interpolation accuracy and should be taken into consideration. The first factor is the number of sampling sites. We implemented the IDWPSB interpolation with different numbers of sampling sites, ranging from 10 to 22, randomly selected from the pool. Figure 6 shows the variation in the MAE value for each sampling site when the site is estimated using cross-validation. In all instances, the MAE shows a decreasing trend as the number of sampling sites increases. In practice, the optimal number of sampling sites should be determined by both budget considerations and accuracy requirements.

The second factor is the location of the sampling sites. To evaluate the relationship between the interpolation error and the location of the sampling sites, the all-day crowd density at each sampling site on the weekday and the holiday was estimated from all the other sampling sites using the IDWPSB method. The spatial distributions of the MAE and MBE are shown in Figure 7. It is worth noting that, on either day, the site with the maximum interpolation error is p8, which is the sampling site with the highest crowd density. This result is likely due to an inherent bias of IDW-type methods: because the estimate is a weighted average of the sampled values, it cannot exceed the largest sampled value and therefore underestimates a local maximum. Thus, we suggest that the deployment of the Wi-Fi probes should cover the sites with the local maximum values.

The third factor is the role of an unsampled node in the road network. We use the node centrality to indicate the importance of an unsampled node in the road network. Here, the centrality of node i is defined as the number of times that node i lies on the shortest path between any other two nodes acting as the origin and the destination. As shown in Figure 8, we investigated the relationship between the node centrality of an unsampled node and its interpolation MAE in the cross-validation. Figure 8 shows no apparent relationship between the node centrality and the MAE, suggesting that the accuracy of the interpolation is hardly affected by the role of an unsampled node in the road network. An advantage of the new method is therefore that it requires less consideration of the road network structure.
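The centrality count defined above can be sketched as follows. The sketch makes two simplifying assumptions that are not stated in the source: each unordered origin-destination pair is counted once, and when several shortest paths tie, only the one found first by Dijkstra's algorithm is used. The graph layout and function names are illustrative.

```python
import heapq
from itertools import combinations

def shortest_path(graph, source, target):
    """One shortest path as a node list, or None if target is unreachable.

    graph: {node: {neighbour: edge_length}}, undirected edges listed twice.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break  # the target's distance is final once it is popped
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

def node_centrality(graph):
    """Number of origin-destination pairs whose shortest path passes each node."""
    counts = {node: 0 for node in graph}
    for origin, dest in combinations(graph, 2):
        path = shortest_path(graph, origin, dest)
        if path:
            for node in path[1:-1]:  # interior nodes only
                counts[node] += 1
    return counts

# A simple chain A - B - C - D with unit edge lengths.
chain = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
centrality = node_centrality(chain)
```

The end nodes of the chain never lie between two other nodes, so their count is zero, while the two interior nodes each lie on two shortest paths.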

6. Conclusion

A real-time full coverage estimate of crowd density is essential for safety management of urban spaces. In practice, the trade-off between full coverage monitoring and limited budgets always exists, leading to the partial deployment of crowd surveillance detectors in open public spaces. The incomplete data coverage seriously influences the quality and reliability of the crowd density estimate. By extracting individual activity trajectories from the Wi-Fi data, this paper proposes a modified IDW method based on the path selection behaviors, named the IDWPSB method. The main improvement of the new method is to adjust the distance decay effect in the IDW method according to the transition probabilities of the path selection behavior. The case study in the Shichahai scenic area of Beijing city demonstrates better performance of the proposed method than the traditional IDW method. The proposed method can also be applied to urban spaces structured by networks in both indoor and outdoor environments with the availability of LBS data. However, it should be noted that the path preference is not static in time. Thus, the transition probabilities of the path selection behavior should be a function of time, which may vary over time of the day or may differ between weekdays and weekends. The time-dependent calculation will provide a valid context for improving the interpolation method and accommodating dynamic travel environments.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest in the research.


Acknowledgments

This work was supported by the National Key Research and Development Program of China (grant no. 2017YFC1503004).