Abstract

Pedestrians are the road users most likely to be seriously injured in vehicle collisions. Many collisions between vehicles and pedestrians occur on residential roads that lack street-to-sidewalk dividers and have numerous blind spots. Traditional traffic safety features and equipment, such as speed bumps and traffic signs, are not always sufficient to prevent pedestrian accidents on such residential roads. Therefore, we suggest a collision risk warning service for residential roads as a solution to this issue. We use CCTVs with computer vision techniques and radar to accurately detect objects in real time and to trace their trajectories. In addition, we employ a time-to-collision-based method to identify dangerous situations. The service warns drivers and pedestrians about hazardous situations using a light-emitting diode sign board. We applied our service to three different roads on a university campus in Seoul, Korea, and then conducted a user survey to evaluate the service. In summary, more than 90% of respondents stated that the service was necessary for these specific locations, and 76.9% noted that the service significantly contributed to traffic safety on the campus. This implies that the proposed service improved traffic safety and can be applied to various locations on residential roads.

1. Introduction

Approximately 1.3 million people die annually because of traffic accidents [1]. Governments and agencies in many countries have tried to reduce traffic accidents by implementing safety education and policies such as promoting traffic rules and enforcing speed limits [2]. As a result of these efforts, traffic fatalities in most developed countries in the OECD have decreased substantially. For example, Korea reduced road fatalities by 26.4% from 2017 to 2020 [3]. However, pedestrian safety remains a concern [4]. Pedestrian fatalities in Korea accounted for 35% of total fatalities [5]. More than half of pedestrian fatalities occur on residential roads without separation of streets and sidewalks [6]. Pedestrians are the most likely to be seriously injured in vehicle collisions. Traditional traffic safety features and equipment, such as speed bumps and traffic signs, are not always sufficient for preventing pedestrian accidents in blind spots on residential roads. In particular, when pedestrians abruptly emerge from between parked vehicles, drivers are unable to respond appropriately, and accidents are highly likely.

Several technologies have been developed to prevent vehicle-pedestrian collisions. They are based on algorithms that identify objects, predict their trajectories, and determine whether or not a collision risk exists. The algorithms can be divided into two categories depending on how the collision risk is determined. First, some algorithms employ surrogate safety measures (SSMs) to recognize the presence of potentially dangerous situations based on whether predicted trajectories of objects overlap. Using microscopic traffic characteristics such as vehicle speed, acceleration, time headway, and space headway, an SSM method assesses the collision risk of particular traffic scenarios [7]. SSMs, such as time-to-collision (TTC) and post encroachment time (PET), have been widely used to evaluate traffic safety performance and identify potential accident risks [8–13]. One study assumed a connected environment in which pedestrians and vehicles shared real-time location information using IoT devices. Dangerous situations were determined according to object locations, velocity, relative distance, angle, and TTC [8]. In another study, an algorithm was developed using onboard cameras in vehicles. Potential collision areas were defined by the minimum TTC from the predicted movements of ego vehicles and pedestrians [9]. In addition, in a connected vehicle environment, a crash warning system was developed for bike lane areas. PET was used to identify potential areas of interaction between vehicles and bicycles [10]. Most of these algorithms were verified in simulations or on autonomous platforms. The second set of algorithms predicts risk situations using deep learning methods [11–13]. After an algorithm is trained using prior data labeled by SSMs as risk situations, it predicts whether a given situation is dangerous. The gated recurrent unit method was used to predict collision risk at a signalized intersection [11]. Similarly, long short-term memory (LSTM) was used to predict risk situations [12]. In some cases, deep learning methods were used for trajectory estimation to predict risk situations. One study proposed a collision risk area estimation system at unsignalized crosswalks. The system used LSTM to predict object trajectories and then conducted statistical inference to predict collision risk areas [13].

As soon as a potentially hazardous situation is identified, various warning services are provided. This warning information can be divided into three categories. First, information is provided by vehicles. Augmented reality (AR) on the heads-up display in vehicles was employed to display warning information. In addition to AR, an audio warning was immediately provided [10]. Several active pedestrian collision avoidance systems did not give alerts but instead automatically controlled the vehicles [14, 15]. The second method is to provide information to vehicles from roadside equipment (RSE). For example, amber flashing lights were activated when pedestrians were approaching or crossing crosswalks so that the drivers could perceive them [16]. The third method is to use infrastructure-to-vehicle (I2V), vehicle-to-pedestrian (V2P), and vehicle-to-everything (V2X) communication. One study utilized I2V communication to give warning information to vehicles from RSE [17]. Several studies developed V2P and V2X communication-based warning services in Wi-Fi environments [18, 19]. However, in the current state, the communication-based safety warning method has problems regarding latency and stability.

Most systems were developed from the perspective of vehicles. Based on cameras or radar sensors in vehicles and CCTVs in RSE, warning information was provided to drivers. Few services considered a pedestrian perspective. One study developed a system that recognized dangerous situations and provided information to pedestrians via their smartphones [20]. However, it was inaccurate and ineffective in that object detection was conducted only by cameras on smartphones. In addition, few studies evaluated the effects of proposed algorithms in the field. Most algorithms were evaluated based on simulations or field prototype tests, and accuracy was only verified through a confusion matrix.

In the present study, we propose a safety service framework that provides risk information to both vehicles and pedestrians. The proposed framework utilizes RSE such as CCTVs and radar to detect objects using a deep learning method. Then, the algorithm uses SSMs to identify whether the current situation is dangerous. If the situation is unsafe, a light-emitting diode (LED) sign board gives warning information to both vehicles and pedestrians to avoid a potential collision. Thus, the service alerts drivers and pedestrians at the same time. To evaluate the safety effects of the proposed service, we implemented and operated it on-site. We conducted a survey to investigate user satisfaction.

The remainder of this paper is structured as follows: the service description section presents the overall framework. The application and evaluation section introduces the study site and presents the evaluation. In the last section, we summarize this study and discuss possible future research directions.

2. Service Description

We propose a collision risk warning service procedure, as depicted in Figure 1. This service is a proactive countermeasure against vehicle-vehicle or vehicle-pedestrian collisions. There are four steps: Step 1 object detection through CCTV and radar; Step 2 trajectory prediction of detected objects; Step 3 collision risk identification based on predicted trajectories; and Step 4 collision risk warning, if any. Here, the current time is , and the previous point one time step before and the future point time steps after are denoted by and .

2.1. Object Detection

We use CCTV and radar equipment to detect vehicles and pedestrians. One widely used detection algorithm is You Only Look Once (YOLO) [21], which has been applied in various fields for real-time detection. We employ a YOLO v5-based algorithm. YOLO v5 is faster and more accurate than its previous versions [22]. To account for the characteristics of residential roads, we need residential-road-specific training datasets, which are distinguished from general road datasets. Therefore, we used 150 hours of video data from CCTV cameras installed on residential roads in Guro-gu, Seoul. We trained the model on varied lighting and weather conditions as well as on situations involving diverse objects, such as pedestrians carrying umbrellas, as shown in Figure 2. With the trained model, objects can be accurately identified in real time as pedestrians, motorcycles, bicycles, vehicles, and personal mobility devices, even under adverse lighting and weather conditions. Figure 3 compares detection results with and without this training at an example site at two different time points. The overall accuracy is presented in Table 1. At the 50% level of Intersection over Union (IoU), defined as the degree of overlap between ground truth and prediction regions [23, 24], the detection rate for pedestrians, motorcycles, bicycles, vehicles, and personal mobility devices was higher than 99%. In addition, radar is employed to complement CCTVs. It provides precise locations and speeds, which are difficult to collect with CCTVs. With these two complementary devices, accurate and precise real-time object detection is achieved.
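The IoU criterion can be illustrated with a minimal sketch. It assumes axis-aligned bounding boxes given as (x1, y1, x2, y2) corner coordinates; the function names `iou` and `is_match` are hypothetical and are not taken from the detection pipeline used in this study.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(predicted, ground_truth, threshold=0.5):
    """A prediction counts as a detection when IoU reaches the threshold (50% here)."""
    return iou(predicted, ground_truth) >= threshold
```

For example, two 2 × 2 boxes offset by one unit share an area of 2 out of a union of 6, so their IoU is 1/3 and they would not count as a match at the 50% level.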

2.2. Trajectory Prediction

Future trajectories of objects are predicted based on their previous coordinates that we can track. In this study, perspective transformation and a Kalman filter are used for tracking objects. Coordinates of objects detected by CCTVs are transformed into overhead perspectives to measure exact locations. We employ the perspective matrix in OpenCV to convert the coordinates from the videos to overhead coordinates [25, 26]. The Kalman filter involves repeating the prediction step and correction step of trajectories [27]. In the prediction step, the next position of the object in the current time is estimated based on the information collected about the object already being tracked, as in equation (1):

$$x_k = F_k x_{k-1} + w_k, \quad w_k \sim \mathcal{N}(0, Q_k) \tag{1}$$

where $x_k$ is the state vector representing the object’s dynamic behavior at a discrete time index $k$; $F_k$ is the transition matrix at time index $k$; and vector $w_k$ is the noise following a normal probability distribution with zero mean and covariance matrix $Q_k$. In the correction step, the previously predicted position is compared to the position measured by CCTVs. To modify the object position, a weight called the Kalman gain $K_k$ is used, which indicates the ratio of the error of the predicted object position to the error of the object position measured by the object detection algorithm. $K_k$ ranges from zero to one, and it shifts the estimate toward whichever of the predicted position and the measured position is more accurate, as stated in equation (2):

$$\hat{x}_k = \hat{x}_k^{-} + K_k \left( z_k - H_k \hat{x}_k^{-} \right) \tag{2}$$

where $\hat{x}_k$ is the a posteriori estimated state; $\hat{x}_k^{-}$ is the a priori estimate; $z_k$ is the observed measurement; and $H_k$ is the measurement matrix at time $k$. After the repeated execution, we update and find the optimal state that minimizes the error between the estimated state and the measured state [28].
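The prediction and correction steps can be sketched for the simplest possible case. The sketch below assumes a scalar state with identity transition and measurement models, and the noise variances `q` and `r` are illustrative values. It is a textbook Kalman update, not the modified filter developed in this study, and the function name `kalman_step` is hypothetical.

```python
def kalman_step(x_est, p_est, z, q=0.01, r=1.0):
    """One predict/correct cycle for a scalar state (transition = measurement = 1).

    x_est, p_est: previous a posteriori state estimate and its error variance.
    z: new measurement. q, r: process and measurement noise variances (assumed).
    """
    # Prediction step: propagate the state and inflate its uncertainty.
    x_prior = x_est
    p_prior = p_est + q
    # Correction step: the gain lies between 0 and 1 and grows when the
    # prediction is less certain than the measurement.
    gain = p_prior / (p_prior + r)
    x_post = x_prior + gain * (z - x_prior)
    p_post = (1.0 - gain) * p_prior
    return x_post, p_post, gain
```

With equal prediction and measurement variances (for instance `kalman_step(0.0, 1.0, 2.0, q=0.0, r=1.0)`), the gain is 0.5 and the corrected estimate lands midway between the prediction and the measurement.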

The object-tracking algorithm proposed in this study was compared to DeepSORT, which is a deep learning-based method for tracking objects [29, 30]. DeepSORT consists of four key components: detection, estimation, data association, and generation and deletion of tracking objects. In DeepSORT, Kalman filter is used in the estimation stage, and a Hungarian matching algorithm is employed in the data association stage [30]. The major difference between DeepSORT and the proposed algorithm is twofold. First, we advanced the Kalman filter algorithm. Second, due to the Hungarian algorithm’s prohibitive computational cost, we developed an original matching algorithm instead of using the Hungarian algorithm. For comparison, we used the Oxford Town Centre dataset, which is commonly employed to assess object-tracking performance [31]. The comparison results are shown in Table 2.

Multiple object tracking accuracy (MOTA) is used to evaluate the accuracy of object-tracking algorithms [32]. MOTA is the most prevalent indicator used to measure a tracker’s performance. Its value may be determined using equation (3):

$$\mathrm{MOTA} = 1 - \frac{\mathrm{FN} + \mathrm{FP} + \mathrm{IDSW}}{\mathrm{GT}} \tag{3}$$

where ground truth (GT) is the total number of ground truth objects, identity switching (IDSW) represents the number of ID switches in the video stream, false negative (FN) is the number of missed detections, and false positive (FP) is the number of inaccurate detections. Based on MOTA, we observe that the proposed algorithm results in higher accuracy than DeepSORT. Mostly tracked targets (MT) and mostly lost targets (ML) are the numbers of tracked and lost objects, respectively. The proposed algorithm has a higher MT and a lower ML than DeepSORT, both of which are desirable. Furthermore, the proposed object-tracking algorithm has a higher FPS than DeepSORT since we do not use the computationally expensive Hungarian matching algorithm. In summary, the proposed tracking algorithm outperforms DeepSORT.
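As a small worked illustration, the standard MOTA definition can be computed directly from the four counts. The function name `mota` and the counts in the usage note are hypothetical and are not the values reported in Table 2.

```python
def mota(false_negatives, false_positives, id_switches, ground_truth):
    """Multiple object tracking accuracy from error counts summed over all frames."""
    if ground_truth <= 0:
        raise ValueError("ground_truth must be a positive count")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / ground_truth
```

For instance, 10 missed detections, 5 false detections, and 5 ID switches over 100 ground truth objects give a MOTA of 0.8. The value can become negative when the tracker makes more errors than there are ground truth objects.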

Then, based on the tracking data, we predict vehicle and pedestrian trajectories. First, we classify straight and curved trajectories based on whether the angle of the previous trajectories is smaller than the angle that we predetermine, (unit: radians). We estimate a vehicle’s tendency to move based on the angle difference between previous points. Then, we reflect this tendency in the trajectory prediction. If we set the time index to zero for the current time, the current position is at , and the previous positions at two and one time steps before are and , respectively, i.e., the coordinates of and are and . The known angle between and is (unit: radians). The tendency angle (unit: radians) is calculated in equation (4).where is a weight factor that considers the angle error, s.t., , and is the time window length of the past data, i.e., we consider the tracking data at .

We consider a short prediction period between and , where . Thus, it can be reasonable to assume that a vehicle with keeps moving straight during . At future time point , the center location of the straight moving vehicle, denoted , is on the straight line extended from and the distance from to , , is the multiplication of the average vehicle speed and the time difference between and . If , the center of a vehicle predicted at time point , , is found based on , , and the average vehicle speed. Particularly, is calculated using and . Similar to equation (4), is defined as for all . The spatial range of a vehicle, predicted at , is defined to have its center at , and its boundary is determined based on the actual vehicle size.
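The straight-moving case described above can be sketched as a simple extrapolation: the predicted center lies on the line through the previous and current positions, at a distance equal to the average speed multiplied by the prediction horizon. The function name `predict_straight` and its arguments are hypothetical, and the sketch does not cover the curved case, which additionally uses the tendency angle.

```python
import math


def predict_straight(previous, current, average_speed, horizon):
    """Extrapolate a vehicle center along the line from `previous` to `current`.

    previous, current: (x, y) positions in meters (overhead coordinates).
    average_speed: meters per second. horizon: seconds ahead of the current time.
    """
    dx = current[0] - previous[0]
    dy = current[1] - previous[1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        # A stationary object has no heading; keep it where it is.
        return current
    distance = average_speed * horizon
    return (current[0] + distance * dx / norm,
            current[1] + distance * dy / norm)
```

A vehicle that moved from (0, 0) to (3, 4) and travels at 5 m/s is placed at (9, 12) two seconds later, 10 m further along the same heading.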

Compared to vehicles, pedestrians have relatively uncertain and inconsistent movement characteristics. Thus, we use an elliptical trajectory prediction approach to account for stochastic pedestrian trajectories, and we consider the ellipse as the future spatial range of a pedestrian’s location [33]. We estimate the moving direction of pedestrians based on the previous directions in the same method for vehicles, as described in equation (4). We estimate the major and minor axes of an ellipse using actual pedestrian path data collected from CCTVs on residential roads [34]. Estimation results depend on the time point for all at which we predict from the present time, . We can find of the center of an ellipse for a pedestrian in a similar way to finding that for a vehicle with . Depending on the moving distance (unit: meters) from the current time point to the future point , defined as , the possible spatial range that the pedestrians reach varies. The estimated major and minor axes of the ellipse are determined using equation (5), where the parameters were tuned based on the actual data.where (unit: meters) is the major axis of the ellipse at future time point , and (unit: meters) is the minor axis of the ellipse at future time point .
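Once an ellipse has been estimated, checking whether a location falls inside a pedestrian’s predicted spatial range is a rotated point-in-ellipse test. The sketch below assumes that the ellipse is described by its center, its heading in radians, and its semi-axis lengths. These inputs are placeholders supplied by the caller: the fitted parameters of equation (5) are not reproduced here, and the function name `in_ellipse` is hypothetical.

```python
import math


def in_ellipse(point, center, heading, semi_major, semi_minor):
    """Return True if `point` lies inside an ellipse aligned with `heading`.

    heading: direction of the major axis in radians, measured from the x-axis.
    semi_major, semi_minor: half-lengths of the axes in meters (assumed inputs).
    """
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    # Rotate the offset into the ellipse's own frame.
    cos_h, sin_h = math.cos(heading), math.sin(heading)
    along = dx * cos_h + dy * sin_h
    across = -dx * sin_h + dy * cos_h
    return (along / semi_major) ** 2 + (across / semi_minor) ** 2 <= 1.0
```

With a 2 m by 1 m ellipse heading along the x-axis, a point 1.5 m ahead of the center is inside the range, whereas a point 1.5 m to the side is outside it.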

The predicted trajectories are graphically illustrated in Figure 4.

We tested the accuracy of the trajectory prediction model using example trajectories of three pedestrians and two vehicles, as depicted in Figure 5. The length of each time period is one frame, and for each prediction, we plotted the center of a pedestrian’s ellipse or a vehicle at (after three seconds from each prediction time point). The prediction trial is indexed by and the total number of trials is . We conducted 857, 396, 899, 330, and 324 trials for pedestrian #1, pedestrian #2, pedestrian #3, vehicle #1, and vehicle #2, respectively. The test results are presented in Figure 6 and Table 3. The unit of the graphs in Figure 6 is in pixels, and the horizontal and vertical lengths of a pixel are 0.09 meters and 0.11 meters, respectively. For the accuracy measure, we use mean absolute error (MAE) calculated by equation (6).where and are the predicted and the actual locations of the object three seconds later than the current time point of the prediction trial, respectively. Specifically, is the coordinate of . The MAE was calculated to be between 0.09 and 0.65 meters. The accuracy is lowered as the curvature and speed increase.
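The error computation can be sketched as follows, using the pixel dimensions stated above (0.09 m horizontally and 0.11 m vertically). The sketch assumes that absolute errors are averaged separately along each axis over all trials; whether equation (6) combines the two axes differently is not assumed here, and the function name `mae_per_axis` is hypothetical.

```python
PIXEL_WIDTH_M = 0.09   # horizontal length of one pixel, in meters
PIXEL_HEIGHT_M = 0.11  # vertical length of one pixel, in meters


def mae_per_axis(predicted, actual):
    """Mean absolute error along x and y, converted from pixels to meters.

    predicted, actual: equal-length lists of (x, y) pixel coordinates, one pair
    per prediction trial.
    """
    n = len(predicted)
    if n == 0 or n != len(actual):
        raise ValueError("need the same positive number of predicted and actual points")
    err_x = sum(abs(p[0] - a[0]) for p, a in zip(predicted, actual)) / n
    err_y = sum(abs(p[1] - a[1]) for p, a in zip(predicted, actual)) / n
    return err_x * PIXEL_WIDTH_M, err_y * PIXEL_HEIGHT_M
```

Two trials with errors of (2, 0) and (0, 4) pixels, for example, average to 1 pixel horizontally and 2 pixels vertically, which is 0.09 m and 0.22 m.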

2.3. Collision Risk Identification

Once future trajectories of objects intersect, we get one intersecting point of two objects and , and two collision time (CT) to reach the intersecting point from each object at the current location. and are calculated by equation (7) [35].where and represent the longitudinal and latitudinal coordinates of the intersecting point, respectively; and represent the current coordinates of objects and , respectively; and , , , and represent the current tangent and cotangent values of objects and . With , , and the objects’ speeds and directions, we can calculate CTs for both objects, [36]. We compare the two CTs, and the smaller one, , is used in determining whether the spatial ranges overlap in increments of 0.25 seconds starting from one second earlier than considering the difference between the center point and the spatial range boundary of each object.

If the spatial ranges of two objects successively overlap at more than three intervals, the first time interval at which the two spatial ranges overlap is defined as the predicted TTC. This indicator assumes that the involved objects do not recognize the risk and make no urgent maneuver to avoid it in the following short period of time. We compare the predicted TTC with a TTC threshold to identify whether a collision risk exists [37–39]. If it is smaller than the TTC threshold, a collision risk is considered to exist. The entire process of the TTC calculation is depicted in Figure 7.

We assume that the TTC threshold value is the summation of the perception reaction time, the margin time for an LED sign, and the vehicle stopping time. In this study, a fixed TTC threshold of four seconds is used to account for a safety margin to some extent. We consider the perception reaction time to be 1.5 seconds [40], the LED sign margin time to be 1 second, and the vehicle stopping time to be 1.5 seconds.
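The decision rule can be sketched in a few lines. The sketch assumes that the overlap of the two spatial ranges has already been evaluated at 0.25-second increments and is supplied as a list of booleans, and it reads "more than three intervals" as at least four consecutive overlaps. The function names and the `min_run` parameter are hypothetical.

```python
PERCEPTION_REACTION_S = 1.5
LED_MARGIN_S = 1.0
STOPPING_S = 1.5
TTC_THRESHOLD_S = PERCEPTION_REACTION_S + LED_MARGIN_S + STOPPING_S  # 4.0 seconds
STEP_S = 0.25


def predicted_ttc(start_time, overlaps, min_run=4):
    """Return the time of the first overlap in a run of `min_run` overlaps, else None.

    start_time: time (s from now) of the first checked increment.
    overlaps: overlaps[i] is True if the ranges overlap at start_time + i * STEP_S.
    """
    run_start, run_length = 0, 0
    for i, hit in enumerate(overlaps):
        if hit:
            if run_length == 0:
                run_start = i
            run_length += 1
            if run_length >= min_run:
                return start_time + run_start * STEP_S
        else:
            run_length = 0
    return None


def collision_risk(start_time, overlaps):
    """A risk exists when a predicted TTC is found and is below the threshold."""
    ttc = predicted_ttc(start_time, overlaps)
    return ttc is not None and ttc < TTC_THRESHOLD_S
```

If the checks start at 2.0 s and the ranges overlap at the second through fifth increments, the predicted TTC is 2.25 s, which is below the four-second threshold, so a warning would be issued.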

2.4. Collision Risk Warning

If the situation is judged to be dangerous, drivers and pedestrians are presented with LED sign information. This service delivers warning information on the roadside for vehicles and pedestrians, as opposed to prior systems that provided risk information only to vehicles. The warning information is presented in Figure 8.

3. Application and Evaluation

3.1. Application

We applied the proposed service on KAIST Seoul Campus in Korea. The service was provided in three situations: illegal roadside parking, unprotected left turn, and wrong-way driving. Figure 9 provides a description of the application site and each situation. In addition, we evaluated the proposed service to analyze its impact on safety by conducting an on-site survey.

3.2. Survey Design

After the service application, we analyzed responses to the service in terms of safety effects. We collected data through in-person interviews. The survey questionnaire consisted of four sections. First, we inquired about demographic characteristics, including gender, age, mobility impairment, and current modes of transportation. Second, we asked whether accidents or accident hazards had occurred at the site. If so, respondents were questioned as to whether they were using a smartphone or headphones and about locations of incidents. Furthermore, we inquired whether respondents thought campus traffic safety should be improved. Third, for each location, we solicited feedback on the installed and operating service, such as preference or level of satisfaction. We first inquired if respondents were aware that the service was operational. If they were familiar with the service, they were asked about its safety effects following the operation and its requirements at each location. Finally, the contribution of the service to safety on KAIST Seoul Campus was assessed using a five-point Likert scale.

The respondents to the questionnaire were people who commute to KAIST Seoul Campus. The survey was conducted in July 2022. A total of 151 responses were collected. The majority of respondents were campus members, such as students, professors, and employees, while some were local residents and travelers who were passing by KAIST Seoul Campus. Sample descriptions are summarized in Table 4.

3.3. Survey Analysis

We analyzed the collected data, including descriptive statistical analysis and the chi-square test. First, we investigated the perception of the traffic safety status and the service on the site, as shown in Figure 10. 32.5% of respondents reported that they had encountered unsafe situations on KAIST Seoul Campus, with 44.9% in A, 22.4% in B, and 10.3% in C. 30.6% of respondents who experienced accident risk indicated that those risks had occurred at night. Moreover, 26.5% were using a smartphone, and 22.4% were wearing earphones (or headphones) when the incidents happened. Regarding campus traffic safety, 78.8% of respondents indicated that it should be improved for four reasons, as shown in Figure 11. First, there is no separation between streets for vehicles and sidewalks for pedestrians on campus roads (16.8%). Second, the road widths are narrow (15.1%). Third, insufficient guiding signs on one-way roads frequently lead to wrong-way driving (7.6%). Fourth, there are multiple blind spots due to parked vehicles and buildings (5.9%). These factors are consistent with the safety problems of residential roads in other regions of Korea [6]. Then, after service installation and operation on KAIST Seoul Campus, 76.9% of total subjects noted that the service contributed to traffic safety on the campus from the results of the Likert scale, as shown in Table 4. They stated that the service could prevent collision risk in the blind spots by providing warnings. Specifically, they mentioned that LED sign boards made signs instantly recognizable, even at night, compared to convex mirrors. In addition, 78.6%, 77.6%, and 85.1% of respondents who were aware of the service in Locations A, B, and C, respectively, believed that roads were safer after service operation. In addition, 92.9%, 91.8%, and 94.6% of respondents who knew the service in Locations A, B, and C indicated that the service was necessary for campus traffic safety, as presented in Figure 12.

Second, we conducted a chi-square test to determine whether respondents’ perceptions of the service in operation at three different locations differed significantly. A chi-square test is a nonparametric test of independence or of differences among groups for nominal variables [41]. We used the chi-square test of homogeneity to compare the proportions of service perception among groups at three locations for significant differences. To conduct the homogeneity test, samples of the test groups must be distinct [42]. For this, three different groups of respondents were questioned about their opinions of the service in the three designated locations (i.e., Locations A, B, and C). Each location had a different number of individuals who were aware that the service was operational. In Location C, there were more individuals who were unaware that the service was operational than those who were aware, as shown in Figure 12. Therefore, we were able to use the homogeneity test. The formula for the test statistic, the $\chi^2$-value, is expressed as equation (8):

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \tag{8}$$

where $O_{ij}$ is the observed frequency, $E_{ij}$ is the expected frequency, $i$ represents the location index, $j$ refers to the category of response (e.g., yes or no), and $r$ and $c$ are the number of locations and the number of categories, respectively. The degrees of freedom (df) were found to be two, using the following formula: df = (number of rows − 1) × (number of columns − 1). We established two null hypotheses that the proportion among the three groups is the same for the following questions: (i) “Do you think that the service has increased safety at this location?” and (ii) “Do you think that the service is necessary at this location?”. If the calculated $\chi^2$-value is greater than the critical value from the $\chi^2$-distribution, we must reject the null hypothesis. This implies that at least one proportion differs considerably from another proportion among groups. We obtained two $\chi^2$-values of 1.6444 and 0.4911, with associated $p$-values of 0.4395 and 0.7823, respectively. Since these $p$-values are greater than the significance level of 0.05, we do not reject the two hypotheses. This result indicates that there is no statistically significant difference among the locations in terms of service perception. Each location represents scenarios that can happen on residential roads. Accordingly, the results suggest that the proposed service can be implemented in many locations on residential roads and have the same effect regardless of location from a user’s perspective. The results of the chi-square test are summarized in Table 5.
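The test can be reproduced with a short sketch. It computes the Pearson chi-square statistic from a table of response counts (one row per location, one column per response category) and, for the two-degrees-of-freedom case used here, converts the statistic to a p-value with the closed form exp(−χ²/2), which is exact only for df = 2. The function names are hypothetical, and the survey counts themselves are not reproduced.

```python
import math


def chi_square_homogeneity(table):
    """Pearson chi-square statistic and degrees of freedom for a count table.

    table: list of rows (locations), each a list of counts per response category.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df


def p_value_df2(statistic):
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return math.exp(-statistic / 2.0)
```

Applying `p_value_df2` to the reported statistics of 1.6444 and 0.4911 gives approximately 0.4395 and 0.7823, matching the p-values stated above.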

Finally, we asked respondents for suggestions on improving traffic safety at each location. The suggestions are summarized in Table 6. One of the most recommended approaches at almost all locations was to use an acoustic speaker to deliver warning information. Some respondents suggested that it would be more effective if both a visual and an audible warning were utilized concurrently. Another suggestion was to promote the service to campus members. Indeed, in Locations A, B, and C, 64.9%, 56.3%, and 49.0% of respondents, respectively, knew that the service was in operation. Except for the use of acoustic speakers, Location C’s suggestions differed from those of the other locations. The characteristics of Location C included traffic safety signs for one-way driving and an automatic roadblock that prevents vehicles from traveling in the incorrect direction.

4. Conclusions

In this study, we propose a collision risk warning service for residential roads based on risk assessment. In contrast to earlier research, this service combines CCTVs and radar to detect objects precisely and quickly. We use an elliptical trajectory prediction approach to account for uncertain pedestrian movements. The major and minor axes of the ellipse were derived using CCTV data on actual pedestrian trajectories. Furthermore, we use TTC to identify collision risks in vehicle-vehicle and vehicle-pedestrian cases. An LED sign board is used to provide risk warnings to vehicles and pedestrians. The proposed service was provided in three situations: illegal roadside parking, unprotected left turn, and wrong-way driving on residential roads.

We applied our service to three locations on KAIST Seoul Campus in Korea. To evaluate service effects, we conducted a survey and analyzed the safety effects from the user’s perspective. Using a set of questions, we investigated respondents’ satisfaction with the service. 76.9% of respondents reported that the service contributed to improving traffic safety on KAIST Seoul Campus. In addition, they stated that the campus was safer than before the service installation, with 78.6%, 77.6%, and 85.1% responding positively for Locations A, B, and C, respectively. These results, combined with the high satisfaction reported by survey respondents, suggest that our service can be applied to various areas that are typically considered residential roads. The service implementation is expected to improve traffic safety and reduce fatalities that arise due to blind spots.

The generalizability of these results is subject to certain limitations. For instance, the proposed service was evaluated based only on a survey. A natural progression of this work is to evaluate the service using before-and-after surrogate data, such as the frequency of two- and three-second TTC events. We are currently collecting the relevant data for this purpose. In addition, the appropriate TTC threshold may vary with the infrastructure, natural, and human environments of a site. In this study, a constant TTC threshold was adopted with a safety margin, which could yield unnecessary false positives in some situations. Therefore, we should conduct a sensitivity analysis for TTC with respect to different location-specific environments to determine the optimal TTC threshold for each location. Moreover, using acoustic speakers to alert vehicles and pedestrians was the most frequently suggested approach for enhancing traffic safety at all locations. Respondents believed that combining visual and audible warnings would provide a more effective warning to those using a smartphone and/or earphones. The results are consistent with [43], which supported the idea that multimodal warning services have potential advantages in various situations, such as when people are using smartphones or are engaged in distracted driving.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Aya Selmoune and Jeongin Yun contributed equally to this paper.

Acknowledgments

This work was supported by the Ministry of the Interior and Safety (MOIS), Republic of Korea (grant nos. 2021-MOIS41-001-00000000-2022 (development of traffic safety risk warning technology for blind spots in living roads)).