Abstract

Understanding the specific impact of weather events on the operating condition of the surface transportation system is of practical significance, because it allows transportation agencies to quickly implement proactive and reactive strategies that minimize the negative effects of adverse weather. Many studies have sought to quantify such effects, yet they suffer from limitations such as relying on a subjectively defined time window under uncongested conditions and failing to account for the severe weather impacts that make travel times unreliable. To overcome these shortcomings in the existing literature, this study develops an integrated data mining framework based on decision tree and quantile regression techniques. The results demonstrate that the approach is effective in characterizing time periods with different traffic characteristics and in quantifying the impact of rain and snow events on both the congestion and reliability aspects of the transportation system. Snow events are observed to have a greater impact on travel times than rain events. In addition, the impact of weather events is even more severe on travel time reliability than on average delay. The magnitude of the impact is directly related to the level of recurrent congestion under study. Other insights regarding the capability of quantile regression and future improvements to the methodological design are also offered.

1. Introduction

It is well recognized that the transportation system may be significantly disrupted by adverse weather events. Aside from extreme events such as floods, tornadoes, and hurricanes, which can be disastrous, more common weather including rain, snow, and ice can also have clear negative impacts on the system. Such effects include reducing travel demand and physical capacity, degrading safety, and compromising travel mobility and reliability. Dey et al. and Maze et al. have provided a detailed review and discussion of those issues [1, 2]. Evaluating and understanding the specific impact of weather events on the system is therefore of critical value. Transportation agencies are seeking to implement proactive and reactive management strategies so that the negative effects associated with adverse events can be minimized. This study addresses the operational aspect: we want to quantify the effect of different weather events involving rain and snow on both the congestion and the reliability of a corridor.

Many efforts have been devoted to this topic. Ibrahim and Hall studied the impact of adverse weather on the flow-occupancy and speed-flow relationships on a freeway in Canada [3]. The data in their study were collected on two non-holiday weekdays during the mid-day period. Weather intensity was defined by the rate of fall for rain and by visibility for snow. They found that heavy rain and snow had the greatest influence on traffic operation and that heavy snow had more impact than heavy rain. Brilon and Ponzlet analyzed three years’ worth of loop-detector data in one-hour increments and showed that snow and rain conditions caused speed reductions of about 7 and 4 mph, respectively, under uncongested and partially dense conditions on six-lane highways [4]. In contrast, the impact was less severe for the same situation on four-lane highways. Kyte et al. studied the effects of weather on free-flow speed on a rural interstate freeway, and their results indicated that speeds during rain and snow were 6.2-10 mph slower than under normal conditions, whereas speeds during heavy snow could drop by 21.7-28 mph [5]. Similarly, Smith et al. studied the impact of rain on urban freeway traffic flow under uncongested conditions during the daytime and found that rainy conditions led to a 5-6.5% decrease in operating speed, with no distinguishable difference between rainfall intensity levels [6]. However, it was unclear whether this result was due to the uncongested time period they selected for the study.

Among more recent studies, Agarwal et al. aimed to quantify the impact of rain, snow, and pavement surface conditions on the reduction of freeway capacities and operating speeds in the Twin Cities area [2, 7]. The collected weather data were classified by intensity. Based on statistical analysis, they showed that severe rain could cause a 4-7% reduction in speed while severe snow could cause an 11-15% reduction. In addition, under low visibility conditions (less than 0.25 mile), traveling speeds could drop by as much as 10-12%. They noted that some statistics, such as differences in speeds during light and heavy rain, may not be accurate due to limited sample size. Tsapakis et al. examined how precipitation, snow, and temperature affected urban travel times at the macroscopic level [8]. Data from the Greater London area, UK, for the AM, midday, and PM periods during October and December of 2009 were collected through the plate recognition method. Like the previous studies, they found that snow has a more significant impact on travel times than rain and that heavy snow was the most significant factor. Regional differences in impact were also observed. However, the sample size was not sufficient, as many statistics derived in the study were not statistically significant. Yazici et al. applied the decision tree approach to evaluate the different impacts of weather on travel time and reliability during different times of day and days of the week, based on data collected from taxi GPS in New York City [9]. However, highway types such as freeways and arterials were not differentiated, as all the records were blended together.

Based on extensive review of existing literature, it seems there have been a sufficient number of studies conducted to understand the operational impact of rain and snow events. However, there are some shortcomings in previous studies. First, many studies were conducted either in a subjectively defined time window such as free-flow condition [5], or by using the whole years’ data as in [2, 7]; however, there may be significant differences in terms of the degree of impact under different congestion levels at different times of day [10]. Consequently, the derived information is quite limited since it is not representative and applicable to periods with different traveling characteristics. For instance, commuters are more interested in knowing how weather will impose excessive delay on their way to work when the traffic is often congested. Thus, it is desirable to have a comprehensive evaluation on weather’s impact across different times through the day and week.

In addition, adverse weather often has an asymmetric influence in which more severe conditions usually have more serious outcomes, yet most studies only examine the average effect. Those studies therefore convey only a partial view of the complete picture, as the mean may underestimate the actual effect. In this case, the whole distribution of travel time can provide a more comprehensive perspective. This is pertinent to the concept of travel time reliability (TTR), which has received extensive attention in the transportation field and has been shown to be an important component both in quantifying corridor/system performance from the perspective of transportation agencies and in making trip-related decisions from the general public’s point of view [11-15]. However, as represented by the SHRP 2 studies, most existing reliability-related research has focused on all non-recurring events, with weather as a whole being one of the factors under investigation [14, 16, 17], at a subjectively defined time window [18]. Although different weather events at different time periods were separately evaluated in [9], the reliability concept in that study is closer to inter-vehicle variability than to reliability at the operational level. In this regard, there is still a need to evaluate travel time distributions at various time periods.

The goal of the study is to propose an integrated data mining framework to quantify the impact of various weather events on not only the congestion but also the reliability aspect of the transportation system at different time periods with varying traffic characteristics. Accordingly, the first objective is to propose a pattern recognition algorithm that classifies the raw travel time data into groups such that observations in the same group are more homogeneous than observations across groups. The decision tree-based approach explored in [9] is deemed suitable for this purpose. Next, the quantile regression method, which has the flexibility to quantify the impact of weather events on any part of the travel time distribution, is adopted for each classified time period. The remainder of the paper is organized as follows. The next section selects an Interstate corridor as the case study site and introduces the travel time and weather data for the study. The third section applies a classification tree to obtain classes with different traffic characteristics under normal conditions and discusses the results. The fourth section describes the quantile regression method and uses it to quantify the separate effects of rain and snow on mobility and reliability. The final section draws conclusions from the findings of this study and suggests further research.

2. Data Description

Interstate 71/75 in the Northern Kentucky urban area, shown in Figure 1, is chosen as the study site. As part of a SHRP 2 pilot testing effort, a variety of datasets from many sources, including speed sensors, GPS-based probe data, incident logs, and geometry information, have been collected for the corridor. The non-holiday weekday speeds from 2013 and 2014, collected through 8 radar sensors covering 8 miles, are used. The original data are aggregated into 5-minute increments, which is regarded as sufficient for measuring short-term travel time variation [11]. The section under study has five lanes per direction at the south end and gradually reduces to three lanes per direction at the north end, and the associated AADT ranges between 147,603 and 193,399 vehicles. The northbound direction of the corridor is often congested, especially during the AM peak when heavy commuter traffic travels into the Cincinnati area across the river. Therefore, the study focuses specifically on this direction. To conduct the analysis, the sensor-level data are further aggregated to the corridor level by using the midpoint-based estimation method [19].
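As a rough illustration of the midpoint-based aggregation, the sketch below assumes that each sensor's speed applies to the stretch between the midpoints to its neighboring sensors, and that the corridor begins at the first sensor and ends at the last one. The function name, mileposts, and speeds are hypothetical and are not taken from the study's data or from [19].

```python
def corridor_travel_time(mileposts, speeds_mph):
    """Corridor travel time in minutes from point-sensor speeds.

    Assumption: each sensor speed represents the stretch between the
    midpoints to its neighbouring sensors; the corridor starts at the
    first sensor and ends at the last one.
    """
    if len(mileposts) != len(speeds_mph) or len(mileposts) < 2:
        raise ValueError("need matching lists with at least two sensors")
    # Segment boundaries: corridor start, midpoints between sensors, corridor end.
    bounds = [mileposts[0]]
    bounds += [(a + b) / 2 for a, b in zip(mileposts, mileposts[1:])]
    bounds.append(mileposts[-1])
    minutes = 0.0
    for i, speed in enumerate(speeds_mph):
        length = bounds[i + 1] - bounds[i]   # miles represented by sensor i
        minutes += length / speed * 60.0     # hours converted to minutes
    return minutes


# Hypothetical 5-minute snapshot: three sensors over 4 miles.
example_minutes = corridor_travel_time([0.0, 2.0, 4.0], [60.0, 30.0, 60.0])
```

In this made-up snapshot the middle sensor represents 2 miles at 30 mph, so it dominates the corridor travel time even though the two end sensors report free-flow speeds.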

Meanwhile, historical weather records are obtained from https://www.wunderground.com/, which provides data from weather sensors at Cincinnati/Northern Kentucky International Airport, about 1.5 miles from the corridor of interest. The data include rich weather information such as temperature, wind speed, wind direction, visibility, weather type, and precipitation. Preliminary analysis indicates that wind and visibility have a negligible impact on the congestion and reliability of the corridor; hence, they are not considered in the current analysis. They should be included, however, at study locations where they show an apparent impact. Accordingly, rain and snow are the focus of the current study. It should be noted that, when there are multiple entries within the same hour during rain or snow conditions, the precipitation amount in the obtained data accumulates within that hour and is reset to zero in the next hour. This means the value from the last entry should be used for that hour. Also, if the time difference between the last entries of two consecutive hours is not equal to one hour, the precipitation is proportionately adjusted to obtain the hourly rate.
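The hourly precipitation rule can be sketched as follows. This is one possible reading of the rule, not the authors' code: the last entry of each clock hour is kept, and it is scaled by 60 divided by the number of minutes since the previous hour's last entry when the two hours are consecutive. The function name and the records are hypothetical.

```python
from datetime import datetime


def hourly_precip(records):
    """records: list of (timestamp, cumulative_precip) sorted by time.

    Returns {hour_start: hourly rate}. Assumed reading of the rule in the
    text: keep the last entry of each clock hour, then scale it by
    60 / (minutes since the previous hour's last entry).
    """
    last_in_hour = {}
    for ts, value in records:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        last_in_hour[hour] = (ts, value)   # later entries overwrite earlier ones
    rates = {}
    prev_ts = None
    for hour in sorted(last_in_hour):
        ts, value = last_in_hour[hour]
        if prev_ts is not None and (hour - prev_hour).total_seconds() == 3600:
            gap_min = (ts - prev_ts).total_seconds() / 60.0
            rates[hour] = value * 60.0 / gap_min
        else:
            rates[hour] = value   # no consecutive previous hour to compare with
        prev_ts, prev_hour = ts, hour
    return rates


# Hypothetical entries (inches); the 14:20 value is superseded by 14:53.
records = [
    (datetime(2013, 3, 5, 13, 53), 0.10),
    (datetime(2013, 3, 5, 14, 20), 0.05),
    (datetime(2013, 3, 5, 14, 53), 0.12),
    (datetime(2013, 3, 5, 15, 40), 0.08),
]
rates = hourly_precip(records)
```

Here the 15:00 hour's last entry arrives 47 minutes after the previous one, so its 0.08 is scaled up by 60/47, while the 14:00 hour is left unchanged because its last entry is exactly one hour after the previous one.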

Besides weather, other non-recurring events such as crashes, non-crash incidents, planned road work activities, and sport and festival events that occurred in 2013 and 2014 are also collected and processed. Then, travel times at time periods when those events occurred are excluded so that the rest of the intervals are only affected by weather events.

3. Travel Time Classification

Traffic patterns clearly vary across times of day and days of the week, and the impact of weather may vary as the traffic pattern changes. Therefore, it is important to first evaluate the traffic characteristics of different time periods. Normally, traffic practitioners do this based on their local experience, for example by choosing 6:00-9:00am as the AM peak period. Such a choice may be subjective and may not accurately define the boundaries at which traffic conditions transition. Hence, an appropriate classification method is desired so that the unique traffic flow pattern can be maintained. In this study, a popular non-parametric data mining technique called classification and regression tree (CART) is adopted [20], for several reasons. First, since it is non-parametric, no assumption regarding the underlying distribution of the response variable is made during model development. This feature is particularly appealing for our study, as travel times are often non-normally distributed and skewed to the right. Second, the method is robust in dealing with situations like missing variables and outliers. In addition, it is relatively simple to understand and use, and it generates easily interpretable results.

Here only travel time data under clear weather conditions are selected as input for CART. This excludes the abnormal impact of rain or snow conditions, which may skew the regular traffic patterns under recurring conditions. The travel time to be grouped is used as the dependent variable, and the time of day in 5-min increments and the day of week serve as independent variables. To build the tree that distinguishes among classes, starting from the root node, which uses the whole travel time dataset, the CART method employs a binary partitioning procedure to iteratively split the data into subsets in a top-down manner so that the data points in each subset become more and more homogeneous. The Gini impurity (GI) index, which measures such homogeneity, is used as the splitting criterion, and the optimal cutting point between two sub-groups is the one at which the GI is minimal [20]. The partitioning procedure is repeated until all the terminal nodes are reached, that is, until the GI cannot be further reduced.
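To make the splitting step concrete, the sketch below computes the Gini impurity, 1 minus the sum of squared class shares, and searches for the single time-of-day cut that minimizes the size-weighted impurity of the two resulting subsets. It is a simplified illustration rather than the study's implementation: it assumes travel times have already been binned into hypothetical class labels, it handles one split on one variable, and all names and values are made up.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(times, labels):
    """Return (cut, weighted_gini) for the best binary split 'time < cut'."""
    best = (None, float("inf"))
    for cut in sorted(set(times))[1:]:        # candidate cut points
        left = [l for t, l in zip(times, labels) if t < cut]
        right = [l for t, l in zip(times, labels) if t >= cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (cut, score)
    return best


# Hypothetical data: minutes after midnight and a binned congestion label.
times = [360, 370, 380, 390, 400, 410]
labels = ["free", "free", "free", "congested", "congested", "congested"]
cut, score = best_split(times, labels)
```

A full tree would apply the same search recursively to each subset and to the day-of-week variable as well.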

Often, the final tree contains a large number of classes, which makes it challenging for transportation practitioners to meaningfully interpret the results and transfer the gained knowledge to real-world applications. Therefore, a tree pruning procedure is necessary. To allow more flexibility in practice, we use a pruning function that lets users specify the number of final classes in the pruned tree, based on tenfold cross-validation, using the tree package in the R environment [21].

The travel time trends during clear conditions at different times of day and days of the week are presented in Figure 2. Recurring congestion occurs in the AM peak on weekdays and in the PM peak on Thursdays and Fridays. Also, morning congestion appears more severe on Tuesday-Thursday than on Monday and Friday. Based on the classification results, CART generates time periods that align well with the changing traffic pattern. The first split happens at 6:25, before which the traffic is in free-flow condition. After 6:25, congestion starts to build. Congestion peaks during the 6:50-8:40 period, which can be further classified into three groups based on day of week, with the Tuesday-Thursday group being the most congested. After 8:40, the congestion starts to dissipate, and traffic returns to normal after 9:05am. For the PM peak, a different pattern is observed during 14:50-18:40, when traffic conditions are worse on Thursday and Friday than on the other three weekdays.

Owing to the limited sample size of travel times affected by weather events, especially heavy rain and snow, a large number of classes may leave some groups with shorter time spans without enough samples to ensure statistically significant results. Therefore, with reference to the weather data and traffic patterns, groups with similar operating conditions are consolidated, and six final groups are included for further analysis. The boundary points of these groups are also shown in Figure 2. It should be noted that the number of groups can vary, but it must be balanced against the available data.

After the groups are determined, the weather data in each group are obtained. For consistency, the criteria applied in previous research are used to determine the intensity of rainfall and snowfall [2, 6, 8]. The sample size associated with each weather type in each group is shown in Table 1. As there are so few travel time samples associated with heavy snow conditions, the light snow and heavy snow data are combined and referred to as snow events hereafter. In addition, the non-parametric two-sample Kolmogorov-Smirnov (K-S) test is conducted to see whether the distributions of travel time under clear conditions and under weather events within the same group are significantly different from each other. Unlike many statistical tests, the K-S test does not require the data to follow the Gaussian distribution. The null hypothesis of this test is that the two travel time distributions follow the same probability distribution, and the alternative hypothesis is that they do not. According to the test results, there is not enough evidence at the 5% significance level to reject the null hypothesis for the light and heavy rain distributions in the Monday/Friday 6:25-9:05 and Tuesday-Thursday 6:25-9:05 time periods, nor for the clear and heavy rain distributions in the Thursday-Friday 14:50-18:40 time period; that is, those pairs of distributions are not distinct from each other. These three time periods have 11, 79, and 56 samples, respectively, for the heavy rain scenario, which may not be sufficient to draw valid conclusions. Therefore, light and heavy rain are also combined in the following analysis.
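For readers unfamiliar with the two-sample K-S test, a minimal sketch is given below. It computes the statistic D, the largest gap between the two empirical distribution functions, and compares it with the usual large-sample critical value at the 5% level, where the constant 1.358 equals sqrt(-ln(0.025)/2). The function name and the travel times are hypothetical, and for small samples such as the 11 heavy-rain observations mentioned above, an exact test from a statistics package would be preferable to this approximation.

```python
import math
from bisect import bisect_right


def ks_two_sample(x, y, c_alpha=1.358):
    """Two-sample K-S statistic D and the large-sample critical value.

    c_alpha = 1.358 corresponds to a 5% significance level in the usual
    asymptotic approximation; small samples need exact tables instead.
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    for v in xs + ys:                       # ECDFs only jump at observed values
        fx = bisect_right(xs, v) / n        # share of x at or below v
        fy = bisect_right(ys, v) / m        # share of y at or below v
        d = max(d, abs(fx - fy))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return d, critical, d > critical


# Hypothetical travel times in minutes.
clear = [9.0, 9.5, 10.0, 10.5, 11.0]
rain = [12.0, 13.0, 14.0, 15.0, 16.0]
d_stat, d_crit, reject = ks_two_sample(clear, rain)
```

When `reject` is true, the null hypothesis that both samples come from the same distribution is rejected at the chosen level.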

4. Regression Analysis

In previous studies, the dummy variable multiple regression (MR) method was applied to quantify the effects of weather on traffic flow characteristics [5, 22]. With reference to the variables available in the current study, the model can be expressed as follows:

\[ T = \beta_0 + \beta_1 x_{\text{rain}} + \beta_2 x_{\text{snow}} + \varepsilon \tag{1} \]

where $T$ is the travel time and $\varepsilon$ is the error term; $x_{\text{rain}}$ and $x_{\text{snow}}$ are dummy variables equal to 1 if the event occurs and 0 otherwise; $\beta_0$ is the intercept of the regression equation, which takes the value of the average travel time under clear conditions; and $\beta_1$ and $\beta_2$ are the coefficients of $x_{\text{rain}}$ and $x_{\text{snow}}$ and represent the increase in travel time if a rain or snow event happens, respectively.
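The interpretation of the coefficients in the dummy-variable regression can be checked with a small sketch. It assumes every observation carries exactly one of the labels 'clear', 'rain', or 'snow'; under that assumption the least-squares fit reduces to differences of group means, so no regression library is needed. The function name and the data are hypothetical.

```python
from statistics import mean


def dummy_regression(travel_times, weather):
    """OLS fit of T = b0 + b1*rain + b2*snow for mutually exclusive dummies.

    Assumption: every observation is labelled exactly one of 'clear',
    'rain' or 'snow'. In that case the least-squares solution reduces to
    differences of group means, so no matrix algebra is needed.
    """
    groups = {"clear": [], "rain": [], "snow": []}
    for t, w in zip(travel_times, weather):
        groups[w].append(t)
    b0 = mean(groups["clear"])           # average travel time in clear weather
    b1 = mean(groups["rain"]) - b0       # extra minutes when it rains
    b2 = mean(groups["snow"]) - b0       # extra minutes when it snows
    return b0, b1, b2


# Hypothetical travel times in minutes.
tt = [9.0, 10.0, 12.0, 14.0, 13.0, 15.0]
wx = ["clear", "clear", "rain", "rain", "snow", "snow"]
b0, b1, b2 = dummy_regression(tt, wx)
```

With additional covariates or overlapping events the shortcut no longer holds and a general least-squares solver would be needed.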

Accordingly, the MR model is only able to represent the average relationship between the response variable and the explanatory variables. However, weather events often have an asymmetric impact on travel times, as more adverse weather events usually lead to much worse congestion and thus excessively longer travel times. As a result, weather events are expected to have a more pronounced impact on the higher percentiles, such as the 95th percentile travel time, than on the conditional mean, which dilutes such impact. Thus, MR conveys only a partial picture of the whole distribution of travel times. Moreover, the different effects of different weather events suggest that travel times under each condition have unequal variation, that is, they are heteroscedastic, which violates the basic homoscedasticity assumption made by MR.

In contrast, quantile regression (QR) relaxes this assumption and can be applied to describe the relation between any part of the distribution of the response variable and the explanatory variables [23, 24]. In this regard, the QR method is more suitable for quantifying the more severe aspects of weather and how the effects differ across the distribution.

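Now suppose the cumulative distribution function of travel time $T$ is $F_T(t) = P(T \le t)$; then, for $0 < \tau < 1$, the $\tau$-th quantile or percentile travel time would be

\[ Q_T(\tau) = F_T^{-1}(\tau) = \inf\{\, t : F_T(t) \ge \tau \,\} \tag{2} \]

Accordingly, $Q_T(\tau)$ can be mathematically formulated as

\[ Q_T(\tau \mid x_{\text{rain}}, x_{\text{snow}}) = \beta_0(\tau) + \beta_1(\tau)\, x_{\text{rain}} + \beta_2(\tau)\, x_{\text{snow}} \tag{3} \]

where $x_{\text{rain}}$ and $x_{\text{snow}}$ are the rain and snow dummy variables; $\beta_0(\tau)$ is the intercept, which is the $\tau$-th percentile travel time under clear condition; and $\beta_1(\tau)$ and $\beta_2(\tau)$ are the coefficients that represent the increase in the $\tau$-th percentile travel time under rain and snow, respectively.

The above equation can be estimated by solving the following minimization problem:

\[ \hat{\beta}(\tau) = \arg\min_{\beta_0, \beta_1, \beta_2} \sum_{i=1}^{n} \rho_\tau\!\left( t_i - \beta_0 - \beta_1 x_{\text{rain},i} - \beta_2 x_{\text{snow},i} \right) \tag{4} \]

where $t_i$ is the $i$-th of $n$ observed travel times and $\rho_\tau(\cdot)$ is the loss function, defined by

\[ \rho_\tau(u) = u \left( \tau - \mathbb{1}(u < 0) \right) \tag{5} \]

with $\mathbb{1}(\cdot)$ denoting the indicator function.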

Equations (4) and (5) can be reformulated into a standard linear programming problem, which can be easily solved with the simplex method.
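The check-loss minimization can be illustrated without a linear programming solver in the special case considered here. The sketch below assumes mutually exclusive weather labels ('clear', 'rain', 'snow'), so the objective separates into one quantile problem per group, and it relies on the fact that a minimizer of the summed check loss can always be found among the sample values. It is a brute-force illustration under those assumptions, not the simplex-based estimation mentioned above, and all names and data are hypothetical.

```python
def pinball_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_by_loss(sample, tau):
    """Sample value that minimises the summed check loss."""
    return min(sample, key=lambda q: sum(pinball_loss(t - q, tau) for t in sample))


def dummy_quantile_regression(travel_times, weather, tau):
    """Coefficients of Q(tau) = b0 + b1*rain + b2*snow.

    Assumption: weather labels are mutually exclusive ('clear', 'rain',
    'snow'), so the fit separates into one quantile problem per group.
    """
    groups = {"clear": [], "rain": [], "snow": []}
    for t, w in zip(travel_times, weather):
        groups[w].append(t)
    q = {w: quantile_by_loss(v, tau) for w, v in groups.items()}
    return q["clear"], q["rain"] - q["clear"], q["snow"] - q["clear"]


# Hypothetical travel times in minutes, five observations per condition.
tt = [9, 10, 11, 12, 13, 10, 12, 14, 16, 18, 12, 15, 18, 21, 30]
wx = ["clear"] * 5 + ["rain"] * 5 + ["snow"] * 5
coef_90 = dummy_quantile_regression(tt, wx, tau=0.9)
```

In this made-up sample the single 30-minute snow observation drives the upper-quantile snow coefficient, which is the kind of tail effect that a mean-based model averages away.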

Both regression models, MR and QR, are implemented to analyze the effects of rain and snow events for each individual time period obtained from the classification analysis. As an example, the impact of different weather events on the whole distribution of travel time during 6:25-9:05 on Monday and Friday is shown in Figure 3. The x-axis corresponds to the percentiles of interest. The solid red line represents the conditional mean estimated by the MR method, while the dashed red lines show the lower (5th) and upper (95th) bounds of its confidence interval. Meanwhile, the dash-dot line represents the percentile values, and the shaded area shows the corresponding lower (5th) and upper (95th) confidence bounds. According to MR, the average travel time under clear conditions is 9.5 minutes, and it takes 3.3 or 3.8 minutes longer if it rains or snows, respectively. In contrast, the quantile regression plot shows the increase in travel time at a given percentile if a weather event occurs. For example, if it snows during this time period, the travel time at a high percentile increases by 12.65 minutes on top of the 13.7 minutes under clear conditions. MR therefore underestimates this negative impact. In fact, based on the locations of the mean regression lines and the quantile regression lines, MR overestimates the impact of weather at lower percentiles (below the red line) while it underestimates the impact at higher percentiles (above the red line).

Next, we focus specifically on the 80th and the 95th percentile travel times, as the former is suggested to transportation agencies for project improvement evaluations [11], while the latter is of particular interest to travelers in their trip choice decision-making [12]. The results are reported in Table 2, where the last two columns give the percentage increase in travel time compared to clear weather conditions.

Based on Table 2, the magnitude of the negative impact on the average travel time is directly correlated with the level of recurrent congestion. Here, we consider the first, fourth, and fifth time periods in Table 2 to be in free-flow condition, the second and sixth time periods to be in moderately congested condition, and the third time period to represent the heavily congested situation. Accordingly, the mean effect of rain at the three congestion levels is 4-14%, 15-35%, and 22.5%, respectively. The percent increase in travel time during heavily congested conditions is not as large as might be expected, possibly because traffic is already traveling at a very low speed, which makes the additional effect of rain less dramatic. In addition, compared to rain, snow events usually have a more severe impact on the operating condition. They can result in a 14-20%, 20-40%, and more than 40% increase in travel time in free-flow, moderately congested, and heavily congested conditions, respectively.

In terms of travel time reliability, which is represented by the 80th and 95th percentile travel times, weather events have a more significant impact than on the average travel time. For example, compared to the average impact of rain in the heavily congested condition, the percentage increase is now 28.7% for the 80th percentile travel time and 54% for the 95th percentile travel time. In other words, during rainy conditions, to arrive on time 19 out of 20 times, travelers should add about half of the time they normally allocate when there is no weather event. Also, snow has a more serious impact on travel time reliability than rain, especially at the right tail of the distribution. Still taking the heavily congested condition as an example, the 95th percentile travel time under snow is two times more than, that is, about three times, the value under clear weather. In other words, it takes 51 minutes to ensure a 95% chance of on-time arrival, whereas it normally takes only 17 minutes to meet that requirement. Clearly, some snow events have caused enormous disruptions to the reliability of corridor operation.

To better understand the traffic pattern during inclement weather, the effect of a blowing snow event that occurred on March 5, 2013, is illustrated in Figure 4. The grey solid line in the graph represents travel time and the red dashed line represents precipitation. Light rain started at 14:05 and gradually became heavy rain at 18:55, which then quickly turned to heavy snow at 19:15 and lasted for 35 minutes before changing to light snow, which continued into the next day until 6:55. Under the weather impact, the travel time on the corridor rose dramatically 10 minutes after the heavy snow began, and the severe congestion lasted for 4 hours before travel time started to decline; it did not completely return to normal until 8:00. It should be noted that a decrease in snow intensity does not necessarily mean the operating condition improves. This is attributable to the following: (1) it takes time to completely dissipate the queue built up downstream during the event; (2) in the meantime, the accumulated snow on the ground significantly reduces the operating speed, which in essence reduces the capacity of the highway. As a result, simply labeling travel times between the start and end of a weather event as affected by weather, and using precipitation to categorize the intensity of rain and/or snow, may not completely capture the true impact of some adverse weather events, thus underestimating their effects on the operating condition.

5. Conclusion

In this study, a data-driven approach was developed to quantify the impact of weather events on travel time and reliability. We first applied decision tree to automatically classify the time of day and day of week into different groups based on the speed profile. The goal is to create groups so that traffic characteristics are homogeneous within the group while being distinct across the groups. This helps to establish more representative baseline conditions and thus more accurately separate the weather impact from the recurrent congestion in each group. After that, the impact of weather events was determined quantitatively for each group by using multiple regression and quantile regression. We used quantile regression because it was able to evaluate the impact on the whole travel time distribution, whereas multiple regression was only able to reflect the impact on average travel time. For example, quantile regression can estimate the additional time needed due to rain or snow events for the 80th or the 95th percentile travel time, which are frequently used as measures of travel time reliability. The analysis demonstrated the effect of weather events on travel time reliability is more significant than that on average travel time.

Understanding the impact of weather events on traffic operation is important for transportation agencies to determine weather-responsive management strategies and for travelers to plan their trips. The established approach only requires speed and incident data, which are readily available to transportation agencies. It can be easily applied to a corridor to understand the effect of weather events on the corridor traffic. It can also be scaled to the regional network level to identify the road sections experiencing the most adverse impact of weather events. The analysis can help transportation agencies deploy weather-responsive management strategies at critical spots and evaluate their effectiveness after deployment. As an example, winter maintenance can consider the varying degrees of snow impact when prioritizing snow plowing routes. The quantitative analysis can be helpful to the general public as well. Percentile travel times can be disseminated to travelers for route planning purposes. When there is a snow event, freight carriers can use the percentile travel times under snow conditions to determine an optimal route and/or time to ensure just-in-time deliveries.

As more data accumulate over time, the developed approach can be refined by including different intensity levels of rain and snow events. In addition, the actual impact of a weather event can linger beyond the duration of the event itself; therefore, it is desirable to develop an approach that addresses this issue in future research.

Data Availability

The data used to support the findings of this study have not been made available due to the restriction of data use agreement.

Disclosure

The contents of this paper reflect the views of the authors who are responsible for the facts and accuracy of the data presented herein.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research is made possible by support from the Kentucky Transportation Cabinet and Federal Highway Administration.