#### Abstract

Short-term traffic volume forecasting is an essential element of Intelligent Transportation Systems (ITS), providing predictions of traffic conditions for traffic management and control applications. Among existing forecasting approaches, K nearest neighbor (KNN) is a nonparametric, data-driven method popular for its conciseness, interpretability, and real-time performance. However, previous research has rarely addressed the limitations of Euclidean distance or forecasting with asymmetric loss. This paper aims to fill these gaps: it reconstructs the Euclidean distance to overcome its limitation and proposes a KNN forecasting algorithm with asymmetric loss. Correspondingly, an asymmetric loss index, Imbalanced Mean Squared Error (IMSE), is proposed to test the effectiveness of the newly designed algorithm. The effect of the Loess technique and suitable parameter values for the dynamic KNN method are also tested. Compared with the traditional KNN algorithm, the proposed algorithm reduces the IMSE index by more than 10%, which shows its effectiveness when the costs of positive and negative forecasting residuals differ notably. This research expands the applicability of the KNN method in short-term traffic volume forecasting and provides a practical approach to forecasting with asymmetric loss.

#### 1. Introduction

With the rapid development of detection devices, mobile internet, and cloud computing, Intelligent Transportation Systems (ITS) are being implemented in real traffic management systems to improve their efficiency. Short-term traffic volume forecasting, which provides road information ahead of time, has become an essential part of ITS in support of real-time traffic control and management [1]. It is the process of directly estimating the anticipated traffic conditions at a future time, given continuous short-term feedback of traffic information [2]. Unlike long-term traffic forecasting, which serves traffic planning, short-term traffic volume forecasting focuses on predicting traffic conditions over time horizons ranging from a few seconds to a few hours. Most of the traffic data used in short-term traffic forecasting models is collected by automatic detection devices.

Besides abundant real-time traffic data, rapidly developing research in this field using typical statistical models and machine learning algorithms has also accelerated the application of short-term traffic volume forecasting. These approaches are reviewed in Section 2. However, most of this research focuses on forecasting with symmetric loss, which assumes that the cost is the same whether the forecast value is higher or lower than the real value. Forecasting with symmetric loss is simple but insufficient to satisfy real needs in the context of traffic volume. If the forecast value is higher than the real value, it only costs a linear increase in management resources and guides travelers to suboptimal routes. Conversely, if the forecast traffic volume is lower than the real value, it may cause traffic congestion, and the whole traffic system becomes more vulnerable and unpredictable.

This article tries to enhance normal KNN forecasting method to forecast with asymmetric loss. This paper is organized as follows. In Section 2, previous related researches are reviewed. In Section 3, the basic concept of dynamic KNN forecasting is introduced firstly; then some detailed techniques are discussed including the enhanced Euclidean distance which takes the stability of difference between two traffic profiles into consideration, Loess smoothing, asymmetric loss index Imbalanced Mean Squared Error (IMSE), and corresponding algorithm. In Section 4, the basic data used in experiments is introduced firstly. Then profiles using Loess or not are both experimented in dynamic KNN forecasting model. The best ranges of three key parameters are discussed. The effectiveness of KNN with asymmetric loss is tested last. In Section 5, the conclusion of this paper is drawn and further issues in this direction are discussed.

#### 2. Literature Review

Short-term traffic forecasting has been a classical research direction in ITS for nearly 40 years. After the Box-Jenkins method was applied by Ahmed and Cook [3], numerous statistical approaches such as historical average algorithms [4], smoothing [4, 5], Kalman filtering [6], and ARIMA family models [7, 8] came into wide use in this area. These well-founded mathematical approaches are mostly parametric models and perform well in model specification; however, they become insufficient when the traffic pattern is complex and the model parameters are hard to adjust responsively [9, 10]. The full spectrum of the corresponding literature before 2003 was critically reviewed by Vlahogianni [2].

In recent 20 years, as automatic traffic detecting devices have been widely used and machine learning theories have made progress rapidly, data-driven empirical algorithms become prosperous in short-term traffic forecasting [11]. Such algorithms have advantages that they are free of any assumptions regarding the underlying model formulations and the uncertainty involved in estimating the model parameters. These algorithms include K nearest neighbor (KNN) [12, 13], Support Vector Machine (SVM) [14], Random Forest (RF) regression [15], and Artificial Neural Network (AI/NN) [16].

For the KNN method, Smith and Demetsky tested the performance of KNN regression against neural networks, a historical average, and the ARIMA model and concluded that KNN was superior in transferability and robustness [17]. Smith et al. used kernel neighborhoods and suggested that the method produced predictions with an accuracy comparable with that of the seasonal version of an ARIMA model [18]. Habtemichael concluded that previous research using the KNN method mostly used the simplest form of KNN [19]. That article proposed an enhanced KNN method using weighted Euclidean distance and weights on the candidate values.

The crucial step in the KNN method is to define the similarity measurement between traffic profiles. It is also a central topic in traffic volume clustering. Aghabozorgi et al. reviewed previous literature in time-series clustering and concluded that Euclidean distance and Dynamic Time Warping (DTW) were useful similarity measurements for time series [20–23]. Euclidean distance has a certain limitation, which is discussed and addressed in Section 3.2. Other related research in traffic volume clustering includes Xia et al. [24] and Xia et al. [25]. Lin et al. combined KNN with a local linear wavelet neural network (LLWNN) for short-term prediction of five-minute traffic volumes and obtained better performance than LLWNN and SVR [26].

In the review of Vlahogianni [11], 10 future directions in short-term forecasting were proposed. In model selection, most previous researches just selected the model that provided the most accurate predictions regardless of whether certain modeling assumptions were violated or unrealistic. Forecasting with asymmetric loss has been researched in statistic field [27] and was widely used in economic issues [28–30]. Zhang et al. [31] used GJR-GARCH model with a conditional variance formulation to capture asymmetric response in the conditional variance in short-term traffic forecasting. GJR-GARCH allows the conditional variance to respond differently to the past negative and positive innovations, which is inspiring for this article. Lin et al. [32] used quantile regression to deal with the heteroscedasticity problem, which used asymmetric loss functions for prediction intervals calculation of short-term traffic volume.

In summary, the limitation of using Euclidean distance in similarity measurement of traffic profiles has rarely been discussed and using asymmetric loss forecasting of traffic volume is a relatively new issue and is practical in real ITS systems. This research aims to fill up these gaps.

#### 3. Methodology

##### 3.1. Basic Concepts of Dynamic KNN Forecasting Method

KNN is a nonparametric and data-driven method for classification and forecasting. The notion of KNN is “Whatever has happened before will happen again.” Similar patterns are extracted from historic data and compared with new data to determine the underlying classification label or value of the new data. In traffic volume forecasting, the KNN model needs a database of historic traffic profiles. Given a certain subject traffic profile to forecast, the KNN model compares the subject profile with the profiles in the database using a predefined similarity measurement. Then the K nearest neighbor profiles are chosen and aggregated over the desired time horizon to predict future traffic volume. To explain how to use the KNN method in traffic volume forecasting, some terms are defined as follows. The terms are generally consistent with those used by Habtemichael [19].

(i) Subject profile: the traffic profile of one specific day to be forecasted. The data structure of the subject profile can be a 1 × n vector (n timestamps).

(ii) Profiles database: a database containing historical traffic profiles collected and preprocessed previously. The data structure of the profiles database can be an m × n two-dimensional table (m historical profiles, n timestamps).

(iii) Candidate profiles: the nearest neighbors selected from the profiles database according to the similarity between the subject profile and the profiles in the database. The number of candidate profiles is the parameter K in the KNN method.

(iv) Lag duration: the time window considered to determine the similarity between subject and candidate profiles. For example, if the time resolution is 5 minutes, the start point of forecasting is 6.00 AM (timestamp 72), and the lag duration is 24 time intervals (5 × 24 / 60 = 2 hours), the subprofile from 4.00 AM to 6.00 AM of the subject profile and candidate profiles will be extracted and used for the similarity measure.

(v) Forecast duration: the time window over which the forecast is made. For example, if the time resolution is 5 minutes, the start point of forecasting is 6.00 AM, and the forecast duration is 12 time intervals (5 × 12 / 60 = 1 hour), the subprofile from 6.00 AM to 7.00 AM will be forecasted based on the candidate subprofiles from 6.00 AM to 7.00 AM.
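The window arithmetic in definitions (iv) and (v) can be checked with a short Python sketch. The helper names `split_windows` and `clock_time` are hypothetical, and the indexing convention is an assumption: timestamp t is taken to label the 5-minute interval that ends t × 5 minutes after midnight, so timestamp 72 corresponds to 6.00 AM and the first forecast timestamp is 73.

```python
def split_windows(profile, start, lag, horizon):
    """Return (lag_window, forecast_window) for a 1-indexed start timestamp.

    Assumption: `start` is the last observed timestamp, so the lag window
    covers timestamps start-lag+1 .. start and the forecast window covers
    timestamps start+1 .. start+horizon.
    """
    if start - lag < 0 or start + horizon > len(profile):
        raise ValueError("window falls outside the profile")
    lag_window = profile[start - lag:start]
    forecast_window = profile[start:start + horizon]
    return lag_window, forecast_window


def clock_time(timestamp, resolution_min=5):
    """Convert a timestamp into the clock time at the end of its interval."""
    minutes = timestamp * resolution_min
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


# Placeholder profile whose values equal their own timestamps (1..288).
profile = list(range(1, 289))
lag_w, fc_w = split_windows(profile, start=72, lag=24, horizon=12)
print(lag_w[0], lag_w[-1])   # first and last timestamp of the lag window
print(fc_w[0], fc_w[-1])     # first and last timestamp of the forecast window
print(clock_time(48), clock_time(72), clock_time(84))
```

Under this convention the lag window runs from timestamp 49 to 72 (4.00 AM to 6.00 AM) and the forecast window from 73 to 84 (6.00 AM to 7.00 AM).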

The KNN forecasting approach proposed by Habtemichael produces only a single static forecast [19]. The dynamic version of KNN forecasting follows the flow chart shown in Figure 1. This dynamic KNN approach achieves rolling traffic volume forecasting over a whole day in multiple steps.
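A minimal Python sketch of such a rolling forecast is given below. It is an illustration under stated assumptions, not a transcription of Figure 1: the function names are hypothetical, the candidate subprofiles are aggregated by a plain mean, and after each step the newly observed values of the subject profile are assumed to be available for the next similarity search.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_step(subject, database, start, lag, horizon, k, distance=euclidean):
    """Forecast the `horizon` values after index `start` from the k nearest profiles."""
    query = subject[start - lag:start]
    ranked = sorted(database, key=lambda p: distance(query, p[start - lag:start]))
    neighbours = ranked[:k]
    # Assumption: candidates are aggregated by an unweighted mean.
    return [sum(p[t] for p in neighbours) / len(neighbours)
            for t in range(start, start + horizon)]


def dynamic_knn(subject, database, start, lag, horizon, k, distance=euclidean):
    """Repeat knn_step until the end of the day, one forecast window at a time."""
    forecast = list(subject[:start])  # values before the start point are taken as observed
    t = start
    while t < len(subject):
        step = min(horizon, len(subject) - t)
        forecast.extend(knn_step(subject, database, t, lag, step, k, distance))
        t += step
    return forecast


# Toy data (hypothetical): a 12-point "day" and three historical profiles.
subject = list(range(1, 13))
database = [
    [v + 0.5 for v in subject],   # close to the subject
    [v * 3.0 for v in subject],   # far from the subject
    [v - 0.5 for v in subject],   # close to the subject
]
print(dynamic_knn(subject, database, start=4, lag=3, horizon=2, k=2))
```

Passing a different `distance` function is the hook through which the modified similarity measures of Sections 3.2 and 3.4.2 would enter.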

##### 3.2. Improved Similarity Measuring Method

Many previous studies hold that Euclidean distance (or weighted Euclidean distance) is a proper similarity measure for traffic volume clustering and prediction. However, its limitation deserves attention. For two sequences of points, $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$, Euclidean distance is calculated as

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{1}$$

Take an example as Figure 2 shows. There are three imitating traffic volume time series y, y1, and y2. Use Euclidean distance to calculate the similarity between y and y1, y and y2. Because every point of y2 is closer to y compared with y1, the Euclidean distance between y2 and y (79.86) is lower than that between y1 and y (120).

However, in the context of traffic volume, profiles y and y1 both show a traffic volume peak at timestamp 4, whereas profile y2 peaks at timestamps 2 and 7. In this case, Euclidean distance measures similarity only as a total absolute distance; it ignores the difference in shape between profiles and therefore cannot describe traffic volume precisely.

To take the shape difference between profiles into consideration, adding the stability of difference measurement into Euclidean distance is a practical way as (2) shows:where and is the average value of series . R1 and R2 are the parameters to switch the balance of the weight between absolute distance and stability of difference, which can add flexibility in similarity measurement.
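One form consistent with this description, written here as an assumption rather than as the exact expression of (2), takes the pointwise difference $z_i = x_i - y_i$ with mean $\bar{z}$ and combines the two terms as $R_1 \sqrt{\sum_i z_i^2} + R_2 \sqrt{\sum_i (z_i - \bar{z})^2}$. The Python sketch below implements that assumed form; the function name and the example series are hypothetical and are not the data of Figure 2.

```python
import math


def enhanced_distance(x, y, r1=1.0, r2=1.0):
    """Assumed form: r1 * absolute distance + r2 * stability of the difference."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    z = [a - b for a, b in zip(x, y)]
    z_mean = sum(z) / len(z)
    absolute = math.sqrt(sum(d * d for d in z))
    # The stability term is zero when the two profiles differ by a constant offset.
    stability = math.sqrt(sum((d - z_mean) ** 2 for d in z))
    return r1 * absolute + r2 * stability


# Hypothetical series: y1 has the same shape as y, y2 is closer pointwise.
y = [10, 20, 40, 20, 10]
y1 = [20, 30, 50, 30, 20]
y2 = [15, 15, 35, 25, 5]

for r2 in (0.0, 2.0):
    d1 = enhanced_distance(y, y1, r1=1.0, r2=r2)
    d2 = enhanced_distance(y, y2, r1=1.0, r2=r2)
    print(r2, "y1 nearer" if d1 < d2 else "y2 nearer")
```

With `r2 = 0` the measure reduces to plain Euclidean distance and y2 is the nearer profile; raising `r2` shifts the preference toward y1, whose difference from y is perfectly stable.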

##### 3.3. Loess Smoothing Technique to Reduce Noise

The traffic volume used in the experiments is aggregated over 5-minute intervals, and the profiles contain considerable noise. Using raw data without any processing makes forecasting unstable and reduces accuracy, as tested in Section 4.2.1. A noise-smoothing technique should therefore be applied to the time series before forecasting.

Locally estimated scatterplot smoothing (Loess) is a mature nonparametric smoothing technique that has been widely used in previous related work. The theory of the Loess technique is stated by Cleveland and Devlin [33]. Experiments comparing raw profiles and Loess-smoothed profiles in KNN forecasting are conducted in Section 4.2. Span is the key parameter of the Loess model; it adjusts the smoothness of the smoothed profiles. Previous research [19] found that 0.2 is a proper value for span in traffic volume forecasting using the KNN method.
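To make the role of span concrete, the following Python sketch implements a simplified Loess-style smoother: for each point it fits a weighted straight line to the nearest fraction `span` of the series, using tricube weights. It is a sketch under assumptions (evenly spaced data, local degree 1, no robustness iterations); the Loess routine of a statistical package may use different defaults and give different values.

```python
import math


def loess_smooth(values, span=0.2):
    """Smooth an evenly spaced series with local linear fits and tricube weights."""
    n = len(values)
    k = min(n, max(3, math.ceil(span * n)))  # number of points in each local fit
    smoothed = []
    for i in range(n):
        lo = max(0, min(i - k // 2, n - k))  # k consecutive points around i
        h = max(i - lo, lo + k - 1 - i)      # distance to the farthest point in the window
        sw = sx = sy = sxx = sxy = 0.0
        for j in range(lo, lo + k):
            u = abs(j - i) / h if h > 0 else 0.0
            w = (1.0 - u ** 3) ** 3          # tricube weight
            x = j - i                        # centre x on the target point
            sw += w
            sx += w * x
            sy += w * values[j]
            sxx += w * x * x
            sxy += w * x * values[j]
        denom = sw * sxx - sx * sx
        if abs(denom) < 1e-12:
            smoothed.append(sy / sw)         # degenerate window: weighted mean
        else:
            slope = (sw * sxy - sx * sy) / denom
            smoothed.append((sy - slope * sx) / sw)  # fitted value at x = 0
    return smoothed


# Hypothetical noisy profile: a linear trend with alternating +1/-1 noise.
noisy = [t + (1 if t % 2 == 0 else -1) for t in range(50)]
print([round(v, 2) for v in loess_smooth(noisy, span=0.2)[:6]])
```

A larger span averages over more neighbouring timestamps and yields a smoother profile, at the cost of flattening genuine peaks.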

##### 3.4. Forecasting with Asymmetric Loss

In normal KNN forecasting model and most of other forecasting methods, there is a hypothesis: the cost of forecasting lower or higher is equal. Few researches focus on the asymmetric cost of forecasting. However, asymmetric cost is meaningful in traffic management. If forecasting traffic volume is higher than the real value, it only costs linear-increased management resource and makes travelers choose other possible routes. However, if forecasting value is lower than the real one, traffic chaos is more likely to occur and the whole traffic system is more unpredictable and vulnerable. It is sensible to make forecasting slightly higher according to the imbalance cost of direction of forecasting residual.

###### 3.4.1. Enhanced Criterion Index IMSE

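To measure the effect of the asymmetric loss forecasting method, an asymmetric criterion must first be constructed. The typical balanced criterion index MSE (Mean Squared Error) is calculated as (3), where $y_t$ is the observed value, $\hat{y}_t$ is the forecast value, and $n$ is the number of forecast points:

$$\mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2 \tag{3}$$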

Because the sum of squares does not take the sign of the difference into consideration, the index MSE is balanced in forecasting evaluation. Consider the difference $e_t = y_t - \hat{y}_t$ between the real value $y_t$ and the forecast value $\hat{y}_t$. When $e_t$ is positive, which means the observed value is larger than the forecast value, the penalty should be greater. Conversely, if $e_t$ is negative, the penalty should be smaller.

So the Imbalanced Mean Squared Error (IMSE) can be transformed as follows:
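As an illustration of the idea, the Python sketch below weights each squared residual $e_t = y_t - \hat{y}_t$ by a larger factor when the forecast is too low ($e_t > 0$) than when it is too high. The weighting scheme and the values `under_weight=2.0` and `over_weight=1.0` are assumptions chosen for the example, not the coefficients of the IMSE definition.

```python
def mse(observed, forecast):
    """Balanced Mean Squared Error."""
    return sum((y - f) ** 2 for y, f in zip(observed, forecast)) / len(observed)


def imse(observed, forecast, under_weight=2.0, over_weight=1.0):
    """Asymmetric MSE: under-forecasts (observed > forecast) are penalised more.

    The two weights are illustrative assumptions.
    """
    total = 0.0
    for y, f in zip(observed, forecast):
        e = y - f
        weight = under_weight if e > 0 else over_weight
        total += weight * e * e
    return total / len(observed)


observed = [100, 100]
too_low = [90, 90]     # forecast below the observed volume
too_high = [110, 110]  # forecast above the observed volume

print(mse(observed, too_low), mse(observed, too_high))
print(imse(observed, too_low), imse(observed, too_high))
```

Both forecasts have the same MSE, but the asymmetric index penalises the under-forecast more heavily, which is the behaviour the text asks of IMSE.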

###### 3.4.2. Enhanced Asymmetric Algorithm of KNN Forecasting Method

In KNN forecasting model, the most important part is to calculate the similarity between subject profile and candidate profiles. It is natural to add asymmetric identification and operation into similarity calculation. A native way is to add asymmetric response into distance. If the profile in database is larger than the subject profile, it will not be brought into distance calculation and will be chosen more easily in nearest neighbor identification. On the opposite, if subject profile is larger, it will be brought into distance calculation. The distance of two profiles is calculated as follows:where is candidate profile value and is subject profile value at timestamp t.

The advantage of this asymmetric algorithm is that it is easy to be programmed. However, when profile is far larger than the subject value, this algorithm is vulnerable and can be affected by abnormal value in profiles. This algorithm is reasonable when profiles in database are all from one certain detecting device and contain few abnormal values in time series.
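The asymmetric distance described in this subsection can be sketched in Python as follows. The sketch assumes a Euclidean-style form in which only timestamps where the subject value exceeds the candidate value contribute; whether the original formula takes the square root is an assumption, and the profiles are hypothetical.

```python
import math


def asymmetric_distance(subject, candidate):
    """Distance that ignores timestamps where the candidate exceeds the subject."""
    return math.sqrt(sum(max(s - c, 0.0) ** 2 for s, c in zip(subject, candidate)))


subject = [100, 120, 140]
higher = [110, 130, 150]     # candidate slightly above the subject
lower = [90, 110, 130]       # candidate slightly below the subject
outlier = [1000, 1000, 1000]  # abnormal profile far above the subject

print(asymmetric_distance(subject, higher))
print(asymmetric_distance(subject, lower))
print(asymmetric_distance(subject, outlier))
```

The higher candidate and the abnormal profile both receive distance zero, which shows why this simple rule favours higher forecasts and also why it is sensitive to abnormal values in the database.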

#### 4. Result and Discussion

##### 4.1. Data Use

The dataset used in this paper is from one of the traffic investigation stations in Guizhou province, China. It was collected by an electromagnetic coil detecting device buried under the surface of a freeway. The dataset is supported by the Transport Planning and Research Institute of China. The experiment dataset spans September 19 to October 12, 2016. Traffic volume is aggregated every 5 minutes, so there are 288 records (60 / 5 × 24) per day if there is no loss in detecting and aggregating.

However, missing value is unavoidable when detecting device is not completely reliable. The numbers of records every day in experiment dataset are shown in Table 1. There is no record on Sep 28, so it will not be used in later experiments. For other dates that contain missing value more or less, interpolation method is used to ensure the number of records every day is 288.
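The text does not state which interpolation method is used. As a minimal sketch, assuming linear interpolation between the nearest recorded timestamps and nearest-value filling at the edges of the day, a day's records could be completed as follows; the function name and the input layout (a dictionary keyed by timestamp) are hypothetical.

```python
def fill_missing(records, n=288):
    """Return n values for timestamps 1..n, interpolating the missing ones.

    `records` maps a timestamp (1..n) to a traffic volume.
    """
    if not records:
        raise ValueError("cannot interpolate a day with no records")
    known = sorted(records)
    filled = []
    for t in range(1, n + 1):
        if t in records:
            filled.append(float(records[t]))
            continue
        before = [k for k in known if k < t]
        after = [k for k in known if k > t]
        if before and after:
            a, b = before[-1], after[0]
            frac = (t - a) / (b - a)
            filled.append(records[a] + frac * (records[b] - records[a]))
        else:
            # Gap at the start or end of the day: reuse the nearest record.
            nearest = before[-1] if before else after[0]
            filled.append(float(records[nearest]))
    return filled


# Toy day with 6 timestamps, two of them missing.
print(fill_missing({1: 10, 2: 20, 4: 40, 6: 30}, n=6))
```

A day without any record raises an error in this sketch, in line with the decision to exclude such a day from the experiments.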

The date span of dataset contains a “National Day” holiday from September to October , which is useful when finding similar traffic patterns in later experiments. Traffic volume pattern between holiday and normal dates can be easily identified as Figure 3 shows.

Figure 3: Traffic volume patterns of holiday and normal dates, panels (a) and (b).

##### 4.2. Dynamic KNN Forecasting

###### 4.2.1. Using Raw Data in Dynamic KNN Model

To test the feasibility of dynamic KNN forecasting method mentioned in Section 3, raw traffic volume data which is not smoothed by Loess technique is used in this section.

Firstly, as described in KNN workflow in Figure 1, the subject profile and profiles database should be appointed. For example, the series of 10/6 is chosen to be the subject profile, and series of other dates including the date before and after 10/6 (because the scale of dataset is limited and data should be used as far as possible to maintain the scale of database) are chosen to be the element profile of database.

Secondly, the key parameters of the KNN model should be designated: *K*, the number of nearest neighbors; *lag duration*, the time window used to calculate the similarity of profiles; and *forecasting duration*, the time window over which forecasts are made. Using the best parameter values from Section 4.3, K is 3, lag duration is 22 intervals (1 hour 50 minutes), and forecasting duration is 6 intervals (30 minutes).

Thirdly, the start time point of forecasting should be designated. Because the traffic volume in early of a day (like 0 AM to 6 AM) is quite low and not important in traffic management, it is assumed that the forecasting value is approximately equal to the real value before start time point of a day. In this experiment, the start time point is designated to timestamp 72 (6 AM).

After all these preparations, dynamic KNN forecasting method can be used to forecast traffic volume. Figure 4 shows that the forecast value fluctuates acutely, which may reduce the accuracy of forecasting. The criterion indexes MSE, MAE, and IMSE of this experiment lie in the first line of Table 5.

###### 4.2.2. Using Loess Data in Dynamic KNN Model

As experiment in Section 4.2.1 shows, fluctuation of profiles increases the difficulty for KNN method to find nearest neighbor and meanwhile reduce the accuracy of forecasting result. As Section 3.3 states, Loess smoothing technique can reduce noise in profiles and improve the ability to identify similarity among profiles. In this section, experiment is conducted to verify the effect of Loess smoothing technique using dynamic KNN forecasting model.

The main steps of this experiment are similar to experiment in Section 4.2.1 except that every profile in database is smoothed by Loess technique with span of 0.2. The result of experiment is shown in Figure 5. In this figure, the forecasting value is relatively smooth after the start point (72 timestamps). The red line of forecasting value still contains some sharp breaks caused by the change of chosen nearest neighbor. The criterion indexes MSE, IMSE, and MAE of this forecasting experiment lie in the second line of Table 5. Three criterion indexes drop obviously compared with the first line. It means that the forecasting using smoothed profiles database gets better accuracy.

The forecasting residual error after the start time point (from timestamps 73 to 288) is shown in Figure 6. The residual error is distributed around 0 and shows no apparent pattern over time. However, residual errors above 0 are slightly more frequent than errors below 0, which means that in the normal KNN forecasting model the forecast value tends to be lower than the real value. In traffic volume forecasting, this tendency may harm traffic management, as discussed in Section 3.4. To reduce this effect, asymmetric loss and an asymmetric KNN algorithm have been introduced in Section 3.4, and the related experiment is conducted in Section 4.4.

Figure 7(a) shows Auto Correlation Function (ACF) of residual error and shows there is a little self-correlation in residual error. Figure 7(b) tests normality of residual error by Quantile-Quantile plot (Q-Q plot) and shows the normality of residual is acceptable.

Figure 7: (a) ACF of the residual error; (b) Q-Q plot of the residual error.
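For readers who want to reproduce the autocorrelation check, a sample ACF can be computed as in the Python sketch below. The function name is hypothetical, and the ±1.96/√n band is the usual large-sample approximation for white noise, mentioned here as a common convention rather than as the criterion used for Figure 7.

```python
import math


def acf(residuals, max_lag):
    """Sample autocorrelation of a residual series for lags 0..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("constant series has no defined autocorrelation")
    return [sum(dev[t] * dev[t + lag] for t in range(n - lag)) / c0
            for lag in range(max_lag + 1)]


# Hypothetical residuals that alternate in sign, i.e. strongly self-correlated.
residuals = [1.0, -1.0] * 10
values = acf(residuals, max_lag=3)
band = 1.96 / math.sqrt(len(residuals))
for lag, value in enumerate(values):
    print(lag, round(value, 3), "outside band" if abs(value) > band else "inside band")
```

Lags whose autocorrelation stays inside the band are compatible with uncorrelated residuals; values outside it point to structure the forecasting model has not captured.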

##### 4.3. Identifying Suitable Parameter Value in KNN Methods

###### 4.3.1. Suitable Number of Nearest Neighbors K

The number of nearest neighbors K is one of the most important parameters; it determines how many candidate profiles are used to make the forecast. When K is too small, the profiles used to forecast are insufficient and the forecast may fluctuate sharply because of extreme values in the candidate profiles. When K is too large, the candidate profiles are more likely to contain dissimilar traffic profiles, which may reduce the accuracy of forecasting. Choosing an appropriate K value is therefore critical for this model.

In symmetric loss model, different K value is experimented and MSE is chosen to be the criterion index. In asymmetric loss model, IMSE is chosen to be the criterion index. The result is shown in Table 2. MSE is lowest when K is 3 in symmetric loss model and IMSE is lowest when K is 5 in asymmetric loss. It means asymmetric model needs little more nearest neighbors to make the best forecasting.

###### 4.3.2. Suitable Value of Lag Duration

Lag duration determines the length of profile to calculate similarity among the subject profile and candidate profiles. When lag duration is too short, the selection of nearest neighbor profiles changes notably, which makes forecasting unstable. When lag duration is too long, the forecasting is more stable; however the flexibility of model is worse, which makes accuracy decline.

Lag duration from 4 intervals (20 minutes) to 48 intervals (4 hours) is tested in both symmetric model and asymmetric model. The corresponding forecasting criterion indexes are shown in Table 3. MSE is lowest when lag duration is 22 in symmetric loss model and IMSE is lowest when lag duration is 30 in asymmetric model.

###### 4.3.3. Suitable Value of Forecasting Duration

Forecasting duration determines the time window of one step. Longer forecasting duration can provide more traffic information in future, which is more helpful. However, if forecasting duration is too long, the predicting accuracy may decline and may cause unnecessary traffic chaos. This section tries to test the ultimate limit of forecasting duration in dynamic KNN traffic forecasting model.

As Table 4 shows, the criterion indexes of different forecasting duration fluctuate slightly from 4 intervals (20 minutes) to 12 intervals (1 hour). When forecasting duration is larger than 1 hour, indexes increase obviously, which means forecasting is becoming unreliable. The best forecasting duration is 6 (30 minutes) in symmetric model and 8 (40 minutes) in asymmetric model.
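The one-parameter-at-a-time search used throughout this section can be expressed as a small Python helper. Everything in the sketch is hypothetical: `sweep_parameter` is an illustrative name, the candidate grids are examples, and `toy_criterion` is a stand-in for running the dynamic KNN model and returning MSE (symmetric model) or IMSE (asymmetric model).

```python
def sweep_parameter(name, candidates, base_params, evaluate):
    """Evaluate the criterion for each candidate value of one parameter.

    Returns the value with the lowest criterion and the full score table.
    """
    scores = {}
    for value in candidates:
        params = dict(base_params, **{name: value})
        scores[value] = evaluate(**params)
    best = min(scores, key=scores.get)
    return best, scores


def toy_criterion(k, lag, horizon):
    """Stand-in for a real forecasting run; lower is better."""
    return (k - 3) ** 2 + 0.1 * (lag - 22) ** 2 + 0.5 * (horizon - 6) ** 2


base = {"k": 3, "lag": 22, "horizon": 6}
best_k, _ = sweep_parameter("k", range(1, 11), base, toy_criterion)
best_lag, _ = sweep_parameter("lag", range(4, 49, 2), base, toy_criterion)
best_horizon, _ = sweep_parameter("horizon", range(4, 25, 2), base, toy_criterion)
print(best_k, best_lag, best_horizon)
```

Sweeping one parameter while holding the others fixed is cheap but does not explore interactions between K, lag duration, and forecasting duration; a joint grid over all three would do so at a higher computational cost.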

##### 4.4. Using Asymmetric Loss and Asymmetric Algorithm in KNN

As analysis in Section 3.4 shows, traffic volume forecasting with asymmetric loss is more practical in traffic management and traffic information service. This section uses asymmetric loss criterion index IMSE and asymmetric algorithm with the best parameters as Section 4.3 to test whether new-designed asymmetric algorithm can achieve asymmetric forecasting. The forecasting result is shown in Figure 8.

In Figure 8, the red line is between the orange line and the highest edge of blue line, which means forecasting using asymmetric algorithm is inclined to forecast traffic volume a little higher in a reasonable extent.

However, after timestamp 192 (4 PM), the red line is close to the orange line. Examining the chosen nearest neighbors shows that they are the same as the neighbors chosen by the normal model. It is inferred that this result is related to the scale of the profiles database: a larger profiles database may perform better in asymmetric forecasting. This inference will be examined in further research. The forecasting criterion indexes lie in the last line of Table 5.

From Table 5, asymmetric algorithm performs worse in MSE/MAE but does better in IMSE. Using the newly designed algorithm, IMSE index drops more than 10%. It means that asymmetric algorithms are more useful when loss of prediction direction is different, which is reasonable in traffic management. So the asymmetric algorithm achieves the aim of design.

#### 5. Conclusions

Using KNN method in short-term traffic forecasting has a history of nearly 20 years. However some detailed operations in KNN still have potential to be enhanced to satisfy the growing need of real ITS systems. In this paper, several limitations of previous researches in this direction are discussed and corresponding methods are proposed and tested.

First, the limitation of Euclidean distance is discussed using a counterexample. Only the absolute distance of every pair of points of two traffic profiles will be calculated and the shape difference is not taken into consideration. To solve this problem, Euclidean distance is reconstructed to contain the stability of the difference between two profiles and use parameters to adjust the balance between absolute difference and shape difference.

Second, Loess technique has been widely used in previous related researches. However the comparison of using raw profiles and smoothed profiles in dynamic KNN method in traffic volume forecasting was seldom made. This research provides strong evidence suggesting that the criterion indexes MSE and MAE drop sharply when Loess is used in KNN method.

Third, asymmetric loss was seldom discussed in previous short-term traffic forecasting researches and it has realistic meaning in real traffic management and service. The asymmetric loss index IMSE is proposed and the asymmetric loss version of KNN algorithm is constructed and tested in later experiment. The results show that IMSE index drops more than 10% and the forecasting value is closer to the upper edge of real traffic volume, which means that the newly designed algorithm can achieve better performance when the cost of forecasting direction has significant difference.

There are still some limitations in this study. The concept of asymmetric loss and newly designed method is still immature and relatively rough, which will be easily affected if profiles contain extreme values. More refined traffic forecasting methods and algorithms with asymmetric loss can be proposed in this direction in future researches. And more corresponding simulation experiments can be conducted to test the usefulness of asymmetric loss in traffic management and traffic guidance.

#### Data Availability

The data used to support the findings of this study have been deposited in the https://github.com/ahorawzy/TFTSA/tree/master/data-raw repository.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

The study is funded by scientific and technological support program of the Ministry of Science and Technology of People’s Republic of China (2014BAH23F01). WenPeng Zhao (zhaowp@cahs.com.cn) gave useful advice about preprocessing of raw data and R language programming. Chuantao Wang (wangchuantao@bucea.edu.cn) gave useful advice about major and minor revise and undertook some revision work.