Abstract

The wide application of road traffic detection technology to the acquisition of road traffic state data has introduced new challenges for the transmission and storage of road traffic big data. In this paper, a compression method for road traffic spatial data based on LZW encoding is proposed. First, the spatial correlation of road segments is analyzed by principal component analysis (PCA). Then, the road traffic spatial data compression based on LZW encoding is presented, and the determination of its parameters is discussed. Finally, six typical road segments in Beijing are adopted for case studies. The results show that the road traffic spatial data compression method based on LZW encoding is feasible and that the reconstructed data achieve high accuracy.

1. Introduction

The advent of big data brings unprecedented opportunities as well as challenges, especially in the field of transportation and traffic engineering [1, 2]. With the rapid development of science and technology, the intelligent transportation system (ITS) has developed continuously, and its applications have become wide ranging. An ITS can accomplish the tasks of road traffic data acquisition, processing, and transmission. It can also perform traffic state analysis, route guidance, and traffic control. As various road traffic detection systems are adopted, the volume of collected road traffic state data grows and becomes massive. This situation poses a challenge for the real-time transmission, storage, and guidance of massive road traffic data. Thus, it is necessary to find an efficient approach to compressing real-time traffic state data, which can save considerable storage space and support other applications [3]. Compression of road traffic state data also supports the work of transportation administrators, and useful compression methods can be applied in transportation research, where they may inspire further studies. The essence of road traffic state data compression is to represent the signal information with less data. Through effective compression and reconstruction, efficient traffic data transmission and storage can be achieved [4-6].

In recent years, many data compression methods have been explored in the traffic and transportation fields. With the popularity of machine learning and data mining among practitioners and researchers, several road traffic compression methods have been presented. Because traffic and transportation big data are multidimensional and multigranular, the PCA method compresses road traffic state data by reducing the dimensions of the original data [7]. As an emerging technology, compressed sensing has also been used in data compression. Compressed sensing is not bound by the limits of the traditional Nyquist sampling theorem and can collect and compress data simultaneously. Making use of the redundancy of road traffic states, compressed sensing achieves the estimation [8] and compression [9-11] of road traffic state data. Since road traffic state data possess spatial-temporal correlation and similarity, Xiao et al. presented a spatial-temporal model based on road traffic data compression and decompression using the 2D discrete wavelet transform, realizing denoising compression for ITS [12]. Ou et al. proposed road traffic volume data compression based on an artificial neural network [13].

Some modified and improved methods also fill the compression gap. The embedded devices in motor vehicles generate abundant data for researchers to investigate the compression of road traffic state data [14]. Making use of the GPS positioning data produced by the mobile devices of travelers, Ma et al. presented a differential preprocessing method, and a dynamic Huffman algorithm was adopted to compress GPS positioning data [15]. Wang et al. put forward an encoding algorithm with self-adaptive mode switching according to a specific format [16]. Hou presented a stop-wave mode based on the concept of the compression factor and its differential equation [17]. Song et al. proposed a hybrid spatial compression algorithm and an error-bounded temporal compression algorithm to compress the spatial and temporal information of trajectories, respectively [18]. However, many studies lack a common baseline for performance analysis and do not provide the infrastructure to operate on a publicly available dataset.

The existing road traffic data compression methods mainly focus on the compression of road traffic network data. In recent years, however, little has been published on compressing the road traffic spatial data of different road segments at similar time nodes. Some studies on prediction have investigated both temporal and spatial aspects, not only in the road traffic field but also in the wider field of transportation.

The travel needs and travel routes of traffic participants exhibit a certain regularity; thus, the road traffic spatial states of different road segments at similar time nodes are strongly related. That is, the curves of road traffic spatial state on different road segments at similar time nodes are similar. This correlation offers considerable potential for the compression of road traffic spatial data. Thus, based on the spatial correlation characteristics of the road traffic states, the road traffic spatial data on different road segments at similar time nodes are extracted for compression. LZW inherits the merits of LZ77 and LZ78 in compression efficiency and speed and readily achieves good performance; therefore, LZW encoding is introduced in this study. Based on the spatial correlation of road traffic, a compression method for road traffic spatial data based on LZW encoding is proposed in this paper.

In this study, a compression method for road traffic spatial data based on LZW encoding is proposed to compress the road traffic spatial state data under the same time intervals, realizing efficient transmission and storage as well as display. The compressed road traffic state data can also be used for feature extraction and traffic state prediction. Multivariate time series analysis, which can take both spatial and temporal correlations into consideration, is similar to the proposed method. In our study, however, the spatial correlation characteristics of road traffic states are used to compress the state data, so the aims of the two approaches are different.

The motivation is as follows. Although the proposed compression method is tested on road traffic state data, it is also useful for transportation management and transportation prediction. In addition, the compression can be used for feature extraction, which can be applied to evaluate traffic running states.

Based on the characteristics of road traffic flow, the PCA method can be used to analyze the correlation of spatial road segments [19, 20]. Then, the spatial road segments are selected to extract the data for compression. Here, the spatial road segments are the distinct road segments whose data are extracted at the same time intervals.

The contributions of the proposed algorithm are threefold:
(1) The PCA method was introduced to the algorithm to select the road segments with spatial correlation.
(2) A novel road traffic spatial data compression algorithm based on LZW encoding was proposed to construct the difference data on selected spatial road segments under the same mode.
(3) The proposed algorithm could determine the optimal parameters in the training process based on spatial historical data and base data on road traffic states.

The rest of this paper is organized as follows. The modeling methodology of the proposed algorithm is discussed in Section 2. In Section 3, parameter determination of the road traffic spatial data compression study based on LZW encoding is presented. The experiment results are shown in Section 4. The conclusion and direction for future studies are discussed in Section 5.

2. Compression Algorithm of Road Traffic Spatial Data Based on LZW Encoding

2.1. Framework of the Algorithm

The processes of compression and reconstruction of road traffic spatial data are shown in Figures 1 and 2, respectively. First, the PCA method was used to select the road segments with spatial correlation. The road traffic spatial data under the same mode on different road segments were acquired to construct the reference sequences of road traffic characteristics. Based on the analysis of spatial correlation, the base road segment was selected, and the data on it were regarded as the spatial base data. Second, the historical data on the other spatial road segments under the same mode were extracted as training data. The optimal threshold of the road traffic spatial difference data was determined based on the road traffic spatial base data under the same mode. Third, real-time spatial data on the other road segments under the same mode were acquired as experimental data, and the road traffic spatial difference data were computed relative to the road traffic spatial base data under the same mode. Finally, the compression and reconstruction of the road traffic spatial difference data were achieved through LZW encoding and decoding, respectively.

2.2. Acquisition of Road Traffic Spatial Base Data
2.2.1. Selection of Road Segments with Correlation Based on PCA Method

Road traffic flows possess the characteristics of periodicity, similarity, correlation, and so on. The road traffic flows of spatial road segments indicate a strong spatial correlation. Thus, the PCA method was used in this study to select the road segments with the characteristics of correlation.

PCA is a multivariate statistical method that eliminates the correlation among the variable indicators. The multidimensional road traffic state data can be effectively reduced to two dimensions and illustrated in a 2D figure. Taking advantage of these characteristics, the related road segments can be selected. The process has been described in previous studies [19, 20].

2.2.2. Division of Road Traffic Running Modes

The road traffic running modes can be divided into two levels: the road network level and road segments level. Assuming that the running modes division identification of road network level and road segments level can be divided into and submodes, respectively, the road traffic running modes can be divided into modes in total. The modes can be shown as . and can be determined by the road traffic running modes division identification. The running modes division identification of road network mainly refers to the impact factors of road traffic running modes on different dates. The road traffic running modes division identification of road segments refers to the influence factors of the road traffic running modes of the specific condition of the road segments, which can be illustrated as in Figure 3.

2.2.3. Construction Design of Road Traffic Characteristics Reference Sequences

Assuming the collection period of road traffic state data was , then time format of road traffic information template can be illustrated as in Figure 4. The table format of the road traffic characteristics reference sequence can be described as in Tables 1 and 2.

Let denote the total number of selected road segments, which can be described as follows:where is the number of spatial road segments; denotes the th road segments; represents the set of selected road segments with correlation.

Based on the correlation of road traffic spatial data, the base road segment is acquired to extract the road traffic data as road traffic base data. The road traffic data on other spatial road segments are extracted as historical data and real-time data.

2.3. Optimal Threshold Determination of Road Traffic Difference Data

The data on other spatial road segments are extracted as training data. Under mode, the road traffic spatial difference data under the same mode are acquired based on road traffic spatial base data to conduct the threshold processing. Through LZW encoding, the optimal threshold is identified. The main expressions can be described as follows: The characteristics are described in Table 3.

Based on the formulas of (2), the optimal threshold of difference data can be identified.

2.4. Road Traffic Spatial Data Compression Based on LZW Encoding
2.4.1. Acquisition of Road Traffic Spatial Difference Data

The spatial data on other road segments were extracted as real-time data. Under mode and based on the spatial base data, the road traffic difference data were acquired. The main expressions can be described as follows:The characteristics are described in Table 4.
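The acquisition of difference data described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `thresholded_difference` is hypothetical, and the sketch assumes that threshold processing means setting to zero every difference whose magnitude does not exceed the threshold.

```python
def thresholded_difference(realtime, base, threshold):
    """Return the differences between a segment and the base segment.

    Assumption: differences with magnitude <= threshold are treated as
    noise and replaced by 0, which creates long runs that compress well.
    """
    if len(realtime) != len(base):
        raise ValueError("both series must cover the same time intervals")
    differences = []
    for value, reference in zip(realtime, base):
        delta = value - reference
        differences.append(delta if abs(delta) > threshold else 0)
    return differences


# Illustrative values only, not data from the paper.
print(thresholded_difference([10, 12, 20, 9], [10, 10, 10, 10], 2))
# [0, 0, 10, 0]
```

A larger threshold zeroes more differences, which raises the compression ratio but also raises the reconstruction error; this is the trade-off that the optimal threshold balances.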

2.4.2. Road Traffic Spatial Difference Data Compression Based on LZW Encoding

LZW encoding is a lossless compression method based on dictionary coding. By constructing a string table, each long string is represented by a shorter code word to realize data compression. The strings and code words are built up gradually, so the string table is constructed dynamically: each incoming string is compared with the entries already in the table, and the table is extended whenever a new string is encountered. The string table does not need to be stored along with the data, because the same table can be rebuilt during decompression. Thus, the compression ratio can be improved further.
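The dynamic construction of the string table can be illustrated with a textbook LZW encoder. This is a sketch under stated assumptions, not the implementation used in this study: it works on bytes with an initial table of the 256 single-byte strings, lets the table grow without bound, and uses a hypothetical helper `serialize_differences` that writes the difference data as comma-separated ASCII text.

```python
def serialize_differences(differences):
    """Hypothetical serialization: comma-separated ASCII integers."""
    return ",".join(str(d) for d in differences).encode("ascii")


def lzw_encode(data):
    """Encode bytes into a list of integer LZW codes."""
    table = {bytes([i]): i for i in range(256)}  # initial single-byte strings
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            # Keep extending the match while it is still in the table.
            current = candidate
        else:
            # Emit the longest known string and register the new one.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


print(lzw_encode(b"ABABABA"))
# [65, 66, 256, 258]
```

Because thresholded difference data contain long runs of zeros, repeated patterns such as `0,0,0,` quickly enter the table and are then emitted as single codes.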

Based on LZW encoding, the road traffic spatial data compression can be achieved. The best threshold of the difference data between road segment and base road segment can be introduced into the difference data on the road segment and the base road segment under the same mode. Combining the LZW encoding, the difference data compression of road segment and base road segment can be realized. The main expressions can be described as follows:The characteristics are explained in Table 5.

The compression ratio is .

2.5. Road Traffic Spatial Data Decompression Based on LZW Decoding

Based on LZW decoding technology, the data reconstruction of difference data between road segments and base road segment can be realized. Combining the base data, the decompression of road segments real-time data can be achieved. The main expressions are as follows:where denotes the LZW decoding; denotes the spatial difference data on road segments after LZW decoding at moment under mode; and denotes the reconstructed road traffic real-time data on road segments at moment under mode.
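The decoding and reconstruction steps can be sketched as follows. This is an illustrative textbook LZW decoder, not the authors' code: it assumes the codes were produced over a 256-entry byte alphabet with an unbounded table, and the hypothetical `reconstruct` helper assumes the decoded bytes are comma-separated ASCII differences that are added back to the base data.

```python
def lzw_decode(codes):
    """Decode a list of integer LZW codes back into bytes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The code refers to the string being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code: %d" % code)
        output.append(entry)
        # Rebuild the same table entry the encoder created at this step.
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(output)


def reconstruct(base, decoded):
    """Add decoded differences back to the base data (assumed text format)."""
    differences = [int(token) for token in decoded.decode("ascii").split(",")]
    if len(differences) != len(base):
        raise ValueError("difference data and base data differ in length")
    return [b + d for b, d in zip(base, differences)]


print(lzw_decode([65, 66, 256, 258]))
# b'ABABABA'
print(reconstruct([10, 10, 10], b"0,5,-2"))
# [10, 15, 8]
```

The decoder never receives the string table; it rebuilds each entry one step behind the encoder, which is why the table does not have to be stored with the compressed data.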

3. Parameter Determination

In the process of road traffic spatial data compression based on LZW encoding, the following parameters were involved: , , , , , , where can be acquired by and , , and can be acquired by , , and . Parameter settings here are only concerned with the effect analysis of the road traffic spatial data compression based on LZW encoding. Separately analyzing the effect of each parameter on the accuracy of the algorithm cannot guarantee an optimal algorithm because these parameters influence the accuracy of the algorithm in different ways. All of the parameters in the road traffic spatial data compression results should be considered when conducting the algorithm analysis.

The compression ratio is introduced to measure the effect of the parameters on the precision of the algorithm. The main expression can be described as follows:

$$\mathrm{CR} = \frac{N_{\mathrm{before}}}{N_{\mathrm{after}}} \tag{7}$$

where $\mathrm{CR}$ denotes the compression ratio of a road segment at a given moment under a given mode; $N_{\mathrm{before}}$ denotes the number of road traffic data before compression at that moment under that mode; and $N_{\mathrm{after}}$ denotes the number of road traffic data after compression at that moment under that mode.
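The ratio defined above can be computed directly from the two counts. The sketch below is illustrative: the function name is hypothetical, and it assumes that the number of data after compression is the number of emitted LZW codes.

```python
def compression_ratio(n_before, n_after):
    """Ratio of the data count before compression to the count after it."""
    if n_before < 0 or n_after <= 0:
        raise ValueError("counts must be non-negative, and n_after positive")
    return n_before / n_after


# Illustrative numbers only: 720 samples reduced to 48 codes.
print(compression_ratio(720, 48))
# 15.0
```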

Different corresponds to different NMAE. Thus, the following expression is reasonable:

That is, a certain distribution relationship exists between and . The process of finding the maximum that corresponds to is training optimal parameters. Thus, the following model can be obtained:

Finally, the value of can be determined through statistical analysis of the reconstructed results of road traffic state.
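The training procedure described in this section can be sketched as a simple search: among candidate thresholds, keep the one with the largest compression ratio whose error stays within a tolerance. This is a hedged illustration only. The names `nmae`, `select_threshold`, `max_nmae`, and `code_count` are hypothetical, NMAE is assumed to be the sum of absolute errors divided by the sum of absolute original values, and `code_count` stands for any function that returns the number of codes an encoder emits for the difference data.

```python
def nmae(original, reconstructed):
    """Normalized mean absolute error (assumed definition, see lead-in)."""
    total = sum(abs(value) for value in original)
    if total == 0:
        raise ValueError("original data sum to zero")
    error = sum(abs(o - r) for o, r in zip(original, reconstructed))
    return error / total


def select_threshold(training, base, candidates, max_nmae, code_count):
    """Return (threshold, ratio) with the largest ratio within max_nmae."""
    best = None
    for threshold in candidates:
        # Threshold processing of the difference data (assumed: zero small ones).
        diffs = [(t - b) if abs(t - b) > threshold else 0
                 for t, b in zip(training, base)]
        reconstructed = [b + d for b, d in zip(base, diffs)]
        if nmae(training, reconstructed) > max_nmae:
            continue  # too lossy for the accepted error level
        codes = code_count(diffs)
        if codes <= 0:
            raise ValueError("code_count must return a positive number")
        ratio = len(training) / codes
        if best is None or ratio > best[1]:
            best = (threshold, ratio)
    return best


# Stand-in encoder for illustration: one code per nonzero value plus one.
stand_in = lambda diffs: 1 + sum(1 for d in diffs if d != 0)
print(select_threshold([10, 12, 20, 9], [10, 10, 10, 10], [0, 2, 10], 0.1, stand_in))
# (2, 2.0)
```

In practice `code_count` would wrap an LZW encoder applied to the serialized difference data, and the candidates and tolerance would come from the historical training data.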

4. Experiments

4.1. Data Acquisition
4.1.1. Road Segment Acquisition

The proposed compression algorithm operates on spatially correlated road traffic data; thus, the selected data must exhibit spatial correlation. The road segments are briefly described here. All of them are expressways of similar width. The volume data on six typical road segments in Beijing were adopted in the present study. The specific road segments are listed in Table 6.

Five days (June 11, 18, 19, 25, and 26, 2011) of road traffic data were extracted to construct the reference sequences of road traffic characteristics. The road traffic state data collection interval is 2 min. As noted for the correlation of road segments in the literature [19, 20], the first two principal components reflect most of the information of the road traffic state. Using the PCA method, we find that four road segments (HI3009b, HI3008b, HI7058b, and HI7036b) exhibit strong correlation, which can be confirmed by cross correlation.

The volume data on the six road segments from June 11, 2011, were extracted to determine the spatial correlation. The cross correlation is shown in Table 7. According to the table, the correlation of all road segments can be determined.

As shown in Table 7, the cross correlations between HI3008b and the other three road segments (HI3009b, HI7058b, and HI7036b) were greater than 0.9. Thus, the HI3008b road segment served as the base road segment, and its collected data were considered as the base data. The volume data on the four road segments were selected for the case study to evaluate the performance of the proposed algorithm. This can be explained by the following reasons.

The change regularity of volume is mainly determined by the regularity of people's origin-destination (OD) travel. Across different dates, however, OD travel changes randomly, whereas travel on weekends is comparatively regular. Thus, four days (June 18, 19, 25, and 26, 2011) of road traffic data on spatial road segments were extracted to construct the reference sequences of road traffic characteristics.
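The cross-correlation screening and the choice of a base segment described in this subsection can be sketched as follows. This is an illustration under assumptions: cross correlation is taken to be the Pearson correlation coefficient at zero lag, and the hypothetical `pick_base` helper chooses the segment whose weakest correlation with the other segments is largest, which is one possible reading of the selection rule rather than the authors' stated procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / (spread_x * spread_y)


def pick_base(series):
    """Pick the segment whose weakest correlation with the others is largest.

    series: dict mapping a segment name to its list of volume values.
    """
    def weakest(name):
        return min(pearson(series[name], other)
                   for key, other in series.items() if key != name)
    return max(series, key=weakest)
```

For example, `pick_base({"A": a, "B": b, "C": c})` returns the name of the candidate base segment for three hypothetical volume series `a`, `b`, and `c` of equal length.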

4.1.2. Data Instruction

The collected road traffic data on the HI3009b, HI7058b, and HI7036b road segments from June 11, 2011, were considered as training data to conduct algorithm parameter settings. Under the same mode, the collected road traffic data on the HI3009b, HI7058b, and HI7036b road segments from four other days were regarded as real-time data to validate the proposed algorithm.

4.2. Results

The road traffic spatial volume data compression results based on LZW encoding on the HI3009b, HI7058b, and HI7036b road segments are illustrated in Figures 5-16.

The running time is also reported, as it indirectly reflects the computation speed of the proposed method. Over several test runs, the average running time was approximately 0.45. This indicates that the proposed method is simple and practicable.

The statistical reconstructed results of spatial volume data based on LZW encoding on the HI3009b and HI7058b road segments from June 18, 19, 25, and 26, 2011, are illustrated in Tables 8 and 9, respectively. CR, AE, marerr, and σ denote the compression ratio, mean absolute error, absolute relative error percentage, and error standard deviation, respectively. Average denotes the mean value of the four indicators. CR is described in (7). AE, marerr, and σ are computed from the error data, that is, the difference between the original real-time data and the reconstructed real-time data on a road segment at a given moment under a given mode, together with the mean error at that moment under that mode.
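The three error indicators can be sketched in code as follows. The formulas are assumptions based on the standard definitions of the named quantities, since the sketch is not taken from the authors' implementation: AE is the mean of the absolute errors, marerr is the mean absolute relative error in percent (skipping zero-valued originals), and σ is the population standard deviation of the errors around their mean. The function name is hypothetical.

```python
import math


def reconstruction_errors(original, reconstructed):
    """Return (AE, marerr, sigma) for a reconstructed series."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("series must be non-empty and of equal length")
    errors = [o - r for o, r in zip(original, reconstructed)]
    n = len(errors)
    ae = sum(abs(e) for e in errors) / n
    # Relative errors are undefined where the original value is zero.
    relative = [abs(e) / abs(o) for e, o in zip(errors, original) if o != 0]
    marerr = 100.0 * sum(relative) / len(relative) if relative else 0.0
    mean_error = sum(errors) / n
    sigma = math.sqrt(sum((e - mean_error) ** 2 for e in errors) / n)
    return ae, marerr, sigma


# Illustrative values only, not data from the paper.
ae, marerr, sigma = reconstruction_errors([10, 20, 40], [9, 22, 40])
print(round(ae, 4), round(marerr, 4), round(sigma, 4))
# 1.0 6.6667 1.2472
```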

4.3. Sensitivity Analysis

A sensitivity analysis is the study of how the uncertainty in the output of a mathematical model or system (numerical or otherwise) can be apportioned to different sources of uncertainty in its inputs [21]. In Section 4.2, four road segments are selected; HI3008b serves as the base road segment, and the others are used for testing. A sensitivity analysis is needed to test the effect of data size on the compression and reconstruction results. Since the proposed algorithm is intended for big data in road traffic, a sensitivity analysis is also required to test its feasibility for small and medium-size data. The data size can be indicated by the collecting time. Therefore, a sensitivity analysis is conducted by testing the compression and reconstruction indicators, that is, CR, AE, marerr, and σ, under different collecting times.

Parameter determination is described in Section 3, but there the optimal parameter is determined under a fixed collecting time. For different collecting times, the optimal parameters will be different. Thus, collecting time is treated as a variable when testing the compression results. In this process, we also follow the rule in (9).

Here, a brief description of the data is provided. In Section 4.1, one day of collected data (720 samples at the 2 min collection interval) is used for the experiment. To test the feasibility of the proposed method, we calculate the experimental indicators under different collecting times on HI7058b, which may be regarded as a test bed. The sensitivity analysis results are shown in Tables 11-14.

From the sensitivity analysis results shown in Tables 11-14, we can see that the compression ratio of big-size data is relatively greater than that of small and medium-size data. In addition, AE, marerr, and σ are all less than 10. The results show that the proposed algorithm is feasible.

A comparison is also provided here. The PCA method is a well-known data compression method; thus, we compare the proposed method with it. The reconstruction indicators are compared for June 19, 2011. The specific results are shown in Table 15.

From Table 15, we can see that the CR of LZW encoding is dramatically greater than that of PCA, whereas the AE, marerr, and σ of PCA and LZW are very similar. The comparison shows that the proposed method performs comparatively better.

4.4. Analysis of Experiment Results

Based on the experimental results presented in Section 4.2, the following analyses are presented:
(1) From Tables 8-10, the following results can be obtained. For the reconstructed volume data, the average compression ratios are 9.91, 15.05, and 5.94 for the HI3009b, HI7058b, and HI7036b road segments, respectively; the average mean absolute errors are 12.15, 6.96, and 10.32, respectively; the average absolute relative error percentages are 13.79, 7.53, and 12.00, respectively; and the average error standard deviations are 14.12, 9.16, and 13.37, respectively. These statistics show that the performance on the HI7058b road segment is better than that on the HI3009b and HI7036b road segments.
(2) The road traffic spatial volume compression ratio for the HI7058b road segment is higher than that for the HI3009b and HI7036b road segments. The main reason is that the cross correlation of road traffic spatial volume for HI7058b is higher than that for HI3009b and HI7036b, as Table 7 also indicates.
(3) The precision of the reconstructed volume results for the HI7058b road segment is higher than that for the other two road segments. From Figures 5-16, the precision and stability of the reconstructed volume data based on LZW encoding are higher for the HI7058b road segment than for the HI3009b and HI7036b road segments. This is mainly caused by the cross correlation between the base volume data on the base road segment and the real-time volume data on the other spatial road segments. From Table 7, similar conclusions can be reached.
(4) Some peak points are missing, for the following reason. The road traffic data at the peak points change suddenly relative to the data on the base road segment. In the feature extraction process, the features are extracted by threshold processing of the difference data. If these features need to be retained, the threshold can be lowered, so that the peak points are retained in the reconstruction at the cost of compression ratio. In the reconstructed results shown above, the peak points are mostly retained.
(5) Several errors occur in the road traffic state reconstruction of this algorithm. The errors are mainly caused by the following two reasons:
    (1) Obtaining the corresponding road traffic spatial states with a perfect match based on LZW encoding is difficult because of the limitations of the road traffic running characteristics.
    (2) The parameters exhibit a certain deviation. The optimal parameters cannot be fixed once and for all because they vary across road traffic state datasets. The selected optimal parameters are determined from historical road traffic state data; therefore, the optimal parameters for the current data may differ from the historical ones.

5. Conclusions

An effective road traffic data compression algorithm can improve the data transmission and storage efficiency of a road traffic system. The PCA method can be used to select the road traffic segments with strong correlation. Based on the spatial correlation of the road traffic spatial data, this study proposes a road traffic spatial data compression algorithm that uses LZW encoding. The proposed algorithm can be used for the road traffic spatial data compression of different road segments. In addition, the selection of highly correlated road segments by PCA can also be used in transportation research, and the compression method may motivate further ideas in that field.

For the road segments with high spatial correlation, the proposed algorithm performs effectively. According to the reconstructed results of the HI3009b and HI7036b road segments, the algorithm is sensitive to correlation: the stronger the correlation, the better the performance of the algorithm. Thus, to ensure improved performance, the cross correlation should be greater than 0.95. Then, the expected compression ratio can be obtained.

Considering the remarkable performance of the proposed algorithm, we will explore the traffic state compression based on the spatial-temporal correlations in our next study.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the Zhejiang Provincial Natural Science Foundation (Grant no. LQ16E080012), the National Natural Science Foundation of China (Grant no. 6157331), and Open Fund for a Key-Key Discipline of Zhejiang Province (2015001).