Abstract

Parking issues have been receiving increasing attention. Accurate parking occupancy prediction is considered a key prerequisite for optimally managing limited parking resources. However, research on predicting the occupancy of different parking lots, which is critical to the coordinated management of multiple car parks (e.g., at district or city scale), is relatively limited. This study analyses the performance of different prediction methods for parking occupancy, considering parking type and parking scale. Two forecasting methods, FM1 and FM2, and four prediction models, linear regression (LR), support vector machine (SVM), backpropagation neural network (BPNN), and autoregressive integrated moving average (ARIMA), were used to build models that predict the parking occupancy of different parking lots. To compare the predictive performance of these models, real-world data from four car parks in Shenzhen, Shanghai, and Dongguan were collected over 8 weeks and used to estimate the correlation between parking lot attributes and forecast results. In the case studies, SVM offered the most stable and accurate prediction performance of the four models for almost all types and scales of parking lots. For commercial, mixed functional, and large-scale parking lots, FM1 with SVM made the best prediction. For office and medium-scale parking lots, FM2 with SVM made the best prediction.

1. Introduction

Car parking has become a major issue in urban areas worldwide, and most countries face a shortage of parking spaces. With continuing economic development and urbanisation, car ownership is growing rapidly, which exacerbates the imbalance between parking supply and demand [1]. Nationwide car ownership data released by the Ministry of Public Security of China in 2018 show that the number of cars reached 240 million, with an annual growth rate of 10.51%, whereas the total number of parking spaces, private and public, was only 102.5 million, less than half the number of cars. Moreover, around 30% of the traffic congestion in Chongqing and Shanghai, two major Chinese cities, is due to a lack of car parking spaces [2]. This problem is mainly caused by ineffective parking management: according to a recent research report [3], the parking space utilisation rate is below 50% in more than 90% of Chinese cities. Given the limited land available in cities, increasing the parking area is not a sustainable solution, whereas efficient parking management is a practical one. Intelligent parking systems are an essential part of efficient parking management, and within such a system, timely parking occupancy prediction is of great significance to decision makers and city planners.

The number of available parking spaces plays an important role in drivers’ parking decisions [4, 5]. According to Caicedo et al. [6], drivers with information on parking availability are 45% more successful in finding parking spaces than those without it. Parking occupancy prediction is also helpful in transportation management and planning [7]. For instance, public agencies such as city traffic and planning departments use predicted parking occupancy to manage transportation demand and traffic congestion [8], and parking facility managers and operators can anticipate system performance and make short- and long-term preventive strategic decisions to avoid system breakdowns [9]. In addition, parking occupancy prediction can help reduce traffic congestion and energy consumption [10]. According to one report [11], US drivers spend on average 17 h per year searching for parking spaces, at a cost of $345 per driver in wasted time, fuel, and emissions. If accurate predictions of parking availability are provided, drivers can save considerable search time and energy consumption can be reduced. Parking occupancy prediction is therefore a critical but often overlooked element of the transportation system, and it can help balance occupancy across parking lots.

Researchers have applied models such as linear regression (LR), support vector machine (SVM), backpropagation neural network (BPNN), and autoregressive integrated moving average (ARIMA) to predict parking availability. According to the literature, the performance of these models may vary significantly with parking type and parking scale [12]. Sun et al. [13] used a local linear regression model to predict short-term traffic from freeway speed data in Houston; their results show that local linear methods performed better than kernel smoothing methods. Apronti et al. [14] conducted an empirical study based on traffic count data collected from sampled roads across Wyoming and developed a linear regression model and a logistic regression model to predict traffic volumes. The results indicated that both model types are useful for accurate and cost-effective estimation of traffic volumes on Wyoming roads. Deshpande and Bajaj [15] implemented a traffic flow prediction model using SVM based on traffic data obtained near the Perungudi toll plaza in the IT corridor of Chennai, India, and used rough set theory to validate the predictions, which proved satisfactory. Chen and Wang [16] combined support vector regression with a genetic algorithm (GA) to predict tourism demand, forecasting both the inbound tourism flow in China and the tourist flow in Shanxi. Hong et al. [17] combined GA and support vector regression to predict the inbound tourism flow in Barbados and achieved high-quality predictions. Pflügler et al. [18] used a neural network to predict parking space availability in urban areas of Munich based on the following factors: day of the week, time, location, temperature, events, traffic, vacation time, and rainfall.
Their study shows that publicly available information can be a good starting point for prediction, but historical parking data are still needed. Chen [19] predicted parking occupancy in San Francisco using a neural network, ARIMA, linear regression, and support vector regression, and found that the neural network provided the best predictions among these models; however, it required more than 90 min of training. Haviluddin and Rayner [20] developed a BPNN model to predict daily network traffic; the model achieved a good mean squared error (MSE), and a network with two hidden layers was found suitable for predicting traffic volume. Purnawansyah and Haviluddin [21] predicted daily network traffic at Universitas Mulawarman, East Kalimantan, Indonesia, using BP and radial basis function (RBF) neural network models and achieved excellent results. Wang [22] developed a BPNN model to forecast traffic flow in Guangzhou, China, from August to December 2014, analysing the network structure and BPNN parameters to predict bus traffic. Dou et al. [23] used ARIMA to predict traffic flow over several time periods and provided numerical examples with field data to verify the accuracy of their model. Gustavsson and Nordstrom [24] used ARIMA to forecast passenger flows for different types of Swedish inbound tourism and achieved high-quality predictions.

To the best of the authors’ knowledge, there has been no systematic evaluation of how different prediction methods perform for parking occupancy when parking type and parking scale are considered. To address this gap, this paper presents a comparative study of the prediction performance of several methods, namely linear regression, SVM, neural network, and ARIMA, for selected car parks of various types and scales. For the case studies, real-world data were collected from Shenzhen, Shanghai, and Dongguan.

The rest of the paper is organised as follows. Section 2 describes the methodology, including the research framework, data collection and processing, forecasting methods and models, model parameter selection, and evaluation indexes. Section 3 presents empirical experiments that predict hourly parking occupancy with real-world data, followed by a comparative analysis of the performance of four models under two forecasting methods, considering parking type and parking scale. Finally, Section 4 concludes the article and discusses future studies.

2. Methodology

2.1. Research Approach

The research framework of this study is depicted in Figure 1. It comprises three stages: data collection, method development, and result evaluation. In the data collection stage, real-world data from four parking lots of different scales and types were collected, and the hourly parking occupancy of each lot was derived from the entry and exit times of cars. In the method development stage, two forecasting methods were proposed. Four prediction models were developed in Python, and each was applied with forecasting method 1 (FM1) and, except for ARIMA, forecasting method 2 (FM2). Cross-validation was used to tune the model parameters. In the evaluation stage, root MSE (RMSE) and mean absolute error (MAE) were calculated to compare the performance of the two forecasting methods and four prediction models, considering parking type and parking scale.

2.2. Data Description

Four parking lots of different scales and types, located in Shenzhen, Dongguan, and Shanghai, were selected, and their entry and exit records for 7 consecutive weeks were obtained. Information on the four car parks is listed in Table 1.

Parking data for the four parking lots were collected over 55 days, from 2 June 2018 to 27 July 2018, comprising 606,959 records of cars’ entry and exit times. For hourly prediction, we counted the number of cars entering and exiting in each hour. The records of the 6 weeks from 9 June 2018 to 20 July 2018 were used as training data, and the data of the last week, from 21 July 2018 to 27 July 2018, were used as test data.
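The chronological split described above amounts to a date filter. The following is a minimal sketch, assuming each sample carries a `date` stamp; the helper names `daterange` and `split_by_date` and the (date, value) sample layout are hypothetical and not taken from the study.

```python
from datetime import date, timedelta

# Split boundaries from the text (both ends inclusive).
TRAIN_START, TRAIN_END = date(2018, 6, 9), date(2018, 7, 20)
TEST_START, TEST_END = date(2018, 7, 21), date(2018, 7, 27)


def daterange(first, last):
    """All calendar dates from first to last, inclusive."""
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


def split_by_date(samples):
    """Split (date, value) pairs chronologically into training and test lists.

    Samples outside both windows are placed in neither set.
    """
    train = [s for s in samples if TRAIN_START <= s[0] <= TRAIN_END]
    test = [s for s in samples if TEST_START <= s[0] <= TEST_END]
    return train, test


if __name__ == "__main__":
    samples = [(d, 0) for d in daterange(date(2018, 6, 2), date(2018, 7, 27))]
    train, test = split_by_date(samples)
    print(len(train), len(test))  # 42 7
```

Splitting by date rather than at random keeps every test hour later than every training hour, so the evaluation mimics forecasting the following week.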

The parking curves of the four parking lots in the first 5 weeks are shown in Figure 2. The four parking lots fall into three types. For the commercial parking lot (PL1), the number of parked cars is much higher on weekends than on weekdays. The opposite holds for the office parking lots (PL2 and PL4). For the mixed functional parking lot (PL3), the number of vehicles did not vary considerably between weekdays and weekends: although it was originally a commercial parking lot, it is surrounded by corporate buildings, so its parking pattern is interlaced and complex. By parking scale, the lots also fall into three categories: PL1 is a large-scale park, PL4 is a medium-scale park, and PL2 and PL3 are small-scale parks.

From the entry and exit records of the four parking lots, we calculated the number of vehicles parked in each lot per hour. To account for vehicles already parked when recording began, an initial number of vehicles was set for each parking lot on the basis of its scale, as obtained from previous studies. The initial values for PL1, PL2, PL3, and PL4 were set to 64, 16, 14, and 0, respectively, which brings the bottom of each parking curve close to zero.
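A minimal sketch of this occupancy calculation follows, assuming each record is an (entry time, exit time) pair and that occupancy is read at the end of each hour; the function name `hourly_occupancy` and the record layout are hypothetical. The running balance starts from the lot's initial value (such as 64 for PL1) and adds each hour's entries minus exits.

```python
from collections import Counter
from datetime import datetime, timedelta

ONE_HOUR = timedelta(hours=1)


def hourly_occupancy(records, start, n_hours, initial):
    """Number of vehicles in a lot at the end of each hour.

    records: iterable of (in_time, out_time) datetimes; out_time is None for
             vehicles still parked when the data end.
    start:   datetime marking the beginning of hour 0.
    initial: vehicles assumed to be present at `start`.
    """
    entries, exits = Counter(), Counter()
    for t_in, t_out in records:
        entries[(t_in - start) // ONE_HOUR] += 1      # hour index of the entry
        if t_out is not None:
            exits[(t_out - start) // ONE_HOUR] += 1   # hour index of the exit

    occupancy, current = [], initial
    for h in range(n_hours):
        current += entries[h] - exits[h]  # running balance: entries minus exits
        occupancy.append(current)
    return occupancy


if __name__ == "__main__":
    t0 = datetime(2018, 7, 21)
    recs = [
        (t0 + timedelta(minutes=70), t0 + timedelta(hours=3)),
        (t0 + timedelta(minutes=30), None),
        (t0 + timedelta(minutes=125), t0 + timedelta(minutes=170)),
    ]
    print(hourly_occupancy(recs, t0, n_hours=4, initial=2))  # [3, 4, 4, 3]
```

Because the series is a cumulative sum, an error in the initial value shifts the whole curve up or down without changing its shape, which is why the initial values above can be chosen to bring the curve's minimum close to zero.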

2.3. Forecasting Methods

This paper proposes two forecasting methods. FM1 treats weekdays and weekends as a single set. To capture the weekly pattern, it uses the data of the previous day and of the same day in the previous week as the independent variables for predicting parking occupancy.
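With hourly data, FM1 can be read as two lag features. The sketch below assumes that "yesterday" and "the same day last week" refer to the same hour of day, i.e. lags of 24 h and 168 h; that reading and the function name `fm1_dataset` are our assumptions. The two resulting columns match the 2-unit input layer of the BPNN described later.

```python
import numpy as np

HOURS_PER_DAY = 24
HOURS_PER_WEEK = 7 * HOURS_PER_DAY


def fm1_dataset(occupancy):
    """Build FM1 samples from an hourly occupancy series.

    Features for hour t: occupancy at t - 24 (same hour yesterday) and
    t - 168 (same hour, same weekday last week). Target: occupancy at t.
    """
    occ = np.asarray(occupancy, dtype=float)
    t = np.arange(HOURS_PER_WEEK, len(occ))  # first hour with both lags available
    X = np.column_stack([occ[t - HOURS_PER_DAY], occ[t - HOURS_PER_WEEK]])
    y = occ[t]
    return X, y


if __name__ == "__main__":
    occ = np.arange(2 * HOURS_PER_WEEK)  # toy series: value equals hour index
    X, y = fm1_dataset(occ)
    print(X.shape, X[0].tolist(), y[0])
```

The first week of data yields no samples, because the 168-hour lag is not yet available for it.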

FM2 treats weekdays and weekends as two separate sets and forecasts parking occupancy for each separately. As shown in Figure 2, the weekend parking rate curves are similar to one another, the weekday curves follow a common trend, and the curves of different weeks overlap substantially. Therefore, weekdays and weekends are predicted separately: for a weekday, the data of the previous five weekdays are used as independent variables, and for a weekend day, the data of the previous two weekends are used.
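The FM2 construction can be sketched on a day-by-hour matrix. This sketch makes several assumptions. Lags are taken at the same hour of day. "The previous two weekends" is read as the two most recent weekend days; if the intended meaning is the same weekend day in the two previous weeks, only the lag selection changes. The name `fm2_datasets` is hypothetical.

```python
import numpy as np


def fm2_datasets(daily, is_weekend, n_weekday_lags=5, n_weekend_lags=2):
    """Build separate weekday and weekend training sets (FM2 sketch).

    daily:      array of shape (n_days, 24), hourly occupancy, rows in date order.
    is_weekend: boolean array of shape (n_days,).
    Returns {"weekday": (X, y), "weekend": (X, y)}. Each row of X holds the
    occupancy at one hour on the k most recent earlier days of the same set,
    most recent first; y is the occupancy at that hour on the target day.
    Assumes each set contains more than k days.
    """
    daily = np.asarray(daily, dtype=float)
    is_weekend = np.asarray(is_weekend, dtype=bool)
    out = {}
    for name, mask, k in (("weekday", ~is_weekend, n_weekday_lags),
                          ("weekend", is_weekend, n_weekend_lags)):
        days = daily[mask]                 # only days of this set, in order
        X_rows, y_rows = [], []
        for d in range(k, len(days)):
            lags = days[d - k:d][::-1]     # shape (k, 24), most recent day first
            X_rows.append(lags.T)          # shape (24, k): one sample per hour
            y_rows.append(days[d])         # shape (24,)
        out[name] = (np.vstack(X_rows), np.concatenate(y_rows))
    return out


if __name__ == "__main__":
    n_days = 28                                                  # four weeks, starting on a Monday
    daily = np.repeat(np.arange(n_days)[:, None], 24, axis=1)    # toy: value = day index
    weekend = np.arange(n_days) % 7 >= 5
    sets = fm2_datasets(daily, weekend)
    print(sets["weekday"][0].shape, sets["weekend"][0].shape)    # (360, 5) (144, 2)
```

Because each set is filtered before lagging, a Monday's features come from the preceding Friday back to the previous Monday and never include weekend days.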

2.4. Prediction Models and Parameter Setting

The ultimate goal of our research is to predict parking occupancy accurately in advance, so goodness of fit is an important criterion for model selection. Based on the above review and related work, linear regression [25], ARIMA [26], SVM [27], and BPNN [28] offer good prediction performance and are feasible to implement; they were therefore chosen as the prediction models in this research.

2.4.1. Linear Regression

Given a point set D, regression constructs a function that fits the points such that the error between the points and the function is minimised. When the function is linear in its parameters (a straight line in the single-variable case), the method is called linear regression. The general form of the multiple linear regression model is

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_K X_K + \mu, \qquad (1)$$

where $Y$ is the explained variable, $X_1, \dots, X_K$ are the $K$ explanatory variables, $\beta_0, \dots, \beta_K$ are the $K + 1$ unknown parameters, and $\mu$ is the random error term.

The advantages of linear regression are (1) simple computation, (2) no hyperparameters to tune, and (3) ease of use. Its drawback is that its predictions are less accurate than those of more complex models, and it is not well suited to data with nonlinear features.
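Fitting the multiple linear regression model by ordinary least squares takes only a few lines. This sketch assumes the inputs are lag features such as those of FM1 or FM2. It uses NumPy's `lstsq`, which is our choice and not necessarily what the study used; the helper names are hypothetical.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Ordinary least squares estimates of the intercept beta_0 and slopes beta_1..beta_K."""
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])   # column of ones carries the intercept
    beta, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return beta


def predict_linear_regression(beta, X):
    """Fitted value beta_0 + sum_k beta_k * X_k for each row of X."""
    return beta[0] + np.asarray(X, dtype=float) @ beta[1:]


if __name__ == "__main__":
    # Toy FM1-style inputs: (previous day, same day last week) at one hour.
    X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 5.0], [4.0, 1.0]])
    y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]     # noiseless, so OLS recovers the coefficients
    beta = fit_linear_regression(X, y)
    print(np.round(beta, 6))
```

The absence of tuning parameters mentioned above is visible here: once the features are fixed, the fit is a single deterministic solve.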

2.4.2. ARIMA

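The ARIMA model handles a nonstationary time series by transforming it, through differencing, into a stationary one. Depending on whether the original series is stationary and which terms the model includes, the family comprises moving average (MA), autoregressive (AR), autoregressive moving average (ARMA), and ARIMA models. An ARIMA(p, d, q) model is given by

$$\Big(1 - \sum_{i=1}^{p} \phi_i L^i\Big)(1 - L)^d\, y_t = \Big(1 + \sum_{j=1}^{q} \theta_j L^j\Big)\varepsilon_t, \qquad (2)$$

where $y_t$ is the series at time $t$, $L$ is the lag operator ($L y_t = y_{t-1}$), $d$ is the order of differencing, $\phi_1, \dots, \phi_p$ are the AR parameters, $\theta_1, \dots, \theta_q$ are the MA parameters, and $\varepsilon_t$ is a white-noise error term.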

Second-order differencing (d = 2) was first applied to make the time series stationary. The autocorrelation function (ACF) and partial autocorrelation function (PACF) were then used to determine the optimal numbers of MA (q) and AR (p) terms.
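The identification step can be illustrated with a short NumPy sketch, assuming the hourly occupancy is held in a 1-D array. It applies second-order differencing, computes the sample ACF and a Yule-Walker estimate of the PACF, and flags lags outside the common approximate ±1.96/√n band. In the usual Box–Jenkins reading, a PACF that cuts off after lag p suggests p AR terms, and an ACF that cuts off after lag q suggests q MA terms. Libraries such as statsmodels offer equivalent `acf` and `pacf` functions and an `ARIMA` class for fitting the chosen order; the helper names here are hypothetical.

```python
import numpy as np


def difference(x, d=2):
    """Apply d-th order differencing, i.e. (1 - L)^d x."""
    return np.diff(np.asarray(x, dtype=float), n=d)


def acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag (r_0 = 1)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    return np.array([np.dot(xc[: len(x) - k], xc[k:]) / denom
                     for k in range(max_lag + 1)])


def pacf(x, max_lag):
    """Sample partial autocorrelations via the Yule-Walker equations.

    The lag-k value is the last coefficient of an AR(k) model fitted to the
    sample autocorrelations.
    """
    r = acf(x, max_lag)
    values = [1.0]
    for k in range(1, max_lag + 1):
        R = np.array([[r[abs(i - j)] for j in range(k)] for i in range(k)])
        phi = np.linalg.solve(R, r[1:k + 1])
        values.append(phi[-1])
    return np.array(values)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    series = np.cumsum(np.cumsum(rng.normal(size=300)))  # toy twice-integrated noise
    z = difference(series, d=2)                          # back to stationary noise
    band = 1.96 / np.sqrt(len(z))                        # approximate 95% band
    print("ACF lags beyond band:", np.nonzero(np.abs(acf(z, 10)[1:]) > band)[0] + 1)
    print("PACF lags beyond band:", np.nonzero(np.abs(pacf(z, 10)[1:]) > band)[0] + 1)
```

On real occupancy data, strong daily seasonality typically shows up as large ACF values near lag 24, which is worth checking before settling on p and q.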

2.4.3. SVM

The main idea of SVM is to construct a hyperplane as a decision surface that maximises the margin between positive and negative examples. SVM is grounded in statistical learning theory, and it can effectively handle practical problems involving small samples, nonlinearity, and high-dimensional pattern recognition while avoiding local minima.

This study adopts support vector regression (SVR), a variant of SVM that is widely used in time series prediction because of its strong ability to handle nonlinear problems. SVR is based on the structural risk minimisation principle, and training it amounts to solving a convex quadratic programming problem with linear constraints.
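For reference, the ε-SVR problem can be written in the standard primal form below, following the LIBSVM documentation rather than the original text. Here $C$ is the penalty parameter, $\varepsilon$ is the width of the insensitive tube, $\xi_i$ and $\xi_i^*$ are slack variables, $l$ is the number of training samples, and $\Phi$ maps inputs into the feature space, so that a kernel satisfies $K(x_i, x_j) = \Phi(x_i)^\top \Phi(x_j)$:

$$
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad & \tfrac{1}{2}\, w^\top w + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \\
\text{s.t.}\quad & w^\top \Phi(x_i) + b - y_i \le \varepsilon + \xi_i, \\
& y_i - w^\top \Phi(x_i) - b \le \varepsilon + \xi_i^*, \\
& \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \dots, l.
\end{aligned}
$$

Deviations smaller than $\varepsilon$ incur no loss, and $C$ trades off the flatness of the function against deviations larger than $\varepsilon$. The objective is a convex quadratic with linear constraints, which is the convexity referred to above.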

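Kernel functions are the means by which SVM handles nonlinear problems. If a function satisfies Mercer’s condition, it can be used to compute inner products in a high-dimensional feature space directly, bypassing the explicit and costly mapping into that space. The linear, polynomial, radial basis function (RBF), and sigmoid kernels are commonly used with SVMs; Equations (3)–(6) give these four kernels:

$$K(x_i, x_j) = x_i^\top x_j, \qquad (3)$$

$$K(x_i, x_j) = (\gamma\, x_i^\top x_j + r)^{d}, \quad \gamma > 0, \qquad (4)$$

$$K(x_i, x_j) = \exp\!\big(-\gamma \lVert x_i - x_j \rVert^2\big), \quad \gamma > 0, \qquad (5)$$

$$K(x_i, x_j) = \tanh(\gamma\, x_i^\top x_j + r), \qquad (6)$$

where $\gamma$ and $r$ are kernel parameters and $d$ is the polynomial degree.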
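The four kernels named above can be written directly in NumPy. This is an illustrative sketch in LIBSVM's parameterisation; the default values of `gamma`, `r`, and `degree` are hypothetical, not the study's settings. The `gram_matrix` helper shows how a kernel turns a sample set into the matrix of pairwise feature-space inner products that the SVR actually works with.

```python
import numpy as np


def linear_kernel(xi, xj):
    return float(np.dot(xi, xj))


def polynomial_kernel(xi, xj, gamma=1.0, r=0.0, degree=3):
    return float((gamma * np.dot(xi, xj) + r) ** degree)


def rbf_kernel(xi, xj, gamma=1.0):
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def sigmoid_kernel(xi, xj, gamma=1.0, r=0.0):
    return float(np.tanh(gamma * np.dot(xi, xj) + r))


def gram_matrix(X, kernel, **params):
    """Matrix of kernel values K(x_i, x_j) for all pairs of rows of X."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    return np.array([[kernel(X[i], X[j], **params) for j in range(n)]
                     for i in range(n)])


if __name__ == "__main__":
    X = [[0.2, 0.4], [0.3, 0.1], [0.9, 0.8]]
    print(gram_matrix(X, rbf_kernel, gamma=0.5))
```

With the RBF kernel, every diagonal entry is 1 and off-diagonal entries shrink as points move apart, at a rate set by `gamma`. This is why `gamma` must be tuned together with `C`.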

The SVM was implemented in MATLAB using LIBSVM [29], a library for support vector machines, with the RBF kernel and the epsilon-SVR (ε-SVR) formulation. The penalty parameter C and the kernel parameter gamma were selected by grid search with cross-validation: different (C, gamma) pairs were tried, and the pair with the lowest cross-validation RMSE was chosen. We tried exponentially growing sequences of C and gamma, first using a coarse grid to identify a promising region and then conducting a finer grid search within that region. Once the best parameter values were found, the final SVM model was trained on the whole training dataset.
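The coarse-to-fine search over exponentially growing (C, gamma) sequences can be sketched independently of any SVM library. In the sketch below, the scoring function is a pluggable stand-in. In the study it is the cross-validated RMSE of an RBF ε-SVR, which in Python could be computed with scikit-learn's `SVR` (itself built on LIBSVM). The coarse ranges C = 2^-5 … 2^15 and gamma = 2^-15 … 2^3, as well as the fine step and radius, are assumptions borrowed from common LIBSVM practice, not values reported here.

```python
import math
from itertools import product


def exponent_grid(start, stop, step):
    """Exponents start, start + step, ..., stop (inclusive)."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]


def grid_search(score, c_exps, g_exps):
    """Evaluate score(C, gamma) at C = 2**c, gamma = 2**g; return the best (c, g, score)."""
    best = None
    for c, g in product(c_exps, g_exps):
        s = score(2.0 ** c, 2.0 ** g)
        if best is None or s < best[2]:
            best = (c, g, s)
    return best


def coarse_to_fine(score, coarse_step=2.0, fine_step=0.25, radius=2.0):
    """Coarse grid over the whole range, then a finer grid around the coarse optimum."""
    # Assumed coarse ranges: C = 2^-5 ... 2^15, gamma = 2^-15 ... 2^3.
    c, g, _ = grid_search(score, exponent_grid(-5, 15, coarse_step),
                          exponent_grid(-15, 3, coarse_step))
    c, g, s = grid_search(score,
                          exponent_grid(c - radius, c + radius, fine_step),
                          exponent_grid(g - radius, g + radius, fine_step))
    return 2.0 ** c, 2.0 ** g, s


if __name__ == "__main__":
    # Hypothetical stand-in for a cross-validated RMSE, lowest at C = 2^4.25, gamma = 2^-6.5.
    def toy_cv_rmse(C, gamma):
        return 1.0 + (math.log2(C) - 4.25) ** 2 + (math.log2(gamma) + 6.5) ** 2

    C, gamma, err = coarse_to_fine(toy_cv_rmse)
    print(math.log2(C), math.log2(gamma), err)
```

The coarse pass costs 11 × 10 evaluations and the fine pass 17 × 17. A single fine grid over the full range would need several thousand evaluations, each of which is a full cross-validation run.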

2.4.4. BPNN

BPNN is a multilayer feedforward network trained by the error backpropagation (BP) algorithm and is one of the most widely used neural network models. A BP network can learn and store a large number of input–output mappings without the mathematical form of the mapping being specified in advance. Its learning rule continuously adjusts the weights and thresholds (biases) of the network by gradient descent, with the gradients computed through backpropagation. The goal is to minimise the network's sum of squared errors.

For BPNN, we built a 3-layer neural network with 2 input units and 1 output unit. The other network parameters, such as the number of training cycles, the number of neurons in the hidden layer, the learning rate, and the momentum term, were chosen through trial-and-error simulations. The model was trained with different combinations of these parameters, and the combination giving the lowest RMSE on the test data was selected. This was done by keeping selected parameters constant while gradually varying the others over a wide range of values, as suggested in previous work [30].
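The training loop of such a 3-layer network with 2 inputs and 1 output can be sketched in NumPy as follows. The sketch makes several assumptions that are not reported above: a sigmoid hidden layer, a linear output unit, full-batch gradient descent with momentum on half the mean squared error, and the hypothetical settings `n_hidden=8`, `lr=0.1`, `momentum=0.9`, and `epochs=2000`. These are the kinds of parameters the trial-and-error procedure would vary.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_bpnn(X, y, n_hidden=8, lr=0.1, momentum=0.9, epochs=2000, seed=0):
    """Train a one-hidden-layer network by backpropagation with momentum."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, 1))
    b2 = np.zeros(1)
    params = [W1, b1, W2, b2]
    velocity = [np.zeros_like(p) for p in params]
    y = y.reshape(-1, 1)
    n = X.shape[0]
    losses = []
    for _ in range(epochs):
        # Forward pass.
        H = sigmoid(X @ W1 + b1)
        err = H @ W2 + b2 - y
        losses.append(0.5 * float(np.mean(err ** 2)))
        # Backward pass: chain rule from the output error to each parameter.
        d_out = err / n
        d_hidden = (d_out @ W2.T) * H * (1.0 - H)
        grads = [X.T @ d_hidden, d_hidden.sum(axis=0), H.T @ d_out, d_out.sum(axis=0)]
        # Gradient descent with a momentum term, updating parameters in place.
        for p, v, g in zip(params, velocity, grads):
            v *= momentum
            v -= lr * g
            p += v
    return params, losses


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(0.0, 1.0, size=(200, 2))   # two lagged inputs, scaled to [0, 1]
    y = 0.6 * X[:, 0] + 0.3 * X[:, 1]          # toy target
    _, losses = train_bpnn(X, y)
    print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Scaling inputs to [0, 1] keeps the sigmoid units away from saturation, where their gradients vanish. Occupancy counts would typically be normalised before training and rescaled afterwards.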

2.5. Evaluation Indexes

The performance of the models is evaluated using RMSE and MAE to select the best prediction model: the lower the error, the better the model. Equations (7) and (8) present the calculations of RMSE and MAE.

MSE is the expected value of the squared difference between an estimate and the true value. RMSE is the square root of MSE and measures the deviation between predicted and observed values; the smaller the RMSE, the more accurate the prediction model:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (f_i - y_i)^2}. \qquad (7)$$

MAE is the mean of the absolute errors and directly reflects the typical magnitude of the prediction error:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert f_i - y_i \rvert, \qquad (8)$$

where $f_i$ is the predicted value, $y_i$ is the observed value, and $n$ is the number of predictions.

RMSE and MAE are absolute indicators expressed in the units of the data (here, vehicles); the smaller their values, the better the prediction. When comparing the prediction performance of different models on the same target, MAE and RMSE are therefore the most relevant measures.
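Both indexes are short to compute. The helper names below are illustrative. The worked example shows how RMSE, which squares each error, weights the single 2-vehicle miss more heavily than MAE does.

```python
import math


def rmse(pred, obs):
    """Root mean squared error, Equation (7)."""
    if len(pred) != len(obs) or len(pred) == 0:
        raise ValueError("pred and obs must be non-empty and of equal length")
    return math.sqrt(sum((f - y) ** 2 for f, y in zip(pred, obs)) / len(pred))


def mae(pred, obs):
    """Mean absolute error, Equation (8)."""
    if len(pred) != len(obs) or len(pred) == 0:
        raise ValueError("pred and obs must be non-empty and of equal length")
    return sum(abs(f - y) for f, y in zip(pred, obs)) / len(pred)


if __name__ == "__main__":
    predicted = [2, 4, 6]   # hypothetical hourly predictions (vehicles)
    observed = [1, 4, 8]    # hypothetical observed occupancy
    print(round(rmse(predicted, observed), 3), mae(predicted, observed))  # 1.291 1.0
```

Here the errors are 1, 0, and 2 vehicles. MAE averages them to 1.0, while RMSE is the square root of 5/3, about 1.291. The gap between the two grows when a model makes occasional large errors.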

3. Results and Discussion

Linear regression, SVM, BPNN, and ARIMA were used to develop models to predict the parking occupancy of the aforementioned four parking lots. Machine learning models (linear regression, SVM, and BPNN) were applied to both forecasting methods. ARIMA needs continuous historical data for prediction. Therefore, only FM1 was applied to ARIMA.

3.1. Forecasting Results of FM1

Figure 3 shows the forecasting results of the four parking lots, with all the models using FM1. Among the four predicting models with FM1, SVM performed the best for all the parking lots.

Table 2 shows the forecasting results of FM1. The comparison of the performance of FM1 with all the models for different scales and types of the parking lots is shown in Figure 4. SVM performed the best among all the models.

In summary, considering the type of parking lots, for commercial parking lots, i.e., PL1, and office parking lots, PL2 and PL4, SVM made the best prediction. ARIMA performed the worst among all the models. For mixed functional parking lots, i.e., PL3, SVM made the best prediction, and BPNN performed the worst.

Considering the scale of parking lots, for large-scale parking lots, i.e., PL1, SVM made the best prediction, and ARIMA performed the worst among all the models. For medium-scale parking lots, i.e., PL4, SVM made the best prediction, and BPNN performed the worst among all the models. For small-scale parking lots, i.e., PL2 and PL3, SVM performed the best.

3.2. Forecasting Results of FM2

Figure 5 shows the forecasting results of the four parking lots for all models except ARIMA using FM2; as a time series model, ARIMA is not suitable for FM2. Among the models applied with FM2, SVM performed the best for all parking lots except PL1.

Table 3 shows the forecasting results of FM2, and Figure 6 compares the performance of FM2 with all the models for different scales and types of parking lots. SVM performed the best for all parking lots except the large commercial lot (PL1).

In summary, considering the parking lot type, for commercial parking lots, i.e., PL1, linear regression made the best prediction. For office parking lots (PL2 and PL4) and mixed functional parking lots (PL3), SVM made the best prediction, and BPNN performed the worst.

Considering the parking lot scale, for large-scale parking lots, i.e., PL1, linear regression made the best prediction. For medium-scale parking lots (PL4) and small-scale parking lots (PL2 and PL3), SVM made the best prediction, and BPNN performed the worst.

3.3. Comparison between FM1 and FM2

The performance of FM1 and FM2 with SVM for different scales and types of parking lots was compared and is shown in Figure 7.

Although SVM with FM2 performed worse than LR with FM2 for PL1, SVM with FM1 performed the best of all model and method combinations for PL1. We can therefore conclude that, with the better of the two forecasting methods, SVM outperforms the other algorithms in parking occupancy prediction for all the parking lots.

In summary, considering parking lot type, FM1 with SVM performed better for the commercial (PL1) and mixed functional (PL3) parking lots, whereas FM2 with SVM made the best prediction for the office parking lots (PL2 and PL4). Considering parking lot scale, FM1 made the best prediction for the large-scale parking lot (PL1) and FM2 for the medium-scale parking lot (PL4). For small-scale parking lots, no clear conclusion could be drawn, as FM2 with SVM made the best prediction for PL2 whereas FM1 with SVM made the best prediction for PL3.

3.4. Discussion

Overall, SVM appears very promising for parking occupancy prediction at most of the parking lots under study. The experimental results demonstrate the potential of the SVM model, which can be explained by three advantages of SVM over the conventional statistical prediction models and the BPNN-based model. First, SVM is a more complex model with more parameters than ARIMA and LR, and its better fit has been attributed mainly to this greater number of parameters [31]. Second, by adopting the structural risk minimisation (SRM) principle, the SVM model can mitigate typical drawbacks of conventional models [32], such as overfitting, and thus generalise more stably and robustly. Third, training an SVM is a convex optimisation problem, so any local optimum is also the global optimum, which makes the global optimum easy to find. These advantages may explain why the SVM model clearly performed better than the other models in the case studies.

4. Conclusions

In this study, a systematic comparison of parking occupancy prediction was carried out for parking lots of different types and scales. Two forecasting methods and four models (linear regression, SVM, BPNN, and ARIMA) were used to predict parking occupancy. Numerical experiments were conducted to predict the hourly parking occupancy of four parking lots of different scales and types, located in Shenzhen, Dongguan, and Shanghai, using real-world data. The results show the following. (i) Overall, among the four models considered in this study, SVM outperforms the others in parking occupancy prediction. When the type of parking lot is uncertain or the forecasting method has not yet been decided, SVM is recommended for parking occupancy prediction. (ii) Considering parking lot type, FM1 with SVM made the best prediction for commercial and mixed functional parking lots, and FM2 with SVM made the best prediction for office parking lots. (iii) Considering parking lot scale, FM1 with SVM made the best prediction for large-scale parking lots, and FM2 with SVM made the best prediction for medium-scale parking lots.

Data Availability

Data are available from the authors on request, for research purposes only.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported in part by the Basic Research Program of Shenzhen Science and Technology Innovation Committee (JCYJ20180307123910003), Innovation and Entrepreneurship Team Project for Overseas High-Level Talents of Shenzhen (KQTD20170810150821146), and National Natural Science Foundation of China (61673233).