Abstract
With the advent of big data, statistical accounting based on artificial intelligence can realistically reflect the dynamics of the labor force and of market segments. Therefore, by combining machine learning algorithms with traditional statistical data, a prediction model of labor force unemployment is built that combines a time series model with a neural network model. Based on the theoretical parameters, the algorithm of the two-weight neural network is proposed, and labor force unemployment is predicted by a weighted combination of the two models. The results show that the fitting effect of the combined model is superior to that of either single model and of the traditional BP neural network model; the prediction results with the number of unemployed and the unemployment rate as evaluation indexes are also excellent. The model can offer new ideas for helping to address labor force unemployment in China.
1. Introduction
Unemployment is not only a comprehensive economic problem but also a complex social problem [1], and it has become a focus of attention for all nations. Whether unemployment can be properly solved bears on the development of a country. In recent years, China’s economy has faced a complicated international economic environment, including RMB appreciation [2], rising prices of raw materials [3], and growing pressure from imported inflation [4]. These factors have led to a decline in exports and economic growth and have thus brought about unemployment that deserves attention. The government also attaches great importance to this issue, because the development of a country depends not only on its economy, science, technology, military affairs, and the gap between the rich and the poor but also on how well unemployment is handled. If a country with serious unemployment lacks effective policies and measures to ensure the livelihood of the unemployed labor force, social instability can easily follow and affect all aspects of the country’s development. Therefore, if the government can predict the number and proportion of the unemployed labor force in advance, it can use these predictions to formulate policies and measures that ensure the prosperity and development of the country [5].
At present, with the advent of big data, some researchers try to use circumstantial big data to build indicators for measuring the dynamics of China’s labor market [6–8]. For example, Internet recruitment data [9], social media data [10], and digital search data [11] have been used to build economic indicators based on nonstatistical accounting methods, which can reflect the dynamics of the whole labor force and of market segments in real time. However, these indicators lack a clear definition in the accounting sense, and their scientific validity and reliability are questioned. As for the prediction method, the urban unemployment and unemployment rate series in China are difference-stationary time series, and the neural network algorithm has been proved in theory to have an excellent ability to approximate nonlinear relationships [12, 13]. It is therefore of practical significance to study a prediction model of labor force unemployment that can help address actual unemployment.
Before 2018, the most important unemployment index released by the Chinese government was the registered urban unemployment rate, which differed considerably from the international definition of the unemployment rate. Its value remained stable at around 4% for a long time, so it could hardly reflect the real unemployment level in China [14]. Since 2018, the Chinese National Bureau of Statistics has released the national urban surveyed unemployment rate on a monthly basis, which is significant progress in the release of unemployment statistics. This release fills a long-standing gap among China’s four macroeconomic indicators, and the data have been generally recognized by all sectors of society. However, because the sample size of the labor force survey is limited, the urban surveyed unemployment rate is not representative of administrative regions below the provincial level [13]. In 2021, the Chinese National Labor Force Survey is to conduct a new round of sampling according to the seventh national census and appropriately expand the sample size, so that the main indexes, such as the urban unemployment rate, are well representative of the country and of the provinces (autonomous regions and municipalities).
Therefore, in this paper, the above-mentioned combined model is established, and the unemployment situation of the urban labor force is predicted with the number of unemployed and the unemployment rate as evaluation indexes, in order to explore new ideas that help manage urban unemployment in China.
2. Related Theories
2.1. Neural Network Model
2.1.1. BP Neural Network
First proposed by the Parallel Distributed Processing research group represented by Rumelhart and McClelland in the United States, the BP neural network is a multilayer perceptron feedforward network [15]. It has strong nonlinear adaptability, excellent nonlinear approximation, and good fault tolerance. It can therefore process large-scale data in parallel, with outstanding self-organizing, self-learning, and self-adapting abilities, and it is widely adopted in prediction, classification, and clustering. The back propagation algorithm gives the multilayer perceptron the ability to approximate any complex nonlinear function [16]. The BP neural network is a feedforward neural network trained with the BP learning algorithm, and Figure 1 shows its topology.

As Figure 1 shows, the data of the input layer flow to the first hidden layer after weighting, then to the second hidden layer after further weighting, and so on until the output of the last hidden layer flows to the output layer. The transformation between the input and the output of the neurons in each layer is called the activation function. The universal approximation theorem has been proved [17, 18]: a three-layer BP neural network with one hidden layer can approximate any continuous function on a bounded region to arbitrary accuracy, provided that it has enough hidden nodes. The function corresponding to the jth neuron in a given layer is

$$y_j = F\left(W_j^{\mathsf{T}} X - \theta_j\right), \tag{1}$$

where $W_j$ is the weight vector of the neuron in the BP neural network, $X$ is the input vector, $\theta_j$ is the threshold, and $F$ is the activation function.
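The layer-by-layer data flow described above can be illustrated with a short sketch. It is a minimal example, not the authors' implementation: it assumes the threshold is subtracted from the weighted sum and uses a sigmoid as the activation function, neither of which is fixed by the text, and the weights are arbitrary illustrative numbers.

```python
import math


def sigmoid(z):
    """Logistic activation (an assumed choice; the paper does not name one)."""
    return 1.0 / (1.0 + math.exp(-z))


def bp_neuron(weights, inputs, threshold, activation=sigmoid):
    """Output of one neuron: activation(weighted sum of inputs - threshold)."""
    z = sum(w * x for w, x in zip(weights, inputs)) - threshold
    return activation(z)


def layer_forward(weight_rows, inputs, thresholds):
    """Forward pass through one layer; each row of weights is one neuron."""
    return [bp_neuron(w, inputs, t) for w, t in zip(weight_rows, thresholds)]


# Input layer (2 values) -> hidden layer (2 neurons) -> output layer (1 neuron).
hidden = layer_forward([[0.5, -0.2], [0.1, 0.4]], [1.0, 2.0], [0.0, 0.1])
output = layer_forward([[1.0, -1.0]], hidden, [0.0])
```

Each layer only consumes the outputs of the previous layer, which is what makes the network feedforward.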
2.1.2. RBF Neural Network
The radial basis function (RBF) neural network is also a feedforward neural network. It is widely used in function approximation and classification because of its simple structure, simple training, fast learning convergence, and ability to approximate any nonlinear function [19]. Although the RBF network has only one hidden layer, it has the same excellent nonlinear approximation ability as the BP neural network, and Figure 2 shows its topology.

The basis function of the hidden nodes in the RBF network is a distance function, and the activation function is the Gaussian function. Because the RBF is radially symmetric about a center point in n-dimensional space, the farther the input of a neuron lies from that center point, the less the neuron is activated. This characteristic is often called the “local property” [20].
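The local property can be seen in a small sketch of a Gaussian hidden node. This is an illustration only: the `width` parameter and the exact scaling of the exponent are assumptions, since the text does not give the Gaussian in explicit form.

```python
import math


def gaussian_rbf(inputs, center, width=1.0):
    """Gaussian activation of the squared Euclidean distance to the center.

    `width` (the spread of the Gaussian) is a hypothetical parameter name.
    """
    dist_sq = sum((x - c) ** 2 for x, c in zip(inputs, center))
    return math.exp(-dist_sq / (2.0 * width ** 2))


center = [0.0, 0.0]
at_center = gaussian_rbf([0.0, 0.0], center)   # maximal activation
near = gaussian_rbf([0.5, 0.5], center)
far = gaussian_rbf([3.0, 3.0], center)         # almost no activation
```

The activation depends only on the distance from the center, so it decays in every direction as the input moves away.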
2.1.3. Two-Weight Neural Network
By changing the parameters of its neuron function, the two-weight neural network can be constructed as a BP network, an RBF network, or a higher-order neural network [21]. In addition, a neuron of the two-weight neural network includes not only the weights of the BP neural network but also core values (i.e., center data). The neuron function of the neural network is

$$y_j = f\left(\sum_{i}\left(\frac{\omega_{ij}\,(x_i - c_{ij})}{\left|\omega_{ij}\,(x_i - c_{ij})\right|}\right)^{s}\left|\omega_{ij}\,(x_i - c_{ij})\right|^{p} - \theta_j\right), \tag{2}$$

where $y_j$ is the output value of the jth neuron in a layer of the two-weight neural network model, $f$ is the activation function, $\omega_{ij}$ is the weight of the neuron, $c_{ij}$ is the core value (kernel value) of the neuron, $x_i$ is the input value of the neuron, $\theta_j$ is the threshold value of the neuron, and $s$ and $p$ are two parameters.
In formula (2), when all core values are zero (c = 0), s = 1, and p = 1, the function becomes the neuron function of the BP neural network; when s = 0 and p = 2, it becomes the neuron function of the RBF neural network. For the hidden layer of the BP neural network, the neuron function contains only the direction weight and not the core weight. For the hidden layer of the RBF neural network, the neuron function contains only the core weight and not the direction weight. For the hidden layer of the two-weight neural network, the neuron function contains both the direction weight and the core weight. Therefore, when the parameters of the neuron function take suitable values, the two-weight neural network has a better approximation ability for nonlinear continuous functions [22].
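The two special cases can be checked numerically with a minimal sketch of the neuron's net input (the quantity passed to the activation function). The form used here is the commonly cited two-weight neuron, taken as an assumption; the function name and the sample numbers are illustrative, and `s` is assumed to be an integer and `p` positive.

```python
def two_weight_net_input(weights, cores, inputs, threshold, s, p):
    """Net input of a two-weight neuron (assumed standard form).

    Each term is sign(u)**s * |u|**p with u = w * (x - c).
    """
    total = 0.0
    for w, c, x in zip(weights, cores, inputs):
        u = w * (x - c)
        if u == 0.0:
            continue  # the term is zero for p > 0; skipping avoids 0/0
        total += (u / abs(u)) ** s * abs(u) ** p
    return total - threshold


# c = 0, s = 1, p = 1 reduces to the BP weighted sum minus the threshold.
bp_like = two_weight_net_input([0.5, -0.2], [0.0, 0.0], [1.0, 2.0], 0.1, s=1, p=1)

# s = 0, p = 2 reduces to a weighted squared distance to the core (RBF-like).
rbf_like = two_weight_net_input([1.0, 1.0], [1.0, 1.0], [2.0, 3.0], 0.0, s=0, p=2)

# s = 0, p = 1, threshold 0 is the setting reported later in Section 3.1.
paper_setting = two_weight_net_input([1.0, 1.0], [1.0, 1.0], [2.0, 3.0], 0.0, s=0, p=1)
```

With s = 0 the sign factor disappears, so only the distance from the core matters; with s = 1 and p = 1 the sign is kept and the term is linear in the input.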
In view of this, the algorithm of the two-weight neural network is adopted. All training samples are applied to the improved BP algorithm, in which the weights are adjusted in batch mode. The minimum mean square error criterion is selected as the learning rule, and the objective function for the pth sample is defined as

$$E_p = \frac{1}{2}\left(d_p - y_p\right)^2.$$

The total error of all input samples is

$$E = \sum_{p=1}^{P} E_p,$$

where $P$ is the number of learning sample vectors, $d_p$ (p = 1,…,P) is the expected output, and $y_p$ (p = 1,…,P) is the actual output.

Generally, the gradient descent method is used to modify the weights. In this paper, the formulas for adjusting the weights and the core values are as follows:

$$\omega(t+1) = \omega(t) - \eta\,\frac{\partial E}{\partial \omega} + \alpha\,\Delta\omega(t), \qquad c(t+1) = c(t) - \eta\,\frac{\partial E}{\partial c} + \alpha\,\Delta c(t).$$

Here $\eta$ is the learning rate, whose value is generally small, $\Delta\omega(t) = \omega(t) - \omega(t-1)$, $\Delta c(t) = c(t) - c(t-1)$, and $\alpha$ is the momentum factor, whose value is determined from RSS(·), the residual sum of squares of the output.
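A gradient descent step with a momentum term can be sketched as follows. This is a minimal illustration under stated assumptions: the momentum factor `alpha` is held constant here, whereas the paper sets it from the residual sum of squares by a rule that is not reproduced, and the quadratic objective is a toy stand-in for the network error.

```python
def momentum_step(value, gradient, prev_delta, eta=0.1, alpha=0.5):
    """One update: new change = -eta * gradient + alpha * previous change.

    The same rule can be applied to a weight or to a core value.
    """
    delta = -eta * gradient + alpha * prev_delta
    return value + delta, delta


def minimize_quadratic(target=3.0, start=0.0, steps=200):
    """Toy objective E(w) = (w - target)**2, whose gradient is 2 * (w - target)."""
    w, delta = start, 0.0
    for _ in range(steps):
        grad = 2.0 * (w - target)
        w, delta = momentum_step(w, grad, delta)
    return w


w_final = minimize_quadratic()
```

The momentum term reuses part of the previous change, which damps oscillation and usually speeds up convergence compared with plain gradient descent.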
The learning of the network is divided into two processes [23, 24]: forward propagation, which calculates the output of each layer from front to back, and backward propagation, which corrects the weights or core values of each layer from back to front. The steps are given in Algorithm 1.
2.2. Time Series Model
A time series is a sequence {Xt} formed by arranging, in time order, the values of an index observed at equal intervals over a certain period; it essentially reflects how the index changes with time. The main purpose of time series analysis is to study the existing data, mine the law by which the data develop over time, and thereby analyze and estimate their future behavior with a time series model [25]. Time series data can be seen everywhere in daily life, and most of them are nonstationary. The data analyzed in this paper are difference-stationary, a class of nonstationary time series for which the ARIMA model is widely used.
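The ARIMA(p, d, q) model has the following form:

$$\Phi(B)\,\nabla^{d} x_t = \Theta(B)\,\varepsilon_t,$$

where $B$ is the backshift operator, $\nabla^{d} = (1 - B)^{d}$ is the dth-order difference, $\Phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^{p}$ is the autoregressive polynomial, $\Theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^{q}$ is the moving average polynomial, and $\varepsilon_t$ is a zero-mean white noise sequence.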
The modeling steps can be summarized as follows [26, 27]:
(1) Test the stationarity of the original sequence. The two most commonly used methods are the graph test and the unit root test. If the sequence is stationary, its time plot should show most of the data fluctuating around a constant within a similar range, and the autocorrelation coefficients should quickly decay to zero. When the sequence is nonstationary, the corresponding characteristic equation contains unit roots, so a unit root test can also be used to verify whether the sequence is stationary; the commonly used ones are the DF and ADF tests. If the sequence is not stationary, a difference operation is carried out.
(2) Test the series for white noise. Not every stationary series is worth modeling; only a series whose values are correlated with one another carries information that a model can extract.
(3) Determine the order of the model. The model to be established and its order can be determined from the autocorrelation and partial autocorrelation plots of the stationary time series, or from an identification function.
(4) Estimate the parameters. The AIC criterion can be used to complete order determination and parameter estimation simultaneously.
(5) Test the model. There are usually two types of tests. The first is the significance test of the model, whose purpose is to check whether the established model effectively and adequately extracts the information contained in the data. If the model is effective, the residuals have the characteristics of white noise and contain no usable information; otherwise, information remains in the residuals and has not been extracted completely. The second is the significance test of the parameters, which confirms whether each estimated parameter is significantly different from zero, so that a simpler and more accurate model can be established.
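The first modeling step can be illustrated with a short sketch of differencing and of the sample autocorrelation used in the graph test. It is a minimal standard-library example; the function names and the sample series are illustrative, and no unit root test is implemented here.

```python
def difference(series, d=1):
    """Apply the first-order difference d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def autocorrelation(series, lag):
    """Sample autocorrelation coefficient at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


trend = [1, 3, 6, 10, 15, 21]      # clearly nonstationary (growing level)
first_diff = difference(trend)     # still trending
second_diff = difference(trend, 2) # constant, so no trend remains
```

For a stationary series, `autocorrelation` evaluated at increasing lags should fall toward zero quickly; slow decay is the usual visual sign that further differencing is needed.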
3. The Combined Model of Unemployment Prediction in Labor
Based on the above sample data, we first forecast the number of registered unemployed people and the registered unemployment rate in China by using the two-weight neural network and then forecast the data by using the time series method. Finally, the predicted results of the two methods are combined according to the corresponding weights as the predicted values of the combined model.
3.1. Two-Weight Neural Network Model
In this paper, the parameters in the neurons of the two-weight neural network are s = 0, p = 1, and θ = 0, and the BP algorithm is used. Before the two-weight neural network is applied, the data must first be normalized. The normalization formula for the series of the number of unemployed and the unemployment rate is

$$x(i) = \frac{x_0(i) - x_{0\min}}{x_{0\max} - x_{0\min}},$$

where $x(i)$ is the ith value in the normalized sequence, $x_0(i)$ is the original data, $x_{0\min}$ is the minimum value, and $x_{0\max}$ is the maximum value.
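The normalization can be sketched in a few lines, assuming the usual min-max form implied by the minimum and maximum values defined above. The sample numbers are arbitrary and are not taken from the paper's data; `denormalize` is a hypothetical helper for mapping predictions back to the original scale.

```python
def min_max_normalize(values):
    """Scale a series to [0, 1] using its own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant series cannot be min-max normalized")
    return [(v - lo) / (hi - lo) for v in values]


def denormalize(x, lo, hi):
    """Map a normalized value back to the original scale."""
    return x * (hi - lo) + lo


raw = [2.0, 4.0, 6.0]
normalized = min_max_normalize(raw)
restored = [denormalize(x, min(raw), max(raw)) for x in normalized]
```

Keeping the original minimum and maximum is necessary, because the network's outputs are on the normalized scale and must be converted back before they are reported.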
The training samples are constructed from the normalized data of the number of unemployed and the unemployment rate, according to a formula in which $X_i$ and $Y_i$ form the ith sample group and $x(i)$ is the ith value of the normalized data. Sample data from nearly 30 years are selected for this model, and Table 1 shows some of the normalized samples.
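As an illustration of how sample groups can be built from a normalized series (not the paper's own formula or data), the sketch below uses a sliding window: each input group holds several consecutive values and the target is the value that follows. The window length of 3 is a hypothetical choice, since the text does not state how many past values form one sample.

```python
def make_samples(series, window=3):
    """Pair each run of `window` consecutive values with the next value.

    `window` is an assumed parameter; the paper's setting is not given.
    """
    return [
        (series[i:i + window], series[i + window])
        for i in range(len(series) - window)
    ]


normalized = [0.1, 0.2, 0.3, 0.4, 0.5]
samples = make_samples(normalized, window=3)
```

A series of length n yields n minus the window length sample groups, so a longer window leaves fewer training samples.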
The final prediction results obtained with the two-weight neural network are shown in Table 2.
3.2. Time Series Model
Time series analysis is aimed at data that change with time; here, the time series model is used to analyze the number of unemployed and the registered unemployment rate in cities and towns. Because the labor unemployment series are nonstationary, a first-order difference is applied to the original series, and the differenced series shows a very strong short-term correlation. Therefore, the ARIMA model is selected for fitting.
The statistical samples are tested at a significance level of α = 0.05. The white noise test of the first-order differenced sequence of the number of unemployed shows that the P value of the statistic constructed at lag 6 is 0.0122, which is less than 0.05; for the unemployment rate, the value is 0.0017, which is also less than 0.05. The differenced sequences can therefore be regarded as non-white-noise sequences; that is, the correlation information in them cannot be ignored and has yet to be extracted. The ARIMA model can be used to fit such difference-stationary, non-white-noise sequences.
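A white noise test of this kind can be sketched with the Ljung-Box Q statistic. Note the assumptions: the paper does not name the statistic it used, the alternating series below is synthetic rather than the unemployment data, and the decision is made by comparing Q with the chi-square 0.95 quantile for 6 degrees of freedom (about 12.592) instead of computing a P value.

```python
def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    q = 0.0
    for k in range(1, max_lag + 1):
        num = sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k))
        r_k = num / denom  # sample autocorrelation at lag k
        q += r_k ** 2 / (n - k)
    return n * (n + 2) * q


CHI2_95_DF6 = 12.592  # chi-square critical value, 6 degrees of freedom, 5% level

synthetic_diffs = [1.0, -1.0] * 10  # strongly autocorrelated toy series
q_stat = ljung_box_q(synthetic_diffs, 6)
is_non_white_noise = q_stat > CHI2_95_DF6
```

A Q value above the critical value corresponds to a P value below 0.05, which is the situation the text reports for both differenced series.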
The autocorrelation coefficients of the differenced sequence tail off, as do the partial autocorrelation coefficients. After testing, the ARIMA model with p = 1 and q = 0 gives the best fit to the data, and its AIC and SBC values are also the smallest. The results are shown in Table 3.
Under the significance test level of 0.05, the residual sequence test results are shown in Table 4.
The P values of the constructed statistics are far greater than 0.05, so the residual sequence can be regarded as containing no further information worth extracting. The results of the coefficient test and the residual test show that the ARIMA(1, 1, 0) model used in this paper models the unemployment sequence well.
According to the established model, the unemployment of the labor force is predicted, and the prediction results are shown in Table 5.
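Forecasting with an ARIMA(1, 1, 0) model can be sketched as follows. This is a simplified illustration, not the estimation procedure used in the paper: the autoregressive coefficient is fitted by least squares on the demeaned first differences, whereas statistical packages typically use maximum likelihood, and the input series is a toy example.

```python
def fit_ar1(diffs):
    """Least-squares AR(1) fit on demeaned differences; returns (mean, phi)."""
    mu = sum(diffs) / len(diffs)
    z = [d - mu for d in diffs]
    num = sum(a * b for a, b in zip(z, z[1:]))
    den = sum(a * a for a in z[:-1])
    return mu, num / den


def forecast_arima110(series, steps):
    """Forecast by predicting the next difference and adding it to the level."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    mu, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = mu + phi * (last_diff - mu)  # AR(1) step on the differences
        level += last_diff                       # undo the differencing
        forecasts.append(level)
    return forecasts


toy_series = [0.0, 1.0, 3.0, 4.0, 6.0, 7.0, 9.0]
predicted = forecast_arima110(toy_series, steps=2)
```

Because the model is fitted on differences, each forecast is built by accumulating predicted changes onto the last observed level.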
3.3. The Combined Model
In the previous part, the two-weight neural network model and the ARIMA model were used to analyze the number of unemployed and the unemployment rate in China. The two analysis methods are suitable for fitting the series of unemployed laborers and unemployment rate in China. The fitting values of the two models about the sample data are shown in Table 6:
Figures 3 and 4 show how the single models fit the number of unemployed and the unemployment rate. From the figures, it can be inferred that the fitted values of the neural network model and the time series model are higher than the actual values, but the three curves are similar in shape, showing that the single models fit well. For the unemployment rate, however, there is a certain deviation between the fitted curves and the actual curve.


Based on the fitting results of the unemployment models, the residual sum of squares between the fitted values and the true values is calculated, as shown in Table 7:
Let $y_0$ denote the true value, $y_1$ the fitted value of the neural network model, $y_2$ the fitted value of the time series model, $W$ the residual value corresponding to the combined model, $\omega$ ($0 \le \omega \le 1$) the weight of the neural network model in the combined model, and $1 - \omega$ the weight of the time series model in the combined model. The combined model is calculated by the following formula:

$$W = \sum_{i=1}^{k}\left[y_{0,i} - \left(\omega\, y_{1,i} + (1 - \omega)\, y_{2,i}\right)\right]^{2},$$

where $k$ is the number of true values used to calculate the residual sum of squares.
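The weight that minimizes the residual sum of squares of a two-model combination can be found in closed form, as the sketch below shows. It is an illustration under assumptions: the paper does not say how its weights were computed, the closed-form least-squares solution clipped to [0, 1] is one natural way to do it, and the numbers are toy values rather than the fitted series.

```python
def optimal_weight(y0, y1, y2):
    """Weight of model 1 that minimizes the combined residual sum of squares.

    Writing the residual as (y0 - y2) - w * (y1 - y2) gives a one-variable
    least-squares problem; the result is clipped to the interval [0, 1].
    """
    num = sum((a - c) * (b - c) for a, b, c in zip(y0, y1, y2))
    den = sum((b - c) ** 2 for b, c in zip(y1, y2))
    if den == 0.0:
        return 0.5  # both models agree everywhere; any weight fits equally
    return min(1.0, max(0.0, num / den))


def combined_rss(y0, y1, y2, w):
    """Residual sum of squares of the weighted combination."""
    return sum((a - (w * b + (1.0 - w) * c)) ** 2 for a, b, c in zip(y0, y1, y2))


actual = [1.0, 2.0, 3.0]
model_1 = [2.0, 3.0, 4.0]   # overestimates by 1
model_2 = [0.0, 1.0, 2.0]   # underestimates by 1
w_best = optimal_weight(actual, model_1, model_2)
```

Since the objective is a convex quadratic in the weight, clipping the unconstrained solution to [0, 1] still gives the constrained minimum, and the combined residual can never exceed that of the better single model.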
After calculation, for the number of unemployed, the minimum residual error of the combined model is 21353, attained when ω is 0.4267. Similarly, for the unemployment rate, the minimum residual error of the combined model is 0.6574, attained when ω is 0.598. Both are smaller than the residual sums of squares of the neural network and time series models fitted alone. With these weights, the predicted values of the combined model for unemployment are shown in Table 8:
3.4. Other Comparison
Although the combined model fits better than the single models that constitute it, it is still necessary to compare its fitting effect with that of other models. The BP neural network model, the most commonly used neural network model, is selected for comparison. For the number of unemployed, the minimum residual error of the BP neural network fit is 25,673, and for the unemployment rate of the labor force, it is 2.9760. Both are higher than the corresponding minimum residual errors of the combined model, which shows that the model built in this paper fits better and appears to predict labor force unemployment more accurately.
4. Conclusion
An accurate prediction of labor force unemployment helps to tackle the issue of urban unemployment. In this paper, unemployment data obtained from the Chinese National Bureau of Statistics are taken as the statistical sample. A labor unemployment prediction model that combines a neural network model with a time series model is built, and labor unemployment in China from 2022 to 2030 is forecast. The results show that the combined model fits better than the single models that constitute it. The minimum residuals of the combined model for the number of unemployed and the unemployment rate are 21353 and 0.6574, respectively, which are lower than those of the single models. The combined model also fits better than the commonly used BP neural network model: when the latter is fitted to the number of unemployed and the unemployment rate, the minimum residuals are 25673 and 2.9760, respectively, which are higher than those of the combined model. The prediction results of the combined model show that from 2022 to 2030, the number of unemployed in China's labor force will fluctuate between 999.6992 and 1038.8520, and the unemployment rate will vary between 5.3406 and 5.8499.
Data Availability
The dataset can be accessed upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.