Abstract

The GM (1, N) model is a grey prediction model that accounts for the influence of multiple factors. This paper improves the GM (1, N) model and constructs a PSO-GM (1, N) model. First, the Lasso method is used to select the influencing factors; then the priority of the influencing factors and the value of the parameter N in the GM (1, N) model are determined; finally, particle swarm optimization (PSO) is used to optimize the GM (1, N) model. Taking the vegetable supply in Henan Province as the research object, this paper tests the PSO-GM (1, N) model empirically. The results show that the key factors affecting the vegetable supply in Henan Province are the number of rural employees, highway mileage, and pesticide application. The vegetable supply in Henan Province is predicted to keep growing steadily over the next three years.

1. Introduction

Vegetables are essential agricultural products in the daily life of urban and rural residents, and ensuring their supply has become a major issue for people’s livelihood. Vegetable supply is affected by many factors, such as natural climate, production capacity, and transportation. Identifying the key factors that affect vegetable supply and accurately predicting future supply are of great practical significance for stabilizing the vegetable market and dampening price fluctuations.

Scholars have studied the influencing factors of vegetable supply and the construction of prediction models from different angles. Wu and Mu [1] found that the urban population has the greatest impact on the circulation of vegetables; Hu et al. [2] analyzed the impact of agricultural informatization on vegetable yield; Yu [3] found that the main factors affecting vegetable production are planting area, vegetable consumption, and agricultural financial expenditure; Yang and Sun [4] found that the main factors affecting vegetable yield are economic development, technical investment, and land; Li and Mu [5] analyzed the impact of financial capital on farmers’ willingness to continue planting vegetables; Yin [6] constructed a wavelet neural network prediction model and predicted tomato yield; Qiao and Liu [7] analyzed the impact of the aging of vegetable farmers on vegetable yield; Jin et al. [8] analyzed the role of logistics service providers in detail.

The existing research focuses mainly on vegetable production, which cannot fully reflect the whole process of vegetable supply. Few studies consider the grey characteristics of the influencing factors of vegetable supply, that is, information that is partly known, partly unknown, and sparse. In addition, multicollinearity among the influencing factors and overfitting are not examined in depth when the prediction models are constructed. When multiple factors are considered, the Lasso (least absolute shrinkage and selection operator) method can eliminate unimportant factors, retain the few factors that have a significant impact on the system, and thereby alleviate multicollinearity and overfitting. The GM (1, N) model is a grey prediction model that accounts for the influence of multiple factors. In view of the large errors of grey prediction models in practical application, this paper proposes an improved PSO-GM (1, N) model, which provides a new way to predict vegetable supply.

2. PSO-GM (1, N) Model

For a multifactor prediction problem, let $X_0$ be the system characteristic data sequence and $X_i$ ($i = 1, 2, \ldots, s$) be the influencing factor sequences. To eliminate multicollinearity among the influencing factors, PSO-GM (1, N) first screens the factors with the Lasso method, then determines the priority of the influencing factors and the value of the parameter N in the GM (1, N) model (the second selection of influencing factors), and finally optimizes the GM (1, N) model with the PSO method.

2.1. Lasso Method

Lasso is a regularized shrinkage estimation method proposed by the statistician Robert Tibshirani in 1996. Regularization constrains the model and prevents overfitting. Starting from the ordinary least squares (OLS) loss function, Lasso uses the sum of the absolute values of the regression coefficients as a penalty function to shrink the coefficients. When this sum is constrained to be small enough, some regression coefficients are shrunk to exactly zero, and the variables with zero coefficients can be eliminated, which achieves variable selection [9]. Assume the linear regression model $X_0 = X\beta + \varepsilon$, where X0 is the system behavior characteristic vector, X is the influence factor variable matrix, and β is the coefficient vector; then the coefficient estimate of the Lasso method is as follows:

$$\hat{\beta} = \arg\min_{\beta} \left\{ \| X_0 - X\beta \|_2^2 + \lambda \| \beta \|_1 \right\}.$$

Here, $\| \beta \|_1 = \sum_{j=1}^{s} |\beta_j|$, where $\| X_0 - X\beta \|_2^2$ is the loss function, $\| \beta \|_1$ is the regularization function, and λ is the adjustment parameter, also known as the regularization parameter, which balances the loss function against the regularization function. As λ (λ ≥ 0) increases gradually from 0, the regularization function sets the estimated values of some coefficients to zero, which eliminates the corresponding variables and achieves variable selection. The value of λ is determined by cross validation: one part of the sample data is used as the training set and the other part as the validation set, models with different values of λ are fitted on the same training set, and the λ whose model has the smallest validation error is selected. If the Lasso method retains m main influencing factors out of the s candidate factors, these m factors form the influencing factor set selected for the first time.
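The shrink-to-zero behavior described above can be illustrated with a minimal sketch. It minimizes the objective ||y − Xb||² + λ||b||₁ by cyclic coordinate descent, which is one common way to compute a Lasso estimate; it is not claimed to be the routine used in this paper. The function names, the toy data, and the fixed value of λ are assumptions made for illustration, and λ is not chosen by cross validation here.

```python
def soft_threshold(rho, t):
    """Soft-thresholding operator: shrink rho toward zero by t."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of rows and y a list of targets. No intercept is fitted,
    so the columns and targets are assumed to be centred beforehand.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Inner product of column j with the residual that ignores column j.
            rho = 0.0
            for i in range(n):
                pred_wo_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_wo_j)
            # For this objective the threshold is lam / 2.
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return beta


# Toy data (hypothetical): y depends strongly on column 0, weakly on column 1.
X_toy = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y_toy = [2.1, 1.9, -1.9, -2.1]
beta_toy = lasso_coordinate_descent(X_toy, y_toy, lam=2.0)
selected = [j for j, b in enumerate(beta_toy) if b != 0.0]
print(beta_toy, selected)
```

In this toy case the weak factor receives a coefficient of exactly zero and is dropped, while the strong factor is kept with a shrunken coefficient. In practice the same fit would be repeated over a grid of λ values and the one with the smallest validation error retained.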

2.2. Priority of Influencing Factors

Based on the set of influencing factors selected for the first time, Deng’s grey relational analysis model is used to determine the priority of each influencing factor. Different dimensionless processing methods applied to the original data can lead to different grey correlation orders of the influencing factors. This paper therefore uses six processing methods: ① initial value, ② average value, ③ minimization, ④ maximization, ⑤ centralization, and ⑥ difference.
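As a rough illustration of the ranking step, the sketch below computes Deng’s grey relational degree after initial-value processing (method ①). The distinguishing coefficient ρ = 0.5 is the conventional default and is an assumption here, as are the function names and the toy sequences; the other five processing methods would only replace the transform function.

```python
def initial_value_transform(seq):
    """Dimensionless processing by initial value: divide each term by the first."""
    return [v / seq[0] for v in seq]


def grey_relational_degrees(reference, factors, rho=0.5):
    """Deng's grey relational degree of each factor with the reference sequence."""
    ref = initial_value_transform(reference)
    transformed = [initial_value_transform(f) for f in factors]
    # Absolute differences between the reference and each factor at every point.
    deltas = [[abs(r - v) for r, v in zip(ref, f)] for f in transformed]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:
        # Every factor coincides with the reference after processing.
        return [1.0] * len(factors)
    degrees = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(coeffs) / len(coeffs))
    return degrees


# Toy sequences (hypothetical): the first factor moves in step with the reference.
reference = [10.0, 12.0, 14.0]
factors = [[5.0, 6.0, 7.0], [2.0, 3.0, 5.0]]
degrees = grey_relational_degrees(reference, factors)
# Correlation order: factor indices sorted from most to least related.
order = sorted(range(len(degrees)), key=lambda i: degrees[i], reverse=True)
print(degrees, order)
```

A factor whose processed sequence coincides with the reference gets a degree of 1, and the degree falls as the sequences diverge, which is what produces the priority order.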

The correlation order reflects how differently the influencing factors act on the system behavior, and the more pronounced these differences are, the better. A reasonable correlation order is therefore selected according to the following principles. Suppose the correlation degrees of the influencing factors are obtained under each of the six dimensionless processing methods. On the premise of a small amount of calculation, two principles are proposed to judge the reasonable correlation order [10]:

Applying these principles to the influencing factor sequences selected for the first time determines the reasonable correlation order of the influencing factors. For convenience of expression, it is recorded here as follows:

2.3. GM (1, N) Model and Its N Value Determination Method
2.3.1. GM (1, N) Model

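The GM (1, N) model is a first-order differential equation prediction model with N variables; it reflects the influence of N−1 influencing factor variables on the first derivative of one system behavior variable. To be consistent with the traditional GM (1, N) notation, the system behavior characteristic sequence and the influencing factor sequences selected for the first time are reexpressed as follows:

$$X_1^{(0)} = \left( x_1^{(0)}(1), x_1^{(0)}(2), \ldots, x_1^{(0)}(n) \right).$$

The influencing factor sequences can be expressed as

$$X_i^{(0)} = \left( x_i^{(0)}(1), x_i^{(0)}(2), \ldots, x_i^{(0)}(n) \right), \quad i = 2, 3, \ldots, N.$$

Let $X_i^{(1)}$ be the 1-AGO sequence of $X_i^{(0)}$ ($i = 1, 2, \ldots, N$), with $x_i^{(1)}(k) = \sum_{j=1}^{k} x_i^{(0)}(j)$, and let $Z_1^{(1)}$ be the nearest neighbor mean generating sequence of $X_1^{(1)}$, with $z_1^{(1)}(k) = \frac{1}{2}\left( x_1^{(1)}(k) + x_1^{(1)}(k-1) \right)$; then

$$x_1^{(0)}(k) + a z_1^{(1)}(k) = \sum_{i=2}^{N} b_i x_i^{(1)}(k)$$

is called GM (1, N).

When the amplitude of change of $X_i^{(1)}$ ($i = 2, 3, \ldots, N$) is very small, the approximate time response of GM (1, N) is as follows:

$$\hat{x}_1^{(1)}(k+1) = \left[ x_1^{(1)}(0) - \frac{1}{a} \sum_{i=2}^{N} b_i x_i^{(1)}(k+1) \right] e^{-ak} + \frac{1}{a} \sum_{i=2}^{N} b_i x_i^{(1)}(k+1),$$

where $x_1^{(1)}(0)$ is taken as $x_1^{(0)}(1)$.

The predicted values of the system behavior variable are as follows:

$$\hat{x}_1^{(0)}(k+1) = \hat{x}_1^{(1)}(k+1) - \hat{x}_1^{(1)}(k).$$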
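The modeling steps of this subsection can be sketched in plain Python as follows. The sketch assumes the textbook form of GM (1, N): 1-AGO accumulation, nearest neighbor mean background values, a least-squares estimate of the parameters [a, b_2, ..., b_N], the approximate time response, and restoration by differencing. All function names and the synthetic data are illustrative assumptions, not the authors’ implementation.

```python
import math


def ago(seq):
    """1-AGO: cumulative sum of a sequence."""
    total, out = 0.0, []
    for v in seq:
        total += v
        out.append(total)
    return out


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_gm1n(x1, factors):
    """Least-squares estimate of [a, b_2, ..., b_N] via the normal equations."""
    x1_ago = ago(x1)
    f_ago = [ago(f) for f in factors]
    # Each row of B is [-z1(k), x2_ago(k), ..., xN_ago(k)] for k = 2, ..., n.
    B = [[-0.5 * (x1_ago[k] + x1_ago[k - 1])] + [f[k] for f in f_ago]
         for k in range(1, len(x1))]
    Y = x1[1:]
    m = len(B[0])
    BtB = [[sum(r[i] * r[j] for r in B) for j in range(m)] for i in range(m)]
    BtY = [sum(r[i] * y for r, y in zip(B, Y)) for i in range(m)]
    return solve_linear(BtB, BtY)


def predict_gm1n(x1, factors, params):
    """Approximate time response, restored to the original scale by differencing.

    The factor sequences must cover every period that is predicted.
    """
    a, bs = params[0], params[1:]
    f_ago = [ago(f) for f in factors]
    x1_hat_ago = [x1[0]]
    for k in range(1, len(factors[0])):
        drive = sum(b * f[k] for b, f in zip(bs, f_ago)) / a
        x1_hat_ago.append((x1[0] - drive) * math.exp(-a * k) + drive)
    return [x1[0]] + [x1_hat_ago[k] - x1_hat_ago[k - 1]
                      for k in range(1, len(x1_hat_ago))]


# Synthetic data (hypothetical) built to satisfy the grey equation exactly
# with a = 0.5 and b_2 = 1.0, so the fit should recover those two values.
a_true, b_true = 0.5, 1.0
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2_ago = ago(x2)
x1 = [2.0]
for k in range(1, len(x2)):
    prev = sum(x1)
    x1.append((b_true * x2_ago[k] - a_true * prev) / (1 + 0.5 * a_true))
params = fit_gm1n(x1, [x2])
forecast = predict_gm1n(x1, [x2], params)
print(params, forecast)
```

The time response is only an approximation of the grey equation, so the forecast does not reproduce the synthetic series exactly even though the parameters are recovered; this gap is the kind of error that the later optimization step tries to reduce.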

2.3.2. Determination of N Value in GM (1, N) Model

The GM (1, N) model has been widely applied and extended. Ma and Liu [11] proposed a discrete GM (1, N) model and analyzed the basic law of oil production decline and the influence of related factors; Fan [12] proposed an improved GM (1, N) soybean price forecasting model; Xie et al. [13] proposed the GICM model, screened the key indicators of complex products, and improved the prediction accuracy of the GM (1, N) model; Ren [14] used the GM (1, N) method to model an anaerobic digestion system and predict methane production; Zeng et al. [15] proposed a new multivariable grey prediction GM (1, N) model; Xiong et al. [16] improved the GM (1, N) model and proposed the AWGM (1, N) model to predict housing demand; Yang and Liu [17] used the GM (1, N) model to predict China’s grain output; Cheng et al. [18] established a grey GM (1, 3) model to simulate and predict the main influencing factors of clean energy consumption; Zhang [19] analyzed the influencing factors of ginger planting area in terms of economy, input, and output and established a grey prediction model.

The studies reviewed above do not address how to determine the value of the parameter N in the GM (1, N) model. This paper proposes a method for doing so: the influencing factor variables are added to the model one at a time, following the correlation order of the influencing factors of system behavior.

To test the accuracy of the model’s predictions, the last three of the n samples of the system are held out as comparison values, and the model is built on the remaining n−3 samples. The optimal value of N is determined by testing the prediction performance. The specific steps are as follows:

In Step 1, based on the correlation order in relation (5) and the reexpression of the influencing factor sequences in 2.3.1, select the system behavior sequence and the first-ranked influencing factor, establish the GM (1, 2) model, and predict the three periods n−2, n−1, and n. The predictions are compared with the actual values, and the average relative error is recorded as ε2.

In Step 2, following the same method and the correlation order of influencing factors determined in relation (5), the GM (1, 3) model, GM (1, 4) model, …, GM (1, N) model are established in turn, and their average relative errors are calculated and recorded as ε3, ε4, …, εN.

In Step 3, when εi is the minimum, the prediction accuracy of the model is the highest. In this case, the value of i is the optimal solution, that is, the GM (1, i) model is the optimal prediction model.

This process can be called the second selection influencing factor set.
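The three steps above amount to a simple holdout search over N, sketched below. To keep the example short, the model is passed in as a callable, and `canned_model` is a hypothetical stand-in that returns fixed forecasts instead of fitting a real GM (1, N); the sample values and factor names are likewise placeholders.

```python
def mean_relative_error(actual, predicted):
    """Average of |actual - predicted| / |actual| over the compared periods."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def select_n(x1, ordered_factors, fit_predict, holdout=3):
    """Return (best_N, errors), where errors[N] is the holdout error of GM(1, N).

    ordered_factors must already be sorted by the correlation order, and
    fit_predict(train, factors, horizon) must return `horizon` forecasts.
    """
    train, actual = x1[:-holdout], x1[-holdout:]
    errors = {}
    for count in range(1, len(ordered_factors) + 1):
        # GM(1, N) uses the first N - 1 factors of the correlation order.
        forecast = fit_predict(train, ordered_factors[:count], holdout)
        errors[count + 1] = mean_relative_error(actual, forecast)
    best_n = min(errors, key=errors.get)
    return best_n, errors


def canned_model(train, factors, horizon):
    """Hypothetical stand-in for a fitted GM(1, N): fixed forecasts per factor count."""
    canned = {
        1: [90.0, 90.0, 90.0],
        2: [98.0, 101.0, 103.0],
        3: [80.0, 120.0, 100.0],
    }
    return canned[len(factors)]


# Placeholder data: seven samples, the last three are held out for comparison.
x1_demo = [70.0, 80.0, 90.0, 95.0, 100.0, 100.0, 100.0]
factors_demo = ["factor_a", "factor_b", "factor_c"]  # already in correlation order
best_n, errors = select_n(x1_demo, factors_demo, canned_model)
print(best_n, errors)
```

Passing the model as a callable keeps the selection logic independent of how GM (1, N) is fitted, so the same loop can be reused once a real fitting routine is available.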

2.3.3. PSO-GM (1, N) Prediction Model

Particle swarm optimization (PSO) initializes a group of random particles and then searches for the optimal value over multiple iterations. In each iteration, each particle updates its velocity and position using the individual extreme pbest and the global extreme gbest. The calculation formula is as follows:

$$v_i^{t+1} = \omega v_i^{t} + c_1 r_1 \left( \mathrm{pbest}_i - x_i^{t} \right) + c_2 r_2 \left( \mathrm{gbest} - x_i^{t} \right), \qquad x_i^{t+1} = x_i^{t} + v_i^{t+1},$$

where $v_i^{t}$ is the current velocity of the particle, $x_i^{t}$ is the current position of the particle, $\mathrm{pbest}_i$ is the position of the optimal value currently found by the particle, and $\mathrm{gbest}$ is the position of the optimal value currently found by the whole population. ω is the inertia weight; c1 and c2 are the acceleration coefficients, usually c1 = c2 = 2; r1 and r2 are random numbers in (0, 1). Through continuous updating, the particles finally approach the position of the global optimal value.
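A minimal sketch of the velocity and position update described above is given below, applied to a toy sphere function rather than to the GM (1, N) objective. The default inertia weight 0.7 and acceleration coefficients 1.5 are assumptions chosen so that the toy run settles quickly (the experiment in Section 3.2.4 reports an inertia weight of 1 and an acceleration coefficient of 2); the position clamping, the fixed seed, and all names are also illustrative choices.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=30, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia term + pull toward own best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_pos, best_val = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_pos, best_val)
```

To optimize a prediction model instead, the objective would take the candidate parameters, rebuild the model, and return its average relative error on the held-out periods.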

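According to the basic principle of the GM (1, N) model, let $\hat{u} = [a, b_2, \ldots, b_N]^{T}$; the calculation formula is $\hat{u} = (B^{T}B)^{-1}B^{T}Y$, where

$$B = \begin{bmatrix} -z_1^{(1)}(2) & x_2^{(1)}(2) & \cdots & x_N^{(1)}(2) \\ -z_1^{(1)}(3) & x_2^{(1)}(3) & \cdots & x_N^{(1)}(3) \\ \vdots & \vdots & & \vdots \\ -z_1^{(1)}(n) & x_2^{(1)}(n) & \cdots & x_N^{(1)}(n) \end{bmatrix}, \qquad Y = \begin{bmatrix} x_1^{(0)}(2) \\ x_1^{(0)}(3) \\ \vdots \\ x_1^{(0)}(n) \end{bmatrix}.$$

Here $z_1^{(1)}(k)$ is the nearest neighbor mean of the 1-AGO system behavior sequence, and $x_i^{(1)}(k)$ is the 1-AGO value of the i-th influencing factor.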

It can be seen that the representation of GM (1, N) model depends on matrix B. The estimated value of the parameter u is obtained according to the above method, but it is not necessarily optimal. In this paper, PSO-GM (1, N) model is proposed: add other parameters to matrix B, and then use particle swarm optimization algorithm to obtain the optimal parameter u. Matrix B with parameters is reexpressed as follows:

Then, the parameter u is reestimated with the calculation formula above. Because matrix B now contains the added parameters, the estimate of u is a function of those parameters, and so is the GM (1, N) model.

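According to the optimal prediction model GM (1, N) determined in 2.3.2, the relative errors of the model in the three periods n−2, n−1, and n are calculated. Minimizing the average relative error over these three periods is taken as the objective function of the particle swarm optimization algorithm, which is specifically expressed as follows:

$$\min f = \frac{1}{3} \sum_{k=n-2}^{n} \frac{\left| \hat{x}_1^{(0)}(k) - x_1^{(0)}(k) \right|}{x_1^{(0)}(k)},$$

where $\hat{x}_1^{(0)}(k)$ and $x_1^{(0)}(k)$ are the predicted and actual values of the system behavior variable in period k.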

3. Prediction of Vegetable Supply in Henan Province

3.1. Data Sources and Influencing Factors

Because there is no complete statistical index system for agricultural product logistics, this paper takes the product of the number of urban residents and the per capita fresh vegetable purchase of urban residents in Henan Province from 2007 to 2019 as the vegetable supply (X0) and uses the predicted value of X0 to reflect the future development of vegetable supply in Henan Province. According to the existing research results and the principles of comprehensiveness, representativeness, and operability in index selection, 9 influencing factors are selected: vegetable planting area (X1) (thousand hectares), pesticide application amount (X2) (10 thousand tons), plastic film usage (X3) (10 thousand tons), plastic film coverage area (X4) (thousand hectares), number of rural employees (X5) (10 thousand people), investment in the transportation, storage, and postal industry (X6) (100 million RMB), highway mileage (X7) (10 thousand km), tonnage of road trucks (X8) (10 thousand tons), and number of employees in the highway transportation industry (X9) (people). All data in this paper are from the statistical yearbook of Henan Province.

3.2. Data Processing and Result Analysis
3.2.1. Determination of the First Influencing Factor Set

Based on the influencing factor set selected in 3.1, this paper takes the sample data of the influencing factors from 2007 to 2016 as the training set and applies the Lasso method in MATLAB to select the influencing factors. The minimum error is reached after 8 iterations of the program. The factors whose regression coefficients are nonzero are the factors selected by the Lasso method. The specific results are shown in Table 1.

3.2.2. Priority of Influencing Factors Selected for the First Time

According to the influence factor set selected for the first time, the method established in 2.2 is used for processing, and the specific calculation results are shown in Table 2. According to the judgment criteria (3) and criteria (4), the ideal correlation order of influencing factors is obtained after the initial value processing method:

3.2.3. Determination of N Value in GM (1, N) Model

For expression (13), the method determined in 2.3.2 is used for processing, and the results are shown in Table 3. Because ε4 (4.93%) is the smallest of the average relative errors, N = 4; that is, the influencing factors of the second selection are the number of rural employees, highway mileage, and pesticide application amount.

3.2.4. Prediction Results of PSO-GM (1, N) Model and Analysis of Influencing Factors

The PSO-GM (1, N) model established in 2.3.3 is used to predict the vegetable supply in Henan Province from 2017 to 2019, with 100 particles, an inertia weight of 1, an acceleration coefficient of 2, and 1000 iterations. After repeated experiments, the model built on the sample data from 2010 to 2016 gives the smallest average relative error for the vegetable supply in Henan Province from 2017 to 2019, namely 2.38%.

To assess the prediction results, the PSO-GM (1, 4) model, the GM (1, 4) model, the GM (1, 1) model, and multiple linear regression are used to predict the vegetable supply in Henan Province, with the actual values from 2017 to 2019 taken as the standard. The results are shown in Table 4. Table 4 shows that, ranked from highest to lowest prediction accuracy (that is, from lowest to highest average relative error), the methods are the PSO-GM (1, 4) model, the GM (1, 4) model, the GM (1, 1) model, and multiple linear regression. Compared with the traditional GM (1, N) model, the PSO-GM (1, N) model greatly improves the prediction accuracy, and its accuracy is higher than that of the other prediction models. It is therefore a reasonable model and can effectively predict the future vegetable supply.

According to the result of 3.2.3, PSO-GM (1, 4) model is used to forecast the vegetable supply in Henan Province in the next three years. The results are shown in Table 5. The data of influencing factors in the next three years are obtained by GM (1, 1) model, and the results are shown in Table 6.

It can be concluded from Table 5 that the vegetable supply in Henan Province will continue to grow steadily over the next three years and is expected to reach 6.35 million tons in 2023. The key factors affecting the vegetable supply in Henan Province are the number of rural employees, highway mileage, and pesticide application. Among them, the number of rural employees and highway mileage have a positive impact on vegetable supply, while the pesticide application amount has a negative impact.

Vegetable production is a labor-intensive industry, and the size of the labor force has a direct impact on it. With social and economic development, agricultural machinery has greatly improved labor productivity. In vegetable production, however, the variety of vegetable types and the terrain make machinery difficult to apply effectively, so a large amount of manual labor is still needed. Therefore, the impact of rural employment on vegetable supply is positive.

The modes of freight transportation include road, railway, waterway, and air transportation. Because vegetables are mostly transported over short distances, highway transportation is widely used for them. Transportation infrastructure plays an important role in promoting vegetable transportation: a well-developed transportation network speeds up the circulation of vegetables and expands the scale of the vegetable market. Highway mileage is an important indicator of regional transportation infrastructure. Therefore, the impact of highway mileage on vegetable supply is positive.

Controlling excessive pesticide residue in vegetables is an important task in the supervision of vegetable quality and safety. Because diseases and insect pests occur frequently during vegetable production, farmers overapply pesticides to protect yield. Because government departments strictly control pesticide residues, the supply to the vegetable market decreases significantly. To stabilize the normal supply of the vegetable market, it is urgent to reduce the use of pesticides. Therefore, the effect of pesticide application on vegetable supply is negative.

4. Conclusions

Vegetable supply is affected by multiple factors, and the relationships among these factors are complex. In using the PSO-GM (1, N) model to predict vegetable supply in Henan Province, this paper first applies the Lasso method to make the first selection among the many factors affecting vegetable supply, which alleviates multicollinearity and overfitting. Then, based on grey correlation analysis, the influencing factors are ranked by priority, a method for determining the parameter N in GM (1, N) is proposed, and the influencing factors are selected for the second time. Finally, the PSO method is used to optimize the GM (1, N) model, which greatly improves the prediction accuracy. Based on the PSO-GM (1, N) model, the vegetable supply in Henan Province in the next three years is predicted. The results show that the vegetable supply in Henan Province continues to increase and that the key factors affecting it are the number of rural employees, highway mileage, and pesticide application. The PSO-GM (1, N) model proposed in this paper provides a new method for multifactor grey prediction. However, this study does not consider the impact of adjacent provinces on vegetable supply in Henan Province, especially spatial autocorrelation. Future work will explore how vegetable supply in Henan Province evolves in both time and space.

Data Availability

The data used to support the findings of this study are available from the first author upon request.

Conflicts of Interest

The authors have no conflicts of interest.

Authors’ Contributions

Bingjun Li was responsible for proposing the overall idea and framework of the manuscript. Xueqiang Guo was responsible for data processing and writing of the first draft of the manuscript.

Acknowledgments

This work was supported by the Key Project of Soft Science Research in Henan Province (202400410051).