Abstract

Accurate forecasting of reference crop evapotranspiration (ET0) is vital for sustainable water resource management. In this study, four popularly used single models were selected to forecast ET0 values, including support vector regression, Bayesian linear regression, ridge regression, and lasso regression models, respectively. They all had advantages of low requirement of data input and good capability of data fitting. However, forecast errors inevitably existed in those forecasting models due to data noise or overfitting. In order to improve the forecast accuracy of models, hybrid models were proposed to integrate the advantages of the single models. Before the construction of hybrid models, each single model’s weight was determined based on two weight determination methods, namely, the variance reciprocal and information entropy weighting methods. To validate the accuracy of the proposed hybrid models, 1–30 d forecast data from January 2 to February 1, 2022, were used as a test set in Xinxiang, North China Plain. The results confirmed the feasibility of the information entropy-based hybrid model. In detail, the information entropy model generated the mean absolute percentage errors of 11.9% or a decrease by 48.9% compared to the single and variance reciprocal hybrid models. Moreover, the model generated a correlation coefficient of 0.90 for 1–30 d ET0 forecasting or an increase by 13.6% compared to other models. The standard deviation and the root mean square error of the information entropy model were 1.65 mm·d−1 and 0.61 mm·d−1 or had a decrease by 16.4% and 23.7%. The maximum precision and the F1 score were 0.9618 and 0.9742 for the information entropy model. It was concluded that the information entropy-based hybrid model had the best midterm (1–30 d) ET0 forecasting performance in the North China Plain.

1. Introduction

With the fast growth of world population, people’s requirements for both food and water resources are dramatically increasing [1]. To cope with the problems, intensive and water-saving agriculture has been rapidly developing to meet the demand on the planet [2]. It has been well-known that water resources used for agricultural sector have occupied 70% of the groundwater withdrawn in China [3, 4]. Furthermore, abiotic drought stress happens more often than before in the context of global warming, resulting in yield stagnation or failure in drought-stressed areas [5]. Timely and precision irrigation is one of the most effective approaches to meet the dual goal of high yields and water-saving. With the intensification of global water shortage, it is crucial to develop a high-efficient water-saving irrigation technique [6]. The forecast of reference evapotranspiration (ET0) is the basis for developing this technique [7], as crop water requirement can be estimated using ET0 and crop coefficients. The improvement in ET0 forecast accuracy will greatly improve the accuracy of irrigation forecasting.

Due to stochastic changes in weather systems, accurate ET0 forecast still remains a challenge [8]. To improve ET0 forecast accuracy, different types of forecasting models have been developed, including physical models, statistical models, and combined hybrid models [9]. Physical models achieve ET0 forecast based on future meteorological data via simulating the relationships among the atmosphere, land surface, and waters [10]. However, the accuracy of numerical weather prediction (NWP) in forecasting long-term meteorological parameters limits the accuracy of other models based on weather forecasts. Statistical models mainly include linear regression models, time-series models, and machine learning models [11]. Due to low requirement of data input and good capability of data fitting, those models have been widely adopted to ET0 forecast [12]. With a limited amount of meteorological factors, linear regression models such as Bayesian linear regression and ridge regression have shown advantages in ET0 forecast in China [13], Mediterranean zones [14], and US High Plains [15]. Besides, several neural network models were introduced to forecast ET0, including BP neural networks and support vector machine models [16]. In Turkey, monthly mean ET0 was estimated using adaptive network-based fuzzy inference system (ANFIS) and artificial neural network (ANN) models [17]. It was found that both the ANFIS and ANN methods were superior to Hargreaves and Ritchie methods in estimation of ET0. Regarding the complexity of ET0 forecast, the applicability of most statistical models was limited, so more novel models have been attempted in recent years [18, 19]. To well simulate the dynamics of ET0 trends, researchers combined the physical and statistical models [20]. These hybrid models were adopted to predict nonstationary data series [21]. In Peninsular Malaysia, a mixed multifractal forecasting model was adopted to forecast ET0 trends by combining the light gradient boosting machine, decision forest regression, and artificial neural network models [22]. A number of studies also indicated that the performance of hybrid forecasting models outperformed that of single models, and the forecast accuracy was greatly improved by hybrid models [2325]. For example, in Atakum, Turkey, a hybrid model was constructed for ET0 forecast based on the autoregressive integrated moving average model and generalized regression neural networks, and the hybrid model effectively improved ET0 forecast accuracy [26]. In Brazil, a hybrid model was established for ET0 forecast based on support vector machine and artificial neural network models, and the results showed that the hybrid model had the highest ET0 forecast efficiency and accuracy [27]. Although time-series models have also been applied to ET0 forecast, those models cannot reflect the internal correlation among factors, compared to hybrid models [28]. Because time-series models usually did not consider external factors, it would induce forecast errors when encountering significant external changes [29].

Till now, how to determine each single model’s weight for a hybrid model is still a challenging task [30]. Research on weight assignment based on different weight decomposition methods is little conducted in ET0 forecasting [31]. In this study, two hybrid ET0 forecasting models were proposed based on variance reciprocal and information entropy algorithms. We hypothesized that the combined hybrid models were able to achieve more accurate ET0 forecast values than single forecasting models. The purposes of this study were as follows: (I) to select the optimal weight determining method for the construction of hybrid ET0 forecasting models, (II) to identify the optimal hybrid ET0 model by comparing the accuracy of different single and hybrid models, and (III) to explain the reason why the proposed hybrid model has advantages over other models.

2. Materials and Methods

2.1. Data Establishment

Experimental data were collected from Xinxiang Meteorological Station, North China Plain (35°08′ N, 113°45′ E, a.s.l. 73 m). This paper selected the dataset from January 1, 2020, to December 31, 2022, including maximum air temperature (), minimum air temperature (), mean air temperature (), and relative humidity (RH). The four parameters have shown significant correlations with ET0 variations in the temperate monsoon climate of China [16]. This study extracted the features of these data on the same historical days in each year.

2.2. Feature Extracting

To extract daily features of meteorological data, we supposed that there were H-related meteorological factors on each single day. Based on the assumption, daily eigenvectors on days i and j were expressed as and . Feature similarity on days i and j was defined as follows:where represents daily feature similarity, H is the number of meteorological factors, and are eigenvectors on days i and j, and h represents the number of current meteorological factors.

2.3. Data Preprocessing

Due to different dimensions of data features, data normalization was needed in data preprocessing, which was a step down-scaling raw data to desired scope for further processes. In this study, the min-max normalization was adopted to normalize the target parameters. The expression was as follows:where is the normalized dimensionless data, x is the original data, is the minimum value in the original data, and is the maximum value in the original data.

2.4. Data Training and Test

This study divided the dataset into the training set and the test set at an 8 : 2 ratio. To obtain as much effective information as possible from the 2020–2022 learning data, a cross-validation method was used to segment the dataset, and a 5-fold cross validation was chosen to obtain the best estimate.

2.5. Selection of Single Models
2.5.1. Support Vector Regression (SVR)

When a support vector regression model (SVR) was used for forecast analysis, its core was to establish an optimal classification surface using an insensitive loss function [32]. In this way, the mean square error of all training sets from this optimal classification surface can be minimized. The output of the SVR model was a linear combination of intermediate nodes, each of which corresponded to a support vector. The structure of the SVR forecasting model is shown in Figure 1.

2.5.2. Bayesian Linear Regression

Bayesian linear regression was a linear regression solved using the Bayesian probability inference method in statistics [33]. The regression has the basic properties of Bayesian statistical models, which can solve the probability density function of weight coefficients and test model hypotheses based on Bayesian factors. Given N sets of independent learning samples, a set of data samples , and were constructed, and the empirical Bayesian test was used in the multiple linear regression model. The Bayesian linear regression model was expressed as follows:where X is the observed data, y is the corresponding target value, N is the number of data samples, f (X) is the Bayesian linear regression model, is the weight coefficient, and ε is the residual. In the model, the weight coefficients are independent of the observed data (X), and ε values are independent and identically distributed. Bayesian linear regression assumes that the residual follows a normal distribution.

2.5.3. Ridge Regression

Ridge regression is an improved least squares estimation method used for the analysis of collinear data. In ridge regression, regression coefficient values are introduced to reduce the effect of the covariance of independent variables [34]. The regression is more suitable to fit poor-conditioned data than the least squares method [35]. It is more suitable to solve the problem of collinearity of independent variable data and the lack of explanatory parameters in multiple linear regression [36].

2.5.4. Lasso Regression

Lasso regression focuses on the multiple regression and performs feature selection by restricting absolute values for target models. It has a strong ability to attenuate the regression coefficient vector via selecting useful data features and obtaining reliable variable selection function [36]. In the regression, if the interpreted variable is set to be independent with given observed values, will be considered independent with respect to standardized observed values (). Lasso regression was expressed as follows:where is standardized observed values and t is the harmonic parameter (t ≥ 0). When t gradually decreases, regression coefficients will also decrease and gradually tend to zero. When t approximates zero, it will be eliminated at the time i and j.

2.6. Construction of Hybrid Models

To construct hybrid forecasting models, reasonable weights should be assigned to each single model. The following steps describe the weight determination process for hybrid models.

2.6.1. Determination of Target Attributes

To determine each model’s target attribute, a decision matrix was established. The matrix was expressed as follows:where is predicted values of the ith model on the jth similar day, m is the number of single forecasting models, and s is the total number of the similar days.

2.6.2. Construction of Eigenvalue Matrix

Eigenvalue is the transformation of a linear transformation represented by a matrix into a numerical transformation. The feature vector corresponding to the feature value is the key. The properties of a complex matrix can be transformed into the feature of eigenvectors. In this way, the complex data can be simplified to be analyzed. The eigenvalue matrix was expressed as follows:where λ is the coefficient matrix of the forecasting models, m is the number of single forecasting models, and s is the total number of the similar days.

2.6.3. Normalization of Eigenvalue Matrix

To make different meteorological parameters comparable and easy to be adopted in the calculation of weights, eigenvalues were normalized using equation (2).

2.6.4. Construction of Matrix R

The normalized r was used to obtain the matrix R. The calculation formula was expressed as follows:where s is the total number of the similar days and is the normalized eigenvalue of the ith model on the jth similar day.

2.6.5. Information Entropy-Based Weight Determination

Information entropy was adopted to measure how cluttered the system data were. The information entropy method was usually used to evaluate the amount of information carried by the dataset through characterizing the complexity and quantifying the amount of uncertainty in a system. It is a metric that describes the degree of chaos in a system to determine the diversity of data. In this study, information entropy was expressed as follows:where is the information entropy of the matrix R, s is the total number of the similar days, and is the normalized value of the eigenvalue of the ith model on the jth similar day.

Considering the properties of the logarithmic functions in equation (8), we defined that, when was equal to zero, also became zero. The function assumes that the weight of a model may approximate to zero when is extremely small.

The magnitude of the weight vector () represents the importance of the corresponding model m in a hybrid model. The larger is, the more important a single model is in the hybrid model, and vice versa. In this study, the weight vector was calculated based on the values of as follows:where is the weight vector on days t, m is the number of single forecasting models, and is the information entropy of the matrix R.

2.6.6. Variance Reciprocal-Based Weight Determination

The variance reciprocal method refers to determining the weight using the proportion of the reciprocal of the sum of error squares of a single model to that of the total sum of error squares. This method avoids the appearance of negative weight values and distributes greater weights to more accurate forecasting models. The model was expressed as follows:where is the square of the forecast error of the ith single model, is the total sum of error squares, and is the sum of error squares of the ith single model.

Based on the values of for each single forecasting model, the weight of the ith single model in a hybrid model was expressed as follows:where m is the number of single forecasting models and is the squares of the forecast error of the ith single forecasting model.

We assumed that there were m different single forecasting models to be integrated in a hybrid model. According to each model’s weight and predicted values, the hybrid forecasting model was expressed as follows:where is the predicted value from a hybrid model at time t, is the weight of the ith single model at time t, and is the predicted value from the ith single model at time t. In the model, the sum of all the is 1.00.

To obtain the predicted ET0 values on days t, predicted results of ET0 from each single model should be multiplied by . Therefore, the final results of ET0 were a product of each allocated weight and the single predicted value of ET0.

2.7. Statistical Evaluation Metrics

In order to evaluate the forecasting performance of proposed models, this paper used the root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination () to analyze the accuracy of ET0 forecasting models from the perspective of the error ratio and goodness of fit. The RMSE and MAPE can be used to represent the average error of the predicted result with respect to the ground-truth result. Lower RMSE and MAPE indicate smaller errors between the predicted and observed values. is used to quantify the correlation between the forecasts and observations. Higher indicates better forecast performance of a forecasting model. The mathematical equations of the statistical indices were described as follows:where n is the number of observations, and are the predicted and observed values of the ith day, respectively, and and are the average values of and for the observation periods, respectively.

2.8. Kruskal–Wallis Test

The Kruskal–Wallis test was used to evaluate the accuracy of forecasted results from single and hybrid models. Different from parametric tests, the Kruskal–Wallis test was a nonparametric test without the data requirement of assumptions of normality and homogeneity of variance. With more than two data groups, it examined the medians of the data groups to determine if the predictions were from distinct populations with the same distribution. It used data ranks to calculate the accuracy instead of using numerical values. More detailed description about the Kruskal–Wallis test can be found in Clark et al. [37].

2.9. Statistical Analysis

In this study, support vector regression, Bayesian linear regression, ridge regression, and lasso regression were conducted using the Python 3.7 programming language. By taking 80% of the historical data as the training set, the data from the rest of the months were used for testing. Data were subjected to analysis of variance (ANOVA) using SPSS 20.0 (SPSS Inc., Chicago, IL, USA). Significance was declared at the probability level of 0.05, unless otherwise stated. Graphs were plotted using Sigmaplot 12.0 (Systat Software, San Jose, CA, USA).

3. Results and Discussion

3.1. Normalization and Correlation Analysis

Most machine learning algorithms required the variables to satisfy a normal distribution. This paper performed a normalization test for the interpreted variables through plotting the data probability distribution. The probability plot indicated the degree to which the actual distribution of the variables was in line with the theoretical normal distribution. The test was used to examine whether the data were in agreement with a normal distribution pattern. If the data followed a normal distribution, the data were regarded coinciding with the theoretical straight line (Figure 2(a)). Based on the distribution of ET0 values, our results showed that after processing of the raw data, the processed ET0 data fell into the −3 to 3 quantiles, suggesting the data conformed to a normal distribution, and were able to be applied to the machine learning algorithms. This result was in agreement with the previous studies conducted in China [38], India [39], Turkey [40], and North America [41].

Before using a model to predict target values, it was usually necessary to perform correlation analysis to remove unrelated variables. This method was used to reduce computational complexity and improve the interpretability of the model [16]. In this study, the Pearson coefficient method was adopted to perform correlation analysis, which was popularly adopted by previous studies [42, 43]. This method mainly measured the linear correlation between variables, with the correlation coefficients from −1 to 1. In the present study, the correlation coefficient between RH and ET0 was less than 0.25, implying a very weak correlation between the two variables (Figure 2(b)). The coefficients among , and ET0 were greater than 0.70, among which had the highest correlation coefficient of 0.84. In a humid subtropical climate of China, it was also observed that was the most correlated parameters to ET0, followed by and [20]. In Quebec, Canada, a noticeable exponential relationship between air temperature and ET0 was observed in a humid continental climate [44]. In northeast China, was considered the greatest contributor to ET0 fluctuations related to low radiation conditions [45]. According to correlation analysis, RH was excluded as an input factor for data feature extraction.

3.2. Forecast Performance of Single Models

In this study, four single models were selected according to the recommendation from previous literature [16, 20], including support vector regression (SVR), Bayesian linear regression, ridge regression, and lasso regression (Figure 3). The four single models showed good capacity to fit the linear relationships between observed and predicted ET0 values. They produced similar ET0 trends to the observed ET0 changes in 2022. The observed ET0 values were from 0.26 to 7.32 mm·d−1 from the P-M model, whereas the predicted ET0 ranges were from 0.24 to 7.48 mm·d−1, 0.45 to 7.54 mm·d−1, 0.09 to 7.12 mm·d−1, and 0.02 to 6.98 mm·d−1 for SVR, Bayesian, lasso, and ridge regression models, respectively. The highest values of ET0 appeared during 160–180 Julian days (corresponding to mid-June), while the lowest values were observed in 1–10 Julian days (corresponding to early January). On average, mean ET0 predicted by SVR was 3.04 mm·d−1, or a decrease by 6.2% compared to the real observations. Similarly, average values of Bayesian linear regression, ridge regression, and lasso regression were 3.01, 2.77, and 2.97 mm·d−1, respectively, or had a decrease by 6.8–17.2%. The annual ET0 values of the four single models were 1002.9–1110.3 mm·yr−1 or had a decrease by 8.1–18.4% compared to the real accumulated ET0 value from the P-M model. It can be concluded that all the single models generated lower averaged and accumulated ET0 values than did the P-M model. However, both the averaged and accumulated ET0 predicted by SVR models were much closer to the real observations than did the linear regression models. In previous studies, Piotrowski et al. found that the SVR model had higher prediction accuracy than linear regression models, such as the ridge regression model [46]. Moreover, among those linear regression models, the Bayesian regression model had higher accuracy than the other two proposed linear models. The reason may lie in that, through establishment of a payoff function, the Bayesian model is able to generate an optimal iteration algorithm to obtain desired predicted values [47].

3.3. Weights Assigned to Hybrid Models

The determination of goodness of fit for single models helped calculate each single model’s weight assigned to hybrid models [48]. In this study, RMSE values of SVR and Bayesian linear regression models were within 0.152–0.168 mm·d−1 for both training and test datasets, while R values of the two models were greater than 0.78, showing better forecast performance (Table 1). The SVR and Bayesian models generated the mean absolute percentage errors (MAPEs) of 23.4% for training and test sets or had a decrease by 29.3% compared to the lasso and ridge models. Therefore, the SVR and Bayesian models were given higher weights (26.3–29.9%) than hybrid models (Figure 4). On average, the weights assigned to lasso and ridge models were 20.3% lower than those of SVR and Bayesian models. This finding was similar to the results observed by Liu et al. [49]. Based on the algorithms of information entropy, the SVR model had the highest weights of 0.299, followed by 0.274 for Bayesian linear regression, 0.203 for lasso regression, and 0.224 for ridge regression, respectively. The information entropy method assigned more weights to SVR and Bayesian models than did the variance reciprocal method.

3.4. Forecast Performance of Hybrid Models

In this study, hybrid forecasting models were established based on the SVR model, Bayesian linear regression, ridge regression, and Lasso regression models (Figure 5). Previous results indicated that hybrid models made good use of information from single models, which effectively increased their forecast accuracy [50]. In this study, variance reciprocal and information entropy methods were adopted to construct hybrid models. The four single forecasting models were incorporated into the hybrid forecasting models according to their assigned weights. The predicted ET0 ranges were from 0.38 to 7.12 mm·d−1 and 0.67 to 7.53 mm·d−1 for information entropy and variance reciprocal models. On average, mean ET0 predicted by the information entropy model was 3.19 mm·d−1 or had a decrease by 1.8% compared to the real observations. Similarly, the average value of the variance reciprocal model was 3.04 mm·d−1 or had a decrease by 6.5%. The annual ET0 values of the two hybrid models were 1157.4–1196.3 mm·yr−1 or had a decrease by 3.1–7.4% compared to the real accumulated ET0 value from the P-M model. Our results indicated that the hybrid forecasting models significantly improved the forecasting accuracy when the advantages of single models were comprehensively incorporated. The hybrid model was more accurate for predicting both daily ET0 dynamics and annual accumulated ET0 values in the North China Plain. Our finding was in agreement with the previous results conducted in the Mediterranean climate of Iran [51].

3.5. Correlation Analysis of Forecasting Models

Compared with the variance reciprocal hybrid model, correlation coefficients (R) were significantly increased, while RMSE values were appreciably decreased by the information entropy hybrid model (Table 1). The information entropy hybrid model generated the mean absolute percentage errors (MAPEs) of 11.9% for training and test sets or a decrease by 39.7%–58.1% compared to the single models and the variance reciprocal model. In this study, R, RMSE, and MAPE values of the reciprocal hybrid model were not significantly different from those of SVR and Bayesian models.

Correlation analysis showed that the information entropy hybrid-based model had the highest coefficient of determination () of 0.922 in 2022, followed by the SVR and Bayesian regression models (Figure 6). The ridge regression model had the lowest value, showing the worst forecasting performance among models. It was surprised that the variance reciprocal hybrid model did not show advantages over the SVR and Bayesian models. The results validated the effectiveness of the information entropy-based hybrid model in improving ET0 forecasting performance. In this study, the information entropy-based hybrid model showed obvious superiority over the four single models and variance reciprocal-based hybrid model. The reason why the variance reciprocal-based model had lower forecasting accuracy might be that the model did not guarantee the errors of the hybrid models were small enough at each time node [52]. The excessive errors at single abnormal moments might result in the failure of the entire model [53]. In this study, we recommended the information entropy-based hybrid model. The advantage of the information entropy weight method was that it determined weights based on the data itself, which had strong objectivity and reduced the influence of subjectivity on forecasted results [54]. The information entropy-based method considered multiple indicators simultaneously, and it was not limited by the evaluation of a single indicator, which was why the information entropy-based hybrid model was preferable to the variance reciprocal model [55].

3.6. Evaluation of Model Forecast Performance

To validate the accuracy of forecasting models, both single and hybrid models were applied to forecast 1–30 lead day ET0 trends using independent datasets from January 2 to February 1, 2022 (Table 2). Moreover, the Taylor diagram was plotted using observed and forecasted data in 2022 for a visual comparison test among different models (Figure 7). The results showed that the information entropy model generated correlation coefficient of 0.90 for 1–30 d ET0 forecasting or an increase by 13.6% compared to the single models and variance reciprocal model. The standard deviation and RMSE of the information entropy model were 1.65 mm·d−1 and 0.61 mm·d−1 or had a decrease by 16.4% and 23.7% compared to other models. The Kruskal–Wallis test was also performed to test the accuracy of the forecasted results. The numerical values of the results concerning the forecasting accuracy, precision, and F1 score for all the models are summarized in Table 2. The forecasting accuracy obtained (97.5%) was maximum for the information entropy hybrid model. There was no significant difference among SVR, Bayesian, and variance reciprocal models. The maximum precision and F1 score were 0.9618 and 0.9742 for the information entropy model. Our results proved that the information entropy hybrid model was the best performance model evaluated. In Turkey, Zouzou and Citakoglu used hybrid models created using SVR and Gaussian process regression models for estimating ET0 [56]. They found that the hybrid model resulted in a reduction in MAE and RMSE. The reason why hybrid models had the ability to lower prediction errors lied in that the innovative weight assignment method reduced the possibility of models outperformance and overfitting by optimizing the weight assignments to models [57]. This increased the generalizability of hybrid models in different climatic zones. The study confirmed that the Kruskal–Wallis method was obvious to do better for accuracy evaluation of models when data pooled did not follow a normalized distribution [37]. This study provides insights on the optimal algorithms of weight determination for the construction of hybrid ET0 forecasting models.

4. Conclusions

To achieve precise ET0 forecast, this study proposed two hybrid models based on variance reciprocal and information entropy algorithms. The two algorithms were used to assign weight of each single model to hybrid models. As a result, hybrid models significantly improved the forecast accuracy compared to the single models. To further investigate the general ability of the hybrid models, forecasted weather data were used to forecast ET0 in 1–30 d lead days in 2022. It was observed that the information entropy-based hybrid model outperformed other forecasting models in improving ET0 forecast performance. This study confirmed that the information entropy-based hybrid model was the one of the most effective hybrid models in midterm (1–30 d) ET0 forecasting in the North China Plain. In future works, more attention should be paid on how to extend the generalizability of hybrid models to other climatic types and to improve the accuracy of long-term (>30 d) ET0 forecasting through integrating the advantages of different regression and machine leaning models.

Data Availability

The data that support the findings of this study are available from the corresponding authors upon reasonable request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors thank the Xinxiang Meteorological Station for helping provide related meteorological and NWP data. This study was supported by the State Key Laboratory of Aridland Crop Science, Gansu Agricultural University (Grant no. GSCS-2020-03), and the China National Agricultural Key & Core Technology R&D Program (Grant no. NK202319080301).