Information Entropy-Based Hybrid Models Improve the Accuracy of Reference Evapotranspiration Forecast

Qin, Anzhen; Fan, Zhilong; Zhang, Liuzeng

doi:https://doi.org/10.1155/2024/9922690

Advances in Meteorology

Research Article | Open Access

Volume 2024 | Article ID 9922690 | https://doi.org/10.1155/2024/9922690

Anzhen Qin,¹,² Zhilong Fan,¹ and Liuzeng Zhang³

Academic Editor: Yaolin Lin

Received 16 Jun 2023

Revised 22 Dec 2023

Accepted 22 Jan 2024

Published 03 Feb 2024

Abstract

Accurate forecasting of reference crop evapotranspiration (ET₀) is vital for sustainable water resource management. In this study, four widely used single models were selected to forecast ET₀ values: support vector regression, Bayesian linear regression, ridge regression, and lasso regression. All four require few input data and fit data well. However, forecast errors inevitably arise in these models because of data noise or overfitting. To improve forecast accuracy, hybrid models were proposed to integrate the advantages of the single models. Before the hybrid models were constructed, each single model’s weight was determined using two weighting methods, namely, the variance reciprocal and information entropy methods. To validate the accuracy of the proposed hybrid models, 1–30 d forecast data from January 2 to February 1, 2022, in Xinxiang, North China Plain, were used as a test set. The results confirmed the feasibility of the information entropy-based hybrid model. In detail, the information entropy model generated a mean absolute percentage error of 11.9%, a decrease of 48.9% compared with the single and variance reciprocal hybrid models. Moreover, the model generated a correlation coefficient of 0.90 for 1–30 d ET₀ forecasting, an increase of 13.6% compared with the other models. The standard deviation and the root mean square error of the information entropy model were 1.65 mm·d⁻¹ and 0.61 mm·d⁻¹, decreases of 16.4% and 23.7%, respectively. The maximum precision and F1 score, 0.9618 and 0.9742, were obtained by the information entropy model. It was concluded that the information entropy-based hybrid model had the best midterm (1–30 d) ET₀ forecasting performance in the North China Plain.

1. Introduction

With the rapid growth of the world population, demand for both food and water resources is increasing dramatically [1]. To cope with this, intensive and water-saving agriculture has been developing rapidly [2]. It is well known that the agricultural sector accounts for 70% of the groundwater withdrawn in China [3, 4]. Furthermore, drought stress occurs more often than before in the context of global warming, resulting in yield stagnation or crop failure in drought-stressed areas [5]. Timely and precise irrigation is one of the most effective approaches to meeting the dual goal of high yields and water saving. With the intensification of global water shortage, it is crucial to develop highly efficient water-saving irrigation techniques [6]. The forecast of reference evapotranspiration (ET₀) is the basis for developing such techniques [7], as crop water requirement can be estimated using ET₀ and crop coefficients. Improving ET₀ forecast accuracy will therefore greatly improve the accuracy of irrigation forecasting.

Due to stochastic changes in weather systems, accurate ET₀ forecasting remains a challenge [8]. To improve ET₀ forecast accuracy, different types of forecasting models have been developed, including physical models, statistical models, and combined hybrid models [9]. Physical models forecast ET₀ from future meteorological data by simulating the relationships among the atmosphere, land surface, and water bodies [10]. However, the limited accuracy of numerical weather prediction (NWP) for long-term meteorological parameters constrains the accuracy of any model that relies on weather forecasts. Statistical models mainly include linear regression models, time-series models, and machine learning models [11]. Because they require few input data and fit data well, these models have been widely adopted for ET₀ forecasting [12]. With a limited number of meteorological factors, linear regression models such as Bayesian linear regression and ridge regression have shown advantages in ET₀ forecasting in China [13], Mediterranean zones [14], and the US High Plains [15]. In addition, several machine learning models have been introduced to forecast ET₀, including back-propagation (BP) neural networks and support vector machines [16]. In Turkey, monthly mean ET₀ was estimated using adaptive network-based fuzzy inference system (ANFIS) and artificial neural network (ANN) models [17]. Both the ANFIS and ANN methods were found to be superior to the Hargreaves and Ritchie methods in estimating ET₀. Given the complexity of ET₀ forecasting, the applicability of most statistical models is limited, so more novel models have been attempted in recent years [18, 19]. To better simulate the dynamics of ET₀ trends, researchers have combined physical and statistical models [20]. These hybrid models were adopted to predict nonstationary data series [21]. In Peninsular Malaysia, a mixed multifractal forecasting model was adopted to forecast ET₀ trends by combining the light gradient boosting machine, decision forest regression, and artificial neural network models [22]. A number of studies also indicated that hybrid forecasting models outperformed single models and greatly improved forecast accuracy [23–25]. For example, in Atakum, Turkey, a hybrid model for ET₀ forecasting was constructed from the autoregressive integrated moving average model and generalized regression neural networks, and it effectively improved ET₀ forecast accuracy [26]. In Brazil, a hybrid model for ET₀ forecasting was established from support vector machine and artificial neural network models, and it showed the highest ET₀ forecast efficiency and accuracy [27]. Although time-series models have also been applied to ET₀ forecasting, unlike hybrid models they cannot reflect the internal correlations among factors [28]. Because time-series models usually do not consider external factors, they are prone to forecast errors when significant external changes occur [29].

To date, determining each single model’s weight in a hybrid model is still a challenging task [30]. Research on weight assignment using different weighting methods remains scarce in ET₀ forecasting [31]. In this study, two hybrid ET₀ forecasting models were proposed based on the variance reciprocal and information entropy algorithms. We hypothesized that the hybrid models would achieve more accurate ET₀ forecasts than single forecasting models. The purposes of this study were as follows: (I) to select the optimal weight determination method for constructing hybrid ET₀ forecasting models, (II) to identify the optimal hybrid ET₀ model by comparing the accuracy of different single and hybrid models, and (III) to explain why the proposed hybrid model has advantages over other models.

2. Materials and Methods

2.1. Data Establishment

Experimental data were collected from Xinxiang Meteorological Station, North China Plain (35°08′ N, 113°45′ E, 73 m a.s.l.). The dataset covered January 1, 2020, to December 31, 2022, and included maximum air temperature ($T_{\max}$), minimum air temperature ($T_{\min}$), mean air temperature ($T_{\text{mean}}$), and relative humidity (RH). These four parameters have shown significant correlations with ET₀ variations in the temperate monsoon climate of China [16]. Features of these data were extracted on the same historical days in each year.

2.2. Feature Extracting

To extract daily features of meteorological data, we supposed that there were H related meteorological factors on each single day. Based on this assumption, the daily eigenvectors on days i and j were expressed as $X_i = (x_{i1}, x_{i2}, \ldots, x_{iH})$ and $X_j = (x_{j1}, x_{j2}, \ldots, x_{jH})$. Feature similarity on days i and j was defined in equation (1), where $F_{ij}$ represents the daily feature similarity, H is the number of meteorological factors, $x_{ih}$ and $x_{jh}$ are the components of the eigenvectors on days i and j, and h is the index of the current meteorological factor.
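The similarity formula itself is not reproduced in this text, so the following is only a minimal sketch of one common way to compare two daily feature vectors. Cosine similarity is an assumption made here for illustration, not necessarily the measure used in the study, and the function name and sample values are hypothetical.

```python
import math


def feature_similarity(day_i, day_j):
    """Cosine similarity between two daily feature vectors (assumed measure)."""
    if len(day_i) != len(day_j):
        raise ValueError("feature vectors must have the same length H")
    dot = sum(a * b for a, b in zip(day_i, day_j))
    norm_i = math.sqrt(sum(a * a for a in day_i))
    norm_j = math.sqrt(sum(b * b for b in day_j))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0  # an all-zero vector carries no direction to compare
    return dot / (norm_i * norm_j)


# Hypothetical normalized features (Tmax, Tmin, Tmean) for two days.
day_a = [0.80, 0.55, 0.68]
day_b = [0.78, 0.57, 0.66]
similarity = feature_similarity(day_a, day_b)
```

Values close to 1 would mark the two days as similar; any threshold for selecting similar days would have to be chosen separately.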

2.3. Data Preprocessing

Because the data features have different dimensions, data normalization was needed in preprocessing to scale the raw data into a common range for further processing. In this study, min-max normalization was adopted to normalize the target parameters. The expression was as follows:

$$ x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{2} $$

where $x^{*}$ is the normalized dimensionless data, $x$ is the original data, $x_{\min}$ is the minimum value in the original data, and $x_{\max}$ is the maximum value in the original data.
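As a small illustration of min-max normalization, the sketch below scales a list of values to the range 0 to 1. The function name and the sample temperatures are hypothetical, and returning zeros for a constant series is a choice made here to avoid division by zero.

```python
def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant series has no spread; return zeros rather than divide by zero.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical daily maximum air temperatures in degrees Celsius.
tmax = [2.0, 10.0, 18.0, 34.0]
tmax_scaled = min_max_normalize(tmax)
```

In a forecasting setting, the minimum and maximum would normally be taken from the training data only and reused for the test data.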

2.4. Data Training and Test

This study divided the dataset into a training set and a test set at an 8:2 ratio. To obtain as much effective information as possible from the 2020–2022 learning data, cross-validation was used to segment the dataset, and 5-fold cross-validation was chosen to obtain the best estimate.
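The split and the folds can be sketched as index bookkeeping. The text does not say whether the 8:2 split was chronological or random, nor whether the folds were drawn from the training portion only. The sketch below assumes a chronological split followed by contiguous folds over the training indices, and both function names are hypothetical.

```python
def chronological_split(n_samples, train_fraction=0.8):
    """Return (train, test) index lists, keeping the earliest samples for training."""
    cut = int(n_samples * train_fraction)
    return list(range(cut)), list(range(cut, n_samples))


def k_fold_indices(indices, k=5):
    """Split indices into k contiguous folds; yield (train, validation) pairs."""
    base, extra = divmod(len(indices), k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        folds.append((training, validation))
        start += size
    return folds


# January 1, 2020 to December 31, 2022 spans 366 + 365 + 365 = 1096 days.
n_days = 1096
train_idx, test_idx = chronological_split(n_days)
folds = k_fold_indices(train_idx, k=5)
```

Each of the five folds serves once as the validation set while the remaining four are used for fitting.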

2.5. Selection of Single Models

2.5.1. Support Vector Regression (SVR)

When a support vector regression (SVR) model was used for forecast analysis, its core was to establish an optimal regression surface using an ε-insensitive loss function [32]. In this way, the deviation of all training samples from this optimal surface can be minimized. The output of the SVR model was a linear combination of intermediate nodes, each of which corresponded to a support vector. The structure of the SVR forecasting model is shown in Figure 1.

2.5.2. Bayesian Linear Regression

Bayesian linear regression is a linear regression solved using Bayesian probabilistic inference [33]. It has the basic properties of Bayesian statistical models: it can solve for the probability density function of the weight coefficients and test model hypotheses based on Bayes factors. Given N sets of independent learning samples, the data samples $X = \{X_1, \ldots, X_N\}$ and $y = \{y_1, \ldots, y_N\}$ were constructed, and the empirical Bayesian test was used in the multiple linear regression model. The Bayesian linear regression model was expressed as follows:

$$ f(X) = X^{T} w, \qquad y = f(X) + \varepsilon \tag{3} $$

where $X$ is the observed data, $y$ is the corresponding target value, N is the number of data samples, $f(X)$ is the Bayesian linear regression model, $w$ is the weight coefficient, and $\varepsilon$ is the residual. In the model, the weight coefficients are independent of the observed data ($X$), and the $\varepsilon$ values are independent and identically distributed. Bayesian linear regression assumes that the residual follows a normal distribution.

2.5.3. Ridge Regression

Ridge regression is an improved least squares estimation method used for the analysis of collinear data. In ridge regression, a penalty on the regression coefficients is introduced to reduce the effect of collinearity among the independent variables [34]. The regression is better suited than ordinary least squares to fitting ill-conditioned data [35]. It also handles collinear independent variables and the lack of explanatory parameters in multiple linear regression [36].

2.5.4. Lasso Regression

Lasso regression is a multiple regression method that performs feature selection by constraining the sum of the absolute values of the regression coefficients. It shrinks the regression coefficient vector strongly, which retains useful data features and yields reliable variable selection [36]. In the regression, the response values $y_i$ are assumed to be independent given the observed values, and the predictors $x_{ij}$ are standardized. Lasso regression was expressed as follows:

$$ (\hat{\alpha}, \hat{\beta}) = \arg\min \sum_{i=1}^{N} \Big( y_i - \alpha - \sum_{j} \beta_j x_{ij} \Big)^{2} \quad \text{subject to} \quad \sum_{j} |\beta_j| \le t \tag{4} $$

where $x_{ij}$ is the standardized observed values, $\alpha$ is the intercept, $\beta_j$ are the regression coefficients, and t is the harmonic (tuning) parameter (t ≥ 0). As t gradually decreases, the regression coefficients also decrease and tend toward zero. When t is small enough, some coefficients become exactly zero, and the corresponding variables are eliminated from the model.
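The different shrinkage behaviour of ridge and lasso regression can be seen in the simplest case of one centered predictor, where both estimators have closed forms. This is a hedged sketch: it uses the penalized forms (penalty strength `lam`), which correspond to the constrained form with parameter t but are not identical in notation, and the function names and data are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning exactly 0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def fit_one_predictor(x, y, lam):
    """Slope estimates for one predictor under OLS, ridge, and lasso penalties.

    Ridge minimizes sum((y - b*x)^2) + lam * b^2.
    Lasso minimizes 0.5 * sum((y - b*x)^2) + lam * |b|.
    Both are computed on centered data, so no intercept is needed.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return {
        "ols": sxy / sxx,
        "ridge": sxy / (sxx + lam),
        "lasso": soft_threshold(sxy, lam) / sxx,
    }


# Hypothetical data with an exact slope of 2.
x = [-1.0, 0.0, 1.0]
y = [-2.0, 0.0, 2.0]
mild = fit_one_predictor(x, y, lam=1.0)
strong = fit_one_predictor(x, y, lam=5.0)
```

With the stronger penalty the ridge slope is reduced but stays nonzero, whereas the lasso slope is set exactly to zero, which is the variable elimination described above.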

2.6. Construction of Hybrid Models

To construct hybrid forecasting models, reasonable weights should be assigned to each single model. The following steps describe the weight determination process for hybrid models.

2.6.1. Determination of Target Attributes

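To determine each model’s target attribute, a decision matrix was established. The matrix was expressed as follows:

$$ X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1s} \\ x_{21} & x_{22} & \cdots & x_{2s} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{ms} \end{pmatrix} \tag{5} $$

where $x_{ij}$ is the predicted value of the ith model on the jth similar day, m is the number of single forecasting models, and s is the total number of similar days.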

2.6.2. Construction of Eigenvalue Matrix

An eigenvalue reduces the linear transformation represented by a matrix to a scalar scaling along the corresponding eigenvector. The properties of a complex matrix can thus be characterized by its eigenvalues and eigenvectors, which simplifies the analysis of complex data. The eigenvalue matrix was expressed in equation (6), where λ is the coefficient matrix of the forecasting models, m is the number of single forecasting models, and s is the total number of similar days.

2.6.3. Normalization of Eigenvalue Matrix

To make different meteorological parameters comparable and easy to use in the calculation of weights, the eigenvalues were normalized using equation (2).

2.6.4. Construction of Matrix R

The normalized eigenvalues $r_{ij}$ were used to obtain the matrix $R$. The calculation formula was expressed as follows:

$$ R_{ij} = \frac{r_{ij}}{\sum_{j=1}^{s} r_{ij}} \tag{7} $$

where s is the total number of similar days and $r_{ij}$ is the normalized eigenvalue of the ith model on the jth similar day.

2.6.5. Information Entropy-Based Weight Determination

Information entropy was adopted to measure how disordered the system data were. The information entropy method is commonly used to evaluate the amount of information carried by a dataset by characterizing the complexity and quantifying the uncertainty in a system. It is a metric that describes the degree of disorder in a system and thereby the diversity of the data. In this study, information entropy was expressed as follows:

$$ E_i = -\frac{1}{\ln s} \sum_{j=1}^{s} R_{ij} \ln R_{ij} \tag{8} $$

where $E_i$ is the information entropy of the ith row of the matrix $R$, s is the total number of similar days, and $R_{ij}$ is the normalized value of the eigenvalue of the ith model on the jth similar day.

Considering the properties of the logarithmic function in equation (8), we defined that, when $R_{ij}$ was equal to zero, $R_{ij} \ln R_{ij}$ was also taken as zero. The function assumes that the weight of a model may approach zero when $R_{ij}$ is extremely small.

The magnitude of each element $w_i$ of the weight vector represents the importance of the corresponding model i in a hybrid model. The larger $w_i$ is, the more important that single model is in the hybrid model, and vice versa. In this study, the weight vector was calculated from the values of $E_i$ as in equation (9), where $w_i$ is the weight on day t, m is the number of single forecasting models, and $E_i$ is the information entropy of the matrix $R$.
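The steps from normalized values to entropy-based weights can be sketched as follows. This is not the study's exact procedure: it assumes the widely used entropy weight convention in which each row is converted to shares, the entropy is scaled by ln s, and the weight is proportional to 1 − entropy. The function name and the sample matrix are hypothetical.

```python
import math


def entropy_weights(r):
    """Return (entropies, weights) for a matrix with one row per model.

    Assumed convention: shares p_ij = r_ij / sum_j r_ij, entropy
    E_i = -sum_j p_ij ln p_ij / ln s, weight w_i = (1 - E_i) / sum_k (1 - E_k).
    """
    s = len(r[0])
    entropies = []
    for row in r:
        total = sum(row)
        if total <= 0.0:
            raise ValueError("each row needs a positive sum")
        shares = [v / total for v in row]
        # Terms with p = 0 are skipped, which applies the 0 * ln 0 = 0 convention.
        entropy = -sum(p * math.log(p) for p in shares if p > 0.0) / math.log(s)
        entropies.append(entropy)
    redundancy = [1.0 - e for e in entropies]
    denom = sum(redundancy)
    if abs(denom) < 1e-12:
        # All rows are uniform, so entropy cannot discriminate: use equal weights.
        return entropies, [1.0 / len(r)] * len(r)
    return entropies, [d / denom for d in redundancy]


# Hypothetical normalized values for two models over four similar days.
r = [
    [0.40, 0.30, 0.20, 0.10],
    [0.70, 0.10, 0.10, 0.10],
]
entropies, weights = entropy_weights(r)
```

Under this convention a row whose values are spread evenly has entropy near 1 and receives little weight, while a row with more contrast receives more.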

2.6.6. Variance Reciprocal-Based Weight Determination

The variance reciprocal method determines each weight from the reciprocal of a single model’s sum of squared errors, taken as a proportion of the total of these reciprocals over all models. This method avoids negative weights and assigns greater weights to more accurate forecasting models. The model was expressed in equation (10), where $e_{it}^{2}$ is the square of the forecast error of the ith single model, $e$ is the total sum of error squares, and $e_i$ is the sum of error squares of the ith single model.

Based on the values of $e_i$ for each single forecasting model, the weight of the ith single model in a hybrid model was expressed as follows:

$$ w_i = \frac{e_i^{-1}}{\sum_{k=1}^{m} e_k^{-1}} \tag{11} $$

where m is the number of single forecasting models and $e_i$ is the sum of squared forecast errors of the ith single forecasting model.
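A minimal sketch of variance reciprocal weighting follows, assuming each model's accuracy is summarized by its sum of squared forecast errors. The function name and error values are hypothetical.

```python
def variance_reciprocal_weights(errors_by_model):
    """Weight each model by the reciprocal of its sum of squared errors."""
    sse = [sum(e * e for e in errors) for errors in errors_by_model]
    if any(v == 0.0 for v in sse):
        raise ValueError("a model with zero error would receive infinite weight")
    reciprocals = [1.0 / v for v in sse]
    total = sum(reciprocals)
    return [v / total for v in reciprocals]


# Hypothetical forecast errors (mm/d) of two models on two days.
errors = [
    [1.0, -1.0],  # sum of squared errors = 2
    [2.0, -2.0],  # sum of squared errors = 8
]
weights = variance_reciprocal_weights(errors)
```

Because reciprocals of positive sums are positive, every weight is positive, and the model with a quarter of the squared error receives four times the weight.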

We assumed that there were m different single forecasting models to be integrated in a hybrid model. According to each model’s weight and predicted values, the hybrid forecasting model was expressed as follows:

$$ \hat{y}_t = \sum_{i=1}^{m} w_{it}\, \hat{y}_{it} \tag{12} $$

where $\hat{y}_t$ is the predicted value from the hybrid model at time t, $w_{it}$ is the weight of the ith single model at time t, and $\hat{y}_{it}$ is the predicted value from the ith single model at time t. In the model, the sum of all the $w_{it}$ is 1.00.

To obtain the predicted ET₀ value on day t, the predicted ET₀ from each single model was multiplied by its weight $w_{it}$. The final ET₀ forecast was therefore the sum of these products of allocated weights and single-model predictions.
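The weighted combination can be illustrated with the information entropy weights reported later in this paper (0.299, 0.274, 0.203, and 0.224). The single-model predictions below are hypothetical values for one day, and the function name is an assumption made for this sketch.

```python
def hybrid_forecast(weights, predictions):
    """Combine single-model predictions as a weighted sum; weights must sum to 1."""
    if set(weights) != set(predictions):
        raise ValueError("weights and predictions must cover the same models")
    if abs(sum(weights.values()) - 1.0) > 1e-6:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * predictions[name] for name in weights)


# Information entropy weights as reported in Section 3.3.
weights = {"svr": 0.299, "bayesian": 0.274, "lasso": 0.203, "ridge": 0.224}
# Hypothetical single-model ET0 predictions for one day (mm/d).
predictions = {"svr": 3.10, "bayesian": 3.00, "lasso": 2.90, "ridge": 2.80}
et0_hybrid = hybrid_forecast(weights, predictions)
```

Since the weights are nonnegative and sum to 1, the hybrid value always lies between the smallest and largest single-model prediction.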

2.7. Statistical Evaluation Metrics

To evaluate the forecasting performance of the proposed models, this paper used the root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination ($R^2$) to analyze the accuracy of ET₀ forecasting models in terms of error ratio and goodness of fit. The RMSE and MAPE represent the average error of the predicted result with respect to the ground-truth result. Lower RMSE and MAPE indicate smaller errors between the predicted and observed values. $R^2$ quantifies the correlation between the forecasts and observations. Higher $R^2$ indicates better forecast performance. The statistical indices were defined as follows:

$$ \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (P_i - O_i)^2} \tag{13} $$

$$ \text{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{P_i - O_i}{O_i} \right| \tag{14} $$

$$ R^2 = \frac{\left[ \sum_{i=1}^{n} (P_i - \bar{P})(O_i - \bar{O}) \right]^2}{\sum_{i=1}^{n} (P_i - \bar{P})^2 \sum_{i=1}^{n} (O_i - \bar{O})^2} \tag{15} $$

where n is the number of observations, $P_i$ and $O_i$ are the predicted and observed values on the ith day, respectively, and $\bar{P}$ and $\bar{O}$ are the average values of $P_i$ and $O_i$ over the observation period, respectively.
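The three metrics can be computed directly from paired predictions and observations. This sketch assumes that the coefficient of determination is the squared Pearson correlation, which is what a definition in terms of both means implies. The function names and sample values are hypothetical.

```python
import math


def rmse(pred, obs):
    """Root mean square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def mape(pred, obs):
    """Mean absolute percentage error in percent; observations must be nonzero."""
    return 100.0 / len(obs) * sum(abs((p - o) / o) for p, o in zip(pred, obs))


def r_squared(pred, obs):
    """Squared Pearson correlation between predictions and observations."""
    mean_p = sum(pred) / len(pred)
    mean_o = sum(obs) / len(obs)
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov * cov / (var_p * var_o)


# Hypothetical predicted and observed ET0 values (mm/d).
predicted = [1.0, 2.0, 3.0]
observed = [1.0, 2.0, 4.0]
scores = (rmse(predicted, observed), mape(predicted, observed),
          r_squared(predicted, observed))
```

MAPE divides by the observed value, so it becomes unstable when observations are close to zero, as can happen with winter ET₀.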

2.8. Kruskal–Wallis Test

The Kruskal–Wallis test was used to compare the forecasted results from single and hybrid models. Unlike parametric tests, the Kruskal–Wallis test is a nonparametric test that does not assume normality or homogeneity of variance. For more than two data groups, it tests whether the groups come from populations with the same distribution, which is often interpreted as a comparison of medians. It uses the ranks of the data rather than their numerical values. A more detailed description of the Kruskal–Wallis test can be found in Clark et al. [37].
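The rank-based statistic behind the test can be sketched in a few lines. This is the textbook H statistic with the standard tie correction, not code from the study. Obtaining a p-value additionally requires the chi-square distribution, for which a library routine such as `scipy.stats.kruskal` would normally be used. The function names and sample groups are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values their average rank.

    Returns the ranks in input order and the size of each group of ties.
    """
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    tie_sizes = []
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        tie_sizes.append(end - pos + 1)
        pos = end + 1
    return ranks, tie_sizes


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction for two or more groups."""
    pooled = [v for group in groups for v in group]
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0.0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


# Hypothetical absolute forecast errors for three models.
h_stat = kruskal_wallis_h([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
```

A larger H indicates that the rank sums of the groups differ more than would be expected if all groups shared one distribution.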

2.9. Statistical Analysis

In this study, support vector regression, Bayesian linear regression, ridge regression, and lasso regression were implemented in Python 3.7. Of the historical data, 80% were taken as the training set, and the data from the remaining months were used for testing. Data were subjected to analysis of variance (ANOVA) using SPSS 20.0 (SPSS Inc., Chicago, IL, USA). Significance was declared at the probability level of 0.05, unless otherwise stated. Graphs were plotted using SigmaPlot 12.0 (Systat Software, San Jose, CA, USA).

3. Results and Discussion

3.1. Normalization and Correlation Analysis

Many machine learning algorithms assume, or work best when, the variables are approximately normally distributed. This paper performed a normality test for the interpreted variables by plotting the data probability distribution. The probability plot indicated the degree to which the actual distribution of the variables was in line with the theoretical normal distribution. If the data follow a normal distribution, the points coincide with the theoretical straight line (Figure 2(a)). Based on the distribution of ET₀ values, our results showed that, after processing of the raw data, the processed ET₀ data fell within the −3 to 3 quantiles, suggesting that the data conformed to a normal distribution and could be used with the machine learning algorithms. This result was in agreement with previous studies conducted in China [38], India [39], Turkey [40], and North America [41].

Before a model is used to predict target values, correlation analysis is usually performed to remove unrelated variables. This reduces computational complexity and improves the interpretability of the model [16]. In this study, the Pearson coefficient method, which has been widely adopted in previous studies [42, 43], was used for correlation analysis. This method measures the linear correlation between variables, with correlation coefficients ranging from −1 to 1. In the present study, the correlation coefficient between RH and ET₀ was less than 0.25, implying a very weak correlation between the two variables (Figure 2(b)). The coefficients between ET₀ and the three temperature variables ($T_{\max}$, $T_{\min}$, and $T_{\text{mean}}$) were greater than 0.70, with the highest correlation coefficient being 0.84. In a humid subtropical climate of China, the same variable was also found to be the parameter most correlated with ET₀ [20]. In Quebec, Canada, a noticeable exponential relationship between air temperature and ET₀ was observed in a humid continental climate [44]. In northeast China, a single meteorological variable was considered the greatest contributor to ET₀ fluctuations under low radiation conditions [45]. According to the correlation analysis, RH was excluded as an input factor for data feature extraction.
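The screening step can be sketched as computing the Pearson coefficient of each candidate input against ET₀ and dropping inputs below a cutoff. The 0.25 cutoff mirrors the value mentioned above for RH, but its use as a threshold is an assumption of this sketch, as are the function name and the made-up data.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length series."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily values; real inputs would be the station records.
et0 = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "tmax": [5.0, 11.0, 14.0, 22.0, 25.0],
    "rh": [60.0, 40.0, 70.0, 45.0, 62.0],
}

# Keep only inputs whose absolute correlation with ET0 reaches the cutoff.
cutoff = 0.25
correlations = {name: pearson(series, et0) for name, series in candidates.items()}
selected = [name for name, r in correlations.items() if abs(r) >= cutoff]
```

Pearson correlation captures only linear association, so an input with a strong nonlinear relationship to ET₀ could still be screened out by this rule.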

3.2. Forecast Performance of Single Models

In this study, four single models were selected following recommendations in the literature [16, 20]: support vector regression (SVR), Bayesian linear regression, ridge regression, and lasso regression (Figure 3). The four single models fitted the linear relationships between observed and predicted ET₀ values well and reproduced the observed ET₀ trends in 2022. The observed ET₀ values from the P-M model ranged from 0.26 to 7.32 mm·d⁻¹, whereas the predicted ET₀ ranged from 0.24 to 7.48 mm·d⁻¹, 0.45 to 7.54 mm·d⁻¹, 0.09 to 7.12 mm·d⁻¹, and 0.02 to 6.98 mm·d⁻¹ for the SVR, Bayesian, lasso, and ridge regression models, respectively. The highest ET₀ values appeared during Julian days 160–180 (mid-June), while the lowest values were observed during Julian days 1–10 (early January). On average, the mean ET₀ predicted by SVR was 3.04 mm·d⁻¹, a decrease of 6.2% compared with the real observations. Similarly, the average values of Bayesian linear regression, ridge regression, and lasso regression were 3.01, 2.77, and 2.97 mm·d⁻¹, respectively, decreases of 6.8–17.2%. The annual ET₀ values of the four single models were 1002.9–1110.3 mm·yr⁻¹, decreases of 8.1–18.4% compared with the real accumulated ET₀ value from the P-M model. It can be concluded that all the single models generated lower averaged and accumulated ET₀ values than the P-M model did. However, both the averaged and accumulated ET₀ predicted by the SVR model were much closer to the real observations than those predicted by the linear regression models. In previous studies, Piotrowski et al. found that the SVR model had higher prediction accuracy than linear regression models, such as the ridge regression model [46]. Moreover, among the linear regression models, the Bayesian regression model had higher accuracy than the other two. The reason may be that, by establishing a payoff function, the Bayesian model is able to generate an optimal iteration algorithm to obtain the desired predicted values [47].

3.3. Weights Assigned to Hybrid Models

Determining the goodness of fit of the single models helped calculate the weight assigned to each single model in the hybrid models [48]. In this study, RMSE values of the SVR and Bayesian linear regression models were within 0.152–0.168 mm·d⁻¹ for both training and test datasets, while R values of the two models were greater than 0.78, showing better forecast performance (Table 1). The SVR and Bayesian models generated mean absolute percentage errors (MAPEs) of 23.4% for training and test sets, a decrease of 29.3% compared with the lasso and ridge models. Therefore, the SVR and Bayesian models were given higher weights (26.3–29.9%) in the hybrid models than the lasso and ridge models (Figure 4). On average, the weights assigned to the lasso and ridge models were 20.3% lower than those of the SVR and Bayesian models. This finding was similar to the results observed by Liu et al. [49]. Based on the information entropy algorithm, the SVR model had the highest weight of 0.299, followed by 0.274 for Bayesian linear regression, 0.224 for ridge regression, and 0.203 for lasso regression. The information entropy method assigned more weight to the SVR and Bayesian models than the variance reciprocal method did.

3.4. Forecast Performance of Hybrid Models

In this study, hybrid forecasting models were established based on the SVR, Bayesian linear regression, ridge regression, and lasso regression models (Figure 5). Previous results indicated that hybrid models made good use of information from single models, which effectively increased their forecast accuracy [50]. Variance reciprocal and information entropy methods were adopted to construct the hybrid models, and the four single forecasting models were incorporated according to their assigned weights. The predicted ET₀ ranged from 0.38 to 7.12 mm·d⁻¹ for the information entropy model and from 0.67 to 7.53 mm·d⁻¹ for the variance reciprocal model. On average, the mean ET₀ predicted by the information entropy model was 3.19 mm·d⁻¹, a decrease of 1.8% compared with the real observations. Similarly, the average value of the variance reciprocal model was 3.04 mm·d⁻¹, a decrease of 6.5%. The annual ET₀ values of the two hybrid models were 1157.4–1196.3 mm·yr⁻¹, decreases of 3.1–7.4% compared with the real accumulated ET₀ value from the P-M model. Our results indicated that the hybrid forecasting models significantly improved forecasting accuracy when the advantages of the single models were comprehensively incorporated. The hybrid model was more accurate in predicting both daily ET₀ dynamics and annual accumulated ET₀ values in the North China Plain. Our finding was in agreement with previous results obtained in the Mediterranean climate of Iran [51].

3.5. Correlation Analysis of Forecasting Models

Compared with the variance reciprocal hybrid model, the information entropy hybrid model significantly increased the correlation coefficients (R) and appreciably decreased the RMSE values (Table 1). The information entropy hybrid model generated a mean absolute percentage error (MAPE) of 11.9% for training and test sets, a decrease of 39.7%–58.1% compared with the single models and the variance reciprocal model. In this study, the R, RMSE, and MAPE values of the variance reciprocal hybrid model were not significantly different from those of the SVR and Bayesian models.

Correlation analysis showed that the information entropy-based hybrid model had the highest coefficient of determination ($R^2$) of 0.922 in 2022, followed by the SVR and Bayesian regression models (Figure 6). The ridge regression model had the lowest $R^2$ value, showing the worst forecasting performance among the models. Surprisingly, the variance reciprocal hybrid model did not show advantages over the SVR and Bayesian models. The results validated the effectiveness of the information entropy-based hybrid model in improving ET₀ forecasting performance: it showed clear superiority over the four single models and the variance reciprocal-based hybrid model. The variance reciprocal-based model may have had lower forecasting accuracy because it did not guarantee that the errors of the hybrid model were small enough at each time node [52]. Excessive errors at single abnormal moments might result in the failure of the entire model [53]. In this study, we recommended the information entropy-based hybrid model. The advantage of the information entropy weight method was that it determined weights from the data itself, which made it strongly objective and reduced the influence of subjectivity on forecasted results [54]. The information entropy-based method considered multiple indicators simultaneously and was not limited to the evaluation of a single indicator, which was why the information entropy-based hybrid model was preferable to the variance reciprocal model [55].

3.6. Evaluation of Model Forecast Performance

To validate the accuracy of the forecasting models, both single and hybrid models were applied to forecast ET₀ trends at 1–30 lead days using independent datasets from January 2 to February 1, 2022 (Table 2). Moreover, a Taylor diagram was plotted using observed and forecasted data in 2022 for a visual comparison among the models (Figure 7). The results showed that the information entropy model generated a correlation coefficient of 0.90 for 1–30 d ET₀ forecasting, an increase of 13.6% compared with the single models and the variance reciprocal model. The standard deviation and RMSE of the information entropy model were 1.65 mm·d⁻¹ and 0.61 mm·d⁻¹, decreases of 16.4% and 23.7%, respectively, compared with the other models. The Kruskal–Wallis test was also performed on the forecasted results. The forecasting accuracy, precision, and F1 score of all the models are summarized in Table 2. The highest forecasting accuracy (97.5%) was obtained by the information entropy hybrid model. There was no significant difference among the SVR, Bayesian, and variance reciprocal models. The maximum precision and F1 score, 0.9618 and 0.9742, were also obtained by the information entropy model. Our results indicated that the information entropy hybrid model was the best-performing model evaluated. In Turkey, Zouzou and Citakoglu used hybrid models built from SVR and Gaussian process regression models to estimate ET₀ [56]. They found that the hybrid model reduced MAE and RMSE. Hybrid models were able to lower prediction errors because the weight assignment method reduced the risk of overfitting by optimizing the weights assigned to the models [57]. This increased the generalizability of hybrid models across climatic zones. The study confirmed that the Kruskal–Wallis method was better suited to evaluating model accuracy when the pooled data did not follow a normal distribution [37]. This study provides insights into the optimal weight determination algorithms for constructing hybrid ET₀ forecasting models.

4. Conclusions

To achieve precise ET₀ forecasts, this study proposed two hybrid models based on the variance reciprocal and information entropy algorithms. The two algorithms were used to assign the weight of each single model in the hybrid models. As a result, the hybrid models significantly improved forecast accuracy compared with the single models. To further investigate the generalization ability of the hybrid models, forecasted weather data were used to forecast ET₀ at 1–30 lead days in 2022. The information entropy-based hybrid model outperformed the other forecasting models in ET₀ forecast performance. This study confirmed that the information entropy-based hybrid model was one of the most effective hybrid models for midterm (1–30 d) ET₀ forecasting in the North China Plain. In future work, more attention should be paid to extending the generalizability of hybrid models to other climatic types and to improving the accuracy of long-term (>30 d) ET₀ forecasting by integrating the advantages of different regression and machine learning models.

Data Availability

The data that support the findings of this study are available from the corresponding authors upon reasonable request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors thank the Xinxiang Meteorological Station for helping provide related meteorological and NWP data. This study was supported by the State Key Laboratory of Aridland Crop Science, Gansu Agricultural University (Grant no. GSCS-2020-03), and the China National Agricultural Key & Core Technology R&D Program (Grant no. NK202319080301).

References

T. Sternberg, “Regional drought has a global impact,” Nature, vol. 472, no. 7342, p. 169, 2011.
D. Tilman, C. Balzer, J. Hill, and B. Befort, “Global food demand and the sustainable intensification of agriculture,” Proceedings of the National Academy of Sciences, vol. 108, no. 50, pp. 20260–20264, 2011.
F. Wang, J. Xiao, B. Ming et al., “Grain yields and evapotranspiration dynamics of drip-irrigated maize under high plant density across arid to semi-humid climates,” Agricultural Water Management, vol. 247, Article ID 106726, 2021.
S. Wang, Y. Hu, R. Yuan, W. Feng, Y. Pan, and Y. Yang, “Ensuring water security, food security, and clean water in the North China Plain—conflicting strategies,” Current Opinion in Environmental Sustainability, vol. 40, pp. 63–71, 2019.
C. Yang, H. Fraga, W. V. Ieperen, and J. Santos, “Assessment of irrigated maize yield response to climate change scenarios in Portugal,” Agricultural Water Management, vol. 184, pp. 178–190, 2017.
S. Yan, Y. Wu, J. Fan et al., “Optimization of drip irrigation and fertilization regimes to enhance winter wheat grain yield by improving post-anthesis dry matter accumulation and translocation in northwest China,” Agricultural Water Management, vol. 271, Article ID 107782, 2022.
L. Pereira, P. Paredes, F. Melton et al., “Prediction of crop coefficients from fraction of ground cover and height. Background and validation using ground and remote sensing data,” Agricultural Water Management, vol. 241, Article ID 106197, 2020.
J. Nie, P. Dai, and A. Sobel, “Dry and moist dynamics shape regional patterns of extreme precipitation sensitivity,” Proceedings of the National Academy of Sciences, vol. 117, no. 16, pp. 8757–8763, 2020.
B. Mueller and S. Seneviratne, “Systematic land climate and evapotranspiration biases in CMIP5 simulations,” Geophysical Research Letters, vol. 41, no. 1, pp. 128–134, 2014.
K. Wang and R. Dickinson, “A review of global terrestrial evapotranspiration: observation, modeling, climatology, and climatic variability,” Reviews of Geophysics, vol. 50, no. 2, p. 2005, 2012.
M. Al-Gabalawy, N. Hosny, and A. Adly, “Probabilistic forecasting for energy time series considering uncertainties based on deep learning algorithms,” Electric Power Systems Research, vol. 196, Article ID 107216, 2021.
Y. Feng, Y. Peng, N. Cui, D. Gong, and K. Zhang, “Modeling reference evapotranspiration using extreme learning machine and generalized regression neural network only with temperature data,” Computers and Electronics in Agriculture, vol. 136, pp. 71–78, 2017.
Y. Dong, Y. Zhao, J. Zhai et al., “Changes in reference evapotranspiration over the non-monsoon region of China during 1961–2017: relationships with atmospheric circulation and attributions,” International Journal of Climatology, vol. 41, no. S1, pp. 734–751, 2020.
P. Paredes, G. Rodrigues, I. Alves, and L. Pereira, “Partitioning evapotranspiration, yield prediction and economic returns of maize under various irrigation management strategies,” Agricultural Water Management, vol. 135, pp. 27–39, 2014.
B. Scanlon, C. Faunt, L. Longuevergne et al., “Groundwater depletion and sustainability of irrigation in the US high Plains and central valley,” Proceedings of the National Academy of Sciences, vol. 109, no. 24, pp. 9320–9325, 2012.
A. Qin, Z. Fan, and L. Zhang, “Hybrid genetic algorithm−based BP neural network models optimize estimation performance of reference crop evapotranspiration in China,” Applied Sciences, vol. 12, no. 20, Article ID 10689, 2022.
H. Citakoglu, M. Cobaner, T. Haktanir, and O. Kisi, “Estimation of monthly mean reference evapotranspiration in Turkey,” Water Resources Management, vol. 28, no. 1, pp. 99–113, 2014.
M. Chia, Y. Huang, and C. Koo, “Support vector machine enhanced empirical reference evapotranspiration estimation with limited meteorological parameters,” Computers and Electronics in Agriculture, vol. 175, Article ID 105577, 2020.
J. Fan, X. Wang, L. Wu et al., “Comparison of support vector machine and extreme gradient boosting for predicting daily global solar radiation using temperature and precipitation in humid subtropical climates: a case study in China,” Energy Conversion and Management, vol. 164, pp. 102–111, 2018.
L. Ferreira and F. Da Cunha, “New approach to estimate daily reference evapotranspiration based on hourly temperature and relative humidity using machine learning and deep learning,” Agricultural Water Management, vol. 234, Article ID 106113, 2020.
S. Wang, J. Gong, H. Gao, W. Liu, and Z. Feng, “Gaussian process regression and cooperation search algorithm for forecasting nonstationary runoff time series,” Water, vol. 15, no. 11, p. 2111, 2023.
S. Yong, J. Ng, Y. Huang, and C. Ang, “Estimation of reference crop evapotranspiration with three different machine learning models and limited meteorological variables,” Agronomy, vol. 13, no. 4, p. 1048, 2023.
A. Pour-Ali Baba, J. Shiri, O. Kisi, A. Fard, S. Kim, and R. Amini, “Estimating daily reference evapotranspiration using available and estimated climatic data by adaptive neuro-fuzzy inference system (ANFIS) and artificial neural network (ANN),” Hydrology Research, vol. 44, no. 1, pp. 131–146, 2013.
V. Kishore and M. Pushpalatha, “Forecasting evapotranspiration for irrigation scheduling using neural networks and ARIMA,” International Journal of Applied Engineering Research, vol. 12, pp. 10841–10847, 2017.
M. Sattari, P. Mahesh, K. Yürekli, and A. Ünlükara, “M5 model trees and neural network based modelling of ET₀ in Ankara, Turkey,” Turkish Journal of Engineering and Environmental Sciences, vol. 37, pp. 211–219, 2013.
E. Küçüktopcu, E. Cemek, B. Cemek, and H. Simsek, “Hybrid statistical and machine learning methods for daily evapotranspiration modeling,” Sustainability, vol. 15, no. 7, p. 5689, 2023.
L. Ferreira, F. da Cunha, R. de Oliveira, and E. Fernandes Filho, “Estimation of reference evapotranspiration in Brazil with limited meteorological data using ANN and SVM—a new approach,” Journal of Hydrology, vol. 572, pp. 556–570, 2019.
P. Hebbalaguppae Krishnashetty, J. Balasangameshwara, S. Sreeman, S. Desai, and A. Bengaluru Kantharaju, “Cognitive computing models for estimation of reference evapotranspiration: a review,” Cognitive Systems Research, vol. 70, pp. 109–116, 2021.
P. Banda, B. Cemek, and E. Küçüktopcu, “Estimation of daily reference evapotranspiration by neuro computing techniques using limited data in a semi-arid environment,” Archives of Agronomy and Soil Science, vol. 64, no. 7, pp. 916–929, 2018.
Y. Zhang, Y. Zhao, C. Kong, and B. Chen, “A new prediction method based on VMD-PRBF-ARMA-E model considering wind speed characteristic,” Energy Conversion and Management, vol. 203, Article ID 112254, 2020.
F. Granata, “Evapotranspiration evaluation models based on machine learning algorithms—a comparative study,” Agricultural Water Management, vol. 217, pp. 303–315, 2019.
J. Ha, Y. Kim, H. Im, N. Kim, S. Sim, and Y. Yoon, “Error correction of meteorological data obtained with mini-AWSs based on machine learning,” Advances in Meteorology, vol. 2018, Article ID 7210137, 8 pages, 2018.
D. Joshi, A. St-Hilaire, T. Ouarda, and A. Daigle, “Statistical downscaling of precipitation and temperature using sparse Bayesian learning, multiple linear regression and genetic programming frameworks,” Canadian Water Resources Journal/Revue canadienne des ressources hydriques, vol. 40, no. 4, pp. 392–408, 2015.
R. Obenchain, “Efficient generalized ridge regression,” Open Statistics, vol. 3, pp. 1–18, 2022.
M. Rajan, “An efficient Ridge regression algorithm with parameter estimation for data analysis in machine learning,” SN Computer Science, vol. 3, no. 2, p. 171, 2022.
B. Kibria and A. Lukman, “A new ridge-type estimator for the linear regression model: simulations and applications,” Scientifica, vol. 2020, Article ID 9758378, 16 pages, 2020.
J. Clark, P. Kulig, K. Podsiadło et al., “Empirical investigations into Kruskal-Wallis power studies utilizing Bernstein fits, simulations and medical study datasets,” Scientific Reports, vol. 13, no. 1, p. 2352, 2023.
X. Jiang, G. Wang, Y. Wang, J. Yao, B. Xue, and Y. A, “A hybrid framework for simulating actual evapotranspiration in data-deficient areas: a case study of the Inner Mongolia section of the Yellow River Basin,” Remote Sensing, vol. 15, no. 9, p. 2234, 2023.
P. Rai, P. Kumar, N. Al-Ansari, and A. Malik, “Evaluation of machine learning versus empirical models for monthly reference evapotranspiration estimation in Uttar Pradesh and Uttarakhand States, India,” Sustainability, vol. 14, no. 10, p. 5771, 2022.
M. Cobaner, H. Citakoğlu, T. Haktanir, and O. Kisi, “Modifying Hargreaves–Samani equation with meteorological variables for estimation of reference evapotranspiration in Turkey,” Hydrology Research, vol. 48, no. 2, pp. 480–497, 2017.
S. Ravindran, S. Bhaskaran, and S. Ambat, “A deep neural network architecture to model reference evapotranspiration using a single input meteorological parameter,” Environmental Processes, vol. 8, no. 4, pp. 1567–1599, 2021.
B. Yildiz, J. Bilbao, and A. Sproul, “A review and analysis of regression and machine learning models on commercial building electricity load forecasting,” Renewable and Sustainable Energy Reviews, vol. 73, pp. 1104–1122, 2017.
H. Zhang, J. Chang, L. Zhang, Y. Wang, Y. Li, and X. Wang, “NDVI dynamic changes and their relationship with meteorological factors and soil moisture,” Environmental Earth Sciences, vol. 77, no. 16, p. 582, 2018.
S. Ricard and F. Anctil, “Forcing the Penman-Montheith formulation with humidity, radiation, and wind speed taken from reanalyses, for hydrologic modeling,” Water, vol. 11, no. 6, p. 1214, 2019.
X. Liu, C. Liu, X. Liu, C. Li, L. Cai, and M. Dong, “Spatial and temporal variation in reference evapotranspiration and its climatic drivers in Northeast China,” Water, vol. 14, no. 23, p. 3911, 2022.
P. Piotrowski, D. Baczyński, M. Kopyt, and T. Gulczyński, “Advanced ensemble methods using machine learning and deep learning for one-day-ahead forecasts of electric energy production in wind farms,” Energies, vol. 15, no. 4, p. 1252, 2022.
C. Wang, P. Shang, and P. Shen, “An improved artificial bee colony algorithm based on Bayesian estimation,” Complex and Intelligent Systems, vol. 8, no. 6, pp. 4971–4991, 2022.
H. Chen and Z. Sheng, “A kind of new combination forecasting method based on induced ordered weighted geometric averaging (IOWGA) operator,” Journal of Industrial Engineering and Management, vol. 19, pp. 36–39, 2005.
D. Liu, L. Wu, and Y. Yang, “A hybrid weight assignment model for urban underground space resources evaluation integrated with the weight of time dimension,” Applied Sciences, vol. 10, no. 15, p. 5152, 2020.
G. Guizzi, C. Silvestri, E. Romano, and R. Revetria, “A comparison of forecast models to predict weather parameters,” Advances in Energy and Environmental Science and Engineering, vol. 2, pp. 88–96, 2015.
B. Shirmohammadi, H. Moradi, V. Moosavi, M. T. Semiromi, and A. Zeinali, “Forecasting of meteorological drought using Wavelet-ANFIS hybrid model for different time steps (case study: southeastern part of east Azerbaijan province, Iran),” Natural Hazards, vol. 69, no. 1, pp. 389–402, 2013.
A. Araghi, J. Adamowski, and C. J. Martinez, “Comparison of wavelet-based hybrid models for the estimation of daily reference evapotranspiration in different climates,” Journal of Water and Climate Change, vol. 11, no. 1, pp. 39–53, 2018.
Z. Shang, M. Li, Y. Chen, C. Li, Y. Yang, and L. Li, “A novel model based on multiple input factors and variance reciprocal: application on wind speed forecasting,” Soft Computing, vol. 26, no. 17, pp. 8857–8877, 2022.
J. Zhang, D. Chen, and P. Gao, “A divide-and-conquer information entropy algorithm for dependency matrix processing,” IEEE Access, vol. 11, pp. 121306–121313, 2023.
S. Liu and L. Chao, “Approach of optimal diagnosis test sequence based on improved information entropy,” Electronic Measurement Technology, vol. 36, no. 12, pp. 28–31, 2013.
Y. Zouzou and H. Citakoglu, “General and regional cross-station assessment of machine learning models for estimating reference evapotranspiration,” Acta Geophysica, vol. 71, no. 2, pp. 927–947, 2022.
S. Bayram and H. Çıtakoğlu, “Modeling monthly reference evapotranspiration process in Turkey: application of machine learning methods,” Environmental Monitoring and Assessment, vol. 195, no. 1, p. 67, 2023.

Copyright

Copyright © 2024 Anzhen Qin et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
