Abstract

Accurate estimation of reference evapotranspiration (ETo) is key to agricultural irrigation scheduling and water resources management in arid and semiarid areas. This study evaluates the capability of coupling the bat algorithm with the XGBoost method (i.e., the BAXGB model) for estimating monthly ETo in the arid and semiarid regions of China. Meteorological data from three stations (Datong, Yinchuan, and Taiyuan) during 1991–2015 were used to build the BAXGB model, the multivariate adaptive regression splines (MARS) model, and the Gaussian process regression (GPR) model. Six input combinations with different sets of meteorological parameters were applied for model training and testing, which included mean air temperature (Tmean), maximum air temperature (Tmax), minimum air temperature (Tmin), wind speed (U), relative humidity (RH), and solar radiation (Rs) or extraterrestrial radiation (Ra, MJ·m−2·d−1). The results indicated that the BAXGB models (RMSE = 0.114–0.412 mm·d−1, MAE = 0.087–0.302 mm·d−1, and R2 = 0.937–0.996) were more accurate than either the MARS (RMSE = 0.146–0.512 mm·d−1, MAE = 0.112–0.370 mm·d−1, and R2 = 0.935–0.994) or the GPR (RMSE = 0.289–0.714 mm·d−1, MAE = 0.197–0.564 mm·d−1, and R2 = 0.817–0.980) models for estimating ETo. The findings of this study should be helpful for agricultural irrigation scheduling in arid and semiarid regions and may serve as a reference in other regions where accurate models for improving local water management are needed.

1. Introduction

Water enters the atmosphere from the land surface in two ways. One is the direct evaporation of water from soils and plant surfaces; the other is transpiration, by which water from the soil passes through plants, evaporates from the leaves, and diffuses into the surrounding air through stomata. Technically, it is difficult to distinguish between these two pathways, which are collectively defined as crop evapotranspiration (ETc). Although ETc can be directly measured by techniques such as lysimeters or eddy covariance, these techniques cannot be applied widely due to their high observation and maintenance costs. Therefore, ETc is generally estimated by multiplying the reference crop evapotranspiration (ETo) by the crop coefficient (Kc). According to Allen et al. [1], ETo represents the evapotranspiration from an actively growing virtual vegetated surface that is 0.12 m tall, completely shades the ground, has an adequate water supply, and is free of pests and diseases. At present, ETo has become a fundamental parameter in regional water resource management and irrigation scheduling. The most widely accepted ETo estimation method is the FAO-56 Penman–Monteith method (P-M), which has been proposed as a standard method for validating other methods because of its global applicability and simplicity (no need for parameter calibration) [1]. The method is a function of basic meteorological parameters only, including temperature, solar radiation, wind speed, and relative humidity. However, even basic meteorological information is often lacking in reality, especially in developing countries, because of limited observational capacity and costs. For example, the lack of solar radiation records is a common problem in most parts of the world. Although there are over 2000 weather stations in China, fewer than 130 of them have observational records of radiation [24]. To address the lack of field observational data, various alternative methods, including empirical models, statistical models, artificial intelligence algorithms, and satellite remote sensing technology, have been proposed and studied worldwide. Although these methods have different application scopes and characteristics, they all play an important role in improving water resources management.
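
For reference, the P-M method outlined above can be written as a short function. The sketch below follows the standard FAO-56 daily form, assuming that net radiation and soil heat flux are already available as inputs; the function name, argument names, and default values are illustrative only.

```python
import math

def penman_monteith_eto(tmean, tmax, tmin, rh_mean, u2, rn, g=0.0, elev=0.0):
    """FAO-56 Penman-Monteith reference evapotranspiration (mm per day).

    tmean/tmax/tmin: air temperatures (degC); rh_mean: relative humidity (%);
    u2: wind speed at 2 m (m/s); rn: net radiation (MJ m-2 d-1);
    g: soil heat flux (MJ m-2 d-1); elev: station elevation (m).
    """
    def svp(t):  # saturation vapour pressure at temperature t (kPa)
        return 0.6108 * math.exp(17.27 * t / (t + 237.3))

    es = (svp(tmax) + svp(tmin)) / 2.0                     # mean saturation vapour pressure (kPa)
    ea = rh_mean / 100.0 * es                              # actual vapour pressure (kPa)
    delta = 4098.0 * svp(tmean) / (tmean + 237.3) ** 2     # slope of vapour pressure curve (kPa/degC)
    p = 101.3 * ((293.0 - 0.0065 * elev) / 293.0) ** 5.26  # atmospheric pressure (kPa)
    gamma = 0.000665 * p                                   # psychrometric constant (kPa/degC)
    num = 0.408 * delta * (rn - g) + gamma * 900.0 / (tmean + 273.0) * u2 * (es - ea)
    return num / (delta + gamma * (1.0 + 0.34 * u2))
```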

Artificial intelligence (AI) algorithms have become popular recently because of their high flexibility and precision and have been applied widely in the fields of hydrology, agriculture, and the environment [5–8], such as drought forecasting [9], solar radiation modeling [2, 10], temperature estimation [11], precipitation modeling [12], soil moisture simulation [13], and reference evapotranspiration estimation [14–17]. In the early years, artificial neural networks (ANNs) were used to simulate ETo, generating more accurate results than empirical models [18]. Kisi [19] compared the least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS), and M5 model tree (M5T) for estimating ETo in Antalya and Isparta, Turkey, and found that the MARS algorithm was the most accurate when local input data were not sufficient. Sanikhani et al. [20] compared six AI algorithms, including multilayer perceptron (MLP), generalized regression neural network (GRNN), radial basis neural networks (RBNN), ANFIS integrated with grid partitioning and subtractive clustering (ANFIS-GP and ANFIS-SC), and GEP, with data from the Antalya and Isparta stations. The results tended to be station specific: GEP and GRNN outperformed the other algorithms at the Antalya station, while RBNN and ANFIS-SC were superior at the other station. Saggi and Jain [21] used four AI algorithms (deep learning multilayer perceptron (DL), generalized linear model (GLM), random forest (RF), and gradient boosting decision tree (GBDT)) to determine daily ETo in Punjab, India. The DL algorithm showed high performance in modeling daily ETo, with NSE = 0.95–0.98, RMSE = 0.192–0.269 mm·d−1, and R2 = 0.95–0.99. Landeras et al. [22] proposed an alternative strategy, using combined data from a local station and its neighboring stations as input for ETo estimation with GEP and ANN, the results of which showed that GEP performed very well at all stations. Feng et al. [23] evaluated the extreme learning machine (ELM), GRNN, and the Hargreaves model for estimating daily ETo using temperature data. They found that the Hargreaves model was worse than the AI algorithms, while ELM and GRNN performed best in the local and cross-station scenarios, respectively. Fan et al. [24] evaluated four tree-based algorithms (M5T, RF, GBDT, and XGBoost) for estimating ETo at eight stations in China and found that XGBoost yielded satisfactory accuracy at a much lower computational cost than the other algorithms. Moreover, although the Gaussian process regression (GPR) algorithm, which has a clear mathematical basis, has been applied widely, only a few studies to date have used it for crop ETo estimation [25].

Parameters of AI models usually need to be calibrated, which is generally time-consuming and requires solid professional knowledge, especially when the inputs are complicated (e.g., input data from multiple sites or in multiple combinations) [26]. To overcome this problem, optimization with metaheuristic algorithms has been widely adopted recently, such as using the genetic algorithm (GA) to optimize ANN for ETo estimation in Iran [27] and hybridizing the firefly algorithm with ANFIS for estimating ETo in Burkina Faso [28]. The effects of metaheuristic optimization tend to be algorithm specific, with both positive and negative effects reported in various studies. For instance, Petković et al. [29] found that the radial basis function neural network (RBFNN) coupled with the particle swarm optimization (PSO) algorithm was more accurate than traditional artificial neural network algorithms.

In recent years, the bat algorithm (BA) has received much attention because of its strong searching ability and robustness, and its application has been discussed in many fields [28, 30, 31]. In addition, as a tree-based ensemble algorithm, XGBoost has proved superior to simpler tree algorithms such as RF and M5T in terms of accuracy and efficiency (i.e., lower computational cost) for ETo estimation [24]. However, XGBoost contains many parameters, which affect its stability and efficiency during processing. Therefore, using the bat algorithm to optimize XGBoost is expected to improve the accuracy and stability of the single XGBoost model. To our knowledge, studies using evolutionary algorithms to optimize the model parameters of XGBoost in the fields of hydrology, agriculture, and the environment are scarce to date, especially for ETo estimation. Thus, the aims of this study were (1) to assess the capability of XGBoost optimized by BA in estimating monthly ETo with limited meteorological data as inputs and (2) to evaluate the performance of the BAXGB model for ETo estimation in comparison with the GPR and MARS models in the arid and semiarid regions of China.

2. Materials and Methods

2.1. Study Area and Data

Monthly meteorological parameters during 1991–2015 from three weather stations in North China were selected in this study for model training and testing, including mean air temperature (Tmean, °C), maximum air temperature (Tmax, °C), minimum air temperature (Tmin, °C), wind speed (U, m·s−1), solar radiation (Rs, MJ·m−2·d−1) or extraterrestrial radiation (Ra, MJ·m−2·d−1), and relative humidity (RH, %). Among the selected weather stations (Figure 1), Datong and Yinchuan are located in the arid region of China, with mean annual precipitation of 183 mm and 364 mm, respectively, while Taiyuan belongs to the semiarid region of China, with an average annual precipitation of 432 mm.

All meteorological data used in this study were obtained from the National Meteorological Information Center of China (NMICC) with quality control. In addition, further quality control was applied to the solar radiation data following the guidelines in [3] before processing. Missing data accounted for no more than 1% of all datasets in total. The geographic information and meteorological statistics during the training and testing stages are listed in Table 1. It is noted that, compared with the training stage, wind speed at Datong and Yinchuan decreased by 10% and 29.2%, respectively, during the testing stage, while solar radiation at Taiyuan increased by 16%. Six input combinations consisting of different meteorological parameters were used for model training and testing in this study (Table 2). The roadmap of this study is shown in Figure 2.

2.2. Multivariate Adaptive Regression Splines (MARS)

MARS is a nonlinear, nonparametric regression technique introduced by Friedman [32], which uses a divide-and-conquer strategy for data processing. Specifically, the data for model training are divided into subsets, each of which has its own function with at least two regression lines. The endpoints of a line segment are defined as nodes. During processing, no specific assumptions about the underlying functional relationship between the input variables and the outputs are needed; the knotting method is used instead. For each input variable, knots can be placed at any data point within the range of the variable, and the knotting strategy searches all possible knot positions based on the adaptive regression algorithm, consequently forming a piecewise curve. A knot marks the end of one data region and the beginning of another. Between two adjoining knots, the regression of the data forms a basis function (BF). Therefore, MARS generates BFs through step-by-step searching, during which the knot positions are automatically selected by an adaptive regression algorithm. If there are multiple input variables, it searches all possible univariate knot positions and interactions between all variables. However, these processes generally build an overfit model. To reduce the degree of overfitting, pruning is incorporated into MARS as the second phase of model development. In summary, the MARS model is a weighted sum of BFs, which makes the model flexible. More details about MARS can be found in [32].
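
To make the idea of basis functions concrete, the sketch below shows how a fitted MARS model reduces to a weighted sum of hinge functions of the inputs; the knots and coefficients are arbitrary placeholders for illustration, not values fitted to the study data.

```python
import numpy as np

def hinge(x, knot, direction=1):
    """MARS basis function: max(0, x - knot) if direction=+1, max(0, knot - x) if direction=-1."""
    return np.maximum(0.0, direction * (x - knot))

# Toy example: a piecewise-linear response of ETo to mean temperature.
tmean = np.linspace(-10.0, 30.0, 200)
eto_hat = (1.2
           + 0.15 * hinge(tmean, 5.0)        # slope change at the 5 degC knot
           + 0.08 * hinge(tmean, 18.0)       # additional slope above 18 degC
           - 0.05 * hinge(tmean, 5.0, -1))   # separate slope below 5 degC
```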

2.3. Gaussian Process Regression (GPR)

Gaussian process regression is a framework based on the Bayesian theory with wide application in many fields. A GPR model can be defined as a probability distribution over functions [33], which is denoted as

$$f(x) \sim \mathrm{GP}\big(m(x), k(x, x')\big),$$

where $m(x)$ is the mean function and $k(x, x')$ is the covariance (kernel) function.

Since the Gaussian process is a normal distribution in the domain of functions, it can be interpreted from the perspective of function space. If there is a predictor group x consisting of d variables, the training set can be written as $D = \{(x_i, y_i)\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^{d}$ is the input vector and $y_i$ is the corresponding output. In general, the Bayesian linear regression function can be expressed as

$$f(x) = x^{T} w,$$

where $w$ is the weight vector, whose prior distribution is $w \sim N(0, \Sigma_p)$. For instance, if $w \sim N(0, \Sigma_p)$, then the prior on the function value is $f(x) \sim N(0, x^{T} \Sigma_p x)$.

Since the expectation and variance can be expressed as

$$E[f(x)] = x^{T} E[w] = 0, \qquad \mathrm{cov}\big(f(x), f(x')\big) = x^{T} \Sigma_p x',$$

we can see that, when the testing data and the training data obey the same distribution, for a given testing vector $x_*$ the corresponding prediction $f_*$ also belongs to a Gaussian distribution. Suppose they form a joint Gaussian distribution:

$$\begin{bmatrix} f \\ f_* \end{bmatrix} \sim N\left(0, \begin{bmatrix} K(X, X) & K(X, x_*) \\ K(x_*, X) & k(x_*, x_*) \end{bmatrix}\right),$$

in which $k(x_*, x_*)$ is the correlation between $x_*$ and itself, $K(X, x_*)$ is the $N \times 1$ covariance matrix between the training inputs and the testing input, briefly recorded as $K_*$, and $K(X, X)$ is the covariance of the training inputs themselves, which can be written in short as $K$.

The Bayesian posterior probability is as follows:

$$p(f_* \mid X, f, x_*) = N\big(\bar{f}_*, \mathrm{cov}(f_*)\big),$$

where

$$\bar{f}_* = K_*^{T} K^{-1} f, \qquad \mathrm{cov}(f_*) = k(x_*, x_*) - K_*^{T} K^{-1} K_*,$$

where $\bar{f}_*$ is the posterior mean of $f_*$ and $\mathrm{cov}(f_*)$ is the posterior variance.

When there is noise in the regression model,

$$y = f(x) + \varepsilon,$$

where $x$ is the input vector, $f(x)$ is the function value, and $y$ is the observation with noise. It is generally considered that the noise obeys a normal distribution with a mean value equal to 0 and a variance of $\sigma_n^{2}$, i.e., $\varepsilon \sim N(0, \sigma_n^{2})$. The prior distribution of the observed values can be obtained as

$$\mathrm{cov}(y) = K + \sigma_n^{2} I_n,$$

where $I_n$ is the n-dimensional identity matrix. Then, the joint distribution of the observation set and the prediction set can be written as follows:

$$\begin{bmatrix} y \\ f_* \end{bmatrix} \sim N\left(0, \begin{bmatrix} K + \sigma_n^{2} I_n & K_* \\ K_*^{T} & k(x_*, x_*) \end{bmatrix}\right).$$

Therefore, the posterior probability can be calculated as follows:

$$p(f_* \mid X, y, x_*) = N\big(\bar{f}_*, \mathrm{cov}(f_*)\big),$$

where

$$\bar{f}_* = K_*^{T}\big(K + \sigma_n^{2} I_n\big)^{-1} y, \qquad \mathrm{cov}(f_*) = k(x_*, x_*) - K_*^{T}\big(K + \sigma_n^{2} I_n\big)^{-1} K_*.$$

In general, to keep the notation concise, the data are preprocessed so that the mean value can be set as 0. Therefore, the posterior distribution above can be written as

$$f_* \mid X, y, x_* \sim N\Big(K_*^{T}\big(K + \sigma_n^{2} I_n\big)^{-1} y,\; k(x_*, x_*) - K_*^{T}\big(K + \sigma_n^{2} I_n\big)^{-1} K_*\Big).$$

For regression problems, the posterior mean is used as the predictive value $\hat{f}_*$, which can be written as

$$\hat{f}_* = \bar{f}_* = K_*^{T}\big(K + \sigma_n^{2} I_n\big)^{-1} y.$$

Further simplification is as follows:

$$\hat{f}_* = \sum_{i=1}^{n} \alpha_i\, k(x_i, x_*), \qquad \alpha = \big(K + \sigma_n^{2} I_n\big)^{-1} y.$$

According to the Mercer theorem, the covariance function $k(x_i, x_j)$ can be specified with the Gaussian kernel method:

$$k(x_i, x_j) = \sigma_f^{2} \exp\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2 l^{2}}\right),$$

where the hyperparameters $\sigma_f$ and $l$ can be determined automatically by the Bayesian method.
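
In practice, the posterior mean above is what a GPR library returns as its prediction. A minimal sketch with scikit-learn's GaussianProcessRegressor is given below, using a Gaussian (RBF) kernel plus a white-noise term for the σn² term of the noisy formulation; the synthetic arrays merely stand in for the meteorological inputs and P-M ETo targets.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF, WhiteKernel

rng = np.random.default_rng(0)
X_train = rng.random((120, 4))     # stand-in for an input combination (e.g., Tmax, Tmin, U, RH)
y_train = rng.random(120) * 5.0    # stand-in for monthly P-M ETo (mm/d)
X_test = rng.random((30, 4))

# sigma_f^2 * exp(-||x - x'||^2 / (2 l^2)) plus a white-noise term for sigma_n^2
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gpr.fit(X_train, y_train)

# The posterior mean is the ETo prediction; the standard deviation quantifies uncertainty.
eto_pred, eto_std = gpr.predict(X_test, return_std=True)
```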

2.4. XGBoost, Bat Algorithm, and BAXGB

XGBoost, proposed by Chen and Guestrin [34], is a scalable artificial intelligence algorithm for tree boosting. The original model of XGBoost is gradient boosting decision trees, which combines many "weak" learners into a "strong" learner through an iterative method. As shown in Figure 3, during each iteration of XGBoost, the previous predictors are corrected with residuals. The algorithm can independently determine the type of loss function used for model evaluation. To reduce the risk of overfitting, an additional regularization term is added to the model. For regression, the prediction is obtained by summing the scores of the individual trees. For an ensemble of M decision trees, the calculation formula can be expressed as

$$\hat{y}_i = \sum_{m=1}^{M} f_m(x_i), \qquad f_m \in W,$$

where M is the number of trees, $f_m$ is a function in the functional space W, and W is the space of all decision trees.

The objective function at the t-th iteration can be written as

$$\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(t)}\big) + \sum_{m=1}^{t} \Omega(f_m),$$

where n is the number of training samples and $l$ is the loss function, and the prediction of the i-th sample at the t-th iteration can be given as

$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i).$$

The regularization term for a decision tree is defined by Chen and Guestrin [34] as follows:

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2},$$

where $\gamma$ is the complexity cost of each leaf, T is the number of leaves in a tree, $\lambda$ is a parameter to scale the penalty, and $w$ is the vector of scores on the leaves. Then, the first-order and second-order Taylor expansions are applied to the loss function in XGBoost. For example, if the loss function is the mean square error (MSE), the objective function can finally be derived as

$$\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n}\left[g_i w_{q(x_i)} + \frac{1}{2} h_i w_{q(x_i)}^{2}\right] + \Omega(f_t),$$

where $q$ is a function that assigns data points to the corresponding leaves and $g_i$ and $h_i$ are the first and second derivatives of the MSE loss function, respectively. In this expression, the loss function is determined by the sum of the loss values over all data samples. Since each data sample corresponds to only one leaf node, the loss function can also be expressed as the sum of the loss values over the leaf nodes, which is

$$\mathrm{Obj}^{(t)} = \sum_{j=1}^{T}\left[\left(\sum_{i \in I_j} g_i\right) w_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right) w_j^{2}\right] + \gamma T,$$

where $I_j$ represents all data samples in leaf node j. Hence, optimization of the objective function is now transformed into finding the minimum value of a quadratic function. In other words, the change in model performance caused by a certain node split in the decision tree can now be evaluated based on the objective function. If the model performance of the decision tree is improved after the node split, the change is adopted; otherwise, the split is stopped. The structure of the XGBoost algorithm can be seen in Figure 3. Besides, when optimizing the objective function, the regularization terms help train a predictive classifier that resists overfitting. In this study, three parameters were optimized, including the number of trees (nrounds), the ratio of subdatasets to all data in training the model (subset), and the minimum sum of instance weight (hessian) needed in a child (min_child_weight).
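
For illustration, the sketch below configures a single (un-optimized) XGBoost regressor through the scikit-learn-style API of the xgboost package, mapping the parameters named above onto the library's argument names (n_estimators for nrounds, subsample for the training fraction, learning_rate for eta); the parameter values and synthetic data are placeholders, not the settings used in this study.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X_train = rng.random((240, 5))     # stand-in for one of the input combinations
y_train = rng.random(240) * 5.0    # stand-in for monthly P-M ETo (mm/d)
X_test = rng.random((60, 5))

model = xgb.XGBRegressor(
    n_estimators=400,          # number of trees ("nrounds")
    min_child_weight=3,        # minimum sum of instance weight (hessian) in a child
    learning_rate=0.1,         # shrinkage ("eta")
    subsample=0.8,             # fraction of training data used per tree ("subset")
    reg_lambda=1.0,            # L2 penalty lambda on leaf scores (see the objective above)
    gamma=0.0,                 # per-leaf complexity penalty gamma
    objective="reg:squarederror",
)
model.fit(X_train, y_train)
eto_pred = model.predict(X_test)
```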

The bat algorithm (BA) is a metaheuristic algorithm developed by Yang [35], which mirrors the echolocation behavior of microbats. Microbats emit sound pulses to search for prey, and each of them has its own echolocation characteristics (i.e., pulse emission rate and loudness), features that can be used for developing various BA variants. The main steps of constructing a BA can be summarized as follows (a minimal implementation sketch is given after this list):

(1) A population of virtual bats is first generated for the simulation. Each virtual bat is assumed to fly randomly with a velocity $v_i$ and a frequency $f_i$ at position $x_i$.

(2) From the first iteration to the maximum iteration, the frequencies, velocities, and positions of the bats at time t are updated by

$$f_i = f_{\min} + \big(f_{\max} - f_{\min}\big)\beta, \qquad v_i^{t} = v_i^{t-1} + \big(x_i^{t-1} - x_*\big) f_i, \qquad x_i^{t} = x_i^{t-1} + v_i^{t},$$

where $\beta \in [0, 1]$ is a random vector drawn from a uniform distribution, $x_i^{t}$ and $v_i^{t}$ are the updated position and velocity of bat i at time t, respectively, and $x_*$ is the current best position (solution), located after comparing all the solutions among all the bats within the population.

(3) Once the current best solution is selected, a bat updates its position based on a random walk:

$$x_{\mathrm{new}} = x_{\mathrm{old}} + \epsilon A^{t},$$

where $\epsilon \in [-1, 1]$ is a random number and $A^{t}$ is the mean loudness of all bats at time t.

(4) As the iterations proceed, the pulse emission rate $r_i$ and the loudness $A_i$ of each bat within the population are updated according to the following rules:

$$A_i^{t+1} = \alpha A_i^{t}, \qquad r_i^{t+1} = r_i^{0}\big[1 - \exp(-\gamma t)\big],$$

where $\alpha$ and $\gamma$ are both constants. The emission rates and loudness of all bats are updated until the best solution is found, and the iteration is then terminated.
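
The sketch below is a compact, generic implementation of the steps above that minimizes a user-supplied fitness function over a box-constrained search space; the function name and default settings (population size, fmin, fmax, α, γ) are illustrative rather than the configuration used in this study.

```python
import numpy as np

def bat_optimize(fitness, lb, ub, n_bats=20, n_iter=100,
                 f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9, seed=0):
    """Minimal bat algorithm: returns (best_position, best_fitness) minimizing `fitness`."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = lb.size
    x = rng.uniform(lb, ub, size=(n_bats, dim))       # bat positions
    v = np.zeros((n_bats, dim))                       # bat velocities
    loud = np.ones(n_bats)                            # loudness A_i
    r0 = rng.uniform(0.0, 1.0, n_bats)                # initial pulse emission rates
    r = np.zeros(n_bats)
    fit = np.array([fitness(xi) for xi in x])
    best, best_fit = x[fit.argmin()].copy(), fit.min()
    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.uniform()   # frequency update
            v[i] += (x[i] - best) * freq                     # velocity update
            cand = np.clip(x[i] + v[i], lb, ub)              # position update
            if rng.uniform() > r[i]:
                # local random walk around the current best solution
                cand = np.clip(best + 0.01 * loud.mean() * rng.standard_normal(dim), lb, ub)
            f_new = fitness(cand)
            if f_new <= fit[i] and rng.uniform() < loud[i]:
                x[i], fit[i] = cand, f_new
                loud[i] *= alpha                             # decrease loudness
                r[i] = r0[i] * (1.0 - np.exp(-gamma * t))    # increase pulse rate
            if f_new < best_fit:
                best, best_fit = cand.copy(), f_new
    return best, best_fit
```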

In this study, a hybrid model integrating BA and XGBoost (i.e., BAXGB) was developed for ETo estimation. As stated previously, the performance of XGBoost models is largely dependent on the appropriate choice of parameters. In BAXGB, the parameter choice for XGBoost was determined and optimized by BA. This study optimizes three important parameters of the XGBoost algorithm, namely, the number of trees (nrounds), the minimum sum of instance weight needed in a child (min_child_weight), and the learning rate (eta). The parameter values are shown in Table 3.
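
As a sketch of how the coupling could look, the hypothetical bat_optimize function from the previous sketch can drive a cross-validated XGBoost fitness function whose three decision variables mirror nrounds, min_child_weight, and eta; the search bounds and synthetic data are placeholders, not the values of Table 3.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import cross_val_score

def make_fitness(X, y):
    """Fitness = 5-fold cross-validated RMSE of an XGBoost regressor (to be minimized)."""
    def fitness(theta):
        nrounds, min_child_weight, eta = theta
        model = xgb.XGBRegressor(
            n_estimators=int(round(nrounds)),
            min_child_weight=min_child_weight,
            learning_rate=eta,
            objective="reg:squarederror",
        )
        mse = -cross_val_score(model, X, y, cv=5,
                               scoring="neg_mean_squared_error").mean()
        return float(np.sqrt(mse))
    return fitness

rng = np.random.default_rng(0)
X_train = rng.random((240, 5))     # stand-in for a training input combination
y_train = rng.random(240) * 5.0    # stand-in for monthly P-M ETo (mm/d)

# bat_optimize is the sketch defined above; the bounds are illustrative placeholders.
best_params, best_rmse = bat_optimize(make_fitness(X_train, y_train),
                                      lb=[50, 1, 0.01], ub=[1000, 10, 0.3])
```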

2.5. Model Performance Evaluation

Five statistical indicators were used for the performance evaluation of the GPR, MARS, and BAXGB models in this study, which were as follows:

(1) Coefficient of determination:
$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\big(ETo_{p,i} - ETo_{PM,i}\big)^{2}}{\sum_{i=1}^{n}\big(ETo_{PM,i} - \overline{ETo}_{PM}\big)^{2}}$$

(2) Root-mean-square error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(ETo_{p,i} - ETo_{PM,i}\big)^{2}}$$

(3) Mean absolute error (MAE):
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\big|ETo_{p,i} - ETo_{PM,i}\big|$$

(4) The maximum 10% RMSE (RMSEmax10%), computed over the 10% of test samples with the largest absolute errors:
$$\mathrm{RMSE}_{\max 10\%} = \sqrt{\frac{1}{n_{10}}\sum_{i \in S_{10}}\big(ETo_{p,i} - ETo_{PM,i}\big)^{2}}$$

(5) The maximum 10% MAE (MAEmax10%), computed over the same subset:
$$\mathrm{MAE}_{\max 10\%} = \frac{1}{n_{10}}\sum_{i \in S_{10}}\big|ETo_{p,i} - ETo_{PM,i}\big|$$

where n is the total number of test data, $ETo_{p,i}$ and $ETo_{PM,i}$ are the ETo values predicted by the artificial intelligence methods and the FAO-56 P-M ETo values, respectively, $\overline{ETo}_{PM}$ is the mean P-M ETo value, and $S_{10}$ (of size $n_{10}$) is the set of test samples with the maximum 10% absolute error between the predicted and the FAO-56 P-M ETo.

The value of R2 is between 0 and 1, and a value closer to 1 indicates a better regression fit between the predicted ETo and the P-M ETo and hence better model performance. Conversely, the closer the RMSE or MAE value is to 0, the better the model performs. RMSE is generally useful when model errors follow a normal distribution, whereas MAE is better suited to models with a uniform error distribution [24]. In addition, considering that the acceptable ETo error differs between months, as well as the uncertainty involved in the water resources management process, it is necessary to evaluate not only the overall performance of a model but also whether its performance meets expectations in extreme cases. Therefore, two statistical indicators based on the 10% of samples with the largest errors were adopted in this study to evaluate model performance in extreme cases.
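
For concreteness, the five indicators can be computed as in the sketch below; it assumes that R2 is one minus the ratio of the residual sum of squares to the total sum of squares around the mean P-M ETo, and that the maximum 10% indicators are computed over the 10% of test samples with the largest absolute errors, following the wording of this section.

```python
import numpy as np

def evaluation_metrics(eto_pm, eto_pred):
    """RMSE, MAE, R2, RMSEmax10%, and MAEmax10% between P-M ETo and predicted ETo."""
    eto_pm = np.asarray(eto_pm, dtype=float)
    eto_pred = np.asarray(eto_pred, dtype=float)
    err = eto_pred - eto_pm
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    r2 = float(1.0 - np.sum(err ** 2) / np.sum((eto_pm - eto_pm.mean()) ** 2))
    k = max(1, int(np.ceil(0.1 * err.size)))   # size of the worst-10% subset
    worst = np.sort(np.abs(err))[-k:]          # the 10% largest absolute errors
    return {"RMSE": rmse, "MAE": mae, "R2": r2,
            "RMSEmax10%": float(np.sqrt(np.mean(worst ** 2))),
            "MAEmax10%": float(np.mean(worst))}
```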

3. Results and Discussion

3.1. Model Performance in the Datong Station

The performance of the three artificial intelligence models used in this study (i.e., GPR, MARS, and BAXGB) during the training and testing stages for the Datong station is listed in Table 4. Both air temperature and extraterrestrial radiation were used for the first two input combinations; the difference was that input combination 1 used the monthly average temperature, while input combination 2 used the monthly maximum and minimum temperatures. Under these two input combinations, the accuracies of the three types of models varied. For the GPR and BAXGB models, input combination 2 had higher accuracy than input combination 1. However, the MARS models yielded the opposite result. Input combinations 3, 4, and 5 represented the effects of adding solar radiation, wind speed, and relative humidity, respectively, to the temperature data. Considering that extraterrestrial radiation depends only on latitude and time of year and is not an observed meteorological parameter, it was added to combinations 4 and 5. The importance of meteorological factors differed between models. The MARS and BAXGB models showed that relative humidity was the most important, followed by wind speed and solar radiation, whereas the GPR model results showed that wind speed was the most critical, followed by relative humidity and solar radiation. Although the rankings differed to some degree, it was still clear that relative humidity and wind speed were more important than solar radiation for the Datong station. Input combination 6 combined temperature, wind speed, and relative humidity. For all three artificial intelligence models, the results of input combination 6 were better than those of the combinations with relative humidity or wind speed alone. Comparisons between the three artificial intelligence models under the same input combination indicated that BAXGB (RMSE = 0.116–0.438 mm·d−1 and MAE = 0.087–0.311 mm·d−1) always performed better than MARS (RMSE = 0.153–0.512 mm·d−1 and MAE = 0.221–0.559 mm·d−1) and GPR (RMSE = 0.297–0.695 mm·d−1 and MAE = 0.120–0.379 mm·d−1) under each input combination. In addition, the MARS model generally performed better than the GPR model, except in combinations 2 and 3. Across algorithms and input combinations, values of RMSEmax10% and MAEmax10% during the testing stage were 2–2.5 times their corresponding RMSE and MAE values, suggesting that model performance in extreme cases would be worse than under normal conditions. Although the overall performance of the BAXGB model was good, the maximum 10% error with input combination 3 (i.e., BAXGB3) exceeded 1.0 mm·d−1. This may be due to the difference in the distribution of solar radiation data between the training and testing stages.

Figure 4 presents scatter diagrams between the ETo estimated by the three artificial intelligence algorithms and the P-M ETo (mm·d−1) during the testing stage for the Datong station. In Figure 4, the R2 values ranged within 0.832–0.978, 0.953–0.992, and 0.955–0.996 for GPR, MARS, and BAXGB across all input combinations, respectively. The slopes of the GPR models were all less than 0.95 except with input combination 3, which led to a problem of underestimation. These results clearly indicated that, for the Datong station, BAXGB performed best while GPR performed worst among the AI algorithms in terms of ETo estimation.

Figure 5 presents comparisons between the ETo estimated by the AI algorithms and the P-M ETo (mm·d−1) during the testing stage for the Datong station. P-M ETo peaked in May at over 5 mm·d−1 and had its lowest value in December (0.6 mm·d−1). For the GPR model, input combination 1 did not capture the trend well, resulting in deviations of more than 1 mm·d−1, with underestimation in May and overestimation in September. For the MARS model, input combinations 2 and 3 showed overall overestimation, especially from August to October, when the overestimation exceeded 0.4 mm·d−1. For the BAXGB model, all input combinations simulated an acceptable trend, except input combination 3, for which an overestimation (more than 0.4 mm·d−1) was detected from July to October. Overall, among the AI methods in this study, the BAXGB model had the smallest deviation in all 12 months, while the GPR models performed worst.

To further evaluate the performance of the different AI models, Taylor diagrams were used to analyze the models under different combinations, and the results for the Datong station can be seen in Figure 6. Combination 4 performs best among the GPR models, while combination 5 performs best among the MARS models; however, combination 6 is superior to the others among the BAXGB models. In addition, the points of the BAXGB models lie much closer to the reference point than those of the other two models, which is consistent with the previous statistical indicators and scatter plot results.

3.2. Model Performance in the Yinchuan Station

The performance of the three artificial intelligence models in the training and testing stages for the Yinchuan station is compared in Table 5. For the first two input combinations, combination 2 had higher accuracy than combination 1 for all three algorithms in Yinchuan, which differed from Datong. However, the ranking of the importance of meteorological factors in the different models showed trends similar to those at the Datong station. In addition, the combination including both wind speed and relative humidity (i.e., input combination 6) was still better than either factor alone for the MARS and BAXGB models. The comparison of the three artificial intelligence models under the same input combination shows that BAXGB (RMSE = 0.175–0.261 mm·d−1 and MAE = 0.123–0.204 mm·d−1) performed better than MARS (RMSE = 0.172–0.340 mm·d−1 and MAE = 0.108–0.266 mm·d−1) and GPR (RMSE = 0.289–0.663 mm·d−1 and MAE = 0.195–0.511 mm·d−1) under each combination. In general, the BAXGB model was better than the MARS model, and the GPR model was the worst, except for some individual combinations. Similar to the Datong station, RMSEmax10% and MAEmax10% during the testing stage were more than twice the corresponding RMSE and MAE at the Yinchuan station. However, compared with Datong, the values of RMSEmax10% and MAEmax10% were generally smaller in Yinchuan.

Figure S1 presents scatter diagrams between the estimated ETo and the P-M ETo (mm·d−1) during the testing stage for the Yinchuan station. In Figure S1, the R2 values ranged within 0.871–0.974, 0.974–0.994, and 0.978–0.988 for GPR, MARS, and BAXGB among the input combinations, respectively. The slopes of the GPR models were less than 0.95 for input combinations 1, 3, 4, and 6, which led to a problem of underestimation. The slopes of the MARS models were greater than 1 in all input combinations except combination 3, suggesting a problem of overestimation. Overall, the BAXGB and MARS models were superior to the GPR models at the Yinchuan station.

Comparisons between the estimated ETo and the P-M ETo during the testing stage for the three AI algorithms in Yinchuan are presented in Figure S2. P-M ETo peaked in June (nearly 5 mm·d−1) but was generally small in winter (averaging about 1 mm·d−1). For the GPR models, the deviations of input combinations 1 and 5 were the most obvious: combination 1 was seriously overestimated by 0.6 mm·d−1 or more from August to November, while combination 5 was overestimated every month by 0.2–0.6 mm·d−1. The MARS models performed similarly to the GPR model across input combinations, but the extent of overestimation was smaller. In addition, the MARS models with combinations 1 and 2 significantly overestimated ETo from May to July. The BAXGB model had the smallest deviations (less than 0.4 mm·d−1) in each month among the three AI algorithms.

3.3. Model Performance in the Taiyuan Station

The performance of the three artificial intelligence algorithms in the training and testing stages for the Taiyuan station is compared in Table 6. The statistical results in Taiyuan were similar to those in Datong because Taiyuan is geographically closer to Datong than to Yinchuan. The most significant difference between Taiyuan and Datong was that the MARS models ranked the importance of the meteorological parameters differently. However, the BAXGB model showed that solar radiation was the most important, followed by wind speed and relative humidity. Considering that every statistical index of the BAXGB model was better than those of the other two models, it is fair to suggest that solar radiation in this area was more important than the other meteorological parameters.

Figure S3 presents scatter diagrams between the estimated ETo and the P-M ETo (mm·d−1) during the testing stage for the Taiyuan station. In Figure S3, the R2 values ranged within 0.817–0.981, 0.935–0.986, and 0.940–0.987 for GPR, MARS, and BAXGB among the input combinations, respectively. The slopes of the three types of artificial intelligence models were all less than 0.93 for all combinations, suggesting that ETo was underestimated under all combinations. As at the other stations, the BAXGB and MARS models were superior to the GPR models in Taiyuan.

Figure S4 presents comparisons between the estimated ETo and the P-M ETo (mm·d−1) during the testing stage for the three AI algorithms at the Taiyuan station. The monthly fluctuation of P-M ETo in Taiyuan was similar to that of Datong, with the peak also appearing in May (nearly 5 mm·d−1). For the GPR model, combination 1 was underestimated in spring but overestimated in autumn, with a maximum deviation of over 1 mm·d−1. For the MARS models, input combination 3 showed a severe underestimation in April, while the other combinations showed overestimation in spring and summer. The BAXGB models showed an overall trend of underestimation, and combinations 1, 3, and 4 were underestimated by more than 0.4 mm·d−1 in some months (usually in spring).

3.4. Comparison between Different Algorithms

Overall, the three AI algorithms presented different optimal combinations at different stations. For example, the GPR models with combination 4 yielded better performance than with combination 6 across the three stations. By contrast, the BAXGB models identified combination 6 as the optimal input combination. This indicated that the BAXGB algorithm might be better at identifying the contributions of multiple parameters than the other two algorithms, which explains its superior accuracy.

Given that BAXGB overall performed better than GPR and MARS across the three stations and the six input combinations, the statistical results of BAXGB can be used to analyze the importance of each major meteorological parameter for estimating ETo. Based on the results of the BAXGB models in this study, solar radiation is more important than the other parameters at the Yinchuan and Taiyuan stations. This is similar to other studies [24, 36], which showed that solar radiation was more important than wind speed and relative humidity in a semiarid environment of the Qilian Mountains in Northwest China and in Southeast China. However, relative humidity and wind speed seem more important than solar radiation for the Datong station with the BAXGB models in this study, which may be related to the difference in the distribution of solar radiation data between the training and testing stages. Compared with input combinations 4 and 5, the statistical indicators of input combination 3 are smaller during the training stage, and these differences are much larger in the testing stage than in the training stage, especially for the values of RMSEmax10% and MAEmax10%.

Tao et al. [28] developed a coupling model based on the firefly algorithm and ANFIS in Burkina Faso, a semiarid region of Africa, in which the RMSE in the testing period ranged within 0.24–1.10 mm·d−1 for different input combinations. Shamshirband [37] found that cuckoo-search-optimized ANN and ANFIS had RMSE values of 0.330 and 0.265 mm·d−1, respectively, in the testing stage in Serbia. Another case study in Serbia showed that genetic programming models coupled with GA generated RMSE values of 0.649–1.188 and 0.657–1.193 mm·d−1 in the testing and validation stages, respectively [38]. In addition, a hybrid model (GA-SVM) applied in the Pailugou Watershed, Northwest China, yielded RMSE values ranging within 0.138–0.424 mm·d−1 [21, 36]. Overall, the RMSE values obtained in this study are similar to those achieved by other researchers using coupled artificial intelligence models in semiarid regions. Furthermore, where observational data are limited in this region, BAXGB models with only temperature and extraterrestrial radiation as inputs would offer acceptable results and can be used as an alternative method for ETo estimation. On this basis, adding one or two more meteorological factors would be further beneficial.

Over the past decade, many highly efficient evolutionary algorithms have been proposed. XGBoost is also a relatively new technique, and studies combining evolutionary algorithms with XGBoost are very limited to date. This study confirms that the bat algorithm is an effective method for optimizing XGBoost. However, whether other evolutionary algorithms would be more efficient than BA for XGBoost optimization needs further study.

4. Conclusions

Accurate and robust reference evapotranspiration (ETo) estimation models are important for agriculture and irrigation scheduling. In this study, a new model (BAXGB) that couples the bat algorithm with XGBoost was proposed for the estimation of monthly ETo. The developed BAXGB algorithm was evaluated by comparison with the FAO-56 Penman–Monteith results in arid and semiarid environments of China. Data from three stations (Datong, Yinchuan, and Taiyuan) were used to build the BAXGB model as well as the multivariate adaptive regression splines (MARS) and Gaussian process regression (GPR) models. Six models for each algorithm were developed using different input combinations including mean air temperature, maximum air temperature, minimum air temperature, wind speed, relative humidity, and solar radiation or extraterrestrial radiation. The results showed that the BAXGB models exhibited suitable accuracy for estimating ETo. Except for input combination 6 (temperature, U, and RH), the BAXGB models were superior to the MARS models. The GPR model was generally worse than the MARS model. For the Datong station, the influence of solar radiation on model accuracy was less than that of wind speed and relative humidity; for the other two stations, the opposite trend was observed. The findings of this research will be helpful for agricultural irrigation scheduling and may offer a reference for other regions where accurate models for improving local water management are needed.

Data Availability

The statistical results are within the manuscript and its Supporting Information file. However, raw data underlying the results presented in the study are available from the National Meteorological Information Center (NMIC) of China Meteorological Administration (CMA) (http://data.cma.cn/).

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Authors’ Contributions

Y. Han did conceptualization, writing, and original draft preparation. J. Wu was involved in funding acquisition. B. Zhai edited the manuscript. G. Huang was responsible for methodology and validation. Y. Pan curated the data. L. Wu performed the analysis using software. W. Zeng revised the manuscript.

Acknowledgments

The authors would like to thank the China Meteorological Administration (CMA) for providing the meteorological and radiation data. This research was funded by the National Natural Science Foundation of China (nos. 51790533, 51709144, and 31570444) and the Natural Science Foundation of Jiangxi Province, China (no. 20171BAB216051).

Supplementary Materials

Figure S1: scatter plots between the estimated ETo by the three algorithms and the FAO-56 P-M ETo in the testing stage for the Yinchuan station. Figure S2: relative errors by the different algorithms at each month for the Yinchuan station. Figure S3: scatter plots between the estimated ETo by the three algorithms and the FAO-56 P-M ETo in the testing stage for the Taiyuan station. Figure S4: relative errors by the different algorithms at each month for the Taiyuan station.