Abstract

Reference evapotranspiration () is one of the major parameters affecting hydrological cycle. Use of satellite images can be very helpful to compensate for lack of reliable weather data. This study aimed to determine using land surface temperature (LST) data acquired from MODIS sensor. LST data were considered as inputs of two data-driven models including artificial neural network (ANN) and M5 model tree to estimate values and their results were compared with calculated by FAO-Penman-Monteith (FAO-PM) equation. Climatic data of five weather stations in Khuzestan province, which is located in the southeastern Iran, were employed in order to calculate . LST data extracted from corresponding points of MODIS images were used in training of ANN and M5 model tree. Among study stations, three stations (Amirkabir, Farabi, and Gazali) were selected for creating the models and two stations (Khazaei and Shoeybie) for testing. In Khazaei station, the coefficient of determination () values for comparison between calculated by FAO-PM and estimated by ANN and M5 tree model were 0.79 and 0.80, respectively. In a similar manner, values for Shoeybie station were 0.86 and 0.85. In general, the results showed that both models can properly estimate by means of LST data derived from MODIS sensor.

1. Introduction

Decline in availability of water for agriculture is one of the most serious challenges facing human life that has affected agricultural production in some arid and semiarid regions around the world. Determining future demands of water for agriculture section includes computation of several factors such as runoff, groundwater, precipitation, and evapotranspiration (ET) [1]. ET is identified as the combination of two different processes including evaporation from the soil surface and crop transpiration [2].

Accurate and reliable estimates of ET are necessary to determine temporal variations in irrigation requirement, improve allocation of water resources, and evaluate the effect of changes in land use and crop patterns on the water balance [3]. Considering difficulties in direct measurement of ET [4], this parameter is estimated through reference evapotranspiration () and crop coefficient () for a specific crop [5]. Therefore, calculation of (evapotranspiration of the given plant) and subsequently crop water requirement as irrigation water depend on estimates. is defined as the rate of ET from a reference surface in such a way that the surface is assumed to be covered with a hypothetical grass with specific characteristics [2]. A large number of methods have been offered in order to model [2, 6, 7]; the majority of these methods are extremely complicated and rely on weather data such as temperature, solar radiation, wind speed, and air humidity [1, 8]. is principally calculated by physically based equations (e.g., Penman-Monteith (PM) equation) or by empirical relationships between meteorological variables.

The FAO-Penman-Monteith (FAO-PM) method is currently recommended as an accepted standard technique for computing . This method employs a variety of complex equations, which are based on weather data, including air temperature, humidity, radiation, and wind speed [9]. The requirement of various weather data, which are not mainly available in many regions, is the major limitation of this method [10]; in addition, it seems that the reordered data are not accurate enough, especially in developing countries [11].

From the viewpoint of temporal and spatial variations of climatic data, weather stations density (the number of stations per unit of area) is not generally adequate for demonstrating spatiotemporal variations. Today, remote sensing techniques, which gather land surface information from a broader geographic scope, are known as an effective tool and methodology for extracting the ground parameters that can be used to estimate ET at regional scales [12]. To this end, in the basins that may face shortage of measured data, remote sensing data can considerably supply needed information such as albedo, leaf area index, and land surface temperature (LST) [13]. Recent advances in remote sensing techniques and enhancement of temporal, spatial, and radiometric resolution of satellite images have produced a range of useful products. These products can be widely used in various research areas, especially water and environment. Many studies have been conducted in estimating as an important element of hydrological cycle. In this regard, it can be mentioned in the studies of Guerschman et al. [14]; Maeda et al. [4]; Yang et al. [12]; Cleugh et al. [15]; Tian et al. [16]; Hankerson et al. [17]; Li et al. [18].

Recently, some data-based techniques such as artificial neural networks (ANNs), support vector machine, and M5 model tree have been broadly utilized in solving practical problems. Mentioned techniques can simulate processes in which there is not an exact solution, and it is merely possible to approach near-optimal solution [19]. In these methods, a new knowledge is not produced about the process of a problem. In fact, thanks to the proper recognition of the problem process in selecting the input and output variables, caused by the use of modern regression techniques, this helps to better fit the data [20]. Data-driven techniques can efficiently model nonlinear and complicated processes such as ET that its precise calculation depends on numerous weather data and interactions between them [21].

Recently, by the developing new machine learning methods, application of new models in the estimation of has been increased. Kişi and Öztürk [22] studied the adaptive neurofuzzy inference system (ANFIS) technique in estimation of in Los Angeles, California, and showed that it could be successfully considered in the estimation. Kişi and Çimen [23] applied SVM in modeling in central California and concluded that SVR model could be embedded as a module for estimating data in hydrological modeling studies. Tabari et al. [24] investigated the potential of SVM, ANFIS, multiple linear regression (MLR), and multiple nonlinear regression (MNLR) for estimating in a semiarid highland environment in Iran. They found that the SVM and ANFIS had better performance in modeling compared to the others. Besides, several studies have shown high performance of ANNs in modeling ET against statistical methods [8, 2529]. The substantial benefit of ANN methods, compared to conventional methods, is the ability of solving problems that are difficult to formalize [30]. Hou et al. [10] studied ANNs in simulating values and demonstrated that the models reflected a noticeable efficiency. Landeras et al. [21] compared calculated by ANN to the values of radiation and thermal models and declared that ANN can provide more accurate results than empirical models.

Due to the simple geometric structure of model tree and providing fast computing set of IF-THEN rules, which are easily understandable [31], the model tree has been utilized in different scientific studies. Pal and Deswal [1] in a study evaluated the capability of M5 model tree to model in daily scale, and the obtained results showed that M5 model tree can be properly used in this context. Rahimikhoob et al. [11] conducted a study to assess the performance of M5 model tree in reaching values in a semiarid region of Iran by means of values of pan evaporation. They asserted that this model shows more satisfactory performance than other approaches in estimating in the study region. Sattari et al. [32] compared ANN and M5 model tree in determining and concluded that both ANN model and M5 model tree can properly estimate . ANN suggested a better performance; however, M5 model tree offered simple linear regressions that can simply calculate .

The main aim of this study is to evaluate the ability of M5 model tree and ANN model in estimating daily using LST values obtained from MODerate Resolution Imaging Spectroradiometer (MODIS)/Terra sensor. The accuracy of calculated by means of M5 model tree and ANN model has been investigated by comparing with the amounts of FAO-PM method.

2. Materials and Methods

2.1. Study Area and Data

Farms of Sugarcane Development Project, which are located in Khuzestan province, Iran, were considered as the study area in the present research. Khuzestan province is stretched in southeastern part of the country, with the area of approximately 63,238 km2 between longitudes of 47°38′ E to 50°32′ E and latitudes of 29°57′ N to 33°00′ N. This province is located adjoining the boundary of Iraq from the west and Persian Gulf from the south. Khuzestan province according to de Martonne climate classification has a semiarid climate. The average temperature during the winter season is nearly 9.14°C, and occasionally its minimum falls down a few degrees below 0°C. The average temperature in the summers is approximately 31.2°C, whereas its maximum sometimes exceeds 50°C. Required climatic variables in order to estimate in this study were collected from weather stations, namely, Amirkabir, Farabi, Gazali, Khazaei, and Shoeybie for the years of 2006 and 2007 in the periods when satellite images existed. The meteorological data consisted of daily observations of maximum and minimum temperature, sunshine duration, relative humidity, and wind speed which have been used in estimating by FAO-PM equation. Figure 1 illustrates the location of weather stations in the study area.

2.2. MODIS Images

In the current study, data of daytime LST () and nighttime LST () obtained from the MODIS sensor were used as inputs to the M5 model tree and ANN models. The MODIS sensor, which is on board of the Terra and Aqua satellites, is able to provide images in 36 spectral bands between 0.62 and 14.385 μm. Spatial resolutions of the sensor are 250 m, 500 m, and 1 km in several bands. In the thermal-infrared (TIR), MODIS sensors have 1 km spatial resolution [19]. This sensor performs a range of daily regular observations on the sea, land, and atmosphere in the global scale; moreover, it has an extensive and continuous spectral and spatial coverage. MODIS sensor was launched by the Terra and Aqua satellites in 1999 and 2002, respectively. These satellites both have a sun-synchronous, near polar, circular orbit; Terra satellite everyday crosses over the equator from north to south, while Aqua satellite has an inverse movement from south to north. United States National Aeronautics and Space Administration (NASA) is responsible for processing and providing MODIS sensor’s data. NASA supplies the data in a large variety of products applicable to land, ocean, and atmosphere uses [4]. The data and products of MODIS sensor are available on the NASA’s website (http://modis.gsfc.nasa.gov/). The images selected for this study are the level three (L3) data of the MODIS sensor (MOD11A1), which are related to Terra satellite. This product provides daytime and nighttime LST data collected on a 1 km spatial resolution and gridded in the sinusoidal projection system [33]. The MODIS sensor LST data is acquired from two TIR channels, that is, 31 (10.78–11.28 μm) and 32 (11.77–12.27 μm) using the split-window algorithm [34].

A total of 428 images of the MODIS sensor associated with MOD11A1 production from the Terra satellite in the time periods of 2006 and 2007 covering the study area were retrieved from the Land Processes Distributed Active Archive Center (LP DAAC) (http://lpdaac.usgs.gov). Cloudless sky and fair weather were the reason for selecting these pictures. Changing coordinate systems from the sinusoidal to UTM (datum WGS84) was performed using MODIS Reprojection Tool (MRT). Afterward, locations of study stations were projected on the images and then and were extracted from the corresponding pixels of the images. Extracting LST values in the desired stations was done by means of Hawth’s Analysis Tools (HAT) in ArcGIS 9.3 software. Unit of LST values extracted from the images was Kelvin, so before utilizing them as model input, these values were converted to Celsius.

In the present study, whole LST values extracted from MODIS images in three stations, including Amirkabir, Farabi, and Gazali, were used to construct and train the ANN and M5 model tree, and two other stations, Khazaei and Shoeybie, were applied to test mentioned models.

2.3. Artificial Neural Network (ANN)

ANNs are known as powerful machine-learning techniques, which are extensively used for numerical prediction and classification [35]. ANNs are also operative tools to model nonlinear systems and need less amount of data as inputs than customary mathematical approaches [27]. A neural network is based on processing a set of data that its analysis procedure has been inspired by biological neural systems. Neural networks consist of an interconnected group of neurons placed near to each other in layers that implement data processing through neurons and their connections. ANNs included one input layer, an output layer, and one or more hidden layers; neurons of each layer are joined to all neurons of the previous layer by means of weighted connections. The input vectors are received by neurons of the input layer and the values can be conveyed to the next layer of processing elements across connections. The number of neurons in each hidden layer is determined by the user. This process continues up to the output layer. ANN modeling is carried out in two stages including training and testing. In the training stage, after getting input data, neural network makes attempts to convert the inputs into desired outputs. Connecting weights of network neurons are determined in this stage; these weights in the testing stage are examined using various datasets.

A multilayer perceptron (MLP) ANN with one input layer, one hidden layer, and one output layer was used to estimate from , , mean LST (), and day of year (DOY) data as inputs. The transfer function in the networks was log-sigmoid for this study. The accuracy of the networks was evaluated for each epoch in the training through mean squared error (MSE). The backpropagation (BP) algorithm was employed to train MLP neural network. Levenberg-Marquardt (LM), a second-order nonlinear optimization technique, was used with an early stopping criterion to improve the network training speed and efficiency. The target error for the training of networks was equal to . With the specification of this target error, the number of iteration for all ANN models was placed to 1000.

2.4. M5 Model Tree

M5 model tree was introduced by Quinlan in 1992 [36]. The model is established according to a binary decision tree in which there are linear regression functions in the terminal node (leaf), which sets a relationship between independent and dependent variables [11]. Tree-based models are made according to a divide-and-conquer method for constructing a relationship between independent and dependent variables [37]. The tree models can be also used for qualitative and quantitative data [9].

Building the M5 model tree involves two separate steps [36, 38]. The first one consists of dividing data into subsets to construct the model tree. The dividing criterion is based on the standard deviation of the subset values that reach a node as a measure of error value in that node and additionally calculating the expected reduction in the error as a result of testing each attribute at that node. Equation of standard deviation reduction (SDR) is given as follows [1]: where is defined as a set of examples that reaches the node; demonstrates the subset of examples that have the th output of the potential set, and SD is the standard deviation [39]. Due to branching process, data in child nodes (subtree or smaller nodes) have fewer SD than parent nodes (greater nodes). After examining all possible structures, a structure would be picked out that has the maximum expected error reduction. This dividing often creates a great tree-like structure that leads to overfit structure. In order to avoid overfitting, in the second step the overgrown tree is pruned, and pruned subtrees are replaced with linear regression functions. Figure 2(a) illustrates splitting the input space (independent variables) into six linear regression functions at the leaves (labeled LM1 through LM6) by M5 model tree algorithm. General form of the model is , in which , , and are linear regression constants. Figure 2(b) shows branches relations in the form of a tree diagram [19].

2.5. FAO-PM Equation

FAO-PM equation, a physically based method, is widely used in the calculation of and is a reliable method in the estimation of in various climate conditions [2, 40, 41]. Moreover, FAO proposed this equation as a standard method for estimating . Although the most precise method for estimating is to employ lysimeter data, considering the absence of such data in the study region, assessment of ANN and M5 model tree performance was made by FAO-PM equation. The equation for estimating daily has been presented as follows [2]: where is the reference evapotranspiration (mm d−1), is the slope of the saturation vapor pressure curve (kPa °C−1) at the daily mean air temperature (°C), and are the net solar radiation and soil heat flux density (MJ m−2 d−1), is the psychrometric constant (kPa °C−1), is the daily mean temperature (°C), is the mean wind speed (ms−1), is the saturation vapor pressure (kPa), and is the actual vapor pressure (kPa) [2].

In current study, the daily values of , , , and were calculated using the equation given by Allen et al. [2]. For calculation of , it is needed to calculate daily solar radiation (). Allen et al. [2] proposed to be calculated using the Angstrom formula, which relates solar radiation to extraterrestrial radiation () and relative sunshine duration. The formula has the following form: where and are, respectively, the actual daily sunshine duration and daily maximum possible sunshine duration, and are, respectively, the daily global solar radiation (MJ m−2 d−1) and the daily extraterrestrial solar radiation (MJ m−2 d−1) on a horizontal surface, and and are empirical coefficients which depend on location. The values of a and b coefficients are considered as default values, 0.25 and 0.5, respectively, proposed by Allen et al. [2].

2.6. Statistical Analyses

In this study, results obtained from FAO-PM equation were utilized as reference values and the results achieved by ANN and M5 model tree were taken into account as estimated values. The comparison of ANN model and M5 model tree values with FAO-PM equation was made using statistical indices such as coefficient of determination (), root mean square error (RMSE), and mean absolute error (MAE), in addition to plotting graphs. These indices are defined as follows: where is computed by FAO-PM, denotes the average amount of computed computed by FAO-PM, is computed by ANN/M5, and is the average computed amounts by last models.

3. Results and Discussion

In the first stage of the study, before calculating the , the correlation between LST obtained from MODIS sensor and air temperature was determined. Accordingly, scatter plots of LST values provided by MODIS images versus air temperature are presented for different stations in Figures 3 and 4. As it can be perceived in Figure 3, there is a high correlation between the and minimum air temperature (). The greatest was observed between and in Shoeybie station, which was equal to 0.91; the minimum was seen in the Amirkabir station with the value of 0.87. Furthermore, for all stations, of two aforementioned parameters was entirely assessed at 0.88. In Figure 3, symmetrical and appropriate distribution of points around the line of best fit (line 1 : 1) indicates a strong correlation between the two parameters. In Figure 4, relation between the and maximum air temperature () displays that the greatest between and is related to Amirkabir and Khazaei stations (); the minimum amount of this coefficient is associated with Gazali station (). For all study stations, was generally equal to 0.78. Overall, it can be concluded that there is a strong relationship between these two parameters. The scatter of points around the line of best fit in the Figure 4 depicts an overestimation of over . This overestimation is more pronounced for the temperatures over 30°C.

For all examined datasets, the correlation between and is higher compared to the correlation between and for thermal products of MODIS sensor. Moreover, previous studies concerning estimation of air temperature have similarly confirmed that correlation between and is greater than correlation between and [12, 19].

In the second stage, in order to calculate using LST values extracted from the MODIS sensor ( and ), ANN models and M5 model tree were employed. In addition to and , two parameters including mean LST () and day of year (DOY) were considered as inputs for generating model in order to increase accuracy of estimates. Thus, in producing ANN model and M5 model tree, four parameters including , , DOY, and were taken into consideration as input and calculated from weather data using FAO-PM equation as output.

For generating ANN model, training data (whole data of Amirkabir, Farabi, and Gazali stations) were applied, and the number of nodes in the hidden layer was obtained using trial and error method. The number of nodes in the hidden layer was considered varying from 1 to 12; based on error statistical parameters, a network with 3 nodes in the hidden layer indicated the lowest error in comparison of values calculated by FAO-PM method. Table 1 presents the values of error parameters for the neural network with the structure of three nodes in the hidden layer. In Figure 5, scatter diagram of calculated values through ANN for all training data has been plotted against values obtained by FAO-PM equation. As shown in Figure 3, demonstrates that the ANN has provided an acceptable determination coefficient in approximating . In this figure, the points density increases around the line 1 : 1 in large amounts of ; that is, it represents the overestimation or underestimation of the data that are situated in this area.

For generating M5 model tree, WEKA 3.7 software [42] was used considering the same data applied to the training ANN model. Algorithm 1 shows M5 model tree produced by the software and established linear regression relations. Afterward, a scatter plot of estimated by M5 model tree versus calculated by FAO-PM equation has been presented for training data (Figure 6); value equal to 0.83 indicates proper agreement. The distribution of points around the best fit line represents higher accuracy of this method in the estimation of low values; in the great values of the error amount increases as well.

LSTn <= 19.43:                                                                         LM num: 1
    LSTms <= 15.55: LM1                                                                    Pm = 0.0147 Tn + 0.1048 Tms − 0.0007 DOY + 0.8963
    LSTms > 15.55:                                                                     LM num: 2
          DOY <= 282.5:                                                                            Pm = 0.0349 Tn + 0.0312 Tms + 0.028 DOY + 1.3107
                  DOY <= 81.5: LM2                                             LM num: 3
                  DOY > 81.5:                                                                           Pm = 0.046 Tn + 0.081 Tms − 0.0134 DOY + 3.9221
                          DOY <= 106:                                                   LM num: 4
                                  LSTd <= 31.2: LM3                                           Pm = 0.0719 Tn + 0.0642 Tms − 0.0088 DOY + 3.8864
                                  LSTd > 31.2: LM4                                 LM num: 5
                          DOY > 106:                                                                   Pm = 0.0705 Tn + 0.0073 Td + 0.0301 Tms + 0.0003 DOY + 4.6615
                                  DOY <= 207:                                          LM num: 6
                                          LSTn <= 17.72:                                             Pm = 0.201 Tn + 0.0147 Td + 0.0301 Tms + 0.0003 DOY + 2.5252
                                                  DOY <= 129: LM5              LM num: 7
                                                  DOY > 129: LM6                              Pm = 0.118 Tn + 0.0301 Tms + 0.0026 DOY + 4.7479
                                          LSTn > 17.72: LM7                      LM num: 8
                                  DOY > 207:                                                          Pm = 0.0529 Tn + 0.0301 Tms − 0.0087 DOY + 6.254
                                          DOY <= 272.5: LM8                 LM num: 9
                                          DOY > 272.5: LM9                                 Pm = 0.0529 Tn − 0.0348 Tms − 0.0116 DOY + 8.4773
          DOY > 282.5:                                                                 LM num: 10
                  DOY <= 311.5: LM10                                                     Pm = 0.0328 Tn + 0.0201 Tms − 0.0012 DOY + 3.367
                  DOY > 311.5: LM11                                             LM num: 11
LSTn > 19.43:                                                                                         Pm = 0.1437 Tn + 0.0201 Tms − 0.0012 DOY + 1.7414
    DOY <= 227.5:                                                                     LM num: 12
          DOY <= 159.5:                                                                           Pm = −0.1076 Tn + 0.0018 Tms − 0.0393 DOY + 15.9787
                  DOY <= 147.5: LM12                                         LM num: 13
                  DOY > 147.5: LM13                                                        Pm = 0.0297 Tn + 0.0018 Tms + 0.0032 DOY + 8.0036
          DOY > 159.5:                                                                  LM num: 14
                  DOY <= 198.5:                                                                  Pm = 0.3445 Tn + 0.0769 Td + 0.0018 Tms − 0.1664 DOY + 26.3097
                          DOY <= 167.5: LM14                                LM num: 15
                          DOY > 167.5: LM15                                               Pm = 0.0316 Tn + 0.0018 Tms − 0.006 DOY + 10.5743
                  DOY > 198.5:                                                          LM num: 16
                          DOY <= 216.5:                                                          Pm = −0.0175 Tn + 0.0018 Tms + 0.0225 DOY + 4.5799
                                  DOY <= 206.5: LM16                      LM num: 17
                                  DOY > 206.5: LM17                                     Pm = −0.206 Tn + 0.0018 Tms + 0.0185 DOY + 11.1645
                          DOY > 216.5: LM18                                   LM num: 18
    DOY > 227.5: LM19                                                                    Pm = 0.0015 Tn + 0.0018 Tms − 0.0046 DOY + 9.6864
                                                                                                             LM num: 19
                                                                                                                        Pm = 0.022 Tn + 0.0018 Tms − 0.0387 DOY + 16.0569

The comparison of error statistics concerning developed models by ANN model and M5 model tree in estimating using training data can be drawn in Table 1. As seen, M5 model tree, which has shown RMSE and MAE values, respectively, equal to 1.27 and 0.96 mm d−1, has resulted in higher accuracy than ANN in estimating .

For testing the methods of ANN and M5 model tree, data of Khazaei and Shoeybie stations have been used. In Figure 7, the scatter plot of calculated through two models of ANN and M5 model tree versus FAO-PM values for the mentioned stations have been shown. values associated with calculated by ANN against FAO-PM equation values for Khazaei and Shoeybie stations were 0.79 and 0.86, respectively. Additionally, the same values related to M5 model tree for aforementioned stations were 0.80 and 0.85, respectively. In both methods, values were properly distributed around line 1 : 1. For smaller amounts of , there was been a better compatibility between the calculated values by two models and FAO-PM values compared to greater amounts of . In other words, underestimation or overestimation of is related to warm seasons further than cold seasons. The evolutions of calculated values over time have been illustrated for two test stations in Figure 8. As perceived in this graph, the maximum amounts of by these two models were not entirely compatible with FAO-PM values. This could be due to the lack of consideration of some weather parameters affecting rate (e.g., wind speed, solar radiation, and relative humidity) as inputs of ANN model and M5 model tree.

In Table 2, error statistics of estimation by ANN and M5 model tree have been presented for two test stations. According to the statistics of RMSE and MAE, both methods performed equally well. Similar results have been reported about close performance of ANN and M5 models in rainfall-runoff modeling [43] and river flow forecasting [44]. The main advantage of M5 model tree is that it provides the results in a simple and comprehensible form of regression equations [1], which can be easily used in the calculation of .

It seems generally that and values obtained from MODIS sensor, making the use of both ANN and M5 models, can be properly utilized in the estimation of .

4. Conclusion

In this study, attempts were made to calculate using LST values extracted from images of MODIS sensor. For this purpose, two methods of ANN and M5 model tree were selected in order to estimate ; four parameters including , , , and DOY were considered as model inputs, and was calculated from FAO-PM equation as model outputs. A comparison of estimated using each mentioned model with the values obtained from the FAO-PM method revealed high accuracy of both methods. The average amounts of RSME in two test stations for ANN and M5 model tree were, respectively, 1.35 and 1.34 mm/day. According to the error values, it can be observed that both models have shown close performance in determining . It seems that considering other significant variables affecting the ET process such as relative humidity and wind speed (as model inputs) can lead to higher accuracy.

Overall, it can be concluded that, unlike complex ANN method, M5 model tree due to its simple structure and providing simple regression equations, is a more appropriate method in determining using LST data derived from MODIS sensor.

In this paper A M5 tree model with some simple linear regressions was developed to estimate using MODIS LST data. Considering simple linear equations and minor data required in the developed M5 tree model, it is expected that this method could be an alternative to calculate in the study area. However, this model provided acceptable results in the current area but it seems that its applicability in the other region of the world or the region with similar climate condition needs more investigation.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.