Research Article  Open Access
Yan Yang, Takeo Onishi, Ken Hiramatsu, "Improving the Performance of Temperature Index Snowmelt Model of SWAT by Using MODIS Land Surface Temperature Data", The Scientific World Journal, vol. 2014, Article ID 823424, 20 pages, 2014. https://doi.org/10.1155/2014/823424
Improving the Performance of Temperature Index Snowmelt Model of SWAT by Using MODIS Land Surface Temperature Data
Abstract
Simulation results of the widely used temperature index snowmelt model are greatly influenced by input air temperature data. Spatially sparse air temperature data remain the main factor inducing uncertainties and errors in that model, which limits its applications. Thus, to solve this problem, we created new air temperature data using linear regression relationships that can be formulated based on MODIS land surface temperature data. The Soil Water Assessment Tool model, which includes an improved temperature index snowmelt module, was chosen to test the newly created data. By evaluating simulation performance for daily snowmelt in three test basins of the Amur River, performance of the newly created data was assessed. The coefficient of determination () and NashSutcliffe efficiency (NSE) were used for evaluation. The results indicate that MODIS land surface temperature data can be used as a new source for air temperature data creation. This will improve snow simulation using the temperature index model in an area with sparse air temperature observations.
1. Introduction
In most of the middle and high latitude regions, snow accumulation and subsequent snowmelt are considered as the most important hydrological processes, because the stream hydrograph is dominated by spring snowmelt [1]. In addition, nutrient transport from land to sea is significantly influenced by spring flood processes [2, 3]. Hence, knowledge of the spring snowmelt process is essential not only for hydrological modeling, but also for further study of nutrient dynamics and transport in middle and high latitude regions. Distributed hydrological models have been proven useful and applicable to investigate streamflow and nutrient transport in snowmeltdominated basins [4, 5].
Recently, the physically based energybalance method has been demonstrated to be accurate and powerful for calculating snowmelt processes [6–8]. However, the demand for accurate and variable input data and complex parameterization still limits applicability of the method [9, 10]. Conversely, the temperature index (henceforth, ) method has been widely used despite its simplicity for the following reasons [11]: (1) wide availability of air temperature data, (2) relative ease of air temperature interpolation and forecasting, and (3) computational simplicity. Thus, hydrological models such as the Soil Water Assessment Tool [12], Hydrological Simulation Program Fortran (HSPF, [13]), MIKE [14], and Snowmelt Runoff Model (SRM, [15]) have adopted the method to simulate snow accumulation and the snowmelt process. Because is based on an assumption that the relationship between ablation and air temperature is usually expressed in the form of positive temperature sums [11], the air temperatures are obviously one of the most important variables for this method.
Air temperature () data can be easily obtained in regions where meteorological observation network is dense, but many remote areas have sparse stations. However, with development of earth observations, MODIS remotely sensed land surface temperature (LST) data have proven powerful for creating data. For example, Kloog et al. [16] successfully applied the spatial smoothing method to evaluate daily air temperature data using MODIS LST data in Massachusetts, United States. Zhu et al. [17] also used MODIS LST to evaluate daily and subdaily maximum and minimum air temperature on the northern Tibetan Plateau. Zakšek and SchroedterHomscheidt [18] reviewed that there are three different methods commonly applied for estimating the based on the LST data: (1) the statistical methods; (2) the temperaturevegetation index methods (TVX); (3) energybalance methods. They reported that the statistical methods generally perform well, within the spatial and time frame they were derived from, but require large amounts of data to train the algorithms [19]. The TVX method is based on the assumption that, for an infinitely thick canopy, the topofcanopy temperature is the same as within the canopy [17] and uses the Normalized Difference Vegetation Index (NDVI) as a key input variable. However, the assumption of linear and negative slope between LST and NDVI is not always applicable and is influenced by seasonality, ecosystem type, and soil moisture variability [19, 20], and the period of created data is limited by the periods of both LST and NDVI data. In addition, Zhu et al. [17] also found that the results of the statistical method are similar to the TVX method. Although the energybalance methods are physically based, the major disadvantage of this method is the requirement of large amounts of information often not provided by remote sensing [19]. The linear relationship between MODIS LST and data has been demonstrated in different study regions [21–24]. Thus, the linear regression method is a common choice for data estimation using MODIS LST data. Colombi et al. [25] used the linear regression method and MODIS LST data to generate average daily temperature in Italian alpine areas, and they proved that the result of the linear regression method was superior to that of the spatial interpolation method. Shen and Leptoukh [26] also used the linear regression relationship between air temperature and MODIS LST data to create new daily air temperature data in northern China and central Russia.
One critical disadvantage of using MODIS LST data is that the period of newly created air temperature data is limited by the operational period of the satellite. However, we frequently need historical data, especially for longterm hydrological simulations. Thus, it is necessary to find an easy and effective way to create spatially dense and temporally longtermair temperature data. Motivated by this unsolved problem, we set our research objectives as follows: (1) development of a simple method to create accurate air temperature data over a longer period; (2) hydrological simulation of the snowmelt process using newly created air temperaturedata for three test basins in the Amur River basin; and (3) evaluation of the validity of the newly created air temperature data by analysis of simulated results.
2. Study Area and Data
2.1. Test Basins
Three basins were selected for model testing, which are located in the upper, middle, and lower stream of the Amur River basin. Basic geographic characteristics of the basins are shown in Figure 1 and Table 1. The Amur River is the tenth longest in the world and is recognized as an important dissolved iron source for the Sea of Okhotsk [27]. There are four distinct phases in the Amur water regime: spring floods, summer low water, summer and autumn floods, and winter low water. The main water source is rainfall, supplying 70–80% of total water, and snowmelt during spring floods adds 10–20% [28]. In the upper stream (basin A, Gari), the annual temperature is −2.4°C and the annual precipitation is 494 mm, and in the lower stream (basin B, Apkoroshi), the annual temperature is −0.1°C and the annual precipitation is 641 mm. In the middle stream (basin C, Malinovka), the mean annual temperature is 1.1°C and the annual precipitation is 593 mm [29].

2.2. Data for Hydrological Simulation
Detailed structure of the snow melt component of SWAT model is explained in the following sections. To generate input data needed for SWAT, a digital elevation model (DEM), soil data, land use and land cover (LULC) data, and weather data are required [30]. In addition, discharge data are required for calibration of hydrological simulations.
A Shuttle Radar Topographic Mission (SRTM 90 m) DEM was used to delineate subbasins of the test basins. We applied the same flow accumulation threshold (200 km^{2}) in watershed delineation of all three test basins. The subbasins were delineated by the ArcSWAT interface. As shown in Figure 1, there are nine subbasins in basin A and 11 in basin B and basin C. The land use/land cover map was constructed by combined use of vegetation maps of China, Mongolia, and Russia and satellite images [31]. Soil data were taken from the Harmonized World Soil Database (HWSD, [32]), obtained from the International Institute for Applied Systems Analysis (IIASA). The spatial resolution of LULC data is as same as DEM data (90 m), and the resolution of soil data is 1 km.
Temporal resolution of all weather data was daily. Asian Precipitation Highly Resolved Observational Data Integration towards Evaluation of Water Resources (APHRODITE, [33]) was used for daily precipitation data. The most important driving data of distributed hydrological models are of accurate precipitation. It has been shown that APHRODITE can give good performance in this study area [34, 35]. Maximum air temperature (), minimum air temperature (), and wind speed data were obtained from the Global Historical Climatic NetworkDaily (GHCNDaily, [36]) of the National Climate Data Center (NCDC). The SWAT model calculates the distance between geometer center of subbasin and the candidate weather station to attach the nearest station for each subbasin as the unique driving station. In our study, the SWAT model selected eight monitoring air temperature stations (Figure 1 and Table 2) for SWAT model. Relative humidity and solar radiation data were from the NCEPDOE reanalysis 2 dataset [37] on the website of the NOAA Earth System Research Laboratory. Because we focused on improving snowmelt simulations in the method, daily monitoring runoff data during March through May were used for the calibration period to obtain optimized parameter values. The years 1983–1987 were selected for basins B and C, and because of lack of the monitoring, 1983, 1984, 1986, and 1988 were used for basin A. Runoff data were provided by the Russian Federal Service for Hydrometeorology and Environmental Monitoring (Roshydromet).
 
Station is the WMO code of stations used for validation of new created data and SWAT model; N_Station shows the WMO code of the nearest station for each station in the “Station” column; P_code is the station pair name, LULC is the land cover type, and Dis is distance between the two stations. 
2.3. MODIS Land Surface Temperature Data
The AQUA/MODIS daily LST data at 1 km spatial resolution of both daytime and nighttime were acquired from mid2002 to 2010. (https://lpdaac.usgs.gov/products/modis_products_table/myd11a1). For different MODIS LST data (TERRA/MODIS and AQUA/MODIS), Mostovoy et al. [23] already proved that the AQUA and TERRA have no significant difference in estimation. In addition, Vancutsem et al. [20] also proved that the AQUA data can give reasonable results even if its observation period is shorter than TERRA. MODIS LST data are derived from thermal infrared bands 31 (10.78–11.28 μm) and 32 (11.77–12.27 μm). Atmospheric effects are corrected by a generalized splitwindow algorithm [38]. The latest LST data are version 005; these datasets have error less than 1°C within the range −10 to 50°C, assuming that surface emissivity is known [19, 39]. In addition, groundbased validation has shown that errors were less than 1°C at homogeneous surfaces such as water, crop, and grassland [39].
3. Methodology
3.1. LongTerm Air Temperature Data Creation and Validation
The simplest method to estimate the air temperature is to create a linear regression equation between at points and (Figure 2) using observed data. In this case, the linear regression equation can be written as follows:
Here, and are coefficients of the linear regression equation, and subscripts and indicate the points.
We call this the method. If monitoring data at both points and are available, we can use the method. However, if we have no data at a point we need to know, the method is not applicable. Here, we use the monitoring stations and their nearest stations (Table 2) to evaluate the linear relationships between two different places (Figure 2).
We used three indices: coefficient of determination (), mean absolute error (MAE), and root mean square error (RMSE) for the evaluation. Equations for , MAE, and RMSE are as follows:
Here, is observed on day , is created air temperature on day by each method, is the average value of observed , is the average value of created air temperature, and is the total number of days. The results of linear correlation analysis for are shown in Table 3 and discussed in Section 4.1.
 
Sta_Pairs is the station pair name, DataNum is the amount of data used for the regression analysis, “” is the first order coefficient of linear regression equation, and the “” is the interception of linear regression equation. 
Further, the study of Sun et al. [24] presented a theoretical derivation of linear regression relationships between and LST and proved that the can be mainly explained by the LST in the linear regression equation, and they also showed that the errors of created based on the linear regression method are limited in a reasonable range, in the North China Plain. In addition, Mostovoy et al. [23] also proposed the similar method to estimate at any point from LST at that point. By constructing linear regression equations between and LST at 161 monitoring station points in the state of Mississippi, they also found a common relationship between LST and , irrespective of location. Furthermore, both of these researches indicate that the first order coefficient of linear equation between and LST is equal to 1. Here, we use the monitoring data and LST data to estimate the air temperature between them; we call this the LST method. In this case, the linear regression equation can be written as
We also validate the linear regression relationships between the and LST in all stations (Table 4); the is over 0.95 for both daily maximum and minimum LST analysis for all station pairs. In addition, the results also showed that the first order coefficient (slope or “”) of linear regression equations (Table 4) is very close to 1. These results also correspond to the theoretical analysis of previous researches in China North Plain [24] and Mississippi of USA [23].
 
DataNum is the amount of data used for the regression analysis, “” is the first order coefficient of linear regression equation, and the “” is the interception of linear regression equation. (The monitoring data of station 3N is till December 31, 1997, which cannot match the LST monitoring period; thus, there are no results for station 3N: 314590.) 
However, the results of LST indicate that this method extends the errors compared with the method. Furthermore, a disadvantage of this method is that because LST data are necessary for this method; it can only be applied to periods after MODIS was launched. As already addressed, we frequently need historical air temperature data to execute hydrological models that include snow accumulation and snowmelt processes. Moreover, many watersheds have very sparse observed air temperature data. Because both the and LST methods are unsuitable for such common cases, we developed a new method as follows.
In the first step, a linear regression equation of LST between two points is created as follows:
Here, and are coefficients of the linear regression equation, and subscripts and indicate the points. is the result of the predicted LST value of point B from (4).
We conducted the linear regression analysis of LST data in all station pairs. According to the results (Table 5), the same as the previous two methods, it is clear that high linear correlations are obtained in all station pairs for LST data.
 
Sta_Pairs is the station pair name, DataNum is the amount of data used for the regression analysis, “” is the first order coefficient of linear regression equation, and the is the interception of linear regression equation. 
In addition, based on the linear analysis results of LST of station pairs, and using the LST method, we can estimate and at the same time:
Based on the results of linear correlation analysis of LST data (Table 5), we neglect the error between the and and by substituting in (6) by , can be expressed by as Combining (7) with (5), we get
Because the LST relationship in the study basins can only be acquired in the point which monitors both LST and data, cannot be obtained as is not monitored in place B. In this study, we take the as an entirety. It is clear that if this entirety can be ignored, the equation will be simplified drastically.
Thus, we calculated the in all station pairs based on the results listed in Tables 4 and 5. The results (Table 6) indicate that in station pair 44 there is a relative large error compared with other station pairs for both daily maximum and minimum analysis. The error is larger in the daily minimum analysis of 22 and 66, and it is also larger in the 77 for the daily maximum analysis. In addition, the average value of all station pairs is 0.38°C for daily maximum analysis and 0.45°C for daily minimum analysis. However, we also recognize that the effects of data on the snowmelt processes are based not only on the “point” or “situ” scale but also on its accurate distribution in the entire basin. Thus, considering relative small average errors (Table 6), we ignore the of (8) and the new equation is
 
Sta_Pairs is the name of station pair, const_{A} is the interception value of linear regression equation in LST method for point A (Table 4), is the first order coefficient of linear regression equation of LST data linear regression results (Table 5), and const_{B} is the interception value of linear regression equation for LST method in point B. (Because of the lack of LST analysis result of station 3N, the station pair: 33N is excluded from the results.) 
This means that once we acquire coefficients and from linear regression analysis of LST, we can estimate using a known . We call this the LSTLST method. We performed a linear regression analysis for creation of based on both daily maximum and minimum LST data. However, according to the limited monitoring period of MODIS LST data, the linear regression equations are formulated only in the entire period. We compared estimated from the LSTLST method with the method, using the same station pairs listed in Table 2, and also the errors caused from the approximation are discussed, especially the approximations of (8) which may cause more errors for the estimation. The comparison was for both the entire period and spring snowmelt period. We defined the snowmelt period as March through May. The results are presented in Section 4.1.
3.2. Air Temperature Calculation in the SWAT Model
The SWAT model can consider orographic effects on air temperature by dividing the subbasin into multiple elevation bands [40]. This significantly influences snow cover and snowmelt processes. The equation is
Here, is calculated in each elevation band (°C), is temperature recorded at a monitoring gage (°C), is mean elevation of each elevation band (m), is elevation of an existing monitoring gage (m), is temperature lapse rate (°C/km), and 1000 is a unit conversion factor from meters to kilometers. By multiplying and areal percentage of a band and summing over all elevation bands, we can obtain average of each subbasin. All test basins were divided into ten elevation bands, based on an equal interval of elevation.
3.3. Snowmelt Simulation in the SWAT Model
Based on the basic concept of the method, the snowmelt module of the SWAT model is as follows (time interval is daily):
Here, SNO_{mlt} is the amount of snowmelt (mm H_{2}O), is the melt factor (mm H_{2}O/day°C), and is the fraction of the hydrological response unit (HRU) area covered by snow. The HRU is divided by using land use and soil types in each subbasin. The HRU is the simulated unit in the SWAT model; the simulation of water budget in SWAT is based on the HRU. is snow pack temperature (°C), is maximum (°C), and is the base temperature above which snowmelt is allowed (°C). allows seasonal variation with maximum and minimum values occurring on the summer and winter solstices. allows nonuniform snow cover caused by factors such as shading, land cover, and topography.
Consider
Here, is snow pack temperature on a given day (°C), is snow pack temperature on the previous day (°C), is snow temperature lag, and is mean on a given day. In SWAT model, the is the arithmetic mean value of daily maximum and minimum .
3.4. Model Calibration and Evaluation
To evaluate different input datasets influences on SWAT model snowmelt simulation, we conducted hydrological simulations under two different scenarios: (1) Elev, which used the original data with elevation band method; (2) New, which used the newly created data based on the LSTLST method without elevation bands. For other settings such as watershed delineations, HRU generations, input data (except ), and initial range of calibration parameters, we keep them the same for all different scenarios in each test basin.
Model parameters were calibrated using the Sequential Uncertainty Fitting Version 2 (SUFI2) method [30]. Table 7 lists the calibration parameters for each scenario. is only included in the Elev scenario. Two different parameter sets were constructed for testing the performance of each scenario. One is the socalled “the least parameters setting” and the other is “complete parameters setting.”
 
Elev scenario is elevation bands combined with the original monitoring air temperature data scenario, and New scenario is the newly created air temperature data. V means the true value of the parameter and R means the relative range of the original parameter. 
For the least parameters setting, to test the performance of two different datasets, we fixed the values of parameters that might significantly influence snowmelt simulations; that is, , ,