Abstract

Flooding normally occurs during periods of excessive precipitation or thawing in the winter period (ice jam). Flooding is typically accompanied by an increase in river discharge. This paper presents a statistical model for the prediction and explanation of the water discharge time series using an example from the Schoharie Creek, New York (one of the principal tributaries of the Mohawk River). It is developed with a view to wider application in similar water basins. In this study a statistical methodology for the decomposition of the time series is used. The Kolmogorov-Zurbenko filter is used for the decomposition of the hydrological and climatic time series into the seasonal and the long and the short term component. We analyze the time series of the water discharge by using a summer and a winter model. The explanation of the water discharge has been improved up to 81%. The results show that as water discharge increases in the long term then the water table replenishes, and in the seasonal term it depletes. In the short term, the groundwater drops during the winter period, and it rises during the summer period. This methodology can be applied for the prediction of the water discharge at multiple sites.

1. Introduction

It has been shown that there is a connection between increased river flooding and climate change [1–3]. In addition to this increased flooding activity, it has also been shown that global atmospheric, surface, and tropospheric temperatures are rising [4, 5], coincident with increasing amounts of water vapor in the atmosphere [6]. Consequently, storms with increasing supplies of moisture produce more intense precipitation events, thereby increasing the risk of flooding [7]. Rising temperatures during the winter period are known to break up the river’s icy surface and hence initiate “ice jams.” An ice jam is a localized accumulation of ice that can block river flow and increase the probability of subsequent upstream flooding [8].

Over the last decade, Schoharie Creek has experienced a significant increase in water discharge [9–11]. This increase can be explained by the increase in winter melt events associated with warming temperatures [9].

Water discharge depends on the watershed area and the tributaries that drain into the stream. Schoharie Creek (hydrologic unit 02020005), with a drainage basin of approximately 886 mi² (2300 km²) in New York, USA, flows north 93 miles (150 km) from the foot of Indian Head Mountain in the Catskill Mountains through the Schoharie Valley to the Mohawk River. It is one of the two principal tributaries of the Mohawk River, the other being West Canada Creek. Eight further tributaries drain into the Schoharie Creek. It is critical to model flooding in order to help local governments and communities take precautionary steps to minimize or prevent flood damage to property.

Several studies on flooding in Schoharie or in different locations (e.g., in Germany, the Indus River in India, or the Elbe River in the Czech Republic [1–3, 12]) show a moderate explanation of the water discharge by the climatic variables. The main reason is the presence of uncertainties in all the variables of the time series. In particular, it has been shown that uncertainties in basic variables, or the effects of scales related to different frequencies of the variables, can change the inferences drawn from the analysis between the variables [13–15]. Moreover, it has been shown that separation of time series into different components is necessary to avoid interference from different covariance structures existing between the components of the time series. An absence of separation of scales may lead to erroneous estimated parameters of the linear regression model or other multivariate models (principal components, canonical pairs). For this reason, the time series decomposition is essential before performing analysis, since an absence of separation of scales can lead to misinterpretation [16].

In previous studies regarding water discharge, the time series of the water discharge variable has rarely been decomposed into different components [1–3, 12]. Specifically, Kotlarski et al. investigate the Elbe River flooding of August 2002, using the precipitation variable [3]. In their model, the authors determined a correlation of 0.75, yielding an $R^2$ value of 0.56. A moderate interpretation of the water discharge by the climatic variables might be due to a mixed interference of scales of the time series. The explanation of the water discharge can be improved by using the decomposition of the time series.

The main purpose of this paper is to present a novel methodology to explain and improve the predictive capability of river water discharge time series by using climatic and hydrological variables. We introduce a new model which incorporates separate treatments of frequency scales. Those scales have been determined based on spectral and cospectral analysis, and they provide a physical explanation of the hydrology of this area. In particular, we separate the data into seasonal and long and short term components by using the Kolmogorov-Zurbenko (KZ) filter [17]. The KZ filter provides effective separation of frequencies and its properties are described in the following section. For this paper, we use two models for the prediction of the water discharge time series to avoid mixed interferences. This is especially necessary in high latitudes where winter and summer seasons are prolonged. Prolonged seasons, such as winter (extended by a month or more) in this latitude, may provide a mixed interference due to their different correlation structure, and they must be separated as evidenced in standard statistical approaches.

One model is designed for the winter period and another model for the summer period. This is because flooding in Schoharie Creek may be caused by heavy rainfall during the summer or rapid snowmelt or “ice jam” during the prolonged winter period [18]. For both models, we first separate the different scales of the time series of water discharge, the depth of water level below land surface (referred to as groundwater level), and the climatic variables by different frequency bands. We then design different multivariate models for each scale of the water discharge time series. Different frequencies are always uncorrelated and, as a result, multivariate models can be designed for each scale of the water discharge time series separately. The long term component and the seasonal component are the same for the winter and summer models. However, the short term component has been designed using two different models for the prediction of the water discharge time series. Physical phenomena (such as an ice jam or flooding) are a result of contributions from all components (seasonal and long and short term) and not simply short term variations. We show that, by using the KZ filter, the explanation of the water discharge time series improves up to 81%.

2. Methods

2.1. Data

Daily time series of water discharge, groundwater level, tides, and climatic variables have been obtained through the online public databases of the National Oceanic and Atmospheric Administration (NOAA; http://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets/quality-controlled-local-climatological-data-qclcd/; Quality Controlled Local Climatological Data, Albany, Hudson River, NY) and the U.S. Geological Survey (USGS; http://waterdata.usgs.gov/nwis/, Schoharie Creek at Burtonsville, NY, for water discharge and for depth of water level data below surface). All monitoring stations are located near the Schoharie Watershed. In particular, daily time series of air temperature (°F), wind speed (m/sec), total rainfall and snowfall precipitation (mm/hr), and tide (feet) data from a station near Schoharie Creek have been obtained from NOAA for the January 2005–February 2013 period. Moreover, daily time series of water discharge and groundwater level have been analyzed for the same area and time period.

2.2. Decomposition of Time Series

In several studies, it has been shown that the separation of a time series into different components is essential in order to avoid contributions from different covariance structures between the components of the time series [13–16, 19]. The time series of a variable can be expressed by

$$X(t) = L(t) + Se(t) + Sh(t), \qquad (1)$$

where $X(t)$ represents the original time series of a variable, $L(t)$ is the long term trend component, $Se(t)$ is the seasonal component, and $Sh(t)$ is the short term component. The long term component describes the fluctuations of a time series defined as being longer than a given threshold, the seasonal component describes the year-to-year fluctuations, and the short term component describes the short term variations. The long term trend component incorporates information regarding the trend component together with the cyclical component [20]. The cyclical component of a time series describes the fluctuations around the trend. The cyclical component can be viewed as those fluctuations in a time series which are longer than a given threshold, for example, one and a half years, but shorter than those attributed to the trend.

The KZ filter, which separates long term variations from short term variations in a time series [17], provides a simple design and the smallest level of interferences between scales (seasonal and long and short term components) of a time series. It allows a physical interpretation of the scales [21, 22]. Furthermore, the KZ filter provides effective separation of frequencies for application directly to datasets containing missing observations [23–25]. KZ filtration is also known to provide the best and closest results to the optimal mean square of error [17, 22, 26]. As examples of its use, the KZ filter has been applied effectively for the explanation and prediction of the ozone problem and for the explanation of the water use time series in Gainesville, Florida [19, 24, 27–30].

Specifically, the KZ filter is a low pass filter, defined by $p$ iterations of a simple moving average of $m$ points. The moving average of the KZ can be expressed by

$$Y(t) = \frac{1}{m} \sum_{j=-k}^{k} X(t+j), \qquad (2)$$

where $m = 2k + 1$. The output of the first iteration becomes the input for the second iteration, and so on. The time series produced by $p$ iterations of the filter described in expression (2) is denoted by

$$Y(t) = KZ_{m,p}[X(t)]. \qquad (3)$$

The parameter $m$ in expression (3) has been determined to provide the maximum explanation of the water discharge time series by the climatic variables.

In this study, first, we examine the periodograms of all the time series. By using the periodograms we verify that the decomposition of the time series is essential for the prediction of the water discharge time series using the climatic variables. We design different multivariate models corresponding to each scale of the time series. For the decomposition of the time series, we use the KZ filter. The multivariate models describe the projection of the water discharge time series in the space defined by the groundwater variable and the climatic variables. This projection can be expressed through linear regression between the water discharge time series and the hydrological and climatic variables. Therefore, we decompose all the time series of the variables using the KZ filter. After the application of the KZ filter, we need to predict each component of the water discharge time series separately. For this reason, we use multivariate models for the prediction of each component of the water discharge time series. For the application of the multivariate model, we select different climatic variables to explain the components. Finally, we estimate the water discharge time series using the climatic variables for the winter and summer period.
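The iterated moving average described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `kz_filter` and its parameter names are hypothetical, and averaging over whichever points are available in each window is one simple way to handle series edges and missing observations.

```python
import numpy as np

def kz_filter(x, window, iterations):
    """Apply `iterations` passes of a centered moving average of `window` points.

    Hypothetical helper: missing values (NaN) and series edges are handled by
    averaging only the observations available inside each window.
    """
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    y = np.asarray(x, dtype=float).copy()
    half = window // 2
    for _ in range(iterations):
        # Pad with NaN so every point has a full-length centered window.
        padded = np.pad(y, half, constant_values=np.nan)
        windows = np.lib.stride_tricks.sliding_window_view(padded, window)
        valid = ~np.isnan(windows)
        counts = valid.sum(axis=1)
        sums = np.where(valid, windows, 0.0).sum(axis=1)
        # Mean of available points; NaN only if a window holds no data at all.
        y = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
    return y

# Example: a constant series with one missing day stays constant after filtering.
series = np.ones(100)
series[50] = np.nan
smoothed = kz_filter(series, window=33, iterations=3)
```

Each pass uses the output of the previous pass as its input, which is the iteration the text describes. Larger windows or more iterations suppress a wider band of short term variations.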

In the following sections, we describe the method for prediction of the seasonal and long and short term components of the water discharge time series using Schoharie Creek in New York as an example of application area.

2.2.1. Raw Data Analysis of the Water Discharge Time Series

Here, we apply the previous methodology for the decomposition of the time series to explain and predict the water discharge time series of Schoharie Creek. For the explanation and prediction of the water discharge, we use the climatic variables and the groundwater level. The logarithm of the water discharge from the study area was measured from January 2005 to February 2013, and it has been presented in Figure 1. We use the logarithm transform for the water discharge time series in order to stabilize the variance.

Table 1 presents the correlation matrix between the raw data of the variables. It can be noted that, based on the significance level, the correlations between the water discharge and the climatic variables are statistically significant. To explain the water discharge time series using the raw data of the climatic variables, a linear regression is performed with coefficient of determination, $R^2$, equal to 0.590. This model yields a moderate $R^2$ for the water discharge time series (expression (4)). Specifically, the raw data of the logarithm of water discharge are expressed through a linear regression, expression (4), on the raw data of temperature, tide, wind speed, groundwater level, and precipitation. Furthermore, in expression (4), $\varepsilon(t)$ represents the residuals of this relationship and $R^2$ is the square of the correlation coefficient.

The relationship of the water discharge with the climatic variables can be strengthened by separating the seasonal and long and short term variations in all the time series. The separation of scales in the time series can also be verified through examination of the coherence between the variables. Table 2 shows the main periods of all the variables derived by the periodograms of the variables. It can be observed that almost all the variables consist of a 365-day period describing the seasonal component of the variables. Some variables consist of short periods (7 days, 4 days, etc.) which are related to the short term components of the variables. In particular, the precipitation variable consists mostly of short periods and for this reason this variable contributes mostly to the short term component of the water discharge. Figures 2 and 3 show two examples of the periodograms for the temperature and the precipitation variable using the DZ (DiRienzo and Zurbenko [31]) algorithm in R software. From Figures 2 and 3 and Table 2, we can conclude that it is essential to decompose the time series into different components to avoid the contribution of different frequencies between the scales of the time series.
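As a rough illustration of how main periods such as those in Table 2 can be read from a periodogram, the sketch below computes a raw FFT periodogram of a synthetic daily series and reports its dominant period. It does not reproduce the DZ smoothing algorithm used in the study. The function name `dominant_period` and the synthetic data are assumptions made only for this example.

```python
import numpy as np

def dominant_period(x):
    """Return the period (in samples) of the largest raw periodogram peak."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the mean (zero frequency)
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    freqs = np.fft.rfftfreq(x.size, d=1.0)  # cycles per day for daily data
    peak = np.argmax(power[1:]) + 1       # skip the zero-frequency bin
    return 1.0 / freqs[peak]

# Synthetic daily series: an annual cycle plus noise over eight 365-day years.
rng = np.random.default_rng(0)
t = np.arange(8 * 365)
series = 10 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 1, t.size)
period = dominant_period(series)  # 365.0 days for this synthetic series
```

A series dominated by short periods, as the precipitation variable is, would instead show its largest peaks at high frequencies.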

For the decomposition of the time series, we use the $KZ_{33,3}$ filter (length 33 with 3 iterations) to provide a physically based explanation of the water discharge time series. Following Rao et al. [24], the parameters of the KZ filter are chosen to provide the optimal solution for our study. In particular, the parameters have been estimated in order to sustain the properties of each time series component and to reduce the short term variations displayed in the long term and seasonal components of the time series.

The KZ filter is applied to the logarithm of daily water discharge and produces a time series devoid of short term variations and consisting only of the long term variations of the time series. The same filter is applied to the variables of daily temperature, tide, wind speed, precipitation, and groundwater level.

2.2.2. Prediction of the Long Term Component of Water Discharge

To explain the long term component of water discharge and its relationship with groundwater level and climatic variables, we examine the filtered daily temperature, tide, wind speed, and the filtered sum of precipitation over four consecutive days. The analysis uses the long term components of the water discharge, temperature, tide, wind speed, precipitation, and groundwater level time series.

The correlation between the long term component of the water discharge and the precipitation is weak (Table 3). This can be verified by the peaks in the periodogram of the precipitation variable which occur over short periods (Table 2). Consequently, the precipitation variable contributes mostly to the short term component of the water discharge while groundwater level contributes mostly to the long term component of the water discharge (Table 3). The groundwater level is the only variable that consists of time periods greater than a year (991 days; Table 2). This period may be explained by larger time scale atmospheric episodes or from river flow changes due to anthropogenic impacts (e.g., dams, power projects, lock stations, bridges, etc.).

For the prediction of the long term component of water discharge, a linear regression is performed using the filtered logarithm of the water discharge, the filtered climatic variables, and the groundwater level time series, with an $R^2$ value of 0.833. The long term component of the logarithm of the water discharge time series is then given by this regression, expression (5). It can be noted that the $p$-values for the coefficients of the variables of the linear regression model are approximately zero. Thus, the coefficients of the linear regression are statistically significant. The scale of the residual term $\varepsilon(t)$ in expression (5) corresponds to percent change in the long term component of water discharge data due to effects other than the climatic variables.
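The regression step for a single component can be sketched as an ordinary least squares fit with an intercept, followed by the coefficient of determination. The data below are synthetic, and the function name `fit_linear`, the predictor names, and the coefficient values are hypothetical; they are not the estimates reported in expression (5).

```python
import numpy as np

def fit_linear(y, predictors):
    """Ordinary least squares with intercept; returns (coefficients, residuals, R^2)."""
    y = np.asarray(y, dtype=float)
    columns = [np.ones(y.size)] + [np.asarray(p, dtype=float) for p in predictors]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    # With an intercept the residuals have zero mean, so this equals 1 - SSE/SST.
    r2 = 1.0 - residuals.var() / y.var()
    return coef, residuals, r2

# Synthetic stand-ins for filtered predictors (assumed values, for illustration).
rng = np.random.default_rng(1)
groundwater = rng.normal(0, 1, 500)
temperature = rng.normal(0, 1, 500)
log_discharge = 2.0 - 0.5 * groundwater + 0.1 * temperature + rng.normal(0, 0.1, 500)

coef, residuals, r2 = fit_linear(log_discharge, [groundwater, temperature])
```

The same helper could be applied to the seasonal and short term components by passing the corresponding components of the predictors.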

From expression (5), we can conclude that 83.3% of the long term fluctuations of water discharge can be explained by long term groundwater level fluctuations and major climatic variables. If we were to consider additional climatic variables such as sea level, wind direction, or relative humidity, the coefficients related to those variables would not be statistically significant (the $p$-value associated with the $t$-test is greater than the significance level). For this reason, we did not consider additional climatic variables in the model.

Because expression (5) is given for the natural logarithm of water discharge, the additive term $\varepsilon(t)$ has a multiplicative effect, $e^{\varepsilon(t)}$, in the original data. Since the term $\varepsilon(t)$ is sufficiently small, we can obtain that

$$e^{\varepsilon(t)} \approx 1 + \varepsilon(t).$$

Therefore, $\varepsilon(t)$ corresponds to percent changes in the long term component of water discharge unexplained by the groundwater level and the climatic variables.

2.2.3. Prediction of the Seasonal Component of Water Discharge

The seasonal component of a time series represents the year-to-year fluctuations of the corresponding variable. In order to predict the seasonal component of the water discharge time series, we use seasonal components of the climatic variables and the groundwater level. In particular, the seasonal component of the water discharge, $Se(t)$, can be defined by

$$Se(t) = \frac{1}{n} \sum_{j=0}^{n-1} \left[ X(t + 365j) - L(t + 365j) \right], \quad t = 1, \ldots, 365, \qquad (8)$$

where $t$ represents the days of a year ($t = 1$ represents January 1, and so forth), $n$ is the number of years of the observed values, $X$ is the raw data of the time series of the water discharge, and $L$ is the long term component of the water discharge time series derived by the application of the KZ filter. Similarly, we define the seasonal components of the climatic variables and the groundwater level, that is, of the temperature, tide, wind speed, the sum of four continuous days of precipitation, and groundwater level variables. The maximum correlation between the water discharge and the precipitation variable occurs when we consider the sum of the precipitation variable over four days.

To investigate the relationship between the seasonal component of the water discharge and the climatic variables, we estimate their correlation matrix (Table 4). The correlation between the seasonal component of the water discharge and two of the climatic variables (temperature and tide) is strong as is the case with the groundwater level. Moreover, all examined variables have a period of 365 days (Table 2). The most correlated variable with the seasonal component of the water discharge is the seasonal component of tide (Table 4).

To predict the seasonal component of the water discharge, we perform linear regression using the seasonal components of the climatic variables and the groundwater level. The coefficient of determination, $R^2$, is equal to 0.912. In particular, the seasonal component of the logarithm of the water discharge is estimated by this regression, expression (9). Thus, 91.2% of the variability of the seasonal component of the water discharge can be explained by the climatic variables and the groundwater level as described in expression (9). The $p$-values for the coefficients of the variables of the linear regression are approximately zero. Thus, the coefficients of the linear regression are statistically significant. The addition of extra climatic variables such as sea level, wind direction, and relative humidity in expression (9) does not change the value of $R^2$.

2.2.4. Prediction of the Short Term Component of Water Discharge

For the prediction of the short term component of the water discharge, we consider the short term components of the climatic variables and groundwater level. The short term component of the water discharge time series, $Sh(t)$, can be defined as follows:

$$Sh(t) = X(t) - L(t) - Se(t), \qquad (10)$$

where $X(t)$ represents the raw data of the water discharge time series, $L(t)$ represents the long term component of water discharge, and $Se(t)$ is the seasonal component of the water discharge time series. Similarly, we can define the short term components of the remaining variables.
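The three-way decomposition described in this section can be sketched as follows: a KZ-filtered series as the long term component, a day-of-year average of the remainder as the seasonal component, and whatever is left as the short term component. This is an illustrative sketch only. The function names, the 365-day year (leap days ignored), and the assumption of a gap-free series covering at least one full year are simplifications made here, not details confirmed by the study.

```python
import numpy as np

def kz(x, window, iterations):
    """Iterated centered moving average; edges use the points available."""
    y = np.asarray(x, dtype=float)
    half = window // 2
    for _ in range(iterations):
        padded = np.pad(y, half, constant_values=np.nan)
        windows = np.lib.stride_tricks.sliding_window_view(padded, window)
        y = np.nanmean(windows, axis=1)
    return y

def decompose(x, window=33, iterations=3, year=365):
    """Split a daily series into long term, seasonal, and short term parts."""
    x = np.asarray(x, dtype=float)
    long_term = kz(x, window, iterations)
    remainder = x - long_term
    day_of_year = np.arange(x.size) % year
    # Average the remainder over all years for each day of the year.
    profile = np.array([remainder[day_of_year == d].mean() for d in range(year)])
    seasonal = profile[day_of_year]
    short_term = x - long_term - seasonal
    return long_term, seasonal, short_term

# Synthetic four-year daily series (assumed data, for illustration only).
rng = np.random.default_rng(2)
t = np.arange(4 * 365)
x = 0.001 * t + np.sin(2 * np.pi * t / 365) + rng.normal(0, 0.3, t.size)
long_term, seasonal, short_term = decompose(x)
```

By construction the three components add back to the original series, so a separate regression model can be fitted to each and the predictions summed. How much of the annual cycle ends up in the long term or the seasonal component depends on the chosen window length and number of iterations.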

For the short term component of the water discharge, we consider two different models. One describes the prediction of the short term component during the summer period (May through September) and the other during winter (December through March). We consider two different models because flooding in the rivers in New York State is caused by extensive precipitation (e.g., extensive rainfall or tropical storms during the summer period) or by rapid snowmelt (e.g., ice jams during the prolonged winter period). The overall explanation of the winter model is maximum during the prolonged winter period (December to March), while the summer model shows maximum explanation during the prolonged summer period (May to September).

2.2.5. Prediction of the Short Term Component of Summer Water Discharge

To predict the short term component of the water discharge time series, we perform a linear regression by using the short term components of daily temperature, precipitation, the sum of four continuous days of daily precipitation, and daily groundwater level. Table 5 shows the correlation matrix between the short term components of the variables for the summer period (May through September).

For the prediction of the short term component of the water discharge for the summer period, a linear regression is performed by using the short term components of the above climatic variables and the groundwater level, with coefficient of determination, $R^2$, equal to 0.447. Specifically, the short term component of the water discharge is expressed through this linear regression, expression (11). The $p$-values for the coefficients of the variables of the linear regression model are approximately zero.

2.2.6. Prediction of the Short Term Component of Winter Water Discharge

Flooding in the rivers during the winter period occurs through rapid snowmelt due to the increase of air temperature. For this reason, the average temperature is used for the prediction model in the linear regression. The maximum correlation between water discharge and average temperature occurs for the average temperature over four days. Moreover, for the prediction model, we consider the short term components of the following variables: tide, wind speed, groundwater level, and the sum of four days of precipitation. Table 6 shows the correlation matrix for the short term components of the variables.

To predict the short term component of the water discharge time series, we perform a linear regression by using the short term components of the above climatic variables, with resultant coefficient of determination, $R^2$, equal to 0.719. The short term component of water discharge is expressed through this regression, expression (12). The variables considered for the prediction of the short term component of water discharge during the winter period are different from those explaining the summer period. Therefore, the separation of scales of the time series is essential since different components are related to different physical phenomena (rapid snowmelt and “ice jam” for the winter period, and extensive rainfall or tropical storms for the summer period). The $p$-values for the coefficients of the variables of the linear regression model are approximately zero.

3. Results

A river can act as a gaining stream, receiving water from the groundwater system, or as a losing stream, losing water to the groundwater system. The water table’s height reflects a balance between the rate of replenishment, through precipitation, and removal through discharge and withdrawal. Any imbalance either raises or lowers the water table, with the opposite effect on the groundwater level value (because the groundwater level value is measured as the depth of the water level below the surface). In the summer period, the river water discharge of the Schoharie Creek increases as it flows downstream, as tributaries and groundwater contribute additional water (recalling that eight tributaries contribute to Schoharie Creek before it reaches the Mohawk River). During summer, groundwater resources appear to increase to ample levels (recharge). Natural replenishment will decrease the measured value of the groundwater level and enhance the river water discharge. Hence, it produces a negative association.

In regions where there is a prolonged winter period, the rainfall to replenish the water table is often scarce, and the rate of water table recharge will be less than the river’s water discharge. The groundwater will drop and may result in very low water level (depletion). Groundwater level value will increase (water level drops), and water discharge in the stream increases (primarily through snow melt), which produces a positive association.

Due to the separation of scales in the time series, the long term component of water discharge shows a negative correlation with the long term component of the groundwater level and temperature (Table 3), while the correlation between the seasonal components of those variables is positive (Table 4). A positive correlation has also been observed between the short term components of those variables during the winter (Table 6), while a negative correlation exists during the summer period (Table 5).

The negative correlation between the long-term components of the water discharge and the groundwater level is due to the increased rainfall during the last decade in Schoharie Creek area. This phenomenon has also been observed for the short term components during the summer period when precipitation is intense. A positive correlation between the water discharge and the groundwater level takes place during the prolonged winter period.

The correlation between the short term component of the water discharge and precipitation during the winter is lower than the short term components of those variables during the summer. This is because in areas that experience prolonged winter seasons, rainfall does not contribute to water table in the same rate as it would replenish the water table during the summer period.

To estimate the total explanation of the model for the water discharge of the summer period, we combine expressions (5), (9), and (11) to represent the seasonal and long and short term components (Table 7). The contribution of the long term component of the climatic variables and the groundwater level in the time series of the water discharge is 49% from expression (5). Furthermore, 16.2% is the contribution from the seasonal component using expression (9), while 10.4% is the contribution of the short term component from expression (11). By combining expressions (5), (9), and (11), in a similar way to expression (1), we can then explain 75.6% of the total variance of the water discharge time series using the climatic variables and groundwater level during the summer period. Figure 4 shows the raw data of the time series of the water discharge data (blue line) along with the prediction model (purple line) derived by expressions (5), (9), and (11) for the year 2006 (summer flood on June 29; widespread flooding in the Mohawk and Hudson basins and the Catskills was observed). Similar graphs can be presented for the remaining years during the summer period.

By combining expressions (5), (9), and (12), we can estimate the total explanation of the water discharge time series from climatic variables for the winter period (Table 7). In particular, 44.2% is the contribution of the long term components of the climatic variables and groundwater level to the time series of the water discharge as described in expression (5). 14.8% is the contribution of the seasonal components of the climatic variables as described in expression (9), while 22.1% is the contribution of the short term component by expression (12). Consequently, 81.1% is the total explanation of the water discharge time series using the climatic variables and the groundwater level during the winter period. As an example, Figure 5 shows the raw water discharge time series data (blue line) for the year 2010 along with the prediction model (purple line) described by expressions (5), (9), and (12). Our model accounts for the “ice jam” event of 2010 (January 25-26; widespread flooding occurred across east central New York and adjacent western New England from a combination of rain, snowmelt, and frozen ground).

As a consequence of summer and winter model results, we prove that the decomposition of the time series improves our ability to describe and predict the time series variations of water discharge by approximately two times. In particular, the unexplained variance derived by the raw data described in expression (4) is 41%, while the unexplained variance derived from the decomposition method of the winter water discharge time series is 19%. Similar results can be derived for the summer period providing an unexplained variance of 24.4%. This method can be applied in other locations as well. In such cases, the coefficients associated with the above expressions will be different.
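The arithmetic behind these totals can be checked directly from the percentages quoted in the text. The short sketch below sums the component contributions for each season and compares the unexplained variance with that of the raw-data regression; the variable and function names are chosen here for illustration.

```python
# Percent of total variance explained by each component, as quoted in the text.
summer = {"long term": 49.0, "seasonal": 16.2, "short term": 10.4}
winter = {"long term": 44.2, "seasonal": 14.8, "short term": 22.1}
raw_explained = 59.0  # raw-data regression, expression (4), R^2 = 0.590

def unexplained(contributions):
    """Percent of variance left unexplained after summing the contributions."""
    return round(100.0 - sum(contributions.values()), 1)

summer_unexplained = unexplained(summer)            # 24.4
winter_unexplained = unexplained(winter)            # 18.9
raw_unexplained = round(100.0 - raw_explained, 1)   # 41.0

# Factor by which the unexplained variance shrinks relative to the raw data.
winter_ratio = raw_unexplained / winter_unexplained  # about 2.2
summer_ratio = raw_unexplained / summer_unexplained  # about 1.7
```

The winter ratio is slightly above two and the summer ratio slightly below, which is consistent with the statement that the improvement is approximately twofold.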

4. Discussion

This study focuses on predicting the daily water discharge time series using available climatic variables and the groundwater level. We prove that the decomposition of the time series is essential due to the presence of short term variations in the time series. We use the KZ filter to decompose the time series into the seasonal and long and short term components to provide a physical based explanation for the time series of the water discharge. The long term component is associated with long term changes, the seasonal component with year-to-year fluctuations, and the short term component with short term variations.

In this study, we show that the seasonal and long and short term components of water discharge can be explained through consideration of climatic variables and groundwater level. The results show that as water discharge increases in the long term the water table replenishes, while in the seasonal term it depletes. In the short term, the groundwater drops during the winter period and rises during the summer period. The short term component of the water discharge is related to synoptic weather fluctuations and short term effects of rain, storms, and cyclones during the summer period, and to the rapid increase in temperature as well as storms during the winter period. As described in expressions (5), (9), (11), and (12), the selection of variables and coefficients of the multivariate models is different for the explanation of the seasonal and long and short term components of the water discharge time series. This requires the separation of the different scales of the water discharge time series in order to avoid erroneous results [14–16].

After the application of the KZ filter and the decomposition of all time series, we apply different multivariate models to each component of the water discharge time series. Different scales of time series are associated with different correlation structures (Tables 3–6) and provide different coefficients for the multivariate models [14–16]. Furthermore, for the explanation of the short term component of water discharge, we design two different models. We use different variables to explain the short term component during the winter and summer period and, as a result, we need to use two different multivariate models for the explanation of the short term component of water discharge (Tables 5 and 6). This is because, in this specific location, the local climate is controlled mainly by two prolonged seasons instead of the four that characterize a typical temperate climate.

In our paper, we show that the explained variance of the water discharge time series can be increased up to 81.1% for the winter period and 75.6% for the summer period by incorporating the KZ filter in a separation of the scales of the time series [17]. Moreover, the coefficient of determination of the water discharge time series is very strong for all scales of water discharge time series, and it exceeds that of the raw data. This model can be used for the prediction of critical levels of water discharge ahead of time, as long as the other variables are available prior to the event, and it is applicable to other locations. Due to the variability of local hydrological and climatic characteristics, other locations would likely produce different numeric values but maintain the relative improvements in prediction accuracy.

5. Conclusions

The prediction of the daily water discharge time series using the climatic variables and groundwater level can be substantially improved through the decomposition of the time series. The decomposition of the different components (scales), which are the seasonal and long and short term components, avoids erroneous results and approximately doubles the prediction accuracy of the water discharge time series relative to raw data. The resulting isolation of the short term variations by the decomposition of the time series shows a summer period with a water table replenishment and a winter period with a water table depletion. The design of multivariate models (winter and summer) can improve the prediction of flooding caused by storms, rapid snowmelt, and ice jams.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.