Abstract

Filtered renewal processes are used to forecast daily river flows. For these processes, contrary to filtered Poisson processes, the time between consecutive events is not necessarily exponentially distributed, which is more realistic. The model is applied to obtain one- and two-day-ahead forecasts of the flows of the Delaware and Hudson Rivers, both located in the United States. Better results are obtained than with filtered Poisson processes, which are often used to model river flows.

1. Introduction

Let be a Poisson process with rate . A filtered Poisson (sometimes called shot noise) process is a continuous-time stochastic process defined by in which the random variables denote the arrival times of the events of the Poisson process and are assumed to be independent and identically distributed random variables that are also independent of . The function is called the response function.

In many applications, the response function is chosen of the form where is a parameter that must be estimated. It then gives the value at time of an event of magnitude of the Poisson process that occurred at time . Moreover, the random variables are generally assumed to be exponentially distributed with parameter . With the above response function, the filtered Poisson process behaves as in Figure 1. Actually, this behavior depends on the form of the response function, but not on the distribution of the time between the events. Therefore, the same behavior would be observed in the case when is a renewal process. Remember that a Poisson process is a particular renewal process.

To model the daily flows of rivers given their most recent observed values, conceptual and physical models based on the filtered Poisson process have been widely used successfully for many decades; see, for example, Weiss [1], Kelman [2], Koch [3], and Konecny [4]. Filtered Poisson processes are still being used to model various phenomena in civil engineering; see Yin et al. [5] and Miyamoto et al. [6].

Now, especially in hydrological applications, the form of the response function in (2) is taken for granted rather than being justified by the observations of the variable of interest. Actually, it is generally simply an assumption made to obtain a mathematically tractable model.

Similarly, the actual distribution of the ’s is not investigated. However, if one is only interested in forecasting the value of , based on the values of the process up to time , this does not really cause a problem because the forecast is normally based on the mean of the ’s, and the distribution itself is not needed.

Finally, and even more importantly, the assumption that is indeed a Poisson process is not tested. Again, this is a simplifying assumption, because one can then make use of the nice properties of the Poisson process, notably the fact that it has independent and stationary increments.

Next, notice that, with the response function in (2), the effect on the value of the process of an event that occurred at time is maximum at that time instant and has discontinuities at the arrival times of the events of the Poisson process. In the case of the application in hydrology that consists in modeling the flow of rivers, if one looks at real hydrographs, one sees that there is a more or less extended period of time during which the flow increases from a minimum to a maximum. Even if river flows data are generally available on a daily basis, one does not observe very sudden increases of the flow, followed by an exponential decrease. Rather, it takes on average two to three days for the flow to reach a maximum, and the decrease is more or less rapid.

To obtain a more realistic model for the variations of river flows, by taking into account the periods of flow increase, Lefebvre and Guilbault [7] used the following response function: where is a positive parameter. With this response function, the river flow begins to increase immediately after time , but the maximum is reached after time units (i.e., generally days) and then it starts to decrease. Their work was improved by the authors, who provided a method to estimate the parameter . They found that, for the applications that they considered, was in the interval (0,1).

With a Poisson process being a continuous-time Markov chain, the time that the process spends in a given state is exponentially distributed, which is a strictly decreasing density function. Let be the times between two successive events, that is, the interarrival times. In the case of river flows, the interarrival time is generally a random variable with a density function that is increasing at first. Hence, the assumption that is a Poisson process is at least doubtful.

In Lefebvre [8], the author tested the hypothesis that the ’s are exponentially distributed. He found that, for the Delaware River, it was not accurate. Instead, it was shown that both a gamma and a Rayleigh distribution fitted the data much better. The objective was to forecast the peak flows of the Delaware River, which is a difficult task due to the very weak correlation between the successive peaks. To try to forecast these peak flows, the author used a filtered renewal process as a model for the river flow. An advantage in using a Rayleigh distribution for the interarrival time is that the filtered renewal process can be transformed into a filtered Poisson process by working on a different time scale.

Poisson and renewal processes have been used in hydrology to model low flows (see, for instance, Loaiciga and Leipnik [9] and Yagouti et al. [10]) as well as various physical phenomena such as traffic noise (see Marcus [11]). Andersen [12, 13] gave new theoretical results on filtered renewal processes (see also the references therein).

The aim of the present paper is to show that we can obtain more accurate short-term forecasts of river flows by making use of filtered renewal processes rather than filtered Poisson processes. To do so, we will first need to develop a formula to forecast the flow at time , given the history of the process up to time . Because we cannot appeal to the memoryless property of the exponential distribution, the task of finding an estimator for is more difficult. The values of the river flow before time must be taken into account, contrary to the case of filtered Poisson process, for which the forecasts depend only on .

Moreover, because we want to clearly see the improvement obtained by using a more realistic distribution for the interarrival time, we will take the response function defined in (2). Indeed, as mentioned above, filtered Poisson processes having this response function are commonly used in hydrology, and the forecasting formulas are then easy to derive and implement. Therefore, it will be easier to judge the quality of the model that we propose.

In the next section, filtered renewal processes will be formally defined and the formulas needed to forecast the river flows will be developed. Then, in Section 3, the results will be applied to forecast the flows of the Delaware and Hudson Rivers. Finally, a few concluding remarks will be made in Section 4.

2. River Flows Modeled as a Filtered Renewal Process

Let denote the flow of a certain river at time . We assume that in which is a renewal process. The ’s are the arrival times of the events of the process , and the times between the successive renewals are independent and identically distributed random variables. Finally, the (positive) random variables are independent and identically distributed (and independent of ).

Thus, the process is characterized by the sudden increases of the river flow caused by the events that occur at times . A renewal cycle is defined as the time between two consecutive peaks. For the interarrival time, the following distributions (in addition to the exponential distribution) will be considered in Section 3: lognormal, gamma, Weibull, and Rayleigh.

Next, in order to be able to forecast the river flow at time , we must first estimate the parameter that appears in the response function. If there are no signals between and , then is given by

Hence, to estimate the parameter , we can consider all the days where the flow decreased in the data set and calculate the ratio . The point estimate of is then derived from the arithmetic mean of this ratio:

In the case of the random variables , which constitute a random sample of a parent random variable , their exact distribution is not needed. Indeed, only the expected value of will be used to forecast the value of . Therefore, we simply have to compute the mean increase of the river flow during the calibration period of the model.

We will now derive formulas to estimate the river flow one and two days in advance. These formulas can be used with any distribution for the interarrival time , which denotes the parent variable of the ’s.

2.1. Forecasting the Flow at Time

We will first try to forecast the river flow at time , given all the information available up to . To do so, we assume that there will be at most one event in the interval . At any rate, as mentioned above, in practice flow values are generally available on a daily basis. Therefore, it is not really possible to determine whether more than one event occurred in . When it does happen, these events can be considered as a single large event.

Apart from the case when the interarrival time has an exponential distribution, we must take the history of the process into account to calculate the probability that there will indeed be an event in . Let That is, is the time elapsed between the and the event after the most recent one that occurred before or at time . The random variables and are identically distributed. We must compute where is the (integer) number of days since the most recent signal was observed. We have With the help of a mathematical software, we can obtain numerical values of this probability for any density function and any value of .

Next, if is exponentially distributed, it is well known that, given that there is a signal (i.e., an event) between and , the conditional density of is uniform on the interval , so that the signal in question occurred on average at time . In the general case, we must compute the conditional density function of in the interval , given that .

If there is indeed a signal between and , the forecast of the river flow at time would be Therefore, the forecast of the river flow at time , given the entire history of the process up to , is given by

2.2. Forecasting the Flow at Time

Let . To forecast the river flow at time , given all the information available up to time , we need the distribution of the random variable , where is defined in (7).

We now assume that the probability that there will be more than two signals in the interval is negligible (irrespective of the value of ). The density function of is given by the convolution of the density functions of and : Since is a nonnegative random variable, we can write that

In the case when has an exponential distribution with parameter , we know that has a gamma distribution with parameters and . In the other cases, we must generally evaluate the above integral numerically.

The value of is given bywhere denotes the size of the signal in the interval . As mentioned above, we assume that

We must first compute When has an exponential distribution with parameter , In general, we have

Next,

Alternatively, using the fact that and are independent random variables, we may write that

Finally, by independence,

The forecast of the river flow at time , given the entire history of the process up to time , is thus given by

3. Forecasting the Flows of the Delaware and Hudson Rivers

To assess the quality of the forecasts obtained with the renewal filtered process, we need to apply the formulas developed in the previous section to real-life data. As in [7], we chose the Delaware and Hudson Rivers, located in the United States, at the Montague (01438500) and the North Creek (01315500) gage stations, respectively. The flow values are available on the Internet at the following address: http://nwis.waterdata.usgs.gov. To calibrate the model, we used the data from October 2008 to September 2009.

Remarks. (i) When one is dealing with real-life data, things are not as simple as the mathematical models suggest. Looking at the actual hydrograph of the Delaware River, we observe a number of weak peaks that occur while the flow is decreasing and for which the increasing period lasts only one day before the flow resumes its decline. Similarly, there are sometimes small increases of the flow value occurring just after a minimum was observed that last only one day. In building the data set, there is always a subjective part. We decided to neglect these weak peaks, thus considering only the peaks that appear quite clearly in the hydrograph.

(ii) Because the river flow is observed on a daily basis, we must discretize the set of possible values taken by the interarrival time . We computed the number of days elapsed between consecutive peaks in the data set. In the case of the first observed peak, is the number of days elapsed since the beginning of the calibration period. Then, is obtained by subtracting the arrival times of the consecutive peaks. Notice that is greater than or equal to 1.

We first consider the Delaware River. For this river, we found that the average value of the interarrival times in the data set is equal to 6.8889 days. Moreover, the standard deviation of the observations is given by days. Based on these values, one may at once conclude that is very unlikely to be exponentially distributed. Indeed, remember that the mean and the standard deviation of an exponential distribution with parameter are both equal to .

Next, the value of the point estimate of the (unitless) parameter (see (6)) is , and the average value of the magnitude of the signals is given by cubic feet per second.

For the distribution of the random variable , we tested the following models:(i)an exponential distribution with parameter ,(ii)a lognormal distribution with parameters and ,(iii)a gamma distribution with shape parameter and scale parameter ,(iv)a Weibull distribution with shape parameter and scale parameter ,(v)a Rayleigh distribution with parameter .

Table 1 summarizes the results of the chi-square goodness-of-fit tests for each distribution and gives the respective parameter estimates. We find that, as expected, the exponential distribution is rejected, whereas the other distributions are all accepted, according to the -values of the tests. The empirical cumulative distribution function as well as the distribution functions of the various models considered above is shown in Figure 2. We clearly see that the exponential distribution does not fit the data nearly as well as the other four models.

Now, in the case of the filtered Poisson process (1) with the response function defined in (2), to estimate the river flow at time we compute the expected value of given all the past observations. Because of the memoryless property of the exponential distribution, we find that this conditional expectation only depends on the most recent value of the flow, that is, . We obtain (see [7]) that where

Similarly, to estimate the river flow at time , we only need to calculate where

Moreover, we have that where denotes the correlation coefficient of and .

By making use of these three equations, we can estimate the parameter of the Poisson process, the parameter of the assumed exponential distribution of the ’s, and the parameter in the response function. Actually, we do not need point estimates of and to forecast the flow. From (27), we deduce that we can simply estimate the ratio that appears in both (24) and (26) by the mean value of the observed data when the process is in steady state. Finally, to estimate the parameter , we first compute the empirical correlation coefficient between the flow values at times and and then we set (see (29)). With the value of being close to 1, the previous formula is well defined.

Remark. If one is only interested in forecasting the river flow at time , one may estimate by computing the empirical value of and making use of (29).

We are now ready to compare the forecasts derived from both the filtered Poisson process and the filtered renewal process with the various distributions considered for the interarrival time . We used the formulas given above to forecast the flow of the Delaware River for the 89-day period from February to April 2010.

To assess the accuracy of the forecasts, we consider four criteria commonly used in hydrology: the mean absolute percentage error (Mape), the Nash criterion (Nash), the peak criterion (Pc), and the correlation coefficient (Corr) between the observed and forecasted values.

The Nash criterion is based on the mean square error and is used to assess the predictive power of hydrological models. It evaluates the quality of the forecasts from the differences between the expected and observed daily values. It is equal to 1 in case of a perfect fit. For its part, the peak criterion is used to measure the quality of the forecasts during the critical peak period. The closer to 0 it is, the better the forecasts are.

Table 2 displays the results obtained with the two competing models. Looking at these results, we notice that the Mape, Nash, and Corr criteria yield practically the same values for the filtered renewal and filtered Poisson processes. However, the filtered renewal process did much better than the filtered Poisson process when we consider the peak criterion. This criterion is especially important because accurate forecasts are really needed during the peak period.

We now turn to the Hudson River. The value of the point estimate of the constant for this river is and the average magnitude of the signals is , which is much smaller than in the case of the Delaware River.

Proceeding as above, we performed chi-square goodness-of-fit tests for the various distributions considered. The results are presented in Table 3 (see also Figure 3). This time, we see that both the exponential and Rayleigh distributions must be rejected, while the lognormal distribution clearly provides the best fit to the data.

The values of the four criteria used to compare the models are shown in Table 4. We discarded the Rayleigh distribution, because it yielded a very bad fit to the data. As in the case of the Delaware River, the values of the Nash and Corr criteria are quite similar, while the important peak criterion and the Mape criterion are much smaller for the filtered renewal process than for the filtered Poisson process.

Finally, we present the observed and forecasted values of the flows of the Delaware and Hudson Rivers at times and in Figures 4, 5, 6, and 7, based on the filtered Poisson process (FPP) and the filtered renewal process (FRP) with a lognormal distribution for the interarrival time . Notice that the filtered renewal process is generally better than the filtered Poisson process at forecasting high flow values, which is our main concern.

4. Conclusion

In this paper, we were able to significantly improve the short-term hydrological forecasts produced by filtered Poisson processes by using a more realistic distribution for the interarrival time of the signals, thus working with filtered renewal processes. As we saw in the previous section, the filtered Poisson process, despite its lack of realism, is able to yield reasonable short-term forecasts. One possible explanation is the fact that the forecasting formulas derived from this model are very easy to implement. Also, contrary to the filtered renewal process, it does not require subjective decisions to estimate its various parameters. However, the filtered Poisson process was not able to produce forecasts as accurate as those derived from the filtered renewal process during the important peak period.

In addition to being used to forecast flow values, the models considered in this paper can also give us an estimate of the probability that the river flow will exceed a certain threshold in the next few days. This threshold may be a value that corresponds to a flow above which the risk of flooding is very high.

Finally, we could use the response function in (3) to further improve the forecasts. To do so, we would have to be able to estimate the parameter in that response function and then obtain formulas that generalize the ones in (11) and (22), which is probably not an easy task.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors want to thank the anonymous reviewers for the constructive comments.