Abstract

A novel approach for a Poisson cluster stochastic rainfall generator was validated in terms of its ability to reproduce important rainfall and watershed response characteristics at 104 locations in the United States. Unlike traditional Poisson cluster rainfall modeling approaches, the suggested approach, The Hybrid Model (THM), can account for the interannual variability of rainfall statistics. THM and a traditional Poisson cluster rainfall model (the modified Bartlett-Lewis rectangular pulse model) were compared in their ability to reproduce the characteristics of extreme rainfall and of watershed response variables such as runoff volume and peak flow. The results indicate that THM generally outperforms the traditional approach in reproducing the distributions of peak rainfall, peak flow, and runoff volume. In addition, THM significantly outperformed the traditional approach in reproducing extreme rainfall, by 2.3% to 66%, and extreme flow values, by 32% to 71%.

1. Introduction

Stochastic rainfall generators provide synthetic rainfall input to hydrologic simulation models whenever observed data of sufficient length are not available. Because they support the Monte Carlo simulation approach by supplying rainfall time series of arbitrary length to hydrologic simulation models, they are used extensively to assess the risks associated with hydrologic systems. Poisson cluster stochastic rainfall generation models [1, 2] are considered to be the most robust and practical stochastic rainfall generators because their model structure reflects the seasonal and climatological features of rainfall generating mechanisms well [3]. The performance of Poisson cluster rainfall models in reproducing conventional rainfall statistics such as the mean, variance, autocorrelation, and probability of dry periods has been well validated at various geographical locations across the world [4–11]. For this reason, Poisson cluster rainfall models have been applied in a wide range of hydrological risk assessments dealing with flooding (e.g., [12]), drought (e.g., [13]), contaminant transport (e.g., [14]), and ecosystem behavior (e.g., [15]).

The Poisson cluster model has been steadily improved since Rodriguez-Iturbe et al. [1] proposed the original model structure and parameter calibration scheme. For example, Rodriguez-Iturbe et al. [2] introduced an additional parameter that accounts for the storm-to-storm variability of rain cell duration. Cowpertwait [17, 18] derived analytical expressions for the probability of a dry h-hour period and for the skewness of a synthetically generated rainfall time series, respectively, which can then be used to calibrate the model parameters. Velghe et al. [19] and Onof et al. [20] replaced the one-parameter exponential distribution with the two-parameter gamma distribution to represent the distribution of rain cell intensity more accurately. Cowpertwait [17] also proposed a model that accounts for rainfall processes in which convective and stratiform events occur simultaneously.

While most previous studies of Poisson cluster rainfall generators tried to improve model performance by modifying the fundamental assumptions of the model structure, Kim et al. [16, 21] indicated that performance can be improved not only by modifying the model structure but also by providing the model with more information about the rainfall process. In particular, they suggested that the interannual variability of rainfall is closely associated with extreme rainfall events and that the conventional calibration scheme of a Poisson cluster rainfall model cannot account for this variability. They proposed an approach that takes this variability into account, termed The Hybrid Model (THM). THM successfully reduced the systematic bias in extreme rainfall and flow values found in synthetic rainfall time series generated by the traditional Poisson cluster approach. However, the model was tested at only 11 rain gauges across the United States (black squares in Figure 1) for 12 calendar months (132 months in total), so a general conclusion about the advantage of THM over the traditional Poisson cluster approach can be drawn only after it has been verified under a wider range of rainfall characteristics. For this reason, the present study tested THM at an additional 104 locations across the United States. The results are expected to extend the applicability of THM, especially to climatic regions that are not represented by the 11 gauges analyzed by Kim et al. [16, 21].

2. Methodology

2.1. Data Description

A total of 104 months of precipitation data observed at 104 US National Climatic Data Center (NCDC) [22] precipitation gauges (one calendar month per gauge) across the contiguous United States (gray circles in Figure 1) were used in the analysis. The gauges were drawn at random from all NCDC rain gauges with at least 50 years of record, to ensure that the computed rainfall statistics represent the rainfall characteristics at each gauge well.

2.2. Modified Bartlett Lewis Rectangular Pulse (MBLRP) Model

THM shares its fundamental model structure with the Modified Bartlett-Lewis rectangular pulse (MBLRP) model [2]. In the MBLRP model, a rainfall time series is represented as a sequence of storms composed of rain cells (see Figure 2). In the model, X1 is a random variable representing the storm arrival time, which is governed by a Poisson process with parameter $\lambda$; X2 is a random variable representing the duration of storm activity (i.e., the time window after the beginning of the storm within which rain cells can arrive), which follows an exponential distribution with parameter $\gamma$; X3 is a random variable representing the rain cell arrival time within the duration of storm activity, which is governed by a Poisson process with parameter $\beta$; X4 is a random variable representing the duration of the rain cells, which follows an exponential distribution with parameter $\eta$ that is itself a random variable following a gamma distribution with parameters $\alpha$ and $\nu$; and X5 is a random variable representing the rain cell intensity, which follows an exponential distribution with mean $\mu_x$. From a physical viewpoint, $\lambda$ is the storm arrival rate (the expected number of storms per unit time), $\gamma$ is the inverse of the expected duration of storm activity, $\beta$ is the arrival rate of rain cells within the duration of storm activity, $\eta$ is the inverse of the expected duration of the rain cells, and $\mu_x$ is the average rain cell intensity. Parameters $\alpha$ [dimensionless] and $\nu$ do not have a clear physical meaning, but the expected value and variance of $\eta$ can be expressed as $E[\eta] = \alpha/\nu$ and $\mathrm{Var}[\eta] = \alpha/\nu^2$. The model therefore has six parameters, $\lambda$, $\gamma$, $\beta$, $\alpha$, $\nu$, and $\mu_x$; however, it is customary to use the dimensionless ratios $\kappa = \beta/\eta$ and $\phi = \gamma/\eta$ as parameters instead of $\beta$ and $\gamma$. The model parameters are estimated by matching statistics of the simulated and observed rainfall time series. Commonly used statistics are the mean, variance, probability of zero rainfall, and lag-s autocovariance of the precipitation depth at various time scales [8, 23].
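The storm and cell structure described above can be made concrete with a small simulator. The following is a minimal Python sketch, not the implementation used by the authors. It assumes that the gamma distribution of $\eta$ uses shape $\alpha$ and rate $\nu$ (consistent with $E[\eta] = \alpha/\nu$), that one cell starts at each storm origin as in the usual Bartlett-Lewis formulation (so a storm has $1 + \kappa/\phi$ cells on average), and that storms starting before the simulation window are ignored. The function name `simulate_mblrp` and the parameter values in the example are illustrative only.

```python
import numpy as np


def simulate_mblrp(lam, mu_x, alpha, nu, kappa, phi, hours, rng=None):
    """Hourly rainfall depths [mm] from a Modified Bartlett-Lewis process (sketch).

    lam        storm arrival rate [1/h]
    mu_x       mean rain cell intensity [mm/h]
    alpha, nu  shape and rate of the gamma distribution of eta
    kappa, phi dimensionless ratios beta/eta and gamma/eta (both assumed > 0)
    """
    rng = np.random.default_rng(rng)
    depth = np.zeros(hours)
    t = rng.exponential(1.0 / lam)  # first storm origin; earlier storms are ignored
    while t < hours:
        eta = rng.gamma(alpha, 1.0 / nu)    # storm-specific cell-duration parameter
        beta, gam = kappa * eta, phi * eta  # cell arrival rate, storm-activity parameter
        activity_end = t + rng.exponential(1.0 / gam)
        # One cell starts at the storm origin; further cells arrive as a
        # Poisson process with rate beta until the storm-activity window closes.
        starts = [t]
        s = t + rng.exponential(1.0 / beta)
        while s < activity_end:
            starts.append(s)
            s += rng.exponential(1.0 / beta)
        for c0 in starts:
            c1 = c0 + rng.exponential(1.0 / eta)  # cell duration ~ Exp(eta)
            x = rng.exponential(mu_x)             # cell intensity ~ Exp(mean mu_x)
            # Spread the rectangular pulse over every hourly bin it overlaps.
            for h in range(int(c0), min(int(np.ceil(c1)), hours)):
                overlap = min(c1, h + 1) - max(c0, h)
                if overlap > 0:
                    depth[h] += x * overlap
        t += rng.exponential(1.0 / lam)  # next storm origin
    return depth


# Illustrative (uncalibrated) parameters for a 30-day month.
june = simulate_mblrp(lam=0.02, mu_x=2.0, alpha=4.0, nu=1.0,
                      kappa=0.5, phi=0.1, hours=720, rng=1)
print(june.mean(), np.mean(june == 0.0))
```

Aggregating each rectangular pulse into hourly bins keeps the output directly comparable with the hourly statistics used for calibration.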

2.3. The Hybrid Model (THM)

The fundamental idea of THM arose from the observation that the rainfall statistics considered in the conventional approach to Poisson cluster rainfall modeling do not carry enough information about the variability of the rainfall process. In particular, the rainfall statistics used in the calibration process of the conventional approach are typically calculated over the entire period of a rainfall time series, overlooking the fact that they vary from year to year. See, for example, Figure 3. Each plot in the figure shows the monthly variation of an hourly rainfall statistic (mean, variance, lag-1 autocorrelation, and probability of zero rainfall) at gauge TX-4300 (star in Figure 1). Thirty-one years of continuous hourly rainfall data (1976–2005) were used to generate the plots. Small dots in each plot of Figure 3 represent the rainfall statistics of a given month in a given year (e.g., the hourly mean rainfall of June 1977). In contrast, hollow circles connected by a solid line represent the rainfall statistics of a given month over the entire recording period (e.g., the hourly mean rainfall of June between 1976 and 2005). The interannual variability of the rainfall statistics (the vertical spread of dots in each plot of Figure 3) is significant. The existing framework of Poisson cluster rainfall models uses only the long-term average statistics (hollow circles in Figure 3), which ignore this interannual variability. For this reason, a rainfall time series generated by the conventional approach cannot reflect the interannual variability of rainfall. Because hydrologically important events such as floods and droughts are associated with statistics that lie far from their long-term means, it is critical that stochastic rainfall models include an algorithm that accounts for interannual statistical variability. Kim et al. [16, 21] proposed an algorithm that resolves this issue by adding a step that models the interannual variability of rainfall statistics to the conventional Poisson cluster rainfall simulation.
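The contrast between per-year ("short-term") and whole-record ("long-term") statistics illustrated in Figure 3 can be computed as follows. This Python sketch uses synthetic data rather than the TX-4300 record, and it assumes that the long-term statistics are computed from the pooled record of the calendar month; the helper names `hourly_stats` and `interannual_vs_longterm` are hypothetical.

```python
import numpy as np


def hourly_stats(x):
    """Mean, variance, lag-1 autocorrelation and probability of zero rainfall."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d * d)
    ac1 = np.sum(d[:-1] * d[1:]) / denom if denom > 0 else np.nan
    return {"mean": x.mean(), "var": x.var(), "ac1": ac1, "p0": np.mean(x == 0.0)}


def interannual_vs_longterm(years):
    """years: list of 1-D arrays of hourly depths, one per year, same calendar month.

    Returns per-year (short-term) statistics and pooled-record (long-term) statistics.
    Pooling by concatenation is an assumption of this sketch; it adds one spurious
    lag-1 pair at each year boundary.
    """
    per_year = [hourly_stats(y) for y in years]
    pooled = hourly_stats(np.concatenate(years))
    return per_year, pooled


# Synthetic example: 30 Junes (720 h each) whose wetness differs from year to year.
rng = np.random.default_rng(0)
years = []
for _ in range(30):
    p_wet = rng.uniform(0.02, 0.15)  # year-specific fraction of wet hours
    wet = rng.random(720) < p_wet
    years.append(np.where(wet, rng.exponential(2.0, 720), 0.0))

per_year, pooled = interannual_vs_longterm(years)
means = np.array([s["mean"] for s in per_year])
print(f"long-term mean {pooled['mean']:.3f} mm/h; "
      f"yearly means {means.min():.3f} to {means.max():.3f} mm/h")
```

The single long-term value sits inside a wide band of yearly values; that band is the information the conventional calibration discards.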

The modeling frameworks of the traditional approach and THM are compared in Figure 4. While the traditional approach generates a rainfall time series for a long period (e.g., 1000 months) based on long-term rainfall statistics (namely, statistics computed over the entire observed record), THM generates rainfall time series for short periods (e.g., 1 month) based on short-term rainfall statistics (namely, statistics corresponding to one calendar month of a single year).

THM first simulates the short-term rainfall statistics. Here, short-term statistics are statistics computed for a period of one calendar month. For example, if a rainfall time series for January is to be generated 100 times, THM first generates 100 sets of short-term rainfall statistics. The simulated short-term statistics are the mean rainfall at the 1-hour accumulation level and the variance, lag-1 autocorrelation coefficient, and probability of zero rainfall at the 1-, 3-, 12-, and 24-hour accumulation levels (13 statistics in total).
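For concreteness, the 13 statistics can be computed from one month of hourly data by aggregating the series into non-overlapping 1-, 3-, 12-, and 24-hour totals. In this Python sketch the statistic names (MEAN1, VAR3, AC12, PROB24, and so on) follow the text, while the aggregation rule (non-overlapping blocks, incomplete trailing block dropped) and the use of the population variance are assumptions. Only the hourly mean is needed, because the mean at a k-hour level is simply k times the hourly mean.

```python
import numpy as np

LEVELS = (1, 3, 12, 24)  # accumulation levels [h]


def aggregate(hourly, k):
    """Non-overlapping k-hour totals; an incomplete trailing block is dropped."""
    x = np.asarray(hourly, dtype=float)
    n = len(x) // k * k
    return x[:n].reshape(-1, k).sum(axis=1)


def lag1_autocorrelation(x):
    d = x - x.mean()
    denom = np.sum(d * d)
    return float(np.sum(d[:-1] * d[1:]) / denom) if denom > 0 else float("nan")


def thm_statistics(hourly):
    """The 13 short-term statistics of one month of hourly rainfall depths [mm]."""
    stats = {"MEAN1": float(np.mean(hourly))}
    for k in LEVELS:
        a = aggregate(hourly, k)
        stats[f"VAR{k}"] = float(a.var())  # population variance (assumed)
        stats[f"AC{k}"] = lag1_autocorrelation(a)
        stats[f"PROB{k}"] = float(np.mean(a == 0.0))
    return stats


# Toy month: 30 days with one 24-mm hour at the start of each day.
x = np.zeros(720)
x[::24] = 24.0
s = thm_statistics(x)
print(sorted(s))
```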

The 13 rainfall statistics are generated in the following sequence, which is also depicted in Figure 5.
(1) Randomly draw the mean (MEAN1) and the lag-1 autocorrelation coefficient (AC1) at the hourly accumulation level from predetermined normal and gamma distributions, respectively. Figure 6 shows a sample histogram and the fitted distributions at gauge TX-4300 for the month of June.
(2) Based on the randomly drawn hourly mean rainfall (MEAN1), generate the variance (VAR1 = STDEV1²) and the probability of zero rainfall (PROB1) at the hourly accumulation level using the relationships between the variables identified through linear regression analysis. Figure 7 shows these relationships and the least-squares regression lines.
(3) Based on the generated VAR1 and PROB1, generate the variance and probability of zero rainfall at the 3-, 12-, and 24-hour accumulation levels (VAR3, VAR12, VAR24, PROB3, PROB12, and PROB24, respectively) using relationships identified through linear regression analysis. Figure 8 shows these relationships and the least-squares regression lines.
(4) Based on the AC1 drawn in step (1), generate the autocorrelation coefficients at the 3-, 12-, and 24-hour accumulation levels (AC3, AC12, and AC24, respectively) using relationships identified through linear regression analysis. Figure 6 shows these relationships and the least-squares regression lines.
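The four-step sequence above can be sketched in Python as follows. All distribution parameters and regression coefficients are hypothetical placeholders, since the fitted values for each gauge and month are not given here. The sketch also assumes that VAR3 to VAR24 are regressed on VAR1 and PROB3 to PROB24 on PROB1, and that Gaussian residual scatter may be added to each regression prediction (setting the third coefficient to 0 uses the regression line alone). Generated values are clipped to physically valid ranges.

```python
import numpy as np

# Hypothetical fitted values for one gauge and calendar month (placeholders only).
MEAN1_NORMAL = (0.12, 0.05)  # loc, scale of the hourly mean [mm/h]
AC1_GAMMA = (6.0, 0.07)      # shape, scale of the hourly lag-1 autocorrelation
# Linear regressions y = a + b * x, plus optional Gaussian scatter with std s: (a, b, s).
REG = {
    "VAR1": (0.05, 4.0, 0.05), "PROB1": (0.98, -0.9, 0.01),  # predictor: MEAN1
    "VAR3": (0.1, 5.0, 0.2), "VAR12": (0.5, 20.0, 1.0),
    "VAR24": (1.0, 45.0, 2.0),                                # predictor: VAR1
    "PROB3": (-0.05, 1.02, 0.01), "PROB12": (-0.2, 1.1, 0.02),
    "PROB24": (-0.3, 1.15, 0.03),                             # predictor: PROB1
    "AC3": (0.05, 0.7, 0.03), "AC12": (-0.05, 0.5, 0.05),
    "AC24": (-0.05, 0.3, 0.05),                               # predictor: AC1
}


def bounded(name, value):
    """Clip a generated statistic to its physically valid range."""
    if name.startswith("VAR"):
        return max(value, 1e-6)
    if name.startswith("PROB"):
        return min(max(value, 0.0), 1.0)
    return min(max(value, -0.99), 0.99)  # autocorrelation coefficients


def regress(name, x, rng):
    a, b, s = REG[name]
    return bounded(name, a + b * x + rng.normal(0.0, s))


def generate_short_term_stats(n_sets, rng=None):
    """Generate n_sets dictionaries of the 13 short-term statistics."""
    rng = np.random.default_rng(rng)
    sets = []
    for _ in range(n_sets):
        st = {}
        # Step 1: MEAN1 ~ normal (floored at a small positive value), AC1 ~ gamma.
        st["MEAN1"] = max(rng.normal(*MEAN1_NORMAL), 1e-3)
        st["AC1"] = bounded("AC1", rng.gamma(*AC1_GAMMA))
        # Step 2: hourly variance and dry probability from MEAN1.
        st["VAR1"] = regress("VAR1", st["MEAN1"], rng)
        st["PROB1"] = regress("PROB1", st["MEAN1"], rng)
        # Steps 3 and 4: coarser accumulation levels from the hourly values.
        for k in (3, 12, 24):
            st[f"VAR{k}"] = regress(f"VAR{k}", st["VAR1"], rng)
            st[f"PROB{k}"] = regress(f"PROB{k}", st["PROB1"], rng)
            st[f"AC{k}"] = regress(f"AC{k}", st["AC1"], rng)
        sets.append(st)
    return sets


# One set of statistics per synthetic January, as in the 100-realization example.
january_sets = generate_short_term_stats(100, rng=42)
```

The sketch does not enforce orderings such as PROB24 ≤ PROB12 ≤ PROB3 ≤ PROB1; a production version would likely need such consistency checks.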

The second part of THM simulates the rainfall time series with the Modified Bartlett-Lewis rectangular pulse model [2] based on the simulated short-term rainfall statistics. First, the six MBLRP parameters for each simulated month are estimated using isolated-speciation-based particle swarm optimization (ISPSO) [24]. Then, 20 rainfall time series, each one month long, are generated from the estimated parameters. Finally, the time series whose statistics are closest to the short-term statistics used in parameter estimation is selected. For a more detailed description of THM, readers can refer to Kim et al. [16].
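The final selection step amounts to choosing, among the 20 candidate realizations, the one whose statistics lie closest to the target short-term statistics. The text does not specify the distance measure, so this Python sketch assumes a sum of squared relative differences; the function names are hypothetical. Candidates are represented by dictionaries of precomputed statistics so that the selection logic stays independent of how the realizations are generated.

```python
import numpy as np


def relative_sq_error(candidate, target, eps=1e-9):
    """Sum of squared relative differences over the target statistics (assumed metric).

    Targets that are (near) zero fall back to the absolute difference.
    """
    total = 0.0
    for name, t in target.items():
        scale = abs(t) if abs(t) > eps else 1.0
        total += ((candidate[name] - t) / scale) ** 2
    return total


def select_closest(candidates, target):
    """Return the index of the closest candidate and the error of every candidate."""
    errors = [relative_sq_error(c, target) for c in candidates]
    return int(np.argmin(errors)), errors


# Toy example with two of the 13 statistics (values are illustrative).
target = {"MEAN1": 0.10, "PROB1": 0.90}
candidates = [
    {"MEAN1": 0.20, "PROB1": 0.90},
    {"MEAN1": 0.11, "PROB1": 0.88},
    {"MEAN1": 0.10, "PROB1": 0.50},
]
best, errors = select_closest(candidates, target)  # best == 1
```

Relative differences keep statistics of very different magnitudes (variances in mm² versus probabilities between 0 and 1) from dominating one another.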

2.4. Comparison of the Two Models in Reproducing the Distribution of the Observed Rainfall and the Corresponding Watershed Responses

The performance of THM was tested in terms of its ability to reproduce the distributions of monthly maximum rainfall depths, monthly peak flows, and monthly runoff volumes. For the chosen month at each of the 104 gauges, 100 months of synthetic rainfall time series were generated with both THM and the traditional MBLRP approach. Accordingly, each gauge has three types of rainfall time series, including the observed one. The following values were then calculated from all three types of rainfall time series: monthly maximum rainfall depths for durations of 1, 3, 6, 12, and 24 hours; monthly runoff depth; and monthly peak flow. The SCS curve number method and the SCS curvilinear unit hydrograph method [25] were used to calculate the last two values. The assumed watershed had a lag time of 2 hours, a drainage area of 7.5 km², and curve numbers of 50, 60, 70, 80, and 90.
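The runoff calculation relies on the standard SCS curve number relation $Q = (P - I_a)^2/(P - I_a + S)$ for $P > I_a$ (otherwise $Q = 0$), with $S = 25400/\mathrm{CN} - 254$ mm and $I_a = 0.2S$, and on the SCS unit hydrograph peak rate $q_p = 0.208\,A\,Q/T_p$ (m³/s, with $A$ in km², $Q$ in mm, and $T_p = D/2 + t_{lag}$ in hours). The Python sketch below applies these relations to a single rainfall depth and to a single 1-hour block of excess rainfall, using the 7.5 km² area and 2-hour lag stated above. How the method was applied to continuous monthly series (event separation, antecedent conditions, convolution with the curvilinear unit hydrograph ordinates) is not described here, so those steps are omitted; the function names are hypothetical.

```python
def scs_runoff_mm(p_mm, cn, ia_ratio=0.2):
    """SCS curve number direct runoff Q [mm] for a rainfall depth P [mm]."""
    s = 25400.0 / cn - 254.0  # potential maximum retention S [mm]
    ia = ia_ratio * s         # initial abstraction Ia [mm]
    if p_mm <= ia:
        return 0.0
    return (p_mm - ia) ** 2 / (p_mm - ia + s)


def scs_peak_rate(area_km2, runoff_mm, lag_h, duration_h=1.0):
    """Peak discharge [m3/s] of the SCS unit hydrograph for one block of excess rainfall."""
    tp = duration_h / 2.0 + lag_h  # time to peak [h]
    return 0.208 * area_km2 * runoff_mm / tp


# Watershed assumed in the study: A = 7.5 km2, lag = 2 h; 50 mm of rain is illustrative.
for cn in (50, 60, 70, 80, 90):
    q = scs_runoff_mm(50.0, cn)
    print(cn, round(q, 2), round(scs_peak_rate(7.5, q, lag_h=2.0), 2))
```

For CN = 80 and P = 50 mm, S = 63.5 mm and Ia = 12.7 mm, giving Q ≈ 13.8 mm; the same 50 mm produces no runoff at CN = 50 because it does not exceed Ia = 50.8 mm.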

As a result, each gauge was associated with 500 monthly maximum rainfall depths based on THM (5 rainfall durations × 100 years of simulation); 500 monthly maximum rainfall depths based on the traditional MBLRP approach (5 rainfall durations × 100 years of simulation); observed monthly maximum rainfall depths (5 rainfall durations × the number of years of record); 5 sets (one per curve number) of 100 monthly runoff depths based on THM; 5 sets of 100 monthly runoff depths based on the traditional approach; 5 sets of at least 50 monthly runoff depths based on the observed rainfall time series; 5 sets (one per curve number) of 100 monthly peak flows based on THM; 5 sets of 100 monthly peak flows based on the traditional approach; and 5 sets of at least 50 monthly peak flows based on the observed rainfall time series.

The two-sample Kolmogorov-Smirnov (K-S) test was used to compare the distributions of the variables calculated from the observed rainfall time series with those calculated from the synthetic rainfall time series. The test statistic of the two-sample K-S test comparing the distributions of data set $X$ (size $n$) and data set $Y$ (size $m$) is

$$D_{n,m} = \sup_{x} \left| F_{X,n}(x) - F_{Y,m}(x) \right|,$$

where $F_{X,n}$ and $F_{Y,m}$ are the empirical cumulative distribution functions of $X$ and $Y$, respectively. The null hypothesis of the test is that $X$ and $Y$ are drawn from the same continuous distribution. Therefore, if the test does not reject the null hypothesis, the two data sets are considered consistent with a common continuous distribution at the significance level specified in the test. In this study, a significance level of 5% was used.

In this study, two tests must be performed to determine whether THM outperforms the traditional approach for a given variable. For example, if the test comparing the THM-based and observed 1-hour monthly maximum rainfall depths does not reject the null hypothesis while the test comparing the MBLRP-based and observed depths does reject it, the result supports the advantage of THM over the traditional approach for predicting the maximum precipitation depth at the 1-hour duration. This pair of tests was repeated for the 15 variables (monthly maximum rainfall depths for the 5 durations, and monthly runoff depths and monthly peak flows for the 5 curve numbers) to compare the performance of THM with that of the traditional approach based on long-term statistics.
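The paired-test logic can be sketched in Python as follows. The sketch computes the two-sample K-S statistic directly from the empirical CDFs and compares it with the standard large-sample critical value $c(\alpha)\sqrt{(n+m)/(nm)}$, where $c(\alpha) = \sqrt{-\ln(\alpha/2)/2}$; in practice, `scipy.stats.ks_2samp` returns the statistic and a p-value directly. The function names and category labels are hypothetical.

```python
import numpy as np


def ks_statistic(x, y):
    """Two-sample K-S statistic D = sup_t |F_X(t) - F_Y(t)| from empirical CDFs."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    grid = np.concatenate([x, y])  # the supremum is attained at a sample point
    fx = np.searchsorted(x, grid, side="right") / x.size
    fy = np.searchsorted(y, grid, side="right") / y.size
    return float(np.max(np.abs(fx - fy)))


def ks_rejects(x, y, alpha=0.05):
    """True if H0 (same continuous distribution) is rejected (large-sample critical value)."""
    n, m = len(x), len(y)
    c_alpha = np.sqrt(-0.5 * np.log(alpha / 2.0))  # about 1.358 for alpha = 0.05
    return ks_statistic(x, y) > c_alpha * np.sqrt((n + m) / (n * m))


def classify(observed, thm, mblrp, alpha=0.05):
    """Which synthetic series reproduce the observed distribution (H0 not rejected)."""
    ok_thm = not ks_rejects(observed, thm, alpha)
    ok_mblrp = not ks_rejects(observed, mblrp, alpha)
    labels = {(True, True): "both", (True, False): "THM only",
              (False, True): "MBLRP only", (False, False): "neither"}
    return labels[(ok_thm, ok_mblrp)]


# Synthetic illustration: 60 observed and 100 + 100 simulated monthly maxima [mm].
rng = np.random.default_rng(1)
obs = rng.gumbel(20.0, 5.0, 60)
result = classify(obs, rng.gumbel(20.0, 5.0, 100), rng.gumbel(12.0, 3.0, 100))
```

Tallying these labels over the 104 gauges gives success proportions of the kind summarized in Figures 10, 13, and 16.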

2.5. Comparison of the Two Models in Reproducing the Extreme Rainfall and the Peak Flow Values

The result of the K-S test indicates only the overall similarity or difference between two distributions. However, extreme events are associated more with the upper tail of a distribution than with its overall shape. In other words, a simulated rainfall time series may fail to reproduce extreme events even when the K-S test indicates that the distributions are similar. For this reason, the design rainfall and the corresponding peak flow at the virtual watershed for given recurrence intervals were calculated from each rainfall time series and compared. A generalized extreme value (GEV) distribution was used to model the distributions of the monthly peak rainfall and the monthly peak flow, and the method of L-moments [26] was used to estimate the distribution parameters. The residual of each model's design value was then normalized as follows:

$$RP = \frac{DP^{T}_{m} - DP^{T}_{\mathrm{obs}}}{DP^{T}_{\mathrm{obs}}}, \qquad RQ = \frac{DQ^{T}_{m} - DQ^{T}_{\mathrm{obs}}}{DQ^{T}_{\mathrm{obs}}},$$

where RP and RQ are the normalized residuals of the design precipitation and of the corresponding design flow at the virtual watershed, respectively; DP and DQ are the estimated design precipitation and design flow, respectively; the superscript $T$ denotes the recurrence interval; and the subscript denotes the type of time series on which the calculation is based ($m$ for the MBLRPM or THM, and obs for the observed series). An RP or RQ value close to 0 means that the extreme precipitation or extreme flow with the given recurrence interval produced by the rainfall model is close to its observed counterpart. A positive (or negative) value means that the model value is greater (or smaller) than its observed counterpart. For example, a value of −0.1 means that the 50-year flow reproduced by the traditional MBLRPM is 10% smaller than its observed counterpart. In this study, this residual analysis was performed for all 104 gauges and a histogram of the residuals was prepared. How centered and peaked this histogram is around 0 is a good measure of the model's overall performance in reproducing extreme rainfall and flow values.
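The design-value calculation can be sketched with sample L-moments and Hosking's closed-form approximation for the GEV shape parameter. This Python sketch uses Hosking's parameterization (location $\xi$, scale $\alpha$, shape $k$; this $\alpha$ is unrelated to the MBLRP parameter) and falls back to the Gumbel case when $k$ is near zero. It assumes one monthly maximum per year of record or simulation, so that a recurrence interval of $T$ years corresponds to a non-exceedance probability of $1 - 1/T$. Function names are hypothetical and the data in the example are synthetic.

```python
import math

import numpy as np

EULER_GAMMA = 0.5772156649


def sample_lmoments(x):
    """First three sample L-moments from unbiased probability-weighted moments."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    j = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1) * x) / n
    b2 = np.sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x) / n
    return b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0


def gev_lmom_fit(x):
    """GEV (xi, alpha, k) from L-moments, using Hosking's approximation for k."""
    l1, l2, l3 = sample_lmoments(x)
    t3 = l3 / l2
    z = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * z + 2.9554 * z * z
    if abs(k) < 1e-6:  # Gumbel limit
        alpha = l2 / math.log(2.0)
        return l1 - EULER_GAMMA * alpha, alpha, 0.0
    g = math.gamma(1.0 + k)
    alpha = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    xi = l1 - alpha * (1.0 - g) / k
    return xi, alpha, k


def gev_quantile(xi, alpha, k, T):
    """Design value for recurrence interval T (non-exceedance probability 1 - 1/T)."""
    y = -math.log(1.0 - 1.0 / T)
    if k == 0.0:
        return xi - alpha * math.log(y)
    return xi + alpha / k * (1.0 - y ** k)


def normalized_residual(model_value, observed_value):
    """RP or RQ: relative difference of a model design value from the observed one."""
    return (model_value - observed_value) / observed_value


rng = np.random.default_rng(0)
observed = rng.gumbel(30.0, 8.0, 60)    # synthetic observed monthly maxima [mm]
simulated = rng.gumbel(27.0, 6.0, 100)  # synthetic model monthly maxima [mm]
dp_obs = gev_quantile(*gev_lmom_fit(observed), T=50)
dp_model = gev_quantile(*gev_lmom_fit(simulated), T=50)
rp = normalized_residual(dp_model, dp_obs)
```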

3. Results

3.1. Reproduction of Annual Maximum Rainfall Distribution and Extreme Rainfall Depth

Figure 9 compares the cumulative distribution functions (CDFs) of the monthly peak rainfall depth for a 1-hour duration derived from the simulated and observed rainfall time series at the NCDC rain gauge in northern Texas (latitude 33.61, longitude −99.38) for the month of November. Both the traditional method and THM reproduce the distribution of the observed monthly peak rainfall depth well.

To reach more general conclusions, the K-S test was performed for all 104 locations and all 5 rainfall durations. Figure 10 summarizes the results: it shows the proportion of the 104 gauges at which THM and the traditional approach succeeded in reproducing the distribution of the observed monthly maximum rainfall depths for durations of 1, 3, 6, 12, and 24 hours.

Notably, THM and the traditional approach succeeded in reproducing the distribution of the 1-hour maximum rainfall depth at only 47% and 63% of the 104 stations, respectively. This is likely because the monthly maximum 1-hour rainfall makes up only a small portion of the time series, and the statistics used in model calibration (mean, variance, lag-1 autocorrelation, and probability of zero rainfall) do not directly reflect this event. For this reason, the reproduction of extreme values with Poisson cluster models has been the primary subject of many studies. For example, Cowpertwait [18] described a limitation of Poisson cluster rainfall models: a single set of parameters must represent the characteristics of both extreme and frequent rainfall events. An attempt was also made to identify the statistical properties of the gauges at which both approaches are likely to fail in reproducing the 1-hour maximum rainfall depth, and a clear pattern emerged (Figure 11).

Each plot in the figure shows the relationship between an hourly statistic (e.g., mean or variance) and the success index for reproducing the monthly maximum 1-hour rainfall depth. The density of circles in the first column of the first plot increases as the mean hourly rainfall decreases, which suggests that both approaches are more likely to fail in relatively dry regions. Similarly, the second plot suggests that both approaches are more likely to fail when the hourly variance of the rainfall time series is low. Given the high correlation between rainfall mean and variance (a dry region is usually associated with low variability of rainfall depth), it can be concluded that Poisson cluster models should be used with caution to model extreme events in relatively dry regions (e.g., mean rainfall less than 0.1 mm/hr).

Second, THM outperformed the traditional approach in reproducing the distribution of monthly maximum rainfall depths for durations of 3 to 12 hours, whereas it did not for the 1-hour duration. This is likely because, as the duration of the extreme rainfall increases, the extreme values are better represented by the short-term rainfall statistics that THM introduces. In other words, THM, which incorporates more statistical information about interannual variability than the traditional approach does, carries more information about the maximum rainfall depths and therefore performs better. An analysis similar to the previous one was performed to identify statistical properties for which THM particularly outperforms the traditional approach, but no notable pattern was observed.

Figure 12 shows the histograms of the normalized residuals (RP) of the 1-hour design precipitation for THM (a) and the MBLRPM (b). The histogram for THM is centered nearer to 0 for all three recurrence intervals of design precipitation, whereas the histogram for the traditional approach is shifted to the left. This means that the extreme rainfall values generated by THM are closer to the observed ones, while the traditional approach consistently underestimates the observed extreme rainfall values. Table 1 shows the mean and standard deviation of the RP values for the 1-hour, 3-hour, and 6-hour design precipitation.

The mean of the RP_THM values was consistently closer to 0 than that of the RP_MBLRPM values for all three rainfall durations, which means that THM performs better overall in reproducing the extreme rainfall values (namely, the upper tail of the distribution) of the observed rainfall time series. The standard deviation of the RP_THM values was consistently greater than that of the RP_MBLRPM values. This indicates that the traditional MBLRP model is more consistent, but also that it underestimates the extreme rainfall values more consistently than THM does. The last column of Table 1 shows the proportion of the 104 gauge locations at which the model design precipitation differed from the observed design precipitation by less than 20%. These values are consistently greater for THM, which means that THM has a greater probability of successfully reproducing extreme rainfall values than the traditional MBLRP model. Both THM and the traditional approach also performed better for longer accumulation intervals. This seems to be because the MBLRPM conceptualization of the rainfall process is more appropriate for longer accumulation intervals. In other words, the gradual increase in rainfall rate at a point, which occurs in reality at the hourly accumulation level, cannot be modeled well within the current MBLRPM and THM framework, because both represent the arrival of rainfall as an abrupt process using rectangular time-intensity pulses along the time axis.
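The summary measures reported in Table 1 (and, with a 30% tolerance, in Table 2) can be computed from the per-gauge residuals as in the following Python sketch. The function name is hypothetical, and the use of the sample standard deviation (ddof=1) is an assumption, since the text does not state which estimator was used.

```python
import numpy as np


def summarize_residuals(residuals, tolerance=0.2):
    """Mean, standard deviation, and share of gauges with |residual| < tolerance."""
    r = np.asarray(residuals, dtype=float)
    return {
        "mean": float(r.mean()),
        "std": float(r.std(ddof=1)),  # sample standard deviation (assumed)
        "share_within": float(np.mean(np.abs(r) < tolerance)),
    }


# Hypothetical residuals at four gauges (not data from the study).
summary = summarize_residuals([-0.1, 0.3, -0.25, 0.05])
print(summary)
```

A mean near 0 with a larger standard deviation, as reported for THM, and a clearly negative mean with a smaller standard deviation, as reported for the MBLRPM, both show up directly in these three numbers.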

3.2. Reproduction of Annual Maximum Peak Flow Distribution and Extreme Flow

Figure 13 shows the proportion of the 104 gauges at which THM and the traditional approach succeeded in reproducing the distribution of the peak flow at the virtual watershed, for curve numbers between 50 and 90. In all cases, THM slightly outperformed the traditional approach. This result is encouraging because the tested hypothetical watershed has a lag time of 2 hours, and THM did not clearly outperform the traditional approach for the rainfall durations around 2 hours: the traditional approach notably outperformed THM for the 1-hour duration, and THM only slightly outperformed the traditional approach for the 3-hour duration (see Figure 10).

It is also noteworthy that, for both models, the success ratio for reproducing the peak flow distribution was significantly greater than that for reproducing the extreme rainfall depth distribution. This may be because the watershed response is a function not only of the peak rainfall depth but also of the rainfall depths occurring around the time of the peak rainfall event, which Poisson cluster models reproduce well. In addition, the discrepancy between the observed and modeled extreme rainfall decreased as the rainfall was converted into runoff through the infiltration process (the so-called damping effect).

An analysis was performed to identify statistical patterns under which THM outperforms the traditional approach. Figure 14 shows the relationship between the hourly rainfall mean and the success index for reproducing the monthly peak flows at the virtual watershed with a curve number of 90. Most of the gauge locations at which only THM succeeded in reproducing the peak flow distribution had a mean rainfall of less than ~0.1 mm/hr. A similar pattern was observed in the analyses for virtual watersheds with other curve numbers.

Figure 15 shows the histograms of the RQ values for the virtual watershed with a curve number of 90. The plots in the upper row correspond to THM, and those in the lower row to the traditional MBLRP model. The histograms for THM are more centered near 0, meaning that THM reproduces extreme peak flow values better overall than the traditional MBLRP model. Table 2 summarizes the results of this histogram analysis. The traditional MBLRP approach underestimates extreme flow values substantially, with RQ values between −0.34 and −0.47; that is, the extreme flow reproduced by the traditional MBLRP model can be 34% to 47% smaller than the observed flow. This underestimation was considerably reduced by THM, whose RQ values ranged between −0.12 and −0.23. The last column of Table 2 shows the proportion of the 104 gauge locations at which the model design flow differed from the observed design flow by less than 30%; these values were consistently greater for THM. This means that THM has a greater probability of successfully reproducing extreme flow values than the traditional MBLRP model.

3.3. Reproduction of Runoff Depth Distribution

Figure 16 shows the proportion of the 104 gauges at which THM and the traditional approach succeeded in reproducing the distribution of the monthly runoff depth at the hypothetical watershed, for curve numbers between 50 and 90. In all cases, THM notably outperformed the traditional approach. Unlike maximum rainfall depths and peak flows, runoff volume is more closely related to the rainfall statistics used in model calibration, and THM, which incorporates more rainfall statistics than the traditional approach, performed better in reproducing the distribution of runoff volumes.

An analysis was performed to identify statistical patterns under which THM outperforms the traditional approach in reproducing runoff volume. As in the analysis of peak flow reproduction, most of the gauge locations at which only THM succeeded in reproducing the runoff depth distribution had a mean rainfall of less than ~0.1 mm/hr.

Overall, this result indicates the advantage of using THM for continuous rainfall simulation in which not only the extreme values but also the overall runoff volume is considered important. However, it should be noted that the watershed response results were obtained with the SCS curve number method, which may yield inaccurate results in continuous watershed modeling. An analysis based on fully distributed hydrologic models, such as those used in the Distributed Model Intercomparison Project, Phase 1 (DMIP1) (http://www.nws.noaa.gov/oh/hrl/dmip/), might provide more physically based results, given sufficient information on meteorological forcings, soil types, land use, topography, and initial conditions for the domain of interest. Distributed continuous models were not adopted in this study, however, because obtaining all the relevant information is infeasible and running such models for many locations is computationally intensive.

4. Conclusion

In this study, the performance of a Poisson cluster stochastic rainfall generator capable of accounting for the interannual variability of rainfall statistics was validated at various geographic locations across the contiguous United States. The results confirmed that the traditional approach, which uses only a small number of "long-term" rainfall statistics calculated over the entire period of record, cannot sufficiently represent the rainfall characteristics of a given calendar month as they vary from year to year. This problem was largely addressed by the new approach, which adds a step that simulates "short-term" rainfall statistics varying from year to year and combines it with the traditional Poisson cluster rainfall modeling approach. The new approach (The Hybrid Model, THM) generally outperformed the conventional Poisson cluster rainfall simulation in modeling the characteristics of extreme precipitation, extreme flows, and runoff volume, although it did not outperform the traditional approach in reproducing extreme rainfall depths at a fine time scale (e.g., 1 hour). The present study extends the applicability of THM by validating it at various geographic locations across the contiguous United States, beyond those investigated by Kim et al. [16, 21].

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This research was supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT (Grant no.: NRF-2013R1A1A1011676).