A Forecasting Approach Combining Self-Organizing Map with Support Vector Regression for Reservoir Inflow during Typhoon Periods
This study describes the development of a reservoir inflow forecasting model for typhoon events to improve short lead-time flood forecasting performance. To strengthen the forecasting ability of the original support vector machine (SVM) model, the self-organizing map (SOM) is adopted to group the inputs into clusters before the SVM is applied, giving the proposed SOM-SVM model. Two training input selection strategies are proposed for the SVM-based forecasting method, namely, SOM-SVM1 and SOM-SVM2. The methods are applied to an actual reservoir watershed to produce 1 to 3 h ahead inflow forecasts. For 1, 2, and 3 h ahead forecasts, the improvements in mean coefficient of efficiency (MCE) of the SOM-SVM1 model over the original SVM model are 21.5%, 18.5%, and 23.0%, respectively, and those of the SOM-SVM2 model are 20.9%, 21.2%, and 35.4%, respectively. Relative to the SOM-SVM1 model, the SOM-SVM2 model yields further improvements of 0.33%, 2.25%, and 10.08% for 1, 2, and 3 h ahead forecasts, respectively. These results show that the proposed approach, and the SOM-SVM2 model in particular, provides improved forecasts of hourly inflow. In conclusion, the proposed model, which uses a limited set of more closely related inputs instead of all inputs, generates better forecasts within the clusters obtained from the SOM process. The SOM-SVM2 model is recommended as an alternative to the original support vector regression (SVR) model because of its accuracy and robustness.
Cyclones, typhoons, and hurricanes refer to the same meteorological phenomenon in different parts of the world. They are weather systems with strong winds that circulate anticlockwise around a low pressure area in the northern hemisphere and clockwise in the southern hemisphere. Taiwan is located in the northwestern Pacific, on one of the main typhoon paths, and is hit by three to five typhoon events each year on average. However, the rainfall distribution is uneven in both time and space due to the complex terrain conditions in Taiwan. Torrential rain due to typhoons leads to frequent serious disasters such as flooding, landslides, and debris flow. However, the rain is an important water resource that should be stored. Reservoirs are the most important and effective water storage facilities for solving the uneven rainfall problem. Therefore, reservoir inflow forecasting plays an important role in water resource planning and management.
There are numerous difficulties in constructing a physically based mathematical model because of the extremely complex and highly nonlinear relationship between typhoon rainfall and reservoir inflows. As an attractive alternative to physically based models, data-driven models based on artificial intelligence methods, such as neural networks, are favored and have proved practical in reservoir inflow forecasting [1–5]. Support vector machines (SVMs) are novel, artificial intelligence-based methods. The SVMs, developed for classification and then extended to regression by Vapnik [6, 7], are based on statistical learning theory. Based on the structural risk minimization (SRM) principle, SVMs theoretically minimize the expected error of a learning machine and reduce the problem of overfitting. In addition, the architecture of the SRM principle guarantees a unique and globally optimal solution by solving a convex optimization problem. A more detailed treatment of SVMs can be found in several textbooks [8, 9].
The SVMs have been proposed as alternative data-driven tools in many fields [10–14] and have excellent generalization ability. In hydrology, applications to problems such as time series forecasting have been reported in recent years [15–28]. The SVMs are outstanding data-driven tools for regression. However, when SVM regression is given inputs that are excessively noisy or only weakly related to the output, its generalization ability is reduced. In hydrological data this weakness appears at high values, because the fitted model is dominated by the very many low values. Moreover, for long lead-time forecasting, the information available to the SVMs is limited and noisy. In such cases, the model cannot forecast the inflow well.
To solve the above problems, in this study, the self-organizing map (SOM) is first used to group the inputs of the SVMs. Within each group, the inputs belong to the same inflow process and are therefore closely related. Hence, SVMs developed using inputs from the same cluster may produce more accurate forecasts.
The SOM introduced by Kohonen [29, 30] is a special category of artificial neural networks (ANNs). It can project high-dimensional input space on a low-dimensional topology so as to allow the number of clusters to be determined by inspection. This capability enables the discovery of the relationships among complex data and has been used in recent years [31–40]. Furthermore, the clustering performance of SOM is better than that of conventional clustering methods [41–43].
To improve reservoir inflow forecasting, an approach consisting of a SOM-based clustering method and SVMs is proposed in this study. The SOM density map is obtained using only the inflows of the past two hours as inputs, for each event and each lead time. Then, SVMs are built on the results of the SOM-based clustering to forecast reservoir inflow. Finally, the proposed approach is applied to the Feitsui Reservoir watershed in northern Taiwan to produce 1 to 3 h ahead inflow forecasts. A flowchart of the proposed model is illustrated in Figure 1.
2.1. Two SOM-SVM Models
In this study, to avoid a lack of variability in the training information, which would lower the learning ability of the neurons, an enlarged training data set, collected as training forms, is considered. This means that, besides the training data set of each neuron, the training data sets of all other neurons in the same specified region are also used for training. To obtain well-performing forecasts more efficiently, two different specified region strategies are defined. The flowchart for these two strategies is shown in Figure 1. In the first strategy, the SOM-SVM1 model, the specified regions are defined according to the different inflow processes. That is, the data on the SOM feature map are divided into training forms, each of which collects the neurons that share the same rainfall-runoff process. The concept of the specified region strategy for the SOM-SVM1 model is adopted from Hsu et al. However, whereas Hsu et al. define five regions, only four specified regions are used in this study. The training forms, $TF_r$, of the SOM-SVM1 model are denoted by

$$TF_r = \{ N_{r,1}, N_{r,2}, \ldots, N_{r,K_r} \}. \tag{1}$$

Here, $N_{r,k}$ in (1) expresses the $k$th neuron having the same rainfall-runoff process in the $r$th region, and $K_r$ is the number of neurons in that region. In general, four different rainfall-runoff processes can be considered as specified regions in typhoon events: the increasing inflow region, the base flow region, the peaking hydrograph region, and the recession region. However, for some SOM learning results, the rainfall-runoff processes cannot be clearly divided, especially the increasing inflow region, the peaking hydrograph region, and the recession region. This indicates that, on occasion, only three rainfall-runoff training forms are found on the SOM feature map.
Therefore, when the peaking hydrograph region is not easily separated on the feature map, only three regions can be distinguished on the SOM feature map.
In contrast to the SOM-SVM1 strategy, the specified regions of the SOM-SVM2 model allow neurons with stronger relationships to be selected. That is, the enlarged training forms of the SOM-SVM2 model are established with the aims of good forecasting performance and a simple definition of the feature relationship. For each neuron on the feature map, the SOM-SVM2 model simply collects the training data sets of some specified higher relation neurons as the training form used to implement the SVM model. The specified higher relation neurons of a neuron are the neuron itself and the neurons in the cross-shaped area around it, and the enlarged training data set is drawn from these neurons. Therefore, for a $5 \times 5$ SOM feature map, the training forms are denoted by

$$TF_{i,j} = \{ N_{i,j}, N_{i-1,j}, N_{i+1,j}, N_{i,j-1}, N_{i,j+1} \}, \tag{2}$$

where $1 \le i \le 5$, $1 \le j \le 5$, and the values of $i$ and $j$ are integers. For a neuron in the interior of the feature map, the enlarged training form consists of the training forms of the four surrounding neurons and the training form of the neuron itself. When an index $i \pm 1$ or $j \pm 1$ equals 0 or 6, the corresponding neuron is removed from the equation. Hence, for neurons on the edge of the map, the enlarged training form can adopt the training data sets of only two or three surrounding neurons. This means that the training forms contain the training data sets of five neurons, or of three or four neurons for the edge neurons. The training forms generated by the two models are used to implement the SVM model. Then, in the different specified regions, the SVM models generate the forecasts from the forecast forms in each neuron. These forecast forms contain only the forecast data sets inside each neuron.
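The cross region selection described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the 1-based (row, column) indexing, the default map size of 5, and the dictionary `data_by_neuron` are all assumptions made for the example.

```python
def cross_neighbourhood(i, j, size=5):
    """Return the grid positions pooled for neuron (i, j).

    Positions are 1-based (row, column) pairs on a size x size map
    (assumed layout). The neuron itself comes first, followed by its
    up, down, left, and right neighbours; positions that fall outside
    the map are dropped, so edge neurons get fewer neighbours.
    """
    candidates = [(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(r, c) for r, c in candidates if 1 <= r <= size and 1 <= c <= size]


def enlarged_training_form(i, j, data_by_neuron, size=5):
    """Pool the training records of neuron (i, j) and its cross neighbours.

    data_by_neuron is a hypothetical dict mapping (row, column) to the
    list of training records projected onto that neuron.
    """
    pooled = []
    for position in cross_neighbourhood(i, j, size):
        pooled.extend(data_by_neuron.get(position, []))
    return pooled


if __name__ == "__main__":
    print(len(cross_neighbourhood(3, 3)))  # interior neuron
    print(len(cross_neighbourhood(1, 3)))  # edge neuron
    print(len(cross_neighbourhood(1, 1)))  # corner neuron
```

An interior neuron pools five neurons, a neuron on a side pools four, and a corner neuron pools three, which matches the counts stated in the text.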
3.1. The Study Area and Data
In this paper, all the SVM-based models are applied to the Feitsui Reservoir watershed in northern Taiwan. Feitsui Reservoir is located downstream of three major tributaries (Kingkwa Creek, Diyu Creek, and Peishih Creek). It has a surface area of 10 km², a mean depth of 40 m, a maximum depth of 120 m, a full capacity of 406 million m³, and a total watershed area of 303 km² (see Figure 2). Feitsui Reservoir supplies water for Taipei city (the capital of Taiwan); thus, it is the most important reservoir in northern Taiwan.
The rainfall data were collected from 1988 to 2008. The maximum and average yearly rainfalls are 5736.6 mm and 3808.6 mm, respectively. The 22 typhoon events used for model development are presented in Table 1. For the forecasting of each event, these 22 typhoon events are divided into two sets: the event being forecast serves as the testing event, and the other 21 events serve as training events.
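The split into 21 training events and one testing event amounts to a leave-one-event-out scheme. A minimal sketch is given below; the function name and the `(name, records)` event representation are assumptions for illustration, and the event names in the usage lines are placeholders, not events from Table 1.

```python
def leave_one_event_out(events):
    """Yield (test_name, training_records, test_records) for every event.

    events is a list of (name, records) pairs, one pair per typhoon
    event (hypothetical representation). Each event is held out once,
    and the records of all remaining events are pooled for training.
    """
    for index, (name, records) in enumerate(events):
        training = [
            record
            for other, (_, other_records) in enumerate(events)
            if other != index
            for record in other_records
        ]
        yield name, training, records


if __name__ == "__main__":
    demo = [("event_A", [1, 2]), ("event_B", [3]), ("event_C", [4, 5])]
    for test_name, train, test in leave_one_event_out(demo):
        print(test_name, len(train), len(test))
```

With 22 events, the generator produces 22 splits, each with the records of 21 events available for training.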
3.1.1. Determination of Lag Length
For the model construction in this paper, the length of the forecast form must be decided at the outset. Each forecast form of all the typhoon events is

$$\hat{Q}(t+\Delta t) = f\big(Q(t), Q(t-1), \ldots, Q(t-L_Q+1), R_g(t), R_g(t-1), \ldots, R_g(t-L_R+1)\big),$$

where $t$ is the current time, $\Delta t$ is the lead-time period (from 1 to 3 h), $R_g(t)$ is rainfall in the $g$th gauge at time $t$, $Q(t)$ is the inflow at time $t$, $f$ is the derivation of forecasts, and $L_Q$ and $L_R$ denote the lag length of inflow and rainfall, respectively.
For the reservoir inflow forecasting model, constructing the model with appropriate input lag lengths is an important step. In this paper, the criterion RPE is applied to determine the lag length of inputs. The RPE is defined by

$$\mathrm{RPE} = \frac{\mathrm{RMSE}(L) - \mathrm{RMSE}(L+1)}{\mathrm{RMSE}(L)} \times 100\%,$$

where $\mathrm{RMSE}(L)$ and $\mathrm{RMSE}(L+1)$ are the root-mean-square error (RMSE) for the model with $L$ and $L+1$ lag lengths, respectively. The RMSE can be obtained as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \big(Q(t) - \hat{Q}(t)\big)^2},$$

where $Q(t)$ is the inflow at time $t$, $\hat{Q}(t)$ is the predicted inflow at time $t$, and $n$ is the number of time steps. In general, the RMSE decreases with increasing lag length. When the RPE value is less than 5%, the lag length is no longer increased.
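The stopping rule can be illustrated with a short Python sketch. It assumes that the RPE is the percentage reduction in RMSE obtained by increasing the lag length by one; the function names are hypothetical, and the RMSE values in the usage lines are invented for illustration and are not results from this study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def rpe(rmse_current, rmse_next):
    """Percentage drop in RMSE when the lag length grows by one (assumed form)."""
    return (rmse_current - rmse_next) / rmse_current * 100.0


def select_lag_length(rmse_by_lag, threshold=5.0):
    """Pick the lag length at which adding one more lag stops paying off.

    rmse_by_lag[k] holds the RMSE of the model with lag length k + 1.
    The first lag length whose RPE falls below the threshold is returned;
    if none does, the largest lag length tested is returned.
    """
    for index in range(len(rmse_by_lag) - 1):
        if rpe(rmse_by_lag[index], rmse_by_lag[index + 1]) < threshold:
            return index + 1
    return len(rmse_by_lag)


if __name__ == "__main__":
    # Invented RMSE values for lag lengths 1 to 4.
    print(select_lag_length([100.0, 80.0, 78.0, 77.0]))
```

In the invented example, going from lag 1 to lag 2 reduces the RMSE by 20%, while going from lag 2 to lag 3 reduces it by only 2.5%, so a lag length of 2 is selected.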
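By using this procedure, the most appropriate lag lengths of typhoon rainfall, $L_R$, and inflow, $L_Q$, for a certain lead time, $\Delta t$, can be determined. A lag length of two hours is found to be appropriate for both rainfall and inflow and is used to forecast the 1 to 3 h ahead inflows. Then, the general form with appropriate lag lengths for the SVM-based models is described as

$$\hat{Q}(t+\Delta t) = f\big(Q(t), Q(t-1), R_g(t), R_g(t-1)\big),$$

where $Q$ is the inflow, $R_g$ is the rainfall in the $g$th gauge, and $f$ is the derivation of forecasts.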
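To make the structure of the forecast forms concrete, the following sketch builds input and target pairs from hourly series with a lag length of two. The function name, the list-based data layout, and the toy values are assumptions for illustration only.

```python
def build_forecast_forms(inflow, rainfall, lead_time, lag=2):
    """Build (inputs, target) pairs from hourly inflow and rainfall series.

    inflow is a list of hourly inflows; rainfall is a list with one entry
    per hour, each entry being a list of gauge readings (assumed layout).
    The inputs at time t are the inflows at t, t-1, ..., followed by the
    gauge rainfalls at t, t-1, ...; the target is the inflow at
    t + lead_time.
    """
    forms = []
    for t in range(lag - 1, len(inflow) - lead_time):
        inputs = [inflow[t - k] for k in range(lag)]
        for k in range(lag):
            inputs.extend(rainfall[t - k])
        forms.append((inputs, inflow[t + lead_time]))
    return forms


if __name__ == "__main__":
    toy_inflow = [10.0, 20.0, 30.0, 40.0, 50.0]
    toy_rainfall = [[1.0], [2.0], [3.0], [4.0], [5.0]]  # a single gauge
    for inputs, target in build_forecast_forms(toy_inflow, toy_rainfall, 1):
        print(inputs, target)
```

Longer lead times leave fewer usable pairs per event, because the target must still lie inside the recorded series.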
Additionally, for reasonable model comparison, the two parameters of the SVM regression, the cost constant $C$ and the error tolerance $\varepsilon$, are simply set to 1 and 0.1, respectively.
To assess the individual and average performance of the SVM and the two SOM-SVM based models, three indices are used:
(1) Relative root-mean-square error (RRMSE), computed for each typhoon event.
(2) Mean root-mean-square error (MRMSE), the average of the event-wise error index over the forecasting typhoon events.
(3) Mean coefficient of efficiency (MCE):

$$\mathrm{MCE} = \frac{1}{E} \sum_{e=1}^{E} \mathrm{CE}_e, \qquad \mathrm{CE} = 1 - \frac{\sum_{t=1}^{n} \big(Q(t) - \hat{Q}(t)\big)^2}{\sum_{t=1}^{n} \big(Q(t) - \bar{Q}\big)^2},$$

where $\bar{Q}$ is the average of observed inflows and $E$ is the number of forecasting typhoon events.
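The event-wise and averaged indices can be computed as in the sketch below. It assumes the coefficient of efficiency takes its usual form (one minus the ratio of the squared forecast errors to the squared deviations of the observations from their mean); the exact normalization used for the RRMSE is not reproduced here, so only a plain RMSE is shown alongside it. Function names and the toy data are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error of one event."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def coefficient_of_efficiency(observed, predicted):
    """CE of one event: 1 minus error sum of squares over observed variance sum."""
    mean_observed = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    spread = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - residual / spread


def mean_over_events(metric, events):
    """Average an event-wise metric over (observed, predicted) pairs."""
    return sum(metric(obs, pred) for obs, pred in events) / len(events)


if __name__ == "__main__":
    # Toy events: a perfect forecast and a forecast equal to the mean.
    toy_events = [
        ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),
        ([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]),
    ]
    print(mean_over_events(coefficient_of_efficiency, toy_events))
    print(mean_over_events(rmse, toy_events))
```

A perfect forecast gives a CE of 1 and a forecast that always equals the observed mean gives a CE of 0, so the toy MCE lies halfway between the two.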
3.2. SOM-SVM Models
As a baseline for comparison, the forecasts of the original SVM model are generated directly from all the forecasting data sets, with the SVM regression for each event built from the other 21 typhoon events. In contrast to the original SVM model, both proposed models combine SOM clustering with SVM regression. First, all forecast forms of an event are assigned to the neurons of a SOM that has been established from the forecast forms of the other typhoon events. Then, the forecast forms assigned to these neurons are used to generate forecasts under the two strategies. After the forecasts are obtained by SVM regression, they are recombined in their original time order. The difference between the SOM-SVM1 and SOM-SVM2 models is the training form selection strategy described above. To highlight the advantage of the selection strategy of the SOM-SVM1 model, its performance is compared with that of the original SVM model. Furthermore, because the clusters are formed with the lead-time information taken into account, forecasts in one region may be unable to obtain important information from neighboring neurons that belong to other regions. The selection strategy of the SOM-SVM2 model, which considers more closely related inputs, is therefore established. Finally, the performance of the three SVM-based models is evaluated and compared.
3.2.1. SOM Clustering
In this subsection, a SOM with a small dimension is tried first. If the clustering result is reasonable and satisfactory, the cluster analysis is accepted. Otherwise, a SOM with a larger dimension is chosen to analyze the input patterns. This step is repeated until a satisfactory result is obtained, that is, until the inputs within each grid have the same characteristics associated with a certain inflow process. For different events and different lead-time forecasts, the SOMs are generated by the same process. Taking the 1 h ahead forecasts of the Polly Typhoon event as an example, the SOM is constructed from the other 21 events. According to our experiments, a 5 × 5 SOM is adopted herein. That is, the competitive layer contains 25 grids. After the SOM clustering is implemented, the corresponding feature map and density map can be obtained.
In the feature map, forecast forms with similar rainfall-runoff processes are located in neighboring grids. On the other hand, forecast forms with significantly different rainfall-runoff processes are located in different grids that are distant from each other. Such a characteristic is also retained in the density map, because the density map results from the feature map. The ID number and location of the SOM generated from the 21 typhoon events (except for the Polly Typhoon) are presented in Figure 3. In addition, its density map is shown in Figure 4. In Figure 4, the number inside each neuron indicates how many training data sets are projected onto the same topology point.
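How a feature map and a density map arise can be illustrated with a generic, textbook-style SOM written in plain Python. This is a sketch under stated assumptions and not the configuration used in this study: the initialization from random samples, the linearly decaying learning rate and neighbourhood radius, the number of epochs, and all function names are choices made for the example.

```python
import math
import random


def best_matching_unit(weights, sample):
    """Return the (row, col) of the neuron whose weights are closest to sample."""
    best, best_distance = None, float("inf")
    for position, vector in weights.items():
        distance = sum((a - b) ** 2 for a, b in zip(vector, sample))
        if distance < best_distance:
            best, best_distance = position, distance
    return best


def train_som(data, rows=5, cols=5, epochs=50, seed=0):
    """Train a small SOM; weights start as randomly chosen samples (assumption)."""
    rng = random.Random(seed)
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for sample in rng.sample(data, len(data)):
            progress = step / total_steps
            rate = 0.5 * (1.0 - progress)  # learning rate decays towards 0
            radius = max(rows, cols) / 2.0 * (1.0 - progress) + 0.5
            win_r, win_c = best_matching_unit(weights, sample)
            for (r, c), vector in weights.items():
                grid_dist2 = (r - win_r) ** 2 + (c - win_c) ** 2
                influence = math.exp(-grid_dist2 / (2.0 * radius ** 2))
                for k in range(len(vector)):
                    vector[k] += rate * influence * (sample[k] - vector[k])
            step += 1
    return weights


def density_map(weights, data):
    """Count how many samples are projected onto each neuron."""
    counts = {position: 0 for position in weights}
    for sample in data:
        counts[best_matching_unit(weights, sample)] += 1
    return counts


if __name__ == "__main__":
    toy = [[float(i), float(i) + 1.0] for i in range(40)]  # invented inputs
    som = train_som(toy, epochs=5)
    print(sum(density_map(som, toy).values()))
```

Because neighbouring neurons are updated together, samples with similar inputs end up on the same or adjacent grids, and the density map simply counts the samples that land on each grid.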
3.2.2. SOM-SVM1 Model
According to the above SOM results, the map can be divided into four specified regions according to the rainfall-runoff process characteristics of the neurons. The specified region divisions are shown in Figure 5. The association between the four specified regions and the rainfall-runoff processes is illustrated in Figure 6. The regions are (1) increasing inflow region (region 1), (2) base flow region (region 2), (3) peaking hydrograph region (region 3), and (4) recession region (region 4). For example, region 1, located in the upper left area of the SOM (Figure 5), represents increasing reservoir inflow during the period (Figure 6). In this clustering process, all the forecast forms of each event are assigned to neurons. Then, according to (1), the forecasting pattern within the neurons can be expressed as

$$\hat{Q}_{r,k}(t+\Delta t) = f_r\big(Q(t), Q(t-1), R_g(t), R_g(t-1)\big),$$

where, as in the equation mentioned above, $r$ expresses the $r$th specified region and $k$ expresses the $k$th neuron, and $f_r$ denotes the SVM regression built from the training forms of the $r$th specified region.
After the above process, 22 SOMs are generated, each from the other 21 typhoon events. Among the 22 events, the longest lasts 107 h and the shortest 33 h; therefore, the SOMs are not generated from data sets of the same length and content. Consequently, each generated SOM can be divided into four specified regions but always differs slightly from the other 21 maps. The training forms of the neurons in the same specified region are collected as the training form set used to establish the model for that region. These sets are used in SVM regression to obtain forecasts for the forecast forms of each neuron in the same specified region. Thus, the SOM-SVM1 model is established.
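The pooling step of the SOM-SVM1 model can be sketched as follows. The neuron-to-region assignment in the usage lines is a made-up example (the actual divisions are those of Figure 5), and the function and argument names are hypothetical.

```python
def pool_by_region(forms_by_neuron, region_of_neuron):
    """Collect the training forms of all neurons that share a specified region.

    forms_by_neuron maps a neuron ID to its list of training forms, and
    region_of_neuron maps a neuron ID to its region number (1 to 4).
    One SVM regression would then be fitted per pooled set.
    """
    pooled = {}
    for neuron_id, forms in forms_by_neuron.items():
        region = region_of_neuron[neuron_id]
        pooled.setdefault(region, []).extend(forms)
    return pooled


if __name__ == "__main__":
    # Made-up assignment of three neurons to two regions.
    forms = {1: ["a"], 2: ["b", "c"], 3: ["d"]}
    regions = {1: 1, 2: 1, 3: 2}
    print(pool_by_region(forms, regions))
```

In this strategy a neuron never borrows data from a neighbour that lies in another region, which is the limitation the SOM-SVM2 model is designed to remove.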
3.2.3. SOM-SVM2 Model
The SOM-SVM1 model above is established by exploiting the close relationships within each of the four reservoir inflow processes. However, some training forms might be allocated to different neurons and, thus, to different specified regions for different events. Such training forms may still hold important information for the other specified regions but cannot pass this information to them. In addition, because the 22 typhoon events differ in strength and length, the training forms of each typhoon event are not distributed evenly over the 25 neurons of the feature map. The maximum density number on the feature map is 295, and the minimum is 12. For different flow processes, a high-density neuron may hold less information that is useful to other neurons, while a low-density neuron may hold more. However, a large amount of information can be found in nearby neurons. Furthermore, because the clusters take the lead-time information into account, some of the forecasting forms of the SOM-SVM1 model cannot pass information to the forecasts in different regions. To cope with these problems, the SOM-SVM2 model is proposed.
According to the feature map generated from the SOM discussed in Section 3.2.1, a different enlarged training form is adopted here. In this enlarged training form, besides the training data set of each neuron, only the training data sets of the surrounding neurons in the cross region are adopted as training forms. This means that, in the cross region of each neuron, at most four surrounding neurons contribute training forms. For the neurons located on the edge of the competitive layer, only two or three surrounding neurons contribute. Then, according to (2), the forecasting form of a $5 \times 5$ SOM feature map can be written as

$$\hat{Q}_{i,j}(t+\Delta t) = f_{i,j}\big(Q(t), Q(t-1), R_g(t), R_g(t-1)\big),$$

where $1 \le i \le 5$, $1 \le j \le 5$, $(i, j)$ identifies the neuron by its position (the neuron ID numbers are shown in Figure 3), $i$ and $j$ are integers, and $f_{i,j}$ denotes the SVM regression built from the enlarged training form of that neuron. Table 2 lists the neuron ID numbers that make up the enlarged training data set of each neuron for the Polly Typhoon event in the SOM-SVM2 model. Taking the neuron ID numbers 1, 10, and 17 listed in Table 2 as examples, the enlarged training data sets of these neurons are presented in Figure 7. The enlarged training data sets are generated with the same structure for each typhoon event. Then, the forecast forms are used to implement the SVMs and generate the SOM-SVM2 model results. Unlike in the SOM-SVM1 model, the enlarged training data set of SOM-SVM2 is not selected by considering the inflow process. In this study, irrespective of how many neurons are selected, the training forms are selected simply according to neuron location. This selection strategy not only considers the training data with closer relationships but also does not need to exclude the training data sets that lie in a different specified region. Similarly, the above forecast forms are used to implement the SVMs for inflow forecasting.
3.2.4. Comparison between the Original SVM and the SOM-SVM1
In the SOM-SVM1 model, the SOM process clusters the forecast forms into neurons according to their input characteristics and then groups the neurons into regions related to the different inflow processes of the typhoon events. The forecasts can therefore be generated from closely related inputs, and weakly related inputs can be ignored. Taking Typhoon Polly, an event with distinct rising and falling rainfall-runoff processes, as an example, the SOM-SVM1 model generates well-performing forecasts by considering only the forecast forms that belong to the relevant inflow specified region. The original SVM model, by contrast, generates forecasts irrespective of whether or not the forecast forms have extreme values, whereas the SOM-SVM1 model handles the forecast forms of each rainfall-runoff process separately. That is, the SOM-SVM1 model is established to strengthen the forecasting ability for different rainfall-runoff processes.
The inflow results of the original SVM model and the SOM-SVM1 model for 1 h ahead forecasting are listed in Table 3. Compared with the original SVM model, the SOM-SVM1 model yields smaller RRMSE values for every typhoon event. The original SVM model derives a better CE value only for the Xangsane typhoon, and for that event both models reach a CE value of 0.9. Overall, the CE values of the original SVM model reach 0.9 for only six events, whereas those of the SOM-SVM1 model do so for 19. In addition, Figures 8(a) and 8(b) show that the forecasts of the SOM-SVM1 model have significantly less scatter than those of the original SVM model when plotted against the measured values. Results for further lead times are presented in Table 4. The average CE and RRMSE values of the SOM-SVM1 model remain better than those of the original SVM model for the different lead times. The improvements in MCE for 1, 2, and 3 h ahead forecasts are 21.5%, 18.4%, and 23.0%, respectively (Table 5).
3.2.5. SOM-SVM Results Comparison
All three SVM-based models are compared in this section. Taking Typhoon Polly as an example, a comparison of the three SVM-based reservoir inflow forecasts for 1 h to 3 h ahead is presented in Figures 9–11. In these figures, the forecasts of the SOM-SVM2 model are better than those of the original SVM model but not markedly better than those of the SOM-SVM1 model. This is because, after the SOM clustering process, all the forecast forms with similar input characteristics are assigned to the same or nearby neurons. Because the SVM-based models are established from these selected inputs (taken from nearby, meaningful neurons) instead of all the inputs, both selection strategies strengthen the models. The purpose of this study is to find an excellent selection strategy for typhoon events. Figures 8(a), 8(b), and 8(c) are compared again here. As mentioned above, Figures 8(a) and 8(b) show that the SOM-SVM1 model generates more tightly concentrated forecasts than the original SVM model. Figure 8(c) shows that the forecasts of the SOM-SVM2 model are more concentrated than those of the other two SVM-based models.
The 1 h ahead forecasting results of all three SVM-based models are shown in Table 4. As mentioned above, the SOM-SVM1 model outperforms the original SVM model. Although all RRMSE values of the SOM-SVM2 model are lower than those of both the original SVM model and the SOM-SVM1 model, its CE value is not better than that of the SOM-SVM1 model for every typhoon event. Nevertheless, in contrast to the other two SVM-based models, all the CE values of the SOM-SVM2 model surpass 0.9. Table 5 lists the improvements in MCE of the SOM-SVM1 and SOM-SVM2 models over the original SVM model. The same improvements are shown in Figure 12. They show that the SOM-SVM2 model achieves an improvement of more than 20% over the original SVM model, reaching 35.4% in 3 h ahead forecasts. This is a clear gain relative to the other two models.
Finally, the performances of the SOM-SVM1 and SOM-SVM2 models are compared again. As mentioned above, in 1 h ahead forecasts all the CE values of the SOM-SVM2 model exceed 0.9, whereas only 19 do so for the SOM-SVM1 model. However, this does not mean that the CE value of the SOM-SVM2 model is higher than that of the SOM-SVM1 model for every typhoon event. As Table 4 shows, the SOM-SVM1 model has the better CE value for 8 of the 22 events. Nevertheless, the MCE indicates that the average CE value of the SOM-SVM2 model is better than that of the SOM-SVM1 model, and the RRMSE values in Table 4 show that the SOM-SVM2 model generates more accurate forecasts.
The improvements of the SOM-SVM2 model over the SOM-SVM1 model are presented in Table 6 and Figure 13. For 1 h ahead forecasts, the MCE value of the SOM-SVM2 model is only 0.009 greater than that of the SOM-SVM1 model, an improvement of just 0.96%. For 2 h and 3 h ahead inflow forecasts, however, the SOM-SVM2 model is clearly better, with improvements of 2.25% and 10.08%, respectively. This means that forecasts generated from the training data sets selected by the SOM-SVM2 strategy are better than those generated from the training data sets selected by the SOM-SVM1 strategy. As the lead time is extended, the SOM-SVM2 model shows a strong improvement over the other two models.
In brief, the SOM-SVM2 model is built to forecast typhoon events with a simple structure and a limited amount of closely related data. For the SOM-SVM1 model, the data distribution on the SOM must first be clustered according to the different reservoir inflow processes, so that the forecasts of typhoon events are generated from fewer, more closely related data. The SOM-SVM2 model is more robust because its cross region selection strategy does not require the distribution of reservoir inflow processes on the SOM to be understood, yet it still collects fewer, more closely related data. Furthermore, the SOM-SVM2 model forecasts typhoon events better than both the original SVM model and the SOM-SVM1 model.
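The percentages reported for 2 h and 3 h ahead forecasts can be cross-checked with a little arithmetic, assuming each improvement is a relative change in MCE (an assumption of this sketch, since the definition is not restated here). Under that assumption, the gain of SOM-SVM1 over the original SVM and the gain of SOM-SVM2 over SOM-SVM1 compound multiplicatively. The function name is hypothetical; the input percentages are those quoted in the abstract and in this section.

```python
def compound_improvement(first_percent, second_percent):
    """Combine two successive relative improvements into one percentage.

    If model B improves on model A by first_percent and model C improves
    on model B by second_percent, the returned value is the improvement
    of model C over model A (relative changes multiply).
    """
    factor = (1.0 + first_percent / 100.0) * (1.0 + second_percent / 100.0)
    return (factor - 1.0) * 100.0


if __name__ == "__main__":
    # 2 h ahead: 18.5% (SOM-SVM1 over SVM), then 2.25% (SOM-SVM2 over SOM-SVM1).
    print(round(compound_improvement(18.5, 2.25), 1))
    # 3 h ahead: 23.0%, then 10.08%.
    print(round(compound_improvement(23.0, 10.08), 1))
```

The two results round to 21.2 and 35.4, which agree with the improvements of the SOM-SVM2 model over the original SVM model reported for 2 h and 3 h ahead forecasts.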
4. Summary and Conclusions
The objective of this paper is to develop a precise and stable reservoir inflow forecasting model for reservoir operations during typhoon periods. For this purpose, instead of the original SVM model, two different enlarged training form selection strategies based on the SOM are combined with SVMs to construct a piecewise nonlinear model. The first, the SOM-SVM1 model, considers inputs selected according to the inflow processes. The second, the SOM-SVM2 model, selects training forms according to the location of the neurons on the SOM feature map.
In conclusion, there are at least three reasons favoring the use of the SOM-SVM2 model for inflow forecasts. First, both of the developed SOM-combined SVM models build on the SOM, which strengthens the forecasting ability. Second, the SOM-SVM2 model can adopt the forecasting forms without considering the clusters on the SOM feature map. It needs only an enlarged training form containing the training data sets of each neuron and its surrounding neurons. These neurons are selected according to their positions on the SOM feature map alone, without an external definition of the inflow processes. Finally, although the SOM-SVM2 model derives only a 0.96% improvement in 1 h reservoir inflow forecast results, it derives 2.25% and 10.08% improvements in MCE for 2 h and 3 h lead-time forecast results, respectively, when compared to the SOM-SVM1 model. Moreover, the SOM-SVM2 model exhibits 20.9%, 21.2%, and 35.4% improvements in MCE for reservoir inflow forecasts with 1 h, 2 h, and 3 h lead times, respectively, when compared to the original SVM model. In other words, the advantage of the SOM-SVM2 model is most obvious at the longest lead time. The proposed SOM-SVM2 model is recommended as an alternative to the existing models because of its accuracy, robustness, and efficiency. This modeling technique is expected to be useful in improving reservoir inflow forecasting.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
V. N. Vapnik, Statistical Learning Theory, Wiley, New York, NY, USA, 1998.
N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, New York, NY, USA, 2000.
X. Y. Yu, S. Y. Liong, and V. Babovic, “EC-SVM approach for real-time hydrologic forecasting,” Journal of Hydroinformatics, vol. 6, no. 3, pp. 209–223, 2004.
C. Sivapragasam and S.-Y. Liong, “Flow categorization model for improving forecasting,” Nordic Hydrology, vol. 36, no. 1, pp. 37–48, 2005.
H. Yang, K. L. Hsu, S. Sorooshian, and X. Gao, “Self-organizing nonlinear output (SONO): a neural network suitable for cloud patch-based rainfall estimation from satellite imagery at small scales,” Water Resources Research, vol. 41, Article ID W03008, 2005.
T. Kohonen, Self-Organizing Maps, Springer, New York, NY, USA, 2001.