Abstract

To overcome the limited accuracy of single-model forecasts, a new optimal weight combination model is established to improve the accuracy of precipitation forecasting. It combines three forecast submodels, namely the rank set pair analysis (R-SPA) model, the radial basis function (RBF) model, and the autoregressive (AR) model, with a weight optimization model based on the improved real-code genetic algorithm (IRGA). The new model for forecasting precipitation time series is tested using the annual precipitation data of Beijing, China, from 1978 to 2008. The results indicate that the optimal weights of the combination model can be obtained by using the genetic algorithm. Compared with the R-SPA, RBF, and AR models, the new model improves the forecast accuracy of precipitation in terms of the error sum of squares; the improvement in precision is 22.6%, 47.4%, and 40.6%, respectively. This new forecast method is an extension of the combination prediction method.

1. Introduction

Precipitation time series forecasting has received considerable attention worldwide because the uncertainty of climate change increases the difficulty of forecasting such series accurately. Nonlinear and uncertain time series are very difficult to forecast with traditional deterministic mathematical models, which poses new challenges for improving forecast accuracy [1, 2]. Many methods have been proposed for predicting complex time series [3–13].

The rank set pair analysis (R-SPA) model is based on the principle of set pair analysis; in this model, rank is taken as the characteristic of the time series that serves as the standard for similarity analysis. The radial basis function (RBF) neural network was first introduced by Broomhead and Lowe [7]. The RBF network model is motivated by the locally tuned response observed in biological neurons, which can be found in several parts of the nervous system. The theoretical basis of the RBF approach lies in the field of interpolation of multivariate functions [8]. Chau applied a particle swarm optimization training algorithm to artificial neural networks (ANNs) for prediction [3, 4]. The autoregressive (AR) model describes a random process and is often used to model and forecast various types of natural phenomena.

Combination model techniques provide a consensus forecast through a linear combination of individual model predictions according to different weighting strategies. In the simplest case the weights are equal for all models; alternatively, they can be determined through regression-based methods [9]. The concept of combining the forecasts obtained from different models has been discussed and used previously [10–19]. A sensible combination of the outputs of different models has the additional merit that it may assist in understanding the underlying physical processes. Genetic algorithms (GAs) encode a potential solution to a specific problem on a simple chromosome-like data structure and apply recombination operators to these structures so as to preserve critical information. GAs are chosen to calculate the weights of the three submodels because of their outstanding performance in optimization, especially in finding optimal parameters.

This study first combines the three submodels introduced above, and the improved real-code genetic algorithm (IRGA) [19] is used to calculate the weights of the combination model. The three submodels and the new optimal weight combination model are then used to forecast the annual precipitation of Beijing from 2004 to 2008. Section 2 presents the optimal weight combination model, Section 3 discusses its application, and Section 4 gives the conclusions.

2. The Optimal Weight Combination Model

In this paper, the procedure of establishing the new optimal weight combination model can be divided into three steps as follows.
(1) Construct the weight combination model.
(2) Establish three submodels.
(3) Calculate the weights of the three submodels by using IRGA.

The flow chart of this procedure is shown in Figure 1.

2.1. Construction of the Weight Combination Model

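In the case of $n$ forecast models, the weight combination forecast model [20] may be expressed as

$$Q_t = \sum_{i=1}^{n} w_i \hat{Q}_{i,t} + e_t, \quad t = 1, 2, \ldots, N, \tag{2.1}$$

where $Q_t$ is the observed discharge of the $t$th time period, $w_i$ is the weight assigned to the $i$th model, $\hat{Q}_{i,t}$ is the discharge estimated by the $i$th model for the $t$th time period, and $e_t$ is the combination error term.

Equation (2.1) can be represented in matrix notation as

$$Q = \hat{Q} W + e, \tag{2.2}$$

where $\hat{Q}$ is the $N \times n$ input matrix defined by

$$\hat{Q} = \begin{pmatrix} \hat{Q}_{1,1} & \hat{Q}_{2,1} & \cdots & \hat{Q}_{n,1} \\ \hat{Q}_{1,2} & \hat{Q}_{2,2} & \cdots & \hat{Q}_{n,2} \\ \vdots & \vdots & & \vdots \\ \hat{Q}_{1,N} & \hat{Q}_{2,N} & \cdots & \hat{Q}_{n,N} \end{pmatrix}, \tag{2.3}$$

$Q = (Q_1, Q_2, \ldots, Q_N)^T$ is the output vector, $W = (w_1, w_2, \ldots, w_n)^T$ is the weight vector, $e = (e_1, e_2, \ldots, e_N)^T$ is the combination error vector, $T$ denotes the transpose of the vector, and $N$ is the total number of observations.

In the weight combination forecasting model, the sum of the weights is normally constrained to be equal to unity, that is,

$$\sum_{i=1}^{n} w_i = 1. \tag{2.4}$$

The value of each weight cannot be less than zero, that is,

$$w_i \ge 0, \quad i = 1, 2, \ldots, n. \tag{2.5}$$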
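As an illustration, the weighted combination and its two weight constraints (weights sum to unity and are nonnegative) can be sketched in Python. The function name `combine_forecasts`, the tolerance `tol`, and the numerical values are hypothetical and serve only to show the arithmetic; they are not data from this study.

```python
import numpy as np

def combine_forecasts(forecasts, weights, tol=1e-9):
    """Weighted combination of submodel forecasts.

    forecasts: array of shape (N, n), one column per submodel.
    weights:   array of shape (n,), nonnegative and summing to one.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if forecasts.ndim != 2 or weights.shape != (forecasts.shape[1],):
        raise ValueError("need one weight per submodel column")
    if np.any(weights < -tol):
        raise ValueError("weights must be nonnegative")
    if abs(weights.sum() - 1.0) > tol:
        raise ValueError("weights must sum to one")
    return forecasts @ weights

# Hypothetical forecasts of two periods by three submodels.
P_demo = np.array([[500.0, 520.0, 480.0],
                   [610.0, 590.0, 600.0]])
w_demo = np.array([0.2, 0.3, 0.5])
combined_demo = combine_forecasts(P_demo, w_demo)
```

Checking the constraints before combining keeps an invalid weight vector from silently producing a biased forecast.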

2.2. Establishment of the Three Submodels
2.2.1. Rank Set Pair Analysis (R-SPA) Model

The procedure for establishing this model is as follows.

(1) Construction of the sets. For an annual precipitation series $x_1, x_2, \ldots, x_n$, we construct the history sets $A_k = (x_k, x_{k+1}, \ldots, x_{k+T-1})$, $k = 1, 2, \ldots, n-T$, and the current set $B = (x_{n-T+1}, \ldots, x_n)$. The subsequent value of $A_k$ is $x_{k+T}$, and the subsequent value of $B$ is the value $x_{n+1}$ to be forecast. These sets and their subsequent values are represented in Table 1. Because of the weak dependence in the annual precipitation series, we assume that the number of elements $T$ in each history set and in the current set is an integer from 4 to 6.

(2) Rank transformation. We mark the elements of each set from 1 to $T$ according to their rank within the set they belong to. If some elements have the same rank, we mark them with their average rank, rounded off. This yields the rank sets $A'_k$ and $B'$.

(3) Construction of the rank set pairs. We construct the rank set pairs $(A'_k, B')$ and calculate the difference $d$ between the corresponding elements of $A'_k$ and $B'$. If $|d|$ is equal to zero, we mark the elements "identical"; if $|d|$ is greater than a given threshold, we mark them "contrary"; if $|d|$ lies between zero and that threshold, we mark them "discrepant." We then count the numbers of "identical," "contrary," and "discrepant" elements in each rank set pair. With $i$ the coefficient of the discrepancy degree and $j$ the coefficient of the contrary degree, the connection degree is given by

$$\mu = \frac{S}{N} + \frac{F}{N} i + \frac{P}{N} j, \tag{2.6}$$

where $\mu$ is the connection degree of the set pair, $N$ denotes the total number of characteristics of the set pair, $S$ represents the number of identity characteristics, $P$ is the number of contrary characteristics, and $F$ is the number of characteristics that are neither identity nor contrary. According to (2.6), we calculate the connection degree of each rank set pair.

(4) Forecast. In accordance with the maximum principle, we find the rank set $A'_k$ that is most similar to $B'$; under certain circumstances several similar sets can be found. $A_k$ is the counterpart of $A'_k$, and the subsequent value of $A_k$ is $x_{k+T}$. The value of $x_{n+1}$ is obtained through

$$x_{n+1} = \frac{1}{m} \sum_{k} \omega_k x_{k+T}, \tag{2.7}$$

where the sum runs over the similar sets, $\omega_k$ is the ratio of the average of the elements in $B$ to the average of the elements in $A_k$, and $m$ is the number of sets similar to $B$.
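The four R-SPA steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the set length `T`, the threshold `contrary_gap` that separates "discrepant" from "contrary" (defaulting here to `T - 2`), the coefficient values `i_coef = 0.5` and `j_coef = -1.0`, the half-up rounding of tied ranks, and all function names are assumptions introduced for illustration.

```python
import numpy as np

def rank_transform(values):
    """Rank elements 1..T within a set; ties get their average rank, rounded."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        # Half-up rounding of the average rank (assumed rounding rule).
        ranks[tied] = np.floor(ranks[tied].mean() + 0.5)
    return ranks

def connection_degree(rank_a, rank_b, contrary_gap, i_coef=0.5, j_coef=-1.0):
    """Connection degree: identical/N + (discrepant/N)*i + (contrary/N)*j."""
    d = np.abs(np.asarray(rank_a, dtype=float) - np.asarray(rank_b, dtype=float))
    n = len(d)
    identical = np.sum(d == 0)
    contrary = np.sum(d > contrary_gap)
    discrepant = n - identical - contrary
    return identical / n + discrepant / n * i_coef + contrary / n * j_coef

def rspa_forecast(series, T=5, contrary_gap=None, tol=1e-12):
    """One-step forecast from the history sets most similar to the current set."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    if contrary_gap is None:
        contrary_gap = T - 2  # assumed threshold
    current = x[n - T:]
    rank_b = rank_transform(current)
    # History set k is x[k:k+T]; its subsequent value is x[k+T].
    mu = np.array([
        connection_degree(rank_transform(x[k:k + T]), rank_b, contrary_gap)
        for k in range(n - T)
    ])
    similar = np.flatnonzero(mu >= mu.max() - tol)  # maximum principle
    ratios = current.mean() / np.array([x[k:k + T].mean() for k in similar])
    return float(np.mean(ratios * x[similar + T]))
```

When several history sets share the maximum connection degree, the sketch averages their rescaled subsequent values, in line with step (4).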

2.2.2. Radial Basis Function (RBF) Model

The RBF method can be interpreted as an artificial neural network consisting of three layers: an input layer of neurons that feeds the feature vectors into the network; a hidden layer of RBF neurons that calculates the outcome of the basis functions; and an output layer of neurons that calculates a linear combination of the basis functions [21, 22]. Different numbers of hidden-layer neurons and different spread constants are tried in this study. The topological structure of the network is shown in Figure 2.

The procedure for establishing this model is as follows.

(1) Normalization of the time series. For an annual precipitation series $x_1, x_2, \ldots, x_n$, we transform the series to $x'_1, x'_2, \ldots, x'_n$ by the normalization formula

$$x'_t = \frac{x_t - x_{\min}}{x_{\max} - x_{\min}}, \tag{2.8}$$

where $x_{\min}$ and $x_{\max}$ denote the minimum and the maximum of the time series.

(2) Forecast of the data. The application of the RBF neural network to time series data consists of two steps. The first step is the training of the network: the first $m$ values of the new series are chosen as the training sample, and the RBF neural network is set up. Once the training stage is completed, the network is applied to forecasting: based on the network established from the training sample, we forecast the last $n - m$ elements of the series, and the forecasting series can be represented as $\hat{x}'_{m+1}, \ldots, \hat{x}'_n$. In this study, the mean-square error goal is set to 0.0001 and the width of the radial basis function is set to 1.

(3) Denormalization of the forecasting series. Since the values of the elements of the forecasting series lie between zero and one, that is, $0 \le \hat{x}'_t \le 1$, we denormalize the forecasting series to obtain the final forecast through the denormalization formula

$$\hat{x}_t = \hat{x}'_t (x_{\max} - x_{\min}) + x_{\min}. \tag{2.9}$$
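The normalization, RBF, and denormalization steps can be sketched in Python as follows. The sketch assumes a Gaussian basis function, places one hidden neuron at every training input, and solves for the output weights by least squares; this is one simple way to build an RBF network and is not necessarily the training procedure used in this study. All function names are hypothetical.

```python
import numpy as np

def normalize(x):
    """Min-max normalization to [0, 1]; also returns the minimum and maximum."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo), lo, hi

def denormalize(z, lo, hi):
    """Inverse of the min-max normalization."""
    return np.asarray(z, dtype=float) * (hi - lo) + lo

def rbf_design(X, centers, width):
    """Hidden-layer outputs: Gaussian basis exp(-||x - c||^2 / (2 width^2))."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))

def fit_rbf(X, y, width=1.0):
    """Output-layer weights by least squares, with centers at the inputs X."""
    Phi = rbf_design(X, X, width)
    weights, *_ = np.linalg.lstsq(Phi, np.asarray(y, dtype=float), rcond=None)
    return weights

def predict_rbf(X_new, centers, weights, width=1.0):
    """Output layer: linear combination of the basis function outputs."""
    return rbf_design(X_new, centers, width) @ weights
```

For a time series, each row of `X` would typically hold a few lagged normalized values and `y` the value that follows them; the forecast returned by `predict_rbf` is then passed through `denormalize`.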

2.2.3. Autoregressive (AR) Model

In this paper, we regard the annual precipitation data as a time series from which the trend term, the seasonal term, and the random term can be extracted in sequence. Superposing the trend term, the seasonal term, and the random term gives the following equation [23–25]:

$$x_t = T_t + S_t + R_t, \tag{2.10}$$

where $x_t$ is the precipitation time series, $T_t$ is the trend term, $S_t$ is the seasonal term, and $R_t$ is the random term.

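The procedure for establishing the autoregressive model is as follows.

(1) Extraction of the trend term. In this paper, the data show a clear quadratic component, so a polynomial function is used to fit the precipitation data. The trend term can be described as

$$T_t = c_0 + c_1 t + c_2 t^2, \tag{2.11}$$

where $c_0$, $c_1$, and $c_2$ are the coefficients of the quadratic polynomial (2.11).

(2) Extraction of the seasonal term. The seasonality of precipitation can be analyzed by spectral analysis and represented by a superposition of waves. Let $y_t = x_t - T_t$ be the series obtained by subtracting the trend term from $x_t$. The estimated value $\hat{S}_t$ of the seasonal term $S_t$ can be defined as

$$\hat{S}_t = \sum_{k=1}^{m} \left( a_k \cos\frac{2\pi k t}{n} + b_k \sin\frac{2\pi k t}{n} \right), \tag{2.12}$$

where $m$ is the number of harmonic waves, $n$ is the length of the series, and $a_k$ and $b_k$ are the coefficients of the Fourier series (2.12):

$$a_k = \frac{2}{n} \sum_{t=1}^{n} y_t \cos\frac{2\pi k t}{n}, \qquad b_k = \frac{2}{n} \sum_{t=1}^{n} y_t \sin\frac{2\pi k t}{n}.$$

To limit the computational workload, only the significant waves are used for forecasting. The $k$th wave is defined as significant when its squared amplitude $a_k^2 + b_k^2$ exceeds a critical value determined by the level of significance $\alpha$ (%) and the variance $s^2$ of the series.

(3) Extraction of the random term. The random term is defined as a linear combination of its own past values:

$$R_t = \varphi_1 R_{t-1} + \varphi_2 R_{t-2} + \cdots + \varphi_p R_{t-p} + \varepsilon_t,$$

where $p$ is the order of the model, $\varphi_1, \ldots, \varphi_p$ denote the coefficients of the regression model, and $\varepsilon_t$ is the residual. The order can be determined by Akaike's Information Criterion (AIC):

$$\mathrm{AIC}(p) = n \ln \sigma_\varepsilon^2 + 2p,$$

where $n$ is the length of the series and $\sigma_\varepsilon^2$ represents the variance of $\varepsilon_t$. The appropriate value of $p$, that is, the one with the smallest AIC, is chosen among 1, 2, 3, and 4.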
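The trend extraction and the AIC-based choice of the autoregressive order can be sketched in Python as follows. The sketch assumes ordinary least squares for both the quadratic trend and the AR coefficients, and the common form $\mathrm{AIC}(p) = n \ln \sigma^2 + 2p$ with $\sigma^2$ the residual variance; the seasonal term is omitted for brevity, and all function names are hypothetical.

```python
import numpy as np

def quadratic_trend(x):
    """Fitted values of a least-squares quadratic trend over t = 1..n."""
    x = np.asarray(x, dtype=float)
    t = np.arange(1, len(x) + 1)
    coeffs = np.polyfit(t, x, deg=2)  # highest power first
    return np.polyval(coeffs, t)

def fit_ar(r, p):
    """Least-squares AR(p) fit; returns (coefficients, residual variance)."""
    r = np.asarray(r, dtype=float)
    # Column k holds the lag-k values r[t-k] for t = p .. n-1.
    lagged = np.column_stack([r[p - k: len(r) - k] for k in range(1, p + 1)])
    target = r[p:]
    phi, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    resid = target - lagged @ phi
    return phi, float(np.mean(resid ** 2))

def select_order(r, orders=(1, 2, 3, 4)):
    """Order with the smallest AIC(p) = n * ln(residual variance) + 2p."""
    n = len(r)
    aic = {p: n * np.log(fit_ar(r, p)[1]) + 2 * p for p in orders}
    return min(aic, key=aic.get), aic
```

In use, the random term passed to `select_order` would be the series that remains after the trend and seasonal terms have been subtracted.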

2.3. Calculation of the Weights of the Submodels

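The key to setting up the optimal weight combination model is to determine the weight of each forecasting model. In this study, we choose the weights for which the error sum of squares of the combination model is the minimum among all weight combination forecasting models, that is,

$$\min J = \sum_{t=1}^{N} e_t^2 = \sum_{t=1}^{N} \left( \sum_{i=1}^{n} w_i e_{it} \right)^2,$$

where $J$ is the error sum of squares of the combination model and the second equality holds because the weights sum to unity.

Two matrices $W$ and $E$ are defined as

$$W = (w_1, w_2, \ldots, w_n)^T, \qquad E = (E_{ij})_{n \times n}, \quad E_{ij} = \sum_{t=1}^{N} e_{it} e_{jt},$$

where $e_{it}$ is the error of the $t$th forecasting value of the $i$th model. The error sum of squares can then be represented as follows:

$$J = W^T E W. \tag{2.21}$$

If we obtain the value of $W$ that minimizes (2.21) subject to the constraints (2.4) and (2.5), then the optimal weight combination model is determined.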
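The equivalence between the error sum of squares of the combined forecast and a quadratic form in the weights can be checked numerically. The sketch below assumes that the error matrix has entries equal to the sums over time of products of submodel errors and that the weights sum to one; the function names and the small data set are hypothetical.

```python
import numpy as np

def error_matrix(observed, forecasts):
    """Matrix whose (i, j) entry is the sum over t of e_it * e_jt."""
    observed = np.asarray(observed, dtype=float)
    forecasts = np.asarray(forecasts, dtype=float)  # shape (N, n)
    errors = observed[:, None] - forecasts          # e_it, shape (N, n)
    return errors.T @ errors

def combined_sse(weights, E):
    """Error sum of squares of the combination as the quadratic form w' E w."""
    weights = np.asarray(weights, dtype=float)
    return float(weights @ E @ weights)

# Hypothetical observations and submodel forecasts (three periods, three models).
obs_demo = np.array([10.0, 12.0, 11.0])
fc_demo = np.array([[9.0, 11.0, 10.0],
                    [13.0, 12.0, 10.0],
                    [11.0, 10.0, 12.0]])
w_sse_demo = np.array([0.5, 0.3, 0.2])
E_demo = error_matrix(obs_demo, fc_demo)
J_demo = combined_sse(w_sse_demo, E_demo)
```

Because only the matrix of error products enters the objective, it can be computed once and reused for every candidate weight vector during the optimization.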

The genetic algorithm is an adaptive heuristic search algorithm premised on the evolutionary ideas of natural selection and genetic mutation, and it has long been regarded as a function optimizer [26–28]. The flow chart of the genetic algorithm is shown in Figure 3.

In this paper, we use the improved real-code genetic algorithm (IRGA) to solve this optimization problem. The population size is 20, the crossover fraction is 0.8, and the number of generations is 100.
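To make the optimization step concrete, the following Python sketch minimizes the quadratic error objective over nonnegative weights that sum to one with a plain real-coded genetic algorithm. It is not the IRGA of [19]: the binary tournament selection, arithmetic crossover, Gaussian mutation with standard deviation 0.05, elitism, the seeding of the initial population with the single-submodel solutions, and the reading of the crossover fraction as a per-child probability are all assumptions made for illustration. Only the population size, crossover fraction, and number of generations follow the settings stated above.

```python
import numpy as np

def project(w):
    """Map a vector to nonnegative weights that sum to one."""
    w = np.abs(w)
    total = w.sum()
    return w / total if total > 0 else np.full(len(w), 1.0 / len(w))

def ga_weights(E, pop_size=20, crossover_frac=0.8, generations=100, seed=0):
    """Approximately minimize w' E w over the weight simplex."""
    rng = np.random.default_rng(seed)
    E = np.asarray(E, dtype=float)
    n = E.shape[0]

    def cost(w):
        return float(w @ E @ w)

    pop = rng.dirichlet(np.ones(n), size=pop_size)
    pop[:n] = np.eye(n)  # single-submodel solutions (needs pop_size >= n)
    for _ in range(generations):
        costs = np.array([cost(w) for w in pop])
        children = [pop[np.argmin(costs)].copy()]  # elitism: keep the best
        while len(children) < pop_size:
            a, b = rng.integers(pop_size, size=2)  # binary tournaments
            p1 = pop[a] if costs[a] < costs[b] else pop[b]
            c, d = rng.integers(pop_size, size=2)
            p2 = pop[c] if costs[c] < costs[d] else pop[d]
            if rng.random() < crossover_frac:
                alpha = rng.random()               # arithmetic crossover
                child = alpha * p1 + (1.0 - alpha) * p2
            else:                                  # Gaussian mutation
                child = p1 + rng.normal(0.0, 0.05, size=n)
            children.append(project(child))
        pop = np.array(children)
    costs = np.array([cost(w) for w in pop])
    best = int(np.argmin(costs))
    return pop[best], float(costs[best])
```

Seeding the population with the single-submodel solutions, together with elitism, guarantees that the returned combination is never worse than the best individual submodel on the fitting data.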

3. Application of the Optimal Weight Combination Model

In this study, the data of the annual precipitation from 1978 to 2008 for Beijing are collected and shown in Figure 4.

First, we use the R-SPA, RBF, and AR models separately to forecast the annual precipitation of Beijing from 2004 to 2008. The outputs of the three models are shown in Table 2.

Based on the forecasts of the three submodels, the weights of the submodels in the combination model are obtained by using IRGA [19]. The weights of the three models are 22.9%, 37.2%, and 39.9%, respectively, as given in Table 3.

Based on the obtained weights, we calculate the forecasts of the optimal weight combination model; the output is presented in Table 4.

Comparing the output of the combination model with the outputs of the three submodels, we find that the error sum of squares of the combination model is clearly lower than that of any submodel. In this study, the error sum of squares is taken as the criterion for judging the precision of the forecast of the annual precipitation of Beijing; the improvement in precision of the new weight combination model over the three submodels is shown in Table 5.

We therefore conclude that the precision of the combination model is higher than that of the three submodels in terms of the error sum of squares.
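One plausible way to express such an improvement is the relative reduction in the error sum of squares. The Python sketch below assumes this definition, which is not stated explicitly in the text; the function name and the input values are hypothetical and are not results of this study.

```python
def relative_improvement(sse_submodel, sse_combined):
    """Percentage reduction of the error sum of squares relative to a submodel."""
    if sse_submodel <= 0:
        raise ValueError("the submodel error sum of squares must be positive")
    return 100.0 * (sse_submodel - sse_combined) / sse_submodel

# Hypothetical values: a submodel with SSE 200 and a combination with SSE 150.
improvement_demo = relative_improvement(200.0, 150.0)
```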

4. Conclusions

A new optimal weight combination model, based on the R-SPA, RBF, and AR submodels and a weight optimization model based on the improved real-code genetic algorithm (IRGA), is proposed in this paper. The annual precipitation time series of Beijing from 1978 to 2008 is studied by using the new model. The main conclusions are as follows.

(1) The three submodels, that is, the R-SPA, RBF, and AR models, are tested in forecasting the annual precipitation of Beijing. The results suggest that, in terms of the error sum of squares, R-SPA performs best and RBF worst among the three. Different models thus have different precision in forecasting annual precipitation.

(2) The optimal weights can be obtained by using IRGA in the new optimal weight combination model. The application results indicate that the weights of the submodels can be determined appropriately and that the method provides a new way to improve the precision of forecasts of complex precipitation time series.

(3) Compared with the R-SPA, RBF, and AR models, the proposed model improves the forecast accuracy of precipitation in terms of the error sum of squares; the improvement in precision is 22.6%, 47.4%, and 40.6%, respectively. The precision of precipitation forecasts can therefore be improved over that of the three submodels by establishing the new model.

(4) Because the combination model cannot completely avoid the drawbacks of the three submodels, its accuracy is inevitably affected. In the future, the accuracy of the combination model may be improved by applying more advanced submodels.

Acknowledgments

The present work is supported by the Program for the National Basic Research Program of China (no. 2010CB951104), the Project of National Natural Science Foundation of China (nos. 50939001 and 51079004), the Specialized Research Fund for the Doctoral Tutor Program of Higher Education (no. 20100003110024), and the Program for Changjiang Scholars and Innovative Research Team in University (no. IRT0809).