Abstract

We developed a method to predict heavy rainfall in South Korea with a lead time of one to six hours. To enable efficient prediction, we modified the AWS data for the recent four years by normalizing them to numeric values between 0 and 1 and by undersampling, adjusting the number of no-heavy-rain samples to equal the number of heavy-rain samples. Evolutionary algorithms were used to select important features. Discriminant functions, such as the support vector machine (SVM), the k-nearest neighbors algorithm (k-NN), and a variant k-NN (k-VNN), were adopted in the discriminant analysis. We divided our modified AWS data into three parts: the training set, ranging from 2007 to 2008, the validation set, 2009, and the test set, 2010. The validation set was used to select an important subset of the input features. The main features selected were precipitation sensing and accumulated precipitation for 24 hours. In comparative SVM tests using evolutionary algorithms, the results showed that the genetic algorithm was considerably superior to differential evolution. The equitable threat score of SVM with polynomial kernel was the highest among our experiments on average. k-VNN outperformed k-NN, but it was dominated by SVM with polynomial kernel.

1. Introduction

South Korea lies in the temperate zone and has four clearly distinguished seasons, with spring and fall relatively short compared to summer and winter. It is geographically located between the meridians 125°04′E and 131°52′E and the parallels 33°06′N and 38°27′N in the Northern Hemisphere, on the east coast of the Eurasian continent and adjacent to the western Pacific, as shown in Figure 1. Therefore, it has complex climate characteristics, which show both continental and oceanic features: a wide interseasonal temperature range and considerably more precipitation than the continental interior. In addition, it has pronounced monsoon winds, a rainy period from the East Asian monsoon, locally called Changma [1], typhoons, and frequent heavy snowfalls in winter. The area is classified as a wet region because its precipitation exceeds the world average.

The annual mean precipitation of South Korea, as shown in Figure 2, is around 1,500 mm, and around 1,300 mm in the central part. Geoje-si of Gyeongsangnam-do has the largest amount of precipitation, 2,007.3 mm, and Baegryeong Island of Incheon has the lowest, 825.6 mm.

When a stationary front lingers across the Korean Peninsula for about a month in summer, more than half of the annual precipitation falls during this Changma season, while precipitation in winter is less than 10% of the total. Changma is part of the summer Asian monsoon system. Over its roughly 30-day span, it brings frequent heavy rainfall and flash floods, and serious natural disasters often occur.

Heavy rainfall is one of the major severe weather phenomena in South Korea. It can lead to serious damage and loss of both life and infrastructure, so forecasting heavy rainfall is very important. However, it is considered a difficult task because heavy rainfall takes place over very short time intervals [2].

We need to predict this torrential downpour to prevent the losses of life and property [1, 3]. Heavy rainfall forecasting is very important to avoid or minimize natural disasters before the events occur. We used real weather data collected from 408 automatic weather stations [4] in South Korea, for the period from 2007 to 2010. We studied whether or not heavy rainfall will occur in South Korea one to six hours ahead. To the best of the authors' knowledge, this problem has not been handled by other researchers.

There have been many studies on heavy rainfall using various machine learning techniques. In particular, several studies focused on weather forecasting using artificial neural networks (ANNs) [5–11]. In the studies of Ingsrisawang et al. [11] and Hong [12], support vector machines were applied to develop classification and prediction models for rainfall forecasts. Our research differs from previous work in how the weather datasets are processed.

Kishtawal et al. [13] studied the prediction of summer rainfall over India using a genetic algorithm (GA). In their study, the genetic algorithm found the equations that best describe the temporal variations of the seasonal rainfall over India. The geographical region of India was divided into five homogeneous zones (excluding the North-West Himalayan zone). They used the monthly mean rainfall during the months of June, July, and August. The dataset consists of a training set, ranging from 1871 to 1992, and a validation set, ranging from 1993 to 2003. The first and second evolution processes were conducted using the training set and the validation set, respectively. The performance of the algorithm for each case was evaluated using the statistical criteria of standard error and fitness strength. Each chromosome was made up of the five homogeneous zones, annual precipitation, and the four elementary arithmetic operators. The strongest individuals (equations with the best fitness) were then selected to exchange parts of their character strings through reproduction and crossover, while individuals less fitted to the data were discarded. A small percentage of the equation strings' most basic elements, single operators and variables, were mutated at random. The process was repeated a large number of times (about 1,000–10,000) to improve the fitness of the evolving population of equations. The major advantage of using a genetic algorithm over other nonlinear forecasting techniques, such as neural networks, is that an explicit analytical expression for the dynamic evolution of the rainfall time series is obtained. However, they used quite simple, typical parameter settings for their genetic algorithm; experiments that tuned these parameters might have yielded better performance.

Liu et al. [14] proposed a filter method for feature selection. A genetic algorithm was used to select major features in their study, and the features were used for data mining based on machine learning. They proposed an improved naive Bayes classifier (INBC) technique and explored the use of genetic algorithms (GAs) for selection of a subset of input features in classification problems. They then carried out a comparison of the following algorithms on real meteorological data in Hong Kong: genetic algorithm with average classification or general classification (GA-AC, GA-C), C4.5 with pruning, and INBC with relative frequency or initial probability density (INBC-RF, INBC-IPD). In their experiments, daily observations of meteorological data were collected from the Observatory Headquarters and King's Park for training and test purposes, for the period from 1984 to 1992 (Hong Kong Observatory). Within this period, they extracted only the data from May to October (the rainy season) of each year. INBC achieved about a 90% accuracy rate on the rain/no-rain (Rain) classification problem. The method also attained reasonable performance, around 65%–70%, on rainfall prediction with three-level depth (Depth 3) and five-level depth (Depth 5). They used a filter method for feature selection; in general, a wrapper method is known to perform better than a filter method, so in this study we apply a wrapper method to feature selection.

Nandargi and Mulye [15] analyzed the period of 1961–2005 to understand the relationship among rainy days, mean daily intensity, and seasonal rainfall over the Koyna catchment in India, on monthly as well as seasonal scales. They compared a linear relationship with a logarithmic relationship in the case of seasonal rainfall versus mean daily intensity.

Routray et al. [16] presented a performance-based comparison of simulations, carried out using the nudging (NUD) technique and a three-dimensional variational (3DVAR) data assimilation system, of a heavy rainfall event that occurred during 25–28 June 2005 along the west coast of India. In the experiment, after assimilating observations with the 3DVAR technique, the model simulated the structure of the convective organization, as well as prominent synoptic features associated with the mid-tropospheric cyclone (MTC), better than the NUD experiment, and its results correlated well with the observations.

Kouadio et al. [17] investigated relationships between simultaneous occurrences of distinctive atmospheric easterly wave (EW) signatures that cross the south equatorial Atlantic, intense mesoscale convective systems (lifespan > 2 hours) that propagate westward over the western south equatorial Atlantic, and subsequent strong rainfall episodes (anomaly > 10 mm/day) that occur in eastern Northeast Brazil (ENEB). They forecasted rainfall events through real-time monitoring and simulation of this ocean-atmosphere relationship.

Afandi et al. [2] investigated heavy rainfall events that occurred over the Sinai Peninsula and caused flash floods, using the Weather Research and Forecasting (WRF) model. The test results showed that the WRF model was able to capture the heavy rainfall events over different regions of Sinai and to predict rainfall in good agreement with real measurements.

Wang and Huang [18] searched for evidence of self-organized criticality (SOC) in rain datasets from China, employing the theory and methods of SOC. To that end, they analyzed the long-term rain records of five meteorological stations in Henan, a central province of China. They found that the long-term rain processes in central China exhibit the feature of self-organized criticality.

Hou et al. [19] studied the impact of three-dimensional variational data assimilation (3DVAR) on the prediction of two heavy rainfall events over southern China in June and July. They examined two heavy rainfall events: one affected several provinces in southern China with heavy rain and severe flooding; the other was characterized by nonuniformity and extremely high rainfall rates in localized areas. Their results suggested that assimilating all radar, surface, and radiosonde data had a more positive impact on the forecast skill than assimilating any single type of data, for both rainfall events.

As a similar approach to ours, Lee et al. [20] studied feature selection using a genetic algorithm for heavy-rain prediction in South Korea. They used ECMWF (European Centre for Medium-Range Weather Forecasts) weather data collected from 1989 to 2009. They selected five features among 254 weather elements to examine the performance of their model. The five features selected were height, humidity, temperature, U-wind, and V-wind. In their study, a heavy-rain event is declared only when precipitation during six hours is higher than 70 mm. They used a wrapper-based feature selection method with a simple genetic algorithm and SVM with RBF kernel as the fitness function. They did not address errors and inconsistencies in their weather data. In this paper, we use the weather data collected from 408 automatic weather stations during the recent four years from 2007 to 2010. Our heavy-rain criterion is exactly that of the Korea Meteorological Administration in South Korea, as shown in Section 3. We validate our algorithms with various machine learning techniques, including SVM with different kernels. We also explain and fix errors and inconsistencies in our weather data in Section 2.

The remainder of this paper is organized as follows. In Section 2, we propose data processing and methodology for very short-term heavy rainfall prediction. Section 3 describes the environments of our experiments and analyzes the results. The paper ends with conclusions in Section 4.

2. Data and Methodology

2.1. Dataset

The weather data, collected from 408 automatic weather stations during the recent four years from 2007 to 2010, contained a considerable number of missing values, erroneous values, and unrelated features. We analyzed the data and corrected the errors. We preprocessed the original data given by KMA in accordance with Table 1. Some weather elements of the original data had incorrect values, and we replaced such values with a very small sentinel value (−10^7). We created several elements, such as month (1–12) and accumulated precipitation for 3, 6, and 9 hours (0.1 mm), from the original data [21]. We removed or interpolated a day's data when important weather elements of that day held the sentinel value. Likewise, we removed or interpolated derived elements, such as accumulated precipitation for 3, 6, and 9 hours, that had incorrect values. We undersampled the weather data so that the proportion of heavy-rain to no-heavy-rain cases in the training set became one, as shown in Section 2.3.
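As an illustration of this preprocessing, the following is a minimal Python sketch using pandas. The column name "precip", the gap-interpolation limit, and the handling of the sentinel are our own illustrative assumptions, not KMA's actual schema.

import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # df: hourly AWS records for one station with a DatetimeIndex and
    # a "precip" column in 0.1 mm units (column name is an assumption).
    df = df.copy()
    # Treat physically impossible readings as missing.
    df.loc[df["precip"] < 0, "precip"] = np.nan
    # Derived elements: month (1-12) and accumulated precipitation.
    df["month"] = df.index.month
    for h in (3, 6, 9):
        df[f"precip_{h}h"] = df["precip"].rolling(h, min_periods=h).sum()
    # Interpolate short gaps; remove records that remain incomplete.
    return df.interpolate(limit=2).dropna()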

The new data were generated in two forms: with and without normalization. The training set, ranging from 2007 to 2008, was generated by undersampling. The validation set, the data for 2009, was used to select an important subset of the input features. The selected features were then used for experiments with the test set, the data for 2010. The representation used by our GA and DE was composed of 72 features accumulated over the most recent six hours, as shown in Figure 3. The symbols shown in Figure 3 denote the modified weather elements, ordered by the index numbers in Table 1. The symbol "—" in Table 1 means not applicable (NA).
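A minimal sketch of how such a 72-dimensional representation can be assembled, assuming the twelve modified weather elements are stored as columns of an hourly array; the function name is hypothetical.

import numpy as np

N_ELEMENTS, WINDOW = 12, 6  # twelve weather elements, latest six hours

def window_features(records: np.ndarray, t: int) -> np.ndarray:
    # records: array of shape (n_hours, N_ELEMENTS) for one station.
    # Returns the 72-dimensional feature vector for a forecast issued
    # at hour t, built from hours t-5 .. t.
    assert t >= WINDOW - 1
    return records[t - WINDOW + 1 : t + 1].reshape(-1)  # 6 x 12 -> 72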

2.2. Normalization

The range of each weather element differed significantly (see Table 2), so the test results might be dominated by the values of a few weather elements. For that reason, we preprocessed the weather data using a normalization method. We calculated the upper bound and lower bound of each weather element from the original training set and mapped them to 1 and 0, respectively. Equation (1) shows the normalization used:

x' = (x − x_min) / (x_max − x_min), (1)

where x is the value of a weather element and x_min and x_max are its lower and upper bounds in the original training set. The validation set and the test set were normalized in accordance with the ranges of the original training set. Precipitation sensing in Table 2 indicates whether or not it rains.
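A small sketch of this min-max scheme, fitting the bounds on the training set only (function names are ours); the validation and test sets are passed through normalize with the bounds learned from the training set, exactly as described above.

import numpy as np

def fit_bounds(train: np.ndarray):
    # Per-element lower and upper bounds from the original training set.
    return train.min(axis=0), train.max(axis=0)

def normalize(x: np.ndarray, lo: np.ndarray, hi: np.ndarray) -> np.ndarray:
    # Map each element into [0, 1] using the training-set bounds;
    # validation/test values may fall slightly outside this range.
    return (x - lo) / np.where(hi > lo, hi - lo, 1.0)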

2.3. Sampling

Let n be the number of heavy rainfall occurrences in the training set. We randomly chose n cases from among the no-heavy-rain cases in the training set. Table 3 shows the proportion of heavy-rain to no-heavy-rain cases for each year. On account of the results in Table 3, we preprocessed our data using this method, called undersampling. We adjusted the proportion of heavy-rain cases to no-heavy-rain cases to be one, as shown in Figure 4 and Pseudocode 1.

// A: set of heavy-rain cases in the training set
// B: set of no-heavy-rain cases in the training set
// S: set of no-heavy-rain cases sampled from B, that is, S ⊆ B
// T: undersampled training set
l ← the number of heavy-rain cases, that is, |A|;
initialize S to be empty;
while (l > 0)
  randomly choose one value from B;
  if the value is not in S, then
    add the value to S;
    l ← l − 1;
  end if
end while
T ← A ∪ S;
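Pseudocode 1 translates directly into a few lines of Python; this sketch samples without replacement, matching the membership test in the loop above.

import random

def undersample(heavy, no_heavy, seed=0):
    # Return a balanced training set T = A ∪ S: all heavy-rain cases (A)
    # plus an equal-sized random sample S of no-heavy-rain cases from B.
    rng = random.Random(seed)
    sampled = rng.sample(no_heavy, k=len(heavy))  # without replacement
    return list(heavy) + sampled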

Table 4 shows the equitable threat score (ETS) for prediction after 3 hours and the effect of undersampling [22] and normalization for 3 randomly chosen stations. The tests without undersampling showed a low ETS and required far too much computation time: k-NN took 3,721 minutes and k-VNN took 3,940 minutes (see Appendix B), SVM with polynomial kernel raised a "reached max number of iterations" error (see Appendix C), and the hit and false alarm counts used by ETS were zero. In tests with undersampling, the computation took around 329 seconds in k-NN, 349 seconds in k-VNN, and 506 seconds in SVM with polynomial kernel. The test results with normalization were about 10 times higher than those without normalization.

2.4. Genetic-Algorithm-Based Feature Selection

Pseudocode 2 shows the pseudocode of a typical genetic algorithm [23]. If we let n be the number of solutions in the population, we create n initial solutions at random. The evolution starts from this population of completely random individuals, and the fitness of the whole population is determined. Each generation consists of several operations, such as selection, crossover, mutation, and replacement. Some individuals in the current population are replaced with new individuals to form a new population. Finally, this generational process is repeated until a termination condition has been reached. In a typical GA, the total number of individuals in a population and the number of reproduced individuals per generation are fixed at n and k, respectively. The proportion of individuals copied to the new generation, defined as the ratio k/n of the number of new individuals to the size of the parent population, is called the "generation gap" [24]. If the gap is close to 1/n, that is, only one or two individuals are replaced per generation, the GA is called a steady-state GA.

Create an initial population of size n;
repeat
  for i ← 1 to k
    choose parents p1 and p2 from the population;
    o_i ← crossover(p1, p2);
    o_i ← mutation(o_i);
  end for
  replace(population, [o_1, o_2, …, o_k]);
until (stopping condition);
return the best solution;

We selected important features using a wrapper method, which uses the induction algorithm itself to estimate the value of a given feature subset. The selected feature subset is the best individual among the results of the experiments on the validation set. The experimental results on the test set with the selected features showed better performance than those using all features.

The steps of the GA used are described in Box 1. All steps are iterated until the stopping condition (a fixed number of generations) is satisfied. Figure 5 shows the flow diagram of our steady-state GA.
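For concreteness, the following is a minimal sketch of such a wrapper in Python with scikit-learn. The one-point crossover, bit-flip mutation, worst-replacement policy, and the use of validation accuracy (rather than ETS) as the fitness are illustrative simplifications, not the exact configuration of Box 1 and Table 5.

import random
import numpy as np
from sklearn.svm import SVC

def fitness(mask, Xtr, ytr, Xva, yva):
    # Wrapper fitness: validation score of an SVM trained on the subset.
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 0.0
    clf = SVC(kernel="poly", degree=3).fit(Xtr[:, cols], ytr)
    return clf.score(Xva[:, cols], yva)

def steady_state_ga(Xtr, ytr, Xva, yva, n_feat=72, pop=20, gens=100, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop)]
    scores = [fitness(m, Xtr, ytr, Xva, yva) for m in population]
    for _ in range(gens):
        p1, p2 = rng.sample(range(pop), 2)               # parent selection
        cut = rng.randrange(1, n_feat)                   # one-point crossover
        child = population[p1][:cut] + population[p2][cut:]
        child[rng.randrange(n_feat)] ^= 1                # bit-flip mutation
        worst = min(range(pop), key=scores.__getitem__)  # steady-state replace
        s = fitness(child, Xtr, ytr, Xva, yva)
        if s > scores[worst]:
            population[worst], scores[worst] = child, s
    best = max(range(pop), key=scores.__getitem__)
    return population[best], scores[best]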

2.5. Differential-Evolution-Based Feature Selection

Khushaba et al. [25, 26] proposed a differential-evolution-based feature selection (DEFS) technique, shown schematically in Figure 6. The first step in the algorithm is to generate new population vectors from the original population. A new mutant vector is formed by first selecting two random vectors, then performing a weighted difference, and adding the result to a third random (base) vector. The mutant vector is then crossed with the original vector that occupies that position in the original matrix. The result of this operation is called a trial vector. The corresponding position in the new population will contain either the trial vector (or its corrected version) or the original target vector, depending on which of them achieved a higher fitness (classification accuracy).

Because a real-number optimizer is being used, nothing prevents two dimensions from settling at the same feature coordinates. To overcome this problem, they proposed employing feature distribution factors to replace duplicated features, utilizing a roulette wheel weighting scheme. In this scheme, a cost weighting is implemented in which the probability of each individual feature is calculated from the distribution factors associated with that feature. The distribution factor of feature i is given by (2), where a and b are constants and ε is a small factor to avoid division by zero. PD_i is the positive distribution factor, computed from the subsets that achieved an accuracy higher than the average accuracy of all subsets; ND_i is the negative distribution factor, computed from the subsets that achieved an accuracy lower than the average. This is shown schematically in Figure 7, with the light gray region containing the elements achieving less error than the average error value and the dark gray region containing the elements achieving higher error rates than the average. The rationale behind (2) is to replace the replicated parts of the trial vectors according to two factors: the first factor, PD_i, indicates the degree to which feature i contributes to forming good subsets, while the second term in (2) favors exploration, being close to 1 if the overall usage of a specific feature is very low.
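The vector-generation step described above can be sketched as follows. The DEFS duplicate-repair stage (the distribution-factor roulette wheel of (2)) and the fitness-based selection between trial and target vectors are omitted here, and the F and CR values are illustrative.

import numpy as np

def de_trial_vectors(pop: np.ndarray, F: float = 0.5, CR: float = 0.9,
                     seed: int = 0) -> np.ndarray:
    # pop: real-valued population of shape (NP, D). For each target i,
    # mutant = base + F * (r1 - r2), then binomial crossover with target.
    rng = np.random.default_rng(seed)
    NP, D = pop.shape
    trials = np.empty_like(pop)
    for i in range(NP):
        others = [j for j in range(NP) if j != i]
        r1, r2, base = rng.choice(others, size=3, replace=False)
        mutant = pop[base] + F * (pop[r1] - pop[r2])
        cross = rng.random(D) < CR
        cross[rng.integers(D)] = True  # guarantee at least one mutant gene
        trials[i] = np.where(cross, mutant, pop[i])
    return trials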

3. Experimental Results

We preprocessed the original weather data; several weather elements were added or removed, as shown in Table 1. We then undersampled and normalized the modified weather data. Each hourly record of the data consists of twelve weather elements, and the representation was made up of the latest six hourly records, that is, 72 features, as shown in Figure 3. We extracted a feature subset using the validation set and used that subset in the experiments with the test set.

The observation area covers 408 automatic weather stations in the southern part of the Korean Peninsula, and the prediction time ranges from one hour to six hours. We adopted GA and DE among the evolutionary algorithms, and SVM, k-VNN, and k-NN were used as discriminant functions. Table 5 shows the parameters of the steady-state GA and of DE. LibSVM [27] is adopted as the SVM library; we set the SVM type to C_SVC, which performs regularized support vector classification, and the kernel functions used are polynomial, linear, and precomputed. We set k to be 3 in our experiments.
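For reference, an equivalent configuration can be written with scikit-learn's SVC, which is itself built on LibSVM; the C value below is an illustrative placeholder, not the value from Table 5.

from sklearn.svm import SVC

# C-support vector classification (LibSVM's C_SVC) with the kernels we compare.
svm_poly   = SVC(kernel="poly", degree=3, C=1.0)   # polynomial kernel
svm_linear = SVC(kernel="linear", C=1.0)           # linear kernel
# kernel="precomputed" expects an (n, n) Gram matrix in place of raw features.
svm_gram   = SVC(kernel="precomputed", C=1.0)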

In South Korea, a heavy-rain advisory is issued when precipitation during six hours is higher than 70 mm or precipitation during 12 hours is higher than 110 mm. A heavy-rain warning is issued when precipitation during six hours is higher than 110 mm or precipitation during 12 hours is higher than 180 mm. We preprocessed the weather data using this criterion. To select the main features, we adopted a wrapper method, which, unlike a filter method, uses the classifier itself in feature evaluation.
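This criterion reduces to a simple labeling rule; a minimal sketch, with precipitation amounts in mm:

def heavy_rain_advisory(precip_6h: float, precip_12h: float) -> bool:
    # KMA advisory: > 70 mm in 6 hours or > 110 mm in 12 hours.
    return precip_6h > 70.0 or precip_12h > 110.0

def heavy_rain_warning(precip_6h: float, precip_12h: float) -> bool:
    # KMA warning: > 110 mm in 6 hours or > 180 mm in 12 hours.
    return precip_6h > 110.0 or precip_12h > 180.0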

An automatic weather station (AWS) [28] is an automated version of the traditional weather station, deployed either to save human labor or to enable measurements from remote areas. An automatic weather station typically consists of a weather-proof enclosure containing the data logger, a rechargeable battery, telemetry (optional), and the meteorological sensors, with an attached solar panel or wind turbine, mounted upon a mast. The specific configuration may vary according to the purpose of the system. In Table 6, Fc and Obs are abbreviations for forecast and observed, respectively. The equitable threat score (ETS), computed from the counts in Table 6, is the measure used for evaluating precipitation forecast skill:

ETS = (H − H_random) / (H + M + F − H_random), where H_random = (H + M)(H + F) / N,

H, M, and F denote the numbers of hits, misses, and false alarms, N is the total number of cases, and H_random is the number of hits expected by chance.
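In code, ETS follows directly from the contingency counts of Table 6:

def equitable_threat_score(hit: int, miss: int, false_alarm: int,
                           total: int) -> float:
    # total = hit + miss + false_alarm + correct negatives.
    hit_random = (hit + miss) * (hit + false_alarm) / total
    denom = hit + miss + false_alarm - hit_random
    return (hit - hit_random) / denom if denom else 0.0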

These experiments were conducted using LibSVM [27] on an Intel Core 2 Quad 3.0 GHz PC. Each run of GA took about 201 seconds in the SVM test with normalization and about 202 seconds without; about 126 seconds in the k-NN test with normalization and about 171 seconds without; and about 135 seconds in the k-VNN test with normalization and about 185 seconds without.

Each run of DE took about 6 seconds in the SVM test with normalization and about 5 seconds without; about 5 seconds in the k-NN test with normalization and about 4 seconds without; and about 5 seconds in the k-VNN test with normalization and about 4 seconds without.

A heavy-rain event meeting the criterion of heavy rainfall occupies a consecutive time interval with a beginning time and an end time. Forecasting the coming event means discerning whether or not heavy rain occurs at the beginning time; forecasting the whole process means discerning, for each hour from the beginning time to the end time, whether or not heavy rain occurs. We defined CE and WP to be forecasting the coming event and the whole process of heavy rainfall, respectively.

Table 7 shows the experimental results for GA and DE. Overall, GA was about 1.42 and 1.49 times better than DE in CE and WP predictions, respectively. In the DE experiments, SVM and k-VNN were about 2.11 and 1.10 times better than k-NN in CE prediction, respectively, and about 2.48 and 1.08 times better than k-NN in WP prediction, respectively. In the GA experiments, SVM with polynomial kernel showed better performance on average than SVM with linear or precomputed kernel. SVM with polynomial kernel and k-VNN were about 2.62 and 2.39 times better than k-NN in CE prediction, respectively, and about 2.01 and 1.49 times better than k-NN in WP prediction, respectively. As the prediction time grows longer, ETS declines steadily. SVM with polynomial kernel shows the best ETS among the GA test results. Figure 8 visually compares the CE and WP results in the GA experiments.

Consequently, SVM showed the highest performance among our experiments. The comparison of k-VNN with k-NN showed that weighting by the degree of feature-class correlation had a significant effect on the test results. Tables 8, 9, 10, and 11 show detailed SVM (with polynomial kernel) test results for GA and DE.

We selected the important features using a wrapper method, which uses the induction algorithm to estimate the value of a given subset. All features consist of weather elements accumulated over six hours, as shown in Figure 3. The selected feature subset is the best individual among the experimental results on the validation set. Figure 9 shows the selection frequency of the features for prediction horizons of one to six hours. The test results using the selected features were higher than those using all features. Twenty features were derived from the statistical analysis, which has a 95 percent confidence interval. The main seven features selected were used evenly across the prediction hours; these features were precipitation sensing and accumulated precipitation for 24 hours.

We compared the heavy rainfall prediction test results of GA and DE, as shown in Table 7; GA was significantly better than DE. Figure 10 shows precipitation maps for the GA SVM test results with normalization and undersampling, from one to six hours. Higher ETS values are depicted in darker blue on the map. The numbers of automatic weather stations by prediction hour are 105, 205, 231, 245, 223, and 182, in order from one to six hours. The numbers differ by prediction hour for the following reasons. First, we undersampled the weather data by adjusting the sample size of no-heavy-rain cases to equal that of heavy-rain cases in the training set, as shown in Section 2.3. Second, we excluded stations whose training sets contained fewer than three records. Third, we excluded stations whose hit and false alarm counts were 0 in the validation results. Finally, we excluded stations whose hit, false alarm, and miss counts were all 0 in the test results.

The weather data collected from the automatic weather stations during the recent four years had a lot of missing and erroneous data. Furthermore, our test required at least three valid records in the training set. For these reasons, the number of usable automatic weather stations was lowest for the prediction after one hour and increased as the prediction time became longer.

4. Conclusion

In this paper, we addressed the difficulty, necessity, and significance of very short-term heavy rainfall forecasting. We used various machine learning techniques, such as SVM, k-NN, and k-VNN based on GA and DE, to forecast heavy rainfall from one to six hours ahead. The results of GA were significantly better than those of DE. Among the classifiers in our GA experiments, SVM with polynomial kernel showed the best results on average. A validation set was used to select the important features, and the selected features were used to predict very short-term heavy rainfall. We derived 20 features from the statistical analysis, which has a 95 percent confidence interval. The main features selected were precipitation sensing and accumulated precipitation for 24 hours.

In future work, we will preprocess the weather data with various methods, such as representation learning, cyclic loess, contrast, and quantile normalization algorithms. We will also apply other machine learning techniques, such as statistical relational learning, multilinear subspace learning, and association rule learning. As more appropriate parameters are applied to the evolutionary algorithms or machine learning techniques, we expect to get better results. We have validated our algorithms with AWS data; however, it would also be interesting to examine the performance with, for example, satellite data.

Appendices

A. Spatial and Temporal Distribution of Heavy Rainfall over South Korea

We calculated the rainfall duration meeting the criterion of heavy rainfall at each automatic weather station for the period from 2007 to 2010. We divided the rainfall duration by 100 and depicted the result on the map. Figure 11 shows the distribution of heavy rainfall for all seasons combined, and Figure 12 shows the distribution by season. Most heavy rainfall is concentrated in summer, with a wide regional precipitation range, and its frequency differs considerably from region to region.

B. k-Nearest Neighbors Classifier

In pattern recognition, the k-nearest neighbors algorithm (k-NN) [29] is a method for classifying objects based on the closest training examples in the feature space. k-NN is a type of instance-based learning, or lazy learning, in which the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms: an object is classified by a majority vote of its neighbors, being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). The k-NN classifier is commonly based on the Euclidean distance between a test sample and the specified training samples.

Golub et al. [30] developed a procedure that uses a fixed subset of informative genes and makes a prediction based on the expression level of these genes in a new sample. In their class predictor, each informative gene casts a weighted vote for one of the classes, with the magnitude of each vote dependent on the expression level in the new sample and the degree of that gene's correlation with the class distinction. We made a variant k-nearest neighbors algorithm (k-VNN) in which the degree of each feature's correlation with the class is applied to the majority vote of the neighbors. Box 2 shows the equation calculating the correlation between a feature g and a class c. In Box 2, g means a feature (i.e., a weather element) and c means a class (i.e., heavy-rain or no-heavy-rain). The test results of k-VNN were better than those of k-NN. We set k to be 3 in our experiments because the classifier is expected to show low performance if k is just 1 and to take a long computing time when k is 5 or more.
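A sketch of one plausible reading of k-VNN, with Golub et al.'s signal-to-noise correlation used to weight the features before the neighbor vote; since the details of Box 2 are not reproduced here, the weighting below is an illustrative stand-in rather than our exact implementation.

import numpy as np

def signal_to_noise(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    # Golub et al.'s correlation P(g, c) = (mu1 - mu2) / (sigma1 + sigma2)
    # between each feature g and the binary class c (1: heavy-rain).
    m1, m0 = X[y == 1].mean(axis=0), X[y == 0].mean(axis=0)
    s1, s0 = X[y == 1].std(axis=0), X[y == 0].std(axis=0)
    return (m1 - m0) / (s1 + s0 + 1e-12)

def kvnn_predict(Xtr: np.ndarray, ytr: np.ndarray, x: np.ndarray,
                 k: int = 3) -> int:
    # Correlation-weighted Euclidean distance, then a majority vote
    # of the k nearest training samples.
    w = np.abs(signal_to_noise(Xtr, ytr))
    d = np.linalg.norm((Xtr - x) * w, axis=1)
    votes = ytr[np.argsort(d)[:k]]
    return int(votes.sum() * 2 >= k)  # ties go to heavy-rain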

C. Support Vector Machine

Support vector machines (SVMs) [32] are a set of related supervised learning methods that analyze data and recognize patterns and are used for classification and regression analysis. The standard SVM takes a set of input data and predicts, for each given input, which of two possible classes the input is a member of, making the SVM a nonprobabilistic binary linear classifier. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other. Intuitively, an SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.

D. Evolutionary Computation

A genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution, and this heuristic is routinely used to generate useful solutions to optimization and search problems [33]. In the process of a typical genetic algorithm, the evolution starts from the population of completely random individuals, and the fitness of the whole population is determined. Each generation consists of several operations, such as selection, crossover, mutation, and replacement. Some individuals in the current population are replaced with new individuals to form a new population. Finally, this generational process is repeated, until a termination condition has been reached.

Differential evolution (DE) is an evolutionary (direct-search) algorithm that has mainly been used to solve optimization problems. DE shares similarities with traditional evolutionary algorithms. However, it does not use binary encoding as a simple genetic algorithm does, and it does not use a probability density function to self-adapt its parameters as an evolution strategy does. Instead, DE performs mutation based on the distribution of the solutions in the current population. In this way, search directions and possible step sizes depend on the location of the individuals selected to calculate the mutation values [34].

E. Differences between Adopted Methods

In applied mathematics and theoretical computer science, combinatorial optimization consists of finding an optimal object from a finite set of objects. In many such problems, exhaustive search is not feasible. Combinatorial optimization operates on the domain of optimization problems in which the set of feasible solutions is discrete or can be reduced to a discrete one, and in which the goal is to find the best solution [33].

Feature selection is the problem of choosing a subset among all features, and it is a kind of combinatorial optimization. Genetic algorithms (GAs) and differential evolution (DE) use random elements within the search, and they are typically used to solve combinatorial optimization problems such as feature selection, as in this paper.

Machine learning techniques include a number of statistical methods for handling classification and regression. Machine learning mainly focuses on prediction based on known properties learned from the training data [33]. It is not easy to use general machine learning techniques for feature selection; in this paper, machine learning techniques were used for classification. GA and DE could be used for regression, but they are weak at handling regression because they take longer computing time than dedicated regression algorithms.

F. Detailed Statistics of Experimental Results

Tables 8–11 show SVM (with polynomial kernel) test results for GA and DE. As laid out in the contingency table (Table 6), the test results report ETS and other scores. We defined CE and WP to be forecasting the coming event and the whole process of heavy rainfall, respectively. The test results include the number of automatic weather stations used for each prediction hour; this number is the same across experiments for a given prediction hour. As a result, GA was considerably superior to DE.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

A preliminary version of this paper appeared in the Proceedings of the International Conference on Convergence and Hybrid Information Technology, pp. 312–322, 2012. The authors would like to thank Mr. Seung-Hyun Moon for his valuable suggestions in improving this paper. The present research has been conducted by the Research Grant of Kwangwoon University in 2014. This work was supported by the Advanced Research on Meteorological Sciences, through the National Institute of Meteorological Research of Korea, in 2013 (NIMR-2012-B-1).