Research Article | Open Access
Shuang Wei, Henry Leung, "Compromise Rank Genetic Programming for Automated Nonlinear Design of Disaster Management", Mathematical Problems in Engineering, vol. 2015, Article ID 873794, 14 pages, 2015. https://doi.org/10.1155/2015/873794
Compromise Rank Genetic Programming for Automated Nonlinear Design of Disaster Management
This paper presents a novel multiobjective evolutionary algorithm, called compromise rank genetic programming (CRGP), to automate nonlinear system design (NSD) for disaster management. The NSD problem is formulated as a multiobjective optimization problem (MOP) in which model performance and model structure are optimized simultaneously. CRGP combines decision making with the optimization process to obtain the final global solution in a single run. It adopts a new rank approach that incorporates subjective information to guide the search, ranking individuals according to the compromise distance between their mapping vectors in the objective space. We prove that the proposed approach converges to the global optimum under certain constraints. To illustrate the practicality of CRGP, it is applied to a postearthquake reconstruction management problem. Experimental results show that CRGP is effective in exploring unknown nonlinear systems in large datasets, which helps support postearthquake renewal with high accuracy and efficiency. Compared with other related methods for the disaster management problem, the proposed method obtains a more satisfactory model structure.
Natural disasters such as earthquakes, floods, and typhoons have occurred more frequently in recent years, and every year many of them cause extensive infrastructure damage, heavy casualties, and financial loss. For example, the Sichuan earthquake left at least 5 million people without housing, and the government had to spend billions over the following years to rebuild the ravaged areas. To limit the economic and psychological damage to people and society, disaster management is a growing need for many governments. An important and complex task of disaster management is to devise an efficient reconstruction strategy that rescues victims in time and rebuilds the ravaged areas efficiently with limited resources and financial support. Previously, qualitative analyses addressed particular aspects of the reconstruction strategy, such as conceptual decision support for disaster mitigation [2, 3], rescue planning for telecom power, and optimized resource allocation strategies. However, few mathematical models for disaster management exist in the literature, and modeling the reconstruction strategy remains an important open research topic.
From the application point of view, it has been pointed out that prediction models for an “in-advance reconstruction strategy” are needed to reduce unavoidable delays in the recovery process. In addition, it has been suggested that the speed and quality of the renewal process interact with the assignment of limited resources (such as experts, medical teams, and financial support). These underlying relationships can be modeled for prediction from the “Big Data” of disaster management. Therefore, designing prediction models that help make an efficient reconstruction strategy in advance is an urgent problem in disaster management.
Since nonlinear models are widely used for empirical modeling of complex processes, such as industrial control systems [8, 9], biomedical data [10, 11], and chemical process systems, the underlying models for disaster management can be cast as a problem of nonlinear system design (NSD). This NSD problem involves determining the structure and estimating the parameters of the nonlinear models embedded in disaster management. Traditionally, solving NSD problems focuses on parameter estimation, while the structures are assumed to be known or are approximated by universal approximators such as neural networks. However, little a priori information is actually available about the nonlinear models for disaster management, and disaster datasets are usually incomplete and inconsistent. In this case, the approximation approach is limited because it often uses a very complex model to describe a possibly simple underlying function. The unnecessary parameters then make parameter estimation harder and contradict the basic principle of parsimony in designing a nonlinear system. Moreover, the decision variables that are significant for expressing the system behavior are generally difficult to select from a large number of features. In addition, the generalization performance of an approximated model is usually not guaranteed. Therefore, it is a significant challenge to determine the exact structure of each nonlinear model for disaster management and to design these models with sparse data and little a priori information.
Genetic programming (GP) is a powerful tool for discovering nonlinear model structure, using a tree chromosome representation and a crossover operator [15, 16]. It searches the functional space to determine the optimal structure of a nonlinear system. Generally, GP coevolves model structure and parameters by optimizing a single objective, such as prediction error. However, this approach does not always converge to a globally optimal structure, because the functional space is usually too large to search. Spurious terms and dependent variables are observed to be the main causes of this problem: their evolution leads to rapid growth of tree sizes in GP, which tends to make the algorithm stagnate and produces the phenomenon known as bloat. The simplest and most common way to prevent bloat in GP is to limit the maximum tree depth and the maximum number of nodes of all individuals. However, for our problem with little a priori information, it is hard to choose an appropriate depth and number of nodes. Other approaches control the growth of offspring trees or target redundant nodes by improving the crossover and selection operators [21, 22]; our proposed approach mainly follows this idea to resolve bloat. A fitness penalty (parsimony pressure) biased against individuals with more complex structures is another good way to control bloat. Luke showed that all of these bloat control methods can perform well without reducing the ability of GP. Because the NSD problem for disaster management needs a practical optimal solution and an overly complex model is hard to comprehend, model complexity should be kept low. Thus, model complexity is optimized as an objective, which also acts as a fitness penalty to control bloat.
Since a poor model structure may lead to poor parameter estimation, model accuracy and complexity should be optimized simultaneously in solving the NSD problem. Because maximizing model accuracy does not coincide with minimizing model complexity, our NSD problem for disaster management is modeled as a multiobjective optimization problem (MOP) and solved by a multiobjective evolutionary algorithm (MOEA) combined with GP.
MOEAs mainly include the aggregating method and the Pareto-based approach. The aggregating method requires predefined weights or other a priori information about the objectives to convert the MOP into a single-objective optimization problem. This approach is easy to implement, but improper a priori information can lead to poor optimization results. The Pareto-based approach obtains an optimal solution in two steps: a Pareto optimality process and a multicriteria decision-making process. The first step, which obtains the Pareto-optimal set, is usually time consuming because the set contains many redundant solutions. In the second step, the final solution is selected from the Pareto-optimal set using goal or preference information from decision makers. Although several algorithms have been proposed to improve the validity of the Pareto-optimal set [27–29], little effort has been devoted to incorporating subjective information into Pareto optimality to reduce the search space and improve efficiency. Since NSD for disaster management requires a single solution instead of the whole Pareto-optimal set, incorporating subjective information is important for discovering the exact nonlinear model of each underlying relation in disaster management. We found that the multiple objectives of NSD for disaster management have different priorities for ranking in different situations, and that this ranking strategy can be adjusted according to subjective probability theory. We therefore propose a novel multiobjective GP algorithm, called compromise rank genetic programming (CRGP), which addresses NSD for disaster management by combining subjective information with Pareto optimality.
The proposed CRGP aims to uncover the exact nonlinear structure and estimate the parameters of the models embedded in incomplete disaster datasets, and it draws on subjective probability theory to reduce the computational complexity. CRGP requires no choice of goals or weights, which eliminates errors caused by poorly chosen weights. It uses the relative distances between chromosomes in the objective space to guide the search in the Pareto optimality process and obtains the final compromise solution in a single run. This characteristic reduces the probability that model structures composed of redundant terms and unimportant features evolve, which helps resolve bloat in GP evolution and improves the convergence rate. To evaluate the effectiveness and practicality of CRGP, the proposed approach is applied to a practical disaster management problem: postearthquake reconstruction.
The paper is organized as follows. Section 2 presents the formulation of NSD as an MOP. Section 3 describes the proposed CRGP algorithm. Convergence analysis of CRGP is given in Section 4. Section 5 reports the application of CRGP to a disaster management problem using real postearthquake reconstruction data, where the method is compared with the traditional single-objective GP method, an aggregating MOGP method, and a Pareto-based MOGP approach. Concluding remarks are given in Section 6.
2. Problem Formulation
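Assume an unknown nonlinear system given by
$$\mathbf{y}=f(\mathbf{x},\boldsymbol{\theta}),\tag{1}$$
where $f(\cdot)$ is the unknown nonlinear function, $\mathbf{y}$ is the output vector (with noise) that we can observe, $\mathbf{x}$ is the unknown input vector, and $\boldsymbol{\theta}$ is the unknown system parameter vector. The task of NSD involves determining three components: an appropriate input set $\tilde{\mathbf{x}}$ with minimal redundancy selected from $\mathbf{x}$, the nonlinear function $f$, and the system parameters $\boldsymbol{\theta}$.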
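Traditional nonlinear system identification usually assumes $\tilde{\mathbf{x}}$ and $f$ to be known a priori, and the main task is to determine $\boldsymbol{\theta}$ by some parameter estimation technique. One popular approach is the minimum mean square error (MSE) method; that is,
$$\hat{\boldsymbol{\theta}}=\arg\min_{\boldsymbol{\theta}\in\mathbb{R}^{p}}\ \frac{1}{N}\sum_{k=1}^{N}\bigl(y(k)-\hat{y}(k)\bigr)^{2},\tag{2}$$
where $\hat{y}(k)$ is the estimated output, $N$ is the number of samples, and $\boldsymbol{\theta}$ ranges over the real vectors $\mathbb{R}^{p}$, $p$ being the number of parameters. In practice, a priori information about $\tilde{\mathbf{x}}$ and $f$ is usually not available. A general formulation for NSD based on minimizing the MSE can then be expressed as
$$\min_{f\in\mathcal{F},\ \tilde{\mathbf{x}}\in\mathcal{X},\ \boldsymbol{\theta}\in\mathbb{R}^{p}}\ \mathrm{MSE}(\mathbf{y},\hat{\mathbf{y}}),\tag{3}$$
where $\mathcal{F}$ is the functional space that contains all possible nonlinear functions, $\mathcal{X}$ is the input sequence space, and $\mathrm{MSE}(\mathbf{y},\hat{\mathbf{y}})$ is the MSE between the true and estimated outputs. Since input variables might be correlated, different combinations of input variables might result in similar MSE values. Therefore, the order of the appropriate input set (the number of selected variables) should also be minimized; that is,
$$\min_{\tilde{\mathbf{x}}\in\mathcal{X}}\ \dim(\tilde{\mathbf{x}}).\tag{4}$$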
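In addition, different model structures can have the same MSE if they contain redundant terms. Thus, the nonlinear function $f$ should be in its most parsimonious form, and the complexity of the model structure is considered as another measure of $f$. Two factors are used to measure structural complexity: the total number of terms in $f$, which penalizes redundancy, and the ratio of the number of nonlinear terms to the total number of terms. That is,
$$\min_{f\in\mathcal{F}}\ C\!\left(n_t,\ \frac{n_{nl}}{n_t}\right),\tag{5}$$
where $n_t$ is the number of terms in $f$, $n_{nl}$ is the number of nonlinear terms, and $C(\cdot)$ is the structural complexity measure built from these two factors. Since the three optimization problems (3), (4), and (5) are correlated, they should be solved simultaneously to obtain a consistent solution. For example, the optimal input set is determined by minimizing the MSE and $\dim(\tilde{\mathbf{x}})$ together; at the same time, an improper input set would cause divergence in optimizing $f$. In this paper, we propose formulating NSD as the following MOP:
$$\min_{f\in\mathcal{F},\ \tilde{\mathbf{x}}\in\mathcal{X},\ \boldsymbol{\theta}\in\mathbb{R}^{p}}\ \bigl(f_1,\ f_2,\ f_3\bigr)=\Bigl(\mathrm{MSE}(\mathbf{y},\hat{\mathbf{y}}),\ C(f),\ \dim(\tilde{\mathbf{x}})\Bigr),$$
where $f_1$, $f_2$, and $f_3$ are the model error, the structural complexity, and the number of selected input variables, respectively.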
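As an illustration of the three objectives, the following Python sketch computes their ingredients for a polynomial model. It assumes a hypothetical representation in which a model is a list of monomial terms, each a tuple of input indices (so `(0, 1)` stands for $x_0x_1$), fits the parameters by least squares, and counts terms and selected inputs. Because the exact way $n_t$ and $n_{nl}/n_t$ are combined into the complexity measure is not specified here, the sketch returns both factors instead of a single $f_2$ value.

```python
import numpy as np

def design_matrix(terms, X):
    """One regressor column per monomial term.

    terms: list of tuples of input indices, e.g. (0, 1) stands for x0 * x1.
    X: array of shape (N, n_inputs).
    """
    return np.column_stack([np.prod(X[:, list(t)], axis=1) for t in terms])

def objective_ingredients(terms, X, y):
    """Return (MSE, n_t, n_nl / n_t, number of selected inputs) for one model."""
    P = design_matrix(terms, X)
    theta, *_ = np.linalg.lstsq(P, y, rcond=None)   # least-squares parameters
    mse = float(np.mean((y - P @ theta) ** 2))      # objective f1
    n_t = len(terms)                                # total number of terms
    n_nl = sum(1 for t in terms if len(t) > 1)      # terms of degree >= 2
    n_inputs = len({i for t in terms for i in t})   # objective f3: order of the input set
    return mse, n_t, n_nl / n_t, n_inputs

# Usage: the same synthetic data explained by a parsimonious and a redundant structure.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 0] * X[:, 1]
print(objective_ingredients([(0,), (0, 1)], X, y))
print(objective_ingredients([(0,), (0, 1), (2,)], X, y))
```

Both structures fit the synthetic data essentially exactly, but the second carries a redundant term and an extra input; this is the situation in which the MSE alone cannot separate candidate models.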
One popular approach to MOP is evolutionary multiobjective optimization (EMO). Traditional EMO approaches convert the MOP into a single-objective problem by weighting. Taking the weighted sum of the objectives, the single fitness function can be expressed as
$$F=\sum_{i=1}^{3}w_i f_i,$$
where $w_i$ are the weights and $f_i$ are the objectives of the MOP. The optimal solution is then obtained by ranking the individuals by this single objective value. This approach has two difficulties for the NSD problem of disaster management. First, the weights are hard to define a priori, and different weightings can result in very different solutions. Several runs with different weight combinations are therefore usually required to obtain various Pareto-optimal solutions, but they still cannot guarantee finding the true solution and they incur a higher computational cost. Second, our model (7) is usually nonconvex, because the epigraph of the function is usually not a convex set over all candidate structures of the unknown function $f$.
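To make the weight-sensitivity argument concrete, the toy Python sketch below scalarizes two hypothetical objective vectors (MSE, complexity, number of inputs) with two different weight settings. The vectors and weights are invented for illustration only; they are not results from this study.

```python
def weighted_fitness(objectives, weights):
    """Aggregate an objective vector into one scalar by a weighted sum (minimized)."""
    return sum(w * f for w, f in zip(weights, objectives))

def best_by_weights(candidates, weights):
    """Name of the candidate with the smallest weighted fitness."""
    return min(candidates, key=lambda name: weighted_fitness(candidates[name], weights))

# Hypothetical objective vectors: (MSE, structural complexity, number of inputs).
candidates = {
    "accurate_but_complex": (0.01, 8.0, 4),
    "simple_but_rough": (0.50, 2.0, 2),
}
print(best_by_weights(candidates, (100.0, 1.0, 1.0)))  # accurate_but_complex
print(best_by_weights(candidates, (1.0, 1.0, 1.0)))    # simple_but_rough
```

The winner flips with the weights even though the candidates are unchanged, which is why several runs with different weight combinations are usually needed.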
To overcome these difficulties, Pareto-based EMO approaches are usually considered. The objective space for Pareto-based EMO is usually more complex than that of single-objective optimization, and it is a challenge to evaluate individuals according to several conflicting objectives. Pareto-based EMO approaches are usually implemented in two steps: a Pareto optimality process and a multicriteria decision-making process. The first step ranks individuals by their degree of nondominance and obtains the Pareto-optimal set, which consists of the solutions that are nondominated with respect to all the objectives. In the second step, the final solution is selected from the Pareto-optimal set according to goal or preference information provided by decision makers. The definition of “dominance” is given below.
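Definition 1 (Pareto dominance [26, 32]). For a minimization problem, a vector $\mathbf{u}=(u_1,\dots,u_M)$ is said to dominate a vector $\mathbf{v}=(v_1,\dots,v_M)$ if and only if $\mathbf{u}$ is partially less than $\mathbf{v}$, that is,
$$\forall i\in\{1,\dots,M\}:\ u_i\le v_i\quad\wedge\quad\exists j\in\{1,\dots,M\}:\ u_j<v_j,$$
and vice versa for a maximization problem.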
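Definition 1 translates directly into a predicate. The Python sketch below assumes objective vectors are given as equal-length tuples of numbers, all to be minimized; the example vectors are hypothetical.

```python
def dominates(u, v):
    """True if u Pareto-dominates v for minimization (Definition 1)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of objectives")
    no_worse = all(a <= b for a, b in zip(u, v))
    strictly_better = any(a < b for a, b in zip(u, v))
    return no_worse and strictly_better

print(dominates((0.1, 3, 2), (0.2, 3, 2)))  # True: better MSE, equal otherwise
print(dominates((0.1, 5, 2), (0.2, 3, 2)))  # False: mutually nondominated
print(dominates((0.2, 3, 2), (0.1, 5, 2)))  # False
```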
Generally, the Pareto dominance rank method assumes that all objectives are weighted equally. However, this assumption does not hold for the NSD problem of disaster management, where the three objectives play different roles in determining the best final nonlinear system. In addition, the two steps of general Pareto-based methods spend much time computing many redundant Pareto-optimal solutions that our disaster management problem does not need. To address these issues, CRGP is proposed to solve the NSD MOP of disaster management.
3. Compromise Rank Genetic Programming
CRGP is designed to merge the decision-making process with the Pareto optimality process so that the final compromise solution is obtained in one process. The main difference between CRGP and general Pareto-based EMO methods lies in how individuals are ranked according to multiple conflicting objective functions.
3.1. Compromise Rank Approach
Assume that there are $M$ objectives $f_1,\dots,f_M$. These objective functions map individuals in the variable space to vectors in the objective space, and the individuals are ranked by evaluating their $M$-dimensional vectors in the objective space. Let $\mathbf{u}$ and $\mathbf{v}$ be two different $M$-dimensional vectors in the objective space, and let $u_i$ and $v_i$ ($i=1,\dots,M$) denote the $i$th objective values of $\mathbf{u}$ and $\mathbf{v}$, respectively. For the NSD MOP for disaster management there are three objectives, as in (7), so $M=3$, and these objectives are always positive. In this paper we only consider positive objective values; that is, $u_i>0$ and $v_i>0$ for all $i$.
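We define the relative distance from $\mathbf{u}$ to $\mathbf{v}$, relative to the beginning point $\mathbf{u}$, along the $i$th objective as $d_i(\mathbf{u},\mathbf{v})$; that is,
$$d_i(\mathbf{u},\mathbf{v})=\frac{v_i-u_i}{u_i},\qquad i=1,\dots,M.$$
The sign of $d_i(\mathbf{u},\mathbf{v})$ is determined by the relative magnitudes of $u_i$ and $v_i$, and its magnitude indicates the relative increase or decrease of the $i$th objective when $\mathbf{v}$ is compared with $\mathbf{u}$. For example, a negative $d_i(\mathbf{u},\mathbf{v})$ implies that $v_i$ is smaller than $u_i$; for minimization, $\mathbf{v}$ then performs better than $\mathbf{u}$ on the $i$th objective.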
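A minimal Python sketch of the relative distance, assuming the convention $d_i(\mathbf u,\mathbf v)=(v_i-u_i)/u_i$ (measured from the beginning point $\mathbf u$) and strictly positive objective values. The example vectors are hypothetical.

```python
def relative_distances(u, v):
    """d_i(u, v) = (v_i - u_i) / u_i for every objective i (assumed convention).

    A negative entry means v has the smaller i-th objective, i.e. v is better
    on that objective under minimization.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if any(x <= 0 for x in (*u, *v)):
        raise ValueError("all objective values must be positive")
    return [(vi - ui) / ui for ui, vi in zip(u, v)]

u = (0.5, 4.0, 2.0)
v = (0.25, 6.0, 2.0)
print(relative_distances(u, v))  # [-0.5, 0.5, 0.0]
print(relative_distances(v, u))  # [1.0, -0.3333333333333333, 0.0]
```

The asymmetry in the first component (-0.5 in one direction, 1.0 in the other) is inherent to relative distances, which is why the ranking rules examine both directions.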
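Let the term “rank” measure the performance of every individual: the smaller the rank, the better the solution. Comparing the relative distances of two objective vectors, we have the following situations:
(1) If $d_i(\mathbf{u},\mathbf{v})\le 0$ for all $i$ and $d_j(\mathbf{u},\mathbf{v})<0$ for some $j$, then for minimization $\mathbf{u}$ should be ranked higher (worse) than $\mathbf{v}$.
(2) If $d_i(\mathbf{u},\mathbf{v})\ge 0$ for all $i$ and $d_j(\mathbf{u},\mathbf{v})>0$ for some $j$, then for minimization $\mathbf{u}$ should be ranked lower (better) than $\mathbf{v}$.
(3) If $d_i(\mathbf{u},\mathbf{v})<0$ for some $i$ and $d_j(\mathbf{u},\mathbf{v})>0$ for some $j$, then $\mathbf{u}$ and $\mathbf{v}$ have the same rank under Pareto dominance sorting (see Definition 1). As shown later, unlike Pareto dominance sorting, the compromise rank method considers the relative differences across all objectives and assigns different ranks to $\mathbf{u}$ and $\mathbf{v}$ in this case.
(4) If $d_i(\mathbf{u},\mathbf{v})=0$ for all $i$, then $\mathbf{u}$ and $\mathbf{v}$ have the same rank.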
Here, we analyze the characteristics of the relative distances $d_i(\mathbf{u},\mathbf{v})$ in these situations for our MOP (7) and then propose a new rank rule for it. Clearly, when the relative distances fall into the first or second situation, the rank can be determined by the sum $\sum_i d_i(\mathbf{u},\mathbf{v})$, since all of its nonzero terms have the same sign. For the third situation, the relation between the absolute relative distance in the first objective and those in the other objectives falls into three cases: (a) the relative distance in the first objective dominates, (b) the relative distances in the other objectives dominate, and (c) the two balance. In case (a), the NSD MOP has solutions whose $f_2$ and $f_3$ values are close to each other but whose MSE values, that is, $f_1$ values, can be quite different. For example, two candidate model structures may have close $f_2$ and $f_3$ values while their MSE values differ substantially, one of them being 0.004. In case (a) the first objective effectively has a higher priority, so the ranks of $\mathbf{u}$ and $\mathbf{v}$ should agree with the sign of $\sum_i d_i(\mathbf{u},\mathbf{v})$. In case (b), the MSE values of the two models are similar and the difference between $f_1(\mathbf{u})$ and $f_1(\mathbf{v})$ can be ignored; the first objective then has a lower priority, and again the ranks agree with the sign of $\sum_i d_i(\mathbf{u},\mathbf{v})$. In case (c), $\mathbf{u}$ and $\mathbf{v}$ are considered to have the same rank, but this occurs rarely. Thus, the sum of the $d_i$ adaptively reflects the preference among the NSD objectives in different cases. The proposed approach exploits this property for the NSD MOP of disaster management; unlike other approaches, it does not need preference information a priori.
Therefore, the proposed approach defines the “compromise distance” of two vectors in an $M$-dimensional objective space as follows.
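Definition 2 (compromise distance). The compromise distance from $\mathbf{u}$ to $\mathbf{v}$ is defined as the sum of the relative distances from $\mathbf{u}$ to $\mathbf{v}$ over all objectives in the objective space. That is,
$$D(\mathbf{u},\mathbf{v})=\sum_{i=1}^{M}d_i(\mathbf{u},\mathbf{v})=\sum_{i=1}^{M}\frac{v_i-u_i}{u_i}.$$
The compromise rank approach ranks individuals by the compromise distances between their vectors in the objective space and uses these ranks to guide the Pareto optimality search process. Assuming that all objective values are positive, the rules for compromise distance ranking are as follows:
(1) If $D(\mathbf{u},\mathbf{v})>0$ and $D(\mathbf{v},\mathbf{u})<0$, then $\operatorname{rank}(\mathbf{u})<\operatorname{rank}(\mathbf{v})$.
(2) If $D(\mathbf{u},\mathbf{v})<0$ and $D(\mathbf{v},\mathbf{u})>0$, then $\operatorname{rank}(\mathbf{u})>\operatorname{rank}(\mathbf{v})$.
(3) If $D(\mathbf{u},\mathbf{v})>0$ and $D(\mathbf{v},\mathbf{u})>0$, then $\operatorname{rank}(\mathbf{u})<\operatorname{rank}(\mathbf{v})$ if $D(\mathbf{u},\mathbf{v})>D(\mathbf{v},\mathbf{u})$.
(4) If $D(\mathbf{u},\mathbf{v})=0$ and $D(\mathbf{v},\mathbf{u})=0$, then $\operatorname{rank}(\mathbf{u})=\operatorname{rank}(\mathbf{v})$.
Additionally, these rules are Pareto-compliant, as proved later in Theorem 3.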
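The pairwise rules of Definition 2 can be sketched in Python as follows. The sketch assumes the relative-distance convention $d_i(\mathbf u,\mathbf v)=(v_i-u_i)/u_i$, so that $D(\mathbf u,\mathbf v)>0$ means $\mathbf v$ is worse overall when measured from $\mathbf u$. The helper name `compromise_compare` is hypothetical, and boundary cases not covered by rules (1) to (4), such as one distance being exactly zero, are treated as ties.

```python
def compromise_distance(u, v):
    """D(u, v) = sum_i (v_i - u_i) / u_i, assuming all objective values are positive."""
    return sum((vi - ui) / ui for ui, vi in zip(u, v))

def compromise_compare(u, v):
    """-1 if u gets the smaller (better) rank, 1 if v does, 0 if tied or not covered."""
    d_uv = compromise_distance(u, v)
    d_vu = compromise_distance(v, u)
    if d_uv > 0 and d_vu < 0:      # rule (1)
        return -1
    if d_uv < 0 and d_vu > 0:      # rule (2)
        return 1
    if d_uv > 0 and d_vu > 0:      # rule (3), applied in both directions
        if d_uv > d_vu:
            return -1
        if d_vu > d_uv:
            return 1
    return 0                       # rule (4), plus boundary cases the rules leave open

# Two mutually nondominated models: u is far more accurate, v slightly simpler.
u = (0.004, 3.0, 2.0)   # hypothetical (MSE, complexity, inputs)
v = (0.040, 2.7, 2.0)
print(compromise_distance(u, v), compromise_distance(v, u))
print(compromise_compare(u, v))  # -1: u is ranked ahead of v
```

Pareto dominance alone would leave these two vectors in the same front; the compromise distance separates them because the relative gain in MSE far exceeds the relative loss in complexity.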
Figure 1 illustrates the ranking scheme based on Pareto dominance sorting and compromise rank approach, respectively. For the two vectors with rank 1 in Figure 1(a), their relative distance of the first objective is smaller than that of the second objective. Thus, the compromise rank approach will rank these two vectors in agreement with the order of the second objective as shown in Figure 1(b). Consider any vector with rank 1 and any vector with rank 2 in Figure 1(a); their relative distance of the first objective is much bigger than that of the second objective. Thus, the compromise rank approach will rank them with more levels as shown in Figure 1(b).
The implementation of the compromise rank approach is similar to the fast nondominated sorting approach. First, each individual in the population is compared pairwise with every other individual by compromise distance. According to the compromise rank rules, the rank relations among individuals are recorded: for each individual, the set of individuals ranked higher (worse) than it is stored, and the number of individuals ranked lower (better) than it is counted. Then, the individuals that no other individual outranks are assigned rank 1 and are considered the best solutions in this population. After that, according to the recorded rank relations, the remaining individuals in the population are assigned their corresponding ranks in order.
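The sorting procedure can be sketched along the lines of fast nondominated sorting, with the dominance test replaced by the compromise rank rules. This Python sketch is self-contained and again assumes $d_i(\mathbf u,\mathbf v)=(v_i-u_i)/u_i$; giving a shared final rank to individuals caught in a cyclic pairwise relation is our own assumption, since the text does not discuss that case.

```python
def compromise_distance(u, v):
    """D(u, v) = sum_i (v_i - u_i) / u_i for positive objective values."""
    return sum((vi - ui) / ui for ui, vi in zip(u, v))

def better(u, v):
    """True if the compromise rank rules place u ahead of v."""
    d_uv, d_vu = compromise_distance(u, v), compromise_distance(v, u)
    if d_uv > 0 and d_vu < 0:
        return True
    return d_uv > 0 and d_vu > 0 and d_uv > d_vu

def compromise_rank(objs):
    """Assign ranks 1, 2, ... in the style of fast nondominated sorting."""
    n = len(objs)
    ranked_worse = [[] for _ in range(n)]  # individuals that p outranks
    n_better = [0] * n                     # how many individuals outrank p
    for p in range(n):
        for q in range(n):
            if p != q and better(objs[p], objs[q]):
                ranked_worse[p].append(q)
                n_better[q] += 1
    ranks = [0] * n
    front = [p for p in range(n) if n_better[p] == 0]
    level = 1
    while front:
        next_front = []
        for p in front:
            ranks[p] = level
            for q in ranked_worse[p]:
                n_better[q] -= 1
                if n_better[q] == 0:
                    next_front.append(q)
        front = next_front
        level += 1
    # Assumption: individuals left unranked by a cyclic relation share one final rank.
    return [r if r > 0 else level for r in ranks]

print(compromise_rank([(0.04, 2.7, 2.0), (0.004, 3.0, 2.0), (0.5, 5.0, 3.0)]))  # [2, 1, 3]
```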
3.2. Main Loop of CRGP to Solve the NSD Problem
By incorporating compromise rank into GP, CRGP proceeds in three steps: initialization, evaluation of the objectives, and evolution of the individuals, as shown in Figure 2.
GP represents a nonlinear system structure as a tree, which forms an individual. All internal nodes are mathematical operators, and the leaf nodes are input variables. GP defines a function set that includes all possible mathematical operators and a terminal set that includes all possible input variables. For example, when a polynomial system is considered, the function set contains the operators needed to build polynomials (such as $+$ and $\times$), and the terminal set contains all the input variables. The tree structure is encoded as a string, as shown in Figure 3. To keep model structures consistent with semantic constraints and to restrict the search space, the contents of the function and terminal sets should be chosen appropriately.
The initial trees are generated by the ramped half-and-half method. The maximum tree depth is predefined. Half of the initial trees are full trees of the maximum depth, whose leaf nodes are chosen at random from the terminal set and whose other nodes are chosen from the function set. The other half have varying depths no greater than the maximum depth, and each of their nodes is chosen at random from either the terminal set or the function set. Such a tree may be nonsensical when an internal node is an input variable from the terminal set. Hence, every tree is checked by a model certification procedure before its fitness values are calculated. In this procedure, each internal node is checked to determine whether it is a mathematical operator or a symbolic variable. If an internal node is a symbolic variable, the subtree from this internal node down to the leaf nodes is deleted. An example is shown in Figure 4. This procedure is particularly important for calculating the complexity of the nonlinear model in the subsequent evaluation step.
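A minimal Python sketch of ramped half-and-half initialization followed by model certification. It assumes binary operators only, a hypothetical function set `{+, *}` and terminal set `{x0, x1, x2}`, and trees stored as `(label, children)` tuples. It reads the deletion step as removing everything below an internal variable node so that the variable becomes a leaf; Figure 4 may differ in detail.

```python
import random

FUNCTIONS = ["+", "*"]          # hypothetical function set for polynomial models
TERMINALS = ["x0", "x1", "x2"]  # hypothetical terminal set (input variables)

def full_tree(depth, rng):
    """Full method: operators at every internal node, variables at the leaves."""
    if depth == 0:
        return (rng.choice(TERMINALS), [])
    return (rng.choice(FUNCTIONS), [full_tree(depth - 1, rng), full_tree(depth - 1, rng)])

def grow_tree(depth, rng):
    """Grow variant: internal positions draw from F and T alike, so a variable
    may land on an internal node and the raw tree may be invalid."""
    if depth == 0:
        return (rng.choice(TERMINALS), [])
    label = rng.choice(FUNCTIONS + TERMINALS)
    return (label, [grow_tree(depth - 1, rng), grow_tree(depth - 1, rng)])

def certify(node):
    """Model certification: a variable on an internal node loses its subtree
    and becomes a leaf (our reading of the deletion step)."""
    label, children = node
    if label in TERMINALS:
        return (label, [])
    return (label, [certify(child) for child in children])

def is_valid(node):
    """Every internal node is an operator and every leaf is a variable."""
    label, children = node
    if not children:
        return label in TERMINALS
    return label in FUNCTIONS and all(is_valid(child) for child in children)

def ramped_half_and_half(pop_size, max_depth, seed=0):
    """Alternate full and grow trees while ramping the depth over 1..max_depth."""
    rng = random.Random(seed)
    population = []
    for k in range(pop_size):
        depth = 1 + (k // 2) % max_depth
        raw = full_tree(depth, rng) if k % 2 == 0 else grow_tree(depth, rng)
        population.append(certify(raw))
    return population

def to_string(node):
    label, children = node
    if not children:
        return label
    return "(" + f" {label} ".join(to_string(c) for c in children) + ")"

for tree in ramped_half_and_half(pop_size=4, max_depth=2):
    print(to_string(tree))
```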
3.2.2. Evaluation of Multiple Objectives
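After the individual trees of a population are generated, the performance of each tree is evaluated by calculating its fitness with respect to the multiple objectives. Note that evolutionary operations tend to make new trees grow larger and accumulate insignificant subtrees. Before calculating the objectives of each tree, we therefore adopt the orthogonal least squares (OLS) method to eliminate redundant subtrees that contribute little to model accuracy. For example, if a polynomial system is considered, an individual tree is divided into subtrees at the plus operators, and OLS calculates the error reduction ratio of each subtree. Let the input regression matrix be $\mathbf{P}=[\mathbf{p}_1,\dots,\mathbf{p}_m]$, where $\mathbf{p}_i$ denotes the function term expressed by the $i$th of the $m$ subtrees. The expected output vector $\mathbf{y}$ can be expressed as
$$\mathbf{y}=\sum_{i=1}^{m}\theta_i\mathbf{p}_i+\mathbf{e}=\mathbf{P}\boldsymbol{\theta}+\mathbf{e},$$
where $\theta_i$ is the parameter of the $i$th subtree and $\mathbf{e}$ is the error between the expected and real outputs. Since the model is linear in these unknown parameters, the least squares (LS) method is employed to estimate $\boldsymbol{\theta}$; that is,
$$\hat{\boldsymbol{\theta}}=\bigl(\mathbf{P}^{\mathsf T}\mathbf{P}\bigr)^{-1}\mathbf{P}^{\mathsf T}\mathbf{y}.$$
The first step of OLS is a QR-type decomposition of $\mathbf{P}$: $\mathbf{P}=\mathbf{W}\mathbf{A}$, where $\mathbf{W}=[\mathbf{w}_1,\dots,\mathbf{w}_m]$ has mutually orthogonal columns and $\mathbf{A}$ is an upper triangular matrix. Then $\mathbf{D}=\mathbf{W}^{\mathsf T}\mathbf{W}$ is a positive diagonal matrix, and the auxiliary parameter vector for the output is defined as
$$\mathbf{g}=\mathbf{D}^{-1}\mathbf{W}^{\mathsf T}\mathbf{y},\qquad\mathbf{A}\boldsymbol{\theta}=\mathbf{g},$$
where $\mathbf{g}=[g_1,\dots,g_m]^{\mathsf T}$ is the corresponding OLS solution vector. Thus, the mean square of the output can be expressed as
$$\frac{1}{N}\mathbf{y}^{\mathsf T}\mathbf{y}=\frac{1}{N}\sum_{i=1}^{m}g_i^{2}\,\mathbf{w}_i^{\mathsf T}\mathbf{w}_i+\frac{1}{N}\mathbf{e}^{\mathsf T}\mathbf{e},$$
where $N$ is the number of samples. Therefore, the error reduction ratio of each subtree is given by
$$[\mathrm{err}]_i=\frac{g_i^{2}\,\mathbf{w}_i^{\mathsf T}\mathbf{w}_i}{\mathbf{y}^{\mathsf T}\mathbf{y}},\qquad i=1,\dots,m.$$
Whether a subtree is eliminated is determined by comparing its error reduction ratio with a threshold: a subtree whose error reduction ratio is below the threshold is eliminated; otherwise, it is kept.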
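The ERR computation can be sketched with NumPy using classical Gram-Schmidt orthogonalization, which yields $\mathbf P=\mathbf W\mathbf A$ with $\mathbf A$ unit upper triangular. The sketch assumes $\mathbf P$ has full column rank, computes the ratios in the given column order (no forward selection), and refits the retained columns by least squares; the function names, threshold value, and synthetic data are hypothetical.

```python
import numpy as np

def error_reduction_ratios(P, y):
    """Classical Gram-Schmidt OLS: P = W A, g = D^{-1} W^T y, ERR_i = g_i^2 w_i^T w_i / y^T y.

    Assumes P has full column rank; columns are processed in the given order.
    """
    N, m = P.shape
    W = np.zeros((N, m))
    A = np.eye(m)                      # unit upper triangular
    for k in range(m):
        w = P[:, k].astype(float)
        for j in range(k):
            A[j, k] = (W[:, j] @ P[:, k]) / (W[:, j] @ W[:, j])
            w = w - A[j, k] * W[:, j]
        W[:, k] = w
    d = np.sum(W * W, axis=0)          # diagonal of D = W^T W
    g = (W.T @ y) / d                  # auxiliary parameter vector
    err = g ** 2 * d / (y @ y)
    return err, A, g

def prune_subtrees(P, y, threshold):
    """Keep the subtree columns whose ERR reaches the threshold, then refit by LS."""
    err, _, _ = error_reduction_ratios(P, y)
    keep = np.flatnonzero(err >= threshold)
    theta, *_ = np.linalg.lstsq(P[:, keep], y, rcond=None)
    return keep, theta, err

# Usage: subtree terms x0, x0*x1, x2 where the output ignores x2.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
P = np.column_stack([X[:, 0], X[:, 0] * X[:, 1], X[:, 2]])
y = 2.0 * X[:, 0] + 0.5 * X[:, 0] * X[:, 1]
keep, theta, err = prune_subtrees(P, y, threshold=1e-3)
print(keep, np.round(theta, 6), np.round(err, 4))
```

The term in $x_2$ does not contribute to the output, so its ratio is numerically zero and it is dropped, while the ratios of the retained terms sum to one because the output lies in their span.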
After the redundant subtrees are deleted, the parameters of the remaining subtrees are obtained by (13). The three NSD objective functions are then calculated for each tree, and the values of $f_1$, $f_2$, and $f_3$ form the $M$-dimensional ($M=3$) vectors in the objective space. The ranks of these vectors are calculated by the compromise rank approach and assigned as the fitness values of the corresponding trees. The smaller the fitness value, the better the tree.
3.2.3. Evolutionary Operations
After the fitness values of the trees are evaluated, new trees are generated by evolutionary operations, including selection, reproduction, crossover, and mutation. First, the selection operation selects individuals from the population to form a mating pool for reproduction, crossover, and mutation. We define the size of the mating pool as half the population size. Tournament selection is used: two individuals are compared, and the one with the better fitness value is selected. After that, the individuals in the mating pool are split into three parts, according to their respective probabilities, for reproduction, crossover, and mutation. Individuals with better fitness values are reproduced as new individuals of the next generation. In the crossover operation, two trees are selected, and a randomly chosen subtree of each is exchanged. For example, in Figure 5 the diagonal marks indicate the crossover points of the parent 1 and parent 2 trees, and the marked subtrees of the two parents are exchanged to produce the child 1 and child 2 trees. In the mutation operation, a randomly selected subtree of a tree is replaced by a new subtree generated in the same way as the initial trees. After crossover and mutation, some internal nodes may not be mathematical operators, which violates the model construction rules, so the model certification procedure described above is applied after the evolutionary operations. Finally, an elitism mechanism, called the competition strategy, is applied after the evolution operators to let better individuals survive and to avoid losing good genes through random effects. This mechanism constructs a combined population from the parent and offspring populations, and every individual in the combined population is assigned a rank by the compromise rank approach.
The individuals with the lowest ranks are selected to form the new population, and the process stops when the new population reaches the size of the previous one. This elitism mechanism has been shown to support convergence [34, 35] and has been successfully applied in genetic algorithms for many real applications.
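The selection and elitism steps can be sketched in Python on precomputed compromise ranks. The function names, the binary tournament, the tie-breaking, and the probability values in the example are assumptions made for illustration; crossover and mutation themselves are omitted.

```python
import random

def tournament_mating_pool(ranks, pool_size, rng):
    """Binary tournaments on rank; the smaller rank wins (ties go to the first draw)."""
    pool = []
    for _ in range(pool_size):
        a, b = rng.sample(range(len(ranks)), 2)
        pool.append(a if ranks[a] <= ranks[b] else b)
    return pool

def assign_operations(pool, p_reproduction, p_crossover, rng):
    """Tag each mating-pool member with one operation; mutation takes the remainder."""
    ops = []
    for idx in pool:
        r = rng.random()
        if r < p_reproduction:
            ops.append((idx, "reproduction"))
        elif r < p_reproduction + p_crossover:
            ops.append((idx, "crossover"))
        else:
            ops.append((idx, "mutation"))
    return ops

def elitist_survivors(combined_ranks, pop_size):
    """Competition strategy: keep the pop_size lowest-ranked members of the
    combined parent + offspring population (stable sort breaks ties by index)."""
    order = sorted(range(len(combined_ranks)), key=lambda i: combined_ranks[i])
    return order[:pop_size]

rng = random.Random(0)
ranks = [2, 1, 3, 1]  # compromise ranks of a toy population
pool = tournament_mating_pool(ranks, pool_size=len(ranks) // 2, rng=rng)
print(assign_operations(pool, 0.1, 0.8, rng))
print(elitist_survivors([3, 1, 2, 1, 5, 2], pop_size=3))  # [1, 3, 2]
```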
The proposed CRGP is summarized as follows:
(1) Initialize the function set and terminal set of GP, and set the algorithm parameters, such as the population size, number of generations, and crossover probability.
(2) Evaluate the compromise rank of every tree in the current generation using the fast compromise rank approach.
(3) Generate a mating pool by tournament selection, in which the candidate tree with the lower rank wins each comparison. Then assign each tree in the mating pool one of three evolution operations (elite reproduction, crossover, or mutation) according to the operation probabilities.
(4) Apply the corresponding operations to the parent trees to generate children trees, and calculate the multiobjective vector of each child tree.
(5) Combine all parent trees and children trees into an intermediate population and rank them by the compromise rank approach. Move the trees with the lowest ranks into the next generation until the population size is reached.
4. Theoretic Analysis of Compromise Rank Genetic Programming
In order to ensure the solution accuracy of CRGP, the convergence properties of CRGP are discussed in this part. We present two theorems which prove that CRGP can converge to the actual Pareto Front.
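Theorem 3. CRGP converges to the Pareto optimum (the Pareto Front $PF^{*}$) with probability one; that is,
$$\lim_{t\to\infty}\Pr\bigl\{\mathcal{P}(t)\subseteq PF^{*}\bigr\}=1,$$
where $\mathcal{P}(t)$ contains the vectors in the objective space at the $t$th generation.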
Proof. It has been proved that an EA using Pareto-based ranking and a monotonic selection algorithm converges to the global optimum of an MOP with probability one.
Because compromise ranking obeys Pareto dominance sorting, CRGP uses Pareto-based ranking. In addition, CRGP uses an elitism mechanism during selection, so the minimum rank in the offspring population is lower than or equal to the minimum rank in the parent population; that is,
$$\min_{\mathbf{v}\in\mathcal{Q}(t)}\operatorname{rank}(\mathbf{v})\le\min_{\mathbf{u}\in\mathcal{P}(t)}\operatorname{rank}(\mathbf{u}),$$
where $\mathbf{u}$ and $\mathbf{v}$ denote the objective vectors in the parent generation $\mathcal{P}(t)$ and the offspring generation $\mathcal{Q}(t)$, respectively. Hence, the selection algorithm is monotonic. Moreover, the mutation operator of CRGP provides the irreducibility property. Therefore, CRGP satisfies all the conditions of this convergence result.
Therefore, CRGP converges to the Pareto optimum with probability one after a finite number of iterations.
This part discusses the applicability constraints that must be satisfied for CRGP to converge to the best solution on the Pareto Front. CRGP uses the compromise distance of vectors in the objective space in place of preference information in the optimization process, so the constraints relate to the distribution of vectors in the objective space. When the preference information for decision making can be correctly represented by the relative distances of vectors in the objective space, as given below, CRGP is also applicable to other MOPs.
Theorem 4. Under a constraint on the relative distances of vectors in the objective space, CRGP converges to the best solution of NSD with probability one; that is,
$$\lim_{t\to\infty}\Pr\bigl\{H\bigl(\mathbf{v}(t),\mathbf{v}^{*}\bigr)=0\bigr\}=1,$$
where $\mathbf{v}^{*}$ denotes the unique best solution, $\mathbf{v}(t)$ denotes a vector in the objective space on the Pareto Front $PF^{*}$ at the $t$th generation, and $H(\cdot,\cdot)$ denotes the Hamming distance between the associated incidence vectors. Here, $f_1^{*}$, $f_2^{*}$, and $f_3^{*}$ are the model error, the model structure complexity, and the number of selected variables of the best solution, respectively, and $f_1(t)$, $f_2(t)$, and $f_3(t)$ are the first, second, and third objective values of a vector at the $t$th generation.
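Proof. For clarity, we use $d_i^{-}$ to denote a negative relative distance $d_i(\mathbf{u},\mathbf{v})$ and $d_i^{+}$ to denote the corresponding positive relative distance $d_i(\mathbf{v},\mathbf{u})$. It can be seen that $d_i^{+}>|d_i^{-}|$.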
From Theorem 3, the Pareto Front $PF^{*}$ is reachable at some generation through mutation and recombination. For all objective vectors on $PF^{*}$, an ε-efficient region is defined near the optimum of the first objective (model error), and the complementary region consists of the vectors on $PF^{*}$ outside it. The proof is presented for these two regions in turn; Figure 6 illustrates them for a two-dimensional objective space.
(1) Assume that $\mathbf{u}$ and $\mathbf{v}$ both lie in the ε-efficient region, where $\mathbf{u}$ corresponds to the model with the more complex structure and $\mathbf{v}$ to the model with the simpler structure. The relative distance from $\mathbf{u}$ to $\mathbf{v}$ in structure complexity is then negative, while the relative distance in model error is small because both vectors lie in the ε-efficient region. Hence, the compromise distance from $\mathbf{u}$ to $\mathbf{v}$ is negative and that from $\mathbf{v}$ to $\mathbf{u}$ is positive. According to the rank rules, $\mathbf{u}$ receives a higher rank than $\mathbf{v}$; that is, the model with the more complex structure has a higher (worse) rank. Therefore, for vectors within the ε-efficient region, CRGP converges to the optimum with the simplest structure, as stated in (21).
(2) Next, we analyze vectors outside the ε-efficient region by comparing such a vector with one inside the region. From Figure 6, the outside vector has a larger model error but a simpler structure, so its relative distances in the first objective and in the remaining objectives have opposite signs. The sign of the compromise distance therefore cannot be determined without a further constraint, and convergence is not guaranteed. In the following, we deduce the constraints required for convergence. Because CRGP adopts an elitist selection strategy, the objective vector with the smaller rank is kept after mutation and recombination, so convergence is realized only if the vector inside the ε-efficient region receives a lower rank than the one outside it. According to the rank rules, there are two situations:
(i) If the compromise distances already satisfy the first rank rule in favor of the vector inside the region, no constraint is required.
(ii) Otherwise, two requirements on the compromise distances must hold simultaneously. Using $d_i^{+}>|d_i^{-}|$, both are satisfied when the relative decrease in model error outweighs the relative increases in the other objectives by a sufficient margin, which yields constraint (25).
Summarizing the two situations, if constraint (25) is satisfied, the convergence described by (23) is achieved.
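The relation $d_i^{+}>|d_i^{-}|$ used in the proof of Theorem 4 can be verified directly. The derivation below assumes positive objective values and the convention $d_i(\mathbf u,\mathbf v)=(v_i-u_i)/u_i$; with the mirrored convention $(u_i-v_i)/v_i$ (and $r$ redefined as $u_i/v_i$) the same two expressions $r-1$ and $1/r-1$ arise, so the inequality does not depend on this choice.

```latex
% Write r = v_i / u_i with 0 < r < 1, so that v_i < u_i.
\begin{aligned}
d_i^{-} &= d_i(\mathbf{u},\mathbf{v}) = \frac{v_i-u_i}{u_i} = r-1 \in (-1,0),\\
d_i^{+} &= d_i(\mathbf{v},\mathbf{u}) = \frac{u_i-v_i}{v_i} = \frac{1}{r}-1 > 0,\\
d_i^{+}-\lvert d_i^{-}\rvert &= \Bigl(\frac{1}{r}-1\Bigr)-(1-r) = \frac{1}{r}+r-2 = \frac{(1-r)^{2}}{r} > 0 .
\end{aligned}
```

In particular, a relative decrease is always smaller than 1 in magnitude, whereas the corresponding relative increase is unbounded.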
That is, when the value of a higher-priority objective (e.g., ) is far from the global optimum, the decrease in distance along this objective toward the global optimum must always exceed the sum of the distance increases in the other, lower-priority objectives (e.g., and ). If this difference is greater than two, the algorithm converges to the global optimum.
Combining (21) and (23), we conclude that CRGP converges with probability one to the best model structure with the minimum model error, provided that the relative distances in every pairwise comparison in the objective space satisfy constraint (25).
Not only does this rank approach obey Pareto dominance sorting, but it also incorporates the decision-making preferences of NSD by taking into account the special relationship among its three objectives. It can therefore obtain the final Pareto-optimal solution in a single run.
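The behavior established by the proof, namely that near the best model error the simpler structure wins while farther away the lower error wins, can be illustrated with the hypothetical comparison rule below. It is a sketch under stated assumptions, not the compromise rank itself, whose distance formulas did not survive extraction here: `tol` stands in for the width of the efficient region, a single `complexity` number stands in for the two complexity objectives, and the names `Model`, `prefer`, and `select_best` as well as the population values are our own toy choices.

```python
from typing import NamedTuple


class Model(NamedTuple):
    """A candidate model summarized by two minimized objectives."""
    error: float      # model error, the highest-priority objective
    complexity: int   # stand-in for the complexity objectives (assumed)


def prefer(a: Model, b: Model, best_error: float, tol: float) -> Model:
    """Hypothetical pairwise rule mirroring the two regions of the proof.

    Inside the band [best_error, best_error + tol] both models count as
    equally accurate, so the simpler one wins. Otherwise the lower error wins.
    Ties are broken by the other objective.
    """
    a_inside = a.error - best_error <= tol
    b_inside = b.error - best_error <= tol
    if a_inside and b_inside:
        return min(a, b, key=lambda m: (m.complexity, m.error))
    return min(a, b, key=lambda m: (m.error, m.complexity))


def select_best(models: list[Model], tol: float) -> Model:
    """Elitist fold: keep the winner of successive pairwise comparisons."""
    best_error = min(m.error for m in models)
    winner = models[0]
    for m in models[1:]:
        winner = prefer(winner, m, best_error, tol)
    return winner


# Toy population (not data from the paper): an accurate but bloated model,
# a slightly less accurate compact model, and a very simple inaccurate model.
population = [Model(0.010, 25), Model(0.012, 5), Model(0.030, 3)]
print(select_best(population, tol=0.005))  # Model(error=0.012, complexity=5)
print(select_best(population, tol=0.0))    # Model(error=0.01, complexity=25)
```

Because a model inside the band always has a strictly smaller error than one outside it, this rule is a total order, so the fold returns the same winner regardless of population order. With `tol=0.0` the rule degenerates to "lowest error wins", which is exactly the behavior that favors bloated models.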
5. Application to Postearthquake Disaster Management
5.1. Description of Dataset
An actual dataset recording the reconstruction after the 2003 Bam earthquake is studied in the following experiment. The database records 76 features over 106 time frames (at roughly weekly intervals). It includes the number of buildings at each of the seven stages of the reconstruction process: building permitted, building at foundation stage, building at framing stage, building at fencing stage, building at roofing stage, building nearly ready for use, and building ready for use. In addition, other features describe the human, material, and financial resources used during the recovery process, such as “number of engineers,” “number of loaders,” “number of reconstruction offices,” “number of heavy and not heavy equipments in reconstruction offices,” “number of units sent to consultants for building plan,” and “number of units received financing.”
To help decision makers efficiently determine a high-quality reconstruction strategy, we aim to uncover the underlying relationships between the reconstruction processes and the available resources. Polynomial nonlinear models are commonly used as empirical models for process modeling, so we assume that all the relationships can be expressed by nonlinear polynomial models. Identifying the model structures is difficult because the dataset is incomplete. For example, in our dataset the “number of units at foundation stage” is missing for the last thirty weeks; these records had to be filled with the sum of the “number of commercial units at foundation stage” and the “number of residential units at foundation stage,” so the resulting “number of units at foundation stage” values are less accurate. The modeling method must therefore avoid overfitting the training data and reduce the influence of measurement error. Moreover, the variables of these models are not independent of each other, and their underlying relations are uncertain. As a result, the significant underlying relationships among these variables are unknown and hard to identify, since they tend to be masked by other weak relationships. Many models with different structures may therefore yield equally satisfactory performance, and the key difficulty is to optimize model structure and model performance simultaneously. This paper applies CRGP to this problem using actual data from the 2003 Bam earthquake.
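The gap-filling step described above, where a missing “number of units at foundation stage” value is replaced by the sum of the commercial and residential counts, can be sketched as follows. The dictionary keys, the list-of-rows layout, and the `units_foundation_imputed` flag are hypothetical choices for illustration; the dataset's actual format is not described here, and the numbers are toy values.

```python
from typing import Optional

Row = dict[str, Optional[int]]


def fill_foundation_total(rows: list[Row]) -> list[Row]:
    """Fill missing foundation-stage totals with commercial + residential.

    Returns new rows and leaves the input untouched. Each output row gains a
    boolean "units_foundation_imputed" flag so later steps can treat filled
    values, which are less accurate than recorded ones, with extra care.
    """
    filled: list[Row] = []
    for row in rows:
        new_row = dict(row)
        imputed = False
        if new_row.get("units_foundation") is None:
            commercial = new_row.get("commercial_units_foundation")
            residential = new_row.get("residential_units_foundation")
            # Only fill when both components were actually recorded.
            if commercial is not None and residential is not None:
                new_row["units_foundation"] = commercial + residential
                imputed = True
        new_row["units_foundation_imputed"] = imputed
        filled.append(new_row)
    return filled


# Toy weekly records: the second week lacks the total and gets it imputed.
weeks = [
    {"units_foundation": 120, "commercial_units_foundation": 20, "residential_units_foundation": 100},
    {"units_foundation": None, "commercial_units_foundation": 15, "residential_units_foundation": 90},
]
result = fill_foundation_total(weeks)
print(result[1]["units_foundation"])  # 105
```

Keeping an explicit flag is one simple way to let a later modeling step down-weight or separately evaluate the imputed weeks, in line with the concern about measurement error raised above.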
5.2. Application Results
In the first step of this simulation, the proposed CRGP approach is used to generate six underlying nonlinear models among the numbers of buildings in the seven reconstruction stages. The results are shown in Table 1. In each model, the left-hand side of the equation denotes the estimated target, and the other six features serve as the input variables. All six CRGP runs used the same parameter settings, defined in Table 2.
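GP methods such as CRGP typically represent each candidate model as an expression tree. The sketch below shows one hypothetical way to evaluate such a tree and to measure two structural quantities, node count and depth, that could serve as complexity objectives. The paper's exact complexity measures are not stated in this section, so the function set, the class `Node`, and the example tree are illustrative assumptions, not a model from Table 1.

```python
from dataclasses import dataclass, field
from typing import Callable, Union

# Binary operators allowed in this toy function set (an assumption).
OPS: dict[str, Callable[[float, float], float]] = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


@dataclass
class Node:
    """Expression-tree node: an operator symbol, a variable name, or a constant."""
    value: Union[str, float]
    children: list["Node"] = field(default_factory=list)


def evaluate(node: Node, env: dict[str, float]) -> float:
    """Evaluate the tree, reading input variables from `env`."""
    if node.children:  # internal node: binary operator
        left = evaluate(node.children[0], env)
        right = evaluate(node.children[1], env)
        return OPS[node.value](left, right)
    if isinstance(node.value, str):  # leaf: input variable
        return env[node.value]
    return float(node.value)  # leaf: numeric constant


def size(node: Node) -> int:
    """Total number of nodes, a common GP measure of structural complexity."""
    return 1 + sum(size(child) for child in node.children)


def depth(node: Node) -> int:
    """Number of nodes on the longest root-to-leaf path."""
    return 1 + max((depth(child) for child in node.children), default=0)


# Hypothetical linear model y = 0.5 * x2 + x3 (not a model from Table 1).
tree = Node("+", [Node("*", [Node(0.5), Node("x2")]), Node("x3")])
print(evaluate(tree, {"x2": 100.0, "x3": 10.0}))  # 60.0
print(size(tree), depth(tree))                     # 5 3
```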
These models give the most accurate and simplest expressions for the number of building units at each stage. They show that “the number of units at foundation stage” (feature ) is the most significant element of the building reconstruction process: once its value is determined, the values of the other stages at later times can be estimated in advance. In addition, the relationships between the numbers of units at different stages are almost linear, which suggests that other computational intelligence methods suited to linear structures could also be applied to this problem with satisfactory results. For example, these models could be approximated with a neural network, but satisfactory models can be obtained only when the training data are complete and the simulation parameters are set properly.
This raises the question of how to determine the feature at the foundation stage. Although Model 1 offers a mathematical expression to predict , the feature cannot logically be known before is set. The same problem applies to the feature . Clearly, the available financial and human resources mainly affect the total number of buildings permitted to be constructed at a given time. So in the second step, we apply the proposed CRGP approach to obtain relationship models between the building reconstruction process and the other reconstruction variables (i.e., financial and human resources). The adjusted models for the features and are obtained as below, where denotes “number of units received financing.” With these models, decision makers can estimate the number of buildings to reconstruct from the available financial and human resources. This helps avoid wasting useful resources and overestimating the number of buildings that can be constructed.
To analyze the validity of the proposed CRGP approach for the disaster recovery scheduling problem, we use the learning process of the variable “number of units at foundation stage” as an example. Six resource variables required during reconstruction are defined as the input features: “number of engineers,” “number of loaders,” “number of reconstruction offices,” “number of heavy and not heavy equipments in reconstruction offices,” “number of units sent to consultants for building plan,” and “number of units received financing.” To test the accuracy of the solution, we chose of the dataset as training data and the remainder as testing data. CRGP is applied to the training set to obtain the optimal solution, which is then evaluated on the testing set. Figure 7 compares the estimates of the CRGP solution with the real data; the accuracy on the testing data is 98.7%, showing that CRGP can obtain a solution with high accuracy.
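The evaluation protocol above (fit on one part of the data, test on the rest) can be sketched as below. The training fraction did not survive extraction, so `train_fraction` is a free parameter and the 0.7 in the usage line is only a placeholder. Two further assumptions are ours: the split keeps time order, and accuracy is one minus the mean absolute relative error, since the section does not state how the 98.7% figure is computed.

```python
def chronological_split(series: list[float], train_fraction: float) -> tuple[list[float], list[float]]:
    """Split a time-ordered series into a leading training part and a trailing test part.

    Keeping time order avoids training on frames that come after the test frames.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]


def accuracy(y_true: list[float], y_pred: list[float]) -> float:
    """Hypothetical accuracy: 1 minus the mean absolute relative error.

    Assumes every true value is nonzero.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    relative_errors = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 1.0 - sum(relative_errors) / len(relative_errors)


# 106 weekly frames, as in the Bam dataset; the 0.7 fraction is only a placeholder.
frames = [float(week) for week in range(1, 107)]
train, test = chronological_split(frames, train_fraction=0.7)
print(len(train), len(test))                    # 74 32
print(accuracy([100.0, 200.0], [98.0, 204.0]))  # about 0.98
```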
To illustrate the effectiveness of CRGP, we compare it with single-objective GP and NSGAII-GP in solving for the relationship model of “number of units at foundation stage.” All three methods use the same parameter settings as in Table 2. Without loss of generality, the single-objective GP uses the approximation error of the model as its objective function. The comparison with single-objective GP is intended to show that the NSD problem is better addressed as a multiobjective optimization problem than as a single-objective one. NSGAII-GP applies a Pareto-based EMO algorithm, NSGA-II, to the evaluation and evolution processes of multiobjective GP. The comparison with NSGAII-GP is intended to demonstrate the validity and advantages of the compromise rank approach.
In Figure 8(a), to provide a common baseline for comparison, we convert the vector of multiple objectives into a single experience-based objective, , and use it to plot the learning curves of CRGP, single-objective GP, and NSGAII-GP. Here, and are chosen as the experience values , respectively, to reflect the different impact of each objective on the nonlinear system design. For clarity, the learning curves of the model error objective and the model complexity objectives are shown separately in Figures 8(b) and 8(c).
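Assuming the experience-based objective is a weighted sum of the individual objectives (the formula and the chosen values did not survive extraction), the conversion behind a curve like Figure 8(a) might look like the sketch below. Taking the best individual per generation and averaging curves across runs are further assumptions; the weights and population values are placeholders, not the paper's.

```python
from typing import Sequence


def scalarize(objectives: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of an objective vector (assumed form of the combined objective)."""
    if len(objectives) != len(weights):
        raise ValueError("objectives and weights must have the same length")
    return sum(w * f for w, f in zip(weights, objectives))


def learning_curve(history: list[list[Sequence[float]]], weights: Sequence[float]) -> list[float]:
    """Best (smallest) scalarized value in each generation of one run."""
    return [min(scalarize(ind, weights) for ind in generation) for generation in history]


def average_curves(curves: list[list[float]]) -> list[float]:
    """Element-wise mean of several runs' learning curves of equal length."""
    if len({len(curve) for curve in curves}) != 1:
        raise ValueError("all curves must have the same length")
    return [sum(values) / len(values) for values in zip(*curves)]


# Placeholder weights for (error, complexity 1, complexity 2); not the paper's values.
weights = (1.0, 0.5, 0.25)
run = [[(2.0, 10, 4), (4.0, 3, 2)], [(1.0, 6, 4), (2.0, 4, 2)]]
print(learning_curve(run, weights))              # [6.0, 4.5]
print(average_curves([[6.0, 4.5], [7.0, 5.5]]))  # [6.5, 5.0]
```

A scalarization like this is used only to draw a common baseline; it is not how CRGP ranks individuals.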
(a) Average learning curves of multiple objectives for the disaster management problem using three different algorithms: CRGP, single-objective GP, and NSGAII-GP
(b) Average learning curves of model error objective for the disaster management problem using three different algorithms: CRGP, single-objective GP, and NSGAII-GP
(c) Average learning curves of model complexity objectives for the disaster management problem using three different algorithms: CRGP, single-objective GP, and NSGAII-GP
Figure 8(b) shows that NSGAII-GP and single-objective GP achieve a somewhat smaller model error, but CRGP also obtains a small model error that differs only slightly from theirs. More importantly, the solutions of NSGAII-GP and single-objective GP are not the correct models but overfitted ones. Overfitted solutions typically combine a small model error with an overly complex model structure, as Figure 8(c) confirms: the model complexity of NSGAII-GP and single-objective GP rises to around 25, while that of CRGP converges to 5. Therefore, as Figure 8(a) shows, CRGP converges not only to a near-best model error but also to a parsimonious structure close to the accurate solution, whereas the other two algorithms converge only to the best model error with a complex structure.
Comparing CRGP with single-objective GP shows that the single-objective GP solution always has a small model error and a complex structure. This is because single-objective GP optimizes only the model error, so the size of the model structure tends to grow substantially due to bloat. Consequently, several different structures can achieve a similarly small error. In our problem, the underlying models involve many dependent variables and uncertain relations, so it is hard to choose an exact bound or weighting parameter to control bloat and balance model error against model complexity. Therefore, it is more reasonable to treat our NSD problem as a multiobjective optimization problem than as a single-objective one.
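The difficulty of choosing a weighting parameter can be made concrete with parsimony pressure, a common single-objective bloat control that scores a model as error + c × size. This scheme is not used in the paper's experiments; it is shown only to illustrate the argument above. With the hypothetical candidates below, the selected model flips as `c` changes, so the outcome hinges on a parameter that is hard to set without prior knowledge.

```python
def parsimony_fitness(error: float, size: int, c: float) -> float:
    """Single-objective fitness with a linear size penalty; lower is better."""
    return error + c * size


def pick(candidates: dict[str, tuple[float, int]], c: float) -> str:
    """Name of the candidate with the lowest penalized fitness."""
    return min(candidates, key=lambda name: parsimony_fitness(*candidates[name], c))


# Hypothetical candidates given as (error, size); not results from the paper.
candidates = {"bloated": (0.010, 25), "compact": (0.012, 5)}
for c in (0.0, 1e-5, 1e-3):
    print(c, pick(candidates, c))
# 0.0 bloated
# 1e-05 bloated
# 0.001 compact
```

For these toy values the choice flips at c = 0.002 / 20 = 1e-4; with real data the break-even point is unknown in advance, which is the practical obstacle described above.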
The comparison with NSGAII-GP shows that NSGAII-GP also fails to obtain the optimal model structure, whereas CRGP converges readily to the best solution. The reason is that NSGA-II can only find the Pareto-optimal set, from which designers must then use multicriteria decision making (MCDM) techniques to obtain the best solution. However, MCDM usually requires weighting information between the objectives or goal information about the specific application, which is rarely available for our NSD problem. Here, NSGAII-GP selects the solution with the smallest approximation error in the Pareto-optimal set as its final result, whereas CRGP uses the compromise rank approach to combine decision making with the optimization process. We conclude that CRGP offers a better approach to this NSD problem for disaster management, requires no prior information, and obtains a satisfactory solution.
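The final-selection step attributed to NSGAII-GP above (keep the Pareto-optimal set, then take its smallest-error member) can be sketched generically for minimized objectives. This is our own illustration, not the paper's code, and the population values are toy numbers. With two objectives, error and a single complexity measure, every other member of the Pareto set must trade a larger error for a smaller complexity, so the smallest-error member is always the most complex one, which matches the overfitting pattern discussed for Figure 8(c).

```python
from typing import Sequence

Point = tuple[float, ...]


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u Pareto-dominates v, with every objective minimized."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def pareto_set(points: list[Point]) -> list[Point]:
    """Points that no other point dominates (a point never dominates itself)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def min_error_choice(points: list[Point]) -> Point:
    """Final pick as described for NSGAII-GP: smallest error (objective 0) in the Pareto set."""
    return min(pareto_set(points), key=lambda p: p[0])


# Toy (error, complexity) pairs; (0.020, 9) is dominated by (0.012, 9).
population = [(0.010, 25), (0.012, 9), (0.015, 5), (0.030, 3), (0.020, 9)]
front = pareto_set(population)
print(front)                         # [(0.01, 25), (0.012, 9), (0.015, 5), (0.03, 3)]
print(min_error_choice(population))  # (0.01, 25), the most complex member
```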
In addition, the average running time of each algorithm is given in Table 3. These experimental results indicate that CRGP efficiently obtains the optimal model structure with an acceptably small model error, and that its efficiency and solution accuracy are better than those of the single-objective GP and NSGAII-GP approaches. Furthermore, this convergence performance helps to explore the nonlinear relationship models embedded in postearthquake recovery management.
Many real-world problems, such as disaster management, can be cast as a nonlinear system design (NSD) problem. The difficulty in NSD is to obtain the unknown model structure when no a priori information about the system is available. To address this problem, we model NSD as an MOP that jointly estimates model performance and model structure, and we develop the CRGP algorithm. The main feature of CRGP is that it uses the characteristics of individuals in the disaster management problem to reflect the preference information of the NSD MOP. CRGP uses a novel compromise rank approach for Pareto-based EMO to obtain the final Pareto-optimal solution in a single run, by comparing the compromise distances of vectors in the objective space. The convergence properties of CRGP are also investigated: CRGP is shown to converge to the global optimum of the NSD problem, and the constraint required for convergence is derived. CRGP is then applied to a postearthquake reconstruction management problem using real earthquake data. The six underlying relationship models generated for recovery scheduling show that the numbers of buildings in different recovery stages are nearly linearly related and that the number of buildings at the foundation stage is mainly related to financial support. Additionally, simulation results show that CRGP achieves better solution accuracy and a faster convergence rate than single-objective GP and another multiobjective GP algorithm (NSGAII-GP) on this NSD problem. In future work, CRGP will be extended to other real-world MOPs, such as dense signal estimation and detection.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work was supported by the National Natural Science Foundation of China (Grant no. 61401145), the Natural Science Foundation of Jiangsu Province (Grant no. BK20140858), and the Fundamental Research Funds for the Central Universities (2013b02114). The authors would like to thank the people who provided the earthquake dataset and everyone who helped them overcome many difficulties during this research work.
- H. Tamura, K. Yamamoto, K. Akazawa, and K. Taji, “Decision analysis for mitigating natural disaster risks,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, pp. 554–559, October 2000.
- J. Harrald and T. Jefferson, “Shared situational awareness in emergency management mitigation and response,” in Proceedings of the 40th Annual Hawaii International Conference on System Sciences (HICSS '07), pp. 23–35, January 2007.
- A. Kwasinski and P. T. Krein, “Telecom power planning for natural and man-made disasters,” in Proceedings of the 29th International Telecommunication Energy Conference (INTELEC '07), pp. 216–222, Rome, Italy, October 2007.
- F. Fiedrich, “An HLA-based multiagent system for optimized resource allocation after strong earthquakes,” in Proceedings of the Winter Simulation Conference (WSC '06), pp. 486–492, December 2006.
- M. Ghafory-Ashtiany and M. Hosseini, “Post-Bam earthquake: recovery and reconstruction,” Natural Hazards, vol. 44, no. 2, pp. 229–241, 2008.
- S. B. Miles and S. E. Chang, “Modeling community recovery from earthquakes,” Earthquake Spectra, vol. 22, no. 2, pp. 439–458, 2006.
- M. F. Metenidis, M. Witczak, and J. Korbicz, “A novel genetic programming approach to nonlinear system modelling: application to the DAMADICS benchmark problem,” Engineering Applications of Artificial Intelligence, vol. 17, no. 4, pp. 363–370, 2004.
- A. Jabbarzadeh, S. G. J. Naini, H. Davoudpour, and N. Azad, “Designing a supply chain network under the risk of disruptions,” Mathematical Problems in Engineering, vol. 2012, Article ID 234324, 23 pages, 2012.
- G. N. Beligiannis, L. V. Skarlas, S. D. Likothanassis, and K. G. Perdikouri, “Nonlinear model structure identification of complex biomedical data using a genetic-programming-based technique,” IEEE Transactions on Instrumentation and Measurement, vol. 54, no. 6, pp. 2184–2190, 2005.
- P. Koduru, Z. Dong, S. Das, S. M. Welch, J. L. Roe, and E. Charbit, “A multiobjective evolutionary-simplex hybrid approach for the optimization of differential equation models of gene networks,” IEEE Transactions on Evolutionary Computation, vol. 12, no. 5, pp. 572–590, 2008.
- V. Varadan and H. Leung, “Design of piecewise maps for chaotic spread-spectrum communications using genetic programming,” IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 49, no. 11, pp. 1543–1553, 2002.
- J. Sjöberg, Q. Zhang, L. Ljung et al., “Nonlinear black-box modeling in system identification: a unified overview,” Automatica, vol. 31, no. 12, pp. 1691–1724, 1995.
- W. Zheng and J. Zhang, “Fuzzy random method in earthquake disaster risk assessment,” in Proceedings of the 5th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD '08), pp. 579–583, October 2008.
- H. Leung and V. Varadan, “System modeling and design using genetic programming,” in Proceedings of the 1st IEEE International Conference on Cognitive Informatics (ICCI ’02), pp. 88–97, Calgary, Canada, 2002.
- X.-L. Yuan, Y. Bai, and L. Dong, “Identification of linear time-invariant, nonlinear and time varying dynamic systems using genetic programming,” in Proceedings of the IEEE Congress on Evolutionary Computation (CEC '08), pp. 56–61, June 2008.
- J. R. Koza, Genetic Programming, M.I.T. Press, Cambridge, Mass, USA, 1992.
- K. Rodríguez-Vázquez, C. M. Fonseca, and P. J. Fleming, “Identifying the structure of nonlinear dynamic systems using multiobjective genetic programming,” IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 34, no. 4, pp. 531–545, 2004.
- S. Bleuler, M. Brack, L. Thiele, and E. Zitzler, “Multiobjective genetic programming: reducing bloat using SPEA2,” in Proceedings of the Congress on Evolutionary Computation 2001, pp. 536–543, May 2001.
- J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, The MIT Press, 1992.
- K. Ono and Y. Hanada, “Assembling bloat control strategies in genetic programming for image noise reduction,” in Proceedings of the 14th International Conference on Intelligent Systems Design and Applications (ISDA '14), pp. 113–118, November 2014.
- G. Li and X.-J. Zeng, “Controlling bloating using depth constraint crossover,” in Proceedings of the UK Workshop on Computational Intelligence (UKCI '10), pp. 1–6, September 2010.
- R. Poli and N. F. McPhee, “Parsimony pressure made easy,” in Proceedings of the 10th Annual Genetic and Evolutionary Computation Conference (GECCO '08), pp. 1267–1274, ACM, July 2008.
- S. Luke and L. Panait, “A comparison of bloat control methods for genetic programming,” Evolutionary Computation, vol. 14, no. 3, pp. 309–344, 2006.
- K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms, John Wiley & Sons, 2001.
- C. A. Coello Coello, G. B. Lamont, and D. A. van Veldhuizen, Evolutionary Algorithms for Solving Multi-Objective Problems, Genetic and Evolutionary Computation, Springer, Berlin, Germany, 2nd edition, 2007.
- C. M. Fonseca and P. J. Fleming, “Multiobjective optimization and multiple constraint handling with evolutionary algorithms—part I: a unified formulation,” IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 28, no. 1, pp. 26–37, 1998.
- K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA-II,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, 2002.
- E. Zitzler, M. Laumanns, and L. Thiele, “SPEA2: improving the strength Pareto evolutionary algorithm for Multiobjective Optimization,” in Proceedings of the Evolutionary Methods for Design, Optimization and Control with Applications to Industrial Problems (EUROGEN '01), K. Giannakoglou, Ed., Athens, Greece, September 2001.
- Y. Jin and B. Sendhoff, “Pareto-based multiobjective machine learning: an overview and case studies,” IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, vol. 38, no. 3, pp. 397–415, 2008.
- D. Cvetković and I. C. Parmee, “Preferences and their application in evolutionary multiobjective optimization,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 1, pp. 42–57, 2002.
- D. A. van Veldhuizen, Multiobjective evolutionary algorithms: classification, analyses, and new innovations [Ph.D. thesis], Department of Electrical and Computer Engineering, Graduate School of Engineering, Air Force Institute of Technology, Wright-Patterson AFB, Ohio, USA, 1999.
- S. Chen, S. A. Billings, and W. Luo, “Orthogonal least squares methods and their application to nonlinear system identification,” International Journal of Control, vol. 50, no. 5, pp. 1873–1896, 1989.
- G. Rudolph and A. Agapie, “Convergence properties of some multi-objective evolutionary algorithms,” in Proceedings of the Conference on Evolutionary Computation, vol. 2, pp. 1010–1016, IEEE Press, Piscataway, NJ, USA, July 2006.
- T. Hanne, “On the convergence of multiobjective evolutionary algorithms,” European Journal of Operational Research, vol. 117, no. 3, pp. 553–564, 1999.
- S. Wei and H. Leung, “An improved genetic algorithm for pump scheduling in water injection systems for oilfield,” in Proceedings of the IEEE Congress on Evolutionary Computation (CEC '08), pp. 1027–1032, June 2008.
- D. A. van Veldhuizen and G. B. Lamont, “Evolutionary computation and convergence to a Pareto front,” in Genetic Programming 1998: Proceedings of the 3rd Annual Conference, J. R. Koza, W. Banzhaf, K. Chellapilla et al., Eds., pp. 22–25, Morgan Kaufmann, San Francisco, Calif, USA, 1998.
- S. Wei and H. Leung, “A novel ranking method based on subjective probability theory for evolutionary multiobjective optimization,” Mathematical Problems in Engineering, vol. 2011, Article ID 695087, 10 pages, 2011.
Copyright © 2015 Shuang Wei and Henry Leung. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.