Mathematical Problems in Engineering

Volume 2015, Article ID 873794, 14 pages

http://dx.doi.org/10.1155/2015/873794

## Compromise Rank Genetic Programming for Automated Nonlinear Design of Disaster Management

^{1}Array and Information Processing Laboratory, College of Computer and Information, Hohai University, Nanjing, Jiangsu 210098, China

^{2}Department of Electrical and Computer Engineering, University of Calgary, Calgary, AB, Canada T2N 1N4

Received 11 December 2014; Revised 15 April 2015; Accepted 16 April 2015

Academic Editor: Joan Serra-Sagrista

Copyright © 2015 Shuang Wei and Henry Leung. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This paper presents a novel multiobjective evolutionary algorithm, called compromise rank genetic programming (CRGP), that automatically performs nonlinear system design (NSD) for disaster management. The NSD problem is formulated as a multiobjective optimization problem (MOP) in which model performance and model structure are optimized simultaneously. CRGP combines decision making with the optimization process to obtain the final global solution in a single run. The algorithm adopts a new rank approach that incorporates subjective information to guide the search: individuals are ranked according to the compromise distance of their mapping vectors in the objective space. We prove that the proposed approach converges to the global optimum under certain constraints. To illustrate the practicality of CRGP, it is applied to a postearthquake reconstruction management problem. Experimental results show that CRGP is effective in discovering unknown nonlinear systems in large datasets, which helps postearthquake renewal proceed with high accuracy and efficiency. Compared with related methods for the disaster management problem, the proposed method obtains a more satisfactory model structure.

#### 1. Introduction

Natural disasters such as earthquakes, floods, and typhoons have occurred more frequently in recent years. Most of them cause extensive infrastructure damage, heavy casualties, and financial loss. For example, the Sichuan Earthquake left at least 5 million people without housing, and the government had to spend billions over the following years to rebuild the ravaged areas [1]. To limit the economic and psychological damage to people and society, disaster management is a growing need for many governments. Its central and complex task is to devise an efficient reconstruction strategy that rescues victims in time and rebuilds the ravaged areas efficiently with limited resources and financial support. Previous work has offered qualitative analyses of specific aspects of the reconstruction strategy, such as conceptual decision support for disaster mitigation [2, 3], rescue planning of telecom power [4], and optimized strategies for resource allocation [5]. However, few mathematical models for disaster management currently exist in the literature, and modeling the reconstruction strategy remains an important open research topic.

From the application point of view, [6] pointed out that prediction models for an “in-advance reconstruction strategy” are needed to reduce inevitable delays in the recovery process. In addition, [7] suggests that the speed and quality of the renewal process interact with the assignment of limited resources (such as experts, medical teams, and financial support). These underlying relationships can be modeled from the “Big Data” of disaster management for prediction. Designing prediction models that help devise an efficient reconstruction strategy in advance is therefore an urgent problem.

Since nonlinear models are used empirically for modeling complex processes, such as industrial control systems [8, 9], biomedical data [10, 11], and chemical process systems [12], finding such underlying models for disaster management can be cast as a nonlinear system design (NSD) problem. The NSD problem involves determining the structure and estimating the parameters of the nonlinear models embedded in disaster management. Traditionally, solving NSD problems focuses on parameter estimation, while the structures are assumed to be known or are approximated by universal approximators such as neural networks [13]. However, no a priori information is available about the nonlinear models for disaster management, and disaster datasets are usually incomplete and inconsistent [14]. In this case, the approximation approach is limited in that it usually uses a very complex model to describe what may be a simple underlying function. The unnecessary parameters then pose a problem for parameter estimation and contradict the basic principle of nonlinear system design. Moreover, the relevant decision variables, which are significant for expressing the system behavior, are generally difficult to select from a large number of features. In addition, the generalization performance of an approximated model is usually not guaranteed. It is therefore a significant challenge to determine the exact structure of each nonlinear model for disaster management and to design these models with sparse data and little a priori information.

Genetic programming (GP) is a powerful tool for discovering nonlinear model structure, using a tree chromosome representation and the crossover operator [15, 16]. It searches the functional space to determine the optimal structure of a nonlinear system [17]. Generally, GP coevolves model structure and parameters by optimizing a single objective, such as prediction error. However, this approach does not always converge to a globally optimal structure, since the functional space is usually too large to search [18]. Spurious terms and dependent variables are observed to be the main causes of this problem. Their evolution results in rapid growth of tree sizes in GP, which tends to make the algorithm stagnate and leads to the phenomenon of bloating [19]. The simple and common way to prevent bloating is to limit the maximum tree depth and the maximum number of nodes of all individuals [20]. For our problem, with little a priori information, it is hard to determine an appropriate depth and number of nodes. Other approaches focus on controlling the growth of offspring trees or targeting redundant nodes by improving the crossover and selection operators [21, 22]. Our proposed approach mainly uses this idea to resolve bloating in GP. Bloating can also be controlled with a fitness penalty (parsimony pressure) that is biased against individuals with a more complex structure [23]. Luke reported that all of these bloat control methods can perform well without reducing the ability of the algorithm [24]. The NSD problem for disaster management should have a practical optimum solution, and an overly complex model is hard to comprehend, so model complexity should be kept low. Model complexity is therefore optimized as well, which also acts as a fitness penalty to control bloating.

Since a poor model structure may lead to poor parameter estimation, model accuracy and complexity should be optimized simultaneously in solving the NSD problem. Because maximizing model accuracy and minimizing model complexity are distinct objectives, our NSD problem for disaster management is modeled as a multiobjective optimization problem (MOP) and solved by a multiobjective evolutionary algorithm (MOEA) cooperating with GP.

MOEAs mainly comprise aggregating methods and Pareto-based approaches [25]. Aggregating methods require predefined weights or a priori information about the objectives to convert the MOP into a single-objective optimization. This approach is easy to implement, but improper a priori information might lead to a poor optimization result [25]. The Pareto-based approach obtains an optimal solution in two steps: a Pareto optimality process and a multicriteria decision making process [26]. The first step is usually time consuming, because it must find the Pareto optimal set, which contains many redundant solutions. In the second step, the final solution is selected from the Pareto optimal set using the goal or preference information from decision makers. Although several algorithms have been proposed to improve the validity of the Pareto optimal set [27–29], little effort has been devoted to incorporating subjective information into Pareto optimality to reduce the search space and improve efficiency. Since the NSD for disaster management requires a single solution rather than the whole Pareto optimal set, incorporating subjective information is important for discovering the exact nonlinear model for each underlying relation in disaster management. We found that the multiple objectives of the NSD for disaster management have different ranking priorities in different situations, and that this ranking strategy can be adjusted according to subjective probability theory. We therefore propose a novel multiobjective GP algorithm, called compromise rank genetic programming (CRGP), that addresses the NSD for disaster management by combining subjective information with Pareto optimality.

The proposed CRGP aims to uncover the exact nonlinear structure and the parameter estimates of the models embedded in incomplete disaster datasets, and it draws on subjective probability theory to reduce computational complexity. CRGP requires no choice of goals or weights, which eliminates errors caused by badly chosen weights. It uses the relative distance between chromosomes to guide the search in the Pareto optimality process and to obtain the final compromise solution in a single run. This characteristic reduces the probability that model structures composed of redundant terms and unimportant features evolve, which helps resolve the bloating problem in GP evolution and improves the convergence rate. To evaluate the effectiveness and practicality of CRGP, the proposed approach is applied to a practical problem: postearthquake reconstruction in disaster management.

The paper is organized as follows. Section 2 presents the formulation of NSD based on MOP. Section 3 describes the proposed CRGP algorithm. Convergence analysis of CRGP is given in Section 4. Section 5 reports the application of CRGP to a disaster management problem using real postearthquake reconstruction data, where the method is compared with the traditional single-objective GP method, the aggregating MOGP method, and the Pareto-based MOGP approach. Concluding remarks are given in Section 6.

#### 2. Problem Formulation

Assume an unknown nonlinear system given by

$$y = f(x, \theta), \tag{1}$$

where $f$ is the unknown nonlinear function, $y$ is the output vector (with noise) that we can obtain, $x$ is the unknown input vector, and $\theta$ is the unknown system parameter vector. The task of NSD involves determining three variables: an appropriate input set $x_s$ ($x_s \subseteq x$) with minimal redundancy from $x$, the nonlinear function $f$, and the system parameters $\theta$.
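To make the three unknowns concrete, the following minimal sketch represents one candidate design as an input subset, a structure, and a parameter vector, and scores it by mean square error on data. The names `Candidate` and `mse`, the toy data, and the two example structures are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Candidate:
    """Hypothetical container for one NSD candidate (illustrative only)."""
    inputs: Sequence[int]      # indices of the selected input variables
    structure: Callable[[Sequence[float], Sequence[float]], float]  # f(x_s, theta)
    theta: Sequence[float]     # parameter vector


def mse(candidate, X, y):
    """Mean square error between observed outputs and the candidate's predictions."""
    total = 0.0
    for row, target in zip(X, y):
        x_s = [row[i] for i in candidate.inputs]  # keep only the selected inputs
        total += (target - candidate.structure(x_s, candidate.theta)) ** 2
    return total / len(y)


# Toy data generated by y = 2 * x0**2; the second column is a redundant input.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 7.0]]
y = [2.0, 8.0, 18.0]

# Correct structure with the minimal input set.
good = Candidate(inputs=[0],
                 structure=lambda x, th: th[0] * x[0] ** 2,
                 theta=[2.0])

# Wrong structure that also uses the redundant input.
poor = Candidate(inputs=[0, 1],
                 structure=lambda x, th: th[0] * x[0] + th[1] * x[1],
                 theta=[1.0, 1.0])
```

Here `mse(good, X, y)` is zero while `mse(poor, X, y)` is not, which illustrates why the input set, the structure, and the parameters have to be chosen together.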

Traditional nonlinear system identification usually assumes the function $f$ and the input $x$ to be determined a priori, so the main task is to determine the parameters $\theta$ by a parameter estimation technique. One popular approach is the minimum mean square error (MSE) method; that is,

$$\hat{\theta} = \arg\min_{\theta \in \mathbb{R}} E\left[\lVert y - \hat{y} \rVert^{2}\right], \tag{2}$$

where $\hat{y}$ is the estimated output and $\mathbb{R}$ contains all the real numbers. In practice, a priori information about $f$ and $x$ is usually not available. A general formulation for NSD based on minimizing the MSE can then be expressed by

$$\min_{f \in F,\; x \in X,\; \theta \in \mathbb{R}} \mathrm{MSE}(y, \hat{y}), \tag{3}$$

where $F$ is the functional space that contains all possible nonlinear functions, $X$ is the input sequence space, and $\mathrm{MSE}(y, \hat{y})$ is the MSE between the true and estimated output. Since input variables might be correlated, different combinations of input variables might result in a similar level of MSE. Therefore, the order of the appropriate input set $x_s$ should also be minimized; that is,

$$\min_{x_s \subseteq x} \operatorname{order}(x_s). \tag{4}$$

In addition, different model structures can have the same MSE if they contain redundant terms. Thus, the nonlinear function $f$ should be in its most parsimonious form, and the complexity of the model structure is considered here as another measure of $f$. Two factors are used to measure this complexity: the total number of terms in $f$, to avoid redundancy, and the ratio of the number of nonlinear terms to the total number of terms; that is, $N$ and $N_{nl}/N$, where $N$ is the number of terms in $f$ and $N_{nl}$ is the number of nonlinear terms. Since the three optimization problems, (3), (4), and (5), are correlated, they should be solved simultaneously to obtain a consistent solution. For example, the optimal input set is determined by minimizing the MSE and the order of the input set together. At the same time, an improper input set would cause the optimization to diverge. In this paper, we propose formulating NSD as a MOP given below:

$$\min \left\{ \text{MSE},\ \text{order of the input set},\ \text{complexity of } f \right\}. \tag{7}$$

One popular approach to MOP is evolutionary multiobjective optimization (EMO). Traditional EMO approaches [25] convert the MOP to a single-objective problem by weighting. Taking the sum of weighted objectives, the single fitness function can be expressed as

$$\text{fitness} = \sum_{i} w_i J_i, \tag{8}$$

where $J_i$ denotes the $i$th objective and $w_i$ are the weights. The optimal solution is obtained by ranking the individuals by this single-objective value. This approach has difficulty solving the NSD problem of disaster management. First, the weights are hard to define a priori, and different weightings can result in very different solutions. Thus, several runs with different weight combinations are usually required to obtain various Pareto-optimal solutions [25], and even then they cannot guarantee finding the real solution, while the computation cost increases. Second, our model (7) is usually nonconvex, because the epigraph of the function is usually not a convex set with respect to all the candidate structures of the unknown function $f$.
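The sensitivity to the weights can be illustrated with a minimal sketch. The candidate names, objective values (MSE, number of inputs, structure complexity), and weight vectors below are made-up values for illustration, not results from the paper, and `weighted_fitness` and `best_by_weighted_sum` are hypothetical helpers.

```python
def weighted_fitness(objectives, weights):
    """Aggregate several objective values into one fitness by a weighted sum."""
    return sum(w * j for w, j in zip(weights, objectives))


def best_by_weighted_sum(population, weights):
    """Return the name of the candidate with the smallest weighted fitness."""
    return min(population,
               key=lambda name: weighted_fitness(population[name], weights))


# Hypothetical candidates: (MSE, number of inputs, structure complexity).
population = {
    "simple": (0.50, 1, 2),
    "complex": (0.01, 3, 6),
}

# Emphasizing accuracy selects the complex model ...
accuracy_first = best_by_weighted_sum(population, weights=(100.0, 1.0, 1.0))

# ... while equal weights select the simple one.
parsimony_first = best_by_weighted_sum(population, weights=(1.0, 1.0, 1.0))
```

The same two candidates yield different winners under the two weightings, so the final model depends on weights that are hard to justify a priori.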

To overcome these difficulties, Pareto-based EMO approaches are usually considered. Compared to the objective space of a single-objective optimization, the objective space for Pareto-based EMO is usually more complex, and it is a challenge to determine how to evaluate individuals according to several inconsistent objectives. Pareto-based EMO [30] is usually implemented in two steps: a Pareto optimality process and a multicriteria decision making process. The first step ranks individuals by their degree of nondominance and obtains the Pareto-optimal set, which is composed of the “nondominated” solutions for all the objectives. In the second step, the final solution is selected from the Pareto-optimal set according to the goal or preference information provided by decision makers [31]. “Dominance” is defined as follows.

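*Definition 1 (Pareto dominance [26, 32]).* For a minimization problem, a vector $u = (u_1, \ldots, u_k)$ is said to dominate a vector $v = (v_1, \ldots, v_k)$ if and only if $u$ is partially less than $v$, that is,

$$\forall i \in \{1, \ldots, k\}:\ u_i \le v_i \quad \text{and} \quad \exists i \in \{1, \ldots, k\}:\ u_i < v_i, \tag{9}$$

and vice versa for a maximization problem.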
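Definition 1 can be sketched in a few lines of code for the minimization case. The function names `dominates` and `nondominated` and the sample objective vectors are hypothetical; the dominance test itself is the standard one (no worse in every objective and strictly better in at least one).

```python
def dominates(u, v):
    """Return True if u Pareto-dominates v for a minimization problem."""
    if len(u) != len(v):
        raise ValueError("objective vectors must have the same length")
    no_worse = all(a <= b for a, b in zip(u, v))
    strictly_better = any(a < b for a, b in zip(u, v))
    return no_worse and strictly_better


def nondominated(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]


# Hypothetical objective vectors: (MSE, number of inputs, structure complexity).
points = [
    (0.10, 3, 5),
    (0.20, 2, 4),
    (0.30, 4, 6),    # dominated by (0.10, 3, 5)
    (0.10, 3, 5.5),  # dominated by (0.10, 3, 5)
]

front = nondominated(points)
```

The two surviving vectors do not dominate each other, so Pareto dominance alone assigns them the same rank and a separate decision step is still needed to pick one.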

Generally, the Pareto dominance rank method assumes that all objectives carry equal weight in the optimization. This assumption does not hold for the NSD problem of disaster management, where the three objectives play different roles in determining the final best nonlinear system. In addition, the two steps of general Pareto-based methods spend much time computing many redundant Pareto-optimal solutions that are not required for our disaster management problem. To address these issues, CRGP is proposed here to solve the NSD MOP of disaster management.

#### 3. Compromise Rank Genetic Programming

CRGP is proposed to integrate the decision making process into the Pareto optimality process so that the final compromise solution is obtained in one process. The main difference between CRGP and general Pareto-based EMO methods is the way individuals are ranked according to multiple inconsistent objective functions.

##### 3.1. Compromise Rank Approach

Assume that there are objectives (). These objective functions map individuals in the variable space to vectors in the objective space. The individuals are ranked by evaluating the -dimensional vector in the objective space. Assume and be two different -dimensional vectors in the objective space. and () denote the th objective values of the and , respectively. For the NSD MOP problem for disaster management, there are three objectives as (7); that is, , and they are always positive; in fact, in this paper we only consider that all objective values are positive; that is, , and .

We define the relative distance from to relative to the beginning point through the th objective space as ; that is,The sign of is determined by the magnitude of and , and the magnitude of indicates the relative increment or decrement of the th objective when is compared to . For example, the case that is negative implies that is smaller than . For minimization, is then considered to perform better than for the th objective.

We use the term “rank” to measure the performance of every individual: the smaller the rank, the better the solution. Comparing the relative distances of different objective vectors, we have the following situations:

(1) If , for minimization should be ranked higher than .

(2) If , for minimization should be ranked lower than .

(3) If , and , and are considered as having the same rank based on the Pareto dominance sorting method (see Definition 1). As shown later, unlike Pareto dominance sorting, the compromise rank method attempts to consider the relative differences of all the objectives and to assign different ranks to and in this case.

(4) If , the rank of and should be the same.

Here, we analyze the characteristics of the objective vectors (i.e., ) in different situations for our MOP problem (7) and subsequently propose a new rank rule to solve it. Obviously, when the relative distance of and falls into the first situation and second situation, the rank can be determined by the sum of all parts . For the third situation, the relationship between the absolute values of can be divided into three cases: (a) , (b) , and (c) . In the case (a), the NSD MOP has solutions with and close to each other, but their MSE values, that is, , can be quite different. For example, consider corresponding to the model structure and corresponding to the model structure ; their and values are close. However, their values are quite different; that is, and are 0.004 and , respectively. Actually in the case (a) the first objective has a higher priority; therefore the rank of and in this case would be in agreement with the sign of the sum of . In the case (b), the MSE values of different models are similar, and the difference between and can be ignored. On the other side, in the case (b) the first objective has a lower priority. So, in this case the conclusion is also achieved in which the rank of and would be in agreement with the sign of the sum of . In the case (c), and are considered to have the same rank, yet it occurs rarely. Thus we found that the sum of can adaptively reflect the preference information of objectives of NSD at different cases. This relationship is a novel character the proposed approach uses to apply to the NSD MOP for disaster management, unlike other approaches which need to know the preference information a priori.

Therefore, the proposed approach defines “compromise distance” of two vectors in a -dimensional objective space as below.

*Definition 2 (compromise distance).* The compromise distance from to is defined as the sum of all the relative distances of every objective from to in the objective space. That is,

The compromise rank approach is then proposed to rank the compromise distance between two vectors in the objective space to guide the Pareto optimality search process. Assume that all objective values are positive; the rules for compromise distance ranking are given as follows:

(1) If and , then .

(2) If and , then .

(3) If and , if , then .

(4) If and , then .

Additionally, these rules are Pareto-compliant, as proved later in Theorem 3.
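As a rough illustration of Definition 2, the sketch below sums per-objective relative distances between two objective vectors. The normalization used in `relative_distance`, namely dividing the difference by the second vector's value, is an assumption made for this sketch and may differ from the paper's formula; the ranking rules themselves are not reproduced. The function names and the example vectors are hypothetical.

```python
def relative_distance(a_i, b_i):
    """Relative change of one objective between two vectors.

    ASSUMPTION: the difference is normalized by b_i; the paper's exact
    formula may differ. Objective values must be positive, as the text assumes.
    """
    if a_i <= 0 or b_i <= 0:
        raise ValueError("objective values must be positive")
    return (a_i - b_i) / b_i


def compromise_distance(a, b):
    """Sum of the per-objective relative distances between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("objective vectors must have the same length")
    return sum(relative_distance(x, y) for x, y in zip(a, b))


# Hypothetical objective vectors: (MSE, input order, structure complexity).
a = (0.004, 2.0, 3.0)    # slightly simpler structure, much larger MSE
b = (0.0004, 2.0, 3.3)   # slightly more complex structure, much smaller MSE

d_ab = compromise_distance(a, b)  # positive under the assumed normalization
d_ba = compromise_distance(b, a)  # negative under the assumed normalization
```

Neither vector Pareto-dominates the other, yet under this assumed normalization the sign of the sum follows the objective with the largest relative change (here the MSE), which matches the behavior described for case (a).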

Figure 1 illustrates the ranking scheme based on Pareto dominance sorting and compromise rank approach, respectively. For the two vectors with rank 1 in Figure 1(a), their relative distance of the first objective is smaller than that of the second objective. Thus, the compromise rank approach will rank these two vectors in agreement with the order of the second objective as shown in Figure 1(b). Consider any vector with rank 1 and any vector with rank 2 in Figure 1(a); their relative distance of the first objective is much bigger than that of the second objective. Thus, the compromise rank approach will rank them with more levels as shown in Figure 1(b).