Abstract

Decision making in engineering design problems is challenging because they have multiple and conflicting criteria and complex correlation between design parameters. This study proposes a decision-making support methodology named design mode analysis, which consists of data clustering and principal component analysis (PCA). A design mode is indicated by the eigenvector obtained by PCA and reveals the dominant design parameters in a given dataset. The proposed method is a general framework to obtain the design modes from high-dimensional and large datasets. The effectiveness of the proposed method is verified on the conceptual design problem of the hybrid rocket engine.

1. Introduction

Multiple-criteria decision making (MCDM) [1] addresses a class of problems in which decisions are made among multiple, conflicting criteria (objectives). Since no single best decision exists for MCDM problems, they are solved by seeking a set of available alternative decisions. After the alternatives are obtained, a decision maker (DM) chooses among them according to their preferences. In this procedure, which is also known as multiobjective optimization, the set of alternatives is represented as a Pareto solution set. Nondominated solutions are those for which no objective function value can be improved without worsening another objective function value. Multiobjective optimization algorithms derive good approximations of the Pareto optimal solutions, which cannot be further improved.

Once the solutions have properly converged, they can be used to make decisions. However, choosing the best compromise among multiple and conflicting objectives is difficult without information about the target problem to aid decision making. To address this difficulty, many researchers have proposed ways of using optimization results effectively to improve our understanding of the target problems.

Obayashi et al. [2, 3] proposed the multiobjective design exploration framework, which consists of a Kriging model, a multiobjective genetic algorithm, analysis of variance, and a self-organizing map. It can explore a broad part of the decision space and derive many Pareto solutions (alternatives) in a reasonable time through Kriging metamodeling of the objective functions. Tradeoff information between multiple objectives and the characteristics of the decision space are broadly outlined by visualizing the decision and objective spaces using data mining methodologies, such as self-organizing maps.

Oyama et al. [4, 5] used proper orthogonal decomposition (POD) to formulate decision variables involved in designing the airfoil shape and revealed that any design can be decomposed into the mean vector and the fluctuation vector, which is expressed by the linear sum of normalized eigenvectors and orthogonal base vectors. One of the advantages of their approach is that we can understand the representative design types as well as the parameters. After analyzing the fluctuation vector, the parameters are constructed from a large and high-dimensional dataset.

In this paper, inspired by the work of Oyama et al., we extend the concept of the fluctuation vector and define a new concept, the design mode. Based on this concept, we propose “design mode analysis,” an analytical framework for finding the design modes and using them effectively. It consists of data clustering and principal component analysis (PCA) and extracts the representative design types along with their characteristics. This information aids decision making. Note that PCA is the same algorithm as POD; different names are used depending on the application field. In this paper, we use the name PCA.

This paper is organized as follows. Section 2 gives the definition of the multiobjective optimization problem. Section 3 introduces the concept and framework of the proposed method, as well as its application to engineering design problems. In Section 4, the proposed method is applied to the conceptual design problem of a hybrid rocket engine. The paper concludes with Section 5.

2. Multiobjective Optimization Problem

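In many engineering design problems, MCDM is often treated as a multiobjective optimization problem. It can be formulated as

$$\text{minimize } \mathbf{f}(\mathbf{x}) = \left(f_1(\mathbf{x}), \ldots, f_m(\mathbf{x})\right)^T \quad \text{subject to } \mathbf{x} \in S, \tag{1}$$

where $\mathbf{f}$ is an objective vector, consisting of $m$ objective functions $f_i$ for all $i \in \{1, \ldots, m\}$. $S$ is called the decision variable space and is defined as

$$S = \left\{ \mathbf{x} \in \mathbb{R}^n \mid g_j(\mathbf{x}) \le 0 \ \forall j, \ h_k(\mathbf{x}) = 0 \ \forall k \right\}. \tag{2}$$

Here, $g_j$ and $h_k$ are inequality and equality constraints, respectively. Two definitions are used to handle the tradeoff between the multiple criteria (objectives). Notably, these definitions are stated for minimization problems.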

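Definition 1 (Pareto dominance). For $\mathbf{x}^1 \in S$ and $\mathbf{x}^2 \in S$, $\mathbf{x}^1$ is said to dominate $\mathbf{x}^2$ if $f_i(\mathbf{x}^1) \le f_i(\mathbf{x}^2)$ for all $i$ and $f_j(\mathbf{x}^1) < f_j(\mathbf{x}^2)$ for at least one $j$.

Definition 2 (Pareto optimality). Let $\mathbf{x}^* \in S$. Then, $\mathbf{x}^*$ is Pareto optimal when there are no other solutions in $S$ dominating $\mathbf{x}^*$.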
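The two definitions can be illustrated with a minimal Python sketch for minimization problems. The function names and the sample objective vectors are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

def dominates(f_a: Sequence[float], f_b: Sequence[float]) -> bool:
    """True if objective vector f_a Pareto-dominates f_b (minimization)."""
    no_worse = all(a <= b for a, b in zip(f_a, f_b))
    strictly_better = any(a < b for a, b in zip(f_a, f_b))
    return no_worse and strictly_better

def nondominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that are not dominated by any other point in the list."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical two-objective values; (3, 4) is dominated by (2, 3).
objs = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
front = nondominated(objs)
```

This brute-force filter compares every pair of points, which is adequate for small sets; EMO algorithms such as NSGA-II use faster sorting procedures.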

Based on these definitions, the Pareto solution set or nondominated solution set is a subset of all the Pareto-optimal solutions. Typically, multiobjective optimization algorithms seek good approximations of the Pareto optimal solutions by evolutionary multiobjective optimization (EMO). EMO is a popular approach because it obtains many nondominated solutions simultaneously in a single run. Recently, a number of EMO algorithms have been developed and improved to derive well-converged and well-spread nondominated solution sets. One of the most successful frameworks in recent EMO algorithms is the multiobjective evolutionary algorithm based on decomposition (MOEA/D) proposed by Zhang and Li [6]. It has been widely used in many real-world applications [7, 8].

3. Design Mode Analysis

3.1. Definition of Design Mode

Decision variables are defined as a set of parameters that determine the solutions to optimization problems. For example, in product design problems, they frequently represent the size, weight, and shape of the product. Objective functions indicate the goal of the product design, such as the performance and cost of the target product. The decision variables and objective functions form the decision space and the objective space, respectively, as shown in Figure 1. In general, the optimization process explores the decision space with the aim of minimizing or maximizing the objective functions. Many multiobjective evolutionary algorithms aim to improve the diversity and convergence of the nondominated solutions, especially in the objective space [6, 9].

However, in product design problems, the design is also characterized by the decision variables. For example, if the product mass is successfully minimized and the characteristics of the decision variables of the optimum solutions are analyzed, a weight-saving design strategy may be found for the product.

In this study, we incorporate the “design mode” concept into decision space analysis as an essential perspective. The concept is derived from PCA, which extracts the dominant characteristics of the target dataset by decomposing high-dimensional data into low-dimensional descriptions using a set of principal component vectors, whose directions correspond to maximum variance among the variables. Oyama et al. [4, 5] applied PCA in their analysis of airfoil shape design and reported its effectiveness. They focused on designing the shape of the product, that is, seeking the principal airfoil shapes that made up the Pareto solution set.

In this study, we generalize and extend the concept to deal with all types of decision parameters during product design. Directions indicated by principal component vectors guide us to change the decision variables for constructing the designs in the given dataset. Furthermore, the contribution ratio of each decision variable to the principal component vectors shows its importance for creating the designs. We assume that the principal component vector gives us important information for the engineering design and define it as “design mode.” An analytical framework based on this design mode is also proposed.

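Let $\mathbf{x}_i \in \mathbb{R}^n$ be the decision variable vector of the $i$th Pareto solution, and let $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ be a Pareto solution set of size $N$. In PCA, the following optimization problem is solved, and the vector $\mathbf{a}$ that maximizes variation in the decision variables is selected:

$$\max_{\mathbf{a}} \ \mathbf{a}^T \Sigma \, \mathbf{a} \quad \text{subject to } \mathbf{a}^T \mathbf{a} = 1, \tag{3}$$

where the matrix $\Sigma$ is the covariance matrix of the dataset $X$. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$ be the eigenvalues of $\Sigma$, and let $\mathbf{a}_1, \ldots, \mathbf{a}_n$ be their corresponding eigenvectors. Then, the $k$th principal component of the data $\mathbf{x}$ is represented by

$$z_k = \mathbf{a}_k^T \left(\mathbf{x} - \bar{\mathbf{x}}\right), \tag{4}$$

where $\bar{\mathbf{x}}$ is the mean vector of the dataset.

The original dataset can be decomposed into a low-dimensional representation by selecting a certain number of the principal components. The most useful contribution of PCA to the Pareto dataset is that the entire Pareto dataset can be approximated by the mean vector of the dataset and a linear combination of a specified number $p \le n$ of eigenvectors with coefficients $c_k$:

$$\mathbf{x} \approx \bar{\mathbf{x}} + \sum_{k=1}^{p} c_k \mathbf{a}_k. \tag{5}$$

A useful criterion for choosing the number of components is the cumulative proportion of the variance $C_p$, defined below. This metric indicates the extent to which the first $p$ principal components explain the original dataset. Consider

$$C_p = \frac{\sum_{k=1}^{p} \lambda_k}{\sum_{k=1}^{n} \lambda_k}. \tag{6}$$

From the axes associated with each eigenvector, we construct a meaningful new decision space. Here, we define each eigenvector as a “design mode.” Along the axes indicated by the design modes, we examine the features of the obtained solutions. To analyze the correlation between the axis of the $i$th decision variable in the original decision space and the new axis indicated by the $k$th design mode, we calculate the component loading:

$$r_{ik} = \frac{\sqrt{\lambda_k} \, a_{ki}}{\sqrt{s_{ii}}}, \tag{7}$$

where $s_{ii}$ is the variance of $x_i$ (the $i$th diagonal element of $\Sigma$) and $a_{ki}$ is the $i$th element of eigenvector $\mathbf{a}_k$. The component loading specifies the importance of the decision variable in constructing the $k$th principal component. Notably, if PCA is executed on a standardized dataset, the covariance matrix is equivalent to the correlation matrix of $X$. In this case, $s_{ii} = 1$ and $r_{ik} = \sqrt{\lambda_k} \, a_{ki}$.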
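As an illustration of the quantities discussed in this subsection (eigenvectors as design modes, cumulative proportion of variance, and component loadings), the following is a minimal NumPy sketch. It assumes the common form of the loading, the correlation between a variable and a principal component, and sets the loading of zero-variance variables to zero as is done later in the case study. The function name and the small dataset are hypothetical.

```python
import numpy as np

def design_modes(X):
    """Return mean, eigenvalues (descending), design modes (columns),
    cumulative proportion of variance, and component loadings."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)               # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending order for symmetric input
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # clip tiny negative round-off
    eigvecs = eigvecs[:, order]
    cum_prop = np.cumsum(eigvals) / eigvals.sum()
    std = np.sqrt(np.diag(cov))
    # loadings[i, k] = sqrt(lambda_k) * a_k[i] / std_i; zero-variance variables get 0
    with np.errstate(divide="ignore", invalid="ignore"):
        raw = eigvecs * np.sqrt(eigvals) / std[:, None]
    loadings = np.where(std[:, None] > 0, raw, 0.0)
    return mean, eigvals, eigvecs, cum_prop, loadings

# Hypothetical dataset: x2 is roughly 2 * x1 and x3 is constant.
X_demo = [[1, 2.0, 0], [2, 4.1, 0], [3, 5.9, 0], [4, 8.2, 0], [5, 9.8, 0]]
mean, eigvals, modes, cum_prop, loadings = design_modes(X_demo)
# Smallest number of design modes whose cumulative proportion reaches 0.8
p = int(np.searchsorted(cum_prop, 0.8) + 1)
```

In this toy example a single design mode already explains nearly all of the variance, and the constant third variable receives zero loading on every mode.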

This section has outlined some important aspects of design mode analysis:
(i) Applying PCA to a Pareto set enables the dominant designs and decision variables to be extracted.
(ii) Pareto solutions can be approximated by the mean vector of the dataset and the linear combination of a certain number of eigenvectors.
(iii) Each eigenvector forms a meaningful new axis in the decision space, and correlations between the original and new axes are quantitatively evaluated by the component loadings.

3.2. Framework of Design Mode Analysis

In the previous subsection, we explained the concept of the design mode. Here, we explain the framework of design mode analysis. The main steps of the design mode analysis are as follows: generate the dataset; cluster the dataset; perform PCA; and either perform correlation analysis or construct a new design. The proposed framework is illustrated in Figure 2.

Our proposed approach differs from the conventional method as follows:
(1) data clustering is incorporated into the analysis;
(2) design mode characterization is achieved by studying the component loading of each design mode.
The following subsections provide detailed descriptions of each step in the framework.

3.2.1. Generating the Dataset

The first step in our proposed analysis is to choose how to generate the dataset (design examples). One of the easiest ways to do this is to use statistical sampling methods such as Latin hypercube sampling. An expert engineer can also supply representative designs. The dataset generation depends on the objective of the design mode analysis. For example, if the engineer wants to study the characteristics of the whole design space uniformly, statistical sampling is preferred. Alternatively, if the aim is to elucidate the design methodology of an expert, then many design examples constructed by experts must be collected. In this study, we investigate the decision space characteristics around the Pareto solution set and therefore obtain the solutions by EMO. Notably, our proposed framework does not prescribe the data generation method. Multiobjective optimization is only one of the tools available to generate characteristic designs.
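For readers unfamiliar with Latin hypercube sampling, a minimal standard-library sketch is given below. It is one simple variant (one random point per equal-width stratum in each dimension, with the strata shuffled independently per dimension); the function name, bounds, and seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

def latin_hypercube(n_samples: int,
                    bounds: Sequence[Tuple[float, float]],
                    seed: int = 0) -> List[List[float]]:
    """Draw n_samples designs so that every variable covers each of its
    n_samples equal-width strata exactly once."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniformly random point inside each stratum of the unit interval
        strata = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(strata)  # decouple the strata ordering between variables
        columns.append([low + u * (high - low) for u in strata])
    return [list(row) for row in zip(*columns)]

# Five designs with two hypothetical decision variables
samples = latin_hypercube(5, [(0.0, 1.0), (10.0, 20.0)], seed=1)
```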

3.2.2. Clustering the Dataset

By incorporating data clustering into data preprocessing, we can obtain reliably distinguished design modes. Data clustering for the Pareto solution set can be applied to either the decision or the objective space. However, since design mode analysis is conducted on the decision space, clustering should ideally be performed on the decision space. Data clustering was not considered as a part of the conventional method proposed by Oyama et al. [4, 5]. However, since data clustering screens all the input designs and divides them into representative designs and their similar counterparts, its inclusion is advantageous. Any data clustering method is suitable. In this study, we adopt $k$-means clustering, which divides the dataset into $k$ clusters. Each datum is assigned to the cluster with the nearest mean vector.
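The clustering step can be sketched with a basic Lloyd-style k-means in plain Python. This is an illustrative implementation, not the one used in the study; the random initialization from data points, the seed, and the toy data are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

def kmeans(data: Sequence[Sequence[float]], k: int,
           iters: int = 100, seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Assign each datum to the cluster with the nearest mean vector."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(data), k)]  # initial means
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in data]
        if new_labels == labels:      # assignments unchanged: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(data, labels) if l == c]
            if members:               # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Two well-separated hypothetical groups of designs
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(data, k=2)
```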

3.2.3. Principal Component Analysis on the Dataset

Once the clustering process is complete, each cluster is subjected to PCA. Different design modes are expected to be derived from each cluster. Moreover, the mean vector of each cluster is the representative design of each cluster, and all solutions in each cluster are approximated by a linear combination of the design modes (eigenvectors), in the coordinate system whose origin is the mean vector.

3.2.4. Correlation Analysis

Having obtained the design modes for each cluster, we study each design mode by referring to the component loadings. This process identifies the dominant decision variable in the design mode, thereby revealing the important factors for creating the new designs in each cluster. Decision variables that make low contributions to the design mode can be eliminated from the variables and set as constants instead. The component loadings characterize the design modes and give us their features. This process was not considered in the conventional method either. It will give us useful information about the target problem.

3.2.5. Constructing a New Design

Our proposed method, called design mode analysis, serves both as an analytical tool and as a design support tool. Since most designs in each cluster can be approximated in the decision space formed by the design modes, we can easily generate new designs with the same features in this space, using (5). Note that random sampling in the original decision space does not easily create designs that share the characteristics of a specific design, especially if that decision space is high-dimensional. We summarize the proposed framework in the pseudocode shown in Algorithm 1.

Algorithm 1: Design mode analysis.
 (1) Generate a design dataset $X = (\mathbf{x}_1, \ldots, \mathbf{x}_N)$ of size $N$.
 (2) Divide the dataset into $H$ clusters
     by using data clustering.
 (3) for $i = 1$ to $H$ do
 (4)   Extract the design modes $(\mathbf{a}_1, \ldots, \mathbf{a}_n)$ by
       applying PCA to the $i$th cluster.
 (5)   Calculate and study the component loadings
       $(r_{1k}, \ldots, r_{nk})$ for each $\mathbf{a}_k$ using (7).
 (6)   Choose a base design $\mathbf{b}$ from the $i$th cluster,
       or calculate a mean vector instead of it.
 (7)   Choose a number of design modes $p$
       used for generating new designs.
 (8)   Generate a new design $\mathbf{x}'$ based on
       $\mathbf{x}' = \mathbf{b} + \sum_{k=1}^{p} c_k \mathbf{a}_k$.
       Coefficient $c_k$ is an arbitrary constant.
 (9) end for
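Steps (6) to (8) of Algorithm 1 can be sketched as follows for a single cluster. The sketch assumes that the base design is the cluster mean and that the design modes are the leading eigenvectors of the sample covariance matrix; the function name, the toy cluster, and the coefficient value are hypothetical.

```python
import numpy as np

def new_design(cluster, coeffs):
    """Base design (cluster mean) plus a linear combination of the
    len(coeffs) leading design modes of the cluster."""
    cluster = np.asarray(cluster, dtype=float)
    base = cluster.mean(axis=0)                       # step (6): mean as base
    _, eigvecs = np.linalg.eigh(np.cov(cluster, rowvar=False))
    modes = eigvecs[:, ::-1][:, :len(coeffs)]         # step (7): p leading modes
    return base + modes @ np.asarray(coeffs, dtype=float)  # step (8)

# Hypothetical cluster whose designs all satisfy x2 = 2 * x1
cluster = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
x_new = new_design(cluster, [1.5])
```

Because the only design mode of this toy cluster points along the line x2 = 2 * x1, the generated design stays on that line, which illustrates how new designs inherit the features of the cluster.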

The following subsection demonstrates the effectiveness of each step in our proposed method through a series of experiments.

3.3. Case Study: Multiobjective 0/1 Knapsack Problem

In this subsection, our proposed design mode analysis method is applied to the multiobjective 0/1 knapsack problem (MOKP). The effectiveness of the method is investigated in terms of the following outcomes:
(i) how well each design mode characterizes the dataset,
(ii) whether the data clustering process effectively distinguishes the design modes,
(iii) whether any design in each cluster can be approximated by the mean design and a linear combination of the design modes in the cluster,
(iv) whether our proposed method can be applied to binary-valued problems.

3.3.1. Experimental Setup

The target design problem, MOKP, is a multiobjective extension of the classic 0/1 knapsack problem (KP), an NP-complete combinatorial problem. Items with individual values and weights are packed into a knapsack with a capacity constraint; because not all items can be picked, the items must be chosen so as to maximize the total profit. In combinatorial optimization, this is popularly known as the classic KP, and it often appears in real-world decision-making problems in various fields, such as production scheduling or portfolio management. Figure 3 illustrates the classic KP. In KP, a binary decision variable indicates whether each item is included in the knapsack, and the total profit of the items packed into the knapsack is defined as the objective function. The objective is to find a subset of the items with a total weight not exceeding the knapsack capacity, while maximizing the total profit. MOKP is an extended form of KP in which the number of knapsacks is increased.

The $m$-objective MOKP with $n$ decision variables is formulated as

$$\text{maximize } f_j(\mathbf{x}) = \sum_{i=1}^{n} p_{ji} x_i, \quad j = 1, \ldots, m, \qquad \text{subject to } g_j(\mathbf{x}) = \sum_{i=1}^{n} w_{ji} x_i \le W_j, \quad x_i \in \{0, 1\}, \tag{8}$$

where $p_{ji}$ indicates the profit of the $i$th item in calculating the function value for the $j$th knapsack. In the constraint function $g_j$, $w_{ji}$ is the weight, and $W_j$ is the upper limit value of $g_j$. The test case is the 2KP50-11 dataset selected from MCDMlib [10], a collection of datasets available for testing various multiobjective optimization problems. The test case is a biobjective (two-knapsack) problem with 50 decision variables (items). Each item has the same weight in both knapsacks, but its profits differ.
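A small evaluation routine makes the formulation concrete. The profits, weights, and capacities below are hypothetical toy values (they are not the 2KP50-11 data), and the function name is chosen for illustration.

```python
from typing import List, Sequence, Tuple

def evaluate_mokp(x: Sequence[int],
                  profits: Sequence[Sequence[float]],
                  weights: Sequence[Sequence[float]],
                  capacities: Sequence[float]) -> Tuple[List[float], bool]:
    """Return the profit of each knapsack and whether all capacity
    constraints hold for the binary selection vector x."""
    objectives = [sum(p * xi for p, xi in zip(row, x)) for row in profits]
    loads = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    feasible = all(load <= cap for load, cap in zip(loads, capacities))
    return objectives, feasible

# Toy instance: 2 knapsacks, 3 items, identical weights in both knapsacks
profits = [[10, 5, 8], [3, 9, 4]]
weights = [[4, 3, 5], [4, 3, 5]]
capacities = [8, 8]

objs_a, ok_a = evaluate_mokp([1, 1, 0], profits, weights, capacities)
objs_b, ok_b = evaluate_mokp([1, 0, 1], profits, weights, capacities)
```

Selecting items 1 and 2 gives a total weight of 7 and is feasible, whereas selecting items 1 and 3 gives a total weight of 9 and violates the capacity of 8.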

The nondominated solution set of 2KP50-11 is obtained by the nondominated sorting genetic algorithm II (NSGA-II) [9], one of the most efficient MOEAs. Two-point crossover (crossover rate = 1.0) and bit-flip mutation (mutation rate = 1/chromosome length) are used. The population size is set to 120. A single run of NSGA-II is terminated after 1000 generations, and 30 runs of NSGA-II are executed. The crossover and mutation rates used here follow the practice in [6, 9]. The population size and the number of generations are empirically chosen (not optimized). Although the choice of the genetic algorithm parameters (population size, number of generations, crossover, and mutation rates) may result in different optimum solutions, adjusting and studying the parameter setting is not the focus of our study. The multiobjective optimization is only the tool used to generate the dataset to be analyzed.

Notably, MOKP comprises binary-valued decision variables. Since PCA operates on real-valued variables, we must treat the binary variables as continuous when applying PCA to the solution dataset of MOKP.

3.3.2. Results and Discussion

Figure 4 shows the nondominated solutions obtained by NSGA-II. After deleting the duplicate solutions obtained in the 30 runs, we obtained 53 solutions.

PCA was then carried out on the decision variables of the nondominated solution set. The results are summarized in Table 1. We regard the original dataset as adequately explained when the cumulative proportion of explained variance is ≥0.8. In this case, the first eight design modes are needed to explain the dataset. To visualize the characteristics of the design modes, the component loadings are calculated and plotted in Figure 5. The component loadings of elements with zero variance are plotted as zero, because (7) cannot be evaluated when the variance is zero. To interpret this distribution, we check each element; if the absolute value of the $i$th element in the $k$th design mode is large, then packing or discarding the $i$th item strongly affects the $k$th design mode. Each design mode has a unique distribution of its component loadings, indicating that several different strategies will successfully pack the items into two knapsacks while maximizing the profits.

Next, to evaluate the effectiveness of the data clustering, $k$-means clustering was performed on the dataset, yielding three clusters. Figure 7 plots these clusters in the objective space.

Here, the first design mode is the design mode corresponding to the eigenvector with the maximum eigenvalue. Following PCA, a different design mode (i.e., the first design mode) was obtained for each cluster and for the entire dataset, as shown in the component loading plots of Figure 6. Thus, apparently, data clustering distinguishes the design modes effectively.

If PCA successfully extracts the design modes of each cluster, any design in any cluster can be approximated by the mean design and a linear combination of the design modes within the cluster. To verify this assumption, we approximated the nondominated solution set by its mean vector and effective design mode vectors (sufficient to achieve a cumulative proportion of the variance ≥0.80). The procedure for approximating the solution set is shown below.

Design Approximation Method.
(1) Choose a target design $\mathbf{x}_j$ from the dataset $X$.
(2) Choose $p$ eigenvectors so as to account for a cumulative proportion of the variance $\ge 0.8$.
(3) Find optimum coefficients $c_k$ ($k = 1, \ldots, p$) of the approximated design calculated by (5) so as to minimize the sum of the squared error (SSE) between the target and approximated design in the decision space:

$$\min_{c_1, \ldots, c_p} \ \mathrm{SSE} = \sum_{i=1}^{n} \left( x_{ji} - \hat{x}_{ji} \right)^2, \tag{9}$$

where $x_{ji}$ is the $i$th decision variable of the $j$th design in the dataset and $\hat{\mathbf{x}}_j$ is the approximated design of $\mathbf{x}_j$. Notably, if the decision variables in the original dataset are subject to constraints, these constraints should be added to (9).
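When no additional constraints are imposed, minimizing the SSE is an ordinary linear least-squares problem, and for orthonormal design modes the optimal coefficients coincide with the projections of the centered target onto the modes. The sketch below illustrates this unconstrained case only; the function name and the toy mean, modes, and target are hypothetical.

```python
import numpy as np

def approximate_design(target, mean, modes):
    """Least-squares coefficients of the design modes (columns of `modes`)
    that best reproduce `target` around `mean`, plus the resulting SSE."""
    target = np.asarray(target, dtype=float)
    mean = np.asarray(mean, dtype=float)
    modes = np.asarray(modes, dtype=float)
    coeffs, *_ = np.linalg.lstsq(modes, target - mean, rcond=None)
    approx = mean + modes @ coeffs
    sse = float(np.sum((target - approx) ** 2))
    return coeffs, approx, sse

# Toy example: three decision variables, two orthonormal design modes
mean = [0.5, 0.5, 0.5]
modes = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
target = [1.0, 0.0, 1.0]
coeffs, approx, sse = approximate_design(target, mean, modes)
```

The third variable is not covered by either mode, so its deviation from the mean (0.5) remains as the residual error.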

As an example, consider a target design in the solution with a maximum value of , and assume that the design belongs in Cluster 1. To obtain the approximation error in the objective space, we must evaluate the objective function value of the approximated design. The solution obtained by this approximation method can be real valued. To evaluate the objective function values of MOKP, the approximated decision variables should be converted to binary values. In this experiment, the approximated variable is rounded to the nearest whole number. If the integer lies outside of , it is assumed to be 0 or 1; that is, if it is smaller than 0, it is assumed to be 0, and if it is larger than 1, it is assumed to be 1.

Results of the design approximation are summarized in Table 2, where is the constraint value in (8). SSE is the approximated error in the decision space defined by (9). is the Hamming distance between the target and the approximated design in the decision space. is the Euclidean distance between the target and the approximated design in the objective space, and is the number of design modes used in the approximation. We tried two different methods of design approximation. In the first, PCA was applied to all solutions, and each design was approximated by the mean vector and the design modes of all solutions. In the second, PCA and design approximation were executed on the Cluster 1 dataset. The results of both trials are listed for comparison in Table 2. The approximated designs in the objective space are plotted in Figure 8.
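The rounding rule and the two distance measures described above can be sketched in a few lines of Python. The numeric values are hypothetical, and note that Python's built-in round sends exact halves to the nearest even integer, which may differ from other rounding conventions.

```python
import math
from typing import List, Sequence

def to_binary(approx: Sequence[float]) -> List[int]:
    """Round each real-valued variable to the nearest integer, then
    force values below 0 to 0 and values above 1 to 1."""
    return [min(1, max(0, round(v))) for v in approx]

def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions in which two binary designs differ."""
    return sum(ai != bi for ai, bi in zip(a, b))

approx_design = [0.93, -0.7, 1.6, 0.2, 0.49]   # hypothetical approximation
target_design = [1, 0, 1, 1, 0]                # hypothetical target
binary_design = to_binary(approx_design)
d_hamming = hamming(target_design, binary_design)

# Euclidean distance between two hypothetical objective vectors
d_objective = math.dist((100.0, 80.0), (97.0, 84.0))
```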

Table 2 indicates that the target design was successfully approximated from the data in Cluster 1 alone (, ). When the approximation was built from all solutions, the SSE was an order of magnitude greater. These results are also evident in the plots of Figure 8.

To ensure that data clustering effectively distinguishes between the design modes, we analyzed the performance of the approximated design. In this analysis, we adjusted the size of each eigenvector before adding it to the mean vector. The upper charts in Figure 9 show the distributions of the elements of the mean and target design elements, while the lower charts show the distributions of the elements of the modal eigenvectors built into the approximation.

To approximate the target design, elements of the mean design with values different from the target values are altered by adding the eigenvectors. Arrows on the charts indicate the directions of the altered elements. For instance, observe the 37th variable emphasized by the hatched pattern in Figures 9(a) and 9(b). In the design approximation without data clustering (Figure 9(a)), the magnitude of the 37th variable is much smaller (±0.1) than that obtained after data clustering (Figure 9(b)). In this case, a larger coefficient is required to successfully approximate the 37th variable. However, an appropriate coefficient for a single variable is difficult to determine because the coefficient affects all other elements. Conversely, in the design approximated from the clustered data, the difference between the mean design and the target design is compensated by the corresponding element of the eigenvectors. These results show that data clustering is effective for extracting precise design modes.

This case study also highlights the importance of granularity in our proposed design mode analysis. In the absence of clustering, we assume that some decision variables contribute negligibly to the design. Thus, granularity exists in the design mode extraction. Data clustering increases the granularity of the design modes. Watanabe et al. [11] proposed an interactive granularity control method, which is applicable to our proposed design mode analysis.

3.4. Granularity in Design Mode Analysis

In the above case study, we introduced the concept of “granularity” in design mode analysis. The granularity of the design mode extraction required by the DM depends on the situation. For example, a researcher interested in the characteristics of specific clusters may divide the dataset into several clusters and characterize the clusters by PCA. At the beginning of the analysis, the characteristics of the decision space can be coarsely determined by imposing a low granularity. Once the design mode has been refined, a high granularity is expected. This subsection focuses on granularity and proposes a more general framework for design mode analysis.

Following Watanabe et al. [11], we adopted a hierarchical approach. The proposed framework iterates binary clustering and PCA and approximates a design for each cluster. Different from Watanabe et al., we controlled the granularity of the design mode analysis by the accuracy of the approximated design. Our proposed framework is illustrated in Figure 10.

The design approximation process is discussed in Section 3.3.2. If the design modes are successfully derived, any design can be approximated by a linear combination of the design modes, as shown in (5). In this equation, the base design is the mean vector of the cluster, but the base design can be any design in the cluster, provided that it retains the average or representative characteristics of its own cluster. The easiest way to choose the representative design is to calculate the mean vector and set it as the base.

Once the design approximation is complete, DM checks whether the analysis has adequately converged or whether analysis should be continued. The convergence is evaluated by the errors in the decision and objective spaces. DM may specify a threshold for each approximation error. If high quality design modes or design strategies are obtained, DM can terminate the analysis.

Otherwise, the design modes are refined by dividing each cluster into two new clusters and progressing to the next layer of granularity. Notably, PCA and clustering are not performed on clusters of a single datum. In this case, the next layer inherits the cluster.
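The hierarchical procedure of this subsection can be sketched as a recursion. This is only an illustration under several assumptions that the text does not specify: convergence is judged solely by the largest squared error in the decision space, the binary k-means uses a deterministic initialization, and a maximum depth guards the recursion. All function names, the tolerance, and the toy data are hypothetical.

```python
import numpy as np

def max_sse(X, p):
    """Largest squared error when each design in X is rebuilt from the
    cluster mean and the p leading design modes."""
    if len(X) < 2:
        return 0.0
    centered = X - X.mean(axis=0)
    _, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
    modes = vecs[:, ::-1][:, :p]                  # leading eigenvectors first
    resid = centered - centered @ modes @ modes.T
    return float((resid ** 2).sum(axis=1).max())

def two_means(X, iters=50):
    """Binary k-means with an assumed deterministic initialization."""
    c0 = X[np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))]
    c1 = X[np.argmax(np.linalg.norm(X - c0, axis=1))]
    centers = np.stack([c0, c1])
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in (0, 1):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def refine(X, p, tol, depth=0, max_depth=5):
    """Split clusters in two until every design is approximated within tol.
    Single-datum clusters are passed on unchanged."""
    X = np.asarray(X, dtype=float)
    if len(X) < 2 or depth >= max_depth or max_sse(X, p) <= tol:
        return [X]
    labels = two_means(X)
    if labels.min() == labels.max():              # split failed: stop here
        return [X]
    return (refine(X[labels == 0], p, tol, depth + 1, max_depth)
            + refine(X[labels == 1], p, tol, depth + 1, max_depth))

# Two hypothetical groups, each varying along a different single direction
X_demo = [[0, 0], [1, 0], [2, 0], [10, 10], [10, 11], [10, 12]]
clusters = refine(X_demo, p=1, tol=1e-9)
```

With one design mode, the whole toy dataset cannot be approximated accurately, but after one binary split each cluster is described exactly by its own mode, so the recursion stops at the second layer.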

The above-mentioned procedures yield design modes at any level of granularity.

4. Application to Conceptual Design of Hybrid Rocket Engine

In this section, our proposed design mode analysis method is applied and tested on the conceptual design of a hybrid rocket engine. This problem, which is one of the most useful real-world optimization problems for testing the performance of optimization algorithms [12], was first proposed by Kosugi et al. [13]. The executable software for objective function evaluation is available from the website [14].

4.1. Problem Definition

A hybrid rocket engine stores its propellants in two different phases. It is becoming increasingly popular because of its low environmental impact, flexible thrust control by throttling, and reduced chemical explosion hazard. In the hybrid rocket engine, the thrust and the engine design are strongly correlated because thrust is obtained by combustion in the boundary layer diffusion flame. Thus, designing the solid fuel geometry and the oxidizer supply system is both important and difficult.

The hybrid rocket investigated comprises four parts: a payload, an oxidizer tank, a thrust chamber, and a nozzle. The thrust is provided to the hybrid rocket by combustion in a turbulent boundary layer in the thrust chamber. The oxidizer and mass/fuel ratio also affect the thrust. The latter is determined by the fuel parameters, namely, the oxidizer, fuel length, and initial port radius. Hybrid rocket design problems constitute two-objective optimization problems, in which the fuel parameters must be optimized to maximize the altitude gained, while minimizing the gross weight. A schematic of the hybrid rocket is shown in Figure 11.

Here, the six-dimensional decision space comprises the initial mass flow of the oxidizer [kg/s], fuel length [m], initial port radius [m], combustion time [s], initial pressure in the combustion chamber [MPa], and aperture ratio of nozzle . The two-objective functions aim to maximize the altitude [km] and simultaneously minimize the gross vehicle weight [kg]. The following motion equation is assumed during the flight analysis: where is acceleration at time , is the thrust [N], is the total drag [N], and [m/s2] is gravitational acceleration. The following equation relates the thrust to the aperture ratio of the nozzle and the pressure in the combustion chamber [MPa]: where is the total thrust loss coefficient and is the momentum loss coefficient, embodying the effect of friction (<1) at the nozzle exit. is the mass flow of propellant, and and denote the velocity and pressure at the nozzle exit, respectively. denotes the atmospheric pressure at flight altitude, and is the area of the nozzle exit.

The drag is decomposed into the pressure drag and the friction drag . The parameters , , and are not described because of space limitations. For details on these parameters, the reader is referred to [13, 14].

The gross weight is estimated by where and are payload and engine weights, respectively. is the total mass of the oxidizer, and is the total fuel mass. The mass of the oxidizer tank, combustion chamber, and other equipment is denoted by ,  , and , respectively. and are the integrated volumes of a material for the oxidizer tank and the combustion chamber, respectively.

Given (12), we define the multiobjective optimization problem of the hybrid rocket engine design as Here, and are set to 1.0 and 50 [kg], respectively. , , , and are assumed as constants. That is, this design problem seeks the most lightweight rocket that does not compromise the flight altitude.

4.2. Multiobjective Optimization

In this subsection, the dataset is the nondominated solution set of the hybrid rocket engine design problem. The solutions are derived by NSGA-II. The population size is set to 120. The analysis assumes a simulated binary crossover (SBX) with a crossover rate of 1.0 and a polynomial mutation with a mutation rate of 1/(chromosome length). The decision variable vector of a single solution is represented as in NSGA-II. Each decision variable is binary coded with a length of 20 bits, giving a chromosome length of 120. A single run of NSGA-II is terminated after 188 generations. The objective function was calculated 22680 times, yielding 120 solutions. The crossover and mutation rates used here follow the practice in [6, 9]. The population size, the number of generations, and other genetic parameters are empirically chosen. A parameter study to find an appropriate parameter setting is omitted because improving the optimization accuracy is not the focus of our study. The obtained nondominated solutions are plotted in Figure 12.
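The reported number of function evaluations is consistent with the stated population size and generation count if we assume (this is not stated explicitly) that the initial population of 120 is evaluated once and that each of the 188 generations evaluates 120 offspring:

$$N_{\text{eval}} = 120 \times (1 + 188) = 120 \times 189 = 22680.$$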

Intuitively, more fuel will achieve higher altitude; however, fuel increases the weight of the rocket. The weight-altitude tradeoff is evident in Figure 12 but is difficult to visualize in the six-dimensional decision space. For this reason, our design mode analysis is effective for analyzing the characteristics of high-dimensional decision spaces.

4.3. Design Mode Analysis

In this subsection, we extract the decision space characteristics of the nondominated solutions for the hybrid rocket engine design and derive an appropriate design strategy using our proposed design mode analysis (described in Section 3.4). Before running PCA, we standardize the decision variables of the hybrid rocket design so that each has zero mean and unit variance; PCA is therefore performed on the correlation matrix. A dataset should be normalized in this way when the ranges and scales of its variables differ from one another. Otherwise, as in the case of MOKP, PCA uses the covariance matrix so that the variances are preserved without normalizing the dataset.
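
The equivalence between standardizing the data and using the correlation matrix can be checked numerically. The sketch below assumes NumPy and a hypothetical random dataset of 120 designs with six variables on very different scales; it is an illustration, not the rocket dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical dataset: 120 designs, 6 decision variables with different scales.
X = rng.normal(size=(120, 6)) * np.array([1.0, 10.0, 0.1, 50.0, 5.0, 2.0])

# Standardize each decision variable to zero mean and unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# The covariance matrix of the standardized data equals the correlation matrix of X.
cov_Z = np.cov(Z, rowvar=False)
corr_X = np.corrcoef(X, rowvar=False)

# Design modes: eigenvectors of the correlation matrix, sorted by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(corr_X)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Proportion of the total variance carried by each mode.
proportion = eigvals / eigvals.sum()
```

Because every standardized variable has unit variance, the eigenvalues sum to the number of variables (six here), so no single large-scale variable can dominate the modes.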

The base of the approximated design is set to the mean vector of each cluster, where binary clustering is performed by the k-means method with the number of clusters set to 2. The distance metric in k-means clustering is the Euclidean distance. The error of the approximated design in the decision space is the sum of squares of the relative error (SSRE):

$$\mathrm{SSRE} = \sum_{i}\sum_{j}\left(\frac{x_{ij}-\hat{x}_{ij}}{x_{ij}}\right)^{2},$$

where $x_{ij}$ is the $j$th decision variable of the $i$th design in the dataset and $\hat{x}_{ij}$ is the approximated design of $x_{ij}$. As noted above, PCA is performed on the correlation matrix. The SSRE is minimized by sequential least squares programming [15].
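
The following sketch illustrates the SSRE and a design approximation of the form base plus a combination of design modes. It makes several assumptions that are not from the paper: the SSRE is taken as the squared error divided by the original value, summed over all variables and designs; the mode coefficients are found by weighted linear least squares (via NumPy's `lstsq`) rather than by sequential least squares programming; and, for brevity, the single mode comes from the centered data in original units rather than from the correlation matrix. The data and function names are hypothetical.

```python
import numpy as np


def ssre(X, X_hat):
    """Sum of squares of the relative error between designs and approximations."""
    return float(np.sum(((X - X_hat) / X) ** 2))


def approximate(x, base, modes):
    """Approximate design x as base + modes @ a, choosing a to minimize the SSRE.

    Dividing each row by x_j turns the relative-error objective into an
    ordinary linear least-squares problem in the coefficients a.
    """
    A = modes / x[:, None]            # row j scaled by 1 / x_j
    b = (x - base) / x
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    return base + modes @ a


# Hypothetical cluster of four 3-variable designs (all values nonzero).
X = np.array([[1.0, 2.0, 4.0],
              [2.0, 3.9, 8.2],
              [3.0, 6.1, 11.9],
              [4.0, 8.0, 16.0]])
base = X.mean(axis=0)                 # base design: the cluster mean

# One design mode: leading right singular vector of the centered data.
_, _, Vt = np.linalg.svd(X - base, full_matrices=False)
modes = Vt[:1].T                      # shape (3, 1)

X_hat = np.array([approximate(x, base, modes) for x in X])
error = ssre(X, X_hat)
```

Minimizing the relative error weights small-valued variables more heavily than an ordinary projection would, which is why the fitted coefficients can differ from the plain PCA scores.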

If some decision variables are integer valued, we treat them as real-valued variables throughout the design mode analysis. When the designs based on the design modes are evaluated in the objective space, these originally integer variables should be rounded to the nearest whole number.

The pseudocode of the extended framework of design mode analysis is shown in Algorithm 2.

    (1) Generate a design dataset of size N.
    (2) Scale the dataset such that all decision variables have zero mean and unit variance.
    (3) Initialize total approximation error E = ∞.
    (4) Set a threshold η for E.
    (5) Initialize layer counter i = 1.
    (6) while E > η do
    (7)     Initialize the number of clusters H in the current layer.
    (8)     Initialize the counter of the clusters in the new layer k = 1.
    (9)     Initialize E = 0.
   (10)     for j = 1 to H do
   (11)         Extract the design modes by applying PCA to cluster Ci-j.
   (12)         Calculate the component loading for each design mode.
   (13)         Choose a base design in Ci-j, or calculate the mean vector of Ci-j.
   (14)         Choose p design modes so as to satisfy cumulative proportion of the variance P ≥ 0.80.
   (15)         Perform Design Approximation (mentioned above) for all the designs in Ci-j.
   (16)         Add the approximation error for Ci-j to the total error E.
   (17)         Divide the cluster Ci-j into two clusters C(i+1)-k and C(i+1)-(k+1) by using data clustering.
   (18)         k = k + 2.
   (19)     end for
   (20)     i = i + 1.
   (21) end while
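
The loop in Algorithm 2 can be sketched in Python as follows. This is an illustrative reading under stated assumptions, not the authors' implementation: it assumes NumPy; it uses the cluster mean as the base design; it replaces the SSRE minimization with an orthogonal projection onto the chosen modes and measures the error as a plain sum of squares in standardized units; it applies PCA to the covariance of the globally standardized data within each cluster; it omits step (12); and it stops splitting clusters that have no more designs than decision variables. The names `pca_modes`, `two_means`, and `design_mode_analysis`, the `max_layers` cap, and the random data are all hypothetical.

```python
import numpy as np


def pca_modes(Z, target=0.80):
    """Return the cluster mean and the fewest leading modes reaching `target`."""
    mean = Z.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    p = int(np.searchsorted(cumulative, target)) + 1
    return mean, eigvecs[:, :p]


def two_means(Z, iters=50):
    """Split Z into two clusters with a basic k-means (k = 2, Euclidean)."""
    # Deterministic start: the point farthest from the mean, then the
    # point farthest from that one.
    c0 = Z[np.argmax(np.linalg.norm(Z - Z.mean(axis=0), axis=1))]
    c1 = Z[np.argmax(np.linalg.norm(Z - c0, axis=1))]
    centers = np.stack([c0, c1])
    for _ in range(iters):
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new = np.stack([Z[labels == k].mean(axis=0) for k in (0, 1)])
        if np.allclose(new, centers):
            break
        centers = new
    return Z[labels == 0], Z[labels == 1]


def design_mode_analysis(X, eta, max_layers=5):
    """Return the total approximation error recorded at each layer."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)        # step (2)
    clusters, history, E = [Z], [], np.inf                   # steps (3), (5)
    while E > eta and len(history) < max_layers:             # step (6)
        E, next_layer = 0.0, []                              # step (9)
        for C in clusters:                                   # step (10)
            if len(C) <= C.shape[1]:
                # Too few designs for PCA: use the mean and stop splitting.
                E += float(np.sum((C - C.mean(axis=0)) ** 2))
                next_layer.append(C)
                continue
            mean, modes = pca_modes(C)                       # steps (11), (13), (14)
            C_hat = mean + (C - mean) @ modes @ modes.T      # step (15)
            E += float(np.sum((C - C_hat) ** 2))             # step (16)
            next_layer.extend(two_means(C))                  # step (17)
        history.append(E)
        clusters = next_layer                                # step (20)
    return history


rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))      # hypothetical stand-in for the Pareto set
X[:, 1] += 2.0 * X[:, 0]           # correlate two of the variables
history = design_mode_analysis(X, eta=1.0)
```

The returned list holds one total error per layer, which is the quantity one would plot to compare layers. Nothing in this sketch guarantees that the error decreases monotonically from one layer to the next.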

4.4. Results and Discussion

Figure 13 plots the history of the error in the decision space at each layer of the design approximation. Relative to the first layer, the accuracy worsens in the second and third layers, but it gradually improves as the layers are refined further. This indicates that data clustering contributes to design mode classification and improves the accuracy of the approximated design. The trend is more pronounced in the objective space.

Figure 14 plots the history of the error in the objective space at each layer, evaluated as the average Euclidean distance between each real and approximated design.
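
The objective-space error described here can be computed as in the short sketch below. The function name and the sample objective vectors are hypothetical, and the sketch does not rescale the two objectives; whether the paper normalizes them before taking distances is not stated.

```python
import math


def mean_objective_error(actual, approximated):
    """Average Euclidean distance between paired objective vectors."""
    distances = [math.dist(a, b) for a, b in zip(actual, approximated)]
    return sum(distances) / len(distances)


# Hypothetical (altitude [km], gross weight [kg]) pairs, not results from the paper.
actual = [(50.0, 300.0), (80.0, 400.0)]
approx = [(53.0, 304.0), (80.0, 400.0)]
error = mean_objective_error(actual, approx)
```

Here the first pair differs by a distance of 5 and the second pair is identical, so the average is 2.5.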

Figure 15 plots the histories of the average number of clusters (circle-plotted curve) and the average cluster size (average number of designs within each cluster, indicated by the triangle-plotted curve). While the analysis can be continued until the number of clusters equals the dataset size, such refinement is nonsensical because PCA cannot be performed on a single datum. Instead, we stipulate that our proposed analysis be continued while the dataset size is larger than the dimension of the decision variables. In this case, since the decision space is six-dimensional, the analysis is meaningful up to the 5th layer.
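
The limit of the 5th layer can be reproduced with simple arithmetic, assuming that every cluster is split in two at each layer, so that layer i contains 2^(i-1) clusters and the average cluster size is the dataset size divided by that count. The variable names below are illustrative.

```python
n_designs = 120      # nondominated solutions in the dataset
n_variables = 6      # dimension of the decision space

# Advance while the average cluster size still exceeds the dimension.
layer = 1
while n_designs / 2 ** (layer - 1) > n_variables:
    layer += 1
deepest_layer = layer - 1

# Average cluster size at each layer up to one past the limit.
average_sizes = [n_designs / 2 ** (i - 1) for i in range(1, layer + 1)]
```

Under this assumption the average cluster size is 7.5 at the 5th layer, which still exceeds six, and 3.75 at the 6th layer, which does not.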

Figure 16 shows the original and approximated designs of each cluster at each layer. In these plots, “Ci-j” denotes the jth cluster at the ith layer. Notably, the designs around C2-2 in the 2nd layer are retained almost exactly through the 3rd and 4th layers, possibly because these designs have distinguishable characteristics.

For a detailed characterization of each cluster, we consider its component loadings. For illustrative purposes, we investigate only the mode 1 component loadings, although each cluster yielded multiple design modes. The mode 1 component loadings of each cluster at each layer are plotted in Figure 17. A component loading represents the correlation between a design mode and a decision variable. If the magnitude of a loading is large, the design mode is highly correlated with the corresponding decision variable.
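
For PCA on the correlation matrix, a component loading can be computed as the eigenvector entry multiplied by the square root of the corresponding eigenvalue, and this equals the correlation between the variable and the mode's scores. The sketch below checks this numerically, assuming NumPy and a hypothetical random dataset with four variables.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))      # hypothetical data, not the rocket designs
X[:, 1] += 0.8 * X[:, 0]           # correlate two of the variables

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# loadings[j, k]: loading of decision variable j on design mode k.
loadings = eigvecs * np.sqrt(eigvals)

# Direct check: correlation between each variable and each mode's scores.
scores = Z @ eigvecs
direct = np.array([[np.corrcoef(Z[:, j], scores[:, k])[0, 1]
                    for k in range(4)] for j in range(4)])
```

The two arrays agree, so each loading lies between -1 and 1 and its sign shows whether the variable increases or decreases along the mode.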

The component loadings of the initial mass flow of the oxidizer and the fuel length are excessively high in the first layer, so these two parameters are expected to dominate in this problem. To obtain various designs on the nondominated solution set, we can alter both parameters along the first design mode, that is, the design mode corresponding to the eigenvector with the maximum eigenvalue. Notably, both parameters should be moved in the same direction because their component loadings have the same sign. This yields the design mode obtained in the first layer. When we create a new design, we first choose the base design and then modify its decision variables along the direction indicated by each design mode (the eigenvector obtained by PCA). The component loadings represent the correlations between the decision variables and each design mode, and the sign of a component loading indicates the direction of the corresponding decision variable on the axis of the design mode. In the first layer, four of the component loadings are positive and the other two are negative. This indicates that if the four decision variables with positive loadings are changed in the positive direction, the two with negative loadings should be moved in the negative direction along the first design mode.
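
Creating a new design along a design mode amounts to adding a multiple of the mode to the base design. In the sketch below, the base design, the mode entries, and the step size are hypothetical; only the sign pattern (four positive entries, two negative, with the first two sharing a sign) mirrors the description above, and which two entries are negative is an arbitrary choice here.

```python
def move_along_mode(base, mode, step):
    """Return base + step * mode, computed component by component."""
    return [b + step * m for b, m in zip(base, mode)]


# Hypothetical standardized base design and first design mode (six variables).
base = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
mode = [0.6, 0.6, 0.2, 0.3, -0.3, -0.2]

new_design = move_along_mode(base, mode, step=2.0)
```

A positive step raises the variables with positive mode entries and lowers those with negative entries; a negative step does the opposite.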

In the second layer, each cluster appears to yield a different design mode. However, the distribution of the component loadings over a range of the decision variables is almost identical between the two clusters, and the component loadings of two variables are merely opposite in sign between the clusters, so we can regard the design modes in the clusters as unchanged from the first layer. Thus, when the dataset is divided into C2-1 and C2-2, the characteristics of the resulting clusters are almost identical, suggesting that binary clustering is uninformative at the second layer. This explains the deterioration in design approximation accuracy at the second layer.

In the third layer, C3-2 and C3-4 are both negatively correlated with the same two decision variables but differ in their correlations with two others. For example, C3-4 yields a new design mode that reverses the sign of four component loadings from positive to negative. While the first and second layers only revealed that the initial mass flow of the oxidizer and the fuel length are dominant, different design modes for each cluster are revealed in the third layer. Moreover, the component loadings of C3-3 and C2-2 appear to be very similar, although the characteristics of C3-3 are expected to dominate over those of C2-2.

In layers 1–3, the component loadings of the higher modes are comparatively low relative to those of the two dominant parameters. However, different design modes can be extracted by increasing the granularity. In cluster C4-4 (layer 4), the component loading of the initial port radius becomes relatively high, while two other component loadings change sign. Thus, C4-4 may provide a design strategy in which the initial port radius is explored in the negative direction while the two variables whose loadings changed sign are moved in the positive direction. We conclude that unique design modes are obtained at the 4th layer.

From the experiments in this section, we infer the following:
(i) the deeper the layer, the better the approximation accuracy (observed in Figures 13 and 14);
(ii) the design mode is characterized by the component loading distributions of each mode;
(iii) in the hybrid rocket design problem, the initial mass flow of the oxidizer and the fuel length dominate the Pareto set;
(iv) different design modes are revealed as the granularity is increased; in cluster C4-4, the initial port radius exhibits a higher component loading than in other clusters.

A remarkable outcome of this study is that new designs with the same characteristics as a specified design mode are obtained. The design mode provides its own design strategy. The characteristics of each design mode are easily understood by investigating their component loadings. The proposed framework is especially useful when the design problem has a huge number of decision variables because it isolates the important parameters and specifies how their values should be altered.

A priority for our future study is to improve the data clustering process. The current framework adopts binary clustering, which does not always perform to the required standard. To realize more effective clustering, we require a scheme that automatically determines an appropriate number of clusters. Ineffective clustering generates many clusters with identical characteristics. The distance metric of the data clustering should also be reviewed. For example, the Mahalanobis distance, which is based on correlations in the dataset, may improve the data classification.

5. Conclusions

We have proposed a design mode analysis of Pareto solution sets that supports human decision making. The design mode of the Pareto solution set was extracted by PCA. We demonstrated that any design in the Pareto set can be represented by a linear combination of the eigenvectors of the base design. From this finding, we developed a hierarchical framework for design mode analysis, in which the granularity of the extracted design modes determines the accuracy of the approximated design. The effectiveness of the proposed method was tested on the conceptual design problem of the hybrid rocket engine. We found that the extracted design modes depended on the granularity of the analysis. The proposed method will support human decision making in engineering design problems.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.