Research Article | Open Access
Junghwan Lee, Yong-Hyuk Kim, "Epistasis-Based Basis Estimation Method for Simplifying the Problem Space of an Evolutionary Search in Binary Representation", Complexity, vol. 2019, Article ID 2095167, 13 pages, 2019. https://doi.org/10.1155/2019/2095167
Epistasis-Based Basis Estimation Method for Simplifying the Problem Space of an Evolutionary Search in Binary Representation
An evolutionary search space can be smoothly transformed via a suitable change of basis; however, it can be difficult to determine an appropriate basis. In this paper, a method is proposed for selecting an optimal basis that can be used to simplify an evolutionary search space in a binary encoding scheme. The basis search method is based on a genetic algorithm, and the fitness evaluation is based on the epistasis, which is an indicator of the difficulty of a problem for a genetic algorithm. Two tests were conducted to validate the proposed method when applied to two different evolutionary search problems. The first searched for an appropriate basis to apply, while the second searched for a solution to the test problem. The results obtained after the identified basis had been applied were compared to those obtained with the original basis, and the proposed method was found to provide superior results.
Binary encoding typically uses a standard basis, and when a nonstandard basis is used, the structure of the problem space may become quite different from that of the original problem. In an evolutionary search, various methods can be used to change a problem space by adjusting the basis, including gene rearrangement, different encoding methods, and the use of an eigen-structure [1–12].
An investigation was conducted to elucidate the possibility of changing the basis in binary encoding and the corresponding effects on the genetic algorithm (GA); however, it was not possible to determine which basis should be applied to smooth the problem search space. In genetics, epistasis means that the phenotypic effect of one gene is masked by another gene; in a GA, however, it refers to any type of gene interaction. In a problem with large epistasis, the genes are strongly interdependent, so the fitness landscape of the problem space is very complex and the problem is difficult. Several studies have assessed the difficulty of problems from the perspective of epistasis [15–21]. An advantage of epistasis is that the extent of nonlinearity can be measured using only the fitness function. In this paper, we define the difficulty of a problem, or of its search space, as the nonlinearity level of gene expression, and we use epistasis as a measure of this difficulty.
There are three main contributions of this paper. First, an epistasis approximation is used to identify a basis that reduces the complexity of an evolutionary search problem. Second, the basis is expressed by a variable-length encoding scheme using elementary matrices. Finally, a GA is defined that can be used to change the basis of an evolutionary search space, so that, when a basis is given, one can tell how it affects the GA. Our intention in this study is to transform a nonseparable problem into a separable one by performing an appropriate basis transformation; such an altered environment enables a GA to search the space effectively.
This paper is organized as follows: Section 2 describes the principle of reducing the complexity of a problem space in an evolutionary search by changing the basis and presents the motivation for evaluating the basis using the epistasis. In Section 3, a method is introduced for changing a standard basis to another basis for a binary encoding problem. Then, a GA is introduced that can be used to apply a change of basis; once an appropriate basis has been selected, this algorithm is expected to search for a solution more efficiently than the conventional GA. In Section 4, a method is proposed for estimating a basis that reduces the complexity of an evolutionary search problem. Section 5 describes a GA that searches for a basis by applying the proposed estimation method. Here, a variable-length encoding scheme consisting of elementary matrices is employed so as to increase the efficiency of the search for an appropriate basis. Section 6 describes the tests used to validate the method and then discusses the results. In the tests, an appropriate basis for the target problem is found via the GA, and then the identified basis is applied to the target problem. The conclusions that can be drawn from this study are presented in Section 7.
In this section, the concept of the epistasis is introduced as a means of estimating a basis that will reduce the complexity of the problem. First, a principal component analysis (PCA) is used to extract important information by changing the basis in real number encoding. Next, an example of changing the basis in binary encoding is presented to illustrate that a complex problem can be converted to a simple problem by changing the basis. Lastly, the epistases between the original and the modified problems are compared. If the epistasis of the problem decreases when the basis is changed, it implies the complexity of the original problem has decreased. Thus, a suitable basis can be identified using the changes in the epistasis before and after the prospective basis has been applied to the problem of interest.
2.1. An Example of Changing a Basis in
A PCA obtains the principal components of data by transforming the data into a new coordinate system via an orthogonal transformation. When the data are projected onto this coordinate system, the direction along which the variance is largest becomes the first principal component. The second principal component is the direction, orthogonal to the first, along which the remaining variance is largest, and so on. Consequently, the principal components can be found by computing the eigenvectors and eigenvalues of the covariance matrix and sorting the eigenvectors in descending order of their eigenvalues. This is identical to changing the basis from the original coordinate system to a coordinate system based on the variance of the data. In general, by keeping only the most important principal components, the data can be represented in fewer dimensions with little loss of information.
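The PCA procedure just described can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the function name `pca`, the toy data stretched along the direction (1, 1), and the use of `np.cov` with its default sample normalization are choices made here, not details taken from the paper.

```python
import numpy as np

def pca(data, k):
    """Project `data` (rows = samples) onto its top-k principal components."""
    centered = data - data.mean(axis=0)
    # Covariance matrix of the features (columns), sample normalization.
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]            # sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    basis = eigvecs[:, :k]                       # new basis: top-k eigenvectors
    return centered @ basis, eigvals

rng = np.random.default_rng(0)
# Toy 2D data whose variance lies mostly along the direction (1, 1).
t = rng.normal(size=200)
data = np.column_stack([t, t]) + 0.1 * rng.normal(size=(200, 2))
projected, eigvals = pca(data, k=1)
print(eigvals)  # the first eigenvalue is far larger than the second
```

Keeping only the first column of the projection is the dimensionality reduction mentioned above; the sample variance of that column equals the largest eigenvalue.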
2.2. Change of Basis in Binary Representation
Binary encoding typically employs a standard basis; however, it is sometimes easier to manipulate a problem in a nonstandard basis. The following example illustrates that the relationship between the basis vectors depends on the basis. Here, $\mathbb{F}_2$ is the field with elements zero and one, in which addition corresponds to the exclusive-or (XOR) operator and multiplication corresponds to the AND operator. The standard basis for the vector space $\mathbb{F}_2^n$ is $E = \{e_1, e_2, \ldots, e_n\}$, where each $e_i$ is a column vector whose $i$-th entry is one and whose remaining entries are zero.
In the vector space , if the vector and the evaluation function are as follows, then the basis vector of has a dependency relationship with the other basis vectors in :where and is the XOR operator.
Let us assume a function performs the same operation as but in a new basis and suppose is even. If a set is composed as follows: then becomes the basis. One property of a basis is that every vector can be represented as a linear combination of basis vectors. That is, where and , which is the representation of with respect to the basis .
Here, is a function that evaluates , has the same operation as , and satisfies the following relationship: It can be seen that the basis vector of is independent of the other basis vectors in . In fact, is identical to the onemax problem that counts the number of ones in a bitstring. Therefore, for a vector in which all are set to one, the evaluation value becomes the largest value, and if this vector is transformed with the standard basis, an optimum solution can be obtained. Figure 1 shows the relationships of the basis vectors according to the basis with in the graphs.
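The linear-combination property used above can be made concrete over $\mathbb{F}_2$, where addition is XOR and multiplication is AND. The sketch below is a hypothetical helper, not code from the paper: it finds the coordinates $c$ of a vector $v$ with respect to a basis whose vectors are the columns of a binary matrix $M$ by solving $Mc = v$ with Gaussian elimination modulo 2. The 3-dimensional basis is an arbitrary example.

```python
def solve_gf2(M, v):
    """Solve M c = v over GF(2). M is an n x n list of 0/1 rows whose columns
    are the basis vectors; v is a 0/1 list. Raises ValueError if M is singular."""
    n = len(M)
    # Augmented matrix [M | v]; rows are copied so the caller's data is untouched.
    aug = [M[r][:] + [v[r]] for r in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(2)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col]:
                # Adding two rows over GF(2) is element-wise XOR.
                aug[r] = [a ^ b for a, b in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]

def combine(M, c):
    """The linear combination sum_j c_j * (column j of M), using AND and XOR."""
    n = len(M)
    return [sum(M[r][j] & c[j] for j in range(n)) % 2 for r in range(n)]

# Basis vectors (1,0,0), (1,1,0), (1,1,1) stored as the columns of M.
M = [[1, 1, 1],
     [0, 1, 1],
     [0, 0, 1]]
c = solve_gf2(M, [1, 0, 1])
print(c)  # [1, 1, 1]: (1,0,1) is the XOR of all three basis vectors
```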
2.3. Epistasis According to the Basis
In a GA, the epistasis indicates the correlation between the genes. If the epistasis for a particular problem is large, then the genes are very inter-dependent, the fitness landscape of the problem space is extremely complex and the problem is difficult. In Section 2.2, it was shown that the complexity of a problem varies depending on the basis. The epistasis numerically expresses the complexity of such a problem. In general, when the genes in a problem are very dependent, the epistasis has a large value. In contrast, when the genes are independent, the value is zero.
The results of calculating the epistasis according to the problem size of evaluation functions and in Section 2.2 are shown in Table 1. In this paper, the method proposed by Davidor  is used to compute the epistasis. In , because the dependency relationship with other basis vectors increases as increases, the epistasis also increases. However, for , since the basis vectors are independent, the epistasis is zero. Thus, it is expected that the search space can be simplified via an appropriate change of basis.
The epistasis can be used to check whether the search space can be simplified by using a particular basis. If the epistasis of the problem after changing the basis is lower than that of the original problem, this indicates that the problem has become easier. However, computing the exact epistasis in this way requires all solutions to be evaluated. An alternative is to estimate the actual epistasis by calculating the epistasis of a sample set of solutions. Note that the estimated nonlinearity may be misleading because of the approximation error introduced by sampling, which can hinder finding the proper basis for the target problem. In that case, the target problem may be transformed into a more complex problem by the basis transformation; that is, the basis transformation can instead prevent a GA from finding the solution efficiently.
3. Change of Basis
This section presents a GA that performs an effective search through a change of basis. Before presenting the GA, we introduce the related terminology and the theory of change of basis in binary representation. Next, we apply a change of basis to the onemax problem to show how the problem is actually transformed, and we describe a methodology for evaluating solutions in the transformed problem. Finally, we propose a GA that searches for solutions effectively by applying the change of basis. The search for an appropriate basis itself is covered in Sections 4 and 5.
3.1. Change of Basis in
A basis for an $n$-dimensional vector space is a subset consisting of $n$ vectors such that every element of the space can be uniquely represented as a linear combination of the basis vectors. Since a vector space can have more than one basis, the coordinate representation of a vector with respect to one basis can be converted into its equivalent representation with respect to another basis by an invertible linear transformation. Such a transformation is called a change of basis. The following theorem follows from basic linear algebra.
Theorem 1. Let $B$ and $B'$ be two bases for $\mathbb{F}_2^n$. Then there exists a nonsingular matrix $A$ such that, for every $v \in \mathbb{F}_2^n$, $[v]_{B'} = A[v]_B$, where $[v]_B$ is the representation of $v$ with respect to the basis $B$.
A matrix is defined as binary if all of its entries belong to $\mathbb{F}_2$. If $E$ is the standard basis, then $[v]_E = v$ is the representation of $v$ with respect to $E$. In Theorem 1, the nonsingular binary matrix $A$ is a coordinate-change matrix from basis $B$ to $B'$. When a basis $B$ is given, the coordinate-change matrix $A$ from the standard basis to $B$ is the inverse of the matrix whose columns are the vectors of $B$, and $[v]_B = Av$ holds for every vector $v$. This study considers a change of basis from the standard basis to another basis. Thus, estimating the basis is equivalent to estimating an appropriate nonsingular binary matrix $A$.
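If the vectors of a basis are placed as the columns of a binary matrix $M$, then every vector satisfies $v = Mc$ for its coordinate vector $c$, so the coordinate-change matrix from the standard basis to that basis is $M^{-1}$ computed over $\mathbb{F}_2$. A minimal Gauss-Jordan sketch follows; the function names and the example matrix are illustrative assumptions.

```python
def inverse_gf2(M):
    """Gauss-Jordan inversion over GF(2). M is an n x n list of 0/1 rows."""
    n = len(M)
    # Augment M with the identity matrix: [M | I].
    aug = [M[r][:] + [int(r == c) for c in range(n)] for r in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(2)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col]:
                aug[r] = [a ^ b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]           # the right half is M^{-1}

def matvec_gf2(A, v):
    return [sum(a & x for a, x in zip(row, v)) % 2 for row in A]

def matmul_gf2(A, B):
    cols = list(zip(*B))
    return [[sum(a & b for a, b in zip(row, col)) % 2 for col in cols] for row in A]

M = [[1, 1, 1],      # columns: basis vectors (1,0,0), (1,1,0), (1,1,1)
     [0, 1, 1],
     [0, 0, 1]]
A = inverse_gf2(M)   # coordinate-change matrix from the standard basis
print(A)                         # [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
print(matvec_gf2(A, [1, 0, 1]))  # [1, 1, 1]: coordinates in the new basis
```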
3.2. Analysis of Changing a Basis in the Onemax Problem
The onemax problem maximizes the number of ones in a bitstring and has zero epistasis. Here, a onemax problem in which the basis was changed using a selected nonsingular binary matrix is compared to the original onemax problem. The specific onemax problem of interest has a size of three. The is defined as follows: Then, it can be shown that . Table 2 shows the original vector and that obtained using . From this, it can be seen that, after the basis change, the problem became more complex.
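The evaluation function of the onemax problem is as follows:
$$f(v) = \sum_{i=1}^{n} v_i,$$
where $v = (v_1, \ldots, v_n)^T$ and the sum is taken over the integers rather than in $\mathbb{F}_2$.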
From Table 2, however, it is difficult to identify a rule for the fitness of the transformed vectors in the onemax problem. For a transformed vector $v' = Av$, the evaluation function $f'$ can be obtained by changing the basis of $v'$ back to the standard basis and evaluating the result with $f$. That is, $f'(v') = f(A^{-1}v')$, where $A^{-1}$ is the inverse matrix of $A$. This equation is obtained by multiplying both sides of $v' = Av$ on the left by $A^{-1}$ and then applying $f$ to both sides. In this way, the basis can easily be changed in either direction using $A$ and $A^{-1}$.
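To reproduce the kind of comparison shown in Table 2, the sketch below applies a nonsingular binary matrix $A$ to every 3-bit vector and evaluates each transformed vector with $f'(v') = f(A^{-1}v')$. The matrix used here is a hypothetical example and is not necessarily the one used for Table 2; $A^{-1}$ is realized as a lookup table, which is possible because a nonsingular $A$ is a bijection on $\{0,1\}^3$.

```python
from itertools import product

A = [[1, 1, 0],
     [0, 1, 1],
     [1, 1, 1]]   # hypothetical nonsingular binary matrix

def matvec_gf2(A, v):
    return tuple(sum(a & x for a, x in zip(row, v)) % 2 for row in A)

def onemax(v):
    return sum(v)

# v' = A v for every v in {0,1}^3; `back` inverts this map, i.e. it realizes A^{-1}.
forward = {v: matvec_gf2(A, v) for v in product((0, 1), repeat=3)}
back = {vp: v for v, vp in forward.items()}
assert len(back) == 8, "A must be nonsingular"

def transformed_onemax(vp):
    """f'(v') = f(A^{-1} v'): onemax evaluated after changing back to the standard basis."""
    return onemax(back[vp])

for vp in sorted(back):
    print(vp, transformed_onemax(vp))
```

In the printed table, the optimum (fitness 3) is the transformed vector (0, 0, 1), while (1, 1, 1) has fitness 1, so the simple "count the ones" rule is no longer visible in the new basis.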
3.3. Genetic Algorithm with a Change of Basis
In general, a GA is expected to be more efficient when searching for a solution to a simple problem than to a complex problem. As shown in Section 2.2, a complex problem can be changed into a simple problem by changing the basis. With this in mind, applying an appropriate change of basis to the problem space searched by a GA is expected to greatly improve the efficiency of the search. A flowchart of the proposed algorithm is shown in Figure 2, and the corresponding steps are detailed in Algorithm 1.
|Step 1. The population of the GA is initialized and its fitness is evaluated.|
|Step 2. The population is replaced by the population obtained by changing the basis of each individual from the standard basis to the new basis.|
|Step 3. The genetic operators of the GA produce an offspring population from the population.|
|Step 4. The fitness of the offspring is evaluated after changing the basis of each offspring from the new basis back to the standard basis.|
|Step 5. The population and the offspring are used to create a new generation, and the population is updated to the new generation.|
|Step 6. The process from Step 3 onward is repeated as many times as there are generations. When the number of generations has been exceeded, the population is returned after its basis has been changed back to the standard basis.|
If Steps 2 and 4 are excluded, then Algorithm 1 produces a typical GA. However, if the problem is transformed with an appropriate basis in Step 2, the original problem space is transformed into an easier problem space, which is expected to make it easier for the GA to find an optimum solution. On the other hand, Step 4 shows that the generated offspring vector is evaluated by changing the basis to the standard basis. This is identical to the method in Section 3.2 that evaluates a solution in another basis.
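A compact Python sketch of Algorithm 1 follows. It assumes the new basis is given by a nonsingular binary matrix `A` (an individual `x` in the standard basis is represented as `A x`) together with its precomputed inverse `A_inv`, which Step 4 uses for evaluation. The operators (tournament of three, one-point crossover, per-gene flip mutation) mirror those described for the experiments in Section 6, but the function names and default parameter values are illustrative assumptions.

```python
import random

def matvec_gf2(M, v):
    """Matrix-vector product over GF(2): AND for products, XOR (mod 2) for sums."""
    return [sum(m & x for m, x in zip(row, v)) % 2 for row in M]

def ga_with_basis(f, A, A_inv, n, pop_size=20, generations=100,
                  p_cross=0.5, p_mut=0.2, p_flip=0.05, seed=0):
    """Sketch of Algorithm 1. Returns (best solution in the standard basis, fitness)."""
    rng = random.Random(seed)

    def fitness(y):
        # Step 4: change back to the standard basis, then evaluate with f.
        return f(matvec_gf2(A_inv, y))

    def select():
        # Tournament selection among three randomly chosen individuals.
        best = max(rng.sample(range(pop_size), 3), key=lambda i: fits[i])
        return pop[best]

    # Step 1: random initial population in the standard basis.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    # Step 2: express every individual in the new basis.
    pop = [matvec_gf2(A, x) for x in pop]
    fits = [fitness(y) for y in pop]

    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = select(), select()
            # Step 3: one-point crossover with probability p_cross.
            if rng.random() < p_cross:
                cut = rng.randint(1, n - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):
                # Flip mutation: if the child is mutated, each gene flips with p_flip.
                if rng.random() < p_mut:
                    for i in range(n):
                        if rng.random() < p_flip:
                            child[i] ^= 1
                offspring.append(child)
        # Step 5: the offspring generation replaces the parents.
        pop = offspring[:pop_size]
        fits = [fitness(y) for y in pop]

    # Step 6: return the best individual, changed back to the standard basis.
    best = max(range(pop_size), key=lambda i: fits[i])
    return matvec_gf2(A_inv, pop[best]), fits[best]

A = [[1, 1, 0], [0, 1, 1], [1, 1, 1]]        # hypothetical basis matrix
A_inv = [[0, 1, 1], [1, 1, 1], [1, 0, 1]]    # its inverse over GF(2)
best, fit = ga_with_basis(sum, A, A_inv, n=3, pop_size=10, generations=20)
```

Because the fitness is always computed after changing back to the standard basis, the returned fitness is exactly `f` of the returned standard-basis solution; with an identity matrix for both `A` and `A_inv`, the sketch reduces to a plain GA.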
4. Evaluation of a Basis
The objective is to identify a basis that can be used to change a complex problem into a simple problem. Such a basis was examined in Section 2.2; in Section 3.2, by contrast, the change of basis converted the onemax problem from a simple problem into a complex one.
When a basis and a target problem are given, a method is proposed that uses the epistasis to evaluate whether the basis is appropriate for the problem space. A meta-genetic algorithm (meta-GA) is commonly used to estimate the hyperparameters of a GA, and it can also be used to evaluate a basis. The two methods are compared to analyze the advantages and disadvantages of the proposed method.
4.1. Evaluation with Epistasis
Assume that a target problem and a basis $B$ are given. To determine the smoothing effect of $B$ on the target problem, a sampling population $P$ is drawn from its search space. Then, $P'$ is obtained by changing the basis of every individual in $P$ from the standard basis to $B$, and the epistasis of $P'$, which numerically indicates the difficulty of the problem, is calculated. The lower the epistasis, the more appropriate $B$ is as a basis for the target problem. The epistasis calculation method proposed by Davidor is shown in Algorithm 2. Suppose that the chromosome length is $n$ and the number of samples in $P$ is $s$. Then the time complexity of evaluating a single basis is $O(sn^2)$, because the cost is dominated by the change of basis: it is performed for a total of $s$ vectors, and each vector $v$ becomes $Av$ through a matrix-vector product that costs $O(n^2)$.
|Require: Sampling population|
|(1) procedure EVALUATION( Evaluation of a basis|
|(2) is standard basis|
|(4)for each ind in|
|(5) v(ind) is a fitness of ind|
|(6)for to SIZE() do|
|(7) is allele value (0 or 1)|
|(8) v(ind) allele value of|
|(10) end for|
|(11) end for|
|(13) forto() do|
|(14) for eachin allele values|
|(17) end for|
|(18) end for|
|(20) for each ind in|
|(21) Genic value|
|(22) forto() do|
|(24) end for|
|(26) We have the epistasis|
|(27) end for|
|(29) end procedure|
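The epistasis-based evaluation can be sketched in runnable form as follows. It uses one common formulation of Davidor's epistasis variance: the genic value of an individual is the mean fitness plus, for each locus, the excess of the average fitness of its allele over the mean, and the epistasis is the mean squared difference between actual fitness and genic value. The normalization (dividing by the sample size) is an assumption and may differ from the exact variant in Algorithm 2. Each sampled solution `x` is re-expressed as `A x`, where `A` is the coordinate-change matrix of the candidate basis, while keeping the fitness `f(x)` of the underlying solution.

```python
from itertools import product

def epistasis(pop, fits):
    """Davidor-style epistasis variance of a population of 0/1 lists."""
    s, n = len(pop), len(pop[0])
    mean = sum(fits) / s
    # excess[i][a]: average fitness of individuals with allele a at locus i,
    # minus the population mean (0 if the allele does not occur).
    excess = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for a in (0, 1):
            vals = [v for ind, v in zip(pop, fits) if ind[i] == a]
            if vals:
                excess[i][a] = sum(vals) / len(vals) - mean
    total = 0.0
    for ind, v in zip(pop, fits):
        genic = mean + sum(excess[i][ind[i]] for i in range(n))
        total += (v - genic) ** 2
    return total / s

def evaluate_basis(f, A, samples):
    """Lower is better: epistasis of the samples expressed in the new basis."""
    transformed = [[sum(a & x for a, x in zip(row, ind)) % 2 for row in A]
                   for ind in samples]
    return epistasis(transformed, [f(ind) for ind in samples])

def xor2(x):
    return x[0] ^ x[1]          # a maximally epistatic 2-bit function

samples = [list(x) for x in product((0, 1), repeat=2)]
print(evaluate_basis(xor2, [[1, 0], [0, 1]], samples))  # 0.25 (standard basis)
print(evaluate_basis(xor2, [[1, 1], [0, 1]], samples))  # 0.0 (first new coordinate is x0 XOR x1)
```

The example mirrors the argument of Section 2: XOR is fully epistatic in the standard basis, but in a basis whose first coordinate is `x0 XOR x1` the fitness becomes a linear function of the genes and the epistasis drops to zero.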
4.2. Evaluation with a Meta-Genetic Algorithm
The use of a meta-GA to optimize the parameters of GAs was first proposed by Grefenstette. Here, a meta-GA is used to determine whether a basis is appropriate for the problem space of the GA. A method of evaluating a basis with a meta-GA is shown in Algorithm 3. By applying Algorithm 1 with a given basis and an instance of the GA, populations are searched repeatedly, and the basis is evaluated using the best fitness in each population. That is, when the best fitnesses found are acceptable, the basis is estimated to be appropriate for the instance. The search is repeated because, even with an inappropriate basis, a single GA run may happen to obtain a good solution. To calculate the time complexity, with respect to the target GA, let the number of generations be , population size , and chromosome length . The time cost of line (10) in Algorithm 3 is the largest. When offspring are generated, the time consumed is . Since this is repeated times, the worst-case time complexity becomes . Note that, in the experiment evaluated in this paper, is set to and is set to the chromosome length.
|Require: Target GA, Search GA times, Generations of GA|
|(1) procedure EVALUATION( Evaluation of a basis|
|(2) BestFits Return array|
|(4) Initialization of population|
|(5) GA.EvalPopulation() Evaluation of the population|
|(9) Perform crossover and mutation operations|
|(13) end for|
|(14) BestFits the best fitness of|
|(15) end for|
|(16) return BestFits|
|(17) end procedure|
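Algorithm 3 can be sketched as a thin wrapper: run the target GA several times under the candidate basis and collect the best fitness of each run, as in the BestFits array. The `run_ga` callable (taking a basis and a seed and returning the best fitness found), the use of the seed to make runs independent, and the mean as a scalar score are hypothetical choices for illustration.

```python
from statistics import mean
from typing import Callable, List, Sequence

Matrix = Sequence[Sequence[int]]

def meta_evaluate(run_ga: Callable[[Matrix, int], float], basis: Matrix,
                  k: int) -> List[float]:
    """Run the GA k times with `basis`; return the best fitness of each run."""
    return [run_ga(basis, seed) for seed in range(k)]

def score(best_fits: List[float]) -> float:
    """One possible scalar summary of BestFits (an assumption): the mean."""
    return mean(best_fits)

# Stand-in for a real GA run, used only to show the call pattern.
def stub_run(basis: Matrix, seed: int) -> float:
    return float(seed % 2)

best_fits = meta_evaluate(stub_run, [[1]], k=4)
```

Because each call runs a complete GA, the cost grows linearly with the number of runs, which is consistent with the much longer computation times reported for this evaluation than for the epistasis estimate.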
5. Finding a Basis Using a Genetic Algorithm
This section describes the components of the GA used to search for a basis for the problem space with the evaluation method outlined in Section 4. The method of applying a basis and the genetic operator for the encoding are discussed, and the fitness of the basis is evaluated using the method of either Algorithm 2 or 3.
5.1. Encoding with an Elementary Matrix
A nonsingular binary matrix can be regarded as a change from the standard basis to another basis; that is, each basis corresponds to a nonsingular binary matrix. If the matrix is encoded directly as a 2D binary array, recombination may produce a singular matrix, so a repair mechanism may be required after recombination. One option is to conduct the repair using the Gauss-Jordan method; however, this requires $O(n^3)$ time.
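To see why a direct 2D encoding needs repair, the sketch below stores matrix rows as integer bitmasks, builds a child by taking each row from one of two nonsingular parents (a row-wise uniform crossover with a fixed mask), and checks the rank over $\mathbb{F}_2$ by Gaussian elimination. The parents and the mask are hypothetical examples chosen so that the child is singular.

```python
def rank_gf2(rows):
    """Rank over GF(2) of a nonempty matrix whose rows are int bitmasks."""
    rank = 0
    rows = list(rows)
    for bit in reversed(range(max(rows).bit_length())):
        pivot = next((r for r in rows if (r >> bit) & 1), None)
        if pivot is None:
            continue
        rows.remove(pivot)
        # Eliminate this bit from the remaining rows (row addition = XOR).
        rows = [r ^ pivot if (r >> bit) & 1 else r for r in rows]
        rank += 1
    return rank

P1 = [0b100, 0b010, 0b001]     # identity matrix
P2 = [0b010, 0b100, 0b001]     # identity with rows 0 and 1 swapped
mask = [1, 0, 1]               # take row r from P1 if mask[r] else from P2
child = [a if m else b for a, b, m in zip(P1, P2, mask)]
print(rank_gf2(P1), rank_gf2(P2), rank_gf2(child))  # 3 3 2
```

Both parents are nonsingular, but the child repeats the row `100` and is singular, so it would have to be repaired before it could be used as a basis.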
Every nonsingular matrix can be expressed as a product of elementary matrices. Therefore, if a solution is expressed as a product of elementary matrices, its invertibility is maintained. Such a product can be expressed as a variable-length linear string whose elements are elementary matrices, which allows a new encoding to be applied. Note that any recombination method for variable-length strings can be used. In the following, an elementary row operation is defined, and then the elementary matrix over $\mathbb{F}_2$ is introduced.
Definition 2. Let $M$ be an $n \times n$ binary matrix. Either of the following two operations on the rows of $M$ is called an elementary row operation: (i) interchanging any two rows of $M$, and (ii) adding a row of $M$ to another row.
Elementary row operations are Type 1 or Type 2 depending on whether they were obtained using (i) or (ii) of Definition 2.
Definition 3. An elementary matrix over $\mathbb{F}_2$ is a matrix obtained by performing an elementary row operation on the identity matrix $I_n$. The elementary matrix is said to be of Type 1 or Type 2 depending on whether the elementary operation performed on $I_n$ is a Type 1 or Type 2 operation, respectively.
Let us define as an elementary matrix of Type 1 that interchanges the -th row and the -th one for and . Also define as an elementary matrix of Type 2 that adds the -th row to the -th row for and .
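Decoding the variable-length encoding can be sketched as follows. A chromosome is a list of elementary matrices written as hypothetical tuples, `("swap", i, j)` for Type 1 and `("add", i, j)` for Type 2 (add row `j` to row `i`), with 0-based indices; the decoded matrix is their product. Because every factor is invertible, every decoded matrix is nonsingular and no repair is needed. Reading the list as the product from left to right is an assumption about the encoding order.

```python
def decode(ops, n):
    """Return E_1 E_2 ... E_k as 0/1 rows for ops = [op_1, ..., op_k]."""
    M = [[int(r == c) for c in range(n)] for r in range(n)]
    # Left-multiplying by an elementary matrix applies its row operation,
    # so the product is built by applying the operations from last to first.
    for kind, i, j in reversed(ops):
        if kind == "swap":                 # Type 1: interchange rows i and j
            M[i], M[j] = M[j], M[i]
        elif kind == "add":                # Type 2: add row j to row i (XOR)
            M[i] = [a ^ b for a, b in zip(M[i], M[j])]
        else:
            raise ValueError(f"unknown operation {kind!r}")
    return M

print(decode([("swap", 0, 1), ("add", 0, 1)], 2))  # [[0, 1], [1, 1]]
```

Over $\mathbb{F}_2$, each Type 1 and Type 2 elementary matrix is its own inverse, so applying the same operation twice in a row decodes to the identity.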
When a nonsingular binary matrix is represented as a product of elementary matrices, this representation is not unique, and it is difficult to determine how many equivalent representations exist for a given nonsingular binary matrix. Yoon and Kim proposed several such equivalences based on a simple idea; they are given here as Propositions 4 and 5. The new equivalences proposed in this paper are given in Proposition 6, and their proof is provided in the Appendix.
Proposition 4 (exchange rule). For each such that , the following five exchange rules hold:
(i) , (ii) , (iii) , (iv) , and (v) .
Proposition 5 (compaction rules). For each such that , the following two compaction rules hold:
(i) and (ii) .
Proposition 6. For each and such that , the following three rules hold:
(i) , (ii) , and (iii) .
For example, the encodings of matrices and are as follows: let and . Then, calculate based on a sequence alignment between and , where is the edit distance and the insertion, deletion, and replacement functions have weights of one, one, and two, respectively. First, consider the original form: Then, . This allows the parents to be changed into other forms. Note that From these rules, . Thus, the propositions can produce offspring that are more similar to the parents.
Any recombination operator for variable-length strings can be used as a recombination operator for this encoding, and the edit distance is typically used as the distance between variable-length strings. The edit distance is the minimum number (or, with weights, the minimum total cost) of insertions, deletions, and replacements of elementary matrices needed to change one string into another. A geometric crossover associated with this distance is called a homologous geometric crossover.
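The weighted edit distance described above can be computed with the Wagner-Fischer dynamic program. In this sketch, insertion, deletion, and replacement cost one, one, and two, as in the example earlier in this section; representing elementary matrices as hashable tuples such as `("add", 0, 1)` is an assumption.

```python
def edit_distance(s, t, ins=1, dele=1, rep=2):
    """Weighted edit distance between sequences s and t (Wagner-Fischer)."""
    m, n = len(s), len(t)
    # d[i][j]: cost of turning the first i items of s into the first j items of t.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * dele
    for j in range(1, n + 1):
        d[0][j] = j * ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = s[i - 1] == t[j - 1]
            d[i][j] = min(d[i - 1][j] + dele,
                          d[i][j - 1] + ins,
                          d[i - 1][j - 1] + (0 if same else rep))
    return d[m][n]

p1 = [("add", 0, 1), ("swap", 1, 2)]
p2 = [("swap", 1, 2)]
print(edit_distance(p1, p2))  # 1: delete ("add", 0, 1)
```

With a replacement cost of two, a replacement is never cheaper than a deletion followed by an insertion, so this distance equals the total length of both strings minus twice the length of their longest common subsequence.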
Several general string genetic operators can be used. For a string encoding of elementary matrices, a mathematically designed genetic operator has been proposed; specifically, the geometric crossover by sequence alignment is expected to be effective. Here, alignment refers to allowing the strings to stretch in order to provide a better match. Stretching involves interleaving a gap symbol anywhere in the strings to create two stretched strings of the same length with a minimum Hamming distance. The offspring are generated by applying a uniform crossover to the aligned parents and then removing the gap symbols. In this way, two offspring are generated from the two parents.
The optimal alignment of the two strings is computed with the Wagner-Fischer algorithm, a dynamic programming (DP) algorithm that computes the edit distance between two strings of characters. This algorithm has a time complexity of $O(\ell_1 \ell_2)$ and, when the full DP table is constructed, a space complexity of $O(\ell_1 \ell_2)$, where $\ell_1$ and $\ell_2$ are the lengths of the two strings.
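A sketch of the geometric crossover by sequence alignment: the Wagner-Fischer table is traced back to align the two parents with a gap symbol, a uniform crossover is applied position by position to the aligned strings, and the gaps are then removed from the offspring. The gap symbol `"-"`, the costs (one, one, two), the tie-breaking order in the traceback, and the 50/50 per-position choice are assumptions made here.

```python
import random

GAP = "-"   # must not occur as a symbol in the parents

def align(s, t, ins=1, dele=1, rep=2):
    """Optimal alignment of s and t; returns two equal-length lists."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * dele
    for j in range(1, n + 1):
        d[0][j] = j * ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s[i - 1] == t[j - 1] else rep
            d[i][j] = min(d[i - 1][j] + dele, d[i][j - 1] + ins,
                          d[i - 1][j - 1] + sub)
    # Trace back from the bottom-right corner of the DP table.
    a, b, i, j = [], [], m, n
    while i > 0 or j > 0:
        sub = 0 if i > 0 and j > 0 and s[i - 1] == t[j - 1] else rep
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + sub:
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + dele:
            a.append(s[i - 1])
            b.append(GAP)
            i -= 1
        else:
            a.append(GAP)
            b.append(t[j - 1])
            j -= 1
    return a[::-1], b[::-1]

def homologous_crossover(p1, p2, rng):
    """Uniform crossover on the aligned parents, then drop the gaps."""
    a, b = align(p1, p2)
    c1, c2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x
        c1.append(x)
        c2.append(y)
    return [g for g in c1 if g != GAP], [g for g in c2 if g != GAP]

print(align("abc", "ac"))  # (['a', 'b', 'c'], ['a', '-', 'c'])
kids = homologous_crossover("abc", "ac", random.Random(1))
```

Every aligned position hands one symbol to each child, so the two offspring together contain exactly the symbols of the two parents.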
5.3. Initial Population, Selection, Mutation, and Replacement
An initial population is generated with a random number of random elementary matrices. The random number is generated from a normal distribution where the mean is and the standard deviation is when the problem size is . If the random number is smaller than one, it is fixed at one. The selection operator is a tournament selection with a tournament size of three. The mutation operator applies one of three operations, namely insertion, deletion, or replacement, to each string with a 5% probability, and the probability that each individual will be mutated is set at 0.2. Lastly, replacement refers to replacing the parent generation with an offspring generation, as follows. The selection operator is used to choose the candidates of the offspring generation: as many parents as there are individuals in the parent generation are extracted by applying the selection operator repeatedly. Each pair of parents undergoes crossover with a probability of 0.5. When the crossover is not applied, the two parents become candidates for members of the next generation; otherwise, the two offspring become the candidates. Each candidate is mutated with a probability of 0.2, and the candidates then replace the parent generation.
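The initialization and mutation of elementary-matrix strings can be sketched as below. The mean and standard deviation of the length distribution are left as parameters because their values are not given here; the example values, the tuple representation `("swap" | "add", i, j)`, reading the 5% rate as a per-position probability within a mutated individual, and keeping at least one elementary matrix after deletions are all assumptions.

```python
import random

def random_op(n, rng):
    """A random elementary matrix: Type 1 ('swap') or Type 2 ('add'), with i != j."""
    i, j = rng.sample(range(n), 2)
    return (rng.choice(("swap", "add")), i, j)

def random_individual(n, mu, sigma, rng):
    # Lengths drawn from N(mu, sigma); values below one are fixed at one.
    length = max(1, round(rng.gauss(mu, sigma)))
    return [random_op(n, rng) for _ in range(length)]

def mutate(ind, n, rng, p_ind=0.2, p_pos=0.05):
    """With probability p_ind, insert, delete, or replace ops position by position."""
    if rng.random() >= p_ind:
        return list(ind)
    out = []
    for op in ind:
        if rng.random() < p_pos:
            kind = rng.choice(("insert", "delete", "replace"))
            if kind == "insert":
                out.extend([op, random_op(n, rng)])
            elif kind == "replace":
                out.append(random_op(n, rng))
            # "delete": the op is simply dropped
        else:
            out.append(op)
    return out or [random_op(n, rng)]   # assumption: keep at least one op

rng = random.Random(0)
population = [random_individual(10, mu=5, sigma=2, rng=rng) for _ in range(50)]
population = [mutate(ind, 10, rng) for ind in population]
```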
6.1. Target Problem in Binary Representation
In this section, two problems are described for which better solutions can be obtained with an appropriate basis.
(1) Variant-onemax: for the evaluation function of the onemax problem, each standard basis vector has an evaluation value of one. Variant-onemax is defined as counting the number of ones after changing a vector from the standard basis to a certain basis; that is, in variant-onemax, the evaluation value of a vector is the number of ones in its representation with respect to that basis. If the basis of a vector is changed with the corresponding nonsingular binary matrix, the evaluation function counts the number of ones in the transformed vector, so variant-onemax is identical to the onemax problem after an appropriate change of basis. Conversely, whenever a nonsingular binary matrix is given, a variant-onemax evaluation function can be generated from it. As for the optimum solution of variant-onemax, when the problem size is $n$, the number of ones becomes $n$ after the change of basis, so the vector whose representation in the new basis is the all-ones vector is the optimal solution.
(2) NK-landscape: the NK-landscape model consists of a string of length $N$, and a fitness contribution is attributed to each character depending on $K$ other characters. These fitness contributions are often chosen randomly from a particular probability distribution. In addition, the number of hills and valleys can be adjusted by varying $N$ and $K$. One reason why the NK-landscape model is used in optimization is that it is a simple instance of an NP-hard problem.
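A minimal NK-landscape sketch, assuming the common construction in which each gene depends on $K$ randomly chosen other genes, each of the $2^{K+1}$ neighborhood configurations receives a contribution drawn uniformly from $[0, 1)$, and the fitness is the mean contribution. The random-neighbor variant, the uniform distribution, and the name `make_nk` are assumptions about the instance generator.

```python
import random

def make_nk(N, K, seed=0):
    """Return a random NK-landscape fitness function on 0/1 lists of length N."""
    rng = random.Random(seed)
    # neighbors[i]: gene i followed by K distinct other genes.
    neighbors = [[i] + rng.sample([j for j in range(N) if j != i], K)
                 for i in range(N)]
    # tables[i][config]: contribution of gene i for each of 2^(K+1) configurations.
    tables = [[rng.random() for _ in range(2 ** (K + 1))] for _ in range(N)]

    def fitness(x):
        total = 0.0
        for i in range(N):
            idx = 0
            for g in neighbors[i]:
                idx = (idx << 1) | x[g]   # encode gene i and its neighbors
            total += tables[i][idx]
        return total / N

    return fitness

f = make_nk(10, 3, seed=1)
print(f([1, 0] * 5))   # a value in [0, 1)
```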
In the experiments, the GA is used to search for solutions to the above two problems. The GA consists of tournament selection, one-point crossover, and flip mutation, and the replacement replaces the entire parent generation with the offspring generation. The tournament selection chooses the best solution among three randomly selected individuals, the one-point crossover produces two offspring from two parents, and the flip mutation flips each gene from zero to one or from one to zero with a probability of 0.05. The replacement method is the same as that described in Section 5: in composing the next generation, the number of parents extracted is equal to the population size. Each pair of parents undergoes crossover with a probability of 50%. When the crossover is not applied, the two parents become candidates for the next generation; otherwise, the two offspring become the candidates. Each candidate undergoes mutation with a probability of 20%, and the candidates then replace the parent generation. When the chromosome length of variant-onemax or the NK-landscape is , the size of the population is set to . Because the fitness of the optimum solution of variant-onemax is known, the search was run for up to 10,000 generations or until an optimum solution had been identified. In the NK-landscape, the fitness of the optimum solution differs for each instance, and all solutions would have to be examined to obtain it; thus, the search was run for 300,000 generations.
The evaluation function of variant-onemax requires a nonsingular binary matrix that corresponds to a basis. For the basis of variant-onemax that has a chromosome length of , a random number of elementary matrices are generated and then are multiplied sequentially. The number of elementary matrices is generated from a normal distribution that has a mean of and a standard deviation of .
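Generating a variant-onemax instance can be sketched as follows: a random number of random elementary row operations is applied to the identity matrix, which yields a product of elementary matrices and hence a nonsingular binary matrix `A`, and a solution `x` is scored as the number of ones in `A x`. Whether the original applies `A` or its inverse inside the fitness is not recoverable here, so this direction, the parameter values `mu` and `sigma`, and the function names are assumptions. Under this definition the unique optimum is the `x` with `A x` equal to the all-ones vector.

```python
import random
from itertools import product

def random_nonsingular(n, mu, sigma, rng):
    """Product of a random number of random elementary matrices over GF(2)."""
    k = max(1, round(rng.gauss(mu, sigma)))
    A = [[int(r == c) for c in range(n)] for r in range(n)]
    for _ in range(k):
        i, j = rng.sample(range(n), 2)
        if rng.random() < 0.5:
            A[i], A[j] = A[j], A[i]                        # Type 1: swap rows
        else:
            A[i] = [a ^ b for a, b in zip(A[i], A[j])]     # Type 2: add row j to row i
    return A

def variant_onemax(A):
    def fitness(x):
        # Number of ones in A x, computed over GF(2).
        return sum(sum(a & xi for a, xi in zip(row, x)) % 2 for row in A)
    return fitness

rng = random.Random(0)
A = random_nonsingular(6, mu=12, sigma=3, rng=rng)
f = variant_onemax(A)
best = max(product((0, 1), repeat=6), key=f)   # brute force; feasible only for small n
```

For small `n`, the brute-force search confirms that exactly one solution reaches the maximum fitness `n`, because a nonsingular `A` maps the search space bijectively onto itself.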
In the experiment, instances of variant-onemax with $n$ = 20, 30, and 50 were generated. With the GA described in Section 5, the following bases were searched for each instance: a meta-GA-based basis, an epistasis-based basis with a sampling number of $n^2$, and an epistasis-based basis with a sampling number of $n^3$.
A total of 100 independent searches were conducted for each instance, and the number of times that an optimum solution was identified was counted along with the execution time. The results for the variant-onemax experiment are shown in Table 3. In the table, the type ‘Original’ indicates that a solution instance was evaluated without a change of basis. Similarly, ‘Meta,’ ‘Epistasis-sq,’ and ‘Epistasis-cu’ refer to evaluating solution instances after changing the basis with the meta-GA-based basis, the epistasis-based basis with $n^2$ samples, and the epistasis-based basis with $n^3$ samples, respectively. In addition, the box plot in Figure 3 depicts the fitness distribution of the best solutions obtained in the 100 independent searches for each instance. The plotted fitness is a value between zero and one obtained by dividing the fitness by that of the optimum solution; that is, a value of one on the $y$-axis indicates the fitness of an optimum solution, while values approaching zero indicate a lower fitness. In most cases, the search performance of the GA improves with the change of basis. When $n$ is 50, however, ‘Epistasis-cu’ does not seem to improve the search performance of the GA. This was likely because the population of the GA was not evenly distributed throughout the sample population.
Intel (R) Core TM i7-6850K CPU @ 3.60GHz
In Table 3, ‘Meta’ found optimal solutions more frequently than the other methods. In particular, when $n$ was 30, the optimal solution was found in 82 of the 100 searches, indicating that the corresponding basis was appropriate. However, because the computation time of this approach was very long (over 2 hours when $n$ was 50), it cannot be applied in practice. Furthermore, when $n$ was 50, no difference was observed compared with the case in which the basis was not changed. Evaluating the basis using the epistasis provides a good indication of when changing the basis will give a better result. In particular, when $n$ is 20, the number of optima found with ‘Original’ is 30, whereas the numbers found with ‘Epistasis-sq’ and ‘Epistasis-cu’ are 64 and 33, respectively. In summary, these tests confirmed that a sample size of $n^2$ provided good results while requiring less time than a sample size of $n^3$. Therefore, in terms of both time and performance, a sample size of $n^2$ was deemed reasonable for estimating the epistasis.
The value of $N$ in the NK-landscape experiment represents the size of the problem. In this experiment, each string consisted of $N$ characters of zero and one, and the population size was . The evaluation functions were randomly generated according to $N$ and $K$. In terms of instance generation, each gene depended on $K$ other genes and was assigned a fitness contribution between zero and one. The fitness of the NK-landscape is based on the fitness contributions of the individual genes; therefore, the maximum and minimum fitness values, which lie between zero and one, may differ for each instance. In the experiment, 100 independent searches for a solution were conducted for each instance. Table 4 compares the best solutions and computation times of the 100 searches in the NK-landscape experiment. In the table, the type ‘Epistasis’ indicates that a basis was obtained based on the epistasis using a sample set of size , and that 100 independent searches were conducted for that instance with this basis. A box plot showing the distribution of the 100 best solutions is shown in Figure 4.
Upon analysis, searching for the solution after changing the basis performed better than searching the original problem. In particular, the box plot shows that the distribution of solutions obtained after changing the basis was more concentrated and had a higher mean. In the NK-landscape, when ‘Meta’ and ‘Epistasis’ were compared, neither exhibited clearly better performance. However, the computation time of ‘Meta’ was about 4–30 times longer than that of ‘Epistasis.’ Furthermore, although ‘Epistasis’ consumed slightly more time than ‘Original,’ it tended to yield a more efficient evolutionary search. For these reasons, the ‘Epistasis’ method was found to be the best among the three methods evaluated.
6.3. Experimental Analysis
The results of the above experiments confirmed that a basis obtained by estimating the epistasis improved the efficiency of searching for a solution with a GA. This section analyzes how much the basis found in the experiments reduced the epistasis. Because the basis was estimated so as to reduce the epistasis of the sample population, the effectiveness of the GA proposed in Section 5 can be confirmed by comparing the epistasis of the original sample set with that of the same sample set after its basis is changed to the one identified by the search. The latter epistasis is expected to be smaller.
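A small Python sketch can make this before/after comparison concrete. It assumes Davidor's epistasis variance (the mean squared gap between each fitness value and its best gene-wise additive approximation) as the epistasis measure, represents a change of basis as y = Mx over GF(2) for an invertible binary matrix M, and uses a hypothetical test function, onemax applied in a hidden basis. None of these details is taken from this section; the function names are illustrative.

```python
import itertools

def epistasis(samples, fitness):
    """Epistasis variance of a sample (assumed Davidor-style definition).

    Fits the additive model sum_i a_i(x_i) - (n - 1) * mean, where a_i(v) is the
    mean fitness of samples whose gene i equals v, and returns the mean squared
    residual. An additive fitness gives zero when the sample is the full space.
    """
    m, n = len(samples), len(samples[0])
    mean_f = sum(fitness) / m
    a = []
    for i in range(n):
        sums, counts = [0.0, 0.0], [0, 0]
        for x, f in zip(samples, fitness):
            sums[x[i]] += f
            counts[x[i]] += 1
        a.append([sums[v] / counts[v] if counts[v] else mean_f for v in (0, 1)])
    residual = 0.0
    for x, f in zip(samples, fitness):
        additive = sum(a[i][x[i]] for i in range(n)) - (n - 1) * mean_f
        residual += (f - additive) ** 2
    return residual / m

def change_basis(x, M):
    """Coordinates y = M x over GF(2); M must be an invertible 0/1 matrix."""
    return tuple(sum(r & b for r, b in zip(row, x)) % 2 for row in M)

# Hypothetical test problem: onemax applied after a hidden invertible map M,
# so the fitness is additive in y = M x but not in x itself.
n = 6
M = [[1 if j in (i, i + 1) else 0 for j in range(n)] for i in range(n)]  # unit upper bidiagonal
samples = list(itertools.product((0, 1), repeat=n))  # full space, 2**n points
fitness = [sum(change_basis(x, M)) for x in samples]

before = epistasis(samples, fitness)
after = epistasis([change_basis(x, M) for x in samples], fitness)
print(before, after)  # 1.25 0.0
```

Because the fitness is additive in the new coordinates, the epistasis drops to zero after the change of basis; this is the kind of reduction that Tables 5 and 6 measure on sampled data.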
Tables 5 and 6 compare the epistases of the sample sets before and after the change of basis in the variant-onemax and NK-landscape experiments, respectively. In Table 5, n is the chromosome length of the variant-onemax experiment, and the sample sets had sizes n^2 (‘square’) and n^3 (‘cubic’); ‘Before’ and ‘After’ give the epistasis of the original sample set and of the sample set after the basis change, respectively. For every n, the epistasis was lower after the basis was changed, and the reduction was larger with the ‘square’ sample size than with the ‘cubic’ one. This suggests a higher likelihood that the GA would search more efficiently and find better solutions. When n was 20, the search space contained only 2^20 solutions, so the epistasis could be computed over all of them rather than over sample sets. The exact epistasis was 4.50, while the estimates were 4.46 and 4.35 with the square and cubic sample sizes, respectively, indicating that the original epistasis was accurately estimated.
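As a quick check of the numbers for n = 20, the Python snippet below compares the two sample sizes with the size of the full space and computes the relative error of each estimate against the exact value of 4.50. It assumes that the ‘square’ and ‘cubic’ settings mean n^2 and n^3 samples; the variable names are illustrative.

```python
n = 20
sizes = {"square": n ** 2, "cubic": n ** 3, "full space": 2 ** n}
print(sizes)  # {'square': 400, 'cubic': 8000, 'full space': 1048576}

exact = 4.50                                  # epistasis over all 2**20 solutions
estimates = {"square": 4.46, "cubic": 4.35}   # sample-based values reported for n = 20
rel_err = {k: abs(exact - v) / exact for k, v in estimates.items()}
for k, e in rel_err.items():
    print(f"{k}: {e:.1%}")  # prints "square: 0.9%" then "cubic: 3.3%"
```

Both estimates are within a few percent of the exact value while using at most 8000 of the 1,048,576 possible solutions.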
Section 5.2 specifies the size of the sample set used in the NK-landscape experiment. Table 6 shows, for each parameter setting used in the experiment, the epistasis of the sample set before and after the change of basis; ‘Before’ and ‘After’ denote the epistasis of the original sample set and of the sample set expressed in the new basis, respectively. As in the variant-onemax experiment, a lower epistasis was obtained for every setting when the basis was changed. For one problem size, the epistasis was also computed over all solutions rather than over the sample set; for K = 3, 5, and 10, these exact values are close to the corresponding sample-based estimates.
In this paper, an epistasis-based evolutionary search method was proposed for estimating a basis that simplifies a given problem. For two test problems, a basis was identified by estimating the epistasis, and the search results before and after the change of basis were compared. The epistasis-based basis estimation method was far more time-efficient than a meta-GA. The same held for the NK-landscape, where it also produced results similar to those of the meta-GA. Thus, it is reasonable to estimate the basis using the epistasis rather than a meta-GA.
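The overall procedure, searching for a basis whose transformed sample has low epistasis, can be illustrated with a small Python sketch. To keep it short, a simple hill climber stands in for the paper's GA; the move operator (adding one row of the basis matrix to another over GF(2)), the Davidor-style epistasis formula, and the onemax-in-a-hidden-basis test function are all assumptions for illustration, and the names search_basis and transform are hypothetical.

```python
import itertools
import random

def epistasis(samples, fitness):
    """Mean squared residual of the best gene-wise additive fit
    (assumed Davidor-style epistasis variance)."""
    m, n = len(samples), len(samples[0])
    mean_f = sum(fitness) / m
    a = []
    for i in range(n):
        sums, counts = [0.0, 0.0], [0, 0]
        for x, f in zip(samples, fitness):
            sums[x[i]] += f
            counts[x[i]] += 1
        a.append([sums[v] / counts[v] if counts[v] else mean_f for v in (0, 1)])
    return sum(
        (f - (sum(a[i][x[i]] for i in range(n)) - (n - 1) * mean_f)) ** 2
        for x, f in zip(samples, fitness)
    ) / m

def transform(samples, M):
    """Coordinates y = M x over GF(2) for every sample x."""
    return [tuple(sum(r & b for r, b in zip(row, x)) % 2 for row in M) for x in samples]

def search_basis(samples, fitness, iters=300, seed=0):
    """Hypothetical hill climber over invertible binary matrices.

    Each move adds row j to row i (mod 2); this elementary row operation keeps
    M invertible. A move is kept only if the sample epistasis does not increase.
    """
    rng = random.Random(seed)
    n = len(samples[0])
    M = [[int(i == j) for j in range(n)] for i in range(n)]  # standard basis
    best = epistasis(transform(samples, M), fitness)
    for _ in range(iters):
        i, j = rng.sample(range(n), 2)
        saved = M[i]
        M[i] = [p ^ q for p, q in zip(M[i], M[j])]
        value = epistasis(transform(samples, M), fitness)
        if value <= best:
            best = value
        else:
            M[i] = saved  # reject: restore the previous row
    return M, best

if __name__ == "__main__":
    n = 6
    hidden = [[1 if j in (i, i + 1) else 0 for j in range(n)] for i in range(n)]
    samples = list(itertools.product((0, 1), repeat=n))
    fitness = [sum(y) for y in transform(samples, hidden)]  # onemax in a hidden basis
    M, best = search_basis(samples, fitness)
    print("initial:", epistasis(samples, fitness), "found:", best)
```

Because every accepted move is an elementary row operation, each candidate matrix stays invertible and therefore defines a valid basis; whether the climber reaches zero epistasis depends on the seed and the iteration budget.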
To estimate the epistasis, sample sets of size n^2 or n^3 (n being the chromosome length) were used; determining an appropriate sample size therefore remains a subject for further study. In addition, the basis was searched for with a simple GA, so future work should investigate ways to identify better bases. Tuning the GA's parameters, using other genetic operators, or applying the method described in the Appendix may also enable a higher-quality search.
Furthermore, the experiments evaluated specific problems that could be simplified by a change of basis. Future research will need to identify the characteristics of problems that can benefit from a change of basis. Note that the basis evaluation method is applicable not only to binary encoding but also to k-ary encoding. In addition, it can be used to evaluate any vector space in which the epistasis can be calculated.
We present the following lemma to prove Proposition 6:
Lemma A.1. Let be an binary matrix. For each and such that , the following four rules hold: (i), (ii), (iii), and (iv), where is the -th row vector of matrix .
Proof. Let be the -th row vector of ; that is, . Without loss of generality, we assume that . Note that So, we have the following:(i),(ii),(iii), and(iv).
Proof of Proposition 6. Let be an binary matrix which is the -th row vector; that is, . (1)It is enough to show that the -th and -th row vectors of are the same as those of . Consider the left side: using Lemma A.1, we have Now consider the right side: (2)We know . We multiply in both sides. Then, the left side is , and so .(3) by the definition of . Now, consider . Note that the left side and note that the right side
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
The present research has been conducted by the Research Grant of Kwangwoon University in 2019. This research was supported by a grant (KCG-01-2017-05) through the Disaster and Safety Management Institute funded by Korea Coast Guard of Korean government and by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2015R1D1A1A01060105).
- I. Hwang, Y.-H. Kim, and B.-R. Moon, “Multi-attractor gene reordering for graph bisection,” in Proceedings of the 8th Annual Genetic and Evolutionary Computation Conference, pp. 1209–1215, July 2006.
- D. X. Chang, X. D. Zhang, and C. W. Zheng, “A genetic algorithm with gene rearrangement for K-means clustering,” Pattern Recognition, vol. 42, no. 7, pp. 1210–1222, 2009.
- D. Sankoff and M. Blanchette, “Multiple genome rearrangement and breakpoint phylogeny,” Journal of Computational Biology, vol. 5, no. 3, pp. 555–570, 1998.
- G. R. Raidl and B. A. Julstrom, “A weighted coding in a genetic algorithm for the degree-constrained minimum spanning tree problem,” in Proceedings of the ACM Symposium on Applied Computing (SAC '00), vol. 1, pp. 440–445, ACM, March 2000.
- E. Falkenauer, “A new representation and operators for genetic algorithms applied to grouping problems,” Evolutionary Computation, vol. 2, no. 2, pp. 123–144, 1994.
- M. Gen, F. Altiparmak, and L. Lin, “A genetic algorithm for two-stage transportation problem using priority-based encoding,” OR Spectrum, vol. 28, no. 3, pp. 337–354, 2006.
- Y. M. Wang, H. L. Yin, and J. Wang, “Genetic algorithm with new encoding scheme for job shop scheduling,” The International Journal of Advanced Manufacturing Technology, vol. 44, no. 9-10, pp. 977–984, 2009.
- M. M. Lotfi and R. Tavakkoli-Moghaddam, “A genetic algorithm using priority-based encoding with new operators for fixed charge transportation problems,” Applied Soft Computing, vol. 13, no. 5, pp. 2711–2726, 2013.
- F. Pernkopf and P. O’Leary, “Feature selection for classification using genetic algorithms with a novel encoding,” in Proceedings of the International Conference on Computer Analysis of Images and Patterns, pp. 161–168, 2001.
- Y. Wang, L. Han, Y. Li, and S. Zhao, “A new encoding based genetic algorithm for the traveling salesman problem,” Engineering Optimization, vol. 38, no. 1, pp. 1–13, 2006.
- J.-Z. Wu, X.-C. Hao, C.-F. Chien, and M. Gen, “A novel bi-vector encoding genetic algorithm for the simultaneous multiple resources scheduling problem,” Journal of Intelligent Manufacturing, vol. 23, no. 6, pp. 2255–2270, 2012.
- D. Wyatt and H. Lipson, “Finding building blocks through eigenstructure adaptation,” in Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1518–1529, 2003.
- Y.-H. Kim and Y. Yoon, “Effect of changing the basis in genetic algorithms using binary encoding,” KSII Transactions on Internet and Information Systems, vol. 2, no. 4, pp. 184–193, 2008.
- Y. Davidor, “Epistasis variance: suitability of a representation to genetic algorithms,” Complex Systems, vol. 4, no. 4, pp. 369–383, 1990.
- D. Seo, Y. Kim, and B. R. Moon, “New entropy-based measures of gene significance and epistasis,” in Proceedings of the Genetic and Evolutionary Computation Conference, vol. 2724, pp. 1345–1356, 2003.
- D. Seo, S. Choi, and B. Moon, “New epistasis measures for detecting independently optimizable partitions of variables,” in Proceedings of the Genetic and Evolutionary Computation Conference, pp. 150–161, 2004.
- M. Ventresca and B. Ombuki-Berman, “Epistasis in multi-objective evolutionary recurrent neuro-controllers,” in Proceedings of the 1st IEEE Symposium on Artificial Life, IEEE-ALife'07, pp. 77–84, USA, April 2007.
- D.-I. Seo and B.-R. Moon, “Computing the variance of large-scale traveling salesman problems,” in Proceedings of the GECCO 2005 - Genetic and Evolutionary Computation Conference, pp. 1169–1176, USA, June 2005.
- C. R. Reeves and C. C. Wright, “Epistasis in genetic algorithms: an experimental design perspective,” in Proceedings of the International Conference on Genetic Algorithms, pp. 217–224, 1995.
- B. Naudts and L. Kallel, “A comparison of predictive measures of problem difficulty in evolutionary algorithms,” IEEE Transactions on Evolutionary Computation, vol. 4, no. 1, pp. 1–15, 2000.
- D. Beasley, D. R. Bull, and R. R. Martin, “Reducing epistasis in combinatorial problems by expansive coding,” in Proceedings of the International Conference on Genetic Algorithms, pp. 400–407, 1993.
- S. H. Friedberg, A. J. Insel, and L. E. Spence, Linear Algebra, Prentice Hall, Upper Saddle River, NJ, USA, 3rd edition, 1997.
- J. J. Grefenstette, “Optimization of control parameters for genetic algorithms,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 16, no. 1, pp. 122–128, 1986.
- Y. Yoon and Y.-H. Kim, “A mathematical design of genetic operators on ,” Mathematical Problems in Engineering, vol. 2014, Article ID 540936, 8 pages, 2014.
- A. Moraglio, R. Poli, and R. Seehuus, “Geometric crossover for biological sequences,” in Proceedings of the European Conference on Genetic Programming, pp. 121–132, 2006.
- R. A. Wagner and M. J. Fischer, “The string-to-string correction problem,” Journal of the ACM, vol. 21, pp. 168–173, 1974.
Copyright © 2019 Junghwan Lee and Yong-Hyuk Kim. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.