Mathematical Modeling and Analysis of Soft Computing
Research Article | Open Access
Yong-Hyuk Kim, Yourim Yoon, "Linkage-Based Distance Metric in the Search Space of Genetic Algorithms", Mathematical Problems in Engineering, vol. 2015, Article ID 680624, 6 pages, 2015. https://doi.org/10.1155/2015/680624
Linkage-Based Distance Metric in the Search Space of Genetic Algorithms
We propose a new distance metric, based on the linkage of genes, in the search space of genetic algorithms. This second-order distance measure is derived from the gene interaction graph and first-order distance, which is a natural distance in chromosomal spaces. We show that the proposed measure forms a metric space and can be computed efficiently. As an example application, we demonstrate how this measure can be used to estimate the extent to which gene rearrangement improves the performance of genetic algorithms.
1. Introduction
Distance metrics are fundamental tools for organizing search spaces, because introducing a metric is the simplest way to induce a topology. Different metrics produce different topologies and thus change the shape of the search space. When a space is to be searched by a genetic algorithm (GA), a good distance metric facilitates navigation of the space [2–5] and can also improve the effectiveness of search [6–12]. Hamming distance is a popular metric for discrete spaces searched by GAs, and it has also been widely used in analyses of solution spaces [13–15].
Fitness distance correlation (FDC), proposed by Jones and Forrest, is a measure of the effectiveness of a distance metric in a space to be searched by a GA. An FDC is obtained by measuring the correlation between fitness and the distance to the nearest global optimum over a number of sample solutions. FDC coefficients range from $-1$ to $1$; higher values suggest increased difficulty in maximizing fitness and decreased difficulty in minimizing fitness. When a GA is hybridized with a local optimization, the population consists entirely of local optima, and it is then more useful to determine FDCs of local-optimum spaces.
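As a rough illustration of the FDC computation just described, the following Python sketch computes the Pearson correlation between fitness values and distances to the nearest global optimum. It assumes the distances have already been computed with whatever metric is under study; the function name `fdc` is hypothetical.

```python
import math


def fdc(fitnesses, distances):
    """Fitness distance correlation: Pearson correlation between the fitness
    of each sampled solution and its distance to the nearest global optimum."""
    n = len(fitnesses)
    if n != len(distances) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_f = sum(fitnesses) / n
    mean_d = sum(distances) / n
    # Covariance and standard deviations (population form; the 1/n factors cancel).
    cov = sum((f - mean_f) * (d - mean_d) for f, d in zip(fitnesses, distances)) / n
    sd_f = math.sqrt(sum((f - mean_f) ** 2 for f in fitnesses) / n)
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in distances) / n)
    if sd_f == 0 or sd_d == 0:
        raise ValueError("FDC is undefined when fitness or distance is constant")
    return cov / (sd_f * sd_d)


# Fitness falls as distance to the optimum grows: an "easy" landscape for
# maximization, so the coefficient is -1 up to floating-point rounding.
print(fdc([3.0, 2.0, 1.0], [0, 1, 2]))
```

A sample in which fitness rises with distance instead gives a coefficient near $+1$, the misleading case for maximization.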
In this paper, we propose a new distance measure that takes account of gene interaction and show that it forms a metric space. We use this metric to compute FDCs of search spaces and show that FDCs obtained in this way correlate more closely with the improvement in GA performance that can be obtained by gene rearrangement. The remainder of this paper is organized as follows. In Section 2, we review gene rearrangement in GAs. In Section 3, we propose a new distance measure for GAs, show that it forms a metric space, and demonstrate an application. Finally, we draw conclusions in Section 4.
2. Gene Rearrangement
Holland’s schema theorem shows that schemata (i.e., groups of genes) with high fitness, short defining length, and low order have high probabilities of survival in a standard GA.
These durable schemata are called building blocks. They make a major contribution to fitness and have a high degree of mutual interaction. The performance of a GA is strongly dependent on the survival and reproduction of these building blocks.
The survival probability of a gene group under crossover is strongly affected by the positions of its genes in the chromosome. Schemata whose genes occupy scattered positions tend to have long defining lengths and are therefore easily disrupted. Thus, the strategy used for placing genes significantly affects the performance of a GA. Inversion is an operator that changes the locations of genes while a GA is running, and the process of rearranging genes dynamically to improve performance is called linkage learning. The messy GA is an example of a technique that implicitly uses dynamic gene rearrangement.
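The effect of gene placement on schema survival can be made concrete with Holland's standard bound for one-point crossover: a schema $H$ on a chromosome of length $l$ survives with probability at least $1 - p_c\,\delta(H)/(l-1)$, where $\delta(H)$ is the defining length and $p_c$ the crossover rate (mutation ignored). The sketch below, with hypothetical helper names, compares two schemata that fix the same number of genes but place them differently.

```python
def defining_length(schema):
    """delta(H): distance between the first and last fixed (non-'*') positions."""
    fixed = [i for i, s in enumerate(schema) if s != "*"]
    return fixed[-1] - fixed[0] if fixed else 0


def order(schema):
    """o(H): number of fixed positions."""
    return sum(1 for s in schema if s != "*")


def survival_lower_bound(schema, p_c=1.0):
    """Schema-theorem lower bound on surviving one-point crossover
    (mutation ignored): 1 - p_c * delta(H) / (l - 1)."""
    l = len(schema)
    if l < 2:
        return 1.0  # no cut point exists, so crossover cannot disrupt H
    return 1.0 - p_c * defining_length(schema) / (l - 1)


# Same two fixed genes, scattered versus adjacent, on a chromosome of length 6.
for h in ["1****1", "11****"]:
    print(h, order(h), defining_length(h), survival_lower_bound(h))
```

Placing the two fixed genes next to each other raises the bound from 0 to 0.8, which is the intuition behind rearranging genes so that interacting genes sit close together.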
It has been observed that the performance of GAs on problems with a locus-based encoding can be improved by rearranging the indices of the genes before running the GA. Static gene rearrangement was first suggested by Bui and Moon [20, 21], who rearranged genes within a chromosomal representation to improve the quality of schemata and to help the GA preserve the better schemata. Many studies on the static rearrangement of gene positions [20–24] have reported performance improvements. However, the improvement achieved in this way has been shown to vary greatly between problem instances. This motivated us to develop a distance metric that helps estimate how much a GA's performance on a particular problem instance can be expected to improve through gene rearrangement.
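Static rearrangement amounts to choosing a permutation of gene indices before the GA runs. As one hedged illustration, not necessarily the reordering used in the cited studies, the sketch below orders genes by breadth-first search over the gene interaction graph and measures the total chromosome distance between interacting genes; the function names and the edge-span measure are assumptions of this example.

```python
from collections import deque


def bfs_order(n, edges, start=0):
    """Gene order given by breadth-first search of the interaction graph from
    `start`; other components are visited from their smallest unvisited index."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    for nbrs in adj:
        nbrs.sort()
    seen = [False] * n
    order = []
    for s in [start] + list(range(n)):
        if seen[s]:
            continue
        seen[s] = True
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    queue.append(w)
    return order


def total_edge_span(order, edges):
    """Sum, over interacting gene pairs, of their distance on the chromosome."""
    pos = {g: i for i, g in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


# A path of interactions whose gene labels are scrambled.
edges = [(0, 3), (3, 1), (1, 4), (4, 2)]
print(total_edge_span(list(range(5)), edges))      # 10 with the original order
print(total_edge_span(bfs_order(5, edges), edges))  # 4 after rearrangement
```

On this scrambled path, the breadth-first order places every interacting pair next to each other, reducing the total span from 10 to 4.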
3. A Linkage-Based Distance Measure
3.1. Second-Order Distance Measure
The most common first-order distance measure in discrete spaces is the Hamming distance, which is also a natural distance in chromosomal spaces, although there are other first-order distance measures, such as the quotient metric in redundant encodings. We now define a second-order distance measure derived from first-order distance. Given a problem instance, consider the unweighted undirected graph $G = (V, E)$ representing first-order gene interaction, that is, the pairwise interaction of genes. For convenience, we assume that each gene interacts with itself, so that $(v, v) \in E$ for each gene $v \in V$. Let $A$ be the adjacency matrix of $G$ and consider $A$ as a binary matrix over $\mathbb{Z}_2$ [25–27].
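Definition 1. Suppose that the inverse of $A$ exists as a binary matrix over $\mathbb{Z}_2$; that is, $AA^{-1} = A^{-1}A = I$. One defines the second-order distance measure $d_2$ for chromosomes $x, y \in \mathbb{Z}_2^n$ as follows:

$$d_2(x, y) = \lVert A^{-1}(x \oplus y) \rVert,$$

where $\oplus$ is a vector summation operator, which performs a Boolean XOR (i.e., $0 \oplus 0 = 0$, $0 \oplus 1 = 1$, $1 \oplus 0 = 1$, and $1 \oplus 1 = 0$) in each coordinate, and $\lVert \cdot \rVert$ is a norm derived from the first-order distance metric $d_1$ (i.e., $\lVert x \oplus y \rVert = d_1(x, y)$).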
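The following Python sketch computes $d_2$ as defined above, assuming the Hamming norm (number of ones) as $\lVert\cdot\rVert$ and chromosomes given as 0/1 lists. It inverts $A$ over $\mathbb{Z}_2$ by Gauss-Jordan elimination; all function names are hypothetical, and the code is an illustration rather than the authors' implementation.

```python
def adjacency_with_self_loops(n, edges):
    """Adjacency matrix of the gene interaction graph over Z_2, with
    A[v][v] = 1 because every gene is assumed to interact with itself."""
    A = [[0] * n for _ in range(n)]
    for v in range(n):
        A[v][v] = 1
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A


def gf2_inverse(A):
    """Invert a square binary matrix over Z_2 by Gauss-Jordan elimination
    on the augmented matrix [A | I]; returns None if A is singular."""
    n = len(A)
    M = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col]), None)
        if pivot is None:
            return None  # no pivot in this column: A is singular over Z_2
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col]:
                # Row addition over Z_2 is element-wise XOR.
                M[r] = [a ^ b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def mat_vec(M, x):
    """Matrix-vector product over Z_2."""
    return [sum(m & xi for m, xi in zip(row, x)) % 2 for row in M]


def second_order_distance(A_inv, x, y):
    """d_2(x, y) = ||A^{-1}(x XOR y)|| with the Hamming norm (count of ones)."""
    diff = [a ^ b for a, b in zip(x, y)]
    return sum(mat_vec(A_inv, diff))


A = adjacency_with_self_loops(3, [(0, 1), (1, 2)])
A_inv = gf2_inverse(A)
print(A_inv)                                                # [[0, 1, 1], [1, 1, 1], [1, 1, 0]]
print(second_order_distance(A_inv, [0, 0, 0], [1, 1, 0]))  # 1 (Hamming distance 2)
print(second_order_distance(A_inv, [0, 0, 0], [1, 0, 0]))  # 2 (Hamming distance 1)
```

For the three-gene path with edges (0, 1) and (1, 2), the second-order distance can be smaller or larger than the Hamming distance, depending on how the differing genes are linked.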
Theorem 2. $d_2$ is a metric.
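Proof. It is enough to show the following four conditions. Throughout, note that $A^{-1}(x \oplus y) = A^{-1}x \oplus A^{-1}y$ because matrix multiplication over $\mathbb{Z}_2$ is linear, so that $d_2(x, y) = \lVert A^{-1}x \oplus A^{-1}y \rVert = d_1(A^{-1}x, A^{-1}y)$.
(i) Nonnegativity: since $d_2(x, y) = d_1(A^{-1}x, A^{-1}y)$ and $d_1$ is a metric, $d_2(x, y) \ge 0$ for all $x$ and $y$ in $\mathbb{Z}_2^n$.
(ii) Identity of indiscernibles: since $A$ is invertible,
$$d_2(x, y) = 0 \iff d_1(A^{-1}x, A^{-1}y) = 0 \iff A^{-1}x = A^{-1}y \iff x = y.$$
(iii) Symmetry:
$$d_2(x, y) = \lVert A^{-1}(x \oplus y) \rVert = \lVert A^{-1}(y \oplus x) \rVert = d_2(y, x).$$
(iv) Triangle inequality:
$$d_2(x, z) = d_1(A^{-1}x, A^{-1}z) \le d_1(A^{-1}x, A^{-1}y) + d_1(A^{-1}y, A^{-1}z) = d_2(x, y) + d_2(y, z).$$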
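For small instances the four conditions can also be checked exhaustively. The sketch below encodes chromosomes as $n$-bit integers, tabulates $Az$ for every $z$ to obtain $A^{-1}$ by brute force (feasible only for small $n$), and tests every pair and triple. It assumes the Hamming norm, and the helper names are hypothetical.

```python
from itertools import product


def rows_as_masks(n, edges):
    """Row v of A (self-loops included) as an n-bit mask; bit u is set iff
    genes u and v interact. A is symmetric, so rows equal columns."""
    rows = [1 << v for v in range(n)]
    for u, v in edges:
        rows[u] |= 1 << v
        rows[v] |= 1 << u
    return rows


def apply_A(rows, z):
    """A z over Z_2: bit i of the result is the parity of (row_i AND z)."""
    out = 0
    for i, r in enumerate(rows):
        if bin(r & z).count("1") % 2:
            out |= 1 << i
    return out


def make_d2(n, edges):
    """d_2 on n-bit integers with the Hamming norm. A^{-1} is obtained by
    tabulating A z for all 2^n vectors z (small n only)."""
    rows = rows_as_masks(n, edges)
    inverse = {apply_A(rows, z): z for z in range(1 << n)}
    if len(inverse) != 1 << n:
        raise ValueError("A is singular over Z_2")

    def d2(x, y):
        return bin(inverse[x ^ y]).count("1")

    return d2


def is_metric(n, d):
    """Exhaustively test the four metric axioms on all n-bit chromosomes."""
    points = range(1 << n)
    for x, y in product(points, repeat=2):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y) or d(x, y) != d(y, x):
            return False
    return all(d(x, z) <= d(x, y) + d(y, z)
               for x, y, z in product(points, repeat=3))


d2 = make_d2(4, [(0, 1), (1, 2), (2, 3)])  # path on 4 genes: A is invertible
print(is_metric(4, d2))
```

A single edge (0, 1) gives $A = \begin{pmatrix}1&1\\1&1\end{pmatrix}$, which is singular over $\mathbb{Z}_2$, so `make_d2` rejects it.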
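If the inverse of $A$ does not exist, we can extend the scope of the distance metric using the following well-defined formulation:

$$\tilde{d}_2(x, y) = \min \{ \lVert z \rVert : z \in \mathbb{Z}_2^n,\ Az = x \oplus y \}.$$

We note that if the inverse of $A$ exists, then $Az = x \oplus y$ implies $z = A^{-1}(x \oplus y)$, and hence $\tilde{d}_2(x, y) = d_2(x, y)$. Our second-order distance and its extension can be computed in $O(n^3)$ time by a variant of Gauss-Jordan elimination, where $n$ is the number of genes.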
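When $A$ is singular, the extended distance can be read as the minimum Hamming weight of $z$ over all solutions of $Az = x \oplus y$. The sketch below finds that minimum by Gauss-Jordan elimination followed by enumeration of the solution set $z_0 \oplus \ker A$, that is, all $2^k$ combinations of a null-space basis of dimension $k$. Its cost therefore grows exponentially in $k$, and it is meant only for small examples. It assumes the Hamming norm and returns `math.inf` when $x \oplus y$ lies outside the column space of $A$; that convention and all names are assumptions of this sketch.

```python
import math


def rows_as_masks(n, edges):
    """Row v of A (self-loops included) as an n-bit mask."""
    rows = [1 << v for v in range(n)]
    for u, v in edges:
        rows[u] |= 1 << v
        rows[v] |= 1 << u
    return rows


def solve_gf2(rows, n, b):
    """Gauss-Jordan elimination over Z_2 for A z = b (b an n-bit mask).
    Returns (particular solution, null-space basis) or None if inconsistent."""
    aug = [(rows[i], (b >> i) & 1) for i in range(n)]  # (coefficients, rhs bit)
    pivots = []
    r = 0
    for col in range(n):
        p = next((i for i in range(r, n) if (aug[i][0] >> col) & 1), None)
        if p is None:
            continue  # col is a free variable
        aug[r], aug[p] = aug[p], aug[r]
        for i in range(n):
            if i != r and (aug[i][0] >> col) & 1:
                aug[i] = (aug[i][0] ^ aug[r][0], aug[i][1] ^ aug[r][1])
        pivots.append(col)
        r += 1
    if any(rhs for _, rhs in aug[r:]):
        return None  # a zero row with rhs 1: b is outside the column space
    z0 = 0
    for i, col in enumerate(pivots):
        z0 |= aug[i][1] << col  # free variables set to 0
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = 1 << f  # set free variable f to 1 and solve for the pivot variables
        for i, col in enumerate(pivots):
            v |= ((aug[i][0] >> f) & 1) << col
        basis.append(v)
    return z0, basis


def extended_d2(rows, n, x, y):
    """Minimum Hamming weight of z with A z = x XOR y; math.inf if none exists."""
    sol = solve_gf2(rows, n, x ^ y)
    if sol is None:
        return math.inf
    z0, basis = sol
    best = math.inf
    for mask in range(1 << len(basis)):  # enumerate z0 XOR span(basis)
        z = z0
        for k, v in enumerate(basis):
            if (mask >> k) & 1:
                z ^= v
        best = min(best, bin(z).count("1"))
    return best


single_edge = rows_as_masks(2, [(0, 1)])  # A = [[1, 1], [1, 1]] is singular
print(extended_d2(single_edge, 2, 0b00, 0b11))  # 1
print(extended_d2(single_edge, 2, 0b00, 0b01))  # inf
```

When $A$ is invertible the null-space basis is empty, so the loop examines a single vector and the result agrees with $d_2$.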
3.2. An Application
Intuitively, our measure of the distance between two chromosomes can be understood as the minimum number of bits that must be changed to transform one chromosome into the other in the genetic process using optimal gene rearrangement.
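One way to picture a minimum of the form $\min\{\lVert z \rVert : Az = x \oplus y\}$ with the Hamming norm: if a single move selects a gene $v$ and toggles $v$ together with every gene it interacts with (adding row $v$ of $A$ over $\mathbb{Z}_2$), then, because such moves commute and repeating a move cancels it, the fewest moves from $x$ to $y$ equal that minimum. This move model is our illustration, in the spirit of Anderson and Feil's "Turning lights out with linear algebra", not the authors' stated construction. The breadth-first search below, with hypothetical names, counts the moves directly.

```python
from collections import deque


def rows_as_masks(n, edges):
    """Row v of A (self-loops included) as an n-bit mask."""
    rows = [1 << v for v in range(n)]
    for u, v in edges:
        rows[u] |= 1 << v
        rows[v] |= 1 << u
    return rows


def fewest_moves(n, edges, x, y):
    """Breadth-first search over n-bit chromosomes where one move picks a
    gene v and toggles v and all genes interacting with it (XOR with row v).
    Returns the minimum number of moves from x to y, or None if unreachable."""
    moves = rows_as_masks(n, edges)
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        if u == y:
            return dist[u]
        for m in moves:
            w = u ^ m
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return None


path3 = [(0, 1), (1, 2)]
print(fewest_moves(3, path3, 0b000, 0b001))  # 2: change only gene 0
print(fewest_moves(3, path3, 0b000, 0b011))  # 1: change genes 0 and 1
```

On the path with edges (0, 1) and (1, 2), changing only gene 0 takes two moves (on genes 1 and 2), whereas with the single edge (0, 1) changing only gene 0 is impossible and the search returns `None`.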
Given an undirected graph $G = (V, E)$ with edge weights $w_{ij}$, the max-cut problem is that of finding a subset $S \subseteq V$ that maximizes the sum of the weights of the edges crossing the cut between $S$ and $V \setminus S$ [29–31]. Consider a 6-node max-cut problem instance, which is to maximize an expression of the form

$$\sum_{(v_i, v_j) \in E} w_{ij} \, (x_i \oplus x_j),$$

where vertex $v_i$ belongs to position $i$ of the chromosome $x$ and $\oplus$ is the Boolean XOR operator. In this problem instance, some edges increase the fitness and others reduce it. In the max-cut problem, the given graph with its edge weights removed can be regarded as the first-order gene interaction graph (see, e.g., Figure 1(a)). Figure 1(b) shows an example of how the Hamming and second-order distances between two chromosomes are obtained by optimal gene arrangement of the gene interaction graph. If we use the normalized Hamming distance (developed for the 2-grouping problem) [32, 33] as the first-order distance measure, the FDC of this problem differs from the FDC obtained when our second-order distance is used.
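The max-cut fitness is straightforward to evaluate. The sketch below uses a hypothetical 4-node instance with one negative-weight edge (not the 6-node instance of Figure 1, whose weights are not reproduced in the text) and finds the optimum by brute force; the function names are assumptions.

```python
from itertools import product


def cut_value(x, weighted_edges):
    """Max-cut fitness: sum of w_ij * (x_i XOR x_j) over weighted edges (i, j, w_ij)."""
    return sum(w * (x[i] ^ x[j]) for i, j, w in weighted_edges)


def best_cut(n, weighted_edges):
    """Exhaustive search over all 2^n chromosomes (small n only)."""
    return max(product((0, 1), repeat=n),
               key=lambda x: cut_value(x, weighted_edges))


# Hypothetical 4-node instance: edge (2, 3) has a negative weight and lowers fitness.
edges = [(0, 1, 1), (1, 2, 1), (2, 3, -1), (0, 3, 1)]
x_best = best_cut(4, edges)
print(x_best, cut_value(x_best, edges))
```

Cutting exactly the three positive-weight edges would require $x_0 \ne x_1$, $x_1 \ne x_2$, $x_2 = x_3$, and $x_3 \ne x_0$, which are inconsistent around the cycle, so the best achievable value here is 2.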
Given a graph $G = (V, E)$ and its adjacency matrix, the graph bipartitioning problem is that of minimizing an objective with two terms: a cut term, which counts the edges whose endpoints lie in different parts (vertex $v_i$ belonging to position $i$ of the chromosome), and a balancing term, weighted by a positive constant, that penalizes unbalanced partitions. If we ignore the balancing term altogether, we can regard the given graph as the first-order gene interaction graph of the problem instance. Bui and Moon tried gene rearrangement in a GA for graph bipartitioning and obtained dramatic improvements in performance for some graphs. We hypothesized that FDCs calculated using our second-order distance would help identify the graphs whose GA performance could benefit most from gene rearrangement. Figure 2 shows the relationship between FDC and the performance improvement of a GA on 16 benchmark graphs (8 random graphs and 8 random geometric graphs) that were used in [34–40].
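Figure 2: Relationship between FDC and the performance improvement of a GA obtained by gene rearrangement. (a) First-order distance. (b) Second-order distance.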
Here, the performance improvement means the difference in percentage between the average performances of a GA with and without gene rearrangement (data from ). The FDC values were approximated from 10,000 randomly generated local optima. When the first-order (normalized Hamming) distance was used, there was little correlation with the change in performance, but our second-order distance provided a clear correlation (see Figure 2(b) and Table 1).
Table 1: Change in GA performance obtained by gene rearrangement.
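The graph bipartitioning objective described in this subsection can be sketched as follows. The cut term follows the description in the text (pairs of adjacent vertices placed in different parts). The balancing term is assumed here to be the squared difference between the part sizes, scaled by a hypothetical parameter `penalty`. This is a common choice but not necessarily the exact term used in the paper, and the function name is hypothetical.

```python
def bipartition_cost(x, A, penalty):
    """Cut size plus `penalty` times the squared imbalance between the parts.
    x is a 0/1 list (x[i] is the part of vertex i); A is a symmetric 0/1
    adjacency matrix. The squared-imbalance term is an assumption."""
    n = len(x)
    # Cut term: count adjacent pairs (i < j) placed in different parts.
    cut = sum(A[i][j] * (x[i] ^ x[j]) for i in range(n) for j in range(i + 1, n))
    ones = sum(x)
    imbalance = ones - (n - ones)
    return cut + penalty * imbalance ** 2


# 4-cycle 0-1-2-3-0.
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
print(bipartition_cost([0, 0, 1, 1], A, 1))  # 2: balanced, two cut edges
print(bipartition_cost([0, 0, 0, 0], A, 1))  # 16: no cut edges, fully unbalanced
```

Setting `penalty = 0` leaves only the pairwise cut term, which corresponds to regarding the graph itself as the gene interaction graph.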
4. Concluding Remarks
In previous work, distances between chromosomes in GAs have usually been first-order distances, in particular the Hamming distance. We have proposed a second-order distance measure for GAs, which we consider to be more meaningful. We have shown that this distance measure forms a metric space and that it can be computed efficiently.
Using second-order distance allows us to see problem spaces from a different viewpoint. We have demonstrated its value in predicting the effectiveness of gene rearrangement, and we envisage it providing further understanding of the working mechanism of GAs.
A preliminary version of this paper appeared in the Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1393–1399, 2005.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This research was supported by the Gachon University research fund of 2014 (GCU-2014-0121).
References
- D. W. Kahn, Topology: An Introduction to the Point-set and Algebraic Areas, Dover Publications, New York, NY, USA, 1995.
- Y.-H. Kim and B.-R. Moon, “New usage of Sammon's mapping for genetic visualization,” in Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1136–1147, 2003.
- A. Moraglio and R. Poli, “Topological interpretation of crossover,” in Proceedings of the Genetic and Evolutionary Computation Conference, vol. 1, pp. 1377–1388, 2004.
- M. Wineberg and F. Oppacher, “Distance between populations,” in Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1481–1492, 2003.
- Y. Yoon and Y.-H. Kim, “Geometricity of genetic operators for real-coded representation,” Applied Mathematics and Computation, vol. 219, no. 23, pp. 10915–10927, 2013.
- S.-S. Choi and B.-R. Moon, “Normalization in genetic algorithms,” in Proceedings of the Genetic and Evolutionary Computation Conference, pp. 862–873, 2003.
- Y.-H. Kim, A. Moraglio, A. Kattan, and Y. Yoon, “Geometric generalisation of surrogate model-based optimisation to combinatorial and program spaces,” Mathematical Problems in Engineering, vol. 2014, Article ID 184540, 10 pages, 2014.
- Y.-H. Kim, A. Moraglio, Y. Yoon, and B.-R. Moon, “Geometric crossover for multiway graph partitioning,” in Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1217–1224, July 2006.
- A. Moraglio, Y.-H. Kim, Y. Yoon, and B.-R. Moon, “Geometric crossovers for multiway graph partitioning,” Evolutionary Computation, vol. 15, no. 4, pp. 445–474, 2007.
- Y. Yoon and Y.-H. Kim, “An efficient genetic algorithm for maximum coverage deployment in wireless sensor networks,” IEEE Transactions on Cybernetics, vol. 43, no. 5, pp. 1473–1483, 2013.
- Y. Yoon, Y.-H. Kim, A. Moraglio, and B.-R. Moon, “Quotient geometric crossovers and redundant encodings,” Theoretical Computer Science, vol. 425, pp. 4–16, 2012.
- Y. Yoon, Y.-H. Kim, A. Moraglio, and B.-R. Moon, “A theoretical and empirical study on unbiased boundary-extended crossover for real-valued representation,” Information Sciences, vol. 183, pp. 48–65, 2012.
- K. D. Boese, A. B. Kahng, and S. Muddu, “A new adaptive multi-start technique for combinatorial global optimizations,” Operations Research Letters, vol. 16, no. 2, pp. 101–113, 1994.
- T. Jones and S. Forrest, “Fitness distance correlation as a measure of problem difficulty for genetic algorithms,” in Proceedings of the 6th International Conference on Genetic Algorithms, pp. 184–192, Pittsburgh, Pa, USA, July 1995.
- P. Merz and B. Freisleben, “Fitness landscape analysis and memetic algorithms for the quadratic assignment problem,” IEEE Transactions on Evolutionary Computation, vol. 4, no. 4, pp. 337–352, 2000.
- J. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, Mich, USA, 1975.
- J. Bagley, The behavior of adaptive systems which employ genetic and correlation algorithms [Ph.D. thesis], University of Michigan, Ann Arbor, Mich, USA, 1967.
- G. R. Harik and D. E. Goldberg, “Learning linkage,” in Foundations of Genetic Algorithms, vol. 4, pp. 247–262, Morgan Kaufmann, San Francisco, Calif, USA, 1996.
- D. E. Goldberg, B. Korb, and K. Deb, “Messy genetic algorithms: motivation, analysis, and first results,” Complex Systems, vol. 3, no. 5, pp. 493–530, 1989.
- T. N. Bui and B.-R. Moon, “Hyperplane synthesis for genetic algorithms,” in Proceedings of the 5th International Conference on Genetic Algorithms, pp. 102–109, July 1993.
- T. N. Bui and B.-R. Moon, “Genetic algorithm and graph partitioning,” IEEE Transactions on Computers, vol. 45, no. 7, pp. 841–855, 1996.
- T. N. Bui and B.-R. Moon, “New genetic approach for the traveling salesman problem,” in Proceedings of the 1st IEEE Conference on Evolutionary Computation, pp. 7–12, June 1994.
- Y.-H. Kim, Y.-K. Kwon, and B.-R. Moon, “Problem-independent schema synthesis for genetic algorithms,” in Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1112–1122, Chicago, Ill, USA, July 2003.
- B.-R. Moon and C. K. Kim, “A two-dimensional embedding of graphs for genetic algorithms,” in Proceedings of the International Conference on Genetic Algorithms, pp. 204–211, 1997.
- Y.-H. Kim and K. Seo, “Two congruence classes for symmetric binary matrices over ,” WSEAS Transactions on Mathematics, vol. 7, no. 6, pp. 339–343, 2008.
- Y.-H. Kim and Y. Yoon, “Effect of changing the basis in genetic algorithms using binary encoding,” KSII Transactions on Internet and Information Systems, vol. 2, no. 4, pp. 184–193, 2008.
- Y. Yoon and Y.-H. Kim, “A mathematical design of genetic operators on ,” Mathematical Problems in Engineering, vol. 2014, Article ID 540936, 8 pages, 2014.
- M. Anderson and T. Feil, “Turning lights out with linear algebra,” Mathematics Magazine, vol. 71, no. 4, pp. 300–303, 1998.
- S.-H. Kim, Y.-H. Kim, and B.-R. Moon, “A hybrid genetic algorithm for the max cut problem,” in Proceedings of the Genetic and Evolutionary Computation Conference, pp. 416–423, 2001.
- K. Seo, S. Hyun, and Y.-H. Kim, “A spanning tree-based encoding of the MAX CUT problem for evolutionary search,” in Proceedings of the International Conference on Parallel Problem Solving from Nature, vol. 7491 of Lecture Notes in Computer Science, pp. 510–518, 2012.
- K. Seo, S. Hyun, and Y.-H. Kim, “An edge-set representation based on spanning tree for searching cut space,” IEEE Transactions on Evolutionary Computation, 2014.
- Y.-H. Kim and B.-R. Moon, “Investigation of the fitness landscapes and multi-parent crossover for graph bipartitioning,” in Genetic and Evolutionary Computation—GECCO 2003, vol. 2723 of Lecture Notes in Computer Science, pp. 1123–1135, Springer, Berlin, Germany, 2003.
- Y.-H. Kim and B.-R. Moon, “Investigation of the fitness landscapes in graph bipartitioning: an empirical study,” Journal of Heuristics, vol. 10, no. 2, pp. 111–133, 2004.
- I. Hwang, Y.-H. Kim, and B.-R. Moon, “Multi-attractor gene reordering for graph bisection,” in Proceedings of the 8th Annual Genetic and Evolutionary Computation Conference, pp. 1209–1215, July 2006.
- D. S. Johnson, C. R. Aragon, L. A. McGeoch, and C. Schevon, “Optimization by simulated annealing: an experimental evaluation, part I. Graph partitioning,” Operations Research, vol. 37, no. 6, pp. 865–892, 1989.
- Y.-H. Kim, “An enzyme-inspired approach to surmount barriers in graph bisection,” in Proceedings of the International Conference on Computational Science and Its Applications, vol. 5072 of Lecture Notes in Computer Science, pp. 841–851, 2008.
- Y.-H. Kim and B.-R. Moon, “A hybrid genetic search for graph partitioning based on lock gain,” in Proceedings of the Genetic and Evolutionary Computation Conference, pp. 167–174, 2000.
- Y.-H. Kim and B.-R. Moon, “Lock-gain based graph partitioning,” Journal of Heuristics, vol. 10, no. 1, pp. 37–57, 2004.
- Y. Yoon and Y.-H. Kim, “New bucket managements in iterative improvement partitioning algorithms,” Applied Mathematics and Information Sciences, vol. 7, no. 2, pp. 529–532, 2013.
- Y. Yoon and Y.-H. Kim, “Vertex ordering, clustering, and their application to graph partitioning,” Applied Mathematics and Information Sciences, vol. 8, no. 1, pp. 135–138, 2014.
Copyright © 2015 Yong-Hyuk Kim and Yourim Yoon. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.