Mathematical Problems in Engineering

Volume 2015 (2015), Article ID 680624, 6 pages

http://dx.doi.org/10.1155/2015/680624

## Linkage-Based Distance Metric in the Search Space of Genetic Algorithms

^{1}Department of Computer Science & Engineering, Kwangwoon University, 20 Kwangwoon-ro, Nowon-gu, Seoul 139-701, Republic of Korea

^{2}Department of Computer Engineering, College of Information Technology, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si, Gyeonggi-do 461-701, Republic of Korea

Received 31 July 2014; Accepted 7 September 2014

Academic Editor: Shifei Ding

Copyright © 2015 Yong-Hyuk Kim and Yourim Yoon. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

We propose a new distance metric, based on the linkage of genes, in the search space of genetic algorithms. This second-order distance measure is derived from the gene interaction graph and first-order distance, which is a natural distance in chromosomal spaces. We show that the proposed measure forms a metric space and can be computed efficiently. As an example application, we demonstrate how this measure can be used to estimate the extent to which gene rearrangement improves the performance of genetic algorithms.

#### 1. Introduction

Distance metrics are fundamental tools for organizing search spaces, because the introduction of a metric is the simplest way to induce a topology [1]. Different metrics produce different topologies and thus change the shape of the search space. When a space is to be searched by a genetic algorithm (GA), a good distance metric facilitates navigation of the space [2–5] and can also improve the effectiveness of search [6–12]. Hamming distance is a popular metric in a discrete space that is to be searched by a GA. Hamming distance has also been widely used in analyses of solution spaces [13–15].

Fitness distance correlation (FDC), proposed by Jones and Forrest [14], is a measure of the effectiveness of a distance metric in a space to be searched by a GA. An FDC is obtained by measuring the correlation between fitness and the distance to the nearest global optimum for a number of sample solutions. FDC coefficients range from $-1$ to $1$, where higher values suggest increased difficulty in maximizing fitness and decreased difficulty in minimizing fitness. When a GA is hybridized with a local optimization algorithm, the population consists entirely of local optima, so it is more useful to compute FDCs over the space of local optima.
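The FDC computation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes the caller supplies the sample chromosomes, a fitness function, and the list of global optima. It uses the Hamming distance as the default first-order metric and assumes that neither fitness nor distance is constant over the sample. The function names `hamming` and `fdc` are hypothetical.

```python
import math
from typing import Callable, Sequence

Chromosome = tuple[int, ...]


def hamming(x: Chromosome, y: Chromosome) -> int:
    """First-order distance: number of loci at which two chromosomes differ."""
    return sum(a != b for a, b in zip(x, y))


def fdc(samples: Sequence[Chromosome],
        fitness: Callable[[Chromosome], float],
        optima: Sequence[Chromosome],
        dist: Callable[[Chromosome, Chromosome], float] = hamming) -> float:
    """Pearson correlation between each sample's fitness and its distance
    to the nearest global optimum (population statistics, divided by n)."""
    f = [fitness(s) for s in samples]
    d = [min(dist(s, o) for o in optima) for s in samples]
    n = len(samples)
    mean_f, mean_d = sum(f) / n, sum(d) / n
    cov = sum((fi - mean_f) * (di - mean_d) for fi, di in zip(f, d)) / n
    std_f = math.sqrt(sum((fi - mean_f) ** 2 for fi in f) / n)
    std_d = math.sqrt(sum((di - mean_d) ** 2 for di in d) / n)
    return cov / (std_f * std_d)
```

For example, on OneMax (fitness is the number of ones, and the unique optimum is all ones), the fitness equals the chromosome length minus the Hamming distance to the optimum. Calling `fdc` over all chromosomes therefore returns $-1$ up to rounding, which is the easiest case for maximization. Passing a different `dist` is where an alternative metric, such as the one proposed in this paper, would plug in.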

In this paper, we propose a new distance measure that takes gene interaction into account and show that it forms a metric space. We use this metric to compute FDCs of search spaces and show that FDCs obtained in this way correlate better with the improvement in GA performance that can be obtained by gene rearrangement. The remainder of this paper is organized as follows. In Section 2, we review gene rearrangement in GAs. In Section 3, we propose a new distance measure for GAs, show that it forms a metric space, and demonstrate an application. Finally, we draw conclusions in Section 4.

#### 2. Gene Rearrangement

Holland’s schema theorem [16] shows that schemata (i.e., groups of genes) with high fitness, short defining length, and low order have high probabilities of survival in a standard GA.

These durable schemata are called *building blocks*. They make a major contribution to fitness and have a high degree of mutual interaction. The performance of a GA is strongly dependent on the survival and reproduction of these building blocks.

The probability that a group of genes survives crossover is strongly affected by the positions of those genes in the chromosome. Schemata whose genes lie in scattered positions have long defining lengths and therefore tend to be disrupted by crossover. Thus, the strategy used for placing genes significantly affects the performance of a GA. Inversion is an operator that changes the locations of genes while a GA is running [17], and the process of dynamically rearranging genes to improve performance is called *linkage learning* [18]. The messy GA [19] is an example of a technique that implicitly uses dynamic gene rearrangement.
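The effect of gene placement on survival can be made concrete with the classical defining-length argument. Assume one-point crossover with the cut chosen uniformly among the $\ell - 1$ gaps of a chromosome of length $\ell$. A schema can be split only when the cut falls inside its defining span, which happens with probability $\delta/(\ell - 1)$ for defining length $\delta$. This is the usual upper bound on disruption. The helper names below are hypothetical.

```python
from typing import Sequence


def defining_length(positions: Sequence[int]) -> int:
    """Distance between the outermost fixed loci of a schema."""
    return max(positions) - min(positions)


def one_point_cut_probability(positions: Sequence[int], length: int) -> float:
    """Probability that a uniformly chosen one-point crossover cut falls inside
    the schema's defining span (an upper bound on its disruption probability)."""
    return defining_length(positions) / (length - 1)


# Two interacting genes on a 10-gene chromosome: scattered vs. adjacent loci.
print(one_point_cut_probability([0, 9], 10))  # 1.0
print(one_point_cut_probability([4, 5], 10))  # 0.1111111111111111
```

Moving the same two genes from loci 0 and 9 to adjacent loci reduces the chance of a cut between them from certainty to 1/9. Static and dynamic gene rearrangement both aim to exploit this effect.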

It has been observed that the performance of GAs on problems with a locus-based encoding can be improved by rearranging the indices of the genes before running the GA. Static gene rearrangement was first suggested by Bui and Moon [20, 21], who rearranged genes within a chromosomal representation to improve the quality of schemata and to help the GA preserve the better schemata. Many studies on the static rearrangement of gene positions [20–24] have shown performance improvements. However, the improvement achieved in this way has been shown to vary greatly between problem instances. This motivated us to develop a distance metric that helps estimate how much the performance of a GA on a particular problem instance can be expected to improve through gene rearrangement.
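One simple heuristic for static gene rearrangement relabels the genes in breadth-first order over the gene interaction graph, so that interacting genes tend to receive nearby loci. The sketch below is illustrative only. It is not claimed to be the exact procedure of [20, 21], and the function name `bfs_gene_order` is hypothetical.

```python
from collections import deque


def bfs_gene_order(n: int, edges: list[tuple[int, int]], start: int = 0) -> list[int]:
    """Return an ordering of genes 0..n-1; gene order[i] is placed at locus i.
    A breadth-first traversal keeps genes that interact close together."""
    adj: list[list[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    order: list[int] = []
    seen = [False] * n
    # Restart from every unvisited gene so that disconnected graphs are fully covered.
    for root in [start] + list(range(n)):
        if seen[root]:
            continue
        seen[root] = True
        queue = deque([root])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in sorted(adj[u]):  # sorted for a deterministic ordering
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
    return order
```

For an interaction path 0–5–2–4–1–3, `bfs_gene_order(6, [(0, 5), (5, 2), (2, 4), (4, 1), (1, 3)])` returns `[0, 5, 2, 4, 1, 3]`. Every interacting pair then occupies adjacent loci, whereas under the original indices gene 0 and gene 5 are five loci apart.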

#### 3. A Linkage-Based Distance Measure

##### 3.1. Second-Order Distance Measure

The most common first-order distance measure in a discrete space is the Hamming distance, which is also a natural distance in chromosomal space, although there are other first-order distance measures, such as the quotient metric for redundant encodings [11]. We now define a second-order distance measure derived from a first-order distance. Given a problem instance $\mathcal{I}$, consider the unweighted undirected graph $G_{\mathcal{I}} = (V, E)$ representing first-order gene interaction [23], that is, the pairwise interaction of genes. For convenience, we assume that each gene interacts with itself, so that $(v, v) \in E$ for each gene $v \in V$. Let $A$ be the adjacency matrix of $G_{\mathcal{I}}$, and consider $A$ as a binary matrix over $\mathrm{GF}(2)$ [25–27].
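Building the matrix $A$ from a list of interacting gene pairs is straightforward. The key detail is the self-loop convention, which puts a 1 on every diagonal entry. This is a minimal sketch with genes indexed from 0, and the function name `interaction_matrix` is hypothetical. The 0/1 entries are meant to be read as elements of $\mathrm{GF}(2)$.

```python
def interaction_matrix(n: int, edges: list[tuple[int, int]]) -> list[list[int]]:
    """0/1 adjacency matrix of an undirected gene interaction graph on genes
    0..n-1, with a self-loop on every gene (all diagonal entries are 1)."""
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1  # each gene is assumed to interact with itself
    for u, v in edges:
        a[u][v] = a[v][u] = 1  # gene interaction is symmetric
    return a


print(interaction_matrix(3, [(0, 1), (1, 2)]))  # [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
```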

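*Definition 1.* Suppose that the inverse $A^{-1}$ of $A$ exists as a binary matrix over $\mathrm{GF}(2)$; that is, $AA^{-1} = A^{-1}A = I$. One defines the second-order distance measure $d_2$ for chromosomes $x, y \in \{0,1\}^n$ as follows:

$$d_2(x, y) = \left\| A^{-1}(x \oplus y) \right\|,$$

where $\oplus$ is a vector summation operator, which performs a Boolean XOR (i.e., $0 \oplus 0 = 0$, $0 \oplus 1 = 1$, $1 \oplus 0 = 1$, and $1 \oplus 1 = 0$) in each coordinate, and $\|\cdot\|$ is a norm derived from the first-order distance metric $d_1$ (i.e., $\|x\| = d_1(x, \mathbf{0})$).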
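As a small worked illustration, take a hypothetical three-gene instance, not taken from the paper, in which gene 1 interacts with gene 2 and gene 2 interacts with gene 3. Use the Hamming distance as $d_1$, and compute $d_2(x, y) = \|A^{-1}(x \oplus y)\|$ over $\mathrm{GF}(2)$ (one can check that $AA^{-1} = I$):

$$
A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix},
\qquad
A^{-1} = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 0 \end{pmatrix},
$$

$$
x \oplus y = (1, 1, 0)^{\top}:\quad d_1(x, y) = 2,\quad A^{-1}(x \oplus y) = (1, 0, 0)^{\top},\quad d_2(x, y) = 1;
$$

$$
x \oplus y = (1, 0, 0)^{\top}:\quad d_1(x, y) = 1,\quad A^{-1}(x \oplus y) = (0, 1, 1)^{\top},\quad d_2(x, y) = 2.
$$

The difference $(1, 1, 0)^{\top}$ is exactly the first column of $A$, that is, the closed neighborhood of gene 1. It therefore counts as a single unit under $d_2$, even though it spans two loci under the Hamming distance.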

*Theorem 2.* $d_2$ is a metric.

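*Proof.* It suffices to verify the following four conditions [1].

(i) Nonnegativity: since $A^{-1}(x \oplus y) \in \{0,1\}^n$ and $d_1$ is a metric, $d_2(x, y) = d_1(A^{-1}(x \oplus y), \mathbf{0}) \ge 0$ for all $x$ and $y$ in $\{0,1\}^n$.

(ii) Identity of indiscernibles: since $A^{-1}$ is invertible,
$$d_2(x, y) = 0 \iff A^{-1}(x \oplus y) = \mathbf{0} \iff x \oplus y = \mathbf{0} \iff x = y.$$

(iii) Symmetry: since $x \oplus y = y \oplus x$,
$$d_2(x, y) = \left\|A^{-1}(x \oplus y)\right\| = \left\|A^{-1}(y \oplus x)\right\| = d_2(y, x).$$

(iv) Triangle inequality: since $x \oplus z = (x \oplus y) \oplus (y \oplus z)$ and $A^{-1}$ is linear over $\mathrm{GF}(2)$,
$$d_2(x, z) = \left\|A^{-1}(x \oplus y) \oplus A^{-1}(y \oplus z)\right\| \le \left\|A^{-1}(x \oplus y)\right\| + \left\|A^{-1}(y \oplus z)\right\| = d_2(x, y) + d_2(y, z),$$
where the inequality uses $\|u \oplus v\| \le \|u\| + \|v\|$, which holds, for example, when $d_1$ is the Hamming distance.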
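For very small instances, the four conditions can also be checked exhaustively as a sanity test. The sketch below assumes the Hamming weight as the norm and an invertible $A$. It computes $\|A^{-1}(x \oplus y)\|$ by searching all $2^n$ vectors $z$ for the unique solution of $Az = x \oplus y$, so it is exponential in $n$ and purely illustrative. All function names are hypothetical.

```python
from itertools import product

Vec = tuple[int, ...]


def gf2_matvec(a: list[list[int]], v: Vec) -> Vec:
    """Matrix-vector product over GF(2)."""
    return tuple(sum(r * c for r, c in zip(row, v)) % 2 for row in a)


def d2_bruteforce(a: list[list[int]], x: Vec, y: Vec) -> int:
    """Hamming weight of the unique z with A z = x XOR y, i.e. ||A^{-1}(x XOR y)||.
    Exhaustive search over all 2^n vectors z, so only usable for tiny n."""
    target = tuple(p ^ q for p, q in zip(x, y))
    for z in product((0, 1), repeat=len(x)):
        if gf2_matvec(a, z) == target:
            return sum(z)
    raise ValueError("A z = x XOR y has no solution, so A is singular")


def satisfies_metric_axioms(a: list[list[int]]) -> bool:
    """Check the four conditions of Theorem 2 for every x, y, z in {0,1}^n."""
    space = list(product((0, 1), repeat=len(a)))
    d = {(x, y): d2_bruteforce(a, x, y) for x in space for y in space}
    for x in space:
        for y in space:
            if d[x, y] < 0 or (d[x, y] == 0) != (x == y) or d[x, y] != d[y, x]:
                return False
            if any(d[x, z] > d[x, y] + d[y, z] for z in space):
                return False
    return True


print(satisfies_metric_axioms([[1, 1, 0], [1, 1, 1], [0, 1, 1]]))  # True
```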

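If the inverse of $A$ does not exist, we can extend the scope of the distance metric using the following well-defined formulation:

$$d_2(x, y) = \min_{z \in \{0,1\}^n :\, Az = x \oplus y} \|z\|.$$

We note that if the inverse of $A$ exists, then the constraint $Az = x \oplus y$ has the unique solution $z = A^{-1}(x \oplus y)$, and hence the extension coincides with $\|A^{-1}(x \oplus y)\|$. Our second-order distance and its extension can be computed in $O(n^3)$ time by a variant of Gauss-Jordan elimination [28], where $n$ is the number of genes.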
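For the nonsingular case, the computation can be sketched as plain Gauss-Jordan elimination over $\mathrm{GF}(2)$. Here row addition is an elementwise XOR. The sketch assumes the Hamming weight as the norm and inverts $A$ once in $O(n^3)$, after which each distance costs $O(n^2)$. It simply reports a singular matrix and does not implement the extension above. It is not claimed to be the particular variant of [28], and the function names are hypothetical.

```python
from typing import Optional


def gf2_inverse(a: list[list[int]]) -> Optional[list[list[int]]]:
    """Invert a square 0/1 matrix over GF(2) by Gauss-Jordan elimination.
    Runs in O(n^3) time; returns None if the matrix is singular."""
    n = len(a)
    # Augment [A | I]; over GF(2), adding one row to another is an elementwise XOR.
    m = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col]), None)
        if pivot is None:
            return None  # no pivot in this column: A is not invertible
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col]:
                m[r] = [p ^ q for p, q in zip(m[r], m[col])]
    return [row[n:] for row in m]


def second_order_distance(a_inv: list[list[int]], x: tuple[int, ...], y: tuple[int, ...]) -> int:
    """||A^{-1}(x XOR y)||, taking the Hamming weight as the norm; O(n^2) per call."""
    diff = [p ^ q for p, q in zip(x, y)]
    return sum(sum(r * c for r, c in zip(row, diff)) % 2 for row in a_inv)


a = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]  # genes 0-1-2 in a path, with self-loops
a_inv = gf2_inverse(a)
print(a_inv)  # [[0, 1, 1], [1, 1, 1], [1, 1, 0]]
print(second_order_distance(a_inv, (1, 1, 0), (0, 0, 0)))  # 1
```

Precomputing $A^{-1}$ pays off when many distances are needed for the same instance, as in an FDC estimate over many sample solutions.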

##### 3.2. An Application

Intuitively, our measure of the distance between two chromosomes can be understood as the minimum number of bits that must be changed to transform one chromosome into the other in the genetic process using optimal gene rearrangement.

Given an undirected graph $G = (V, E)$ with edge weights $w_{ij}$, the max-cut problem is that of finding a subset $S \subseteq V$ that maximizes the sum of the weights of the edges crossing the cut $(S, V \setminus S)$ [29–31]. Consider a 6-node max-cut problem instance $G$, whose objective is to maximize an expression of the form $\sum_{(v_i, v_j) \in E} w_{ij}\,(x_i \oplus x_j)$, where vertex $v_i$ corresponds to position $i$ of the chromosome $x$ and $\oplus$ is the Boolean XOR operator. In this problem instance, some edges increase the fitness when cut and others reduce it. In the max-cut problem, the given graph with its edge weights removed can be regarded as the first-order gene interaction graph (see, e.g., Figure 1(a)). Figure 1(b) shows an example in which the Hamming and second-order distances between two chromosomes are obtained by optimal gene arrangement of the gene interaction graph $G$. If we use the normalized Hamming distance (developed for the 2-grouping problem) [32, 33] as the first-order distance measure, we obtain one FDC value for this problem, and a different FDC value when our second-order distance is used.
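The max-cut fitness described above, and the global optima that an FDC computation needs, can be sketched as follows. The weights of the paper's 6-node instance are not given here, so the instance below is purely hypothetical. Positive weights reward cutting an edge and negative weights penalize it. Exhaustive enumeration of the $2^n$ chromosomes is assumed, which is feasible only for tiny $n$. The function names are hypothetical.

```python
from itertools import product

Vec = tuple[int, ...]


def maxcut_fitness(x: Vec, weights: dict[tuple[int, int], float]) -> float:
    """Sum of w_ij * (x_i XOR x_j): an edge contributes only if it crosses the cut."""
    return sum(w * (x[i] ^ x[j]) for (i, j), w in weights.items())


def global_optima(n: int, weights: dict[tuple[int, int], float]) -> list[Vec]:
    """All maximizers, found by enumerating every chromosome (tiny n only)."""
    space = list(product((0, 1), repeat=n))
    best = max(maxcut_fitness(x, weights) for x in space)
    return [x for x in space if maxcut_fitness(x, weights) == best]


# Hypothetical 6-node instance; these are NOT the weights of the instance in the text.
w = {(0, 1): 2, (1, 2): -1, (2, 3): 3, (3, 4): -2, (4, 5): 1}
print(global_optima(6, w))  # [(0, 1, 1, 0, 0, 1), (1, 0, 0, 1, 1, 0)]

# Dropping the weights leaves the first-order gene interaction graph.
interaction_edges = sorted(w)
```

Because a cut and its complement have the same value, max-cut optima always come in complementary pairs. This is one reason an FDC for such problems takes the distance to the *nearest* optimum.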