Research Article  Open Access
Genetic Algorithm Optimized CCEM for Complex Topology
Abstract
Quantitatively evaluating how similar two complex topologies are is an essential procedure in large-scale topology research, and it is still regarded as an NP problem. This paper introduces a cross-correlation evaluation model (CCEM), used together with a Genetic Algorithm (GA), to address this issue. Experiments show that the Signless Laplacian Spectra (SLS) can identify a topology structure and that CCEM can distinguish the differences between the SLS eigenvalue sequences of the corresponding topologies. Finally, using CCEM within a GA is recommended, because a GA does not need to find the exact optimum solution, which reduces computing complexity.
1. Introduction
Research on Internet topology modeling has recently become a hot topic in Internet-related research fields [1, 2]. In Internet topology modeling and other large-scale topology research, quantitatively evaluating how similar two complex topologies are is an essential step. Up to now it is still regarded as an NP problem, and no good quantitative solution exists.
Taking the Internet topology as an example, this paper addresses the issue by constructing a quantitative model that combines the cross-correlation evaluation model (CCEM), spectral density [3] from graph theory, a correlation algorithm [4], and a Genetic Algorithm.
1.1. Spectral Density Introduction
An undirected graph $G$ can be represented by its symmetric adjacency matrix $A(G)$. If there is a link between node $i$ and node $j$ in $G$, then $a_{ij} = a_{ji} = 1$; otherwise $a_{ij} = a_{ji} = 0$. The eigenvalues of $G$ are the eigenvalues of its adjacency matrix $A(G)$, denoted $\lambda_1, \lambda_2, \ldots, \lambda_N$. Research in graph theory shows that the eigenvalues of a graph are closely related to the structural properties of its topology, so studying a graph's eigenvalues is useful in topology research.
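The spectrum of a graph is the set of eigenvalues of its adjacency matrix together with their multiplicities [5], and it is denoted as follows:
$$\mathrm{Spec}(G) = \begin{pmatrix} \lambda_1 & \lambda_2 & \cdots & \lambda_t \\ m_1 & m_2 & \cdots & m_t \end{pmatrix}$$
where $m_i$ is the multiplicity of the eigenvalue $\lambda_i$.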
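Spectral density is the density of the eigenvalues of the adjacency matrix $A(G)$, and it can be written as [5–7]:
$$\rho(\lambda) = \frac{1}{N} \sum_{j=1}^{N} \delta(\lambda - \lambda_j)$$
where $\lambda_j$ is the $j$th eigenvalue of the adjacency matrix $A(G)$ and $N$ is the number of eigenvalues. $\rho(\lambda)$ approaches a continuous function as $N \to \infty$.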
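The definitions above can be tried on a toy graph. The following is a minimal sketch, assuming NumPy is available; the helper names `adjacency_matrix` and `spectral_density` and the three-node path graph are illustrative and not taken from the paper. The histogram is only a finite-size stand-in for the density, which becomes continuous for large graphs.

```python
import numpy as np

def adjacency_matrix(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph on nodes 0..n-1."""
    a = np.zeros((n, n))
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    return a

def spectral_density(a, bins=10):
    """Eigenvalues of a symmetric matrix and a histogram estimate of their density."""
    eigenvalues = np.linalg.eigvalsh(a)  # real eigenvalues in ascending order
    density, bin_edges = np.histogram(eigenvalues, bins=bins, density=True)
    return eigenvalues, density, bin_edges

# Toy example (assumption): path graph 0 - 1 - 2,
# whose adjacency eigenvalues are -sqrt(2), 0 and sqrt(2).
A = adjacency_matrix(3, [(0, 1), (1, 2)])
eigs, rho, bin_edges = spectral_density(A, bins=4)
print(eigs)
```

The symmetric pair of eigenvalues around zero in this toy case is the same kind of symmetry that the paper later reads off the Internet samples.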
1.2. Experiment Samples
The samples are measured router-level Internet data with 1,145,841 routers (nodes) and 2,907,638 links. After IP alias resolution [8, 9], the sample was reduced to 29,367 routers and 190,280 links [10].
To further simplify the computation, we performed a second-order sampling (resampling) operation on the experiment samples. The resampling rules are: (1) the resampling operation is completely random and may start from any valid node in the target graph; (2) the resampled result must be a connected graph; (3) the resampled result should cover as many nodes as possible, that is, node selection takes priority over link selection.
Finally, the resampled Internet topology graph was converted into an adjacency matrix for further calculation.
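One possible reading of the three resampling rules can be sketched as follows. This is an illustration under stated assumptions, not the resampling tool used in the paper: the sample is grown node by node from a random start, only neighbours of already chosen nodes are added (which keeps it connected), and every original link between chosen nodes is kept. The function names and the toy graph are hypothetical.

```python
import random

def resample_connected(adj, size, seed=None):
    """Grow a random connected sample of `size` nodes.

    adj maps each node to the set of its neighbours (undirected graph).
    """
    rng = random.Random(seed)
    start = rng.choice(sorted(adj))            # rule (1): random start node
    chosen = {start}
    frontier = set(adj[start])
    while frontier and len(chosen) < size:
        node = rng.choice(sorted(frontier))    # rule (2): only neighbours are added
        chosen.add(node)
        frontier |= adj[node]
        frontier -= chosen
    # rule (3), as assumed here: pick nodes first, then keep the links among them
    links = {(u, v) for u in chosen for v in adj[u] if v in chosen and u < v}
    return chosen, links

def is_connected(nodes, links):
    """Depth-first check that the sampled graph is connected."""
    neighbours = {node: set() for node in nodes}
    for u, v in links:
        neighbours[u].add(v)
        neighbours[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == set(nodes)

def to_adjacency_matrix(nodes, links):
    """Convert the sample into a symmetric 0/1 adjacency matrix (list of lists)."""
    index = {node: k for k, node in enumerate(sorted(nodes))}
    matrix = [[0] * len(index) for _ in index]
    for u, v in links:
        matrix[index[u]][index[v]] = matrix[index[v]][index[u]] = 1
    return matrix

# Toy graph (assumption): a six-node ring with one chord between nodes 0 and 3.
toy_graph = {0: {1, 3, 5}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2, 4}, 4: {3, 5}, 5: {0, 4}}
nodes, links = resample_connected(toy_graph, size=4, seed=1)
matrix = to_adjacency_matrix(nodes, links)
```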
2. Possibility of Using Spectral Density in Distinguishing Topology Graphs
Before using spectral density to construct CCEM, we first test whether it can distinguish topology graphs (including the Internet topology).
Three representative graphs were selected for the test: an ER random graph, a scale-free graph, and the Internet topology graph.
According to [6], the spectral density of an ER random graph converges to a semicircle, and the lower part of the semicircle exhibits an exponential distribution.
The spectral density of a scale-free graph generated by the BA model [6, 11–14] is a symmetric, continuous curve with a triangular center and two power-law sides.
From [6, 11] we can see that different graphs exhibit quite different spectra. Can the Internet topology graph also be characterized by its spectral density? The Internet topology is known to differ from both the ER graph and the scale-free graph, although it is somewhat close to the latter [1, 2, 10]. We therefore examine whether the Internet topology graph can be distinguished from the scale-free one.
For simplicity and better comparison, we drew three samples of the Internet graph with the resampling tool described above. The resampled graphs have 30 nodes and 29 links, 300 nodes and 536 links, and 500 nodes and 753 links, respectively. Their eigenvalues and spectral densities are listed in Table 1.
 
Table 1: Eigenvalues and spectral density of the three resampled Internet graphs.
Note. The value in brackets is the total number of eigenvalues.
The symmetry of the spectral density can be seen in Table 1, which is consistent with the spectral symmetry of scale-free graphs reported in [6, 11]. This agreement shows, at a coarse granularity, that the Internet graph bears some similarity to the scale-free graph, as mentioned previously.
However, the graphs also differ, and we plot the Internet's spectra in Figure 1 for better comparison.
From Figure 1, we first find that there are complete conformities in all three resampled graphs (30 ips, 300 ips, and 500 ips), such as two small peaks when , one distinct peak when , and all when and .
The three graphs differ considerably in size and content (specific routers and links) because of the resampling rules. The conformity found in Figure 1 shows that spectral density yields similar results even when computed on different parts of the Internet. We therefore conclude that spectral density adequately represents the characteristics of the real Internet graph.
Next, the center of each of the three spectral density curves in Figure 1 is triangular, which is similar to the scale-free graph. The two side parts, however, differ from those of the scale-free graph, since they follow neither an exponential nor a power-law distribution. Spectral density can therefore distinguish the Internet graph from the scale-free graph.
Likewise, when the Internet graph is compared with the ER graph, the differences are easily found, so spectral density can also distinguish the Internet graph from the ER graph.
Because spectral density also gives a quantitative description of Internet topology characteristics, we use it in CCEM for Internet topology modeling.
3. Internet Topology Characteristics Discovered by Spectral Density
3.1. General Spectral Density
For a better view of the spectral distribution, we rescale both coordinate axes by a normalization factor, as in [1, 6].
In addition, we enlarge the resampled Internet topology graphs from 30, 300, and 500 ips (Figure 1) to 300, 800, 2000, 3000, and 4000 ips (Figure 2), so that the graphs are closer to the real Internet.
The more nodes a graph has, the closer it is to the real Internet. However, the largest graph used in this paper has 4000 ips, for two reasons: (1) computing capacity is limited, and the efficiency of calculating spectral density decreases sharply once the graph size exceeds 4000; (2) Internet characteristics are well expressed by spectral density regardless of how many nodes an Internet graph has. This was shown in Figure 1 (graphs of different sizes share the same spectral density structure) and is shown again in Figure 2.
From Figure 2, we found that all five graphs’ spectral density showed very good conformities despite of their different size. All five plots have the maximum when and the second maximum when around.
As in Figure 1, the conformity among the five Internet graphs shows that, with the resampling tool, a small Internet graph is enough to represent key properties of the real Internet topology through spectral density. This means that experiments on the complete Internet topology graph are no longer necessary for studying its properties; a much smaller resampled graph with an appropriate algorithm can also be effective.
We now return to the basic idea of this paper: distinguishing topology graphs by comparing their spectral densities. The general spectral density is rather coarse-grained, however. Another especially valuable kind of spectrum, the Signless Laplacian Spectra (SLS), gives further and finer information on a graph's properties [15].
3.2. SLS
The SLS matrix of a graph $G$ is defined as $Q = D + A$, where $D$ is the diagonal matrix of the vertex degrees of $G$ and $A$ is the adjacency matrix of $G$ [15]. The SLS is the set of eigenvalues of $Q$. Some research in graph theory indicates that SLS is the best spectrum for distinguishing different graphs [15]. In this paper, SLS is computed for four resampled Internet topology graphs (3000 ips each), and the result is illustrated in Figure 3.
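The signless Laplacian construction (degree matrix plus adjacency matrix) can be illustrated on a tiny graph. This is a minimal sketch assuming NumPy; the function name `signless_laplacian_spectrum` and the triangle graph are illustrative choices, not part of the paper.

```python
import numpy as np

def signless_laplacian_spectrum(adjacency):
    """Eigenvalues of Q = D + A in descending order."""
    a = np.asarray(adjacency, dtype=float)
    d = np.diag(a.sum(axis=1))        # diagonal matrix of node degrees
    q = d + a                         # signless Laplacian matrix
    return np.sort(np.linalg.eigvalsh(q))[::-1]

# Toy example (assumption): a triangle, where every node has degree 2.
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
sls = signless_laplacian_spectrum(triangle)
print(sls)  # the triangle gives eigenvalues 4, 1, 1 up to rounding
```

Because $Q$ is symmetric and positive semidefinite, `eigvalsh` applies and all returned eigenvalues are nonnegative up to rounding.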
From Figure 3, we first see that all four curves are highly similar, although the four samples are completely random and different from each other. This is further evidence that the resampled samples effectively represent properties of the real Internet graph.
There are two evident horizontal lines at SLS = 1 ($10^0$) and SLS = 2, which means that the most nodes in the Internet topology graph are found at SLS = 1 and the second-most at SLS = 2. All four samples clearly exhibit the same properties in Figure 3.
For the remaining parts of Figure 3, that is, SLS > 2 and SLS < 1, we study them further by fitting power-law distributions [1]. The fitting results are illustrated in Figures 4 and 5.
Figure 4 shows an obvious power-law relationship between the SLS values and their rank in descending order. The fitting result ACC (the absolute value of the correlation coefficient) is greater than 0.9, meaning that the fit is highly acceptable. This power-law relationship is consistent with what was found in the spectral density research on China CERNET in [1].
In Figure 5, however, there is no clear power-law relationship, since ACC is rather small. This can also be regarded as a criterion for identifying the Internet graph.
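A power-law fit of this kind is commonly done as a straight-line fit in log-log space, with ACC taken as the absolute value of the correlation coefficient of the logarithms. The sketch below assumes that procedure; the paper's actual fitting tool is not specified, and the function name `power_law_fit` and the synthetic data are hypothetical.

```python
import math

def power_law_fit(values):
    """Fit value ~ c * rank**alpha by least squares on log(rank), log(value).

    Returns (alpha, c, acc), where acc is the absolute correlation coefficient.
    All values must be positive.
    """
    ordered = sorted(values, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(v) for v in ordered]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                          # slope in log-log space
    c = math.exp(mean_y - alpha * mean_x)      # prefactor
    acc = abs(sxy) / math.sqrt(sxx * syy)      # |correlation coefficient|
    return alpha, c, acc

# Synthetic data (assumption): an exact power law 5 * rank**-2.
synthetic = [5.0 * rank ** -2.0 for rank in range(1, 51)]
alpha, c, acc = power_law_fit(synthetic)
```

On exact power-law data the ACC is 1 up to rounding; the criterion in the text treats ACC above 0.9 as an acceptable fit.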
3.3. Selection for CCEM
Compared with the general spectral density, SLS is better because (1) SLS is recommended as the best spectrum in [15]; (2) SLS identifies the Internet graph quantitatively by its eigenvalue sequence just as the general spectral density does, but it reveals more characteristics of the Internet, such as the two horizontal phases at SLS = 1 and SLS = 2, the power-law part at SLS > 2, and the non-power-law part at SLS < 1.
SLS is therefore selected for constructing CCEM.
4. Cross-Correlation Evaluation Model
4.1. Transformation from SLS to Data Sequence
To evaluate an Internet model is to determine the differences between the generated Internet topology and the real Internet topology. SLS eigenvalue sequences are introduced as a quantitative way to determine these differences.
The SLS eigenvalues are a series of numbers representing the primary characteristics of the target graph, here the Internet topology graph. Given two such eigenvalue sequences, the remaining problem is to find an effective algorithm for evaluating how they differ.
CCEM is then used to evaluate whether a given or generated topology is similar or identical to the real Internet topology. Its first requirement is to transform the SLS into a data sequence.
Sorting the SLS eigenvalues in descending order yields the data sequence used in the next evaluation step. Two such sequences are formed, one for the real Internet topology and one for a given topology, each consisting of that topology's SLS eigenvalues in descending order.
4.2. Cross-Correlation Algorithm
Crosscorrelation algorithm is capable of distinguishing and identifying the differences between numerical number sequences in an absolutely quantitative way [4], and it is defined in (4.2): where is the disalignment lag between and , is crossvariance, and are autocorrelation of and with disalignment lag set to be 0, respectively. And they are: where is length of and . Let = Length (), = Length (), then: Proof 1. The crosscorrelation maximum occurs if and only if two given topologies are completely identical and the disalignment lag is 0.
Proof. If two given topologies are completely identical, then:
And if the disalignment lag is 0, with (4.3), we get:
According to (4.7), we get:
First, we are going to prove
Consider a nonnegative variable,
Extend (4.11), we get:
With (4.10), we simplify (4.12) to
Then,
Now, we have proved that crosscorrelation value reaches maximum when and the disalignment lag set to be 0.
Next, we are going to prove when , the maximum is still or .
When the disalignment lag , to simplify the proof procedure, we can set to be since . So, according to (4.15), we get:
And for , similar to , we still get:
We then use the SLS eigenvalues from Figure 3, that is, the four SLS sequences from four real Internet topologies, to test whether Proof 1 is correct.
Figure 6 shows clearly that all four SLS sequences reach their maxima when the disalignment lag equals 0, which is consistent with Proof 1.
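The lag-0 peak can be reproduced numerically with a small sketch. It assumes the common normalized form, in which the lagged sum of products is divided by the square root of the product of the two lag-0 autocorrelations, and it uses plain sums without mean removal; the exact sums in the paper's definitions may differ. The function names and the sample sequence are hypothetical.

```python
import math

def cross_correlation(x, y, lag):
    """Normalized cross-correlation of two equal-length sequences at an integer lag."""
    n = len(x)
    if len(y) != n:
        raise ValueError("sequences must have the same length")
    if lag >= 0:
        c_xy = sum(x[i] * y[i + lag] for i in range(n - lag))
    else:
        c_xy = sum(x[i - lag] * y[i] for i in range(n + lag))
    c_xx0 = sum(v * v for v in x)   # autocorrelation of x at lag 0
    c_yy0 = sum(v * v for v in y)   # autocorrelation of y at lag 0
    return c_xy / math.sqrt(c_xx0 * c_yy0)

def correlation_profile(x, y):
    """Cross-correlation for every lag from -(n-1) to n-1."""
    n = len(x)
    return {lag: cross_correlation(x, y, lag) for lag in range(-(n - 1), n)}

# Illustrative descending sequence (assumption), standing in for an SLS sequence.
sequence = [6.0, 4.0, 2.0, 1.0, 1.0]
profile = correlation_profile(sequence, sequence)
peak_lag = max(profile, key=profile.get)
```

For identical sequences the profile equals 1 at lag 0 and is smaller at every other lag, which is the behaviour described for Figure 6.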
In Figure 7, the cross-correlation still reaches its maximum when the disalignment lag equals 0, although the four SLS sequences, SLS(1), (2), (3), and (4), are different from each other.
The four SLS sequences, however, all come from the real Internet topology and are quite similar to each other. The maxima of the three cross-correlations nearly reach 1, close to the maximum of the autocorrelation in Figure 6. This is reasonable: the topologies represented by SLS(1), (2), (3), and (4) were shown to be similar in Section 3, and they are so close in structure that their cross-correlation values almost equal the autocorrelation values, that is, the four topologies are almost the same. This also confirms Proof 1.
So far, similar topologies always reach a maximum close to 1 in the cross-correlation calculation. What about dissimilar topologies? We take SLS(1), compute its cross-correlation with three random sequences, and illustrate the results in Figure 8.
The plots in Figure 8 are clearly different from those in Figure 7. Firstly, the maximum of the cross-correlation is around 0.2, not 1 as in Figures 6 and 7, meaning that SLS(1) and random sequences (1), (2), and (3) are not similar, that is, the topologies represented by SLS(1) and by the three random sequences are not alike. This is reasonable: the three random sequences originate from random operations, so they are unlikely to be identical to SLS(1); in other words, a randomly generated topology has very little chance of being similar to the real Internet topology.
Secondly, the rising parts of the curves are no longer close to zero but close to 0.1. The reason is that part of each randomly generated sequence is “similar” in some way to part of the SLS(1) sequence. This “similarity,” however, is quite low, since the cross-correlation values are near 0.1 and 0.2, far from 1, the value obtained for completely identical topologies.
With Proof 1 and the illustrations in Figures 6, 7, and 8, CCEM can be used to evaluate the difference between topologies and, more importantly, can serve as a measuring scale for how close a given topology is to another.
CCEM yields a relatively large cross-correlation value if the two sequences (that is, the two topologies) are similar, and a small value otherwise. A threshold is therefore usually set for making decisions when CCEM is used to evaluate an Internet topology model.
4.3. CCEM Algorithm
The CCEM algorithm for the Internet topology is shown in Table 2.

The modeled Internet graph and the real Internet graph must be the same size, and the user can choose that size. Real Internet graphs of different sizes are quite different, and even real Internet graphs of the same size that are resampled at different times are not identical. The result of the algorithm may therefore differ somewhat from run to run.
We still consider the CCEM algorithm effective because (1) the properties of real Internet graphs obtained under the resampling rules are quite similar (Figures 2, 3, and 4), so different resampled Internet graphs do not greatly change the algorithm's results; (2) the Internet is a dynamically growing network, so there is no static Internet graph that could serve as a template in the algorithm. The resampled Internet graph is therefore adequate for use in the algorithm so far.
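Putting the pieces of Section 4 together, an end-to-end CCEM evaluation might look like the sketch below. It is an assumption-laden illustration rather than the algorithm of Table 2: it assumes NumPy, takes the peak of the normalized cross-correlation over all lags as the score, and uses a hypothetical threshold of 0.9. The names `sls_sequence` and `ccem` are not from the paper.

```python
import numpy as np

def sls_sequence(adjacency):
    """SLS eigenvalues of Q = D + A, sorted in descending order."""
    a = np.asarray(adjacency, dtype=float)
    q = np.diag(a.sum(axis=1)) + a
    return np.sort(np.linalg.eigvalsh(q))[::-1]

def ccem(real_adjacency, model_adjacency, threshold=0.9):
    """Return (score, decision) for a modeled graph against a real graph.

    The threshold value is a hypothetical choice; the paper leaves it to the user.
    """
    x = sls_sequence(real_adjacency)
    y = sls_sequence(model_adjacency)
    if x.shape != y.shape:
        raise ValueError("both graphs must have the same number of nodes")
    # Cross-correlation at every lag, normalized by the lag-0 autocorrelations.
    profile = np.correlate(x, y, mode="full") / np.sqrt(np.dot(x, x) * np.dot(y, y))
    score = float(profile.max())
    return score, score >= threshold

# Toy graphs (assumption): a triangle and a three-node path.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
same_score, same_decision = ccem(triangle, triangle)
other_score, other_decision = ccem(triangle, path)
```

Graphs this small are only a smoke test; on three nodes even different structures score highly, so a meaningful threshold has to be calibrated on samples of realistic size.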
4.4. Recommended Way to Use CCEM
We recommend using CCEM within a Genetic Algorithm (GA), for the following reasons.
(1) GA fits the CCEM studied in this paper well. GA can directly calculate and optimize when CCEM is used to evaluate a given topology and optimize it toward the real Internet topology.
(2) Most current Internet modeling research is statistical, because the Internet is too large to be handled by other approaches. The usual statistical result is a mathematical model with uncertain parameters; for example, some parameters are defined as data sequences [3] rather than single values. Determining these parameters, or optimizing the data sequences, is the most essential task facing current researchers. Technically, it requires many rounds of repeated calculation. GA is an appropriate tool because it is good at repeated computation and automatic decision making: it can automatically adjust the Internet model's parameters until the optimization is done.
(3) Meanwhile, GA reduces computing complexity through its ability to settle for a secondary optimum solution.
CCEM is therefore recommended for use within a GA in Internet topology modeling and other large-scale topology research.
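The recommendation can be made concrete with a toy GA whose fitness function is a CCEM-style score. Everything in this sketch is an assumption for illustration: the candidate encoding (a list of nonnegative numbers standing in for a model's data sequence), the operators (truncation selection, one-point crossover, Gaussian mutation, elitism), the parameter values, and the target sequence. The early stop at `stop_at` mirrors the idea of accepting a secondary optimum instead of searching for the exact one.

```python
import math
import random

def ccem_score(x, y):
    """Peak normalized cross-correlation of two sequences sorted in descending order."""
    x, y = sorted(x, reverse=True), sorted(y, reverse=True)
    n = len(x)
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0:
        return 0.0
    best = 0.0
    for lag in range(-(n - 1), n):
        lo, hi = max(0, -lag), min(n, n - lag)
        c = sum(x[i] * y[i + lag] for i in range(lo, hi))
        best = max(best, c / norm)
    return best

def evolve(target, pop_size=20, generations=40, mutation=0.3, stop_at=0.99, seed=0):
    """Evolve candidate sequences toward a high CCEM score against `target`."""
    rng = random.Random(seed)
    n = len(target)
    population = [[rng.uniform(0, max(target)) for _ in range(n)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda cand: ccem_score(target, cand), reverse=True)
        best = population[0]
        history.append(ccem_score(target, best))
        if history[-1] >= stop_at:           # good enough: accept a near-optimum
            break
        parents = population[: pop_size // 2]
        children = [best]                    # elitism keeps the best candidate
        while len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            cut = rng.randrange(1, n)        # one-point crossover
            child = p[:cut] + q[cut:]
            k = rng.randrange(n)             # mutate one position, keep it nonnegative
            child[k] = max(0.0, child[k] + rng.gauss(0, mutation))
            children.append(child)
        population = children
    return best, history

# Hypothetical target sequence standing in for a real topology's SLS sequence.
target = [8.0, 4.0, 2.0, 2.0, 1.0, 1.0, 1.0, 0.5]
best, history = evolve(target)
```

Because the best candidate is always carried over, the recorded best score never decreases from one generation to the next.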
5. Conclusions
CCEM and its algorithm were studied in this paper. Firstly, we tested the ability of spectral density to distinguish different graphs by applying it to an ER random graph, a BA scale-free graph, and the Internet topology graph. The three resulting spectra showed quite different properties, confirming that the spectral density approach can distinguish and identify Internet graphs.
Next, we obtained the SLS eigenvalues of each topology and fed them into CCEM to quantitatively evaluate the difference between graphs.
Finally, using CCEM within a GA was recommended for Internet topology modeling and other large-scale topology research, in order to reduce computing complexity.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (60802031), the Liaoning Provincial Natural Science Foundation (201003676), and the Natural Science Foundation of Shenyang city (F10205126).
References
[1] Y. Jiang, B. X. Fang, M. Z. Hu, and R. Q. He, “Example of analyzing the characteristics of a large scale ISP topology measured from multiple vantage points,” Journal of Software, vol. 16, no. 5, pp. 846–856, 2005.
[2] M. Faloutsos, P. Faloutsos, and C. Faloutsos, “On power-law relationships of the Internet topology,” ACM SIGCOMM Computer Communication Review, vol. 29, no. 4, pp. 251–262, 1999.
[3] X. F. Wang, X. Li, and G. R. Chen, Complex Networks Theory and Its Application, Tsinghua Press, Beijing, China, 2006.
[4] Rorabaugh, Complete Digital Signal Processing, McGraw-Hill, New York, NY, USA, 2005.
[5] D. B. West, “An introduction to graph theory,” China Machine Press, no. 1–47, pp. 339–348, 2006.
[6] I. J. Farkas, I. Derényi, A. L. Barabási, and T. Vicsek, “Spectra of “real-world” graphs: beyond the semicircle law,” Physical Review E, vol. 64, no. 2, pp. 1–12, 2001.
[7] Y. Zhang, H. L. Zhang, and B. X. Fang, “Survey on internet topology modeling,” Journal of Software, vol. 15, no. 8, pp. 1220–1226, 2004.
[8] R. Teixeira, K. Marzullo, S. Savage, and G. M. Voelker, “In search of path diversity in ISP networks,” in Proceedings of the ACM SIGCOMM Internet Measurement Conference (IMC '03), pp. 313–318, October 2003.
[9] S. Bilir, K. Sarac, and T. Korkmaz, “Intersection characteristics of end-to-end internet paths and trees,” in Proceedings of the 13th IEEE International Conference on Network Protocols (ICNP '05), pp. 378–387, November 2005.
[10] Y. Xu, A TL model for router-level Internet macroscopic topology, Ph.D. thesis, Northeastern University, Shenyang, China, 2006.
[11] K. I. Goh, B. Kahng, and D. Kim, “Spectra and eigenvectors of scale-free networks,” Physical Review E, vol. 64, no. 5, pp. 1–5, 2001.
[12] A. L. Barabási and E. Bonabeau, “Scale-free networks,” Scientific American, vol. 288, no. 5, pp. 60–69, 2003.
[13] A.L. Barabási and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286, no. 5439, pp. 509–512, 1999.
[14] R. Albert and A.L. Barabási, “Statistical mechanics of complex networks,” Reviews of Modern Physics, vol. 74, no. 1, pp. 47–97, 2002.
[15] E. R. van Dam and W. H. Haemers, “Which graphs are determined by their spectrum?” Linear Algebra and Its Applications, vol. 373, pp. 241–272, 2003.
Copyright
Copyright © 2012 Ye Xu and Zhuo Wang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.