Genetic Algorithm Optimized CCEM for Complex Topology
Quantitatively evaluating how similar two complex topologies are is an essential procedure in large-scale topology research, and it is still regarded as an NP problem. This paper introduces a cross-correlation evaluation model (CCEM), used together with a Genetic Algorithm (GA), to address this issue. Experiments show that the Signless Laplacian Spectra (SLS) can identify a topology structure and that CCEM can distinguish the differences between the SLS eigenvalue sequences of the corresponding topologies. Finally, using CCEM within a GA is recommended, because a GA's acceptance of a good but not necessarily optimal solution is an effective way to reduce computing complexity.
1. Introduction
Research on Internet topology modeling has recently become a hot topic in Internet-related research fields [1, 2]. In Internet topology modeling and other large-scale topology research, quantitatively evaluating how similar two complex topologies are is an essential step. So far it is still regarded as an NP problem, and no satisfactory quantitative solution exists.
Taking the Internet topology as an example, this paper addresses the issue by constructing a quantitative model that combines the cross-correlation evaluation model (CCEM), spectral density from graph theory, a correlation algorithm, and a Genetic Algorithm.
1.1. Spectral Density Introduction
An undirected graph $G$ can be represented by its symmetric adjacency matrix $A$. If there is a link between node $i$ and node $j$ in $G$, then $a_{ij} = 1$; otherwise $a_{ij} = 0$. The eigenvalues of $G$ are the eigenvalues of its matrix $A$, denoted $\lambda_1, \lambda_2, \ldots, \lambda_N$. Research in graph theory shows that the eigenvalues of a graph are closely related to the structural properties of its topology, so studying a graph's eigenvalues is useful in topology research.
The spectrum of a graph is the set of eigenvalues of its adjacency matrix $A$ together with their multiplicities, and it is denoted as follows:
$$\mathrm{Spec}(G) = \begin{pmatrix} \lambda_1 & \lambda_2 & \cdots & \lambda_s \\ m_1 & m_2 & \cdots & m_s \end{pmatrix},$$
where $m_i$ is the multiplicity of the eigenvalue $\lambda_i$.
Spectral density is the density of the eigenvalues of the adjacency matrix $A$, and it can be written as [5–7]
$$\rho(\lambda) = \frac{1}{N} \sum_{j=1}^{N} \delta(\lambda - \lambda_j),$$
where $\lambda_j$ is the $j$th eigenvalue of the adjacency matrix $A$ and $N$ is the number of eigenvalues. $\rho(\lambda)$ approaches a continuous function as $N \to \infty$.
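As a minimal sketch of this subsection, the following Python snippet computes the adjacency eigenvalues of a small graph and a histogram estimate of their density. It assumes NumPy is available. The function names `adjacency_spectrum` and `spectral_density`, the bin count, and the 4-node path graph are illustrative choices, not taken from the paper. A histogram is one common way to approximate the eigenvalue density of a finite graph.

```python
import numpy as np

def adjacency_spectrum(adj):
    """Return the eigenvalues of a symmetric 0/1 adjacency matrix, ascending."""
    a = np.asarray(adj, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1] or not np.allclose(a, a.T):
        raise ValueError("adjacency matrix must be square and symmetric")
    # eigvalsh is meant for symmetric matrices and returns real eigenvalues.
    return np.linalg.eigvalsh(a)

def spectral_density(eigenvalues, bins=20):
    """Histogram estimate of the eigenvalue density (integrates to 1)."""
    density, edges = np.histogram(eigenvalues, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density

# Toy example (hypothetical data): the path graph 0-1-2-3.
path4 = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]])
eigs = adjacency_spectrum(path4)
centers, density = spectral_density(eigs, bins=4)
```

For real samples with hundreds or thousands of nodes, the same two calls apply unchanged; only the bin count would need tuning.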
1.2. Experiment Samples
The samples are measured router-level Internet data with 1,145,841 routers (nodes) and 2,907,638 links. After IP alias resolution [8, 9], the sample was reduced to 29,367 routers and 190,280 links.
To further simplify the computation, we performed a second-order sampling (re-sampling) operation on the experiment samples. The re-sampling rules are: (1) the re-sampling operation is completely random and may start from any valid node in the target graph; (2) the re-sampled result must be a connected graph; (3) the re-sampled result should cover as many nodes as possible, that is, node selection takes priority over link selection.
Finally, the re-sampled Internet topology graph was converted into an adjacency matrix for further calculation.
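One possible reading of the three re-sampling rules, followed by the conversion to an adjacency matrix, can be sketched as below. This is not the authors' re-sampling tool: the random frontier-growth strategy, the choice to keep all links among the selected nodes, and the names `resample_connected` and `to_adjacency_matrix` are assumptions made for illustration.

```python
import random

def resample_connected(adjacency, target_nodes, rng=None):
    """Grow a random connected node set, then keep the links among its nodes.

    adjacency: dict mapping each node to the set of its neighbors (undirected).
    """
    rng = rng or random.Random()
    start = rng.choice(sorted(adjacency))       # rule 1: random start node
    chosen = {start}
    frontier = set(adjacency[start])
    while frontier and len(chosen) < target_nodes:
        node = rng.choice(sorted(frontier))     # rule 3: always add a new node
        chosen.add(node)
        frontier |= adjacency[node]
        frontier -= chosen
    # Rule 2 holds because every added node neighbors an already chosen node.
    links = {(u, v) for u in chosen for v in adjacency[u]
             if v in chosen and u < v}
    return sorted(chosen), sorted(links)

def to_adjacency_matrix(nodes, links):
    """Build a symmetric 0/1 matrix (list of lists) from nodes and links."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in links:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return matrix

# Toy example (hypothetical data): a ring of six routers.
ring = {0: {1, 5}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 0}}
nodes, links = resample_connected(ring, target_nodes=4, rng=random.Random(1))
matrix = to_adjacency_matrix(nodes, links)
```

Passing a seeded `random.Random` makes a sample reproducible, which helps when comparing several re-sampled graphs of the same size.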
2. Possibility of Using Spectral Density in Distinguishing Topology Graphs
Before using spectral density to construct CCEM, we first verify whether it can distinguish topology graphs, including the Internet topology.
Three representative graphs were selected for the test: an ER random graph, a scale-free graph, and the Internet topology graph.
According to previous work, the spectral density of an ER random graph converges to a semicircle, and the lower part of the semicircle exhibits an exponential distribution.
From [6, 11] we can see that different graphs exhibit quite different spectra. Can the Internet topology graph, however, be characterized by its spectral density? The Internet topology is known to differ from both the ER graph and the scale-free graph, although it is somewhat close to the latter [1, 2, 10]. We therefore examine whether the Internet topology graph can be distinguished from the scale-free one.
For simplicity and better comparison, we drew three samples of the Internet graph with the re-sampling tool described above. After re-sampling, the three samples have 30 nodes and 29 links, 300 nodes and 536 links, and 500 nodes and 753 links, respectively. Their eigenvalues and spectral densities are listed in Table 1.
Table 1 shows that the spectral density is symmetric, which is consistent with the spectral symmetry of scale-free graphs reported in [6, 11]. At a coarse granularity, this match supports the earlier observation that the Internet graph bears some similarity to the scale-free graph.
However, there are also differences between the graphs, and the spectra of the Internet samples are plotted in Figure 1 for better comparison.
From Figure 1, we first find that there are complete conformities in all three re-sampled graphs (30 ips, 300 ips, and 500 ips), such as two small peaks when , one distinct peak when , and all when and .
The three graphs differ considerably in size and content (specific routers and links) because of the re-sampling rules. The conformity found in Figure 1 shows that spectral density yields similar results even when computed on different parts of the Internet. We therefore conclude that spectral density is adequate for representing the characteristics of the real Internet graph.
Next, the center of each of the three spectral density curves in Figure 1 has a triangular shape, which is similar to the scale-free graph. The two side parts, however, differ from the scale-free graph, since they follow neither an exponential distribution nor a power-law distribution. Spectral density can therefore distinguish the Internet graph from the scale-free graph.
Comparing the Internet graph with the ER graph, the differences are easy to find, so spectral density can also distinguish the Internet graph from the ER graph.
Since spectral density also gives a quantitative description of Internet topology characteristics, we use it in CCEM for Internet topology modeling.
3. Internet Topology Characters Discovered by Spectral Density
3.1. General Spectral Density
We next enlarge the re-sampled Internet topology graphs from 30, 300, and 500 ips (Figure 1) to 300, 800, 2000, 3000, and 4000 ips (Figure 2), so that the graphs are closer to the real Internet.
The more nodes a graph has, the closer it is to the real Internet. Nevertheless, the largest graph in this paper has 4000 ips, for two reasons: (1) computing capacity is limited, and the efficiency of calculating spectral density decreases sharply once the graph size exceeds 4000; (2) the characteristics of the Internet are well expressed by spectral density regardless of how many nodes the Internet graph has. The latter was shown in Figure 1, where graphs of different sizes conform in their spectral density structure, and is shown again in Figure 2.
From Figure 2, we find that the spectral densities of all five graphs conform very well despite their different sizes. All five plots have the maximum when and the second maximum when around.
As in Figure 1, the conformity among the five Internet graphs shows that a small Internet graph obtained with the re-sampling tool can be enough to represent key properties of the real Internet topology through its spectral density. This means that experiments on the complete Internet topology graph are no longer necessary for studying its properties; a much smaller re-sampled graph with an appropriate algorithm can also be effective.
3.2. Signless Laplacian Spectra
We now return to the basic idea of this paper: distinguishing topology graphs by comparing their spectral densities. The general spectral density is somewhat coarse-grained, however. A particularly valuable kind of spectrum, the Signless Laplacian Spectra (SLS), gives further and finer information on a graph's properties.
The signless Laplacian matrix of a graph $G$ is defined as $Q = D + A$, where $D$ is the diagonal matrix of the vertex degrees of $G$ and $A$ is the adjacency matrix of $G$. The SLS is the set of eigenvalues of $Q$. Some research in graph theory indicates that SLS is the best spectrum for distinguishing different graphs. In this paper, SLS is computed for four re-sampled Internet topology graphs (3000 ips), and the result is illustrated in Figure 3.
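The SLS computation can be sketched in a few lines, assuming the standard signless Laplacian (degree matrix plus adjacency matrix) and NumPy. The function name `signless_laplacian_spectrum` and the star-graph example are illustrative and not taken from the paper.

```python
import numpy as np

def signless_laplacian_spectrum(adj):
    """Eigenvalues of Q = D + A for a symmetric adjacency matrix, descending."""
    a = np.asarray(adj, dtype=float)
    degree = np.diag(a.sum(axis=1))   # diagonal matrix of node degrees
    q = degree + a                    # signless Laplacian matrix
    return np.sort(np.linalg.eigvalsh(q))[::-1]

# Toy example (hypothetical data): a star with hub 0 and leaves 1, 2, 3.
star = np.array([[0, 1, 1, 1],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0]])
sls = signless_laplacian_spectrum(star)
# sls is approximately [4, 1, 1, 0]
```

Returning the eigenvalues already sorted in descending order keeps the output directly usable as a rank-ordered sequence.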
First, Figure 3 shows that all four curves are highly similar, although the four samples are completely random and differ from each other. This is further evidence that the re-sampled graphs effectively represent the properties of the real Internet graph.
There are two evident horizontal lines where SLS equals 1 ($10^0$) and 2, which means that the Internet topology graph has the most nodes at SLS = 1 and the second-most nodes at SLS = 2. All four samples clearly exhibit the same property in Figure 3.
For the remaining parts of Figure 3, that is, SLS > 2 and SLS < 1, we study the curves further by fitting a power-law distribution. The fitting results are illustrated in Figures 4 and 5.
Figure 4 shows an obvious power-law relationship between SLS and its rank in descending order. The ACC (absolute value of the correlation coefficient) of the fit is greater than 0.9, meaning that the fit is highly acceptable. This power-law relationship is consistent with what was found in the spectral density research on China CERNET.
In Figure 5, however, there is no clear power-law relationship, since the ACC is rather small. This can also be regarded as a criterion for identifying an Internet graph.
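A power-law fit with an ACC value can be sketched as a straight-line fit in log-log coordinates. The paper does not give its fitting procedure, so the least-squares fit of log value against log rank, the function name `power_law_fit`, and the synthetic data below are assumptions for illustration.

```python
import numpy as np

def power_law_fit(values):
    """Fit value ~ C * rank**slope on log-log axes.

    Returns (slope, intercept, acc), where acc is the absolute value of the
    correlation coefficient between log10(rank) and log10(value).
    """
    v = np.sort(np.asarray(values, dtype=float))[::-1]   # descending order
    v = v[v > 0]                                         # log needs positives
    rank = np.arange(1, v.size + 1)
    log_rank, log_value = np.log10(rank), np.log10(v)
    slope, intercept = np.polyfit(log_rank, log_value, 1)
    acc = abs(np.corrcoef(log_rank, log_value)[0, 1])
    return slope, intercept, acc

# Synthetic check (hypothetical data): an exact power law with exponent -0.5.
synthetic = 10.0 * np.arange(1.0, 51.0) ** -0.5
slope, intercept, acc = power_law_fit(synthetic)
```

To fit only one part of a spectrum, such as the eigenvalues above 2, the caller would filter the sequence before passing it in.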
3.3. Selection for CCEM
Compared with the general spectral density, SLS is the better choice because (1) SLS is recommended in the literature as the best spectrum; (2) SLS is as capable as the general spectral density of quantitatively identifying an Internet graph by its eigenvalue sequence, and it reveals more characteristics of the Internet, such as the two horizontal phases at SLS = 1 and SLS = 2, the power-law distribution for SLS > 2, and the non-power-law distribution for SLS < 1.
SLS is therefore selected for CCEM.
4. Cross-Correlation Evaluation Model
4.1. Transformation from SLS to Data Sequence
To evaluate an Internet model is to determine the differences between the generated Internet topology and the real Internet topology. SLS eigenvalue sequences are introduced as a quantitative way to determine these differences.
The SLS eigenvalues are a series of numbers representing the primary characteristics of the target graph, that is, the Internet topology graph. Given the two eigenvalue sequences, the remaining problem is to find an effective algorithm that evaluates how they compare.
CCEM is then used to evaluate whether a given or generated topology is similar or identical to the real Internet topology. The first requirement of CCEM is to transform the SLS into a data sequence.
Sorting the SLS eigenvalues in descending order yields the data sequences used in the next evaluation step: one sequence for the real Internet topology and one for the given topology, each consisting of that topology's SLS eigenvalues in descending order.
4.2. Cross-Correlation Algorithm
Cross-correlation algorithm is capable of distinguishing and identifying the differences between numerical number sequences in an absolutely quantitative way, and it is defined in (4.2): where is the disalignment lag between and , is cross-variance, and are autocorrelation of and with disalignment lag set to be 0, respectively. And they are: where is length of and . Let = Length (), = Length (), then:
Proof 1. The cross-correlation maximum occurs if and only if two given topologies are completely identical and the disalignment lag is 0.
Proof. If two given topologies are completely identical, then:
And if the disalignment lag is 0, with (4.3), we get:
According to (4.7), we get:
First, we are going to prove
Consider a nonnegative variable,
Extend (4.11), we get:
With (4.10), we simplify (4.12) to
Now, we have proved that cross-correlation value reaches maximum when and the disalignment lag set to be 0.
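The bound behind this step can be sketched with the standard Cauchy-Schwarz argument. The sketch below is a reconstruction under stated assumptions, not the paper's own derivation: it assumes real sequences $x(n)$ and $y(n)$ that are zero outside their finite length, and it uses the unnormalized correlation $R_{xy}(m)$ and the normalized form $\rho_{xy}(m)$ written in its first lines. These symbols and conventions are chosen here for illustration.

```latex
% Assumed definitions (illustrative notation). Sums run over all integers n,
% with x(n) and y(n) taken as zero outside their finite support.
R_{xy}(m) = \sum_{n} x(n)\, y(n+m), \qquad
\rho_{xy}(m) = \frac{R_{xy}(m)}{\sqrt{R_{xx}(0)\, R_{yy}(0)}}

% A nonnegative quantity, for any real number a:
0 \le \sum_{n} \bigl( a\, x(n) - y(n+m) \bigr)^{2}
  = a^{2} R_{xx}(0) - 2a\, R_{xy}(m) + R_{yy}(0)

% A quadratic in a that is never negative has a nonpositive discriminant:
R_{xy}(m)^{2} \le R_{xx}(0)\, R_{yy}(0)
\quad \Longrightarrow \quad
\lvert \rho_{xy}(m) \rvert \le 1

% Special case y = x and m = 0:
\rho_{xx}(0) = \frac{R_{xx}(0)}{\sqrt{R_{xx}(0)\, R_{xx}(0)}} = 1
```

Equality in the bound requires $y(n+m) = a\,x(n)$ for all $n$ with some constant $a$. For a nonzero finite-length sequence compared with itself, this is possible only at $m = 0$, so the autocorrelation peak at lag 0 is strict. Under these assumptions, a value of 1 at lag 0 is also reached when $y$ is a positive multiple of $x$, which is worth keeping in mind when reading the "if and only if" in Proof 1.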
Next, we are going to prove when , the maximum is still or .
When the disalignment lag , to simplify the proof procedure, we can set to be since . So, according to (4.15), we get:
And for , similar to , we still get:
We then use the SLS eigenvalues from Figure 3, that is, the four SLS sequences from four real Internet topologies, to check whether Proof 1 holds in practice.
Figure 6 shows that all four SLS sequences reach their maximum when the disalignment lag equals 0, which is consistent with Proof 1.
Figure 7 shows that the cross-correlation still reaches its maximum when the disalignment lag equals 0, even though the four SLS sequences, SLS(1), (2), (3), and (4), differ from each other.
The four SLS sequences all come from the real Internet topology, however, and are quite similar to each other. The maxima of the three cross-correlations nearly reach 1, quite close to the maximum of the autocorrelation in Figure 6. This is reasonable, because the topologies that SLS(1), (2), (3), and (4) represent were shown to be similar in Section 3. They are so close in structure that their cross-correlation values are almost equal to those of the autocorrelation, that is, the four topologies are almost the same. Proof 1 is thereby confirmed.
So far, similar topologies always reach a maximum close to 1 in the cross-correlation calculation. What about dissimilar topologies? We computed the cross-correlation of SLS(1) with three random sequences and illustrate the results in Figure 8.
The plot in Figure 8 is quite different from that in Figure 7. First, the maximum of the cross-correlation is around 0.2, not 1 as in Figures 6 and 7, meaning that SLS(1) is not similar to random sequences (1), (2), and (3), that is, the topologies represented by SLS(1) and by the three random sequences are not alike. This is reasonable: the three random sequences originate from random operations, so they are unlikely to be identical to SLS(1), and a randomly generated topology has very little chance of being similar to the real Internet topology.
Second, the growing curves are no longer close to zero but close to 0.1. The reason is that part of each randomly generated sequence is “similar” in some way to part of SLS sequence (1). This “similarity” is quite low, however, since the cross-correlation values are near 0.1 and 0.2, far from 1, the value obtained for completely identical topologies.
With Proof 1 and the illustrations in Figures 6, 7, and 8, CCEM can be used to evaluate the difference between topologies and, more importantly, can serve as a measuring scale for how close a given topology is to another.
CCEM yields a relatively large cross-correlation value if the two sequences, and hence the two topologies, are similar, and a small value otherwise. A threshold is then usually set for making decisions when CCEM is used to evaluate an Internet topology model.
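The cross-correlation comparison and the threshold decision described in this section can be sketched as follows. The paper's exact formula is not reproduced here; the sketch assumes the common normalized cross-correlation without mean removal, computed with `numpy.correlate`. The function names and the threshold of 0.9 are hypothetical and would need tuning on real data.

```python
import numpy as np

def normalized_cross_correlation(x, y):
    """Return (lags, rho) with rho(m) = R_xy(m) / sqrt(R_xx(0) * R_yy(0))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    raw = np.correlate(x, y, mode="full")          # one value per lag
    norm = np.sqrt(np.dot(x, x) * np.dot(y, y))    # zero-lag autocorrelations
    lags = np.arange(-(y.size - 1), x.size)        # lag axis of "full" mode
    return lags, raw / norm

def ccem_similarity(sls_real, sls_model, threshold=0.9):
    """Compare two SLS sequences; threshold is an assumed example value."""
    x = np.sort(np.asarray(sls_real, dtype=float))[::-1]    # descending order
    y = np.sort(np.asarray(sls_model, dtype=float))[::-1]
    lags, rho = normalized_cross_correlation(x, y)
    peak = int(np.argmax(rho))
    return float(rho[peak]), int(lags[peak]), bool(rho[peak] >= threshold)

# Identical spectra (hypothetical data) peak at 1 with lag 0.
score, lag, similar = ccem_similarity([4, 1, 1, 0], [0, 1, 4, 1])
# A flat sequence scores lower and falls below the example threshold.
score_flat, lag_flat, similar_flat = ccem_similarity([4, 1, 1, 0], [2, 2, 2, 2])
```

Because the score is normalized, it depends on the shape of the two sequences and not on their overall scale.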
4.3. CCEM Algorithm
The CCEM algorithm for the Internet topology is shown in Table 2.
The size of the modeled Internet graph and that of the real Internet graph must be identical, and the user controls how this value is set. Real Internet graphs of different sizes are quite different, and even real Internet graphs of the same size that were re-sampled at different times are not identical. The result of the algorithm may therefore differ somewhat from run to run.
We still consider the CCEM algorithm effective because (1) the properties of the real Internet obtained under the re-sampling rules are quite similar (Figures 2, 3, and 4), so different re-sampled Internet graphs do not change the algorithm results greatly; (2) the Internet is a dynamically growing network, so no static Internet graph exists that could serve as a template in the algorithm. The re-sampled Internet is therefore adequate for use in the algorithm so far.
4.4. Recommended Way to Use CCEM
We recommend using CCEM within a Genetic Algorithm (GA), for the following reasons.
(1) GA fits the CCEM studied in this paper quite well. GA performs direct calculation and optimization when CCEM is used to evaluate a given topology against the real Internet topology and to optimize it toward that topology.
(2) Most Internet modeling research is currently based on statistics, because the Internet is too large to be handled by other approaches. The typical statistical result is a mathematical model with uncertain parameters; for example, some parameters are defined as data sequences rather than single values. How to determine these parameters, or how to optimize the data sequences, is the most essential issue facing current researchers. Technically, it requires many rounds of repeated calculation. GA is therefore the most appropriate tool, because it is well suited to repeated computation and automatic decision making. GA can automatically adjust the parameters of the Internet model until the optimization is done.
(3) Meanwhile, GA reduces computing complexity through its ability to settle for a suboptimal (near-optimal) solution.
CCEM is therefore recommended for use within a GA in Internet topology modeling and other large-scale topology research.
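How a GA might use a CCEM score as its fitness function can be sketched as below. The paper does not specify its GA, so everything here is an assumption for illustration: the individuals are plain parameter sequences standing in for a topology model's SLS, the operators are truncation selection, uniform crossover, and Gaussian mutation, and the names `ccem_fitness` and `evolve` and the stopping value `good_enough` are hypothetical.

```python
import random
import numpy as np

def ccem_fitness(candidate, target):
    """Peak normalized cross-correlation of two descending-sorted sequences."""
    x = np.sort(np.asarray(target, dtype=float))[::-1]
    y = np.sort(np.asarray(candidate, dtype=float))[::-1]
    rho = np.correlate(x, y, mode="full") / np.sqrt(np.dot(x, x) * np.dot(y, y))
    return float(rho.max())

def evolve(target, pop_size=30, generations=200, good_enough=0.99, seed=0):
    """Search for a sequence whose CCEM score against target is high."""
    rng = random.Random(seed)
    n, hi = len(target), max(target)
    population = [[rng.uniform(0.01, hi) for _ in range(n)]
                  for _ in range(pop_size)]
    best = population[0]
    for _ in range(generations):
        scored = sorted(population, key=lambda c: ccem_fitness(c, target),
                        reverse=True)
        best = scored[0]
        if ccem_fitness(best, target) >= good_enough:
            break                                  # accept a near-optimal solution
        parents = scored[: pop_size // 2]          # truncation selection
        children = [best]                          # elitism: keep the best as is
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            i = rng.randrange(n)                   # mutate one position
            child[i] = max(0.01, child[i] + rng.gauss(0.0, 0.1 * hi))
            children.append(child)
        population = children
    return best, ccem_fitness(best, target)

# Hypothetical target spectrum, standing in for a real re-sampled graph's SLS.
target_sls = [6.0, 3.0, 2.0, 1.0, 1.0, 0.5]
best, score = evolve(target_sls)
```

Stopping at `good_enough` instead of searching for the exact optimum mirrors reason (3): the loop ends as soon as a sufficiently similar candidate appears. In a fuller pipeline, each individual would hold model parameters, and the fitness function would generate a topology from them and compute its SLS before scoring.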
5. Conclusions
CCEM and its algorithm were studied in this paper. First, we verified the ability of spectral density to distinguish different graphs by applying it to an ER random graph, a BA scale-free graph, and the Internet topology graph. The three resulting spectra showed quite different properties, which confirmed that the spectral density approach can distinguish and identify Internet graphs.
Next, we obtained each topology's SLS eigenvalues and fed them into CCEM to quantitatively evaluate the difference between graphs.
Finally, using CCEM within a GA was recommended for Internet topology modeling and other large-scale topology research, in order to reduce computing complexity.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (60802031), the Liaoning Provincial Natural Science Foundation (201003676), and the Natural Science Foundation of Shenyang city (F10-205-1-26).
References
M. Faloutsos, P. Faloutsos, and C. Faloutsos, “On power-law relationships of the Internet topology,” ACM SIGCOMM Computer Communication Review, vol. 29, no. 4, pp. 251–262, 1999.
X. F. Wang, X. Li, and G. R. Chen, Complex Networks Theory and Its Application, Tsinghua University Press, Beijing, China, 2006.
Rorabaugh, Complete Digital Signal Processing, McGraw-Hill, New York, NY, USA, 2005.
D. B. West, “An introduction to graph theory,” China Machine Press, no. 1–47, pp. 339–348, 2006.
I. J. Farkas, I. Derényi, A. L. Barabási, and T. Vicsek, “Spectra of “real-world” graphs: beyond the semicircle law,” Physical Review E, vol. 64, no. 2, pp. 1–12, 2001.
Y. Zhang, H. L. Zhang, and B. X. Fang, “Survey on internet topology modeling,” Journal of Software, vol. 15, no. 8, pp. 1220–1226, 2004.
R. Teixeira, K. Marzullo, S. Savage, and G. M. Voelker, “In search of path diversity in ISP networks,” in Proceedings of the ACM SIGCOMM Internet Measurement Conference (IMC '03), pp. 313–318, October 2003.
Y. Xu, A TL model for router-level Internet macroscopic topology, Ph.D. thesis, Northeastern University, Shenyang, China, 2006.
A. L. Barabási and E. Bonabeau, “Scale-free networks,” Scientific American, vol. 288, no. 5, pp. 60–69, 2003.