Abstract

Most growth models for complex networks consider networks comprising a single connected block or island, which contains all the nodes in the network. However, it has been demonstrated that some large complex networks have more than one island, with an island size distribution () obeying a power-law function . This paper introduces a growth model that considers the emergence of islands as the network grows. The proposed model addresses the following two features: (i) the probability that a new island is generated decreases as the network grows and (ii) new islands are created with a constant probability at any stage of the growth. In the first case, the model produces an island size distribution that decays as a power-law with a fixed exponent and in-degree distribution that decays as a power-law with . When the second case is considered, the model describes island size and in-degree distributions that decay as a power-law with and , respectively.

1. Introduction

Research in complex networks (CN) has grown in interest and importance because many natural and artificial systems can be abstracted, modeled, and analyzed using this type of network. Examples of such systems are numerous: neuron connectivity [1], the plant pollination process [2], gene inheritance [3], metabolic interactions [4, 5], highway and road networks [6], emails [7], sexual partners [8], and many others.

Before 1998, most networks were studied using the random network model, which assumes that each node randomly chooses other nodes to get connected. This random selection process produces certain topological properties [9]. For example, both the in-degree and out-degree distributions may approximate either a Poisson or an exponential function [9].

Random models were employed before real data became available to verify the properties and characteristics obtained from these models. Still, it was difficult to emulate the behavior of real networks because many of the factors needed to perform such simulations were unknown. It is in this context that Paxson and Floyd published an article describing the main difficulties in simulating the Internet [10].

A new insight was provided by Redner in 1998, when he published a study about the distribution of citations in scientific publications [11]. In his study, publications are described as a network, where an article is represented as a node and the citations between papers are represented as a network edge. Redner discovered that the tail of the citation distribution decays as a power-law with an exponent .

In 1999, Faloutsos et al. published important topological properties occurring in the Internet [12]. Among the most interesting ones is the conclusion that the Internet’s out-degree and in-degree distributions at the autonomous system scale follow a power-law distribution [12]. In that same year, Adamic et al. showed that the WWW also follows a power-law distribution in some of its topological properties [13]. These studies suggest that these networks cannot be analyzed using the random model, since some of their connectivity properties do not behave as a Poisson or as an exponential function but rather as a power-law function.

It is important to notice that although these networks are fundamentally different, they still have similar properties. For example, even when the Internet has doubled its number of nodes, its diameter has not changed significantly. In this context, the diameter is the number of nodes that must be visited along the longest of all the shortest paths between pairs of nodes in the network.

These investigations were the watershed that initiated the search for power-law behaviors and other properties in systems that can be modeled as a CN. Within a few years this search led to discoveries confirming that power-law distributions, together with properties like clustering and diameter, can be found in most of these complex systems [14].

In 1998, Watts and Strogatz published a study about CN in which they affirmed that real networks are in an intermediate state between randomness and regularity. They named this type of system small-world networks because of their small diameter, and they also introduced a mechanism to produce them. However, the internal processes able to reproduce these properties were unknown, and thus there was no growth model with which to study them. At this point, Barabási and Albert published a study [15] where they introduced a growth model for CN based on a node aggregation process with preferential attachment. This model yields networks with a degree distribution that decays as a power-law with an exponent equal to 3. Such a model, however, has a limitation: it only generates networks with a fixed degree distribution exponent. This result contrasts with real networks obeying power-law out-degree and in-degree distributions, which both exhibit exponents . Therefore, a search began for new processes that could be included in the Barabási-Albert model in order to reproduce the topological properties found in real CN.

Some of these local processes have already been studied. For example, Krapivsky et al. [16] implemented a nonlinear preferential attachment growth mechanism. Dorogovtsev and Mendes [17] proposed another model in which nodes receive an initial attractiveness at birth. This is how they resolved a contradiction in Barabási’s original model for directed networks, which can be illustrated by the following example: on the first day of term in a classroom, nobody knows each other. Following Barabási’s model, the probability that a student obtains a new friend is proportional to the number of friends he or she already has. Since nobody has friends, no new friendships are created, which is not what happens in a real situation.

Another local process studied is the one introduced by Dorogovtsev and Mendes [18] who considered that nodes may become old and decrease their attractiveness. For example, scientific articles’ citations decrease with time. Albert and Barabási introduced a model with rewiring [19], in which there is a probability that some node deletes link to a node and then uses preferential attachment to a node using a new link . This process may be found in some networks, like the WWW when a web master deletes a hyperlink and adds a new one. Esquivel-Gómez et al. [20] described a model which prohibits multiple links between the same nodes. Acosta-Elias et al. [21] studied the impact of delays, which is based on the fact that each new node selects to which other node it should get connected using local information, instead of global topological information. In other words, delays allow for a behavior in which new nodes select new connections based on a partial view of the system.

There are many other models, processes, and properties that have been studied about CN. Many of these may be consulted in various review articles [14, 22, 23]. However, after sixteen years of different studies and analyses of CN, it seems that there is a lack of consensus and knowledge about the fundamental laws that govern these systems.

Using the percolation concept, there have been analytical studies about the properties of islands or connected components in random and CN [24]. Moreover, islands have been observed in real networks: Broder et al. [25] studied a WWW sample with approximately 200 million web pages and 1.5 billion links and discovered islands whose sizes, expressed as the number of pages, exhibit a power-law distribution with an exponent equal to 2.54. This power-law distribution of island sizes allows the existence of giant islands (giant connected components). Determining the probability that any given node belongs to these giant islands is important because it makes it possible to calculate the probability that a message generated by a node reaches another node or to model strategies that could stop the transmission of an epidemic disease. In the literature it is possible to find growth models capable of generating CN with community structure [26–30]; that is, networks containing groups of nodes strongly connected among themselves but weakly connected with the other nodes of the network. Likewise, there exist growth models [20, 31–35] in which every node added to the network always connects to existing ones. In other words, all the nodes in these models form a single island, which contains all the nodes of the network. However, in some real networks, such as the US patent citation network [36], the nodes form more than one island and the island size distribution follows a power-law [37].

A possible cause for the origination of islands in some real CN is that, during network growth, some nodes may be born with zero out-going links (i.e., patents without references to other patents) and this causes new islands to be generated.

This paper proposes a growth model for directed CN that considers the emergence of islands as the network grows. In the proposed model new islands are created with probability , contemplating two different cases: () remains constant during the whole life of the network and () decreases as the network grows. It is shown analytically and validated through numerical simulation that, for the first case, the model generates directed CN with power-law behavior in its in-degree () and island size () distributions with and both tunable from to . For the second case of the exponents are fixed to and , respectively.

The paper is organized as follows. Section 2 outlines the features of the model proposed in this paper. The analytical solution of the model and the experiment details and results are presented in Section 3. Finally, the discussion and conclusions are given in Sections 4 and 5, respectively.

2. Model

We consider that the birth of new islands is governed by a probability , taking into account two cases:

(i) , where is the number of nodes in the network. In this case, the probability that a new island is born decreases as the number of nodes in the network increases. This idea is mapped from real networks as follows: in a scientific papers citation network, when there are few papers (nodes), it is more probable that a new paper does not cite other papers (generating a new island) because it addresses an entirely new scientific topic. Conversely, when the quantity of papers increases, the probability that a new paper addresses an entirely new theme decreases; thus the probability of generating a new island also decreases.

(ii) , . In this case, the probability that a new island is created remains constant during the whole life of the network.

Furthermore, the network grows by adding one node at each time step. At the beginning, only node exists in the network, and for each new node added to the network, one of the following rules is performed:

(i) With probability , does not connect to any node in the network. That is, generates a new island (see Figure 1).

(ii) With complementary probability , randomly selects a node and connects to it, as well as to all nodes that have one incoming link from (see Figure 1).
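The two growth rules above can be sketched as a short simulation. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `grow_network`, the parameter `p_new_island` (a callable returning the island-birth probability for the current number of nodes), and the choice of labeling the initial node 0 are all hypothetical. Because a new node only links to a selected node and to that node's out-neighbors, it always joins the island of the selected node, so island membership can be tracked with a single label per node.

```python
import random


def grow_network(n_nodes, p_new_island, seed=None):
    """Grow a directed network node by node (illustrative sketch).

    p_new_island is an assumed interface: a callable that receives the
    current number of nodes t and returns the probability that the new
    node starts a new island.
    Returns (out_links, island_of), where out_links[i] lists the nodes
    that node i points to and island_of[i] is the island label of node i.
    """
    rng = random.Random(seed)
    out_links = [[]]      # the initial node has no outgoing links
    island_of = [0]       # and forms the first island
    n_islands = 1
    for new in range(1, n_nodes):
        t = new           # number of nodes already in the network
        if rng.random() < p_new_island(t):
            # Rule (i): the new node connects to nobody and starts an island.
            out_links.append([])
            island_of.append(n_islands)
            n_islands += 1
        else:
            # Rule (ii): pick a random existing node, link to it and to
            # every node that it already points to.
            target = rng.randrange(t)
            out_links.append([target] + out_links[target])
            island_of.append(island_of[target])
    return out_links, island_of
```

Under this interface, a constant probability can be passed as, for example, `lambda t: 0.1`, while a probability that decreases with network size can be passed as any decreasing function of `t`; the specific values here are placeholders rather than the settings used in the paper.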

3. Analytic Solution of the Model

3.1. Islands Size Distribution

In order to obtain the analytical solution for , the continuum method [38] was employed using the following differential equation:

Equation (1) describes the variation of the number of islands with nodes with respect to the total number of nodes in the network. Term describes the birth of a new island; that is, it models the situation that a new node does not connect with any node (first rule of this model). The term depicts the second rule of the model, term accounts for the situation that a new node randomly selects a node belonging to an island with nodes and connects to it, and thus increases. The term describes the situation that a new node randomly selects a node belonging to an island with nodes and connects to it, and thus decreases.

Equation (1) may also be written in the standard form for a linear differential equation: In order to investigate the impact that and have in , (2) is solved for each one of them. With , (2) takes the following form: In order to obtain , (3) is solved for , , and so on. For , (3) takes the following form: and solving (4) gives where is a constant and is the exponential integral function. As , (5) yields . Solving (3) for the following values produces the following: From the previous result (see (6)) it is possible to deduce that That is, with the proposed model is able to produce island size distributions with a power-law behavior with fixed exponent .

For , (2) takes the following form: In order to obtain , (8) is solved for , , and so on. For , (8) takes the following form: Solving (9) gives where is a constant. As , (10) can be approximated as Solving (8) for yields From the previous results (see (11) and (12)) one can deduce that Approximating with the Gamma Function we obtain From (14), when the model is able to produce island size distributions with a power-law behavior for with exponent . This allows to take values from to .

3.2. In-Degree Distribution

In order to obtain the analytical solution for the in-degree distribution generated with the proposed model, the continuum method is used [38]. Hence, the differential equation that describes the in-degree distribution may be written as follows: Equation (15) describes the variation of the number of nodes with incoming links with respect to the number of nodes in the network. The term represents how the number of nodes with incoming links increases, describes how a new node randomly selects a node with incoming links and connects to it, and describes how randomly selects an in-neighbor of a node that has incoming links and connects to it (see Figure 2); thus increases. The term takes into account how the number of nodes with incoming links decreases, and terms and perform similar functions as and . Finally, the terms and model the effect of adding a new node with zero incoming links using the second and the first rule of the model.

Equation (15) may be written in the standard form for a linear differential equation: In order to analyze the impact that and have in , (16) is solved for each one of them.

For , (16) takes the following form: Solving (17) for some values it is possible to deduce that That is, with the proposed model is able to produce in-degree distributions with a power-law behavior for with fixed exponent . The same result was previously obtained by Krapivsky and Redner [32] in a similar model without contemplating the emergence of islands during the network growth.

For , (16) takes the following form: Solving (19) for several values produces Approximating with the Gamma Function we obtain Therefore, if the proposed model is able to produce in-degree distributions with a power-law behavior for with exponent . This allows to take values from to .

3.3. Experiment Details and Results

In order to validate the analytical predictions for (see (7) and (13)), four numerical simulations were performed. For each simulation, the growth of a directed network from to nodes was realized taking into account the proposed model developed above. Figure 3 shows that produced by our simulations and by the analytical results fit appropriately.
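One way to tabulate an island size distribution from a simulated network and to estimate how fast it decays is sketched below. This is an illustrative sketch and not necessarily the procedure behind Figure 3: the helper names `island_size_distribution` and `loglog_slope` are hypothetical, the input is assumed to be a list giving one island label per node, and an ordinary least-squares fit on log-log axes is only a rough estimator of a power-law exponent.

```python
import math
from collections import Counter


def island_size_distribution(island_of):
    """Map each island size to the number of islands having that size.

    island_of is assumed to hold one island label per node.
    """
    sizes = Counter(island_of)          # island label -> number of nodes
    return Counter(sizes.values())      # island size -> number of islands


def loglog_slope(xs, ys):
    """Least-squares slope of log(ys) against log(xs).

    For data following y ~ x**(-a), the returned slope is close to -a.
    All values must be strictly positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    return sxy / sxx
```

In practice the tail of an empirical distribution is noisy, so the fit would normally be restricted to a range of sizes with enough islands per size; that choice is left to the reader.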

In order to validate the analytical predictions for (18) and (20), four numerical simulations were performed. For each simulation, we considered the growth of a directed network from to nodes. Figure 4 shows the comparison of produced by the simulations and the analytical results, showing that both fit appropriately.

Figure 5 shows the experimental out-degree distribution . It can be seen that when , the model produces networks with following a Poisson distribution (Figure 5(a)). On the other hand, when , decays as an exponential function as , whereas it exhibits a combination of an exponential and Poisson distribution when (see Figure 5(b)).

3.4. Numerical Study of and in the Islands

In order to investigate the in-degree and out-degree distributions occurring in the islands generated with our model, several numerical simulations were also performed. The numerical simulations consisted in simulating the growth of a network from to nodes with and (with ) and three islands of different sizes were randomly selected.
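Measuring the in-degree distribution inside each island separately can be sketched as follows. The data layout is an assumption made for illustration: `out_links[i]` lists the nodes that node i points to, `island_of[i]` is the island label of node i, and the function name `in_degree_distribution_by_island` is hypothetical.

```python
from collections import Counter, defaultdict


def in_degree_distribution_by_island(out_links, island_of):
    """Return {island label: Counter(in-degree -> number of nodes)}."""
    # Count incoming links for every node.
    in_degree = [0] * len(out_links)
    for targets in out_links:
        for j in targets:
            in_degree[j] += 1
    # Group the in-degree counts by the island each node belongs to.
    dist = defaultdict(Counter)
    for node, k in enumerate(in_degree):
        dist[island_of[node]][k] += 1
    return dist
```

Selecting a few islands of different sizes and plotting their counters on log-log axes would then allow a visual comparison of the kind described above.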

Figures 6(a)–6(d) show that the in-degree distributions of the islands all follow a power-law with exponent . The only difference among them is the scale, owing to the size of each island. More precisely, all the in-degree distributions are consistent with (18) and with the result obtained by Krapivsky and Redner [32]. This result is easily interpreted since, within each island, the in-degree distribution is governed only by the copy process, which gives rise to the power-law with exponent . Similar behavior is observed for the out-degree distribution (Figure 7).

4. Discussion

Importantly, this model does not consider the case in which increases as the number of nodes increases. This is because when is large enough, new nodes added to the network would have a high probability of not connecting to other nodes, thus generating new islands. Therefore, the resulting network would be composed of a large number of isolated nodes. Also, the situation in which a new node connects to nodes present in different islands is not considered, because this would result in the fusion of two or more islands. These cases will be included in future work.

Also, the proposed model produces networks with out-degree distributions that follow exponential and Poisson distributions. This contrasts with the out-degree distributions of several real networks in which the out-degree distribution follows a power-law. This is a limitation of the proposed model and could be a future topic of study.

Finally, it should be noted that the result obtained in (14) is similar to that of the model proposed by Simon [39], which was developed to explain the distribution of words in prose samples by their frequency of occurrence. It is remarkable that such different processes produce similar mathematical results.

5. Conclusions

In summary, large complex networks such as the US patent citation network [36] have more than one island, and their island size distribution follows a power-law [37]. In order to reproduce this behavior, we have introduced a growth model for complex networks that considers the creation of islands during the growth of the network. In this model two cases are considered: namely, the creation of islands is more frequent in the network's early stages of growth, and the probability of creating new islands remains constant as the network grows. When the first case is used, the generated networks have and with power-law behavior with scaling exponents and , respectively. When the second case is used, the generated networks exhibit and with power-law behavior with scaling exponents both ranging from to .

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was partially supported by Consejo Nacional de Ciencia y Tecnología and Secretaría de Educación Pública-PRODEP (DSA/103.5/15/6660).