Complexity

Volume 2017, Article ID 3250301, 12 pages

https://doi.org/10.1155/2017/3250301

## On Measuring the Complexity of Networks: Kolmogorov Complexity versus Entropy

^{1}Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznań, Poland

^{2}Department of Computational Intelligence, ENGINE-The European Centre for Data Science, Wrocław University of Science and Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland

Correspondence should be addressed to Mikołaj Morzy; mikolaj.morzy@put.poznan.pl

Received 6 April 2017; Revised 27 July 2017; Accepted 13 August 2017; Published 1 November 2017

Academic Editor: Pasquale De Meo

Copyright © 2017 Mikołaj Morzy et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

One of the most popular methods of estimating the complexity of networks is to measure the entropy of network invariants, such as adjacency matrices or degree sequences. Unfortunately, entropy and all entropy-based information-theoretic measures have several vulnerabilities. These measures are neither independent of a particular representation of the network nor capable of capturing the properties of the generative process which produces the network. Instead, we advocate the use of the algorithmic entropy as the basis for a complexity definition for networks. Algorithmic entropy (also known as Kolmogorov complexity or K-complexity for short) evaluates the complexity of the description required for a lossless recreation of the network. This measure is not affected by a particular choice of network features and it does not depend on the method of network representation. We perform experiments on Shannon entropy and K-complexity for gradually evolving networks. The results of these experiments point to K-complexity as the more robust and reliable measure of network complexity. The original contribution of the paper includes the introduction of several new entropy-deceiving networks and the empirical comparison of entropy and K-complexity as fundamental quantities for constructing complexity measures for networks.

#### 1. Introduction

Networks are becoming increasingly important in contemporary information science because they provide a holistic model for representing many real-world phenomena. The abundance of data on interactions within complex systems allows network science to describe, model, simulate, and predict behaviors and states of such complex systems. It is thus important to characterize networks in terms of their complexity, in order to adjust analytical methods to particular networks. The measure of network complexity is essential for numerous applications. For instance, the level of network complexity can determine the course of various processes happening within the network, such as information diffusion, failure propagation, actions related to control, or resilience preservation. Network complexity has been successfully used to investigate the structure of software libraries [1], to compute the properties of chemical structures [2], to assess the quality of business processes [3–5], and to provide general characterizations of networks [6, 7].

Complex networks are ubiquitous in many areas of science, such as mathematics, biology, chemistry, systems engineering, physics, sociology, and computer science, to name a few. Yet the very notion of network complexity lacks a strict and agreed-upon definition. In general, a network is considered “complex” if it exhibits many features such as small diameter, high clustering coefficient, anticorrelation of node degrees, presence of network motifs, and modularity structures [8]. These features are common in real-world networks, but they rarely appear in artificial random networks. Finding a good metric with which one can estimate the complexity of a network is not a trivial task. A good complexity measure should not depend solely on the number of vertices and edges, but it must take into consideration topological characteristics of the network. In addition, complexity is not synonymous with randomness or unexpectedness. As has been pointed out [8], within the spectrum of possible networks, from the most ordered (cliques, paths, and stars) to the most disordered (random networks), complex networks occupy the very center of this spectrum. Finally, a good complexity measure should not depend on a particular network representation and should yield consistent results for various representations of the same network (adjacency matrix, Laplacian matrix, and degree sequence). Unfortunately, as current research suggests, finding a good complexity measure applicable to a wide variety of networks is very challenging [9–11].

Among many possible measures which can be used to define the complexity of networks, the entropy of various network invariants has been by far the most popular choice. Network invariants considered for defining entropy-based complexity measures include number of vertices, number of neighbors, number of neighbors at a given distance [12], distance between vertices [13], energy of network matrices such as Randić matrix [14] or Laplacian matrix [15], and degree sequences. There are multiple definitions of entropies, usually broadly categorized into three families: thermodynamic entropies, statistical entropies, and information-theoretic entropies. In the field of computer science, information-theoretic measures are the most prevalent, and they include Shannon entropy [16], Kolmogorov-Sinai entropy [17], and Rényi entropy [18]. These entropies are based on the concept of the information content of a system and they measure the amount of information required to transmit the description of an object. The underlying assumption of using information-theoretic definitions of entropy is that uncertainty (as measured by entropy) is a nondecreasing function of the amount of available information. In other words, systems in which little information is available are characterized by low entropy and therefore are considered to be “simple.” The first idea to use entropy to quantify the complexity of networks comes from Mowshowitz [19].

Despite the ubiquity of general-purpose entropy definitions, many researchers have developed specialized entropy definitions aimed at describing the structure of networks [10]. Notable examples include the proposal by Ji et al. to measure the unexpectedness of a particular network by comparing it to the number of possible network configurations available for a given set of parameters [20]. This concept is clearly inspired by algorithmic entropy, which defines the complexity of a system not in terms of its information content, but in terms of its generative process. A different approach to measuring the entropy of networks has been introduced by Dehmer in the form of the information functional [21]. Information functionals can also be used to quantify network entropy in terms of vertex neighborhoods at a given distance [12, 13] or independent sets of vertices [22]. Yet another approach to network entropy has been proposed by Körner, who advocates the use of stable sets of vertices as the basis for computing network entropy [23]. Several comprehensive surveys of network entropy applications are also available [9, 11].

Within the realm of information science, the complexity of a system is most often associated with the number of possible interactions between elements of the system. Complex systems evolve over time, they are sensitive to even minor perturbations at the initial steps of development, and they often involve nontrivial relationships between constituent elements. Systems exhibiting a high degree of interconnectedness in their structure and/or behavior are commonly thought to be difficult to describe and predict, and, as a consequence, such systems are considered to be “complex.” Another possible interpretation of the term “complex” relates to the size of the system. In the case of networks, one might consider using the number of vertices and edges to estimate the complexity of a network. However, the size of the network is not a good indicator of its complexity, because networks which have well-defined structures and behaviors are, in general, computationally simple.

In this work, we do not introduce a new complexity measure, nor do we propose new information functionals or network invariants on which an entropy-based complexity measure could be defined. Rather, we follow the observations formulated in [24] and present a criticism of entropy as the guiding principle for constructing complexity measures. Thus, we do not use any specific formal definition of complexity, but we provide additional arguments why entropy may be easily deceived when trying to evaluate the complexity of a network. Our main hypothesis is that algorithmic entropy, also known as Kolmogorov complexity, is superior to traditional Shannon entropy because algorithmic entropy is more robust, less dependent on the network representation, and better aligned with the intuitive human understanding of complexity.
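Kolmogorov complexity itself is uncomputable, so in practice it must be approximated. One common practical proxy (shown here purely as an illustrative sketch, not necessarily the estimator used later in this paper) is the length of a losslessly compressed representation of the object:

```python
import random
import zlib

def compressed_size(s: str) -> int:
    """Length in bytes of the zlib-compressed string: a crude,
    computable upper bound related to the Kolmogorov complexity
    of the string's description."""
    return len(zlib.compress(s.encode("ascii"), 9))

# A highly regular bit string (e.g., the flattened adjacency matrix of a
# very ordered graph) compresses far better than a random bit string of
# the same length, mirroring the intuition that ordered structures have
# short descriptions.
regular = "01" * 512
random.seed(0)
noisy = "".join(random.choice("01") for _ in range(1024))

print(compressed_size(regular), compressed_size(noisy))
```

The gap between the two compressed sizes illustrates why compression-based estimates can separate ordered from disordered structures even when both strings have the same length.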

The organization of the paper is as follows. In Section 2, we introduce basic definitions related to entropy and we formulate arguments against the use of entropy as the complexity measure of networks. Section 2.3 presents several examples of entropy-deceiving networks, which provide both motivation and anecdotal evidence for our hypothesis. In Section 3, we introduce Kolmogorov complexity and we show how this measure can be applied to networks, despite its high computational cost. The results of the experimental comparison of entropy and Kolmogorov complexity are presented in Section 4. The paper concludes in Section 5 with a brief summary and future work agenda.

#### 2. Entropy as the Measure of Network Complexity

##### 2.1. Basic Definitions

Let us introduce the basic definitions and notation used throughout the remainder of this paper. A *network* is an ordered pair $G = (V, E)$, where $V = \{v_1, \ldots, v_n\}$ is the set of *vertices* and $E \subseteq V \times V$ is the set of *edges*. The *degree* of the vertex $v_i$ is the number of vertices adjacent to it, $k_i = |\{v_j \in V : (v_i, v_j) \in E\}|$. A given network can be represented in many ways, for instance, using an *adjacency matrix* $A = [a_{ij}]$ defined as

$$a_{ij} = \begin{cases} 1 & \text{if } (v_i, v_j) \in E, \\ 0 & \text{otherwise}. \end{cases}$$

An alternative to the adjacency matrix is the *Laplacian matrix* $L = [l_{ij}]$ of the network, defined as

$$l_{ij} = \begin{cases} k_i & \text{if } i = j, \\ -1 & \text{if } i \neq j \text{ and } (v_i, v_j) \in E, \\ 0 & \text{otherwise}, \end{cases}$$

that is, $L = D - A$ with $D = \operatorname{diag}(k_1, \ldots, k_n)$.

Other popular representations of networks include the *degree list* defined as $D(G) = (k_1, k_2, \ldots, k_n)$ and the *degree distribution* defined as

$$P(k) = \frac{|\{v_i \in V : k_i = k\}|}{n},$$

that is, the fraction of vertices of degree $k$.
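As a concrete illustration, the representations above can be computed for a small example graph (a 4-cycle; the variable names below are our own, not the paper's notation):

```python
import numpy as np

# Edges of a small undirected example graph: a 4-cycle.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n = 4

# Adjacency matrix: a_ij = 1 iff (v_i, v_j) is an edge.
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1

# Degree list: k_i = number of neighbors of v_i.
k = A.sum(axis=1)

# Laplacian matrix: L = D - A with D = diag(k_1, ..., k_n).
L = np.diag(k) - A

# Degree distribution: P(k) = (number of vertices of degree k) / n.
values, counts = np.unique(k, return_counts=True)
P = dict(zip(values.tolist(), (counts / n).tolist()))

print(A)
print(L)
print(P)  # every vertex of the 4-cycle has degree 2, so P = {2: 1.0}
```

Each of these objects describes the same graph, which is precisely why a complexity measure that depends on the chosen representation is problematic.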

Although there are numerous different definitions of entropy, in this work we focus on the definition most commonly used in the information sciences, the Shannon entropy [16]. This measure represents the amount of information required to provide a statistical description of the network. Given a discrete random variable $X$ with $n$ possible outcomes $x_1, \ldots, x_n$, the Shannon entropy of the variable is defined as a function of the probabilities $p(x_i)$ of all outcomes of $X$:

$$H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i).$$

Depending on the selected base of the logarithm, the entropy is expressed in bits ($\log_2$), nats ($\ln$), or dits ($\log_{10}$); bits are also known as shannons, and dits as hartleys. The above definition applies to discrete random variables; for random variables with continuous probability distributions, differential entropy is used, usually along with the limiting density of discrete points. Given a variable $X$ with $n$ possible discrete outcomes such that in the limit $n \to \infty$ the density of $X$ approaches the invariant measure $m(x)$, the continuous entropy is given by

$$H(X) = \log n - \int p(x) \log \frac{p(x)}{m(x)}\, dx.$$
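For a finite sample, such as a network's degree sequence, the Shannon entropy can be estimated from empirical outcome frequencies. A minimal sketch (the function name is ours):

```python
import math
from collections import Counter

def shannon_entropy(outcomes, base=2):
    """Empirical Shannon entropy H(X) = -sum_i p(x_i) * log p(x_i),
    with probabilities estimated as relative frequencies."""
    counts = Counter(outcomes)
    total = len(outcomes)
    h = 0.0
    for c in counts.values():
        p = c / total
        h -= p * math.log(p, base)
    return h

# A degree sequence treated as a discrete random variable:
degrees = [1, 1, 2, 2, 2, 3, 3, 4]
print(shannon_entropy(degrees))           # entropy in bits (base 2)
print(shannon_entropy(degrees, math.e))   # the same entropy in nats
```

Changing the `base` argument only rescales the result by a constant factor, which is why the choice of unit (bits, nats, dits) is immaterial for comparing networks.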

In this work, we are interested in measuring the entropy of various network invariants. These invariants can be regarded as discrete random variables with the number of possible outcomes bounded by the size of the available alphabet, either binary (in the case of adjacency matrices) or decimal (in the case of other invariants). Consider the 3-regular graph presented in Figure 1. This graph can be described using the following adjacency matrix: