#### Abstract

We propose a novel, information-based classification of elementary cellular automata. The proposed classification scheme circumvents the problem of determining whether complexity is intrinsic to a dynamical rule or arises merely as a product of a complex initial state. Transfer entropy variations processed by cellular automata split the 256 elementary rules into three *information classes*, based on sensitivity to initial conditions. These classes form a hierarchy such that coarse-graining transitions observed among elementary rules predominantly occur within each information-based class or, much more rarely, down the hierarchy.

#### 1. Introduction

Complexity is easily identified, but difficult to quantify. Attempts to classify dynamical systems in terms of their complexity have therefore so far relied primarily on qualitative criteria. A good example is the classification of cellular automaton (CA) rules. CA are discrete dynamical systems of great interest in complexity science because they capture two key features of many physical systems: they evolve according to a local uniform rule and can exhibit rich emergent behavior even from very simple rules [1]. Studying the dynamics of CA can therefore provide insights into the harder problem of how it is that the natural world appears so complex given that the known laws of physics are local and (relatively) simple. However, in the space of CA rules, interesting emergent behavior is the exception rather than the rule. This has generated widespread interest in understanding how to segregate those local rules that generate rich, emergent behavior (for example, gliders and particles, computational universality, and so on) from those that do not. For the seminal paper on Conway’s *Game of Life*, cf. [2]; the first outlines of a proof of the universality of the two-dimensional Game of Life can be found in [3, 4]; proofs of the universality of nonelementary, one-dimensional cellular automata are in [5, 6]; and for the famous proof of the universality of elementary rule 110, cf. [7]. A complication arises in that the complexity of the output of a CA rule is often highly dependent on that of the input state. This makes it difficult to disentangle emergent behavior that is a product of the initial state from that which is *intrinsic* to the rule. This has resulted in ambiguity in classifying the intrinsic complexity of CA rules, as one must inevitably execute a CA rule with a particular initial state in order to express its complexity (or lack thereof).

One of the first attempts to classify CA rules was provided by Wolfram in his classification of elementary cellular automata (ECA) [8]. ECA are some of the simplest CA and are 1-dimensional with nearest-neighbor update rules operating on the two-state alphabet $\{0, 1\}$. Despite their simplicity, ECA are known to be capable of complex emergent behavior. Initializing an ECA in a random state leads some rules to converge on fixed point or oscillatory attractors, while others lead to chaotic patterns that are computationally irreducible (such that their dynamics are impossible to predict from the initial state and chosen rule without actually executing the full simulation). Based on these diverse behaviors, Wolfram identified four distinct complexity classes, shown in Table 1. Class I CA are those whose evolution eventually leads to cells of only one kind. Class II CA lead to scattered stable or oscillating behaviors. Class III CA show an irreducibly chaotic pattern. Class IV CA can exhibit any of the previous behaviors simultaneously and seem to possess the kind of complexity that lies at the interface between mathematical modeling and life studies [9]. Wolfram’s classification stands as a milestone in the understanding of CA properties and still represents the most widely adopted classification scheme. Nonetheless, its qualitative nature is its main limitation. In fact, as Culik II and Yu showed, formally redefining Wolfram’s classes [10] reveals that determining which class a given ECA belongs to is actually *undecidable*.

Despite this no-go result, efforts to better classify CA have not diminished. Li and Packard [11] introduced a classification scheme refining Wolfram’s originally proposed scheme, with the explicit goal of better distinguishing between locally and globally chaotic behaviors. Langton introduced the first attempt at a quantitative classification [12]. His classification implemented a projection of the CA *rule space* over the 1D closed interval $[0, 1]$: CA rules with similar qualitative features are then roughly arranged in terms of comparable values of the Langton parameter, with the most complex behaviors found at the boundary separating periodic rules from chaotic ones. Alternative approaches to quantitative classification include generalizations of concepts from continuous dynamical systems applied to CA, such as the maximum Lyapunov exponent (MLE) [13]. For CA, the MLE is defined in terms of the Boolean Jacobian matrix [14] and captures the main properties of its continuous counterpart: it encodes the tendency of a CA to reabsorb *damage* in its initial configuration or to let it spread, as a consequence of chaotic dynamics. This approach proved useful in more recent analyses of the stability and the shift in complexity of CA in response to changes in topology [15–18]. A number of other classification schemes have been proposed over the years, based on index complexity [19], density parameter with d-spectrum [20], communication complexity [21], algorithmic complexity [22], integrated information [23], and so on. A comprehensive review is outside the scope of this paper. The interested reader is directed to the recent review by Martínez [24] and the literature cited therein.

In this paper, we report on experiments demonstrating a new quantitative classification of the intrinsic complexity of ECAs, which differs from earlier attempts by exploiting the main weakness plaguing many approaches to quantitative classification. That is, we explicitly utilize the sensitivity of the expressed complexity of ECA rules to the initial input state. In recent years, there has been increasing interest in using information-theoretic tools to quantify the complexity of dynamical systems, particularly in the context of understanding biological systems [25, 26]. A promising tool in this context is transfer entropy (TE), Schreiber’s measure of the directed exchange of information between two interacting parts of a dynamical system [27]. In what follows, we adopt TE as a candidate quantitative selection criterion to classify the intrinsic complexity of ECAs by exploring the sensitivity of TE to changes in the initial state of a CA.

The paper is structured as follows. In Section 2, we start from the simplest, nontrivial, initial configuration of an ECA and use it to identify the dynamical rules able to produce a complex output (as quantified by TE) by virtue of their intrinsic complexity. In Section 3, we repeat our analysis for more general inputs and identify those outputs whose complexity is instead inherited from the complexity of the input, as opposed to the rule. We then classify ECA rules according to the maximum degree of variability of the output they produce for varying inputs. As we will show, three quantitatively and qualitatively distinct classes naturally emerge from this analysis. In Section 4, we show that this classification induces a partially ordered hierarchy among the rules, such that coarse-graining an ECA of a given class yields an ECA of the same class, or simpler [1]. We conclude by proposing further applications of the classification method presented.

#### 2. Intrinsic Complexity

In what follows we identify the complexity of a CA with the *amount of information it processes* during its time evolution. The concept of information processing presented herein is adopted from information dynamics [29–32], a formalism that quantifies computation in dynamical systems using methods from information theory. In information dynamics, Schreiber’s transfer entropy [27] is identified with the information processed between two elements of a dynamical system. In Schreiber’s measure, the transfer of information between a source $Y$ and a target $X$ is quantified as the reduction in uncertainty of the future state of $X$ due to knowledge of the state of $Y$. In a CA, the source and target are both cells of the CA.

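Formally, TE is defined as the mutual information between the source $Y$ at a certain time $n$ and the target $X$ at time $n+1$, conditioned on a given number $k$ of previous states of $X$:

$$T_{Y \to X}^{(k)} = \sum_{(x_{n+1},\, x_n^{(k)},\, y_n) \in \mathcal{A}} p\left(x_{n+1}, x_n^{(k)}, y_n\right) \log_2 \frac{p\left(x_{n+1} \mid x_n^{(k)}, y_n\right)}{p\left(x_{n+1} \mid x_n^{(k)}\right)}, \tag{1}$$

where (see Figure 1) (i) $x_{n+1}$ represents the state of cell $X$ at time $n+1$; (ii) $y_n$ is the state of cell $Y$ at the previous time $n$; (iii) $x_n^{(k)}$ is the vector of the $k$ previous states of cell $X$; (iv) $\mathcal{A}$ represents the set of all possible patterns of sets of states $(x_{n+1}, x_n^{(k)}, y_n)$.

To calculate TE, one must start with the time series of states of a CA. Given the time series (see Figure 1), the occurrence numbers of patterns of states $(x_{n+1}, x_n^{(k)}, y_n)$ are counted for each combination of cells $X$ and $Y$ in the array. Once normalized to the total number of (not necessarily distinct) patterns appearing in the time series, these frequencies are identified with the probabilities $p(x_{n+1}, x_n^{(k)}, y_n)$ appearing in (1). The conditional probabilities $p(x_{n+1} \mid x_n^{(k)}, y_n)$ and $p(x_{n+1} \mid x_n^{(k)})$ are calculated analogously.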
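The counting procedure just described can be sketched in a few lines of Python. This is a minimal plug-in (frequency-counting) estimator written for illustration, not the authors' implementation; the function name `transfer_entropy` and the default `k=1` are assumptions made here, and the history length should be set to whatever value the analysis calls for.

```python
from collections import Counter
from math import log2


def transfer_entropy(source, target, k=1):
    """Plug-in estimate (in bits) of the TE from `source` to `target`.

    `source` and `target` are equal-length sequences of cell states;
    `k` is the history length of the target (illustrative default).
    """
    if len(source) != len(target):
        raise ValueError("time series must have equal length")

    # Count occurrences of the patterns (x_{n+1}, x_n^(k), y_n).
    joint = Counter()
    for n in range(k - 1, len(target) - 1):
        past = tuple(target[n - k + 1 : n + 1])
        joint[(target[n + 1], past, source[n])] += 1
    total = sum(joint.values())
    if total == 0:
        return 0.0

    # Marginal counts needed for the two conditional probabilities.
    past_src = Counter()   # (x_n^(k), y_n)
    nxt_past = Counter()   # (x_{n+1}, x_n^(k))
    past_only = Counter()  # x_n^(k)
    for (nxt, past, src), c in joint.items():
        past_src[(past, src)] += c
        nxt_past[(nxt, past)] += c
        past_only[past] += c

    te = 0.0
    for (nxt, past, src), c in joint.items():
        p_joint = c / total
        p_given_both = c / past_src[(past, src)]
        p_given_past = nxt_past[(nxt, past)] / past_only[past]
        te += p_joint * log2(p_given_both / p_given_past)
    return te
```

As a sanity check, a target that simply copies the previous state of a random binary source should give a value close to 1 bit, while two constant series give exactly 0.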

Changing the value of the history length can affect the value of TE calculated for the same time series data. We find very similar values of TE for and 6 and observe a considerable decrease in TE for and (cf. Section of [32]), relative to the value of TE for . We therefore consider as the optimal value of that properly captures the past history of for the results presented here (see Section 3 for a visual explanation of why represents the optimal history length).

We first generated time series for the simplest initial state for each of the 256 possible ECA rules, which we numerically label following Wolfram’s numbering scheme [33, 34]. Here and in the following, periodic boundary conditions are enforced. Therefore, the specific location of the differently colored cell in the input array is irrelevant. Our initial state is either a single black cell (1) on a background of white cells (0) or the equivalent state obtained through a conjugation (the exchange $0 \leftrightarrow 1$). For example, rules 18 and 183 are equivalent under a conjugation. The input is updated using rule 18, and the conjugated input using rule 183. A comprehensive study of the symmetry properties of ECA can be found in [11], where the 256 possible ECA rules are grouped into 88 different equivalence classes. Each class contains no more than four rules, which show the same behavior under $0 \leftrightarrow 1$ exchange, left-right transformations, or the combination of both transformations. The interested reader can find this classification summarized in Table 1 of the cited work. For Wolfram Classes III and IV, the equivalence classes are also shown in the legend of Figure 2. (Wolfram’s most recent ECA classification scheme is now implemented in the *Wolfram Alpha* computational engine [28] and reproduced in Table 1 for the convenience of the reader.)
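The setup described above (Wolfram numbering, periodic boundaries, and the two symmetry transformations) can be illustrated with a short Python sketch. The helper names `eca_step`, `evolve`, `conjugate`, and `reflect` are hypothetical and chosen here for illustration; the 101-cell width is an assumption based on the $101^{2}$ TE values mentioned later in the text.

```python
def eca_step(state, rule):
    """One synchronous ECA update with periodic boundaries (Wolfram numbering)."""
    n = len(state)
    return [
        (rule >> (4 * state[(i - 1) % n] + 2 * state[i] + state[(i + 1) % n])) & 1
        for i in range(n)
    ]


def evolve(state, rule, steps):
    """Return the list of rows: the initial state followed by `steps` updates."""
    rows = [list(state)]
    for _ in range(steps):
        rows.append(eca_step(rows[-1], rule))
    return rows


def conjugate(rule):
    """Rule obtained by exchanging 0 <-> 1 in both the neighborhood and the output."""
    return sum((1 - ((rule >> (7 - i)) & 1)) << i for i in range(8))


def reflect(rule):
    """Rule obtained by exchanging the left and right neighbors."""
    out = 0
    for i in range(8):
        left, center, right = (i >> 2) & 1, (i >> 1) & 1, i & 1
        out |= ((rule >> (4 * right + 2 * center + left)) & 1) << i
    return out


# Single-cell input and its conjugate (assumed width: 101 cells).
single = [0] * 101
single[50] = 1
conjugated = [1 - s for s in single]

rows_18 = evolve(single, 18, 20)
rows_183 = evolve(conjugated, conjugate(18), 20)
```

With these definitions, `conjugate(18)` evaluates to 183, and every row of `rows_183` is the cell-by-cell complement of the corresponding row of `rows_18`, which is the sense in which the two rules are equivalent.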

**Figure 2:** (a) Single-cell input. (b) Random input.

We calculate the amount of *information processed by a CA* in a space-time patch as the average of TE calculated over that region, referred to below as the average TE. To do so, we evolve the CA for 250 time steps and then remove the first 50 time steps from each generated time series. This allows us to evaluate TE over a region of the CA in causal contact with the initial input at each point in space and time. This ensures that the observed dynamics over the relevant space-time patch are driven by both the chosen rule *and* the initial state. The resulting time series is then used to evaluate the $101^{2}$ values of TE, one for each combination of source and target cells. For each rule, the average value is shown in Figure 2(a). Equivalent rules, like rules 30, 86, 135, and 149, produce the same average TE and are represented by a common marker. For clarity, individual Wolfram Classes I and II rules are not shown and are instead replaced by a dashed line corresponding to the highest average TE of any individual Class I or II rule. All Class III rules lie above the range of Classes I and II. Interestingly, with the sole exception of rule 110 and its equivalent rules, all Class IV rules lie within the range of Classes I and II rules.

The single-cell input considered here is extremely rare within the space of all possible initial inputs. The number of black cells in a state randomly extracted among the different possible inputs follows a binomial distribution, meaning that states containing 50 or 51 black and white cells are about times more likely than our initially selected input. Our motivation for considering the single-cell input first is that it automatically excludes many trivial rules from our search for the complex ones. Rules that duplicate the input in a trivial way or annihilate it to produce all or all naturally yield . This is the same approach that has been recently assumed in algorithmic complexity based classification of CAs [22]. It has the advantage of selecting many rules according to their intrinsic complexity, and not the complexity of the input. However, a major shortcoming of choosing the single-cell input is that many Class IV rules look simpler than what they truly are. Class IV rule 106 is a good example. For the simple input of a single black bit, rule 106 functions to shift the black cell one bit to the left at each time-step, generating a trivial trajectory. However, in cases where the input allows two black cells to interact, the full potential of rule 106 can be expressed. This is an explicit example of the sensitivity of the behavior of some ECA rules to the complexity of their input, as discussed in the introduction. To isolate the complexity of the rule from that inherited by the input, we must therefore consider a random sample of initial states, as we do next.
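The binomial argument above can be checked with a short calculation. The sketch below assumes an array of 101 cells (inferred from the $101^{2}$ TE values, not stated explicitly here) and compares the number of states with 50 black cells to the number of states with a single black cell; whether the comparison should instead be made against one specific single-cell state is an open interpretation, so the alternative ratio is computed as well.

```python
from math import comb

n_cells = 101  # assumption: array width implied by the 101^2 TE values

single_cell_states = comb(n_cells, 1)   # states with exactly one black cell
balanced_states = comb(n_cells, 50)     # same count as comb(n_cells, 51)

# How much more likely a uniformly random state is to contain 50 black cells
# than exactly one black cell.
ratio_vs_single_cell_class = balanced_states / single_cell_states

# Alternative reading: compare against one specific single-cell state.
ratio_vs_one_state = float(balanced_states)

print(f"{ratio_vs_single_cell_class:.2e}")
print(f"{ratio_vs_one_state:.2e}")
```

Under either reading, nearly balanced states outnumber the single-cell input by many orders of magnitude, which is the point made in the text.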

#### 3. Inherited Complexity and Information-Based Classification

We next consider a more generic input, randomly selected among the different possibilities. As the input is no longer symmetric, we now need to consider reflections in selecting the equivalent input-rule associations. The scenario changes completely in this case, as shown in Figure 2(b), where the highest values of the average TE correspond to Class II rules, including 15, 85, 170, and 240, which generate trivial, wave-like behaviors. These rules behave like rule 106 (especially 170 and 240) when initialized with a single-cell input, but they do not contribute any new, emergent nontrivial features when nearby cells interact. For all purposes they appear *less* intrinsically complex. With the sole exception of rule 110 and its equivalent rules, Class IV rules behave like many rules of Class II and exhibit a large increase in complexity, as qualitatively observed and also as quantitatively captured by the average TE, in response to a more complex input.

It is worth noting that all of the rules whose average TE lay above the upper limit for Classes I and II in the simplest input scenario still have an average TE above this value under a change of input. In particular, the rules with the highest values of the average TE are not significantly affected by the change in the input. Let us naively use the upper limit for Classes I and II rules emerging from Figure 2(a) as the border line between what we call *low* and *high* values of the average TE. We can summarize the changes under varying the input as follows. There are rules whose average TE is low for both inputs. There are rules whose average TE is low in the simplest case, but high in the more complex one. And, finally, there are also rules with a high average TE in both cases. The interesting point is that *we find no rule whose average TE is high for the simplest input and low for the random input*. This is equivalent to saying that there are no rules generating complexity for simple inputs but annihilating it for complex ones. The significance of this observation is that it enables classification in terms of the shift of the average TE over a space-time region in response to a change in its input: it takes the main limitation that makes quantitative classifications of ECAs so difficult, that is, ECA sensitivity to their initial state, and exploits it in order to achieve such a classification.

To confirm that this is indeed a viable method for quantitative classification, we must consider more than just two inputs. We therefore first randomly selected twenty different ones. Being interested in how much the average TE can vary as we vary the input, for each rule we selected the maximum absolute value of the percent change of the average TE between the random inputs and the single-cell input considered before.
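The quantity described in words above can be written compactly. The notation below is introduced here only for illustration and is not taken from the original: $\langle \mathrm{TE} \rangle_0$ denotes the average TE for the single-cell input, $\langle \mathrm{TE} \rangle_i$ the average TE for the $i$-th random input, and $\Delta_{\max}$ the maximum percent change.

```latex
\Delta_{\max}
  = \max_{i}\,
    \left|
      \frac{\langle \mathrm{TE} \rangle_i - \langle \mathrm{TE} \rangle_0}
           {\langle \mathrm{TE} \rangle_0}
    \right| \times 100\%
```

This form assumes $\langle \mathrm{TE} \rangle_0 > 0$; rules for which the single-cell input yields no information transfer would need separate treatment.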

The results are shown in Figure 3, where the maximum percent change is plotted as a function of the average TE for the single-cell input. Equivalent rules are represented by a single marker and share the same value for the maximum relative change of the average TE. In fact, given two rules $R$ and $R'$ sharing a symmetry, it is always possible to find two inputs $s$ and $s'$ such that rule $R$ initialized to $s$ and rule $R'$ initialized to $s'$ give rise to the same value of the average TE.

The horizontal dashed line separates rules whose maximum change is less than one order of magnitude from those that can undergo a change in the average TE of at least a factor of 10. The vertical dashed line denotes the highest value of the average TE for Wolfram Classes I and II, exactly as in Figure 2(a). The region to the right of the vertical line and above the horizontal line is void of any ECA rules. Points in that region would correspond to values of the average TE that are both high for the simplest input and capable of high variation, for example, CA rules that can annihilate the complexity of the input, which we do not observe. This feature yields a distinctive L-shape in the distribution of rules, with the rules in each region sharing well-defined properties.

As a result, a natural, information-based classification of ECAs into three classes can be given as follows. (i) *First class*: the average TE is very small for the simplest input and stays so for complex inputs. This is the most populated class, including almost all Wolfram Classes I and II rules, as well as Class III rules 18 and 146 and their equivalent rules. (ii) *Second class*: the average TE is small for the simplest input, but it experiences a drastic change (one order of magnitude or more) when the input is complex. This is the case for many Wolfram Class II and some Class IV rules (e.g., 54 and 106 and their equivalent rules). (iii) *Third class*: the average TE has a high value for the simplest input, and this value is approximately unaffected by a change in the input. Most Wolfram Class III rules belong to this class, as well as Class IV rule 110 and its equivalent rules.
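The three-way split can be expressed as a simple decision rule on the two quantities plotted in Figure 3. The sketch below is illustrative only: the function name, the integer labels 1 to 3 (used as placeholders that follow the order in which the classes are listed above), and the convention of expressing the maximum change as a multiplicative factor are assumptions made here. The TE threshold must be supplied by the caller, since it corresponds to the highest single-cell value found among Wolfram Classes I and II.

```python
def information_class(te_single, max_change_factor, te_threshold,
                      change_threshold=10.0):
    """Assign a placeholder class label (1, 2, or 3) to a rule.

    te_single         -- average TE for the single-cell input
    max_change_factor -- largest factor by which the average TE changes
                         across the sampled inputs (assumed convention)
    te_threshold      -- boundary between low and high average TE
    change_threshold  -- one order of magnitude by default
    """
    high_te = te_single > te_threshold
    high_change = max_change_factor >= change_threshold
    if high_te and high_change:
        # Region reported as empty in the text; flag it rather than guess.
        return None
    if high_te:
        return 3  # high for the simplest input, roughly unaffected by input
    return 2 if high_change else 1
```

Returning `None` for the upper-right region keeps the sketch honest about the observation that no ECA rule falls there, instead of forcing such a point into one of the three classes.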

Our classification is summarized in Table 2. Despite the arbitrary placement of our boundary lines at the border of the region, it shows an unambiguous separation between the vast majority of rules classified as and .

Randomly sampling inputs leads to a bias favoring nearly fifty-fifty distributions of black and white cells, due to their binomial distribution. We therefore also verified this classification using a different distribution of inputs, where the number of black cells is increased in a regular way from 2 to 50, while keeping the specific positions of black cells in the input array random. We considered 20 different inputs, each containing exactly , and 50 black cells (higher numbers are not considered due to the conjugation). Apart from minor shifts of the data points, applying the same procedure as above yields exactly the same classification as Figure 3, indicating that our classification scheme is robust.

We stress the importance of considering a *large* system (of order 100 cells) as opposed to a much smaller one. While for the latter a scan over the entire space of inputs is computationally feasible (and indeed we performed these experiments), it hides one of the main features enabling information-based classification: the existence of the third class of rules, which form the most stable class with respect to our TE-based complexity measure. Rules in this class produce time series largely independent of the initial state, exemplified by rule 30 in Figure 3, a feature evident only in larger systems. Being the result of combinatorial calculus, TE computational time grows exponentially with the number of cells considered. An ECA array of about 100 cells is both the smallest size that we could safely consider much larger than that of the domains of Figure 3 and one for which a complete scan over the rule space and a large enough ensemble of inputs could be made. We also considered individual runs with 200 and 400 cells and noticed that our numerical results are stable for CA with 100 or more cells but fluctuate for CA with 50 or fewer cells.

The typical, order 10, cell length of the domains appearing in Figure 3 is also the reason why the history length adopted here is optimal. Smaller and larger values of the history length would not be able to resolve this lattice-size-independent feature of the rules in the third class [32] (Figure 4).

#### 4. Coarse-Graining and the Information Hierarchy

Perhaps the most interesting feature of our quantitative classification is that Wolfram Class III and Class IV rules are distributed over different information-based classes. This behavior looks less surprising in the light of the coarse-graining transitions among ECA uncovered by Israeli and Goldenfeld [1]. One important aspect of the physical world is that coarse-grained descriptions often suffice to make predictions about relevant behavior. Noting this, Israeli and Goldenfeld adopted a systematic procedure to try to coarse grain ECA rules. Their prescription consists in grouping together nearby cells in the CA array into a supercell of a given size , according to some specified supercell single-cell correspondence (for Boolean CAs, possible applications of this kind exist), and a given time constant . The search for a coarse-graining rule consists in looking for a new CA rule such that running the original CA for time steps, and then grouping the supercells, produces the same output as grouping the initial array and then adopting the new CA rule for time steps. The new CA rule, the time constant , and the size of the supercell are all unknown; therefore, this search requires a scan over all the possible combinations of these parameters. Israeli and Goldenfeld successfully coarse-grained 240 of the 256 ECA rules, many to other ECA rules, performing a complete scan of all the possibilities compatible with . Importantly, the rule complexity was never observed to increase under coarse-graining, introducing a partial ordering among CA rules.
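The commutation condition behind coarse-graining can be made concrete with a toy check. The example below is chosen here for illustration and is not taken from [1]: it uses rule 128, supercells of two cells projected with a logical AND, and two fine time steps per coarse step. The names `project_and` and `commutes` are hypothetical helpers.

```python
import random


def eca_step(state, rule):
    """One synchronous ECA update with periodic boundaries (Wolfram numbering)."""
    n = len(state)
    return [
        (rule >> (4 * state[(i - 1) % n] + 2 * state[i] + state[(i + 1) % n])) & 1
        for i in range(n)
    ]


def project_and(state, size):
    """Map each supercell of `size` cells to 1 only if all of its cells are 1."""
    return [int(all(state[i:i + size])) for i in range(0, len(state), size)]


def commutes(rule_fine, rule_coarse, size, fine_steps, state):
    """Check 'evolve then project' against 'project then evolve' for one state."""
    fine = list(state)
    for _ in range(fine_steps):
        fine = eca_step(fine, rule_fine)
    coarse = eca_step(project_and(state, size), rule_coarse)
    return project_and(fine, size) == coarse


rng = random.Random(1)
states = [[rng.randint(0, 1) for _ in range(20)] for _ in range(200)]

# Rule 128 maps to itself under this projection with two fine steps per coarse step.
always_commutes = all(commutes(128, 128, 2, 2, s) for s in states)

# With a single fine step the condition fails for some inputs.
counterexample = [0, 1, 1, 1, 1, 0, 0, 0]
fails_with_one_step = not commutes(128, 128, 2, 1, counterexample)
```

A full search of the kind described above would repeat such a check over all candidate coarse rules, projections, supercell sizes, and time constants, and over all relevant inputs rather than a random sample.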

The same ordering emerges from our information-based classification, as shown in Figure 5, where arrows indicate coarse-graining transitions uncovered in [1]. These transitions introduce a fully ordered hierarchy such that coarse-graining is never observed to move up the hierarchy, and the vast majority of rules may only undergo coarse-graining transitions within the *same information class*. This ordering is sometimes expected, as in the case of many rules whose wave-like patterns are coarse-grained to either self-similar, or very similar, patterns. Other times it looks more profound, as in the case of Wolfram Class III rule 146. Rule 146 is in the first information class, and it can be nontrivially coarse-grained to rule 128 (a Wolfram Class I rule), due to a shared sensitivity of the information processed in a given space-time patch to its input state. As noted in [1], the fact that potentially computationally irreducible rules [8] like elementary rule 146 (a conclusive proof of the computational irreducibility of ECA rule 146 is still missing) can be coarse-grained to predictable dynamics (e.g., rule 128) shows that computational irreducibility might lack the characteristics of a good measure of physical complexity. We are led to conclude that the coarse-graining hierarchy is more likely defined by conserved informational properties, with more complex rules by Wolfram’s classification appearing lower in the hierarchy if they can be coarse-grained to less complex ones with common sensitivity to the input state.

#### 5. Conclusions

Physical systems evolve in time according to local, uniform rules but can nonetheless exhibit complex, emergent behavior. Complexity arises as a result of the initial state, the rule, or some combination of the two. The ambiguity of determining whether complexity is generated by rules or states has confounded previous efforts to isolate complexity *intrinsic* to a given rule. In this work, we introduced a quantitative, information-based classification of ECA. The classification scheme proposed circumvents the difficulties arising from the sensitivity of the expressed complexity of ECA rules to their initial state. The (averaged) directed exchange of information (TE) between the individual parts of an ECA is used as a measure of its complexity. The identification of the single-cell input (Section 2) as the nontrivial state with the least complexity is assumed as a working hypothesis and provides a reference point for our analysis of the degree of variability of the complexity of ECA rules for varying inputs. We identified three distinct classes based on our analysis, which vary in their sensitivity to initial conditions. ECA in the first class always process little information, those in the third class always process a large amount of information, and those in the second class can process little or much depending on the input. It is only for the third class that the expressed complexity is intrinsic and not a product of the complexity carried by the input.

The most complex rules by our analysis are in the third information class, which includes the majority of Wolfram’s Class III rules and Class IV rule 110 and its equivalent rules. These rules form a closed group under the coarse-graining transitions found in [1]. The truly complex rules are those that remain complex even at a macrolevel of description, with behavior that is not sensitive to the initial state.

We are currently investigating the possibility of extending our measure to less elementary dynamical systems, as well as to idealized biological systems (e.g., networks of regulatory genes). Inspired by Nurse’s idea [35] that the optimization of information storage and flow might represent the distinctive feature of biological functions, we find it interesting that information dynamics identifies the most complex ECA dynamics with the ones that least depend on their initial state. This feature, evocative of the redundancy so ubiquitous in biology, might represent a link between biology and information dynamics [36] that we consider worthy of further study.

#### Disclosure

The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of Templeton World Charity Foundation.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This publication was made possible through support of a grant from Templeton World Charity Foundation. The authors thank Douglas Moore and Hector Zenil for their helpful comments.