Complexity

Volume 2017, Article ID 7208216, 10 pages

https://doi.org/10.1155/2017/7208216

## A Computable Measure of Algorithmic Probability by Finite Approximations with an Application to Integer Sequences

Fernando Soler-Toscano^{1,2} and Hector Zenil^{2,3,4}

^{1}Grupo de Lógica, Lenguaje e Información, Universidad de Sevilla, Sevilla, Spain
^{2}Algorithmic Nature Group, LABORES, Paris, France
^{3}Algorithmic Dynamics Lab, Center for Molecular Medicine, Science for Life Laboratory (SciLifeLab), Department of Medicine, Solna, Karolinska Institute, Stockholm, Sweden
^{4}Group of Structural Biology, Department of Computer Science, University of Oxford, Oxford, UK

Correspondence should be addressed to Hector Zenil; hzenilc@gmail.com

Received 19 February 2017; Revised 22 June 2017; Accepted 7 August 2017; Published 21 December 2017

Academic Editor: Giacomo Innocenti

Copyright © 2017 Fernando Soler-Toscano and Hector Zenil. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Given the widespread use of lossless compression algorithms to approximate algorithmic (Kolmogorov-Chaitin) complexity, and given that generic lossless compression algorithms usually fall short at characterizing features other than statistical ones, not different from entropy evaluations, here we explore an alternative and complementary approach. We study formal properties of a Levin-inspired measure $m$ calculated from the output distribution of small Turing machines. We introduce and justify finite approximations $m_k$ that have been used in some applications as an alternative to lossless compression algorithms for approximating algorithmic (Kolmogorov-Chaitin) complexity. We provide proofs of the relevant properties of both $m$ and $m_k$ and compare them to Levin's Universal Distribution. We provide error estimations of $m_k$ with respect to $m$. Finally, we present an application to integer sequences from the On-Line Encyclopedia of Integer Sequences, which suggests that our AP-based measures may characterize nonstatistical patterns, and we report interesting correlations with textual, function, and program description lengths of the said sequences.

#### 1. Algorithmic Information Measures

Central to Algorithmic Information Theory is the definition of algorithmic (Kolmogorov-Chaitin or program-size) complexity [1, 2]: $K_U(x) = \min\{|p| : U(p) = x\}$, where $p$ is a program that outputs $x$ running on a universal Turing machine $U$ and $|p|$ is the length of $p$ in bits. The measure was first conceived to define randomness and is today the accepted objective mathematical measure of randomness, among other reasons because it has been proven to be mathematically robust [3]. In the following, we use $K(x)$ instead of $K_U(x)$ because the choice of $U$ is only relevant up to an additive constant (invariance theorem). A technical inconvenience of $K$ as a function taking $x$ to the length of the shortest program that produces $x$ is its uncomputability. In other words, there is no program that takes a string $x$ as input and produces the integer $K(x)$ as output. This is usually considered a major problem, but one ought to expect a universal measure of randomness to have such a property.

In previous papers [4, 5], we have introduced a novel method to approximate $K$ based on the seminal concept of algorithmic probability (AP), introduced by Solomonoff [6] and further formalized by Levin [3], who proposed the concept of uncomputable semimeasures and the so-called Universal Distribution.

Levin’s semimeasure (it is called a semimeasure because, unlike a probability measure, the sum over all strings is never 1; this is due to the Turing machines that never halt) defines the so-called Universal Distribution [7]: $\mathfrak{m}_U(x) = \sum_{p\,:\,U(p)=x} 2^{-|p|}$, with the value $\mathfrak{m}_U(x)$ being the probability that a random program halts and produces $x$ running on a universal Turing machine $U$. The choice of $U$ is only relevant up to a multiplicative constant, so we will simply write $\mathfrak{m}(x)$ instead of $\mathfrak{m}_U(x)$.

It is possible to use $\mathfrak{m}(x)$ to approximate $K(x)$ by means of the following theorem.

Theorem 1 (algorithmic coding theorem [3]). *There is a constant $c$ such that* $$\left|-\log_2 \mathfrak{m}(x) - K(x)\right| < c.$$

This implies that if a string $x$ has many descriptions (a high value of $\mathfrak{m}(x)$, as the string is produced many times, which implies a low value of $-\log_2 \mathfrak{m}(x)$), it also has a short description (a low value of $K(x)$). This is because the most frequent strings produced by programs of length $n$ are those which were already produced by programs of length $n-1$, as extra bits can produce redundancy in an exponential number of ways. On the other hand, strings produced by programs of length $n$ which could not be produced by programs of length $n-1$ are less frequently produced by programs of length $n$, as only very specific programs can generate them (see [8]). This theorem elegantly connects probability to complexity: the frequency (or probability) of occurrence of a string with its algorithmic (Kolmogorov-Chaitin) complexity. It implies [4] that one can calculate the Kolmogorov complexity of a string from its frequency, simply rewriting the formula as $K(x) = -\log_2 \mathfrak{m}(x) + O(1)$. Thanks to this elegant connection between algorithmic complexity and probability, our method can attempt to approximate an algorithmic probability measure by means of finite approximations using a fixed model of computation. The method is called the Coding Theorem Method (CTM) [5].
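Spelling out this rewriting, which follows directly from the two-sided bound in Theorem 1:

```latex
% Theorem 1 gives, for every string x and a fixed constant c,
%   -c \;<\; -\log_2 \mathfrak{m}(x) - K(x) \;<\; c,
% so the difference is bounded by a constant and
K(x) = -\log_2 \mathfrak{m}(x) + O(1),
\qquad\text{equivalently}\qquad
\mathfrak{m}(x) = 2^{-K(x) + O(1)}.
```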

In this paper, we introduce $m$, a computable approximation to $\mathfrak{m}$, which can be used to approximate $K$ by means of the algorithmic coding theorem. Computing $m$ requires the output of a countably infinite number of Turing machines, so we first undertake the investigation of finite approximations $m_k$ that require only the output of machines with up to $k$ states. A key property of $\mathfrak{m}$ and $K$ is their universality: the choice of the Turing machine used to compute the distribution is only relevant up to a constant, independent of the objects. The computability of our measure implies its lack of universality. The same is true when using common lossless compression algorithms to approximate $K$, but on top of their nonuniversality in the algorithmic sense, they are block entropy estimators, as they traverse files in search of repeated patterns within a fixed-length window in order to build a replacement dictionary. Nevertheless, this does not prevent lossless compression algorithms from finding useful applications, in the same way that more algorithmically motivated measures can contribute even if they are also limited. Indeed, this approach has found successful applications in the cognitive sciences [9–13], in financial time series research [14], and in graph theory and networks [15–17]. However, a thorough investigation exploring the properties of these measures and providing theoretical error estimations was missing.

We start by presenting our Turing machine formalism (Section 2) and then show that it can be used to encode a prefix-free set of programs (Section 3). Then, in Section 4, we define $m$, a computable algorithmic probability measure based on our Turing machine formalism, and prove its main properties, both for $m$ and for the finite approximations $m_k$. In Section 5, we compute $m_5$, compare it with our previous distribution [5], and estimate the error in $m_5$ as an approximation to $m$. We finish with some comments in Section 7.

#### 2. The Turing Machine Formalism

We denote by $(n,2)$ the class (or space) of all $n$-state, 2-symbol Turing machines (with the halting state not included among the $n$ states) following the Busy Beaver Turing machine formalism as defined by Radó [18]. Busy Beaver Turing machines are deterministic machines with a single head and a single tape unbounded in both directions. When the machine enters the halting state, the head no longer moves and the output is considered to comprise only the cells visited by the head prior to halting. Formally, we have the following definition.

*Definition 2 (Turing machine formalism). *We designate as $(n,2)$ the set of Turing machines with two symbols $\{0,1\}$ and $n$ states $\{1,\ldots,n\}$ plus a halting state $0$. These machines have $2n$ entries (for $s\in\{1,\ldots,n\}$ and $k\in\{0,1\}$) in the transition table, each with one instruction that determines their behavior. Such entries are represented by $\{s,k\}\to\{s',k',d\}$, where $s$ and $k$ are, respectively, the current state and the symbol being read, and $\{s',k',d\}$ represents the instruction to be executed: $s'$ is the new state, $k'$ is the symbol to write, and $d$ is the direction. If $s'$ is the halting state $0$, then $d=0$; otherwise $d$ is $1$ (right) or $-1$ (left).

Proposition 3. *Machines in $(n,2)$ can be enumerated from $0$ to $(4n+2)^{2n}-1$.*

*Proof. *Given the constraints in Definition 2, for each transition of a Turing machine in $(n,2)$, there are $4n+2$ different instructions $\{s',k',d\}$. These are $2$ instructions when $s'=0$ (given that $d=0$ is fixed and $k'$ can be one of the two possible symbols) and $4n$ instructions if $s'\neq 0$ ($2$ possible moves, $n$ states, and $2$ symbols). Then, considering the $2n$ entries in the transition table, $|(n,2)| = (4n+2)^{2n}$. These machines can be enumerated from $0$ to $(4n+2)^{2n}-1$. Several enumerations are possible. We can, for example, use a lexicographic ordering on the transitions.
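The count in the proof can be checked with a one-line function (the function name is ours):

```python
def num_machines(n):
    """Size of the class (n,2): each of the 2n transition-table entries
    admits 4n+2 instructions (2 halting ones plus n*2*2 non-halting ones)."""
    return (4 * n + 2) ** (2 * n)

# The class grows very quickly with the number of states:
for n in range(1, 6):
    print(n, num_machines(n))
```

For instance, there are $36$ machines in $(1,2)$ and $10{,}000$ in $(2,2)$.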

For the current paper, consider that some enumeration has been chosen. Thus, we use $\tau^n_t$ to denote the machine number $t$ in $(n,2)$ following that enumeration.
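One concrete enumeration consistent with the proof of Proposition 3 (an illustrative choice; the paper only requires that some enumeration be fixed) writes $t$ in base $4n+2$ and reads each digit as the instruction for one transition-table entry:

```python
def machine(t, n):
    """Decode machine number t in [0, (4n+2)^(2n)) into a transition table
    mapping (state, symbol) -> (new_state, write, move), with move 1 for
    right, -1 for left, and 0 when new_state is the halting state 0."""
    base = 4 * n + 2
    table = {}
    for state in range(1, n + 1):
        for symbol in (0, 1):
            digit, t = t % base, t // base
            if digit < 2:                 # 2 halting instructions: write digit
                table[(state, symbol)] = (0, digit, 0)
            else:                          # 4n non-halting instructions
                d = digit - 2
                move = 1 if d % 2 == 0 else -1
                table[(state, symbol)] = (d // 4 + 1, (d // 2) % 2, move)
    return table
```

Under this enumeration, machine number $0$ maps every entry to the halting instruction that writes $0$.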

#### 3. Turing Machines as a Prefix-Free Set of Programs

We show in this section that the set of Turing machines following the Busy Beaver formalism can be encoded as a prefix-free set of programs capable of generating any finite nonempty binary string.

*Definition 4 (execution of a Turing machine). *Let $\tau\in(n,2)$ be a Turing machine. We denote by $\tau(i)$ the execution of $\tau$ over an infinite tape filled with $i$ (a blank symbol), where $i\in\{0,1\}$. We write $\tau(i)\downarrow$ if $\tau(i)$ halts and $\tau(i)\uparrow$ otherwise. We write $\tau(i)\downarrow x$ if (i) $\tau(i)\downarrow$ and (ii) $x$ is the output string of $\tau(i)$, defined as the concatenation of the symbols in the cells of the tape which were visited at some instant of the execution $\tau(i)$.

As Definition 4 establishes, we are only considering machines running over a blank tape with no input. Observe that the output of $\tau(i)$ considers the symbols in all cells of the tape visited by $\tau$ during the computation, so the output contains the entire fragment of the tape that was used. To produce a symmetrical set of strings, we consider both symbols $0$ and $1$ as possible blank symbols.
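Definition 4 can be sketched in code as follows (the naming is ours; the step cap is a practical assumption, since halting is undecidable in general, and in practice known Busy Beaver values can bound the runtime for small $n$):

```python
def run(table, blank, max_steps=1000):
    """Execute tau(i): run a machine on a tape filled with `blank`,
    starting in state 1 at cell 0. Return the output string (the symbols
    in all visited cells) if it halts within max_steps, else None."""
    tape, pos, state = {}, 0, 1
    visited = {0}
    for _ in range(max_steps):
        new_state, write, move = table[(state, tape.get(pos, blank))]
        tape[pos] = write
        if new_state == 0:               # halting state: head does not move
            cells = range(min(visited), max(visited) + 1)
            return ''.join(str(tape.get(c, blank)) for c in cells)
        pos += move
        visited.add(pos)
        state = new_state
    return None

# A hand-written 1-state machine: on blank 0 it writes 1 and halts.
tiny = {(1, 0): (0, 1, 0), (1, 1): (0, 0, 0)}
print(run(tiny, 0))   # -> "1"
print(run(tiny, 1))   # -> "0" (reading blank 1, it writes 0 and halts)
```

Since the head moves one cell at a time, the visited cells always form a contiguous interval, which is why the output can be read off between the leftmost and rightmost visited positions.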

*Definition 5 (program). *A program is a triplet $\langle n,i,t\rangle$, where (i) $n$ is a natural number, (ii) $i\in\{0,1\}$, and (iii) $0\le t<(4n+2)^{2n}$. We say that the output of $\langle n,i,t\rangle$ is $x$ if and only if $\tau^n_t(i)\downarrow x$.

Programs can be executed by a universal Turing machine that reads a binary encoding of $\langle n,i,t\rangle$ (Definition 6) and simulates $\tau^n_t(i)$. Trivially, for each finite binary string $x$ with length $|x|$, there is a program $\langle |x|,i,t\rangle$ that outputs $x$ (for example, one using a dedicated state to write each symbol of $x$).

Now that we have a formal definition of programs, we show that the set of valid programs can be represented as a prefix-free set of binary strings.

*Definition 6 (binary encoding of a program). *Let $p=\langle n,i,t\rangle$ be a program (Definition 5). The binary encoding of $p$ is a binary string with the following sequence of bits: (i) first, $0^{n-1}1$, that is, $n-1$ repetitions of $0$ followed by $1$; this way we encode $n$; (ii) second, a bit with value $i$ encodes the blank symbol; (iii) finally, $t$ is encoded using $\lceil\log_2 (4n+2)^{2n}\rceil$ bits.

The use of $\lceil\log_2 (4n+2)^{2n}\rceil$ bits to represent $t$ ensures that all programs with the same $n$ are represented by strings of equal size. As there are $(4n+2)^{2n}$ machines in $(n,2)$, with these bits we can represent any value of $t$. The process of reading the binary encoding of a program $\langle n,i,t\rangle$ and simulating $\tau^n_t(i)$ is computable, given the enumeration of Turing machines.

As an example, consider programs with $n=2$ states: the binary encoding of $\langle 2,i,t\rangle$ consists of $01$ (encoding $n=2$), followed by the bit $i$, followed by $t$ written using $\lceil\log_2 10^4\rceil = 14$ bits, for a total length of $17$ bits.
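Definition 6 can be sketched as follows (the function name is ours). Prefix-freeness can be seen directly: encodings sharing the same $n$ have equal length, while encodings with different $n$ already differ inside the unary prefix $0^{n-1}1$:

```python
from math import ceil, log2

def encode(n, i, t):
    """Binary encoding of the program <n, i, t> per Definition 6:
    unary code for n, one bit for the blank symbol i, then t in
    ceil(log2((4n+2)^(2n))) bits (fixed width for a given n)."""
    assert n >= 1 and i in (0, 1) and 0 <= t < (4 * n + 2) ** (2 * n)
    width = ceil(log2((4 * n + 2) ** (2 * n)))
    return '0' * (n - 1) + '1' + str(i) + format(t, '0%db' % width)

print(encode(2, 0, 185))   # '01' + '0' + 185 written in 14 bits
```

For instance, every program with $n=1$ is encoded in $1+1+\lceil\log_2 36\rceil = 8$ bits, and every program with $n=2$ in $17$ bits.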