A method for analyzing sequential data sets, similar to the permutation entropy one, is discussed. The characteristic features of this method are as follows: it preserves information about equal values, if any, in the embedding vectors; it is exempt from combinatorics; and it delivers the same entropy value as does the permutation method, provided the embedding vectors do not have equal components. In the latter case, this method can be used instead of the permutation one. If embedding vectors have equal components, this method could be more precise in discriminating between similar data sets.

1. Introduction

Because of technical progress in sensors and storage devices, a huge amount of raw data on the time course of different processes, such as ECG, EEG, climate recordings, and stock market data, has become available. These data are redundant. Data processing and classification, aimed at extracting characteristics that are meaningful to a nonspecialist, is based on reducing this redundancy. As a result, new data are obtained that are small in size and digestible by a human being. Examples of such reduced data for time series are the mean value, variance, Lyapunov exponents, correlation dimension, and attractor dimension.

A remarkable method suitable for reducing redundancy in time series, known as permutation entropy, has been proposed by Bandt and Pompe in [1]. This method is simple and transparent, is robust with respect to monotonic distortions of the raw data, and is suitable for estimating the dynamical complexity of the underlying dynamical process. Many interesting results, e.g. [2–5], have been obtained with straightforward application of the permutation entropy methodology in its initial form, as described in [1]. Nevertheless, the method has been criticized for not taking into account absolute values of the raw data and for not properly treating the possibility of equal values in the embedding vector (ties) [6, 7]. In this connection, it should be taken into account that any redundancy reduction method leaves out some types of information, which may be useless for one process or task and useful for another. In the latter case, the bare idea in [1] about how to treat equal values can and should be modified to suit the concrete situation. Examples of such modifications can be found in [8, 9] for taking into account absolute values and in [10, 11] for treating equal values. An interesting modification of the permutation entropy method has been proposed in [12] for 3-tuple EEG data.

In the standard permutation entropy methodology, it is preferable that embedding vectors have all their components different. Otherwise, they cannot be plainly symbolized by a permutation without using additional rules, which actually treat equal values as if they were not equal. Equal values in the embedding vector may arise for a high embedding dimension, for crude quantization of measured data, for very long data sequences, and when the observed dynamical system intrinsically has only a small number of possible outputs.

This note is aimed at discussing a slightly different symbolization technique for embedding vectors, which does not refer to combinatorics and which is capable of preserving information about equal values in embedding vectors. Instead of a permutation, an embedding vector is represented by a single integer written in base $D$, where $D$ is the embedding dimension. In the case of no ties (no equal components in the embedding vectors), the technique is equivalent to the standard permutation entropy methodology. In the opposite case, it may discriminate between similar data sets better than the permutation entropy method does.

2. Permutation Entropy

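Consider a finite sequence

$$X = \{x_0, x_1, \ldots, x_{N-1}\} \tag{1}$$

of $N$ measurements. By choosing the embedding dimension $D$, the data (1) can be embedded into a $D$-dimensional space by picking out consecutive $D$-tuples from $X$. As a result, a set of $D$-dimensional embedding vectors is obtained:

$$\mathcal{V} = \{V_0, V_1, \ldots, V_{N-D}\}, \tag{2}$$

where each vector has the following form:

$$V_i = \{x_i, x_{i+1}, \ldots, x_{i+D-1}\}. \tag{3}$$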

An additional parameter of the embedding procedure is the delay $\tau$. In the above definition, we put $\tau = 1$ for simplicity. With $\tau > 1$ one would have $V_i = \{x_i, x_{i+\tau}, \ldots, x_{i+(D-1)\tau}\}$ instead of (3).
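The embedding step can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the names `embedding_vector` and `embedding_count` are hypothetical, indices are zero-based, and the raw data are held in a `std::vector<double>`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Embedding vector that starts at x[i]: components x[i], x[i+tau], ...,
// x[i+(D-1)*tau]. All names here are illustrative.
std::vector<double> embedding_vector(const std::vector<double>& x,
                                     std::size_t i, std::size_t D,
                                     std::size_t tau) {
    assert(D > 0 && tau > 0);
    assert(i + (D - 1) * tau < x.size());  // the last component must exist
    std::vector<double> v(D);
    for (std::size_t k = 0; k < D; ++k) v[k] = x[i + k * tau];
    return v;
}

// Number of embedding vectors that fit into a sequence of length N.
std::size_t embedding_count(std::size_t N, std::size_t D, std::size_t tau) {
    assert(D > 0 && tau > 0);
    const std::size_t span = (D - 1) * tau;
    return N > span ? N - span : 0;
}
```

With delay 1 the count reduces to the number of sliding windows of length D that fit into the sequence.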

The data represented in (2) and/or (3) are even more redundant than those represented in (1), since most data values from (1) are represented in (3) $D$ times. In the permutation entropy technique [1], each embedding vector from (2) and/or (3) is replaced with a permutation of integers $\{0, 1, 2, \ldots, D-1\}$, which is defined by the order pattern of values composing the vector. For any embedding vector $V_i$, the permutation $\pi(V_i)$, which symbolizes it, is calculated as follows. Arrange all components of $V_i$ either in the descending [13]:

$$x_{i+j_0} \ge x_{i+j_1} \ge \cdots \ge x_{i+j_{D-1}} \tag{4}$$

or in the ascending [11, 14]:

$$x_{i+j_0} \le x_{i+j_1} \le \cdots \le x_{i+j_{D-1}} \tag{5}$$

order, keeping their subscripts unchanged. (Actually, equal values (ties) are admitted in (4) and (5) as well. Here, we exclude such a possibility for the sake of clarity. Equal values are discussed in the next section.) The permutation which corresponds to $V_i$ is obtained as the row of the subscripts in the rearranged vector from either (4) or (5):

$$\pi(V_i) = (j_0, j_1, \ldots, j_{D-1}). \tag{6}$$

From the set of embedding vectors $\mathcal{V}$, calculate a new set $\Pi$ of order patterns by replacing each vector in (2) by the corresponding permutation:

$$\Pi = \{\pi(V_0), \pi(V_1), \ldots, \pi(V_{N-D})\}. \tag{7}$$

Now, the empirical probability $p(\pi)$ of each permutation $\pi$ can be obtained by dividing the number of occurrences of $\pi$ in $\Pi$ by the total number of elements in $\Pi$. The permutation entropy of $X$ is the Shannon entropy of the probability distribution $p$:

$$H_P = -\sum_{k=1}^{M} p(\pi_k) \log p(\pi_k), \tag{8}$$

where $M$ is the number of different permutations in $\Pi$.
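The procedure of this section can be sketched as follows. This is a minimal illustration, not the program used for the examples later in the text. It assumes convention (5), the natural logarithm, and that ties, if present, are resolved by `std::stable_sort` (smaller subscript first), which is only one of the possible tie rules. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <numeric>
#include <vector>

// Order pattern of the embedding vector starting at x[i]: the subscripts
// 0..D-1 arranged by ascending value, as in (5) and (6).
std::vector<int> permutation_pattern(const std::vector<double>& x,
                                     std::size_t i, int D, int tau) {
    assert(D > 0 && tau > 0 && i + (D - 1) * tau < x.size());
    std::vector<int> j(D);
    std::iota(j.begin(), j.end(), 0);
    // Assumed tie rule: the stable sort keeps the smaller subscript first.
    std::stable_sort(j.begin(), j.end(), [&](int a, int b) {
        return x[i + a * tau] < x[i + b * tau];
    });
    return j;
}

// Shannon entropy (natural logarithm) of the empirical pattern distribution.
double permutation_entropy(const std::vector<double>& x, int D, int tau) {
    assert(D > 0 && tau > 0);
    const std::size_t span = static_cast<std::size_t>(D - 1) * tau;
    if (x.size() <= span) return 0.0;  // no embedding vector fits
    const std::size_t total = x.size() - span;
    std::map<std::vector<int>, std::size_t> counts;
    for (std::size_t i = 0; i < total; ++i)
        ++counts[permutation_pattern(x, i, D, tau)];
    double H = 0.0;
    for (const auto& kv : counts) {
        const double p = static_cast<double>(kv.second) / total;
        H -= p * std::log(p);
    }
    return H;
}
```

A strictly increasing sequence produces a single pattern and therefore zero entropy, while a sequence alternating between two values produces two equally probable patterns for D = 2.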

2.1. Treatment of Equal Values

Equal values in an embedding vector are, to an extent, inconvenient. Indeed, if $x_{i+k} = x_{i+l}$ for some $k \ne l$ in a vector $V_i$, then $k$ and $l$ should be placed side by side in the permutation (6), but which one should go first? Due to the sameness of values, it is impossible to uniquely determine a corresponding permutation without introducing additional rules. In some cases, the possibility of equal values can be ignored due to their low probability. This is reasonable when the embedding dimension is low and/or data from a chaotic process are recorded with high precision [1, 15, 16]. If equal values are inevitable, the following rule is applied (in some cases, e.g. [10, 11], the opposite inequality sign is used here):

$$\text{if } x_{i+j_k} = x_{i+j_{k+1}}, \text{ then } j_k > j_{k+1}. \tag{9}$$

The rule (9) has different meaning depending on whether (4) or (5) convention is used. Namely, in the case of (4), an embedding vector with all components equal will be equivalent to a vector with monotonically ascending components. If (5) is adopted, then that same vector will be equivalent to a vector with monotonically descending components (Figure 1).

Without knowing a real system, it is not clear which case is better and whether it is good or bad to label a sequence of same values as being decreasing or increasing. Actually, the permutation symbolization technique aims at reducing redundancy. Discrimination between constant and either increasing or decreasing sequences of data may appear to be excessive in some cases. On the contrary, when a system, which generates data, has a few possible outputs, the data were subjected to a crude quantization, or embedding dimension is large, it may happen to be useful if the presence of equal values in the embedding vector results in the order pattern preserving this fact. One possible approach to do this is discussed in the next section.

3. Arithmetic Entropy

3.1. Symbolization

The following symbolization is aimed at keeping information about equal values in embedding vectors. Having a vector $V_i$, construct a sequence of integers $\alpha(V_i)$:

$$\alpha(V_i) = (\alpha_0, \alpha_1, \ldots, \alpha_{D-1}) \tag{10}$$

by using the following rule: find the smallest component, $m_0$, in $V_i$. If $m_0$ is found at places $k_1, k_2, \ldots$, put number 0 at those places in $\alpha(V_i)$. Find the next smallest component, $m_1$, in $V_i$. If $m_1$ is found at places $l_1, l_2, \ldots$, put number 1 at those places in $\alpha(V_i)$. Proceed this way until the components of $V_i$ are exhausted. At this stage, all components of $\alpha(V_i)$ will be determined. The sequence $\alpha(V_i)$ obtained this way is used as a symbol of the embedding vector $V_i$.

For example, consider . The corresponding symbol, or the order pattern, is . Here, information about equal values and their positions is preserved.
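As a separate illustration of this rule (not the example from the text above), the following sketch ranks each component among the distinct values of the vector. The name `arithmetic_pattern` and the sample vector mentioned after the code are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Symbol of a vector: each component is replaced by its rank among the
// distinct values of the vector, so equal components get equal digits.
std::vector<int> arithmetic_pattern(const std::vector<double>& v) {
    assert(!v.empty());
    std::vector<double> u(v);                           // copy of the components
    std::sort(u.begin(), u.end());                      // ascending order
    u.erase(std::unique(u.begin(), u.end()), u.end());  // distinct values only
    std::vector<int> alpha(v.size());
    for (std::size_t k = 0; k < v.size(); ++k)
        alpha[k] = static_cast<int>(
            std::lower_bound(u.begin(), u.end(), v[k]) - u.begin());
    return alpha;
}
```

For the illustrative vector (5, 2, 5, 7), the smallest value 2 receives 0, both occurrences of 5 receive 1, and 7 receives 2, so the symbol is (1, 0, 1, 2).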

If $V$ has no equal components, it can be proven (Appendix A) that $\alpha(V) = \pi(V)^{-1}$. This means that $\alpha(V)$ is the inverse of the permutation obtained for $V$ if convention (5) is used. Since the correspondence between permutations and their inverses is one-to-one, it does not matter which one, $\pi$ or $\pi^{-1}$, is used for calculating entropy. This further means that for a data set and embedding method which do not deliver equal values in the embedding vectors, the symbolization used here is equivalent to the permutation one while calculating entropy. (It seems that in paper [5], the symbolization method described here is used. But, as may be concluded from [5] (equation (6)), the issue of equal values is not addressed. A similar approach is used in [12, 17], again without considering equal values.)

3.2. Arithmetization

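Suppose that the embedding vector $V$ in (10) has exactly $d$ unique components, where $1 \le d \le D$. In this case, the corresponding symbol $\alpha(V)$ will be a sequence of $D$ numbers chosen from the set $\{0, 1, \ldots, d-1\}$ in such a way that no element of this set is missed. The latter can be formulated as the following condition:

$$\{\alpha_0, \alpha_1, \ldots, \alpha_{D-1}\} = \{0, 1, \ldots, d-1\}. \tag{11}$$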

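The sequence $\alpha(V)$ can be considered as a single integer $n_\alpha$, written in a base-$D$ positional numeral system with digits $\alpha_0, \alpha_1, \ldots, \alpha_{D-1}$:

$$n_\alpha = \sum_{k=0}^{D-1} \alpha_k D^k. \tag{12}$$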

(For a single embedding vector, $d$ might be chosen as the radix instead of $D$. But $d$ may be different for different vectors. And the same integer may have different representations for different bases with (11) satisfied, e.g., $0112_3 = 1110_2$.) It is clear that there is a one-to-one correspondence between order patterns and integers obtained as shown in (12). Therefore, a set of order patterns, constructed as described in Section 3.1, can be replaced with a set of integers obtained as shown in (12):

$$\mathcal{N} = \{n_{\alpha(V_0)}, n_{\alpha(V_1)}, \ldots, n_{\alpha(V_{N-D})}\}. \tag{13}$$

The empirical probabilities $p(n)$ to find an integer $n$ among those in $\mathcal{N}$ can be calculated as usual, and we have for the arithmetic entropy:

$$H_A = -\sum_{k=1}^{K} p(n_k) \log p(n_k), \tag{14}$$

where $K$ is the number of different integers in $\mathcal{N}$.
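The two steps of this section, turning a pattern into an integer and computing the entropy of the resulting integers, might be sketched as below. The sketch assumes small embedding dimensions (at most 15, so that a 64-bit unsigned integer suffices), the natural logarithm, and the digit weighting used in the program of Section 3.4, where the digit at position k is multiplied by the k-th power of D. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Integer for a pattern alpha of length D: digit alpha[k] has weight D^k.
std::uint64_t pattern_to_integer(const std::vector<int>& alpha) {
    const std::uint64_t D = alpha.size();
    assert(D > 0 && D <= 15);  // keeps D^D within 64 bits
    std::uint64_t n = 0, weight = 1;
    for (int digit : alpha) {
        assert(digit >= 0 && static_cast<std::uint64_t>(digit) < D);
        n += static_cast<std::uint64_t>(digit) * weight;
        weight *= D;
    }
    return n;
}

// Shannon entropy (natural logarithm) of the empirical distribution
// of integer symbols.
double entropy_of_integers(const std::vector<std::uint64_t>& symbols) {
    assert(!symbols.empty());
    std::map<std::uint64_t, std::size_t> counts;
    for (std::uint64_t s : symbols) ++counts[s];
    double H = 0.0;
    for (const auto& kv : counts) {
        const double p = static_cast<double>(kv.second) / symbols.size();
        H -= p * std::log(p);
    }
    return H;
}
```

Because the correspondence between patterns and integers is one-to-one, counting integers gives the same probabilities as counting the patterns themselves.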

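For a data sequence and embedding method which do not deliver equal values in the embedding vectors, $d = D$ for all vectors and the integers $n_\alpha$ will represent the corresponding permutation order patterns unambiguously. In this case, $n_{\min} \le n_\alpha \le n_{\max}$, where $n_{\min}$ corresponds to the pattern $(D-1, D-2, \ldots, 0)$:

$$n_{\min} = \sum_{k=0}^{D-1} (D-1-k) D^k,$$

and $n_{\max}$ corresponds to the pattern $(0, 1, \ldots, D-1)$:

$$n_{\max} = \sum_{k=0}^{D-1} k D^k.$$

In this case, only $D!$ integers will be used from $[n_{\min}, n_{\max}]$ due to condition (11).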

3.3. How Many New Possible Order Patterns Are Got?

If it is decided to treat the order patterns generated from embedding $D$-vectors with some components equal as not equivalent to those from vectors with all components different, then the number of all possible patterns will be greater than $D!$. Here we attempt to estimate how many new patterns can be obtained.

Any new pattern appears from an embedding $D$-vector with $d$ different components, where $1 \le d < D$. So, with $d$ fixed, the number of corresponding new patterns is equal to the number of $D$-digit base-$D$ integers constructed from the digits $\{0, 1, \ldots, d-1\}$ in such a way that each of the $d$ digits is used at least once. This number can be calculated as

$$d!\, S(D, d),$$

where $S(D, d)$ are the Stirling numbers of the second kind ([18], Part 5, Section 2). Considering all possible values for $d$, we have for the total number of possible new patterns:

$$\sum_{d=1}^{D-1} d!\, S(D, d) = a(D) - D!,$$

where $a(D) = \sum_{d=1}^{D} d!\, S(D, d)$ are known as the ordered Bell numbers; see ([19], p. 337) for naming discussion. Calculating this total for a range of $D$ (the Stirling numbers were calculated with the stirling2(D, d) function in the “maxima” computer algebra system (http://maxima.sourceforge.net/)), we see that the number of new patterns is normally greater than $D!$; see Figure 2 and also Table 1. Of course, the possible new patterns may only be significant when they can be observed (see the discussion about this in [7]). This depends on the process under study and the embedding method.
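The counting argument can be checked with a short program. The sketch below does not use Maxima. Instead, it assumes the standard recurrence S(n, k) = k S(n-1, k) + S(n-1, k-1) for the Stirling numbers of the second kind and restricts D to at most 15 so that all values fit into 64 bits. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// result[d] = d! * S(D, d): the number of order patterns produced by
// D-vectors with exactly d distinct components (result[0] is unused).
std::vector<std::uint64_t> patterns_by_distinct_count(int D) {
    assert(D >= 1 && D <= 15);
    std::vector<std::vector<std::uint64_t>> S(
        D + 1, std::vector<std::uint64_t>(D + 1, 0));
    S[0][0] = 1;
    for (int n = 1; n <= D; ++n)
        for (int k = 1; k <= n; ++k)
            S[n][k] = k * S[n - 1][k] + S[n - 1][k - 1];
    std::vector<std::uint64_t> result(D + 1, 0);
    std::uint64_t factorial = 1;
    for (int d = 1; d <= D; ++d) {
        factorial *= d;
        result[d] = factorial * S[D][d];
    }
    return result;
}

// Number of new patterns: the sum over d = 1..D-1, which equals the
// ordered Bell number of D minus D!.
std::uint64_t new_patterns(int D) {
    const std::vector<std::uint64_t> c = patterns_by_distinct_count(D);
    std::uint64_t total = 0;
    for (int d = 1; d < D; ++d) total += c[d];
    return total;
}
```

For D = 3 this gives 7 new patterns against 3! = 6 permutations, and for D = 4 it gives 51 against 24.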

3.4. Coding

Certainly, there are several possible implementations of the algorithm discussed in Sections 3.1 and 3.2. Here, the one used for the examples in Section 4 and Appendix C is shown. It is a C++ program. It is expected that the sequence (1) is organized into a one-dimensional array X[N]. For calculating the arithmetic order pattern of the vector $V_i$ shown in (3), it is necessary to pass a pointer to X[i] to the function get_numerical_pattern, below, as its third argument: data_point = X + i.

In the example below, X[i] is declared as double, but it can be of any type with appropriate sorting defined. The return value is declared as mpz_class, which is a GNU multiple precision integer (https://gmplib.org/). This is used because for large embedding dimensions $D$, the returned number representing an order pattern may exceed 64 bits in size (it makes sense to use large embedding dimensions only for very long sequences of data. Otherwise, any observed pattern appears only once, which is unfavorable for estimating probabilities). For smaller $D$, mpz_class can be replaced with int or long everywhere in the code.

```cpp
#include <gmp.h>
#include <gmpxx.h>
#include <forward_list>
#include <vector>

/*
 Function calculates numerical representation of order pattern
 of an embedding vector V_i = {x_i, x_{i + tau}, ...}.
 Here D is the embedding dimension and tau is the delay.
 The data_point points to the first component of V_i in the
 array of raw data.
*/
mpz_class get_numerical_pattern (int D, int tau, const double *data_point)
{
 int k;
 // collect the D components, then keep the sorted unique values only
 std::forward_list<double> FL;
 auto it = FL.before_begin ();
 for (k = 0; k < D; k++) it = FL.emplace_after (it, data_point [k*tau]);
 FL.sort ();
 FL.unique ();

 std::vector<int> pDpnm (D); // order pattern will be here
 int tag = 0;
 // every component equal to the current unique value gets the same tag
 for (auto it = FL.begin (); it != FL.end (); ++it)
  {
   for (k = 0; k < D; k++)
    if (*it == data_point [k*tau]) pDpnm [k] = tag;
   tag++;
  }

 mpz_class pnum = 0; // arithmetic order pattern (initial value)
 mpz_class digval = 1; // weight of the current digit, D^k
 for (k = 0; k < D; k++)
  {
   pnum += pDpnm [k]*digval;
   digval *= D;
  }
 return pnum;
}
```

This code is transparent and does not refer to combinatorics. At the same time, provided an embedding vector $V$ does not have equal components, when the loop over FL that assigns the tags is complete, we obtain in the array pDpnm a permutation $\pi^{-1}$, where $\pi$ is the permutation for that vector obtained in accordance with the standard rules of [1] reproduced in Section 2 with (5) adopted.

4. Example

The discussed methodology has been tested on two surrogate sequences. The purpose was to demonstrate that for a pair of sequences, the standard permutation entropy method gives roughly the same entropy, whereas the arithmetic entropy may be considerably different.

For calculating standard permutation entropy in a situation where equal components in embedding vectors are possible, we replace the following fragment:

```cpp
for (auto it = FL.begin (); it != FL.end (); ++it)
 {
  for (k = 0; k < D; k++)
   if (*it == data_point [k*tau]) pDpnm [k] = tag;
  tag++;
 }
```

in the code of Section 3.4, with the following one:

```cpp
for (auto it = FL.begin (); it != FL.end (); ++it)
 {
  for (k = D-1; k >= 0; k--)
   if (*it == data_point [k*tau]) pDpnm [k] = tag++;
 }
```

With such a replacement, we get in the array pDpnm [D] above the permutation which is inverse to the one obtained for the embedding vector in the standard permutation entropy symbolization with rules (5) and (9) adopted. As mentioned above, using inverse permutations instead of the initial ones delivers the same value for the standard permutation entropy.

The two sequences, S1 and S2, are obtained as follows: by means of the function gsl_rng_uniform_int from the GNU Scientific Library [20], we generate random numbers from the set $\{0, 1, 2, 3, 4\}$, which are equally probable. Each obtained random number “val” is written into S1. The same number is written into S2 provided it is not equal to the number written to S2 at the previous step. If it is, then the number (val + 1) (mod 5) is written instead. This introduces a nonzero correlation between consecutive values in S2. For example, in S2, any two consecutive values are always different. Examples of S1 and S2 are as follows:

Sequences S1 and S2 of length 1 000 000 were produced, and both permutation and arithmetic entropy have been calculated. The results are shown in Tables 2 and 3.
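The construction of the two surrogate sequences might be sketched as follows. Note the swap: instead of gsl_rng_uniform_int, this sketch uses the standard `<random>` facilities (`std::mt19937` with `std::uniform_int_distribution`), so the generated numbers will differ from those behind Tables 2 and 3. The value range 0 to 4 is inferred from the (mod 5) rule. The name `make_surrogates` and the seed parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Returns the pair (S1, S2). S1 holds independent uniform values from
// {0,...,4}. S2 repeats S1 except that a value equal to the previous
// element of S2 is replaced by (val + 1) mod 5.
std::pair<std::vector<int>, std::vector<int>>
make_surrogates(std::size_t length, unsigned seed) {
    assert(length > 0);
    std::mt19937 gen(seed);                         // stand-in for the GSL generator
    std::uniform_int_distribution<int> pick(0, 4);  // both bounds inclusive
    std::vector<int> s1, s2;
    s1.reserve(length);
    s2.reserve(length);
    for (std::size_t n = 0; n < length; ++n) {
        int val = pick(gen);
        s1.push_back(val);
        if (!s2.empty() && s2.back() == val) val = (val + 1) % 5;
        s2.push_back(val);
    }
    return {s1, s2};
}
```

By construction, no two consecutive elements of S2 are equal, which is the property that separates the two sequences for delay 1.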

Notice that arithmetic entropy is considerably greater than the permutation one. This is due to the high frequency of embedding vectors with equal components. Also, from Table 2 with $\tau = 1$, it can be seen that arithmetic entropy discriminates better between S1 and S2. However, the case with delay $\tau = 2$ shown in Table 3 is not similarly conclusive. This might be due to the construction method of the S2 sequence. Namely, by pulling from S2 embedding vectors with delay 2, we may get vectors with equal adjacent components, similarly to the S1 case. This reduces the difference between S1 and S2. For $\tau = 1$, embedding vectors for S2 do not have equal adjacent components. One more example is in Appendix C.

5. Conclusions and Discussion

In this note, we have discussed a method for calculating entropy in a sequence of data, which is similar to the permutation entropy method. The characteristic features of this method are as follows:
(i) It treats equal components in the embedding vectors as being equal instead of ordering them artificially
(ii) It is entirely exempt from combinatorics, labeling order patterns by integers instead of permutations
(iii) If embedding vectors do not have equal components, this method delivers exactly the same value for the entropy as does the standard permutation entropy one

In the symbolization procedure discussed in Section 3.1, new order patterns may appear as compared to the standard permutation method (Section 3.3). Those new patterns arise from embedding vectors with some components being equal to each other. In the standard permutation entropy method, the embedding vectors characterized by those new patterns, if any, are labeled by permutations as if there were no equal components. This is made possible through ordering equal values in accordance with the rule (9).

Mathematically, replacing embedding vectors with their order patterns means constructing a quotient set from the set of all embedding vectors with respect to some equivalence relation [10, 21, 22]. In the case of permutation entropy, the corresponding equivalence relation is defined by using (9) and either (4) or (5). Denote it by $\sim_P$. For arithmetic entropy, the corresponding equivalence relation is defined by using the algorithm described in the first paragraph of Section 3.1. Denote it by $\sim_A$. It is clear that for two embedding vectors $V$ and $V'$, if $V \sim_A V'$, then $V \sim_P V'$. Namely, if $V$ and $V'$ have the same arithmetic order pattern, then they have the same permutation order pattern. That means that $\sim_P$ is a coarser relation than $\sim_A$. Other equivalence relations could be offered, which are coarser than $\sim_P$, finer than $\sim_A$, lying in between, or incomparable with both, see [12]. A symbolization which still uses permutations, but is equivalent to the one discussed here as regards the treatment of equal values in embedding vectors, has been proposed in [11]; see the discussion in Appendix B. Which one is better depends on the data sequence and on which kind of redundancy is to be stripped.


A. Equivalence with Permutations

The following theorem proves the statement made in Section 3.1.

Theorem A.1. Suppose that an embedding vector $V$ does not have equal components. Then its symbolic pattern $\alpha(V)$, obtained as described in Section 3.1 after equation (10), represents a permutation which is inverse to $\pi(V)$, the permutation obtained in the standard permutation entropy approach with convention (5) adopted:

$$\alpha(V) = \pi(V)^{-1}. \tag{A.1}$$

Proof. Since $V$ has no equal components, $\alpha(V)$ represents some permutation of the sequence $(0, 1, \ldots, D-1)$. Furthermore, the procedure of obtaining $\alpha(V)$ from $V$ does not change the rank order: for any $k$, $l$, if $x_k < x_l$, then $\alpha_k < \alpha_l$, and vice versa. If so, then $\alpha(V)$ can be used for calculating the standard permutation $\pi(V)$:

$$\pi(V) = \pi(\alpha(V)). \tag{A.2}$$

In this course, after arranging the elements of $\alpha(V)$ as required in (5), one obtains

$$\alpha_{j_0} < \alpha_{j_1} < \cdots < \alpha_{j_{D-1}}, \quad \text{that is,} \quad \alpha_{j_k} = k. \tag{A.3}$$

The obtained permutation acts as follows:

$$\pi(V): k \mapsto j_k. \tag{A.4}$$

Now, take into account that $\alpha(V)$ has number $k$ at position $j_k$ (A.3). That means that the order pattern $\alpha(V)$, if treated as a permutation, acts as follows:

$$\alpha(V): j_k \mapsto k. \tag{A.5}$$

The latter just means that $\alpha(V) = \pi(V)^{-1}$.
Due to this theorem, the method discussed in this note is equivalent to the standard permutation entropy method if in any embedding vector, any two components are different.
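The statement of the theorem can also be checked numerically for any concrete vector without ties. The sketch below computes the ascending-order subscripts and the ranks independently and then tests whether one is the inverse of the other. The function name is hypothetical, and the rank is computed here by counting smaller components, which coincides with the rule of Section 3.1 when there are no ties.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns true when the rank pattern alpha of v is the inverse of the
// permutation pi obtained by arranging the components in ascending order.
// Precondition: v has no equal components.
bool pattern_is_inverse_permutation(const std::vector<double>& v) {
    const std::size_t D = v.size();
    assert(D > 0);
    // pi: subscripts arranged by ascending value
    std::vector<std::size_t> pi(D);
    std::iota(pi.begin(), pi.end(), std::size_t{0});
    std::sort(pi.begin(), pi.end(),
              [&](std::size_t a, std::size_t b) { return v[a] < v[b]; });
    for (std::size_t k = 1; k < D; ++k)
        assert(v[pi[k - 1]] < v[pi[k]]);  // the theorem assumes no ties
    // alpha: rank of each component = number of strictly smaller components
    std::vector<std::size_t> alpha(D);
    for (std::size_t k = 0; k < D; ++k)
        alpha[k] = static_cast<std::size_t>(std::count_if(
            v.begin(), v.end(), [&](double y) { return y < v[k]; }));
    // inverse relation: alpha(pi(k)) = k and pi(alpha(k)) = k for all k
    for (std::size_t k = 0; k < D; ++k)
        if (alpha[pi[k]] != k || pi[alpha[k]] != k) return false;
    return true;
}
```

For example, the illustrative vector (3, 1, 2) has ascending subscripts (1, 2, 0) and ranks (2, 0, 1), which are inverse to each other.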

B. Comparison with Modified Permutation Entropy

Several versions of the modified permutation entropy symbolization have been proposed. We analyze here those proposed in [10, 11]. Considering first [10], the symbolization proposed there is obtained as follows. Having an embedding vector $V$, arrange its components as shown in (5), with their subscripts retained. If there are equal components, arrange their subscripts similarly to rule (9), or in any other way. Before fetching the row of subscripts in the resulting vector as the modified symbol, do the following preparation. If there is a group of equal components in $V$, then replace all subscripts in this group by the smallest among them. Do this with all groups of equal components in $V$. Use the row of subscripts modified in this way as the modified symbol of $V$. The symbolization modified in this way retains some information about equal components in $V$. Let us denote this type of symbolization as MPE and the corresponding symbol as $\mu(V)$.

By comparing the values presented in Table 1 with the data of TABLE 1 in [10], we see that the total number of possible patterns is bigger in Table 1. Therefore, it could be expected that the MPE symbolization used in [10] is coarser than that discussed in this note. An additional hint in the same direction is that for some pairs of embedding vectors, the symbolization in [10] gives the same result, while the method discussed in this note gives two different results. In one such example with $D = 4$, the MPE symbolization in [10] gives the same order pattern for both $V$ and $V'$, whereas the AE symbolization gives $(0, 1, 0, 1)$ for $V$ and $(0, 1, 1, 0)$ for $V'$, resulting in two different numerical patterns, 68 and 20, calculated as shown in (12). Notice now that for any two embedding vectors $V$ and $V'$,

$$\alpha(V) = \alpha(V') \Rightarrow \mu(V) = \mu(V'), \tag{B.1}$$

where $\alpha(V)$ is the AE symbol and $\mu(V)$ is the MPE symbol of $V$.

Indeed, the MPE symbol $\mu(V)$ of any vector $V$ is obtained through rearranging the components of $V$ in accordance with their rank order. The symbol $\alpha(V)$, if considered as a vector, has the same rank order of its components as does $V$. Therefore, $\alpha(V)$ can be used for calculating $\mu(V)$ instead of $V$ itself. If so, then (B.1) becomes evident. The above reasoning proves that the MPE and AE methods of symbolization are comparable, and AE is finer than MPE.
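The comparison can be made concrete with a small sketch. It implements the MPE symbol as it is described in this appendix (not taken from the code of [10]) and the AE integer with the digit weighting of the program in Section 3.4. The two sample vectors named after the code are hypothetical, chosen only because they reproduce the numerical patterns 68 and 20 quoted above. Function names are illustrative, and the embedding dimension is assumed to be at most 15.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// AE integer of a vector: rank digits weighted by powers of D,
// with the digit at position k weighted by D^k.
std::uint64_t ae_integer(const std::vector<int>& v) {
    const std::uint64_t D = v.size();
    assert(D > 0 && D <= 15);
    std::vector<int> u(v);
    std::sort(u.begin(), u.end());
    u.erase(std::unique(u.begin(), u.end()), u.end());  // distinct values
    std::uint64_t n = 0, weight = 1;
    for (int x : v) {
        n += static_cast<std::uint64_t>(
                 std::lower_bound(u.begin(), u.end(), x) - u.begin()) * weight;
        weight *= D;
    }
    return n;
}

// MPE symbol as described above: subscripts in ascending order of value,
// with every subscript in a group of equal components replaced by the
// smallest subscript of that group.
std::vector<std::size_t> mpe_symbol(const std::vector<int>& v) {
    assert(!v.empty());
    std::vector<std::size_t> j(v.size());
    std::iota(j.begin(), j.end(), std::size_t{0});
    // the stable sort puts the smallest subscript first inside each group
    std::stable_sort(j.begin(), j.end(),
                     [&](std::size_t a, std::size_t b) { return v[a] < v[b]; });
    std::vector<std::size_t> symbol(j);
    for (std::size_t k = 1; k < j.size(); ++k)
        if (v[j[k]] == v[j[k - 1]]) symbol[k] = symbol[k - 1];
    return symbol;
}
```

For the hypothetical vectors (1, 2, 1, 2) and (1, 2, 2, 1), `mpe_symbol` returns (0, 0, 1, 1) in both cases, while `ae_integer` returns 68 and 20, so the two vectors are merged by MPE and separated by AE.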

Consider now the symbolization used for modified permutation entropy proposed in [11]. In this symbolization, each embedding vector is symbolized with . has the following structure:where is a permutation. The second half in (B.2), , keeps information about equal components in . Call this symbolization MPE2. The symbol is obtained as follows: arrange components in in the ascending order keeping their subscripts. As a result, we obtain a sequence of groups consisting of equal components. Each group may have from one to elements. Of course, in the latter case, there will be only one group. The value composing each group in the sequence increases from left to right. Arrange subscripts in each group in the ascending order. Denote this way prepared sequence of components with their initial subscripts as . The row of subscripts in is from (B.2). This is the standard PE symbol with (5) adopted with the only difference that in the rule (9), the opposite inequality sign is used. The sequence is composed of zeros and ones by the following rule: if, then; otherwise.

Theorem B.2. Symbolization MPE2 produces the same partition of a set of embedding vectors as does the AE one described in Section 3.1.

Proof. In order to prove this statement, we need to show that for any and , the following equivalence holds:It is easily seen that can be unambiguously recovered from . Indeed, and considered as a vector have the same rank order of components. And calculation of is based exclusively on the rank order. Therefore,Thus, vectors with same will have same . This proves the one half of (B.3). In order to prove the second half, we need to show how can be unambiguously recovered from . For this purpose, we use the equality (B.4). So, if we arrange in the ascending order retaining subscripts, we obtain, instead of , above, a vector . This vector consists of groups of equal values: the first group has only zeros, the second one has only “1,” and the last one has only “,” where is the number of unique components in or . The sequence of subscripts in is the permutation, which constitutes the first part in . If one would have without subscripts inherited from , in the form of , the required might be obtained by applying permutation to . Namely, for , , where is taken from the permutation : . The required sequence can be recovered from the second part of . For this purpose, do the following reprocessing of . Replace with the following sequence: . With the obtained new sequence, proceed as follows: at the step number one, if , replace it with 0, otherwise replace it with 1. Similarly, at the step number , if , replace it with the number put at the previous step in place of , otherwise, replace with that same number incremented by 1. After replacing , we obtain the required sequence . This completes the proof.

C. Example

Here, we consider the sequence of digits in the decimal expansion of $\sqrt{2}$ (see ([6], Section 4.1)). The first 1 million digits in the decimal expansion of $\sqrt{2}$ have been downloaded from https://catonmat.net/tools/generate-sqrt2-digits and https://apod.nasa.gov/htmltest/gifcity/sqrt2.1mil.

Denote this sequence S1. The first 10 million digits in the decimal expansion of $\sqrt{2}$ have been downloaded from https://apod.nasa.gov/htmltest/gifcity/sqrt2.10mil.

Denote this sequence S10. Both PE and AE were calculated for both S1 and S10 for different embedding dimensions with delay . The values were chosen based on the number of occurrences of different order patterns in S1 (Table 4). Based on the data of Table 4, we skip and cases because the number of occurrences of some arithmetic entropy patterns is too small for calculating probabilities. The values obtained for entropy are presented in Tables 5 and 6.

The data, which are obtained numerically, can be checked analytically. Indeed, the number $\sqrt{2}$ is believed to be base 10 normal [23]. This means that any combination of $D$ digits can be found in the expansion with probability $10^{-D}$. For example, if $D = 2$, there are 10 combinations with AE pattern $(0, 0)$, 45 combinations with AE pattern $(0, 1)$, and the same amount with AE pattern $(1, 0)$. This gives for the probabilities $p_{(0,0)} = 0.1$, $p_{(0,1)} = 0.45$, and $p_{(1,0)} = 0.45$, and $H_A = -0.1 \log 0.1 - 2 \cdot 0.45 \log 0.45$. In the PE symbolization, both $(0, 0)$ and $(0, 1)$ correspond to the permutation $(0, 1)$, and $(1, 0)$ corresponds to $(1, 0)$ (we use here the rule (9) with inverse inequality sign). This gives $H_P = -0.55 \log 0.55 - 0.45 \log 0.45$.
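The counts of digit pairs can be reproduced by direct enumeration. The sketch below assumes that all pairs of digits are equally probable, as normality implies, and uses the natural logarithm, so the entropy values it returns may differ by a constant factor from tabulated values computed with another logarithm base. All function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Counts of ordered digit pairs (a, b), 0 <= a, b < base, by AE pattern:
// counts[0] for a == b, counts[1] for a < b, counts[2] for a > b.
std::array<int, 3> count_pair_patterns(int base) {
    assert(base >= 1);
    std::array<int, 3> counts{{0, 0, 0}};
    for (int a = 0; a < base; ++a)
        for (int b = 0; b < base; ++b)
            ++counts[a == b ? 0 : (a < b ? 1 : 2)];
    return counts;
}

// Shannon entropy (natural logarithm) of a probability vector.
double entropy(const std::vector<double>& p) {
    double H = 0.0;
    for (double q : p)
        if (q > 0.0) H -= q * std::log(q);
    return H;
}

// AE keeps the tie pattern as a separate symbol.
double ae_pair_entropy(int base) {
    const std::array<int, 3> c = count_pair_patterns(base);
    const double total = static_cast<double>(base) * base;
    return entropy({c[0] / total, c[1] / total, c[2] / total});
}

// PE merges the tie pattern with the ascending one (inverse sign in (9)).
double pe_pair_entropy(int base) {
    const std::array<int, 3> c = count_pair_patterns(base);
    const double total = static_cast<double>(base) * base;
    return entropy({(c[0] + c[1]) / total, c[2] / total});
}
```

For base 10 the enumeration gives the counts 10, 45, and 45, and the AE value exceeds the PE value because the tie pattern is kept as a separate symbol.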

From Tables 5 and 6, we see that AE is usually bigger than PE. This could be explained by the bigger total number of patterns available in the AE symbolization. Perhaps, for the same reason, normalized AE (NAE) is smaller than normalized PE (NPE) for small $D$. What seems unexpected is the opposite behavior of NPE and NAE with growing $D$. Namely, NAE is an increasing and NPE is a decreasing function of $D$ for the parameter set considered (D = 7 and 8 were considered for S10. The results, as regards decreasing and increasing, support those observed for smaller D). As illustrated in the previous paragraph, the $D$-tuples of digits from the expansion sequence are distributed unevenly between different order patterns both for PE and AE (this might explain the dispersion of the patterns’ frequencies observed in ([6], Figure 8)). The abovementioned behavior with increasing $D$ suggests that the unevenness decreases for AE and increases for PE, at least in some “normalized” sense. This is for the $\sqrt{2}$ expansion. Whether a similar behavior takes place for other sequences, and a possible practical utilization of this fact, require additional study.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.


In this paper, the following free software has been used: (i) linux operating system (https://getfedora.org/); (ii) GNU Scientific Library [20] (https://www.gnu.org/software/gsl/); (iii) GNU Multiple Precision Arithmetic Library (https://gmplib.org/); (iv) Maxima, a free Computer Algebra System (http://maxima.sourceforge.net/); and (v) RefDB, a free Reference Manager created by Markus Hoenicka (http://refdb.sourceforge.net/).

Conflicts of Interest

The author declares that there are no conflicts of interest.


The work was partially supported by the Program of Fundamental Research of the Department of Physics and Astronomy of the National Academy of Sciences of Ukraine “Mathematical models of nonequilibrium processes in open systems” (N 0120U100857).