Abstract

Two main weaknesses have been identified for permutation entropy (PE): the neglect of amplitude differences between subsequence patterns and the possible ambiguities introduced by equal values in the subsequences. A number of variations or customizations of the original PE method have recently been proposed in the scientific literature to address these issues. Specifically for ties, methods have tried to remove the ambiguity by assigning different weighted or computed orders to equal values. Although these methods are able to circumvent such ambiguity, they can substantially increase the algorithm costs, and a general characterization of their practical effectiveness is still lacking. This paper analyses the performance of PE using several biomedical datasets (electroencephalogram, heartbeat interval, body temperature, and glucose records) in order to quantify the influence of ties on its signal class segmentation capability. This capability is assessed in terms of the statistical significance of the PE differences between classes and in terms of classification sensitivity and specificity. Although it is obvious that ties modify the PE results, we hypothesize that equal values are intrinsic to the acquisition process and, therefore, impact all the classes more or less equally. The experimental results confirm that ties are often not the limiting factor for PE, that they can even be beneficial as a sort of stochastic resonance, and that it can be far more effective to focus on the embedding dimension instead.

1. Introduction

Permutation entropy (PE) is a very powerful metric to capture the dynamical features of a time series based just on its ordinality [1]. It has been used in a wide diversity of applications for signal classification or event detection [2].

Time series with equal or repeated values are common in many of these real-life applications of PE. If these equal values are sufficiently close, that is, if they fall within the same subsequence used in the computation of PE, they can introduce a significant bias in the estimation of the ordinal pattern probability distribution [3], since they are usually assigned an index based on their location in the input subsequence.

A few methods have been proposed to account for ties in PE, including the original PE method. In fact, in their seminal paper [1], Bandt and Pompe already suggested adding random perturbations to break the ties, in the unlikely case that ties were present in an otherwise continuous distribution. The same recommendation is made in [4]. Other more recent works, like the modified PE algorithm (mPE) described in [5], propose representing equal values with the same symbol, using the smallest index at which the repetition starts, which defines more states than the PE method. Another approach to this problem, the amplitude-aware permutation entropy (AAPE) method [6], uses a weighted approach, where all possible permutations with equal values are considered, and the probabilities are updated accordingly. All these methods try to account for ties from a mathematical perspective, but not from a conceptual perspective, that is, what ties really represent in a time series, since equal values may be an intrinsic part of the dynamics of the records. Furthermore, the methods are assessed using synthetic records or a very specific type of records, which does not provide a complete picture of the real influence of ties in PE: Is there a correlation between the percentage of ties and PE performance? Is it preferable to account for ties in the PE algorithm or to try to remove them in advance? Does the location of ties in the time series make any difference to their influence? Studies addressing these and other related issues are still lacking. Only in [3] did the authors study this influence and conclude that equal values introduce a PE bias, leading to a possible misinterpretation of the regularity of the underlying dynamics. However, that study focused on assessing the changes in the results of PE compared to those obtained with no ties, in absolute terms.
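As a rough illustration of two of the strategies mentioned above, the following Python sketch breaks ties with a small random perturbation and builds a tie-aware pattern in which equal values share the symbol of the first element of their run. It only loosely follows the descriptions of [1] and [5]; the function names, the `scale` value, and the use of original indices as symbols are assumptions made for this example.

```python
import random


def break_ties_with_noise(x, scale=1e-6, seed=None):
    """Add a tiny uniform perturbation to every sample so that ties vanish.

    `scale` is a hypothetical value; it should stay well below the
    resolution of the recorded data so that the ordering of distinct
    values is not altered.
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-scale, scale) for v in x]


def tie_aware_pattern(block):
    """Ordinal pattern in which equal values share one symbol.

    Indices are sorted by value (stable sort). Within a run of equal
    values, every element repeats the symbol of the first element of
    the run, so ties produce additional states.
    """
    order = sorted(range(len(block)), key=lambda i: block[i])
    pattern = []
    for k, idx in enumerate(order):
        if k > 0 and block[idx] == block[order[k - 1]]:
            pattern.append(pattern[-1])  # same symbol as the start of the run
        else:
            pattern.append(idx)
    return tuple(pattern)


if __name__ == "__main__":
    print(tie_aware_pattern([22, 7, 22]))
    states = {tie_aware_pattern((a, b, c))
              for a in range(3) for b in range(3) for c in range(3)}
    print(len(states), "distinct states for blocks of length 3")
```

With this mapping, blocks of length 3 produce 13 distinct states instead of the 6 permutations of the standard method, which illustrates why a tie-aware scheme defines more states.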

In signal classification tasks, the main driver of PE performance is not the specific PE value, but its intra- and interclass distribution [7]. Due to equal values in subsequences, point clouds of PE values may drift but still remain separable. Thus, it can be hypothesized that minor changes in PE values due to ties do not necessarily imply a degradation of the segmentation capability of PE, provided its capacity to unveil differences among signal classes is not completely lost. In other words, PE bias may be distributed more or less uniformly among all the classes, and therefore, the differences may still remain apparent.

Based on this hypothesis, the present study aims to gain a more practical insight into the real influence of equal values in PE. In the framework of biomedical records, specifically, using electroencephalogram, heartbeat interval, body temperature, and glucose time series, we assess the level of ties right after signal acquisition and then compute the classification performance and its correlation with such level. The classification performance is quantified using the sensitivity and specificity achieved, and the statistical significance is computed using the Mann–Whitney test. We later add synthetic ties in order to further characterise the possible performance deterioration of signal classification and quantify how some of the methods proposed [5, 6] really deal with ties. Finally, a resampling scheme is applied to remove equal values before using PE to find out whether ties really matter or if it is better to focus on other methodological aspects of PE to improve its detection sensitivity. The experimental dataset is composed of a varied group of biosignals that exhibit a different range of ties and features and cover a wide range of medical applications.

The structure of the paper is as follows. Section 2.1 includes a detailed description of the PE method as introduced in [1]. Next, in Section 2.2, the main features of the four experimental datasets are presented. These datasets correspond to a varied group of real biomedical records that include equal values due to the acquisition stage and the resolution of the devices employed, with percentages ranging from 3% up to 30%. They have been used in many signal classification applications, and most of their classes have been demonstrated to be readily separable. These records will be corrupted with additional synthetic ties, and the possible degradation of PE performance, either in absolute or relative terms, will be assessed as described in Section 2.3. Quantitative results will be reported in Section 3, and their interpretation explained in the Discussion (Section 4). The final conclusions of the study will be described in Section 5.

2. Materials and Methods

2.1. Permutation Entropy

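Let $\mathbf{x} = \{x_1, x_2, \ldots, x_N\}$ be a realization of a discrete time stationary stochastic process, with a cardinality of $N$, and let $\mathbf{x}_j^m = \{x_j, x_{j+1}, \ldots, x_{j+m-1}\}$, $1 \le j \le N-m+1$, be a block extracted from $\mathbf{x}$, with a cardinality of $m$, and an associated vector of orders $\boldsymbol{\pi} = \{0, 1, \ldots, m-1\}$ so that $x_j$ is linked to 0, $x_{j+1}$ is linked to 1, and so on, up to $x_{j+m-1}$, which is linked to order $m-1$.

There is a ≤ relation in $\mathbb{R}$ that also applies to $\mathbf{x}_j^m$. This relation can be used to obtain a new ordered version of $\mathbf{x}_j^m$, termed $\mathbf{x}_j'^m$, where $x'_j \le x'_{j+1} \le \cdots \le x'_{j+m-1}$. As a consequence, the order of the indices in $\boldsymbol{\pi}$ must be updated according to the mapping between $\mathbf{x}_j^m$ and $\mathbf{x}_j'^m$, resulting in a new ordinal pattern $\boldsymbol{\pi}_j$ that represents the final arrangement of the original generic ordered indices.

In order to obtain the PE result for $\mathbf{x}$, the probability of all the resulting ordinal patterns for all the blocks has to be estimated. This value can be obtained by counting the pattern occurrences in a list $\Pi$, $\Pi = \{\boldsymbol{\pi}^{(1)}, \ldots, \boldsymbol{\pi}^{(m!)}\}$, that contains all possible permutations of $\boldsymbol{\pi}$, so that $c_i = \#\{j : \boldsymbol{\pi}_j = \boldsymbol{\pi}^{(i)}\}$, $1 \le i \le m!$, from which $p_i = c_i/(N-m+1)$.

Finally,

$$\mathrm{PE}(\mathbf{x}, m) = -\sum_{i=1}^{m!} p_i \log p_i,$$

where patterns with $p_i = 0$ do not contribute to the sum.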
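The procedure described in this section can be sketched in a few lines of Python. This is a minimal illustration of the standard method for a time scale of 1, not the implementation used in the experiments. The function names, the use of the natural logarithm, and the optional normalisation by the logarithm of the number of possible patterns are assumptions of the sketch.

```python
import math
from collections import Counter


def ordinal_pattern(block):
    """Return the ordinal pattern of a block as a tuple of original indices.

    Python's sort is stable, so equal values keep their order of
    appearance, which mirrors the location-based handling of ties in
    the standard method.
    """
    return tuple(sorted(range(len(block)), key=lambda i: block[i]))


def permutation_entropy(x, m, normalize=False):
    """Permutation entropy of sequence x for pattern length m."""
    n_blocks = len(x) - m + 1
    if m < 2 or n_blocks < 1:
        raise ValueError("need m >= 2 and len(x) >= m")
    # Count how often each ordinal pattern occurs over all blocks.
    counts = Counter(ordinal_pattern(x[j:j + m]) for j in range(n_blocks))
    # Patterns that never occur have zero probability and add nothing.
    pe = -sum((c / n_blocks) * math.log(c / n_blocks) for c in counts.values())
    if normalize:
        pe /= math.log(math.factorial(m))
    return pe


if __name__ == "__main__":
    print(permutation_entropy([4, 7, 9, 10, 6, 11, 3], 3))
    print(permutation_entropy(list(range(20)), 3))  # monotonic series
```

A monotonically increasing series yields a single pattern and therefore an entropy of zero, whereas a series in which all patterns are equally likely approaches the maximum, the logarithm of the number of possible patterns.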

2.2. Experimental Dataset

The experimental dataset includes biomedical records of diverse origin to avoid model overfitting and provide a more complete picture of the practical influence of ties in signal segmentation applications using PE. Specifically, this experimental dataset contains the following:

(i) Electroencephalogram (EEG) time series [8]. This group contains 5 different classes (0–4). Class 0 includes records from healthy subjects with the eyes open. Class 1 also corresponds to healthy subjects but with the eyes closed. Classes 2, 3, and 4 include records from epileptic patients using intracranial electrodes, with class 2 including seizure activity and classes 3 and 4 only seizure-free activity. The number of records is 500, uniformly distributed among the 5 classes. The length is also uniform, 4096 samples, for a duration of 23.6 s. This database has been used in a number of papers [9–11], and it is publicly available. An example of records of each class is shown in Figure 1.

(ii) Body temperature time series (Temp) [12]. This group contains 2 different classes (0 and 1) from patients admitted to the internal medicine ward of the Teaching Hospital of Móstoles, Madrid (Spain). A total of 30 artifact-free records starting at 8:00 AM were selected, with class 0 including 16 patients with no fever peaks and class 1 including 14 patients who had a central temperature measurement above 38°C the day before they were monitored. Temperature was sampled every minute, and records of 8 h were extracted for analysis. This database has also been used in previous studies [12], where further details can be found. An example of records of each class is shown in Figure 2.

(iii) RR interval records (distance between two consecutive heartbeats) available at the CAST RR-Interval Substudy Database (RR) [13–15]. There are six groups of records in this database corresponding to three types of medication, acquired pre- and postmedication (paired data, pairs 0-1, 2-3, and 4-5, were not included in the classification experiments). The duration of the records ranges from 21 h up to 24 h. The database is available for download at Physionet [16] and has been used extensively in other scientific works [10, 17–19]. Since some records had QRS detection errors, they were preprocessed before computing their PE by the method described in [20]. An example of records of each class is shown in Figure 3.

(iv) Continuous glucose monitoring (CGM) time series [21]. This group includes 206 glucose records of patients with diabetes risk, sampled every 5 minutes during 24 h and also acquired at the Teaching Hospital of Móstoles, Madrid (Spain). There are 18 records out of the 206 corresponding to patients who were diagnosed with diabetes. The other 188 records correspond to subjects who remained healthy at the end of the two-year study. This database has been used to test whether there are metrics capable of detecting the subtle differences between the records [21] and of anticipating who is more likely to develop diabetes, in order to implement mitigation measures as soon as possible. An example of records of each class is shown in Figure 4.

All the experimental datasets include pattern ties in their original form, but most of them have been successfully classified using a variety of metrics and algorithms; in other words, they are separable. Equal values are a common issue in many real-life time series, and any method should be robust against them. That is why many modifications have been devised to minimise their impact on PE computation. Biomedical records are no exception to this rule.

As the baseline case, the original form, Table 1 shows the percentage of equal values found for patterns of length up to 9. EEG records exhibit a low incidence of ties, around 5% in the worst case, whereas RR records almost reach 40%. Although CGM had a higher percentage of ties for , these are mainly due to consecutive equal values, whereas in RR records, ties are more randomly distributed. As a consequence, when the pattern length increases, it is more likely for RR records to include new equal values, while for CGM records, they were already included in the shorter subsequences.

In order to further clarify the presence of ties in the input signals and how the pattern length may introduce more equal values, we illustrate this fact with real values drawn from the time series described above. For the EEG case, record F001.txt includes the following initial values: . For , there are no equal values, but for , the subpattern includes two 22 values, that is, 2 out of 6 values are equal. For , another value of 22 is included, and therefore, 3 out of 7 values are equal. The temperature records also contain a lot of ties. As an example, a file from a healthy subject contains the following data: .

As in the previous case, the longer the subsequence, the more equal values it may contain. With regard to the RR records, in file fRR000a, the following sequence can be found: . In this case, the number of equal values is 2 for , 4 for (100%), and 4 for , and then rises again with greater . Finally, a CGM record contains values like . The repetitions are also very frequent but occur more often in consecutive values, since glucose is a slow-varying parameter with clear upward or downward trends instead of a high-frequency oscillating pattern (similar to temperature records, and in clear contrast to RR records). The basic rule of thumb is that the longer the subsequence is, the more likely it is to add a value that is already contained in such subsequence, since the set of different numbers in the time series is finite in practical terms. Actually, that set is very small due to the lack of resolution of the measuring devices, especially within relatively short time windows. This is also numerically demonstrated in Table 1.

The numerical values included in the previous examples also illustrate why the percentage of ties in RR records grows with the pattern length more quickly than that for CGM records. These last records have a step shape, with a clear trend. When the size of the step (equal values, horizontal line in a plot) is exceeded by the pattern length, it is less likely to find values in the vicinity that are equal to those in that step; they will usually be lower or higher. In the previous CGM example, initially, the consecutive 119 values become 120–122, there is a new higher step, and then the values go downward following the same pattern. In other words, the equal values in CGM records are mainly consecutive or too far apart to be included in the same subpattern. On the contrary, RR records oscillate around the average heart rate, and repeated values can be found at any sample. In the previous RR numerical example, values such as 72, 73, or 71 can be found at any point in the entire record (values more randomly distributed).
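The dependence of the level of ties on the pattern length can be explored with a short Python sketch. It uses one possible definition, namely the percentage of subsequences that contain at least one repeated value, which may differ from the exact definition behind Table 1. The two toy series are hypothetical and only mimic a step-like record and an oscillating record.

```python
def tie_percentage(x, m):
    """Percentage of length-m subsequences with at least one repeated value."""
    n_blocks = len(x) - m + 1
    if n_blocks < 1:
        raise ValueError("series shorter than the pattern length")
    tied = sum(1 for j in range(n_blocks) if len(set(x[j:j + m])) < m)
    return 100.0 * tied / n_blocks


# Hypothetical toy series (not taken from the experimental datasets).
steps = [0, 0, 1, 2, 3, 3, 4, 5, 6, 6, 7, 8]                    # trend with plateaus
oscillating = [72, 73, 71, 72, 74, 73, 71, 72, 73, 71, 74, 72]  # no plateaus

if __name__ == "__main__":
    for m in range(2, 7):
        print(m, round(tie_percentage(steps, m), 1),
              round(tie_percentage(oscillating, m), 1))
```

In the step-like series, ties already appear for the shortest patterns because equal values are consecutive, whereas in the oscillating series they only appear once the pattern is long enough to span a repetition.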

2.3. Statistical Assessment

The statistical assessment was carried out using an unpaired two-tailed Mann–Whitney test, with a significance threshold of , to which a Bonferroni correction [22] can be applied for datasets with more than two classes (EEG and RR records), being or instead. The differences between classes were quantified using this metric, with a correction for tied ranks. This approach was used because some experiments did not satisfy the normality assumption for Student's t-test. Although all the classes were compared on a pair-to-pair basis, in the case of EEG and RR records, a Kruskal–Wallis test was also conducted to assess the separability of the entire set. In the case of the 500 EEG records, the result of this test was , with the statistic . As for the RR records, results were and . These results are in accordance with previous studies [10, 11], where these EEG records were shown to be more easily separable than the RR records [20].
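For reference, the kind of test described above can be sketched in pure Python. The sketch uses the normal approximation of the Mann–Whitney U statistic with a correction for tied ranks and a continuity correction, which is an assumption, since the exact variant used in the study is not stated. The default threshold of 0.05 in `bonferroni_alpha` is likewise a placeholder chosen for the example.

```python
import math


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic of sample a, two-sided p-value).
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                      # size of the group of tied values
        midrank = (i + 1 + j) / 2.0    # average of ranks i+1 .. j
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_a - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = max(abs(u - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    return u, math.erfc(z / math.sqrt(2.0))


def bonferroni_alpha(n_classes, alpha=0.05):
    """Threshold divided by the number of pairwise class comparisons."""
    return alpha / math.comb(n_classes, 2)


if __name__ == "__main__":
    u, p = mann_whitney_u([0.61, 0.64, 0.66, 0.70], [0.72, 0.75, 0.71, 0.78])
    print(u, p, p < bonferroni_alpha(5))
```

Note that `bonferroni_alpha` counts every pair of classes; if some pairs are excluded from the comparisons, as with the paired RR groups, the divisor would have to be adjusted accordingly.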

2.4. Temporal Scale

As in many other metrics, the values of PE depend on the values of its input parameters [23, 24]. In addition to the length of the subpattern , the PE general method includes a time scale parameter . In practical terms, this time scale parameter corresponds to a downsampling of the block extracted from , without filtering [6]. In other words, defines the step by which is increased until symbols or values are drawn from . Biosignals usually exhibit a multiscale time behaviour [25]. Therefore, the parameter is very useful to assess the contribution of each time scale to the signal complexity. In this context of subpatterns with ties, this parameter can also contribute to reduce the number of equal values and can also have an impact on the classification performance of PE. However, implies an effective reduction of , since the number of samples used for the computation of PE becomes . If the time series are long enough to guarantee [26], that is not a problem, but in many cases, biomedical records are short and the number of applicable time scales is very small.
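The role of the time scale parameter can be illustrated with a small Python sketch that extracts blocks whose samples are a fixed number of positions apart. This is one reading of the description above; the names `delayed_blocks`, `m`, and `tau` are assumptions introduced for the example.

```python
def delayed_blocks(x, m, tau=1):
    """Yield length-m blocks whose samples are tau positions apart."""
    if m < 1 or tau < 1:
        raise ValueError("m and tau must be positive")
    span = (m - 1) * tau  # distance between first and last sample of a block
    for j in range(len(x) - span):
        yield x[j:j + span + 1:tau]


def blocks_with_ties(x, m, tau=1):
    """Number of blocks that contain at least one repeated value."""
    return sum(1 for b in delayed_blocks(x, m, tau) if len(set(b)) < len(b))


if __name__ == "__main__":
    x = [5, 5, 6, 6, 7, 7]  # hypothetical record with consecutive ties
    print(blocks_with_ties(x, 2, tau=1))
    print(blocks_with_ties(x, 2, tau=2))
```

For a record dominated by consecutive equal values, a larger step separates the repeated samples and fewer blocks contain ties, at the price of fewer blocks being available for the probability estimate.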

In this paper, we will mostly consider since the main objective of the study is to assess the influence of ties in relative terms, not to propose a signal classification scheme. In addition, as ties are not usually equally distributed in time, it is arguably reasonable to hypothesize that a nonuniform downsampling can be more effective to reduce the number of equal values, as illustrated by Table 2 (see Trace Segmentation experiment at the end of next section) for short biomedical records from the experimental dataset. However, for longer time series, the EEG and RR records in this study, we explored the possible influence of in a similar manner as in [27]. A complete characterization of this influence is beyond the scope of the paper, but in general, the classification performance for EEG data was maximum for (), classes 0-1. It monotonically dropped for , with no significant classification for . For the RR records, we also chose classes 0-1, which were not separable already for (). For greater values, the performance was even worse. Therefore, we used by default, except for the experiments described at the end of Section 3.

3. Experiments and Results

From a signal classification perspective, the baseline case was defined by the initial differences among all the groups in each dataset, analysed in pairs. These differences have been quantified in terms of the Mann–Whitney test for all the cases, as shown in Table 3. With the exception of RR records, the rest of the classes are distinguishable, and the differences seem to become more significant as the length of the patterns increases.

Each class in the experimental dataset was corrupted by inserting synthetic equal and independent values. The number of additional ties to generate was given by percentages ranging from 5% up to 25% in 5% steps. The resulting percentage of equal values in subsequences is shown in Table 4.
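The exact procedure used to generate the synthetic ties is not detailed here, so the following Python sketch shows only one plausible scheme: a given percentage of randomly chosen samples is overwritten with the value of another, independently chosen sample of the same record. The function name and the replacement rule are assumptions.

```python
import random


def add_synthetic_ties(x, percentage, seed=None):
    """Overwrite `percentage` % of the samples with values copied from
    other randomly chosen positions of the original record."""
    if len(x) < 2:
        raise ValueError("need at least two samples")
    rng = random.Random(seed)
    y = list(x)
    n_new = round(len(y) * percentage / 100.0)
    for i in rng.sample(range(len(y)), n_new):
        donor = rng.randrange(len(y) - 1)
        if donor >= i:
            donor += 1        # make sure the donor differs from the target
        y[i] = x[donor]       # copy from the original, uncorrupted record
    return y


if __name__ == "__main__":
    record = [v / 10.0 for v in range(100)]   # hypothetical tie-free record
    corrupted = add_synthetic_ties(record, 25, seed=0)
    print(len(set(record)), len(set(corrupted)))
```

Because a target sample may already take part in a tie, the resulting percentage of equal values in subsequences does not grow linearly with the percentage of replaced samples.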

As stated above, it is obvious that changes in the number of equal values will result in changes in the computed PE value in absolute terms. In order to illustrate this point, the percentage of synthetic ties was specifically increased up to 50% for EEG and CGM records, beyond the general levels stated in Table 4, and PE was computed in each case with . The purpose of this experiment was to obtain PE as a function of the level of ties and plot the results to visually determine the degree of variation that such changes entail. These results are depicted in Figures 5 and 6, including a confidence band based on the standard deviation of the results (68% confidence level). Normality in this case was assumed since the number of records was relatively high, 100 (EEG) or 206 (CGM). PE varies in the same manner for all the cases tested. For low percentages of ties, PE exhibits a growing trend. As this percentage increases, PE declines. It is important to note that, as hypothesized in this paper, if the interclass differences of PE remain relatively constant, as can arguably be inferred from the plots, the classification power of PE should not be significantly affected.
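A band of this kind can be obtained as the mean plus and minus one standard deviation of the PE values of a class at each level of ties, which covers about 68% of the values under the normality assumption. The short Python sketch below uses the sample standard deviation, which is an assumption, and hypothetical PE values.

```python
import statistics


def confidence_band(values):
    """Return (lower, mean, upper) using one sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean - sd, mean, mean + sd


if __name__ == "__main__":
    pe_values = [1.52, 1.61, 1.58, 1.49, 1.66]   # hypothetical PE results
    print(confidence_band(pe_values))
```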

In order to assess this point, the experiments in Table 3 were repeated individually for each dataset, with , comparing the classification performance of PE for the baseline case (no synthetic ties, only those already present in the records) and with an additional 25% of synthetic ties (no mitigation).

Tables 5, 6, 7, and 8 show these results for EEG, Temp, RR, and CGM records, respectively. The results are expressed in terms of statistical significance, sensitivity and specificity , for each pair, with subindices b and t to represent the baseline and synthetic ties, respectively. Furthermore, the analysis was repeated using random perturbations to break the ties, as recommended in [1], using the modified PE algorithm described in [5] or using the AAPE method [6] (only the part devised to account for equal values). The purpose of this experiment was to empirically determine if more ties always imply worse classification results and whether the methods proposed in the scientific literature to address equal values really make a significant difference either for original or added ties.
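As a reminder of how these two figures are obtained, the Python sketch below classifies records by comparing their PE value with a threshold and then computes sensitivity and specificity. The threshold-based classifier, the function name, and the numerical values are assumptions made for illustration; the way the decision threshold was selected in the experiments is not described here.

```python
def sensitivity_specificity(pe_positive, pe_negative, threshold,
                            positive_above=True):
    """Sensitivity and specificity of a single-threshold PE classifier.

    pe_positive / pe_negative hold the PE values of the records that truly
    belong to the positive and negative class, respectively.
    """
    if positive_above:
        true_pos = sum(1 for v in pe_positive if v > threshold)
        true_neg = sum(1 for v in pe_negative if v <= threshold)
    else:
        true_pos = sum(1 for v in pe_positive if v < threshold)
        true_neg = sum(1 for v in pe_negative if v >= threshold)
    return true_pos / len(pe_positive), true_neg / len(pe_negative)


if __name__ == "__main__":
    positives = [0.9, 0.8, 0.4]          # hypothetical PE values
    negatives = [0.3, 0.5, 0.85, 0.2]
    print(sensitivity_specificity(positives, negatives, threshold=0.6))
```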

Finally, the effect of equal values was compared with the other claimed drawback of PE: amplitude differences between patterns are overlooked. The complete AAPE method [6] includes two algorithm optimizations for each drawback. In Table 9, the results using the amplitude part are shown for and and the complete version with with EEG records. In this case, the results can be compared with the previous ones where AAPE only addressed the problem of ties and assessed the possible influence of the two drawbacks separately.

Another approach to tackle the problem of equal values in PE subpatterns is to avoid them in the first place. This can be accomplished by a suitable design of the acquisition stage, but this is not always possible. Alternatively, once the time series have been recorded, ties can be removed to some extent using signal-resampling methods. We explored this possibility by using a nonuniform resampling method, trace segmentation [28]. This method is especially well suited for this task since the sampling points are usually those where maximum signal variations take place; that is, equal values tend to be removed seamlessly while the main signal features (peaks and valleys) are kept. An example is depicted in Figure 7.
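The idea behind this kind of nonuniform resampling can be sketched as follows in Python. The sketch follows one common formulation of trace segmentation, in which samples are picked at equal increments of the accumulated absolute variation of the signal; it is not necessarily the exact variant of [28], and the function name and toy record are assumptions.

```python
from bisect import bisect_left
from itertools import accumulate


def trace_segmentation(x, n_out):
    """Pick up to n_out samples at equal increments of accumulated variation.

    Flat stretches add nothing to the accumulated variation, so runs of
    equal values tend to be skipped. Fewer than n_out samples are returned
    when several targets fall on the same sample.
    """
    if not 2 <= n_out <= len(x):
        raise ValueError("n_out must be between 2 and len(x)")
    # trace[i] = accumulated absolute variation from sample 0 to sample i
    trace = [0.0] + list(accumulate(abs(b - a) for a, b in zip(x, x[1:])))
    total = trace[-1]
    picked = []
    for k in range(n_out):
        target = total * k / (n_out - 1)
        i = min(bisect_left(trace, target), len(x) - 1)
        if not picked or i > picked[-1]:
            picked.append(i)
    return [x[i] for i in picked]


if __name__ == "__main__":
    record = [1, 1, 1, 2, 3, 3, 3, 4]   # hypothetical step-like record
    print(trace_segmentation(record, 4))
```

In this toy record, the plateaus are dropped while the samples at which the signal changes are retained.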

Table 2 shows the classification results obtained using this approach and CGM records, those with the highest level of ties at baseline. As the input record is shortened by trace segmentation, the percentage of ties also decreases. The results are compared with those obtained with a classical signal decimation method. The length ranges from 95% of the initial length down to 60% for the trace segmentation method and from 50% down to 25% for the decimation method.

4. Discussion

The percentage of ties in the original records varies significantly, as described in Table 1. As the length of the subsequences increases, this percentage increases as well, but not homogeneously across all the signal types. Specifically, the RR percentage rises more than that of CGM. This is due to the fact that ties in CGM mainly correspond to consecutive samples, whereas in RR records, they are more randomly distributed. This also has important consequences in terms of PE computation since the ordinality of patterns is not modified by consecutive equal values. The same trend is observed when synthetic ties are introduced in the records (Table 4). However, the percentage of ties tends to saturate since the replacement of already existing ties becomes more and more frequent for higher percentages.

Changes in PE results due to equal values are obvious. They can also lead to misinterpretations of the underlying nature of the records. However, as illustrated in Figures 5 and 6, if these changes follow a similar trend, the segmentation capability of PE might remain intact.

In principle, records that have been successfully classified by other metrics not so clearly influenced by ties [10, 11, 21, 29] can still be classified using PE despite the high percentage of ties, except RR records, as quantified in Table 3. It is also important to note that, as the pattern length m increases, the classification performance also increases. This fact has also been reported in other works [30] and suggests that the greater m is, the better. Unfortunately, this has few practical consequences since the computational cost and memory requirements of PE grow as m!, and this prevents large values of m from being in widespread use.
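The practical limit on the pattern length can be appreciated by listing the number of possible ordinal patterns, which is the factorial of the pattern length, next to the number of blocks available in a record. The Python sketch below uses the 4096-sample length of the EEG records as an example; the helper name is an assumption.

```python
import math


def blocks_per_pattern(n_samples, m):
    """Average number of blocks available per possible ordinal pattern."""
    return (n_samples - m + 1) / math.factorial(m)


if __name__ == "__main__":
    n = 4096  # length of the EEG records
    for m in range(3, 10):
        print(m, math.factorial(m), round(blocks_per_pattern(n, m), 3))
```

For a pattern length of 9 there are 362880 possible patterns, far more than the blocks that a 4096-sample record can provide, so most histogram bins necessarily remain empty.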

In terms of statistical significance, in Table 3, all the Temp, CGM, and EEG classes (except 0–2) are separable, despite percentages of ties as high as 20%. The case of RR records is more difficult since only half the combinations are distinguishable. RR records have a relatively high percentage of ties, randomly distributed, whereas CGM and Temp records include more consecutive equal values, and EEG records have the lowest ratio of ties. This analysis seems to hint that ties are not important unless found at high percentages and not consecutive, when their influence on ordinal patterns is maximal.

The results in Tables 5, 6, 7, and 8 correspond to an analysis employing 25% of synthetic ties, , and three methods devised to address equal values in PE patterns. According to the results for EEG records in Table 5, when no mitigation algorithm was applied, average was 0.725 and was 0.803 for the baseline case (no synthetic ties). With an additional 25% of ties, average became 0.696 and 0.773 for . Despite the significant increase in tie levels, classification was only degraded by 3%.

If random perturbations were added to break the ties, as recommended in [1], results improved by about 1%, with less than 2% difference between baseline and synthetic ties. With the modified PE method, there was an additional 1-2% improvement for the baseline case, but when ties were added, the classification performance decreased by about 4% on average. Finally, with the AAPE method (only ties considered), the classification performance was even lower than that of the no mitigation approach. For the other datasets, the differences between the baseline and the 25% case were usually bigger, but there was no method that clearly outperformed the others. It is important to note that in some cases, the performance with more ties was even better. It can be hypothesized that a stochastic resonance phenomenon [31–33] may be involved in terms of the impact of equal values in subpatterns. Stochastic resonance [34] is an effect by which an external disturbance and the internal dynamics of a signal have a positive collaborative interaction that results in enhanced signal detection, among other possible benefits [35]. It is a counterintuitive outcome of the addition of noise to a signal, where the detection probability increases with noise [36]. In this context, stochastic resonance may therefore be involved since ties appear to be beneficial in some cases, from a signal classification perspective.

The AAPE method with the customization that deals with amplitude differences only [6] does not seem to attain the performance achieved with the specific customization for ties (Table 5). For , the average values were and (Table 9), whereas and in Table 5, below the performance of the no mitigation approach. With 25% additional ties, the results were almost the same, and . Again, the level of equal values in subpatterns does not seem to significantly impact the classification performance.

For and , the results slightly improved. The average values became and and and , respectively, comparable to the results using the tie customization. In these cases, with synthetic equal values, the performance even improves, and and and . The complete AAPE version, addressing both PE problems simultaneously, does not have an additive effect; that is, performance is higher separately. In all these cases, the signal to noise ratio in the context of PE computation does not seem to be directly correlated to the classification performance, even the other way round, probably becoming another manifestation of the well-known stochastic resonance phenomenon [37], as stated above.

The results in Table 2 correspond to the standard PE method with and CGM records. The number of ties in the baseline case was reduced using a nonuniform downsampling method, trace segmentation. Initially, the classification performance improved as the percentage of ties decreased, almost exclusively in terms of . However, at some point, around 75% of the original length, fewer ties implied poorer performance, as in the previous experiments.

5. Conclusions

Biomedical records often include ties due to the intrinsic nature of the records (RR records), lack of resolution of the measuring devices (CGM), slow variations in the underlying signal (temperature), or just by chance (EEG). The percentage of ties in these records can vary significantly, from 2.5% up to 20%. This interference obviously modifies the statistical distribution of patterns and therefore the values obtained for PE.

However, in the context of signal classification tasks, equal values in patterns do not seem to significantly damage the segmentation capability of PE. Actually, in some cases, ties appear to play a beneficial role, in the framework of a stochastic resonance phenomenon. Moreover, the removal of ties before computing PE does not markedly improve the results.

The methods proposed to address this problem do not seem to improve the performance of the standard PE algorithm either. The gains, if any, in classification accuracy, in the vicinity of 2%, do not seem to outweigh the additional costs in terms of algorithm complexity, memory requirements, and computational burden.

In summary, the problem of ties in PE has probably been overrated, not in the interpretation of the nature of the underlying signals, but in practical applications of PE for signal classification. Differences probably lie in a few patterns, and ambiguous patterns due to ties do not suffice to blur those differences. Arguably, their influence dissolves randomly into all the histogram bins of PE. In addition, not all ties are equal. Those due to consecutive values seem to exert a lower influence than those due to unconnected equal values in the same pattern. A statistical analysis of the changes induced in PE bins due to ties would be necessary to better understand and mitigate their effects from a more theoretical perspective.

When ties are involved, if PE had to be applied to a classification task and the results were poor, we would suggest maximising the embedding dimension in accordance with the computational resources available (memory and time cost) and the classification performance achieved, using the standard PE algorithm. In case the results were borderline and the embedding dimension had already been maximised, marginal gains could probably be achieved using some of the PE improvements tested, but no significant changes should be expected. Which method or which combination to use cannot be set in advance; different alternatives will have to be assessed. If the classification results were poor despite all the efforts in this regard, we would recommend focusing on another metric or combination of metrics, instead of trying to remove ties or applying another customized algorithm.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.