Abstract

In many fields of science and engineering, a model of an underlying dynamical system can be obtained from records of measurements called time series. Such a model is essential for understanding the behaviour of the original system and for predicting its future values. From the model, parameters such as the prediction horizon, the point beyond which prediction becomes useless, can be computed. In this work, a new parallel kd-tree based approach for computing the prediction horizon is presented. The parallel approach uses the maximal Lyapunov exponent, computed by Wolf's method, as an estimator of the prediction horizon.

1. Introduction

In a large number of applications in science and engineering, the main purpose of studying a given dynamical system is to obtain predictions that allow preventive actions to be carried out. For example, in the analysis of electrocardiogram signals, the information obtained could be used to prevent heart-related diseases; in air temperature regulation, prediction could be used to manage energy efficiently in a building or a crop; likewise, breakdowns in wind turbine generators caused by a broken blade could be prevented by applying the same principles to the study of wind speed. However, in all these cases, and in most real-world applications based on dynamical systems, there is no mathematical model or description of the underlying dynamical system. This is precisely the context in which nonlinear time series analysis plays an important role. With this kind of analysis, from the time series of a single physical variable, for example, voltage, air temperature, or wind speed, it is possible to obtain parameters that describe the behaviour of the underlying dynamical system represented by that time series.

This work deals with the problem of computing an estimate of the prediction horizon, that is, the point where the prediction becomes useless. To calculate this value, we use the parameter known as the maximal Lyapunov exponent. Within nonlinear time series analysis there are several methods in the literature to obtain this parameter: Wolf et al. [1], Rosenstein et al. [2], and Kantz [3]. The advantage of Wolf's method is its conceptually simple and direct approach, which converges to the maximal Lyapunov exponent without further post-processing. However, the implementation presented in [1] uses a brute force approach, which is sufficient for short time series but becomes very expensive in terms of execution time for larger data records. As far as the authors know, there is no implementation of the method based on other approaches.

Therefore, the main goal of this work is to reduce the runtime of Wolf's method, bearing in mind the time requirements that real-world applications impose on the subsequent prediction stage. To improve the runtime of the method, we developed a parallel approach that introduces a kd-tree data structure into the implemented code. In order to assess the results, four time series have been considered as case studies. These data records of a single physical variable were obtained from the Lorenz system, the Hénon map, the Rössler system, and an electrocardiogram signal.

The parallel approach was developed as part of a multidisciplinary framework in which a group of computer scientists and physicists applies high performance computing to nonlinear time series analysis algorithms. The paper is structured as follows. Wolf's method is introduced in Section 2. The parallel approach is presented in Section 3. The experimental results are shown in Section 4, and the conclusions and future work are presented in Section 5.

2. Wolf’s Method

In the context of nonlinear time series analysis, we focus mainly on time series from deterministic dynamical systems that exhibit sensitive dependence on initial conditions. To quantify this sensitivity, the maximal Lyapunov exponent λ1 can be used. This exponent describes the average rate at which predictability is lost [4]. Thus, a crucial fact is that the prediction horizon t_p can be defined by

    t_p = 1 / λ1.    (1)

Before computing t_p, let us introduce the basic concepts of nonlinear time series analysis. Let {x_1, x_2, ..., x_N} be a time series, that is, a sequence of N scalar values of a single physical variable, typically measured at successive times spaced at uniform intervals. From these measurements, points in delay coordinates, also known as delay vectors, can be constructed according to

    y_i = (x_i, x_{i+τ}, x_{i+2τ}, ..., x_{i+(m−1)τ}),    i = 1, ..., N − (m − 1)τ,    (2)

where τ is the embedding delay and m is the embedding dimension (Fraser and Swinney [5]). Note that the number of delay vectors constructed in this manner is N − (m − 1)τ.
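For illustration, the construction in (2) can be coded directly, as in the following C sketch that builds all the delay vectors as one contiguous array. The function and variable names (build_delay_vectors, x, tau, m, npt) are ours and are not taken from the published code.

#include <stdlib.h>

/* Builds the delay vectors of (2) as a contiguous npt x m array, where row i
 * holds y_i = (x_i, x_{i+tau}, ..., x_{i+(m-1)tau}).  The number of delay
 * vectors is stored in *npt; NULL is returned on failure.                   */
double *build_delay_vectors(const double *x, int N, int tau, int m, int *npt)
{
    *npt = N - (m - 1) * tau;                   /* number of delay vectors   */
    if (*npt <= 0)
        return NULL;
    double *y = malloc((size_t)(*npt) * (size_t)m * sizeof *y);
    if (y == NULL)
        return NULL;
    for (int i = 0; i < *npt; i++)
        for (int k = 0; k < m; k++)
            y[(size_t)i * m + k] = x[i + k * tau];  /* k-th delay coordinate */
    return y;
}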

Takens' embedding theorem [6] states that, for a large enough embedding dimension m ≥ 2d + 1, where d is the dimension of the attractor, the delay vectors yield a phase space that has exactly the same properties as the one formed by the original variables of the system. We can use the mutual information method [5] to obtain a proper value of τ and the false nearest neighbors method [7] to determine the minimal embedding dimension m.

For the reconstructed m-dimensional phase space obtained from a single variable, we can compute the Lyapunov exponents λ1 ≥ λ2 ≥ ... ≥ λm. These exponents determine the rate of convergence or divergence of two initially nearby trajectories in that phase space. It is a well-established fact that, if one or more of these exponents is larger than zero, the system exhibits sensitive dependence on initial conditions [8, 9]. In this case, two arbitrarily close trajectories of the system diverge exponentially, eventually ending up in completely different regions of phase space as time progresses.

For our purpose, as mentioned in the Introduction, computing only the maximal Lyapunov exponent is sufficient. To calculate this parameter we use the method introduced by Wolf et al. [1], referred to as Wolf's method. The method works as follows. In the embedding dimension m, a starting point, also known as the fiducial point, is selected; by default it is the first delay vector y_1. A near-neighbor search is performed to obtain a point y_j that satisfies ||y_1 − y_j|| < ε, where ||·|| is the square norm of a vector and ε is a small constant. Then, both points are iterated forward in time for a fixed evolution time evolv, which should be a couple of times larger than the embedding delay τ but not much larger. If the dynamical system has sensitive dependence on initial conditions, the separation L' after the evolution time will be larger than the initial separation L; that is, L' > L. But if the dynamical system does not have sensitive dependence on initial conditions, then L' ≈ L. After the evolution, a near-neighbor search is performed to find a new point, the replacement point, whose distance to the evolved fiducial point is less than ε. The angular separation between the vector joining the evolved pair and the vector joining the evolved fiducial point to the candidate replacement point should be small. If a point satisfying these constraints is found, it replaces the previous neighbor. This procedure is repeated until the fiducial point reaches the end of the data; the path traced by the fiducial point is called the fiducial trajectory. Finally, the maximal Lyapunov exponent can be calculated according to

    λ1 = (1 / (t_M − t_0)) Σ_{k=1}^{M} ln( L'(t_k) / L(t_{k−1}) ),    (3)

where M is the total number of replacement steps and ln denotes the natural logarithm. Figure 1 shows a partial snapshot of the trajectories followed during the computation, and Algorithm 1 depicts the algorithmic notation of Wolf's method. The algorithm is called fet1, for fixed evolution time for λ1, as in the open source code presented by Wolf et al. [1].

Program  fet1(x, τ, m, evolv)
Input:
    x: data record of N scalar quantities;
    τ: embedding delay;
    m: minimal embedding dimension;
    evolv: fixed evolution time;
Output:
  For each replacement step k: (1) λ1: maximal Lyapunov exponent estimation; (2) k·evolv: evolving step;
  (3) L: initial separation between points; (4) L': final separation between points.
(1) begin
   /* Initialization.                                    */
(2) Set NPT as the useful size of x;
(3) Set M as the number of replacement steps;
(4) Build Y using (2) as the set of delay vectors;
(5) Compute σ as the standard deviation of x;
(6) Set scalmn = 0.0125σ as the noise scale;
(7) Set scalmx = 0.05σ as an estimation of the useful length scale;
(8) Set sum = 0;
   /* Computing maximal Lyapunov exponent.                         */
(9) Select y_1 as the fiducial point;
(10) Search y_j such that scalmn ≤ ||y_1 − y_j|| ≤ scalmx;
(11) for  k ← 1  to  M  do
(12)   Set L as the initial separation;
(13)   Set L' as the final separation after evolv steps;
(14)   Set sum ← sum + ln(L'/L);
(15)   Print λ1, k·evolv, L, L';
(16)   repeat
(17)    Search y_j such that scalmn ≤ ||y_f − y_j|| ≤ scalmx, where y_f is the evolved fiducial point;
(18)   until  the distance to the evolved fiducial point  was a minimum;
(19)   if  a replacement point  y_j  was found  then  take it as the new neighbor  else  keep the evolved neighbor;
(20)   Advance the fiducial point by evolv steps;
(21) end
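To make the accumulation behind (3) concrete, the following simplified C sketch evolves a fiducial point and a fixed neighbor in the reconstructed phase space and accumulates the logarithm of the separation growth. It deliberately omits the replacement-point search and the angular check of Algorithm 1, so it illustrates the estimator rather than reimplementing fet1; the names and the sampling interval dt are assumptions.

#include <math.h>

/* Euclidean distance between delay vectors i and j (contiguous npt x m array). */
static double dist(const double *y, int m, int i, int j)
{
    double s = 0.0;
    for (int k = 0; k < m; k++) {
        double d = y[i * m + k] - y[j * m + k];
        s += d * d;
    }
    return sqrt(s);
}

/* Simplified accumulation of (3): the fiducial point i and its neighbor j are
 * evolved evolv steps at a time and ln(L'/L) is accumulated; dt is the
 * sampling interval.  No replacement points are searched here.               */
double lyapunov_sketch(const double *y, int npt, int m, int evolv, int j, double dt)
{
    double sum = 0.0;
    int i = 0, steps = 0;
    while (i + evolv < npt && j + evolv < npt) {
        double L  = dist(y, m, i, j);                  /* initial separation      */
        double Lp = dist(y, m, i + evolv, j + evolv);  /* separation after evolv  */
        if (L > 0.0 && Lp > 0.0)
            sum += log(Lp / L);
        i += evolv;
        j += evolv;
        steps++;
    }
    return (steps > 0) ? sum / (steps * evolv * dt) : 0.0;  /* lambda_1 estimate  */
}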

3. Parallel Approach Using kd-Tree

The most time-consuming part of the algorithm fet1 is the neighbor search (lines 10 and 17 in Algorithm 1). This task searches for the neighboring point using a brute force approach. Such an approach is sufficient when working with short time series, but not with larger data records. A review of neighbor search algorithms that are especially useful for the study of time series data can be found in Schreiber [10].
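For reference, the brute-force search relied on in lines 10 and 17 of Algorithm 1 amounts to scanning every delay vector one by one, as in the following C sketch; the names are illustrative and the admissible range [scalmn, scalmx] follows Algorithm 1.

#include <math.h>

/* Brute-force fixed-radius near-neighbor search: among all npt delay vectors
 * of dimension m (contiguous array y), return the index of the point closest
 * to the fiducial point fid whose distance lies in [scalmn, scalmx], or -1
 * if no such point exists.                                                   */
int brute_force_neighbor(const double *y, int npt, int m,
                         int fid, double scalmn, double scalmx)
{
    int best = -1;
    double best_d = scalmx;
    for (int j = 0; j < npt; j++) {          /* compare against every point   */
        if (j == fid) continue;
        double s = 0.0;
        for (int k = 0; k < m; k++) {
            double d = y[fid * m + k] - y[j * m + k];
            s += d * d;
        }
        double dj = sqrt(s);
        if (dj >= scalmn && dj <= best_d) {  /* keep the closest admissible point */
            best_d = dj;
            best = j;
        }
    }
    return best;
}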

In our experience in the context of nonlinear time series analysis, we have already used a kd-tree data structure to implement the false nearest neighbors method [7] on a distributed memory platform [11]. In that work, we showed that the kd-tree based implementation provides better results in terms of execution time. Thus, we decided to keep this data structure in the present work.

3.1. The kd-Tree Data Structure

From the computational theory point of view, kd-tree based algorithms [12, 13] have the advantage of answering a near-neighbor query in an expected number of operations proportional to log n for a set of n points, which is the best possible performance for an arbitrary distribution of elements. The disadvantage of the brute force search approach is that it compares the query point against all the array elements one by one, so the complexity of each search is proportional to n·d for a set of n points with d coordinates. Therefore, it is expected that the runtime will be reduced by replacing the brute force algorithm with a kd-tree based algorithm.

In general, a kd-tree data structure considers a set of n points in a d-dimensional space, with d ≥ 1. The data structure is a d-dimensional binary search tree that represents this set of points. The variant described in Friedman et al. [13] distinguishes between two kinds of nodes: internal nodes partition the space with a cut plane defined by a value of one of the d dimensions; external nodes, also known as buckets, store the points that fall in the resulting hyperrectangles of the partition. For instance, Figure 2 depicts a space partitioned using cut planes defined by the dimension containing the maximum spread, and Figure 3 shows the kd-tree that represents these partitions.

Although many different versions of kd-trees have been developed, their purpose is always to hierarchically decompose the d-dimensional space into a relatively small number of buckets such that no bucket contains too many points. This decomposition provides a fast way to access any point by position: we traverse down the hierarchy until we find the bucket containing the point and then scan through the few points in the bucket to identify the right one.

To perform this work, we have implemented the kd-tree data structure and the kd-tree based functions following the outlines described in [13, 14]. The most important functions are the following:
(i) newkdtree: builds a kd-tree from a set of n points with d dimensions. By definition, a kd-tree data structure has no repeated points, that is, points with identical coordinates. Since the index of each point is essential in the course of Wolf's method, we have updated the algorithms in [14] to implement a kd-tree that supports this feature.
(ii) frnn: performs a fixed-radius near-neighbor query. This query reports all points within a fixed radius of a given target point under a specified metric, in this case the square norm of a vector. We have also updated the algorithms in [14] to use both the scalmn and scalmx parameters (lines 10 and 17 in Algorithm 1). Thus, the resulting query returns the set of points whose distance to the fiducial point lies between scalmn and scalmx.
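As an illustration of how such a structure can support the frnn query, a minimal C sketch is given below. It assumes a bucket kd-tree that has already been built along the lines of [13, 14]; the type, field, and function names are ours and do not correspond to the actual implementation.

#include <math.h>

#define BUCKET_SIZE 16

/* Internal nodes cut the space at cutval along dimension cutdim; external
 * nodes (buckets) store up to BUCKET_SIZE point indices.                     */
typedef struct kdnode {
    int    cutdim;            /* splitting dimension (internal nodes)        */
    double cutval;            /* splitting value     (internal nodes)        */
    struct kdnode *lo, *hi;   /* children; both NULL for a bucket            */
    int    idx[BUCKET_SIZE];  /* point indices stored in a bucket            */
    int    n;                 /* number of points in the bucket              */
} kdnode;

/* Fixed-radius query in the spirit of frnn: report, via the callback, every
 * point whose distance to the target lies in [scalmn, scalmx].  pts is a
 * contiguous array of delay vectors of dimension m.                          */
void frnn_sketch(const kdnode *node, const double *pts, int m,
                 const double *target, double scalmn, double scalmx,
                 void (*report)(int idx, double dist, void *ctx), void *ctx)
{
    if (node->lo == NULL && node->hi == NULL) {        /* bucket: scan it     */
        for (int b = 0; b < node->n; b++) {
            const double *p = pts + (size_t)node->idx[b] * m;
            double s = 0.0;
            for (int k = 0; k < m; k++) {
                double d = p[k] - target[k];
                s += d * d;
            }
            double dist = sqrt(s);
            if (dist >= scalmn && dist <= scalmx)
                report(node->idx[b], dist, ctx);
        }
        return;
    }
    /* Internal node: always descend into the side containing the target, and
     * into the other side only if the cut plane lies within scalmx.          */
    double diff = target[node->cutdim] - node->cutval;
    const kdnode *near = (diff <= 0.0) ? node->lo : node->hi;
    const kdnode *far  = (diff <= 0.0) ? node->hi : node->lo;
    frnn_sketch(near, pts, m, target, scalmn, scalmx, report, ctx);
    if (fabs(diff) <= scalmx)
        frnn_sketch(far, pts, m, target, scalmn, scalmx, report, ctx);
}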

3.2. Tasks Decomposition

We use data decomposition to implement the parallel approach. Conceptually, the set of delay vectors is split into P consecutive subsets, where P is the number of processes. This data decomposition technique allows us to build, for each subset p with p = 0, ..., P − 1, a local kd-tree that represents the m-dimensional subspace formed by all the points in that subset. Hence, the union of all the local subspaces represents the whole space, as illustrated by the example in Figure 4.
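A minimal sketch of this block decomposition, assuming the MPI rank is used as the subset index p, could look as follows; the names (npt, decompose) are illustrative.

#include <mpi.h>

/* Each process receives a consecutive block of the npt delay vectors and
 * builds a local kd-tree over it; only the block bounds are computed here.   */
void decompose(int npt, int *lb, int *ub)
{
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int chunk = npt / nprocs;               /* block length per process       */
    *lb = rank * chunk;                     /* first local delay vector       */
    *ub = (rank == nprocs - 1) ? npt        /* last process takes remainder   */
                               : *lb + chunk;
    /* ... build the local kd-tree over delay vectors lb, ..., ub-1 ...       */
}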

We try to obtain a well-balanced workload with this coarse-grained decomposition. To this end, we use balanced trees; that is, to split each subspace into the resulting hyperrectangles, the median of the cut dimension is selected as the cut value.

3.3. Tasks Synchronization

A master-slave model is implemented for task synchronization. We assume a parallel platform model that allows us to run copies of a single program in P different processes, each with a unique identifier. Hence, each process uses its process identifier to determine whether it is the MASTER or a slave; the identifiers range from 0 to P − 1.

For each near-neighbor search, each process calls the function frnn to obtain, from its local kd-tree, the set of points whose distance to the fiducial point lies between scalmn and scalmx. From this set, each process selects its local nearest neighbor, the point with the minimum distance, and discards the others. A synchronization between processes is performed to gather the local results at the MASTER. This process computes the global nearest neighbor from the P candidates, and a new synchronization is performed to broadcast the result from the MASTER to the slaves; a minimal MPI sketch of this gather-and-broadcast step is given after Algorithm 2. Algorithm 2 depicts the algorithmic notation of the parallel approach to Wolf's method. We call this approach pkdfet1, for parallel kd-tree fixed evolution time for λ1. Note that the MASTER process has the task of printing λ1 and t_p (see (1)) at the end of the entire processing.

Program  pkdfet1(x, τ, m, evolv)
Input:
    x: data record of N scalar quantities;
    τ: embedding delay;
    m: minimal embedding dimension;
    evolv: fixed evolution time;
Output:
    λ1: maximal Lyapunov exponent estimation;
    t_p: prediction horizon;
(1) in parallel do:
  /* A process is treated as  MASTER  and the other processes are treated as slaves.
The number of processes is  P.                          */
(2) begin
   /* Initialization.                                 */
(3) Set NPT as the useful size of x;
(4) Set M as the number of replacement steps;
(5) Compute lb_p and ub_p as the local bounds, where p = 0, ..., P − 1;
(6) Build the local subset Y_p using (2);
(7) Call newkdtree(Y_p);
(8) Compute σ as the standard deviation of x;
(9) Set scalmn = 0.0125σ as the noise scale;
(10) Set scalmx = 0.05σ as an estimation of the useful length scale;
(11) Set sum = 0;
   /* Computing maximal Lyapunov exponent.                      */
(12) Call frnn(y_1, scalmn, scalmx) on the local kd-tree;
(13) Compute the local nearest neighbor from the query result;
(14) synchronization Gather all the local nearest neighbors at the MASTER;
(15) if  id = MASTER  then Compute the global nearest neighbor from the P candidates;
(16) synchronization Broadcast the global nearest neighbor;
(17) for  k ← 1  to  M  do
(18)  if  id = MASTER  then
(19)   Set L as the initial separation;
(20)   Set L' as the final separation after evolv steps;
(21)   Set sum ← sum + ln(L'/L);
(22)  repeat
(23)   Call frnn(y_f, scalmn, scalmx) on the local kd-tree, where y_f is the evolved fiducial point;
(24)   Compute the local replacement point from the query result;
(25)   synchronization Gather all the local replacement points at the MASTER;
(26)   if  id = MASTER  then Compute the global replacement point from the candidates;
(27)   synchronization Broadcast the global replacement point;
(28)  until  the distance to the evolved fiducial point  was a minimum;
(29)  if  a replacement point  was found  then  take it as the new neighbor  else  keep the evolved neighbor;
(30)  Advance the fiducial point by evolv steps;
(31) if  id = MASTER  then Print λ1, t_p;
(32) end
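The gather-and-broadcast synchronizations in lines 14-16 and 25-27 of Algorithm 2 can be expressed compactly with an MPI reduction. The C sketch below uses MPI_Allreduce with the MPI_MINLOC operator, which selects the candidate with the smallest distance and makes the result available on every process in a single collective call; the actual implementation may instead use separate MPI_Gather and MPI_Bcast calls, and the names here are illustrative.

#include <mpi.h>
#include <float.h>

/* Each process proposes its local candidate (distance, global point index);
 * MPI_MINLOC keeps the candidate with the smallest distance and delivers the
 * result to every process, so no separate broadcast is needed.  A process
 * without an admissible candidate passes DBL_MAX and an index of -1.         */
typedef struct { double dist; int idx; } candidate_t;

int global_nearest(double local_dist, int local_idx, double *global_dist)
{
    candidate_t local, global;
    local.dist = (local_idx >= 0) ? local_dist : DBL_MAX;
    local.idx  = local_idx;

    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE_INT, MPI_MINLOC, MPI_COMM_WORLD);

    *global_dist = global.dist;
    return global.idx;   /* index of the global nearest neighbor, or -1 if none */
}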

4. Experimental Results

As mentioned previously, the sequential brute force approach of Wolf's method is sufficient for short time series but becomes very expensive in terms of execution time for larger data records. This is evidenced in Table 1, which shows the sequential runtime of the method presented in Algorithm 1 using data records of one million points. As far as the authors know, there is no other implementation of this method based on a different approach.

Therefore, the main goal of our parallel implementation is to reduce the runtimes shown in Table 1. In order to test the performance and accuracy of the parallel approach presented in this work, we have considered four case studies as a benchmark. Three time series of a single variable were obtained from the Lorenz system [15], the Hénon map [16], and the Rössler system [17], which are classical examples of continuous and discrete dynamical systems in nonlinear time series analysis. The fourth is a realistic synthetic electrocardiogram (ECG) signal generated following McSharry et al. [18]. For each case study, the program computes the prediction horizon using a data record of one million points.

We checked the performance of the program on a distributed memory platform called GALGO. (The platform is hosted at the Albacete Research Institute of Informatics, http://www.i3a.uclm.es/.) This parallel computer is a cluster of dual-processor Intel Xeon E5450 nodes connected by an InfiniBand network; each processor has four cores. The platform runs the Red Hat Enterprise Linux operating system.

The experimental results are presented in terms of execution time, speed-up, and efficiency. These parallel metrics are defined as follows:
(i) Execution time: the sequential runtime of a program is the elapsed time between the beginning and the end of its execution on a sequential computer. The parallel runtime is the time that elapses from the moment a parallel computation starts to the moment the last processing element finishes its execution. We denote the sequential runtime by T_S and the parallel runtime by T_P.
(ii) Speed-up: a measure that captures the relative benefit of solving a problem in parallel. It is defined as the ratio of the time taken to solve a problem on a single processing element to the time required to solve the same problem on a parallel computer with P identical processing elements. We denote speed-up by the symbol S; mathematically, S = T_S / T_P.
(iii) Efficiency: a measure of the fraction of time for which a processing element is usefully employed. It is defined as the ratio of the speed-up to the number of processing elements. We denote efficiency by the symbol E; mathematically, E = S / P.

We use the Message Passing Interface (MPI) [19] to implement the task synchronization in the source code of the parallel approach. The implementations of Algorithms 1 and 2 are written in the C language (all the codes can be obtained from http://www.dsi.uclm.es/ntsa/). The results obtained are depicted in Figure 5. The runtimes cover the computation performed from line 12 to line 30 in Algorithm 2.

For clarity, we present the execution time results in Table 2. Note that the parallel runtimes for the single-process case are not the same as those shown in Table 1. This is because, in the first stage of this work, we implemented a sequential version of Algorithm 1 using a kd-tree data structure, and the times for that case were measured with this new approach. Hence, as a first result of our work, we obtained an important runtime reduction of Wolf's method, with the smallest gain over the original approach for the Hénon case and the largest for the Rössler case. In the second stage of this work, we introduced MPI into this new sequential approach to obtain the final parallel version.

As can be noted, for each case the parallel metrics are quite different, despite the fact that all the cases have the same data length N. We can explain this as follows. First, the fixed evolution time evolv determines the number of replacement steps M that pkdfet1 must perform; the values used for the Hénon and Rössler cases correspond to the maximal and the minimal M, respectively. Second, the size of the fixed-radius near-neighbor query is determined by both the scalmn and scalmx variables. At the same time, the query size determines the number of buckets that must be visited. This number of buckets may increase, decrease, or remain constant when more processes are added. If the number of buckets decreases, the performance gain achieved can be larger than expected because the kd-tree based algorithm behaves better; for example, the Hénon case exhibits a phenomenon known as superlinear speed-up over an intermediate range of processes. If the number of buckets increases or remains constant, the speed-up shows the expected behaviour, as in the other cases, where it stays below the number of processes. Moreover, the Hénon case does not show superlinear speed-up for the largest number of processes, probably because the number of buckets cannot decrease any further. In the Rössler case, we cannot achieve better parallel metrics because the introduction of the kd-tree in the first stage of the work already makes the neighbor search very fast even with few processes; consequently, as the number of processes grows, the synchronization has the most important influence on the parallel runtime.

In general, as expected, the parallel runtime improves when a larger number of processes is used, owing to the smaller number of comparisons in each query. However, a larger number of processes also involves more process synchronization. Hence, once the neighbor-search time reaches a very small value, the synchronization has the most important influence on the parallel runtime. This is the situation observed for the largest numbers of processes in both the Lorenz and Rössler case studies.

Finally, the efficiency shows that, for almost all the cases, the parallel approach performs best with a moderate number of processes. In a real-world application, the decision to use fewer or more processes is left to the researchers.

At this point, we want to mention that the kd-tree query is performed using a top-down approach over balanced trees. With a bottom-up approach and with unbalanced trees, we have obtained quite similar results.

To check the performance of the parallel approach using real time series, we downloaded two time series related to engineering applications from the UCI Machine Learning Repository [20]. In these cases we use the parallel approach with the number of processes that gave the best performance, as commented above. We run the implementation of Algorithm 2 using default values for both the embedding delay and the minimal embedding dimension.

The first case is a time series of measurements of electric power consumption in one household, with a one-minute sampling rate over a period of almost 4 years. In particular, the time series is the household global minute-averaged active power (in kilowatts). The metrics for this case are depicted in Figure 6.

The second case is a time series of measurements from a chemical sensor exposed to gas mixtures at varying concentration levels. This time series was constructed by continuous acquisition of the sensor signal for a duration of about 12 hours without interruption. In particular, the sensor was exposed to a gas mixture of ethylene and CO in air. The metrics for this case are depicted in Figure 7.

As can be seen, both cases show similar performance using default values in the parallel approach, despite having different values of N: the first case has 2 million data points and the second has 4 million. In general, as observed for the theoretical case studies, the parallel runtime improves with a larger number of processes; at the same time, both the speed-up and the efficiency decrease with P, but the efficiency remains above 60% up to 8 processes.

5. Conclusions and Future Works

A new parallel kd-tree based approach for computing the prediction horizon was presented. The results obtained carry over directly to Wolf's method, which is used here to estimate the prediction horizon through the maximal Lyapunov exponent. Moreover, a new sequential kd-tree based approach for that method was developed as part of this work. This sequential approach is several times faster than the original approach, with a factor that depends on the case study, and by itself represents an improvement of the method.

The new parallel approach is, in turn, several times faster than the new sequential approach over the range of processes tested on the distributed memory platform. As was presented, the results depend on each particular case. The best results were obtained when the advantage of the data organization of the kd-tree data structure can be fully exploited.

In terms of efficiency, for almost all the cases, the best performance is achieved with a small number of processes. However, considering the time requirements of real-world applications, using more processes makes the parallel program many times faster than the original brute force approach, which in the best case represents a reduction from more than one hour to the order of seconds.

As far as the authors know, when working with a kd-tree data structure on a distributed memory platform, the literature suggests essentially the same approach in different variants: the local kd-tree built on each node represents a subtree of the original tree obtained by a sequential construction. Here, a new approach is presented that decomposes the whole space into subspaces represented by the local trees; that is, the data decomposition technique is focused on the phase space instead of on the kd-tree. For this particular work, this new approach allows us to obtain a well-balanced task workload.

As future work, we will apply QR decomposition within Wolf's method to improve the accuracy of the maximal Lyapunov exponent estimation. With this new parallel implementation, we believe the whole spectrum of Lyapunov exponents could be computed, despite the fact that we work with the time series of a single physical variable instead of a model of the dynamical system.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.