Journal of Complex Systems

Volume 2015, Article ID 932750, 12 pages

http://dx.doi.org/10.1155/2015/932750

## Use of False Nearest Neighbours for Selecting Variables and Embedding Parameters for State Space Reconstruction

Institute of Measurement Science, Slovak Academy of Sciences, Dúbravská Cesta 9, 842 19 Bratislava, Slovakia

Received 18 September 2014; Revised 5 January 2015; Accepted 24 February 2015

Academic Editor: Yang Tang

Copyright © 2015 Anna Krakovská et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

If data are generated by a system with a *d*-dimensional attractor,
then Takens’ theorem guarantees that a reconstruction that is diffeomorphic
to the original attractor can be built from a single time series
in a (2*d* + 1)-dimensional phase space. However, under certain conditions,
reconstruction is possible even in a space of smaller dimension. This
topic is very important because the size of the reconstruction space
relates to the effectiveness of the whole subsequent analysis. In
this paper, the false nearest neighbour (FNN) methods are revisited
to estimate the optimum embedding parameters and the most appropriate
observables for state space reconstruction. A modification of the
false nearest neighbour method is introduced.
The findings contribute to evidence that the length of the embedding
time window (TW) is more important than the reconstruction delay time
and the embedding dimension (ED) separately. Moreover, if several
time series of the same system are observed, the choice of the one
that is used for the reconstruction could also be critical. The results
are demonstrated on two chaotic benchmark systems.

#### 1. Introduction

State space reconstruction is usually an unavoidable step before the analysis of a time series in terms of dynamical systems theory. Suppose that we have data (a single time series) that were presumably generated by a *d*-dimensional deterministic dynamical system. Then, the usual choice for a reconstruction is a matrix of time shifts of one variable, as supported by Takens’ theorem from 1981 [1]. Alternate methods of reconstruction, such as derivatives or linearly independent coordinates found by principal component analysis, can be seen as transformations of time-shift vectors. By one of these embedding procedures, a new state space is created that is (in the sense of diffeomorphism) equivalent to the original state space. The reconstruction preserves relevant geometrical and dynamical invariants, such as the fractal dimensions of the attractor, the entropies, or the Lyapunov exponents (which measure the sensitivity to the initial conditions).
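As a concrete illustration of the time-shift matrix described above, the following Python sketch builds the delay-coordinate reconstruction from a scalar series (the function name `delay_embedding` is ours, not from the paper):

```python
import numpy as np

def delay_embedding(x, m, tau):
    """Matrix of time-delay vectors: row i is (x[i], x[i+tau], ..., x[i+(m-1)tau])."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau          # number of reconstructed state vectors
    if n <= 0:
        raise ValueError("series too short for these embedding parameters")
    return np.column_stack([x[j * tau : j * tau + n] for j in range(m)])

# A 2-dimensional embedding of a sine with a quarter-period delay traces a circle.
t = np.arange(1000)
states = delay_embedding(np.sin(2 * np.pi * t / 248), m=2, tau=62)
```

Each row of the returned matrix is one point of the reconstructed trajectory, so downstream analyses (neighbour searches, invariant estimation) operate on its rows.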

Reconstruction requires decisions regarding the size of the reconstruction space, the value of the time shifts between the coordinates, and another important, although often overlooked, aspect: which observable, or which combination of observables (if several of them are available), is to be used for the reconstruction?

*Choice of the Time Delay.* The time-delayed versions of the known observable form an embedding from the original *d*-dimensional manifold into *m*-dimensional Euclidean space (where *m* is the embedding dimension and *τ* is the time lag between consecutive coordinates) [1, 2]. Theoretically, for noise-free data of unlimited length, the existence of a diffeomorphism between the original attractor and the reconstructed image is guaranteed for almost any choice of delay and a sufficiently high embedding dimension. In practice, however, the experimental time series can be short and noisy. Then, the quality of the reconstruction can vary depending on the choices for the time delay and the embedding dimension. If the delay is too small, then each coordinate is almost the same, and the reconstructed trajectories resemble a line (a phenomenon known as redundancy). Geometrically, this arrangement means that there are trajectory intersections at a small angle. In the case of noisy measurements, this circumstance makes the separation of trajectories impossible. On the other hand, if the delay is too large, then due to the sensitivity of the chaotic motion, the coordinates appear to be independent, and the reconstructed state portrait looks random or unnecessarily complicated (a phenomenon known as irrelevance). Such extremely inappropriate choices for the delay can be detected at first sight in a 2-dimensional delay plot.

To select the embedding parameters optimally, many competing approaches have been proposed. Most of them are based on heuristic reasoning rather than mathematically rigorous criteria. The simple idea is to unfold the reconstruction of the trajectories sufficiently to avoid self-crossing and extreme closeness of distinct parts. In particular, the delay that is used for reconstruction is often given by the first zero of the autocorrelation function or as the first minimum of the mutual information between the delayed components [3]. By using the first instead of the absolute minimum of the mutual information, the selection is biased toward small delays, to avoid irrelevance. The benefit of using the mutual information, as opposed to the autocorrelation function, is that the nonlinear character of the data is accounted for.

One method to avoid selecting the time delay is to use derivatives instead of delayed coordinates. In addition to the fact that this approach makes the embedding procedure delay-free, the derivative coordinates offer some further advantages. First, in some applications, they enable a clear physical interpretation. Moreover, the prediction results that are obtained in differential phase space could be better than in the time-delay phase space [4]. However, the largest problem is that the numerical estimate of the derivatives leads to errors and deteriorates quickly when calculating higher order derivatives. Any noise in the data would make the situation even worse [5].
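A derivative-coordinate reconstruction can be approximated numerically, for example with central differences. The sketch below is our own illustration (not the paper's method); the comment notes why noise is the weak point, since each differentiation amplifies high-frequency error:

```python
import numpy as np

def derivative_embedding(x, m, dt):
    """Reconstruct the state portrait from successive numerical derivatives
    (x, dx/dt, d2x/dt2, ...) instead of delayed copies of x."""
    coords = [np.asarray(x, dtype=float)]
    for _ in range(m - 1):
        # np.gradient uses central differences in the interior; each
        # differentiation step amplifies any high-frequency noise in x.
        coords.append(np.gradient(coords[-1], dt))
    return np.column_stack(coords)

# Clean test signal: the derivative coordinate of sin(t) should track cos(t).
t = np.linspace(0.0, 2.0 * np.pi, 1001)
E = derivative_embedding(np.sin(t), m=2, dt=t[1] - t[0])
```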

*Choice of the Embedding Dimension.* In addition to the time shift, one must choose a proper embedding dimension to be able to reconstruct the state portrait. The theorem of Whitney guarantees the possibility of embedding any *m*-dimensional smooth manifold into (2*m* + 1)-dimensional Euclidean space [6]. Sauer et al. generalised the theorem to fractal objects. They proved that, under some conditions regarding periodic orbits and the measurement function, almost every map from the fractal set *A* into *n*-dimensional Euclidean space with *n* > 2*d* forms an embedding, whereby *d* is the box-counting dimension of *A* [7]. This finding means that it is not the size of the manifold containing the original attractor that determines the minimal embedding dimension but only the fractal dimension *d*. However, even 2*d* represents only an upper limit: the embedding theorem does not rule out an embedding dimension that is lower than 2*d*.

Sometimes, the required size of the reconstruction space can be smaller because the goal of the investigation is less demanding. For example, for the numerical calculation of the correlation dimension of the attractor *A*, any dimension above the box-counting dimension of *A* is sufficient [8]. Of course, such cases do not guarantee that the attractor is mapped one-to-one; however, that is not necessary for dimension estimation.
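For reference, the correlation dimension mentioned here is commonly estimated from the Grassberger-Procaccia correlation sum; its log-log slope versus the radius gives the dimension. The sketch below is our own simplification (it omits the customary Theiler window for temporally correlated points):

```python
import numpy as np

def correlation_sum(points, r):
    """Fraction of distinct point pairs closer than r (Grassberger-Procaccia).
    The slope of log C(r) versus log r estimates the correlation dimension."""
    n = len(points)
    count = 0
    for i in range(n - 1):
        d = np.linalg.norm(points[i + 1:] - points[i], axis=1)
        count += int(np.count_nonzero(d < r))
    return 2.0 * count / (n * (n - 1))

# Points on a circle form a 1-dimensional set, so the local slope is close to 1.
theta = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
slope = np.log(correlation_sum(circle, 0.1) / correlation_sum(circle, 0.05)) / np.log(2.0)
```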

On the other hand, if the objective is to model or predict the future behaviour, then self-intersections are unacceptable, and a reconstruction of a dimension as high as 2*d* + 1 might be needed. However, in favourable cases, embeddings into lower-dimensional spaces could still exist. It is definitely worthwhile to explore such possibilities because, in practice, it is advantageous to construct embeddings of the lowest possible dimension (ideally of the original system’s dimension). What the favourable cases are and how not to miss them is now a subject of research. For example, Cross and Gilmore contributed to the issue when they analysed differential mappings of the rotationally equivariant Lorenz dynamical system [9]. They showed that, while the differential reconstruction based on the *x* coordinate is an embedding of the attractor in three dimensions, it does not yield an embedding of the entire manifold; that is, the projection of the manifold into three dimensions possesses singularities. However, it is possible to embed the manifold into a 3-dimensional twisted submanifold of the 4-dimensional space. Then, not only diffeomorphism invariants (such as fractal dimensions or Lyapunov exponents) but also information about the mechanism responsible for generating the chaotic behaviour is preserved. The two objects are actually isotopic (smoothly deformable into each other) in the 4-dimensional space. Nonisotopic embeddings provide distinct representations of the original state space because one might not be deformed into another without self-intersection. For *n* ≥ 2*m* + 2, any two embeddings of an *m*-manifold into *n*-dimensional space are isotopic [10]. This result is known as an isotopy version of the strong Whitney embedding theorem. Moreover, Cross and Gilmore have shown that, for 3-dimensional systems (under certain conditions on the genus of the attractor), all of the representations become equivalent already in five dimensions [11]. This result is, however, limited to attractors that exist in a 3-dimensional manifold because the considered topological indices are restricted to three dimensions.
Very little is known about lower than (2*d* + 1)-dimensional embeddings of dynamical systems with *d* > 3.

The choice of the minimal possible embedding dimension is not easy when the number of degrees of freedom of the original system is unknown. A space of an undervalued dimensionality does not unfold the trajectories, while an unnecessarily large embedding space can result in overfitting. Typically, the search for the proper dimension is based on a step-by-step expansion of the reconstruction space while simultaneously following some proper diffeomorphism invariant that is expected to stay constant after the sufficient embedding dimension is reached. As examples of such invariants, the correlation dimension, the largest Lyapunov exponent, predictability indices, and the percentage of false nearest neighbours have previously been mentioned. In the long run, the various options have been superseded in practice by the false nearest neighbour test [12].

*Choice of the Observable.* When considering the Takens or Sauer theorem, the variables of the system are assumed to be in equal positions regarding their use for the state space reconstruction. For example, the theorems guarantee that a 5-dimensional delay reconstruction from any variable (*x*, *y*, or *z*) of the Rössler system constitutes a diffeomorphism between the original manifold and the reconstructed image. However, in the case of the variable *y*, already a 3-dimensional differential reconstruction suffices for a diffeomorphism [13, 14]. For computational and modeling reasons, we would like to know whether some variables lead to a diffeomorphism in a space of smaller dimension than others and how low the minimal possible embedding dimension is. To contribute to solving this problem, Letellier et al. defined observability indices that enable a ranking of the observables according to their effectiveness in the reconstruction process [13, 15–17].

*Problems with the Standard Estimates of the Embedding Parameters.* In practice, the most commonly used method for selecting the embedding parameters consists of the first minimum of the mutual information to estimate the time delay and the FNN test to find the sufficient embedding dimension.

It should be emphasised that the selection of the reconstruction delay based on the mutual information holds for 2-dimensional embeddings but not necessarily for higher-dimensional embeddings. Even in the 2-dimensional case, the criterion can be regarded as effective only for a time series that has a single, dominant periodicity or recurrence time. In that case, the suitable lag is approximately one-quarter of the dominant period, and this value is in good agreement with the minimum of the mutual information or the first zero of the autocorrelation function. However, the same delay time is often used regardless of the number of delay coordinates that form the reconstruction, although some authors suggest lowering the delay time when increasing the dimension. They argue that the parameter that should be estimated independently is not the delay or the embedding dimension separately but rather the whole embedding time window, given as TW = (*m* − 1)*τ* [18–22]. Despite all this, no standard procedure for estimating the time window has emerged yet, and most researchers continue to use the same time delay regardless of the size of the reconstruction space.

In this work, we want to contribute to the debate about the importance of the time window and the possibility of using FNN methods as a tool for the optimal embedding parameters selection.

The paper is organised as follows.

In Section 2, a short review of the methods that use the idea of false nearest neighbours to estimate the parameters of the state space reconstruction is given. We also discuss the importance of the correct choice of observables, if several of them are at our disposal. Then, we describe the data that are used as benchmarks, and we introduce the three methods that were used for testing the data. In Section 2.3, a rank-based modification of the false nearest neighbour method is proposed.

In Section 3, we present the results that regard the effects of the reconstruction on the evolution of the false nearest neighbours, on the estimates of the correlation dimension, and on the errors in the predictions.

Finally, the findings are discussed and summarised in Section 4.

#### 2. Data and Methods

##### 2.1. False Nearest Neighbours Algorithms

The false nearest neighbours method is the most popular tool for the selection of the minimal embedding dimension. It is based on the assumption that two points that are near each other in a sufficient embedding dimension should remain close as the dimension increases. However, if the embedding dimension is too small, then points that are in reality far apart can seem to be neighbours (as a consequence of projecting into a space of smaller dimension). The various modifications of the method apply the same geometrical reasoning: one increases ED until the reconstructed image is unfolded. The method checks the neighbours in increasing embedding dimensions until it finds only a negligible number of false neighbours when going from dimension *m* to *m* + 1. This *m* is chosen as the lowest embedding dimension, which is presumed to give a reconstruction without self-intersections.

In the case of clean deterministic data, we expect the percentage of false neighbours to drop to zero when the proper dimension is reached. If the signal is too noisy, however, the method may fail because it attempts to unfold the noise as well.

From the several variants and modifications of the false neighbours algorithms [12, 23–26], we describe the two most commonly used.

###### 2.1.1. Kennel’s Algorithm

The false nearest neighbours method, as introduced by Kennel et al., is an iterative process [12]. An *m*-dimensional state portrait is reconstructed by taking the time-delayed coordinates of the observed time series. The time delay is set as the first minimum of the mutual information function [3]. Then, the algorithm takes each point in the *m*-dimensional portrait, finds the distance R_m to its nearest neighbour and, afterward, the distance R_{m+1} between the same two points in *m* + 1 dimensions. If the relative increase of the distance exceeds some threshold R_tol, then the neighbours are said to be false. One then repeats the process at higher dimensions, stopping when the proportion of false nearest neighbours becomes zero or sufficiently small and remains so from then onward. For clean data of infinite length, this criterion would be sufficient to determine the proper embedding dimension. However, noise combined with a limited amount of data can erroneously produce a finite embedding dimension. The problem turns out to be that even though two points are the nearest neighbours, they are not necessarily close to each other. Such points are also considered to be false neighbours, and this condition is checked by the second criterion: R_{m+1}/R_A > A_tol, where R_A is an estimate of the attractor size and A_tol is the second threshold. The authors advocate using this pair of criteria jointly by declaring a nearest neighbour false if either test fails. For data sets of a size and complexity similar to those in our study, Kennel et al. recommend the threshold settings R_tol = 10 and A_tol = 2. For each dimension, the percentage of false nearest neighbours is calculated. Eventually, the lowest dimension with no more false neighbours than we are prepared to tolerate is declared the optimal embedding dimension.
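A compact, brute-force rendering of the two criteria might look as follows. This is our own sketch for illustration (the published algorithm uses efficient neighbour searches, and the attractor-size estimate here is a crude standard deviation):

```python
import numpy as np

def kennel_fnn_fraction(x, m, tau, r_tol=10.0, a_tol=2.0):
    """Fraction of false nearest neighbours when passing from dimension m to m+1."""
    x = np.asarray(x, dtype=float)
    n = len(x) - m * tau                 # vectors usable in both dimensions
    emb = np.column_stack([x[j * tau : j * tau + n] for j in range(m)])
    r_attr = np.std(x)                   # crude estimate of the attractor size
    n_false = 0
    for i in range(n):
        d = np.linalg.norm(emb - emb[i], axis=1)
        d[i] = np.inf
        j = int(np.argmin(d))            # first nearest neighbour in dimension m
        r_m = d[j]
        gap = abs(x[i + m * tau] - x[j + m * tau])   # new (m+1)-th coordinate
        r_m1 = np.hypot(r_m, gap)
        # criterion 1: distance blows up; criterion 2: the pair spans the attractor
        if r_m == 0.0 or gap / r_m > r_tol or r_m1 / r_attr > a_tol:
            n_false += 1
    return n_false / n

# A quasiperiodic signal unfolds in two dimensions: the fraction drops at m = 2.
sig = np.sin(0.173 * np.arange(800))
```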

###### 2.1.2. Cao’s Algorithm

One of the problems of the FNN method stems from the subjective choice of several parameters: Kennel’s algorithm, for example, uses R_tol and A_tol to distinguish between true and false neighbours and another threshold parameter to determine when the fraction of FNN is sufficiently small (allowing the reconstruction space to be declared sufficiently large). Unfortunately, different threshold values can lead to different estimates of the optimal embedding dimension. To avoid this subjectivity, Cao introduced a modified algorithm that is sometimes called the averaged false neighbours method [24]. Instead of testing whether individual neighbours are false, Cao calculates how, on average, the distances between nearest neighbours change after going from dimension *m* to *m* + 1. The dimension at which the change stops is taken as the proper embedding dimension, on the assumption that the trajectory is then fully unfolded and adding further dimensions does not change the average distance between nearest neighbours. The main advantage of this method is that the number of subjectively chosen parameters is reduced.
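The averaged criterion can be sketched via Cao's E1 statistic, which saturates near 1 once the attractor is unfolded. The code below is our own minimal version (Chebyshev norm, as in Cao's formulation; function name is ours):

```python
import numpy as np

def cao_e1(x, m_max, tau):
    """Cao's E1(m) for m = 1..m_max: mean growth of nearest-neighbour distances
    from dimension m to m+1, normalised by the value at the previous dimension."""
    x = np.asarray(x, dtype=float)
    E = []
    for m in range(1, m_max + 2):
        n = len(x) - m * tau
        emb = np.column_stack([x[j * tau : j * tau + n] for j in range(m)])
        ratios = []
        for i in range(n):
            d = np.max(np.abs(emb - emb[i]), axis=1)   # Chebyshev distances
            d[i] = np.inf
            j = int(np.argmin(d))
            d_m = d[j]
            if d_m > 0.0:
                # distance between the same pair after adding one coordinate
                d_m1 = max(d_m, abs(x[i + m * tau] - x[j + m * tau]))
                ratios.append(d_m1 / d_m)
        E.append(np.mean(ratios))
    return [E[k + 1] / E[k] for k in range(m_max)]     # E1(1), ..., E1(m_max)

sig = np.sin(0.173 * np.arange(600))
e1 = cao_e1(sig, 4, 9)
```

The dimension at which E1 stops changing (stays near 1) is read off as the embedding dimension; no true/false threshold is needed.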

###### 2.1.3. Comparison of the FNN Methods

To compare different FNN methods, Cellucci et al. [27] tested them on the Rössler system and the Mackey-Glass equation. Five criteria for selecting embedding parameters were applied to the observables of the systems. For the resulting combinations of embedding parameters, the largest Lyapunov exponent was calculated by using a procedure published in [28] and was compared against the values determined by the more exhaustive, analytically based calculations published by Benettin et al. [29]. The criterion that best reproduced the reference values of the Lyapunov exponents was considered the most successful. The best identification of the embedding dimension was achieved with the method of Kennel [12], and the best value of the time shift was found by using the mutual information.

In another comparative study, conducted by Letellier and his colleagues [30], three classical tests for whether a mapping is an embedding, based on geometric and dynamical measures, were compared with a fourth test, which depended on a topological measure (the Gauss linking number). The tests involved estimating the fraction of false near neighbours, the correlation dimension, and the largest Lyapunov exponent as functions of the embedding parameters. Finally, the topological test proposed by the authors was based on the idea that, in regions where intersections of unstable periodic orbits occur and the linking numbers of the orbits change, the mapping cannot be an embedding. For the test examples, the authors used a periodically driven Takens-Bogdanov oscillator and a modification of the Malkus-Robbins equations, which were originally introduced to model the action of a self-exciting dynamo. Due to limitations of the topological test, a comparison of the methods could be performed only for mappings into three dimensions. The authors found that the classical tests often fail to identify when the mapping is an embedding. They suggested that all claims for successful embeddings into three or higher dimensions that are based on geometric or dynamical methods should be treated with the greatest skepticism.

##### 2.2. Suitability of Variables for Reconstruction

Another issue that is not satisfactorily resolved concerns the fact that if we have more observables from the same system, they do not appear to be equivalent with respect to the phase space reconstruction. Different variables can contain different levels of information [13, 21]. For example, it is much easier to obtain a global model from the *y* variable of the Rössler system than from the *z* variable.

Moreover, Whitney’s embedding theorem ensures that a combination of different variables can form an embedding as well. For example, multichannel measurements are typical in neurology. In such cases, instead of time-lagged copies of a single variable, one can use several different simultaneously measured observables, or one can build a mixed time-delay and multivariate embedding.

As already mentioned in the Introduction, it would be useful to have an index that enables a ranking of the observables according to their effectiveness in the reconstruction process. In control theory, the notion of observability is well defined for linear systems. A linear system is evaluated as either observable or not. If a system is observable, then from the system’s outputs, it is possible to determine the behaviour of the entire system. If it is not observable, then the output data do not allow us to estimate the states of the system completely. To check whether a linear system with *n* states is observable, the rank of the so-called observability matrix is calculated. If it is equal to *n*, then the rows are linearly independent, the initial state can be recovered from a sequence of observations and inputs, and the system is observable in Kalman’s sense [31].
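The Kalman rank test described above takes only a few lines; the sketch below (our own illustration, with an autonomous system and output matrix C) checks whether the observability matrix has full rank:

```python
import numpy as np

def is_observable(A, C):
    """Kalman rank test: (A, C) is observable iff the observability matrix
    [C; CA; ...; CA^(n-1)] has full rank n."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    C = np.atleast_2d(np.asarray(C, dtype=float))
    n = A.shape[0]
    blocks, row = [], C
    for _ in range(n):
        blocks.append(row)
        row = row @ A
    return int(np.linalg.matrix_rank(np.vstack(blocks))) == n

# Harmonic oscillator: observing the position alone reveals the full state.
A_osc = np.array([[0.0, 1.0], [-1.0, 0.0]])
observable = is_observable(A_osc, [[1.0, 0.0]])
```

For the oscillator, the velocity is recoverable from the position history, so the test returns True; observing one mode of a decoupled diagonal system, by contrast, fails the test.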

Since 1998, to extend the theory of observability to nonlinear systems, Letellier et al. introduced several measures that rank the variables of the system according to their observability [13, 15–17]. The indices were derived for systems that have known equations, and they appear to be ranking the variables quite well. In [16], a procedure for comparing two observables of the same system without a need for the system equations is proposed. This time-series approach is based on the so-called omnidirectional nonlinear correlation functions, and it agrees relatively well with the earlier indices with respect to the observability order of some benchmark systems’ variables.

In this paper, the so-called symbolic observability coefficient will be used for comparison purposes [17]. Its computation requires knowledge of the equations of the system, and it is based on the so-called fluency matrix, which distinguishes constant and nonconstant elements of the Jacobian matrix; these elements correspond to linear and nonlinear terms in the vector field of the system. The symbolic observability coefficients are greater than one when the dimension of the reconstructed state space is too large. They allow choosing from the system equations the best variable or the best combination of variables for univariate (resp., multivariate) reconstruction. The observability coefficients provide an upper limit for the size of the reconstruction space that is sometimes smaller than the one provided by the Takens criterion. These computations indicate that observability is more related to the couplings between dynamical variables than to the dynamical regime itself. The authors also claim that observability is related to the possibility of rewriting the system in a polynomial form while using only the chosen observable. Provided that the coordinate transformation is a global diffeomorphism in *n*-dimensional space and the original system is polynomial, the system can be rewritten in the form of an *n*th-order ordinary differential equation in polynomial form.

##### 2.3. False First Nearest Neighbour (FFNN) Method

In this section, we introduce a new modification of the false neighbour methods, which we then use to find the best embedding parameters. The basic idea is to use a rank-based modification of the FNN method and to create maps that visualise the evaluation of false neighbours over combinations of values of the delay and the embedding dimension.

When designing the algorithm, we intended to leave no room for subjective choices of thresholds in the method. Moreover, we also aimed to reduce another serious problem of the FNN methods, which is related to the phenomenon called the curse of dimensionality. When the dimensionality increases, the volume of the space grows so fast that the available data soon become sparse. To obtain statistically reliable results, the amount of data would need to grow exponentially with the dimensionality. Even with enormously large data sets, it is usually not recommended to use the algorithms in more than 10–15 dimensions. In [32], the authors explored the effect of increasing dimensionality on the nearest neighbour problem. They showed that, under a broad set of conditions, as the dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. This behaviour is obviously a problem because it indicates poor discrimination ability, which arises from the fact that all of the distances between pairs of data elements appear to be very similar. In such cases, the use of rank-based measures can be considered because they appear to be less prone to the curse of dimensionality than the primary distances from which the rankings are derived [33].
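The distance-concentration effect reported in [32] is easy to reproduce numerically. The toy experiment below (our own illustration with uniform random clouds) measures the relative spread of distances from a query point as the dimension grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_contrast(dim, n_points=500):
    """Relative spread (d_max - d_min) / d_min of distances from a random query
    point to a random cloud; it shrinks as the dimensionality grows."""
    cloud = rng.random((n_points, dim))
    d = np.linalg.norm(cloud - rng.random(dim), axis=1)
    return (d.max() - d.min()) / d.min()

contrast_low = distance_contrast(2)       # large: neighbours are well separated
contrast_high = distance_contrast(1000)   # small: all points look equidistant
```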

In this study, we use quite large data sets and low dimensions. Therefore, problems with dimensionality need not be critical, and theoretically, we could use the behaviour of the average distance of the nearest neighbours (the method of Cao) for creating the maps. Nevertheless, we suggest avoiding the use of distances, and we prefer to rely only on counting the number of shared neighbours. In particular, we assess the behaviour of the first nearest neighbours when going from *m*-dimensional space to (*m* + 1)-dimensional space, to obtain a secondary measure that is induced by the primary distance measure (here, the Euclidean norm).

Here is our modification of the false nearest neighbour algorithms.

(1) Take the observable of the system and, for combinations of time delays and embedding dimensions, form time-delay reconstructions.

(2) In both the *m*-dimensional and the (*m* + 1)-dimensional reconstructions, identify the closest point (the first neighbour in the Euclidean sense) to each point on the reconstructed trajectory.

(3) Quantify the rank-based measure of the FFNN method as the percentage of cases in which the nearest neighbour to a point in *m*-dimensional space ceases to be the nearest neighbour of the same point in the space of one higher dimension.

(4) Visualise the results, for example, as a colour-coded map, to detect the best combinations of parameters for the state space reconstruction.
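The steps above can be sketched as follows. This is our own plain brute-force rendering for illustration, not the authors' implementation:

```python
import numpy as np

def ffnn_percentage(x, m, tau):
    """Percentage of points whose first (Euclidean) nearest neighbour in the
    m-dimensional reconstruction is no longer first in m+1 dimensions."""
    x = np.asarray(x, dtype=float)
    n = len(x) - m * tau                 # points available in both spaces
    emb_lo = np.column_stack([x[j * tau : j * tau + n] for j in range(m)])
    emb_hi = np.column_stack([x[j * tau : j * tau + n] for j in range(m + 1)])
    changed = 0
    for i in range(n):
        d_lo = np.linalg.norm(emb_lo - emb_lo[i], axis=1)
        d_hi = np.linalg.norm(emb_hi - emb_hi[i], axis=1)
        d_lo[i] = d_hi[i] = np.inf       # exclude the point itself
        if int(np.argmin(d_lo)) != int(np.argmin(d_hi)):
            changed += 1
    return 100.0 * changed / n

# Scanning (tau, m) combinations and plotting the values as a colour map
# reveals the parameter regions where neighbourhood relations are stable.
sig = np.sin(0.173 * np.arange(600))
```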

The best embedding parameters bring us as close as possible to the reconstruction for which the nearest neighbours remain nearest neighbours in larger state spaces. However, can we expect that the reconstruction that is chosen as optimal by the FFNN method is also the most suitable for purposes such as the estimation of Lyapunov exponents, modeling, or forecasting? Let us check this possibility at least for two applications—the estimation of the correlation dimension and a nonlinear one-point prediction. To take advantage of dynamical systems whose properties are quite well known, we are going to use the Rössler and the Lorenz system as the benchmark systems.

##### 2.4. Rössler System

As the first test example, we use the Rössler system [34]:

dx/dt = −y − z,
dy/dt = x + ay,
dz/dt = b + z(x − c),

with the parameters set to the commonly used chaotic values a = 0.2, b = 0.2, and c = 5.7. The system was integrated by means of the fourth-order Runge-Kutta formula. The initial transient points were discarded, and the subsequent data points were saved.
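Such a simulation can be reproduced with a straightforward fixed-step integrator. The sketch below is our own; the parameter values, step size, and initial condition are illustrative assumptions rather than a quotation of the paper's exact setup:

```python
import numpy as np

def rossler_rk4(n_steps, dt=0.05, a=0.2, b=0.2, c=5.7, state=(1.0, 1.0, 1.0)):
    """Integrate the Rossler equations with a fixed-step 4th-order Runge-Kutta."""
    def f(s):
        x, y, z = s
        return np.array([-y - z, x + a * y, b + z * (x - c)])
    s = np.array(state, dtype=float)
    out = np.empty((n_steps, 3))
    for k in range(n_steps):
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        s = s + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        out[k] = s
    return out

traj = rossler_rk4(5000)[1000:]          # discard an initial transient
```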

Spectral analysis of any of the variables shows a dominant peak, which corresponds to one revolution around the attractor. A common prescription for the reconstruction delay is then a quarter of that period. Finding the first minimum of the mutual information and the first zero crossing of the autocorrelation function also suggests approximately the same value for all three variables. Therefore, the established way to proceed would be to use this value as the time delay and to look for the minimum necessary embedding dimension by some nearest neighbours method.

For a long time, possible differences between the levels of observability of different variables were ignored. According to Takens’ theorem, a 5-dimensional reconstruction from any of the variables ensures a diffeomorphism between the original phase space of the Rössler system and the reconstructed space. However, the investigation of the system shows that the derivative reconstruction from the variable *y* is globally diffeomorphic to the original state portrait in three dimensions already. A global model from the observable *y* can be obtained with relative ease. On the other hand, *x* and *z* need at least a 4-dimensional derivative reconstruction space, and especially the variable *z* is known to be a very problematic basis for reconstruction and modeling [14, 17, 35]. Moreover, reconstructions other than the derivative one might even need an additional dimension to fully unfold the attractor. The values of the symbolic observability coefficients confirm that *y* is the best and *z* is the worst observable of the Rössler system. The values can be found in Table 1.