Abstract

Experimental data are often very complex, since the underlying dynamical system may be unknown and the data may be heavily corrupted by noise. Properly analyzing such data to extract maximal information about the underlying dynamical system is a crucial task. This paper presents a novel principal component analysis (PCA) method based on symplectic geometry, called symplectic PCA (SPCA), for studying nonlinear time series. Being nonlinear, it differs from the traditional PCA method, which is based on linear singular value decomposition (SVD), and is therefore expected to represent nonlinear, especially chaotic, data better than PCA. Using chaotic Lorenz time series data, we show that this is indeed the case. Furthermore, we show that SPCA can conveniently reduce measurement noise.

1. Introduction

Data measured in experimental situations, especially in real environments, can be very complex, since the underlying dynamical system may be nonlinear with unknown structure and the data may be very noisy. It is challenging to appropriately analyze such measured data, especially noisy data. Since chaotic phenomena were discovered, interpreting the irregular dynamics of various systems as a deterministic chaotic process has become popular and widely used in almost all fields of science and engineering. A number of important algorithms based on chaos theory have been employed to infer the system dynamics from data or to reduce noise in data [1–6]. The first step of these approaches is to reconstruct a phase space from the data so that the dynamical characteristics of the system can be properly studied [7]. This is achieved using Takens' embedding theorem [8], which states that in the noise-free case the system dynamics can be reconstructed from a one-dimensional signal, that is, a time series. However, actual systems are often noisy—sometimes so noisy that the reconstructed attractor of the nonlinear system can exhibit different features when different analysis techniques are used [9–12]. Appropriate analysis of measured data is therefore a critical task in science and engineering. In this work, we propose a novel nonlinear analysis method based on symplectic geometry and principal component analysis, called symplectic principal component analysis (SPCA).

Symplectic geometry is a geometry of phase space and is intrinsically nonlinear, so it can describe system structure, especially nonlinear structure, very well. It has been used to study various nonlinear dynamical systems [13–15] since Feng [16] proposed a symplectic algorithm for solving Hamiltonian differential equations. However, from the viewpoint of data analysis, few studies have employed symplectic geometry theory to explore the dynamics of a system. Our previous works proposed an estimate of the embedding dimension of a time series based on symplectic geometry [17–20]. Subsequently, Niu et al. used our method to evaluate sprinters' surface EMG signals [21], and Xie et al. [22] proposed a kind of symplectic geometry spectrum based on our work. In this paper, we show that SPCA can represent chaotic time series well and reduce noise in chaotic data.

2. Method

Consider a dynamical system defined in a phase space R^d. A discretized trajectory at times t = n t_s, n = 1, 2, …, may be described by a map of the form

\[ x_{n+1} = f(x_n). \tag{2.1} \]

In SPCA, a fundamental step is to build the multidimensional structure (attractor) in symplectic geometry space. Here, in terms of Takens' embedding theorem, we first construct an attractor in phase space, that is, the trajectory matrix X, from a time series. Then, we describe the symplectic principal component analysis (SPCA) based on symplectic geometry theory and give its corresponding algorithm.

2.1. Attractor Reconstruction

Let the measured data (the observable of the system under study) x_1, x_2, …, x_n be recorded with sampling interval t_s, where n is the number of samples. Takens' embedding theorem states that if the time series indeed consists of scalar measurements of the state of a dynamical system, then, under certain genericity assumptions, a one-to-one image of the original set {x} is given by the time-delay embedding, provided d is large enough. That is, the time-delay embedding maps the series into the d-dimensional vectors X_i^T, i = 1, …, m:

\[
X = \begin{bmatrix} X_1^T \\ X_2^T \\ \vdots \\ X_m^T \end{bmatrix}
  = \begin{bmatrix}
      x_1 & x_2 & \cdots & x_d \\
      x_2 & x_3 & \cdots & x_{d+1} \\
      \vdots & \vdots & & \vdots \\
      x_m & x_{m+1} & \cdots & x_n
    \end{bmatrix},
\tag{2.2}
\]

where d is the embedding dimension, m = n - d + 1 is the number of points of the d-dimensional reconstructed attractor, and X_{m×d} denotes the trajectory matrix of the dynamical system in phase space, that is, the attractor in phase space.
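As a concrete illustration, the trajectory matrix of (2.2) can be assembled in a few lines. The sketch below uses our own helper name `trajectory_matrix`, which is not from the paper:

```python
def trajectory_matrix(x, d):
    """Build the m-by-d time-delay trajectory matrix of (2.2).

    x : list of scalar measurements x_1, ..., x_n
    d : embedding dimension
    Returns the m = n - d + 1 delay vectors X_i = (x_i, ..., x_{i+d-1}).
    """
    n = len(x)
    m = n - d + 1
    if m < 1:
        raise ValueError("series too short for embedding dimension d")
    return [x[i:i + d] for i in range(m)]

# Example: n = 6 samples embedded with d = 3 gives m = 4 delay vectors.
X = trajectory_matrix([1, 2, 3, 4, 5, 6], 3)
```

Each successive row is the previous row shifted by one sample, exactly the unit-delay embedding used throughout the paper.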

2.2. Symplectic Principal Component Analysis

SPCA is a PCA approach based on symplectic geometry. Its idea is to map the investigated complex system into symplectic space and elucidate the dominant features underlying the measured data. The first few, larger components capture the main relationships between the variables in symplectic space; the remaining components consist of the less important components, or of noise, in the measured data. The geometry used in symplectic space is symplectic geometry. Unlike Euclidean geometry, symplectic geometry is an even-dimensional geometry with a special symplectic structure. It is based on a bilinear, antisymmetric, nonsingular cross product—the symplectic cross product:

\[ [x, y] = \langle x, Jy \rangle, \tag{2.3} \]

where

\[ J = J_{2n} = \begin{pmatrix} 0 & +I_n \\ -I_n & 0 \end{pmatrix}. \tag{2.4} \]

When n = 1, x = [x_1, x_2]^T, y = [y_1, y_2]^T, and J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, so

\[ [x, y] = x^T J y = x_1 y_2 - x_2 y_1 = \begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix}. \tag{2.5} \]

The measure of symplectic space is area. In symplectic space, the length of an arbitrary vector always equals zero and is therefore meaningless, but there is a concept of orthogonal cross-course. The symplectic transform, also called the canonical transform, is in essence a nonlinear transform: it is measure preserving and keeps the natural properties of the original data unchanged, so it is well suited to nonlinear dynamical systems.
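For n = 1, the symplectic cross product (2.5) reduces to a 2×2 determinant. The sketch below (helper names are ours) checks its antisymmetry and the zero symplectic "length" of every vector mentioned above:

```python
def symplectic_product(x, y):
    """[x, y] = <x, Jy> with J = [[0, 1], [-1, 0]], as in eq. (2.5)."""
    return x[0] * y[1] - x[1] * y[0]

x, y = [1.0, 2.0], [3.0, 5.0]
p = symplectic_product(x, y)          # x1*y2 - x2*y1 = 1*5 - 2*3 = -1
anti = symplectic_product(y, x)       # antisymmetry: [y, x] = -[x, y]
self_len = symplectic_product(x, x)   # every vector has zero symplectic "length"
```

The value `p` is the signed area of the parallelogram spanned by x and y, which is why the text calls area the measure of symplectic space.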

The symplectic principal components are given by a symplectic similarity transform, analogous to SVD-based PCA, and the corresponding eigenvalues can be obtained by the symplectic QR method. Here, we first construct the autocorrelation matrix A_{d×d} of the trajectory matrix X_{m×d}. The matrix A can then be embedded in a Hamiltonian matrix M in symplectic space.

Theorem 2.1. Any d×d matrix A can be made into a Hamiltonian matrix M by setting

\[ M = \begin{pmatrix} A & 0 \\ 0 & -A^T \end{pmatrix}, \tag{2.6} \]

where M is a Hamiltonian matrix.

Theorem 2.2. A Hamiltonian matrix M remains Hamiltonian under symplectic similarity transforms.

Theorem 2.3. Let M ∈ C^{2d×2d} be a Hamiltonian matrix; then e^M is a symplectic matrix.

Theorem 2.4. Let S ∈ C^{2d×2d} be a symplectic matrix. Then S = QR, where Q is a symplectic unitary matrix and R is an upper triangular matrix.

Theorem 2.5. The product of symplectic matrices is also a symplectic matrix.

Theorem 2.6. Suppose that the Householder matrix H is

\[ H = H(k, \omega) = \begin{pmatrix} P & 0 \\ 0 & P \end{pmatrix}, \tag{2.7} \]

where

\[ P = I_n - 2\frac{\varpi \varpi^*}{\varpi^* \varpi}, \qquad \varpi = (0, \dots, 0; \omega_k, \dots, \omega_d)^T \neq 0. \tag{2.8} \]

Then H is a symplectic unitary matrix, where \varpi^* is the conjugate transpose of \varpi.

Proof. To prove that H is a symplectic matrix, we only need to show that H^* J H = J:

\[ H^* J H = \begin{pmatrix} P^* & 0 \\ 0 & P^* \end{pmatrix} J \begin{pmatrix} P & 0 \\ 0 & P \end{pmatrix} = \begin{pmatrix} 0 & P^* P \\ -P^* P & 0 \end{pmatrix}. \tag{2.9} \]

Since P = I_n - 2\varpi\varpi^*/(\varpi^*\varpi) is Hermitian,

\[ P^* P = P P = P^2 = \left(I_n - 2\frac{\varpi\varpi^*}{\varpi^*\varpi}\right)\left(I_n - 2\frac{\varpi\varpi^*}{\varpi^*\varpi}\right) = I_n - 4\frac{\varpi\varpi^*}{\varpi^*\varpi} + 4\frac{\varpi(\varpi^*\varpi)\varpi^*}{(\varpi^*\varpi)(\varpi^*\varpi)} = I_n, \tag{2.10} \]

where \varpi = (0, \dots, 0; \omega_k, \dots, \omega_d)^T \neq 0. Plugging (2.10) into (2.9), we have

\[ H^* J H = J \;\Rightarrow\; H \text{ is a symplectic matrix}; \qquad H^* H = \begin{pmatrix} P^* P & 0 \\ 0 & P^* P \end{pmatrix} = I_{2n} \;\Rightarrow\; H \text{ is also a unitary matrix}. \tag{2.11} \]

Hence H is a symplectic unitary matrix.
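The claim of Theorem 2.6 is easy to check numerically for a small case. The sketch below (helper names are ours) builds P = I - 2ϖϖ*/(ϖ*ϖ) for a real ϖ with d = 2 and verifies both H*JH = J and H*H = I:

```python
def matmul(A, B):
    """Plain dense matrix product for square lists-of-lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def householder_P(w):
    """P = I - 2 w w^T / (w^T w) for a real vector w, as in eq. (2.8)."""
    d = len(w)
    ww = sum(wi * wi for wi in w)
    return [[(1.0 if i == j else 0.0) - 2.0 * w[i] * w[j] / ww
             for j in range(d)] for i in range(d)]

d = 2
w = [0.0, 3.0]                      # varpi = (0, ..., 0; w_k, ..., w_d)^T != 0
P = householder_P(w)

# H = diag(P, P) and J = [[0, I], [-I, 0]] as 2d x 2d block matrices (2.7), (2.4)
H = [[P[i % d][j % d] if (i < d) == (j < d) else 0.0
     for j in range(2 * d)] for i in range(2 * d)]
J = [[0.0] * (2 * d) for _ in range(2 * d)]
for i in range(d):
    J[i][d + i] = 1.0
    J[d + i][i] = -1.0

HT = [list(row) for row in zip(*H)]   # H is real, so H* = H^T
HJH = matmul(matmul(HT, J), H)        # should equal J  (symplectic)
HH = matmul(HT, H)                    # should equal I_{2d}  (unitary)
```

This mirrors the proof: P² = I makes the off-diagonal blocks of (2.9) reduce to ±I, recovering J.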

For the Hamiltonian matrix M, the eigenvalues can be obtained by a symplectic similarity transform, and the primary 2d-dimensional problem can be reduced to a d-dimensional one [17–19], as follows.

(i) Let N = M², with M written in the general Hamiltonian form:

\[ N = M^2 = \begin{pmatrix} A & G \\ F & -A^{\tau} \end{pmatrix}^{2}. \tag{2.12} \]

(ii) Construct a symplectic matrix Q with

\[ Q^{\tau} N Q = \begin{pmatrix} B & R \\ 0 & B^{\tau} \end{pmatrix}, \tag{2.13} \]

where B is an upper Hessenberg matrix (b_{ij} = 0 for i > j + 1). The matrix Q may be a symplectic Householder matrix H. If the matrix M is real symmetric, M itself can be taken as N. One then obtains an upper Hessenberg matrix (referred to in (2.13)), namely,

\[ HMH = \begin{pmatrix} P & 0 \\ 0 & P \end{pmatrix} \begin{pmatrix} A & 0 \\ 0 & A^{\tau} \end{pmatrix} \begin{pmatrix} P & 0 \\ 0 & P \end{pmatrix} = \begin{pmatrix} PAP & 0 \\ 0 & PA^{\tau}P \end{pmatrix} = \begin{pmatrix} B & 0 \\ 0 & B^{\tau} \end{pmatrix}, \tag{2.14} \]

where H is the symplectic Householder matrix.

(iii) Calculate the eigenvalues λ(B) = {μ_1, μ_2, …, μ_d} using the symplectic QR decomposition method; if M is real symmetric, then the eigenvalues of A equal those of B:

\[ \mu = \lambda(B) = \lambda(A), \qquad \lambda(A) = \lambda^{2}(X). \tag{2.15} \]

(iv) Sort the eigenvalues μ = {μ_1, μ_2, …, μ_d} in descending order, that is,

\[ \mu_1 > \mu_2 > \cdots > \mu_k \ge \mu_{k+1} \ge \cdots \ge \mu_d. \tag{2.16} \]

Thus, the computation in the 2d-dimensional space is reduced to one in a d-dimensional space. The values μ form the symplectic principal component spectrum of A with respect to the relevant symplectic orthonormal bases. In the so-called noise floor, the values μ_i, i = k+1, …, d, reflect the noise level in the data [18, 19]. The corresponding matrix Q gives the symplectic eigenvectors of A.

2.3. Proposed Algorithm

For measured data x_1, x_2, …, x_n, our proposed algorithm consists of the following steps.

(1) Reconstruct the attractor X_{m×d} from the measured time series, where d is the embedding dimension of the matrix X and m = n - d + 1.

(2) Remove the mean value X_mean of each row of the matrix X.

(3) Build the real d×d symmetric matrix A, that is,

\[ A = (X - X_{\mathrm{mean}})^{T}(X - X_{\mathrm{mean}}). \tag{2.17} \]

Here, d should be larger than the dimension of the system, in terms of Takens' embedding theorem.

(4) Calculate the symplectic principal components of the matrix A by QR decomposition, and obtain the Householder transform matrix Q.

(5) Construct the corresponding principal eigenvalue matrix W according to the number k of chosen symplectic principal components of the matrix A, where W ⊆ Q. That is, when k = d, W = Q; otherwise, W ⊂ Q. In practice, k can be chosen according to (2.16).

(6) Get the transformed coefficients S = {S_1, S_2, …, S_m}, where

\[ S_i = W^{T} X_i, \quad i = 1, \dots, m. \tag{2.18} \]

(7) Reestimate X_s from S:

\[ X_{si} = W S_i. \tag{2.19} \]

Then the reestimated data x_{s1}, x_{s2}, …, x_{sm} can be obtained.

(8) For a noisy time series, the first estimate of the data is usually not good. In that case, one can go back to step (6), set X_i = X_{si} in (2.18), and perform steps (6) and (7) again. Generally, the second estimate will be better than the first.
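The steps above can be sketched end to end. Since the paper's symplectic QR routine is not reproduced here, the sketch below substitutes a plain cyclic Jacobi eigensolver for the real symmetric matrix A of (2.17)—for real symmetric A the eigenvalues agree with (2.15)—so treat this as an illustration of steps (1)–(7) under that assumption, not the authors' implementation:

```python
import math

def jacobi_eig(A, sweeps=50):
    """Eigen-decompose a real symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, eigenvector columns), unsorted."""
    d = len(A)
    A = [row[:] for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
        if off < 1e-24:
            break
        for p in range(d - 1):
            for q in range(p + 1, d):
                if abs(A[p][q]) < 1e-15:
                    continue
                theta = 0.5 * math.atan2(2 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(d):   # A <- G^T A (rotate rows p, q)
                    A[p][k], A[q][k] = c * A[p][k] - s * A[q][k], s * A[p][k] + c * A[q][k]
                for k in range(d):   # A <- A G (rotate columns p, q)
                    A[k][p], A[k][q] = c * A[k][p] - s * A[k][q], s * A[k][p] + c * A[k][q]
                for k in range(d):   # accumulate eigenvectors V <- V G
                    V[k][p], V[k][q] = c * V[k][p] - s * V[k][q], s * V[k][p] + c * V[k][q]
    return [A[i][i] for i in range(d)], V

def spca_filter(x, d, k):
    """Steps (1)-(7): embed, center, keep k principal directions, reconstruct."""
    m = len(x) - d + 1
    X = [x[i:i + d] for i in range(m)]                     # step (1): attractor
    mean = [sum(col) / m for col in zip(*X)]               # step (2): remove mean
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    A = [[sum(Xc[i][a] * Xc[i][b] for i in range(m))       # step (3): A (eq. 2.17)
          for b in range(d)] for a in range(d)]
    mu, V = jacobi_eig(A)                                  # step (4), Jacobi stand-in
    order = sorted(range(d), key=lambda j: -mu[j])[:k]     # step (5): W = top-k cols
    W = [[V[i][j] for j in order] for i in range(d)]
    S = [[sum(W[i][j] * Xc[r][i] for i in range(d)) for j in range(k)]
         for r in range(m)]                                # step (6): S_i = W^T X_i
    Xs = [[sum(W[i][j] * S[r][j] for j in range(k)) + mean[i]
           for i in range(d)] for r in range(m)]           # step (7): X_si = W S_i
    return Xs

xs = spca_filter([math.sin(0.1 * i) for i in range(200)], 8, 3)
```

For a clean sinusoid the delay vectors span a two-dimensional subspace, so keeping k = 3 components already reconstructs the embedded data almost exactly, which mirrors the k = d behavior reported for clean Lorenz data in Section 4.1.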

Note that, for a clean time series, step (8) is unnecessary.

3. Numerical and Experimental Data

In order to investigate the feasibility of SPCA, this paper employs the chaotic Lorenz time series:

\[
\dot{x} = \sigma(y - x), \qquad
\dot{y} = \gamma x - y - xz, \qquad
\dot{z} = -bz + xy, \qquad
s(t) = x(t) + e(t),
\tag{3.1}
\]

where σ = 10, b = 8/3, γ = 28, x(0) = 5, y(0) = 5, and z(0) = 15. Here, e is white Gaussian measurement noise; measurement noise is used because all real measurements are polluted by noise. For more details on notions of noise, refer to the literature [23–26].
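The series s(t) in (3.1) can be generated with any standard ODE integrator. The sketch below uses a fourth-order Runge-Kutta step; the step size, number of samples, and noise generator are our choices, not stated in the paper:

```python
import random

def lorenz_rhs(state, sigma=10.0, gamma=28.0, b=8.0 / 3.0):
    """Right-hand side of the Lorenz equations in (3.1)."""
    x, y, z = state
    return (sigma * (y - x), gamma * x - y - x * z, -b * z + x * y)

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def add(u, v, c):
        return tuple(ui + c * vi for ui, vi in zip(u, v))
    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(add(state, k1, h / 2))
    k3 = lorenz_rhs(add(state, k2, h / 2))
    k4 = lorenz_rhs(add(state, k3, h))
    return tuple(s + h / 6 * (a + 2 * b_ + 2 * c + e)
                 for s, a, b_, c, e in zip(state, k1, k2, k3, k4))

random.seed(0)
h, state = 0.01, (5.0, 5.0, 15.0)   # sampling time 0.01 and the stated x(0), y(0), z(0)
clean, noisy = [], []
for _ in range(2000):
    state = rk4_step(state, h)
    clean.append(state[0])
    noisy.append(state[0] + random.gauss(0.0, 1.0))   # s(t) = x(t) + e(t)
```

The `clean` list plays the role of the noise-free x component used in Section 4.1, and `noisy` corresponds to s(t) with zero-mean, unit-variance Gaussian noise as in Section 4.2.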

4. Performance Evaluation

SPCA, like PCA, can not only represent the original data by capturing the relationships between the variables but can also reduce the contribution of errors in the original data. Therefore, this paper analyzes the performance of SPCA from these two viewpoints: representation of chaotic signals and noise reduction in chaotic signals.

4.1. Representation of Chaotic Signals

We first show that, for a clean chaotic time series, SPCA can reconstruct the original data in a high-dimensional space essentially perfectly. We first embed the original time series into a phase space. Considering that the dimension of the Lorenz system is 3, d of the matrix A is chosen as 8 in our SPCA analysis. To quantify the difference between the original data and the SPCA-filtered data, we employ the root-mean-square error (RMSE) as a measure:

\[ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[x(i) - \hat{x}(i)\right]^{2}}, \tag{4.1} \]

where x(i) and \hat{x}(i) are the original and estimated data, respectively.
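In code, (4.1) is a one-liner (the function name is ours):

```python
import math

def rmse(x, x_hat):
    """Root-mean-square error between two equal-length series, as in (4.1)."""
    n = len(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)) / n)

# One entry off by 2 over N = 4 samples: sqrt(4 / 4) = 1.0
err = rmse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```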

When k = d, the RMSE values are lower than 10^{-14} (see Figure 1). In Figure 1, the original data are generated by (3.1) with noise e = 0, and the estimated data are obtained by SPCA with k = d. The results show that the SPCA method is better than PCA. Since real systems are usually unknown, it is necessary to study the effect of sampling time, data length, and noise on the SPCA approach. From Figures 1 and 2, we can see that sampling time and data length have little effect on the SPCA method in the noise-free case.

For analyzing noisy data, we use the percentage of principal components (PCs) to study the occupancy rate of each PC in order to reduce noise. The percentage of PCs is defined by

\[ P_i = \frac{\mu_i}{\sum_{i=1}^{d} \mu_i} \times 100\%, \tag{4.2} \]

where d is the embedding dimension and μ_i is the ith principal component value. From Figure 3, we find that the first (largest) symplectic principal component (SPC) of SPCA is a little larger than that of PCA; it accounts for almost the entire proportion of the symplectic principal components. This shows that SPCA is feasible for principal component analysis of time series.
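Equation (4.2) translates directly into code (the helper name and the example spectrum are ours; the numbers merely illustrate a dominant first component of the kind observed in Figure 3):

```python
def pc_percentages(mu):
    """Occupancy rate P_i of each principal component value, as in (4.2)."""
    total = sum(mu)
    return [100.0 * m / total for m in mu]

# A hypothetical spectrum with one dominant component, summing to 100:
P = pc_percentages([98.0, 1.5, 0.4, 0.1])
```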

Next, we study the reduced space spanned by a few of the largest symplectic principal components (SPCs) to estimate the chaotic Lorenz time series (see Figure 4). In Figure 4, the data x are generated with a sampling time of 0.01 from the chaotic Lorenz system, and the estimated data are calculated from the first three largest SPCs. The average error and standard deviation between the original and estimated data are 6.55 × 10^{-16} and 1.03 × 10^{-2}, respectively. The estimated data are very close to the original data, not only in the time domain (see Figure 4(a)) but also in phase space (see Figure 4(b)). We further explore the effect of sampling time for different numbers of PCs. For k = 1 and k = 7, Figure 5 shows how the RMSE values of SPCA and PCA change with sampling time. The RMSE values of SPCA are smaller than those of PCA, and sampling time has less impact on SPCA than on PCA. In the case of k = 7, data length also has less effect on SPCA than on PCA (see Figure 6).

Compared with PCA, the results of SPCA are better in Figures 4, 5, and 6. We can see that the SPCA method keeps the essential dynamical character of the primary time series generated by chaotic continuous systems. This indicates that SPCA reflects the intrinsic nonlinear characteristics of the original time series. Moreover, SPCA can elucidate the dominant features underlying the observed data, which helps to retrieve dominant patterns from noisy data. To this end, we next study the feasibility of the proposed algorithm for noise reduction using noisy chaotic Lorenz data.

4.2. Noise Reduction in Chaotic Signals

For the noisy Lorenz data x, the phase diagrams of the noisy and clean data are given in Figures 7(a) and 7(b). The clean data are the noise-free chaotic Lorenz data x (see (3.1)); the noisy data are the chaotic Lorenz data x with Gaussian white noise of zero mean and unit variance (see (3.1)). The sampling time is 0.01, and the time delay L is 11 in Figure 7. It is obvious that the noise is very strong. The first denoised data are obtained with the proposed SPCA algorithm (see Figures 7(c)–7(f)). Here, we first build an attractor X with an embedding dimension of 8; the transform matrix W is then constructed with k = 1, and the first denoised data are generated by (2.18) and (2.19). In Figure 7(c), the first denoised data are compared with the noisy Lorenz data x in the time domain. Figure 7(d) shows the corresponding phase diagram of the first denoised data. Compared with Figure 7(a), the first denoised data basically recover the structure of the original system. To obtain better results, these denoised data are denoised again by step (8). After the second noise reduction, the results are greatly improved in Figures 7(e) and 7(f): the curves of the second denoised data are better than those of the first, both in the time domain and in phase space, by contrast with Figures 7(c) and 7(d). Figure 7(g) shows the first denoised result given by the PCA technique. Following our algorithm, we apply PCA to the first denoised data again (see Figure 7(h)).

Some of the noise has been further reduced, but the PCA curve is not as good as the SPCA curve in Figure 7(e). The reason is that PCA is indeed a linear method; when nonlinear structures have to be considered, it can be misleading, especially in the case of a large sampling time (see Figure 8). The PCA program code used comes from the TISEAN tools (http://www.mpipks-dresden.mpg.de/~tisean/).

Figure 8 shows the variation of the correlation dimension D₂ with embedding dimension d at a sampling time of 0.1 for the clean, noisy, and denoised Lorenz data. For the clean and SPCA-denoised data, the curves flatten out in the vicinity of 2. For the noisy data, the curve increases constantly and has no plateau. For the PCA-denoised data, the curve also increases and tends toward a plateau near 2; however, this plateau is smaller than that of SPCA, so PCA is less effective than the SPCA algorithm. This indicates that it is difficult for PCA to describe the nonlinear structure of a system, since the correlation dimension D₂ manifests the nonlinear properties of chaotic systems. Here, D₂ is estimated by the Grassberger–Procaccia algorithm [27, 28].

5. Discussion and Conclusion

In this paper, we have proposed a novel PCA method based on symplectic geometry, called SPCA. Theoretically, this method can reflect the nonlinear structure of nonlinear dynamical systems very well because it is intrinsically nonlinear. Using chaotic Lorenz data and computing the RMSE, the PC percentages, the correlation dimension, and phase space diagrams, we have shown that the SPCA method yields more reliable results than classic PCA for chaotic time series over a wider range of data lengths and sampling times, especially for short data lengths and undersampled sampling times. With regard to noise reduction, the SPCA algorithm is also more effective than PCA.

We wish to emphasize that SPCA has a phase-delay property; that is, the second row of the SPCA-filtered data is closer to the original data. This is worth further investigation in the future.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (no. 10872125), the Science Fund for Creative Research Groups of the National Natural Science Foundation of China (no. 50821003), the State Key Lab of Mechanical System and Vibration, the Research Fund of the State Key Lab of MSV, China (grant no. MSV-MS-2010-08), and the Science and Technology Commission of Shanghai Municipality (no. 06ZR14042). The authors also thank Dr. Gao Jianbo for many valuable discussions.