Mathematical Problems in Engineering

Volume 2011 (2011), Article ID 793429, 14 pages

http://dx.doi.org/10.1155/2011/793429

## Symplectic Principal Component Analysis: A New Method for Time Series Analysis

Institute of Vibration, Shock & Noise and State Key Laboratory of Mechanical System and Vibration, Shanghai Jiao Tong University, Shanghai 200030, China

Received 9 July 2011; Accepted 22 September 2011

Academic Editor: Mahmoud T. Yassen

Copyright © 2011 Min Lei and Guang Meng. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Experimental data are often very complex since the underlying dynamical system may be unknown and the data may be heavily corrupted by noise. It is a crucial task to analyze the data properly in order to extract maximal information about the underlying dynamical system. This paper presents a novel principal component analysis (PCA) method based on symplectic geometry, called symplectic PCA (SPCA), to study nonlinear time series. Being nonlinear, it differs from the traditional PCA method based on linear singular value decomposition (SVD). It is thus expected to represent nonlinear, and especially chaotic, data better than PCA. Using chaotic Lorenz time series data, we show that this is indeed the case. Furthermore, we show that SPCA can conveniently reduce measurement noise.

#### 1. Introduction

Data measured in experimental situations, especially in real environments, can be very complex since the underlying dynamical system may be nonlinear and of unknown structure, and the data may be very noisy. It is challenging to analyze such measured data appropriately, especially when they are noisy. Since chaotic phenomena were discovered, interpreting the irregular dynamics of various systems as a deterministic chaotic process has become popular and is widely used in almost all fields of science and engineering. A number of important algorithms based on chaos theory have been employed to infer the system dynamics from the data or to reduce noise in the data [1–6]. The first step of these approaches is to reconstruct a phase space from the data so that the dynamic characteristics of the system can be properly studied [7]. This is achieved using Takens’ embedding theorem [8], which states that, in the noise-free case, the system dynamics can be reconstructed from a one-dimensional signal, that is, a time series. However, actual systems are often noisy, sometimes so noisy that the reconstructed attractor of the nonlinear system exhibits different features when different analysis techniques are used [9–12]. Therefore, appropriate analysis of the measured data is a critical task in science and engineering. In this work, we propose a novel nonlinear analysis method based on symplectic geometry and principal component analysis, called symplectic principal component analysis (SPCA).

Symplectic geometry is a kind of phase space geometry. It is nonlinear in nature and can describe the structure of a system, especially nonlinear structure, very well. It has been used to study various nonlinear dynamical systems [13–15] since Feng [16] proposed a symplectic algorithm for solving symplectic differential equations. However, from the viewpoint of data analysis, few studies have employed symplectic geometry theory to explore the dynamics of a system. Our previous works proposed estimating the embedding dimension from a time series based on symplectic geometry [17–20]. Subsequently, Niu et al. used our method to evaluate sprinters’ surface EMG signals [21], and Xie et al. [22] proposed a kind of symplectic geometry spectra based on our work. In this paper, we show that SPCA can represent chaotic time series well and can reduce noise in chaotic data.

#### 2. Method

Consider a dynamical system defined in phase space $\mathbb{R}^{D}$. A discretized trajectory at times $t = n t_s$, $n = 1, 2, \ldots$, may be described by maps of the form

$$\mathbf{x}_{n+1} = f(\mathbf{x}_n).$$

In SPCA, a fundamental step is to build the multidimensional structure (attractor) in symplectic geometry space. Here, in terms of Takens’ embedding theorem, we first construct an attractor in phase space, that is, the trajectory matrix, from a time series. Then, we describe the symplectic principal component analysis (SPCA) based on symplectic geometry theory and give its corresponding algorithm.

##### 2.1. Attractor Reconstruction

Let the measured data (the observable of the system under study) $x_1, x_2, \ldots, x_n$ be recorded with sampling interval $t_s$; $n$ is the number of samples. Takens’ embedding theorem states that if the time series is indeed composed of scalar measurements of the state of a dynamical system, then, under certain genericity assumptions, a one-to-one image of the original set is given by the time-delay embedding, provided the embedding dimension $d$ is large enough. That is, the time-delay embedding maps the scalar series into a $d$-dimensional series $X$:

$$X = \begin{pmatrix} x_1 & x_2 & \cdots & x_d \\ x_2 & x_3 & \cdots & x_{d+1} \\ \vdots & \vdots & & \vdots \\ x_m & x_{m+1} & \cdots & x_n \end{pmatrix},$$

where $d$ is the embedding dimension, $m = n - d + 1$ is the number of points in the $d$-dimensional reconstructed attractor, and $X_{m \times d}$ denotes the trajectory matrix of the dynamical system in phase space, that is, the attractor in phase space.
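The construction of the trajectory matrix can be illustrated with a short sketch. It assumes an $m \times d$ layout in which each row is one delay vector; the function name `trajectory_matrix` and the optional delay parameter `tau` are illustrative choices, not part of the original text.

```python
import numpy as np

def trajectory_matrix(x, d, tau=1):
    """Return the (m, d) delay-embedding matrix of the scalar series x.

    Row i is (x[i], x[i + tau], ..., x[i + (d - 1) * tau]).
    `tau` is a hypothetical delay parameter; tau=1 gives m = n - d + 1.
    """
    x = np.asarray(x, dtype=float)
    m = x.size - (d - 1) * tau          # number of reconstructed points
    if d < 1 or tau < 1 or m < 1:
        raise ValueError("series too short for the requested d and tau")
    return np.column_stack([x[j * tau: j * tau + m] for j in range(d)])

# Toy series 0, 1, ..., 9 embedded in d = 3 dimensions.
X = trajectory_matrix(np.arange(10.0), d=3)
print(X.shape)   # (8, 3)
```

With `tau=1` the number of rows equals $n - d + 1$, matching the definition of the number of points in the reconstructed attractor.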

##### 2.2. Symplectic Principal Component Analysis

SPCA is a PCA approach based on symplectic geometry. Its idea is to map the investigated complex system into symplectic space and elucidate the dominant features underlying the measured data. The first few larger components capture the main relationship between the variables in symplectic space. The remaining components are composed of the less important components or noise in the measured data. The geometry used in symplectic space is called symplectic geometry. Different from Euclidean geometry, symplectic geometry is an even-dimensional geometry with a special symplectic structure. It depends on a bilinear, antisymmetric, nonsingular cross product, the symplectic cross product:

$$[x, y] = \langle x, J y \rangle,$$

where

$$J = J_{2n} = \begin{pmatrix} 0 & I_n \\ -I_n & 0 \end{pmatrix}, \qquad \text{and, when } n = 1, \quad J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.$$

Measurement in symplectic space is on an area scale. The length of any vector is always zero and so carries no meaning; in its place there is a concept of symplectic orthogonality. In symplectic geometry, the symplectic transform is in essence a nonlinear transform; it is also called the canonical transform, since it is measure-preserving and keeps the natural properties of the original data unchanged. It is thus suitable for nonlinear dynamical systems.
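As a small numerical illustration, the sketch below evaluates the symplectic cross product under the assumption that it takes the standard form $[x, y] = x^{T} J y$ with $J$ the block matrix $\begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$. The helper names `standard_J` and `symplectic_product` are hypothetical. It shows the two properties mentioned in the text: antisymmetry, and the fact that the product of a vector with itself vanishes.

```python
import numpy as np

def standard_J(n):
    """Standard 2n x 2n symplectic form [[0, I], [-I, 0]] (assumed convention)."""
    I, Z = np.eye(n), np.zeros((n, n))
    return np.block([[Z, I], [-I, Z]])

def symplectic_product(x, y):
    """Symplectic cross product [x, y] = x^T J y for even-dimensional vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.size != y.size or x.size % 2:
        raise ValueError("x and y must have the same even length")
    return float(x @ standard_J(x.size // 2) @ y)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([5.0, 6.0, 7.0, 8.0])
print(symplectic_product(x, y))   # -16.0
print(symplectic_product(y, x))   # 16.0 (antisymmetry)
print(symplectic_product(x, x))   # 0.0 (a vector has zero "length")
```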

The symplectic principal components are given by a symplectic similarity transform, which is analogous to SVD-based PCA. The corresponding eigenvalues can be obtained by a symplectic method. Here, we first construct the autocorrelation matrix $A_{d \times d}$ of the trajectory matrix $X_{m \times d}$. Then, the matrix $A$ can be transformed into a Hamilton matrix $M$ in symplectic space.

Theorem 2.1. *Any $d \times d$ matrix can be made into a Hamilton matrix $M$. Let the matrix be $A$; then*

$$M = \begin{pmatrix} A & 0 \\ 0 & -A^{T} \end{pmatrix},$$

*where $M$ is a Hamilton matrix.*
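The statement of Theorem 2.1 can be checked numerically. The sketch below assumes the usual block construction $M = \begin{pmatrix} A & 0 \\ 0 & -A^{T} \end{pmatrix}$ and the standard characterization that $M$ is a Hamilton (Hamiltonian) matrix exactly when $JM$ is symmetric; both the construction and the helper names are assumptions made for illustration.

```python
import numpy as np

def standard_J(d):
    """Standard 2d x 2d symplectic form [[0, I], [-I, 0]] (assumed convention)."""
    I, Z = np.eye(d), np.zeros((d, d))
    return np.block([[Z, I], [-I, Z]])

def hamiltonian_from(A):
    """Embed a d x d matrix A as the 2d x 2d block matrix [[A, 0], [0, -A^T]]."""
    A = np.asarray(A, dtype=float)
    Z = np.zeros_like(A)
    return np.block([[A, Z], [Z, -A.T]])

def is_hamiltonian(M, tol=1e-12):
    """Check the defining property: J M must be a symmetric matrix."""
    JM = standard_J(M.shape[0] // 2) @ M
    return bool(np.allclose(JM, JM.T, atol=tol))

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))          # an arbitrary (non-symmetric) matrix
print(is_hamiltonian(hamiltonian_from(A)))   # True
print(is_hamiltonian(np.eye(6)))             # False
```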

Theorem 2.2. *A Hamilton matrix remains a Hamilton matrix under a symplectic similarity transform.*

Theorem 2.3. *Let $M$ be a Hamilton matrix; then $e^{M}$ is a symplectic matrix.*

Theorem 2.4. *Let $S$ be a symplectic matrix; then there exists a decomposition $S = QR$, where $Q$ is a symplectic unitary matrix and $R$ is an upper triangular matrix.*

Theorem 2.5. *The product of symplectic matrices is also a symplectic matrix.*

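Theorem 2.6. *Suppose that the Householder matrix $H$ is*

$$H = H(k, \omega) = \begin{pmatrix} P & 0 \\ 0 & P \end{pmatrix},$$

*where*

$$P = I_d - \frac{2\,\omega \omega^{*}}{\omega^{*} \omega}, \qquad \omega = (0, \ldots, 0, \omega_k, \ldots, \omega_d)^{T} \neq 0.$$

*Then $H$ is a symplectic unitary matrix. Here $\omega^{*}$ is the conjugate transpose of $\omega$.*

*Proof.* To prove that the matrix $H$ is a symplectic matrix, we only need to prove $H^{*} J H = J$:

$$H^{*} J H = \begin{pmatrix} P & 0 \\ 0 & P \end{pmatrix}^{*} \begin{pmatrix} 0 & I_d \\ -I_d & 0 \end{pmatrix} \begin{pmatrix} P & 0 \\ 0 & P \end{pmatrix} = \begin{pmatrix} 0 & P^{*} P \\ -P^{*} P & 0 \end{pmatrix}, \tag{2.9}$$

where

$$P^{*} P = \left( I_d - \frac{2\,\omega\omega^{*}}{\omega^{*}\omega} \right) \left( I_d - \frac{2\,\omega\omega^{*}}{\omega^{*}\omega} \right) = I_d - \frac{4\,\omega\omega^{*}}{\omega^{*}\omega} + \frac{4\,\omega(\omega^{*}\omega)\omega^{*}}{(\omega^{*}\omega)^{2}} = I_d. \tag{2.10}$$

Plugging (2.10) into (2.9), we have

$$H^{*} J H = J.$$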
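The argument of the proof can be verified numerically for a real vector. The sketch assumes that the symplectic Householder matrix has the block-diagonal form $\mathrm{diag}(P, P)$ with $P = I - 2\omega\omega^{T}/(\omega^{T}\omega)$; the function names and the test vector are illustrative.

```python
import numpy as np

def householder(w):
    """Real Householder reflector P = I - 2 w w^T / (w^T w)."""
    w = np.asarray(w, dtype=float)
    return np.eye(w.size) - 2.0 * np.outer(w, w) / np.dot(w, w)

def symplectic_householder(w):
    """Block-diagonal matrix diag(P, P) built from the reflector P (assumed form)."""
    P = householder(w)
    Z = np.zeros_like(P)
    return np.block([[P, Z], [Z, P]])

d = 3
I, Z = np.eye(d), np.zeros((d, d))
J = np.block([[Z, I], [-I, Z]])

H = symplectic_householder([0.0, 1.0, 2.0])
print(np.allclose(H.T @ J @ H, J))          # True: H is symplectic
print(np.allclose(H.T @ H, np.eye(2 * d)))  # True: H is orthogonal
```

Both checks follow from $P^{T} P = I$, which is the identity the proof establishes.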

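For the Hamilton matrix $M$, its eigenvalues can be obtained by a symplectic similarity transform, and the original $2d$-dimensional problem can be reduced to a $d$-dimensional one [17–19], as follows:

(i) Let $N = M^{2}$.

(ii) Construct a symplectic matrix $Q$ such that

$$Q^{T} N Q = \begin{pmatrix} B & R \\ 0 & B^{T} \end{pmatrix},$$

where $B$ is an upper Hessenberg matrix ($b_{ij} = 0$ for $i > j + 1$). The matrix $Q$ may be a symplectic Householder matrix $H$. If the matrix $M$ is a real symmetric matrix, $M$ can be considered as $N$. Then, one can get an upper Hessenberg matrix (see (2.13)), namely,

$$H M H^{T} = \begin{pmatrix} P & 0 \\ 0 & P \end{pmatrix} \begin{pmatrix} A & 0 \\ 0 & -A^{T} \end{pmatrix} \begin{pmatrix} P & 0 \\ 0 & P \end{pmatrix}^{T} = \begin{pmatrix} P A P^{T} & 0 \\ 0 & -P A^{T} P^{T} \end{pmatrix} = \begin{pmatrix} B & 0 \\ 0 & -B^{T} \end{pmatrix},$$

where $H$ is the symplectic Householder matrix.

(iii) Calculate the eigenvalues $\sigma_1, \sigma_2, \ldots, \sigma_d$ of $B$ by using the symplectic QR decomposition method; if $M$ is a real symmetric matrix, then the eigenvalues of $A$ are equal to those of $B$:

$$\lambda(A) = \lambda(B).$$

(iv) These eigenvalues are sorted in descending order, that is,

$$\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_d.$$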

Thus, the calculation in $2d$-dimensional space is transformed into one in $d$-dimensional space. The values $\sigma_1, \ldots, \sigma_d$ form the symplectic principal component spectrum of $A$, with the relevant symplectic orthonormal bases. In the so-called noise floor, the values of $\sigma_i$, $i = k + 1, \ldots, d$, reflect the noise level in the data [18, 19]. The corresponding matrix $Q$ denotes the symplectic eigenvectors of $A$.

##### 2.3. Proposed Algorithm

For measured data $x_1, x_2, \ldots, x_n$, our proposed algorithm consists of the following steps:

(1) Reconstruct the attractor $X_{m \times d}$ from the measured time series, where $d$ is the embedding dimension of the matrix $X$ and $m = n - d + 1$.

(2) Remove the mean values of each row of the matrix $X$.

(3) Build the real $d \times d$ symmetric matrix $A$, that is,

$$A = X^{T} X.$$

Here, $d$ should be larger than the dimension of the system in terms of Takens’ embedding theorem.

(4) Calculate the symplectic principal components of the matrix $A$ by QR decomposition, and give the Householder transform matrix $Q$.

(5) Construct the corresponding principal eigenvalue matrix $W$ according to the number $k$ of the chosen symplectic principal components of the matrix $A$, where $W \subseteq Q$. That is, $W = Q$ when $k = d$; otherwise, $W \subset Q$. In use, $k$ can be chosen according to (2.16).

(6) Get the transformed coefficients $S = \{S_1, S_2, \ldots, S_m\}$, where

$$S_i = W^{T} X_i^{T}, \tag{2.18}$$

and $X_i$ denotes the $i$th row of $X$.

(7) Re-estimate $X_s$ from $S$:

$$X_{s,i}^{T} = W S_i. \tag{2.19}$$

Then, the re-estimated data can be given.

(8) For a noisy time series, the first estimate of the data is usually not good. Here, one can go back to step (6) and let $X_i = X_{s,i}$ in (2.18) to do steps (6) and (7) again. Generally, the second estimate will be better than the first.

Note that step (8) is unnecessary for a clean time series.
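The steps above can be sketched in a few lines of NumPy. This is only an approximation of the procedure under stated assumptions, not the authors' implementation: the symplectic QR decomposition of step (4) is replaced by `numpy.linalg.eigh`, which is applicable because the matrix $A$ is real and symmetric; the trajectory matrix is stored as $m \times d$, so the mean removal of step (2) is applied to each embedding coordinate (each column here); and the filtered series is read off from the first column and last row of the re-estimated matrix, which is one possible choice. The names `embed` and `spca_like_filter` are hypothetical.

```python
import numpy as np

def embed(x, d):
    """Step (1): (m, d) trajectory matrix with m = n - d + 1."""
    x = np.asarray(x, dtype=float)
    m = x.size - d + 1
    return np.column_stack([x[j:j + m] for j in range(d)])

def spca_like_filter(x, d=8, k=3):
    """Project the delay vectors onto the k leading principal directions.

    Stand-in sketch: eigh replaces the symplectic QR eigen-solver.
    Returns the filtered series and the eigenvalues in descending order.
    """
    X = embed(x, d)
    mean = X.mean(axis=0)
    Xc = X - mean                           # step (2), per embedding coordinate
    A = Xc.T @ Xc                           # step (3): real symmetric d x d matrix
    eigvals, eigvecs = np.linalg.eigh(A)    # stand-in for step (4)
    order = np.argsort(eigvals)[::-1]       # descending spectrum
    W = eigvecs[:, order[:k]]               # step (5): d x k transform matrix
    S = Xc @ W                              # step (6): transformed coefficients
    Xs = S @ W.T + mean                     # step (7): re-estimated trajectory matrix
    # Assumed read-out: first column, then the tail of the last row.
    x_hat = np.concatenate([Xs[:, 0], Xs[-1, 1:]])
    return x_hat, eigvals[order]

# Toy example: noisy sine wave, filtered once and then a second time (step (8)).
t = np.arange(2000)
clean = np.sin(0.2 * t)
noisy = clean + 0.5 * np.random.default_rng(0).standard_normal(t.size)
first, spectrum = spca_like_filter(noisy, d=8, k=2)
second, _ = spca_like_filter(first, d=8, k=2)
```

Calling the function again on its own output mirrors the second pass of step (8). With `k = d` the projection is the identity, so the input series is returned up to rounding error.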

#### 3. Numerical and Experimental Data

In order to investigate the feasibility of SPCA, this paper employs the chaotic Lorenz time series as follows: where . Here, is a white Gaussian measurement noise. The measurement noise is used because all real measurements are polluted by noise. For more details of noise notions, refer to the literature [23–26].
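For readers who wish to generate comparable test data, the following sketch integrates the Lorenz equations with a fixed-step fourth-order Runge-Kutta scheme and adds white Gaussian noise to the $x$ component. The parameter values $\sigma = 10$, $\rho = 28$, $\beta = 8/3$ are the classical chaotic choice and are an assumption here, as are the initial state, the discarded transient, and the use of the integration step as the sampling time.

```python
import numpy as np

def lorenz_rhs(s, sigma, rho, beta):
    """Right-hand side of the Lorenz equations."""
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def lorenz_series(n, dt=0.01, noise_std=0.0, sigma=10.0, rho=28.0,
                  beta=8.0 / 3.0, s0=(1.0, 1.0, 1.0), discard=1000, seed=0):
    """Return n samples of the x component, plus optional Gaussian noise.

    All default values are illustrative assumptions, not taken from the paper.
    """
    s = np.array(s0, dtype=float)
    out = np.empty(n)
    for i in range(discard + n):
        # Classical RK4 step of size dt.
        k1 = lorenz_rhs(s, sigma, rho, beta)
        k2 = lorenz_rhs(s + 0.5 * dt * k1, sigma, rho, beta)
        k3 = lorenz_rhs(s + 0.5 * dt * k2, sigma, rho, beta)
        k4 = lorenz_rhs(s + dt * k3, sigma, rho, beta)
        s = s + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        if i >= discard:                 # skip the initial transient
            out[i - discard] = s[0]
    rng = np.random.default_rng(seed)
    return out + noise_std * rng.standard_normal(n)

clean = lorenz_series(500)                   # noise-free series
noisy = lorenz_series(500, noise_std=1.0)    # unit-variance measurement noise
```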

#### 4. Performance Evaluation

SPCA, like PCA, not only can represent the original data by capturing the relationship between the variables, but also can reduce the contribution of errors in the original data. Therefore, this paper studies the performance analysis of SPCA from the two views, that is, representation of chaotic signals and noise reduction in chaotic signals.

##### 4.1. Representation of Chaotic Signals

We first show that, for a clean chaotic time series, SPCA can perfectly reconstruct the original data in a high-dimensional space. The original time series is first embedded into a phase space. Considering that the dimension of the Lorenz system is 3, the dimension $d$ of the matrix $A$ is chosen as 8 in our SPCA analysis. To quantify the difference between the original data and the SPCA-filtered data, we employ the root-mean-square error (RMSE) as a measure:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \bigl( x(i) - \hat{x}(i) \bigr)^{2}},$$

where $x(i)$ and $\hat{x}(i)$ are the original data and the estimated data, respectively.
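A minimal sketch of this error measure, assuming the usual definition with normalization by the number of samples $n$ (the function name `rmse` is illustrative):

```python
import numpy as np

def rmse(x, x_hat):
    """Root-mean-square error between an original and an estimated series."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    if x.shape != x_hat.shape:
        raise ValueError("series must have the same shape")
    return float(np.sqrt(np.mean((x - x_hat) ** 2)))

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))   # 0.0
```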

When all $d$ components are retained ($k = d$), the RMSE values are lower than $10^{-14}$ (see Figure 1). In Figure 1, the original data are generated by (3.1) without noise, and the estimated data are obtained by SPCA with $k = d$. The results show that the SPCA method is better than PCA. Since real systems are usually unknown, it is necessary to study the effect of sampling time, data length, and noise on the SPCA approach. From Figures 1 and 2, we can see that the sampling time and data length have little effect on the SPCA method in the noise-free case.

For analyzing noisy data, we use the percentage of principal components (PCs) to study the share of each PC in order to reduce noise. The percentage of PCs is defined by

$$p_i = \frac{\sigma_i}{\sum_{j=1}^{d} \sigma_j} \times 100\%,$$

where $d$ is the embedding dimension and $\sigma_i$ is the $i$th principal component value. From Figure 3, we find that the first (largest) symplectic principal component (SPC) of SPCA is a little larger than that of PCA. It accounts for almost the entire proportion of the symplectic principal components. This shows that it is feasible to use SPCA for principal component analysis of time series.
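The percentage of each component can be computed as in the sketch below, which assumes the definition as each principal component value divided by the sum of all $d$ values, expressed in percent. The function name and the example spectrum are illustrative only.

```python
import numpy as np

def pc_percentages(sigma):
    """Share of each principal component value in the total, in percent."""
    sigma = np.asarray(sigma, dtype=float)
    total = sigma.sum()
    if total <= 0:
        raise ValueError("the sum of the component values must be positive")
    return 100.0 * sigma / total

# Hypothetical spectrum with one dominant component.
print(pc_percentages([3.0, 1.0]))   # [75. 25.]
```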

Next, we study the reduced space spanned by a few largest symplectic principal components (SPCs) to estimate the chaotic Lorenz time series (see Figure 4). In Figure 4, the data is given with a sampling time of 0.01 from chaotic Lorenz system. The estimated data is calculated by the first three largest SPCs. The average error and standard deviation between the original data and the estimated data are and , respectively. The estimated data is very close to the original data not only in time domain (see Figure 4(a)) but also in phase space (see Figure 4(b)). We further explore the effect of sampling time in different number of PCs. When the PCs number and , respectively, the SPCA and PCA give the change of RMSE values with the sampling time in Figure 5. We can see that the RMSE values of the SPCA are smaller than those of the PCA. The sampling time has less impact on the SPCA than the PCA. In the case of , the data length has also less effect on the SPCA than the PCA (see Figure 6).

Compared with PCA, the results of SPCA are better in Figures 4, 5, and 6. We can see that the SPCA method keeps the essential dynamical character of the original time series generated by chaotic continuous systems. This indicates that SPCA can reflect the intrinsic nonlinear characteristics of the original time series. Moreover, SPCA can elucidate the dominant features underlying the observed data, which helps to retrieve dominant patterns from noisy data. We therefore study the feasibility of the proposed algorithm for noise reduction by using noisy chaotic Lorenz data.

##### 4.2. Noise Reduction in Chaotic Signals

For the noisy Lorenz data, the phase diagrams of the noisy and clean data are given in Figures 7(a) and 7(b). The clean data are the noise-free chaotic Lorenz data (see (3.1)). The noisy data are the chaotic Lorenz data with Gaussian white noise of zero mean and unit variance (see (3.1)). The sampling time is 0.01. The time delay is 11 in Figure 7. The noise is evidently very strong. The first denoised data are obtained by the proposed SPCA algorithm (see Figures 7(c)–7(f)). Here, we first build an attractor with an embedding dimension of 8. Then, the transform matrix is constructed for the chosen number of components. The first denoised data are generated by (2.18) and (2.19). In Figure 7(c), the first denoised data are compared with the noisy Lorenz data in the time domain. Figure 7(d) shows the corresponding phase diagram of the first denoised data. Compared with Figure 7(a), the first denoised data basically recover the structure of the original system. To obtain better results, these denoised data are denoised again via step (8). After the second noise reduction, the results are greatly improved, as shown in Figures 7(e) and 7(f). The curves of the second denoised data are better than those of the first denoised data, both in the time domain and in phase space (compare with Figures 7(c) and 7(d)). Figure 7(g) shows the first denoised result given by the PCA technique. Following our algorithm, the first denoised data are then processed again by PCA (see Figure 7(h)).

Some of the noise has been further reduced, but the PCA curve is not better than the SPCA curve in Figure 7(e). The reason is that PCA is a linear method. When nonlinear structures have to be considered, it can be misleading, especially in the case of a large sampling time (see Figure 8). The PCA code used here comes from the TISEAN tools (http://www.mpipks-dresden.mpg.de/~tisean/).

Figure 8 shows the variation of the correlation dimension with the embedding dimension at a sampling time of 0.1 for the clean, noisy, and denoised Lorenz data. For the clean and SPCA-denoised data, the curves level off in the vicinity of 2. For the noisy data, the curve increases steadily and shows no plateau. For the PCA-denoised data, the curve also increases and tends toward a plateau near 2. However, this plateau is smaller than that of SPCA, so PCA is less effective than the SPCA algorithm. This indicates that it is difficult for PCA to describe the nonlinear structure of a system, because the correlation dimension reflects nonlinear properties of chaotic systems. Here, the correlation dimension is estimated by the Grassberger-Procaccia algorithm [27, 28].
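To make the quantity plotted in Figure 8 concrete, the sketch below gives a basic version of the Grassberger-Procaccia estimate: the correlation sum $C(r)$ is the fraction of point pairs closer than $r$, and the correlation dimension is estimated as the slope of $\log C(r)$ against $\log r$. This is a simplified illustration under assumptions (brute-force pairwise distances, a single least-squares fit over user-supplied radii) and is not the implementation used for the figure.

```python
import numpy as np

def correlation_sum(points, r):
    """Fraction of distinct point pairs whose Euclidean distance is below r."""
    P = np.asarray(points, dtype=float)
    if P.ndim == 1:
        P = P[:, None]                      # treat a 1-D input as scalar points
    diff = P[:, None, :] - P[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(P.shape[0], k=1)   # each pair counted once
    return float(np.mean(dist[iu] < r))

def correlation_dimension(points, radii):
    """Slope of log C(r) versus log r; radii must give nonzero C(r)."""
    radii = np.asarray(radii, dtype=float)
    C = np.array([correlation_sum(points, r) for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(C), 1)
    return float(slope)

# Sanity check on points spread uniformly along a line (dimension close to 1).
line = np.random.default_rng(1).uniform(0.0, 1.0, 500)
estimate = correlation_dimension(line, np.logspace(-2, -1, 8))
```

In practice the points would be the rows of the trajectory matrix, and the fit would be restricted to the scaling region of $r$; the memory cost of the pairwise distance array grows quadratically with the number of points.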

#### 5. Discussion and Conclusion

In this paper, we have proposed a novel PCA based on symplectic geometry, called SPCA. From a theoretical viewpoint, this method can reflect the structure of nonlinear dynamical systems very well because it is intrinsically nonlinear. Using chaotic Lorenz data and calculating the RMSE, the percentage of PCs, the correlation dimension, and phase space diagrams, we have shown that the SPCA method yields more reliable results than classic PCA for chaotic time series over a wider range of data lengths and sampling times, especially for short data lengths and undersampled data. With regard to noise reduction, the SPCA algorithm is also more effective than PCA.

We wish to emphasize that SPCA has a phase delay property; that is, the second row of the SPCA-filtered data is closer to the original data. This is worth further investigation in the future.

#### Acknowledgments

This work is supported by the National Natural Science Foundation of China (no. 10872125), Science Fund for Creative Research Groups of the National Natural Science Foundation of China (no. 50821003), State Key Lab of Mechanical System and Vibration, Project supported by the Research Fund of State Key Lab of MSV, China (Grant no. MSV-MS-2010-08) and Science and Technology Commission of Shanghai Municipality (no. 06ZR14042). The authors also thank Dr. Gao Jianbo for many valuable discussions.

#### References

1. J. Gao, Y. Cao, W.-W. Tung, and J. Hu, *Multiscale Analysis of Complex Time Series*, Wiley-Interscience, Hoboken, NJ, USA, 2007.
2. H. Kantz and T. Schreiber, *Nonlinear Time Series Analysis*, Cambridge University Press, Cambridge, UK, 2nd edition, 2004.
3. J. Hu, J. Gao, and W. W. Tung, “Characterizing heart rate variability by scale-dependent Lyapunov exponent,” *Chaos*, vol. 19, no. 2, Article ID 028506, 2009.
4. M. Lei, Z. Wang, and Z. Feng, “Detecting nonlinearity of action surface EMG signal,” *Physics Letters, Section A*, vol. 290, no. 5-6, pp. 297–303, 2001.
5. G. Ozer and C. Ertokatli, “Chaotic processes of common stock index returns: an empirical examination on Istanbul Stock Exchange (ISE) market,” *African Journal of Business Management*, vol. 4, no. 6, pp. 1140–1148, 2010.
6. A. Bogris, A. Argyris, and D. Syvridis, “Encryption efficiency analysis of chaotic communication systems based on photonic integrated chaotic circuits,” *IEEE Journal of Quantum Electronics*, vol. 46, no. 10, pp. 1421–1429, 2010.
7. R. Hegger, H. Kantz, and T. Schreiber, “Practical implementation of nonlinear time series methods: the TISEAN package,” *Chaos*, vol. 9, no. 2, pp. 413–435, 1999.
8. F. Takens, “Detecting strange attractors in turbulence,” in *Dynamical Systems and Turbulence*, D. A. Rand and L.-S. Young, Eds., vol. 898 of *Lecture Notes in Mathematics*, pp. 366–381, Springer, 1981.
9. A. M. Fraser, “Reconstructing attractors from scalar time series: a comparison of singular system and redundancy criteria,” *Physica D*, vol. 34, no. 3, pp. 391–404, 1989.
10. M. Paluš and I. Dvořák, “Singular-value decomposition in attractor reconstruction: pitfalls and precautions,” *Physica D*, vol. 55, no. 1-2, pp. 221–234, 1992.
11. J. B. Gao, J. Hu, and W. W. Tung, “Complexity measures of brain wave dynamics,” *Cognitive Neurodynamics*, vol. 5, pp. 171–182, 2011.
12. J. B. Gao, J. Hu, W. W. Tung, and Y. H. Cao, “Distinguishing chaos from noise by scale-dependent Lyapunov exponent,” *Physical Review E*, vol. 74, no. 6, Article ID 066204, 2006.
13. X. Lu and R. Schmid, “Symplectic integration of Sine-Gordon type systems,” *Mathematics and Computers in Simulation*, vol. 50, no. 1-4, pp. 255–263, 1999.
14. Z. G. Ying and W. Q. Zhu, “Exact stationary solutions of stochastically excited and dissipated gyroscopic systems,” *International Journal of Non-Linear Mechanics*, vol. 35, no. 5, pp. 837–848, 2000.
15. A. C. J. Luo and R. P. S. Han, “The resonance theory for stochastic layers in nonlinear dynamic systems,” *Chaos, Solitons and Fractals*, vol. 12, no. 13, pp. 2493–2508, 2001.
16. K. Feng, *Proceeding of the 1984 Beijing Symposium on Differential Geometry and Differential Equations*, Science Press, Beijing, China, 1985.
17. C. Van Loan, “A symplectic method for approximating all the eigenvalues of a Hamiltonian matrix,” *Linear Algebra and Its Applications*, vol. 61, no. C, pp. 233–251, 1984.
18. M. Lei, Z. Wang, and Z. Feng, “The application of symplectic geometry on nonlinear dynamics analysis of the experimental data,” in *Proceedings of the 14th International Conference on Digital Signal Processing Proceeding*, vol. 1-2, pp. 1137–1140, 2002.
19. M. Lei, Z. Wang, and Z. Feng, “A method of embedding dimension estimation based on symplectic geometry,” *Physics Letters, Section A*, vol. 303, no. 2-3, pp. 179–189, 2002.
20. M. Lei and G. Meng, “Detecting nonlinearity of sunspot number,” *International Journal of Nonlinear Sciences and Numerical Simulation*, vol. 5, no. 4, pp. 321–326, 2004.
21. X. Niu, F. Qu, and N. Wang, “Evaluating sprinters' surface EMG signals based on EMD and symplectic geometry,” *Journal of Ocean University of Qingdao*, vol. 35, no. 1, pp. 125–129, 2005.
22. H. Xie, Z. Wang, and H. Huang, “Identification determinism in time series based on symplectic geometry spectra,” *Physics Letters, Section A*, vol. 342, no. 1-2, pp. 156–161, 2005.
23. J. Argyris, I. Andreadis, G. Pavlos, and M. Athanasiou, “The influence of noise on the correlation dimension of chaotic attractors,” *Chaos, Solitons and Fractals*, vol. 9, no. 3, pp. 343–361, 1998.
24. J. Argyris, I. Andreadis, G. Pavlos, and M. Athanasiou, “On the Influence of Noise on the Largest Lyapunov Exponent and on the Geometric Structure of Attractors,” *Chaos, Solitons and Fractals*, vol. 9, no. 6, pp. 947–958, 1998.
25. W. W. Tung, J. B. Gao, J. Hu, and L. Yang, “Recovering chaotic signals in heavy noise environments,” *Physical Review E*, vol. 83, Article ID 046210, 2011.
26. J. Gao, H. Sultan, J. Hu, and W. W. Tung, “Denoising nonlinear time series by adaptive filtering and wavelet shrinkage: a comparison,” *IEEE Signal Processing Letters*, vol. 17, no. 3, pp. 237–240, 2010.
27. P. Grassberger and I. Procaccia, “Characterization of strange attractors,” *Physical Review Letters*, vol. 50, no. 5, pp. 346–349, 1983.
28. P. Grassberger and I. Procaccia, “Measuring the strangeness of strange attractors,” *Physica D*, vol. 9, no. 1-2, pp. 189–208, 1983.