Abstract
Experimental data are often very complex since the underlying dynamical system may be unknown and the data may be heavily corrupted by noise. Properly analyzing the data to extract maximal information about the underlying dynamical system is a crucial task. This paper presents a novel principal component analysis (PCA) method based on symplectic geometry, called symplectic PCA (SPCA), to study nonlinear time series. Being nonlinear, it differs from the traditional PCA method based on linear singular value decomposition (SVD) and is therefore expected to represent nonlinear, especially chaotic, data better than PCA. Using chaotic Lorenz time series data, we show that this is indeed the case. Furthermore, we show that SPCA can conveniently reduce measurement noise.
1. Introduction
Data measured in experimental situations, especially in real environments, can be very complex since the underlying dynamical system may be nonlinear and of unknown structure, and the data may be very noisy. It is challenging to appropriately analyze the measured data, especially noisy data. Since the discovery of chaotic phenomena, interpreting the irregular dynamics of various systems as a deterministic chaotic process has become popular and widely used in almost all fields of science and engineering. A number of important algorithms based on chaos theory have been employed to infer the system dynamics from data or to reduce noise in data [1–6]. The first step of these approaches is to reconstruct a phase space from the data so that the dynamic characteristics of the system can be properly studied [7]. This is achieved using Takens' embedding theorem [8], which states that, in the noise-free case, the system dynamics can be reconstructed from a one-dimensional signal, that is, a time series. However, actual systems may often be noisy, sometimes so noisy that the reconstructed attractor of the nonlinear system exhibits different features when different analysis techniques are used [9–12]. Therefore, appropriate analysis of measured data is a critical task in the fields of science and engineering. In this work, we propose a novel nonlinear analysis method based on symplectic geometry and principal component analysis, called symplectic principal component analysis (SPCA).
Symplectic geometry is a geometry of phase space and is intrinsically nonlinear, so it can describe system structure, especially nonlinear structure, very well. It has been used to study various nonlinear dynamical systems [13–15] since Feng [16] proposed a symplectic algorithm for solving Hamiltonian differential equations. However, from the viewpoint of data analysis, little of the literature has employed symplectic geometry theory to explore the dynamics of a system. Our previous works proposed estimating the embedding dimension from a time series based on symplectic geometry [17–20]. Subsequently, Niu et al. used our method to evaluate sprinters' surface EMG signals [21], and Xie et al. [22] proposed a kind of symplectic geometry spectrum based on our work. In this paper, we show that SPCA can well represent chaotic time series and reduce noise in chaotic data.
2. Method
Consider a dynamical system defined in a phase space. A discretized trajectory at times t_i = iΔt, i = 1, 2, …, may be described by maps of the form x_{i+1} = f(x_i). In SPCA, a fundamental step is to build the multidimensional structure (attractor) in symplectic geometry space. Here, in terms of Takens' embedding theorem, we first construct an attractor in phase space, that is, the trajectory matrix X, from a time series. Then, we describe the symplectic principal component analysis (SPCA) based on symplectic geometry theory and give its corresponding algorithm.
2.1. Attractor Reconstruction
Let the measured data (the observable of the system under study) be x = {x_1, x_2, …, x_n}, recorded with sampling interval Δt; n is the number of samples. Takens' embedding theorem states that if the time series is indeed composed of scalar measurements of the state of a dynamical system, then, under certain genericity assumptions, a one-to-one image of the original set is given by the time-delay embedding, provided the embedding dimension is large enough. That is, the time-delay embedding provides the map into the d-dimensional series X_i = [x_i, x_{i+τ}, …, x_{i+(d−1)τ}]^T, i = 1, 2, …, m, where d is the embedding dimension, τ is the time delay, m = n − (d−1)τ is the number of points in the d-dimensional reconstructed attractor, and X = [X_1, X_2, …, X_m] denotes the trajectory matrix of the dynamical system in phase space, that is, the attractor in phase space.
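As an illustrative sketch, the time-delay reconstruction above can be coded as follows; the function name and the toy series are ours, not from the paper:

```python
import numpy as np

def delay_embed(x, d, tau=1):
    """Build the trajectory matrix X from a scalar time series x.

    Each column X[:, i] = [x[i], x[i+tau], ..., x[i+(d-1)*tau]] is one
    reconstructed phase-space point; m = n - (d-1)*tau points result.
    """
    n = len(x)
    m = n - (d - 1) * tau
    return np.array([x[i:i + m] for i in range(0, d * tau, tau)])

x = np.sin(0.1 * np.arange(100))   # toy series standing in for measured data
X = delay_embed(x, d=8, tau=1)
print(X.shape)  # (8, 93)
```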
2.2. Symplectic Principal Component Analysis
SPCA is a PCA approach based on symplectic geometry. Its idea is to map the investigated complex system into symplectic space and elucidate the dominant features underlying the measured data. The first few largest components capture the main relationship between the variables in symplectic space; the remaining components consist of the less important components or noise in the measured data. The geometry used in symplectic space is called symplectic geometry. Different from Euclidean geometry, symplectic geometry is an even-dimensional geometry with a special symplectic structure. It depends on a bilinear, antisymmetric, nonsingular cross product, the symplectic cross product [x, y] = x^T J y, where, for a 2n-dimensional space, J = [0 I_n; −I_n 0]. The measure in symplectic space is an area scale. In symplectic space, the length of an arbitrary vector always equals zero and has no significance, while the concept of an orthogonal cross-course exists. In symplectic geometry, the symplectic transform is in essence a nonlinear transform, also called a canonical transform; since it has measure-preserving characteristics, it keeps the natural properties of the original data unchanged. It is thus fit for nonlinear dynamical systems.
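The defining properties of the symplectic cross product can be verified numerically. The following sketch (helper names are ours) checks antisymmetry and the vanishing "length" of every vector:

```python
import numpy as np

n = 2
# Standard symplectic structure matrix J = [0 I; -I 0] in 2n dimensions.
J = np.block([[np.zeros((n, n)), np.eye(n)],
              [-np.eye(n), np.zeros((n, n))]])

def sym_cross(x, y):
    """Symplectic cross product [x, y] = x^T J y."""
    return x @ J @ y

rng = np.random.default_rng(0)
x, y = rng.standard_normal(2 * n), rng.standard_normal(2 * n)
print(np.isclose(sym_cross(x, y), -sym_cross(y, x)))  # True: antisymmetric
print(np.isclose(sym_cross(x, x), 0.0))               # True: "length" is zero
```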
The symplectic principal components are given by a symplectic similarity transform, analogous to SVD-based PCA, and the corresponding eigenvalues can be obtained by a symplectic method. Here, we first construct the autocorrelation matrix A = X X^T of the trajectory matrix X. Then, the matrix A can be transformed into a Hamilton matrix in symplectic space.
Theorem 2.1. Any d×d matrix A can be made into a Hamilton matrix. Let M = [A 0; 0 −A^T]; then JM = (JM)^T, that is, M is a Hamilton matrix.
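Theorem 2.1 admits a quick numerical check; this is illustrative code, not the paper's implementation:

```python
import numpy as np

def hamilton_embed(A):
    """Embed a d x d matrix A as the 2d x 2d matrix M = [A 0; 0 -A^T]."""
    Z = np.zeros_like(A)
    return np.block([[A, Z], [Z, -A.T]])

def is_hamilton(M):
    """M is a Hamilton matrix iff J M is symmetric, with J = [0 I; -I 0]."""
    n = M.shape[0] // 2
    J = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-np.eye(n), np.zeros((n, n))]])
    JM = J @ M
    return np.allclose(JM, JM.T)

A = np.random.default_rng(0).standard_normal((4, 4))
M = hamilton_embed(A)
print(is_hamilton(M))  # True
```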
Theorem 2.2. A Hamilton matrix remains Hamiltonian under symplectic similarity transforms.
Theorem 2.3. Let M be a Hamilton matrix; then e^M is a symplectic matrix.
Theorem 2.4. Let S be a symplectic matrix; then there is a decomposition S = QR, where Q is a symplectic unitary matrix and R is an upper triangular matrix.
Theorem 2.5. The product of symplectic matrices is also a symplectic matrix.
Theorem 2.6. Suppose that the Householder matrix H is H = [P 0; 0 P], where P = I − 2ww^*/(w^*w), w ≠ 0; then H is a symplectic unitary matrix. Here, (·)^* denotes conjugate transposition.
Proof. To prove that the matrix H is a symplectic matrix, we only need to prove that H^*JH = J, where J = [0 I; −I 0]. Since P^* = P and P^2 = (I − 2ww^*/(w^*w))^2 = I, we have H^*JH = [0 P^2; −P^2 0] = J, so H is symplectic; moreover, H^*H = [P^2 0; 0 P^2] = I, so H is also unitary.
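Theorem 2.6 and its proof can also be checked numerically; the helper names in this sketch are ours:

```python
import numpy as np

def symplectic_householder(w):
    """H = [P 0; 0 P] with P = I - 2 w w^T / (w^T w), as in Theorem 2.6
    (real case, so conjugate transposition reduces to transposition)."""
    w = np.asarray(w, dtype=float).reshape(-1, 1)
    n = w.shape[0]
    P = np.eye(n) - 2.0 * (w @ w.T) / (w.T @ w)
    Z = np.zeros((n, n))
    return np.block([[P, Z], [Z, P]])

n = 4
J = np.block([[np.zeros((n, n)), np.eye(n)],
              [-np.eye(n), np.zeros((n, n))]])
H = symplectic_householder(np.array([1.0, -2.0, 0.5, 3.0]))
print(np.allclose(H.T @ J @ H, J))           # True: symplectic
print(np.allclose(H.T @ H, np.eye(2 * n)))   # True: unitary (orthogonal)
```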
For the Hamilton matrix M, the eigenvalues can be given by symplectic similarity transforms, and the primary 2d-dimensional problem can be reduced to a d-dimensional one [17–19], as follows: (i) Let M = [A 0; 0 −A^T]. (ii) Construct a symplectic matrix Q that reduces M to upper Hessenberg form; the matrix Q may be a symplectic Householder matrix H = [P 0; 0 P]. If the matrix A is a real symmetric matrix, Q can be taken as H. Then one gets an upper Hessenberg matrix, namely B = PAP^T. (iii) Calculate the eigenvalues μ_1, μ_2, …, μ_d by the symplectic decomposition method; if A is a real symmetric matrix, the eigenvalues of A are equal to those of B. (iv) Sort these eigenvalues in descending order, that is, μ_1 ≥ μ_2 ≥ … ≥ μ_d.
Thus, the calculation in the 2d-dimensional space is transformed into one in the d-dimensional space. The μ_i are the symplectic principal component spectra of A with relevant symplectic orthonormal bases. In the so-called noise floor, the values of μ_i reflect the noise level in the data [18, 19]. The corresponding matrix Q denotes the symplectic eigenvectors of A.
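The reduction from the 2d-dimensional Hamilton matrix back to a d-dimensional problem can be illustrated as follows; since A is real symmetric here, a standard symmetric eigendecomposition serves as the reference (a sketch, not the paper's code):

```python
import numpy as np

d = 8
X = np.random.default_rng(1).standard_normal((d, 200))   # toy trajectory matrix
A = X @ X.T                                              # real symmetric d x d

# Hamilton embedding M = [A 0; 0 -A^T]: its 2d eigenvalues are those of A
# together with their negatives, so the d largest recover the spectrum of A.
Z = np.zeros_like(A)
M = np.block([[A, Z], [Z, -A.T]])
mu = np.sort(np.linalg.eigvals(M).real)[::-1][:d]        # d largest of M
mu_direct = np.sort(np.linalg.eigvalsh(A))[::-1]         # spectrum of A
print(np.allclose(mu, mu_direct))  # True
```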
2.3. Proposed Algorithm
For measured data x = {x_1, …, x_n}, our proposed algorithm consists of the following steps: (1) Reconstruct the attractor X from the measured time series, where d is the embedding dimension of the matrix X and m = n − (d−1)τ. (2) Remove the mean value of each row of the matrix X. (3) Build the real symmetric matrix A, that is, A = X X^T. Here, d should be larger than the dimension of the system, in terms of Takens' embedding theorem. (4) Calculate the symplectic principal components of the matrix A by symplectic decomposition, and obtain the Householder transform matrix Q. (5) Construct the corresponding principal eigenvalue matrix W according to the number k of chosen symplectic principal components of the matrix A. That is, W_i = Q_i when i ≤ k; otherwise, W_i = 0. In use, k can be chosen according to (2.16). (6) Get the transformed coefficients S = W^T X (2.18). (7) Reestimate the trajectory matrix from X̂ = W S (2.19). Then the reestimated data x̂ can be given. (8) For a noisy time series, the first estimation of the data is usually not good. Here, one can go back to step (6), letting X = X̂ in (2.18), and do steps (6) and (7) again. Generally, the second estimated data will be better than the first estimated data.
Besides, it is necessary to note that, for a clean time series, step (8) is unnecessary.
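The steps above can be sketched end to end as follows. Two caveats: the symplectic eigen-decomposition of step (4) is replaced here by a symmetric eigendecomposition of A = X X^T, which for real symmetric A yields the same spectrum (Section 2.2), and the final diagonal averaging that maps the filtered trajectory matrix back to a single series is our own choice, as the paper does not spell out this detail:

```python
import numpy as np

def spca_filter(x, d=8, k=3, tau=1, passes=1):
    """Sketch of steps (1)-(8) of the proposed algorithm."""
    n = len(x)
    m = n - (d - 1) * tau
    X = np.array([x[i:i + m] for i in range(0, d * tau, tau)])  # step (1)
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                                               # step (2)
    for _ in range(passes):                                     # step (8): extra pass
        A = Xc @ Xc.T                                           # step (3)
        mu, Q = np.linalg.eigh(A)                               # step (4), ascending
        W = Q[:, ::-1][:, :k]                                   # step (5): k largest
        S = W.T @ Xc                                            # step (6): coefficients
        Xc = W @ S                                              # step (7): re-estimate
    Xhat = Xc + mean
    # Average over anti-diagonals to recover a single series (our assumption).
    return np.array([np.mean([Xhat[i, t - i] for i in range(d)
                              if 0 <= t - i < m]) for t in range(n)])

clean = np.sin(0.2 * np.arange(500))                    # toy "clean" signal
noisy = clean + 0.3 * np.random.default_rng(0).standard_normal(500)
denoised = spca_filter(noisy, d=8, k=2, passes=2)
rmse = lambda a, b: np.sqrt(np.mean((a - b) ** 2))
print(rmse(denoised, clean) < rmse(noisy, clean))  # True
```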
3. Numerical and Experimental Data
In order to investigate the feasibility of SPCA, this paper employs the chaotic Lorenz system, dx/dt = σ(y − x), dy/dt = x(ρ − z) − y, dz/dt = xy − βz, (3.1) with parameters in the chaotic regime, observed as a time series with additive noise η_t, where η_t is white Gaussian measurement noise. Measurement noise is used because all real measurements are polluted by noise. For more details on notions of noise, refer to the literature [23–26].
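A sketch of how such a test series can be generated; the classic parameter values σ = 10, ρ = 28, β = 8/3 and the 4th-order Runge-Kutta integrator are our assumptions, since (3.1) is not reproduced here in full:

```python
import numpy as np

def lorenz_series(n, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0,
                  noise_std=0.0, seed=0, transient=1000):
    """Integrate the Lorenz system with 4th-order Runge-Kutta and return
    the x-component, optionally with additive Gaussian measurement noise."""
    def f(s):
        x, y, z = s
        return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])
    s = np.array([1.0, 1.0, 1.0])
    out = np.empty(n)
    rng = np.random.default_rng(seed)
    for i in range(transient + n):           # discard the initial transient
        k1 = f(s); k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2); k4 = f(s + dt * k3)
        s = s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        if i >= transient:
            out[i - transient] = s[0]
    return out + noise_std * rng.standard_normal(n)

x = lorenz_series(2000)          # clean series; set noise_std > 0 for noisy data
```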
4. Performance Evaluation
SPCA, like PCA, can not only represent the original data by capturing the relationship between the variables but also reduce the contribution of errors in the original data. Therefore, this paper studies the performance of SPCA from these two viewpoints, that is, representation of chaotic signals and noise reduction in chaotic signals.
4.1. Representation of Chaotic Signals
We first show that, for a clean chaotic time series, SPCA can perfectly reconstruct the original data in a high-dimensional space. We first embed the original time series into a phase space. Considering that the dimension of the Lorenz system is 3, the embedding dimension d of the matrix X is chosen as 8 in our SPCA analysis. To quantify the difference between the original data and the SPCA-filtered data, we employ the root-mean-square error (RMSE) as a measure: RMSE = sqrt((1/n) Σ_{i=1}^{n} (x_i − x̂_i)^2), where x_i and x̂_i are the original data and estimated data, respectively.
For the clean data, the RMSE values are lower than 10−14 (see Figure 1). In Figure 1, the original data are generated by (3.1) in the noise-free case, and the estimated data are obtained by SPCA. The results show that the SPCA method is better than PCA. Since real systems are usually unknown, it is necessary to study the effect of sampling time, data length, and noise on the SPCA approach. From Figures 1 and 2, we can see that sampling time and data length have little effect on the SPCA method in the noise-free case.
For analyzing noisy data, we use the percentage of the principal components (PCs) to study the occupancy rate of each PC in order to reduce noise. The percentage of PCs is defined by P_i = μ_i / (Σ_{j=1}^{d} μ_j) × 100%, where d is the embedding dimension and μ_i is the ith principal component value. From Figure 3, we find that the first (largest) symplectic principal component (SPC) of the SPCA is a little larger than that of the PCA and accounts for almost the entire proportion of the symplectic principal components. This shows that it is feasible for the SPCA to perform principal component analysis of time series.
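The occupancy rate is straightforward to compute from a spectrum (illustrative sketch with made-up component values):

```python
import numpy as np

def pc_percentages(mu):
    """Occupancy rate P_i = mu_i / sum_j mu_j * 100% of each
    principal component value mu_i (assumed sorted descending)."""
    mu = np.asarray(mu, dtype=float)
    return 100.0 * mu / mu.sum()

p = pc_percentages([50.0, 30.0, 15.0, 5.0])   # toy spectrum
print(p)  # [50. 30. 15.  5.]
```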
Next, we study the reduced space spanned by a few largest symplectic principal components (SPCs) to estimate the chaotic Lorenz time series (see Figure 4). In Figure 4, the data are generated with a sampling time of 0.01 from the chaotic Lorenz system, and the estimated data are calculated from the first three largest SPCs. The average error and standard deviation between the original data and the estimated data are and , respectively. The estimated data are very close to the original data, not only in the time domain (see Figure 4(a)) but also in phase space (see Figure 4(b)). We further explore the effect of sampling time for different numbers of PCs. For PC numbers of and , respectively, the SPCA and PCA RMSE values change with the sampling time as shown in Figure 5. We can see that the RMSE values of the SPCA are smaller than those of the PCA, and the sampling time has less impact on the SPCA than on the PCA. The data length also has less effect on the SPCA than on the PCA (see Figure 6).
Compared with PCA, the results of SPCA are better in Figures 4, 5, and 6. We can see that the SPCA method keeps the essential dynamical character of the original time series generated by a chaotic continuous system. This indicates that the SPCA can reflect the intrinsic nonlinear characteristics of the original time series. Moreover, the SPCA can elucidate the dominant features underlying the observed data, which helps to retrieve dominant patterns from noisy data. For this reason, we next study the feasibility of the proposed algorithm for noise reduction, using noisy chaotic Lorenz data.
4.2. Noise Reduction in Chaotic Signals
For the noisy Lorenz data, the phase diagrams of the noisy and clean data are given in Figures 7(a) and 7(b). The clean data are the noise-free chaotic Lorenz data (see (3.1)). The noisy data are the chaotic Lorenz data with Gaussian white noise of zero mean and unit variance (see (3.1)). The sampling time is 0.01, and the time delay is 11 in Figure 7. It is obvious that the noise is very strong. The first denoised data are obtained by the proposed SPCA algorithm (see Figures 7(c)–7(f)). Here, we first build an attractor with an embedding dimension of 8; then, the transform matrix W is constructed. The first denoised data are generated by (2.18) and (2.19). In Figure 7(c), the first denoised data are compared with the noisy Lorenz data in the time domain. Figure 7(d) shows the corresponding phase diagram of the first denoised data. Compared with Figure 7(a), the first denoised data can basically recover the structure of the original system. In order to obtain better results, the noise in these denoised data is reduced again by step (8). We can see that, after the second noise reduction, the results are greatly improved in Figures 7(e) and 7(f). The curves of the second denoised data are better than those of the first denoised data, whether in the time domain or in phase space, by contrast with Figures 7(c) and 7(d). Figure 7(g) shows the first denoised result given by the PCA technique. Following our algorithm, we apply the PCA to the first denoised data again (see Figure 7(h)).
(a) phase diagram for noisy data
(b) phase diagram for clean data
(c) Lorenz time series
(d) phase diagram for the SPCA
(e) phase diagram for the SPCA
(f) the second denoised data
(g) phase diagram for PCA
(h) phase diagram for PCA
Some of the noise has been further reduced, but the PCA curve is not better than that of the SPCA in Figure 7(e). The reason is that PCA is indeed a linear method. When nonlinear structures have to be considered, it can be misleading, especially in the case of a large sampling time (see Figure 8). The PCA program code used here comes from the TISEAN tools (http://www.mpipks-dresden.mpg.de/~tisean/).
Figure 8 shows the variation of the correlation dimension with embedding dimension at a sampling time of 0.1 for the clean, noisy, and denoised Lorenz data. We can observe that, for the clean and the SPCA-denoised data, the curves flatten out in the vicinity of 2. For the noisy data, the curve increases constantly and has no plateau. For the PCA-denoised data, the curve also increases and tends to a plateau near 2; however, this plateau is smaller than that of the SPCA, so the PCA is less effective than the SPCA algorithm. This indicates that it is difficult for the PCA to describe the nonlinear structure of a system, because the correlation dimension manifests the nonlinear properties of chaotic systems. Here, the correlation dimension is estimated by the Grassberger-Procaccia algorithm [27, 28].
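A crude version of the Grassberger-Procaccia estimate can be sketched as follows; a real estimate fits the slope over a whole scaling region rather than the two scales used here, and the circle test set is our illustration:

```python
import numpy as np

def correlation_sum(X, r):
    """Correlation sum C(r): fraction of distinct point pairs in the
    reconstructed attractor X (dim x m) closer than r."""
    m = X.shape[1]
    d2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)
    mask = ~np.eye(m, dtype=bool)            # exclude self-pairs
    return np.mean(d2[mask] < r * r)

def correlation_dimension(X, r1, r2):
    """Slope of log C(r) vs log r between two scales: a crude D2 estimate."""
    c1, c2 = correlation_sum(X, r1), correlation_sum(X, r2)
    return (np.log(c2) - np.log(c1)) / (np.log(r2) - np.log(r1))

# Sanity check on a set of known dimension: points on a unit circle.
theta = np.random.default_rng(2).uniform(0.0, 2.0 * np.pi, 800)
circle = np.vstack([np.cos(theta), np.sin(theta)])
D = correlation_dimension(circle, 0.1, 0.5)
print(D)  # close to 1 for a one-dimensional curve
```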
5. Discussion and Conclusion
In this paper, we have proposed a novel PCA based on symplectic geometry, called SPCA. From a theoretical viewpoint, this method can reflect the nonlinear structure of nonlinear dynamical systems very well because it is intrinsically nonlinear. Using chaotic Lorenz data and computing the RMSE, the PC percentages, the correlation dimension, and phase space diagrams, we have shown that the SPCA method can yield more reliable results than the classic PCA for chaotic time series over a wider range of data lengths and sampling times, especially for short data lengths and undersampled sampling times. With regard to noise reduction, the SPCA algorithm is also more effective than PCA.
We wish to emphasize that SPCA has a phase delay property; that is, the second row of the SPCA-filtered data is closer to the original data. This is worth further investigation in the future.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (no. 10872125), Science Fund for Creative Research Groups of the National Natural Science Foundation of China (no. 50821003), State Key Lab of Mechanical System and Vibration, Project supported by the Research Fund of State Key Lab of MSV, China (Grant no. MSV-MS-2010-08) and Science and Technology Commission of Shanghai Municipality (no. 06ZR14042). The authors also thank Dr. Gao Jianbo for many valuable discussions.