Empirical Mode Decomposition Combined with Local Linear Quantile Regression for Automatic Boundary Correction
Empirical mode decomposition (EMD) is particularly useful in analyzing nonstationary and nonlinear time series. However, because the underlying time series has bounded support, only the data within the boundaries are available. Consequently, applying EMD to finite time series produces large biases and artificial wiggles at the edges. This study introduces a new two-stage method to automatically decrease the boundary effects present in EMD. At the first stage, local linear quantile regression (LLQ) is applied to provide an efficient description of the corrupted and noisy data. The remaining series is assumed to be hidden in the residuals. Hence, EMD is applied to the residuals at the second stage. The final estimate is the sum of the fitted estimates from LLQ and EMD. A simulation was conducted to assess the practical performance of the proposed method. The results show that the proposed method is superior to classical EMD.
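1. Introduction
We consider the following general nonparametric regression model: $$y_i = m(x_i) + \varepsilon_i, \quad i = 1, \dots, n, \tag{1}$$ where $y_i$ is the response variable, $x_i$ is a covariate, $m(\cdot)$ is assumed to be a smooth nonparametric function, and $\varepsilon_i$ represents independent and identically distributed random errors with mean 0 and variance $\sigma^2$.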
Empirical mode decomposition (EMD) is a form of analysis based on nonparametric methods . This technique is particularly useful for analyzing nonlinear and nonstationary time series, and it has been widely applied over the last few years to analyze data in different disciplines, such as biology, finance, engineering, and climatology. As a fully adaptive method that handles nonlinear and nonstationary signal behavior, EMD can enhance estimation performance. However, EMD suffers from problems related to boundary extension, curve fitting, and stopping criteria . Such problems may corrupt the entire data and result in a misleading conclusion . Given that finite data are involved, the algorithms must be adjusted to use certain boundary conditions. In EMD, the end points are a particular source of difficulty: their influence propagates into the data range during sifting. Data extension (or data prediction) is a risky procedure even for linear and stationary processes and is more difficult for nonlinear and nonstationary processes. The work in  indicated that only the values and locations of the next several extrema, and not all extended data, need to be predicted for EMD. Widely used approaches, such as the characteristic wave extending method, mirror extending method , data extending method , data reconstruction method , and similarity searching method , were proposed to overcome the problem and generate a more reasonable solution. The work in  introduced quantile regression, a significant extension of traditional parametric and nonparametric regression methods. Quantile regression has been widely used in statistics since its introduction because of its ease of interpretation, robustness, and numerous applications in important areas, such as medicine, economics, environment modeling, toxicology, and engineering [9, 10].
A robust version of classical local linear regression (LLR), known as local linear quantile regression (LLQ) [11, 12], has increasingly drawn interest. Owing to its robustness, LLQ exhibits excellent boundary adjustment. It can also distinguish systematic differences in dispersion, tail behavior, and other features with respect to covariates more efficiently [12, 13].
The current study aims to use the advantages of LLQ to automatically reduce the boundary effects of EMD instead of using classical boundary solutions mentioned previously. The proposed method consists of two stages that automatically decrease the boundary effects of EMD. At the first stage, LLQ is applied to the corrupted and noisy data. The remaining series is then expected to be hidden in the residuals. At the second stage, EMD is applied to the residuals. The final estimate is the summation of the fitting estimates from LLQ and EMD. Compared with EMD, this combination obtains more accurate estimates.
The remainder of this study is organized as follows. In Section 2, we present a brief background of EMD and LLQ. Section 3 introduces the proposed method. Section 4 compares the results of the original EMD algorithm and the proposed new boundary adjustment by simulation experiments. Conclusions are drawn in Section 5.
2. Background
2.1. History of Boundary Treatment in Nonparametric Estimators
Most nonparametric techniques, such as kernel regression, wavelet thresholding, and empirical mode decomposition, show a sharp increase in variance and bias at points near the boundary. Many works in the literature have sought to reduce the effects of the boundary problem. For kernel regression solutions, see [14, 15]. For wavelet thresholding, in addition to the use of periodic or symmetric assumptions, the authors in [16, 17] used polynomial regression to alleviate the boundary problem. For empirical mode decomposition, the authors in  proposed a ratio extension on the boundary instead of the traditional mirror extension. The authors in  applied a neural network to each intrinsic mode function (IMF) to restrain the end effect. The work in  provided an algorithm based on the sigma-pi neural network, which is used to extend signals before applying EMD. The authors in  proposed a new approach that couples mirror expansion with the extrapolation prediction of a regression function to solve the boundary problem. The algorithm includes two steps: first, the signal is extrapolated at both endpoints through support vector (SV) regression to form the primary expansion signal; then, the primary signal is further expanded through extrema mirror expansion, and EMD is performed on the resulting signal to reduce the end effects.
In this paper we follow the strategies of  and  to handle the end effects of the boundary problem in EMD. Instead of classical polynomial nonparametric regression, we use a more robust nonparametric estimator, local linear quantile regression. Practical justifications for choosing this estimator are given in Section 2.5.
2.2. Empirical Mode Decomposition (EMD)
EMD  has proven to be a natural extension and an alternative technique to traditional methods for analyzing nonlinear and nonstationary signals, such as wavelet methods, Fourier methods, and empirical orthogonal functions . In this section, we briefly describe the EMD algorithm. The main objective of EMD is to decompose the data into small signals called intrinsic mode functions (IMFs). An IMF is a function whose upper and lower envelopes are symmetric and whose numbers of zero-crossings and extrema are equal or differ by at most one . The algorithm for extracting IMFs from a given time series $y(t)$ is called sifting and consists of the following steps.
(I) Set the initial estimates for the residue as $r_0(t) = y(t)$, $g_0(t) = r_{k-1}(t)$, $i = 0$, and the index of the IMF as $k = 1$.
(II) Construct the lower envelope $l_{\min,i}(t)$ and the upper envelope $l_{\max,i}(t)$ of the signal by interpolating its local minima and local maxima with cubic splines.
(III) Compute the mean values, $m_i(t)$, by averaging the upper envelope and the lower envelope as $m_i(t) = [l_{\max,i}(t) + l_{\min,i}(t)]/2$.
(IV) Subtract the mean from the signal, that is, $g_{i+1}(t) = g_i(t) - m_i(t)$, and set $i = i + 1$. Steps II to IV are repeated until $g_i(t)$ becomes an IMF. If so, the $k$th IMF is given by $\mathrm{IMF}_k(t) = g_i(t)$.
(V) Update the residue as $r_k(t) = r_{k-1}(t) - \mathrm{IMF}_k(t)$. This residual component is treated as new data and subjected to the process described above to calculate the next $\mathrm{IMF}_{k+1}(t)$.
(VI) Repeat the steps above until the final residual component $r(t)$ becomes a monotonic function, and then take it as the final estimate of the residue, $\hat{r}(t)$.

Many methods have been presented to extract trends from a time series. Freehand and least squares methods are the commonly used techniques; the former depends on the experience of users, and the latter is difficult to use when the original series are very irregular . EMD is another effective method for extracting trends .
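The sifting steps (I) to (VI) can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: the function names (`emd`, `envelope`, `spline_through`), the fixed number of sifting passes used as the stopping rule, and the choice to add both end points as spline knots are all assumptions made here for brevity.

```python
import math

def local_extrema(g):
    """Indices of the interior local maxima and minima of the sequence g."""
    maxima, minima = [], []
    for i in range(1, len(g) - 1):
        if g[i - 1] < g[i] >= g[i + 1]:
            maxima.append(i)
        elif g[i - 1] > g[i] <= g[i + 1]:
            minima.append(i)
    return maxima, minima

def spline_through(knots, values, n):
    """Evaluate the natural cubic spline through (knots, values) at t = 0..n-1."""
    m = len(knots) - 1
    h = [knots[i + 1] - knots[i] for i in range(m)]
    alpha = [0.0] * (m + 1)
    for i in range(1, m):
        alpha[i] = 3 * ((values[i + 1] - values[i]) / h[i]
                        - (values[i] - values[i - 1]) / h[i - 1])
    diag, mu, z = [1.0] * (m + 1), [0.0] * (m + 1), [0.0] * (m + 1)
    for i in range(1, m):  # forward sweep of the tridiagonal system
        diag[i] = 2 * (knots[i + 1] - knots[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    b, c, d = [0.0] * m, [0.0] * (m + 1), [0.0] * m
    for j in range(m - 1, -1, -1):  # back substitution
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (values[j + 1] - values[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])
    out, j = [], 0
    for t in range(n):
        while j < m - 1 and t > knots[j + 1]:
            j += 1
        dt = t - knots[j]
        out.append(values[j] + b[j] * dt + c[j] * dt ** 2 + d[j] * dt ** 3)
    return out

def envelope(g, extrema):
    """Step II: spline envelope; end points are added as knots (an assumption)."""
    knots = [0] + extrema + [len(g) - 1]
    return spline_through(knots, [g[k] for k in knots], len(g))

def emd(y, max_imfs=10, n_sift=10):
    """Return (imfs, residue); n_sift fixed passes stand in for an IMF test."""
    residue, imfs = list(y), []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if not maxima or not minima:  # no oscillation left: treat as the trend
            break
        g = residue[:]
        for _ in range(n_sift):
            maxima, minima = local_extrema(g)
            if not maxima or not minima:
                break
            upper, lower = envelope(g, maxima), envelope(g, minima)
            g = [gi - 0.5 * (u + l) for gi, u, l in zip(g, upper, lower)]  # steps III-IV
        imfs.append(g)
        residue = [r - c for r, c in zip(residue, g)]  # step V
    return imfs, residue

# Two tones plus a linear trend, as a toy input.
signal = [math.sin(2 * math.pi * t / 20) + 0.5 * math.sin(2 * math.pi * t / 100) + 0.01 * t
          for t in range(200)]
imfs, residue = emd(signal)
```

By construction, the IMFs and the final residue add back up to the input. How the envelopes are anchored at the two ends is exactly where the boundary problem discussed in this paper enters.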
2.3. Local Linear Quantile Regression (LLQ)
The seminal study by  introduced parametric quantile regression, which can be considered an alternative to classical regression in both parametric and nonparametric fields. Many models for the nonparametric approach, including locally polynomial quantile regression by  and kernel methods by , have since been introduced into the statistical literature. In this paper we adopt local linear quantile regression (LLQ) introduced by .
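Let $\{(x_i, y_i),\ i = 1, \dots, n\}$ be bivariate observations. To estimate the $\tau$th conditional quantile function of the response $Y$ given $X = x$, define $$q_\tau(x) = Q_Y(\tau \mid X = x).$$ Let $K$ be a positive symmetric unimodal kernel function and consider the following weighted quantile regression problem: $$\min_{(\beta_0, \beta_1) \in \mathbb{R}^2} \sum_{i=1}^{n} w_i(x)\, \rho_\tau\bigl(y_i - \beta_0 - \beta_1 (x_i - x)\bigr), \tag{2}$$ where $w_i(x) = K((x_i - x)/h)/h$, $h$ is the bandwidth, and $\rho_\tau(u) = u\,(\tau - I(u < 0))$ is the check function. Once the covariate observations are centered at point $x$, the estimate of $q_\tau(x)$ is simply $\hat\beta_0$, which is the first component of the minimizer of (2). The second component, $\hat\beta_1$, provides an estimate of the slope of the function at point $x$.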
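The weighted quantile regression problem above can be solved exactly for small local samples, as the following sketch shows. It relies on the linear-programming property that an optimal line passes through at least two observations with positive weight, so it simply tries every such pair. The function names, the Epanechnikov kernel, and the brute-force search are assumptions made for illustration; a practical implementation would use a dedicated quantile regression solver.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def epanechnikov(u):
    """A positive, symmetric, unimodal kernel with support (-1, 1)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def llq_fit(x, y, x0, tau, h):
    """Return (b0, b1) minimizing sum_i w_i * rho_tau(y_i - b0 - b1 * (x_i - x0)).

    b0 estimates the tau-th conditional quantile at x0 and b1 its slope.
    """
    pts = [(xi - x0, yi, epanechnikov((xi - x0) / h) / h) for xi, yi in zip(x, y)]
    pts = [p for p in pts if p[2] > 0.0]  # keep observations inside the window
    best = None
    for a in range(len(pts)):
        for b in range(a + 1, len(pts)):
            (za, ya, _), (zb, yb, _) = pts[a], pts[b]
            if za == zb:
                continue
            b1 = (yb - ya) / (zb - za)  # candidate line through two observations
            b0 = ya - b1 * za
            loss = sum(w * check_loss(yi - b0 - b1 * zi, tau) for zi, yi, w in pts)
            if best is None or loss < best[0]:
                best = (loss, b0, b1)
    if best is None:
        raise ValueError("need at least two distinct x values inside the window")
    return best[1], best[2]

# Noiseless line y = 2 + 3x with one gross outlier, fitted at the left boundary.
xs = [i / 50 for i in range(51)]
ys = [2.0 + 3.0 * xi for xi in xs]
ys[5] += 100.0
b0, b1 = llq_fit(xs, ys, x0=0.0, tau=0.5, h=0.2)
```

In this toy example the median fit at the boundary point recovers the intercept 2 and the slope 3 despite the outlier, which illustrates the robustness that motivates the use of LLQ.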
The higher-order LLQ estimate is the minimizer of the following: $$\min_{\beta \in \mathbb{R}^{p+1}} \sum_{i=1}^{n} w_i(x)\, \rho_\tau\bigl(y_i - \beta_0 - \beta_1 (x_i - x) - \cdots - \beta_p (x_i - x)^p\bigr).$$ The choice of the bandwidth parameter $h$ significantly influences all nonparametric estimations. An excessively large $h$ obscures too much local structure by excessive smoothing. Conversely, an excessively small $h$ introduces too much variability by relying on very few observations in the local polynomial fitting .
2.4. Bandwidth Selection
The practical performance of $\hat{q}_\tau(x)$ depends strongly on the selected bandwidth parameter. In this study we adopt the strategy of . In summary, the automatic bandwidth selection strategy for smoothing conditional quantiles is as follows.
(1) Use ready-made and sophisticated methods to select $h_{\mathrm{mean}}$; we use the technique of .
(2) Use $h_\tau = h_{\mathrm{mean}} \{\tau(1-\tau)/\phi(\Phi^{-1}(\tau))^2\}^{1/5}$ to obtain all other $h_\tau$ from $h_{\mathrm{mean}}$.
Here, $\phi$ and $\Phi$ are the standard normal density and distribution function, and $h_{\mathrm{mean}}$ is a bandwidth parameter for regression mean estimation, which can be chosen with various existing methods. As can be seen, this procedure leads to identical bandwidths for the $\tau$ and $(1-\tau)$ quantiles.
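Step (2) is a one-line computation once a mean-regression bandwidth is available. The sketch below assumes the rule has the normal-reference form $h_\tau = h_{\mathrm{mean}}\{\tau(1-\tau)/\phi(\Phi^{-1}(\tau))^2\}^{1/5}$, which agrees with the description in the text (standard normal density and distribution function, identical bandwidths for $\tau$ and $1-\tau$). The function name `quantile_bandwidth` is hypothetical, and the value of `h_mean` is a placeholder for whatever a mean-regression selector returns.

```python
from statistics import NormalDist

def quantile_bandwidth(h_mean, tau):
    """Rescale a mean-regression bandwidth h_mean to the tau-th quantile."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    std_normal = NormalDist()            # standard normal: mean 0, sd 1
    z = std_normal.inv_cdf(tau)          # Phi^{-1}(tau)
    return h_mean * (tau * (1.0 - tau) / std_normal.pdf(z) ** 2) ** 0.2

h_mean = 1.0  # placeholder value, not taken from the study
h_25 = quantile_bandwidth(h_mean, 0.25)
h_50 = quantile_bandwidth(h_mean, 0.50)
h_75 = quantile_bandwidth(h_mean, 0.75)
```

Under this rule the median bandwidth is slightly larger than `h_mean`, and the bandwidth grows as $\tau$ moves toward 0 or 1, where fewer observations carry information about the quantile.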
2.5. The Behavior of Local Linear Quantile Estimator at Boundary Region
To examine the asymptotic behavior of the local linear quantile estimators at the boundaries, we state the following theorem, which is discussed in detail in . We omit the proofs and summarize the key points. Without loss of generality, one can consider only the left boundary point $x_0 = ch$, $0 < c < 1$, if $x$ takes values only from $[0, 1]$. A similar result holds for the right boundary point $x_0 = 1 - ch$.
Theorem 1 (see ). Consider the following assumptions.
(1) is twice continuously differentiable in a neighborhood of for any .
(2) is continuous and .
(3) is bounded and satisfies the Lipschitz condition.
(4) The kernel function is symmetric and has a compact support, say .
(5) is a strictly -mixing stationary process with mixing coefficient which satisfies for some positive real number and .
(6) with .
(7) is positive-definite and continuous in a neighborhood of .
(8) is continuous and positive-definite in a neighborhood of .
(9) The bandwidth satisfies and .
(10) for where is the conditional density of given .
(11) .
The asymptotic normality of the local linear quantile estimator at the left boundary point is given by where Further, the asymptotic normality of the local constant quantile estimator at the left boundary point for is where
From the above theorem, one can deduce that, at the boundaries, the asymptotic bias term for the local constant quantile estimate is of the order $h$, compared to the order $h^2$ for the local linear quantile estimate. Hence, the local linear estimate behaves well at the boundaries and needs no boundary correction. In other words, the local linear quantile estimate does not suffer from boundary effects, whereas the local constant quantile estimate does. Therefore, the local linear quantile estimate is preferable in practice.
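The boundary bias of the local constant quantile estimate can be seen in a noiseless toy example. The sketch below computes the kernel-weighted median of $y = x$ at the left end point $x_0 = 0$ for two bandwidths. The function names, the Epanechnikov kernel, and the test function are assumptions chosen for illustration; a local linear fit would reproduce this straight line exactly, so its bias here would be zero.

```python
def epanechnikov(u):
    """A positive, symmetric, unimodal kernel with support (-1, 1)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def local_constant_quantile(x, y, x0, tau, h):
    """Kernel-weighted tau-quantile: a minimizer of sum_i w_i * rho_tau(y_i - b0)."""
    pts = sorted((yi, epanechnikov((xi - x0) / h)) for xi, yi in zip(x, y))
    total = sum(w for _, w in pts)
    if total <= 0.0:
        raise ValueError("no observations inside the kernel window")
    acc = 0.0
    for yi, w in pts:
        acc += w
        if acc >= tau * total:  # first value whose cumulative weight reaches tau
            return yi
    return pts[-1][0]

n = 1000
xs = [i / n for i in range(n + 1)]
ys = xs[:]  # noiseless m(x) = x, so the true value at x0 = 0 is 0

bias_wide = local_constant_quantile(xs, ys, x0=0.0, tau=0.5, h=0.2)
bias_narrow = local_constant_quantile(xs, ys, x0=0.0, tau=0.5, h=0.1)
ratio = bias_wide / bias_narrow
```

Halving the bandwidth roughly halves the bias, which is the behavior expected from a bias term of order $h$: at the boundary the kernel window is one-sided, so the weighted median is pulled into the interior.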
3. Proposed Method
This section elaborates on the proposed method, which combines EMD and LLQ (EMD-LLQ). Since local linear quantile regression provides excellent boundary treatment , adding this component to empirical mode decomposition is expected to yield equally good boundary properties. Results from numerical experiments strongly support this claim.
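The basic idea behind the proposed method is to estimate the underlying function $m$ with the sum of a set of EMD functions, $\hat{m}_{\mathrm{EMD}}$, and an LLQ function, $\hat{m}_{\mathrm{LLQ}}$. That is, $$\hat{m}_{\mathrm{EMD-LLQ}} = \hat{m}_{\mathrm{LLQ}} + \hat{m}_{\mathrm{EMD}}.$$ We need to estimate the two components $\hat{m}_{\mathrm{LLQ}}$ and $\hat{m}_{\mathrm{EMD}}$ to obtain our proposed estimate, $\hat{m}_{\mathrm{EMD-LLQ}}$, by the following steps.
(1) Apply LLQ to the corrupted and noisy data $y_i$ and obtain the trend estimate $\hat{m}_{\mathrm{LLQ}}$.
(2) Determine the residuals from LLQ; that is, $e_i = y_i - \hat{m}_{\mathrm{LLQ}}(x_i)$.
(3) Apply EMD to $e_i$, given that the remaining series is expected to be hidden in the residuals. This step is accomplished by performing the following substeps.
  (I) Set the initial estimates for the residue as $r_0(t) = e(t)$, $g_0(t) = r_{k-1}(t)$, $i = 0$, and the index of the IMF as $k = 1$.
  (II) Construct the lower envelope $l_{\min,i}(t)$ and the upper envelope $l_{\max,i}(t)$ of the signal by interpolating its local minima and local maxima with cubic splines.
  (III) Calculate the mean values by averaging the upper envelope and the lower envelope, setting $m_i(t) = [l_{\max,i}(t) + l_{\min,i}(t)]/2$.
  (IV) Subtract the mean from the signal as $g_{i+1}(t) = g_i(t) - m_i(t)$ and set $i = i + 1$. Steps II to IV are repeated until $g_i(t)$ becomes an IMF. The $k$th IMF is then given as $\mathrm{IMF}_k(t) = g_i(t)$.
  (V) Update the residue as $r_k(t) = r_{k-1}(t) - \mathrm{IMF}_k(t)$. This residual component is regarded as new data and is subjected to the process described above to calculate the next $\mathrm{IMF}_{k+1}(t)$.
  (VI) Repeat the steps above until the final residual component $r(t)$ becomes a monotonic function. The final estimate of the residue, $\hat{r}(t)$, is then considered.
(4) The final estimate is the sum of the fitted estimates from LLQ and EMD, as follows: $$\hat{m}_{\mathrm{EMD-LLQ}}(x_i) = \hat{m}_{\mathrm{LLQ}}(x_i) + \hat{m}_{\mathrm{EMD}}(x_i).$$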
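The two-stage structure can be written as a small composition of two smoothers. The sketch below is only a skeleton: `emd_llq` takes the stage-one and stage-two fitters as arguments, and the two fitters shown (`constant_median` and `moving_average`) are deliberately simple, hypothetical stand-ins so that the example runs on its own. They are not LLQ or EMD, and how the EMD fit of the residuals is formed from the IMFs is left to the supplied function.

```python
def emd_llq(x, y, llq_smoother, emd_smoother):
    """Two-stage estimate: stage-one trend plus stage-two fit of its residuals.

    llq_smoother(x, y) -> fitted values at each x (stage 1, LLQ in the paper)
    emd_smoother(r)    -> fitted values for the residual series r (stage 2, EMD)
    """
    trend = llq_smoother(x, y)                              # step (1)
    residuals = [yi - ti for yi, ti in zip(y, trend)]       # step (2)
    detail = emd_smoother(residuals)                        # step (3)
    return [ti + di for ti, di in zip(trend, detail)]       # step (4)

def constant_median(x, y):
    """Hypothetical stand-in for stage 1: a flat line at the sample median."""
    med = sorted(y)[len(y) // 2]
    return [med] * len(y)

def moving_average(r, width=3):
    """Hypothetical stand-in for stage 2: a centered moving average."""
    half = width // 2
    out = []
    for i in range(len(r)):
        window = r[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

x = list(range(9))
y = [1.0, 2.0, 1.5, 3.0, 2.5, 4.0, 3.5, 5.0, 4.5]
fit = emd_llq(x, y, constant_median, moving_average)
```

Passing the smoothers as arguments keeps the combination step separate from the choice of bandwidth, quantile level, and EMD settings.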
4. Simulation Study
In this simulation, the software package was employed to evaluate classical EMD by  and the proposed combined method, EMD-LLQ. The following conditions were set.
(1) Three different test functions (Table 1).
(2) Three different values of the quantile $\tau$ (0.25, 0.50, and 0.75).
(3) Three different noise structures, namely:
  (a) normal distribution with zero mean and unit variance,
  (b) correlated noise from the first-order autoregressive model AR(1) with parameter 0.5,
  (c) heavy-tailed noise from the $t$ distribution with three degrees of freedom.
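The three noise structures can be generated with the Python standard library alone, as sketched below. The function names are hypothetical, and two details are assumptions not stated in the text: the AR(1) innovations are taken to be standard normal with the recursion started at zero, and the $t$ variates are built as a standard normal divided by the square root of a scaled chi-square.

```python
import math
import random

def normal_noise(n, rng):
    """(a) Independent N(0, 1) errors."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def ar1_noise(n, rng, phi=0.5):
    """(b) AR(1) errors e_t = phi * e_{t-1} + z_t with z_t ~ N(0, 1), e_0 = z_0."""
    errors, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        errors.append(prev)
    return errors

def t_noise(n, rng, df=3):
    """(c) Heavy-tailed errors from a t distribution with df degrees of freedom."""
    errors = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        errors.append(z / math.sqrt(chi2 / df))
    return errors

rng = random.Random(2014)  # fixed seed so the draws are reproducible
noise = {
    "normal": normal_noise(256, rng),
    "ar1": ar1_noise(256, rng),
    "t3": t_noise(256, rng),
}
```

The length 256 is only a placeholder for the sample size used in the study.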
Datasets were simulated from each of the three test functions with a sample size of (Figure 1). For each simulated dataset, the above two methods were applied to estimate the test function. In each case, 1,000 replications of the sample size were made. The mean squared error (MSE) was used as the numerical measure to assess the quality of the estimate. The MSE was calculated for those observations that were at most 10 sample points away from the boundaries of the test functions: $$\mathrm{MSE} = \frac{1}{|B|} \sum_{i \in B} \bigl(m(x_i) - \hat{m}(x_i)\bigr)^2,$$ where $B = \{1, \dots, 10,\ n - 9, \dots, n\}$ indexes the observations near the two boundaries.
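The boundary-restricted MSE is straightforward to compute. The sketch below interprets "at most 10 sample points away from the boundaries" as the first 10 and the last 10 observations, which is an assumption, and the function name `boundary_mse` is hypothetical.

```python
def boundary_mse(truth, estimate, k=10):
    """Mean squared error over the first k and last k observations only."""
    n = len(truth)
    if len(estimate) != n:
        raise ValueError("truth and estimate must have the same length")
    if n < 2 * k:
        raise ValueError("series is too short for two boundary regions of size k")
    idx = list(range(k)) + list(range(n - k, n))
    return sum((truth[i] - estimate[i]) ** 2 for i in idx) / len(idx)

# Toy check: errors of size 1 on the left edge, 0 in the middle, 2 on the right edge.
truth = [0.0] * 30
estimate = [1.0] * 10 + [0.0] * 10 + [2.0] * 10
edge_error = boundary_mse(truth, estimate)
```

Restricting the average to the edge regions keeps a good interior fit from masking poor behavior at the boundaries, which is the quantity of interest here.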
From the simulation results reported in Tables 2, 3, and 4, we observe the following. Regardless of the boundary assumptions, test functions, noise structures, and quantile values, the proposed method is consistently superior to classical EMD under the periodic, symmetric (mirror), and wave boundary conditions.
To ensure that the improvement in mean squared error is due to the proposed treatment and not to something else, we also evaluated the classical method and the proposed one with no boundary treatment at all. The simulation results show that, although the classical solutions help reduce the mean squared error, the improvement from the proposed method is much larger. Finally, to check whether the differences are significant, we used the Wilcoxon rank test. The test provides evidence that the proposed method achieves better performance near the boundaries than EMD: all $p$ values for the Wilcoxon test are less than 0.05.
5. Conclusions
In this study, a new two-stage method is introduced to decrease the effects of the boundary problem in EMD. The proposed method couples LLQ at the first stage with classical EMD at the second stage. The empirical performance of the proposed method was tested on different numerical experiments by simulation and real data application. The results of these experiments illustrate the improvement of the EMD estimation in terms of MSE.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The authors would like to thank the School of Mathematical Sciences, Universiti Sains Malaysia, for the financial support.
N. E. Huang, Z. Shen, S. R. Long et al., “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society A, vol. 454, no. 1971, pp. 903–995, 1998.
Z. Liu, “A novel boundary extension approach for empirical mode decomposition,” in Intelligent Computing, vol. 4113 of Lecture Notes in Computer Science, pp. 299–304, Springer, Berlin, Germany, 2006.
W. Wang, X. Li, and R. Zhang, “Boundary processing of HHT using support vector regression machines,” in Computational Science—ICCS 2007, vol. 4489 of Lecture Notes in Computer Science, pp. 174–177, Springer, Berlin, Germany, 2007.
J. Zhao and D. Huang, “Mirror extending and circular spline function for empirical mode decomposition method,” Journal of Zhejiang University Science, vol. 2, no. 3, pp. 247–252, 2001.
K. Zeng and M.-X. He, “A simple boundary process technique for empirical mode decomposition,” in Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS '04), pp. 4258–4261, September 2004.
Z. Zhao and Y. Wang, “A new method for processing end effect in empirical mode decomposition,” in Proceedings of the International Conference on Communications, Circuits and Systems (ICCCAS '07), pp. 841–845, July 2007.
M. Buchinsky, “Quantile regression, Box-Cox transformation model, and the U.S. wage structure, 1963–1987,” Journal of Econometrics, vol. 65, no. 1, pp. 109–154, 1995.
R. Koenker, Quantile Regression, John Wiley & Sons, New York, NY, USA, 2005.
Y. Deng, W. Wang, C. Qian, Z. Wang, and D. Dai, “Boundary-processing-technique in EMD method and Hilbert transform,” Chinese Science Bulletin, vol. 46, no. 11, pp. 954–961, 2001.
C. D. Blakely, A Fast Empirical Mode Decomposition Technique for Nonstationary Nonlinear Time Series, vol. 3, Elsevier Science, New York, NY, USA, 2005.
A. Amar and Z. El abidine Guennoun, “Contribution of wavelet transformation and empirical mode decomposition to measurement of US core inflation,” Applied Mathematical Sciences, vol. 6, no. 135, pp. 6739–6752, 2012.
Y. Fan, J. W. Zhi, and S. L. Yuan, “Improvement in time-series trend analysis,” Computer Technology and Development, vol. 16, pp. 82–84, 2006.
D. Ruppert, S. J. Sheather, and M. P. Wand, “An effective bandwidth selector for local least squares regression,” Journal of the American Statistical Association, vol. 90, pp. 1257–1270, 1995.
X. Xu, Semiparametric quantile dynamic time series models and their applications [Ph.D. thesis], University of North Carolina at Charlotte, Charlotte, NC, USA, 2005.