Abstract

We propose a robust mean change-point estimation algorithm for linear regression under the assumption that the errors follow the Laplace distribution. By representing the Laplace distribution as an appropriate scale mixture of normal distributions, we develop an expectation maximization (EM) algorithm to estimate the position of the mean change-point. We investigate the performance of the algorithm through different simulations, finding that our method is robust to the distribution of the errors and effective in estimating the position of the mean change-point. Finally, we apply our method to the classical Holbert data and detect a change-point.

1. Introduction

Change-point analysis has been an active research area since the early 1950s, and over the following six decades numerous articles have been published in various journals and proceedings. Chen and Gupta [1] summarized the main methods and applications of change-point detection and estimation. Ever since the change-point hypothesis was introduced into statistical analysis, switching regression models have been studied within regression analysis: locating a change-point can make a regression model that previously fitted a data set poorly fit it substantially better.

The Schwarz information criterion (SIC) proposed by Schwarz [2] has been applied to change-point analysis under different underlying models by many authors in the literature. Chen [3] was the first to use the SIC model selection method to study the mean change-point problem in the normal linear regression model, and later Chen and Gupta [4] used the same method to detect both mean and variance change-points in the normal model. Chen and Wang [5] developed a statistical change-point model approach for the detection of DNA copy number variations in array CGH data, using the SIC method and assuming that the errors follow the normal distribution.

However, in practice we do not know the true distribution of the data, and it is difficult to determine it, especially when a change-point is present. The normal assumption is therefore not always suitable, since real data often exhibit heavy tails and skewness. In such cases, a robust change-point detection model with a heavy-tailed error distribution may be better than the normal model. Osorio and Galea [6] developed a mean change-point linear regression model with independent errors distributed according to the Student t-distribution and located the change-point using the SIC method. Lin et al. [7] studied variance change-points in the Student t regression model under a Bayesian framework and analyzed U.S. stock market data.

The symmetric Laplace distribution, also known as the double exponential distribution or the first law of Laplace, is another heavy-tailed error distribution besides the Student t-distribution. It is less sensitive to outliers and more robust than the normal distribution. In their book, Kotz et al. [8] gave a systematic overview of the Laplace distribution, presenting its properties and generalizations together with applications in communications, economics, engineering, and finance.

In recent years, statistical models based on the Laplace distribution have developed rapidly in both theory and application. Purdom and Holmes [9] found that the error distribution of gene expression data from microarray experiments can be better fitted by the Laplace distribution than by the normal distribution. Pop [10] identified a Laplace distribution in the change in daily sunspot number, and later Noble and Wheatland [11] showed the physical origin of the Laplace distribution and its use for daily sunspot numbers. van Sanden and Burzykowski [12] considered the analysis of microarray data using ANOVA models under the assumption of Laplace-distributed error terms. Phillips [13] developed the expectation maximization (EM) algorithm for least absolute deviation regression, also known as Laplace regression or median regression. Park and Casella [14] interpreted the Lasso estimate of linear regression parameters as a Bayesian posterior mode estimate when the regression parameters have independent Laplace priors. Song et al. [15] proposed a robust estimation procedure for mixture linear regression models assuming that the error terms follow the Laplace distribution.

In this paper, we study the single mean change-point problem in the linear regression model, assuming that the errors follow the Laplace distribution, via the EM algorithm, and use the SIC model selection method to estimate the position of the mean change-point. We then investigate the robustness of the algorithm through simulations under different error distributions. Finally, we apply our method to a stock market data set.

2. Laplace Linear Regression Model with Mean Change-Point

2.1. Laplace Distribution as Scale Mixture of the Normal Distribution

The symmetric Laplace distribution is commonly denoted by $L(\mu, \sigma)$, where $\mu$ is the location parameter and $\sigma > 0$ is the scale parameter. The density function is given by
$$f(x) = \frac{1}{2\sigma} \exp\left(-\frac{|x - \mu|}{\sigma}\right), \quad x \in \mathbb{R},$$
with mean $\mu$ and variance $2\sigma^2$. The Laplace distribution is more peaked in the center and heavier-tailed than the normal distribution, and, given a sample from $L(\mu, \sigma)$, the maximum likelihood estimate of $\mu$ is the sample median, which is robust to outliers.

Andrews and Mallows [16] presented necessary and sufficient conditions under which a random variable $Y$ may be generated as the ratio $Y = Z/V$, where $Z$ and $V$ are independent and $Z$ has a standard normal distribution. Such a $Y$ is said to have a normal variance mixture distribution, or a scale mixture of Gaussian distributions. It was established that when the mixing variable $W = V^{-2}$ is exponential, $Y$ is double exponential; the Laplace distribution is thus a scale mixture of Gaussians in which the mixing distribution is exponential. This statement can be formalized in the following proposition.

Proposition 1 (representation of the Laplace distribution). Suppose $X \sim L(\mu, \sigma)$; then there exists a random variable $\tau \sim \Gamma(1, 1)$, that is, a standard exponential variable, such that $X \mid \tau \sim N(\mu, 2\sigma^2 \tau)$, where $\Gamma(\alpha, \beta)$ denotes the Gamma distribution with shape parameter $\alpha$ and rate parameter $\beta$.
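Proposition 1 is easy to verify by simulation. The following sketch uses our reading of the proposition, namely $\tau \sim \Gamma(1,1)$ (a standard exponential) with $X \mid \tau \sim N(\mu, 2\sigma^2\tau)$, and compares direct Laplace draws against the mixture construction:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 0.0, 1.5, 200_000

# Direct Laplace draws; density f(x) = exp(-|x - mu| / sigma) / (2 * sigma).
direct = rng.laplace(mu, sigma, size=n)

# Scale-mixture construction: tau ~ Gamma(1, 1) (standard exponential),
# then X | tau ~ N(mu, 2 * sigma**2 * tau).
tau = rng.exponential(1.0, size=n)
mixture = mu + np.sqrt(2.0 * sigma**2 * tau) * rng.normal(size=n)

# Both samples should share the Laplace mean mu and variance 2 * sigma**2.
print(direct.mean(), mixture.mean())
print(direct.var(), mixture.var(), 2.0 * sigma**2)
```

Both empirical variances come out near $2\sigma^2 = 4.5$, as the Laplace variance formula predicts.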

2.2. Laplace Regression Model with Mean Change-Point

Let $(x_1, y_1), \ldots, (x_n, y_n)$ be a sequence of observations obtained in a practical situation, where $x_i \in \mathbb{R}^p$, $i = 1, \ldots, n$, is a nonstochastic covariate vector. The model we are going to discuss is
$$y_i = x_i^{\top}\beta + e_i, \quad i = 1, \ldots, n,$$
where $\beta$ is an unknown $p \times 1$ parameter vector and $e_i$, $i = 1, \ldots, n$, are random errors, independent and identically distributed as $L(0, \sigma)$, with $\sigma$ unknown. We can see that $y_1, \ldots, y_n$ are independently distributed and, for each $i$, $y_i \sim L(x_i^{\top}\beta, \sigma)$. To develop the EM algorithm for the Laplace linear regression model, we utilize a mixture representation of the Laplace regression, which is given by Proposition 2, derived from Proposition 1.

Proposition 2 (representation of Laplace linear regression). Suppose $y_i \sim L(x_i^{\top}\beta, \sigma)$, $i = 1, \ldots, n$, independently; then there exist independent random latent variables $\tau_1, \ldots, \tau_n$, where $\tau_i \sim \Gamma(1, 1)$, such that $y_i \mid \tau_i \sim N(x_i^{\top}\beta, 2\sigma^2 \tau_i)$, for $i = 1, \ldots, n$.

Proposition 2 represents the Laplace linear regression model as a normal mixture model whose mixing variables follow the Gamma distribution. This representation is crucial for performing the EM algorithm in the mean change-point model below.

The single mean (regression coefficients) change-point problem in a Laplace linear regression model can be formulated as testing the null hypothesis
$$H_0: E(y_i) = x_i^{\top}\beta, \quad i = 1, \ldots, n,$$
versus the alternative
$$H_1: E(y_i) = x_i^{\top}\beta_1, \quad i = 1, \ldots, k; \qquad E(y_i) = x_i^{\top}\beta_2, \quad i = k+1, \ldots, n,$$
where $\beta_1 \neq \beta_2$. That is, a change in the regression coefficients exists at an unknown position $k$, called the mean change-point. In the following section, we perform the EM algorithm to estimate the unknown parameters and use the SIC model selection method to detect the position of the change-point.

3. EM Algorithm and Schwarz Information Criterion

3.1. EM Algorithm under $H_0$

Denote $\theta = (\beta, \sigma)$, $\mathbf{y} = (y_1, \ldots, y_n)$, and the random latent vector $\boldsymbol{\tau} = (\tau_1, \ldots, \tau_n)$. The likelihood function of $\theta$ under $H_0$ is
$$L(\theta \mid \mathbf{y}) = \prod_{i=1}^{n} \frac{1}{2\sigma} \exp\left(-\frac{|y_i - x_i^{\top}\beta|}{\sigma}\right).$$
From the stochastic representation in Section 2.2, the regression model under $H_0$ becomes
$$y_i \mid \tau_i \sim N(x_i^{\top}\beta, 2\sigma^2 \tau_i), \quad \tau_i \sim \Gamma(1, 1), \quad i = 1, \ldots, n.$$
So, the likelihood function of the complete data $(\mathbf{y}, \boldsymbol{\tau})$ has the following form:
$$L(\theta \mid \mathbf{y}, \boldsymbol{\tau}) = \prod_{i=1}^{n} \frac{1}{\sqrt{4\pi\sigma^2\tau_i}} \exp\left(-\frac{(y_i - x_i^{\top}\beta)^2}{4\sigma^2\tau_i}\right) e^{-\tau_i}.$$
Consequently, the complete log-likelihood function is
$$\ell(\theta \mid \mathbf{y}, \boldsymbol{\tau}) = -\frac{n}{2}\log(4\pi\sigma^2) - \frac{1}{2}\sum_{i=1}^{n}\log\tau_i - \sum_{i=1}^{n}\frac{(y_i - x_i^{\top}\beta)^2}{4\sigma^2\tau_i} - \sum_{i=1}^{n}\tau_i.$$

Given the initial values $\beta^{(0)}$ and $\sigma^{(0)}$, we can obtain the maximum likelihood estimate of $\theta$ based on the EM algorithm via the following two steps.

(i) E Step. Given the $t$-th iteration value $\beta^{(t)}$ and omitting the terms having no relation to $\beta$, the $Q$ function of the $(t+1)$-th iteration is
$$Q(\beta \mid \beta^{(t)}) = -\sum_{i=1}^{n} \frac{w_i^{(t)} (y_i - x_i^{\top}\beta)^2}{4\sigma^2},$$
where $w_i^{(t)} = E(\tau_i^{-1} \mid y_i, \beta^{(t)})$.

In order to obtain $w_i^{(t)}$, we first compute the conditional probability density function (pdf) of $\tau_i$ given $y_i$. Note that the joint pdf of $y_i$ and $\tau_i$ is
$$f(y_i, \tau_i) = \frac{1}{\sqrt{4\pi\sigma^2\tau_i}} \exp\left(-\frac{(y_i - x_i^{\top}\beta)^2}{4\sigma^2\tau_i}\right) e^{-\tau_i}, \quad \tau_i > 0.$$
Due to the conditional independence across observations, the conditional pdf of $\tau_i$ given $y_i$ satisfies
$$\tau_i \mid y_i \sim \mathrm{GIG}\left(\frac{1}{2}, \frac{(y_i - x_i^{\top}\beta)^2}{2\sigma^2}, 2\right),$$
where $\mathrm{GIG}(\lambda, \chi, \psi)$ is the generalized inverse Gaussian distribution with pdf
$$f(\tau) = \frac{(\psi/\chi)^{\lambda/2}}{2K_{\lambda}(\sqrt{\chi\psi})}\, \tau^{\lambda - 1} \exp\left(-\frac{1}{2}\left(\frac{\chi}{\tau} + \psi\tau\right)\right), \quad \tau > 0,$$
in which $K_{\lambda}$ is the modified Bessel function of the third kind; see Barndorff-Nielsen and Shephard [17].

Therefore, by the moment formula for the GIG distribution, we can obtain
$$w_i^{(t)} = E(\tau_i^{-1} \mid y_i, \beta^{(t)}) = \frac{2\sigma}{|y_i - x_i^{\top}\beta^{(t)}|}.$$
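Under our parameterization ($\tau_i \sim \Gamma(1,1)$ and $y_i \mid \tau_i \sim N(x_i^{\top}\beta, 2\sigma^2\tau_i)$), the closed-form weight $E(\tau^{-1} \mid y) = 2\sigma/|y - x^{\top}\beta|$ can be checked against direct numerical integration of the unnormalized conditional density. A small sketch, with arbitrary values of the scale and the residual:

```python
import numpy as np

sigma, r = 1.3, 0.8   # arbitrary scale and residual chosen for the check

# Unnormalized conditional density of tau given residual r, assuming
# e | tau ~ N(0, 2*sigma^2*tau) and tau ~ Gamma(1, 1):
#   p(tau | r) proportional to tau^(-1/2) * exp(-r^2/(4*sigma^2*tau) - tau),
# which is a GIG(1/2, r^2/(2*sigma^2), 2) density.
def g(t):
    return t ** -0.5 * np.exp(-r ** 2 / (4.0 * sigma ** 2 * t) - t)

t = np.linspace(1e-6, 40.0, 2_000_001)   # fine grid; tail beyond 40 is negligible
weights = g(t)
e_inv_tau = (weights / t).sum() / weights.sum()   # E[1/tau | r] by quadrature

print(e_inv_tau, 2.0 * sigma / abs(r))   # the two values should agree closely
```

The quadrature value matches the closed GIG form $2\sigma/|r| = 3.25$ to several digits, which is a quick sanity check on the E-step weight.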

(ii) M Step. We maximize the $Q$ function in the E step with respect to $\beta$.

Denote $W^{(t)} = \mathrm{diag}(w_1^{(t)}, \ldots, w_n^{(t)})$ and $X = (x_1, \ldots, x_n)^{\top}$, and we get
$$\beta^{(t+1)} = (X^{\top} W^{(t)} X)^{-1} X^{\top} W^{(t)} \mathbf{y}.$$
It can be seen from the updating formula that $\beta^{(t+1)}$ is independent of the estimated value of $\sigma$, since the factor $\sigma$ in $w_i^{(t)}$ cancels; in other words, we need not update $\sigma$ at each EM iteration. After the final estimate $\hat{\beta}$ is obtained, the estimate of $\sigma$ can then be gotten by maximizing the original log-likelihood function, which produces
$$\hat{\sigma} = \frac{1}{n}\sum_{i=1}^{n} |y_i - x_i^{\top}\hat{\beta}|.$$
Finally, the maximum of the log-likelihood function under $H_0$ is obtained as
$$\ell(\hat{\beta}, \hat{\sigma}) = -n\log(2\hat{\sigma}) - n.$$
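The E and M steps above can be sketched compactly: since the scale factor in the weights cancels, one EM pass is a weighted least squares fit with weights proportional to the reciprocal absolute residuals. A minimal sketch (the damping constant `eps`, the iteration count, and the OLS starting value are our own choices, not part of the paper's derivation):

```python
import numpy as np

def laplace_em_fit(X, y, n_iter=200, eps=1e-8):
    """EM sketch for Laplace (median) regression.

    E step: weights proportional to 1 / |residual| (the constant cancels).
    M step: weighted least squares with those weights.
    eps guards against exactly-zero residuals (our own safeguard).
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]       # OLS starting value
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(y - X @ beta), eps)
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    sigma = np.mean(np.abs(y - X @ beta))             # Laplace scale MLE
    return beta, sigma

# Sanity check: with an intercept-only design the fitted coefficient should
# be close to the sample median, the Laplace MLE of the location.
rng = np.random.default_rng(1)
y = rng.laplace(3.0, 1.0, size=501)
X = np.ones((501, 1))
beta, sigma = laplace_em_fit(X, y)
print(beta[0], np.median(y), sigma)
```

With an intercept-only design this iteration is the one-dimensional Weiszfeld scheme, so it converges to the sample median, and the scale estimate approaches the true scale 1.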

3.2. EM Algorithm under $H_1$

Suppose the change-point position is $k$. Denote $\theta = (\beta_1, \beta_2, \sigma)$, $\mathbf{y}_1 = (y_1, \ldots, y_k)$, $\mathbf{y}_2 = (y_{k+1}, \ldots, y_n)$, and the random latent vectors $\boldsymbol{\tau}_1 = (\tau_1, \ldots, \tau_k)$, $\boldsymbol{\tau}_2 = (\tau_{k+1}, \ldots, \tau_n)$.

The likelihood function of $\theta$ under $H_1$ is
$$L(\theta \mid \mathbf{y}) = \prod_{i=1}^{k} \frac{1}{2\sigma}\exp\left(-\frac{|y_i - x_i^{\top}\beta_1|}{\sigma}\right) \prod_{i=k+1}^{n} \frac{1}{2\sigma}\exp\left(-\frac{|y_i - x_i^{\top}\beta_2|}{\sigma}\right).$$

From the stochastic representation in Section 2.2, the regression model under $H_1$ becomes
$$y_i \mid \tau_i \sim \begin{cases} N(x_i^{\top}\beta_1, 2\sigma^2\tau_i), & i = 1, \ldots, k, \\ N(x_i^{\top}\beta_2, 2\sigma^2\tau_i), & i = k+1, \ldots, n, \end{cases}$$
where the $\tau_i \sim \Gamma(1, 1)$ are independent.

So, the likelihood function of the complete data under $H_1$ has the following form:
$$L(\theta \mid \mathbf{y}, \boldsymbol{\tau}) = \prod_{i=1}^{k} \frac{e^{-\tau_i}}{\sqrt{4\pi\sigma^2\tau_i}} \exp\left(-\frac{(y_i - x_i^{\top}\beta_1)^2}{4\sigma^2\tau_i}\right) \prod_{i=k+1}^{n} \frac{e^{-\tau_i}}{\sqrt{4\pi\sigma^2\tau_i}} \exp\left(-\frac{(y_i - x_i^{\top}\beta_2)^2}{4\sigma^2\tau_i}\right).$$

Consequently, the complete log-likelihood function is
$$\ell(\theta \mid \mathbf{y}, \boldsymbol{\tau}) = -\frac{n}{2}\log(4\pi\sigma^2) - \frac{1}{2}\sum_{i=1}^{n}\log\tau_i - \sum_{i=1}^{k}\frac{(y_i - x_i^{\top}\beta_1)^2}{4\sigma^2\tau_i} - \sum_{i=k+1}^{n}\frac{(y_i - x_i^{\top}\beta_2)^2}{4\sigma^2\tau_i} - \sum_{i=1}^{n}\tau_i.$$

Given initial values $\beta_1^{(0)}$, $\beta_2^{(0)}$, and $\sigma^{(0)}$, we can obtain the maximum likelihood estimates of $\beta_1$, $\beta_2$, and $\sigma$ based on the EM algorithm via the following two steps.

(i) E Step. Given the $t$-th iteration values $\beta_1^{(t)}$ and $\beta_2^{(t)}$ and omitting the terms having no relation to $(\beta_1, \beta_2)$, the $Q$ function of the $(t+1)$-th iteration is
$$Q(\beta_1, \beta_2 \mid \beta_1^{(t)}, \beta_2^{(t)}) = -\sum_{i=1}^{k} \frac{w_i^{(t)} (y_i - x_i^{\top}\beta_1)^2}{4\sigma^2} - \sum_{i=k+1}^{n} \frac{w_i^{(t)} (y_i - x_i^{\top}\beta_2)^2}{4\sigma^2},$$
where $w_i^{(t)} = E(\tau_i^{-1} \mid y_i)$.

The marginal conditional pdf of $\tau_i$ given $y_i$ is such that, for $i = 1, \ldots, k$,
$$\tau_i \mid y_i \sim \mathrm{GIG}\left(\frac{1}{2}, \frac{(y_i - x_i^{\top}\beta_1)^2}{2\sigma^2}, 2\right),$$
and, for $i = k+1, \ldots, n$,
$$\tau_i \mid y_i \sim \mathrm{GIG}\left(\frac{1}{2}, \frac{(y_i - x_i^{\top}\beta_2)^2}{2\sigma^2}, 2\right).$$
Therefore, for $i = 1, \ldots, k$,
$$w_i^{(t)} = \frac{2\sigma}{|y_i - x_i^{\top}\beta_1^{(t)}|},$$
and, for $i = k+1, \ldots, n$,
$$w_i^{(t)} = \frac{2\sigma}{|y_i - x_i^{\top}\beta_2^{(t)}|}.$$

(ii) M Step. We maximize the $Q$ function in the E step with respect to $\beta_1$ and $\beta_2$.

Denote $W_1^{(t)} = \mathrm{diag}(w_1^{(t)}, \ldots, w_k^{(t)})$, $W_2^{(t)} = \mathrm{diag}(w_{k+1}^{(t)}, \ldots, w_n^{(t)})$, $X_1 = (x_1, \ldots, x_k)^{\top}$, and $X_2 = (x_{k+1}, \ldots, x_n)^{\top}$, and we get
$$\beta_1^{(t+1)} = (X_1^{\top} W_1^{(t)} X_1)^{-1} X_1^{\top} W_1^{(t)} \mathbf{y}_1, \qquad \beta_2^{(t+1)} = (X_2^{\top} W_2^{(t)} X_2)^{-1} X_2^{\top} W_2^{(t)} \mathbf{y}_2.$$

It can be seen from the updating formulae that $\beta_1^{(t+1)}$ and $\beta_2^{(t+1)}$ are independent of the estimated value of $\sigma$, so we need not update $\sigma$ at each EM iteration. After the final estimates $\hat{\beta}_1$ and $\hat{\beta}_2$ are obtained, the estimate of $\sigma$ can then be gotten by maximizing the original log-likelihood function, which produces
$$\hat{\sigma} = \frac{1}{n}\left(\sum_{i=1}^{k} |y_i - x_i^{\top}\hat{\beta}_1| + \sum_{i=k+1}^{n} |y_i - x_i^{\top}\hat{\beta}_2|\right).$$
Finally, the maximum of the log-likelihood function under $H_1$ is obtained as
$$\ell(\hat{\beta}_1, \hat{\beta}_2, \hat{\sigma}) = -n\log(2\hat{\sigma}) - n.$$

3.3. SIC Algorithm for Laplacian Regression Model

The work by Chen [3] proposes transforming the hypothesis test into a model selection procedure using the Schwarz information criterion (SIC), defined by
$$\mathrm{SIC} = -2\log L(\hat{\theta}) + p\log n,$$
where $L(\hat{\theta})$ is the likelihood function evaluated at the maximum likelihood estimate of the parameters, $p$ is the number of model parameters, and $n$ is the sample size. Note that maximizing the log-likelihood function is equivalent to minimizing the Schwarz information criterion.

In the Laplacian linear regression model, the Schwarz information criterion under , denoted by , is given by where and are given by (19) and (20). The Schwarz information criterion under , denoted by , is given by with , , and given by (32) and (33).

The selection criterion is to choose the model with a change-point at position $k$ if, for some $k$,
$$\mathrm{SIC}(k) < \mathrm{SIC}(n).$$
When the null hypothesis is rejected, the maximum likelihood estimate of the change-point in the regression coefficients, denoted by $\hat{k}$, must satisfy
$$\mathrm{SIC}(\hat{k}) = \min_{k} \mathrm{SIC}(k).$$
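An end-to-end sketch of the SIC scan follows. The per-segment fit is a compact IRLS stand-in for the EM fit of this section, and the parameter counts $p + 1$ (no change) and $2p + 2$ (one change) are our own simplifying assumptions rather than the paper's exact counts:

```python
import numpy as np

def lad_fit(X, y, n_iter=100, eps=1e-8):
    # Compact IRLS stand-in for the per-segment EM median-regression fit.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(y - X @ beta), eps)
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    return beta, np.mean(np.abs(y - X @ beta))        # (beta_hat, sigma_hat)

def laplace_loglik(X, y, beta, sigma):
    # Log-likelihood of the Laplace density exp(-|r|/sigma) / (2*sigma).
    return -len(y) * np.log(2.0 * sigma) - np.abs(y - X @ beta).sum() / sigma

def sic_scan(X, y, min_seg=5):
    # Returns the SIC-minimizing change position, or None if no-change wins.
    n, p = X.shape
    sic0 = -2.0 * laplace_loglik(X, y, *lad_fit(X, y)) + (p + 1) * np.log(n)
    best_k, best_sic = None, np.inf
    for k in range(min_seg, n - min_seg):
        ll = (laplace_loglik(X[:k], y[:k], *lad_fit(X[:k], y[:k]))
              + laplace_loglik(X[k:], y[k:], *lad_fit(X[k:], y[k:])))
        sic_k = -2.0 * ll + (2 * p + 2) * np.log(n)
        if sic_k < best_sic:
            best_k, best_sic = k, sic_k
    return best_k if best_sic < sic0 else None

# Simulated example: intercept shift of 4 after observation 60 of 100.
rng = np.random.default_rng(2)
n = 100
x = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.laplace(0.0, 0.5, n)
y[60:] += 4.0
k_hat = sic_scan(X, y)
print(k_hat)
```

With a shift this large relative to the noise scale, the scan recovers a change position at or very near the true value 60.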

4. Simulation Studies

In this section, we investigate the performance of the proposed approach for detecting the mean change-point through simulations, and we compare our procedure with the change-point detection procedure proposed by Chen [3], which assumes that the errors follow the normal distribution. A data set of observations is generated from the regression model above, with the coefficients changing at a known true position. To demonstrate the performance of the algorithm in different cases, we vary the true change-point position and consider the following six error distributions:

(i) Simulation 1: normal distribution;
(ii) Simulation 2: Laplace distribution;
(iii) Simulation 3: distribution with three degrees of freedom;
(iv) Simulation 4: distribution with three degrees of freedom;
(v) Simulation 5: log-normal distribution;
(vi) Simulation 6: Cauchy distribution.

In order to evaluate the finite sample performance of the proposed method, 500 replications are conducted for each error distribution. In each replication, the initial values of the regression coefficients are set to their ordinary least squares estimates, and the EM procedure is stopped when the absolute difference between two successive values of the maximum log-likelihood function is less than a preassigned small tolerance. The final value of the coefficient vector is taken as the median regression estimate, the estimate of the scale parameter is computed as in Section 3, and the position of the change-point in each replication is estimated by the SIC method.

Finally, the mean and standard deviation of the estimated change-point position over the 500 replications are computed. In order to compare the results of our proposed method with those of the normal method proposed in Chen [3], the same simulations are conducted using the normal method. The results are presented in Tables 1 and 2, where the first column gives the true value of the change-point position; estimate. and diff. are the estimate of the position and the absolute difference between the estimate and the true value, and sd. is the standard deviation of the estimate; the corresponding quantities for the normal method are reported analogously. We compare the diff. and sd. values of the two methods, the smaller the better. From Tables 1 and 2, we have the following findings.

(1) In the normal and Laplace error cases, both the Laplace method and the normal method behave quite well, with small diff. values. Moreover, the normal method attains a smaller sd. in the normal error case, and the reverse holds in the Laplace error case.

(2) In the skew and heavy-tailed cases, including the Cauchy case, the Laplace method is better than the normal method, with smaller diff. and smaller sd.

(3) In two of the non-normal cases, when the true position of the change-point is in the middle of the data, both methods perform well, but when the true position is in the lower or upper part of the data, the normal method cannot detect the change-point effectively, giving large diff. and large sd.

(4) An interesting phenomenon appears in the case of the Cauchy distribution, where the normal method simply returns integers around 100 as estimates whatever the true position is; by contrast, the performance of the Laplace method is relatively better.

Further simulation designs with other settings were also conducted; the results are not presented in this paper. They likewise support the findings above; that is, our Laplace method is more robust than the normal method for estimating the position of the mean change-point when the errors follow skew and heavy-tailed distributions, especially when the true change-point lies near the head or the tail of the data.

5. Application to Stock Market Data

Holbert [18] studied switching simple linear regression models with changes in the coefficients from a Bayesian point of view. Later on, Chen [3] analyzed this data set using the SIC method to detect changes in the mean for normal linear regression models, and Osorio and Galea [6] analyzed the same data, also using the SIC method to detect the mean change-point, but assuming that the errors follow the Student t-distribution. In this section, we reanalyze the Holbert data using the proposed Laplace linear regression model to detect the mean change-point based on the SIC method. The monthly dollar volume of sales (in millions) on the Boston Stock Exchange (BSE) is considered as the response variable, and the combined New York-American Stock Exchange (NYAMSE) volume is considered as the regressor. There are $n = 35$ observations, corresponding to the period between January 1967 and November 1969. The model we consider is the simple linear regression of BSE on NYAMSE with a possible change-point in the coefficients. The computed SIC values are listed in Table 3, along with the original BSE and NYAMSE values given in Holbert [18]. The SIC values calculated using the normal method of Chen [3] are also listed in the last column.

The bold SIC value in Table 3 is the minimum SIC value, which corresponds to time point 9 using the Laplace method and time point 23 using the normal method in Chen [3]. Accordingly, Chen [3] concluded that the regression model change starts at time point 24, which is December 1968. Osorio and Galea [6] obtained the minimum SIC values using the Student t model at time point 9 for one degree of freedom and at time point 23 for four degrees of freedom, and finally drew the same conclusion as Chen [3], namely, that the change starts at time point 24. We list the minimum SIC values of the different regression models in Table 4. Table 4 shows that the minimum SIC value is 353.8327, attained at time point 9 in our Laplace model; hence, we conclude that the regression model change starts at time point 10, which is October 1967. The median estimates of the parameters are reported for observations 1-9 and for observations 10-35, and the corresponding predictive regression lines are plotted in Figure 1. The scatter plot shows that there are outliers in the response observations, so the median regression line estimated from the Laplace model is more robust than the ordinary least squares regression line estimated from the normal model.

6. Summary

In this paper, we proposed the Laplace linear regression model with a mean change-point and developed the EM algorithm, together with the SIC model selection criterion, to estimate the position of the mean change-point. We investigated the performance of the algorithm through different simulations, finding that it behaves quite well when the errors follow the Laplace distribution. Moreover, our Laplace method is more robust than the normal method for estimating the position of the mean change-point when the errors follow skew and heavy-tailed distributions, especially when the true change-point lies near the head or the tail of the data. Finally, we applied our method to the Holbert data and detected a mean change-point. Considering the difficulty of estimating the unknown degrees of freedom in the Student t-distribution, we did not compare our Laplace model with the Student t model of Osorio and Galea [6], where the degrees of freedom were predetermined by the authors. As for the multiple mean change-points problem in Laplace linear regression, stepwise and binary segmentation methods such as that of Vostrikova [19] may work, and further research is needed in the future.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.