We propose a Gibbs sampling algorithm to detect additive outliers and patches of outliers in bilinear time series models based on Bayesian view. We first derive the conditional posterior distributions, and then use the results of first Gibbs run to start the second adaptive Gibbs sampling. It is shown that our procedure could reduce possible effects on masking and swamping. At last, some simulations are performed to demonstrate the efficacy of detection and estimation by Monte Carlo methods.

1. Introduction

Dynamic systems or engineering time series observations [1] are often perturbed by some interrupting phenomena which generate aberrant data. They may be due to some unusual events, such as sudden disturbed factor, noise, and even recording errors. Such values are usually referred to as outliers, which may be isolated or of patch. Because outliers and patches in a time series can have adverse effects in data analysis, which maybe make the resultant inference unreliable or even invalid, it is important to detect and remove such outlier's effects. Some authors have considered the detection problem in the linear and nonlinear models. Abraham and Box [2] analyzed some outlier problems based on Bayesian methods in time series. Tsay [3] considered time series model specification in the presence of outliers. Ljung [4] presented outlier detection in time series. Justel et al. [5] studied detection of outlier patches in autoregressive time series. Battaglia and Orfei [6] considered outlier detection and estimation in nonlinear time series. Chen et al. [7] studied similar problem in ARMAX time series models. P. Chen and Y. Chen [8] gave the identification of outliers in ARMAX models via genetic algorithm.

Based on some different prior distributions, a new Gibbs sampling algorithm is proposed for identifying additive isolated outliers and patches of outliers in bilinear time series. This paper considers detection of outliers and patches in bilinear time series models, which are one of the fractal time series models [9]. Wang and Wei [10] analyzed the probabilistic structure for a rather general class of bilinear models systematically. Wang [11] considered parameter estimation and subset selection for separable lower triangular bilinear models. In particular, Chen [12] proposed a procedure for detecting additive outliers in bilinear time series, which also discusses some major problems encountered in practice, such as how one can distinguish between ARMA model with outliers and a bilinear model without outliers. Some special cases of bilinear models, such as diagonal bilinear model, vector bilinear model and so on, have been studied by many authors on not only probabilistic structure but also statistical inferences; for instance, Subba Rao [13], Subba Rao and Gabr [14], Kim et al. [15], and Sesay and Subba Rao [16], among others.

The format of the paper is as follows. Section 2 gives the outliers model of bilinear series. Section 3 presents a method to detect outliers in bilinear model via standard Gibbs sampling. In Section 4, we propose an adaptive Gibbs sampling method to detect patches of outliers in bilinear model. Section 5 gives simulated examples and some conclusions.

2. Outliers Models of Bilinear Time Series

The time series is called a bilinear process, if it satisfies the model

where , and and are unknown parameters. This model is called a bilinear model with order . The bilinear time series models were introduced by Granger and Anderson [17]. It is a generalization of the linear ARMA model by incorporating the product term of time series and innovation . This bilinear model has attracted much attention in the recent time series literature. In terms of potential applications, bilinear models are known to be able to mode occasional outbursts in time series, which might be useful for modeling seismological data such as records for explosions and earthquakes.

When additive outliers (AO) are present, is disturbed and unobservable. In this case, it is assumed that the observed series follows

where 's are independent and identically distributed Bernoulli random variables with , and 's are random variables from normal distribution. That means that the observation may be AO with probability ; its magnitude is at time .

For simplicity, let and assume that are fixed and for , that is, there exist no outliers in the first observations. The indicator vector of outliers then becomes and the size vector is . Assume that a vector of residuals of model (2.1) is available. And let , , and , where denotes the estimation of in model (2.1) without AO. Then the model (2.1) and (2.2) may be transformed as the follows

3. Outliers Detection via Standard Gibbs Sampling

In order to detect outliers in the bilinear model, it is essential to derive the conditional posterior distributions of parameters , and .

For computational reason, we give the following conditions: (1) using conjugate prior distribution for parameters and , which distributed as multidimensional uniform distribution on region and inverted-Gamma distribution respectively. (2) Assume that the outlier indicator and the outlier magnitude are independent and distributed as and , respectively for all . (3) The prior distribution of the contamination parameter is , and are for all . (4) The hyperparameters in our model are and , all of which are assumed to be known.

Note that the prior probability of being contaminated by an outlier is the same for all observations, namely, , for . Then, under these conditions and by using the standard Bayesian method, we can obtain the following results.

Theorem 3.1. The conditional posterior distribution of is , where

Proof. Under the conditions above, we have that This completes the proof of the theorem.

Theorem 3.2. (1) The conditional posterior distribution of is , where .
(2) The conditional distribution of given the sample and the other parameters is , where denotes the number of
(3) When , there is no new information about the posterior distribution of , namely, distributed as When , since contains information of , we have that
where is obtained by eliminating the element , and where and . The proof of this Theorem is very tedious and similar to that in MuCulloch and Tsay [18]; we omit it here.

Theorem 3.3. For the conditional posterior distribution of , we have where is obtained by eliminating the element , , and here , , and When , then for all ; when , we have that

Proof. The conditional posterior distribution of is with probability So the conclusion is obtained.

4. Detection of Outlier Patches via Adaptive Gibbs Sampling

Similar to Justel et al. [5], our procedure also consists of two Gibbs runs. In the first run, the standard Gibbs sampling based on the results of Section 3 is carried out. The results of this Gibbs run are then used to implement a second Gibbs sampling that is adaptive in treating identified outliers and in using block interpolation to reduce possible masking and swamping effects. Let and be the posterior means of and respectively based on the iterations of the first Gibbs run. First, we select an appropriate critical value to identify potential outliers. An observation is identified as an outlier if the posterior probability . Let be the collection of time indexes of outliers identified by the first Gibbs run. Second, let be another appropriate critical value to specify the beginning and end points of a potential outlier patch. We select a window of length around the identified outlier to search for the boundary points of a possible outlier patch by a forward-backward method. Exampling an identified outlier , for the observations before if their posterior probabilities , then these points are regarded as possible outlier patch associated with . We then select the farthest point from as the beginning point of the outlier patch. Denote the point by . Then we do the same for the observations after and select the farthest point from with as the end point of the outlier patch. Denote the end point by . Combine the two blocks to form a possible outlier patch associated with , which is denoted by . Consecutive or overlapping patches should be merged to form a larger patch. Lastly, draw Gibbs samples jointly within a patch. Suppose that a patch of outliers starting at time index is specified. Denote the vectors of outlier indicators and magnitudes by and , respectively, associated with the patch. We give the conditional posterior distributions of and in the next proposition; its derivation is similar to Theorem of Justel et al. [5].

Proposition 4.1. Let z be a vector of observations according to (2.3), with no outliers in the first data points. Assume , and where the parameters and are known. Let be the residual at time when the series is adjusted for all identified outliers not in the interval with the outliers identified in , , and where if and if . Then we have the following.(a)The conditional posterior probability of a block configuration , given the sample and the other parameters, is where denotes all the other parameters except , and is a normalization constant so that the total probability of the possible configurations of is one.(b)The conditional posterior distribution of given the sample and other parameters is , where , and with for and for or .

For the second adaptive Gibbs sampling, we use the results of the first Gibbs run to start the second Gibbs sampling and to specify prior distributions of the parameters. For each outlier patch, we use the results of Proposition 4.1 to draw and in the second Gibbs sampling, which is also run for iterations.

The starting values of are as follows: if , otherwise, . Then the prior distributions of are as follows.

(a)If is identified as an isolated outlier, then the prior distribution of is , where is the Gibbs estimate of from the first Gibbs run.(b)If belongs to an outlier patch, then the prior distribution of is , where is the conditional posterior mean as follows: (c)If does not belong to any outlier patch and is not an isolated outlier, then the prior distribution of is .

5. Simulated Example and Conclusions

Example 5.1. In the simulations, the designed bilinear model is BL(2,2,1,1) model: where is Kronecker symbol: if , then , else , and is standard normal white noise.

We produce a set of observations of the bilinear model (5.1) by simulation. In order to obtain stationary data, we produce 1000 observations of firstly; then we use the last 100 observations as our simulated sample, in which a patch of five consecutive additive outliers has been introduced from to , a single AO has been added at , and the outlier magnitudes are and , respectively. Figure 1 shows the trend of simulated series . It is obvious that the curve of observations had large volatility; it would be very difficult to distinguish between “outliers” and normal points of nonlinear model.

Let , and . Here means that we believe the prior probability of each point is an outlier approximate to 0.05. First, we detect the outliers in using the standard Gibbs sampling. We can obtain the posterior probabilities that each observation is an outlier by using the given methods in Section 3, which are shown in Figure 2. It displayed that the posterior probabilities of being outliers only at and were larger than . Meanwhile, the outlying posterior probabilities of other observations were lower; the posterior probability of being an outlier at was even smaller than . We see that the standard Gibbs sampling failed to detect the inner and border points at , that resulted in the masking effect.

Second, in order to avoid the presence of masking and swamping problem, we use the method given in Section 4, and take 900 iterations by the adaptive Gibbs algorithm. Figure 3 gives the plot of the estimated posterior probability that each point is seen as an outlier via adaptive Gibbs sampling in simulation, where the window width of search was 6. It is obvious that the posterior probabilities of being an outlier obtained by adaptive Gibbs algorithm for data points at and are larger than 0.5, which meant that these points were possible outliers. Meanwhile, the outlying posterior probabilities of other observations are very small. Actually, if we select the critical values and , then we could identify the patch. On the other hand, many normal points like outliers in Figure 1 were not misspecified as outliers because the outlying posterior probabilities of these points were smaller than 0.2, which shows that the adaptive Gibbs sampling was more effective than the standard Gibbs sampling in mining the additive outlier patch for bilinear time series. Third, the posterior means of the sizes of these outliers are , , , , and , respectively.

Furthermore, we did 200 replications for the above simulation. Table 1 gives the average values of the posterior means of and and their standard deviations. In these replications, the isolated outliers were identified by standard and adaptive Gibbs sampling procedures. Through computing, we find that the standard Gibbs sampling failed to detect all the outlier patches in 200 replications, but in 179 replications, the adaptive Gibbs sampling successfully could specify all outliers and patches. On the other hand, we note that the last three parameters are actually not estimated well by the adaptive Gibbs algorithm in many replications. Investigating its reason, we think it may be the influence of nonlinearity or the frontal outliers of patch. Such situation deserves careful investigation in the future.

Since the bilinear series will by itself produce what appears to be outliers, the detection of the locations of outliers and patches is more difficult, and the estimation of the magnitude of outliers and patches is more complex than autoregressive series. By a number of simulations of detecting the outlier patches in bilinear model by adaptive Gibbs sampling, we discovered that the critical value should be selected smaller than that in ARMA model. It may be that the variation of bilinear series is larger than that of ARMA series. On the other hand, in the process of running, the Gibbs sampler was repeated several times with different hyper-parameters and different numbers of iterations to reanalyze the data. The results show that the locations of possible outliers and patch are stable, even though the estimated outlying probabilities may vary slightly between the Gibbs samples. Some other case studies also show that our procedure could reduce possible masking and swamping effects, which is an improvement and extension on bilinear time series model over the existing outliers detection methods.


The authors thank the editor and three referees for their careful and constructive comments that led to substantial improvements of the paper. The project is supported by NSFCJS BK2008284.