Computational Intelligence and Neuroscience

Volume 2016 (2016), Article ID 8343187, 16 pages

http://dx.doi.org/10.1155/2016/8343187

## A Fast Framework for Abrupt Change Detection Based on Binary Search Trees and Kolmogorov Statistic

^{1}College of Information Science & Technology, Donghua University, Shanghai 201620, China

^{2}Australia e-Health Research Centre, CSIRO Computation Informatics, Brisbane, QLD 4060, Australia

Received 28 September 2015; Accepted 28 April 2016

Academic Editor: Hiroki Tamura

Copyright © 2016 Jin-Peng Qi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Change-point (CP) detection has attracted considerable attention in the fields of data mining and statistics, and detecting abrupt changes quickly and efficiently in large-scale bioelectric signals is of particular practical interest. Most existing methods, such as the Kolmogorov-Smirnov (KS) statistic, are time-consuming, especially for large-scale datasets. In this paper, we propose a fast framework for abrupt change detection based on binary search trees (BSTs) and a modified KS statistic, named BSTKS (binary search trees and Kolmogorov statistic). In this method, first, two binary search trees, termed BSTcA and BSTcD, are constructed by the multilevel Haar Wavelet Transform (HWT); second, three search criteria are introduced in terms of the statistic and variance fluctuations in the diagnosed time series; last, an optimal search path is detected from the root to the leaf nodes of the two BSTs. Studies on both synthetic time series samples and real electroencephalograph (EEG) recordings indicate that the proposed BSTKS detects abrupt changes more quickly and efficiently than the KS, $t$-statistic ($t$), and Singular-Spectrum Analysis (SSA) methods, with the shortest computation time, the highest hit rate, the smallest error, and the highest accuracy of the four methods. This study suggests that the proposed BSTKS is helpful for inspecting useful information in all kinds of bioelectric time series signals.

#### 1. Introduction

Abrupt change detection aims to identify abrupt changes in the statistical properties of a signal series that occur at unknown instants [1–3]. These changes are of interest because they indicate qualitative transitions in the data generation mechanism (DGM) underlying the signals. CP detection has attracted considerable attention in the fields of data mining and statistics and has been widely studied in many real-world problems, such as atmospheric and financial analyses [1], fault detection in engineering systems [4, 5], climate change detection [6], genetic time series analyses [7], signal segmentation [8, 9], and intrusion detection in computer networks [4].

In the statistics community, several nonparametric approaches to CP detection have been widely explored. For example, the KS statistic quantifies the distance between the empirical distribution function of a sample and the cumulative distribution function of a reference distribution, or between the empirical distribution functions of two samples [10, 11]. The KS statistic and its modified versions have been investigated in many application fields, for example, testing hypotheses regarding activation in blood oxygenation level-dependent functional MRI data [12], modeling the cumulative distribution function of rub-induced AE signals and quantifying the goodness of fit to offer a suitable signal feature for diagnosis [13], abrupt change detection in EEG signals [14], and gene expression time series [15]. Meanwhile, among the model-related statistical approaches, some modified cumulative sum (CUSUM) methods provide the asymptotic distributions of the test statistics and the consistency of the procedures; they behave better in finite samples and are more stable with respect to the time of change than ordinary CUSUM procedures [16]. The CUSUM method and its revised versions have been widely applied to detect structural breaks in the parameters of stochastic models, as well as abrupt changes in the regression parameters of multiple time series regression models, such as multiple CP detection in biological sequences [17], abrupt change detection in the regression parameters of a set of capital asset pricing data related to the Fama-French extension of the CAPM [16], and abrupt change detection in a shape-restricted regression model [18].

On the other hand, SSA is a powerful technique for time series analysis. SSA is nonparametric and requires no prior knowledge of the properties of the time series signal [19]. Its main idea is to apply principal component analysis to the trajectory matrix and then reconstruct the original time series. SSA has proved very successful and has become a standard tool in the analysis of climatic [10], meteorological, and geophysical time series [11, 19]. SSA has also been applied to real time series recordings, for example, EMG-onset detection [12] and CP detection in time series [13]. Although SSA is a model-free method, it does not scale to large datasets, because it is time-consuming and sometimes fails for time series with less significant data fluctuation.

In addition, the Wavelet Transform (WT) is another important tool for time series analysis [14, 15, 20–23]. WT has been widely applied in anomaly detection, time series prediction, image processing, and noise reduction [15, 23–25]. WT can represent a general function at different scales and positions in a versatile and sophisticated manner, so that the features of the data distribution can be easily extracted at different time or space scales [25, 26]. As a simple WT, the Haar Wavelet (HW) has some attractive features, including fast implementation and the ability to analyze local features. HW is very useful for finding abrupt changes such as discontinuities and high-frequency components in time series, so it is a potential candidate in modern electrical and computer engineering applications, such as signal and image compression, eye detection [27], abnormality detection in time series [28, 29], and abrupt change detection in autoregressive conditional heteroscedastic processes [30].

However, all of the methods above are time-consuming and sometimes fail to detect abrupt changes near the left or right boundary, especially when the data fluctuation is insignificant in large-scale time series. To resolve these problems, we propose a fast framework for CP detection based on binary search trees and a modified KS statistic, termed BSTKS for short. In this method, first, two BSTs are derived from a diagnosed time series. Second, three search criteria are introduced in terms of the statistic and variance fluctuations between two adjacent time series segments, and then an optimal search path is detected from the root to the leaf nodes of the two BSTs. Last, the proposed BSTKS and the KS, $t$, and SSA methods are tested on both synthetic time series and real EEG recordings and evaluated in terms of computation time, hit rate, error, accuracy, and the area under the curve (AUC) of Receiver Operating Characteristic (ROC) curve analyses.

In general, for a given bioelectric signal, an abrupt change indicates an important transition of biological function or health state before and after a strong attack or an acute perturbation from the internal or external environment. Therefore, it is necessary not only to discern abrupt changes in all kinds of physiological and psychological time series signals, but also to inspect the significant fluctuation between adjacent time series segments at different scales. The following sections present the framework of the proposed BSTKS method through theoretical foundation, simulation, and evaluation, and discuss how it detects abrupt changes in both synthetic and real bioelectric EEG signals more quickly and efficiently than the existing KS, $t$, and SSA methods. The rest of this paper is organized as follows. Section 2 gives the preliminaries of abrupt change detection by introducing the statistic and variance fluctuations between two adjacent time series segments. Section 3 describes the integrated framework of the BSTKS method in terms of the three search criteria in detail. Section 4 provides some representative experiments using synthetic time series and real EEG recordings and then analyzes the performance of BSTKS in comparison with the KS, $t$, and SSA methods. Section 5 gives the summary and conclusions.

#### 2. Preliminary

##### 2.1. Statistic Fluctuation

The KS statistic is sensitive to differences in both the location and the shape of the cumulative distribution functions (c.d.f.) of two samples. The null distribution of the KS statistic is calculated under the null hypothesis that the two samples are drawn from the same distribution or that one sample is drawn from the reference distribution. To detect an abrupt change in a diagnosed time series $Z$, we define the statistic fluctuation between two adjacent segments within $Z$ by means of the KS statistic as follows [1, 4, 19].

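*Definition 1.* Suppose that, for a time series sample $Z = \{z_1, z_2, \ldots, z_N\}$, one observes

$$z_i = f(t_i) + \varepsilon_i, \quad i = 1, 2, \ldots, N, \tag{1}$$

where $\{\varepsilon_i\}$ is a set of discrete, centred i.i.d. random variables and $f$ is a noisy mean signal with unknown distribution. The statistic fluctuation between two adjacent segments $Z_1 = \{z_1, \ldots, z_c\}$ and $Z_2 = \{z_{c+1}, \ldots, z_N\}$ is defined as

$$D = \sup_{x} \left| F_1(x) - F_2(x) \right|, \tag{2}$$

in which $F_1$ and $F_2$ are the c.d.f. of $Z_1$ and $Z_2$, respectively; $1 \le c < N$, $n_1 = c$, and $n_2 = N - c$. When the hypothesized $F_1$ and $F_2$ in (2) are not available, we can derive the empirical cumulative distribution functions (e.c.d.f.) of $Z_1$ and $Z_2$ from $z_1, \ldots, z_c$ and $z_{c+1}, \ldots, z_N$. Then, $F_1$ and $F_2$ can be redefined as

$$F_1(x) = \frac{1}{n_1} \sum_{i=1}^{c} \mathbf{1}\{z_i \le x\}, \qquad F_2(x) = \frac{1}{n_2} \sum_{i=c+1}^{N} \mathbf{1}\{z_i \le x\}, \tag{3}$$

where $F_1(x)$ and $F_2(x)$ count the proportion of the sample points below level $x$.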

*Hypothesis 1.* In order to discern an abrupt change in $Z$ in terms of the statistic fluctuation $D$ defined above, we introduce the KS test for two adjacent segments $Z_1$ and $Z_2$ in $Z$ as follows: $H_0$: if $D \le D_{\mathrm{th}}$, no abrupt change occurs in $Z$; $H_1$: if $D > D_{\mathrm{th}}$, an abrupt change occurs in $Z$, in which $D_{\mathrm{th}}$ is a threshold of the statistic fluctuation within $Z$ belonging to an identical distribution. Then, we test $H_0$ against $H_1$ from the $N$ observations. If an abrupt change occurs in $Z$, there exists a position $c$, $1 \le c < N$, at which $D > D_{\mathrm{th}}$. In this hypothesis, we assume that the number, the location, and the size of the function $f$ in (1) are unknown, and the upper bound of the statistic fluctuation is supposed to be known.
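As an illustration of Definition 1 and Hypothesis 1, the following Python sketch computes the two-sample KS distance between two adjacent segments from their empirical distribution functions and scans candidate split positions for the largest distance. It is a brute-force sketch, not the BSTKS search itself; the function names, the `min_size` guard, and the toy signal are assumptions made for illustration only.

```python
import numpy as np


def ks_fluctuation(left, right):
    """Two-sample KS distance: the largest gap between the two e.c.d.f.s."""
    left = np.sort(np.asarray(left, dtype=float))
    right = np.sort(np.asarray(right, dtype=float))
    pooled = np.concatenate([left, right])
    # e.c.d.f. value = proportion of sample points at or below each pooled level
    f_left = np.searchsorted(left, pooled, side="right") / left.size
    f_right = np.searchsorted(right, pooled, side="right") / right.size
    return float(np.max(np.abs(f_left - f_right)))


def scan_split_points(z, min_size=5):
    """Try every split position and keep the one with the largest KS distance.

    min_size (hypothetical parameter) keeps both segments from becoming
    too short for the e.c.d.f. to be meaningful.
    """
    z = np.asarray(z, dtype=float)
    best_c, best_d = None, -1.0
    for c in range(min_size, z.size - min_size + 1):
        d = ks_fluctuation(z[:c], z[c:])
        if d > best_d:
            best_c, best_d = c, d
    return best_c, best_d


# Toy signal (assumed): an oscillation whose level jumps after sample 50.
z = [0.0, 1.0] * 25 + [10.0, 11.0] * 25
print(scan_split_points(z))  # -> (50, 1.0)
```

In the sense of Hypothesis 1, an abrupt change would be declared when the returned distance exceeds a chosen threshold. The exhaustive scan costs one KS evaluation per candidate position, which is the kind of cost the tree-based search is meant to avoid.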

##### 2.2. Variance Fluctuation

When the statistic fluctuation defined in (2) is not significant enough, it is difficult to detect an abrupt change near the left or the right boundary of the time series, especially when the sample size gets smaller. Therefore, we need to introduce another variable that measures the variance fluctuation between two adjacent parts of a time series sample.

*Definition 2. *Supposing two adjacent segments and in , the variance fluctuation between and is defined asin which , , and .

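*Hypothesis 2.* If the variance fluctuation does not exceed a given threshold, no abrupt change occurs at the candidate position in the time series; if it exceeds the threshold, an abrupt change occurs at that position.

Here, the threshold is a variance threshold for a time series that obeys an identical distribution. If there exists a position at which the variance fluctuation exceeds this threshold, then an abrupt change occurs at that position.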
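A minimal Python sketch of the test in Hypothesis 2 follows. It assumes, purely for illustration, that the variance fluctuation is the absolute difference between the variances of the two adjacent segments; that assumed form, the function names, and the threshold value are all assumptions of this sketch.

```python
import numpy as np


def variance_fluctuation(left, right):
    """ASSUMED form: absolute difference of the two segment variances."""
    return abs(float(np.var(left)) - float(np.var(right)))


def variance_change(z, c, threshold):
    """Apply the threshold test at candidate position c.

    Returns (changed, value): whether the assumed variance fluctuation
    between z[:c] and z[c:] exceeds the threshold, and the value itself.
    """
    z = np.asarray(z, dtype=float)
    value = variance_fluctuation(z[:c], z[c:])
    return value > threshold, value


# Toy signal (assumed): constant mean, but the spread changes after sample 4.
z = [0, 0, 0, 0, -2, 2, -2, 2]
print(variance_change(z, 4, threshold=1.0))  # -> (True, 4.0)
```

The toy signal has the same mean in both halves, so a test based only on location would see little difference, while the variance comparison flags the change.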

#### 3. Method

##### 3.1. Two BSTs’ Construction

In the first part of the proposed BSTKS method, two BSTs, namely BSTcA and BSTcD, are constructed from a time series sample $Z$ by using the multilevel HWT. Generally, as shown in Figure 1, a discrete time series signal can be decomposed into a $k$th-level trend and $k$ levels of fluctuations. The $k$-level HWT is the mapping defined as [13]

$$Z \;\longmapsto\; \left( cA_k \mid cD_k \mid cD_{k-1} \mid \cdots \mid cD_1 \right),$$

and then the mapping can be represented by the approximation and detail coefficient matrices, termed McA and McD, respectively.
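The multilevel Haar decomposition underlying BSTcA and BSTcD can be sketched in Python as follows. The sketch uses the orthonormal Haar filters (pairwise sums and differences scaled by $1/\sqrt{2}$) and assumes the signal length is a power of two; the function name and the per-level list layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def haar_levels(z):
    """Multilevel Haar transform of a signal whose length is a power of two.

    Returns two lists (cA, cD): cA[j] and cD[j] hold the approximation
    (trend) and detail (fluctuation) coefficients at level j + 1.
    """
    a = np.asarray(z, dtype=float)
    if a.size == 0 or a.size & (a.size - 1):
        raise ValueError("signal length must be a power of two")
    cA, cD = [], []
    while a.size > 1:
        pairs = a.reshape(-1, 2)  # neighbouring samples, taken two at a time
        d = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)  # detail coefficients
        a = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)  # approximation coefficients
        cA.append(a)
        cD.append(d)
    return cA, cD


cA, cD = haar_levels([4, 6, 10, 12, 8, 6, 5, 5])
for level, (a, d) in enumerate(zip(cA, cD), start=1):
    print(level, a, d)
```

Read from the last (coarsest) level back to the first, each list pair has a tree shape: coefficient `i` at one level covers the same samples as coefficients `2 * i` and `2 * i + 1` one level finer. This is one natural way to lay out the two trees, assumed here for illustration.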