Security and Communication Networks

Volume 2017, Article ID 1869787, 10 pages

https://doi.org/10.1155/2017/1869787

## Locality-Based Visual Outlier Detection Algorithm for Time Series

^{1}Department of Computer Science, School of Internet of Things Engineering, Jiangnan University, Jiangsu, Wuxi 214122, China

^{2}Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA

Correspondence should be addressed to Zhihua Li; zhli@jiangnan.edu.cn

Received 22 August 2016; Revised 8 June 2017; Accepted 6 July 2017; Published 22 August 2017

Academic Editor: Emanuele Maiorana

Copyright © 2017 Zhihua Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Physiological theories indicate that the feature of time series data that leaves the deepest impression on the human visual system is its extreme values. Based on this principle, by studying extreme-point-based hierarchy segmentation strategies, a hierarchy-segmentation-based data extraction method for time series, and the idea of locality-based outliers, a novel outlier detection model and method for time series are proposed. The presented algorithm assigns an outlier factor to each subsequence in a time series so that visual outlier detection becomes relatively direct. The experimental results demonstrate that, on average, the developed method outperforms the compared methods and reduces time series data efficiently, which indicates the promising performance of the proposed method and its practical application value.

#### 1. Introduction

Time series, which arise in many applications [1] such as sensor network data collection [2–4], credit card fraud data [1], and environment monitoring data [2, 5–8], are one of the major types of big data. A time series is an ordered sequence of observations with respect to time; it is highly intuitive, and most of the desired key information can usually be obtained directly from its variations or distributions via the human visual system. Moreover, physiological experiments have demonstrated that the feature of sequence data that leaves the deepest impression on the human visual system is its extreme values [9, 10]. This observation inspired us to study a visual method for detecting outlier events based on this principle.

Generally, there are three types of outliers: collective outliers, point outliers, and contextual outliers [8]. Identification of outliers can lead to the discovery of significant clues and has practical applications in various fields, such as financial risk management [1, 10], anomaly detection [5], and disaster alarm in environment monitoring [2, 5–7]. In the past few decades, this issue has been addressed in academia and has attracted an increasing amount of attention. Some outlier detection approaches are based on notably different assumptions, intuitions, and models and also differ substantially in the scaling, range, and even meaning of their values [11]. Other methods are built on specific underlying technologies, such as the cluster-based detection method [4], the immunology-based detection method [12], and the SVM-based detection method [8]. Whatever the type of time series, many valuable characteristics exist at most locations, such as the locality features neighboring a real outlier, and these locality characteristics may be more meaningful than the global information. For example, when a doctor diagnoses a disease from an electrocardiogram (ECG), the ECG's local information is enough for finding the lesion. However, most of the aforementioned methods are unable to detect outliers in time series locally and visually.

Although most of the previous studies [1–8] have addressed outlier detection in time series, some challenges remain; for example, different time series are out of synchronization, so the results of traditional similarity calculation methods are no longer usable; periodic outliers in time series are hard to detect; and the outlier threshold is often determined unreasonably. In this paper, an outlier detection method based on hierarchy-segmentation data extraction is proposed. To achieve relatively high effectiveness and efficiency, our scheme integrates the following: (a) an extreme-point discriminating strategy based on hierarchy segmentation; (b) the hierarchy-segmentation-based data extraction (HSDE) method for time series; (c) the outlier detection model; and (d) the locality outlier detection algorithm. With regard to outlier identification, unlike previous attempts to solve this problem, the proposed method depends on the departure of an object's location from its expected hierarchy rather than on its global structure. Additionally, being labeled as an “outlier” here is not an either/or proposition. Instead, the proposed method assigns a local outlier factor to each detected subsequence, and the factor indicates the degree to which the object is outlying. Our major contributions are detailed as follows.

(1) The relation between the distribution characteristic in time series and the recognition mechanism associated with the human visual system is addressed, and the HSDE-based visual outlier detection method distinguishes the outliers directly without requiring previously observed training data.

(2) The locality-based outlier detection idea is successfully transferred to data mining of time series; in contrast, the previous LOF algorithms are only applicable to numerical data.

(3) A novel hierarchy-segmentation-based data extraction method for time series and its associated outlier detection model are presented.

The remainder of this paper is organized as follows. Related work is introduced in Section 2. In Section 3, we describe the new hierarchy-segmentation-based strategy and the related data extraction method. In Section 4, we improve the key ideas of the LOF algorithm and derive the framework of the HSDE-based outlier detection model and algorithm. Promising experimental results on benchmark datasets are presented in Section 5, followed by concluding remarks in Section 6.

#### 2. Related Works

A wide variety of studies have investigated outlier detection, and various categories of methods, such as global versus local, scoring versus labeling, and supervised versus unsupervised, have been proposed [13]. Most of them are derived from different notions of what an outlier is, such as similarity measurement or dissimilarity measurement. Because of the specific nature of time series, only a small portion of these methods can detect outliers in time series.

For distance-based outlier detection in time series, there are four main dissimilarity measurements, together with their later extensions and derived outlier detection schemes: Euclidean distance (ED), dynamic time warping (DTW), symbolic aggregate approximation (SAX), and extended symbolic aggregate approximation (Extended-SAX). The outlier detection methods developed from these four measurements all inherit the corresponding advantages and disadvantages. ED is well known for its simple computation and broad applicability, but it can only handle time series of equal length and cannot recognize the variation trend of a time series [13, 14]. DTW overcomes the first disadvantage of ED and supports time warping of time series. However, its computational and time complexity are high, which limits its range of application. Chiu et al. [15] proposed the symbolic aggregate approximation (SAX) approach. SAX first symbolizes the time series and then measures similarity on the symbolic data. This method is easy to use and independent of specific experimental data. Because of its relatively broad applicability, the approach has been widely used [16–18]. However, the similarity measure in SAX is essentially based on ED or DTW, so it inevitably inherits their disadvantages.
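To make the contrast between ED and DTW concrete, the following is a minimal sketch of both measures in plain Python. It uses the textbook DTW recurrence with an absolute-difference local cost; the function names and the choice of local cost are assumptions made for illustration and are not taken from the implementation used in this paper.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance; only defined for series of equal length."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Textbook dynamic time warping with |x - y| as the local cost.

    Runs in O(len(a) * len(b)) time and memory, which is the cost
    overhead relative to ED mentioned in the text.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] repeats against b[j-1]
                cost[i][j - 1],      # b[j-1] repeats against a[i-1]
                cost[i - 1][j - 1],  # both series advance
            )
    return cost[n][m]
```

In this sketch, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` evaluates to zero because the repeated value is absorbed by the warping, whereas `euclidean_distance` rejects the same pair because the lengths differ.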

Naess and Gaidai [9] developed a feature space-based outlier detection method based on SAX. This method can effectively reduce the number of features and compress the scale of the time series. However, important features are easily lost in the reduction process, and the method cannot detect outliers in time series visually. Extended symbolic aggregate approximation (Extended-SAX) [19] was developed from SAX, and an outlier detection method based on it was also presented. Extended-SAX depends on the piecewise aggregate approximation (PAA) representation for dimensionality reduction, which reduces dimensionality by replacing equal-sized subsequences with their mean values. The final distance measurement in Extended-SAX also depends on ED or DTW, and the PAA step adds further computation time. The outlier detection method based on Extended-SAX is likewise unable to detect outliers in time series visually. Moreover, all of the above methods perform outlier detection through a “distance measurement” rather than through the locality distribution characteristics of the time series.
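The PAA step described above can be sketched in a few lines of Python. This is a simplified illustration that assumes the series length is an exact multiple of the number of segments; the function name `paa` and that restriction are assumptions of this sketch, not details from the cited work.

```python
def paa(series, segments):
    """Piecewise aggregate approximation.

    Splits `series` into `segments` equal-sized chunks and replaces
    each chunk with its mean value.
    """
    n = len(series)
    if segments <= 0 or n % segments != 0:
        raise ValueError("len(series) must be a positive multiple of segments")
    size = n // segments
    return [sum(series[i:i + size]) / size for i in range(0, n, size)]
```

For instance, `paa([0, 0, 9, 0], 2)` averages the spike of 9 with its neighbor, which illustrates how mean-based reduction can flatten exactly the kind of extreme value that a visual method relies on.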

This paper also uses DTW as the dissimilarity measurement. The HSDE-based outlier detection scheme is also inspired by the strategy of the local outlier factor (LOF) [19] and its incremental LOF algorithm [20]; we address collective outlier detection with DTW-based methods and aim to enumerate the desired outliers in time series visually via the locality distribution characteristics of data points. In particular, the outliers are enumerated visually so that they can be detected by the human visual system. Finally, comparison studies are performed with the feature space-based outlier detection method [9] and the Extended-SAX-based outlier detection method [19], and the analysis results are presented.

#### 3. Hierarchy-Segmentation-Based Time Series Extraction

##### 3.1. Extreme-Point-Based Hierarchy Segmentation

According to the physiological theories [9, 10], the extreme value in time series (i.e., either the maximum value or the minimum value) usually gives people the deepest impression. Based on this principle, this paper presents a new concept: “*hierarchy* of time series.”

*Definition 1.* Given a time series , before and after , wherein the interval of is , if and is the maximum value or minimum value in , then it is called the hierarchy of and is the size of its corresponding marked window.

*Definition 2 (). *The absolute value of is called its “hierarchy value.”

In the following, is used to represent the corresponding subsequence and its hierarchy value is .

In this, the “hierarchy value” describes the importance level of in time series. The larger the hierarchy value, the higher the importance of in time series. Therefore, the hierarchy value is also entirely used to represent the importance level of in time series.
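As a rough illustration of Definitions 1 and 2, the sketch below adopts one possible reading, which is an assumption of this example and not a statement of the paper's exact formulation: the hierarchy of a point is the largest half-width k for which the point is the maximum or the minimum of the symmetric window around it, signed positive for a maximum and negative for a minimum, so that its absolute value serves as the hierarchy value. The function names are hypothetical.

```python
def signed_hierarchy(series, i):
    """Assumed reading of Definition 1 (not the paper's exact formula).

    Returns +k if series[i] is the maximum of every window
    series[i-j : i+j+1] for j = 1..k, -k if it is the minimum,
    and 0 if it is neither at j = 1. Windows must fit in the series.
    """
    limit = min(i, len(series) - 1 - i)
    k_max = k_min = 0
    for k in range(1, limit + 1):
        window = series[i - k:i + k + 1]
        # Only extend a run that has held for all smaller windows.
        if k_max == k - 1 and series[i] == max(window):
            k_max = k
        if k_min == k - 1 and series[i] == min(window):
            k_min = k
        if k_max < k and k_min < k:
            break
    # On a flat stretch both runs tie; this sketch reports the positive sign.
    return k_max if k_max >= k_min else -k_min


def hierarchy_value(series, i):
    """Assumed reading of Definition 2: absolute value of the hierarchy."""
    return abs(signed_hierarchy(series, i))
```

Under this reading, a point that dominates a wider window receives a larger hierarchy value, which matches the statement that a larger hierarchy value means higher importance in the time series.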

Based on the hierarchy characteristics of the different data points in a time series, the hierarchy-segmentation-based data extraction (HSDE) method for time series is proposed, which includes stages such as extreme-pointed discriminating (EPD), hierarchy marking of time series (HM), and hierarchy segmentation series accessing (HSSA).

*(1) Extreme-Pointed Discriminating.* In this section, the extreme-pointed discriminating (EPD) function is discussed. In a time series , is a subsequence of . If is “,” then the returned value of EPD is noted as Flag = 1; if is “,” then the returned value of EPD is noted as Flag = −1; otherwise, Flag = 0. The pseudocode of EPD is expressed in Pseudocode 1.
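A minimal Python sketch of an EPD-style function is given here for orientation. It assumes that the subsequence under test is three consecutive points and that Flag = 1 marks a local maximum (peak) and Flag = −1 a local minimum (valley); both the three-point window and this sign convention are assumptions of the sketch, and the names `epd` and `extreme_flags` are hypothetical. It is not a transcription of Pseudocode 1.

```python
def epd(left, mid, right):
    """Return 1 for a strict peak, -1 for a strict valley, 0 otherwise.

    Assumed convention: the middle point is compared with its two
    immediate neighbors.
    """
    if mid > left and mid > right:
        return 1
    if mid < left and mid < right:
        return -1
    return 0


def extreme_flags(series):
    """Apply epd to every interior point; endpoints get Flag = 0."""
    if len(series) < 3:
        return [0] * len(series)
    inner = [epd(series[i - 1], series[i], series[i + 1])
             for i in range(1, len(series) - 1)]
    return [0] + inner + [0]
```

Calling `extreme_flags` on a whole series yields one flag per point, which is a convenient input for a later hierarchy-marking pass.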