Advances in Architectures, Big Data, and Machine Learning Techniques for Complex Internet of Things Systems
Review Article | Open Access
Xiaodan Xu, Huawen Liu, Minghai Yao, "Recent Progress of Anomaly Detection", Complexity, vol. 2019, Article ID 2686378, 11 pages, 2019. https://doi.org/10.1155/2019/2686378
Recent Progress of Anomaly Detection
Abstract
Anomaly analysis is of great interest to diverse fields, including data mining and machine learning, and plays a critical role in a wide range of applications, such as medical health, credit card fraud, and intrusion detection. Recently, a large number of anomaly detection methods of various types have emerged. This paper intends to provide a comprehensive overview of the existing work on anomaly detection, especially for data with high dimensionality and mixed types, where identifying anomalous patterns or behaviours is nontrivial. Specifically, we first present recent advances in anomaly detection, discussing the pros and cons of the detection methods. Then we conduct extensive experiments on public datasets to evaluate several typical and popular anomaly detection methods. The purpose of this paper is to offer practitioners a better understanding of the state-of-the-art techniques of anomaly detection. Finally, we conclude by providing some directions for future research.
1. Introduction
Anomaly analysis is of great interest to diverse research fields, including data mining and machine learning. It aims to identify those regions of the data whose behaviours or patterns do not conform to expected values [1]. The unexpected behaviours, which differ significantly from those of the remainder of the given data, are commonly called anomalies. Nevertheless, there is no widely accepted formal definition of this concept. In the literature, an anomaly is also referred to as an outlier, a discordant object, an exception, an aberration, or a peculiarity, depending on the specific application scenario [1–5].
Identifying interesting or unexpected patterns is very important to many domains, such as decision making, business intelligence, and data mining. For example, an abnormal network transmission may imply that a computer system is attacked by hackers or viruses, an anomalous transaction of a credit card may imply unauthorized usage, and unexpected geological activity in nature can be a precursor of an earthquake or tsunami. Due to this fact, anomaly detection has a wide variety of applications, including public medical health, credit card fraud and network intrusion, and data cleaning [3, 5].
With the emergence of new technologies, data collected from real-world scenarios are becoming larger and larger, not only in size but also in dimensionality. In high-dimensional spaces, data objects become almost equidistant from each other: pairwise distances concentrate as the dimensionality grows, rendering distance-based comparisons increasingly meaningless [4]. In this case, traditional anomaly detection methods cannot handle high-dimensional data effectively. In addition, most traditional detection methods assume that all features of the data are of the same type. However, real data often mix different feature types, such as numerical, binary, categorical, or nominal, which further increases the difficulty of anomaly detection.
Since anomaly detection has a wide range of potential applications, a great number of detection algorithms have been proposed during the past decades. In this paper, we briefly review the latest work, with a special focus on methods for complex data with high dimensionality and mixed types. Generally, the existing anomaly detection techniques can be grouped into three categories, depending on the techniques used: neighbour-based, subspace-based, and ensemble-based detection methods. Table 1 summarizes brief descriptions of the anomaly detection algorithms, including their definitions, pros, and cons.

In the literature, several survey papers (e.g., [1–5]) have been devoted to anomaly detection. However, they concern different aspects of the problem. For example, [1] only reviews traditional outlier detection algorithms, while [2] focuses on ensemble learning ones. Detection methods for specific application domains, such as network data and temporal data, have been overviewed in [5] and [3], respectively. Unlike the surveys above, this paper covers only the latest and most popular anomaly detection methods for data with high dimensionality and mixed types, which the classical detection methods cannot handle very well. Besides, this paper also offers further information related to anomaly detection, such as public datasets and widely used metrics, aspects that have not been considered in the other papers. Additionally, this paper makes a comprehensive experimental comparison of several popular detection algorithms. The paper aims to help practitioners better understand the state-of-the-art techniques of anomaly detection.
The remainder of this paper is organized as follows. Section 2 presents a survey of anomaly detection for complex data, covering neighbour-based, subspace-based, and ensemble-based detection techniques. Section 3 describes evaluation metrics commonly used for anomaly detection, followed by experimental comparisons of the popular detection methods in Section 4. Section 5 concludes the paper.
2. Methodology
How to effectively identify outliers in high-dimensional or mixed-type data is a fundamental and challenging issue in outlier detection. Recently, a wealth of detection algorithms have been developed to alleviate these problems. Roughly speaking, they can be divided into three categories: neighbour-based (e.g., RBDA [6]), subspace-based (e.g., SOD [7]), and ensemble-based methods (e.g., HiCS [8]). The neighbour-based outlier detection methods mainly exploit the neighbourhood information of a given data object to determine whether it is far from its neighbours or whether its density is low. The subspace-based detection methods identify anomalies by sifting through different feature subsets in an ordered way. Unlike these, the ensemble-based methods combine the outputs of several detection algorithms, or base detectors, into a unified output using integration strategies. Table 1 briefly summarizes the anomaly detection techniques.
2.1. Neighbour-Based Detection
The basic idea of the neighbour-based anomaly detection methods is to identify outliers by virtue of neighbourhood information. Given a data object, the anomaly score is defined as the average distance (kNN [9]) or weighted distance (kNNW [10]) to its k nearest neighbours. Another strategy is to take the local outlier factor (LOF) [11] as the measurement of anomaly degree, in which the anomaly score of an object is measured relative to the densities of its neighbourhood. Building on LOF, LoOP [12] provides for each object an outlier probability as its score, which is easily interpretable and can be compared across data sets. In ODIN (Outlier Detection using Indegree Number) [13], an object is defined as an outlier if it participates in few neighbourhoods, that is, if it has a low indegree in the kNN graph.
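The distance-based variant of these scores is easy to sketch. The following is a minimal, illustrative implementation of the kNN anomaly score, where the brute-force distance computation and the function name are our own choices, not taken from the cited papers:

```python
# Hypothetical sketch of the kNN anomaly score: the average distance of
# each object to its k nearest neighbours. Larger score = more anomalous.
import numpy as np

def knn_scores(X, k=3):
    """Return the average k-nearest-neighbour distance for each row of X."""
    # Pairwise Euclidean distances (brute force, for illustration only).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)          # exclude the object itself
    nearest = np.sort(dist, axis=1)[:, :k]  # k smallest distances per row
    return nearest.mean(axis=1)

# A tight cluster plus one isolated point: the last object scores highest.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
scores = knn_scores(X, k=3)
```

Replacing the mean by a distance-weighted sum yields the kNNW flavour of the same idea.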
Note that all the neighbour-based detection methods mentioned above are independent of the distributions of the data and capable of detecting isolated objects. However, their performance heavily relies on distance measures, which become unstable or meaningless in high-dimensional spaces. To cope with this problem, a feasible solution is to consider the ranking of neighbours, because, for each object, the ranking of its nearest neighbours remains meaningful in high-dimensional data. The underlying assumption is that two objects would most likely become nearest neighbours, or have similar neighbours, if they were generated by the same mechanism [7]. Following this idea, RBDA (Rank-Based Detection Algorithm) [6] takes the ranks of each object among its neighbours as the proximity degree of the object. For each object s∈D, let N_{k}(s) be the k nearest neighbours of s. The anomaly degree of s is defined as follows:
O_{k}(s) = Σ_{p∈N_{k}(s)} r_{p}(s) / |N_{k}(s)|, (1)
where r_{p}(s) is the rank of s among the neighbours of p. According to Eq. (1), one may observe that if s appears late (i.e., with a large rank) in the neighbour lists of the objects in N_{k}(s), it has a higher anomaly degree and a higher probability of being considered an anomaly. RBDA does not consider the distance information of objects with regard to their neighbours, which would be useful in some cases; MRD (Modified-Ranks with Distance) [28] addresses this by taking both the ranks and the distances into account when estimating the anomaly scores of objects.
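RBDA's rank-based anomaly degree can be illustrated with a short sketch. The helper names and the tie-breaking behaviour (numpy's stable `argsort`) are our own choices under the definition above:

```python
# Illustrative sketch of a rank-based anomaly degree in the spirit of RBDA:
# for each object s, average the rank of s in the neighbour lists of s's
# k nearest neighbours.
import numpy as np

def rbda_scores(X, k=2):
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)
    order = np.argsort(dist, axis=1)          # neighbours sorted by distance
    n = X.shape[0]
    # rank[p, s] = 1-based position of s in p's sorted neighbour list
    rank = np.empty_like(order)
    rank[np.arange(n)[:, None], order] = np.arange(1, n + 1)
    scores = np.empty(n)
    for s in range(n):
        neighbours = order[s, :k]               # N_k(s)
        scores[s] = rank[neighbours, s].mean()  # average r_p(s), p in N_k(s)
    return scores

# A line of close points plus one far point: the far point ranks last
# in its neighbours' lists and therefore gets the highest degree.
X = np.array([[0.0], [0.2], [0.4], [0.6], [10.0]])
scores = rbda_scores(X, k=2)
```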
A special kind of nearest neighbour, called the reverse neighbour, is also used to represent the proximity relationships among objects. For any object s, p is called a reverse neighbour of s if s is one of the k nearest neighbours of p, that is, s∈N_{k}(p). The intuitive idea is that if an object has fewer reverse nearest neighbours, it is more likely to be an anomaly. Radovanovic et al. [29] adopted reverse nearest neighbours to estimate the anomaly score of each object. Bhattacharya et al. [30] took this idea further by adopting both the reverse neighbours and the ranks of nearest neighbours to measure the anomaly score of each candidate object. Zhang et al. [31] estimated anomaly scores using the number of shared nearest neighbours of objects. Tang and He [32] exploited three kinds of neighbourhoods, including k nearest neighbours, reverse nearest neighbours, and shared nearest neighbours, to determine anomaly scores via local kernel density estimation. The neighbour-ranking-based methods are sensitive to k, as different k values yield different results. In addition, assigning the right value to k for a specific application is not trivial. To this end, Ha et al. [33] adopted a heuristic strategy to select an appropriate value of k using an iterative random sampling procedure. The assumption is that outlying objects are less likely to be selected than inlying objects in random sampling. Thus, greater inlier scores, called observability factors (OF), should be given to the selected objects in each round of sampling. After several iterations of random sampling, the OF score of each object is estimated by counting how many times it occurs in the neighbourhoods of the sampled objects. Based on the OF scores, an appropriate value of k can be chosen using the entropy of the observability factors.
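The reverse-neighbour intuition can be made concrete by counting, for each object, how many other objects list it among their k nearest neighbours. This is a minimal sketch with our own helper names, not the exact scoring of [29]:

```python
# Minimal sketch of the reverse-nearest-neighbour idea: an object with few
# reverse k-nearest neighbours is flagged as more anomalous.
import numpy as np

def reverse_knn_counts(X, k=2):
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)
    knn = np.argsort(dist, axis=1)[:, :k]   # N_k(p) for every object p
    counts = np.zeros(X.shape[0], dtype=int)
    for p in range(X.shape[0]):
        for s in knn[p]:
            counts[s] += 1                  # s gained a reverse neighbour: p
    return counts                           # small count = likely anomaly

X = np.array([[0.0], [0.1], [0.2], [0.3], [9.0]])
counts = reverse_knn_counts(X, k=2)
```

The isolated point at 9.0 appears in nobody's neighbour list, so its count is zero.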
2.2. Subspace-Based Detection
Anomalies often exhibit unusual behaviours in one or more local or low-dimensional subspaces, and such behaviours may be masked by full-dimensional analysis [34]. Zimek et al. [4] noted that, for an object in a high-dimensional space, only a subset of relevant features offers valuable information, while the rest are irrelevant to the task. The existence of irrelevant features may impede the separability of the anomaly detection model. However, the anomaly detection techniques discussed so far identify anomalous objects in the whole data space with full dimensions. Thus, identifying anomalies in appropriate subspaces appears to be more interesting and efficient.
Subspace learning is a popular technique for handling high-dimensional problems and has also been extensively studied in anomaly analysis. The anomaly detection methods based on subspace techniques aim at finding anomalies by sifting through different subsets of dimensions in an ordered way. These methods come in two flavours: the sparse subspace methods [14, 16, 35, 36] and the relevant subspace methods [7, 15, 17, 18, 37].
The sparse subspace techniques project all objects of a high-dimensional space onto one or more low-dimensional, sparse subspaces. The objects falling into the sparse subspaces are considered anomalies because those subspaces have abnormally low densities. Note that exploring sparse projections of the entire high-dimensional space is a time-consuming process. To alleviate this problem, Aggarwal and Yu [36] exploited an evolutionary algorithm to improve the exploration efficiency, where subspaces with the most negative sparsity coefficients were considered sparse projections. However, the performance of the evolutionary algorithm heavily relies on factors such as the initial populations, the fitness functions, and the selection methods.
Subspace representation and encoding are another studied topic within the sparse subspace techniques. As a typical example, Zhang et al. [14] utilized the concept lattice to represent the relationships between subspaces, where subspaces with low density coefficients are regarded as sparse. This kind of method has advantages in performance and completeness. However, constructing the concept lattice of subspaces is complex, leading to low efficiency. Dutta et al. [16] leveraged sparse encoding to project objects onto a manifold space with a linear transformation, making the space sparse.
The relevant subspace methods exploit local information, represented as relevant features, to identify anomalies. For instance, OR (OutRank) [17] extends a subspace clustering model to rank outliers in heterogeneous high-dimensional data. SOD (Subspace Anomaly Detection) [7] is a typical example of the relevant subspace learning methods. It first builds a reference set for each object from its shared nearest neighbours and then determines an axis-parallel subspace on each reference set such that each retained feature has low variance within it. Unlike SOD, Muller et al. [37] used the relevance relationships of features within the reference set to determine the subspace. Specifically, they obtained relevant subspaces by examining these relationships with the Kolmogorov-Smirnov test [38]. The anomaly degree of an object is then calculated by multiplying its local anomaly scores in each relevant subspace. It is easy to see that this kind of detection method is computationally expensive. A further limitation is that it requires a great amount of local data to detect the trend of deviation.
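A much-simplified sketch conveys the SOD-style idea. We make two explicit simplifications relative to the method described above: the reference set is chosen by plain kNN rather than shared nearest neighbours, and the low-variance features are selected by an assumed threshold parameter alpha; the helper names are ours:

```python
# Simplified SOD-like scoring: features in which an object's reference set
# has low variance form the relevant subspace; the score is the object's
# deviation from the reference mean in that subspace.
import numpy as np

def sod_like_score(X, i, k=4, alpha=0.8):
    diff = X - X[i]
    dist = np.sqrt((diff ** 2).sum(axis=1))
    dist[i] = np.inf
    ref = X[np.argsort(dist)[:k]]           # reference set of object i
    var = ref.var(axis=0)
    relevant = var < alpha * var.mean()     # low-variance (relevant) features
    if not relevant.any():
        return 0.0
    # Normalised deviation from the reference mean in the relevant subspace.
    d = np.sqrt(((X[i] - ref.mean(axis=0))[relevant] ** 2).sum())
    return d / relevant.sum()

# The second feature is noisy for everyone; object 4 deviates strongly in
# the first feature only, which a full-dimensional view could mask.
X = np.column_stack([np.array([0.0, 0.1, 0.2, 0.1, 5.0]),
                     np.array([1.0, -2.0, 3.0, -1.0, 2.0])])
scores = [sod_like_score(X, i) for i in range(len(X))]
```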
2.3. Ensemble-Based Detection
Ensemble learning is widely studied in machine learning [39, 40]. Since it often performs better than individual learners, ensemble learning is also frequently used for anomaly detection. Due to the complexity of the data, no single outlier detection method can discover all anomalies in one low-dimensional subspace. Thus, different learning techniques, or even multiple subspaces, are required simultaneously, and the potential anomalies are derived by ensemble techniques. In the literature, two ensemble strategies are frequently adopted for anomaly analysis: summing the anomaly scores, and selecting the best one after ranking. For anomaly analysis, feature bagging and subsampling are the most extensively studied techniques.
The FB (Feature Bagging) detection method [19] trains multiple models on different feature subsets sampled from a given high-dimensional space and then combines the model results into an overall decision. A typical example of this technique is the work of Lazarevic and Kumar [19], in which feature subsets are randomly selected from the original feature space. On each feature subset, the score of each object is estimated with an anomaly detection algorithm, and the scores for the same object are then integrated into a final score. Nguyen et al. [41] used different detection techniques, rather than the same one, to estimate anomaly scores for each object on random subspaces.
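The feature bagging loop is straightforward to sketch. This illustration uses the kNN distance score as the base detector and averages scores across rounds; the subset-size rule (between d/2 and d-1 features) follows the description in [19], while the function names and the averaging combination are our own choices:

```python
# Hedged sketch of feature bagging: repeated random feature subsets, a base
# detector (kNN average distance) scored on each, and scores combined.
import numpy as np

def knn_score(X, k=2):
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)
    return np.sort(dist, axis=1)[:, :k].mean(axis=1)

def feature_bagging(X, rounds=10, k=2, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = np.zeros(X.shape[0])
    for _ in range(rounds):
        m = rng.integers(d // 2, d)                 # between d/2 and d-1
        subset = rng.choice(d, size=m, replace=False)
        total += knn_score(X[:, subset], k)         # combine by summing
    return total / rounds

# Six cluster points along a line in 4-D plus one distant object.
X = np.vstack([np.zeros((6, 4)) + 0.1 * np.arange(6)[:, None],
               [[8.0, 8.0, 8.0, 8.0]]])
scores = feature_bagging(X)
```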
Keller et al. [8] proposed a flexible anomaly detection method that decouples the process of anomaly mining into two steps: subspace search and anomaly ranking. The subspace search obtains high contrast subspaces (HiCS) using Monte Carlo sampling, and the LOF scores of objects are then aggregated over the obtained subspaces. Stein [20] extended this approach by first gathering the relevant subspaces of HiCS and then calculating the anomaly scores of objects using local outlier probabilities (LoOP) [12], in which the neighbourhood is selected in the global data space.
The subsampling technique obtains training objects from a given collection of data without replacement. If implemented properly, it can effectively improve the performance of detection methods. For example, Zimek et al. [21] applied the technique of random subsampling to obtain the nearest neighbours for each object and then estimated its local density. This ensemble method, coupled with an anomaly detection algorithm, has a higher efficiency and provides a diverse set of results.
Several anomaly detection methods consider both feature bagging and subsampling. For example, Pasillas-Diaz et al. [22] obtained different features at each iteration via feature bagging and then calculated the anomaly scores on different subsets of data via subsampling. However, the variance of objects is difficult to estimate under feature bagging, and the final results tend to be sensitive to the size of the subsampled datasets.
2.4. Mixed-Type Detection
It is worth remarking that most of the anomaly detection methods mentioned above can only handle numerical data, which limits their robustness. In real-world applications, categorical and nominal features are ubiquitous; that is, categorical and numerical features are mixed within the same dataset [34]. Such mixed-type data pose great challenges to the existing detection algorithms. For mixed-type data, a common and simple strategy is to discretize the numerical features and then treat them as categorical ones, so that detection methods for categorical data can be applied directly. While this practice seems to be a reasonable solution, it may lose important information, namely, the correlations between features, leading to poor performance.
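The naive discretisation strategy described above amounts to binning each numerical column independently, which is exactly where the cross-feature correlations are lost. A minimal sketch (equal-width binning is just one assumed choice; the function name is ours):

```python
# Naive mixed-type preprocessing: bin a numerical column into integer
# categories so that a categorical detector can consume it. Each column is
# treated independently, so correlations between features are discarded.
import numpy as np

def discretise(column, bins=3):
    """Equal-width binning of a numerical column into integer categories."""
    edges = np.linspace(column.min(), column.max(), bins + 1)[1:-1]
    return np.digitize(column, edges)

age = np.array([22.0, 25.0, 31.0, 47.0, 52.0, 90.0])
codes = discretise(age, bins=3)
# The extreme value 90.0 lands alone in the top bin.
```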
By now, a great number of detection methods have been developed to handle categorical data [42]. For example, He et al. [43] proposed a frequent-pattern-based anomaly detection algorithm, where the potential anomalies are measured using a frequent pattern anomaly factor; as a result, data objects that contain infrequent patterns can be considered anomalies. In contrast, Otey et al. [44] developed an anomaly detection algorithm based on infrequent itemsets. Although the pattern-based methods are suitable for handling categorical data, they are time consuming in general. Wu and Wang [45] estimated the frequent pattern anomaly factors with non-exhaustive methods, mining a small number of patterns instead of all the frequent patterns. Koufakou and Georgiopoulos [46] considered the condensed representation of non-derivable itemsets in their algorithm, which is compact and can be obtained at lower cost.
Many studies attempt to handle mixed-type data directly. Typical examples include LOADED [23], RELOADED, and ODMAD [24]. For instance, LOADED calculates an anomaly score for each object by using the support degrees of itemsets for the categorical features and correlation coefficients for the numerical features [23]. RELOADED employs naive Bayes classifiers to predict abnormalities of categorical features. ODMAD treats categorical and numerical features separately: it first calculates anomaly scores for the categorical features using the same strategy as LOADED, and the objects not identified as anomalies at this step are then examined over the numerical features with the cosine similarity [24]. Bouguessa [47] modelled the categorical and numerical feature space using a mixture of bivariate beta distributions; objects with a small probability of belonging to any component are regarded as anomalies.
The correlations between features have also been taken into consideration. For example, Zhang and Jin [25] exploited the concept of patterns to determine anomalies. In this method, a pattern is a subspace formed by a particular category and all numerical features, and the patterns are learned via logistic regression. An object is considered an anomaly if the probability returned by the model shows it to be far from a specific pattern. Lu et al. [26] took pairwise correlations of mixed-type features into consideration and presented a generalized linear model framework for anomaly analysis, in which Student's t-distribution is used to capture the variations of anomalies from normal objects. More recently, Do et al. [27] calculated anomaly scores for each object using the concept of free energy derived from a mixed-variate restricted Boltzmann machine. Since this approach captures the correlation structures of mixed-type features well through the factoring technique, it achieves relatively high performance.
3. Evaluation Measurements
Unlike classification problems, evaluating the performance of anomaly detection algorithms is more complicated. On the one hand, the ground truth of anomalies is often unclear because real anomalies are rare in nature. On the other hand, anomaly detection algorithms typically output an anomaly score for each object, and objects whose scores exceed a given threshold are considered anomalies. Setting a proper threshold for each application in advance is difficult: if the threshold is too large, true anomalies will be missed; if it is too small, normal objects will be mistakenly reported as potential anomalies.
In general, the following measurements are often used to evaluate the performance of anomaly detection methods.

(1) Precision at t (P@t) [48]: given a dataset D consisting of N objects, P@t is defined as the proportion of true anomalies, A⊆D, among the top t potential anomalies identified by the detection method; that is,
P@t = |{o∈A : rank(o) ≤ t}| / t. (2)
Note that the value of t is difficult to set for each specific application. A commonly used strategy is to set t to the number of anomalies in the ground truth.

(2) R-precision [49]: this measurement is the proportion of true anomalies within the top t potential anomalies identified, where t is the number of ground-truth anomalies. Since the number of true anomalies is small in comparison to the size of the dataset, the value of R-precision tends to be very small and thus carries less information.

(3) Average precision (AP) [50]: instead of evaluating the precision at a single cut-off, this measurement is the mean of the precision scores over the ranks of all anomalous objects:
AP = (1/|A|) Σ_{o∈A} P@rank(o),
where P@t is the precision at t, that is, Eq. (2).

(4) AUC [4]: the receiver operating characteristic (ROC) curve is a graphical plot of the true positive rate against the false positive rate, where the true (false) positive rate represents the proportion of anomalies (inliers) ranked among the top t potential anomalies. Zimek et al. [4] noted that, for a random model, the ROC curve tends to be diagonal, while a good ranking model outputs true anomalies first, so that the area under the curve (AUC) covers nearly all the available space. Thus, the AUC is often used to numerically evaluate the performance of anomaly detection algorithms.

(5) Correlation coefficient: correlation coefficients, such as Spearman's rank correlation and the Pearson correlation, are also used as evaluation measurements. This kind of measurement places more emphasis on the potential anomalies ranked at the top by incorporating weights.
More details about the correlation coefficient measurements can be found in [51] and the references therein.

(6) Rank power (RP): neither the precision nor the AUC criterion considers the full characteristics of the anomaly ranking. Intuitively, a ranking algorithm is more effective if it places true anomalies at the top and normal observations at the bottom of the list of anomaly candidates. The rank power is such a metric and evaluates the overall ranking of the true anomalies. Its formal definition is
RP(t) = n(n+1) / (2 Σ_{i=1}^{n} R_{i}),
where n is the number of true anomalies among the top t potential objects and R_{i} is the rank of the ith true anomaly. For a fixed value of t, a larger value indicates better performance. When the top t anomaly candidates are all true anomalies, the rank power equals one.
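The ranking-based measures above are small enough to implement directly. The following sketch computes P@t and the rank power from a ranked candidate list; the function names and the toy example are our own:

```python
# Sketches of two of the evaluation measures: precision at t and rank power,
# both computed from a list of candidates ranked by anomaly score.

def precision_at_t(ranked, truth, t):
    """Fraction of the top-t ranked objects that are true anomalies."""
    return sum(1 for o in ranked[:t] if o in truth) / t

def rank_power(ranked, truth, t):
    """n(n+1) / (2 * sum of ranks of the true anomalies in the top t)."""
    ranks = [i + 1 for i, o in enumerate(ranked[:t]) if o in truth]
    n = len(ranks)
    return n * (n + 1) / (2 * sum(ranks)) if ranks else 0.0

# Objects ranked by anomaly score; 'a' and 'c' are the true anomalies.
ranked, truth = ['a', 'b', 'c', 'd'], {'a', 'c'}
p_at_2 = precision_at_t(ranked, truth, 2)
rp = rank_power(ranked, truth, 4)
```

With the true anomalies at ranks 1 and 3, the rank power is 2·3 / (2·(1+3)) = 0.75; a perfect ranking would give 1.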
4. Experimental Comparisons
As discussed above, a wide variety of anomaly detection algorithms have been developed. To better understand the characteristics of these methods, in this section we make an experimental comparison of the popular anomaly detection algorithms.
4.1. Experimental Settings
In the literature, two kinds of data, synthetic and real-world datasets, are often used to evaluate the performance of anomaly detection methods. The former are generated under specific constraints or conditions; Wang et al. [52] provided several synthetic datasets with anomalies for different scenarios. The real-world datasets are offered by public sources such as the UCI Machine Learning Repository [53] and the ELKI toolkit [54]. However, the publicly available datasets were originally intended for classification purposes. Hence, they have to be preprocessed to make them suitable for anomaly detection tasks. Two strategies are frequently adopted during the preprocessing stage [55]: if the classes have explicit semantic meanings, the classes with rare data are regarded as anomalies and the remaining ones as normal; otherwise, one of the classes is randomly selected to serve as the anomalies.
To make a fair comparison, our experiments were carried out on 18 real-world datasets. They were downloaded from the UCI Machine Learning Repository [53], the ELKI toolkit [54], and the ELVIRA Biomedical Dataset Repository (EBD) [56]. A brief summary of the datasets is presented in Table 2, where the "N (A)" column gives the numbers of normal objects and anomalies, respectively. We preprocessed the datasets as suggested in [55]. For example, in PenDigits, which consists of 9,868 objects, the fourth class ('4') was considered anomalous, while the remaining objects were treated as normal in our experiments.

The experiments compared nine popular anomaly detection algorithms: kNN (k Nearest Neighbours) [9], LOF (Local Outlier Factor) [11], LoOP (Local Outlier Probabilities) [12], ODIN (Outlier Detection using Indegree Number) [13], RBDA (Rank-Based Detection Algorithm) [6], OR (OutRank) [17], SOD (Subspace Anomaly Detection) [7], FB (Feature Bagging) [19], and HiCS (High Contrast Subspaces) [8]. They represent the three categories of anomaly detection methods discussed above: kNN, ODIN, LOF, LoOP, and RBDA are neighbour-based methods; OR and SOD are subspace-based methods; and FB and HiCS are ensemble-based methods.
In our experiments, two metrics, R-precision and AUC, were adopted to evaluate the detection algorithms; results for the remaining metrics are not presented here because they led to similar conclusions. The comparison experiments were conducted with the ELKI toolkit. The parameters of the anomaly detection algorithms were assigned the default values recommended in the literature. The experiments were performed on a PC with a 2.8 GHz CPU clock rate and 4 GB of main memory.
4.2. Experimental Results
Table 3 provides the R-precision performance of the anomaly detection algorithms on the experimental datasets. Since the main memory was quickly exhausted when RBDA, FB, and OR ran on the ALOI and KDDCup99 datasets, their experimental results were unavailable and are presented as "/" in Table 3.

From the experimental results in Table 3, one may observe that the neighbour-based methods had relatively stable performance, while the ensemble-based methods, for example, HiCS, performed unsteadily in many cases. For instance, kNN and RBDA achieved relatively good performance on eight datasets, whereas HiCS performed worse on four of them (PenDigits, KDDCup99, Annthyroid, and DLBCL) but achieved the highest R-precision on Waveform, WDBC, and Ovarian. The reason is that the ensemble-based detection methods tend to be sensitive to the size of the datasets subsampled from the original ones. Since OR is heavily dependent on the number of feature subspaces, it obtained the highest values on Annthyroid and the lowest values on Sonar, Waveform, Arrhythmia, and Spambase. On the high-dimensional datasets, that is, Arcene, ALLAML, DLBCL, Gisette, Lung_MPM, and Ovarian, kNN, RBDA, SOD, and OR performed well, with OR and SOD being more stable. Indeed, these datasets contain many irrelevant features, which makes the subspace-based methods more effective. It can also be observed that RBDA was better than kNN and ODIN, because RBDA takes the neighbour rankings, instead of the distances, into account, which is more suitable for high-dimensional datasets.
The compared algorithms, except OR, take kNN as their baseline. As we know, kNN heavily relies on the number of neighbours k. To reveal the impact of k on the performance, we compared these methods with different values of k. Tables 4 and 5 show the AUC scores of the anomaly detection algorithms with k=10 and k=50, respectively. Since the experimental results on the high-dimensional datasets (i.e., Arcene, ALLAML, DLBCL, Gisette, Lung_MPM, and Ovarian) were still unavailable after three hours of running, they are not provided in Table 5.


According to the results, the detection performance of the compared algorithms depends heavily on the number of neighbours and varies greatly when k is assigned different values. To further illustrate this fact, we conducted additional experiments, running the detection algorithms on Arrhythmia, Waveform, and WDBC with k varying from 10 to 50. The experimental results are illustrated in Figure 1.
Figure 1: AUC scores of the detection algorithms with k varying from 10 to 50 on (a) Arrhythmia, (b) Waveform, and (c) WDBC.
As shown in Figure 1, the performance of RBDA, FB, and SOD was relatively stable, although they use kNN as their baseline. In fact, SOD exploits kNN to obtain the relevant subspace information, while FB combines the results obtained on multiple feature subsets, so k had less impact on them. On the other hand, kNN, ODIN, LoOP, and LOF relied heavily on the value of k. For example, the AUC values of ODIN varied greatly on all three datasets as k changed. HiCS had unsteady performance in many cases; for instance, it was hardly affected by k on WDBC, while being sensitive to k on Arrhythmia. The reason is that, in our experiments, the base detector of HiCS was also kNN, so its performance depends on k even though it is an ensemble anomaly detection algorithm.
Another interesting fact is that, on the WDBC and Waveform datasets, the AUC values of the compared algorithms varied greatly. Indeed, the ratios of anomalies to normal objects in these two datasets are relatively small (2.7% and 2.9% for WDBC and Waveform, respectively), so the anomaly detection algorithms were more sensitive to k. In contrast, on the datasets with high anomaly proportions, for example, Arrhythmia (45.7% anomalies), the AUC scores were less sensitive to k. Similar situations can be found on the other datasets; due to space limitations, they are not presented here one by one.
Computational efficiency is another important aspect for the practical applications of the anomaly detection methods. We carried out an additional experiment to compare the computational efficiencies of the anomaly detection algorithms. Table 6 records the elapsed time (s) of the anomaly detection algorithms on the experimental datasets.

The elapsed time in Table 6 shows that the neighbour-based detection methods, whether based on distances (e.g., kNN and ODIN) or densities (e.g., LOF and LoOP), were relatively efficient. By contrast, the ensemble-based detection methods, especially HiCS, took much more time to detect anomalies, since they construct many individual detectors before identifying outliers. For the subspace-based detection algorithms, efficiency depends on the techniques adopted; for example, SOD, which exploits neighbours to explore relevant subspaces, is more efficient than OR.
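The cost of ensemble detectors grows linearly with the number of base detectors they construct. As a minimal sketch of the feature-bagging idea (random feature subspaces, scores normalised and averaged; we use the kNN distance score as the base detector here, whereas the original FB scheme used LOF), assuming the `knn_score` helper below:

```python
import numpy as np

def knn_score(X, k=10):
    """Distance to the k-th nearest neighbour as a base anomaly score."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    return np.sort(d, axis=1)[:, k]   # column 0 is the zero self-distance

def feature_bagging(X, n_rounds=10, k=10, seed=0):
    """Average base-detector scores over random feature subspaces.

    Each round samples between d/2 and d-1 features; runtime grows
    linearly with n_rounds, which is why ensemble detectors are slower
    than a single kNN/LOF pass over the full feature space.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    total = np.zeros(n)
    for _ in range(n_rounds):
        m = rng.integers(d // 2, d)                    # subspace size
        feats = rng.choice(d, size=m, replace=False)   # random subspace
        s = knn_score(X[:, feats], k)
        total += (s - s.mean()) / (s.std() + 1e-12)    # normalise before combining
    return total / n_rounds
```

Averaging is the simpler of the two combination rules proposed for feature bagging; the trade-off between the extra n_rounds factor in runtime and the gain in robustness is exactly what Table 6 and Figure 1 illustrate.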
5. Conclusion
The data collected from real-world applications are growing ever larger in both size and dimensionality. As the dimensionality increases, the data objects become sparse, making anomaly identification more challenging; moreover, conventional anomaly detection methods can no longer work effectively and efficiently. In this paper, we have discussed the typical problems of anomaly detection on high-dimensional and mixed-type data and briefly reviewed the corresponding detection techniques. To offer practitioners a better understanding of these techniques, we conducted extensive experiments on publicly available datasets to evaluate several typical and popular anomaly detection methods. Although some progress has been made on anomaly detection for high-dimensional and mixed-type data, several open issues still need to be addressed:
(1) The traditional distance metrics used by the neighbour-based methods do not work well on high-dimensional data, where distances become nearly equidistant, and mixed-type features make anomaly detection even more difficult. Effective distance metrics for high-dimensional, mixed-type data are therefore needed.
(2) The neighbour-based anomaly detection algorithms are sensitive to the nearest neighbours selected for their models. Determining the right number of neighbours remains a challenging issue for these methods.
(3) The subspace-based and ensemble-based methods perform relatively well when the diversity of the subspaces or base learners is large. For these methods, how to choose the right subspaces or base learners, as well as their quantities and combining strategies, is still an open issue.
(4) Since anomalies are relatively rare and the ground truth is often unavailable in real scenarios, how to effectively and comprehensively evaluate detection performance is also a challenging issue.
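For open issue (1), a common baseline for mixing numeric and categorical attributes is a Gower-style distance: range-normalised absolute differences for numeric features and 0/1 mismatch indicators for categorical ones, averaged over all features. The sketch below is our illustration of that idea, with hypothetical attributes (age, income, occupation, city), not a metric proposed in this paper:

```python
def gower_distance(a, b, num_idx, cat_idx, ranges):
    """Gower-style distance between two mixed-type records.

    Numeric features contribute |a - b| / range (range-normalised so that
    no single attribute dominates); categorical features contribute a 0/1
    mismatch indicator; the result is the average over all features.
    """
    d = 0.0
    for i in num_idx:
        d += abs(a[i] - b[i]) / ranges[i]
    for i in cat_idx:
        d += 0.0 if a[i] == b[i] else 1.0
    return d / (len(num_idx) + len(cat_idx))

# Hypothetical records: (age, income, occupation, city)
x = (35, 52000.0, "engineer", "Hangzhou")
y = (62, 51000.0, "teacher",  "Hangzhou")
ranges = {0: 80.0, 1: 100000.0}   # observed ranges of the numeric features

print(gower_distance(x, y, [0, 1], [2, 3], ranges))
```

Any neighbour-based detector (kNN, LOF, etc.) can then be run on top of such a metric; the unresolved part of the open issue is that this simple averaging still degrades in very high dimensions.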
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Funding
This work was supported by the National Natural Science Foundation (NSF) of China (61871350, 61572443); the Natural Science Foundation of Zhejiang Province of China (LY14F020019); and Shanghai Key Laboratory of Intelligent Information Processing, Fudan University (IIPL-2016-001).
References
[1] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, pp. 1–58, 2009.
[2] C. C. Aggarwal, “Outlier ensembles,” ACM SIGKDD Explorations Newsletter, vol. 14, no. 2, pp. 49–80, 2017.
[3] M. Gupta, J. Gao, C. C. Aggarwal, and J. Han, “Outlier detection for temporal data: a survey,” IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 9, pp. 2250–2267, 2014.
[4] A. Zimek, E. Schubert, and H.-P. Kriegel, “A survey on unsupervised outlier detection in high-dimensional numerical data,” Statistical Analysis and Data Mining, vol. 5, no. 5, pp. 363–387, 2012.
[5] P. Gogoi, D. K. Bhattacharyya, B. Borah, and J. K. Kalita, “A survey of outlier detection methods in network anomaly identification,” The Computer Journal, vol. 54, no. 4, pp. 570–588, 2011.
[6] H. Huang, K. Mehrotra, and C. K. Mohan, “Rank-based outlier detection,” Journal of Statistical Computation and Simulation, vol. 83, no. 3, pp. 518–531, 2013.
[7] H.-P. Kriegel, P. Kröger, E. Schubert, and A. Zimek, “Outlier detection in axis-parallel subspaces of high dimensional data,” in Proceedings of the Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, pp. 831–838, Springer-Verlag, 2009.
[8] F. Keller, E. Müller, and K. Böhm, “HiCS: high contrast subspaces for density-based outlier ranking,” in Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE 2012), pp. 1037–1048, USA, April 2012.
[9] S. Ramaswamy, R. Rastogi, and K. Shim, “Efficient algorithms for mining outliers from large data sets,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 427–438, 2000.
[10] F. Angiulli and C. Pizzuti, “Fast outlier detection in high dimensional spaces,” in Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery, pp. 15–26, Springer-Verlag, Berlin, Heidelberg, Germany, 2002.
[11] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, “LOF: identifying density-based local outliers,” ACM SIGMOD Record, vol. 29, no. 2, pp. 93–104, 2000.
[12] H.-P. Kriegel, P. Kröger, E. Schubert, and A. Zimek, “LoOP: local outlier probabilities,” in Proceedings of the 18th ACM International Conference on Information and Knowledge Management (CIKM '09), pp. 1649–1652, ACM Press, November 2009.
[13] V. Hautamaki, I. Karkkainen, and P. Franti, “Outlier detection using k-nearest neighbour graph,” in Proceedings of the IEEE International Conference on Pattern Recognition, vol. 3, pp. 330–433, 2004.
[14] J. Zhang, Y. Jiang, K. H. Chang, S. Zhang, J. Cai, and L. Hu, “A concept lattice based outlier mining method in low-dimensional subspaces,” Pattern Recognition Letters, vol. 30, no. 15, pp. 1434–1439, 2009.
[15] J. Zhang, X. Yu, Y. Li, S. Zhang, Y. Xun, and X. Qin, “A relevant subspace based contextual outlier mining algorithm,” Knowledge-Based Systems, vol. 99, pp. 1–9, 2016.
[16] J. K. Dutta, B. Banerjee, and C. K. Reddy, “RODS: rarity based outlier detection in a sparse coding framework,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 2, pp. 483–495, 2016.
[17] E. Müller, I. Assent, U. Steinhausen, and T. Seidl, “OutRank: ranking outliers in high dimensional data,” in Proceedings of the IEEE 24th International Conference on Data Engineering Workshops (ICDE '08), pp. 600–603, Mexico, April 2008.
[18] E. Müller, M. Schiffer, and T. Seidl, “Adaptive outlierness for subspace outlier ranking,” in Proceedings of the 19th International Conference on Information and Knowledge Management and Co-located Workshops (CIKM '10), pp. 1629–1632, Canada, October 2010.
[19] A. Lazarevic and V. Kumar, “Feature bagging for outlier detection,” in Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2005), pp. 157–166, USA, August 2005.
[20] B. van Stein, M. van Leeuwen, and T. Bäck, “Local subspace-based outlier detection using global neighbourhoods,” in Proceedings of the 4th IEEE International Conference on Big Data (Big Data 2016), pp. 1136–1142, USA, December 2016.
[21] A. Zimek, M. Gaudet, R. J. G. B. Campello, and J. Sander, “Subsampling for efficient and effective unsupervised outlier detection ensembles,” in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2013), pp. 428–436, USA, August 2013.
[22] J. R. Pasillas-Diaz and S. Ratte, “Bagged subspaces for unsupervised outlier detection,” International Journal of Computational Intelligence, vol. 33, no. 3, pp. 507–523, 2017.
[23] A. Ghoting, M. E. Otey, and S. Parthasarathy, “LOADED: link-based outlier and anomaly detection in evolving data sets,” in Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM 2004), pp. 387–390, UK, November 2004.
[24] A. Koufakou and M. Georgiopoulos, “A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes,” Data Mining and Knowledge Discovery, vol. 20, no. 2, pp. 259–289, 2010.
[25] K. Zhang and H. Jin, “An effective pattern based outlier detection approach for mixed attribute data,” in AI 2010: Advances in Artificial Intelligence, vol. 6464 of Lecture Notes in Computer Science, pp. 122–131, Springer, Berlin, Germany, 2010.
[26] Y.-C. Lu, F. Chen, Y. Wang, and C.-T. Lu, “Discovering anomalies on mixed-type data using a generalized Student-t based approach,” Expert Systems with Applications, vol. 28, no. 10, pp. 1–10, 2016.
[27] K. Do, T. Tran, D. Phung, and S. Venkatesh, “Outlier detection on mixed-type data: an energy-based approach,” in Advanced Data Mining and Applications, pp. 111–125, Springer International Publishing, Cham, Switzerland, 2016.
[28] H. Huang, K. Mehrotra, and C. K. Mohan, “Outlier detection using modified-ranks and other variants,” Electrical Engineering and Computer Science Technical Report 72, 2011, https://surface.syr.edu/eecs_techreports/72/.
[29] M. Radovanović, A. Nanopoulos, and M. Ivanović, “Reverse nearest neighbors in unsupervised distance-based outlier detection,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 5, pp. 1369–1382, 2015.
[30] G. Bhattacharya, K. Ghosh, and A. S. Chowdhury, “Outlier detection using neighborhood rank difference,” Pattern Recognition Letters, vol. 60, pp. 24–31, 2015.
[31] L. Zhang, Z. He, and D. Lei, “Shared nearest neighbors based outlier detection for biological sequences,” International Journal of Digital Content Technology and its Applications, vol. 6, no. 12, pp. 1–10, 2012.
[32] B. Tang and H. He, “A local density-based approach for outlier detection,” Neurocomputing, vol. 241, pp. 171–180, 2017.
[33] J. Ha, S. Seok, and J.-S. Lee, “A precise ranking method for outlier detection,” Information Sciences, vol. 324, pp. 88–107, 2015.
[34] C. C. Aggarwal, “High dimensional outlier detection: the subspace method,” in Outlier Analysis, pp. 135–167, Springer, New York, NY, USA, 2013.
[35] J. Zhang, S. Zhang, K. H. Chang, and X. Qin, “An outlier mining algorithm based on constrained concept lattice,” International Journal of Systems Science, vol. 45, no. 5, pp. 1170–1179, 2014.
[36] C. C. Aggarwal and S. Yu, An Effective and Efficient Algorithm for High-Dimensional Outlier Detection, Springer-Verlag, New York, NY, USA, 2005.
[37] E. Müller, M. Schiffer, and T. Seidl, “Statistical selection of relevant subspace projections for outlier ranking,” in Proceedings of the IEEE 27th International Conference on Data Engineering (ICDE 2011), pp. 434–445, Germany, April 2011.
[38] M. A. Stephens, “Use of the Kolmogorov-Smirnov, Cramér-von Mises and related statistics without extensive tables,” Journal of the Royal Statistical Society: Series B, vol. 32, no. 1, pp. 115–122, 1970.
[39] A. Zimek, R. J. Campello, and J. Sander, “Ensembles for unsupervised outlier detection: challenges and research questions,” ACM SIGKDD Explorations Newsletter, vol. 15, no. 1, pp. 11–22, 2014.
[40] C. C. Aggarwal and S. Sathe, “Theoretical foundations and algorithms for outlier ensembles,” ACM SIGKDD Explorations Newsletter, vol. 17, no. 1, pp. 24–47, 2015.
[41] H. V. Nguyen, H. H. Ang, and V. Gopalkrishnan, “Mining outliers with ensemble of heterogeneous detectors on random subspaces,” in Database Systems for Advanced Applications, vol. 5981, pp. 368–383, Springer, Berlin, Germany, 2010.
[42] A. Giacometti and A. Soulet, “Frequent pattern outlier detection without exhaustive mining,” in Advances in Knowledge Discovery and Data Mining, pp. 196–207, 2016.
[43] Z. He, X. Xu, Z. Huang, and S. Deng, “FP-outlier: frequent pattern based outlier detection,” Computer Science and Information Systems, vol. 2, no. 1, pp. 103–118, 2005.
[44] M. E. Otey, A. Ghoting, and S. Parthasarathy, “Fast distributed outlier detection in mixed-attribute data sets,” Data Mining and Knowledge Discovery, vol. 12, no. 2-3, pp. 203–228, 2006.
[45] S. Wu and S. Wang, “Information-theoretic outlier detection for large-scale categorical data,” IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 3, pp. 589–602, 2013.
[46] A. Koufakou, J. Secretan, and M. Georgiopoulos, “Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data,” Knowledge and Information Systems, vol. 29, no. 3, pp. 697–725, 2011.
[47] M. Bouguessa, “A practical outlier detection approach for mixed-attribute data,” Expert Systems with Applications, vol. 42, no. 22, pp. 8637–8649, 2015.
[48] N. Craswell, “Precision at n,” in Encyclopedia of Database Systems, L. Liu and M. T. Özsu, Eds., pp. 2127–2128, Springer, Berlin, Germany, 2009.
[49] N. Craswell, “R-precision,” in Encyclopedia of Database Systems, L. Liu and M. T. Özsu, Eds., p. 2453, Springer, Berlin, Germany, 2009.
[50] E. Zhang and Y. Zhang, “Average precision,” in Encyclopedia of Database Systems, L. Liu and M. T. Özsu, Eds., pp. 192–193, Springer, Berlin, Germany, 2009.
[51] E. Schubert, R. Wojdanowski, A. Zimek, and H.-P. Kriegel, “On evaluation of outlier rankings and outlier scores,” in Proceedings of the 12th SIAM International Conference on Data Mining (SDM 2012), pp. 1047–1058, USA, April 2012.
[52] X. Wang, X. L. Wang, Y. Ma, and D. M. Wilkes, “A fast MST-inspired kNN-based outlier detection method,” Information Systems, vol. 48, pp. 89–112, 2015.
[53] UCI Machine Learning Repository, 2007, http://archive.ics.uci.edu/ml/.
[54] ELKI, 2016, https://elki-project.github.io/releases/.
[55] G. O. Campos, A. Zimek, J. Sander et al., “On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study,” Data Mining and Knowledge Discovery, vol. 30, no. 4, pp. 891–927, 2016.
[56] ELVIRA Biomedical Data Set Repository, 2005, http://leo.ugr.es/elvira/DBCRepository/.
Copyright
Copyright © 2019 Xiaodan Xu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.