Mathematical Problems in Engineering

Volume 2017 (2017), Article ID 3956415, 7 pages

https://doi.org/10.1155/2017/3956415

## A Quick Negative Selection Algorithm for One-Class Classification in Big Data Era

^{1}College of Computer Science, Sichuan University, Chengdu 610065, China

^{2}College of Cybersecurity, Sichuan University, Chengdu 610065, China

^{3}Chongqing University of Technology, Chongqing 400054, China

^{4}Chengdu University of Information Technology, Chengdu 610225, China

Correspondence should be addressed to Wen Chen; wenchen@scu.edu.cn and Hanli Yang; yhl@cqut.edu.cn

Received 2 February 2017; Accepted 3 May 2017; Published 12 June 2017

Academic Editor: Zonghua Zhang

Copyright © 2017 Fangdong Zhu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The negative selection algorithm (NSA) is an important one-class classification model, but its low efficiency limits its use in the big data era. In this paper, we propose a new NSA based on Voronoi diagrams: VorNSA. VorNSA changes the detector generation scheme from the traditional “Random-Discard” model to a “Computing-Designated” model. Furthermore, we present an immune detection process for VorNSA under the Map/Reduce framework (VorNSA/MR) to further reduce the time spent on massive data in the testing stage. Theoretical analyses show that the time complexity of VorNSA decreases from the exponential level to the logarithmic level. Experiments compare the proposed technique with other NSAs and one-class classifiers. The results show that the time cost of VorNSA is reduced by 87.5% on average compared with traditional NSAs on the UCI skin dataset.

#### 1. Introduction

NSA was proposed by Forrest et al. in 1994 [1]; it generates immune detectors based on the “Random-Discard” model. Initially, a large number of immature detectors are randomly generated, and then those covering the self-areas are discarded. González et al. presented the real-valued negative selection algorithm (RNSA) in 2003 [2], in which the detectors and antigens are studied in real-valued space. Ji and Dasgupta proposed the V-Detector algorithm [3, 4], which turns the fixed-size detectors of RNSA into variable-sized detectors to enlarge the detection areas. In 2015, Cui et al. developed BIORV-NSA [5]. In their work, the self-radius can vary, and detectors that are recognized by other mature detectors are replaced by new ones to eliminate the “detection holes.”
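The “Random-Discard” model described above can be illustrated with a minimal Python sketch. It assumes a two-dimensional unit-square feature space, Euclidean distance, and fixed self and detector radii; the function name `random_discard_detectors` and all parameter values are hypothetical and are not taken from [1] or [2].

```python
import math
import random


def random_discard_detectors(self_samples, self_radius, detector_radius,
                             n_detectors, max_tries=100_000, seed=0):
    """Sample candidates at random and discard those that cover self samples."""
    rng = random.Random(seed)
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = (rng.random(), rng.random())
        # A candidate covers a self-area if the two balls overlap.
        covers_self = any(
            math.dist(candidate, s) < self_radius + detector_radius
            for s in self_samples
        )
        if not covers_self:
            detectors.append(candidate)  # candidate matures into a detector
    return detectors, tries


self_samples = [(0.5, 0.5), (0.55, 0.5), (0.5, 0.55)]
detectors, tries = random_discard_detectors(
    self_samples, self_radius=0.05, detector_radius=0.1, n_detectors=20)
print(len(detectors), "detectors kept after", tries, "random candidates")
```

Every discarded candidate is wasted work, and the waste grows as the self-areas occupy more of the feature space, which is the inefficiency the following paragraphs address.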

In the big data era, the low efficiency of NSA has become an important challenge that largely limits its applications. In this paper, we design a new NSA based on Voronoi diagrams, named VorNSA. In VorNSA, a restrained Voronoi diagram is first constructed from the whole training set. Then, two types of detectors are generated at specific locations of the Voronoi diagram. To accelerate the testing stage of NSA, in particular for large-scale datasets, a new testing strategy, VorNSA/MR (VorNSA with Map-Reduce), is proposed. Unlike the testing stage of classic NSAs, the data are divided into small groups whose labels are computed separately in the Map stage. The final labels are then obtained by merging and sorting in the Reduce stage.
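The Map and Reduce stages of the testing strategy can be sketched in plain Python as follows. This is only an illustration of the split, label, merge, and sort flow on a single machine, not the VorNSA/MR implementation; the function names, the chunk size, and the representation of a detector as a `(center, radius)` pair are assumptions.

```python
import math
from functools import reduce


def label_chunk(chunk, detectors):
    """Map step: label every (index, point) pair of one data group."""
    labeled = []
    for idx, point in chunk:
        is_nonself = any(math.dist(point, c) <= r for c, r in detectors)
        labeled.append((idx, "non-self" if is_nonself else "self"))
    return labeled


def merge_labels(left, right):
    """Reduce step: merge the partial label lists of two groups."""
    return left + right


def detect(points, detectors, chunk_size=2):
    indexed = list(enumerate(points))
    chunks = [indexed[i:i + chunk_size]
              for i in range(0, len(indexed), chunk_size)]
    # Each chunk is independent, so this map could run in parallel.
    mapped = map(lambda ch: label_chunk(ch, detectors), chunks)
    merged = reduce(merge_labels, mapped, [])
    # Sort by original index to restore the input order of the labels.
    return [label for _, label in sorted(merged)]


detectors = [((0.1, 0.1), 0.1), ((0.9, 0.9), 0.15)]
points = [(0.5, 0.5), (0.12, 0.1), (0.95, 0.9), (0.4, 0.6), (0.1, 0.3)]
print(detect(points, detectors))
```

Keeping the original index with each sample is what allows the groups to be processed in any order and still produce labels aligned with the input.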

The contributions of this work can be summarized as follows. (1) Based on Voronoi diagrams, the optimal positions of detectors are calculated directly rather than in a stochastic way. Therefore, the time wasted on generating excessive invalid detectors is avoided. (2) In the Map/Reduce framework, data are partitioned into several small parts by VorNSA/MR and can be processed in parallel to improve the efficiency of self/non-self discrimination.

The rest of the paper is organized as follows. In Section 2, we describe the definitions of VorNSA. The original contribution of the paper is presented in Section 3. Experimental results on synthetic datasets and real-world datasets are shown and discussed in Section 4. Conclusions appear in Section 5.

#### 2. Basic Definition of VorNSA

VorNSA is based on the Voronoi diagram, a structure from computational geometry used for nearest-neighbor search, which has been widely utilized in the fields of life sciences [6], material sciences [7], and mobile navigation [8]. The basic definitions are listed as follows.

*Definition 1 (site).* A site set is a set of distinct points in the feature space. In VorNSA, all $n$ training samples are defined as site points: $P = \{p_1, p_2, \ldots, p_n\}$.

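*Definition 2 (Voronoi diagram).* The Voronoi diagram $\mathrm{Vor}(P)$ divides the feature space into $n$ non-overlapping cells based on the given site set $P$, and each cell contains only one site in $P$, such that any point $q$ in the cell of site $p_i$ satisfies $\mathrm{dist}(q, p_i) < \mathrm{dist}(q, p_j)$ for all $p_j \in P$, $j \neq i$, where $\mathrm{dist}$ can be any distance metric.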
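Definition 2 amounts to a nearest-site rule: a point belongs to the cell of the site closest to it. A minimal sketch of that rule follows, assuming two-dimensional points; the helper name `cell_index` is hypothetical, and the distance metric is passed as a parameter because the definition allows any metric.

```python
import math


def manhattan(a, b):
    """An alternative metric, to show that the rule is metric-agnostic."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cell_index(q, sites, dist=math.dist):
    """Return the index of the site whose Voronoi cell contains q.

    Points that are equidistant from several sites lie on a cell boundary;
    for such ties this sketch simply returns the first nearest site.
    """
    return min(range(len(sites)), key=lambda i: dist(q, sites[i]))


sites = [(0.2, 0.2), (0.8, 0.2), (0.5, 0.8)]
for q in [(0.3, 0.3), (0.7, 0.1), (0.5, 0.6)]:
    print(q, "lies in the cell of site", cell_index(q, sites))
print(cell_index((0.7, 0.1), sites, dist=manhattan))
```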

*Definition 3 (cell).* All the cells form a mathematical partition of the feature space, and the cell corresponding to site $p_i$ is denoted by $V(p_i)$.

*Definition 4 (largest empty circle).* The largest circle with center $q$ that does not contain any site of the site set $P$ in its interior is denoted by $C_P(q)$.

Theorem 5. *A point $q$ is a vertex of the Voronoi diagram $\mathrm{Vor}(P)$ iff its largest empty circle $C_P(q)$ contains at least three sites on its boundary [9].*
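The largest empty circle of Definition 4 and the vertex test of Theorem 5 can be checked numerically. The sketch below assumes Euclidean distance in two dimensions; the function name `largest_empty_circle` and the tolerance `tol` used to decide whether a site lies on the boundary are assumptions made for illustration.

```python
import math


def largest_empty_circle(q, sites, tol=1e-9):
    """Return the radius of the largest empty circle centered at q,
    together with the sites that lie on its boundary."""
    radius = min(math.dist(q, s) for s in sites)
    on_boundary = [s for s in sites if abs(math.dist(q, s) - radius) <= tol]
    return radius, on_boundary


sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
for q in [(0.5, 0.5), (0.5, 0.0), (0.1, 0.1)]:
    radius, boundary = largest_empty_circle(q, sites)
    # By Theorem 5, three or more boundary sites mark a Voronoi vertex.
    print(q, round(radius, 4), len(boundary), len(boundary) >= 3)
```

In this example only the first query point has three sites on the boundary of its circle, so only that point is a vertex of the diagram.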

*Definition 6 (I-detector).* An I-detector is $d_{\mathrm{I}} = \langle p, r \rangle$, where $p$ is the detector position in the feature space and $r$ is the detector radius, such that $p$ corresponds to one vertex of the Voronoi diagram $\mathrm{Vor}(P)$.

Theorem 7. *Given that $p$ is the center of an I-detector, there are at least three sites located on the boundary of its largest empty circle $C_P(p)$, and these sites are the nearest neighbors of each other.*

*Proof.* According to Definitions 2 and 6, it can be inferred that the center $p$ of the I-detector is an intersection of three or more cells. Suppose that $p$ is the intersection of three cells $V(p_i)$, $V(p_j)$, $V(p_k)$, whose sites are $p_i$, $p_j$, $p_k$. According to Definition 4 and Theorem 5, there is a largest empty circle $C_P(p)$ that does not contain any site of $P$, and $p_i$, $p_j$, $p_k$ are located on its boundary. So $p_i$, $p_j$, and $p_k$ are the nearest sites of $p$ among the site set $P$.
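Theorem 5 suggests a direct, if inefficient, way to enumerate I-detector centers in two dimensions: the circumcenter of three sites is a Voronoi vertex exactly when the circumcircle contains no other site. The sketch below uses this brute-force test over all triples instead of a real Voronoi construction, so it only illustrates the definition and is not the generation procedure of VorNSA. The function names are hypothetical, and taking the detector radius equal to the largest-empty-circle radius is an assumption of this sketch.

```python
import math
from itertools import combinations


def circumcenter(a, b, c):
    """Center of the circle through a, b, c, or None if they are collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy)


def i_detectors(sites, tol=1e-9):
    """Brute-force Voronoi vertices: circumcenters with an empty circle.

    Four or more cocircular sites would yield duplicate centers; this
    sketch does not remove such duplicates.
    """
    detectors = []
    for a, b, c in combinations(sites, 3):
        p = circumcenter(a, b, c)
        if p is None:
            continue
        r = math.dist(p, a)
        # Keep p only if no site lies strictly inside the circle.
        if all(math.dist(p, s) >= r - tol for s in sites):
            detectors.append((p, r))
    return detectors


sites = [(0.2, 0.2), (0.8, 0.2), (0.5, 0.8), (0.5, 0.4)]
for center, radius in i_detectors(sites):
    print(center, radius)
```

Each center is computed from the sites rather than sampled, which is the sense in which detector positions are designated by computation.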

Theorem 8. *The bisector between sites $p_i$ and $p_j$ defines an edge of the Voronoi diagram $\mathrm{Vor}(P)$ iff there is a point $q$ on the bisector such that its largest empty circle $C_P(q)$ contains both $p_i$ and $p_j$ on its boundary but no other site [9].*

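*Definition 9 (II-detector).* An II-detector is $d_{\mathrm{II}} = \langle p, r \rangle$, where $p$ is the detector position in the feature space and $r$ is the detector radius, such that $p$ corresponds to the junction of the edges of the Voronoi diagram $\mathrm{Vor}(P)$ and the unit hypercube $[0,1]^{m}$, with $m$ the dimension of the feature space.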

Theorem 10. *Given that $p$ is the center of an II-detector, there are two sites located on the boundary of its largest empty circle $C_P(p)$, and these sites are the nearest neighbors of each other.*

*Proof.* According to Definitions 2 and 9, it can be inferred that the center $p$ of the II-detector is an intersection of two cells. Suppose that $p$ is the intersection of two cells $V(p_i)$, $V(p_j)$, whose sites are $p_i$, $p_j$. According to Definition 4 and Theorem 8, there is a largest empty circle $C_P(p)$ that does not contain any site of $P$, and $p_i$, $p_j$ are located on its boundary. So $p_i$ and $p_j$ are the nearest sites of $p$ among the site set $P$.
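Definition 9 and Theorem 8 can be illustrated in the same brute-force style for the two-dimensional unit square. For every pair of sites, the sketch intersects their perpendicular bisector with the four sides of the square and keeps a crossing point only if its largest empty circle is empty, which means that the point lies on a Voronoi edge. The function name `ii_detectors`, the rounding used to merge duplicate corner points, and the choice of the largest-empty-circle radius as the detector radius are assumptions of this sketch.

```python
import math
from itertools import combinations


def ii_detectors(sites, tol=1e-9):
    """Brute-force crossings of Voronoi edges with the unit-square boundary."""
    found = {}
    for a, b in combinations(sites, 2):
        # Bisector of a and b: nx * x + ny * y = k.
        nx, ny = b[0] - a[0], b[1] - a[1]
        k = (b[0] ** 2 + b[1] ** 2 - a[0] ** 2 - a[1] ** 2) / 2.0
        candidates = []
        for side in (0.0, 1.0):
            if abs(ny) > tol:  # crossing with the line x = side
                candidates.append((side, (k - nx * side) / ny))
            if abs(nx) > tol:  # crossing with the line y = side
                candidates.append(((k - ny * side) / nx, side))
        for q in candidates:
            if not all(-tol <= v <= 1.0 + tol for v in q):
                continue  # crossing lies outside the unit square
            r = math.dist(q, a)
            # q is on a Voronoi edge only if no site is closer than a and b.
            if all(math.dist(q, s) >= r - tol for s in sites):
                found[(round(q[0], 9), round(q[1], 9))] = r
    return sorted(found.items())


sites = [(0.2, 0.2), (0.8, 0.2), (0.5, 0.8), (0.5, 0.4)]
for center, radius in ii_detectors(sites):
    print(center, radius)
```

Most bisector crossings are rejected by the emptiness test because only the portion of a bisector that forms a Voronoi edge qualifies.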

As an example in Figure 1, there are 10 sites in the site set $P$, and the space is divided into 10 cells by the Voronoi diagram $\mathrm{Vor}(P)$. The green circle is the largest empty circle centered at a Voronoi vertex, and three sites are located on its boundary; its center is the center of an I-detector. The red circle and the purple circle are largest empty circles with two sites each on their boundaries; their centers are the centers of II-detectors.