#### Abstract

Secure localization under different forms of attack has become an essential task in wireless sensor networks. Despite significant research efforts in detecting malicious nodes, the problem of recognizing the type of a localization attack has not yet been well addressed. Motivated by this concern, we propose a novel exchange-based attack classification algorithm, realized by a distributed expectation maximization extractor integrated with the PECPR-MKSVM classifier. First, mixed distribution features based on probabilistic modeling are extracted using a distributed expectation maximization algorithm. After feature extraction, by introducing theory from support vector machines, an extended contractive Peaceman-Rachford splitting method is derived to build a distributed classifier that diffuses the iterative calculation among neighboring sensors. To verify the efficiency of the distributed recognition scheme, four groups of experiments were carried out under various conditions. The average success rate of the proposed classification algorithm for external attacks is excellent, reaching about 93.9% in some cases. These results demonstrate that the proposed algorithm produces a much higher recognition rate and remains robust and efficient even in heavily malicious scenarios.

#### 1. Introduction

The location information of sensor nodes plays a critical role in numerous applications of wireless sensor networks (WSNs), such as environment monitoring, target tracking, and automatic surveillance. It also helps fundamental techniques in sensor networks (e.g., geographical routing protocols and topology control) to know where messages are located. Driven by those demands, earlier research efforts have produced many localization schemes, most of which assume that the sensors are deployed in a benign scenario. But when sensor nodes are deployed in malicious environments, they are prone to different forms of threats and risks. A simple malicious attack can disturb accurate position estimation and even make the entire network function improperly [1]. Existing attacks on localization can generally be divided into internal and external attacks. Internal attackers are usually compromised nodes whose encryption keys have been extracted, which can be prevented by using advanced cryptographic techniques. An external attack is launched by one or more malicious nodes to distort information without the system's authorization, which means that traditional security mechanisms like cryptography are of limited use against this type of attack. In this paper, we mainly analyze the recognition of external attacks on the localization procedure.

In recent years, designing secure localization schemes that provide valid location information resistant to external attacks has received much research attention [2–8]. Most of these secure localization mechanisms fall broadly into two categories: cheating-node detection and robust localization algorithms. The former, such as [3, 5, 7], verify location-related parameters like distance or time during the positioning process to detect inconsistency and then eliminate abnormal nodes, while the latter [2, 4, 6, 8] rely on designing robust localization schemes that tolerate attacks rather than detecting them.

Most existing work on WSN localization security focuses on either achieving a high detection ratio under different types of attacks or developing robust positioning methods. Unfortunately, none of these techniques can explicitly differentiate between the attacks. This leaves the network defense in a passive position and hampers the prevention of future repeated attacks. If the network only detects localization attacks without classifying and analyzing their type, two consequences follow. First, it becomes inconvenient for the network to restore location-related information. Second, the network can hardly provide further information services and evidence when processing security events. Only after alert information is collected and analyzed can we determine the dangerous region where attacks frequently take place and then design a targeted localization scheme against a certain threat. Therefore, attack classification in localization is not only the premise and foundation of threat analysis but also a crucial component of network security situation awareness. An attack recognition algorithm should be executed as a second line of protection against attacks before the location information is used by other applications.

In this work, we propose a localization attack classification method based on a distributed expectation maximization algorithm followed by a support vector machine called PECPR-MKSVM. The classification mechanism consists of two phases: the feature extraction phase and the classification phase. The techniques developed in our solution offer the advantage of classifying various kinds of attacks. More specifically, our approach makes the following contributions.

1. To extract more efficient attack features, an Exponential-Gaussian (EG) mixture distribution is first modeled by investigating the common properties of the initial features based on their probability distributions. The initial features are composed of distance and topology-related measurements.
2. A distributed version of the expectation maximization (EM) algorithm, which exchanges information with neighboring sensors, is implemented for density estimation and feature extraction; one term for time-dependent information averaging is combined with another term for iterative information propagation.
3. In order to recognize multiple attacks more accurately and adapt to the distributed characteristics of sensor networks, we design an exchange-based classifier called proximal extension contractive Peaceman-Rachford splitting-multiple kernel support vector machines (PECPR-MKSVM).
4. To verify the effectiveness of our distributed recognition approach, comprehensively designed experiments are conducted by testing the attack dataset under different conditions. Compared with other similar schemes, the results clearly show that the distributed classification algorithm achieves better recognition performance and stronger robustness, with very competitive runtime.

The remainder of the paper is structured as follows. Related work on secure localization and recognition algorithms is reviewed in Section 2. In Section 3, we describe the attack assumptions and model the initial features with a joint Exponential-Gaussian distribution, while Section 4 presents the distributed EM-based feature extraction method employing the distributed averaging approach. In Section 5, by improving the contractive Peaceman-Rachford splitting method, a novel distributed classifier, PECPR-MKSVM, is presented. In Section 6 we verify the performance of the classification algorithm by means of extensive experiments. Finally, Section 7 concludes the paper.

#### 2. Related Work

To investigate schemes for classifying localization attacks in WSNs, we first provide a brief survey of secure localization mechanisms. We then summarize research on the two essential components of the proposed method, namely, the EM algorithm for feature extraction and support vector machines for classification.

##### 2.1. Secure Localization Mechanism

In prior work on secure localization, one theme is to discover and eliminate suspicious nodes. In [9], the authors proposed a beacon-based secure localization method that uses a minimum mean square estimation approach to filter out suspicious nodes, implemented by observing the inconsistency in location references between malicious beacon nodes and benign ones [10]. Similarly, Du et al. [11] created a general scheme that uses network deployment knowledge to detect localization anomalies when the level of inconsistency between expected and derived positions exceeds a certain threshold. Recently, another detection-based secure localization algorithm was proposed by Han et al., which has two steps: the anchor nodes first identify a node as suspicious if it sends abnormal reference information, and a mesh generation method then separates the suspicious nodes [12].

The other theme is attack-tolerant localization in the presence of malicious adversaries and large measurement inaccuracies. Li et al. [13] employ an improved LM approach to achieve secure localization in scenarios where the fraction of malicious nodes is less than 50%. Based on candidate location identification, a similar method using the random sample consensus (RANSAC) algorithm was proposed in [14]; it uses picked subsets of sensors and chooses as its solution the value that minimizes the median of the residuals. Alternatively, using a Taylor-series least squares scheme with different weightings, Yu et al. developed a two-stage secure localization method that applies a beta distribution function to tolerate the presence of malicious beacons [15]. Other approaches formulate secure location estimation as a global optimization problem. For instance, by taking advantage of improved least median squares, a robust statistical method was developed to make positioning attack-tolerant. In [16], Doherty et al. designed a feasible secure localization methodology using convex optimization based on pairwise angles and connectivity between nodes. Bao et al. extended this work from static to mobile scenarios with the help of a game-based strategy [17].

To the best of our knowledge, the problem of localization attack recognition for sensor networks, which is our focus here, has not been well studied.

##### 2.2. EM Algorithm for Feature Extraction

Unsupervised feature selection/extraction techniques are generally classified into three categories: wrappers, filters, and integrated-learning approaches. Several integrated-learning feature extraction algorithms such as EM have been developed in various fields. In [18], features were extracted from a continuous-valued dataset using a basic integrated-learning strategy. In another feature extraction algorithm, feature saliency is first treated as feature relevance, and pruning behavior is then determined by EM optimization. Moreover, a double-loop EM algorithm was applied to medical detection tasks such as epileptic seizure detection so that supervised learning could fit well with the mixture-of-experts network structure [19]. In [20], the EM algorithm was applied to image feature extraction to identify the parameters of a generalized Gaussian mixture model; a Kullback divergence-based similarity measure was subsequently presented and analyzed. However, the fact that the class of a texture distribution is influenced by its neighborhood was neglected. To address this information loss, a shuffled frog-leaping method was added to the EM algorithm to enhance crack image segmentation [21], where, according to an evaluation threshold, the neighborhood of each pixel was classified into one of three types. Because the threshold value is selected empirically, it may lead to inaccurate segmentation.

##### 2.3. Support Vector Machines (SVM) for Classifier

SVM, the most popular branch of machine learning theory for classification and regression problems, originated from research in statistical learning theory. The introduction of the kernel trick then bred a new group of techniques for nonlinear problems with high-dimensional or small-sample data [22, 23]. Based on MK-SVM, Yeh et al. proposed a new composite multiple kernel in the form of a linear weighted combination, combining multiple kernels with SVMs to design a counterfeit banknote detection system [24]. Although these centralized learning approaches have performed well in various scenarios, they also increase memory and computational resource consumption, which is critical for energy-constrained WSNs. Therefore, some new algorithms on the topic of distributed SVM have recently been presented. In [25], Forero et al. proposed a distributed SVM scheme that combines the alternating direction method of multipliers (ADMM) with consensus-based SVM to reduce the training time. This algorithm enhances prediction performance with the help of ADMM optimization. However, the collaborative pattern may run short of local processing resources as the number of kernels increases.

#### 3. Network Assumptions and Statistic Based Feature Model

We consider three classes of nodes distributed randomly in the sensing area: sensors, anchors, and malicious nodes. The random network topology is modeled as an Erdős-Rényi (ER) random graph $G = (V, E)$, where $V$ symbolizes the node set and $E$ indicates the edge set. The node set consists of sensors and anchors, where the $i$th sensor is expressed by $s_i$ and the $k$th anchor by $a_k$. The malicious nodes are included in the sensor set. We define the distance between sensors $s_i$ and $s_j$ as $d_{ij}$ and the distance from sensor $s_i$ to anchor $a_k$ as $r_{ik}$. The total number of links through a sensor is calculated along shortest paths. The sensors are in charge of data gathering and are not aware of their own coordinates. An anchor is a node that knows and broadcasts its location reference in advance, being equipped with localization hardware such as GPS. The malicious nodes exist singly or in pairs to launch various attacks. We assume that all sensors have the same circular communication range, while the communication range of the malicious nodes is unlimited. The distance between two sensors is estimated from the received signal strength, whose background noise is Gaussian distributed. Likewise, the distance between a sensor and an anchor is provided by the anchor's measurement. We also assume that each sensor has its own ID and broadcasts it together with the distances to its neighbors, passively collects adjacent sensors' broadcasts, and then builds a list of IDs and positions, called the sensor's neighborhood observation. When the sensors have received such packets from their neighbors, they forward the information in a multihop fashion to the nearest sensor node with the most energy, and that node carries out the feature extraction, recognition, and related calculations in the WSN.

The WSN is assumed to be deployed in an adversarial environment. The adversary launches only external attacks to disrupt the localization procedure; that is, it carries out malicious behavior without the correct cryptographic keys. Moreover, the number of malicious nodes is small compared with the number of benign nodes in a local area. The attacks of the malicious nodes fall into three categories: wormhole, replay, and interference attacks [26]. A wormhole attack eavesdrops on packets of location references at one position, creates a tunnel, and sends them to sensors that are far apart, thereby causing inaccurate location estimation [27]. As illustrated in Figure 1, a sensor under normal conditions captures only the beacon signal of its nearby anchor. When a wormhole attack is launched, the malicious node copies the messages of a remote anchor and sensor, tunnels them through a bidirectional link, and replays them at a distant location. Eventually, the victim node determines its location based on the positions of both anchors and may also mistake the remote sensor for a neighbor. In an interference attack, the hostile sensor may act as an obstacle between signal sender and receiver to distort the signal measurement or the time of arrival used for ranging. For example, if a signal strength based localization process suffers a range enlargement attack, the attacker attenuates the node's transmission power. A replay attack is another common attack type, which is more likely to appear when the adversary's energy and computing resources are limited. The malicious node captures a location message from an anchor and later retransmits an incorrect location reference to the receiving sensor, so that the position calculation in the sensors is frequently affected by the invalid information.
In addition to the above characteristics, an adversarial node launching a wormhole or replay attack in a practical environment has the same data communication and processing abilities as a normal sensor, which means that such a malicious node is capable of overhearing other types of packets and then modifying and rebroadcasting them [28, 29]. Moreover, in order to acquire more accurate attack-related information, it is inappropriate to use encryption techniques to eliminate all adversaries in advance. Furthermore, these attacks might be launched on an irregular schedule during the whole classification process. Since this study is concerned mainly with the recognition of localization attacks, we require that the proportion and extent of modification of other packet types do not exceed those of the distance-related information.
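As a rough illustration of how these three external attacks distort ranging, the following sketch perturbs a noisy RSSI-style distance estimate. The attack magnitudes and noise model here are hypothetical choices for illustration, not values taken from the paper's experiments:

```python
import math
import random

def measured_distance(true_dist, noise_sigma=0.1, attack=None, rng=None):
    """Return a noisy RSSI-style distance estimate, optionally distorted
    by an external attack (attack magnitudes are illustrative only)."""
    rng = rng or random.Random(0)
    # log-normal multiplicative noise roughly mimics RSSI ranging error
    d = true_dist * math.exp(rng.gauss(0.0, noise_sigma))
    if attack == "interference":
        d *= 1.5            # range enlargement: attenuated transmission power
    elif attack == "wormhole":
        d = min(d, 10.0)    # tunnelled packet: a distant node appears close
    elif attack == "replay":
        d += 5.0            # stale reference replayed with extra delay
    return d
```

With the noise turned off, the sketch makes the qualitative effect of each attack explicit: interference enlarges ranges, a wormhole collapses long ranges, and replay adds a constant bias.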

Because no single variable directly characterizes an external localization attack, it is necessary to build an original feature set. From the above description of external attacks, we find that an attack interacts directly or indirectly with the distances between nodes. The sensor-sensor and sensor-anchor distances, which are closely associated with whether a node is under attack, can therefore be considered the main initial features for recognition, and the distance feature vector VD for each sensor is formed by collecting these distance values. While the distance values describe some information about external attacks, this single type of feature is still insufficient to classify the attacks. To handle this problem, complex network theory is introduced to express the feature information more comprehensively. Because a WSN comprises large numbers of sensors, it constitutes a complex network structure, and its topological properties vary with fluctuations in sensor locations and distances. This implies that those properties can reveal the impact of a localization attack from a complex network perspective. A number of indexes have been developed to measure behavior in complex networks, such as degree and clustering coefficient, which supply a framework reflecting various network features. In this work, the indexes considered are degree, clustering coefficient, betweenness centrality (normalized), and coreness, and the topological feature vector VT for each sensor collects these four values.
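The topological indexes above can be computed directly from a neighborhood adjacency structure. A minimal pure-Python sketch (adjacency given as a dict mapping each node to its set of neighbors; our own helper names) for degree, clustering coefficient, and coreness:

```python
def degree(adj, v):
    """Number of neighbors of v; adj maps each node to its neighbor set."""
    return len(adj[v])

def clustering(adj, v):
    """Fraction of neighbor pairs of v that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def coreness(adj):
    """k-core decomposition by iteratively peeling minimum-degree nodes."""
    deg = {v: len(adj[v]) for v in adj}
    alive = set(adj)
    core, k = {}, 0
    while alive:
        k = max(k, min(deg[v] for v in alive))
        while True:
            peel = [v for v in alive if deg[v] <= k]
            if not peel:
                break
            for v in peel:
                core[v] = k
                alive.discard(v)
                for u in adj[v]:
                    if u in alive:
                        deg[u] -= 1
    return core
```

For example, in a triangle with one pendant node, the triangle members receive coreness 2 and the pendant coreness 1. Normalized betweenness centrality is treated separately below.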

It seems plausible that the values of the original features will vary when sensors are under attack. However, we found that the changes induced in the original features by some types of localization attack differ only slightly from each other, so the attacks cannot be separated by a simple threshold, and the confusion grows under multiple simultaneous attacks. Thus an effective feature extraction method needs to be explored. We note that the above original features can be described by statistical modeling: if a distribution model is constructed to represent the original features obtained by each sensor, the different attack types can be described more accurately by the extracted model parameters.

Each element $d$ of the distance feature vector VD can be modeled by a Gaussian distribution with mean $\mu$ and variance $\sigma^2$ according to [30], which analyzes it from the viewpoint of measurement error:

$$p(d) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(d-\mu)^2}{2\sigma^2}\right).$$

Moreover, the distance feature vector also involves shortest path lengths, which possess a complex network property. In [31, 32], the length of the shortest path is shown to behave as a negative exponentially distributed variable with rate parameter $\lambda$:

$$p(l) = \lambda e^{-\lambda l}, \quad l \ge 0.$$

Thus, in order to capture more detailed properties, the distance vector is modeled as a mixture of these two distributions.

For the topology-related features, the probability distributions are further investigated to model the irregular deviations between normal and attack scenarios. Because of limited space, two representative parameters that form the mixture distribution were chosen to analyze the impact of external attacks.

The first parameter analyzed is the node degree, expressed as the total number of neighbors connected to a given sensor. The degree distribution is defined by a probability $P(k)$, the proportion of sensors with the same number $k$ of connections. An Erdős-Rényi random WSN graph has a vertex degree following the Poisson distribution [33]

$$P(k) = \frac{\langle k \rangle^{k}}{k!}\, e^{-\langle k \rangle}.$$

In this formula, $\langle k \rangle$ indicates the expected node degree. Meanwhile, we observe that the mixture distribution of the distance feature is a continuous probability distribution, while the Poisson distribution of the degree feature is discrete; it is difficult to construct a unified model from two such different variables. Moreover, using only the single parameter of the Poisson distribution may make it hard to distinguish between multiple attacks. It is well known from the central limit theorem [34] that the Poisson probability mass function is excellently approximated by a Gaussian distribution when the Poisson mean is high. The mean node degree of the Poisson distribution under normal conditions, calculated by maximum likelihood estimation, is equal to 14.72, which is not especially high. Nevertheless, for the sake of reduced computational complexity and a feasible mixture model, the Gaussian probability density function is still applied to approximate the Poisson distribution.
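The quality of this approximation can be checked numerically. A small sketch comparing the Poisson pmf with the density of its CLT-style approximation $N(\langle k \rangle, \langle k \rangle)$ near the mode, using the mean degree 14.72 quoted above:

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability mass function P(k) = lam^k e^{-lam} / k!."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def gauss_pdf(x, mu, var):
    """Gaussian density with mean mu and variance var."""
    return math.exp(-(x - mu)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

lam = 14.72          # MLE mean degree under normal conditions (from the text)
k = round(lam)       # evaluate near the mode
# For Poisson(lam), the approximating Gaussian is N(lam, lam).
```

Near the mode, the two values agree to within a few percent for a mean of this size, which supports the pragmatic choice made in the text.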

Figure 2 exhibits the degree distribution of the WSN and its variation under different external attacks, together with the curves of the probability density function (pdf) of the Gaussian approximation. As seen in Figure 2(a), for the unassailed scene, the degree distribution approximately agrees with a Gaussian distribution. However, the distances measured by RSSI in the real world are affected by multipath fading or by data modification by attackers, so the probability of each degree fluctuates. Figures 2(b)–2(d) compare the variation of the degree distribution and the pdf of its Gaussian approximation under the three types of external attack. As depicted in Figure 2(b), a probability peak emerges around degree 14, while another weak probability peak appears around degree 35, and the mean of the Gaussian approximation decreases to 14.16. The reason for this change is that the wormhole tunnel makes some far-away nodes be incorrectly identified as neighbors. Under an interference attack, the maximum probability in the degree distribution is lower than in Figure 2(a), the proportion of sensors with low degree values increases, and the shape of the fitted Gaussian pdf spreads wider than in Figure 2(a). As shown in Figure 2(d), the degree distribution varies similarly to Figure 2(c); we also note that the variance of the approximated Gaussian distribution there has the highest value, which may correspond to the fact that relayed packets from the malicious node add nonexistent connections. These results demonstrate that the parameters of the Gaussian approximation help to differentiate the external attacks. By a similar analysis, the clustering coefficient feature is also fitted by a Gaussian distribution.

**Figure 2:** (a) Unassailed; (b) Wormhole attack; (c) Interference attack; (d) Replay attack.

The second property analyzed is normalized betweenness centrality. The betweenness centrality [35] of a sensor $v$, denoted by $BC(v)$, examines the potential of the sensor to control connections among other sensors by evaluating the summed ratio of shortest paths passing through $v$. The betweenness centrality of sensor $v$ is therefore formulated as

$$BC(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}},$$

where $\sigma_{st}$ represents the total number of shortest paths from sensor $s$ to sensor $t$ and $\sigma_{st}(v)$ represents the number of those paths that pass through sensor $v$. For convenience, the normalized form is obtained by

$$BC'(v) = \frac{BC(v)}{(n-1)(n-2)},$$

where $n$ is the number of nodes.
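Betweenness centrality over all sources can be computed efficiently with Brandes' algorithm; a compact sketch for an unweighted, undirected graph given as a dict of neighbor sets:

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm on an unweighted graph {node: set(neighbors)}.
    Returns values normalized by (n-1)(n-2), counting ordered pairs."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}; sigma[s] = 1   # shortest-path counts
        dist = {v: -1 for v in adj}; dist[s] = 0
        preds = {v: [] for v in adj}
        order, q = [], deque([s])
        while q:                                     # BFS from s
            v = q.popleft(); order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1; q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]; preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):                    # dependency accumulation
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    norm = (n - 1) * (n - 2)
    return {v: b / norm for v, b in bc.items()} if norm > 0 else bc
```

On a three-node path, the middle node attains the maximum normalized value 1.0 and the endpoints 0, matching the definition above.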

Figure 3 plots the normalized betweenness centrality distribution and the pdf of its exponential approximation for the same scenarios as the node degree. The normalized betweenness centrality distributions in all scenarios peak at the initial part and then decrease monotonically. Previous work found that normalized betweenness centrality tends to obey a power-law distribution [36]; however, the descent of the distribution in each of our scenarios is not that sharp. Furthermore, in order to build the mixture model, the remaining features should also be expressed as continuous functions. Based on these considerations, the distribution of the normalized betweenness centrality is instead approximated by a negative exponential distribution of the form

$$p(x) = \lambda e^{-\lambda x},$$

where $\lambda$ is a rate parameter. In addition, it can be observed in Figure 3(b) that the proportion of nodes with high betweenness values increases by a small amount, because the malicious nodes indirectly enhance the communication capability of their neighbors, and under the negative exponential approximation the decay of the pdf is more rapid than in Figure 3(a). Compared to Figure 3(b), the variation of the distribution in the replay attack case (Figure 3(d)) is analogous to the wormhole case but decreases more mildly, with the rate parameter of its exponential approximation reaching 0.007020. Referring to Figure 3(c), the distribution varies only slightly, so the change in the approximation parameter is not significant. In general, the introduction of this new parameter contributes to distinguishing certain attacks and to improving the classification performance. The last topological feature, coreness, has distribution characteristics similar to normalized betweenness centrality, and its distribution is accordingly also approximated by a negative exponential function.
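Fitting the negative exponential approximation to a feature sample reduces to a one-line maximum likelihood estimate, the reciprocal of the sample mean:

```python
import math

def fit_exponential_rate(samples):
    """MLE of the rate parameter lambda of a negative exponential
    distribution: the reciprocal of the sample mean."""
    return len(samples) / sum(samples)

def exp_pdf(x, lam):
    """Negative exponential density lam * e^{-lam x} for x >= 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0
```

The same estimator is applied to both the normalized betweenness centrality and the coreness samples.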

**(a) Unassailed**

**(b) Wormhole attack**

**(c) Interference attack**

**(d) Replay attack**

To demonstrate the capability of the distribution approximation, the mean square error (MSE) of the approximation curve relative to the probabilities of the observed data is calculated for all topological features under different attack scenarios, and the results are listed in Table 1. In general, the MSE stays at the same order of magnitude even under attack, except for some values under the wormhole attack. Moreover, the normalized betweenness centrality feature yields the smallest MSE among the topological features, which suggests that the exponential distribution is its best approximation. From the viewpoint of fitting accuracy, however, one cannot claim that these approximate distributions fit the feature data precisely, even for normalized betweenness centrality. The reason is that, in the simulated WSN localization, the distance-related messages and other data are influenced by additional factors such as channel fading, internode interference, and packet modification by malicious sensors. These elements bias the true measurements and further increase the approximation error, which also affects the recognition performance. It is therefore indispensable to integrate other approaches, such as a classifier, to strengthen the recognition ability in later processing.
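The MSE figures of the kind reported in Table 1 compare the fitted density values with the empirical probabilities; the computation itself is simply:

```python
def mse(observed, fitted):
    """Mean square error between observed probabilities and fitted
    pdf values evaluated at the same points."""
    return sum((o - f) ** 2 for o, f in zip(observed, fitted)) / len(observed)
```

A perfect fit yields 0, and a uniform offset of 0.1 on one of two points yields 0.005, illustrating the scale of the tabulated values.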

For the set of features collected by one sensor, with Gaussian and negative exponential distributions, respectively, the probability density function of the mixed features can be divided into two parts: one part, associated with node degree and clustering coefficient, built from Gaussian components $p_G$, and the other, associated with normalized betweenness centrality and coreness, built from exponential components $p_E$. Combining (1) with (7), the probability density of a feature vector observation $\mathbf{x}$ is modeled in the following manner [37]:

$$p(\mathbf{x} \mid \Theta) = \sum_{m} \alpha_m\, p_G\!\left(x_m; \mu_m, \sigma_m^2\right) + \sum_{n} \beta_n\, p_E\!\left(x_n; \lambda_n\right),$$

where $p(\mathbf{x} \mid \Theta)$ represents the mixture probability density of a feature vector and the vector of distribution parameters to be estimated is $\Theta = \{\alpha_m, \beta_n, \mu_m, \sigma_m^2, \lambda_n\}$. The mixing weights are represented by $\alpha_m$ and $\beta_n$, which satisfy $\sum_m \alpha_m + \sum_n \beta_n = 1$. The means and variances of the Gaussian components are $\mu_m$ and $\sigma_m^2$, corresponding to $p_G$, and $\lambda_n$ is the rate parameter corresponding to $p_E$.
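A sketch of evaluating such a combined density, with component parameters passed explicitly (the 'g'/'e' tags and the argument layout are our own illustration, not the paper's notation):

```python
import math

def eg_mixture_pdf(x, weights, params):
    """Density of a mixture of Gaussian and negative exponential components.
    params[i] is ('g', mu, sigma) or ('e', lam); weights should sum to 1."""
    total = 0.0
    for w, p in zip(weights, params):
        if p[0] == 'g':
            _, mu, sigma = p
            total += w * math.exp(-(x - mu)**2 / (2 * sigma**2)) \
                     / (sigma * math.sqrt(2 * math.pi))
        else:
            _, lam = p
            total += w * (lam * math.exp(-lam * x) if x >= 0 else 0.0)
    return total
```

For instance, an equal-weight mixture of a standard Gaussian and a unit-rate exponential evaluated at 0 gives 0.5/√(2π) + 0.5.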

#### 4. Distributed Feature Extractor Design

In order to explore the statistical properties embedded in the mixture density function and to describe the attack behavior more completely, the EM algorithm can be adopted to calculate the unknown model parameters [38]. However, in hostile environments, it cannot be confirmed whether the sensors performing the computation and recognition are themselves malicious. If only centralized computation is used, the data may be ruined or viciously modified by the adversary, and the correctness of the extracted features and of the classifier would further decrease. Consequently, security issues in the computation should be taken into consideration. Recent progress in distributed computing offers a way forward [39]: the consistent intermediate variables updated at each incremental step can be conveyed to several adjacent nodes, and the records kept on those nodes help to detect and isolate an attacker. Building on this merit, a distributed feature extraction scheme based on information exchange is presented, together with a verification policy.

The exchange-based distributed EM method we propose calculates and updates the parameters of the classic EM method using neighbors' information, based on the idea of the distributed averaging approach in [40]. We use $\mathcal{N}_i$ to denote the set of nodes that communicate with sensor $s_i$; that is, there exists an edge between $s_i$ and every node in $\mathcal{N}_i$. A distributed linear iteration for a value $x$ shared among sensor $s_i$ and its neighbors can be described as

$$x_i(t+1) = W_{ii}\, x_i(t) + \sum_{j \in \mathcal{N}_i} W_{ij}\, x_j(t),$$

where $t$ is a time variable and $W$ denotes a weight matrix with $W_{ij} \neq 0$ only if $j \in \mathcal{N}_i$ or $j = i$. In order to solve problem (10) asymptotically by average consensus, $W$ should be symmetric and must satisfy the following necessary and sufficient constraints [40]:

$$W\mathbf{1} = \mathbf{1}, \qquad \mathbf{1}^{\mathsf{T}} W = \mathbf{1}^{\mathsf{T}}, \qquad \rho\!\left(W - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\mathsf{T}}\right) < 1,$$

where $\mathbf{1}$ represents the vector whose elements are all equal to 1, $\rho(\cdot)$ is the spectral radius of the given matrix, and $\tfrac{1}{n}\mathbf{1}\mathbf{1}^{\mathsf{T}}$ denotes the averaging matrix.
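One standard construction satisfying these constraints on a connected graph is the Metropolis weight rule; a minimal sketch with a single synchronous consensus step (the helper names are ours):

```python
def metropolis_weights(adj):
    """Symmetric Metropolis weights: W[i][j] = 1/(1 + max(deg_i, deg_j))
    for neighbors, with the self-weight absorbing the remainder so that
    each row sums to 1."""
    W = {i: {} for i in adj}
    for i in adj:
        for j in adj[i]:
            W[i][j] = 1.0 / (1 + max(len(adj[i]), len(adj[j])))
        W[i][i] = 1.0 - sum(W[i][j] for j in adj[i])
    return W

def consensus_step(x, W, adj):
    """One synchronous iteration x(t+1) = W x(t)."""
    return {i: W[i][i] * x[i] + sum(W[i][j] * x[j] for j in adj[i])
            for i in x}
```

Iterating `consensus_step` on a connected graph drives every node's value to the network-wide average, which is exactly the primitive the distributed EM updates rely on.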

Then, based on the probability density in Section 3, the features at each sensor follow the mixture distribution defined there, and the log-likelihood of the feature vectors satisfies

$$\log L(\Theta) = \sum_{i} \log p\!\left(\mathbf{x}_i \mid \Theta\right).$$

After initializing the parameter estimates, the distributed EM algorithm can be written as follows.

*(A) Expectation Process*. Let $z$ denote the binary hidden variable indicating which mixture component generated an observed feature. For one feature $x$, we calculate the a posteriori probabilities using Bayes' rule and the previous parameter values $\Theta(t)$:

$$\gamma_m(x) = \frac{\alpha_m\, p_m\!\left(x \mid \theta_m(t)\right)}{\sum_{m'} \alpha_{m'}\, p_{m'}\!\left(x \mid \theta_{m'}(t)\right)},$$

where $p_m$ is the Gaussian or exponential density of component $m$ with parameters $\theta_m$.

Then the conditional expectation of the complete-data log-likelihood with respect to the actually observed features is defined by

$$Q\!\left(\Theta \mid \Theta(t)\right) = \sum_{i} \sum_{m} \gamma_m(x_i)\, \log\!\left[\alpha_m\, p_m\!\left(x_i \mid \theta_m\right)\right],$$

where $\gamma_m(x_i)$ is the posterior probability that feature $x_i$ was generated by component $m$.

*(B) Maximization Process*. In the maximization process, the model parameters are updated by maximizing $Q(\Theta \mid \Theta(t))$; the intermediate local statistics are computed at each iterative step and exchanged with the neighborhood.

Note that, in calculating its current intermediate state, the $i$th sensor exchanges information with its neighbors through the averaging matrix $W$, whose entries are nonzero only for neighboring pairs. The update thus becomes a weighted combination of a prediction and the value derived from neighborhood averaging; by this means, the local information gradually spreads over the network. Each sensor updates its parameter estimates using the intermediate variables until all nodes reach a fixed point in their values. The mixing parameter determines the influence of information transmitted across the network, whereas the predictor parameter is associated with the convergence rate; these two step-size coefficients are real constants fixed in advance. It is necessary to investigate the choice of their values for our scenario. The time-dependent mixing parameters are defined as in [41] and are governed by a growth rate parameter. Based on the analysis in [42], the convergence rate of the distributed averaging algorithm speeds up with a larger growth rate value when the random network is well connected. Therefore, a tentative experimental test of the time cost of the proposed algorithm was conducted by varying this value in steps of 0.05. Figure 4 shows the resulting trend: the computational time decays gradually as the growth rate parameter increases from 0.05 to 0.95, meaning that the convergence rate of the distributed EM method increases, and the decline in time cost nearly stops once the value reaches 0.8, indicating that larger values bring no further benefit. The growth rate parameter for the distributed EM is therefore chosen as 0.8. According to a convergence analysis omitted here due to space limitations, the proposed method converges to a fixed point of the centralized EM solution as long as the assumptions in (10) hold.

Consequently, the estimation of the parameters can be updated as follows: , , , , and . Iterate processes (A) and (B) until a suitable stopping criterion is reached. After each update of the conditional expectation and the mixture model parameters, the neighbor sensors participating in the computation store their local values in memory. Before the end of the operation, every node compares its calculated results with the records of at least two neighbors after a fixed time interval. If a discrepancy is found, the node with inconsistent values is regarded as an adversary and discarded, and the remaining sensors rerun the distributed EM algorithm.
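As a reference point for the alternation of processes (A) and (B), a minimal centralized EM loop for a two-component Gaussian mixture is sketched below. The paper's mixed exponential-Gaussian distribution, the distributed intermediate variables, and the neighbor exchange are all omitted, so this only illustrates the expectation/maximization alternation on a stand-in model:

```python
import numpy as np

def em_two_gaussians(x, iters=100):
    """Minimal EM for a 1-D two-component Gaussian mixture (illustrative
    stand-in; the paper's EG mixture and distributed updates are omitted)."""
    w = np.array([0.5, 0.5])                  # mixing weights
    mu = np.array([x.min(), x.max()])         # crude but effective init
    var = np.array([x.var(), x.var()])
    for _ in range(iters):
        # (A) expectation: responsibility of each component for each sample
        dens = np.stack([w[k] * np.exp(-(x - mu[k])**2 / (2 * var[k]))
                         / np.sqrt(2 * np.pi * var[k]) for k in range(2)])
        r = dens / dens.sum(axis=0)
        # (B) maximization: closed-form updates of the model parameters
        nk = r.sum(axis=1)
        w = nk / len(x)
        mu = (r @ x) / nk
        var = (r * (x[None, :] - mu[:, None])**2).sum(axis=1) / nk
    return w, mu, var

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0, 1, 500), rng.normal(10, 1, 500)])
w, mu, var = em_two_gaussians(data)
```

In the distributed variant described above, the (B)-step quantities would additionally be mixed with neighbors' values through the averaging matrix before the next iteration.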

When feature extraction using the distributed EM algorithm is finished, five new features are acquired for each node, defined as . These new features provide additional statistical information for attack classification. As a result, the complete feature vector for sensor can be expressed as , in which the dimension of is equal to . Then is used in its entirety as input to the classifier in the next stage of recognition.

#### 5. Distributed Classifier Design

After these features have been selected and further extracted, we perform classification to recognize the external attacks. A classification process with good generalization properties and minimal test error can compensate for deficiencies in the feature dimensions. As described in the last section, the EG mixture modeling and distributed EM feature extraction are both generative techniques, which establish more distinct features from the variation of the distance and topological parameters by exploiting their probability density. Generative models offer excellent modeling ability and flexibility for nonnormalized data. However, the optimization capability of a generative scheme in the recognition phase is usually weaker than that of its discriminative counterpart, especially when labelled data is plentiful [43]. Discriminative methods form another class of recognition techniques. They map the posterior probability directly to a class label, which avoids rigid hypotheses in estimating the posterior probability, and they generally attain a lower asymptotic error than generative approaches in recognition tasks [44]. However, they cannot capture the intrinsic relationship between the feature distribution and the observed features. To combine the advantages of both, it is better to couple generative features with a discriminative classifier to obtain higher recognition accuracy.

Here, it should be noted that, besides the MK-SVM algorithm, logistic regression (LR) is another prominent and competitive discriminative classifier, which has been used for a wide range of recognition tasks [45]. Although LR is computationally fast and can often achieve higher accuracy than a support vector machine, especially on very large datasets, classifying localization attack data with LR poses a potential problem. LR is a linear classifier, so it performs best on linearly separable features. For the distance and topological features, the variation trends of their approximated distributions and parameters are closely related, but certain differences remain between them for some attacks, which introduces nonlinearity into the features extracted from the unified mixed distribution. In addition, the uncertain data modification by the malicious nodes further increases the nonlinearity. This accumulated nonlinearity may degrade the accuracy of LR in attack recognition. By comparison, the MK-SVM uses kernel functions to map the features nonlinearly into a higher-dimensional space, which makes them easier to separate. Therefore, the MK-SVM is chosen as the classifier for attack classification. Furthermore, to adapt to distributed sensor networks, the PECPR-MKSVM algorithm is devised to fully exploit the strengths of machine learning, and a two-stage data verification policy is added at the end.
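The advantage of a nonlinear kernel combination over a linear classifier can be shown on a toy linearly inseparable dataset. The unweighted RBF+polynomial sum below follows the paper's kernel-combination idea, but the kernel hyperparameters and the kernel ridge solver standing in for the SVM optimizer are assumptions made for brevity:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian RBF kernel matrix between row-vector sets X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :])**2).sum(-1)
    return np.exp(-gamma * d2)

def poly_kernel(X, Y, degree=2, c=1.0):
    """Inhomogeneous polynomial kernel matrix."""
    return (X @ Y.T + c)**degree

def combined_kernel(X, Y):
    """Unweighted RBF+Poly combination (equal kernel weights assumed)."""
    return rbf_kernel(X, Y) + poly_kernel(X, Y)

def fit(K, y, lam=1e-2):
    """Kernel ridge fit: a lightweight stand-in for the SVM solver."""
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def predict(alpha, K_test):
    return np.sign(K_test @ alpha)

# XOR-like data: not linearly separable, so a linear model such as LR
# cannot fit it, while the nonlinear kernel combination can.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1., 1., 1., -1.])
alpha = fit(combined_kernel(X, X), y)
pred = predict(alpha, combined_kernel(X, X))   # recovers all four labels
```

The summed Gram matrix remains positive semidefinite, so any kernel machine can consume it directly; learning the combination weights, as MK-SVM does, is a refinement of this equal-weight sketch.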

##### 5.1. Extension for CPRSM

For the multiclass problem, we can recast it as a linear equality-constrained optimization problem consisting of multiple separable objective functions:where is a closed proper convex function, is the feature vector, are given matrices, and is a designated vector. Although the objective function in (18) is convex and linearly constrained, it is ill suited to classic centralized optimization because of safety and time-efficiency concerns. On one hand, after feature extraction a new feature vector is generated and the detected adversaries are discarded; although the timing of an attack is uncertain, there is still a small possibility that undetected adversaries launch a data-modifying assault, which could severely damage recognition performance. On the other hand, the input feature dataset is typically large and high dimensional, which makes classifier training and testing difficult for a conventional optimization scheme. It therefore becomes important to develop a parallelized optimization method that guards against malicious sensors and can process the large feature dataset.

Referring to the literature [46], the contractive Peaceman-Rachford splitting method (CPRSM) was developed for linearly constrained convex optimization problems split into two parts. The augmented Lagrangian iterations are given bywhere and are closed convex functions, and are primal variables, and are given matrices, and is a designated vector. and are the intermediate and updated Lagrange multipliers corresponding to the linear constraints, and is a penalty scalar; the relaxing factor must be restricted so that the sequence derived by (19) remains strictly contractive, and for convenience it is assumed that is chosen close to 1. Inspired by the effectiveness of CPRSM, a natural idea for solving (18) is to extend the CPRSM scheme from the two-block case to the general case, so the straightforward extension of CPRSM yields the following scheme:where , is a closed convex function, , is a primal variable, , are given matrices, and is a designated vector. Note that is an intermediate variable whose value is updated between iterations of . is a penalty scalar and is a relaxing factor. To simplify (19) and improve calculation efficiency, can be further rewritten asSubstituting (21) into (20) and applying the scaled dual form, problem (19) simplifies toThus the minimization problem with more than three convex functions can be solved by splitting the subproblems of (18) alternatingly. We name (22) the extended contractive Peaceman-Rachford splitting method (ECPR).
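Since the displayed iterations in (19)-(22) did not survive extraction, the flavor of a strictly contractive Peaceman-Rachford iteration can still be conveyed on a toy two-block problem: minimize (x - a)^2/2 + (y - b)^2/2 subject to x = y, whose solution is x = y = (a + b)/2. The penalty value 1 and relaxing factor 0.9 are illustrative choices, not the paper's settings:

```python
# Strictly contractive PRSM sketch for
#   min 0.5*(x - a)**2 + 0.5*(y - b)**2   s.t.  x - y = 0.
a, b = 0.0, 4.0
beta = 1.0        # penalty scalar
alpha = 0.9       # relaxing factor, kept in (0, 1) for strict contraction
x = y = lam = 0.0
for _ in range(500):
    # x-subproblem: closed-form minimizer of the augmented Lagrangian in x
    x = (a + lam + beta * y) / (1 + beta)
    # first (intermediate) multiplier update
    lam = lam - alpha * beta * (x - y)
    # y-subproblem: closed-form minimizer in y
    y = (b - lam + beta * x) / (1 + beta)
    # second multiplier update
    lam = lam - alpha * beta * (x - y)
# x and y agree on the consensus value (a + b) / 2 = 2.0
```

The two multiplier updates per sweep, one after each primal subproblem, are what distinguish Peaceman-Rachford splitting from ADMM, which updates the multiplier only once; ECPR extends this alternation to more than two blocks.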

##### 5.2. Proximal MK-SVM with ECPR Substitution

Considering a labelled training set , where the feature vector and , MK-SVM places a separating hyperplane between the two categories in the feature space. The minimization problem of the MK-SVM using an unweighted kernel combination is given as follows [47]:where denotes the th element of the weight vector, represents the bias term of the hyperplane, and is a basic kernel function. The objective of this formulation is to optimize the variables and , which also yields the maximum margin and the minimum empirical error.

In order to convert the inequality constraints to equality constraints, a slack variable is introduced into the optimization problem:

The optimization procedure can then be divided into two parts, with the optimization of as one group and as the other. The first part solves the minimization problem with respect to the parameters using ECPR while is fixed. For this optimization, the augmented Lagrangian function of (24) can be expressed aswhere denotes the feature vector, denotes the th element of the weight vector, represents the bias term of the hyperplane, is a basic kernel function, is a Lagrange multiplier, and is a positive scalar. By applying ECPR to the augmented Lagrangian function, the distributed iterative form of problem (25) is obtained. To reduce the derivative calculations, we then use the linearized proximal method proposed by Xu and Wu [48]:After differentiating (26), the primal variables can be calculated as

Substituting the equations in (27) into the primal problem , the primal minimization problem is converted into a dual function:

In the following, we consider the optimization of by fixing , which can easily be solved via a gradient method. Setting the derivatives of (25) with respect to equal to 0 yields the following results:

The above iterations are repeated until convergence. We name (28) and (29) the proximal extended contractive Peaceman-Rachford splitting multiple kernel support vector machine (PECPR-MKSVM).

While the PECPR-MKSVM algorithm runs effectively under normal conditions, it ignores attack scenarios in which the intermediate variables may be disrupted by modified data from an attacker. To address this inadequacy, a simple two-stage calculation verification policy is added to guard against adversaries and improper output. The following steps are carried out during PECPR-MKSVM training. (1) *Neighbor Node Verification Stage*. As the variable updated at each neighbor node differs, the data-forwarding sequence of all neighbor nodes is rearranged, for example, shifted forward or backward one position, after the first fixed time interval. The set of renewed parameter information produced in the next interval is then checked against the record maintained at the same sensor. Finally, the first sensor exhibiting a divergence is determined to be the attacker. (2) *Host Node Verification Stage*. Although validation at the neighbor nodes can eliminate malicious nodes, it does not exclude the potential risk at the host node itself. Therefore, the algorithm is run repeatedly on neighbor nodes that have already been authenticated, which ensures both attack prevention and calculation precision.
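A minimal sketch of the neighbor-record consistency check follows. The concrete rule used here (flag a node whose record disagrees with at least two of its neighbors) and the numeric tolerance are assumptions, since the comparison protocol above is described only informally:

```python
import numpy as np

def verify_records(records, neighbors, tol=1e-6):
    """Flag any node whose stored intermediate values disagree with the
    records of at least two of its neighbors (hypothetical consistency rule:
    honest nodes hold matching copies, so a lone deviator is exposed)."""
    flagged = set()
    for node, rec in records.items():
        disagree = sum(1 for nb in neighbors[node]
                       if not np.allclose(records[nb], rec, atol=tol))
        if disagree >= 2:
            flagged.add(node)
    return flagged

# Ring of four sensors; sensor 3 holds a tampered parameter record.
v = np.array([0.2, 1.5, 0.7])
records = {0: v, 1: v.copy(), 2: v.copy(), 3: v + 0.3}
neighbors = {0: [3, 1], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
flagged = verify_records(records, neighbors)   # only sensor 3 is flagged
```

Requiring disagreement with at least two neighbors prevents an honest node from being flagged merely because one of its neighbors is the attacker.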

##### 5.3. The Process Overview of the Proposed Algorithm and Calculation Verification Policy

Based on the above design, the message transmission among neighboring nodes in the distributed recognition method can be summarized as follows and is illustrated in Figure 5. As shown in Figure 5(a), in the feature extraction phase, once every node has obtained its own original feature set, it conveys the set to the sensor with the most energy (named ) and its one-hop neighbors to compute the statistical attack feature set, and it verifies its local states by exchanging records with its neighbors. As shown in Figure 5(b), in the PECPR-MKSVM training phase, the authenticated sensor with the most residual energy sets an initial value of and then computes via (23); next, node sends its newly updated to one of its one-hop neighboring sensors. After receiving , the iteration resumes as another node updates using its own feature set. According to the forwarding rule, all intermediate variables are transmitted along the path through ’s direct neighbor sensors one by one. Eventually, is sent back to to start a new cycle, and the final global minimum of the associated cost function is obtained by iteratively updating the distributed classifier.

**(a) Iteration visualization of distributed EM algorithm in feature extraction phase**

**(b) Iteration visualization of PECPR-MKSVM classifier in training phase**

#### 6. Experimental Setup and Results

##### 6.1. Simulation Setup

To assess the effectiveness of our mechanism, we present four groups of experiments carried out under different localization attacks. In our simulation, 600 sensors including 48 anchors are randomly distributed over an area. The communication range of both sensors and anchors is set to . Moreover, three types of external localization attacks (wormhole, replay, and interference) exist in the network simultaneously. The fraction of malicious sensors is 20%, with each kind of external localization attack accounting for one-third of the total. If a sensor responsible for performing computation happens to be an adversary, the probability and range of its data modification are below 30%. At first, all sensors in the network collect their original feature information and then convey the feature data to the sensor with the most energy in the network and its one-hop neighbors, which run the distributed EM scheme to compute the new statistical features. Finally, one of the authenticated sensors holding the new feature dataset is chosen to run PECPR-MKSVM for training and classification. The experiments are repeated 5 times: the features of the first four runs are used as training sets, whereas those of the last run are used as testing sets. The PECPR-MKSVM classifier uses RBF+Poly kernels and a one-versus-all approach.
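The deployment described above can be reproduced in outline as follows. The side length of the deployment area, the RNG seed, and the exclusion of anchors from the malicious set are assumptions that the text does not fix:

```python
import numpy as np

rng = np.random.default_rng(42)
N_SENSORS, N_ANCHORS = 600, 48          # counts from the simulation setup
AREA = 100.0                            # side length is an assumption
MALICIOUS_FRACTION = 0.20
ATTACKS = ["wormhole", "replay", "interference"]

# random sensor positions; a subset of nodes serves as anchors
pos = rng.uniform(0, AREA, size=(N_SENSORS, 2))
anchor_idx = rng.choice(N_SENSORS, size=N_ANCHORS, replace=False)

# 20% of sensors are malicious, split evenly among the three attack types
# (anchors are assumed trustworthy and excluded from the malicious set)
n_mal = int(MALICIOUS_FRACTION * N_SENSORS)
candidates = np.setdiff1d(np.arange(N_SENSORS), anchor_idx)
mal_idx = rng.choice(candidates, size=n_mal, replace=False)
attack_type = {idx: ATTACKS[i % 3] for i, idx in enumerate(mal_idx)}
```

With 600 sensors this yields 120 malicious nodes, 40 per attack type, matching the one-third split stated above.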

##### 6.2. Attack Classification Performance with the Proposed Algorithm

To evaluate the effectiveness of combining distributed feature extraction with the classifier scheme, the recognition performance on two kinds of feature datasets is first compared between the proposed classifier and four similar classifiers: a distributed SVM (MoM-DSVM), a multiple kernel SVM (SimpleMKL), a typical SVM (C-SVM), and a logistic regression (LR) classifier. Table 2 shows the average recognition accuracy obtained by these algorithms under different external localization attacks. In general, as depicted in Table 2, the average success classification rate for each kind of attack rose by 9.4% with feature extraction compared to recognition by the classifier alone. Furthermore, it is worth mentioning that the proposed classifier obtains higher accuracy than the other classifier schemes. For example, for the replay attack, the proposed classifier with extracted features offers the highest classification accuracy of 93.28%, whereas SimpleMKL and C-SVM only reach 84.56% and 68.72%, respectively. Although the MoM-DSVM classifier achieves satisfactory performance for the replay attack using a consensus-based support vector, it is still not sufficient to recognize the wormhole and interference attacks. The recognition performance of LR is improved markedly by the extracted features and is superior for the interference attacks compared to MoM-DSVM and SimpleMKL; however, it is still not comparable to the PECPR-MKSVM owing to the limitations of safety and the nonlinear features. These comparisons show that the combination of distributed feature extraction and the proposed classifier achieves higher recognition accuracy than the other recognition methods.

##### 6.3. Classification Robustness of the PECPR-MKSVM with Different Kernel Function

We further explore the classification robustness of the PECPR-MKSVM classifier with different kernel functions. Figure 6 shows the average recognition accuracies for multikernel classifiers combining different kernels, such as the RBF, sigmoid, and polynomial kernels. In Figure 6(a), the average recognition accuracy of the proposed classifier is 4%–7% higher than that of the MoM-DSVM and MK-SVM methods. Moreover, the combination of the RBF and polynomial kernels achieves higher recognition accuracy than the other combinations, whereas a single kernel fails to offer good recognition accuracy. The classification errors that remain can largely be attributed to the lack of sufficient training samples. Next, to show the robustness of the proposed classifier, Figure 6(b) compares the recognition performance under a higher ratio of malicious sensors. When the ratio of malicious sensors increases, the average attack recognition rates improve for all classifiers, because the additional data from the malicious sensors provides more samples to the classifier and refines the classification hyperplanes. In particular, the average recognition rate of the proposed classifier with the RBF+Poly kernel increases from 91.9% to 93.9%. Thus, the proposed algorithm remains robust in recognizing localization attacks even under a severe scenario.

**(a) When 20% sensors are malicious**

**(b) When 30% sensors are malicious**

##### 6.4. Convergence Performance of PECPR-MKSVM with Different Positive Scalar and Relaxation Factor

To assess the impact of the positive scalar and the relaxation factor on the proposed classifier, each node trains a local PECPR-MKSVM, and the convergence of its test error is compared with that obtained via MoM-DSVM. We first fix and choose two different values of for the PECPR-MKSVM classifier; the evolution of the test error over iterations is then plotted for each choice of . For comparison, we also plot the convergence performance of MoM-DSVM with and . As illustrated in Figure 7, the test error of PECPR-MKSVM decreases rapidly within a few iterations and soon approaches its minimum value, outperforming the MoM-DSVM-based method. Moreover, the plot in Figure 7(a) also reveals that a very large value of may lead to divergence and hinder the convergence rate, which underlines the importance of choosing carefully when constructing the PECPR-MKSVM classifier. Finally, the plot in Figure 7(b) illustrates that, for each tested scalar , a larger relaxation factor tends to accelerate the convergence of the proposed classifier, thus shortening the runtime.

**(a)**

**(b)**

##### 6.5. Time Cost with the Proposed Algorithm

Additionally, to assess how well the different algorithms save classification time, we perform a further experiment in which the number of sensors varies from 200 to 1000 and plot the computational time in Figure 8. Here, all classifiers are combined with the proposed feature extraction process. In general, the proposed algorithm is the fastest of the compared schemes. More importantly, the time for the proposed algorithm increases linearly with the number of sensors but at a slow growth rate, even when the number of sensors reaches 1000. This is because the classification process is distributed, so the computational complexity depends on the number of neighbor sensors. In contrast, although the consensus-based MoM-DSVM scheme is more time-efficient than the MK-SVM and LR algorithms, it still requires a large amount of calculation in the training process. The performance of the LR algorithm lies between the MK-SVM and the distributed SVM. The MK-SVM algorithm uses a centralized architecture to execute the classification, which increases the number of iterations and the computational complexity. Thus, the proposed algorithm is more computationally efficient than the MoM-DSVM, LR, and MK-SVM methods.

#### 7. Conclusion

This paper presented a distributed classification scheme for external localization attack classification in WSNs, together with novel distributed versions of an EM feature extractor and an MK-SVM classifier. These schemes allow each sensor to share the computation for feature extraction and recognition with its neighbor sensors. The algorithm first models the distance-based and topology-based features as a mixed distribution. The parameter features are then extracted with a distributed EM scheme that fuses temporal and neighborhood information as it evolves over the iterations. Eventually, a distributed classifier, which incorporates MK-SVM with the extension of CPRSM, is designed to classify localization attack datasets into multiple classes. The experimental results show that using the distributed EM as the feature extractor and PECPR-MKSVM as the classifier achieves higher classification accuracy than other similar methods. Moreover, the attack recognition scheme presented in this paper is robust to a wide range of attacks with competitive time efficiency.

#### Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 61401360), the Fundamental Research Funds for the Central Universities of China (no. 3102014JCQ01055), and the Natural Science Basis Research Plan in Shaanxi Province of China (no. 2014JQ2-6033).