
International Journal of Distributed Sensor Networks

Volume 2012 (2012), Article ID 648058, 16 pages

http://dx.doi.org/10.1155/2012/648058

## Enhancing Sink-Location Privacy in Wireless Sensor Networks through *k*-Anonymity

¹College of Electrical Engineering, Zhejiang University, Hangzhou 310027, China

²Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA

Received 23 May 2011; Revised 5 January 2012; Accepted 7 January 2012

Academic Editor: Yuhang Yang

Copyright © 2012 Guofei Chai et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Due to the shared nature of wireless communication media, a powerful adversary can eavesdrop on all radio communication in the network and obtain contextual communication statistics, for example, traffic volumes and transmitter locations. Such information can reveal the location of the sink, around which the data traffic exhibits distinctive patterns. To protect sink-location privacy from a powerful adversary with a global view, we propose to achieve *k-anonymity* in the network so that at least *k* entities in the network are indistinguishable from the nodes around the sink with regard to communication statistics. Arranging the locations of these entities is complex because it affects two conflicting goals, the routing energy cost and the achievable privacy level, and both goals are determined by a nonanalytic function. We model this positioning problem as a nonlinearly constrained nonlinear optimization problem. To tackle it, we design a genetic-algorithm-based quasi-optimal (GAQO) method that obtains quasi-optimal solutions in quadratic time. The obtained solutions approximate the optima more closely as the privacy requirement increases. Furthermore, to solve *k*-anonymity sink-location problems more efficiently, we develop an artificial potential-based quasi-optimal (APQO) method with linear time complexity. Our extensive simulation results show that both algorithms can effectively find solutions that hide the sink among a large number of network nodes.

#### 1. Introduction

With the increasing advances of sensing devices and wireless technology, wireless sensor networks (WSNs) have been interwoven into the fabric of our daily life. In particular, WSNs have been deployed to monitor personal health, track targets, and sense pollutants. Those sensor networks typically consist of many resource-constrained sensor nodes and one sink. Each sensor node monitors the underlying physical phenomenon and reports the measurements to the sink in a multihop manner.

In spite of their popularity, the viability and success of these sensor networks hinge on addressing a variety of security and privacy threats. One of the most challenging is the threat to location privacy, since it cannot be addressed by traditional cryptographic mechanisms [1]. Due to the shared nature of wireless communication media, an attacker can easily eavesdrop on radio communication, either by purchasing her own sensor devices or by leveraging other radio devices capable of monitoring message transmissions. Thus, regardless of whether messages are encrypted, an adversary is able to identify contextual information, namely, where the communication has occurred and who has participated in it, without accessing the content of messages. For example, an adversary can identify the sender of a message by analyzing the angle of arrival [2], or determine the receiver in a similar fashion when the receiver relays a message [3].

Since an adversary can locate both the origins and the destinations of messages (i.e., sinks) purely by observing contextual information, the WSN location privacy problem can be divided into two categories: *source*-location privacy and *sink*-location privacy. The source-location privacy problem is concerned with preventing attackers from discovering the locations of message sources, which may reveal sensitive position information about the assets being monitored, for example, endangered animals. Much effort has been devoted to preserving source-location privacy against a wide variety of attackers, ranging from resource-constrained attackers [2] to powerful attackers that have a global view of network communications [1, 4].

In this study, we focus on preserving *sink*-location privacy against attackers with a *global* view. The sink node serves as the aggregation point for data collection and is crucial to the availability of a WSN. If the sink node is located and destroyed, the sensed data can no longer be relayed to a data center, rendering the entire WSN useless. Despite the great importance of the sink node, the sink-location privacy problem has only been studied under the assumption of resource-constrained attackers [3, 5–8]. When a global adversary is involved, strategies designed for resource-constrained attackers become inapplicable. Our work aims to fill this gap by defending against powerful global adversaries.

To achieve a global view, an attacker can either deploy her own sensors [1, 4] or use powerful radio receivers with extremely sensitive antennas to pick up communications across the whole network [1]. Such a global attacker can derive the location of sinks through either traffic-analysis attacks [5] or packet-tracing attacks [2, 3]. Traffic-analysis attacks exploit the fact that the closer a node is to the WSN sink, the more messages it needs to forward. Thus, by repeatedly moving towards spots that exhibit higher message volumes, an adversary can eventually find the sink. In packet-tracing attacks, the adversary follows messages hop by hop in their direction of travel until he reaches the sink.

Neither traffic-analysis attacks nor packet-tracing attacks require access to message content; knowing that a message exists is enough. Additionally, a global adversary can instantly identify *every* node that has forwarded a message, whereas most literature [4, 9] assumes that an adversary with a *local* view can identify the sender only when communication occurs within his observable range. We are unaware of any solution that can defend against a global adversary, since it is virtually impossible to protect a network against a global eavesdropper [10]. No local obfuscation created by fake messages can confuse a global adversary. For instance, fractional propagation [5] forks a fake message toward a random destination while the real message is forwarded towards the sink, which is likely to mislead an adversary with a local view. However, this approach cannot deceive an adversary with a global view, since all real messages always arrive at the sink.

One naive defense strategy is to have each node send the same volume of messages as the sink (including both real and fake messages). However, such a strategy imposes high energy consumption and is infeasible. To limit energy consumption while enhancing privacy against a powerful adversary with a global view, we propose to achieve *k-anonymity* in the network so that at least *k* entities exhibit the same characteristics as the nodes located close to the sink. As such, even to powerful attackers, these entities are indistinguishable from the nodes around the sink with regard to contextual communication information.

The concept of *k*-anonymity [11] was originally proposed to protect personal identity while releasing person-specific data and has been studied extensively in the fields of databases and data mining. To the best of our knowledge, our work is the first attempt to apply this concept to preserving sink-location privacy in wireless sensor networks, and no other valid approach deals with the attacks of a global adversary. We summarize our contributions as follows.

(i) We identify the absence of defense strategies that enhance sink-location privacy against global adversaries.

(ii) To enhance sink-location privacy, we propose to achieve *k*-anonymity via a Euclidean minimum-spanning-tree-based routing protocol, that is, by creating *k* designated nodes in the network.

(iii) We show that positioning designated nodes is complex because it affects two conflicting goals, the routing energy cost and the achievable privacy level, and both goals are determined by a nonanalytic function. To strike a balance between those two goals, we formulate the design of *k*-anonymity routing protocols as a nonlinearly constrained optimization problem.

(iv) The nonlinearly constrained optimization problem is extremely challenging to solve. To tackle it, we design two quasi-optimal algorithms that obtain designated-node locations closely approximating the optima, and our extensive simulations validate that both algorithms can effectively find solutions hiding the sink among a large number of network nodes.

The rest of the paper is organized as follows. In Section 2, we describe the network model and the attack model, and formalize the problem of achieving *k*-anonymity as a nonlinear optimization problem. We present the routing algorithm for achieving *k*-anonymity in Section 3. In Section 4, we discuss two approximate algorithms that obtain quasi-optimal solutions and present our validation effort. Finally, we discuss related work in Section 5 and provide concluding remarks in Section 6.

#### 2. Problem Overview

A wide variety of WSNs have emerged as monitoring and control solutions for numerous applications. It is very hard, if at all possible, to design a solution that applies to all types of WSNs and addresses all attacks. In this section, we specify a popular type of WSN, which has been adopted by several works [12–16], and formalize the problem below.

##### 2.1. Network Model

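We consider a network of $N$ wireless sensor nodes distributed throughout a bounded environment $Q \subset \mathbb{R}^2$ at positions $p_i \in Q$, and we denote $P = \{p_i : i \in \mathcal{I}\}$, where $p_i$ is indexed using an index set $\mathcal{I} = \{1, \ldots, N\}$. The network has the following features.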

###### 2.1.1. Periodic Data Reporting

WSNs can be classified as event-driven or periodic. In an event-driven sensor network, only those sensors that have observed events will generate and deliver messages to sinks in a multihop manner while others remain silent. In a periodic network, each sensor will measure the underlying physical phenomena and will deliver its measurements periodically to sinks. We focus on periodic networks since in such networks, even aggregation cannot eliminate the data traffic accumulation towards the sink [9]. Further, we assume that no aggregation algorithms are applied to the networks.

###### 2.1.2. Homogeneous Network with One Sink

We consider homogeneous sensor networks that consist of sinks and a large number of sensor nodes and are densely deployed in a square. Each sensor node is equipped with an omnidirectional antenna and transmits at the same transmission power level. Without loss of generality, we assume that one sink in the network collects data. We note that our scheme can be easily extended to a network with multiple sinks.

###### 2.1.3. No ACK

We assume that the sensor networks do not rely on acknowledgement packets (ACKs) to achieve reliable communication, since the excessive number of ACKs transmitted by the sink will easily reveal its location. We assume that the sink only passively receives messages. Thus, the sink is hidden, and the adversary cannot pinpoint the location of the sink purely by relying on eavesdropping on ACKs.

###### 2.1.4. End-to-End Data Encryption

We assume that messages are protected by an end-to-end encryption protocol using pairwise keys [17]. Due to resource constraints, we do not consider the case where messages are decrypted and re-encrypted at each hop. Therefore, a message keeps the same ciphertext as it travels from the source to the sink.

##### 2.2. Attack Model

We consider a powerful attacker who is able to eavesdrop on all communications across the whole network. The adversary does not actively interfere with regular communications in the network but passively eavesdrops on them. Her goal is to find the location of the sink and to compromise the sink via physical contact. Additionally, following Kerckhoffs's principle [18], we assume that the adversary is aware of all protocols being used but does not know the established keys of the network and is unable to decrypt messages.

To find the sink physically, the adversary will perform a two-phase search: (1) the *location-mining* phase and (2) the *visual searching* phase. In the location-mining phase, the adversary eavesdrops on the network traffic and identifies a set of nodes that appear to be close to the sink. Given the information on nearby nodes, the adversary will find the sink physically in the visual searching phase.

###### 2.2.1. Phase I: Location Mining

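Let $m_l$ be the $l$th message in the network. When $m_l$ is forwarded from its originator to the sink, the attacker will record a set of communication events represented by three-tuples, $\{(m_l, p_{l,h}, t_{l,h}) : h = 1, \ldots, H_l\}$, where $H_l$ is the number of hops that $m_l$ has travelled and each three-tuple $(m_l, p_{l,h}, t_{l,h})$ maps to an event that the sensor node located at $p_{l,h}$ forwards $m_l$ at time $t_{l,h}$. Up to time $t$, the adversary will obtain the communication event set
$$\mathcal{E}(t) = \{(m_l, p_{l,h}, t_{l,h}) : l \in \mathcal{L},\ h \in \mathcal{H}_l,\ t_{l,h} \le t\},$$
where $\mathcal{L}$ and $\mathcal{H}_l$ are the index sets of messages and of the hop counts for the messages, respectively.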

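Given $\mathcal{E}(t)$, the adversary will perform statistical analysis on message transmission information. Formally, let $\Omega$ denote the space of all communication events. We describe the statistical analysis method as a composite function $F = g \circ f$: the function $f$ maps the communication events to traffic statistics associated with every node, and the function $g$ selects the set of nodes that have unusual traffic statistics. That is,
$$f: 2^{\Omega} \rightarrow \mathbb{R}^{|P|}, \qquad g: \mathbb{R}^{|P|} \rightarrow 2^{P}, \qquad \mathcal{A} = F(\mathcal{E}(t)) = g\big(f(\mathcal{E}(t))\big),$$
where $|\cdot|$ is the cardinality of a set and $2^{P}$ is the power set of $P$.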

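We consider a powerful attacker who is able to perform traffic-analysis attacks and packet-tracing attacks. In particular, he is able to obtain two traffic statistics for each node $p_i$: the traffic volume $V(p_i)$ and the number of messages that end at the node, $D(p_i)$. Assume that the attacker starts to record communication events at time $t_0$; he can then obtain the following statistics at time $t$:
$$V(p_i) = \big|\{(m_l, p_{l,h}, t_{l,h}) \in \mathcal{E}(t) : p_{l,h} = p_i,\ t_{l,h} \ge t_0\}\big|,$$
$$D(p_i) = \big|\{l \in \mathcal{L} : p_{l,H_l} = p_i,\ t_{l,H_l} \ge t_0\}\big|,$$
where $H_l$ is the hop count for the message $m_l$, so $p_{l,H_l}$ is the last node observed forwarding $m_l$. Given $V$ and $D$, the adversary can identify the nodes that have either the maximum traffic volume or the maximum number of messages ending there:
$$\mathcal{A} = \Big\{p_i : V(p_i) = \max_{j \in \mathcal{I}} V(p_j)\Big\} \cup \Big\{p_i : D(p_i) = \max_{j \in \mathcal{I}} D(p_j)\Big\}.$$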
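The two statistics and the selection rule can be sketched in Python as a minimal illustration, not as the authors' code. Node identifiers stand in for positions, each observed forwarding is a `(message_id, node_id, time)` tuple, and the function names `traffic_statistics` and `suspicious_nodes` are hypothetical.

```python
from collections import Counter

def traffic_statistics(events, t0, t):
    """Compute per-node traffic volume V and last-hop counts D.

    events: iterable of (message_id, node_id, time) three-tuples, one per
    observed forwarding of a message by a node.
    """
    volume = Counter()   # V: forwarding events per node within [t0, t]
    last_hop = {}        # message_id -> (time, node_id) of its latest forward
    for msg, node, time in events:
        if t0 <= time <= t:
            volume[node] += 1
            if msg not in last_hop or time > last_hop[msg][0]:
                last_hop[msg] = (time, node)
    ends = Counter(node for _, node in last_hop.values())  # D
    return volume, ends


def suspicious_nodes(volume, ends):
    """Nodes with the maximum volume or the maximum number of ending messages."""
    top_v = max(volume.values())
    top_d = max(ends.values())
    return ({n for n, c in volume.items() if c == top_v}
            | {n for n, c in ends.items() if c == top_d})


# A chain c -> b -> a -> (sink): each node reports one message per period.
events = [
    ("m_c", "c", 1), ("m_c", "b", 2), ("m_c", "a", 3),
    ("m_b", "b", 1), ("m_b", "a", 2),
    ("m_a", "a", 1),
]
volume, ends = traffic_statistics(events, t0=0, t=10)
print(suspicious_nodes(volume, ends))  # {'a'}
```

As in the tree example that follows, the node relaying all traffic to the sink has both the largest volume and the largest number of ending messages, so it alone forms $\mathcal{A}$.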

Consider the example depicted in Figure 1, where a tree-based routing protocol is used and a routing tree is formed with the sink node serving as its root. After one reporting period, the adversary will conclude that $\mathcal{A}$ consists of the single node that relays all traffic to the sink, since this node transmits 12 messages per period, one on behalf of each node.

###### 2.2.2. Phase II: Visual Searching

Although $\mathcal{A}$ only identifies the nodes that are close to the sink and does not pinpoint the sink's location, it does help the adversary narrow down the region where the sink resides. To find the sink physically, the adversary needs to search either visually or with equipment such as a metal detector. Assume that the adversary is able to search an area of size $v$ per second and that the area to be searched around the nodes in $\mathcal{A}$ is $S(\mathcal{A})$; then the time required for the adversary to find the sink physically is at most $S(\mathcal{A})/v$.

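Continuing with the example depicted in Figure 1, $\mathcal{A}$ contains only one node. The region to be searched is the communication range of this node, of size $\pi R^2$, where $R$ is the node communication range. The amount of time required for the adversary to find the sink is therefore at most $\pi R^2 / v$.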

##### 2.3. *k*-Anonymity

Our goal is to design a routing strategy that enhances sink-location privacy. Essentially, the risk of breaching sink-location privacy is caused by the observable asymmetric traffic pattern of sensor networks: the message traffic volume is largest at the nodes close to the sink, and the travel paths of messages always end there as well. The basic idea of our approach is to change the traffic pattern such that at least $k$ nodes located at $x_1, \ldots, x_k$, which may be far away from the sink, behave the same as the nodes around the sink; namely, $\{x_1, \ldots, x_k\} \subseteq \mathcal{A}$. In particular, we envision that each message is delivered to the sink before its last-hop transmission, so messages no longer end at the nodes around the sink. Further, many more nodes besides those around the sink send high volumes of messages. As a result, $|\mathcal{A}| \ge k$.

The main design goal of the *k*-anonymity routing protocol is to enhance sink-location privacy while delivering messages without incurring high energy overhead. Therefore, we define a privacy measure and a network efficiency metric to evaluate a routing strategy.

(1) The *safety period* is the *average* amount of time taken by a global attacker to find the sink physically. We use the safety period to quantify the privacy level: a larger safety period maps to a higher level of sink-location privacy. The safety period includes the time needed for *location mining* and for *visual searching*. Because the duration of *location mining* is fixed and short, we consider the safety period to equal the duration of *visual searching*. Since at least $k$ nodes located at $X = (x_1, \ldots, x_k)$ exhibit the same traffic statistics, the adversary has to visually search the communication ranges of all these nodes. Thus, the safety period is a function of $X$, denoted by $T(X)$.

(2) The *energy cost* is the average amount of energy consumed to transmit one message from each sensor to the sink in one measurement period. Since, under this routing strategy, the messages delivered to the sink are also transmitted to the nodes at $X$, the energy cost also depends on the positions of these nodes and is denoted by $E(X)$.

An ideal routing protocol should provide a long safety period $T(X)$ at a small energy cost $E(X)$. However, a longer safety period typically requires messages to travel along longer paths to visit $X$, which imposes a larger energy cost. To find a balance between the safety period $T(X)$ and the energy cost $E(X)$, we define the design of the routing protocol as an optimization problem.

*Problem 1.*
$$\min_{X} \; E(X) \quad \text{subject to} \quad T(X) \ge T_0,$$
where $T_0$ is the required safety period.

#### 3. Routing Algorithm Description

##### 3.1. Algorithm Model

In order to achieve *k*-anonymity, we propose a Euclidean minimum-spanning-tree-based (EMST-based) routing algorithm that creates at least $k$ nodes whose traffic volumes are equally high. Consider a network deployed in a square $Q$, as depicted in Figure 2. The EMST-based routing algorithm partitions the square into $k$ nonoverlapping subregions $W_1, \ldots, W_k$. Denote the partition by $\mathcal{W} = \{W_1, \ldots, W_k\}$. In each subregion $W_j$, a node is chosen to be the *designated node*; it is located at $x_j$ and collects all messages originating from $W_j$.

Each message is forwarded in two stages, *intraregion forwarding* and *interregion forwarding*. During intraregion forwarding, messages originating from $W_j$ are routed to the designated node $x_j$ through a routing tree rooted at $x_j$. Once the designated node receives a message generated inside $W_j$, it starts interregion forwarding by sending the message to all other designated nodes through an EMST that connects them. We envision that as the message travels through the EMST, it reaches the sink, which is located at most one communication range away from the EMST. Such an arrangement can be achieved by positioning the sink after the EMST is determined. We adopt an EMST because, by definition, it is a spanning tree whose total weight is less than or equal to that of every other spanning tree, so it connects the designated nodes with the least total edge length.
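The sink-placement condition can be checked with a short geometric test. This Python sketch assumes that each EMST edge is approximated by the straight segment between its two designated nodes (the relaying sensors along the edge are ignored), and that the tree edges are supplied as index pairs; `point_segment_distance` and `sink_within_range` are hypothetical helpers.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.dist(p, a)
    # Project p onto the line through a and b, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def sink_within_range(sink, points, edges, comm_range):
    """True if the sink lies within comm_range of some EMST edge."""
    return min(point_segment_distance(sink, points[i], points[j])
               for i, j in edges) <= comm_range


designated = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
tree = [(0, 1), (1, 2)]  # EMST edges as index pairs
print(sink_within_range((5.0, 1.0), designated, tree, comm_range=2.0))  # True
print(sink_within_range((5.0, 5.0), designated, tree, comm_range=2.0))  # False
```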

Interestingly, as a result of constructing an EMST connecting the designated nodes, the number of nodes that exhibit traffic statistics similar to those of the designated nodes is larger than $k$; that is, $|\mathcal{A}| > k$. Typically, the distance between any pair of designated nodes $x_i$ and $x_j$ is larger than one communication range, so additional sensor nodes are needed to form a complete EMST for message relaying. As a result, these additional nodes are added to $\mathcal{A}$ as a side effect of the proposed two-stage routing. To keep the problem model simple yet representative, for the rest of the paper we use $k$ to denote the number of partitions, that is, the number of designated nodes, and $X = (x_1, \ldots, x_k)$ to denote the position vector of the designated nodes, even though the total degree of anonymity is larger than $k$. The selection of the partition number $k$ is affected by many factors. For instance, a larger $k$ requires constructing more routing trees (one rooted at each $x_j$) and thus incurs larger overhead, while a smaller $k$ may not meet the safety period requirement $T(X) \ge T_0$. As a general rule, $k$ should be small enough to reduce the overhead of constructing multiple routing trees, yet large enough to satisfy $T(X) \ge T_0$. We postpone the detailed discussion on the selection of $k$ to Section 4.

##### 3.2. Problem Elaboration

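Before updating the problem definition according to two-stage routing, we define the length of the EMST as
$$\mathrm{EMST}(X) = \sum_{e_{ij} \in \mathcal{T}(X)} \lVert x_i - x_j \rVert,$$
where $\mathcal{T}(X)$ is the edge set of the EMST over $X$, $e_{ij}$ is the edge that connects $x_i$ and $x_j$, and $\lVert \cdot \rVert$ is the Euclidean distance.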
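As a concrete illustration of $\mathrm{EMST}(X)$, the following Python sketch computes the tree with Prim's algorithm on the complete Euclidean graph over the designated nodes, which takes $O(k^2)$ time. It is a generic textbook method chosen for illustration, not necessarily the construction used in the paper; the function names are hypothetical.

```python
import math

def emst_edges(points):
    """Edges of a Euclidean minimum spanning tree (Prim's algorithm, O(k^2))."""
    k = len(points)
    in_tree = [False] * k
    best = [math.inf] * k   # cheapest known connection cost to the tree
    parent = [-1] * k
    if k:
        best[0] = 0.0
    edges = []
    for _ in range(k):
        # Pick the closest vertex not yet in the tree.
        u = min((i for i in range(k) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for w in range(k):
            if not in_tree[w]:
                d = math.dist(points[u], points[w])
                if d < best[w]:
                    best[w], parent[w] = d, u
    return edges


def emst_length(points):
    """EMST(X): total Euclidean length of the tree edges."""
    return sum(math.dist(points[i], points[j]) for i, j in emst_edges(points))


corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(emst_length(corners))  # 3.0
```

For the four corners of a unit square, the tree uses three unit-length sides and skips both diagonals.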

Based on the two-stage routing protocol, we now elaborate the privacy and network efficiency metrics in terms of the EMST and hop counts.

###### 3.2.1. Safety Period Quantified by EMST($X$)

In one reporting period, the number of messages transmitted by each node that is part of the EMST equals the total number of nodes in the network, $N$. Therefore, $\mathcal{A}$ contains all nodes belonging to the EMST. To find the sink physically, the adversary has to search along the EMST. Assume that the adversary can travel at very high speed when he is not performing visual search, so that the time he spends traveling from one location to another can be ignored. Let $v$ denote the adversary's searching speed (area searched per second), and let $R$ be the node communication range. Then, as Figure 2 illustrates, the adversary has to search a band of width $2R$ along the tree, and the searching time is approximately
$$T(X) \approx \frac{2R \cdot \mathrm{EMST}(X)}{v}.$$
For the rest of the paper, we will use EMST as an indicator of the safety period to avoid possible confusion caused by an inappropriately selected $v$.
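The search-time arithmetic can be made concrete with a few one-line functions. This Python sketch assumes the band approximation $T(X) \approx 2R \cdot \mathrm{EMST}(X)/v$ and the single-node bound $\pi R^2 / v$; the numeric range and search rate are hypothetical values chosen only for illustration. Inverting the band formula gives one way to turn a required safety period $T_0$ into an EMST length threshold.

```python
import math

def single_node_search_time(comm_range, search_rate):
    """Upper bound pi * R^2 / v for searching one node's communication range."""
    return math.pi * comm_range ** 2 / search_rate


def emst_search_time(emst_len, comm_range, search_rate):
    """Band approximation T(X) ~ 2 * R * EMST(X) / v (assumed form)."""
    return 2.0 * comm_range * emst_len / search_rate


def emst_threshold(required_period, comm_range, search_rate):
    """Smallest EMST length L0 with emst_search_time(L0, ...) >= T0."""
    return required_period * search_rate / (2.0 * comm_range)


R, v = 10.0, 50.0  # hypothetical range (m) and search rate (m^2/s)
print(round(single_node_search_time(R, v), 2))  # 6.28
print(emst_search_time(500.0, R, v))             # 200.0
print(emst_threshold(200.0, R, v))               # 500.0
```

Under these assumed numbers, stretching the traffic pattern along a 500 m tree raises the search time from about 6 s to 200 s.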

###### 3.2.2. Energy Cost Quantified by Hop Counts

We measure energy cost in units of hop counts. Assume that the average hop size across the network is $r$. Then, in a network of uniformly distributed nodes, the average energy cost of routing a message from $p_i$ to a designated node $x_j$ can be approximated by the hop count [7]:
$$h(p_i, x_j) \approx \frac{\lVert p_i - x_j \rVert}{r}.$$
We note that this energy representation is sufficient to model the energy spent at both the sending and the receiving ends, since we can scale $h$ by a coefficient $c$. The coefficient $c$ can include the energy consumed both as the sender transmits the message and as its neighbors overhear and process the message.

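The average total energy cost $E$ for each sensor node consists of intra-region communication $E_{\text{intra}}$ and inter-region communication $E_{\text{inter}}$. Since every sensor node generates one message per reporting period, the average intra-region energy cost per period per node is
$$E_{\text{intra}}(X, \mathcal{W}) = \frac{1}{N} \sum_{j=1}^{k} \sum_{p_i \in W_j} \frac{\lVert p_i - x_j \rVert}{r}, \tag{12}$$
and, since every message is relayed along the entire EMST, the average inter-region energy cost per period per node is
$$E_{\text{inter}}(X) = \frac{\mathrm{EMST}(X)}{r}.$$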
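The two cost components can be evaluated directly from node positions. This Python sketch follows the hop-count model above, with an optional explicit region assignment; by default each sensor joins its nearest designated node (the Voronoi partition that Lemma 1 shows to be optimal). The function names and the toy one-dimensional layout are hypothetical.

```python
import math

def intra_energy(sensors, designated, hop_size, assign=None):
    """E_intra: average hops from each sensor to its region's designated node.

    assign[i] is the region index of sensor i; by default each sensor joins
    its nearest designated node (the Voronoi partition).
    """
    if assign is None:
        assign = [min(range(len(designated)),
                      key=lambda j: math.dist(p, designated[j]))
                  for p in sensors]
    total = sum(math.dist(p, designated[j]) for p, j in zip(sensors, assign))
    return total / (len(sensors) * hop_size)


def inter_energy(emst_len, hop_size):
    """E_inter: each message is relayed along the whole EMST."""
    return emst_len / hop_size


sensors = [(0.0, 0.0), (4.0, 0.0), (6.0, 0.0), (10.0, 0.0)]
designated = [(2.0, 0.0), (8.0, 0.0)]
e_intra = intra_energy(sensors, designated, hop_size=1.0)  # each sensor is 2 away
e_inter = inter_energy(6.0, hop_size=1.0)                  # EMST = one 6-unit edge
print(e_intra, e_inter, e_intra + e_inter)  # 2.0 6.0 8.0
```

Even in this toy layout the inter-region term dominates, because every message pays for the full tree length while the intra-region distance is averaged over sensors.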

Accordingly, the routing optimization problem defined as Problem 1 can be precisely formulated as follows.

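*Problem 2.*
$$\min_{X,\,\mathcal{W}} \; E_{\text{intra}}(X, \mathcal{W}) + E_{\text{inter}}(X) \quad \text{subject to} \quad \mathrm{EMST}(X) \ge L_0,$$
where $L_0$ is the threshold value that satisfies the safety period requirement $T(X) \ge T_0$.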

##### 3.3. Problem Reduction

Problem 2 defines a nonlinear optimization problem with two variables: the locations of the designated nodes, $X$, and the partition $\mathcal{W}$. Solving such a nonlinear optimization problem is difficult. Thus, in this subsection we focus on reducing the problem to a simpler version.

We observe that the locations of the designated nodes affect both the inter-region energy cost $E_{\text{inter}}$ and the intra-region energy cost $E_{\text{intra}}$, while the partition $\mathcal{W}$ only affects $E_{\text{intra}}$. Thus, we first examine the partitioning principle that minimizes $E_{\text{intra}}$. Intuitively, knowing this principle enables us to solve Problem 2 in two steps: (1) finding the optimal locations of the designated nodes and (2) applying the optimal partition to further reduce $E_{\text{intra}}$.

Next, we present a result showing that, for given locations $X$, the Voronoi partition is the optimal partition for Problem 2.

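Lemma 1. *If $\mathcal{W}^*$ is the global optimum that minimizes $E_{\text{intra}}(X, \mathcal{W})$ for given $X$, then $\mathcal{W}^*$ is the Voronoi partition $\mathcal{V}(X) = \{C_1, \ldots, C_k\}$, where*
$$C_j = \{q \in Q : \lVert q - x_j \rVert \le \lVert q - x_l \rVert,\ \forall\, l \ne j\}.$$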

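*Proof.* We prove the lemma by contradiction. Without loss of generality, we examine the case $k = 2$ shown in Figure 3, and let $x_1$ and $x_2$ be the locations of the two designated nodes. The solid line in the middle of the network region represents the Voronoi partition $\mathcal{V}(X) = \{C_1, C_2\}$; it perpendicularly bisects the line segment connecting $x_1$ and $x_2$. Suppose that the optimal partition $\mathcal{W} = \{W_1, W_2\}$ that minimizes $E_{\text{intra}}$, shown by the dashed line, differs from $\mathcal{V}(X)$ and is strictly better. Then
$$E_{\text{intra}}(X, \mathcal{W}) < E_{\text{intra}}(X, \mathcal{V}(X)),$$
that is, for $k = 2$ and after dropping the common positive factor $1/(Nr)$,
$$\sum_{p_i \in W_1} \lVert p_i - x_1 \rVert + \sum_{p_i \in W_2} \lVert p_i - x_2 \rVert < \sum_{p_i \in C_1} \lVert p_i - x_1 \rVert + \sum_{p_i \in C_2} \lVert p_i - x_2 \rVert. \tag{17}$$
Let $\chi_S(p)$ denote the characteristic function of $p$ with regard to a set $S$; that is,
$$\chi_S(p) = \begin{cases} 1, & p \in S, \\ 0, & p \notin S. \end{cases}$$
Then, (17) is equivalent to
$$\sum_{i \in \mathcal{I}} \Big[ \big(\chi_{W_1}(p_i) - \chi_{C_1}(p_i)\big) \lVert p_i - x_1 \rVert + \big(\chi_{W_2}(p_i) - \chi_{C_2}(p_i)\big) \lVert p_i - x_2 \rVert \Big] < 0. \tag{19}$$
Each $p_i$ falls into one of the following four cases, and by the definition of the Voronoi partition the bracketed term in (19) satisfies: (1) $p_i \in W_1$ and $p_i \in C_1$: the term equals $0$; (2) $p_i \in W_1$ and $p_i \in C_2$: the term equals $\lVert p_i - x_1 \rVert - \lVert p_i - x_2 \rVert \ge 0$; (3) $p_i \in W_2$ and $p_i \in C_1$: the term equals $\lVert p_i - x_2 \rVert - \lVert p_i - x_1 \rVert \ge 0$; (4) $p_i \in W_2$ and $p_i \in C_2$: the term equals $0$. Combining the four cases, we have
$$\sum_{i \in \mathcal{I}} \Big[ \big(\chi_{W_1}(p_i) - \chi_{C_1}(p_i)\big) \lVert p_i - x_1 \rVert + \big(\chi_{W_2}(p_i) - \chi_{C_2}(p_i)\big) \lVert p_i - x_2 \rVert \Big] \ge 0,$$
which contradicts (19). Thus, the optimal partition is the Voronoi partition.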
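Lemma 1 can be sanity-checked numerically on a tiny instance by enumerating every possible assignment of sensors to designated nodes. This Python sketch uses random positions in the unit square with a fixed seed; the instance sizes and function names are hypothetical, and brute force is only feasible because $k^n$ stays small here.

```python
import itertools
import math
import random

def assignment_cost(sensors, designated, assign):
    """Sum of distances from each sensor to the designated node it is assigned to."""
    return sum(math.dist(p, designated[j]) for p, j in zip(sensors, assign))


def voronoi_assignment(sensors, designated):
    """Assign each sensor to its nearest designated node."""
    return [min(range(len(designated)), key=lambda j: math.dist(p, designated[j]))
            for p in sensors]


def brute_force_check(n_sensors=8, k=2, seed=1):
    """Compare the Voronoi cost with the best cost over all k**n assignments."""
    rng = random.Random(seed)
    sensors = [(rng.random(), rng.random()) for _ in range(n_sensors)]
    designated = [(rng.random(), rng.random()) for _ in range(k)]
    best = min(assignment_cost(sensors, designated, a)
               for a in itertools.product(range(k), repeat=n_sensors))
    voronoi = assignment_cost(sensors, designated,
                              voronoi_assignment(sensors, designated))
    return voronoi, best


voronoi, best = brute_force_check()
print(math.isclose(voronoi, best))  # True
```

The check mirrors the proof's case analysis: the nearest-node choice minimizes every term separately, so no other assignment can have a smaller sum.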

For the rest of the paper, we will use the following notation:
$$E_{\text{intra}}(X) \triangleq E_{\text{intra}}(X, \mathcal{V}(X)).$$

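Additionally, to reflect the fact that $E_{\text{inter}}$ depends on the EMST, we restate Problem 2 as follows.

*Problem 3.*
$$\min_{X} \; E(X) = E_{\text{intra}}(X) + \frac{\mathrm{EMST}(X)}{r} \quad \text{subject to} \quad \mathrm{EMST}(X) \ge L_0.$$
As a result, the set of variables for the routing optimization problem has been reduced to $X$, the positions of the designated nodes.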

#### 4. Quasi-Optimal Solutions

Solving Problem 3 gives us the optimal solution of *k*-anonymity, that is, the positions of $k$ designated nodes that minimize the total routing energy and guarantee the safety period requirement. However, solving Problem 3 is challenging. First, Problem 3 is related to the problem of finding a set of points in a constrained planar region whose Euclidean minimum spanning tree has a given length. To the best of our knowledge, such a problem has not been addressed in the literature so far, and it is unknown whether it is NP-hard. Second, Problem 3 seeks optimized locations for an energy cost function subject to an EMST constraint, which creates further difficulties.

Popular methods for solving nonlinear optimization problems, such as the generalized reduced gradient method [19], are inapplicable to Problem 3, because they leverage the first or second derivatives of the objective function to search for the optimal solution, and the derivative of EMST is complicated to formulate. Exhaustively searching every conceivable position of the designated nodes is computationally infeasible. To tackle the problem, we first analyze Problem 3 by finding an $X$ that minimizes $E_{\text{intra}}$ using genetic algorithms (GAs) and then propose quasi-optimal algorithms to obtain a solution approximating the optimal one.

To facilitate discussion, we summarize the notation convention of optimal solutions to Problem 3 and its reduced subproblems in Table 1.

##### 4.1. Minimizing $E_{\text{intra}}$

The objective function $E(X)$ consists of two components, $E_{\text{intra}}$ and $E_{\text{inter}}$, and we start by searching for an $X$ that minimizes the first component $E_{\text{intra}}$, namely, by solving the following problem:

*Problem 4.*
$$\min_{X \in Q^k} \; E_{\text{intra}}(X).$$

Problem 4 is still a nonlinear optimization problem whose objective function has a derivative that is difficult to calculate. We choose to exploit widely adopted genetic algorithms (GAs) to find the optimal solution. A GA mimics Darwin's theory of evolution. It iteratively generates a set of solutions, known as a population, and selects a subset of solutions to form a new population based on each solution's "fitness." The fitness of a solution can be evaluated using the objective function of the optimization problem. "Fitter" solutions are selected with higher probability, while "weaker" solutions still have a chance to be selected. As a result, a GA is likely to escape from local optima and evolve toward the global optimum with high probability. Thus, we refer to the solutions obtained by GA as optimal solutions.

We refer to our customized genetic algorithm that searches for optimal solutions of Problem 4 as GA4(k). We built GA4(k) using the Matlab toolbox GAtool and searched for optimal designated-node locations in a 2500-node network deployed in a square with uniform density. The node communication range was set to , which resulted in an average hop size of . We constructed the "chromosome" as $X = (x_1, \ldots, x_k)$, that is, the coordinates of the $k$ designated nodes, and performed multiple runs of experiments while varying $k$. For each $k$, we ran the experiments about 10 times, and we set the population size to approximately , the crossover fraction to 0.8, and the maximum number of generations to 100. Figure 4 shows typical patterns of the optimal designated-node positions $X^*_k$ that minimize $E_{\text{intra}}$, and the corresponding $\mathrm{EMST}(X^*_k)$, for selected values of $k$.
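To make the GA workflow concrete, here is a generic genetic-algorithm sketch for Problem 4 in pure Python. It is not the Matlab GAtool implementation behind GA4(k): the operators (binary tournament selection, uniform crossover over designated nodes, clipped Gaussian mutation, and keeping the two best layouts) are assumptions, and only the crossover fraction of 0.8 and the 100-generation default follow the text. Fitness is the average nearest-designated-node distance, which is proportional to $E_{\text{intra}}$.

```python
import math
import random

def mean_nearest_distance(sensors, layout):
    """Average distance from each sensor to its nearest designated node."""
    return sum(min(math.dist(p, x) for x in layout) for p in sensors) / len(sensors)


def ga_minimize_intra(sensors, k, side, pop_size=40, generations=100,
                      crossover_fraction=0.8, sigma=None, seed=0):
    """Generic GA sketch for Problem 4 in the square [0, side]^2.

    A chromosome is a list of k (x, y) points; lower fitness is better.
    """
    rng = random.Random(seed)
    sigma = 0.05 * side if sigma is None else sigma

    def clip(c):
        return min(max(c, 0.0), side)

    def fitness(layout):
        return mean_nearest_distance(sensors, layout)

    def tournament(scored):
        # Binary tournament: weaker layouts still win sometimes.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    population = [[(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(k)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((fitness(x), x) for x in population), key=lambda s: s[0])
        next_pop = [scored[0][1], scored[1][1]]  # elitism: keep the two best
        while len(next_pop) < pop_size:
            parent = tournament(scored)
            if rng.random() < crossover_fraction:
                # Uniform crossover: each designated node comes from one parent.
                other = tournament(scored)
                child = [parent[j] if rng.random() < 0.5 else other[j]
                         for j in range(k)]
            else:
                # Gaussian mutation, clipped to stay inside the square.
                child = [(clip(x + rng.gauss(0, sigma)), clip(y + rng.gauss(0, sigma)))
                         for x, y in parent]
            next_pop.append(child)
        population = next_pop
    best = min(population, key=fitness)
    return best, fitness(best)


grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
layout, d_bar = ga_minimize_intra(grid, k=2, side=1.0, pop_size=30, generations=50)
print(len(layout), round(d_bar, 3))
```

Keeping the two best layouts unchanged makes the best fitness non-increasing across generations. Tournament selection still lets weaker layouts reproduce occasionally, matching the intuition described above.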

*Remark 2.* From Figure 4, we observe that in each optimal layout the designated nodes are distributed almost uniformly across the network, and the network area is partitioned into $k$ regions of similar sizes. This observation can be intuitively explained by rewriting (12) as
$$E_{\text{intra}}(X) = \frac{\bar d(X)}{r},$$
where $\bar d(X)$ is the average distance between each sensor node and its nearest designated node. To minimize $E_{\text{intra}}$, the designated nodes have to be deployed such that $\bar d(X)$ is minimized.

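*Remark 3.* We depict $E_{\text{intra}}(X^*_k)$, $E_{\text{inter}}(X^*_k)$, and $E(X^*_k)$ in Figure 5 and $\mathrm{EMST}(X^*_k)$ in Figure 6, which show that both $E_{\text{inter}}$ and EMST increase with $k$ while $E_{\text{intra}}$ decreases with $k$. Intuitively, as the number of partitioned regions increases, the average distance between a sensor node and its nearest designated node decreases, and so does $E_{\text{intra}}$. However, increasing $k$ causes the designated nodes to spread out further and thus increases EMST. A slight change of EMST causes a larger change in $E_{\text{inter}}$ than in $E_{\text{intra}}$: a change $\Delta$ in EMST changes $E_{\text{inter}}$ by $\Delta / r$, because every message traverses the entire EMST, whereas its effect on $E_{\text{intra}}$ is amortized among all $N$ nodes. Thus, as $k$ increases, $E_{\text{inter}}$ grows quickly and soon exceeds $E_{\text{intra}}$.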

To estimate the relationship between EMST and $k$, we performed a regression analysis on the empirical results of $\mathrm{EMST}(X^*_k)$ and $k$. Rather than fitting a polynomial, we construct the regression function according to Remark 2: the network area is very likely to be partitioned into regions of similar sizes, and the distances between every two neighboring designated nodes (two designated nodes connected by an edge in the EMST) are roughly the same. Let $\bar l$ be the average distance between neighboring designated nodes. Since an EMST over $k$ nodes has $k - 1$ edges,
$$\mathrm{EMST}(X^*_k) \approx (k - 1)\, \bar l.$$
Additionally, we can use a disk of radius $\bar l / 2$ to approximate the area of each region, so that
$$k \pi \left(\frac{\bar l}{2}\right)^2 = \alpha S_Q,$$
where $S_Q$ is the area of the square and $\alpha$ is a coefficient describing how closely the disk approximates each region on average. Thus, the length of the EMST can be estimated by
$$\mathrm{EMST}(X^*_k) \approx 2(k - 1)\sqrt{\frac{\alpha S_Q}{k \pi}}. \tag{27}$$
Our regression analysis determined the value of $\alpha$ that minimizes the fitting error. As shown in Figure 6, the comparison between the EMST estimated with this $\alpha$ and the empirical one obtained by GA shows that the regression line is a close fit.
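The regression estimate of the EMST length can be evaluated directly. This Python sketch assumes the form $\mathrm{EMST}(X^*_k) \approx 2(k-1)\sqrt{\alpha S_Q / (k\pi)}$ derived above. The fitted value of $\alpha$ is not reproduced here, so `alpha=1.0` is only a placeholder, and the 100 m by 100 m square is a hypothetical example.

```python
import math

def estimated_emst(k, area, alpha):
    """Estimate EMST(X*_k) ~ (k-1) * l_bar, where k*pi*(l_bar/2)^2 = alpha*area."""
    if k < 2:
        return 0.0
    l_bar = 2.0 * math.sqrt(alpha * area / (k * math.pi))
    return (k - 1) * l_bar


# alpha is a fitted coefficient; 1.0 here is only a placeholder.
for k in (2, 4, 8, 16):
    print(k, round(estimated_emst(k, area=100.0 ** 2, alpha=1.0), 1))
```

Because $(k-1)/\sqrt{k}$ grows strictly with $k$, this estimate increases monotonically in $k$. That property is what allows a single $k$ to be matched to a target EMST length.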

##### 4.2. GA-Based Quasi-Optimal Algorithm

Analyzing Problem 4 with GA provides important insights towards solving the original routing optimization problem defined in Problem 3. In this subsection, we introduce a GA-based quasi-optimal (GAQO) algorithm that obtains an approximately optimal solution for Problem 3. In particular, the GAQO algorithm provides a quasi-optimal solution to the following problem:

*Problem 5.*
$$\min_{X \in Q^k} \; E_{\text{intra}}(X) \quad \text{subject to} \quad \mathrm{EMST}(X) = L_0.$$

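We will show empirically that the quasi-optimal solutions for Problem 5 closely approximate the solutions for Problem 3. Intuitively, according to Remark 3, a slight change of EMST causes a larger increase of $E_{\text{inter}}$ than the corresponding decrease of $E_{\text{intra}}$. Thus, our approach is to keep $E_{\text{inter}}$, and hence EMST, as small as the constraint allows. Note that, under the constraint $\mathrm{EMST}(X) \ge L_0$, $E_{\text{inter}} = \mathrm{EMST}(X)/r$ achieves its minimum when $\mathrm{EMST}(X) = L_0$. Thus, ensuring that $\mathrm{EMST}(X) = L_0$ will produce a solution approximating the optimal solution for Problem 3.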

###### 4.2.1. Approximation Evaluation Metric

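To evaluate how closely the solutions obtained by the GAQO algorithm approximate the optima, we define the approximation evaluation metric $\varepsilon$ as the energy difference between the GAQO solution $\tilde X$ and the optimum $X^*_{L_0}$ of Problem 3:
$$\varepsilon = E(\tilde X) - E(X^*_{L_0}).$$
We will show that $\varepsilon$ is bounded by the difference between the intra-region energy of $\tilde X$ and that of $X^*_k$, the optimal layout of Problem 4 for the same $k$:
$$0 \le \varepsilon \le E_{\text{intra}}(\tilde X) - E_{\text{intra}}(X^*_k). \tag{39}$$
We now justify (39) by proving the following lemma.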

Lemma 4. *.*

*Proof. *(Second inequality.) By definition, for a given , is the global optimum which minimizes , so .

(First inequality.) For a given , minimizes . Thus, . Additionally, by definition, EMST and EMST. Thus,
Combining both facts, we conclude that . Therefore, the lemma is proved.

###### 4.2.2. Algorithm Walk-Through

Searching for the optimum of Problem 4 using GA provides useful insights. (We did not apply GA to Problem 5 directly, because the constraint on EMST makes it prohibitively time-consuming to obtain a feasible solution.) In particular, for a given , if the required happens to equal EMST, then is the global optimum for Problem 3; that is, . We hypothesize that the optimal solutions for different threshold values vary continuously, and we design our GA-based quasi-optimal (GAQO) algorithm with the steps shown in Algorithm 1:

*Step 1. *Call Closest_EMST to find whose EMST is closest to the given , according to (27).

*Step 2. *For the given , find an optimal layout for Problem 4 using genetic algorithm GA4(k).

*Step 3. *Shrink or expand with regard to the center of the network area until EMST. Let the center of be the origin of the coordinate, and let . Then .

We note that the aforementioned GAQO algorithm, though not optimal, does approximate optimal solutions.
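Step 3 relies on the fact that scaling every coordinate about a fixed center by a factor $s$ multiplies all pairwise distances, and hence the EMST length, by $s$ while leaving the tree structure unchanged. The sketch below illustrates this under that assumption: it computes the EMST length with Prim's algorithm and rescales a layout to a required length. `emst_length` and `scale_to_length` are hypothetical helper names rather than functions from the paper, and GA4(k) itself is not reproduced.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def emst_length(points: List[Point]) -> float:
    """Total edge length of the Euclidean minimum spanning tree (Prim's algorithm, O(k^2))."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each node to the growing tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return total


def scale_to_length(points: List[Point], center: Point, target_length: float) -> List[Point]:
    """Shrink or expand the layout about `center` so that its EMST length equals target_length."""
    factor = target_length / emst_length(points)
    cx, cy = center
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in points]
```

This sketch does not check whether an expanded layout stays inside the network area; a practical version would need to handle that case.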

*Example 5. *Figure 7 illustrates how the GAQO algorithm achieves k-anonymity for a given safety period. We use the same sensor network parameters described in Section 4.1 and set the required safety period . In the first step, based on (27), GAQO concluded that the closest EMST is obtained when $k = 7$. Then, GAQO used the genetic algorithm GA4(k) to search for the optimal positioning of the 7 designated nodes. An example layout for $k = 7$ is denoted by the red markers in Figure 7. Since the EMST of this layout was longer than the required length, GAQO shrank it to the quasi-optimal layout of the designated nodes, marked by the blue markers in Figure 7.

###### 4.2.3. Evaluation

To evaluate how closely the solutions obtained by the GAQO algorithm approximate the optimal solution, we performed an empirical study. In particular, we used the same network setup as before and searched for quasi-optimal solutions in a 2500-node network deployed in the square. We varied the constraint of Problem 5 by changing the required length of EMST. To capture the statistical behavior of GAQO, for each EMST value we ran the algorithm at least 10 times over randomly generated network topologies and calculated the upper bound of the difference between the quasi-optimal solution and the global optimal solution, that is, . The plot in Figure 8 confirms that, for the quasi-optimal solution obtained by the GAQO algorithm, approaches as increases.

##### 4.3. Artificial Potential-Based Quasi-Optimal Algorithm

The GAQO algorithm can obtain quasi-optimal solutions of the k-anonymity sink-location problem. However, our simulation study shows that the run time of GA4(k), that is, the algorithm that uses a genetic algorithm to search for the layout that minimizes , increases quadratically as the constraint increases. To solve the k-anonymity sink-location problem efficiently, we design an artificial potential-based algorithm named AP4(k) to replace GA4(k), and we call the resulting quasi-optimal algorithm the APQO algorithm.

Artificial potential (AP) [20] (also known as artificial physics in some literature, as opposed to natural physics) was originally developed for obstacle avoidance. It was later used as a distributed control strategy to solve self-deployment problems in WSNs. The approach is simple: each entity exerts forces on nearby entities and responds to the forces they exert, yet a uniform distribution eventually emerges. Since the approach is largely independent of the number of entities, it scales well to large sets of entities. We take advantage of the linear time complexity of an AP-based method to solve the k-anonymity sink-location problem, since searching for the optimal positions of the designated nodes is equivalent to deploying nodes uniformly across the network (according to Remark 2).

We built our APQO algorithm on the AP-based self-deployment algorithm proposed by Ding et al. [21], whereby sensors are deployed into uniform lattices inside a bounded region. We start by assuming the designated nodes can move to any position inside the network area and we denote the aggregate position vector of mobile nodes. Once the AP-based algorithm converges and finds the final position , we select those sensor nodes that are closest to to be the designated nodes.

###### 4.3.1. AP Definition

Two types of artificial potential functions are defined for every node : , which is the potential between node and another node (), and , which is the potential between node and the boundary. The artificial potential has the following characteristics. When node is located close to another node or to the boundary, the potential is high and has a tendency to push node away. When node is very far away from another node or the boundary, the potential reduces to zero. is defined as where is the distance between these two mobile nodes and is the effective radius of the potential.

We define as the potential between mobile node and the nearest point on the boundary , where is the set of all the nearest points, and , being the set of all points on the boundary. We note that may not be a singleton. For example, when the node lies on a diagonal of the square (other than at its center), there are two nearest points, one on each of the two closest edges. is defined as where and is the effective radius of the boundary potential. Here we set .
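To make the non-uniqueness concrete, the following hypothetical helper returns every nearest boundary point, assuming the network area is the square $[0, \text{side}]^2$ and the node lies strictly inside it. A point on a diagonal yields two nearest points, and the center yields four.

```python
from typing import List, Tuple


def nearest_boundary_points(
    x: float, y: float, side: float, tol: float = 1e-12
) -> List[Tuple[float, float]]:
    """All points on the boundary of [0, side]^2 closest to the interior point (x, y)."""
    # Distance from (x, y) to each edge, paired with the foot of the perpendicular on that edge.
    edges = [
        (x, (0.0, y)),          # left edge,   x = 0
        (y, (x, 0.0)),          # bottom edge, y = 0
        (side - x, (side, y)),  # right edge,  x = side
        (side - y, (x, side)),  # top edge,    y = side
    ]
    d_min = min(d for d, _ in edges)
    # Keep every edge whose distance ties with the minimum.
    return [foot for d, foot in edges if d - d_min <= tol]
```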

Figure 9 plots each potential against the corresponding distance, between and and between and ; both exhibit the desired characteristics.

In addition, we define the total potential as

Distributing nodes approximately uniformly inside the network area is equivalent to finding the positions that minimize the total potential : We use the gradient descent method to find the minimum of and define the following position update scheme for mobile node : that is, we let each mobile node move along the negative gradient to reduce the total potential .
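The paper's exact potential functions are not reproduced in the sketch below. Instead it substitutes an assumed function with the same qualitative shape, $U(d) = (R/d - 1)^2$ for $d < R$ and $0$ otherwise. This function grows without bound as $d \to 0$ and vanishes beyond the effective radius, and it is used for both the inter-node and the boundary potential. The sketch then runs plain gradient descent on the total potential, counting each pair once. The step size, tolerance, iteration cap, and all helper names are illustrative assumptions. Ties between nearest boundary points are broken arbitrarily, and nodes are assumed to stay strictly inside the square.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _pot(d: float, radius: float) -> float:
    """Assumed potential (radius/d - 1)^2 inside the effective radius, 0 outside."""
    return (radius / d - 1.0) ** 2 if d < radius else 0.0


def _dpot(d: float, radius: float) -> float:
    """Derivative of _pot with respect to the distance d."""
    if d >= radius:
        return 0.0
    return -2.0 * (radius / d - 1.0) * radius / (d * d)


def nearest_boundary(p: Point, side: float) -> Tuple[float, Point]:
    """(distance, foot point) to the nearest edge of [0, side]^2; ties broken by list order."""
    x, y = p
    options = [(x, (0.0, y)), (y, (x, 0.0)), (side - x, (side, y)), (side - y, (x, side))]
    return min(options, key=lambda o: o[0])


def total_potential(points: List[Point], side: float, r: float, r_b: float) -> float:
    """Sum of pairwise potentials (each pair once) plus each node's boundary potential."""
    u = 0.0
    for i, p in enumerate(points):
        for q in points[i + 1:]:
            u += _pot(math.dist(p, q), r)
        u += _pot(nearest_boundary(p, side)[0], r_b)
    return u


def gradient(points: List[Point], side: float, r: float, r_b: float) -> List[Point]:
    """Gradient of total_potential with respect to each node's position."""
    grads = []
    for i, p in enumerate(points):
        gx = gy = 0.0
        # Repelling sources: every other node (radius r) and the nearest boundary point (radius r_b).
        sources = [(q, r) for j, q in enumerate(points) if j != i]
        sources.append((nearest_boundary(p, side)[1], r_b))
        for q, radius in sources:
            d = math.dist(p, q)
            coef = _dpot(d, radius) / d  # chain rule: dU/dd * (p - q) / d
            gx += coef * (p[0] - q[0])
            gy += coef * (p[1] - q[1])
        grads.append((gx, gy))
    return grads


def ap_descent(points, side, r, r_b, step=0.01, tol=1e-6, max_iters=20000):
    """Move every node against its gradient until the largest gradient norm drops below tol."""
    pts = list(points)
    for _ in range(max_iters):
        g = gradient(pts, side, r, r_b)
        if max(math.hypot(gx, gy) for gx, gy in g) < tol:
            break
        pts = [(x - step * gx, y - step * gy) for (x, y), (gx, gy) in zip(pts, g)]
    return pts
```

For example, take two nodes started 1 unit apart near the center of a 10 × 10 square with $R = 4$ and $R_b = 2$. They drift apart until they are about $R$ apart, at which point the repulsive force between them vanishes.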

###### 4.3.2. Algorithm Walk-Through

Overall, APQO follows the same framework as GAQO, shown in Algorithm 1. For a given , the function Closest_EMST() returns whose corresponding EMST is closest to , according to the line fitting equation (27). Unlike GAQO, APQO uses the AP-based function AP4(k) to find the quasi-optimal layout that minimizes . Like GAQO, APQO also shrinks or expands with regard to the center of until EMST; that is, .

We list the pseudocode of AP4(k) in Algorithm 2; it consists of the following steps.

*Step 1. *Initialize the locations of the nodes to be around the center of the network square without overlapping.

*Step 2. *Obtain the gradients , and update the location vector according to the gradients and the step size (a small constant we choose) iteratively until convergence. Denote the converged position as .

*Step 3. *Select the sensor nodes that are closest to to be the designated nodes, and denote their positions by .
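Steps 1 and 3 can be sketched as follows. Both helpers are hypothetical. The small grid used for a non-overlapping initial placement and its `spacing` parameter are assumptions. When two converged positions share the same closest sensor, the sketch resolves the conflict greedily so that distinct sensors are selected; the paper does not specify this detail.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def initial_layout(k: int, side: float, spacing: float) -> List[Point]:
    """Step 1 (sketch): place k nodes on a small grid around the center, `spacing` apart."""
    cols = math.ceil(math.sqrt(k))
    rows = math.ceil(k / cols)
    c = side / 2.0
    x0 = c - spacing * (cols - 1) / 2.0
    y0 = c - spacing * (rows - 1) / 2.0
    return [(x0 + spacing * (i % cols), y0 + spacing * (i // cols)) for i in range(k)]


def snap_to_sensors(targets: List[Point], sensors: List[Point]) -> List[int]:
    """Step 3 (sketch): index of the closest not-yet-chosen sensor for each converged position."""
    chosen: List[int] = []
    taken = set()
    for t in targets:
        best = min(
            (i for i in range(len(sensors)) if i not in taken),
            key=lambda i: math.dist(t, sensors[i]),
        )
        taken.add(best)
        chosen.append(best)
    return chosen
```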

We use the following lemma to show that the AP4(k) algorithm must converge.

Lemma 6. *The AP-based algorithm is convergent; that is, asymptotically approaches the location where .*

*Proof. *Taking the derivative of , we obtain
Therefore, and is bounded for . Further, note from (33) that tends to if approaches 0. Thus, the boundedness of implies that never becomes 0, so the node remains inside the network region at all times.

Let . Then by LaSalle’s invariance principle [22], the trajectory converges to the largest invariant set in , which completes the proof.

###### 4.3.3. Evaluation

As with the GAQO algorithm, we define an approximation evaluation metric , which is bounded by the difference between the intra-region energy of and : To evaluate the APQO algorithm, we performed an empirical study using the same network setup as before: a 2500-node network deployed in the square. Figure 10 shows the result: for the quasi-optimal solution obtained by the APQO algorithm, approaches as increases.

Additionally, the steady-state locations of the designated nodes , obtained by AP4(k), are affected by the value of . If is small and disks (with a radius of ) are not enough to fill the region , then in the steady state each designated node is at least away from its nearest designated node [23]. In contrast, if is large and disks are more than enough to fill the region, the distance between every pair of nearest designated nodes in the steady state is less than . For a given , to ensure that the length of the EMST obtained by AP4(k) is similar to the one obtained by GA4(k), we set to , the average distance between neighboring designated nodes given by the empirical equation (27). We also adopted the same setup as for the GAQO algorithm evaluation and used the same topologies to evaluate the APQO algorithm.

*Performance Comparison*

The EMST lengths obtained using GA4(k) and AP4(k) are presented in Figure 11(a), and the locations derived by AP4(k) for various are shown in Figure 12. We note that the EMSTs shown in Figure 12 look slightly different from those obtained via GA4(k) (shown in Figure 4). This is because the designated nodes are scattered roughly evenly across the network, so a slight variation in their locations causes the EMST to use edges connecting different pairs of nodes. However, the numerical results show that AP4(k) obtains EMSTs of similar length to those derived by GA4(k). Further, as shown in Figure 11(b), for a given , the total energy obtained by the APQO algorithm closely matches that derived by the GAQO algorithm, which indicates that the APQO algorithm can also obtain quasi-optimal solutions for Problem 3.

*Time Complexity Comparison*

Since most of the run-time of the GAQO and APQO algorithms is spent executing GA4(k) and AP4(k), we measure the run-time of GA4(k) and AP4(k) only. We tested both GA4(k) and AP4(k) on a computer equipped with a 2.1 GHz AMD dual-core CPU and 3 GB RAM and plotted the run-time of the two algorithms for varying in Figure 11(c). Figure 11(c) shows that the run-time of GA4(k) increases quickly as increases, while the run-time of AP4(k) remains short. This is because the time complexity of GA4(k) is , where is the total number of nodes in the network, whereas the time complexity of AP4(k) is .

GA4(k) involves computing multiple generations, and each generation has a population size of . Computing the fitness function for each individual requires calculating the distance between the designated nodes and all network nodes. Since the number of generations is at most 1000 in our simulation, the time complexity of GA4(k) is . In comparison, each iteration of AP4(k) only involves updating the locations . Since the total number of iterations is independent of the number , the time complexity of AP4(k) is . In our simulation, AP4(k) converged in about 1 s to 5 s. Thus, APQO performs better than GAQO as the number of nodes in the network increases.

*k-Anonymity Evaluation*

We evaluated how effectively the EMST-based routing protocol changes the traffic pattern around the sink. Let the node closest to the sink be . We are interested in the number of nodes exhibiting the same traffic statistics as . Denote by the number of nodes whose traffic volume (3) is the same as that of , and by the number of nodes at which the same number of messages terminate (4) as at . Figure 13 shows the trend of and as and increase. It indicates that the EMST-based two-phase routing algorithm can effectively hide the location of the sink: almost all nodes in the network have the same as , and many more network nodes other than the designated nodes forward the same amount of traffic as .
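Both counts follow directly from per-node statistics. The minimal sketch below assumes the traffic volume (3) and the number of messages terminating at each node (4) have already been measured and stored in dictionaries keyed by node ID. The function and parameter names are hypothetical (`ref` is the node closest to the sink), and the counts include the reference node itself.

```python
from typing import Dict, Hashable, Tuple


def anonymity_counts(
    volume: Dict[Hashable, int], ended: Dict[Hashable, int], ref: Hashable
) -> Tuple[int, int]:
    """Count nodes matching the reference node's traffic volume and its number of terminating messages."""
    n_volume = sum(1 for v in volume.values() if v == volume[ref])
    n_ended = sum(1 for e in ended.values() if e == ended[ref])
    return n_volume, n_ended
```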

#### 5. Related Work

Protecting the identity of traffic sources has been extensively studied in the context of general networks, where a series of intermediate mixes and onion routing [24] were proposed to counter traffic analysis. The problem of tracking users' paths in wireless networks with location-oriented services was studied by Gruteser and Grunwald [25] and Hoh and Gruteser [26], who proposed a path perturbation algorithm to increase source-location anonymity. Since sensor networks have constrained resources, these methods are not applicable to them.

In the context of wireless sensor networks, both source-location privacy and sink-location privacy have attracted attention from the research community. Source location privacy focuses on protecting the message source, since such information can reveal sensitive position information of the target that is close to the message source. Preserving source-location privacy against a local adversary was first studied by Kamat et al. [2], where fake message injection and phantom routing are proposed to prevent a local eavesdropper from discovering the message source through hop-by-hop traces.

The problem of preserving source-location privacy under a global eavesdropper has been studied extensively [1, 4, 27, 28]. Mehta et al. [4] proposed periodic collection and source simulation techniques to prevent the leakage of the message source location, and Yang et al. [1] introduced dummy traffic to hide the real message source. Ouyang et al. [27] devised a set of privacy-preserving algorithms involving sending periodic maintainable messages to address a laptop-class attacker who has a longer radio range and can eavesdrop on all communications in a sensor network. Shao et al. [28] proposed the notion of statistically strong source anonymity and a strategy called FitProbRate that achieves it with reduced real event report latency.

In the area of enhancing sink-location privacy, Deng et al. [9] showed that traffic analysis can reveal the location of sinks and proposed several anti-traffic-analysis countermeasures to hide the direction of data flow and create fake sink locations that exhibit artificially high traffic. In their follow-up work [5], multiple-parent routing, controlled random walk, random fake paths, and combinations of all three routing algorithms were studied to generate randomness against traffic-rate monitoring and traffic-path-direction attacks. Location privacy routing (LPR) [3] uses probabilistic routing and fake message injection to prevent an adversary from tracking the direction of traffic flow. Conner et al. [29] proposed the decoy sink protocol, whereby data are forwarded to a decoy sink for aggregation before they are relayed to the real sink. As a result, the traffic volume near the sink is reduced while decoy sinks exhibit high traffic volume, which makes traffic analysis attacks difficult. Liu and Xu [7] presented a zeroing-in attack that can be launched by resource-constrained adversaries and proposed a random walk-based defense strategy. Gu et al. [6] proposed a privacy-preserving scheme that obfuscates the sink's location with dummy sink nodes and can help secure existing mobility control protocols against attacks. However, these strategies cannot cope with a global adversary.

To deal with global adversaries, Ngai [8] proposed randomized routing with hidden address (RRHA), whereby packets are routed from the source to the sink along a random path and the destination field is not included in the packet header. Such a routing protocol does provide sink anonymity, but packets may never reach the sink. Additionally, Nezhad et al. [10] designed an anonymous routing protocol to preserve sink-location privacy against a global adversary. However, their global adversaries are capable only of packet-tracing attacks, not traffic-analysis attacks. In this paper, we focus on enhancing sink-location privacy against a global adversary capable of both attacks, while assuring that messages arrive at the sink.

Artificial potential was originally developed by Khatib [20] for obstacle avoidance. It was later used as a distributed control strategy for a large number of entities to achieve certain geometric configurations, such as in coverage and connectivity problems of WSNs [21, 30] and formation and flocking problems of collective artificial agents [31]. Since the approach is largely independent of the size and number of entities, the results scale well to larger sets of entities. We take advantage of the linear time complexity of this method to solve the nonlinear optimization problem that defines the k-anonymity sink-location problem.

#### 6. Concluding Remarks

Wireless sensor networks rely on the sink to collect measurements across the entire network; thus it is essential to protect the location information of the sink. However, the traffic around the sink typically exhibits distinctive patterns, and an adversary with a global view can identify the location of the sink by measuring the traffic statistics of the entire network. In this study, we addressed this threat by proposing an EMST-based two-phase routing algorithm that achieves k-anonymity of the sink. In particular, the network is partitioned into regions, each containing one designated node. Messages are first delivered to one designated node and then forwarded along the EMST that interconnects all the other designated nodes. The two-phase routing algorithm can effectively create many entities that exhibit the same traffic pattern as the nodes located close to the sink.

The positioning of designated nodes affects two conflicting goals, the routing energy cost and the privacy level of the sink's location, and thus we formulated it as a nonlinear optimization problem. To tackle this problem, we first used a genetic algorithm to search for quasi-optimal solutions and developed a genetic algorithm-based quasi-optimal (GAQO) algorithm that obtains solutions closely approximating the global optima. Further motivated by the observation that the quasi-optimal solution partitions the network into areas of similar sizes, we designed an artificial potential-based quasi-optimal (APQO) algorithm that also obtains a quasi-optimal positioning of nodes, with significantly reduced run-time. Our simulation results validated that both algorithms can effectively derive positions of designated nodes that meet the privacy requirement at the minimum routing energy cost.

#### Acknowledgments

The authors thank Dr. Jianjun Hu for his feedback on the genetic algorithms. This work is partially supported by the National Science Foundation Grant CNS-0845671.

#### References

- Y. Yang, M. Shao, S. Zhu, B. Urgaonkar, and G. Cao, "Towards event source unobservability with minimum network traffic in sensor networks," in *Proceedings of the 1st ACM Conference on Wireless Network Security (WiSec '08)*, pp. 77–88, ACM, 2008.
- P. Kamat, Y. Zhang, W. Trappe, and C. Ozturk, "Enhancing source-location privacy in sensor network routing," in *Proceedings of the 25th IEEE International Conference on Distributed Computing Systems (ICDCS '05)*, pp. 599–608, IEEE Computer Society, 2005.
- Y. Jian, S. Chen, Z. Zhang, and L. Zhang, "Protecting receiver-location privacy in wireless sensor networks," in *Proceedings of the 26th IEEE International Conference on Computer Communications (INFOCOM '07)*, pp. 1955–1963, 2007.
- K. Mehta, D. Liu, and M. Wright, "Location privacy in sensor networks against a global eavesdropper," in *Proceedings of the IEEE International Conference on Network Protocols (ICNP '07)*, pp. 314–323, 2007.
- J. Deng, R. Han, and S. Mishra, "Countermeasures against traffic analysis attacks in wireless sensor networks," in *Proceedings of the 1st International Conference on Security and Privacy for Emerging Areas in Communications Networks (SECURECOMM '05)*, pp. 113–126, IEEE Computer Society, 2005.
- Q. Gu, X. Chen, Z. Jiang, and J. Wu, "Sink-anonymity mobility control in wireless sensor network," in *Proceedings of the IEEE International Conference on Wireless and Mobile Computing, Networking and Communications*, pp. 36–41, 2009.
- Z. Liu and W. Xu, "Zeroing-in on network metric minima for sink location determination," in *Proceedings of the 3rd ACM Conference on Wireless Network Security (WiSec '10)*, pp. 99–104, ACM, 2010.
- E. C.-H. Ngai, "On providing sink anonymity for sensor networks," in *Proceedings of the International Conference on Wireless Communications and Mobile Computing: Connecting the World Wirelessly*, pp. 269–273, ACM, 2009.
- J. Deng, R. Han, and S. Mishra, "Intrusion tolerance and anti-traffic analysis strategies for wireless sensor networks," in *Proceedings of the International Conference on Dependable Systems and Networks (DSN '04)*, p. 637, IEEE Computer Society, 2004.
- A. A. Nezhad, A. Miri, and D. Makrakis, "Location privacy and anonymity preserving routing for wireless sensor networks," *Computer Networks*, vol. 52, no. 18, pp. 3433–3452, 2008.
- P. Samarati and L. Sweeney, "Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression," Tech. Rep., 1998.
- S. B. Eisenman, E. Miluzzo, N. D. Lane, R. A. Peterson, G.-S. Ahn, and A. T. Campbell, "The BikeNet mobile sensing system for cyclist experience mapping," in *Proceedings of the 5th International Conference on Embedded Networked Sensor Systems (SenSys '07)*, pp. 87–101, ACM, New York, NY, USA, 2007.
- L. Krishnamurthy, R. Adler, P. Buonadonna et al., "Design and deployment of industrial sensor networks: experiences from a semiconductor plant and the North Sea," in *Proceedings of the 3rd International Conference on Embedded Networked Sensor Systems (SenSys '05)*, pp. 64–75, ACM, New York, NY, USA, 2005.
- L. Selavo, A. Wood, Q. Cao et al., "Luster: wireless sensor network for environmental research," in *Proceedings of the 5th International Conference on Embedded Networked Sensor Systems (SenSys '07)*, pp. 103–116, ACM, New York, NY, USA, 2007.
- V. Singhvi, A. Krause, C. Guestrin, J. Garrett, and S. Matthews, "Intelligent light control using sensor networks," in *Proceedings of the 3rd International Conference on Embedded Networked Sensor Systems (SenSys '05)*, pp. ACM–218, New York, NY, USA, 2005.
- N. Xu, S. Rangwala, K. K. Chintalapudi et al., "A wireless sensor network for structural monitoring," in *Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems (SenSys '04)*, pp. 13–24, New York, NY, USA, November 2004.
- H. Chan, A. Perrig, and D. Song, "Random key predistribution schemes for sensor networks," in *IEEE Symposium on Security and Privacy (SP '03)*, pp. 197–213, IEEE Computer Society, May 2003.
- W. Trappe and L. Washington, *Introduction to Cryptography with Coding Theory*, Prentice Hall, 2002.
- C. L. Hwang, J. L. Williams, and L. T. Fan, *Introduction to the Generalized Reduced Gradient Method*, Institute for Systems Design and Optimization, 1972.
- O. Khatib, "Real-time obstacle avoidance for manipulators and mobile robots," *International Journal of Robotics Research*, vol. 5, no. 1, pp. 90–98, 1986.
- W. Ding, G. Yan, and Z. Lin, "Self-deployment and coverage of mobile sensors within a bounded region," in *Proceedings of the Chinese Control and Decision Conference*, pp. 3683–3688, 2009.
- N. Rouche, P. Habets, and M. Laloy, *Stability Theory by Lyapunov's Direct Methods*, Springer, 1977.
- D. Dimarogonas and K. Kyriakopoulos, "An inverse agreement control strategy with application to swarm dispersion," in *Proceedings of the 46th IEEE Conference on Decision and Control*, pp. 6148–6153, 2007.
- Mixmaster Remailer, http://mixmaster.sourceforge.net/.
- M. Gruteser and D. Grunwald, "Anonymous usage of location-based services through spatial and temporal cloaking," in *Proceedings of the 1st International Conference on Mobile Systems, Applications and Services (MobiSys '03)*, pp. 31–42, ACM, 2003.
- B. Hoh and M. Gruteser, "Protecting location privacy through path confusion," in *Proceedings of the 1st International Conference on Security and Privacy for Emerging Areas in Communications Networks (SECURECOMM '05)*, pp. 194–205, IEEE Computer Society, 2005.
- Y. Ouyang, Z. Le, D. Liu, J. Ford, and F. Makedon, "Source location privacy against laptop-class attacks in sensor networks," in *Proceedings of the 4th International Conference on Security and Privacy in Communication Networks (SecureComm '08)*, pp. 1–10, ACM, 2008.
- M. Shao, Y. Yang, S. Zhu, and G. Cao, "Towards statistically strong source anonymity for sensor networks," in *Proceedings of the 27th IEEE International Conference on Computer Communications (INFOCOM '08)*, pp. 51–55, 2008.
- W. Conner, T. Abdelzaher, and K. Nahrstedt, "Using data aggregation to prevent traffic analysis in wireless sensor networks," in *Proceedings of the International Conference on Distributed Computing in Sensor Networks (DCOSS '06)*, pp. 202–217, 2006.
- A. Howard, M. Mataric, and G. Sukhatme, "Mobile sensor network deployment using potential fields: a distributed scalable solution to the area coverage problem," *Distributed Autonomous Robotic Systems*, vol. 5, pp. 299–308, 2002.
- T. Balch and M. Hybinette, "Behavior-based coordination for large-scale robot formations," in *Proceedings of the 4th International Conference on Multiagent Systems*, pp. 363–364, 2000.