
International Journal of Distributed Sensor Networks

Volume 2014 (2014), Article ID 838391, 14 pages

http://dx.doi.org/10.1155/2014/838391

## A Game Theory-Based Analysis of Data Privacy in Vehicular Sensor Networks

¹School of Computer Science, Xidian University, Xi’an, China

²State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China

³College of Information Science and Engineering, Henan University of Technology, Zhengzhou, China

Received 12 April 2013; Accepted 28 October 2013; Published 20 January 2014

Academic Editor: Liusheng Huang

Copyright © 2014 Yunhua He et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Mobile traces collected by vehicular sensor networks (VSNs) facilitate various business applications and services. However, the traces can be used to track and identify drivers or passengers, which raises significant privacy concerns. Existing privacy protecting techniques may not be suitable, because they give inadequate consideration to the data accuracy requirements of different applications and to the adversary’s knowledge and strategies. In this paper, we analyze data privacy issues in VSNs with a game-theoretic model, in which a defender uses privacy protecting techniques against the attack strategies implemented by an adversary. We study both passive and active attack scenarios, and in each scenario we consider the effect of different data accuracy requirements on the performance of defense measures. Through analysis results on real-world traffic data, we show that inserting more bogus traces or deleting more recorded samples performs better when the cost of defense measures is small, whereas doing nothing becomes the best strategy when the cost of defense measures is very large. In addition, we present the optimal defense strategy, which provides the defender with the maximum utility when the adversary implements the optimal attack strategy.

#### 1. Introduction

With the advances and wide adoption of wireless communication technologies, vehicles are now often equipped with wireless devices that allow them to communicate with each other (V2V) as well as with roadside infrastructure (V2I). V2V and V2I communications make driving safer and improve the driving experience. Such communication networks are called Vehicular Ad Hoc Networks (VANETs). With the increasing need for sensing and data acquisition in cities, VANETs have evolved into Vehicular Sensor Networks (VSNs) [1]. VSNs exploit vehicles and passengers to capture the occurrence of events, such as changes in traffic volume, road surface conditions, and chemical and radiation levels. The location traces in the traffic-related data enable various new business applications and services, such as map drawing [2], traffic prediction [3], city planning, and mobile network analysis [4].

However, the places in these location traces that a driver or passenger has visited may reveal his/her sensitive information, such as traffic law violations, political affiliations, and medical conditions [5, 6]. Although vehicular mobility traces are often collected anonymously, an adversary can reidentify the true owner of a trace, because the locations of drivers and passengers can be openly observed in public places and can also be disclosed, voluntarily or inadvertently, by the people themselves, for example in a casual conversation or in published media such as news articles or web blogs [4]. An adversary who has partial knowledge of the whereabouts of drivers or passengers (called victims) can infer the true owners of the traces with high probability by exploiting vehicular mobility constraints and spatiotemporal correlation [4, 7].

To reduce spatiotemporal correlation, frequently proposed privacy protecting techniques suggest reducing the resolution of the recorded data [8, 9] or introducing noise into the data [10–12]. However, these techniques may not be suitable for privacy preservation in VSNs, because they give inadequate consideration to the adversary’s knowledge and attack strategies. Moreover, these techniques cannot meet the different data accuracy requirements of different applications and services [13]. To address these challenges, we must first analyze how the adversary’s knowledge and attack strategies, as well as the different accuracy requirements, affect the performance of defense strategies.

In this paper, we use a game-theoretic model to study how the adversary’s knowledge and strategies and the data accuracy requirements affect the performance of defense measures. More specifically, we first present location privacy issues in VSNs, including the abilities and goals of the adversary and the defender. Then, we define a game-theoretic model, the attack and defense game, which captures the strategy selection behavior of the adversary and the defender. In this game, the adversary implements its attack strategies in both passive and active attack scenarios, while the defender uses frequently proposed privacy protecting approaches to make it harder for the adversary to reidentify the victims. Finally, through analysis results on real-world traffic data, we show that attack strategies perform differently in different scenarios. Inserting more bogus traces or deleting more recorded samples performs better when the cost of defense measures is small, whereas doing nothing becomes the best strategy when the cost of defense measures is very large. We also present the optimal defense strategy for each attack strategy. The main contributions of this paper are as follows.

(i) We define an attack and defense game model to capture the strategy selection behavior of the adversary and the defender, and we show the effectiveness of defense strategies.

(ii) We establish the attack and defense game based on real-world traffic data. In particular, for different attack scenarios, we study both the complete information game (the defender knows the adversary’s knowledge of the victims’ whereabouts) and the incomplete information game (the defender does not know the adversary’s knowledge).

(iii) Through the Nash equilibria of these games, we show that the defender can balance data accuracy and the victims’ location privacy to obtain the maximum utility when an adversary implements the optimal attack strategy.

The rest of the paper is organized as follows. In Section 2, we discuss related work. Section 3 presents the network model of VSNs and the location privacy issues in VSNs including the ability and goal of the adversary and defender. In Section 4, we define the attack and defense game model and present the attack in different scenarios and defense strategies. In Section 5, we present our main analysis results in the complete and incomplete information game for different attack strategies. We conclude the paper in Section 6.

#### 2. Related Work

Several recent studies [4, 7] have analyzed the privacy risk of mobile traces and found that omitting identifiers from mobile traces does not guarantee anonymity, due to spatiotemporal correlation. Ma et al. [4] show that an adversary who has a relatively small number of location snapshots of drivers can infer the true owners of anonymous traces with high probability. Montjoye et al. [7] study fifteen months of human mobility data for one and a half million individuals and find that four spatiotemporal points are enough to uniquely identify 95% of the individuals. Therefore, there is a growing need for stronger privacy protection mechanisms.

In general, the existing solutions can be divided into two categories: reducing the resolution of the recorded data [8, 14] and introducing noise into the data. Hoh et al. [8] propose a disclosure control algorithm, called the uncertainty-aware path cloaking algorithm, that selectively reveals GPS samples to limit the maximum time-to-confusion for all vehicles. Nergiz et al. [14] adapt the notion of k-anonymity to trajectories: they find representative trajectories so that every trajectory is indistinguishable from other trajectories.

Approaches that introduce noise have also been extensively studied [10, 12]. Lu et al. [10] create a mix area at social spots to achieve provable location privacy in VANETs. Huang et al. [12] propose a solution called silent period to provide users with location privacy in wireless networks. However, these techniques may not be suitable for privacy preservation in VSNs, for two reasons. First, they rarely consider the effect of the adversary’s knowledge and attack strategies on the performance of defense strategies. Second, they cannot meet the different data accuracy requirements of different applications and services [13]. These are the issues we address in this paper.

Game theory provides mathematical frameworks for the analysis, modeling, and decision processes of network security and privacy problems [15–17], so we adopt it to study data privacy issues in VSNs. There are many works that apply game theory to security in VANETs. Raya et al. [18] model the revocation problem as a finite dynamic game with mobile nodes as players, who can detect misbehavior with a certain probability. Reidt et al. [19] use a game-theoretic approach to design a distributed, detection-error-tolerant revocation scheme called karmic-suicide. In [20], zero-sum games, fuzzy games, and fictitious play are applied to model the interaction between the attacker and the defender. In [10], the authors use game-theoretic techniques to prove the feasibility of their pseudonym changing strategy. Freudiger et al. [21] analyze the noncooperative behavior of mobile nodes using a game-theoretic model in which each player aims to maximize its location privacy at a minimum cost. In this paper, we study a new aspect of privacy by evaluating how the adversary’s knowledge and attack strategies and the different accuracy requirements affect the performance of defense strategies.

#### 3. Preliminaries

In this section, we explain our network model, as well as our assumptions and the location privacy issues in VSNs. We conclude by sketching the problem this work aims at solving. In Table 1, we summarize the notations introduced throughout this paper.

##### 3.1. Network Model

In VSNs, vehicles and passengers act as sensors to capture the occurrence of events such as traffic accidents, traffic distribution, and road weather information [22]. VSNs can perceive the traffic distribution in a city efficiently and with high accuracy, and they have thus been envisioned as having great potential to revolutionize people’s driving experience and metropolitan-area traffic flow control. Figure 1 illustrates the network model of VSNs. Vehicles use Dedicated Short Range Communication (DSRC) [23] technology to transmit traffic-related information, that is, ⟨time, position⟩ pairs, to Roadside Units (RSUs) via single-hop or multihop communications. The RSUs then upload the information to a traffic control center through wired networks, so that the center can predict the traffic distribution at very low cost and with high accuracy.

We assume that a set of traces, each of which intermittently records the time and corresponding location of a mobile node, is used by various applications, such as traffic prediction, city planning, and mobile network evaluation. These traces are anonymous in that the true identity of a vehicle has been replaced by a random identifier, but the same true identity is always mapped to the same identifier. In the following, we elaborate on the privacy threat to these anonymous traces.
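The anonymization described above keeps a consistent mapping from true identities to random identifiers, which is exactly what keeps all samples of one vehicle linked together. A minimal Python sketch, assuming samples arrive as hypothetical (identity, time, cell) tuples; the `Pseudonymizer` class and `anonymize` helper are illustrative names, not part of any VSN standard:

```python
import secrets


class Pseudonymizer:
    """Maps each true vehicle identity to one random identifier, consistently."""

    def __init__(self) -> None:
        self._table: dict[str, str] = {}  # true identity -> random identifier
        self._used: set[str] = set()      # identifiers already handed out

    def pseudonym(self, true_id: str) -> str:
        # First sighting: draw a fresh random identifier (retrying on the
        # unlikely event of a collision). Later sightings reuse it.
        if true_id not in self._table:
            candidate = secrets.token_hex(8)
            while candidate in self._used:
                candidate = secrets.token_hex(8)
            self._used.add(candidate)
            self._table[true_id] = candidate
        return self._table[true_id]


def anonymize(samples, pseudonymizer):
    """Replace the identity in each (identity, time, cell) sample."""
    return [(pseudonymizer.pseudonym(vid), t, cell) for vid, t, cell in samples]


samples = [("car-A", 0, "c1"), ("car-B", 0, "c2"), ("car-A", 30, "c3")]
published = anonymize(samples, Pseudonymizer())
```

Because `car-A` keeps one identifier across both of its samples, its whole path stays linkable; the adversary only needs to tie that identifier to a person once.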

##### 3.2. Threat Model

An adversary tries to identify the complete path histories of one or more victims (drivers or passengers) from the anonymous traces. We assume that the adversary can collect certain side information about one or more victims. Each piece of side information gives the location of a victim at a time instant, although the information may not be exact. In practice, the side information may be obtained in the following ways. First, nodes are open to observation in public spaces, so the adversary may obtain side information directly by meeting the victim by chance or through engineered encounters; this case is called the active attack scenario. Second, nodes may disclose information about their whereabouts, either voluntarily or inadvertently [11]; this is called the passive attack scenario. For example, a casual conversation between Alice and Bob may reveal where Alice was around 8 am, or it may reveal another person’s location.

We denote the trace of node $i$ by $T_i$, the set of all nodes’ traces by $\mathcal{T} = \{T_1, \dots, T_n\}$, and the side information about a victim by a map $S: Q \to L$, where $Q$ is the set of time instants at which some side information about the victim’s locations is revealed, and $L$ is the set of all cell location IDs. Then, the ability of the adversary to reveal the victim’s trace can be given in the form of a conditional entropy:

$$H(T \mid S) = -\sum_{T \in \mathcal{T}} \sum_{S} p(T, S) \log_2 p(T \mid S), \qquad (1)$$

where $p(T \mid S)$ is the conditional probability that the victim’s trace $T$ corresponds to the side information $S$ known by the attacker, and $p(T, S)$ is the joint probability of the trace $T$ and the side information $S$.

The ability of the attacker is determined by the side information and the conditional entropy, because $H(T \mid S) = H(T) - I(T; S)$. When the mutual information $I(T; S)$ is fixed, the smaller the entropy $H(T)$, the smaller the conditional entropy $H(T \mid S)$, and the stronger the attack. When $H(T \mid S) = 0$, the victim’s trace can be uniquely determined.
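To make (1) concrete, the sketch below computes $H(T \mid S)$ in bits from a small joint distribution given as a dictionary keyed by (trace, side-information value) pairs. The distributions and labels are hypothetical; the two cases bracket the extremes: side information that pins down the trace, and side information that carries none.

```python
import math
from collections import defaultdict


def conditional_entropy(joint):
    """H(T | S) in bits, from a joint distribution {(trace, side_info): p(T, S)}."""
    p_s = defaultdict(float)
    for (_, s), p in joint.items():
        p_s[s] += p  # marginal p(S)
    h = 0.0
    for (_, s), p in joint.items():
        if p > 0:
            # -p(T, S) * log2 p(T | S), with p(T | S) = p(T, S) / p(S)
            h -= p * math.log2(p / p_s[s])
    return h


# Side information that pins down the trace: H(T | S) = 0.
informative = {("T1", "s1"): 0.5, ("T2", "s2"): 0.5}
# Side information that says nothing: H(T | S) = H(T) = 1 bit.
uninformative = {("T1", "s"): 0.5, ("T2", "s"): 0.5}
```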

##### 3.3. Defense Measures

A defender tries to protect the privacy of victims, which is defined by the uncertainty of a node’s trace. The trace uncertainty is the entropy of the probability distribution of a node’s trace [9, 12]. Let $p_i$, $i = 1, \dots, n$, denote the probability that $T_i$ corresponds to the victim’s trace $T_v$; the entropy of the probability distribution of $T_v$ can then be defined as

$$H(T_v) = -\sum_{i=1}^{n} p_i \log_2 p_i. \qquad (2)$$

The larger the entropy $H(T_v)$, the more uncertain the victim’s trace; the smaller the entropy, the less uncertain it is. When $H(T_v) = 0$, the victim’s trace can be uniquely determined.
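Equation (2) is the ordinary Shannon entropy of the candidate distribution. A minimal sketch, assuming the $p_i$ are supplied as a list that sums to 1 (the function name is illustrative):

```python
import math


def trace_entropy(probs):
    """Entropy (bits) of the distribution p_i that trace T_i is the victim's, as in (2)."""
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Terms with p_i = 0 contribute nothing (0 * log 0 is taken as 0).
    return sum(-p * math.log2(p) for p in probs if p > 0)


print(trace_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0: four equally likely traces
print(trace_entropy([1.0, 0.0, 0.0]))           # 0.0: trace uniquely determined
```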

Defense objectives for protecting trace privacy can be measured by k-anonymity. An anonymity set $A_v$ includes the nodes whose traces are indistinguishable from that of the victim $v$. The k-anonymity model for privacy protection as used in [24] essentially refers to an anonymity set with a minimum size $k$, in which the victim is guaranteed not to be distinguishable from at least $k - 1$ other nodes with respect to information related to the victim (such as location information). However, not all the nodes in the anonymity set are equally likely to be the victim, since an adversary may be able to obtain side information about the nodes.

We use entropy to refine this model. Suppose the defender wants to provide an anonymity set with a minimum size $k$; if all $k$ members were equally likely to be the victim, the entropy of the distribution over the anonymity set would be $\log_2 k$. Hence, mechanisms for trace privacy protection should satisfy

$$H(T_v \mid S') \ge \log_2 k, \qquad (3)$$

where $S'$ is the side information that the defender believes the attacker has, and $H(T_v \mid S')$ is the uncertainty of the victim’s trace given $S'$.
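Criterion (3) is stricter than simply counting members of the anonymity set, which is the point made about unequal likelihoods above. The sketch below, with hypothetical probability vectors, checks it for a set of four candidates that is uniform and for one that side information has skewed toward a single trace:

```python
import math


def entropy_bits(probs):
    """Entropy in bits; zero-probability entries are skipped."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def meets_k_anonymity_entropy(probs, k, tol=1e-12):
    """Criterion (3): H(T_v | S') >= log2 k, with probs = p(T_i is the victim | S')."""
    return entropy_bits(probs) >= math.log2(k) - tol


# Four indistinguishable candidates: the criterion holds for k = 4.
uniform = [0.25, 0.25, 0.25, 0.25]
# Still four candidates, but side information makes one far more likely
# (entropy about 1.36 bits, below log2 4 = 2).
skewed = [0.7, 0.1, 0.1, 0.1]
```

The skewed set still has four members, yet it only offers the entropy of roughly a 2- to 3-member uniform set.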

Since privacy is a context-specific property and is socially and/or culturally defined [6, 25], the trace privacy needs of individual users may vary, and different users may require different k-anonymity entropies $\log_2 k$. Here, however, we only consider the average k-anonymity entropy over all users.

##### 3.4. Problem Statement

Given the attacker’s strategies and the defender’s strategies, the problem is to find the relationship between the attack strategies and the performance of defense strategies, and between the data accuracy requirements and the performance of defense strategies. Then, based on these analysis results, we find the optimal defense strategies for different attack scenarios.

In order to address these problems, we must consider (i) how to model the strategy selection decision behavior of the adversary and defender, and (ii) that the defender may not know the adversary’s knowledge.

#### 4. Attack and Defense Game

In this section, we introduce the attack and defense game model to capture the strategy selection behavior of the adversary and the defender. We first define the game model and the concept of Nash equilibrium (NE) used throughout the paper, and then we present the different attack and defense strategies.

##### 4.1. Game Model

The game is defined as a triplet $G = (P, S, U)$, where $P$ is the set of players, $S$ is the set of strategies, and $U$ is the set of payoff functions [26].

*Players*. The set of players $P = \{a, d\}$ corresponds to the adversary $a$ and the defender $d$. There are two types of adversaries: the Global Passive Adversary (GPA) and the Local Active Adversary (LAA) [4].

*Strategy*. The set of strategies in the game is $S = \{S_a, S_d\}$, where $S_a$ and $S_d$ are the sets of strategies of the adversary and the defender, respectively. We describe them in detail in Sections 4.3 and 4.4.

*Payoff Function*. When the defender knows the side information that the adversary has collected, we use complete information games. In a complete information game, the payoff function of the adversary is $u_a(s_a, s_d)$ and the payoff function of the defender is $u_d(s_a, s_d)$, where $c_a$ and $c_d$ are the costs of attack and defense, respectively. To maximize the attack power, $c_a$ is set to 0. The defense cost $c_d$ includes both the cost of implementing the defensive strategies and the damage to data accuracy.

Typically, the defender does not know all the side information that the adversary has collected. Hence, we follow the approach proposed by Harsanyi [27] and introduce a new player named Nature, which assigns a type $\theta$ to the adversary according to a prior distribution $p(\theta)$. The type $\theta$ can be considered as the side information that the adversary has collected. The payoff functions are then expressed as $u_a(s_a, s_d, \theta)$ and $u_d(s_a, s_d, \theta)$.

##### 4.2. Equilibrium Concepts

In complete information games, Nash equilibrium (NE) can be defined as follows.

*Definition 1.* A strategy profile $s^* = (s_a^*, s_d^*)$ is a Nash equilibrium if, for each player $i \in P$,

$$u_i(s_i^*, s_{-i}^*) \ge u_i(s_i, s_{-i}^*), \quad \forall s_i \in S_i. \qquad (4)$$

In other words, in an NE, no player can unilaterally change his strategy to increase his payoff. A player can also play each of his pure strategies with some probability, using mixed strategies. A mixed strategy $\sigma_i$ of player $i$ is a probability distribution over the pure strategies $S_i$.
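For two players with finite strategy sets, Definition 1 can be checked by brute force: a cell of the payoff matrices is a pure-strategy NE when neither player gains by switching alone. A minimal sketch; the payoff numbers are hypothetical toy values chosen for illustration, not entries of Table 2:

```python
from itertools import product


def pure_nash_equilibria(u_a, u_d):
    """All pure-strategy NE (i, j) of a bimatrix game, per Definition 1.

    u_a[i][j] / u_d[i][j]: adversary / defender payoff when the adversary plays
    row strategy i and the defender plays column strategy j.
    """
    rows, cols = len(u_a), len(u_a[0])
    equilibria = []
    for i, j in product(range(rows), range(cols)):
        adversary_best = all(u_a[i][j] >= u_a[k][j] for k in range(rows))
        defender_best = all(u_d[i][j] >= u_d[i][k] for k in range(cols))
        if adversary_best and defender_best:  # no profitable unilateral deviation
            equilibria.append((i, j))
    return equilibria


# Toy payoffs (hypothetical): rows = {MLE, BAS}, columns = {no defense, D1}.
u_a = [[0.9, 0.5],
       [0.7, 0.4]]
u_d = [[-0.9, -0.6],
       [-0.7, -0.5]]
print(pure_nash_equilibria(u_a, u_d))  # [(0, 1)]
```

An empty result (as in matching pennies) means only mixed-strategy equilibria exist, which is why mixed strategies are introduced above.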

In incomplete information games, we adopt the concept of Bayesian Nash equilibrium [21].

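*Definition 2.* A strategy profile $s^* = (s_a^*(\cdot), s_d^*(\cdot))$ is a pure-strategy Bayesian Nash equilibrium (BNE) if, for each player $i$,

$$\sum_{\theta_{-i}} p(\theta_{-i})\, u_i\big(s_i^*, s_{-i}^*(\theta_{-i}), \theta_{-i}\big) \ge \sum_{\theta_{-i}} p(\theta_{-i})\, u_i\big(s_i, s_{-i}^*(\theta_{-i}), \theta_{-i}\big), \quad \forall s_i \in S_i, \qquad (5)$$

where $\theta_{-i}$ is the type of player $i$’s opponents and $p(\theta_{-i})$ is the prior distribution of $\theta_{-i}$.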
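In this paper only the adversary has a type (its side information), so a pure BNE pairs a type-contingent plan for the adversary with a single defender strategy that maximizes the defender’s prior-weighted expected payoff. The brute-force sketch below assumes small finite strategy sets; the payoffs and priors in the example are hypothetical and only show that the defender’s best choice can flip with Nature’s prior:

```python
from itertools import product


def bayesian_nash_equilibria(prior, u_a, u_d):
    """Pure-strategy BNE when Nature draws only the adversary's type (Definition 2).

    prior[t]     : probability that the adversary has type t
    u_a[t][i][j] : adversary payoff for type t, adversary strategy i, defender strategy j
    u_d[t][i][j] : defender payoff in the same situation
    Returns (plan, j) pairs, where plan[t] is the strategy played by type t.
    """
    n_types, n_a, n_d = len(prior), len(u_a[0]), len(u_a[0][0])

    def expected_u_d(plan, j):
        # The defender does not observe t, so it averages over the prior.
        return sum(prior[t] * u_d[t][plan[t]][j] for t in range(n_types))

    equilibria = []
    for plan in product(range(n_a), repeat=n_types):
        for j in range(n_d):
            adversary_ok = all(u_a[t][plan[t]][j] >= u_a[t][k][j]
                               for t in range(n_types) for k in range(n_a))
            defender_ok = all(expected_u_d(plan, j) >= expected_u_d(plan, k)
                              for k in range(n_d))
            if adversary_ok and defender_ok:
                equilibria.append((plan, j))
    return equilibria


# Toy example (hypothetical numbers): one attack strategy, defender options
# {0: no defense, 1: D1}; type 0 = little side information, type 1 = a lot.
u_a = [[[0.2, 0.1]], [[0.9, 0.5]]]
u_d = [[[-0.2, -0.3]], [[-0.9, -0.7]]]
print(bayesian_nash_equilibria([0.5, 0.5], u_a, u_d))  # [((0, 0), 1)]
print(bayesian_nash_equilibria([0.9, 0.1], u_a, u_d))  # [((0, 0), 0)]
```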

##### 4.3. Attack Strategies

###### 4.3.1. Attack Scenarios

Corresponding to the two types of adversaries, attacks can be classified into two scenarios: passive attacks and active attacks.

*Scenario A: Passive Attack*. In this setting, the adversary is given the complete (anonymized) traces. Given some pieces of side information about a victim, the adversary’s goal is to identify, in some optimal fashion, the complete path history of the chosen victim. The key assumptions are that (i) the adversary is passive, in that it does not actively seek encounters with potential victims, and (ii) the side information given to the adversary contains noise. Let $\Phi = \{t_1, t_2, \dots, t_m\}$ be the equally spaced sampled times at which the actual node locations are published, and let $Q$ be the set of time instants at which side information about the victim’s locations is revealed. Passive attacks can then be divided into the following two attacks.

*Attack 1 (A1)*. The side information references only time instants that coincide with sampled times in the trace, that is, $Q \subseteq \Phi$.

*Attack 2 (A2)*. The side information references time instants between two consecutive sampled times in the set of traces, that is, $Q \not\subseteq \Phi$, and for each $q \in Q \setminus \Phi$ there exists $j$ such that

$$t_j < q < t_{j+1}. \qquad (6)$$

A2 is the more general attack; A1 can be considered a special case of A2 in which $q \in \Phi$ for each $q \in Q$. We assume that the adversary will attempt to use all known information in its inference strategy by employing some form of Bayesian inference. In applying Bayesian inference, the adversary can make use of general knowledge, including constraints on nodal movement imposed by the road geography and the general movement preferences of the nodes.

*Scenario B: Active Attack*. In this scenario, the adversary is active: it obtains side information about victims by encountering them. The adversary obtains traces in a real-time, gradual fashion; that is, as time progresses, it is given the trace information together with the information it has acquired up to the current time instant. The goal here is to identify as many traces as possible. Let $T_a$ be the trace of the adversary $a$; active attacks can then be divided into the following attacks.

*Attack 1 (B1)*. The adversary stays at one fixed location, that is, $T_a(t) = T_a(t')$ for any $t$ and $t'$.

*Attack 2 (B2)*. The adversary moves to maximize the amount of useful side information, that is, there exists at least one pair $t$, $t'$ such that $T_a(t) \ne T_a(t')$.

We assume that after encountering a victim, the adversary does not attempt to follow the victim, because its objective is to identify as many traces as possible.

###### 4.3.2. Strategies for A1 and A2

As noted before, the side information often contains noise. The adversary thus needs to perform Bayesian inference or use a maximum likelihood estimator to make the best guess: given $S$, find the trace $T_i \in \mathcal{T}$ that gives the best match. Given $S$, compute

$$P(T_i \mid S) = \frac{P(S \mid T_i)\, P(T_i)}{P(S)}. \qquad (7)$$

The maximum likelihood estimator seeks the $T_i$ that maximizes (7). Note that the denominator $P(S)$ is a constant. In addition, without any knowledge about how the victim is chosen, we take the prior distribution of the victim to be uniform, $P(T_i) = 1/n$. Hence, the solution of the maximum likelihood estimator is given by

$$\hat{T} = \arg\max_{T_i \in \mathcal{T}} P(S \mid T_i). \qquad (8)$$
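The step from (7) to (8) uses two facts: $P(S)$ is the same for every trace, and a uniform prior scales every numerator equally, so the posterior and the likelihood share the same argmax. A small sketch with hypothetical likelihood values:

```python
def posterior(likelihoods, prior=None):
    """P(T_i | S) = P(S | T_i) P(T_i) / P(S), as in (7)."""
    n = len(likelihoods)
    if prior is None:
        prior = [1.0 / n] * n  # uniform prior over traces
    joint = [lik * p for lik, p in zip(likelihoods, prior)]
    evidence = sum(joint)  # P(S): the same constant for every trace
    return [j / evidence for j in joint]


def mle_trace(likelihoods):
    """Index of the trace maximising P(S | T_i), as in (8)."""
    return max(range(len(likelihoods)), key=lambda i: likelihoods[i])


lik = [0.02, 0.30, 0.10]  # hypothetical P(S | T_i) for three published traces
post = posterior(lik)
```

With a non-uniform `prior`, the argmax of `post` can differ from `mle_trace`, which is why (8) relies on the uniform-prior assumption.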

For A1 and A2, expression (8) can be written in the following forms.

*Scenario A1*. If the noise in the side information is independent and identically distributed and follows some given distribution $f$, expression (8) can be written as

$$\hat{T} = \arg\max_{T_i \in \mathcal{T}} \prod_{q \in Q} f\big(d_i(q)\big), \qquad (9)$$

where $d_i(q) = |S(q) - T_i(q)|$ is the location difference, computed as the Cartesian distance between the two cells [4].

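*Scenario A2*. If node mobility follows a Markov model, (8) can be given by

$$\hat{T} = \arg\max_{T_i \in \mathcal{T}} \prod_{q \in Q} \sum_{l \in L} P\big(T_i(q) = l \mid T_i(t_j), T_i(t_{j+1})\big)\, f\big(|S(q) - l|\big), \qquad (10)$$

where $t_j < q < t_{j+1}$. When the transition probabilities are constant over time, the Markov process of node mobility is time-homogeneous (steady).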

Expression (9) can be greatly simplified if the noise takes a specific form, such as a normal or uniform distribution. The adversary can therefore use heuristic approaches [4] to identify the victim’s trace. In the following, we consider four strategies the adversary can use to identify the victim’s trace from the published trace set, first describing them for scenario A1.

(i) Maximum likelihood estimation approach (MLE). This is the same as formulation (9); that is, the similarity value of trace $T_i$ is given by

$$s_i = \prod_{q \in Q} f\big(d_i(q)\big), \qquad (11)$$

and the trace with the maximum similarity value is declared to be the victim’s.

(ii) Minimum square approach (MSQ). When the $d_i(q)$ follow a normal distribution $N(0, \sigma^2)$, that is, $f(d) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-d^2/(2\sigma^2)}$, we have $\log f(d) = -c\, d^2 + \text{const}$ for some constant $c > 0$. Hence, the maximum likelihood estimator is essentially the same as the following minimum square approach:

$$s_i = \sum_{q \in Q} d_i(q)^2, \qquad (12)$$

where the trace with the least similarity value is declared to be the victim’s.

(iii) Basic approach (BAS). The adversary assumes that the noise is zero-mean with a specific standard deviation $\sigma$, but makes no assumption about its exact distribution. The adversary then computes the similarity value of trace $T_i$ with the side information as

$$s_i = \sum_{q \in Q} \mathbb{1}\big[d_i(q) \le \sigma\big], \qquad (13)$$

where $\mathbb{1}[d_i(q) \le \sigma] = 1$ if $d_i(q) \le \sigma$ and 0 otherwise. Hence, the adversary accepts a trace as a potential candidate if the trace owner appears within a radius $\sigma$ of the revealed location. The trace with the maximum similarity value is declared to be the victim’s.

(iv) Weighted exponential approach (EXP). In this approach, proposed and analyzed in [11], the adversary knows neither the type of noise nor its magnitude. Similar to BAS, the adversary maximizes the similarity value of trace $T_i$:

$$s_i = \sum_{q \in Q} w_q\, e^{-d_i(q)/c}, \qquad (14)$$

where $w_q$ is a weight assigned to the revealed cell $S(q)$ and $c$ is a constant.
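The four A1 strategies differ only in how they turn the per-sample distances $d_i(q)$ into a score. The sketch below assumes each cell is represented by hypothetical $(x, y)$ center coordinates, that traces and side information are dictionaries from time to cell, and that the EXP weight defaults to 1; the toy traces are illustrative, not taken from the Beijing data.

```python
import math


def distances(trace, side_info):
    """d_i(q): Cartesian distance between S(q) and T_i(q) for each revealed q (scenario A1)."""
    return [math.dist(side_info[q], trace[q]) for q in side_info]


def score_mle(trace, side_info, noise_pdf):
    # Log of the product in (11); summing logs avoids numerical underflow.
    return sum(math.log(max(noise_pdf(d), 1e-300)) for d in distances(trace, side_info))


def score_msq(trace, side_info):
    # Sum of squared distances, (12); the SMALLEST score is declared the victim.
    return sum(d * d for d in distances(trace, side_info))


def score_bas(trace, side_info, sigma):
    # Number of revealed samples within radius sigma of the trace, (13).
    return sum(1 for d in distances(trace, side_info) if d <= sigma)


def score_exp(trace, side_info, c, weight=lambda cell: 1.0):
    # Weighted exponential similarity, (14); `weight` is a hypothetical per-cell weight.
    return sum(weight(side_info[q]) * math.exp(-math.dist(side_info[q], trace[q]) / c)
               for q in side_info)


SIGMA = 0.5


def gaussian(d, sigma=SIGMA):
    return math.exp(-d * d / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)


traces = {"A": {0: (0, 0), 30: (1, 0), 60: (2, 0)},
          "B": {0: (5, 5), 30: (5, 6), 60: (5, 7)}}
side = {0: (0, 0.4), 60: (2.3, 0)}  # noisy sightings of the owner of trace A

guess_mle = max(traces, key=lambda i: score_mle(traces[i], side, gaussian))
guess_msq = min(traces, key=lambda i: score_msq(traces[i], side))
guess_bas = max(traces, key=lambda i: score_bas(traces[i], side, SIGMA))
guess_exp = max(traces, key=lambda i: score_exp(traces[i], side, c=1.0))
```

Each score is one pass over the revealed samples, so scoring all traces is linear in the amount of side information times the number of traces.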

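The above formulas can easily be modified for scenario A2. For convenience, the probability that the vehicle is in cell $l$ at time $q$ for trace $T_i$ is defined as the function

$$g_i(q, l) = P\big(T_i(q) = l \mid T_i(t_j), T_i(t_{j+1})\big), \qquad (15)$$

where $t_j < q < t_{j+1}$ and $l \in L$. Then we have

$$s_i^{\text{MLE}_2} = \prod_{q \in Q} \sum_{l \in L} g_i(q, l)\, f\big(|S(q) - l|\big), \qquad (16)$$

$$s_i^{\text{MSQ}_2} = \sum_{q \in Q} \sum_{l \in L} g_i(q, l)\, |S(q) - l|^2, \qquad (17)$$

$$s_i^{\text{BAS}_2} = \sum_{q \in Q} \sum_{l \in L} g_i(q, l)\, \mathbb{1}\big[|S(q) - l| \le \sigma\big], \qquad (18)$$

$$s_i^{\text{EXP}_2} = \sum_{q \in Q} \sum_{l \in L} g_i(q, l)\, w_q\, e^{-|S(q) - l|/c}. \qquad (19)$$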
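In A2 the trace has no sample at the revealed time, so each score averages over where the vehicle could have been, weighted by $g_i(q, l)$. The sketch below implements the MLE_2 score under a loudly simplifying assumption: instead of a fitted Markov mobility model, the hypothetical `g` puts all probability on the two cells at the surrounding sampled times, weighted by temporal proximity. Cells are again hypothetical $(x, y)$ centers.

```python
import bisect
import math


def g(trace, q):
    """Hypothetical stand-in for g_i(q, l) in (15).

    Instead of a fitted Markov mobility model, probability mass is split between
    the cells at the two surrounding sampled times, weighted by temporal proximity.
    trace maps sampled time -> (x, y) cell centre.
    """
    if q in trace:
        return {trace[q]: 1.0}
    times = sorted(trace)
    k = bisect.bisect_left(times, q)
    if k == 0 or k == len(times):
        raise ValueError("q must lie between two sampled times")
    t0, t1 = times[k - 1], times[k]
    w1 = (q - t0) / (t1 - t0)  # closer to t1 -> more mass on T_i(t1)
    dist = {}
    for cell, w in ((trace[t0], 1.0 - w1), (trace[t1], w1)):
        dist[cell] = dist.get(cell, 0.0) + w
    return dist


def score_mle2(trace, side_info, noise_pdf):
    """Log of the MLE_2 score (16): sum over q of log sum_l g_i(q, l) f(|S(q) - l|)."""
    total = 0.0
    for q, revealed in side_info.items():
        lik = sum(p * noise_pdf(math.dist(revealed, cell))
                  for cell, p in g(trace, q).items())
        total += math.log(max(lik, 1e-300))
    return total


def gaussian(d, sigma=0.5):
    return math.exp(-d * d / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)


traces = {"A": {0: (0, 0), 30: (1, 0), 60: (2, 0)},
          "B": {0: (5, 5), 30: (5, 6), 60: (5, 7)}}
side = {15: (0.5, 0.1), 45: (1.4, 0.0)}  # sightings between sampled times (A2)
guess = max(traces, key=lambda i: score_mle2(traces[i], side, gaussian))
```

The MSQ_2, BAS_2, and EXP_2 variants follow by replacing the inner term `noise_pdf(...)` with the corresponding A1 term and dropping the logarithm.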

The four strategies have the same computational complexity, which is linear in the number of pieces of side information and the number of nodes. Note that we assume attack strategies that collect side information about only one victim. However, the strategies can easily be extended to the case in which the adversary collects side information about several victims. In particular, the MLE approach can be used directly without modification, while for the other attack strategies a properly chosen threshold can be used to remove from consideration the traces whose similarity to the victim’s trace is lower than the threshold.

###### 4.3.3. Strategies for B1 and B2

In the active attack scenario, the adversary observes the participants directly. The published traces are revealed in real time, synchronized with the information collected by the adversary. As the additionally acquired information contains no noise, the adversary does not need to use any inference strategy. Based on the idea of excluding unmatched traces [4], the attack algorithm can be described as follows.

As illustrated in Algorithm 1, the attack algorithm takes as input the traces as they are published progressively. The algorithm first treats all traces as candidate traces for each victim. A trace is a candidate trace of a victim if it appears at the same times and locations at which the adversary meets the victim. As time evolves, the adversary removes from each victim’s candidate set the traces that do not agree with the observed information about that victim. When a victim’s trace is identified, it is removed from the candidate sets of the other victims. Note that the adversary may not identify a participant at the time they meet; identification can occur later, once all but one of the candidate traces have been identified and removed. Hence, the adversary identifies participants more efficiently when it tries to identify as many participants as possible.
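The candidate-elimination idea can be sketched in a few lines of Python. This is an illustrative reconstruction from the description above, not the authors’ Algorithm 1; it assumes snapshots are published as hypothetical (time, {trace_id: cell}) pairs and that encounters are recorded as {time: {victim: cell}}.

```python
def active_attack(published, encounters):
    """Candidate elimination in the spirit of Algorithm 1 (a sketch, not the authors' code).

    published : list of (t, {trace_id: cell}) snapshots, released in time order
    encounters: dict t -> {victim: cell} the adversary observed at time t
    Returns dict victim -> identified trace_id.
    """
    victims = sorted({v for seen in encounters.values() for v in seen})
    all_ids = set().union(*(snap.keys() for _, snap in published))
    candidates = {v: set(all_ids) for v in victims}  # initially every trace
    identified = {}
    for t, snapshot in published:
        # Remove candidates that disagree with what the adversary saw at time t.
        for v, cell in encounters.get(t, {}).items():
            candidates[v] = {i for i in candidates[v] if snapshot.get(i) == cell}
        # A victim with a single remaining candidate is identified; that trace is
        # then removed from every other victim's set, which may cascade.
        changed = True
        while changed:
            changed = False
            for v in victims:
                if v not in identified and len(candidates[v]) == 1:
                    identified[v] = next(iter(candidates[v]))
                    for w in victims:
                        if w != v:
                            candidates[w].discard(identified[v])
                    changed = True
    return identified


published = [(0, {"x": "c1", "y": "c1", "z": "c2"}),
             (1, {"x": "c3", "y": "c4", "z": "c4"})]
encounters = {0: {"alice": "c1"}, 1: {"bob": "c3"}}
print(active_attack(published, encounters))  # {'bob': 'x', 'alice': 'y'}
```

In the toy run, Alice is met only at time 0, when two traces still match her; she is identified at time 1 because identifying Bob removes trace `x` from her candidate set, which is the delayed identification described above.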

In scenario B1, $T_a(t)$ is fixed because the adversary always stays at one position, whereas in scenario B2, $T_a(t)$ changes as the adversary moves. Experimentally, B2 performs better than B1, because a mobile adversary typically obtains more side information than a stationary one.

##### 4.4. Defense Strategies

There are two types of defense strategies: anonymization and cloaking techniques. Anonymization techniques, which hide the true identities of users, can be implemented in several ways; for example, a user may use one or more pseudonyms [28–31], or a group of users may share the same ID [32]. However, in VSNs the true identity of each vehicle is simply replaced by a random identifier, so we do not discuss anonymization techniques further. Cloaking techniques, such as introducing noise or reducing the recorded data, can also be used to protect users’ privacy. However, cloaking techniques affect the fidelity of the published information, so users’ privacy must be balanced against the data accuracy required by different applications.

In VSNs, the objective of the defender (i.e., the traffic control center or other authorities) is to make it harder for the adversary to identify a trace among the anonymous published traces. The defender can insert bogus traces or delete the recorded data at some sampled times to achieve this objective. Let $\Phi$ be the set of sampled times in the set of published traces $\mathcal{T}$; the defense strategies can then be expressed as follows.

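*Defense 1 (D1)*. $n_b$ bogus traces are inserted into $\mathcal{T}$ to form a new set of traces $\mathcal{T}'$, such that $\mathcal{T} \subset \mathcal{T}'$ and $|\mathcal{T}'| = |\mathcal{T}| + n_b$.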

*Defense 2 (D2)*. $m_d$ sampled times are deleted from $\Phi$ to form a new sequence $\Phi'$, such that $\Phi' \subset \Phi$ and $|\Phi'| = |\Phi| - m_d$.
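Both defenses are simple transformations of the published trace set. A minimal sketch, assuming traces are dictionaries from sampled time to cell ID; bogus traces here pick a uniformly random cell at each sampled time, which is a placeholder assumption (convincing bogus traces would need a realistic mobility model), and all function names are hypothetical.

```python
import random


def sampled_times(traces):
    """Phi: the sorted set of sampled times appearing in any trace."""
    return sorted({t for tr in traces.values() for t in tr})


def insert_bogus_traces(traces, n_b, cells, rng):
    """D1: return a new trace set with n_b bogus traces added.

    traces: dict trace_id -> dict time -> cell. Each bogus trace picks a random
    cell at every sampled time (placeholder; not a realistic mobility model).
    """
    out = dict(traces)
    times = sampled_times(traces)
    while len(out) < len(traces) + n_b:
        tid = f"{rng.getrandbits(64):016x}"  # random identifier, like real ones
        if tid not in out:
            out[tid] = {t: rng.choice(cells) for t in times}
    return out


def delete_sampled_times(traces, m_d, rng):
    """D2: drop m_d sampled times from every trace; returns (new traces, kept times)."""
    times = sampled_times(traces)
    dropped = set(rng.sample(times, m_d))
    new = {tid: {t: c for t, c in tr.items() if t not in dropped}
           for tid, tr in traces.items()}
    return new, [t for t in times if t not in dropped]


rng = random.Random(7)
traces = {"a1": {0: "c1", 30: "c2", 60: "c3"}, "b2": {0: "c4", 30: "c4", 60: "c5"}}
d1 = insert_bogus_traces(traces, n_b=2, cells=["c1", "c2", "c3", "c4", "c5"], rng=rng)
d2, kept = delete_sampled_times(traces, m_d=1, rng=rng)
```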

Theorem 3. *D1 can provide nodes with a higher trace privacy level.*

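*Proof.* Before the execution of D1, the published set contains $N = |\mathcal{T}|$ traces, so the trace privacy levels of nodes satisfy

$$H(T_v \mid S') \le \log_2 N. \qquad (20)$$

After the execution of D1, the published set contains $N + n_b$ traces, so the trace privacy levels of nodes satisfy

$$H(T_v \mid S') \le \log_2 (N + n_b). \qquad (21)$$

Since $\log_2 (N + n_b) > \log_2 N$ for $n_b \ge 1$, D1 raises the attainable trace privacy level, so it provides nodes with a higher trace privacy level.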

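Lemma 4. *If $f$ is a concave function and $X$ is a random variable, then*

$$E[f(X)] \le f(E[X]). \qquad (22)$$

*Proof.* For a two-mass-point distribution, the inequality becomes

$$p_1 f(x_1) + p_2 f(x_2) \le f(p_1 x_1 + p_2 x_2), \qquad (23)$$

which follows directly from the definition of concave functions. Suppose that the lemma is true for distributions with $k - 1$ mass points. Then, writing $p_i' = p_i / (1 - p_k)$ for $i = 1, 2, \dots, k - 1$, we have

$$\sum_{i=1}^{k} p_i f(x_i) = p_k f(x_k) + (1 - p_k) \sum_{i=1}^{k-1} p_i' f(x_i) \le p_k f(x_k) + (1 - p_k) f\Big(\sum_{i=1}^{k-1} p_i' x_i\Big) \le f\Big(p_k x_k + (1 - p_k) \sum_{i=1}^{k-1} p_i' x_i\Big) = f\Big(\sum_{i=1}^{k} p_i x_i\Big), \qquad (24)$$

where the first inequality follows from the induction hypothesis and the second follows from the definition of concavity.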
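Lemma 4 is Jensen’s inequality for concave functions. A quick numerical sanity check (not a proof) over random discrete distributions, using $\log$ and $\sqrt{\cdot}$ as example concave functions:

```python
import math
import random


def jensen_gap(f, xs, ps):
    """f(E[X]) - E[f(X)] for a discrete X with values xs and probabilities ps.

    Lemma 4 says this gap is non-negative whenever f is concave.
    """
    mean = sum(p * x for p, x in zip(ps, xs))
    return f(mean) - sum(p * f(x) for p, x in zip(ps, xs))


rng = random.Random(0)
for _ in range(1000):
    k = rng.randint(2, 6)  # number of mass points
    xs = [rng.uniform(0.1, 10.0) for _ in range(k)]
    weights = [rng.random() + 1e-9 for _ in range(k)]
    ps = [w / sum(weights) for w in weights]
    for concave in (math.log, math.sqrt):
        assert jensen_gap(concave, xs, ps) >= -1e-12  # tolerance for rounding
```

For a linear $f$ the gap is zero, which matches the equality case of concavity.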

Theorem 5. *If the probability density function of is concave over the interval , D2 reduces the probability that the victim’s trace is identified by the adversary.*

*Proof. *Let be the deleted sampled time. When there exists such that , A1 turns into A2. Since the probability density function of is concave over , is concave over . Then, we have

When there is no such , the adversary is always in A2. Then, we have
where .

So, in both cases, the probability that the victim’s trace is identified by the adversary will be reduced.

Notice that it is reasonable to assume that the probability density function of is concave over the interval , because the collected side information about appears more frequently in the vicinity of .

However, these defense strategies cause some loss of data accuracy: the more bogus traces are inserted and the more sampled times are deleted, the less realistic the data become. Hence, we assume that the defense cost is proportional to the number of inserted bogus traces and the number of deleted sampled times, and we capture the application requirements in the defense cost through a parameter $\alpha$. Then, for D1, $c_d$ can be expressed as

$$c_d = \alpha \frac{n_b}{N}, \qquad (27)$$

where $\alpha$ is a system parameter that indicates the requirement of an application, $N$ is the number of all the traces, and $n_b$ is the number of inserted bogus traces.

Similarly, for D2, $c_d$ can be expressed as

$$c_d = \alpha \frac{m_d}{M}, \qquad (28)$$

where $\alpha$ is a system parameter, $M$ is the number of all the sampled times, and $m_d$ is the number of deleted sampled times.
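The effect of $\alpha$ on the defender’s choice can be illustrated with a toy calculation. This sketch assumes a defender payoff of the form $u_d = -(\text{identification probability}) - c_d$ with the D1 cost of (27), and uses hypothetical identification probabilities for three bogus-trace fractions; these numbers are placeholders, not results from the paper.

```python
def defense_cost(alpha, altered, total):
    """Cost in the style of (27)/(28): alpha times the fraction of data altered."""
    return alpha * altered / total


# Hypothetical adversary success probabilities (NOT the paper's results),
# keyed by the fraction n_b / N of bogus traces the defender inserts.
SUCCESS = {0.0: 0.80, 0.5: 0.50, 1.0: 0.35}


def best_defense(alpha, n_traces=1000):
    """Bogus-trace fraction maximising the ASSUMED payoff u_d = -success - c_d."""
    def payoff(frac):
        n_b = int(frac * n_traces)
        return -SUCCESS[frac] - defense_cost(alpha, n_b, n_traces)
    return max(SUCCESS, key=payoff)


for alpha in (0.1, 0.5, 2.0):
    print(alpha, best_defense(alpha))
```

With these placeholder numbers, the best choice moves from inserting the most bogus traces (small $\alpha$) to an intermediate amount and finally to doing nothing (large $\alpha$), mirroring the qualitative trend the paper reports.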

#### 5. Analysis of Attack and Defense Game

In this section, we study both the passive and active attack scenarios. In the passive attack scenarios (A1 and A2), we consider games of both complete and incomplete information. Because the adversary obtains the victims’ positions accurately in the active attack scenarios (B1 and B2), the defender does not need to infer the side information the adversary has obtained; therefore, we only establish the complete information game in scenarios B1 and B2.

The traffic data used in the experiments contain mobility traces of taxis in Beijing, China: GPS coordinates of approximately 28,000 taxis collected over one month [33]. The location updates are fine-grained: the average time interval between two consecutive location updates is less than 30 s. In the experiments, the adversary tries to identify the trace of one participant (randomly picked from all the participants) by gathering side information. Noisy, randomly sampled ⟨time, location⟩ pairs from the victim’s trace are revealed to the adversary as side information, which the adversary uses to identify the victim’s complete movement history from the anonymous traces. The defender inserts bogus traces (D1) or deletes the recorded data at some sampled times (D2) to counter the attack strategies implemented by the adversary.

##### 5.1. Games for Scenario A1

The adversary has four attack strategies: MLE, MSQ, BAS, and EXP. The defender has two defense strategies, D1 and D2. In strategy D1, we set , where is the total number of published traces. In strategy D2, we set , where is the total number of sampled times.
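The two defense strategies can be sketched as operations on a set of published traces. How bogus traces are generated is not specified here, so the sketch assumes uniformly random positions inside a bounding box at the existing sampled times, and assumes that D2 removes the same randomly chosen sampled times from every trace. The helper names and the `bogus-i` IDs are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Trace = Dict[int, Point]  # sampled time -> (x, y) position


def sampled_times(traces: Dict[str, Trace]) -> List[int]:
    """All sampled times that appear in any trace, in ascending order."""
    return sorted({t for trace in traces.values() for t in trace})


def insert_bogus_traces(traces, count, area, rng):
    """D1 sketch: add `count` bogus traces with uniformly random positions.

    area = ((xmin, xmax), (ymin, ymax)); IDs "bogus-i" are hypothetical.
    """
    (xmin, xmax), (ymin, ymax) = area
    times = sampled_times(traces)
    out = dict(traces)
    for i in range(count):
        out[f"bogus-{i}"] = {t: (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
                             for t in times}
    return out


def delete_sampled_times(traces, count, rng):
    """D2 sketch: drop the records at `count` randomly chosen sampled times."""
    dropped = set(rng.sample(sampled_times(traces), count))
    return {tid: {t: p for t, p in trace.items() if t not in dropped}
            for tid, trace in traces.items()}


rng = random.Random(0)
traces = {"a": {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0)}}
print(len(insert_bogus_traces(traces, 2, ((0, 10), (0, 10)), rng)))  # 3
print(len(delete_sampled_times(traces, 1, rng)["a"]))                # 2
```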

###### 5.1.1. Analysis of Complete Information Game

The strategic form of the complete information game in A1 is depicted as payoff matrices in Table 2. Both players choose their strategies simultaneously and have common knowledge of the side information. We assume that the side information contains 18 pairs of *time, location*. In particular, since the adversary is still in scenario A1 when the defender uses strategy D2, side information that does not fall at sampled times is not used in calculating the payoff.

We observe that the Nash equilibria depend on the value of the defense cost in this game. When the defense cost is zero, that is, , is a pure-strategy Nash equilibrium in the left strategic form, and is a pure-strategy Nash equilibrium in the right strategic form. When , from (27) and (28), we have , . For the left strategic form, the Nash equilibrium is if and if . For the right strategic form, the Nash equilibrium is if and if . In other words, or is the optimal defense strategy when is small, whereas or becomes the optimal strategy when is very large. This is because a larger value of the application-requirement parameter means a higher required data accuracy, which restricts the defense measures.
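Finding the pure-strategy Nash equilibria of a strategic form such as Table 2 amounts to checking, cell by cell, that neither player can gain by deviating unilaterally. The sketch below does this for any two-player bimatrix game; the 2x2 payoffs in the usage example are made up for illustration and are not the values in Table 2.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def pure_nash_equilibria(adv: Matrix, dfn: Matrix) -> List[Tuple[int, int]]:
    """All pure-strategy Nash equilibria (row, column) of a bimatrix game.

    adv[i][j] / dfn[i][j]: payoffs to the adversary (rows) and the defender
    (columns) when the adversary plays i and the defender plays j.
    """
    rows, cols = len(adv), len(adv[0])
    equilibria = []
    for i in range(rows):
        for j in range(cols):
            # Adversary cannot improve by switching rows given column j.
            adv_best = all(adv[i][j] >= adv[k][j] for k in range(rows))
            # Defender cannot improve by switching columns given row i.
            dfn_best = all(dfn[i][j] >= dfn[i][m] for m in range(cols))
            if adv_best and dfn_best:
                equilibria.append((i, j))
    return equilibria


# Hypothetical 2x2 game: rows = two attack strategies, columns = two defenses.
adv = [[0.6, 0.4], [0.7, 0.5]]
dfn = [[-0.6, -0.4], [-0.7, -0.5]]
print(pure_nash_equilibria(adv, dfn))  # [(1, 1)]
```

A game may have no pure-strategy equilibrium at all (matching pennies is the classic case), in which case the function returns an empty list and mixed strategies must be considered.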

###### 5.1.2. Analysis of Incomplete Information Game

Generally, the defender does not know how much side information the adversary has obtained. To address this problem, Harsanyi [27] introduced a new player named Nature, which turns an incomplete information game into a game with complete but imperfect information. We assume that Nature chooses side information containing 10 pairs of *time, location* with probability , 15 pairs with probability , and 20 pairs with probability , as shown in Table 3. We observe that MSQ is a strictly dominant strategy for the adversary. The Nash equilibria are and if .
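Harsanyi’s transformation can be illustrated numerically: weighting the payoff matrix of each side-information type by the probability Nature assigns to it gives the ex-ante expected payoffs used by a player who does not observe the type, such as the defender here. The sketch also checks whether one attack strategy (row) strictly dominates all others in the averaged matrix. The per-type matrices and probabilities are hypothetical, and the sketch simplifies by letting the adversary play the same strategy for every type.

```python
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def expected_payoffs(probs: Sequence[float], per_type: Sequence[Matrix]) -> Matrix:
    """Weight each type's payoff matrix by Nature's probability for that type."""
    if len(probs) != len(per_type) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("need one probability per type, summing to 1")
    rows, cols = len(per_type[0]), len(per_type[0][0])
    return [[sum(p * m[i][j] for p, m in zip(probs, per_type))
             for j in range(cols)] for i in range(rows)]


def strictly_dominant_row(payoff: Matrix) -> Optional[int]:
    """Index of a row strictly better than every other row in every column."""
    rows, cols = len(payoff), len(payoff[0])
    for i in range(rows):
        if all(payoff[i][j] > payoff[k][j]
               for k in range(rows) if k != i for j in range(cols)):
            return i
    return None


# Hypothetical adversary payoffs for side information of 10, 15 and 20 pairs.
probs = [0.2, 0.5, 0.3]
per_type = [[[0.3, 0.2], [0.4, 0.3]],
            [[0.5, 0.4], [0.6, 0.5]],
            [[0.7, 0.6], [0.8, 0.7]]]
expected = expected_payoffs(probs, per_type)
print(strictly_dominant_row(expected))  # 1
```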

When , we study which defense strategies are optimal at different values of . As shown in Figure 2, the maximum point of each line corresponds to the optimal defense strategy. For example, in Table 3(c), the optimal defense strategy is if , if and if ; in Table 3(d), it is if and if . In other words, when is small, inserting more bogus traces or deleting more recorded samples performs better, whereas when is very large, doing nothing becomes the optimal strategy.
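Reading the optimal defense off curves such as those in Figure 2 amounts to maximizing the defender’s utility over the available defense levels for each value of the application-requirement parameter. The sketch below assumes the utility is a privacy gain minus a cost linear in the modified fraction; the levels and gains are invented for illustration and only reproduce the qualitative pattern described above.

```python
from typing import Sequence


def best_defense(levels: Sequence[float], privacy_gain: Sequence[float],
                 requirement: float) -> float:
    """Return the defense level with the highest (hypothetical) utility.

    levels       -- fraction of data modified; 0.0 means "do nothing".
    privacy_gain -- defender's gain at each level against the adversary's
                    best response (illustrative numbers).
    requirement  -- application-requirement parameter scaling the cost.
    Utility is ASSUMED to be privacy_gain - requirement * level.
    """
    if not levels or len(levels) != len(privacy_gain):
        raise ValueError("levels and privacy_gain must be non-empty and aligned")
    utilities = [g - requirement * f for f, g in zip(levels, privacy_gain)]
    best = max(range(len(levels)), key=utilities.__getitem__)
    return levels[best]


levels = [0.0, 0.25, 0.5]
gains = [0.1, 0.3, 0.4]
for req in (0.0, 0.5, 2.0):
    print(req, best_defense(levels, gains, req))
# 0.0 0.5
# 0.5 0.25
# 2.0 0.0
```

As the requirement parameter grows, the optimum slides from the strongest modification toward doing nothing, mirroring the trend reported for Table 3.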

###### 5.1.3. Discussion

In the complete information game, the adversary prefers the strategy MSQ. When the cost of defense measures is zero or very low, the defense measures are more effective when more bogus traces are inserted or more sampled times are deleted, and deleting sampled times is more effective than inserting bogus traces. When the defense cost is high, the defender prefers to do nothing.

In the incomplete information game, the adversary also prefers the strategy MSQ, while the optimal defense strategies depend on the values of . If the defense cost is small, that is, when , the optimal defense strategy is , and when , the optimal defense strategy is . If the defense cost is high, that is, when , the optimal defense strategy is , and when , the optimal defense strategy is .

##### 5.2. Games for Scenario A2

In scenario A2, the adversary has four attack strategies: MLE_{2}, MSQ_{2}, BAS_{2}, and EXP_{2}, and the defender has two defense strategies. In addition, we assume that , and .

###### 5.2.1. Analysis of Complete Information Game

The strategic form of the complete information game in A2 is shown in Table 4. The defender knows the side information obtained by the adversary, which contains 18 pairs of *time, location*. We observe that the Nash equilibria again depend on the value of the defense cost . When , the pure-strategy Nash equilibrium is in Table 4(a) and in Table 4(b). When , the Nash equilibrium in Table 4(a) is if and if . In Table 4(b), the Nash equilibrium is if and if . Comparing Table 2 and Table 4, we observe that the attacks perform better in A2 than in A1, because the adversary can use the side information that falls between two consecutive sampled times.
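One simple way an adversary could exploit side information that falls between two consecutive sampled times is to interpolate the victim’s position linearly. The sketch below is our assumption about such a step, not necessarily the method used for A2; times before the first or after the last sampled time are clamped to the nearest sample.

```python
from bisect import bisect_left
from typing import Dict, Tuple

Point = Tuple[float, float]


def interpolate(trace: Dict[int, Point], t: float) -> Point:
    """Estimate the position at time t from a trace of sampled positions.

    Linear interpolation between the two surrounding sampled times; outside
    the recorded interval the first or last sample is returned.
    """
    if not trace:
        raise ValueError("empty trace")
    times = sorted(trace)
    if t <= times[0]:
        return trace[times[0]]
    if t >= times[-1]:
        return trace[times[-1]]
    k = bisect_left(times, t)           # times[k - 1] < t <= times[k]
    t0, t1 = times[k - 1], times[k]
    if t == t1:
        return trace[t1]                # exact sample, no rounding error
    w = (t - t0) / (t1 - t0)
    (x0, y0), (x1, y1) = trace[t0], trace[t1]
    return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))


trace = {0: (0.0, 0.0), 10: (10.0, 20.0)}
print(interpolate(trace, 5))  # (5.0, 10.0)
```

The interpolated point can then be compared with a side-information pair exactly as a recorded sample would be.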

###### 5.2.2. Analysis of Incomplete Information Game

Table 5 depicts the incomplete information game in scenario A2. MLE_{2} is a strictly dominant strategy for the adversary. and are Nash equilibria when .

Figure 3 depicts the optimal defense strategies at different values of . The maximum point of each line corresponds to the optimal defense strategy. In Figure 3(a), the optimal defense strategy is if and if , and if and if in Figure 3(b). In Figure 3(c), the optimal defense strategy is if , if and if , and if and if in Figure 3(d). In Figure 3(e), the optimal defense strategy is if and if , and if and if in Figure 3(f). Comparing Figure 2 and Figure 3, we again observe that the attacks perform better in A2 than in A1.

###### 5.2.3. Discussion

In the A2 complete information game, the adversary prefers the strategy MLE_{2}. When the cost of defense measures is zero or very low, the defense measures are again more effective when more bogus traces are inserted or more sampled times are deleted. Deleting sampled times performs worse in A2 than in A1, but still better than inserting bogus traces. When the defense cost is high, the defender again prefers to do nothing.

In the A2 incomplete information game, the adversary also prefers the strategy MLE_{2}, while the optimal defense strategies depend on the values of . If the defense cost is small, that is, when , the optimal defense strategy is , and when , the optimal defense strategy is . If the defense cost is high, that is, when , the optimal defense strategy is , and when , the optimal defense strategy is .

##### 5.3. Games for Scenarios B1 and B2

Because the goal of the adversary is to identify as many traces as possible, the adversary does not need to use any inference strategy. We assume that the attack strategies are B1 and B2. The defender has two defense strategies; we set in D1 and in D2. Table 6 depicts the strategic form of the complete information game in scenarios B1 and B2, where the attack algorithm is performed for 60, 100, 400, and 600 minutes. When , the Nash equilibria are and if the attack algorithm is performed for 60, 100, or 400 minutes, and and if it is performed for 600 minutes.

When , the optimal defense strategies at different values of for different attack durations are depicted in Figure 4. The maximum point of each line corresponds to the optimal defense strategy. From Figures 4(a)–4(h), we observe that the optimal defense strategy is when , when , when , when , and when , when .

In the complete information game, the adversary prefers B2 if the attack algorithm runs for a short time and B1 if it runs for a long time. This is because, over a long period, an adversary at one position is also likely to meet other victims. When , the defense measures are more effective if more bogus traces are inserted or more sampled times are deleted. When , the optimal defense strategies depend on the value of .

#### 6. Conclusion

In this paper, we analyze data privacy aspects of VSNs by using a game-theoretic model. We first quantify the attack power and the defense objectives so that their performance can be recorded and compared. Then, we define an attack and defense game model that captures the strategy selection behavior of the adversary and the defender. We also show the effectiveness of the defense strategies. Finally, we establish and analyze complete information and incomplete information games for passive and active attack scenarios based on real-world traffic data. The analysis results show that attack strategies perform differently in different scenarios. Inserting more bogus traces or deleting more recorded samples performs better when the cost of defense measures is small, whereas doing nothing becomes the best strategy when the cost of defense measures is very large. We also present the optimal defense strategy that provides the defender with the maximum utility when the adversary implements the optimal attack strategy. Therefore, our analysis results are useful for designing appropriate privacy protection mechanisms.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

The authors would like to thank Xiang Lu from the Institute of Information Engineering, Chinese Academy of Science, for helping revise the typos and grammar mistakes throughout this paper. This study is supported in part by the China 973 under Grant no. 2011CB302902, the NSF China Major Program (60933011, 61202099), National High-Tech R&D Program of China (863) under Grant no. 2013AA011102, Scientific and Technological Pilot Project under Grant no. XDA06040100.

#### References

- C. Zhang, R. Lu, X. Lin, P. H. Ho, and X. Shen, “An efficient identity-based batch verification scheme for vehicular sensor networks,” in *Proceedings of the 27th IEEE Conference on Computer Communications (INFOCOM '08)*, Phoenix, Ariz, USA, April 2008.
- Google Map, “Collection of anonymous location data,” https://support.google.com/gmm/answer/2839958?hl=en&ref_topic=2839910.
- K. Mershad and H. Artail, “A framework for secure and efficient data acquisition in vehicular Ad Hoc networks,” *IEEE Transactions on Vehicular Technology*, vol. 62, no. 2, pp. 536–551, 2013.
- C. Y. T. Ma, D. K. Y. Yau, N. K. Yip, and N. S. V. Rao, “Privacy vulnerability of published anonymous mobility traces,” in *Proceedings of the 16th Annual International Conference on Mobile Computing and Networking (MobiCom '10)*, pp. 185–196, 2010.
- X. Lin, R. Lu, C. Zhang, H. Zhu, P. H. Ho, and X. Shen, “Security in vehicular Ad Hoc networks,” *IEEE Communications Magazine*, vol. 46, no. 4, pp. 88–95, 2008.
- M. A. Razzaque, S. A. Salehi, and S. M. Cheraghi, “Security and privacy in vehicular Ad-Hoc networks: survey and the road ahead,” in *Wireless Networks and Security*, Signals and Communication Technology, pp. 107–132, Springer, New York, NY, USA, 2013.
- Y. A. de Montjoye, C. A. Hidalgo, M. Verleysen, and V. D. Blondel, “Unique in the crowd: the privacy bounds of human mobility,” *Scientific Reports*, vol. 3, 2013.
- B. Hoh, M. Gruteser, H. Xiong, and A. Alrabady, “Achieving guaranteed anonymity in GPS traces via uncertainty-aware path cloaking,” *IEEE Transactions on Mobile Computing*, vol. 9, no. 8, pp. 1089–1107, 2010.
- M. Gruteser and D. Grunwald, “Anonymous usage of location-based services through spatial and temporal cloaking,” in *Proceedings of the 1st International Conference on Mobile Systems, Applications and Services (MobiSys '03)*, pp. 31–42, San Francisco, Calif, USA, 2003.
- R. Lu, X. Lin, T. H. Luan, X. Liang, and X. Shen, “Pseudonym changing at social spots: an effective strategy for location privacy in VANETs,” *IEEE Transactions on Vehicular Technology*, vol. 61, no. 1, pp. 86–96, 2012.
- A. Narayanan and V. Shmatikov, “Robust de-anonymization of large sparse datasets,” in *Proceedings of the IEEE Symposium on Security and Privacy (SP '08)*, pp. 111–125, Oakland, Calif, USA, May 2008.
- L. Huang, K. Matsuura, H. Yamane, and K. Sezaki, “Towards modeling wireless location privacy,” in *Proceedings of the 5th International Conference on Privacy Enhancing Technologies (PET '05)*, pp. 59–77, Vigo, Spain, 2005.
- R. Shokri, G. Theodorakopoulos, C. Troncoso, J. P. Hubaux, and J. Y. le Boudec, “Protecting location privacy: optimal strategy against localization attacks,” in *Proceedings of the 19th ACM Conference on Computer and Communications Security*, Raleigh, NC, USA, October 2012.
- M. E. Nergiz, M. Atzori, Y. Saygin, and G. Bariş, “Towards trajectory anonymization: a generalization-based approach,” *Transactions on Data Privacy*, vol. 2, no. 1, pp. 47–75, 2009.
- T. Alpcan and T. Başar, “A game theoretic analysis of intrusion detection in access control systems,” in *Proceedings of the 43rd IEEE Conference on Decision and Control (CDC '04)*, pp. 1568–1573, December 2004.
- T. Alpcan and T. Başar, “An intrusion detection game with limited observations,” in *Proceedings of the 12th International Symposium on Dynamic Games and Applications*, July 2006.
- J. Grossklags, N. Christin, and J. Chuang, “Secure or insure? A game-theoretic analysis of information security games,” in *Proceedings of the 17th International Conference on World Wide Web (WWW '08)*, pp. 209–218, April 2008.
- M. Raya, M. H. Manshaei, M. Félegyházi, and J. P. Hubaux, “Revocation games in ephemeral networks,” in *Proceedings of the 15th ACM Conference on Computer and Communications Security (CCS '08)*, pp. 199–210, October 2008.
- S. Reidt, M. Srivatsa, and S. Balfe, “The fable of the bees: incentivizing robust revocation decision making in ad hoc networks,” in *Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS '09)*, pp. 291–302, November 2009.
- T. Alpcan and S. Buchegger, “Security games for vehicular networks,” *IEEE Transactions on Mobile Computing*, vol. 10, no. 2, pp. 280–290, 2011.
- J. Freudiger, M. H. Manshaei, J. P. Hubaux, and D. C. Parkes, “On non-cooperative location privacy: a game-theoretic analysis,” in *Proceedings of the ACM Conference on Computer and Communications Security (CCS '09)*, pp. 324–337, Chicago, Ill, USA, November 2009.
- T. Chim, S. Yiu, L. Hui, and V. Li, “VSPN: VANET-based secure and privacy-preserving navigation,” *IEEE Transactions on Computers*, 2012.
- Dedicated Short Range Communications (DSRC) Home, http://www.leearmstrong.com/DSRC/DSRCHomeset.htm.
- C. Zhang, X. Lin, R. Lu, P. H. Ho, and X. Shen, “An efficient message authentication scheme for vehicular communications,” *IEEE Transactions on Vehicular Technology*, vol. 57, no. 6, pp. 3357–3368, 2008.
- H. Nissenbaum, “Privacy as contextual integrity,” *Washington Law Review*, vol. 79, no. 1, pp. 119–158, 2004.
- D. Fudenberg and J. Tirole, *Game Theory*, MIT Press, Cambridge, Mass, USA, 1991.
- J. C. Harsanyi, “Games with incomplete information played by “Bayesian” players, I–III. Part I. The basic model,” *Management Science*, vol. 14, no. 3, pp. 159–182, 1967.
- Y. Sun, R. Lu, X. Lin, X. Shen, and J. Su, “An efficient pseudonymous authentication scheme with strong privacy preservation for vehicular communications,” *IEEE Transactions on Vehicular Technology*, vol. 59, no. 7, pp. 3589–3603, 2010.
- D. Huang, S. Misra, G. Xue, and M. Verma, “PACP: an efficient pseudonymous authentication-based conditional privacy protocol for VANETs,” *IEEE Transactions on Intelligent Transportation Systems*, vol. 12, no. 3, pp. 736–746, 2011.
- H. Lu, J. Li, and M. Guizani, “A novel ID-based authentication framework with adaptive privacy preservation for VANETs,” in *Proceedings of the Computing, Communications and Applications Conference (ComComAp '12)*, pp. 345–350, Hong Kong, China, January 2012.
- “IEEE draft standard for wireless access in vehicular environments—security services for applications and management messages,” IEEE P1609.2-2013, 2013.
- X. Lin, X. Sun, P. H. Ho, and X. Shen, “GSIS: a secure and privacy-preserving protocol for vehicular communications,” *IEEE Transactions on Vehicular Technology*, vol. 56, no. 6, pp. 3442–3456, 2007.
- Datatang Company, *Taxi GPS data of one city in North of China (200903)*, http://www.datatang.com/data/2987.