Abstract

In spatiotemporal crowdsourcing applications, the sensing data uploaded by participants usually contain sensitive spatiotemporal information. If application servers publish the unprocessed sensing data directly, participants’ privacy is easily exposed. In addition, application servers usually adopt a static publishing mechanism, which tends to cause poor timeliness and large information loss in spatiotemporal crowdsourcing applications. Therefore, this paper proposes a spatiotemporal privacy protection (STPP) method based on dynamic clustering to protect the privacy of crowd participants in spatiotemporal crowdsourcing systems. First, the working principles of a dynamic privacy protection mechanism are introduced. Then, the spatiotemporal sensitive data are anonymized based on k-anonymity and l-diversity. In addition, a dynamic k-anonymity algorithm that reuses the previous anonymization results is designed. Extensive performance evaluation on real-world data shows that, compared with existing methods, the proposed STPP algorithm alleviates the problem of poor timeliness and improves the level of privacy protection while reducing the information loss of sensing data.

1. Introduction

With the widespread use of wireless communication technologies and smart mobile terminals, location-based services (LBS) are becoming more and more popular [1, 2]. In many spatiotemporal crowdsourcing applications, participants receive corresponding rewards by submitting their own sensing tasks to crowdsourcing application servers [3]. However, the submitted sensing data contain the participants’ spatiotemporal data [4, 5]. If the crowdsourcing application server publishes these spatiotemporal data without processing, the participant’s privacy information will be obtained by attackers [6, 7]. More importantly, attackers can infer the participant’s recent medical service or entertainment venue by locating his spatiotemporal information to understand his health status, preferences, time, and scope of the outing [8]. Therefore, in a spatiotemporal crowdsourcing application, it is especially important to protect the spatiotemporal information of participants. The privacy protection technology based on spatiotemporal crowdsourcing has also become a research hotspot in the field of spatiotemporal crowdsourcing systems [9].

In order to ensure that participants’ private spatiotemporal information is not leaked when data are published, a large body of work on spatiotemporal privacy protection is devoted to perturbing and anonymizing the spatiotemporal data that may reveal personal whereabouts. To et al. [10] proposed a protection framework based on differential privacy. The workers in spatiotemporal crowdsourcing first submit their real location information to a trusted mobile service provider, which uses a grid-based method to construct private spatial decompositions (PSDs) of the original location information and adds Laplace noise to the workers’ real location data for privacy protection. Vu et al. [11] proposed a privacy protection mechanism that groups participants’ positions by locality-sensitive hashing. Each group contains at least k participants to achieve spatial anonymity. The method realizes the ideal partition of spatial data at low time complexity and protects participants’ location information in spatiotemporal crowdsourcing scenarios. The work in [12] addresses the insufficient diversity of the k-anonymity algorithm with respect to participants’ sensitive locations: location diversity is ensured either by limiting the probability that participants visit sensitive locations or by a probability analysis based on adversary knowledge.

However, most existing work only considers data publishing in static scenarios. When data are published statically, attackers can use historical publishing results to reveal sensitive information, for example, by comparing the current release with the previous one. In spatiotemporal crowdsourcing applications, many data analysis tasks actually involve dynamic data publishing. For example, in order to plan travel routes for special vehicles (cash trucks, ambulances, fire engines, etc.), it is necessary to issue a sensing task to collect road traffic congestion information [13, 14]. For such a spatiotemporally sensitive task, the application server needs to dynamically publish the sensing data submitted by participants to improve the timeliness of the task. Dynamic sensing data change constantly over time, so it is often necessary to anonymize and publish sensing data at different times. However, most anonymity algorithms are ineffective when dealing with the dynamic publishing of spatiotemporal data [15]: the previous anonymity result cannot be effectively utilized, and in big data scenarios the time complexity of the algorithms is high and their timeliness is poor [16]. Moreover, most researchers have proposed privacy preservation for participants’ location information but have failed to consider that attackers can also infer other private information from participants’ spatiotemporal information.

To address these problems, we study the protection of spatiotemporal privacy information in spatiotemporal crowdsourcing systems. The following issues need further improvement:
(1) In the process of dynamic data publishing, the previous anonymization results should be effectively reused, instead of re-anonymizing the incremental data together with the previous data, to improve the timeliness of dynamic publishing of big data.
(2) In the process of anonymizing the location attribute of participants, the time attribute should be added to effectively resist the background knowledge attack and the homogeneity attack against the location attribute.

In order to solve the above problems, we propose a spatiotemporal privacy protection method for spatiotemporal crowdsourcing systems. The contributions of this paper are as follows:
(1) A dynamic publishing algorithm based on spatiotemporal data privacy protection is designed by improving k-anonymity. When incremental data arrive, the previous anonymization result is reused to solve the timeliness problem of dynamic publishing.
(2) Based on the traditional position coordinates, a time axis is added to form the spatiotemporal information of participants, and the participants’ spatiotemporal information is anonymized by applying the k-anonymity and l-diversity methods, so as to resist the background knowledge attack and the homogeneity attack.
(3) In order to verify the effectiveness of the proposed privacy protection method, comparison experiments with the k-anonymity and variable centroid location aggregation (VCLA) [17] algorithms are conducted on two real-world datasets.

The structure of the paper is as follows. Section 2 introduces the related works of spatiotemporal privacy protection. Section 3 introduces the proposed spatiotemporal privacy protection method for spatiotemporal crowdsourcing systems. In Section 4, the real-world datasets and the existing anonymity algorithms are used for evaluating the performance of the proposed method. Section 5 concludes the paper and presents the future work.

2. Related Works

In this section, we will introduce the related works about privacy protection methods for spatiotemporal data and dynamic publishing of sensitive data under a participatory sensing environment. Participatory sensing (PS) refers to the formation of a mobile Internet through daily mobile devices, where data are sensed, collected, analyzed, or screened by the public and professional users and then uploaded to the participatory sensing network [18]. With the popularization of mobile terminals and the rapid development of wireless sensor technology, the application of PS is becoming more and more common in real life. For example, in [19], Chen et al. studied the energy-efficient task offloading in mobile edge computing (MEC). However, in the process of task offloading, the privacy of participants will be exposed. In order to deal with the problems that participants’ privacy will be exposed during the task offloading process, Xu et al. [20] put forward a two-phase offloading optimization strategy for joint optimization of offloading utility and privacy in edge computing. Further, Xu et al. [21] discussed the problem that transmitted information is vulnerable to attack and may cause incomplete data during task offloading. A blockchain-enabled computation offloading method was proposed to ensure data integrity. In the implementation process of these participatory sensing applications, sensing tasks uploaded by participants will mark personal spatiotemporal data, which brings great risks to the privacy security and personal safety of participants. Therefore, while people enjoy the convenience brought by LBS, their privacy is also at risk of being exposed [22].

In LBS, the use of anonymization technology to solve the location privacy problem of participants has been widely studied [23]. The k-anonymity technology was first proposed by Samarati and Sweeney [24]. The parameter k specifies the maximum risk of information disclosure that users can bear. It requires at least k records that are indistinguishable on the quasi-identifier in the published data, so that attackers cannot identify the specific individual to whom the private information belongs. In [25], a clustering-based k-anonymity strategy is adopted to prevent the privacy disclosure of wearable owners when they upload sensing data. In [26], a k-anonymous location privacy protection method based on coordinate transformation was proposed to address the problem that the trusted third party (TTP) is often untrustworthy in real life [27]. The anonymous server receives the coordinate-converted participant location and constructs an anonymous area without knowing the user’s actual location, thereby protecting the participant’s location privacy. In [28], the optimal k value of the current user is determined according to the user’s environment and social attributes, and a location protection k-anonymity method based on the trust chain was proposed to protect the location privacy of participants while ensuring the quality of service.

However, k-anonymity cannot cope with the background knowledge attack and the homogeneity attack. Machanavajjhala et al. [29] first proposed l-diversity to improve k-anonymity. Each k-anonymity group in the published data table contains at least l different sensitive attribute values, so that the probability that an attacker infers the private information of a certain record does not exceed $1/l$. In [30], considering the identity attributes of participants, it is ensured that each anonymous set has at least k participants and p different sensitive values. In [31], k-anonymity and l-diversity were adopted as privacy models, and an anonymization method based on genetic algorithm clustering was proposed. The basic operators of the genetic algorithm are improved to protect the personal sensitive information contained in the published report.
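To make the two conditions concrete, the following minimal Python sketch checks whether a published table satisfies k-anonymity and distinct l-diversity. The function name, the record layout (a quasi-identifier paired with a sensitive value), and the sample table are illustrative assumptions and are not taken from the cited works.

```python
from collections import defaultdict


def check_k_anonymity_l_diversity(records, k, l):
    """Return True if every quasi-identifier group has at least k records
    and at least l distinct sensitive values (distinct l-diversity)."""
    groups = defaultdict(list)
    for quasi_id, sensitive in records:
        groups[quasi_id].append(sensitive)
    return all(
        len(values) >= k and len(set(values)) >= l
        for values in groups.values()
    )


# Hypothetical published table: (generalized location cell, sensitive value).
published = [
    ("cell-A", "hospital"), ("cell-A", "cinema"), ("cell-A", "hospital"),
    ("cell-B", "gym"), ("cell-B", "gym"), ("cell-B", "gym"),
]

# Both groups hold 3 records, so the table is 3-anonymous.
print(check_k_anonymity_l_diversity(published, k=3, l=1))  # True
# "cell-B" has a single sensitive value, so it is not 2-diverse.
print(check_k_anonymity_l_diversity(published, k=3, l=2))  # False
```

The second call illustrates the homogeneity attack: a group can satisfy k-anonymity while still revealing the sensitive value shared by all of its members.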

However, when LBS services are requested, the location of most participants is always related to time [32]. The above works only protect the location attribute of participants and do not associate it with the time attribute. A trajectory is the sequence of a user’s location information over a continuous period of time, and trajectory anonymity protects the user’s location attribute and time attribute together. In [33], a trajectory privacy protection method based on user demand was proposed. By dividing different time intervals and setting different privacy protection parameters for different trajectories, the anonymous trajectory equivalence class is constructed. In [34], the Hilbert curve was used to extract the distribution characteristics of trajectory data at each time, and a personalized differential privacy publishing mechanism was designed according to individual needs for different degrees of privacy. In [35], a collaborative trajectory privacy protection scheme for continuous queries was proposed, which confuses attackers by issuing false queries and thereby obscures users’ actual trajectories. In [36], a trajectory privacy protection algorithm based on trajectory shape diversity was proposed by combining k-anonymity and l-diversity to solve the trajectory privacy leakage problem that may be caused by the high similarity between trajectories in the anonymous set.

In the research of privacy protection data publishing (PPDP), the first proposed model was mainly used for static publishing, that is, only considering the one-time publishing of data, and the above research was mainly conducted for the static data publishing [37]. However, in many spatiotemporal crowdsourcing applications, a large amount of data stays in a changing state, and dynamic data publishing occurs from time to time [38]. In order to solve the problem that static publishing cannot resist link attacks and critical missing attacks, Wang and Fung [39] firstly studied the possible privacy leaks of data redistribution and proposed a method to prevent privacy leakage. The main idea of this method is to hide the true connection relationship between the two publishing versions, thereby weakening the global quasi-identifier. Xiao [40] firstly proposed the privacy protection model m-invariance for dynamic data publishing, whose key is to introduce pseudogeneralization technology to ensure that any QI group records in different data publishing versions have the same sensitive attribute value. In [41], because of the problem that the privacy protection association rule mining algorithm is not applicable to the dynamic change database, the incremental privacy protection data mining algorithm based on granularity calculation was proposed, and the incremental update algorithm was used to solve the problem of frequent item set calculation of incremental transaction database. In [42], a differential privacy histogram publishing method based on fractal dimension mining technology was proposed. The method used fractal dimension to cluster datasets and counted the values of each class. Laplace noise was added to data before publishing to achieve differential privacy. 
However, the above methods cannot cope with the privacy protection for spatiotemporal information, and it is difficult to adapt to the issue of dynamic privacy protection in spatiotemporal crowdsourcing applications. Even if the above methods consider participants’ location information, attackers could also infer participants’ privacy through time information. More importantly, the above methods are invalid for real-time data tasks.

Based on the above discussions, a dynamic publishing method for spatiotemporal privacy protection under the participatory sensing environment is proposed. By combining k-anonymity and l-diversity, the proposed dynamic publishing method could protect the privacy information of participants and reduce the information loss.

3. Dynamic Privacy Protection Algorithm

In this section, a dynamic privacy protection mechanism for spatiotemporal sensitive information is researched, and the working principles of the three main parts of the dynamic privacy protection mechanism are introduced. The proposed algorithms and corresponding explanations are given through an example.

3.1. Dynamic Privacy Protection Mechanism

The mechanism is divided into three parts: participants, TTPs, and the application server.
(i) Participants: in spatiotemporal crowdsourcing applications, participants are responsible for collecting and uploading sensing data [43]. The sensing data uploaded by each participant include the following attributes: the identity attribute of the participant, the completed sensing tasks uploaded by the participant, and the real-time attribute and location attribute contained in those tasks. The time and location attributes are sensitive and need to be anonymized. In the dynamic privacy protection publishing mechanism, participants submit sensing data in batches.
(ii) TTPs: in this mechanism, participants first upload sensing data to TTPs, and the TTPs preprocess the sensing data to extract the sensitive data (i.e., participants’ real spatiotemporal data) [44]. Then, the real spatiotemporal data are anonymized by using k-anonymity. The sensing data that do not satisfy the anonymity condition are stored in the buffer pool and anonymized together with the next incremental data. More importantly, when incremental data arrive, the previous anonymity results are reused: data that satisfy the adaptive threshold are added to the corresponding equivalence classes. Finally, the cluster center values are sent to the application server.
(iii) Application server: to avoid the background knowledge attack and the homogeneity attack against k-anonymity, the cluster center values need to be clustered again based on the idea of l-diversity. The application server anonymizes them according to the time attribute. Each cluster contains at least l cluster center values, and the newly generated cluster center values are then published.

After anonymization on the TTPs and the application server, the results are shown in Table 1, where r and c represent the number of position clusters and time clusters, respectively. For each cluster, Table 1 also gives the number of spatiotemporal sensitive data in the ith cluster and the spatiotemporal sensitive data that the cluster contains.

The dynamic publishing privacy protection mechanism proposed in this paper differs from the traditional spatiotemporal crowdsourcing process. In traditional spatiotemporal crowdsourcing, requesters first publish tasks and then recruit participants to complete them. When participants upload tasks, traditional spatiotemporal crowdsourcing does not consider their privacy. More importantly, the static publishing of tasks degrades the user experience. The working process of the proposed mechanism is shown in Figure 1. In Step 1, participants send the collected sensing data (including spatiotemporal sensitive information) to TTPs over secure wireless networks. In Step 2, the TTPs preprocess the sensing data, extract the spatiotemporal sensitive data, and anonymize them by k-anonymity. If the clustering condition is not met, Step 3 is performed to temporarily store the corresponding spatiotemporal sensitive data in the buffer pool. If the clustering condition is met, Step 4 is performed, and the TTPs send the anonymity result to the application server. In Step 5, the application server clusters the time attribute of the anonymity results based on l-diversity. In Step 6, the application server publishes the sensing data containing the anonymized spatiotemporal sensitive data. In Step 7, when other participants submit sensing data, the incremental data are sent to TTPs together with the sensing data temporarily stored in the buffer pool. In Step 8, the sensing data are dynamically anonymized by utilizing the previous anonymity results. The above process is repeated until participants no longer submit sensing data.

3.2. Static Publishing Anonymous Protection

In order to protect the spatiotemporal privacy of participants, k-anonymization is used to anonymize participants’ time and location attributes together. In the spatiotemporal crowdsourcing application, because the time and location attributes of participants have different dimensions, we standardize the spatiotemporal sensitive data by using the standard deviation method expressed by equation (1). Here, $x_{ik}$ represents the real spatiotemporal information of the kth dimension of the ith data, and $x'_{ik}$ represents the normalized spatiotemporal information of $x_{ik}$, as shown by the following equation:
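A standard-deviation (z-score) standardization consistent with this description can be written as follows. The symbols $\bar{x}_k$ (mean of dimension $k$), $s_k$ (standard deviation of dimension $k$), and $n$ (number of data), as well as the use of the population rather than the sample standard deviation, are assumptions introduced here:

$$x'_{ik} = \frac{x_{ik} - \bar{x}_k}{s_k}, \qquad \bar{x}_k = \frac{1}{n}\sum_{i=1}^{n} x_{ik}, \qquad s_k = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_{ik} - \bar{x}_k\right)^2}$$

After this step, every dimension has zero mean and unit variance, so that time and location contribute on a comparable scale to any distance computed from them.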

The distance between participants $p_i$ and $p_j$ is calculated by equation (2). The distance includes both the spatial distance and the temporal distance between $p_i$ and $p_j$:
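As a hedged sketch, and assuming a Euclidean distance over the standardized location and time dimensions (the choice of metric is an assumption, as are the symbols $d$ and $x'_{ik}$), such a distance could take the form:

$$d(p_i, p_j) = \sqrt{\sum_{k}\left(x'_{ik} - x'_{jk}\right)^2}$$

where $k$ ranges over the location dimensions and the time dimension, and $x'_{ik}$ denotes the standardized value of the $k$th dimension of the $i$th data.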

In order to easily find the center points of position cluster and time cluster, we calculate the global centroid of the actual spatiotemporal dataset for anonymization by the following equation:
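Taking the centroid as the coordinate-wise mean, which is the usual definition and is assumed here, the global centroid $\bar{a}$ of a dataset $A$ with $n$ points would have components:

$$\bar{a}_k = \frac{1}{n}\sum_{i=1}^{n} x'_{ik}$$

where $x'_{ik}$ is the standardized value of the $k$th dimension of the $i$th data. The symbols $\bar{a}$ and $n$ are notational assumptions. The same expression, restricted to the points of one cluster, gives the centroid of that cluster.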

In order to reduce the information loss and strengthen the privacy protection, we set an adaptive threshold, expressed by equation (4), where r denotes the number of spatiotemporal data in the cluster. The static publishing anonymity protection based on k-anonymity is shown in Algorithm 1.
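One reading of the adaptive threshold, consistent with its later description as the average distance between the points of a cluster and the cluster center, is the following. The symbols $a_1, \ldots, a_r$ (points of the cluster), $c$ (cluster centroid), and $d$ (the distance of equation (2)) are notational assumptions:

$$ave = \frac{1}{r}\sum_{i=1}^{r} d(a_i, c)$$

Under this reading, a candidate point is admitted to the cluster only if it lies no farther from the centroid than the current members do on average, which keeps clusters compact and limits the information loss.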

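Algorithm 1: Static publishing anonymity protection based on k-anonymity.
Input: k-anonymous parameter k, the actual spatiotemporal dataset A from participants
Output: aggregation result U, buffer pool dataset B
(1) Calculate the global centroid $\bar{a}$ of A by equation (3)
(2) $U = \emptyset$, $count = 1$
(3) while $|A| \geq k$ do
(4)   $C_{count} = \emptyset$
(5)   $a_{max} = \arg\max_{a \in A} d(a, \bar{a})$
(6)   $C_{count} = C_{count} \cup \{a_{max}\}$, $A = A \setminus \{a_{max}\}$
(7)   for $j = 1$ to $k - 1$ do
(8)     Update the centroid $c_{count}$ of $C_{count}$ by equation (3)
(9)     $a_{min} = \arg\min_{a \in A} d(a, c_{count})$
(10)     $C_{count} = C_{count} \cup \{a_{min}\}$, $A = A \setminus \{a_{min}\}$
(11)   end for
(12)   while $A$ still contains a point that satisfies the adaptive threshold do
(13)     Calculate the average distance $ave$ of $C_{count}$ by equation (4)
(14)     $a_{min} = \arg\min_{a \in A} d(a, c_{count})$
(15)     if $d(a_{min}, c_{count}) \leq ave$ then
(16)       $C_{count} = C_{count} \cup \{a_{min}\}$, $A = A \setminus \{a_{min}\}$
(17)       Update the centroid $c_{count}$ of $C_{count}$ by equation (3)
(18)     end if
(19)   end while
(20)   $U = U \cup \{C_{count}\}$, $count = count + 1$
(21) end while
(22) $B = A$
(23) return U, B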

Algorithm 1 describes the processing performed after participants send sensing data to TTPs. The TTPs first process the sensing data and extract the participants’ real spatiotemporal information (represented by set A) as the sensitive data to be protected. The input of Algorithm 1 is the k-anonymity parameter k and the participants’ real spatiotemporal dataset A. The output of Algorithm 1 is the anonymity result set U and the buffer pool dataset B. Step 1 calculates the global centroid $\bar{a}$. Step 2 initializes the parameters, where count is the number of newly split clusters. Steps 4–11 ensure that each newly split cluster contains k points. Step 5 selects the point $a_{max}$ with the largest distance to the global center point $\bar{a}$; $a_{max}$ becomes the new cluster center point and is deleted from A (Step 6). Steps 7–11 select the $k - 1$ points that have the smallest distance to the cluster center to form a new cluster $C_{count}$: the center point of $C_{count}$ is updated (Step 8), the point $a_{min}$ with the smallest distance to that center is selected (Step 9), and Step 10 adds $a_{min}$ to cluster $C_{count}$ and removes it from A. Steps 12–19 extend the cluster $C_{count}$; in order to reduce information loss, we set the adaptive threshold ave (Step 13). If the point $a_{min}$ (Step 14) in A satisfies the adaptive threshold, it is added to the cluster $C_{count}$ (Step 16), and the centroid of $C_{count}$ is updated (Step 17). In Step 20, $C_{count}$ is added to U, and the number of clusters is updated. If there are remaining data in A, they are stored in the buffer pool B (Step 22). In Step 23, the output of Algorithm 1 is returned.
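The steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: the function name `static_k_anonymity` is hypothetical, the distance is assumed to be Euclidean on already standardized data, the adaptive threshold is taken to be the mean member-to-centroid distance, and the cluster extension is assumed to stop as soon as the nearest remaining point exceeds that threshold.

```python
import numpy as np


def static_k_anonymity(points, k):
    """Greedy clustering sketch of Algorithm 1.

    points: (n, d) array-like of standardized spatiotemporal data.
    Returns (clusters, buffer): a list of index lists, and the indices
    left over for the buffer pool.
    """
    points = np.asarray(points, dtype=float)
    remaining = list(range(len(points)))
    global_centroid = points.mean(axis=0)              # Step 1
    clusters = []                                      # Step 2

    def nearest(centroid):
        # Index and distance of the remaining point closest to `centroid`.
        dists = np.linalg.norm(points[remaining] - centroid, axis=1)
        j = int(np.argmin(dists))
        return remaining[j], float(dists[j])

    while len(remaining) >= k:                         # Step 3
        # Steps 5-6: seed the cluster with the point farthest from the
        # global centroid.
        d = np.linalg.norm(points[remaining] - global_centroid, axis=1)
        seed = remaining[int(np.argmax(d))]
        cluster = [seed]
        remaining.remove(seed)
        # Steps 7-11: add the k - 1 points nearest to the running centroid.
        for _ in range(k - 1):
            centroid = points[cluster].mean(axis=0)
            idx, _ = nearest(centroid)
            cluster.append(idx)
            remaining.remove(idx)
        # Steps 12-19: extend the cluster while the nearest point stays
        # within the adaptive threshold (assumed stopping rule).
        while remaining:
            centroid = points[cluster].mean(axis=0)
            ave = np.linalg.norm(points[cluster] - centroid, axis=1).mean()
            idx, dist = nearest(centroid)
            if dist > ave:
                break
            cluster.append(idx)
            remaining.remove(idx)
        clusters.append(cluster)                       # Step 20
    return clusters, remaining                         # Steps 22-23


# Hypothetical data: two tight groups and one isolated point.
sample = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [5, 5]]
clusters, buffer = static_k_anonymity(sample, k=3)
print(clusters, buffer)
```

With the sample data, the two tight groups each form one cluster of three points, and the isolated point is left for the buffer pool because fewer than k points remain.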

The real spatiotemporal information contained in the sensing data uploaded by participants is anonymized by TTPs and stored in the anonymity result set U, and the center points are sent to the application server. We now illustrate the anonymization of spatiotemporal data with some example data from the experiments. As shown in Table 2, the first column is the class ID produced by Algorithm 1, the second and third columns are the real spatiotemporal information of participants, and the fourth and fifth columns are the anonymized spatiotemporal information. We can see that each equivalence class contains at least 3 points.

3.3. Improved Static Publishing Anonymity Protection Based on l-Diversity

However, k-anonymity is vulnerable to the background knowledge attack and the homogeneity attack. Therefore, when the application server publishes anonymity results, we adopt l-diversity to improve the algorithm. The application server receives the cluster center values sent by TTPs, anonymizes the time attribute based on l-diversity, and calculates the time center value by equation (5), where m refers to the number of spatiotemporal data anonymized by TTPs.
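Assuming that the time center value is the arithmetic mean of the time values received from the TTPs, it could be written as follows, where $\bar{t}$ and $t_i$ are notational assumptions:

$$\bar{t} = \frac{1}{m}\sum_{i=1}^{m} t_i$$

Here $t_i$ denotes the time component of the $i$th cluster center value and $m$ is the number of such values.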

Algorithm 2 describes the anonymous publishing based on l-diversity for the time attribute. The input is the l-diversity parameter l and the output set U of Algorithm 1, and the output is the anonymous set L. Step 1 and Step 2 take the time set T and the position set O out of U, respectively. Step 3 calculates the global central value $\bar{t}$ of the time set T and sets the initial number of clusters to count = 1. Steps 4–15 ensure that each newly generated cluster contains l points. Step 5 initializes the time cluster $T_{count}$ and the location cluster $O_{count}$. Step 6 finds the time value $t_{tma}$ with the largest distance to the global central value by equation (2). Step 7 adds $t_{tma}$ to the new cluster $T_{count}$, adds the coordinate cluster corresponding to the subscript tma to $O_{count}$, and deletes $t_{tma}$ from the time set T. The l − 1 points with the smallest distance are then selected to join the cluster (Steps 8–12). Step 13 updates the time center value of the cluster. In Step 14, the Cartesian product of the time center value and the position cluster is added to the output of Algorithm 2. Steps 16–23 handle any points remaining in the time set T: the cluster with the smallest distance to the point (Step 18) is found by equation (2), the point is added to that cluster (Step 19), and the time center value of the cluster is updated (Step 20). In Step 24, the output of Algorithm 2 is returned.

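Algorithm 2: Anonymous publishing based on l-diversity for the time attribute.
Input: l-diversity parameter l, aggregation result U from Algorithm 1
Output: aggregation result L
(1) Take time set T out of U
(2) Take location set O out of U
(3) Calculate the global centroid $\bar{t}$ by equation (5), $count = 1$
(4) while $|T| \geq l$ do
(5)   $T_{count} = \emptyset$, $O_{count} = \emptyset$
(6)   $t_{tma} = \arg\max_{t \in T} d(t, \bar{t})$
(7)   $T_{count} = T_{count} \cup \{t_{tma}\}$, $O_{count} = O_{count} \cup \{o_{tma}\}$, $T = T \setminus \{t_{tma}\}$
(8)   for $j = 1$ to $l - 1$ do
(9)     Update the centroid $\bar{t}_{count}$ of $T_{count}$ by equation (3)
(10)     $t_{tmi} = \arg\min_{t \in T} d(t, \bar{t}_{count})$
(11)     $T_{count} = T_{count} \cup \{t_{tmi}\}$, $O_{count} = O_{count} \cup \{o_{tmi}\}$, $T = T \setminus \{t_{tmi}\}$
(12)   end for
(13)   Update the centroid $\bar{t}_{count}$ of $T_{count}$ by equation (3)
(14)   $L = L \cup (\{\bar{t}_{count}\} \times O_{count})$, $count = count + 1$
(15) end while
(16) while $T \neq \emptyset$ do
(17)   for each $t \in T$ do
(18)     $j^{*} = \arg\min_{j} d(t, \bar{t}_{j})$
(19)     $T_{j^{*}} = T_{j^{*}} \cup \{t\}$, $O_{j^{*}} = O_{j^{*}} \cup \{o\}$, where $o$ is the location paired with $t$
(20)     Update the centroid of $T_{j^{*}}$
(21)     $T = T \setminus \{t\}$
(22)   end for
(23) end while
(24) return L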
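The time-attribute grouping can be sketched in Python as follows. This is a hedged illustration rather than the authors’ code: the function name `l_diversity_time_groups` and the input layout (a list of pairs of a time value and a location center) are assumptions, and the distance between time values is taken to be the absolute difference.

```python
def l_diversity_time_groups(cluster_centers, l):
    """Group TTP cluster centers by time so that each group holds at
    least l location centers that share one published time value.

    cluster_centers: list of (t, location) pairs.
    Returns a list of (time_center, [locations]) pairs.
    """
    remaining = list(cluster_centers)
    if not remaining:
        return []
    # Step 3: global central value of the time set.
    global_t = sum(t for t, _ in remaining) / len(remaining)
    groups = []
    while len(remaining) >= l:                          # Step 4
        # Step 6: seed with the time value farthest from the global center.
        seed = max(remaining, key=lambda c: abs(c[0] - global_t))
        remaining.remove(seed)                          # Step 7
        group = [seed]
        for _ in range(l - 1):                          # Steps 8-12
            center = sum(t for t, _ in group) / len(group)
            near = min(remaining, key=lambda c: abs(c[0] - center))
            remaining.remove(near)
            group.append(near)
        groups.append(group)                            # Steps 13-14
    # Steps 16-23: attach leftovers to the group with the closest center.
    if groups:
        for item in remaining:
            centers = [sum(t for t, _ in g) / len(g) for g in groups]
            j = min(range(len(groups)),
                    key=lambda i: abs(centers[i] - item[0]))
            groups[j].append(item)
    return [
        (sum(t for t, _ in g) / len(g), [loc for _, loc in g])
        for g in groups
    ]


# Hypothetical cluster centers produced by the TTP stage.
centers = [(1.0, (0, 0)), (2.0, (5, 5)), (10.0, (1, 1)),
           (11.0, (6, 6)), (6.5, (3, 3))]
for time_center, locations in l_diversity_time_groups(centers, l=2):
    print(time_center, locations)
```

In the published result, every group exposes a single time center together with at least l distinct location centers, which is the property used against the homogeneity attack.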

The following is a more visual illustration of releasing spatiotemporal data based on l-diversity. As shown in Table 3, the first column is the group ID being run by Algorithm 2, the second column is the class ID being run by Algorithm 1, the third and fourth columns are the anonymous spatiotemporal information being run by Algorithm 1, and the fifth and sixth columns are the anonymous spatiotemporal information anonymized by the application server. We can see that each 2-equivalence group (l = 2) contains at least two 3-equivalence classes (k = 3), where the anonymous time attribute is the same and the anonymous location attribute is different.

3.4. Dynamic Publishing Anonymity Protection

For static one-release mechanisms, k-anonymity and l-diversity are valid. However, in real life, application servers usually publish sensing data dynamically. Therefore, in this section, we improve k-anonymity and l-diversity to accommodate dynamic publishing mechanism. Algorithm 3 describes the dynamic publishing anonymity protection.

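Algorithm 3: Dynamic publishing anonymity protection.
Input: k-anonymous parameter k, aggregation result U from Algorithm 1, incremental dataset I, buffer pool dataset B
Output: aggregation result D, buffer pool dataset $B'$
(1) Calculate global dataset W = I + B
(2) for $i = 1$ to r do
(3)   Take the centroid set $C = \{c_1, \ldots, c_r\}$ out of U
(4)   $smi = \arg\min_{1 \leq j \leq r} d(w, c_j)$ for the point $w \in W$ under consideration
(5)   Calculate $r_{ave}$ of cluster $U_{smi}$ by equation (4)
(6)   if $d(w, c_{smi}) \leq r_{ave}$ then
(7)     $U_{smi} = U_{smi} \cup \{w\}$, $W = W \setminus \{w\}$
(8)     Update the centroid of $U_{smi}$
(9)     $D = U$
(10)   end if
(11) end for
(12) for $i = 1$ to r do
(13)   if $|U_i| \geq 2k$ then
(14)     Call Algorithm 4
(15)       input: k-anonymous parameter k, cluster $M = U_i$
(16)       output: aggregation result G
(17)   end if
(18)   $D = U \cup G$, $B' = W$
(19) end for
(20) return D, $B'$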
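
Algorithm 4: Splitting of a cluster that contains at least 2k points.
Input: k-anonymous parameter k, cluster M
Output: aggregation result G
(1) Calculate the global centroid $\bar{m}$ of cluster M by equation (3)
(2) $G = \emptyset$, $count = 1$
(3) while $|M| \geq k$ do
(4)   $M_{count} = \emptyset$
(5)   $m_{max} = \arg\max_{m \in M} d(m, \bar{m})$
(6)   $M_{count} = M_{count} \cup \{m_{max}\}$, $M = M \setminus \{m_{max}\}$
(7)   for $j = 1$ to $k - 1$ do
(8)     Update the centroid of $M_{count}$
(9)     $m_{min} = \arg\min_{m \in M} d(m, \text{centroid of } M_{count})$
(10)     $M_{count} = M_{count} \cup \{m_{min}\}$, $M = M \setminus \{m_{min}\}$
(11)   end for
(12)   $G = G \cup \{M_{count}\}$, $count = count + 1$
(13) end while
(14) while $M \neq \emptyset$ do
(15)   for each $m \in M$ do
(16)     $j^{*} = \arg\min_{j} d(m, \text{centroid of } M_j)$
(17)     Move $m$ from $M$ to $M_{j^{*}}$
(18)     Update the centroid of $M_{j^{*}}$
(19)   end for
(20) end while
(21) return G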

Algorithm 3 describes how TTPs use the previous anonymity result to solve the problem of dynamic publishing when participants submit sensing data in different time periods. The input of Algorithm 3 is the k-anonymity parameter k, the clustering result U of Algorithm 1, the incremental dataset I (that is, the sensing data newly submitted by participants), and the buffer pool dataset B. The output of the algorithm is the clustering result D and the updated buffer pool dataset $B'$. The global dataset W is the union of the incremental data I and the buffer pool data B (Step 1). Steps 2–11 describe the process of adding the data in W that meet the adaptive threshold condition to the previous clustering result, where r represents the number of clusters of U (Table 1). First, the cluster center set in the clustering result U is taken out (Step 3), and Step 4 finds the subscript smi of the cluster center point closest to the point $w$ under consideration. Then, the adaptive threshold $r_{ave}$ is set by equation (4) (Steps 5 and 6), where $r_{ave}$ is the average distance between the points and the center point of the cluster. If the point in dataset W meets the adaptive threshold, it joins the corresponding cluster and is deleted from W (Step 7), the central value of the cluster is updated (Step 8), and the updated U is assigned to the output result D of Algorithm 3 (Step 9). Steps 12–19 describe that if the number of points in a cluster is greater than or equal to 2k, Algorithm 4 is called to split it. The clustering result D is then the union of U and G, where U contains the clusters with fewer than 2k points and G is the output of Algorithm 4; the remaining data in W are temporarily stored in the buffer pool (Step 18). In Step 20, the output of Algorithm 3 is returned.

Algorithm 4 describes how a cluster is split when its number of points is greater than or equal to 2k. The input of Algorithm 4 is the k-anonymity parameter k and the cluster M. The output of Algorithm 4 is the clustering result G of the new split. Step 1 calculates the global center point $\bar{m}$ of cluster M by equation (3). Step 2 initializes the parameters, where count is the number of newly split clusters and G is the output of Algorithm 4. Steps 3–13 ensure that each newly split cluster contains k points. Step 5 selects the point $m_{max}$ with the largest distance to the global center point $\bar{m}$; $m_{max}$ is taken as the new cluster center point and deleted from M (Step 6). Steps 7–11 select the k − 1 points that have the smallest distance to the cluster center to form a new cluster $M_{count}$: the center point of $M_{count}$ is updated (Step 8), the point $m_{min}$ with the smallest distance to that center is selected (Step 9), and Step 10 adds $m_{min}$ to cluster $M_{count}$ and removes it from M. In Step 12, the newly generated cluster is added to the output result G, and the number of clusters increases. If there are remaining points in cluster M, they are added to the new cluster closest to them (Steps 14–20): Step 16 finds the cluster with the smallest distance to the remaining point, Step 17 adds the point to that cluster, and Step 18 updates the center point of the cluster. In Step 21, the output G of Algorithm 4 is returned.
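The interplay of Algorithms 3 and 4 can be illustrated by the following Python sketch. It is a simplified reading under stated assumptions, not the authors’ implementation: the names `dynamic_update` and `split_cluster` are hypothetical, distances are Euclidean, each incoming point is tested once against its nearest existing cluster, and points that fail the adaptive threshold go directly to the buffer.

```python
import numpy as np


def split_cluster(members, k):
    """Algorithm 4 sketch: split a cluster into parts of at least k points."""
    members = [np.asarray(m, dtype=float) for m in members]
    if len(members) < k:
        return [members]
    global_centroid = np.mean(members, axis=0)
    parts = []
    while len(members) >= k:
        # Seed each part with the point farthest from the global centroid.
        far = max(range(len(members)),
                  key=lambda i: np.linalg.norm(members[i] - global_centroid))
        part = [members.pop(far)]
        for _ in range(k - 1):
            c = np.mean(part, axis=0)
            near = min(range(len(members)),
                       key=lambda i: np.linalg.norm(members[i] - c))
            part.append(members.pop(near))
        parts.append(part)
    # Leftover points join the part whose centroid is closest.
    for m in members:
        j = min(range(len(parts)),
                key=lambda i: np.linalg.norm(np.mean(parts[i], axis=0) - m))
        parts[j].append(m)
    return parts


def dynamic_update(clusters, new_points, k):
    """Algorithm 3 sketch: reuse existing clusters for incremental data.

    Returns (updated clusters, buffer of points not yet anonymized).
    """
    clusters = [[np.asarray(p, dtype=float) for p in c] for c in clusters]
    buffer = []
    for w in (np.asarray(p, dtype=float) for p in new_points):
        centroids = [np.mean(c, axis=0) for c in clusters]
        smi = min(range(len(clusters)),
                  key=lambda i: np.linalg.norm(centroids[i] - w))
        # Adaptive threshold: mean member-to-centroid distance.
        ave = np.mean([np.linalg.norm(p - centroids[smi])
                       for p in clusters[smi]])
        if np.linalg.norm(w - centroids[smi]) <= ave:
            clusters[smi].append(w)
        else:
            buffer.append(w)
    result = []
    for c in clusters:
        # Clusters that grew to 2k points or more are split.
        result.extend(split_cluster(c, k) if len(c) >= 2 * k else [c])
    return result, buffer


# Hypothetical previous result (one cluster) and three incremental points.
previous = [[(0, 0), (0, 2), (2, 0), (2, 2)]]
incoming = [(1, 1), (1, 2), (50, 50)]
updated, pending = dynamic_update(previous, incoming, k=3)
print([len(c) for c in updated], len(pending))
```

The design point illustrated here is that only the nearest existing cluster is touched when a point arrives, so the cost of an update depends on the size of the increment rather than on the size of all previously published data.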

4. Experiments and Result Analysis

In this section, we use real-world datasets, including Gowalla’s Friendship Network dataset and Kaggle’s New York Taxi Travel Time dataset. Table 4 shows the number of attributes and data points and the density of data points contained in datasets. We compare the proposed STPP algorithm with k-anonymity and VCLA algorithms in terms of running time, information loss, and privacy protection. The hardware environment of the experiments is an AMD A8-5550M APU with Radeon (tm) HD Graphics @ 2.10 GHz equipped with 4 GB RAM and running the Win 10 OS.

The datasets are preprocessed before the experiments. First, we randomly extract 1000 records from the Friendship Network dataset as one segment, for a total of five segments, and use them as participants’ sensing data in the comparison experiments. Then, we randomly extract 3000 records from the New York City Taxi Trip dataset as one segment, again for a total of five segments, and use them in the same way. The segments of sensing data are uploaded to TTPs dynamically, batch by batch. The spatiotemporal sensitive data of participants, including the time and location attributes, are then extracted from the sensing data for anonymization.

Figure 2 compares the running time of the proposed STPP algorithm with that of the k-anonymity and VCLA algorithms. Figure 2(a) shows the experimental result on the Friendship Network dataset, and Figure 2(b) shows the experimental result on the New York City Taxi Trip dataset. The x-coordinate is the number of participants, and the y-coordinate is the running time. It can be seen that the STPP algorithm is superior to the other two algorithms both on the small dataset, where participants’ spatiotemporal distances are sparse, and on the large dataset, where spatiotemporal distances are dense. When few participants submit tasks, the running times of the three algorithms do not differ much. This is because all three algorithms are derived from k-anonymity, so the STPP algorithm has no obvious advantage in running time when there are few participants. However, as the number of participants gradually increases, the STPP algorithm better mitigates the poor timeliness of data publishing caused by the large number of participants in spatiotemporal crowdsourcing applications.

Since anonymized data are used for dynamic publishing, the difference between the real spatiotemporal data and the anonymized data is regarded as the information loss (IL). The IL equation compares each spatiotemporal datum with its anonymized counterpart over k dimensions, where k covers the time dimension and the location dimension.
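As an illustration of this kind of measure, the sketch below sums, for every record and every dimension, the absolute difference between the real value and the anonymized value, normalized by the range of that dimension. The exact form of the paper’s IL equation is not reproduced here, so the normalization, the function name `information_loss`, and the toy values are assumptions made only for this example.

```python
def information_loss(original, anonymized, ranges):
    """Assumed IL: normalized absolute difference, summed over records
    and over dimensions (time and location). Not the paper's exact formula."""
    total = 0.0
    for x, x_anon in zip(original, anonymized):
        for d, span in enumerate(ranges):
            # Difference between the real and the anonymized value in dimension d.
            total += abs(x[d] - x_anon[d]) / span
    return total

# Two records with (time, location) values, both generalized to the
# centre of their cluster; ranges are the value spans per dimension.
original = [(10.0, 2.0), (20.0, 4.0)]
anonymized = [(15.0, 3.0), (15.0, 3.0)]
ranges = (10.0, 2.0)

il = information_loss(original, anonymized, ranges)
```

Under this assumed form, larger clusters pull anonymized values further from the real ones, which is consistent with IL growing as more data are grouped together.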

Figure 3 compares the information loss of the STPP algorithm with that of the k-anonymity and VCLA algorithms. The x-coordinate is the number of participants on the Friendship Network dataset, and the y-coordinate is the information loss. The information loss increases with the number of participants, and the STPP algorithm incurs clearly less information loss than the comparison algorithms.

Figure 4 shows the relationship between the parameter k of k-anonymity and the information loss of the STPP algorithm, where different curves represent different amounts of data. The experiments are conducted on the Friendship Network dataset. The information loss increases gradually with k, because a larger k places more spatiotemporal sensitive data in each cluster, and the IL of each cluster increases correspondingly.

To evaluate privacy protection, we use the probability that an attack succeeds, that is, the probability that attackers correctly guess a participant’s specific spatiotemporal data from the published sensing data. Suppose that n sensing data are published and that their spatiotemporal sensitive data are aggregated into r location clusters and c time clusters. In this paper, equation 7 quantifies privacy protection in terms of the average probabilities that attackers can infer the real location attribute and the real time attribute of each sensing datum, respectively.
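One simple way to read such an average probability is sketched below. It assumes an attacker without background knowledge who guesses uniformly within a cluster, so a record in a cluster of size s is inferred with probability 1/s; this attacker model and the function name `average_guess_probability` are assumptions for illustration, and the sketch is not claimed to reproduce equation 7.

```python
from collections import Counter

def average_guess_probability(cluster_labels):
    """Average over all records of 1 / (size of the record's cluster).

    Assumes a uniform guess inside each cluster (no background knowledge).
    """
    sizes = Counter(cluster_labels)
    return sum(1.0 / sizes[label] for label in cluster_labels) / len(cluster_labels)

# n = 6 published records, r = 2 location clusters, c = 3 time clusters.
location_clusters = ["A", "A", "A", "B", "B", "B"]
time_clusters = ["t1", "t1", "t2", "t2", "t3", "t3"]

p_loc = average_guess_probability(location_clusters)
p_time = average_guess_probability(time_clusters)
```

Under this assumption, each cluster contributes exactly 1 to the sum, so the averages reduce to r/n for location and c/n for time. Larger clusters, and therefore fewer clusters for the same n, lower the probability.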

Figure 5 compares the privacy protection of the STPP algorithm with that of the k-anonymity and VCLA algorithms. The x-coordinate is the number of participants on the New York City Taxi Trip dataset, and the y-coordinate is the probability that attackers infer the specific spatiotemporal data of participants from the published sensing data. The privacy protection gradually increases with the number of participants. This is because, as the number of participants increases, the application server publishes correspondingly more sensing data, which reduces the probability that an attack succeeds; without background knowledge, this probability is already very low (y-coordinate unit is ). The STPP algorithm is slightly better than the comparison algorithms on privacy protection.

Figure 6 shows the relationship between the parameter k of k-anonymity and the privacy protection of the STPP algorithm. We conduct the experiments on the New York City Taxi Trip dataset. The privacy protection increases gradually with k, because a larger k places more spatiotemporal sensitive data in each cluster, which reduces the average probability that attackers infer the real spatiotemporal data.

When participants upload sensing data, TTPs temporarily store the sensing data that do not meet the anonymity condition in a buffer pool. TTPs then wait for the next incremental data to arrive, which delays the publishing of the buffered sensing data. Figure 7 shows the ratio of the buffer pool data to the number of sensing data in the current publish. The x-coordinate is the number of participants on the Friendship Network dataset, and the y-coordinate is the ratio of sensing data in the buffer pool. The proportion of data in the buffer pool is very low, which indicates that the buffered sensing data have little impact on publishing delay.
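The buffering behaviour can be sketched as follows. This is a simplified illustration under stated assumptions, not the STPP algorithm itself: the anonymity condition is reduced to "a cluster holds at least k records", the function `publish_batch` and the toy clustering rule `cluster_of` are hypothetical, and the adaptive threshold and equivalence-class updates of the dynamic k-anonymity algorithm are omitted.

```python
from collections import defaultdict

def publish_batch(batch, buffer_pool, k, cluster_of):
    """Split buffered plus newly arrived records into published and held.

    Simplifying assumption: a cluster satisfies the anonymity condition
    once it contains at least k records.
    """
    groups = defaultdict(list)
    for record in buffer_pool + batch:
        groups[cluster_of(record)].append(record)
    published, held = [], []
    for members in groups.values():
        # Clusters below k stay in the buffer pool until more data arrive.
        (published if len(members) >= k else held).extend(members)
    return published, held

def cluster_of(record):
    """Toy clustering rule: records are integers grouped by tens."""
    return record // 10

# First increment: only the cluster {1, 2, 3} reaches k = 3.
batch1 = [1, 2, 3, 11, 12, 21]
published, buffer_pool = publish_batch(batch1, [], k=3, cluster_of=cluster_of)
ratio = len(buffer_pool) / len(batch1)  # share of this batch held back

# Second increment completes the two buffered clusters.
batch2 = [13, 22, 23]
published2, buffer_pool2 = publish_batch(batch2, buffer_pool, k=3, cluster_of=cluster_of)
```

In this toy run the buffered records are released as soon as the next increment fills their clusters, which mirrors why a small buffer ratio implies little publishing delay.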

The experiments on real-world datasets show that the proposed STPP algorithm is superior to the k-anonymity and VCLA algorithms in terms of running time, information loss, and privacy protection. The STPP algorithm can therefore address the privacy protection problem of dynamic publishing for spatiotemporal crowdsourcing.

5. Conclusions

Few existing studies focus on privacy protection under a dynamic publishing mechanism, and few privacy protection methods target spatiotemporal sensitive data in that setting. In this paper, we first propose a dynamic publishing mechanism for the privacy protection of spatiotemporal sensitive data. Second, we design the dynamic k-anonymity algorithm, which adds the spatiotemporal data that meet the adaptive threshold condition to the corresponding equivalence classes, making full use of the previous anonymous result to solve the poor timeliness of static publishing. Third, to address the vulnerability of k-anonymity to background knowledge attacks and homogeneity attacks, we anonymize participants’ time attribute based on l-diversity, so as to improve privacy protection and reduce information loss. Finally, we evaluate the performance of the proposed STPP algorithm on two real-world datasets. Compared with the existing algorithms, experimental results show that the STPP algorithm has a shorter running time, less information loss, and stronger privacy protection.

In the future, we will detect and process malicious participants (i.e., outliers) so as to better reduce information loss and protect participants’ privacy data.

Data Availability

The experiment data used to support the findings of this study have been deposited in the GitHub repository (https://github.com/ltn21999/K_L-dynamic-privacy-protection).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant nos. 61822602, 61772207, 61802331, 61572418, 61602399, 61702439, and 61773331, the China Postdoctoral Science Foundation under Grant nos. 2019T120732 and 2017M622691, the National Science Foundation (NSF) under Grant nos. 1704287, 1252292, and 1741277, and the Graduate Innovation Foundation of Yantai University (GIFYTU) under Grant nos. YDYB2024 and YDZD1908.