Abstract

Frequent data sharing during skyline queries raises privacy concerns, as query results can leak users' private information. The most common privacy-preserving technique is to anonymize data by removing or altering certain information, but an attacker with specific background knowledge can render such protection ineffective. To overcome these difficulties, this study proposes a personalized weighted local differential privacy method (PWLDP) to protect data privacy during skyline querying. Compared with existing studies of skyline queries under privacy protection, the degree of privacy protection can be quantitatively analyzed, and the processing of data privacy lies with the user, who perturbs each attribute according to the sensitivity reflected in its weight, avoiding substantial information loss. The performance of the proposed PWLDP is verified by comparing PWLDP and LDP on different datasets; relative to the iDP-SC algorithm, average privacy leakage reductions of 62.22% and 51.67% are obtained on the two datasets, and the experimental results demonstrate the efficiency and advantages of the proposed method.

1. Introduction

In the era of IoT, the amount of data generated is growing geometrically, and data have become an essential strategic resource. However, with the massive use of data, the privacy of data owners has become a significant issue, and data privacy protection regulations are multiplying. For example, since the introduction of the EU's General Data Protection Regulation (GDPR) [1] in 2016, companies are no longer able to exchange or share data containing user privacy as freely as before, even if they subjectively wish to do so. China's Data Security Law likewise states that for data processing, including the collection, storage, use, processing, and transmission of data, the departments handling the data must take the necessary measures to ensure that the data are adequately protected and legitimately used. Privacy protection has thus become an urgent issue.

How to dig out the data of interest to users from a huge amount of data and make decisions that satisfy their preferences has led many researchers to pay attention to skyline queries [2]. A skyline query discovers the set of all tuples in a dataset that are not dominated by any other tuple. Liu et al. [3] proposed a new structure, the skyline diagram, to implement precomputation for skyline queries. The skyline diagram consists of skyline regions, called skyline polyominos, each corresponding to the same set of skyline results, and their results show that the proposed algorithms are effective and scalable for both the exact and the approximate skyline diagram. Saad et al. [4] proposed the SkyQUD algorithm to answer skyline queries on data with uncertain dimensions. However, many organizational databases contain sensitive data, such as personal case data or financial data, whose disclosure can seriously violate individual privacy and may lead to significant reputational damage and public relations crises for these organizations, and the increasing frequency of data sharing and querying means that skyline queries can reveal private user information. Differential privacy [5] (DP) is the mainstream privacy protection technique today: whereas K-anonymity, L-diversity, and T-closeness anonymization [6] approaches require particular attack assumptions and background knowledge, differential privacy can resist various forms of attack even when the attacker has maximum background knowledge, and it is therefore robust and reliable.
Considering the privacy treatment of local differential privacy [7] (LDP), it has been used and has performed well in practice; e.g., Apple [8], Microsoft [9], and other companies have collected statistics on specific user behavior with the help of this technique. Skyline queries can therefore be performed under local differential privacy protection, but this technique is limited to providing the same level of privacy protection for all individuals. However, not all attributes of users need the same level of privacy, and personalized privacy protection is needed to avoid providing too much protection for attributes that do not require a high privacy level. In this study, a skyline query with personalized weights under local differential privacy is proposed. Each organization assigns subjective and objective weights to the dataset attributes, and both weights are used to calculate new combined weights; since each data owner has its own privacy requirements for each dimension of the data, the newly calculated weights represent the data owner's privacy budgets for the different dimensions, and the data in different dimensions are then perturbed under different privacy budgets. The data owner does not have to report its privacy allocation to the server, i.e., the server receives only the data that the data owner has perturbed. The specific contributions are summarized as follows:
(1) We propose a new privacy method called personalized weighted local differential privacy (PWLDP), which protects users' privacy and allows data owners to add noise to attributes according to different privacy needs, realizing personalized privacy protection.
(2) We propose a skyline query based on personalized weighted local differential privacy, under which each organization performs a skyline query locally, perturbs the result, and sends it to the data publisher for integration and the final skyline query. The degree of privacy protection can be quantitatively analyzed, making up for the shortcomings of existing studies.
(3) Experiments on different datasets validate that, compared to the LDP algorithm, the PWLDP algorithm with a personalized privacy policy achieves higher privacy and accuracy in its skyline query results and yields smaller errors.

2. Related Work

Skyline query research is divided into two areas: optimization of skyline query algorithms and application of skyline queries to related research areas [10]. Skyline query algorithms mainly include the block-nested-loop algorithm, the nearest-neighbor algorithm, the branch-and-bound algorithm, etc. Yang et al. [11] extended skyline single-point queries and proposed a top-k group skyline query method based on skyline layers, which optimizes and speeds up the query by pruning points on high skyline layers; it is verified that when the k value is small, the skyline group can be found quickly and query efficiency is improved. The effectiveness of naive skyline queries decreases as data volume or dimensionality grows, which increases the cost of comparisons between data, so Choi et al. [12] proposed HI-Sky. This method performs fast skyline computation using hash indexes, exploiting the fact that the grid location address (GLAD) provides column-priority ordering and spatial location information; HI-Sky can efficiently manage data through hash indexing and prune unnecessary comparisons during the comparison process.

At this stage, research on data privacy in the skyline query process is mainly focused on homomorphic encryption techniques [13, 14], but encryption keys can be broken by brute force, so privacy can still leak during skyline queries, and the level of privacy protection cannot be quantitatively analyzed [15]. Zaman et al. [16] proposed a new method for computing the skyline in a multiparty computing environment in the MapReduce Hadoop framework without revealing an object's value to another party, demonstrating the validity and extensibility of the proposed secure skyline computation. Liu et al. [17] proposed a new framework, PUSC, which introduces a user-defined vector-dominance secure protocol that compares the dominance relationship between two encrypted vectors based on the user's preferences; it is not efficient enough because of the complexity of the different protocols and of the computational process, which takes a long time to execute. Hua et al. [18] proposed an efficient and privacy-preserving online medical primary diagnosis (CINEMA) framework. Within CINEMA, the privacy of users' dynamic skyline queries is maintained, and users can accurately access online medical primary diagnosis services without revealing medical data. Qaosar et al. [19] proposed computing the skyline in a secure multiparty computing environment using the Paillier cryptosystem [20] to transform object attribute values without changing the order of objects on each attribute; each participant collaborates with the others to securely prepare the encrypted order of objects on each feature, and the skyline is then computed based on the order of attribute values on each dimension without obtaining the original attribute values. Qaosar et al. [21] proposed a new privacy-preserving multiparty skyline query framework that utilizes additive homomorphic encryption together with data anonymization, perturbation, and randomization techniques so that data are not revealed to others during multiparty skyline queries.

Differential privacy (DP) has been widely used in data publishing as a privacy-preserving method with solid privacy guarantees [22]. Differential privacy uses privacy metrics and utility metrics to achieve a trade-off between privacy and utility. However, the DP model assumes that third-party servers are trusted. The local differential privacy model sets a stricter notion of privacy, and the application of LDP to data is of increasing concern. Sun et al. [23] applied random response techniques to frequent itemset mining for personalized privacy requirements. Ouyang et al. [24] introduced a set-valued data collection approach (SetLDP) based on a category hierarchy under a local differential privacy model, whose central idea is to first give a random response on the presence of a category; the results show that it protects the private information in set-valued data well. Lan et al. [25] proposed personalized differential privacy based on a spectral clustering algorithm (iDP-SC), which reduces the local sensitivity by introducing spectral clustering, and the noise reduction achieved by spectral clustering compensates for the information distortion error it introduces. Xiong et al. [26] proposed a new LDP notion for capturing users' privacy needs by accounting for the temporal relevance of spatiotemporal data while guaranteeing sensible utility, demonstrating its superiority in achieving a better trade-off between privacy and utility for real-time spatiotemporal data aggregation under rigorous privacy protection.

3. Basic Theory

3.1. Local Differential Privacy

If any input produces a given output with similar probability after random encoding, then an observer cannot infer the original data from the output, so the privacy of the data is protected. Based on this idea, the definition of local differential privacy is as follows.

Definition 1. (ε-LDP [7]). For a given privacy budget ε > 0, a random perturbation mechanism M satisfies ε-LDP when and only when, for any inputs x and x′ and any output y, the following condition holds:

Pr[M(x) = y] ≤ e^ε · Pr[M(x′) = y]. (1)

The ε in equation (1) is a parameter that controls the strength of privacy protection: the closer the value of ε is to 0, the higher the degree of privacy protection of the algorithm M.
Differential privacy has essential properties such as sequential composition and parallel composition, and the compositional nature of differential privacy can help designers partition the privacy budget ε.
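As a concrete check of Definition 1, the classic binary randomized response mechanism satisfies ε-LDP: the worst-case ratio of output probabilities between any two inputs is exactly e^ε. A minimal Python sketch (illustrative, not part of the paper's method):

```python
import math

def rr_probabilities(eps):
    """Binary randomized response: report the true bit with probability
    p = e^eps / (e^eps + 1), flip it otherwise."""
    p = math.exp(eps) / (math.exp(eps) + 1)
    return {(0, 0): p, (0, 1): 1 - p, (1, 1): p, (1, 0): 1 - p}

def ldp_ratio(eps):
    """Worst-case ratio Pr[M(x) = y] / Pr[M(x') = y] over all x, x', y."""
    pr = rr_probabilities(eps)
    return max(pr[(x, y)] / pr[(xp, y)]
               for x in (0, 1) for xp in (0, 1) for y in (0, 1))

eps = 0.5
assert ldp_ratio(eps) <= math.exp(eps) + 1e-12  # Definition 1 holds
```

The ratio p / (1 - p) equals e^ε exactly, so the bound of equation (1) is tight for this mechanism.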

Property 1. Sequential composition [27]. Given a dataset D and n random response algorithms M_1, …, M_n such that each algorithm M_i satisfies ε_i-local differential privacy, the sequence of algorithms (M_1, …, M_n) applied to D satisfies (ε_1 + ε_2 + ⋯ + ε_n)-local differential privacy.

Property 2. Parallel composition [27]. Given a dataset D partitioned into disjoint subsets D_1, …, D_n, if algorithm M_i satisfies ε_i-local differential privacy on D_i, then the sequence of algorithms applied to the disjoint subsets satisfies (max_i ε_i)-local differential privacy.

3.2. Skyline Query Calculation

With the booming development of internet technology, large amounts of data are generated all the time. Skyline queries can effectively analyze this high-value, colossal amount of data and find accurate results, enabling appropriate decisions according to user needs. They play an important role in multitarget decision-making, data mining, and other fields.

Definition 2. (Skyline calculation [2]). For a d-dimensional dataset S, i.e., S = {p_1, p_2, …, p_n}, the j-th attribute of a data record p is denoted by p.a_j. For any two data records p and q, p is said to dominate q, denoted p ≺ q, if they satisfy the following conditions: (1) p.a_j ≤ q.a_j for every j ∈ {1, …, d}; (2) p.a_j < q.a_j for at least one j. For any two data records p and q, if neither p ≺ q nor q ≺ p, the data records p and q are said to have no dominance relationship.

Definition 3. (Skyline points). For a d-dimensional dataset S, if there is a data record p ∈ S such that, for any data record q ∈ S other than p, q ≺ p does not hold, then the data record p is a skyline point of the dataset S.

Definition 4. (Skyline query). It refers to the process of filtering, from a known dataset, the set of data records that are relatively better in all attribute dimensions; the result of this query process is the skyline result set. Any data record in the result set is a skyline point, and the result set is denoted SKY(S), formally represented as follows:

SKY(S) = {p ∈ S | ¬∃ q ∈ S : q ≺ p}.

Definition 5. (Skyline query additivity [28]). Assume n datasets S_1, S_2, …, S_n whose union is the dataset U = S_1 ∪ S_2 ∪ ⋯ ∪ S_n; then SKY(U) = SKY(SKY(S_1) ∪ SKY(S_2) ∪ ⋯ ∪ SKY(S_n)). In skyline queries, the local skyline query SKY(S_i) denotes the set of objects in S_i for which no dominance relationship exists within S_i, and the global skyline query SKY(U) denotes the set of objects in U for which no dominance relationship exists within U.
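The definitions above can be illustrated with a short sketch under the min-preference dominance rule used in this paper; the two "organization" datasets are toy examples, and the final assertion checks the additivity property of Definition 5:

```python
import numpy as np

def skyline(points):
    """Skyline under min-preference: p is kept unless some q dominates it,
    i.e., q <= p on every dimension and q < p on at least one."""
    pts = [tuple(map(float, p)) for p in points]
    result = []
    for p in pts:
        a = np.array(p)
        if not any(np.all(np.array(q) <= a) and np.any(np.array(q) < a)
                   for q in pts if q != p):
            result.append(p)
    return sorted(set(result))

S1 = [(1, 9), (2, 4), (5, 5)]   # organization 1's local data
S2 = [(4, 3), (7, 1), (6, 2)]   # organization 2's local data

# Definition 5: the global skyline equals the skyline of the
# union of the local skylines.
global_sky = skyline(S1 + S2)
merged_sky = skyline(skyline(S1) + skyline(S2))
assert global_sky == merged_sky
```

Here (5, 5) is dominated by (2, 4) and drops out of both computations, while every other point survives to the skyline.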

4. Privacy-Protected Skyline Queries

4.1. Problem Description

The scenario is shown in Figure 1, with multiple organizations and a central server, i.e., a data publisher. The organizations' data usually affect the publisher's skyline query over the data. Given that the data publisher shares data with multiple organizations, the publisher needs to comprehensively evaluate each organization's data situation from multiple perspectives so that the organization can better assess its own data and develop corresponding measures; the data publisher assumes the role of the central server in this scenario. Let the dataset of all institutions be divided horizontally among the organizations, each institution holding its own d-dimensional dataset; the number of data dimensions d is the same for each organization, and the total number of data records is the sum of the amounts owned by the organizations. Organizations do not want to send their local data to the central server, since the server could leak the data. To solve this problem, each organization can perform a local skyline query on its data according to the preference relation to obtain its local skyline and then send only the perturbed local skyline to the data publisher. This process avoids unnecessary data wastage and leakage and also avoids excessive exposure of raw data to the server.

In this scenario, we assume that there is no collusion among organizations or between organizations and the publisher, i.e., each organization and the publisher are honest but curious, and all calculations are performed strictly according to the protocol.

4.2. Personalized Weighted Local Differential Privacy
4.2.1. Analytic Hierarchy Process

The analytic hierarchy process (AHP) is a widely used and effective method to determine weights. It simulates the way of thinking of the human decision-making process and mathematizes that process using limited quantitative information, based on in-depth analysis of the nature of a complex decision problem, its influencing factors, and their internal relationships.
(1) Constructing the judgment matrix. Each attribute of the user is selected as an index, and the judgment matrix Q is established by pairwise comparison between attribute indexes. The relative influence between attributes is assigned with the help of a scale, as shown in Table 1.
(2) Consistency check. First, the eigenvector corresponding to the maximum eigenvalue λ_max of the judgment matrix is calculated and normalized so that the sum of its elements is 1. We then calculate the consistency index CI = (λ_max − n)/(n − 1) in equation (5), where n denotes the order of the judgment matrix Q, and the consistency ratio CR = CI/RI in equation (6), where the RI value is taken from Table 2. When CR < 0.1, the consistency test is passed and the judgment matrix is reasonably constructed; the larger CR is, the more serious the inconsistency of the judgment matrix. The judgment matrix is modified and adjusted until CR < 0.1, so that it has good consistency.
(3) Determining weights. Once the judgment matrix Q passes the consistency test, the elements of the normalized eigenvector are the weights of each attribute.
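The AHP steps above can be sketched in Python; the 3×3 judgment matrix below is a hypothetical example on the 1-9 scale, and the RI table holds the standard random index values:

```python
import numpy as np

# Standard random index (RI) values for judgment matrices of order 1..9.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(Q, cr_threshold=0.1):
    """Compute AHP weights from a pairwise judgment matrix Q.

    CI = (lambda_max - n) / (n - 1) and CR = CI / RI[n]; the matrix is
    considered consistent when CR < 0.1. Returns (weights, CR)."""
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    eigvals, eigvecs = np.linalg.eig(Q)
    k = np.argmax(eigvals.real)
    lam_max = eigvals.real[k]
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                      # normalize so weights sum to 1
    cr = ((lam_max - n) / (n - 1)) / RI[n]
    if cr >= cr_threshold:
        raise ValueError(f"consistency check failed: CR = {cr:.3f}")
    return w, cr

# Hypothetical judgment matrix: attribute 1 moderately-to-strongly
# preferred over attributes 2 and 3.
Q = [[1,   3,   5],
     [1/3, 1,   3],
     [1/5, 1/3, 1]]
w, cr = ahp_weights(Q)
```

For this example matrix, CR is well below 0.1, so the consistency check passes and the first attribute receives the largest weight.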

4.2.2. Entropy Weight Method

The entropy weight method is an objective method of attribute weight assignment. By calculating the information entropy of each attribute, the weight of the attribute is determined according to the impact of the attribute's degree of variation on the dataset as a whole; an attribute with a higher degree of variation receives a larger weight, so that each attribute receives a more objective weight.
(1) Data processing. Suppose the dataset has d attributes; each attribute value is normalized as x′ = (x − min)/(max − min) in equation (7), where x is the attribute value of the user, max is the maximum value of the attribute, and min is the minimum value of the attribute.
(2) Calculating the information entropy of an attribute. In equation (8), the entropy of attribute j is E_j = −(1/ln n) Σ_{i=1}^{n} p_ij ln p_ij, where p_ij = x′_ij / Σ_{i=1}^{n} x′_ij.
(3) Determining weights. The weight of attribute j is w_j = (1 − E_j) / Σ_{j=1}^{d} (1 − E_j), as in equation (9).
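The three steps above can be sketched as follows (a minimal implementation of the standard entropy-weight formulas; the small example matrix is illustrative):

```python
import numpy as np

def entropy_weights(X):
    """Entropy-weight method: min-max normalize each attribute, compute
    its information entropy, and weight attributes by the redundancy
    1 - E (higher variation -> lower entropy -> larger weight)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # (1) min-max normalization per attribute
    rng = X.max(axis=0) - X.min(axis=0)
    Z = (X - X.min(axis=0)) / np.where(rng == 0, 1, rng)
    # (2) information entropy of each attribute
    P = Z / Z.sum(axis=0)
    P = np.where(P == 0, 1e-12, P)       # avoid log(0)
    E = -(P * np.log(P)).sum(axis=0) / np.log(n)
    # (3) weights from the redundancy 1 - E
    return (1 - E) / (1 - E).sum()

X = np.array([[1.0, 10.0], [2.0, 10.5], [9.0, 11.0]])
w = entropy_weights(X)  # first attribute varies more -> larger weight
```

In this toy matrix the first attribute spans a wider relative range, so it ends up with the larger of the two weights.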

4.2.3. Personalized Privacy Budget Allocation

The analytic hierarchy process has the advantage over the entropy weight method of determining the weights according to the decision-maker's wishes, but it is more subjective and less objective. The entropy weight method is objective, but it cannot reflect the importance that decision-makers attach to different attributes, and the resulting weights may run contrary to the actual importance of an attribute. Given the strengths and weaknesses of the two weighting methods, we wish to control the subjective randomness within a certain range and achieve a neutral weighting between the subjective and objective approaches. Therefore, when assigning weights to each attribute, both the inherent statistical laws of the feature data and authoritative judgment should be considered. To make up for the shortcomings of a single method, a reasonable combined assignment approach is proposed that blends the analytic hierarchy process and the entropy weight method. The weight of each attribute of each data record is as follows.

The weighting factor in equation (10) is used to balance the two weights, and its value is taken as 0.5 in this study.
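One natural reading of the combination and of the budget allocation, consistent with sequential composition (the per-attribute budgets sum to the overall budget ε), can be sketched as follows; the function names and example weights are illustrative, not taken from the paper:

```python
def combined_weights(w_ahp, w_ewm, alpha=0.5):
    """Combine subjective (AHP) and objective (entropy) weights with
    weighting factor alpha; alpha = 0.5 as in this study."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(w_ahp, w_ewm)]

def allocate_budget(eps, weights):
    """Split the overall privacy budget eps across attributes in
    proportion to their combined weights; by sequential composition
    the total stays at eps."""
    return [eps * w for w in weights]

w = combined_weights([0.6, 0.3, 0.1], [0.4, 0.4, 0.2])
eps_per_attr = allocate_budget(1.0, w)  # sums to the overall eps = 1.0
```

With these example inputs the combined weights are (0.5, 0.35, 0.15), and the per-attribute budgets sum to the overall ε.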

Therefore, we propose a new LDP concept called personalized weighted local differential privacy (PWLDP), in which the user sets the privacy budget for different features based on attribute weights as described in Algorithm 1.

Input: a data record with d attributes; ε: the overall privacy budget.
Output: the privacy budget ε_j allocated to each attribute a_j.
(1)Construct the judgment matrix of the data set according to equation (4)
(2)
(3)According to equation (5), we obtain the consistency index
(4)According to equation (6), we calculate CR
(5)if CR ≥ 0.1 then
(6) the consistency check fails
(7)else
(8) the consistency check passes, and the eigenvector is calculated based on λ_max
(9)end if
(10): normalization
(11)Attributes of the standardized data record
(12)
(13)
(14)
(15)According to equation (10), we can get
(16)
(17)return ;
4.2.4. Comparison of PWLDP and LDP

Most traditional LDP approaches assign a uniform privacy budget to all attributes of the data owner. However, not all attributes of a user need the same privacy level; for example, comparing a user's age with an identity ID, the ID is more sensitive than the age, so the two should not receive the same degree of added noise. PWLDP can thus avoid providing too much privacy protection for attributes that do not need a high privacy level. In a multidimensional data scenario, the PWLDP and LDP privacy budget allocations are related as follows: the per-attribute budgets of PWLDP sum to the overall budget ε of LDP. Thus, when PWLDP provides an equal privacy budget for every attribute of the data owner, it becomes LDP, so PWLDP is a generalization of LDP, and the relation between the two is stated by the following theorem.

Theorem 1. For any d-dimensional user record, if a perturbation mechanism M satisfies PWLDP with per-attribute budgets ε_1, …, ε_d, then it also satisfies ε-LDP, where ε = ε_1 + ε_2 + ⋯ + ε_d.

4.3. Skyline Query Based on PWLDP
4.3.1. The Perturbation Mechanism of PWLDP

Definition 6. (ε-PWLDP). Given a privacy budget ε, the data owner assigns different privacy budgets to different attributes according to the attribute weights, i.e., the privacy budget for attribute j is ε_j = w_j · ε, and a random perturbation mechanism M satisfies ε-PWLDP when and only when, for each attribute, any input and output satisfy the corresponding ε_j-LDP condition. The randomized response mechanism [29] is the mainstream perturbation mechanism of LDP; its main idea is to give stochastic answers about the private data. To overcome the limitation that this method handles only binary variables, researchers later proposed the more general k-RR mechanism [30, 31]. For any input x, the output y of its response is given as shown in equation (12): the response is the true value x with probability p = e^ε/(e^ε + k − 1), and with probability q = 1/(e^ε + k − 1) it is any one of the k − 1 outcomes other than the true value, where k (k > 2) is the number of different candidate values of the attribute. Algorithm 2 is as follows.

Input: x: attribute value; ε: privacy budget; k: the maximum value in the value range of each attribute.
Output: y: attribute value after perturbation processing
(1)if rand() < e^ε/(e^ε + k − 1) then
(2) y = x
(3)else
(4) y = a value chosen uniformly at random from the value range excluding x
(5)end if
(6)return y;
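Algorithm 2 can be sketched in Python as a minimal k-RR implementation; here attribute values are assumed to be encoded as integers 0 … k − 1, an encoding not fixed by the paper:

```python
import math
import random

def k_rr(value, eps, k):
    """k-ary randomized response (k-RR): keep the true value with
    probability p = e^eps / (e^eps + k - 1); otherwise report one of
    the k - 1 other candidate values uniformly at random.

    `value` is assumed to be an integer in range(k)."""
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    if random.random() < p:
        return value
    # Draw uniformly from the k - 1 values other than `value`.
    other = random.randrange(k - 1)
    return other if other < value else other + 1

# Perturb an attribute taking k = 5 candidate values with eps = 0.5.
noisy = k_rr(3, eps=0.5, k=5)
```

Over many runs the true value 3 is reported with probability e^0.5 / (e^0.5 + 4), matching equation (12).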

4.3.2. Skyline Query Flow Based on PWLDP

The skyline query returns the objects that cannot be dominated by any other object under the given dominance relationship in the dataset. Suppose a two-dimensional dataset with a total of 16 points in which smaller values are preferred in each dimension; the skyline points returned are shown in Figure 2.

In a scenario where organizations and the data publisher share data with each other, the data publisher acts as the central server, and each organization is responsible for providing data. The publisher is not trusted by the organizations because it could disclose the privacy of users in each organization. Therefore, each organization first performs a skyline query internally on its subset of the data based on the preference relationships published by the data publisher, and each organization uploads only the perturbed skyline query results, avoiding unnecessary data leakage. As can be seen from Definition 5, the global skyline computed by the data publisher on the original data of all organizations is the same as the skyline computed from the local skyline of each organization. Therefore, each organization can run the skyline query on its original data to obtain its local skyline, perform the perturbation process, and send only the perturbed local skyline to the data publisher. This process does not affect the accuracy of the skyline query and prevents organizations from leaking large amounts of original data to the publisher. After the global skyline query, the data publisher shares the key results with each organization; each organization learns only its own key data and the overall percentages, from which it can make corresponding plans, and cannot obtain the details of other organizations, so the sharing of data in this process does not disclose user privacy.

From Algorithm 3 and Figure 3, it can be seen that PWLDP first lets users perform their own local privacy perturbation of the data, so that privacy is guaranteed, and then allocates a personalized privacy budget: considering that data owners have individual privacy requirements for each dimension of the data, different levels of noise are added to achieve personalized privacy protection.

Input: : local skyline query results for each organization; : privacy budget; : the proportion of weights under each attribute.
Output:  =  : global skyline query results.
(1)for local skyline query data set from 1 to do
(2)for data record from 1 to do
(3)  for each attribute from 1 to do
(4)   
(5)   
(6)   
(7)   
(8)    = np.full(shape = , fill_value = )
(9)   
(10)    = np.random.choice(a = range(1, ), p = 
(11)  end for
(12)end for
(13)end for
(14)for each in List do
(15) flag = True
(16)for each in List do
(17)  ifthen
(18)   if () then
(19)    return or
(20)    flag = 
(21)   end if
(22)  end if
(23)end for
(24)return. Index
(25)end for
(26)
(27)return to each organization
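Putting the pieces together, the flow described above (local skyline per organization, per-attribute perturbation with personalized budgets, global skyline at the publisher) can be sketched end to end; the weights, datasets, and k below are illustrative, and this is a simplified reading of Algorithm 3, not a reproduction of it:

```python
import math
import random

def skyline(points):
    """Skyline under min-preference (Definition 2)."""
    pts = [tuple(map(float, p)) for p in points]
    return sorted({p for p in pts
                   if not any(all(qj <= pj for qj, pj in zip(q, p))
                              and any(qj < pj for qj, pj in zip(q, p))
                              for q in pts if q != p)})

def k_rr(value, eps, k):
    """k-ary randomized response on an integer value in range(k)."""
    if random.random() < math.exp(eps) / (math.exp(eps) + k - 1):
        return value
    other = random.randrange(k - 1)
    return other if other < value else other + 1

def perturb_record(record, eps, weights, k):
    """Perturb each attribute with its personalized budget eps * w_j."""
    return tuple(k_rr(int(v), eps * w, k) for v, w in zip(record, weights))

# Two organizations compute local skylines, perturb them attribute by
# attribute, and the publisher computes the skyline on the union.
random.seed(1)
weights = [0.7, 0.3]            # combined attribute weights (sum to 1)
org1, org2 = [(1, 9), (2, 4), (5, 5)], [(4, 3), (7, 1), (6, 2)]
reported = [perturb_record(p, eps=1.0, weights=weights, k=10)
            for p in skyline(org1) + skyline(org2)]
global_sky = skyline(reported)  # publisher's skyline on perturbed data
```

The publisher never sees the organizations' raw data, only the perturbed local skylines, mirroring the trust assumptions of Section 4.1.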
4.3.3. Privacy Analysis

For the privacy of PWLDP, we prove that it satisfies ε-LDP. Extending this proof to ε-PWLDP, we can show that the ratio of the probabilities of producing the same query result on any two data records is not greater than e^ε.

Theorem 2. The PWLDP algorithm satisfies ε-local differential privacy.

Proof. Let x and x′ be any two different data records, with x_j and x′_j denoting the j-th attributes of the records, y a possible output after random perturbation, and y_j the possible output after perturbation of attribute j. Since each attribute is perturbed independently with budget ε_j,

Pr[M(x) = y] / Pr[M(x′) = y] = ∏_j Pr[M_j(x_j) = y_j] / Pr[M_j(x′_j) = y_j] ≤ ∏_j e^{ε_j} = e^{Σ_j ε_j} = e^ε.

According to Definition 1, the PWLDP algorithm satisfies ε-local differential privacy.

5. Evaluation

To verify the performance and effectiveness of the PWLDP algorithm, we design multiple sets of experiments. We verify the effectiveness of the PWLDP algorithm from two aspects. First, the difference between the original skyline query results and the perturbed skyline results is measured on the datasets of the different institutions. The mean square error (MSE) is the average of the squared differences between the perturbed skyline results and the corresponding real skyline results. The mean absolute error (MAE) is the average of the absolute differences between the perturbed skyline results and the corresponding real skyline results. These two measures are broadly applied to evaluate the utility of noisy results relative to true query results. Finally, after the publisher performs the final skyline query, the PWLDP algorithm is compared with the LDP algorithm using the precision rate and F-measure, and the degree of privacy protection of our algorithm relative to the iDP-SC [25] algorithm is measured by the privacy leakage metric. To reduce randomness, the final query performance metrics are averaged over 1000 test runs in the same environment.

5.1. Experimental Setup
5.1.1. Experimental Environment

(1)Hardware environment: Intel(R) Core(TM) i5-7200U CPU @ 2.50 GHz Windows 10 PC with 8 GB RAM.(2)Programming environment: Python, on the PyCharm and Jupyter platforms.

5.1.2. Dataset Settings

The mammographic mass dataset and the breast cancer dataset from the publicly available UCI repository were selected to evaluate both algorithms. To better simulate the scenario in this study, we suppose that each dataset is split across three organizations. The breast cancer dataset has 32 features, from which we selected 5 representative attributes by dimensionality reduction; the dominance rule of the skyline query is that a record takes precedence over another if its attribute values are smaller or equal in all dimensions. For the mammographic mass dataset, five attributes are selected, with the same dominance rule.

5.1.3. Evaluation Metrics

(1)MAE (L1 loss).(2)MSE (L2 loss).(3)F-measure: F = (1 + β²) · P · R / (β² · P + R), where β = 1, P is the precision, and R is the recall.(4)Amount of privacy leakage (APL) [32], computed from the TPR and FPR, where TPR indicates the true-positive rate and FPR indicates the false-positive rate.
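The first three metrics can be sketched directly (minimal implementations of the standard formulas; the APL formula of [32] is not reproduced here):

```python
import numpy as np

def mae(true, noisy):
    """MAE (L1 loss): mean absolute difference."""
    return np.mean(np.abs(np.asarray(true) - np.asarray(noisy)))

def mse(true, noisy):
    """MSE (L2 loss): mean squared difference."""
    return np.mean((np.asarray(true) - np.asarray(noisy)) ** 2)

def f_measure(precision, recall, beta=1.0):
    """F-measure; with beta = 1 this is the harmonic mean of
    precision and recall."""
    return ((1 + beta ** 2) * precision * recall
            / (beta ** 2 * precision + recall))
```

For example, mae([1, 2], [2, 4]) is 1.5, mse([1, 2], [2, 4]) is 2.5, and f_measure(0.5, 0.5) is 0.5.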

5.1.4. Parameter Setting

(1)Privacy parameters: the privacy budget ε is set to 0.1, 0.3, 0.5, 0.7, 0.9, and 1.(2)Weight parameters: the attribute weights for the different datasets are shown in Table 3.(3)Comparison algorithms: the PWLDP algorithm, the LDP algorithm, and the iDP-SC algorithm.

5.2. Experimental Analysis
5.2.1. Impact of Privacy Budget on MSE and MAE

The privacy budget ε is the parameter that determines the degree of privacy protection. From Figures 4 and 5, it can be observed that the MSE and MAE values of PWLDP and LDP decrease with increasing ε. The MSE and MAE values of the PWLDP and LDP algorithms are close to each other at ε = 0.1; as ε increases, the error values of the two gradually separate, and the greatest difference between the MSE and MAE values of the two algorithms is observed at ε = 1. This is because when ε is very small, under the k-RR response mechanism the probability of the perturbation taking another value is very close to the probability of it taking the actual value. Under the same privacy budget ε, the MSE and MAE values of PWLDP are smaller than those of the LDP algorithm because of the personalized privacy allocation: when an attribute's weight is large, the allocated privacy budget is larger, the perturbation intensity decreases, and the privacy protection weakens; when the attribute weight is small, the allocated budget is smaller, the perturbation intensity increases, and the protection is strengthened. Because a larger weight means the value is more important to the user, PWLDP achieves higher security with lower privacy budget consumption.

5.2.2. Impact of Privacy Budget on Precision Rate and F-Measure

The precision rate refers to the percentage of skyline query results after perturbation that are correct records in the dataset; as observed in Figures 6 and 7, the precision rate gradually increases with the privacy budget ε. In Figure 6, at ε = 0.1 the precision rate of PWLDP is 0.573 and that of LDP is 0.571, an improvement of 0.3%, and at ε = 1 the precision rate of PWLDP is 0.591 and that of LDP is 0.581, an improvement of 1.7%. The improvement of PWLDP can also be observed in Figure 7. The F-value measures the effectiveness of the experimental method and also indicates the availability of the data. As can be seen from Figures 8 and 9, as ε increases, the F-value of the PWLDP algorithm also gradually increases and stays higher than that of the LDP algorithm. This is because, as the degree of privacy protection decreases, the amount of added noise also decreases; in PWLDP, as the privacy budget increases, attribute values with higher weights retain the information of the original data with higher probability and are perturbed to other values with smaller probability. Important data thus receive light perturbation while unimportant attributes consume only a small privacy budget, which has less impact on data availability and ensures that the availability of the data improves, so the F-value also improves. From Figure 9, it can be seen that the F-value gradually approaches 1, and the effectiveness of the algorithm is greatly improved. The F-value in Figure 8 improves relatively slowly because of the significant difference in the value domains of different attributes in the mammographic mass dataset, since the method performs better when an attribute has a smaller value domain.

5.2.3. Impact of Privacy Budget on Privacy Leakage

As can be seen from Figure 10, the smaller the privacy budget ε, the lower the privacy leakage and the better the privacy protection; as the privacy budget increases, the privacy leakage gradually increases, with the iDP-SC algorithm showing the most obvious leakage and the worst protection. There is no significant difference in privacy leakage between the PWLDP and LDP algorithms when the privacy budget is set low. In Figures 10(a) and 10(b), the privacy leakage of the two gradually diverges from ε = 0.3. The PWLDP algorithm adds noise based on the personalized weights of user attributes; when the privacy budget is small, the difference between this personalized budget allocation and a uniform allocation is not very obvious, and as the budget increases, the budget allocated to certain attributes in PWLDP gradually grows beyond the budget uniformly allocated to attributes in LDP, causing a small difference between the two at larger privacy budgets. At ε = 0.1, the privacy leakage of PWLDP in Figure 10(a) is reduced by 83.3% relative to the iDP-SC algorithm, and in Figure 10(b) by 56.6%. At ε = 1, the privacy leakage of PWLDP in Figure 10(a) is reduced by 48.7% relative to iDP-SC, and in Figure 10(b) by 36.1%. As can be seen from Table 4, the average privacy leakage of PWLDP on the mass dataset is 0.034 versus 0.09 for iDP-SC, an average reduction of 62.22%, and on the cancer dataset it is 0.029 versus 0.06 for iDP-SC, an average reduction of 51.67%. This shows that PWLDP maintains higher privacy, lower privacy leakage, and personalized privacy protection as the privacy budget increases.

6. Conclusions

In this study, we consider the data privacy of skyline queries in a data-sharing scenario based on local differential privacy. How to solve the data leakage problem in skyline queries is worth exploring: most current privacy protection methods for skyline queries focus on encryption and anonymization, where the degree of privacy cannot be quantitatively analyzed even when data availability is improved. We propose a personalized weighted local differential privacy method (PWLDP): the privacy treatment of the data lies in the hands of users, and quantitative privacy protection is provided for different attributes according to individual privacy requirements; experiments based on real datasets verify the effectiveness of the proposed scheme. Two aspects will be investigated in future work: (1) studying appropriate weighting parameters to achieve balanced and efficient personalized weighting; (2) considering users' privacy needs and data sensitivity from multiple perspectives so that the personalization mechanism can be more widely extended and applied.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (grant no. U20A20179).