Face Image Publication Based on Differential Privacy

Liu, Chao; Yang, Jing; Zhao, Weinan; Zhang, Yining; Li, Jingyou; Mu, Chunmiao

doi:https://doi.org/10.1155/2021/6680701

Wireless Communications and Mobile Computing

Special Issue

Applications of Semantic Mobile Computing in the Internet of Things and Web of Things

Research Article | Open Access

Volume 2021 | Article ID 6680701 | https://doi.org/10.1155/2021/6680701

Chao Liu,¹,² Jing Yang,¹ Weinan Zhao,² Yining Zhang,³ Jingyou Li,¹,² and Chunmiao Mu²

Academic Editor: Chi-Hua Chen

Received 09 Oct 2020

Revised 28 Nov 2020

Accepted 10 Dec 2020

Published 07 Jan 2021

Abstract

As an information carrier, a face image contains abundant sensitive information. Because face images are inherently weak in privacy, publishing them directly may disclose private information. Anonymization and data encryption technologies are limited by assumptions about the attacker's background knowledge and means of attack, so they cannot fully meet the needs of face image privacy protection. Therefore, this paper proposes a face image publishing algorithm, SWP (sliding window publication), which satisfies differential privacy. First, SWP uses image segmentation technology to transform the image gray matrix into a one-dimensional ordered data stream, which turns the image privacy protection problem into a data stream privacy protection problem. Then, the sliding window model is used to model the data stream. By comparing the similarity of data in adjacent sliding windows, the privacy budget is dynamically allocated and Laplace noise is added. In SWP, the data in the sliding window come from the image. To represent the image features contained in the data more comprehensively and to use the privacy budget more reasonably, this paper proposes a fused similarity measurement mechanism, EM (exact mechanism), and a dynamic privacy budget allocation mechanism, DA (dynamic allocation). In addition, to further improve the usability of face images and reduce the impact of noise, a sort-SWP algorithm based on SWP is proposed. Analysis shows that ordered input can further improve the usability of the SWP algorithm, but sorting the data directly destroys $\varepsilon$-differential privacy. Therefore, this paper proposes a sorting method, SAS, which satisfies $\varepsilon$-differential privacy: SAS first obtains an initial ordering by using the exponential mechanism and then optimizes it with the simulated annealing algorithm to obtain an approximately correct ordering.
Compared with the LAP algorithm and the SWP algorithm, the average accuracy of the sort-SWP algorithm on ORL and Yale is increased by 56.63% and 21.55%, the recall is increased by 6.85% and 3.32%, and the F1-score is improved by 55.62% and 16.55%.

1. Introduction

With the rapid development of information technology and multimedia technology, it has become easier to obtain and share digital face images. Users can publish photos taken with their mobile phones or digital cameras to social networking platforms (such as Twitter, LinkedIn, and WeChat) or other channels. Relevant statistics show that users share more than 3.2 billion face photos on major social networking platforms worldwide every day. In addition, a large amount of face image data is derived from video. However, these digital images usually contain a wealth of sensitive personal information. If this information is collected and analyzed by a third party with ulterior motives, it may lead to the disclosure of personal privacy and other unexpected losses.

Privacy is an emotive word that means different things to different people. According to the definition of the International Organization for Standardization (ISO), privacy refers to the characteristics that can distinguish individuals or groups from other individuals and groups. Different countries have different legal definitions of privacy, and different parties (individuals, enterprises, governments, etc.) define the scope of privacy differently. For digital images, sensitive information can be a specific person or object in an image, such as a face or a fingerprint; embedded information (photo location, creation time, etc.); or an area that the image owner is concerned about. The main purpose of privacy protection is to allow data to be published and analyzed without disclosing sensitive information.

Early research uses anonymization technology or data encryption technology to protect the privacy of face images. Anonymization technology conceals real data by hiding or blurring it. It generally adopts anonymization operations such as suppression [1], generalization [2], analysis [3], slicing [4], and separation [5]. In particular, $k$-anonymity [6] is a classic representative algorithm. $k$-anonymity requires that each record be indistinguishable from at least $k-1$ other records with respect to the attributes that could identify it. To address its shortcomings, $k$-anonymity was extended by the $l$-diversity method, which ensures that each equivalence class contains at least $l$ different sensitive attribute values [7]; the $t$-closeness method, which takes the global distribution of sensitive attributes into account [8]; the $m$-invariance method for dynamic relational data [9]; and the HD composition method [10]. References [11, 12] use an anonymization mechanism to propose the $k$-same method. This method anonymizes the published digital image so that the probability that an attacker re-identifies the user from the published image is less than $1/k$. However, the main drawback of traditional anonymization mechanisms is that they make too many assumptions about the attacker's background knowledge and attack model, and those assumptions do not always hold in practice.

Data encryption technology is another important research direction in image information security. Its representative methods include secure multiparty computation [13], homomorphic encryption [14], and classification algorithms [15]. References [16–19] prevent intrusion by third parties by controlling the user communication protocol. References [20–23] propose encrypting image content by permuting pixels or substituting pixel values. References [24–28] encrypt the image by changing its transform coefficients in the frequency domain. Like anonymization technology, data encryption technology makes assumptions about attacks and then designs the encryption algorithm around these assumptions. As a result, this kind of method falls into a cycle in which "new encryption methods are constantly proposed but constantly broken." Moreover, an encrypted image cannot be openly published and used.

Dwork first proposed differential privacy [29] in 2006. It perturbs sensitive data by adding noise to the output and can hide the influence of a single record: whether or not a record is in the dataset, the probability of producing the same output does not change significantly, so the attacker's ability to draw further inferences is limited. Differential privacy therefore has an advantage over other privacy protection technologies, because it makes no assumptions about the background knowledge of a potential attacker. Differential privacy is further studied in a series of papers by Dwork [30–34], and its implementation mechanisms are proposed in [35, 36]. McSherry pointed out that differential privacy algorithms for complex privacy problems satisfy two composition properties: sequential composition and parallel composition [37]. In recent years, differential privacy has mainly been used in data publishing, including histogram publishing [38–43], graph data publishing [44–48], data mining [49–51], data stream publishing [52], and spatial data publishing [53]. Because of the complexity of image data, the use of differential privacy to protect sensitive information in images is still at an exploratory stage.

A real-valued matrix is a common representation of an image: any pixel in the image can be mapped to a value at the corresponding position in a 2D matrix. The most direct method is to add Laplace noise to all the values in the matrix. Although this method satisfies $\varepsilon$-differential privacy, it causes excessive distortion and low usability of the perturbed image. The Fourier transform and the wavelet transform are commonly used compression techniques in image processing. In reference [54], an image compression method based on the discrete Fourier transform is proposed, which adds the corresponding Laplace noise to the compressed image. Although the noise error is reduced, a reconstruction error is introduced by the compression process. To reduce the impact of noise on the original image and improve the usability of published images, this paper proposes a differential privacy protection method for image publishing. Inspired by the lack of correlation among the values in the image matrix, this paper uses image segmentation technology to transform the image gray matrix into a 1D ordered data stream and then uses the sliding window model to model the data stream. By comparing the similarity of data in adjacent sliding windows, the privacy budget is dynamically allocated to solve the problem of image privacy protection. This method satisfies differential privacy while keeping the published image highly usable.

2. Background

2.1. Differential Privacy

Dalenius stated a goal for statistical databases: access to the database should not allow anyone to learn anything about an individual [55]. However, because of background knowledge, absolute privacy protection is not possible. Differential privacy sidesteps this issue and turns to relative privacy protection: any potential privacy breach is bounded by a small multiplicative factor. Note that serious leaks may still occur, but they will not be caused by whether a particular record is present in the database.

As a carrier of information, a digital image is usually stored and transmitted as a 3D matrix (i.e., a color image can be expressed as three 2D matrices, R, G, and B). To facilitate data processing, the 3D image matrix is normalized to obtain the corresponding 2D image gray matrix. An image $X$ can thus be expressed as a 2D matrix $X_{m \times n}$, where $m$ is the number of rows and $n$ is the number of columns of the matrix.

Formula (1) gives the specific calculation method.

Before the formal definition of differential privacy is given, adjacency is first defined for the image gray matrix $X_{m \times n}$.

Definition 1. Given an image $X$, the image gray matrix $X_{m \times n}$ is obtained after normalization, where $x_{ij}$ represents the gray value of the corresponding element in the matrix. If there is a matrix $X'_{m \times n}$ that differs from $X_{m \times n}$ in only one element, that is, $x_{ij} \neq x'_{ij}$ for exactly one position $(i, j)$, then $X_{m \times n}$ and $X'_{m \times n}$ are said to be adjacent to each other.

Definition 2. Let $A$ be a randomized algorithm for image data publishing, and let $\mathrm{Range}(A)$ be the output range of $A$. If, for any two adjacent two-dimensional matrices $X_{m \times n}$ and $X'_{m \times n}$ and any output set $S \subseteq \mathrm{Range}(A)$, algorithm $A$ satisfies Equation (3), then $A$ satisfies $\varepsilon$-differential privacy.

$$\Pr[A(X_{m \times n}) \in S] \le e^{\varepsilon} \cdot \Pr[A(X'_{m \times n}) \in S] \quad (3)$$

In Equation (3), $\varepsilon$ is usually a small positive number that balances privacy against accuracy. If $\varepsilon$ is small, the privacy is higher and the accuracy is lower, and vice versa. In general, users select $\varepsilon$ according to a given privacy policy. In addition, if an algorithm satisfies $\varepsilon$-differential privacy when neighboring databases differ by one record, it satisfies $k\varepsilon$-differential privacy when neighboring databases differ by at most $k$ records.

To achieve differential privacy, a certain amount of random noise needs to be added to the query results. Intuitively, its magnitude should cover the maximum impact of a single record on the output. Therefore, the noise level is closely related to the global sensitivity of the corresponding query function.

Definition 3. For any query function $f$ with outputs in $\mathbb{R}^d$, and for adjacent inputs $X$ and $X'$, the global sensitivity of $f$ is expressed as

$$\Delta f = \max_{X, X'} \lVert f(X) - f(X') \rVert_1 \quad (4)$$

The Laplace mechanism is the most common noise-adding mechanism. To achieve differential privacy, noise drawn from the Laplace distribution (with probability density function $p(x) = \frac{1}{2b} e^{-|x|/b}$, where $b = \Delta f / \varepsilon$) is added to perturb the real output.

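Theorem 4. Laplace mechanism: suppose $f$ is a query sequence of length $d$. The randomized algorithm $A$, which takes database $D$ as input and outputs the following vector, satisfies $\varepsilon$-differential privacy.

$$A(D) = f(D) + \left\langle \mathrm{Lap}_1\!\left(\frac{\Delta f}{\varepsilon}\right), \ldots, \mathrm{Lap}_d\!\left(\frac{\Delta f}{\varepsilon}\right) \right\rangle \quad (5)$$

Note that each $\mathrm{Lap}_i(\Delta f / \varepsilon)$ ($1 \le i \le d$) is an independent Laplace noise. The magnitude of the noise is proportional to $\Delta f$ and inversely proportional to $\varepsilon$.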
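As a minimal sketch of Theorem 4, the snippet below draws Laplace noise by inverse-CDF sampling and adds it to each answer of a query sequence. The function names and the use of Python's standard `random` module are illustrative choices, not part of the paper.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one sample from Lap(0, scale) by inverse-CDF sampling."""
    u = rng.random() - 0.5             # uniform on [-0.5, 0.5)
    u = max(u, -0.5 + 1e-12)           # avoid log(0) at the lower boundary
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def laplace_mechanism(answers, sensitivity, epsilon, rng=None):
    """Add independent Lap(sensitivity / epsilon) noise to every answer."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon      # larger sensitivity or smaller epsilon -> more noise
    return [a + laplace_noise(scale, rng) for a in answers]
```

Halving `epsilon` doubles the noise scale, which matches the statement that the noise magnitude is inversely proportional to the privacy budget.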

Theorem 5. Exponential mechanism: any sampling method $A$ under the exponential mechanism satisfies $\varepsilon$-differential privacy if it satisfies Equation (6).

$$\Pr[A(D, u) = r] \propto \exp\!\left(\frac{\varepsilon \, u(D, r)}{2 \Delta u}\right), \quad r \in R \quad (6)$$

The exponential mechanism mainly deals with nonnumerical outputs of a sampling algorithm. In the mechanism, $u$ is the scoring function, $\Delta u$ is the global sensitivity of the scoring function $u$, and $R$ is the output domain of the algorithm. According to formula (6), the higher the score of an output $r$, the greater the probability that $r$ is selected.
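The sampling rule of the exponential mechanism can be sketched as follows. This is an illustrative implementation under the standard form of the mechanism; the function name and arguments are hypothetical, and the maximum score is subtracted only for numerical stability (it does not change the selection probabilities).

```python
import math
import random


def exponential_mechanism(candidates, score, sensitivity, epsilon, rng=None):
    """Pick one candidate with probability proportional to
    exp(epsilon * score(candidate) / (2 * sensitivity))."""
    rng = rng or random.Random()
    scores = [score(c) for c in candidates]
    top = max(scores)                  # shift scores so exp() cannot overflow
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

For example, with scores 0, 1, and 10 and `epsilon = 2`, the third candidate is chosen almost every time, while a very small `epsilon` makes the choice close to uniform.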

2.2. Data Flow and Sliding Window Model

Differential privacy ensures that inserting or deleting a record in a database does not significantly affect the output of any query, so that the presence or absence of any record in the database poses no threat to its privacy. To define differential privacy on a data stream, it is necessary to define the neighbor relationship between two data streams.

Definition 6. Two data streams $S$ and $S'$ are neighbors of each other if they differ in at most one record.

Definition 7. Let $S$ and $S'$ be two neighboring data streams and let $A$ be a privacy algorithm. If the results of $A$ on $S$ and $S'$ satisfy Equation (7) for any output set $O$, then algorithm $A$ satisfies $\varepsilon$-differential privacy.

$$\Pr[A(S) \in O] \le e^{\varepsilon} \cdot \Pr[A(S') \in O] \quad (7)$$

At any time, the sliding window model only needs to consider and process the most recently arrived data, which reflects the fact that the importance of data in a data stream gradually decreases over time. The sliding window model is therefore commonly used in data stream processing. Here it is used to model a data stream of finite length. All sliding windows have the same fixed size $w$, which equals the number of data items contained in a window. The sliding amplitude is the distance that the current sliding window moves forward relative to the previous one. It is generally 1 but can be adjusted according to actual demand; however, to ensure the continuity of the sliding windows over the data stream, the sliding amplitude must not exceed $w$. For a data stream and two adjacent sliding windows $W_i$ and $W_{i+1}$, Figure 1 shows the schematic diagram of adjacent sliding windows under different sliding amplitudes.

Figure 1: Adjacent sliding windows under different sliding amplitudes, panels (a), (b), and (c).
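The sliding window model described above can be sketched in a few lines. The function name and parameters are illustrative; `size` is the fixed window size and `step` is the sliding amplitude, which is required to stay between 1 and the window size so that no data item is skipped.

```python
def sliding_windows(stream, size, step=1):
    """Split a finite data stream into fixed-size windows.

    `step` is the sliding amplitude; it must not exceed `size`,
    otherwise some items would fall between two windows.
    """
    if not 1 <= step <= size:
        raise ValueError("step must be between 1 and the window size")
    return [stream[i:i + size] for i in range(0, len(stream) - size + 1, step)]
```

With a stream of six items and a window size of 3, a sliding amplitude of 1 produces four overlapping windows, whereas an amplitude of 3 produces two disjoint windows.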

3. Methods

Most existing differential privacy methods for image publishing use image compression technology to transform the image gray matrix and then add Laplace noise to the transformed data. After that, the perturbed data are restored to obtain a noisy image. However, two kinds of error arise in this process. One is the noise error caused by the Laplace mechanism. The other is the reconstruction error introduced by the transformation. The total error of the published image is therefore the sum of the noise error and the reconstruction error.

Converting an image into an image gray matrix can be understood as storing the gray value at each position of the corresponding two-dimensional matrix. As shown in Figure 2, each value in the matrix is given a two-dimensional index that represents its position. This paper proposes to convert the original two-dimensional index into a one-dimensional index, transform the image gray matrix into a data stream in the order of the one-dimensional index, and then use the sliding window model to model the data stream.

The image gray matrix has no overall mathematical meaning; that is, the values in the matrix are not correlated with each other. Transforming the 2D matrix into a 1D data stream therefore does not destroy the distribution of the original image. Note that the whole conversion process is reversible and lossless, so the reconstruction error is avoided. In this paper, the total error of the published image therefore consists of the noise error only.
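The reversibility of the matrix-to-stream conversion can be illustrated with a small sketch. Row-major order is assumed here for the one-dimensional index; the paper's exact ordering (shown in its Figure 2) may differ, and the function names are hypothetical.

```python
def matrix_to_stream(matrix):
    """Flatten a 2D gray matrix into a 1D stream (row-major order, assumed)."""
    return [value for row in matrix for value in row]


def stream_to_matrix(stream, n_cols):
    """Rebuild the 2D matrix from the stream; the inverse of matrix_to_stream."""
    if n_cols <= 0 or len(stream) % n_cols != 0:
        raise ValueError("stream length must be a multiple of n_cols")
    return [stream[i:i + n_cols] for i in range(0, len(stream), n_cols)]
```

Because `stream_to_matrix(matrix_to_stream(m), n_cols)` returns the original matrix, the conversion itself contributes no reconstruction error.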

In this paper, the differential privacy publishing method for images uses a sliding window model to model the data stream. Suppose there is a sliding window $W_i$ at moment $i$. Whether the data in the current sliding window are perturbed and published is decided according to a preset threshold. If the difference between $W_i$ and the most recently published window (to which Laplace noise has already been added) is below the preset threshold, the most recently published window is released in place of $W_i$; otherwise, $W_i$ is allocated an appropriate privacy budget, perturbed with noise, and then released. In implementing the method, the following three principles should be considered:
(1) The data samples contained in any two adjacent sliding windows all come from the image gray matrix. Comparing the similarity of two samples by numerical difference alone cannot capture all the features of the image. Therefore, the similarity measurement for samples needs to be designed from the perspective of image features.
(2) Using a sliding window model to model the data stream requires a privacy budget allocation scheme. For any window, the smaller the allocated privacy budget, the larger the added noise and the higher the degree of privacy protection, but also the lower the usability of the published window. Therefore, it is necessary to establish a reasonable privacy budget allocation mechanism on the premise of satisfying $\varepsilon$-differential privacy.
(3) The method should satisfy $\varepsilon$-differential privacy and improve the usability of the noisy image.

3.1. LAP Algorithm

In this paper, the LAP algorithm is built on the Laplace mechanism. The method does not transform the original data but directly perturbs every value of the image gray matrix with Laplace noise and then releases the noisy matrix.

Given the gray matrix of an image, Laplace noise is added to each value of the matrix to obtain the noisy matrix. According to Theorem 4, the LAP algorithm satisfies $\varepsilon$-differential privacy, where $\varepsilon$ is the total privacy budget and each value of the image gray matrix is allocated its own share of that budget. To simplify the discussion, the size of $\varepsilon$ is not specified when methods are compared under the same conditions.

Next, the error of the LAP algorithm is described. Since the only error in the algorithm is Laplace noise, the sum of squared errors is obtained as:

In formula (10), the sensitivity is bounded, because every value of the gray matrix comes from the calculation in formula (1). The above formula also shows that the main factor determining the error of the LAP algorithm is the size of the selected image ($m \times n$). When the image is large, the LAP algorithm produces a large amount of noise, which leaves the noisy image with low usability. To improve usability, this paper proposes an image privacy-preserving publishing method based on a sliding window.
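The dependence of the LAP noise on image size can be made concrete with a small sketch. Two assumptions are made here that are not taken from the paper: the total budget is split evenly over all $m \times n$ values, and the sensitivity defaults to 1.0 (as it would for gray values normalized to $[0, 1]$). Laplace noise is generated as the difference of two exponential variates, which is a standard equivalent construction.

```python
import random


def lap_publish(gray, epsilon, sensitivity=1.0, rng=None):
    """Perturb every gray value with Laplace noise.

    Assumption (not from the paper): the total budget `epsilon` is split
    evenly, so each of the m*n values receives epsilon / (m*n).
    Returns the noisy matrix and the Laplace scale that was used.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    n_values = sum(len(row) for row in gray)
    scale = sensitivity * n_values / epsilon   # = sensitivity / (epsilon / n_values)

    def noise():
        # The difference of two Exp(1/scale) variates is Laplace(0, scale).
        return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

    noisy = [[value + noise() for value in row] for row in gray]
    return noisy, scale
```

Under these assumptions a 2 x 2 image with `epsilon = 1.0` gets a noise scale of 4, while a 4 x 4 image gets a scale of 16, which illustrates why larger images become less usable after perturbation.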

3.2. The Measurement Method of Similarity

The image privacy protection publishing method proposed in this paper relies on an accurate similarity measurement. Unlike an ordinary matrix, an image gray matrix not only has mathematical meaning but also carries the color, texture, and spatial distribution features of the image itself. Commonly used similarity measures include the Manhattan distance, the Chebyshev distance, and the cosine distance. Because these methods cannot analyze and compare samples from the perspective of images, their results can be used but reduce the usability of privacy-protected images. Therefore, an accurate similarity measurement for image data, EM (exact mechanism), is proposed.

Suppose $W_i$ and $W_j$ are two adjacent samples obtained with the sliding window model after the image gray matrix is transformed into a data stream. The Euclidean distance is often used to compare the similarity of data. The formula is:

$$d(W_i, W_j) = \sqrt{\sum_{k=1}^{w} (x_{ik} - x_{jk})^2} \quad (11)$$

where $w$ is the window size and $x_{ik}$ is the $k$-th value in $W_i$.

Since the data in the sample comes from pixel values, the Euclidean distance is used to measure the similarity of image data, and the results can reflect the difference of color features between different samples. According to formula (11), the similarity formula of the two samples is

An example illustrates the limitation of the similarity obtained from Equation (12). Given three samples $W_1$, $W_2$, and $W_3$, Equation (12) is used to calculate the similarity of $W_1$ and $W_2$ and the similarity of $W_1$ and $W_3$; the pair with the better score is judged to be more similar. In practice, however, this judgment can be wrong. Unlike the color feature, the texture feature is not based on single pixels; it requires statistical calculation over a region containing multiple pixels. In pattern matching, such a regional feature has great advantages, because matching will not fail due to a local deviation. For instance, one pair of samples may show a large global deviation while another pair shows only a minimal local deviation (a single differing value). Equation (12) does not consider the influence of texture features on image matching results.

The Jaccard similarity index is used to measure the similarity between two sets. It is defined as the number of elements in the intersection of the two sets divided by the number of elements in their union:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} \quad (13)$$

To take the influence of texture features into account, the Jaccard similarity index of the two samples is added as a correction term to the similarity calculation. The formula is

Formula (14) combines the Euclidean distance between the two samples with their Jaccard similarity index. A correction value is included to ensure that the denominator is not 0. To reduce the influence of this correction value on the result of formula (14), its value should be kept between 0.8 and 1.
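The two ingredients discussed above, the Euclidean distance and the Jaccard similarity index, can be sketched as plain functions. The sketch does not reproduce formula (14); it only shows, on hypothetical sample values, how the Jaccard index separates two cases that the Euclidean distance cannot tell apart.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length samples."""
    if len(a) != len(b):
        raise ValueError("samples must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard_index(a, b):
    """Jaccard similarity of the value sets of two samples."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


# Hypothetical samples: w2 shifts every value slightly (global deviation),
# w3 changes a single value (local deviation).
w1 = [10, 10, 10, 10]
w2 = [12, 12, 12, 12]
w3 = [10, 10, 10, 14]
```

Here `euclidean_distance(w1, w2)` and `euclidean_distance(w1, w3)` are both 4.0, while `jaccard_index(w1, w2)` is 0.0 and `jaccard_index(w1, w3)` is 0.5, so the set-based index distinguishes the local deviation from the global one.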

In addition, the Hamming distance is often used to reflect the difference between two vectors with the same structure and size. It compares the values at corresponding positions of two vectors $a$ and $b$ in turn: if the values at a position are the same, the result is 0; if they differ, the result is 1. Summing these results over all positions gives the Hamming distance of the two vectors:

$$H(a, b) = \sum_{k=1}^{w} [a_k \neq b_k] \quad (15)$$

where $w$ is the vector length and $[\cdot]$ equals 1 when the condition holds and 0 otherwise.

The Hamming distance can effectively measure the difference in spatial distribution characteristics between two samples, but its result is a nonnegative integer obtained by accumulating 1s, which cannot be used directly in the similarity calculation. Moreover, the result is limited by the sample size, so the Hamming distances of samples of different sizes are not comparable. To solve this problem, a method combining the Hamming distance with a perceptual hash algorithm [55] is proposed, which adds a disturbance value to formula (14) and further enhances the accuracy of the similarity. The values of the disturbance are as follows:

In the above formula, is the size of the sample, and is the minimum value given in advance. If , the two samples are not similar. If , the two samples are similar. To sum up, the EM mechanism can be expressed as

3.3. SWP Algorithm

Unlike the LAP algorithm, SWP (sliding window publication) uses a sliding window model to model the data stream. As the sliding window moves forward, the EM mechanism is used to compare the current window $W_i$ with the most recently published noisy window. If the threshold condition is met, the most recently published noisy window is released in place of $W_i$; otherwise, the DA mechanism allocates an appropriate privacy budget to $W_i$, Laplace noise is added, and the noisy window is released. The implementation process of the SWP algorithm is as follows.

Input: sliding window , privacy budget
Output:
1. ;
2. ;
3. ;
4.
5. use EM mechanism to calculate between and ;
6. If
7. ;
8. ;
9. else
10. ;
11. ;
12. ;
13. ;
14. end if
15. end for
16. return .

Note that the SWP algorithm uses an iterative dynamic privacy budget allocation mechanism, DA (dynamic allocation), which can be applied to a finite data stream. Unlike the commonly used halving (dichotomy) privacy budget allocation mechanism, the DA mechanism avoids consuming the privacy budget too quickly over a data stream and reduces the impact of noise on the original data to a certain extent.
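The control flow of SWP (compare, then either reuse the last published window or spend budget and add noise) can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the Euclidean distance stands in for the EM mechanism, a fixed share `epsilon / len(windows)` stands in for the DA mechanism, and the default sensitivity of 1.0 is an assumption.

```python
import math
import random


def swp_publish(windows, epsilon, threshold, sensitivity=1.0, rng=None):
    """Publish windows one by one, reusing the last noisy window when close.

    Stand-ins (assumptions): Euclidean distance replaces the EM mechanism,
    and a fixed per-publication share replaces the DA mechanism.
    Returns the published windows and the total budget actually spent.
    """
    if not windows:
        return [], 0.0
    rng = rng or random.Random()
    share = epsilon / len(windows)     # placeholder allocation, never exceeds epsilon in total
    scale = sensitivity / share
    published, spent, last = [], 0.0, None
    for window in windows:
        if last is not None and math.dist(window, last) < threshold:
            published.append(list(last))   # replacement consumes no privacy budget
            continue
        # Laplace(0, scale) noise as the difference of two exponential variates.
        last = [v + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
                for v in window]
        spent += share
        published.append(list(last))
    return published, spent
```

With a very large threshold only the first window consumes budget, and with a threshold of 0 every window is perturbed and the whole budget is used. In both cases the spent budget stays within `epsilon`.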

3.3.1. Dynamic Allocation

In the DA mechanism, the privacy budget for each consumption is assumed to be , and the number of sliding windows affected by each consumption of privacy budget is , there is , and , is the total number of sliding windows. Besides, is the remaining privacy budget after status is completed, then there is

We can see from the above formula of DA that under any state, , and the general formula of and is as follows

As shown in Figure 3, in the DA mechanism, Laplace noise is first added to the first sliding window $W_1$ to generate a noisy window, which is published; this consumes the first share of the privacy budget. The EM mechanism is then used to compare $W_2$ with the published noisy window; if the threshold condition is satisfied, $W_2$ is replaced by the published noisy window and released. This operation is repeated until all consecutive windows that meet the condition have been found. When $W_2$, $W_3$, and $W_4$ all meet the condition, all of them are replaced with the noisy version of $W_1$. Because the replacement operation does not consume any privacy budget, the total privacy budget consumed by these four sliding windows is the single share spent on $W_1$. If the comparison between $W_5$ and the published noisy window does not satisfy the condition, Laplace noise is added to $W_5$ to generate a new noisy window, which is published and consumes the next share of the privacy budget.

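Theorem 8. In SWP, the total privacy budget consumed will not exceed $\varepsilon$, that is, $\sum_i \varepsilon_i \le \varepsilon$, where $\varepsilon_i$ is the privacy budget consumed at the $i$-th consumption.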
Proof: according to formula (17), when , , then . In this state, it is expressed as where , , then . In this state, the formula is The proof is complete.
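Property 1 (sequential composition): let $D$ be a private data set and $A_1, \ldots, A_n$ be randomized algorithms, where $A_i$ satisfies $\varepsilon_i$-differential privacy. Then the combination of $A_1, \ldots, A_n$ on $D$ satisfies $\varepsilon$-differential privacy with $\varepsilon = \sum_{i=1}^{n} \varepsilon_i$.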

Theorem 9. The SWP method satisfies $\varepsilon$-differential privacy.
Proof: it can be seen from Theorem 4 that:

In the formula, $\varepsilon_i$ is the privacy budget consumed in each state. According to the sequential composition property of differential privacy (Property 1), the SWP method satisfies $\varepsilon$-differential privacy. The proof is complete.

Theorem 10. The error of the SWP method is not greater than that of the LAP algorithm, that is:

Certificate: the DA used in the SWP method consumes a privacy budget of . The number of sliding windows affected by each consumption of privacy budget is , and there is . The Laplace mechanism adopted by the lap algorithm consumes a privacy budgetand ()

First of all, the maximum error caused by the DA mechanism is calculated. In this case, it knows , means , so it is

For other cases, it can be judged according to formula (20). After the completion of state , the remaining privacy budget of DA is , the remaining privacy budget of the Laplace mechanism is The DA allows and assume the existence of . From this, it can get

Due to , so , then , it is shown that .The proof is complete.

3.4. Sort-SWP Algorithm

In the SWP algorithm, the sliding window is set to move forward only in one-dimensional space, so the impact range of is completely determined by . The verification method is not complicated. Assuming that there is an image which use the SWP method to protect -privacy protection of (assuming ), and between and every is calculated. Figure 4 shows the distribution of specific data (abscissa is , the ordinate is ). When , and will be replaced by to publish. From this, we can see that .

However, by looking at all in Figure 3, it is found when , it is not only and satisfy the condition but also including and . According to the proof process of Theorem 10, under the same conditions, the larger the value of is, the smaller the is. If the SWP method is executed after sorting , assumed , can be obtained. The sorted result is shown in Figure 5.

However, sorting the data directly destroys $\varepsilon$-differential privacy [56]. Reference [57] proposed adding independent Laplace noise to each subdataset and then sorting by the noisy values, but this method produces inaccurate sorting results. Reference [58] proposed sampling the data with repeated applications of the exponential mechanism and then sorting the data in the order of sampling; however, the accuracy of the resulting ranking depends too heavily on how the privacy budget is allocated. Therefore, an approximate sorting method, SAS (Simulated Annealing-Sort), is proposed that satisfies $\varepsilon$-differential privacy. In the SAS algorithm, sampling with the exponential mechanism produces the initial sorting result, and the simulated annealing algorithm then optimizes this result to obtain a more accurate ordering.

The simulated annealing algorithm is derived from the principle of solid annealing, and its core is the choice of solution. It is a greedy-style algorithm into which random factors are introduced, so that it accepts a solution worse than the current one with a certain probability. The annealing process in thermodynamics can be summarized as a body whose temperature drops slowly until its molecules reach the lowest energy state. Suppose the thermodynamic system has a set of discrete states, the energy of the current state $i$ is $E_i$, and the energy of the next state $j$ is $E_j$. According to the Metropolis criterion, the probability of the state transition at temperature $T$ ($T > 0$) is expressed as follows:

$$P(i \to j) = \begin{cases} 1, & E_j < E_i \\ \exp\!\left(-\dfrac{E_j - E_i}{T}\right), & E_j \ge E_i \end{cases} \quad (27)$$

According to Equation (27), the state transition is accepted when $E_j < E_i$, and it is accepted with probability $P$ when $E_j \ge E_i$. A positive number $\alpha$ less than 1 is used to control the cooling rate of the temperature iteratively. Equation (27) also shows that the higher the temperature is, the greater the value of $P$ is; conversely, the lower the temperature is, the smaller the value of $P$ will be.
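The Metropolis acceptance rule can be sketched as two small functions. The form used here omits any physical constant and depends only on the energy change and the temperature, which is a common simplification and an assumption with respect to the paper's exact formula.

```python
import math
import random


def acceptance_probability(delta_e, temperature):
    """Metropolis criterion: always accept improvements, otherwise
    accept with probability exp(-delta_e / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if delta_e < 0:
        return 1.0
    return math.exp(-delta_e / temperature)


def accept(delta_e, temperature, rng):
    """Decide randomly whether to accept a state transition."""
    return rng.random() < acceptance_probability(delta_e, temperature)
```

For the same positive energy change, `acceptance_probability(1.0, 10.0)` is larger than `acceptance_probability(1.0, 0.1)`, so worse solutions are accepted more readily at high temperature and almost never near the end of cooling.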

Because the Simulated Annealing Algorithm is probability-based, the solution obtained is not necessarily correct. It is not contrary to the core idea of differential privacy. Based on the above analysis, the implementation details of the SAS algorithm are given.

Input : sliding window , Privacy budget and , initial temperature , Cooling coefficient , Termination temperature
Output : Approximation of
1. ;
2. Using EM mechanism to calculate the similarity between and all other sliding windows, get ;
3. ;
4. while
5. The scoring function of exponential mechanism is ;
6. An unlabeled … is selected with probability by using the exponential mechanism
7. marked which is currently selected
8. ;
9. ;
10. ;
11. while ()
12. Random select ;
13. ;
14. if()
15. Change the position of and ;
16. Update sorting of data in
17. Change the position of and;
18. else
19. if()
20. Change the position of and ;
21. Update sorting of data in
22. Change the position of and ;
23. end if
24. ;
25. Return the sliding window ={W₁, , …} after sorting, initial position .

In the SAS algorithm, the first line allocates the privacy budget. In line 2, the EM mechanism calculates the similarity between $W_1$ and all other sliding windows, and the results are stored. Lines 4-7 use the exponential mechanism to rank the windows from small to large. Line 5 is the scoring function of the exponential mechanism: in the SAS algorithm, sliding windows that are more similar to $W_1$ should be extracted with a higher probability, and a small positive number ensures that the denominator of the scoring function is not 0. Besides, because deleting or adding a record affects only one count, the sensitivity of the scoring function is 1. In line 9, the sorted results are stored. Line 10 stores the sequence numbers of the positions before sorting. In lines 11-24, the annealing algorithm is used to optimize the sorting results. In lines 12-17, a position is randomly selected and exchanged with another, and the resulting energy change $\Delta E$ is computed. If $\Delta E < 0$, the exchange is accepted, and the changed ordering is regarded as the starting point of the next change. In lines 18-23, if $\Delta E \ge 0$, the change is accepted with a certain probability and the corresponding adjustment is made. Note that in this case the acceptance probability ranges from 0 to 1. In line 24, the temperature is lowered by multiplying it by the cooling coefficient $\alpha$ (generally set to a positive number slightly less than 1). Since this is an annealing process, the temperature decreases as the number of iterations increases, and by Equation (27) this causes the acceptance probability to decrease.
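The two stages described above (an initial ordering drawn with the exponential mechanism, then refinement by simulated annealing) can be sketched as follows. Several details are assumptions made for illustration and are not taken from the paper: the budget `epsilon1` is split evenly over the draws, the score is `1 / (dis + delta)` with sensitivity 1, the energy is the number of adjacent out-of-order pairs, and the default temperatures are arbitrary. The sketch shows control flow only and makes no privacy claim of its own.

```python
import math
import random


def sas_sort(dis, epsilon1, t0=1.0, alpha=0.95, t_end=1e-3, delta=1e-6, rng=None):
    """Return an approximate ascending ordering of the indices of `dis`."""
    rng = rng or random.Random()
    n = len(dis)
    if n == 0:
        return []
    per_draw = epsilon1 / n            # assumed even split of the budget across draws

    # Stage 1: repeated exponential-mechanism draws without replacement.
    remaining, order = list(range(n)), []
    while remaining:
        scores = [1.0 / (dis[i] + delta) for i in remaining]
        top = max(scores)              # shift for numerical stability
        weights = [math.exp(per_draw * (s - top) / 2.0) for s in scores]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(pick)
        order.append(pick)

    def energy(o):
        # Assumed energy: number of adjacent pairs that are out of order.
        return sum(1 for a, b in zip(o, o[1:]) if dis[a] > dis[b])

    # Stage 2: simulated annealing with random swaps and geometric cooling.
    current, t = energy(order), t0
    while t > t_end and n > 1:
        i, j = rng.sample(range(n), 2)
        order[i], order[j] = order[j], order[i]
        candidate = energy(order)
        delta_e = candidate - current
        if delta_e < 0 or rng.random() < math.exp(-delta_e / t):
            current = candidate        # accept the swap
        else:
            order[i], order[j] = order[j], order[i]   # reject: undo the swap
        t *= alpha
    return order
```

The returned list maps sorted positions to original positions, so it also serves as the record of positions before sorting that is needed to restore the original order later.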

Based on the above, a sort-SWP method, an optimization of SWP based on sorting, is proposed. The implementation process of the algorithm is as follows.

Input : sliding window , Privacy budget
Output : Satisfying differential privacy
1. ;
2. ;
3. ;
4. for to
5. if
6. ;
7. else
8. ;
9. ;
10. end if
11. end for
12. return .

The sort-SWP method consists of three main steps. First, the SAS algorithm is used to sort to get , where is used for the exponential mechanism and is used in the Laplace mechanism. To ensure the consistency of the sort-SWP method, is set. Second, the SWP algorithm is used to add noise to , which gives . Third (lines 5-11), because the sorted arrangement of the data in destroys the structure of the original image, the position numbers stored in are used to restore to .
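The bookkeeping in the third step, putting each noisy window back at its original position, can be sketched as follows. This is an illustration only: `sort_swp_sketch` and `laplace` are hypothetical names, the permutation `order` is assumed to come from whatever sorting step is used, and a single fixed noise scale stands in for the dynamic budget allocation of SWP, which is not reproduced here.

```python
import random


def laplace(scale, rng=random):
    """Laplace(0, scale) sample, drawn as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sort_swp_sketch(windows, order, noise_scale, rng=random):
    """Reorder windows, add noise, then restore the original positions.

    windows: list of lists of pixel values.
    order:   permutation from the sorting step; order[k] is the original
             index of the window placed at sorted position k.
    """
    sorted_windows = [windows[i] for i in order]
    # Stand-in for SWP: the same Laplace scale for every value (an assumption).
    noisy_sorted = [[v + laplace(noise_scale, rng) for v in w] for w in sorted_windows]
    restored = [None] * len(windows)
    for k, original_index in enumerate(order):
        restored[original_index] = noisy_sorted[k]
    return restored
```

Because `order` records where every window came from, the restore loop is an exact inverse of the reordering, so the published windows line up with the original image layout.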

Theorem 11. The sort-SWP method satisfies ε-differential privacy.
Proof: In the sort-SWP method, the total privacy budget is divided into three parts, and there exists . Among them, uses the exponential mechanism to select the sliding window , and the probability is proportional to , as follows.

Given and its adjacent , for any value of , it can be seen from formula (28)

It can be seen that the process of selecting by using the exponential mechanism satisfies -differential privacy in sort-SWP. Besides, according to Theorem 4, the sort-SWP method satisfies -differential privacy. According to the proof of Theorem 9, it can be concluded that the sort-SWP method satisfies -differential privacy. Therefore, combining the three parts, the whole process of the sort-SWP method satisfies ε-differential privacy. The proof is complete.
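For reference, the textbook forms of the two facts this kind of argument relies on are written out below. They are the standard statements of the exponential mechanism and of sequential composition, not a reconstruction of the paper's formula (28). The symbols $q$, $\Delta q$, $r$, $D$, $D'$ and the budget shares $\varepsilon_1, \varepsilon_2, \varepsilon_3$ are notation assumed here, with $\varepsilon_1$ denoting whichever share is given to the exponential mechanism.

```latex
% Exponential mechanism: output r is chosen with probability
\Pr[\mathcal{M}(D) = r]
  = \frac{\exp\!\left(\frac{\varepsilon_1\, q(D, r)}{2\,\Delta q}\right)}
         {\sum_{r'} \exp\!\left(\frac{\varepsilon_1\, q(D, r')}{2\,\Delta q}\right)}

% For adjacent D and D', |q(D, r) - q(D', r)| \le \Delta q, so the numerator
% and the normalising sum each contribute at most a factor e^{\varepsilon_1 / 2}:
\frac{\Pr[\mathcal{M}(D) = r]}{\Pr[\mathcal{M}(D') = r]}
  \le \exp\!\left(\frac{\varepsilon_1}{2}\right) \cdot \exp\!\left(\frac{\varepsilon_1}{2}\right)
  = e^{\varepsilon_1}

% Sequential composition of three steps applied to the same data:
\varepsilon = \varepsilon_1 + \varepsilon_2 + \varepsilon_3
```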

4. Experiment and Result Analysis

4.1. Sorting Method Experiment

Direct sorting destroys ε-differential privacy. Therefore, this paper adopts an approximate sorting method, the SAS algorithm, which satisfies ε-differential privacy. In the test, a one-dimensional array containing 57 random numbers is used as the experimental data set, and three sorting methods, namely, the simulated annealing algorithm, the exponential mechanism, and the SAS algorithm, are compared. Besides, to verify the impact of the privacy budget on the different ranking results, ε is taken as 0.4, 0.5, ln 2, and 1, respectively. The results are shown in Figure 7.

In Figure 7, the abscissa represents the number of times the algorithm is executed, and the ordinate represents the sorting accuracy. Three conclusions can be drawn from the experimental results: (1) the sorting results of the simulated annealing algorithm depend only on the number of executions and not on the value of ε; (2) the ranking results of the exponential mechanism depend only on the value of ε, with a higher ε giving higher sorting accuracy, and are independent of the number of executions; (3) the sorting results of the SAS algorithm depend on both the value of ε and the number of executions, and the accuracy increases with each.

Figure 7: Sorting accuracy versus number of executions, panels (a) to (d).

4.2. Simulation Experiment

Among the different kinds of images, face photos carry the most representative sensitive information. Therefore, to verify the usability of the algorithm, a frontal face photo is selected for the experiment in this paper. The experimental results of the different algorithms under the same conditions are shown in Figure 8. Figure 8(a) is the original image; Figure 8(b) is the result of directly adding Laplace noise to the original image; Figure 8(c) is the result of the SWP algorithm; Figure 8(d) is the result of the sort-SWP algorithm. Compared with the LAP algorithm, the SWP algorithm and the sort-SWP algorithm produce visibly better images.

Figure 8: (a) Original image; (b) LAP; (c) SWP; (d) Sort-SWP.

The core of this paper is to use the Laplace mechanism of differential privacy to add noise to the image and thereby protect the sensitive information in it. An attacker who wants to undo this protection will therefore commonly try to eliminate or reduce the image noise. Representative noise-removal methods are average (mean) filtering and median filtering. In Figures 9 and 10, the mean filter and the median filter are applied to the experimental results in Figure 8. The purpose is to evaluate the degree of privacy protection and the usability of the protected images after noise removal.
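The two denoising attacks can be sketched in plain Python. This is a minimal illustration, assuming a 3x3 neighbourhood and borders that use only the pixels that exist; the paper does not state the window size or border handling, and the function names are hypothetical.

```python
from statistics import median


def filter3x3(image, reducer):
    """Apply a 3x3 neighbourhood filter to a 2-D list of gray values.

    Border pixels use only the neighbours that fall inside the image.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out[r][c] = reducer(neighbours)
    return out


def mean_filter(image):
    """Average filter (AF): each pixel becomes the mean of its neighbourhood."""
    return filter3x3(image, lambda values: sum(values) / len(values))


def median_filter(image):
    """Median filter (MF): each pixel becomes the median of its neighbourhood."""
    return filter3x3(image, median)
```

On a flat patch with a single noisy spike, the median filter discards the spike entirely while the mean filter spreads it over the neighbourhood, which is why the two filters are usually evaluated separately.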

Figure 9: Results after average filtering (AF). (a) Original image and AF; (b) LAP and AF; (c) SWP and AF; (d) Sort-SWP and AF.

Figure 10: Results after median filtering (MF). (a) Original image and MF; (b) LAP and MF; (c) SWP and MF; (d) Sort-SWP and MF.

So far, the preliminary conclusion is that the SWP algorithm and the sort-SWP algorithm are better than the LAP method. However, this conclusion rests only on visual inspection of the experimental results, so it may be one-sided or inaccurate. To verify it further, the previous 12 images are converted into gray histograms for comparison. A gray histogram records how often each gray level occurs in an image. Although it cannot show where the pixels are located, it does reflect, as a statistical summary, the impact of Laplace noise on the image.
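A gray histogram of this kind is simple to compute. The sketch below assumes an 8-bit image stored as a 2-D list whose values are already within 0 to 255. The `histogram_l1_distance` helper is an addition made here to put a number on how close two histograms are; the paper itself compares the histograms visually.

```python
def gray_histogram(image, levels=256):
    """Count how many pixels take each gray level (values assumed in 0..levels-1)."""
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[int(value)] += 1
    return counts


def histogram_l1_distance(hist_a, hist_b):
    """Sum of absolute bin differences (an assumed comparison measure)."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))
```

Two images with identical pixel values in different positions have the same histogram, which is the limitation noted above: the histogram summarises the noise statistically but says nothing about spatial structure.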

As can be seen from Figures 11–13, under the same conditions, the histograms of the SWP algorithm and the LAP algorithm are close to each other but quite different from the pixel distribution of the original image. The histogram of the sort-SWP result is consistent with that of the original image. Therefore, the noise interference caused by the sort-SWP algorithm can be considered the smallest.

Figure 11: Gray histograms (GH). (a) Original image and GH; (b) LAP and GH; (c) SWP and GH; (d) Sort-SWP and GH.

Figure 12: Gray histograms after average filtering. (a) Original image and AF and GH; (b) LAP and AF and GH; (c) SWP and AF and GH; (d) Sort-SWP and AF and GH.

Figure 13: Gray histograms after median filtering. (a) Original image and MF and GH; (b) LAP and MF and GH; (c) SWP and MF and GH; (d) Sort-SWP and MF and GH.

In the literature on differential privacy protection, the amount of noise is an important index for evaluating an algorithm. Under the same conditions, less noise means higher availability. Of course, because an image is a special information carrier, noise is only one of the evaluation indicators.

If the pixel value of the gray-scale image is regarded as a numeric value, the value after adding Laplace noise changes with ε. The smaller ε is, the greater the noise and the higher the degree of privacy protection, and the noisy value has no bound. However, in the experiment, the pixel values of the gray image are limited to the range 0 to 255, so any value pushed outside this range by noise is clipped: all values greater than 255 are changed to 255, and all values less than 0 are changed to 0. Therefore, the error of the image after privacy protection is divided into absolute error (measured before clipping) and relative error (measured after clipping). For example, in an image , there is a pixel . If after adding noise, the absolute error value is 80, and the relative error value is 55.
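The distinction between the two error measures can be made concrete with a short Python sketch. The pixel values in the example were lost from this copy of the text, so the pair used below (an original value of 200 that becomes 280 after noise) is an assumption chosen only because it reproduces the stated errors of 80 and 55. Note that "relative error" here follows the paper's usage, meaning the error remaining after clipping, not a ratio.

```python
def clip_pixel(value, low=0, high=255):
    """Clip a noisy value back into the valid gray-level range."""
    return max(low, min(high, value))


def pixel_errors(original, noisy):
    """Return (absolute error, relative error) in the paper's sense.

    absolute: distance between the original and the unclipped noisy value.
    relative: distance between the original and the clipped noisy value.
    """
    absolute = abs(noisy - original)
    relative = abs(clip_pixel(noisy) - original)
    return absolute, relative


# Assumed example: 200 + 80 = 280 is clipped to 255.
print(pixel_errors(200, 280))  # (80, 55)
```

Clipping can only move a noisy value closer to the original, so the relative error never exceeds the absolute error.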

In Figure 6, the absolute error and relative error of the experimental results under the different algorithms are given. The errors caused by the three algorithms are in line with expectations: compared with the LAP algorithm and the SWP algorithm, the sort-SWP algorithm causes the smallest error. Also, the median filter is slightly better than the average filter for Laplace noise.

4.3. Result Analysis

To verify the feasibility of the algorithm, the ORL face database and the YALE face database are used as experimental data sets. The experimental environment is an Intel® Core i9-9900K CPU @ 3.60 GHz, 32 GB of memory, a GTX 21080TI GPU, and the Windows 10 operating system. A face recognition method based on an improved AlexNet convolutional neural network is adopted in the experiment. Compared with the AlexNet model, this method has a simpler network structure and fewer parameters, which saves considerable training time and allows fast prediction. For the ORL face database, 5 images of each person are used as the training set, and the other images are used as the test set. The sliding window length is 56, . For the YALE face database, 5 random images of each person are likewise used as the training set and the rest as the test set. The sliding window length is 160, . The privacy budget is 1, 2, 3, 4, and 5. The test items are accuracy rate, recall rate, and F1 score. The experimental results are shown in Tables 1–6.
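For readers who want to reproduce the three test items, one common way to compute them for a multi-class recognition task is sketched below. The paper does not say how per-class values are averaged, so macro-averaging (an unweighted mean over identities) is an assumption here, as is the function name `classification_metrics`.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, macro recall, macro F1) for two equal-length label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    recalls, f1_scores = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f1_scores.append(f1)
    return accuracy, sum(recalls) / len(labels), sum(f1_scores) / len(labels)
```

With the predicted identities from the recognition network as `y_pred` and the ground-truth identities as `y_true`, the same function can be applied to each privacy budget setting in turn.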

5. Conclusions

To solve the privacy protection problem of face image publishing, this paper proposes using the Laplace mechanism of differential privacy to add noise to the image, so as to protect the sensitive information in the face image. Compared with directly adding Laplace noise to the image, the SWP algorithm and the sort-SWP algorithm can effectively reduce the impact of noise on the protected image. In particular, the sort-SWP algorithm, which sorts with the SAS algorithm and then executes the SWP algorithm, further improves the usability of the protected face image. In the test experiments (ORL, Yale), compared with the LAP algorithm, the absolute error of the SWP algorithm decreased by 40.81%, the relative error decreased by 38.26%, accuracy increased by 35.01%, recall rate increased by 3.53%, and F1-score increased by 39.07%; the absolute error of the sort-SWP algorithm decreased by 64.12%, the relative error decreased by 51.70%, accuracy increased by 56.63%, recall rate increased by 6.85%, and F1-score increased by 55.62%. Besides, in the test of the SAS algorithm, compared with the exponential mechanism ranking method (the accuracy of the simulated annealing algorithm depends only on the number of iterations), the average ranking accuracy of the SAS algorithm is improved by 17.63%.

The differential privacy protection method proposed in this paper adds noise globally. However, most of the sensitive information contained in a face image lies in specific areas (such as the eyes, mouth, nose, and other organs or feature points). Future research will be devoted to accurately locating the regions of sensitive information and adding noise only to those local regions. At the same time, it will try to further reduce the impact of noise and improve the usability of face images after privacy protection.

Data Availability

The raw/processed data required to reproduce these findings cannot be shared at this time as the data also forms part of an ongoing study.

Conflicts of Interest

The authors declare that they have no competing interests.

Acknowledgments

The authors thank the project of the National Natural Science Foundation of China (Nos. 61672179, 61370083, and 61402126), Natural Science Foundation of Heilongjiang Province (No. F2015030), and Fundamental Research Funds in Heilongjiang Provincial Education Department (Nos. 135109247, 135109243, 12541872, and 135209239).


Copyright

Copyright © 2021 Chao Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
