Special Issue: Artificial Intelligence and Machine Learning-Driven Decision-Making
The Discrete Gaussian Expectation Maximization (Gradient) Algorithm for Differential Privacy
In this paper, we give a modified gradient EM algorithm that protects the privacy of sensitive data by adding noise from the discrete Gaussian mechanism. Specifically, it makes high-dimensional data easier to process, mainly through scaling, truncation, noise multiplication, and smoothing steps applied to the data. Since the variance of the discrete Gaussian is smaller than that of the continuous Gaussian, adding noise from the discrete Gaussian mechanism guarantees the differential privacy of the data more effectively. Finally, the standard gradient EM algorithm, the clipped algorithm, and our algorithm (DG-EM) are compared on the Gaussian mixture model (GMM). The experiments show that our algorithm can effectively protect high-dimensional sensitive data.
Big data have now spread to every field and organization in society, generating large amounts of personal data every day, which people use and analyse to drive the rapid development of society and technology. However, people expect that personal private data will be protected from being hacked or made public once it is collected. Therefore, how to protect data privacy effectively, so that data resist attack yet remain useful, has gradually attracted attention. Dwork et al.  introduced the concept and basic theoretical framework of differential privacy, which effectively protects users’ data privacy and offers a rigorous and elegant mathematical framework with provable guarantees.
The gradient EM algorithm is one of the most important statistical algorithms, and Wang et al.  recently applied it to privacy protection for sensitive data. Previously, the original EM algorithm and the gradient EM algorithm were used without any statistical guarantee. It was not until Balakrishnan et al.  gave a statistical guarantee for the EM algorithm that Wang et al.  gave a guarantee for the gradient EM algorithm based on it and extended it to data privacy protection. However, like most authors, they add Gaussian noise with a continuous distribution to the data, whereas in practice the outputs of data queries are often discrete, such as the number of records in a database that meet certain conditions. For this reason, Canonne et al.  proposed the discrete Gaussian mechanism, which adds discrete Gaussian noise to the data while achieving the same accuracy as adding continuous Gaussian noise.
In this paper, we design a discretized Gaussian algorithm for differentially private computation, built on the gradient EM algorithm and based on . Our algorithm works well in practice and can be extended to the general standard model. We also give the corresponding statistical guarantee for the algorithm. The paper is structured as follows. In the second part, we introduce some theory of the gradient EM algorithm, the discrete Gaussian, and differential privacy, as well as some related work. In the third part, we introduce our model, namely, the differential privacy discrete Gaussian EM (Gradient) algorithm (DG-EM), and the relevant statistical guarantee theorems. In the fourth part, we give data simulations over the sensitivity, sample size, and dimension of the aggregated data. The discussion of the model and future work appears in the fifth part. Finally, the proofs of some lemmas are given in the appendix.
2.1. Gradient EM Algorithm
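Assume that $(Y, Z)$ is the complete data, where $Y$ is the observed sample and $Z$ is called a latent variable. Latent variables are generally unobservable, because they are missing or reflect an underlying data structure. We denote by $\mathcal{Y}$ and $\mathcal{Z}$ the sample spaces of the variables $Y$ and $Z$, respectively. Suppose that $(Y, Z)$ has a joint density function $f_{\beta^*}(y, z)$ that belongs to a parameterized family $\{f_{\beta}(y, z) : \beta \in \Omega\}$. For convenience, let $p_{\beta}(y)$ denote the marginal density function of $Y$, and let $k_{\beta}(z \mid y)$ denote the conditional density function of $Z$ given $Y = y$. Suppose that the observed samples $y_1, \dots, y_n$ are drawn from the population $Y$. The EM algorithm seeks to maximize the log-likelihood function $\ell_n(\beta') = \frac{1}{n}\sum_{i=1}^{n} \log p_{\beta'}(y_i)$. By Jensen’s inequality, the log-likelihood function is bounded below as follows:
$$\frac{1}{n}\sum_{i=1}^{n} \log p_{\beta'}(y_i) \ge Q_n(\beta'; \beta) - \frac{1}{n}\sum_{i=1}^{n} \int_{\mathcal{Z}} k_{\beta}(z \mid y_i) \log k_{\beta}(z \mid y_i)\, dz,$$
where
$$Q_n(\beta'; \beta) = \frac{1}{n}\sum_{i=1}^{n} \int_{\mathcal{Z}} k_{\beta}(z \mid y_i) \log f_{\beta'}(y_i, z)\, dz.$$
The expectation of $Q_n(\beta'; \beta)$ is denoted as
$$Q(\beta'; \beta) = \mathbb{E}_{Y}\left[\int_{\mathcal{Z}} k_{\beta}(z \mid Y) \log f_{\beta'}(Y, z)\, dz\right].$$
To maximize the log-likelihood, the left-hand side of the inequality above is made large by iteratively increasing the lower bound on the right-hand side. The standard EM algorithm [6–9] evaluates the function $Q_n(\cdot; \beta^t)$ in the E-step of each iteration; in the M-step, the parameter is then chosen to maximize this function, and the maximizer is denoted $\beta^{t+1}$. If the function $Q_n$ is differentiable at each iteration, the gradient EM algorithm is often used to reach the maximum faster and with higher accuracy. It is usually stated as follows: when $Q_n(\cdot; \beta^t)$ is differentiable at the t-th iteration, we update the current parameter $\beta^t$ to $\beta^{t+1}$ by the following steps. E-step: compute $Q_n(\cdot; \beta^t)$. M-step: update $\beta^{t+1} = \beta^t + \eta \nabla Q_n(\beta^t; \beta^t)$, where the gradient is taken with respect to the first argument and $\eta$ is a parameter called the step size.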
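As a concrete illustration of the E-step and M-step described above, the following sketch runs gradient EM on a symmetric two-component Gaussian mixture, y = z·β* + v, with z uniform on {−1, +1} and v ~ N(0, σ²I). The model, the closed-form gradient used for it, the names `grad_q` and `gradient_em`, and all numerical settings are assumptions chosen for illustration; they are not taken from the paper, and no privacy noise is added here.

```python
import math
import random


def grad_q(beta, ys, sigma):
    """Gradient of Q_n(. ; beta) at beta for the assumed symmetric mixture."""
    n, d = len(ys), len(beta)
    g = [0.0] * d
    for y in ys:
        # E-step: tanh(<beta, y> / sigma^2) equals 2*w - 1, where w is the
        # posterior probability that the latent label z is +1.
        inner = sum(b * v for b, v in zip(beta, y))
        r = math.tanh(inner / sigma**2)
        for j in range(d):
            g[j] += r * y[j]
    return [(g[j] / n - beta[j]) / sigma**2 for j in range(d)]


def gradient_em(ys, beta0, sigma, eta, steps):
    """Iterate beta <- beta + eta * grad Q_n(beta; beta)."""
    beta = list(beta0)
    for _ in range(steps):
        g = grad_q(beta, ys, sigma)
        # M-step: a single gradient ascent step with step size eta.
        beta = [b + eta * gj for b, gj in zip(beta, g)]
    return beta


# Hypothetical synthetic data: beta_star and sigma are illustrative choices.
rng = random.Random(0)
beta_star = [2.0, -1.0]
sigma = 1.0
ys = []
for _ in range(2000):
    z = rng.choice((-1.0, 1.0))
    ys.append([z * b + rng.gauss(0.0, sigma) for b in beta_star])

beta_hat = gradient_em(ys, beta0=[1.0, 0.0], sigma=sigma, eta=0.5, steps=50)
err = math.sqrt(sum((a - b) ** 2 for a, b in zip(beta_hat, beta_star)))
print("estimate:", beta_hat, "l2 error:", err)
```

Because the mixture is symmetric, the sign of the estimate is determined by the initial value; the starting point here is chosen to correlate positively with `beta_star`.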
2.2. Discrete Gaussian
The study of discrete noise distributions has received increasing attention in recent years. In the literature, the discrete Laplace, discrete binomial, and discrete Gaussian distributions have been studied and applied in the field of cryptography.
In this paper, the differential privacy model is studied on the basis of the Gaussian mechanism. Normally distributed noise gives the model many elegant mathematical properties. Although the discrete Laplace and discrete Gaussian noise mechanisms cannot be compared within the same model, since they are used with different privacy mechanisms, we prefer discrete Gaussian noise because it yields cleaner mathematical conclusions [10–13].
In this paper, we add noise with a discrete Gaussian distribution to specially processed samples. First, we give the definition of the discrete Gaussian distribution and some useful related theory.
Definition 1. Let $\mu, \sigma \in \mathbb{R}$ with $\sigma > 0$. If a random variable $X$ supported on the integers has the probability mass function
$$P[X = x] = \frac{e^{-(x-\mu)^2/(2\sigma^2)}}{\sum_{y \in \mathbb{Z}} e^{-(y-\mu)^2/(2\sigma^2)}}, \quad x \in \mathbb{Z},$$
then we say that $X$ follows the discrete Gaussian distribution with location parameter $\mu$ and scale parameter $\sigma$, denoted $X \sim \mathcal{N}_{\mathbb{Z}}(\mu, \sigma^2)$.
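The probability mass function in Definition 1 can be evaluated numerically. The sketch below assumes the standard form of the discrete Gaussian, in which the mass at each integer x is proportional to exp(−(x − μ)²/(2σ²)), and approximates the normalizing sum over all integers by a finite window. The function name `discrete_gaussian_pmf` and the `tail` cutoff are illustrative assumptions.

```python
import math


def discrete_gaussian_pmf(mu, sigma, tail=12):
    """Return {x: P[X = x]} for integers within about `tail` std devs of mu.

    ASSUMPTION: the infinite normalizing sum is truncated to a finite window;
    the mass outside the window is negligible for the default cutoff.
    """
    lo = math.floor(mu - tail * sigma) - 1
    hi = math.ceil(mu + tail * sigma) + 1
    weights = {
        x: math.exp(-((x - mu) ** 2) / (2.0 * sigma**2))
        for x in range(lo, hi + 1)
    }
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


sigma = 0.5
pmf = discrete_gaussian_pmf(mu=0.0, sigma=sigma)
mean = sum(x * p for x, p in pmf.items())
var = sum((x - mean) ** 2 * p for x, p in pmf.items())
print("mean:", mean, "variance:", var, "sigma^2:", sigma**2)
```

Comparing `var` with `sigma**2` gives a quick numerical check of the property used in this paper, that the variance of the discrete Gaussian does not exceed that of the continuous Gaussian with the same scale parameter.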
2.3. Some Basic Theories on Differential Privacy
Definition 2. A randomized algorithm $M$ satisfies $(\varepsilon, \delta)$-differential privacy (DP) if, for all neighboring datasets $D, D'$ differing in a single entry and for all events $S$ in the output space of $M$, we have $P[M(D) \in S] \le e^{\varepsilon} P[M(D') \in S] + \delta$. The guarantee is called approximate differential privacy if $\delta > 0$, and pure (or point-wise) $\varepsilon$-differential privacy in the case of $(\varepsilon, 0)$-differential privacy.
The concept of concentrated differential privacy given by Bun et al.  is as follows:
Definition 3. A randomized algorithm satisfies -concentrated differential privacy if for neighboring datasets , and for any , we have , where is the Rényi divergence of order of the distribution from the distribution.
From these definitions, it follows that pure DP implies -CDP, and -CDP implies -DP, where is a positive constant.
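The chain of implications above can be made numerical. The sketch below uses the well-known conversions for zero-concentrated differential privacy due to Bun and Steinke: ε-DP implies ρ-zCDP with ρ = ε²/2, and ρ-zCDP implies (ρ + 2√(ρ log(1/δ)), δ)-DP for any δ > 0. Whether these are exactly the constants intended in this paper is an assumption, and the function names are illustrative.

```python
import math


def pure_dp_to_zcdp(eps):
    """eps-DP implies rho-zCDP with rho = eps^2 / 2 (assumed conversion)."""
    return 0.5 * eps**2


def zcdp_to_approx_dp(rho, delta):
    """rho-zCDP implies (eps, delta)-DP with eps = rho + 2*sqrt(rho*log(1/delta))."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))


rho = pure_dp_to_zcdp(1.0)                       # pure 1-DP as a zCDP parameter
eps_approx = zcdp_to_approx_dp(rho, delta=1e-5)  # back to approximate DP
print("rho:", rho, "eps at delta=1e-5:", eps_approx)
```

The round trip is lossy: starting from pure 1-DP and converting back yields a larger ε at the chosen δ, which is why the analysis is best carried out in the concentrated framework and converted only once at the end.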
In order to ensure the consistency of the parameters of our model, we need some basic definitions and assumptions based on .
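Definition 4. (self-consistent). We call the function $Q(\cdot; \cdot)$ self-consistent if $\beta^* = \arg\max_{\beta} Q(\beta; \beta^*)$.
Definition 5. (Lipschitz-gradient-2$(\gamma, \mathcal{B})$). We call the function $Q(\cdot; \cdot)$ Lipschitz-gradient-2$(\gamma, \mathcal{B})$ if the following inequality holds for the true parameter $\beta^*$ and any $\beta \in \mathcal{B}$:
$$\|\nabla Q(\beta; \beta^*) - \nabla Q(\beta; \beta)\|_2 \le \gamma \|\beta - \beta^*\|_2.$$
Definition 6. ($\mu$-smooth). We call the function $Q(\cdot; \beta^*)$ $\mu$-smooth if, for any parameters $\beta_1, \beta_2 \in \mathcal{B}$, we have the inequality
$$Q(\beta_1; \beta^*) \ge Q(\beta_2; \beta^*) + \langle \beta_1 - \beta_2, \nabla Q(\beta_2; \beta^*) \rangle - \frac{\mu}{2}\|\beta_2 - \beta_1\|_2^2.$$
Definition 7. ($\upsilon$-strongly concave). We call the function $Q(\cdot; \beta^*)$ $\upsilon$-strongly concave if, for any parameters $\beta_1, \beta_2 \in \mathcal{B}$, we have the inequality
$$Q(\beta_1; \beta^*) \le Q(\beta_2; \beta^*) + \langle \beta_1 - \beta_2, \nabla Q(\beta_2; \beta^*) \rangle - \frac{\upsilon}{2}\|\beta_2 - \beta_1\|_2^2.$$
Assumption 1. We assume that the function $Q(\cdot; \cdot)$ is self-consistent, Lipschitz-gradient-2$(\gamma, \mathcal{B})$, $\mu$-smooth, and $\upsilon$-strongly concave on some parameter set $\mathcal{B}$.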
3. Differential Privacy Discrete Gaussian EM (Gradient) Model
Our algorithm builds on the EM algorithm of  and applies the discrete Gaussian noise mechanism to a high-dimensional truncation algorithm, which satisfies concentrated differential privacy (CDP). Like Wang et al. , we first consider the one-coordinate case, that is, a 1-dimensional random variable . Let be i.i.d. samples from . We obtain the clipped estimator as follows:
Step 1. For the sample , we take the soft truncation function defined by Catoni and Giulini . Then, we take a mild constant and rescale the sample by dividing by to get ; in this way, we obtain the truncated mean. From the expression of the function , we know that is bounded by , so the sensitivity is .
Step 2. Generate random noises from a common distribution with . For the data , we obtain new data by multiplying by the noise factor , and we obtain the term by the scaling and truncation step. Multiplicative noise is an effective way to preserve estimation quality on typical points while improving it as far as possible on outliers. It was first proposed by Srivastava et al. , and the motivation for using Gaussian multiplicative noise comes from .
Step 3. Finally, we smooth the estimator by taking its expectation with respect to the distribution of the multiplicative noise.
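Steps 1–3 can be sketched in code under explicit assumptions. The soft truncation function is taken to be the form commonly attributed to Catoni and Giulini, φ(x) = x − x³/6 on [−√2, √2] and ±2√2/3 outside, so that |φ| ≤ 2√2/3. The noise factor is assumed to be (1 + η) with zero-mean Gaussian η, and the expectation in Step 3 is approximated by Monte Carlo averaging rather than a closed form. The names `soft_truncate` and `smoothed_truncated_mean`, the scale `s`, and the noise variance are all illustrative choices, not the paper's settings.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def soft_truncate(x):
    """Assumed soft truncation; the output magnitude never exceeds 2*sqrt(2)/3."""
    if x > SQRT2:
        return 2.0 * SQRT2 / 3.0
    if x < -SQRT2:
        return -2.0 * SQRT2 / 3.0
    return x - x**3 / 6.0


def smoothed_truncated_mean(xs, s, noise_var, draws=100, seed=0):
    """Steps 1-3: rescale by s, multiply by (1 + eta), truncate, average over eta."""
    rng = random.Random(seed)
    sd = math.sqrt(noise_var)
    total = 0.0
    for x in xs:
        acc = 0.0
        for _ in range(draws):
            eta = rng.gauss(0.0, sd)  # ASSUMPTION: zero-mean Gaussian noise
            acc += soft_truncate(x * (1.0 + eta) / s)
        total += acc / draws  # Monte Carlo stand-in for the Step 3 expectation
    return s * total / len(xs)


# Hypothetical data: 300 draws with mean 1, plus a copy with one huge outlier.
rng = random.Random(1)
xs = [rng.gauss(1.0, 1.0) for _ in range(300)]
s = 5.0
est = smoothed_truncated_mean(xs, s, noise_var=0.1)
xs_outlier = [1e6] + xs[1:]
est_outlier = smoothed_truncated_mean(xs_outlier, s, noise_var=0.1)
print("estimate:", est, "with outlier:", est_outlier)
```

Because each truncated term is bounded by 2√2/3, replacing a single sample moves the estimate by at most 4√2·s/(3n) under these assumptions, which is the bounded-sensitivity property that the noise-adding step relies on.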
As in Catoni and Giulini , taking , we let the noise distribution be the discrete Gaussian distribution . It is easy to see that, for any given constant , we also have
where is a correction term . The symbols are defined, respectively, as
Also, the notation is defined by
Lemma 1. Let be i.i.d. samples from . Assume , and that the upper bound is known. Given a number , for and , we have
with probability at least .
From the soft truncation function and the multiplicative noise algorithm, we know that the sensitivity of the processed observation samples is . Next, we add discrete Gaussian noise to the observations so that the query
is -DP, which leads to Lemma 2 below; the proof is given in Appendix B.
Lemma 2. Let ; let the function be the operator defined by Steps 1–3, satisfying for any ; the query can be written as a randomized algorithm , where . Then satisfies -DP.
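The noise-adding step can be illustrated on the simplest discrete query, a count with sensitivity 1. The sketch below releases the count plus discrete Gaussian noise with σ = Δ/ε, the calibration for which Canonne et al. show that an integer-valued query with sensitivity Δ satisfies ε²/2-concentrated differential privacy. The sampler here draws from a truncated support with floating-point weights, so it is an approximation for illustration only and not an exact sampler; the record values and function names are hypothetical.

```python
import math
import random


def sample_discrete_gaussian(sigma, rng, tail=12):
    """Approximate draw from the discrete Gaussian with location 0 and scale sigma.

    ASSUMPTION: the support is truncated to about `tail` standard deviations
    and weights use floating point, so the formal guarantee is only approximate.
    """
    bound = math.ceil(tail * sigma) + 1
    support = list(range(-bound, bound + 1))
    weights = [math.exp(-(x * x) / (2.0 * sigma**2)) for x in support]
    return rng.choices(support, weights=weights, k=1)[0]


def discrete_gaussian_mechanism(count, sensitivity, eps, rng):
    """Release an integer count plus discrete Gaussian noise, sigma = sensitivity / eps."""
    sigma = sensitivity / eps
    return count + sample_discrete_gaussian(sigma, rng)


rng = random.Random(42)
ages = [23, 35, 41, 58, 62, 67, 71]  # hypothetical records
true_count = sum(1 for a in ages if a >= 60)
releases = [
    discrete_gaussian_mechanism(true_count, sensitivity=1, eps=0.5, rng=rng)
    for _ in range(2000)
]
print("true count:", true_count, "mean release:", sum(releases) / len(releases))
```

Every released value stays an integer, which matches the form of the underlying query, and the noise is centered on the true count.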
Furthermore, these results imply the following lemma.
Lemma 3. Under Assumption 1, with probability at least , the following holds:
After estimating the univariate private data, in the t-th iteration of Algorithm 1 we apply the univariate estimation method to each coordinate of the gradient and thereby obtain an estimate of the gradient . Finally, the M-step is performed.
Lemma 4. For any , let ; for any , and , Algorithm 1 satisfies -DP for
where .
For Algorithm 1, the following result shows that the parameter estimate is consistent if the initial parameter is close enough to the true parameter. After some simple calculations, we conclude that in Lemma 2 the upper bound is , where is the optimal numerical solution to the equation
Lemma 5. Let denote a parameter set with , which is a positive constant. Assume that the parameters satisfy the condition of . If and is large enough that
then we have for all . Furthermore, if we take and , we have
Lemma 6. Let , then there exists a constant such that self-consistency, the Lipschitz-gradient-2 property, -smoothness, and -strong concavity hold for the function with , where is a sufficiently large constant denoting the minimum signal-to-noise ratio (SNR).
Furthermore, we obtain Theorems 1 and 2. The proofs of these theorems are straightforward, so we do not give the details here. In fact, we only need to replace the upper bound on the variance of the discrete noise in  for a single coordinate with .
Theorem 1. With the same condition as in Lemma 4, for any , the -th coordinate of satisfies the following results:
Theorem 2. Under the same conditions as in Lemma 3, we assume that in Algorithm 1, and that is large enough that
If we take and the ratio as , then for a failure probability , we have with probability at least
We note that Lemmas 3–6 and Theorems 1 and 2 follow readily from Lemmas 1 and 2. Owing to limited space, we omit these proofs. In the proofs, one only needs to pay attention to the upper bound on the -norm between the iterates of the parameters and their true values.
4. Experiments and Results
In this section, we evaluate the performance of Algorithm 1 on the Gaussian mixture model (GMM). We study the statistical setting and theoretical behavior of the algorithm on synthetic data.
4.1. Baseline Methods
In this part, we compare our algorithm primarily against two methods. For convenience, we refer to the gradient EM algorithm as EM, which serves as the nonprivate baseline. The other is the clipped differentially private EM algorithm, which we refer to as clipped  and which serves as our private baseline.
4.2. Experimental Settings
In this experiment, we generate synthetic data from a mixture distribution with two components. For each algorithm, the initial parameter values are chosen by random initialization. In the results, we use to measure the estimation error. We set the signal-to-noise ratio . For the privacy parameter , we set , and the parameter is then computed, because it is a function of .
4.3. Experimental Results
In Figure 1, we fix . When the privacy budget of our method is set to different values, the estimation error decreases significantly as the number of iterations increases. When the budget is , and 1, the optimal value is , and 2, respectively. It is difficult for us to determine the optimal value .
In Figure 2, for the lower-dimensional case, we test how the data dimension , privacy budget , and data size affect the estimation error of the algorithms on the Gaussian mixture model over iteration . We can see that the estimation error of Algorithm 1 on the GMM decreases when increases, increases, or decreases. However, when the budget is small, our algorithm performs poorly, and the estimation error declines unstably as the number of iterations increases.
In Figure 3, we can see that, for high-dimensional data, a relatively large sample is needed to guarantee a small estimation error. We conducted experiments with higher dimensions and sample sizes of and 10 000, respectively. When the sample size is large enough, the estimation error decreases significantly with the number of iterations . As Figure 3 shows, as the sample size increases our algorithm remains effective in high-dimensional space, which Wang et al.’s  algorithm does not match.
In this paper, we study a differential privacy model with noise from the discrete Gaussian mechanism. Through data scaling and truncation, the model effectively mitigates the influence of high-dimensional data. The experiments and the theoretical analysis show that, in low dimensions, the estimation error of the model with discrete Gaussian noise decreases faster than that of the clipped model with continuous Gaussian noise. In high dimensions, the effect is much better than that of . At the same time, the lemmas above show that our model has tighter bounds, because of the smaller variance of discrete Gaussian noise.
Proof of Lemma 1
Proof. To make the conclusion general, we make some necessary assumptions. First, let denote the set of all probability measures on , which we assume is equipped with an appropriate -field. Consider any two measures , and let be a -measurable function. We take the cumulant generating function in the form
through a Legendre transform of the mapping , where denotes the Rényi divergence between and .
Because depends on two random quantities, and the noise , we write .
By the earlier definition of the function , the function is measurable and bounded with . Next, we let
where is a term to be determined later. Inserting into (A.1), we have
Furthermore, we have
If
we have
So,
Because is -measurable, is -measurable. We have
with probability at least . We have the following inequality:
Since the noise terms are independent and follow the distribution , we get
Thus, we have the following bound from equation (A.9):
and we then need to analyse the first and second terms on the right-hand side of inequality (A.11).
For the first term, from the definition of the truncation function , by (A.11), we have
Since , the expectation and variance of are as follows:
For the second term in (A.11), we need to evaluate . We take ; through simple computations, we get
Thus, we can take the upper bound in the form
Differentiating with respect to the variable , we have
and with respect to , we have
Plugging equation (A.17) into the setting of , we get
We obtain the setting of from equation (A.18); equation (A.15) then has an upper bound of the following form:
To get lower bounds on , we need upper bounds on . Similarly to the analysis above, we get the upper bound of through
By the fact
we have
Putting the above analysis together, we have
Then, with probability at least , the following holds
Proof of Lemma 2
Proposition 1. Let with and . Let . Then,
Furthermore, this inequality is an equality whenever is an integer.
The data in this paper are random numbers generated by statistical software R.
Conflicts of Interest
The author declares no conflicts of interest.
This research was supported by the Higher Education Project of Jilin Province (Grant no. JGJX2019D253).
G. McLachlan and T. Krishnan, The EM Algorithm and Extensions, Wiley-Interscience, Hoboken, NJ, USA, 2nd edition, 2007.
I. Naim and D. Gildea, “Convergence of the EM algorithm for Gaussian mixtures with unbalanced mixing coefficients,” in Proceedings of the 29th International Conference on Machine Learning, ICML 2012, vol. 2, Edinburgh, Scotland, June 2012.
C. Dwork and A. Roth, “The algorithmic foundations of differential privacy,” Foundations and Trends® in Theoretical Computer Science, vol. 9, pp. 211–407, 2014.
N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.