Security and Communication Networks

Volume 2019, Article ID 2518714, 9 pages

https://doi.org/10.1155/2019/2518714

## Differentially Private Release of the Distribution of Clustering Coefficients across Communities

^{1}College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China

^{2}College of Computer and Control Engineering, Qiqihar University, Qiqihar 161006, China

Correspondence should be addressed to Jing Yang; yangjing@hrbeu.edu.cn

Received 22 March 2018; Accepted 16 December 2018; Published 1 January 2019

Academic Editor: Emanuele Maiorana

Copyright © 2019 Xiaoye Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Aiming to provide more information about the behaviors between groups or patterns between clusters in social networks, we propose a two-step differentially private method to release the distribution of clustering coefficients across communities. The DPLM algorithm adapts the Louvain method to partition a single network using an exponential mechanism. We introduce an absolute gain of modularity to sanitize neighboring communities; without it, the algorithm struggles to converge because of the randomness introduced. The DPCC algorithm releases the noisy distribution of clustering coefficients as a histogram, which presents the results in an intuitive manner. We conduct experiments on three real-world datasets to evaluate the proposed method. The experimental results indicate that the proposed method provides valuable distribution results while guaranteeing *ε*-differential privacy. Moreover, the DPLM algorithm can obtain better modularity for the networks.

#### 1. Introduction

Understanding the quantitative and qualitative characteristics of social networks has become an important challenge in the scientific research of the Internet era. Much of this research relies on the basic measures of complex networks. For example, the counts of triangles or other simple subgraphs can be used to characterize the connectivity of a graph. Various subgraph counts are also core data in graph analysis and serve as parameters of random graph models. The clustering coefficient, a reflection of social cohesion, measures whether nodes in the graph tend to cluster together. Detected communities can assist in studying the organization and function of complex networks.

The release of graph measures may violate the privacy of individuals in social networks, and several algorithms have been proposed to address this problem. We focus on schemes providing *ε*-differential privacy [1], a prominent concept that is widely studied in computer science. Initially, differential privacy was used to protect the output of queries in an interactive environment. Karwa et al. [2] provided differentially private approaches for releasing *k*-star and *k*-triangle counts. A *k*-star subgraph has a central node with *k* connected nodes. A *k*-triangle subgraph consists of *k* triangles sharing one edge. The two approaches are based on smooth sensitivity [3] and a higher-order local sensitivity, respectively. Shoaran et al. [4] provided zero-knowledge private [5] methods for releasing a group-based triangle measure, which is the ratio of the number of actual triangles to the number of all possible triangles. Note that the nodes of such triangles belong to different groups. In this study, we consider the clustering coefficient as the graph measure and provide an approach that protects link privacy during release.

Task et al. [6] proposed the concept of partition privacy, which provides broader protection at the level of small social groups rather than individuals. They released various graph measures in the form of histograms, such as triangle density, average shortest-path lengths, and subgraph counts. It should be noted that they ran experiments over a collection of graphs rather than one network. The accessible datasets are nonpartitioned graphs, so differentially private partitioning algorithms must be developed. Mülle et al. [7] proposed an approach for perturbing the input graph by operating on the adjacency matrix. The approach combines edge sampling and edge flipping and is essentially an edge randomization method. Graph clustering algorithms are then applied directly to the perturbed graph. Nguyen et al. [8] addressed the problem of detecting communities under differential privacy. They proposed two schemes, input perturbation and algorithm perturbation; a third category, output perturbation, also exists. They applied a high-pass filtering technique [9] to create a noisy weighted super-graph and then ran the original Louvain method [10] on the super-graph. To heuristically detect cohesive groups in a private manner, they proposed a divisive algorithm, ModDivisive, which realizes an exponential mechanism via Markov Chain Monte Carlo (MCMC). Modularity was used as the score function, and its global sensitivity was also derived. We improve on the Louvain method, one of the most cited methods for community detection, to implement a differentially private partitioning task for one network.

In this paper, we propose a novel method for differentially private release of the distribution of clustering coefficients across communities. The method partitions one network into several communities and then releases the histogram of clustering coefficients. This distribution is more informative than the average clustering coefficient of an entire network, as it may provide more information about the behaviors between groups or patterns between clusters in social networks. The rest of this paper is organized as follows. Section 2 introduces the background knowledge. The proposed method is presented in Section 3. Section 4 reports the experimental results. Finally, Section 5 concludes the study and provides additional research directions.

#### 2. Background

In this section, we first review the definition of differential privacy and some relevant concepts. Then, we introduce the calculation method of a clustering coefficient. Finally, we demonstrate the partition process of the Louvain method.

##### 2.1. Differential Privacy

Differential privacy rests on a mathematical foundation and provides provable guarantees, as cryptography does. The probability of producing a given result does not change significantly whether or not a particular record is in the dataset. It is therefore difficult for any potential adversary to make further inferences, even with background knowledge.

*Definition 1 (ε-differential privacy [1]).* A randomized algorithm $K$ satisfies $\varepsilon$-differential privacy if, for all neighboring datasets $D_1$ and $D_2$ differing by at most one record, and for all subsets of possible outputs $S \subseteq \mathrm{Range}(K)$,

$$\Pr[K(D_1) \in S] \le e^{\varepsilon} \cdot \Pr[K(D_2) \in S],$$

where $\varepsilon$ is a tuning parameter that controls the trade-off between privacy and accuracy. It is a small positive value; a smaller value yields higher privacy and lower accuracy, and vice versa.

In the context of social networks, differential privacy is adapted to edge-differential privacy and node-differential privacy in the literature [11]. We adopt the former notion, edge-DP for short, to protect individual edges from being disclosed. Under this definition, a neighboring graph is produced either by adding or removing an edge or by adding or removing an isolated node.

To achieve differential privacy, noise mechanisms need to be introduced. The magnitude of noise required depends on the global sensitivity. The common techniques are the Laplace mechanism and the exponential mechanism. Furthermore, differential privacy has two important composition properties, sequential composition and parallel composition. The relevant definitions are formally described below.

*Definition 2 (global sensitivity [1]).* For a function $f : D \to \mathbb{R}^d$, the global sensitivity of $f$ is

$$\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_1,$$

where $\mathbb{R}^d$ is the $d$-dimensional real vector space and $D$ and $D'$ are neighboring datasets. Global sensitivity represents the largest change that a single record could have on the outputs.

*Definition 3 (Laplace mechanism [12]).* For a function $f : D \to \mathbb{R}^d$, the randomized algorithm $M$ defined by

$$M(D) = f(D) + Y$$

satisfies $\varepsilon$-differential privacy, where $Y$ is a random variable sampled from the Laplace distribution with mean 0 and scale parameter $\Delta f / \varepsilon$. The Laplace mechanism perturbs the numerical outputs by adding noise to guarantee $\varepsilon$-differential privacy.
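The Laplace mechanism can be illustrated with a minimal sketch that uses only the Python standard library. The function names `laplace_noise` and `laplace_mechanism` are illustrative and do not come from the paper. The sketch assumes that the caller supplies the correct global sensitivity for the query, and it samples Laplace noise as the difference of two exponential variates.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponentials.

    If X1 and X2 are i.i.d. exponential with mean `scale`, then X1 - X2
    follows a Laplace distribution with mean 0 and scale `scale`.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def laplace_mechanism(values, sensitivity, epsilon, rng=random):
    """Add independent Laplace(sensitivity / epsilon) noise to each value.

    `sensitivity` is assumed to be the global (L1) sensitivity of the
    query that produced `values`; this function does not verify it.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


if __name__ == "__main__":
    # Hypothetical query answer with assumed sensitivity 1 and epsilon 0.5.
    print(laplace_mechanism([10.0, 20.0], sensitivity=1.0, epsilon=0.5))
```

A smaller `epsilon` gives a larger noise scale, which matches the privacy and accuracy trade-off described in Definition 1.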

*Definition 4 (exponential mechanism [13]).* Let $q(D, r)$ be a scoring function with global sensitivity $\Delta q$. The randomized algorithm $M$ that selects an output $r$ with probability

$$\Pr[M(D) = r] \propto \exp\!\left(\frac{\varepsilon\, q(D, r)}{2 \Delta q}\right)$$

satisfies $\varepsilon$-differential privacy; a higher score means a greater probability of being selected. The exponential mechanism is applicable to discrete outputs and ensures that an output is selected in a differentially private manner.
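As an illustration of the exponential mechanism, the following sketch selects one candidate with probability proportional to the exponentiated score. The function name `exponential_mechanism` and its parameters are hypothetical, and the sketch assumes a finite candidate list and a caller-supplied sensitivity for the scoring function. Scores are shifted by their maximum before exponentiation, which leaves the selection probabilities unchanged but avoids overflow.

```python
import math
import random


def exponential_mechanism(candidates, score, sensitivity, epsilon, rng=random):
    """Pick one candidate r with probability proportional to
    exp(epsilon * score(r) / (2 * sensitivity)).

    `score` maps a candidate to a real number; `sensitivity` is assumed
    to be the global sensitivity of that score function.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    scores = [score(r) for r in candidates]
    best = max(scores)
    # Subtracting the maximum score only rescales all weights equally.
    weights = [math.exp(epsilon * (s - best) / (2.0 * sensitivity))
               for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical scores for three candidate outputs.
    gains = {"community_a": 0.02, "community_b": 0.05, "community_c": 0.01}
    print(exponential_mechanism(list(gains), gains.get,
                                sensitivity=0.1, epsilon=1.0))
```

With a large `epsilon` the highest-scoring candidate is chosen almost always; with a small `epsilon` the choice approaches a uniform draw.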

Proposition 5 (sequential composition [14]). *Let each algorithm $A_i$ provide $\varepsilon_i$-differential privacy. The combination algorithm $A(A_1(D), A_2(D), \ldots)$ over the entire dataset $D$ provides $\sum_i \varepsilon_i$-differential privacy.*

Proposition 6 (parallel composition [14]). *Let each algorithm $A_i$ provide $\varepsilon_i$-differential privacy. The combination algorithm $A(A_1(D_1), A_2(D_2), \ldots)$ over the disjoint subsets $D_i$ of dataset $D$ provides $\max_i \varepsilon_i$-differential privacy.*

In solving complex privacy problems, we need to combine several differentially private mechanisms and properly allocate a privacy budget *ε* to every portion based on the combination properties.
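The bookkeeping implied by the two composition properties can be sketched as follows. The helper names and the even split between two steps are assumptions made for illustration only; they are not the allocation used by the proposed method.

```python
def split_budget(total_epsilon, fractions):
    """Split a total privacy budget into parts given by `fractions`.

    The fractions must be positive and sum to 1 (hypothetical helper).
    """
    if total_epsilon <= 0:
        raise ValueError("total_epsilon must be positive")
    if any(f <= 0 for f in fractions) or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must be positive and sum to 1")
    return [total_epsilon * f for f in fractions]


def sequential_cost(epsilons):
    """Total cost when every mechanism runs on the same dataset."""
    return sum(epsilons)


def parallel_cost(epsilons):
    """Total cost when each mechanism runs on a disjoint subset."""
    return max(epsilons)


if __name__ == "__main__":
    # Assumed example: a two-step pipeline sharing a budget of 1.0 evenly.
    step_budgets = split_budget(1.0, [0.5, 0.5])
    print(sequential_cost(step_budgets))   # cost of running both steps
    print(parallel_cost([0.5, 0.5, 0.5]))  # same epsilon on disjoint parts
```

The design point is that steps run one after another on the same graph add up, whereas queries that touch disjoint parts of the data cost only the largest individual budget.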

##### 2.2. Clustering Coefficient

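There are two ways to calculate clustering coefficients. The global approach measures the clustering of the entire network. The local approach measures the embeddedness of a single node, and the mean over all nodes measures the entire network.

(1) Global clustering coefficient is

$$C = \frac{3 \times N_{\triangle}}{N_{\vee}},$$

where $N_{\triangle}$ and $N_{\vee}$ are the numbers of triangle and wedge subgraphs, respectively.

(a) The wedge subgraph (*∨*) is a chain of 3 nodes connected by 2 edges.

(b) The triangle subgraph (△) is a clique of 3 nodes connected by 3 edges.

(2) Local clustering coefficient of a node $v_i$ is

$$C_i = \frac{2\,\lvert \{ e_{jk} : v_j, v_k \in N_i,\ e_{jk} \in E \} \rvert}{\lvert N_i \rvert \,(\lvert N_i \rvert - 1)}.$$

(a) $N_i$ is the set of direct neighboring nodes of $v_i$.

(b) $E$ is the set of edges.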
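The two clustering coefficients can be computed with a short sketch over an undirected, unweighted edge list. The function names and the adjacency representation are illustrative assumptions. Nodes with fewer than two neighbors are assigned a local coefficient of 0 here, which is a common convention rather than something stated in the text.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Fraction of neighbor pairs of `node` that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention assumed for degree 0 or 1
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def global_clustering(adj):
    """Ratio of closed wedges to all wedges (3 * triangles / wedges)."""
    wedges = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    # Each triangle is counted once per corner, i.e. three times in total.
    closed = sum(1 for v in adj
                 for a, b in combinations(adj[v], 2) if b in adj[a])
    return closed / wedges if wedges else 0.0


if __name__ == "__main__":
    # A triangle (1, 2, 3) with a pendant node 4 attached to node 3.
    adj = build_adjacency([(1, 2), (2, 3), (1, 3), (3, 4)])
    print(local_clustering(adj, 3), global_clustering(adj))
```

Averaging `local_clustering` over all nodes gives the mean local coefficient mentioned above, which in general differs from the global coefficient.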

##### 2.3. Louvain Method

The Louvain method [10] is a heuristic community detection algorithm based on modularity optimization. It can discover high-quality partitions in a short time and unfold a complete hierarchical community structure for large networks.

Each pass of the algorithm consists of two phases: the first phase optimizes modularity until a local maximum is attained, and the second phase aggregates communities to build a new weighted network. In the initial partition, each node is treated as its own community. The passes are repeated iteratively to reach the final partition, the top level of the hierarchy. The Louvain method is shown in Figure 1, where the algorithm runs two passes. In the first pass, the network is partitioned into three communities with a modularity of 0.3291. The algorithm continues with the second pass on the new weighted network. The modularity of the final partition is 0.3571, at which point the algorithm ends.
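The quantity that the Louvain method optimizes can be made concrete with a small sketch. It assumes the standard modularity of an unweighted, undirected graph without self-loops, $Q = \sum_c \left[ L_c / m - (d_c / 2m)^2 \right]$, where $m$ is the number of edges, $L_c$ is the number of edges inside community $c$, and $d_c$ is the total degree of its nodes. The function name, the toy graph, and the partitions are illustrative and are not the example of Figure 1.

```python
def modularity(edges, community_of):
    """Modularity of a partition of an unweighted, undirected graph.

    `edges` is a list of (u, v) pairs without duplicates or self-loops;
    `community_of` maps each node to its community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}  # edges with both endpoints in the community
    degree = {}    # sum of node degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2.0 * m)) ** 2
               for c, d in degree.items())


if __name__ == "__main__":
    # Toy graph (assumed): two triangles joined by the bridge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
    singletons = {n: n for n in range(6)}            # initial partition
    two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
    print(modularity(edges, singletons), modularity(edges, two_groups))
```

In the first phase of a pass, a node is moved to the neighboring community that yields the largest increase in this value, so comparing the modularity before and after a move gives the gain that drives the algorithm.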