Abstract

Due to the popularity of social networks and human-carried/human-affiliated devices with sensing abilities, such as smartphones and smart wearable devices, a novel application has recently emerged: organizing group activities by learning from historical data gathered by smart devices and carefully choosing invitees based on their personal interests. We propose a private and efficient social activity invitation framework. Our main contributions are (1) defining a novel friendship notion that reduces the communication/update cost within the social network and, at the same time, enhances the privacy guarantee; (2) designing a strong privacy-preserving algorithm for graph publication, which addresses an open concern raised recently; and (3) presenting an efficient invitee-selection algorithm that outperforms the existing ones. Our simulation results show that the proposed framework performs well. In our framework, the server is assumed to be untrustworthy but can nonetheless help users organize group activities intelligently and efficiently. Moreover, the new definition of friendship allows the social network to be described by a directed graph. To the best of our knowledge, this is the first work to publish a directed graph in a differentially private manner with an untrustworthy server.

1. Introduction

Nowadays, social networks pervade nearly every aspect of our lives [17], as people use them to connect, interact, and share with their peers. In particular, the ubiquity of smartphones and various social network applications has made the global social network flourish in recent years. One common and critical service provided by social networks is organizing group activities. Unfortunately, most social networks offer only rudimentary invitation mechanisms, which send invitations either one-by-one manually or to everyone automatically. Besides, most group activities are filled strictly on a first-come, first-served basis. These services are ill-suited for frequent, small, ad hoc events such as outdoor activities: inviting every possible candidate increases the likelihood of a group in which few people know anybody except the host, whereas manually searching for a well-acquainted social group that performs the same kind of exercise at the same time and place is tedious [8]. From the invitees’ perspective, they might be overwhelmed by a plethora of activity invitations that they are not willing to attend, since invitations are typically sent out without considering the real interests, abilities, and social habits of each invitee.

The popularity of human-carried/human-affiliated devices with sensing abilities, such as smartphones and smart wearable devices, has opened up a large source of sensory data and enabled many novel, sophisticated applications. For example, smartwatches are usually equipped with an array of sensors such as compasses, proximity sensors, accelerometers, gyroscopes, altimeters, barometers, and GPS [9]. These can be used to collect various data, such as location, route, distance, pace/speed, duration, and elevation changes, for the different activities attended by the owner. By analyzing these personal data with state-of-the-art mining or learning algorithms, the habits of the device owners, including their preferred activities, schedules, and locations, can be derived. This habit information can, in turn, help the owners find group activities that suit them.

Based on this observation, Ai et al. [10] first proposed an efficient and personalized group activity organizing framework that learns from historical data gathered by smart devices and carefully chooses invitees for an activity. However, they did not consider the risk of leaking participants’ sensitive information, such as habits, age, and gender. Later, Tong et al. [8] designed a private group activity organizing framework that adopts differential privacy to secure participants’ personal information. Tong et al. [8] considered a practical scenario involving three parties: an untrustworthy activity organizer app, current app users, and potential users. After registering on this app, users can either organize activities by submitting a request to the server or receive invitations from the app server. Users can add each other as friends. To receive more interesting invitations, app users need to divulge personal information such as age, gender, locational preferences, and historical data from their wearable devices. In particular, Tong et al. [8] assumed that the activity organizer app is untrustworthy, mainly because the app developers are motivated by advertising revenue and therefore attempt to attract more users by releasing useful information about current users. The main contribution of Tong et al.’s work [8] is to protect existing users’ privacy while satisfying all three parties involved. Its primary drawback, however, is that it allows the entire social network to be released to the public after only naive sanitization, such as removing user IDs. This may leave users open to privacy risks, especially reidentification attacks [11, 12].

In our work, based on the same three-party scenario assumed by Tong et al. [8], we designed a new group activity organizing framework with a stronger privacy guarantee and a more efficient invitee-selection algorithm. More precisely, our contributions can be summarized as follows.

(1) A novel definition of friendship: we introduced a more flexible definition of friendship between a pair of users, which asks a user “Who do you like doing activities with?” instead of “Who is your friend?” In previous works [8, 10], friendship is defined as mutual. However, a person could enjoy doing activities with another person without the other person reciprocating the feeling. This makes sense for an event invitation framework, since its purpose is to keep track not of actual mutual friendships but of which users enjoy doing activities with whom. A more accurate term for this relationship would be “preferred friend” or “directed friend.” Such a “directed” friendship notion brings several benefits. First, such friendships can be described easily by a directed graph $G = (V, A)$, where the vertex set $V$ represents the user set and the arc set $A$ contains the corresponding directed friendships; that is, if $u$ likes doing activities with $v$, then $(u, v) \in A$. Second, other users do not need to accept a friendship request, so two users neither have to communicate directly nor agree mutually on a friendship. This reduces the workload of updating the social network. Last but not least, since friendships are not bidirectional, one user’s report does not reveal information about the remaining users. In other words, such friendships enhance the privacy protection for the users.

(2) Stronger privacy guarantee: we added an efficient algorithm that makes the published graph satisfy a strong privacy guarantee, differential privacy, and thus allows the app server to release the underlying graph of the entire social network without jeopardizing users’ privacy. Differential privacy has several attractive properties, such as requiring no computational/informational assumptions about attackers, data-type agnosticity, and composability [13]. Since the app server is untrustworthy, structure information must be hidden before it is uploaded to the server. We applied the Randomized Response Technique (RRT) [14] to all vertices (or users); that is, each user’s friendships are perturbed before being reported to the server. For example, a user reports the true (fake, resp.) relationship with probability $1 - p$ ($p$, resp.), where the parameter $p$ is usually a small number. Such a randomized response strategy hides the existence of a connection from one user to another in the output graph while keeping the distortion of the graph low and preserving most of the useful information about the graph. To the best of our knowledge, this is the first work to apply this technique to a directed graph in the presence of an untrusted server.

(3) A more efficient invitation-sending mechanism: to select appropriate candidates as invitees, Ai et al. [10] proposed a greedy algorithm, K-CORE, based on (undirected) k-core graph theory. We designed a novel greedy algorithm, named advanced k-core (ADV-K-CORE), that improves on K-CORE. The K-CORE algorithm starts with the original graph and an initial value of $k$, and then iteratively deletes all vertices with degree less than $k$ in the current graph. As $k$ gradually increases, the algorithm terminates when the size of the remaining graph reaches a lower bound. Our ADV-K-CORE deletes vertices more carefully by assigning higher priority to the vertex with the least impact on other vertices.

(4) Experimental validation: to evaluate the performance of our activity invitation framework, we simulated an outdoor activity invitation system, where at most 1,000 users are created with different profiles, including age, gender, free time schedules, activity types, activity levels, and locational ranges. Then, at most 5,000 different activity events are generated, each of which requires a specific age range, time range, activity type, activity level, and location. Our experiments show that the privacy-preserving algorithm protects the structure of the social network effectively and that ADV-K-CORE substantially improves on the original K-CORE algorithm.

The rest of the paper is organized as follows. Section 2 reviews related works; the proposed framework is introduced in Section 3; Section 4 shows the simulation results; and Section 5 concludes our paper.

2. Related Work

Organizing group activities via social media, such as Facebook, Twitter, Plancast, Meetup, Yahoo! Upcoming, and Eventbrite, is quite popular in the era of the “Internet of Everything.” However, most of these platforms offer only rudimentary functions for organizing group activities [8]. Take Facebook as an example: it allows users to create public or private events, but the organizer can only choose to send invitations one-by-one or to everyone.

Among the extensive literature on social networks, two works are most closely related to ours. Ai et al. [10] first proposed designing a social event invitation framework based on historical data from smart devices. They also presented two greedy invitation-disseminating algorithms. Their framework, however, is impractical, as it assumes the existence of a trusted and altruistic server. Besides, few privacy protection approaches were applied to guarantee the security or confidentiality of users’ personal information. Recently, Tong et al. [8] considered a more realistic scenario in which the server is selfish and possibly untrustworthy. They concentrated on the privacy issue, so that existing users are sufficiently protected while all involved parties are satisfied simultaneously. Nevertheless, Tong et al. [8] only protected personal data such as age, gender, free time schedules, activity types, activity levels, and locational ranges, while leaving the underlying graph structure of the entire social network open to privacy risks, especially reidentification attacks [11, 12].

Differential privacy [14–18] is a rigorous, provable privacy model that provides a very strong privacy guarantee. It quantifies the extent to which individuals’ privacy in a data set is preserved while maintaining the usefulness of the data set. Differential privacy has proven extremely successful since its inception. The most popular differentially private mechanisms include the Laplace mechanism [14], the exponential mechanism [19], the geometric mechanism [20], and the Gaussian mechanism [17, 21].

The problem of graph publication under differential privacy has been well investigated. Generally speaking, there are two main techniques: direct publication and model-based publication. In direct publication, the output graph is constructed by directly adding noise to each edge or vertex, followed by a postprocessing step (for instance, rounding). For example, given an undirected graph and assuming edges are independent, a trivial Laplace mechanism adds Laplace noise to each cell of the adjacency matrix and then rounds each cell to 1 or 0. However, such an approach may severely deteriorate the graph structure. Two recent differentially private algorithms in this category for undirected graph publication are TmF [22] and EdgeFlip [23]. Algorithms for model-based publication inject noise into intermediary quantities or structures, such as the graph spectrum, instead of directly into the original graph; the output graph is then regenerated from these noisy intermediary structures. Popular algorithms in this category include the 1K-series and 2K-series [24, 25], the Kronecker graph model [13], graph spectral analysis [26], DER [27], HRG-MCMC [28], and ERGM [29]. Most existing privacy-preserving algorithms for graph publication assume that the graph is undirected and published by a trusted and altruistic server.
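The trivial direct-publication baseline mentioned above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not TmF, EdgeFlip, or any other cited algorithm: the function name `naive_laplace_graph` is hypothetical, only the upper triangle of the symmetric 0/1 adjacency matrix is perturbed (so adding or removing one undirected edge changes a single cell, which gives sensitivity 1 and noise scale $1/\varepsilon$), the rounding threshold 0.5 is our choice, and Laplace noise is drawn as the difference of two i.i.d. exponential variables because Python's `random` module has no Laplace sampler.

```python
import random
from typing import List


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Lap(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def naive_laplace_graph(adj: List[List[int]], epsilon: float, seed: int = 0) -> List[List[int]]:
    """Direct-publication baseline for an undirected graph (illustrative sketch).

    Each upper-triangle cell of the 0/1 adjacency matrix receives independent
    Laplace noise with scale 1/epsilon, is rounded to 1 if the noisy value is
    at least 0.5 and to 0 otherwise, and is mirrored to keep the matrix symmetric.
    """
    rng = random.Random(seed)
    n = len(adj)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            noisy = adj[i][j] + laplace_noise(1.0 / epsilon, rng)
            bit = 1 if noisy >= 0.5 else 0
            out[i][j] = out[j][i] = bit
    return out


if __name__ == "__main__":
    triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # hypothetical 3-user graph
    print(naive_laplace_graph(triangle, epsilon=0.5))
```

Rounding is post-processing, so it does not weaken the privacy guarantee; but for small $\varepsilon$ the noise scale is large enough that many cells flip, which illustrates why this approach may severely deteriorate the graph structure.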

3. Privacy-Enhanced Activity Invitation Framework

In this section, we introduce our novel privacy-enhanced activity invitation framework (refer to Figure 1). Following Tong et al.’s [8] design, our framework also involves three parties: a central server controlled by the app developers, the existing app users, and potential new members. Compared with Tong et al.’s [8] framework, our framework enhances the users’ privacy by defining a “directed” friendship and protecting the underlying graph structure of the social network under the differential privacy model. Furthermore, our framework employs a novel and significantly more effective invitation-disseminating algorithm.

As introduced, we make the realistic assumption that the server is untrustworthy, given that it is motivated by advertising to its existing users and by profit. To bolster its income, the server will strive to provide quality services to retain current members and will also try to entice new users by releasing statistical information about current users and by providing online querying services. As a result, existing users and new registrants may have trouble deciding whether to report their personal information honestly, including age, gender, and “Who I like doing activities with.” On the one hand, the server will learn users’ habits more accurately if users provide candid information, which in turn leads to better services. On the other hand, users may worry about the possibility of having their personal information leaked.

The following describes how our design works in detail. Once a person registers on the app, the server creates and maintains a profile for him/her until he/she deletes the account. If the user owns a smart wearable device, the front-end app will seek authorization to access his/her historical data, which contain records of activities. Otherwise, users need to fill in their profiles manually based on their understanding and estimation of their abilities. Whenever a user needs to update or report personal information to the server, the front-end, user-side app automatically obfuscates the information before it is transferred to the server, so that the information is protected by differential privacy. If a user wants to organize an activity, a request is first sent to the server. The server then analyzes users’ historical data and estimates each user’s ability or level for each type of activity; the times at which he/she is routinely free; and a locational range, indicating the rough area within which he/she is willing or able to travel to participate in the activity. Based on these estimated habits of existing users, the server disseminates invitations to appropriate candidates via the ADV-K-CORE algorithm, such that all invitees meet the group activity requirements and have a high chance of attending. Since any privacy-preserving algorithm that satisfies differential privacy protects each individual’s information regardless of the adversary’s background information [13], the server can safely release statistical information about the current users to the public.

Our framework needs to keep track not of actual mutual friendships but of which users enjoy doing activities with whom. To depict such relationships among the users, we first define the concept of directed friendship and then use a directed graph to model the entire social network.

Definition 1 (directed friendship). For any two users A and B, if A likes attending activities together with B, one says B is A’s directed friend.

While traditional friendship is a symmetric relation, our definition implies an asymmetric relation between users. Let $G = (V, A)$ represent the underlying directed graph, where a vertex $v \in V$ denotes a user. An arc $(u, v) \in A$ from $u$ to $v$ means that user $u$ likes attending activities together with $v$. Such a definition allows each user to update his/her neighbors independently, which not only reduces workload but also enhances the privacy guarantee for the users.

3.1. Graph Publication via Differential Privacy
3.1.1. Preliminary

Differential privacy [14, 16, 17] is a privacy model that offers strong privacy guarantees under the assumption of a powerful adversary. In particular, the adversary may have nearly unlimited background knowledge. The model works by injecting artificial noise into the disclosed data set such that no one can tell whether an entry in the data set has been changed. At the same time, differential privacy ensures that the released information is still useful. More precisely, given two data sets that differ in only one entry, the probability distributions of the outputs of a statistical analysis on the two data sets should be nearly identical.

Let $D_1$ and $D_2$ be two data sets. The distance between them, denoted $d(D_1, D_2)$, is the minimum number of sample changes required to change $D_1$ into $D_2$. If $d(D_1, D_2) \le 1$, that is, if $D_1$ and $D_2$ differ by at most one entry, then we say that $D_1$ and $D_2$ are neighbors.

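Definition 2 (edge-neighboring graphs). One says two directed graphs $G_1 = (V_1, A_1)$ and $G_2 = (V_2, A_2)$ are edge-neighboring graphs if $V_1 = V_2$ and $|A_1 \,\triangle\, A_2| = 1$, that is, the two graphs have the same users and differ in exactly one arc.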

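Definition 3 (vertex-neighboring graphs). One says two directed graphs $G_1 = (V_1, A_1)$ and $G_2 = (V_2, A_2)$ are vertex-neighboring graphs if, for some vertex $v \notin V_2$, $V_1 = V_2 \cup \{v\}$ and $A_1 = A_2 \cup A(v)$ (or the same holds with the roles of $G_1$ and $G_2$ exchanged). Here $A(v)$ denotes the set of incoming and outgoing arcs incident on $v$.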

A query $f$ is a function whose domain is the collection of data sets. The output of the query on a data set $D$ is usually denoted as $f(D)$. The global sensitivity of the query $f$ is defined as
$$\Delta f = \max_{D_1, D_2:\ d(D_1, D_2) \le 1} \lVert f(D_1) - f(D_2) \rVert, \qquad (1)$$
where $\lVert \cdot \rVert$ is a norm function. Our proposed framework aims to hide the true friendship information of each user against queries like “How many neighbors does a user have?” It is not difficult to check that the sensitivity of such a query is 1 under the edge-neighboring notion, whereas under the vertex-neighboring notion it grows with the number of users in the worst case (e.g., when the added or removed user is connected to every other user). We adopt the edge-neighboring notion in our work for the sake of low sensitivity.
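To make the edge-neighboring sensitivity claim concrete, the sketch below brute-forces it for the out-degree query (the number of directed friends of every user) under the L1 norm. It enumerates every edge-neighboring graph of one given graph by flipping a single ordered pair, so it checks the maximum change around that particular graph rather than over all graphs; for this query the two coincide, because flipping one arc $(u, v)$ changes only $u$'s out-degree, by exactly 1. The function names and the example graph are hypothetical.

```python
from itertools import permutations
from typing import Hashable, List, Sequence, Set, Tuple

Arc = Tuple[Hashable, Hashable]


def out_degrees(nodes: Sequence[Hashable], arcs: Set[Arc]) -> List[int]:
    """Query f: the number of directed friends (out-degree) of every user."""
    return [sum(1 for (u, _) in arcs if u == v) for v in nodes]


def edge_neighbor_sensitivity(nodes: Sequence[Hashable], arcs: Set[Arc]) -> int:
    """Max L1 change of f over all graphs differing from (nodes, arcs) in one arc."""
    base = out_degrees(nodes, arcs)
    worst = 0
    for pair in permutations(nodes, 2):   # every possible arc (u, v) with u != v
        flipped = set(arcs) ^ {pair}      # add the arc if absent, remove it if present
        changed = out_degrees(nodes, flipped)
        worst = max(worst, sum(abs(a - b) for a, b in zip(base, changed)))
    return worst


if __name__ == "__main__":
    users = ["a", "b", "c", "d"]
    likes = {("a", "b"), ("b", "a"), ("c", "a"), ("d", "c")}
    print(edge_neighbor_sensitivity(users, likes))  # prints 1
```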

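Definition 4 ($\varepsilon$-differential privacy [14, 30]). A mechanism or randomized function $\mathcal{M}$ provides $\varepsilon$-differential privacy if and only if, for all pairs of neighboring data sets $D_1$ and $D_2$ and all subsets $S \subseteq \operatorname{Range}(\mathcal{M})$, it holds that
$$\Pr[\mathcal{M}(D_1) \in S] \le e^{\varepsilon} \Pr[\mathcal{M}(D_2) \in S]. \qquad (2)$$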

The parameter $\varepsilon$, called the privacy budget, controls the level of privacy. Usually, $\varepsilon$ is chosen to be small. Intuitively speaking, $\varepsilon$ bounds how much the output distribution can change when the mechanism is applied to a data set instead of any one of its neighbors. From inequality (2), $\Pr[\mathcal{M}(D_1) \in S]$ and $\Pr[\mathcal{M}(D_2) \in S]$ become closer as $\varepsilon$ decreases, which makes the neighboring data sets harder to distinguish and therefore indicates a stronger privacy guarantee.

The Laplace mechanism [17] and the exponential mechanism [19] are two of the most popular $\varepsilon$-differentially private mechanisms. Generally speaking, the Laplace mechanism is used when the output is numerical, whereas the exponential mechanism is applied to nonnumerical outputs. In particular, the exponential mechanism is better suited to situations where we need to select the “optimal” response but adding noise directly to the output could completely destroy its value.

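Definition 5 (Laplace mechanism [17]). Given a query $f$ with $k$-dimensional output, the Laplace mechanism is defined as
$$\mathcal{M}_L(D) = f(D) + (Y_1, \ldots, Y_k),$$
where $Y_1, \ldots, Y_k$ are i.i.d. (independent and identically distributed) random variables drawn from $\operatorname{Lap}(\Delta f / \varepsilon)$. Here, $\operatorname{Lap}(b)$ denotes a Laplace distribution (centered at 0) with scale $b$, whose probability density function is
$$\operatorname{Lap}(x \mid b) = \frac{1}{2b} \exp\!\left(-\frac{|x|}{b}\right).$$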
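A minimal Python sketch of Definition 5 follows, assuming the query answer is a list of numbers and the caller supplies the sensitivity $\Delta f$. The names `sample_laplace` and `laplace_mechanism` are hypothetical; noise is drawn as the difference of two i.i.d. exponentials with mean $b$, which is distributed as $\operatorname{Lap}(b)$.

```python
import random
from typing import List, Optional, Sequence


def sample_laplace(scale: float, rng: random.Random) -> float:
    """Lap(0, scale): difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(answer: Sequence[float], sensitivity: float, epsilon: float,
                      rng: Optional[random.Random] = None) -> List[float]:
    """Return f(D) + (Y_1, ..., Y_k) with Y_i i.i.d. ~ Lap(sensitivity / epsilon)."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon
    return [a + sample_laplace(scale, rng) for a in answer]


if __name__ == "__main__":
    # Hypothetical out-degree vector of three users; sensitivity 1 (edge-neighboring).
    print(laplace_mechanism([3, 1, 2], sensitivity=1.0, epsilon=0.5, rng=random.Random(7)))
```

Smaller $\varepsilon$ gives a larger scale $\Delta f/\varepsilon$ and thus noisier answers, matching the privacy/utility trade-off discussed after Definition 4.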

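Definition 6 (exponential mechanism [19]). The exponential mechanism selects and outputs an element $r \in \mathcal{R}$ with probability proportional to $\exp\!\left(\frac{\varepsilon\, u(D, r)}{2 \Delta u}\right)$, where $u$ is a utility function that maps data set/output pairs to utility scores, and the sensitivity of $u$ is defined as
$$\Delta u = \max_{r \in \mathcal{R}} \ \max_{D_1, D_2:\ d(D_1, D_2) \le 1} |u(D_1, r) - u(D_2, r)|.$$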
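Definition 6 can be sketched in Python as below. The function name is hypothetical, and the usage example (releasing the most popular activity type, with utility equal to the number of users reporting it, so $\Delta u = 1$) is our own illustration rather than a query from the framework. Subtracting the largest score before exponentiating leaves the selection probabilities unchanged but avoids floating-point overflow.

```python
import math
import random
from typing import Any, Callable, Optional, Sequence, TypeVar

R = TypeVar("R")


def exponential_mechanism(data: Any, candidates: Sequence[R],
                          utility: Callable[[Any, R], float],
                          sensitivity: float, epsilon: float,
                          rng: Optional[random.Random] = None) -> R:
    """Return r with probability proportional to exp(eps * u(D, r) / (2 * sensitivity))."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if rng is None:
        rng = random.Random()
    scores = [epsilon * utility(data, r) / (2.0 * sensitivity) for r in candidates]
    best = max(scores)
    # Shifting every exponent by the same constant keeps the distribution
    # unchanged but keeps math.exp from overflowing for large epsilon.
    weights = [math.exp(s - best) for s in scores]
    return rng.choices(list(candidates), weights=weights, k=1)[0]


if __name__ == "__main__":
    reports = ["hiking", "hiking", "running", "hiking", "cycling"]  # hypothetical data

    def popularity(data, activity):
        return sum(1 for x in data if x == activity)

    # One user changing a report moves any count by at most 1, so sensitivity = 1.
    print(exponential_mechanism(reports, ["hiking", "running", "cycling"], popularity, 1.0, 0.5))
```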

3.1.2. Our Differential Privacy Mechanism

There are two main types of noise injection strategies: output perturbation and input perturbation. That is, $\varepsilon$-differentially private mechanisms are usually designed either by perturbing the output of the query or by adding noise to the input data set. Output perturbation requires a trusted server to hold the authentic data sets, whereas input perturbation is more flexible because the data can be perturbed before being transferred to the server. Our framework assumes an untrustworthy server, and therefore we adopt an input perturbation strategy.

Both the Laplace and exponential mechanisms mentioned in Section 3.1.1 can be modified to perturb the input rather than the output. These two mechanisms can be applied to obfuscate different types of users’ raw data, such as age, activity types, or activity ranges [8]. Since our work concentrates on the protection of users’ friendships, we add a novel privacy-preserving mechanism, named PERT, in the randomized response manner (refer to Algorithm 1). More precisely, each user reports his/her real friendship information with probability $1 - p$, where $\frac{1}{1 + e^{\varepsilon}} \le p \le \frac{1}{2}$. The larger $p$ is, the more arcs in the graph are randomized.

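Algorithm 1: PERT
Input: A directed graph $G = (V, A)$ and a flip probability $p$
Output: A perturbed graph $G' = (V, A')$
(1) Let $A' = \emptyset$
(2) for each ordered pair $(u, v) \in V \times V$ with $u \neq v$ do
(3)   if $(u, v) \in A$ then
(4)     Add $(u, v)$ to $A'$ with probability $1 - p$
(5)   else
(6)     Add $(u, v)$ to $A'$ with probability $p$
(7) return the resultant graph $G' = (V, A')$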
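A minimal Python sketch of Algorithm 1 follows, assuming users are hashable IDs and arcs are ordered pairs. The helper `min_flip_probability` (a hypothetical name) returns $1/(1+e^{\varepsilon})$, the smallest flip probability for which $(1-p)/p \le e^{\varepsilon}$; whether the authors use exactly this value is an assumption. To reflect the untrusted-server setting, the per-pair loop lives in `perturb_user_report`, which each user would run on his/her own device over the outgoing pairs $(u, \cdot)$; `pert` simply calls it for every user.

```python
import math
import random
from typing import Hashable, Iterable, List, Optional, Set, Tuple

Arc = Tuple[Hashable, Hashable]


def min_flip_probability(epsilon: float) -> float:
    """Smallest p with (1 - p) / p <= e^epsilon, namely p = 1 / (1 + e^epsilon)."""
    return 1.0 / (1.0 + math.exp(epsilon))


def perturb_user_report(user: Hashable, friends: Set[Hashable], users: List[Hashable],
                        p: float, rng: random.Random) -> Set[Hashable]:
    """Runs on the user's device: randomized response over the user's outgoing pairs."""
    reported = set()
    for other in users:
        if other == user:
            continue                      # no self-loops
        is_friend = other in friends
        flipped = rng.random() < p        # flip this pair's value with probability p
        if is_friend != flipped:          # keep a real arc (prob 1 - p) or add a fake one (prob p)
            reported.add(other)
    return reported


def pert(users: Iterable[Hashable], arcs: Set[Arc], p: float,
         rng: Optional[random.Random] = None) -> Set[Arc]:
    """PERT-style sketch: every ordered pair keeps its value w.p. 1 - p, flips w.p. p."""
    if not 0.0 <= p <= 0.5:
        raise ValueError("p must lie in [0, 1/2]")
    if rng is None:
        rng = random.Random()
    users = list(users)
    perturbed: Set[Arc] = set()
    for u in users:
        friends = {v for (x, v) in arcs if x == u}
        for v in perturb_user_report(u, friends, users, p, rng):
            perturbed.add((u, v))
    return perturbed


if __name__ == "__main__":
    people = ["alice", "bob", "carol", "dave"]  # hypothetical users
    likes = {("alice", "bob"), ("bob", "carol"), ("carol", "alice")}
    p = min_flip_probability(2.0)
    print(p, pert(people, likes, p, random.Random(42)))
```

Because each user perturbs only his/her own outgoing pairs, the server never sees a true arc, which is the input-perturbation property the framework relies on.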

Theorem 7. Our graph perturbation algorithm PERT guarantees $\varepsilon$-differential privacy.

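Proof. Suppose $G_1 = (V, A_1)$ and $G_2 = (V, A_2)$ are two edge-neighboring graphs, and assume that $A_1 = A_2 \cup \{(u, v)\}$. Let $G_1'$ and $G_2'$ represent the perturbed versions of $G_1$ and $G_2$, respectively. Note that $V$ represents the same set of users in both graphs. Since PERT treats every ordered pair independently, all pairs other than $(u, v)$ contribute identical factors to $\Pr[G_1' = G']$ and $\Pr[G_2' = G']$ for any output graph $G' = (V, A')$; hence the ratio of these probabilities is determined by the value assigned to the differing arc $(u, v)$. According to PERT, an arc in the input graph maintains its original value with probability $1 - p$ and flips its value with probability $p$. For any $G'$, depending on whether $(u, v) \in A'$, we have
$$\frac{\Pr[G_1' = G']}{\Pr[G_2' = G']} \in \left\{ \frac{1-p}{p},\ \frac{p}{1-p} \right\}, \qquad \text{so} \qquad \frac{\Pr[G_1' = G']}{\Pr[G_2' = G']} \le \frac{1-p}{p} \le e^{\varepsilon},$$
where the last inequality is due to the value of $p$ (namely $\frac{1}{1+e^{\varepsilon}} \le p \le \frac{1}{2}$). Summing over all $G'$ in any set $S$ of output graphs gives $\Pr[G_1' \in S] \le e^{\varepsilon} \Pr[G_2' \in S]$, and the same argument applies with the roles of $G_1$ and $G_2$ exchanged. This proves the theorem according to the definition of $\varepsilon$-differential privacy.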
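The inequality in the proof can be checked exhaustively on a tiny example. The Python sketch below (hypothetical names, 3 users, so $2^6 = 64$ possible outputs) computes the exact probability that PERT maps each of two edge-neighboring graphs to every possible output and reports the largest ratio. It assumes $p = 1/(1+e^{\varepsilon})$, the smallest flip probability allowed by the bound; for that choice the maximum ratio should equal $e^{\varepsilon}$ up to rounding.

```python
import math
from itertools import permutations, product
from typing import Hashable, List, Sequence, Set, Tuple

Arc = Tuple[Hashable, Hashable]


def pert_probability(true_arcs: Set[Arc], output_arcs: Set[Arc],
                     pairs: List[Arc], p: float) -> float:
    """Exact Pr[PERT(G) = G']: each pair independently kept w.p. 1 - p, flipped w.p. p."""
    prob = 1.0
    for pair in pairs:
        same = (pair in true_arcs) == (pair in output_arcs)
        prob *= (1.0 - p) if same else p
    return prob


def max_privacy_ratio(nodes: Sequence[Hashable], arcs_1: Set[Arc],
                      arcs_2: Set[Arc], p: float) -> float:
    """Largest Pr[PERT(G1) = G'] / Pr[PERT(G2) = G'] over every possible output G'."""
    pairs = list(permutations(nodes, 2))
    worst = 0.0
    for bits in product([0, 1], repeat=len(pairs)):
        output = {pair for pair, bit in zip(pairs, bits) if bit}
        ratio = (pert_probability(arcs_1, output, pairs, p)
                 / pert_probability(arcs_2, output, pairs, p))
        worst = max(worst, ratio)
    return worst


if __name__ == "__main__":
    epsilon = 1.0
    p = 1.0 / (1.0 + math.exp(epsilon))
    g1 = {("a", "b"), ("b", "c")}
    g2 = g1 | {("c", "a")}  # edge-neighboring: exactly one extra arc
    print(max_privacy_ratio(["a", "b", "c"], g1, g2, p), math.exp(epsilon))
```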

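Theorem 8 (composition theorem [30]). Let $\mathcal{M}_1, \ldots, \mathcal{M}_k$ be algorithms such that $\mathcal{M}_i$ is $\varepsilon_i$-differentially private, and suppose $\mathcal{M}$ is the combination of these algorithms. (i) If all $\mathcal{M}_i$ are defined on the same data set, then $\mathcal{M}$ is $\left(\sum_{i=1}^{k} \varepsilon_i\right)$-differentially private. (ii) If all $\mathcal{M}_i$ are defined on different (disjoint) data sets, then $\mathcal{M}$ is $\left(\max_i \varepsilon_i\right)$-differentially private.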

According to the composition theorem, combining several differentially private algorithms results in a new differentially private algorithm, at the cost of a privacy budget that increases linearly in the worst case. For each user, his/her profile can be described by a tuple, where each dimension represents one type of data. If noise is injected into the different data fields with different $\varepsilon_i$-differentially private algorithms $\mathcal{M}_i$, then $\left(\max_i \varepsilon_i\right)$-differential privacy is guaranteed if the data fields are independent; otherwise, $\left(\sum_i \varepsilon_i\right)$-differential privacy is guaranteed.
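As a worked example with hypothetical budgets (not values used in this paper), suppose a profile has three fields, age, gender, and friendships, perturbed with $\varepsilon_1 = 0.5$, $\varepsilon_2 = 0.3$, and $\varepsilon_3 = 0.2$, respectively. The two cases above then give:

```latex
\begin{aligned}
\text{correlated fields (case (i))}: &\quad \varepsilon = \sum_{i=1}^{3} \varepsilon_i = 0.5 + 0.3 + 0.2 = 1.0,\\
\text{independent fields (case (ii))}: &\quad \varepsilon = \max_{i} \varepsilon_i = \max\{0.5,\ 0.3,\ 0.2\} = 0.5.
\end{aligned}
```

In this example the independence assumption halves the overall budget, which is why the assumption that data fields are independent matters for the guarantee.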

3.2. Improved k-Core Algorithm

The server’s main job is to select invitees in response to a user’s request to organize a group activity. Following Ai et al. [10] and Tong et al. [8], we assume that having friends attend an activity improves participants’ overall experience. Therefore, the server needs to ensure that, for each invitee, a number of his/her friends are also invited.

We adopt the concept of a k-core graph to model a qualified social network in which each user has at least $k$ friends. Suppose $G_s$ is a subgraph of $G$ such that the users in $G_s$ satisfy all the requirements for an activity. Let $V_s$ and $A_s$ denote the vertex and arc sets of $G_s$, respectively. We say $G_s$ is a $k$-core graph if each vertex $v \in V_s$ has at least $k$ directed friends in $G_s$. Let $N_{G_s}(v)$ be the set of neighbors of $v$ in $G_s$, and let $|N_{G_s}(v)|$ denote its cardinality, that is, the degree of $v$ in $G_s$. Suppose a group activity has a limited capacity $c$, and $r$ is the statistical response rate for similar past activities. The task then becomes choosing about $c/r$ invitees, so that roughly $c$ of them are expected to accept, such that each invitee also has $k$ friends invited.

Ai et al. [10] presented a greedy invitee-selection algorithm, K-CORE. The K-CORE algorithm starts with the original graph and an initial value of $k$, and then iteratively deletes the vertices with degree less than $k$ in the current graph; among such vertices, the highest priority is assigned to the vertex with the minimum degree. As $k$ gradually increases, the algorithm terminates when the size of the remaining graph equals the target group size $n$. We propose an improved k-core algorithm, denoted ADV-K-CORE (refer to Algorithm 2). ADV-K-CORE works very similarly to K-CORE except for the vertex deletion step: we scan through the whole graph and find the vertex whose deletion has the least impact on the other vertices, measured by the number of vertices whose degree falls below the current $k$ after the deletion.
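A minimal Python sketch of the baseline K-CORE procedure, as described in the paragraph above, follows. It is our reading of that description, not Ai et al.’s code, and it assumes that the degree of a user is the number of his/her directed friends still in the pool (out-degree), that vertices are deleted one at a time so the loop stops exactly at the group size `n`, that the caller supplies the starting `k` (the default 0 is our choice), and that ties are broken by `str(v)` so runs are reproducible. All names are hypothetical.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # user -> set of directed friends


def out_degree(graph: Graph, alive: Set[Hashable], v: Hashable) -> int:
    """Number of v's directed friends that are still in the candidate pool."""
    return sum(1 for w in graph[v] if w in alive)


def k_core_select(graph: Graph, n: int, k: int = 0) -> Set[Hashable]:
    """Baseline K-CORE-style selection: repeatedly drop a minimum-degree vertex below k."""
    alive = set(graph)
    while len(alive) > n:
        low = [v for v in alive if out_degree(graph, alive, v) < k]
        if low:
            # Highest priority: the vertex with the minimum degree.
            victim = min(low, key=lambda v: (out_degree(graph, alive, v), str(v)))
            alive.remove(victim)
        else:
            k += 1  # every remaining vertex has degree >= k, so tighten the core
    return alive


if __name__ == "__main__":
    friends = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0}, 5: {0, 4}}
    print(sorted(k_core_select(friends, n=4)))  # [0, 1, 2, 3]
```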

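Algorithm 2: ADV-K-CORE
Input: A directed $k$-core graph $G = (V, A)$ and a group size $n$
Output: A list of invitees
(1) Let $G' = (V', A') = G$
(2) Let $k$ be the core value of the input graph
(3) while $|V'| > n$ do
(4)   if $\exists\, v \in V'$ such that $|N_{G'}(v)| < k$ then
(5)     Pick the $v \in V'$ whose deletion results in the minimum number of vertices with degree $< k$
(6)     Delete $v$ from $G'$
(7)   else
(8)     $k = k + 1$
(9) return the remaining list $V'$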
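A minimal Python sketch of Algorithm 2 follows, under the same assumptions as a plain reading of the listing: degree means the number of directed friends still in the pool, the caller supplies the core value `k` of the input graph, and ties are broken by `str(v)`. One interpretation is ours: candidates for deletion are restricted to the vertices whose degree is below `k` (the listing’s step (4) condition), and among them we pick the one whose removal leaves the fewest vertices below `k`. Function names are hypothetical, and the impact computation is deliberately naive rather than optimized.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # user -> set of directed friends


def out_degree(graph: Graph, alive: Set[Hashable], v: Hashable) -> int:
    """Number of v's directed friends still in the candidate pool."""
    return sum(1 for w in graph[v] if w in alive)


def deletion_impact(graph: Graph, alive: Set[Hashable], v: Hashable, k: int) -> int:
    """How many vertices would have degree < k if v were deleted from the pool."""
    rest = alive - {v}
    return sum(1 for w in rest if out_degree(graph, rest, w) < k)


def adv_k_core(graph: Graph, n: int, k: int) -> Set[Hashable]:
    """ADV-K-CORE-style selection; k is the core value of the input graph."""
    alive = set(graph)
    while len(alive) > n:
        low = [v for v in alive if out_degree(graph, alive, v) < k]
        if low:
            # Delete the low-degree vertex whose removal leaves the fewest
            # vertices below k (ties broken by str(v) for determinism).
            victim = min(low, key=lambda v: (deletion_impact(graph, alive, v, k), str(v)))
            alive.remove(victim)
        else:
            k += 1
    return alive


if __name__ == "__main__":
    friends = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0}, 5: {0, 4}}
    print(sorted(adv_k_core(friends, n=4, k=1)))  # [0, 1, 2, 3]
```

Compared with the minimum-degree rule, each deletion here costs an extra scan of the pool, trading running time for a more careful choice of which vertex to drop.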

4. Experiments

Two experiments were designed to evaluate the performance of our activity invitation framework. In these experiments, an outdoor activity invitation system is simulated, where at most 1,000 users are created with different profiles, including friendships, age, gender, free time schedules, activity types, activity levels, and locational ranges. Then, at most 5,000 different activity events are generated, each of which requires a specific age range, time range, activity type, activity level, and location. As previously mentioned, each participant must satisfy all of the event’s requirements. A random response rate $r_u \in [0, 1]$ is generated uniformly in advance for each user $u$. When user $u$ receives an invitation, another uniform random number $x$ is generated; if $x < r_u$, he/she accepts the invitation; otherwise, there is no response. All experiments were implemented in Java and conducted under OS X El Capitan on a machine with a 3.5 GHz Intel Core i5 processor and 16 GB of 1600 MHz DDR3 memory.
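The response model of the simulation can be sketched as follows. The experiments themselves were implemented in Java; this Python version is only an illustration, and the function names are hypothetical. With the strict comparison $x < r_u$ and $x$ drawn from $[0, 1)$, a user accepts each invitation with probability exactly $r_u$.

```python
import random
from typing import Dict, Hashable, Iterable, List


def assign_response_rates(users: Iterable[Hashable], rng: random.Random) -> Dict[Hashable, float]:
    """Draw each user's response rate r_u uniformly from [0, 1), once, in advance."""
    return {u: rng.uniform(0.0, 1.0) for u in users}


def send_invitations(invitees: Iterable[Hashable], rates: Dict[Hashable, float],
                     rng: random.Random) -> List[Hashable]:
    """Each invitee draws a fresh x ~ U[0, 1) and accepts iff x < r_u; otherwise no response."""
    return [u for u in invitees if rng.random() < rates[u]]


if __name__ == "__main__":
    rng = random.Random(2016)
    rates = assign_response_rates(range(1000), rng)        # 1,000 simulated users
    accepted = send_invitations(range(50), rates, rng)     # hypothetical invitee list
    print(len(accepted), "of 50 invitees accepted")
```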

4.1. Experiment 1

As shown in Section 3.1, users’ sensitive information is theoretically protected by our differential privacy algorithms. In particular, the graph structure of the social network is protected by PERT. Since PERT hides users’ friendships by perturbing the arcs, the graph structure can change, which might affect the user experience. For example, a user who originally has 5 friends in the social network may have 0 after PERT is applied, which excludes this user from the invitee pool.

Our first experiment investigates whether existing users receive worse services when they report noisy friendships to the server. For each existing user, we compare the number of accepted invitations in the original graph with the number of accepted invitations in the perturbed graph and denote the ratio of the former to the latter by $\rho$. A deviation $|\rho - 1|$ tending to 0 indicates that our framework can still provide qualified services to existing users even though they report noisy information to the server. For simplicity, we still refer to this quantity as the utility.

In this experiment, we varied the privacy budget $\varepsilon$ over an arithmetic sequence of values, that is, from a lower bound to an upper bound with a constant difference between consecutive terms. To see how the utility behaves as the privacy budget varies, we calculated the average utility for each value of $\varepsilon$. Additionally, we tested four scenarios to investigate the scalability of PERT. More precisely, we invoked PERT to inject noise into outdoor activity invitation systems with the following settings: (i) 200 users and 1,000 invitations; (ii) 200 users and 5,000 invitations; (iii) 500 users and 1,000 invitations; (iv) 500 users and 5,000 invitations.

To calculate the average utility, we used our ADV-K-CORE algorithm to select invitees. The results for the 200-user and 500-user outdoor activity invitation systems are shown in Figures 2 and 3, respectively.

Figures 2 and 3 show that the average utility is relatively small in most cases. This demonstrates that the service quality for existing users is not severely jeopardized even if they report noisy friendships to the server. The results are also consistent across the four settings, which suggests that PERT scales well. As the privacy budget $\varepsilon$ increases, the privacy guarantee becomes weaker according to the definition of differential privacy, resulting in better services for existing users. Consequently, the utility should decrease toward 0 along the $\varepsilon$ axis, which is confirmed by Figures 2 and 3. Indeed, once $\varepsilon$ reaches a moderate value, PERT already achieves a satisfactory utility.

4.2. Experiment 2

Our second experiment studies the efficiency of our invitee-selection algorithm ADV-K-CORE. Suppose $k_1$ and $m_1$ are the value of $k$ and the number of remaining arcs when K-CORE terminates. Similarly, let $k_2$ and $m_2$ be the value of $k$ and the number of remaining arcs when ADV-K-CORE stops. We then define two measures,
$$r_k = \frac{k_1}{k_2}, \qquad r_m = \frac{m_1}{m_2}.$$
Smaller values of $r_k$ and $r_m$ mean more neighbors on average in the resulting graph after applying ADV-K-CORE, compared with the graph obtained by K-CORE. Therefore, they indicate a more closely related invitee pool, which implies that invitees have a higher chance of accepting the invitation.

To study how the two measures change with the graph size, we applied both ADV-K-CORE and K-CORE to graphs of multiple sizes. For each size, a number of graphs of that size were generated, and the average values of the two measures over these graphs were computed, which shows how consistently ADV-K-CORE improves on K-CORE. The results are shown in Figure 4.

From Figure 4, we observe that both measures do not exceed 1 for almost all graph sizes in the experiment, which implies that ADV-K-CORE indeed produces a more closely related invitee pool. Moreover, we find that the measure based on the value of $k$ is always smaller than the measure based on the number of arcs. This indicates that the original K-CORE algorithm produces a less “consistent” adjacency, in the sense that both high-degree and low-degree users can be selected, which in turn results in a smaller value of $k$. In other words, in the graph obtained by ADV-K-CORE, the variance of the numbers of neighbors is relatively smaller. Besides, ADV-K-CORE improves on K-CORE steadily, as the values of both measures remain quite stable as the graph size increases.

5. Conclusion

This paper follows the recent works of Ai et al. [10] and Tong et al. [8]. We presented a private and efficient social activity invitation framework in which the server is assumed to be untrustworthy but can nonetheless help users organize group activities intelligently and efficiently. Our main contributions are (1) a novel definition of friendship that reduces the communication/update cost within the network while simultaneously enhancing data security and user confidence; (2) a strong privacy-preserving algorithm for graph publication, which addresses the concern raised by Tong et al. [8]; and (3) an efficient invitee-selection algorithm. Our simulation results show that the proposed framework performs well. In our current research, we assumed that the data fields are independent of each other and that queries from the adversary are also independent. In the future, we will consider more complicated queries and correlations among data fields.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

Buglass, Chen, and Tong were supported in part by the 2016 Allen E. Paulson College of Engineering & Information Technology Faculty Research Seed Grant (CEIT-FRSG) Award from CEIT, Georgia Southern University. Gao was supported in part by funds from the Office of the Vice President for Research & Economic Development at Georgia Southern University.