Abstract
With the advent of the big data era, people's lives have changed profoundly: data no longer have to be gathered through cumbersome traditional collection methods but can be collected and organized directly from the traces people leave on social networks. This paper explores and analyzes privacy issues in current social networks and proposes strategies, based on data mining algorithms, for protecting users' private data, so that users' privacy in social networks is not illegally infringed in the era of big data. The proposed data mining algorithm protects users' identities from being identified and their private information from being leaked. Applying differential privacy protection in social networks can effectively protect users' private information during data publishing and data mining. It is therefore of great significance to study data publishing and data mining methods based on differential privacy protection and their application in social networks.
1. Introduction
In recent years, with the development of communication technology, social software has made communication between users more convenient, enabled real-time contact, and accelerated the dissemination of information and news. As a result, more and more users register on and use these platforms; their social circles have moved online, and their activities and behaviors on social platforms have accumulated large amounts of data [1]. With the rapid development of database technology, continuous improvements in hardware, and growing demand for information dissemination and sharing, large amounts of useful data can now be stored [2]. Faced with such massive data, data mining and data publishing have become two important research directions in database applications [3]. Data mining aims to extract meaningful rules and models from data, whereas data publishing aims to present the data in an appropriate form [4]. Publishing and mining social network data can expose or compromise the personal sensitive information in the network and the relationships between users, which seriously undermines the security of social networks and creates a high risk of privacy leakage [5]. Therefore, how to publish and mine the massive information in social networks without compromising private information has become an important research topic [6].
Continuously strengthening the security of social networks and improving privacy protection capabilities will help people use social networks more safely [7]. Many privacy protection techniques have been proposed for the privacy and security problems of social network users. The simplest technique hides only users' identity information and leaves all other information unprocessed [8]. Although this protects users' personal privacy to some extent, attackers can still re-identify individuals by using background knowledge about the target user's social relationships, leading to privacy disclosure [9]. Ensuring users' privacy and security when mining social network data is therefore of great significance [10]. A social network recommendation system not only helps users find information that is valuable to them but also presents information to interested users, benefiting both information producers and information consumers [11]. Privacy protection has two main aspects: the protection of sensitive knowledge and the protection of sensitive data. Sensitive knowledge refers to knowledge extracted from the database, such as association rules and classification rules; sensitive data refers to private data that can be linked to an individual and thereby expose that individual [12]. This paper explores and analyzes privacy issues in current social networks and proposes user privacy protection strategies based on data mining algorithms, with the aim that, in the era of big data, the privacy of social network users will no longer be illegally violated [13].
Mobile social networking has become a rapidly growing application among domestic and international mobile users, and protecting user privacy has become urgent [14]. Existing simple data processing methods cannot meet privacy protection needs, and existing laws and regulations have restricted the application and development of data mining technology [15]. If no protection measures are applied to the information, the private information of specific individuals will be exposed, harming the data owners. Likewise, if the protection measures are improper or too simple, ordinary data mining methods can still be used to obtain an individual's private information, resulting in privacy leakage [16]. Privacy protection for social network data applies deliberate modifications to the original network data, such as adding, deleting, or modifying parts of it, so that an attacker cannot obtain users' sensitive information [17]. The data mining algorithm proposed in this paper protects users' identities from being identified and their private information from being leaked. The algorithm decomposes the data, reconstructs the features, and stores the data vertically, which protects the data against security threats without reducing mining accuracy. Only the processed data are released to the public. While protecting users' sensitive information, the processed data must still retain a certain level of usability, which is also an important factor in evaluating data anonymization.
2. Related Work
Literature [18] classifies privacy protection technologies into three categories according to their specific applications: privacy protection based on data perturbation, data encryption, and data anonymization.
Literature [19] put forward the concept of database anonymization and used generalization to hide sensitive attributes within groups of a given size.
Literature [20] proposed a k-degree model for privacy protection of node degrees in social networks, which made it impossible for attackers to identify target nodes by collecting node degrees as background knowledge.
Literature [21] proposed to minimize information loss while generating a k-degree anonymous model.
Literature [22] proposed to construct the k-degree anonymous graph by using the idea of dynamic programming to protect the privacy of social network structure.
Literature [23] observed that the parameter k in many existing k-anonymity models is predefined, even though k determines the level of privacy protection for nodes in social networks. It formally introduced the idea of personalized privacy protection and proposed a k-anonymity model based on personalized privacy protection requirements.
Literature [24] divided the original network into k isomorphic subgraphs, which effectively prevented the node reidentification attack.
Literature [25] constructs a k-anonymity model for path privacy by modifying edges of different types with a greedy strategy.
Literature [26] adds l-diversity to the k-degree model to protect the sensitive attributes of nodes and edge relations in social networks.
In literature [27], the nodes of the original network are clustered to obtain an anonymous network composed of super nodes, and the super nodes are generalized to achieve privacy protection.
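To make the k-degree idea of [20] to [22] concrete, the sketch below checks k-degree anonymity and computes a minimal-cost anonymized degree sequence with dynamic programming. It follows the well-known degree-sequence formulation (sort degrees in descending order, split them into consecutive groups of at least k, raise each group to its maximum); we do not claim this is exactly the method of [22]. The function names are hypothetical, and turning the anonymized sequence into an actual graph requires a separate edge-addition step that is not shown.

```python
from collections import Counter


def is_k_degree_anonymous(degrees, k):
    """True if every degree value is shared by at least k vertices."""
    return all(count >= k for count in Counter(degrees).values())


def anonymize_degree_sequence(degrees, k):
    """Raise degrees as little as possible so the sequence becomes k-degree anonymous.

    Degrees are sorted in descending order and split into consecutive groups of
    at least k entries; every degree in a group is raised to the group maximum.
    Dynamic programming picks the split points that minimise the total increase.
    Returns (anonymized degrees in the original vertex order, total increase).
    """
    n = len(degrees)
    if n < k:
        raise ValueError("need at least k vertices")
    order = sorted(range(n), key=lambda v: degrees[v], reverse=True)
    d = [degrees[v] for v in order]
    prefix = [0]
    for x in d:
        prefix.append(prefix[-1] + x)

    def group_cost(i, j):
        # cost of raising d[i..j] (inclusive) to d[i], the group's maximum
        return d[i] * (j - i + 1) - (prefix[j + 1] - prefix[i])

    best = [float("inf")] * (n + 1)  # best[j]: minimal cost for the first j entries
    cut = [0] * (n + 1)              # cut[j]: start index of the last group
    best[0] = 0
    for j in range(k, n + 1):
        for i in range(0, j - k + 1):
            if 0 < i < k:
                continue  # the entries before the last group would form a group smaller than k
            cost = best[i] + group_cost(i, j - 1)
            if cost < best[j]:
                best[j], cut[j] = cost, i
    anon = [0] * n
    j = n
    while j > 0:  # walk the recorded cuts backwards to rebuild the groups
        i = cut[j]
        for p in range(i, j):
            anon[order[p]] = d[i]
        j = i
    return anon, best[n]


anon, cost = anonymize_degree_sequence([3, 3, 2, 2, 1], k=2)
print(anon, cost)  # [3, 3, 2, 2, 2] 1
```

Here the vertex with degree 1 is raised to degree 2, so every degree value appears at least twice at a total cost of one added degree unit.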
With the spread of technology sharing, big data are widely used in every aspect of life, but unreasonable use also brings users serious trouble and even severe threats. At present, there is still no mature technology, nor a complete body of laws and regulations, for protecting users' privacy. This gap leaves relevant industry standards without clear boundaries or implementation criteria and prevents effective measures from being taken. To protect users' privacy, this paper performs data mining and analysis on social networks. Combining a KD tree optimization algorithm, it builds a data-mining-based social network model that protects privacy, and it verifies the model experimentally and analyzes the algorithm on several data sets.
3. Methodology
Big data is like a huge spider web that weaves together the network information of today's society. Collecting and processing it is a large-scale, complex undertaking that other modern technologies cannot replace. Complexity, diversity, scale, and convenience are thus the outstanding characteristics of big data, and this combination of characteristics gives big data technology clear advantages over other technologies. The main problem in privacy protection for attribute- and relationship-oriented data is how to hide data in a relational database. The three common directions are data anonymization, secure multiparty computation, and data distortion; applying all three together can effectively reduce the risk of personal data leakage.
The anonymous triangle protection principle aims to preserve triangles (three mutually connected vertices) during graph anonymization: when edges are added, they should not break triangles that have already been anonymized, so that the original relationships are retained, as shown in Figure 1.
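One simple way to check whether an anonymization step respects this principle is to compare the triangles of the original and anonymized graphs. The sketch below uses a hypothetical "triangle retention" ratio (the fraction of original triangles that survive); the text does not define such a metric, so treat it as an illustrative assumption.

```python
from itertools import combinations


def triangles(edges):
    """Return the set of triangles (sorted vertex triples) of an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = set()
    for u, nbrs in adj.items():
        for v, w in combinations(sorted(nbrs), 2):  # every pair of u's neighbours
            if w in adj[v]:                          # closed pair -> triangle u, v, w
                found.add(tuple(sorted((u, v, w))))
    return found


def triangle_retention(original_edges, anonymized_edges):
    """Fraction of the original triangles still present after anonymization."""
    before = triangles(original_edges)
    if not before:
        return 1.0
    after = triangles(anonymized_edges)
    return len(before & after) / len(before)


original = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]
anonymized = [(1, 2), (2, 3), (3, 4), (2, 4), (1, 4)]  # edge (1, 3) replaced by (1, 4)
print(triangle_retention(original, anonymized))  # 0.5
```

Replacing edge (1, 3) destroys triangle (1, 2, 3) while (2, 3, 4) survives, so half of the original triangles are retained.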
Common friends in social networks follow a scale-free (power-law) distribution, so only a small fraction of edges carry high relationship values. To take part in convenient social activities and enjoy all-round customized services in the era of big data, users cannot have absolute privacy [28]. This does not mean that social network users can relax the protection of their personal privacy; rather, they should pay more attention to it. Only in this way can users enjoy normal services and social activities amid the flood of big data while protecting their privacy from violation. Current approaches to publishing dynamic social network data divide privacy protection requirements into different levels and protect both users' sensitive attributes and sensitive edges. The architecture of a cloud-computing-based data storage security system is shown in Figure 2.

K-anonymity has been widely used to anonymize relational data, and many researchers have extended it to the privacy protection of graph data. K-neighborhood anonymity extracts the neighborhoods of all nodes, encodes them, and groups nodes with similar neighborhoods until each group contains at least k nodes. Each group is then anonymized so that every node's neighborhood is isomorphic to the neighborhoods of at least k − 1 other nodes. This method can effectively resist neighborhood attacks. Social networks have the "small world" property: nodes with similar backgrounds are more likely to connect and form small groups. Therefore, anonymized data produced by clustering-based privacy protection still retain the macroscopic characteristics of the original network, and social network analysts can mine the anonymized network while users' privacy and security are preserved, which ensures the usefulness of the anonymized data.
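Exact neighborhood isomorphism testing is expensive, so the sketch below only uses a cheap invariant of each node's neighborhood (number of neighbours, edges among them, and their internal degrees). Isomorphic neighborhoods always share this signature, so any node whose signature is shared by fewer than k nodes certainly violates k-neighborhood anonymity; passing the check does not prove anonymity. The signature and function names are our own assumptions, not the encoding used in the cited work.

```python
from collections import Counter


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def neighborhood_signature(adj, v):
    """Isomorphism invariant of the subgraph induced by v's neighbours.

    Isomorphic neighbourhoods always get equal signatures, but equal
    signatures do not prove isomorphism, so this is only a necessary test.
    """
    nbrs = adj[v]
    inner = sorted(len(adj[u] & nbrs) for u in nbrs)  # degrees inside the neighbourhood
    return (len(nbrs), sum(inner) // 2, tuple(inner))


def certain_violations(edges, k):
    """Vertices that certainly break k-neighbourhood anonymity.

    If fewer than k vertices share v's signature, fewer than k - 1 other
    vertices can have a neighbourhood isomorphic to v's.
    """
    adj = build_adjacency(edges)
    sig = {v: neighborhood_signature(adj, v) for v in adj}
    counts = Counter(sig.values())
    return sorted(v for v, s in sig.items() if counts[s] < k)


star = [(0, 1), (0, 2), (0, 3)]
print(certain_violations(star, 2))  # [0]
```

In the star, the three leaves share a signature, but the hub's neighborhood is unique, so the hub would need to be grouped and anonymized.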
Traditional data mining discovers new knowledge by applying mining algorithms to the original data, but traditional algorithms cannot effectively protect private data, which compromises security. The KD3 framework builds on traditional data mining: the private data to be protected are processed to form a new release database D′, whose features are reconstructed into a new feature set F. The mining algorithm is then adjusted to obtain a new algorithm M′, which produces a new mining result X′ that should be as close as possible to the original result X. In this way, private data are protected while nearly the same mining results are obtained. The framework is shown in Figure 3.
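The D → D′ → X′ idea can be illustrated with a toy pipeline. In this sketch, which is an assumption rather than the paper's implementation, D′ is produced by adding zero-mean Laplace noise, feature reconstruction F is the identity, and the mining algorithm is plain one-dimensional k-means, so M′ = M. The point is only to show how X and X′ can be compared.

```python
import random


def laplace(scale, rng):
    # the difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale)
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def perturb(data, scale, rng):
    """Step D -> D': add zero-mean Laplace noise to every record."""
    return [x + laplace(scale, rng) for x in data]


def kmeans_1d(data, k, iters=50):
    """Mining step: 1-D Lloyd's k-means initialised at evenly spaced quantiles."""
    s = sorted(data)
    centers = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            j = min(range(k), key=lambda c: abs(x - centers[c]))
            groups[j].append(x)
        new = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return sorted(centers)


rng = random.Random(0)
D = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(10, 1) for _ in range(200)]
X = kmeans_1d(D, 2)                               # result on the original data
X_prime = kmeans_1d(perturb(D, 0.1, rng), 2)      # result on the released data
print(X, X_prime)
```

With small, zero-mean noise the two sets of cluster centers stay close; larger noise scales give stronger perturbation at the cost of a larger gap between X and X′.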

At present, social software generally requires users to sign privacy agreements, but most of these agreements consist of mandatory terms that users can only accept. Social network users cannot select individual options in the privacy terms according to their own circumstances, so they check "agree" simply to be allowed to use the software. During software development, vendors should take more initiative in offering users genuine authorization choices rather than forcing them to accept terms. Clustering-based privacy protection is another mainstream protection technique for graph data. It aggregates the nodes or edges of the social network into super nodes or super edges according to their similarity and applies the same anonymization operation to all members of a super node or super edge. Figure 4 shows the structure of the intrusion detection system.

Describe computer intrusion data by ω and . Here, ω represents the horizontal domain vector of computer network intrusion data, and  represents the vertical domain vector of computer network intrusion data. Let α denote the initial filtering result of the intrusion feature data; α is expressed as follows:

where W represents the norm vector of the intrusion signal, s(ω) represents the norm coefficient of the horizontal domain vector, s() represents the norm coefficient of the vertical domain vector, and m represents the initial filtering constant. The signal processing result of the intrusion feature data can be expressed as follows:
where ς(n) represents the superimposed signal processing result of computer intrusion communication data.
The agglomeration (clustering) coefficient is generally used in social networks to describe how densely a vertex's neighbors are connected to one another; in other words, it reflects the degree to which a user's friends know each other. The local clustering coefficient describes a specific vertex, and the average clustering coefficient is the mean of the local clustering coefficients of all vertices in the social network. Let the social network G = (V, E, L) be an undirected graph. The local clustering coefficient of vertex $v_i$ in G is

$$C_i = \frac{2\,\left|\{e_{jk} \in E : v_j, v_k \in N_i\}\right|}{k_i\,(k_i - 1)},$$

where $e_{jk}$ is the edge between vertices $v_j$ and $v_k$, $N_i = \{v_j : e_{ij} \in E\}$ is the set of vertices adjacent to $v_i$, and $k_i = |N_i|$ is the number of vertices in $N_i$. The average clustering coefficient of G is

$$\bar{C} = \frac{1}{n} \sum_{i=1}^{n} C_i,$$

where n is the number of vertices in G.
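The local agglomeration coefficient of a vertex is, in the standard definition, the number of links among its neighbors divided by the k_i(k_i − 1)/2 possible links; the average coefficient is its mean over all n vertices. The sketch below computes both for an edge list. Giving vertices with fewer than two neighbors a coefficient of 0 is a common convention that the text does not specify, and the function name is hypothetical.

```python
def clustering_coefficients(edges):
    """Return ({vertex: C_i}, average C) for an undirected simple graph.

    Vertices with fewer than two neighbours get C_i = 0 (a common convention).
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    local = {}
    for i, nbrs in adj.items():
        k_i = len(nbrs)
        if k_i < 2:
            local[i] = 0.0
            continue
        # each edge between two neighbours of i is counted once from each end
        links = sum(len(adj[j] & nbrs) for j in nbrs) // 2
        local[i] = 2 * links / (k_i * (k_i - 1))
    avg = sum(local.values()) / len(local) if local else 0.0
    return local, avg


# a triangle 1-2-3 with a pendant vertex 4 attached to 3
local, avg = clustering_coefficients([(1, 2), (2, 3), (1, 3), (3, 4)])
print(round(avg, 4))  # 0.5833
```

Vertices 1 and 2 have coefficient 1, vertex 3 has 1/3 (one link among its three neighbors), and vertex 4 has 0, giving an average of 7/12.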
Social network researchers can still use the features of the clustered graph to study the macroscopic characteristics of the original graph. The main idea of the algorithm is to cluster the nodes of the social network into several super nodes according to the combined distance between nodes, hiding the details inside each super node. If at least one edge connects nodes in two different super nodes, exactly one edge is placed between those two super nodes.
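Given a clustering of nodes into super nodes, the generalized graph described above can be built as follows. The clustering itself (by combined distance) is assumed to be given as a mapping from vertex to super-node id, and publishing each super node's size and internal edge count is an assumption borrowed from common graph-generalization practice; the text only says the internal details are hidden.

```python
def generalize(edges, assignment):
    """Collapse a graph into super nodes.

    assignment maps each original vertex to a super-node id. Returns the size
    of each super node, its number of internal edges, and the set of super
    edges; one super edge is kept between two super nodes whenever at least
    one original edge connects them.
    """
    size = {}
    for s in assignment.values():
        size[s] = size.get(s, 0) + 1
    internal = {s: 0 for s in size}
    super_edges = set()
    for u, v in edges:
        a, b = assignment[u], assignment[v]
        if a == b:
            internal[a] += 1                         # detail hidden inside the super node
        else:
            super_edges.add((min(a, b), max(a, b)))  # duplicates collapse into one edge
    return size, internal, super_edges


edges = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 4)]
assignment = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C"}
size, internal, super_edges = generalize(edges, assignment)
print(size, internal)  # {'A': 2, 'B': 2, 'C': 1} {'A': 1, 'B': 1, 'C': 0}
```

Edges (2, 3) and (2, 4) both connect super nodes A and B, but only one super edge A-B is published.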
In the social network G = (V, E, L), the average path length APL is the average shortest distance over all pairs of vertices:

$$\mathrm{APL} = \frac{2}{n(n-1)} \sum_{i<j} d_{ij},$$

where $d_{ij}$ is the shortest distance between vertices $v_i$ and $v_j$ and n is the number of vertices in the social network G.
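APL, the mean shortest distance over all vertex pairs, can be computed with one breadth-first search per vertex on an unweighted graph. This sketch assumes the graph is connected (otherwise some distances are undefined) and raises an error if it is not; the function name is hypothetical.

```python
from collections import deque


def average_path_length(edges):
    """Mean shortest-path distance d_ij over all unordered vertex pairs.

    Assumes a connected, unweighted, undirected graph.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        total += sum(dist.values())
    # every unordered pair was counted twice, once from each endpoint,
    # so total / (n(n-1)) equals 2 / (n(n-1)) times the sum over i < j
    return total / (n * (n - 1))


print(average_path_length([(1, 2), (2, 3)]))  # 1.3333333333333333
```

For the path 1-2-3 the pair distances are 1, 2, and 1, so APL = 4/3.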
With the rapid development of the Internet and information technology, all kinds of data accumulate continuously in social networks, and big data has spread across many fields and platforms, producing massive amounts of data. In social network applications, protecting users' privacy is particularly important: appropriate protection strategies keep users' data from being leaked and guarantee their security. A KD tree is a data structure that partitions data points in multidimensional space. It is a binary tree in which each node represents a spatial region. To study the KD-tree-optimized center point selection method further, we define the following quantities over the sample data set $X = \{x_1, x_2, \dots, x_n\}$.
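A minimal KD tree construction is sketched below: points are split at the median along one axis, the axis cycles with depth, and each node keeps the bounding box (minimum and maximum corner) of the points it covers, matching the view of a node as a rectangular region. The dictionary layout, `leaf_size` parameter, and function names are illustrative choices, not the paper's data structure.

```python
def build_kdtree(points, depth=0, leaf_size=1):
    """Recursively build a KD tree by median splits, cycling through the axes.

    Every node stores the bounding box ("lo", "hi") of the points it covers.
    """
    if not points:
        return None
    dim = len(points[0])
    lo = tuple(min(p[d] for p in points) for d in range(dim))
    hi = tuple(max(p[d] for p in points) for d in range(dim))
    if len(points) <= leaf_size:
        return {"points": points, "lo": lo, "hi": hi}
    axis = depth % dim
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2  # at least 1, so both halves are non-empty
    return {
        "axis": axis,
        "split": pts[mid][axis],
        "lo": lo,
        "hi": hi,
        "left": build_kdtree(pts[:mid], depth + 1, leaf_size),
        "right": build_kdtree(pts[mid:], depth + 1, leaf_size),
    }


def leaves(node):
    """List the leaf nodes (rectangular cells) of the tree from left to right."""
    if node is None:
        return []
    if "points" in node:
        return [node]
    return leaves(node["left"]) + leaves(node["right"])


pts = [(x, y) for x in range(4) for y in range(2)]
tree = build_kdtree(pts, leaf_size=2)
print(len(leaves(tree)))  # 4
```

Eight points with at most two points per leaf give four rectangular leaf cells after two levels of splitting.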
The number of data elements contained in a single rectangular cell is

$$\mathrm{Num} = \frac{n}{k\,m},$$

where n is the number of elements in the sample data set, k is the number of clusters, and m is the number of sub-blocks contained in a cluster. The value of m can be adjusted according to the size of the given data set; when the sample sizes of the data sets differ little, m is usually set to 10. A complete KD tree can be constructed once n, m, and k are known, where k and m represent the depth of the KD tree and the number of leaf nodes it contains, respectively.
The center of a rectangular cell is

$$\mathrm{Center} = \frac{LS}{W},$$

where LS is the linear sum of all elements in the rectangular cell and W is the weight of the cell, whose value is the number of sample elements the cell contains.
The density Den of a rectangular cell indicates how densely the data elements in the cell are packed:

$$\mathrm{Den} = \frac{W}{V}, \qquad V = \prod_{d} \left(x_{\max,d} - x_{\min,d}\right),$$

where W is the number of sample elements contained in the rectangular cell, V is the area of the cell, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum data elements in the cell, respectively.
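The cell quantities defined above (Num, the cell center LS/W, and the density Den) can be combined into an initial-center selection step. In this sketch, cells are produced by KD-tree-style median splits until each holds at most Num points; the rule for choosing k centers (take the densest cell, then repeatedly take the cell maximizing density times distance to the nearest chosen center) is our own heuristic assumption, since the text does not spell out the selection rule. Treating the cell volume as the product of side lengths, and the `eps` floor for flat cells, are also assumptions.

```python
import math


def split_cells(points, max_size, depth=0):
    """Median-split points KD-tree style until every cell holds <= max_size points."""
    if len(points) <= max_size:
        return [points]
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (split_cells(pts[:mid], max_size, depth + 1)
            + split_cells(pts[mid:], max_size, depth + 1))


def cell_stats(cell, eps=1e-9):
    """Return (Center, Den) for one cell: Center = LS / W and Den = W / V."""
    dim = len(cell[0])
    w = len(cell)                                       # weight W
    ls = [sum(p[d] for p in cell) for d in range(dim)]  # linear sum LS
    center = tuple(x / w for x in ls)
    if w < 2:
        return center, 0.0  # a single point gives no usable density estimate
    # V: product of side lengths x_max,d - x_min,d; eps keeps flat cells finite
    volume = math.prod(
        max(max(p[d] for p in cell) - min(p[d] for p in cell), eps) for d in range(dim)
    )
    return center, w / volume


def initial_centers(points, k, m=10):
    """Pick k initial centres from KD-tree cells (hypothetical selection rule)."""
    num = len(points) // (k * m)  # Num = n / (k m), rounded down
    if num < 2:
        raise ValueError("too few points per cell; decrease m")
    stats = [cell_stats(c) for c in split_cells(points, num)]
    if len(stats) < k:
        raise ValueError("fewer cells than clusters; decrease m")
    chosen = [max(stats, key=lambda s: s[1])[0]]  # densest cell first
    while len(chosen) < k:
        # favour cells that are dense and far from the centres already chosen
        best = max(stats, key=lambda s: s[1] * min(math.dist(s[0], c) for c in chosen))
        chosen.append(best[0])
    return chosen


A = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)]
B = [(10 + i * 0.1, 10 + j * 0.1) for i in range(5) for j in range(5)]
print(sorted(initial_centers(A + B, k=2, m=1)))
```

The distance factor keeps the second center from landing in a cell next to the first one. The resulting centers would then seed the clustering step.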
As data sharing improves and data mining technology develops, people can obtain more and more information, and the leakage of personal private data is attracting increasing attention. The hierarchical information security organization is shown in Figure 5.

The existence of a vertex is one of the most basic pieces of private information in a social network. A person may belong to many different social networks and may disclose different private information on each of them. Vertices are a prerequisite for the existence of a social network, and vertex attributes are easy to obtain from the social network graph. Although differential privacy protection can effectively protect users' social relationships, its threat model assumes that the attacker may already hold some information about the target. The attacker's capabilities should therefore be reasonably assessed before a privacy protection algorithm is designed.
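As an illustration of differential privacy on graph data, the sketch below releases a degree histogram with the standard Laplace mechanism under edge-level differential privacy. Adding or removing one edge changes the degrees of two vertices by 1, moving each from one histogram bin to an adjacent bin, so the L1 sensitivity is 4 and the noise scale is 4/ε. This is a textbook mechanism chosen for illustration, not the specific algorithm of this paper; vertices are assumed to be labeled 0 to num_vertices − 1, and the function names are hypothetical.

```python
import random


def degree_histogram(edges, num_vertices):
    """hist[d] = number of vertices with degree d in a simple undirected graph."""
    deg = [0] * num_vertices
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    hist = [0] * num_vertices  # degrees range over 0 .. num_vertices - 1
    for d in deg:
        hist[d] += 1
    return hist


def laplace_noise(scale, rng):
    # the difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale)
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_degree_histogram(edges, num_vertices, epsilon, seed=None):
    """Edge-level epsilon-DP release of the degree histogram (Laplace mechanism).

    One edge changes two degrees by 1, so the L1 sensitivity of the histogram is 4.
    """
    rng = random.Random(seed)
    scale = 4 / epsilon
    return [c + laplace_noise(scale, rng) for c in degree_histogram(edges, num_vertices)]


print(degree_histogram([(0, 1), (1, 2)], 3))  # [0, 2, 1]
print(private_degree_histogram([(0, 1), (1, 2)], 3, epsilon=1.0, seed=42))
```

Smaller ε gives stronger privacy but noisier counts; in practice the noisy counts are often post-processed (for example, rounded and clipped at zero), which does not weaken the guarantee.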
4. Result Analysis and Discussion
Data mining is a process of extracting hidden patterns from data. It is an important way to transform data into information and knowledge, and it is one of the effective means to analyze and process large amounts of data. At present, data mining technology has been widely used in biology, natural language processing, information retrieval, and other fields. Applying data mining methods to the research of social networks has become a new branch in the field of data mining.
The preceding sections introduced basic theory of social networks and the background knowledge an attacker may use to launch an attack, and briefly summarized structural privacy protection techniques and privacy protection techniques for data with label attributes. Although many methods have been proposed to protect user privacy in social networks, social network data will become increasingly complex as social applications develop and the number of users grows, so privacy protection techniques still need to be improved. This section experimentally analyzes the efficiency and usability of the KD-tree-optimized initial center point selection algorithm. All experiments were simulated in MATLAB. The five data sets used in the experiments, Iris, Ecoli, Acute Inflammations, Breast Cancer, and Thyroid, come from the UCI Machine Learning Repository. Table 1 describes these five data sets.
The accuracies of the KD-tree-optimized center point selection algorithm and the traditional K-medoids clustering algorithm on the same clustering tasks are compared in Table 2.
As Table 2 shows, the accuracy of the proposed KD-tree-optimized center point selection algorithm is clearly higher than that of the traditional K-medoids algorithm, which indicates that the optimized center point selection is effective. However, the KD tree optimization must build a KD tree and compute the center and density of each rectangular cell, so its running time is relatively high; this cost is unavoidable. Overall, the proposed algorithm achieves high accuracy on these low-dimensional data sets.
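For reference, the sketch below shows an alternating (Voronoi-iteration) form of K-medoids that starts from given initial medoids, together with a purity-style accuracy score. The text does not state how accuracy was computed, so mapping each cluster to its majority true class is an assumption; the function names are hypothetical. Initial medoids must be data points, for example the points nearest to the centers produced by a KD-tree selection step.

```python
from collections import Counter


def kmedoids(points, init_medoids, dist, iters=100):
    """Alternating k-medoids starting from the given medoids (which must be data points)."""
    medoids = list(init_medoids)

    def nearest(p):
        return min(range(len(medoids)), key=lambda c: dist(p, medoids[c]))

    for _ in range(iters):
        clusters = [[] for _ in medoids]
        for p in points:
            clusters[nearest(p)].append(p)  # assignment step
        # update step: the member with the smallest total distance becomes the medoid
        new = [min(c, key=lambda q: sum(dist(q, r) for r in c)) if c else m
               for c, m in zip(clusters, medoids)]
        if new == medoids:
            break
        medoids = new
    return medoids, [nearest(p) for p in points]


def clustering_accuracy(labels, truth):
    """Purity-style accuracy: each cluster is credited with its majority true class."""
    votes = {}
    for lab, t in zip(labels, truth):
        votes.setdefault(lab, Counter())[t] += 1
    return sum(c.most_common(1)[0][1] for c in votes.values()) / len(truth)


points = [0, 1, 2, 10, 11, 12]
truth = ["a", "a", "a", "b", "b", "b"]
medoids, labels = kmedoids(points, [0, 1], dist=lambda a, b: abs(a - b))
print(medoids, clustering_accuracy(labels, truth))  # [1, 11] 1.0
```

The deliberately poor starting medoids 0 and 1 still converge here, but on harder data the starting point affects both the result and the number of iterations, which is what a better center selection step targets.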
Next, we further verify the effectiveness of the algorithm for higher dimension data. Table 3 shows the attribute description of related data sets.
The data above were processed with both the proposed KD tree optimization selection algorithm and the traditional K-medoids algorithm. Five independent experiments were run on each data set, and the average accuracy over the five runs was recorded. The results are shown in Figure 6.

As Figure 6 shows, the proposed KD tree optimization algorithm is also suitable for high-dimensional data and maintains high accuracy, whereas the accuracy of the traditional algorithm declines. As the data dimension increases, however, the running time of the algorithm increases accordingly.
The performance of the algorithm is analyzed from two aspects: data utility and running time. Data utility is evaluated by the information loss that anonymization causes in the original social network. As shown in Figure 7, under the same level of privacy protection, the proposed algorithm achieves higher data utility.
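The text does not define its information-loss measure. One simple possibility, shown here purely as an assumption, is the number of edges added or removed by anonymization divided by the number of original edges; lower values mean the anonymized graph stays closer to the original.

```python
def edge_information_loss(original_edges, anonymized_edges):
    """(edges removed + edges added) / original edge count, for undirected edges.

    A simple, assumed utility metric; other measures (degree distribution,
    clustering coefficient, path length changes) are equally common.
    """
    def as_set(edge_list):
        return {frozenset(e) for e in edge_list}  # (u, v) and (v, u) are the same edge

    before, after = as_set(original_edges), as_set(anonymized_edges)
    if not before:
        return 0.0 if not after else float("inf")
    return len(before ^ after) / len(before)  # symmetric difference = changed edges


print(edge_information_loss([(1, 2), (2, 3), (1, 3)], [(2, 1), (2, 3), (3, 4)]))  # 0.6666666666666666
```

Here edge (1, 3) was removed and (3, 4) added, so two of the three original edges' worth of structure changed.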

As shown in Figure 8, under the same privacy protection strength, the KD tree algorithm requires less running time and is therefore more efficient.

The algorithm first uses KD tree optimization to select k cluster centers. When new data arrive, nearest-neighbor search assigns each new point to an appropriate cluster, so dynamic data can be clustered quickly. Because only the incremental data are processed, the algorithm avoids reclustering all the data whenever new data appear, which improves the efficiency of incremental clustering to some extent. Figure 9 shows how the increase in weight varies with k; the relationship is approximately linear. Figure 10 shows how the number of node splits varies with k: it is positively correlated with k overall but also depends on the size of the data set. This is because splitting depends on how nodes are grouped: if too many remaining nodes fall below the threshold value, more nodes must be split.
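The incremental step can be sketched by keeping, for each cluster, the linear sum LS and weight W used in the cell-center formula, so that a new point only updates its nearest cluster and the center LS/W is recomputed in constant time. Seeding each cluster with an initial weight, and the class and method names, are assumptions for illustration; the paper's nearest-neighbor search may instead use the KD tree.

```python
import math


class IncrementalClusters:
    """Keep (LS, W) per cluster so that centres follow Center = LS / W."""

    def __init__(self, centers, weights=None):
        weights = weights or [1] * len(centers)
        self.w = list(weights)  # W: number of points represented by each cluster
        self.ls = [[x * w for x in c] for c, w in zip(centers, weights)]  # LS

    def center(self, j):
        return tuple(x / self.w[j] for x in self.ls[j])

    def add(self, point):
        """Assign a new point to its nearest centre and update only that cluster."""
        j = min(range(len(self.w)), key=lambda c: math.dist(point, self.center(c)))
        for d, x in enumerate(point):
            self.ls[j][d] += x
        self.w[j] += 1
        return j


ic = IncrementalClusters([(0.0, 0.0), (10.0, 10.0)], weights=[2, 2])
print(ic.add((1.0, 1.0)), ic.add((9.0, 9.0)))  # 0 1
```

Existing points are never revisited, which is what makes the update cheap; the trade-off is that early assignments are not corrected when the centers drift.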


There are many ways to protect users' privacy in social networks, but the practicality and usability of the anonymized data cannot be ignored. An anonymized social network graph should ensure that users' identities cannot be identified and that their sensitive information is not leaked. Although different applications may process data with different anonymization methods, the processed data should remain authentic so that they retain research and mining value when released. On the one hand, privacy-preserving data mining must apply protection measures to private data; on the other hand, mining the protected data requires the algorithm to decompose and reconstruct the data. Lossy decomposition of the protected data increases the storage required and thus wastes some storage space, but it also effectively prevents data leakage and provides strong security. In the current era of big data, research on privacy protection in social networks is of great significance.
5. Conclusions
With the continuous development of social network software and platforms, large amounts of data with social value and research significance have accumulated. Mining and analyzing these data may disclose users' privacy, so ensuring users' privacy while mining social networks effectively is particularly important. This paper proposes a KD-tree-optimized center point selection algorithm. Because dynamic clustering is vulnerable to attacks that exploit external data, the algorithm also adds noise to perturb the data and thereby achieve privacy protection. The proposed data mining algorithm protects users' identities from being identified and their private information from being leaked: it decomposes the data, reconstructs the features, and stores the data vertically, which protects the data against security threats without reducing mining accuracy, and only the processed data are released to the public. The proposed anonymization algorithm retains the structure of the original social network and the usefulness of the original data; while addressing user identity exposure, data privacy disclosure, and information loss, it also supports the better application of social networks, which are an important carrier of the information age, and creates more value. Huge volumes of data are like a mine, and the contest between data mining and privacy protection is also a contest of technological development. Future research will mainly focus on optimizing the algorithm so that it can be better applied to massive data.
Research on privacy protection technology of social network data still faces many new challenges, and there are still many problems to be further studied.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that there are no conflicts of interest.
Acknowledgments
This paper was supported by the Research on Criminal Law Guarantee System of Network Security in China from the perspective of new development concept and overall national security concept (no. 21AZD082) and Research on Criminal Compliance Issues of Cross-Border e-Commerce Enterprises in Shaanxi Province under the background of "One Belt and One Road" (no. 2021E003).