Abstract

Social networks aim to provide a widespread framework through which users can communicate and find like-minded people with common features more easily and quickly. As in everyday life, social communication tends to form within groups whose members share nearly identical properties. Detecting such groups, or communities, is a challenging task in many areas of social network analysis, and many researchers aim to develop algorithms that work effectively and efficiently on social networks. It is believed that the most influential user in a community, one followed by similar users, can serve as the central point of a community or cluster, with the similar users as its members. Research studies try to increase intracommunity similarity and decrease intercommunity similarity, improving the performance of community detection methods by finding such influential users accurately. In this paper, a hybrid metaheuristic method is proposed. The proposed method, called trust-based community detection using artificial bee colony by feature fusion (TCDABCF), combines a fusion approach with the artificial bee colony (ABC) algorithm to improve the accuracy of community detection. This approach considers not only the social features of users but also the trust relationships between users in a community. As a result, the proposed method can find more precise clusters of similar users, with an influential user at the center of each cluster. ABC is used to find the influential users and their relations with their followers accurately. We compare this algorithm with nine state-of-the-art methods on the Facebook dataset. Experimental results show that the proposed method obtains values of 0.9662 and 0.9533 for NMI and accuracy, respectively, an improvement over state-of-the-art community detection methods.

1. Introduction

The growth in the number of social network users and in the interactions between them reflects the complex structure of these networks, which needs to be analyzed [1, 2]. Humans have always preferred to live in society by forming small or large communities. Thanks to today's social networks, communities can form around shared characteristics even when their members are spread over long distances around the world [3]. A network is composed of nodes and edges, where the edges represent the relationships between the nodes. Complex networks can be displayed graphically. This representation facilitates an understanding of the network's structure and reveals important information. The structure of such a network can be shown as a graph in which nodes represent individuals and links represent friendships between them [4]. Social networks can be divided into social groups based on individuals’ preferences or features, where a social group is a set of socially connected individuals with a closer relationship [5, 6]. People in social networks self-organize into communities in which the nodes of the same community are densely and closely connected in comparison to the rest of the network.

Social networks have become a main means of communication between people, and trust is the foundation of these connections. On the other hand, as social networks expand in the electronic environment, which is less tangible than the physical one and has more blind spots, security and trust have become crucial [7, 8]. In many computing systems, information is generated and processed by many people. Trusted agents can be a very useful resource for collecting, filtering, and ordering information. Also, if trust is used to support decision-making, it is important to have an accurate estimate of it, especially if it is not directly available. Influential users who are trusted can be good agents for interacting and sharing preferences. They can also be good candidates for cluster head nodes in a social network, since users look to connect with influential, trusted, and similar agents [9].

Social networks are made up of communities: nodes inside a community are more closely related, whereas nodes in different communities are less related and more separated [10]. Determining the overall features of communities is challenging because people in a community may differ in many features while sharing only a few. Finding the common features of a community therefore leads to finding the mutually similar users in that community. In general, community detection can be considered as clustering [11, 12]. The main idea of clustering is to use locally similar features as similarity measures to detect clusters and to find candidate points that represent these measures as cluster head nodes. In social network analysis, community detection plays the same role by finding influential users to act as community head nodes. The similarity measures can be the social features of users, such as membership in a specific group, expressing the same ideas, following one another, and so on.

In social networks, users usually need to communicate with one another; this is one of the unavoidable rules of such association. In addition, individuals and communities have to observe basic rules to achieve their goals in social media. Trust is the basic requirement of social communication and of a healthy social network, and its main motivation is to foster the sense of belonging and loyalty necessary for success in all areas. Social trust reflects the confidence placed in each individual of a community to use resources adequately, and it lowers formal barriers and thereby the cost of extending security to all aspects of a network [13]. Therefore, to obtain an integrated community, a healthy social structure is required in which users trust each other and the influential agents.

In this paper, trust-based community detection using artificial bee colony by feature fusion (TCDABCF) is proposed. The purpose of this method is to find influential and reliable users in the network, who are treated as cluster head nodes of communities. Such users can be a good feed source for other users and have a strong connection with them, so they can be a good central point for an integrated community. By searching among the users surrounding the candidate agents, similar users who have a trust-based connection with the agents can be found as members of the community. Finding such users is treated as an optimization problem in this paper. We use a metaheuristic approach to balance the goals of community discovery and building trust-based communities. The main contributions of this paper are as follows:
(i) Adapting the artificial bee colony (ABC) metaheuristic as a clustering algorithm based on the trust between people in a community is the novelty and the main contribution of this work. In this approach, ABC is used to find influential users as candidate agents for the central points of communities, based on the centrality criterion.
(ii) Defining a constraint-based optimization problem, based on the fusion of trust and social similarity and related constraints in the social network, as the fitness function of the bee colony optimization algorithm is the second contribution of the proposed method.
(iii) Defining intracluster and intercluster distances to find the level of trust between users and agents is another contribution of the proposed method. In this method, we find similar users as candidate agents by metaheuristic exploration guided by the proposed constraint-based fitness function.

The rest of this paper is organized as follows: in the second section, related works will be discussed. In the third section, the preliminaries and algorithm of the proposed method will be detailed. In the fourth section, the implementation and result of the experiments will be shown. Finally, we will discuss and conclude this paper in Section 5.

2. Related Works

In recent years, because of the importance of community detection in social networks and its usefulness for analyzing them, many community detection algorithms have been developed to reveal community structures in complex social networks [14]. In this section, some existing community detection algorithms are briefly introduced.

Artificial bee colony (ABC) is a swarm intelligence-based evolutionary algorithm inspired by the food source exploration behavior of honey bees. Like many other metaheuristic algorithms, such as differential evolution (DE), ant colony optimization (ACO), particle swarm optimization (PSO), and gravitational emulation, which have been adapted to solve problems such as routing [15, 16], image segmentation [17–22], and many clustering problems [23–25], ABC has also been used to solve many different problems [26, 27]. In the study by Hafez et al. [28], artificial bee colony optimization was used for community detection for the first time. It was motivated by the need to determine the number of communities automatically. It is also based on the hypothesis that the performance of metaheuristic-based community detection algorithms is directly influenced by the quality function used in the optimization process. In the work by Aung et al. [29, 30], ABC is applied to community detection by introducing exclusive mutation and crossover operations. By creating diversity in the new population, these operators try to find the users most similar to the influential user in each community. Aung and Nyunt [31] proposed a new community detection method based on modularity ABC that normalizes mutated information to use for the fitness function to increase the performance of the specific community structure. In the work by Saoud [24], an ABC-based approach is used to split the network into high-quality communities by modularity and to select one node in each community as a representative. Then, each user is assigned to a community according to its similarity to the representative node. Moreover, this approach can determine the number of communities automatically.

Ding et al. [32] examined trust model-based detection schemes and, based on these schemes, modeled user interactions as a transfer of trust. They focused on the cold start problem, which is one of the most important problems in trust model-based detection schemes. They proposed a new algorithm called TLCDA to identify communities based on the trust model. This algorithm uses the K-medoids clustering method to identify communities. The authors reported that their implementation and evaluation showed that the algorithm can identify coherent communities and also supports topological coherence.

Lingam et al. [33] focused on identifying social botnets in online social networks. They first created a weighted graph based on behavioral similarities (such as similarity in interests, in social interactions, and in content tweeted by users) and participants’ trust in each other on Twitter. They then developed two algorithms: Social Botnet Community Detection (SBCD), to identify social botnet communities that exhibit similar malicious behavior, and Deep Autoencoder-based SBCD (DA-SBCD), to reconstruct and identify social botnet communities. A Twitter dataset was used to implement the proposed algorithms and evaluate their accuracy. The results of their evaluation showed that the DA-SBCD algorithm has 90% accuracy in detecting communities, which when compared to other methods (such as normalized mutual information (NMI)) showed an improvement of 8%.

Sheng et al. [34] studied how to discover community structure in a network in order to extract information. They proposed a novel community detection algorithm, NTCD (Community Detection based on Node Trust). This algorithm identifies stable communities, does not need any parameters to be set, and has almost linear time complexity. To determine the probability that a node belongs to the current community, the algorithm examines the relationship between the node and its neighboring communities (this relationship is called the Node Trust). Node Trust uses the maximum search method to identify communities. To implement the proposed algorithm in this research, real-time and hybrid networks were used, and the results showed that the algorithm identifies communities, and the community to which a node belongs, with high accuracy.

Zhu et al. [35] focused on how various social networks, including Foursquare and Instagram, work. These networks often use two methods, including Point-of-Interest (POI) discovery and friendly advice to make suggestions based on where people live. In other words, social trust among individuals has no effect on the performance of social networks and the suggestions they make to users. In this research, a new algorithm for clustering users based on the level of trust in them has been proposed, which then, using these clusters, a method to predict the level of trust in users was presented. The researchers used the level of trust in people and the similarities in their behaviors to provide advice. To improve the quality of the recommendations made to the user, the researchers proposed a framework for providing POI recommendations based on the user’s preferences, the impact of the geographical coordinates of the user’s location, and the level of trust in the user. The results of implementation and evaluation of the proposed algorithms showed that the clustering method based on the trust proposed in this research has high efficiency and accuracy in identifying people with similar and reliable desires and interests.

Chen et al. [36] focused on the analysis of social networks and on discovering communities of individuals in them, in order to better understand the topological features of real social networks, monitor public opinion, offer personalized suggestions, and identify the ideas used by leaders. The researchers took users' trust into account in the creation of links between them. In this study, they proposed a new algorithm for identifying communities of users on social networks. The proposed algorithm has two steps. First, the trust relationship between users was divided into three categories: direct, indirect, and mutual trust, and a method was provided to calculate the level of trust between users. Second, they used edge fit and fitness between users to identify nonoverlapping communities, to integrate these communities, and to build a relationship of trust. Then, they proposed a new trust-based algorithm to apply trust between nodes in a community. The results of implementing the proposed algorithm on the Lesmis and Gemo datasets showed that it can be very effective for deep understanding and discovery of communities among social network users.

Wu et al. [37] investigated the role of the trust factor in communities created on social networks. For this purpose, they reviewed all the research conducted in two databases, SSCI and SCIE. The results of this study showed that most of the research in this field has been done by researchers in the United States, and in this research, more emphasis is placed on creating computational models and proposing systems. Finally, some of the most important limitations in the development of research in this field were analyzed in this research.

Beigi et al. [38] focused on interactions in e-commerce and emphasized the need to establish trust between users so that they can conduct their e-commerce interactions with ease. They proposed a new method for identifying communities in the network that builds relationships in communities from trust-based relationships and from similarities in user rankings according to users' credibility and trustworthiness. The main idea of the proposed framework is that the value of trust between different people in a community can be used as a predictor of relationships between other people in the community. In this study, two recognition algorithms were used to assess the level of trust between people: one to validate people in their dealings with others inside the community and the other to validate the level of trust in people based on their behavior inside and outside the network community. The results showed that the proposed method works better than other existing trust prediction methods on the datasets of the well-known product review websites Epinions and Ciao.

Kou et al. [39] studied how social media delivers content to different people on a signed social network (SSN) and modeled some of the more complex systems in the real world using these networks. These networks provide content to users by creating two groups of users (including trusted and untrusted). One of the major challenges in using these networks is that some users on these networks are unknown, so you cannot attach a trusted or untrusted label to them. For this purpose, these researchers proposed a new method called the trust-based missing link prediction (TMLP) method. In this method, first, they use the Simhash method to create a hash index for each user, and then, to determine whether it can create a social relationship between two users or not, they calculate the Hamming distance between two users. Finally, they use the fuzzy computing model to determine the type of new social relationships between users (including trustworthy or untrustworthy).

Baek et al. [40] focused on online social networks (OSN). These networks can facilitate new relationships with previously unknown individuals who have similar opinions and interests to the user, but they can also create many security problems for users (abuse, uncontrolled disclosure of content, dissemination of incorrect information, etc.). For this reason, a model is needed that can evaluate the reliability of users dynamically. In this study, they proposed a dynamic trust-based access control method for online social networks. In this method, various sociological methods were used to understand negative behaviors and to limit the level of access dynamically in the communities created by users in online social networks. The evaluation results showed that the method is highly capable of preventing uncontrolled disclosure of information.

Li et al. [41] focused on using data mining techniques to analyze users’ behaviors and feelings on mobile social networks. They provided an overview of the user emotion estimation scheme (including chain of trust between users, the meaning of different users’ emotions, etc.). Further, they proposed a model for building a chain of trust between users and finally proposed a new way to model emotions by presenting modeling rules. The results of implementing their proposed method showed that their method has been very effective in modeling the trust chain.

Ma et al. [42] examined two important factors in creating communities on social networks (including the quality of grouping and the cost of grouping). To optimize these two factors, they proposed a multiobjective group recognition algorithm based on user relationship analysis that can optimize the two factors, while normalizing the information. In the proposed algorithm, the local search strategy, which is related to the specific knowledge of the problem, is used to improve the effectiveness of the new algorithm. Experiments performed by these researchers on various local and social networks showed that their proposed algorithm can find group structures with high accuracy in social networks.

All the methods mentioned in this section used various algorithms in different ways for community detection, but none of them has paid attention to the importance of trust. The method proposed in this paper, in addition to being inspired by previous methods, has also added the trust factor to the parameters of the fitness function of the ABC optimization algorithm in order to effectively perform community detection in social networks.

3. Methodology

As mentioned earlier, in this paper, the trust-based ABC optimization algorithm by feature fusion is used for community detection in social networks. This method finds influential users in the network. Influential users are the ones who have the most communication on the network. After the degree of trust is calculated for every influential user, the characteristics of the influential users and their trust degrees are used as parameters of the fitness function in the ABC optimization algorithm to obtain the users similar to each influential user and later to form communities. Algorithm 1 shows the algorithm of the proposed method.

(1) Input: M × M adjacency matrix
(2) Output: Communities
(3) Begin
(4)   For all users in the social network:
(5)     Calculate the user’s communication rate.
(6)     Calculate the user’s trust level.
(7)   End of for
(8)   Select influential users with maximum communication rate and trust level as community centers.
(9)   Merge overlapping influential users that are connected by a direct link.
(10)   Extract community centers.
(11)   Consider influential users as food sources in ABC.
(12)   Search for users similar to the community centers based on the fitness function to find employee, onlooker, and scout bees.
(13)   Form communities from users whose fitness values are greater than the threshold.
(14)   Integrate communities based on overlapping users.
(15) End
3.1. Problem Formulation

The complex networks in the presented method are represented by an adjacency graph, denoted by G = (V, E), where V and E are the vertex and edge sets, respectively [2]. Each vertex in G represents a user in the network, and each edge is a communication between a pair of users. The size of the network is defined by n = |V|, which is the number of users, and m = |E|, which is the number of links. The network structure is represented as a square adjacency matrix A = (aij)n×n, where each element takes a value from the set {0, 1}, that is, aij = 1 if useri is connected to userj; otherwise, aij = 0.
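As a small illustration of this formulation, the following sketch builds the adjacency matrix of a toy network and derives the number of users, the number of links, and each user's degree from it. The edge list and the function name `build_adjacency` are hypothetical, and the links are assumed to be undirected, which the text does not state explicitly.

```python
def build_adjacency(n, edges):
    """Return the n x n adjacency matrix A with a_ij in {0, 1}."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1  # assumption: links are undirected, so A is symmetric
    return A


# Toy network for illustration only (not taken from the Facebook dataset).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
A = build_adjacency(4, edges)

n = len(A)                             # number of users
m = sum(sum(row) for row in A) // 2    # number of links (each counted twice in A)
degree = [sum(row) for row in A]       # number of connections of each user
```

For a directed follower graph, the symmetric assignment would be dropped and in-degree and out-degree would be read from columns and rows separately.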

The main steps in this method are the identification of influential users, the calculation of trust levels, and the detection of the closest users using the metaheuristic ABC algorithm. We describe the proposed algorithm in detail in the following subsections.

3.2. Influential Users’ Identification

In the first step of TCDABCF, the main assumption is that an influential user in a social network may be followed by many users, while the followers of the influential user may not know each other. The number of followers of a user is considered its communication degree, and a user with a larger degree has many users surrounding it in its local neighborhood. An influential user with a high communication degree is more likely to be the central node of a community. Figure 1 shows an example of a synthetic network in which the nodes of influential users are drawn with larger circles than other nodes, according to their communication degree.

Considering a case like that in Figure 1, if the distance between influential users is too short, they cannot be considered central points of distinct communities. Therefore, we assume that the distance between influential users is not less than the average distance of the network. The average network distance is defined as the average of the shortest paths between each pair of distinct nodes in the network. Hence, we identify the influential users by their communication degree and their distance to other influential users. If the distance between two influential users is less than the threshold, the one with the lower degree is ignored and is considered a follower of the main influential user. So,where is the user in the network, is threshold, Ii,j is the influential user, and D is the shortest distance between two nodes.
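The selection rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the paper's exact formula: candidates are restricted to users whose degree exceeds the mean degree (a hypothetical cutoff), candidates are visited greedily in order of decreasing degree, and the threshold defaults to the average shortest-path distance of the network. The function names and the toy graph are also hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_distance(adj):
    """Average shortest-path length over all connected pairs of distinct nodes."""
    total, pairs = 0, 0
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


def select_influential(adj, threshold=None):
    """Pick high-degree users that are at least `threshold` hops apart."""
    if threshold is None:
        threshold = average_distance(adj)
    mean_degree = sum(len(nbrs) for nbrs in adj.values()) / len(adj)
    # Assumption: only users with above-average degree are candidates.
    candidates = [u for u in adj if len(adj[u]) > mean_degree]
    centers = []
    for u in sorted(candidates, key=lambda x: (-len(adj[x]), x)):
        dist = bfs_distances(adj, u)
        # Keep u only if it is far enough from every center chosen so far;
        # otherwise it is treated as a follower of a stronger center.
        if all(dist.get(c, float("inf")) >= threshold for c in centers):
            centers.append(u)
    return centers


# Toy network: two hubs (0 and 5) joined by the path 0-3-4-5.
adj = {
    0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3, 5],
    5: [4, 6, 7, 8], 6: [5], 7: [5], 8: [5],
}
centers = select_influential(adj)
```

In this toy network the two hubs are kept as centers, while the bridge nodes 3 and 4 are closer to hub 5 than the average distance and are therefore dropped.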

3.3. Trust Level Calculation

In online social networks, trust is defined as the belief of one user in another user in a mutual relationship [32]. In other words, trust is what a user expects from the other party in a cyberspace relationship. Therefore, in social networks, users who have a higher level of trust receive more attention from other network users and can serve as main nodes in the network. Hence, to calculate the trust of a user in the social relations of a social network, scholars have measured the strength of user relations as an individual weight that indicates the trust level and its evolution, which is practically useful for improving the accuracy of social network analysis.

Let us consider two users i and j as connected neighbors in a social network G; their direct trust can be calculated as follows:where indicates all paths between user i and user j and is the sum of all direct paths between node i and its adjacent nodes.

3.4. ABC Algorithm

ABC is a swarm intelligence-based evolutionary algorithm inspired by the food source exploration behavior of honey bees. The ABC algorithm has three types of bees, employee, onlooker, and scout bees, which fly through the problem space to find the optimum solution. At the initial stage, the bees are grouped by their fitness. An employee bee memorizes a food source when she searches the space, and she communicates the quality of the food source to the onlookers by dancing around it. Onlookers watch the dances of the employee bees carefully and select the best source accordingly. Onlookers then move to this source to exploit it. Once a food source is exhausted, it may be abandoned, in which case the scout bees randomly try to find a new food source. The percentages of scout, employee, and onlooker bees are usually set manually.

In the ABC algorithm, the position of a food source represents a solution to the optimization problem, and the amount of nectar of the food source corresponds to the quality (fitness) of the solution. The number of employee bees or onlooker bees is equal to the number of solutions in the population. In the first step, ABC distributes the initial population randomly (, solutions of SN food source positions, and N number of bees), where SN represents the population size.

Each solution (food source SNi, i = 1, 2, …) is a D-dimensional vector. Here, D is the number of optimization parameters. After initialization, the population of positions is subjected to repeated cycles, indexed by C, of the search processes of the employee, onlooker, and scout bees. An employee or onlooker bee may produce a modification of the position (solution) in her memory to find a new food source and then tests the amount of nectar (fitness value) of the new source (new solution) [33].

The onlooker bee selects a food source according to the probability value associated with that food source, $p_i$, which is calculated by

$$p_i = \frac{fit_i}{\sum_{n=1}^{SN} fit_n}, \tag{3}$$

where $fit_i$ is the degree of suitability of solution i evaluated by its employee bee, which is proportional to the amount of nectar in the food source at position i, and SN is the number of food sources, equal to the number of employee bees. In this process, employee bees exchange information with onlooker bees. In order to generate a candidate food position, ABC uses the following expression:

$$v_{ij} = x_{ij} + \varphi_{ij}\,(x_{ij} - x_{kj}), \tag{4}$$

where $x_{ij}$ is the previous position of the bee x, and $k \in \{1, 2, \ldots, SN\}$ and $j \in \{1, 2, \ldots, D\}$ are randomly selected indexes. Although k is randomly assigned, it must be different from i. $\varphi_{ij}$ is a random number in [−1, 1]. It controls the production of the neighboring food source position around $x_{ij}$ and represents the bee's visual comparison of neighboring food positions. Equation (4) shows that as the difference between the parameters $x_{ij}$ and $x_{kj}$ decreases, the change in the position $x_{ij}$ decreases as well. Therefore, the step length is reduced adaptively as the search approaches the optimal solution. The bee moves according to the following equation:

$$x_{ij}^{\text{new}} = x_{ij} + \varphi_{ij}\,(x_{ij} - x_{kj}), \tag{5}$$

where $x_{ij}^{\text{new}}$ is the new position of bee x, $(x_{ij} - x_{kj})$ is the distance between two bees $x_i$ and $x_k$, i is the current bee, k is the neighboring bee, and j is the dimension of the next move of bee $x_i$ relative to bee $x_k$.

Here, only one dimension is considered at a time, so a single variable (j) is selected. If the new position (new food area) has better quality (more nectar, that is, a better value of the objective function), the bee stays in the new area; otherwise, it returns to its previous area, and the bee’s trial index is incremented by one.

In fact, the trial index counts the number of moves of a bee that brought no improvement. If the trial index value of a bee exceeds a predefined value, the food area is considered exhausted and must be abandoned [32, 43].

The second group is the onlooker bees. Employee bees are spread over the food areas. Depending on the quality and amount of nectar in an area, they perform a dance above their food area, and the onlooker bees determine from these dances which area has better quality. Onlooker bees then select areas accordingly. In other words, areas that have more nectar have a better chance of being selected by onlooker bees. Assigning onlooker bees gives the bees in an area a second chance to move. The selection odds are calculated according to the following equation:

The third group is the scout bees. These bees leave the areas that have been identified as unfavorable for nectar and randomly select other areas. If the stop condition is met, the algorithm stops. Otherwise, it returns to the beginning of the loop, which is the movement of the employee bees [44].
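The three phases described in this subsection can be put together in a minimal sketch of the generic ABC loop. It is written for a continuous toy problem rather than for community detection, and it follows the textbook form of ABC: the fitness mapping 1 / (1 + cost), the parameter names (`n_sources`, `limit`, `cycles`), and the clipping of candidates to the search bounds are assumptions made for illustration, not details taken from the proposed method.

```python
import random


def abc_minimize(cost, dim, bounds, n_sources=10, limit=20, cycles=100, seed=0):
    """Minimal artificial bee colony loop for a non-negative cost function."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_source():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    sources = [random_source() for _ in range(n_sources)]
    costs = [cost(x) for x in sources]
    trials = [0] * n_sources          # moves without improvement, per source
    best_i = min(range(n_sources), key=costs.__getitem__)
    best, best_cost = sources[best_i][:], costs[best_i]

    def try_neighbor(i):
        """Perturb one dimension of source i and keep the change if it is better."""
        nonlocal best, best_cost
        k = rng.choice([s for s in range(n_sources) if s != i])  # k != i
        j = rng.randrange(dim)
        phi = rng.uniform(-1.0, 1.0)
        cand = sources[i][:]
        cand[j] = min(hi, max(lo, cand[j] + phi * (cand[j] - sources[k][j])))
        c = cost(cand)
        if c < costs[i]:              # greedy selection
            sources[i], costs[i], trials[i] = cand, c, 0
            if c < best_cost:
                best, best_cost = cand[:], c
        else:
            trials[i] += 1

    for _ in range(cycles):
        # Employee phase: every food source is explored once.
        for i in range(n_sources):
            try_neighbor(i)
        # Onlooker phase: sources are chosen with probability fit_i / sum(fit).
        weights = [1.0 / (1.0 + c) for c in costs]
        for _ in range(n_sources):
            i = rng.choices(range(n_sources), weights=weights)[0]
            try_neighbor(i)
        # Scout phase: exhausted sources are replaced by random ones.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = cost(sources[i])
                trials[i] = 0
                if costs[i] < best_cost:
                    best, best_cost = sources[i][:], costs[i]
    return best, best_cost


def sphere(x):
    """Toy objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


best, best_cost = abc_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

Because the best solution found so far is stored separately, abandoning a source in the scout phase never loses it. In a community detection setting, the cost function and the solution encoding would be replaced by the fitness function and community representation of the method at hand.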

3.5. Closest User Detection Using ABC

In the proposed method, the closest user is a user who, in addition to having a direct relationship with an influential user in the network, has social characteristics and a trust degree similar to those of the influential user. As mentioned earlier, influential users in the network are the users who have the most communication and the highest trust degree. Influential users are surrounded by a multitude of users. In the proposed method, the purpose of determining the closest users is to find those followers who are most similar to the influential users in terms of both social characteristics and trust degree. Such users, in conjunction with influential users, can form integrated communities with the same features that are surrounded by a multitude of other users.

In this paper, we use the ABC algorithm to find the users similar to influential users. In this algorithm, social similarity and trust are selected as evaluation criteria in the fitness function. In the case of community detection, each possible solution defines a set of users that are similar to one another and linked to an influential user by a trusted relation. So, under this coding, a food source is considered a community [28]. In this way, influential users are considered as employee bees, which are the cores of the communities in the social network. The onlooker bees look for the employee bee that performs the best dance. The quality of the dance is based on a combination of similarity and trust as the fitness function. The onlooker bee follows the employee bee that is most similar to her and differs least from her in level of trust. Scout bees look for new communities around the influential users in the social network. Since the social features of a user are represented as a vector in which each element is considered an attribute, the Jaccard criterion can be used to calculate the similarity between two users as follows:

$$J(A(i), A(j)) = \frac{|A(i) \cap A(j)|}{|A(i) \cup A(j)|}, \tag{7}$$

where A(i) and A(j) are the attribute vectors of the connected users i and j in the social network [45].
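As a small worked example of the Jaccard criterion, the sketch below treats each user's social features as a set of attribute labels and computes the ratio of shared attributes to all attributes of the two users. The attribute labels and the treatment of two empty profiles as dissimilar are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two attribute collections, in [0, 1]."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty profiles are treated as dissimilar
    return len(a & b) / len(union)


# Hypothetical attribute sets of an influential user and one follower.
influential = {"group:chess", "city:Paris", "follows:42"}
follower = {"group:chess", "city:Rome", "follows:42"}

similarity = jaccard(influential, follower)  # 2 shared out of 4 distinct attributes
```

If the features are stored as binary vectors instead, the same value is obtained by counting positions where both vectors are 1 and dividing by the positions where at least one is 1.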

3.6. Fitness Function

The fitness function in the proposed method is a combination of the trust degree of users in the social network and the similarity between users and influential users in the network. As mentioned, the trust degree of a user is defined as the number of incoming communications of that user divided by the total number of network users. A higher trust degree shows that many users in the network know this user, trust this user, and have communicated with this user. In online social networks, communication between two users is taken as evidence of knowledge and trust. On the other hand, since influential users are selected as the central points of each community in the proposed method, the degree of similarity of users to influential users can be an important factor in determining the community. Hence, the proposed fitness function is defined as the fusion of users’ trust degree and their similarity to influential users as the central points of any community. The fitness function is then treated as a constrained optimization problem over this fused score, where the total paths between an influential user and its neighbors should not be more than half of the total edges in the network, and the number of neighbors of an influential user should not be more than half of the network users, because in that case the whole social network would be considered as one community.
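The fusion described above can be sketched as follows. The exact form of the fitness function is not reproduced here, so the weighted sum with weight `alpha`, the helper names, and the toy data are all assumptions made for illustration; only the definition of trust degree (incoming links divided by the number of users) and the neighbor constraint follow the text.

```python
def trust_degree(user, in_links, n_users):
    """Incoming links of `user` divided by the total number of network users."""
    return len(in_links.get(user, ())) / n_users


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fitness(user, center, in_links, features, n_users, alpha=0.5):
    """Assumed fusion: alpha * trust + (1 - alpha) * similarity to the center."""
    trust = trust_degree(user, in_links, n_users)
    sim = jaccard(features[user], features[center])
    return alpha * trust + (1 - alpha) * sim


def center_is_admissible(center, neighbors, n_users):
    """Constraint from the text: at most half of all users may be neighbors."""
    return len(neighbors.get(center, ())) <= n_users / 2


# Hypothetical four-user network.
in_links = {"u": {"a", "b"}, "c": {"u", "a", "b"}}
features = {"u": {1, 2, 3}, "c": {2, 3, 4}}
score = fitness("u", "c", in_links, features, n_users=4)
```

Here user `u` has a trust degree of 2/4 and a similarity of 2/4 to the center `c`, so the assumed equal-weight fusion gives 0.5. A different weighting, or a product instead of a sum, would be equally compatible with the description.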

4. Experimental Results

In this paper, the Facebook dataset stored in the Stanford University Governor’s Data Repository is used [37]. This dataset is part of the data related to social media on Facebook and has been used to test many different clustering methods [46]. It consists of “circles” (or “friends’ lists”) from Facebook that were collected from survey participants using a Facebook app. The Facebook dataset includes node features (profiles), circles, and ego networks. In this dataset, a social media review tool for collecting, analyzing, and visualizing the level of its use and disseminating it in social media has been published. Also, the latest articles that have used the data collection for accurate analysis of community detection can be found in this data collection. Table 1 shows the details of the dataset.

The complete dataset cannot be distributed due to Facebook’s privacy policies and the copyright of the news publisher. Therefore, in this dataset, specific jobs and user information are not disclosed due to Facebook’s policy. This dataset can be used to check Facebook’s connection to work and data related to social media.

This dataset is constantly being updated due to the development of the social network Facebook and the increase in the amount of work on it and the network content. The latest version of this data collection (located in the data collection folder) includes networks related to schools and colleges. Each school file has a sparse adjacency matrix and a “local_info” variable of feature information.

4.1. Influential Users

As mentioned in the previous sections, in this study, we use the ABC algorithm to validate the work on the social network. In the proposed method, the target community consists of those users who are connected in a trust chain. The proposed method assumes that influential users in the social network play an important role in the formation of communities and that the users who surround an influential user have characteristics similar to those of that user. Influential users therefore act as cluster centers or community representatives. Consequently, the first step of community detection is to find influential users in the social network. Influential users in the network are extracted according to their communication rate with other users in the social network. Users whose normalized communication exceeds the average of the network are selected as influential users on the network. Figure 2 shows the influential users on the social network in the Facebook dataset.
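The selection rule for influential users can be sketched as below. The sketch assumes that the communication rate of a user is the node degree normalized by the maximum possible degree, which is one plausible reading of "normalized communication"; the function name and the toy graph are hypothetical.

```python
def influential_users(adjacency):
    """Return users whose normalized degree exceeds the network average."""
    n = len(adjacency)
    # Normalize each degree by the largest possible degree, n - 1 (assumed).
    rate = {u: len(nbrs) / (n - 1) for u, nbrs in adjacency.items()}
    mean_rate = sum(rate.values()) / n
    return sorted(u for u, r in rate.items() if r > mean_rate)


# Hypothetical undirected friendship graph as an adjacency dictionary.
graph = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b"},
    "e": {"a"},
}
centers = influential_users(graph)
```

In this toy graph the average normalized degree is 0.6, so only `a` (1.0) and `b` (0.75) are selected.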

As shown in Figure 2, influential users on the social network are extracted according to their communication rate and the threshold. According to the figure, some influential users may be in direct contact with another influential user. In such a case, the distance between the two influential users is short and, as a result, their similarity to each other can be high. We therefore measure the similarity of the two influential users and, if they are similar, merge them. In that case, the user with the lower communication rate is considered a member of the community of the influential user with the higher communication rate. Figure 3 shows the merging of the influential users extracted in the previous step.

As shown in Figure 3, of the 10 influential users extracted in the previous step, 5 are merged due to their similarity, and only 5 influential users who are not similar to each other remain. Each of these users is considered as the central point of a cluster related to a community, and the ABC algorithm starts its search and optimization by considering these users as the targets.
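The merging step can be illustrated with the following sketch, in which a lower-degree influential user is absorbed by a directly connected, sufficiently similar influential user with a higher degree. The similarity threshold of 0.5, the use of Jaccard similarity for this comparison, and all names and data are assumptions; the text does not state the threshold that was used.

```python
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_influential(centers, adjacency, features, threshold=0.5):
    """Absorb a center into an adjacent, similar center with a higher degree."""
    # Visit centers from highest to lowest degree so stronger centers survive.
    ordered = sorted(centers, key=lambda u: (-len(adjacency[u]), u))
    kept, absorbed_by = [], {}
    for u in ordered:
        host = next(
            (k for k in kept
             if k in adjacency[u]
             and jaccard(features[u], features[k]) >= threshold),
            None,
        )
        if host is None:
            kept.append(u)
        else:
            absorbed_by[u] = host
    return kept, absorbed_by


# Hypothetical centers: "a" and "b" are adjacent and similar, "c" is apart.
adjacency = {"a": {"b", "x", "y", "z"}, "b": {"a", "x"}, "c": {"x"}}
features = {"a": {1, 2, 3}, "b": {1, 2, 3, 4}, "c": {9}}
kept, absorbed_by = merge_influential(["a", "b", "c"], adjacency, features)
```

Here `b` becomes a member of the community of `a`, while `c` remains a separate center because it is not directly connected to `a`.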

4.2. Implementation of Community Detection Using ABC

In this article, influential users are considered as food sources. According to the bee optimization algorithm, in the proposed method, the employee bees are found first; these are the direct neighbors of the influential user. These users are the most likely to be similar to the influential user and can belong to the community associated with that user. Figure 4 shows an example of all employee bees associated with an influential user on a social network.

As shown in Figure 4, all neighbors of the influential user with index 6 have been selected as employee bees in the initial population. Then, according to the fitness function, only those users whose fitness value is higher than the average value are retained as employee bees by ABC. Given that the fitness function is a combination of trust and similarity between the target user and other users, the users selected by ABC are the trusted users who have the most similarity to the influential user. Selecting such users as members of the community not only reduces intracluster distances in community discovery but also increases the accuracy of the selected community.

Employee bees are evaluated according to the fitness function, and each of them has taken a value according to the degree of trust and similarity to the relevant influential user. Figure 5 shows the selected employee bees in the ABC algorithm.

As shown in Figure 5, all neighbors of the influential user with index 6 are extracted, but not all of them qualify as employee users; only those users with a high level of trust and similarity according to the fitness function are selected as employee users. Thus, according to Figure 5, these users are one step away from the influential user and have the most similarity to and the least distance from the center of the cluster. In the next step of the proposed method, onlooker bees are considered as close neighbors of employee bees whose fitness value is greater than the average of the obtained fitness values. These users have a direct relationship with employee bees and are one step further away from influential users as a food source. Onlooker users are selected in the same way as employee bees and therefore can be placed in the relevant community due to their similarity to employee bees and influential users. Figure 6 shows an example of onlooker bees based on the fitness function.

In the next step of the proposed method, scout bees are considered as close neighbors of onlooker bees. These users have a direct relationship with onlooker users and are two steps away from influential users as a food source. Scout users are selected in the same way as onlooker users and therefore can be placed in the relevant community due to their similarity to onlooker, employee, and influential users. Figure 7 shows an example of scout bees based on the fitness function.
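The layered selection of employee, onlooker, and scout users described in this section can be sketched as follows. The sketch assumes that fitness values are already available in a dictionary, that each user is evaluated only once, and that the same above-average rule is applied at every layer; these are simplifying assumptions, and the graph and fitness values are hypothetical.

```python
def build_community(center, adjacency, fitness):
    """Assign roles layer by layer, starting from an influential user."""
    roles, seen, frontier = {}, {center}, {center}
    for role in ("employee", "onlooker", "scout"):
        # Candidates are unseen neighbors of the previously selected layer.
        candidates = set()
        for u in frontier:
            candidates |= adjacency[u]
        candidates -= seen
        seen |= candidates  # each user is evaluated once (an assumption)
        if candidates:
            mean_fit = sum(fitness[u] for u in candidates) / len(candidates)
            frontier = {u for u in candidates if fitness[u] > mean_fit}
        else:
            frontier = set()
        roles[role] = frontier
    return roles


# Hypothetical neighborhood of the influential user "c".
adjacency = {
    "c": {"e1", "e2", "e3"},
    "e1": {"c", "o1", "o2"}, "e2": {"c"}, "e3": {"c", "o3"},
    "o1": {"e1", "s1", "s2"}, "o2": {"e1"}, "o3": {"e3"},
    "s1": {"o1"}, "s2": {"o1"},
}
fitness = {"e1": 0.9, "e2": 0.8, "e3": 0.1, "o1": 0.7, "o2": 0.3,
           "o3": 0.6, "s1": 0.6, "s2": 0.2}
roles = build_community("c", adjacency, fitness)
```

Note that `o3` is never considered, although its fitness is high, because it is reachable only through the rejected user `e3`. With a strict above-average rule, a layer with a single candidate selects no one; how the original implementation handles that case is not stated.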

4.3. Results

The ABC steps are repeated, and the employee, onlooker, and scout bees are updated in each step. Finally, after 20 generations of repetition, the community of each influential user is detected. Figure 8 shows the final clustering of influential user communities.

As shown in Figure 8, users are extracted into communities centered on influential users. According to the figure, because of the larger number of users around some influential users, two influential users without any similar users have each formed a community on their own. Of course, if a more complete dataset were available, these users might also gain community members. ABC converges and stops after 20 generations. Figure 9 shows the convergence of ABC to the optimal point.

As shown in Figure 9, ABC has been able to find the target users in each community and explore the communities of each target user due to the users’ adaptation to the problem parameters. This algorithm directs the repetition of the defined goal values in each step to the optimal one and minimizes the community discovery error.

4.4. Performance Evaluation

The quality of community discovery performance in social networks can be measured by many criteria, and the type of criteria used depends on the method used. In our proposed method, two criteria were adopted: Normalized Mutual Information (NMI) and the accuracy of community discovery based on the accuracy of user clustering. The NMI criterion, which is inspired by information theory, is used to compare the modular structure of communities discovered in the social network [47]. This criterion measures the common users between the discovered communities and similar users based on the features mentioned in the dataset. The NMI criterion is calculated in equation (5):

$$\mathrm{NMI}(S, C) = \frac{2\, I(S, C)}{H(S) + H(C)}, \tag{5}$$

where S is the set of similar users, C is the set of users in the communities, I(S, C) is the set of mutual users between these two sets, H(S) is the number of similar users, and H(C) is the number of users discovered in the communities. Figure 10 shows the NMI criteria for each of the communities discovered in the proposed method.

As shown in Figure 10, the NMI criterion in the proposed method is high for the discovered communities. Figure 10 shows that most of the users extracted for each community are similar users on the social network.
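For reference, the following sketch computes NMI in its common information-theoretic form, 2I(S, C)/(H(S) + H(C)), where I is the mutual information and H the entropy of the two label assignments. This is the textbook definition and is offered only as an illustration; the label lists are hypothetical, and whether the original implementation computes the quantities in exactly this way is not stated.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(labels_s, labels_c):
    """Mutual information between two label assignments of the same users."""
    n = len(labels_s)
    joint = Counter(zip(labels_s, labels_c))
    count_s, count_c = Counter(labels_s), Counter(labels_c)
    return sum(
        (c / n) * log(c * n / (count_s[s] * count_c[k]))
        for (s, k), c in joint.items()
    )


def nmi(labels_s, labels_c):
    """Normalized mutual information: 2 * I(S, C) / (H(S) + H(C))."""
    h_s, h_c = entropy(labels_s), entropy(labels_c)
    if h_s + h_c == 0:
        return 1.0  # both assignments consist of a single group
    return 2 * mutual_information(labels_s, labels_c) / (h_s + h_c)


# Hypothetical ground-truth groups and two candidate community assignments.
truth = [0, 0, 1, 1]
same_partition = nmi(truth, ["x", "x", "y", "y"])
unrelated_partition = nmi(truth, [0, 1, 0, 1])
```

An assignment that matches the reference grouping up to relabeling yields an NMI of 1, and a statistically independent assignment yields 0.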

Another criterion used in this study to evaluate the performance of the proposed method is the set of criteria related to the confusion matrix for clustering and community discovery in social networks. This matrix, which is one of the most well-known tools for measuring clustering performance, is obtained by comparing the class of clustered samples with the class of original samples. In this paper, the clustering class of the samples is extracted based on users in the communities, and the main class includes similar users in the social network based on the published features in the dataset. This matrix consists of four elements: True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN):

(i) TP: users who have been selected as members of the community and whose features in the dataset are similar to each other
(ii) FP: users who have been selected as community members but whose features in the dataset are not similar to each other
(iii) TN: users who are not selected as members of the community and whose features in the dataset are not similar to each other
(iv) FN: users who are not selected as members of the community but whose features in the dataset are similar to each other

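Criteria for assessing the accuracy of community discovery performance are determined based on similar features in users on social networks. Therefore, the evaluation criteria based on the confusion matrix obtained from the mentioned factors are given in the following equations:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

$$\mathrm{Precision} = \frac{TP}{TP + FP},$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}.$$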

In general, the Precision criterion is defined as the ratio of correctly detected users in communities to the total number of users detected in communities. The Recall criterion is defined as the ratio of correctly detected users to the total number of correctly detected users and members of the community who are not detected.

The F-measure performance evaluation criterion is defined to combine the two criteria, Precision and Recall, into a single criterion. The resulting single value makes comparisons between algorithms over the entire dataset simple. The F-measure criterion is defined as follows:

$$F\text{-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
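The confusion-matrix criteria can be computed from the four counts as in the following sketch, which uses the standard definitions of Accuracy, Precision, Recall, and F-measure. The function name, the zero-division convention, and the counts in the example are assumptions chosen for illustration and are not results from the paper.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, Precision, Recall, and F-measure from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # Harmonic mean of Precision and Recall; 0.0 when both are zero (assumed).
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for a single community.
metrics = confusion_metrics(tp=90, fp=10, tn=95, fn=5)
```

With these counts, Precision is 90/100, Recall is 90/95, Accuracy is 185/200, and the F-measure is 180/195.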

Table 2 shows the values of the evaluation criteria for the proposed method. These values are the averages of the values obtained for each community.

According to Table 2, it can be seen that the proposed method has good average values for evaluation criteria, and this shows the good performance of ABC in community detection in social networks. Figure 11 shows a curve of the accuracy criterion for the communities discovered in ABC.

As shown in Figure 11, the accuracy of ABC for community discovery is a high value with an average of 95.33% due to the agreement of ABC parameters with the similarity structure of users in the social network. This high value for the accuracy of the community detection method indicates the high similarity of users in terms of the level of trust and social characteristics with influential users in the network.

As shown in Figure 12, the Recall criterion has a high value in ABC-based community discovery concerning the proper use of it, with an average of 98.62%. Based on the calculated Recall, it can be said that the method could achieve a low intracluster distance and a high intercluster distance between users in the social network.

Figure 13 shows the curve for the Precision criterion for the communities discovered in the proposed method. As shown in this figure, the Precision criterion in ABC for community detection is 87.9% on average. This value for the Precision of the community discovery method indicates that only a small percentage of users could not be discovered and clustered into communities by the proposed method. In other words, the percentage of undetected users who were members of a community is minimal.

Finally, the calculated F-measure of the proposed method based on the number of communities with an average of 92.62% is presented in Figure 14. This criterion balances the sensitivity and specificity of the proposed method for discovering users on a social network. This figure emphasizes the sensitivity and specificity of the proposed method in finding the most similar users in terms of the level of trust and social characteristics with influential users in the social network that are highly reliable.

4.5. Comparison with Previous Methods

After implementing and evaluating the proposed method, we compared it with other methods to validate it and to show the improvement it achieves. Due to the importance of community detection in social networks and of finding similar users in order to divide them into systematic groups in the form of similar social structures, many researchers in this field have carried out a large number of studies. For this purpose, the proposed method is compared with some previous methods [3, 48–55] in terms of NMI. This criterion is one of the most important evaluation criteria in the field of community detection in social networks. Figure 15 shows a comparison of the proposed method with other methods. In order to make the comparison as fair as possible, we use the same experimental setting as given in [46].

As shown in Figure 15, the proposed method has achieved better NMI results than the previous methods, including the ABC-based method that does not consider the degree of trust and the influential users.

In Figure 16, the proposed method is compared with previous methods in the aspect of accuracy. This criterion has shown the quality of the proposed method in detecting community users among all users in the social network concerning other methods. It refers to the effect of selecting trusted and influential users as the central points of any community in the proposed method.

The sensitivity of detected users among all users discovered as members of communities is presented in Figure 17. This figure compares the point-to-point Recall value of the proposed method with previous methods. In this figure, the calculated Recall of the proposed method is better than other methods in all the cases.

Figure 18 presents a point-to-point comparison of the proposed method with previous methods in the aspect of Precision that shows the specificity of correctly detected users among all detected users of communities. As shown in Figure 18, the proposed TCDABCF method achieved better results in terms of Precision compared to most of the previous methods especially when the number of communities increases.

Finally, the proposed method is compared with previous methods in the aspect of F-measure in Figure 19. As shown in this figure, the point-to-point comparison shows that the proposed method achieves better sensitivity and specificity in discovering users on a social network in all the cases. This supports the superiority of the proposed TCDABCF method over state-of-the-art methods.

5. Conclusion

Today, social networks play an effective role in all aspects of life, and their use in daily life is clearly evident. Many complex phenomena can be modeled as social networks, and this has become a hot research area. Relationships can be presented as social networks in many areas of real life, such as social media, biology, and scientific communication. Therefore, finding similar users in the form of communities and organizations in social networks can bring very useful applications for all researchers. Discovering communities on social media not only systematizes the search for users participating in social media but also can bring together similar users with almost identical interests, the results of which can be used in many areas. On the other hand, due to the dynamic nature of social networks, searching for similar users who reflect the same level of trust requires the use of metaheuristic search algorithms. Hence, in order to detect communities in social networks, an approach based on the artificial bee colony algorithm is presented. In the proposed method, an influential user is selected as the target in the network, and other users in the network are treated as bees. According to the social characteristics, the level of trust, and the similarity of each user to the target, a hierarchy of users is formed, and thus, a community of similar users is identified.

Using trust and similarity of users combined with the ABC algorithm makes the proposed method superior to the previous works in this area. As mentioned in the Experimental Results section, we compare this algorithm with nine similar methods on the Facebook dataset, and the results show that the proposed method has obtained values of 0.9662 and 0.9533 for NMI and accuracy, respectively, which is an improvement over state-of-the-art community detection methods.

As future work, a multiobjective artificial bee colony algorithm can be proposed that aggregates the objectives of increasing trust, increasing similarity, reducing intracluster distances, and increasing intercluster distances in order to perform community detection in social networks.

Data Availability

In this paper, the Facebook dataset stored in the Stanford University Governor’s Data Repository was used (Kilroy, Ref: D.S., and Chelsea Hejny, Facebook, Internet resource, 2017).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Zhihao Peng developed the theory and performed the computations, developed theoretical formalism, performed the experiments using the Facebook database, and analyzed the results with the support from Poria Pirozmand. Mohsen Rastgari implemented the method using the Matlab compiler with the support from Zhihao Peng and wrote the manuscript with the support from Yahya Dorostkar Navaei. Yahya Dorostkar Navaei helped Mohsen Rastgari to write the manuscript and conceived and planned the experiments. Raziyeh Daraei took the lead in revising the manuscript, inserted more related works while revising the manuscript, and proofread the final manuscript before passing it to a native proofreader. Rozita Jamili Oskouei supervised Mohsen Rastgari in implementing codes, helped Mohsen Rastgari in writing the manuscript, submitted the manuscript to the journal, and was the Iranian team leader. Poria Pirozmand contributed to sample preparation, helped Zhihao Peng in developing theoretical formalism, processed the experimental data, and was the Chinese team leader. Seyed Saeid Mirkamali supervised the project and established the connection between Chinese and Iranian teams.