Complexity, Volume 2018, Article ID 8098325, 10 pages

https://doi.org/10.1155/2018/8098325

## An Extreme Learning Machine-Based Community Detection Algorithm in Complex Networks

School of Automation, Beijing Institute of Technology, No. 5 Zhongguancun South Street, Beijing 100081, China

Correspondence should be addressed to Senchun Chai; chaisc97@163.com

Received 14 February 2018; Accepted 2 July 2018; Published 6 August 2018

Academic Editor: Shyam Kamal

Copyright © 2018 Feifan Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Community structure, one of the most popular properties of complex networks, has long been a cornerstone in the advance of various scientific branches. Over the past few years, a number of tools have been used in the development of community detection algorithms. In this paper, by fusing unsupervised extreme learning machines and the k-means clustering technique, we propose a novel community detection method that surpasses traditional k-means approaches in terms of precision and stability while adding very little extra computational cost. Furthermore, results of extensive experiments on computer-generated networks and real-world datasets illustrate the acceptable performance of the introduced algorithm in comparison with other typical community detection algorithms.

#### 1. Introduction

As one of the most popular research fields over the past decades, complex networks have stimulated scientific advances in various areas such as biology [1], social networks [2], epidemiology [3], computer science [4], and transportation [5]. Numerous articles have explored different types of properties in complex networks. Among these, the study of community structure, meaning that vertices in a given network are inherently segregated into groups whose internal connections are relatively denser than their external ones, has been one of the most popular [6]. Finding such divisions of nodes in networks, which is called community detection or network clustering, is a hot spot for investigators because it is a good means to uncover the underlying semantic structure, mechanisms, and dynamics of certain networks [7]. Using such extracted information, internet service providers (ISPs) could set up a dedicated mirror server for intense web visits from the same geographic region to improve their customers' internet surfing experiences [8], and online retailers could provide more efficient recommendations to customers, creating a friendlier purchase environment [9].

To address the community detection problem, researchers have developed numerous algorithms. Social network scientists used to solve this problem with traditional methods such as graph partitioning, hierarchical clustering, partitional clustering, and spectral clustering [7, 10]. Girvan and Newman proposed the first divisive algorithm, named after them, which is a historical milestone because it drew more physicists and computer scientists into this field [11]. Divisive algorithms use the concept of betweenness as a criterion to judge how often an edge participates in a graph process and break up connections one by one to determine the most significant community structure [10]. A byproduct of Girvan and Newman's algorithm, called modularity Q, a quality function originally proposed as a criterion to decide when to stop the calculation, is another landmark that supports clustering methods focusing on the modularity optimization problem [12]. Although it has been proven impossible to enumerate all feasible divisions to determine the best one in deterministic polynomial time (the problem is NP-hard) [13], many approximate optimization techniques, including greedy algorithms [14], random walks [15], the fast unfolding algorithm [16], an information-theoretic framework [17], belief propagation [18], extremal optimization [19], simulated annealing [20], and genetic algorithms [21], have been deployed to solve the problem. Along with these optimization tools, many other instruments have been brought into this field. For example, spectral algorithms explore eigenvalues of the Laplacian matrices of graphs using traditional clustering techniques [22]. Similar to spectral clustering, similarity matrix factorization and blocking can also be applied [23]. Label propagation, that is, attaching labels to each node based on neighbor information, is known as a fast and effective clustering method [24].

k-means clustering has long been one of the best off-the-shelf tools, exhibiting relatively high precision and low computational complexity [25]. However, the performance of k-means relies heavily on the selection of the initial centroids; hence, many updates have been proposed to overcome this drawback. k-means++ chooses distinct initial seeds far from each other in a probabilistic manner, which leads to more stable clustering results but involves increased complexity [26]. By ranking nodes in the same manner as Google's cofounders did [27] and picking the center nodes from the highest-ranking ones, K-rank achieves small fluctuations in the community detection output, although it requires additional running time [28]. Another defect of k-means, explained by Ng et al. [29], is that it is only capable of finding clusters corresponding to convex regions. To address this problem, one could map the original data into a more suitable feature space. For example, Li et al. made use of principal component analysis (PCA) to implement k-means in a lower-dimensional space for community detection tasks [30].

The prevalence of extreme learning machines (ELM), originally proposed by Huang et al., should be largely credited to the simplicity of their implementation [31]. It has been demonstrated that, given random input weights and biases of the hidden layer, a single-layer feedforward network (SLFN) can approximate any continuous function simply by tuning the output weights [32]. As a result, the abstracted task in ELM is equivalent to a regularized least squares problem, which can be solved in closed form using the Moore-Penrose generalized inverse [33]. Recently, semisupervised and unsupervised ELMs have been developed based on the manifold regularization framework [34]. Regarding the clustering task, the unsupervised ELM can be interpreted as an embedding process that maps the input data into a low-dimensional space.
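As a reminder of why this closed form is cheap, the following toy sketch (hypothetical sizes and random targets, not the paper's data) solves the regularized least squares problem for the output weights of a supervised ELM:

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_h, n0 = 50, 10, 3

H = 1.0 / (1.0 + np.exp(-rng.normal(size=(n, n_h))))  # hidden-layer output
T = rng.normal(size=(n, n0))                          # training targets

# Closed-form regularized least squares: beta = (H^T H + c I)^-1 H^T T.
c = 0.1
beta = np.linalg.solve(H.T @ H + c * np.eye(n_h), H.T @ T)

# With c -> 0 and H of full column rank, this reduces to the
# Moore-Penrose solution pinv(H) @ T.
beta_mp = np.linalg.pinv(H) @ T
print(beta.shape)  # (10, 3)
```

A single linear solve against an n_h × n_h matrix replaces any iterative training of the hidden layer, which is where ELM's speed comes from.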

In this paper, we propose an extreme learning machine community detection (ELM-CD) algorithm based on the combination of k-means and the unsupervised ELM to fulfill the community detection task. The unsupervised ELM, inheriting the efficiency of ELM, is utilized as a mapping mechanism that transforms the adjacency matrix into a low-dimensional space, where k-means can be employed to label the groups. To avoid additional computational load, we prefer the original lightweight k-means to other reinforced editions. Extensive comparison trials on both artificial and realistic networks indicate that ELM-CD outperforms traditional k-means in terms of different precision criteria. Meanwhile, the introduced algorithm has remarkably low complexity, approaching that of k-means and below that of all other competitors evaluated.

The remainder of this article is organized as follows. Section 2 provides details of our algorithm. In Section 3, evaluations and comparisons are made in artificial and real-world networks. Finally, we conclude our work in Section 4.

#### 2. Model and Algorithm

##### 2.1. Preliminaries

We focus on an undirected network $G = (V, E)$, where $V$ represents the set of vertices, numbering $n$ in total, and $E$ represents the set of edges, with total number $m$. Neglecting self-loops, i.e., edges that start and end at the same vertex, the connections among nodes are expressed as a symmetric adjacency matrix $A = (a_{ij})_{n \times n}$:

$$a_{ij} = \begin{cases} 1, & (v_i, v_j) \in E, \\ 0, & \text{otherwise}, \end{cases}$$

where $a_{ij} = a_{ji}$ and $a_{ii} = 0$.

According to $A$, the Laplacian matrix is defined as $L = D - A$, in which $D = \operatorname{diag}(d_1, \dots, d_n)$ and $d_i = \sum_{j=1}^{n} a_{ij}$ is the degree of vertex $v_i$.
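As a concrete illustration of these definitions, the sketch below (a hypothetical four-node toy network, not taken from the paper) builds $A$, $D$, and $L$ with NumPy:

```python
import numpy as np

# Hypothetical toy network: n = 4 vertices, m = 4 edges.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Symmetric adjacency matrix A with no self-loops.
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Degree vector, degree matrix D, and Laplacian L = D - A.
d = A.sum(axis=1)   # vertex degrees: [2, 2, 3, 1]
D = np.diag(d)
L = D - A

# Every row of L sums to zero by construction.
print(L.sum(axis=1))  # [0. 0. 0. 0.]
```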

A community detection algorithm segregates $V$ into $K$ mutually exclusive districts by attaching a label to each node to indicate which group it belongs to. Because $A$ is symmetric, each row or column of $A$ can be considered an input sample, denoted by $x_i \in \mathbb{R}^n$, $i = 1, \dots, n$. Instead of directly assigning each node a community label, ELM-CD first embeds $A$ into a smaller matrix $E \in \mathbb{R}^{n \times n_0}$ in the $n_0$-dimensional feature space, where k-means clustering proceeds to output a vector $y$ whose entries are the community labels of the nodes.

##### 2.2. Embedding Process

Following the universal semisupervised and unsupervised ELM framework [34], given $n_h$ neurons in the hidden layer, we define $f(x) \in \mathbb{R}^{n_0}$ as the output vector of the ELM with respect to the input vector $x$:

$$f(x) = \sum_{i=1}^{n_h} \beta_i\, g\!\left(w_i^{\top} x + b_i\right),$$

in which $w_i \in \mathbb{R}^n$ and $b_i$ represent the input weights and bias of the $i$th neuron in the hidden layer, respectively; $\beta_i \in \mathbb{R}^{n_0}$ is the output weight from the $i$th neuron to the $n_0$-dimensional output elements; and $g(\cdot)$ is the sigmoid function

$$g(z) = \frac{1}{1 + e^{-z}}.$$
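Under these definitions, the hidden-layer mapping can be sketched as follows (a minimal illustration with hypothetical sizes; `hidden_layer_output` is our own helper name, not from the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_layer_output(X, n_h, rng):
    """Map each row of X through n_h random hidden neurons.

    The input weights w_i and biases b_i are drawn once from a
    uniform distribution and then kept fixed, as in ELM.
    """
    n, dim = X.shape
    W = rng.uniform(-1.0, 1.0, size=(dim, n_h))  # columns are w_i
    b = rng.uniform(-1.0, 1.0, size=n_h)         # biases b_i
    return sigmoid(X @ W + b)                    # rows are h(x_i)

rng = np.random.default_rng(0)
X = rng.uniform(size=(5, 4))   # 5 samples (rows of A in ELM-CD)
H = hidden_layer_output(X, n_h=8, rng=rng)
print(H.shape)  # (5, 8); every entry lies in (0, 1)
```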

Assume that the input dataset $X$ can be divided into an unlabeled set $X_u = \{x_i\}_{i=1}^{u}$, where $u$ is the number of unlabeled nodes, and a labeled set $X_l = \{(x_i, t_i)\}_{i=1}^{l}$, where $t_i$ is the corresponding community label and $l$ is the number of labeled nodes, so that $u + l = n$. The calculation of the ELM falls into two parts. In the first part, we randomly draw the input weights and biases according to a uniform distribution. In the second, let (3) be expressed as the inner product

$$f(x) = h(x)\,\beta,$$

where $h(x) = \left[g(w_1^{\top} x + b_1), \dots, g(w_{n_h}^{\top} x + b_{n_h})\right]$ and $\beta = \left[\beta_1, \dots, \beta_{n_h}\right]^{\top} \in \mathbb{R}^{n_h \times n_0}$.

We define each row of $H = \left[h(x_1)^{\top}, \dots, h(x_n)^{\top}\right]^{\top}$ as $h(x_i)$, indicating the hidden-layer output with respect to $x_i$. The target of the second stage can be interpreted as a manifold-regularized optimization problem:

$$\min_{\beta}\; \sum_{i=1}^{l} c_i \left\| h(x_i)\,\beta - t_i \right\|^2 + \left\| \beta \right\|^2 + \lambda\, \operatorname{Tr}\!\left(F^{\top} L F\right),$$

in which $\|\cdot\|$ is the Euclidean distance, $\operatorname{Tr}(\cdot)$ calculates the trace of a matrix, $F = H\beta$ stacks the predictions $f(x_i)$ as rows, and $\lambda$ is a trade-off parameter. The first term of (8) is the loss function taking into account all labeled nodes with diverse penalty coefficients $c_i$. The second term is the classical antioverfitting regularization item that constrains the output weights to be as small as possible. The third term is the manifold regularization, where the unlabeled data come into play. Concretely speaking, the manifold regularization framework rests on the assumption that if two points on the manifold are close to each other, then they should also result in similar predicted outcomes [35]. Consider the adjacency matrix as a measure of distance: two nodes are close to (connected to) each other if $a_{ij} = 1$; otherwise they are far away from (disconnected from) each other. The manifold regularization is approximated as

$$\frac{1}{2} \sum_{i,j} a_{ij} \left\| f(x_i) - f(x_j) \right\|^2 = \operatorname{Tr}\!\left(F^{\top} L F\right).$$

This regularization penalizes large differences between the predictions of connected nodes.
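The identity behind this approximation, $\tfrac{1}{2}\sum_{i,j} a_{ij}\|f(x_i)-f(x_j)\|^2 = \operatorname{Tr}(F^{\top} L F)$, can be checked numerically on a random graph (a quick sanity sketch, not part of the original algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)
n, n0 = 6, 2

# Random symmetric adjacency matrix without self-loops, and its Laplacian.
A = rng.integers(0, 2, size=(n, n)).astype(float)
A = np.triu(A, 1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A

F = rng.normal(size=(n, n0))   # rows play the role of predictions f(x_i)

lhs = np.trace(F.T @ L @ F)
rhs = 0.5 * sum(A[i, j] * np.sum((F[i] - F[j]) ** 2)
                for i in range(n) for j in range(n))
print(np.isclose(lhs, rhs))  # True
```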

When there exist no labeled nodes in the input dataset ($l = 0$), substituting (5) into (8), the optimization formulation can be rewritten as

$$\min_{\beta}\; \|\beta\|^2 + \lambda\, \operatorname{Tr}\!\left(\beta^{\top} H^{\top} L H \beta\right), \quad \text{s.t. } (H\beta)^{\top} H\beta = I_{n_0}.$$

The additional constraint $(H\beta)^{\top} H\beta = I_{n_0}$ abides by the suggestion of Belkin and Niyogi to rule out a degenerate solution [36], where $I_{n_0}$ is an $n_0 \times n_0$ identity matrix.

Let $\gamma$ denote an eigenvalue and $v$ the corresponding eigenvector. When $n \geq n_h$ and $H$ has full column rank, it has been proven in [34] that solving (10) is equivalent to selecting the generalized eigenvectors corresponding to the $n_0 + 1$ smallest eigenvalues of the problem

$$\left(I_{n_h} + \lambda\, H^{\top} L H\right) v = \gamma\, H^{\top} H\, v.$$

To organize the matrix $\beta$, we abandon the first eigenvector because it corresponds to eigenvalue 0 and contributes little to the embedding process. Thus, given the first $n_0 + 1$ smallest eigenvalues sorted in ascending order, $\gamma_1 \leq \gamma_2 \leq \dots \leq \gamma_{n_0+1}$, and their corresponding eigenvectors $v_1, \dots, v_{n_0+1}$, the output weights are

$$\beta = \left[\tilde{v}_2, \tilde{v}_3, \dots, \tilde{v}_{n_0+1}\right],$$

in which $\tilde{v}_i = v_i / \left\| H v_i \right\|$, $i = 2, \dots, n_0 + 1$, indicates the normalized eigenvectors.
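In practice, this eigenproblem can be solved with a standard generalized symmetric eigensolver. The sketch below (our own illustration with hypothetical sizes, using SciPy's `eigh`; `output_weights` is our name) computes the output weights and the resulting embedding, assuming $n \geq n_h$ and full column rank:

```python
import numpy as np
from scipy.linalg import eigh

def output_weights(H, L, n0, lam):
    """Pick the generalized eigenvectors for the 2nd..(n0+1)th smallest
    eigenvalues of (I + lam H^T L H) v = gamma H^T H v, discarding the
    first, and normalize each as v_i / ||H v_i||."""
    n_h = H.shape[1]
    lhs = np.eye(n_h) + lam * H.T @ L @ H
    rhs = H.T @ H
    _, V = eigh(lhs, rhs)                    # eigenvalues in ascending order
    V = V[:, 1:n0 + 1]                       # drop the first eigenvector
    return V / np.linalg.norm(H @ V, axis=0)

# Hypothetical demo: surrogate hidden-layer output and a random Laplacian.
rng = np.random.default_rng(2)
n, n_h, n0 = 10, 6, 2
H = 1.0 / (1.0 + np.exp(-rng.normal(size=(n, n_h))))
A = rng.integers(0, 2, size=(n, n)).astype(float)
A = np.triu(A, 1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A

beta = output_weights(H, L, n0, lam=0.1)
E = H @ beta       # n x n0 embedding handed to k-means
print(E.shape)     # (10, 2)
```

Because each column of $E$ is $H v_i / \|H v_i\|$, the embedded columns come out with unit norm.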

If $n < n_h$, meaning that the number of input samples is smaller than the number of neurons in the hidden layer, problem (11) is underdetermined and has the following alternative formulation:

$$\left(I_{n} + \lambda\, L H H^{\top}\right) u = \gamma\, H H^{\top} u.$$

In turn, the solution is

$$\beta = H^{\top} \left[\tilde{u}_2, \tilde{u}_3, \dots, \tilde{u}_{n_0+1}\right],$$

where the normalized eigenvectors are given by $\tilde{u}_i = u_i / \left\| H H^{\top} u_i \right\|$, $i = 2, \dots, n_0 + 1$.

Finally, we substitute $\beta$ into (5) to obtain the embedding matrix $E = H\beta$, which is fed into the k-means clustering algorithm to determine the community labels $y$.

##### 2.3. Clustering Process

In this article, an implementation of the original k-means clustering [25], owing to its low computational complexity, is integrated into ELM-CD. First, ELM-CD randomly selects $K$ rows of $E$ as the initial centroids of the $K$ clusters. Second, taking the Euclidean distance as the standard, each row of $E$, represented by $e_i$, $i = 1, \dots, n$, is assigned to the cluster whose centroid is closest to it. Then, for each cluster, we calculate the mean of all its members and designate this mean vector as the new centroid. We iterate the cluster assignment and centroid update steps until no row of $E$ changes its community label.
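A minimal version of this clustering stage, run against a synthetic embedding (our own sketch; the paper's implementation details may differ), looks like:

```python
import numpy as np

def kmeans(E, K, rng, max_iter=100):
    """Plain k-means: random rows of E as initial centroids, Euclidean
    assignment, mean update, stop when no label changes."""
    n = E.shape[0]
    centroids = E[rng.choice(n, size=K, replace=False)]
    labels = np.full(n, -1)
    for _ in range(max_iter):
        # Assign every row to its nearest centroid.
        dists = np.linalg.norm(E[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Replace each centroid by the mean of its members.
        for k in range(K):
            members = E[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return labels

rng = np.random.default_rng(3)
# Two well-separated blobs standing in for an embedded network.
E = np.vstack([rng.normal(0.0, 0.1, size=(5, 2)),
               rng.normal(5.0, 0.1, size=(5, 2))])
labels = kmeans(E, K=2, rng=rng)
print(labels[:5], labels[5:])  # each blob ends up with a single label
```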

We show the entire procedure of the ELM-CD algorithm in Pseudocode 1.
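Since Pseudocode 1 is not reproduced here, the end-to-end pipeline described above can be summarized in the following sketch (our own hypothetical implementation, assuming $n \geq n_h$; names such as `elm_cd` are ours, and parameter defaults are illustrative):

```python
import numpy as np
from scipy.linalg import eigh

def elm_cd(A, K, n_h=16, n0=None, lam=0.5, seed=0):
    """Sketch of ELM-CD: embed the adjacency matrix with an
    unsupervised ELM, then cluster the embedding with plain k-means."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    n0 = n0 or K
    L = np.diag(A.sum(axis=1)) - A               # graph Laplacian

    # Random hidden layer: rows of A are the input samples.
    W = rng.uniform(-1.0, 1.0, size=(n, n_h))
    b = rng.uniform(-1.0, 1.0, size=n_h)
    H = 1.0 / (1.0 + np.exp(-(A @ W + b)))       # sigmoid activation

    # Generalized eigenproblem; drop the first eigenvector, normalize.
    _, V = eigh(np.eye(n_h) + lam * H.T @ L @ H, H.T @ H)
    V = V[:, 1:n0 + 1]
    beta = V / np.linalg.norm(H @ V, axis=0)
    E = H @ beta                                 # low-dimensional embedding

    # Plain k-means on the rows of E.
    centroids = E[rng.choice(n, size=K, replace=False)]
    labels = np.full(n, -1)
    for _ in range(200):
        d = np.linalg.norm(E[:, None, :] - centroids[None, :, :], axis=2)
        new = d.argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
        for k in range(K):
            if np.any(labels == k):
                centroids[k] = E[labels == k].mean(axis=0)
    return labels

# Hypothetical planted two-community network of 30 nodes.
rng = np.random.default_rng(4)
half = 15
P = np.block([[np.full((half, half), 0.9), np.full((half, half), 0.05)],
              [np.full((half, half), 0.05), np.full((half, half), 0.9)]])
A = (rng.uniform(size=(2 * half, 2 * half)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T
labels = elm_cd(A, K=2)
print(labels.shape)  # (30,)
```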