Abstract
The goal of network representation learning is to extract deep-level abstractions from data features, which can also be viewed as transforming high-dimensional data into low-dimensional features. Learning the mapping function between the two vector spaces is an essential problem. In this paper, we propose a new similarity index based on traditional machine learning that integrates the concepts of common neighbors, local paths, and preferential attachment. Furthermore, to apply link prediction methods to node classification, we establish an architecture named the multitask graph autoencoder. Specifically, building on structural deep network embedding, the architecture designs a high-order loss function framework that calculates node similarity from multiple angles, so that the model can compensate for the deficiency of the second-order loss function. Through parameter fine-tuning, the high-order loss function is introduced into the optimized autoencoder. Experiments show that the framework is generally applicable to the majority of classical similarity indexes.
1. Introduction
Nowadays, with the explosive growth of network data, mainstream network representation learning algorithms increasingly struggle to adapt to intricate data types. A variety of approaches have been proposed to address privacy [1] and security [2] issues. The network is the carrier of the sophisticated relationships between data. Taking social networks as an example, large websites such as Twitter and Facebook have developed steadily over a long period and now have millions of online users. The scale of user information is enormous, and the network structure is rather intricate. Thus, a mass of relationships between online users is worth exploring. By capturing the structural characteristics of real-world networks, researchers can deal efficiently with multiple data analysis tasks, such as community detection [3], link prediction [4, 5], and node classification [6]. The emergence of network representation learning [7, 8] is of vital significance to social network analysis.
In link prediction based on classical similarity indexes, the CN [9] index counts the common neighbors of a node pair to predict potential links between them. The AA [10] index down-weights common neighbors that have many connections, so that less-connected common neighbors contribute more. The Jaccard [11] index measures similarity by comparing the number of common neighbors with the size of the union of the two neighbor sets. The LP [12] index introduces third-order local paths into the algorithm as an additional factor. The Katz [13] index improves the prediction accuracy by extending the local paths of the LP index to global paths of all lengths.
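The indexes named above can be illustrated with a minimal NumPy sketch. It assumes an undirected, unweighted graph given as a dense adjacency matrix; the function name `similarity_indices` and the default values of `eps` (the LP path weight) and `beta` (the Katz attenuation, which must stay below the reciprocal of the largest eigenvalue of the adjacency matrix) are illustrative choices, not values taken from this paper.

```python
import numpy as np

def similarity_indices(A, eps=0.01, beta=0.01):
    """Return CN, AA, Jaccard, LP and Katz score matrices for adjacency A.

    Diagonal entries (a node paired with itself) carry no meaning and
    should be ignored when ranking candidate links.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    deg = A.sum(axis=1)

    # CN: (A^2)[x, y] counts the common neighbors of x and y.
    A2 = A @ A
    cn = A2

    # AA: each common neighbor z contributes 1 / log(k_z).
    safe_deg = np.where(deg > 1, deg, 2.0)  # avoid log(0) and log(1)
    inv_log = np.where(deg > 1, 1.0 / np.log(safe_deg), 0.0)
    aa = (A * inv_log) @ A

    # Jaccard: |common neighbors| / |union of the two neighbor sets|.
    union = deg[:, None] + deg[None, :] - cn
    jaccard = np.divide(cn, union, out=np.zeros_like(cn), where=union > 0)

    # LP: second-order paths plus eps-weighted third-order paths.
    lp = A2 + eps * (A2 @ A)

    # Katz: sum over all path lengths, (I - beta*A)^-1 - I.
    katz = np.linalg.inv(np.eye(n) - beta * A) - np.eye(n)

    return {"CN": cn, "AA": aa, "Jaccard": jaccard, "LP": lp, "Katz": katz}
```

On a four-node cycle, for example, the two opposite corners share both remaining nodes as common neighbors, so their CN score is 2 and their Jaccard score is 1.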
Motivated by Natural Language Processing [14], many network representation learning algorithms based on the Continuous Bag-of-Words model and Random Walk have gradually appeared. Essentially, network representation learning is a mapping technique in which each node is uniquely represented as a low-dimensional vector. By measuring the similarities between embedding vectors, these latent representations can reveal potential correlations between the entities denoted by nodes. Specifically, the low-dimensional space can expose potential links in a complex network that are hard to observe directly. Network representation learning is not only broadly employed to handle sophisticated social network tasks but can also be parallelized to reduce computation time.
Perozzi et al. [15] utilized the Random Walk mechanism to traverse the network nodes in a depth-first fashion. Given an initial node and a walk length, the algorithm samples a neighbor at random as the next node to visit and thereby builds node sequences of the specified length, which express the co-occurrence relations between nodes. The sampled sequences are then fed into the skip-gram model for training, so that the neighborhood structure of each discrete node is represented by a vector. Struc2vec [16] redefines node similarity from the perspective of spatial structure. The algorithm constructs a weighted hierarchical graph by computing the distances between node pairs at different layers and then uses the generated sequences of structurally similar nodes to learn network representations. Tang et al. [17] use gradient descent to optimize the first-order proximity and the second-order proximity separately. During training, they apply negative sampling [18] to reduce the time complexity.
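The sampling step of the random-walk approach described above can be sketched as follows. This is only an illustration under simple assumptions: the graph is a dictionary of neighbor lists, transitions are uniform, and the function name `random_walks` and its parameters are hypothetical. The skip-gram training that consumes the walks is omitted.

```python
import random

def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Sample fixed-length uniform random walks starting from every node.

    adj maps each node to a list of its neighbors.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit the start nodes in a fresh random order
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks
```

Each walk plays the role of a sentence: nodes that appear close together in many walks end up with similar vectors once the sequences are passed to a skip-gram model.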
The contributions of our paper are as follows: (1) We propose a new link prediction algorithm based on mixed local neighbors and paths, namely, MLNP. (2) To address the deficiency of the loss functions in structural deep network embedding (SDNE), we establish the multitask graph autoencoder (MTGAE) architecture, which designs a high-order loss function framework from the perspective of capturing similarity information. (3) We confirm the general effectiveness of the loss function framework on different datasets. The model flow chart is shown in Figure 1.
2. Related Works
2.1. Autoencoder
As a special form of feedforward neural network, an autoencoder [19–22] is often used for dimensionality reduction and feature learning in graph embedding. Let $A$ be an $n \times n$ adjacency matrix representing the input graph, and let $x_i$ be the adjacency vector of node $i$ (the $i$th row of $A$), which encodes the local neighborhood structure of that node. The autoencoder consists of two components: the encoder and the decoder. Specifically, the encoder maps the adjacency vector $x_i$ to a low-dimensional embedding space through several nonlinear functions and obtains the representation vector $y_i$ by compressing the graph-structured data. The decoder then maps the embedding vector to the reconstruction vector $\hat{x}_i$. During the backward pass, the reconstruction error between the input $x_i$ and the output $\hat{x}_i$ is minimized by iteratively adjusting the weight matrices. The latent representations of the successive layers are computed as follows:
$$y_i^{(1)} = \sigma\left(W^{(1)} x_i + b^{(1)}\right), \qquad y_i^{(k)} = \sigma\left(W^{(k)} y_i^{(k-1)} + b^{(k)}\right), \quad k = 2, \ldots, K,$$
where $W^{(k)}$ is the weight matrix of the $k$th layer, $y_i^{(k)}$ is the $k$th layer latent vector, $b^{(k)}$ is the bias of the $k$th layer, and $\sigma$ denotes the sigmoid nonlinear activation function.
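The layer-by-layer mapping can be illustrated with a small NumPy forward pass. This is a sketch under stated assumptions rather than the implementation used in the paper: nodes are stored as rows (so each layer computes `h @ W + b`, the transpose of the column-vector form), weights are drawn at random, and the helper names `init_layers` and `forward` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layers(sizes, seed=0):
    """Create one (weights, bias) pair per pair of consecutive layer sizes."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, (m, k)), np.zeros(k))
            for m, k in zip(sizes[:-1], sizes[1:])]

def forward(X, layers):
    """Apply every layer in turn and return all intermediate activations."""
    h = np.asarray(X, dtype=float)
    activations = []
    for W, b in layers:
        h = sigmoid(h @ W + b)
        activations.append(h)
    return activations

# Toy symmetric autoencoder 5-4-2-4-5 on a 5-node graph with no edges except self-loops.
layers = init_layers([5, 4, 2, 4, 5])
acts = forward(np.eye(5), layers)
embedding = acts[1]        # output of the last encoder layer
reconstruction = acts[-1]  # output of the last decoder layer
```

The middle activation serves as the node embedding, and the final activation is compared with the input rows when the reconstruction error is minimized.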
2.2. Structural Deep Network Embedding
In 2016, Wang et al. [23] put forward the structural deep network embedding model, which preserves network structure from two aspects. The first-order proximity captures local structure features of the network by judging whether nodes are linked by a direct edge [24], and it can be thought of as the supervised component. Meanwhile, the second-order proximity preserves global structure features by observing the differences between the neighborhood structures of nodes, and it can be regarded as the unsupervised component. The two notions of proximity describe the characteristics of the network structure from complementary viewpoints. The SDNE model assigns weights to the first-order and second-order proximity loss functions and optimizes them iteratively. The SDNE architecture is shown in Figure 2.
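The first-order loss function makes the embedding vectors $y_i$ and $y_j$ of adjacent nodes close to each other in the embedding space. The objective function is calculated as follows:
$$\mathcal{L}_{1st} = \sum_{i,j=1}^{n} a_{ij} \lVert y_i - y_j \rVert_2^2 = 2\,\mathrm{tr}\left(Y^{T} L Y\right),$$
where $\mathrm{tr}(\cdot)$ denotes the matrix trace, $a_{ij}$ is the element of the adjacency matrix $A$, $L = D - A$ is the Laplacian matrix (with $D$ the diagonal degree matrix), and $Y$ is the matrix of encoded vectors from the hidden layer.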
Intuitively, the second-order proximity compares the neighborhood structures of node pairs, and it is computed as follows:
$$\mathcal{L}_{2nd} = \sum_{i=1}^{n} \lVert (\hat{x}_i - x_i) \odot b_i \rVert_2^2 = \lVert (\hat{X} - X) \odot B \rVert_F^2,$$
where $\hat{X} - X$ is the reconstruction error, $\odot$ denotes the Hadamard product, and $b_i = \{b_{ij}\}_{j=1}^{n}$ is the vector of penalty coefficients (the $i$th row of $B$). If $a_{ij} = 0$, $b_{ij} = 1$; otherwise, $b_{ij} = \beta > 1$. Because the network is sparse, the adjacency matrix contains far more zero elements than nonzero elements. If the adjacency matrix is fed directly into SDNE, the model therefore finds it easier to reconstruct the zero elements. This is not the desired behavior, and a reasonable solution is to impose a higher penalty coefficient on the reconstruction error of the nonzero elements. The ultimate goal of the SDNE model is to jointly optimize the proximity loss functions, and the overall loss function is
$$\mathcal{L}_{mix} = \mathcal{L}_{2nd} + \alpha \mathcal{L}_{1st} + \nu \mathcal{L}_{reg},$$
where $\mathcal{L}_{reg}$ denotes the regularization term that prevents overfitting. Because of the robustness of the sparse network, the overall optimization performance is hardly affected by variations of the weighting parameters.
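As a sanity check on the two SDNE loss terms, the following NumPy sketch evaluates them for given embeddings and reconstructions. It assumes the standard SDNE form for a symmetric adjacency matrix, with penalty 1 on zero entries and `beta` on nonzero entries; the function name `sdne_losses` and the default `beta=5.0` are illustrative assumptions.

```python
import numpy as np

def sdne_losses(A, Y, X_hat, beta=5.0):
    """Return (first-order loss, second-order loss) for adjacency A.

    Y holds one embedding per row; X_hat is the reconstructed adjacency.
    """
    A = np.asarray(A, dtype=float)
    Y = np.asarray(Y, dtype=float)
    X_hat = np.asarray(X_hat, dtype=float)

    # First-order: sum_ij a_ij * ||y_i - y_j||^2 = 2 * tr(Y^T L Y), L = D - A.
    laplacian = np.diag(A.sum(axis=1)) - A
    first = 2.0 * np.trace(Y.T @ laplacian @ Y)

    # Second-order: reconstruction error, weighted more heavily on edges.
    B = np.where(A > 0, beta, 1.0)
    second = np.sum(((X_hat - A) * B) ** 2)
    return float(first), float(second)
```

A perfect reconstruction gives a second-order loss of zero, while an all-zero reconstruction is charged `beta ** 2` for every nonzero entry it misses, which is exactly the asymmetry that discourages the trivial sparse solution.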
3. Proposed Link Prediction Algorithm
In this paper, we propose the MLNP link prediction algorithm, which integrates common neighbors, high-order paths, and preferential attachment. We adjust the structural factors of the LP index by weighing prediction accuracy against computational efficiency. The calculation involves the adjacency matrix, an attenuation parameter, the degrees of the two nodes in each pair, and the sets of their neighbor nodes. By utilizing the MAARA matrix, which is based on the AA index and the RA [25] index, we highlight the importance of highly influential nodes. Specifically, the algorithm enhances the contribution of nodes with higher degree centrality to the similarity and weakens the contribution of nodes with lower degree centrality. We distinguish common neighbors with different degree centralities so that the correlations between pairwise nodes are reflected more accurately in the node similarity.
According to the theory of preferential attachment [12], the probability of a potential link between a central node and its neighbor nodes is directly proportional to the degree centrality of the central node. Furthermore, the likelihood of a link connecting a pair of nodes $x$ and $y$ is directly proportional to the product of their degrees, $k_x k_y$. To summarize, the Hadamard product of the reconstructed MAARA matrix and the PA matrix compresses the local neighborhood information, so that the properties of the nodes themselves as well as the number and influence of their common neighbors are all taken into consideration.
The above method structurally optimizes the common neighbor index, and its advantage is explained at the theoretical level. Following the idea of the global path, the weight of a high-order path decays as the number of intermediate nodes in the path increases. Intuitively, the number of second-order paths equals the number of common neighbors already discussed, so among the remaining higher-order paths the third-order path carries the highest weight. Hence, our work introduces the third-order path factor, combined with the above matrices, into the final similarity matrix so as to produce a substantial boost in prediction accuracy. The basic algorithm procedure is shown in Algorithm 1.
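The ingredients discussed in this section can be computed with a few matrix operations. The sketch below is deliberately limited to those ingredients: it uses the plain RA index as a stand-in for the degree-weighted common-neighbor term, and it does not reproduce the MAARA matrix or the exact way MLNP combines the terms, since neither is specified in the text above. The function name `mlnp_ingredients` is hypothetical.

```python
import numpy as np

def mlnp_ingredients(A):
    """Return (RA matrix, PA matrix, third-order path counts) for adjacency A."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)

    # RA: each common neighbor z contributes 1 / k_z (stand-in for MAARA).
    inv_deg = np.divide(1.0, deg, out=np.zeros_like(deg), where=deg > 0)
    ra = (A * inv_deg) @ A

    # PA: preferential attachment score k_x * k_y for every node pair.
    pa = np.outer(deg, deg)

    # Third-order paths: (A^3)[x, y] counts walks of length 3 from x to y.
    paths3 = A @ A @ A
    return ra, pa, paths3

# Element-wise (Hadamard) product of a neighbor-based matrix and the PA matrix.
A_demo = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]])
ra_demo, pa_demo, paths3_demo = mlnp_ingredients(A_demo)
local_term = ra_demo * pa_demo
```

In a full implementation, the third-order term would be scaled by the attenuation parameter before being added to the local term.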

4. Multitask Graph Autoencoder
4.1. HighOrder Loss Function
The deficiency of the second-order proximity in the SDNE model can be explained as follows. When a penalty coefficient is imposed on the nonzero elements, the only criterion for measuring similarity is whether an edge exists between a pair of nodes. In fact, the properties of common neighbors, the length of paths, and even the attenuation parameters all influence the computed similarities. The adjacency matrix only describes the observed state of the network, while the similarity matrix reveals its hidden structural similarity. For instance, two individuals who have more friends in common are more likely to become friends, even if they have not met before. In network topology, we can directly observe the explicit links but may overlook the potential links. Thus, relying on the adjacency matrix alone is too limited, and seeking the potential links inferred by the algorithm is the key to improving the capability of our model.
The high-order proximity and the second-order proximity are complementary in that they penalize matrix elements according to the hidden similarity and the explicit similarity of the network structure, respectively. Using the backpropagation algorithm, we iteratively minimize the introduced high-order loss. The reconstructed high-order loss function is defined in terms of the similarity matrix and an adjustment parameter. The adjustment parameter directly controls the fluctuation range of the similarity and constrains the reconstruction weight. We believe that it should be kept consistent with its counterpart in the second-order loss (120); otherwise, the different loss functions will be severely imbalanced. Our model has the advantage of addressing link prediction and semisupervised node classification at the same time. Specifically, we borrow the idea of link prediction, take the output similarity matrix as an intermediate product, and then feed the processed matrix into a stacked autoencoder.
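To make the idea of a similarity-weighted reconstruction loss concrete, here is one possible form, written by analogy with the second-order loss. It is purely an assumption for illustration: the penalty matrix `1 + gamma * S / max(S)`, the function name `high_order_loss`, and the default `gamma` are not taken from the paper, whose exact definition is not reproduced here.

```python
import numpy as np

def high_order_loss(X, X_hat, S, gamma=5.0):
    """Squared reconstruction error weighted by a similarity matrix S.

    Assumed penalty: 1 + gamma * S / max(S), so node pairs with a high
    similarity score are penalized more when reconstructed badly.
    """
    X = np.asarray(X, dtype=float)
    X_hat = np.asarray(X_hat, dtype=float)
    S = np.asarray(S, dtype=float)

    s_max = S.max()
    S_norm = S / s_max if s_max > 0 else np.zeros_like(S)
    penalty = 1.0 + gamma * S_norm
    return float(np.sum(((X_hat - X) * penalty) ** 2))
```

With an all-zero similarity matrix the penalty is 1 everywhere and the loss reduces to the plain squared reconstruction error, which shows how `gamma` controls the extra weight that inferred (rather than observed) links receive.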
4.2. Optimization of Autoencoder
In our experiment, we use the Keras [26] module to implement a two-layer encoder and a two-layer decoder on a CPU-enabled TensorFlow [27] backend. The layer dimensionality of our model architecture is fixed at N-256-128-256-N. Because we abandon the deep belief network [28] structure for parameter pretraining, the SGD optimizer and sigmoid activation function used by the original SDNE algorithm may cause training to stall. Instead, our architecture applies the Adam [29] algorithm with a fixed learning rate and the ReLU [30] activation function for optimization.
The Adam optimizer combines momentum (inertia retention) with per-parameter adaptive step sizes (environmental perception). Each new update direction is a linear weighting of the current gradient and the direction used in the previous round of gradient descent. The advantage of its adaptive learning rate lies in coping effectively with network sparsity. Compared with the SGD optimizer, which easily converges to a local optimum or becomes trapped at a saddle point, the Adam optimizer is recognized for accelerating convergence and keeping it stable. However, the adaptive learning rate of the Adam optimizer performs worse in fields such as object recognition and syntactic parsing. In a deep neural network (DNN), the gradient of the sigmoid activation function is very small at positions far from 0. During backpropagation, the resulting vanishing gradients may cause information loss, and computing the partial derivative, which involves division, adds to the computational cost. The ReLU activation function, in contrast, effectively alleviates this vanishing gradient issue and improves computational efficiency. To summarize, the complete algorithm is shown in Algorithm 2.
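The two points made above, the Adam update and the gradient behavior of the activations, can be checked numerically. The sketch below writes out the standard Adam update with its usual default hyperparameters and compares the derivatives of sigmoid and ReLU; the function names are illustrative, and the quadratic test problem is only a toy.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad       # running mean of gradients (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

# Toy problem: minimize f(theta) = theta^2, whose gradient is 2 * theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t, lr=0.01)
```

The derivative of sigmoid peaks at 0.25 and is already below 1e-4 at an input of 10, whereas the derivative of ReLU stays at 1 for every positive input, which is why stacking sigmoid layers shrinks gradients so quickly.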

5. Link Prediction Experiments
5.1. Datasets and Evaluation Metrics
For link prediction, we evaluate our MLNP algorithm on five classical graph-structured datasets. The basic information about the datasets is as follows.
NS [31] is a collaboration network of scientists who have published papers on the topic of complex networks. A link is present if two scientists have collaborated on a paper. PB [32] is an American political blog network that records the links between blogs extracted from websites. PPI [33] is a protein-protein interaction network. The nodes denote proteins, and the links indicate interactions between pairs of proteins. USAir [34] is an aviation network of the United States in which each node corresponds to an airport. A link connects two nodes if there is a direct air route between the corresponding airports. Router is a router [35] network on the Internet, where nodes denote routers and edges connect two routers that exchange packets directly through optical fiber or other means.
Our work adopts the widely used AUC score to evaluate the proposed MLNP algorithm on link prediction tasks. It can be interpreted as the probability that a randomly selected test link scores higher than a randomly selected nonexistent link. In contrast to the precision@k indicator, the AUC score measures the overall prediction accuracy. It is defined as follows:
$$\mathrm{AUC} = \frac{n' + 0.5\,n''}{n}.$$
Among $n$ independent comparisons, $n'$ denotes the number of times the missing link scores higher than the nonexistent link, while $n''$ denotes the number of times the two have the same score.
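The sampling-based AUC described above is straightforward to implement. The following sketch assumes that similarity scores are stored in a dictionary keyed by node pair; the function name `auc_score` and the default number of comparisons are illustrative.

```python
import random

def auc_score(scores, test_links, nonexistent_links, n_samples=10000, seed=0):
    """Estimate AUC by comparing random test links with random non-links.

    scores maps a node pair to its similarity score.
    """
    rng = random.Random(seed)
    higher = 0  # comparisons where the test link scores higher
    equal = 0   # comparisons where both scores are identical
    for _ in range(n_samples):
        s_test = scores[rng.choice(test_links)]
        s_non = scores[rng.choice(nonexistent_links)]
        if s_test > s_non:
            higher += 1
        elif s_test == s_non:
            equal += 1
    return (higher + 0.5 * equal) / n_samples
```

A score of 0.5 corresponds to random guessing, since ties count for half a win, and values closer to 1 indicate that held-out links are consistently ranked above nonexistent ones.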
5.2. Result Analysis
To guarantee a more fine-grained comparison, we randomly choose 90% of the links as the training set, and the remaining 10% of the links constitute the probe set for prediction. We summarize the link prediction results for the five datasets in Figure 3.
In comparison to other strong baselines [36], the experimental results show that the MLNP algorithm achieves the best AUC performance on three of the five datasets, with gains of 1.24%, 0.33%, and 0.3% over the best baseline. Although the prediction accuracy of our method is 0.24% lower than that of the RA index on the USAir dataset and 2.78% lower than that of the Katz index on the Router dataset, it remains competitive with the rest of the similarity indexes.
On small-scale datasets, our method outperforms all other baselines, even exceeding the Katz index, which is based on the global path. Notably, the prediction accuracy reaches 99.7% on the NS dataset. However, the algorithm achieves lower AUC than the RA index and the Katz index on large-scale datasets. There are two possible reasons. Firstly, as the diameter and the average path length of the network increase, capturing only local information, as the MLNP algorithm does, is far from enough. Secondly, the Katz index adequately preserves the global structure by traversing the network. The experiments show that the MLNP algorithm is an effective optimization of the original similarity indexes. We attribute this to the integration of multiple methods.
6. Node Classification Experiments
6.1. Datasets and Evaluation Metrics
We select two air transportation networks, Europeflight and Brazilflight, to assess the quality of the representations. Each dataset comprises nodes, links, and node labels: the Europeflight network has 399 nodes and 5995 links, and the Brazilflight network has 131 nodes and 1074 links. Both datasets divide the node labels into four categories, and the detailed statistics of the network attributes are listed in Table 1.
To ensure that the adopted similarity theory can traverse the network both locally and globally, we also calculate the degree distribution of the nodes. According to the results, although the number of network nodes is smaller, the structure information is conversely more complete because of the relatively high link density and average node degree. The exact degree distributions of the datasets are shown in Figure 4.
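We employ the popular F1-measure indicator [367] to evaluate the quality of the graph embedding representations, and the calculation method is defined as follows:
$$F_1 = \frac{2 \cdot P \cdot R}{P + R},$$
where $P$ denotes precision and $R$ denotes recall. Micro-F1 computes $F_1$ from the counts pooled over all classes, whereas Macro-F1 averages the per-class $F_1$ values.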
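For single-label multi-class node classification, the two F1 variants can be computed from per-class counts as sketched below. The function name `f1_scores` is illustrative; the per-class score uses the equivalent form 2TP / (2TP + FP + FN).

```python
def f1_scores(y_true, y_pred):
    """Return (micro-F1, macro-F1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_c, fp_c, fn_c):
        denom = 2 * tp_c + fp_c + fn_c
        return 2 * tp_c / denom if denom else 0.0

    # Macro: average the per-class scores, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts first, so every sample counts equally.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro
```

Because Macro-F1 weights all classes equally, it is the more sensitive of the two to poor performance on small classes.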
6.2. Loss Convergence Comparison of Optimizers
To check the loss convergence of the Adam optimizer and SGD optimizer, we apply the control variable method to perform 100 iterative training epochs on the premise of consistent model parameters. The simulation experiments of loss convergence are shown in Figure 5.
In this experiment, the results show that the architecture combined with the Adam optimizer converges more quickly and more stably. Under the same conditions, the Adam optimizer performs better than the SGD optimizer.
6.3. Result Analysis
We set the training batch size of our model to the total number of nodes in the network. To keep the other model parameters consistent, we configure the training parameters of the MTGAE model as shown in Table 2. Specifically, the weight parameters of the first-order and second-order proximity remain strictly constant. To counter overfitting, the autoencoder applies regularization to limit the magnitude of the weights in the fully connected neural network.
The feature learning of the network structure is insufficient when we train on too few nodes. Because the results may then be dominated by chance, we do not sample 10% or 20% of the observed links in the networks for training. Instead, as the training percentage increases from 30% to 90%, we calculate the mean value of 10 experiments at each step to compare the performance of the MTGAE (MLNP) algorithm with that of the classical SDNE algorithm. Moreover, the mainstream algorithms LINE and Node2vec [37] are chosen as benchmarks as well, and the node classification results are shown in Figures 6 and 7.
From the experimental results, we find that, across the different training proportions, the proposed MTGAE (MLNP) model applied to the Europeflight and Brazilflight networks boosts the average Micro-F1 by 2.42% and 2.25%, respectively, and the average Macro-F1 by 2.54% and 2.21%, respectively. When the training percentage reaches 90%, the algorithm learns the network representations most completely, and the improvement in node classification accuracy peaks at 5% to 6%. Whatever proportion of the data is used for training, both the Micro-F1 and the Macro-F1 of our algorithm are generally higher than those of the related algorithms. Our algorithm improves both evaluation metrics, indicating that the introduced high-order proximity captures the structure features better in the latent space and achieves good classification results. The visualizations of the two datasets are shown in Figure 8.
6.4. Horizontal Contrast of Loss Function Framework
To verify the general validity of the high-order loss function framework, we apply the same processing method used for the MLNP index to the CN index, the RA index, and the Katz index. The horizontal comparisons on the two networks are shown in Figures 9 and 10.
The results show that, whichever similarity index we introduce into the high-order loss function framework, the MTGAE model is superior to the SDNE model except for a couple of special cases on the two datasets. Only when we randomly sample 80% of the links in the Europeflight network or 50% of the links in the Brazilflight network for training does the SDNE model perform slightly better than one or two of the other models. We attribute these exceptions to the particularities of the datasets. Moreover, when we convert the MLNP index and the Katz index into high-order loss functions, the improvement in node classification is more apparent. The detailed results are shown in Table 3, where we choose the MTGAE model with the best prediction accuracy to display the improvement margin over the SDNE model. Hence, the experimental results demonstrate that the introduced high-order loss function framework is generally effective in boosting the accuracy of node classification.
7. Conclusions
In this paper, we put forward the MLNP similarity algorithm, which integrates multiple similarity theories. In addition, we establish the MTGAE architecture, which introduces the high-order loss function into an optimized autoencoder by preprocessing the similarity index. The main innovation of the MTGAE model is that it applies link prediction methods to the field of node classification. Specifically, the MLNP index from link prediction is used as an intermediate product to construct the high-order loss function. The above algorithms perform well in both link prediction and node classification. Furthermore, our work applies different similarity matrices as high-order loss functions to verify the general validity of the framework. The results demonstrate that our high-order loss function framework adapts to the majority of popular similarity indexes.
With the continuous development and innovation of deep learning, numerous deep models that use side information of nodes and edges keep emerging. However, static models can no longer satisfy the needs of a broad range of practical applications, and researchers have gradually turned their attention to dynamic graph embedding models. Although some algorithms have been proposed to address dynamic networks, efficient methods for handling multidimensional features are still lacking. The dynamic network is increasingly becoming a significant research object. Embedding the features of nodes and edges into the autoencoder architecture and building dynamic evolution models are becoming significant research directions for extending graph embedding technologies. In the future, models that address network representation learning problems have broad application prospects in areas such as recommender systems [38] and mobile computing [39].
Data Availability
The data used to support the findings of this study are publicly available.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant No. 61771072), Special Project of People’s Public Security University of China (Grant No. 2020JWCX01), and Open Project of the Key Laboratory of the Police Internet of Things Application Technology (Ministry of Public Security of China).