Special Issue: Privacy-Preserving Techniques in Deep Learning for Mobile Computing
Link Prediction and Node Classification Based on Multitask Graph Autoencoder
The goal of network representation learning is to extract deep-level abstractions from data features, which can also be viewed as transforming high-dimensional data into low-dimensional features. Learning the mapping function between the two vector spaces is the essential problem. In this paper, we propose a new similarity index, based on traditional machine learning, that integrates the concepts of common neighbors, local paths, and preferential attachment. Furthermore, to apply link prediction methods to node classification, we establish an architecture named the multitask graph autoencoder. Specifically, building on structural deep network embedding, the architecture designs a high-order loss function framework that measures node similarity from multiple angles, so that the model can make up for the deficiency of the second-order loss function. After parameter fine-tuning, the high-order loss function is introduced into the optimized autoencoder. Experiments show that the framework is generally applicable to the majority of classical similarity indexes.
1. Introduction
Nowadays, with the explosive growth of network data, mainstream network representation learning algorithms increasingly struggle to adapt to intricate data types. A variety of approaches have been proposed to address privacy and security issues. A network is the carrier of sophisticated relationships between data. Taking social networks as an example, large websites such as Twitter and Facebook have developed steadily over a long period and now have millions of online users. The scale of user information is enormous, and the network structure is rather intricate. Thus, a mass of relationships between online users is worth exploring. By capturing the structural characteristics of real-world networks, researchers can deal efficiently with multiple data analysis tasks, such as community detection, link prediction [4, 5], and node classification. The emergence of network representation learning [7, 8] is of vital significance to social network analysis.
In the field of link prediction based on classical similarity indexes, the CN index counts the common neighbors of a node pair to predict potential links between them. The AA index refines this count by down-weighting common neighbors that have a high degree. The Jaccard index measures similarity as the ratio of the number of common neighbors to the size of the union of the two neighborhoods. The LP index adds the influence of third-order local paths to the common neighbor count. The Katz index goes further and sums over paths of all lengths with decaying weights, thereby extending the local path to the global path.
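As a rough illustration of the indexes described above, the following NumPy sketch computes CN, AA, Jaccard, LP, and Katz scores from an adjacency matrix. The toy graph, the function names, and the parameter values (`eps`, `beta`) are assumptions chosen for illustration; they are not taken from the paper's experiments.

```python
import numpy as np

def common_neighbors(A):
    # Entry (x, y) of A @ A counts the neighbors shared by x and y.
    return A @ A

def adamic_adar(A):
    # Each common neighbor z contributes 1 / log(k_z), so hubs count less.
    deg = A.sum(axis=1)
    weight = np.zeros_like(deg, dtype=float)
    mask = deg > 1  # log(1) = 0; a degree-1 node cannot link two distinct nodes
    weight[mask] = 1.0 / np.log(deg[mask])
    return (A * weight) @ A

def jaccard(A):
    # |common neighbors| divided by |union of the two neighborhoods|.
    cn = A @ A
    deg = A.sum(axis=1)
    union = deg[:, None] + deg[None, :] - cn
    return np.divide(cn, union, out=np.zeros_like(cn, dtype=float), where=union > 0)

def local_path(A, eps=0.01):
    # Second-order paths plus third-order paths damped by eps (assumed value).
    A2 = A @ A
    return A2 + eps * (A2 @ A)

def katz(A, beta=0.1):
    # Closed form of sum_{l>=1} beta**l * A**l.
    # Valid only if beta < 1 / (largest eigenvalue of A).
    n = A.shape[0]
    return np.linalg.inv(np.eye(n) - beta * A) - np.eye(n)

# Toy undirected graph with edges 0-1, 0-2, 1-2, 2-3 (illustrative only).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

for name, index in [("CN", common_neighbors), ("AA", adamic_adar),
                    ("Jaccard", jaccard), ("LP", local_path), ("Katz", katz)]:
    print(name, round(float(index(A)[0, 3]), 4))
```

Nodes 0 and 3 are not linked but share neighbor 2, so every index assigns the pair a positive score; the indexes differ only in how that shared neighbor and longer paths are weighted.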
Motivated by natural language processing, many network representation learning algorithms based on the continuous bag-of-words model and random walks have appeared. Essentially, network representation learning is a mapping technique in which each node is uniquely represented as a low-dimensional vector. By measuring the similarities between embedding vectors, these latent representations can reveal potential correlations between the entities that the nodes denote. Specifically, the low-dimensional space makes it possible to visualize potential links in a complex network that are hard to observe directly. Network representation learning is not only broadly employed for sophisticated social network tasks but can also be parallelized to reduce computation time.
Perozzi et al. utilized a random walk mechanism to traverse the network nodes in a depth-first-like manner. Given an initial node and a walk length, the algorithm repeatedly samples a neighbor of the current node at random as the next node to visit, producing node sequences of the specified length that express the co-occurrence relations between nodes. The sampled sequences are then fed into the skip-gram model for training, so that the neighborhood structure of discrete nodes is represented by vectors. Struc2vec redefines node similarity from the perspective of spatial structure. The algorithm constructs a weighted hierarchical graph by computing node pair distances in different layers and then uses the generated sequences of structurally similar nodes to learn network representations. Tang et al. used gradient descent to optimize the first-order proximity and the second-order proximity separately, applying negative sampling during training to reduce the time complexity.
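The random walk sampling step described above can be sketched as follows. This is a minimal illustration that assumes an adjacency-list graph and uniform neighbor sampling; the function names, the toy graph, and the parameter values are hypothetical, and the skip-gram training that would consume the walks is not shown.

```python
import random

def random_walk(adj, start, walk_length, rng):
    # Extend the walk one uniformly sampled neighbor at a time.
    walk = [start]
    while len(walk) < walk_length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # isolated node: the walk cannot continue
            break
        walk.append(rng.choice(neighbors))
    return walk

def build_corpus(adj, num_walks, walk_length, seed=0):
    # Start num_walks walks from every node, visiting nodes in shuffled order.
    rng = random.Random(seed)
    nodes = list(adj)
    corpus = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            corpus.append(random_walk(adj, node, walk_length, rng))
    return corpus

# Toy graph with edges 0-1, 0-2, 1-2, 2-3 (illustrative only).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
corpus = build_corpus(adj, num_walks=3, walk_length=5, seed=1)
print(len(corpus), "walks; first walk:", corpus[0])
```

Each walk plays the role of a sentence and each node the role of a word, which is what lets a word-embedding model learn node vectors from the corpus.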
The contributions of our paper are as follows: (1) We propose a new link prediction algorithm that mixes local neighbors and paths, namely, MLNP. (2) To address the deficiency of the loss functions in structural deep network embedding (SDNE), we establish a multitask graph autoencoder (MTGAE) architecture, which designs a high-order loss function framework from the perspective of capturing similarity information. (3) We confirm the general effectiveness of the loss function framework on different datasets. The model flow chart is shown in Figure 1.
2. Related Works
2.1. Autoencoder
As a special form of feedforward neural network, an autoencoder [19–22] is often used for dimensionality reduction and feature learning in graph embedding. Let $S \in \mathbb{R}^{n \times n}$ be the adjacency matrix of a graph with $n$ nodes, taken as input, and let $x_i$ (the $i$th row of $S$) be the adjacency vector that carries the local neighborhood structure of node $i$. The autoencoder consists of two components: the encoder and the decoder. The encoder maps the adjacency vector $x_i$ through several nonlinear functions into a low-dimensional embedding space, compressing the graph-structured data into an approximate representation vector $y_i$. The decoder then maps the embedding vector back to a reconstruction vector $\hat{x}_i$. During the backward pass, the reconstruction error between the input $x_i$ and the output $\hat{x}_i$ is minimized by iteratively adjusting the weight matrices. The latent representations of the successive layers are computed as follows:
$$y_i^{(1)} = \sigma\left(W^{(1)} x_i + b^{(1)}\right), \qquad y_i^{(k)} = \sigma\left(W^{(k)} y_i^{(k-1)} + b^{(k)}\right), \quad k = 2, \ldots, K,$$
where $W^{(k)}$ is the weight matrix of the $k$th layer, $y_i^{(k)}$ is the $k$th layer latent vector, $b^{(k)}$ is the bias of the $k$th layer, and $\sigma$ denotes the sigmoid nonlinear activation function.
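To make the layer-by-layer computation concrete, here is a minimal NumPy sketch of the forward pass of a small autoencoder applied to the rows of an adjacency matrix. The layer sizes, the random initialization, and the toy graph are assumptions for illustration, and the code uses a row-vector convention (each row times a weight matrix) rather than the column-vector form usually written in formulas. No training is performed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layers(sizes, rng):
    # One (weight, bias) pair per layer: small random weights, zero biases.
    return [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(X, layers):
    # Row-vector convention: each row of X is one node's adjacency vector.
    activations = [X]
    for W, b in layers:
        activations.append(sigmoid(activations[-1] @ W + b))
    return activations

# Toy adjacency matrix with edges 0-1, 0-2, 1-2, 2-3 (illustrative only).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

rng = np.random.default_rng(0)
layers = init_layers([4, 3, 2, 3, 4], rng)  # encoder 4-3-2, decoder 2-3-4
activations = forward(A, layers)
embedding = activations[2]        # output of the last encoder layer
reconstruction = activations[-1]  # output of the last decoder layer
reconstruction_error = float(np.sum((reconstruction - A) ** 2))
print(embedding.shape, reconstruction.shape, round(reconstruction_error, 4))
```

Training would adjust the weights by backpropagation so that `reconstruction_error` decreases; the middle activation is then used as the node embedding.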
2.2. Structural Deep Network Embedding
In 2016, Wang et al. put forward the structural deep network embedding (SDNE) model, which preserves network structure in two respects. The first-order proximity captures local structure features of the network by judging whether nodes are linked by a direct edge, which can be thought of as the supervised component. Meanwhile, the second-order proximity preserves global structure features by observing the differences between the neighborhood structures of nodes, which can be regarded as the unsupervised component. The two concepts of proximity describe the characteristics of the network structure from complementary viewpoints. The SDNE model assigns separate weights to the first-order and second-order proximity loss functions for iterative optimization. The SDNE architecture is shown in Figure 2.
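The first-order loss function makes the embedding vectors $y_i$ and $y_j$ of adjacent nodes $i$ and $j$ close to each other in the embedding space. The objective function is calculated as follows:
$$\mathcal{L}_{1st} = \sum_{i,j=1}^{n} s_{ij} \left\| y_i - y_j \right\|_2^2 = 2\,\mathrm{tr}\left(Y^{T} L Y\right),$$
where $\mathrm{tr}(\cdot)$ denotes the matrix trace, $s_{ij}$ is the element of the adjacency matrix, $L = D - S$ is the Laplacian matrix (with $D$ the diagonal degree matrix), and $Y$ is the matrix of encoded vectors from the hidden layer.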
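The first-order objective can be written either as a sum over node pairs (adjacency weight times squared embedding distance) or as twice the trace of Y^T L Y, where L = D - S is the graph Laplacian of a symmetric adjacency matrix S. The sketch below checks numerically that the two forms agree; the toy graph and the random embeddings are assumptions for illustration.

```python
import numpy as np

def first_order_loss_pairwise(S, Y):
    # Sum over all ordered pairs (i, j) of s_ij * ||y_i - y_j||^2.
    diff = Y[:, None, :] - Y[None, :, :]
    return float(np.sum(S * np.sum(diff ** 2, axis=2)))

def first_order_loss_trace(S, Y):
    # Same quantity via the graph Laplacian L = D - S (S must be symmetric).
    L = np.diag(S.sum(axis=1)) - S
    return float(2.0 * np.trace(Y.T @ L @ Y))

# Toy symmetric adjacency matrix and random 2-dimensional embeddings.
S = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Y = np.random.default_rng(0).normal(size=(4, 2))

print(first_order_loss_pairwise(S, Y), first_order_loss_trace(S, Y))
```

The trace form avoids the explicit double loop over node pairs, which is why it is the form normally used when the loss is implemented with matrix operations.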
Intuitively, the second-order proximity compares the neighborhood structures of node pairs, and the corresponding loss is computed as follows:
$$\mathcal{L}_{2nd} = \sum_{i=1}^{n} \left\| (\hat{x}_i - x_i) \odot b_i \right\|_2^2 = \left\| (\hat{X} - X) \odot B \right\|_F^2,$$
where, with $X = S$ the input matrix and $\hat{X}$ its reconstruction, $\hat{X} - X$ is the reconstruction error, $\odot$ denotes the Hadamard product, and $B$ is the penalty coefficient matrix with rows $b_i = \{b_{ij}\}_{j=1}^{n}$. If $s_{ij} = 0$, $b_{ij} = 1$; otherwise, $b_{ij} = \beta > 1$. Because real networks are sparse, the adjacency matrix has far more zero elements than nonzero ones. If the adjacency matrix is fed directly into SDNE, the model finds it easier to reconstruct the zero elements, which is not what we want; a reasonable solution is to impose a higher penalty coefficient on the reconstruction error of the nonzero elements. The ultimate goal of the SDNE model is to jointly optimize the proximity loss functions, and the overall loss function is
$$\mathcal{L}_{mix} = \mathcal{L}_{2nd} + \alpha \mathcal{L}_{1st} + \nu \mathcal{L}_{reg},$$
where $\mathcal{L}_{reg}$ denotes the regularization term to avoid overfitting. Because of the robustness of the sparse network, the overall optimization is hardly affected by variations of the weighting parameters.
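The effect of the penalty coefficient on nonzero elements can be seen in a small NumPy sketch. It assumes the penalty matrix takes the value 1 on zero entries and a value `beta` greater than 1 on nonzero entries; the function name, the toy graph, and the value of `beta` are illustrative assumptions.

```python
import numpy as np

def second_order_loss(X, X_hat, beta):
    # Penalty matrix: beta on observed links, 1 on zero entries.
    B = np.where(X > 0, beta, 1.0)
    # Squared Frobenius norm of the penalized reconstruction error.
    return float(np.sum(((X_hat - X) * B) ** 2))

# Toy adjacency matrix with 4 undirected edges, i.e. 8 nonzero entries.
X = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# A degenerate "reconstruction" that predicts no links at all.
all_zeros = np.zeros_like(X)

print(second_order_loss(X, all_zeros, beta=1.0))  # every missed link costs 1
print(second_order_loss(X, all_zeros, beta=5.0))  # every missed link costs 25
```

Without the penalty, predicting all zeros is cheap on a sparse matrix; raising `beta` makes missing an observed link much more costly than misreconstructing a zero entry.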
3. Proposed Link Prediction Algorithm
In this paper, we propose the MLNP link prediction algorithm, which integrates common neighbors, high-order paths, and preferential attachment. We adjust the structural factors of the LP index by weighing prediction accuracy against computational efficiency. The calculation involves the adjacency matrix, an attenuation parameter, the degrees of the two nodes in a pair, and, most importantly, their neighbor nodes. By utilizing the MAARA matrix, which is based on the AA index and the RA index, we highlight the importance of highly influential nodes. Specifically, the algorithm enhances the contribution to similarity of nodes with higher degree centrality and weakens that of nodes with lower degree centrality. We distinguish common neighbors with different degree centralities to reflect the correlation between a pair of nodes more accurately.
According to the theory of preferential attachment, the probability of a potential link between a central node and its neighbors is directly proportional to the degree centrality of the central node. Likewise, the likelihood of a link connecting a pair of nodes $x$ and $y$ is directly proportional to the product of their degrees, $k_x k_y$. To summarize, the Hadamard product of the reconstructed MAARA matrix and the PA matrix compresses the local neighborhood information, so that the properties of the nodes themselves and the number and influence of their common neighbors are all taken into consideration.
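The preferential attachment (PA) matrix and its element-wise combination with a neighbor-based similarity matrix can be sketched as follows. The MAARA matrix itself is not reproduced here; a plain common-neighbor matrix stands in for it, which is an assumption made purely for illustration, as are the function names and the toy graph.

```python
import numpy as np

def preferential_attachment(A):
    # PA score of a pair (x, y) is the product of the two degrees.
    deg = A.sum(axis=1)
    return np.outer(deg, deg)

def combine(neighbor_similarity, pa):
    # Hadamard (element-wise) product of the two matrices.
    return neighbor_similarity * pa

# Toy graph with edges 0-1, 0-2, 1-2, 2-3; node degrees are 2, 2, 3, 1.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

pa = preferential_attachment(A)
cn = A @ A  # stand-in for the neighbor-based matrix (NOT the paper's MAARA)
combined = combine(cn, pa)
print(pa[0, 3], cn[0, 3], combined[0, 3])
```

The product keeps a pair's score at zero when the pair has no common neighbor, and otherwise scales the neighbor-based score by how well connected the two end nodes are.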
The above method structurally optimizes the common neighbor index, and its advantage is explained at the theoretical level. Following the idea of the global path, the weight of a high-order path decays as the number of intermediate nodes in the path increases. Intuitively, the number of second-order paths equals the number of common neighbors, which has already been accounted for, so among the remaining paths the third-order path carries the highest weight. Hence, we introduce the third-order path factor, combined with the above matrices, into the final similarity matrix to improve prediction accuracy. The basic procedure is shown in Algorithm 1.
4. Multitask Graph Autoencoder
4.1. High-Order Loss Function
The deficiency of the second-order proximity in the SDNE model can be explained as follows. When a penalty coefficient is imposed on nonzero elements, the only criterion for measuring similarity is whether an edge exists between a pair of nodes. In fact, the properties of common neighbors, the lengths of paths, and even the attenuation parameters all affect the computed similarity. The adjacency matrix describes only the observed state of the network, while the similarity matrix reveals its hidden structural similarity. For instance, two individuals who have more friends in common are more likely to become friends, even if they have never met. In a network topology, we can directly observe the explicit links but may overlook the potential ones. Relying on the adjacency matrix alone is therefore too narrow; seeking the potential links inferred by the algorithm is the key to improving the capability of our model.
The high-order proximity and the second-order proximity are complementary: they penalize matrix elements according to the hidden similarity and the explicit similarity of the network structure, respectively. Using the backpropagation algorithm, we iteratively minimize the introduced high-order loss. The reconstructed high-order loss function is defined in terms of the similarity matrix and an adjustment parameter. This parameter directly controls the fluctuation range of the similarity and constrains the reconstruction weight. We believe that it should be kept in a consistent range (1-20); otherwise, the different loss functions become extremely imbalanced. Our model has the advantage of addressing link prediction and semisupervised node classification at the same time. Specifically, we borrow the idea of link prediction, treat the output similarity matrix as an intermediate product, and then feed the processed matrix into a stacked autoencoder.
4.2. Optimization of Autoencoder
In our experiments, we use the Keras module to implement a two-layer encoder and a two-layer decoder on a CPU-only TensorFlow backend. The layer dimensionality of our model architecture is fixed at N-256-128-256-N. Because we abandon the deep belief network structure for parameter pretraining, the SGD optimizer and sigmoid activation function used by the original SDNE algorithm may cause training to stall. Instead, our architecture applies the Adam algorithm with a fixed learning rate and the ReLU activation function for optimization.
The Adam optimizer combines momentum with per-parameter adaptation. Each update direction is a linear weighting of the current gradient and the direction used in the previous update. The advantage of its adaptive learning rate lies in coping effectively with network sparsity. Compared with the SGD optimizer, which easily converges to a local optimum or becomes trapped at a saddle point, the Adam optimizer is recognized for accelerating convergence and keeping it stable. However, the adaptive learning rate of the Adam optimizer has been reported to perform worse in object recognition and syntactic parsing. In a deep neural network (DNN), the gradient of the sigmoid activation function is very small far from 0. During backpropagation, this can cause information loss through vanishing gradients, and computing the partial derivative, which involves division, may increase the running time of the algorithm. The ReLU activation function, however, effectively alleviates the vanishing gradient issue and improves computational efficiency. To summarize, the complete algorithm is shown in Algorithm 2.
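The two ingredients discussed above can be illustrated with a short NumPy sketch: one Adam update with the commonly used default hyperparameters, and a comparison of the sigmoid and ReLU gradients far from zero. The objective (a sum of squares), the starting point, and the function names are assumptions for illustration; this is not the training code of the model.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and of its square.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    # Bias correction for the zero initialization of m and v.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # Per-parameter step: the gradient scale is divided out.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    return np.where(z > 0, 1.0, 0.0)

# Minimize sum(theta**2); the two coordinates have very different gradients.
theta = np.array([1.0, 100.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
grad = 2.0 * theta
theta_1, m, v = adam_step(theta, grad, m, v, t=1)

print("first Adam step:", theta - theta_1)
print("sigmoid gradient at z=10:", sigmoid_grad(10.0))
print("ReLU gradient at z=10:", relu_grad(10.0))
```

On the first step both coordinates move by roughly the learning rate even though their gradients differ by a factor of 100, which is the adaptive behavior plain SGD lacks. The sigmoid gradient at z = 10 is close to zero while the ReLU gradient stays at 1, which is the vanishing gradient contrast described in the text.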
5. Link Prediction Experiments
5.1. Datasets and Evaluation Metrics
For link prediction, we evaluate our MLNP algorithm on five classical graph-structured datasets. The fundamental information of datasets is introduced as follows.
NS is a collaboration network of scientists who have published papers on the topic of complex networks. A link is present if two scientists have coauthored a paper. PB is an American political blog network that records the hyperlinks between blogs. PPI is a protein-protein interaction network: the nodes denote proteins, and the links indicate interactions between pairs of proteins. USAir is an aviation network of the United States in which each node corresponds to an airport, and two nodes are connected if there is a direct air route between the airports. Router is a router network on the Internet, where nodes denote routers and an edge connects two routers that exchange packets directly through optical fiber or other means.
We adopt the widely used AUC score to evaluate the proposed MLNP algorithm on link prediction tasks. It can be interpreted as the probability that a randomly selected test link scores higher than a randomly selected nonexistent link. In contrast to the precision@k indicator, the AUC score measures the prediction accuracy as a whole. It is defined as follows:
$$\mathrm{AUC} = \frac{n' + 0.5\, n''}{n}.$$
Among $n$ independent comparisons, $n'$ denotes the number of times the missing link scores higher than the nonexistent link, while $n''$ denotes the number of times the two have the same score.
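The sampling-based AUC described above can be sketched in a few lines. The score dictionary, the link lists, and the function name are hypothetical; a higher score gets full credit and a tie gets half credit.

```python
import random

def auc_score(scores, test_links, nonexistent_links, n_comparisons, seed=0):
    # Compare a random test link with a random nonexistent link, n times.
    rng = random.Random(seed)
    higher = equal = 0
    for _ in range(n_comparisons):
        pos = rng.choice(test_links)
        neg = rng.choice(nonexistent_links)
        if scores[pos] > scores[neg]:
            higher += 1
        elif scores[pos] == scores[neg]:
            equal += 1
    return (higher + 0.5 * equal) / n_comparisons

# Hypothetical similarity scores for two held-out links and two non-links.
scores = {(0, 3): 0.9, (1, 3): 0.8, (0, 4): 0.1, (1, 4): 0.1}
test_links = [(0, 3), (1, 3)]
nonexistent_links = [(0, 4), (1, 4)]

print(auc_score(scores, test_links, nonexistent_links, n_comparisons=1000))
```

A predictor that assigns every pair the same score obtains 0.5 under this definition, so values above 0.5 indicate performance better than chance.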
5.2. Result Analysis
For the comparison, we randomly choose 90% of the links as the training set, and the remaining 10% of the links constitute the probe set for prediction. The link prediction results for the five datasets are summarized in Figure 3.
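A random split of the observed links into a training set and a probe set can be sketched as follows. The function name, the seed, and the toy edge list are assumptions for illustration; the sketch does not check that the training graph stays connected.

```python
import random

def split_links(edges, train_fraction=0.9, seed=0):
    # Shuffle a copy of the edge list, then cut it at the requested fraction.
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]

# Toy edge list: a path graph with 20 edges (illustrative only).
edges = [(i, i + 1) for i in range(20)]
train_links, probe_links = split_links(edges, train_fraction=0.9, seed=42)
print(len(train_links), "training links,", len(probe_links), "probe links")
```

Similarity scores are then computed from the training links only, and the probe links serve as the held-out links whose scores are compared with those of nonexistent links.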
Compared with the other strong baselines, the experimental results show that the MLNP algorithm achieves the best AUC on three datasets, exceeding the best baseline by 1.24%, 0.33%, and 0.3%. Although the prediction accuracy of our method is 0.24% lower than that of the RA index on the USAir dataset and 2.78% lower than that of the Katz index on the Router dataset, it remains competitive with the rest of the similarity indexes.
On small-scale datasets, our method outperforms all other baselines, even exceeding the Katz index, which is based on the global path. Notably, the prediction accuracy reaches 99.7% on the NS dataset. However, the algorithm obtains a lower AUC than the RA index and the Katz index on large-scale datasets. There are two possible reasons. First, as the diameter and average path length of the network increase, capturing only local information, as the MLNP algorithm does, is far from enough. Second, the Katz index preserves the global structure adequately by traversing the whole network. The experiments show that the MLNP algorithm is effective in optimizing the original similarity indexes. We attribute its efficacy to the integration of multiple methods.
6. Node Classification Experiments
6.1. Datasets and Evaluation Metrics
We select two air transportation networks, Europe-flight and Brazil-flight, to assess the quality of the learned representations. Each dataset comprises nodes, links, and node labels: the Europe-flight network has 399 nodes and 5995 links, and the Brazil-flight network has 131 nodes and 1074 links. Both datasets divide node labels into four categories, and the detailed statistics of the network attributes are given in Table 1.
To check that the adopted similarity theory can traverse the network both locally and globally, we also calculate the degree distribution of the nodes. According to the results, although the number of network nodes is smaller, the structural information is nonetheless more intact owing to the relatively high link density and average node degree. The degree distributions of the datasets are shown in Figure 4.
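The average node degree and link density of the two networks follow directly from their node and link counts. The short sketch below assumes undirected simple graphs (no self-loops or parallel edges); the function name is illustrative.

```python
def graph_stats(n_nodes, n_links):
    # Each undirected link contributes to the degree of two nodes.
    average_degree = 2.0 * n_links / n_nodes
    # Fraction of the n(n-1)/2 possible node pairs that are linked.
    density = 2.0 * n_links / (n_nodes * (n_nodes - 1))
    return average_degree, density

for name, n_nodes, n_links in [("Europe-flight", 399, 5995),
                               ("Brazil-flight", 131, 1074)]:
    average_degree, density = graph_stats(n_nodes, n_links)
    print(f"{name}: average degree {average_degree:.2f}, density {density:.4f}")
```

Under these assumptions, the Europe-flight network has an average degree of about 30.05 and a density of about 0.0755, and the Brazil-flight network has an average degree of about 16.40 and a density of about 0.1261.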
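We employ the popular F1-measure indicator to evaluate the quality of the graph embedding representations. It is defined as follows:
$$F_1 = \frac{2 \cdot P \cdot R}{P + R},$$
where $P$ and $R$ denote precision and recall. Micro-F1 computes $P$ and $R$ from counts pooled over all classes, whereas Macro-F1 averages the per-class $F_1$ scores.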
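Micro-F1 and Macro-F1 for a single-label, multi-class task can be computed as in the following sketch. The labels and predictions are made up for illustration, and the function name is an assumption; the four classes mirror the four label categories of the datasets.

```python
import numpy as np

def f1_scores(y_true, y_pred, n_classes):
    # Per-class true positives, false positives, and false negatives.
    tp = np.zeros(n_classes)
    fp = np.zeros(n_classes)
    fn = np.zeros(n_classes)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    # F1 = 2PR / (P + R) simplifies to 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    per_class = np.divide(2 * tp, denom, out=np.zeros(n_classes), where=denom > 0)
    macro = float(per_class.mean())  # unweighted mean over classes
    micro = float(2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum()))  # pooled counts
    return micro, macro

# Made-up labels for eight nodes in four classes.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 0, 2, 2, 3, 1]
micro, macro = f1_scores(y_true, y_pred, n_classes=4)
print(round(micro, 4), round(macro, 4))
```

In the single-label setting Micro-F1 coincides with plain accuracy, while Macro-F1 gives every class the same weight and is therefore more sensitive to errors on small classes.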
6.2. Loss Convergence Comparison of Optimizers
To compare the loss convergence of the Adam optimizer and the SGD optimizer, we hold all other model parameters fixed and train for 100 epochs with each optimizer. The loss convergence curves are shown in Figure 5.
The results show that the architecture with the Adam optimizer converges more quickly and more stably. Under the same conditions, the Adam optimizer clearly performs better than the SGD optimizer.
6.3. Result Analysis
We set the training batch size of our model to the total number of nodes in the network. To keep the other model parameters consistent, we configure the training parameters of the MTGAE model as shown in Table 2. In particular, the weight parameters of the first-order and second-order proximity are held constant. To counter overfitting, the autoencoder applies regularization to constrain the weights of the fully connected neural network.
The network structure is learned insufficiently when we train on too few nodes. To avoid the unstable results that may arise, we do not sample 10% or 20% of the observed links for training. Instead, as the training percentage increases from 30% to 90%, we take the mean of 10 experiments at each setting to compare the performance of the MTGAE (MLNP) algorithm with that of the classical SDNE algorithm. The mainstream algorithms LINE and node2vec are chosen as benchmarks as well, and the node classification results are shown in Figures 6 and 7.
Across the different training proportions, the proposed MTGAE (MLNP) model boosts the average Micro-F1 by 2.42% and 2.25% on the Europe-flight and Brazil-flight networks, respectively, and the average Macro-F1 by 2.54% and 2.21%, respectively. When the training percentage reaches 90%, the algorithm learns the network representations most fully, and the improvement in node classification accuracy peaks at 5%-6%. Whatever the proportion of the training set, both the Micro-F1 and the Macro-F1 of our algorithm are generally higher than those of the related algorithms. That our algorithm improves both evaluation metrics indicates that the introduced high-order proximity captures the structural features better in the latent space and achieves good classification results. The visualizations of the two datasets are shown in Figure 8.
6.4. Horizontal Contrast of Loss Function Framework
To verify the general validity of the high-order loss function framework, we apply to the CN index, RA index, and Katz index the same processing used for the MLNP index. The side-by-side comparisons on the two networks are shown in Figures 9 and 10.
The results reveal that, whichever similarity index we introduce into the high-order loss function framework, the MTGAE model is superior to the SDNE model except in a couple of special cases on the two datasets. Only when we randomly sample 80% of the links in the Europe-flight network or 50% of the links in the Brazil-flight network for training does the SDNE model perform slightly better than one or two of the other models. The underlying cause is the particularity of the datasets. Moreover, when the MLNP index and the Katz index are converted to high-order loss functions, the improvement in node classification is more apparent. The detailed results are shown in Table 3, where we choose the MTGAE model with the best prediction accuracy to display the improvement margin over the SDNE model. Hence, the experimental results demonstrate that the introduced high-order loss function framework is generally effective in boosting the accuracy of node classification.
7. Conclusions
In this paper, we put forward the MLNP similarity algorithm, which integrates multiple similarity theories. In addition, we establish the MTGAE architecture, which introduces the high-order loss function into an optimized autoencoder by preprocessing the similarity index. The main novelty of the MTGAE model is that it applies link prediction methods to node classification. Specifically, the MLNP index from link prediction is used as an intermediate product to construct the high-order loss function. The above algorithms perform well in both link prediction and node classification. Furthermore, we apply different similarity matrices as high-order loss functions to verify the general validity of the framework. The results demonstrate that our high-order loss function framework adapts to the majority of popular similarity indexes.
With the continuous development of deep learning, numerous deep models that exploit side information of nodes and edges keep emerging. However, static models can no longer satisfy the needs of a broad range of practical applications, and researchers have gradually turned their attention to dynamic graph embedding models. Although some algorithms have been proposed to address dynamic networks, efficient methods for handling multidimensional features are still lacking. The dynamic network is becoming an increasingly significant research object. Embedding the features of nodes and edges into the autoencoder architecture and building dynamic evolution models are becoming significant research directions for extending graph embedding technologies. In the future, models that address network representation learning problems have broad application prospects in areas such as recommender systems and mobile computing.
Data Availability
The data used to support the findings of this study are publicly available.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant No. 61771072), Special Project of People’s Public Security University of China (Grant No. 2020JWCX01), and Open Project of the Key Laboratory of the Police Internet of Things Application Technology (Ministry of Public Security of China).
References
L. Yin, B. Fang, Y. Guo, Z. Sun, and Z. Tian, “Hierarchically defining Internet of Things security: from CIA to CACA,” International Journal of Distributed Sensor Networks, vol. 16, no. 1, Article ID 1550147719899374, 2020.
V. D. Blondel, J. L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics, vol. 2008, no. 10, pp. 155–168, 2008.
T. Trouillon, J. Welbl, S. Riedel, E. Gaussier, and G. Bouchard, “Complex embeddings for simple link prediction,” in Proceedings of the 33rd International Conference on Machine Learning, pp. 2071–2080, New York, New York, USA, 2016.
L. Y. Lv, C. H. Jin, and T. Zhou, “Similarity index based on local paths for link prediction of complex network,” Physical Review E, vol. 80, no. 4, pp. 211–223, 2009.
D. Zhang, J. Yin, X. Zhu, and C. Zhang, “Network representation learning: a survey,” IEEE Transactions on Big Data, vol. 6, pp. 3–28, 2018.
L. Cai, Y. Xu, T. He, T. Meng, and H. Liu, “A new algorithm of DeepWalk based on probability,” Journal of Physics: Conference Series, vol. 1069, no. 1, pp. 130–135, 2019.
P. Jaccard, “Etude comparative de la distribution florale dans une portion des Alpes et des Jura,” Bulletin of the Torrey Botanical Club, vol. 37, p. 547, 1901.
B. Perozzi, R. Alrfou, and S. Skiena, “Deepwalk: online learning of social representations,” in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710, New York, USA, 2014.
L. F. R. Ribeiro, P. H. P. Saverese, and D. R. Figueiredo, “struc2vec: learning node representations from structural identity,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 385–394, Halifax, NS, Canada, 2017.
J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, “Line: large-scale information network embedding,” in Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077, Florence, Italy, 2015.
T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” Advances in Neural Information Processing Systems, pp. 3111–3119, 2013.
I. J. Goodfellow, J. Pouget-Abadie, M. Mirza et al., “Generative adversarial networks,” Advances in Neural Information Processing Systems, vol. 3, pp. 2672–2680, 2014.
Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, “Greedy layerwise training of deep networks,” in International Conference on Neural Information Processing Systems, pp. 153–160, Stony Brook University, 2006.
M. Ranzato, Y. L. Boureau, and Y. Lecun, “Sparse feature learning for deep belief networks,” Advances in Neural Information Processing Systems, vol. 20, pp. 1185–1192, 2007.
N. K. Tomas and W. Max, Variational Graph Auto-Encoders, vol. 28, no. 3, Springer, 2016.
D. Wang, P. Cui, and W. Zhu, “Structural deep network embedding,” in Proceedings of the 22nd International Conference on Knowledge Discovery and Data Mining, pp. 1225–1234, San Francisco, CA, USA, 2016.
D. Liben-Nowell and J. Kleinberg, “The link-prediction problem for social networks,” Journal of the Association for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.
N. Ketkar, “Introduction to keras,” in Deep Learning with Python, pp. 97–111, Apress, Berkeley, CA, 2017.
M. Abadi, A. Agarwal, P. Barham et al., TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems, 2015, https://github.com/tensorflow/tensorflow.
D. P. Kingma and J. L. Ba, “Adam: a method for stochastic optimization,” in International Conference on Learning Representations (ICLR), San Diego, USA, 2015.
V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning, Toronto, Canada, 2010.
R. Ackland, Mapping the US Political Blogosphere: Are Conservative Bloggers More Prominent, Presentation to Blog Talk Downunder, Sydney, 2005, http://incsub.org/blogtalk/images/robertackland.pdf.
V. Batagelj and A. Mrvar, Pajek Datasets, http://vlado.fmf.unilj.si/pub/networks/data/default.htm.
A. Grover and J. Leskovec, “node2vec: scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864, San Francisco, CA, USA, 2016.