Abstract

Search engines and recommendation systems are essential tools for coping with information overload, and recommendation algorithms are the core of recommendation systems. Recently, graph neural network recommendation algorithms based on social networks have greatly improved recommendation quality. However, these methods pay too little attention to the heterogeneity of social networks. Ignoring the heterogeneity of the connections between users and of the interactions between users and items may seriously degrade user representations. In this paper, we propose a hierarchical attention recommendation system (HA-RS) based on a masked social network, which combines social network information with user behavior information and improves both the accuracy of recommendation and the flexibility of the network. First, the node representation in the item domain is learned through the proposed Context-NE model; then, the feature information of neighbor nodes in the social domain is aggregated through a hierarchical attention network. Together, these two steps fuse the information in the heterogeneous network (social domain and item domain). We propose a mask mechanism that addresses the cold-start problem for users and items by randomly masking some nodes in the item domain and in the social domain during training. Comprehensive experiments on four real-world datasets show the effectiveness of the proposed method.

1. Introduction

As the pace of daily life accelerates, obtaining useful information quickly saves considerable time, and recommender systems play a critical role in filtering information. Collaborative filtering [1] is the mainstream traditional recommendation algorithm, but data sparsity seriously degrades its recommendation quality. Combining collaborative filtering with neural networks [2] (e.g., CNN, RNN, and CDAE) alleviates this problem. In addition, exploiting the rich relationships in social networks [3–5] can effectively mitigate the cold-start problem [6, 7], although it exposes the system to malicious fraud by untrustworthy users.

On the one hand, most users in a social network are linked by explicit or implicit relationships that influence each other’s behavior. However, not every user or connection in a social network is trustworthy. To reduce the impact of untrusted users, a social network can be divided into many small ego-networks according to users’ trust, where the ego-network of a user u contains all the users directly or indirectly trusted by u. Adding a trust network to the recommender system can effectively counter fraud attacks. Traditional semisupervised classification methods based on attributed graphs, such as label propagation (LP) [8] and label spreading (LS) [9], can effectively classify unstructured data and have been used for community detection, where a community consists of similar nodes. The authors of [10] extend the global reputation model with a local reputation computed through an unsupervised process in the user’s ego-network. On the other hand, the spread of social media has dramatically enriched users’ social activities and produced rich social relationships; integrating these relationships has improved the quality of recommender systems and mitigated the cold-start problem. In recent years, researchers have proposed many social-network-based recommender systems [11, 12] that treat social relationships as homogeneous. However, this may restrain user representation learning in each domain, since users behave and interact differently in the two domains, which makes their representations heterogeneous [13]. Some studies [14, 15] consider this heterogeneity by learning separate latent representations and then transferring them from a source domain to a target domain for recommendation. However, learning these representations is challenging because of the inherent data sparsity in both domains.

Recent advances in deep learning, especially the development of graph convolutional networks (GCN) [16], have made it easier to aggregate features in social networks. Recommender systems based on graph convolutional neural networks [17] outperform earlier recommendation algorithms in recommendation quality; Chebyshev [18] is a classical graph convolution based on the spectral method. However, graph convolution assigns the same weight to all neighbor nodes, and its feature aggregation depends on the entire graph, which limits flexibility and generalization. The graph attention network [19] was therefore proposed; it uses an attention mechanism to compute a weighted sum of the features of neighboring nodes, with weights that depend on the nodes themselves. Nevertheless, both graph convolutional networks and graph attention networks process user nodes and item nodes in the same way, without considering the heterogeneity of the two types of nodes. Moreover, training time grows as the network becomes larger.

In this paper, we consider the heterogeneity of information and relationships in the social and item domains and propose the Context-NE model to learn representations in the item domain. Through this model, we can learn representations of users who interact with multiple items. The context contains the basic properties of the node and the user’s comments on the item, so the model can fuse the structure information and the text information in the item domain. In addition, building on graph convolutional networks, we propose a hierarchical attention mechanism and a mask mechanism, which improve the quality of recommendations as well as the generalization ability and flexibility of the network. Our main contributions can be summarized as follows:

(i) We propose a network embedding method, Context-NE, which learns the node embedding in the item domain. It represents the information of the multiple items visited by a user and obtains the user’s node representation in this domain.

(ii) We use a hierarchical attention mechanism to aggregate the node information in the social domain. First, we use the same K-head self-attention mechanism as GAT to aggregate the information of neighbor nodes; then, we use a second attention mechanism to aggregate the outputs of the K heads to obtain the final representation of the node.

(iii) We apply the mask mechanism during training and process the features of masked nodes accordingly. Newly joined nodes (e.g., new users and new items) can be regarded as masked nodes, which increases the flexibility and generalization of the network.

(iv) We conduct extensive experiments to evaluate the performance of the proposed HA-RS model on citation network datasets and the Yelp dataset, and the results show that our approach outperforms state-of-the-art baseline approaches.

The remainder of this paper is structured as follows: Section 2 discusses the related works. Section 3 gives preliminary definitions in the paper and gives a description of the problem. Section 4 introduces the method of hierarchical attention recommender system based on cross-domain social network, which is divided into four parts. The experimental results are demonstrated in Section 5. Lastly, the full study is concluded in Section 6.

2. Related Work

In this section, we review traditional recommender systems, trust recommender systems, and graph convolution recommender systems in turn. This paper builds on recent methods based on social networks.

2.1. Traditional Recommender Systems

Traditional recommender systems fall mainly into two categories. Content-based recommender systems [20] use content information about users and items, such as occupation and genre, respectively, to predict a user’s next purchase or rating of an item. Collaborative filtering models [21] infer users’ choices by mining their historical behavior data, group users by preference, and recommend items that match the interests of similar users. Traditional recommendation algorithms suffer from serious cold-start problems.

2.2. Trust Recommender System

Social network-based recommender systems [22] can mitigate the cold-start problem, but most of the information in social networks is unreliable because users differ in background and preferences. Trust networks are therefore introduced into recommender systems to counter fraud attacks. Yang et al. [23] propose a hybrid method (TrustMF) that combines a truster model and a trustee model; that is, both the users who trust the active user and those who are trusted by the user influence the user’s ratings on unknown items. Guo et al. [24] propose TrustSVD, a trust-based matrix factorization technique that takes into account both the explicit and implicit influence of ratings and trust information when predicting ratings of unknown items. Pasquale et al. [10] propose a model that integrates local and global reputation in online social networks and computes the local reputation in the user’s ego-network through an unsupervised method. They report that local reputations generally perform better than global ones, although they are not always sufficient for calculating reliable values.

2.3. Graph Convolution Recommender System

Topological relations [25] and information transmission [26] between nodes in social networks are very important for social recommendation. Fully extracting the information in the network structure is therefore a key part of a recommendation algorithm. Based on graph embedding methods and drawing on the ideas of CNN, the graph neural network (GNN) [27] was proposed to collect information from the graph structure. GNN is a compelling architecture for modeling structured data; however, it uses the same parameters in every iteration and cannot aggregate edge information. To address some of these limitations, variants of graph neural networks (e.g., GCN, DCNN, GAT, and GGNN) have gradually appeared. Instead of directly propagating all the attributes of each node as GCN and GAT do, Masked GCN [28] propagates only a portion of the attributes to the neighbors. PinSage [29] combines efficient random walks and graph convolutions to generate embeddings of nodes (i.e., items) that incorporate both graph structure and node feature information. STAR-GCN [30] can produce node embeddings for new nodes by reconstructing masked input node embeddings, which primarily tackles the cold-start problem.

Despite the compelling success achieved by these methods in general cases, they ignore the heterogeneity of social networks. It may restrain user representation learning in each respective domain since users behave and interact differently in social networks. GraphRec [31] and DiffNet [32] consider the heterogeneity of social network, but they separately learn a latent representation of users and items. Poor network flexibility and scalability is a common problem in graph-based recommendation algorithms. The main concern of this paper is to consider the heterogeneity of social networks and improve the flexibility of networks.

3. Definitions and Problem Statement

The social network used in this paper is a heterogeneous graph, which contains multiple types of nodes (users and item nodes), as well as various types of edges (e.g., the relationship between users and the interaction between users and items).

Definition 1. Social domain and item domain: we divide the social network into the social domain and the item domain, which represent the user-user connections and the user-item interactions, respectively. The item set contains m items, and the user set contains n users. Edges between users represent friend relationships, and edges between a user and an item represent their interaction. The entire network can be represented as a quadruple consisting of the user set, the item set, the user-user edges, and the user-item edges; the overall network is shown in Figure 1.

Definition 2. User embedding and item embedding: item embedding contains information such as the user’s reviews of the item, the score, and the attributes of the item (e.g., item name and type). The initial user representation includes the primary attributes of the user (e.g., name and age).

Definition 3. Social homogeneity and item homogeneity: taking a user u as an example, the users directly connected to u are its first-order neighbors, the users directly connected to a first-order neighbor are its second-order neighbors, and so on. The user tends to choose items that its first-order and second-order neighbors prefer (i.e., users and their friends have similar preferences), which is called social homogeneity. The user also tends to choose items similar to those selected in the user’s historical behavior, which is called item homogeneity. We further introduce item homogeneity and social homogeneity in Sections 4.1 and 4.2, respectively.

Definition 4. Network mask: we randomly select some nodes in the network and mask some attributes of these nodes. Both user nodes and item nodes can be masked. We describe the mask mechanism in detail in Section 4.3.

3.1. Task Description

We divide the heterogeneous network into the social domain and the item domain. First, we learn a user representation that captures user preference in the item domain; then, we aggregate the social influence in the social domain. Finally, we use the resulting user representation to recommend items.

4. Model

The HA-RS model framework is shown in Figure 2 and consists of two main parts. The first is Context-NE, the user node representation model in the item domain, shown on the left side of Figure 2. The second is the hierarchical attention embedding model, which learns the features of a user’s friends and is shown on the right side of Figure 2. The rest of this section describes the model in detail.

4.1. Context-NE

Considering item homogeneity, users tend to choose items similar to those they chose before; that is, a user’s past behavior affects the user’s next behavior. We obtain the user representation in the item domain, which captures the user’s historical preferences, through the Context-NE model. In this paper, we refer to the information attached to a user or an item (i.e., a point of interest) as its context. The user’s context consists of the primary attributes of the user: context = {user ID, user name, friend names, number of fans}. The item’s context consists of the user’s review of the item and the primary attributes of the item: context = {review, business ID, business name, city, latitude, longitude, categories}. Because user interactions in the item domain are sparse, we use an LSTM to learn a context sequence vector rather than relying on the score matrix. The context description layer transforms each word in the sequence into its word vector through the LSTM model and merges the context information (in particular, the context dependence of the review) into the sequence embeddings of the user and the item. The convolution layer then extracts local features from each sequence embedding with a set of convolution kernels of fixed size applied over a sliding window. For each kernel, the user context vector and the item context vector are obtained by convolving the kernel with the sequence embedding of the user and of the item, respectively, and adding a bias term.

The pooling layer uses mean pooling to reduce the output dimension and prevent overfitting, with the same sliding window size as the convolution layer. Pooling the convolved vectors yields the pooled user context vector and the pooled item context vector.
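The convolution and mean-pooling steps can be illustrated with a minimal Python sketch. It assumes a single kernel, a convolution stride of one, and non-overlapping pooling windows; the function names `conv1d` and `mean_pool` and all numeric values are hypothetical and are not taken from the paper.

```python
def conv1d(seq, kernel, bias=0.0):
    """Slide a (width x dim) kernel over a (length x dim) sequence embedding.

    Returns one feature per window position (stride 1, no padding).
    """
    width = len(kernel)
    out = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        total = sum(
            k * x
            for k_row, x_row in zip(kernel, window)
            for k, x in zip(k_row, x_row)
        )
        out.append(total + bias)
    return out


def mean_pool(features, window):
    """Non-overlapping mean pooling; a trailing partial window is averaged as is."""
    pooled = []
    for start in range(0, len(features), window):
        chunk = features[start:start + window]
        pooled.append(sum(chunk) / len(chunk))
    return pooled


# Toy sequence embedding: 4 tokens, 2 dimensions each (made-up values).
seq_embedding = [[1, 0], [0, 1], [1, 1], [2, 0]]
kernel = [[1, 1], [1, 1]]  # one kernel of width 2

conv_out = conv1d(seq_embedding, kernel)
pooled = mean_pool(conv_out, window=2)
```

In a full model there would be several kernels, each producing its own feature map, and the pooled maps would be concatenated into the context vector.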

The representation of each user in the item domain is obtained by combining the user’s own embedding with the embeddings of the items the user has visited, applying learned weight and bias parameters, and passing the result through the tanh function. Each item embedding is weighted by the user’s preference for that item, which is computed from the number of interactions between the user and that item and the total number of interactions between the user and all items. The more often a user interacts with the same item, the more the user likes the item; that is, the larger the preference value, the greater the proportion of the item embedding in the user representation.
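As an illustration of the preference weighting, the following Python sketch assumes that the preference for an item is the ratio of the user’s interactions with that item to the user’s total interactions, and that the weighted item embeddings are added to the user embedding before the tanh. Both choices are assumptions made for illustration, since the exact equations are not reproduced here; the scalar `weight` and `bias` stand in for the learned parameters, and all names and values are hypothetical.

```python
import math


def preference_weights(interaction_counts):
    """Map item -> count to item -> share of the user's total interactions."""
    total = sum(interaction_counts.values())
    if total == 0:
        return {item: 0.0 for item in interaction_counts}
    return {item: count / total for item, count in interaction_counts.items()}


def user_representation(user_vec, item_vecs, interaction_counts, weight=1.0, bias=0.0):
    """tanh(weight * (user_vec + sum_j p_j * item_vec_j) + bias), elementwise.

    Assumed form: the additive combination and scalar parameters are
    simplifications, not the paper's exact formula.
    """
    prefs = preference_weights(interaction_counts)
    combined = list(user_vec)
    for item, p in prefs.items():
        for d, x in enumerate(item_vecs[item]):
            combined[d] += p * x
    return [math.tanh(weight * x + bias) for x in combined]


# A user who visited "cafe" three times and "bar" once (made-up data).
counts = {"cafe": 3, "bar": 1}
item_vecs = {"cafe": [1.0, 0.0], "bar": [0.0, 1.0]}

prefs = preference_weights(counts)
rep = user_representation([0.0, 0.0], item_vecs, counts)
```

The item visited more often contributes more to the user representation, matching the intuition described above.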

4.2. Hierarchical Attention

In the social domain, a hierarchical attention mechanism and graph convolution are used to fuse the feature information of a user’s neighbors into the user embedding. The resulting user representation contains both the user’s preference information and the social influence in the network, so recommendations based on it are closer to the user’s preferences.

Learning the user representation in the social domain should take friend relationships into account, that is, the neighbor nodes in the network graph. However, it is not worthwhile to process all neighbor nodes in a large social network. A user may have many neighbors, but not every neighbor has a significant impact on the user’s preferences. For example, if a user rarely interacts with a particular neighbor, the impact of that neighbor on the user is minimal. Moreover, because the aggregation considers multi-layer neighbors, aggregating the features of all neighbor nodes would add noise to the user’s representation, and the time cost of processing all multi-layer neighbors in a large-scale network is very high.

We therefore prune the neighbor nodes. First, we calculate the node similarity between the user and each neighbor. If the user u has more than T neighbors, we sort the neighbors by similarity and keep the T most similar ones as the neighbors of u; otherwise, no neighbor is removed. Here N(u) denotes the neighbors of node u, and the similarity is given by a similarity function between two nodes. Considering only the more important neighbors reduces computation and improves the quality of the user representation, which makes the method applicable to large-scale networks.
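The pruning rule can be sketched in a few lines of Python. The text does not name the similarity function, so cosine similarity over node embeddings is used here purely as an assumption; `prune_neighbors`, `cosine_similarity`, and the toy embeddings are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def prune_neighbors(user, neighbors, embeddings, T, sim=cosine_similarity):
    """Keep the T neighbors most similar to `user` when |N(u)| > T.

    If the user has at most T neighbors, the list is returned unchanged.
    """
    if len(neighbors) <= T:
        return list(neighbors)
    ranked = sorted(
        neighbors,
        key=lambda v: sim(embeddings[user], embeddings[v]),
        reverse=True,
    )
    return ranked[:T]


embeddings = {
    "u": [1.0, 0.0],
    "a": [1.0, 0.0],   # identical direction to u
    "b": [0.0, 1.0],   # orthogonal to u
    "c": [1.0, 1.0],   # in between
}
kept = prune_neighbors("u", ["b", "c", "a"], embeddings, T=2)
unchanged = prune_neighbors("u", ["b", "c", "a"], embeddings, T=5)
```

Because the cap T bounds the neighborhood size, the cost of each later aggregation step no longer depends on how many friends a highly connected user has.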

In this paper, we assume that users’ preferences are related to their multi-order neighbors and use a graph convolution network to aggregate the features of each user’s neighbors. Specifically, the hierarchical attention mechanism assigns different weights to a user’s neighbors to indicate how strongly each neighbor influences the user. This process can be likened to the calculation of local reputation in an ego-network: the user’s multi-order neighbors play the role of the user’s friends in the ego-network, and the attention coefficient plays a role similar to the local reputation. The input of the graph convolution network is the set of user embeddings learned in the item domain, with one feature vector per user node.

The mechanism has two levels. In the first level, K attention heads each aggregate the features of the neighbor nodes. For a node and one of its neighbors, the two representations learned in the item domain are multiplied by the shared weight vector W and then concatenated as the input of a single-layer feedforward neural network with its own weight vector and bias, followed by a LeakyReLU nonlinearity with a negative-input slope of 0.2. The result represents the influence of the neighbor on the node. To make the coefficients of different neighbor nodes easy to compare, they are normalized with the softmax function, which gives the attention coefficients of the first layer. The output of each head for the node is then obtained by weighting the neighbor features with these coefficients and normalizing through the sigmoid function; it is calculated by (11).

In the second level, an aggregation attention layer combines the outputs of the K heads. An attention coefficient is computed for each head in the same way as in the first level, using the weight vector of the aggregation layer; it represents the influence of that head’s output on the aggregated representation of the node. The head outputs are weighted by these coefficients and passed through the sigmoid function to obtain the output vector of the aggregation layer, which is the final representation of the node.
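The two attention levels can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors’ implementation: each head uses a GAT-style score over the concatenated transformed features, and the head-level score is computed from each head output alone with a hypothetical vector `q`. All weights, names, and values are made up, and every node is assumed to have at least one neighbor.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def head_output(i, h, neighbors, W, a, b=0.0):
    """First level: one head attends over the neighbors of node i."""
    zi = matvec(W, h[i])
    zs = [matvec(W, h[j]) for j in neighbors[i]]
    # zi + zj concatenates the two transformed feature lists.
    scores = [leaky_relu(dot(a, zi + zj) + b) for zj in zs]
    alpha = softmax(scores)
    summed = [sum(al * zj[d] for al, zj in zip(alpha, zs)) for d in range(len(zi))]
    return [sigmoid(x) for x in summed], alpha


def hierarchical_attention(i, h, neighbors, heads, q, b=0.0):
    """Second level: attend over the K head outputs of node i."""
    outs = [head_output(i, h, neighbors, W, a)[0] for W, a in heads]
    beta = softmax([leaky_relu(dot(q, o) + b) for o in outs])
    agg = [sum(bk * o[d] for bk, o in zip(beta, outs)) for d in range(len(outs[0]))]
    return [sigmoid(x) for x in agg], beta


h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # item-domain user embeddings
neighbors = {0: [1, 2]}                    # node 0 has two (pruned) neighbors
heads = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.1, 0.2, 0.3, 0.4]),
    ([[0.5, 0.0], [0.0, 0.5]], [0.4, 0.3, 0.2, 0.1]),
]
_, alpha = head_output(0, h, neighbors, *heads[0])
rep, beta = hierarchical_attention(0, h, neighbors, heads, q=[1.0, -1.0])
```

The first-level coefficients `alpha` sum to one over the neighbors, and the second-level coefficients `beta` sum to one over the heads, so each level produces a convex combination before the sigmoid.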

4.3. Mask Mechanism

The size of the network determines the mask percentage: the larger the network, the larger the proportion of masked nodes. There are three ways to compute the features of a masked node:

(i) With a given probability, the corresponding feature of the masked node is represented by a feature randomly sampled, through the mask vector, from a neighbor node.

(ii) With one probability, the corresponding feature of the masked node is represented by a feature randomly sampled, through the mask vector, from a neighbor node; with another probability, it is represented by features randomly sampled from other nodes in the network.

(iii) With one probability, the corresponding feature of the masked node is represented by a feature randomly sampled, through the mask vector, from a neighbor node; with another probability, the node features of the masked node are left unchanged.

Masked nodes in the item domain and masked nodes in the social domain are processed analogously. In the item domain, the first operation randomly selects a neighbor of the masked node and transmits the features of that neighbor to the masked node through the mask vector, which is learned during training; the second operation randomly selects a node in the item domain and propagates its features to the masked node; the third keeps the features of the masked node. In the social domain, the first operation randomly selects a neighbor of the masked node and transmits the features of that neighbor to the masked node; the second randomly selects a node in the social domain and propagates its features to the masked node; the third keeps the original embedding of the masked node. A strategy indicator selects which of the three ways of processing masked nodes is applied.
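A minimal Python sketch of the three strategies is given below. The probability values are not reproduced in the text, so `p_neighbor` is left as a parameter; treating strategy (i) as always sampling from a neighbor and applying the mask vector by elementwise multiplication are both assumptions. The function name and the toy features are hypothetical.

```python
import random


def masked_features(node, features, neighbors, mask_vec, strategy, p_neighbor, rng):
    """Return replacement features for a masked node.

    strategy 1: always sample from a random neighbor through the mask vector.
    strategy 2: neighbor with probability p_neighbor, else a random other node.
    strategy 3: neighbor with probability p_neighbor, else keep own features.
    """
    def from_neighbor():
        source = features[rng.choice(neighbors[node])]
        # Assumed form: the mask vector gates each feature elementwise.
        return [m * x for m, x in zip(mask_vec, source)]

    if strategy == 1 or rng.random() < p_neighbor:
        return from_neighbor()
    if strategy == 2:
        others = [v for v in features if v != node]
        return list(features[rng.choice(others)])
    return list(features[node])


rng = random.Random(0)
features = {"u1": [1.0, 2.0], "u2": [3.0, 4.0], "u3": [5.0, 6.0]}
neighbors = {"u1": ["u2"]}
mask_vec = [1.0, 0.0]  # in training this vector would be learned

from_nb = masked_features("u1", features, neighbors, mask_vec, 1, 0.0, rng)
other = masked_features("u1", features, neighbors, mask_vec, 2, 0.0, rng)
kept = masked_features("u1", features, neighbors, mask_vec, 3, 0.0, rng)
```

At inference time, a newly joined user or item can be passed through the same function, which is how the mask mechanism is meant to handle cold-start nodes.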

4.4. Complexity Analysis

The time cost of the HA-RS model consists of two parts. The first is computing the users’ representations in the item domain; each user is processed once, so this cost grows with the number of users n. The second is aggregating the features of neighbor nodes in the social domain; this cost depends on the number of users, the maximum neighbor order of the aggregation, and the number of neighbors of each user. After pruning, each user has at most T neighbors. Since T is much smaller than n, and both the neighbor order and T take small values, the aggregation cost per user remains small.

5. Experiments

In this section, we conduct experiments, including transductive learning, inductive learning, an ablation study, and parameter selection, to evaluate the performance of HA-RS on four datasets. Specifically, we aim to answer the following two research questions: (1) How does HA-RS perform compared with state-of-the-art social recommendation methods? (2) How do different components (i.e., the Context-NE model, the attention model, and the mask mechanism) affect HA-RS?

5.2. Experimental Settings
5.2.1. Datasets

We use citation network datasets, which are collections of papers and the citation relationships between them, for transductive learning. Three citation networks (Cora, CiteSeer, and PubMed) are used in the experiments, and their description is given in Table 1. In each network, nodes are papers and edges are undirected citations. The node content is constructed by extracting the words from the documents.

For inductive learning, we use Yelp, an online location-based social network. The dataset contains business, review, tip, user, and check-in tables. The business table lists each restaurant’s name, geographic location, opening hours, cuisine categories, average star rating, and so forth. The review table lists the star rating given to the restaurant, the review content, the review time, and the support rate for the review. In this paper, we treat the items that a user rated higher than 3 as the items liked by that user; the resulting dataset contains 141804 users and 17625 items.
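The rating threshold can be illustrated with a short Python sketch. The `(user, item, stars)` tuple layout and the identifiers below are hypothetical and do not reflect the actual Yelp file format.

```python
def liked_items(ratings, threshold=3):
    """Group items rated strictly above `threshold` by user."""
    liked = {}
    for user, item, stars in ratings:
        if stars > threshold:
            liked.setdefault(user, set()).add(item)
    return liked


# Made-up reviews: (user, business, star rating).
ratings = [
    ("u1", "b1", 5),
    ("u1", "b2", 3),   # exactly 3 is not counted as liked
    ("u2", "b1", 4),
    ("u2", "b3", 1),
]
liked = liked_items(ratings)
```

Users whose ratings are all 3 or lower do not appear in the result, so they contribute no positive interactions under this rule.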

5.2.2. Baselines

For the transductive learning task, we compare HA-RS with various state-of-the-art baselines, including three classical models: multilayer perceptron (MLP), label propagation (LP) [8], and graph embedding (DeepWalk) [33]. We also compare our proposed model with four graph convolution based models: graph convolution with Chebyshev filters (Chebyshev) [18], graph convolutional network (GCN) [16], graph attention networks (GAT) [19], and Masked Graph Convolutional Network (Masked GCN) [28].

(i) MLP: A deep neural network with multiple layers of neurons; it cannot effectively process graph data.

(ii) DeepWalk: It uses local information from truncated random walks as input to learn a representation of vertices in a network that encodes social relations in a continuous vector space.

(iii) Chebyshev: It provides strict control over the local support of filters to extract local and stationary features through graph convolutional layers. The method is computationally more efficient because it avoids an explicit use of the graph Fourier basis.

(iv) GCN: It uses an efficient layerwise propagation rule to learn hidden layer representations; the rule is based on a first-order approximation of spectral convolutions on graphs. Its cost scales linearly in the number of graph edges.

(v) GAT: It allows for implicitly assigning different importance to different nodes within a neighborhood while dealing with different sized neighborhoods, and it does not depend on knowing the entire graph structure upfront.

(vi) Masked GCN: It propagates only a specific part of the node attributes via a mask vector learned for each node.

For the inductive learning task, we compare HA-RS with three classical recommendation models: BPR [34], the feature enhanced latent factor model FM [35], and the state-of-the-art social recommendation model TrustSVD [24]. We also compare our proposed model with four graph convolution based recommendation models: GC-MC [36], PinSage [29], GraphRec [31], and DiffNet [32]. Note that GraphRec and DiffNet consider the heterogeneity of the social network.

(i) BPR: It considers only the user-item rating information for recommendation; FM and TrustSVD improve over BPR by leveraging node features and social network information.

(ii) GC-MC: It is one of the first attempts to apply graph convolutions directly to recommendation. GC-MC defines a user-item bipartite graph from user-item interaction behavior and formulates matrix completion as a link prediction task on the bipartite graph.

(iii) PinSage: It is designed for similar-item recommendation in a large-scale recommender system, combining efficient random walks and graph convolutions to generate embeddings of nodes (i.e., items) that incorporate both graph structure and node feature information. However, this method homogenizes the nodes in the graph without taking into account the differences between user-user interactions and user-item interactions.

(iv) GraphRec: It provides a principled approach to jointly capture interactions and opinions (ratings) in the user-item graph, and the framework coherently models two graphs and heterogeneous strengths.

(v) DiffNet: It is designed to simulate how users are influenced by the recursive social diffusion process for social recommendation. This method considers the heterogeneity of social networks: it learns user embeddings and item embeddings separately and finally sums the user embedding and the embeddings of the items the user likes for prediction.

To further validate the improvement obtained by HA-RS, we designed three variant models based on HA-RS. HA-attn removes the hierarchical attention mechanism from the model, HA-mask removes the mask mechanism from the model, and RS considers only the heterogeneity of the social network, removing both the hierarchical attention mechanism and the mask mechanism.

5.2.3. Evaluation Metrics

On the citation network datasets, we measure model performance by the average classification accuracy (with standard deviation), and for the state-of-the-art techniques we reuse the results already reported in the Masked Graph Convolutional Network paper [28].

On the Yelp dataset, since we focus on recommending the top-N items for each user, we use two widely adopted ranking-based metrics: hit ratio (HR) and normalized discounted cumulative gain (NDCG). For both metrics, the larger the value, the better the performance. For the state-of-the-art techniques, we reuse the results already reported in A Neural Influence Diffusion Model for Social Recommendation [32].

HR measures how many of the items that users like in the test data are successfully predicted in the top-N ranking list. Its numerator is the number of test items that appear in each user’s top-N recommendation list, summed over all users, and its denominator is the total number of items in the test set.

NDCG considers the hit positions of the items and gives a higher score when the hit items are near the top of the list; it is computed from the rank of each positive item (i.e., each successfully predicted item) in the list.
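Both metrics can be sketched in Python as follows. The HR function follows the numerator and denominator described above; the NDCG function uses the common binary-relevance form with a logarithmic position discount, averaged over users, which is an assumption about the exact variant used. Function names and the toy data are hypothetical.

```python
import math


def hit_ratio(recommended, test_items, n):
    """Hits in each user's top-n list, summed over users, over all test items."""
    hits = sum(
        len(set(recommended[u][:n]) & set(test_items[u])) for u in test_items
    )
    total = sum(len(items) for items in test_items.values())
    return hits / total if total else 0.0


def ndcg(recommended, test_items, n):
    """Mean per-user NDCG@n with binary relevance (assumed variant)."""
    scores = []
    for u, liked in test_items.items():
        liked = set(liked)
        # rank is 0-based, so position 1 gets a discount of log2(2) = 1.
        dcg = sum(
            1.0 / math.log2(rank + 2)
            for rank, item in enumerate(recommended[u][:n])
            if item in liked
        )
        ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(liked), n)))
        scores.append(dcg / ideal if ideal else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


recommended = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]}
test_items = {"u1": ["a"], "u2": ["f", "x"]}

hr_at_3 = hit_ratio(recommended, test_items, 3)
ndcg_at_3 = ndcg(recommended, test_items, 3)
```

In this toy example, user u1 has its single liked item at the top of the list and scores a perfect NDCG, while u2 has one of two liked items at the third position and scores lower.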

5.3. Parameter Setting

During training, when learning the user representation in the item domain, we apply grid search to set the hyper-parameters. Following the Masked LM model in BERT [37], this paper sets the mask parameter to 20%. In the social domain, we aggregate the feature embeddings of neighbors up to two layers (layer = 2), and the maximum number of neighbors per user is set to 15 based on the grid search results. L2 regularization is used to prevent overfitting; its coefficient is set to 0.0005 on the citation networks and 0.001 on Yelp. User nodes in the Yelp dataset have output embeddings of a fixed dimension after passing through the Context-NE model, and the outputs of the first attention layer and of the aggregation attention layer are set to the same dimension. The first attention layer is composed of K attention heads, whose outputs are the input of the second attention layer. We use Adam as the optimizer for all models trained with gradient descent-based methods, with an initial learning rate of 0.001.

5.4. Experimental Results
5.4.1. Accuracy of Transductive Learning on Citation Network

For transductive learning, the classification accuracy on the three datasets is shown in Table 2. Table 2 shows that our model performs better than the other baselines. More specifically, the accuracy of our model on the three citation network datasets is 1.9%, 2%, and 2.1% higher than that of the GAT model, respectively, indicating that HA-RS with the mask mechanism and hierarchical attention can improve performance.

Masked GCN propagates partial attributes instead of the entire ones via a mask vector learned for each node, and it significantly improves performance compared to GCN and GAT. However, according to the Masked Graph Convolutional Network paper, the running time of Masked GCN is on average 1.24 times that of GAT, and the extra time is mainly spent on learning the mask parameters. Because HA-RS learns mask parameters for only a few nodes, it can save much of this time. Besides, our model HA-RS outperforms Masked GCN.

5.4.2. Experimental Results of Inductive Learning on Yelp

For inductive learning, the results in terms of HR and NDCG are shown in Tables 3 and 4. Table 3 shows the HR@10 and NDCG@10 results with varying output dimension size D, and we have the following observations:

(i) On both HR and NDCG, the graph convolution based recommendation models outperform the classical recommendation models. Graph convolution methods that consider network heterogeneity, such as GraphRec and DiffNet, perform better than GC-MC and PinSage. DiffNet is the best baseline, and our model HA-RS improves over DiffNet by 3% to 3.5% on HR and by 1.2% to 5.3% on NDCG as the output dimension size increases from 16 to 64. This indicates that dividing the heterogeneous social network into the item domain and the social domain, as done in this paper, yields a user representation that is better suited to recommendation.

(ii) Not all models improve as the output latent dimension size increases from 16 to 64. The BPR and FM models achieve their best performance at D = 32, and the other models achieve optimal performance at D = 64. A possible explanation is that BPR considers only the user-item rating information for recommendation, so a dimension that is too large increases the noise in the representation and lowers recommendation performance. Although FM leverages the node features and social network information, the feature extraction ability of the model is too weak.

Table 4 shows the HR@N and NDCG@N results with varying top-N recommendation size N. The observations are similar to those from Table 3, with our proposed model HA-RS always showing the best performance. HA-RS improves over DiffNet by 2.3% to 4.0% on HR and by 2.9% to 5.1% on NDCG as the top-N recommendation size increases from 5 to 15. Based on the experimental results, we can empirically conclude that our proposed HA-RS model outperforms all the baselines under different output dimensions D and different recommendation sizes N.

5.4.3. Ablation Experiment

(1) Ablation Experiment and Cold-Start Analysis. We conduct the ablation experiment on the Yelp dataset to verify the effectiveness of the model, setting the dimension size to 64 and N to 10 in top-N recommendation. The experimental results are shown in Figure 3.

According to Figures 3(a) and 3(b), HA-RS outperforms the other variants on HR and NDCG. When the hierarchical attention mechanism is removed, HR and NDCG decrease slightly, by 4.8% and 3.8%, respectively. When the mask mechanism is removed, HR and NDCG decline much more, by 10.2% and 9%, respectively. This indicates that removing either the hierarchical attention mechanism or the mask mechanism degrades the performance of the model, and that removing the mask mechanism has the more significant impact.

Besides, when both the hierarchical attention mechanism and the mask mechanism are removed, the resulting variant RS performs the worst; however, it still outperforms PinSage, which uses both social network information and node feature information. The main reason is that PinSage processes all nodes in the full graph in the same way and does not explicitly distinguish between user (board) and item (pin) nodes, so it does not consider the heterogeneity of the social network. This demonstrates the advantage of dividing the heterogeneous social graph into two domains, as our model does.

We examine the cold-start problem on the Cora dataset, setting the number of cold-start users to 0, 50, 100, and 150, respectively, and compare HA-RS with its three variants. In HA-RS and HA-attn, the cold-start users are regarded as masked nodes and their embeddings are learned through the mask vector, whereas in HA-mask and RS the cold-start users are randomly initialized. The experimental results are shown in Table 5.

For cold-start users, very little information is available for recommendation, and the recommendation result obtained by randomly initializing the user embedding is often not ideal. In methods based on social networks, the cold-start problem is alleviated by combining the information of friends. In this paper, cold-start users are regarded as masked nodes, and the user embedding is learned through the mask vector. From Table 5, we can conclude that HA-RS and HA-attn (with the mask mechanism) outperform HA-mask and RS (without the mask mechanism). Therefore, we can say that the mask mechanism can solve the cold-start problem.

(2) Parameter Selection. In this section, we mainly explore the impact of different parameters of HA-RS. We study the impact of the number of neighbor layers L in the aggregating process in the social domain, the effect of different numbers of user neighbors, and the effect of different mask proportions.

We vary the number of neighbor layers from 1 to 3 and the number of user neighbors in every layer from 5 to 20; the resulting HR and NDCG are shown in Figure 4. We can observe from Figure 4 that HA-RS performs best when the number of user neighbors is 15, and that its performance declines dramatically when the number of neighbors increases to 20. Meanwhile, aggregating 2-layer neighbors in HA-RS outperforms aggregating 1-layer or 3-layer neighbors. This is because 1-layer aggregation cannot capture the higher-order relationships between users in the social domain, whereas 3-layer aggregation may bring massive noise to the model. This observation supports the claim that aggregating the features of too many neighbors increases the noise in the user’s representation, as described in Section 4.2. Other related studies have empirically found similar trends, with the best layer size set as L = 2 [32, 38].
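The neighborhood that feeds the aggregation step can be sketched as follows. This is an illustrative assumption about how a capped, layered neighborhood might be collected, not the authors' implementation: the text states only that at most 15 neighbors per user and L = 2 layers are used, so uniform random sampling, the adjacency-dictionary layout, and the names `sample_neighbors` and `sample_l_hop` are all hypothetical.

```python
import random


def sample_neighbors(adjacency, node, max_neighbors, rng):
    """Return at most max_neighbors neighbors of node, sampled uniformly."""
    neighbors = adjacency.get(node, [])
    if len(neighbors) <= max_neighbors:
        return list(neighbors)
    return rng.sample(neighbors, max_neighbors)


def sample_l_hop(adjacency, user, num_layers=2, max_neighbors=15, seed=0):
    """Collect sampled neighbors layer by layer, starting from user.

    Returns a list with one entry per layer; entry k holds the nodes sampled
    at distance k + 1 from the user (duplicates are kept for simplicity).
    """
    rng = random.Random(seed)
    frontier = [user]
    layers = []
    for _ in range(num_layers):
        next_frontier = []
        for node in frontier:
            next_frontier.extend(
                sample_neighbors(adjacency, node, max_neighbors, rng)
            )
        layers.append(next_frontier)
        frontier = next_frontier
    return layers
```

With a cap of 15 and two layers, a user contributes at most 15 first-layer and 225 second-layer nodes; a third layer would raise the bound to 3375, which is consistent with the noise concern raised above.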

Since Masked GCN propagates partial attributes instead of the entire ones via a mask vector learned for each node, we next discuss how different mask ratios influence the performance of the model. We apply the mask mechanism of this paper to the original GCN and GAT models, denoted GCN-mask and GAT-mask, respectively, and compare them with the HA-RS proposed in this paper.

As mentioned above, the mask mechanism plays a vital role in improving the model. Six groups of different mask proportions are set in the experiment. A proportion of 0% corresponds to GCN, GAT, and HA-RS without the mask mechanism, and the maximum mask proportion is set to 10%. The accuracy of the three methods on the citation network datasets is shown in Figure 5.
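Selecting nodes at a given mask proportion can be sketched as below. This is a simplified illustration under stated assumptions: in the paper the mask vector is learned during training, whereas here it is a fixed placeholder passed in by the caller, and the function name `mask_nodes` and the list-of-lists feature layout are hypothetical.

```python
import random


def mask_nodes(features, mask_ratio, mask_vector, seed=0):
    """Replace the features of a random subset of nodes with a mask vector.

    features:    list of per-node feature lists.
    mask_ratio:  fraction of nodes to mask, e.g. 0.02 for 2%.
    mask_vector: placeholder for the mask vector (learned in the paper).
    Returns the masked copy of the features and the set of masked node ids.
    """
    rng = random.Random(seed)
    num_nodes = len(features)
    num_masked = int(round(num_nodes * mask_ratio))
    masked_ids = set(rng.sample(range(num_nodes), num_masked))
    masked_features = [
        list(mask_vector) if i in masked_ids else list(row)
        for i, row in enumerate(features)
    ]
    return masked_features, masked_ids
```

A mask ratio of 0 leaves every node untouched, which corresponds to the models without the mask mechanism in this comparison.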

According to Figure 5, HA-attn outperforms GCN and GAT when the mask rate is 0%, which also indicates that dividing the heterogeneous network into the social domain and the item domain to learn the user representation is better for classification and recommendation. On the Cora and CiteSeer datasets, the accuracy of the three models is highest when the mask proportion is 2% and decreases as the mask proportion increases further. On the PubMed dataset, the accuracy is highest when the mask ratio is 6% and then decreases slowly. The main reason is that the PubMed dataset has more nodes, 6 times more than Cora and CiteSeer. Even at a mask ratio of 10%, the three methods perform better on all three datasets than their counterparts without the mask mechanism. This shows that the mask mechanism does affect the performance of the model and that the best mask proportion is related to the network scale: the proportion that achieves the best performance increases as the network scale increases.

6. Conclusion

In this work, we explore the problem of extracting richer features for user representation in recommendation over heterogeneous social networks. Considering that the behavior of users in the item domain and the social domain is inconsistent, we divide the learning of the user representation in HA-RS into two parts. First, we learn the user representation from user behavior in the item domain with the Context-NE model; then, we learn the user embedding with social influence in the social domain through the hierarchical graph attention model. The mask mechanism in the model also improves the flexibility of the network: when new nodes are added, they can be regarded as masked nodes. Extensive experiments on real datasets show that our model performs better than the existing techniques discussed above. Future research will focus on how to make better use of geographical location and timing information to further improve the quality of the algorithm.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The work was partially supported by the National Natural Science Foundation of China under Grant nos. 81901389 and 71701026; the China Postdoctoral Science Foundation under Grant no. 2019M653400; the Sichuan Science and Technology Program under Grant nos. 2019YFS0236 and 2018GZDZX0039; the Key Project for Soft Science of Meteorology under Grant no. 201704; and the Youth Foundation for Humanities and Social Sciences of Ministry of Education of China under Grant no. 17YJC190035.