Abstract

Users’ reviews of items contain rich semantic information about their preferences. This paper models users’ long-term and short-term preferences through aspect-level reviews using a sequential neural recommendation model. Specifically, the model encodes users and items with aspect-aware representations extracted globally and locally from the user-related and item-related reviews. Given a sequence of neighbor users of a user, we design a hierarchical attention model to capture union-level preferences over sequential patterns, a pointer model to capture individual-level preferences, and a traditional attention model to balance the effects of both union-level and individual-level preferences. Finally, the long-term and short-term preferences are combined into a representation of the user and item profiles. Extensive experiments demonstrate that the model substantially outperforms many state-of-the-art baselines.

1. Introduction

In the era of information explosion, information overload is a serious problem faced by users. Recommendation systems came into being to solve this problem and have evolved into a fundamental tool that helps users select items. Recommendation models include review-based models [1], attention-based models [2–4], aspect-based models [5], etc. [6–8].

In the academic research area, much work has focused on modelling long-term preferences. However, there is no appropriate model for short-term preferences derived from near-neighbor users. Near-neighbor users often influence the purchase decisions made by a user. For example, a user would be highly interested in buying an Apple mobile phone if his/her near-neighbor users buy one. Recently, sequential recommendation has drawn increasing attention from both academic and industrial circles. The task is to predict the item a user will purchase by modelling his or her temporal preferences as a sequence.

Meanwhile, many techniques have been developed to tackle the data sparsity problem by utilizing the semantic signals provided by reviews. To model the rich semantic information in textual reviews, it is necessary to move beyond surface-level feature representations of words. Consider the following two short phrases which contain the word “short”: (1) “a short battery life” and (2) “a short loading time.” Here, “short” carries different sentimental information in the two phrases. A word can take on different meanings in different contexts, yet a static word embedding assigns each word a single vector regardless of context.

To this end, we propose an end-to-end review-based aspect-level neural model (RANM) for sequential recommendation. Compared with a pipeline model, our end-to-end model avoids the inherent problems of the pipeline approach: time delay, parameter redundancy, and error propagation. The incorporation of reviews provides richer semantic information and enhances the model’s expressive ability. Specifically, we embed user-related and item-related reviews into a continuous low-dimensional dense vector space by utilizing aspect-aware convolution and self-attention. Aspect-aware convolution obtains aspect-oriented sentimental information in reviews locally, while self-attention obtains word association information in reviews globally; the two are complementary. Then, a hierarchical attention layer is used to obtain the sequential pattern of neighbor users at the union level, a pointer mechanism is used at the individual level, and an attention mechanism is used to balance the effects of the two. The hierarchical attention layer obtains a multigranularity union-level neighbor sequence pattern, whereas the pointer mechanism obtains an individual-level neighbor sequence pattern; again, the two are complementary. Finally, we combine the long-term and short-term preferences to obtain a hybrid representation for more accurate item recommendations. Overall, the key contributions of this paper towards sequential recommendation are summarized as follows:
(i) Recommendation reviews are used to precisely represent users and items. Our model can alleviate the data sparsity problem and provide good interpretability for recommendation tasks. To the best of our knowledge, this is the first paper to harness the rich semantic information from neighbor users’ reviews in a neural sequential recommendation system.
(ii) More fine-grained user and item profiles are developed. We use an aspect-aware convolution and a self-attention layer to build user, neighbor, and item representations. Meanwhile, we introduce a hierarchical attention mechanism and a review pointer to model multigranularity union-level and individual-level user preferences.
(iii) We conduct extensive experiments on a large number of datasets (Amazon and Yelp) to verify the effectiveness of the model. The importance of different components of the model and the sensitivity of the parameters are investigated. The model is compared with many classic baselines and achieves good results on sparse datasets.

The remainder of this paper is organized as follows. In Section 2, we discuss the existing work related to our method. In Sections 3 and 4, we describe the details of RANM, model architecture, and training method. Section 5 describes our experiment setup and compares our proposed model with many state-of-the-art baselines. Finally, we conclude this paper in Section 6.

2. Related Work

2.1. Review-Based Recommendation

In order to improve the accuracy and interpretability of recommendation, there is a large body of research on utilizing reviews. Both convolutional neural networks (CNNs) [9] and recurrent neural networks (RNNs) [10] have been widely adopted to extract semantic representations from reviews for rating prediction [1–4, 11–14]. DeepCoNN [13] is the first attempt to jointly model both the user and the item from reviews using neural networks. From the perspective of multimodel combination, TransNets [1] extends DeepCoNN [13] by introducing an additional layer to represent the user-item pair; the target network assists the source network in rating prediction. Wu et al. [11] proposed a model to exploit user-item interaction features from auxiliary users’ reviews. Wu et al. [12] modelled a joint representation for a given user-item pair, which includes review-based and interaction-based feature learning. From the perspective of discovering multiple preferences, Tay et al. [4] proposed a multipointer model that combines multiple views of user-item interactions through review-level and word-level co-attention. Li et al. [14] considered a viewpoint of a user on an item as a semantic representation unit, which is organized into multiple logic interest units.

These methods can be classified as interaction-based collaborative filtering. From a single-model view, they are all late-interaction models. Our method includes not only the early interaction between neighbor users and items but also the late interaction between users and items. Our model combines the neighbor model and the user model and captures a variety of preferences, such as long-term preferences and union-level and individual-level short-term preferences.

2.2. Topic-Based Recommendation

Many research studies try to extract semantic features from texts through topic models [15, 16]. CTR [17] assumes that the latent factors of items depend on the latent topic distributions of their text. TopicMF [18] links the latent topics and latent factors by using a defined transform function. McAuley et al. [19] integrated the topic method into matrix factorization through corpus likelihood regularization. Xu et al. [20] presented a model combining rating prediction and topic selection, which incorporates reviews and co-clusters of hidden user communities and item groups.

Reviews contain not only topical information but also sentimental information. Some solutions embed latent factors into a topic-based graphical model framework. Wang et al. [21] presented a latent aspect rating analysis model to determine the relative importance of a topical aspect. Diao et al. [22] jointly modelled ratings and review generation through aspects and their sentiments. Other solutions incorporate topical factors learned from reviews into a latent factor learning framework. Cheng et al. [23] jointly modelled the aspect-level importance and ratings of reviews and item images, which were fed into a matrix decomposition framework for rating prediction. Shao et al. [24] matched users’ and items’ sentiment-aware multimodal topic models. Compared with the first kind of solution, these models can be optimized either jointly or independently.

2.3. Aspect-Based Recommendation

Aspect-based recommendation systems can be divided into two main categories. The first category uses external sentiment analysis tools to extract the aspects. Zhang et al. [25] developed a multimatrix factorization model using a user-item rating matrix, a user-feature attention matrix, and an item-feature quality matrix. Chen et al. [3] learned to rank user preferences based on a phrase-level sentiment analysis across multiple categories and further integrated this framework with matrix factorization at both the item and category levels. SLUM [26] predicts the sentiment information and then identifies the aspects of the item that are most valuable to the user. These works rely on the performance of the external sentiment analysis toolkit.

The second category automatically obtains the aspects of reviews through an embedded model component. Chin et al. [5] modelled the multifaceted process behind how users rate items by estimating the aspect-level user and item importance based on a neural co-attention mechanism. A3NCF [27] defines aspects in reviews as a combined representation of topic information and embedding information. Li et al. [28] used a special CNN to capture aspect-aware representations in reviews. Compared with this second category, the explicit aspect extraction used by the first category is prone to error accumulation in the downstream recommendation task.

Topic-based and aspect-based recommendations are shallow latent semantic methods, whereas our model is a deep latent semantic method built on a hierarchical structure. In addition, our approach offers a more unified model representation for joint training and inference, and its representation of latent semantics is also more diverse.

2.4. Sequential Recommendation

Sequential recommendation is an important means of using implicit feedback. Sequential pattern learning is widely regarded as a critical issue in sequential recommendation [29–32]. Rendle et al. [29] presented a factorized personalized Markov chain model that subsumes both a common Markov chain and a matrix factorization model. Wang et al. [30] implemented a hierarchical representation model that captures both users' general tastes and sequential behaviors by involving transaction and user representations in rating prediction. He and McAuley [31] integrated similarity-based methods with Markov chains. These methods use a Markov chain to model the sequential patterns. Chen et al. [32] proposed a more flexible model that integrates collaborative filtering with a memory network.

Session/cookie-based recommendation, which does not contain user identification information, is very similar to sequential recommendation. Hidasi et al. [33] proposed a GRU model for session recommendation, generally regarded as the first session recommendation method based on a deep neural network (DNN). Tan et al. [34] proposed two practical techniques, namely, data augmentation and accounting for shifts in the data distribution, to improve model performance. Jannach and Ludewig [35] proposed a heuristic near-neighbor framework for sessions that is complementary to GRU4REC [33]. Ludewig and Jannach [36] proposed an effective session-based matrix factorization method and presented an in-depth performance comparison of a number of such algorithms, using a variety of datasets and evaluation measures. Our model uses an attention-based mechanism to obtain a more expressive relationship within and across sequences.

3. Proposed Model

In this section, we propose an end-to-end review-driven aspect-level neural recommendation model. Figure 1 illustrates the overall network architecture of the user representation network.

3.1. Model Architecture

The main notation is summarized in Table 1. Let $U$ and $V$ represent the user and item sets, respectively. Each item $v$ is associated with a sequence of neighbor users arranged chronologically as $S_{v}=(u_{1}, u_{2}, \dots, u_{M})$, where each $u_{j}$ is a neighbor user with which the item has interacted. We merge the user $u$'s and the item $v$'s review sets to form the user document $D_{u}$ and the item document $D_{v}$, where $l_{u}$ and $l_{v}$ are the lengths of the user's and the item's documents in number of words, respectively.

Firstly, the user review document $D_{u}$ is transformed into an embedding matrix $E_{u}\in\mathbb{R}^{l_{u}\times d}$ via a neural embedding layer, where $d$ is the dimension of the representation vector of each word in the review. Specifically, the neural embedding layer performs a look-up operation through a shared word-embedding matrix. The word-embedding matrix can be initialized using pretrained word vectors [37, 38].

The same color is used for the same type of review and review-based vector representation in Figure 1, such as for users’ reviews and review-based vectors. The same color is used for the same type of subnetwork, such as the self-attention network, CNN, and attention network.

3.2. Finding Neighbor Users

We use neighbor users’ reviews to improve the user representation model, since they provide additional information about the current user; here a neighbor user is a user whose preferences are similar to those of the current user. The key is therefore to determine the current user’s neighbor users. This paper adopts matrix factorization [39], a standard method for detecting neighbor users. The basic idea is to decompose the rating matrix to obtain latent representations and then obtain the neighbor users through a similarity function.

Specifically, the users’ ratings on the items are expressed by a matrix $R\in\mathbb{R}^{m\times n}$, where the numbers of elements in the user set and item set are denoted by $m$ and $n$. Three matrices are used to approximate the rating matrix:

$$R \approx P\,\Lambda\,Q^{\top},$$

where $Q\in\mathbb{R}^{n\times K}$ represents the posterior probability of the topic clusters for each item, $\Lambda\in\mathbb{R}^{K\times K}$ represents the distribution of each topic, and $P\in\mathbb{R}^{m\times K}$ represents the posterior probability of each topic for each user. In the learning algorithm, we use an iterative optimization method for the SVD-style matrix factorization.

We use this matrix factorization to obtain the posterior probabilities. To infer $P$, $\Lambda$, and $Q$, the solution is obtained by the following updating rules:

The similarity of users $a$ and $b$, given the rating matrix $R$, is computed using the Pearson correlation coefficient as follows:

$$\operatorname{sim}(a,b)=\frac{\sum_{v}\bigl(R_{a,v}-\bar{R}_{a}\bigr)\bigl(R_{b,v}-\bar{R}_{b}\bigr)}{\sqrt{\sum_{v}\bigl(R_{a,v}-\bar{R}_{a}\bigr)^{2}}\,\sqrt{\sum_{v}\bigl(R_{b,v}-\bar{R}_{b}\bigr)^{2}}},$$

where $\bar{R}_{a}$ and $\bar{R}_{b}$ denote the average ratings of users $a$ and $b$, respectively, and the sums run over the items both users have rated. The process for neighboring items is the same as for the neighbor users. Similar to equation (5), we can compute the similarity between two item vectors using the topic-item matrix $Q$.
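To make this neighbor-selection step concrete, the following minimal NumPy sketch computes Pearson similarities over co-rated items and returns the top-M neighbors of a user. The function names and the toy rating matrix are illustrative only, and the factorization-based item similarity is omitted for brevity.

```python
import numpy as np

def pearson_sim(r_a, r_b):
    """Pearson correlation between two users' rating vectors,
    computed only over the items both users have rated (nonzero entries)."""
    co_rated = (r_a > 0) & (r_b > 0)
    if co_rated.sum() < 2:
        return 0.0
    a, b = r_a[co_rated], r_b[co_rated]
    a_c, b_c = a - a.mean(), b - b.mean()
    denom = np.sqrt((a_c ** 2).sum() * (b_c ** 2).sum())
    return float((a_c * b_c).sum() / denom) if denom > 0 else 0.0

def top_m_neighbors(R, user, m=5):
    """Indices of the m users most similar to `user` under Pearson similarity."""
    sims = np.array([pearson_sim(R[user], R[other]) if other != user else -np.inf
                     for other in range(R.shape[0])])
    return np.argsort(-sims)[:m]

# Toy example: 4 users x 5 items, 0 means unrated.
R = np.array([[5, 3, 0, 1, 0],
              [4, 0, 0, 1, 2],
              [1, 1, 5, 4, 0],
              [5, 3, 1, 1, 0]], dtype=float)
print(top_m_neighbors(R, user=0, m=2))
```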

3.3. Aspect-Level CNN

In this section, we describe the aspect-level convolution of user reviews, which extracts aspect-level information. Traditionally, the same word-embedding representation is used for a given word wherever it appears. However, the semantics or sentiment polarity of the same word can differ across aspects. In this paper, an aspect-specific transformation matrix $T_{a}\in\mathbb{R}^{d\times d}$ is used to transform the word-embedding matrix $E_{u}$ into an aspect-based word-embedding matrix $E_{u}^{a}=E_{u}T_{a}$. The word embeddings for all $K$ aspects can then be expressed as a third-order tensor $\mathcal{E}_{u}\in\mathbb{R}^{K\times l_{u}\times d}$. Different from the traditional word-embedding representation, this represents each word differently under different aspects.

Then, motivated by the work of Li et al. [28], we use a CNN to encode the current word’s context. The tensor $\mathcal{E}_{u}$ is similar to the multichannel feature representation of an image, so the classical image convolution operation is applied to the aspect-based embedding tensor. Specifically, there are $n$ filters/kernels $W_{k}\in\mathbb{R}^{h\times d}$, $k=1,\dots,n$, where $h$ is the height of the filter. For instance, if $n = 10$ and $h \in \{1, 3, 5, 7, 9\}$, then there are two filters for each size. To obtain a nonlinear feature transformation, we use ReLU as the activation function after the convolution:

$$c_{j,k}=\operatorname{ReLU}\bigl(\mathcal{E}_{u}[\,:\,,\,j:j+h-1,\,:\,]\odot W_{k}+b_{k}\bigr),$$

where $\mathcal{E}_{u}[\,:\,,\,j:j+h-1,\,:\,]$ is the slice of height $h$ extracted from the first order of the third-order tensor, $\odot$ denotes the elementwise product (summed) between the tensor slice and the $k$-th convolution kernel $W_{k}$, and $j$ is the starting position of the sliding window. Specifically, the context uses a window spanning words on both sides of the current word. The result is $c_{j}\in\mathbb{R}^{n}$ ($c_{j}$ is the context feature for the $j$-th word, and $n$ is the number of filters). Because different convolution kernels have different convolution ranges, the dimensions of the generated vectors differ; for processing convenience, different padding sizes are used for kernels of different sizes. The parameter set of the aspect-level CNN layer contains the aspect transformation matrices $\{T_{a}\}$, the convolution kernels $\{W_{k}\}$, and their biases $\{b_{k}\}$.
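The sketch below illustrates one plausible PyTorch realization of the aspect-level convolution, treating the K aspects as input channels and using filters of different heights with matching padding, as described above. The hyperparameter values mirror those reported in Section 5, but the exact layer wiring (class and variable names included) is our assumption rather than the authors’ implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AspectCNN(nn.Module):
    """Aspect-aware convolution sketch: project word embeddings into K aspect
    spaces, then convolve the resulting tensor like a multichannel image."""

    def __init__(self, emb_dim=300, num_aspects=5, num_filters=10,
                 heights=(1, 3, 5, 7, 9)):
        super().__init__()
        # one d x d transformation matrix per aspect
        self.aspect_T = nn.Parameter(torch.randn(num_aspects, emb_dim, emb_dim) * 0.01)
        # filters of different heights; padding keeps the sequence length fixed
        self.convs = nn.ModuleList([
            nn.Conv2d(num_aspects, num_filters // len(heights),
                      kernel_size=(h, emb_dim), padding=(h // 2, 0))
            for h in heights
        ])

    def forward(self, emb):                       # emb: (batch, seq_len, emb_dim)
        # aspect-specific embeddings: (batch, K, seq_len, emb_dim)
        aspect_emb = torch.einsum('bld,kde->bkle', emb, self.aspect_T)
        # each conv yields (batch, filters_per_height, seq_len, 1)
        feats = [F.relu(conv(aspect_emb)).squeeze(-1) for conv in self.convs]
        return torch.cat(feats, dim=1)            # (batch, num_filters, seq_len)

# usage: 300-dim embeddings of two 50-word review documents
words = torch.randn(2, 50, 300)
print(AspectCNN()(words).shape)                   # torch.Size([2, 10, 50])
```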

3.4. Self-Attention

In general, a pooling layer is used to refine features after a convolution layer. However, not all words are equally crucial for each user/item. Convolution captures local features, while self-attention takes global features into account; integrating the two allows features to be considered from both global and local views [40]. For better generalization ability, we use the general (bilinear) self-attention [41]:

$$\alpha_{ij}=\operatorname{softmax}_{j}\!\left(\frac{q_{i}^{\top}W\,k_{j}}{n}\right),\qquad o_{i}=\sum_{j}\alpha_{ij}\,v_{j},$$

where $q_{i}$, $k_{j}$, and $v_{j}$ are the representation vectors of the current words, $W$ is a learnable parameter matrix, and softmax normalizes the weights. Intuitively, the self-attention layer computes a weighted average of all value vectors $v_{j}$, where the weight is a matrix-weighted dot product between the query vector $q_{i}$ and the key vector $k_{j}$. A scaling factor $n$ is used to avoid large weights caused by the high dimension. The parameter set of the self-attention layer contains $W$.
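A minimal sketch of this bilinear self-attention is given below; the scaling by the feature dimension follows the description above, and the class name and initialization are illustrative.

```python
import torch
import torch.nn as nn

class GeneralSelfAttention(nn.Module):
    """Bilinear ('general') self-attention: score(q_i, k_j) = q_i^T W k_j / n,
    followed by a softmax-weighted average of the value vectors."""

    def __init__(self, dim):
        super().__init__()
        self.W = nn.Parameter(torch.randn(dim, dim) * 0.01)
        self.scale = float(dim)                   # the scaling factor n

    def forward(self, x):                         # x: (batch, seq_len, dim)
        scores = x @ self.W @ x.transpose(1, 2) / self.scale   # (batch, L, L)
        weights = torch.softmax(scores, dim=-1)
        return weights @ x                        # (batch, seq_len, dim)
```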

The neighbor user’s representation vector is then obtained through max pooling over the self-attended word features; max pooling retains the strongest of these features:

$$p_{j}=\operatorname{maxpool}\bigl(o_{1},o_{2},\dots\bigr),$$

where $p_{j}$ ($1\le j\le M$) is the embedding of the $j$-th neighbor user and $M$ is the number of neighbor users. The user’s long-term preference $z_{u}$ and the item representation $q_{v}$ are extracted using the same procedure.

3.5. Hierarchical Attention Net

A hierarchical attention mechanism is used to obtain the relationships between neighbor users. Because these relationships are complex, hierarchical modelling is needed to capture them at different granularities. Intuitively, the order of neighbor users is pivotal for sequential recommendation. Given a user $u$ and his neighbor users arranged in order, each neighbor’s impact on the item is related to that order. Accordingly, we encode this order information by adding a position embedding to the neighbor user representation: $\tilde{p}_{j}=p_{j}+e_{j}$, where $e_{j}$ is the embedding for the $j$-th position.

We use the item representation $q_{v}$ to transform the neighbor user representations into a user-item representation as follows:

$$m^{t}=\sum_{j=1}^{M}\alpha_{j}^{t}\,\tilde{p}_{j},$$

where $\alpha_{j}^{t}$ is the weight of $\tilde{p}_{j}$. The weighting function is computed using the following equation at hop $t$:

$$\alpha_{j}^{t}=\operatorname{softmax}_{j}\bigl(\tilde{p}_{j}^{\top}W^{t}\,m^{t-1}\bigr).$$

Note that $m^{t-1}$ is used iteratively as a query, and we define the initial value $m^{0}=q_{v}$ at the first hop. The output of the final hop, denoted $m$, is the sequential preference representation of the neighbor users at the union level; that is, $m$ is jointly encoded by all neighbor users. The parameter set of the hierarchical attention layer contains the per-hop attention matrices $\{W^{t}\}$ together with the position embeddings $\{e_{j}\}$.
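The multi-hop attention over the neighbor sequence can be sketched as follows: the item vector serves as the initial query and the attended summary is fed back as the query for the next hop. The bilinear scoring form and the class/variable names are assumptions consistent with the description above.

```python
import torch
import torch.nn as nn

class MultiHopNeighborAttention(nn.Module):
    """Union-level hierarchical attention: iteratively attend over the
    position-augmented neighbor vectors, starting from the item query."""

    def __init__(self, dim, num_hops=5):
        super().__init__()
        # one attention matrix W^t per hop
        self.hops = nn.ModuleList([nn.Linear(dim, dim, bias=False)
                                   for _ in range(num_hops)])

    def forward(self, neighbors, item_vec):
        # neighbors: (batch, M, dim); item_vec: (batch, dim) is m^0
        m = item_vec
        for hop in self.hops:
            scores = torch.einsum('bmd,bd->bm', neighbors, hop(m))   # p_j^T W^t m^{t-1}
            weights = torch.softmax(scores, dim=-1)
            m = torch.einsum('bm,bmd->bd', weights, neighbors)       # union-level summary
        return m                  # final m, jointly encoded by all neighbors
```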

3.6. Review Pointer

Intuitively, group-buying behavior does not reflect individuality. Therefore, in addition to the union-level neighbor representation, we also need an individual-level neighbor representation. For example, a user may buy a Samsung mobile phone simply because most of his neighbor users bought one; in that case, the neighbors’ purchases of other brands are merely noise for the recommendation. We could simply take $m$ as the final short-term preference derived from user $u$, but it only captures sequential patterns at the union level. To better express users' short-term preferences, we further explore the influence of neighbor users on the purchase action at an individual level, that is, we identify the individual neighbors most associated with the item.

Inspired by the pointer mechanism for review-based recommendation [4], we choose the neighbor representation that receives the maximum attention weight as the individual-level short-term preference:

$$j^{*}=\arg\max_{j}\alpha_{j},\qquad s=\tilde{p}_{j^{*}}.$$
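A hard pointer of this kind can be sketched as below; the argmax selection is non-differentiable, so in practice a relaxation such as the Gumbel-softmax trick used in MPCN [4] would be applied during training. Names are illustrative.

```python
import torch

def review_pointer(neighbors, weights):
    """Select the single neighbor representation with the largest attention
    weight. neighbors: (batch, M, dim); weights: (batch, M)."""
    idx = weights.argmax(dim=-1)                              # (batch,)
    return neighbors[torch.arange(neighbors.size(0)), idx]    # (batch, dim)
```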

3.7. User Representation Vector

The two short-term preference representations of the user must be balanced. We utilize a dot-product attention mechanism to compute a weighted average of the union-level and individual-level short-term preferences. The final short-term user preference is calculated as follows:

$$s_{u}=\beta_{1}\,m+\beta_{2}\,s,$$

where $\beta_{1}$ and $\beta_{2}$ are the attention weights of the union-level short-term preference $m$ and the individual-level short-term preference $s$, respectively. The parameter set of the attention layer contains $W_{s}$, which is the matrix required to calculate the attention weights.

The user's long-term and short-term preferences also need to be balanced. The user’s current preference is formed through a linear combination of the two representation vectors as follows:

$$x_{u}=(1-\alpha)\,z_{u}+\alpha\,s_{u},$$

where the hyperparameter $\alpha$ balances the importance of the two components. The structure of the item representation network is similar to that of the user representation network, except that it uses the nearest neighbor items and its own set of parameters.
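The two balancing steps can be sketched together as follows: a dot-product attention against the item vector weights the union-level and individual-level short-term preferences, and the hyperparameter α then mixes long- and short-term preferences. The matrix W_s and all names are illustrative.

```python
import torch

def combine_preferences(union_pref, indiv_pref, long_pref, item_vec, W_s, alpha=0.1):
    """Blend short-term preferences via dot-product attention with the item
    vector, then linearly combine with the long-term preference.
    All preference/item vectors are (batch, dim); W_s is (dim, dim)."""
    cands = torch.stack([union_pref, indiv_pref], dim=1)          # (batch, 2, dim)
    scores = torch.einsum('bkd,de,be->bk', cands, W_s, item_vec)  # attention scores
    betas = torch.softmax(scores, dim=-1)                         # beta_1, beta_2
    short_pref = torch.einsum('bk,bkd->bd', betas, cands)         # s_u
    return (1 - alpha) * long_pref + alpha * short_pref           # x_u
```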

4. Model Training

We use a factorization machine [29] to obtain the first-order and second-order features. As in collaborative filtering or matrix factorization techniques, the user and item representations are mapped into the same vector space $\mathbb{R}^{d}$, and a rating can be computed using the inner product $\langle x_{u},x_{v}\rangle$ [2, 42], a scalar representing the user-item interaction.

The concatenation $\mathbf{x}=[x_{u};x_{v}]$ of the user representation and item representation is passed into a factorization machine:

$$\hat{r}_{u,v}=w_{0}+\sum_{i}w_{i}x_{i}+\sum_{i}\sum_{j>i}\langle\mathbf{v}_{i},\mathbf{v}_{j}\rangle x_{i}x_{j}.$$

Owing to the symmetry of the scalar product and the inner product, subscript $j$ takes a larger value than $i$.

Training. This paper uses the mean squared error (MSE) as the objective function. All parameters in the component networks are trained jointly through backpropagation. The model parameters of RANM include word embeddings, position embeddings, aspect transformation matrices, attention networks, and CNN networks.

Pretraining. We replace the hierarchical attention layer with two feed-forward networks, one for the current user and the other for the interaction between neighbor users and items. This simplified model does not consider the detailed interactions between neighbor users and items.

Generalization. Many studies have found that deep neural networks tend to suffer from overfitting. Thus, we apply L2 regularization to the first-order weights, second-order weights, and biases. In addition to L2 regularization, we also use dropout [13] to further reduce overfitting. Dropout prevents co-adaptation by dropping some neural units during the training procedure [43].
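A compact sketch of the rating prediction and training objective is given below, using the standard O(kd) factorization machine identity for the second-order term; the factor dimension and the optimizer setup (Adam with weight decay standing in for the L2 terms) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FactorizationMachine(nn.Module):
    """Second-order FM over the concatenated user/item representation x:
    y = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j,
    with the pairwise term computed as 0.5 * (||x V||^2 - (x^2)(V^2) summed)."""

    def __init__(self, input_dim, factor_dim=10):
        super().__init__()
        self.w0 = nn.Parameter(torch.zeros(1))
        self.w = nn.Linear(input_dim, 1, bias=False)     # first-order weights
        self.V = nn.Parameter(torch.randn(input_dim, factor_dim) * 0.01)

    def forward(self, x):                                # x: (batch, input_dim)
        linear = self.w0 + self.w(x).squeeze(-1)
        pairwise = 0.5 * (((x @ self.V) ** 2).sum(-1)
                          - ((x ** 2) @ (self.V ** 2)).sum(-1))
        return linear + pairwise                         # predicted rating

# MSE objective with L2 regularization folded into weight decay
fm = FactorizationMachine(input_dim=200)
optimizer = torch.optim.Adam(fm.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.MSELoss()
```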

5. Experiment

5.1. Experimental Setup
Dataset: we perform our experiments on the Amazon 5-core datasets (http://jmcauley.ucsd.edu/data/amazon/), which contain product purchase histories from Amazon ranging from May 1996 to July 2014; we adopt the 5-core setting for these datasets. For Yelp (https://www.yelp.com/dataset/challenge/), we only retain the records from 2018, and the resulting dataset is denoted as Yelp18. For all the datasets, a standard text preprocessing method is used to process the reviews. Sparsity is defined as #Rating/(#User × #Item). For each dataset, we randomly split the records into training and testing sets in a ratio of 80 : 20, and 10% of the records in the training set form the development set for cross-validation. Basic statistics of the datasets are presented in Table 2.

Metrics: we use the MSE as the performance metric. A statistical significance test is conducted using a pairwise Student’s t-test.
5.1.1. Baseline Methods

To analyze our model’s performance more comprehensively, we use three types of baselines to conduct extensive experiments. This allows us not only to comprehensively analyze the performance of RANM but also to compare the performance of these classical baselines. We consider the following three categories:
(a) Probabilistic matrix factorization: PMF [44].
(b) Latent topic and shallow embedding models with reviews: RBLT [45] and CMLE [46].
(c) Deep learning-based methods with reviews, including attention mechanisms and aspect modelling: DeepCoNN [13], TransNet [1], D-Attn [2], TARMF [3], MPCN [4], ANR [5], and CARP [14].

Hyperparameter settings: we apply a grid search for hyperparameter optimization, and the final performances are reported over 5 runs. The word-embedding dimension is selected from {50, 100, 200, 300} and finally set to 300. The batch sizes for smaller and bigger datasets are set to 100 and 200, respectively. For the proposed RANM, we set the number of neighbors to M = 5, the number of aspects to K = 5, the embedding size to d = 100, and n = 10 with h = {1, 3, 5, 7, 9} for the convolution layer. The number of hops is set to H = 5. The learning rate is set to 0.001 for model training, the weighting factor is set to α = 0.1 for balancing long-term and short-term preferences, and the regularization coefficient is set to λ = 0.0001 to avoid overfitting.

5.2. Results and Discussion

The best configuration is used for all baseline methods. For the deep learning methods, the same TensorFlow implementation platform is used to facilitate a fair comparison. Table 3 shows the results for all 25 open datasets. Among the baselines, there is no dominating winner across all datasets; the choice of dataset has a certain influence on each baseline method. Overall, on most datasets, the aspect-based systems (ANR and CARP) are better than the review-based systems (DeepCoNN, D-Attn, TransNet, TARMF, and MPCN), among which those using an attention mechanism (D-Attn, TARMF, and MPCN) perform best. The review-based systems are better than the shallow semantic models (RBLT and CMLE), and the shallow semantic models are better than PMF. The same conclusions can be drawn from the average performance.

From the experimental results in Table 3, RANM outperforms all the baselines and passes the significance tests, demonstrating the robustness of our model. This is because a user's context better represents the short-term preferences, while the other models only consider long-term preferences and do not model short-term preferences well. Overall, the experimental results demonstrate that RANM is useful in modelling sequential reviews for rating prediction. It is worth highlighting that a significant improvement is gained by RANM on Yelp18, which is the sparsest dataset with the least review information (see Table 2). Compared with the recently proposed end-to-end aspect-level models (i.e., ANR and CARP), RANM’s performance improves by 8% and 18% on average, respectively. Compared with the latent topic and shallow embedding models with reviews (i.e., RBLT and CMLE), RANM’s performance improves by 42% and 34% on average. Compared with the earliest PMF model, RANM achieves an average performance improvement of up to 50%. These results show that the proposed system is sufficiently powerful.

5.3. Ablation Study

We perform an ablation study to analyze the contributions of the different components. The default in this discussion refers to the complete model with all components, and we compare it with six variants:

No shared projection (SP) and aspect-level projection (AP): rather than having aspect-specific transformation matrices, we constrain the model to a single transformation matrix shared across all aspect extractions. Furthermore, we remove the AP entirely: instead of the aspect-level CNN, we use a simple CNN that does not differentiate between the aspects of each word.

No pretraining (PT): we forgo the pretraining phase (described in Section 4) for the parameters of the hierarchical attention layer.

No self-attention (SA): the self-attention layer is removed directly, so the global feature representation between word vectors is no longer obtained.

No position embedding (PE) and sequential model (SM): on the one hand, the order of neighbors should intuitively have some impact on model performance; on the other hand, removing the SM means we do not use neighbor users to model the interaction with the target item. The SM contains rich aspect-level semantic information about long-term and short-term user preferences, which helps our model alleviate the data sparsity and cold-start problems to some extent.

The results of the ablation study on the representative Toys, Video, Beauty, and Book datasets are shown in Table 4. According to the experiments, removing the neighbor users (no SM) results in the lowest performance, which demonstrates the importance of sequential modelling. We observe that replacing the aspect-specific projections with a shared projection (no SP) leads to a small performance degradation, the smallest of all, showing that the information captured by the shared method is not significantly different from that captured by the nonshared method. If the aspect-level projection is further removed (no AP), the performance continues to decline to the second worst. This shows the importance of aspect-level reviews and of the sequence modelling of neighbor users, and it demonstrates the effectiveness of our method.

Other factors also have a certain impact on the results. Relatively speaking, the impact is not significant. For the hierarchical attention layer, we find that the PT procedure provides a better starting optimization point for the entire optimization objective function. For the SA layer, the CNN only obtains local relevance, but not the global relevance. The SA mechanism is just a good supplement to the CNN. The no PE result suggests that temporal information in the form of a neighbor’s proximity is a useful signal for a sequential recommendation.

5.4. Parameter Sensitivity
Effects of #aspect: Figure 2(a) shows the effect of varying the number of aspects from 1 to 10. We observe that a good performance can be obtained using approximately 5 to 7 aspects. The optimal number of aspects differs slightly across datasets because the number of aspects present in the reviews of different datasets is not the same.

Effects of #hop: we show the influence of the number of hops of the hierarchical attention layer in Figure 2(b). Note that the model would only consider the general neighbor users’ reviews when the number of hops is H = 0. Figure 2(b) shows the effect of varying the number of hops from 1 to 10 on the test datasets. The results indicate that multiple hops can capture more abstract information from neighbor users’ reviews, used as an external memory; however, too many hops lead to overfitting.

Effects of #neighbor: RANM consistently achieves its best performance at M = 5, as shown in Figure 2(c). When M becomes smaller or larger, the RANM performance degrades to some extent. This is reasonable because a small M provides less useful neighbor user information, whereas a large M inevitably introduces much noise. Hence, we set M = 5 in our experiments.

Effects of weighting parameter: the parameter α in equation (15) controls the importance of the sequential patterns (i.e., union-level and individual-level short-term preferences). Figure 2(d) plots the performance curve for varying α values. When α = 0, RANM degrades to a review-based recommendation system without sequential modelling, and we observe that RANM performs much worse in this setting. The optimal performance is consistently achieved at α = 0.1 on all three datasets. As α becomes increasingly large, the performance of RANM degrades further. These results show that short-term user preferences are a useful supplement to long-term user preferences. Accordingly, we set α = 0.1 in our experiments.
5.5. Model Interpretability

Recall that RANM encodes aspect information in the reviews. To better visualize an aspect-level review, we show the top-n phrases whose weight is the sum of the weights of the constituent words. Important phrases are highlighted in red. The weight of words in neighbor users' reviews is obtained by the weight product of equations (7) and (9). The weight of the words in the user's reviews is obtained by equation (7). We randomly sample two users and their neighbor users from office products in the Amazon dataset, as shown in Table 5. From the following two examples, we can see that the RANM model has a more fine-grained representation of users’ reviews. Most of the attention to users’ and neighbor users’ reviews is focused on aspect and opinion words.

6. Conclusions

In this paper, we propose RANM, a novel neural recommendation model for sequential recommendation with reviews. Our model incorporates both the user’s long-term intrinsic preference and short-term preference to predict the user's rating of the target item. It utilizes aspect-level projections to extract aspect-level sentiment representations. The CNN and the self-attention layer cooperate to extract global and local user features (for both the target user and neighbor users) and item features from the relevant review documents. A novel hierarchical attention model is proposed to capture fine-grained alignment probabilities, and an attention model is used to balance union-level and individual-level short-term preferences. The experimental results show that RANM significantly outperforms various strong state-of-the-art methods, especially on sparse data. Furthermore, our model can interpret its recommendation results based on reviews. In the future, we will extend RANM to consider the effect of each individual review on the recommendation, and the user's points of interest on an item could be modelled in a more fine-grained and adaptively learned manner.

Data Availability

We perform our experiments on the Amazon-5cores dataset (http://jmcauley.ucsd.edu/data/amazon/). This dataset contains product purchase history from Amazon ranging from May 1996 to July 2014. We adopt 5-core settings over these datasets. We only retain the records in Yelp in 2018 (https://www.yelp.com/dataset/challenge/) as the final dataset, denoted as Yelp18.

Conflicts of Interest

The authors declare that they have no conflicts of interest.