#### Abstract

Recommender systems have become indispensable for online services since they alleviate the information overload problem for users. Several methods support personalized recommendation by using collaborative filtering to learn latent user and item representations from implicit interactions between users and items. However, most existing methods simplify the implicit frequency feedback to binary values, which makes collaborative filtering unable to accurately learn the latent user and item features. Moreover, traditional collaborative filtering methods generally use linear functions to model the interactions between latent features. The expressiveness of linear functions may not be sufficient to capture the complex structure of users’ interactions, which degrades the performance of those recommender systems. In this paper, we propose a neural personalized ranking model for collaborative filtering with implicit frequency feedback. The proposed method integrates the ranking-based Poisson factor model into neural networks. Specifically, we first develop a ranking-based Poisson factor model, which combines the Poisson factor model and Bayesian personalized ranking. This model adopts a pair-wise learning method to learn the rankings of users’ preferences between items. After that, we propose a neural personalized ranking model on top of the ranking-based Poisson factor model, named NRPFM, to capture the complex structure of user-item interactions. NRPFM applies the ranking-based Poisson factor model on neural networks, which endows the linear ranking-based Poisson factor model with a high level of nonlinearity. Experimental results on two real-world datasets show that our proposed method compares favorably with the state-of-the-art recommendation algorithms.

#### 1. Introduction

Recommender systems [1] have become an indispensable component in E-commerce, online news and social media sites. These systems alleviate the information overload problem for users, by discovering the users’ hidden preferences and providing users with the personalized information, products, or services. With such attractive features, recommender systems are widely employed in many online applications, including Amazon, Youtube, and Netflix.

As one of the most widely used techniques for recommender systems, collaborative filtering (CF) [2] has achieved great success in E-commerce. CF methods, which are independent of specific domains, make recommendations by analyzing the past activities of users. The main idea of CF is to learn the latent user preferences and the item characteristics by modelling the user-item interaction behaviors. Among a variety of CF methods, matrix factorization (MF) [3, 4] has drawn a great deal of attention, due to its effectiveness and efficiency in coping with large datasets. MF assumes that only a few latent factors contribute to the preferences of users and the characteristics of items. Accordingly, it simultaneously embeds both the user and item feature vectors into a low-dimensional latent factor space.

Most of the traditional CF methods work on explicit feedback, i.e., the ratings on items given by users. They generally apply point-wise regression methods to predict the ratings for unobserved items. However, explicit feedback may not always be available, since it is comparatively difficult to collect. As a result, using implicit feedback (e.g., clicks, bookmarks, and purchases) to express users’ preferences is more common in practical recommender systems. In CF with implicit feedback, only the positive instances are observed, while the negative instances and the missing values are mixed together, which makes personalized recommendation with implicit feedback more challenging.

Collaborative filtering with implicit feedback is referred to as the One-Class Collaborative Filtering (OCCF) problem [5, 6]. In order to solve the OCCF problem, Pan et al. [5] and Hu et al. [6] proposed a Weighted Regularized Matrix Factorization (WRMF) method. Rendle et al. [7] formulated recommendation as a ranking problem and proposed the Bayesian Personalized Ranking (BPR). Zhao et al. [8] proposed the Social Bayesian Personalized Ranking (SBPR) model, which integrates social connections with the users’ implicit feedback to estimate users’ rankings of items. However, most of the existing methods for OCCF simplify the implicit interactions between users and items. Regardless of how many times a user interacts with an item, they use the binary implicit feedback to indicate whether a user has clicked or viewed an item [5–8]. In other words, the existing methods generally use the matrix factorization models to quantify the binary implicit interactions between users and items. Intuitively, the number of implicit interactions reflects the degree of the user’s preferences for items. The larger the number of interactions, the more preferred. Hence, such simplified schemes make the CF methods unable to accurately capture the users’ preferences for items.

In addition, as reported in [9], matrix factorization based models use a linear function (i.e., inner product) to model the interactions between the user latent features and the item latent features. The expressiveness of the linear function is too limited to capture the complex structure of users’ interactions, which hinders the performance of recommender systems. Hence, He et al. [9] proposed a general framework, named neural collaborative filtering (NCF) for recommender systems and suggested applying neural networks to learn the nonlinear interaction function from data. The nonlinear interaction function learned from the interaction data endows recommender systems with a high level of nonlinearities. Similar to the traditional recommendation methods that work on the explicit feedback, however, the point-wise learning method in NCF degrades the recommendation performance due to the data sparsity issue.

To tackle the aforementioned issues, in this paper, we propose a neural personalized ranking model for collaborative filtering with implicit frequency feedback, which integrates the ranking-based Poisson factor model with neural networks. Specifically, we adopt the Poisson factor model (PFM) [10, 11], instead of the classical matrix factorization techniques, to model the implicit interactions between users and items. The Poisson factor model replaces the usual Gaussian likelihood in probabilistic matrix factorization with the Poisson likelihood, which guarantees the nonnegativity of latent factors. Moreover, as pointed out by Ma et al. [10], the Poisson factor model is better at modelling frequency data than the traditional matrix factorization models. However, the data sparsity of implicit frequency feedback limits the performance of the Poisson factor model, because the observed feedback available is not sufficient for the Poisson factor model to learn latent features. To solve the data sparsity issue, we develop a ranking-based Poisson factor model, which combines the Poisson factor model and Bayesian personalized ranking. The ranking-based Poisson factor model adopts a pair-wise learning method to learn the rankings of preferences between items. Moreover, in order to capture the complex structure of interactions, we propose a neural personalized ranking model on top of the ranking-based Poisson factor model, called NRPFM. NRPFM integrates the ranking-based Poisson factor model with neural networks, which provides the linear ranking-based Poisson factor model with a high level of nonlinearity. In our neural personalized ranking model, we use the multilayer perceptron (MLP) [12] to learn the nonlinear and nontrivial user-item interaction relationships. Hence, our proposed neural personalized ranking model unifies the strengths of the ranking-based Poisson factor model in learning users’ preference rankings between items from implicit frequency feedback, and of neural networks in capturing the nonlinear user-item interaction relationships.

The key contributions of our work are summarized as follows:

(i) We propose a ranking-based Poisson factor model, which combines the Poisson factor model and Bayesian personalized ranking to tackle the data sparsity of implicit frequency feedback.

(ii) We propose a neural personalized ranking model for collaborative filtering with implicit frequency feedback. This neural personalized ranking model integrates the ranking-based Poisson factor model with neural networks, which endows the linear ranking-based Poisson factor model with a high level of nonlinearity.

(iii) We perform extensive experiments to evaluate our proposed method on real-life datasets. The results show that our proposed method outperforms the state-of-the-art recommendation algorithms.

The rest of this paper is organized as follows. Section 2 briefly reviews the related work in recommender systems. Section 3 introduces some preliminary knowledge. Section 4 describes the details of our proposed item recommendation algorithm. Experimental results are reported in Section 5. Finally, we conclude this paper and present some directions for future work in Section 6.

#### 2. Related Work

In this section, we review the major related work for recommender systems, including the traditional collaborative filtering methods and the neural network-based recommendation methods.

##### 2.1. Collaborative Filtering

Collaborative filtering (CF) [2] approaches are widely deployed in the modern E-commerce websites and have achieved a great success. CF approaches include two main categories [2]: memory-based algorithms and model-based algorithms, according to different ways of utilizing a user-item rating matrix.

Memory-based CF algorithms, also known as neighbor-based methods, use the entire user-item rating matrix to generate recommendations. Typical memory-based algorithms include user-based methods [2] and item-based methods [13, 14]. The underlying assumption of memory-based methods is that similar users share common interests, and users usually prefer similar items. The key issue of user-based and item-based methods is to adopt suitable similarity measures to calculate the pairwise similarity between users or between items. Typical similarity measures include the cosine similarity, the Pearson correlation coefficient, and the adjusted cosine similarity [13]. Model-based CF methods first learn a predictive model, which characterizes the rating behaviors of users, by exploiting statistical and machine learning techniques. They then use the predictive model to predict users’ future behaviors. Typical model-based filtering approaches include Bayesian networks [2], clustering models [15, 16], latent semantic analysis [17, 18], and restricted Boltzmann machines [19].

As the most popular approaches among various CF methods, matrix factorization (MF) methods [3, 4] have attracted a great deal of attention due to their effectiveness and efficiency in dealing with very large user-item rating matrices. The basic assumption of matrix factorization is that only a few latent factors contribute to the preferences of users and the characteristics of items. Therefore, matrix factorization approaches simultaneously embed both user and item feature vectors into a low-dimensional latent factor space, where the correlation between a user’s preferences and an item’s characteristics can be computed directly. Typical matrix factorization approaches include NMF [20], PMF [4], SVD++ [21], and MMMF [22].

The above matrix factorization based recommendation algorithms generally learn the latent feature vectors of users and items from users’ explicit feedback (i.e., users’ ratings on items). Explicit feedback, however, may not always be available since it is difficult to collect. It is therefore more common for real-world recommender systems to represent users’ preferences using implicit feedback (e.g., clicks, bookmarks, and purchases), which is relatively easy to obtain. However, only positive instances are observed in implicit feedback, and negative instances and missing values are mixed together, which makes personalized recommendation with implicit feedback more challenging. Collaborative filtering with implicit feedback is referred to as the One-Class Collaborative Filtering (OCCF) problem [5, 6]. To solve the OCCF problem, Pan et al. [5] and Hu et al. [6] proposed a Weighted Regularized Matrix Factorization (WRMF) method. WRMF treats all missing entries as negative instances and assigns varying confidence to positive and negative instances. Rendle et al. [7] modeled the rankings of feedback and proposed a Bayesian Personalized Ranking (BPR) criterion for recommendation systems based on implicit feedback. Pan et al. [23] extended BPR and proposed the Group Bayesian Personalized Ranking (GBPR) by introducing richer interactions among users. GBPR aggregates the features of similar users in groups to reduce sampling uncertainty. In [8], Zhao et al. proposed the Social Bayesian Personalized Ranking (SBPR) model, which integrates social connections with users’ implicit feedback to estimate users’ rankings of items. In [24], Zhao et al. utilized the cross-region community matching technique to generate personalized locations of local interest for users. In particular, they proposed the Bayesian probabilistic tensor factorization with social and location regularization (BPTFSLR) framework to extract users’ latent social dimensions from users’ implicit feedback. It must be noted that recommender systems with implicit feedback are more practical than those with explicit feedback. Therefore, recent research on recommendation has shifted towards learning users’ hidden preferences from implicit feedback, rather than inferring users’ tastes from explicit feedback.

##### 2.2. Neural Networks-Based Recommendation Approaches

Recently, much research has employed neural network techniques to design recommendation algorithms, because neural networks are able to effectively capture the non-linear and non-trivial user-item interaction relationships [25] and extract deep and abstract feature representations for users and items, leading to large improvements in recommendation quality. Representatives of neural network-based recommendation algorithms include Wide & Deep Learning [26], NCF [9], NFM [27], AutoRec [28], CDAE [29], and ConvMF [30].

Among deep neural network techniques, the multilayer perceptron (MLP) [12] is able to approximate measurable functions and is widely adopted in recommender systems. To provide App recommendation in Google Play, Cheng et al. [26] presented the Wide & Deep Learning approach, which consists of a wide learning model and a deep learning model. The wide learning component is a single layer perceptron and can effectively memorize feature interactions using cross-product feature transformations. The deep learning component applies the MLP to generalize to unobserved feature interactions through low-dimensional embeddings. In [9], He et al. proposed a general framework named neural collaborative filtering (NCF) for CF based on neural networks. Specifically, NCF leverages a multilayer perceptron to learn the user-item interaction function, which endows NCF with a high level of nonlinearity. Under the NCF framework, three instantiations of NCF are presented, i.e., GMF, MLP, and NeuMF. GMF employs a linear kernel to model the latent feature interactions, and MLP utilizes a nonlinear kernel to learn the user-item interaction function. Based on GMF and MLP, NeuMF unifies the linearity of GMF and the nonlinearity of MLP for modelling the complex interactions between users and items. Furthermore, Wang et al. [31] extended NCF to solve the cross-domain social recommendation problem (i.e., recommending the relevant items of information domains to the potential users of social networks) and proposed a neural social collaborative ranking (NSCR) approach. NSCR enhances NCF by plugging a pairwise pooling operation on top of the embedding vectors and utilizes the graph regularization technique to model the cross-domain social relations. In addition, in order to simultaneously model the low-order and the high-order feature interactions, Guo et al. [32] proposed an end-to-end model, named DeepFM, which seamlessly fuses the factorization machine (FM) [33] and MLP. Similar to DeepFM, which employs FM and MLP for recommendation, He et al. [27] proposed the neural factorization machine (NFM) for prediction under sparse settings. Unlike other MLP-based methods, NFM introduces a Bi-Interaction pooling component on top of the embedding vectors, which captures the second-order feature interactions at the low level and greatly facilitates the following hidden layers of NFM in learning the high-order feature interactions. An extension of NFM, called AFM, is also proposed in [34]. AFM takes the importance of different feature interactions into consideration and learns the importance of each feature interaction via a neural attention network.
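The contrast between the GMF and MLP instantiations of NCF described above can be illustrated with a minimal sketch. It assumes the common formulation in which GMF scores a pair by a weighted element-wise product of the two embeddings and MLP scores it by feeding the concatenated embeddings through ReLU layers; the function names, toy embeddings, and weights are illustrative assumptions, not taken from [9].

```python
import math

def sigmoid(x):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gmf_score(p_u, q_i, h):
    """GMF: weighted sum of the element-wise product of the two embeddings."""
    return sigmoid(sum(hk * pk * qk for hk, pk, qk in zip(h, p_u, q_i)))

def mlp_score(p_u, q_i, layers, h):
    """MLP: concatenate the embeddings, apply ReLU layers, then a linear output."""
    z = list(p_u) + list(q_i)  # concatenation of user and item embeddings
    for weights, biases in layers:
        z = [max(0.0, sum(w * x for w, x in zip(row, z)) + b)
             for row, b in zip(weights, biases)]
    return sigmoid(sum(hk * zk for hk, zk in zip(h, z)))

# Toy embeddings and weights (illustrative values only).
p_u, q_i = [1.0, 2.0], [3.0, 4.0]
gmf = gmf_score(p_u, q_i, h=[1.0, 1.0])          # sigmoid(1*3 + 2*4)
layers = [([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, -1.0]], [0.0, 0.0])]
mlp = mlp_score(p_u, q_i, layers, h=[1.0, 1.0])  # hidden layer is [relu(1), relu(-4)]
```

With all output weights set to one, the GMF score reduces to the sigmoid of the plain inner product, which is the sense in which GMF generalizes matrix factorization.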

As one of the core components of deep neural networks, the autoencoder [35] is able to reconstruct inputs in the output layer via a low-dimensional hidden space; accordingly, some researchers have employed the autoencoder technique in recommender systems to improve the recommendation performance. For example, Sedhain et al. [28] utilized the autoencoder paradigm to make recommendations and proposed AutoRec. Specifically, AutoRec takes the user-partial observed vectors or the item-partial vectors as inputs and embeds them into a low-dimensional hidden space. Finally, AutoRec reconstructs inputs in the output layer by directly optimizing Root Mean Square Error (RMSE). According to the types of inputs, AutoRec includes two variants: U-AutoRec and I-AutoRec, which take the user-partial observed vectors and the item-partial vectors as inputs, respectively. Compared to the classical autoencoder technique, denoising autoencoder (DAE) [36] techniques are able to discover more robust representations and avoid learning an identity function. To take advantage of DAE, several studies employ DAE techniques for CF [37, 38]. Li et al. [38] proposed the deep collaborative filtering framework (DCF), which unifies deep learning models with MF based CF. The key deep learning model used in DCF is the marginalized denoising auto-encoder (mDA) [39], which is more computationally efficient and has a closed-form solution for learning model parameters. Moreover, Wu et al. [29] utilized the idea of DAE and proposed the collaborative denoising autoencoder (CDAE) for CF. The key difference between CDAE and the above DAE-based CF methods is that CDAE considers the personalized user factors by encoding a latent vector for each user, which greatly improves the recommendation performance. The above DAE-based recommendation methods [29, 37, 38] assume that the observed user-item interactions are a corrupted version of the user’s full preferences and learn the latent representations of the corrupted user-item preferences, which can be used to reconstruct users’ full preferences. In contrast, Wang et al. [40] proposed a hierarchical Bayesian model, called CDL. CDL integrates the stacked denoising autoencoder (SDAE) [41] into PMF [4]; it uses SDAE to learn deep representations of item content and jointly utilizes PMF to perform collaborative filtering on the rating matrix. Note that CDL takes the item features as inputs for SDAE, while other autoencoder-based methods (i.e., AutoRec, DCF, and CDAE) take user feedback as inputs. In other words, CDL utilizes the deep learning component to model the auxiliary information rather than user behaviors.

The convolutional neural network (CNN) [42] is a specialized kind of feedforward neural network for processing data with grid-like topology. CNN-based recommendation methods usually utilize CNN to extract deep and abstract feature representations [30, 43–45] from images, audio, and text information. Wang et al. [46] proposed a visual content enhanced point-of-interest (POI) recommendation method (VPOI). VPOI incorporates the visual features extracted via CNN into PMF for learning the latent features of users and POIs. He et al. [44] proposed a visual Bayesian personalized ranking (VBPR) algorithm that incorporates the visual features learned from the product images by CNN into MF. In [45], Oord et al. utilized CNN to extract the latent features from music audio for music recommendation. Gong et al. [43] formulated the hashtag recommendation task as a multiclass classification problem and proposed an attention based CNN architecture for hashtag recommendation. Zheng et al. [47] proposed the deep cooperative neural network (DeepCoNN). DeepCoNN consists of two parallel CNNs coupled together by a shared common layer to model the user behaviors and the item properties from reviews. Moreover, Kim et al. [30] integrated CNN into probabilistic matrix factorization and proposed the convolutional matrix factorization for recommendation (ConvMF). ConvMF utilizes CNN to capture the contextual information of item content and further enhance the rating prediction accuracy. Other neural network-based approaches include the recurrent neural network (RNN) based methods [48–50], the restricted Boltzmann machine (RBM) based methods [19, 51], and the generative adversarial network based methods [52].

There are also other cross-domain multimodal research developments on recommendation systems that provide interesting and insightful discussions to guide future directions. As an example, Nie et al. [53] proposed a scheme to re-rank web images for complex queries from probabilistic perspective. Specifically, they first proposed a heuristic approach to detect noun-phrase based visual concepts from complex query. Then, they proposed a heterogeneous network to automatically estimate the relevance score of each image, which jointly integrates three layer relationships, spanning from semantic level to visual level. Nie et al. [54] focused on a challenging image search performance prediction problem. By analyzing Normalized Discounted Cumulative Gain (NDCG) and Average Precision (AP), they found that only the prediction of the images’ relevance probabilities was required to compute their mathematical expectations. Therefore a query-adaptive graph-based learning approach was proposed to estimate the relevance probability of each image to a given query.

The works that are most related to ours are BPR [7] and NCF [9]. The major differences between BPR and NRPFM lie in three aspects. First, BPR is designed to learn latent user and item feature vectors from implicit binary feedback, while NRPFM is designed to learn latent user and item representations from implicit frequency feedback, which is more practical in recommendation scenarios. Second, BPR uses a linear function (i.e., inner product) to model the interactions between user latent features and item latent features. By contrast, NRPFM uses a nonlinear function learned by a multilayer perceptron to model the complex interactions between latent user features and latent item features, which endows NRPFM with a high level of nonlinearity. Third, BPR essentially assumes that users’ implicit feedback follows the Gaussian distribution, which does not fit the heavily skewed frequency data well, whereas NRPFM assumes that users’ implicit feedback follows the Poisson distribution, which is more suitable for fitting the skewed frequency feedback. In addition, although both NCF and NRPFM leverage the multilayer perceptron to model the nonlinear interactions between latent user features and latent item features, NRPFM also differs from NCF in three aspects. First, similar to BPR, NCF is designed for implicit binary feedback, which is a simplified form of implicit frequency feedback. In contrast, NRPFM directly learns users’ preferences and items’ characteristics from the implicit frequency feedback. Second, NCF leverages a point-wise method to learn latent user and item feature vectors, while NRPFM utilizes a pair-wise method. For NCF, only the observed feedback contributes to the learning of latent user and item feature vectors. In contrast, NRPFM makes the missing values also contribute to learning latent user and item features. Hence, NRPFM to some extent better alleviates the data sparsity problem. Third, NCF also assumes that users’ implicit feedback follows the Gaussian distribution, while NRPFM assumes that users’ implicit feedback follows the Poisson distribution.

Our proposed Neural Personalized Ranking via Poisson Factor Model is an important attempt in this direction. Moreover, we consider more practical recommendation scenarios, in which users express their preferences through implicit frequency feedback rather than through simplified implicit binary feedback. Furthermore, although it is straightforward to use deep learning techniques to extend personalized ranking recommendation models (i.e., BPR), such a scheme is not able to infer latent user preferences and item characteristics from implicit frequency feedback, because traditional ranking based personalized recommendation models are essentially designed to deal with implicit binary feedback. By contrast, our proposed deep learning based personalized ranking model is built on top of BPR, the Poisson factor model, and deep learning techniques, and unifies the strengths of these models to more accurately learn latent user and item features from implicit frequency feedback. In short, our contributions are three-fold. First, we use the Poisson factor model to model users’ implicit frequency feedback. Second, we utilize the nonlinear function learned by the deep learning algorithm to model the complex interactions between latent user feature vectors and latent item feature vectors. Finally, in order to train the recommendation model, we sample a set of triplets with partial order from implicit frequency feedback, which alleviates the data sparsity problem.

#### 3. Preliminary Knowledge

In this section, we introduce the preliminary knowledge related to our proposed neural personalized ranking based recommendation algorithm. We first describe the recommendation problem in Section 3.1. Then, we briefly introduce the Poisson factor model in Section 3.2.

##### 3.1. Problem Description

In typical recommender systems with implicit frequency feedback, users’ implicit feedback is used to construct the user-item interaction matrix $F \in \mathbb{R}^{m \times n}$, which is comprised of two entity sets: the set of $m$ users $\mathcal{U}$ and the set of $n$ items $\mathcal{I}$. Each entry $f_{ui}$ of $F$ represents the number of interactions between user $u$ and item $i$. The number of interactions reflects the users’ preferences for certain items. Note that, in recommender systems with explicit feedback, the feedback $f_{ui}$ usually indicates the value of the rating on item $i$ given by user $u$. Ratings are usually integers and fall into a small range such as $[0, 5]$, in which $0$ indicates the missing value, since the user has not yet rated that item. In recommender systems with implicit frequency feedback, however, the feedback $f_{ui}$ has a larger range compared with ratings. For example, on shopping websites, a user may click hundreds of times on some products, while clicking only a few times on other products. Moreover, $f_{ui} = 0$ is a mixture of the missing value and the negative instance in the implicit scenario, which indicate that the user is not aware of the item and that the user does not like it, respectively. The set of items interacting with the user $u$ is denoted as $\mathcal{I}_u$ ($\mathcal{I}_u \subseteq \mathcal{I}$). In practice, the user-item interaction matrix is generally very sparse with many unknown entries, since a typical user may have only interacted with a tiny percentage of items. This sparse nature of the user-item interaction matrix leads to poor recommendation quality. In this paper, we use “feedback” and “interaction” interchangeably.
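To make the interaction matrix described above concrete, the following minimal sketch builds sparse frequency feedback from a hypothetical click log and contrasts it with the binarized form used by binary implicit-feedback methods. The log format and the function names `build_frequency_feedback` and `binarize` are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def build_frequency_feedback(log):
    """Count interactions per (user, item) pair from a list of (user, item) events."""
    counts = Counter(log)  # (user, item) -> number of interactions
    feedback = defaultdict(dict)
    for (user, item), freq in counts.items():
        feedback[user][item] = freq
    return {user: dict(items) for user, items in feedback.items()}

def binarize(feedback):
    """Collapse every observed frequency to 1, discarding how often a user interacted."""
    return {u: {i: 1 for i in items} for u, items in feedback.items()}

# Hypothetical click log: user "u1" clicks item "a" three times and item "b" once.
log = [("u1", "a"), ("u1", "a"), ("u1", "b"), ("u1", "a"), ("u2", "b")]
freq = build_frequency_feedback(log)
binary = binarize(freq)
# Set of observed items per user; every other item is missing or negative.
observed_items = {u: set(items) for u, items in freq.items()}
```

After binarization, items "a" and "b" look identical for user "u1", although the frequencies suggest a stronger preference for "a".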

The task of recommender systems with implicit frequency feedback is to learn the users’ hidden preferences by utilizing users’ interaction history and provide them with the ranked lists of items that user may be interested in.

##### 3.2. Poisson Factor Model

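Poisson factor model (PFM) [10] is a generative probabilistic model, which assumes that each observed element $f_{ui}$ of the user-item interaction matrix $F \in \mathbb{R}^{m \times n}$ follows the Poisson distribution with the expectation $y_{ui}$. The expected value matrix $Y \in \mathbb{R}_{+}^{m \times n}$ is factorized into the user latent feature matrix $U \in \mathbb{R}_{+}^{m \times d}$ and the item latent feature matrix $V \in \mathbb{R}_{+}^{n \times d}$, i.e., $Y = UV^{T}$. Besides assuming that the Poisson distribution generates the observed elements, PFM places Gamma priors over $U$ and $V$,

$$p(U \mid \alpha, \beta) = \prod_{u=1}^{m} \prod_{k=1}^{d} \frac{\beta_k^{\alpha_k} U_{uk}^{\alpha_k - 1} \exp(-\beta_k U_{uk})}{\Gamma(\alpha_k)}, \qquad p(V \mid \alpha, \beta) = \prod_{i=1}^{n} \prod_{k=1}^{d} \frac{\beta_k^{\alpha_k} V_{ik}^{\alpha_k - 1} \exp(-\beta_k V_{ik})}{\Gamma(\alpha_k)},$$

where $\Gamma(\cdot)$ is the Gamma function, and $\alpha_k$ and $\beta_k$ are the shape and rate parameters of the Gamma distribution, respectively.

The generative process of an observed element $f_{ui}$ is as follows.

(1) For each user $u$, generate each component of the user latent feature vector: $U_{uk} \sim \mathrm{Gamma}(\alpha_k, \beta_k)$.

(2) For each item $i$, generate each component of the item latent feature vector: $V_{ik} \sim \mathrm{Gamma}(\alpha_k, \beta_k)$.

(3) Generate $f_{ui} \sim \mathrm{Poisson}(y_{ui})$, where $y_{ui} = \sum_{k=1}^{d} U_{uk} V_{ik}$.

The posterior distribution of $U$ and $V$ given the user-item interaction matrix $F$ is as follows:

$$p(U, V \mid F, \alpha, \beta) \propto p(F \mid Y)\, p(U \mid \alpha, \beta)\, p(V \mid \alpha, \beta), \qquad p(F \mid Y) = \prod_{u=1}^{m} \prod_{i=1}^{n} \frac{y_{ui}^{f_{ui}} \exp(-y_{ui})}{f_{ui}!},$$

where $\alpha = \{\alpha_1, \ldots, \alpha_d\}$, $\beta = \{\beta_1, \ldots, \beta_d\}$.

Maximizing the log-posterior distribution results in the following objective function:

$$\mathcal{L}(U, V; F) = \sum_{u=1}^{m} \sum_{k=1}^{d} \left( (\alpha_k - 1) \ln U_{uk} - \beta_k U_{uk} \right) + \sum_{i=1}^{n} \sum_{k=1}^{d} \left( (\alpha_k - 1) \ln V_{ik} - \beta_k V_{ik} \right) + \sum_{u=1}^{m} \sum_{i=1}^{n} \left( f_{ui} \ln y_{ui} - y_{ui} \right) + \mathrm{const}.$$

PFM applies the stochastic gradient descent (SGD) algorithm to learn the user latent feature matrix $U$ and the item latent feature matrix $V$.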
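As a rough illustration of the training procedure, the sketch below evaluates the PFM log-posterior (up to an additive constant) and performs stochastic gradient ascent on it. It assumes Gamma priors in shape/rate form, sums the Poisson term only over the entries stored in the dictionary `F`, applies the prior gradient once per visited entry for simplicity, and clips updates to keep the factors positive. These choices, the function names, and the toy data are assumptions, not the authors' implementation.

```python
import math

def pfm_log_posterior(F, U, V, alpha, beta):
    """Log-posterior of PFM up to a constant, with Gamma(shape, rate) priors."""
    total = 0.0
    for M in (U, V):
        for row in M:
            for k, x in enumerate(row):
                total += (alpha[k] - 1.0) * math.log(x) - beta[k] * x
    for (u, i), f in F.items():
        y = sum(a * b for a, b in zip(U[u], V[i]))  # Poisson mean y_ui
        total += f * math.log(y) - y
    return total

def sgd_step(F, U, V, alpha, beta, lr=0.01, eps=1e-6):
    """One pass of stochastic gradient ascent over the stored entries of F."""
    for (u, i), f in F.items():
        y = sum(a * b for a, b in zip(U[u], V[i]))
        for k in range(len(alpha)):
            uk, vk = U[u][k], V[i][k]
            grad_u = (alpha[k] - 1.0) / uk - beta[k] + (f / y - 1.0) * vk
            grad_v = (alpha[k] - 1.0) / vk - beta[k] + (f / y - 1.0) * uk
            U[u][k] = max(eps, uk + lr * grad_u)  # clip to keep factors positive
            V[i][k] = max(eps, vk + lr * grad_v)

# Toy frequency feedback: (user index, item index) -> interaction count.
F = {(0, 0): 5, (0, 1): 1, (1, 0): 4}
U = [[0.5, 0.5], [0.5, 0.5]]
V = [[0.5, 0.5], [0.5, 0.5]]
alpha, beta = [2.0, 2.0], [1.0, 1.0]
before = pfm_log_posterior(F, U, V, alpha, beta)
for _ in range(200):
    sgd_step(F, U, V, alpha, beta)
after = pfm_log_posterior(F, U, V, alpha, beta)
```

Because the Poisson term here only visits stored entries, unobserved pairs do not influence the factors, which mirrors the point-wise limitation under sparse feedback.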

#### 4. Our Approach

Our motivation is to learn user preferences and item characteristics from implicit frequency feedback, as well as to model the complex structure of user interactions via deep learning. To this end, we do not directly fit the observed implicit frequency feedback, but fit the partial orders of user preferences for items embedded in the triplets, which are sampled from implicit frequency feedback according to our assumption. To endow the ranking-based Poisson factor model with a high level of nonlinearity, we use the multilayer perceptron to learn the nonlinear and nontrivial user-item interaction relationships. In the following sections, we elaborate on our proposed neural personalized ranking model, which integrates neural networks into the ranking-based Poisson factor model for collaborative filtering with implicit frequency feedback.

##### 4.1. Ranking-Based Poisson Factor Model

In practical recommender systems, the implicit interactions between users and items usually take the form of frequency data, which reflects the degree of the user’s preferences for items. The larger the number of interactions, the stronger the preference. Traditional recommendation models generally simplify the implicit frequency feedback to binary feedback, which to some extent leads to information loss. The Poisson factor model [10] replaces the usual Gaussian likelihood in probabilistic matrix factorization with the Poisson likelihood, which guarantees the nonnegativity of latent factors. Moreover, as reported in [10], the Poisson factor model is suitable for modelling frequency data, which displays properties similar to those of implicit interactions. Hence, we adopt the Poisson factor model to quantify users’ interaction behaviors and learn the latent user features and the item features from implicit frequency feedback. Similar to classical matrix factorization models [3, 4], the Poisson factor model adopts the point-wise regression method to learn the latent representations for users and items from the observed feedback. In this sense, only the observed feedback contributes to the learning of the latent user features and the item features. Due to the data sparsity issue that commonly exists in recommender systems, the available observed feedback is not sufficient for the Poisson factor model to learn the latent features, which degrades the performance of recommender systems.

Bayesian personalized ranking (BPR) [7] is a popular pair-wise learning method for collaborative filtering with binary feedback and has been widely adopted in many recommendation models [8, 23]. BPR learns the latent user and item features by optimizing the Bayesian pairwise ranking criterion. Unlike point-wise learning methods, BPR assumes that users prefer observed items over nonobserved items. In effect, BPR makes the missing values contribute to the training of the recommendation model. Hence, this pair-wise learning method partially alleviates the data sparsity problem.

To tackle the sparsity of implicit frequency feedback, we propose a ranking-based Poisson factor model, which combines the Poisson factor model and Bayesian personalized ranking. The ranking-based Poisson factor model adopts a pair-wise learning method to learn the rankings of preferences between items. Specifically, we assume that a user's preference for an item increases with the number of interactions. This assumption has three implications: (i) an observed item is ranked higher than a nonobserved item; (ii) if two items are both observed, the user prefers the one with the larger number of interactions over the one with fewer interactions; and (iii) for two nonobserved items, we cannot infer the order of preference. Let $f_{ui}$ denote the number of interactions between user $u$ and item $i$, and $f_{uj}$ that between user $u$ and item $j$. If $f_{ui} > 0$ and $f_{uj} = 0$, then user $u$ prefers item $i$ over item $j$; i.e., $i >_u j$, where $>_u$ indicates the preference relationship between user $u$ and items. If both items are observed and $f_{ui} - f_{uj} > \delta$, then $i >_u j$, where $\delta$ denotes a threshold parameter. In other words, if the difference between $f_{ui}$ and $f_{uj}$ surpasses the threshold $\delta$, we assume that the preference for item $i$ is ranked higher than the preference for item $j$. Formally, the training set $D$ (i.e., the set of triplets $(u, i, j)$) is defined as follows:

$$D = \{(u, i, j) \mid (f_{ui} > 0 \wedge f_{uj} = 0) \vee (f_{uj} > 0 \wedge f_{ui} - f_{uj} > \delta)\} \tag{4}$$
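The construction of the training set can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the dictionary layout `freq[user][item]`, the function name `build_triplets`, and the strict comparison against the threshold `delta` are assumptions.

```python
from itertools import permutations


def build_triplets(freq, all_items, delta=0):
    """Enumerate (user, preferred_item, other_item) triplets.

    freq: dict user -> dict item -> interaction count (observed items only).
    all_items: iterable of every item id.
    delta: threshold on the count difference between two observed items.
    """
    triplets = []
    for user, counts in freq.items():
        observed = set(counts)
        unobserved = [j for j in all_items if j not in observed]
        # Case 1: an observed item is preferred over every nonobserved item.
        for i in sorted(observed):
            for j in unobserved:
                triplets.append((user, i, j))
        # Case 2: both observed; prefer i when its count exceeds j's by more than delta.
        for i, j in permutations(sorted(observed), 2):
            if counts[i] - counts[j] > delta:
                triplets.append((user, i, j))
        # Case 3: two nonobserved items give no ordering, so no triplet is emitted.
    return triplets


freq = {"u1": {"a": 5, "b": 1}}
triplets = build_triplets(freq, all_items=["a", "b", "c"], delta=2)
print(triplets)
```

Enumerating every triplet is only feasible for toy data; in practice triplets would be sampled, as described in Section 4.3.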

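Given the preference relationships $>_u$ between user $u$ and items, we maximize the posterior probability to learn the latent user and item features. Let $\Theta$ denote the model parameters; i.e., $\Theta = \{U, V\}$, where $U$ and $V$ are the latent user and item feature matrices. Through Bayesian inference, the posterior probability of $U$ and $V$ can be obtained as follows:

$$p(\Theta \mid >_u) \propto p(>_u \mid \Theta)\, p(\Theta) \tag{5}$$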

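All users are presumed to be independent of each other, and the preference ranking of one (user, item) pair for a specific user is also assumed to be independent of the rankings of other (user, item) pairs. As a result, the likelihood function for all the users is formulated as

$$\prod_{u \in \mathcal{U}} p(>_u \mid \Theta) = \prod_{(u,i,j) \in \mathcal{U} \times \mathcal{I} \times \mathcal{I}} p(i >_u j \mid \Theta)^{\mathbb{I}((u,i,j) \in D)} \bigl(1 - p(i >_u j \mid \Theta)\bigr)^{\mathbb{I}((u,j,i) \notin D)} \tag{6}$$

where $\mathcal{U}$ and $\mathcal{I}$ denote the sets of users and items, and $\mathbb{I}(\cdot)$ is the indicator function. Based on the totality and antisymmetry properties of the preference relationship, (6) is rewritten as

$$\prod_{u \in \mathcal{U}} p(>_u \mid \Theta) = \prod_{(u,i,j) \in D} p(i >_u j \mid \Theta) \tag{7}$$

where $p(i >_u j \mid \Theta)$ denotes the probability that user $u$ prefers item $i$ over item $j$ and is defined as

$$p(i >_u j \mid \Theta) = \sigma\bigl(\hat{x}_{uij}(\Theta)\bigr) \tag{8}$$

where $\sigma(x) = 1 / (1 + e^{-x})$ is the logistic sigmoid function. $\hat{x}_{uij}(\Theta)$ is a real-valued function of the model parameters $\Theta$ that captures the relationship between user $u$, item $i$, and item $j$ and is defined as follows:

$$\hat{x}_{uij}(\Theta) = U_u^{\top} V_i - U_u^{\top} V_j \tag{9}$$

where $U_u$ and $V_i$ denote the latent feature vectors of user $u$ and item $i$, respectively.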

In addition, Gamma priors are assumed for the latent user and item feature vectors:

Substituting the likelihood function defined in (7) and the model parameter priors defined in (10) into (5) and maximizing the log of the posterior probability, we obtain the objective function of the ranking-based Poisson factor model as follows:
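To make the structure of such an objective concrete, the following sketch evaluates a log-posterior of this kind for given latent factors. It assumes an inner-product score difference as in (9), a logistic link, and independent Gamma priors in the shape-rate parameterization (log density proportional to $(\alpha - 1)\ln x - \beta x$). The function names and the toy numbers are illustrative, and the exact constants of the original objective are not reproduced.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def gamma_log_prior(matrix, alpha, beta):
    # Sum of (alpha - 1) * ln(x) - beta * x over all entries (constants dropped).
    return sum((alpha - 1.0) * math.log(x) - beta * x for row in matrix for x in row)


def rpfm_log_posterior(U, V, triplets, alpha, beta):
    """Pairwise log-likelihood plus Gamma log-priors, up to an additive constant."""
    ll = sum(log_sigmoid(dot(U[u], V[i]) - dot(U[u], V[j])) for u, i, j in triplets)
    return ll + gamma_log_prior(U, alpha, beta) + gamma_log_prior(V, alpha, beta)


U = [[1.0, 0.5]]                 # one user, two latent factors
V = [[2.0, 1.0], [0.5, 0.5]]     # two items
good = rpfm_log_posterior(U, V, [(0, 0, 1)], alpha=2.0, beta=0.2)
bad = rpfm_log_posterior(U, V, [(0, 1, 0)], alpha=2.0, beta=0.2)
print(good > bad)
```

The triplet that agrees with the current factors (item 0 scored above item 1) yields the larger value, which is the behavior a pair-wise ranking objective is meant to reward.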

It should be noted that both PFM and the ranking-based PFM assume that each observed element in the user-item interaction matrix follows a Poisson distribution, and both place Gamma priors over each entry of the latent user and item feature matrices. Both are generative probabilistic models, and the ranking-based PFM is an extension of PFM. The difference between them is that PFM is a point-wise recommendation model, whereas the ranking-based PFM adopts a pair-wise method to learn the model parameters. For PFM, only observed feedback contributes to learning the model parameters; for the ranking-based PFM, both observed feedback and missing values contribute.

##### 4.2. The Architecture of Neural Personalized Ranking Model

As shown in (9), the ranking-based Poisson factor model uses a linear function (i.e., the inner product) to model the interactions between the user latent features and the item latent features. As reported in [9], the expressiveness of a linear function is limited and may not be sufficient to capture the complex structure of users' interactions, which hinders the performance of the ranking-based Poisson factor model. To capture this complex structure, we develop a neural personalized ranking model on top of the ranking-based Poisson factor model, called NRPFM. This model integrates neural networks with the ranking-based Poisson factor model, which endows the linear ranking-based Poisson factor model with a high level of nonlinearity. In our neural personalized ranking model, we use a multilayer perceptron (MLP) [12] to learn the nonlinear and nontrivial user-item interaction relationships. Figure 1 presents the architecture of our proposed model, which consists of two branches: the left branch predicts the score for the positive (user, item) pair, and the right branch predicts the score for the negative pair. Each branch includes four layers: an embedding layer, a merge layer, hidden layers, and a prediction layer.

**Embedding Layer**. The embedding layer maps users and items into a low-dimensional latent space and represents them with compact, dense real-valued vectors instead of sparse, high-dimensional vectors.

The input of our model is a triplet $(u, i, j)$ that contains the index of user $u$ and the corresponding ranked item pair $(i, j)$. After one-hot encoding the user and item indexes, we obtain the sparse representations of users and items. Then, we use an embedding table lookup to obtain the embeddings of user $u$ and items $i$ and $j$. Formally,

$$U_u = U^{\top} e_u, \qquad V_i = V^{\top} e_i, \qquad V_j = V^{\top} e_j$$

where $e$ indicates the result of one-hot encoding for a user or an item, and $U$ and $V$ are the user embedding matrix and the item embedding matrix, respectively. The embedding matrices and the latent feature matrices derived from matrix factorization models have the same semantics. Hence, $U_u$ represents the latent feature vector of user $u$ and $V_i$ indicates the latent feature vector of item $i$.
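The equivalence between multiplying by a one-hot vector and looking up a row of the embedding table can be checked directly. The sketch below uses plain Python lists and made-up embedding values; `one_hot` and `embed` are illustrative helper names.

```python
def one_hot(index, size):
    return [1.0 if k == index else 0.0 for k in range(size)]


def embed(one_hot_vec, table):
    """Multiply a one-hot row vector by an embedding table (one row per entity)."""
    dim = len(table[0])
    return [
        sum(one_hot_vec[r] * table[r][c] for r in range(len(table)))
        for c in range(dim)
    ]


item_table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 items, embedding size 2
via_product = embed(one_hot(2, 3), item_table)
via_lookup = item_table[2]
print(via_product == via_lookup)
```

Because the product only selects one row, implementations typically skip the one-hot vector and index the table directly.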

Above the embedding layer, the merge layer concatenates the user embedding and the item embedding for each (user, item) pair. The concatenation jointly encodes the user preferences and the item characteristics. We then feed the concatenated embedding into the hidden layers. This design of the merge layer is widely adopted in multilayer perceptron (MLP) based recommendation methods [9, 26].

**Hidden Layers and Prediction Layer**. Since the simple concatenation of embeddings does not account for any interactions between the user latent features and the item latent features, we utilize a multilayer perceptron (MLP) to learn the user-item interaction relationships, which endows our model with a high level of nonlinearity. The MLP stacks multiple fully connected hidden layers, where each hidden layer nonlinearly transforms the output of the previous hidden layer via a weight matrix and an activation function and feeds its output into the following hidden layer. The entire MLP adopts a tower structure, where the size of layer $l$ is half the size of layer $l-1$. Based on this structure, higher hidden layers are able to learn more abstract features for users and items.

The prediction layer is connected to the last hidden layer and takes ReLU as the activation function to predict the scores on items. Formally, the prediction score $\hat{y}_{ui}$ on item $i$ given by user $u$ is defined as follows:

$$z_0 = [U_u; V_i], \qquad z_l = a_l(W_l z_{l-1} + b_l), \quad l = 1, \dots, L, \qquad \hat{y}_{ui} = \mathrm{ReLU}(h^{\top} z_L)$$

where $h$ is the weight vector of the prediction layer. $W_l$, $b_l$, and $a_l$ denote the weight matrix, the bias vector, and the activation function for the $l$th hidden layer, respectively. $z_0$ is the concatenation of the user embedding $U_u$ and the item embedding $V_i$. The prediction score $\hat{y}_{uj}$ on item $j$ given by user $u$ is computed in the same way. We adopt ReLU as the activation function for the hidden layers, because other activation functions, such as sigmoid and tanh, suffer from saturation and may lead to overfitting.
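A forward pass through one branch can be sketched as follows. This is a dependency-free illustration with hand-picked weights, not the trained model: the helper names (`relu`, `dense`, `predict_score`) and the tiny layer sizes are assumptions, and the output activation follows the ReLU choice stated above.

```python
def relu(vec):
    return [max(0.0, x) for x in vec]


def dense(weights, bias, vec):
    """Affine map; weights is a list of rows, one row per output unit."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def predict_score(user_emb, item_emb, hidden_layers, out_weights):
    """Concatenate embeddings, apply ReLU hidden layers, then the prediction layer."""
    z = user_emb + item_emb  # merge layer: list concatenation
    for weights, bias in hidden_layers:
        z = relu(dense(weights, bias, z))
    return max(0.0, sum(w * x for w, x in zip(out_weights, z)))


# Two hidden layers of sizes 2 and 1 (each half of the previous) on a 4-dim input.
hidden = [
    ([[0.5, 0.5, 0.5, 0.5], [1.0, -1.0, 0.0, 0.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.1]),
]
score_pos = predict_score([1.0, 0.0], [1.0, 1.0], hidden, out_weights=[2.0])
score_neg = predict_score([1.0, 0.0], [0.0, 0.0], hidden, out_weights=[2.0])
print(score_pos, score_neg, score_pos - score_neg)
```

Both branches share the same weights; only the item embedding differs, so the difference of the two scores can be passed to the pair-wise ranking criterion.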

##### 4.3. Model Learning

The prediction scores reflect users' preferences for items. After obtaining the prediction scores for the positive (user, item) pair and the negative (user, item) pair, i.e., $\hat{y}_{ui}$ and $\hat{y}_{uj}$, the real-valued function $\hat{x}_{uij}$ is rewritten as follows:

$$\hat{x}_{uij} = \hat{y}_{ui} - \hat{y}_{uj}$$

Integrating the above relationship function with the ranking-based Poisson factor model, the objective function of our neural personalized ranking model is defined as follows:

We initialize the user embedding matrix $U$ and the item embedding matrix $V$ with the Gamma distribution, which is parameterized by the shape parameter $\alpha$ and the rate parameter $\beta$. At the training stage, following the construction rules of the training set $D$, we uniformly sample triplets in each iteration and control the number of negative items for each positive instance. After shuffling the sampled triplets, we feed them into our neural personalized ranking model in batches. For optimization, we adopt Adam [55] to update the model parameters, since Adam adapts the learning rate during training and thus yields fast convergence.
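The initialization and sampling steps can be sketched with the Python standard library. Note that `random.gammavariate(alpha, beta)` takes a scale as its second argument, so a rate is passed as its reciprocal; whether the original uses a rate or a scale is treated as an assumption here. The function names and the rejection-style negative sampler, which covers only the observed-versus-nonobserved case, are also illustrative.

```python
import random


def init_embeddings(n_rows, dim, shape, rate, rng):
    """Draw every entry from Gamma(shape, rate); gammavariate expects a scale."""
    return [
        [rng.gammavariate(shape, 1.0 / rate) for _ in range(dim)]
        for _ in range(n_rows)
    ]


def sample_triplets(observed, n_items, n_neg, rng):
    """For each observed (user, item) pair, draw n_neg nonobserved items uniformly.

    Assumes every user has at least one nonobserved item.
    """
    triplets = []
    for user, items in observed.items():
        for i in sorted(items):
            for _ in range(n_neg):
                j = rng.randrange(n_items)
                while j in items:  # resample until the item is nonobserved
                    j = rng.randrange(n_items)
                triplets.append((user, i, j))
    rng.shuffle(triplets)
    return triplets


rng = random.Random(0)
U = init_embeddings(n_rows=3, dim=4, shape=2.0, rate=0.2, rng=rng)
observed = {0: {1, 2}, 1: {0}}
batch = sample_triplets(observed, n_items=5, n_neg=8, rng=rng)
print(len(batch), batch[:3])
```

Gamma draws are strictly positive, which keeps the initial embeddings consistent with the nonnegative latent factors of the Poisson factor model.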

#### 5. Experiments

In this section, we conduct several experiments on real datasets to compare the performance of our proposed recommendation algorithm with the state-of-the-art methods.

##### 5.1. Dataset Description

Many datasets are used to evaluate the performance of recommendation algorithms, such as MovieLens (https://grouplens.org/datasets/movielens/), Yelp (https://www.yelp.com/dataset/challenge), Epinions (https://snap.stanford.edu/data/soc-Epinions1.html), and Netflix (https://www.netflixprize.com/). However, most of them consist only of explicit feedback (i.e., ratings). In our experiments, we choose two publicly available datasets (http://www.ntu.edu.sg/home/gaocong/datacode.htm), Foursquare and Gowalla, to evaluate the performance of our proposed method, since they include implicit frequency feedback from users. Foursquare (https://foursquare.com/) and Gowalla (http://gowalla.com/) are two popular location-based social networks (LBSNs), which have recently attracted considerable attention from both industry and academia. In both Foursquare and Gowalla, users express their preferences through check-ins at locations (e.g., restaurants, tourist spots, and stores). In other words, users interact with locations via check-ins. The number of check-ins to some extent indicates the degree of a user's preference for a location.

In the Foursquare dataset, all check-ins were collected within Singapore from Aug. 2010 to Jul. 2011. The check-ins of the Gowalla dataset were made within California and Nevada from Feb. 2009 to Oct. 2010. In both datasets, users who have checked in at fewer than 5 locations have been removed, and locations with fewer than 5 users have been filtered out. The Foursquare dataset contains 194,108 check-in observations from 2,321 users at 5,596 locations. We randomly sample a subset of the original Gowalla dataset for evaluation. The sampled Gowalla dataset includes 242,172 check-in observations from 5,000 users at 23,997 locations. After aggregating the check-in records by user and location identifiers, we obtain 105,764 entries in the user-location interaction matrix of Foursquare and 160,689 entries in that of Gowalla. Computed from these counts, the sparsity of the Foursquare and Gowalla datasets is approximately 99.19% and 99.87%, respectively. Hence, both datasets are very sparse. In the Foursquare dataset, each user checked in at 45.57 locations on average; in Gowalla, each user checked in at 32.13 locations on average.
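The sparsity of each interaction matrix follows directly from the counts reported above, as this short calculation shows (the helper name `sparsity` is illustrative).

```python
def sparsity(n_entries, n_users, n_items):
    """Fraction of empty cells in the user-item interaction matrix."""
    return 1.0 - n_entries / (n_users * n_items)


foursquare = sparsity(105_764, 2_321, 5_596)
gowalla = sparsity(160_689, 5_000, 23_997)
print(f"Foursquare: {foursquare:.2%}, Gowalla: {gowalla:.2%}")
```

Fewer than 1% of the possible user-location pairs are observed in either dataset, which is why methods that also exploit missing values have an advantage here.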

The general statistics of Foursquare and Gowalla are summarized in Table 1.

##### 5.2. Evaluation Metrics

We focus on the recommendation problem with implicit feedback, which is formulated as an item recommendation problem that aims to provide users with the top-$K$ highest-ranked items. Therefore, we employ two widely used ranking metrics to evaluate the performance of different recommendation algorithms, i.e., Precision@$K$ and Recall@$K$, where $K$ is the length of the ranked recommendation list. Given a user $u$, the precision and recall are defined as follows:

$$\mathrm{Precision@}K(u) = \frac{|R_K(u) \cap T(u)|}{K}, \qquad \mathrm{Recall@}K(u) = \frac{|R_K(u) \cap T(u)|}{|T(u)|}$$

where $R_K(u)$ is the top-$K$ recommended item list for user $u$ and $T(u)$ is the list of items visited by user $u$ in the testing set. The final Precision@$K$ and Recall@$K$ of a recommendation algorithm are computed by averaging the precision and recall values over all users, respectively. For both metrics, we set $K \in \{3, 5, 10\}$ to evaluate the performance in our experiments.
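The two metrics can be computed per user and then averaged, as in the following sketch. The function names and the toy rankings are illustrative, and users with an empty test set are given a recall of zero by assumption.

```python
def precision_recall_at_k(ranked_items, test_items, k):
    """Precision@k and Recall@k for one user.

    ranked_items: recommended items, best first.
    test_items: set of items the user visited in the test set.
    """
    hits = len(set(ranked_items[:k]) & set(test_items))
    precision = hits / k
    recall = hits / len(test_items) if test_items else 0.0
    return precision, recall


def mean_metrics(rankings, test_sets, k):
    """Average the per-user metrics over all users that have a ranking."""
    pairs = [precision_recall_at_k(rankings[u], test_sets[u], k) for u in rankings]
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n


rankings = {"u1": ["a", "b", "c", "d"], "u2": ["d", "c", "b", "a"]}
test_sets = {"u1": {"a", "c"}, "u2": {"a"}}
mean_p, mean_r = mean_metrics(rankings, test_sets, k=3)
print(mean_p, mean_r)
```

In this toy case the first user has two hits in the top 3 and the second has none, so the averaged precision is one third and the averaged recall is one half.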

##### 5.3. Compared Approaches

To evaluate the effectiveness of our proposed method, we compare it with the following state-of-the-art approaches:

(1) PMF: this method was proposed by Mnih and Salakhutdinov [4] and can be viewed as a probabilistic extension of the SVD [56] model. PMF represents the latent user and item feature vectors by means of a probabilistic graphical model with Gaussian observation noise.

(2) BPR: BPR adopts a Bayesian personalized ranking criterion [7] for item ranking. It is a pair-wise learning method for the one-class collaborative filtering (OCCF) problem. In our experiments, we employ a uniform sampling strategy to sample the user-item pairs for model training.

(3) MLP: MLP is an instantiation of NCF [9], which concatenates the user and item embeddings and feeds the concatenation into neural networks to model the nonlinear user-item interactions.

(4) NeuMF: NeuMF is another instantiation of NCF and a strong baseline that fuses generalized matrix factorization and the multilayer perceptron to simultaneously model the linear and nonlinear interactions between the latent user features and the latent item features.

(5) PFM: this method was proposed by Ma et al. [10]. PFM focuses on website recommendation and models users' implicit feedback using the Poisson distribution.

(6) NRPFM: NRPFM is described in Section 4. It is a neural personalized ranking model for collaborative filtering with implicit frequency feedback, which integrates the ranking-based Poisson factor model with neural networks.

##### 5.4. Experiment Settings

To make a fair comparison, we set the parameters of each method according to the respective references or based on our experiments. Under these parameter settings, each method achieves its best performance. For PMF, we set the regularization parameters $\lambda_U$ and $\lambda_V$ to be 0.001. For BPR, , we employ a uniform sampling strategy to sample the user-item pairs for model training. For PMF, BPR, and PFM, we set the learning rate involved in the gradient descent algorithm to be 0.0001. For MLP, we adopt the tower structure for neural networks with three hidden layers, where the sizes of the hidden layers are $32 \rightarrow 16 \rightarrow 8$. The embedding size of users and items is equal to the size of the first hidden layer, i.e., 32. For NeuMF, we adopt the default parameter settings of the original paper: the number of hidden layers is set to be 3, and the sizes of each hidden layer are , respectively. The number of negative samples for each positive one is set to be 4, and the embedding size of the generalized matrix factorization is set to be 10. For PFM, . For our proposed NRPFM, we initialize the model parameters using the Gamma distribution with the shape parameter $\alpha$ and the rate parameter $\beta$. We set the number of hidden layers to be 3 and the sizes of each hidden layer to be . Meanwhile, we set the number of negative samples per positive instance to be 8 and adopt Adam with a batch size of 256 as the optimizer.

For each user, we randomly extract 70% of the visited locations as the training set and the remaining 30% of locations as the testing data. We conduct data splitting five times and report the average results on the test sets for each dataset.
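The per-user splitting protocol can be sketched as follows. The function name `split_per_user`, the rounding of the cut point, and the use of one seed per repetition are assumptions made for illustration.

```python
import random


def split_per_user(visits, train_ratio=0.7, seed=0):
    """Randomly split each user's visited locations into train and test sets."""
    rng = random.Random(seed)
    train, test = {}, {}
    for user, locations in visits.items():
        shuffled = sorted(locations)  # sort first so the split depends only on the seed
        rng.shuffle(shuffled)
        cut = int(round(train_ratio * len(shuffled)))
        train[user] = set(shuffled[:cut])
        test[user] = set(shuffled[cut:])
    return train, test


visits = {"u1": set(range(10)), "u2": set(range(20, 30))}
splits = [split_per_user(visits, seed=s) for s in range(5)]  # five random splits
train, test = splits[0]
print(len(train["u1"]), len(test["u1"]))
```

Metrics would then be computed on each of the five test sets and averaged.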

##### 5.5. Recommendation Quality Comparisons

Tables 2 and 3 report the recommendation quality of all the compared methods on Foursquare and Gowalla datasets.

From Tables 2 and 3, we have the following observations. On both datasets, PMF performs the worst among all the compared methods. Besides the data sparsity issue, one possible reason is that PMF assumes that a user's implicit feedback follows the Gaussian distribution, which is not suitable for modeling implicit frequency feedback. BPR achieves better performance than PMF. This is because BPR adopts a pair-wise learning method to infer the latent user and item feature vectors, making missing values contribute to the learning of model parameters. Hence, to some extent, this pair-wise learning method is able to alleviate the data sparsity problem. Although MLP and NeuMF apply point-wise methods to learn the latent representations of users and items, they are generally superior to BPR. The performance improvements of MLP and NeuMF over BPR demonstrate that using neural networks to capture the nonlinear user-item interaction relationships is beneficial for collaborative filtering. PFM outperforms PMF by a large margin and achieves performance comparable to that of BPR. This observation indicates that the Poisson factor model is more suitable for modeling users' implicit frequency feedback than probabilistic matrix factorization. Meanwhile, the overall performance of PFM is still worse than that of BPR, because PFM only uses the observed feedback to learn the latent feature vectors and thus suffers from the data sparsity issue. NRPFM consistently outperforms the other methods, which demonstrates the effectiveness of our proposed model for collaborative filtering with implicit frequency feedback. Our proposed method improves the Precision@3 of NeuMF by 6.4% and 8.9% on Foursquare and Gowalla, respectively. In terms of Recall@3, the improvements of NRPFM over NeuMF are 9.2% and 4.6% on Foursquare and Gowalla, respectively.
This observation supports our assumption that combining the strength of the ranking-based Poisson factor model in learning users' preference rankings between items with the strength of neural networks in capturing nonlinear user-item interaction relationships boosts the recommendation quality. All the compared methods perform better on Foursquare than on Gowalla. A likely reason is that Gowalla is sparser than Foursquare. With denser user-item interactions, recommendation methods are better able to learn the latent user and item feature vectors accurately, resulting in better recommendation performance.

##### 5.6. Sensitivity Analysis

###### 5.6.1. Impact of the Depth of Neural Networks

In our proposed method, we use neural networks, i.e., an MLP, to learn the nonlinear interactions between the user and item feature vectors from users' implicit feedback. The depth of the neural networks is an important factor that affects their expressiveness. In this section, we conduct a group of experiments to investigate the impact of the depth of the neural networks on the recommendation quality. We fix the size of the last hidden layer as 8 and vary the depth of the neural networks from 1 to 4. For example, if the depth is 4, then the structure of the neural networks is $64 \rightarrow 32 \rightarrow 16 \rightarrow 8$, and the embedding size of users and items is 64. Other parameters keep the same settings as described in Section 5.4. We only present the experimental results on Foursquare in Table 4; the experimental results on Gowalla show similar trends.
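The layer sizes implied by this setup can be enumerated with a one-line helper. The function name `tower_sizes` is illustrative; the halving rule and the fixed last layer of size 8 come from the description above.

```python
def tower_sizes(depth, last_size=8):
    """Hidden-layer sizes, first to last, where each layer is half the previous one."""
    return [last_size * 2 ** (depth - 1 - k) for k in range(depth)]


for depth in range(1, 5):
    print(depth, tower_sizes(depth))
```

With the last layer fixed, every additional layer doubles the width of the first hidden layer, so deeper variants also have considerably more trainable parameters.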

In Table 4, NRPFM-$l$ denotes the NRPFM method with $l$ hidden layers. As demonstrated in Table 4, NRPFM with different numbers of hidden layers consistently outperforms NeuMF. Moreover, stacking more hidden layers in the MLP to learn the nonlinear interaction functions between the latent user and item feature vectors is not beneficial for NRPFM. A possible reason is that a deeper neural network gives NRPFM more trainable parameters, which are relatively difficult to learn with limited training data, resulting in the degradation of recommendation performance. In addition, NRPFM without hidden layers, i.e., NRPFM-0, performs even better than NeuMF. NRPFM-0 can be viewed as a variant of the ranking-based Poisson factor model that utilizes the concatenation of user and item embeddings to predict scores. This observation provides some evidence of the effectiveness of our proposed ranking-based Poisson factor model for collaborative filtering with implicit frequency feedback.

###### 5.6.2. Impact of Negative Samples

In this section, we conduct another group of experiments to investigate the impact of negative samples on the recommendation quality. We vary the number of negative samples for each positive one from 1 to 16 and observe the changes in recommendation quality. We set the number of hidden layers to be 2, since NRPFM achieves better performance under this setting, as shown in Table 4. Other parameters remain unchanged. The experimental results of NRPFM on Foursquare are shown in Figure 2.

[Figure 2: panels (a) Precision@3, (b) Precision@5, (c) Precision@10, (d) Recall@3, (e) Recall@5, (f) Recall@10]

As indicated in Figure 2, our proposed neural personalized ranking model is sensitive to the number of negative samples. The recommendation quality first improves as the number of negative samples increases and then degrades as the number of negative samples increases further. This indicates that either too few or too many negative samples may hurt the recommendation performance of NRPFM. NRPFM achieves the best performance when the number of negative samples is around 8.

###### 5.6.3. Impact of Parameters $\alpha$ and $\beta$

In our proposed method, the parameters $\alpha$ and $\beta$ control the shapes and scales of the Gamma distributions, which are used to initialize the user embedding matrix and the item embedding matrix. We perform another group of experiments to evaluate the sensitivity to $\alpha$ and $\beta$, by changing the value of $\alpha$ from 1 to 5 given $\beta = 0.2$, or varying the value of $\beta$ from 0.1 to 0.5 given $\alpha = 2$. The experimental results of NRPFM with respect to different $\alpha$ and $\beta$ on Foursquare are plotted in Figures 3 and 4, respectively.

[Figure 3: panels (a) Precision@3, (b) Precision@5, (c) Precision@10, (d) Recall@3, (e) Recall@5, (f) Recall@10]

[Figure 4: panels (a) Precision@3, (b) Precision@5, (c) Precision@10, (d) Recall@3, (e) Recall@5, (f) Recall@10]

As we can see, both $\alpha$ and $\beta$ significantly affect the performance of our proposed recommendation model. With $\alpha$ fixed at 2, NRPFM performs best when $\beta$ is around 0.2; further reducing or increasing the value of $\beta$ leads to worse performance. With $\beta$ fixed at 0.2, NRPFM shows a similar trend; i.e., all the evaluation metrics first move upwards and then begin to drop once $\alpha$ surpasses a certain threshold. This observation indicates that NRPFM is also sensitive to the initialization of the user embedding matrix and the item embedding matrix, which initially encode the user preferences and the item characteristics, respectively.

#### 6. Conclusion and Future Work

In this paper, we propose a neural personalized ranking model for collaborative filtering with implicit frequency feedback, which integrates the ranking-based Poisson factor model with neural networks. We first develop a ranking-based Poisson factor model, which combines the Poisson factor model and Bayesian personalized ranking to model sparse implicit frequency feedback. The ranking-based Poisson factor model adopts a pair-wise learning method to learn the rankings of preferences between items. Then, we utilize neural networks to further extend the ranking-based Poisson factor model and propose a neural personalized ranking model to capture the complex structure of user-item interactions. The neural personalized ranking model leverages the multilayer perceptron to learn the nonlinear user-item interaction relationships and endows the linear ranking-based Poisson factor model with a high level of nonlinearity. Experimental results on two real-world datasets show that our proposed method outperforms the state-of-the-art recommendation algorithms.

In this work, we only infer users' preferences and items' characteristics from users' implicit feedback. Since auxiliary information, such as social relationships and user reviews, is beneficial to recommendation algorithms, we plan to integrate such auxiliary information into our proposed neural personalized ranking model to boost the recommendation performance. In addition, recent advances in deep learning, e.g., attention mechanisms, graph convolutional neural networks, and generative adversarial networks, have shown great potential in the fields of natural language processing and computer vision. Hence, applying these deep learning techniques to recommender systems would be an interesting direction.

#### Data Availability

The ZIP data used to support the findings of this study are available at http://www.ntu.edu.sg/home/gaocong/datacode.htm. The ZIP data is publicly available and can be directly downloaded from http://www.ntu.edu.sg/home/gaocong/data/poidata.zip. The description of datasets is presented in Section 5.1 of our manuscript.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

This work is supported in part by the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (Grant no. 17KJB520028), NUPTSF (Grant no. NY217114), Tongda College of Nanjing University of Posts and Telecommunications (Grant no. XK203XZ18002), and Qing Lan Project of Jiangsu Province.