Abstract

Nonnegative matrix factorization (NMF), which constrains all elements of the factorization to be nonnegative while reducing the dimensionality of the data, is an effective method for solving recommendation system problems. However, the recommendation performance of NMF models relies heavily on initialization, and the user-item interaction information is often very sparse. Moreover, most existing models are learned under the supervised learning paradigm, whereas in many real-world applications supervised information about the data is difficult to obtain, which makes a large number of supervised models inapplicable. To address these problems, we propose an information self-supervised NMF model for recommendation (S2NMF). Specifically, building on the matrix factorization idea, the model introduces a self-supervised learning mechanism into NMF to enhance the information available in sparse data, which yields an easily extensible self-supervised NMF framework. Furthermore, a corresponding gradient descent optimization algorithm is proposed, and its complexity is analysed. Experimental results on four datasets show that the proposed S2NMF outperforms the compared baseline methods.

1. Introduction

In the age of information explosion, information overload has become a central issue faced by society. Recommender systems play a vital role in solving this problem, as they help determine what information to provide to individual consumers and allow online users to quickly find personalized information that suits their needs [1]. Recently, recommender systems have become ubiquitous on e-commerce platforms, such as Amazon for book recommendations, http://Last.com/ for music recommendations, Netflix for movie recommendations, and CiteULike for references.

The main recommendation methods include collaborative filtering [2, 3], content-based recommendation [4], knowledge-based recommendation [5], and social network-based recommendation [6]. Collaborative filtering generally adopts nearest-neighbor techniques: it computes the distance between users from their historical preference information, predicts the target user's degree of preference for a specific product from the weighted ratings of the nearest neighbors, and makes recommendations according to this predicted preference. The main advantage of collaborative filtering is that it places no special requirements on the recommended objects and can deal with unstructured, complex objects such as music and movies. Content-based recommendation is a continuation and development of information filtering technology. It makes recommendations based on the content information of items, without relying on users' comments on them, and therefore needs machine learning methods to infer user interests from the feature descriptions of the content. Knowledge-based recommendation can be regarded as a kind of inference technology, which is not based on the needs and preferences of users; knowledge-based approaches differ markedly depending on the functional knowledge they use. Social network-based recommendation has mostly followed a neighborhood-based approach: the social network of raters is explored to find each rater's neighbors, and the neighbors' scores are aggregated to calculate the predicted scores.

Learning high-quality user and item representations from interaction data is the core idea of collaborative recommendation. In early studies, such as matrix factorization (MF) [7, 8], the single ID of each user (or item) is projected into an embedding vector. Subsequent studies [9] enriched single IDs with interaction histories to learn better representations. In particular, nonnegative matrix factorization (NMF) [10, 11], a well-known dimensionality reduction method in data representation, has been successfully applied to recommender system problems in recent years [12, 13]. Although NMF can be used for any nonnegative rating matrix (e.g., ratings from 1 to 5), its greatest interpretability advantage arises when users have a mechanism to specify a liking for an item but no mechanism to specify a dislike. Such matrices include unary rating matrices and matrices in which the nonnegative entries correspond to activity frequencies. These datasets are also referred to as implicit feedback datasets.

However, the NMF model is essentially a nonconvex optimization problem, and its sensitivity to initialization is unavoidable: the recommendation performance of the NMF model depends heavily on the initialization, and a poor initialization matrix can significantly degrade it. In addition, a general recommendation system uses only historical user-item interaction information (explicit or implicit feedback) as input, which poses two problems. First, in real-world scenarios, information about user-item interactions is often sparse. For example, a movie app may contain tens of thousands of movies, yet the average number of movies rated by a user may be only a few dozen. Using such a small amount of observed data to predict a large amount of unknown information greatly increases the risk of overfitting. Second, for newly added users or items, the system has no historical interaction information, so it cannot accurately model them or make recommendations for them. This situation is called the cold start problem.

Moreover, most existing models learn recommendation models in a supervised learning paradigm [14–16], where the supervised signals are derived from observed user-item interactions. However, the observed interactions are extremely sparse compared to the entire interaction space [8, 17], which makes them insufficient for learning quality representations. In many cases, supervised information about the data is also difficult to obtain, making a large number of existing supervised models inapplicable.

Accordingly, building on the matrix factorization idea, this paper introduces a self-supervised learning mechanism into the NMF model and proposes an easily scalable self-supervised nonnegative matrix factorization (S2NMF) recommendation framework, together with a corresponding gradient descent optimization algorithm whose complexity is analysed. The main contributions can be summarized as follows:

(i) Based on the idea of matrix factorization, a self-supervised learning mechanism is introduced on the basis of the NMF model to realize the information enhancement of sparse data
(ii) A self-supervised nonnegative matrix factorization recommendation model, S2NMF, is proposed, together with a corresponding gradient descent optimization algorithm and an analysis of its complexity
(iii) Extensive experimental results demonstrate that the proposed S2NMF achieves superior recommendation performance in comparison with the baseline algorithms

The rest of this paper is organized as follows. Section 2 defines the problem and the notation. Section 3 presents the proposed S2NMF model, the corresponding optimization algorithm, and its complexity analysis. Section 4 reports and discusses the experimental results. Finally, Section 5 concludes this work.

2. Materials and Methods

This section provides a detailed description of the proposed model S2NMF and its model optimization algorithm and gives the pseudocode of the S2NMF model optimization algorithm and its time complexity analysis.

2.1. Problem Definition

Suppose there is a set of users and a set of items. Let R denote the scoring matrix, in which each entry is the rating of a user for an item; if the rating is unknown, the entry is set to 0. The symbols used in this work are described in Table 1.

There are usually two ways to construct the user-item interaction matrix.

Following [18], most researchers treat all observed ratings as equal to 1 and use Equation (1) to construct the interaction matrix. In this paper, we instead use Equation (2), in which the rating of a user for an item is retained in the interaction matrix. Explicit ratings are complicated to exploit for recommendation. Here, we express the user's degree of preference for a product through Equation (2) and, as in implicit feedback, mark unknown ratings as 0, which carries no preference information. Recommendation is then usually formulated as the problem of estimating the rating of each unobserved entry of the interaction matrix.
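The two constructions described above can be illustrated with a minimal sketch. The function name `build_interaction_matrix`, the triple format, and the toy data are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def build_interaction_matrix(triples, n_users, n_items, keep_ratings=True):
    """Build a dense user-item matrix from (user, item, rating) triples.

    keep_ratings=False: every observed interaction becomes 1 (binary construction).
    keep_ratings=True:  the observed rating is kept (the construction chosen in the text).
    Unobserved entries stay 0 in both cases.
    """
    Y = np.zeros((n_users, n_items))
    for user, item, rating in triples:
        Y[user, item] = rating if keep_ratings else 1.0
    return Y

# Hypothetical toy data: (user index, item index, rating on a 1-5 scale).
triples = [(0, 0, 5), (0, 2, 3), (1, 1, 4), (2, 2, 1)]
Y_binary = build_interaction_matrix(triples, 3, 3, keep_ratings=False)
Y_rating = build_interaction_matrix(triples, 3, 3, keep_ratings=True)
```

The rating-preserving variant keeps the graded preference that the binary variant discards, at the cost of mixing "unknown" and "no preference" in the zero entries.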

In order to better formalize the mathematical process of this work, the detailed notation is shown in Table 1. The tasks of a recommendation system can be divided into three types: scoring prediction, top-N prediction, and click prediction. The input and output of the proposed S2NMF model framework are summarized as follows.

(i) Input. The observed user-item interaction matrix
(ii) Output. The predicted user-item interaction matrix

3. Model Framework

3.1. Classical NMF Model

Nonnegative matrix factorization (NMF) was proposed by Lee and Seung in 1999 in Nature [19]; it achieves data dimensionality reduction and has strong interpretability. With the extensive attention of researchers, NMF has gradually become a mature and reliable multidimensional data processing model that is widely used in research fields such as recommendation systems, pattern recognition, signal processing, computer vision, and network science [20]. In addition, it can reveal latent feature-to-feature relationships quite accurately and can also be used for related tasks, such as node importance identification [21, 22], link prediction [23–25], and evolutionary analysis [26, 27].

In recent years, many researchers have applied NMF to recommender systems [14, 17, 28], which effectively improves the accuracy and efficiency of personalized recommendation results. Normally, the user-product relationship is represented as a data matrix R. The matrix can represent the interaction characteristics of users and products, such as a rating matrix or a click-through rate matrix. NMF decomposes R into two nonnegative matrices W and H and optimizes them iteratively such that R ≈ WH, where the shared inner dimension of W and H is the predetermined number of hidden features. Here, W denotes the basis matrix, while H denotes the data in the reduced feature space, also called the combination coefficient matrix of the basis. An entry of W can be read as the probability that a user likes a topic, and an entry of H as the propensity of a topic to include an item. We therefore have reason to believe that the corresponding entry of WH represents the probability that the user likes the item. How, then, can WH be made as close to R as possible? This involves the construction of the NMF model and the optimization process for solving it. The goal is to make WH as close to R as possible, and it may be assumed that the error between R and WH follows a Gaussian distribution with mean 0 and a fixed variance.

Assuming that the entries of the error are independently and identically distributed, the likelihood function follows from the Gaussian probability density function. Maximizing the likelihood is equivalent to maximizing the log-likelihood, given in Equation (5). In Equation (5), the terms that do not depend on W and H are constants, and the remaining term is proportional to the negative squared Frobenius norm of the error, i.e., the Euclidean distance between R and WH. Maximizing the log-likelihood therefore translates into minimizing the Euclidean distance, which gives the objective in Equation (6).

Similarly, if the entries of R are assumed to follow a Poisson distribution, a corresponding log-likelihood function is obtained. Up to terms that do not depend on W and H, it corresponds to the negative Kullback-Leibler (KL) divergence, which also serves as a distance. Maximizing the log-likelihood then translates into minimizing the KL divergence, which gives the objective in Equation (8).
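For reference, the two derivations above can be written out as follows. This is a sketch in assumed notation: R is the observed matrix with m rows and n columns, W and H are the nonnegative factors, and sigma squared is the noise variance. The symbols and the exact form used in the paper may differ.

```latex
% Gaussian noise: r_{ij} = (WH)_{ij} + e_{ij}, with e_{ij} ~ N(0, \sigma^2) i.i.d.
\begin{align}
L(W,H) &= \prod_{i,j} \frac{1}{\sqrt{2\pi}\,\sigma}
          \exp\!\left(-\frac{\bigl(r_{ij}-(WH)_{ij}\bigr)^2}{2\sigma^2}\right), \\
\ln L(W,H) &= -\,mn \ln\bigl(\sqrt{2\pi}\,\sigma\bigr)
          - \frac{1}{2\sigma^2}\,\lVert R - WH \rVert_F^2, \\
\max_{W,H \ge 0} \ln L(W,H) \;&\Longleftrightarrow\;
          \min_{W,H \ge 0} \lVert R - WH \rVert_F^2 .
\end{align}

% Poisson model: r_{ij} ~ Poisson((WH)_{ij})
\begin{align}
\ln L(W,H) &= \sum_{i,j} \Bigl( r_{ij} \ln (WH)_{ij} - (WH)_{ij} - \ln r_{ij}! \Bigr), \\
D(R \,\Vert\, WH) &= \sum_{i,j} \Bigl( r_{ij} \ln \frac{r_{ij}}{(WH)_{ij}}
          - r_{ij} + (WH)_{ij} \Bigr), \\
\max_{W,H \ge 0} \ln L(W,H) \;&\Longleftrightarrow\;
          \min_{W,H \ge 0} D(R \,\Vert\, WH) .
\end{align}
```

In both cases the terms dropped in the last step do not depend on W or H, which is why maximizing the log-likelihood and minimizing the distance have the same solutions.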

In optimizing Equations (6) and (8), the objective functions are nonconvex in W and H jointly, but each is convex in W alone or in H alone. Therefore, although it is not practical to find the global minimum, a locally optimal solution can be found by alternating iterative optimization. Lee and Seung [15] proposed corresponding multiplicative update rules based on the gradient descent approach: Equation (9) gives the update rule for Equation (6), and Equation (10) gives the update rule for Equation (8).

By iterating the update rules in Equations (9) and (10), locally optimal W and H are obtained. The completed matrix is then usually reconstructed as the product of W and H.
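The iterative scheme for the Euclidean objective can be sketched in a few lines of NumPy. This is a minimal illustration of the standard Lee and Seung multiplicative updates, not the authors' implementation; the function name `nmf_multiplicative`, the small constant `eps`, and the toy matrix are assumptions.

```python
import numpy as np

def nmf_multiplicative(R, k, n_iter=200, eps=1e-10, seed=0):
    """Factorize a nonnegative matrix R (m x n) as W (m x k) times H (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    W = rng.random((m, k)) + eps   # random nonnegative initialization
    H = rng.random((k, n)) + eps
    losses = []
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative by construction.
        H *= (W.T @ R) / (W.T @ W @ H + eps)
        W *= (R @ H.T) / (W @ H @ H.T + eps)
        losses.append(np.linalg.norm(R - W @ H, "fro") ** 2)
    return W, H, losses

# Hypothetical toy rating matrix (0 marks an unknown rating).
R = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0],
              [0.0, 1.0, 5.0, 4.0]])
W, H, losses = nmf_multiplicative(R, k=2)
R_hat = W @ H   # completed (predicted) matrix
```

Running the function with different values of `seed` generally gives different factors, which is the sensitivity to initialization discussed in the text.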

There are many variants of NMF [20], among which symmetric NMF (SymNMF) [29, 30] is one of the most commonly used. SymNMF decomposes the observation matrix into the product of a nonnegative matrix and its transpose. SymNMF inherits the advantages of NMF; in addition, its observation matrix can encode the similarity between data points, and the model has fewer parameters. In 2013, Wang and Zhang [20] systematically reviewed various extensions of NMF and classified them into basic NMF, constrained NMF, structured NMF, and generalized NMF. In recent years, NMF-related models have been widely used for graph image processing [31–33], complex network analysis [21, 34, 35], and recommendation systems.

3.2. S2NMF Model

The proposed S2NMF model framework is shown in Figure 1. First, the super similarity matrix is constructed by taking the score matrix as input. Second, NMF is repeated several times with random initializations, and the resulting dimensionality-reduced representations are analysed. Third, community indication matrices are obtained by a resolving strategy, and a new super similarity matrix is obtained by combining and reconstructing them. This stochastic matrix factorization process is repeated until convergence to obtain the predicted scoring matrix. The S2NMF framework is an intelligent recommendation model with information self-enhancement that can be built on different types of NMF models fused with self-supervised information. Due to space limitations, this paper takes the classical NMF [19] as an example and introduces it in detail.

As mentioned earlier, NMF requires solving a nonconvex optimization problem that is sensitive to the initialization of the variables. We propose self-supervised NMF (S2NMF), which exploits this sensitivity so that the model can gradually improve its recommendation performance without relying on any additional information. First, based on the classical NMF model, the input matrix is decomposed into two nonnegative matrices W and H. As in the previous subsection, we assume that the errors follow a Gaussian distribution. The model optimization problem at a given iteration can then be constructed as a Euclidean-distance NMF problem on the scoring matrix, where W is the basis matrix and H is the combination coefficient matrix.

Since NMF factorization involves some randomness, the factorization is repeated several times with random initializations in this paper. The number of hidden features is also the number of communities: in terms of physical meaning, users cluster into that many groups with similar hobbies. Therefore, the community partition can be obtained by resolving the community affiliation matrix produced by each factorization into a community indication matrix for the users.
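One simple way to resolve an affiliation matrix into an indication matrix is to assign each user to the community with the largest affiliation value. The sketch below assumes this argmax rule and a users-by-communities layout; the paper does not spell out its resolving strategy, so both the rule and the name `membership_to_indicator` are assumptions.

```python
import numpy as np

def membership_to_indicator(W):
    """Turn a nonnegative affiliation matrix (users x communities) into a one-hot matrix."""
    labels = np.argmax(W, axis=1)                  # strongest community per user
    Z = np.zeros_like(W, dtype=float)
    Z[np.arange(W.shape[0]), labels] = 1.0         # exactly one 1 per row
    return Z

# Hypothetical affiliation values for three users and two communities.
W_example = np.array([[0.9, 0.1],
                      [0.2, 0.7],
                      [0.6, 0.4]])
Z_example = membership_to_indicator(W_example)
```

Because each row of the result contains a single 1, the indication matrix is sharper than the raw affiliation values, which is the sense in which it is more discriminative.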

Considering that the community indication matrices are more discriminative than the scoring matrix, this paper constructs a super similarity matrix from them, in which each factorization is weighted by the corresponding element of a weight vector. This weight vector is mainly used to balance the contribution of each community partition. The obtained super similarity matrix can be resolved as a recommendation indicator matrix. Using the super similarity matrix as input again, new community affiliation matrices can be obtained by the NMF model, and the procedure is repeated to obtain better recommendation results until the stopping criterion is satisfied or the maximum number of iterations is reached. We represent the above process as the constrained optimization model in Equation (15), whose constraint on the weight vector is written with the all-ones vector. Clearly, in minimizing Equation (15), a better pair of W and H yields a smaller reconstruction error and is accordingly assigned a larger weight. Thus, the weight measures the quality of the corresponding W and H, and the super similarity matrix can be constructed by resolving them. However, there is a nonnegativity constraint on the weight vector, and Equation (15) implicitly regularizes it in a way that may lead to a rather sparse solution, in which most weights are equal to or close to zero. Since our goal is to combine the contributions of multiple partitions, extreme sparsity is not desirable. For this reason, a hyperparameter is introduced to control the distribution of the weights, and the final model is given in Equation (16), where the hyperparameter takes values in (1, +∞). When it is close to 1, only a few weights are effectively nonzero. When it tends to +∞, minimizing the objective assigns nearly equal weights to all factorizations.

Therefore, the hyperparameter should be neither too large nor too small. In this paper, we empirically set it to 2.

By solving the equation, the final community indication matrix can be obtained. Meanwhile, a better super similarity matrix and contribution vector are determined.
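A plausible form of the weighted model described in this subsection is sketched below. The notation is assumed rather than taken from the paper: S is the current input matrix, each pair of W_i and H_i is one of b random factorizations, alpha is the weight vector, and gamma is the hyperparameter with values in (1, +∞).

```latex
\begin{align}
\min_{\{W_i, H_i\},\, \alpha} \;& \sum_{i=1}^{b} \alpha_i^{\gamma}\,
      \lVert S - W_i H_i \rVert_F^2 \\
\text{s.t.} \;& W_i \ge 0,\quad H_i \ge 0,\quad \alpha \ge 0,\quad
      \alpha^{\top}\mathbf{1} = 1 .
\end{align}
```

Under this form, removing the exponent makes the objective linear in alpha, so its minimum over the constraint set is attained by putting all weight on the single factorization with the smallest error. That matches the sparsity concern raised in the text and motivates the exponent.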

3.3. Model Optimization Algorithm

To solve the objective in Equation (16), an alternating iteration strategy is proposed in this paper. First, with the weight vector fixed and multiple random nonnegative initializations of W and H, solving the objective for W and H yields the following:

 Algorithm 1
 Input: Observed user-item interaction matrix R B;
 Output: Predicted user-item interaction matrix;
 Initialize: Niter = 1, maxIter = 10, S = R;
 while Niter < maxIter do
   Update W, H, and the weight vector according to Algorithm 2;
   Update S according to equation (3);
   if the stop condition is satisfied then
     Break;
   Niter = Niter + 1;
 Return the predicted user-item interaction matrix;

In this paper, a simple and effective criterion for adaptively terminating Algorithm 1 is proposed, and the pseudocode is given in Algorithm 1. It is reasonable to assume that, in the first few iterations, the community detection of all partitions gradually improves and the consensus between them increases. Once the maximum consensus is reached, it remains at a high level and may even decrease and fluctuate due to the randomness of variable initialization in the iterations. Therefore, we use the degree of agreement between the different partitions to construct the stopping criterion.

For the objective of the S2NMF model (see Equation (17)), the derivation of the update rules is similar to that of NMF and can be found in [15]. Since this objective has multiple parameters to be optimized (W, H, and the weight vector), it is a nonconvex optimization problem.

Following the gradient descent approach, each parameter is optimized in turn while the other parameters are held fixed. Similar to Equation (9), the update rules for W and H are easily obtained (Equations (18) and (19)).

Unlike in the classical NMF, the weight vector must also be optimized. Fixing W and H and optimizing the weight vector, the objective can be rewritten as Equation (20). Then, the Lagrangian function of Equation (20) can be expressed as

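Taking the first-order partial derivative of the Lagrangian with respect to each weight and setting it to 0 yields Equation (22). Since the weights are tied together by the constraint written with the all-ones vector, the Lagrange multiplier can be expressed in closed form (Equation (23)). Substituting the multiplier into Equation (22) then gives the weights in Equation (24).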
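The steps of this derivation can be written out under assumed notation. Here e_i denotes the reconstruction error of the i-th factorization, alpha the weight vector, gamma the hyperparameter, and lambda the Lagrange multiplier. The sum-to-one constraint is inferred from the text, and the exact form in the paper may differ.

```latex
\begin{align}
&\min_{\alpha}\; \sum_{i=1}^{b} \alpha_i^{\gamma} e_i
   \quad \text{s.t.}\;\; \alpha \ge 0,\; \alpha^{\top}\mathbf{1} = 1,
   \qquad e_i = \lVert S - W_i H_i \rVert_F^2, \\
&\mathcal{L}(\alpha, \lambda) = \sum_{i=1}^{b} \alpha_i^{\gamma} e_i
   - \lambda \Bigl( \sum_{i=1}^{b} \alpha_i - 1 \Bigr), \\
&\frac{\partial \mathcal{L}}{\partial \alpha_i}
   = \gamma\, \alpha_i^{\gamma-1} e_i - \lambda = 0
   \;\;\Longrightarrow\;\;
   \alpha_i = \Bigl( \frac{\lambda}{\gamma\, e_i} \Bigr)^{\frac{1}{\gamma-1}}, \\
&\sum_{i=1}^{b} \alpha_i = 1
   \;\;\Longrightarrow\;\;
   \Bigl( \frac{\lambda}{\gamma} \Bigr)^{\frac{1}{\gamma-1}}
   = \frac{1}{\sum_{j=1}^{b} (1/e_j)^{\frac{1}{\gamma-1}}}, \\
&\alpha_i = \frac{(1/e_i)^{\frac{1}{\gamma-1}}}
                 {\sum_{j=1}^{b} (1/e_j)^{\frac{1}{\gamma-1}}} .
\end{align}
```

As gamma approaches 1 the exponent grows without bound and the weight concentrates on the smallest error; as gamma tends to infinity the exponent tends to 0 and the weights become equal, which agrees with the limiting behaviour described for the hyperparameter.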

Clearly, the numerator and denominator of Equation (24) are both greater than 0, so every weight is always greater than 0, which satisfies the nonnegativity constraint. The solution in Equation (24) satisfies the KKT (Karush-Kuhn-Tucker) conditions of Equation (20) and is therefore a locally optimal solution. Moreover, since Equation (20) is a convex problem, Equation (24) is also its globally optimal solution.
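The closed-form weight update can be checked numerically with a short sketch. It assumes the weights are proportional to the inverse reconstruction errors raised to the power 1/(gamma - 1) and normalized to sum to one; the function name `contribution_weights` and the example errors are hypothetical.

```python
def contribution_weights(errors, gamma=2.0):
    """Weights proportional to (1 / error) ** (1 / (gamma - 1)), normalized to sum to 1.

    errors must be strictly positive and gamma must be greater than 1.
    """
    if gamma <= 1.0:
        raise ValueError("gamma must be greater than 1")
    scores = [(1.0 / e) ** (1.0 / (gamma - 1.0)) for e in errors]
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical reconstruction errors of three random factorizations.
alpha = contribution_weights([1.0, 2.0, 4.0], gamma=2.0)
alpha_flat = contribution_weights([1.0, 2.0, 4.0], gamma=1000.0)
```

With gamma set to 2 the weights are simply the normalized inverse errors, so the factorization with the smallest error receives the largest weight, while a very large gamma gives nearly uniform weights.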

 Algorithm 2
 Input: S B;
 Output: W, H, and the weight vector;
 Initialize: Niter = 1; randomly initialize nonnegative matrices W and H;
 while Niter < 500 do
   for each random factorization do
     Update W according to equation (18);
     Update H according to equation (19);
     Update the weight according to equation (24);
   if the objective function has converged then
     Break;
   Niter = Niter + 1;
 Return W, H, and the weight vector;

In summary, the detailed optimization process for the S2NMF objective in Equation (16) is given as pseudocode in Algorithm 2. The algorithm stops iterating when the maximum change of the variables between two adjacent iterations is less than 0.001. W, H, and the weight vector are updated iteratively by the update rules until the objective function converges.
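The alternating scheme can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the objective is taken to be the weighted sum of squared Frobenius errors over `b` random factorizations, the stopping test uses the change in the objective rather than the change in the variables, and the names `s2nmf_inner`, `b`, `gamma`, and `tol` are hypothetical.

```python
import numpy as np

def s2nmf_inner(S, k, b=5, gamma=2.0, max_iter=500, tol=1e-3, eps=1e-10, seed=0):
    """Run b randomly initialized NMFs on S and weight them by reconstruction quality."""
    rng = np.random.default_rng(seed)
    m, n = S.shape
    Ws = [rng.random((m, k)) + eps for _ in range(b)]
    Hs = [rng.random((k, n)) + eps for _ in range(b)]
    alpha = np.full(b, 1.0 / b)
    prev = np.inf
    for _ in range(max_iter):
        errors = np.empty(b)
        for i in range(b):
            W, H = Ws[i], Hs[i]
            # Multiplicative updates (in place); the scalar weight cancels in the ratio.
            H *= (W.T @ S) / (W.T @ W @ H + eps)
            W *= (S @ H.T) / (W @ H @ H.T + eps)
            errors[i] = np.linalg.norm(S - W @ H, "fro") ** 2 + eps
        # Closed-form weight update: smaller error gives larger weight.
        scores = (1.0 / errors) ** (1.0 / (gamma - 1.0))
        alpha = scores / scores.sum()
        objective = float(np.sum(alpha ** gamma * errors))
        if abs(prev - objective) < tol:
            break
        prev = objective
    return Ws, Hs, alpha

# Hypothetical nonnegative input matrix.
S_demo = np.random.default_rng(1).random((6, 5))
Ws_demo, Hs_demo, alpha_demo = s2nmf_inner(S_demo, k=2, b=3)
```

Because each weight multiplies its whole error term, it cancels in the multiplicative updates of the factors, so under this assumed objective the factorizations evolve independently and only the weights couple them.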

3.4. Model Complexity Analysis

For the computational complexity of Algorithm 2, one of the two alternating iterative solutions , , and are fixed first. For the solution of and , the computational complexity is , and the complexity of the solution of is . Therefore, the complexity of each iteration of Algorithm 2 is .

Algorithm 1 solves Algorithm 2 repeatedly with the computational complexity , where is the maximum number of iterations of Algorithm 2. The computational complexity of constructing is of . Therefore, the complexity of each iteration of Algorithm 1 is .

4. Results and Discussion

To verify the effectiveness of the proposed S2NMF recommendation model, this section presents comparison experiments on several standard datasets, covering the experimental setup, the experimental results, and a discussion of parameter sensitivity. The computer configuration used in our experiments is as follows: CPU: Intel Core i5-6500; graphics card: Sotai 1600; memory: 16 GB.

4.1. Experimental Settings

To test the performance of the S2NMF recommendation model, four widely used datasets in recommender systems are selected to evaluate the model in this section: MovieLens 100 K (ML-100 k), MovieLens10M (ML-1 M) (https://grouplens.org/datasets/movielens/), Amazon music (Amusic), and Amazon movies (Amovie) (http://jmcauley.ucsd.edu/data/amazon/). The statistics of the four datasets are listed in Table 2.

4.2. Comparison Algorithm

(i) ItemPop. This method ranks items by their popularity, measured by the number of interactions they have. It is a nonpersonalized method, and its performance is usually used as a benchmark for personalized methods
(ii) ItemKNN [36]. This is a standard item-based collaborative filtering method, used commercially by Amazon
(iii) BPR [37]. This is a generic personalized ranking algorithm derived from a Bayesian analysis of the ranking problem using maximum a posteriori estimation
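As an illustration of the simplest baseline, a popularity ranking can be sketched as below. Excluding items the user has already interacted with is an assumption made here, and the function name `item_pop_recommend` and the toy matrix are hypothetical.

```python
import numpy as np

def item_pop_recommend(Y, user, k=10):
    """Recommend the k most popular items that the given user has not interacted with."""
    popularity = (Y > 0).sum(axis=0)                 # number of interactions per item
    order = np.argsort(-popularity, kind="stable")   # most popular first
    return [int(j) for j in order if Y[user, j] == 0][:k]

# Hypothetical interaction matrix: 4 users, 3 items.
Y_demo = np.array([[1, 0, 0],
                   [1, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
recs_user0 = item_pop_recommend(Y_demo, user=0, k=2)
recs_user3 = item_pop_recommend(Y_demo, user=3, k=2)
```

Every user receives the same global ranking apart from the removal of already seen items, which is why the method serves as a nonpersonalized reference point.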

4.3. Evaluation Indicators

To comprehensively evaluate the effectiveness of the proposed model, the experiments use five evaluation metrics: recall, MRR, NDCG, hit rate, and precision. These metrics examine the recommendation accuracy of the algorithm.

(i) Recall. Like precision, this is an unordered evaluation metric. It indicates the proportion of all relevant content that appears in the returned recommendation list, regardless of the order of the returned results, and is defined as the percentage of resources preferred by users in the test set that appear in the recommendation list. Recall is 0 when none of the resources in the user's test set is accurately predicted and 1 when all of them are. Thus, a higher recall indicates a more comprehensive prediction of the user's preferred resources
(ii) MRR. Mean reciprocal rank (MRR) reflects whether the relevant items are placed in prominent positions of the user's list, emphasizing their position in the ranking
(iii) NDCG. Normalized discounted cumulative gain (NDCG) is used as an evaluation metric for ranking results and evaluates the accuracy of the ranking
(iv) Hit. Hit rate (HR) reflects whether the item selected by the user is in the recommendation sequence. The value is 1 if it is and 0 if not
(v) Precision. This is a common unordered evaluation index that indicates the proportion of relevant content in the returned results, without considering their order. Precision is 0 when there are no accurate resources in the user's recommendation list and 1 when all predictions are accurate, so the higher the precision, the more the recommendation list matches the user's actual situation
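The five metrics can be computed per user as in the following sketch, which uses common textbook definitions with binary relevance. The paper's exact formulas are not shown, so the definitions, the function name `topk_metrics`, and the example lists are assumptions. The values would then be averaged over users.

```python
import math

def topk_metrics(ranked, relevant, k=10):
    """Per-user top-k metrics for a ranked list and a set of relevant (test) items."""
    top = ranked[:k]
    hits = [1 if item in relevant else 0 for item in top]
    n_hits = sum(hits)
    recall = n_hits / len(relevant) if relevant else 0.0
    precision = n_hits / k
    hit = 1.0 if n_hits > 0 else 0.0
    # Reciprocal rank of the first relevant item (0 if none appears in the top k).
    mrr = 0.0
    for rank, h in enumerate(hits, start=1):
        if h:
            mrr = 1.0 / rank
            break
    # DCG with binary gains, normalized by the best achievable DCG.
    dcg = sum(h / math.log2(rank + 1) for rank, h in enumerate(hits, start=1))
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    ndcg = dcg / ideal if ideal > 0 else 0.0
    return {"recall": recall, "mrr": mrr, "ndcg": ndcg, "hit": hit, "precision": precision}

# Hypothetical ranked recommendations and held-out relevant items for one user.
scores = topk_metrics(["a", "b", "c", "d"], {"b", "d", "z"}, k=4)
```

Recall, hit, and precision ignore the order of the list, whereas MRR and NDCG reward placing relevant items near the top.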

4.4. Comparison Experiments

To fully validate the performance of the proposed S2NMF model, this subsection reports the comparison results on the four datasets in Table 2 against three comparison methods under five evaluation metrics, and the results are discussed and analysed in detail.

Figure 2 shows the histogram of comparison results for the ML-100 K dataset. The horizontal coordinate represents the type of evaluation metric, and the vertical coordinate represents the metric value computed when the top 10 items by predicted rating are recommended to each user.

The four colours represent the results of the four models. Overall, the figure shows that the results of the proposed S2NMF are higher than those of the other three commonly used benchmark methods.

Figure 3 shows the histogram of comparison results for the ML-1 M dataset. Again, the horizontal coordinate represents the type of evaluation metric, and the vertical coordinate represents the metric value computed for the top 10 recommended items. Overall, the figure shows that the results of the proposed S2NMF are higher than those of the other three commonly used benchmark algorithms.

Figure 4 shows the histogram of comparison results for the Amusic dataset. Again, the horizontal coordinate represents the type of evaluation metric, and the vertical coordinate represents the metric value computed for the top 10 recommended items. The figure shows that the results of the proposed S2NMF are higher than those of the other three commonly used benchmark algorithms.

Figure 5 shows the histogram of comparison results for the Amovie dataset. Again, the horizontal coordinate represents the type of evaluation metric, and the vertical coordinate represents the metric value computed for the top 10 recommended items. The figure shows that the results of the proposed S2NMF are higher than those of the other three commonly used benchmark algorithms.

Figure 6 shows the sensitivity analysis of Algorithm 1 with respect to the number of iterations on the four datasets. The horizontal coordinate represents the number of iterations, and the vertical coordinate represents the value of the hit metric. The figure shows that the proposed S2NMF is relatively sensitive to the number of iterations on the four datasets when it is less than 4, while the results change little when it is greater than 4 and become relatively stable as the number of iterations increases. Accordingly, the S2NMF model can use a relatively small number of iterations, 4, to effectively reduce the computational cost without affecting model performance.

5. Conclusions

Based on the matrix factorization idea, this paper introduced a self-supervised learning mechanism based on the NMF model to achieve information enhancement of sparse data, proposed an easily scalable self-supervised nonnegative matrix factorization recommendation model framework S2NMF, further proposed a corresponding gradient descent optimization algorithm, and analysed the complexity of the algorithm. Numerous experimental results showed that the S2NMF proposed in this paper has superior performance.

In terms of contributions, this paper addresses the sparse data problem of user-item interaction, enhances the interpretability of the recommendation model based on the matrix factorization idea, and introduces a self-supervised learning mechanism to realize the information enhancement of sparse data. However, determining the number of hidden features automatically remains an open problem. Likewise, exploring deep hidden features and extending the model to large-scale application scenarios are problems of important research significance in the field of recommendation systems.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant 62141205 and the Fund Project of XJPCC (2022CB002-08, 2022CA007, 2019AB001, 2020DB005, and 2017CD010).