Abstract

Big data has developed for nearly a decade, and the information on the network is growing explosively. Faced with complex and massive data, it is difficult for people to obtain the information they need quickly, and recommendation algorithms have become one of the important methods for alleviating information overload at this stage. In particular, the rise of the e-commerce industry has driven the development of recommendation algorithms. Traditional single recommendation algorithms often suffer from problems such as cold start, data sparsity, and long-tail items. Hybrid recommendation algorithms can effectively avoid some of the drawbacks of a single algorithm. To address these problems, this paper compensates for the shortcomings of a single collaborative model by proposing IA-CN, a hybrid recommendation algorithm based on deep learning. The algorithm first uses an integration strategy to fuse user-based and item-based collaborative filtering and to generalize and classify their outputs. It then captures deeper and more abstract nonlinear interactions between users and items through an improved deep learning model. Finally, we design experiments to validate the algorithm. The experiments compare IA-CN with benchmark algorithms on the Amazon item rating dataset, and the results show that the proposed IA-CN algorithm achieves better rating prediction performance on the test dataset.

1. Introduction

The continuous development and maturity of big data, cloud computing, and other technologies have greatly enriched our production and life. At the same time, the information on the network has grown explosively. The information generated by human beings in the past ten years or so exceeds the total amount of information generated in the previous thousands of years [1]. When faced with massive amounts of information, it is often difficult for people to find what they want and what they are interested in, and even when they can find it, it takes a long time. How to deliver useful information to users effectively is therefore a very important issue. To solve this problem, Google introduced search engine technology: users enter keywords in order to find the content they want quickly and accurately. However, this method is only suitable for situations in which users actively seek information with a clear purpose. In daily life, many users’ information needs are vague, latent, and passive. Users may not know how to describe what they are interested in and may not even actively seek out such content. In this case, search engines cannot accurately convey effective information to users.

The recommendation algorithm is one of the effective means of solving the above problems. By analyzing users’ browsing history, user-item rating information, and other information reflecting users’ preferences, it models users and discovers some of their latent personalized needs. Currently, this type of algorithm is most widely used on e-commerce websites and social networks, so recommendation algorithms have important research significance and application value.

In traditional recommendation algorithms, both content-based recommendation [2] and collaborative filtering recommendation [3] have achieved a certain degree of success in discovering the relevance between users and items. However, with the rapid growth of network data in recent years and considering the characteristics of “big data,” these traditional algorithms are no longer fully suitable for the current network environment. First, the data is sparse: on current e-commerce websites and social networks, a user’s evaluation record for any particular item is incomplete, and the user-item rating matrix contains a large number of null values. Second, for newly listed items or newly registered users, the algorithms do not handle this type of data properly. Finally, the frequency with which a user consumes a high-value item is extremely low, yet for investors the profit it can bring is considerable, and the algorithms do not specifically deal with this type of problem. Therefore, the effect of traditional recommendation algorithms is limited.

To address the above issues, this paper designs a hybrid recommendation algorithm based on deep learning. The algorithm uses an integration strategy to fuse multiple recommendation algorithms and to generalize and classify the output of the models, thus compensating for the disadvantages of each individual algorithm. Then, on the basis of the fusion model, deeper and more abstract nonlinear interactions between users and items are captured through deep learning, which can effectively alleviate the cold start and long-tail item problems of traditional algorithms while further improving accuracy. The main contributions of this article are as follows: (1) An integrated strategy model of multiple recommendation algorithms is designed, and the output of the model is preprocessed and preclassified on the dataset. (2) Related comparative experiments are designed, an improved CNN deep learning model IA-CN is trained, and its advantages and disadvantages are analyzed in comparison with the benchmark models.

The remainder of the paper is organized as follows: Section 2 introduces related work, Section 3 describes the design of the hybrid recommendation scheme and the algorithm, Section 4 presents the experiments, and the final section concludes the paper.

2. Related Work

The recommendation algorithm originated as a branch of data mining and became an independent research field in the 1990s. The early GroupLens system completed its recommendation task based on the idea of collaborative filtering [4], and research on recommendation algorithms has been developing continuously ever since.

Collaborative filtering algorithms then began to gain popularity. They can be broadly classified into two types: nearest-neighbor-based and model-based. The former calculates the similarity of users or items in the rating matrix to make recommendations. Sarwar et al. [5] proposed the item-based algorithm on this basis, and it was successfully applied in Amazon’s e-commerce system. The latter, in addition to calculating similarity, requires designing a model, but it can be easily extended to large-scale data systems to meet real-time recommendation requirements. However, massive data brings data sparsity and cold start problems. Yu [6] proposed a collaborative filtering algorithm that uses the latent factor space of an auxiliary domain to expand user and item features, overcoming the shortcoming of relying on a single auxiliary domain. Simon Funk proposed a collaborative filtering algorithm based on matrix factorization [7], which brought a large improvement in recommendation accuracy. Subsequently, scholars have continuously improved it, proposing the probabilistic matrix factorization model (PMF) [8], the SVD++ model [9], MCFSAE [10], and so forth, which have achieved a certain improvement in overcoming data sparsity.

However, for the highly sparse rating data in the big data environment, the performance of model-based recommendation algorithms is relatively low, and they cannot cope with the “cold start” problem of new items or new users. With the development of the Internet and social networks, a large amount of auxiliary information has become available, such as social relationships and user profiles. The application of machine learning models has given rise to hybrid recommendation algorithms that combine content-based filtering and collaborative filtering [11].

In fact, shallow machine learning models cannot learn the deeper features implied by users and items. Owing to the excellent results of deep learning in hidden feature extraction, many scholars have introduced it into the hidden feature learning of recommendation systems [12]. Salakhutdinov et al. [13] first introduced deep learning to learn the implicit factors of users and items in recommender systems and proposed a restricted Boltzmann machine- (RBM-) based collaborative filtering algorithm. Yu [14] used SVM to improve the traditional CF method and made progress in cross-domain collaboration, but did not take the underlying information into account. Wang et al. [15] directly used convolutional neural networks (CNN) and DBN to obtain implicit factors from content information, which is only applicable to music data, since it only considers the implicit factors of items. Kim [16] used convolutional neural networks to extract latent features from relevant auxiliary information and then combined them with a probability matrix model to make recommendations, but many of the hyperparameters need to be adjusted manually. Chen Da also applied the Deep Belief Network (DBN) [17] to recommender systems early on and proposed a new deep hybrid recommendation model, but this approach intrinsically suffered from the data sparseness problem. To address this issue, Wang [18] introduced the collaborative deep learning (CDL) model by integrating a probabilistic topic model with collaborative filtering. Zheng [19] proposed DeepCoNN, which exploits two CNN networks to jointly model user and item reviews to improve rating prediction. Covington et al. [20] summarized the great impact of deep learning techniques on the YouTube video recommendation system. The YouTube system consists of two neural networks, one for generating candidate sets and one for ranking; the two networks work independently without affecting each other, and the ranking layer can use not only the candidate sets but also other datasets. Experiments have proved that this two-stage information retrieval method can effectively improve the recommendation effect and bring a better user experience while handling large amounts of data very well.

Although deep learning techniques have made certain achievements in the field of recommendation algorithms, there is still considerable room for improvement in their application to personalized recommendation, which presents new challenges and opportunities for related research.

3. Comprehensive Recommendation Model IA-CN

In recommendation systems, multiple recommendation algorithms are often combined at different stages to meet the diversity of recommendation results. In order for the system to make recommendations to users under low-latency requirements based on items of interest to them, the following four steps are generally required, as shown in Figure 1. The recall phase is mainly responsible for triggering as many correct results as possible from the full information collection and returning them to the sorting stage. The rough sorting stage is mainly responsible for selecting, as completely as possible, the feature set that matches consumption at this stage from the recalled set. The fine ranking stage is mainly responsible for scoring and ranking the current set through model training on the feature set, so as to obtain the recommendation result. In the mechanism strategy stage, estimates for items that have not yet been shown may be biased and may cause problems; therefore, at this stage, a rerank step is needed to filter and screen the items produced by the various recall methods.

3.1. Recall Processing: Description of the Hybrid Collaborative Filtering Algorithm

In the recall phase, we improve the traditional collaborative filtering algorithm. Collaborative filtering is a widely used technique in recommender systems, and the nearest-neighbor approach is more interpretable than the model-based approach. In the nearest-neighbor collaborative filtering algorithm, the most important task is to find the k users/items closest to the target user/item in the entire space, and the core is the similarity calculation. The existing basic methods are all based on similarity between vectors; that is, they compute the distance between vectors, and the closer the distance, the greater the similarity. In recommendation, we can take a user’s preferences for all items as a vector to calculate the similarity between users, or take all users’ preferences for an item as a vector to calculate the similarity between items. The most common measure is the cosine similarity:

$$sim(u, v) = \frac{\sum_{i \in I_{uv}} r_{u,i} \, r_{v,i}}{\sqrt{\sum_{i \in I_{uv}} r_{u,i}^2} \, \sqrt{\sum_{i \in I_{uv}} r_{v,i}^2}}$$

In the above formula, $sim(u, v)$ is the similarity between user $u$ and user $v$, $I_{uv}$ is the set of items that both users have rated, $r_{u,i}$ represents the score of user $u$ on item $i$, and $r_{v,i}$ represents user $v$’s score on item $i$.

Pearson’s correlation calculation formula is as follows:

$$sim(i, j) = \frac{\sum_{u \in U_{ij}} (r_{u,i} - \bar{r}_i)(r_{u,j} - \bar{r}_j)}{\sqrt{\sum_{u \in U_{ij}} (r_{u,i} - \bar{r}_i)^2} \, \sqrt{\sum_{u \in U_{ij}} (r_{u,j} - \bar{r}_j)^2}}$$

In the above formula, $U_{ij}$ represents the set of users who have scored both items, $\bar{r}_i$ represents the average score for item $i$, and $\bar{r}_j$ represents the average score for item $j$. The Pearson correlation coefficient measures the closeness of the connection between the two rating vectors, and its value lies in [−1, 1].

The collaborative filtering prediction scoring formula is as follows:

$$\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{v \in N_u(i)} sim(u, v)\,(r_{v,i} - \bar{r}_v)}{\sum_{v \in N_u(i)} |sim(u, v)|}$$

In the above formula, $\hat{r}_{u,i}$ represents the predicted value of user $u$ for the unrated item $i$, and $N_u(i)$ represents the set of nearest-neighbor users of $u$ who have rated item $i$.
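As a concrete illustration of the similarity and prediction formulas above, the following is a minimal NumPy sketch of user-based collaborative filtering; the toy rating matrix, the neighborhood size, and the variable names are ours, not the paper’s.

```python
import numpy as np

# Toy user-item rating matrix; 0 marks an unrated item.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 4, 1],
    [1, 1, 2, 5],
    [1, 0, 5, 4],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity over the items co-rated by both users."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(R, u, i, k=2):
    """Mean-centered weighted prediction of user u's rating for item i."""
    sims = np.array([cosine_sim(R[u], R[v]) if v != u else -np.inf
                     for v in range(R.shape[0])])
    neighbors = [v for v in np.argsort(-sims) if R[v, i] > 0][:k]
    mean_u = R[u][R[u] > 0].mean()
    if not neighbors:
        return mean_u
    num = sum(sims[v] * (R[v, i] - R[v][R[v] > 0].mean()) for v in neighbors)
    den = sum(abs(sims[v]) for v in neighbors)
    return mean_u + num / den if den else mean_u

print(round(predict(R, u=0, i=2), 3))  # predicted score of user 0 on item 2
```

The same structure applies to item-based filtering by transposing the rating matrix, which is how the two collaborative views used in Algorithm 1 can share one implementation.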

This paper uses hybrid collaborative filtering to recall the dataset. The purpose is to obtain a dataset that is easy to classify. The detailed algorithm description is as in Algorithm 1.

Input: user feature matrix featureU, product feature matrix featureI, user rating matrix R, user list ListU, and product list ListI
Output: residual dataset Data
(1)Init empty similarity matrices simRU and simRI //reviewed-user and reviewed-product similarity matrices;
(2)Search the user list ListU and the product list ListI in R;
(3)Construct ListU and ListI into pairwise tuple sets setU and setI
(4)for u in setU and i in setI do
(5) Compute sim(u) and sim(i);
(6) Add sim(u) to simRU and sim(i) to simRI;
(7)Calculate the all-user similarity matrix SimRU and the all-item similarity matrix SimRI
(8)for u in ListU do
(9) Define the polarity score;
(10) Search the K-clustering uS and T-clustering iS in SimRU and SimRI
(11) Compute the ratings of all users in uS on the product It //user collaboration;
(12) Compute user Uk’s ratings on the products in the iS collection //item collaboration;
(13) Query R and construct the user and item rating subsets respectively;
(14) Select the relevant users and products to construct a real score matrix
(15)repeat //after multiple iterations, find the wrong samples by updating the learning rate and weight coefficients
(16) Package the selected data;
(17) Add the data to Data;
(18)until all data are obtained
(19)final;
(20)return Data;
3.2. Rough Sorting Model Based on the XGBoost Classification Method

As a boosting method [21], XGBoost performs well in improving the accuracy of classification algorithms. The reason is that its internal decision trees are regression trees, and the use of parallelization improves learning efficiency. It not only is fast and effective but also offers a series of advantages such as support for large-scale data and custom loss functions. It is an additive model composed of a series of base models. Assume that we are training the tree model $f_t$ at the t-th iteration; then we have the following formula:

$$\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i)$$

In the above formula, $\hat{y}_i^{(t)}$ represents the prediction result of sample $i$ after the t-th iteration, $\hat{y}_i^{(t-1)}$ represents the prediction result of the first $t-1$ trees, and $f_t(x_i)$ represents the model of the t-th tree.

The definition of XGBoost is as follows.

Define the tree in terms of its structure and leaf weights:

$$f_t(x) = w_{q(x)}$$

In the above formula, $q$ represents the structure of the tree, and $w$ represents the weights of the leaves.

The complexity function is then defined as:

$$\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$$

In the above formula, $T$ is the number of leaf nodes, $\sum_{j} w_j^2$ represents the squared L2 norm of the output scores on the leaf nodes, and $\gamma$ and $\lambda$ control the proportion of this part in the final model formula. In this way, the complexity of the model can be measured, thereby effectively controlling overfitting.

The objective function at the t-th iteration is:

$$Obj^{(t)} = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t) + constant$$

Rewriting the objective function with a second-order Taylor expansion:

$$Obj^{(t)} \approx \sum_{i=1}^{n} \Big[ l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t) + constant$$

In the above formula, $l$ is the loss function, $\Omega(f_t)$ is the regularization term, and $constant$ is a constant term; $g_i$ and $h_i$ are the first- and second-order derivatives of the loss with respect to $\hat{y}_i^{(t-1)}$. Taking the square loss function (formula (9)) as an example:

$$l\big(y_i, \hat{y}_i^{(t)}\big) = \big(y_i - \hat{y}_i^{(t)}\big)^2$$

Since $l\big(y_i, \hat{y}_i^{(t-1)}\big)$ is actually a known value at the t-th step, it is a constant and does not affect the optimization of the function. Therefore, removing all constant terms, the objective function is obtained as follows:

$$Obj^{(t)} \approx \sum_{i=1}^{n} \Big[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t)$$

So we only need to find the values of the first and second derivatives of the loss function at each step (because $\hat{y}_i^{(t-1)}$ of the previous step is known, these two values are constants) and then optimize the objective function; $f_t(x)$ can be obtained at each step, and the overall model is finally assembled according to the additive form.
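To make the role of the complexity terms concrete, here is a minimal sketch, assuming the standard xgboost Python package, in which the gamma and reg_lambda arguments correspond to the $\gamma T$ and $\lambda \|w\|^2$ penalties in the formulas above; the synthetic data and every parameter value are illustrative only.

```python
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                    # synthetic feature matrix
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # synthetic binary labels

model = XGBClassifier(
    n_estimators=50,        # number of additive trees f_t
    max_depth=4,
    learning_rate=0.1,
    gamma=1.0,              # per-leaf penalty (the gamma * T term)
    reg_lambda=1.0,         # L2 penalty on leaf weights (the lambda term)
    objective="binary:logistic",
)
model.fit(X, y)
print(model.predict(X[:5]))
```

Because the second-order expansion only needs the gradient and Hessian of the loss, the same interface also accepts custom objectives, which is the “custom loss function” advantage mentioned above.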

For the set preprocessed by Algorithm 1, we train the XGBoost model to add descriptive tags to the original dataset, in preparation for classification. The tags apply to the user and product datasets. The purpose is to reduce the complexity of training the scoring model and to improve the recommendation accuracy of the algorithm. The detailed algorithm description is given in Algorithm 2.

Input: user list ListU, product list ListI, and residual data collection Data
Output: user description category US, product description category IS
(1)Init the same training weight for every sample
(2)for data in Data do
(3) Use XGBoost to iterate m times and calculate the error rate
(4) Update the learning rate accordingly
(5) Reset the weight of the n-th sample
(6)repeat //after multiple iterations, find the wrong samples by updating the learning rate and weight coefficients
(7) Construct a regression tree and input the dataset as the root node in the form of a label;
(8) Build the XGBoost model according to the objective functions of formulas (3)–(6)
(9) Define the logistic loss function
(10) Obtain the second-order partial derivative of the loss function and update the learning rate
(11)until N;
(12)Add label descriptions to US and IS
(13)final;
(14)return US, IS
3.3. Classification Algorithm Description

Recall processing and classification serve to improve the efficiency of the recommendation based on the improved deep learning model. In actual deployment, recall and classification do not necessarily have to be processed in real time. The classification algorithm starts from the user’s past behavior, ratings, features, user-item interactions, inter-user relations, and user-item relations. It calculates the similarities separately, builds a similarity matrix, computes the similarity of each user to the other users and items, and clusters the results to form the nearest-neighbor sets. It then uses the XGBoost algorithm for classification, finds the misclassified samples, calculates the classification error rate, updates the weights and learning rate, and resets the weights, thereby improving the classification accuracy of the algorithm.
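The following is a hedged sketch of this preprocessing pipeline using scikit-learn: a user similarity matrix is built from the rating matrix, users are clustered into nearest-neighbor groups, and the cluster labels can then be fed to the XGBoost rough-sorting step; the synthetic ratings and the number of clusters are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
ratings = rng.integers(0, 6, size=(100, 40)).astype(float)   # user x item ratings

user_sim = cosine_similarity(ratings)                         # all-user similarity matrix
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(user_sim)

# Each user's similarity row plus its cluster id can serve as features for the
# XGBoost classifier in the rough-sorting stage.
print(user_sim.shape, np.bincount(clusters))
```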

3.4. Refinement: An Improved CNN Model Based on the Attention Mechanism

For fine ranking, we propose a new deep network, an improved CNN based on the attention mechanism (IA-CN), which is used for real-time recommendation. Figure 2 shows the model architecture of IA-CN. It first includes an embedding layer, which embeds the one-hot vectors of the original features into dense vectors, followed by fully connected layers composed of two different feature spaces. Then, convolutional layers are applied to the user word-vector matrix and the item word-vector matrix to generate new features that model and capture the deeper and more abstract nonlinear interactions between users and items. Next, the attention pooling layer fuses the output of the convolutional layers. Finally, the prediction layer outputs the predicted rating, which is used to rank the recommendation list. Next, we introduce the components of IA-CN in detail.

3.4.1. Embedding Layer

The embedding layer extracts a sequence of words from the user’s review information and the item description information and maps them to an n-dimensional matrix. Word vector embedding is used to construct the word-vector matrix, so as to learn its semantic information. Specifically, the recent reviews of user $u$ are merged into a single document $d_u$, which contains a total of $n$ words. Then, a word-vector matrix is constructed for user $u$ according to the following rule:

$$M_u = \phi(w_1) \oplus \phi(w_2) \oplus \cdots \oplus \phi(w_n)$$

where $w_n$ represents the n-th word in document $d_u$, the preprocessing function $\phi(\cdot)$ returns the $c$-dimensional word vector of the corresponding word, and $\oplus$ is the concatenation operation. Through the matrix $M_u$, the order of words can be maintained. This article uses Word2vec [22] trained on a large Google News corpus to initialize the embedding layer.
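A minimal sketch of this mapping is shown below: the word vectors of a review document are stacked row by row so that word order is preserved. The tiny vocabulary and random vector table stand in for the pretrained Word2vec embeddings and are purely illustrative.

```python
import numpy as np

dim = 4                                            # word-vector dimension (300 in the paper)
vocab = {"great": 0, "battery": 1, "life": 2}
lookup = np.random.default_rng(0).normal(size=(len(vocab), dim))

def phi(word):
    """Return the word vector of `word` (zeros for out-of-vocabulary words)."""
    idx = vocab.get(word)
    return lookup[idx] if idx is not None else np.zeros(dim)

doc = "great battery life".split()
M_u = np.stack([phi(w) for w in doc])              # n x dim matrix, order preserved
print(M_u.shape)
```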

3.4.2. Convolutional Layer

After being processed by the embedding layer, the user review data and item description data enter the convolutional layer. The convolutional layer contains $m$ neurons, which generate new features by performing convolution operations on the word-vector matrix $M_u$ of user $u$ and the word-vector matrix $M_i$ of item $i$. Taking the user as an example, the filter $K_j$ of each neuron in the convolutional layer has size $c \times t$; that is, the filter operates on $t$ words at a time and performs a convolution operation with $M_u$. The specific operation is shown in the following formula:

$$z_j = f\big(M_u * K_j + b_j\big)$$

In the above formula, $f$ is the activation function, $*$ is the convolution operation, and $b_j$ is the bias term. As the activation function in this article, ReLU is used because it performs better in terms of training speed [23].
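As a sketch of this layer, the snippet below applies a 1D convolution with a window of 3 words and ReLU activation to one embedded user document, matching the settings reported later in the paper (about 100 kernels, window size 3); the random input is a placeholder for a real embedded review.

```python
import numpy as np
import tensorflow as tf

n_words, dim = 200, 300
M_u = np.random.rand(1, n_words, dim).astype("float32")    # one embedded user document

conv = tf.keras.layers.Conv1D(filters=100, kernel_size=3, activation="relu")
features = conv(M_u)
print(features.shape)                                       # (1, 198, 100)
```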

3.4.3. Attention Layer

The traditional convolutional neural network has the characteristics of translation invariance and weight sharing, but it also has certain disadvantages. For example, although dimensionality-reduced output can easily be achieved in practice, a lot of valuable information is lost, and the connection between the parts and the whole is ignored. Therefore, this paper combines the attention mechanism with the CNN network, attends to the review text and the item description information at the same time, and jointly models user behavior and item attributes. The pooling technique of the attention mechanism is used to model the convolved word vectors, which can extract more prominent features. Each word vector in the review or description text is mapped to a real-valued score by the attention network, and a normalization operation is then performed to obtain the weight of the vector. As the final pooling result, attention is paid to the connection between the whole and the parts.

The specific pooling strategy uses a weighted average to replace the average, maximum, and minimum strategies. The weighted average of all word vectors in the auxiliary text is taken as the vector representing that text, and the weight corresponds to the importance of each word in the corresponding review or description, that is, the attention factor. For a given auxiliary text vector sequence $\{z_1, z_2, \ldots, z_n\}$, the attention-based pooling technique is described by the following formula:

$$o = \sum_{l=1}^{n} a_l z_l$$

In the above formula, $a_l$ is the weight, that is, the attention, which is obtained by passing each word in the sequence through the normalized exponential function:

$$a_l = \frac{\exp(e_l)}{\sum_{k=1}^{n} \exp(e_k)}$$

In the above formula, $\exp(\cdot)$ is the exponential function with $e$ as the base, and $e_l$ is the input of the normalization, which is produced by the attention network from $z_l$. Taking the user as an example, the entire pooling algorithm is as follows.

In the algorithm description, Step 1 initializes the attention weight matrix A to facilitate the subsequent splicing of the matrix. Step 2 starts the loop over user U, taking out the word vectors one by one. Step 3 uses the modified cosine similarity as the attention network to calculate the weight of each word among all the words. Step 4 performs normalization to obtain the final attention weights. Step 5 joins the attention weights from Step 4 together, and the pooled word-vector matrix o is finally obtained by weighted average and used as the input of the next layer.
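The sketch below mirrors these steps in NumPy: each convolved vector receives an attention score, the scores are normalized with a softmax, and the weighted average is returned as the pooled output. The cosine-style scoring against a mean query vector is a simplified stand-in for the attention network described above.

```python
import numpy as np

def attention_pool(Z, query):
    """Z: (n, d) convolved word vectors; query: (d,) reference vector."""
    scores = Z @ query / (np.linalg.norm(Z, axis=1) * np.linalg.norm(query) + 1e-8)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax attention a_l
    return weights @ Z                           # weighted-average pooled vector o

Z = np.random.rand(198, 100)
o = attention_pool(Z, query=Z.mean(axis=0))
print(o.shape)                                   # (100,)
```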

3.4.4. Prediction Layer

The rating prediction layer is where the actual rating prediction task of the recommendation process takes place. The prediction layer accepts the user/item representations and aspect representations as input and passes them to a factorization machine. The factorization machine accepts real-valued feature vectors and models pairwise interactions. The overall rating can therefore be inferred as:

$$\hat{r}_{u,i} = FM\big(z_{u,i}\big) + b_u + b_i + \mu$$

where $z_{u,i}$ is the concatenation of the user and item representations and $b_u$, $b_i$, and $\mu$ are the user, item, and global deviations, respectively. The back-propagation method is used to learn the model parameters, and the mean square error is used as the loss function:

$$L = \frac{1}{N} \sum_{(u,i)} \big(\hat{r}_{u,i} - r_{u,i}\big)^2$$

In the above formula, $\hat{r}_{u,i}$ represents the score predicted by the convolutional neural network and $r_{u,i}$ the actual score of the user on the item. Of course, the result of prediction layer training only represents the behavior probability of a certain user for the item; the specific recommendation process also needs to rank all behavior probabilities and output the first n results for display. The ranking algorithm is not the focus of this article, so it is not discussed systematically.
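The snippet below is a reduced sketch of the prediction layer: a dot-product interaction between the pooled user and item vectors plus user, item, and global biases, scored with the MSE loss. A full factorization machine would add learned pairwise feature interactions; the bias values here are arbitrary examples.

```python
import numpy as np

def predict_rating(o_u, o_i, b_u, b_i, mu):
    """Dot-product interaction plus user/item/global biases."""
    return float(o_u @ o_i + b_u + b_i + mu)

def mse(pred, true):
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return float(np.mean((pred - true) ** 2))

o_u, o_i = np.random.rand(100), np.random.rand(100)
r_hat = predict_rating(o_u, o_i, b_u=0.1, b_i=-0.2, mu=3.5)
print(r_hat, mse([r_hat], [4.0]))
```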

4. Experiment

This section presents different experiments to evaluate the soundness of the proposed IA-CN model. The experimental study aims to answer the following questions: (i) RQ1: Can the IA-CN model perform better than the benchmark methods? (ii) RQ2: Is the IA-CN model sensitive to hyperparameters such as dropout, latent factors, and embedding size?

4.1. Dataset

In order to evaluate the performance of IA-CN, this article uses the relatively complete 2018 Amazon product dataset [12], which consists of two parts: product metadata and user review data. The dataset was collected by Amazon.com from 1996 to October 2018. It contains 29 item categories and is one of the largest source datasets available for recommendation algorithm experiments.

The Amazon product metadata mainly describes the products, including information such as product number, price, feature description, product category, and related products.

As for the Amazon review dataset, researchers have used this data for rating prediction tasks in previous works [1, 8, 9, 15, 17, 19]. The main information contained in the review data includes the user number, product number, the user’s vote on the product, the rating, and the review text.

Due to the long time span of the collection, the original dataset is too large, and the original data contains large numbers of unrated users and unrated products. Therefore, in our experiments, we used the 5-core version of the Amazon-mini dataset (each user or item has at least 5 interactions). The data volume of this version is about 0.3% of the original data and covers only 10 item categories; even so, the amount of data is still very large, so experimental error caused by insufficient samples can be ignored. In the specific experimental process, before the word-vector mapping, in order to reduce the time and space required to repeatedly read the data, we deleted all uncommon terms, duplicates, and irrelevant attributes, keeping only the data necessary for this experiment, and finally obtained 5 category versions. All stop words and nonvocabulary words in the review dataset were filtered. The dataset was divided into training, test, and validation sets with a ratio of 8 : 1 : 1. In the parameter sensitivity experiments, we built 4 different sets of data from the category versions (single-class items, 3-class items, 5-class items, and full).
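A hedged preprocessing sketch is given below: iterative 5-core filtering followed by an 8 : 1 : 1 split with pandas. The file path is hypothetical, and the column names (reviewerID, asin) follow the usual Amazon review dumps but should be checked against the actual files.

```python
import pandas as pd

df = pd.read_json("reviews_sample.json", lines=True)    # hypothetical path

# Iterate until every user and every item has at least 5 interactions (5-core).
while True:
    before = len(df)
    df = df.groupby("reviewerID").filter(lambda g: len(g) >= 5)
    df = df.groupby("asin").filter(lambda g: len(g) >= 5)
    if len(df) == before:
        break

df = df.sample(frac=1.0, random_state=42)               # shuffle
n = len(df)
train = df[: int(0.8 * n)]
val = df[int(0.8 * n): int(0.9 * n)]
test = df[int(0.9 * n):]
print(len(train), len(val), len(test))
```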

4.2. Benchmark Model and Configuration

To keep the experimental comparison as fair as possible, this paper uses the same software and hardware platform for all benchmark models. Four different benchmark models were used for comparison, namely, the matrix factorization (MF) model, the probabilistic matrix factorization (PMF) model, the collaborative topic regression (CTR) model, and the deep cooperative neural network (DeepCoNN). Among them, MF [2] and PMF [3, 8] are the most widely used and well-known standard benchmarks for collaborative filtering methods; comparing against them verifies the effect of the description information on the recommendation algorithm. CTR [24] is a relatively successful topic-based modeling method, which also uses latent topics to model ratings and reviews. DeepCoNN [19] is currently a relatively advanced deep-learning-based rating prediction model; it also uses two parallel CNN models to learn user and item representations and uses matrix factorization in the shared layer for rating prediction, and the design ideas of the IA-CN model in this article are also derived from it. For MF and PMF, we use the open recommendation system library MyMediaLite to evaluate the models. For CTR and DeepCoNN, we refer to the source code on GitHub and fine-tune and reproduce the experiments according to the parameters set by the authors of the corresponding papers.

In this experiment, we first designed a parameter sensitivity experiment to set the model parameters preliminarily and then continuously optimized them during the experiments, especially the feature dimensions of users and items and the number of kernels in the convolutional neural network. The user and item feature dimensions were varied from 10 to 100 and the number of kernels from 10 to 500. Finally, for optimal performance, the feature dimension of users and items is set to 50, the number of kernels to 100, the word embedding dimension to 300, the learning rate to 0.001, the sliding window size to 3, the batch size to 100, and the dropout parameter to 0.6.
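For reference, the final configuration reported above can be collected in a single dictionary; the key names are ours, but the values are the ones stated in the paper.

```python
config = {
    "user_item_feature_dim": 50,
    "num_kernels": 100,
    "word_embedding_dim": 300,
    "learning_rate": 0.001,
    "window_size": 3,
    "batch_size": 100,
    "dropout": 0.6,
}
```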

4.3. Experimental Environment and Evaluation Metrics

The hardware platform for the deep learning experiments in this article is an Intel i7-10700 CPU with an RTX 2070 graphics card and 32 GB of memory; the operating system is Windows 10, and the model is implemented in the Python programming language with the TensorFlow framework. The optimizer is AdamOptimizer with a default learning rate of 0.001, and all models are trained until convergence.

The experiments use the root mean square error (RMSE, formula (16)) to measure the accuracy of the predicted scores, because the root mean square error measures the deviation between the predicted value and the true value. It is defined as follows:

$$RMSE = \sqrt{\frac{1}{|T|} \sum_{(u,i) \in T} \big(\hat{r}_{u,i} - r_{u,i}\big)^2}$$

The smaller the result, the better the prediction accuracy and the better the recommendation effect.

The main goal of the recommendation system is to generate top-N recommendations for end users, so we also use Precision@N (formula (17)) and Recall@N (formula (18)) to evaluate the proposed IA-CN model. They are expressed as follows:

$$Precision@N = \frac{h_u}{N}, \qquad Recall@N = \frac{h_u}{|I_u|}$$

where $h_u$ represents the number of items the user likes in the top-N list and $|I_u|$ represents the number of items used by the user. In order to evaluate the proposed hybrid model, we ranked the predicted scores of all products for each user and recommended the top-N list to each user.
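The metric definitions above translate directly into the short sketch below; the example predictions and item ids are made up for illustration.

```python
import numpy as np

def rmse(pred, true):
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return float(np.sqrt(np.mean((pred - true) ** 2)))

def precision_recall_at_n(top_n, liked):
    """top_n: recommended item ids; liked: set of items the user actually likes."""
    hits = len(set(top_n) & set(liked))
    precision = hits / len(top_n) if top_n else 0.0
    recall = hits / len(liked) if liked else 0.0
    return precision, recall

print(rmse([4.2, 3.1], [4.0, 3.5]))
print(precision_recall_at_n(top_n=[1, 5, 9], liked={1, 2, 9, 11}))
```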

4.4. Model Training

Because the model parameters are an important factor affecting the experiment, we tune and compare the parameters of the convolutional neural network in the following aspects. The final model was then constructed using the experimental data.

4.4.1. Impact of Dropout

Dropout has been proven to be an effective method for alleviating overfitting in neural network models [25]. Therefore, in order to examine its impact, we vary the dropout rate over different values.

Figure 3 shows that setting dropout has a very obvious effect on reducing the prediction error. It can be observed from Figure 3 that there is a significant improvement of the model across all the datasets. However, the gain in RMSE varies between datasets. The model records the best results with dropout between 0.55 and 0.65 on most of the datasets, and the worst performance when dropout is not used. It can also be observed that, compared with the small-scale datasets, the effect of dropout on relatively large datasets is less obvious. This is consistent with previous observations [18] and reiterates that dropout is more important for small-scale datasets. Therefore, as shown in Figure 3, the dropout value of the model is set between 0.55 and 0.65.

4.4.2. Embedding Dimension

We set different word embedding dimensions {50, 100, 200, 300, 400, 500, 600} to verify the sensitivity of the model to word embedding dimensions. The result is as shown in Figure 4.

In the figure, it can easily be observed that the model shows clear sensitivity within 500 word-embedding dimensions. On the single-category data, the sensitivity is more obvious. This is not difficult to explain: the words appearing in the single-class item data are not rich, which is why the model on the full dataset is more accurate. The results also show that, on all datasets, the performance peaks at around 300 dimensions and remains relatively stable above 400, which is enough to characterize the model’s sensitivity to the word-embedding dimension. Therefore, further increasing the value did not bring a significant improvement to the model. In the end, we chose 300 as the word-embedding dimension in the experiments.

4.4.3. Impact of Length of the Document

In order to evaluate the impact of the length of the document information on the model, we set up a verification experiment on the 5-class data. The accuracy and training time of the model were recorded while adjusting the document length. It is not difficult to see from Figure 5 that when the maximum document length is 200, the model performs better and the time spent is also more reasonable; that is, the model is more stable.

It can also be seen that the additional document information can be used effectively for the document latent vector until the maximum length of 400 is reached; no further information is gained even if the document length exceeds 400. Meanwhile, as the document length increases, the time required to process the document information increases, and the model needs more time for effective training. Therefore, when processing document information, it is important to consider the trade-off between training time and document length.

4.5. Experimental Comparison and Analysis

In order to verify the effectiveness of the proposed IA-CN algorithm, this paper selects four models for comparison experiments under the hyperparameter settings above, namely, the matrix factorization (MF) model, the hybrid matrix factorization (HMF) model, the collaborative topic regression (CTR) model, and the deep cooperative neural network (DeepCoNN); all are classic recommendation models. Among them, the matrix factorization model and the hybrid matrix factorization model are model-based recommendation algorithms; comparing against them verifies the effect of review text and description information on the recommendation algorithm, especially when the rating matrix is relatively sparse. The collaborative topic model is a topic-based recommendation algorithm, and most recommendation algorithms that consider review text are based on topic models; comparing against it verifies the advantages of the proposed algorithm, in particular the use of word embedding vectors to replace the topic model in the preprocessing step and the performance gain brought by preserving the order of words in the text. DeepCoNN is a typical representative of deep-learning-based collaborative filtering, and the algorithm proposed in this paper is improved on the basis of this deep cooperative neural network, so a vertical comparison with it is also necessary. The experiment was repeated three times, and the average of the three results was taken as the final result. Finally, experiments on Amazon product evaluation datasets of different sizes yielded the results shown in Table 1.

Of course, to make the comparison more convincing, we also compared our method with the benchmarks MF, HMF, CTR, and DeepCoNN on the Amazon product evaluation dataset in terms of top-N recommendation. The results in Figures 6 and 7 show that Recall@N gradually increases as N increases, and, compared with the benchmarks on both Recall@N and Precision@N, our method obtains relatively higher results.

Running time is considered a nonessential evaluation criterion, and we designed the following experiment: the 10w-full dataset was trained with the different models, and then 20 different samples were randomly selected from the test set for testing, yielding the time comparison shown in Figure 8. It can be seen that MF and HMF took less time in the same test environment, because these machine learning methods have fewer layers to learn. Meanwhile, the deep learning methods DeepCoNN and ours took more time, and because the algorithm in this paper adds an attention mechanism and a factorization machine, its running time is higher than the former. However, even so, the recommendation can be completed within 0.4 s, which is clearly an acceptable recommendation time, so our algorithm can be considered correct and reasonable.

5. Conclusion

Aiming at data sparseness, item cold start, long-tail items, and other related issues in existing personalized item recommendation algorithms, this paper studies how to combine existing recommendation techniques with deep learning methods, whose powerful fitting ability, nonlinear mapping, and representation capabilities can alleviate the impact of data sparseness and item cold start in existing personalized recommendation systems. The paper designs an integrated strategy model of recommendation algorithms, generalizes and classifies the output of the model, then trains an improved CNN deep learning model, and finally verifies the feasibility of the scheme through experiments. Although the experimental results show some progress, the research and application of deep learning methods in recommendation systems are still at the development stage. The hybrid recommendation scheme based on deep learning proposed in this article still has partial and stage-specific problems, and its use in actual recommendation systems remains to be explored.

Data Availability

The dataset used in this paper can be found at https://nijianmo.github.io/amazon/index.html.

Conflicts of Interest

The authors declare that they have no conflicts of interest.