Abstract

Associating films with users across large and varied film data, and helping users find useful information, is a major challenge. Recommendation systems aim to provide users with accurate item recommendations and can effectively mitigate the information explosion caused by large volumes of data; traditional recommendation systems are widely used in movie and shopping applications. To address this problem, this paper designs and develops a collaborative filtering recommendation algorithm based on a big data platform. First, the network is deeper than a traditional autoencoder, and a new activation function is used to generate deep feature vectors. Second, the model can describe both linear and nonlinear features of movie data, which further improves its ability to extract nonlinear features. Experimental results show that the proposed algorithm is effective and can bring a better user experience and economic benefits to consumers.

1. Introduction

According to relevant statistics, by June 2020 the number of Internet users in China had exceeded 900 million, and more and more people use the Internet to obtain the information they need. However, as network technology matures and the number of users grows rapidly, the burden of finding useful information has increased greatly: users are confronted with large volumes of information, including news, advertising, electronic products, movies, and so on [1]. The emergence and development of recommendation systems has effectively alleviated this information overload. Research on movie recommendation systems has recently become increasingly popular. On big data platforms, users browse web pages, click to watch, and rate and comment on movies across different movie platforms, and these behaviors generate data on those platforms. Among the different types of data, user-movie rating information plays a particularly important role, as it usually reflects how popular a movie is with users.

To alleviate information overload, information retrieval and recommendation systems have emerged in succession; movie platforms such as iQiyi and Youku provide search-engine functionality. However, when a user cannot accurately describe a movie with keywords, the search may fail to find it, which greatly degrades the user experience [2]. This is mainly because the right keywords are often abstract and hard to articulate, and sometimes involve technical terms beyond the user's knowledge. Researchers worked on this limitation of search engines for a long time, and the recommendation system eventually emerged as a solution. It makes the system more efficient to use and improves the user experience. Therefore, to help users quickly and efficiently find movies they are genuinely interested in, many researchers have turned their attention to recommendation algorithms [3].

Using sparse and limited rating data to improve recommendation accuracy remains a key direction for optimizing recommendation systems. To improve accuracy, researchers have treated real-life social network connections as an important auxiliary signal, integrated them into the traditional recommendation framework, and proposed social recommendation methods, which have proved practical and reliable. On the one hand, major social platforms can easily obtain users' social network relationships; on the other hand, users rely on recommendation systems to filter out large amounts of useless information and make decisions more easily [4].

Over recent decades, many experts and scholars have proposed and improved various recommendation algorithms, such as collaborative filtering based on matrix factorization [5]. As a well-known recommendation algorithm, it has attracted wide attention. As research has deepened, however, it has been found that the algorithm relies blindly on the associations between users and items, which do not necessarily reflect all users' preferences and may therefore reduce recommendation accuracy. First, the algorithm learns user-item interactions only with a linear model, so it struggles to capture deeper latent features of users and items. Second, the data are high-dimensional and large-scale, while the user-item rating matrix is extremely sparse: each user interacts with very few items. Moreover, the algorithm makes recommendations based on the similarity between users or between items; when a new user or item appears, its similarity cannot be computed, which degrades the performance of the recommendation system. The basic structure of the recommendation engine is given in Figure 1.

Deep learning can address the above problems [6]. By building neural network models, deep learning can learn the latent features of data and has strong representation and learning capabilities. A hybrid movie recommendation algorithm can thus mine deep features of users and items, improve on the performance of traditional recommendation algorithms, build a model that better matches users' preferences, and provide more personalized recommendation services [7]. Deep learning-based recommendation algorithms therefore effectively compensate for the limitations of traditional recommendation algorithms [8].

A movie recommendation system recommends movies that users may be interested in by analyzing their historical viewing records. A good movie recommendation system must provide personalized recommendations, and its core is the recommendation technique, which includes content-based recommendation, collaborative filtering-based recommendation, and so on. Collaborative filtering can uncover users' latent, previously unexpressed interests and make personalized recommendations [9]. Collaborative filtering has become an important recommendation technique and can be divided into user-based collaborative filtering (user-based CF) and item-based collaborative filtering (item-based CF). Because collaborative filtering relies on large amounts of historical user behavior data to make accurate recommendations, it suffers from problems such as data sparsity and cold start. In today's Internet environment there is a large amount of item data, while the records users leave when accessing the Internet are limited, so each user's interaction records with items are sparse. In addition, traditional collaborative filtering does not consider the effect of time on user interest, even though interests shift over time; as a result, it tends to make fixed recommendations based on users' historical behavior. These problems directly affect the recommendation accuracy of collaborative filtering [10].
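To make user-based CF concrete, here is a minimal Python sketch, not the method proposed in this paper: it predicts a user's rating for an unseen movie as the similarity-weighted average of other users' ratings, using cosine similarity over co-rated movies. The toy ratings and the names `cosine` and `predict_user_based` are hypothetical, and cosine similarity is only one common choice. A new user with no ratings gets similarity 0 to everyone and therefore no prediction, which is the cold-start problem in miniature.

```python
import math

# Hypothetical toy data: user -> {movie: rating}; missing keys are unrated.
RATINGS = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 4, "m4": 5},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}


def cosine(u, v):
    """Cosine similarity between two users over the movies both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0  # no co-rated movies: similarity cannot be computed
    dot = sum(u[m] * v[m] for m in common)
    norm_u = math.sqrt(sum(u[m] ** 2 for m in common))
    norm_v = math.sqrt(sum(v[m] ** 2 for m in common))
    return dot / (norm_u * norm_v)


def predict_user_based(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or movie not in their:
            continue
        sim = cosine(ratings.get(user, {}), their)
        num += sim * their[movie]
        den += abs(sim)
    return num / den if den else None  # None: no usable neighbour (e.g. cold start)


print(predict_user_based(RATINGS, "alice", "m4"))
print(predict_user_based(RATINGS, "dave", "m1"))  # unknown user -> None
```

Item-based CF follows the same pattern with the roles of users and movies swapped.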

2.1. Introduction to Traditional Recommendation Methods

The core idea of a recommendation system is to filter large amounts of information and recommend appropriate items to users [11]. The system analyzes and mines users' historical behavior with algorithms and automatically presents the filtered information to users. The data it analyzes mainly include log files generated by user behavior on the client (such as ratings, likes, favorites, and browsing duration), personal information users provide when registering an account, and item information provided when items are uploaded to the application [12, 13]. The recommendation algorithm is undoubtedly the core of this process: how correctly it uses social data to analyze users' interests directly affects the accuracy of the recommendation system and the user experience [14].

According to the form of the recommendation, recommendation tasks can generally be divided into rating prediction and top-K recommendation. Rating prediction uses the ratings users have already given to build a prediction model that estimates their interest in items they have not yet encountered [15, 16]. Because users' ratings of items can be viewed as a two-dimensional rating matrix, matrix completion techniques are widely used for this task: the observed rating matrix is treated as an incomplete version of a complete matrix, the observed entries are treated as samples from that complete matrix, and the missing entries are recovered as accurately as possible [17]. Explicit ratings are an indispensable source of feedback for this task. However, relying on them often ignores users' behavior and actions (implicit feedback), and the model is vulnerable to the sparsity of the rating matrix: when the amount of available data is small, the model often fails to achieve the expected recommendation performance [18].
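Matrix completion for rating prediction is commonly done with matrix factorization. The following minimal Python sketch is an illustration, not this paper's model: it learns user and item latent vectors by stochastic gradient descent on the observed ratings only, then fills in a missing cell. The toy ratings, the latent size `k`, the learning rate, and the regularization weight are all hypothetical choices.

```python
import math
import random

# Hypothetical observed (user, movie, rating) triples; all other cells are missing.
OBSERVED = [(0, 0, 5), (0, 1, 3), (0, 2, 4), (1, 0, 4), (1, 1, 2),
            (1, 3, 5), (2, 0, 1), (2, 1, 5), (2, 3, 2)]


def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn P (n_users x k) and Q (n_items x k) so that P[u] . Q[i] ~ r_ui."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:  # only observed cells contribute to the loss
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # SGD step on squared error plus L2 regularization
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating: inner product of user and item latent vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


P, Q = factorize(OBSERVED, n_users=3, n_items=4)
train_rmse = math.sqrt(
    sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in OBSERVED) / len(OBSERVED))
print(f"train RMSE = {train_rmse:.3f}; filled-in r(0, 3) = {predict(P, Q, 0, 3):.2f}")
```

The filled-in value for user 0 and movie 3 is exactly the "incomplete element" that matrix completion is meant to recover.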

Collaborative filtering is one of the most popular recommendation approaches: it analyzes the interaction information between users and items and then makes recommendations to users. Because human decisions are influenced by subjective preferences, the relationships between users and items can be mined collaboratively from the interactions of many users with many items. Because user-item interaction information is readily available and the algorithm is effective and feasible, it has been widely used in many fields [19, 20]. Model-based recommendation algorithms first build an explicit model. This approach is highly accurate, but an accurate model is sometimes difficult to build and tends to extend poorly. Commonly used models include clustering models, matrix factorization models, neural network models, and so on. Among them, matrix factorization models effectively avoid the difficulty neighborhood-based algorithms have in computing the similarity matrix and have been widely used [21, 22]. Attribute-based (content-based) recommendation recommends items according to the similarity between users' attributes, the match between user and item attributes, and the similarity between items' attributes [23]; the difficulty lies in extracting attributes such as image content, audio spectra, and language from items and computing over such heterogeneous attributes. A hybrid recommendation algorithm organically fuses several different single recommendation algorithms. As a combined algorithm, it can avoid the shortcomings of any single algorithm and improve recommendation efficiency and performance; for example, weighted, partitioned, and layered hybrids all perform well.
These hybrid modes are usually combined and selected dynamically according to the recommendation scenario and task [24].
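Of the hybrid modes mentioned, weighted mixing is the simplest to illustrate. The sketch below is a hypothetical example (the scores, the weights, and the name `weighted_hybrid` are invented): it blends per-movie scores from two recommenders, say a collaborative filter and a content-based one, by a normalized weighted sum and ranks the result. It assumes the scores are already on comparable scales.

```python
def weighted_hybrid(score_maps, weights):
    """Blend {movie: score} dicts from several recommenders by a weighted sum.

    Weights are normalized to sum to 1; a movie absent from one recommender's
    output contributes 0 from that recommender.
    """
    if len(score_maps) != len(weights) or sum(weights) <= 0:
        raise ValueError("need one weight per recommender, with a positive sum")
    total = sum(weights)
    blended = {}
    for scores, w in zip(score_maps, weights):
        for movie, s in scores.items():
            blended[movie] = blended.get(movie, 0.0) + (w / total) * s
    # Highest blended score first
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical outputs of two recommenders
cf_scores = {"m1": 0.9, "m2": 0.2}
content_scores = {"m1": 0.1, "m2": 0.8, "m3": 0.5}
print(weighted_hybrid([cf_scores, content_scores], [0.7, 0.3]))
```

In practice the weights would be tuned per scenario, which is the dynamic selection the paragraph refers to.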

2.2. Deep Learning-Based Recommendation Algorithms

An autoencoder is a special kind of neural network that can learn deep latent features of its input when constraints are added to the model. It can compress the data and reduce their dimensionality, learning deep features of high-dimensional sparse data and producing low-dimensional latent feature vectors in the hidden layer.
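As a minimal illustration of this compression, the sketch below encodes an 8-dimensional sparse rating vector into a 3-dimensional hidden feature vector with a sigmoid layer and decodes it back to 8 dimensions. The sizes, the class name `TinyAutoencoder`, and the plain-Python implementation are hypothetical, and the weights are random and untrained; a real model would learn them by minimizing reconstruction error. It only shows the shapes and data flow.

```python
import math
import random


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def _dense(x, W, b, act):
    """One fully connected layer: act(W x + b), with W stored row-wise."""
    return [act(sum(w * xi for w, xi in zip(row, x)) + bias) for row, bias in zip(W, b)]


class TinyAutoencoder:
    """Encoder: n_in -> n_hidden (sigmoid); decoder: n_hidden -> n_in (identity)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)  # random, untrained weights (illustration only)
        self.W_enc = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b_enc = [0.0] * n_hidden
        self.W_dec = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b_dec = [0.0] * n_in

    def encode(self, x):
        # Low-dimensional hidden feature vector for a high-dimensional sparse input
        return _dense(x, self.W_enc, self.b_enc, _sigmoid)

    def decode(self, h):
        # Reconstruct the original input dimension from the hidden vector
        return _dense(h, self.W_dec, self.b_dec, lambda z: z)


ae = TinyAutoencoder(n_in=8, n_hidden=3)
x = [5, 0, 0, 3, 0, 0, 0, 1]  # hypothetical sparse rating vector (0 = unrated)
h = ae.encode(x)
print(len(h), len(ae.decode(h)))
```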

Autoencoder neural networks can not only reduce the dimensionality of high-dimensional data but also learn deep features of the target, so they are often used to extract latent features. In recent years, numerous recommendation algorithms based on autoencoders have been proposed. The author in [25] was the first to combine an autoencoder with a recommendation algorithm, updating the parameters by gradient descent to minimize the error, and proposed AutoRec. However, that algorithm uses only a single autoencoder network for recommendation without combining it with existing recommendation algorithms, so researchers began integrating autoencoders into existing algorithms. The authors in [26] integrated a stacked denoising autoencoder into tag-content-based collaborative filtering, improving on the traditional algorithm once tag content was added, and proposed a collaborative filtering algorithm based on tag content and a stacked denoising autoencoder (DLCF). The authors in [27] computed latent feature vectors of users and items by combining a stacked denoising autoencoder with a deep neural network, improving the accuracy of traditional recommendation algorithms, and proposed a recommendation algorithm based on a hybrid autoencoder. Wang et al. first extracted low-dimensional nonlinear features of items with an autoencoder and then integrated them into matrix factorization, alleviating the cold-start problem of traditional factored item similarity models (FISM), and proposed AutoFISM, which integrates the autoencoder with the items' inherent information [28].
The authors in [29] applied multiple denoising to the original rating matrix, which effectively improved the robustness of the original algorithm and addressed the low accuracy of the traditional denoising autoencoder (DAE). The authors proposed a multi-denoising autoencoder (MDAE) that incorporates product rating information [30]. The authors in [31] extracted latent feature vectors of users and items, built a trust model to learn the interest preferences of users trusted by the target user, clustered users by interest, and assigned different recommendation weights accordingly.
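Two ideas recur in this line of work: AutoRec computes the reconstruction error only over observed ratings (plus an L2 penalty on the weights), and denoising variants corrupt the input before reconstructing it. The sketch below illustrates both in plain Python. The masking-noise corruption, the function names, and the toy numbers are assumptions for illustration and do not reproduce the exact procedures of [26]–[31].

```python
import random


def masked_reconstruction_loss(r, r_hat, weights, lam):
    """AutoRec-style objective for one rating vector.

    r: observed ratings, with None marking unobserved entries (ignored);
    r_hat: the autoencoder's reconstruction of r;
    weights: flat list of all network weights, penalized by lam / 2 * ||W||^2.
    """
    recon = sum((ri - hi) ** 2 for ri, hi in zip(r, r_hat) if ri is not None)
    return recon + 0.5 * lam * sum(w * w for w in weights)


def mask_corrupt(x, drop_prob, rng):
    """Masking noise: zero each input entry independently with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


def multi_corrupt(x, drop_prob, n_copies, seed=0):
    """Several independently corrupted copies of one input (hypothetical helper);
    a denoising autoencoder is trained to reconstruct the clean x from each copy."""
    rng = random.Random(seed)
    return [mask_corrupt(x, drop_prob, rng) for _ in range(n_copies)]


# The unobserved middle entry does not contribute to the loss
print(masked_reconstruction_loss([5, None, 3], [4, 2.5, 3], weights=[1.0, -1.0], lam=0.1))
print(multi_corrupt([5.0, 3.0, 4.0, 1.0], drop_prob=0.5, n_copies=3))
```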

Neural collaborative filtering (NCF) has brought tremendous change to the development of recommendation systems. To improve on traditional collaborative filtering, the authors in [32] proposed the NCF model, which solved the problem that traditional collaborative filtering cannot effectively learn complex nonlinear relationships between users and items. Deep learning has a strong ability to learn nonlinear features of data, which to some extent compensates for the inability of traditional recommendation algorithms to learn deep features of users and items. A deep learning recommendation algorithm generally consists of an input layer, a model layer, a processing layer, and an output layer. The input is usually the user's personal information and attribute information. The model layer uses deep learning models such as the autoencoder (AE), restricted Boltzmann machine (RBM), convolutional neural network (CNN), recurrent neural network (RNN), and deep belief network (DBN) [33], which transform the input into latent feature representations of users and items at the processing layer. Finally, the user's recommendation list is generated by feeding the inner product of these vectors into a softmax layer or by similarity calculation.
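The NCF idea can be sketched as a forward pass that fuses a generalized matrix factorization (GMF) branch, the element-wise product of user and item embeddings, with an MLP branch over their concatenation, followed by a sigmoid output. This sketch follows the NeuMF structure of the original NCF work rather than the improved model of this paper; the embeddings, layer weights, and function names are hypothetical, and training is omitted. The GMF branch captures linear interactions and the MLP branch nonlinear ones.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mlp_tower(x, layers):
    """Apply fully connected ReLU layers; `layers` is a list of (W, b), W row-wise."""
    for W, b in layers:
        x = [relu(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]
    return x


def neumf_score(p_gmf, q_gmf, p_mlp, q_mlp, layers, h):
    """Predicted interaction probability for one (user, item) pair."""
    gmf = [a * b for a, b in zip(p_gmf, q_gmf)]  # linear interaction (GMF branch)
    mlp = mlp_tower(p_mlp + q_mlp, layers)       # nonlinear interaction (MLP branch)
    fused = gmf + mlp                            # concatenate both branches
    return sigmoid(sum(hk * zk for hk, zk in zip(h, fused)))


# Hypothetical embeddings and a single 4 -> 2 MLP layer
layers = [([[1, 1, 1, 1], [1, -1, -1, -1]], [0.0, 0.0])]
score = neumf_score([1, 2], [0.5, -1], [1, 0], [0, 1], layers, h=[1, 1, 1, 1])
print(round(score, 4))
```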

Based on the above analysis, the contributions of this paper are as follows:
(1) The network is deeper than a traditional autoencoder, and a new activation function is used to generate deep feature vectors.
(2) The model can describe both linear and nonlinear features of movie data, which further improves its ability to extract nonlinear features.
(3) Experimental results show that the proposed algorithm is effective and can bring a better user experience and economic benefits to consumers.

3. Big Data Technology-Based Digital Movie Recommendation

3.1. Improved Neural Network Collaborative Filtering Model

To address the problems described above, this section improves the traditional NCF model based on feedback data to further enhance the performance of the recommendation model. The feature layer follows the generalized matrix factorization (GMF) principle, and its mathematical definition is shown in the following equation:

Then, the improved multi-layer perceptron (MLP) network model and prediction function are shown in (2) and (3), where W and B are the weight matrix and bias (offset) of the MLP network, respectively; F is the activation function of the MLP network; and L is the number of layers of the MLP network. The final definition of the improved neural network collaborative filtering model is shown in the following formula:
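For reference, the standard NCF (NeuMF) formulation of [32], which the improved model in this section builds on, can be written with the symbols above: W_l and b_l are the weights and bias (B) of layer l, F is the activation, and L is the number of layers. Here p_u and q_i are user and item embedding vectors (superscripts G and M mark the separate embeddings of the GMF and MLP branches), h is the output weight vector, and σ is the sigmoid function. These are the baseline equations, given as a sketch; the improved equations of this paper may differ.

```latex
\begin{aligned}
\phi^{\mathrm{GMF}} &= \mathbf{p}_u^{G} \odot \mathbf{q}_i^{G} \\
\mathbf{z}_1 &= \begin{bmatrix} \mathbf{p}_u^{M} \\ \mathbf{q}_i^{M} \end{bmatrix}, \qquad
\mathbf{z}_l = F\!\left(W_l^{\top}\mathbf{z}_{l-1} + \mathbf{b}_l\right), \quad l = 2, \dots, L \\
\phi^{\mathrm{MLP}} &= \mathbf{z}_L \\
\hat{y}_{ui} &= \sigma\!\left(\mathbf{h}^{\top}\begin{bmatrix} \phi^{\mathrm{GMF}} \\ \phi^{\mathrm{MLP}} \end{bmatrix}\right)
\end{aligned}
```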

3.2. Neural Collaborative Filtering Algorithm Based on a Hybrid Deep Autoencoder

Because the traditional NCF model cannot extract deeper latent features of users and items and loses features during feature extraction, this paper proposes a neural collaborative filtering algorithm based on a hybrid deep autoencoder. The model consists mainly of a feature generation part and a feature extraction part. The ReLU function used in this paper is defined in the following equation:
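The standard ReLU activation is f(x) = max(0, x). As a hedged illustration of feature generation with a deeper encoder, the sketch below passes a rating vector through a stack of ReLU layers. The layer widths, the name `deep_encode`, and the plain-Python implementation are hypothetical; the weights are random and untrained, and the paper's actual architecture is not reproduced here.

```python
import random


def relu(z):
    """Rectified linear unit: max(0, z)."""
    return z if z > 0 else 0.0


def deep_encode(x, layer_sizes, seed=0):
    """Pass x through a stack of fully connected ReLU layers.

    layer_sizes: hypothetical hidden widths, e.g. [6, 4, 2]; more entries give a
    deeper encoder. Weights are random (untrained) purely to show the data flow.
    """
    rng = random.Random(seed)
    h = list(x)
    for width in layer_sizes:
        W = [[rng.gauss(0, 0.1) for _ in range(len(h))] for _ in range(width)]
        h = [relu(sum(w * v for w, v in zip(row, h))) for row in W]
    return h


features = deep_encode([5, 0, 3, 0, 1, 0, 0, 4], layer_sizes=[6, 4, 2])
print(features)  # a 2-dimensional, non-negative feature vector
```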

Then, the user-item bias (offset) of the NCF model is

The weight adjustment formula of the first hidden layer can be expressed as

The predicted rating after processing by the improved NCF is

When the data are sparse, this method remains simple and converges. The mean squared error (MSE) was selected as the loss function, and the optimal model parameters were found by minimizing the MSE.
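The equations in this subsection are specific to the proposed model. As a generic illustration of the ingredients named here (user and item bias terms, a gradient-based weight update, and an MSE loss), the sketch below fits the classic baseline predictor r̂_ui = μ + b_u + b_i by full-batch gradient descent on MSE. The predictor, the learning rate, the toy data, and the function names are assumptions, not the paper's formulas.

```python
# Hypothetical toy (user, movie, rating) triples.
RATINGS = [(0, 0, 5), (0, 1, 3), (0, 2, 4), (1, 0, 4), (1, 1, 2),
           (1, 3, 5), (2, 0, 1), (2, 1, 5), (2, 3, 2)]


def mse(ratings, mu, b_u, b_i):
    """Mean squared error of r_hat = mu + b_u + b_i over the given ratings."""
    return sum((mu + b_u.get(u, 0.0) + b_i.get(i, 0.0) - r) ** 2
               for u, i, r in ratings) / len(ratings)


def fit_biases(ratings, lr=0.05, epochs=200):
    """Full-batch gradient descent on MSE for the user and item bias terms."""
    n = len(ratings)
    mu = sum(r for _, _, r in ratings) / n  # global mean rating, kept fixed
    b_u, b_i = {}, {}
    for _ in range(epochs):
        g_u, g_i = {}, {}
        for u, i, r in ratings:
            err = mu + b_u.get(u, 0.0) + b_i.get(i, 0.0) - r
            # dMSE/db = (2 / n) * sum of residuals involving that bias
            g_u[u] = g_u.get(u, 0.0) + 2.0 * err / n
            g_i[i] = g_i.get(i, 0.0) + 2.0 * err / n
        for u, g in g_u.items():  # weight update: b <- b - lr * gradient
            b_u[u] = b_u.get(u, 0.0) - lr * g
        for i, g in g_i.items():
            b_i[i] = b_i.get(i, 0.0) - lr * g
    return mu, b_u, b_i


mu, b_u, b_i = fit_biases(RATINGS)
print(f"MSE with global mean only: {mse(RATINGS, mu, {}, {}):.3f}")
print(f"MSE after fitting biases:  {mse(RATINGS, mu, b_u, b_i):.3f}")
```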

Based on equations (1)–(11), Figure 2 shows the digital movie recommendation algorithm based on a big data platform proposed in this paper.

4. Experimental Results and Analysis

4.1. Introduction to the Experimental Dataset

The experiments were run on the Windows operating system and implemented in the Python programming language using PyCharm 2019.3.1 Professional. The deep learning framework is TensorFlow, developed by Google's AI team.

In recent years, a variety of recommendation algorithms have emerged. The MovieLens dataset, a public movie recommendation dataset provided by GroupLens, is the most widely used public dataset in the recommendation field. In descending order of scale, the MovieLens datasets are ML-20M, ML-10M, ML-1M, and ML-100K. In these datasets, a higher rating indicates greater user interest in the movie.

The MovieLens dataset, one of the most common datasets in the recommendation field, was selected for this experiment. MovieLens is a website that collects movie ratings and offers movie recommendations. The MovieLens-1M dataset contains about 1,000,000 ratings on a scale of 1–5, with a density of 4.19%. To reflect the interactions between users and movies, the dataset was preprocessed: the explicit ratings in the user-movie interaction matrix were converted to implicit feedback, with an entry set to 1 if the user interacted with the movie and 0 otherwise. In the trial data, each dataset consisted of 15 random subsets drawn from users who had rated at least 40 movies. There were 600 users in the experimental data and 180 users in the test data, and 900 users rated 1,800 movies 120,000 times.
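The preprocessing step and the quoted density can be checked with a short sketch. Assuming the commonly cited ML-1M statistics (1,000,209 ratings from 6,040 users, with movie IDs up to 3,952), density = ratings / (users × movies) comes out at about 4.19%. The helpers `to_implicit` and `density` are hypothetical; `to_implicit` maps every observed rating to 1, leaving unobserved pairs at 0.

```python
def to_implicit(explicit):
    """Convert {(user, movie): rating} to implicit feedback (hypothetical helper):
    1 for any observed interaction; pairs absent from the dict are implicitly 0."""
    return {pair: 1 for pair, rating in explicit.items() if rating is not None}


def density(n_ratings, n_users, n_movies):
    """Fraction of the user-movie matrix that is observed."""
    return n_ratings / (n_users * n_movies)


# Hypothetical explicit ratings keyed by (user ID, movie ID)
print(to_implicit({(1, 1193): 5, (1, 661): 3, (2, 1357): 1}))
# Assumed ML-1M figures: 1,000,209 ratings, 6,040 users, movie IDs up to 3,952
print(f"ML-1M density ~ {100 * density(1_000_209, 6_040, 3_952):.2f}%")
```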

4.2. Experimental Result Analysis

In this experiment, data on 2,000, 4,000, 6,000, 8,000, and 10,000 users and the movies they watched were selected from the combined valid samples, and the resulting root mean square error (RMSE) values were compared. With the number of recommended digital movies set to 2, 4, 6, and 8, respectively, the relationship between the RMSE value and the number of digital movies was observed. The specific experimental results are shown in Figure 3.
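For reference, RMSE is the square root of the mean squared difference between predicted and actual ratings; lower is better. A minimal sketch with made-up numbers (the function name `rmse` is illustrative):

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length rating sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical actual vs. predicted ratings
print(rmse([4, 3, 5, 2], [3.5, 3, 4, 2.5]))
```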

As the figure shows, the RMSE is smaller, that is, accuracy is higher, when fewer movies are recommended, which indicates that the algorithm performs better with fewer samples. As the number of movies increases, the RMSE gradually stabilizes at a certain level. This shows that the improved neural collaborative filtering algorithm proposed in this paper maintains good recommendation performance on big data and can recommend the most suitable digital movies to users with high accuracy.

In the test phase, the results for different values of k (the number of recommended movies) are shown in Figure 4. As the figure shows, both k and the amount of data have a significant effect on the RMSE. Specifically, when k is greater than 20, the RMSE is generally about 50%. As k increases, the RMSE tends to decrease; when k exceeds 200, it stabilizes at a small value, indicating that the proposed method maintains good movie recommendation performance on the big data platform.
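Here k is the length of each user's recommendation list. One common way to build such a list from predicted scores, excluding movies the user has already rated, is sketched below; the scores and the name `top_k` are hypothetical, and the paper does not specify its exact list-construction procedure.

```python
import heapq


def top_k(predicted, seen, k):
    """Return the k unseen movies with the highest predicted scores."""
    candidates = ((score, movie) for movie, score in predicted.items() if movie not in seen)
    return [movie for _, movie in heapq.nlargest(k, candidates)]


# Hypothetical predicted scores for one user
scores = {"m1": 4.8, "m2": 3.1, "m3": 4.2, "m4": 2.0, "m5": 4.5}
print(top_k(scores, seen={"m1"}, k=2))
```

`heapq.nlargest` avoids sorting the full candidate set, which matters when scoring thousands of movies per user.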

In addition, the mean absolute percentage error (MAPE) was used as an evaluation metric to verify the effectiveness of the proposed method; the MAPE values are shown in Figure 5. According to Figure 5, when fewer movies are recommended (RM = 5), the MAPE of the algorithm is 28.63%, and when more movies are recommended (RM = 15), the MAPE is 32.00%. Therefore, the best number of recommended movies is RM = 5, at which the proposed method achieves its best recommendation performance.
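MAPE averages the absolute prediction error as a percentage of the actual value, so it is undefined when an actual value is 0. A minimal sketch with made-up numbers follows; the paper does not state how zero-valued targets are handled, so here they simply raise an error (an assumption).

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual vs. predicted values
print(mape([4, 5, 2], [3, 5, 2.5]))
```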

Figure 5 shows the evaluation results based on MAPE; we next give further quantitative metrics to evaluate model performance. In this part, four metrics, namely TP rate, FP rate, accuracy, and F1 score, are used to evaluate our model. This experiment was run on the recommendation model based on explicit feedback. Two MovieLens datasets of different sizes were selected as the main experimental data to study the influence of different parameter settings on the results. Extensive experiments showed that, with the other experimental parameters held fixed, the vector size, the hidden-layer dimension, the learning rate, and the dropout rate all affect the evaluation metrics of the recommendation algorithm. To study the performance and effectiveness of the algorithm from several aspects, a control experiment was set up to compare it with several traditional algorithms. Figure 6 shows that the proposed method achieves the best TP rate, which means that the model is better at identifying films suitable for collaborative filtering recommendation. In Figure 6, CB denotes the combination method and SA denotes the sentiment analysis method.
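The four metrics can be computed from binary confusion-matrix counts, with a recommended, relevant movie counted as a true positive. The sketch below uses the standard definitions; the counts are made up, and treating each recommendation as a binary relevant/not-relevant decision is an assumption about how the paper applied these metrics.

```python
def classification_metrics(tp, fp, tn, fn):
    """TP rate (recall), FP rate, accuracy, and F1 from confusion-matrix counts.

    Assumes no denominator is zero (e.g. at least one positive and one negative).
    """
    tp_rate = tp / (tp + fn)  # share of relevant movies that were recommended
    fp_rate = fp / (fp + tn)  # share of irrelevant movies that were recommended
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * tp_rate / (precision + tp_rate)
    return {"tp_rate": tp_rate, "fp_rate": fp_rate, "accuracy": accuracy, "f1": f1}


# Hypothetical counts
print(classification_metrics(tp=8, fp=2, tn=85, fn=5))
```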

First, we compare the performance of the same algorithm on the two MovieLens datasets of different sizes. The experiments in Figure 7 show that the improved deep autoencoder algorithm performs better on every evaluation metric, and that the proposed method achieves good film recommendation accuracy on both the training and test sets.

5. Conclusions

This paper optimizes the autoencoder neural network in deep learning and proposes two recommendation algorithms based on autoencoders, which to some extent improve on traditional algorithms and several newer ones. However, the research still has some limitations. The algorithm does not yet make full use of the dataset: this experiment uses only the user-item interaction information and does not take user attributes such as age, gender, and occupation into account, although such key information could further improve model performance. Adding key inherent attributes to the model could therefore further improve the accuracy and interpretability of recommendations. In addition, future research can consider incorporating interest-transfer models and semantic analysis models to further improve recommendation accuracy and diversity.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the Special Research Project on Smart Teaching in Undergraduate Universities of Henan Province in 2021 (Research and Practice on the Construction of Smart Teaching Environment in Colleges and Universities Based on I & C-Center) and Research and Practice Project of 2021 School Level Education and Teaching Reform of Henan Institute of Technology (2021-ZD014).