Abstract

To address the shortcomings of current music recommendation algorithms, such as low accuracy and poor timeliness, a personalized hybrid recommendation algorithm incorporating music gene features is proposed. The user-based collaborative filtering (UserCF) algorithm analyzes users’ degree of preference for music genes. The improved neural matrix factorization collaborative filtering (B-NCF) algorithm calculates the correlation between similar users and constructs the adjacency relationship between users. The results of the two algorithms are fused using a weighted hybrid approach to generate the recommendation list, and the hybrid recommendation model is built on the Spark platform. Traditional recommendation algorithms and the proposed hybrid algorithm are validated on the Yahoo Music dataset. The experimental results show that the proposed algorithm performs better under the MAE and F1-measure indexes, with improved recommendation accuracy and precision; the hybrid algorithm also preserves the diversity of the recommended content, achieves a higher recommendation hit rate, and meets the timeliness demands of personalized music recommendation.

1. Introduction

With the rapid development of mobile communication technology, the Internet has become the most effective channel for music distribution [1]. Online music makes entertainment more convenient and has brought us into the big data era. Faced with vast and complex music data, users who cannot obtain accurate information quickly and effectively will inevitably suffer from information overload [2, 3]. At present, traditional Internet music platforms tend to focus on lightweight operations such as searching, collecting, and selecting tracks. Users must formulate precise song requests themselves to complete a search, which is time consuming and easily causes information fatigue [4]. For this reason, some scholars have used different algorithms to recommend music actively, which can effectively alleviate information overload. Park and Cho [5] presented a recommendation algorithm based on SVD matrix factorization to predict user preferences. Its accuracy is 7% higher than that of Netflix Cinematch, and its prediction performance is good. However, it has shortcomings such as high algorithmic complexity and large storage requirements. Ahn [6] proposed a heuristic similarity measure, PIP, which can solve the cold start recommendation problem to some extent. However, users in sparse datasets share few commonly rated items, so the recommendation results are not ideal. Liu et al. [7] proposed a new heuristic algorithm, NHSM, by improving the PIP algorithm. The algorithm considers the context of the user’s ratings and the global preferences of user behaviour. It can calculate user similarity from fewer ratings, and the recommendation performance is greatly improved. He et al. [8] proposed the neural collaborative filtering (NCF) algorithm, which uses a neural network architecture to model the features of users and items, provides a general framework for neural collaborative filtering, and improves the performance of the recommendation model by introducing a multilayer perceptron that makes the model highly nonlinear. At the same time, some researchers have taken the perspective of music genes and proposed feasible personalized music recommendation methods based on the emotions and scenarios of music. Vignoli and Pauws [9] calculated the similarity of songs from the music’s timbre, rhythm, mood, and genre, combining these factors through weight factors to produce accurate music recommendations. Baltrunas et al. [10] proposed a music recommendation algorithm based on the user’s mood in different scenarios, which achieved good results in experiments. Hariri et al. [11] used users’ social tags to classify music and used users’ historical playlists and collection lists to organise and recommend their preferred music genres, also with good results.

In summary, current recommendation systems can largely solve the information overload problem and have further improved recommendation performance. However, shortcomings remain, such as complex implementation, mediocre recommendation accuracy, and poor timeliness, and a single recommendation algorithm cannot meet the multifaceted needs of users, so the recommendation effect is not ideal in practical applications. Therefore, this paper introduces the concept of music genes, builds on users’ preferences for music genes and social tags, and combines the advantages of two algorithms, UserCF and B-NCF. We design a hybrid recommendation algorithm incorporating music gene features to address the shortcomings of current music recommendation algorithms and improve the accuracy of personalized music recommendation.

2. Music Genetic Characteristics

Music genes control the basic information that expresses the auditory effects of music. They are mainly composed of four essential elements: melody, rhythm, harmony, and timbre [12]. Genetic traits can describe different characteristics of music. For a piece of music, some features can be directly felt by the user, have uniqueness, and cannot be changed. For example, lyrics and audio of music can be classified as internal genetic characteristics. However, some music features that are not unique can be classified as external gene characteristics because different users will have different perceptions, such as emotion, style, category, and other music features. The overall structure of music gene characteristics is shown in Figure 1.

According to the nature of the music, external gene characteristics can be further divided into fixed gene characteristics and free gene characteristics. Fixed gene characteristics refer to the inherent characteristics of music that users cannot change, mainly including the music title, album, singer, and other identifying features. Free gene characteristics are user-defined and reflect the user’s perception of the music, mainly including music style, attribution category, music emotion, and other cognitive features [13]. Among them, music emotion refers to the emotional type used to describe the music, which is generally derived from analyzing the context of the lyrics or from tags actively assigned by users. Figure 2 shows the Hevner emotional ring model, which is composed of eight emotions, including strong, joyous, soothing, sad, and exciting. Free gene characteristics can reflect users’ interests, preferences, and cognitive status and play an essential role in improving personalized music recommendation.

3. Hybrid Recommendation Algorithms Combining Musical Gene

3.1. User-Based Collaborative Filtering Algorithm

The basic idea of the user-based collaborative filtering algorithm is to calculate each user’s degree of preference for particular gene features, find the users whose interests are most similar to those of the target user, and then recommend suitable music to the target user according to the similarity principle [14]. Suppose u is the number of users, n is the number of music genes, and $p_{un}$ denotes the preference degree of user u for music gene n, which can come from either explicit or implicit user feedback. Then the user-music gene matrix P in the collaborative filtering algorithm is

$$P = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & \cdots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{u1} & p_{u2} & \cdots & p_{un} \end{bmatrix}$$

Since users have limited usage time and experience, they cannot generate behaviours for most music genes, and thus P is mostly a sparse matrix. After the user-music gene scoring matrix is obtained, the similarity between the target user and other users needs to be calculated to find the set of users most similar to the target user. Several measures are available for this, and the most commonly used at present are cosine similarity, the Jaccard similarity coefficient, and the Pearson correlation coefficient [15]. Among them, cosine similarity uses the cosine of the angle between two vectors to measure the similarity between users, focusing on the difference in direction between the vectors. Cosine similarity is therefore used here to calculate the similarity between users.

Each user profile can be regarded as a vector projected into an n-dimensional space, which yields an n-dimensional vector [16]. If the user has not evaluated a music gene, the value at the corresponding position of the user vector is set to 0. The cosine of the angle between the vectors measures the similarity between users. Let the vectors of user a and target user u be $\vec{a}$ and $\vec{u}$, respectively; then, the similarity between the two users is

$$\mathrm{sim}(a, u) = \cos(\vec{a}, \vec{u}) = \frac{\vec{a} \cdot \vec{u}}{\lVert \vec{a} \rVert \, \lVert \vec{u} \rVert}$$

The range of the cosine value is [−1, 1]. The closer the value is to 1, the closer the directions of the two vectors are and the higher the similarity between the users is; conversely, the closer the value is to −1, the greater the difference in direction between the two vectors is and the lower the similarity between the users is. Based on the similarity between users, preference scores are predicted for the music that the target user has not yet rated, and the pieces with the highest predicted scores are recommended to the target user.
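The UserCF steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's Spark implementation: the function names, the toy matrix `P`, and the similarity-weighted average used to predict unrated entries are all assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length preference vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a user with no recorded preferences matches no one
    return dot / (norm_a * norm_b)

def predict_preferences(P, target, k):
    """Predict scores for the target user's unrated genes (value 0)
    from the k most similar users.

    Assumption: predictions are a similarity-weighted average over the
    neighbours who rated the gene; the paper does not spell this step out.
    """
    sims = [(cosine_similarity(P[target], row), idx)
            for idx, row in enumerate(P) if idx != target]
    neighbours = sorted(sims, reverse=True)[:k]
    predictions = {}
    for gene, score in enumerate(P[target]):
        if score != 0:
            continue  # already rated by the target user
        rated = [(sim, P[idx][gene]) for sim, idx in neighbours if P[idx][gene] != 0]
        total = sum(sim for sim, _ in rated)
        if total > 0:
            predictions[gene] = sum(sim * value for sim, value in rated) / total
    return predictions

# Toy user-music gene matrix: rows are users, columns are music genes.
P = [
    [5, 3, 0, 1],  # target user, gene 2 not yet rated
    [4, 0, 0, 1],
    [1, 1, 5, 4],
    [5, 4, 4, 0],
]
preds = predict_preferences(P, target=0, k=2)
print(preds)
```

In this toy matrix the two nearest neighbours of user 0 are users 1 and 3, and only user 3 has rated gene 2, so the prediction for that gene comes from user 3 alone.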

3.2. Improved NCF Model

The NCF model uses two parallel branches to model users and music genes. The two branches, generalized matrix factorization (GMF) and a multilayer perceptron (MLP), are combined to obtain information that merges high-order implicit features with low-order features [17]. However, the implicit information abstracted during model learning is easily lost, the algorithm is poorly interpretable, and a single source of information struggles to satisfy complex recommendation problems [18]. Therefore, a Bayesian personalized ranking (BPR) structure is used to replace the GMF structure of the neural collaborative filtering network, and the resulting B-NCF model is used to complete the information mining and ranking. Figure 3 shows the structure of the B-NCF model.

In the B-NCF model, the upper layer of the input layer is the fully connected embedding layer, which is used to map the sparse representation of the input layer into a dense vector. The user ID (Uid) is mapped to the user feature vector, and the rated music ID (Rid) and the unrated music ID (Kid) are mapped to the music feature vector.

Vectors suffixed with MLP are input to the MLP layer and concatenated to form new vectors, from which a multilayer perceptron generates higher-order feature information. Batch normalization (BN) and dropout layers are added to the MLP layer. The BN layers normalize the distribution of each layer’s inputs to speed up the convergence of the model. The dropout layer improves the generalization ability of the model and prevents overfitting. The output high-order feature information is expressed as

$$\phi^{\mathrm{MLP}}_{ur} = f\left(O^{T} v_{u},\; Q^{T} v_{r} \mid O, Q, \Theta_{f}\right)$$

where $O^{T}$ represents the transpose of the user feature matrix O; $Q^{T}$ represents the transpose of the music feature matrix Q; $v_{u}$ and $v_{r}$ represent the user feature vector and music feature vector, respectively; and $\Theta_{f}$ denotes the model parameters of the interaction function f.

The vectors with the suffix Emb are input to the BPR layer, and the weights of the BPR layer can be regarded as a user-music hidden factor matrix, which can be used to obtain the ranking scores of different users for any music. The purpose of ranking is to minimize the BPR loss and thus maximize the probability of ranking the preferred music higher. The BPR loss is expressed as

$$L_{\mathrm{BPR}} = -\ln p(\Theta \mid >_{u}) = -\sum_{(u, r, k)} \ln \sigma\left(\hat{x}_{ur} - \hat{x}_{uk}\right) + \lambda \lVert \Theta \rVert^{2}$$

where $p(\Theta \mid >_{u})$ denotes the posterior probability p to be maximized under the model parameters $\Theta$; ln is the natural logarithm function; $\hat{x}_{ur}$ and $\hat{x}_{uk}$ are the products of the vector of user u with the vectors of the rated music r and the unrated music k, expressing the user’s preference for different music genes; σ denotes the sigmoid activation function; λ is the regularization parameter; and $\lVert \Theta \rVert^{2}$ is the regularization term.
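To make the ranking objective concrete, the following sketch evaluates a BPR loss of the standard form (negative log-sigmoid of the score difference plus an L2 penalty) on hand-written embeddings. The names `bpr_loss`, `user_vecs`, and `item_vecs`, and the toy values, are illustrative assumptions; a real B-NCF model would learn these vectors by gradient descent.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bpr_loss(user_vecs, item_vecs, triples, lam):
    """BPR loss over (user, rated item r, unrated item k) triples.

    ranking term: -sum ln sigmoid(x_ur - x_uk)
    penalty term: lam * squared L2 norm of all embedding weights
    """
    ranking = 0.0
    for u, r, k in triples:
        x_ur = dot(user_vecs[u], item_vecs[r])  # score of the rated item
        x_uk = dot(user_vecs[u], item_vecs[k])  # score of the unrated item
        ranking -= math.log(sigmoid(x_ur - x_uk))
    all_vecs = list(user_vecs.values()) + list(item_vecs.values())
    l2 = sum(w * w for vec in all_vecs for w in vec)
    return ranking + lam * l2

# Hypothetical 2-dimensional embeddings for one user and two pieces of music.
user_vecs = {"u1": [1.0, 0.0]}
item_vecs = {"r": [2.0, 0.0], "k": [0.0, 1.0]}

good_order = bpr_loss(user_vecs, item_vecs, [("u1", "r", "k")], lam=0.0)
bad_order = bpr_loss(user_vecs, item_vecs, [("u1", "k", "r")], lam=0.0)
print(good_order, bad_order)
```

The loss is small when the rated piece already scores higher than the unrated one and grows when the order is reversed, which is the behaviour the ranking layer relies on.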

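In the output layer, the ranking information of the BPR layer and the high-order feature information of the MLP layer are concatenated to form a new vector, and the predicted value is obtained using the sigmoid activation function. A predicted value of 1 indicates an interaction, and 0 indicates no interaction. The predicted value can be written as

$$\hat{y}_{ur} = \sigma\left(h^{T} \left[\phi^{\mathrm{BPR}}_{ur};\; \phi^{\mathrm{MLP}}_{ur}\right]\right)$$

where $\phi^{\mathrm{BPR}}_{ur}$ and $\phi^{\mathrm{MLP}}_{ur}$ denote the outputs of the BPR layer and the MLP layer for user u and music r, $[\cdot\,;\cdot]$ denotes vector concatenation, h is the weight vector of the output layer, and σ is the sigmoid activation function.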

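The cross entropy between the predicted value and the target value is calculated, and the parameters of the model are updated by minimizing the following expression:

$$L = -\sum_{(u, r)} \left[ y_{ur} \ln \hat{y}_{ur} + \left(1 - y_{ur}\right) \ln\left(1 - \hat{y}_{ur}\right) \right]$$

where $y_{ur}$ is the target value (1 for an observed interaction and 0 otherwise) and $\hat{y}_{ur}$ is the predicted value.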
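As a small numerical illustration of the cross-entropy objective, the sketch below sums the binary cross entropy over a handful of target/prediction pairs. The function name and the clipping constant `eps` are assumptions added for numerical safety; they are not taken from the paper.

```python
import math

def binary_cross_entropy(targets, predictions, eps=1e-12):
    """Summed binary cross entropy between 0/1 targets and predicted probabilities."""
    total = 0.0
    for y, p in zip(targets, predictions):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0 and 1
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

targets = [1, 0]
confident_right = binary_cross_entropy(targets, [0.9, 0.1])
confident_wrong = binary_cross_entropy(targets, [0.1, 0.9])
print(confident_right, confident_wrong)
```

Predictions close to the targets give a loss near zero, while confident mistakes are penalized heavily, which is what drives the parameter updates.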

3.3. Hybrid Recommendation Algorithm

A single recommendation algorithm often struggles to meet the needs of diverse scenarios, and thus a mixture of multiple recommendation algorithms is needed to improve the accuracy of recommendations [19]. The commonly used hybrid methods are the waterfall (cascade) hybrid, the weighted hybrid, and the switching hybrid. Among them, the weighted hybrid can set different weight factors for different models and generate dynamically weighted models through training, which can improve the accuracy of recommendation and make the recommendation model more suitable for diverse scenarios [20]. Therefore, in this paper, we use a weighted hybrid to combine the UserCF and B-NCF algorithms.

Let the length of the list to be recommended to the user be N. $X_{\mathrm{UserCF}}$ and $Y_{\mathrm{B\text{-}NCF}}$ are the recommendation lists derived from the collaborative filtering algorithm and the improved neural collaborative filtering model, respectively. α and β denote the recommendation weights of the two algorithms, with α + β = 1. Then, the algorithm’s mixed recommendation list TopN can be expressed as

$$\mathrm{TopN} = \alpha X_{\mathrm{UserCF}} + \beta Y_{\mathrm{B\text{-}NCF}}$$

Depending on the application scenario, the way the weights of each algorithm in the hybrid model are chosen varies slightly. In this paper, each algorithm’s share of the hit ratio is used as its weight, and the corresponding evaluation indexes are used to evaluate the performance of the hybrid model. The hybrid algorithm flow is shown in Figure 4.

First, the user-music gene information is obtained from the music dataset, and the collaborative filtering algorithm calculates a recommendation list. At the same time, user-preferred song information is extracted from the dataset, and another recommendation list is calculated according to the B-NCF model. Then, the two recommendation lists are fused using a weighted mixture, and the recommendation list is obtained after data filtering. The two algorithms are performed simultaneously in a parallel manner, and the weights can be adapted according to the actual situation to meet the recommended requirements in different scenarios.
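The weighted fusion step can be illustrated with a short sketch. It assumes that both algorithms return per-item scores already scaled to a comparable range, that an item missing from one list contributes a score of 0 from that list, and that weights may be derived by normalizing the two hit ratios; the function names and toy scores are hypothetical.

```python
def weights_from_hit_ratios(hit_usercf, hit_bncf):
    """Normalize two hit ratios into weights that sum to 1 (an assumed scheme)."""
    total = hit_usercf + hit_bncf
    if total <= 0:
        raise ValueError("at least one hit ratio must be positive")
    return hit_usercf / total, hit_bncf / total

def fuse_top_n(usercf_scores, bncf_scores, alpha, beta, n, seen=()):
    """Weighted fusion of two score dictionaries into one Top-N list."""
    if abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("alpha and beta must sum to 1")
    # Filter out items the user has already interacted with.
    candidates = (set(usercf_scores) | set(bncf_scores)) - set(seen)
    fused = {
        item: alpha * usercf_scores.get(item, 0.0) + beta * bncf_scores.get(item, 0.0)
        for item in candidates
    }
    # Sort by fused score (descending); break ties by item id for determinism.
    ranked = sorted(fused, key=lambda item: (-fused[item], item))
    return ranked[:n]

usercf_scores = {"a": 0.9, "b": 0.4, "c": 0.1}
bncf_scores = {"b": 0.8, "c": 0.7, "d": 0.6}

top3 = fuse_top_n(usercf_scores, bncf_scores, alpha=0.37, beta=0.63, n=3)
top3_filtered = fuse_top_n(usercf_scores, bncf_scores, 0.37, 0.63, n=3, seen={"b"})
print(top3, top3_filtered)
```

Because the two score dictionaries are independent inputs, the two recommenders can run in parallel and only this cheap merge has to wait for both results.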

4. Experiments

4.1. Experimental Dataset and Test Environment

To verify the advantages of the hybrid algorithm proposed in this paper, the Yahoo Music dataset is used as the experimental dataset for testing and evaluating its performance. Gene types are added to the music according to its attributes, including artist, song title, genre, emotion type, etc., for a total of 14 types. The dataset is divided into a training set and a test set at a ratio of 8 : 2. The relevant information about the dataset is detailed in Table 1.
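An 8 : 2 split of this kind can be sketched as follows. The paper does not say how records were assigned to each set, so the seeded random shuffle here is an assumption, as are the function name and the toy records.

```python
import random

def split_train_test(records, train_ratio=0.8, seed=42):
    """Shuffle the records with a fixed seed and cut them at train_ratio."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Toy (user, music, rating) records standing in for the real dataset.
records = [("u%d" % i, "m%d" % (i % 3), 1 + i % 5) for i in range(10)]
train_set, test_set = split_train_test(records)
print(len(train_set), len(test_set))
```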

The experiments were conducted on a Spark cluster containing 1 master node and 7 worker nodes. Each node runs Linux CentOS 6.5 on an Intel i7-12700KF CPU with 16 GB of memory. The software includes Hadoop 2.8.4, Spark 2.3.2, JDK 1.8.0_171, and Python 3.6.4. The code editor is PyCharm 2017.2.3 (x64).

4.2. Performance Evaluation Indicators

MAE and F1-measure are used as evaluation criteria to measure the accuracy and recommendation performance of the algorithm. MAE evaluates the recommendation accuracy of the algorithm by calculating the deviation between the predicted user-music gene scores and the actual scores. The lower the value of MAE is, the better the recommendation performance is [21]. Assuming that the predicted set of ratings is $A = \{a_1, a_2, \ldots, a_m\}$ and the corresponding set of actual ratings is $B = \{b_1, b_2, \ldots, b_m\}$, the MAE can be expressed as

$$\mathrm{MAE} = \frac{1}{m} \sum_{i=1}^{m} \left| a_i - b_i \right|$$
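For reference, MAE is straightforward to compute once the predicted and actual ratings are aligned. The helper below is a minimal sketch with an illustrative function name and toy ratings.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute deviation between aligned predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("rating lists must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

predicted = [4.0, 3.0, 5.0]
actual = [5.0, 3.0, 3.0]
mae = mean_absolute_error(predicted, actual)
print(mae)
```

Here the absolute errors are 1, 0, and 2, so the MAE is their mean.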

F1-measure is a metric that combines precision and recall to evaluate the strengths and weaknesses of a recommendation model. Suppose $I_{R1}$ is the list of recommendations predicted by the recommendation algorithm for the target user u, $I_{R2}$ is the actual list for user u in the test set, and $I_u$ is the number of music genes reviewed by user u in the test set. The relationship between the precision, recall, and F1-measure evaluation metrics is as follows:

$$\mathrm{Precision} = \frac{\left| I_{R1} \cap I_{R2} \right|}{\left| I_{R1} \right|}, \qquad \mathrm{Recall} = \frac{\left| I_{R1} \cap I_{R2} \right|}{I_u}, \qquad F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
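The three metrics can be computed for a single user as in the sketch below. It assumes that the recall denominator is the size of the user's actual list in the test set (that is, that $I_u$ equals the length of that list); the function name and toy lists are illustrative.

```python
def precision_recall_f1(recommended, relevant):
    """Precision, recall, and F1 for one user's recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # F1 is the harmonic mean; define it as 0 when there are no hits.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

recommended = ["a", "b", "c", "d"]   # predicted list for the user
relevant = ["b", "d", "e"]           # actual list in the test set
precision, recall, f1 = precision_recall_f1(recommended, relevant)
print(precision, recall, f1)
```

Two of the four recommended items are relevant, so precision is 1/2, recall is 2/3, and F1 is their harmonic mean, 4/7.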

4.3. Model Parameter Optimization

Training the hybrid recommendation algorithm takes a long time, which makes it hard to guarantee the timeliness of recommendation. The algorithm parameters therefore need to be optimized to ensure good real-time performance, and the learning rate (LR) of the B-NCF model has a significant impact on model performance. Considering the high sparsity of the dataset, Adam was chosen as the optimizer of the B-NCF model, the number of epochs was set to 30, and learning rates of 0.1, 0.05, 0.01, and 0.001 were tested. The results are shown in Figure 5.

Figure 5 shows that when the learning rate is 0.1 or 0.05, the training loss remains large, which is not conducive to model training. When the learning rate is 0.01 or 0.001, the loss is lower and stabilizes after 20 epochs of training. Considering the required training speed, the learning rate of the model is set to 0.01, Adam is chosen as the optimizer, and the number of epochs is set to 20, which shortens the training time while preserving recommendation accuracy. After several trials, the weights of UserCF and B-NCF are set to 0.37 and 0.63, respectively.

4.4. Analysis of Results

As can be seen from Figure 6, the MAE of the proposed method is significantly better than that of the other recommendation methods. When the number of similar users K is 500, the prediction error of the proposed method is lowest, with an MAE value of 0.944. Compared with the NCF model, which has the highest prediction accuracy among the other methods, the accuracy is improved by about 4%. Figure 7 shows the performance comparison of the algorithms under the F1-measure index.

Figure 7 compares the recommendation quality of the algorithms on the Yahoo Music dataset. The F1 value increases as K increases. When K is 500, the hybrid algorithm achieves the highest F1 value, 0.64, which is 2.2% and 5.3% higher than those of the NCF and NHSM algorithms, respectively. This shows that the hybrid recommendation algorithm produces more accurate recommendation lists.

NCF, NHSM, and hybrid recommendation algorithms are used as examples, and the length of the recommendation list is set to 10 to examine the hit rate of the algorithms. The obtained results are shown in Figure 8.
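The paper does not define the hit rate formally. One common definition, assumed here, is the fraction of test users whose held-out item appears among the first N recommendations; the function name and toy data below are illustrative.

```python
def hit_rate_at_n(recommendations, held_out, n=10):
    """Fraction of users whose held-out item is within their top-n list.

    recommendations: dict mapping user -> ranked list of item ids
    held_out:        dict mapping user -> one item withheld for testing
    """
    users = [user for user in held_out if user in recommendations]
    if not users:
        raise ValueError("no users to evaluate")
    hits = sum(1 for user in users if held_out[user] in recommendations[user][:n])
    return hits / len(users)

recommendations = {
    "u1": ["a", "b", "c"],
    "u2": ["d", "e", "f"],
    "u3": ["g", "h", "i"],
    "u4": ["j", "k", "l"],
}
held_out = {"u1": "b", "u2": "z", "u3": "i", "u4": "j"}

rate_top3 = hit_rate_at_n(recommendations, held_out, n=3)
rate_top1 = hit_rate_at_n(recommendations, held_out, n=1)
print(rate_top3, rate_top1)
```

Shortening the list from three items to one lowers the hit rate in this toy example, which is why the list length is fixed when algorithms are compared.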

Compared with the NCF and NHSM algorithms, the hit rate of the hybrid recommendation algorithm is higher, which indicates that the algorithm has a stronger higher-order nonlinear expression ability and can better capture the interaction between users and music genes. In the early stage of training, the hit rate of the hybrid model mainly comes from B-NCF because the hit rate of UserCF is low. As the hit rate of UserCF improves and stabilizes, the hybrid model better combines the advantages of UserCF and B-NCF, and its hit rate also improves to some extent. To further verify the performance of the hybrid algorithm, its hit rate and diversity are analyzed for recommendation lists of different lengths. The experimental results are shown in Table 2.

Table 2 shows that the hybrid algorithm has a high hit rate for recommendation lists of different lengths and that its diversity changes steadily. The hybrid algorithm therefore outperforms the compared recommendation algorithms in terms of accuracy and recall and better matches the relationship between users and music genes. When the length of the recommendation list is 50, the time taken is only 96.82, which meets the requirement of recommendation timeliness. The algorithm is thus feasible and can be used as a method for personalized music recommendation.

5. Conclusion

A personalized hybrid music recommendation algorithm based on UserCF and B-NCF is proposed to model users’ preferences for music genes. Several metrics are used to verify the algorithm’s performance. Experiments on the Yahoo Music dataset show that the algorithm improves the accuracy and precision of recommendation by 4% and 2.2%, respectively, compared with the NCF model, and that the recommendation list is more reasonable and effective. For recommendation lists of different lengths, the hybrid algorithm takes less time to recommend and meets the requirements of hit rate, diversity, and timeliness for music recommendation. In terms of the depth of information mining, the hybrid algorithm improves the effective recommendation hit rate and requires fewer computational resources. In terms of the breadth of information sources, the hybrid algorithm can be used as a personalized music recommendation method because it takes music gene information into account and broadens the diversity of information sources.

Data Availability

The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.