Abstract

With the continuous development of information technology, data transmission bandwidth and speed keep increasing. How users can retrieve their favorite music quickly and effectively has therefore become a key research direction. Genre is one of the most frequently used music labels, and retrieving music by genre has become a mainstream method of music information search. It is also an important basis for music service platforms to recommend music. Music genre recognition has therefore attracted much attention and become a mainstream research direction. This article presents a music genre recognition method based on an improved Bayesian algorithm. Variational mode decomposition is optimized by particle swarm optimization with adaptive inertia weight, and a naive Bayesian network model is then constructed to recognize music genres. Experimental results show that the proposed algorithm can efficiently extract music feature information, fully consider the particularity of different situations, and improve the accuracy of music genre recognition.

1. Introduction

The rise of the mobile Internet has led to an explosive growth of information. How can users grab useful information from this mass of data? Reducing the cost of searching for useful information as much as possible has become a strong demand of Internet users [1]. Providing good classification services has become a necessary duty of e-commerce, film and television, news, and other information service providers. Music is an indispensable part of people's daily life; classifying it reasonably and effectively lets users quickly and accurately find their ideal style, and it also provides a guarantee for service providers to push content to users effectively [2]. Music classification has gone through a stage of traditional manual classification. With the introduction of new technologies such as machine learning and deep learning, not only has efficiency been significantly improved, but accuracy has also improved [3].

In a genre classification model, music features [4] need to be extracted first; they are mainly used to describe the music information. Features can be divided into many categories, such as energy, time-domain, and frequency-domain features, and music classification can be realized with different feature combinations [5]. Coefficient-type features have gradually been adopted. Second, in addition to feature extraction, another important component is the construction of the music classifier. Traditional classifiers include rule-based music classification and pattern matching [6]. These methods can classify music to a certain extent, but each has its own defects: rule-based taxonomies are only suitable for simple audio files, while pattern matching requires establishing and matching standard patterns, with low precision and a large amount of computation [7]. Newer methods based on machine learning, such as the hidden Markov model, neural networks, and the support vector machine [8], realize effective classification and greatly improve computational efficiency and accuracy.

In recent years, scholars have tried to use deep learning models to improve the accuracy of automatic music genre recognition [9]. The development of the deep neural network (DNN) provides technical support for classification tasks with large amounts of music data [10]. Because of the large number of music tracks on the Internet, using DNNs to classify music genres became the mainstream technology [11]. However, as time went by, the defects of DNNs were gradually revealed; for example, each piece of music has its own melody, a temporal structure that a plain feedforward DNN does not capture well [12]. Recurrent neural networks (RNNs) have therefore been used to build deep learning music genre recognition models, which has become one of the subsequent research hot spots [13]. Literature [14] proposed the concept of fused segment features and verified the effectiveness of music feature segmentation. Literature [15] used long and short segments as two data formats for model training and established effective long short-term memory (LSTM) and gated recurrent unit (GRU) music genre recognition models. Literature [16] proposed three feature sets to characterize timbre texture, rhythm, and tonal features; timbre and rhythm features were extracted respectively to represent music signals, and the extracted features were classified. Literature [17] introduced the concept of the chord into feature extraction to better represent music signals. Literature [18] used empirical mode decomposition (EMD) to capture the local characteristics of different music genres.

In the music genre recognition task, feature extraction and classifier modelling are the two key parts that affect recognition accuracy. However, the feature extraction process is complex and difficult to implement, the features required by different classification tasks need to be specially designed, and the recognition accuracy of existing deep learning methods is not ideal. Given the above situation, this article proposes a music genre recognition method based on an improved Bayesian algorithm. The proposed algorithm uses PSO to optimize the VMD decomposition process. Then the naive Bayes recognition model is constructed, feature information is selected by chi-square testing, the weight of each feature item is calculated and summed, and the weights are applied to the detection model group to obtain the recognition result.

2. The Proposed Algorithms

2.1. Variational Modal Decomposition and Algorithm
2.1.1. Variational Modal Decomposition

Variational mode decomposition (VMD) is a non-recursive mode decomposition method [19]. By constructing a variational model, adaptive signal decomposition is realized. Let the signal $f(t)$ be decomposed and reconstructed by VMD into $K$ IMF components $u_k(t)$. The reconstructed IMF components are treated as modulated signals and demodulated by the Hilbert transform to obtain their analytic signals. Each analytic signal is then multiplied by the exponential $e^{-j\omega_k t}$, which shifts the spectrum of the corresponding IMF component to its base band.

Estimating the bandwidth of each demodulated mode yields the constrained variational problem

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t\!\left[\left(\delta(t)+\frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \right\|_2^2 \right\}, \tag{1}$$

$$\text{s.t.}\quad \sum_{k=1}^{K} u_k(t) = f(t), \tag{2}$$

where $f(t)$ is the input signal, $u_k(t)$ are the IMF components of the original data decomposed by VMD, and $\omega_k$ is the central frequency of each IMF component.

To solve this constrained variational problem, a quadratic penalty factor $\alpha$ is introduced to ensure accurate decomposition of the signal under the influence of Gaussian noise, and a Lagrange multiplier $\lambda(t)$ keeps the constraint strict. The augmented Lagrange function $L$ is expressed as follows:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t\!\left[\left(\delta(t)+\frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle. \tag{3}$$

For equation (3), the alternating direction method of multipliers (ADMM) is used to alternately update $u_k$, $\omega_k$, and $\lambda$ and thus find the optimal solution (saddle point) of the augmented Lagrange function.
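For concreteness, the following is a minimal, simplified sketch of the VMD/ADMM update loop described above, written in Python with NumPy. It is not the authors' implementation; the function name vmd_simplified, the default parameters, and the simplified spectral handling (no signal mirroring, full two-sided spectrum) are illustrative assumptions.

```python
import numpy as np

def vmd_simplified(f, K=4, alpha=2000.0, tau=0.0, tol=1e-6, max_iter=500):
    """Simplified VMD sketch: ADMM updates of mode spectra u_k, centers omega_k, multiplier lambda."""
    T = len(f)
    freqs = np.fft.fftfreq(T)                 # normalized frequency axis
    f_hat = np.fft.fft(f)                     # spectrum of the input signal
    u_hat = np.zeros((K, T), dtype=complex)   # spectra of the K modes
    omega = np.linspace(0.0, 0.5, K, endpoint=False)  # initial center frequencies
    lam = np.zeros(T, dtype=complex)          # Lagrange multiplier (in frequency domain)

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k]
            # Wiener-filter-like mode update around the current center frequency
            u_hat[k] = (residual + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Center frequency: power-weighted mean over the nonnegative frequencies
            half = slice(0, T // 2)
            power = np.abs(u_hat[k, half]) ** 2
            omega[k] = np.sum(freqs[half] * power) / (np.sum(power) + 1e-12)
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))   # dual ascent (tau = 0: no update)
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-12)
        if change < tol:
            break
    # Back to the time domain: each row is one IMF component
    return np.real(np.fft.ifft(u_hat, axis=1)), omega
```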

2.1.2. VMD Optimization Based on Particle Swarm Optimization (PSO-VMD)

It is found that the quadratic penalty factor α and the number of IMF components K have a great influence on the VMD decomposition. Traditional methods consider only the influence of parameter K, which yields a relatively better but still suboptimal result. This article uses an improved PSO algorithm to optimize the VMD parameters α and K simultaneously.

Particle swarm optimization (PSO) is a population-based intelligent global optimization algorithm with good global search ability. However, in ordinary PSO the inertia weight ω is constant, so it is difficult to maintain strong local and global search simultaneously. In order to optimize the two VMD parameters at the same time, this article uses a PSO algorithm with adaptive weight. Adaptive-weight PSO introduces an inertia weight ω that is closely related to the global optimum and varies with the particle's position. This article adopts the nonlinear dynamic inertia weight coefficient, expressed as follows:

$$\omega = \begin{cases} \omega_{\min} + (\omega_{\max} - \omega_{\min})\dfrac{f - f_{\min}}{f_{\mathrm{avg}} - f_{\min}}, & f \le f_{\mathrm{avg}},\\[4pt] \omega_{\max}, & f > f_{\mathrm{avg}}, \end{cases} \tag{4}$$

where $f$ is the real-time objective function value of the particle, $f_{\mathrm{avg}}$ is the average objective value of the particles, and $f_{\min}$ is the minimum objective value of the particles. According to formula (4), the inertia weight is updated as the particle objective function value changes.

First, the fitness function needs to be determined. The fitness function is updated as the particle position changes, and the update direction of the particles is determined by comparing fitness values. The fitness function adopts the difference between the total energy of the original signal and the total energy of the K decomposed modal components; this difference represents the similarity between the original signal and the modal components, and the smaller the difference, the higher the similarity:

$$\mathrm{fitness} = \left| E_0 - \sum_{k=1}^{K} E_k \right|, \tag{5}$$

$$E_0 = \sum_{t} |f(t)|^2, \tag{6}$$

$$E_k = \sum_{t} |u_k(t)|^2, \tag{7}$$

where $E_0$ and $E_k$ are the total energy of the original signal and the total energy of each IMF component, respectively, $f(t)$ is the input signal, $u_k(t)$ is each modal component after VMD decomposition, and $K$ is the number of VMD components. The optimization proceeds as follows (a code sketch follows these steps):

(1) Initialize the parameters of the PSO algorithm and determine the fitness function.
(2) Initialize the particle population positions of the PSO algorithm. Candidate parameter combinations [α, K] are randomly generated as the initial positions of the particles, and the velocity of each particle is randomly initialized.
(3) Perform VMD decomposition at the different particle positions to determine the fitness function values at those positions.
(4) Compare the fitness function values at different positions, and update the local extremum of each particle, the global extremum of the population, and the particle velocities and positions.
(5) Update the weight ω according to formula (4). Repeat steps (3)-(5) until the maximum number of iterations is reached, then output the best fitness value and the best particle position.
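As a hedged illustration of steps (1)-(5), here is a compact Python sketch of adaptive-weight PSO searching over [α, K]. The function energy_difference_fitness implements equations (5)-(7), adaptive_weight implements formula (4), and vmd_func is assumed to be a VMD routine such as the vmd_simplified sketch above. All names, bounds, and hyperparameters are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def energy_difference_fitness(signal, alpha, K, vmd_func):
    """Fitness, eqs. (5)-(7): |energy of the signal - summed energy of the K IMF components|."""
    imfs, _ = vmd_func(signal, K=K, alpha=alpha)
    return abs(np.sum(signal ** 2) - np.sum(imfs ** 2))

def adaptive_weight(f, f_avg, f_min, w_min=0.4, w_max=0.9):
    """Nonlinear dynamic inertia weight, formula (4): small near the optimum, large for poor particles."""
    if f <= f_avg and f_avg > f_min:
        return w_min + (w_max - w_min) * (f - f_min) / (f_avg - f_min)
    return w_max

def pso_vmd(signal, vmd_func, n_particles=20, n_iter=30,
            alpha_bounds=(100.0, 5000.0), k_bounds=(2, 10), c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    # Each particle encodes one candidate [alpha, K] pair
    pos = np.column_stack([rng.uniform(*alpha_bounds, n_particles),
                           rng.uniform(*k_bounds, n_particles)])
    vel = np.zeros_like(pos)
    fit = np.array([energy_difference_fitness(signal, p[0], int(round(p[1])), vmd_func) for p in pos])
    pbest, pbest_fit = pos.copy(), fit.copy()
    gbest = pos[np.argmin(fit)].copy()

    for _ in range(n_iter):
        f_avg, f_min = fit.mean(), fit.min()
        for i in range(n_particles):
            w = adaptive_weight(fit[i], f_avg, f_min)
            r1, r2 = rng.random(2)
            vel[i] = w * vel[i] + c1 * r1 * (pbest[i] - pos[i]) + c2 * r2 * (gbest - pos[i])
            pos[i] = np.clip(pos[i] + vel[i],
                             [alpha_bounds[0], k_bounds[0]], [alpha_bounds[1], k_bounds[1]])
            fit[i] = energy_difference_fitness(signal, pos[i][0], int(round(pos[i][1])), vmd_func)
            if fit[i] < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i].copy(), fit[i]
        gbest = pbest[np.argmin(pbest_fit)].copy()
    return gbest[0], int(round(gbest[1]))   # optimized (alpha, K)
```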

2.2. Naive Bayesian Networks
2.2.1. Related Knowledge

(1) Naive Bayes Classification Algorithm. The naive Bayes algorithm assumes that features are independent of one another and is an effective classification method. Common variants are the multinomial model and the Bernoulli model; the multinomial model is used in this article. Suppose the feature items of the text $d$ to be classified are $t_1, t_2, \ldots, t_n$ and the class set is $C = \{c_1, c_2, \ldots, c_m\}$. Assuming the terms $t_i$ are mutually independent, the algorithm computes the probability that the document belongs to each class and predicts the class with the highest probability. The multinomial naive Bayes calculation is as follows:

$$P(c_j \mid d) = P(c_j) \prod_{i=1}^{n} P(t_i \mid c_j), \tag{8}$$

where $P(c_j \mid d)$ is the probability that the new text belongs to class $c_j$ and $P(t_i \mid c_j)$ is the probability that category $c_j$ contains the term $t_i$.

$$P(c_j) = \frac{N_j}{N}, \tag{9}$$

where $N_j$ is the number of words under class $c_j$ and $N$ is the number of words under all classes.

$$P(t_i \mid c_j) = \frac{W_{ij}}{\sum_{k} W_{kj}}, \tag{10}$$

where $W_{ij}$ represents the weight of term $t_i$ in class $c_j$ and $\sum_{k} W_{kj}$ represents the sum of the weights of all terms in class $c_j$.

In order to prevent the zero-probability problem caused by a feature word $t_i$ that never occurs in category $c_j$, Laplace (add-one) smoothing is generally adopted:

$$P(t_i \mid c_j) = \frac{W_{ij} + 1}{\sum_{k} W_{kj} + |V|}, \tag{11}$$

where $|V|$ is the size of the feature vocabulary.
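The following short Python sketch scores a document against each class with equations (8)-(11). It is an illustration under the assumptions above, not the authors' code: raw counts stand in for the TF-IDF weights W_ij, log-probabilities are used to avoid underflow, and all function and variable names are hypothetical.

```python
import math
from collections import defaultdict

def train_weights(docs):
    """docs: list of (class_label, list_of_terms). Returns per-class term weights,
    class priors (eq. (9), word-count based), per-class totals, and the vocabulary."""
    weights = defaultdict(lambda: defaultdict(float))  # weights[c][t] stands in for W_tc
    class_words = defaultdict(float)                   # total weight of all terms in class c
    total_words, vocab = 0.0, set()
    for label, terms in docs:
        for t in terms:
            weights[label][t] += 1.0
            class_words[label] += 1.0
            total_words += 1.0
            vocab.add(t)
    priors = {c: class_words[c] / total_words for c in class_words}   # eq. (9)
    return weights, priors, class_words, vocab

def predict(terms, weights, priors, class_words, vocab):
    """Return the class maximizing eq. (8), using add-one smoothing as in eq. (11)."""
    best_class, best_score = None, float("-inf")
    for c, prior in priors.items():
        score = math.log(prior)
        for t in terms:
            w = weights[c].get(t, 0.0)
            score += math.log((w + 1.0) / (class_words[c] + len(vocab)))  # eq. (11)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Tiny usage example with made-up documents
docs = [("jazz", ["swing", "saxophone", "blue"]), ("metal", ["distortion", "riff", "drums"])]
w, p, cw, v = train_weights(docs)
print(predict(["saxophone", "swing"], w, p, cw, v))   # expected: "jazz"
```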

(2) Chi-Square Feature Selection. Because the dimensionality of the words screened in the preprocessing stage is high, feature selection is needed to obtain a feature word set with high discrimination but small dimensionality. The $\chi^2$ (chi-square) statistical method is used for feature selection. The method assumes that a feature and a category are independent, and the chi-square value measures the degree of deviation from this assumption: the larger the chi-square value, the more discriminative the feature. The calculation is as follows:

$$\chi^2(Z, C) = \frac{T\,(G D - H M)^2}{(G + H)(M + D)(G + M)(H + D)}, \tag{12}$$

where $T$ is the total number of documents, $Z$ is the feature item, and $C$ is the category; $G$ is the number of texts in class $C$ that contain feature item $Z$; $H$ is the number of texts not in class $C$ that contain feature item $Z$; $M$ is the number of texts in class $C$ that do not contain feature item $Z$; and $D$ is the number of texts not in class $C$ that do not contain feature item $Z$.
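A minimal Python function for equation (12), under the notation above (contingency counts G, H, M, D and total T); the helper name and the toy numbers in the usage line are illustrative.

```python
def chi_square(G, H, M, D):
    """Chi-square statistic of a feature Z with respect to a category C, eq. (12).

    G: texts in C containing Z        H: texts not in C containing Z
    M: texts in C not containing Z    D: texts not in C not containing Z
    """
    T = G + H + M + D                          # total number of documents
    denom = (G + H) * (M + D) * (G + M) * (H + D)
    if denom == 0:                             # degenerate contingency table
        return 0.0
    return T * (G * D - H * M) ** 2 / denom

# Example: a term that appears in 30 of 40 class-C texts but only 5 of 60 other texts
print(chi_square(G=30, H=5, M=10, D=55))       # large value -> discriminative feature
```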

(3) TF-IDF Weight. Term frequency-inverse document frequency (TF-IDF) weights a word by its frequency in a document and the inverse of its document frequency [20]. If a word has a high TF in a text but rarely appears in other documents, the term is a good representation of this type of article. Its expression is as follows:

$$w_{ij} = tf_{ij} \times \log\frac{N}{n_i}, \tag{13}$$

where $w_{ij}$ represents the weight of feature item $t_i$ in document $d_j$, $tf_{ij}$ represents the word frequency of feature item $t_i$ in document $d_j$, $N$ represents the number of documents, and $n_i$ represents the number of documents that contain feature $t_i$. In the actual calculation, if the number of documents in which a feature item appears is 0, the denominator becomes 0, so 1 is added to the denominator, as in the following expression:

$$w_{ij} = tf_{ij} \times \log\frac{N}{n_i + 1}. \tag{14}$$
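A short sketch of formulas (13)-(14) in Python. The smoothing choice (adding 1 only to the denominator) follows formula (14) above; the corpus, function name, and everything else are illustrative assumptions.

```python
import math
from collections import Counter

def tf_idf_weights(corpus):
    """corpus: list of documents, each a list of terms.
    Returns one {term: weight} dict per document using w_ij = tf_ij * log(N / (n_i + 1))."""
    N = len(corpus)
    # n_i: number of documents containing each term
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        tf = Counter(doc)                                             # raw term frequency tf_ij
        weights.append({
            term: tf[term] * math.log(N / (doc_freq[term] + 1))       # formula (14)
            for term in tf
        })
    return weights

# Toy corpus: "guitar" occurs in every document, "saxophone" is rare and is weighted higher
docs = [["guitar", "drums", "guitar"], ["guitar", "saxophone"], ["guitar", "bass"]]
print(tf_idf_weights(docs)[1])
```

Note that under formula (14) a term appearing in every document can even receive a non-positive weight, which is what drives such terms out of the selected feature set.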

(4) MapReduce Programming Framework. The core idea of MapReduce is that a large amount of data is processed by many sub-nodes managed by a single master node, and the final result is obtained by collating the processing results of the sub-nodes. Map and reduce are the two main parts of the framework, and both operate on key-value pairs of the form <key, value>. Since the NB algorithm assumes that the feature items are independent, it can be implemented in parallel.
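To make the <key, value> data flow concrete, here is a tiny single-machine simulation of the map/shuffle/reduce pattern in Python. It is not Hadoop code; the function simulate_mapreduce and the word-count example are illustrative assumptions.

```python
from collections import defaultdict

def simulate_mapreduce(records, map_fn, reduce_fn):
    """Single-machine sketch of MapReduce: map -> shuffle (group by key) -> reduce."""
    # Map phase: each record yields zero or more <key, value> pairs
    mapped = [pair for record in records for pair in map_fn(record)]
    # Shuffle phase: group all values by key
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    # Reduce phase: each key and its value list is collapsed to <key, result>
    return {key: reduce_fn(key, values) for key, values in grouped.items()}

# Word-count example: count how often each (category, word) pair occurs
def map_fn(record):
    category, text = record
    return [((category, word), 1) for word in text.split()]

def reduce_fn(key, values):
    return sum(values)

news = [("sports", "match score match"), ("tech", "chip chip design")]
print(simulate_mapreduce(news, map_fn, reduce_fn))
# {('sports', 'match'): 2, ('sports', 'score'): 1, ('tech', 'chip'): 2, ('tech', 'design'): 1}
```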

2.2.2. Optimization of Naive Bayes Algorithm

The parallelization of NB is divided into four stages: feature selection, weight calculation, model training, and testing. First, a Chinese word segmentation tool is used to preprocess the text content. Then, meaningless words are eliminated through a constructed Chinese stop-word table, the sum of word frequencies within each category is calculated, and words with excessively high or low frequency are filtered out. As a result, two files are produced: total_news and word_count.

(1) Feature Selection. The workflow of the feature-selection job is listed as follows (a sketch of the corresponding map and reduce functions follows the list):

(1) To read the contents of the distributed file system, the total_news and word_count files are input.
(2) In the Map phase, the two files are read in sequence, and the data are written to the words_list and news_list tuples, respectively. A flag determined through a for loop indicates whether each word appears in each type of text: the flag is 1 if it does, otherwise 0. T, G, H, M, and D are counted for each feature item, formula (12) is used to calculate the chi-square value, sqrt is used to take its square root, and the result is spilled to local disk as intermediate files of <s_CHI, wordID_> key-value pairs.
(3) All key-value pairs output by the map fragments are sorted and merged in descending order of s_CHI during the Shuffle phase. The Reduce phase receives the sorted and merged results and continues processing; the collated result is output as <s_CHI, wordID_> key-value pairs.
(4) In the Reduce stage, the output of the previous step is obtained. Each category selects its top-ranked words as the final feature words of that category and filters out duplicates. The final global feature terms are obtained, saved to the CHI file as <wordID, word> key-value pairs, and output, where wordID is the feature word ID and the value is the feature word itself.
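Under the assumptions of the simulate_mapreduce and chi_square sketches above, the following illustrative (non-authoritative) functions mirror steps (2)-(4) in miniature: the mapper emits presence pairs per word and category, and the driver derives the G, H, M, D counts, scores words with sqrt(chi-square) per formula (12), and keeps the highest-scoring words per category. All names and the top_n cutoff are hypothetical.

```python
import math
from collections import defaultdict

# Assumed helpers from the earlier sketches: simulate_mapreduce and chi_square.
# docs is a list of (category, text) records.

def presence_map(record):
    """Emit ((word, category), 1) once for every word present in a text of that category."""
    category, text = record
    return [((word, category), 1) for word in set(text.split())]

def presence_reduce(key, values):
    """G for this (word, category): number of texts of the category containing the word."""
    return sum(values)

def select_features(docs, top_n=2):
    """Steps (2)-(4) in miniature: count, score with sqrt(chi-square), keep top words per class."""
    T = len(docs)
    docs_per_cat = defaultdict(int)
    for category, _ in docs:
        docs_per_cat[category] += 1
    g_counts = simulate_mapreduce(docs, presence_map, presence_reduce)
    docs_with_word = defaultdict(int)                  # texts containing the word, any class
    for (word, _), g in g_counts.items():
        docs_with_word[word] += g
    scored = defaultdict(list)                         # category -> [(score, word), ...]
    for (word, category), G in g_counts.items():
        H = docs_with_word[word] - G                   # word present, other classes
        M = docs_per_cat[category] - G                 # word absent, this class
        D = T - G - H - M                              # word absent, other classes
        scored[category].append((math.sqrt(chi_square(G, H, M, D)), word))
    return {c: [w for _, w in sorted(pairs, reverse=True)[:top_n]] for c, pairs in scored.items()}

docs = [("jazz", "sax swing"), ("jazz", "sax piano"), ("metal", "riff drums"), ("metal", "riff sax")]
print(select_features(docs))
```

The weight-calculation, training, and testing jobs below follow the same map/shuffle/reduce pattern, only with different keys and values.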

(2) Weight Calculation. The workflow of the weight-calculation job is listed as follows:

(1) To read the contents of the distributed file system, the total_news and CHI files are input.
(2) In the Map phase, the two files are read in sequence, and the data are written to the words_list and news_list tuples, respectively. The TF and IDF values of each feature word are calculated with formula (14), and the result is spilled to local disk as intermediate files of <wordID_, newCategory_TF_IDF> key-value pairs.
(3) The Shuffle process merges pairs with the same key. The Reduce phase receives the merged results and continues processing; the collated result is output as <wordID_, newCategory_TF_IDF> key-value pairs.
(4) In the Reduce stage, the output of the previous step is obtained. The weight value of each feature word in each text is calculated, saved in the TF-IDF file as <wordID_, newCategory_TF_IDF> key-value pairs, and output.

2.2.3. Training the Classification Model

The workflow of the model-training job is listed as follows:

(1) To read the contents of the distributed file system, the TF-IDF file is input.
(2) In the Map stage, the file is read, the TF-IDF value of each feature word in each category is calculated, and the result is spilled to local disk as intermediate files of <wordID_, TF_IDF> key-value pairs.
(3) All key-value pairs output by the map fragments are merged according to wordID_ in the Shuffle process. In the Reduce phase, the merged results are received and processed, and the collated result is output as <wordID_, TF_IDF> key-value pairs.
(4) In the Reduce stage, the output of the previous step is obtained, and for each word the category with the maximum value is determined. The result is saved directly to the weight file as <wordID, TF_IDF> key-value pairs and output.

2.2.4. Testing the Classification Model

The workflow of the model-testing job is listed as follows:

(1) To read the contents of the distributed file system, the total_test_news file and the weight file are input as test data.
(2) In the Map phase, the two files are read in sequence. The class probability of each new text is predicted according to formula (8), and the result is saved as <newID, pro> key-value pairs.
(3) All key-value pairs output by the map fragments are merged according to newID in the Shuffle process. In the Reduce phase, the merged results are received and processed, and the collated result is output as <newID, pro> key-value pairs.
(4) In the Reduce stage, the output of the previous step is obtained, and the category corresponding to the maximum probability is output as the predicted result.

3. Experiment and Result Analysis

3.1. Experimental Environment

A Lenovo Z40-70 notebook was used in this experiment, with an Intel i5-4210U dual-core CPU (main frequency 2.0 GHz), 16 GB memory, a 1 TB hard disk, and one physical network card. Windows 10 Professional is installed on the notebook, and VMware Workstation Pro 14 is used to create four VMs. Each VM has one CPU core, 2 GB memory, a 20 GB hard disk, and one virtual NIC.

3.2. The Data Set

The GTZAN data set is a public data set commonly used for music genre identification. Its music data are divided into 10 genres, namely pop, classical, metal, jazz, reggae, blues, disco, hip-hop, country, and rock. The data used in this article contain 1200 pieces of music. The experiment takes 1000 pieces of music data as the training set, 100 pieces as the validation set for supervised learning, and 100 pieces as the test set to evaluate the accuracy of music genre recognition.

3.3. Algorithm Comparison Experiment
3.3.1. Network Evaluation Indicators

In this article, spectrum recognition accuracy and the spectrum recognition loss function are used as the performance evaluation indexes of the algorithm. Spectrum recognition accuracy refers to the accuracy of the network in identifying spectrum slices. The loss function of spectrum recognition is the cross-entropy function, which is suitable for multi-classification problems and is calculated as follows:

$$L = -\frac{1}{n}\sum_{x}\sum_{i} y_i \ln \hat{y}_i, \tag{15}$$

where $n$ represents the number of samples, $i$ indexes the categories, $y_i$ is the target distribution, and $\hat{y}_i$ is the predicted distribution. The cross-entropy function reflects the difficulty of representing the target distribution in terms of the predicted one; the smaller its value, the better the convergence.
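A brief NumPy illustration of the cross-entropy loss in formula (15) for one-hot genre targets; the array shapes and the clipping constant are assumptions added for numerical safety, not part of the original description.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean cross-entropy over n samples: -1/n * sum_x sum_i y_i * ln(y_hat_i).

    y_true: (n, classes) one-hot target distribution.
    y_pred: (n, classes) predicted class probabilities (rows sum to 1).
    """
    y_pred = np.clip(y_pred, eps, 1.0)           # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Two samples, three genres: the first prediction is confident and correct,
# the second is wrong, so it dominates the loss.
y_true = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)
y_pred = np.array([[0.9, 0.05, 0.05], [0.2, 0.1, 0.7]], dtype=float)
print(cross_entropy(y_true, y_pred))
```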

3.3.2. Analysis of Experimental Results

The proposed algorithm and the common detection algorithms are trained on the training set for 32,000 iterations with a learning rate of 0.0001. The curve of spectrum recognition accuracy is shown in Figure 1, and the spectrum recognition accuracy of the training set after 32,000 iterations is shown in Table 1. The compared algorithms are trained with gradient descent, updating the weight parameters iteratively, and the updated weight parameters are used to calculate the spectrum recognition accuracy. With the increase in the number of iterations, the spectrum recognition accuracy gradually improves and tends to be stable. According to Figure 1, the spectrum recognition accuracy of each network on the training set stabilizes after 32,000 iterations, and the accuracy of the proposed algorithm is higher than that of the common detection algorithms. The data in Table 1 show that the spectrum recognition accuracy of the proposed algorithm is 0.19%–3.81% higher than that of the common detection algorithms.

The proposed algorithm and the common detection algorithms are trained on the training set for 32,000 iterations with a learning rate of 0.0001. The change curve of the spectrum recognition loss function value is shown in Figure 2, and the spectrum recognition loss function of the training set after 32,000 iterations is shown in Table 2.

In order to minimize the spectrum recognition loss function, the conventional detection algorithms solve it iteratively by gradient descent. With the increase in the number of iterations, the spectrum recognition loss function decreases gradually and tends to be stable. According to Figure 2, after 32,000 iterations of each algorithm, the spectrum recognition loss function values on the training set stabilize, and the training-set loss of the proposed algorithm is lower than that of the common detection algorithms. The data in Table 2 show that the training-set spectrum recognition loss function value of the proposed algorithm is 0.0288–0.1311 lower than that of the common detection algorithms.

The proposed algorithm and the common detection algorithms are run on the validation set for 32,000 iterations with a learning rate of 0.0001. The curve of spectrum recognition accuracy is shown in Figure 3, and the spectrum recognition accuracy of the validation set after 32,000 iterations is shown in Table 3.

As can be seen from Figure 3, after 32,000 iterations of each network, the frequency spectrum recognition accuracy of the music in the validation set tends to be stable. The accuracy of the proposed algorithm is the highest. The data in Table 3 show that the frequency spectrum identification accuracy of the proposed algorithm is 1.77%–7.03% higher than that of common detection algorithms.

The proposed algorithm and the common detection algorithms are run on the validation set for 32,000 iterations with a learning rate of 0.0001. The curve of the spectrum recognition loss function is shown in Figure 4, and the spectrum recognition loss function of the validation set after 32,000 iterations is shown in Table 4.

According to Figure 4, after 32,000 iterations of each network, the spectrum recognition loss function of the music in the validation set tends to be stable. Moreover, the loss function value of the proposed algorithm is the lowest. The data in Table 4 show that the spectrum recognition loss function value of the validation set of the proposed algorithm is 0.1215–0.3108 lower than that of the common detection algorithms.

The experimental results show that the spectrum recognition accuracy and the spectrum recognition loss function are stable after 32,000 iterations. The spectrum recognition accuracy on the training and validation sets is 99.13% and 88.18%, respectively, and the spectrum recognition loss function value on the training and validation sets is reduced to 0.0737 and 0.4656, respectively. The proposed algorithm therefore has better spectrum recognition performance.

4. Conclusion

With the spread of the Internet and multimedia devices, the amount of digital music has increased dramatically. As a label people use to describe and understand music, the music genre makes automatic music classification possible. With the growth of Internet music library capacity, retrieving music by genre has become a mainstream method of music information retrieval and an important basis for music service platforms to recommend music to users. This article proposes a music genre recognition algorithm based on an improved Bayesian algorithm. The algorithm uses adaptive-weight particle swarm optimization to optimize the VMD parameters, ensuring good local and global optimization, and the MapReduce framework is used to perform feature selection and weight calculation in parallel. The experimental results show that the proposed algorithm has high accuracy in music genre recognition, and the effectiveness of detection is also significantly improved. However, some music genres with small distinctions are still difficult to classify correctly. In the future, we need to find music element features that more appropriately describe these characteristics in order to identify the subtle differences between music genres.

Data Availability

The labeled data set used to support the findings of this study is available from the author upon request.

Conflicts of Interest

The author declares no conflicts of interest.

Acknowledgments

This work was supported by the Zibo Vocational Institute.