Abstract

The music performance system identifies the emotional elements of music in order to control lighting changes. However, if recognition errors occur, a good stage effect cannot be created. Therefore, this paper proposes an intelligent music emotion recognition and classification algorithm for the music performance system. The first part of the algorithm analyzes the emotional features of music, including acoustic features, melody features, and audio features, and combines the three kinds of features into a feature vector set. The second part divides the feature vector set into training samples and test samples. The training samples are used to train a recognition and classification model based on a neural network, and the test samples are then input into the trained model to realize intelligent recognition and classification of music emotion. The results show that the kappa coefficient K values calculated by the proposed algorithm are all greater than 0.75, which indicates that the recognition and classification results are consistent with the actual results and that the recognition and classification accuracy is high. Thus, the research purpose is achieved.

1. Introduction

Watching entertainment programs has become one of the main leisure activities in daily life. During a program, the lights often change with the performers' performance, rendering the scene and driving the atmosphere, which helps the performers complete their performance better. The switching of performance lighting is mainly completed under the control of the music performance system; that is, the system controls the lighting by recognizing and classifying the emotions contained in the performers' music. For example, if the emotions expressed in the music are cheerful, joyful, and enthusiastic, the lighting changes at a fast rhythm and the light color is red or another bright color. On the contrary, if the emotions expressed in the music are depressive and serious, the lighting changes slowly and the light color is dark blue or another deep, cool color [1]. Once the emotion recognition is wrong, the performance can easily be ruined. Against this background, music emotion recognition and classification has become the most critical part of the music performance system and the focus of research.

There has been much research on recognizing and classifying the emotions contained in music. For example, in [2], Wang Jinhua, Ying Na, Zhu Chendu, et al. extracted emotion-correlated spectrogram features through a hybrid convolutional neural network model and recognized musical emotion on this basis. In [3], Wang Jie and Zhu Beibei took Chinese lyrics as the main object, extracted the emotional feature keywords contained in the lyrics, and then calculated the similarity between the words and a Chinese emotion dictionary to achieve music emotion classification. In [4], Li Qiang and Liu Xiaofeng constructed a probabilistic neural network (PNN) model to classify music emotion, extracted feature parameters during music playback, and then input the feature parameters into the PNN model to complete emotion classification.

Building on previous research, and in order to improve the accuracy of recognition and classification, this study extracts the emotional features contained in music from multiple aspects and constructs a multifeature space vector. Music emotion classification is then achieved by the constructed recognition and classification model, which supports lighting control in the music performance system. The purpose of this study is to help performers complete stage music performances and improve their expressive appeal.

2. Research on Music Emotion Intelligent Recognition and Classification Based on Multifeatures

A successful music performance depends not only on the music itself but also on a complementary scene atmosphere. In a music performance, the scene atmosphere is mainly created by lighting, which changes with the emotional factors expressed in the music and assists the music in creating a good stage effect [5]. Therefore, to control the lighting during a performance, accurate music emotion recognition is essential. To this end, it is necessary to construct a music emotion recognition and classification model and carry out research on intelligent recognition and classification of music emotion in the music performance system.

2.1. Analysis of the Features of Music Emotion

Music emotion recognition and classification is based on music emotion features, so the extraction of these features is analyzed first. Most previous studies classified music emotion using only one kind of music feature. Although such studies can complete the classification task, their accuracy cannot be guaranteed [6]. To solve this problem, this paper analyzes all the feature factors of music emotion and then combines them into a feature vector, which greatly helps to achieve the research purpose.

Extracting emotional features from music presupposes an understanding of the composition of music; the music-related factors that most clearly reveal emotional features are acoustic features, melody features, and audio features [7]. These three aspects are analyzed specifically in the following content.

According to the above analysis, the constructed Thayer emotion model mainly includes two dimensions, energy and pressure, as shown in Figure 1. These two dimensions correspond to the ordinate and abscissa of Figure 1, respectively, and reflect the measurement of emotional intensity. From left to right, the abscissa corresponds to emotion from happiness to sadness, and from bottom to top, the ordinate corresponds to emotion from calm to vitality.

Based on Figure 1, the following content specifically analyzes acoustic features, melodic features, and audio features.

2.1.1. Acoustic Features

Acoustic factors are the most basic components of music, and music with different emotions has different acoustic features. The basic correspondence is shown in Table 1 [8].

2.1.2. Melody Features

Melody is the overall beat and tune of music, which can be described by the following five characteristic parameters [9].

(1) Balance Parameter Y1. Balance refers to the proportion of volume in the left and right channels. The calculation formula is as follows:

In the formula, Y1 represents the equilibrium value.

(2) Volume Parameter Y2. Volume refers to the loudness of sound that can be heard by humans [10]. The calculation formula is as follows:

In the formula, Y2 represents the total loudness.

(3) Pitch Parameter Y3. Pitch refers to the vibration frequency of a note's fundamental frequency. Music with a fast rhythm has a fast vibration frequency; conversely, the vibration frequency is slow [11]. The calculation formula is as follows:

In the formula, Y3 represents the pitch value, and the remaining variable represents the number of note fundamental frequencies.

(4) Average Strength Parameter Y4. Strength refers to the strength of the power generated by the music. Soothing music has weak strength, while more shocking music has strong strength [12]. The calculation formula is as follows:

In the formula, the variables represent each note's intensity factor, the index of the note, and the total number of notes, respectively.

(5) Energy Parameter of Musical Notes. The energy of musical notes refers to the sum of the products of the pitch and the length of each note. The calculation formula is as follows:

In the formula, the variables represent the pitch and the length of the notes in the track.
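Since the formulas for Y1 through Y5 are not reproduced above, the following Python sketch shows one plausible way to compute the five melody parameters from a stereo signal and a note list. The function name, the ratio form of the balance, and the RMS approximation of loudness are assumptions chosen for illustration, not the paper's exact definitions.

```python
import numpy as np

def melody_features(left, right, pitches, durations, intensities):
    """Hypothetical computation of the five melody parameters Y1-Y5.

    left, right : audio samples of the left and right channels
    pitches     : fundamental frequency of each note (Hz)
    durations   : length of each note (seconds)
    intensities : intensity factor of each note
    """
    left, right = np.asarray(left, dtype=float), np.asarray(right, dtype=float)
    pitches = np.asarray(pitches, dtype=float)
    durations = np.asarray(durations, dtype=float)
    intensities = np.asarray(intensities, dtype=float)

    # Y1 balance: proportion of volume between the left and right channels
    vol_l, vol_r = np.sqrt(np.mean(left ** 2)), np.sqrt(np.mean(right ** 2))
    y1 = (vol_l - vol_r) / (vol_l + vol_r + 1e-12)

    # Y2 volume: total loudness, approximated by the RMS of the mixed signal
    y2 = np.sqrt(np.mean((0.5 * (left + right)) ** 2))

    # Y3 pitch: average fundamental frequency over all notes
    y3 = pitches.mean()

    # Y4 average strength: mean of the per-note intensity factors
    y4 = intensities.mean()

    # Y5 note energy: sum of pitch x duration over all notes in the track
    y5 = np.sum(pitches * durations)

    return np.array([y1, y2, y3, y4, y5])
```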

2.1.3. Features of Audio Frequency

Audio frequency is one of the important factors of music and affects its rhythm. The faster the rhythm, the more prominent the audio, and the happier the music emotion; conversely, the music emotion is duller or more depressed [13]. Audio features are described from two aspects, the time domain and the frequency domain, which are analyzed in detail below.

Time domain characteristics are as follows.

(1) Zero Crossing Rate Zn. The zero crossing rate refers to the frequency with which the audio signal waveform crosses the zero level. Generally speaking, in a piece of music, the zero crossing rate of the high frequency band is higher, and otherwise it is lower. This parameter can effectively distinguish unvoiced from voiced sounds. Generally, unvoiced sounds occur mostly in cheerful music, while voiced sounds are more common in low, deep music [14]. The zero crossing rate is calculated as follows:

In the formula, the sign function is applied to the audio signal, and the window term represents the effective width of the window.
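As a concrete illustration, the short-time zero crossing rate can be computed as in the sketch below; the frame length, hop size, and rectangular framing are assumed values rather than settings taken from the paper.

```python
import numpy as np

def zero_crossing_rate(x, frame_len=1024, hop=512):
    """Short-time zero crossing rate of an audio signal x (one value per frame)."""
    x = np.asarray(x, dtype=float)
    zcr = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        # count sign changes between consecutive samples, normalized by frame length
        crossings = np.sum(np.abs(np.diff(np.sign(frame)))) / 2.0
        zcr.append(crossings / frame_len)
    return np.array(zcr)
```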

(2) Range Mn. Range refers to the width of the waveform vibration of the audio signal. The more passionate the music, the greater the audio amplitude; the more soothing the music, the smoother the audio amplitude. The audio amplitude is described as follows:

In the formula, the window term represents the moving window function.
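A minimal sketch of the windowed amplitude (range) feature follows, assuming a Hamming window as the moving window function and the same framing parameters as in the zero crossing rate sketch above.

```python
import numpy as np

def short_time_amplitude(x, frame_len=1024, hop=512):
    """Windowed short-time amplitude Mn of an audio signal x (one value per frame)."""
    x = np.asarray(x, dtype=float)
    window = np.hamming(frame_len)   # assumed moving window function
    amps = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        amps.append(np.sum(np.abs(frame) * window))
    return np.array(amps)
```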

(3) Frequency Domain Characteristics. The frequency domain characteristics of audio include two parts: the spectral centroid and the spectral flux. The calculation formulas are as follows:

In the formulas, the spectral centroid is computed from the amplitude of the short-time spectrum of a frame at each frequency point, and the spectral flux is computed from the amplitude difference between the spectra of adjacent frames.
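The standard definitions of the spectral centroid (amplitude-weighted mean frequency of a frame) and the spectral flux (difference between the magnitude spectra of adjacent frames) can be computed as in the sketch below; the window, frame size, and sampling rate are assumptions.

```python
import numpy as np

def spectral_centroid_and_flux(x, frame_len=1024, hop=512, sr=44100):
    """Per-frame spectral centroid and spectral flux from short-time magnitude spectra."""
    x = np.asarray(x, dtype=float)
    window = np.hamming(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    centroids, fluxes, prev_mag = [], [], None
    for start in range(0, len(x) - frame_len + 1, hop):
        mag = np.abs(np.fft.rfft(x[start:start + frame_len] * window))
        # centroid: amplitude-weighted mean frequency of the current frame
        centroids.append(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))
        # flux: squared difference between the spectra of adjacent frames
        if prev_mag is not None:
            fluxes.append(np.sum((mag - prev_mag) ** 2))
        prev_mag = mag
    return np.array(centroids), np.array(fluxes)
```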

Based on the above three types of features, totaling 14 music emotion features, a feature vector is constructed to describe the emotional factors of a section of music or a whole piece of music. It is described as follows:

In the formula, the components represent the acoustic features (speed of speech, pitch, strength, sound quality, and pronunciation), the melody features, and the audio features, respectively.
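As a hedged illustration of how the 14-dimensional vector might be assembled, the sketch below simply concatenates five acoustic descriptors, the five melody parameters, and four audio descriptors; the 5 + 5 + 4 grouping is inferred from the text above rather than stated explicitly in the paper.

```python
import numpy as np

def build_feature_vector(acoustic, melody, audio):
    """Concatenate the three feature groups into a single 14-dimensional vector.

    acoustic : 5 values (speed of speech, pitch, strength, sound quality, pronunciation)
    melody   : 5 values (Y1 balance, Y2 volume, Y3 pitch, Y4 strength, Y5 note energy)
    audio    : 4 values (mean ZCR, mean amplitude, mean spectral centroid, mean spectral flux)
    """
    vec = np.concatenate([np.ravel(acoustic), np.ravel(melody), np.ravel(audio)])
    assert vec.shape == (14,), "expected a 14-dimensional feature vector"
    return vec
```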

2.2. Construction of Music Emotion Recognition and Classification Model

Based on the emotional features of music described above, this section uses a BP neural network to establish a recognition and classification model to realize the recognition and classification of music emotion.

The BP neural network is an intelligent algorithm that simulates the neural activity of the brain. It mainly includes three layers, and classification is realized through the operations of each layer [15]. The recognition and classification model constructed with this algorithm is shown in Figure 2.

As shown in Figure 2, the model needs to be trained before practical application. The specific process is as follows: through the feedforward operation of each layer, the output result is obtained and compared with the expected result. If the difference between the two is not less than the set threshold, the error is propagated backward, layer by layer from the output to the input, and the process is repeated until the weights and thresholds reach their optimum [16]. The purpose of BP neural network training is therefore to adjust and optimize the weights and thresholds between every two adjacent layers of the model. The adjustment formulas are given as follows:

(1) The connection weights and thresholds between the first (input) layer and the second (hidden) layer are adjusted as follows. In the formula, the variables represent the error value of the second layer, the input feature vector, the number of iterations, and the number of training samples.

(2) The connection weights and thresholds between the second (hidden) layer and the third (output) layer are adjusted as follows:

In the formula, the variables represent the error value and the output value of the second-layer nodes.
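Because the adjustment formulas themselves are not reproduced above, the following numpy sketch shows a standard gradient-descent update for the two weight/threshold pairs of a three-layer BP network; the tanh hidden activation, linear output, squared-error loss, and learning rate are assumptions chosen to match the tansig/purelin setup described later, not the paper's exact equations.

```python
import numpy as np

def bp_train_step(x, t, W1, b1, W2, b2, lr=0.01):
    """One BP update of input-hidden (W1, b1) and hidden-output (W2, b2) weights/thresholds."""
    # feedforward
    h = np.tanh(W1 @ x + b1)          # hidden layer (tansig-like activation)
    y = W2 @ h + b2                   # output layer (purelin-like activation)

    # error between the network output and the expected result
    e = y - t

    # backpropagate: output-layer gradients, then hidden-layer gradients
    dW2 = np.outer(e, h)
    db2 = e
    delta_h = (W2.T @ e) * (1 - h ** 2)   # derivative of tanh
    dW1 = np.outer(delta_h, x)
    db1 = delta_h

    # adjust weights and thresholds along the negative gradient
    return (W1 - lr * dW1, b1 - lr * db1,
            W2 - lr * dW2, b2 - lr * db2)
```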

After the model based on the BP neural network is trained, music emotion classification can be realized by inputting the test music samples.

3. Example Analysis

In order to test the performance of the multifeature recognition and classification algorithm for music emotion recognition in the music performance system, the following study takes MATLAB as the algorithm operation platform, selects specific examples, and carries out simulation tests and analysis.

Based on the five emotions of happiness, disgust, anger, sadness, and fear in Table 1, this study constructs a simulation model of music emotion recognition process in the music performance system, as shown in Figure 3.

As can be seen from Figure 3, the example analysis based on the simulation model proceeds as follows: first, the different music training samples are input into MATLAB; second, music emotion is classified by the BP neural network emotion classifier; and finally, the performance lighting is determined accordingly, so as to achieve the purpose of analyzing music emotion. A minimal sketch of the final step is given below.
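The sketch below illustrates the classify-then-light step under stated assumptions: the classifier is any trained model with a scikit-learn-style predict() method, and the lighting presets are illustrative values following the color/tempo mapping described in the introduction; none of these names come from the paper.

```python
# Hypothetical end-to-end step: feature vector -> emotion label -> lighting preset.
LIGHTING_PRESETS = {            # illustrative mapping, not taken from the paper
    "happiness":  {"color": "red",        "tempo": "fast"},
    "sadness":    {"color": "dark blue",  "tempo": "slow"},
    "tenderness": {"color": "warm white", "tempo": "medium"},
    "anger":      {"color": "deep red",   "tempo": "fast"},
    "fear":       {"color": "purple",     "tempo": "slow"},
}

def control_lighting(feature_vector, classifier):
    """Classify a 14-dimensional feature vector and select a lighting preset."""
    emotion = classifier.predict([feature_vector])[0]
    preset = LIGHTING_PRESETS.get(emotion, {"color": "white", "tempo": "medium"})
    return emotion, preset
```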

3.1. Sample Selection

The samples selected for the test come from three emotion corpora, namely, EMO-DB, Belfast, and eNTERFACE. According to the emotions of the selected samples, they can be divided into five categories; the specific distribution of the samples is shown in Table 2.

3.2. Neural Network Classification Model Training

The training samples in Table 2 are input into the neural network classification and recognition model for training. The training parameters are set as follows: the number of nodes in the three-layer structure is 3-32-5; the maximum number of training epochs is 500; the convergence accuracy target is less than 0.0002; the tansig function is the transfer function of the hidden layer, and the purelin function is the transfer function of the output layer. The training page based on Simulink is shown in Figure 4.

It can be seen from Figure 4 that the convergence accuracy of the neural-network-based classification model finally stabilizes at 0.000125, meeting the convergence target (below the set value of 0.0002). This indicates that the performance of the constructed model meets the needs of subsequent classification and that the model can be used in the actual music emotion intelligent classification test.
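For readers working outside MATLAB, a roughly comparable configuration (one hidden layer of 32 nodes, tanh hidden activation, 500 training epochs, a 2e-4 convergence tolerance) could be sketched with scikit-learn as below; note that MLPClassifier uses a softmax-style output rather than purelin, so this is an approximation of the Simulink setup, not a reproduction of it.

```python
from sklearn.neural_network import MLPClassifier

# Approximate Python counterpart of the 3-32-5 network described above.
# tanh plays the role of tansig; the output activation differs from purelin.
clf = MLPClassifier(hidden_layer_sizes=(32,),
                    activation="tanh",
                    solver="adam",
                    max_iter=500,        # maximum number of training epochs
                    tol=2e-4,            # convergence tolerance
                    random_state=0)

# X_train (feature vectors) and y_train (emotion labels) are hypothetical arrays:
# clf.fit(X_train, y_train)
```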

3.3. Recognition Results of Melody Feature

Melody is the soul of music, and the interval difference is the most basic element of melodic movement. The interval difference refers to the differences between successive high and low tones that form different melody combinations, allowing people to perceive different musical images, thoughts, and emotions. Figure 5 shows interval difference statistics for music examples with five different emotions: happiness, sadness, tenderness, anger, and fear.

Figure 5 shows the statistical results of the interval differences. The smaller the interval difference and the higher its percentage, the more light-hearted and relaxed the main emotion of the music; the larger the interval difference, the more intense and depressed the main emotion. Frequent large interval differences do not increase the fluency of the music but instead cause an abrupt feeling. Therefore, in the main melody, the percentage ranges of the same interval difference also differ for music with different emotions, as the sketch below illustrates.
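A hedged sketch of the interval-difference statistic: given a melody as a sequence of note pitches (here assumed to be MIDI numbers, which the paper does not specify), the absolute difference between successive notes is computed and summarized as a percentage distribution.

```python
import numpy as np
from collections import Counter

def interval_difference_stats(midi_pitches):
    """Percentage of each interval difference (in semitones) within a melody."""
    diffs = np.abs(np.diff(np.asarray(midi_pitches, dtype=int)))
    counts = Counter(diffs.tolist())
    total = sum(counts.values())
    return {int(d): 100.0 * c / total for d, c in sorted(counts.items())}

# Example with a short hypothetical phrase: small interval differences dominate.
print(interval_difference_stats([60, 62, 64, 62, 60, 67, 65, 64]))
```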

3.4. Extraction Result of Audio Features

As the rhythm of the music in the music performance system accelerates, the audio becomes more prominent and the musical emotion expressed becomes more cheerful; conversely, the musical emotion is dull or depressed. The extraction results of the audio features are shown in Figure 6.

According to the analysis results in Figure 6, the extraction area of a music audio feature is regular under a single emotion, but irregular under changing emotions, which is consistent with the music emotion reflected by the actual rhythm.

3.5. Recognition Result of Spectral Flux

Spectral flux refers to the mean difference between the spectra of adjacent audio frames and reflects the dynamic characteristics of the music signal. A music signal includes three parts, unvoiced, voiced, and mute, which gives the spectral flux of music a large range of variation. Changes in the performers' performance also cause changes in the spectral flux. Therefore, the following test takes music tracks containing multiple emotions as an example and carries out spectral flux identification. The test results are shown in Figure 7.

According to the analysis results in Figure 7, as the music emotions in the music performance system change, the output spectral density shows an irregular trend. The main reason is that a piece of music contains many emotional expressions, and the music content (such as lyrics and background, etc.) also affects the expression of emotion, which further leads to the irregular change of the output spectral density.

3.6. Algorithm Evaluation Index

The kappa coefficient is the index used to evaluate the performance of the algorithm. It measures the consistency between the algorithm's application results and the actual results and is calculated as follows:

K = (P_o - P_e) / (1 - P_e) (12)

In the formula, P_o represents the observed consistency rate, P_e represents the expected consistency rate, and 0 ≤ K ≤ 1. The larger the value of K, the better the consistency and the closer the algorithm's results are to the actual results. Generally speaking, when K ≥ 0.75, the recognition and classification are relatively accurate; when K < 0.4, consistency is lacking and the accuracy of recognition and classification is poor.
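The following sketch computes the kappa coefficient both directly from formula (12) and with scikit-learn's cohen_kappa_score; the example labels are hypothetical.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def kappa(y_true, y_pred):
    """Cohen's kappa: K = (P_o - P_e) / (1 - P_e)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    labels = np.unique(np.concatenate([y_true, y_pred]))
    n = len(y_true)
    p_o = np.mean(y_true == y_pred)                               # observed consistency rate
    p_e = sum((np.sum(y_true == c) / n) * (np.sum(y_pred == c) / n)
              for c in labels)                                    # expected consistency rate
    return (p_o - p_e) / (1 - p_e)

# Example with hypothetical labels; cohen_kappa_score gives the same value.
y_true = ["happy", "sad", "sad", "anger", "fear", "happy"]
y_pred = ["happy", "sad", "anger", "anger", "fear", "happy"]
print(kappa(y_true, y_pred), cohen_kappa_score(y_true, y_pred))
```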

The test samples in Table 2 are input into the model trained in Section 2.2, and the output recognition and classification results are shown in Figure 8.

Based on Figure 8, the test results are counted and the kappa coefficients are calculated according to formula (12). The results are as follows:

In the formula, the five values represent the kappa coefficients of happiness, sadness, tenderness, anger, and fear, respectively. The kappa coefficient K values calculated by this method are all greater than 0.75, which indicates that the recognition and classification results are in agreement with the actual results; that is, the application results of the algorithm are close to the actual results, and the recognition and classification accuracy is high. Thus, the research purpose is achieved.

4. Conclusion

In order to achieve the desired stage effect, the lighting during a performance generally changes with the artistic conception created by the performance under the control of the music performance system. Music emotion recognition is therefore crucial to lighting control. On this basis, this paper proposes an intelligent music emotion recognition and classification algorithm for the music performance system. The following conclusions are drawn from the study:

(1) The proposed algorithm extracts emotional features from music and then inputs them into the constructed recognition and classification model so as to clarify the emotion to be expressed in the music and control the lighting.

(2) The study verifies the algorithm's performance with an example, which shows that the proposed algorithm can accurately recognize music interval differences. The extraction area of the audio features is regular, and the output spectral density shows an irregular trend that is consistent with the actual trend of musical expression. The kappa coefficient values are all greater than 0.75, indicating that the recognition and classification results are in good agreement with the actual results, and the research goal is achieved.

(3) However, this study did not apply the algorithm to an actual music performance system and therefore lacks practical validation, which needs to be further verified and analyzed in the future.

Data Availability

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding this work.