Music is a universal art form and a jewel of human civilisation. As music develops, evaluating music teaching becomes an indispensable step toward high-quality music education. Universities bring together many people with musical talent and have made outstanding contributions to the development of music, yet with the changing times, university music teaching has run into difficulties. Artificial intelligence, an important branch of computer science and information technology, spans many intersecting disciplines and brings new elements to music education; it has also had an important impact on how music is taught. This article examines the traditional development of music education in terms of the characteristics and methods of music teaching. On this basis, it explores the integration of artificial intelligence and music and analyses the role of emerging technologies in supporting music teaching today. Using emotion recognition as the evaluation index, the article explores how artificial intelligence can evaluate college music teaching and thereby improve its quality and efficiency. The experimental results show that, based on image data, the teacher's positive emotion rate is 57.8% and the students' is 44.5%; based on voice data, the teacher's positive emotion rate is 53.3% and the students' is 51.1%. Classroom emotion is negative during minutes 7–13 and remains low during minutes 28–40, whereas teacher and student emotions are more positive during minutes 13–28.

1. Introduction

Music is a popular art form, and good musical expression can help people develop a healthy personality. However, music often conveys its creator's personal emotions, and music teaching is a highly subjective art. Relying only on manual evaluation and guidance of music teaching therefore introduces a great deal of randomness. Artificial intelligence, which emerged in the 1950s, is a technology for studying and extending human intelligence. Because of its distinctive capabilities, it is widely used in many disciplines and fields. For example, clinical artificial intelligence systems scan data and images and provide medical information; banks use artificial intelligence systems for operations, financial investment, and property management; and Yuncheng Communications Corporation studies machines that manage labor. Combining artificial intelligence with music teaching could give music education a new face and bring it in line with the spirit of the times, while also improving people's practical skills and their appreciation of art. Further advances in artificial intelligence will help humans analyze information and make decisions, and will transform the way they work.

As an emerging technology, artificial intelligence is widely applied in pattern processing and image recognition and has attracted the attention of many experts and scholars. Lu et al. set out to use artificial intelligence to build an intelligent learning model that imitates the human learning process as closely as possible; they tested the model on specific events and proposed a number of targeted optimization strategies [1]. To study how artificial intelligence has developed, Hassabis et al. proposed an intelligent algorithm combined with neural networks and found a connection between the development of artificial intelligence and the biological brain [2]. Thrall et al. pioneered the introduction of artificial intelligence into medicine: by applying it to the analysis of imaging data, they improved diagnostic certainty, and they proposed a strategic plan for how radiologists and pathologists can combine big data and artificial intelligence in the future [3]. Rongpeng et al. demonstrated the effectiveness of artificial intelligence in managing and orchestrating cellular network resources and discussed its relationship with candidate technologies for 5G cellular networks [4].

The performance of the piece "Transits into an Abyss" marked the first time that humans performed a work created entirely by a machine. The piece was composed by "Iamus", a computer cluster running intelligent algorithms. Starting from only scattered fragments of information, Iamus can create a complete piece of music, and it does so to a high standard in a very short time. Iamus went on to compose music in many other genres. Moreover, relying on artificial intelligence computation, Iamus can complete composition, lyrics, and arrangement independently, without human guidance. The product of combining music and artificial intelligence is called music artificial intelligence, a field in which artificial intelligence and music learning are integrated. Over time, music and artificial intelligence have become increasingly closely linked, which gives people new ideas for innovation in music. Artificial intelligence greatly amplifies the human senses and provides a good platform for creating music. Combining artificial intelligence with music is not only an innovation in traditional musical forms but also advances artificial intelligence itself. It supports human perception, cognition, research, and creation of music, forming a new, human-computer interactive mode of music teaching (Figure 1).

As times have progressed, the form of music teaching has changed dramatically. The renewal of musical instruments has made people's lives more convenient, intelligent, and complete. It not only provides a new model for music classroom teaching but also opens up new directions, teaching ideas, and room for reflection for music educators. Accordingly, many scholars have studied the combination of artificial intelligence and music teaching. To study the role of kindergarten music teachers in music teaching, Wong assessed the teaching of 88 in-service kindergarten music teachers. For this comprehensive evaluation, he used music teaching effect and emotional perception indicators as the basis of evaluation and built a quantitative evaluation table from them [5]. Prichard adopted a sequential explanatory mixed-methods design with two strands (strand I: quantitative; strand II: qualitative). Introductory music education students (N = 684) from 41 institutions accredited by the National Association of Schools of Music participated in the first strand, and a nested sample of 24 respondents participated in the second. Preservice music teachers' efficacy beliefs were interpreted as having two dimensions: music teaching efficacy beliefs and classroom management efficacy beliefs. The mixed-methods analysis showed that introductory music education students' music teaching efficacy beliefs may be affected by various curricular experiences, including personal guidance, peer teaching, and field experience [6]. Singh et al. studied data on the multiple intelligences of Indian children in order to evaluate students' different forms of intelligence. The subjects were 1065 students aged 12 to 16, each of whom completed a multiple intelligence questionnaire of 30 true/false questions. The questionnaire evaluates children's intelligence in seven areas: linguistic, logical/mathematical, musical, spatial, bodily-kinesthetic, intrapersonal, and interpersonal intelligence [7].

However, the studies above focus mainly on music teaching in primary and secondary schools, and most of them have not combined artificial intelligence with music teaching. Across the music field as a whole, the combination of artificial intelligence and college music teaching is still in its infancy [8]. This article explores how artificial intelligence can be used to analyze teaching behavior and thereby clarifies its supporting role in college music classrooms. The aim is to help integrate artificial intelligence into college music classrooms more scientifically and efficiently and to better promote college students' music learning.

2. Artificial Intelligence in Music Teaching in Colleges and Universities

2.1. The Status Quo of Music Teaching in Colleges and Universities

The development of traditional university music teaching depends largely on the music courses that universities offer and on the teaching experience of music teachers [9]. Course quality is also closely tied to teachers' experience, so the quality of music courses varies greatly across universities. Together, these factors inevitably give rise to several problems in college music teaching, mainly the following.

2.1.1. Not Paying Enough Attention to Music Teaching

In college education, institutions often concentrate on academic subjects and, to some extent, neglect music and other arts education [10]. Even professional art colleges often fail to give music teaching enough attention. Under these circumstances, many colleges and universities neglect the quality of music teaching, and if this continues, the problems of music teaching will become increasingly prominent.

2.1.2. Single Music Teaching Mode

College music teaching tends to follow a single teaching mode. Teachers usually explain basic music theory in the abstract, helping students understand the background, creative context, and story of musical works and guiding them to analyze the tunes, melodies, and techniques of those works. However, most students struggle with abstract musical knowledge and concepts, which leaves them with a weak foundation in music theory. In most music courses, teachers simply play instruments for students to appreciate or have them sing classic songs. Both the teaching mode and the content are monotonous, and the quality of music classroom teaching is low [11].

2.1.3. Taking Teachers as the Main Body

In the traditional music assessment system, music education is often the preserve of the teacher, which severely suppresses students' autonomy [12]. In addition, because of traditional attitudes, students are often afraid or unwilling to express themselves, which further undermines the practical goals of music education.

2.1.4. Music Teaching Behavior Evaluation Mechanism

In the traditional evaluation model, music teaching in colleges and universities is usually evaluated and guided manually, so the overall level of music teaching cannot be guaranteed. This approach also requires many professional music experts and consumes considerable manpower and material resources [13]. Moreover, even the most experienced experts cannot guarantee that every evaluation is accurate and objective, so the current evaluation of music teaching behavior is highly subjective.

2.2. Network Structure and Working Mode of RBF Algorithm

The algorithm model of the music intelligence system is a radial basis function (RBF) neural network, an artificial intelligence model composed of locally tuned neurons. It usually has three layers: an input layer, a hidden layer, and an output layer. The input layer consists of signal source nodes that connect the network to the external environment and pass data on without changing them. The hidden-layer neurons use a radial basis function as their kernel, which nonlinearly transforms the input into the hidden-layer space. The output layer is linear and produces the network's response to the activation pattern applied to the input layer. The RBF algorithm flow is shown in Figure 2 [14].
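As a rough illustration of the three-layer structure just described, the following minimal NumPy sketch computes one forward pass. It assumes Gaussian hidden units of the form exp(−‖X − C_j‖² / (2σ_j²)) and a purely linear output layer without bias; the function name `rbf_forward` and the array shapes are our own choices, not the system's actual code.

```python
import numpy as np

def rbf_forward(X, centers, widths, weights):
    """Forward pass of a three-layer Gaussian RBF network.

    X:       (m, n) inputs; the input layer passes them on unchanged
    centers: (p, n) centre C_j of each hidden unit
    widths:  (p,)   width sigma_j of each hidden unit
    weights: (q, p) output-layer weights w_kj
    Returns the hidden activations (m, p) and the outputs (m, q).
    """
    # Squared Euclidean distance from every input to every centre
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Hidden layer: nonlinear, locally responding Gaussian units with outputs between 0 and 1
    hidden = np.exp(-sq_dist / (2.0 * widths ** 2))
    # Output layer: linear combination of the hidden activations
    return hidden, hidden @ weights.T

X = np.array([[0.0, 0.0], [1.0, 1.0]])
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
hidden, y = rbf_forward(X, centers, widths=np.array([0.5, 0.5]), weights=np.array([[1.0, -1.0]]))
```

An input that sits exactly on a centre drives that hidden unit to 1, while a distant centre contributes almost nothing, which is the local behaviour the text attributes to the hidden layer.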

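There are many commonly used radial basis functions [15]. Writing X for the input vector, C_j for the centre and σ_j for the width of the j-th hidden unit, and p for the number of hidden units, three common choices are:

(1) Gaussian function:
$$R_j(X) = \exp\left(-\frac{\lVert X - C_j \rVert^2}{2\sigma_j^2}\right), \quad j = 1, 2, \ldots, p \tag{1}$$

(2) Reflected sigmoid function:
$$R_j(X) = \frac{1}{1 + \exp\left(\lVert X - C_j \rVert^2 / \sigma_j^2\right)} \tag{2}$$

(3) Inverse multiquadric function:
$$R_j(X) = \frac{1}{\left(\lVert X - C_j \rVert^2 + \sigma_j^2\right)^{1/2}} \tag{3}$$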

This research uses the Gaussian radial basis function. In formula (1), C_j is the centre of the j-th basis function and σ_j is its width [16]. The node output lies between 0 and 1 and decreases monotonically as the distance between the input sample and the centre grows: the closer the input is to the centre, the larger the output, and the farther away it is, the smaller the output [17]. The initial values of the network parameters are calculated from the ranges of the input features and the expected outputs:

Here, $X = (x_1, x_2, \ldots, x_n)^{T}$ is the input vector and n is the number of input-layer units.

$\min X_i$ and $\max X_i$ are the minimum and maximum values of the i-th input feature.

$\min W_k$ and $\max W_k$ are the minimum and maximum of all expected outputs for the k-th output.

When the distance between the input and the centre is 0, the function reaches its maximum value of 1, and the output grows as the distance between the centre (weight) vector and the input vector decreases [18]. In other words, the radial basis function responds locally to the input signal: when the input X is close to the centre of the function, the hidden-layer node produces a large output [19]. This gives the function its local approximation property.
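The local-response property can be checked numerically. The sketch below evaluates the three basis functions named in this section on a grid of distances r = ‖X − C_j‖. The exact parameterisations are our assumption of the standard textbook forms (Gaussian exp(−r²/(2σ²)), reflected sigmoid 1/(1 + exp(r²/σ²)), inverse multiquadric 1/√(r² + σ²)), since the paper's own formulas did not survive.

```python
import numpy as np

# Assumed standard forms of the three basis functions, as functions of r = ||X - C_j||
def gaussian(r, sigma=1.0):
    return np.exp(-np.square(r) / (2.0 * sigma ** 2))       # 1 at r = 0

def reflected_sigmoid(r, sigma=1.0):
    return 1.0 / (1.0 + np.exp(np.square(r) / sigma ** 2))  # 0.5 at r = 0

def inverse_multiquadric(r, sigma=1.0):
    return 1.0 / np.sqrt(np.square(r) + sigma ** 2)         # 1/sigma at r = 0

r = np.linspace(0.0, 3.0, 7)   # distances 0, 0.5, ..., 3.0
responses = {f.__name__: f(r) for f in (gaussian, reflected_sigmoid, inverse_multiquadric)}
# Each response shrinks as r grows: the units respond locally around their centre
for name, values in responses.items():
    print(f"{name:22s}", np.round(values, 3))
```

Only the Gaussian reaches exactly 1 at the centre, which matches the maximum value of 1 stated for the function used in this research.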

Figure 3 shows the output curve of the Gaussian radial basis function at different Euclidean distances. When an input value enters the network, the corresponding output value is obtained. The radial basis output is determined by the Euclidean distance between the input and the corresponding weight vector; in plain terms, it measures how similar the two are [20]. The greater the difference, the smaller the radial basis output. As the figure shows, the output is almost 0 at abscissa values of −1.0 and 1.0, while the Gaussian output at 0.0 is 1; a neuron whose radial basis output is 1 passes its second-layer weight to the output in full [21].

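The output layer computes

$$y_k = \sum_{j=1}^{p} w_{kj} R_j(X), \quad k = 1, 2, \ldots, q,$$

where $R_j(X)$ is the output of the j-th hidden unit, $w_{kj}$ is the weight from the j-th hidden unit to the k-th output unit, and q is the number of output units. The centres $c_{ji}$, widths $\sigma_j$, and weights $w_{kj}$ can be adjusted to their best values by gradient descent on the error

$$E = \frac{1}{2} \sum_{l=1}^{N} \sum_{k=1}^{q} \left(t_k^{l} - y_k^{l}\right)^2,$$

with the updates

$$w_{kj} \leftarrow w_{kj} - \eta \frac{\partial E}{\partial w_{kj}}, \quad c_{ji} \leftarrow c_{ji} - \eta \frac{\partial E}{\partial c_{ji}}, \quad \sigma_j \leftarrow \sigma_j - \eta \frac{\partial E}{\partial \sigma_j},$$

where η is the learning rate, N is the number of training samples, $t_k^{l}$ is the expected output value of the k-th output neuron for the l-th input sample, and $y_k^{l}$ is the network output value of the k-th output neuron for the l-th input sample.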
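To show how the centres, widths, and output weights of a Gaussian RBF network can all be tuned against the squared-error criterion E = ½ Σ_l Σ_k (t_k^l − y_k^l)², here is a minimal full-batch gradient-descent sketch. The Gaussian form exp(−‖X − C_j‖²/(2σ_j²)), the learning rate, the epoch count, and the toy data are our assumptions; the paper's exact adjustment formulas are not reproduced here.

```python
import numpy as np

def train_rbf(X, T, centers, widths, weights, lr=0.01, epochs=2000):
    """Batch gradient descent on E = 1/2 * sum_l sum_k (t_k^l - y_k^l)^2 for a Gaussian RBF net.

    X: (N, n) inputs, T: (N, q) expected outputs,
    centers: (p, n), widths: (p,), weights: (q, p).
    Returns updated copies of the parameters and the error after each epoch.
    """
    centers = centers.astype(float).copy()
    widths = widths.astype(float).copy()
    weights = weights.astype(float).copy()
    history = []
    for _ in range(epochs):
        diff = X[:, None, :] - centers[None, :, :]        # (N, p, n): x_i^l - c_ji
        sq_dist = (diff ** 2).sum(axis=2)                 # (N, p): ||X^l - C_j||^2
        R = np.exp(-sq_dist / (2.0 * widths ** 2))        # (N, p): Gaussian hidden outputs
        Y = R @ weights.T                                 # (N, q): linear output layer
        err = T - Y                                       # (N, q): t_k^l - y_k^l
        history.append(0.5 * float((err ** 2).sum()))
        g = (err @ weights) * R                           # (N, p): sum_k err_k * w_kj * R_j
        grad_w = -err.T @ R                               # dE/dw_kj, shape (q, p)
        grad_c = -(g[:, :, None] * diff).sum(axis=0) / widths[:, None] ** 2  # dE/dc_ji, (p, n)
        grad_s = -(g * sq_dist).sum(axis=0) / widths ** 3                   # dE/dsigma_j, (p,)
        weights -= lr * grad_w
        centers -= lr * grad_c
        widths -= lr * grad_s
    return centers, widths, weights, history

# Toy problem: fit t = x^2 on [-1, 1] with 5 hidden units
X_demo = np.linspace(-1.0, 1.0, 9)[:, None]
T_demo = X_demo ** 2
c0 = np.linspace(-1.0, 1.0, 5)[:, None]
_, widths_fit, _, hist = train_rbf(X_demo, T_demo, c0, np.full(5, 0.5), np.zeros((1, 5)))
```

The gradients follow from the chain rule: ∂R_j/∂c_ji = R_j (x_i − c_ji)/σ_j² and ∂R_j/∂σ_j = R_j ‖X − C_j‖²/σ_j³. Because the weights start at zero, the first epoch only moves the output weights; centres and widths begin to adapt once the weights are non-zero.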

2.3. Network Structure and Working Method of BP Algorithm

A BP neural network involves two processes: forward propagation of the signal and backward propagation of the error. In forward propagation, the input signal passes through the hidden layer to the output nodes and undergoes a nonlinear transformation to produce the output signal. If the actual output does not match the expected output, the network enters error back-propagation. Back-propagation uses gradient descent to adjust the connection weights and neuron thresholds of each layer, repeating this over many iterations. The basic idea is to propagate the output-layer error backward layer by layer and thereby indirectly compute the hidden-layer error [22]. The structure is shown in Figure 4.

The BP algorithm is applied to music as follows [23]:
(a) Assign initial values to the weights and select the music parameters.
(b) Feed the input variables into the input units.
(c) Calculate the net input of the hidden layer.
(d) Calculate the hidden-layer output with the sigmoid function defined in the hidden layer.
(e) From this first output result, calculate the deviation of the result in the corresponding layer.
(f) Using the deviation, evaluate the functions obtained above to get the output.
(g) Calculate the error value from this output.
(h) Readjust the weight vector according to the error.
(i) With the readjusted weights, update the display layer, where Ω is a bias value.
(j) Update the weights of the hidden layers iteratively.

Through these operations, we have implemented an initial combination of the neural network and music. In this process, we first redefined the objective function of the neural network, then adjusted the error according to the actual situation, and finally obtained the desired output.
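The BP steps (a)–(j) above can be sketched as a generic one-hidden-layer network with sigmoid hidden units trained by full-batch gradient descent. This is the textbook formulation, not the paper's exact formulas: the linear output layer, the bias vectors `b1` and `b2`, the initialisation scale, the learning rate, and the toy regression target are all our assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bp(X, T, n_hidden=8, lr=0.1, epochs=2000, seed=0):
    """One-hidden-layer BP network: sigmoid hidden layer, linear output layer.

    Minimises L = 1/2 * mean_l sum_k (y_k^l - t_k^l)^2 by full-batch gradient descent.
    """
    rng = np.random.default_rng(seed)
    m, n_in = X.shape
    n_out = T.shape[1]
    # Initial weights: small random values; biases start at zero
    W1 = rng.normal(0.0, 0.5, size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, size=(n_hidden, n_out))
    b2 = np.zeros(n_out)
    losses = []
    for _ in range(epochs):
        # Forward: net input of the hidden layer, sigmoid activation, linear output
        H = sigmoid(X @ W1 + b1)
        Y = H @ W2 + b2
        E = Y - T
        losses.append(0.5 * float(np.mean(np.sum(E ** 2, axis=1))))
        # Backward: output error signal, then hidden error via sigmoid'(z) = h(1 - h)
        delta_out = E / m
        delta_hid = (delta_out @ W2.T) * H * (1.0 - H)
        # Gradient-descent updates, output layer first, then hidden layer
        W2 -= lr * (H.T @ delta_out)
        b2 -= lr * delta_out.sum(axis=0)
        W1 -= lr * (X.T @ delta_hid)
        b1 -= lr * delta_hid.sum(axis=0)
    return (W1, b1, W2, b2), losses

# Toy regression: learn t = sin(x1) + x2^2 from 50 random points
rng = np.random.default_rng(1)
X_demo = rng.uniform(-1.0, 1.0, size=(50, 2))
T_demo = (np.sin(X_demo[:, 0]) + X_demo[:, 1] ** 2)[:, None]
params, losses = train_bp(X_demo, T_demo)
```

The hidden-layer error is computed from the output-layer error before `W2` is updated, which is what "propagating the error from back to front" means in practice.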

3. System Verification

3.1. Music Classroom Teaching Behavior Evaluation System

Combining the above algorithms and practices, we can construct an initial system for evaluating teaching behavior in the music classroom. Its basic composition is shown in Figure 5. The system has three layers, and the following subsections analyze its performance at each of them.

3.1.1. Application Layer

The application layer provides three interfaces: for managers, teachers, and students. After the data have been collected and analyzed, the visualized results are fed back to all three parties. The application layer helps managers evaluate classroom teaching, manage school courses intelligently, and improve management efficiency.

3.1.2. Data Layer

The data layer handles data collection, data sources, data preprocessing, and data analysis. Data collection refers to the classroom teaching videos captured by the infrastructure layer; this research uses two types of data, image data and voice data. The data sources are the collected teacher and student data. Data preprocessing filters, transforms, and processes the collected source data into a form suitable for the experiments. Data analysis applies text analysis, cluster analysis, feature selection, and association analysis to the data and finally produces visualizations from which conclusions are drawn.

3.1.3. Infrastructure Layer

The infrastructure layer is mainly responsible for data collection: audio and video recording equipment installed in the classroom captures video and voice data. The infrastructure layer also includes operation and maintenance management and the data processing center.

The system workflow starts at the infrastructure layer, where smart devices record the class and capture its video and voice data. The data are then passed to the data layer, which analyzes them and produces visualizations that are finally fed back to the three interfaces of the application layer.

3.2. Classroom Teaching Behavior Analysis Process Based on Artificial Intelligence Technology

This brief analysis shows that each module plays a different role. The infrastructure layer is the main source of raw data. Once it has acquired a certain amount of data, it packages the data and sends them to the data layer, which then processes and transforms them. The transformed data are passed to the application layer for presentation to users. The information obtained by the infrastructure layer is shown in Table 1.
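To make the layer-to-layer flow concrete, here is a minimal Python sketch. The class and function names (`Capture`, `Analysis`, `data_layer`, `application_layer`) are hypothetical, the recognisers are stand-in callables, and preprocessing is omitted; the sketch illustrates only the direction of data flow, not the system's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Capture:
    """Data packaged by the infrastructure layer (hypothetical structure)."""
    frames: List[object]          # one video frame per 30-s sampling point
    audio_segments: List[object]  # one audio segment per 30-s sampling point

@dataclass
class Analysis:
    """Per-sample emotion scores produced by the data layer."""
    image_scores: List[float]
    voice_scores: List[float]

def data_layer(capture: Capture,
               score_frame: Callable[[object], float],
               score_audio: Callable[[object], float]) -> Analysis:
    # Preprocessing is omitted; each item goes through a pluggable recogniser
    return Analysis([score_frame(f) for f in capture.frames],
                    [score_audio(a) for a in capture.audio_segments])

def application_layer(analysis: Analysis) -> Dict[str, Dict[str, float]]:
    # Summarise once, then publish the same view to all three interfaces
    summary = {
        "mean_image_score": sum(analysis.image_scores) / len(analysis.image_scores),
        "mean_voice_score": sum(analysis.voice_scores) / len(analysis.voice_scores),
    }
    return {role: dict(summary) for role in ("manager", "teacher", "student")}

# Usage with stand-in recognisers that return fixed scores
capture = Capture(frames=["f0", "f1"], audio_segments=["a0", "a1"])
views = application_layer(data_layer(capture, lambda f: 1.0, lambda a: -0.5))
```

Keeping the recognisers as parameters reflects the separation described in the text: the infrastructure layer only captures, the data layer only scores, and the application layer only presents.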

As Table 1 shows, the data stored in the data layer are mainly voice and image data, which are analyzed after preprocessing using voice, image, and facial recognition. People's emotions are directly reflected in body language and facial expressions, so we first identify and analyze the facial expressions of the subjects [24]. The system processes and analyzes face images of the students and the teacher, then processes the speech stream and the text transcribed from it. The two sets of results are combined to analyze the emotional behavior of teachers and students. The resulting data are displayed in the three interfaces of the application layer, so that managers, teachers, and students can clearly see the intelligent analysis.

4. Experiment and Result Analysis

4.1. The Purpose of the Experiment

The purpose of this experiment is to use artificial intelligence technology and the classroom teaching behavior analysis system designed above to identify and analyze the language and behavior of teachers and students in sample classroom teaching videos. An emotion algorithm is designed to plot the emotion changes of the teacher and students, compute teacher and student emotion scores, and analyze the overall emotion of the classroom.

4.2. Experimental Objects and Analysis Indicators

The sample selected for this experiment is a music class held in a recording-and-broadcasting classroom at a university in this province. The class consists of one teacher and 35 students (22 female and 13 male). The videos were recorded from three perspectives: the teacher, the students, and the whole classroom, and the analysis indicators are the sentiment results from these three perspectives. During the experiment, the collected classroom teaching videos were analyzed against these indicators in the data layer of the system.

This experiment uses image and speech recognition to evaluate the subjects' emotions, which are classified as positive, neutral, or negative. Statistical analysis of these evaluations gives the time and frequency of the subjects' positive emotions and, finally, the overall emotional tendency of the class.

(1) Emotion analysis based on image data. The teaching video is first divided into segments according to the required precision, and data are then collected and analyzed. This experiment uses moderate precision, taking one screenshot every 30 seconds; since a class lasts 45 minutes, 90 frames are captured for analysis. The analysis flowchart is shown in Figure 6. For face detection in the teacher-view video, the experiment first performs human body recognition and then face recognition on the collected images in order to obtain more accurate face images. The faces are then quantified and assigned values: positive emotions receive values from 0.5 to 1.5, neutral emotions from −0.5 to 0.5, and negative emotions from −1.5 to −0.5, so each teacher image collected every 30 seconds has a corresponding emotion evaluation value. For face detection in the student-view video, not all students can be detected because of the limited accuracy of human body recognition. The experiment therefore focuses on the majority of students with clearly visible features and obtains their emotional changes and corresponding emotion evaluation values at different moments.

(2) Emotion analysis based on voice data. The voice data are first extracted from the video and cut every 30 seconds, matching the time interval of the image analysis. An emotion recognition model then processes the resulting text fragments, and the classroom emotions are counted and analyzed. The analysis flowchart is shown in Figure 7.
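The sampling arithmetic and the three image-based score bands above can be written down directly. In this sketch the function names are hypothetical, the first screenshot is assumed to be taken at t = 0, and the shared band endpoints 0.5 and −0.5 are assumed to count as neutral (consistent with the "greater than 0.5" rule used later for positive emotion frequency).

```python
from typing import List

def frame_times(duration_s: int = 45 * 60, interval_s: int = 30) -> List[int]:
    """Timestamps (in seconds) of the screenshots taken during one class."""
    # Assumption: the first screenshot is taken at t = 0
    return list(range(0, duration_s, interval_s))

def image_emotion_category(value: float) -> str:
    """Map an image-based emotion value in [-1.5, 1.5] to positive/neutral/negative."""
    if not -1.5 <= value <= 1.5:
        raise ValueError("image emotion values lie in [-1.5, 1.5]")
    # Assumption: the shared endpoints 0.5 and -0.5 count as neutral,
    # matching the "greater than 0.5" rule used for positive frequency
    if value > 0.5:
        return "positive"
    if value >= -0.5:
        return "neutral"
    return "negative"

times = frame_times()
print(len(times))   # 90 frames = 45 min * 60 s / 30 s
```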

In the sentiment analysis of language behavior in the sample teaching videos, the voice track is first extracted from the classroom video. The voice data are then cut into segments, converted into text by a voice transcription platform, and converted into vectors with a vector model. Finally, an emotion recognition network classifies each segment as positive or negative and assigns it a value: positive segments receive values in the range 0 to 1, and negative segments values in the range −1 to 0.
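The two ends of this voice pipeline, cutting the signal into 30-second segments and turning a binary classification into a signed value, can be sketched as follows. Transcription, vectorisation, and the emotion recognition network itself are left out. Using the classifier's confidence as the magnitude of the value is our assumption, since the text only gives the ranges 0 to 1 and −1 to 0; the function names are hypothetical.

```python
from typing import List, Sequence

def split_audio(samples: Sequence[float], sample_rate: int, segment_s: int = 30) -> List[Sequence[float]]:
    """Cut a mono signal into consecutive fixed-length segments (the last may be shorter)."""
    step = sample_rate * segment_s
    return [samples[i:i + step] for i in range(0, len(samples), step)]

def voice_emotion_value(label: str, confidence: float) -> float:
    """Turn a binary emotion label into a signed value in [-1, 1].

    Assumption (not from the paper): the magnitude is the classifier's confidence in (0, 1],
    so positive segments map to (0, 1] and negative ones to [-1, 0).
    """
    if not 0.0 < confidence <= 1.0:
        raise ValueError("confidence must be in (0, 1]")
    if label == "positive":
        return confidence
    if label == "negative":
        return -confidence
    raise ValueError(f"unknown label: {label!r}")

# A 45-minute recording at a (deliberately low) 100 Hz sample rate yields 90 segments
segments = split_audio([0.0] * (100 * 45 * 60), sample_rate=100)
```

Cutting at the same 30-second interval as the screenshots keeps the image and voice series aligned sample by sample, which the later comprehensive score relies on.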

4.3. Experimental Visualization Results
4.3.1. Sentiment Analysis Results of Sample Teaching Video Image Data

During the experiment, the image data in the video were analyzed statistically to obtain the emotion change curves of the teacher and students over the whole class, as shown in Figure 8.

Figure 8(a) shows the teacher's emotion changes based on image data, and Figure 8(b) shows the students'. The horizontal and vertical axes are time and emotion evaluation value, respectively. Figure 8 shows that the teacher's emotional state remained stable and non-negative throughout: the emotion score stayed between 0.5 and 1.5 in the first half of the class and between 0 and 0.5 in the second half. The students' mood was very positive but unstable early in the class, and their emotion score later stabilized between −0.5 and 0.5. The experiment uses the following emotion evaluation formulas:

Positive emotion frequency = number of samples whose emotion score is greater than 0.5.

Positive emotion time = collection time interval × positive emotion frequency.

Positive emotion rate = positive emotion frequency / total number of samples.

Class emotion score = sum of all emotion scores / total number of samples.

Applying these formulas to Figures 8(a) and 8(b) gives the positive emotion rates of the teacher and students based on image data and the class emotion scores, which are listed in the teacher and student emotion evaluation tables (Tables 2 and 3). Emotion is classified as positive when the positive emotion rate exceeds 50%, neutral when it is between 30% and 50%, and negative when it is below 30%.
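The evaluation formulas and the 50%/30% classification rule translate directly into code. In this sketch the function names are hypothetical, the threshold is a parameter (0.5 for image-based scores here, and usable with 0 for the voice-based scores), and a rate of exactly 50% is treated as neutral, following the "greater than 50%" wording. The demo series is synthetic, not the experiment's data.

```python
from typing import Dict, Sequence

def emotion_metrics(scores: Sequence[float], threshold: float, interval_s: float = 30.0) -> Dict[str, float]:
    """Apply the emotion evaluation formulas to one score series.

    threshold is 0.5 for image-based scores and 0 for voice-based scores.
    """
    n = len(scores)
    frequency = sum(1 for s in scores if s > threshold)  # positive emotion frequency
    return {
        "positive_frequency": frequency,
        "positive_time_s": interval_s * frequency,      # interval x frequency
        "positive_rate": frequency / n,                  # frequency / number of samples
        "class_score": sum(scores) / n,                  # mean emotion score
    }

def emotion_tendency(positive_rate: float) -> str:
    """> 50 % positive, 30-50 % neutral, < 30 % negative."""
    if positive_rate > 0.5:
        return "positive"
    if positive_rate >= 0.3:
        return "neutral"
    return "negative"

# Synthetic series (not the paper's data): 52 of 90 image scores above 0.5
demo = [1.0] * 52 + [0.0] * 38
m = emotion_metrics(demo, threshold=0.5)
```

With 52 of 90 samples above the threshold, the positive emotion rate is 52/90 ≈ 57.8% and the positive emotion time is 52 × 30 s = 26 minutes.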

The tables show that the teacher's positive emotion rate is 52/90 × 100% = 57.8% and the emotion score is 0.3820 (a score sum of about 34 divided by 90 samples), so the teacher's emotion is positive. The students' positive emotion rate is 40/90 × 100% = 44.5% and the emotion score is 0.1386 (a score sum of about 12 divided by 90 samples), so the students' emotion is neutral.

4.3.2. Sentiment Analysis Results of Sample Teaching Video Voice Data

During the experiment, the voice data in the video were analyzed statistically to obtain the emotion change curves of the teacher and students over the whole class, as shown in Figure 9.

Figure 9(a) shows the teacher's emotion changes based on voice data, and Figure 9(b) shows the students'. The horizontal and vertical axes are time and emotion evaluation value, respectively. The experiment uses the following emotion evaluation formulas:

Positive emotion frequency = number of samples whose emotion score is greater than 0; positive emotion time = collection time interval × positive emotion frequency; positive emotion rate = positive emotion frequency / total number of samples; class emotion score = sum of all emotion scores / total number of samples.

Applying these formulas to Figures 9(a) and 9(b) gives the positive emotion rates of the teacher and students based on voice data and the class emotion scores, which are listed in the teacher and student emotion evaluation tables (Tables 4 and 5).

The tables show that the teacher's positive emotion rate is 48/90 × 100% = 53.3% and the emotion score is 0.3048 (a score sum of about 27 divided by 90 samples), so the teacher's emotion is positive. The students' positive emotion rate is 46/90 × 100% = 51.1% and the emotion score is 0.2842 (a score sum of about 26 divided by 90 samples), so the students' emotion is also positive.

4.3.3. Comprehensive Sentiment Analysis of Example Teaching Videos

This research integrates the image data and the voice data. Combining the two, the experiment uses the following comprehensive emotion evaluation formulas:

Comprehensive emotion score = teacher evaluation score × 0.5 + student evaluation score × 0.5.

Teacher evaluation score = teacher image emotion score × 0.5 + teacher voice emotion score × 0.5.

Student evaluation score = student image emotion score × 0.5 + student voice emotion score × 0.5.
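The equal-weight fusion above can be sketched in a few lines. Applying it point by point requires the four score series to be aligned on the same 30-second grid, which is our reading of how the overall curve is built; the function names are hypothetical. The final line plugs in the class-level scores reported for Tables 2–5 purely as an arithmetic illustration; the paper itself does not report this combined figure.

```python
from typing import List, Sequence

def comprehensive_score(teacher_image: float, teacher_voice: float,
                        student_image: float, student_voice: float) -> float:
    """Equal-weight fusion: 0.5 * teacher + 0.5 * student, each 0.5 * image + 0.5 * voice."""
    teacher = 0.5 * teacher_image + 0.5 * teacher_voice
    student = 0.5 * student_image + 0.5 * student_voice
    return 0.5 * teacher + 0.5 * student

def comprehensive_series(t_img: Sequence[float], t_voice: Sequence[float],
                         s_img: Sequence[float], s_voice: Sequence[float]) -> List[float]:
    """Apply the fusion at every 30-s sampling point (series must be aligned)."""
    if not len(t_img) == len(t_voice) == len(s_img) == len(s_voice):
        raise ValueError("all four series must have the same length")
    return [comprehensive_score(*v) for v in zip(t_img, t_voice, s_img, s_voice)]

# Illustration with the class-level scores reported for Tables 2-5
print(round(comprehensive_score(0.3820, 0.3048, 0.1386, 0.2842), 4))  # 0.2774
```

Because every weight is 0.5, the comprehensive score is simply the mean of the four component scores.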

Applying the comprehensive emotion evaluation formulas to the image and voice data of the teacher and students yields the overall emotion change curve of the classroom, shown in Figure 10.

Figure 10 shows that classroom emotion varies between −1 and 1 and is mostly positive. Judging from the overall trend, this is a positive class whose emotion does not fluctuate much. During minutes 7–13 and 28–40 the students' mood is negative, whereas during minutes 1–8, 13–28, and 40–45 it is very positive. When students' mood turns negative, teachers can stimulate their interest to extend the positive emotional time in class.
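Intervals such as the negative stretches read off Figure 10 can be located automatically from a 30-second score series. The sketch below reports each run of consecutive negative scores as a (start, end) pair in minutes; the function name and the convention that sample i covers [30i, 30(i + 1)) seconds are our assumptions, and the demo series is synthetic rather than the experiment's data.

```python
from typing import List, Sequence, Tuple

def negative_intervals(scores: Sequence[float], interval_s: float = 30.0) -> List[Tuple[float, float]]:
    """Return (start_min, end_min) for every run of consecutive negative scores.

    Sample i is assumed to cover [i * interval_s, (i + 1) * interval_s) seconds.
    """
    runs = []
    start = None
    for i, s in enumerate(scores):
        if s < 0 and start is None:
            start = i                      # a negative run begins
        elif s >= 0 and start is not None:
            runs.append((start, i))        # the run ended at the previous sample
            start = None
    if start is not None:                  # run lasting until the end of class
        runs.append((start, len(scores)))
    minutes_per_sample = interval_s / 60.0
    return [(a * minutes_per_sample, b * minutes_per_sample) for a, b in runs]

# Synthetic 90-sample series (not the paper's data) with two negative stretches
demo = [0.5] * 14 + [-0.2] * 12 + [0.4] * 30 + [-0.1] * 24 + [0.3] * 10
print(negative_intervals(demo))  # [(7.0, 13.0), (28.0, 40.0)]
```

A teacher-facing interface could use such intervals as cues for when to change activity, in line with the suggestion that teachers act during negative moments.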

A comprehensive analysis of classroom teaching cannot rely on speech and image recognition alone; it must also draw on action recognition, facial expression recognition, and other methods to analyze teaching from multiple angles [25]. Current teaching data collection and statistical analysis methods limit this: at present, it is not possible to pinpoint the exact time points of teaching activities in the data, so comprehensive teaching evaluation data analysis cannot yet be carried out effectively [26].

5. Conclusion

The combination of music and artificial intelligence is a meeting of art and technology in a new era and opens up many possibilities. Starting from the general problems of music teaching, the article highlights the need for a systematic evaluation process for music teaching. It then analyzes the feasibility of combining artificial intelligence with music teaching and, on that basis, constructs an artificial intelligence-based music teaching evaluation system. Finally, it examines the practical utility of this evaluation system. The experimental results suggest that combining music and artificial intelligence can reduce human and material costs and improve the quality and level of music teaching. The artificial intelligence-based music teaching evaluation system also offers a reference for reforming and upgrading teaching models in other fields. The research also has shortcomings. A full evaluation of the music classroom cannot be limited to image and speech recognition but depends on many kinds of data collection and analysis, so the evaluation of other indicators needs to be added to the classroom teaching analysis framework.

Data Availability

No data were used to support this study.

Conflicts of Interest

The author declares that there are no conflicts of interest regarding the publication of this article.