This paper analyzes the basis for applying AI technology to music teaching and describes how music features are extracted. An intelligent music teaching model based on the RBF algorithm is then constructed to determine the mode of music learning, and an intelligent music teaching system is designed around it. The system supports effective interaction between teachers and students, helps learners improve their singing, and plays an auxiliary role in students’ learning of music knowledge.

1. Introduction

Learning music is important for students’ long-term development, and music is a compulsory course in basic education. Besides imparting theoretical knowledge, music teaching should pay more attention to cultivating students’ music skills. However, the present state of music teaching does not meet expectations. Traditional music teaching follows the class teaching system, in which there are few music lessons per week and students must learn theory and practice singing at the same time [1,2]. With the rapid development of the economy and the Internet, intelligent electronic musical instruments are constantly being introduced [3]. These instruments can not only store a wide variety of instrument timbres but also arrange them effectively, so that they can perform music in an orderly way according to the corresponding instructions. Such functions are difficult for traditional musical instruments to realize, which gives AI a unique advantage in music education. Digitizing sound signals through AI technology makes them convenient not only to preserve and reproduce but also to characterize [4,5]. Digital audio is easy to record, store, and distribute, so it can be combined with traditional music teaching to help teachers give feedback and guidance on students’ singing and to compensate for the limited class hours and shortage of teachers in traditional music teaching.

In addition, with AI music software, music tasks that used to be edited with music synthesizers or by music practitioners need only be handed over to the computer, which greatly improves the processing capacity for music data and broadens the storage space for music information. Users can freely edit, adjust, record, and apply AI processing to a variety of musical elements [6–8]. The application of AI music software in music teaching provides an interactive platform for teachers and students to teach and learn, which greatly changes traditional music teaching methods.

In view of the present situation of traditional music teaching in the basic education stage, such as students’ weak music foundation, large differences in individual learning progress, lack of music equipment, shortage of teachers, and limited music class hours, this study integrates AI technology into music teaching, aiming to develop a music teaching system based on feature comparison.

2. Theoretical Basis of AI in Music Education

2.1. Emotional Interaction Theory

Emotion is a special way of human thinking that involves a complex operating mechanism. Endowing machines with emotion is one of the technical problems of AI, and its application to music teaching rests mainly on the six dimensions of the emotion machine, namely consciousness, mental activity, common sense, thinking, intelligence, and self [9]. This theory explains the complex mechanism of the human brain and confirms the possibility of applying the emotion machine to music teaching. Emotional interaction is emotion computing based on AI, which endows computers or machines with a human-like ability to observe, understand, and express various emotions, so that communication with machines or computers becomes more natural and convenient. For personalized music education, AI teachers are used to solve the problems that students encounter, and when an error occurs in music practice, the intelligent system points out the error, as shown in Figure 1.

Music learners access the learning environment, generally a learning platform, by logging in to the client and then present their learning results to the teachers through human-machine interaction. The teachers, in turn, refine the teaching model according to these results and improve the learners’ learning awareness and learning ability based on negative feedback, thus forming an interactive closed loop. Compared with traditional evaluation methods, the AI-based teaching system adopts developmental feedback and applies emotional evaluation appropriately to improve teaching efficiency for music learners.

2.2. Evaluation Criteria

Although it is difficult for computers to perceive people’s emotions, intonation, rhythm, and breath smoothness can be detected. For intonation, this study uses a pitch feature sequence. The sound signal is divided along the time axis into sufficiently short frames, and the pitch of each frame is extracted to generate a sequence of sound features. In addition, the length of a note can be calculated by counting the number of consecutive frames with the same pitch value.
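
The note-length calculation described above can be sketched as a run-length pass over the per-frame pitch sequence. This is a minimal illustration, not the authors' implementation: the function name `note_lengths`, the use of integer pitch values, and the 10 ms frame duration are all assumptions made for the example.

```python
from itertools import groupby


def note_lengths(pitch_seq, frame_ms=10.0):
    """Collapse a per-frame pitch sequence into (pitch, frames, duration_ms) runs.

    frame_ms is an assumed frame duration; the paper does not state one.
    """
    notes = []
    for pitch, run in groupby(pitch_seq):
        n_frames = sum(1 for _ in run)  # consecutive frames sharing this pitch
        notes.append((pitch, n_frames, n_frames * frame_ms))
    return notes


# Hypothetical pitch sequence: three frames of one pitch, two silent, four of another.
example = note_lengths([60, 60, 60, 0, 0, 62, 62, 62, 62])
```

Each tuple gives a pitch value, the number of consecutive frames that share it, and the resulting note length in milliseconds.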

For rhythm, the judgment is made after the pitch feature sequences of the singing signal and the template music have been extracted. Rhythm is judged by comparing the template pitch and the singing pitch at the same time instant. If the pitch value of the template music is 0 but that of the singing voice is not, or the pitch value of the template music is not 0 but that of the singing voice is, and this situation persists for more than a dozen or even dozens of consecutive frames, it can be judged that the singing rhythm does not match the template.
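
One way to read the rhythm rule is as a search for long runs of frames in which exactly one of the two sequences is silent. The sketch below follows that reading; it assumes that a pitch value of 0 marks a silent frame, and the function name `rhythm_mismatch_runs` and the default threshold `min_run=12` are hypothetical choices, not values taken from the paper.

```python
def rhythm_mismatch_runs(template, sung, min_run=12):
    """Return (start_frame, length) for runs where only one sequence is silent.

    A frame counts as a mismatch when the template pitch is 0 and the sung
    pitch is not, or the other way round. Runs shorter than min_run frames
    are ignored (min_run is an assumed threshold).
    """
    runs = []
    start = None
    n = min(len(template), len(sung))
    for i in range(n):
        mismatch = (template[i] == 0) != (sung[i] == 0)
        if mismatch and start is None:
            start = i  # a new mismatch run begins
        elif not mismatch and start is not None:
            if i - start >= min_run:
                runs.append((start, i - start))
            start = None
    if start is not None and n - start >= min_run:
        runs.append((start, n - start))  # run reaches the end of the audio
    return runs
```

A non-empty result indicates sections where the singer entered or stopped at a different time from the template.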

2.3. Extraction of Music Feature
2.3.1. Pitch Characteristics

Taken as a whole, sound changes constantly with time, so it cannot be analyzed and processed by ordinary methods. Over a short time range, however, its characteristics remain essentially unchanged, so the speech signal is considered to have short-term stationarity. Based on this property, a rectangular window or Hamming window is applied to split the speech signal into a series of short segments, each of which is a frame, and the pitch period of each frame is then calculated, as shown in Figure 2.
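
A minimal sketch of this framing step follows. It applies a Hamming window to overlapping frames and estimates the pitch period of a frame by picking the autocorrelation peak. The paper does not say which pitch estimator it uses, so autocorrelation is an assumption here, as are the function names and the frame length, hop size, and lag range in the usage lines.

```python
import math


def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def split_frames(signal, frame_len, hop):
    """Split a signal into overlapping frames, each multiplied by a Hamming window."""
    win = hamming(frame_len)
    return [
        [signal[start + i] * win[i] for i in range(frame_len)]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def pitch_period(frame, min_lag, max_lag):
    """Estimate the pitch period in samples as the lag with the largest autocorrelation."""
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


# Usage with a synthetic 200 Hz tone sampled at 8000 Hz (period of 40 samples).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
tone_frames = split_frames(tone, frame_len=400, hop=200)
period = pitch_period(tone_frames[0], min_lag=20, max_lag=100)
```

Dividing the sampling rate by the estimated period gives the pitch of the frame in hertz.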

2.3.2. Melody Features

The students’ singing voice is judged by comparing it with the standard music. MIDI files are instruction-based music files, similar to music scores, from which accurate music information can be extracted directly. Taking MIDI files as the template for singing comparison therefore makes the judgment more accurate and more reliable, and the extracted results serve as the music templates against which students’ singing is assessed. A MIDI file is composed of a header block and one or more track blocks. One of the tracks, the main track, contains the singing score information of the song. If the file has only one track, that track is the main track; if it has multiple tracks, the main track must first be identified and extracted from the song file before the melody features are extracted.
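
The header-and-tracks layout can be illustrated by splitting the raw bytes of a Standard MIDI File into its `MThd` header fields and `MTrk` track chunks. The paper does not describe how the main track is judged, so `pick_main_track` below uses a placeholder rule (the largest track chunk) that is purely an assumption; both function names are hypothetical.

```python
import struct


def parse_midi_chunks(data):
    """Split Standard MIDI File bytes into header fields and track chunk bodies."""
    if data[:4] != b"MThd":
        raise ValueError("not a Standard MIDI File")
    header_len = struct.unpack(">I", data[4:8])[0]
    fmt, ntrks, division = struct.unpack(">HHH", data[8:14])
    pos = 8 + header_len
    tracks = []
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        length = struct.unpack(">I", data[pos + 4:pos + 8])[0]
        if chunk_id == b"MTrk":
            tracks.append(data[pos + 8:pos + 8 + length])
        pos += 8 + length  # unknown chunk types are skipped
    return {"format": fmt, "ntrks": ntrks, "division": division, "tracks": tracks}


def pick_main_track(tracks):
    """Return the index of the main track.

    A single track is the main track. For several tracks, the largest chunk
    is chosen; this is a placeholder heuristic, not the paper's method.
    """
    if not tracks:
        raise ValueError("no tracks found")
    return max(range(len(tracks)), key=lambda i: len(tracks[i]))
```

The chosen track would then be decoded into note events, from which the melody pitch sequence is derived.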

2.4. Application of FMS

Flash Media Server (FMS) is mainly used as a platform for users to communicate with each other and features multimedia interaction, real-time audio, real-time video, and real-time data streams. After FMS3 is installed on the Windows platform, an applications folder appears under the installation directory, and all server applications are placed in it. The folder name must be consistent with the application name; that is, if the application name is “micRecord,” it must be placed in a folder called “micRecord.” The FlashDevelop software is then used for programming to connect the microphone to FMS. The flow chart is shown in Figure 3.

The program is written in AS3, and its function is to generate WAV music files according to the WAV format standard. The sound signal is then saved as an audio file in WAV format, which makes it convenient for students to retrieve the original singing and their own singing for comparative analysis. Before the AS3WavSound program is used to generate a WAV file, several presets can be chosen freely: mono or stereo; a sampling frequency of 11025 Hz, 22050 Hz, or 44100 Hz; and a sample bit depth of 8 bit or 16 bit.
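
The system itself generates WAV files in AS3. As an illustration only, the same presets (channel count, sampling frequency, bit depth) can be expressed with Python's standard `wave` module. The function name `write_wav` and the convention of passing samples as floats in the range -1 to 1 are assumptions made for this sketch.

```python
import io
import struct
import wave


def write_wav(samples, channels=1, sample_rate=44100, bits=16):
    """Encode float samples in [-1, 1] as WAV bytes (interleaved if stereo)."""
    if channels not in (1, 2):
        raise ValueError("channels must be 1 (mono) or 2 (stereo)")
    if sample_rate not in (11025, 22050, 44100):
        raise ValueError("unsupported sampling frequency")
    if bits not in (8, 16):
        raise ValueError("bits must be 8 or 16")
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    if bits == 16:
        # 16-bit WAV samples are signed little-endian integers.
        payload = b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)
    else:
        # 8-bit WAV samples are unsigned, centred near 128.
        payload = bytes(int((s + 1.0) * 127.5) for s in clipped)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(bits // 8)
        wav.setframerate(sample_rate)
        wav.writeframes(payload)
    return buf.getvalue()
```

Writing to an in-memory buffer keeps the sketch free of file paths; the returned bytes can be saved to disk or sent to a server as needed.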

3. Intelligent Music Teaching Model

The intelligent music teaching system proposed in this paper adopts the RBF algorithm, that is, a neural network composed of locally tuned neurons [10,11]. The network generally has five layers, as shown in Figure 4.

The first layer is the information factors related to the case, and these inputs can be summarized into different music item indicators, which are input into the neural network structure.

The second layer is the membership function, and its mathematical expression is as follows:

The third layer is the rule layer, and its number of nodes is the number of fuzzy rules. The rules are learned from the samples so that as few rules as possible, and only the most important ones, are retained. The output of the j-th rule is given by the following formula, in which the center of the j-th RBF unit appears as a parameter. A characteristic of the RBF neural network is that the closer an input is to the center of a neuron, the higher the activation of that neuron, which is consistent with the way influencing factors act in the interactive music learning mode.

The fourth layer is the normalization layer. The nodes of this layer should be consistent with the fuzzy rule nodes, and the output of the j-th node is shown in the following formula:

The fifth layer is the output layer, which outputs the evaluation of each skill in music performance. It is based mainly on the TS fuzzy model, and its output is given by the following formula, in which the connection weight of the k-th rule is the sum of the weighted products of the output variables, as shown in the following formula:
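
For orientation, the layer computations of a fuzzy RBF network with a TS consequent are commonly written in the form sketched below. This is a standard textbook form offered under assumptions, not necessarily the authors' exact formulas: the symbols x_i (inputs), c_ij (centers), sigma_j (widths), r (number of inputs), u (number of rules), and a_ki (TS coefficients) are introduced here for illustration, and the rule weight w_k is taken as a linear function of the inputs, as in the standard TS model.

```latex
\begin{align}
\mu_{ij}(x_i) &= \exp\!\left(-\frac{(x_i - c_{ij})^2}{\sigma_j^2}\right),
  \quad i = 1,\dots,r,\; j = 1,\dots,u
  && \text{(layer 2: membership)} \\
\varphi_j &= \prod_{i=1}^{r} \mu_{ij}(x_i)
  = \exp\!\left(-\frac{\lVert X - C_j \rVert^2}{\sigma_j^2}\right)
  && \text{(layer 3: rule output)} \\
\psi_j &= \frac{\varphi_j}{\sum_{k=1}^{u} \varphi_k}
  && \text{(layer 4: normalization)} \\
y &= \sum_{k=1}^{u} w_k \, \psi_k
  && \text{(layer 5: output)} \\
w_k &= a_{k0} + a_{k1} x_1 + \cdots + a_{kr} x_r
  && \text{(TS rule weight)}
\end{align}
```

Here X = (x_1, ..., x_r) is the input vector and C_j = (c_1j, ..., c_rj) is the center of the j-th RBF unit, so the rule output is largest when the input coincides with the center.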

The music learning mode based on the RBF algorithm integrates the idea of the algorithm into the design of the platform and implements it in the code behind the platform interface, so that the platform matches the interactive learning mode effectively. Among them, X is the proportion of learning time consumed by 100 students in the music system; and refer to the distribution of hidden layer in each music learning courseware for excellent students in X. The hidden layer category of these learning samples is relatively parallel; in addition, Y is the best music score corresponding to each input layer.

To simplify the RBF algorithm, the second, third, and fourth layers can be grouped as hidden layers, while the first and fifth layers serve as the input layer and the output layer, respectively, with the inputs covering different aspects of music teaching. For large-scale data, the first m items of the music data are used for initial training, and an RBF model of students’ learning of music knowledge can then be constructed, as shown in Figure 5.
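
The simplified input, hidden, and output view can be sketched as a single forward pass. The code below assumes Gaussian rule units with one width per rule and a TS-style linear rule weight; these choices, the function name `fuzzy_rbf_forward`, and all parameter names are assumptions for illustration, and no training procedure is shown.

```python
import math


def fuzzy_rbf_forward(x, centers, widths, coeffs):
    """Forward pass of a small fuzzy RBF network with a TS-style output.

    x       : list of r input indicators
    centers : one list of r center coordinates per rule
    widths  : one positive width per rule
    coeffs  : one list of r + 1 TS coefficients (bias first) per rule
    Returns (y, psi): the network output and the normalized rule activations.
    """
    # Hidden layers: Gaussian activation of each rule, then normalization.
    phi = [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (s ** 2))
        for c, s in zip(centers, widths)
    ]
    total = sum(phi)
    psi = [p / total for p in phi]
    # Output layer: each rule contributes a linear function of the inputs.
    w = [a[0] + sum(ai * xi for ai, xi in zip(a[1:], x)) for a in coeffs]
    y = sum(wk * pk for wk, pk in zip(w, psi))
    return y, psi
```

Rules whose centers lie closer to the input receive larger normalized activations and therefore dominate the output.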

4. Design of Intelligent Music Teaching System

4.1. Demand Analysis

This system has two main groups of target users. The first is students, especially those in basic education. Because their ability to construct knowledge is not yet fully developed, they need help in understanding abstract music concepts such as pitch, melody, whole tone, and semitone. In the process of music learning, they also need to train pronunciation and correct intonation. In addition, because music teachers are scarce in some remote areas, students there urgently need a channel that provides professional guidance for their theoretical study and singing.

The second is music teachers. They should have rich professional knowledge of music and be able to guide students in singing training, helping them practice pitch and intonation and improve their singing level.

Based on the characteristics and needs of the target users, this system places its emphasis on education. In its design and development, more emphasis should therefore be placed on the standardization of reference audio, the accuracy of data processing, and the professionalism of feedback guidance. Accordingly, the system should provide the following functions: selecting songs, listening to songs, recording singing, grading, correcting errors, saving recordings and evaluation results, comparing original singing with self-singing, and uploading music templates.

4.2. Frame Design

According to the demand analysis of the music teaching system, the design of the systematic framework is shown in Figure 6.

The user module converts the analog signal of the student’s singing voice into a digital signal and transmits it to the audio feature extraction module, which extracts the pitch features of the input audio to obtain the singing pitch sequence to be compared. Meanwhile, after the student selects a track, the system retrieves the template audio sequence of that track from the music feature library. The music feature library covers all songs in the template music library and is generated by the feature extraction module.

After singing, the similarity comparison module compares the singing pitch sequence with the template pitch sequence and finally obtains the short-term score and the total score. Afterwards, the feedback module lists the five items with the lowest short-term scores, determines the causes of errors according to the pitch data, and gives improvement strategies to learners, so that students can practice singing next time.
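
Selecting the five weakest sections for feedback can be sketched with the standard `heapq` module. The function name `weakest_segments` and the representation of short-term scores as a plain list indexed by section are assumptions made for this example.

```python
import heapq


def weakest_segments(short_term_scores, k=5):
    """Return (segment_index, score) pairs for the k lowest scores, worst first."""
    return heapq.nsmallest(k, enumerate(short_term_scores), key=lambda item: item[1])


# Hypothetical short-term scores for eight sections of a song.
worst = weakest_segments([90, 40, 75, 60, 95, 55, 80, 30])
```

The section indices identify where in the song the feedback module should analyze errors and offer the original singing for comparison.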

The operation process of users is as follows:
(1) Select songs through the user input interface, then enter the singing stage, and grasp the overall rhythm according to the information such as song mode, beat, and speed displayed on the interface
(2) Sing it through the microphone at the right time when listening to the accompaniment of the song
(3) After singing, the system will give the singing score, the reasons for mistakes, and suggestions for improvement

In addition, students can select a section of audio according to the visual pitch curve, freely switch between self-singing and original singing, visually and audibly compare the difference between self-singing and original singing, and correct mistakes.

4.3. Design of Functional Modules
4.3.1. Template Music Library Module

At present, the accuracy of extracting the main melody from polyphonic music is low, and scoring results and feedback can be credible only if the template data are accurate. Because this system is designed for music education, it requires highly accurate music data, so polyphonic music should not be used to build the database. Common music feature databases are built mainly from three types of sources: WAV files, music score information, and MIDI analysis.

This system uses MIDI to build its database. In addition to the large amount of music data that comes with the system, music teachers can upload MIDI files to expand the music library. Teachers can obtain MIDI files in the following ways:
(1) Downloading MIDI files from the network. MIDI files are small and easy to store, distribute, and produce, so many MIDI files are available for download.
(2) Creating MIDI music. MIDI music can be made with software such as Sonar or with devices that have a MIDI interface, such as an electronic keyboard, connected to the computer. When music is played, the MIDI data are input directly into the computer through a sound card and synthesized into MIDI files.

In addition, in the music template library, the extracted MIDI main melody pitch feature sequence of a song is associated with the song name, accompaniment, and lyrics and stored in the database together. When the user selects the song, the system can directly call up the template pitch data for comparison.
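
The association between a song and its template data can be sketched as a single database table. The paper does not name a database engine or schema, so the in-memory SQLite database, the table and column names, and the JSON encoding of the pitch sequence below are all assumptions.

```python
import json
import sqlite3


def open_library():
    """Create an in-memory template library (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE template ("
        "song_name TEXT PRIMARY KEY, "
        "accompaniment_path TEXT, "
        "lyrics TEXT, "
        "pitch_sequence TEXT)"
    )
    return conn


def add_template(conn, song_name, accompaniment_path, lyrics, pitch_sequence):
    """Store a song with its accompaniment, lyrics, and main-melody pitch sequence."""
    conn.execute(
        "INSERT INTO template VALUES (?, ?, ?, ?)",
        (song_name, accompaniment_path, lyrics, json.dumps(pitch_sequence)),
    )


def load_pitch_sequence(conn, song_name):
    """Return the template pitch sequence for a song, or None if it is not stored."""
    row = conn.execute(
        "SELECT pitch_sequence FROM template WHERE song_name = ?", (song_name,)
    ).fetchone()
    return None if row is None else json.loads(row[0])
```

When a user selects a song, a lookup by song name returns the template pitch sequence ready for comparison.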

4.3.2. User Input Module

The most important function of the module is real-time recording, which converts the analog signals of students’ singing practice into digital signals and stores them for later processing. Besides, it also includes some interactive functions, such as selecting practice tracks, playing accompaniment, starting, pausing, ending, and other control functions, and some display functions, such as displaying the name and duration of the selected song, as well as melody information such as mode, beat, and speed.

4.3.3. Music Feature Extraction Module

The feature extraction module extracts the feature information of music, which supports similarity comparison and serves as the basis for error correction. Pieces of music differ mainly in melody, which is determined by pitch, note length, and rhythm. Using note length as a feature requires segmenting notes, and accurate note segmentation has always been a difficult point in audio processing. For the audio input by the user in particular, segmentation errors would reduce the credibility of the later comparison results. Therefore, this system uses pitch as the feature vector to represent music. Since audio signals correspond exactly to time, the note length can be understood as the number of frames with the same pitch.

4.3.4. Similarity Comparison Module

The similarity comparison module compares the extracted singing pitch feature sequence with the template pitch feature sequence and produces the scoring result, in order to evaluate whether the student’s singing is accurate with respect to the template. While the student is singing, the pitch sequences of the singing and the template audio are compared section by section, with a fixed duration as the unit, to obtain short-term singing scores. The system sets this fixed duration to a preset number of seconds. The short-term scores allow timely feedback to students during singing and help the system locate errors. After the singing is finished, the system normalizes the vocal data and calculates the total score.
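
Section-by-section scoring can be sketched as follows. The paper does not give its scoring formula, so this example assumes a simple rule: a frame counts as correct when the sung pitch is within a tolerance of the template pitch, a section score is the percentage of correct frames, and the total score is the mean of the section scores. The function names, `frames_per_segment`, and `tolerance` are hypothetical.

```python
def segment_scores(template, sung, frames_per_segment, tolerance=1):
    """Score each fixed-length section as the percentage of frames on pitch."""
    n = min(len(template), len(sung))
    scores = []
    for start in range(0, n, frames_per_segment):
        end = min(start + frames_per_segment, n)
        hits = sum(
            1
            for t, s in zip(template[start:end], sung[start:end])
            if abs(t - s) <= tolerance
        )
        scores.append(100.0 * hits / (end - start))
    return scores


def total_score(scores):
    """Average the section scores into a total (0.0 if there are no sections)."""
    return sum(scores) / len(scores) if scores else 0.0
```

The index of a low section score multiplied by the section duration gives the approximate time at which the error occurred.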

4.3.5. Feedback Module

The feedback module is the most important module in this system. Students’ singing can really improve only if their problems are analyzed accurately and they are reminded to make targeted corrections. Learning to sing or perform is a kind of motor skill learning, which refers to the process of relatively lasting changes in motor ability brought about by practice or experience.

In this paper, the feedback includes not only an overall evaluation of the singing but also the lowest short-term scores, an analysis of the causes of errors, and suggestions for correction. In addition, a demonstration of the original singing at each error position is provided so that students can compare and then practice in a targeted way. Guidance, demonstration, practice, and feedback are thus carried out in a cycle.

The feedback mode adopted by this system is shown in Figure 7.

When designing the feedback methods, the advantages and disadvantages of immediate feedback and delayed feedback were considered. Immediate feedback is given in real time while students are singing or playing; delayed feedback is given after singing or playing. This system combines the two. The storage and analysis capabilities of delayed feedback make up for the brevity of immediate feedback, while immediate feedback is retained because delayed feedback cannot remind students to correct pitch on the spot.

5. Conclusion

Through the extraction of music features, this paper puts forward an intelligent music teaching model based on the RBF algorithm, determines the way of music learning, and designs an intelligent music teaching system. Based on the results of the requirement analysis, the overall system flow is designed, followed by the functional design of each module. The template music library module is built from MIDI music files to obtain more accurate comparison templates; the user input module uses FMS to record and save students’ singing in real time; the feature extraction module extracts features from both the singing audio and the template music, so that the similarity comparison module can produce accurate comparison results; and the feedback module helps learners improve their singing.

Data Availability

The dataset can be accessed upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.