Abstract

Owing to differences in cultural background, economic development, social system, and education system, a large gap remains between flute teaching in Chinese colleges and universities and that in developed countries, including our neighbors South Korea and Japan. Because of its local perception and weight-sharing structure, the convolutional neural network is closer to real-world biological neural networks. Weight sharing reduces the complexity of the network and avoids the complicated feature extraction and classification steps otherwise needed in data reconstruction. This paper studies the analysis and optimization of a flute playing and teaching system based on a convolutional neural network. Applying local receptive fields and parameter sharing together, and adding multiple filters, not only reduces the number of parameters effectively but also allows features to be extracted layer by layer. During convolution, the size of the feature maps produced by each layer decreases layer by layer, while their number gradually increases. Based on an analysis of the problems facing flute performance teaching, this paper puts forward corresponding solutions in order to help flute performance teaching in China achieve better results.

1. Introduction

The flute is not a traditional Chinese musical instrument; it was introduced from Western countries. It is light and portable and has a group of loyal enthusiasts in China. Some higher education institutions also offer flute-related courses, so that students can receive professional and systematic training in flute performance [1]. Owing to differences in cultural background, economic development, social system, and education system, a large gap remains between flute teaching in Chinese institutions and that in developed countries, including our neighbors South Korea and Japan. At the same time, China has few excellent flute teachers, let alone systematic training and teaching. As a result, most Chinese flute learners are weak in basic skills, with poor finger flexibility, inaccurate tone production, and incorrect breathing methods [2, 3]. Vibrato is one of the most important skills in flute playing. Correct and reasonable use of vibrato not only beautifies the timbre of the flute and enriches its artistic expression and appeal but also reflects, to a certain extent, the artistic style of the player. Therefore, everyone who studies the flute must master vibrato. In universities and training institutions at present, the quality of flute performance teaching is uneven. To help students pass graded examinations, teachers and students in many music schools and training institutions attend only to teaching progress and neglect teaching practice [1].

The personalized music recommendation system designed in this paper is based on a convolutional neural network and starts from an analysis of the characteristic data of flute performance. Such a system brings great convenience to music users and is a goal that every music software provider hopes to achieve, so it has important research significance and broad application prospects [2, 4]. Before 2006, the development of artificial neural networks can be roughly divided into two periods. In 1943, McCulloch and Pitts proposed the earliest artificial neuron, which marked the beginning of artificial neural networks. During this first period, researchers studied learning algorithms for a single neuron [4]. The convolutional neural network is closer to real-world biological neural networks because of its local perception and weight-sharing structure. Weight sharing reduces the complexity of the network and avoids the complicated feature extraction and classification steps otherwise needed in data reconstruction. In the mid-1980s, Nobel Prize winner John Hopfield proposed the Hopfield neural network model, a dynamic model that can be used to solve complex problems [5–7]. At the same time, the back-propagation algorithm for multilayer feedforward neural networks was rediscovered.

In this paper, a convolutional neural network is used to analyze the flute performance and teaching system. In machine learning, a convolutional neural network is a deep feedforward artificial neural network whose artificial neurons respond to the surrounding units within a local receptive field. It has been successfully applied to music recognition [7–10]. Like an ordinary neural network, it consists of an input layer, an output layer, and multiple hidden layers; the difference is that each layer is composed of several two-dimensional planes, and each plane is composed of many independent neurons. Convolutional neural networks have powerful feature extraction and feature learning capabilities. Starting from the initial features of the input, the network learns intermediate features through successive convolution layers and finally learns high-level features useful for the flute performance and teaching system [11–16]. Convolutional neural networks have been widely used in flute performance and teaching systems. The system studied in this paper is essentially a hybrid recommendation model based on music content and users’ historical behavior [17–21]. Music features extracted from the audio signal express the essential characteristics of the music, which not only fits people’s intuitive perception of music better but also effectively avoids the cold-start problem. This paper describes the components involved in implementing the recommendation algorithm, namely the latent factor model, matrix factorization, audio feature representation, and the architecture of the convolutional neural network model, in order to achieve better recommendations [22–26]. Applying local receptive fields and parameter sharing together, and adding multiple filters, not only reduces the number of parameters effectively but also allows features to be extracted layer by layer. During convolution, the size of the feature maps produced by each layer decreases layer by layer, while their number gradually increases [27–31].
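The effect of local receptive fields and parameter sharing on the parameter count can be illustrated with a short calculation. The following is a minimal sketch, not the network used in this paper: the 128 × 128 single-channel input, the 3 × 3 kernels, the 2 × 2 pooling, and the filter counts 16, 32, and 64 are all hypothetical values chosen for illustration.

```python
def dense_params(in_h, in_w, in_c, out_units):
    """Weights plus biases of a fully connected layer on a flattened input."""
    return in_h * in_w * in_c * out_units + out_units


def conv_params(k_h, k_w, in_c, n_filters):
    """Weights plus biases of a conv layer; each filter is shared across all positions."""
    return (k_h * k_w * in_c + 1) * n_filters


def conv_out_size(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical 128 x 128 single-channel spectrogram patch.
h = w = 128
shapes = []
for n_filters in (16, 32, 64):
    # 3 x 3 convolution followed by 2 x 2 pooling (both assumed).
    h = conv_out_size(h, 3) // 2
    w = conv_out_size(w, 3) // 2
    shapes.append((h, w, n_filters))

print("dense layer, 16 units  :", dense_params(128, 128, 1, 16))
print("conv layer, 16 filters :", conv_params(3, 3, 1, 16))
print("feature map shapes     :", shapes)
```

Under these assumptions the convolution layer needs orders of magnitude fewer parameters than a fully connected layer with the same number of outputs per position, and the feature maps shrink spatially while their number grows.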

2.1. Research Status at Home and Abroad

Lv J, Sun Q, and Li Q pointed out that vibrato is an important skill not only for the flute, oboe, bassoon, and saxophone but also for other wind instruments such as the horn, trumpet, and trombone; for stringed instruments, playing without vibrato is even harder to imagine. Guo Y, He Y, and Song H pointed out that, compared with foreign students of the same age, most flute students in China have had no systematic training, their playing skills are not up to standard, and many have not correctly understood and grasped the style of Western music. Yao P A, Rh A, and Hy A pointed out that the flute is one of the most important wind instruments and that a performance depends mainly on control of the mouth and on playing posture; the whole process requires not only good listening but also coordinated movements to control the breath. Alvarez J and Leon A pointed out that, for teaching basic flute performance in colleges, the ability level of teachers should be addressed first, because only teachers with the relevant professional qualities can complete the teaching task well. Basic flute playing is relatively difficult, so it deserves more attention. Xu C, Yang J, and Lai H pointed out that China has few excellent flute teachers, so many teachers pay little attention to cultivating students’ musical literacy and basic skills, or even their playing style. In short, Chinese flute teaching currently faces many problems, which is an important factor restricting the sustainable development of the flute in China. Bl A, Adfc B, and Rc C pointed out that many Chinese colleges and universities currently regard basic flute performance as one aspect of students’ comprehensive quality education. However, limited by the school’s own level, a shortage of excellent teachers weakens the teaching staff, and teachers’ overall competence cannot keep up, which directly affects the teaching of basic flute performance. Ishino M pointed out that, since new media technology has been combined with flute teaching, many teachers use new media to explain music theory and the essentials of tone production, which helps flute lovers make breakthroughs in playing technique. Meenakshi, Khosla, and Keith pointed out that many colleges and universities in China currently regard flute playing as one aspect of students’ comprehensive quality education. However, many schools have no professional flute teachers, or have teachers whose skills and overall competence are weak. This leads directly to poor teaching of flute playing, so that most students do not receive systematic training, which is very unfavorable for their future development. Satar H. M. and Wigham C. R. proposed that basic flute performance teaching in contemporary China continues the traditional teaching mode, which is relatively inflexible in form, so corresponding reasonable plans need to be made for teaching. According to Guizzo E, Weyde T, and Tarroni G, new media technology should be used as an auxiliary teaching method. However, many flute teachers rely too heavily on new media technology and pay too little attention to students’ performance technique, so that many students fail to master the key points of performance and make technical mistakes when playing.

2.2. Research Status of Flute Playing and Teaching System Based on Convolutional Neural Network

This paper studies the analysis and optimization of a flute performance and teaching system based on a convolutional neural network. The cultivation of basic flute performance skills has two main aspects. First, students’ musical literacy and performance skills need to be improved. Musical literacy is the first requirement for learning music, followed by proficient and varied performance skills, which affect students’ ability in many respects. Second, students’ mastery of breathing skills needs to be improved. A correct breathing method is an important basis for mastering basic flute playing and directly affects the accuracy and quality of the sound during performance. Therefore, to provide students with systematic training and high-quality teaching, we must strengthen teacher training, improve teachers’ professional skills, and raise their musical literacy. A system based on a convolutional neural network is a virtual technology: it supports teaching and learning anytime and anywhere, without the limits of time and space. This advantage is very helpful for flute performance teaching. It makes it possible to guide students continuously, give them more systematic training, and improve the quality of flute performance classes, so as to lay a good foundation for students’ sustainable development.

3. Principle and Model of Convolutional Neural Network

A convolutional neural network is a feedforward neural network with convolution computation and a deep structure. Research on convolutional neural networks can be traced back to the 1980s and 1990s; the time-delay neural network (TDNN) and LeNet-5 are generally regarded as the earliest convolutional neural networks. In the system studied here, teachers and administrators can log in to the management end of the teacher system, upload exam tracks, set scoring weights, maintain school and student information, and create exams. The overall flow chart of the flute performance and educational art quality monitoring system based on a convolutional neural network is shown in Figure 1.

From the candidate’s perspective, the basic operation process can be divided into the following five steps:

(1) The candidate logs in to the test terminal, debugs the equipment, and confirms that the recording equipment works.
(2) The candidate checks the tracks to be played.
(3) The candidate prepares for the audition.
(4) The candidate performs formally and can adjust according to the beat prompt.
(5) The data are submitted to the background server.

The convolutional neural network can supplement the information sources of the flute performance and teaching system. To a certain extent, it alleviates the cold-start, sparsity, and scalability problems common in recommendation systems and better meets the growing demand for personalized applications. The overall system design block diagram is shown in Figure 2.

As can be seen from Figure 2, the system mainly includes a user modeling module, a music feature extraction module, and a recommendation algorithm module. The user modeling module collects the historical behavior data of music users in the system and constructs the user preference feature model. The music feature extraction module preprocesses the performance content and extracts spectral features, in preparation for training the convolutional neural network to obtain a regression model that predicts the latent features of a flute performance. The recommendation algorithm module calculates the matching degree between users and music from the latent features predicted by the regression model together with the user preference features, and finally generates a recommendation list of music that users may be interested in. The basic characteristics used in music processing include pitch, loudness, and timbre. Pitch is the parameter that people perceive most intuitively. It is determined by the frequency of the sound signal and is measured in hertz. The higher the pitch, the sharper the sound feels, which is why girls’ voices generally sound sharper than boys’ in daily life. Loudness intuitively reflects the volume of the sound and is measured in decibels. During training of the network model, the loss function measures the difference between the output and the true label. The essence of parameter training is to find, in the parameter space, the parameter set that minimizes the difference between all outputs and their corresponding true labels. The convolution layer mainly performs feature extraction for the flute performance and teaching system: the convolution kernel is convolved with the input features of the flute performance. In operation, the kernel slides along the horizontal and vertical directions of the input with a fixed stride, sweeping the input features in a regular pattern. Within each receptive field, the input elements are multiplied by the filter elements at the corresponding positions and summed, and finally the bias term is added. The result is placed at the position of the output feature map that corresponds to the position of the convolution kernel. When the sliding ends, the feature map of the flute performance is obtained. Among the many loss functions, two are commonly used in convolutional neural networks: the mean square error loss function and the cross-entropy loss function. The sliding operation gives convolutional neural networks an obvious advantage in processing grid- or matrix-structured data such as images. Because a time-frequency spectrum can be extracted from an audio signal, convolutional neural networks are increasingly applied to the recognition and processing of audio signals.
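The sliding multiply-and-sum operation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the experiments: the function name `conv2d`, the 4 × 4 input, and the 2 × 2 all-ones kernel are hypothetical. As in the description, the kernel is not flipped, so the sketch computes what deep learning libraries usually call convolution (strictly, cross-correlation).

```python
import numpy as np


def conv2d(x, kernel, bias=0.0, stride=1):
    """Slide `kernel` over `x`; multiply element-wise, sum, and add the bias."""
    k_h, k_w = kernel.shape
    out_h = (x.shape[0] - k_h) // stride + 1
    out_w = (x.shape[1] - k_w) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Receptive field at output position (i, j).
            patch = x[i * stride:i * stride + k_h, j * stride:j * stride + k_w]
            out[i, j] = np.sum(patch * kernel) + bias
    return out


# Hypothetical 4 x 4 input (for example, a tiny spectrogram patch).
x = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2))
feature_map = conv2d(x, kernel, bias=1.0, stride=2)
print(feature_map)
```

With a stride of 2, the 4 × 4 input yields a 2 × 2 feature map, which shows how each convolution step reduces the spatial size of the features.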

The two key steps in a convolutional neural network are convolution operation and pooling operation.
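The text formalizes the convolution step but does not spell out pooling further. As a minimal sketch, assuming non-overlapping 2 × 2 max pooling (the paper does not state which pooling variant is used), the operation keeps the largest value in each window and so halves each spatial dimension. The function name `max_pool2d` and the example values are hypothetical.

```python
import numpy as np


def max_pool2d(x, size=2, stride=2):
    """Keep the maximum of each size x size window of a 2-D feature map."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()
    return out


# Hypothetical 4 x 4 feature map.
feature_map = np.array([[1.0, 3.0, 2.0, 0.0],
                        [4.0, 2.0, 1.0, 5.0],
                        [0.0, 1.0, 7.0, 2.0],
                        [3.0, 2.0, 1.0, 4.0]])
pooled = max_pool2d(feature_map)
print(pooled)
```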

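The convolution operation is expressed as

$$s(t) = (x * w)(t) = \sum_{a} x(a)\, w(t - a),$$

where, in a convolutional neural network, $x$ is the input feature, $w$ is the convolution kernel, and $s$ is the feature map. When processing two-dimensional matrix data, the above formula can be written as

$$S(i, j) = (X * W)(i, j) = \sum_{m} \sum_{n} X(m, n)\, W(i - m,\, j - n).$$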

The mean square error loss function, also known as the square loss function, uses the Euclidean distance to characterize the difference between the output value and the label value. Its expression is

$$l = \frac{1}{2N} \sum_{x} \lVert y(x) - a(x) \rVert^{2},$$

where $l$ is the loss, $x$ is the input sample, $a$ is the final output of the network model, $y$ is the label value, and $N$ is the number of samples. Minimizing the loss function is the goal of parameter optimization, and the usual method is to train the parameters by back-propagating the loss. For a neuron with $z = Wx + b$ and $a = \sigma(z)$, the derivatives with respect to the learnable parameters $W$ and $b$, taking the number of samples $N = 1$, are

$$\frac{\partial l}{\partial W} = (a - y)\, \sigma'(z)\, x, \qquad \frac{\partial l}{\partial b} = (a - y)\, \sigma'(z),$$

where $\sigma'(z)$ is the gradient of the activation function with respect to the neuron output $z$. From these two equations, the learning rate of the learnable parameters is directly proportional to $\sigma'(z)$. When $\sigma'(z)$ is small, parameter training is slow and the network is difficult to converge.

Another commonly used loss function is the cross-entropy function. Cross-entropy is generally used to characterize the similarity between the probability distributions of two sample sets. The expression of the cross-entropy loss function is

$$l = -\frac{1}{N} \sum_{x} \left[ y \ln a + (1 - y) \ln (1 - a) \right],$$

where $l$ is the loss, $x$ is the input sample, $a$ is the output, $y$ is the label value, and $N$ is the number of samples. For a sigmoid activation, the derivatives with respect to the learnable parameters $W$ and $b$, again with $N = 1$, are

$$\frac{\partial l}{\partial W} = \left( \sigma(z) - y \right) x, \qquad \frac{\partial l}{\partial b} = \sigma(z) - y,$$

where $\sigma(z)$ is the activation function applied to the neuron output $z$. It can be seen that the learning rate of the parameters $W$ and $b$ is proportional to $\sigma(z) - y$ but independent of the derivative $\sigma'(z)$. In other words, the update is proportional to the output error: parameters change quickly when the error is large and slowly when it is small, which keeps training from overshooting the optimal solution. This advantage makes the cross-entropy loss function more widely used than the mean square error loss function in training CNN parameters.
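The difference between the two losses can be checked numerically. The sketch below assumes a single sigmoid neuron with scalar weight `w` and bias `b`, the squared loss ½(a − y)², and the binary cross-entropy loss; the specific values (input 1, label 0, weight 5) are hypothetical and chosen so that the neuron is saturated and badly wrong.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def grad_w_mse(x, w, b, y):
    """d/dw of 0.5 * (a - y)^2 with a = sigmoid(w * x + b)."""
    a = sigmoid(w * x + b)
    return (a - y) * a * (1.0 - a) * x  # includes sigma'(z) = a * (1 - a)


def grad_w_cross_entropy(x, w, b, y):
    """d/dw of -[y ln a + (1 - y) ln(1 - a)] with a = sigmoid(w * x + b)."""
    a = sigmoid(w * x + b)
    return (a - y) * x  # sigma'(z) cancels out


# Hypothetical saturated neuron: output is close to 1 but the label is 0.
x, y = 1.0, 0.0
w, b = 5.0, 0.0
g_mse = grad_w_mse(x, w, b, y)
g_ce = grad_w_cross_entropy(x, w, b, y)
print("MSE gradient          :", g_mse)
print("cross-entropy gradient:", g_ce)
```

In this setting the squared-loss gradient is damped by the small derivative of the saturated sigmoid, while the cross-entropy gradient stays close to the output error itself.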
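The momentum algorithm adds a momentum parameter to the SGD algorithm, so that each parameter update considers not only the current gradient but also an exponentially decaying accumulation of previous gradients. Specifically, when SGD updates the parameters, equations (2)–(10) are rewritten as

$$v \leftarrow \alpha v - \epsilon\, \nabla_{\theta} \left( \frac{1}{m} \sum_{i=1}^{m} L\left( f(x^{(i)}; \theta),\, y^{(i)} \right) \right), \qquad \theta \leftarrow \theta + v,$$

where $v$ is the velocity parameter, which is updated together with the unbiased estimate of the gradient, $\alpha$ is the momentum parameter, $\epsilon$ is the learning rate, $\theta$ denotes the learnable parameters, and $m$ is the number of samples in the minibatch.

This ensures stable convergence and reduces oscillation during training. In practice, the momentum parameter $\alpha$ is usually set to 0.9 or 0.99.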
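The momentum update can be sketched as follows. This is an illustration under stated assumptions, not the training code of this paper: the function name `sgd_momentum`, the learning rate 0.1, and the quadratic test objective are hypothetical, and the full gradient stands in for a minibatch estimate.

```python
import numpy as np


def sgd_momentum(grad_fn, theta, lr=0.1, alpha=0.9, steps=300):
    """Gradient descent with momentum: v <- alpha * v - lr * grad; theta <- theta + v."""
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = alpha * v - lr * grad_fn(theta)  # decaying sum of past gradients
        theta = theta + v
    return theta


# Hypothetical objective f(theta) = 0.5 * ||theta||^2, whose gradient is theta.
theta_final = sgd_momentum(lambda t: t, np.array([3.0, -2.0]))
print(theta_final)
```

On this toy objective the iterate spirals in toward the minimum at the origin; a larger `alpha` keeps more of the previous velocity and therefore smooths successive updates more strongly.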

4. Innovative Countermeasures of Flute Playing and Teaching System

4.1. Analysis and Optimization of Flute Performance and Teaching System Based on Convolutional Neural Network

Teachers can organize students to take part in discussions in their spare time. Students should raise in time whatever they do not understand in the teaching process, so that teachers and students can discuss it together and propose countermeasures and reasonable suggestions. In this way, the various problems in teaching basic flute performance can be solved effectively, because making plans according to students’ mastery also keeps teachers from making unreasonable training arrangements. The analysis and optimization of the flute playing and teaching system based on a convolutional neural network cultivates students’ flute playing skills mainly in two respects. First, the cultivation of good sound quality is strengthened. Both teachers and students must pay attention to sound quality, which is affected jointly by the player and the instrument; the player’s own contribution is closely related to experience, skills, and methods. Second, the cultivation of students’ breathing methods is strengthened. With the convolutional neural network, information is transmitted over the network, and a large amount of information can reach the receiver in a very short time. It is precisely because of this advantage that the convolutional neural network is increasingly favored by teachers in flute teaching. However, the convolutional neural network is better suited to theoretical teaching. Actual playing skills still need to be taught hands-on by teachers, and students cannot really master performance methods simply by watching slides or videos. Therefore, teachers should integrate theory and practice in their teaching to improve students’ performance level comprehensively.
Using a convolutional neural network in flute teaching requires research and analysis based on the actual situation of the students. This means analyzing the teaching content and class management in detail, arranging courses reasonably, and adopting innovative teaching methods, so as to improve both students’ learning ability and teachers’ teaching level. At present, it is generally agreed that the most reasonable breathing method is thoracoabdominal breathing, which fully meets the breathing requirements of flute playing. With this method the volume of inhaled air is relatively large, and the intercostal muscles and diaphragm both take part in breathing, so the evenness of the breath can be controlled to the greatest extent and good sound quality is ensured, which helps flute players perform better. According to the teaching requirements of basic flute performance based on a convolutional neural network, all students should be trained in basic performance methods after entering the school. Through this training, each student’s performance should be observed and corrected promptly according to the student’s individual condition, so as to help students develop good performance methods. Different teaching methods are then adopted according to each student’s observed progress. At the same time, students’ learning attitude is corrected, so as to lay a solid foundation for each student to learn flute playing.

4.2. Experimental Results and Analysis

If a school wants to provide students with systematic training and high-quality teaching, it must strengthen its teaching staff, improve teachers’ professional skills and teaching level, and thereby cultivate students’ knowledge, ability, and playing skills with a brand-new model. In the end-to-end neural network structure, the first layer learns a primary feature representation of the input signal, which plays the same role in improving final classification performance as primary features obtained from a traditional time-frequency transform. From a certain point of view, the positive influence of the convolutional neural network on flute teaching and performance is not limited to improvements in technology and communication; it also affects traditional teaching ideas. The hardware environment of the experimental platform is an Intel i7-7800X CPU (base clock 3.5 GHz, turbo 4.0 GHz, 6 cores and 12 threads), 15 GB of memory, and two NVIDIA GTX 2080 GPUs. Each time frame of length 8192 is segmented by a sliding window before features are extracted with logarithmic frequency-domain filter banks. A cosine window is used here, because applying a window function mitigates the spectral leakage caused by boundary effects. The window length is 2048 sampling points, and the stride of the window is 256 sampling points to prevent the loss of boundary information. This gives (8192 − 2048)/256 + 1 = 25 windows per frame, that is, pT = 25 in the previous section, with window length s = 2048. The feature size changes over the whole process are shown in Table 1.
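The window count can be verified with a short framing sketch. It assumes power-of-two sizes (frame length 8192, window length 2048, stride 256), which reproduce the 25 windows per frame reported above, and it uses NumPy’s Hann window as one common cosine-shaped window; the exact window used in the experiments is not specified. The function names and the random test signal are hypothetical.

```python
import numpy as np


def num_windows(frame_len, win_len, hop):
    """Number of complete windows that fit in one frame."""
    return (frame_len - win_len) // hop + 1


def split_frame(frame, win_len, hop):
    """Cut a frame into overlapping segments and apply a Hann (cosine) window."""
    n = num_windows(len(frame), win_len, hop)
    window = np.hanning(win_len)
    return np.stack([frame[i * hop:i * hop + win_len] * window for i in range(n)])


# Hypothetical audio frame: 8192 samples of noise.
frame = np.random.default_rng(0).standard_normal(8192)
segments = split_frame(frame, win_len=2048, hop=256)
print(segments.shape)
```

Because the Hann window tapers to zero at both ends, each segment starts and ends smoothly, which is what limits leakage at the segment boundaries.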

To explore how the primary features extracted by the log-frequency filter bank, whose weights are defined by hand, influence classification, we replaced the filter bank with a two-layer ReLU network to extract the primary audio features as a comparative experiment. The two-layer ReLU network can be regarded as a filter bank whose weights must be learned. Because these weights are learned from random initialization, it is uncertain whether they will show a topology ordered from low to high frequency, as a logarithmic frequency-domain filter bank does; they may instead learn a self-organizing-map-like topology in the parameter space. The results are shown in Table 2.

Table 2 lists the pitch recognition precision P, recall R, and F1 score of the recognition model under different frame lengths when the two-layer network and the filter bank, respectively, are used to extract the primary features of the audio. These three measures are also the multipitch estimation standards used by MIREX, the Music Information Retrieval Evaluation eXchange. With the convolutional neural network, teachers can convey teaching content and performance advice to students faster and more conveniently. Information is transmitted in real time, without the limits of time and space, which brings new opportunities to flute teachers. At the same time, flute teachers can take advantage of the opportunities brought by new media to reform the traditional teaching mode, make new media serve flute teaching, combine the two effectively, and construct new teaching methods. Through multimedia, students can study the different styles of different performers, learn the basics of performance, reflect on what they hear, keep the best and discard the rest, and gradually form their own performance style, which is refined and revised in actual performance until it is recognized by the audience.

To verify the feasibility of the convolutional neural network and to measure the quality of the recommendations generated by the model, the recommendation accuracy under different recommendation list lengths was tested experimentally. Three comparison experiments were carried out. In the experiments, the recommendation list length was set to 15, 20, 25, 30, 35, 40, 45, and 50, and the quality of the recommendation list was evaluated quantitatively by accuracy, recall, and F1 value. The recommendation results under different recommendation list lengths are shown in Figures 3–5.

The experimental results in Figures 3–5 show that the length of the recommendation list has a certain impact on the recommendation results: as the list grows longer, the accuracy decreases, while the recall rate and F1 value increase. When the length of the recommendation list is 15, the accuracy rate is at its highest, about 0.45, and the recall rate is at its lowest, about 0.234. When the length of the recommendation list increases to 30, the accuracy rate decreases to about 0.45 and the recall rate increases to about 0.361, which basically conforms to the general behavior of recommendation systems.
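The three measures can be computed for a single user as in the sketch below. It treats the accuracy reported above as precision at list length k, which is an assumption about the paper’s metric, and the track identifiers and relevance set are made-up toy data, not experimental results.

```python
def precision_recall_f1(recommended, relevant, k):
    """Precision, recall, and F1 of the first k recommended items.

    Assumes the ranked list `recommended` contains at least k items.
    """
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Toy data: a ranked recommendation list and the tracks the user actually liked.
recommended = ["t1", "t7", "t3", "t9", "t4", "t8", "t2", "t6"]
relevant = {"t1", "t3", "t2", "t5"}
for k in (2, 4, 8):
    print(k, precision_recall_f1(recommended, relevant, k))
```

A longer list can only add hits, so recall never decreases with k, whereas precision is divided by a growing k and therefore tends to fall.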

To show the effectiveness of the convolutional neural network more objectively, this paper selects other recommendation algorithms that can be implemented on the existing data sets for comparative experiments. Three experiments were conducted to test the accuracy, recall, and F1 value of the Funk SVD, user-based collaborative filtering (UserCF), and content-based (CB) recommendation models under different recommendation list lengths. The experimental results are shown in Figures 6–8.
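For readers unfamiliar with the Funk SVD baseline, the following is a minimal sketch of matrix factorization trained by stochastic gradient descent. It is not the baseline implementation used in the experiments: the function names, hyperparameters, and the six toy ratings are all hypothetical. In the hybrid approach described in this paper, the item factors would instead be predicted from audio by the convolutional neural network, whereas here they are learned directly from the ratings.

```python
import numpy as np


def funk_svd(ratings, n_factors=2, lr=0.01, reg=0.02, epochs=1000, seed=0):
    """Factorize sparse (user, item, rating) triples into user and item factors."""
    rng = np.random.default_rng(seed)
    n_users = max(u for u, _, _ in ratings) + 1
    n_items = max(i for _, i, _ in ratings) + 1
    P = 0.1 * rng.standard_normal((n_users, n_factors))  # user factors
    Q = 0.1 * rng.standard_normal((n_items, n_factors))  # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]
            p_old = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return P, Q


def rmse(ratings, P, Q):
    """Root mean square error of the predicted scores on the given triples."""
    errors = [r - P[u] @ Q[i] for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errors))))


# Toy ratings: (user index, item index, rating).
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P, Q = funk_svd(ratings)
print("training RMSE:", rmse(ratings, P, Q))
print("predicted score, user 0 on item 2:", P[0] @ Q[2])
```

The predicted matching degree between a user and an item is the dot product of their factor vectors, so unobserved pairs such as user 0 and item 2 can be scored and ranked.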

Figures 6–8 show that, for the same recommendation list length, the recommendations generated by the convolutional neural network in this paper are better in accuracy, recall, and F1 value than those of the three traditional methods. This may be because the traditional recommendation models use only a sparse rating matrix or the content of a single item. The recommendation algorithm in this paper uses not only the historical data of users’ interactions with music but also audio content features introduced through deep learning, and the deep convolutional neural network can learn the characteristics of the data better. When making teaching plans, flute teachers must take into account how much students have actually mastered and absorbed. For example, teachers can organize students to talk together, raise problems in the teaching process in time, and propose corresponding solutions and reasonable suggestions, which effectively solves problems in flute teaching and avoids unreasonable training arrangements. Admittedly, the convolutional neural network in this paper does not greatly improve the recommendation results. This is because the focus of this paper is to explore the feasibility of a convolutional neural network for music recommendation. At the same time, by addressing the cold-start problem of traditional recommendation algorithms, it can supply an additional source of information for the music recommendation system. If more user and item attributes are integrated and the model is further improved, the overall performance of the recommendation system is expected to improve greatly.

5. Conclusions

To sum up, the teaching of flute performance still faces many difficulties in China. We must pay attention to it, strengthen the training of teachers, strengthen the training of students’ basic performance skills, and correct students’ learning attitude, so as to lay a good foundation for students’ future development. This paper uses a convolutional neural network to analyze and optimize the flute performance and teaching system. For music educators, an important question is how to combine traditional teaching methods with new media technology so as to improve classroom efficiency, stay close to students’ everyday lives, and help students improve their mastery of basic theory and performance skills. Flute performance teachers should weigh the positive and negative effects of new media technology, formulate reasonable teaching strategies, use the speed and simplicity of new media technology to give students real-time guidance, and help students’ performance skills make a qualitative leap. Through this training, each student’s performance should be observed and corrected promptly according to the student’s individual condition, so as to help students develop good performance methods.

Data Availability

The figures and tables used to support the findings of this study are included in the article.

Conflicts of Interest

The author declares that there are no conflicts of interest.

Acknowledgments

The author would like to express sincere thanks to those whose techniques have contributed to this research.