Abstract

To explore students’ boredom in English learning, a recognition algorithm based on a fuzzy neural network is proposed. The algorithm uses Gaussian membership functions: the cluster centers obtained by the fuzzy c-means (FCM) algorithm initialize the centers of the Gaussian functions, and the widths are computed from the FCM memberships and centers. In constructing the base classifiers, a diversity strategy is adopted to increase their diversity and complementarity. In selecting the base classifiers, the silhouette coefficient is combined with a clustering algorithm to determine the number of classifiers to be fused, and a disagreement measure is used to evaluate their differences. In the combination strategy, following Bayesian reasoning, the weight of each base classifier is adapted dynamically based on its prior probability and class-conditional probability. In the experiments, 10-fold cross-validation is carried out for each algorithm and its accuracy is reported. The tree-based algorithms clearly perform best, followed by the rule-based algorithms and then the neural-network-based MLP algorithm, while SVM and logistic regression (LR) have lower accuracy. The results show that the fuzzy neural network can effectively identify students’ boredom in English learning.

1. Introduction

With the continuous development of the computer industry, computers and people are becoming ever more closely connected, and enabling computers to perceive human emotions is of great significance for improving the human-computer interaction experience. Voice emotion sensing technology bridges this gap and makes it possible for machines to understand people’s emotions: it enables a computer to recognize different human emotional states from received voice signals. Human speech in different emotional states has distinctive features, such as voice quality, prosodic, and spectral features, which help computers distinguish between emotional states [1]. Deep neural networks originate from artificial neural networks and the multilayer perceptron model. Compared with the original two-layer perceptron, a deep neural network can in theory approximate any continuous function, which greatly expands the feature expression ability of the model. At the same time, as the number of layers increases, the features become more abstract, more information can be expressed with the same number of features, and classification improves. Convolutional neural networks and recurrent neural networks are very popular in deep learning and have made breakthroughs in speech recognition, facial expression recognition (Figure 1), and natural language processing [2].

Humans invented writing to record communication across time and space. In spoken communication, each party uses their voice differently, and listeners interpret each other differently as a result. With words alone, a dialogue loses its emotion and becomes flat, and the information lacks a dimension. Modern society has also invented many other ways to record information, such as pictures, films, operas, and novels [3]. These inventions make life more interesting and convenient, but they also generate massive amounts of data, and processing and analyzing these data is time-consuming and labor-intensive. Facial emotion, as a supplement to verbal communication, conveys a person’s real state more fully when language is highly subjective. In interpersonal communication, people form 65% of their cognitive judgment of each other within the first seven seconds of acquaintance. Facial emotion accounts for 55% of communication and tone of voice for 38%, while the verbal content itself accounts for only 7%. These figures show that facial emotion has a great influence on communication and is an indispensable part of it. As part of the wave of artificial intelligence, deep learning has become a focus of academia and industry thanks to its high performance and the theoretical foundations accumulated over the years. Industrial demand for intelligent manufacturing is growing, which gives facial emotion recognition a good development opportunity. Among all facial information, facial emotion reflects an individual’s internal psychological state most directly, which gives it great research value. Related applications of facial emotion recognition are reviewed in [4].

2. Literature Review

Emotion is a physiological and psychological state that accompanies cognition and consciousness, and it plays a very important role in interpersonal communication. In recent years, many researchers in China and abroad have studied single-modal emotion recognition. Most of this work distinguishes the six basic facial expressions, and research has gradually moved from single-modal to multimodal emotion recognition. Kahaki proposed a multimodal facial expression and speech method based on decision-level fusion and achieved a certain fusion effect [5]. Fatahi proposed a multimodal facial expression and speech method based on a multistream hidden Markov model (HMM), which reached a recognition rate of 72.42% after fusion [6]. Kanya used kernel cross-modal factor analysis to reduce the dimensionality of, and fuse, the features of the speech and facial expression modalities. However, the same facial expression can be shown at different intensities, and different intensities correspond to different emotional states. For example, anger can be divided into levels ranging from irritation to rage: irritation indicates a slight degree of anger, whereas rage means that a person is in a very angry state [7]. Computing the intensity of facial expressions is an extension of emotion recognition, and relatively few researchers have studied it. In [8], a six-level hierarchical classification built from binary classifiers was used to assess the intensity of action units (AUs) representing several infant facial features. However, these methods cannot avoid overlap between classes, which reduces their robustness. Soria used an image-based classification method to compute intensity at three levels (low, medium, and high). The above research shows that the intensity levels of expressions are always predefined.
Therefore, this paper proposes an expression-intensity subdivision method based on fuzzy clustering. The method uses a clustering fusion algorithm and obtains the intensity levels automatically, without setting them in advance [9]. Jebadass fed a large number of facial expression images of people of different races, ages, and genders into a computer and used an algorithm to track changes in important facial feature points, such as texture, wrinkles, and shape, in order to identify people’s current emotions. The algorithm can accurately identify seven emotions: happiness, sadness, surprise, anger, contempt, disgust, and fear [10]. Cui experimentally defined human emotions as six basic emotions: happiness, anger, disgust, sadness, fear, and surprise. This emotion model is universally applicable for describing and solving problems involving easily identifiable emotions. However, such a discrete emotion model adapts poorly to subtle, complex, and abstract emotions. In addition, emotion judgment involves strong subjective factors; for example, individuals differ in how they distinguish fear from disgust. Moreover, the model lacks a quantifiable dimensional system, so it cannot describe the relationships between emotions or their degree, and vector and matrix tools cannot be used for quantitative description. In 2006, Hinton proposed a parameter initialization algorithm for deep neural networks with multilayer structures. With this algorithm, networks converge more easily and can be designed deeper, and the contribution is regarded as the cornerstone and beginning of modern deep neural networks. Since then, deep neural networks have developed rapidly, driven by increasing computing power, optimized computing frameworks, and the rapid growth of image datasets in the Internet era.
The literature proposes using artificial neural networks (ANN) for nonlinear model mapping, in three types: the multilayer perceptron (MLP), the recurrent neural network (RNN), and the restricted Boltzmann machine (RBM). The recognition performance of ANN is poor, averaging about 65% [11]. Sabri developed a speech emotion perception platform that recognizes emotions through voice quality analysis. It can currently analyze five basic emotional states with a recognition accuracy of 70% to 80%, which is 60% higher than the human average [12].

Based on the current research, a recognition algorithm based on a fuzzy neural network is proposed. The algorithm uses Gaussian membership functions whose centers are initialized with the cluster centers obtained by the fuzzy c-means (FCM) algorithm and whose widths are computed from the FCM memberships and centers. In constructing the base classifiers, a diversity strategy is adopted to increase their diversity and complementarity. In selecting the base classifiers, the silhouette coefficient is combined with a clustering algorithm to determine the number of classifiers to be fused, and a disagreement measure is used to evaluate their differences. In the combination strategy, following Bayesian reasoning, the weight of each base classifier is adapted dynamically based on its prior probability and class-conditional probability.

3. Traditional and Fuzzy Neural Emotion Recognition System

3.1. Traditional Recognition Model
3.1.1. Overall Design of System Prototype

Figure 2 shows the overall process of speech emotion perception, which is in line with the general process of pattern recognition, machine learning, and classification.

Every step of speech emotion perception is critical and has a strong influence on the results.

(1) Data Collection. For both training and recognition, samples of the raw data of the research object must be collected. For the collected raw data, attention should be paid to whether the samples are uniformly distributed and whether they truthfully reflect the essential characteristics. The quality of the data directly determines the quality of the recognition results [13]. In addition, the model depends on the experimental samples, so experiments on multiple datasets are needed to establish the independence and generality of the model. The samples used in this paper come from CASIA, EMO, and SAVEE, and the experiments described later in this paper are carried out on these three databases.

(2) Format Preprocessing. Collected samples usually cannot be used directly. The data must first be sorted and labeled, a step called formatting. Voice data also require endpoint detection, pre-emphasis, framing, and windowing, and the data must be transformed to meet the input requirements of the model.
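The pre-emphasis, framing, and windowing steps can be sketched in Python with NumPy. This is a minimal sketch under assumed settings (25 ms frames, 10 ms hop, pre-emphasis coefficient 0.97, Hamming window); these are common defaults, not values stated in this paper, and endpoint detection is omitted.

```python
import numpy as np

def preemphasis(signal, alpha=0.97):
    """Pre-emphasis filter y[n] = x[n] - alpha * x[n-1]; boosts high frequencies."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping frames; the tail is zero-padded."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) <= frame_len:
        n_frames = 1
    else:
        n_frames = 1 + int(np.ceil((len(signal) - frame_len) / hop_len))
    pad_len = (n_frames - 1) * hop_len + frame_len
    padded = np.pad(signal, (0, pad_len - len(signal)))
    # Row i holds samples [i*hop_len, i*hop_len + frame_len).
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return padded[idx]

def preprocess(signal, sample_rate, frame_ms=25, hop_ms=10, alpha=0.97):
    """Pre-emphasize, frame, and apply a Hamming window to every frame."""
    frame_len = int(round(sample_rate * frame_ms / 1000))
    hop_len = int(round(sample_rate * hop_ms / 1000))
    frames = frame_signal(preemphasis(signal, alpha), frame_len, hop_len)
    return frames * np.hamming(frame_len)
```

For one second of 16 kHz audio, these settings give 99 frames of 400 samples each.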

(3) Key Feature Extraction. The performance of machine recognition largely depends on the quality of the features. The features used in this paper are mainly Mel-frequency cepstral coefficients (MFCCs) and spectrograms. Features computed with different parameters perform differently, so the parameters need to be tested experimentally [14].
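To make the MFCC feature concrete, the standard pipeline (power spectrum, mel filterbank, logarithm, DCT) can be sketched with NumPy alone. The FFT size (512), number of mel filters (26), and number of kept coefficients (13) are common defaults assumed for illustration; the paper does not state its parameters. The input is assumed to be windowed frames such as those produced in the preprocessing step.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def dct2_ortho(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    j = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * j + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T

def mfcc(frames, sample_rate, n_fft=512, n_filters=26, n_ceps=13):
    """MFCCs from windowed frames of shape (n_frames, frame_len)."""
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    return dct2_ortho(np.log(mel_energy + 1e-10), n_ceps)  # 1e-10 avoids log(0)
```

The log-mel energies before the DCT form a mel spectrogram, so the same code also yields the spectrogram-style feature if the last step is skipped.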

(4) Model Training. In deep learning, the focus of classification has shifted from hand-crafted features to learned feature representations, with more attention paid to expressing features through the model itself. It is therefore necessary to build different models and keep tuning them to improve classification.

(5) Model Identification. This step is straightforward: once the model is well trained, the input data is fed in and the classifier directly outputs the result.

To demonstrate the effectiveness of the proposed emotion perception model, this paper carries out a complete set of comparative experiments. Because these experiments are strongly engineering-oriented, they are presented from an engineering perspective, by describing how to build a better voice emotion perception system.

Figure 3 shows the construction process of the speech emotion recognition system. It mainly includes sample collection and formatting, extraction of candidate speech features, construction of candidate models, combination of candidate features and models, feature selection and comparison, and model optimization and comparison [15].

The candidate samples, features, and models in Figure 3 are the experimental objects of this paper, and the relationships among them are shown in Figure 4.

Figure 4 shows that experiments are needed to determine the following:
(1) Comparison of the MFCC and spectrogram features
(2) Comparison of the SVM and softmax classifiers
(3) The combination relationships within the system model
(4) Comparison of the fuzzy neural network, the recurrent neural network, and the fuzzy neural network + recurrent neural network
(5) Effectiveness of combining the fuzzy neural network, the recurrent neural network, or the fuzzy neural network + recurrent neural network with SVM
(6) Whether the model generalizes across different datasets: Berlin EMO, SAVEE, and CASIA

The above is the basic idea and operating process of the emotion perception model proposed in this paper, which was developed through many experiments. To build toward this model, this paper first establishes a traditional recognition model and then uses it to carry out feature comparison and classifier comparison experiments.

3.1.2. Voice Emotion Data Collection and Formatting

Collecting speech emotion data is the first step in building the speech emotion perception system. This paper uses three public speech emotion databases: CASIA, Berlin EMO, and SAVEE. Data formatting is needed because these databases label their audio data in different formats; formatting converts these labels into a unified scheme so that subsequent programs can process all of the data in the same way. The corpora were recorded by professional actors reading the same texts with different emotions, and some actors recorded several versions of the same emotion for the same text [16]. Each audio file in an emotion database is therefore identified by several elements: actor number, text number, emotion number, and (where present) version number. Finally, the audio durations are counted; the statistics are shown in Figure 5.

As Figure 5 shows, most CASIA clips last 1 to 2 s, most EMO clips last 2 to 3 s, and most SAVEE clips last 3 to 4 s. This duration distribution provides a reference for the recording duration used in subsequent system tests.
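The duration statistics reduce to dividing each file's frame count by its sample rate and bucketing the results. A minimal sketch with the Python standard library, assuming the corpora are stored as PCM WAV files and using 1-second bins (both assumptions; the paper does not describe its tooling):

```python
import wave
from collections import Counter

def wav_duration_seconds(path):
    """Duration of a PCM WAV file (path or file object): frames / sample rate."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()

def duration_histogram(durations, bin_width=1.0):
    """Count clips per [k*w, (k+1)*w) second bin, keyed by the bin start."""
    counts = Counter(int(d // bin_width) for d in durations)
    return {k * bin_width: counts[k] for k in sorted(counts)}
```

For example, clips of 1.2, 1.8, 2.5, and 3.1 s fall into the bins {1.0: 2, 2.0: 1, 3.0: 1}.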

3.2. Interval Type II Fuzzy Neural Network

The structure of general interval type II fuzzy neural network is shown in Figure 6.

Assuming that the type 2 fuzzy neural network has m rules, the k-th rule can be expressed as follows.

Each layer of the multi-input single-output network is described as follows.

The first layer is the input layer. It contains n neurons, one for each component of the input vector; the output of the j-th neuron is the j-th input variable [17].

The second layer is the membership function layer. Each neuron represents the membership function of an input variable in a rule, and its output is the membership grade of the i-th input variable in that rule. If a conventional type-1 fuzzy membership function is used, the Gaussian membership function is selected, which can be expressed as

A Gaussian type-2 membership function can be obtained by making either its width or its center uncertain. If the width is uncertain, the upper and lower membership functions can be expressed as

If the center is uncertain, the upper and lower membership functions can be expressed as

Here, the second form is chosen, that is, the Gaussian membership function with an uncertain center given by Formula (3) and Formula (4).
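Formulas (3) and (4) are not preserved in this text, so the sketch below uses the standard uncertain-center interval type-2 Gaussian from the type-2 fuzzy logic literature, which may differ in notation from the paper's. The names `c1`, `c2` (the ends of the center interval) and `sigma` (the fixed width) are assumptions for illustration.

```python
import numpy as np

def it2_gaussian_uncertain_center(x, c1, c2, sigma):
    """Lower/upper membership grades of an interval type-2 Gaussian MF
    whose center lies anywhere in [c1, c2] and whose width sigma is fixed."""
    x = np.asarray(x, dtype=float)
    g1 = np.exp(-0.5 * ((x - c1) / sigma) ** 2)  # Gaussian centered at c1
    g2 = np.exp(-0.5 * ((x - c2) / sigma) ** 2)  # Gaussian centered at c2
    # Upper MF: flat top of 1 between the two centers, Gaussian shoulders outside.
    upper = np.where(x < c1, g1, np.where(x > c2, g2, 1.0))
    # Lower MF: the smaller of the two extreme Gaussians at every point.
    lower = np.minimum(g1, g2)
    return lower, upper
```

The gap between `upper` and `lower` is the footprint of uncertainty; it shrinks to an ordinary type-1 Gaussian when `c1 == c2`.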

The third layer is the fuzzy reasoning layer: The number of neurons is m, and the output of each neuron can be expressed as

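In Formula (5), the lower and upper firing strengths of each rule are obtained from the lower and upper membership grades of its antecedents.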
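Because Formula (5) is not preserved here, the sketch below assumes the product t-norm, a common choice in interval type-2 fuzzy neural networks (the minimum t-norm is another). Each rule's firing interval is the product of its antecedents' lower grades and the product of their upper grades.

```python
import numpy as np

def firing_interval(lower_grades, upper_grades):
    """Lower/upper firing strengths of each rule under the product t-norm.

    lower_grades, upper_grades: arrays of shape (m_rules, n_inputs) holding
    the lower/upper membership grades of every antecedent of every rule.
    """
    lower = np.prod(np.asarray(lower_grades, dtype=float), axis=1)
    upper = np.prod(np.asarray(upper_grades, dtype=float), axis=1)
    return lower, upper
```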

The fourth layer is the output layer. For a single-output system it contains one neuron, which is linked to every third-layer neuron by a connection parameter, and its output can be expressed as

In Formula (6), the left and right end points of the type-reduced output are obtained with the Karnik-Mendel (KM) type-reduction algorithm for interval type-2 fuzzy sets; details of the KM algorithm can be found in the literature.
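The end points that the KM algorithm computes can be illustrated with a short sketch. Like KM, it sorts the rule consequents and uses a switch point between upper and lower firing strengths, but instead of iterating toward the switch point it checks every switch point exhaustively (O(M^2) for M rules), which yields the same end points. The consequents `w` are assumed to be crisp numbers, one per rule.

```python
import numpy as np

def type_reduce_interval(f_lower, f_upper, w):
    """End points [y_l, y_r] of the type-reduced set of an interval type-2
    system with lower/upper firing strengths and crisp consequents w."""
    f_lower = np.asarray(f_lower, dtype=float)
    f_upper = np.asarray(f_upper, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(w)  # switch-point structure needs sorted consequents
    f_lower, f_upper, w = f_lower[order], f_upper[order], w[order]
    y_left, y_right = np.inf, -np.inf
    for s in range(len(w) + 1):
        # y_l: upper strengths on the s smallest consequents, lower on the rest.
        f = np.concatenate([f_upper[:s], f_lower[s:]])
        if f.sum() > 0:
            y_left = min(y_left, f @ w / f.sum())
        # y_r: lower strengths on the s smallest consequents, upper on the rest.
        f = np.concatenate([f_lower[:s], f_upper[s:]])
        if f.sum() > 0:
            y_right = max(y_right, f @ w / f.sum())
    return y_left, y_right
```

A common crisp output is the midpoint (y_l + y_r) / 2. When the lower and upper firing strengths coincide, both end points reduce to the ordinary weighted average of the consequents.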

The fuzzy c-means (FCM) clustering algorithm is an unsupervised learning algorithm whose objective function is

In Formula (7), the weighting exponent (fuzzifier) is a real number greater than 1, and the distance term is the Euclidean distance between a data point and a cluster center. Clustering seeks the minimum of the objective function subject to the standard FCM constraints: every membership lies in [0, 1], and the memberships of each sample across all clusters sum to 1.

Solving this constrained minimization with the Lagrange multiplier method gives the following implementation steps for the FCM clustering algorithm.

Initialization: give the number of clusters, which lies between 2 and the number of samples; set the stopping threshold and the maximum number of iterations; initialize the cluster centers; and set the iteration counter r = 0 [18].

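Step 1: Calculate the fuzzy partition matrix of the (r + 1)-th iteration. For each sample whose distance to every cluster center is nonzero, the memberships are given by Equation (8). If a sample coincides with a cluster center (zero distance), its membership in that cluster is 1 and its membership in every other cluster is 0.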

Step 2: Calculate the cluster centers of the (r + 1)-th iteration with Equation (9):

Step 3: If the change between the partition matrices of two successive iterations, measured by an appropriate matrix norm, is smaller than the stopping threshold, or the number of iterations exceeds the maximum, the algorithm stops, and the last iteration gives the fuzzy partition matrix U and the cluster centers P. Otherwise, set r = r + 1 and return to Step 1.
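The initialization and Steps 1 to 3 can be sketched in NumPy as follows. The text does not say how the centers are initialized, so farthest-point selection is assumed here; the fuzzifier 2, threshold 1e-5, and iteration cap 300 are likewise illustrative defaults, not the paper's settings.

```python
import numpy as np

def fcm(X, n_clusters, fuzzifier=2.0, eps=1e-5, max_iter=300, seed=0):
    """Fuzzy c-means; returns U (n_clusters, n_samples) and centers P (n_clusters, n_features)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialization (assumed): farthest-point selection of the initial centers.
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, n_clusters):
        gap = np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=2).min(axis=1)
        centers.append(X[np.argmax(gap)])
    P = np.array(centers)
    U = np.zeros((n_clusters, len(X)))
    for _ in range(max_iter):
        # Step 1: u_ij = d_ij^(-2/(m-1)) / sum_k d_kj^(-2/(m-1)).
        d = np.linalg.norm(X[None, :, :] - P[:, None, :], axis=2)
        with np.errstate(divide="ignore", invalid="ignore"):
            inv = d ** (-2.0 / (fuzzifier - 1.0))
            U_new = inv / inv.sum(axis=0, keepdims=True)
        # A sample lying exactly on a center gets membership 1 there, 0 elsewhere.
        zero = d == 0
        hit = zero.any(axis=0)
        U_new[:, hit] = zero[:, hit] / zero[:, hit].sum(axis=0)
        # Step 2: centers are membership-weighted means of the samples.
        Um = U_new ** fuzzifier
        P = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Step 3: stop when the partition matrix changes by less than eps.
        done = np.linalg.norm(U_new - U) < eps
        U = U_new
        if done:
            break
    return U, P
```

The returned centers `P` are what would seed the Gaussian centers, and `U` together with `P` can be used to derive the widths.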

The proposed interval type-2 fuzzy neural network is trained with the back-propagation (BP) algorithm, and the objective function to be minimized is

To avoid the problems caused by the commonly used iterative KM type-reduction algorithm, a direct type-reduction algorithm is adopted, so that the output of the type-2 fuzzy neural network is obtained in closed form.

The parameters trained by the BP algorithm are the center and width of each Gaussian membership function and the consequent parameters of the rules. Their learning algorithms are

Applying the chain rule for composite functions, the learning algorithm for each parameter is
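As an illustration of this chain-rule step, the sketch below updates only the consequent parameters. The paper's direct type-reduction formula is not preserved, so a closed-form Nie-Tan style output is assumed here (each consequent weighted by the sum of its lower and upper firing strengths), together with a squared-error objective E = (y - y_target)^2 / 2. Updates for the centers and widths follow the same pattern but need the membership-function derivatives as well.

```python
import numpy as np

def direct_output(f_lower, f_upper, w):
    """Assumed closed-form type reduction: y = sum(f_k * w_k) / sum(f_k),
    with f_k = f_lower_k + f_upper_k."""
    f = np.asarray(f_lower, dtype=float) + np.asarray(f_upper, dtype=float)
    return float(f @ np.asarray(w, dtype=float) / f.sum())

def consequent_step(f_lower, f_upper, w, y_target, lr=0.1):
    """One BP step on E = 0.5 * (y - y_target)**2 with respect to w.

    Chain rule: dE/dw_k = (y - y_target) * dy/dw_k, where for the output
    above dy/dw_k = f_k / sum_j f_j.
    """
    w = np.asarray(w, dtype=float)
    f = np.asarray(f_lower, dtype=float) + np.asarray(f_upper, dtype=float)
    y = f @ w / f.sum()
    grad = (y - y_target) * f / f.sum()
    return w - lr * grad
```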

Given the training input-output data pairs, set the maximum number of iterations I, the cutoff error, and the learning rates for the centers, the widths, and the consequent parameters. The proposed interval type-2 fuzzy neural network algorithm is described as follows [19]:

(1) Set the number of fuzzy rules m of type II fuzzy neural network

(2) Use the results of the FCM clustering algorithm to initialize the Gaussian membership functions of the network: the centers are initialized with the cluster centers of the input x, and the consequent parameters of the fuzzy rules are initialized with the cluster centers of the output y

(3) Use Equation (17) to initialize the widths of the Gaussian membership functions:

(4) Initialize the iteration counter and, for each training sample, update the centers, widths, and consequent parameters with Equations (12) to (16)

(5) If the sum J of the root square errors (RSE) of all training data, given by Equation (18), is less than the cutoff error, or the number of iterations reaches the maximum I, training ends; otherwise, increment the iteration counter and go to Step (3):
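Equation (17) is not preserved in this text. One common way to derive Gaussian widths from an FCM result, consistent with the statement that the widths come from the FCM memberships and centers, is the membership-weighted standard deviation around each center. The sketch below assumes that choice, computed per cluster and per input; it is an illustration, not necessarily the paper's formula.

```python
import numpy as np

def init_widths(X, U, P, fuzzifier=2.0, min_width=1e-3):
    """Per-cluster, per-input Gaussian widths from an FCM result.

    X: (n_samples, n_inputs); U: (n_clusters, n_samples); P: (n_clusters, n_inputs).
    width_cd = sqrt(sum_n u_cn^m (x_nd - p_cd)^2 / sum_n u_cn^m)
    """
    X, U, P = (np.asarray(a, dtype=float) for a in (X, U, P))
    Um = U ** fuzzifier                             # (c, n) weights
    sq = (X[None, :, :] - P[:, None, :]) ** 2       # (c, n, d) squared deviations
    var = np.einsum("cn,cnd->cd", Um, sq) / Um.sum(axis=1)[:, None]
    return np.maximum(np.sqrt(var), min_width)      # floor avoids zero widths
```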

4. Experimental Methods and Results

For the base classifiers, this paper selects eight algorithms in four groups, each group based on a different principle.

Tree-based algorithms: LADTree, REPTree, and CART

Rule-based algorithms: DTNB (decision table/naive Bayes hybrid) and PART (rules from partial decision trees)

Linear algorithms: support vector machine (SVM) and logistic regression (LR)

Multilayer perceptron: MLP algorithm with neural network structure

The training data were input to each of the above algorithms for training; the resulting accuracies are shown in Figure 7.

Each algorithm was evaluated with 10-fold cross-validation, and its accuracy is reported. As the figure shows, the tree-based algorithms clearly perform best, followed by the rule-based algorithms and then the neural-network-based MLP algorithm, while SVM and logistic regression (LR) have lower accuracy [20].
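The 10-fold procedure can be sketched independently of any particular classifier. Here `fit_predict` is a hypothetical callback standing in for any of the eight base classifiers, and `nearest_centroid` is a toy classifier included only so the sketch runs; neither is the paper's code.

```python
import numpy as np

def kfold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle sample indices and split them into n_folds nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)

def cross_val_accuracy(X, y, fit_predict, n_folds=10, seed=0):
    """Mean accuracy over the folds; fit_predict(X_tr, y_tr, X_te) -> predictions."""
    X, y = np.asarray(X), np.asarray(y)
    folds = kfold_indices(len(y), n_folds, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on the other n_folds - 1 folds, test on fold i.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(np.mean(pred == y[test_idx]))
    return float(np.mean(scores))

def nearest_centroid(X_tr, y_tr, X_te):
    """Hypothetical stand-in classifier: assign each sample to the closest class mean."""
    classes = np.unique(y_tr)
    cents = np.array([X_tr[y_tr == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X_te[:, None, :] - cents[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]
```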

For values of k from 2 to 10, the k-means algorithm is used to cluster the data samples and the silhouette coefficient is computed for each k, giving the line chart of silhouette coefficient against k shown in Figure 8.

As the figure shows, the silhouette coefficient increases monotonically as k goes from 2 to 3 and decreases monotonically as k goes from 3 to 10. The curve therefore peaks at k = 3, which means that the optimal number of clusters is 3, so three base classifiers are selected for the final classifier ensemble. The difference measure is also computed for the 10 base classifiers, as shown in Figure 9.
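Selecting k by the silhouette peak can be sketched with NumPy. The k-means here uses a deterministic farthest-point initialization, an assumption made so the result is reproducible; the paper does not specify its k-means settings. For each sample, a is the mean distance to its own cluster and b the mean distance to the nearest other cluster, and the silhouette is (b - a) / max(a, b).

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Plain k-means with deterministic farthest-point initialization (assumed)."""
    X = np.asarray(X, dtype=float)
    centers = [X[0]]
    for _ in range(1, k):
        gap = np.linalg.norm(X[:, None] - np.array(centers)[None], axis=2).min(axis=1)
        centers.append(X[np.argmax(gap)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def silhouette(X, labels):
    """Mean silhouette coefficient s_i = (b_i - a_i) / max(a_i, b_i)."""
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: s_i = 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return float(s.mean())

def choose_k(X, k_values=range(2, 11)):
    """Return the k with the highest mean silhouette, plus all scores."""
    scores = {k: silhouette(X, kmeans(X, k)) for k in k_values}
    return max(scores, key=scores.get), scores
```

On three well-separated groups of points, the score peaks at k = 3, which mirrors the single maximum read off Figure 8.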

For the difference measure, this paper uses a pairwise method: the difference between each pair of classifiers is computed, and the pairwise values are then averaged. However, as Figure 9 shows, this measure often fails to reveal the best difference because of its averaging. For example, for two classifiers that differ greatly, adding a third classifier tends to lower the overall average difference. The experiments therefore show that the averaged pairwise difference measure often does not give the expected results: the curve is relatively flat, without particularly prominent peaks or troughs. In this case, the difference values of algorithms based on the same principle are close to each other and fall in a similar range [21]. In addition, the more accurate algorithms also correctly recognize a considerable share of the samples misclassified by other classifiers, so their difference values are high; conversely, the less accurate algorithms (such as LR and SVM) have low difference values. In summary, since the experimental data show no particularly prominent peaks or troughs, we fuse algorithms that have relatively high difference values and are based on different principles, and observe the final recognition performance. In this experiment, different algorithm combinations are chosen to verify the relationship between the fusion effect of different kinds of classifiers and their differences, using the voting method, the accuracy-based weighting method, and the adaptive weighting method as fusion strategies.
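The averaging effect described above can be checked with a small sketch. It assumes that the pairwise difference is the standard disagreement measure (the fraction of samples on which exactly one of the two classifiers is correct); the paper's exact measure is not spelled out here.

```python
import numpy as np
from itertools import combinations

def disagreement(correct_a, correct_b):
    """Fraction of samples on which exactly one of two classifiers is correct."""
    a = np.asarray(correct_a, dtype=bool)
    b = np.asarray(correct_b, dtype=bool)
    return float(np.mean(a != b))

def average_pairwise_disagreement(correct):
    """Average of the pairwise disagreement over all classifier pairs.

    correct: (n_classifiers, n_samples) boolean matrix, True where correct.
    """
    pairs = combinations(range(len(correct)), 2)
    return float(np.mean([disagreement(correct[i], correct[j]) for i, j in pairs]))
```

For example, two classifiers with correctness patterns [1, 1, 0, 0] and [0, 0, 1, 1] have disagreement 1.0, but adding a third classifier that is always correct lowers the average to 2/3, even though the pair's complementarity is unchanged.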
In this experiment, several combinations of three base classifiers are first selected, and their recognition performance under the three fusion strategies is compared, as shown in Figure 10.
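The three fusion strategies can be contrasted in a short sketch. Majority voting and accuracy-weighted voting follow directly from their names. The adaptive method is described only as using each classifier's prior and class-conditional probabilities, so the third function assumes one standard Bayesian combiner (a naive Bayes combination of validation confusion matrices, with Laplace smoothing); it is not necessarily the paper's formula.

```python
import numpy as np

def majority_vote(preds, n_classes):
    """preds: (n_classifiers, n_samples) integer labels; ties go to the lower label."""
    votes = np.zeros((n_classes, preds.shape[1]))
    for p in preds:
        votes[p, np.arange(preds.shape[1])] += 1
    return votes.argmax(axis=0)

def accuracy_weighted_vote(preds, accuracies, n_classes):
    """Each classifier's vote counts with a weight equal to its accuracy."""
    votes = np.zeros((n_classes, preds.shape[1]))
    for p, acc in zip(preds, accuracies):
        votes[p, np.arange(preds.shape[1])] += acc
    return votes.argmax(axis=0)

def naive_bayes_combiner(preds, confusions, class_counts):
    """Assumed Bayesian fusion: score(c) = log P(c) + sum_k log P(s_k | c).

    confusions[k][c, s]: validation count of true class c predicted as s by
    classifier k; class_counts[c]: validation count of class c.
    """
    class_counts = np.asarray(class_counts, dtype=float)
    log_prior = np.log(class_counts / class_counts.sum())
    scores = np.tile(log_prior[:, None], (1, preds.shape[1]))
    for p, cm in zip(preds, confusions):
        cm = np.asarray(cm, dtype=float)
        # Laplace smoothing keeps unseen (c, s) pairs from ruling a class out.
        cond = (cm + 1.0) / (class_counts[:, None] + cm.shape[1])
        scores += np.log(cond[:, p])
    return scores.argmax(axis=0)
```

In the Bayesian combiner, a classifier's influence depends on how informative its confusion matrix is for the label it outputs, rather than on a single global accuracy, which is what lets its weight vary from sample to sample.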

As shown in Figure 11, the voting method yields the lowest fusion accuracy. The accuracy-based fusion method gives the least accurate base classifier the smallest weight and the most accurate one the largest influence, so its recognition performance is slightly better than that of voting [22]. The adaptive weighting method improves performance significantly over the accuracy-based method, especially in groups 2 and 3. Each group is analyzed in detail below.

Group 1 fuses the three tree-based algorithms, which have the highest accuracy and average difference. The accuracy-based and adaptive weighting methods improve only slightly over voting, indicating that this choice of base classifiers is not optimal.

Group 2 replaces REPTree, the worst-performing algorithm in group 1, with the rule-based DTNB algorithm. Its fusion result is better than that of group 1, with a larger improvement, and it has the highest final recognition rate of all groups. Group 3 replaces the CART algorithm of group 2 with the neural-network-based MLP algorithm. Although MLP has low accuracy, this group unexpectedly obtains the largest improvement, and its final recognition rate is second only to group 2 and slightly higher than group 1 [23]. Group 4 then replaces MLP with SVM. Because of the very low accuracy of SVM, its voting method gives the worst recognition result, and the accuracy-based and adaptive weighting methods also perform poorly. From this comparison, the following conclusions can be drawn.

In the combination strategy, the adaptive weighting method in most cases achieves better decision performance than the best individual classifier, and it improves significantly on the traditional method of setting weights from each classifier’s accuracy. For the difference measure and the selection of base classifier types, the averaged pairwise difference measure flattens the prominent peaks of the curve, which reduces its value as a reference. The analysis of groups 1-5 shows that, in most cases, combining base classifiers based on different principles yields a larger improvement (e.g., group 3, which integrates classifiers based on three principles, shows the largest improvement). Group 2 adds the rule-based DTNB algorithm to two tree algorithms and achieves the best fusion performance because of the high accuracy of its base classifiers. Group 1 uses three tree algorithms; although its base classifiers have the best accuracy, its improvement is small. As for group 4, the linear SVM algorithm may not be complementary to the other algorithms, so its fusion effect is also very limited.

In simulation, the number of fuzzy rules . 200 groups of data were used for training, and 200 groups of data were used for verification. Figure 12 shows the fuzzy neural network output curve of the algorithm in this paper, Figure 13 shows the error curve of the verification data, and Figure 14 shows the objective function curve of the number of iterations. In the output curve of Figure 12, the line represents the output of neural network.

Table 1 shows the performance comparison results based on root mean square error between this algorithm and other identification algorithms. The comparison criterion is the root mean square error of training data and verification data.

In simulation, the number of fuzzy rules . Figure 15 shows the output curve of the fuzzy neural network for the validation data, Figure 16 shows the error curve of the validation data, and Figure 17 shows the objective function curve of the number of iterations.

Table 2 shows the performance comparison between this algorithm and other identification algorithms. The comparison criterion is the root mean square error of the validation data.

Next, to test the system, 50 students each simulated six emotional states in an English class, yielding 300 samples in total, which were then tested with the speech emotion recognition system on a mobile terminal. The mobile terminal uses the fuzzy neural network model. The test results are shown in Figure 18; the values represent the emotion recognition accuracy for the input speech.

The test results are less accurate than those of the earlier experiments. Possible reasons are as follows:
(1) The training data are insufficient. The experiments are speaker-independent and text-independent recognition experiments across different emotional states. When the training set is too small to cover the individual voice characteristics of all speakers and all possible texts, speaker-independent and text-independent speech emotion recognition on the test set is almost impossible. Deep learning models need large amounts of data for training.
(2) The training data come from professional actors, whereas the testers are ordinary people. The main training samples in this paper are emotions simulated by professional actors, and ordinary people without professional training may express simulated emotions differently. When the training set and the test set are inconsistent in this way, recognition accuracy on the test set decreases.

5. Conclusion

To address the complexity of structure identification in interval type-2 fuzzy neural networks and the dependence of BP training on the initial parameter values, an interval type-2 fuzzy neural network identification algorithm based on the FCM clustering algorithm is proposed. First, the FCM algorithm obtains the cluster centers and membership degrees of the training data, which are used to initialize the centers and widths of the Gaussian membership functions and the consequent parameters of the type-2 fuzzy system. Second, a direct type-reduction algorithm simplifies the training of the network parameters by the BP algorithm. The simulation results show that the proposed algorithm has a simple learning process, good identification accuracy, and fast convergence.

Data Availability

The labeled datasets used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no competing interests.

Acknowledgments

This work is supported by the Shenyang Urban Construction University.