Abstract

The number of online teaching videos is rising rapidly, and how to evaluate the actual effect of these videos objectively and fairly has become a pressing issue. To address this problem, this paper proposes a video learning effect evaluation scheme based on EEG signals and machine learning. The k-nearest neighbor regression algorithm is adopted for the mental workload test because its determination coefficient on the training set reaches 1.0, which no other model achieves. Furthermore, the random forest algorithm is employed for the concentration test, with a determination coefficient of 0.978 on the training set and 0.929 on the test set, both better than existing online learning video evaluation models. Finally, the effect of teaching videos is evaluated based on the learning efficiency of the subjects. A student satisfaction test shows that the scheme can indeed improve students' satisfaction with watching teaching videos, with an improvement rate of 85%. The scheme can not only encourage teachers to continuously improve their teaching, but also provide a more reasonable reference for students when choosing teaching videos.

1. Introduction

Modern online education [1] is a new form of education that has developed with modern information technology. By March 2020, the number of online learning users in China had reached 423 million, an increase of 222 million compared with the end of 2019, accounting for 46.8% of all Internet users [2]. In response to the COVID-19 pandemic, the Polish government decided to shut down all public and private institutions, including schools, from 12 March 2020; since then, 4.58 million students from 24,000 schools have stayed at home and studied online [3]. However, among the many online education platforms, how to objectively and impartially evaluate their teaching videos and choose the videos best suited for learning has become one of the most important problems facing users. At present, users mainly judge the quality of teaching videos based on comments and ratings. These comments are mostly the subjective opinions of specific users, and artificially positive or negative reviews posted by paid commenters (the "online water army") are common.

Electroencephalography (EEG) [4] is a method of recording brain activity using electrophysiological indicators. In 1924, Hans Berger measured the electrical signals of the human brain with an electroencephalogram for the first time [5]. With the cross-application of computer science and brain science, brain-computer interface (BCI) technology has gradually entered practical teaching scenarios. Based on such EEG-related physiological indicators, the real-time state of users can be represented more objectively, the actual influence of various teaching videos on users can be reflected, and the results are less affected by users' subjective feelings.

To avoid the negative effects that subjective methods may produce, this paper proposes a video learning effect evaluation scheme based on EEG signals. (a) First, the EEG signals of the subjects during the mental workload test are collected. The linear and nonlinear features of the EEG signals are then extracted by discrete wavelet transform and fuzzy entropy, respectively. Finally, the "EEG signal-mental workload-efficiency" mapping is constructed with the nearest neighbor regression method. (b) The EEG signals of the subjects during the concentration test are collected, features are extracted by power spectral density analysis, and the "EEG signal-concentration-efficiency" mapping is constructed with the random forest method. (c) Based on the two mappings, the two efficiency indexes are linearly weighted by the entropy method to obtain the overall efficiency. The EEG signals recorded while the subjects watch a teaching video are fed into the models to obtain the overall efficiency value, which is used as the evaluation standard of the teaching video's quality.

The main contributions of this work are summarized as follows:
(1) For the first time, EEG signals, mental workload, and concentration are used to evaluate the teaching effect of online teaching videos, making the evaluation more comprehensive and objective.
(2) Independent component analysis, discrete wavelet transform, fuzzy entropy, and power spectral density are used to preprocess the EEG data and extract features. By comparing six machine learning algorithms, namely support vector machine, neural network, polynomial regression, k-nearest neighbor regression, classification and regression trees, and random forest, the k-nearest neighbor regression algorithm is finally adopted for the mental workload test, with a determination coefficient of 1.0 on the training set and 0.86 on the test set; the random forest algorithm is employed for the concentration test, with a determination coefficient of 0.978 on the training set and 0.929 on the test set. For both tests, the selected machine learning algorithms obtain high determination coefficients, better than existing online learning video evaluation models.
(3) The scheme can provide an objective and valuable reference for students selecting videos and effectively improve students' satisfaction with watching videos, with an improvement rate of 85%. Furthermore, the scheme can monitor the real-time mental workload level and concentration of college students and provide relevant suggestions, so that they can maintain a relatively efficient learning state.

The remainder of the paper is organized as follows. Section 2 describes the related work. Section 3 describes the methodology, including the preliminary and experimental procedure, data analysis, and the preprocessing and feature extraction of the EEG signals. Section 4 describes the experiment design and result analysis, including six machine learning models’ comparison. Section 5 describes the satisfaction test. Section 6 summarizes the whole paper, pointing out some of the problems in this paper and prospects for the future.

2. Related Work

2.1. Evaluation Approaches of Online Teaching Videos

In order to make a reliable evaluation of the effect of online teaching videos, Shen et al. [6] constructed a detailed expert evaluation system based on users' behavior and the Delphi method to evaluate the overall learning effect of learners. From an empirical perspective, Hu et al. [7] built a structural equation model to explore the relationship among online learners' information literacy, level of investment, and learning effectiveness; the AMOS statistical tool was used to verify the fit between the data and the model, with a chi-square to degrees-of-freedom ratio of 2.786 and a model determination coefficient of 0.892. Wang et al. [8] built a new online learning state analysis model that combines users' Q&A results and behaviors; RapidMiner was used to create a decision tree model with an accuracy of 0.925 and a kappa of 0.884, showing that the learning state analysis model is effective.

Although these methods effectively analyze the effect of online learning from different perspectives, they are mainly based on the subjective judgment of subjects or experts, and the sample data can easily be distorted by subjectivity or fabricated. It is therefore particularly important to introduce objective physiological indicators as evaluation variables. Xue et al. [9] measured users' cognitive load through eye-movement experiments and established a corresponding quantitative model. Research on online learning effect evaluation no longer depends solely on the subjective opinions of evaluators; scholars now have more objective physiological indicators to draw on, and the results are more objective and closer to the actual situation.

2.2. Correlation between Mental Workload and Learning Efficiency

Recently, researchers have found a strong link between mental workload and learning efficiency [10]. Yerkes and Dodson [11] summarized the Yerkes–Dodson law, which states that when pressure is too low, the operator tends to become lax, resulting in poor task performance. When pressure is raised appropriately, it acts as a motivator and improves performance. However, when pressure exceeds the operator's capacity, it becomes a hindrance and reduces task performance. Maintaining an appropriate mental workload level is therefore an important prerequisite for maintaining high efficiency.

2.3. Correlation between Concentration and Learning Efficiency

It is well known that concentration is another major factor affecting learning efficiency [12]. If students pay attention when learning, they can follow the guidance of teachers to explore problems, and then they can gain a lot of knowledge and improve their learning ability. Therefore, by monitoring the EEG signals of the subjects when they watch the education video, this paper also obtains the real-time concentration of the subjects and the learning efficiency under this concentration, which is another important basis to evaluate the quality of the teaching video.

3. Methodology

In order to effectively evaluate the quality of online teaching videos, this paper analyzes the EEG signals of subjects in the mental workload test and the concentration test and obtains a learning efficiency index, which is used as the evaluation basis. This section introduces the EEG experimental equipment, the selection of the online teaching videos, the design of the subjective questionnaire, the specific experimental procedure, and the main sources and acquisition methods of the relevant data. It also covers the analysis of the subjects' subjective data, how the label values corresponding to the samples are generated from the subjective data, how the raw EEG signals are filtered and denoised with MNE, how the noise components are distinguished from the normal event components after ICA analysis, and three EEG feature extraction methods: discrete wavelet transform, fuzzy entropy, and power spectral density.

3.1. Preliminary
3.1.1. EEG Signal Collecting

In this study, the Emotiv EPOC Flex Saline EEG acquisition system [13] is used to monitor and collect the subjects' EEG signals. The correspondence between the electrode symbols and the scalp areas is shown in Table 1. The nine electrodes used in this paper are CZ, FZ, F3, C3, PZ, C4, F4, CMS, and DRL, of which CMS and DRL are the reference electrodes (see Figure 1). According to the channel-number comparison experiment in [14], the classification accuracy under two channels (Fz, Pz) is very close to that under 32 channels. Therefore, the two channels Fz and Pz are selected for feature extraction in this paper.

3.1.2. Teaching Video

The teaching video selected in this paper is the course “KMP algorithm” launched by Tianqinlvhui Graduate Entrance Examination Institute on the Bilibili platform, with a total length of 16 minutes.

3.1.3. Subject Description

In this paper, 30 subjects are recruited for the experiment. All of them are computer science students between 18 and 24 years old with a certain professional foundation; however, they have not been exposed to the "KMP algorithm" for a long time and have largely forgotten it. They are in good physical condition and have no history of related neurological diseases such as brain injury.

3.1.4. Public Evaluation Form

The public evaluation form in this paper is mainly an evaluation questionnaire for the public (including experts, teachers, and students). After watching the teaching video, the public needs to fill in the public evaluation form. Each question in the public evaluation form is assigned a certain point value, and the final sum of the points of each question is recorded as the score on the video. The score is mainly used to evaluate the overall quality of the teaching video and is compared with the result obtained by the scheme proposed in this paper. For detailed information, refer to Appendix A.

3.2. Experimental Procedure and Data Analysis
3.2.1. Experimental Procedure

(a) Explain the experiment to the subjects and sign the relevant confidentiality agreement.
(b) Conduct the mental workload test. Subjects wear the Emotiv EEG device and sit in front of the computer. In each round, a string of random digits is displayed on the screen. Subjects must memorize the string within 5 seconds and then complete an algebraic operation formed from specified positions of the string according to the on-screen instructions (no tools such as scratch paper are provided). The background automatically records the subjects' completion time and accuracy. A total of 36 rounds are conducted. For memory difficulty, the number of random digits in the first 3 rounds is 5, and every 3 rounds forms a level; the number of digits increases by 1 per level, so the memory difficulty gradually increases. For the difficulty of the algebraic operations, rounds 1–9 are two digits plus or minus two digits, rounds 10–18 are three digits plus or minus three digits, rounds 19–27 are four digits plus or minus four digits, and rounds 28–36 are two digits multiplied by two digits; the operation difficulty gradually increases.
(c) After the mental workload test, the subjects rest for 15 minutes to relieve tension.
(d) Conduct the concentration test, which is divided into three stages. In the first stage, subjects wear the Emotiv EEG device, sit in front of the computer, and complete 25 rounds of algebraic operation tests without listening to music; the completion time and accuracy are recorded automatically. In the second stage, the subjects complete 25 rounds of algebraic operation tests while listening to α-wave EEG music, and in the third stage, 25 rounds while listening to β-wave EEG music; the background records the time and accuracy in both stages. The three stages are of the same difficulty, with 25 rounds of questions each: rounds 1–5 are two digits plus or minus two digits; rounds 6–10 are three digits plus or minus three digits; rounds 11–15 are four digits plus or minus four digits; rounds 16–20 are two digits multiplied by two digits; and rounds 21–25 are three digits multiplied by three digits. (In "A Brief Introduction to Music and Alpha Frequency of the Brain", the author proposes that "Among the four kinds of brain waves in our brain, the α wave is the most suitable for subconscious activities. The α-wave state promotes inspiration, speeds up data collection, and enhances memory; it is only in this state that memory works best. Therefore, the α wave is also called the learning brain wave. A brain wave frequency above 14 Hz is the β wave; anger, worry, fatigue, pain, and other undesirable feelings are the origin of the β wave." A number of researchers have experimentally confirmed that listening to α brainwave music can improve attention, while listening to β brainwave music can reduce attention. Therefore, in this experiment, α-wave music is used as the inducer to improve concentration and β-wave music as the inducer to reduce concentration, so as to explore efficiency under different levels of concentration.)
(e) Finally, the subjects wear the EEG equipment while watching the prepared teaching video, and the relevant EEG signals are collected. This completes the experiment.
Figure 2 shows the actual experimental scenes of a subject.

3.2.2. Data Analysis

(1) Statistical Analysis of the Public Evaluation Form. A total of 60 public evaluation forms are collected. The ratio of teachers to students among the respondents is 1 : 4.2, and nearly 80% have some prior experience with the KMP algorithm. The results are shown in Table 2.

In order to obtain the comprehensive public evaluation of the video as objectively and fairly as possible and to ensure the accuracy and reliability of the results, this paper calculates the final result by interval estimation, i.e., the possible interval of the overall evaluation result is statistically inferred from the sample results under a given confidence level. With a sample mean of 90.22, the confidence interval of the overall mean evaluation calculated with the t statistic is (85.21, 95.23), which normalizes to (0.8521, 0.9523).
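As an illustration, this kind of t-based interval can be reproduced with a few lines of Python; the score array below is a placeholder standing in for the 60 collected forms, and a 95% confidence level is assumed since the paper does not state one.

```python
import numpy as np
from scipy import stats

# Placeholder scores standing in for the 60 collected evaluation forms
scores = np.random.default_rng(0).normal(90.22, 8.0, size=60)

mean = scores.mean()
sem = stats.sem(scores)            # standard error of the mean
df = len(scores) - 1               # degrees of freedom for the t statistic

# Confidence interval of the population mean based on the t distribution
low, high = stats.t.interval(0.95, df, loc=mean, scale=sem)

# Normalize by the maximum possible score (100)
print((low, high), (low / 100, high / 100))
```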

(2) Analysis of Learning Efficiency. In order to calculate the real-time learning efficiency of subjects in the mental workload and concentration tests, this paper records the subjects' completion of questions at fixed intervals and calculates the efficiency in each period with a moving average. The efficiency in the mental workload and concentration tests is obtained using the following formula:

\[ E(t) = \frac{1}{i} \sum_{j=1}^{m} n_j(t)\, d_j, \]

where E(t) is the learning efficiency at time t (evaluated every Δt seconds), Δt is the sampling interval, i is the size of the statistical time window, m is the number of question categories, n_j(t) is the number of class-j questions completed correctly in the previous window of length i, and d_j is the difficulty value of class-j questions. When calculating the efficiency in the mental workload test, we set Δt, i, and m to 3, 60, and 12, respectively; when calculating the efficiency in the concentration test, we set Δt, i, and m to 3, 60, and 5, respectively. The difficulty values d_j of all questions are shown in Table 3. In addition, the difficulty value of memorizing a 5-digit string is 1, and each additional digit increases the difficulty value by 1.
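A minimal sketch of this moving-average computation is given below; the function name, the event bookkeeping, and the example difficulty values are illustrative assumptions, while the window and step follow the mental workload settings above (Δt = 3 s, i = 60 s).

```python
import numpy as np

def efficiency_curve(event_times, event_classes, difficulty, window=60, step=3):
    """Moving-average learning efficiency (a sketch of the formula above).

    event_times   : times (s) at which questions were answered correctly
    event_classes : class index j of each correctly answered question
    difficulty    : dict mapping class j -> difficulty value d_j
    window        : statistical time window i (s)
    step          : sampling interval delta-t (s)
    """
    event_times = np.asarray(event_times, dtype=float)
    classes = np.asarray(event_classes)
    t_end = event_times.max() if len(event_times) else 0.0
    curve = []
    for t in np.arange(window, t_end + step, step):
        mask = (event_times > t - window) & (event_times <= t)
        score = sum(difficulty[c] for c in classes[mask])
        curve.append((t, score / window))
    return curve

# Hypothetical usage: three correct answers with assumed difficulty values
print(efficiency_curve([65, 80, 110], [1, 2, 1], {1: 1.0, 2: 2.0}))
```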

3.3. Preprocessing
3.3.1. MEG + EEG Analysis and Visualization (MNE)

In this paper, we use Python’s MNE library to preprocess the EEG data. MNE is an open source library of Python, which is used to process and visualize MEG, EEG, SEEG, ECOG, and other physiological signals.

3.3.2. Data Filtering and Noise Reduction

The EEG data collection process often contains a great deal of noise, such as environmental noise, the subjects' eye-movement artifacts, and even noise introduced by the operators; this noise distorts the collected EEG signals. In this paper, in order to obtain clean EEG signals, noise reduction is performed through filtering and independent component analysis based on MNE.

First, since EEG signals are mainly distributed in the 0.5–50 Hz frequency band, we apply Fourier band-pass filtering [15] with the filter function of MNE to retain the 0.5–50 Hz components of the EEG signals. The second step is independent component analysis (ICA) [16]. As an effective blind source separation method, ICA plays a key role here because the signals measured by the device contain artifacts such as ocular and muscle activity and are therefore not pure EEG signals.

In this paper, eye movement, head movement, and other noise sources are separated using the ICA class of MNE. The ICA result for one EEG recording is shown in Figure 3. In the brain topographic maps, red represents positive values, blue represents negative values, and color depth represents the absolute value. Each component is then analyzed in detail by clicking on it. For example, Figure 4 shows the details of ICA000 (the ICA component number). The energy of the ICA000 component is high in the low-frequency band and decreases as the frequency increases. Moreover, the ICA000 component is mainly concentrated in the left front part of the head and is an eye-movement component; thus, it can be eliminated. The remaining components ICA001–ICA009 are analyzed and denoised in the same way.
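The sketch below shows how such preprocessing can be done with MNE; the file name and format are hypothetical, and excluding component 0 assumes the eye-movement component is ICA000, as in the example above.

```python
import mne

# Hypothetical recording file; in practice the Emotiv data are loaded here
raw = mne.io.read_raw_edf("subject01.edf", preload=True)

# Band-pass filter to the 0.5-50 Hz range described above
raw.filter(l_freq=0.5, h_freq=50.0)

# Fit ICA with 10 components (ICA000-ICA009)
ica = mne.preprocessing.ICA(n_components=10, random_state=42)
ica.fit(raw)
ica.plot_components()                  # topographic maps, cf. Figure 3
ica.plot_properties(raw, picks=[0])    # detail view of ICA000, cf. Figure 4

# Mark the ocular component and reconstruct the cleaned signal
ica.exclude = [0]
raw_clean = ica.apply(raw.copy())
```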

After band-pass filtering and independent component analysis, the noise is successfully removed while the data features are retained. The denoised EEG data provide a solid basis for the next step of feature extraction.

3.4. Feature Extraction

At present, there are many feature extraction methods; in the field of EEG signals, they mainly include the following: (1) time-domain analysis methods, such as peak skewness and coherent averaging, which mainly analyze the fluctuations and changes of EEG signals in the time domain; (2) frequency-domain analysis methods, including the autoregressive (AR) model and the fast Fourier transform (FFT), which split the signal into different frequency bands; and (3) time-frequency analysis methods, mainly wavelet analysis. Although the first two classes of methods have their own advantages, using time-domain or frequency-domain analysis alone cannot capture the characteristics of the other domain, which highlights the advantage of wavelet analysis: it has both time-domain and frequency-domain characteristics and allows a more accurate analysis of the EEG signals. Of course, wavelet-based time-frequency analysis is not complete either and cannot fully reflect the nature of the EEG signal, which has motivated the development of more complex analysis methods such as approximate entropy and fuzzy entropy.

3.4.1. Feature Extraction in the Mental Workload Test

(1) Discrete Wavelet Transform (DWT)-Based Features. The wavelet transform [17] is a time-frequency localized signal analysis tool; through operations such as scaling and translation, the signal can be refined at multiple scales, yielding its characteristics in both the time domain and the frequency domain. The discrete wavelet transform [18] is a discrete form of the wavelet transform, and its computation time is significantly lower than that of other signal processing techniques.

Given the discrete EEG signal x(t) collected in the experiments, the discrete wavelet transform of x(t) is defined as

\[ W_x(a, b) = \frac{1}{\sqrt{a}} \sum_{t} x(t)\, \psi\!\left(\frac{t - b}{a}\right), \]

where t is the time variable, a is the scale factor, b is the shift factor, and ψ(·) is the wavelet basis function. According to the Mallat algorithm, the EEG signal is decomposed into the sum of an approximate signal and the detail signals of each layer:

\[ x(t) = A_L(t) + \sum_{l=1}^{L} D_l(t), \]

where L is the number of decomposition layers, A_L(t) is the approximate signal, and D_l(t) is the detail signal of layer l. Using the orthogonal order-4 Daubechies wavelet (db4), with a = 2 and b = 0, a five-level decomposition of the original EEG signals is performed. The sampling frequency of the decomposed signals is 128 Hz, and the sampling time is 3 s.

After wavelet decomposition of the EEG signals, the frequency bands corresponding to each wavelet subband are shown in Table 4. A large number of studies have shown that changes in the θ, α, and β bands are closely related to changes in mental workload [19]; the corresponding frequency range is 4–32 Hz. After wavelet decomposition, this range mainly falls in the subband signals D2, D3, and D4. Therefore, we mainly extract the linear features of the wavelet coefficients of these three subbands.

It is known that the wavelet coefficients obtained after the five-level wavelet decomposition are C_{s,k}, s ∈ {D2, D3, D4}, k = 1, 2, …, N_s, where s denotes the subband signal, k is the index of the coefficient within the subband, and the number of coefficients N_s differs depending on the decomposition level.

Feature 1: the mean of the absolute values of the wavelet coefficients in the three frequency bands:

\[ \mathrm{MAV}_s = \frac{1}{N_s} \sum_{k=1}^{N_s} \lvert C_{s,k} \rvert. \]

Feature 2: the average power of the wavelet coefficients in the three frequency bands:

\[ P_s = \frac{1}{N_s} \sum_{k=1}^{N_s} C_{s,k}^{2}. \]

Feature 3: the standard deviation of the wavelet coefficients in the three frequency bands:

\[ \mathrm{SD}_s = \sqrt{\frac{1}{N_s} \sum_{k=1}^{N_s} \left( C_{s,k} - \mu_s \right)^{2}}, \]

where N_s is the number of wavelet coefficients of each frequency band and μ_s is the mean of the wavelet coefficients of that band.
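A compact sketch of this feature computation is shown below; it assumes the PyWavelets (pywt) package and the subband-to-frequency mapping stated above for a 128 Hz signal, with a random placeholder segment standing in for one event.

```python
import numpy as np
import pywt

def dwt_band_features(signal, wavelet="db4", level=5):
    """Linear features of the wavelet coefficients (sketch; pywt assumed).

    wavedec returns [A5, D5, D4, D3, D2, D1]; with a 128 Hz sampling rate,
    D4, D3, and D2 roughly cover 4-8, 8-16, and 16-32 Hz.
    """
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    bands = {"D4": coeffs[2], "D3": coeffs[3], "D2": coeffs[4]}
    features = {}
    for name, c in bands.items():
        c = np.asarray(c)
        features[f"{name}_mean_abs"] = np.mean(np.abs(c))   # Feature 1
        features[f"{name}_avg_power"] = np.mean(c ** 2)     # Feature 2
        features[f"{name}_std"] = np.std(c)                 # Feature 3
    return features

# One event = 3 s of data at 128 Hz (placeholder EEG segment)
event_signal = np.random.randn(384)
print(dwt_band_features(event_signal))
```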

In this paper, each problem the subject solves (memorization plus algebraic operation) is taken as an event; the original EEG signals are segmented by event, and the linear features described above are calculated event by event.

(2) Fuzzy Entropy-Based Feature. The fuzzy entropy [20] of EEG signals corresponding to each event is extracted as its nonlinear feature in this paper.

The specific idea of fuzzy entropy is as follows:
(1) Consider an M-point sampling sequence

\[ \{u(i) : 1 \le i \le M\}. \]

(2) A set of m-dimensional vectors is generated by reconstructing the sequence in serial order:

\[ X_i^{m} = \{u(i), u(i+1), \ldots, u(i+m-1)\} - \bar{u}(i), \]

where \(\bar{u}(i)\) is the mean of the m samples starting at i.
(3) The distance between two m-dimensional vectors X_i^m and X_j^m is defined as the largest difference between their corresponding elements, i.e.,

\[ d_{ij}^{m} = \max_{k \in \{0, \ldots, m-1\}} \left| \big(u(i+k) - \bar{u}(i)\big) - \big(u(j+k) - \bar{u}(j)\big) \right|. \]

(4) A fuzzy membership function is used to define the similarity of X_i^m and X_j^m, i.e.,

\[ D_{ij}^{m} = \exp\!\left( - \frac{\left(d_{ij}^{m}\right)^{n}}{r} \right), \]

where n and r are the gradient and width of the fuzzy function.
(5) Repeat steps (2) to (4) for the (m+1)-dimensional reconstruction vectors. The function φ is defined as

\[ \varphi^{m}(n, r) = \frac{1}{M - m} \sum_{i=1}^{M-m} \left( \frac{1}{M - m - 1} \sum_{j=1, j \ne i}^{M-m} D_{ij}^{m} \right). \]

(6) Fuzzy entropy is defined as

\[ \mathrm{FuzzyEn}(m, n, r) = \ln \varphi^{m}(n, r) - \ln \varphi^{m+1}(n, r). \]
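A minimal NumPy implementation of these steps is sketched below; the parameter values (m = 2, r = 0.2·std, n = 2) are common defaults and are assumptions rather than the paper's settings.

```python
import numpy as np

def fuzzy_entropy(u, m=2, r=0.2, n=2):
    """Fuzzy entropy of a 1-D sequence (minimal sketch of the steps above)."""
    u = np.asarray(u, dtype=float)
    M = len(u)
    r = r * np.std(u)              # tolerance scaled by the signal's std

    def phi(dim):
        # Step (2): mean-removed dim-dimensional reconstruction vectors
        # (M - m vectors are used for both dim = m and dim = m + 1 so the counts match)
        X = np.array([u[i:i + dim] - u[i:i + dim].mean() for i in range(M - m)])
        # Step (3): Chebyshev distance between all pairs of vectors
        d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
        # Step (4): exponential fuzzy membership function
        D = np.exp(-(d ** n) / r)
        # Step (5): average similarity, excluding self-matches
        np.fill_diagonal(D, 0.0)
        return D.sum() / ((M - m) * (M - m - 1))

    # Step (6): fuzzy entropy
    return np.log(phi(m)) - np.log(phi(m + 1))

print(fuzzy_entropy(np.random.randn(384)))   # one 3 s event at 128 Hz
```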

(3) Feature Aggregation and Normalization. Combining the above linear and nonlinear features yields the single-channel feature group of an event. Table 5 shows the extracted features. For each event, 9 linear wavelet features and 1 nonlinear fuzzy entropy feature are extracted on the Fz and Pz channels, respectively, i.e., each event has 20 features in total. In order to eliminate the influence of differing scales between features, the final feature matrix is normalized with the Min-Max method.

Finally, a total of 1080 samples with 20-dimensional EEG features are extracted and divided into a training set of 756 samples and a test set of 324 samples (a 7 : 3 ratio). The efficiency label corresponding to each sample, used as the output, is calculated according to the efficiency formula introduced in Analysis of Learning Efficiency in Section 3.2.2.

3.4.2. Feature Extraction in the Concentration Test

In this paper, the power spectral density (PSD) [21] method is used to analyze the EEG signals, and the linear characteristics are extracted based on the results.

Figure 5 shows the image obtained using MNE to analyze the power spectral density of a single channel event. The abscissa is the frequency and the ordinate is the spectral density. The topographic map of each frequency band is shown in Figure 6.

Since concentration is mainly related to the α wave and the β wave, the spectral density within the α and β frequency ranges is extracted as the feature.

Feature 1: the mean of the absolute values of the spectral density:

\[ \mathrm{MAV} = \frac{1}{N} \sum_{f=1}^{N} \lvert P(f) \rvert. \]

Feature 2: the average power of the spectral density:

\[ P_{\mathrm{avg}} = \frac{1}{N} \sum_{f=1}^{N} P(f)^{2}. \]

Feature 3: the standard deviation of the spectral density:

\[ \mathrm{SD} = \sqrt{\frac{1}{N} \sum_{f=1}^{N} \left( P(f) - \bar{P} \right)^{2}}, \]

where P(f) is the spectral density at frequency f, N is the number of frequency points in the band, and \(\bar{P}\) is the average spectral density of the band.

In this paper, three features are extracted from the α wave and the β wave in a single channel, for a total of 6 features per channel. Since the two channels Fz and Pz are selected for feature extraction, there are 12 features in total, and the linear features above are calculated event by event. Table 6 shows the datasets.
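The sketch below computes these features with SciPy's Welch estimator; the α/β band edges and the Welch parameters are assumptions for illustration, and the random segment is a placeholder for one event on one channel.

```python
import numpy as np
from scipy.signal import welch

def psd_band_features(signal, fs=128, bands=None):
    """Spectral-density features per band for one channel of one event."""
    if bands is None:
        bands = {"alpha": (8, 13), "beta": (13, 30)}   # assumed band edges
    freqs, psd = welch(signal, fs=fs, nperseg=min(256, len(signal)))
    features = {}
    for name, (lo, hi) in bands.items():
        p = psd[(freqs >= lo) & (freqs < hi)]
        features[f"{name}_mean_abs"] = np.mean(np.abs(p))                 # Feature 1
        features[f"{name}_avg_power"] = np.mean(p ** 2)                   # Feature 2
        features[f"{name}_std"] = np.sqrt(np.mean((p - p.mean()) ** 2))   # Feature 3
    return features

# One 3 s event at 128 Hz; Fz and Pz are processed the same way (12 features total)
print(psd_band_features(np.random.randn(384)))
```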

Finally, a total of 2250 samples with 12-dimensional EEG features are extracted and divided into a training set of 1800 samples and a test set of 450 samples (an 8 : 2 ratio). The efficiency label corresponding to each sample, used as the output, is calculated according to the efficiency formula introduced in Analysis of Learning Efficiency in Section 3.2.2.

4. Experiment and Results

4.1. Comparison of Machine Learning Methods

After denoising and feature extraction, the next step is to feed the EEG feature data into machine learning models to predict the learning efficiency value. In this paper, six machine learning algorithms are applied to the mental workload test and the concentration test, respectively, and the optimal model is selected by comparing the determination coefficients.

4.1.1. Support Vector Regression (SVR)

Support vector regression [22] is the application of the Support Vector Machine (SVM) [23] to regression modeling. In this paper, the SVR is implemented in Python with Sklearn's built-in SVR function. For the mental workload test, the EEG signal features are used as the input and the learning efficiency score of each sample as the output.

In this paper, the linear, polynomial, Gaussian, and Sigmoid kernel functions are used for SVR training to examine the influence of different kernels on the results. Figure 7 shows the accuracy of the training set and the test set under different penalty factors for each kernel. The accuracy under the Sigmoid kernel decreases monotonically and even becomes negative, so it is not suitable for this test. Moreover, the Gaussian kernel achieves higher accuracy than the linear and polynomial kernels on both the training set and the test set. Therefore, the Gaussian kernel is selected in this paper, with the penalty factor C = 80, at which the test set accuracy is highest.

Then, in order to select the optimal value of the gamma parameter (a hyperparameter of the Gaussian kernel that implicitly determines the distribution of the data after mapping to the new feature space), gamma is tuned iteratively. Figure 8 shows the accuracy of the training set and the test set for different gamma values under the Gaussian kernel with a penalty factor of 80. Gamma is set to 0.5, giving accuracies of 80.39% and 72.52% on the training set and the test set, respectively.
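For reference, a minimal scikit-learn sketch with the selected parameters is shown below; the feature matrix and labels are random placeholders standing in for the 1080 × 20 normalized EEG features and efficiency labels of Section 3.4.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)                       # placeholder data
X, y = rng.random((1080, 20)), rng.random(1080)
Xtr, Xte, ytr, yte = X[:756], X[756:], y[:756], y[756:]   # 7 : 3 split

# Gaussian (RBF) kernel with the parameters selected above (C = 80, gamma = 0.5)
svr = SVR(kernel="rbf", C=80, gamma=0.5)
svr.fit(Xtr, ytr)
print(svr.score(Xtr, ytr), svr.score(Xte, yte))      # R^2 on train / test
```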

For the concentration test, the process is the same as for the mental workload test. The Gaussian kernel (gamma = 0.3, C = 100) performs best among the parameter combinations, but its accuracy is only 34.1% on the training set and 41% on the test set. The overall performance is poor, indicating that SVR is not suitable for the concentration test.

4.1.2. Fully Connected Neural Network (FCNN)

This paper uses a four-layer feedforward neural network [24] to form a fully connected neural network to predict the output values in the mental workload test.

In order to obtain the most appropriate model parameters, we first set the learning rate of the neural network to 0.005, use the SGD optimizer, take the mean square error (MSE) as the loss function, and train models with different numbers of neurons in the hidden layers. The training-set loss curves of some models over the iterations are shown in Figure 9. The first number in the figure is the number of neurons in the first hidden layer, and the second number is that of the second hidden layer. As the number of iterations increases, the loss values keep decreasing and converge to a small, stable range.

The loss value of the test set after 1000 iterations of each model is shown in Table 7. When the numbers of neurons are (20, 24), the loss value is the minimum and the prediction effect is the best. Figure 10 shows how the training-set loss changes with the number of iterations at different learning rates, and Table 8 shows the test-set loss at different learning rates. When the learning rate is 0.001, the model learns too slowly and the efficiency is too low; when the learning rate is 1, the training-set loss fluctuates, indicating that the learning rate is too high. Comparing the test-set losses, the best result is obtained when the learning rate is 0.005.

Then, with the optimal neuron numbers (20, 24) and a learning rate of 0.005, the Sigmoid, ReLU, Tanh, and Softplus functions are used as activation functions in turn. The experimental results are shown in Table 9. Although the Tanh function has the smallest loss on the training set, it has the largest loss on the test set, indicating weak generalization in this test. The Sigmoid function has a higher training-set loss than Tanh but the smallest test-set loss, better avoiding the overfitting seen with Tanh. Therefore, the Sigmoid function is selected as the activation function of the neural network in this paper.
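The paper's network appears to be a custom implementation; a rough scikit-learn equivalent with the selected settings (hidden layers of 20 and 24 neurons, Sigmoid activation, SGD, learning rate 0.005, MSE loss) is sketched below on placeholder data.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)                       # placeholder data
X, y = rng.random((1080, 20)), rng.random(1080)
Xtr, Xte, ytr, yte = X[:756], X[756:], y[:756], y[756:]

fcnn = MLPRegressor(hidden_layer_sizes=(20, 24), activation="logistic",
                    solver="sgd", learning_rate_init=0.005,
                    max_iter=1000, random_state=42)
fcnn.fit(Xtr, ytr)
print(fcnn.score(Xtr, ytr), fcnn.score(Xte, yte))    # R^2 on train / test
```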

For the concentration test, we find that a network with (45, 60) neurons, a learning rate of 0.001, and the Sigmoid activation function performs best among the tested combinations, reaching accuracies of 96.1% and 91.3% on the training set and the test set, respectively, indicating good overall performance.

4.1.3. Polynomial Regression (PR)

Polynomial regression [25] models the regression function as a polynomial of the regressor variables. This paper implements polynomial regression in Python using Sklearn's built-in polynomial regression functionality and applies it to the mental workload and concentration tests, respectively. The experimental data are fitted with polynomials of different degrees, and the determination coefficient is used as the evaluation index. The determination coefficients and mean square errors obtained from the two tests are shown in Figures 11 and 12, respectively.

In the mental workload test, when the degree of the polynomial is 1, the determination coefficient of the training set is 0.786 and that of the test set is 0.736. As the degree increases, the determination coefficient of the training set becomes 1 and that of the test set drops below 0; thus, the method overfits for degrees of 2 or more, and first-degree polynomial regression gives the better fit. In the concentration test, when the degree is 1, the determination coefficient of the training set is 0.496 and that of the test set is 0.449; as the degree increases, the method starts to overfit, indicating that polynomial regression is not suitable for the concentration test.
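A short scikit-learn sketch of the retained degree-1 setting is given below; the data are the same kind of random placeholders used in the earlier sketches, and raising the degree is what produces the overfitting described above.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)                       # placeholder data
X, y = rng.random((1080, 20)), rng.random(1080)
Xtr, Xte, ytr, yte = X[:756], X[756:], y[:756], y[756:]

# Degree-1 (linear) fit, the setting retained above; higher degrees overfit
pr = make_pipeline(PolynomialFeatures(degree=1), LinearRegression())
pr.fit(Xtr, ytr)
print(pr.score(Xtr, ytr), pr.score(Xte, yte))        # R^2 on train / test
```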

4.1.4. K-Nearest Neighbor Regression (KNR)

The k-nearest neighbor algorithm (KNN) is a nonparametric method used for classification and regression [26]. This paper implements KNR in Python using Sklearn's built-in KNeighborsRegressor function. The experimental data are fitted with different values of K, and the determination coefficient is adopted as the evaluation index. The results are shown in Figures 13 and 14. In the mental workload test, when K is 1, the training set's determination coefficient is 1.0 and the test set's is 0.86, which is the optimal value. In the concentration test, when K is 2, the determination coefficient of the training set is 0.968 and that of the test set is 0.939; as K increases, the determination coefficient of the test set declines. Thus, with K = 2 the method performs well on both the training set and the test set.
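A minimal sketch of the selected setting is shown below on placeholder data; K = 1 is the value chosen for the mental workload test (K = 2 is used for the concentration test).

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)                       # placeholder data
X, y = rng.random((1080, 20)), rng.random(1080)
Xtr, Xte, ytr, yte = X[:756], X[756:], y[:756], y[756:]

knr = KNeighborsRegressor(n_neighbors=1)
knr.fit(Xtr, ytr)
print(knr.score(Xtr, ytr), knr.score(Xte, yte))      # R^2 on train / test
```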

4.1.5. Classification and Regression Trees (CART)

Classification and regression trees (CART) were first proposed by Breiman and Friedman in 1984 [27]. Unlike the ID3 [28] and C4.5 decision trees, which may split a node into multiple children, the CART decision tree uses binary splits and thus has a binary tree structure.

This paper implements CART in Python using Sklearn's built-in DecisionTreeRegressor function. The experimental data are fitted with different tree depths, and the determination coefficient is used as the evaluation index. As shown in Figures 15 and 16, in the mental workload test, when the depth of the regression tree is 3, the determination coefficient of the training set is 0.866 and that of the test set is 0.794; the test-set determination coefficient is the highest and the mean square error is the lowest. In the concentration test, when the depth is 7, the determination coefficient of the training set is 0.955 and that of the test set is 0.856; again the test-set determination coefficient is the highest and the mean square error is the lowest. Thus, a CART depth of 3 performs best in the mental workload test, and a depth of 7 performs best in the concentration test.
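A corresponding scikit-learn sketch on placeholder data is given below; max_depth = 3 matches the mental workload setting (depth 7 is used for the concentration test).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)                       # placeholder data
X, y = rng.random((1080, 20)), rng.random(1080)
Xtr, Xte, ytr, yte = X[:756], X[756:], y[:756], y[756:]

cart = DecisionTreeRegressor(max_depth=3, random_state=42)
cart.fit(Xtr, ytr)
print(cart.score(Xtr, ytr), cart.score(Xte, yte))    # R^2 on train / test
```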

4.1.6. Random Forest (RF)

Random forest is an algorithm combining the CART algorithm and the Bagging algorithm, proposed by Leo Breiman in 2001 [29, 30]. In this paper, the RF is implemented in Python using Sklearn's built-in RandomForestRegressor function. The experimental data are fitted with different numbers of trees, and the determination coefficient is used as the evaluation index. As shown in Figures 17 and 18, in the mental workload test, when the number of trees is 13, the determination coefficient of the training set is 0.965 and that of the test set is 0.843; the test-set determination coefficient is the highest and the mean square error is the lowest. In the concentration test, when the number of trees is 18, the determination coefficient of the training set is 0.978 and that of the test set is 0.929; again the test-set determination coefficient is the highest and the mean square error is the lowest.
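A scikit-learn sketch with 18 trees, the setting selected for the concentration test, is shown below on placeholder concentration-test-sized data (13 trees are used for the mental workload test).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)                       # placeholder data (2250 x 12)
X, y = rng.random((2250, 12)), rng.random(2250)
Xtr, Xte, ytr, yte = X[:1800], X[1800:], y[:1800], y[1800:]   # 8 : 2 split

rf = RandomForestRegressor(n_estimators=18, random_state=42)
rf.fit(Xtr, ytr)
print(rf.score(Xtr, ytr), rf.score(Xte, yte))        # R^2 on train / test
```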

4.1.7. Summary of Various Machine Learning Methods

Comparing the machine learning methods on the mental workload test and the concentration test, each method is evaluated with the parameters that give its best results; the experimental results are shown in Table 10.

As can be seen from the table, for the mental workload test, the k-nearest neighbor regression algorithm with K = 1 performs well on both the training set and the test set. For the concentration test, the k-nearest neighbor regression algorithm with K = 2 has the highest determination coefficient on the test set, while the random forest with 18 trees has the highest determination coefficient on the training set; both perform well.

Therefore, in the subsequent overall efficiency prediction, the k-nearest neighbor regression algorithm with K = 1 is adopted for the mental workload test, and the random forest algorithm with 18 trees is adopted for the concentration test.

4.2. Experiments on the Overall Efficiency Prediction

Based on the above results, the EEG signal of a subject (e.g., subject A) recorded while watching a teaching video is fed into the mental workload model and the concentration model, yielding the curves of subject A's learning efficiency over time shown in Figure 19.

In order to obtain the overall efficiency value, we implement the entropy method [31] in Python and linearly combine the mental workload learning efficiency and the concentration learning efficiency into an overall efficiency; the curve of the overall efficiency over time is shown in Figure 20.
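A minimal sketch of this entropy-weight combination is shown below; the efficiency curves are random placeholders, and the standard entropy-weight formulation is assumed since the paper does not spell out its exact variant.

```python
import numpy as np

def entropy_weights(X):
    """Entropy-method weights for the columns of X (sketch of the standard method).

    X : (n_samples, n_indicators) matrix, here the per-time-point mental
        workload efficiency and concentration efficiency of one subject.
    """
    X = np.asarray(X, dtype=float)
    P = X / X.sum(axis=0)                               # column-wise proportions
    P = np.where(P == 0, 1e-12, P)                      # avoid log(0)
    e = -np.sum(P * np.log(P), axis=0) / np.log(len(X)) # information entropy per column
    d = 1.0 - e                                         # degree of diversification
    return d / d.sum()                                  # normalized weights

# Placeholder efficiency curves of one subject while watching a video
workload_eff = np.random.rand(960)
concentration_eff = np.random.rand(960)
X = np.column_stack([workload_eff, concentration_eff])
w = entropy_weights(X)
overall_efficiency = X @ w          # linearly weighted overall efficiency curve
print(w, overall_efficiency[:5])
```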

4.3. Analysis of Results

According to the above overall efficiency calculation method, we obtain the overall efficiency curves of the 30 subjects; those of some subjects are shown in Figure 21. The mean overall efficiency of each subject is then calculated, as shown in Figure 22. In Figure 22, the two green lines represent the upper and lower bounds of the interval estimate of the questionnaire data described in Statistical Analysis of the Public Evaluation Form in Section 3.2.2, and each small red triangle is the mean overall efficiency of one subject.

It can be seen from Figure 22 that the mean overall efficiency of the 30 subjects mostly falls within the interval estimate (0.8521, 0.9523), indicating that the efficiency-based evaluation of online teaching video quality is largely consistent with the experts' subjective evaluation while avoiding errors caused by subjectivity, making it more objective and reliable.

5. Discussion

The above experiment has demonstrated the validity and rationality of the proposed EEG-based scheme for evaluating a teaching video's quality; whether the scheme can recommend teaching videos that improve students' learning satisfaction is discussed in this section.

First of all, 10 KMP teaching videos are selected from the Bilibili platform in this experiment and are numbered A-J respectively, as shown in Table 11.

Then, the EEG signals of the 30 subjects watching the 10 teaching videos are collected, i.e., there are 30 EEG recordings for each of videos A to J. Our scheme is applied to compute the efficiency curves of the 30 EEG recordings for each video, and the average of the 30 efficiency curves gives the average efficiency curve of each of videos A–J, as shown in Figure 23.

Over the whole time segment, the efficiency is summed and averaged to obtain the average efficiency of each video, which is taken as the basis for grading the teaching effect, as shown in Table 12.

As can be seen from the table, among the 10 KMP videos selected from the Bilibili platform, the score of video D is the highest and is the lowest.

In order to test the improvement of students' learning satisfaction brought by the scheme, 20 students of different majors and ages are recruited and asked to randomly select a video from A to J (excluding D) to watch and then fill in the satisfaction survey questionnaire (see Appendix B for details). Each question is scored with an integer from 0 to 10, and the total is taken as the student's satisfaction score. After watching the randomly selected video, the student then watches video D, which received the highest score under the proposed scheme, and fills in the questionnaire again. The experimental results are shown in Table 13.

The table shows that after watching video D, the satisfaction scores generally increase: only two remain the same and one decreases, giving an improvement rate of 85%. Thus, the scheme can effectively provide students with efficient teaching videos and improve their learning satisfaction.

6. Conclusion

This paper proposes an objective and effective scheme for evaluating the effect of video teaching based on EEG signals and machine learning. First, we collect the EEG signals of the subjects during the mental workload test, the concentration test, and the video-watching test, and preprocess them with the MNE library. Then, features of the EEG signals from the different tests are extracted with wavelet analysis, fuzzy entropy, and power spectral density. Finally, we use the nearest neighbor regression algorithm and the random forest algorithm to build the two prediction mappings, and the entropy method is introduced to calculate the overall efficiency. Combined with the public evaluation statistics, the results show that the learning efficiency of the subjects lies within the range of the public evaluation results, indicating that the proposed scheme can evaluate the actual teaching effect of teaching videos from the subjects' learning efficiency.

The scheme proposed in this paper moves beyond traditional subjective evaluation and exploits the advantages of EEG signals, which cannot be forged or copied, to accurately reflect the subjects' real-time concentration and mental workload trends while watching a teaching video, thereby measuring teaching quality more objectively. For teachers, the scheme provides not only an overall evaluation of their teaching quality but also the students' real-time concentration and mental workload trends, pointing out directions for adjusting the teaching difficulty. For students, the scheme provides their real-time concentration and mental workload while watching the video: when the mental workload is high, students can pause and rest; when concentration is low, a warning can remind them to refocus on learning. Applied downstream, the scheme can genuinely improve teachers' teaching quality and also improve students' satisfaction with watching videos, with an improvement rate of 85%.

However, the scheme still has some limitations: the collected EEG signals are not ideal, and the machine learning algorithms considered are not comprehensive. To address these limitations, we plan to make corresponding improvements in the future: first, if conditions permit, use more accurate acquisition equipment to obtain cleaner EEG signals; second, try deep learning algorithms when enough subjects and data are available, and explore the optimal model to make the evaluation results more convincing.

Appendix

A. Public Evaluation Form

(1) Do you have a background in computer science? [single choice] ( )
(A) Yes (B) No
(2) Your current occupation (questions 5, 6, and 7 are logically nested) [single choice] ( )
(A) Teacher (B) Student (C) Other
(3) Have you learned the "KMP Algorithm"? [single choice] ( )
(A) Yes (B) No (C) Currently studying it
(4) As an introduction to the first lesson of the KMP algorithm, how do you rate the content quality of this video? [enter a number from 0 (bad) to 100 (good)] [ ]
(5) Have you gained any helpful new knowledge from this video? [single choice] ( )
(A) The gains are great and helpful for future study (B) Some gains, but not much help for the future (C) Not much
(6) Please rate this learning video [enter a number from 0 (bad) to 100 (good)] [ ]
(7) What areas do you think need improvement? [multiple choice] ( )
(A) The content should be more detailed (B) The teaching methods should be improved (C) The difficulty of the content is not appropriate (D) The courseware should be revised and adjusted (E) No improvement needed

B. Satisfaction Survey Questionnaire

(1) Whether the teaching video is clear [enter a number from 0 (bad) to 10 (good)]
(2) Whether the teacher's speaking speed is appropriate [enter a number from 0 (bad) to 10 (good)]
(3) Whether the teacher lectures in Mandarin Chinese and expresses ideas appropriately [enter a number from 0 (bad) to 10 (good)]
(4) Whether the PPT or blackboard writing used by the teacher is clean and attractive [enter a number from 0 (bad) to 10 (good)]
(5) Whether the teaching video is comprehensive and informative [enter a number from 0 (bad) to 10 (good)]
(6) Whether the teacher explains logically and with emphasis [enter a number from 0 (bad) to 10 (good)]
(7) Whether the teaching flow is natural and smooth [enter a number from 0 (bad) to 10 (good)]
(8) Whether the teaching method is inspiring, innovative, and not scripted [enter a number from 0 (bad) to 10 (good)]
(9) Whether the atmosphere in the teaching process is lively and not boring [enter a number from 0 (bad) to 10 (good)]
(10) Give the video a comprehensive score [enter a number from 0 (bad) to 10 (good)]

Data Availability

The data are not freely available owing to the privacy of the subjects' EEG signals; after consulting the subjects, we decided not to disclose the data.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 62072252 and 61772285), Key Project of Educational Reform of Higher Education in Jiangsu Province (no. 2021JSJG144), Bidding Project of Nanjing University of Posts and Telecommunications (no. JG00419JX63), and the Project of Computer Education Research Association of Chinese Universities under Grant CERACU2021R05.