Abstract

A classroom teaching quality evaluation system enables a school’s functional departments to accurately assess the performance of the teaching staff and the current state of teaching operations. In line with the requirements for cultivating high-quality talent, teaching staff development and teaching reforms need to be carried out in a planned way to support teachers’ appointments. Improving the system makes the appointment process more scientific by giving due attention to the individual characteristics of all types of teachers when hiring them for related posts. The system encourages a love of teaching, a high academic level, a high teaching level, and healthy competition in teaching. In recent years, the rapid development of artificial intelligence and deep learning has led many colleges and universities to set targets for campus digitization and education informatization. The state of the classroom is a critical reference throughout the teaching and learning process for evaluating students’ acceptance of the course and the quality of the teaching. However, at present, classroom status is mainly analyzed manually, which distracts teachers and is not very precise. Therefore, finding a method that improves the efficiency of classroom status analysis has great research significance. This study uses a deep neural network to read each class’s video recording and analyze it in terms of students’ behavior and attendance. The system can recognize classroom behavior and ultimately evaluate course quality, which can be used to motivate teachers to improve their teaching and the overall quality of education.

1. Introduction

The expansion in the scale of schools, colleges, and universities raises serious concern about the quality of education at these institutions and the mechanisms to improve it [1–3]. From the perspective of talent cultivation in colleges and universities, classroom teaching is the central link of all teaching work [4–7]. Its quality determines, to a great extent, the overall quality of education in colleges and universities. Classroom teaching involves many factors, such as teaching conditions [8, 9], course difficulty [10–12], teachers’ coaching, and learning effect [13, 14]. These factors interact with each other to form a teaching network. Classroom teaching is the most important link among them because it determines the level of talent cultivation and influences the quality of life of teachers and students as well as the quality of the delivered course. A precise assessment of classroom teaching, carried out with scientific methods against established requirements, can be used to improve teaching methodologies and to plan educational reforms. It also helps the teacher employment and promotion system by putting the employment process on a more scientific basis that considers the personality characteristics of all teachers [15–17] who are employed in related posts. The system encourages the appointment of teachers who have a passion for teaching and a high academic and teaching level. It can also help discover talent by changing the employment mechanism so that excellent young teachers can be appointed to related teaching posts regardless of seniority. The scientific evaluation can be used to dismiss, suspend, or transfer teachers who do not take proper responsibility and show a low teaching level. Similarly, the system plays its part in the promotion of talented, dedicated, and well-performing teachers.
Hence, the scientific evaluation of classroom teaching quality in colleges and universities is of great significance to encourage teachers to improve teaching quality, thereby increasing the overall quality of education.

At present, most colleges and universities have not yet established a scientific classroom teaching quality evaluation system. Many of them have not identified what they actually need from a teaching evaluation system, so they are unable to precisely assess a teacher’s personality, methodology, and overall teaching skills. In essence, schools blindly copy other schools’ classroom teaching evaluation indexes, which usually do not suit their specific environment and are applied as a mere formality, since they do not yield much usable data for classroom evaluation. Therefore, in-depth and systematic research on classroom teaching quality evaluation is needed to meet the requirements of the modern education system.

With the rapid development of artificial intelligence and deep learning in recent years, many universities have put forward the goal of campus digitization [18–20] using intelligent and educational informatization systems [21, 22]. Since classroom state analysis is at present mainly carried out manually, which distracts teachers’ attention, it is of great research significance to find a method to improve the efficiency of classroom state analysis [23, 24]. Using a target detection method to identify students makes it possible to quickly count the number of students in class and recognize their behavior, which is more convenient and accurate than manual methods. Face recognition can be employed for class attendance by automatically identifying each person upon arrival. Using deep learning to evaluate classroom status can therefore give a better assessment of the overall classroom environment, which makes it a significant research issue.

The main innovative points of this paper are as follows:
(1) This paper proposes a novel course evaluation model for colleges and universities based on deep neural networks, which can assist teachers in improving their teaching methodologies and the quality of related courses.
(2) For the neural network model, this paper proposes an improved SSD model in which the backbone network is replaced with an improved MobileNet network. The depthwise separable convolution reduces the network parameters, thereby increasing the calculation speed. The information in the deeper feature maps is merged into the shallow layers to improve the recognition accuracy for small targets.
(3) Finally, the RMSProp optimization algorithm is integrated into the network to optimize the model and accelerate its convergence.

2. Background

Research on effective teachers’ behavior has gone through stages of explicit behavior observation, psychological research, and comprehensive methodological studies. Work on effective teachers’ behavior provides an important basis for determining classroom teaching and education quality criteria [25]. Classroom teaching evaluation started in Western countries in the 20th century, originating from the educational measurement movement that prevailed in Europe and America at that time. The mechanism was primarily based on evaluation rating scales. However, due to the lack of research on the content of the scales themselves, the reliability and validity of the evaluation results were poor. Some theories state that “the validity of the evaluation results of that time is almost zero.”

In the 1950s, modern education evaluation research began to assess teachers on a more quantitative basis, using different observation methods to provide more valuable information about classrooms. That research emphasized objective and verifiable results. After the 1980s, a series of new evaluation models developed in the field of educational evaluation, such as the theories and methods of the fourth generation of educational evaluation proposed by Lincoln, which had a great impact on classroom teaching assessment. Compared with the evidence-based method, the interview-based evaluation method is characterized by stronger humanistic care, and the evaluation improved by engaging the evaluator and teacher in direct communication. However, most of this research is centered on teachers, and very few studies and evaluations address students’ behavior. These models therefore overemphasize teaching while ignoring learning, especially from the students’ perspective.

Since the 1990s, research on the teaching evaluation model has been rising rapidly in educational reforms around the globe. Classroom teaching researchers believe that “effective teaching depends first on making the right decisions about what teachers should do in the classroom, and secondly on how to implement those decisions”. The important trend of the reform is to promote the professional development of teachers and produce more “effective teachers.” The evaluation of effective teachers includes classroom evaluation. Teaching evaluation is the most popular method in American teacher evaluation systems. One study found that more than 80 percent of public school principals preferred to use classroom observations as their primary data source for teacher evaluations which directly forms the teacher ranking system. However, these methods of classroom evaluation have been widely debated, with some researchers pointing out that these do not relate to students’ achievement.

3. Methodology

3.1. Identification of Parameters Affecting Classroom Teaching Quality

Classroom teaching quality evaluation refers to the quantitative assessment, according to certain standards, of each element of classroom teaching activities, including how these elements develop and change and what effects they have. In other words, the object being evaluated empirically is the quality of classroom teaching.

Classroom teaching process evaluation includes the evaluation of teachers’ classroom teaching goals, the arrangement of teaching content, the design of teaching structure, the choice of teaching methods and the embodiment of teaching ability, the enthusiasm of students in learning, and the overall classroom atmosphere. A practical evaluation mainly measures the completion of the predetermined goals of teachers’ classroom teaching and students’ understanding of the content taught. These two aspects complement each other, thereby reflecting the quality of teachers’ classroom teaching from different perspectives. From the point of view of pedagogical theory, the teaching process refers to how students acquire knowledge, develop abilities, and form moral character under the guidance of teachers. In this process, teachers make and implement teaching plans and guide students to gradually achieve the expected educational goals. The evaluation of the teaching process should cover the basic links before, during, and after the class, together with the other factors that constitute the teaching process.

The final result of classroom teaching quality shows up in students’ performance. It must be noted, however, that students’ academic performance is affected by many factors and cannot simply be used as a yardstick for evaluating teachers’ classroom teaching quality. Although students’ learning is carried out under teachers’ guidance, many factors such as their learning goals, learning attitudes, learning habits, learning efforts, and prior learning foundations are difficult for teachers to control directly. Although these factors are outside teachers’ control, they have a great influence on teaching results, which makes a fair overall evaluation very difficult.

3.2. Modeling Classroom Behavior Recognition System

For the modeling of a classroom evaluation system, students’ postures give an important clue for effective identification and research on the classroom behavior that can be summarized into the quality of teaching. For the current study, different common student postures are selected, including sitting idle or listening, raising hands, writing, sleeping, and using mobile phones. The paper then proposes an improved SSD algorithm based on these characteristics of students’ classroom behaviors.

3.2.1. Construction of the Classroom Behavior Recognition Model

The model construction mainly includes determining the network structure, preparing training data, and the system’s training and testing. Figure 1 is a schematic diagram of the recognition model.

The first step is to prepare the student behavior images, which involves the necessary preprocessing. In the second step, a behavior recognition [26] database is formed. In the third step, which is the essential part of the study, the neural network [27, 28] is trained and tested. Initially, the model is trained by supplying training samples and their labels, which specify the different behaviors of the students. The network adjusts its weights according to the training samples. The system is then verified on the verification/testing set to determine whether the proposed model operates as expected. The trained behavior identification model with the best identification effect is saved for subsequent behavior identification and prediction in a particular class.

3.2.2. Improved SSD Algorithm

The SSD (Single Shot Detector) algorithm [29] is built on the traditional VGG16 network (named after the Visual Geometry Group at Oxford) [30], a pretrained version of which ships with the Keras library. VGG16 gives good results but is computationally demanding. SSD retains this strong performance on high-quality image classification problems, yet it improves speed by not requiring the fully connected layers, without badly affecting precision. It extracts the image features through the base network, and additional convolution layers then select some feature layers to carry out target detection. Although the algorithm has achieved good results in the field of target detection, there is still room for improvement. For instance, even when the final fully connected layer is removed, there are 12,120,000 parameters, and about 4/5 of the training time is spent in the base network. The model therefore requires a high-end hardware configuration for training, and its real-time performance is not very appealing. In addition, according to the network structure of the SSD algorithm, small targets are detected in the shallow feature maps. However, this level contains less feature information, so the detection effect is not good enough. To overcome these issues, this section proposes the following improvement strategies for the base network and for small target detection: VGG16 is replaced with a lightweight network, MobileNet, which reduces the number of parameters and thereby improves the detection speed, and high-level semantic information is merged into the low-level feature maps to improve the detection of small targets.
The use of the MobileNet network [31] reduces the computational cost of the model because it uses depthwise separable convolution instead of ordinary convolution, which reduces the number of parameters to only 4.2 million, compared with 133 million parameters in VGG16.

MobileNet is faster than VGG16 for two reasons. First, the network is composed of depthwise separable convolutions; second, a width coefficient and a resolution coefficient are also used. The central part is the depthwise separable convolution, which completes a convolution operation in two parts: a depthwise convolution and a pointwise convolution. The MobileNet network structure has 28 layers if the two parts are counted as separate layers, and only 14 layers if they are merged into one. In other words, depthwise separable convolution performs the convolution in two steps. When the image is input into the network, feature maps are first obtained through the depthwise convolution, followed by BN (Batch Normalization) and ReLU (Rectified Linear Unit) operations. Further feature information is then obtained through the pointwise convolution, again followed by BN and ReLU operations. The process is shown in Figure 2, with the depthwise convolution at the top and the pointwise convolution at the bottom. Equation (1) determines the ratio of the parameter quantities of the depthwise separable convolution and the standard convolution:
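In the standard MobileNet notation, which is assumed here because the symbols are not defined in the text, $D_K$ is the kernel size, $M$ the number of input channels, and $N$ the number of output channels. Under that assumption, the depthwise step has $D_K^2 M$ parameters and the pointwise step has $M N$, against $D_K^2 M N$ for a standard convolution, so the ratio can be written as:

```latex
\begin{equation}
\frac{D_K^2 M + M N}{D_K^2 M N} = \frac{1}{N} + \frac{1}{D_K^2}
\end{equation}
```

For a 3 × 3 kernel and a large number of output channels, the ratio is close to 1/9, that is, roughly eight to nine times fewer parameters per layer.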

In order to reduce network parameters, a width coefficient α and a resolution coefficient ρ are used in addition to depthwise separable convolution. Their values range from 0 to 1, and the most commonly used values of α are (1, 0.75, 0.5, 0.25). The role of α is to reduce the number of channels. For example, for an input channel count of M, applying α gives αM, reducing the calculation by a factor of roughly α². Another factor that affects the calculation is the resolution. Therefore, ρ is used to reduce the image resolution, which reduces the number of pixel values to be computed. Following the design of the traditional SSD model, the first 14 improved depthwise separable convolutional layers are taken from the improved MobileNet (300 × 300) network to replace VGG16 as the backbone network of the improved algorithm presented in this paper. Then, to increase the model’s feature extraction capability, eight ordinary convolutional layers of decreasing size are added behind the replaced base network to obtain deeper information about the image. A classification layer at the end of the network judges the category, and a non-maximum suppression layer, used to filter the regression boxes, completes the replacement of the base network. Finally, as in the original SSD, six feature layers are selected to complete feature extraction and target detection. The layout of the improved SSD model is presented in Figure 3.
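The effect of the width coefficient and the resolution coefficient on the computation can be illustrated with a short calculation. The sketch below is not the authors' code: it assumes the standard MobileNet cost model (multiply-adds of a depthwise step plus a 1 × 1 pointwise step), and the function names, the argument names `alpha` and `rho`, and the example layer sizes are all hypothetical.

```python
def standard_conv_cost(dk, m, n, df):
    """Multiply-adds of a standard convolution.

    dk: kernel size, m: input channels, n: output channels,
    df: spatial size of the (square) output feature map.
    """
    return dk * dk * m * n * df * df


def separable_conv_cost(dk, m, n, df, alpha=1.0, rho=1.0):
    """Multiply-adds of a depthwise convolution followed by a 1x1 pointwise
    convolution, with width multiplier alpha and resolution multiplier rho.

    Channel counts are kept as floats here; a real network would round them.
    """
    m_a, n_a, df_r = alpha * m, alpha * n, rho * df
    depthwise = dk * dk * m_a * df_r * df_r   # one dk x dk filter per channel
    pointwise = m_a * n_a * df_r * df_r       # 1x1 convolution mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 256 -> 256 channels, 38x38 feature map.
dk, m, n, df = 3, 256, 256, 38

# Ratio of separable to standard cost; matches 1/N + 1/dk^2.
ratio = separable_conv_cost(dk, m, n, df) / standard_conv_cost(dk, m, n, df)
expected_ratio = 1 / n + 1 / dk ** 2

# Halving the width shrinks the cost to roughly alpha^2 of the full layer.
width_ratio = (separable_conv_cost(dk, m, n, df, alpha=0.5)
               / separable_conv_cost(dk, m, n, df))

# Halving the resolution shrinks the cost to exactly rho^2.
resolution_ratio = (separable_conv_cost(dk, m, n, df, rho=0.5)
                    / separable_conv_cost(dk, m, n, df))

print(f"separable / standard: {ratio:.4f}")
print(f"alpha = 0.5 cost ratio: {width_ratio:.4f}")
print(f"rho = 0.5 cost ratio: {resolution_ratio:.4f}")
```

The width ratio is slightly above α² because the depthwise term scales only linearly with the channel count, while the pointwise term, which dominates, scales quadratically.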

3.2.3. Feature Fusion

The obtained features can be fused to reduce the system’s size and enhance its efficiency [32]. Two popular methods are available for fusing the collected features [33]. The first is the “Concat” method, which splices features directly and can also be understood as merging channels, where each channel has its corresponding convolution kernel. This method increases the number of features describing the image, but the information under each feature does not increase. The second is the “Add” method, which sums the feature maps to obtain a compound feature: the values are added, but the number of channels remains unchanged, and the convolution operation is carried out after adding the feature maps. The calculation equations of the two feature fusion methods are as follows:

$$Z_{\text{concat}} = \sum_{i=1}^{c} X_i * K_i + \sum_{i=1}^{c} Y_i * K_{i+c}$$

$$Z_{\text{add}} = \sum_{i=1}^{c} (X_i + Y_i) * K_i$$

where $X_i$ and $Y_i$ represent the channels to be fused, $K_i$ represents the weight of the $i$th channel, $Z$ represents the fused feature, $c$ is the number of channels in each input, and $*$ denotes convolution.
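The difference between the two fusion methods can be made concrete with a small pure-Python sketch. It is an illustration under simplifying assumptions, not the paper's implementation: each kernel is reduced to a single scalar weight per channel (a 1 × 1 convolution with one output map), feature maps are nested lists, and all function names are hypothetical.

```python
def conv1x1(channels, kernels):
    """Weighted sum of channels: one scalar kernel per channel, one output map."""
    height, width = len(channels[0]), len(channels[0][0])
    out = [[0.0] * width for _ in range(height)]
    for channel, k in zip(channels, kernels):
        for r in range(height):
            for c in range(width):
                out[r][c] += channel[r][c] * k
    return out


def fuse_concat(x, y, kernels):
    """Concat fusion: stack the channels of x and y (c + c channels),
    so every channel gets its own kernel."""
    if len(kernels) != len(x) + len(y):
        raise ValueError("Concat needs one kernel per stacked channel")
    return conv1x1(x + y, kernels)


def fuse_add(x, y, kernels):
    """Add fusion: sum matching channels element-wise first (still c channels),
    then convolve, so x and y share the same kernels."""
    if not (len(x) == len(y) == len(kernels)):
        raise ValueError("Add needs equal channel counts and one kernel each")
    summed = [
        [[a + b for a, b in zip(row_x, row_y)] for row_x, row_y in zip(cx, cy)]
        for cx, cy in zip(x, y)
    ]
    return conv1x1(summed, kernels)


# Two inputs with c = 2 channels of size 2x2 (made-up values).
x = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]
y = [[[1, 1], [1, 1]], [[2, 0], [0, 2]]]
shared = [1.0, 2.0]

added = fuse_add(x, y, shared)
# Concat with the same kernels repeated reproduces Add exactly ...
concat_shared = fuse_concat(x, y, shared + shared)
# ... while Concat with independent kernels is strictly more general.
concat_free = fuse_concat(x, y, [1.0, 2.0, 0.5, 0.0])

print(added)
print(concat_shared)
print(concat_free)
```

Seen this way, Add is the special case of Concat in which both inputs share their kernels, which is why it keeps the channel count and the parameter count lower.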

This paper selects the feature fusion method to carry out network fusion operations. According to the model structure, the size of the six extracted feature layers decreases step by step from shallow to deep, and the shallow layers contain less abstract information than the deep ones. Therefore, feature fusion aims to pass the abstract information of the deep feature layers back to the shallow layers.

3.2.4. Model Optimization

During training, the loss function of the model needs to be closely observed [34]. When the value of the loss function is declining, the result of the model training is getting closer to the real result. Therefore, the loss function needs to show a downward trend toward its minimum value. However, during gradient descent, problems such as excessive or persistent oscillation may occur, which slows down the descent.

We use the RMSProp (Root Mean Square Prop) optimization algorithm. The algorithm calculates the square of the historical gradient in each dimension and accumulates it, introducing a decay rate to obtain a decayed sum of historical squared gradients. When the parameters are updated, the learning rate is divided by the square root of this sum. With this optimization algorithm, the updates keep changing within a small range, which speeds up the convergence of the network. The calculation formulas are given, respectively, in

$$r \leftarrow \beta r + (1 - \beta)\, g \odot g$$

$$\theta \leftarrow \theta - \frac{\eta}{\sqrt{r} + \epsilon} \odot g$$

where $\beta$ represents the decay rate, $r$ represents the cumulative gradient variable, $\eta$ represents the learning rate, $\epsilon$ represents a small constant that prevents the denominator from being 0, $\theta$ represents the parameter, and $g$ is the gradient of the loss with respect to $\theta$.
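The update rule described above can be sketched in a few lines of pure Python. This is a minimal illustration of the textbook RMSProp step, not the training code used in the study; the names `beta`, `lr`, and `eps`, their default values, and the toy objective are assumptions made for the example.

```python
import math


def rmsprop_step(theta, grad, r, lr=0.001, beta=0.9, eps=1e-8):
    """One RMSProp update for a list of parameters.

    theta: parameters, grad: gradients, r: running average of squared gradients.
    Returns the updated (theta, r).
    """
    # Decayed accumulation of the squared gradient, per dimension.
    new_r = [beta * ri + (1.0 - beta) * g * g for ri, g in zip(r, grad)]
    # Each dimension's step is scaled by the root of its own accumulator.
    new_theta = [
        t - lr * g / (math.sqrt(ri) + eps)
        for t, g, ri in zip(theta, grad, new_r)
    ]
    return new_theta, new_r


def objective(p):
    """Toy loss f(x, y) = x^2 + 10 y^2, ten times steeper in y than in x."""
    x, y = p
    return x * x + 10.0 * y * y


def objective_grad(p):
    x, y = p
    return [2.0 * x, 20.0 * y]


theta, r = [1.0, 1.0], [0.0, 0.0]
initial_loss = objective(theta)
for _ in range(500):
    theta, r = rmsprop_step(theta, objective_grad(theta), r, lr=0.01)
final_loss = objective(theta)

print(f"loss: {initial_loss:.3f} -> {final_loss:.6f}")
```

Because each dimension is divided by the root of its own accumulated squared gradient, the steep `y` direction and the shallow `x` direction move at nearly the same pace, which is the damping of oscillations that the text attributes to the method.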

4. Experiments and Results

4.1. Dataset

For training and testing the model, a reliable dataset is required. For the current study, a dataset of 5,000 images was collected, depicting 50 student behaviors, including raising hands, sitting upright, writing, sleeping, and using mobile phones, each with 100 images. The second step involves creating a behavior recognition database: the collected 5,000 images are preprocessed and labeled accordingly. The whole dataset is divided proportionally into three portions, i.e., the training set, the test set, and the verification set.
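A proportional split of this kind can be sketched as follows. The text does not state the proportions, so the 8:1:1 ratio below is purely an illustrative assumption, as are the function name, the fixed random seed, and the integer stand-ins for the labeled images.

```python
import random


def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle items and split them into training, verification, and test lists.

    The default 8:1:1 ratios are an assumption made for illustration only.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # reproducible shuffle
    n_train = round(len(shuffled) * ratios[0])
    n_val = round(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # remainder goes to the test set
    return train, val, test


image_ids = list(range(5000))  # stand-in IDs for the 5,000 labeled images
train_set, val_set, test_set = split_dataset(image_ids)
print(len(train_set), len(val_set), len(test_set))
```

In practice the split would usually be stratified per behavior class so that every behavior appears in all three portions; the plain shuffle here keeps the sketch short.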

4.2. Experimental Setup

The environment used for the simulation is listed in Table 1. In the experiment of this study, 100 images are selected for each action in the training set, for a total of 5000 images, and the test set has 200 images. The batch size is set to 4 during training, constituting 1200 batches for the 5000 training images, and the number of epochs is set to 100, which means a total of 62500 iterations. The learning rate for the initial 5000 iterations is reduced to for the remaining simulation.
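The relationship between dataset size, batch size, epochs, and iteration count can be checked with a few lines of arithmetic. The sketch below assumes one iteration per batch and counts a final partial batch as an iteration; the function name is hypothetical. With 5000 images and a batch size of 4 it gives 1250 batches per epoch and 125000 iterations over 100 epochs, so the 1200 batches and 62500 iterations quoted above may rest on a different batch size or counting convention (under this convention, 62500 iterations would correspond to a batch size of 8).

```python
import math


def training_iterations(num_images, batch_size, epochs):
    """Return (batches per epoch, total iterations), one iteration per batch."""
    batches_per_epoch = math.ceil(num_images / batch_size)
    return batches_per_epoch, batches_per_epoch * epochs


# Values as stated in the text: 5000 training images, batch size 4, 100 epochs.
per_epoch_b4, total_b4 = training_iterations(5000, 4, 100)

# For comparison only: the batch size that would reproduce 62500 iterations.
per_epoch_b8, total_b8 = training_iterations(5000, 8, 100)

print(per_epoch_b4, total_b4)   # 1250 125000
print(per_epoch_b8, total_b8)   # 625 62500
```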

4.3. Evaluation Index

This paper evaluates the model in terms of the detection time of a single frame and the mean Average Precision (mAP). mAP [35] is the average of all AP values, where AP is the average precision, calculated as the area under the curve formed by precision and recall [36]. All these calculations are based on a confusion matrix that places the actual labels against the predicted labels. This results in four prediction outcomes: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). TP is the number of samples that the classifier labels as positive and that are actually positive. Similarly, TN is the number of negative samples correctly predicted as negative. FP is the number of samples that the classifier labels as positive but that are actually negative, and FN is the number of samples that the classifier labels as negative but that are actually positive. A confusion matrix for a binary classification problem is presented in Table 2.

To evaluate the performance of the proposed model, three performance metrics are used: mAP, precision, and recall. Precision is the fraction of samples predicted as positive that are actually positive. Recall is the fraction of actual positive samples that the model correctly identifies, so it drops as more positive samples are mistakenly assigned to the negative category. The area under the precision-recall curve is the average precision (AP) of a class, and averaging AP over all classes gives the mAP, which summarizes the performance of the prediction model.
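The step from a precision-recall curve to AP and mAP can be sketched as follows. The text does not say which interpolation is used for the area, so the all-point interpolation below (a common convention in object detection benchmarks) is an assumption, and the function names, class names, and numbers are made up for illustration.

```python
def average_precision(recalls, precisions):
    """Area under a precision-recall curve using all-point interpolation.

    recalls must be sorted in ascending order, with precisions aligned to them.
    """
    # Sentinel points so the curve spans recall 0 to 1.
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall,
    # which removes the zig-zag of the raw curve.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangles between consecutive recall values.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_class):
    """mAP is the plain mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Made-up curve: precision 1.0 at recall 0.5, precision 0.5 at recall 1.0.
ap_example = average_precision([0.5, 1.0], [1.0, 0.5])

# Made-up per-class AP values for two behavior classes.
map_example = mean_average_precision({"raising_hand": 0.75, "sleeping": 0.25})

print(ap_example, map_example)
```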

The calculation equations for precision and recall based upon the confusion matrix are given as follows:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$
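As a worked example, the confusion-matrix counts and the two metrics can be computed directly from lists of actual and predicted labels. The labels and function names below are hypothetical and serve only to illustrate the definitions; one behavior is treated as the positive class and everything else as negative.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, TN, FP, FN for one class treated as the positive class."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1          # predicted positive, actually negative
        elif actual == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1
    return tp, tn, fp, fn


def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up labels for six detections.
y_true = ["sleeping", "writing", "sleeping", "sleeping", "writing", "writing"]
y_pred = ["sleeping", "sleeping", "sleeping", "writing", "writing", "writing"]

tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive="sleeping")
p_value = precision(tp, fp)
r_value = recall(tp, fn)
print(tp, tn, fp, fn, round(p_value, 3), round(r_value, 3))
```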

4.4. Experimental Results

The simulation setup compares the three systems, i.e., the traditional SSD algorithm, MobileNet-SSD, and the feature-fused MobileNet-SSD (current study). The mean average precision and the detection speed (detection time per frame of image) of the three approaches are compared in Table 3.

It can be seen from Table 3 that, in the classroom behavior recognition experiment, the MobileNet-SSD feature fusion proposed in this paper has increased the detection speed by 2 frames per second compared with the traditional SSD algorithm. The average detection accuracy rate has reached 86.45%. Compared with the MobileNet-SSD model without feature fusion, the accuracy is improved by 8.34%. After the above analysis, it could be claimed that the algorithm in this paper has a good performance compared with the other two algorithms in terms of detection speed and recognition accuracy.

The difficulty of model training can be evaluated by comparing how the loss function changes during training. The feature-fused MobileNet-SSD model presented in this paper and the SSD model were selected for this comparison. Their loss function curves are presented in Figure 4 and Figure 5, respectively, with all parameters held constant and training run for 100 epochs over 50000 iterations.

5. Conclusion

In traditional teaching, both the teacher and the school rely on manual work to understand the classroom status of a course and to judge its efficiency, the students’ acceptance, and the attendance rate. With the development of artificial intelligence and smart campus systems, it has become a trend to introduce intelligence into the classroom environment and replace manual analysis with intelligent analysis. Therefore, it is of great significance to find a method to improve the efficiency of classroom state analysis. This paper presented a scientific method that uses a deep neural network to analyze the state of a class video from the two aspects of students’ class behavior and attendance. It can identify classroom behavior and evaluate course quality, and it can significantly affect teachers’ motivation and promotion, thereby improving overall college education.

In the future, the system may be enhanced by adding more behavioral images, which will further enhance the performance evaluation of the system. Moreover, the system may be integrated with a direct student feedback system and an existing manual teachers’ evaluation system. It will test the prediction quality of the system and augment its working, thereby increasing the overall quality level of the evaluation system. A similar system may also be developed for the office environment, which may enhance productivity and provide an opportunity to promote hardworking employees.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

All the authors declare no conflicts of interest.

Acknowledgments

This study was supported by the research project of State Administration of Traditional Chinese Medicine with the Theme of “Extensive Learning, Deep Investigation and Detailed Implementation” (GY-15, Research and Construction of TCM-Characterized Online General Courses) and Science Popularization Project of Shanghai Municipal Science and Technology Commission (17dz2301000, Construction of TCM Science Popularization System Based on Cultural Inheritance and Health Guidance).