Abstract

As the pace of study and life accelerates and employment becomes increasingly difficult, college students face growing competitive pressure, which harms their mental health. Emphasizing and fully addressing college students' mental health has therefore become a top priority for colleges and universities. However, some students, and even some teachers, currently pay little attention to mental illness, so students with psychological abnormalities often do not receive timely detection and effective treatment. Identifying college students' mental health problems early and intervening in them is thus a responsibility of student management in colleges and universities. Advances in intelligent technology now make it possible to evaluate and predict college students' mental state in real time using multimodal data and neural network models. This paper therefore proposes a novel multimodal neural network model with two branches. The first branch is a traditional mental health assessment and prediction algorithm based on an improved BP neural network and the International Mental Health Scale SCL-90. Because this method alone struggles to meet the accuracy requirements of college students' mental health assessment, the second branch performs computer vision-based facial emotion recognition of college students to support the assessment. Simulation and comparative experiments show that the model achieves competitive performance.

1. Introduction

With today's fierce competition and increasing pressures in life, college students' mental health problems [1-3] have become more visible, and their mental health conditions [4-6] are a cause for concern. Among college students, cases of students with severe mental disorders or mental illnesses suspending their studies or dropping out [7, 8], self-harming [9, 10], committing suicide, and even breaking the law occur continually. It is therefore critical and urgent to improve college students' overall quality, particularly their psychological quality [11], to cultivate outstanding talent for society, to strengthen mental health education, and to predict mental health problems.

College students are an outstanding part of the youth population and a highly educated group, so their mental health is critical. They are in a critical transition period of development and maturation, during which they face a variety of issues, including emotional and social ones [12, 13]. If these issues are not handled properly, they can lead to depression, anxiety, and other psychological problems, which are extremely harmful to students' development. It is not uncommon to encounter cases of outstanding college students who, unable to cope with emotional problems, ultimately took their own lives [14].

As society evolves, college students face an increasingly complex environment. On the one hand, the steady growth in the number of graduates has increased the pressure on students to find work; on the other hand, advances in science and technology have raised the barriers to employment and further education. College students also face pressures related to study, communication, emotions, and daily life, and these intertwined pressures are driving an increase in psychological problems [15]. Such problems affect every aspect of students' lives: they not only impair study and employment but can also seriously endanger students' health. A survey by the Chinese Center for Disease Control and Prevention showed that 16.0%-25.4% of college students have general mental health problems such as mild anxiety and stress; among them, about 2.8% have mental health problems of varying degrees, and severe cases can end in tragedy. According to a survey on student mental health, 80% of students reported having experienced psychological problems, 11% reported being in an unhealthy mental state, and 35% reported increased psychological pressure at school. The same survey found that psychological issues have overtaken financial difficulties as the leading cause of dropping out of college. Repeated campus tragedies show that mental health issues among college students have become life-threatening. In this context, the assessment and prediction of college students' mental health warrant investigation [16-18].

In recent years, intelligent technologies [19-21] such as neural networks [22-24] have developed rapidly and perform remarkably well on nonlinear problems such as the assessment of college students' mental health. The model proposed in this paper is therefore based mainly on convolutional neural networks and an improved BP neural network [25]. We combine the neural network with the Bayesian method: a suitable training model [26] is constructed from selected typical samples, the best training result is selected to capture the internal relationship between inputs and outputs, and the learned network weight vector is converted into prior probabilities of the indicators. These priors are then combined in Bayes' formula with conditional probabilities obtained from actual statistics to calculate the influence of each indicator on college students' mental health. In addition, we recognize college students' facial emotions using computer vision [26, 27] to further improve the performance of the mental health assessment.

The main contributions of this paper are as follows:

(1) We propose a novel dual-branch neural network model based on intelligent technology for assessing college students' mental health. One branch predicts mental health with an improved BP neural network, and the other performs computer vision-based facial emotion recognition of college students.

(2) We combine neural networks with the Bayesian method. A suitable training model is constructed from selected typical samples, the best training result is selected to capture the internal relationship between input and output, and the learned network weight vector, which encodes knowledge about the problem, is converted by formula into the weight of each indicator, that is, its prior probability. This prior probability is applied in Bayes' formula together with conditional probabilities obtained from actual statistics to calculate the influence of each indicator on college students' mental health.

(3) Small faces are the key difficulty in facial emotion recognition in complex environments. We analyze scenes containing small faces, make appropriate structural adjustments to a classic classification network, and introduce an attention mechanism module to improve the network's ability to extract subtle features, thereby improving the accuracy of facial emotion recognition.

The rest of the paper is organized as follows: Section 2 reviews related work, Section 3 presents the methodology, Section 4 reports the experiments and results, and Section 5 concludes the paper.

2. Related Work

2.1. Mental Health-Related Research

From a psychological standpoint, the "positive mental health" concept advocated by the psychologist Jahoda is regarded as the most authoritative. Jahoda holds that, even in the face of adversity and setbacks, mentally healthy people can maintain a stable mental state and perform at their best: they keep self-control and rationality, deal calmly with pressure and stimuli, and have a clear personal development path. According to the famous psychologist Sigmund Freud, mental health is the ability to love and to work. He believed that mentally healthy people are sensible and loving, can maintain close relationships with others, have a sense of self-worth, accept reality, and have peace of mind. Freud also held that mental health is related to early experiences. College students usually spend their early years in the family, so family relations, family economic conditions, and other family factors can affect their mental health for better or worse. Kara Zwin used the Internet to track the life and study of American college students, recorded the behavioral characteristics of mentally unhealthy students at different levels of mental health, and found that college students' psychological problems are complex and changeable: many students have multiple psychological problems, and most have at least one. Holly Anne followed 97,357 college students and found that body weight deserves attention: some students develop psychological problems because of their weight, and the weight of students with psychological problems is more likely to be affected.

2.2. Facial Emotion Recognition

In recent years, facial emotion recognition [28-30] has attracted considerable attention and become an active research direction. According to the feature representation, it is divided into static and dynamic emotion recognition: static recognition considers only the features of the current image, whereas dynamic recognition must also consider the relationships between adjacent video frames. Early static emotion recognition consisted of two stages: first, operators such as SIFT and LBP extract features, and then a classifier completes the classification, as shown in Figure 1.

Fasel et al. [31] found that convolutional neural networks extract features better than multilayer perceptrons when the position and scale of the face change substantially. Matsugu et al. [32] used a convolutional neural network model to address changes in face position and scale in facial emotion recognition. Yao et al. [33] proposed the HoloNet model for end-to-end classification. In the shallow layers, they proposed a phase convolution module with a two-way activation, which halves the number of convolution kernels while keeping the output dimension unchanged and retains both positive- and negative-phase information. In the middle layers, a phase residual module combines the two-way activation with a residual structure to increase network depth while retaining positive- and negative-phase information. In the deep layers, an Inception-residual module uses a wider and deeper combination structure than the Inception structure [34] and introduces multiscale deep feature extraction and fusion. Zhao et al. [35] proposed a cascaded network structure: the network is first trained to detect faces, then regions related to facial expressions are detected hierarchically, and finally emotion is recognized from these regions.
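The hand-crafted feature stage of the traditional two-stage pipeline can be illustrated with the basic 3×3 local binary pattern (LBP) operator. The following is a minimal pure-Python sketch, assuming a grayscale image stored as a list of lists of integers; the function names are hypothetical. In practice the codes are usually pooled into per-region histograms that are then passed to a separate classifier such as an SVM.

```python
from typing import List

# Offsets of the 8 neighbours, visited clockwise starting from the top-left pixel.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: List[List[int]], r: int, c: int) -> int:
    """Basic 3x3 LBP code of pixel (r, c): one bit per neighbour >= centre."""
    centre = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << (7 - bit)  # the first neighbour is the most significant bit
    return code


def lbp_histogram(img: List[List[int]]) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels (border pixels skipped)."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


patch = [[10, 20, 30],
         [40, 25, 60],
         [70, 80, 5]]
print(lbp_code(patch, 1, 1))  # prints 55 (binary 00110111)
```

The histogram, not the raw code map, is what the classifier sees, which is why this kind of pipeline is sensitive to how the face is divided into regions.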

Emotion recognition in dynamic sequences is typically realized with a recurrent neural network model. To handle facial emotion recognition in both static images and dynamic sequences, Sun et al. [36] proposed a method that fuses traditional and deep features: MSDF, DCNN, and RCNN features are extracted from static images and classified with SVM classifiers, and a fusion network then produces the final classification result. Zhang et al. [37] proposed the MSCNN network to detect facial feature points, which are fed into the PHRNN recurrent neural network for facial expression classification.

3. Methodology

3.1. Definition of Mental Health

College students' mental health is extremely important to their physical well-being and is an important aspect of their overall health. Just as physical health has its own standards, mental health has its own evaluation criteria, which differ substantially in content from those for physical health. A person in perfect physical condition may still not be mentally healthy and may develop mental illness. Only by understanding the concept of mental health can students recognize their own mental health problems and adopt targeted training and treatment to reach a healthy mental state.

Mental health is a broad and complex concept spanning medical, psychological, and social phenomena, so scholars hold different views on how to define it. Clement argues that mental health encompasses not only physical well-being but also psychological happiness, and proposed classifying mental health by the degree of happiness an individual experiences, along a continuum from a low state of happiness through a medium state to the most energetic, highest state. Bellonio proposed that mental health includes three aspects: emotional, psychological, and social well-being. Emotional well-being refers to the degree of an individual's response to positive and negative emotions; psychological well-being refers to the establishment and development of potential interpersonal relationships and the degree of acceptance of individuals as they pursue meaning and goals in life; and social well-being means that an individual identifies with society and its parts and derives satisfaction from contributing to society or part of it.

The Third International Mental Health Conference, held in 1946, defined mental health as the development of an individual's mental state to its best possible level, within the limits of physical, intellectual, and emotional conditions, without conflicting with the mental health of others.

3.2. Improved BP Neural Network Based on Bayesian
3.2.1. BP Network

The BP neural network consists of a large number of neurons, which are its basic processing units. Neurons in each layer affect only the next layer, and neurons within the same layer do not interact. The weights represent how strongly neurons in one layer influence neurons in the next. Training continuously adjusts the weights and thresholds between neurons, based on a training set of system inputs and target outputs, in order to establish a suitable mapping between the input and output data that captures their inherent characteristics and meets the target requirements. The network has one input layer and one output layer, and one or more hidden layers as actual needs require. The number of hidden-layer nodes is determined empirically by repeated comparison of experimental results. The topology of a typical three-layer feedforward BP network is shown in Figure 2.

BP training consists of two main phases: forward propagation of the working signal and backward propagation of the error signal. In the first phase, the input is propagated forward to produce the actual output of the output layer. If the squared error between the actual output and the target output does not meet the target accuracy, the second phase begins: the error signal is propagated back along the original path, and the network's weights and thresholds are corrected. The adjusted weights are used in the next iteration, and the two phases repeat until the squared error between the actual output and the target output reaches the required accuracy.

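Before training, assign a small random value to each connection weight $w_{ij}$ (input layer to hidden layer) and $v_{jt}$ (hidden layer to output layer) and to each threshold $\theta_j$ (hidden layer) and $\gamma_t$ (output layer) of the network. For the input layer, the input and output are the same:

$$o_i = x_i, \qquad i = 1, 2, \dots, n.$$

For the hidden layer, first compute the net input $s_j$ of each hidden neuron from the input sample $x_i$, the weights $w_{ij}$, and the thresholds $\theta_j$, and then use the transfer function $f$ to obtain the output $b_j$ of each neuron:

$$s_j = \sum_{i=1}^{n} w_{ij}\, x_i - \theta_j, \qquad b_j = f(s_j), \qquad j = 1, 2, \dots, p.$$

For the output layer, compute the net input $L_t$ of each output neuron from the connection weights $v_{jt}$, the thresholds $\gamma_t$, and the hidden-layer outputs $b_j$, and then obtain the actual output $c_t$ through the transfer function:

$$L_t = \sum_{j=1}^{p} v_{jt}\, b_j - \gamma_t, \qquad c_t = f(L_t), \qquad t = 1, 2, \dots, q.$$

Next, compute the errors. With $y_t$ denoting the target output, the errors of the output layer and the hidden layer are

$$d_t = (y_t - c_t)\, f'(L_t), \qquad e_j = \Big(\sum_{t=1}^{q} d_t\, v_{jt}\Big) f'(s_j).$$

Finally, correct the connection weights and thresholds between adjacent layers of the BP neural network:

$$v_{jt} \leftarrow v_{jt} + \eta\, d_t\, b_j, \qquad \gamma_t \leftarrow \gamma_t - \eta\, d_t, \qquad w_{ij} \leftarrow w_{ij} + \eta\, e_j\, x_i, \qquad \theta_j \leftarrow \theta_j - \eta\, e_j,$$

where $\eta$ represents the learning rate. These corrections perform gradient descent on the squared error $E = \frac{1}{2}\sum_{t}(y_t - c_t)^2$.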
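The two-phase training procedure of the standard BP network (forward propagation of the working signal, backward propagation of the error, then weight and threshold correction) can be sketched for a three-layer network in pure Python. This is an illustrative sketch rather than the authors' implementation. It assumes a sigmoid transfer function, per-sample (online) updates, a single learning rate `eta`, thresholds subtracted from each neuron's net input, an initialization range of (-0.5, 0.5), and a toy logical-AND data set. `w`/`theta` are the input-to-hidden weights and hidden thresholds, and `v`/`gamma` the hidden-to-output weights and output thresholds.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class ThreeLayerBP:
    """Three-layer feedforward network trained by standard error backpropagation."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)

        def small() -> float:
            return rng.uniform(-0.5, 0.5)  # assumed initialization range

        self.w = [[small() for _ in range(n_hidden)] for _ in range(n_in)]   # input -> hidden
        self.theta = [small() for _ in range(n_hidden)]                      # hidden thresholds
        self.v = [[small() for _ in range(n_out)] for _ in range(n_hidden)]  # hidden -> output
        self.gamma = [small() for _ in range(n_out)]                         # output thresholds

    def forward(self, x):
        n_hidden, n_out = len(self.theta), len(self.gamma)
        # Hidden layer: s_j = sum_i w_ij * x_i - theta_j, b_j = f(s_j)
        b = [sigmoid(sum(x[i] * self.w[i][j] for i in range(len(x))) - self.theta[j])
             for j in range(n_hidden)]
        # Output layer: L_t = sum_j v_jt * b_j - gamma_t, c_t = f(L_t)
        c = [sigmoid(sum(b[j] * self.v[j][t] for j in range(n_hidden)) - self.gamma[t])
             for t in range(n_out)]
        return b, c

    def train_step(self, x, y, eta: float) -> None:
        b, c = self.forward(x)
        # Output-layer error d_t = (y_t - c_t) f'(L_t); for the sigmoid, f' = c_t (1 - c_t)
        d = [(y[t] - c[t]) * c[t] * (1 - c[t]) for t in range(len(c))]
        # Hidden-layer error e_j = (sum_t d_t v_jt) f'(s_j), computed before v is updated
        e = [sum(d[t] * self.v[j][t] for t in range(len(d))) * b[j] * (1 - b[j])
             for j in range(len(b))]
        # Gradient-descent corrections (thresholds enter the net input with a minus sign)
        for j in range(len(b)):
            for t in range(len(d)):
                self.v[j][t] += eta * d[t] * b[j]
        for t in range(len(d)):
            self.gamma[t] -= eta * d[t]
        for i in range(len(x)):
            for j in range(len(e)):
                self.w[i][j] += eta * e[j] * x[i]
        for j in range(len(e)):
            self.theta[j] -= eta * e[j]

    def total_error(self, data) -> float:
        """Sum over samples of 0.5 * squared error between target and actual output."""
        return sum(0.5 * sum((yt - ct) ** 2 for yt, ct in zip(y, self.forward(x)[1]))
                   for x, y in data)


# Toy data set (logical AND), used only to exercise the training loop.
data = [([0, 0], [0]), ([0, 1], [0]), ([1, 0], [0]), ([1, 1], [1])]
net = ThreeLayerBP(n_in=2, n_hidden=3, n_out=1, seed=42)
before = net.total_error(data)
for epoch in range(5000):
    for x, y in data:
        net.train_step(x, y, eta=0.5)
after = net.total_error(data)
print(f"error before: {before:.4f}, after: {after:.4f}")
```

A fixed epoch count is used here for simplicity; stopping once the error falls below a target accuracy would match the stopping rule described in the text more closely.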

Because the BP network converges slowly and generalizes poorly, it needs to be improved.

3.2.2. Optimization of the BP Network

The standard BP algorithm adjusts the weights by steepest gradient descent, which converges slowly and easily falls into local minima. We therefore optimize the network with the Levenberg-Marquardt (LM) algorithm.

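The LM algorithm combines the fast local convergence of the Gauss-Newton method with the global convergence properties of gradient descent to adjust the network weights and thresholds adaptively. Let $x^{(k)}$ and $x^{(k+1)}$ denote the vectors of weights and thresholds at iterations $k$ and $k+1$, with $x = [x_1, x_2, \dots, x_n]^{T}$, where $n$ is the total number of weights and thresholds. Newton's method then gives

$$x^{(k+1)} = x^{(k)} - \left[\nabla^{2} E\big(x^{(k)}\big)\right]^{-1} \nabla E\big(x^{(k)}\big),$$

where $\nabla E(x)$ and $\nabla^{2} E(x)$ are the first and second derivatives of the error function $E(x)$:

$$E(x) = \frac{1}{2}\sum_{i=1}^{N} e_i^{2}(x) = \frac{1}{2}\, e^{T}(x)\, e(x),$$

where $e(x) = [e_1(x), e_2(x), \dots, e_N(x)]^{T}$ is the error vector and $N = P \times Q$ is its dimension, with $P$ the number of input sample groups and $Q$ the number of output targets. The first derivative of the error function is

$$\nabla E(x) = J^{T}(x)\, e(x),$$

and the second derivative is

$$\nabla^{2} E(x) = J^{T}(x)\, J(x) + S(x), \qquad S(x) = \sum_{i=1}^{N} e_i(x)\, \nabla^{2} e_i(x),$$

where $J(x)$ is the $N \times n$ Jacobian matrix of the error vector:

$$J(x) = \begin{bmatrix} \dfrac{\partial e_1}{\partial x_1} & \cdots & \dfrac{\partial e_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial e_N}{\partial x_1} & \cdots & \dfrac{\partial e_N}{\partial x_n} \end{bmatrix}.$$

Letting $S(x) \approx 0$ gives $H = J^{T}J$ as an approximation of the Hessian matrix. The LM update is then

$$x^{(k+1)} = x^{(k)} - \left[J^{T}J + \mu I\right]^{-1} J^{T} e,$$

where $\mu > 0$ is the damping factor and $I$ is the identity matrix. A small $\mu$ makes the step close to the Gauss-Newton step, while a large $\mu$ makes it close to a short gradient-descent step.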
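The LM update $x^{(k+1)} = x^{(k)} - (J^{T}J + \mu I)^{-1} J^{T} e$ can be illustrated on a problem small enough to solve by hand. The sketch below is not the BP network itself. To stay short and dependency-free, it fits a hypothetical two-parameter model $y = a\,e^{bt}$ to synthetic, noise-free data, so the parameter vector $(a, b)$ plays the role of the weight-and-threshold vector. It also assumes a common damping schedule (divide $\mu$ by 10 after an accepted step, multiply it by 10 after a rejected one) that the text does not specify. For a network, $J$ would instead be the Jacobian of the error vector with respect to every weight and threshold.

```python
import math


def residuals_and_jacobian(params, ts, ys):
    """Error vector e_i = a*exp(b*t_i) - y_i and the rows (de_i/da, de_i/db) of J."""
    a, b = params
    e, jac = [], []
    for t, y in zip(ts, ys):
        g = math.exp(b * t)
        e.append(a * g - y)
        jac.append((g, a * t * g))
    return e, jac


def sse(e):
    return sum(r * r for r in e)


def lm_fit(params, ts, ys, mu=1e-2, iters=200, tol=1e-20):
    a, b = params
    for _ in range(iters):
        e, jac = residuals_and_jacobian((a, b), ts, ys)
        if sse(e) < tol:
            break
        # Gauss-Newton approximation of the Hessian, J^T J, and the gradient J^T e
        h11 = sum(j1 * j1 for j1, _ in jac)
        h12 = sum(j1 * j2 for j1, j2 in jac)
        h22 = sum(j2 * j2 for _, j2 in jac)
        g1 = sum(j1 * r for (j1, _), r in zip(jac, e))
        g2 = sum(j2 * r for (_, j2), r in zip(jac, e))
        # Solve (J^T J + mu I) delta = J^T e in closed form (2 x 2 system)
        a11, a22 = h11 + mu, h22 + mu
        det = a11 * a22 - h12 * h12
        da = (a22 * g1 - h12 * g2) / det
        db = (a11 * g2 - h12 * g1) / det
        candidate = (a - da, b - db)
        if sse(residuals_and_jacobian(candidate, ts, ys)[0]) < sse(e):
            a, b = candidate
            mu /= 10.0  # step accepted: behave more like Gauss-Newton
        else:
            mu *= 10.0  # step rejected: take a shorter, gradient-descent-like step
    return a, b


# Synthetic, noise-free data generated from a = 2, b = 0.5 (illustration only).
ts = [0.25 * k for k in range(9)]
ys = [2.0 * math.exp(0.5 * t) for t in ts]
a_hat, b_hat = lm_fit((1.0, 0.0), ts, ys)
print(f"a = {a_hat:.6f}, b = {b_hat:.6f}")
```

Because $J^{T}J + \mu I$ is positive definite for any $\mu > 0$, the linear system always has a solution, and the accept/reject rule keeps the error non-increasing from one iteration to the next.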

3.2.3. Bayesian-BP Branch Network

We combine the Bayesian method with the BP network to analyze the multiple factors that cause mental health problems in college students and to provide effective guidance and reference for their prevention.

First, feature screening and feature engineering are performed on the collected big data on college students' mental health. The network is then trained until the training samples converge quickly to the required target accuracy and the predictions for the test samples differ little from the expected targets, that is, until the accuracy is high. The weight vector output by the trained network is transformed into prior probabilities, which are applied in the Bayesian method to obtain posterior probabilities. Sorting the posterior probabilities gives the degree of influence of each physiological index on college students' mental illness; combining this ranking with computer vision-assisted facial emotion recognition and other information finally identifies the most important factors. The schematic diagram of the branch network is shown in Figure 3.
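The weight-to-probability step can be sketched as follows. The text does not give the conversion formula, so this sketch assumes Garson's connection-weight method, a standard way to turn the input-to-hidden weights `w` and hidden-to-output weights `v` of a trained single-output network into a normalized importance for each input indicator. That importance is used as the prior $P(A_i)$ and combined with hypothetical conditional probabilities $P(B \mid A_i)$ through Bayes' formula $P(A_i \mid B) = P(B \mid A_i)\,P(A_i) / \sum_k P(B \mid A_k)\,P(A_k)$, where B stands for a student being assessed as having a mental health problem. The indicator names, weights, and likelihoods are all made up for illustration.

```python
def garson_importance(w, v):
    """Relative importance of each input of a one-hidden-layer, single-output network.

    w[i][j]: weight from input i to hidden neuron j; v[j]: weight from hidden neuron j
    to the output. Returns non-negative importances that sum to 1.
    """
    n_in, n_hidden = len(w), len(v)
    scores = [0.0] * n_in
    for j in range(n_hidden):
        contrib = [abs(w[i][j]) * abs(v[j]) for i in range(n_in)]
        total = sum(contrib)
        for i in range(n_in):
            scores[i] += contrib[i] / total  # share of hidden neuron j attributable to input i
    s = sum(scores)
    return [x / s for x in scores]


def bayes_posterior(prior, likelihood):
    """P(A_i | B) = P(B | A_i) P(A_i) / sum_k P(B | A_k) P(A_k)."""
    joint = [p * q for p, q in zip(prior, likelihood)]
    evidence = sum(joint)
    return [x / evidence for x in joint]


# Hypothetical trained weights: 3 indicators, 2 hidden neurons, 1 output.
indicators = ["indicator_1", "indicator_2", "indicator_3"]
w = [[0.8, -0.2], [0.1, 0.5], [-0.4, 0.3]]
v = [1.2, -0.7]
prior = garson_importance(w, v)
likelihood = [0.30, 0.20, 0.10]  # hypothetical P(B | A_i) from actual statistics
posterior = bayes_posterior(prior, likelihood)
ranking = sorted(zip(indicators, posterior), key=lambda kv: kv[1], reverse=True)
for name, p in ranking:
    print(f"{name}: {p:.3f}")
```

Using absolute weights means the prior reflects how strongly an indicator drives the output, not the direction of its effect; the sign would have to be inspected separately.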

3.3. Vision-Based Branch Network

Predictions based on the SCL-90 self-rating symptom scale lack accuracy because of subjective, uncertain factors, so this paper introduces an intelligent method based on computer vision. In complex environments such as classrooms, however, recognizing the emotions of small faces is the key challenge for this branch. We therefore analyze scenes containing small faces, make appropriate structural adjustments to a classic classification network, and introduce an attention mechanism module to improve the network's ability to extract subtle features, thereby improving the accuracy of facial emotion recognition.

3.3.1. VGG Model

We choose VGG as the backbone network, as shown in Figure 4. VGG has strong feature extraction ability and usually achieves good results on ordinary classification tasks, but its performance is only average on fine-grained classification tasks, which demand a strong ability to extract subtle features. Facial emotion recognition classifies several emotions, and some pairs of expressions, such as disgust and fear or sadness and neutrality, differ only slightly, so it can be regarded as a fine-grained classification task. Considering both the scene and the model, the network structure should be adjusted to suit the low-resolution input, and the model should be compressed to alleviate overfitting.
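Why low-resolution input calls for structural adjustment can be seen with a short size calculation. VGG16 uses 3×3 convolutions with stride 1 and padding 1, which preserve spatial size, and five 2×2 max-pooling layers with stride 2, each of which halves it. The sketch below traces the feature-map side length through these five stages. The 48×48 face-crop size is an assumption used only for illustration, since the text does not state the input resolution.

```python
# Number of 3x3 conv layers in each VGG16 stage; every stage ends with 2x2 max-pooling, stride 2.
VGG16_STAGES = [2, 2, 3, 3, 3]


def layer_out(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Output side length of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_vgg16(size: int) -> list:
    """Side length of the input and of the feature map after each of the five stages."""
    sizes = [size]
    for n_convs in VGG16_STAGES:
        for _ in range(n_convs):
            size = layer_out(size)                              # 3x3 conv, padding 1: unchanged
        size = layer_out(size, kernel=2, stride=2, padding=0)   # 2x2 max-pool: halved (floor)
        sizes.append(size)
    return sizes


print(trace_vgg16(224))  # [224, 112, 56, 28, 14, 7]
print(trace_vgg16(48))   # [48, 24, 12, 6, 3, 1]
```

At the standard 224×224 input the last stage still produces a 7×7 map, but at 48×48 it collapses to a single spatial position, which is one plausible reason to remove or modify the later pooling stages for small faces.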

3.3.2. Attention Module

In facial emotion classification, the differences between facial expressions are very small, so the task can be regarded as fine-grained classification. The most common supervised approach to fine-grained classification divides the image into several regions, which are fed together with the original image into a composite network. This often works well for high-resolution images, although it raises the questions of which regions to choose and how large they should be; for low-resolution images, however, it is difficult to apply, mainly because the target regions cannot be obtained.

Inspired by this idea of focusing on specific regions, this paper uses an attention mechanism [38] module (Figure 5). The attention module lets the model learn autonomously during training which regions carry important information and focus on them, eliminating the need to divide the image into regions. The module consists of a channel attention module and a spatial attention module. Given a three-dimensional feature map [39], it compresses the map along the channel and spatial dimensions, processes the compressed maps with convolutional or fully connected layers, and finally maps the result to weights between 0 and 1 that rescale the feature map.
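The two-part attention module can be sketched in NumPy for a single feature map of shape (C, H, W). This sketch follows the widely used CBAM-style arrangement (channel attention first, then spatial attention), which is an assumption here because the text does not give the exact layer configuration. Channel attention pools over the spatial dimensions (average and max), passes both descriptors through a shared two-layer MLP, and applies a sigmoid. Spatial attention pools over channels; for brevity it combines the two pooled maps with a hypothetical pair of scalar weights instead of the usual learned convolution. The random parameters stand in for trained ones.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(f, w1, w2):
    """f: (C, H, W) feature map. Returns channel weights of shape (C, 1, 1) in (0, 1)."""
    def shared_mlp(d):
        return w2 @ np.maximum(w1 @ d, 0.0)  # C -> C/r (ReLU) -> C
    avg = f.mean(axis=(1, 2))  # (C,) spatial average pooling
    mx = f.max(axis=(1, 2))    # (C,) spatial max pooling
    return sigmoid(shared_mlp(avg) + shared_mlp(mx))[:, None, None]


def spatial_attention(f, k_avg, k_max):
    """f: (C, H, W). Returns position weights of shape (1, H, W) in (0, 1).

    Simplification: the two pooled maps are combined with scalar weights
    k_avg and k_max instead of a learned convolution.
    """
    avg = f.mean(axis=0)  # (H, W) channel average pooling
    mx = f.max(axis=0)    # (H, W) channel max pooling
    return sigmoid(k_avg * avg + k_max * mx)[None, :, :]


def attention_block(f, w1, w2, k_avg, k_max):
    f = f * channel_attention(f, w1, w2)           # reweight channels first
    return f * spatial_attention(f, k_avg, k_max)  # then reweight spatial positions


rng = np.random.default_rng(0)
C, H, W, r = 16, 6, 6, 4  # channels, height, width, reduction ratio (illustrative)
feat = rng.standard_normal((C, H, W))
w1 = 0.1 * rng.standard_normal((C // r, C))  # stand-ins for trained MLP weights
w2 = 0.1 * rng.standard_normal((C, C // r))
out = attention_block(feat, w1, w2, k_avg=1.0, k_max=1.0)
print(out.shape)  # (16, 6, 6)
```

The output keeps the input shape, so a block like this can be inserted between existing convolutional stages without changing the rest of the network.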

3.3.3. Vision-Based Model

We embed the attention module into the VGG network to construct the vision-based emotion recognition branch for college students, as shown in Figure 6.

3.3.4. Our Model

Finally, we merge the Bayesian-BP branch and the vision-based branch to construct the mental health assessment model for college students, as shown in Figure 7.

4. Experiments

4.1. Experimental Environments

This article uses the deep learning framework PyTorch developed by Facebook to build and train a convolutional neural network model. The environment configuration is shown in Table 1. We use 80% of the samples of the data set as the training set and 20% of the samples as the test set. In addition, the batch size is 100, and the number of iterations is 5000.

4.2. Dataset

In this study, more than 2,000 mental health records of college students were collected from a university, and unreasonable records were filtered out. The 1,877 valid records were selected as the training sample set of the neural network model, and 10% of them were used as the test set to check the generalization of the trained network. Because the sample set contains many different parameters, the input data and the output detection results are preprocessed so that all values lie in [0, 1] or [-1, 1]. In addition, we collected corresponding facial emotion images of college students in and outside class.
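The preprocessing into [0, 1] or [-1, 1], and the inverse (reverse) normalization applied to simulation results before they are compared with the original data, can be sketched with min-max scaling. This is a minimal sketch assuming per-feature min-max scaling, which the text does not state explicitly; the function names and sample values are hypothetical.

```python
from typing import List, Tuple


def minmax_fit(values: List[float]) -> Tuple[float, float]:
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant feature cannot be min-max scaled")
    return lo, hi


def normalize(values, lo, hi, target=(0.0, 1.0)):
    """Map [lo, hi] linearly onto the target interval, e.g. (0, 1) or (-1, 1)."""
    a, b = target
    return [a + (b - a) * (x - lo) / (hi - lo) for x in values]


def denormalize(scaled, lo, hi, target=(0.0, 1.0)):
    """Inverse of normalize: map the target interval back onto [lo, hi]."""
    a, b = target
    return [lo + (hi - lo) * (s - a) / (b - a) for s in scaled]


scores = [1.2, 1.8, 2.5, 3.1, 1.0]              # hypothetical raw scores for one feature
lo, hi = minmax_fit(scores)
z01 = normalize(scores, lo, hi)                 # values in [0, 1]
z11 = normalize(scores, lo, hi, (-1.0, 1.0))    # values in [-1, 1]
restored = denormalize(z11, lo, hi, (-1.0, 1.0))
print(z01, z11, restored)
```

The same `lo` and `hi` fitted on the training data must be reused for the test data and for the inverse step; refitting them on predictions would distort the comparison with the original data.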

4.3. Evaluation Index

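To evaluate the proposed mental health assessment model fairly, we use the mean square error (MSE), calculated as follows:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2},$$

where $N$ is the number of test samples, $y_i$ is the real value, and $\hat{y}_i$ is the predicted value.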

4.4. Experimental Results

We first compared the proposed model with the BP network; Table 2 gives the experimental results. The proposed model achieves a large improvement over the BP network, which demonstrates its effectiveness.

In addition, to further verify the effectiveness of the model in this article, we designed an ablation experiment. We split the dual-branch network into single branches to conduct experiments one by one. The comparative results are shown in Table 3.

Table 3 shows that the vision-based branch outperforms the Bayesian-BP branch, indicating that vision-based facial emotion recognition is more reliable and effective. This may be because college students' answers are strongly shaped by subjective awareness when they fill in the SCL-90 self-rating symptom scale. However, each single branch performs worse than the dual-branch model, which again illustrates the superiority of the proposed model.

We also simulated the trained network, inversely normalized the simulation results, and compared them with the original data. Figure 8 compares the predicted (simulated) data with the real data. The simulated data are very close to the real data, which shows that the trained BP network fits the mental state of college students well.

4.5. Ablation Experiment for VGG

To justify the choice of VGG16, this section reports an ablation experiment comparing VGG16 and VGG19 in the proposed algorithm. Both variants were run under the same experimental conditions, and the results are shown in Table 4.

As Table 4 shows, replacing VGG16 with VGG19 increases the MSE by 50.7% relative to VGG16. VGG16 is therefore the better choice: with it, the proposed algorithm achieves an MSE of 3.12%, and VGG16 also has fewer parameters and lower computational complexity.

4.6. Ablation Experiment for Attention Mechanism

In order to verify the influence of the attention mechanism on the proposed algorithm, this section sets up an ablation experiment of the attention mechanism. No attention means not using the attention mechanism, and attention means using the attention mechanism. The results of the ablation experiment are shown in Table 5.

Table 5 clearly shows that without the attention mechanism the MSE rises to 7.63%, a large increase in the error of the proposed algorithm. This confirms that the attention mechanism is effective and improves the accuracy of facial emotion recognition.

5. Conclusion

Assessing and intervening in college students' mental health is one of the most important tasks in student management. With the development of intelligent technology, it has become possible to evaluate and predict college students' mental state in real time using multimodal data and neural network models. This paper therefore proposes a novel multimodal neural network model with two branches. One branch is a traditional mental health assessment and prediction algorithm based on an improved BP neural network and the International Mental Health Scale SCL-90. Because this method alone struggles to meet the accuracy requirements of college students' mental health assessment, the other branch performs computer vision-based facial emotion recognition of college students to support the assessment. Simulation and comparative experiments demonstrate the effectiveness and superiority of the proposed method.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the 2019 Teaching Reform Research Project of Wuxi Vocational College of Science and Technology: “flipped classroom” teaching mode in college students’ mental health education course application and design under Grant JG2019205.