#### Abstract

The separation of time and space in immersive virtual teaching prevents students from communicating emotionally, which may affect their mental health. In recent years, the use of affective computing technology to address the loss of affect in distance education has become a key research topic. To address the problem of emotional interaction in immersive virtual teaching, a semisupervised support vector machine (SVM)-based affective interaction model is proposed. First, the natural language sequences of students in the virtual teaching environment are preprocessed using a statistics-based framing method, and mutual information and expected cross-entropy are used as feature selection methods. Then, a vector space model based on TF/IDF feature term weights is proposed to implement the feature vector representation of natural language sequences. Finally, after the sentiment space is constructed, a semisupervised SVM is employed as the classifier to complete the affective interaction computation. The experimental results of emotion classification show that the proposed model determines and understands the emotional state more accurately than other traditional models and significantly improves the training speed. In addition, the proposed model can provide emotional encouragement or emotional compensation according to the specific emotional state of the learner.

#### 1. Introduction

In recent years, virtual reality technology has gained widespread attention as a new type of educational medium, providing a new technical means for modern distance education. Distance virtual instruction technology (DVIT) integrates modern information technology and human-computer interaction technology into actual teaching tasks. At present, networked education systems have entered an era of personalised development and need to adapt intelligently to the learning interests and learning emotions of different learners. Therefore, in order to promote the development of virtual distance learning, an important research topic is how to apply psychology and affective computing to the field of education.

Virtual instruction is a two-way interactive teaching model using computer technology, communication technology, simulation technology, artificial intelligence technology, and so on [1–7]. In the United States, the number of people learning through the Internet is growing at an annual rate of over 300%. The U.S. Education Technology Forum has proposed the “Digital Learning” initiative. The “Smart Classroom” introduces the concept of interactive spaces to distance learning systems, where the key technologies are interactive spaces and human-computer interaction models.

Communication between humans and computers using natural means is the goal of human-computer interaction theory. As computers play an increasing role, there is a pressing desire to be able to operate them in a natural way, so that people interact with machines in the same way that they interact with each other in everyday life (voice, gestures, and expressions) [8–12]. As a result, there has been an increasing amount of research in recent years on interacting with computers through multiple modalities, such as simple voice, gesture, and expression interaction. This human-like interaction has opened up a new generation of human-computer interaction research [13–15]. The main point to note is that the connection between humans and computers has a certain natural and social character.

Therefore, the most critical issue to be addressed in human-computer interaction is the capacity for affective intelligence, which is the main focus of this study. Affective interaction in virtual teaching refers to the positive and active tendencies between teachers and students in the virtual teaching process, reflecting the social relations and psychological activities of people in the virtual learning environment. To address the problem of emotional interaction in immersive virtual teaching, this paper proposes an affective interaction model based on semisupervised support vector machines and verifies its effectiveness through testing.

#### 2. Related Works

Affective computing, which builds on theories of human emotion and sentiment, has been a hot topic in information science, cognitive science, and psychological science in recent years [16–18]. As an important research area of affective computing, natural language processing (NLP) is an important way to obtain emotional information during human-computer interaction.

Although emotions are an internal experience of attitudes, they are often accompanied by external manifestations, such as a person’s facial expressions, body posture, and verbal expressions. Verbal expressions refer to the characteristics of an individual’s speech in terms of tone, rhythm, and speed when emotions occur. The concept of affective computing was first introduced by the MIT Media Lab in the USA. At present, typical emotion models include the Wundt 3-D theory model, the Plutchik 3-D model, and hidden Markov models (HMM). The aim of affective computing research is to create a computing system that can recognise and understand human emotions and respond intelligently to them. The primary problem in the study of affective computing is the acquisition of emotional signals, including changes in pulse rate, skin current intensity, speech, and facial expressions. Speech is one of the most intuitive signals of human emotion in virtual teaching. Moreover, speech signals are easier to acquire than other signals. The essence of the speech recognition task is NLP [19].

Traditional supervised learning methods have been effective in improving the performance of NLP by annotating large amounts of information. However, the annotation of training data is time and labour intensive and presents a number of problems when dealing with large data tasks. For example, the annotated corpus may be insufficient, or the quality of the labelling resources may be poor. Therefore, natural language processing techniques based on semisupervised learning have become a hot topic of research [20]. Unlike supervised learning methods, semisupervised learning uses both labelled and unlabelled data, and it can outperform supervised learning methods that only use labelled data. Semisupervised support vector machine (semi-SVM) is a typical semisupervised learning algorithm. Mohapatra [21] proposed a sampling-based semi-SVM that can effectively predict the presence of defects in software. However, the excessive number of iterations makes it unsuitable for large-scale situations. Ying et al. [22] proposed a direct optimisation of semi-SVM using a concave-convex process, which greatly reduces the training time, but the complexity is still high.

Therefore, this paper attempts to use semi-SVM to implement affective interaction in virtual teaching. The proposed model uses affective computing and NLP as its core technologies: it captures and recognises the learner’s voice characteristics, determines and understands his or her emotional state, and provides corresponding emotional encouragement or emotional compensation. The proposed model is, to a certain extent, a model of the learner’s emotional state. It enhances the emotional interaction in the distance learning system and makes a useful exploration towards solving the emotional deficit in distance virtual teaching. The main innovations and contributions include the following: (1) a vector space model based on TF/IDF feature term weights is proposed to implement the feature vector representation of language sequences; (2) after the emotion space is constructed, semi-SVM is employed as a classifier to complete the emotion interaction calculation.

#### 3. NLP in Virtual Teaching

##### 3.1. Framing and Feature Selection for Natural Language Sequences

Compared to other emotional signals, the speech signal is an easy-to-capture one-dimensional signal during virtual teaching and learning, and it directly reflects the changes in students’ emotions. In order to recognise the speech signal, its frequency domain information must be analysed. The flow chart is shown in Figure 1.

As shown in Figure 1, the first step is to perform the framing and windowing operations. Here, the Hamming window is chosen as the window function. The short-time Fourier transform (STFT) is applied to each frame obtained after the windowing:

$$X(k) = \sum_{n=0}^{N-1} x(n)\,w(n)\,e^{-j 2\pi k n / N}, \qquad w(n) = (1 - a) - a \cos\left(\frac{2\pi n}{N - 1}\right) \tag{1}$$

where $x(n)$ denotes the input speech sequence, $w(n)$ denotes the window function, $a$ denotes the scaling weight, and $N$ denotes the window size.
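The framing, windowing, and transform steps can be illustrated with a small standard-library sketch. It is only an illustration under assumptions: the frame length of 8, the hop of 4, the conventional Hamming coefficient 0.46, and the function names `hamming`, `frames`, and `stft` are not taken from the paper.

```python
import cmath
import math

def hamming(N, a=0.46):
    """Hamming window of length N; a=0.46 is the conventional coefficient (assumed)."""
    return [(1 - a) - a * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def frames(x, N, hop):
    """Split sequence x into overlapping frames of length N, advancing by hop samples."""
    return [x[s:s + N] for s in range(0, len(x) - N + 1, hop)]

def stft(x, N=8, hop=4):
    """Return one list of N complex DFT coefficients per windowed frame."""
    w = hamming(N)
    out = []
    for frame in frames(x, N, hop):
        xw = [frame[n] * w[n] for n in range(N)]
        out.append([
            sum(xw[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)
        ])
    return out

# Toy signal (illustrative): a sinusoid with exactly 2 cycles per 8-sample frame.
signal = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
spectrum = stft(signal)

# Strongest bin among the low positive frequencies of the first frame.
peak_bin = max(range(1, 4), key=lambda k: abs(spectrum[0][k]))
```

For this toy signal the energy concentrates in frequency bin 2, with some leakage into the neighbouring bins caused by the window.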

For the framing task, the more often adjacent words occur together in the content of a speech sequence, the greater the probability that they form a word. A statistically based framing approach therefore counts the frequency of each combination of words in a natural language sequence, which requires the calculation of their mutual information $MI(w_1, w_2)$:

$$MI(w_1, w_2) = \log \frac{P(w_1, w_2)}{P(w_1)\,P(w_2)} \tag{2}$$

where $P(w_1)$ is the probability of word $w_1$ occurring in the sequence, $P(w_2)$ is the probability of word $w_2$ occurring in the sequence, and $P(w_1, w_2)$ is the probability of the two words occurring next to each other. $MI(w_1, w_2)$ provides an objective indication of the tightness of the relationship between the two words. By framing the sequences based on statistics, the speech sequences are partitioned into feature terms so that ambiguity in context is reduced as far as possible.
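A minimal sketch of the adjacent-word mutual information score follows. The maximum-likelihood probability estimates, the base-2 logarithm, the toy sentence, and the function name `adjacent_mi` are assumptions made for illustration, not details given in the paper.

```python
import math
from collections import Counter

def adjacent_mi(tokens):
    """Return {(a, b): log2(P(a, b) / (P(a) * P(b)))} for every adjacent token pair."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)       # number of single-token positions
    n_bi = len(tokens) - 1    # number of adjacent-pair positions
    scores = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores

# Toy sequence (illustrative only).
tokens = "i am very happy today i am very tired today".split()
mi = adjacent_mi(tokens)
```

Here the pair `("i", "am")` always occurs together and scores higher than `("today", "i")`, which is adjacent only once although both tokens appear twice.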

After the natural language sequences have been framed into word sets, a high-dimensional feature set is obtained. In automatic sequence classification, the dimensionality of the feature space needs to be reduced; i.e., the most suitable feature subset is selected from the input feature set. Based on statistical principles, mutual information (MI) and expected cross-entropy (ECE) are used as feature selection methods in this paper. The mutual information is calculated as follows:

$$MI(t, c) = \log \frac{P(t \mid c)}{P(t)} \tag{3}$$

where $P(t \mid c)$ is the conditional probability, $t$ is a feature term, and $c$ is the category. The larger $MI(t, c)$ is, the greater the probability that the feature term $t$ and the category $c$ will occur together.

For the whole series, the average of the mutual information over all $m$ categories is generally used for the calculation:

$$MI_{avg}(t) = \sum_{i=1}^{m} P(c_i)\,MI(t, c_i) \tag{4}$$

Expected cross-entropy is calculated as follows:

$$ECE(t) = P(t) \sum_{i=1}^{m} P(c_i \mid t) \log \frac{P(c_i \mid t)}{P(c_i)} \tag{5}$$

Based on the analysis of the probability distribution of the sequence classification, the feature term with the larger $ECE(t)$ value is usually selected.
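The two feature selection scores can be sketched as follows, assuming document-level presence counts as probability estimates. The toy corpus, the function name `feature_scores`, and the choice to skip categories in which the term never occurs (to avoid a logarithm of zero) are illustrative assumptions.

```python
import math

def feature_scores(docs, term):
    """Return (average MI, ECE) of `term` for labelled docs given as (set_of_terms, label)."""
    n = len(docs)
    labels = sorted({label for _, label in docs})
    p_t = sum(term in d for d, _ in docs) / n
    avg_mi, ece_sum = 0.0, 0.0
    for c in labels:
        in_c = [d for d, label in docs if label == c]
        p_c = len(in_c) / n
        p_t_given_c = sum(term in d for d in in_c) / len(in_c)
        if p_t == 0 or p_t_given_c == 0:
            # Simplification: skip this category instead of evaluating log(0).
            continue
        p_c_given_t = p_t_given_c * p_c / p_t   # Bayes' rule
        avg_mi += p_c * math.log(p_t_given_c / p_t)
        ece_sum += p_c_given_t * math.log(p_c_given_t / p_c)
    return avg_mi, p_t * ece_sum

# Toy labelled corpus (illustrative only).
docs = [
    ({"great", "lesson"}, "pos"),
    ({"great", "fun"}, "pos"),
    ({"boring", "lesson"}, "neg"),
    ({"boring", "slow"}, "neg"),
]
great_mi, great_ece = feature_scores(docs, "great")
lesson_mi, lesson_ece = feature_scores(docs, "lesson")
```

The term `"great"` appears only in one category and receives a positive score, while `"lesson"` is spread evenly over both categories and scores zero, so it would be dropped first.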

##### 3.2. Vector Space Representation Model

Based on the principle of semantic retrieval, we use a vector space model based on TF/IDF feature term weights [23] to implement a feature vector representation of natural language sequences. The initial word frequency statistics of a sequence in the affective computing system are computed using the following formula:

$$tf_{i,j} = \frac{f_{i,j}}{T_j} \tag{6}$$

where $f_{i,j}$ is the raw frequency statistic of word sense $k_i$ in sequence $d_j$, $tf_{i,j}$ is the standard frequency of word sense $k_i$ in sequence $d_j$, and $T_j$ is the total number of word senses in sequence $d_j$. The inverse sequence frequency is determined by the number of sequences:

$$idf_i = \log \frac{N}{n_i} \tag{7}$$

where $n_i$ is the number of sequences in which sense $k_i$ occurs at least once, $N$ is the total number of sequences in the system, and $idf_i$ is the inverse sequence frequency of sense $k_i$. The sequence word sense weights are defined as follows:

$$w_{i,j} = tf_{i,j} \times idf_i \tag{8}$$

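The query word sense weights are defined as follows:

$$w_{i,q} = tf_{i,q} \times idf_i \tag{9}$$

where $w_{i,q}$ is the weight of word sense $k_i$ in query $q$, $tf_{i,q}$ is the initial frequency statistic of word sense $k_i$ for query $q$, and $idf_i$ is the inverse sequence frequency of $k_i$.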
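A vector space representation with TF/IDF weights can be sketched as below. This assumes length-normalised term frequency and an unsmoothed logarithmic inverse frequency; the tokenised toy sequences and the function name `tfidf_vectors` are illustrative, and production libraries typically add smoothing that is omitted here.

```python
import math
from collections import Counter

def tfidf_vectors(sequences):
    """Return (vocabulary, list of TF/IDF weight vectors) for non-empty tokenised sequences."""
    vocab = sorted({t for seq in sequences for t in seq})
    n_seq = len(sequences)
    # Number of sequences in which each term occurs at least once.
    df = Counter(t for seq in sequences for t in set(seq))
    idf = {t: math.log(n_seq / df[t]) for t in vocab}
    vectors = []
    for seq in sequences:
        counts = Counter(seq)
        vectors.append([counts[t] / len(seq) * idf[t] for t in vocab])
    return vocab, vectors

# Toy sequences (illustrative only).
sequences = [
    ["i", "feel", "happy"],
    ["i", "feel", "tired"],
    ["i", "am", "happy", "happy"],
]
vocab, vectors = tfidf_vectors(sequences)
```

A term that occurs in every sequence, such as `"i"` here, gets weight zero because its inverse frequency is `log(1)`, so it does not help distinguish one sequence from another.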

#### 4. Semi-SVM-Based Affective Interaction Model

Affective interaction, as an indispensable part of teacher-student interaction in teaching activities, has a positive role in learning: (1) affective interaction can stimulate learners’ motivation to learn; (2) affective interaction facilitates the achievement of emotional goals in teaching; (3) affective interaction can effectively improve learning efficiency, since pleasant and enthusiastic emotions generally keep the human brain in an optimal state, and people who learn in a pleasant mood remember well and learn efficiently; and (4) affective interaction can relieve learners’ real-life stress and regulate their mood.

##### 4.1. Constructing Affective Space

Psychologists classify human emotions into six categories: happiness, surprise, fear, sadness, disgust, and anger. All human emotions are based on a complex blend of these six emotions. Based on these ideas, emotional interaction requires the solution of two main problems: the selection of affective semantic words and the construction of an artificial emotional space.

On the basis of the 3D theoretical model of affect, and considering the possible affective states of learners in distance virtual teaching, we created an affective space with four components, as shown in Figure 2.

As can be seen from Figure 2, the three pairs of affective elements divide the affective space into eight subspaces. To simplify the problem, the eight subspaces are combined into four affect spaces: excitement, happiness, frustration, and anger. Excitement and happiness are regarded as positive emotions, which are emotional states that contribute to students’ learning. Frustration and anger are regarded as negative emotions, which are emotional states that inhibit students’ learning. The dots on the three-dimensional coordinate system are neutral emotions, which neither promote nor inhibit student learning.

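Let the set of sentiment states be $S = \{s_1, s_2, \ldots, s_N\}$, where $N$ denotes the number of basic sentiment states. The sentiment state is denoted by the random variable $X$. Let $p_i$ be the probability that $X = s_i$ (the $i$-th affective state), which satisfies the following conditions:

$$\sum_{i=1}^{N} p_i = 1, \qquad 0 \le p_i \le 1, \quad i = 1, 2, \ldots, N \tag{10}$$

In this way, the probabilistic spatial model of the affective state can be expressed as follows:

$$\begin{pmatrix} S \\ P \end{pmatrix} = \begin{pmatrix} s_1 & s_2 & \cdots & s_N \\ p_1 & p_2 & \cdots & p_N \end{pmatrix} \tag{11}$$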

A person’s affective state cannot be directly observed, but the characteristics of a state can be observed, and these features are used to infer the possible affective states. On the basis of the HMM model, it is assumed that the transition between emotional states corresponds to different probability values for different external motivational strategies. For example, the probability of moving from “calm” to “happy” is higher when students are “positively” motivated than when they are “negatively” motivated. In establishing the affective space, we only considered the effect of different motivational strategies on emotions but did not consider the difference in the effect of the same motivational strategy on the emotions of different subjects (the problem of individualisation of emotions), which is left for further investigation.
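The idea of strategy-dependent transitions can be sketched with one row-stochastic matrix per motivational strategy. Every number below, the reduced three-state space, and the names `STATES`, `TRANSITIONS`, and `step` are hypothetical and chosen only to illustrate the mechanism; they are not the paper's parameters.

```python
STATES = ["calm", "happy", "frustrated"]

# Hypothetical transition matrices: row = current state, column = next state.
# Each row sums to 1, as required of a probability distribution.
TRANSITIONS = {
    "positive": [[0.5, 0.4, 0.1], [0.2, 0.7, 0.1], [0.4, 0.3, 0.3]],
    "negative": [[0.5, 0.1, 0.4], [0.4, 0.3, 0.3], [0.2, 0.1, 0.7]],
}

def step(p, strategy):
    """Propagate the state distribution p one step under the given strategy."""
    T = TRANSITIONS[strategy]
    return [sum(p[i] * T[i][j] for i in range(len(p))) for j in range(len(p))]

calm = [1.0, 0.0, 0.0]               # the learner starts in the calm state
after_pos = step(calm, "positive")
after_neg = step(calm, "negative")
```

Starting from calm, the probability of reaching the happy state is higher under the positive strategy than under the negative one, which mirrors the assumption stated above.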

##### 4.2. Affective Interaction Algorithms

Currently, as a typical machine learning method, support vector machines have been widely used in various automated classification fields [24–26]. Semisupervised learning is a popular direction that has emerged in the field of machine learning in recent years, which can make effective use of both labelled and unlabelled data. Researchers have applied semi-SVM to classification tasks in several fields, showing certain advantages. The basic principle model of a support vector machine is shown in Figure 3.

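Let the training sample set of the support vector machine be $\{(x_i, y_i)\}_{i=1}^{n}$ with $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$, where $y_i$ is the category label of sample $x_i$, $d$ is the sample dimension, and $n$ is the number of training samples. In semisupervised learning, the training set can be regarded as a mixture of a labelled and an unlabelled data set. If the data sample set is linearly separable, then we can find a hyperplane that satisfies the generalised classification optimum [27]:

$$w \cdot x + b = 0 \tag{12}$$

where $w$ is a coefficient vector and $b$ is an offset. The optimisation problem for the classification task can be represented by

$$\min_{w, b} \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1, \quad i = 1, \ldots, n \tag{13}$$

It can be seen that maximising the classification interval ($2/\|w\|$) means minimising $\|w\|^2/2$. In general, real-life data are unlikely to be perfectly linearly separable. Therefore, a penalty factor is introduced into formula (13) to obtain the Lagrangian transformed optimisation problem, as shown in

$$\max_{\alpha} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C \tag{14}$$

where $C$ is the error penalty factor. Solve for $\alpha$ via formula (14) and then for $w$ via $w = \sum_{i=1}^{n} \alpha_i y_i x_i$.

For vectors with uncertain class attributes, the following judgement function is generally used to discriminate [28]:

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i y_i (x_i \cdot x) + b\right) \tag{15}$$

For a nonlinear support vector machine, the judgement function is shown in

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b\right) \tag{16}$$

where $K(\cdot,\cdot)$ denotes the kernel function, $\operatorname{sgn}$ denotes the sign function, and $n$ is the number of training samples.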
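Evaluating a kernel judgement function of this kind can be sketched as follows. The RBF kernel, its `gamma` value, and the hand-picked support vectors, multipliers, and offset are assumptions for illustration; in practice the multipliers and offset come from solving the training problem.

```python
import math

def rbf(u, v, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def decide(x, support, alphas, labels, b, kernel=rbf):
    """Return the sign (+1 or -1) of the kernel expansion evaluated at x."""
    s = sum(a * y * kernel(sv, x) for sv, a, y in zip(support, alphas, labels))
    return 1 if s + b >= 0 else -1

# Hypothetical trained parameters; note that sum(alpha_i * y_i) == 0 holds.
support = [(0.0, 0.0), (2.0, 2.0)]
alphas = [1.0, 1.0]
labels = [-1, +1]
b = 0.0

near_positive = decide((1.8, 1.9), support, alphas, labels, b)
near_negative = decide((0.1, 0.2), support, alphas, labels, b)
```

A query point is assigned to the class of the support vector it is closest to in kernel terms, since the nearer support vector dominates the weighted sum.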

##### 4.3. Complexity Analysis

The semisupervised algorithm consists of two stages: (1) training the model on labelled data with a time complexity of $O(T_1 l)$ and (2) training the model on all data with a time complexity of $O(T_2 n)$, where $T_1$ and $T_2$ are the numbers of iterations, $l$ is the number of labelled samples, and $n$ is the number of all samples.

Both $T_1$ and $T_2$ are not greater than 10. The labelled samples are far fewer than the unlabelled samples, i.e., $l \ll n - l$. Therefore, the total time complexity of the semisupervised algorithm is $O(n)$, which is linear in the number of samples.
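The two-stage structure can be illustrated with a self-training sketch: fit on the labelled data, pseudo-label the unlabelled data, then refit on everything. This is an assumption about one possible realisation, not the paper's algorithm; the linear SVM trained by plain subgradient descent, the hyperparameters, the toy points, and all function names are illustrative.

```python
def train_linear_svm(X, y, C=1.0, epochs=1000, lr=0.005):
    """Batch subgradient descent on 0.5*||w||^2 + C * sum of hinge losses."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = list(w), 0.0          # gradient of the 0.5*||w||^2 term is w
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:             # sample violates the margin
                gw = [g - C * yi * xj for g, xj in zip(gw, xi)]
                gb -= C * yi
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(w, b, x):
    """Classify x as +1 or -1 by the side of the hyperplane it falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def self_training_svm(X_lab, y_lab, X_unl):
    """Stage 1: fit on labelled data. Stage 2: refit on all data with pseudo-labels."""
    w, b = train_linear_svm(X_lab, y_lab)
    pseudo = [predict(w, b, x) for x in X_unl]
    return train_linear_svm(X_lab + X_unl, y_lab + pseudo)

# Toy data (illustrative): two labelled points and four unlabelled points.
X_lab = [(0.0, 0.0), (3.0, 3.0)]
y_lab = [-1, 1]
X_unl = [(0.5, 0.2), (0.2, 0.6), (2.8, 3.1), (3.2, 2.7)]
w, b = self_training_svm(X_lab, y_lab, X_unl)
```

Each pass over the data costs time proportional to the number of samples, which is the source of the linear behaviour discussed above; the epoch count used here is a property of this toy optimiser and is unrelated to the iteration bounds stated in the text.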

#### 5. Experimental Results and Analysis

##### 5.1. Experimental Setup

In order to verify the validity of the proposed affective interaction model, simulation experiments and analysis were conducted using 400 recordings of real teaching processes as the data set. The simulation environment was configured with the Windows 7 operating system, an Intel Core i5 CPU, 4 GB of RAM, and the MATLAB 2012 simulation platform.

The data set used for the experiments was derived from virtual English teaching videos of 15 classes. The duration of the audio sequence in each video was 30 minutes. Out of the 400 audio sequences, 200 were randomly selected as annotated data and the remaining 200 were used as unannotated data for semisupervised learning. For comparison purposes, in addition to accuracy (ACC), we used the F1 score as an evaluation metric.
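The two evaluation metrics can be computed as in the sketch below. The paper does not state how F1 is averaged over emotion classes, so macro-averaging is an assumption here, and the toy labels and function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging, assumed)."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no true positives.
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)

# Toy labels (illustrative only).
y_true = ["happy", "happy", "anger", "anger"]
y_pred = ["happy", "anger", "anger", "anger"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

In this toy case three of four predictions are correct, and F1 differs from accuracy because the single error lowers the recall of one class and the precision of the other.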

##### 5.2. Affect Recognition Results

First, we ran baseline experiments using the standard HMM model on labelled data. Then, the effectiveness and time performance of semi-SVM and $k$-nearest neighbor (KNN) [29] were compared under different numbers of unlabelled samples. The experimental results of the ACC and F1 values of the three methods under different setup conditions are shown in Table 1.

As can be seen from Table 1, the proposed sentiment interaction model achieved the best ACC and F1 values for this task, demonstrating the effectiveness of semi-SVM. The accuracy and F1 values of both semisupervised algorithms improved as the number of unlabelled samples increased. A comparison of the training times of KNN and semi-SVM for different setup conditions is shown in Table 2.

As can be seen from Table 2, compared with KNN, semi-SVM trains substantially faster, although its accuracy improvement is not significant. This is because the total time complexity of semi-SVM is $O(n)$, i.e., linear in the total number of samples.

##### 5.3. Stimulating Sentiment Swings

The next step is to verify whether the proposed affective interaction model fits the pattern of human affective change. Human sentiment states shift in response to external stimulus signals. As the external stimulus disappears, the human emotional state should gradually return to a calm state. The process of emotion fading over time after receiving a stimulus in a given state is shown in Figure 4. The sentiment swings at a given moment when a sequence of stimuli is received are shown in Figure 5.
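The qualitative behaviour described here, a shift on stimulation followed by a gradual return to calm, can be mimicked with a simple decay model. The exponential form, the decay rate, the stimulus values, and the function name `simulate` are assumptions for illustration; the paper does not specify the functional form of the fading process.

```python
import math

def simulate(stimuli, steps, decay=0.3):
    """Track a scalar emotion level (0 = calm) that decays each step and jumps on stimuli.

    stimuli maps a time step to a signed stimulus strength (positive or negative).
    """
    level, trace = 0.0, []
    for t in range(steps):
        level = level * math.exp(-decay) + stimuli.get(t, 0.0)
        trace.append(level)
    return trace

# Hypothetical stimuli: a positive one at t=0 and a negative one at t=5.
trace = simulate({0: 1.0, 5: -0.8}, steps=12)
```

After each stimulus the level moves away from zero and then shrinks towards it again, so the final value is closer to calm than the value right after the second stimulus.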

The results of the experiment show that, when the initial affective state is calm, the affective state can be effectively changed by stimuli of different natures. The degree of change in affective state was related to the motivational factors. This result is consistent with the theories of emotional psychology, common sense, and experience and supports the validity of the semi-SVM-based emotional interaction model.

#### 6. Conclusions

To address the problem of affective interaction in immersive virtual teaching, this paper proposes a semi-SVM-based affective interaction model. A vector space model based on TF/IDF feature term weights is proposed to implement the feature vector representation of language sequences. After the sentiment space is constructed, semi-SVM is employed as a classifier to complete the sentiment interaction computation. The proposed model achieves good results in terms of the accuracy and F1 evaluation metrics, validating its feasibility. However, relying only on speech signals to train and test the emotional interaction model is still not satisfactory, and the combination of brain IR detection and face recognition will be considered for more accurate emotional computation in order to make human-computer emotional interaction more realistic, natural, and effective.

#### Data Availability

The experimental data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest to report regarding the present study.