Abstract

Building on basic emotion theory and the PAD emotion model, which can describe continuous changes in emotion, we propose a more general five-dimensional emotion model to better meet the needs of emotion recognition. We determined the relationship between its dimensions and the basic emotions and verified the model using Pearson correlation analysis and a multilayer perceptron, among other methods, comparing its recognition accuracy with that of human volunteers. The results showed that the five-dimensional emotion model recognized emotions more accurately than the volunteers did. We also compared it with the PAD emotion model, and the five-dimensional emotion model again performed better. Finally, using the proposed model, we designed a technology prototype of a mood-adaptive interface to demonstrate a potential application.

1. Introduction

The development of artificial intelligence has reached a very high level in the areas of reasoning, calculation, recognition, and learning. However, it only simulates human thinking and rarely pays attention to people’s actual mental activity. The research on artificial intelligence combined with human psychological changes in the field of emotion recognition is only just beginning.

In current research, different psychological schools have different understandings of emotion. The psychological understanding of emotion can be roughly divided into basic emotion theories and dimensional emotion theories. Functional psychology believes that mental activity is a continuous whole. It opposes the splitting of a person’s mental activity into parts [1]. Dimensional emotion theories are more suitable for expressing a continuous change of emotion.

In the study of dimensional emotion theories over the past decades, psychologists have proposed various models. The PAD emotion model (or simply PAD, from its three dimensions) describes emotions as a composition of pleasure–displeasure, arousal–nonarousal, and dominance–submissiveness [2–4]. Frijda proposed that emotions are a mixture of pleasant–unpleasant, excited, interested, social evaluation, surprise, and simple–complex [5]. Izard proposed that emotions are composed of pleasantness, tension, impulse, and confidence dimensions [6]. These emotion models have been applied in various situations and contexts, but they do not necessarily work best in emerging fields. In emotion recognition, many computer science studies that rely on existing emotion models leave room for improvement before their results can be applied across different environments.

In current AI research using existing emotion models, Wang used the OCC model to compose the cognitive components of an agent’s emotions [7]; Han used the AVS emotion model to design the cognitive-emotional model of an eldercare robot [8]; and Mi used the two-dimensional arousal-valence emotion space to explore the academic emotions of secondary school students [9].

However, there is not much research on the design of emotion models adapted to the field of artificial intelligence. Shangfei Wang believed that the name of the emotion dimension was unimportant, and the use of dimension implied that an emotion space can be established using appropriate methods in which each emotion can be seen as a vector of that space [10]. Fu et al. [11] and Cai [12] used a neural network to directly train data features to obtain a model. On the basis of a series of experimental studies [13–15], Gable and Harmon-Jones first directly tested the hypothesis that motivation-related factors of emotion may affect cognitive processing, such as attention and memory, and proposed the motivational dimensional model of affect [16]. Many of these studies have not been adapted well to the field of emotion recognition, and this is why we propose the five-dimensional emotion model.

2. Five-Dimensional Emotion Model

In view of the shortcomings of those existing models for our purposes, we deduced and defined five dimensions related to emotion and then built a five-dimensional emotion model and evaluated it.

2.1. Determine Dimensions

In 1971, Ekman developed the Facial Affect Scoring Technique (FAST) [17], which is an emotional measurement system based on facial muscle movement components. According to this system, he listed six basic emotions: happiness, anger, fear, sadness, surprise, and disgust. On the premise that this dimensional emotion theory provided an objective truth, there must be different emotional dimension values that could be assigned to different basic emotions. Accordingly, we established a method in reverse which determines basic emotions from inferred dimensions.

Because “disgust” can be understood as an evaluation of external objects or stimuli, which was not conducive to determining the dimensions, it was temporarily excluded from the derivation. Based on two existing emotion models, we first assumed two known dimensions: intensity and pleasure. Intensity describes the degree of individual neural activation; pleasure distinguishes the positive and negative aspects of emotions. These dimensions are internal experiences that belong to an individual’s own feelings, so we class them as internal dimensions. Intensity is not a determinant of emotion; it only describes the strength of an emotion. Pleasure distinguishes happiness from the other five basic emotions and makes happiness a positive emotion.

In 1985, Smith and Ellsworth determined six dimensions of emotion through experimentation: certainty, pleasure, attention activity, control, expected effort, and sense of responsibility. They then proposed a cognitive evaluation theory of emotion [18]. According to this theory, the core dimensions that distinguish anger from other negative emotions are certainty, control, and responsibility. Anger is an emotion that often occurs when something negative happens to the individual and others were responsible and/or have control. Also, the individual has a fair sense of certainty about the event. Fear is an emotion that appears often when the environment controls negative events. With it, individuals have a high sense of uncertainty for the negative objective events. In Kalat and Shiota’s study of emotion, they pointed out that, under the same conditions, the difference between anger and sadness is whether people can control the situation [19]. So, we can determine two dimensions to distinguish negative emotions: control and certainty.

Surprise and fear are similar in many cases. In a 1985 study, Meng [20] determined the facial expression patterns of infants. Strangers held rats in front of the infants to elicit fear, and rats running freely in a box were presented to elicit surprise. The experiment produced the predicted results. This comparison shows that the key variable determining fear or surprise was whether the infants increased their response intensity to the external stimuli, namely, tension. From this, we can determine the dimension that distinguishes surprise from fear: tension.

Because the three dimensions described above are related to cognition and they lead to emotional differences, we can categorize them as external dimensions.

To reduce the potential error caused by individuals failing to analyze emotions well, we recruited 65 people to rate the relationships between the basic emotions and the five dimensions on six levels: very high (100), high (75), medium (50), low (25), very low (0), and no correlation (empty). Some participants considered the gaps between these levels too large and filled in other values, such as 30. This was not expected to affect the experimental results.

To reduce the error further, we deleted outliers (defined as data with very large deviations). We also treated any relationship with too high a standard deviation, and thus too large a dispersion, as irrelevant for our use because it meant that different individuals were expressing different feelings. The standard deviations of the ratings relating the five basic emotions to the five dimensions are shown in Table 1.

Finally, we calculated the average of the remaining data and determined the corresponding level relationships between the five dimensions and the basic emotions. The relationship is shown in Table 2.
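The cleaning and averaging of the ratings can be illustrated with a minimal sketch. It assumes that an empty answer is stored as None and that an outlier is any rating more than max_z standard deviations from the mean; the cutoff, the function name, and the sample ratings are hypothetical, since the text does not state the exact rule.

```python
import statistics


def clean_ratings(raw_ratings, max_z=2.0):
    """Drop empty answers, then drop ratings far from the mean.

    max_z is a hypothetical cutoff in standard deviations; the study
    does not state the threshold it used.
    """
    ratings = [r for r in raw_ratings if r is not None]
    if len(ratings) < 2:
        return ratings
    mean = statistics.mean(ratings)
    spread = statistics.pstdev(ratings)
    if spread == 0:
        return ratings
    return [r for r in ratings if abs(r - mean) <= max_z * spread]


# Illustrative ratings for one emotion-dimension pair (not study data).
raw = [75, 75, 100, 75, None, 100, 75, 75, 100, 0]
kept = clean_ratings(raw)
level = statistics.mean(kept)         # average used as the relationship level
dispersion = statistics.pstdev(kept)  # spread of the remaining ratings
```

A pair whose dispersion stays large after cleaning would be treated as having no usable relationship, in line with the reasoning above.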

2.2. Verify Dimensions

Izard points out that emotion plays a key role in adaptation and survival. Emotions have adaptability and mobility. The evidence for this conclusion is that studies in comparative anatomy and comparative psychology confirmed that facial activities play an important role in the lives of some species of vertebrates; this emphasizes the role of facial expression in emotion research [21]. On this basis, Meng Zhaolan proposed that facial expression can be used as a basic means of determining emotions in research [22].

Therefore, we can verify the five dimensions of our model through the study of facial motion or expression.

Ekman and other researchers built the Facial Action Coding System (FACS) in the mid-1970s by studying their own facial expressions [23]. This system is not based on muscles but on facial activity, namely, the action unit (AU). Each AU is associated with one or more muscles. The AUs we used are shown in Table 3.

An individual can generally judge another’s emotions from a quick scan of their expression. In this case, the visual image of the face is very hazy, and many AUs cannot be recognized clearly. This suggests that judging emotions does not require so many AUs. We therefore simplified the table of action units and merged the 24 AUs into five simple AUs: the movement of the eyebrows up and down, the distance between the eyebrows, the closure of the eyes, the curvature of the mouth corners, and the closure of the mouth.

By using the Karolinska Directed Emotional Faces (KDEF) [24], a dataset of facial expressions and corresponding emotions provided by the Karolinska Institute, to analyze the simple AUs, we could verify the accuracy of our model.

Using the Face++ engine, we extracted 20 feature points from the KDEF expressions (see Figure 1).

Then, we used the coordinates of these feature points to obtain a quantized value for each simple AU.

For the closure of the mouth, the following equations were used:

The mouth_reference term is introduced because it provides a reference length within the face, which removes the effect of the camera’s distance from the face on the coordinate values. The other measurements use similar reference terms.
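The idea of dividing by a reference length inside the face can be sketched as follows. This is an illustration under stated assumptions, not the paper's formula: the landmark names are hypothetical, and using the mouth width as the reference is an assumption standing in for mouth_reference.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y) landmark points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def normalized_length(a, b, ref_a, ref_b):
    """Length of segment a-b as a fraction of the reference segment."""
    reference = distance(ref_a, ref_b)
    if reference == 0:
        raise ValueError("reference segment has zero length")
    return distance(a, b) / reference


# Hypothetical landmarks in pixels: lip midpoints and mouth corners.
upper_lip, lower_lip = (100.0, 150.0), (100.0, 170.0)
left_corner, right_corner = (80.0, 160.0), (120.0, 160.0)

# Assumption: the mouth width plays the role of the reference length.
mouth_open = normalized_length(upper_lip, lower_lip, left_corner, right_corner)
```

Because both lengths scale together when the face moves closer to or further from the camera, the ratio stays the same for the same expression.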

For the curvature of the mouth corners, the following equations were used:

For the closure of the eyes, the following equations were used:

For the movement of the eyebrows up and down and the distance between the eyebrows, the following equations were used:

Here, we used the data of individuals with extreme results to obtain the maximum and minimum values of the distance between the eyebrows and then mapped each value to the range of 0–100. The similar operations that follow work in the same way:

In this way, we determined the values of the simple AUs from the expression pictures, which are needed to calculate the five-dimensional values: the closure of the mouth corresponds to mouth_open, the curvature of the mouth corners corresponds to mouth_rad, the closure of the eyes corresponds to eyes_open, the distance between the eyebrows corresponds to eyebrows_distance, and the movement of the eyebrows up and down corresponds to eyebrows_height.
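The mapping to the range of 0–100 described for the distance between the eyebrows can be sketched as a linear rescaling between the observed extremes. The function name and the example numbers are hypothetical, and clipping values that fall outside the extremes is an assumption.

```python
def scale_to_0_100(value, lowest, highest):
    """Map value linearly so that lowest -> 0 and highest -> 100.

    Values outside [lowest, highest] are clipped (an assumption; the
    text does not say how such values were handled).
    """
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    clipped = min(max(value, lowest), highest)
    return 100.0 * (clipped - lowest) / (highest - lowest)


# Illustrative extremes taken from the most extreme expressions.
eyebrows_distance = scale_to_0_100(30.0, lowest=20.0, highest=60.0)
```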

2.2.1. Correlation Analysis

We conducted a Pearson correlation analysis on the five variables obtained. The population correlation coefficient is denoted by ρ. Because ρ is generally unknown, it must be estimated from a sample correlation coefficient r. Because r varies from sample to sample, we needed to examine its suitability and carry out a significance test on it. Using SPSS, we calculated the simple AUs from the KDEF database with the formulas described above and then computed the Pearson correlations between these five variables and the dimensions of the corresponding emotions. We then applied min-max normalization, a commonly used normalization method, to the correlation coefficients of the five simple AUs so that they could be used as feature weights when calculating the five-dimensional values. The correlation coefficients of the five simple AUs are compared in Table 4.
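The two calculations named above, the sample Pearson coefficient r and min-max normalization, can be sketched in a few lines. The study used SPSS; this pure-Python version is only an illustration, the function names are hypothetical, and normalizing the signed coefficients (rather than their absolute values) is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


def min_max_normalize(values):
    """Rescale values linearly so the smallest is 0 and the largest is 1."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are equal")
    return [(v - low) / (high - low) for v in values]


# Illustrative use: correlate each AU column with one dimension, then
# normalize the five coefficients into feature weights.
# weights = min_max_normalize([pearson_r(au, dimension) for au in au_columns])
```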

We calculated the corresponding five-dimensional values of the images from the KDEF database by applying the obtained coefficients.

2.2.2. Multilayer Perceptron

A multilayer perceptron can solve most linearly nonseparable problems. In this part of the study, we used a multilayer perceptron composed of five layers of neurons: an input layer, hidden layers, and an output layer. Gradient clipping and dropout were applied during training for regularization.

In the multilayer perceptron, we used a softmax activation function and a cross-entropy loss function. We considered the five-dimensional data of an expression picture as a group and therefore had 980 groups of data. We divided this data into 880 groups for training, and the remaining 100 groups were used as test data.
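The softmax activation and cross-entropy loss mentioned above can be written out as a short sketch. It shows only these two functions for a single sample, not the network or its training; the function names are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw output-layer scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, true_index):
    """Loss for one sample: negative log-probability of the true class."""
    return -math.log(probabilities[true_index])


# Illustrative scores for three emotion classes; class 2 is the true label.
probabilities = softmax([1.0, 2.0, 3.0])
loss = cross_entropy(probabilities, true_index=2)
```

The loss shrinks toward zero as the probability assigned to the correct emotion approaches 1.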

Table 5 shows the accuracy of the model after 5,180,104 training iterations.

To verify the accuracy of the model, we also invited 12 volunteers of different ages to identify the same 100 randomly selected facial images and calculated their accuracy. We then had the model identify the same 100 facial images. The final results are shown in Table 6.

The results show that the accuracy of the model recognition after training was higher than that of human volunteers.

3. Comparison between the Five-Dimensional Emotional Model and the PAD Emotional Evaluation Model

3.1. PAD Emotional Evaluation Model

The PAD emotional evaluation model is a model based on emotional dimension theory. According to the evaluation method of semantic differences, Mehrabian and Russell defined the three dimensions of emotion as “P” for pleasure–displeasure, “A” for arousal–nonarousal, and “D” for dominance–submissiveness [2–4]. Among them, P represents the positive or negative degree of an individual’s emotional state; A represents the physiological activation level of an individual’s nerves, indicating that the individual is excited or depressed; and D represents the subjective control state, an active or passive state of the individual to the situation and others.

The PAD emotional evaluation model, as the most famous model in emotional dimension theory, has been widely used in emotional research. In this part of the study, we tested the applicability of the five-dimensional emotion model by comparing it with the PAD emotional model.

3.2. Contrast Experiment of Accuracy under the Same Conditions

For the PAD emotional model, we used the same sample and multilayer perceptron with the same input layer and hidden layers as the five-dimensional emotional model for training.

Lee et al. at the Institute of Psychology, Chinese Academy of Sciences, compiled the Chinese version of the PAD emotion scales and evaluated the PAD values of 14 basic emotions [25]. These included basic emotions proposed by different researchers, such as happiness, anger, relaxation, and fear. In the current research, we used the PAD values corresponding to the seven basic emotions in the KDEF database to analyze the correlation of the results. The PAD values of the seven emotions are shown in Table 7.

We analyzed the Pearson correlations between the PAD values of emotions and the five simple AUs and took the min-max normalized coefficients as the feature weights of the PAD emotional evaluation model. The coefficients obtained are shown in Table 8.

Based on these parameters, we transformed the simple AUs of the expression images of the KDEF database into the corresponding PAD values and then trained the multilayer perceptron with the data. After 6,000,000 iterations of training, we were able to compare the accuracies of the PAD emotional model and the five-dimensional emotion model. The final results are shown in Table 9.

Based on this result, we concluded that the five-dimensional emotion model was better than the PAD emotional evaluation model when applied to emotion recognition. The comparison results are shown graphically in Figure 2.

4. Comparison between the Five-Dimensional Emotional Model and the Motivational Dimensional Model

The PAD model proposes the existence of a dominance dimension in addition to the valence and arousal dimensions. Lee [25] evaluated the values of the basic emotions on the PAD dimension. Since the three dimensions of the PAD model are independent of each other, we can eliminate the individual dominance dimension and keep only the values of pleasure and arousal for the valence and arousal dimensions.

The motivational dimensional model of affect points to the existence of a motivational dimension in addition to the valence and arousal dimensions. Xia [26] designed experiments to assess the motivational dimension of basic emotions on a scale of 1–9. By normalizing these experimental results against Lee’s, we can map them to a range of −4 to 4. This gives a numerical representation of the basic emotions in the motivational dimensional model, which is shown in Table 10.
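Mapping ratings from the 1–9 scale onto the range of −4 to 4 can be sketched as a linear rescaling. That the mapping was linear is an assumption, and the function name is hypothetical.

```python
def rescale(value, src_min, src_max, dst_min, dst_max):
    """Map value linearly from [src_min, src_max] to [dst_min, dst_max]."""
    if src_max == src_min:
        raise ValueError("source range has zero width")
    fraction = (value - src_min) / (src_max - src_min)
    return dst_min + fraction * (dst_max - dst_min)


# A motivational rating of 7 on the 1-9 scale, mapped onto -4 to 4.
motivation = rescale(7, 1, 9, -4, 4)
```

For these two particular ranges the mapping reduces to subtracting 5, so the midpoint 5 of the rating scale lands on 0.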

We analyzed the Pearson correlations between the motivational dimension value and the five simple AUs and took the min-max normalized coefficients as the feature weights of the motivational dimensional model. The coefficients obtained are shown in Table 11.

Based on these parameters, we transformed the simple AUs of the expression images of the KDEF database into the corresponding motivational dimension value and then trained the multilayer perceptron with the data. After 2,742,366 iterations of training, we were able to compare the accuracy of the motivational dimensional model and the five-dimensional emotion model. The final results are shown in Table 12.

5. Application of the Five-Dimensional Emotion Model

Research on human emotion has a wide range of application prospects and potential economic value in man-machine interaction, electronic education, robotics, entertainment, the medical field, and other areas, so it has attracted a great deal of attention in academia and industry. In both arenas, a large number of related projects have been carried out. Companies involved in this research include IBM in the United States, Sony in Japan, Philips in Europe, and many others. China has gradually attached importance to this kind of research, which has been explored at the Beijing University of Science and Technology, the Institute of Automation of the Chinese Academy of Sciences, and other institutions. In short, research on the five-dimensional emotion model as applied to emotion recognition has far-reaching scientific significance and extensive application value. To this end, we took an adaptive man-machine interactive system based on the five-dimensional emotion model as an example to illustrate its practical application.

Man-machine interaction is a subject that studies the design, evaluation, and implementation of interactive computing systems for the convenience of human users. Adaptive interfaces are a mainstream topic in man-machine interaction research and application. With the development of artificial intelligence, emotion is becoming more and more important in adaptive interface research. By introducing the five-dimensional emotion model, a man-machine interactive interface can observe a user’s emotions and thus provide a friendlier user experience.

For a general user interface, color and background music are part of the core content. Based on this, we designed the style of our user interface to change when users were experiencing different emotions.

Before designing the interface, we needed to adapt the five-dimensional emotion model. Because people differ in facial features, behavioral habits, and involuntary changes of expression, judging a continuous emotion from a single frame of expression would lead to recognition errors. We therefore maintained a queue of length N in which each element was the final result of one emotion detection. Once the queue was full, each new result was enqueued and the oldest one was dequeued, so the queue always held the user’s N most recent detection results. Because emotion is a continuously changing process, analyzing these consecutive results gives more accurate results than relying on a single frame.
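The queue of recent detection results can be sketched with a bounded double-ended queue. The class name and the emotion labels are hypothetical, and the prototype itself was built with JavaScript; Python is used here only for illustration.

```python
from collections import deque


class EmotionWindow:
    """Keeps the N most recent emotion detection results."""

    def __init__(self, size):
        if size < 1:
            raise ValueError("size must be at least 1")
        # A deque with maxlen drops its oldest item when a new one is
        # appended to a full queue.
        self._results = deque(maxlen=size)

    def add(self, result):
        """Store the result of one emotion detection."""
        self._results.append(result)

    def is_full(self):
        """True once N results have been collected."""
        return len(self._results) == self._results.maxlen

    def results(self):
        """Results from oldest to newest."""
        return list(self._results)


window = EmotionWindow(size=3)
for label in ["happy", "happy", "surprised", "happy"]:
    window.add(label)
```

After the fourth detection the first result has been discarded, so the window reflects only the latest three frames.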

According to Zhang Congcong’s experiment [27], among classical orchestral music, music by Bach was rated significantly higher than the other music; his music in a minor key had significantly higher P and D values than his music in a major key; and the PAD values of his faster music were significantly higher than those of his slower music. In terms of color, the A value of longer-wavelength colors was higher than that of shorter-wavelength colors; for multicolor, the PAD values of saturated and bright colors were higher overall than those of soft and dark colors. There was a nonlinear relationship between the D value and brightness, but among black, white, and gray, the D value of black was significantly higher than that of white and gray.

Based on the results of that study, we captured the user’s expression through the user’s camera and then used the five-dimensional emotion model to detect the user’s continuous emotions and calculate the average value of each dimension. From this, we could determine which dimensions most needed to be strengthened. Finally, using color gradient animation, smooth music switching, and other JavaScript techniques, the interface displayed the style and background music that could raise the required dimension values and guide the user’s emotion in a positive direction.
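Averaging the dimension values over recent frames and choosing the dimension to strengthen can be sketched as follows. The rule of picking the dimension with the lowest average is an assumption, since the text does not state the selection rule; the function names and sample values are hypothetical.

```python
DIMENSIONS = ("intensity", "pleasure", "control", "certainty", "tension")


def mean_dimensions(frames):
    """Average each dimension over a list of per-frame dictionaries."""
    if not frames:
        raise ValueError("need at least one frame")
    return {
        name: sum(frame[name] for frame in frames) / len(frames)
        for name in DIMENSIONS
    }


def dimension_to_strengthen(averages):
    """Assumed rule: the dimension with the lowest average value."""
    return min(DIMENSIONS, key=lambda name: averages[name])


# Illustrative values on a 0-100 scale (not data from the study).
frames = [
    {"intensity": 60, "pleasure": 20, "control": 30, "certainty": 50, "tension": 40},
    {"intensity": 70, "pleasure": 30, "control": 30, "certainty": 50, "tension": 60},
]
averages = mean_dimensions(frames)
target = dimension_to_strengthen(averages)
```

The interface would then select the color scheme and music associated with raising the chosen dimension.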

For example, when the user was in a low-pleasure and low-control mood, we could use high-saturation blue as the main style and Bach’s fast, minor-key Badinerie (movement VII of the Orchestral Suite No. 2 in B Minor, BWV 1067) as the background music to improve the user’s pleasure and control. When the user was in a high-pleasure mood, we could use high-saturation red as the main style and Bach’s fast, major-key Allegro (movement I of the Brandenburg Concerto No. 2 in F, BWV 1047) as the background music to improve the user’s intensity and keep the user happy for a longer time.

6. Conclusion

Artificial intelligence has played a promising and productive role in many fields, but an AI that cannot distinguish emotion can only exist as a machine and misses much of the human experience. Therefore, we constructed the five-dimensional emotion model in this paper and explored its use in the field of emotion recognition. The highlights are as follows:

(i) We determined the five dimensions of the proposed emotion model as intensity, pleasure, control, certainty, and tension and simplified the AUs of the FACS. Based on the KDEF database, we determined the feature weights of the five dimensions corresponding to the simple AUs and used a multilayer perceptron to train our model, which achieved higher accuracy than recognition by human volunteers.

(ii) We compared the five-dimensional emotion model with the traditional PAD emotional evaluation model under the same conditions and demonstrated that our proposed model was better in terms of accuracy.

(iii) Finally, we discussed the practical application of the five-dimensional emotion model, using a prototype emotion-detecting interface as an example.

Data Availability

The data used in the study are available in Karolinska Directed Emotional Faces (KDEF): https://www.kdef.se/index.html.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.