Abstract

Emotion recognition with weighted features based on facial expression is a challenging research topic that has attracted great attention in the past few years. This paper presents a novel method that uses subregion recognition rates to weight the kernel function. First, we divide the facial expression image into uniform subregions and calculate the recognition rate and weight of each. Then, we derive a weighted feature Gaussian kernel function and construct a classifier based on Support Vector Machine (SVM). Experiments on the extended Cohn-Kanade (CK+) dataset suggest that the approach based on the weighted feature Gaussian kernel function achieves a good correct rate in emotion recognition and encouraging results compared to state-of-the-art methods.

1. Introduction

Emotion recognition has important applications in the real world, including but not limited to artificial intelligence and human-computer interaction. It remains a challenging and attractive topic, and many methods have been proposed for handling its problems. Speech [1, 2], physiological [3–5], and visual signals have been explored for emotion recognition. Speech signals are discontinuous, since they can be captured only when people are talking. Acquiring physiological signals requires special physiological sensors. For these reasons, the visual signal is the best choice for emotion recognition. Although the visual information provided is useful, there are challenges regarding how to utilize this information reliably and robustly. According to Albert Mehrabian’s 7%–38%–55% rule, facial expression is an important means of detecting emotions [6].

Further studies have been carried out on emotion recognition in facial expression images during the last decade [7, 8]. The task is, given a facial expression image, to estimate the correct emotional state, such as anger, happiness, sadness, or surprise. The general process has two steps: feature extraction and classification. For feature extraction, geometric, texture, motion, and statistical features are in common use. For classification, methods based on machine learning algorithms are frequently used. Because different features have different characteristics, applying weighted features to machine learning algorithms has become an active research topic.

In recent years, emotion recognition with weighted features based on facial expression has become a new research topic and has received more and more attention [9, 10]. The aim is to estimate the emotion type from a facial expression image captured while a subject is physically forming the expression. However, the emotion features captured from the facial expression image are strongly linked not to the whole face but to some specific regions of it. For instance, features of the eyebrow, eye, nose, and mouth areas are closely related to facial expression [11]. Besides, each feature has a different effect on the recognition result. Feature weighting makes better use of the features and can further enhance recognition performance. While there are several approaches to determining weights, how to select features and calculate the corresponding weights effectively remains an open issue.

In this paper, a new emotion recognition method based on weighted facial expression features is presented. It is motivated by the fact that emotion can be described by facial expression and that each facial expression feature has a different impact on recognition results. Unlike previous works, which calculate the weight of each feature directly, this method measures the impact of features by calculating subrecognition rates. Our method consists of two stages: a weight calculation stage and a recognition stage. In the weight calculation stage, we first divide the face into 4 areas according to the degree of facial behavior changes. Then, we use each area’s features to calculate the corresponding recognition rate. Finally, we calculate the weight of each area’s features according to the magnitude of its recognition rate. In the recognition stage, we first use these weights to calculate the weighted kernel function. Then, we obtain a new recognition model based on SVM with the weighted kernel function.

For the proposed method, there are three main contributions and differences compared to the preliminary work. (1) A more advanced feature weighting method is used. In the previous method, the weight of each feature was calculated individually without practical verification. To overcome this shortcoming, we group features and calculate the corresponding subrecognition rates. Then we calculate the weights of the feature groups based on their respective subrecognition rates. (2) In the recognition stage, the previous method used the weights of features directly. In this paper, we use the weights of feature groups to weight the kernel function. Then we use the new weighted kernel function in the machine learning model. (3) The proposed method has been evaluated on a database which contains 7 kinds of emotions. Moreover, results with and without the weighted kernel function have been carefully compared and analyzed. The rest of the paper is organized as follows: Section 2 gives an overview of related works on feature extraction of facial expression, calculation of feature weights, and classification of emotion. Section 3 describes the theorems behind the proposed method and their proofs. Section 4 verifies the proposed method by experiment and analyzes the experimental results. Section 5 concludes the paper.

2. Related Work

The recognition performance of emotion recognition methods is highly dependent on the feature extraction method. Many novel approaches have been proposed for feature extraction based on facial expression. They can be broadly classified into two categories: appearance-based methods and geometric-based methods. The appearance-based methods extract intensity or other texture features from facial expression images. Common feature extraction methods include Local Binary Patterns (LBP) [12, 13], Histogram of Oriented Gradient (HOG) [14, 15], Gabor Wavelet [16, 17], and Scale-Invariant Feature Transform (SIFT) [18, 19]. These features can be used to extract Action Unit (AU) features and recognize facial expression. The geometric-based methods describe the shapes of facial components based on key points detected on face images, such as those of the eyebrows, eyes, nose, mouth, and contour line. The movement of these key points can be used to guide the facial expression recognition process. For instance, the Active Appearance Model (AAM) [20], the Active Shape Model (ASM) [21, 22], and the Constrained Local Model (CLM) [23] are widely used to detect and track these key points and record their displacement. However, the location accuracy of both ASM and AAM relies on their geometric face models, and the model training phases sometimes need manual work and are usually time-consuming.

The recognition results obtained by a classification algorithm are affected by all features, so introducing weights can distinguish the contributions of different features and improve classification performance. A variety of methods have been proposed to calculate the weight of every feature. Reference [24] extended the Euclidean metric in the criterion to the Minkowski metric in order to calculate the weight of each feature directly. Some methods divide the facial image into uniform subregions and calculate the weight of each subregion. Reference [25] introduced information entropy to distinguish the contributions of different partitions of the face. Reference [26] estimated the weight of each subregion by employing the local variance. Whichever way the features are weighted, feature selection and weight calculation remain a latent problem. One effective way to address this problem is to perform feature weighting based on obtained feedback. Some methods [27, 28] divide the facial image into uniform subregions and feed the subregion results back for feature weighting. There is no restriction on each feature, which provides freedom in how the feature representations are structured.

Many machine learning methods have been proposed to classify facial expressions, such as SVM [29], Random Forest (RF) [30], Neural Network (NN) [31], and k-nearest neighbor (kNN) [32]. Reference [33] presented the performance of RF and SVM in classification of facial recognition. Reference [34] used a boosting technique to construct neural network ensembles (NNEs), with the final prediction made by a Naive Bayes (NB) classifier. Reference [35] divided the region into different types and combined the characteristics of the Fuzzy Support Vector Machine (FSVM) with NN, switching the classification method according to the type. These studies show that such methods are well suited to facial expression classification.

3. Support Vector Machine

3.1. Linear Support Vector Machines

SVM is a supervised learning model, with an associated learning algorithm, for data classification problems. Its ultimate aim is to find the optimal separating hyperplane. The mathematical model of SVM is shown below.

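Given a training set $T = \{(x_1, y_1), \ldots, (x_N, y_N)\}$, where $x_i \in \mathbb{R}^n$ is the input and $y_i \in \{-1, +1\}$ is the corresponding output, if there is a hyperplane $\omega \cdot x + b = 0$ which can divide all the points into two groups correctly, we aim to find the “maximum-margin hyperplane” where the distance between the hyperplane and the nearest point from either group is maximized. By introducing the penalty parameter $C > 0$ and the slack variables $\xi_i \ge 0$, the optimal hyperplane can be obtained by solving the following constrained optimization problem:

$$\min_{\omega, b, \xi} \; \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.} \quad y_i(\omega \cdot x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \ldots, N. \tag{1}$$

Based on the Lagrangian multiplier method, the problem is converted into the following dual problem:

$$\max_{\alpha} \; \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j (x_i \cdot x_j) \quad \text{s.t.} \quad \sum_{i=1}^{N}\alpha_i y_i = 0, \;\; 0 \le \alpha_i \le C, \;\; i = 1, \ldots, N, \tag{2}$$

where $\alpha_i$ are the Lagrange multipliers of the samples $x_i$. In the solution $\alpha^{*}$, only a few $\alpha_i^{*}$ are nonzero; after removing the terms with $\alpha_i^{*} = 0$, we get the classification decision function

$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{N}\alpha_i^{*} y_i (x_i \cdot x) + b^{*}\right). \tag{3}$$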

3.2. Nonlinear Support Vector Machines

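For the linearly nonseparable problem, we first map the data to some other high-dimensional space $H$, using a nonlinear mapping which we call $\phi$. Then we use a linear model to achieve classification in the new space $H$. With the “kernel function” defined as $k(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$, (2) is converted as follows:

$$\max_{\alpha} \; \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j k(x_i, x_j) \quad \text{s.t.} \quad \sum_{i=1}^{N}\alpha_i y_i = 0, \;\; 0 \le \alpha_i \le C, \;\; i = 1, \ldots, N. \tag{4}$$

And the corresponding classification decision function is converted as follows:

$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{N}\alpha_i^{*} y_i k(x_i, x) + b^{*}\right). \tag{5}$$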

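The kernel function is selected to take the place of the inner product of basis functions. The ordinary kernel functions investigated for linearly nonseparable problems are as follows:

(1) $h$-degree polynomial kernel function:
$$k(x_i, x_j) = (x_i \cdot x_j + 1)^{h}. \tag{6}$$

(2) (Gaussian) radial basis kernel function:
$$k(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right). \tag{7}$$

(3) Sigmoid kernel function:
$$k(x_i, x_j) = \tanh(\kappa\, x_i \cdot x_j - \delta). \tag{8}$$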
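The three kernel functions listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the experiments: the function names are hypothetical, and the parameterization (offset 1 in the polynomial kernel, bandwidth `sigma`, and `kappa`/`delta` in the sigmoid kernel) is one common textbook convention assumed here.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, h=2):
    """h-degree polynomial kernel: (u . v + 1)^h (assumed convention)."""
    return (dot(u, v) + 1.0) ** h


def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian radial basis kernel: exp(-||u - v||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigmoid_kernel(u, v, kappa=1.0, delta=0.0):
    """Sigmoid kernel: tanh(kappa * u . v - delta) (assumed convention)."""
    return math.tanh(kappa * dot(u, v) - delta)
```

Each function takes two feature vectors and returns a scalar similarity, which is the only interface an SVM solver needs from a kernel.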

3.3. Weighted Feature SVM

Weighted feature SVM is based on weighted kernel function of SVM, which is defined as Definition 1.

Definition 1. Let $k$ be a kernel function defined on $X \times X$, $X \subseteq \mathbb{R}^n$. $P$ is a linear transformation square matrix of order $n$ on the given input space, where $n$ is the dimensionality of the input space. The weighted feature kernel function is defined as

$$k_P(x_i, x_j) = k(x_i^{T} P, \; x_j^{T} P), \tag{9}$$

where $P$ is referred to as the weighted feature matrix. Different choices of $P$ lead to different weighting situations:

(1) $P$ is the identity matrix of order $n$, which is the no-weight situation.

(2) $P$ is a diagonal matrix of order $n$, $P = \operatorname{diag}(w_1, \ldots, w_n)$, where $w_k$ is the weight of the $k$th feature and not all $w_k$ are equal.

(3) $P$ is an arbitrary matrix of order $n$, which is the full-weight situation.

In this paper, we only consider the case where $P$ is a diagonal matrix of order $n$.

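Definition 2. The ordinary weighted feature kernel functions can be obtained from (9) as follows:

(1) Weighted feature polynomial kernel function:
$$k_P(x_i, x_j) = (x_i^{T} P P^{T} x_j + 1)^{h}. \tag{12}$$

(2) Weighted feature (Gaussian) radial basis kernel function:
$$k_P(x_i, x_j) = \exp\left(-\frac{\|x_i^{T} P - x_j^{T} P\|^2}{2\sigma^2}\right) = \exp\left(-\frac{(x_i - x_j)^{T} P P^{T} (x_i - x_j)}{2\sigma^2}\right). \tag{13}$$

(3) Weighted feature sigmoid kernel function:
$$k_P(x_i, x_j) = \tanh(\kappa\, x_i^{T} P P^{T} x_j - \delta). \tag{14}$$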
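For the diagonal case considered in this paper, the weighted feature kernels reduce to ordinary kernels evaluated on feature-wise rescaled vectors. The sketch below illustrates this under stated assumptions: the diagonal of the weighted feature matrix is passed as a plain list `weights`, the function names are hypothetical, and the kernel parameterization (offset 1, `sigma`, `kappa`, `delta`) is an assumed textbook convention.

```python
import math


def apply_weights(x, weights):
    """Compute x^T P for a diagonal matrix P = diag(weights)."""
    if len(x) != len(weights):
        raise ValueError("x and weights must have the same length")
    return [w * a for w, a in zip(weights, x)]


def weighted_polynomial_kernel(u, v, weights, h=2):
    """Polynomial kernel evaluated on the rescaled vectors."""
    pu, pv = apply_weights(u, weights), apply_weights(v, weights)
    return (sum(a * b for a, b in zip(pu, pv)) + 1.0) ** h


def weighted_gaussian_kernel(u, v, weights, sigma=1.0):
    """Gaussian kernel evaluated on the rescaled vectors."""
    pu, pv = apply_weights(u, weights), apply_weights(v, weights)
    sq_dist = sum((a - b) ** 2 for a, b in zip(pu, pv))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def weighted_sigmoid_kernel(u, v, weights, kappa=1.0, delta=0.0):
    """Sigmoid kernel evaluated on the rescaled vectors."""
    pu, pv = apply_weights(u, weights), apply_weights(v, weights)
    return math.tanh(kappa * sum(a * b for a, b in zip(pu, pv)) - delta)
```

With all weights equal to 1 the functions coincide with the unweighted kernels (the no-weight situation), and a weight of 0 removes the corresponding feature from the kernel value.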

The motivation for introducing a kernel function is to search for a nonlinear model in the new feature space obtained through a nonlinear mapping. The matrix $P$ appears to be unrelated to this motivation, since it acts as a linear mapping. However, it can be useful in practice, because it changes the geometric shape of the input space and the feature space, thereby changing the weight of different functions in the feature space. Moreover, the weighted feature Gaussian kernel function still yields a nonlinear model after the linear transformation. This conclusion can be proved by Theorem 3.

Theorem 3. Let be a kernel function defined in , . is a mapping from input space to feature space. is a linear transformation square matrix and . Then it deduces .

Proof. , acts as any of the three ordinary kernel functions in Definition 1; then it deduces

Theorem 4. When $w_k = 0$, the $k$th feature of the sample data is irrelevant to the calculation of the kernel function and to the output of the classifier. Furthermore, the smaller the value of $w_k$, the smaller the effect of the $k$th feature on the calculation of the kernel function and on the output of the classifier.

Proof. From definition of weighted feature kernel function and classification decision function (5), the conclusion is straightforward.
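To make the argument concrete, the following worked step expands the weighted feature Gaussian kernel for the diagonal case. It is a sketch under assumed notation: the weighted feature matrix is $P = \operatorname{diag}(w_1, \ldots, w_n)$, $x_{ik}$ denotes the $k$th feature of sample $x_i$, and the Gaussian kernel is parameterized by a bandwidth $\sigma$.

```latex
\begin{align*}
k_P(x_i, x_j)
  &= \exp\left(-\frac{\|x_i^{T} P - x_j^{T} P\|^2}{2\sigma^2}\right) \\
  &= \exp\left(-\frac{1}{2\sigma^2}\sum_{k=1}^{n} w_k^{2}\,(x_{ik} - x_{jk})^{2}\right) \\
  &= \prod_{k=1}^{n} \exp\left(-\frac{w_k^{2}\,(x_{ik} - x_{jk})^{2}}{2\sigma^2}\right).
\end{align*}
```

If $w_k = 0$, the $k$th factor equals 1 whatever the values of $x_{ik}$ and $x_{jk}$, so the feature drops out of every kernel evaluation and hence out of the decision function. As $|w_k|$ shrinks, the $k$th factor moves toward 1 and the feature contributes less.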

Theorem 3 indicates that the linear transformation changes the positional relations between points and therefore the geometric shape of the feature space. There may be a better linear separating hyperplane in the new feature space, which would improve the classification performance of SVM. Theorem 4 indicates that the weighted kernel function can reduce the effects of weakly correlated and uncorrelated features, so better classification results can be expected. The experimental results in the following section support this conclusion.

3.4. Weight Estimation of Features

A feature weighting technique assigns a weight to each data feature according to a certain principle, and calculating the weights is the key element. Changes in facial expression cause slightly different instantaneous changes in individual facial muscles and thus in facial appearance. According to the motion range of the facial muscles, the whole face can be divided into three kinds of regions: rigid region (nose), semirigid region (eyes, forehead, and cheek), and nonrigid region (mouth). According to these principles, we divide the face into several areas and find the recognition rate of each area: the higher the recognition rate, the greater the influence of the area, and the lower the recognition rate, the smaller the influence. Taking the recognition rate as the basis for determining the weight, the calculation formula is as follows:

$$w_i = \frac{p_i}{\sum_{j=1}^{m} p_j}, \quad i = 1, \ldots, m, \tag{16}$$

where $p_i$ is the recognition rate of the $i$th area and $m$ is the number of areas. This approach makes $\sum_{i=1}^{m} w_i = 1$.
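The weight calculation can be sketched as follows. The sketch assumes that each area's weight is its recognition rate divided by the sum of all areas' recognition rates, so that the weights sum to one; this normalization is an assumption about the exact formula. The function name `area_weights` is hypothetical, and the example rates are made-up numbers for illustration only, not results from the experiments.

```python
def area_weights(recognition_rates):
    """Normalize per-area recognition rates so that the weights sum to 1.

    Assumption: weight_i = rate_i / sum of all rates.
    """
    total = sum(recognition_rates.values())
    if total <= 0:
        raise ValueError("at least one recognition rate must be positive")
    return {area: rate / total for area, rate in recognition_rates.items()}


# Hypothetical recognition rates, chosen only to illustrate the arithmetic.
example_rates = {"eyebrows": 0.50, "eyes": 0.50, "nose": 0.25, "mouth": 0.75}
example_weights = area_weights(example_rates)
```

An area with a higher recognition rate receives a proportionally larger weight, which matches the principle that more discriminative areas should have more influence on the kernel.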

The area with the highest weight is the most discriminative area of the face, and it is also the largest contributor to the classification results. Therefore, when the weight is used as a correlation measurement index, a higher weight indicates a stronger correlation. The four steps for constructing the weighted feature SVM are as follows:

(1) Collect the original facial expression image dataset and extract the feature dataset $D = \{(x_i, y_i)\}$, where $x_i$ is the feature vector of a facial expression, $x_i^{(j)}$ is the feature vector of the $j$th region of the face, and $y_i$ is the corresponding class label of the facial expression.

(2) Calculate the recognition rate and the corresponding weight of each area. Construct the feature weight vector $w = (w_1, \ldots, w_n)$ and the linear transformation square matrix $P$, where $P = \operatorname{diag}(w_1, \ldots, w_n)$.

(3) Replace the standard kernel function formulas with the weighted feature kernel function formulas (12)–(14), and construct a classifier based on the sample dataset $D$.

(4) Evaluate the performance of the resulting classifier.
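Step (2) of the construction can be sketched as below. The sketch assumes that every feature belonging to an area receives that area's weight, so the diagonal of the weighted feature matrix is the area weights repeated according to each area's dimensionality (32, 32, 20, and 34, as described in Section 4.2). The names `AREA_DIMS`, `diagonal_of_p`, and `transform` are hypothetical, and the weights shown are illustrative values rather than experimental ones.

```python
# Area order and dimensionality follow Section 4.2 (118 features in total).
AREA_DIMS = [("eyebrows", 32), ("eyes", 32), ("nose", 20), ("mouth", 34)]


def diagonal_of_p(area_weight):
    """Expand per-area weights into the diagonal of the weighted feature matrix."""
    diagonal = []
    for area, dim in AREA_DIMS:
        diagonal.extend([area_weight[area]] * dim)
    return diagonal


def transform(x, diagonal):
    """Compute x^T P for the diagonal matrix P, i.e. rescale each feature."""
    if len(x) != len(diagonal):
        raise ValueError("feature vector and diagonal must have the same length")
    return [w * a for w, a in zip(diagonal, x)]


# Illustrative weights only; real values come from the measured recognition rates.
demo_weights = {"eyebrows": 0.25, "eyes": 0.25, "nose": 0.125, "mouth": 0.375}
demo_diagonal = diagonal_of_p(demo_weights)
```

Because the matrix is diagonal, training a standard Gaussian-kernel SVM on the transformed vectors evaluates exactly the weighted feature Gaussian kernel on the original vectors, so step (3) does not necessarily require a custom kernel implementation.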

4. Experiment

The experiments on the extended Cohn-Kanade (CK+) dataset show the effectiveness of the proposed method. In our experiments, we use Python programs based on the LIBSVM software package, and the data processing platform is a computer with Windows 7, an Intel® Core™ i3-2120 CPU (3.30 GHz), and 4.00 GB of RAM.

4.1. Extended Cohn-Kanade Dataset

Lucey et al. [36] presented the CK+ dataset containing 593 sequences from 123 subjects. Each sequence incorporates images from onset (neutral frame) to peak expression (last frame). However, only 327 of the 593 sequences were found to meet the criteria for one of seven discrete emotions. The 327 peak frames of these sequences have been selected and labeled, and together they compose the original facial expression image dataset. The detailed number of images of each discrete emotion is shown in Table 1.

4.2. Facial Feature Extraction

In this paper, we use the facial key points of each image as feature points for emotion recognition based on facial expression. Each feature point is expressed as a 2-dimensional coordinate. The images in the dataset do not all share the same resolution, so, in order to unify the coordinate system, image preprocessing is used to convert every image to a common resolution. Reference [11] proposed that the production of emotion brings about facial behavior changes and is strongly linked not to the whole face but to some specific areas, such as the eyebrows, eyes, mouth, nose, and tissue textures. Besides, a face has different rigidness in different areas. According to these principles, this paper divides the face into 4 areas, which are shown in Figure 1, and the corresponding feature vectors are listed as follows.

(1) Eyebrows. Select 8 key points from each eyebrow; the 2-dimensional coordinates of these 16 points together form a 32-dimensional feature vector $x^{(1)}$.

(2) Eyes. Select 8 key points from each eye; the 2-dimensional coordinates of these 16 points together form a 32-dimensional feature vector $x^{(2)}$.

(3) Nose. Select 10 key points from the nose; their 2-dimensional coordinates together form a 20-dimensional feature vector $x^{(3)}$.

(4) Mouth. Select 17 key points from the mouth; their 2-dimensional coordinates together form a 34-dimensional feature vector $x^{(4)}$.

In total, we select 59 key points from the eyebrows, eyes, nose, and mouth. Therefore, a 118-dimensional facial feature vector $x$ can be obtained from each frame, where $x = (x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)})$.
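The assembly of the facial feature vector from the four areas can be sketched as follows. The point counts per area come from the description above; the function names, the dictionary layout of the input, and the interleaved coordinate order (first coordinate, then second coordinate, point by point) are assumptions made for illustration.

```python
# Key points per area: 8 per eyebrow, 8 per eye, 10 for the nose, 17 for the mouth.
EXPECTED_POINTS = {"eyebrows": 16, "eyes": 16, "nose": 10, "mouth": 17}
AREA_ORDER = ["eyebrows", "eyes", "nose", "mouth"]


def area_vector(points):
    """Flatten a list of 2-D key points into one coordinate list."""
    return [coord for point in points for coord in point]


def face_vector(keypoints_by_area):
    """Concatenate the four area vectors into one 118-dimensional vector."""
    vector = []
    for area in AREA_ORDER:
        points = keypoints_by_area[area]
        if len(points) != EXPECTED_POINTS[area]:
            raise ValueError(f"{area}: expected {EXPECTED_POINTS[area]} key points")
        vector.extend(area_vector(points))
    return vector


# Dummy key points, used only to check the resulting dimensionality.
dummy = {area: [(0.0, 0.0)] * count for area, count in EXPECTED_POINTS.items()}
dummy_vector = face_vector(dummy)
```

Keeping a fixed area order matters, because the per-area weights are later applied to contiguous blocks of this vector.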

4.3. Experiment Contrast with Different Feature

The sample set contains 327 feature vectors of facial images of seven discrete emotions. We use stratified sampling to obtain the training set and the test set. First, we partition the sample set into 7 disjoint strata, one per emotion. Then, we select a fixed number of feature vectors from each stratum independently and randomly. The number is determined by the smallest of the 7 facial expression sample sets: it is 70% of the size of the contempt sample set in this article. Finally, all the selected feature vectors together compose the training set, while the remaining feature vectors compose the test set. The detailed number of feature vectors of each emotion is shown in Table 1.
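The sampling procedure described above can be sketched with the Python standard library. The function name `stratified_split`, the `seed` argument, and the list-based data layout are assumptions for illustration; `per_class` stands for the fixed number of feature vectors drawn from each emotion, which the text sets to 70% of the size of the contempt sample set.

```python
import random
from collections import defaultdict


def stratified_split(samples, labels, per_class, seed=0):
    """Draw per_class random samples from every class for training.

    All samples that are not drawn form the test set.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train_indices = set()
    for label, indices in by_label.items():
        if per_class > len(indices):
            raise ValueError(f"class {label!r} has fewer than {per_class} samples")
        train_indices.update(rng.sample(indices, per_class))

    train = [(samples[i], labels[i]) for i in sorted(train_indices)]
    test = [(samples[i], labels[i]) for i in range(len(samples))
            if i not in train_indices]
    return train, test
```

Drawing the same number of samples from every class keeps the training set balanced, at the cost of leaving the test set dominated by the larger classes.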

For each facial area, we select the corresponding component of every feature vector to compose an area-specific training set and test set. Thus we run four experiments, one for each facial area's features. The detailed recognition accuracy of each facial area feature is shown in Table 2. The results for the four feature areas show that the three types of region influence recognition differently: the nonrigid region has the biggest impact, the rigid region has the least, and the semirigid region has a moderate impact.

4.4. Experiment Contrast with Different Kernel Function

We use the previous experiment results and (10) and (16) to obtain the weight of each area and corresponding linear transformation square matrix as follows:

The standard Gaussian kernel function and the weighted feature Gaussian kernel function are obtained from (7) and (13), respectively, for the 118-dimensional facial feature vector.

Thus we run two experiments on the same training set and test set, one with each kernel function. The number of correctly recognized facial expressions under the two kernel functions is shown in Table 3.

Finally, we compare the results obtained with the two kernel functions, both of which use an image-based framework and are tested on the CK+ dataset. The average precision of WF-SVM, which uses the weighted feature Gaussian kernel function, is 93%, which is higher than the 83% achieved by the SVM that uses the standard Gaussian kernel function, as shown in Table 3. The recognition rate is also better than that of the previous method for the seven emotions. These results confirm the effectiveness of our method. We attribute the improvement to the robustness of the machine learning algorithm: weighting the features reduces the influence of weakly correlated features and thus improves the robustness of the algorithm.

5. Conclusion and Future Work

In this paper, we propose an approach to emotion recognition based on facial expression. Because each feature has a different effect on the recognition result, our approach uses a feature weighting technique. Unlike previous works, which calculate the weight of each feature directly, we divide the facial expression images into uniform subregions and calculate the weight of each subregion's features based on its subrecognition rate. The experimental results suggest that the approach based on the weighted feature Gaussian kernel function achieves a good correct rate in emotion recognition. However, this performance has been shown only on a dataset with limited head motion. Emotion recognition based on facial expression remains full of challenges for future work.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 61573066).