Micro-expressions are unconscious, faint, short-lived expressions that appear on the face. They can make the understanding of people's psychological states and emotions more accurate. Therefore, micro-expression recognition is particularly important in psychotherapy and clinical diagnosis and has been widely studied by researchers over the past decades. In practical applications, the micro-expression samples used for training and testing often come from different databases, which makes the feature distributions of the training and testing samples differ to a large extent and causes a drastic decrease in the performance of traditional micro-expression recognition methods. Moreover, most existing cross-database micro-expression recognition methods require extensive model selection or hyperparameter tuning to pick the best result, which consumes a large amount of time and labor. In this paper, we overcome this problem by exploiting the intradomain structure: nonparametric transfer features are learned through intradomain alignment, while a classifier is learned through intradomain programming. To evaluate the performance, extensive cross-database experiments were conducted on the CASMEII and SMIC databases. The comparison of results shows that this method achieves promising recognition accuracy with high computational efficiency.

1. Introduction

Micro-expression is a spontaneous expression associated with self-defense mechanisms that occurs when a person attempts to conceal an internal emotion; it can neither be faked nor suppressed [1]. Such a spontaneous facial expression was first identified by Haggard and Isaacs [2] in 1966. In 1969, Ekman and Friesen [3], during repeated viewing of a filmed conversation, found that a depressed patient who smiled frequently showed occasional frames of very painful expressions. Researchers call these quick, unconscious, spontaneous facial movements that people make when they experience strong emotions "micro-expressions." Micro-expressions therefore have great potential value for emotion recognition tasks, e.g., clinical diagnosis [4], marital relationship prediction [5], communication negotiation [6], and teaching assessment [7, 8].

It has been found that a micro-expression is a unique and common tiny facial movement that lasts for a very short time, usually only 1/25 to 1/5 of a second [9]. Since it is almost impossible to detect and recognize with the naked eye, it is necessary to recognize micro-expressions automatically by computer. Although the micro-expression recognition experiments conducted by researchers so far have achieved some results, the micro-expression samples for training and testing come from the same database. In many practical applications, in fact, the micro-expression samples for training (source) and testing (target) come from different databases. Due to differences in race, gender, age, camera equipment, and recording environment between the two databases, the consistency of feature distributions assumed by traditional micro-expression recognition is seriously damaged, and most micro-expression recognition methods therefore perform unsatisfactorily. Cross-database micro-expression recognition has consequently attracted wide attention, and many effective methods have been proposed. The selective transfer machine (STM) [10, 11] uses the target samples to learn a set of weights for the source samples and selects the most appropriate hyperparameter values, so that the weighted source samples have the same or a similar feature distribution as the target samples. In the work [12], a dictionary is jointly learned from source and target domain samples, balancing trade-off parameters to optimize the model so that the source and target domains have the same or a similar feature distribution. Zheng and Zhou [13, 14] proposed a transformed subspace learning framework to handle cross-pose and cross-database cases in facial expression recognition and selected the optimal solution by adjusting parameters.
To address the challenge of cross-database micro-expression recognition (CDMER), Zong et al. [15] first proposed the use of domain adaptation methods and introduced new techniques such as the domain regeneration framework and the target sample regenerator. Their experiments yielded good results but still required tuning of individual parameters. Li et al. proposed target-adapted least-squares regression (TALSR) to learn the regression coefficient matrix from the source domain data and optimize TALSR to fit the target micro-expression database by weighing trade-off parameters [16]. Although some existing methods are effective for cross-database micro-expression recognition, it is worth noting that most of them require parameter tuning as well as model selection. Some adaptive adjusting algorithms can be used for parameter tuning and model selection, such as recursive algorithms [17–25] and iterative algorithms [26–32]. However, because the best model and its optimal hyperparameters cannot be determined in advance, grid search or cross-validation usually consumes a great deal of labor. In addition, such selection is somewhat subjective and often performs markedly differently on different datasets, so applying it to real situations remains a challenge. Therefore, finding a more efficient way to carry out this work is particularly important.

In this paper, we propose the Intradomain Structure Domain Adaptation (IDSDA) method to solve this problem. Nonparametric transfer features are learned by aligning the source and target domain subspaces, and then, a classifier is learned through intradomain programming to predict the sample labels of the target domain, without model selection and hyperparameter tuning. For one thing, this method saves time and labor costs. For another, it improves the recognition accuracy and practicality of CDMER.

In general, this paper makes the following main contributions: (1) IDSDA is proposed to carry out cross-database micro-expression recognition. IDSDA learns nonparametric transfer features through intradomain alignment and a classifier through intradomain programming, which is simple to operate and enhances practicability. (2) Unlike current popular CDMER methods, this method does not require model selection or hyperparameter tuning, saving labor cost and greatly shortening experiment time. (3) We have conducted extensive CDMER experiments, and the results prove that our method has advantages in micro-expression recognition and improves recognition accuracy.

The rest of this paper is arranged as follows. Section 2 reviews work on unsupervised domain adaptation and state-of-the-art CDMER. In Section 3, we describe the Intradomain Structure Domain Adaptation (IDSDA) method for CDMER in detail. To evaluate this method, extensive experiments and analyses on the SMIC and CASMEII databases are presented in Section 4. Finally, Section 5 draws conclusions and outlines future work.

In addition, a large number of symbols are used in this paper. For clarity, we show the frequently used notations and corresponding descriptions in Table 1.

2. Related Work

Domain adaptation is a form of transfer learning: a model acquired in an old (source) domain is applied to a new (target) domain. Since the source domain has sufficient data, all of it is usually used to fully learn the internal structure of the data. The essence of domain adaptation is therefore how to effectively reduce the distribution difference between the source and target domains. Jing et al. [33] proposed Adaptive Component Embedding (ACE) for resolving large domain discrepancies. In the work of [34], Maximum Density Divergence was proposed to minimize the interdomain divergence and maximize the intradomain density. In [35], a Heterogeneous Domain Adaptation (HDA) method was proposed, which optimizes both feature discrepancy and distribution discrepancy in a unified objective function. In addition, Li et al. [36] proposed Locality Preserving Joint Transfer, which considers knowledge transfer at both the feature and sample levels; the distribution divergence between the two domains is thereby reduced while the neighborhood relationships of the samples are preserved, making the method robust to outliers. Similar to domain-adaptive classification, cross-database micro-expression recognition is usually divided into unsupervised and semisupervised categories according to whether the target domain has label information. The former can only use the source domain's label information for training, while the latter can combine some existing label information of the target domain with the known sample information of the source domain, so the latter has more information available. The former clearly has more practical applications, so this paper focuses on unsupervised cross-database MER.

For unsupervised cross-database MER, although methods such as domain regeneration in the original label space (DRLS) and domain regeneration in the original feature space with unchanged target domain (DRFS-T) by Zong et al. [15] have achieved good results, they all require hyperparameter tuning. Take the DRFS-T method as an example. First, a regenerator is learned that can regenerate the source and target domain samples. Then, a classifier (SVM) is trained on the source domain data and used to predict the classes of the target domain samples. The constraints of the regenerator are as follows.

First, the regenerated target domain samples must remain unchanged in the feature space; that is, the regenerator maps each target domain sample to itself.

Here, X^t and X^s denote the target domain and source domain micro-expression samples, n_t and n_s are the numbers of target and source domain samples, and d is the dimension of the feature vector.

Second, the regenerated source domain and target domain samples must have the same (or as similar as possible) feature distribution.

To regulate the balance of the two terms in the objective function of DRFS-T, a trade-off parameter is introduced, together with a regularization term.

Then, Zong et al. [15] exploited the properties of the reproducing kernel Hilbert space and its MMD distance to obtain the final optimization problem, whose unknowns are column vectors of dimensions n_s and n_t, respectively. The kernel matrices of the source and target domain samples appear in this objective, and an L1-norm constraint on the coefficient matrix is controlled by a sparsity trade-off parameter.

To sum up, DRFS-T involves two hyperparameters, and picking their optimal values requires a huge time cost. IDSDA solves this problem well: it mainly uses a subspace method for intradomain alignment followed by nonparametric intradomain programming, thereby improving the recognition accuracy of CDMER.

3. IDSDA for Cross-Database Micro-Expression Recognition

3.1. Problem Definition

Domain adaptation is specifically defined as follows: given a labeled source domain D_s = {(x_i^s, y_i^s)}_{i=1}^{n_s} and an unlabeled target domain D_t = {x_j^t}_{j=1}^{n_t}, the x_i^s and x_j^t are the samples in the source and target domains, respectively, while n_s and n_t are the numbers of their samples. It is assumed that the feature space, label space, and conditional distribution are the same, i.e., X_s = X_t, Y_s = Y_t, and P_s(y|x) = P_t(y|x), but the marginal distributions differ, P_s(x) ≠ P_t(x). The core idea is to learn a classifier using the source domain, which has sufficient information, to predict the labels of the target domain.

As shown in Figure 1, Figure 1(a) shows the different covariances of the samples in the source and target domains. Figure 1(b) shows that, while the target domain remains unchanged, the feature correlations of the source domain are removed. Figure 1(c) recolors the whitened source domain with the correlations of the target domain so that the source and target distributions are aligned. Figures 1(a)–1(c) thus perform source and target domain distribution alignment. Figure 1(d) depicts learning a classifier on the source domain data by intradomain programming using the intradomain structure. In the next section, we describe each step in turn.

3.2. Intradomain Alignment

IDSDA learns nonparametric transfer features through intradomain data alignment, mainly by transforming the statistical features of the data so that they align, thereby reducing the difference between the training and testing domains. CORrelation ALignment (CORAL) [37] is one of the better subspace alignment methods: it is computationally efficient and introduces no additional parameters to adjust. Inspired by this method, the intradomain alignment of IDSDA first computes the regularized covariance matrices C_s = cov(X^s) + I_s and C_t = cov(X^t) + I_t, where I_s and I_t are identity matrices of the same size as C_s and C_t, and then multiplies the source features by C_s^{-1/2}. We can regard this step as whitening the source domain; that is, the feature correlations are removed from the source domain.

To further reduce the interdomain differences, the whitened source domain features are next recolored with the target domain statistics, i.e., multiplied by C_t^{1/2} [38]. Meanwhile, the target domain features are kept unchanged.
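As a concrete illustration, the whitening-and-recoloring step can be sketched in NumPy/SciPy. This is a minimal sketch of CORAL-style alignment; the names coral_align, Xs, Xt, and the small regularizer eps are our own and not from the paper:

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def coral_align(Xs, Xt, eps=1e-6):
    """Align source features Xs (ns x d) to target features Xt (nt x d).

    Step 1 (whitening): remove feature correlations from the source.
    Step 2 (recoloring): re-impose the target-domain correlations.
    The target domain itself is left unchanged.
    """
    d = Xs.shape[1]
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(d)  # regularized source covariance
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(d)  # regularized target covariance
    Xs_white = Xs @ fractional_matrix_power(Cs, -0.5)      # whitening: multiply by Cs^{-1/2}
    Xs_aligned = Xs_white @ fractional_matrix_power(Ct, 0.5)  # recoloring: multiply by Ct^{1/2}
    return np.real(Xs_aligned)
```

After this transform, the empirical covariance of the aligned source features matches that of the target features (up to the tiny regularizer), while the target domain is untouched.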

3.3. Intradomain Programming

This step in the cross-database micro-expression experiments aims to learn a transfer classifier. Before learning the nonparametric transfer classifier, we first need to understand the probability annotation matrix. Table 2 shows a probability annotation matrix M, with one row per target sample and one column per class, whose elements M_{jc} lie in [0, 1]. C denotes the number of label classes, and M_{jc} is the annotation probability that target sample x_j^t belongs to class c. As with a softmax classifier, the class with the highest probability for a sample is the class to which it is assigned. For example, if the highest probability value in a sample's row is 0.5, the sample is assigned to the corresponding class.

If d_{jc} denotes the Euclidean distance from x_j^t to the center of class c in the source domain, and μ_c is used to represent the center of class c, then the cost function is the expected total distance Σ_j Σ_c d_{jc} M_{jc}. The class centers are computed with the indicator function, whose output is 1 when its argument holds and 0 otherwise.

In fact, every class label should contain at least one sample. Under ideal conditions, M_{jc} is 0 if x_j^t does not belong to class c and 1 otherwise. So, we can denote this as the constraint Σ_{j=1}^{n_t} M_{jc} ≥ 1 for every class c.

Since M_{jc} is the probability that x_j^t has the c-th class label and the probabilities of a sample must sum to one, M is bounded by the condition Σ_{c=1}^{C} M_{jc} = 1 with 0 ≤ M_{jc} ≤ 1.

We need to minimize the cost function to optimize the model. Combining equation (6) with the constraints of equations (8) and (9), the final learning goal becomes the linear program that minimizes Σ_{j=1}^{n_t} Σ_{c=1}^{C} d_{jc} M_{jc} over M subject to those constraints.

Linear programming maximizes or minimizes a linear objective function under linear equality or inequality constraints and can often be solved very efficiently by the open-source linear programming package PuLP (https://pypi.org/project/PuLP/1.1/). We use this method to obtain M.
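The paper solves this linear program with PuLP; as an illustration, the same LP can be sketched with SciPy's linprog. The function names and the toy distance matrix below are our own, and the distances d[j, c] are assumed to be precomputed from the aligned features:

```python
import numpy as np
from scipy.optimize import linprog

def solve_annotation_matrix(d):
    """Solve for the probability annotation matrix M given distances d.

    d[j, c] is the Euclidean distance from target sample j to the source
    center of class c.  The LP minimizes sum_{j,c} d[j,c] * M[j,c] subject
    to: each row of M sums to 1 (every sample receives one unit of
    probability), each column of M sums to at least 1 (every class is
    assigned at least one sample's worth of probability), and 0 <= M <= 1.
    """
    nt, C = d.shape
    c_vec = d.ravel()  # objective coefficients, M flattened row-major

    # Row-sum equality: sum_c M[j, c] = 1 for each target sample j.
    A_eq = np.zeros((nt, nt * C))
    for j in range(nt):
        A_eq[j, j * C:(j + 1) * C] = 1.0
    b_eq = np.ones(nt)

    # Column-sum inequality, written as -sum_j M[j, c] <= -1 for each class c.
    A_ub = np.zeros((C, nt * C))
    for c in range(C):
        A_ub[c, c::C] = -1.0
    b_ub = -np.ones(C)

    res = linprog(c_vec, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0.0, 1.0), method="highs")
    return res.x.reshape(nt, C)

def predict_labels(M):
    # Predicted label = class with the highest annotation probability.
    return M.argmax(axis=1)
```

The LP concentrates each row's probability mass on the nearest class center while the column constraint still forces every class to be covered at least once.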

Ultimately, the labels of the target domain can be obtained by the softmax function.

4. Experiments

4.1. Micro-Expression Database

To further confirm the reliability of IDSDA in cross-database facial MER, plenty of CDMER experiments are conducted in this section. We choose two widely used databases, CASMEII and SMIC. The SMIC dataset [39, 40] was created by the University of Oulu, Finland, using a 100 fps high-speed camera to record subjects who were required to watch videos with large emotional swings while attempting to conceal their emotions; an observer watched the subjects' expressions without seeing the videos. Under this elicitation mechanism, 164 video sequences were obtained from 16 individuals (10 men and 6 women), and the obtained micro-expressions belong to 3 categories (Positive, Surprised, and Negative). To study micro-expressions more deeply, 71 further micro-expression video sequences were later recorded under normal vision (VIS) and near-infrared (NIR) conditions using cameras with a frame rate of 25 fps. Each of SMIC (HS), SMIC (VIS), and SMIC (NIR) is used as an independent dataset in this experiment.

CASMEII [41], created by the Institute of Psychology, Chinese Academy of Sciences, uses a similar elicitation mechanism to ensure the reliability of the data. The dataset consists of 247 video sequences of 26 individuals recorded at a 200 fps frame rate, and seven micro-expression categories (Happy, Surprised, Disgusted, Depressed, Sad, Scared, and Other) are included in the dataset. As can be seen, the micro-expression categories in CASMEII and SMIC differ considerably. To make the two sets of categories consistent, the categories in CASMEII were selected and relabeled. Specifically, the Happy micro-expression samples were tagged as Positive, the Other category was removed, the labels of the Surprised samples remained unchanged, and the Disgusted and Depressed samples were relabeled as Negative. The details are shown in Table 3.

4.2. Experimental Setup

SMIC (HS), SMIC (VIS), SMIC (NIR), and CASMEII serve as the four datasets for these cross-database experiments. In other words, when CASMEII is the source (target) domain dataset, one of the datasets in the SMIC database is the target (source) domain dataset. We thus obtain 6 sets of cross-database micro-expression experiments, namely, No.1: C-H, No.2: C-N, No.3: C-V, No.4: H-C, No.5: N-C, and No.6: V-C, where C represents CASMEII, H represents SMIC (HS), N represents SMIC (NIR), and V represents SMIC (VIS). For the four datasets, the LBP-TOP [42] descriptor is selected for feature extraction. An 8 × 8 grid was used to divide the micro-expression sequences on three orthogonal planes, with the number of neighborhood points P of the LBP operator set to 8 and the neighborhood radius R set to 2. All LBP-TOP histograms of the blocks are then concatenated to form the facial feature vector.
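For reference, a minimal NumPy sketch of the LBP-TOP descriptor is given below. For brevity it uses radius R = 1 instead of the paper's R = 2, averages histograms over whole planes instead of the 8 × 8 block grid, and uses plain 256-bin LBP codes; the function names are our own:

```python
import numpy as np

def lbp_codes(img):
    """Basic 8-neighbor LBP (P = 8, R = 1) over the interior of a 2-D plane."""
    c = img[1:-1, 1:-1]  # center pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code += (nb >= c).astype(np.int64) << bit  # set bit if neighbor >= center
    return code

def plane_hist(codes):
    """Normalized 256-bin histogram of LBP codes."""
    h = np.bincount(codes.ravel(), minlength=256).astype(float)
    return h / h.sum()

def lbp_top(video):
    """video: (T, H, W) grayscale sequence -> concatenated XY/XT/YT histograms."""
    T, H, W = video.shape
    h_xy = np.mean([plane_hist(lbp_codes(video[t])) for t in range(T)], axis=0)
    h_xt = np.mean([plane_hist(lbp_codes(video[:, y, :])) for y in range(H)], axis=0)
    h_yt = np.mean([plane_hist(lbp_codes(video[:, :, x])) for x in range(W)], axis=0)
    return np.concatenate([h_xy, h_xt, h_yt])  # 768-dimensional descriptor
```

Each of the three plane types (XY, XT, YT) contributes a 256-bin histogram, giving a 768-dimensional descriptor per sequence; in the actual experiments the histograms are additionally computed per spatial block and concatenated.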

To show the superiority of IDSDA, we compare it with other domain adaptation methods that perform well in cross-database recognition: support vector machine (SVM) [43], CORAL [37], geodesic flow kernel (GFK) [44], and DRFS-T [15]. The parameter settings of these methods are as follows: (1) For SVM, a linear kernel is used, and its results serve as the baseline for comparison. (2) DRFS-T involves two important trade-off parameters; we search for their best values over preset candidate grids. (3) For CORAL, we use the common SVM classifier with a linear kernel. To enrich the types of classifiers in the experiments, 1NN is used for GFK, and the reduced dimensionality is chosen by search.

4.3. Analysis of Experimental Results

Table 4 shows the accuracy of the domain adaptation methods covered in this paper. It is clear that IDSDA was superior to the other methods in most experiments, and the results are significantly improved. For example, in No.2 and No.3, the maximum differences between IDSDA and the other DA methods reach 23.94% and 26.76%, respectively. In addition, an interesting phenomenon was observed across all methods: when the source domain samples came from CASMEII, the results with SMIC (VIS) as the target dataset were superior to those with SMIC (NIR) as the target dataset (No.2 and No.3). Likewise, the No.6 recognition accuracy is higher than that of No.5. As stated in Section 4.1, the SMIC (NIR) dataset was shot with a near-infrared camera, while the SMIC (VIS) dataset was shot under normal vision; the difference in image quality between the source and target domains may explain this phenomenon.

The accuracy in No.5 and No.6 is lower than that in No.2 and No.3, with a minimum difference of 17.33%. In other words, the recognition accuracy with CASMEII as the source domain database is higher than with CASMEII as the target domain database. This may be caused by the imbalance of sample categories. As shown in Table 3, the proportions of the three micro-expression categories in the CASMEII data used in this experiment differ greatly, with negative emotions taking up the majority (91/148). In contrast, the difference in recognition rate between No.1 and No.4 was only 2.5%, significantly less than 17.33%. On careful observation, the SMIC (HS) dataset also contains a large proportion of negative emotions; the two databases thus have a similar composition, which suggests that a similar data composition of the source and target domains is helpful for DA recognition. This further indicates that category imbalance may be the main factor affecting the recognition performance on the CASMEII and SMIC databases.

In addition, we recorded in Table 5 the average running time (training and testing) of the DA methods in each group of experiments, the average accuracy over the six groups of experiments, and the parameters required by each method. The results show that IDSDA outperforms the other methods in average accuracy (56.49%) while greatly reducing the running time (58.09 s). In summary, this experiment demonstrates the superiority of IDSDA in both accuracy and efficiency.

5. Conclusion and Discussion

In this paper, we exploit the intradomain structure for CDMER. Intradomain alignment is first performed to learn nonparametric transfer features, and a classifier is then learned through intradomain programming. The method is simple to operate because it requires no model selection or hyperparameter tuning. Extensive experiments on the CASMEII and SMIC databases and a comparative analysis clearly indicate that IDSDA is significantly superior to other cutting-edge domain adaptation methods in comprehensive performance and efficiency for the CDMER task.

Although the Intradomain Structure Domain Adaptation (IDSDA) method has achieved good results in CDMER, some problems remain to be investigated. (1) According to the experimental results, database category imbalance is an important factor affecting cross-database MER; we next need to focus on how to reduce its impact. (2) The recognition performance is also influenced by the difference in image quality between the source and target domain databases; in future work, we will consider transforming the databases used in the experiments to the same image quality. The approaches proposed in this paper can be combined with other methods and tools [45–53] to study image recognition and identification problems for different plants and can be applied to other studies [54–62] in the natural and social sciences.

Data Availability

The data used to support this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Y. Zhang and Y. Liu jointly designed the study. Y. Liu collected and analyzed the data. H. Wang reviewed and edited the manuscript.


Acknowledgments

This work was supported by the Henan Scientific and Technological Research Project under Grant 212102210504.