We present a new approach to automatically recognizing pain expression from video sequences, which categorizes pain into four levels: “no pain,” “slight pain,” “moderate pain,” and “severe pain.” First, facial velocity information, which is used to characterize pain, is determined using the optical flow technique. Then, visual words based on facial velocity are used to represent pain expression using the bag-of-words model. Finally, a pLSA model is used for pain expression recognition; to improve the recognition accuracy, the class label information is used in learning the pLSA model. Experiments were performed on a pain expression dataset that we built to test and evaluate the proposed method. The results show that the average recognition accuracy is over 92%, which validates the effectiveness of the method.

1. Introduction

In recent years, a tremendous amount of research has been carried out in the field of automatic recognition of expressions (such as pain, anger, and sadness) from video sequences. Pain is a subjective and personal experience, and pain recognition remains difficult. There are numerous potential applications for pain recognition. Automatic recognition can help doctors identify when patients are experiencing genuine pain so that it is taken seriously; examples include young children who cannot self-report pain, patients in postoperative care or transient states of consciousness, and patients with severe disorders requiring assisted breathing, among other conditions [1, 2]. A real-time automatic system could potentially provide significant advantages in patient care and cost reduction.

Measuring or monitoring pain is normally conducted via self-report as it is convenient and requires no special skill or staffing. However, self-report measures cannot be used when patients cannot communicate verbally. Many researchers have pursued the goal of obtaining a continuous objective measure of pain through analyses of tissue pathology, neurological “signatures,” imaging procedures, testing of muscle strength, and so on [3]. These approaches have been fraught with difficulty because they are often inconsistent with other evidence of pain [3], in addition to being highly invasive and constraining to the patient.

The experience of pain is often reflected in changes in facial expression, so facial expression is considered the most reliable source of information when judging the pain intensity experienced by another person. In the past several years, significant efforts have been made to identify reliable and valid facial indicators of pain [4–14]. In [4, 5], an approach was developed to automatically recognize acute pain: active appearance models (AAM) were used to decouple shape and appearance parameters from face images, three pain representations were derived from the AAM, and SVMs were then used to classify pain. In [6–10], Prkachin and Solomon validated a facial action coding system (FACS) based measure of pain that can be applied on a frame-by-frame basis. However, these methods require manual labeling of facial action units or other observational measurements by highly trained observers [15, 16], which is both time-consuming and costly. Most must be performed offline, which makes them ill-suited for real-time applications in clinical settings. In [11], a robust approach for pain expression recognition from video sequences was presented. An automatic face detector, which uses skin color modeling, is employed to detect the human face in the video sequence. The pain-affected portions of the face are obtained using a mask image. The obtained face images are then projected onto a feature space, defined by Eigenfaces, to produce the biometric template. Pain recognition is performed by projecting a new image onto the feature space spanned by the Eigenfaces and then classifying the painful face by comparing its position in the feature space with the positions of known individuals. Zhang and Xia [12] used supervised locality preserving projections (SLPP) to extract features of pain expression and multiple kernel support vector machines (MKSVM) to recognize pain expression. The methods described above use static features to characterize pain expression, but static features cannot fully represent pain.

In this paper, we propose a method for automatically inferring pain from video sequences. The approach includes two steps: extracting features of pain expression and classifying pain expression. In the feature extraction step, features of pain expression are extracted by a motion descriptor based on optical flow. We then convert the facial velocity information into visual words using the “bag-of-words” model, so that pain expression is represented by a number of visual words. Finally, a pLSA model is used for pain expression recognition. In addition, to improve the recognition accuracy, the class label information is used in learning the pLSA model.

The paper is structured as follows. After reviewing related work in this section, we describe the pain feature extraction based on the optical flow technique and the “bag-of-words” model in Section 2. Section 3 gives details of the pLSA model for recognizing pain expression. Section 4 presents the experimental results and compares our approach with three state-of-the-art methods. The conclusions are given in the final section.

2. Pain Expression Representation

2.1. Facial Velocity Feature

Physiologically, the experience of pain is often reflected in changes in facial expression. Because an expression is a dynamic event, it must be represented by the motion information of the face. We therefore use facial velocity features to characterize pain. The facial velocity features (optical flow vectors) are estimated by an optical flow model, and each pain expression is coded on a 4-level intensity dimension (A–D): “no pain,” “slight pain,” “moderate pain,” and “severe pain.”

Given a stabilized video sequence in which the face of a person appears in the center of the field of view, we compute the facial velocity (optical flow vector) at each frame using the optical flow equation, which is expressed as

$$I_x u + I_y v + I_t = 0,$$

where $I(x, y, t)$ is the intensity of the image at pixel $(x, y)$ and time $t$; $I_x$, $I_y$, and $I_t$ are the partial derivatives of $I$ with respect to $x$, $y$, and $t$; and $u$ and $v$ are the horizontal and vertical velocities at pixel $(x, y)$.

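We can obtain $(u, v)$ by minimizing the objective function:

$$E(u, v) = \iint \left[ \left( I_x u + I_y v + I_t \right)^2 + \alpha^2 \left( \lVert \nabla u \rVert^2 + \lVert \nabla v \rVert^2 \right) \right] dx\, dy,$$

where $\alpha^2$ weights the smoothness term.

There are many methods to solve the optical flow equation. We use the iterative algorithm [17] to compute the optical flow velocity:

$$u^{(n+1)} = \bar{u}^{(n)} - \frac{I_x \left( I_x \bar{u}^{(n)} + I_y \bar{v}^{(n)} + I_t \right)}{\alpha^2 + I_x^2 + I_y^2}, \qquad v^{(n+1)} = \bar{v}^{(n)} - \frac{I_y \left( I_x \bar{u}^{(n)} + I_y \bar{v}^{(n)} + I_t \right)}{\alpha^2 + I_x^2 + I_y^2},$$

where $n$ is the number of iterations, the initial value of the velocity is $u^{(0)} = v^{(0)} = 0$, and $\bar{u}^{(n)}$ and $\bar{v}^{(n)}$ are the average velocities of the neighborhood of point $(x, y)$.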
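To make the iteration concrete, the following is a minimal pure-Python sketch of a Horn-Schunck-style update, which we assume is the kind of iterative scheme meant by [17]. The function name `horn_schunck`, the smoothness weight `alpha`, the central-difference derivatives, the 4-neighbour average, and the clamped borders are all illustrative assumptions rather than details taken from the paper.

```python
def horn_schunck(frame1, frame2, alpha=1.0, n_iter=100):
    """Estimate dense flow (u, v) between two equally sized grayscale frames.

    Frames are 2-D lists of floats. `alpha` is an assumed smoothness weight.
    """
    h, w = len(frame1), len(frame1[0])

    def px(img, y, x):
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    # Finite-difference estimates of the derivatives I_x, I_y and I_t.
    Ix = [[(px(frame1, y, x + 1) - px(frame1, y, x - 1)) / 2.0
           for x in range(w)] for y in range(h)]
    Iy = [[(px(frame1, y + 1, x) - px(frame1, y - 1, x)) / 2.0
           for x in range(w)] for y in range(h)]
    It = [[frame2[y][x] - frame1[y][x] for x in range(w)] for y in range(h)]

    # Initial velocity is zero everywhere.
    u = [[0.0] * w for _ in range(h)]
    v = [[0.0] * w for _ in range(h)]
    for _ in range(n_iter):
        new_u = [[0.0] * w for _ in range(h)]
        new_v = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                # Average velocity of the 4-neighbourhood of (x, y).
                u_bar = (px(u, y - 1, x) + px(u, y + 1, x)
                         + px(u, y, x - 1) + px(u, y, x + 1)) / 4.0
                v_bar = (px(v, y - 1, x) + px(v, y + 1, x)
                         + px(v, y, x - 1) + px(v, y, x + 1)) / 4.0
                ix, iy, it = Ix[y][x], Iy[y][x], It[y][x]
                # Shared correction term of the two update equations.
                t = (ix * u_bar + iy * v_bar + it) / (alpha ** 2 + ix ** 2 + iy ** 2)
                new_u[y][x] = u_bar - ix * t
                new_v[y][x] = v_bar - iy * t
        u, v = new_u, new_v
    return u, v
```

On a horizontal intensity ramp shifted one pixel to the right, this sketch recovers a horizontal velocity close to 1 away from the image borders and a zero vertical velocity.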

The optical flow vector field $F$ is then split into two scalar fields $F_x$ and $F_y$, corresponding to the $x$ and $y$ components of $F$ [18]. $F_x$ and $F_y$ are further half-wave rectified into four nonnegative channels $F_x^+$, $F_x^-$, $F_y^+$, and $F_y^-$, so that $F_x = F_x^+ - F_x^-$ and $F_y = F_y^+ - F_y^-$. These four nonnegative channels are then blurred with a Gaussian kernel and normalized to obtain the final four channels $Fb_x^+$, $Fb_x^-$, $Fb_y^+$, and $Fb_y^-$.
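The channel construction can be sketched as follows. The paper does not state the Gaussian kernel or the normalization, so this sketch assumes a 3x3 binomial kernel as a small Gaussian approximation and scales the four channels so that they sum to one; the names `half_wave_rectify`, `blur3x3`, and `motion_channels` are hypothetical.

```python
def half_wave_rectify(field):
    """Split a signed 2-D field into nonnegative parts with field = pos - neg."""
    pos = [[max(val, 0.0) for val in row] for row in field]
    neg = [[max(-val, 0.0) for val in row] for row in field]
    return pos, neg


def blur3x3(channel):
    """Blur with the 3x3 binomial kernel [1, 2, 1] x [1, 2, 1] / 16 (assumed)."""
    h, w = len(channel), len(channel[0])
    k = (1.0, 2.0, 1.0)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp at the borders.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += k[dy + 1] * k[dx + 1] * channel[yy][xx]
            out[y][x] = acc / 16.0
    return out


def motion_channels(u, v):
    """Return the four blurred channels in the order [x+, x-, y+, y-]."""
    channels = []
    for field in (u, v):
        pos, neg = half_wave_rectify(field)
        channels.append(blur3x3(pos))
        channels.append(blur3x3(neg))
    # Assumed normalization: scale so that all channel values sum to one.
    total = sum(val for ch in channels for row in ch for val in row)
    if total > 0.0:
        channels = [[[val / total for val in row] for row in ch] for ch in channels]
    return channels
```

Keeping the positive and negative parts in separate channels means that opposite motions do not cancel each other when the field is blurred.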

Facial pain expression is represented by velocity features composed of the channels $Fb_x^+$, $Fb_x^-$, $Fb_y^+$, and $Fb_y^-$ of all pixels in the facial image. Because pain expression can be regarded as facial motion, the velocity features can describe pain effectively. In addition, velocity features have been shown to perform reliably with noisy image sequences [18] and have been applied in various tasks, such as action classification and motion synthesis. However, the dimension of these velocity features is too high ($4 \times m \times n$, where $m \times n$ is the image size) to be used directly for recognition, so we convert them into visual words using “bag of words” [19, 20].

2.2. Visual Words for Characterizing Pain

The “bag-of-words” model was originally proposed for analyzing text documents, where a document is represented as a histogram over word counts.

In this paper, each facial image is divided into blocks of size $s \times s$, and each image block is represented by the optical flow vectors of all pixels in the block. On this basis, pain is represented by visual words using the BoW (bag of words) method.
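As an illustration of the block representation, the sketch below cuts a set of per-pixel channels into blocks and flattens each block into one feature vector. It assumes non-overlapping blocks and drops incomplete blocks at the right and bottom edges; the paper does not specify either choice, and the name `extract_blocks` is hypothetical.

```python
def extract_blocks(channels, block_size):
    """Cut per-pixel channels into non-overlapping square blocks.

    `channels` is a list of equally sized 2-D lists (for example the four
    motion channels). Each block becomes one flat vector that concatenates
    the values of every channel inside the block.
    """
    h, w = len(channels[0]), len(channels[0][0])
    blocks = []
    for top in range(0, h - block_size + 1, block_size):
        for left in range(0, w - block_size + 1, block_size):
            vec = [ch[y][x]
                   for ch in channels
                   for y in range(top, top + block_size)
                   for x in range(left, left + block_size)]
            blocks.append(vec)
    return blocks
```

With four channels and a block size of 8, each block vector would have 4 x 8 x 8 = 256 entries.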

To construct the codebook, we randomly select a subset of all image blocks and then use the $k$-means clustering algorithm to obtain $V$ clusters. The codewords, namely the visual words, are defined as the centers of the obtained clusters. Finally, each face image is converted to the “bag-of-words” representation: the number of occurrences of each codeword in the image is used to represent the image, giving the BoW histogram.

The steps for characterizing pain are as follows.

Step 1. The optical flow channels $Fb_x^+$, $Fb_x^-$, $Fb_y^+$, and $Fb_y^-$ are computed.

Step 2. Each facial image is divided into blocks, each of which is represented by the optical flow vectors of all pixels in the block.

Step 3. Visual words are obtained using the $k$-means clustering algorithm.

Step 4. Pain expression is represented by the BoW histogram $H = (n_1, n_2, \ldots, n_V)$, where $n_j$ is the number of occurrences of the $j$th visual word in the image and $V$ is the number of visual words in the word set.

Figure 1 shows an example of our “bag-of-words” representation.
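The codebook construction and the histogram step can be sketched in a few lines of plain Python. This is a toy version under stated assumptions: a basic Lloyd-style $k$-means with random initialization from the data, squared Euclidean distance, and hypothetical function names (`kmeans`, `bow_histogram`); the paper does not describe its clustering settings.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centers):
    """Index of the center closest to `vec`."""
    return min(range(len(centers)), key=lambda j: sq_dist(vec, centers[j]))


def kmeans(vectors, k, n_iter=20, seed=0):
    """Basic k-means; returns the k cluster centers (the visual words)."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for vec in vectors:
            groups[nearest(vec, centers)].append(vec)
        for j, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bow_histogram(block_vectors, codebook):
    """Count how often each codeword is the nearest one to a block."""
    hist = [0] * len(codebook)
    for vec in block_vectors:
        hist[nearest(vec, codebook)] += 1
    return hist
```

Each image's block vectors are passed to `bow_histogram` with the shared codebook, so every image is described by a vector whose length equals the codebook size regardless of the image content.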

3. pLSA-Based Pain Expression Recognition

We use pLSA models [21] to learn and recognize human pain. Our approach is directly inspired by a body of work on using generative topic models for visual recognition based on the “bag-of-words” paradigm. pLSA models have been applied to various computer vision applications, such as scene recognition, object recognition, action recognition, and human detection [22–26].

3.1. Probabilistic Latent Semantic Analysis (pLSA)

pLSA is a statistical generative model that associates documents and words via latent topic variables and represents each document as a mixture of topics. We briefly outline the principle of pLSA in this subsection. The pLSA model is shown in Figure 2.

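Suppose document, word, and topic are represented by $d_i$, $w_j$, and $z_k$, respectively. The joint probability of document $d_i$, topic $z_k$, and word $w_j$ can be expressed as

$$P(d_i, z_k, w_j) = P(w_j \mid z_k)\, P(z_k \mid d_i)\, P(d_i),$$

where $P(w_j \mid z_k)$ is the probability of word $w_j$ occurring in pain category $z_k$, $P(z_k \mid d_i)$ is the probability of topic $z_k$ occurring in image $d_i$, and $P(d_i)$ can be considered as the prior probability of $d_i$. The conditional probability $P(w_j \mid d_i)$ can be obtained by marginalizing over all the topic variables $z_k$:

$$P(w_j \mid d_i) = \sum_{k} P(w_j \mid z_k)\, P(z_k \mid d_i).$$

Denote by $n(d_i, w_j)$ the number of occurrences of word $w_j$ in image $d_i$; the prior probability $P(d_i)$ can be modeled as

$$P(d_i) \propto \sum_{j} n(d_i, w_j).$$

A maximum likelihood estimate of $P(w_j \mid z_k)$ and $P(z_k \mid d_i)$ is obtained by maximizing the likelihood function using the expectation maximization (EM) algorithm. The objective likelihood function of the EM algorithm is

$$L = \prod_{i} \prod_{j} P(d_i, w_j)^{\,n(d_i, w_j)}$$

or, in logarithmic form,

$$\log L = \sum_{i} \sum_{j} n(d_i, w_j) \log P(d_i, w_j),$$

where $P(d_i, w_j) = P(d_i)\, P(w_j \mid d_i)$.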

The EM algorithm consists of two steps: an expectation (E) step, which computes the posterior probability of the latent variables, and a maximization (M) step, which maximizes the complete-data likelihood computed from the posterior probabilities obtained in the E-step. Both steps of the EM algorithm for pLSA parameter estimation are listed below.

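E-Step. Given $P(w_j \mid z_k)$ and $P(z_k \mid d_i)$, estimate $P(z_k \mid d_i, w_j)$:

$$P(z_k \mid d_i, w_j) = \frac{P(w_j \mid z_k)\, P(z_k \mid d_i)}{\sum_{l} P(w_j \mid z_l)\, P(z_l \mid d_i)}.$$

M-Step. Given the $P(z_k \mid d_i, w_j)$ estimated in the E-Step and $n(d_i, w_j)$, estimate $P(w_j \mid z_k)$ and $P(z_k \mid d_i)$:

$$P(w_j \mid z_k) = \frac{\sum_{i} n(d_i, w_j)\, P(z_k \mid d_i, w_j)}{\sum_{m} \sum_{i} n(d_i, w_m)\, P(z_k \mid d_i, w_m)}, \qquad P(z_k \mid d_i) = \frac{\sum_{j} n(d_i, w_j)\, P(z_k \mid d_i, w_j)}{n(d_i)},$$

where $n(d_i) = \sum_{j} n(d_i, w_j)$ is the length of document $d_i$.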

Given a new document $d_{\text{test}}$, the conditional probability distribution over aspects, $P(z_k \mid d_{\text{test}})$, can be inferred by maximizing the likelihood of $d_{\text{test}}$ using the fixed word-aspect distribution $P(w_j \mid z_k)$ learned from the observed data [21]. The inference iteration is the same as the learning process, except that the word-topic distribution $P(w_j \mid z_k)$ in (12) is kept fixed at the value learned from the training data.
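For readers who prefer code to formulas, the EM procedure for standard (unsupervised) pLSA can be sketched as below. This is a minimal pure-Python illustration rather than the implementation used in the paper; the function names, the random initialization, the fixed iteration count, and the toy-scale data layout (`counts[i][j]` holding the count of word $j$ in document $i$) are assumptions.

```python
import math
import random


def plsa_em(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM. Returns (p_w_z, p_z_d) with
    p_w_z[k][j] = P(w_j | z_k) and p_z_d[i][k] = P(z_k | d_i)."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def random_dist(size):
        raw = [rng.random() + 0.1 for _ in range(size)]
        total = sum(raw)
        return [r / total for r in raw]

    p_w_z = [random_dist(n_words) for _ in range(n_topics)]
    p_z_d = [random_dist(n_topics) for _ in range(n_docs)]

    for _ in range(n_iter):
        num_w_z = [[0.0] * n_words for _ in range(n_topics)]
        num_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for i in range(n_docs):
            for j in range(n_words):
                if counts[i][j] == 0:
                    continue
                # E-step: posterior P(z_k | d_i, w_j) up to normalization.
                joint = [p_w_z[k][j] * p_z_d[i][k] for k in range(n_topics)]
                norm = sum(joint)
                if norm <= 0.0:
                    continue
                for k in range(n_topics):
                    weighted = counts[i][j] * joint[k] / norm
                    num_w_z[k][j] += weighted
                    num_z_d[i][k] += weighted
        # M-step: renormalize the expected counts.
        for k in range(n_topics):
            total = sum(num_w_z[k])
            if total > 0.0:
                p_w_z[k] = [x / total for x in num_w_z[k]]
        for i in range(n_docs):
            total = sum(num_z_d[i])  # equals the document length n(d_i)
            if total > 0.0:
                p_z_d[i] = [x / total for x in num_z_d[i]]
    return p_w_z, p_z_d


def log_likelihood(counts, p_w_z, p_z_d):
    """Sum over (i, j) of n(d_i, w_j) * log P(w_j | d_i)."""
    total = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c:
                mix = sum(p_w_z[k][j] * p_z_d[i][k] for k in range(len(p_w_z)))
                total += c * math.log(mix)
    return total
```

Because EM never decreases the likelihood, comparing `log_likelihood` after a few and after many iterations from the same seed is a simple sanity check on an implementation.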

3.2. pLSA-Based Pain Expression Recognition

In this paper, we treat each block in an image as a single word $w_j$, an image as a document $d_i$, and a pain category as a topic variable $z_k$. For the task of pain classification, our goal is to assign a new face image to a specific pain class. During the inference stage, given a testing face image $d_{\text{test}}$ and the document-specific coefficients $P(z_k \mid d_{\text{test}})$, we can treat each aspect in the pLSA model as one class of pain. The pain category is therefore determined by the aspect with the highest $P(z_k \mid d_{\text{test}})$. The pain category of $d_{\text{test}}$ is determined as

$$k^{*} = \arg\max_{k} P(z_k \mid d_{\text{test}}).$$

For pain recognition with a large amount of training data, standard pLSA training would take a long time. In this paper, we adopt a supervised algorithm to train pLSA, similar to [27]. Each training image carries class label information, which is important for the classification task. We make use of this class label information in learning the pLSA model: since each image in the training set corresponds directly to a certain pain class, the topic of each training image becomes observable. This model is called supervised pLSA (SpLSA). The graphical model of SpLSA is shown in Figure 3.

The parameter $P(w_j \mid z_k)$ in the training step defines the probability of a word $w_j$ being drawn from a topic $z_k$. Letting each topic in pLSA correspond to a pain category, the distribution $P(w_j \mid z_k)$ in training can be estimated simply from $N_k$, the number of training images belonging to the $k$th pain class, and $n_{kj}$, the number of occurrences of the $j$th word (block) in the images belonging to the $k$th pain class. The $P(w_j \mid z_k)$ calculated in this way can be used to initialize $P(w_j \mid z_k)$ in the EM algorithm for model learning, which makes the EM algorithm converge more quickly. The supervised training of pLSA is summarized in Algorithm 1. Once the distribution $P(w_j \mid z_k)$ has been computed by the EM algorithm, the posterior distribution $P(z_k \mid d_{\text{test}})$ for a testing face image $d_{\text{test}}$ can be calculated in the same way as in the original pLSA. The training of pLSA for classification is summarized in Algorithm 2.

Algorithm 1. Supervised training of the pLSA.

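Step 1. For all $k$ and $j$, calculate $P(w_j \mid z_k)$ from the labeled training images as the initialization of $P(w_j \mid z_k)$, and initialize $P(z_k \mid d_i)$ randomly.

Step 2. E-Step: for all $(d_i, w_j)$ pairs, calculate $P(z_k \mid d_i, w_j)$.

Step 3. M-Step: substitute $P(z_k \mid d_i, w_j)$ as calculated in Step 2; for all $k$ and $j$, calculate $P(w_j \mid z_k)$.

Step 4. M-Step: substitute $P(z_k \mid d_i, w_j)$ as calculated in Step 2; for all $i$ and $k$, calculate $P(z_k \mid d_i)$.

Step 5. Repeat Steps 2–4 until the convergence condition is met.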

The supervised training algorithm not only makes the training more efficient, but also improves the overall recognition accuracy significantly.
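Step 1 of Algorithm 1 can be illustrated with a short sketch. The exact estimator is not recoverable from the text, so the version below is an assumption: it pools the word counts of all training images that share a class label and normalizes them so that each class distribution sums to one. The function name `init_word_topic` and the small `smoothing` constant, which keeps unseen words from receiving exactly zero probability, are hypothetical.

```python
def init_word_topic(counts, labels, n_classes, smoothing=1e-6):
    """Class-conditional word distributions used to initialize P(w | z).

    counts[i][j] is the count of word j in training image i and labels[i]
    is the class index of image i. Assumed estimator: normalized pooled
    counts per class with a small additive smoothing term.
    """
    n_words = len(counts[0])
    class_counts = [[0.0] * n_words for _ in range(n_classes)]
    for row, label in zip(counts, labels):
        for j, c in enumerate(row):
            class_counts[label][j] += c
    p_w_z = []
    for row in class_counts:
        total = sum(row) + smoothing * n_words
        p_w_z.append([(c + smoothing) / total for c in row])
    return p_w_z
```

Starting EM from these class-wise distributions ties topic $k$ to pain class $k$ from the first iteration, which is what allows a topic index to be read as a class label afterwards.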

Algorithm 2. Training of the pLSA for classification

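Step 1. For all $k$ and $j$, calculate $P(w_j \mid z_k)$.

Step 2. E-Step: for all $(d_{\text{test}}, w_j)$ pairs, calculate $P(z_k \mid d_{\text{test}}, w_j)$.

Step 3. Partial M-Step: fix $P(w_j \mid z_k)$ as calculated in Step 1; for all $k$, calculate $P(z_k \mid d_{\text{test}})$.

Step 4. Repeat Steps 2 and 3 until the convergence condition is met.

Step 5. Calculate the pain class $k^{*} = \arg\max_{k} P(z_k \mid d_{\text{test}})$.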
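Algorithm 2 can be sketched as a "fold-in" loop in which only the topic mixture of the test image is updated while the word-topic distribution stays fixed. The sketch below assumes a uniform initial mixture and a fixed number of iterations in place of a convergence test; the function name `classify` and its argument layout are hypothetical.

```python
def classify(word_counts, p_w_z, n_iter=50):
    """Infer P(z | d_test) with P(w | z) fixed and return (best_class, p_z_d).

    word_counts[j] is the count of word j in the test image and
    p_w_z[k][j] = P(w_j | z_k) comes from training.
    """
    n_topics = len(p_w_z)
    p_z_d = [1.0 / n_topics] * n_topics  # assumed uniform start
    for _ in range(n_iter):
        expected = [0.0] * n_topics
        for j, c in enumerate(word_counts):
            if c == 0:
                continue
            # E-step for the single test document.
            joint = [p_w_z[k][j] * p_z_d[k] for k in range(n_topics)]
            norm = sum(joint)
            if norm <= 0.0:
                continue
            for k in range(n_topics):
                expected[k] += c * joint[k] / norm
        total = sum(expected)
        if total <= 0.0:
            break
        # Partial M-step: only the topic mixture is re-estimated.
        p_z_d = [x / total for x in expected]
    best = max(range(n_topics), key=lambda k: p_z_d[k])
    return best, p_z_d
```

The returned index is the pain class whose topic carries the largest share of the inferred mixture.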

4. Experimental Results and Analysis

The proposed algorithm was evaluated using a hybrid C++ and MATLAB implementation on a PC with a 3.2 GHz Pentium processor and 4 GB of RAM.

We have built a database of painful and normal face images. The database contains four groups of images (“no pain,” “slight pain,” “moderate pain,” and “severe pain”), and each group includes 20 males and 20 females. The face images were taken under various laboratory-controlled lighting conditions, and each face image was normalized to a fixed size. Some sample images are shown in Figure 4.

In the experiments, 30 face images per class are randomly chosen for training, and the remaining images are used for testing. We preprocessed the images by aligning and scaling them so that the distance between the eyes was the same for all images and the eyes occurred at the same coordinates in every image. We ran the system 5 times, obtaining 5 different training and testing sample sets. The reported recognition rate is the average of the recognition rates over the runs.
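The evaluation protocol described above (a random per-class split, repeated and averaged) can be sketched as follows. The helper names `split_per_class` and `average_accuracy`, the seed, and the `evaluate` callback, which stands in for training and testing a recognizer on the given index lists, are assumptions made for illustration.

```python
import random


def split_per_class(labels, n_train, rng):
    """Randomly pick n_train indices per class for training; rest for testing."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        train += idxs[:n_train]
        test += idxs[n_train:]
    return train, test


def average_accuracy(labels, evaluate, n_train=30, n_runs=5, seed=0):
    """Average the recognition rate returned by `evaluate(train, test)`
    over n_runs independent random splits."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_runs):
        train, test = split_per_class(labels, n_train, rng)
        rates.append(evaluate(train, test))
    return sum(rates) / len(rates)
```

With 40 images in each of four classes and 30 training images per class, every run trains on 120 images and tests on the remaining 40.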

Each facial image is divided into blocks of size $s \times s$. First, we studied the effect of the block size on the recognition accuracy. Figure 5 shows the recognition accuracy curve for different block sizes $s$. The accuracy peaks when the block size is 8; therefore, $s$ is set to 8.

To determine the value of $V$, that is, the number of visual words in the word set, we observed the relation between $V$ and the recognition accuracy, which is shown in Figure 6. Figure 6 reveals that the recognition accuracy rises at first as $V$ increases and, once $V$ is larger than or equal to 60, stabilizes at 0.922. As a result, $V$ is set to 60.

To examine the accuracy of our proposed pain recognition approach, we compare our method with three state-of-the-art approaches for pain recognition using the same data. The first method is “AAM + SVM” [4], which uses active appearance models (AAM) to extract face features and an SVM to classify pain. The second method is “Eigenimage” [11], which uses Eigenfaces for pain recognition. The third method is “SLPP + MKSVM” [12], which uses SLPP to extract features of pain expression and multiple kernel support vector machines (MKSVM) for recognition. 200 different expression images are used for this experiment. Some images show the same person in different moods. The recognition results are presented in the confusion matrices shown in Table 1, where A, B, C, and D indicate no pain, slight pain, moderate pain, and severe pain, respectively. Each cell in the confusion matrix contains the average results: our results are at the upper left, and the results of “AAM + SVM,” “Eigenimage,” and “SLPP + MKSVM” are at the upper right, the lower left, and the lower right, respectively. As Table 1 shows, our method improves the recognition accuracy in all categories. It achieves a 92.2% average recognition rate, whereas “AAM + SVM” obtains 81.2%, “Eigenimage” 82.5%, and “SLPP + MKSVM” 86.5%, as shown in Table 2. The reason is that we improve the recognition accuracy in both stages, pain feature extraction and pain expression recognition. In the feature extraction stage, we use motion features that are reliable with noisy image sequences and describe pain effectively, whereas the other methods use static features, which cannot effectively describe the expression of pain. In the recognition stage, we use the bag-of-words framework and the pLSA model to classify expression images. In addition, we make use of the class label information in the training images for learning the pLSA model, which improves the overall recognition accuracy.
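For reference, a confusion matrix and the per-class and average recognition rates of the kind reported in Tables 1 and 2 can be computed from true and predicted labels as in the sketch below. The function names are hypothetical, and the sketch does not reproduce the paper's data; it only shows the bookkeeping.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """matrix[r][c] counts samples of true class r predicted as class c."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_rates(matrix):
    """Recognition rate of each class: diagonal entry over its row sum."""
    rates = []
    for i, row in enumerate(matrix):
        total = sum(row)
        rates.append(row[i] / total if total else 0.0)
    return rates


def average_rate(matrix):
    """Unweighted mean of the per-class recognition rates."""
    rates = per_class_rates(matrix)
    return sum(rates) / len(rates)
```

When every class has the same number of test images, as in a balanced split, the unweighted mean of the per-class rates equals the overall fraction of correctly classified images.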

5. Conclusion

Pain recognition can provide significant advantages in patient care and cost reduction. In this paper, we present a novel method to recognize pain expression and give the pain level at the same time. The main contributions can be summarized as follows.

(1) Visual words are used to represent pain expression. An optical flow model is used to extract facial velocity features, which are then converted into visual words using the “bag-of-words” model.

(2) We use pLSA topic models for pain expression recognition. In our models, the “latent topics” directly correspond to the different pain expression categories. In addition, to improve the recognition accuracy, the class label information is used in learning the pLSA model.

(3) Experiments were performed on a pain expression dataset that we built to evaluate the proposed method. The experimental results show that the proposed method performs better than previous ones.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.


Acknowledgments

This work was supported by Research Foundation for Science & Technology Office of Hunan Province under Grant no. 2014FJ3057, by Hunan Provincial Education Science and “Twelve Five” planning issues (no. XJK012CGD022), by the Teaching Reform Foundation of Hunan Province Ordinary College under Grant no. 2012401544, and by the Foundation for Key Constructive Discipline of Hunan Province.