Abstract

The active appearance model (AAM) is a statistical parametric model that is widely used for extracting human facial features and for recognition. However, the intensity values used in the original AAM cannot provide enough information about image texture, which can lead to large errors or failed AAM fittings. To overcome these defects and improve the fitting performance of the AAM, an improved texture representation is proposed in this paper. First, a translation invariant wavelet transform is performed on the face images; the image structure is then represented by a measure obtained by fusing the low-frequency coefficients with edge intensity. Experimental results show that the improved algorithm increases the accuracy of AAM fitting and expresses more information about edge and texture structures.

1. Introduction

Facial feature extraction is an essential technique for image analysis and interpretation and is commonly used for face recognition, face reconstruction, and expression analysis [1, 2]. In recent years, finding precise and efficient methods for extracting facial features has attracted growing attention in pattern recognition. Wiskott et al. [3] utilized Gabor filters for face recognition and proposed the concept of the elastic graph for facial feature extraction. Beinglass and Wolfson [4] proposed an extraction method based on the generalized Hough transform, taking each clearly visible feature point as the reference point of the graph. Subsequently, new methods based on symmetry transforms and multiscale symmetry transforms were applied to facial feature extraction.

Currently, statistics-based flexible models, especially the active appearance model (AAM) proposed by Cootes et al. [5], have achieved great success in this area. The AAM is an efficient fitting algorithm for extracting features. The goal of the AAM search is to find the model parameters that generate a synthetic image as close as possible to a given input image, and to use the resulting AAM parameters for interpretation. Fitting the model to the target image is treated as a nonlinear optimization problem, so the fitting task requires a huge amount of computation when standard nonlinear optimization techniques, such as gradient descent, are used; such algorithms are inefficient because of this complexity. Matthews and Baker [6] combined the AAM with the Lucas-Kanade algorithm [7] to build an efficient AAM fitting method, which can be implemented very efficiently because the Hessian and Jacobian matrices can be precomputed.

However, the original formulation of the AAM, which uses image intensity values to model texture variations, is very sensitive to changes in imaging conditions such as lighting and camera parameters. Many researchers have pointed out this drawback and paid much attention to the texture representation. Wolstenholme and Taylor [8] introduced a wavelet-based method to represent the texture using as few dimensions as possible; this method efficiently reduces image dimensionality while maintaining fitting accuracy. Gao et al. [9] utilized Gabor filters to represent the image texture, and the Gabor representations led to more accurate fitting. In addition, other researchers proposed measures that include the direction of the gradient, the edges, and the corners of the training images at each pixel. Cootes and Taylor [10] proposed modeling the local orientation of structures using edge strength. Kittipanya-ngam and Cootes [11] proposed a representation of the image structure within a region that improves fitting performance. In short, all of the above methods can improve the fitting accuracy of the AAM.

All of the above improved AAM algorithms consider texture and local structure separately; here we propose a new representation that combines edge intensity with texture information. The edge intensity of an image reflects discontinuities in gray value and is less sensitive to imaging conditions. In this paper, a translation invariant wavelet transform is performed on the face images, and the image texture representation is then obtained from the low-frequency coefficients fused with edge intensity. Experimental results on the IMM and Weizmann databases show that the improved algorithm expresses more information about edge and texture structures, leading to more accurate fitting.

The remainder of this paper is organized as follows. Section 2 concisely introduces the background of the original AAM and the translation invariant wavelet transform. We then present the improved AAM algorithm in Section 3. Experimental results are given in Section 4, and conclusions are drawn in Section 5.

2. Background

2.1. Active Appearance Model (AAM)

The AAM is a statistical model comprising two aspects: model building and a fitting algorithm. The AAM learns the characteristics of objects by building a compact statistical model. This statistical model, which represents the shape and texture variations of the objects, is obtained by applying principal component analysis (PCA) to a set of labeled data.

In the AAM, an object is described by a set of landmarks that indicate important positions on the boundary of the object. Landmarks are labeled manually on each object in the training data. A shape $\mathbf{s}$ is defined by the coordinates of its $v$ vertices:
$$\mathbf{s} = \left(x_1, y_1, \ldots, x_i, y_i, \ldots, x_v, y_v\right)^T, \tag{2.1}$$
where $(x_i, y_i)$ is the coordinate of the $i$-th landmark in the object and $v$ is the number of landmarks in an image. Given a set of labeled shapes, we first align them into a unified frame by Procrustes analysis. PCA is then applied to extract shape eigenvectors, and a statistical shape model is constructed as
$$\mathbf{s} = \mathbf{s}_0 + \sum_{i=1}^{n} \mathbf{s}_i p_i, \tag{2.2}$$
where $\mathbf{s}_0$ is the mean shape, $\mathbf{s}_i$ is the $i$-th shape eigenvector corresponding to the $i$-th eigenvalue computed by PCA, and $p_i$ is the $i$-th shape parameter. This equation allows us to generate new shapes by varying the shape parameters.
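As a concrete illustration of (2.1) and (2.2), the shape model can be sketched as a plain PCA over aligned shape vectors. This is a minimal NumPy sketch, not the authors' implementation; the function names and the SVD route are our own choices:

```python
import numpy as np

def build_shape_model(shapes, n_modes):
    """Build the statistical shape model s = s0 + sum_i s_i * p_i via PCA.

    shapes: array of shape (num_samples, 2*v), each row the stacked
            (x1, y1, ..., xv, yv) coordinates of one aligned training shape.
    """
    s0 = shapes.mean(axis=0)                      # mean shape s0
    centered = shapes - s0
    # Eigenvectors of the sample covariance via SVD of the centered data.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    eigvecs = vt[:n_modes]                        # rows are s_1 ... s_n
    eigvals = singular_values[:n_modes] ** 2 / (len(shapes) - 1)
    return s0, eigvecs, eigvals

def synthesize_shape(s0, eigvecs, p):
    """Generate a new shape from shape parameters p, as in equation (2.2)."""
    return s0 + eigvecs.T @ p
```

Projecting a training shape onto all retained eigenvectors and synthesizing it back reconstructs the shape, which is a quick sanity check on the model.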

For each training face image, the number of pixels inside the shape contour differs, and the correspondence of pixels among different training images is uncertain. To obtain the required texture information between landmarks, a piecewise affine warp based on the Delaunay triangulation of the mean shape is used. We obtain a texture-vector set containing the texture information of each training image by warping the labeled shapes onto the mean shape. The texture-vector set is then normalized to remove the influence of global changes in pixel intensity. Next, PCA is applied to the normalized data, from which we derive the statistical appearance model:
$$\mathbf{A} = \mathbf{A}_0 + \sum_{i=1}^{m} \mathbf{A}_i \lambda_i, \tag{2.3}$$
where $\mathbf{A}_0$ is the mean appearance, the appearance vectors $\mathbf{A}_i$ $(i = 1, 2, \ldots, m)$ are the $m$ eigenvectors corresponding to the $m$ largest eigenvalues, and the coefficients $\lambda_i$ $(i = 1, 2, \ldots, m)$ are the appearance parameters.
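The piecewise affine warp can be illustrated for a single triangle of the triangulation: a pixel is mapped by computing its barycentric coordinates in the source triangle and reusing them in the destination triangle. This is a hypothetical minimal sketch; a full warp would loop over all Delaunay triangles of the mean shape:

```python
import numpy as np

def barycentric(p, tri):
    """Barycentric coordinates of point p inside triangle tri (3x2 array)."""
    a, b, c = tri
    T = np.column_stack((b - a, c - a))   # 2x2 matrix of triangle edges
    beta, gamma = np.linalg.solve(T, p - a)
    return np.array([1.0 - beta - gamma, beta, gamma])

def warp_point(p, src_tri, dst_tri):
    """Map p from the source triangle to the destination triangle by
    reusing its barycentric coordinates (one piece of the affine warp)."""
    w = barycentric(p, np.asarray(src_tri, float))
    return w @ np.asarray(dst_tri, float)
```

Because barycentric coordinates are affine invariants, vertices map to vertices and interior points move consistently, which is exactly what makes the per-triangle warp affine.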

The AAM based on the inverse compositional algorithm is an efficient fitting algorithm that evolved from the Lucas-Kanade algorithm. The goal of the Lucas-Kanade algorithm is to find the location of a constant template image in an input image by minimizing the error between the template and the input:
$$\sum_{x} \left[ I(W(x; p)) - A_0(x) \right]^2, \tag{2.4}$$
where $A_0(x)$ is the template image and $I(W(x; p))$ is the warped input image. The Lucas-Kanade algorithm minimizes (2.4) with respect to $\Delta p$ and then updates the warp using
$$p \leftarrow p + \Delta p. \tag{2.5}$$

The inverse compositional algorithm differs from the Lucas-Kanade algorithm in that it updates the warp using (2.6) and minimizes (2.7):
$$W(x; p) \leftarrow W(x; p) \circ W(x; \Delta p)^{-1}, \tag{2.6}$$
$$\sum_{x} \left[ I(W(x; p)) - A_0(W(x; \Delta p)) \right]^2, \tag{2.7}$$

where $\circ$ denotes composition of warps.

Taking the first-order Taylor series expansion of (2.7) gives
$$\sum_{x} \left[ I(W(x; p)) - A_0(W(x; 0)) - \nabla A_0 \frac{\partial W}{\partial p} \Delta p \right]^2, \tag{2.8}$$

where $A_0(W(x; 0))$ is the texture information of the mean shape. Minimizing (2.8) is a least-squares problem; setting its derivative to zero yields
$$\Delta p = H^{-1} \sum_{x} \left[ \nabla A_0 \frac{\partial W}{\partial p} \right]^T \left[ I(W(x; p)) - A_0(x) \right], \tag{2.9}$$

where $H$ is the (Gauss-Newton) Hessian matrix computed from the template $A_0$:
$$H = \sum_{x} \left[ \nabla A_0 \frac{\partial W}{\partial p} \right]^T \left[ \nabla A_0 \frac{\partial W}{\partial p} \right]. \tag{2.10}$$

In the AAM based on the inverse compositional algorithm, the template $A_0$ is constant and the Jacobian and Hessian matrices do not depend on the current warp parameters $p$, so the relevant computation can be moved to a precomputation step and performed only once. It is therefore a more efficient fitting algorithm than the traditional AAM.
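The precomputation and the per-iteration update of (2.9) and (2.10) can be sketched as follows. This is an illustrative NumPy sketch, assuming the template gradient and warp Jacobian are supplied as arrays; the names are our own:

```python
import numpy as np

def precompute(grad_A0, dW_dp):
    """Precompute the steepest-descent images and the Hessian of (2.10).

    grad_A0: (num_pixels, 2) gradient of the template A0 at each pixel.
    dW_dp:   (num_pixels, 2, n_params) warp Jacobian dW/dp at each pixel.
    Both depend only on the template, so this runs once.
    """
    # Steepest-descent images grad(A0) * dW/dp, shape (num_pixels, n_params).
    sd = np.einsum('xk,xkp->xp', grad_A0, dW_dp)
    H = sd.T @ sd                      # Hessian matrix, equation (2.10)
    return sd, np.linalg.inv(H)

def update_step(sd, H_inv, warped_image, A0):
    """One inverse-compositional parameter update, equation (2.9)."""
    error = warped_image - A0          # I(W(x;p)) - A0(x), flattened over x
    return H_inv @ (sd.T @ error)
```

Per iteration, only the image is re-warped and `update_step` is applied; `sd` and `H_inv` never change, which is the source of the algorithm's efficiency.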

2.2. Translation Invariant Wavelet Transform (TIWT)

The wavelet transform is a powerful tool for the analysis of functions, with many applications in image processing. With this method, the characteristics of a function are separated into wavelet coefficients at different scales. Kourosh and Hamid [12] proposed using the energy at different scales to describe image texture via the wavelet transform, but this approach cannot fully characterize the texture features of an image because the transform is not rotation invariant. Xiong et al. [13] constructed a translation- and scale-invariant wavelet transform. One disadvantage of the classical wavelet transform is that it is not translation invariant, so the filtering results can exhibit visual distortion. To overcome these defects, Berkner [14] proposed an approach based on the translation invariant wavelet transform (TIWT). The resulting transform is no longer orthogonal and contains redundant information.

The TIWT representation of image texture is obtained by convolving TIWT functions with the image. After this convolution, the image is decomposed into several components (images), each being the magnitude of the output obtained by convolving the image with one TIWT function. As an example, a face image decomposed into 3 scales via the TIWT with the Haar basis yields 10 components (images); the result of this decomposition is shown in Figure 1.

As Figure 1 shows, each component has the same dimensions as the original image. The last component holds the low-frequency coefficients, and all other components hold high-frequency coefficients. The low-frequency coefficients contain most of the energy in the image and constitute its most important visual component. They represent the approximation of the image and contain its contour information, while the high-frequency coefficients represent the details. The low-frequency coefficients therefore provide a much clearer image representation than the high-frequency coefficients.
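One level of such a decomposition can be sketched with an undecimated Haar transform, as follows. This is an illustrative sketch with circular boundary handling; the exact TIWT implementation may differ. Iterating on the LL band over 3 scales gives 3 detail bands per scale plus the final LL band, i.e., the 3 × 3 + 1 = 10 same-size components of Figure 1:

```python
import numpy as np

def haar_tiwt_level(img):
    """One level of an undecimated (translation invariant) Haar transform.

    Returns four subbands (LL, LH, HL, HH), each the same size as the
    input image, since no downsampling is performed.
    """
    def lo(a, axis):   # undecimated Haar low-pass (circular boundary)
        return (a + np.roll(a, -1, axis=axis)) / 2.0
    def hi(a, axis):   # undecimated Haar high-pass
        return (a - np.roll(a, -1, axis=axis)) / 2.0
    LL = lo(lo(img, 0), 1)   # low-frequency (approximation) coefficients
    LH = hi(lo(img, 0), 1)   # high-frequency detail bands
    HL = lo(hi(img, 0), 1)
    HH = hi(hi(img, 0), 1)
    return LL, LH, HL, HH
```

With this normalization the four subbands sum back to the original image, and a constant image produces zero detail bands, matching the observation that the low-frequency band carries the approximation and the high-frequency bands carry the details.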

3. Improved AAM Algorithm

A texture based on the translation invariant wavelet transform can provide enough information for model building. Rather than representing the image structure using intensity values, we use a measure that combines texture information with edge intensity; this measure is less sensitive to imaging conditions. In this section, the improved AAM method is described in detail.

Given a face image in the training set, we first obtain the low-frequency coefficients of the image via the translation invariant wavelet transform. The low-frequency coefficients are then fused with edge intensity, using the edge-intensity-based fusion method proposed in [15]. Let $C(i, j)$ denote the low-frequency coefficients obtained via the translation invariant wavelet transform; the edge intensity $E(i, j)$ is computed as
$$E(i, j) = \left[ C(i, j) * M_1 \right]^2 + \left[ C(i, j) * M_2 \right]^2 + \left[ C(i, j) * M_3 \right]^2,$$
$$M_1 = \begin{pmatrix} -1 & -1 & -1 \\ 2 & 2 & 2 \\ -1 & -1 & -1 \end{pmatrix}, \quad M_2 = \begin{pmatrix} -1 & 2 & -1 \\ -1 & 2 & -1 \\ -1 & 2 & -1 \end{pmatrix}, \quad M_3 = \begin{pmatrix} -1 & 0 & -1 \\ 0 & 4 & 0 \\ -1 & 0 & -1 \end{pmatrix}, \tag{3.1}$$
where $*$ denotes convolution, $i = 1, 2, \ldots, M$ and $j = 1, 2, \ldots, N$, and $M$ and $N$ are the image dimensions. From (3.1), we can see that the edge intensity reflects the edge information of the image in the horizontal, vertical, and diagonal directions. Edge-intensity-based texture representations tend to be less sensitive to imaging conditions. Figure 2 shows the texture representation given by the low-frequency coefficients alone and by the low-frequency coefficients fused with edge intensity.
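The edge-intensity measure of (3.1) can be sketched as follows. This is an illustrative NumPy sketch: `conv3x3` is a hypothetical helper implementing "same"-size convolution with zero padding, and the mask signs are those commonly used for horizontal, vertical, and diagonal line-detection masks:

```python
import numpy as np

# Direction-sensitive 3x3 masks of equation (3.1):
# horizontal, vertical, and diagonal line detectors (each sums to zero).
M1 = np.array([[-1, -1, -1], [ 2,  2,  2], [-1, -1, -1]], float)
M2 = np.array([[-1,  2, -1], [-1,  2, -1], [-1,  2, -1]], float)
M3 = np.array([[-1,  0, -1], [ 0,  4,  0], [-1,  0, -1]], float)

def conv3x3(C, M):
    """'Same'-size 2D convolution of C with a 3x3 mask (zero padding)."""
    P = np.pad(C, 1)
    out = np.zeros(C.shape, dtype=float)
    for di in range(3):
        for dj in range(3):
            # M is flipped in both axes, as true convolution requires.
            out += M[2 - di, 2 - dj] * P[di:di + C.shape[0],
                                         dj:dj + C.shape[1]]
    return out

def edge_intensity(C):
    """Edge intensity E(i, j) of equation (3.1) for low-frequency
    coefficients C: the sum of squared responses to the three masks."""
    return conv3x3(C, M1) ** 2 + conv3x3(C, M2) ** 2 + conv3x3(C, M3) ** 2
```

Because each mask sums to zero, flat regions of `C` produce zero edge intensity, so `E` responds only to gray-value discontinuities, as intended.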

We represent the image structure using the low-frequency coefficients fused with edge intensity. The statistical shape model is built as in the original AAM. To build the appearance model, the texture information is sampled from the low-frequency coefficients fused with edge intensity; all other calculations remain the same as in the original AAM.

When fitting the model, the texture of the target image is sampled in the same way as when building the appearance model. We then search for the model parameters of the AAM instance most similar to the target image; precise facial features can be extracted once the model has converged to the target image.

4. Experimental Results

4.1. Database

We conduct the following experiments on two databases: the IMM face database [16] and the Weizmann face database [17]. The IMM face database consists of 240 annotated images of 40 different human faces, each labeled manually with 58 landmarks. The Weizmann face database consists of 1260 images of 28 subjects under 5 different poses, 3 illuminations, and 3 expressions, each labeled manually with 56 landmarks.

4.2. Fitting Accuracy Comparison

To verify the fitting performance of the improved AAM, experiments are carried out on the above-mentioned databases. Texture representations based on the translation invariant wavelet transform are used to build the appearance model. Figure 3 shows the shape and texture models on the IMM database, and Figure 4 compares the fitting of the AAM and the improved AAM on the IMM database. From Figure 4, we can see that the improved AAM fits the facial contour more accurately than the original AAM, because the improved method adds edge intensity and thus provides more information from the training images.

To examine the fitting accuracy of the different AAM methods, we compute the mean displacement error $E$ between the hand-labeled landmarks and the fitted model points:
$$E = \frac{1}{M} \sum_{i=1}^{M} \frac{1}{N} \sum_{j=1}^{N} \operatorname{dist}\left(p_{ij}, p'_{ij}\right), \tag{4.1}$$
where $M$ is the number of test face images, $N$ is the number of labeled landmarks, $p_{ij}$ is the labeled coordinate, $p'_{ij}$ is the coordinate after model fitting, and $\operatorname{dist}(p_{ij}, p'_{ij})$ is the Euclidean distance between the two points. Figures 5 and 6 show the mean displacement against the number of iterations for the different AAM models on the IMM and Weizmann databases, respectively. From Figure 5, we can see that the improved AAM has a much lower mean displacement than the original AAM, and Figure 6 tells the same story. Both the precision of feature extraction and the fitting speed are improved by the improved AAM.
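Since $N$ is the same for every image, the nested averages of (4.1) reduce to a single mean of Euclidean distances over all landmarks and images; the array layout below is our own choice:

```python
import numpy as np

def mean_displacement(labeled, fitted):
    """Mean displacement E of equation (4.1).

    labeled, fitted: arrays of shape (M_images, N_landmarks, 2) holding
    hand-labeled and model-fitted landmark coordinates, respectively.
    """
    dists = np.linalg.norm(labeled - fitted, axis=-1)  # Euclidean per point
    return dists.mean()                                # average over j, then i
```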

We can also calculate the overall improvement of the improved AAM over the original AAM as a percentage:
$$I = \frac{E_{\text{original AAM}} - E_{\text{improved AAM}}}{E_{\text{original AAM}}} \times 100\%. \tag{4.2}$$
Tables 1 and 2 compare the fitting of the original AAM and the improved AAM on the IMM and Weizmann databases, respectively. The results in both tables show that the improved AAM has a lower mean displacement than the original AAM, with overall improvements of about 6.73% and 8.28% on the two databases. Because a new texture representation is used, the improved AAM provides more texture information for model fitting than intensity values do; in particular, the new representation is fused with edge intensity, which is less sensitive to imaging conditions. The improved method thus increases both the fitting accuracy and the fitting speed.
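The improvement percentage of (4.2) is then a one-line computation on the two mean displacements:

```python
def improvement_percent(e_original, e_improved):
    """Overall improvement I of equation (4.2), in percent."""
    return (e_original - e_improved) / e_original * 100.0
```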

5. Conclusions

As an important statistical modeling method, the active appearance model has a wide range of applications in human facial feature extraction. However, the intensity values used in the original AAM cannot provide enough information for model fitting. To overcome these defects, a new texture representation is proposed in this paper: a translation invariant wavelet transform is used to obtain the low-frequency coefficients, which are then fused with edge intensity, and the edge intensity is expressed as texture information at each pixel. The new texture representation is less sensitive to imaging conditions. Experimental results show that the improved algorithm increases the accuracy and exhibits better fitting performance.

Acknowledgments

This work was supported in part by the following projects: Key Grant Project of the Chinese Ministry of Education (Grant no. 311024), National Natural Science Foundation of China (Grant nos. 60973094, 61103128), and Fundamental Research Funds for the Central Universities (Grant no. JUSRP31103).