Face verification in the presence of age progression is an important problem that has not been widely addressed. In this paper, we propose to use the active appearance model (AAM) and the gradient orientation pyramid (GOP) feature representation for this problem. First, we apply the AAM to the dataset and generate the AAM images; we then compute the gradient orientation representation over a hierarchical (pyramid) model, which yields the GOP. Combined with a support vector machine (SVM), the approach performs well in experiments on two public domain face aging datasets, FGNET and MORPH. Second, we compare the proposed method with a number of related face verification methods; the results show that the new approach is more robust and performs better.

1. Introduction

1.1. Background

Face verification is an important yet challenging problem in computer vision with a wide range of applications, such as surveillance, access control systems, image retrieval, and human computer interaction. Despite decades of study on face image analysis, age related facial image analysis has not been extensively studied until recently. Most of the research effort has focused on robustness to different imaging conditions, such as illumination change, pose variation, and expression. Published approaches to age invariant face recognition are limited. Most of the available algorithms dealing with the facial aging problem focus on age estimation [1–7] and aging simulation [8–10]. One of the successful approaches to age invariant face recognition is to build a 2D or 3D generative model for face aging [11]; the aging model can be used to compensate for the aging process in face matching or age estimation, as illustrated in Figure 1. Only a few previous works have applied age progression to face verification tasks. Ramanathan and Chellappa [12] used a face growing model for face verification tasks for people under the age of eighteen. Such methods assume that ages are known, which limits their application, since ages are often not available. When comparing two photos, these methods either transform one photo to have the same age as the other or transform both to reduce the aging effects. While model-based methods have been shown to be effective in age invariant face recognition, they have some limitations, such as the difficulty of constructing face models, the need for additional information about the training faces, and the requirement for controlled conditions (e.g., frontal pose, normal illumination, and neutral expression). Unfortunately, such constraints are not easy to satisfy in practice. Biswas et al. [13] studied feature drifting on face images at different ages and applied it to face verification tasks.
Other studies that used age transformation for verification include [14–17]. Ling et al. [18] used the gradient orientation pyramid for feature representation, combined with a support vector machine, to verify faces across age progression, and showed how different kinds of image information influence matching. Li et al. [19] proposed a discriminative model called MFDA to address face matching in the presence of age variation. In a recent work [20], Sungatullina et al. proposed a new multiview discriminative learning (MDL) method with three different types of local feature representations for age-invariant face recognition.

1.2. Contribution

This paper makes two contributions. First, we propose using the active appearance model (AAM) together with the gradient orientation pyramid (GOP) for face verification. We show that, when combined with the support vector machine (SVM), the feature demonstrates excellent performance for face verification with age gaps. This is mainly motivated by the good performance of the gradient orientation pyramid shown in [18]. The gradient orientation is robust to aging processes under some flexible conditions, and the pyramid technique adds hierarchical information that further improves the performance. Given an AAM image pair obtained with the active appearance model [21], we use the gradient orientation pyramid to build the feature vector. Finally, similar to the procedure in [22], we combine the feature vector with an SVM classifier for face verification. Second, we assess seven different methods on the task, including two benchmark approaches ($L_2$ norm and gradient orientation) and five different representations within the same SVM-based framework (intensity difference, gradient with magnitude, gradient orientation, gradient orientation pyramid, and active appearance model with gradient orientation pyramid). Thorough empirical experiments are carried out on two large public aging datasets: FGNET and MORPH.

The rest of the paper is organized as follows. Section 2 describes the AAM. Section 3 introduces the gradient orientation pyramid. Section 4 reports the verification experiments on the two datasets mentioned above. Finally, Section 5 summarizes and discusses the paper.

2. Active Appearance Model (AAM)

The active appearance model (AAM) is a feature point extraction method that is widely used in the field of pattern recognition. Cootes first proposed the active shape model (ASM) [21], but the ASM largely ignores the texture (color and gray value) information of images. The AAM was proposed subsequently; in establishing the face model, the AAM-based facial feature localization method considers not only local features but also the global shape and texture information. Through statistical analysis of the shape and texture features of human faces, a combined face model is established, which is the final AAM.

First, we use 68 feature points to establish the shape model. We then normalize the shapes to eliminate the effect of other factors and apply PCA to the normalized shapes. For $N$ normalized training shapes $x_1, \ldots, x_N$, the average sample is

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i. \tag{1}$$

The covariance matrix of the training samples is

$$S = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^{T}. \tag{2}$$

From the covariance matrix we obtain the shape model, in which the AAM model parameter $c$ controls the shape. The model shape is represented as

$$x = \bar{x} + Q_s c, \tag{3}$$

where $\bar{x}$ is the average vector of the model shape and $Q_s$ is the matrix describing the modes of variation derived from the training set.
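The PCA step described above can be illustrated with a minimal NumPy sketch. It assumes the landmark shapes have already been aligned and flattened into row vectors; the function names, the `var_kept` threshold, and the $1/N$ covariance normalization are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def build_shape_model(shapes, var_kept=0.95):
    """PCA shape model from aligned shapes of size (N, 2k), k landmarks each.

    Returns the mean shape, the retained modes (columns), and their eigenvalues.
    var_kept is a hypothetical threshold on the retained variance.
    """
    shapes = np.asarray(shapes, dtype=float)
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    cov = centered.T @ centered / shapes.shape[0]  # 1/N normalization (assumption)
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]              # reorder to descending
    eigvals = np.clip(eigvals[order], 0.0, None)   # remove tiny negative round-off
    eigvecs = eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    t = int(np.searchsorted(ratio, var_kept) + 1)  # smallest t reaching var_kept
    return mean, eigvecs[:, :t], eigvals[:t]

def shape_from_params(mean, modes, c):
    """Synthesize a shape as mean + modes @ c."""
    return mean + modes @ c

def params_from_shape(mean, modes, shape):
    """Project a shape onto the retained modes (modes are orthonormal)."""
    return modes.T @ (np.asarray(shape, dtype=float) - mean)
```

With 68 landmarks each shape vector would have 136 entries; varying one entry of `c` at a time and calling `shape_from_params` shows what each mode of variation does to the face outline.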

Based on the normalized shape model, we apply the Delaunay triangulation algorithm (shown in Figure 2) and the affine transformation to get the texture model.

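Let $T$ and $T'$ be corresponding triangles in the two triangular nets, with vertices $p_1$, $p_2$, $p_3$ and $p_1'$, $p_2'$, $p_3'$, respectively. Any point $p$ in a triangle can be written as

$$p = \alpha p_1 + \beta p_2 + \gamma p_3, \tag{4}$$

where $\alpha$, $\beta$, and $\gamma$ are adaptive parameters, $\alpha + \beta + \gamma = 1$, and $0 \le \alpha, \beta, \gamma \le 1$. If the point $p = (x, y)^{T}$ is in $T$, with $p_i = (x_i, y_i)^{T}$, we can get

$$\begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix} = \begin{pmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ 1 & 1 & 1 \end{pmatrix}^{-1} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}. \tag{5}$$

Then we can get the corresponding point $p'$ in $T'$ as follows:

$$p' = \alpha p_1' + \beta p_2' + \gamma p_3'. \tag{6}$$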
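The triangle-to-triangle mapping can be sketched in a few lines of plain Python. The sketch computes the three weights of a point with respect to a source triangle and reuses them on the destination triangle; the function names and the tolerance `eps` are illustrative assumptions.

```python
def barycentric(p, a, b, c):
    """Weights (alpha, beta, gamma) such that p = alpha*a + beta*b + gamma*c."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    alpha = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    beta = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return alpha, beta, 1.0 - alpha - beta

def inside(weights, eps=1e-12):
    """A point lies in the triangle when all three weights are in [0, 1]."""
    return all(-eps <= w <= 1.0 + eps for w in weights)

def map_point(p, src, dst):
    """Map p from triangle src to triangle dst using the same weights."""
    alpha, beta, gamma = barycentric(p, *src)
    (x1, y1), (x2, y2), (x3, y3) = dst
    return (alpha * x1 + beta * x2 + gamma * x3,
            alpha * y1 + beta * y2 + gamma * y3)
```

Applying `map_point` to every pixel of every Delaunay triangle gives a piecewise affine warp, which is one common way to bring a face texture into the normalized shape frame.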

By establishing this one-to-one mapping, we can get the model texture representation

$$g = \bar{g} + Q_g c, \tag{7}$$

where $\bar{g}$ is the average vector of the model texture and $Q_g$ is the matrix describing the modes of variation derived from the training set.

Once the AAM images are obtained, we normalize them to a fixed size in pixels and convert them to gray scale. Because the normalized images are independent of the shape information, only the texture information of the original images is used in the subsequent feature extraction process. The final AAM image we use is shown in Figure 3.

By using the AAM, we can reduce the impact of age variation in face verification, for the following reasons.
(1) In the AAM process, the face pose is corrected, so the effect of pose is nearly eliminated.
(2) The differences among different people are reduced by normalizing the shape model.
(3) Because the shape model is normalized, the resulting texture models almost ignore the shape information, and mainly the texture information is used in the feature extraction process.
(4) To verify reason (3), in Section 4 we conduct experiments on the effect of shape and texture representations; the results show that the texture feature is more useful than the shape feature in face verification across age progression.

3. Gradient Orientation Pyramid (GOP)

Motivated by the robustness of the gradient orientation pyramid (GOP) reported in [18], we use it as the feature descriptor for face verification across age progression in our experiments. After we obtain the AAM images, given an AAM image $I(p)$, where $p = (x, y)$ indicates pixel locations, we first define the pyramid of $I$ as

$$\mathcal{P}(I) = \{ I(p; \sigma) \}_{\sigma=0}^{s}, \quad I(p; 0) = I(p), \quad I(p; \sigma) = [ I(p; \sigma - 1) * \Phi(p) ] \downarrow_2, \quad \sigma = 1, \ldots, s, \tag{8}$$

where $\Phi(p)$ is the Gaussian kernel, $\downarrow_2$ denotes half size down-sampling, and $s$ is the number of pyramid layers. Note that in (8) the symbol $I$ is used for both the original image and the images at different scales for convenience.

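Then, the gradient orientation at each scale $\sigma$ is defined by the normalized gradient vector at each pixel as

$$g(I(p; \sigma)) = \begin{cases} \dfrac{\nabla I(p; \sigma)}{|\nabla I(p; \sigma)|}, & \text{if } |\nabla I(p; \sigma)| > \tau, \\ (0, 0)^{T}, & \text{otherwise}, \end{cases} \tag{9}$$

where $\tau$ is a threshold for dealing with “flat” pixels. The gradient orientation pyramid (GOP) of $I$ is naturally defined as

$$\mathcal{G}(I) = \operatorname{stack}\left( \{ g(I(p; \sigma)) \}_{p, \sigma} \right) \in \mathbb{R}^{d \times 2}, \tag{10}$$

which maps $I$ to a $d \times 2$ representation, where $\operatorname{stack}(\cdot)$ stacks the gradient orientations of all pixels across all scales and $d$ is the total number of pixels. Figure 4 illustrates the computation of a GOP from an input image.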
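The pyramid construction and the thresholded gradient normalization can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the authors' implementation: the Gaussian standard deviation, the kernel radius, the threshold value, the edge padding, and the use of `np.gradient` for the image gradient are all choices made here for the example.

```python
import numpy as np

def gaussian_kernel1d(sigma=0.5, radius=2):
    """Normalized 1-D Gaussian kernel (sigma and radius are assumed values)."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth(img, kernel):
    """Separable Gaussian smoothing with edge padding; output keeps the input size."""
    r = len(kernel) // 2
    padded = np.pad(img, r, mode="edge")
    conv = lambda v: np.convolve(v, kernel, mode="valid")
    rows = np.apply_along_axis(conv, 1, padded)
    return np.apply_along_axis(conv, 0, rows)

def pyramid(img, layers=2, sigma=0.5):
    """Level 0 is the image; each further level is smoothed and half-size down-sampled."""
    levels = [np.asarray(img, dtype=float)]
    kernel = gaussian_kernel1d(sigma)
    for _ in range(layers):
        levels.append(smooth(levels[-1], kernel)[::2, ::2])
    return levels

def gradient_orientation(img, tau=1e-3):
    """Unit gradient vector per pixel, or (0, 0) where the gradient magnitude is <= tau."""
    gy, gx = np.gradient(img)                 # derivatives along rows, then columns
    g = np.stack([gx.ravel(), gy.ravel()], axis=1)
    mag = np.linalg.norm(g, axis=1, keepdims=True)
    out = np.zeros_like(g)
    np.divide(g, mag, out=out, where=mag > tau)
    return out

def gop(img, layers=2, sigma=0.5, tau=1e-3):
    """Stack the gradient orientations of all pixels across all scales into a (d, 2) array."""
    return np.vstack([gradient_orientation(level, tau)
                      for level in pyramid(img, layers, sigma)])
```

For an 8 by 8 input with two extra layers, the levels have 64, 16, and 4 pixels, so the stacked representation has 84 rows. Because only the direction of the gradient is kept, a global change in image contrast leaves the representation unchanged wherever the gradient stays above the threshold.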

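Given an AAM image pair $(I_1, I_2)$ and the corresponding GOPs $(G_1 = \mathcal{G}(I_1), G_2 = \mathcal{G}(I_2))$, the feature vector $x = F(I_1, I_2)$ is computed as the cosines of the differences between gradient orientations at all pixels over all scales:

$$x = F(I_1, I_2) = (G_1 \odot G_2)\,\mathbf{1}, \tag{11}$$

where $\odot$ is the element-wise product and $\mathbf{1} = (1, 1)^{T}$ sums the two columns, so that each entry of $x$ is the inner product of two unit gradient vectors. Next, a Gaussian kernel is applied to the extracted features for use with the SVM. Specifically, our kernel is defined as

$$K(x_1, x_2) = \exp\left( -\gamma \, \| x_1 - x_2 \|^{2} \right), \tag{12}$$

where $\gamma$ is a parameter determining the size of the RBF kernels.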
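A short NumPy sketch of the pair feature and the Gaussian kernel follows. It takes two stacked gradient orientation arrays with one row per pixel and two columns; the function names and the default kernel parameter are assumptions for illustration.

```python
import numpy as np

def pair_feature(g1, g2):
    """Row-wise dot product of two (d, 2) gradient orientation arrays.

    For unit vectors this is the cosine of the angle between the two
    orientations; it is 0 wherever either pixel was marked flat.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    if g1.shape != g2.shape:
        raise ValueError("both representations must have the same shape")
    return (g1 * g2).sum(axis=1)

def rbf_kernel(x1, x2, gamma=0.1):
    """Gaussian (RBF) kernel exp(-gamma * ||x1 - x2||^2); gamma is an assumed value."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.exp(-gamma * (diff @ diff)))
```

Each feature entry lies in the range from -1 to 1, so pairs of the same person under similar conditions tend to produce vectors with many entries close to 1.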

4. Face Verification Experiments

In this section, we report experimental results obtained on the FGNET and MORPH databases by comparing our algorithm with a number of related face verification methods. While there are several public domain face datasets, only a few are constructed specifically for the aging problem. The lack of a large face aging database has until recently limited research on age invariant face verification. There are two desired attributes of a face aging database: (i) a large number of subjects and (ii) a large number of face images per subject captured at many different ages. In addition, it is desirable that these images not have large variations in pose, expression, and illumination. These conditions, however, are hard to satisfy in the real world. Hence, it is crucial to design an appropriate feature representation scheme that is tolerant to such variations.

4.1. Experimental Classifier and Evaluation

We model face verification as a two-class classification problem. Given an input image pair $I_1$ and $I_2$, the task is to label the pair as either intrapersonal (i.e., $I_1$ and $I_2$ are from the same person) or extrapersonal (i.e., $I_1$ and $I_2$ are from different individuals). We use a support vector machine (SVM). Specifically, given an image pair $(I_1; I_2)$, it is first mapped onto the feature space as $x = F(I_1, I_2)$, where $x \in \mathbb{R}^{d}$ is the feature vector extracted from the image pair through the feature extraction function $F : \mathcal{I} \times \mathcal{I} \to \mathbb{R}^{d}$, $\mathcal{I}$ is the set of all images, and $\mathbb{R}^{d}$ forms the $d$-dimensional feature space.

The SVM is then used to divide the feature space into two classes, one for intrapersonal pairs and the other for extrapersonal pairs. Using the same terminology as in [20], we denote the separating boundary with the following equation:

$$\sum_{i=1}^{N_s} \alpha_i y_i K(s_i, x) + b = \Delta, \tag{13}$$

where $N_s$ is the number of support vectors, $s_i$ is the $i$th support vector, and $\alpha_i$, $y_i$, and $b$ are the learned coefficients, the class labels of the support vectors, and the bias. The parameter $\Delta$ is used to balance the correct rejection rate and the correct acceptance rate as described in (14), and $K(\cdot, \cdot)$ is the kernel function that provides the SVM with nonlinear abilities. In our experiments, we use the widely adopted LIBSVM library [23].
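To make the role of the threshold concrete, the following NumPy sketch evaluates a kernel SVM decision value by hand and compares it with an adjustable offset. It is not LIBSVM code: the dictionary layout of `model`, the function names, and the convention that larger values mean "intrapersonal" are assumptions for illustration.

```python
import numpy as np

def decision_value(x, support_vectors, dual_coefs, bias, gamma):
    """Sum over support vectors of coef_i * K(s_i, x), plus the bias.

    dual_coefs[i] stands for the product of the learned coefficient and the
    label of support vector i; K is the RBF kernel with parameter gamma.
    """
    sv = np.asarray(support_vectors, dtype=float)
    sq_dist = ((sv - np.asarray(x, dtype=float)) ** 2).sum(axis=1)
    return float(np.dot(dual_coefs, np.exp(-gamma * sq_dist)) + bias)

def is_intrapersonal(x, model, delta=0.0):
    """Accept the pair when the decision value reaches the offset delta (assumed convention)."""
    return decision_value(x, **model) >= delta
```

Raising `delta` makes the classifier reject more pairs, which trades acceptance of true matches for rejection of impostors; sweeping it traces out an operating curve.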

For verification tasks, we use two popular criteria, the correct rejection rate (CRR) and the correct acceptance rate (CAR):

$$\mathrm{CRR} = \frac{\#\,\text{correctly rejected extrapersonal pairs}}{\#\,\text{total extrapersonal pairs}}, \qquad \mathrm{CAR} = \frac{\#\,\text{correctly accepted intrapersonal pairs}}{\#\,\text{total intrapersonal pairs}}, \tag{14}$$

where “accept” indicates that the input image pair is from the same subject and “reject” indicates the opposite. The performance of algorithms is evaluated using CRR-CAR curves, which are usually created by varying some classifier parameter. In our experiments, the CRR-CAR curve is created by adjusting the parameter $\Delta$ in (13). In addition, the equal error rate (EER), defined as the error rate at the operating point where CAR and CRR are equal, is frequently used to measure verification performance. The lower the EER, the better the performance.
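The two rates and an approximate equal error rate can be computed from classifier scores as sketched below in plain Python. The label coding (1 for intrapersonal, 0 for extrapersonal), the "accept when the score reaches the threshold" rule, and the approximation of the EER by the threshold where the two rates are closest are assumptions of this example.

```python
def car_crr(scores, labels, delta):
    """Correct acceptance and correct rejection rates at threshold delta."""
    intra = [s for s, y in zip(scores, labels) if y == 1]
    extra = [s for s, y in zip(scores, labels) if y == 0]
    car = sum(s >= delta for s in intra) / len(intra)
    crr = sum(s < delta for s in extra) / len(extra)
    return car, crr

def equal_error_rate(scores, labels):
    """Approximate EER: error rate at the threshold where CAR and CRR are closest."""
    best_gap, best_eer = None, None
    for delta in sorted(set(scores)):
        car, crr = car_crr(scores, labels, delta)
        gap = abs(car - crr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, 1.0 - (car + crr) / 2.0
    return best_eer
```

With finitely many pairs the two rates rarely coincide exactly, which is why the sketch averages them at the closest threshold; interpolating along the curve is an alternative.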

4.2. Experimental Compared Approaches

We compare the following approaches.
(i) SVM + AAM + GOP: this is the approach proposed in this paper.
(ii) SVM + GOP: this is the method using GOP in [18].
(iii) SVM + GO: this is SVM + GOP without the hierarchical model.
(iv) SVM + G: this is similar to SVM + GO, except that the gradient (G) is used instead of the gradient orientation.
(v) SVM + diff: as in [22], we use the differences of normalized images as input features combined with SVM.
(vi) GO: this is the method using gradient orientation proposed in [24].
(vii) $L_2$: this is a benchmark approach that uses the $L_2$ norm to compare two normalized images.

To avoid a large discrepancy between the original images and the AAM images, the images in the compared experiments are preprocessed using the same scheme as in [18]. This includes manual eye location labeling, alignment by eyes, and cropping with an elliptic region, where the size of the elliptic region approximately matches the black border of the AAM images. Figure 5 shows the preprocessing used in the compared experiments.

4.3. Experiments on the FGNET Dataset

Although several public domain face datasets exist for face verification, only a few are constructed specifically for the aging problem, and the lack of a large face aging database has until recently limited research on age invariant face recognition. The FGNET Aging database [25] is a widely used and important dataset for face recognition in the presence of age progression. It contains 1002 facial images of 82 persons, an average of 12 images per subject, and all the images in the database are annotated with landmark points and age information, with ages ranging from 0 to 69 years. Figure 6 shows some examples from the FGNET database.

The appearance changes of human faces are very different in children than in adults, as shown in [26]. In this experiment we therefore use only the subset of the FGNET database containing images taken at age 18 or above, which is consistent with the studies in [12, 18]. This subset contains 272 facial images of 62 persons.

For verification tasks, we generate 665 intrapersonal image pairs by collecting all image pairs from the same subjects. Extrapersonal pairs are randomly selected from images of different subjects. Because the number of images is small, we use only three-fold cross validation, arranged such that samples from the same subject never appear in both training and testing pairs; each fold contains 220 intrapersonal pairs and 2000 extrapersonal pairs. The results are shown in Figure 7 and Table 1.
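The pair generation and the subject-disjoint split can be sketched in plain Python as follows. The input is assumed to be a dictionary from subject identifier to a list of image identifiers; the function names, the round-robin fold assignment, and the exhaustive enumeration of cross-subject pairs (practical only for small datasets) are assumptions of this example.

```python
from itertools import combinations
import random

def intrapersonal_pairs(images_by_subject):
    """All unordered image pairs that share a subject."""
    return [(a, b)
            for imgs in images_by_subject.values()
            for a, b in combinations(sorted(imgs), 2)]

def subject_folds(subjects, k=3, seed=0):
    """Split subjects (not pairs) into k folds, so no subject spans two folds."""
    subjects = sorted(subjects)
    random.Random(seed).shuffle(subjects)
    return [subjects[i::k] for i in range(k)]

def extrapersonal_pairs(images_by_subject, subjects, n, seed=0):
    """Randomly sample up to n image pairs whose two images come from different subjects."""
    items = [(s, img) for s in sorted(subjects)
             for img in sorted(images_by_subject[s])]
    cross = [(a[1], b[1]) for a, b in combinations(items, 2) if a[0] != b[0]]
    return random.Random(seed).sample(cross, min(n, len(cross)))
```

Building the pairs of each fold only from that fold's subjects is what keeps a subject from appearing in both the training and the testing pairs.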

From the figure and table, we can see that the proposed approach outperforms the other methods in the experiments, especially the $L_2$ norm and gradient orientation (GO) benchmarks, over which it improves by approximately 10%. Although the improvement over SVM + GO and SVM + GOP is small, the improvement over the other traditional methods is apparent. Since the number of images is limited, experiments on the FGNET database alone are not sufficient, and extended face verification experiments are needed.

4.4. Experiments on the MORPH Dataset

In this section, we report results on experiments on a larger public domain face aging dataset, which is the MORPH database [27].

The MORPH database contains a total of 52099 facial images of 12938 subjects in the age range of 16–77 years. The dataset provides age information but no landmark point information, so we manually labeled the 68 landmark points of all the facial images prior to the verification task. Figure 8 shows several examples of face images at different ages in the MORPH database. While the number of subjects in this database is large, the number of face images per subject is rather small (an average of about 4 facial images per subject). Note that there are also large pose, lighting, and expression variations along with the age variation. In this experiment we use all the image data for face verification.

We emphasize the importance of experiments on the MORPH database for the following reasons.
(i) MORPH is very challenging for our task in two ways. First, it contains much larger age gaps: the largest gap is 61 years in MORPH, compared to 45 years in the FGNET database. Second, the number of images per subject is more limited (an average of about 4 facial images per subject, compared to 12 in FGNET), which makes learning more difficult.
(ii) MORPH contains far more facial images and subjects than FGNET (12938 subjects versus 82). We expect that experiments on MORPH will serve as a baseline for future studies on the topic.
(iii) The MORPH database contains facial images of subjects with three skin colors, whereas FGNET contains only white subjects. In addition, the illumination and environment of the MORPH images are closer to those of practical applications, which provides a better foundation for future deployment.

Because the MORPH database is large, we use three-fold, five-fold, and ten-fold cross validation in the following experiments rather than three-fold cross validation only. Each fold contains 2000 intrapersonal pairs and 3000 extrapersonal pairs, chosen randomly from all the intrapersonal and extrapersonal pairs collected over all subjects, such that samples from the same subject never appear in both training and testing pairs.

The experiment results are shown in Figures 9, 10, and 11 and Tables 2, 3, and 4.

The results of the three-fold cross validation experiments on the MORPH database show that, again, our method outperforms all other approaches; moreover, the improvement is larger than in the experiments on the FGNET database (the equal error rate margin increases from 0.3% to 1.3%). We also find that the equal error rates of all methods decrease markedly (even for the worst method, the $L_2$ norm, the equal error rate decreases from 40.6% to 26.6%). This is probably because the dataset is large enough for the SVM to learn more thoroughly, or because the smaller number of images per subject and the smaller age differences among each subject's images make learning easier. Given the size of the MORPH database, we also extend the experiments with five-fold and ten-fold cross validation to improve the verification performance of our method.

4.5. Age Factor Experiments

As mentioned above, the experiments in [26] have shown that the appearance changes of human faces are very different in children than in adults. In this section we examine this again through additional experiments on the subset of the FGNET database that contains only images taken below the age of 18.

For verification tasks, we generate about 2000 intrapersonal image pairs by collecting all image pairs from the same subjects. As in the experiments in Section 4.3, we use only three-fold cross validation, and each fold contains 690 intrapersonal pairs and 2000 extrapersonal pairs. The results are shown in Figure 12 and Table 5. The performance decreases considerably; in other words, face verification becomes more difficult, which again confirms the finding in [26].

Because this paper concerns face verification across aging, it is of particular interest how age differences affect performance on the task. We therefore collect statistics on the age effect in the experiments. The results are shown in Figures 13 and 14.
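One simple way to collect such statistics is to group the test pairs by age gap and report a per-group rate, as in the plain Python sketch below. The record layout, the capping of large gaps at `max_gap`, and the use of accuracy as the per-group measure are assumptions for illustration; the figures may report a different measure.

```python
from collections import defaultdict

def accuracy_by_age_gap(records, max_gap=10):
    """Fraction of correctly verified pairs for each age gap.

    records holds (age_a, age_b, correct) triples for the tested pairs;
    gaps larger than max_gap are pooled into the max_gap group.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for age_a, age_b, correct in records:
        gap = min(abs(age_a - age_b), max_gap)
        totals[gap] += 1
        hits[gap] += bool(correct)
    return {gap: hits[gap] / totals[gap] for gap in sorted(totals)}
```

Pooling the largest gaps avoids groups with very few pairs, whose rates would otherwise be dominated by noise.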

From the figures for the FGNET dataset, we find that the performance of almost all methods decreases as the age gap increases, especially when the age gap is more than four years. SVM + diff may be more suitable for the face images in the FGNET database below 18 years of age, which is consistent with the results shown in Figure 12 and Table 5.

Figure 15 shows the results on the MORPH database, which exhibit an interesting pattern: faces separated by four years are easier to verify than those separated by either less or more than four years. Because the age gaps for each subject in MORPH are small, the trend with respect to age gap is irregular, and the task becomes easier, rather than harder, at a gap of four years.

4.6. Shape and Texture Representation Experiments

In this experiment, we use only the shape feature and the texture feature, with varying weights, to address age variation in face verification on MORPH. The weight of the texture feature increases from 0.1 to 0.9, while the weight of the shape feature decreases from 0.9 to 0.1.

We use the three-fold, five-fold, and ten-fold cross validation in the experiments. The experiment results are shown in Figures 16, 17, and 18 and Table 6.

The figures and the table show that face verification performance improves as the weight of the texture feature increases. We can therefore infer that the texture feature plays a more important and useful role than the shape feature in face verification across age progression.

5. Conclusion and Discussion

In this paper we studied the problem of face verification with age variation by combining the active appearance model (AAM) with the gradient orientation pyramid (GOP) representation. First, we establish the AAM on the datasets. After generating the AAM images, we use a robust face descriptor, the gradient orientation pyramid, for face verification tasks across ages. We then use SVM classification for training and testing. Compared with the classic descriptors evaluated in our experiments, the proposed method demonstrated promising results on two public domain databases, FGNET and MORPH. Both databases contain many face images with large age differences. In addition, we collected statistics on the age effect in the experiments.

Facial aging is a challenging problem that will require continued efforts to further improve recognition performance. There are several directions for future work. First, since the data always affect performance in face verification, we plan to test on other large public datasets to gain a deeper understanding of the proposed approach. Second, we expect that the proposed method, together with more effective feature extraction methods, will help address the face verification problem in future research.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Acknowledgments

This work was supported by the Grants of the National Science Foundation of China (nos. 61175121 and 61102163), the Program for New Century Excellent Talents in University (no. NCET-10-0117), the Grant of the National Science Foundation of Fujian Province (no. 2013J06014), the Program for Excellent Youth Talents in University of Fujian Province (no. JA10006), the Promotion Program for Young and Middle-aged Teacher in Science and Technology Research of Huaqiao University (no. ZQN-YX108), and the Project of Science and Technology Innovation Platform of Fujian Province (no. 2012H2002).