Mathematical Problems in Engineering

Volume 2015, Article ID 840840, 9 pages

http://dx.doi.org/10.1155/2015/840840

## Multifeature Extreme Ordinal Ranking Machine for Facial Age Estimation

School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore 639798

Received 12 May 2015; Accepted 23 August 2015

Academic Editor: Huaguang Zhang

Copyright © 2015 Wei Zhao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Most state-of-the-art facial age estimation methods rely on solving complicated mathematical optimization problems and therefore consume a great deal of time in training. To avoid this algorithmic complexity while maintaining high estimation accuracy, we propose a multifeature extreme ordinal ranking machine (MFEORM) for facial age estimation. Experimental results demonstrate that the proposed approach sharply reduces the runtime (up to nearly one hundred times faster) while achieving estimation performance comparable to or better than that of state-of-the-art approaches. The inner properties of MFEORM are also explored, revealing further advantages.

#### 1. Introduction

With the rapid development of computer vision, pattern recognition, and biometrics, increasing attention has been paid to computer-based human facial age estimation. It is useful in scenarios where an individual's age must be obtained without identifying other, irrelevant personal information, such as electronic customer information management [1, 2], human-computer interaction (HCI) [3], security surveillance monitoring [4, 5], age-based visual advertisement, and even entertainment.

Unlike other face-oriented problems, computer-based facial age estimation is difficult in the following respects:

(1) Differences in the aging process: people differ in living environment, ethnic group, gender, lifestyle, social contact, health condition, and even genes, which together determine the speed of aging.

(2) Shape or texture: different forms of aging emerge at different age levels. From infancy to adolescence, craniofacial growth (shape change) is the main change. From adulthood to old age, craniofacial change decreases remarkably and skin transformation (texture change) becomes the most prominent change.

(3) Data insufficiency: only a very limited number of aging datasets are available, and even fewer cover the whole age range.

(4) Disturbance: some subjects use cosmetics and accessories to make their faces look younger, which can substantially distort the final estimation result.

Many facial age estimation approaches have been put forward, some of which achieve rather satisfying performance. Most of the traditional approaches formulate facial age estimation as classification [6–9], regression [4, 10–12], or a combination of the two. Suppose we have a dataset of $N$ training samples, $\{(x_i, y_i)\}_{i=1}^{N}$, in which $x_i$ represents the $i$th face image and $y_i$ represents the corresponding age label. In multiclass classification, every age label is treated as a single independent class for training, and the result is a multiclass classifier that estimates a person's age. The trouble is that the age labels are then unrelated to each other; that is, each age label is treated as a separate entity in the training process, whereas human age labels are in essence sequential. This kind of method may therefore miss the implicit correlation among the age labels, which together compose a well-ordered set. For instance, two images of the same person with adjacent age labels will be more similar than two images with far-apart labels. In short, multiclass classification cannot take full advantage of the correlation among ordinal age labels. In contrast, the regression method aims to find the best mapping from raw images to the corresponding ages and thereby obtain a function for age estimation. However, craniofacial and skin changes at different age levels result in an unstable random process in feature space, so the kernels used to assess the similarities among different ages may drift. As for estimation performance, the literature [4, 11, 13] shows that, depending on the datasets used for training and testing, the regression method may perform either better or worse than the classification-based method. In addition, Guo et al. [4, 14] proposed a hybrid method that combines classification and regression to exploit the advantages of both. As a result, the actual performance is further improved to some degree.

To overcome the aforementioned defects of classification and regression approaches, the ordinal hyperplanes ranker (OHRank) [15] was proposed for age estimation. The aging process differs across age levels: for example, aging from 22 to 25 follows a different tendency than aging from 62 to 65. It is therefore more reliable to compare the relative order of two age labels (smaller or larger) than to compare the numerical differences among labels. Despite these merits, OHRank uses only a single feature set as its feature representation model, so it cannot combine the discriminative information available from all feature sets. Each feature set has its own advantages over the others. For example, anthropometric models [16] mathematically model the growth of the human head from infancy to adulthood and thus capture the size and proportions of the face; Active Appearance Models (AAM) [17] represent both shape and texture information rather than facial geometry alone; and the age manifold [18] learns the aging tendency from different face images at every age label. Later, Weng et al. [19] presented a multifeature ordinal ranking (MFOR) method that uses multiple feature sets simultaneously, which further strengthens the discriminative power of the feature information. Experimental results demonstrate that MFOR outperforms other age estimation approaches. However, almost all of these approaches, including MFOR, rely heavily on solving complicated mathematical optimization problems. Take OHRank and MFOR as examples: these two ordinal ranking-oriented methods achieve top performance so far, but both are built on an SVM-based formulation, so the SVM parameters must be computed iteratively by solving complicated and time-consuming optimization problems. As a result, the computational complexity becomes a heavy burden on efficiency.

Recognizing this point, we propose a multifeature extreme ordinal ranking machine (MFEORM) for facial age estimation. We divide our approach into three stages: (1) representing features using certain feature extraction models, (2) processing the obtained feature sets, and (3) applying certain algorithms to estimate age. In the first stage, we use multiple feature models in parallel to represent our facial image database. In the second stage, since it is more reasonable to decide which of two facial images is older or younger than to predict the age directly from an image, the more reliable "larger or smaller than" information is used for one binary classification at each age. In this way, the age estimation problem is reduced to a series of binary classification subproblems whose number is determined by the total number of age labels. In the third stage, an ultrafast extreme learning machine (ELM) with a kernel function is applied to obtain a series of classifiers. These classifiers are then integrated according to a rule that is described later in the paper. Experimental results demonstrate that MFEORM notably reduces the runtime (up to nearly one hundred times faster) while achieving similar or even better estimation results than state-of-the-art methods.
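The reduction in the second stage can be illustrated with a short sketch. It is not the paper's implementation: the function names are hypothetical, and the aggregation rule (count the thresholds for which the vote is "older than" and add the minimum age) is an assumption borrowed from OHRank-style ranking, since the integration rule used by MFEORM is only described later. Ground-truth binary labels stand in for classifier outputs, so the example only checks that the encoding and the assumed rule are consistent with each other.

```python
import numpy as np

def ordinal_binary_targets(ages, thresholds):
    """Build +1/-1 labels: entry [i, j] is +1 when ages[i] > thresholds[j]."""
    ages = np.asarray(ages).reshape(-1, 1)
    thresholds = np.asarray(thresholds).reshape(1, -1)
    return np.where(ages > thresholds, 1, -1)

def aggregate_age(decisions, min_age):
    """Assumed OHRank-style rule: min_age plus the number of 'older than' votes."""
    decisions = np.asarray(decisions)
    return min_age + np.sum(decisions > 0, axis=-1)

# Toy age labels; one binary subproblem per threshold between min and max age.
ages = np.array([3, 5, 4, 6])
thresholds = np.arange(ages.min(), ages.max())  # thresholds 3, 4, 5

targets = ordinal_binary_targets(ages, thresholds)
recovered = aggregate_age(targets, ages.min())
print(targets)
print(recovered)
```

In a full pipeline, each column of `targets` would be the training label vector of one binary classifier, and `aggregate_age` would be applied to the signs of the classifiers' decision values.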

To sum up, this paper makes the following contributions:

(1) A multifeature extreme ordinal ranking machine (MFEORM) for facial age estimation is proposed. It combines the advantages of a multifeature space, the natural ordinal character of age, and the rapid learning of the extreme learning machine, and it achieves similar or even better performance in much less time than state-of-the-art methods. Our approach avoids the tedious iterative computation required to solve a mathematical optimization problem and thus improves efficiency.

(2) Experiments are conducted comprehensively from both internal and external aspects, revealing more of MFEORM's particular characteristics and advantages.

(3) Further properties are explored: (a) the influence of the number of feature models on the final results and (b) the influence of the number of dimensions (after PCA dimension reduction) on the final results.

The rest of this paper is organized as follows. Section 2 briefly reviews previous ordinal ranking-oriented age estimation solutions and the extreme learning machine. Section 3 details the proposed method. Section 4 reports experimental results and remarks. Finally, Section 5 concludes the paper.

#### 2. Extreme Learning Machine (ELM)

Conventional learning techniques such as support vector machines (SVMs) and neural networks have long suffered from (1) slow training and learning speed, (2) the need for human involvement, and (3) unsatisfactory generalization performance. The extreme learning machine (ELM) [20–22], which has recently drawn increasing attention, overcomes these drawbacks to a certain extent and achieves satisfying performance. ELM is based on generalized single-hidden layer feedforward neural networks (SLFNs) and has the following advantages:

(1) In ELM, the hidden layer parameters of the SLFN do not need to be tuned and do not depend on the training samples. They only need to be randomly generated, which reduces human intervention.

(2) ELM learns much faster and offers better generalization performance.
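A minimal sketch of the basic ELM idea follows. It assumes sigmoid hidden nodes, Gaussian random hidden parameters, and output weights obtained as the least-squares solution through the Moore-Penrose pseudoinverse. The paper itself applies an ELM with a kernel function, so this random-feature form, the toy data, and the names `elm_fit` and `elm_predict` are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def hidden_output(X, A, b):
    """Sigmoid hidden-layer outputs for inputs X (one row per sample)."""
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))

def elm_fit(X, T, n_hidden, rng):
    """Draw hidden parameters at random, then solve for output weights only."""
    A = rng.standard_normal((X.shape[1], n_hidden))  # input weights, never tuned
    b = rng.standard_normal(n_hidden)                # hidden biases, never tuned
    H = hidden_output(X, A, b)
    beta = np.linalg.pinv(H) @ T                     # least-squares output weights
    return A, b, beta

def elm_predict(X, A, b, beta):
    return hidden_output(X, A, b) @ beta

# Toy regression problem (assumed data, for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(20, 2))
T = np.sin(X[:, :1]) + X[:, 1:] ** 2

A, b, beta = elm_fit(X, T, n_hidden=20, rng=rng)
pred = elm_predict(X, A, b, beta)
print("max training error:", np.max(np.abs(pred - T)))
```

No iterative optimization appears anywhere: training amounts to one random draw and one linear solve, which is where the speed advantage claimed above comes from.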

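Let us start from the structure of SLFNs. Figure 1 shows the construction of a typical SLFN. Generally speaking, an SLFN with $L$ hidden neurons can be described as

$$f_L(\mathbf{x}) = \sum_{i=1}^{L} \boldsymbol{\beta}_i h_i(\mathbf{x}), \tag{1}$$

where $\boldsymbol{\beta}_i$ represents the output weight between the $i$th neuron and the output node, $h_i(\mathbf{x})$ is the output function of the $i$th neuron, $\mathbf{a}_i$ is the weight vector linking to the $i$th node, and $b_i$ is the bias of the $i$th node. Particularly, we have

$$h_i(\mathbf{x}) = G(\mathbf{a}_i, b_i, \mathbf{x}). \tag{2}$$

Many researchers have built the theoretical foundation [23–25] that an SLFN is able to learn $N$ arbitrary distinct samples with zero error, provided that it has a bounded nonlinear activation function and at most $N$ hidden neurons. More precisely, suppose we have $N$ training samples $(\mathbf{x}_j, \mathbf{t}_j)$, $j = 1, \ldots, N$, where $(\mathbf{x}_j, \mathbf{t}_j) \in \mathbb{R}^{d} \times \mathbb{R}^{m}$. Also, we let

$$\mathbf{o}_j = \sum_{i=1}^{L} \boldsymbol{\beta}_i G(\mathbf{a}_i, b_i, \mathbf{x}_j), \quad j = 1, \ldots, N. \tag{3}$$

Then the abovementioned "zero error" means

$$\sum_{j=1}^{N} \left\| \mathbf{o}_j - \mathbf{t}_j \right\| = 0. \tag{4}$$

In other words, we can find a combination of $\boldsymbol{\beta}_i$, $\mathbf{a}_i$, and $b_i$ such that

$$\sum_{i=1}^{L} \boldsymbol{\beta}_i G(\mathbf{a}_i, b_i, \mathbf{x}_j) = \mathbf{t}_j, \quad j = 1, \ldots, N. \tag{5}$$

To make it concise, we formulate (5) as

$$\mathbf{H}\boldsymbol{\beta} = \mathbf{T}, \tag{6}$$

where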