Abstract

We explore how to improve the performance of face feature point detection on mobile terminals from three aspects. First, we optimize the models used in the SDM algorithm via PCA and spectral clustering. Second, we propose an evaluation criterion based on linear discriminant analysis to choose the best local feature descriptors, which play a critical role in feature point detection. Third, we take advantage of the multicore architecture of mobile terminals and parallelize the optimized SDM algorithm to further improve efficiency. The experimental observations show that the resulting GPC-SDM (an improved Supervised Descent Method using spectral clustering, PCA, and GPU acceleration) reduces memory usage and is efficient enough to meet real-time requirements.

1. Introduction

Face feature point detection and tracking is an active topic in computer vision research. It is the foundational technology for face pose estimation, facial expression transfer or synchronization, and so on. With the popularity of smart mobile terminals, real-time tracking of facial feature points on mobile terminals is in high demand. For example, recently designed mobile [1] or vehicular [2] social networks need high-quality face feature point tracking and detection for security authentication. It may also benefit information retrieval within mobile crowd sensing architectures [3, 4]. However, because of the limited floating-point computational capacity and battery power of mobile terminals, many face feature detection algorithms that are successful on desktop computers cannot be deployed directly on them. This raises the problem of how to achieve real-time detection and tracking of face feature points on mobile terminals with limited computational power.

To date, most research on facial feature point detection has been carried out experimentally on personal computers (PCs). According to [5], in terms of model design, facial feature point detection methods can be divided into those based on local facial features, those based on global facial features, and hybrid methods combining both. Among the methods developed in recent years, four local feature based algorithms, namely, the Supervised Descent Method (SDM) [6], Coarse-to-Fine Shape Searching (CFSS) [7], Ensemble of Regression Trees (TREES) [8], and Explicit Shape Regression (ESR) [9], have achieved outstanding performance. Surprisingly, although these methods do not take the geometric features of facial feature points into account, they still cope well with facial occlusion. Despite its simplicity, the SDM algorithm remains very competitive in speed and accuracy. The cascaded regression approach first introduced by SDM has been continuously adopted by the research community [10–13]. In addition, based on a local receptive field of artificial neurons, Baltrusaitis et al. [14] at CMU proposed a facial feature point detection method that performs well under different illuminations. Based on both global and local feature descriptions, Hasan et al. [15] developed a shape regression method that made a breakthrough on the W300 test set. Jaiswal et al. [16] proposed Gabor-LBP based face feature detection using model selection and support vector regression; according to their experimental observations, this method is particularly good at capturing face feature points under local occlusion and background clutter. Notably, Martinez et al. [17] broke with the traditional sequential regression approach to this problem and proposed a regression approach that aggregates multiple visual cues.
With the continuing success of deep learning in computer vision tasks, Jourabloo and Liu [13] employed CNNs as estimators of both the shape and the coefficients of a 3D face model for large-pose face alignment; their experiments show that the method can also estimate 3D face shapes.

Recently, mobile devices have attracted tremendous attention. Choi et al. [18] developed a face feature detection system for real-time training on mobile phones. Tresadern et al. [19] introduced the fixed-point method [20] and Haar-like features into the Active Shape Model (ASM) [21] algorithm to perform real-time tracking of facial feature points. Based on a hierarchical model, Jian-kang et al. [22] proposed a detection method that achieves better results on mobile phones. Jiang [23] developed a facial feature detection system for the Android platform. Building on facial point detection on the Android platform, Hp [24] developed a 3D human face pose estimation system.

Despite the numerous studies on face feature detection algorithms for mobile terminals, the field is still in its infancy, and most of the existing work is tied to a particular mobile operating system. In this article, we improve the SDM algorithm [6] in two respects: the computation and the model. First, we reduce the dimension of the features needed to localize the facial feature points by using spectral clustering. Second, we simplify the linear models in SDM by PCA. Finally, the local feature extraction and the SDM model computation are combined and optimized on the mobile GPU to further improve the performance of the SDM algorithm. Since the SDM method has been widely adopted by recently released face feature point detection methods, our modifications should also help boost the performance of other approaches.

2. Solution Pipeline

Figure 1 presents the pipeline for processing video data collected with mobile phone cameras. Face feature point detection is applied to a frame extracted from the video stream. Face detection is performed using Viola’s face detector. To overcome illumination problems, preprocessing steps are applied (details are given in Section 3). The local image feature extraction algorithm, for example, LBP, HOG [25], or SIFT [26], is then run on the initial feature point distribution. Feature extraction follows a GPU parallel design in which the pixel values of the face image needed by each computing unit are copied into shared memory. At the same time, the part of the SDM model needed by each computing unit is also placed in shared memory. The result calculated by each computing unit is a regression component of the face shape.

After obtaining the regression components, we merge them and update the shape model based on the integrated result. The process is repeated until the shape model satisfies the error requirement set in the training process.

3. Preprocessing

In real video acquisition, the camera may introduce noise under dim lighting, resulting in large image gradients or irregular filter responses that severely affect the robustness of the detection algorithm. We therefore normalize the face image in two respects, noise reduction and brightness stretching, as shown in Figure 2.

First, we apply a median filter to reduce the noise likely caused by low-light conditions. Histogram equalization is then performed to even out lighting effects, bring out the surface details of dim face images, and prepare for the subsequent steps.
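The two preprocessing steps described above can be sketched as follows. This is a minimal NumPy illustration, not the implementation used in the paper: the 3x3 window size, the 8-bit grayscale input, and the function names are assumptions made for the example.

```python
import numpy as np

def median_filter3(img):
    """3x3 median filter with edge replication (img: 2-D uint8 array)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Stack the 9 shifted views of the image and take the per-pixel median.
    windows = [padded[r:r + h, c:c + w] for r in range(3) for c in range(3)]
    return np.median(np.stack(windows), axis=0).astype(np.uint8)

def equalize_histogram(img):
    """Histogram equalization for an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    denom = max(int(cdf[-1] - cdf_min), 1)  # avoid division by zero on flat images
    # Lookup table that stretches the cumulative distribution to [0, 255].
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255), 0, 255).astype(np.uint8)
    return lut[img]

def preprocess(img):
    """Noise reduction followed by brightness stretching."""
    return equalize_histogram(median_filter3(img))
```

Running the median filter before equalization keeps isolated noisy pixels from being amplified by the contrast stretch.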

4. Optimized SDM

4.1. Introduction to SDM

The core idea of the SDM algorithm is to minimize the objective function (1), which seeks the correction that moves the current position to the ground truth. The quantities in (1) are the labeled facial feature point positions, the face image, the local descriptor extracted at given positions, and the initial shape. Applying Newton's method to (1) yields an update step that involves the Hessian matrix and the Jacobian of the feature extraction function. Although this step is ill posed, since the feature extraction function is commonly unknown in practice, it can be reformulated as a linear regression on the extracted features.

Since the facial feature points are labeled for each instance, the difference between the initial and the optimal feature points is also known, which gives a linear system. In the training phase, this system is solved over all training images to obtain the regression parameters. Once the regression model is obtained, we update the initial shape of each individual training sample accordingly.

Training is complete once the regression models of all stages have been obtained.
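For reference, the formulation summarized in this subsection is commonly written in the original SDM work [6] as below. The notation is an assumption of this note and may differ from the symbols used in the numbered equations of this paper: $d$ is the image, $h$ the feature extraction function, $x_*$ the labeled shape, $x_0$ the initial shape, $\phi_k = h(d(x_k))$ the features at stage $k$, and $(R_k, b_k)$ the stage-$k$ linear regressor.

```latex
\begin{align}
f(x_0 + \Delta x) &= \left\| h\big(d(x_0 + \Delta x)\big) - \phi_* \right\|_2^2,
  \qquad \phi_* = h\big(d(x_*)\big) \\
\Delta x_1 &= -2\, H^{-1} J_h^{\top} (\phi_0 - \phi_*) \\
\Delta x_1 &= R_0\, \phi_0 + b_0 \\
x_k &= x_{k-1} + R_{k-1}\, \phi_{k-1} + b_{k-1} \\
(R_k, b_k) &= \arg\min_{R,\, b} \sum_i \left\| (x_*^i - x_k^i) - R\, \phi_k^i - b \right\|_2^2
\end{align}
```

The second line is the Newton step, the third replaces the unknown Hessian and Jacobian terms by a learned linear map, and the last line is the least-squares problem solved at each training stage.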

The SDM algorithm is also applicable to facial feature point tracking. The difference between face feature point detection and tracking lies in the initial feature point distribution. In the training step for tracking, each individual’s initial facial point positions are randomly sampled once or several times from the label, within 95% or 20 pixels of rescaling or translation. The tracking models are then obtained with the same process discussed above. This is reasonable because face shapes change slowly across consecutive frames of a video.
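The training procedure and the perturbed initialization used for tracking can be sketched as follows. This is a minimal NumPy illustration under several assumptions that do not come from the paper: the perturbation is read as a scale factor in [0.95, 1/0.95] plus a shift of up to 20 pixels, each stage is fitted with ridge-regularized least squares, and `features_fn` is a hypothetical callback standing in for the local descriptor.

```python
import numpy as np

def perturb_shapes(labels, rng, max_shift=20.0, min_scale=0.95):
    """Sample initial shapes by rescaling and translating labeled shapes.
    labels: (n, p, 2) array of landmark coordinates."""
    n = labels.shape[0]
    center = labels.mean(axis=1, keepdims=True)
    scale = rng.uniform(min_scale, 1.0 / min_scale, size=(n, 1, 1))
    shift = rng.uniform(-max_shift, max_shift, size=(n, 1, 2))
    return (labels - center) * scale + center + shift

def train_sdm(features_fn, images, labels, init, n_stages=4, ridge=1e-3):
    """Learn one linear regressor (weights plus bias row) per cascade stage."""
    n = labels.shape[0]
    x = init.reshape(n, -1).copy()
    target = labels.reshape(n, -1)
    models = []
    for _ in range(n_stages):
        phi = features_fn(images, x)                 # (n, f) features at current shapes
        A = np.hstack([phi, np.ones((n, 1))])        # append a bias column
        # Ridge-regularized least squares for the remaining displacement.
        W = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ (target - x))
        models.append(W)
        x = x + A @ W                                # update every training shape
    return models

def apply_sdm(features_fn, images, init, models):
    """Run the learned cascade from the given initial shapes."""
    n = init.shape[0]
    x = init.reshape(n, -1).copy()
    for W in models:
        phi = features_fn(images, x)
        x = x + np.hstack([phi, np.ones((n, 1))]) @ W
    return x.reshape(init.shape)
```

For detection, `init` would typically be a mean shape placed in the detected face box; for tracking, `perturb_shapes` mimics starting from the previous frame's result.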

4.2. Model Compression

The analysis above shows that the SDM algorithm aligns face feature points through multiple regression stages, and that its computational cost is concentrated in evaluating the regression models:

Consider the feature point positions after a given number of regression steps. Since we have 66 feature points to align, every regression stage needs linear models covering all of them. However, on closer inspection, if two feature points in a shape are not strongly correlated, the local feature extracted at one of them does not help much to locate the other. Thus, we split the shape into subshapes so that feature points within the same subshape are strongly correlated, while feature points in different subshapes are only weakly related. To estimate the number of categories accurately, we use spectral clustering. First, the correlation coefficients between different feature points are defined as

Let be the number of training samples, be the dimension of feature points, and be the covariance of the feature point in dimension .

We use the correlation coefficient matrix directly to define the Laplacian Matrix:

The weight matrix , and . According to spectral clustering, the number of smallest eigenvalues of the Laplacian matrix, minus 1, gives the number of categories. We then obtain the subshapes by running k-means on the eigenvectors associated with the smallest eigenvalues, starting from the second smallest. Assuming that the face feature points are divided into these classes, the facial feature point localization of the SDM algorithm can be rewritten as the union of per-subshape regressions given in (9).
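The clustering of feature points into subshapes can be sketched as follows. This is a minimal NumPy illustration with assumptions that are not taken from the paper: the affinity between two feature points is the mean absolute Pearson correlation of their x and y coordinates, the Laplacian is the unnormalized one (degree matrix minus weight matrix), the number of clusters `k` is passed in rather than estimated from the eigenvalues, and a small deterministic k-means is used.

```python
import numpy as np

def landmark_affinity(shapes):
    """shapes: (n, p, 2). Mean absolute Pearson correlation over x and y."""
    cx = np.corrcoef(shapes[:, :, 0].T)
    cy = np.corrcoef(shapes[:, :, 1].T)
    W = (np.abs(cx) + np.abs(cy)) / 2.0
    np.fill_diagonal(W, 0.0)
    return W

def kmeans(X, k, n_iter=50):
    """Plain k-means; farthest-point initialization keeps it deterministic."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def cluster_landmarks(shapes, k):
    """Group feature points into k subshapes (k >= 2)."""
    W = landmark_affinity(shapes)
    L = np.diag(W.sum(axis=1)) - W      # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)         # eigenvalues come back in ascending order
    embedding = vecs[:, 1:k]            # skip the trivial first eigenvector
    return kmeans(embedding, k)
```

Each resulting group would then get its own, shorter regressor that only sees the features extracted at its own feature points.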

Equation (9) uses the union of the subshapes’ regression models to determine the whole face shape. This shortens each linear model relative to the original and reduces the computational complexity of SDM. However, the number of linear models in the SDM framework is not reduced. Therefore, we use PCA to further simplify the SDM cascade of linear models. The process is as follows:

Assuming we have the following linear model:

we perform dimension reduction using PCA on so that . is the principals of and . We have

If we denote , this yields

From matrix decomposition of , we have

is the diagonal matrix with Eigen values corresponding to . Because , we can know . So (11) can be rewritten as

Since is orthogonal, multiplying on both sides of (14), we get

It can be rewritten as

From the analysis above, the width of each linear model can be reduced to the number of principal components retained by PCA.
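The effect of PCA on the width of a linear model can be illustrated with a generic sketch. It does not reproduce the paper's own derivation, whose intermediate equations are not recoverable here; it only shows the standard way of folding a PCA projection into a regressor. The names `fit_pca`, `compress_model`, and `project` are hypothetical.

```python
import numpy as np

def fit_pca(phi, n_components):
    """phi: (n, f) training features. Returns the mean and an (f, m) basis."""
    mean = phi.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(phi - mean, full_matrices=False)
    return mean, vt[:n_components].T

def project(phi, mean, P):
    """Reduced features z of shape (n, m)."""
    return (phi - mean) @ P

def compress_model(R, b, mean, P):
    """Fold the projection into the regressor so that
    R @ phi + b is approximated by R_small @ z + b_small."""
    R_small = R @ P              # (out, m) instead of (out, f)
    b_small = b + R @ mean
    return R_small, b_small
```

With `f` feature dimensions, `out` shape coordinates, and `m` retained components, the per-stage cost drops from about `out * f` to about `m * f + out * m` multiplications, which is a saving whenever `m` is well below `out`.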

4.3. GPU Acceleration

By design, the robustness of local feature extraction at the facial feature points has a major impact on the quality of face alignment in SDM. In practice, we find that feature extraction and model computation each account for about half of the running time of the SDM algorithm. Many local feature descriptors have been proposed in computer vision. Among them, the most widely used are HOG (Histogram of Oriented Gradients) [25], SIFT (Scale-Invariant Feature Transform) [26], and LBP (Local Binary Patterns) [27]. However, these features are either too time-consuming or difficult to parallelize. For example, HOG is hard to parallelize when there are few feature points; the reason may lie in its overlapping cells, since accumulating histograms of gradient directions easily causes memory access conflicts between computing units. Parallel SIFT requires building an image pyramid, so it does not save computation and even lowers the performance of the algorithm when the feature points are small in scale. Therefore, the main work of GPU acceleration is twofold: selecting an image feature extraction algorithm that is easy to parallelize and robust on small-scale data, and using the mobile phone GPU to parallelize both the feature extraction for face alignment and the computation of the SDM algorithm.

To judge local feature descriptors, we derive a heuristic from linear discriminant analysis (LDA). Given a feature extraction algorithm, we expect the local features it extracts at the same feature point across the training set to have a small covariance, while the features extracted at different feature points should differ sufficiently. This follows the idea of LDA: within a class, the variance of the features should be as small as possible, whereas the variance between classes should be as large as possible. For each algorithm, we consider the local feature extracted at each feature point of each sample. The covariance of the local features at a given feature point can be expressed as

where is the mean of all samples feature extracted at using algorithm , .

Thus, the covariance of different feature points using can be calculated with

where . We believe that if the feature extraction algorithm is good enough, the covariance should be small at the same feature point and clearly distinguishable across different feature points. Thus, we evaluate each algorithm in a way similar to LDA. Ignoring the eigenvectors, we are only concerned with the proportion taken by the largest eigenvalues:

Finally, we choose the algorithm with the largest proportion of largest eigenvalues:

We analyzed popular recent local feature extraction algorithms in computer vision, including SIFT, SURF [28], HOG, BRIEF [29], ORB [30], BRISK [31], FREAK [32], and MRLBP [27], and found that FREAK performs better on the face alignment problem. The experimental details are discussed in Section 5.
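The LDA-style evaluation criterion described above can be sketched as follows. Because the defining equations are not available in this text, the sketch relies on one possible reading: each feature point is a class, the within-class and between-class scatter matrices are built as in standard LDA, and the score is the share of the total eigenvalue mass held by the `top` largest eigenvalues. The function name, the `top` parameter, and the small ridge term `eps` are assumptions.

```python
import numpy as np

def separability_score(features, top=1, eps=1e-6):
    """features: (n_samples, n_points, f) descriptors produced by one algorithm.
    Returns the fraction of eigenvalue mass in the `top` largest eigenvalues."""
    n, p, f = features.shape
    class_means = features.mean(axis=0)                  # (p, f), one mean per feature point
    global_mean = class_means.mean(axis=0)               # (f,)
    centered = (features - class_means).reshape(-1, f)
    Sw = centered.T @ centered / (n * p)                 # within-class scatter
    diff = class_means - global_mean
    Sb = diff.T @ diff / p                               # between-class scatter
    # Eigenvalues of Sw^{-1} Sb; the ridge keeps Sw invertible.
    eigvals = np.linalg.eigvals(np.linalg.solve(Sw + eps * np.eye(f), Sb)).real
    eigvals = np.sort(np.clip(eigvals, 0.0, None))[::-1]
    return eigvals[:top].sum() / max(eigvals.sum(), eps)
```

Under this reading, candidate descriptors would each be run on the same training shapes and the one with the highest score would be selected.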

On its own, the SDM face alignment algorithm does not process enough data to benefit from GPU acceleration. A naive port would also cause frequent accesses to both GPU shared memory and CPU memory, which not only degrades the performance of the algorithm but also shortens memory life. Therefore, exploiting the GPU architecture of the mobile phone, we design a parallel scheme for SDM and feature extraction. As shown in Figure 3, our method parallelizes both the feature extraction and the model solution of SDM. First, we load into memory the pixel values of the subblock at each feature point, according to the location of that feature point in the initial shape. Note that if the pixel values in a subblock need to be accessed several times, we store the same number of copies of the pixel values in shared memory. At the same time, the SDM algorithm is applied to the model fraction and the image features of each patch, which are stored in shared memory. Once the feature extraction at a feature point is complete, the component of the image feature at that feature point is directly taken as its contribution to the overall shape regression.
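The decomposition that makes this scheme possible can be illustrated on the CPU. The sketch below is not GPU code and is not the paper's implementation: Python threads stand in for GPU computing units, and the function and parameter names are hypothetical. It shows that the stage update is a sum of independent per-feature-point products, so each unit only needs its own patch descriptor and its own slice of the model.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def regress_parallel(patch_features, model_slices, bias):
    """patch_features: list of (f,) descriptors, one per feature point.
    model_slices: list of (out, f) column blocks of the stage regressor.
    Each worker computes one partial product; their sum is the shape update."""
    def contribution(args):
        phi_p, R_p = args
        return R_p @ phi_p
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(contribution, zip(patch_features, model_slices)))
    return np.sum(parts, axis=0) + bias
```

The result equals multiplying the full regressor by the concatenated feature vector, which is why the per-point results can be merged by a simple reduction.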

5. Experiment Results & Discussion

In our experimental setup, we first evaluate the performance of the local features as described in Section 4. In Table 1, (Sep.) denotes the separability score according to (20), and (p/s) denotes the number of feature points that each algorithm can process per second. To evaluate performance on a mobile terminal, we use the Qualcomm Snapdragon 800 series platform. The dataset is a combination of LFW66 [33] and Helen [34].

As Table 1 shows, the traditional local feature algorithms still have certain advantages in robustness or running speed. The most stable feature is still HOG, while the fastest is LBP. Considering both stability and running speed, the Fast Retina Keypoint (FREAK) descriptor proposed in recent years has more practical value.

Next, we compare the original SDM algorithm with our optimizations in terms of efficiency and robustness. To demonstrate the effectiveness of each optimization step, the whole procedure is divided into three variants: C-SDM, the clustering optimization based on feature point correlation; PC-SDM, the dimensionality reduction based on principal component analysis combined with feature point clustering; and GPC-SDM, the final face alignment algorithm proposed in this paper, which adds dynamic GPU parallel optimization. First, we compare the algorithm obtained at each step with the original SDM. To quantify the comparison, we first define the accuracy of the face regression algorithm with

In this definition, the error is computed between the estimated shape of each image in the testing set and its ground truth. The denominator is the distance between the left and right eyes, which normalizes the estimation error. We define the success rate at a given precision as follows:

Here, an indicator marks whether the estimation error of a testing sample is below the precision threshold, and the sum of indicators is divided by the number of testing samples. The success rates of the different optimizations as the threshold grows are shown in Figure 4.
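The two evaluation measures can be sketched as follows. This is a minimal illustration with assumed details: the per-sample error is taken as the mean point-to-point distance (the paper's exact aggregation over feature points is not shown in this text), and the eye positions are passed in explicitly.

```python
import numpy as np

def normalized_error(pred, truth, left_eye, right_eye):
    """Mean point-to-point error divided by the inter-ocular distance.
    pred, truth: (p, 2) arrays; left_eye, right_eye: (2,) ground-truth positions."""
    interocular = np.linalg.norm(np.asarray(left_eye) - np.asarray(right_eye))
    return np.linalg.norm(pred - truth, axis=1).mean() / interocular

def success_rate(errors, threshold):
    """Fraction of test samples whose normalized error is below the threshold."""
    return float(np.mean(np.asarray(errors) < threshold))
```

Sweeping `threshold` over a range of values and plotting `success_rate` yields a curve of the kind reported in Figure 4.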

Taking Figures 4 and 5 together, we can see that each optimization phase improves significantly on the previous one. Compared with the original SDM algorithm, C-SDM shows a slight decline in success rate. PC-SDM, which integrates PCA with C-SDM, further improves performance without affecting the success rate. Although the GPU-based GPC-SDM should have the same success rate curve as PC-SDM, its success rate decreases slightly under high-precision conditions because of GPU floating-point truncation error. For all the algorithms, relaxing the accuracy condition leads to the same success rate as the original SDM algorithm. In short, the comparison shows that the optimization scheme proposed in this paper achieves a significant improvement in performance.

Finally, we compare our last two modifications with state-of-the-art face feature point detection methods that have a speed advantage. The experiments use a cross-dataset design to prevent overfitting; that is, if a model is trained on one dataset, say Helen, it must be evaluated on another dataset, say LFW. Thus, the precision may differ slightly from that published in the original works. The fps also decreases dramatically because the experimental environment is a mobile terminal. To simplify the experiments, we choose and compare both fps and success rate with state-of-the-art algorithms: the ensemble regression trees (EST) [9], the Gaussian process regression trees (F-cGPRT) [10], the local binary feature regression (F-LBF) [11], and the L1 norm penalty SDM (SDM(L1)) [12].

As shown in Table 2, PC-SDM and GPC-SDM boost the speed of face feature point detection while preserving the success rate of the SDM algorithm, which is the most robust. Other state-of-the-art algorithms use regression trees or forests and simple image features to improve speed. However, these approaches do not break out of the computational framework proposed by SDM. Moreover, regression tree algorithms tend to overfit the training set. This can be observed in Table 2: although these approaches were reported to be precise, their success rates drop under cross-dataset evaluation. One successful modification of SDM is SDM(L1) [12], which imposes sparsity on the SDM models and speeds up the algorithm easily and significantly. However, it also sacrifices success rate compared to the original SDM algorithm with support vector regression. Thus, although our proposal does not achieve the fastest performance even with GPU acceleration, it has more practical value when both robustness and real-time requirements are considered.

6. Conclusion

In this paper, we port the SDM algorithm to mobile devices through the following work. First, based on a correlation analysis between feature points over the statistics of face shapes, we introduce spectral clustering and PCA to reduce the model size of SDM. Second, we propose a local feature evaluation method based on linear discriminant analysis, which decouples feature evaluation from the face alignment algorithm and thereby simplifies the experimental work. Third, we propose a GPU-based SDM model that combines feature extraction and model solving. As a result, the potential of the SDM algorithm is more fully exploited, and the optimized algorithm is ported to mobile terminals.

Face alignment has broad application prospects, but research on this problem is still at an early stage. The success rate of face alignment and facial feature point tracking algorithms remains insufficient, especially in the presence of occlusion, lighting changes, night-time lighting, and 3D face pose changes. Studying how to overcome these problems is an important part of our future research.

Conflicts of Interest

The authors declare that they have no conflicts of interest.