Abstract

In this paper, we develop a new robust part-based model for facial landmark localization and detection via affine transformations. In contrast to existing works, the new algorithm incorporates affine transformations into the robust regression to tackle the potential effects of outliers, heavy sparse noise, occlusions, and illumination variations. As such, distorted or misaligned objects can be rectified by the affine transformations, and the patterns of occlusions and outliers can be explicitly separated from the true underlying objects in big data. Moreover, the search of the optimal parameters and affine transformations is cast as a constrained optimization problem. To reduce the computational load, a new set of equations is derived to update the involved parameters and the affine transformations iteratively in a round-robin manner. Compared with the state-of-the-art works, our parameter updates are more efficient, as we employ a fast alternating direction method of multipliers (ADMM) algorithm that solves for the parameters separately. Simulations show that the proposed method outperforms the state-of-the-art works on facial landmark localization and detection on the COFW, HELEN, and LFPW datasets.

1. Introduction

Localization of detailed facial features arises in a variety of applications including occlusion coherence for prediction in high-dimensional images [1], landmark localization [2–4], head pose estimation [5, 6], face image alignment [7], low-rank estimation [8–10], and object detection [11–14]. However, high-dimensional data are easily impacted by outliers and occlusions. It is thus important to develop a robust part-based model combining a robust regression algorithm with affine transformations to separate the landmark localization from the occlusions as in [15] and detection [16]. Therefore, developing a new algorithm that is resilient to adverse effects while processing big data, particularly for face detection and landmark localization, is indispensable.

A number of algorithms have been suggested for robust facial landmark localization and object detection [17–24]. For instance, [18, 25, 26] addressed these problems using a regression approach. However, these approaches are mostly approximate and, additionally, they are not pose-invariant. To improve performance and overcome this setback, deformable part models (DPMs) based on high-dimensional images [27–30] were considered. However, they cannot deal with a large amount of outliers and occlusions. Zhang et al. [31] proposed a tasks-constrained deep convolutional network (TCDCN) through multitask learning, which conducts the learning directly on high-dimensional images. However, it ignores the potentially adverse effects and the correlation between images, which jeopardizes the robustness of the algorithm. In addition, the misalignment problem is not tackled. To tackle the misalignment problem, [8, 9, 32–34] addressed several algorithms via affine transformations and the L2,1 norm. To circumvent this dilemma, Martinez et al. [35] considered robust facial landmark detection (RFLD) based on high-dimensional image data via L2,1 norm regularization against poor initializations. This approach, however, may lead to a suboptimal solution by performing a dimension reduction and a regression process. Zhang et al. [36] addressed joint face detection and alignment using multitask cascaded convolutional networks, and [37] addressed feature selection with multiview data; however, affine transformations are not taken into consideration. In [38], a fast and accurate facial landmark detection network using an augmented EMTCNN was suggested to improve upon [36], yet its performance is not promising. To boost the performance of these algorithms, [38–40] proposed deep network methods for detecting facial regions and landmarks from low-dimensional image representations; however, the proposed techniques still require affine transformations to deal with occlusions and illumination variations. To tackle this dilemma, [8, 32, 33] proposed new methods assisted by affine transformations in high-dimensional images to estimate the optimal parameters corresponding to the low-rank recovery.

To deal with outliers and occlusions, Tzimiropoulos and Pantic [41] proposed an active appearance model via fast simultaneous inverse composition (Fast-SIC) for landmark localization, which considers both the shape and appearance of images to enhance the face fitting accuracy. However, its localization outputs are distorted in the presence of outliers and occlusions, which is why the performance of the method is not promising. To solve this problem, [1, 42] proposed hierarchical part models (HPMs) relying on high-dimensional images to explicitly model partial occlusions and to detect facial points and occlusions simultaneously based on the landmark localization error. In addition, [43] proposed latent multiview self-representations for clustering via the tensor nuclear norm, while [44] addressed robust sparse low-rank embedding for image dimension reduction. However, in real scenarios, the outliers and occlusion patterns can be very diverse and almost unpredictable. Nada et al. [45] considered a new annotated unconstrained face detector (UF-DD) to enhance the performance; however, it lacks robustness to large-scale variations and requires an exhaustive search of the optimal transformation to obtain satisfactory performance.

2. Characteristics of the State-of-the-Art Methods

In this section, we contrast the characteristics of five state-of-the-art works, namely, TCDCN [31], RFLD [35], Fast-SIC [41], HPM [42], and UF-DD [45], with the proposed approach, which further adds affine transformations and multiple subspaces. The merits of the proposed method differ from the baselines in terms of its novelty and the way the parameters are updated. For instance, TCDCN [31] addressed facial landmark localization under occlusion and pose variation via deep multitask learning, but it still needs to incorporate affine transformations to explore deep multitask learning in dense landmark detection and high-dimensional image recovery. This method considers a few regularization parameters with about six different labels, which fail to penalize the complexity of the weights. To tackle the dilemmas of gross errors and variation in images, RFLD [35] proposed a novel regression method that substitutes the commonly used least squares regression adopted in [31] with the L2,1 norm, but its performance degrades when there are a large number of landmarks. Moreover, seven parameters are taken into account, which increases the computational complexity of the method. To alleviate the problems of facial deformable models on unconstrained images, Fast-SIC was developed in [41], where six different parameters, including the appearance, localization, and warping parameters, are involved in the problem formulation, which makes the parameters difficult to optimize. To solve this setback, [42] proposed HPM for facial landmark localization and occlusion estimation to boost the performance, with five parameters and a number of hierarchical network layers, which increases the computational complexity of the algorithm. To tackle the setback of outliers, heavy sparse noise, occlusions, and illumination variations, [45] proposed a novel UF-DD method for face detection in which five parameters are involved; however, the impacts of large outliers and heavy sparse noise are not alleviated. As a new annotated unconstrained face detector, UF-DD [45] has a relatively better time complexity than RFLD [35], Fast-SIC [41], and HPM [42]. In the proposed work, about seven parameters are involved, including the affine transformations. As the affine transformations are newly added to the new model, it has the advantage of pruning out the potential impacts of occlusions, illumination variations, outliers, and heavy sparse noise in landmark localization and face detection, from which we note that the time complexity of the proposed method is greatly reduced and the performance is enhanced. Another advantage of the new part-based model over the state-of-the-art works is that it considers multiple subspaces to constrain each parameter and exploits the advantages of positive semidefiniteness. However, the new technique still needs to establish the spatial dependency between different images, which could be captured by incorporating a spatial weight matrix; doing so would further boost the performance of the new part-based model.

In this paper, we present a new robust regression algorithm for facial landmark localization and face detection. Motivated by the idea of affine transformations in [8], and to be more resilient to occlusions and outliers, the new algorithm incorporates affine transformations into the robust regression based on the hierarchical part-based models [1, 42]. As such, distorted or misaligned images can be rectified by the affine transformations, and the patterns of occlusions and outliers can be explicitly separated. The affine transformations thus contribute to clearly separating the facial landmark localization from the occlusion estimation. The search of the optimal parameters and the affine transformations is cast as a constrained optimization problem. To mitigate the computational overhead, a new set of equations is derived to update the involved parameters and the affine transformations iteratively using the ADMM approach in a round-robin manner. Simulations show that the proposed method outperforms the state-of-the-art works on face detection and landmark localization on the common COFW, HELEN, and LFPW datasets. The major contributions of this work include the following:
(1) The affine transformations are incorporated into the robust regression based on the part-based models to take advantage of both schemes in the learning process.
(2) The affine transformations are aggregated with the low-rank-sparse representation, where the low-rank component lies in a union of subspaces instead of a single subspace. These transformations can fix the distortion or misalignment in a batch of corrupted images to render a more faithful image decomposition, thereby being more robust against heavy sparse errors and outliers.
(3) The ADMM method is employed to solve the new convex optimization problem, and a set of updating equations is derived to iteratively update the optimization variables and the affine transformations.
(4) A new set of updating equations is established to iteratively solve the constrained optimization problem.
(5) We conduct experiments on several benchmark datasets, and the experimental results demonstrate the effectiveness of our new method.

3. Problem Formulation

Given images , all of which contain the same object and are linearly correlated, where and denote the width and height of the images, respectively. Based on the part-based models [1], we approximate by a shape parameter, where is the vector stacking operator [46] and is the number of landmarks. However, in practice, the images are usually contaminated by occlusions and outliers, so the images can be represented as , where is a corrupted high-dimensional image, is a term accounting for the occlusions and outliers, and is an appearance parameter.

To solve the misalignment problem incurred by the outliers and occlusions in , the original images need to be linearized, which can be done by affine transformations. Applying a set of affine transformations, denoted by , ..., , to [33, 47], we then have , where denotes the transformed data. Assuming that the differences between consecutive affine transformations are small, we can approximate by , , in which denotes the number of parameters, denotes the Jacobian of the th training image with respect to , and denotes the standard basis for . Consequently, the main objective is to minimize the localization error by incorporating affine transformations for better performance, so the overall problem can be posed as the following constrained optimization problem where , , is a weight matrix, is the regression matrix mapping to , is a regression coefficient that controls the appearance variation, denotes a vector of all ones, , , and are the regularization parameters, with being the singular values of , , and .
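
For intuition, the following is a minimal sketch (not the authors' code) of the first-order linearization step described above: the warped image D∘(τ + Δτ) is approximated by D∘τ plus a Jacobian term, with the Jacobian obtained here by finite differences. The 6-parameter warp model, step size, and image size are illustrative assumptions.

```python
# Sketch of linearizing an affine warp around the current parameters tau.
import numpy as np
from scipy.ndimage import affine_transform

def warp(image, tau):
    """Apply a 6-parameter affine warp tau = [a11, a12, a21, a22, t1, t2]."""
    A = np.array([[tau[0], tau[1]], [tau[2], tau[3]]])
    t = np.array([tau[4], tau[5]])
    return affine_transform(image, A, offset=t, order=1)

def jacobian(image, tau, eps=1e-4):
    """Numerical Jacobian of vec(warp(image, tau)) with respect to tau."""
    base = warp(image, tau).ravel()
    J = np.zeros((base.size, len(tau)))
    for k in range(len(tau)):               # perturb one parameter at a time
        tau_k = np.array(tau, dtype=float)
        tau_k[k] += eps
        J[:, k] = (warp(image, tau_k).ravel() - base) / eps
    return J

# First-order approximation: vec(D o (tau + dtau)) ~ vec(D o tau) + J @ dtau
image = np.random.rand(64, 64)
tau0 = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0])   # identity warp
J = jacobian(image, tau0)
dtau = np.array([0.0, 0.01, 0.0, 0.0, 0.5, 0.0])  # small shear + shift
approx = warp(image, tau0).ravel() + J @ dtau
```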

3.1. Proposed Approach

To solve the constrained optimization problem (2), we consider the augmented Lagrangian given by where , , and are the Lagrangian multipliers, , , , and are the penalty parameters, and . This problem can be solved using the ADMM. The main requirement of the ADMM is convexity; we give sufficient conditions under which the algorithm asymptotically reaches the standard first-order necessary conditions for local optimality.

Then, based on a linearized alternating direction method [48], the augmented Lagrangian multiplier in (3) can be reexpressed as

where . Directly solving (4) is computationally expensive, so in what follows, we derive a set of equations to iteratively update the parameters in (4). Specifically, we minimize the Lagrangian function to update all of the involved parameters alternately.

First, the updates of and are determined, respectively, by where is the iteration index. By ignoring all irrelevant terms and applying ordinary least squares regression to and , the updates of and are given by where , , and is an identity matrix.
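
As an illustration of an ordinary least squares step of this kind, the sketch below solves a generic regularized least-squares subproblem in closed form; the objective, variable names, and regularization weight are assumptions for illustration and not the authors' exact update.

```python
# Generic closed-form least-squares update (illustrative only).
import numpy as np

def ls_update(Y, X, lam=1e-3):
    """Closed-form solution of min_W ||Y - W @ X||_F^2 + lam * ||W||_F^2."""
    d = X.shape[0]
    return Y @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(d))

X = np.random.randn(20, 100)   # e.g., stacked shape parameters
Y = np.random.randn(5, 100)    # e.g., corresponding appearance parameters
W = ls_update(Y, X)            # regression matrix mapping X to Y
```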

Second, to update the shape parameter , we keep , , , , , and fixed as constants, and can be determined by

Using the soft shrinkage operator on the singular values, the update of is obtained as where is the singular value threshold, , , is the proximal parameter, and denotes the spectral radius of .
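
A minimal sketch of this singular value thresholding step (soft shrinkage applied to the singular values) is given below; the variable names and threshold value are illustrative.

```python
# Singular value thresholding: U * max(S - mu, 0) * V^T.
import numpy as np

def svt(X, mu):
    """Shrink the singular values of X by mu and rebuild the matrix."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(S - mu, 0.0)) @ Vt

X = np.random.randn(50, 30)
X_lowrank = svt(X, mu=1.0)   # small singular values are shrunk to zero
```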

Third, to update , we fix , , , , , and , and can be determined by

By invoking the linearized alternating direction method, is updated by where is the soft shrinkage operator [49, 50].
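
For completeness, the element-wise soft shrinkage (soft thresholding) operator [49, 50] can be sketched as follows; the threshold value is illustrative.

```python
# Element-wise soft shrinkage: sign(x) * max(|x| - mu, 0).
import numpy as np

def soft_shrink(X, mu):
    """Shrink every entry of X toward zero by mu."""
    return np.sign(X) * np.maximum(np.abs(X) - mu, 0.0)

E = soft_shrink(np.random.randn(50, 30), mu=0.1)   # sparse residual estimate
```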

Next, to get the optimal update of , we again keep , , , , , and as constants; then the update of can be determined by

Then, by employing the singular value thresholding operator, the update of is given by where and .

To find the optimal update of , we fix , , , , , and as constants, then can be obtained by

Similarly, by applying the soft thresholding operator and the augmented Lagrangian multiplier to subproblem (14), is updated by

Since the affine transformations are incorporated, an additional parameter needs to be optimized. Therefore, we derive an additional update for .

To do so, we fix , , , , , and ; then the update of can be obtained by

Along the same line, the affine transformations can be updated by where is the Moore-Penrose pseudoinverse of .
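
A minimal sketch of this affine update is given below, assuming that the increment of the transformation parameters for each image is obtained by applying the Moore-Penrose pseudoinverse of its Jacobian to the current residual; the dimensions and variable names are illustrative assumptions.

```python
# Affine parameter increment via the Moore-Penrose pseudoinverse.
import numpy as np

def update_affine(J_i, residual_i):
    """delta_tau_i = pinv(J_i) @ residual_i."""
    return np.linalg.pinv(J_i) @ residual_i

J_i = np.random.randn(64 * 64, 6)        # Jacobian of the i-th image w.r.t. tau
residual_i = np.random.randn(64 * 64)    # current reconstruction residual
tau_i = np.zeros(6)                      # current affine parameters
tau_i = tau_i + update_affine(J_i, residual_i)   # tau_i <- tau_i + delta_tau_i
```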

Similarly, the updates of , , and are given, respectively, by

The updates of the regularization parameters , , and are given by where is a properly chosen constant and are tunable parameters adjusting the convergence of the proposed algorithm. We sequentially update , , , , , , and independently by keeping all of the other parameters unchanged. First, the regression matrix and are updated by (6). Next, we update , , , , and the optimal affine transformation parameters by (8), (10), (12), (14), and (16), respectively. Finally, the Lagrangian multipliers , , and are updated by (17), and the regularization parameters , , and are updated by (18). These updating equations proceed in a round-robin manner until convergence. Since a monotonically decreasing sequence that is bounded below converges, the above algorithm is guaranteed to converge. It is noteworthy that all of the parameter updates are obtained from the augmented Lagrangian multiplier, ordinary least squares procedures, and soft-thresholding operators. A summary of the proposed method compared with other related works is given in Table 1.
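
To make the round-robin procedure concrete, the sketch below assembles the thresholding operators sketched above into a simple ADMM loop for a toy low-rank-plus-sparse model with a single constraint and a single multiplier. This is a structural illustration only: the constraint, parameter names, and penalty schedule are simplifying assumptions, not the authors' full model with affine transformations and multiple subspaces.

```python
# Round-robin ADMM skeleton for a toy decomposition D = A + E.
import numpy as np

def svt(X, thr):
    """Soft shrinkage of the singular values (low-rank proximal step)."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(S - thr, 0.0)) @ Vt

def shrink(X, thr):
    """Element-wise soft shrinkage (sparse proximal step)."""
    return np.sign(X) * np.maximum(np.abs(X) - thr, 0.0)

def admm_round_robin(D, lam=0.1, rho=1.1, mu=1e-2, mu_max=1e6,
                     max_iter=500, tol=1e-7):
    """Cycle through the variable updates until the residual converges."""
    A = np.zeros_like(D)
    E = np.zeros_like(D)
    Y = np.zeros_like(D)                           # Lagrangian multiplier
    for _ in range(max_iter):
        A = svt(D - E + Y / mu, 1.0 / mu)          # low-rank update
        E = shrink(D - A + Y / mu, lam / mu)       # sparse update
        residual = D - A - E
        Y = Y + mu * residual                      # multiplier update
        mu = min(rho * mu, mu_max)                 # penalty parameter update
        if np.linalg.norm(residual) / max(np.linalg.norm(D), 1e-12) < tol:
            break
    return A, E

# Toy usage: recover a low-rank part plus sparse corruptions.
rng = np.random.default_rng(0)
L = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 60))
S = (rng.random((60, 60)) < 0.05) * rng.standard_normal((60, 60)) * 5.0
A_hat, E_hat = admm_round_robin(L + S)
```

In the proposed method, the per-variable solvers called inside such a loop would be the closed-form updates derived above (least squares, singular value thresholding, soft shrinkage, and the affine parameter update), applied in sequence within each iteration.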

4. Experimental Results

In this section, simulations are conducted to assess the effectiveness of the proposed algorithm. Three datasets are considered in the simulations: the Labeled Face Parts in the Wild (LFPW) [65], the HELEN68 [21], and the more challenging Caltech Occluded Faces in the Wild (COFW) [66] datasets. We first consider the affine transformations that tackle the adverse effects of outliers, heavy sparse noise, occlusions, and illumination variations; building on the mathematical derivations above, we then support the effectiveness of the proposed method through numerical simulations.

4.1. Comparison with the State-of-the-Art Methods

In this subsection, we compare the proposed approach, which adds affine transformations, with some recently reported works in terms of precision-recall curves, landmark localization error, and time complexity on the aforementioned three databases. Five state-of-the-art methods, namely, TCDCN [31], RFLD [35], Fast-SIC [41], HPM [42], and UF-DD [45], are compared against the proposed algorithm in terms of performance and time complexity. The results for these baselines are obtained by running their publicly available codes.

4.2. Face Detection

First, we assess the proposed approach for face detection on the LFPW and COFW databases in terms of the precision-recall curve. The comparisons of the proposed approach with UF-DD [45] and HPM [42] are shown in Figure 1, from which we can see that UF-DD outperforms HPM, as it downsamples the high-resolution images using an unconstrained face detector to tackle the impact of large variations. As noted in Figure 1, HPM without occlusion performs better than HPM with occlusion, multiresolution HPM with rotation outperforms HPM with occlusion, and multiresolution HPM without rotation outperforms all other HPM versions in face detection. We can also see from Figure 1 that the proposed algorithm is superior to all HPM versions and UF-DD, achieving a better recall rate on both datasets. The superiority of the new approach is due to the incorporation of the affine transformations and multiple subspaces and the constraining of the parameters to be positive semidefinite: by combining the affine transformations with the part-based model, the new algorithm is more robust to the various adverse effects of outliers, illumination variations, occlusions, and heavy sparse noise. To further justify the effectiveness of the proposed method, a comparison of the localization error is also considered, from which we note that the proposed method outperforms all five baselines. This is because the affine transformations first align the misaligned images and then prune out the potential impact of the adverse effects, which boosts the performance of the proposed method.
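
For reference, precision-recall points of the kind plotted in Figure 1 are typically computed by matching detected boxes to ground-truth boxes via intersection over union (IoU); the sketch below assumes a 0.5 IoU threshold, which is a common convention rather than a value stated in the text.

```python
# Precision and recall for face detection via IoU-based matching.
import numpy as np

def iou(a, b):
    """Intersection over union of boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-12)

def precision_recall(detections, ground_truth, thr=0.5):
    """Count a detection as a true positive if it matches an unused GT box."""
    matched = [False] * len(ground_truth)
    tp = 0
    for det in detections:
        for j, gt in enumerate(ground_truth):
            if not matched[j] and iou(det, gt) >= thr:
                matched[j] = True
                tp += 1
                break
    fp = len(detections) - tp
    fn = len(ground_truth) - tp
    return tp / max(tp + fp, 1), tp / max(tp + fn, 1)

dets = [(10, 10, 50, 50), (100, 100, 140, 150)]
gts = [(12, 11, 48, 52)]
p, r = precision_recall(dets, gts)   # p = 0.5, r = 1.0
```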

4.3. Facial Landmark Localization

In this subsection, we compare the proposed algorithm with some state-of-the-art methods for facial landmark localization on the HELEN68 and COFW datasets. The comparison with the five baselines, TCDCN [31], RFLD [35], Fast-SIC [41], HPM [42], and UF-DD [45], in terms of the root mean square error is shown in Table 2, from which we can see that TCDCN has the largest error among the baselines, as it ignores the multicollinearity across multiple tasks when localizing the objects. Also, RFLD [35] and Fast-SIC [41] cannot provide satisfactory landmark localization accuracy, as their localization outputs are distorted and the patterns of occlusions cannot be explicitly separated. We can also notice that HPM outperforms [31, 35, 41], as it utilizes hierarchical structures to explicitly model the occlusions and attain accurate landmark localization. UF-DD also outperforms [31, 35, 41], since it further downsamples the high-resolution images using a face detector. Finally, the proposed method outperforms all the baselines on both datasets. Thus, the results in Table 2 justify the effectiveness of the proposed approach in terms of the localization error for facial landmark localization as compared with the main state-of-the-art works. This is because it aggregates the affine-transformation-assisted robust regression with the part-based models to accurately localize facial landmarks even under outliers and occlusions.
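
For reference, a landmark localization error of the kind reported in Table 2 is typically computed as the mean per-landmark Euclidean error, often normalized by the interocular distance; the normalization in the sketch below follows this common convention and is an assumption here.

```python
# Normalized mean landmark localization error.
import numpy as np

def localization_error(pred, gt, left_eye_idx, right_eye_idx):
    """pred, gt: (n_landmarks, 2) arrays of (x, y) coordinates."""
    interocular = np.linalg.norm(gt[left_eye_idx] - gt[right_eye_idx])
    per_landmark = np.linalg.norm(pred - gt, axis=1)   # Euclidean error per point
    return per_landmark.mean() / (interocular + 1e-12)

gt = np.array([[30.0, 40.0], [70.0, 40.0], [50.0, 60.0]])   # e.g., eyes + nose
pred = gt + np.random.randn(3, 2) * 1.5
err = localization_error(pred, gt, left_eye_idx=0, right_eye_idx=1)
```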

As an illustration, we also show three images with facial landmark localization results from the aforementioned algorithms in Figure 2, from which we can see that TCDCN, RFLD, and Fast-SIC provide poor localization. This is because these approaches fail to clearly separate the landmarks from the occluded ones: TCDCN [31] considers combined multitask learning and ignores multicollinearity, while RFLD [35] concatenates a large number of histogram of oriented gradients (HOG) descriptors with multiple initializations, which makes the separation challenging when the images are distorted by occlusions, as shown in Figures 2(a)–2(c). Moreover, Fast-SIC [41] is relatively better than TCDCN [31] and RFLD [35] in separating the landmark localization from adverse effects such as occlusions, illumination variations, outliers, and heavy sparse noise, as it considers the project-out optimization framework. UF-DD [45], which incorporates the newly annotated unconstrained face detection and a downsampling strategy for high-resolution images, provides better localization than [31, 35, 41, 42], as it better separates the localized landmarks (blue) and the occluded landmarks (red) explicitly, as depicted in Figure 2(e). The proposed approach works better under the impact of outliers and occlusions and explicitly separates the landmarks from the occlusions, as shown in Figure 2(f). This once again justifies the effectiveness of the combination of the affine transformations and the part-based models, which makes the proposed method more robust to outliers and to images that are highly influenced by occlusions.

In summary, the proposed method performs better in landmark localization and face detection because the affine transformations mitigate the potential impact of outliers, heavy sparse noise, occlusions, and illumination variations. However, the major drawback of the proposed method is that it does not account for the spatial dependency between images, which would require incorporating a spatial weight matrix into the mathematical formulation.

5. Computational Complexity

The time complexity of the proposed method as compared to the state-of-the-art works is described in this section. The computational load of the baselines, namely, TCDCN [31], RFLD [35], Fast-SIC [41], HPM [42], and UF-DD [45], along with that of the proposed method on a standard desktop computer is given in Table 3, from which we note that the proposed method has a smaller running time because fewer parameters are involved in the updates. Additionally, our algorithm can handle batches of over one hundred images in a few minutes on a standard PC, as the number of parameters involved is small compared to the state-of-the-art works. As shown in Table 3, the new algorithm achieves a lower time complexity than the state-of-the-art algorithms. This is because the affine transformations remove the extreme values, and various regularization parameters keep the proposed model stable against gross errors, including outliers, occlusions, and illumination variations.

6. Conclusions

In this paper, we developed a new model for facial landmark localization and detection comprising affine transformations and the ADMM method. The new algorithm combines the efficacious affine-transformation-assisted robust regression with the part-based models to enjoy the advantages of both schemes. The search of the affine transformations and the optimization variables is formulated as a constrained convex optimization problem. The ADMM approach is then employed in a round-robin manner to keep the time complexity low, and a new set of equations is established to iteratively update the optimization variables and the affine transformations. Simulations on the common COFW, HELEN, and LFPW datasets justify the effectiveness of the new algorithm compared to the state-of-the-art works. In this work, the affine transformations are used to correct the alignment of the individual images, but the issue of spatial dependency between images is not considered. This can be addressed in future work by incorporating a spatial weight matrix between images. Another direction for future work is to extend this new part-based model to high-dimensional tensor datasets.

Data Availability

The data used in this article are freely available for the user.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported by the National Key Research and Development Program of China under Grant no. 2018YFB1305700 and Scientific and Technological Program of Quanzhou City under Grant no. 2019CT009. We thank Addis Ababa University, Ethiopia.