Abstract

Facial makeup significantly changes the perceived appearance of the face and reduces the accuracy of face recognition. To meet the needs of smart city applications, in this study we introduce a novel joint subspace and low-rank coding method for makeup face recognition. To exploit more discriminative information in face images, we use feature projection to find a proper subspace and learn a discriminative dictionary in that subspace. In addition, we impose a low-rank constraint in the dictionary learning. We then design a joint learning framework and use an iterative optimization strategy to obtain all parameters simultaneously. Experiments on a real-world dataset show good performance and demonstrate the validity of the proposed method.

1. Introduction

Digital technologies represented by artificial intelligence, the Internet of Things (IoT), and cloud computing are developing vigorously in support of smart cities. A smart city aims to use various kinds of information technology to integrate the systems and services of the city, which improves the utilization efficiency of resources and the quality of life of residents [1, 2]. The number of IoT devices and sensors is expected to reach 40 billion by 2025 [3]. As the amount of data increases, the IoT industry is expanding from initial connectivity toward intelligence and autonomy. Simultaneously, artificial intelligence, as a powerful tool, provides intelligence for smart cities, and a large number of machine learning algorithms have been put into practical use so that equipment can collect and process data autonomously. In this setting, artificial intelligence helps to collect relevant data, identify alternatives, make choices among them, review decisions, and make predictions [4, 5]. Automatic face recognition is considered one of the important techniques for realizing smart cities. It plays an interactive role in human-computer interaction and intelligent transportation, for example in access control systems, community management information systems, and person-of-interest monitoring [6, 7]. For example, face recognition technology enables monitoring in crowded places such as bus and railway stations: faces recognized in video in real time are compared with a database of people of key concern to public security, and real-time alarms can be raised. In smart cities, face recognition technology can also be applied to examinations in schools: at the examination centre, candidates verify their identity through a face recognition system to ensure fairness and prevent test substitution.

Due to variations in illumination, face angle, pose, and cameras, face images belonging to the same person may look very different. In particular, in real-world applications, facial makeup significantly changes the perceived appearance of the face and reduces the accuracy of face recognition. The literature [8-10] indicates that facial makeup has a negative impact on the performance of the majority of face recognition algorithms. Figure 1 shows example face image pairs from the Disguised Faces in the Wild (DFW) dataset [11]: the left image in each pair is without makeup, and the right image is with makeup. These before-and-after pairs show intuitively how significantly makeup changes facial appearance. For these reasons, makeup face recognition has become a difficult problem in face classification. To develop a powerful face recognition system, the influence of cosmetics on face verification needs to be addressed. Yan [12] introduced multiple feature descriptors into metric learning, learning multiple distance metrics by combining different facial features from visual and audio information. Chen et al. [13] developed a method for the automatic detection of makeup in face images. This method extracts a feature vector that captures the shape, texture, and color of face images and uses SVM and Adaboost to determine whether makeup is present. In addition to extracting features from the whole face, the method also uses parts of the face associated with the left eye, right eye, and mouth. Kose et al. [14] developed a facial makeup detector to reduce the impact of makeup in face recognition. Their method exploits the shape and texture information of the face and uses SVM and Alligator as classifiers. Wang and Kumar [15] developed a framework for facial makeup detection and removal, in which locality-constrained low-rank dictionary learning is used for makeup detection and locality-constrained coupled dictionary learning is used for makeup removal. Although there have been some research results on makeup face recognition, the performance of these methods in real-world applications still needs to be improved.

Recently, dictionary learning has achieved great success in the field of face recognition. Traditional dictionary learning learns the sparse representation and the dictionary in the original data space. However, makeup face verification is affected not only by cosmetics but also by illumination and pose. In this study, we develop a joint subspace and low-rank coding method for makeup face recognition (JSLC). We find a feature projection space and project the face images into it. At the same time, we learn a discriminative dictionary in this feature subspace, and each face image is encoded by a discriminative code. To solve for the subspace and the dictionary simultaneously, we build a joint learning model for them. In addition, to obtain more discriminative information in the subspace, we impose a low-rank constraint in the dictionary learning. The optimal subspace projection matrix, dictionary, and sparse coefficients can be obtained simultaneously by an alternating iterative optimization strategy.

We organize the rest of this paper as follows. Related work on makeup face recognition is reviewed in Section 2. The proposed method is introduced in Section 3. The results of the comparison experiments are presented in Section 4. Finally, conclusions and future work are summarized in Section 5.

2. Related Work

From the perspective of AI, makeup face recognition comprises two stages: feature extraction and classification. The commonly used feature extraction methods for face recognition are geometric methods and appearance methods [9]. Geometric methods use the geometric shape of facial components, representing facial characteristics by predefined marker positions on salient facial features; appearance methods use the textures of facial images, including creases and furrows. Since geometric methods express facial characteristics through a limited set of fiducial points on the human face, they usually require accurate facial feature detection. Thus, appearance methods often perform better in face recognition. The commonly used local binary patterns (LBP) and Gabor filters are both appearance methods. There are many successful classification methods for face recognition, such as SVM, metric learning, dictionary learning, Adaboost, and so on [16, 17]. Owing to its sparsity and noise alleviation, dictionary learning has demonstrated its advantages in image processing tasks.

Dictionary learning methods can approximate each sample by using a linear combination of a few atoms from the learned dictionary [18, 19]. Given training samples $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$, the dictionary $D \in \mathbb{R}^{d \times m}$ and the corresponding sparse coefficients $A \in \mathbb{R}^{m \times n}$ can be trained by the following formula:

$$\min_{D, A} \; \|X - DA\|_F^2 + \lambda \|A\|_1, \tag{1}$$

where $\|\cdot\|_F$ is the Frobenius norm, $\|A\|_1$ is the sparsity regularization, and $\lambda$ is the balance parameter.

Equation (1) was originally designed for reconstruction tasks. In order to use dictionary learning for classification tasks, more discriminative or supervision information is incorporated into the dictionary learning. Its optimization problem can then be written as

$$\min_{D, A} \; \|X - DA\|_F^2 + \lambda \|A\|_1 + \eta\, f(A), \tag{2}$$

where the function $f(\cdot)$ can be a classifier, a discrimination criterion, or a label consistency term, and $\eta$ is a balance parameter.
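For intuition, the following is a minimal sketch of learning a dictionary and sparse codes in the spirit of equation (1), using scikit-learn's DictionaryLearning on random placeholder data. Note that scikit-learn stores samples as rows, so the roles of D and A appear transposed relative to equation (1).

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Minimal sketch of equation (1): learn a dictionary and sparse codes so that
# X is approximated by the codes times the dictionary, with an l1 penalty on
# the codes. Random data stands in for face features.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 64))          # 200 samples, 64-dimensional features

dl = DictionaryLearning(n_components=32,    # number of dictionary atoms m
                        alpha=1.0,          # balance parameter (lambda in eq. (1))
                        max_iter=100,
                        random_state=0)
A = dl.fit_transform(X)                     # sparse coefficients, shape (200, 32)
D = dl.components_                          # dictionary atoms, shape (32, 64)

recon_err = np.linalg.norm(X - A @ D)       # Frobenius reconstruction error
print(recon_err, np.mean(A != 0))           # error and fraction of nonzero codes
```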

3. Joint Subspace and Low-Rank Coding Method for Makeup Face Recognition

3.1. Objective Function of JSLC

Because the appearance of a person's face changes significantly after makeup, in this study we use subspace learning to project the original data samples and preserve the discriminative information in the feature subspace. Subspace learning embedded into dictionary learning can be represented as

$$J_1(W, D, A) = \|WX - DA\|_F^2 + \lambda \|A\|_1 + \alpha \|X - W^{\top}WX\|_F^2, \tag{3}$$

where $W \in \mathbb{R}^{p \times d}$ is the projection matrix, $p$ is the dimension of the subspace, and $\lambda$ and $\alpha$ are two positive parameters. $J_1$ has three terms. The first two are the dictionary learning terms in the subspace, and their goal is to minimize the representation error. The third is the regularization term, which plays the role of principal component analysis (PCA), by which the discriminant information in the original space can be preserved in the projection subspace [20].
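To make the roles of the three terms concrete, the following minimal sketch evaluates an objective of the form in equation (3) for given W, D, and A. The concrete regularizers are assumptions based on the description above, and the data are random placeholders.

```python
import numpy as np

def j1_objective(X, W, D, A, lam, alpha):
    """Evaluate a subspace-embedded dictionary learning objective of the
    assumed form in equation (3):
      ||W X - D A||_F^2 + lam * ||A||_1 + alpha * ||X - W^T W X||_F^2.
    X: d x n data, W: p x d projection, D: p x m dictionary, A: m x n codes."""
    recon = np.linalg.norm(W @ X - D @ A) ** 2           # representation error in the subspace
    sparse = lam * np.abs(A).sum()                       # sparsity regularization
    pca = alpha * np.linalg.norm(X - W.T @ W @ X) ** 2   # PCA-like preservation term
    return recon + sparse + pca

# toy shapes: d=64 original dim, p=16 subspace dim, m=32 atoms, n=100 samples
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 100))
W = rng.standard_normal((16, 64))
D = rng.standard_normal((16, 32))
A = rng.standard_normal((32, 100))
print(j1_objective(X, W, D, A, lam=1.0, alpha=1.0))
```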

Then, we consider using an affinity matrix $Q$ to measure the discriminant ability of the sparse codes; i.e., if two face images are from the same person and look similar, the difference between their sparse codes is minimized; if two face images are from different persons but look similar, the difference between their sparse codes is maximized, so that discriminative information can be exploited. This idea can be represented as

$$J_2(A) = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} Q_{ij}\,\|a_i - a_j\|_2^2, \tag{4}$$

where $a_i$ is the sparse code of image $x_i$.

The element $Q_{ij}$ of matrix $Q$ can be written as

$$Q_{ij} = \begin{cases} 1, & x_j \in N_k(x_i) \ \text{and} \ l(x_i) = l(x_j), \\ -1, & x_j \in N_k(x_i) \ \text{and} \ l(x_i) \neq l(x_j), \\ 0, & \text{otherwise}, \end{cases} \tag{5}$$

where the function $N_k(x_i)$ returns the $k$-nearest neighbors of image $x_i$, $l(x_i) = l(x_j)$ means that images $x_i$ and $x_j$ are from the same person, and $l(x_i) \neq l(x_j)$ means that images $x_i$ and $x_j$ are from different persons.

We denote by $S$ the diagonal matrix whose diagonal elements are the sums of the row elements of $Q$, i.e., $S_{ii} = \sum_{j} Q_{ij}$. The term $J_2(A)$ can be simplified as

$$J_2(A) = \operatorname{tr}\!\left(A L A^{\top}\right), \tag{6}$$

where $L = S - Q$.
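The construction of Q and the simplified trace form can be sketched as follows. The neighbor search, the symmetrization of Q (added so the trace identity holds exactly), and the helper names are illustrative choices, not taken from the source.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_affinity(X, labels, k=5):
    """Build the affinity matrix Q described above: +1 for k-nearest neighbors
    with the same label, -1 for k-nearest neighbors with a different label,
    0 otherwise. X is d x n with columns as images."""
    n = X.shape[1]
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X.T)   # +1 because each point is its own neighbor
    _, idx = nn.kneighbors(X.T)
    Q = np.zeros((n, n))
    for i in range(n):
        for j in idx[i, 1:]:                            # skip the point itself
            Q[i, j] = 1.0 if labels[i] == labels[j] else -1.0
    return (Q + Q.T) / 2                                # symmetrize

def trace_term(A, Q):
    """Compute tr(A L A^T) with L = S - Q, where S holds the row sums of Q."""
    L = np.diag(Q.sum(axis=1)) - Q
    return np.trace(A @ L @ A.T)

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 60))
labels = np.repeat(np.arange(6), 10)                    # 6 persons, 10 images each
A = rng.standard_normal((32, 60))
print(trace_term(A, build_affinity(X, labels)))
```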

In order to obtain more discriminative information in the subspace, we impose a low-rank constraint on $A$ in the dictionary learning. Following [21], we represent the rank of $A$ through the factorization $A = EH$, where $E \in \mathbb{R}^{m \times r}$ and $H \in \mathbb{R}^{r \times n}$ with $r \ll \min(m, n)$. Thus, the objective function of the rank minimization can be written as

$$J_3(A, E, H) = \|A - EH\|_F^2. \tag{7}$$
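If the factorization surrogate is read as in equation (7) (this concrete reading is an assumption; the source does not show the formula), then with A fixed the subproblems over E and H are least-squares problems with closed-form solutions, which can be sketched as alternating updates:

```python
import numpy as np

def low_rank_factorize(A, r, n_iter=50):
    """Alternating least-squares sketch of the rank-r surrogate A ≈ E H
    (one possible reading of the factorization described above):
    E is m x r, H is r x n, and each subproblem has a closed-form solution."""
    m, n = A.shape
    rng = np.random.default_rng(0)
    E = rng.standard_normal((m, r))
    H = rng.standard_normal((r, n))
    eps = 1e-8                                    # tiny ridge for numerical stability
    for _ in range(n_iter):
        # closed-form update of H with E fixed
        H = np.linalg.solve(E.T @ E + eps * np.eye(r), E.T @ A)
        # closed-form update of E with H fixed
        E = A @ H.T @ np.linalg.inv(H @ H.T + eps * np.eye(r))
    return E, H

A = np.random.default_rng(1).standard_normal((32, 100))
E, H = low_rank_factorize(A, r=5)
print(np.linalg.norm(A - E @ H))                  # residual of the rank-5 approximation
```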

We combine $J_1$, $J_2$, and $J_3$ together and obtain the objective function of JSLC, i.e.,

$$\min_{W, D, A, E, H} \; J_1(W, D, A) + \beta J_2(A) + \gamma J_3(A, E, H), \tag{8}$$

where $\beta$ and $\gamma$ are positive balance parameters.

Equation (8) is clearly a joint learning formulation for subspace learning and dictionary learning. During the optimization process, the subspace gradually enhances the discriminative ability of the learned dictionary, and the learned dictionary in turn improves the discriminative ability of the subspace.

3.2. Optimization

In this subsection, we solve equation (8) using an alternating optimization strategy. After introducing suitable auxiliary variables, the objective function of JSLC can be rewritten as equation (9).

(1) Update W: with B, A, E, and H fixed, we obtain the subproblem in equation (10), which has a closed-form solution. We obtain R from equation (11) and solve Z from equation (12), which also admits a closed-form solution. Then, W is obtained by equation (13).

(2) Update D: with A, W, E, and H fixed, we obtain the subproblem in equation (14). We use the Lagrange dual approach to solve equation (14); the closed-form solution of D is given by equation (15), in which a very small diagonal matrix is involved. Then, the matrix B is obtained from D through the pseudo-inverse operation.

(3) Update A: with W, D, E, and H fixed, the objective function is rewritten as equation (16). Since each term in equation (16) is quadratic, setting the derivative with respect to A to zero yields equation (17). Equation (17) is a standard Sylvester equation, which we solve with the Bartels-Stewart algorithm [22] (a sketch of this step is given after the list).

(4) Update E and H: with W, D, and A fixed, equation (9) reduces to a subproblem over E and H, whose closed-form solutions are given by equations (18) and (19).
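As an illustration of step (3), the following is a minimal sketch of solving a Sylvester equation of the form PA + AQ = R with SciPy's Bartels-Stewart-based solver. The matrices P, Q, and R here are random placeholders, not the actual coefficient matrices of equation (17), which depend on the derivation omitted above.

```python
import numpy as np
from scipy.linalg import solve_sylvester

# Sketch of the A-update in step (3), assuming the stationarity condition can
# be arranged as a Sylvester equation P A + A Q = R. P, Q, and R below are
# random placeholders, not the coefficient matrices of equation (17).
rng = np.random.default_rng(0)
m, n = 50, 200                        # m: dictionary atoms, n: training samples
P = rng.standard_normal((m, m))
P = P @ P.T + m * np.eye(m)           # make P symmetric and well conditioned
Q = rng.standard_normal((n, n))
Q = Q @ Q.T + n * np.eye(n)
R = rng.standard_normal((m, n))

A = solve_sylvester(P, Q, R)          # Bartels-Stewart algorithm [22]
print(np.allclose(P @ A + A @ Q, R))  # verify the residual is negligible
```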

When we obtain the optimal dictionary $D$ and projection matrix $W$, we can obtain the sparse coding of a testing image $x_t$ by

$$\min_{a_t} \; \|W x_t - D a_t\|_2^2 + \lambda \|a_t\|_1. \tag{20}$$

Finally, we use the closest-distance strategy to perform the testing task.
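A minimal sketch of the test stage is given below: project the probe image, encode it over the learned dictionary (here with scikit-learn's Lasso as a stand-in l1 solver for equation (20)), and match by the smallest distance between codes. The function and variable names are illustrative, not from the source.

```python
import numpy as np
from sklearn.linear_model import Lasso

def encode(x, W, D, lam=0.1):
    """Sparse coding of a single image x over dictionary D in the subspace W.
    Uses Lasso as a stand-in l1 solver; the paper's exact coding formulation
    may differ."""
    y = W @ x                                        # project into the subspace
    lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
    lasso.fit(D, y)                                  # y ≈ D a with an l1 penalty on a
    return lasso.coef_

def match(x_probe, gallery_codes, gallery_ids, W, D):
    """Return the identity whose gallery code is closest to the probe code."""
    a = encode(x_probe, W, D)
    dists = np.linalg.norm(gallery_codes - a, axis=1)
    return gallery_ids[int(np.argmin(dists))]

# toy usage with random placeholders
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64))
D = rng.standard_normal((16, 32))
gallery = rng.standard_normal((10, 64))
ids = np.arange(10)
codes = np.stack([encode(g, W, D) for g in gallery])
print(match(gallery[3], codes, ids, W, D))           # probe matches gallery identity 3
```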

Based on the above analysis, the proposed JSLC method is presented in Algorithm 1.

 Input: a dataset of facial images X, including images with makeup and images without makeup;
 Output: dictionary D and projection matrix W.
 Initialization: random matrix B; construct R's columns using the eigenvectors corresponding to the top p eigenvalues of C;
 Repeat
   Update W using equation (13) with D, A, E, and H fixed;
   Update D using equation (15) with A, W, E, and H fixed;
   Update A using equation (17) with W, D, E, and H fixed;
   Update E and H using equations (18) and (19) with W, D, and A fixed;
 Until converged

4. Experiments

4.1. Datasets and Experimental Settings

In the experiment, we use the widely used DFW face dataset [11]. The DFW dataset contains 11,155 images of 1,000 people collected from the Internet, including face images of movie stars, singers, athletes, and politicians. Each person has one face image without makeup and multiple face images with makeup, and there are differences in pose, age, lighting, and expression. Wearing glasses or hats is also treated as a category of makeup. Example face images from the DFW dataset are shown in Figure 2. In this paper, we use the histogram of oriented gradients (HOG) [23], local binary patterns (LBP) [24], and three-patch LBP (TPLBP) [25] to extract features from facial images. The HOG algorithm uses an image block size of 16 × 16, and the extracted features have 1764 dimensions. LBP divides each face image into 16 non-overlapping regions of 16 × 16 pixels and extracts 3776-dimensional features. The TPLBP algorithm uses an image block size of 16 × 16, and the extracted features have 4096 dimensions. We randomly select 2000 images of 200 people. We reduce the extracted features to 500 dimensions by principal component analysis (PCA).
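A rough sketch of this feature pipeline with scikit-image and scikit-learn is shown below. The parameter choices, the omission of TPLBP (which has no scikit-image implementation), and the random placeholder images are assumptions; the resulting feature dimensions will not match the figures quoted above.

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern
from sklearn.decomposition import PCA

def hog_feature(img):
    """HOG descriptor on a grayscale image with 16x16 pixel cells
    (an approximation of the setting described above)."""
    return hog(img, orientations=9, pixels_per_cell=(16, 16),
               cells_per_block=(2, 2), feature_vector=True)

def lbp_feature(img, grid=4, n_bins=59):
    """Uniform LBP histograms over a grid of non-overlapping regions."""
    lbp = local_binary_pattern(img, P=8, R=1, method="nri_uniform")
    h, w = img.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            block = lbp[i * h // grid:(i + 1) * h // grid,
                        j * w // grid:(j + 1) * w // grid]
            hist, _ = np.histogram(block, bins=n_bins, range=(0, n_bins))
            feats.append(hist)
    return np.concatenate(feats).astype(float)

# stack features of all images and reduce to 500 dimensions with PCA
rng = np.random.default_rng(0)
images = (rng.random((600, 64, 64)) * 255).astype(np.uint8)   # placeholder face crops
F = np.stack([np.concatenate([hog_feature(im), lbp_feature(im)]) for im in images])
F500 = PCA(n_components=500).fit_transform(F)
print(F500.shape)                                             # (600, 500)
```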

To validate the effectiveness of our approach, we compare its performance with the following methods: LLC [26], LMNN [27], PRDC [28], NCA [29], and RDML-CCPVL [30]. We set the subspace dimension in the grid {100, 200, 300, 400, 450} and the number of dictionary atoms in the grid {200, 300, ..., 600}. The parameters λ, α, β, and γ are set in the grid {0.5, 1, ..., 5}. All parameters of the compared methods are set according to their default settings. We use 5-fold cross-validation to obtain the optimal parameters, and the average results over the five folds are taken as the final result; a sketch of this search procedure is given below.
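The parameter search can be sketched as follows. Here `train_and_score` is a hypothetical stand-in for training JSLC with one parameter setting and reporting the Rank-1 matching rate, and the grid spacing is an assumption.

```python
import itertools
import numpy as np
from sklearn.model_selection import KFold

def train_and_score(X_tr, y_tr, X_va, y_va, lam, alpha, beta, gamma):
    """Hypothetical placeholder: train JSLC with the given parameters on the
    training fold and return the Rank-1 matching rate on the validation fold.
    Here it only returns a dummy score so the sketch runs end to end."""
    return float(np.random.default_rng(0).random())

def grid_search(X, y, grid=(0.5, 1, 2, 3, 4, 5), n_splits=5):
    """5-fold cross-validated grid search over (lam, alpha, beta, gamma)."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    best_params, best_score = None, -np.inf
    for lam, alpha, beta, gamma in itertools.product(grid, repeat=4):
        scores = [train_and_score(X[tr], y[tr], X[va], y[va], lam, alpha, beta, gamma)
                  for tr, va in kf.split(X)]
        if np.mean(scores) > best_score:
            best_params, best_score = (lam, alpha, beta, gamma), float(np.mean(scores))
    return best_params, best_score
```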

4.2. Experimental Results

Table 1 shows the comparison of JSLC, based on HOG features, with the comparison algorithms in terms of matching rate. The results show the following: (1) JSLC achieves the best results at Rank 1, Rank 5, Rank 10, and Rank 15 of the matching rate. JSLC uses a dictionary learning framework and combines subspace and low-rank learning, which can effectively mine the discriminative information of different face images. (2) The comparison algorithm PRDC is mainly based on relative distance comparison, and LMNN mainly uses large-margin information between samples; neither can make full use of the discriminative information in the images, so they still show relatively poor performance. Although RDML-CCPVL uses a deep discriminative metric learning method, the clustering method it uses cannot exploit all the effective information in the images, so its performance does not reach ideal results. Tables 2 and 3 show the comparison of JSLC with the comparison algorithms in terms of matching rate based on LBP and TPLBP features, respectively. Results similar to those on the HOG features are obtained: JSLC achieves the best matching performance compared with the other methods. The results in Tables 1-3 also indicate that HOG, LBP, and TPLBP features are suitable for extracting makeup face feature vectors. Bold values indicate the best results in the tables.

Figures 3 and 4 show the Rank-1 values of JSLC using HOG features with different subspace dimensions and numbers of dictionary atoms. The results in Figures 3 and 4 show that setting the subspace dimension to 400 and the number of dictionary atoms to 450 is feasible. In the JSLC method, the parameters λ, α, β, and γ are related to the performance of the model. Next, we analyze these four parameters. Using LBP features and with the other parameters fixed, Figure 5 shows the average Rank-1 value of the JSLC method for different values of λ, α, β, and γ.

First, we discuss the effect of λ in JSLC. The parameter λ controls the role of the sparse regularization term. The results in Figure 5(a) show that when λ = 1, the average Rank 1 achieves the best performance; in addition, the differences in model performance for different values of λ are modest. The parameter α controls the role of the PCA regularization term: the larger the value of α, the larger the proportion of the PCA term in the objective function. The results in Figure 5(b) show that different values of α lead to different performance of JSLC, but we cannot find a clear relationship between α and the matching rate; therefore, determining the optimal value by grid search is feasible. Next, we consider the effect of β. The parameter β controls the role of the affinity matrix in JSLC. The results in Figure 5(c) show that the matching rate of JSLC is sensitive to β; when β = 4, the matching rate is highest, so grid search for β is feasible. Finally, we discuss the effect of γ in JSLC. The results in Figure 5(d) show that the matching rate of JSLC is also sensitive to γ. The parameter γ controls the role of the low-rank term; when its value is too small or too large, the low-rank term cannot exploit the intrinsic data structure of the face images.

5. Conclusion

In this study, a joint subspace and low-rank coding method is proposed for makeup face recognition. Based on the dictionary learning framework, subspace learning and low-rank coding are performed jointly, so that the discriminative information of face images can be exploited. Experimental results on DFW show the good performance of our method. In the future, we will carry out makeup face recognition and verification on more complex datasets and in more scenes, such as under various illumination, pose, and expression conditions. How to incorporate deep features of face images into our method is also our next step.

Data Availability

The labeled datasets used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The research activities described in this paper have been conducted within the Qinglan Project of Jiangsu Province under Grant no. Q019001, the Scientific Research Project of Changzhou Institute of Technology under Grant no. YB201813101005, Youth Innovation Fund Project of Changzhou Institute of Technology under Grant nos. QN202013101002 and HKKJ2020-37, National Natural Science Foundation of China under Grant no. 61806026, Natural Science Foundation of Jiangsu Province under Grant no. BK20180956, and Project of Jiangsu Education Science in the 13th Five-Year Plan in 2018 under Grant no. B-a/2018/01/41.