Abstract

This paper proposes a method for classifying whether each landmark on the outline of a face shape model, as used in shape-model-based approaches, is properly fitted to its feature point. With this method, the reliability of a fitted shape can be assessed before the shape is managed and used. Face images captured by an image sensor are warped and resampled with bilinear interpolation. To classify landmarks, we use the gray-value variance, which reflects the texture of skin. The gray-value variance is calculated over the skin area of a patch constructed around each landmark. To make the system robust to pose, we project the face image onto the frontal face shape model, and areas with insufficient pixel information are filled in by bilinear interpolation. When a landmark is fitted properly, the variance is low because skin texture is smooth. A misaligned landmark, in contrast, yields a high variance because of the background and the gradient at the facial contour. We build a classifier on this characteristic and, with this patch classifier, classify landmarks as correctly or incorrectly fitted with an accuracy of 83.32%.

1. Introduction

Shape-model-based approaches to faces match a shape model to objects in an image using shape or texture, and various methods of this kind have shown good results [1–3]. Research is also in progress on real-world illumination, the influence of the background, and partial occlusion of the face [4, 5]. However, techniques that use the shape model still have known limitations. In particular, feature detection and tracking techniques, which estimate an approximation [1] through a least-squares solution for the tracking parameters used in shape alignment, each have their own limitations [6].

Face-related techniques do not aim simply to fit shapes. The landmark locations are mainly used in systems that perform face recognition [7], gaze estimation [8], and expression recognition [9]. If a landmark is not properly placed on its feature point, such systems suffer serious problems. Furthermore, when tracking proceeds over consecutive frames of a video stream, accurate fitting can fail as the environment changes: a landmark of the shape may remain on the background, the face may be occluded, or the object may vary in a way that the shape model cannot generate.

We therefore concluded that a separate verification method is required in addition to the optimization strategy and the shape alignment method. If the result of shape fitting is verified before it is used, the reliability of the current shape can be determined. For example, a system can decide whether to use the fitting result at all. Also, if verification shows that the fitting result in a video stream is unreliable, the current tracking can be abandoned and restarted from the search process, or a search for a better position for the landmark can be triggered.

To achieve this, we propose a method for verifying the landmarks that constitute the outline of the face. Verifying only the outline landmarks of the face shape model may not seem sufficient. However, all landmarks in a shape model are tied to one another by the model. Thus, verification of the outline landmarks can help to judge the fitting of the overall shape model.

Landmarks on the face contour are easier to distinguish than other landmarks because they lie on the boundary between the face and the background or other objects in the two-dimensional image. Our approach classifies landmarks by analyzing a combination of texture and shape for each landmark. A patch that contains the pixels around a contour landmark includes the background, the skin of the face, and the boundary gradient. Therefore, by analyzing the texture features of patches around the outline landmarks, we can verify whether a landmark lies on the facial contour.

We call the classifier that classifies landmarks using the features described above the “patch classifier.” The patch classifier constructs patches for the landmarks of the shape model and decides whether the fitting is correct by analyzing the gray-value variance of each patch.

Most shape models are fitted under variations in pose. In such cases some pixel information is not visible, so the missing pixels are filled in using bilinear interpolation. Since the variance measures the spread around the average and bilinearly interpolated values always lie within the range of the neighboring gray values, bilinear interpolation is suitable for filling in the missing information without altering the variance result.

The paper is organized as follows. Section 2 reviews the approaches related to our system. Section 3 introduces the proposed patch classifier. Section 4 verifies the patch classifier experimentally. Section 5 presents the conclusion, and Section 6 discusses limitations and suggestions for future research.

2. Related Work

2.1. Face Shape Model and Fitting Method

Methods that match a shape model to an image and align the shape with the object in the image have shown good performance, and various approaches have been studied. Representative methods are the Active Shape Model [1, 10–12], the Active Appearance Model [2, 13–15], which uses texture, and the Constrained Local Model [3, 6, 16], which uses the patch texture around each landmark. What these studies have in common is that they estimate the position of each landmark, that is, each feature point. They then update the optimized shape parameters, transform the reference shape, and generate a shape that fits the object.

Building on these techniques, broader lines of research have emerged. Typically, a nonrigid shape model is aligned to the face image and the expression is classified to estimate a person's emotion [9, 17]. Pose is estimated from the geometric information of the shape [18]. Gaze tracking has been studied using the pose and the eye positions of the shape [8]. Frontal face images generated from nonfrontal faces are used to identify faces [7, 19]. Beyond these, various studies are underway to make good use of the shape in a number of areas. In this paper, we judge whether a fitting result can be trusted before the shape model is used, so that it can be used more effectively.

2.2. Gray-Value Variance

Gray-value variance is an attribute that describes texture as a feature. In Tracking-Learning-Detection, the gray-value variance of the patch is used in the first of the three stages of the object detection method [20]. Each patch produced by the scanning window is rejected from the candidate set if its variance is less than 50% of that of the target object. The patches rejected in this stage are typically nonobject patches taken from backgrounds such as sky or street.

Similarly, [21] proposes a multifeature, cascade-filter-based algorithm for object tracking. In one stage of the cascade, a patch is rejected if its gray-value variance is lower than 80% or greater than 120%.
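The two variance-based rejection rules reviewed above can be sketched as follows. This is a minimal illustration, not the implementation of [20] or [21]: the function names are hypothetical, and it assumes that both percentages are taken relative to the variance of the target object, which the text states explicitly only for [20].

```python
from statistics import pvariance

def gray_variance(patch):
    """Population variance of all gray values in a 2D patch (list of rows)."""
    return pvariance([v for row in patch for v in row])

def passes_min_variance(patch, target_var, min_ratio=0.5):
    """Rule of [20]: reject a patch whose variance is below 50% of the target's."""
    return gray_variance(patch) >= min_ratio * target_var

def passes_variance_band(patch, target_var, low=0.8, high=1.2):
    """Rule of [21], assumed relative to the target: reject outside 80%..120%."""
    return low * target_var <= gray_variance(patch) <= high * target_var

flat_sky = [[200, 200], [200, 201]]   # nearly uniform background patch
textured = [[10, 200], [220, 30]]     # strongly textured patch, used as target
target_var = gray_variance(textured)

print(passes_min_variance(flat_sky, target_var))   # the flat patch is rejected
print(passes_variance_band(textured, target_var))  # the target itself passes
```

Both rules use the variance only as a cheap first filter, which is the same role it plays in the patch classifier proposed here.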

Fingerprint image segmentation research has also proposed an algorithm that uses the gray-level variance as a threshold [22]. In these ways, the gray-value variance is used to filter out objects or the background in object detection research.

2.3. Most Related Approaches

In [7], a view-based active shape model is proposed, and the work also addresses boundary extraction. In that research, the whole active shape model is initialized with Procrustes analysis; however, because individual differences can cause the shape to drift away from the actual face area, a process of extracting the boundary of the face is added. The search for the boundary is limited to a rectangle whose top, bottom, and sides are determined from the eyebrows, the chin, and the tip of the nose. The boundary is formulated as a curve running from the bottom to the top of this rectangular area that optimizes a combination of edge strength and smoothness. The optimal curve is found by dynamic programming, as in seam carving. A seam is a path of least importance. First, an importance function is computed from the neighboring pixels of the image. Seam carving then consists of finding the path of minimum energy cost. Unlike seam carving, which removes the optimal seam, this method maximizes rather than minimizes the edge strength along the curve. Finally, the boundary points of the AAM are updated using the curve points.

That study has a different purpose from ours, but it is similar in that it derives its result from the face outline. It extracts the face contour using the edge components of the face outline. Edges represent the form of an object, but an edge-based method cannot avoid the influence of real-life backgrounds.

3. Patch Classifier

The patch classifier proposed in this paper classifies whether each landmark of the face outline is properly placed on the outline of the face in the image. The key idea is that the inside of the shape bounded by the outline landmarks consists of skin. Since the region inside the face outline consists only of smooth skin texture, its gray-value variance is low. To calculate this, we construct a patch around each landmark. In addition, the outside of the shape formed by the contour landmarks is covered with a mask. After the masked area is excluded, a patch should contain neither the background nor the gradient at the face boundary. Using this, the patch classifier judges whether the landmark is properly placed on the facial contour. Before these steps, a frontal image of the face must be created so that the classifier works for any pose of the shape model. Figure 1 illustrates the flow. We use the current shape from the tracking method and the trained reference shape. First, the frontal face image is generated by warping the image from the current shape to the reference shape. Next, we construct a patch from the set of pixels in the region around each landmark of the reference shape. Then the gray-value variance is calculated to analyze the characteristics of the patch. Finally, we decide the classification result using the calculated gray-value variance and the geometric features of the face shape model.

The process is described as follows. Section 3.1 covers the preprocessing, in which a gray-value frontal face image is created and the patches are constructed. Section 3.2 explains how the variance of each patch is calculated. Finally, Section 3.3 describes how the variance result is re-evaluated by considering the relationship between landmarks.

3.1. Preprocessing

To compute the variance for the patch classifier, a frontal face image is produced from the shape and image being tracked. Patches of gray values are then constructed around the landmarks in preparation for the variance calculation.

3.1.1. Frontal Face Shape Model

The shape model is basically expressed through the mean shape and the shape parameters [1].

In this paper, we use the current shape, the image, and the reference shape of the Point Distribution Model (PDM) used in shape alignment. The reference shape is the mean of the shapes in the training set used to train the PDM, and it is generally a frontal face. The shape model is applied only to the face, with the background removed. Since our approach is meant to be pose-invariant, a frontal image is prepared so that every patch is evaluated under the same conditions regardless of pose.

3.1.2. Generate 2D Frontal Face Image

Using the reference shape, the current face image is transformed into a frontal image. Many studies use the Piecewise Affine Warp [23] to produce such a face image. This technique is a form of texture mapping that maps a region of one image onto a different region. First, a Delaunay triangulation is built for the shape currently being tracked and for the reference shape. Then each source triangle is mapped to its target triangle using barycentric coordinates, with bilinear interpolation of the gray values. When a pose variation hides part of the 2D face image, the missing area is filled in with bilinear interpolation. Bilinear interpolation is an extension of simple linear interpolation to two dimensions. It fills the empty area in a way that does not affect the gray-value variance calculated later.
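As an illustration of the interpolation step, the following minimal sketch samples a gray value at a real-valued position. The function name `bilinear_sample` and the `img[y][x]` list-of-rows layout are assumptions made for this example; the warp in [23] is not reproduced here.

```python
import math

def bilinear_sample(img, x, y):
    """Gray value at real-valued (x, y) in img, a list of rows indexed img[y][x]."""
    h, w = len(img), len(img[0])
    # Clamp the position so that the 2x2 neighborhood stays inside the image.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two rows, then along y between the rows.
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
    bottom = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
    return (1 - fy) * top + fy * bottom

img = [[10, 20],
       [30, 40]]
center = bilinear_sample(img, 0.5, 0.5)  # average of the four neighbors
```

Because the result is a weighted average of four neighboring pixels with nonnegative weights, it never falls outside their range, so filled pixels introduce no new extreme gray values into a patch.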

In general, when the frontal face image is generated with the Piecewise Affine Warp, the area outside the face region is excluded by a mask. That is, everything outside the set of triangles is masked. The mask can be obtained from the convex hull of the reference shape. Figure 2 shows an example of the result.
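A mask of this kind can be sketched in a few lines. The example below is an assumption-laden illustration rather than the authors' code: it builds the convex hull with Andrew's monotone chain algorithm and marks a pixel with 1 when it lies inside or on the hull; the names `convex_hull`, `inside_convex`, and `hull_mask` are hypothetical.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in a consistent order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_convex(hull, p):
    """True if p is inside or on the boundary of the hull from convex_hull."""
    n = len(hull)
    for i in range(n):
        a, b = hull[i], hull[(i + 1) % n]
        # A negative cross product means p is on the outer side of edge a->b.
        if (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) < 0:
            return False
    return True

def hull_mask(points, width, height):
    """Binary mask indexed mask[y][x]: 1 inside the hull, 0 outside."""
    hull = convex_hull(points)
    return [[1 if inside_convex(hull, (x, y)) else 0 for x in range(width)]
            for y in range(height)]

shape = [(1, 1), (4, 1), (4, 4), (1, 4), (2, 2)]  # toy stand-in for a reference shape
mask = hull_mask(shape, 6, 6)
```

The per-pixel test is quadratic in image size times hull size, which is acceptable for a reference shape that is computed once; a production system would more likely rasterize the polygon with an image library.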

This mask is not used only for the Piecewise Affine Warp. In the variance calculation, the masked area is excluded so that only the skin area is evaluated. This naturally tests the outer boundary of the face: the background lies outside the lines connecting neighboring landmarks, whereas only skin lies inside that boundary.

3.1.3. Construct Patch

Each patch is a rectangular area around a landmark of the frontal face shape. The size of the patch is set in proportion to the overall size of the shape model. Every patch must include the boundary of the face contour.
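The patch construction can be sketched as follows. The proportionality constant `ratio`, the square patch shape, and the function names are assumptions for illustration; the paper states only that the patch is rectangular and scaled with the shape model.

```python
def shape_size(points):
    """Width and height of the bounding box of the shape's landmarks."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return max(xs) - min(xs), max(ys) - min(ys)

def patch_bounds(landmark, points, ratio, img_w, img_h):
    """Half-open pixel bounds (x0, y0, x1, y1) of a square patch around landmark.

    The side length is about ratio * (larger side of the shape's bounding box),
    and the patch is clipped to the image.
    """
    shape_w, shape_h = shape_size(points)
    half = max(1, int(round(ratio * max(shape_w, shape_h) / 2)))
    cx, cy = int(round(landmark[0])), int(round(landmark[1]))
    x0, y0 = max(0, cx - half), max(0, cy - half)
    x1, y1 = min(img_w, cx + half + 1), min(img_h, cy + half + 1)
    return x0, y0, x1, y1

def crop(img, bounds):
    """Extract the patch from img, a list of rows indexed img[y][x]."""
    x0, y0, x1, y1 = bounds
    return [row[x0:x1] for row in img[y0:y1]]

outline = [(10, 10), (90, 10), (50, 110)]  # toy landmarks
bounds = patch_bounds((50, 110), outline, 0.1, 100, 120)
```

Centering the patch on the landmark means that a correctly fitted contour landmark produces a patch that straddles the face boundary, which is what the masked variance relies on.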

Patches are used to reduce the effect of skin variation and illumination on the variance. The gray-value variance indicates how far the gray values of an image spread from their average; that is, it describes the distribution of the pixel values. If the whole face were used as the region, the variance would be large because of the illumination gradient, the eyes, the nose, the mouth, and so forth. If the variance is calculated locally within a patch, the effect of the overall illumination gradient across the face is reduced. Because we calculate the variance only to determine whether the texture inside the shape model is smooth, calculating it over a limited area is more stable and efficient.
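A small synthetic example makes the effect of locality concrete. The image below is not real face data; it is an assumed stand-in for smooth skin under a horizontal illumination gradient of 2 gray levels per pixel.

```python
from statistics import pvariance

def variance(region):
    """Population variance of all gray values in a 2D region (list of rows)."""
    return pvariance([v for row in region for v in row])

# Synthetic 8 x 64 image whose brightness rises by 2 gray levels per column.
image = [[60 + 2 * x for x in range(64)] for _ in range(8)]

global_var = variance(image)                         # whole image
local_var = variance([row[20:28] for row in image])  # one 8 x 8 patch
```

Here the whole-image variance is 1365, while the 8 × 8 patch gives 21, even though the texture is equally smooth everywhere. A fixed threshold is therefore meaningful only when the variance is measured over a small region.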

3.2. Gray-Value Variance

This section describes how the variance is calculated for each patch produced by the preprocessing. Variance measures how widely a set of values is spread out. A low variance means that the values are clustered closely around the average, whereas a high variance means that the values lie far from the average. Thus, if the gray values in a patch barely change, the variance is low, which indicates a smooth texture. Conversely, a high variance means that the background or some other object is included in the patch. We therefore calculate the variance of the patch centered on each landmark and use it to detect skin texture. The gray-value variance for the $i$th landmark is calculated as follows. First, the mean of the gray-level values of an image $I$, excluding the masked area, is calculated by

$$\mu = \frac{1}{N} \sum_{x=1}^{w} \sum_{y=1}^{h} I(x, y)\, M(x, y), \quad (1)$$

where $I(x, y)$ is the gray-level value at coordinate $(x, y)$ of the image $I$, $M(x, y)$ is the value at coordinate $(x, y)$ of the mask $M$, $N$ is the number of pixels at which the mask value is 1, and $w \times h$ is the size of the image. The image must be grayscale, and the mask consists of the values 0 and 1. Equation (1) thus excludes the points where the mask value is 0, and the resulting mean is used in calculating the variance.

The mask is the one used in the preprocessing. Ideally, the mask cuts the image exactly along the face contour. Accordingly, the gray-value variance of the frontal face image excluding the mask is

$$\sigma_i^2 = \frac{1}{N} \sum_{x=1}^{w} \sum_{y=1}^{h} M(x, y)\, \bigl(P_i(x, y) - \mu_i\bigr)^2, \quad (2)$$

where $P_i$ is the patch of gray-level values centered on the $i$th landmark, $\mu_i$ is the masked mean of $P_i$ obtained from (1), $M$ is the mask restricted to the patch, $N$ is the number of patch pixels at which the mask value is 1, and $w \times h$ is the size of the patch. With this formula we can calculate the variance over the region that should contain only the gray values of skin. We collected the appropriate range of variance from manually placed landmarks. Table 1 shows the patches, their actual gray values, and their gray-value variances. The masked area appears black in a patch, with a gray value of 0.
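The masked mean and masked variance described above can be written directly from their definitions. The following is a minimal sketch with hypothetical function names, assuming that the patch and its mask are given as equally sized lists of rows.

```python
def masked_mean(patch, mask):
    """Mean of the patch values at positions where mask == 1."""
    n = sum(m for row in mask for m in row)
    if n == 0:
        raise ValueError("mask excludes every pixel of the patch")
    total = sum(p * m
                for prow, mrow in zip(patch, mask)
                for p, m in zip(prow, mrow))
    return total / n

def masked_variance(patch, mask):
    """Variance of the patch values at positions where mask == 1."""
    n = sum(m for row in mask for m in row)
    mu = masked_mean(patch, mask)
    squared = sum(m * (p - mu) ** 2
                  for prow, mrow in zip(patch, mask)
                  for p, m in zip(prow, mrow))
    return squared / n

# Toy patch: two columns of skin-like values, one masked column shown as 0.
patch = [[100, 102, 0],
         [98, 100, 0],
         [100, 100, 0]]
mask = [[1, 1, 0],
        [1, 1, 0],
        [1, 1, 0]]

mu = masked_mean(patch, mask)
var = masked_variance(patch, mask)
```

Multiplying each term by the mask value, rather than filtering the pixels first, keeps the code term-by-term parallel to the formulas; both approaches give the same result for a 0/1 mask.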

To determine the typical variance values for ordinary images with correctly positioned landmarks, we measured the gray-value variance on the face images of the Multi-PIE database [24] using the ground truth landmarks. The Multi-PIE face images cover a variety of lighting directions and many subjects, including some with beards. The results are shown in Figure 3. The graph shows the number of landmarks whose gray-value variance falls in each variance range on the $x$-axis. Across all landmarks of the face outline, approximately 87.52% of the gray-value variances were less than 200. This result suggests that the proposed method can determine whether the fitting is correct. Some landmarks show a high gray-value variance because of a gradient caused by the direction of the light source or because of a beard. Since this can occur even more often in the real world, the patch classifier makes its decision through one more step.

3.3. Considering Relation between Landmarks

As described above, whether a landmark is placed in the correct position can be estimated from the gray-value variance of its patch. However, environmental factors such as the amount of light, shadows, skin characteristics, and beards may cause different results. In addition, the variance itself has a weak point. For example, suppose that a frontal face is tracked in front of a single-color background. If a patch contains only the background because face alignment has failed, the variance is very low and the incorrect fitting result is likely to be classified as correct. To solve this problem, we re-evaluate the variance result using the relationship between the landmarks of the shape.

In the first situation, shown in Figure 4(a), the gray-value variance of one landmark exceeds 300 during tracking, so it is classified as incorrect. In this case, the high variance is caused by the gradient of the facial contour. However, the RMS error between this landmark and the manually placed landmark is actually very small, so the landmark should be judged as correctly fitted.

Since the patches are created around landmarks that make up a face, features of the face can be used. The landmarks whose variances are calculated lie on the curve that forms the face boundary. Moreover, because the shape alignment technique preserves the reference shape by transforming it through the model parameters, it is unlikely that a single landmark alone protrudes from the shape. Thus, a landmark whose result stands out from its neighbors is given the same result as the neighboring landmarks.
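One way to express this neighbor rule is sketched below. The exact rule is not specified in the text, so the details are assumptions: labels are ordered along the open face outline, an interior label is overridden only when both neighbors agree with each other, and an end label follows its two nearest neighbors when they agree.

```python
def smooth_isolated(labels):
    """Return a copy of labels with isolated outliers set to their neighbors' label.

    labels[i] is True when landmark i was classified as correctly fitted from
    its gray-value variance alone.
    """
    out = list(labels)
    n = len(labels)
    for i in range(1, n - 1):
        left, right = labels[i - 1], labels[i + 1]
        if left == right and labels[i] != left:
            out[i] = left
    if n >= 3:
        # The ends of an open contour have one neighbor; use the nearest two.
        if labels[1] == labels[2] and labels[0] != labels[1]:
            out[0] = labels[1]
        if labels[-2] == labels[-3] and labels[-1] != labels[-2]:
            out[-1] = labels[-2]
    return out

by_variance = [True, True, False, True, True]
by_relation = smooth_isolated(by_variance)
```

All decisions are read from the original `labels` and written to a copy, so the outcome does not depend on the order in which landmarks are visited.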

A different situation with a similarly isolated result is shown in Figure 4(b). Here the patches around the outlying landmark have very low variance because they remain on a solid-color background, while the single landmark that has started to escape shows a high variance. Up to this point, the situation looks the same as the previous one. To prevent a misclassification, the mean of the whole patch, without excluding the mask used in the Piecewise Affine Warp, is

$$\hat{\mu}_i = \frac{1}{wh} \sum_{x=1}^{w} \sum_{y=1}^{h} P_i(x, y), \quad (3)$$

where $P_i$ is the patch centered on the $i$th landmark and $w \times h$ is the size of the patch. Therefore, the gray-value variance of the whole patch is

$$\hat{\sigma}_i^2 = \frac{1}{wh} \sum_{x=1}^{w} \sum_{y=1}^{h} \bigl(P_i(x, y) - \hat{\mu}_i\bigr)^2. \quad (4)$$

This calculates the variance over the whole patch, including the background, the face, and so on. If the landmark remains on a solid-colored background, no face contour is present in the patch, so the whole-patch variance stays similar to the masked variance.
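The comparison between the masked and whole-patch variances can be sketched as follows. This is one possible reading of the check, not the authors' implementation: the function names and the `tolerance` parameter are hypothetical, and the patch is assumed to hold the gray values before masking.

```python
def variance(values):
    """Population variance of a non-empty sequence of gray values."""
    values = list(values)
    if not values:
        raise ValueError("no pixels to evaluate")
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)

def masked_and_whole_variance(patch, mask):
    """Variance over the unmasked pixels and over every pixel of the patch."""
    inside = [p for prow, mrow in zip(patch, mask)
              for p, m in zip(prow, mrow) if m == 1]
    whole = [p for row in patch for p in row]
    return variance(inside), variance(whole)

def stays_on_uniform_background(patch, mask, tolerance=10.0):
    """True when masking barely changes the variance, i.e. no contour is visible."""
    masked_var, whole_var = masked_and_whole_variance(patch, mask)
    return abs(whole_var - masked_var) <= tolerance

mask = [[1, 1, 0, 0]] * 4
on_contour = [[100, 101, 20, 240]] * 4      # skin on the left, background on the right
on_background = [[120, 121, 120, 121]] * 4  # solid-colored background everywhere

print(stays_on_uniform_background(on_contour, mask))
print(stays_on_uniform_background(on_background, mask))
```

A low masked variance combined with a clearly higher whole-patch variance is what a landmark on the true contour is expected to produce, whereas two similar low values point to a patch that lies entirely on a uniform region.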

4. Experimental Results

As described above, the patch classifier identifies correct fitting results by calculating the variance in the neighborhood of each landmark. To evaluate its accuracy, we ran an experiment on the CMU Multi-PIE database [24]. We also applied the patch classifier to an actual shape fitting technique on the video frames of FGNet [25] and examined the gray-value variance and the classification result against the RMS error. In the same manner, we examined the patch classifier on real-life images with complex backgrounds.

The basic patch classifier is verified on the Multi-PIE database. Multi-PIE contains face images with various poses, facial expressions, and illumination conditions, annotated with 68 ground truth landmark points. Each image is 640 × 480 pixels. The database contains face images of 346 people in various poses under 20 lighting directions, taken from cameras in a total of 15 directions. We use a total of 75,360 images from a set of three directions. Because this database is primarily used to compare face shape fitting performance, we carry out the experiment on the gray-value variance against the ground truth landmarks.

Because this paper determines whether landmarks are correctly positioned on the facial contour, errors are measured as the distance between a landmark and the face contour given by the existing ground truth landmarks, not as the RMS error. An example is shown in Figure 5.
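A contour-based error of this kind can be computed as the shortest distance from a landmark to a polyline. The sketch below assumes that the face contour is approximated by straight segments between consecutive ground truth outline landmarks; the function names are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest Euclidean distance from point p to the segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_contour(p, contour):
    """Shortest distance from p to an open polyline with at least two points."""
    return min(point_segment_distance(p, a, b)
               for a, b in zip(contour, contour[1:]))

contour = [(0, 0), (10, 0), (10, 10)]  # toy ground truth outline
error = distance_to_contour((5, 2), contour)
```

Unlike a point-to-point error, this measure does not penalize a landmark that slides along the contour, which matches the goal of testing only whether the landmark lies on the outline.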

To find an appropriate variance threshold for distinguishing true from false in the patch classifier, Receiver Operating Characteristic (ROC) curves over a wide range of gray-value variance thresholds are presented in Figure 6. The curves are calculated only for the outline landmarks of the shape model. For each sample image, the Multi-PIE landmarks serve as ground truth, and the same image is fitted using the technique of [6]. We then apply the patch classifier around the facial contour landmarks of this fitting. A fitting result is defined as true when its distance from the ground truth landmark is less than 3 pixels and as false when it is greater. In the ROC curves, the red marks show the result determined by the gray-value variance alone, and the blue marks show the result when the relationship between landmarks is considered. Considering the relationship between landmarks improved the true positive rate and also lowered the false positive rate considerably. The classifier performs best at a gray-value variance threshold of 228. With this threshold, the true positive rate is 87.71% and the true negative rate is 71.84%.
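The threshold search behind such an ROC analysis can be sketched as follows. The data are made up for illustration, and the selection criterion (maximizing the sum of the true positive and true negative rates) is an assumption; the text does not state how the value 228 was selected.

```python
def rates(variances, is_correct, threshold):
    """True positive and true negative rates for the rule 'variance < threshold'.

    A landmark is predicted as correctly fitted when its variance is below the
    threshold. Both classes must be present in is_correct.
    """
    pairs = list(zip(variances, is_correct))
    positives = sum(1 for _, c in pairs if c)
    negatives = len(pairs) - positives
    tp = sum(1 for v, c in pairs if c and v < threshold)
    tn = sum(1 for v, c in pairs if not c and v >= threshold)
    return tp / positives, tn / negatives

def best_threshold(variances, is_correct, candidates):
    """Candidate threshold with the largest TPR + TNR (assumed criterion)."""
    return max(candidates, key=lambda t: sum(rates(variances, is_correct, t)))

# Made-up gray-value variances and ground truth labels (distance < 3 pixels).
variances = [50, 120, 180, 240, 400, 900, 150, 700]
is_correct = [True, True, True, True, False, False, False, False]

tpr, tnr = rates(variances, is_correct, 250)
chosen = best_threshold(variances, is_correct, [100, 250, 500])
```

Each candidate threshold yields one point of the ROC curve, with the false positive rate given by one minus the true negative rate.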

The experiment on consecutive images uses the gray-value variance threshold obtained above. Figure 7 shows the distance between fitted landmarks and ground truth landmarks in the FGNet talking face video database [25], together with the gray-value variance. As in the Multi-PIE experiment, the fitting technique [23] was used. The graph at the top shows the distance between each fitted landmark and the corresponding ground truth landmark. As the graph shows, this fitting technique keeps the distance at approximately 10 pixels throughout tracking. Correspondingly, the gray-value variance for each landmark stays below 300. As shown in Figure 7(a), when the fitting result is stable, the gray-value variance is also stable. Figures 7(b) and 7(c) show the relationship between gray-value variance and error. As shown in Figure 7(d), the gray-value variance of the landmarks at the two ends of the face outline, such as the 17th landmark, is not stable. Because these positions contain hair as well as skin, the value is high. However, by considering the relationship between the landmarks of the shape as described above, the patch classifier can still return true on the basis of the neighboring landmarks.

Finally, Figure 8 shows the result of applying the patch classifier to real images. Green dots represent landmarks classified as correctly fitted, and red dots represent landmarks classified as fitting failures by the patch classifier. The upper right of each image shows the frontal image, after patch construction, on which the gray-value variance is measured. The images show cases in which the patch classifier detects that the shape cannot keep up with the face or that a landmark remains on the background during rotation. In the frontal face image in the upper right, it can be seen that area outside the face contour is included. The method performed better against complex backgrounds, where the gray-value variances are higher than against simple backgrounds. When the face rotates about the yaw axis, our approach classifies the landmarks properly.

5. Conclusion

In this study, we presented the patch classifier, which judges the fitting result of the landmarks that make up the face contour of a face shape model. The patch classifier estimates the texture of skin by measuring the gray-value variance. To make its performance invariant to pose, the image is transformed to the frontal shape model. In this transformation, pixels hidden by the pose are filled in using bilinear interpolation without distorting the gray-value variance measurement. Further, to reduce interference from illumination and similar factors, a patch is constructed around each landmark and the gray-value variance is calculated only within this patch. On the Multi-PIE database, approximately 87.52% of the gray-value variances of the landmarks were less than 200. Texture and shape features are already used in existing fitting methods, and this study showed their potential for verification as well. In addition, by applying the technique to the outline landmarks of the face shape model, we proposed a method for classifying whether a fitting is successful. As a result, it classified fitting results correctly with an accuracy of 83.32%. We were thus able to verify, in a simple and effective way, whether a landmark is placed on the face outline using the gray-value variance.

6. Limitations and Future Work

We used the outline landmarks, for which the success or failure of fitting can be determined in a relatively simple way. As mentioned above, since the landmarks of the shape model are bound by geometric relations, we can estimate the fitting state of the shape being tracked. Estimating the fitting state can help to verify the reliability of the shape model in its various applications and to decide how to use it.

First, it can be used as a measure for improving the tracking of the shape model itself. Systems based on a landmark shape model proceed in two steps: after the initial search for the face area and the initial fitting of the shape, only the fitting process is repeated. If a landmark fails to fit the facial feature during tracking, the shape may repeat the incorrect fitting. In this case, recovery is possible by using this research to analyze the current status and return the system to its initial phase.

It can also be used to correct the shape fitting result. The fitting technique tracks each feature and maintains a natural form through shape alignment. In this process, the outer landmarks can be evaluated directly to find better fitting points. Furthermore, once the shape has been verified, it can be used in the various systems that rely on the shape model, such as gaze tracking, face recognition, motion recognition, and facial expression recognition. The role of the landmarks in these systems is critical: if landmarks are incorrectly fitted, a more serious problem might occur.

In this way, applying the patch classifier after shape fitting can lead to better results. In addition, by making use of the failure information, it helps a system avoid larger problems.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.