Abstract

For many years, face recognition has been one of the most important domains in pattern recognition, and nowadays it is increasingly required to work on video, so facial capture in moving scenes must be studied first to meet the performance requirements. Since classic facial capture methods are not well suited to a moving environment, in this paper we present a novel facial capture method for moving environments. First, continuous frames are extracted from the video under detection according to their similar characteristics. Then, we present an algorithm to extract the moving object and reconstruct the background. Meanwhile, by analyzing skin color in both the moving and static areas, we use the classic face capture method to catch all faces. Finally, experimental results show that this method has better robustness and accuracy.

1. Background

Today is the era of electronic public security, in which electronic equipment and networks are used to protect people, so identity verification has become the most important area of electronic public security. Meanwhile, face recognition has become one of the most widely used biometric identification technologies because of its positive features: it is direct, friendly, convenient, difficult to counterfeit, and cost-effective [1]. Moreover, face recognition has a wide range of applications, including biometric authentication, video surveillance, and social security [2]. Generally, face recognition can be divided into three steps: detecting and segmenting the face from the scene, capturing facial features, and matching and recognizing the face [3].

Facial detection and capture is the first step of face recognition. Detecting and tracking faces rapidly across the frames of a video is the basis of face recognition; recognition then proceeds once an image suitable for identification is obtained.

However, face detection still has many open problems: how to use temporal and spatial facial information, how to overcome low resolution and a huge range of scale variation, and how to recognize faces under intense transformation or with partly hidden facial parts. These are the emphases of current studies [4]. Moreover, facial detection and capture in video often place strict demands on recognition time [5], so the computational cost must be low and simpler models with higher accuracy are required. For this reason, statistical models are widely used in facial detection today [6, 7].

In recent years, neural networks, Support Vector Machines (SVM), and AdaBoost have been the most widely used statistical models in facial detection and capture [8]. Among them, AdaBoost is widely used in dynamic detection for its high speed and good stability [9]. AdaBoost is also used in other classification areas, such as the classification of music and proteins [10, 11], and in other biometric authentication tasks such as iris recognition and facial recognition [12].

Although AdaBoost is widely used, it has obvious deficiencies. First, it is not robust to transformations of illumination and expression. Second, it cannot detect deflected (non-frontal) faces. In this paper, we therefore improve AdaBoost with moving object capture [13]; that is, we detect the moving object while applying AdaBoost within a tiny time range. Since facial movement is slow, frames within a tiny time range can be extracted by their similar characteristics, and we determine the relevance of frames by comparing the areas of the moving objects. We accept a region as a face when the registration ratio between it and a detected facial image exceeds a threshold. Then, after capturing the moving object against the reconstructed background, we analyze skin color in the moving area and apply it to the static area. In this way, all faces are detected and extracted, whether they are frontal or deflected.

The remainder of the paper is organized as follows. We present and analyze our method in Section 2. We present the novel fusion algorithm in Section 3. In Section 4 we run experiments comparing this novel algorithm with the classic AdaBoost algorithm. Finally, Section 5 summarizes the main results of the paper.

2. Theory of Methods

Colors can be expressed in a computer in many ways, and these expressions form different color spaces; RGB, HLS, YCbCr, and YIQ are the most widely used today. The skin color set exhibits different clustering features in different color spaces, and researchers have found that it clusters better in HLS and YCbCr [14]. The family of YCbCr spaces is generically called YUV [5], where Y expresses luminance and U and V express the chrominance signals.

Since Duan presents a hierarchical skin color method for color spaces [15], we can analyze the distribution of skin color in YIQ and YUV. We then use thresholds on I and θ to detect skin color, where I is the I component of YIQ and θ is the phase angle of YUV. The formula of θ is shown in

θ = arctan(V / U).   (1)

Rotating the chrominance components of YUV produces YIQ, so that I contains color information from orange to cyan and Q from green to magenta. Skin tone lies between red and yellow, and thus falls within a fixed basic domain of I and θ.

In this paper, we use fixed thresholds on I and θ, and we use (2) to convert RGB frames into the YUV and YIQ representations:

Y = 0.299 R + 0.587 G + 0.114 B,
U = 0.492 (B − Y),    V = 0.877 (R − Y),   (2)
I = V cos 33° − U sin 33°,    Q = V sin 33° + U cos 33°.
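As a concrete illustration of (1) and (2), the following NumPy sketch converts an RGB frame to YUV/YIQ and thresholds I and θ to obtain a binary skin mask. The default ranges i_range and theta_range are hypothetical placeholders, since the paper's exact threshold values are not reproduced here.

```python
import numpy as np

def skin_mask(rgb, i_range=(20.0, 90.0), theta_range=(100.0, 150.0)):
    """Binary skin mask from thresholds on the I component of YIQ and the
    phase angle theta of the YUV chrominance plane. The default ranges are
    illustrative placeholders, not the paper's values."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

    # RGB -> YUV, as in equation (2).
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)

    # Rotate the (U, V) chrominance plane by 33 degrees to obtain I of YIQ.
    c, s = np.cos(np.radians(33.0)), np.sin(np.radians(33.0))
    i = v * c - u * s

    # Phase angle of YUV in degrees, as in equation (1).
    theta = np.degrees(np.arctan2(v, u))

    return ((i >= i_range[0]) & (i <= i_range[1]) &
            (theta >= theta_range[0]) & (theta <= theta_range[1]))
```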

Then, in order to increase the speed of facial detection, we fuse skin color detection into face detection after this transformation. We extract a frame as the original image and detect the possible skin color areas, and the following images are cropped using the skin space model. We then cascade weak classifiers into strong classifiers. Equation (3) shows a weak classifier h(x) with feature f, where θ is the clustering threshold of minimum error on the training samples and p expresses the direction of the inequality:

h(x) = 1 if p f(x) < p θ,  and h(x) = 0 otherwise.   (3)

In this way, we drop the weak classifiers whose correct classification rate is less than 50%. Then, in order to enhance the accuracy of facial capture, we increase the weights of the classifiers with better performance and reduce the weights of those with worse performance. The computation of the weights is shown in (4)–(8). For n training samples (x_i, y_i) with y_i ∈ {0, 1}, of which m are negative and l are positive:

w_{1,i} = 1/(2m) if y_i = 0,  and  w_{1,i} = 1/(2l) if y_i = 1,   (4)

w_{t,i} ← w_{t,i} / Σ_{j=1}^{n} w_{t,j},   (5)

ε_t = min_{f,p,θ} Σ_{i} w_{t,i} |h(x_i, f, p, θ) − y_i|,   (6)

w_{t+1,i} = w_{t,i} β_t^{1−e_i},  with β_t = ε_t / (1 − ε_t),   (7)

C(x) = 1 if Σ_{t=1}^{T} α_t h_t(x) ≥ (1/2) Σ_{t=1}^{T} α_t, and C(x) = 0 otherwise, with α_t = log(1/β_t).   (8)

Equation (4) is the initialization of the weights, and (5) is their normalization. Equation (6) selects the classifier h_t with the minimum error rate ε_t among all classifiers of feature f under the weights w_t. Equation (7) shows the transformation of the weights, where e_i = 0 if x_i is classified correctly and e_i = 1 otherwise. Equation (8) presents the strong classifier C, which is composed of the weak classifiers.
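To make equations (3)–(8) concrete, the following Python sketch trains decision stumps of the form (3) under the weight schedule (4)–(8). It is a minimal Viola-Jones-style sketch over precomputed feature values, not the paper's implementation; feature extraction is assumed to be done elsewhere.

```python
import numpy as np

def train_stump(feature, labels, weights):
    """Find the threshold theta and polarity p minimizing the weighted
    error of the weak classifier (3): h(x) = 1 if p*f(x) < p*theta else 0.
    Exhaustive search over candidate thresholds; fine for a sketch."""
    best = (np.inf, 0.0, 1)
    for theta in np.unique(feature):
        for p in (1, -1):
            pred = (p * feature < p * theta).astype(int)
            err = np.sum(weights * np.abs(pred - labels))  # equation (6)
            if err < best[0]:
                best = (err, theta, p)
    return best  # (epsilon, theta, p)

def adaboost(features, labels, rounds):
    """features: (n_features, n_samples) array of feature values.
       labels:   (n_samples,) array with 1 = face, 0 = non-face."""
    m, l = np.sum(labels == 0), np.sum(labels == 1)
    # Equation (4): initialize sample weights.
    w = np.where(labels == 0, 1.0 / (2 * m), 1.0 / (2 * l))
    strong = []
    for _ in range(rounds):
        w = w / w.sum()                                    # equation (5)
        # Pick the feature/stump pair with minimum weighted error.
        eps, j_best, theta_best, p_best = np.inf, 0, 0.0, 1
        for j in range(features.shape[0]):
            err, theta, p = train_stump(features[j], labels, w)
            if err < eps:
                eps, j_best, theta_best, p_best = err, j, theta, p
        eps = min(max(eps, 1e-10), 1 - 1e-10)  # guard against eps = 0 or 1
        beta = eps / (1.0 - eps)
        pred = (p_best * features[j_best] < p_best * theta_best).astype(int)
        e = np.abs(pred - labels)              # 0 if correct, 1 if wrong
        w = w * beta ** (1 - e)                            # equation (7)
        strong.append((j_best, theta_best, p_best, np.log(1.0 / beta)))
    return strong

def classify(strong, x):
    """Equation (8): the strong classifier is a weighted vote of stumps."""
    score = sum(a for j, th, p, a in strong if p * x[j] < p * th)
    half = 0.5 * sum(a for _, _, _, a in strong)
    return int(score >= half)
```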

3. A Novel Fusion Algorithm with Background Reconstruction

In this paper, we reconstruct the background by frame differencing, as in (9) and (10). Before the background is available, we compute a function D to express the difference between two consecutive frames and a function M to express whether a moving object is present in it. Writing f_k(x, y) for the gray value of pixel (x, y) in the k-th frame and T for a threshold, we consider a pixel to be moving when it satisfies (10):

D_k(x, y) = | f_k(x, y) − f_{k−1}(x, y) |,   (9)

M_k(x, y) = 1 if D_k(x, y) > T,  and M_k(x, y) = 0 otherwise.   (10)

The pixels with M_k(x, y) = 0 are accepted as background points, so all background points can be found by training. Figure 1 is the flow chart of background training.
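A minimal NumPy sketch of this background training, under the reconstruction above, accumulates each pixel only over the frames in which it is static; the threshold T = 15 is an illustrative placeholder.

```python
import numpy as np

def train_background(frames, T=15.0):
    """Reconstruct the background from a list of consecutive grayscale
    frames: a pixel contributes to the background only in frames where the
    inter-frame difference (9) does not exceed the threshold T (10)."""
    acc = np.zeros_like(frames[0], dtype=np.float64)
    count = np.zeros_like(frames[0], dtype=np.float64)
    for prev, curr in zip(frames, frames[1:]):
        d = np.abs(curr.astype(np.float64) - prev.astype(np.float64))  # (9)
        static = d <= T                                                # (10)
        acc[static] += curr[static]
        count[static] += 1
    # Average each pixel over the frames in which it was static.
    return acc / np.maximum(count, 1)
```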

After background reconstruction, we detect facial areas in several continuous frames by (8) together with skin color detection. Since (8) performs well on frontal faces, we search for moving objects around the faces it detects. We trust that a moving object is a face when the registration rate between the moving object and a detected facial area is more than a threshold and its color belongs to the skin color space. The novel fusion method can therefore extract deflected faces that cannot be found by the classic method.
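One way to realize the registration test is to measure how much of the detected facial rectangle the moving object covers; the sketch below makes this explicit. The 0.5 threshold is a hypothetical placeholder, as the paper's value is not reproduced here.

```python
def registration_rate(face, obj):
    """face, obj: rectangles (x, y, w, h). Returns the fraction of the
    facial area covered by the moving object's rectangle."""
    fx, fy, fw, fh = face
    ox, oy, ow, oh = obj
    ix = max(0, min(fx + fw, ox + ow) - max(fx, ox))
    iy = max(0, min(fy + fh, oy + oh) - max(fy, oy))
    return (ix * iy) / float(fw * fh)

def is_face(face, obj, in_skin_space, threshold=0.5):
    """Accept the moving object as a face when the registration rate
    exceeds the threshold and its color lies in the skin color space."""
    return registration_rate(face, obj) > threshold and in_skin_space
```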

The following steps present the novel fusion method; a code sketch of the complete pipeline follows the list.

Step 1. While the video is playing, catch three continuous frames from it as original frames; if no more frames are available, go to Step 8.

Step 2. If the background is not complete, reconstruct it once.

Step 3. Extract skin areas with the skin model and morphological operations.

Step 4. Extract the detected areas from the corresponding positions of the original images, and then connect them into rectangular or oval regions.

Step 5. Use (8) to detect faces.

Step 6. Detect moving faces around the nearest detected facial areas.

Step 7. Mark the results and go to Step 1.

Step 8. The procedure is finished.
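The following OpenCV/NumPy sketch is one possible realization of Steps 1–8, not the authors' implementation. It reuses the skin_mask, train_background, registration_rate, and is_face helpers sketched above, OpenCV's pretrained Haar cascade stands in for the strong classifier (8), and the thresholds are placeholders.

```python
import cv2
import numpy as np

# Stand-in for the strong classifier (8): OpenCV's pretrained Haar cascade.
CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def fusion_detect(video_path, T=15.0):
    cap = cv2.VideoCapture(video_path)
    frames, results = [], []
    while True:
        ok, frame = cap.read()                                    # Step 1
        if not ok:
            break                                                 # Step 8
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frames.append(gray)
        if len(frames) < 3:
            continue
        background = train_background(frames[-3:], T)             # Step 2
        skin = skin_mask(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))  # Step 3
        mask = cv2.morphologyEx(skin.astype(np.uint8) * 255,      # Step 4
                                cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
        faces = CASCADE.detectMultiScale(gray)                    # Step 5
        # Step 6: moving pixels relative to the reconstructed background,
        # grouped into rectangular regions.
        moving = np.abs(gray.astype(np.float64) - background) > T
        moving = cv2.morphologyEx(moving.astype(np.uint8) * 255,
                                  cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
        contours, _ = cv2.findContours(moving, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            obj = cv2.boundingRect(c)
            x, y, w, h = obj
            in_skin = mask[y:y + h, x:x + w].mean() > 64
            if any(is_face(tuple(f), obj, in_skin) for f in faces):
                results.append(obj)                               # Step 7
    cap.release()
    return results
```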

Figure 2 shows the flow chart of the novel fusion method.

4. Experimental Results and Analysis

In this paper, we validate our method on a video that contains both single-face frames and multiface frames. We apply the classic facial capture method and our method to detect the faces, and then compare their computational time and detection rate.

4.1. Single Face Detection

In the frame sequence of the video, the face moves in different ways. In this paper, we extract face images covering frontal, side, upward, downward, leaning, and shaded faces. The classic method cannot detect faces when they are not frontal and detects only part of a face when it is shaded; in contrast, the fusion method detects these faces accurately. The results are shown in Figure 3: the upper two rows show the capture results of the classic method, and the bottom two rows show those of the fusion method.

4.2. Multiface Capture

As in Section 4.1, we choose faces of different kinds and detect them; the results are shown in Figure 4. The upper two rows show the capture results of the classic method, and the bottom two rows show those of the fusion method.

4.3. Analysis of the Experimental Results

We define the mean computational time per image by (11):

t̄ = (1/N) Σ_{i=1}^{N} t_i,   (11)

where t_i is the processing time of the i-th frame and N is the total number of frames. With frames of a single face, the classic method costs 177.502 ms and this fusion method 127.887 ms; with frames of multiple faces (two faces in each image), the classic method costs 229.631 ms and this fusion method 153.963 ms. Table 1 shows these results. In Table 1, we find that the two methods share the same problem: detection costs much more time when it fails, because the whole frame must then be searched.

We then define the detection accuracy per image by (12):

A = (N_c / N) × 100%,   (12)

where N_c is the number of frames with correct detection and N is the total number of frames. With frames of a single face, the accuracy is 72.0% for the classic method and 96.8% for this fusion method; with frames of multiple faces (two faces in each image), the accuracy is 60.5% for the classic method and 89.6% for this fusion method. Table 2 shows these results: the accuracy of the fusion method is better than that of the classic one. Checking the failed frames of the two methods, we find that the classic method misses far more faces than the fusion one when the deflection of the faces is large; the classic method cannot detect all the deflected faces, but the fusion one detects most of them.
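Both metrics can be computed directly from per-frame logs; in the following sketch, the lists times (milliseconds per frame) and correct (per-frame booleans) are hypothetical example inputs.

```python
def mean_time(times):
    """Equation (11): mean computational time per image, in ms."""
    return sum(times) / len(times)

def accuracy(correct):
    """Equation (12): fraction of frames with correct detection, in %."""
    return 100.0 * sum(correct) / len(correct)

# Hypothetical example: per-frame logs from a detection run.
times = [131.0, 120.5, 129.9, 130.1]
correct = [True, True, False, True]
print(mean_time(times), accuracy(correct))  # 127.875 75.0
```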

5. Conclusions

In this paper, by fusing a skin color model, a facial detection method, and a moving object capture algorithm, we present a fusion facial detection method. This method takes full advantage of the information in the continuous frames of the video under detection and performs well in facial detection. Furthermore, the fusion method keeps a good detection effect when expressions and facial gestures change greatly, and it remedies the deficiencies of the classic method. Finally, we validate our method with a large number of experiments, whose results indicate that the fusion method works well when faces move in various ways.

The deficiency of this method is that the detection accuracy is low when faces move quickly. In fact, quickly moving faces lead to errors in the moving face judgment: the difference between two continuous frames exceeds the threshold by so much that our method treats the same face as two faces.

Acknowledgments

This work is supported by Grants Programs of Higher-Level Talents of Inner Mongolia University (nos. 125126, 115117), Scientific Projects of Higher School of Inner Mongolia (NJZY13004), National Natural Science Foundation of China (nos. 61261019, 61262082), Key Project of Chinese Ministry of Education (no. 212025), and Inner Mongolia Science Foundation for Distinguished Young Scholars (2012JQ03). The authors wish to thank the anonymous reviewers for their helpful comments in reviewing this paper.