With the rise in biometric-based identity authentication, facial recognition software has stimulated interesting research. However, facial recognition has also been criticized because of security concerns. The main attack methods include photo, video, and three-dimensional model attacks. In this paper, we propose a multifeature fusion scheme that combines dynamic and static joint analysis to detect fake face attacks. Since the texture differences between real and fake faces can be easily detected, we merge LBP (local binary pattern) texture operators with optical flow algorithms. The basic LBP method is modified to use a nearest-neighbour binary computation instead of comparison with a fixed centre pixel, and the traditional optical flow algorithm is modified with a multifeature superposition method, which reduces image noise. In the pyramid model, each layer is processed with block calculations that divide the image into multiple blocks. The image features are obtained via the two fused algorithms (MLOF) and are then used to train and test an SVM classifier. Experimental results show that this method improves detection accuracy while also reducing computational complexity. We use the CASIA, PRINT-ATTACK, and REPLAY-ATTACK databases to compare various LBP algorithms combined with optical flow and fusion algorithms.

1. Introduction

Facial recognition is a major research hotspot that has been widely applied in all areas of life, from the registration of certificates, transfers, and accounts at major financial institutions to identity verification in professional exams and in mobile phone and entertainment apps. These applications use a camera to capture frame-by-frame images of the user's face and analyse image attributes to determine whether the user's identity is legitimate. Because personal information is increasingly exposed to attacks and threats, criminals are able to obtain pictures or videos of legitimate users and use them to mount fake face attacks. This behaviour is a serious threat to personal property security and public safety.

At present, there are three main types of attacks: attacks using high-definition photos of legitimate users, video recording attacks, and three-dimensional face model attacks [1].

1.1. Photo Cheating [2]

Photo cheating is one of the most common and convenient ways to attack a biometric identity authentication system. The attacker holds a high-definition photo in front of the system camera and bends or moves it to imitate a real face. Furthermore, the attacker can cut out the eye region of the photo and combine the face photo with a real person's moving eyes to spoof the detection system.

1.2. Video Deception [3]

Video deception [3] uses cameras, pinhole cameras, and other means to record videos of the legitimate user, which poses a great threat to facial recognition systems. Compared with a photo, a video captures head and facial movements as well as eye information such as blinking. Even though the face is reproduced through secondary imaging, this attack can look almost as realistic as a live person.

1.3. 3D Model Cheating [4]

Using the user's photos or videos, attackers can print a three-dimensional facial model. A three-dimensional model built from features extracted from photos and videos may differ considerably from the real face in texture and expression. However, with the development of advanced printing technology, a three-dimensional model may closely resemble the human face, even to the extent that the human eye cannot identify any differences; therefore, such models pose a great threat to facial recognition systems.

Today, there are numerous ways to identify fake faces from a variety of features, and many well-known journal and conference papers have analysed fake face detection in depth. These algorithms fall into the following three categories.

2.1. Image Attribute Analysis

The image attributes that are analysed mainly include three-dimensional depth information, Fourier spectrum analysis, multispectral imaging, facial optical flow, and local binary patterns. De Marsico et al. proposed a method based on geometric invariance to detect three-dimensional objects [2]. They compared the geometry of bent photographs with that of the face: a bent photograph forms a "V" shape, whereas the face surface is uneven, so the real face is easy to distinguish. Boulkenafet et al. introduced a method that detects fake face attacks using colour texture analysis [5]. Using the colour texture information of the luminance and chrominance channels, complementary low-level feature descriptions are extracted from several colour spaces, and the descriptions from each image band are combined into a feature histogram. Garcia and De Queiroz proposed a detection algorithm based on searching for moiré patterns [6]. In a photo or video attack, the image has been resampled by digital media, which produces overlapping pixel patterns. A Gaussian filter that isolates the high-frequency patterns is used to extract a descriptor, which achieves high recognition performance with less training data. Additionally, Määttä et al. proposed an analysis method based on multiscale, multiregion LBP features at the 2011 International Symposium on Biometrics [7]. However, high-resolution photos and videos still pose a large threat to detection based on image attributes.

2.2. Movement Attribute Analysis

Kollreider et al. proposed an interactive liveness detection method in which the user performs a set of simple actions, for example, reading words aloud or verbally identifying images or figures, and the system analyses whether the lip movement is consistent with the expected response to determine whether the user is a live person [8]. In addition, Arashloo et al. proposed multiscale dynamic texture descriptors based on binarized statistical image features on three orthogonal planes (MBSIF-TOP), which are effective in detecting spoof attacks [9]; combining them with a multiscale local phase quantization representation further improves the robustness of the spoofing attack detector. Kim et al. proposed a real-time, nonintrusive algorithm that uses the diffusion speed of a single-frame image [10]. Based on the difference in diffusion speed between real and fake faces, they defined local speed patterns, which are fed into a linear SVM classifier for antispoofing.

2.3. Static Dynamic Binding Analysis

In addition to the methods above, Komulainen et al. improved on the methods of their predecessors and proposed combining motion correlation analysis with face texture analysis to achieve liveness detection [11]. By combining dynamic and static analysis, this method provides a more comprehensive face detection approach. In their experiments, video was captured with a camera, and features were extracted from adjacent-frame comparison and face texture analysis. Facial recognition technology has been increasingly applied to smart devices. Smith et al. combined a response mechanism based on video sequences and digital watermarking with feature extraction on smart terminal devices [12], and the method achieved very good test results.

In summary, current fake face detection techniques cover a wide range of research. Methods based on facial image feature extraction and on human-computer interaction have both achieved good results. However, high-resolution image and video attacks remain difficult to detect. Most methods require user interaction to identify fake faces in video attacks, yet an attacker who knows the interaction content in advance can still evade detection with a video attack.

In this paper, fake faces are analysed based on optical flow and local binary patterns. The method obtains facial motion characteristics and facial texture features from these two algorithms rather than fusing many methods, which reduces the feature dimensionality, and finally trains an SVM classifier to make accurate real/fake judgements. The public CASIA database is used as the source of test subjects. The proposed method does not require user cooperation, and it obtains dynamic information through a hidden mechanism (optical flow). Moreover, the feature dimension extracted by this algorithm is low, which reduces the computational cost and complexity of the algorithm. Section 3 introduces the fake face detection method in detail, Section 4 presents and compares the experimental results, and Section 5 summarizes the paper.

3. Fusion of Fake Face Detection Algorithms Based on a Variety of Features

For high-definition photos and videos, the naked eye cannot easily distinguish between a real and a fake face, and it is likewise difficult for a computer to distinguish them by image attributes alone. This paper proposes a fusion algorithm that can detect high-resolution image and video attacks. First, the LBP feature is extracted from the facial region. LBP [16–18] is an effective texture description operator with significant advantages such as rotation invariance and grey-scale invariance, and it plays an important role in extracting texture features from photo, video, and real faces. Simultaneously, the optical flow algorithm [19–21] is used to extract motion-based features from the face. This is because a person in front of the camera unconsciously blinks, moves the lips, and makes other facial movements. Figure 1 shows examples of real faces alongside photo and video attacks. Each video is 5 seconds long, and two images are randomly taken from within 10 adjacent frames.

In Figure 1, the naked eye can identify some of the real faces, but it is still difficult to distinguish real faces from those in some high-definition pictures and videos. Careful analysis of the video images can reveal local movements such as blinking, but simply bending a photo can also create facial movement. However, a bent photo produces no local movement relative to the overall motion vector. The movements in a video and on a real face are very similar, but facial texture analysis shows that the features of a real face and a video face are very different. In summary, the combination of dynamic and static methods used in this paper is theoretically justified.

3.1. Local Binary Pattern (LBP)

The local binary pattern (LBP) [22] is a texture description operator that is often used to measure and extract local texture information from an image; it has the significant advantages of grey-scale and rotation invariance. Since the LBP operator was proposed, it has driven a resurgence of facial recognition research, and later extensions generalized it to neighbourhoods of arbitrary size and replaced square neighbourhoods with circular ones. The basic LBP operator works on a 3×3 window of nine pixels: each of the eight surrounding pixels is compared with the centre pixel, and if the neighbour is greater than or equal to the centre, the corresponding bit is set to 1; otherwise, it is set to 0. The LBP code is calculated as follows:

$$\mathrm{LBP}_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^{p},$$

where $(x_c, y_c)$ is the centre point with grey value $g_c$, $g_p$ is the grey value of the $p$-th neighbour of $(x_c, y_c)$ on a circle of radius $R$, and $s(\cdot)$ is the threshold function:

$$s(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0. \end{cases}$$

For the basic operator, $P = 8$ and $R = 1$.
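To make the basic operator concrete, the following minimal pure-Python sketch computes the 3×3 LBP code using the standard definition, in which a neighbour greater than or equal to the centre contributes a 1 bit. The neighbour ordering (clockwise from the upper-left corner, with that corner as bit 0) is our assumption; any fixed ordering gives an equivalent operator. The function names are hypothetical.

```python
def lbp_code(window):
    """Basic 3x3 LBP code (P = 8, R = 1) of the centre pixel.

    `window` is a 3x3 list of grey values. Neighbour p contributes 2**p when
    s(g_p - g_c) = 1, i.e. when it is >= the centre value.
    """
    gc = window[1][1]
    # Assumed ordering: clockwise starting at the upper-left pixel.
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for p, (r, c) in enumerate(coords):
        if window[r][c] - gc >= 0:
            code += 1 << p
    return code


def lbp_image(img):
    """Apply lbp_code to every interior pixel of a 2-D list `img`."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            win = [r[x - 1:x + 2] for r in img[y - 1:y + 2]]
            row.append(lbp_code(win))
        out.append(row)
    return out
```

For example, `lbp_code([[6, 5, 2], [7, 6, 1], [9, 8, 7]])` returns 241: the upper-left 6 and the lower row and left column (7, 8, 9, 7) are at least the centre value 6, giving bits 0, 4, 5, 6, and 7.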

The largest drawback of the basic LBP operator is that it covers only a small area and is not robust to lighting changes and noise. In our pipeline, we first extract the face image from the video frame, then convert the face region to greyscale and reduce its noise, and finally extract the features with the LBP algorithm. This paper proposes a neighbouring-pixel operator that encodes the grey-level relations between adjacent pixels, based on multiscale LBP features. Figure 2 shows the pixel arrangement in a window.

The neighbouring-pixel-based LBP algorithm in this paper consists of the following steps.

Figure 2 shows the nine pixels of a window. Starting from the upper-left pixel, the pixel values are read in clockwise order and arranged in a row: S9 S8 S7 S6 S5 S4 S3 S2 S1.

Next, the sequence is converted into a binary code. Starting from the highest position, each pixel $S_i$ is compared with the next pixel $S_{i-1}$, and the last pixel $S_1$ is compared with the first pixel $S_9$. If the current pixel is greater, the bit is set to 1; otherwise, it is set to 0. The formula is expressed as follows:

$$b_i = \begin{cases} 1, & S_i > S_{i-1}, \\ 0, & \text{otherwise}, \end{cases} \qquad i = 1, \dots, 9, \quad S_0 \equiv S_9.$$

Finally, the contrast is computed from the binary code and the original pixel values: the mean of the pixels whose bit is 1 minus the mean of the pixels whose bit is 0 gives the contrast value of the window.
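The three steps above can be sketched as follows. This is a minimal pure-Python reading of the text, not the authors' code; the function name `neighbour_lbp` is hypothetical, and two details are our assumptions: the eight border pixels are read clockwise from the upper-left corner as S9 to S2 with the centre pixel placed last as S1, and the contrast is set to 0 when all bits are equal (so that one of the two groups is empty).

```python
def neighbour_lbp(window):
    """Neighbouring-pixel LBP code and contrast of a 3x3 window (sketch)."""
    # Assumed order: border pixels clockwise from the upper-left corner,
    # then the centre pixel last. seq[0] = S9, ..., seq[8] = S1.
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0), (1, 1)]
    seq = [window[r][c] for r, c in order]
    n = len(seq)
    # Compare each pixel with the next one; the last pixel wraps to the first.
    bits = [1 if seq[k] > seq[(k + 1) % n] else 0 for k in range(n)]
    # S9 supplies the most significant bit, so codes range over 0..511.
    code = 0
    for b in bits:
        code = (code << 1) | b
    # Contrast: mean of pixels with bit 1 minus mean of pixels with bit 0.
    ones = [v for v, b in zip(seq, bits) if b]
    zeros = [v for v, b in zip(seq, bits) if not b]
    if not ones or not zeros:
        contrast = 0.0
    else:
        contrast = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return code, contrast
```

For the window `[[6, 5, 2], [7, 6, 1], [9, 8, 7]]`, the sequence is 6 5 2 1 7 8 9 7 6, the bits are 111000110 (code 454), and the contrast is 29/5 - 22/4 = 0.3. Because the code has nine bits, a histogram of these codes needs 512 bins rather than the 256 of the basic operator.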

Figure 3 shows the feature maps output by the LBP operator.

The features of attack faces are clearly less distinct than those of real users' faces. This is because a real face reflects light diffusely, whereas a picture or video screen reflects light at a fixed angle, which weakens the collected image features. In addition, the feature map extracted from the video shows colour effects that clearly differ from the other feature maps.

To reduce the approximation errors caused by the local area and fixed radius of this kind of algorithm, this paper also adopts a multiscale representation based on an image pyramid, which represents the image as a series of images at different scales. Each layer of the pyramid has its own size and resolution, and the lower layers have higher resolution. The output of each lower layer is used as the input of the next layer, which avoids the extra algorithmic complexity that multiple independent outputs would produce.
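A minimal pure-Python sketch of building such a pyramid is shown below. Several details are our assumptions rather than the paper's: the 5-tap binomial kernel [1, 4, 6, 4, 1]/16 as the Gaussian approximation (the same weights OpenCV's `pyrDown` uses), replicated borders, and the hypothetical helper names `reduce_layer` and `gaussian_pyramid`. Keeping every other row and column makes each layer about 1/4 the area of the one below; `levels=3` mirrors the three-tier pyramid used later for optical flow.

```python
# Assumed 5-tap binomial approximation of a Gaussian kernel.
KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]


def _clamp(i, n):
    """Replicate the border: map out-of-range indices to the nearest edge."""
    return min(max(i, 0), n - 1)


def reduce_layer(img):
    """One pyramid step: separable 5-tap blur, then keep every other row/column."""
    h, w = len(img), len(img[0])
    # Horizontal pass.
    tmp = [[sum(KERNEL[k] * row[_clamp(x + k - 2, w)] for k in range(5))
            for x in range(w)] for row in img]
    # Vertical pass.
    blurred = [[sum(KERNEL[k] * tmp[_clamp(y + k - 2, h)][x] for k in range(5))
                for x in range(w)] for y in range(h)]
    # Keep rows/columns 0, 2, 4, ...: about 1/4 of the pixels remain.
    return [row[::2] for row in blurred[::2]]


def gaussian_pyramid(img, levels=3):
    """Return [G0, G1, ...], where G0 is the input image."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(reduce_layer(pyramid[-1]))
    return pyramid
```

An 8×8 input yields layers of 8×8, 4×4, and 2×2 pixels. Each layer's LBP features can then be computed independently and concatenated into a multiscale descriptor.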

3.2. Optical Flow Analysis

Optical flow describes the change in position of pixels between two consecutive frames [19], and it is valuable for micromotion analysis. Facial movement in front of the camera is a micromovement, while the background and lighting do not change; a person typically blinks, moves the lips, or moves the head slightly. Photos and videos are two-dimensional, so after secondary imaging, their motion feature points are not consistent with those of a real face.

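Assume that a pixel on the image is located at $(x, y)$, that its brightness at time $t$ is $I(x, y, t)$, and that the point moves by $(\mathrm{d}x, \mathrm{d}y)$ during a short interval $\mathrm{d}t$. The horizontal and vertical components of its motion vector are $u = \mathrm{d}x/\mathrm{d}t$ and $v = \mathrm{d}y/\mathrm{d}t$, respectively.

After time $\mathrm{d}t$, the point's brightness is $I(x + \mathrm{d}x, y + \mathrm{d}y, t + \mathrm{d}t)$. As $\mathrm{d}t$ approaches zero, the brightness of the point is assumed to remain unchanged:

$$I(x, y, t) = I(x + \mathrm{d}x, y + \mathrm{d}y, t + \mathrm{d}t).$$

Expanding the right-hand side with the Taylor formula gives

$$I(x + \mathrm{d}x, y + \mathrm{d}y, t + \mathrm{d}t) = I(x, y, t) + \frac{\partial I}{\partial x}\mathrm{d}x + \frac{\partial I}{\partial y}\mathrm{d}y + \frac{\partial I}{\partial t}\mathrm{d}t + \varepsilon,$$

where $\varepsilon$ collects the higher-order terms. Neglecting $\varepsilon$ and dividing by $\mathrm{d}t$ yields the basic optical flow equation

$$I_x u + I_y v + I_t = 0.$$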

As with LBP, the optical flow features are computed at multiple scales using the pyramid structure. The least squares method is then applied to solve the basic optical flow equation jointly for all the pixels in the neighbourhood, which makes the algorithm less sensitive to noise and edge effects. Computing the optical flow on a layered image allows the moving feature points to be differentiated more effectively. Generally, the pyramid does not need many layers, so this paper uses a three-tier pyramid structure. Because each layer halves the resolution of the layer below it, a point with coordinates $\mathbf{u}$ in the bottom (original) image has coordinates $\mathbf{u}^{L} = \mathbf{u}/2^{L}$ in layer $L$; conversely, a point computed in an upper layer maps to $2\mathbf{u}^{L}$ in the layer below.
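The least squares step can be sketched for a single window as follows. Stacking the equation $I_x u + I_y v + I_t = 0$ for every pixel in a $(2r+1)\times(2r+1)$ window and solving in the least squares sense gives a 2×2 linear system, solved here by Cramer's rule. This is a minimal Lucas-Kanade sketch, not the authors' implementation: the central-difference gradients, the function name `lucas_kanade_window`, and the singularity threshold are our assumptions. The window size of 29 chosen later in this section corresponds to `r = 14`.

```python
def lucas_kanade_window(I1, I2, cx, cy, r):
    """Least-squares flow (u, v) for the (2r+1)x(2r+1) window centred at (cx, cy).

    I1, I2 are 2-D lists indexed [y][x]; the window plus a 1-pixel margin must
    lie inside the images. Returns None if the window lacks 2-D texture.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - r, cy + r + 1):
        for x in range(cx - r, cx + r + 1):
            ix = (I1[y][x + 1] - I1[y][x - 1]) / 2.0  # spatial gradient in x
            iy = (I1[y + 1][x] - I1[y - 1][x]) / 2.0  # spatial gradient in y
            it = I2[y][x] - I1[y][x]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        return None  # aperture problem: the normal equations are singular
    # Solve [[sxx, sxy], [sxy, syy]] [u, v]^T = -[sxt, syt]^T.
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v
```

With the synthetic frames `I1[y][x] = x * y` and `I2[y][x] = (x - 1) * y` (the pattern shifted one pixel to the right), the function recovers `(u, v) = (1.0, 0.0)`. On a uniform patch it returns `None`, because a single window cannot determine motion without texture.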

Each layer outputs an optical flow and a transformation matrix. The transformation matrix is the displacement that minimizes the grey-level difference between adjacent frames. The optical flow computed at an upper layer is passed down, together with the transformation matrix, as the starting point for the iterative calculation at the next layer. Finally, the optical flows of all layers are superimposed to give the final result.
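One common way to write this layer-to-layer propagation is the pyramidal Lucas-Kanade formulation of Bouguet. The notation below ($\mathbf{g}^{L}$ for the flow estimate passed into layer $L$, $\mathbf{d}^{L}$ for the residual flow computed at that layer) is our assumption, not the paper's. With three layers ($L = 2, 1, 0$), each halving the resolution of the one below:

$$\mathbf{g}^{2} = \mathbf{0}, \qquad \mathbf{g}^{L-1} = 2\left(\mathbf{g}^{L} + \mathbf{d}^{L}\right), \qquad \mathbf{d} = \mathbf{g}^{0} + \mathbf{d}^{0}.$$

For example, if the residual flows are $\mathbf{d}^{2} = (1, 0)$, $\mathbf{d}^{1} = (0.5, 0)$, and $\mathbf{d}^{0} = (0.25, 0)$ pixels, then $\mathbf{g}^{1} = 2\big((0, 0) + (1, 0)\big) = (2, 0)$, $\mathbf{g}^{0} = 2\big((2, 0) + (0.5, 0)\big) = (5, 0)$, and $\mathbf{d} = (5.25, 0)$. Each layer only has to estimate a small residual, while the accumulated flow at full resolution is much larger.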

Figure 4 shows two frames randomly extracted in our lab. Our optical flow algorithm requires a window size that defines the neighbourhood used for each pixel, so the window size affects feature extraction.

Figure 5 shows the features extracted by this algorithm and the effect of different window sizes on the image. The first row shows the direction of motion between two frames computed by our optical flow algorithm. When the window size is 29, the number of feature vectors meets our needs and the characteristics of the facial movements are more apparent. The second row shows the corresponding heat maps, in which different colours indicate different features.

To select the best size, we evaluated many window sizes in our lab (Figure 6). The experiment indicates that oversized windows increase the complexity of the subsequent calculation and reduce accuracy. Thus, we selected a window size of 29.

Because the image size and resolution shrink from the bottom to the top of the pyramid, the optical flow measured at the upper layers is small. In this paper, the computation starts at the top layer and accumulates down to the bottom, so the optical flow computed at each layer stays relatively small, while the accumulated final optical flow can be large.

3.3. Multifeature Fusion Detection Method (MLOF) Based on Static and Dynamic Combination

In this paper, a multifeature fusion scheme is proposed that combines the LBP texture operator with the optical flow algorithm and classifies real and fake faces with an SVM. The flow chart is shown in Figure 7.

Step 1. The first step is face and background extraction. Two adjacent frames are taken as input images from the recorded real face and fake face videos. Optical flow is used to compare the two images and determine whether there is a motion vector, while the LBP features are computed from one of the two frames. The extracted image is converted to greyscale, and the face region is then cropped.

Step 2. The second step is optical flow analysis and LBP analysis. The two images are used for the optical flow calculation as follows:
(1) Different images are used to reduce the impact of noise.
(2) Build the image pyramid and calculate the size and resolution of each layer. In this experiment, we use a Gaussian filter to process the image; that is, layer $G_{i+1}$ is obtained by smoothing and subsampling layer $G_i$, so the pyramid acts as a series of low-pass filters. To obtain layer $G_{i+1}$, layer $G_i$ is convolved with the Gaussian kernel, and then all even rows and columns are removed. The resulting image has 1/4 the area of the previous layer, which reduces the image processing cost. Iterating this process from the input image produces the whole pyramid. The convolution is
$$G_{i+1}(x, y) = \sum_{m=-2}^{2} \sum_{n=-2}^{2} w(m, n)\, G_i(2x + m,\, 2y + n),$$
where $w(m, n)$ is the Gaussian convolution kernel of length 5. Then, the image in each layer is expanded, with the expansion parameter set to 14 pixels, and the horizontal and vertical features of the upper image are superimposed. Figure 8 shows the sizes of $G_0$, $G_1$, and $G_2$.
(3) Use the optical flow function to determine the features of the current layer:
(a) obtain the pixel matrices of the two current frames and subtract them to obtain the difference matrix;
(b) compute four feature maps from the squares and products of the gradient matrices; the image pixel differences are obtained from the gradient matrices;
(c) compare each difference value with the specified threshold, and calculate the motion vectors of the points that exceed the threshold;
(d) sum the vectors to obtain a two-channel (horizontal and vertical) pixel map, merge the two channels into a single channel, and output the features of the current image.
(4) Extract the LBP features: the image is divided into fixed-size blocks.
(a) The pixels of each block are arranged clockwise, with the centre pixel placed in the last (or first) position, and a binary (0/1) code is obtained from the comparison of adjacent pixels.
(b) The decimal code and the contrast (LBP/C) are obtained from the binary code.
(c) The histogram of each block is calculated and normalized.
(d) The histograms of all blocks are concatenated to obtain the features of the current image.

To evaluate the feature extraction of the MLOF algorithm, we compared the feature histograms of real faces with those of three different attacks. Figure 9 indicates that the features of the real face are stronger than those of the fake faces: photo and video attacks significantly weaken the facial features.
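Sub-steps 4(c) and 4(d) above can be sketched as follows: count the codes in each non-overlapping block, normalize each histogram, and concatenate the results. The L1 normalization (each histogram sums to 1), the dropping of incomplete edge blocks, and the function name `block_histogram_features` are our assumptions; the paper does not specify them. `n_bins` would be 256 for the basic LBP code and 512 for the nine-bit neighbouring-pixel code.

```python
def block_histogram_features(codes, block, n_bins):
    """Concatenate L1-normalized histograms of non-overlapping blocks.

    `codes` is a 2-D list of integer LBP codes in [0, n_bins). Blocks are
    visited row by row; incomplete blocks at the right/bottom edges are dropped.
    """
    h, w = len(codes), len(codes[0])
    features = []
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            hist = [0] * n_bins
            for y in range(by, by + block):
                for x in range(bx, bx + block):
                    hist[codes[y][x]] += 1
            total = sum(hist)  # = block * block
            features.extend(v / total for v in hist)
    return features
```

For a 4×4 code map split into 2×2 blocks with 4 bins, the result is a 16-dimensional vector, and a block whose codes are all 3 contributes `[0.0, 0.0, 0.0, 1.0]`. The feature length is therefore (number of blocks) × `n_bins`, which is where the block size directly controls the feature dimension.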

Step 3. The third step is the analysis and classification of the two features above. This study uses an SVM with a Gaussian kernel function. An SVM separates the two classes by finding a separating hyperplane; however, it cannot be guaranteed that a hyperplane can separate all the data linearly. We therefore introduce the Gaussian kernel function, which maps the features into a higher-dimensional space. The SVM is trained on the features of the training set and evaluated on those of the test set, and the ROC curves of the classification results are drawn.
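The role of the Gaussian kernel can be illustrated with a short sketch of the kernel matrix and the kernel SVM decision value $f(\mathbf{x}) = \sum_i \alpha_i y_i K(\mathbf{s}_i, \mathbf{x}) + b$. Training (finding the support vectors $\mathbf{s}_i$, the coefficients $\alpha_i y_i$, and the bias $b$) is left to an SVM solver; the coefficient values in the example are hypothetical, as are the function names. In scikit-learn, for instance, `SVC(kernel="rbf", gamma=...)` trains such a model, and for a binary problem its fitted `support_vectors_`, `dual_coef_`, and `intercept_` attributes should play the roles of $\mathbf{s}_i$, $\alpha_i y_i$, and $b$.

```python
import math


def rbf_kernel(X, Z, gamma):
    """Gaussian (RBF) kernel matrix: K[i][j] = exp(-gamma * ||X[i] - Z[j]||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z))) for z in Z]
            for x in X]


def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Kernel SVM decision value f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b.

    `dual_coefs` holds the products alpha_i * y_i found by an SVM solver;
    here they are simply passed in. The sign of f(x) gives the predicted class.
    """
    k_row = rbf_kernel([x], support_vectors, gamma)[0]
    return sum(c * k for c, k in zip(dual_coefs, k_row)) + bias
```

Because $K(\mathbf{x}, \mathbf{x}) = 1$ and the kernel decays with squared distance, each support vector only influences feature vectors near it, which is what lets the classifier form a nonlinear boundary between real and fake face features.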

4. Experimental Result

4.1. Experimental Preparation

To evaluate the accuracy and efficiency of the method, the experiments were conducted on public face video databases. All video data provided by the databases were collected in uncontrolled environments, with complex backgrounds and changeable lighting conditions. To cover different attack methods, high-definition images of the targets are displayed on different media, including prints on ordinary A4 paper, prints on photo paper, and displays on high-resolution screens. In addition, the eye region of the face on the A4 paper or photo is cut out to defeat liveness detection methods based on blinking.

The subsequent experiments used the CASIA, PRINT-ATTACK, and REPLAY-ATTACK databases. The compositions of these datasets are shown in Tables 1, 2, and 3, respectively.

4.2. Experimental Comparison

To test the effectiveness of the algorithm, this paper extracts LBP and optical flow features from both real and fake faces and trains an SVM classifier on them. The detection accuracy is shown in Table 4, and the accuracy of the proposed algorithm compared with other commonly used algorithms is shown in Table 5.

The data in Tables 4 and 5 show that the MLOF algorithm has a strong ability to distinguish real and fake faces. The LBP algorithm detects the facial texture, and the optical flow algorithm detects whether there is movement in the face region. Moreover, the optical flow vectors of different attack modes differ markedly, and the differences in lighting and environment introduced by secondary imaging of photos and videos also have a great impact. Therefore, combining the two to increase the accuracy is feasible. In Table 4, the accuracy of NLBP is 93.12%, and the accuracy of the LK optical flow algorithm is increased to 97.56%. Table 5 shows the comparison results of various feature extraction algorithms; the accuracy and feature dimension in this experiment are clearly improved. Tables 2 and 3 show that the method reduces the algorithmic complexity and the feature dimension while also improving the accuracy, whether tested separately or as a single algorithm.

4.3. Experimental Results

The ROC curves of the LBP, optical flow, HOG, and improved experimental algorithms are shown in Figure 10(a). Figure 10(b) shows the algorithm's ability to recognize photo attacks, video attacks, photo attacks with real eyes, and all three attacks combined.

The receiver operating characteristic (ROC) curve is used to evaluate the performance of the algorithms. From the position of the ROC curve relative to the axes, one can visually assess an algorithm's ability to distinguish between real and fake faces: the closer the ROC curve lies to the axes, the better the algorithm performs. As shown in Figure 10, each point on the ROC curve represents the FRR and FAR at a particular threshold, so an evaluation based on the entire ROC curve does not depend on the choice of a specific threshold. This property makes the ROC curve a good reflection of robustness and suitable for comparing different algorithms. The ROC curves in Figure 10 show that the fusion algorithm in this experiment outperforms the other algorithms, with good recognition performance and higher accuracy and robustness against both photo and video attacks.
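The FAR and FRR values behind each point of such a curve can be computed as in the following sketch. It assumes (our assumption, not stated in the paper) that the classifier outputs a score in which larger means "more likely real" and that a sample is accepted when its score is at least the threshold. It also approximates the equal error rate (EER) by the threshold at which FAR and FRR are closest. The function names are hypothetical.

```python
def far_frr_curve(real_scores, fake_scores):
    """Return (threshold, FAR, FRR) for every distinct score used as a threshold.

    FAR: fraction of attack samples accepted (score >= threshold).
    FRR: fraction of real samples rejected (score < threshold).
    """
    thresholds = sorted(set(real_scores) | set(fake_scores))
    curve = []
    for t in thresholds:
        far = sum(s >= t for s in fake_scores) / len(fake_scores)
        frr = sum(s < t for s in real_scores) / len(real_scores)
        curve.append((t, far, frr))
    return curve


def approx_eer(curve):
    """Threshold where FAR and FRR are closest, and their average there."""
    t, far, frr = min(curve, key=lambda point: abs(point[1] - point[2]))
    return t, (far + frr) / 2
```

For example, with real scores `[0.9, 0.8, 0.7, 0.4]` and attack scores `[0.1, 0.2, 0.3, 0.75]`, FAR and FRR both equal 0.25 at the threshold 0.7, so `approx_eer` returns `(0.7, 0.25)`. Plotting FRR against FAR over all thresholds traces the curve; a better classifier pushes the whole curve toward the axes.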

Figure 11 shows the accuracy of the proposed method on the CASIA dataset in more detail. The figure includes MLOF, the algorithm proposed in this paper, as well as several feature fusion algorithms. In this comparison, a linear SVM classifier is used to distinguish real and fake faces. Compared with LBP and the DoG algorithm, the proposed algorithm achieves higher accuracy.

We also compared the accuracy of the algorithm on multiple databases; the results are shown in Table 6. They show that the proposed algorithm achieves high accuracy on all of the databases tested.

Table 7 compares our method with state-of-the-art face spoofing detection techniques proposed in the literature. As Table 7 shows, our proposed fusion analysis approach achieves very competitive performance on multiple datasets compared with other advanced algorithms. Most importantly, our approach performs stably across all three benchmark datasets.

The computation cost of most published face spoof detection methods is unknown. Here, we evaluate the efficiency of our proposed approach. Table 8 compares the processing time of the proposed method with that of other methods. As Table 8 shows, the difference in processing time between the proposed and the other approaches is negligible, while our approach significantly outperforms previous ones. The proposed approach is currently implemented in MATLAB, which likely leaves room for further optimization.

5. Conclusion

Facial recognition technology is user-friendly and direct and offers many advantages for identity authentication; it has been widely used in finance, banking, and other fields. However, attacks on facial recognition systems have become more sophisticated, and photo attacks, video attacks, and other types of attacks are constantly being refined. This paper improves existing methods for detecting photo and video attacks on facial recognition systems. The pyramid structure reduces the complexity of the algorithm and the dimension of the feature values. The optical flow algorithm in this paper outputs dual-channel features (the horizontal and vertical optical flow vectors); merging them into a single-channel feature by directly taking the square root of the sum of their squares may lead to lower accuracy, so we will continue to study better fusion methods. We will also explore combining high-accuracy SVM classification with deep learning algorithms to improve accuracy and computational efficiency. All the samples used for training and testing show people with yellow skin tones; in the future, we will verify the method with people of other skin tones in our sample library.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


This work was supported by the Fundamental Research Funds for the Central Universities (2018ZD06).