Abstract
Face recognition is widely used for personal authentication, especially on edge computing devices. However, face recognition systems are vulnerable to face spoof attacks. In this paper, a novel method for face spoof attack detection in edge computing scenarios is proposed. The method, called federated learning for face spoof attack detection (FedFSAD), builds on federated learning and improves the traditional framework with multitask learning and manifold regularization. Local model learning is completed on edge devices, and global model learning depends only on the trained local models without using the original image data. In addition, performance is improved by imposing hypergraph manifold regularization in the global training of multitask learning. Comprehensive experiments show that detection performance improves by about 10% and that the method is robust against stragglers and network delays, which indicates the effectiveness of FedFSAD.
1. Introduction
Personal authentication with face recognition is now widely used. However, facial images can be easily captured or faked, and such images can be used to conduct face spoof attacks. In some interactive applications, such as mobile payment and online banking systems, users are required to perform predefined actions so that the face recognition system can ensure that a live person is being recognized [1]. However, interactions may slow down the authentication process, and users cannot always meet the interaction requirements. Therefore, more advanced methods that work without interaction are needed.
To detect face spoof attacks without interaction, researchers make use of different image features, such as motion features [2, 3], texture features [4], and image quality features [5]. Owing to their descriptive power, deep features are also used [6, 7]. Since a single type of feature may not be sufficient to describe facial images, methods with multiple features have been proposed. Some of them use additional sensors to obtain different types of features, such as near-infrared illumination [8] and depth information [9]. Atoum et al. used HSV and YCC images instead of RGB images. In this way, features in both color images and depth images can be extracted [10].
To make better use of the above features, learning methods are also critical. With Local Binary Pattern (LBP) features, support vector machines (SVMs) were used to train the detection model [4]. A blinking-based approach using conditional random fields (CRFs) was used to detect face spoof attacks [11]. To improve efficiency for edge face recognition systems, the spoof attack score was measured with the Hamming distance [12]. Furthermore, some novel methods have been proposed to handle multiple features. Deep learning is one of the representative frameworks for training a model with multiple features. For example, both spatial and temporal features from a CNN were used for attack detection [7]. Feng et al. used Shearlet features, RGB images, and optical flow as image quality, pixel color, and motion cues, respectively, and combined them in neural networks [13].
A large number of face recognition systems now run on edge devices because of their growing storage and computational power. They are used in mobile authentication, security entrances, and so on [14]. These devices can be easily connected by a fast network. Therefore, face spoof attack detection is also needed on these devices. In this scenario, the data come from different sources, and researchers try to make better use of these data. Shao et al. used meta-learning and the feature space with deep learning to tackle the domain generalization problem [15]. In addition, training images are not directly shared between data owners because of legal and privacy issues, which brings new challenges in many applications [16–18]. Face spoof attack detection relies on facial images, which are critical personal information. To tackle this issue, researchers try to store data locally and push more network computation to the edge. Federated learning is a novel framework proposed to train models on devices [19]. These models can then be used in classification or regression without touching the training images directly [20]. Shao et al. have done pioneering work on using federated learning for face spoof attack detection [21]. However, they focus on handling data centers with significant domain shift rather than on improving the performance of the federated learning framework itself.
Generally speaking, current methods for face spoof attack detection depend significantly on the quantity of training data. Federated learning can alleviate this issue, but existing models cannot collect and use the distributed data sufficiently and safely. In this paper, a novel method for face spoof attack detection in edge computing scenarios is proposed. The method, called federated learning for face spoof attack detection (FedFSAD), builds on federated learning and improves the traditional framework with multitask learning and manifold regularization. The contributions can be summarized as follows:
(1) First, we propose a novel framework for face spoof attack detection in edge computing scenarios. It models the problem of federated learning with the multitask learning idea.
(2) Second, the process of multitask learning is further improved with manifold regularization, in which the inner relationships among different training tasks are explored to learn a unified model.
(3) Third, we propose hypergraph manifold regularization with sparse representation. Multiple vertices are connected by one hyperedge, and the connectivities among features are computed by sparse learning.
(4) Finally, with the trained model, face spoof attacks are detected in the classification process. Comprehensive experiments are conducted on three commonly used benchmark datasets for face spoof attack detection to indicate the effectiveness of our method.
The remainder of this paper is organized as follows. In Section 2, we first outline the proposed FedFSAD and then introduce it in detail. After the theoretical introduction, in Section 3, we show the improvements of FedFSAD on face spoof attack detection in edge computing scenarios. Finally, in Section 4, we discuss the novelty and improvements of the proposed method.
2. Federated Multitask Learning with Manifold Regularization
2.1. Outline
The proposed method is outlined in Figure 1. The whole framework consists of local model training in edge subsystems and global model training on the server. Local models are trained separately and transferred to the server. Then, the global model is trained using the local models, and this training is robust to a small fraction of subsystems dropping out unpredictably.
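The flow in this outline can be illustrated with a minimal sketch. It is not the FedFSAD training procedure: the local learner is a plain ridge-regularized least-squares fit, the server simply averages the local models that arrive (a stand-in for the multitask global training described later), and the names `train_local`, `federated_round`, and `drop_prob` are hypothetical.

```python
import numpy as np

def train_local(X, y, ridge=1e-3):
    """Fit a ridge-regularized least-squares model on one device's data."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)

def federated_round(device_data, drop_prob, rng):
    """Train on every device, drop some at random, average the survivors."""
    received = []
    for X, y in device_data:
        w = train_local(X, y)            # raw data never leaves the device
        if rng.random() >= drop_prob:    # simulate a subsystem dropping out
            received.append(w)
    if not received:                     # no local model reached the server
        return None
    return np.mean(received, axis=0)     # server only sees model parameters

# Synthetic data: eight devices observing the same linear relationship.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
device_data = []
for _ in range(8):
    X = rng.normal(size=(50, 3))
    y = X @ true_w + 0.01 * rng.normal(size=50)
    device_data.append((X, y))

global_w = federated_round(device_data, drop_prob=0.25, rng=rng)
```

Only parameter vectors cross the device boundary in this sketch, and a round still yields a global model as long as at least one device reports back.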

2.2. Notations
To make the paper clear, we summarize the definitions of notations in Table 1.
2.3. Federated Learning Framework for Local Model Learning
The proposed method is based on the routine of federated learning. In the setting of federated learning, local model training is completed on the edge directly. This brings two advantages. First, plenty of data can be collected in a distributed way. More training data can be used to improve the performance since the quality of images captured by edge devices is not always satisfactory [22, 23]. Second, image data will not be transferred to servers and data privacy can be preserved [24]. Assuming that is the facial image data captured on edge devices and they can be represented by image features . To obtain the global model, we first need to solve the local subproblem and compute the local models. Then, the parameters of the global model can be updated by incoming local data. Inspired by CoCoA [25] and MOCHA [26], for the -th device and the corresponding image data denoted by , the -th subproblem is defined by where . represents the values of within the -th task, which indicates the parameters updated by the -th task. is the average of and . is the loss function of the -th task, which measures the differences between the predicted results and the ground truth. is a constant parameter to control the updating speed of the federated model. . is the -th diagonal block of , which is defined as where is the identity matrix. Computing requires and is the -th block of , which is required to be transferred between devices and servers.
With and , we can define the dual problem as where is the dual variable for the data point and is the label. Therefore, to solve the above subproblem and compute , we have to find good definitions of , , and .
2.4. Multitask Learning for Global Model Learning
The key to CoCoA and MOCHA is the solution to equation (1). In the proposed method, we improve this solution by solving equation (1) using multitask learning. In multitask learning, we treat the training process on each edge device as a task. The tasks can be trained separately and then combined to obtain a unified model. This model can be transferred back to the devices and used for face spoof attack detection. Several definitions for and in multitask learning are provided by MALSAR [27]. can be an arbitrary convex loss function, such as the hinge loss. can be defined as a clustering loss and computed by a biconvex function: where the -th column of indicates the weight of the -th task and indicates the weights among different tasks. can be updated according to equation (1). However, updating with equation (1) requires and depends on . Therefore, the key to the proposed method is computing an optimal .
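To make the role of the task-relationship matrix concrete, the sketch below evaluates one bi-convex regularizer that is common in the multitask literature, lam1 * tr(W Omega W^T) + lam2 * ||W||_F^2. This form, the symbol names `W` and `Omega`, and the values of `lam1` and `lam2` are assumptions chosen for illustration and are not taken from the text above. When `Omega` is a graph Laplacian over tasks, the trace term equals the weighted sum of squared differences between the weight vectors of related tasks, so related tasks are pulled toward similar models.

```python
import numpy as np

def multitask_regularizer(W, Omega, lam1=1.0, lam2=0.1):
    """lam1 * tr(W Omega W^T) + lam2 * ||W||_F^2, one task per column of W."""
    coupling = np.trace(W @ Omega @ W.T)   # ties related tasks together
    shrinkage = np.sum(W ** 2)             # keeps every task's weights small
    return lam1 * coupling + lam2 * shrinkage

# Three tasks; tasks 0 and 1 are declared related, task 2 is independent.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
Omega = np.diag(A.sum(axis=1)) - A         # graph Laplacian of the task graph

W = np.array([[1.0, 1.2, -3.0],
              [0.5, 0.4,  2.0]])           # shape (features, tasks)

# With a Laplacian, the coupling term equals the weighted sum of squared
# differences between the weight vectors of related tasks.
pairwise = 0.5 * sum(A[i, j] * np.sum((W[:, i] - W[:, j]) ** 2)
                     for i in range(3) for j in range(3))
penalty = multitask_regularizer(W, Omega)
```

In this toy case only tasks 0 and 1 contribute to the coupling term, which is why the choice of the relationship matrix largely determines what the tasks share.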
2.5. Hypergraph Manifold Regularization with Sparse Representation
Manifold regularization has been widely used to describe the relationships among data [28]. Inspired by it, we make use of manifold regularization to model the weights among different tasks , which is known as the Laplacian matrix modeling the relationship of features in the feature space. In the feature space, the LBP features of images are considered as the vertices and the relationships are the connectivities among them. In contrast to a traditional graph, a hypergraph allows an edge to connect more than two vertices. Thus, an edge contains a subset of vertices. Hypergraphs have been shown to describe the connectivities among related data better. Notations used in hypergraph regularization are summarized in Table 2. Based on the patch alignment framework [29], we propose Hypergraph Manifold Regularization with Sparse Representation (HMRSR), in which can be computed in two steps:
2.5.1. Part Optimization
We define one patch to be the vertices connected by one hyperedge. Thus, the patch in the proposed regularization process is defined by
For one patch, we should compute which means that we randomly choose two vertices in the subset of vertices contained by a hyperedge, , and sum the value of
Expanding (6) and combining items, we can get the patch optimization for each hyperedge:
Matrix is where , is an identity matrix.
2.5.2. Whole Alignment
In the hypergraph, the weight of a hyperedge is computed by summing the similarity scores of all the pairs of vertices contained in this hyperedge. The similarity score of any pair of vertices is defined as the distance of image features: where represents the image feature vector of vertex and is the similarity of and . With the hyperedge weighting matrix, the multiview hypergraph Laplacian can be computed by summing the patch optimization defined in (8) of all the hyperedges:
In (11), there are three matrices to be initialized. They are , and . is computed by obtaining the most similar vertices:
Then, can be computed with . The -th item of can be computed by
Finally, can be computed with . The -th item of can be computed by
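The construction described above (one hyperedge per vertex containing its most similar vertices, hyperedge weights obtained by summing pairwise similarities, then a Laplacian) can be sketched as follows. This is an illustration under stated assumptions, not the patch-alignment Laplacian of (8) and (11): it uses the widely used normalized hypergraph Laplacian I - Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2), and a Gaussian kernel on Euclidean distance stands in for the similarity measure. The function name `hypergraph_laplacian` and the parameters `k` and `sigma` are hypothetical.

```python
import numpy as np

def hypergraph_laplacian(X, k=3, sigma=1.0):
    """Normalized hypergraph Laplacian from k-nearest-neighbor hyperedges."""
    n = X.shape[0]
    # Pairwise squared distances and Gaussian similarities (an assumption).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    S = np.exp(-sq / (2.0 * sigma ** 2))

    # Incidence matrix H: hyperedge e holds vertex e and its k most similar
    # vertices (requires k <= n - 1).
    H = np.zeros((n, n))
    for e in range(n):
        sim = S[e].copy()
        sim[e] = -np.inf                   # exclude the vertex itself
        neighbors = np.argsort(-sim)[:k]
        H[e, e] = 1.0
        H[neighbors, e] = 1.0

    # Hyperedge weight: sum of similarities over all vertex pairs inside it.
    w = np.zeros(n)
    for e in range(n):
        members = np.flatnonzero(H[:, e])
        block = S[np.ix_(members, members)]
        w[e] = (block.sum() - np.trace(block)) / 2.0

    dv = H @ w                             # vertex degrees
    de = H.sum(axis=0)                     # hyperedge degrees (k + 1 each)
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    Theta = Dv_inv_sqrt @ H @ np.diag(w / de) @ H.T @ Dv_inv_sqrt
    return np.eye(n) - Theta

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))                # six feature vectors of dimension 4
L = hypergraph_laplacian(X, k=2)
```

The resulting matrix is symmetric and positive semidefinite, which is what allows it to act as a smoothness penalty in a regularized objective.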
In (10) and (12), we need to define a reasonable measurement for feature similarity. Traditionally, it is computed from feature distances, such as the L2 distance. In this paper, we make use of sparse learning. In the result of sparse learning, one vector is represented by a combination of basis vectors in which only about 30% of the coefficients are nonzero. The coefficients can then be used to represent the relationship between them [30]. There are several existing solutions for sparse learning. Among them, we choose the approximation by the L1 norm [31]. The -th image feature can then be represented as a combination of the whole feature set. In this way, the whole feature set is used as the basis vectors. Then, the coefficient can be computed by where is the resulting coefficients. To compute (15), we choose the LARS with Lasso modification implemented by SparseLab [32].
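The idea of using sparse coefficients as a similarity measure can be sketched in a few lines. The example below solves the L1-regularized least-squares problem 0.5 * ||x - B c||^2 + lam * ||c||_1 with plain iterative soft thresholding (ISTA), which is swapped in here for the LARS solver named above purely to keep the example self-contained. The function names and the value of `lam` are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink every entry of v toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(x, B, lam=0.1, n_iter=500):
    """Minimize 0.5 * ||x - B c||^2 + lam * ||c||_1 with ISTA."""
    step = 1.0 / np.linalg.norm(B, 2) ** 2   # 1 / Lipschitz constant
    c = np.zeros(B.shape[1])
    for _ in range(n_iter):
        grad = B.T @ (B @ c - x)             # gradient of the quadratic term
        c = soft_threshold(c - step * grad, step * lam)
    return c

def sparse_similarity(F, lam=0.1):
    """Code each row of F over all other rows; return the coefficients."""
    n = F.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        others = [j for j in range(n) if j != i]
        C[i, others] = sparse_code(F[i], F[others].T, lam)
    return C

rng = np.random.default_rng(0)
F = rng.normal(size=(5, 8))                  # five feature vectors
C = sparse_similarity(F)                     # C[i, j]: weight of feature j in feature i
```

Row `i` of `C` holds the coefficients that reconstruct feature `i` from the remaining features; a feature is never used to code itself, so the diagonal stays zero.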
2.6. Implementation Details
The training process of the proposed FedFSAD is shown in Algorithm 1. With the trained model, the input image can be classified as a real image or a fake image.
3. Simulated Evaluations
3.1. Datasets and Settings
In the experiments, we use three challenging datasets for face spoof attack detection. The first one is the NUAA Photograph Imposter Database (NUAA). NUAA was collected with generic, commonly used webcams [33]. It was collected in three sessions, and the place and illumination conditions of each session are different. There are 5105 real faces and 7509 fake images from 15 subjects in total.
The second dataset is the Multispectral-Spoof face spoofing database (MSSPOOF) built at the Idiap Research Institute. It contains both color images (VIS) and near-infrared images (NIR) [34]. Similar to NUAA, images in MSSPOOF are recorded in different lighting conditions. The number of subjects in the database is 21. There are 70 real faces and 144 fake images for each subject. Examples from the database are shown in Figure 2.

The third dataset is the CASIA Face Anti-Spoofing Database (CASIA-SURF) [35], which was collected by the Institute of Automation, Chinese Academy of Sciences. It contains 29266 training samples, 9608 validation samples, and 57710 testing samples. A color image, a depth image, and an infrared image are provided for each sample.
In the experiments, the performance is measured by the classification accuracy, that is, the number of correctly classified samples divided by the total number of tested samples. Classification is completed with a simple SVM regularized by the trained model [26]. For cross validation, we randomly choose samples for training, and the remaining samples are used for testing. By default, samples in the datasets are equally assigned to tasks. In addition, to simulate the scenario of federated learning, edge subsystems randomly drop. This process is repeated 20 times and the average results are shown. All the facial parts are detected and resized to . A laptop with an i7-9750H CPU, 16 GB RAM, and a GTX1650 GPU is used. Evaluations are run on MATLAB R2017a.
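The evaluation protocol can be illustrated with a short sketch of the accuracy measure, the equal assignment of samples to tasks, and the averaging over repeated runs. It is written in Python rather than MATLAB for illustration only, and the helper names `accuracy`, `split_into_tasks`, and `average_over_runs`, as well as the sample and task counts in the example, are hypothetical.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the ground truth."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def split_into_tasks(n_samples, n_tasks, rng):
    """Shuffle sample indices and split them as evenly as possible."""
    return np.array_split(rng.permutation(n_samples), n_tasks)

def average_over_runs(run_once, n_runs, rng):
    """Call run_once(rng) n_runs times and return the mean accuracy."""
    return float(np.mean([run_once(rng) for _ in range(n_runs)]))

rng = np.random.default_rng(0)
parts = split_into_tasks(n_samples=103, n_tasks=4, rng=rng)
```

Here `run_once` would wrap one complete train-and-test cycle, including the random dropping of edge subsystems, and return its accuracy; calling `average_over_runs(run_once, 20, rng)` then mirrors the 20 repetitions used in the experiments.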
3.2. Comparison of Different Manifold Learning Methods
Manifold regularization has been comprehensively studied. In this part, we demonstrate the effectiveness of the proposed Hypergraph Manifold Regularization with Sparse Representation (HMRSR). We compare it with existing manifold learning methods, such as LDA, DLA, LPP, NPE, LSDA, and ISOMAP [29]. The results are shown in Figure 3. Thanks to hypergraph learning and sparse representation, the proposed HMRSR can capture the connectivities among features and achieve better performance.

3.3. Parameter Sensitivities
As shown in Table 1, the proposed method depends on three parameters. They are , , and . The setting of is quite complicated, and we follow MOCHA [26]. Performance is influenced by and , and reasonable values of and should be set for each dataset. The performance with different is shown in Figure 4. The best performance of FedFSAD is achieved when for NUAA, for MSSPOOF, and for CASIA-SURF.

The performance with different is shown in Figure 5. The best performance of FedFSAD is achieved when for NUAA, for MSSPOOF, and for CASIA-SURF.

3.4. Comparison with Existing Methods
First, we emphasize the improvement of the proposed FedFSAD with multitask learning and manifold regularization. FedFSAD is compared with a fully global model, a fully local model, and the previous multitask model MOCHA. Besides, some state-of-the-art methods are also included in the numerical comparison. In this experiment, we refer to the following methods:
(1) CoCoA [25]: CoCoA is a communication-efficient framework for distributed learning. It trains local models in a primal-dual setting. In this way, the amount of transferred information can be reduced.
(2) MOCHA [26]: MOCHA extends CoCoA with multitask learning and is applied to federated learning.
(3) SeetaFace6 [36]: SeetaFace is an open-source project for face applications with computer vision. The latest version, which is named SeetaFace6, provides face spoof attack detection. The model has already been trained, and we use it directly in testing.
(4) Guided Scale Local Binary Pattern (GS-LBP) [4]: GS-LBP makes use of the edge-preserving property of the guided scale space. Besides, joint quantization is used to encode the spatial locality. Therefore, it can be used as the facial image feature. SVM is used as the classifier.
The results are shown in Table 3. The items with the best performance in each dataset are highlighted in red. Generally speaking, CASIA-SURF is the most difficult. SeetaFace6 and GS-LBP provide stable performance. However, they do not consider the improvements available from multiple tasks. Thanks to the application of multitask learning, MOCHA achieves better performance than the fully global model and the fully local model. In addition, the proposed FedFSAD is better than MOCHA owing to the use of manifold regularization.
Second, we show the robustness of the proposed FedFSAD. In the scenario of federated learning, stragglers and network delays are critical. Stragglers appear when it takes too much time to train local models, while network delays appear when it takes too much time to transfer local models to the server. In these experiments, we compare against CoCoA and MOCHA and show the results on statistical heterogeneity and system heterogeneity. The results are shown in Figures 6 and 7. As time elapses, the primal suboptimality is reduced. The proposed FedFSAD is robust and outperforms state-of-the-art methods when stragglers and network delays appear. We also conduct cross-dataset testing, which is shown in Table 4. The proposed method remains applicable in the cross-dataset scenario. Besides, if a larger training set, such as CASIA-SURF, is used, the performance is better.

4. Conclusion and Discussion
The methodology and the improvements in simulated performance demonstrate the novelty and contribution of the proposed FedFSAD.
First, the proposed method tackles the issue of face spoof attack detection on edge devices. Based on the framework of federated learning, we introduce multitask learning and thereby propose a solution to federated learning with multitask learning. Besides, it is improved by using manifold learning. In this way, the relationships among tasks on edge devices are explored by hypergraph manifold regularization with sparse representation. Therefore, the proposed method is a novel method for face spoof attack detection.
Second, comprehensive simulations have been conducted. According to the results, FedFSAD outperforms existing methods in the accuracy of face spoof attack detection. This indicates that a better model is obtained with multitask learning and manifold learning. Besides, we simulate the situations of stragglers and network delays. Although some information is late or missing in these situations, FedFSAD is still better than existing methods. In this way, robustness is also improved.
In the future, we will focus on unexpected problems on edge devices, such as hardware failure and network disconnection. In these situations, both the performance and the robustness can be further improved.
Data Availability
The NUAA dataset is provided to researchers for research purposes only and not for any commercial use. The data cannot be released and the link cannot be redistributed without the authors’ permission. Therefore, we contacted them and obtained the data with permission. The MSSPOOF dataset is available at https://www.idiap.ch/dataset/msspoof.
Conflicts of Interest
The authors declare that they have no conflicts of interest.