Abstract

Face detection is a crucial prestage for face recognition and is often treated as a binary (face versus nonface) classification problem. While this strategy is simple to implement, face detection accuracy drops when the nonface training patterns are undersampled. To avoid this problem, we propose a one-class learning-based face detector called the support vector data description (SVDD) committee, which consists of several SVDD members, each trained on a subset of face patterns. Nonfaces are not required to train the SVDD committee; therefore, its face detection accuracy is independent of the nonface training patterns. Moreover, the proposed SVDD committee also improves the generalization ability of the original SVDD when the face data set has a multicluster distribution. Experiments on the extended MIT face data set show that the proposed SVDD committee achieves better face detection accuracy than the widely used SVM face detector and outperforms other one-class classifiers, including the original SVDD and kernel principal component analysis (Kernel PCA).

1. Introduction

Face detection plays a key role in human-computer interaction since it is a prior step to face recognition. Given an image, the objective of face detection is to locate the faces in the image and return the location of each face. Due to complex backgrounds, variations in facial details, and lighting conditions, face detection is considered one of the most challenging pattern recognition problems. A large body of work has been devoted to this difficult problem over the past decades and has been nicely surveyed in [13].

1.1. Related Works

The appearance-based approach has dominated the recent advances in face detection [3]. It consists of two main steps: first, a sliding window scans the whole image in a serial fashion [4, 5]; then a preselected face detector performs a binary (face and nonface) classification task on each window image to verify whether a face is present in that window. Previous works based on this approach focused on issues such as (1) exploiting robust features, for example, Haar-like features [6], Bayesian features [7], spectral histograms [8], local binary pattern- (LBP-) based spatial histograms [9], and principal component analysis (PCA) and its nonlinear version, Kernel PCA [10], (2) seeking face detectors with good generalization ability, such as neural networks (NNs) [11–13], Bayesian classifiers [7, 14], and the support vector machine (SVM) [4, 8, 10, 15–25], and (3) further improving the efficiency of a given face detector by boosting-based techniques, among which AdaBoost is probably the most famous and has been used in the Viola-Jones face detector [6, 26]. In this work, we address the second issue, face detector design, and propose a novel face detector called the support vector data description (SVDD) committee.

1.2. Problem Description

Appearance-based face detection typically treats the face detection task as a binary classification problem [2, 3, 5]: face and nonface classification. Accordingly, two-class classifiers are adopted, and among them the SVM of Vapnik [27] has been the most frequently used face detector. The success of SVM in face detection can be attributed to the use of kernel tricks and to its learning strategy based on structural risk minimization (SRM). However, SVM still suffers from a critical problem when applied to face detection: a high false positive rate caused by unrepresentative nonface training patterns, described as follows.

To train an SVM, one has to prepare in advance a training set containing face (positive) and nonface (negative) patterns [22]. The training set is then used to train an SVM to find an optimal separating hyperplane (OSH) with maximal margin of separation in a kernel-induced feature space. However, compared with the face class, the nonface class has a much larger variation, because any pattern that does not belong to the face class is a nonface pattern. It is therefore easy to collect a set of face training patterns that represents the face class well, but difficult to collect a set of nonface training patterns that is representative enough. In other words, the nonface class is very likely undersampled: the distribution of the collected nonface training patterns is not identical to the true distribution of the nonface class. If an SVM is trained on such an unrepresentative nonface training set, many nonface patterns will fall on the wrong side of the OSH at testing time, resulting in numerous false positives, as illustrated in Figure 1.

1.3. Presented Work

To avoid this critical problem, we adopt the one-class learning strategy to deal with the aforementioned undersampling of nonface training patterns. One-class learning addresses classification problems in which one of the two classes is undersampled, or in which only data from a single class are available for training [28–30]. A one-class classifier finds a compact description of a specific class (usually referred to as the target class) and can be built from that single class alone. In this work, we treat faces as targets and nonfaces as outliers. The decision boundary of the one-class classifier is then used to distinguish targets from outliers.

Various one-class classifiers have been proposed, such as the linear programming (LP) approach [31], the single-class minimax probability machine (MPM) [32], the Gaussian mixture model (GMM) [28], the one-class SVM [33], SVDD [34, 35], and Kernel PCA for novelty detection [36]. In this work, we design a robust face detector based on the SVDD of Tax and Duin [34, 35].

SVDD is a kernel method for novelty detection. Given a target training set, SVDD maps the set into a higher-dimensional kernel-induced feature space and then finds a minimum-volume hypersphere that encloses all or most of the mapped target data in this feature space. Thanks to the kernel trick, the sphere boundary in the feature space becomes a flexible boundary in the original input space and can thus fit irregularly shaped target data sets. This is particularly useful for face detection since face patterns are in general nonlinearly distributed [37, 38]. Recently, SVDD has proven successful in a variety of novelty detection problems, such as anomaly detection in hyperspectral images [39], defect inspection [40], and novel percept detection for a vision-guided mobile robot [41].

However, SVDD still has its limits. When a target training set is not a compact set but is formed by a set of disjoint clusters in the data space, the generalization performance of SVDD drops significantly, as pointed out by Tax and Duin [34]. Unfortunately, face patterns from different individuals form a multicluster distribution in pattern space. Thus, using a single SVDD to discriminate faces from nonfaces may not be adequate. To solve this problem, we propose in this paper an SVDD committee.

The training of the proposed SVDD committee consists of two stages. In the first stage, a given face training set is automatically partitioned into disjoint clusters using the fuzzy $c$-means (FCM) algorithm [42] and a partitioning entropy-based best-cluster-number selection criterion [43]. The face patterns in each cluster form a compact face subset. In the second stage, each face subset is used to train a unique SVDD. In addition, the decision boundary of SVDD often encloses the face training patterns tightly, which limits the generalization performance on unseen faces. To improve the performance, we also modify the original decision function of SVDD so that its decision boundary is enlarged; by doing so, the face acceptance rate is improved. Finally, if there are $c$ face clusters, $c$ SVDDs (members) are trained. In the testing stage, each trained SVDD serves as a committee member. An unseen pattern is classified as a face pattern if it is accepted by any of the $c$ SVDDs. Details of the SVDD committee are given in Section 3.

The rest of this paper is organized as follows. In Section 2, we first review the basics of SVDD. Then, the SVDD committee will be introduced in Section 3. Results and discussions are provided in Section 4. Finally, conclusions are drawn in Section 5.

2. SVDD

Let $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ be a face training set, where the $\mathbf{x}_i$ are face training patterns. SVDD maps the training patterns into a higher-dimensional feature space $F$ using a nonlinear map $\phi$ and then finds a minimum-enclosing hypersphere with center $\mathbf{a}$ and radius $R$ in $F$, which can be formulated as the optimization problem

$$\min_{R,\,\mathbf{a},\,\boldsymbol{\xi}} \; R^2 + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.} \quad \|\phi(\mathbf{x}_i)-\mathbf{a}\|^2 \le R^2 + \xi_i, \;\; \xi_i \ge 0, \;\; i = 1,\ldots,N,$$

where the penalty weight $C$ is user specified and the $\xi_i$ are slack variables representing training errors. Taking the partial derivatives $\partial L/\partial R = 0$, $\partial L/\partial \mathbf{a} = 0$, and $\partial L/\partial \xi_i = 0$, where $L$ is the Lagrangian function and $\alpha_i \ge 0$ and $\gamma_i \ge 0$ are the Lagrange multipliers, and substituting the results back into $L$ yields the dual constrained optimization problem

$$\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{N}\alpha_i K(\mathbf{x}_i,\mathbf{x}_i) - \sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j K(\mathbf{x}_i,\mathbf{x}_j) \quad \text{s.t.} \quad \sum_{i=1}^{N}\alpha_i = 1, \;\; 0 \le \alpha_i \le C,$$

where $K(\mathbf{x}_i,\mathbf{x}_j) = \phi(\mathbf{x}_i)\cdot\phi(\mathbf{x}_j)$ is the kernel function. In this paper, the Gaussian function $K(\mathbf{x}_i,\mathbf{x}_j) = \exp(-\|\mathbf{x}_i-\mathbf{x}_j\|^2/s^2)$ is used as the SVDD kernel, where $s$ is the width parameter of the Gaussian kernel. According to the Karush-Kuhn-Tucker conditions, (1) the data points with $\alpha_i = 0$ are inside the hypersphere, (2) the data points with $0 < \alpha_i < C$ are on the sphere boundary, and (3) the data points with $\alpha_i = C$ fall outside the sphere and have nonzero $\xi_i$. The data points with $\alpha_i > 0$ are support vectors (SVs). Further, the SVs with $0 < \alpha_i < C$ are called unbounded SVs (UBSVs), while the SVs with $\alpha_i = C$ are called bounded SVs (BSVs). The center of the sphere is spanned by the mapped SVs, $\mathbf{a} = \sum_{k=1}^{N_{\mathrm{SV}}} \alpha_k \phi(\mathbf{x}_k)$, where $N_{\mathrm{SV}}$ is the number of SVs. The sphere radius is determined by taking any UBSV $\mathbf{x}_u$ and calculating the distance from its image $\phi(\mathbf{x}_u)$ to the center:

$$R^2 = K(\mathbf{x}_u,\mathbf{x}_u) - 2\sum_{k}\alpha_k K(\mathbf{x}_k,\mathbf{x}_u) + \sum_{k}\sum_{l}\alpha_k\alpha_l K(\mathbf{x}_k,\mathbf{x}_l).$$

For the Gaussian kernel, $K(\mathbf{x},\mathbf{x}) = 1$ for all $\mathbf{x}$. Mapping the sphere boundary back into the original space yields a flexible boundary that encloses the face training set. The free parameter $s$ controls the tightness of the boundary: the smaller $s$ is, the tighter the boundary. However, $s$ cannot be too small; otherwise the boundary becomes so tight that a satisfactory face acceptance rate for unseen patterns cannot be obtained. The decision function of SVDD is given by

$$f(\mathbf{z}) = R^2 - \left( K(\mathbf{z},\mathbf{z}) - 2\sum_{k}\alpha_k K(\mathbf{x}_k,\mathbf{z}) + \sum_{k}\sum_{l}\alpha_k\alpha_l K(\mathbf{x}_k,\mathbf{x}_l) \right).$$

If $f(\mathbf{z}) \ge 0$, the test pattern $\mathbf{z}$ is accepted as a face pattern; otherwise it is rejected as a nonface pattern.
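To make the formulation concrete, here is a minimal SVDD sketch in Python. It is not the paper's implementation: the dual is solved with SciPy's general-purpose SLSQP routine (adequate only for small $N$; the paper does not state which QP solver was used), and all function names are our own.

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(X, Y, s):
    # K(x, y) = exp(-||x - y||^2 / s^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / s ** 2)

def train_svdd(X, C=1.0, s=3.0):
    """Solve the SVDD dual: max sum_i a_i K_ii - sum_ij a_i a_j K_ij,
    s.t. sum_i a_i = 1 and 0 <= a_i <= C."""
    N = len(X)
    K = gaussian_kernel(X, X, s)
    obj = lambda a: -(a @ np.diag(K) - a @ K @ a)   # negate for minimization
    cons = ({"type": "eq", "fun": lambda a: a.sum() - 1.0},)
    res = minimize(obj, np.full(N, 1.0 / N), method="SLSQP",
                   bounds=[(0.0, C)] * N, constraints=cons)
    alpha = res.x
    # Any unbounded SV (0 < a_i < C) lies on the boundary and fixes R^2.
    ubsv = np.where((alpha > 1e-6) & (alpha < C - 1e-6))[0]
    u = ubsv[0] if len(ubsv) else int(alpha.argmax())
    R2 = K[u, u] - 2 * (alpha @ K[:, u]) + alpha @ K @ alpha
    return alpha, R2

def svdd_accepts(z, X, alpha, R2, s=3.0):
    # f(z) >= 0  <=>  ||phi(z) - a||^2 <= R^2 (K(z, z) = 1 for the Gaussian kernel)
    dist2 = (1.0 - 2 * (alpha @ gaussian_kernel(X, z[None, :], s).ravel())
             + alpha @ gaussian_kernel(X, X, s) @ alpha)
    return dist2 <= R2
```

In Section 3, one such SVDD is trained per face cluster.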

3. SVDD Committee

3.1. Training

Stage 1 (partitioning). The first stage partitions the face training set $X$ into $c$ disjoint clusters. Here, the fuzzy $c$-means (FCM) algorithm is employed to accomplish this task, which solves the optimization problem

$$\min_{U,\,V} \; J_m(U,V) = \sum_{i=1}^{c}\sum_{j=1}^{N} u_{ij}^{m}\,\|\mathbf{x}_j-\mathbf{v}_i\|^2$$

subject to the constraints

$$u_{ij} \in [0,1], \quad \sum_{i=1}^{c} u_{ij} = 1 \;\;\forall j, \quad 0 < \sum_{j=1}^{N} u_{ij} < N \;\;\forall i,$$

where $U = [u_{ij}]$ is the partition matrix, $u_{ij}$ is the membership degree of the $j$th training pattern in the $i$th cluster, $V = (\mathbf{v}_1,\ldots,\mathbf{v}_c)$ is a $c$-tuple of cluster prototypes, $\mathbf{v}_i$ is the centroid of the $i$th cluster, $c$ is the number of clusters ($1 < c < N$), and $m > 1$ is the weight controlling the degree of fuzziness of the partition matrix (we fix $m$ in this study). The FCM algorithm proceeds iteratively as follows.

Step 1. Fix the number of clusters $c$ and the fuzziness weight $m$, and set the iteration counter $t = 0$.

Step 2. Initialize the partition matrix $U^{(0)}$ randomly, subject to the constraints above.

Step 3. Compute the cluster centroids $\mathbf{v}_i^{(t)}$ by

$$\mathbf{v}_i^{(t)} = \frac{\sum_{j=1}^{N} \bigl(u_{ij}^{(t)}\bigr)^m \mathbf{x}_j}{\sum_{j=1}^{N} \bigl(u_{ij}^{(t)}\bigr)^m}, \quad i = 1,\ldots,c.$$

Step 4. Update the memberships $u_{ij}^{(t+1)}$ by

$$u_{ij}^{(t+1)} = \left[\sum_{k=1}^{c}\left(\frac{\|\mathbf{x}_j-\mathbf{v}_i^{(t)}\|}{\|\mathbf{x}_j-\mathbf{v}_k^{(t)}\|}\right)^{2/(m-1)}\right]^{-1}.$$

Step 5. Repeat Steps 3 and 4 until $\|U^{(t+1)} - U^{(t)}\| < \varepsilon$, where $\varepsilon$ is a small positive value.
Choosing a proper number of clusters $c$ is of primary importance. The best number of clusters $c^\ast$ can be determined using the partitioning entropy-based criterion [43]

$$c^\ast = \arg\min_{c \in \Omega} H(U;c), \qquad H(U;c) = -\frac{1}{N}\sum_{i=1}^{c}\sum_{j=1}^{N} u_{ij}\ln u_{ij},$$

where $\Omega$ is the set of candidate solutions and $H(U;c)$ is the partitioning entropy, providing a global validity measure for the clustering results. Finally, the training set is partitioned into $c^\ast$ subsets using the simple rule that the $j$th training data point is assigned to the $i^\ast$th cluster if its membership degree in that cluster is the highest; namely,

$$i^\ast = \arg\max_{1 \le i \le c^\ast} u_{ij}.$$
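A compact sketch of Stage 1 follows. It implements the FCM updates and the minimum-partitioning-entropy selection as described above; the candidate range, tolerance, and random initialization are our assumptions, and practical versions of the entropy criterion often normalize $H$ before comparing different $c$.

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                                    # memberships sum to 1
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)      # Step 3: centroids
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1) + 1e-12
        U_new = d2 ** (-1.0 / (m - 1.0))
        U_new /= U_new.sum(axis=0)                        # Step 4: memberships
        if np.abs(U_new - U).max() < eps:                 # Step 5: convergence
            return U_new, V
        U = U_new
    return U, V

def partition_entropy(U):
    # H(U; c) = -(1/N) sum_ij u_ij ln u_ij
    return -np.mean(np.sum(U * np.log(U + 1e-12), axis=0))

def best_partition(X, candidates=range(2, 8)):
    results = {c: fcm(X, c) for c in candidates}
    c_best = min(candidates, key=lambda c: partition_entropy(results[c][0]))
    U, V = results[c_best]
    labels = U.argmax(axis=0)   # assign each pattern to its highest-membership cluster
    return c_best, labels, V
```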

Stage 2 (training SVDDs). After the training set $X$ is partitioned into the subsets $X_1, \ldots, X_{c^\ast}$, where there is no overlap between any two subsets $X_i$ and $X_j$ ($i \ne j$), the second training stage trains $c^\ast$ SVDDs. The $i$th subset $X_i$ is used to train the $i$th SVDD: the face training subset $X_i$ yields the Lagrange multipliers of the $i$th SVDD by solving the dual quadratic programming problem formulated in Section 2, and the sphere radius $R_i$ is then calculated from the radius formula in Section 2. However, the trained decision boundary tightly encloses the face training subset, so test face patterns located on the fringe of the subset's distribution may fall outside the boundary and be rejected as nonfaces. In this study, we deal with this problem by enlarging the sphere; that is,

$$\tilde{R}_i^2 = R_i^2 + \delta,$$

where $\delta$ is a positive value. $\delta$ cannot be too large; otherwise the outlier acceptance rate (nonfaces classified as faces) increases significantly even though the face acceptance rate is improved. Therefore, $\delta$ should be much smaller than $R_i^2$. According to our preliminary testing results, a suitably small $\delta$ improves the face acceptance rate without increasing the nonface acceptance rate. After updating the sphere radius, the decision function $f_i(\mathbf{z})$ of the $i$th SVDD is obtained.
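The following sketch makes Stage 2 concrete, reusing train_svdd and best_partition from the earlier sketches; the additive enlargement delta=0.05 is a placeholder, not the paper's setting.

```python
# Stage 2 sketch: one SVDD per face cluster, with an enlarged sphere.
def train_committee(X_faces, delta=0.05, C=1.0, s=3.0):
    c_best, labels, _ = best_partition(X_faces)
    members = []
    for i in range(c_best):
        Xi = X_faces[labels == i]                # i-th compact face subset
        alpha, R2 = train_svdd(Xi, C=C, s=s)     # solve the dual QP for member i
        members.append((Xi, alpha, R2 + delta))  # enlarge: R~_i^2 = R_i^2 + delta
    return members
```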

3.2. Testing

For an unseen pattern $\mathbf{z}$, it is rejected as a nonface if it is rejected by all the committee members and accepted as a face otherwise:

$$\mathbf{z} \text{ is classified as } \begin{cases} \text{face}, & \text{if } \exists\, i \in \{1,\ldots,c^\ast\}:\; f_i(\mathbf{z}) \ge 0,\\ \text{nonface}, & \text{otherwise.} \end{cases}$$

In other words, the decision-making strategy of the SVDD committee is not the usual majority voting. Instead, an unseen pattern is classified as a face as long as at least one of the $c^\ast$ SVDDs accepts it.
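With the sketches above, this OR-style rule is a one-liner; svdd_accepts is the decision-function sketch from Section 2.

```python
# Accept a pattern if ANY committee member accepts it (logical OR,
# not majority voting).
def committee_accepts(z, members, s=3.0):
    return any(svdd_accepts(z, Xi, alpha, R2, s=s) for Xi, alpha, R2 in members)
```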

4. Results and Discussions

In Section 4.1, we first introduce the data set used in the experiments. In Section 4.2, we give an illustrative example showing, through a visualization analysis, how the proposed SVDD committee deals with a multicluster face distribution. Finally, in Section 4.3, we compare our method with SVM and other one-class classifiers in terms of face detection accuracy.

4.1. Data Set

The extended MIT face data set [44, 45] consists of a training set and a test set. The training set contains 489,410 patterns, of which 17,496 are faces and the remaining 471,914 are nonfaces. The test set contains 472 faces and 23,573 nonfaces. Each pattern is represented by a 361-dimensional vector.

4.2. An Illustrative Example

We randomly select 100 face patterns from the extended MIT face data set and perform principal component analysis (PCA) to reduce the dimensionality of the patterns by projecting them onto the two leading eigenvectors of the pattern covariance matrix. The projections of the 100 face patterns in the 2D PCA-based subspace are depicted in Figure 2.
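This visualization step can be reproduced along the following lines (our own minimal PCA, not the paper's code):

```python
import numpy as np

def pca_2d(X):
    # Project patterns onto the two leading eigenvectors of the covariance.
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    W = eigvecs[:, -2:][:, ::-1]   # eigh sorts ascending; take the top two
    return Xc @ W                  # N x 2 projections for plotting
```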

It can be observed from Figure 2(a) that the face patterns do not form a compact distribution but rather a multimodal (multicluster) distribution in the 2D space. We first use a single SVDD to learn a boundary that encloses most of the face patterns. When the kernel parameter $s$ is set to a very large value ($s = 100$), the decision boundary of SVDD is nearly spherical in the original 2D space. As the value of $s$ decreases from 100 to 10, the SVDD boundary becomes tighter (see Figure 2(c)). However, the boundary is still not tight enough because empty areas remain (the areas within the green circles). Nonface patterns falling into these empty areas will be accepted as faces since such patterns are also inside the SVDD boundary. One way to avoid this situation is to further decrease the value of $s$. However, according to the authors' previous work [46], when $s$ is too small, all the mapped target training data become nearly orthonormal to each other in the Gaussian kernel-induced feature space and all become SVs. When all or almost all of the target training data become SVs, the boundary is too tight to achieve a good target acceptance rate [33, 34]. As shown in Figure 2(d), when $s$ is set to 1, the boundary overfits the face training set and most of the face training patterns become SVs. Although nonfaces are successfully rejected by this extremely tight boundary, face test patterns are easily rejected as nonfaces as well, resulting in a poor face acceptance rate (i.e., face detection rate).

Instead of using the whole face training set to train a single SVDD, the proposed SVDD committee partitions the whole training set into disjoint clusters and then uses the face patterns in each cluster to train an independent SVDD. By doing so, the performance drop caused by the multicluster face distribution is avoided. An illustrative example is shown in Figure 3.

First, the 100 face patterns are partitioned into three clusters (three disjoint subsets) using the FCM algorithm and the best-cluster-number selection criterion stated in Section 3. Compared with the whole training set, each subset forms a much more compact distribution. Each subset is then used to train an SVDD, and the three independent SVDDs constitute a committee. Comparing the results of Figure 3 with those of Figure 2 shows that multiple SVDDs describe a multicluster face distribution more suitably than a single SVDD.

4.3. Comparison with Other Methods
4.3.1. Methods

We compare the proposed SVDD committee with the frequently adopted face detector SVM and with one-class learning methods, including the regular SVDD (i.e., a single SVDD) and Kernel PCA [36].

Kernel PCA is a nonlinear version of PCA, originally designed for pattern representation [47]. It has since been extended to novelty detection [36]: Kernel PCA uses the reconstruction error in a kernel-induced feature space as a novelty measure (see [36] for details). A test data point is accepted as a target if its reconstruction error is below a predefined threshold and rejected as an outlier otherwise. Kernel PCA for novelty detection involves three free parameters: the kernel parameter, the number of retained eigenvectors, and the reconstruction-error threshold. The free parameters of all methods are listed in Table 1.

4.3.2. Performance Measure

For SVM, faces are treated as positive (target) data and nonfaces as negative (outlier) data. For the one-class classifiers (SVDD, SVDD committee, and Kernel PCA), faces are treated as targets and nonfaces as outliers. In face detection problems, nonfaces greatly outnumber faces. When the data set is imbalanced in this way, the usual classification error rate is not an appropriate performance measure [48]. Therefore, the balanced loss suggested in [48] is adopted as the performance measure in the following experiments:

$$\ell_{\mathrm{bal}} = 1 - \tfrac{1}{2}\,(\mathrm{TAR} + \mathrm{ORR}),$$

where TAR and ORR stand for the target acceptance rate and the outlier rejection rate, respectively. A good face detector should achieve a high TAR and a high ORR simultaneously; hence, the lower the balanced loss, the better the face detector. To facilitate the comparisons, the balanced loss is simply called the error rate hereafter.
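The measure is trivial to compute; the formula above is our reconstruction of the balanced loss (the mean of the face-miss rate and the nonface-acceptance rate):

```python
def balanced_loss(tar, orr):
    # 1 - (TAR + ORR)/2: averages the miss rate (1 - TAR) and
    # the false-alarm rate (1 - ORR).
    return 1.0 - 0.5 * (tar + orr)

# e.g., TAR = 0.95, ORR = 0.80  ->  1 - 0.875 = 0.125 (12.5% error rate)
```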

4.3.3. Experiments and Results

All methods are optimized by a cross-validation procedure on a data set. This data set is not the entire training set of the extended MIT face data set, because that training set is very large: it contains 17,496 faces and 471,914 nonfaces. The intent of this paper is not to propose a method for large-scale data sets but to propose a robust face detector that outperforms the widely used SVM detector and improves on the one-class classifier SVDD. Thus, only a subset of the training set is used for training, and a subset of the test set is used for testing. The experiment proceeds as follows.

(1)  Step 1. Randomly select $N_f$ face and $N_{nf}$ nonface patterns from the training set; the collected patterns form a new training set. In addition, randomly select 200 face and 400 nonface patterns from the test set; these 600 patterns form a new test set. There is no overlap between the new training set and the new test set.

(2)  Step 2. Perform a 10-run twofold cross validation [30] on the new training set to optimize the free parameters of each method.

(3)  Step 3. Feed the new test set to the methods to obtain their error rates.

(4)  Step 4. Repeat Steps 1–3 ten times and compute the average error rate of each method. (A skeleton of this protocol is sketched below.)
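A skeleton of Steps 1–4 might look as follows; build_and_tune stands in for each method's 10-run twofold CV parameter search, all names are our own, and balanced_loss is defined above.

```python
import numpy as np

def run_protocol(train_faces, train_nonfaces, test_faces, test_nonfaces,
                 n_face, n_nonface, build_and_tune, n_repeats=10, seed=0):
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_repeats):
        # Step 1: random training subset; fixed-size random test subset
        tr_f = train_faces[rng.choice(len(train_faces), n_face, replace=False)]
        tr_n = train_nonfaces[rng.choice(len(train_nonfaces), n_nonface, replace=False)]
        te_f = test_faces[rng.choice(len(test_faces), 200, replace=False)]
        te_n = test_nonfaces[rng.choice(len(test_nonfaces), 400, replace=False)]
        # Step 2: tune free parameters via cross validation on the training subset
        model = build_and_tune(tr_f, tr_n)   # returns a callable z -> accepted?
        # Step 3: balanced loss on the held-out test subset
        tar = np.mean([model(z) for z in te_f])
        orr = 1.0 - np.mean([model(z) for z in te_n])
        errors.append(balanced_loss(tar, orr))
    return float(np.mean(errors))            # Step 4: average over 10 repetitions
```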

Three experiments are conducted in total, Exp A, Exp B, and Exp C, which share the same number of face training patterns $N_f$ but use an increasing number of nonface training patterns $N_{nf}$ from Exp A to Exp C. The average error rates of the methods are listed in Table 2.

According to Table 2, the average error rates of SVM in Exp A, Exp B, and Exp C are 20.12%, 18.24%, and 14.24%, respectively. These results indicate that increasing the number of nonface training patterns improves the generalization performance of SVM in face detection, presumably because a larger set of nonface training patterns better represents the true distribution of the nonface class. Moreover, all the one-class classifiers perform better than the two-class SVM in all experiments, except in Exp C, where SVM (14.24%) is slightly better than SVDD (14.27%). The proposed SVDD committee gives the best results in all three experiments and greatly improves the accuracy of the original SVDD, as shown in Table 3, where the error reduction ratio (ERR) is defined as

$$\mathrm{ERR} = \frac{e_{\mathrm{SVDD}} - e_{\mathrm{committee}}}{e_{\mathrm{SVDD}}} \times 100\%,$$

with $e_{\mathrm{SVDD}}$ and $e_{\mathrm{committee}}$ denoting the average error rates of SVDD and the SVDD committee, respectively. The results reported in Table 3 demonstrate that the proposed SVDD committee outperforms a single SVDD in face detection.

4.3.4. Training and Testing Speeds

We further compare the computational costs of the methods. The training and testing times of each method were recorded during the experiments on a computer with a 3.40 GHz CPU (i7-3770) and 8 GB RAM running 64-bit Windows 8; the methods are implemented in MATLAB 7.10 (64-bit). The training and testing times are summarized in Tables 4 and 5, respectively.

We can see from Table 4 that the training time of SVM increases with the size of the training set, because SVM training has $O(n^3)$ time complexity, where $n = (N_f + N_{nf})/2$ (for twofold cross validation, 50% of the patterns are used for training). For the one-class classifiers, only half of the face training patterns are used in training; for example, SVDD's training time complexity is $O(n_f^3)$, where $n_f = N_f/2$. Since $N_f$ is the same in all three experiments, the actual training time of SVDD is almost the same in the three cases. The SVDD committee takes more training time than SVDD. However, compared with SVM, the SVDD committee trains much faster, especially as the number of nonface training patterns increases. The testing speeds reported in Table 5 indicate that SVDD has the fastest testing speed. Although the SVDD committee is slower than SVDD, its testing speed (1.12–1.54 ms/pattern) is acceptable for real-time face detection.

5. Conclusion

In this paper, we have presented a novel face detector called the SVDD committee. The proposed detector is based on one-class learning and a partitioning strategy and thus improves the generalization performance of the original SVDD in face detection. Moreover, nonface patterns are not required to train the SVDD committee, so its face detection accuracy is not affected by the choice of nonface patterns. In contrast, the frequently adopted face detector SVM is a two-class classifier: its face detection accuracy depends on the collected nonface training patterns, and its training time increases with the size of the training set. Experiments have demonstrated that the proposed SVDD committee not only performs better than SVM but also significantly improves the generalization performance of the original SVDD in face detection. Its testing speed is also acceptable for applications requiring real-time face detection.

This work does not apply any robust feature extraction methods, because its focus is the development of a novel face detector based on one-class learning. We believe that the face detection accuracy of the SVDD committee can be further improved by advanced feature extraction, such as LBP and Haar-like features. In addition, the presented work does not address other critical issues such as learning from large-scale data sets. These will be addressed in future work.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This study was supported by a grant from the Institute of Nuclear Energy Research, Atomic Energy Council, Taiwan.