Abstract

Face detection has been an important and active research topic in computer vision and image processing. In recent years, learning-based face detection algorithms have prevailed with successful applications. In this paper, we propose a new face detection algorithm that works directly in the wavelet compressed domain. In order to simplify the processes of image decompression and feature extraction, we modify the AdaBoost learning algorithm to select a set of complementary joint-coefficient classifiers and integrate them to achieve optimal face detection. Since face detection in the wavelet compressed domain is restricted by the limited discrimination power of the designated feature space, the proposed learning mechanism is developed to achieve the best discrimination from this restricted feature space. The major contributions in the proposed AdaBoost face detection learning algorithm include feature space warping, joint feature representation, ID3-like plane quantization, and weak probabilistic classifiers, which dramatically increase the discrimination power of the face classifier. Experimental results on the CBCL benchmark and the MIT + CMU real image dataset show that the proposed algorithm can detect faces in the wavelet compressed domain accurately and efficiently.

1. Introduction

Automatically detecting specific objects from images has been a popular research topic for intelligent image analysis and understanding, with many applications including face recognition, face tracking, expression cloning, face pose estimation, and 3D head model reconstruction from images. These applications usually assume the face regions are detected correctly as the first step. Researchers in computer vision and image processing have proposed many different approaches to this problem.

Most previous face detection methods focused on detecting faces from a single gray-scale image. The survey paper [1] by Yang et al. classified the face detection methods into four categories, namely, knowledge-based methods, feature-based methods, template matching methods, and appearance-based methods. The appearance-based approach has evolved into the major stream in face detection research. Since it is very hard to describe a general face in an image by some explicit characterization or feature description, the appearance-based approach learns a face classifier from a large number of face and nonface examples. The training stage in this approach is to learn a two-class classifier from training examples. After collecting a large number of training face images, most researchers focus on finding a suitable feature representation and a powerful classifier for face detection.

In recent years, with the popularity of digital cameras and camcorders, the demand for real-time face detection is increasing. Detecting faces directly in a compressed domain, instead of the original image, is an interesting approach that can save time in the decompression process and reduce the complexity of hardware and software design, especially since most digital images in the world are stored in a compressed form. However, not much previous research work has focused on detecting faces in a compressed domain [2, 3]. Detecting faces directly from a compressed domain can skip parts of the decompression and feature extraction process. In this paper, we propose a novel joint feature representation based on the wavelet coefficients and improve the AdaBoost-based learning for fast and accurate face detection. In the detection process, the proposed system accesses the corresponding wavelet coefficients and executes the cascade classifier efficiently in a sliding-window search fashion.

Some previous face detection works proved that the wavelet representation [4–7] or Haar-like features [8–14] can describe faces well. However, if we apply the famous AdaBoost face detector [8, 11] with the features replaced by the wavelet coefficients, the resulting accuracy turns out to be unsatisfactory for the following two reasons. First, the feature discrimination is limited by the restricted wavelet compressed feature space. Second, it is difficult to implement image contrast normalization directly in the wavelet compressed domain. Thus, this work is motivated by achieving a discriminative face detector in the restricted wavelet feature space. We propose a paired feature representation and an improved learning framework to achieve a robust classifier with high accuracy in the wavelet compressed domain.

Figure 1 shows the flow diagram of detecting faces directly in the compressed image, such as JPEG2000 [15, 16]. Note that some decompression and feature extraction processes, that is, the blocks in the green zone in Figure 1, are skipped in the proposed method.

Increasing the discrimination in the limited wavelet compressed feature space is the main goal of our learning mechanism. We propose a space warping technique to increase the representation capability of each feature via reweighted learning of the sample distributions. In addition, the joint feature representation projects learning samples onto a paired-feature plane to improve the discrimination power. We propose an improved AdaBoost system based on learning with this joint feature representation.

In order to avoid information loss in the feature learning procedure, some modified components are developed to preserve more information. For example, the ID3-like quantization is applied after the joint feature space representation. Compared with traditional quantization methods, the proposed ID3-like quantization considers the positive and negative sample distributions in the 2D paired feature space to achieve the best discrimination by separating samples into different bins according to their labels. Moreover, instead of a binary classifier, Bayesian weak learners are adopted to compute the ratio between positive and negative samples for each bin as the output of the classifier. With simple prelearned look-up tables, the weak classifier can provide a more detailed classification result than a hard decision.

Finally, the trained face classifier can be applied directly in the wavelet compressed domain with very efficient calculation. The learning framework preserves the essential information in quantization tree and some look-up tables. The execution process of the face detector is simplified to some low-complexity computation, such as accessing corresponding coefficients and querying look-up tables. Although the input features of the proposed algorithm are restricted to the wavelet compressed coefficients without normalization, our experimental results show the accuracy of the proposed face detector is comparable to some state-of-the-art methods which detect faces in the uncompressed image domain.

The rest of this paper is organized as follows. Section 2 reviews the previous related works and the AdaBoost learning algorithm [8, 11], since our learning framework is based on this algorithm. Section 3 gives the details of the proposed learning framework for our face detector, including feature space warping, joint feature representation, ID3-like quantization, and weak probabilistic classifier. Subsequently, we describe the execution flow of the proposed face detector applied in the wavelet compressed domain in Section 4. In Section 5, we show several experimental results on two popular benchmarking databases. Finally, we conclude this paper in Section 6.

2. Related Works

Automatic object detection is an important issue in computer vision and image processing. In addition to face detection, researchers have proposed algorithms for car detection [5], pedestrian detection [6, 17–20], and even generic object detection [21]. In this section, we will first briefly review some previous face detection techniques. Then, we will describe the AdaBoost learning algorithm [22, 23].

2.1. Previous Face Detection Techniques

Our focus in this paper is on frontal face detection from still gray-scale images. The survey paper [1] by Yang et al. reviewed some face detection methods from the early period, such as manually established facial rules and predefined symmetric attributes. These methods are intuitive but lack robustness, because natural scenes include too much variation and simple heuristic rules or models cannot cover all possible variations well. The appearance-based methods became the mainstream of face detection research. This approach normally consists of collecting a large number of training samples, projecting these samples onto an appropriate feature space, and applying machine learning techniques to form a classifier from the distribution of samples in the feature space. Most early appearance-based methods used the pixel brightness values from the sliding windows in an image as the features. Then, some well-developed learning algorithms, such as support vector machines [24], neural networks [25], eigenspace analysis [26], and the Sparse Network of Winnows [27], were applied to develop face detectors.

In order to tolerate more scale and pose variations, Fleuret and Geman [28] proposed a coarse-to-fine face detector based on an edge feature descriptor. Schneiderman and Kanade [5] employed the histogram of wavelet features for face and car detection with out-of-plane rotation. In the meantime, Papageorgiou and Poggio [6] developed their multipurpose object detection system by using wavelet features with a support vector machine classifier. In addition, Heisele et al. [29] partitioned a face region into several local patches and applied support vector machines to develop a component-based face detector. In 2001, Viola and Jones [8] presented the first real-time face detection system based on AdaBoost learning in conjunction with block sum difference features easily computed with an integral image. The efficient computation and acceptable accuracy of this system brought face detection into real applications. More details of the AdaBoost learning will be discussed in the next subsection. Later, Liu [30] applied the Bayesian Discriminating Features (BDF) technique to develop an accurate face detection system with a very low false detection rate.

Although detecting faces in gray-scale images is the most general approach, some researchers also employed the color information to simplify the face detection problem. By using the color information, the face detection system can extract more discriminative information and increase the speed and accuracy dramatically. Traditional works [4, 31] collected a large number of pixels of skin regions to determine a skin color distribution and filter. Hsu et al. [32] proposed a face detector that contains a lighting compensation step and eye/mouth color maps. Huang and Lai [7] developed a color face classifier by learning face appearance in the color feature space. Tsalakanidou et al. [33] used extra 3D range data acquired by a 3D sensor to improve the performance of color face detection under illumination and expression variations.

Detecting faces in video [7, 34, 35] has been another interesting face detection approach in recent years. Combining video tracking techniques and the face appearance models can extend the face detection from still images to video sequences. These methods proved that they can recover missed face detections and eliminate false positives by temporally integrating the face detection results.

Furthermore, multipose face detection techniques have been researched in recent years to extend the previous frontal face detection methods. Schneiderman and Kanade [5] first applied the statistical histogram representation for detecting faces in profile as well as frontal views. Then, a convolutional neural network architecture [36] and an AdaBoost method with a pose estimator [10] were proposed to extend the upright and frontal face classifier to detect faces with large pose variations, with rotation up to ±30 degrees in the image plane (RIP) and up to ±60 degrees out of plane (ROP). More recently, Huang et al. [14] proposed the width-first-search structure and vector boosting algorithm to accomplish face detection with arbitrary RIP angles and ROP angles up to ±90 degrees.

In addition, some more related works were developed for different problem settings and different applications. For example, detecting small faces from degraded images [13] focused on low-resolution faces. There were previous methods proposed to detect faces in the DCT compressed domain [2, 3], which is somewhat related to the problem setting of this paper. The major difference between the previous works and the proposed method is that our method can be applied directly in the wavelet compressed domain without wavelet decomposition or intensity normalization, and it can still achieve high accuracy comparable to the state-of-the-art face detection methods.

In addition to face detection, face identification and recognition is another challenging problem which has been widely discussed in the computer vision research field. After the face region is detected precisely from the input image, the face recognition system analyzes the frontal facial image patch and determines or verifies the identity of the person. Zhao et al. [37] extensively reviewed early machine recognition systems and surveyed several psychological studies which focused on human faces. These works can be roughly categorized into two types: face recognition from a single still image and face recognition from video sequences. Wright et al. [38] proposed a new classification framework based on sparse representation techniques and provided new insights into two crucial issues: feature extraction and robustness to occlusion. To solve face identification under uncontrolled environments or with a lack of training samples, Schwartz et al. [39] employed a large and rich set of feature descriptors and used a partial least squares regression model to increase the discriminant ability of the recognizer across varying conditions.

2.2. AdaBoost Learning

AdaBoost, short for adaptive boosting, is a meta-algorithm that can cooperate with other machine learning techniques to improve their performance. The AdaBoost algorithm was originally proposed by Freund and Schapire [22], and this original algorithm is listed in Algorithm 1.

Input: sequence of $N$ labeled examples $\langle (x_1, y_1), \ldots, (x_N, y_N)\rangle$
   Distribution $D$ over the $N$ examples
   Weak learning algorithm WeakLearn
   Integer $T$ specifying the number of iterations
Initialize the weight vector $w_i^1 = D(i)$ for $i = 1, \ldots, N$
Do for $t = 1, 2, \ldots, T$
  (1) Set $p^t = w^t / \sum_{i=1}^{N} w_i^t$
  (2) Call WeakLearn, providing it with the distribution $p^t$; get back a hypothesis $h_t: X \rightarrow [0, 1]$
  (3) Calculate the error of $h_t$: $\epsilon_t = \sum_{i=1}^{N} p_i^t\,\lvert h_t(x_i) - y_i\rvert$
  (4) Set $\beta_t = \epsilon_t / (1 - \epsilon_t)$
  (5) Set the new weight vector to be $w_i^{t+1} = w_i^t\,\beta_t^{\,1 - \lvert h_t(x_i) - y_i\rvert}$
Output the hypothesis
  $h_f(x) = 1$ if $\sum_{t=1}^{T} \bigl(\log \tfrac{1}{\beta_t}\bigr)\,h_t(x) \ge \tfrac{1}{2} \sum_{t=1}^{T} \log \tfrac{1}{\beta_t}$, and $0$ otherwise

As shown in Algorithm 1, after a series of labeled examples is fed into the learning machine, we need to initialize the weight of each learning sample. In most two-class classification problems without prior knowledge of the training examples, the weight summations of all positive and negative data are set equal, and each learning sample belonging to the same category has the same weight. Another issue is the adequate quantity of training samples, which is very difficult to determine for a practical machine learning problem. The bootstrap learning architecture provides a solution to this problem.

In the AdaBoost algorithm depicted in Algorithm 1, WeakLearn is a function or an algorithm that produces a hypothesis to classify the input samples into different categories by considering the current sample weights. The word “weak” means the hypothesis is not expected to be very powerful, since it only uses very simple features and calculations in the weak classifier. In most applications, the WeakLearn function is designed in a simple way, such as a binary function of a feature value. The basic idea is that the WeakLearn classification functions are very easy to calculate and at least slightly better than a random guess. Thus, the AdaBoost learning algorithm is applied to select a set of discriminating and complementary weak classifiers to form a final strong classifier.

The input integer $T$ specifies the number of iterations in the learning system. One obvious advantage of AdaBoost is that it does not need any tuning parameters except $T$. The selection of $T$ depends on the application. Selecting a larger $T$ will decrease the error measure on the training data, but it may lead to the overfitting problem. The value $\beta_t$ is decided in each iteration, and the sample weights are updated from the error measure. The basic idea of AdaBoost is to assign more weight to the samples misclassified in the previous iterations to achieve a global optimization process.
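
To make the weight update and classifier combination concrete, the following Python sketch implements the loop of Algorithm 1 under the assumption that each weak learner is a callable returning predictions in [0, 1] for NumPy arrays; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def adaboost_train(X, y, weak_learners, T):
    """Minimal AdaBoost sketch in the Freund-Schapire form.
    X: (n, d) feature matrix, y: (n,) labels in {0, 1},
    weak_learners: callables h(X, dist) -> predictions in [0, 1]."""
    n = len(y)
    w = np.full(n, 1.0 / n)              # initial weight vector (uniform D)
    alphas, chosen = [], []
    for t in range(T):
        p = w / w.sum()                  # step (1): normalized distribution
        # steps (2)-(3): evaluate every candidate and keep the lowest weighted error
        preds = [h(X, p) for h in weak_learners]
        errors = [np.sum(p * np.abs(out - y)) for out in preds]
        best = int(np.argmin(errors))
        eps = errors[best]
        beta = eps / (1.0 - eps)         # step (4)
        # step (5): down-weight samples the chosen hypothesis already classifies well
        w = w * beta ** (1.0 - np.abs(preds[best] - y))
        alphas.append(np.log(1.0 / beta))
        chosen.append(weak_learners[best])
    return alphas, chosen                # combine with a weighted vote at test time
```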

AdaBoost has been very popular in the computer vision and image processing research fields since the first real-time face detection method was proposed by Viola and Jones [8]. For face detection, there were some improved versions, such as Kullback-Leibler Boosting [40], FloatBoost [9], and the asymmetric AdaBoost algorithm [41], to increase the accuracy and efficiency of learning. Some modifications of the learning framework extended Viola and Jones’ method to different applications, such as detecting faces in video [7], detecting faces in degraded images [13], and detecting pedestrians via motion and appearance patterns [17]. Similar AdaBoost techniques were also applied in different feature spaces to solve other two-class learning problems. Image retrieval with relevance feedback [42, 43] was also an important application of AdaBoost.

At the end of this section, we list some improved versions of the AdaBoost algorithm in Table 1. Although most applications apply the AdaBoost algorithm to solve two-class classification problems, there were some extensions of the AdaBoost algorithm to regression [22] and multiclass classification [23] problems.

3. Proposed Learning Method

The proposed learning system is an improved version of Viola and Jones’ face detector [8, 11], adapted to our requirement, that is, detecting faces directly in the wavelet compressed domain. The fundamental structure is similar to most appearance-based learning methods. An initial training dataset, including 4916 face image blocks and 7872 nonface (negative) image blocks, is prepared for the learning of the face classifier. When the trained AdaBoost classifier can separate the samples in the training dataset well, the current classifier is applied to a large image database to accumulate false positive blocks as the negative samples in the next training dataset. In the bootstrap learning system, the growing set of negative learning samples is extracted from 100 different categories of the Corel PhotoStock database, 10,000 natural images in total.

For the proposed face detector, a three-level wavelet transform is applied to each training image to obtain 576 wavelet coefficients. The LL band of the highest level is skipped, leaving 567 features, because this part cannot be recovered when we execute the face detection directly in the compressed domain without any decompression process.

3.1. Learning System Overview

The goal of the improved AdaBoost learning algorithm is to learn an efficient face detector from a restricted wavelet feature space. Without wavelet decomposition and intensity-based normalization processes, the feature discrimination power for face detection is weak. The improved AdaBoost learning system contains four major improvements based on two principles: higher discrimination and more information preservation. Algorithm 2 gives an overview of the proposed system. More details of each step will be described in the following parts of this section.

(i) Given example images $(x_1, y_1), \ldots, (x_n, y_n)$, where $y_i$ takes the value 0 for negative examples or 1
   for positive examples, respectively.
(ii) Initialize weights $w_{1,i} = \frac{1}{2m}$ for $y_i = 0$ or $\frac{1}{2l}$ for $y_i = 1$, where $m$ and $l$ denote
   the total numbers of negative and positive images, respectively.
(iii) For $t = 1, \ldots, T$
 (1) Estimate the feature space warping function $W_t(\cdot)$ via the sample distribution $[x_1, \ldots, x_n]$
   and weights $w_{t,i}$
 (2) For each possible feature pair $(f_a, f_b)$, map all training samples onto the paired plane via the
    warped feature values $W_t(f_a(x_i))$ and $W_t(f_b(x_i))$
 (3) Apply the ID3-like tree method to each axis of the paired feature plane in rotation, and find
    the quantization function $q_{a,b}(\cdot)$ which will try to separate positive and negative
    samples into different bins.
 (4) Compute the conditional probability $P(y = 1 \mid q_{a,b}(x))$ as the Bayesian classification result for each
    weak classifier $h_{a,b}(x)$.
 (5) Estimate the error for each feature pair $(f_a, f_b)$ as follows:
             $\epsilon_{a,b} = \sum_i w_{t,i}\,\lvert h_{a,b}(x_i) - y_i\rvert$
 (6) Select the paired feature with minimum error
              $h_t = \arg\min_{h_{a,b}} \epsilon_{a,b}$, with error $\epsilon_t = \min_{a,b} \epsilon_{a,b}$
 (7) Update the weights for all training samples as follows:
          $w_{t+1,i} = w_{t,i}\,\beta_t^{\,1 - \lvert h_t(x_i) - y_i\rvert}$,
     where $\beta_t = \epsilon_t / (1 - \epsilon_t)$.
 (8) Normalize the weights by
            $w_{t+1,i} \leftarrow w_{t+1,i} / \sum_{j} w_{t+1,j}$
(iv) The final classifier is given by
          $H(x) = 1$ if $\sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t$, and 0 otherwise,
where $\alpha_t = \log \frac{1}{\beta_t}$.

3.2. Feature Space Warping

The function of feature space warping is similar to histogram equalization for image enhancement. The basic idea is that we should use more quantization levels in areas with dense data distribution and fewer levels in areas with sparse data. In the actual implementation, the distribution of the training data samples should be reweighted by the current weights. We need a nonlinear transformation for each feature to increase the feature representation capability.

A discrete cumulative distribution function is estimated to find some landmark points, such as the feature value located at 50% of the weighted distribution. After we have these landmarks, a simple space warping, which linearly interpolates samples between these points, is applied. Figure 2(a) shows the landmarks of the original weighted data distribution, and Figure 2(b) shows the weighted distribution after space warping. After the space warping process, the distance measure of a single feature between two different samples is driven by the current sample weights and distributions.
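
A minimal sketch of this warping step is given below; the number of landmarks and the choice of an evenly spaced target grid are assumptions for illustration, since the paper only specifies that landmarks are taken from the weighted cumulative distribution.

```python
import numpy as np

def warp_feature(values, weights, num_landmarks=3):
    """Sketch of feature space warping: place landmarks at evenly spaced
    quantiles of the weighted distribution (e.g. 25%, 50%, 75%) and map
    them onto an even grid with piecewise-linear interpolation, so dense
    regions of the distribution get stretched and sparse ones compressed."""
    order = np.argsort(values)
    v_sorted = values[order]
    cdf = np.cumsum(weights[order])
    cdf = cdf / cdf[-1]                               # weighted CDF in (0, 1]
    probs = np.linspace(0.0, 1.0, num_landmarks + 2)  # include both endpoints
    landmarks = np.interp(probs, cdf, v_sorted)       # feature values at the quantiles
    targets = np.linspace(v_sorted[0], v_sorted[-1], num_landmarks + 2)
    return np.interp(values, landmarks, targets)      # warped feature values
```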

3.3. Joint Feature Representation

Schneiderman and Kanade [5] first adopted the idea of using the joint distribution of a pair of features to represent objects. Mita et al. [12] simply extended the AdaBoost detector proposed by Viola and Jones and combined three binary weak classifiers into a three-digit code for learning. In the proposed method, we map all learning samples onto paired wavelet feature spaces and estimate the corresponding 2D distributions before selecting the weak classifiers. Because of the feature space warping step, the data distribution appears approximately uniform when only a single feature is considered. After the joint feature representation, mapping data to a higher dimension is a strategy to increase the feature variety instantly. By enumerating pairwise combinations, it provides a higher possibility of exploring more discrimination.

The original feature dimension for a three-level wavelet transform of a 24-by-24 image block is 576 (567 after excluding the top-level LL band). After the cross-level and cross-band combinations of pairs of wavelet coefficients, the feature dimension increases dramatically to 160,461 without extracting any additional information. Instead of the paired feature representation, a cubic or even higher dimensional mapping is another, more aggressive possibility. However, the efficiency of the AdaBoost learning system should be considered, especially since the computational cost of AdaBoost learning on the paired feature spaces is already very high.
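
As a quick sanity check of the pair count, the snippet below enumerates the candidate coefficient pairs; the 567 figure assumes the nine top-level LL coefficients of the 24-by-24 block are excluded, as described earlier in this section.

```python
from itertools import combinations

num_coeffs = 24 * 24 - 3 * 3   # 567 coefficients: 3-level DWT of a 24x24 block, LL3 excluded
num_pairs = sum(1 for _ in combinations(range(num_coeffs), 2))
print(num_pairs)               # 160461 candidate paired features
```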

The major design principle for the feature space warping and the joint feature representation is to explore high discrimination from the limited set of wavelet coefficients. After the joint feature representation is created, the following steps are designed for preserving more information in the feature quantization before going into the AdaBoost learning procedure.

3.4. ID3-Like Plane Quantization

After the joint feature representation, we can estimate the positive and negative data distributions for each feature pair from the training data. To develop weak classifiers for all possible feature pairs for AdaBoost learning, we quantize the paired feature plane for computing the conditional probabilities of the joint features for the positive and negative cases. Our strategy is to segment the paired feature plane into several representative regions and use the ratio of the conditional probabilities in each quantized region for classification. In other words, we want to quantize the continuous paired feature plane such that each quantized segment has its own dominant sample label. With this strategy, the system can separate the positive and negative samples into different segments as well as possible to achieve high discrimination capability.

To achieve the above goal, we employ the ID3-like plane quantization on the paired feature space. This quantization for each feature pair is determined based on the distribution of the training data under the current weight function. Compared to other traditional quantization methods, our ID3-like approach can retain more information with a little more computational effort. The main idea of the ID3 decision tree [44] is to select the best boundary in each node such that it divides the data passing through this node into two branches with the highest information gain. It means that the boundary is selected such that each branch contains as much data of the same class as possible. In other words, we want to find appropriate boundaries to divide the data into intervals of maximal uniformity.

In the ID3 decision tree, we first define the entropy and information gain as follows:
$$E(S) = -\sum_{c \in \{0, 1\}} P_w(c \mid S)\,\log_2 P_w(c \mid S), \qquad P_w(c \mid S) = \frac{\sum_{i \in S,\, y_i = c} w_t(i)}{\sum_{i \in S} w_t(i)},$$
$$\mathrm{Gain}(S, q) = E(S) - \sum_{n \in \{\mathrm{left}, \mathrm{right}\}} \frac{\sum_{i \in q^{-1}(n)} w_t(i)}{\sum_{i \in S} w_t(i)}\,E\bigl(q^{-1}(n)\bigr),$$
where $w_t(i)$ is the weighting function for each training sample $i$ at the $t$-th iteration and the symbol $q$ denotes a function that classifies samples from a parent node $S$ into a different leaf node $n$. Because our ID3-like balanced tree quantization is a binary tree, the function $q$ can be represented by a list of thresholds that divide a feature value into the left or right leaf node recursively. Then, we can select the best seed (threshold) value that maximizes the information gain as follows:
$$\theta^{*} = \arg\max_{\theta} \mathrm{Gain}(S, q_{\theta}).$$
First, entering all learning samples into the root node, we find the best boundary along one of the two axes to separate the space into two parts. In these two regions, we then run the seed selection process along the other axis independently. The ID3-like plane quantization repeats the above process recursively and alternately along the two axes to determine a quantization function $q(\cdot)$. If the processing time is a critical issue, histogram equalization can provide the initial seeds to speed up the computation with similar performance. In a practical setting, a four-layer decision tree is constructed, as in Figure 3(b), to quantize the paired feature plane into 16 different regions, as shown in Figure 3(a).
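
The sketch below illustrates this recursive, axis-alternating split on the paired feature plane; evaluating the gain at every distinct warped value and the simple tuple-based tree are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def weighted_entropy(y, w):
    """Weighted two-class entropy of a set of labeled samples."""
    total = w.sum()
    if total == 0:
        return 0.0
    p1 = w[y == 1].sum() / total
    return -sum(p * np.log2(p) for p in (p1, 1.0 - p1) if p > 0)

def id3_plane_quantize(F, y, w, depth=4, axis=0):
    """ID3-like plane quantization sketch: recursively split the 2D
    paired-feature plane, alternating axes, at the threshold with the
    highest weighted information gain.  Returns a nested tree of
    (axis, threshold, left_subtree, right_subtree); depth 4 gives up to
    16 leaf regions.  F: (n, 2) warped paired features, y: labels, w: weights."""
    if depth == 0 or len(y) < 2:
        return None                                   # leaf region
    parent = weighted_entropy(y, w)
    best_gain, best_thr = 0.0, None
    for thr in np.unique(F[:, axis])[:-1]:            # candidate boundaries
        left = F[:, axis] <= thr
        gain = parent \
            - (w[left].sum() / w.sum()) * weighted_entropy(y[left], w[left]) \
            - (w[~left].sum() / w.sum()) * weighted_entropy(y[~left], w[~left])
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    if best_thr is None:
        return None
    left = F[:, axis] <= best_thr
    return (axis, best_thr,
            id3_plane_quantize(F[left], y[left], w[left], depth - 1, 1 - axis),
            id3_plane_quantize(F[~left], y[~left], w[~left], depth - 1, 1 - axis))
```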

3.5. Weak Probabilistic Classifier

For each pair of features, we can train a weak classifier based on the corresponding joint conditional probability determined from the training data. The AdaBoost training algorithm is then used to select some powerful and complementary weak classifiers and combine them to form a final classifier for face detection. For each weak classifier, we apply the Bayes rule based on the conditional probability density function in each interval to decide the class. By thresholding the ratio of face to nonface conditional probabilities for a given paired feature vector in a quantized interval, we obtain a number of weak classifiers that can be used in the AdaBoost training algorithm.

Making a binary decision in the weak classifier does not fully exploit the information computed in the conditional probabilities. Therefore, we replace the binary decision in the weak classifier by the conditional probability in the AdaBoost learning algorithm.

By applying the Bayes rule, we can compute the conditional probability as follows:
$$h_t(x) = P\bigl(y = 1 \mid q_t(x) = b\bigr) = \frac{P\bigl(q_t(x) = b \mid y = 1\bigr)\,P(y = 1)}{\sum_{c \in \{0, 1\}} P\bigl(q_t(x) = b \mid y = c\bigr)\,P(y = c)}, \quad (3)$$
$$P\bigl(q_t(x) = b \mid y = 1\bigr) = \frac{\sum_{i:\, y_i = 1,\, q_t(x_i) = b} w_t(i)}{\sum_{i:\, y_i = 1} w_t(i)}, \quad (4)$$
where $P(y = 1 \mid q_t(x) = b)$ means the probability of the $t$-th paired feature for image $x$ to belong to class 1, that is, the face class, and $b$ denotes the leaf node index of the ID3 tree determined from the plane quantization. Equation (4) measures the conditional distribution of the quantized bin under the condition that the label is 1, with the sample weights $w_t$ updated after $t$ iterations of the AdaBoost learning algorithm. Equation (3) returns a probability value between 0 and 1, which indicates the face conditional probability. We use $h_t(x)$ as the Bayesian weak classifier in the AdaBoost system.

In our implementation, we segment the paired feature plane into 16 regions and a look-up table containing 16 Bayesian probabilities is used for the AdaBoost learning and testing processes.
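
A sketch of how such a 16-entry look-up table could be built from the weighted training samples is shown below; equal class priors and the small-epsilon guards are assumptions for illustration.

```python
import numpy as np

def build_bayes_lut(bin_ids, y, w, num_bins=16):
    """Weak probabilistic classifier sketch: for each quantized region,
    store the face conditional probability P(face | bin) computed from
    the weighted samples, in the spirit of Eqs. (3) and (4)."""
    eps = 1e-12
    lut = np.zeros(num_bins)
    pos_total = w[y == 1].sum() + eps
    neg_total = w[y == 0].sum() + eps
    for b in range(num_bins):
        in_bin = bin_ids == b
        p_pos = w[in_bin & (y == 1)].sum() / pos_total   # P(bin | face)
        p_neg = w[in_bin & (y == 0)].sum() / neg_total   # P(bin | nonface)
        lut[b] = p_pos / (p_pos + p_neg + eps)           # P(face | bin), equal priors
    return lut

# At detection time, the weak classifier output is a single table access:
# h_t = lut[quantize(warped_feature_a, warped_feature_b)]
```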

4. Detecting Faces on Wavelet Compressed Domain

In this section, we describe how the trained AdaBoost face classifier is applied for face detection from the wavelet representation of the whole image. The previous AdaBoost face detector is known for its simple computation. Although we add several complicated components into our learning system to increase the overall discrimination for face detection from a restricted feature space, the resulting AdaBoost classifier only needs a very small amount of computation with some look-up tables and quantization tree structures. Algorithm 3 depicts the complete algorithm of the proposed face detection in the wavelet compressed domain. Sections 4.1 and 4.2 give more details of our implementation.

(i) Given a test image represented in the $L$-layered wavelet compressed domain
(ii) Each layered coefficient plane $l$ is composed of three sub-bands, $HL_l$, $LH_l$, and $HH_l$
(iii) Preprocessing
  (1) Apply the bilinear interpolation to down-sample each sub-band to 1/1.25, 1/1.5,
    and 1/1.75 scales, respectively, and form three additional wavelet layer sets.
(iv) For each of these four sets of the wavelet-layer representation, run the sliding window face
   detection with the scale $s$ initialized to 1
  (1) Apply the AdaBoost face classifier to each sliding window which is constructed from
    the coefficients in the planes from level $s$ to level $s + 2$.
  (2) If the classifier determines the region is a face, calculate and save the position and
    size of the corresponding window in the original image space based on the shift,
    downsample, and layer information.
  (3) Repeat the previous two steps with the scale $s$ incremented by one until $s + 2 > L$.
(v) Postprocessing
  (1) Eliminate the overlapped face regions based on the scores provided by the AdaBoost classifier.
  (2) Output the detected faces.

4.1. Face Detection in a Single Sliding Window

First, we describe how the trained AdaBoost classifier is applied in a sliding-window search strategy for face detection from a wavelet compressed image. When focusing on a single block of a whole image to determine whether it is a face region, we can access the corresponding wavelet coefficients from the HL, HH, and LH bands of three contiguous levels. Since the LL bands for all levels are not available without a complete decompression process, we do not include them in our feature space for learning.

In each iteration $t$, the selected hypothesis from our system refers to a weak classifier associated with a feature pair $(f_a, f_b)$. The first step is to retrieve the coefficient values of features $f_a$ and $f_b$. These two feature values are quantized through the corresponding ID3-like plane quantization, as described above, with only four comparison operations. Then, a simple look-up table is used to access the corresponding weak classifier output value for the quantized feature region, which can be done very efficiently. Finally, the accumulated sum of the products of $\alpha_t$, provided by the AdaBoost training, and the weak Bayesian classifier probability output is used to determine whether this window is a face region or not.
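
The per-window evaluation therefore reduces to coefficient look-ups, a few comparisons, and table accesses, roughly as in the sketch below; the tuple layout of a trained weak classifier and the helper names are illustrative assumptions.

```python
def classify_window(get_coeff, weak_classifiers, threshold):
    """Sketch of evaluating the trained strong classifier on one sliding
    window.  get_coeff(feature_id) reads a wavelet coefficient for this
    window; each weak classifier is a prelearned tuple
    (feat_a, feat_b, quantize, lut, alpha)."""
    score = 0.0
    for feat_a, feat_b, quantize, lut, alpha in weak_classifiers:
        va, vb = get_coeff(feat_a), get_coeff(feat_b)
        bin_idx = quantize(va, vb)       # four threshold comparisons (ID3-like tree)
        score += alpha * lut[bin_idx]    # weak Bayesian probability from the look-up table
    return score >= threshold, score
```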

4.2. Face Detection in a Whole Image

When detecting faces in a whole image, the position and scale of the sliding windows should cover all the possible image blocks where faces could appear. The minimal detection window is as large as the training samples, 24 × 24 pixels. The shift of the sliding window in the wavelet compressed domain can be easily implemented by adding relative shift terms when seeking the corresponding feature values. A shift of one coefficient at a given level corresponds to a shift of two coefficients at the next lower level. Figure 4 is a simple chart that describes the corresponding spatial relationship of the window shifting. The gray solid rectangle is shifted by one coefficient in level 3 from the red rectangle. The relative shifts in level 2 and level 1 are 2 and 4, respectively. Mapped back to the original image domain, the gray detection window is shifted 8 pixels in the horizontal direction from the red one. Scanning windows only at such fully corresponding positions is not dense enough for face detection. Therefore, shifting the window by the coefficients at the lowest level and rounding the corresponding shifts at the higher levels achieves higher accuracy.
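
The shift correspondence across levels follows a simple power-of-two rule, as in the sketch below (the helper name is illustrative).

```python
def window_shifts(shift_at_top, num_levels=3):
    """Shift correspondence sketch (cf. Figure 4): a shift of one
    coefficient at the highest wavelet level doubles at every lower
    level and maps to 2**num_levels pixels in the original image."""
    shifts = {level: shift_at_top * 2 ** (num_levels - level)
              for level in range(num_levels, 0, -1)}
    pixel_shift = shift_at_top * 2 ** num_levels
    return shifts, pixel_shift

# window_shifts(1) -> ({3: 1, 2: 2, 1: 4}, 8)
```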

In addition to the shift, the various scales of the sliding windows in the compressed domain are not easy to implement without wavelet decomposition. The multilevel structure of the wavelet decomposition provides a basis for finding the corresponding features at different scales, but it is restricted to detection windows whose sizes are power-of-2 multiples of the template face window. For example, if we can detect faces of size 48-by-48 from wavelet levels 2, 3, and 4, then the coefficients related to 96-by-96 faces are positioned in levels 3 to 5. In order to detect faces of sizes between these two scales, we apply the bilinear interpolation to the wavelet coefficient planes. Downsampling these coefficient planes to 1/1.25, 1/1.5, and 1/1.75 of the original width and height creates three additional starting scale bases. Thus, between sizes 24 and 48, we can have window widths of 30, 36, and 42 with the same framework. The downsampled plane with the 1/1.25 ratio provides the coefficients for 30-by-30 detection windows, and its higher wavelet levels cover the sizes 60-by-60, 120-by-120, and so forth, in the original image scale. An additional postprocessing step is required to merge the detected face regions that overlap with each other. The positive windows with higher scores from the strong classifier are kept as the final decision.
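
A rough sketch of the resulting scale coverage is given below; the stopping rule based on the number of available wavelet levels is an assumption for illustration.

```python
def detectable_face_sizes(num_levels, base=24, factors=(1.0, 1.25, 1.5, 1.75)):
    """Scale coverage sketch: each downsampled coefficient set yields a
    starting window of base*factor pixels, and moving the three-level
    window up one wavelet level doubles that size in the original image."""
    sizes = set()
    for factor in factors:
        level = 0
        while level + 3 <= num_levels:        # a window spans three contiguous levels
            sizes.add(round(base * factor * 2 ** level))
            level += 1
    return sorted(sizes)

# detectable_face_sizes(5) -> [24, 30, 36, 42, 48, 60, 72, 84, 96, 120, 144, 168]
```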

5. Experimental Results

In this section, we show four sets of experiments to verify the improvement of the proposed AdaBoost learning system and to demonstrate the performance and efficiency of the proposed face detector directly on the wavelet compressed domain. We first adopted the CBCL face database, which contains separate training and testing image datasets, for evaluating the AdaBoost learning results. In total, 24,045 clipped test images are examined after one-time learning from 6,977 training image blocks. Then, the MIT + CMU face database is used for testing face detection from whole images with bootstrap learning. Our experimental results show that the proposed learning system significantly improves the accuracy of face detection on the restricted wavelet compressed domain.

5.1. Learning System Improvement Benchmarks

To justify the improved AdaBoost-based learning system, we need a benchmark to evaluate the performance of the proposed method. MIT CBCL face dataset provides a fair benchmarking database to compare the performance of face classifiers. The training dataset contains 6,977 image blocks (2,429 face blocks and 4,548 nonface blocks) and the testing data set is composed of 24,045 image blocks (472 face blocks and 23,573 nonface blocks).

All the experiments in this part have three different learning system settings: AdaBoost learning with Viola and Jones’ feature space [11], AdaBoost learning in conjunction with the 567-dimensional wavelet feature space, and the proposed learning system on the paired wavelet feature space.

The first benchmarking experiment is performed for each of the three learning systems with only 10 weak classifiers. The results are depicted in Figure 5(a). It is obvious that changing the feature space from the original feature space in Viola and Jones’ method to the restricted wavelet feature space degrades the face detection accuracy dramatically. However, with our improved AdaBoost learning, the detection rates of the proposed face detector are better than those of the other two systems under the same false positive rates. One may argue that the comparison is unfair because the proposed paired feature learning strategy uses two features for each iteration, that is, for each weak classifier. Therefore, we performed another experiment, shown in Figure 5(b), which restricted each of the three final classifiers to include only 20 wavelet coefficients. The result is more in line with our expectation. When the false alarm rate is equal to 0.2, the detection rate of Viola and Jones’ method is 0.79. The same learning method with wavelet features only achieves a 0.65 detection rate, while the proposed system with paired wavelet features can improve the face detection rate to 0.76. In addition, the proposed method obtains better results in this benchmark comparison when the false alarm rate increases to 0.35.

In order to examine the limit of the learning systems, we increase the number of accessed features to 200. In other words, the standard AdaBoost classifiers contain 200 iterations and the proposed paired feature learning system consists of 100 weak classifiers. As shown in Figure 5(c), the result is similar to that of Figure 5(b), and the proposed system outperforms Viola and Jones’ method when the false alarm rate is larger than 0.185.

5.2. ROC Curve of MIT + CMU Dataset

The second experiment is mainly used to examine the performance of the proposed learning system after bootstrap learning. Our system is designed for face detection in wavelet compressed domain, so the input to our face detection system is the wavelet representation of the whole testing image.

The MIT + CMU face dataset is widely used in face detection research. There are 130 gray-scale images containing 507 faces in this dataset. Figure 6 depicts three ROC curves, that is, detection rates with respect to different numbers of false positives, obtained by applying the three different face detection algorithms to the entire dataset. The experimental results show that the proposed method improves the detection rate from 0.68 to 0.89 under 100 false positives, which is close to the 0.92 detection rate obtained in the raw image domain.

The curve of Viola and Jones’ method was published in their paper [11] and is adopted here for comparison. Another ROC curve is obtained by applying the same AdaBoost learning algorithm with the wavelet features. It is obvious from Figure 6 that the proposed learning system with the paired wavelet features significantly improves upon the ROC curve obtained with the restricted wavelet features, and it is close to the ROC curve of Viola and Jones’ face detector, which is based on a large number of features.

5.3. More Comparisons on MIT + CMU Dataset

In this experiment, we provide more results and comparisons between the proposed face detection system and other systems. The experimental setting is similar to that in the previous two experiments. The detection rates under different numbers of false detection of the proposed face detector and some previous methods on the MIT + CMU face dataset are depicted in Table 2.

In this comparison, we can see that the accuracy of the proposed method is about 1%~5% lower than that of Viola and Jones’ method under different numbers of false positives. We think this is a reasonable accuracy decrease, since the proposed face detection algorithm is restricted to the limited wavelet feature space and it saves the decompression cost. Figure 7 depicts the face detection results of the proposed algorithm on test images of the MIT + CMU dataset. The experimental results show that our system can detect face regions correctly not only in natural photos but also in paintings and sketches. Moreover, Figure 8 displays some false positives and missed detections of the proposed method on the same database. We observe that the proposed system misses several faces with partial occlusions or larger rotation angles, as shown in Figure 8. In addition, balancing the miss rate and the false positive rate is critical for designing a binary decision system. The false positives shown in Figure 8 can be eliminated with a tighter detection threshold, but the detection rate then decreases to 87.3%.

5.4. Execution Time Analysis

Table 3 depicts the execution time of the components in the proposed face detection system, which operates directly in the wavelet domain, and the original AdaBoost face detector [8, 11], which detects faces after wavelet decomposition. This experiment is performed on a regular PC with an Intel E6320 CPU (dual 1.87 GHz cores) and 2 GB RAM. The face detector was applied from 24 × 24 blocks up to the full image size with a 1.25 scale increment. In the spatial domain, the sliding window is scanned over the image for face detection with a step of 1/8 of the window width in both horizontal and vertical directions for speed considerations.

The proposed wavelet-domain face detector skips the IDWT procedure and the feature extraction process to achieve more efficient detection. Our experiments show that the processing time, excluding the tier 1 and tier 2 decoding time, on a test image is only 19.5 ms. Considering the total execution time of detecting faces from a compressed image, our face detector requires only 57% of the computation time of the original AdaBoost face detector [8, 11].

6. Conclusion

In this paper, we proposed a face detection system working directly in the wavelet compressed domain. The main contributions of the proposed face detection algorithm are an improved AdaBoost-based learning framework and a series of joint feature representation strategies, which can produce a strong face classifier on the restricted wavelet feature space.

The proposed face detection system involves a feature space warping process, a paired feature learning scheme, an ID3-like joint feature plane quantization method, and a weak Bayesian classifier. Although some complicated components which increase the power of classifiers are included in the face detection system, the execution of the final face classifier is quite simple. With tree structure quantization and look-up tables, the proposed face detector can work very efficiently and directly on the wavelet compressed domain. Our experimental results on the benchmarking face datasets showed that the proposed face detection system working in the compressed domain can achieve similar accuracy to that of Viola and Jones’ face detector [8, 11].

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the National Science Council of the Republic of China, Taiwan, for partially financially supporting this research under Contract nos. NSC 102-2221-E-007-082 and NSC 102-2622-E-007-019-CC3. This work was also supported by the Advanced Manufacturing and Service Management Research Center (AMSMRC), National Tsing Hua University.