Abstract

Pedestrian detection methods typically achieve high detection rates only at the cost of a large number of false positives. To overcome this problem, the authors propose an accurate pedestrian detection system based on two machine learning methods: a cascade AdaBoost detector and a random vector functional-link net. During the offline training phase, the parameters of the cascade AdaBoost detector and the random vector functional-link net are trained on a standard dataset. During the online phase, candidates are extracted by a multiscale sliding window, normalized to a standard scale, and verified by the cascade AdaBoost detector and the random vector functional-link net. Only candidates with high confidence pass the validation. Simulation experiments on four datasets confirm that the proposed system is more accurate than single machine learning algorithms and produces fewer false positives.

1. Introduction

Pedestrian detection has drawn the attention of many researchers because of its wide range of applications, such as driver assistance systems [1–3], intelligent video surveillance systems [4, 5], and victim rescue in emergencies [6]. Numerous pedestrian detection algorithms, based on different techniques and strategies, have been proposed during the past decades [7–10].

Pedestrians have properties of both rigid and flexible objects. Furthermore, their appearance is easily affected by view angle, occlusion, apparel, scale, pose variation, and illumination changes. These issues have made pedestrian detection both a popular topic and one of the difficult problems in computer vision. Current mainstream methods for pedestrian detection adopt machine learning algorithms to identify pedestrians among candidates extracted by multiscale sliding windows. However, the high detection rates of these algorithms are typically obtained at the cost of a large number of false positives: high detection rates and low false positive rates are by no means guaranteed simultaneously.

The two-stage classifier [11], proposed by Guo et al., can further reduce false positive rates, and it performs better than single-stage algorithms. However, its detection rate cannot be increased further and remains at roughly the level of the single-stage algorithms. In this paper, a novel two-stage detection system is proposed based on a cascade AdaBoost detector [9] and a random vector functional-link net [12, 13]. The two algorithms process the normalized candidates extracted by multiscale sliding windows simultaneously, which preserves the detection efficiency of the proposed system. The outputs of the cascade AdaBoost detector and the random vector functional-link net are fused to form the final criterion for deciding whether each candidate is a pedestrian. The cascade AdaBoost detector and the random vector functional-link net are two efficient and widely used machine learning algorithms. Both have been applied in many research fields, such as multimedia processing, natural language processing, biological information processing, and network security.

The proposed system achieves high detection rates while keeping false positive rates low, which results from the joint contribution of the cascade AdaBoost detector and the random vector functional-link net. The high performance of the proposed system has been demonstrated on four datasets of different types in our simulation experiments. The remainder of the paper is organized as follows. Section 2 introduces the structure of the proposed system, and Section 3 presents the experimental comparison of the proposed system with other state-of-the-art detectors. Finally, Section 4 summarizes the characteristics of the proposed system and discusses its advantages over other detectors.

2. Proposed Pedestrian Detection System

Because a single detector seldom achieves both a high detection rate and few false positives in complex scenarios, the proposed pedestrian detection system combines two machine learning algorithms. The cascade AdaBoost (CAB) detector [9] and the random vector functional-link (RVFL) net [12, 13] are employed together to improve the detection results.

2.1. System Architecture

The flow chart of the proposed pedestrian detection system is shown in Figure 1. The proposed system consists of an off-line training phase and an on-line detection phase. During the off-line training phase, the CAB detector and the RVFL net are trained separately on the given training dataset. All training samples have the same size, called the standard size, which is specified in Section 3. The CAB detector is trained as a classifier, while the RVFL net is trained as a regressor. For the CAB classifier, positive samples are labeled 1 and negative samples are labeled 0. For the RVFL regressor, the confidence score is restricted to a bounded range.

During the on-line detection phase, subimages are generated by multiscale sliding windows and resized to the standard size to serve as testing candidates. The CAB detector then decides whether each candidate is a pedestrian, and the RVFL net estimates a confidence score for each candidate. The two results are combined into a final matching score, and only candidates whose matching score exceeds a given threshold are regarded as pedestrians. The details of the proposed system are as follows.
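
The candidate generation step described above can be sketched as follows. This is only an illustrative sketch: the function name `sliding_windows`, the scale set, and the stride fraction are assumptions made for the example and are not taken from the paper.

```python
from typing import Iterator, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height) in image pixels


def sliding_windows(img_w: int, img_h: int, std_w: int, std_h: int,
                    scales=(1.0, 1.5, 2.0), stride_frac: float = 0.25
                    ) -> Iterator[Window]:
    """Yield candidate windows at several scales.

    Every window keeps the aspect ratio of the standard size, so it can be
    resized to (std_w, std_h) before being passed to the classifiers.
    The scales and stride are hypothetical example values.
    """
    for s in scales:
        w, h = int(round(std_w * s)), int(round(std_h * s))
        if w > img_w or h > img_h:
            continue  # this scale no longer fits inside the image
        step_x = max(1, int(w * stride_frac))
        step_y = max(1, int(h * stride_frac))
        for y in range(0, img_h - h + 1, step_y):
            for x in range(0, img_w - w + 1, step_x):
                yield (x, y, w, h)


# Toy example: a 64x128 image scanned with a 32x64 standard window.
windows = list(sliding_windows(64, 128, 32, 64,
                               scales=(1.0, 2.0), stride_frac=0.5))
```

Each yielded window would then be cropped and resized to the standard size before feature extraction.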

2.2. Feature Extraction

Feature extraction is a type of dimensionality reduction that efficiently represents the region of interest (ROI) of an image in object detection and pattern recognition algorithms. The features are extracted as a compact feature vector for subsequent processing. Effective image feature extraction is therefore important, because it affects the final object detection accuracy. Common feature extraction techniques include the RGB histogram, local binary patterns (LBP) [14], histogram of oriented gradients (HOG) [7], Haar-like features, first-order image statistics (the mean, standard deviation, skewness, and kurtosis of pixel intensities), second-order image statistics (the mean and range of contrast, correlation, energy, and homogeneity) [15], and Hu’s invariant moments [16].

Previous work has used Haar-like and LBP features for detecting faces, as they have desirable properties for representing fine-scale textures. HOG features, which capture the overall shape of an object, have been used for detecting objects such as people and cars. In this paper, HOG features are adopted to enhance the pedestrian detection performance of the proposed system. In our experiment, the parameters of the HOG feature extraction applied to the CAB detector and the RVFL net are the same. The normalized candidates are divided into blocks of pixels, each block is subdivided into cells, and gradients vote linearly into 9 orientation bins. Therefore, the HOG features for the CAB detector and the RVFL net can be extracted in one step.
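
To illustrate linear gradient voting into 9 orientation bins, a minimal sketch for a single cell is given below. The use of centered differences, unsigned orientations (0 to 180 degrees), and the helper name `cell_histogram` are assumptions for this example; the block and cell sizes of the actual system are not specified here.

```python
import math
from typing import List


def cell_histogram(cell: List[List[float]], n_bins: int = 9) -> List[float]:
    """Orientation histogram of one cell with linear voting between bins.

    Gradients use centered differences on interior pixels. Orientation is
    treated as unsigned (0 to 180 degrees, an assumption), and each pixel's
    gradient magnitude is split linearly between the two nearest bin centers.
    """
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    rows, cols = len(cell), len(cell[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]
            gy = cell[r + 1][c] - cell[r - 1][c]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            pos = angle / bin_width - 0.5      # position relative to bin centers
            lower = math.floor(pos)
            frac = pos - lower
            # Wrap around, because 0 and 180 degrees are the same orientation.
            hist[lower % n_bins] += mag * (1.0 - frac)
            hist[(lower + 1) % n_bins] += mag * frac
    return hist


# Toy cells: intensity increasing left-to-right and top-to-bottom.
horizontal_ramp = [[float(c) for c in range(4)] for _ in range(4)]
vertical_ramp = [[float(r) for _ in range(4)] for r in range(4)]
hist_h = cell_histogram(horizontal_ramp)
hist_v = cell_histogram(vertical_ramp)
```

For the horizontal ramp, the gradient direction of 0 degrees lies on the boundary between the first and last bins, so its magnitude is shared equally between them; for the vertical ramp, 90 degrees coincides with the center of the middle bin.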

2.3. Cascade AdaBoost Detector (CAB)

The cascade AdaBoost algorithm [9] is adopted to detect object categories whose aspect ratio does not vary significantly. The algorithm consists of a series of classifiers, where each classifier is an AdaBoost learner whose parameters are adjusted by a boosting algorithm. The flow chart of the cascade AdaBoost algorithm is illustrated in Figure 2. In standard form, the cascade AdaBoost algorithm can be expressed as

$$F(x) = \bigwedge_{i=1}^{N} \left[ f_i(x) = 1 \right],$$

where $x$ is the sample input, $N$ is the number of stages, and $f_i$ is the strong classifier of stage $i$, which can be represented as

$$f_i(x) = \operatorname{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right),$$

where $T$ is the number of weak classifiers of each stage, $h_t$ is the $t$th weak classifier, and $\alpha_t$ is the corresponding ensemble weight of $h_t$. Suppose the total number of positive samples is $N_{\mathrm{pos}}$ and the minimum true positive rate is $r_{\mathrm{tp}}$; then the number of positive samples to use at each stage is calculated by

$$N_{\mathrm{stage}} = \left\lfloor \frac{N_{\mathrm{pos}}}{1 + (N - 1)(1 - r_{\mathrm{tp}})} \right\rfloor,$$

where $\lfloor \cdot \rfloor$ is the floor function. The number of negative samples for each stage is always set to $2 N_{\mathrm{stage}}$, twice the number of positive samples.
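
As a worked example of the stage-wise sample counts, the sketch below assumes that the number of positives per stage follows the common convention floor(total positives / (1 + (stages - 1) * (1 - minimum true positive rate))). This form is an assumption for illustration, as is the function name `samples_per_stage`. The inputs (500 positive samples, 12 stages, minimum true positive rate 0.995) are the values reported in Section 3.

```python
import math
from typing import Tuple


def samples_per_stage(total_pos: int, num_stages: int, min_tpr: float,
                      neg_factor: int = 2) -> Tuple[int, int]:
    """Return (positives per stage, negatives per stage).

    The positive count uses an assumed stage-wise formula; the negative
    count is neg_factor times the positive count (twice, per the text).
    """
    pos = math.floor(total_pos / (1 + (num_stages - 1) * (1 - min_tpr)))
    return pos, neg_factor * pos


pos_per_stage, neg_per_stage = samples_per_stage(500, 12, 0.995)
```

Under this assumption, not every positive sample is used at every stage, which leaves a reserve of positives for later stages after some are rejected by earlier ones.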

During the training process, a certain number of positive samples and negative images are required. First, the feature type and the number of stages are set, and the other function parameters, including the minimum true positive rate and the maximum false alarm rate, are initialized. Then the parameters of each stage are estimated with a subset of the positive and negative samples.

True positives are usually rare and are worth the time taken to verify them through the cascade stages. Furthermore, sufficient negative samples should be provided so that training proceeds smoothly; typical negative samples contain background information from the images to be detected. During the parameter estimation of each stage, the AdaBoost learner is trained by adding features until the minimum true positive rate and the maximum false alarm rate are met. The number of stages is chosen to give suitable final false positive and detection rates.
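
The link between the per-stage targets and the final rates can be made concrete with a short calculation. The sketch assumes that every stage exactly meets its targets on the samples that reach it, so the overall rates are bounded by products of the per-stage rates; the function name `cascade_bounds` is hypothetical. The inputs (12 stages, 0.995, 0.5) are the settings reported in Section 3.

```python
from typing import Tuple


def cascade_bounds(num_stages: int, min_tpr: float, max_far: float
                   ) -> Tuple[float, float]:
    """Bounds for a cascade whose stages all meet their targets.

    Returns (lower bound on overall true positive rate,
             upper bound on overall false alarm rate).
    """
    return min_tpr ** num_stages, max_far ** num_stages


overall_tpr, overall_far = cascade_bounds(12, 0.995, 0.5)
# overall_tpr is roughly 0.94; overall_far is 0.5**12, roughly 2.4e-4.
```

Adding stages therefore lowers the false alarm bound geometrically while reducing the detection bound only slightly, which is why the stage number is a useful tuning parameter.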

During the detection phase, as shown in Figure 2, all subwindows of the image are extracted by a multiscale sliding window. The structure of the cascade reflects the fact that the vast majority of these subwindows are negative. Accordingly, each stage of the cascade AdaBoost detector rejects as many nonpedestrian windows as possible and passes potential targets to the next stage. Finally, only the few subwindows accepted by all stages of the detector are regarded as objects.
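
The early-rejection behavior described above can be sketched as follows. The decision stumps, weights, and names (`stump`, `stage_accepts`, `cascade_accepts`) are hypothetical and chosen only to make the example runnable; a stage here accepts a sample when its weighted vote is nonnegative, which is an assumed decision rule.

```python
from typing import Callable, List, Sequence, Tuple

WeakClassifier = Callable[[Sequence[float]], int]   # returns +1 or -1
Stage = List[Tuple[float, WeakClassifier]]          # (ensemble weight, weak classifier)


def stump(index: int, threshold: float) -> WeakClassifier:
    """Hypothetical weak classifier: threshold on one feature."""
    return lambda x: 1 if x[index] > threshold else -1


def stage_accepts(stage: Stage, x: Sequence[float]) -> bool:
    """Weighted vote of the weak classifiers of one stage."""
    return sum(alpha * h(x) for alpha, h in stage) >= 0.0


def cascade_accepts(stages: List[Stage], x: Sequence[float]) -> Tuple[bool, int]:
    """Return (accepted, number of stages evaluated).

    A window is rejected as soon as one stage rejects it, so most negative
    windows cost only a few stage evaluations.
    """
    for i, stage in enumerate(stages, start=1):
        if not stage_accepts(stage, x):
            return False, i
    return True, len(stages)


stages = [
    [(1.0, stump(0, 0.2))],
    [(0.6, stump(0, 0.5)), (0.8, stump(1, 0.5))],
]
early_reject = cascade_accepts(stages, (0.1, 0.9))
late_reject = cascade_accepts(stages, (0.3, 0.1))
accepted = cascade_accepts(stages, (0.9, 0.9))
```

The second element of each result shows how many stages were evaluated before the decision was reached.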

2.4. Random Vector Functional-Link (RVFL) Net

The random vector functional-link net [12, 13] is a special case of the single hidden layer feed-forward neural network. The hidden layer contains two different types of nodes: input patterns and enhancement patterns. Input patterns are simple linear combinations of sample inputs. The additional enhancements can be represented as $g(a_j^{\top} x + b_j)$, where $a_j$ is the weight vector applied to the input, $b_j$ is the threshold parameter for the $j$th node, $x$ is the sample input, and $g$ is the activation function. Therefore, the RVFL net can be interpreted as a mapping from $n$-dimensional space to $(n + J)$-dimensional space, where $n$ is the dimensionality of the training sample inputs and $J$ is the number of additional enhancements. The output of the RVFL net can be represented as

$$o(x) = \sum_{j=1}^{n+J} \beta_j d_j(x) = B^{\top} d(x),$$

where $d(x)$ is the enhanced pattern vector that collects the $n$ input patterns and the $J$ enhancements, and $B = (\beta_1, \ldots, \beta_{n+J})^{\top}$ is the output weight vector.

For the random vector functional-link net, $a_j$ and $b_j$ are randomly generated according to an appropriate given distribution (e.g., a Gaussian distribution). Therefore, only the output weight vector $B$ needs to be learned, which greatly reduces the time cost of the training phase. The optimal weight vector is obtained by minimizing the system error

$$E = \frac{1}{2P} \sum_{p=1}^{P} \left( t_p - B^{\top} d_p \right)^2,$$

where $P$ is the number of training samples, $d_p$ is the enhanced pattern vector, the subscript $p$ is the sample index, and $t_p$ is the target value of the $p$th training sample.

The unique minimum of the system error can be found iteratively, for example by the conjugate gradient approach [17, 18]. If matrix inversion by means of a pseudoinverse is feasible, the optimal weight vector is obtained in a single step, without any iteration. In our experiments, the optimal weight vector was computed in this single step by means of the pseudoinverse.
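
A minimal sketch of single-step RVFL training is given below. It is not the authors' implementation: the class name `RVFL`, the Gaussian initialization parameters, and the toy data are assumptions. In place of an explicit pseudoinverse, the sketch solves the least-squares normal equations with a very small ridge term, a swapped-in technique chosen so that the example needs only the standard library and stays numerically stable.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def solve(A: List[Vector], b: Vector) -> Vector:
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w


class RVFL:
    def __init__(self, n_inputs: int, n_enhance: int, seed: int = 0):
        rng = random.Random(seed)
        # Random, fixed hidden parameters (Gaussian; never trained).
        self.a = [[rng.gauss(0, 1) for _ in range(n_inputs)]
                  for _ in range(n_enhance)]
        self.b = [rng.gauss(0, 1) for _ in range(n_enhance)]
        self.beta: Vector = []

    def enhanced(self, x: Sequence[float]) -> Vector:
        """Enhanced pattern: the inputs followed by the enhancement nodes."""
        enh = [sigmoid(sum(w * xi for w, xi in zip(a_j, x)) + b_j)
               for a_j, b_j in zip(self.a, self.b)]
        return list(x) + enh

    def fit(self, X: List[Vector], t: Vector, ridge: float = 1e-8) -> None:
        """One-step least squares: (D^T D + ridge I) beta = D^T t."""
        D = [self.enhanced(x) for x in X]
        m = len(D[0])
        A = [[sum(d[i] * d[j] for d in D) + (ridge if i == j else 0.0)
              for j in range(m)] for i in range(m)]
        rhs = [sum(d[i] * tp for d, tp in zip(D, t)) for i in range(m)]
        self.beta = solve(A, rhs)

    def predict(self, x: Sequence[float]) -> float:
        return sum(b * d for b, d in zip(self.beta, self.enhanced(x)))


# Toy regression: the target is linear in the inputs, so it can be fitted closely.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(20)]
t = [2.0 * x0 - x1 for x0, x1 in X]
net = RVFL(n_inputs=2, n_enhance=3)
net.fit(X, t)
max_err = max(abs(net.predict(x) - tp) for x, tp in zip(X, t))
```

Because only the output weights are solved for, training reduces to one linear solve, which is the source of the short training times discussed in Section 3.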

2.5. Matching Score Fusion

The proposed system deploys the CAB detector and the RVFL net together to obtain a more accurate detection rate. To obtain the final matching score for any subwindow, the proposed system fuses their two results: the classification result from the CAB detector, which is 0 or 1, and the confidence score from the RVFL net, which is a continuous value within a bounded range. Subimages with high matching scores are accepted as objects. The function of matching score fusion is defined as

With a proper activation function, the enhancement patterns of the RVFL net are more powerful than the input patterns, because the output of the enhancement patterns is nonlinearly related to the sample inputs. In our experiment, only the enhancement patterns of the RVFL net are employed, and the activation function is set to be a sigmoid function. For this case, the final matching score in (6) can be simplified to be

3. Experiments

In this section, we compare the proposed two-stage detection system with four state-of-the-art detectors. To validate the proposed system, we have tested it on four publicly available sequences: PET’09 S3.MF (Multiple Flow) and PET’09 S0.CC (City Center) from the PET benchmark [19], the “USC pedestrian set A” sequence from the USC dataset [20], and the INRIA Person dataset [21]. The first two datasets consist of consecutive frames captured by one fixed camera, while the images of the latter two datasets are taken from different scenarios. For the City Center sequence, only the first 100 frames are selected for testing because the sequence is quite long, while all images of the other three datasets are used. For the INRIA dataset, some of the images are resized so that the pedestrians have a size similar to those in the training dataset, because pedestrian sizes in this dataset vary greatly.

The training data are the same for all four testing datasets. They are selected from the NICTA pedestrian dataset [22], whose images share a single fixed size (the standard size). However, to improve the performance of the proposed system, 600 nonpedestrian images from the Daimler dataset [23] are added to the negative training set of the cascade AdaBoost algorithm. The number of positive training samples for the cascade AdaBoost algorithm is 500, and the number of negative samples is twice that of the positives. For the RVFL net, the numbers of positive and negative samples are the same and are set to 3000.

Figure 3 shows the training and testing accuracies and times of the RVFL net as the number of hidden nodes increases from 10 to 200 in steps of 10. All these results are estimated by k-fold cross-validation [24]. During the cross-validation process, the whole dataset is randomly partitioned into 10 equal-sized subsets; each subset is used in turn as the validation data for testing the model, while the remaining 9 subsets are used as training data. Finally, the number of hidden nodes is set to 180, which gives stable training accuracy and high generalization capability. With 180 hidden nodes, the testing time is just 0.056 s, and the training time is 1.18 s. Therefore, the efficiency of the RVFL net is sufficient for practical applications.
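
The 10-fold partitioning described above can be sketched as follows. The function name `k_fold_indices`, the fixed random seed, and the toy dataset size of 100 are assumptions made for the example.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0
                   ) -> List[Tuple[List[int], List[int]]]:
    """Randomly partition sample indices into k folds.

    Returns one (train indices, validation indices) pair per fold; each
    fold serves as validation data exactly once.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        val = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, val))
    return splits


splits = k_fold_indices(100, k=10)
```

Averaging the accuracy over the 10 validation folds gives the cross-validated estimate for one choice of the number of hidden nodes; repeating this for each candidate value yields curves like those in Figure 3.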

The minimum true positive rate and the maximum false alarm rate of the CAB detector in our experiment are set to 0.995 and 0.5, respectively. Figure 4 shows the curves of detection rate against false positives per frame for the proposed system under the parameter pair (added confidence and stage number of the CAB detector). The pedestrian detection rate (PDR) and the false positives per frame (FPPF) are defined as

$$\mathrm{PDR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPPF} = \frac{FP}{N_f},$$

where $TP$ is the number of pedestrian samples correctly predicted to be pedestrians; $FP$ is the number of nonpedestrian samples incorrectly predicted to be pedestrians; $FN$ is the number of pedestrian samples incorrectly predicted to be nonpedestrians; and $N_f$ is the total number of frames of the corresponding dataset sequence. We have tested the performance of added confidence with different values . The curve of is very close to , which means that the performance of the proposed system is beginning to stabilize when is growing larger than . The curve comparison of is demonstrated in Figure 4. From all these 12 subfigures, the performance of is superior to . Moreover, when the stage number is 12, the results are better than , on the whole. Finally, the parameter pair is set to be (0.6, 12). Note that the single RVFL net can be regarded as a special case of the proposed system when . Therefore, the proposed system has better performance than the single RVFL net.
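
The two evaluation measures can be computed as in the sketch below, assuming the usual definitions implied by the quantities described above: PDR is the fraction of true pedestrians that are detected, and FPPF is the number of false detections divided by the number of frames. The function name `detection_metrics` and the counts in the example are hypothetical and are not results from the paper.

```python
from typing import Tuple


def detection_metrics(tp: int, fp: int, fn: int, n_frames: int
                      ) -> Tuple[float, float]:
    """Return (pedestrian detection rate, false positives per frame)."""
    pdr = tp / (tp + fn)
    fppf = fp / n_frames
    return pdr, fppf


# Hypothetical counts over a 100-frame sequence.
pdr, fppf = detection_metrics(tp=190, fp=56, fn=10, n_frames=100)
```

Sweeping the decision threshold on the matching score and recomputing both values at each setting traces out one PDR/FPPF curve.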

The comparison results of the proposed system and four other state-of-the-art detectors (CAB [9], SVM [7], GAB [25], and HF algorithm [8]) are shown in Table 1. To demonstrate the advantages of the proposed system, two sets of detection rates, each with its corresponding average number of false positives per frame, are shown for the proposed system in the first two columns of Table 1. The second column shows the higher detection rates, obtained at the cost of more false positives; even so, its number of false positives per frame is still lower than that of the SVM detector in most cases. For the “Multiple Flow” dataset, the low PDR of the proposed system is 94.55%, which is higher than those of the other four detectors, and the corresponding FPPF, 0.56, is the lowest among all these detectors. For the “City Center” dataset, the low PDR and corresponding FPPF of the proposed system are 90.49% and 0.46, which are better than those of the GAB and HF algorithms. The high PDR reaches 95.29%, which is more accurate than CAB and SVM, at the cost of a few more false positives. For the USC(A) dataset, the low PDR and corresponding FPPF are better than those of the GAB and HF algorithms, and the high PDR and corresponding FPPF are better than those of the SVM detector. The CAB algorithm has a better FPPF, but its detection rate is worse than the high PDR of the proposed system. For the INRIA dataset, the low PDR and corresponding FPPF of the proposed system are better than those of the CAB, GAB, and HF algorithms, and the high PDR and corresponding FPPF are better than those of the SVM detector.

Some of the experimental results of the proposed system are depicted in Figure 5. In these examples, the overwhelming majority of pedestrians are detected, with very few false positives, after validation by the proposed system.

4. Conclusion

In this paper, we presented a novel two-stage pedestrian detection system based on a cascade AdaBoost detector and a random vector functional-link net. The proposed system enhances the detection accuracy and reduces the false positive rate at the same time, which improves the overall performance of pedestrian detection. Extensive experimental comparisons with other state-of-the-art algorithms on four challenging datasets of different types demonstrate that the proposed system achieves favorable results in terms of both detection rate and false positive rate.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2013R1A1A2013778).