Abstract

We propose a new computational intelligence method for object detection that combines wavelet optical flow with a hybrid linear-nonlinear classifier. Existing optical flow methods have difficulty accurately estimating the motion of objects that move at diverse speeds. We propose a wavelet-based optical flow method, which uses wavelet decomposition in optical flow motion estimation. The algorithm can accurately detect moving objects with variable speeds in a scene. In addition, we use the hybrid linear-nonlinear classifier (HLNLC) to separate moving objects from static background. HLNLC transforms a nonoptimal scalar variable into its likelihood ratio and uses the resulting scalar quantity as the decision variable. This approach is appropriate for the classification of optical flow feature vectors with unequal covariance matrices. The experimental results confirm that our proposed object detection method has improved accuracy and computational efficiency over other state-of-the-art methods.

1. Introduction

In modern engineering, research and design demands are increasingly met with the help of intelligent models. Computational intelligence (CI) has emerged as a powerful tool for information processing, decision making, and knowledge management [1]. CI is a set of nature-inspired computational methodologies and approaches to address complex real-world problems for which traditional approaches are ineffective [2, 3]. In this paper, we propose a new computational intelligence method using wavelet optical flow and a hybrid linear-nonlinear classifier (HLNLC) for object detection. Object detection can be subcategorized as either the detection of objects that share similar characteristics [4, 5] or the detection of a specific object in a video sequence [6, 7]. Our paper focuses on the second category.

One important task in object detection is motion estimation. Optical flow is a commonly used approach to estimate object motion. Starting with the original algorithms by Lucas and Kanade (LK) [8] and by Horn and Schunck (HS) [9], gradient-based methods have led to other improved optical flow estimation methods. However, when the image background is cluttered or the detected object is moving at high speed, the accuracy of gradient-based methods decreases significantly [10]. Another important task in object detection is classification. Classifier techniques such as the pulse-coupled neural network (PCNN) [11], fuzzy neural network (FNN) [12], Gaussian SVM (GSVM) [13], and linear discriminant analysis (LDA) [14] have been applied to different object detection situations. In PCNN and FNN schemes, detection accuracy may be coupled with small training errors because each image pixel is associated with a unique neuron and vice versa. In the SVM approach, the classifier is excessively complex for the binary task of separating moving objects from static background. The LDA classifier is a more robust linear classifier that has proven itself as the ideal observer for input feature vectors with equal covariance matrices [15], but it is not the optimal choice for data that contain multivariate optical flow vectors with unequal covariance matrices.

Our work considers both factors. We propose a new object detection method using wavelet-based optical flow and a hybrid linear-nonlinear classifier. Several wavelet-based optical flow estimation approaches have already been proposed. Wu et al. [16] used wavelets to model and reconstruct flow vectors; the estimation is repeated at each iteration, which reduces efficiency. In [17], Bernard assumed that the optical flow was locally constant. In [18], Srinivasan and Chellappa proposed a similar method which modeled the optical flow field using a set of overlapping basis functions. The approaches in [17, 18] simplify the optical flow computation at the cost of accuracy, especially when several objects with different speeds exist in one scene. In this paper, we use wavelet calculus to compute derivatives of the functions in terms of the scaling expansion coefficients. Our proposed method can accelerate the optical flow computation and accurately estimate the motions of objects moving at different speeds in the same scene.

For the binary classification task of distinguishing moving objects from static background, a linear classifier is a more robust choice than existing methods [19]. Linear discriminant analysis (LDA) is the most popular linear classifier. LDA produces accurate classifications when the input feature vectors are normally distributed with equal covariance matrices. However, it has trouble with vector data that have unequal covariance matrices. Thus, we use the novel hybrid linear-nonlinear classifier (HLNLC), which was proposed by Chen et al. in 2010 [20]. It has been shown to be more robust than LDA and other existing classifiers.

This paper is organized as follows. Section 2 describes the wavelet-based optical flow motion estimation method. Section 3 introduces the hybrid linear-nonlinear classifier. Section 4 describes the rectangle window scan algorithm. Section 5 presents the experimental results. Our conclusions are presented in Section 6.

2. Wavelet-Based Optical Flow

2.1. Gradient-Based Optical Flow

Gradient-based optical flow algorithms are based on the assumption of constant brightness [8, 9], where it is assumed that the brightness value of a pixel will not vary due to displacement [21]. It can be described as

$$\nabla I \cdot \mathbf{v} + I_t = I_x u + I_y v + I_t = 0. \quad (1)$$

Here, $I(x, y, t)$ represents the brightness value of pixel $(x, y)$ at time $t$. $I_x$, $I_y$, and $I_t$ are the partial derivatives of $I$ with respect to $x$, $y$, and $t$. The variable $\mathbf{v} = (u, v)^T$ is the velocity vector of the optical flow estimation about $I$, and $\nabla I$ is the spatial gradient of $I$. The method further assumes that the motion vectors are constant within small windows and that the image sequence will not change significantly during a short period of time. This assumption can be expressed as

Based on (1) and (2), the flow algorithm produces two simultaneous equations for the velocity vector and : , , , , and are the product of the partial derivatives , in a time range . The detailed representations may be written as

In gradient-based optical flow algorithms, object displacement between successive frames determines the accuracy of optical flow estimation, because the assumption that there are no major changes between successive frames breaks down when the displacement between frames is significant. In order to improve the accuracy of optical flow estimation, displacements between video frames should be projected and recalculated at different frame rates, which means that the algorithm should be able to adaptively adjust the velocity components for different objects moving at different speeds.
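As a rough illustration of the gradient-based estimate discussed in this subsection, the following sketch solves the two simultaneous equations for the velocity components over a single window by least squares. The function name `lucas_kanade_window`, the use of `np.gradient` for the spatial derivatives, and the plain frame difference for the temporal derivative are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np


def lucas_kanade_window(frame0, frame1):
    """Estimate one (u, v) flow vector for a whole window.

    Hypothetical helper: it assumes the flow is constant over the window
    and minimizes sum((Ix*u + Iy*v + It)**2) over all pixels.
    """
    i0 = np.asarray(frame0, dtype=float)
    i1 = np.asarray(frame1, dtype=float)

    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (cols, x).
    iy, ix = np.gradient(i0)
    it = i1 - i0  # temporal derivative for a unit frame interval

    # Sums of products of partial derivatives (the 2x2 normal equations).
    sxx = np.sum(ix * ix)
    sxy = np.sum(ix * iy)
    syy = np.sum(iy * iy)
    sxt = np.sum(ix * it)
    syt = np.sum(iy * it)

    a = np.array([[sxx, sxy], [sxy, syy]])
    b = -np.array([sxt, syt])

    # lstsq tolerates a (near-)singular matrix, e.g. a textureless window.
    u, v = np.linalg.lstsq(a, b, rcond=None)[0]
    return float(u), float(v)
```

Because the estimate rests on a first-order expansion of the brightness function, it is only reliable for small displacements, which is the limitation that motivates the variable frame rate discussed above.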

2.2. Wavelet-Based Optical Flow Estimation

We apply the wavelet transform to optical flow estimation. The wavelet transform is an important tool for signal processing, and its magnitude does not oscillate around singularities because the transform magnitude is locally nearly shift invariant [22]. To apply the wavelet decomposition to optical flow estimation, we transform the optical flow equation into the following expression [23]:

Suppose the image size is pixels. Then the optical flow vector can be expressed as

The variables , are the weighted coefficients of the optical flow. Once and can be determined, the optical flow estimation will be accomplished [24]. Thus, we transform the optical flow estimation into a calculation that includes node variables, where and can minimize the object function (5).

, , and denote the products of spatial derivatives: , , and are products of spatial and temporal partial derivatives, where the time variable is taken from the variable frame rate:

The shortest frame interval is , and the interval of multiplications is set to , so that the product of and in an interval is given by where .

The partial derivatives of can be obtained via two difference equations:

In the time interval , we compute the optical flow partial derivatives using (7). In our algorithm, and are replaced by the approximate products ~ and ~ in order to improve computation accuracy. The specific formulas are shown in the following expressions:

In this work, continuous displacements between image frames are considered to be very small. Specifically, and will be considered as constants in the time interval ~. With this assumption, we can replace and with ~ and ~. The amplitude of optical flow is estimated by considering the previous frame . The adaptive parameter is determined by the optical flow estimation from the previous frame. It can be used to estimate the pseudovariable frame rate . When computing the product in frame interval , is adjusted automatically according to the speed of detected object. For objects moving at high speed, is small, while, for slow-moving objects, it is larger. Furthermore, may take on different values at any pixel location because may exhibit spatial variability in the optical flow estimation algorithm. By using a wavelet transform, we can rewrite (5):

Figure 1 depicts the application of the wavelet transform to optical flow estimation.

The optimal coefficient of can be determined by the following expression: where , .

The sparse representation is , and the matrix obtained by the wavelet transform is . Elements of matrix are written as

The combination of wavelet transform and optical flow defines moving objects via a sparse linear representation in the defined structure. The wavelet algorithm has collected all the information from the optical flow estimation and stored it in the matrix . Once the computation is complete, an optimal and sparse will have been determined. Our method transforms the optical flow estimation into the problem of minimizing an energy function. The coefficients determination yields accurate optical flow vectors.
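To make the idea of a wavelet decomposition concrete, the sketch below performs one level of a 2D orthonormal Haar transform and its inverse with NumPy. The Haar family, the function names `haar_decompose_2d` and `haar_reconstruct_2d`, and the requirement of even image dimensions are assumptions for illustration; the wavelet basis actually used in this work is not specified here.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_decompose_2d(image):
    """One level of a 2D orthonormal Haar transform (even-sized input).

    Returns the coarse approximation and a tuple of three detail bands.
    """
    a = np.asarray(image, dtype=float)

    # Filter along columns (x direction): pairwise sums and differences.
    lo = (a[:, 0::2] + a[:, 1::2]) / SQRT2
    hi = (a[:, 0::2] - a[:, 1::2]) / SQRT2

    # Filter along rows (y direction).
    ll = (lo[0::2] + lo[1::2]) / SQRT2  # approximation
    lh = (lo[0::2] - lo[1::2]) / SQRT2  # detail along y
    hl = (hi[0::2] + hi[1::2]) / SQRT2  # detail along x
    hh = (hi[0::2] - hi[1::2]) / SQRT2  # diagonal detail
    return ll, (lh, hl, hh)


def haar_reconstruct_2d(ll, details):
    """Invert haar_decompose_2d exactly."""
    lh, hl, hh = details
    rows, cols = ll.shape

    lo = np.empty((2 * rows, cols))
    hi = np.empty((2 * rows, cols))
    lo[0::2], lo[1::2] = (ll + lh) / SQRT2, (ll - lh) / SQRT2
    hi[0::2], hi[1::2] = (hl + hh) / SQRT2, (hl - hh) / SQRT2

    out = np.empty((2 * rows, 2 * cols))
    out[:, 0::2] = (lo + hi) / SQRT2
    out[:, 1::2] = (lo - hi) / SQRT2
    return out
```

In smooth image regions the detail bands are close to zero, which is what makes a sparse coefficient representation plausible; a displacement measured on the half-resolution approximation band is also half as large in pixels as on the original frame.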

3. Classification

3.1. Hybrid Linear-Nonlinear Classifier

The novel hybrid linear-nonlinear classifier (HLNLC) divides the traditional binary classification into two stages: in the first stage, a linear function maps the input feature vector to a scalar variable, and in the second stage, the scalar variable is transformed into a decision variable [25].
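The two stages can be sketched as follows, under the assumption that both classes are multivariate normal: a linear projection reduces each feature vector to a scalar, and the log-likelihood ratio of that scalar under the two projected normal distributions serves as the decision variable. All function names, the fixed projection vector, and the empirical AUC estimate are hypothetical choices for illustration; the optimization of the linear coefficients is not shown.

```python
import numpy as np


def project_gaussian(w, mean, cov):
    """Mean and variance of the scalar y = w^T x when x ~ N(mean, cov)."""
    w = np.asarray(w, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    return float(w @ mean), float(w @ cov @ w)


def log_likelihood_ratio(y, mu_pos, var_pos, mu_neg, var_neg):
    """Stage 2: log of p(y | positive) / p(y | negative) for scalar y."""
    y = np.asarray(y, dtype=float)
    log_pos = -0.5 * np.log(2 * np.pi * var_pos) - (y - mu_pos) ** 2 / (2 * var_pos)
    log_neg = -0.5 * np.log(2 * np.pi * var_neg) - (y - mu_neg) ** 2 / (2 * var_neg)
    return log_pos - log_neg


def empirical_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean(pos > neg) + 0.5 * np.mean(pos == neg))
```

When the two classes have unequal variances along the projection, the log-likelihood ratio is a quadratic function of the scalar, so it can separate classes that a purely linear decision variable cannot, for example two classes with the same mean but different spread.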

In a two-class sorting approach, a particular data set is divided into positive and negative parts. Suppose the feature vector is given by , where represents the joint outcome of some random variables, where there are of such variables. The corresponding distributions are described by a multivariable positive normal distribution (where the mean is and the covariance matrix is ) and by a multivariable negative normal distribution (where the mean is and the covariance matrix is ). The probability density function (PDF) for positive and negative components is and .

In first stage of the HLNLC, the feature vector is mapped into a scalar vector . Because follows the multivariable normal distribution and is a linear combination of , we see that follows a two-variable normal distribution:

These parameters can be represented by the input vector and the other related input parameters. They may be written as

In [26], the classifier is improved by projecting the multivariable classification algorithm into a two-density distribution interval, with the projection method described by

In a second stage of the HLNLC algorithm, the likelihood ratio of is used as a decision variable:

Based on binormal ROC theory, the corresponding can be expressed as where the and are

is the cumulative distribution function (CDF) of the standard normal distribution. Using the parameter , we may express the AUC as a function of the linear coefficient vector :

In order to find the optimal linear function of the HLNLC, which mainly focuses on and AUCHLNLC, the HLNLC algorithm classifies the optimization problem as follows:

The optimization problem is given by , which can be solved with gradient-based mathematical methods. The specific process is shown in the following expressions:

3.2. Classification of Optical Flow Vectors

In this paper, the linear coefficient vector is determined by the parameter for a 2D optical flow vector , which can be understood to be the angle between vector coordinate and vector coordinate . Any 2D optical flow vector can be normalized as

The operating assumption in the HLNLC algorithm is that two types of feature data follow a pair of multivariable normal distributions. However, the optical flow distributions contain some small differences. We can implement a more robust method. The pair of optical flow vector variables uses bivariate normal distributions to normalize the data. Specifically,

Here, represents feature covariance of the optical flow . Using this method, is normalized into the normal distribution. The parameters that concern the HLNLC algorithm can be obtained with (18), in which positive values indicate motion areas and negative values represent background. AUCHLNLC can be calculated by using (23). The related parameter is used to produce an ROC curve and calculate AUCHLNLC.

4. Rectangle Window Scan

In order to detect moving objects, an rectangle window should be determined. We propose a rectangle window scan algorithm. In the scanning process, the classified optical flow vectors attributed to moving objects are the input variables, and a rectangle window shifts pixel locations in each direction per unit time. The operation is repeated until the rectangle window size is less than the preset threshold value. The detection area marked by the rectangle window is the final output. Our proposed method can detect multiple moving objects in one scene, although the iterative process may be time consuming. Figure 2 illustrates the details of the rectangle window scan algorithm.

In the method, the th focused area at the th scan line is denoted by , and the related rectangle window is recorded as . Initially, the scan area is obtained by a normal adaptive modification calculation method. Thereafter, the rectangle window shifts to the right by pixels, where was used in this paper, and is shifted to . There is no need to recalculate the overlap region of an integral scan because the two adjacent regions and share the same overlap region . For the remaining horizontal scan lines, describes the region of concern which is below . The upper and lower boundaries of the nonoverlapping region are defined as and .
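The remark that overlapping regions need not be recalculated can be illustrated with a summed-area table (integral image), which lets the sum over any rectangle be read off in constant time. This is a generic sketch rather than the scan procedure of this paper: the function names, the binary motion mask input, the fixed window size, and the `step` parameter are all assumptions.

```python
import numpy as np


def integral_image(mask):
    """Summed-area table with a leading row and column of zeros."""
    m = np.asarray(mask, dtype=float)
    table = np.zeros((m.shape[0] + 1, m.shape[1] + 1))
    table[1:, 1:] = m.cumsum(axis=0).cumsum(axis=1)
    return table


def window_sum(table, top, left, height, width):
    """Sum of mask[top:top+height, left:left+width] using four lookups."""
    bottom, right = top + height, left + width
    return (table[bottom, right] - table[top, right]
            - table[bottom, left] + table[top, left])


def scan_windows(mask, height, width, step=1):
    """Slide a fixed-size window and return (score, top, left) of the best hit.

    The score is the number of motion pixels inside the window.
    """
    m = np.asarray(mask)
    table = integral_image(m)
    best = None
    for top in range(0, m.shape[0] - height + 1, step):
        for left in range(0, m.shape[1] - width + 1, step):
            score = window_sum(table, top, left, height, width)
            if best is None or score > best[0]:
                best = (float(score), top, left)
    return best
```

Each window evaluation costs four table lookups regardless of the window size, so adjacent windows that share a large overlap region do not repeat any per-pixel work.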

Based on the distribution principle for motion vectors, less than 50% of the motion vectors may be zero vectors. We use the sum of absolute gradient difference (SAGD) to judge motion vector value. SAGD can be calculated in the following expression:

Depending on the current location of rectangle window, some adjacent windows may not exist. We use windows adjacent to block combination to estimate the object motion, using the formula

For the remaining areas, , with , , the moving object region is identified as

5. Experimental Results

In this section, we validate the performance of our proposed algorithm on four different videos. All of the source videos were presented in a consistent format (MPEG-2 standard, 25.68 frames per second (fps)). Video (a) was produced with cameras where high-speed moving cars appear in both close and distant scenes; it has 996 frames of 768 × 576 resolution. Video (b) is a standard video compression sequence known as the coastguard sequence, where a video camera is fixed on a moving boat so that the background appears to be moving; it has 876 frames of 768 × 576 resolution. Video (c) is a spatial satellite video sequence, and the satellite is the detected object; it has 1025 frames of 720 × 480 resolution. Video (d) is a human-motion video in which the motion is a human running; it has 825 frames of 720 × 480 resolution. These four videos are commonly tested sequences, in which the pedestrians, vehicles, and satellites are typical detection targets. Comparison experiments with other state-of-the-art object detection algorithms demonstrate the efficiency of our proposed method. Specific steps are shown in Figure 3.

5.1. Optical Flow Experiments

We use a constant value to determine the time interval between video frames, and the parameter value pixel/ms was found to be suitable. Each frame was processed with subsampling and rescaled under the wavelet estimation. Two important steps are involved: (1) normalization of optical flow vectors and computation of the matrix and (2) computation of the sparse representation of . Our results show that cars and human bodies can be detected in (a) and (d). Different frequency components in unstable regions and jitters have been removed in sequence (c). Compared with the other video sequences, detected objects in sequence (b) have different characteristics: the optical flow is detected for the background area. The experimental results are shown in Figure 4.

Computation time comparison results are shown in Table 1. We compare our proposed method with LK, HS, and occlusion-aware optical flow (OAOF) [27]. LK and HS take almost the same time because they involve similar computation methods. OAOF takes slightly longer. Our proposed wavelet optical flow method has an improved computation time, with a reduction of nearly 5~6 sec over LK and HS and 10~14 sec over OAOF. This demonstrates the efficiency of using the wavelet transform in optical flow estimation.

5.2. Classification Experiments

In the first stage of the HLNLC algorithm, structure optimal parameters are chosen as , and the classifier undergoes 20 iterations. In order to get a dense optical flow field, our method uses a global smoothness constraint. Each optical flow vector has a spatial connection with respect to the high-speed moving object. Figure 5 shows the initial classification results. Because background information and moving object features have similar characteristics, motion regions adhere to one another and empty holes emerge. The initial classification result is therefore not satisfactory.

In the second stage, the classifier uses a decision variable to deal with topology changes in the detection region. Figure 6 shows the classification results for video sequences (a) and (b). Motion areas in video (a) contain queues of cars, which belong to the close scene and the distant scene, respectively. Because the cars in the distant scene are too wide, some background information has also been classified as motion region. Optical flow vectors generated from video (b) reflect background information. We employ a trust-region Newton-Raphson method to solve this problem [28]. This method separates moving objects from background by making judgments about the most meaningful trust region in the sequences based on the maximum likelihood ratio.

We are interested in comparing the classification performance of HLNLC and other classifiers, including PCNN, FNN, GSVM, and LDA. We specified the population parameters of a pair of normal distributions and drew samples from the specified distributions. Then, different classifiers are applied to each optical flow feature vector in the sampled data. The comparison results are shown in Table 2. M&SD are the mean and standard deviation of the motion region (positive region), and AUC is calculated based on M&SD:

The function is the cumulative distribution function of the standard normal distribution. Positive values indicate motion areas, and negative values represent background. AUC equals the probability that the decision variable of an outcome from the actual positive class exceeds that of an outcome from the actual negative class. A greater AUC means higher classification accuracy. Based on AUC, we were able to calculate the detection rate (DR), which is defined as the ratio between total moving object region and background area.
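For a decision variable that is normally distributed in both classes, the AUC can be computed directly from the class means and standard deviations. The sketch below uses the standard binormal result, AUC = Φ((μ₊ − μ₋) / sqrt(σ₊² + σ₋²)), with Φ the standard normal CDF; whether this matches the exact expression used for Table 2 is an assumption, and the function names are hypothetical.

```python
import math


def standard_normal_cdf(z):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binormal_auc(mean_pos, sd_pos, mean_neg, sd_neg):
    """AUC of a scalar decision variable that is normal in both classes.

    Equals P(positive score > negative score) for independent draws.
    """
    separation = (mean_pos - mean_neg) / math.hypot(sd_pos, sd_neg)
    return standard_normal_cdf(separation)
```

With identical class distributions the function returns 0.5 (chance level), and the value approaches 1 as the class means move apart relative to the combined spread.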

Our experiments use the mean AUC for each sequence, since the sequences have different numbers of frames. Neural network algorithms (PCNN and FNN) give rise to proliferation errors at the edges of moving objects and have larger deviations in their classification results. GSVM can directly classify data without using PCA, so the result is better than what is produced by the neural networks. For LDA and HLNLC, we calculate AUC by inserting trained LDA vectors, and AUCHLNLC is found to be superior to AUCLDA. Across all methods, we observe that the HLNLC can substantially improve classification performance over other classifiers. M&SD, AUC, and DR curves obtained by different methods are shown in Figure 7.

5.3. Rectangle Window Scan Experiments

In this part, we used three other object detection methods for comparison: SIFT, background subtraction (BS), and the Hough forest method (HF). The SIFT method is based on SIFT features, which are invariant to image scale and rotation [29, 30]. The background subtraction method uses temporal differencing pixels from a Laplacian model and completes the object detection task via a threshold value [31]. Hough forests can be regarded as a task-adapted codebook, in which different locations, scales, and motions are stored [32]. Figure 8 shows the experimental results; the object detection results obtained by our proposed method are shown with red solid boxes, and the results of SIFT, BS, and HF are shown in blue, green, and pink dotted boxes. Our proposed method can detect moving objects at different distances in the same sequence; in video sequence (a), moving cars in the distant scene were detected in green and orange solid rectangle windows. In addition, our proposed method can adaptively adjust the rectangle window to suit the detected object, such as the satellite and the human body in video sequences (c) and (d).

We use three measures to compare object detection accuracy [33]:

Here, and represent th ground-truth and detected object in frame . and represent the number of “ground-truth” and detected object in frame . is the number of classification errors. is the missed detection count. is the pair number of ground-truth and detected object in frame . OverLapRatio is the quality of alignment between the detected objects and the ground-truth. SFDA is the detection accuracy for a video sequence, which is essentially the average of the frame detection accuracy (FDA) over all of the relevant frames in the sequence. ODA is the object detection accuracy, which uses the missed detection and classification error counts. ODP is the object detection precision, which gives the precision of detection by taking into account the spatial overlap information between ground-truth and system output.
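The overlap-based measures can be illustrated with axis-aligned boxes. The sketch below computes the overlap ratio of one ground-truth and detected pair as intersection area over union area, and a per-frame accuracy as the summed overlap ratios divided by the average of the ground-truth and detection counts. Both definitions follow a common convention for these measures and are assumptions here, as are the function names and the `(x0, y0, x1, y1)` box format.

```python
def overlap_ratio(box_a, box_b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b

    # Width and height of the intersection, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h

    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def frame_detection_accuracy(matched_pairs, n_truth, n_detected):
    """Per-frame accuracy from matched (ground-truth, detection) box pairs.

    Unmatched ground-truth or detected objects lower the score through
    the denominator, the average number of objects in the frame.
    """
    if n_truth + n_detected == 0:
        raise ValueError("frame contains no ground-truth or detected objects")
    total_overlap = sum(overlap_ratio(g, d) for g, d in matched_pairs)
    return total_overlap / ((n_truth + n_detected) / 2.0)
```

A sequence-level score in this convention is the mean of the per-frame values over the frames that contain at least one ground-truth or detected object.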

Experimental results are shown in Table 3. Compared with SIFT, our proposed algorithm increases SFDA, ODP, and ODA by about 3%~7%, 5%~10%, and >10%, respectively. Compared with BS and HF, the SFDA, ODP, and ODA are increased by about 5%~15%, 8%~15%, and 10%, respectively. This implies that our proposed algorithm has a higher detection accuracy and a lower false detection rate. SFDA, ODP, and ODA curves obtained by different methods are shown in Figure 9.

6. Conclusions

In this paper, we propose a new computational intelligence method for object detection. We apply the wavelet transform to optical flow estimation. Our proposed method is able to estimate the motions of objects moving at different speeds in the same scene and to accelerate the optical flow computation. In the classification stage, the HLNLC method is used to classify optical flow vectors, and it yields higher accuracy than other existing classifiers. Experimental results demonstrate that the combination of wavelet optical flow estimation and HLNLC classification can achieve more accurate and precise object detection.

Acknowledgments

This work was supported by the National Basic Research Program of China (973 Program) (2012CB821206), the National Natural Science Foundation of China (no. 91024001, no. 61070142) and the Beijing Natural Science Foundation (no. 4111002).