Abstract

We present a new method for object tracking; we use an efficient local search scheme based on the Kalman filter and the probability product kernel (KFPPK) to find the image region with a histogram most similar to the histogram of the tracked target. Experimental results verify the effectiveness of this proposed system.

1. Introduction

Real-time object tracking is one of the most active research areas in computer vision. The goal of object tracking is to estimate the locations and motion parameters of the target in a video sequence given its initial position in the first frame. Research in tracking plays a key role in understanding the motion and structure of objects. It finds numerous applications including surveillance [1], human-computer interaction [2], traffic pattern analysis [3], recognition [4], and medical image processing [5], to name a few. Although object tracking has been studied for several decades and numerous tracking algorithms have been proposed for different tasks, it remains a very challenging problem. One of the most challenging factors in object tracking is accounting for appearance variation of the target object caused by changes of illumination, deformation, and pose. In addition, occlusion, motion blur, and camera view angle also pose significant difficulties for tracking algorithms. Furthermore, no single tracking method can be successfully applied to all tasks and situations.

In this paper we present a new method for real-time tracking of complex nonrigid objects. The method copes with camera motion, partial occlusions, and target scale variations. The scale/shape of the object is approximated by an ellipse and its appearance by histogram-based features derived from local image properties. We use an efficient search scheme (a Kalman filter [6, 7], with the probability product kernels [8–11] and the integral image [12] as a similarity measure) to find the image region with a histogram most similar to that of the tracked target. In this work, we address the problem of scale/shape adaptation and orientation changes of the target. The proposed approach (KFPPK) is compared with MKF [13]. In [13] the authors build a color histogram-based visual representation regularized by a spatially smooth isotropic kernel. Using the Bhattacharyya kernel [8] as the similarity measure, a mean shift procedure is performed for object localization by finding the basin of attraction of the local maxima. However, the MKF tracker only considers color information and therefore ignores other useful information such as target scale variations, which makes it sensitive to background clutter and occlusions. Extensive experiments are performed to test the proposed method and validate its robustness and effectiveness in tracking the scale and orientation changes of the target in real time.

The rest of the paper is organized as follows: Section 2 presents the probability product kernels. Section 3 introduces the basic Kalman filter for object tracking. Section 4 presents the target model update used for scale estimation. Section 5 presents the proposed approach and Section 6 the experimental results. Section 7 concludes the paper.

2. Probability Product Kernel

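In this paper we define a class of kernels on the space of normalized discrete distributions over some index set $\Omega$. Specifically, we define the general probability product kernel [8, 9] between distributions $p$ and $q$ as

$$K_\rho(p, q) = \sum_{k \in \Omega} p(k)^{\rho}\, q(k)^{\rho}, \tag{1}$$

where $\rho$ is a parameter and $p$ and $q$ are probability distributions defined on the space $\Omega$. This is a valid kernel since, for any $p_1, \dots, p_n$, the Gram matrix consisting of elements $K_\rho(p_i, p_j)$ is positive semidefinite:

$$\sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j K_\rho(p_i, p_j) \ge 0 \tag{2}$$

for $\alpha_1, \dots, \alpha_n \in \mathbb{R}$. Different $\rho$ values correspond to different types of probability product kernels. For $\rho = 1$, we have

$$K_1(p, q) = \sum_{k \in \Omega} p(k)\, q(k). \tag{3}$$

We call this the probability product kernel defined by $\rho = 1$. We denote the histogram of the tracked target $T$ as $h_T$ and the number of pixels inside $T$ as $|T|$, which is also equal to the sum over bins, $|T| = \sum_k h_T(k)$. Let $p$ be the normalized version of $h_T$ given by $p = h_T / |T|$, so we can consider $p$ as a discrete distribution, with $\sum_k p(k) = 1$. Let $q$ be the normalized histogram of a candidate region $C$ obtained in the frames of the video sequence. For the $k$th bin of $q$, its value is obtained by counting the pixels that are mapped to the index $k$:

$$q(k) = \frac{1}{|C|} \sum_{x \in C} \delta[b(x) - k], \tag{4}$$

where $\delta$ is the Kronecker delta with $\delta[n] = 1$ if $n = 0$ and $\delta[n] = 0$ otherwise. The mapping function $b$ maps a pixel $x$ to its corresponding bin index $b(x)$. The computation of the probability product kernel can be expressed as

$$K_1(p, q) = \sum_{k} p(k)\, q(k) = \frac{1}{|C|} \sum_{x \in C} \sum_{k} p(k)\, \delta[b(x) - k] = \frac{1}{|C|} \sum_{x \in C} p(b(x)). \tag{5}$$

Therefore, the computation of the probability product kernel can be done by taking the sum of the values $p(b(x))$ within the candidate region $C$. Evaluated with an integral image, this sum yields a support map of the similarity measure between the target and the candidate region in each frame of the video sequence.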
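The support map can be illustrated with a short sketch. It is not the authors' MATLAB implementation; it assumes the kernel parameter equals 1, so that the similarity of a window is the mean, over its pixels, of the target histogram value of each pixel's bin. The function names, the rectangular windows, and the precomputed `bin_image` of bin indices are all hypothetical choices made for this example.

```python
def integral_image(values):
    """Summed-area table padded with a leading zero row and column."""
    rows, cols = len(values), len(values[0])
    table = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0.0
        for c in range(cols):
            row_sum += values[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    return table


def box_sum(table, top, left, height, width):
    """Sum of the original values inside a rectangle, using four lookups."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


def normalized_histogram(bin_image, top, left, height, width, n_bins):
    """Normalized histogram of the bin indices inside a rectangle."""
    hist = [0.0] * n_bins
    for r in range(top, top + height):
        for c in range(left, left + width):
            hist[bin_image[r][c]] += 1.0
    area = float(height * width)
    return [h / area for h in hist]


def support_map(bin_image, target_hist, height, width):
    """Similarity of every height-by-width window to the target histogram."""
    # Back-projection: each pixel receives the target probability of its bin.
    back = [[target_hist[b] for b in row] for row in bin_image]
    table = integral_image(back)
    area = float(height * width)
    rows, cols = len(bin_image), len(bin_image[0])
    return [[box_sum(table, r, c, height, width) / area
             for c in range(cols - width + 1)]
            for r in range(rows - height + 1)]
```

Each entry of the returned map equals the sum over bins of the product of the target and window histograms, but it costs four table lookups per window instead of a full histogram computation.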

3. Kalman Filter

The Kalman filter has many uses, including applications in control [14], navigation [15], and computer vision. Object tracking is one important field of computer vision, where varying movement conditions and occlusions can hinder the visual tracking of an object. In this paper, we use the Kalman filter for object tracking because it can tolerate small occlusions and complex movements of objects [16]. The Kalman filter is a framework for predicting a process state and using measurements to correct or “update” these predictions.

3.1. State Prediction

For each time step $t$, a Kalman filter first makes a prediction $\hat{x}_t^-$ of the state at this time step:

$$\hat{x}_t^- = A\, \hat{x}_{t-1}, \tag{6}$$

where $\hat{x}_{t-1}$ is a vector representing the process state at time $t-1$ and $A$ is the process transition matrix. The Kalman filter concludes the state prediction steps by projecting the estimate error covariance forward one time step:

$$P_t^- = A\, P_{t-1} A^{T} + Q, \tag{7}$$

where $P_{t-1}$ is a matrix representing the error covariance in the state prediction at time $t-1$ and $Q$ is the process noise covariance (or the uncertainty in our model of the process).

3.2. State Correction

After predicting the state $\hat{x}_t^-$ (and its error covariance $P_t^-$) at time $t$ using the state prediction steps, the Kalman filter next uses measurements to “correct” its prediction during the measurement update steps.

First, the Kalman filter computes a Kalman gain $K_t$, which is later used to correct the state estimate $\hat{x}_t^-$:

$$K_t = P_t^- H^{T} \left( H P_t^- H^{T} + R_t \right)^{-1}, \tag{8}$$

where $R_t$ is the measurement noise covariance and $H$ is the measurement matrix that maps the state to the measured quantities. Determining $R_t$ for a set of measurements is often difficult. In our contribution we calculated $R_t$ dynamically from the state of the measurement algorithm.

Using the Kalman gain $K_t$ and the measurements $z_t$ from time step $t$, we can update the state estimate:

$$\hat{x}_t = \hat{x}_t^- + K_t \left( z_t - H \hat{x}_t^- \right). \tag{9}$$

Conventionally, the measurements $z_t$ are often derived from sensors. In our approach, the measurements $z_t$ are instead the output of the tracking algorithm given the same input: one frame of a streaming video and the most likely $x$ and $y$ coordinates of the target object in this frame (taking the first two dimensions of $\hat{x}_t^-$).

The final step of the Kalman filter iteration is to update the error covariance $P_t^-$ into $P_t$:

$$P_t = \left( I - K_t H \right) P_t^-. \tag{10}$$

The updated error covariance will be significantly decreased if the measurements are accurate (some entries in $R_t$ are low) or only slightly decreased if the measurements are noisy (all of $R_t$ is high).
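The prediction and correction steps can be sketched for a single image axis as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the state is assumed to be (position, velocity) with a constant-velocity transition, only the position is measured, the process noise is assumed to be `q` times the identity, and `dt`, `q`, and `r` are hypothetical tuning values.

```python
def predict(x, P, dt, q):
    """Prediction step with A = [[1, dt], [0, 1]] and Q = q * I (assumed)."""
    pos, vel = x
    x_pred = [pos + dt * vel, vel]
    p00, p01, p10, p11 = P[0][0], P[0][1], P[1][0], P[1][1]
    # A P A^T written out for the 2x2 case, then Q added on the diagonal.
    P_pred = [
        [p00 + dt * (p01 + p10) + dt * dt * p11 + q, p01 + dt * p11],
        [p10 + dt * p11, p11 + q],
    ]
    return x_pred, P_pred


def correct(x_pred, P_pred, z, r):
    """Correction step with H = [1, 0]: only the position is measured."""
    s = P_pred[0][0] + r                      # innovation covariance H P H^T + R
    k0, k1 = P_pred[0][0] / s, P_pred[1][0] / s   # Kalman gain P H^T / s
    innovation = z - x_pred[0]                # z - H x_pred
    x_new = [x_pred[0] + k0 * innovation, x_pred[1] + k1 * innovation]
    # (I - K H) P_pred with I - K H = [[1 - k0, 0], [-k1, 1]]
    P_new = [
        [(1.0 - k0) * P_pred[0][0], (1.0 - k0) * P_pred[0][1]],
        [P_pred[1][0] - k1 * P_pred[0][0], P_pred[1][1] - k1 * P_pred[0][1]],
    ]
    return x_new, P_new


# One iteration: predict, then correct with a measured position of 4.0.
x_pred, P_pred = predict([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], dt=1.0, q=0.0)
x_new, P_new = correct(x_pred, P_pred, z=4.0, r=2.0)
```

For a target moving in the image plane, one such filter can be run per coordinate, with the measurement supplied by the histogram search instead of a physical sensor. A smaller `r` pulls the corrected position closer to the measurement and shrinks the position variance more strongly.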

4. Target Model Update

A target is represented by an ellipsoidal region in the image in order to eliminate the influence of different target dimensions [13]. We denote a dataset of $N$ independent samples by $X = \{x_1, \dots, x_N\}$. Let us assume that a Gaussian probability density function is a good generative model for our data [17]. The Gaussian density function is used to automatically determine the number of density functions and their associated parameters, including the pixel locations $x_i$, the covariance $V$, and the location of the target $\theta$. The advantages of using the covariance matrix representation are as follows: (i) it is robust to illumination changes, occlusion, and shape deformations; (ii) it can capture the intrinsic self-correlation properties of object appearance; (iii) it is low dimensional. These advantages lead to computational efficiency. For two-dimensional pixel locations, the Gaussian density with location $\theta$ and covariance $V$ is

$$\mathcal{N}(x_i; \theta, V) = \frac{1}{2\pi\, |V|^{1/2}} \exp\!\left( -\tfrac{1}{2} (x_i - \theta)^{T} V^{-1} (x_i - \theta) \right).$$

We represent an object by modeling the spatial-color joint probability distribution of the corresponding region with a Gaussian density function. We define $b(x_i)$ to be the function that assigns the color value of the pixel at location $x_i$ to its histogram bin. We employ the normalized RGB color space as the color feature, since a normalized RGB histogram provides some robustness to illumination change. We can write the features of spatial position and the color as where $\delta$ is the Kronecker delta function. We use the Gaussian kernel to rely more on the pixels in the middle of the object and to assign smaller weights to the less reliable pixels at the borders of the object. We use only the pixels from a finite neighborhood of the kernel and disregard the pixels farther away.
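A Gaussian-weighted normalized RGB histogram of this kind can be sketched as follows. The sketch rests on several assumptions that are not taken from the text above: an isotropic Gaussian with a hypothetical standard deviation `sigma`, a hypothetical `cutoff` (in units of `sigma`) that defines the finite neighborhood, 16 bins per channel, and a neutral chromaticity for pure black pixels. The histogram is stored sparsely as a dictionary.

```python
import math


def bin_index(rgb, bins=16):
    """Map an (R, G, B) pixel to a bin of the normalized RGB color space."""
    r, g, b = rgb
    total = r + g + b
    if total == 0:
        # Assumption: a black pixel is given a neutral chromaticity.
        nr = ng = nb = 1.0 / 3.0
    else:
        nr, ng, nb = r / total, g / total, b / total

    def quantize(v):
        return min(int(v * bins), bins - 1)

    return (quantize(nr) * bins + quantize(ng)) * bins + quantize(nb)


def weighted_histogram(pixels, center, sigma, cutoff=3.0, bins=16):
    """Normalized histogram in which central pixels count more than border ones.

    pixels is an iterable of ((x, y), (R, G, B)) pairs.
    """
    cx, cy = center
    hist, total = {}, 0.0
    for (x, y), rgb in pixels:
        d2 = (x - cx) ** 2 + (y - cy) ** 2
        if d2 > (cutoff * sigma) ** 2:
            continue  # outside the finite neighborhood of the kernel
        w = math.exp(-0.5 * d2 / sigma ** 2)
        k = bin_index(rgb, bins)
        hist[k] = hist.get(k, 0.0) + w
        total += w
    if total == 0.0:
        return {}
    return {k: v / total for k, v in hist.items()}
```

An anisotropic Gaussian with a full covariance matrix would follow the same pattern, with the squared distance replaced by the corresponding Mahalanobis distance.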

The color histogram that describes the appearance of the region is and the value of the -bins is calculated by

The weights of the value of -bins in object are calculated by

The new distribution of color histogram is calculated by

The covariance matrix update can be used to approximate the new shape of the object and it is calculated by and we should use . The correct value for depends on the noise that is present in the image sequence.
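How a 2 × 2 covariance matrix describes the shape and orientation of the tracked ellipse can be sketched as follows. This is an illustration under assumptions, not the update rule of the paper: the per-pixel `weights` are taken as given inputs, and `beta` is a hypothetical scale factor (default 1.0) applied to the weighted covariance.

```python
import math


def weighted_mean_cov(points, weights, beta=1.0):
    """Weighted mean and (scaled) covariance of 2-D pixel locations."""
    total = sum(weights)
    mx = sum(w * x for (x, _), w in zip(points, weights)) / total
    my = sum(w * y for (_, y), w in zip(points, weights)) / total
    sxx = beta * sum(w * (x - mx) ** 2 for (x, _), w in zip(points, weights)) / total
    syy = beta * sum(w * (y - my) ** 2 for (_, y), w in zip(points, weights)) / total
    sxy = beta * sum(w * (x - mx) * (y - my)
                     for (x, y), w in zip(points, weights)) / total
    return (mx, my), [[sxx, sxy], [sxy, syy]]


def ellipse_from_cov(cov):
    """Semi-axis lengths (one standard deviation) and major-axis angle."""
    sxx, sxy, syy = cov[0][0], cov[0][1], cov[1][1]
    mean = 0.5 * (sxx + syy)
    root = math.hypot(0.5 * (sxx - syy), sxy)
    # Eigenvalues of a symmetric 2x2 matrix are mean + root and mean - root.
    major = math.sqrt(mean + root)
    minor = math.sqrt(max(mean - root, 0.0))
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return major, minor, angle
```

The eigenvalues of the covariance give the squared semi-axes of the ellipse, and the direction of the leading eigenvector gives its orientation, which is how a single matrix can carry both scale and rotation of the target.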

5. Proposed Approach

We implemented the proposed approach in MATLAB using a modular design. The goal of object tracking is to generate the trajectory of an object over time by finding its position in every frame of the video sequence. The steps of the object tracking system are shown in Figure 1.

The proposed approach (KFPPK) for object tracking is composed of five blocks named block processing, block prediction, block tracking, block correction, and block result. The functions of these blocks are as follows.

Block Processing. The algorithm reads the video sequence and converts it into individual frames, from which the color information of the images and of the tracked target is extracted.

Block Prediction. This step estimates how the state of the target will change by feeding it through the state prediction of the Kalman filter. The time update equations project the current state and the error covariance estimates forward in time to obtain the a priori estimates for the next time step.

Block Tracking. In this block we use the probability product kernels combined with the integral image as a similarity measure to find the image region with a histogram most similar to that of the tracked target. Furthermore, we estimate the scale/shape and orientation of the tracked object using the target model update.

Block Correction. The measurement update equations are responsible for the feedback, that is, for incorporating a new measurement into the a priori estimate to obtain an improved a posteriori estimate. The time update equations can be thought of as predictor equations, while the measurement update equations can be thought of as corrector equations driven by the output of block tracking.

Block Result. Tracking trajectory of the object is done on the basis of the region properties of the object such as scale/shape and centroid.

The algorithm of the proposed approach (KFPPK) can be explained as follows.

Step 1. Start video sequence and select the target of the object tracker in the first frame.

Step 2. Use the state prediction of the Kalman filter to estimate how the state of the target will change: the current state and error covariance estimates are projected forward to obtain the a priori estimate for the next time step. This step uses the state prediction equations of Section 3.1, (6) and (7).

Step 3. Calculate the similarity measure between the target model and candidate regions using (5) and estimate the shape and orientation of the tracked object using (13).

Step 4. Apply the correction equations to the a priori estimate to obtain an improved a posteriori estimate, using (9) and (10).

Step 5. Draw the trajectory and go to Step 2 for the next frame.

6. Experimental Results

This section shows the tracking results obtained with the proposed approach (KFPPK). Experiments on video sequences have been carried out to evaluate the performance of the proposed algorithm. In these experiments, we compared our system with the existing MKF algorithm [13]. The proposed algorithm achieves good estimation accuracy of the scale and orientation of the object in the video sequence. We used different sequences; each has its own characteristics, but all of them contain a single moving object. We then set up experiments to list the estimated width, height, trajectory, and orientation of the object.

In this work, we selected normalized RGB color space as the feature space and it was quantised into 16 × 16 × 16 bins to compare between algorithms. One synthetic video sequence and one real video sequence are used in the experiments.

We first use a synthetic ellipse sequence (where the resolution is , the frame rate is 25 fps, and the number of frames is 77) to verify the efficiency of the KFPPK. As shown in Figure 2, the external ellipses represent the target candidate regions, which are used to estimate the real targets, that is, the inner ellipses. The experimental results show that the KFPPK is reliable for estimating the mean position and trajectory (as shown in Figure 3) of the ellipse with scale and orientation changes. Meanwhile, the MKF [13] algorithm does not perform as well as the proposed KFPPK approach because of the significant scale and orientation changes of the tracked object.

The second test is an occlusion sequence (where the resolution is , the frame rate is 25 fps, and there are 193 frames) which is used to verify the efficiency of the KFPPK on a more complex sequence. The object exhibits large scale changes with partial occlusion. We can see the results by the proposed approach (KFPPK) and the MKF [13] in Figure 4. We observe that the KFPPK works much better in estimating the scale and orientation of the target, especially when occlusion occurs.

Table 1 lists the average execution time on the video sequences. The proposed approach (KFPPK) has a lower average execution time than the MKF [13] algorithm.

6.1. Tracking Performance Evaluation

The main objective in this section is to evaluate the performance of our method using Receiver Operating Characteristic (ROC) curves. The ROC curves show how the number of correctly classified positive examples varies with the number of incorrectly classified negative examples. ROC curves are defined via two major rates, the true positive rate (TPR) and the false positive rate (FPR), expressed as follows:

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN},$$

where TP (true positives) are examples correctly labeled as positives, FP (false positives) refer to negative examples incorrectly labeled as positive, TN (true negatives) correspond to negatives correctly labeled as negative, and FN (false negatives) refer to positive examples incorrectly labeled as negative.
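The two rates and the resulting ROC points can be computed as in the following sketch. It assumes that each example (for instance a pixel or a frame, which the text does not specify) comes with a confidence score and a binary ground-truth label; the function names and the threshold sweep over the observed scores are choices made for this illustration.

```python
def roc_rates(tp, fp, tn, fn):
    """True positive rate and false positive rate from the four counts."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by thresholding at each distinct score."""
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = fp = tn = fn = 0
        for score, label in zip(scores, labels):
            predicted_positive = score >= threshold
            if predicted_positive and label:
                tp += 1
            elif predicted_positive:
                fp += 1
            elif label:
                fn += 1
            else:
                tn += 1
        tpr, fpr = roc_rates(tp, fp, tn, fn)
        points.append((fpr, tpr))
    return points
```

Plotting the returned pairs with FPR on the horizontal axis and TPR on the vertical axis gives the ROC curve; a curve closer to the top-left corner indicates better performance.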

We studied the ROC curves of the proposed algorithm and compared them with those of the MKF [13] algorithm. Figure 5 compares the performance of the algorithms using ROC curves for each video sequence. The KFPPK algorithm outperforms the MKF [13] algorithm.

The experimental results demonstrate that the KFPPK is robust in the tracking of the trajectory of objects in different situations (scale variation, pose, rotation, and occlusion). It can be seen that the KFPPK achieves good estimation accuracy in real time of the scale and orientation of the target.

7. Conclusion

In this paper, the proposed approach (KFPPK) has been presented for tracking a single moving object in a video sequence using color information. The approach combines the Kalman filter with the probability product kernel as a similarity measure, using the integral image to compute the histograms of all possible target regions in the video sequence. KFPPK has been compared with the MKF algorithm [13] on synthetic and real tracking sequences, and it achieves a lower average execution time. The experiments test the KFPPK and validate its robustness to the scale and orientation changes of the target in real time. The implemented system can be applied to computer vision applications that require moving object detection and tracking.

Conflict of Interests

The authors declare that they have no competing interests.