Abstract

Pedestrian tracking is a critical problem in computer vision. Particle filters have proven very useful for the nonlinear, non-Gaussian estimation problems that arise in pedestrian tracking. However, pedestrian tracking in complex environments still faces many difficulties caused by changes in pedestrian posture and scale, moving backgrounds, mutual occlusion, and the disappearance and appearance of pedestrians. To surmount these difficulties, this paper presents a particle-filter-based tracking algorithm for multiple pedestrians in video sequences. The algorithm acquires confidence values of the object and the background by extracting a priori knowledge, thus achieving multipedestrian detection; it incorporates color and texture features into the particle filter to obtain better observations and automatically adjusts the weight of each feature according to the current tracking environment. During tracking, the algorithm handles severe occlusion to prevent the drift and loss caused by object occlusion, and it associates detection results with particle states to provide a discrimination method for object disappearance and appearance, thus achieving robust tracking of multiple pedestrians. Experimental verification and analysis on video sequences demonstrate that the proposed algorithm improves tracking performance and yields better tracking results.

1. Introduction

Object tracking [1, 2] is an important research field in computer vision because of its wide range of application demands and prospects in industries such as intelligent human-computer interaction, video monitoring, and intelligent transportation. Pedestrians are the main tracking targets in most object tracking scenes, so pedestrian tracking [3] has important research significance and application value. However, tracking multiple pedestrians in complex environments still faces many difficulties due to the randomness of human motion, changes in pedestrian scale and posture, mutual occlusion, complex backgrounds, and so forth.

Studies of video object tracking mainly follow three approaches: methods based on pattern matching, methods based on classification, and methods based on object state estimation. Pattern-matching methods transform visual tracking into object matching across successive video frames [4]. Mean Shift [5, 6] is the most typical pattern-matching algorithm; it has a relatively small computational load and can achieve fast pedestrian detection and tracking against a static background, but it struggles against a moving background, which limits its range of application. Classification-based methods [7–9] transform object tracking into foreground/background classification and usually employ machine learning. They face three problems: first, building the classifier requires a large number of positive and negative samples for learning, and how to choose the samples is a key issue; second, their high computational complexity and heavy computational load make it hard to satisfy real-time requirements; third, they must search within a region around the object, and it remains an open question how to choose a search scope that is neither so small that it harms tracking precision nor so large that it reduces search efficiency. Methods based on object state estimation [10] rest on Bayesian theory: starting from the prior probability of the object state, tracking is achieved by iteratively solving for the maximum posterior probability of the state given new observations. Thanks to its stable tracking performance and the continuous improvement of computing power in recent years, this approach has been widely used in video tracking. Because particle filters make no Gaussian or linearity assumptions and support multiple hypotheses, they have been successfully applied to video object tracking [11] and have become the mainstream research method [12].

2. Previous Work

Unlike many other objects, people are nonrigid and exhibit varied shapes and deformations, so their features are difficult to describe. Compared with single-pedestrian tracking, multipedestrian tracking is more complex: it must estimate the states of several pedestrians and cope with similar appearances, mutual occlusion, and the disappearance and appearance of pedestrians. Isard et al. [13] proposed the Bayesian Multi-Blob tracker, which combines a blob algorithm with a particle filter. The tracker can handle changes in the number of objects during tracking but only against a static background; it is difficult to construct its observation model when the background is moving. Okuma et al. [14] proposed a boosted particle filter that can track ice hockey players. The method combines two successful algorithms, the mixture particle filter and AdaBoost, to enable an automatic multiobject tracking system, but it is prone to tracking failure when the background becomes more complicated. Henriques et al. [15] proposed CSK, a new theoretical framework for analyzing samples in detection and tracking, which can quickly and accurately train trackers usable for pedestrian tracking. Danelljan et al. [16] improved CSK by adopting an adaptive dimensionality-reduction method to extract color features, which improves CSK's real-time performance in pedestrian tracking; however, CSK is classification-based, so its efficiency and real-time performance degrade as the number of objects grows. Lu et al. [17] combined random-field tracking with DPM pedestrian detection, corrected the detection results through their mutual assistance, and identified NBA basketball players through feature extraction and recognition. Chen et al. [18] proposed constrained sequential labeling (CSL), which can be used for tracking volleyball and basketball players. However, both of these methods handle only indoor scenes, which are rarely affected by light, weather, and other complicating factors, so they are not suitable for tracking pedestrians in complex outdoor scenes. Yang et al. [19] proposed a robust superpixel tracking method that fully exploits intermediate visual features and builds a superpixel-based object tracker to accommodate nonrigid deformation and out-of-plane rotation of the object; it is therefore robust to nonrigid deformations, illumination changes, large posture changes, and out-of-plane rotations. The method can be extended to multipedestrian tracking, but it considers only superpixel color information and is thus prone to tracking errors in the multipedestrian setting. Xue et al. [20] considered human tracking in RGBD videos filmed by sensors such as MS Kinect and PrimeSense; their method achieves indoor multipedestrian tracking quickly and accurately, but since the Kinect's working distance must be kept between 1.6 m and 3 m, pedestrians beyond 3 m cannot be tracked, and the method is susceptible to occlusion. Wu et al. [21] proposed a regional deep learning tracker that observes the object through multiple subregions, each observed by a deep learning model; however, as the number of tracked objects increases, the network becomes very large, making efficient tracking difficult.

In summary, the major issues to be addressed in current particle-filter-based multipedestrian tracking are as follows: tracking under complex conditions such as changes in pedestrian posture and scale, and the temporary disappearance of objects caused by severe or complete occlusion; tracking multiple pedestrians against a moving background; pedestrian disappearance and appearance caused by the limited range of the camera; and the mutual occlusion and interference that frequently occur among pedestrians. An effective tracking algorithm is therefore needed to reduce tracking errors and recover the true states and trajectories of the objects.

In order to solve the above problems and achieve automatic, robust tracking of pedestrians in complex scenarios, we present a particle-filter-based tracking algorithm for multiple pedestrians in video sequences. Figure 1 shows a diagram of the algorithmic process.

3. Detection of Pedestrians

3.1. Object Region Extraction

Before tracking, we need to detect object regions in the first $n$ frames. If the value of $n$ is too small, it affects the accuracy of the a priori knowledge; if it is too large, it affects the efficiency of the tracking algorithm. Therefore, weighing efficiency against accuracy, we set the number of frames to $n = 5$ through experimental testing. Obtaining the object regions involves two main steps: first, apply the frame difference method to obtain candidate object regions; then obtain the detection results with a HOG detector. Figure 2 shows the flow diagram of object region extraction in the first $n$ frames. This detection scheme not only reduces the HOG detection region but also improves the efficiency and accuracy of detection.
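To make the two-stage detection concrete, the following is a minimal sketch assuming OpenCV's stock HOG pedestrian detector; the function name detect_pedestrians, the motion-area threshold, and the window stride are our own illustrative choices, not values from the paper.

import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_pedestrians(prev_frame, frame, diff_thresh=25):
    # Step 1: frame difference restricts the search to moving regions.
    gray_prev = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    gray_curr = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray_curr, gray_prev)
    _, mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    detections = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w < 64 or h < 128:   # smaller than the HOG window: skip
            continue
        roi = frame[y:y + h, x:x + w]
        # Step 2: HOG + linear SVM verifies pedestrians in the moving region.
        rects, _ = hog.detectMultiScale(roi, winStride=(8, 8))
        detections += [(x + rx, y + ry, rw, rh) for (rx, ry, rw, rh) in rects]
    return detections

Restricting detectMultiScale to the moving regions is what allows the scheme to cut the HOG search area, as described above.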

3.2. A Priori Knowledge Extraction

Enlarge the object region detected in the first $n$ frames and perform superpixel segmentation on a rectangular region larger than the object region, thus obtaining a priori knowledge that reflects the object and the background. The prior knowledge acquisition process is as follows:

(1) Define two rectangular regions $R_1$ and $R_2$ for the first $n$ frames: $R_1$ is the detected object region of the last frame, and $R_2$ is the rectangular region whose side lengths are, respectively, 1.5 times the maximum length and width of the object region, that is, a rectangle slightly larger than the object region $R_1$. Let $R_m$ denote the middle region between the two rectangles, that is, the nonoverlapping part of the two regions.

(2) Perform superpixel segmentation on region $R_2$; if $sp(i)$ is the superpixel at position $i$, then $f_i$ denotes the feature vector of the $i$th superpixel. The feature vector here is an HSV color histogram.

(3) Apply Mean Shift clustering to the superpixel feature vectors to obtain the different clusters.

(4) In feature space, each cluster contains the following information: cluster center $f_c$, cluster radius $r_c$, and cluster members. Each cluster thus corresponds to a certain region of the image frame. Compute two areas for cluster $i$: $S_i^+$, the area occupied by cluster $i$ in the object region $R_1$, and $S_i^-$, the area occupied by cluster $i$ in the middle (background) region $R_m$. The larger $S_i^+$ is, the more superpixel members of cluster $i$ appear in the object region; the larger $S_i^-$ is, the more superpixel members of cluster $i$ appear in the background region. On this basis, we calculate a confidence value $C_i$ for each cluster in region $R_2$.
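One form consistent with the stated behavior (the confidence should grow with $S_i^+$, shrink with $S_i^-$, and, as the confidence map built from it later suggests, lie in $[-1, 1]$) is, for example,

$$C_i = \frac{S_i^+ - S_i^-}{S_i^+ + S_i^-}, \qquad C_i \in [-1, 1],$$

so that a cluster lying entirely inside the object region receives confidence $1$ and a cluster lying entirely in the background receives $-1$.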

Establish the object and background prior knowledge from the object regions obtained in the first $n$ frames. The process of acquiring the a priori knowledge is shown in Figure 3.

3.3. Confidence Map Obtaining

When a new frame $t$ (with $t > n$) arrives, perform superpixel segmentation on the region 1.5 times the size of the object region, centered on the object location of the last frame, to obtain the superpixels; then calculate the object/background confidence value $C_i^t$ of each superpixel from a distance-weighted cluster confidence. Here $w_i^t$ is a weighting term that measures the distance between the $i$th superpixel in the $t$th frame and the center of the cluster $c(i)$ it belongs to. The closer the distance, the closer the superpixel's confidence is to the object confidence of its cluster, and vice versa. The parameter $r_{c(i)}$ is the radius of cluster $c(i)$, and $\lambda$ is a normalization term used to adjust how sharply superpixel weights are distinguished within a cluster. Guided by the existing literature and by the observation that the optimal value should lie near the bend of the ROC curve, which was consistent across a number of trials, we set $\lambda = 2$. With this method, the object/background confidence of every superpixel in each new frame can be calculated from the cluster confidence value and the distance weight $w_i^t$.
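In formula form, one version consistent with these definitions is, for example,

$$C_i^t = w_i^t \, C_{c(i)}, \qquad w_i^t = \exp\left(-\lambda \cdot \frac{\bigl\| f_i^t - f_{c(i)} \bigr\|}{r_{c(i)}}\right),$$

where $f_i^t$ is the feature vector of superpixel $i$ in frame $t$, $f_{c(i)}$ is the center of the cluster it belongs to, and $C_{c(i)}$ is that cluster's confidence value from Section 3.2.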

Finally, assign to every pixel inside a superpixel that superpixel's confidence value $C_i^t$, and assign a confidence of $-1$ to the pixels in the region outside the superpixel segmentation. The confidence values of all pixels in the current frame are thus obtained, yielding the confidence map of the current frame. The entire process is shown in Figure 4; in the confidence map, deeper blue indicates a greater likelihood that a superpixel belongs to the background, and deeper red indicates a greater likelihood that it belongs to the object.

After acquiring the object confidence map of the current frame, randomly sample image blocks in the rectangular region $R_2$. Each block has the same size as the object rectangle of the previous frame, as shown in Figure 5. Then, for each sampled block, calculate the confidence accumulated over the pixels it covers; the block with the greatest confidence value is the detection result.
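As an illustration of this sampling step, the sketch below scores randomly placed blocks by their summed confidence using an integral image; the function name and the sample count are hypothetical choices, and conf_map is the per-pixel confidence map built above.

import numpy as np

def sample_best_block(conf_map, search_rect, bw, bh, n_samples=300, seed=None):
    # search_rect is the enlarged region R2 as (x0, y0, x1, y1);
    # (bw, bh) is the object rectangle size from the previous frame.
    # Assumes the search rectangle keeps whole blocks inside the map.
    rng = np.random.default_rng(seed)
    x0, y0, x1, y1 = search_rect
    ii = conf_map.cumsum(axis=0).cumsum(axis=1)  # integral image: O(1) box sums

    def box_sum(x, y):
        total = ii[y + bh - 1, x + bw - 1]
        if y > 0:
            total -= ii[y - 1, x + bw - 1]
        if x > 0:
            total -= ii[y + bh - 1, x - 1]
        if x > 0 and y > 0:
            total += ii[y - 1, x - 1]
        return total

    best_score, best_box = -np.inf, None
    for _ in range(n_samples):
        x = int(rng.integers(x0, max(x0 + 1, x1 - bw)))
        y = int(rng.integers(y0, max(y0 + 1, y1 - bh)))
        score = box_sum(x, y)
        if score > best_score:
            best_score, best_box = score, (x, y, bw, bh)
    return best_box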

4. Particle Filter Tracking

A nonlinear, non-Gaussian system model describes pedestrian tracking in real, complex scenes more accurately, and the particle filter can handle arbitrary nonlinear, non-Gaussian systems. For this reason, we choose the particle filter framework to solve pedestrian tracking in complex scenes.

4.1. State-Space Model

Let $X_t = \{x_t^i\}_{i=1}^{M_t}$ denote the joint object state at time $t$, where $M_t$ is the number of objects and $x_t^i$ is the state of the $i$th object at time $t$. During tracking, the state of an object is represented as $x_t^i = (c_x, c_y, w, h)$, where $c_x$ and $c_y$ are, respectively, the coordinates of the rectangle center in the $x$ and $y$ directions of the image, and $w$ and $h$ are, respectively, the width and height of the rectangle. Since each object's motion is an independent process, the joint state transition factorizes into the product of single-object models:

$$p(X_t \mid X_{t-1}) = \prod_{i=1}^{M_t} p\bigl(x_t^i \mid x_{t-1}^i\bigr).$$

To obtain the state transition density function of the $i$th object at time $t$, we use a stochastic disturbance model to describe the transition of the $i$th object from time $t-1$ to time $t$:

$$p\bigl(x_t^i \mid x_{t-1}^i\bigr) = \mathcal{N}\bigl(x_t^i;\, x_{t-1}^i,\, \Sigma\bigr), \qquad \Sigma = \mathrm{diag}\bigl(\sigma_{c_x}^2, \sigma_{c_y}^2, \sigma_w^2, \sigma_h^2\bigr),$$

where $\mathcal{N}(\cdot)$ denotes the normal density function whose covariance is the diagonal matrix $\Sigma$; the elements on the diagonal are the variances of the four parameters of the state, namely $\sigma_{c_x}^2$, $\sigma_{c_y}^2$, $\sigma_w^2$, and $\sigma_h^2$, representing, respectively, the variances of the object center coordinates and of the rectangle's width and height.

Based on the object positions at the previous time step, the particles of each object are drawn by sampling from the state-space model; each particle represents a candidate region. Figure 6 illustrates the sampling of candidate regions. The observation value of each candidate region is then computed by the observation model.
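A minimal sketch of this sampling step, assuming the diagonal-Gaussian disturbance model of Section 4.1; the standard deviations below are illustrative, not the paper's values.

import numpy as np

def propagate(particles, sigmas=(8.0, 8.0, 2.0, 4.0), seed=None):
    # particles: (N, 4) array of states (c_x, c_y, w, h) at time t-1.
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigmas, size=particles.shape)
    return particles + noise   # x_t ~ N(x_{t-1}, diag(sigma^2))

Each row of the returned array is one candidate rectangle whose observation value is then evaluated by the observation model.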

4.2. Observation Model

Extract features from the object detection results acquired at the current time to obtain the observation value $z_t^j$ of the current object; the observation values of the predicted particles are denoted $\hat{z}_t^{i,k}$. In the observation model, the particle observations and the object observation must be associated to select the best particle, and the candidate region represented by that particle can be taken as the detection result.

4.2.1. Feature Extraction

To obtain the observation values of the object and of the particles, features must be extracted from the detected regions and the candidate regions. Tracking based on a single feature can obtain good results in special scenarios or when the object environment changes little; however, it often loses the object in more complex environments and backgrounds, or under the influence of noise, look-alike interference, and other factors. In such cases, combining multiple features gives better applicability. This paper therefore adopts two complementary features, the HSV color histogram and the LBP histogram, as shown in Figure 7.
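A sketch of the two feature extractors, assuming OpenCV for the HSV histogram and scikit-image for the LBP histogram; the bin counts are illustrative choices, not values taken from the paper.

import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def extract_features(patch_bgr):
    # HSV color histogram, normalized to unit sum.
    hsv = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2HSV)
    color = cv2.calcHist([hsv], [0, 1, 2], None, [8, 8, 4],
                         [0, 180, 0, 256, 0, 256]).ravel()
    color /= color.sum() + 1e-12
    # Uniform LBP histogram on the gray patch (P + 2 = 10 bins).
    gray = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2GRAY)
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    texture, _ = np.histogram(lbp, bins=10, range=(0, 10))
    texture = texture / (texture.sum() + 1e-12)
    return color, texture

The two histograms are complementary: the HSV histogram captures color, which is stable under deformation, while the LBP histogram captures texture, which is robust to illumination changes.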

4.2.2. Association of Particles and Observed Values

After the particles undergo the state transition based on the state equation, the particle states at the new time step are obtained. Let the number of object states at time $t$ be $M_t$; the particle set of the $i$th object state can then be expressed as

$$\chi_t^i = \bigl\{x_t^{i,k}\bigr\}_{k=1}^{N}, \qquad i = 1, \ldots, M_t,$$

where $N$ is the number of particles per object.

The predicted observation of particle $k$ of object state $i$ is $\hat{z}_t^{i,k}$; the current object observations are $z_t^j$, $j = 1, \ldots, J_t$. To compute the weight of each particle, we perform data association between the particle set of each object state and the current observations, expressed by the incidence matrix

$$A_t = \bigl[a_{j,i}\bigr]_{J_t \times M_t},$$

where $A_t$ is the incidence matrix at time $t$, a $J_t \times M_t$ matrix; $J_t$ is the number of object observations, $M_t$ is the number of object states, and the element $a_{j,i}$ represents the degree of correlation between the $j$th observation value at time $t$ and the particle set corresponding to the $i$th object state. A larger value indicates a greater degree of correlation. $a_{j,i}$ is obtained from the feature similarity

$$a_{j,i} = \frac{1}{N}\sum_{k=1}^{N} p\bigl(z_t^j \mid x_t^{i,k}\bigr),$$

which measures the similarity between the $j$th object observation and the features of all the particles corresponding to the $i$th object state; here $z_t^j$ is the feature vector of a given feature of the $j$th object, $\hat{z}_t^{i,k}$ is the feature vector of each particle in the set corresponding to that object state, and the similarity is the intersection distance between the two feature vectors. The per-particle likelihood fuses the two features as

$$p\bigl(z_t^j \mid x_t^{i,k}\bigr) = \alpha_t\, p_c + \beta_t\, p_l,$$

where $p_c$ and $p_l$ are, respectively, the similarity functions of the color and LBP features, and $\alpha_t$ and $\beta_t$ are the weights of the two feature likelihood functions, which change dynamically during particle propagation. The weights $\alpha_t$ and $\beta_t$ are, respectively, calculated as

$$\alpha_t = \frac{R_t^c}{R_t^c + R_t^l}, \qquad \beta_t = \frac{R_t^l}{R_t^c + R_t^l},$$

where $\alpha_t$ and $\beta_t$ represent the weights of the color feature and the LBP feature over all particles at time $t$, and $R_t^c$ and $R_t^l$ are the distribution ranges of the particle weights under each feature at time $t$.
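The following sketch shows one way to implement this adaptive fusion, with histogram intersection as the similarity; the range-based weight update is our reading of the description above, and all names are illustrative.

import numpy as np

def intersection(h1, h2):
    # Histogram intersection: 1.0 for identical unit-sum histograms.
    return np.minimum(h1, h2).sum()

def fused_likelihoods(obs_color, obs_lbp, part_colors, part_lbps):
    # Per-particle similarities under each feature.
    p_c = np.array([intersection(obs_color, h) for h in part_colors])
    p_l = np.array([intersection(obs_lbp, h) for h in part_lbps])
    # The feature whose particle scores spread more is more discriminative
    # in the current environment and receives the larger weight.
    r_c = p_c.max() - p_c.min()
    r_l = p_l.max() - p_l.min()
    alpha = r_c / (r_c + r_l + 1e-12)
    weights = alpha * p_c + (1.0 - alpha) * p_l
    return weights, alpha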

From the incidence matrix we can thus determine the correlation between the object observations and the particle sets of the object states. Moreover, the appearance and disappearance of objects can also be derived from analysis of the incidence matrix. If the elements of a certain row of the matrix are all approximately zero, no object state corresponds to that observation at this time, and the observation can be judged to be a newly appearing object. If the elements of a certain column are all approximately zero, no observation corresponds to that object state, and the object in that state can be judged to have disappeared.

4.2.3. Determination of Object Disappearing and Appearing

According to the incidence matrix, we can know whether a certain object disappears or appears at time $t$. If object $i$ is judged to have disappeared for three consecutive frames starting from time $t$, it is determined that object $i$ has disappeared. At this point, all particles of that state are deleted from the overall particle set, giving

$$\chi_t = \chi_t \setminus \bigl\{x_t^{i,k}\bigr\}_{k=1}^{N}.$$

If object $j$ is judged to be newly present for three consecutive frames starting from time $t$, it is determined that object $j$ is a newly appearing object. New particles are then sampled around its detected state and added to the overall particle set, giving

$$\chi_t = \chi_t \cup \bigl\{x_t^{j,k}\bigr\}_{k=1}^{N}.$$
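Both decisions follow the same three-consecutive-frame confirmation rule, which can be captured by a small helper; the bookkeeping below is our own illustration, and only the three-frame threshold comes from the paper.

def confirm(counter, condition_holds, threshold=3):
    # Increment the per-object counter while the condition (no matching
    # observation, or an unmatched observation) holds; reset otherwise.
    counter = counter + 1 if condition_holds else 0
    # Returns the updated counter and whether the event is confirmed.
    return counter, counter >= threshold

For example, an object whose column of the incidence matrix stays near zero feeds condition_holds=True for three successive frames, at which point confirm reports the disappearance and the object's particles are deleted.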

4.3. Occlusion Handling

In multipedestrian tracking, mutual occlusion often occurs; when pedestrians are severely occluded, useful pedestrian information cannot be extracted, so tracking of the occluded pedestrians is likely to fail. The tracking process above can handle partial occlusion; however, when an object is severely or completely occluded, the likelihood values of the predicted particle states become very small. In view of this, the paper handles severe occlusion as follows: when the entire likelihood function of a particle state is smaller than a certain threshold, the last tracked state of the object is kept unchanged while the particles continue their state transitions, since the object positions in two adjacent frames differ little. If, however, the object remains severely occluded for too many frames, it is judged to have disappeared. The object tracking results under severe occlusion and the movement of the particles are shown in Figures 8 and 9, respectively.

The advantage of this approach is that when a pedestrian is judged to be severely occluded, the particle states continue to propagate in every frame even though the reported tracking result stays unchanged. When the object reappears several frames later, it still lies within the particle sampling range, and the most likely candidate can be estimated from the candidate objects obtained by particle sampling at that time.
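A compact sketch of this rule follows; the threshold tau and the propagate helper (the Gaussian transition of Section 4.1) are illustrative assumptions.

import numpy as np

def handle_severe_occlusion(weights, last_state, particles, propagate, tau=0.1):
    # weights: per-particle likelihoods of the tracked object this frame.
    if weights.max() < tau:
        # Severe occlusion: freeze the reported state, but keep the
        # particles diffusing so a reappearing object is still covered.
        return last_state, propagate(particles)
    best = particles[int(np.argmax(weights))]
    return best, particles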

4.4. The Algorithmic Process

The entire algorithmic process can be summarized as in Algorithm 1.

Detection of pedestrians
 For $t = 1$ to $n$
    (1.1) Perform frame difference and obtain the object regions;
    (1.2) Build a pedestrian detector using HOG descriptors and SVM;
    (1.3) Detect pedestrians in each frame;
    (1.4) Extract the a priori knowledge of the pedestrians;
 End for
 For $t = n + 1$ to $T$
    (1.5) Perform superpixel segmentation according to the object region of the last frame;
    (1.6) Obtain the confidence map as in Section 3.3;
    (1.7) Randomly sample the object confidence map and obtain the objects;
    (1.8) Update the a priori knowledge for each object;
 End for
Particle filter tracking
 For object $i = 1$ to $M$
    (2.1) Initialize the particle state distribution $\{x_0^{i,k}\}_{k=1}^{N}$;
    (2.2) Set the initial weight values of the feature information, $\alpha_0$ and $\beta_0$;
 End For
 For $t = n + 1$ to $T$
   For object $i = 1$ to $M_t$
    (2.3) Importance sampling step
    Propagate and get new particles using the disturbance model of Section 4.1;
    (2.4) Update the weights
    Compute the observation likelihood functions $p_c$ and $p_l$ for each particle as in Section 4.2.2;
    Update the weight values of the feature information as in Section 4.2.2;
    If the effective number of particles falls below a threshold
      Resample the particles
    End if
    (2.5) State estimation
    Estimate the object state $\hat{x}_t^i$
   End For
 End For

5. Experimental Verification and Analysis

To evaluate the performance and tracking effect of the proposed algorithm, video sequences of real scenes are used for testing. The video data come from our own database, the CAVIAR database, and the PETS2012 benchmark. These sequences include complex situations such as random translation, occlusion, scale change, look-alike interference, and the disappearance and appearance of objects.

The parameter settings of the proposed tracking algorithm are shown in Table 1; these parameters apply to all of the following test videos.

In tracking, the root mean square error (RMSE) and the average RMSE are usually employed to evaluate the performance of a tracking algorithm. The RMSE at time $t$ is

$$\mathrm{RMSE}_t = \sqrt{(\hat{x}_t - x_t)^2 + (\hat{y}_t - y_t)^2},$$

where $(\hat{x}_t, \hat{y}_t)$ is the estimated object position at time $t$ and $(x_t, y_t)$ is the real object position at time $t$.

The average RMSE is defined as

$$\overline{\mathrm{RMSE}} = \frac{1}{T} \sum_{t=1}^{T} \mathrm{RMSE}_t,$$

where $T$ is the total number of frames in the tracked video sequence. The average RMSE serves as a measure of the test error; a smaller value indicates a better tracking effect.
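Written as code, the two measures are direct; est and gt are hypothetical (T, 2) arrays of estimated and ground-truth (x, y) positions.

import numpy as np

def rmse_per_frame(est, gt):
    # Euclidean position error at each time step.
    return np.sqrt(((est - gt) ** 2).sum(axis=1))

def average_rmse(est, gt):
    # Mean error over the T frames of the sequence.
    return rmse_per_frame(est, gt).mean()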

5.1. Contrast Analysis

For comparison, we evaluate the BPF tracking algorithm [14], the superpixel tracking algorithm [19], and the proposed algorithm on the same video sequences. The reasons for this comparison are as follows: the superpixel tracking algorithm mainly uses superpixels to convert many pixels into fewer units to achieve tracking, and the proposed algorithm likewise uses superpixels to establish prior knowledge in the detection phase for multipedestrian detection; the BPF algorithm employs AdaBoost to detect pedestrians and a particle filter to track them, whereas the proposed algorithm obtains a confidence map of each frame from the established prior knowledge, achieves pedestrian detection by randomly sampling the confidence map, and then tracks with a particle filter. Both methods first detect pedestrians and then track them with a particle filter, so there is a certain similarity in their algorithmic frameworks, and the superpixel tracking algorithm, the BPF algorithm, and the proposed algorithm are therefore comparable. The parameters of the test video sequences of the actual scenes are shown in Table 2.

(1) Video Sequence 1. This group of sequences tests the tracking performance of the proposed algorithm against a moving background. As can be seen in Figure 10, the background of the sequence changes constantly during tracking. Because neither the superpixel tracking algorithm nor the BPF algorithm handles the interference caused by background change, tracking drift and even tracking errors occur, as shown in frames 170, 217, and 268. The background changes have no impact on the proposed algorithm, which still tracks the objects stably. This is mainly due to the accurate detection of the objects in the first $n$ frames, which leads to more accurate tracking.

Figure 11 shows the error curves between the tracked and actual positions of the three pedestrians; from left to right, they give the per-frame position errors of the tracking results for pedestrians 1, 2, and 3. As the figure shows, the position errors of the superpixel tracking algorithm and the BPF algorithm are relatively large compared with the proposed algorithm. Although no occlusion among pedestrians appears in this sequence, the dynamic background affects the tracking results of the two algorithms, whereas the position error curves show that the proposed algorithm is more accurate. Table 3 lists the average root mean square error of the three algorithms' tracking results; the proposed algorithm's smaller average RMSE indicates better tracking accuracy.

(2) Video Sequence 2. The video data come from the CAVIAR database. Two instances of severe occlusion occur while tracking this group of sequences, and Figure 12 compares the results of the three tracking algorithms. The proposed algorithm keeps tracking the objects and performs better than the other two tracking algorithms.

Figure 13 shows the pedestrian error curves of the three tracking algorithms. The superpixel tracking algorithm and the BPF algorithm have larger tracking errors for pedestrian 1, while the BPF algorithm has relatively small errors for pedestrians 3, 4, and 5, indicating that BPF performs better when there is no occlusion; the proposed algorithm has relatively small root mean square errors for all pedestrians. The average RMSE values in Table 4 indicate that the proposed algorithm achieves better tracking accuracy over the whole tracking process.

Figure 14 shows the motion trajectories of the five pedestrians tracked by the proposed algorithm, in which curves of different colors represent different pedestrians' trajectories and each dot marks a pedestrian's coordinate position in one frame.

(3) Video Sequence 3. The video shows three pedestrians walking in a hall and tests the tracking effect of the three algorithms under severe occlusion and changes in pedestrian scale. Figure 15 shows the pedestrian tracking results of the three methods: the first row shows the results of the superpixel tracking algorithm and the proposed algorithm, while the second row shows the results of the BPF algorithm and the proposed algorithm. As can be seen from the figure, the BPF algorithm drifts when severe pedestrian occlusion occurs, as in frames 1, 2, and 3, which then harms the accurate tracking of subsequent frames; it cannot acquire a sufficient description of the pedestrians' features under severe occlusion and thus easily fails. The superpixel tracking algorithm and the proposed algorithm can track the object from partial features and can handle severe occlusion, so they obtain more accurate results in this sequence. However, because the BPF and superpixel tracking algorithms do not handle changes in pedestrian scale, their tracking regions are of fixed size, while the proposed algorithm accounts for pedestrian scale changes in its state-space model and thus tracks better than the other two algorithms.

Figure 16 shows, from left to right, the per-frame error curves between the tracking results and the actual positions of pedestrians 1, 2, and 3; red denotes the error curve of the proposed algorithm. Compared with the other two algorithms, the proposed algorithm is more accurate and robust. The average RMSE values in Table 5 show that the proposed algorithm achieves better tracking accuracy on this video sequence.

Figure 17 shows the trajectories of the three pedestrians, drawn from the coordinate positions acquired by the proposed algorithm from the first frame to the last frame.

(4) Video Sequence 4. This group of sequences comes from the PETS2012 benchmark and mainly tests the impact of object disappearance and appearance on the tracking results. When an object has been judged absent or present for more than three consecutive frames, it is considered to have disappeared or appeared. In Figure 18, rectangles of different colors mark different tracked pedestrians. At the 15th and 45th frames there are six tracked pedestrians in total; owing to the limited range of the camera, pedestrians 6, 5, 1, and 4 leave the scene in succession. As can be seen, the proposed algorithm accurately determines the disappearance of objects.

Figure 19 shows the tracking results for appearing objects obtained by the proposed algorithm. For instance, pedestrians 7 and 8 enter the scene in turn, and the proposed algorithm accurately determines their appearance. During tracking, pedestrian 7 is repeatedly and severely occluded by pedestrian 2 and by other objects, and at the 188th and 191st frames pedestrian 7 is judged to have disappeared. Because pedestrian 7 remains absent for more than three consecutive frames, when it appears again (e.g., frame 202), it is treated as a new object and given the new number 9.

Figure 20 shows the tracking trajectories of all pedestrians from the first frame to the current frame. Different colors represent different pedestrians' trajectories, which provide an important basis for higher-level visual tasks such as pedestrian behavior recognition and scene understanding.

6. Conclusion

To solve the multipedestrian tracking problem in video sequences, this paper proposes a particle-filter-based tracking algorithm for multiple pedestrians. The contributions of our work are as follows: we apply frame difference and HOG to obtain object regions in the training frames rapidly and accurately, and we acquire object and background confidence by establishing a priori knowledge with which to track pedestrians; we integrate color and texture features into the particle filter to obtain better observations and automatically adjust the weight of each feature according to the current tracking environment; and we associate the detection results with the particle states and propose a discrimination method for object disappearance and appearance, thus achieving multipedestrian tracking in complex scenarios. The test results show that the proposed algorithm is more stable and robust in complex situations such as moving backgrounds, pedestrian translation, severe occlusion, interference among pedestrians, pedestrian scale change, and the disappearance and appearance of pedestrians. However, there is still no general tracking algorithm for all kinds of complex environments, such as tracking pedestrians among other moving objects, so improvements and refinements are still needed in future studies.

Competing Interests

The authors declare that they have no competing interests.