Research Article | Open Access
Yizhong Yang, Qiang Zhang, Pengfei Wang, Xionglou Hu, Nengju Wu, "Moving Object Detection for Dynamic Background Scenes Based on Spatiotemporal Model", Advances in Multimedia, vol. 2017, Article ID 5179013, 9 pages, 2017. https://doi.org/10.1155/2017/5179013
Moving Object Detection for Dynamic Background Scenes Based on Spatiotemporal Model
Moving object detection in video streams is the first step of many computer vision applications. Background modeling and subtraction is the most common technique for this task, yet detecting moving objects correctly remains a challenge. Some methods initialize the background model at each pixel from the first N frames. However, such methods do not perform well in dynamic background scenes because the background model contains only temporal features. Herein, a novel pixelwise and nonparametric moving object detection method is proposed whose background model contains both spatial and temporal features. The proposed model represents the dynamic background accurately. Additionally, several new mechanisms are proposed to maintain and update the background model. Experimental results on image sequences from public datasets show that the proposed method is robust and effective in dynamic background scenes compared with existing methods.
Compared with optical flow [10, 11] and interframe difference algorithms, background subtraction needs less computation, performs better, and is more flexible and effective. The idea of background subtraction is to differentiate the current image from a reference background model. These algorithms first initialize a background model that represents the scene with no moving objects and then detect the moving objects by computing the difference between the current frame and the background model. Dynamic background, such as waving tree leaves and ripples on a river, is a challenge for background subtraction. In the past several years, many background subtraction algorithms have been proposed, and most of them focus on building a more effective background model to handle dynamic background in the following ways:
(1) Features: texture and color [13–15]
(2) Combining methods: combining two or more background models into a new model
(3) Updating the background model
In this paper, a new pixelwise and nonparametric moving object detection method is proposed. The background model is built from the first frames by randomly sampling the 3 × 3 neighborhood region of each pixel several times per frame. On the one hand, the spatiotemporal model represents dynamic background scenes well. On the other hand, a new update strategy makes the background model fit the dynamic background. In addition, the proposed method handles ghosts well. Experimental results show that the proposed method can efficiently and correctly detect moving objects against a dynamic background.
This paper is organized as follows. In the next section, an overview of existing approaches of background subtraction is presented. Section 3 describes the proposed method in detail, and then Section 4 provides the experimental results and comparison with other methods. Section 5 includes conclusions and further research directions.
2. Related Work
In this section, some background subtraction methods will be introduced, which are divided into parametric and nonparametric models.
For parametric models, the most commonly used method is the Gaussian Mixture Model (GMM). Before GMM, a per-pixel Gaussian model was proposed, which first calculated the mean and standard deviation for each pixel and then compared the probability of the current pixel value with a certain threshold to classify the pixel as background or foreground. However, this single-Gaussian model cannot deal with noise and dynamic scenes. GMM was proposed to solve these problems. GMM usually uses three to five Gaussian components for each pixel and updates the model after matching. Several papers [20, 21] have improved GMM to make it more flexible and efficient.
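The per-pixel Gaussian idea described above can be sketched as follows. This is a minimal illustration and not the implementation of any cited method: the function names, the `k`-standard-deviation test (equivalent to thresholding a Gaussian probability density) and the `min_std` floor are assumptions made for the example.

```python
import math


def fit_pixel_gaussian(history):
    """Return (mean, std) of the gray values observed at one pixel."""
    n = len(history)
    mean = sum(history) / n
    variance = sum((v - mean) ** 2 for v in history) / n
    return mean, math.sqrt(variance)


def is_foreground(value, mean, std, k=2.5, min_std=1.0):
    """Flag a value lying more than k standard deviations from the mean.

    min_std is a hypothetical floor that keeps a perfectly constant
    pixel from producing a zero-width model.
    """
    return abs(value - mean) > k * max(std, min_std)


# Example: five observations of one pixel, then two new values to classify.
history = [100, 102, 98, 101, 99]
mean, std = fit_pixel_gaussian(history)
near = is_foreground(100, mean, std)
far = is_foreground(160, mean, std)
```

A single Gaussian can describe only one mode per pixel, which is why a pixel that alternates between, say, leaf and sky values is poorly served by this model.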
In contrast to parametric models, nonparametric models are commonly built from a collection of the observed pixel values, or neighborhood pixel values, of each pixel. Kernel Density Estimation (KDE) opened the door to intensive research on nonparametric methods. A clustering technique was also proposed to set up a nonparametric background model, in which the samples of each pixel were clustered into a set of code words. Wang et al. chose to include a large number of samples (up to 200) in the background model. Since the background models set up by [13, 23] are based only on temporal information, they cannot deal well with dynamic background scenes, which require spatial information. In ViBe [24, 25], a random scheme was introduced to set up and update background models. The background model was initialized from the first frame, and the model elements were sampled randomly from each pixel’s neighborhood. ViBe shows a degree of robustness and effectiveness in dynamic background scenes. To improve ViBe further, Hofmann et al. proposed an adaptive scheme that automatically tunes the decision threshold based on previous decisions made by the system. However, the background models set up by [17, 24, 25] are based only on spatial information, and the lack of temporal information makes it hard to handle time-related situations. A modified Local Binary Similarity Pattern (LBSP) descriptor was also proposed to set up the background model in feature space. Unlike LBP, the LBSP descriptor is calculated from absolute differences. Moreover, intra-LBSP and inter-LBSP were calculated in the same predetermined pattern to capture both texture and intensity changes. The change detection results of LBSP proved efficient compared with many more complex algorithms. A later work improved the thresholding of LBSP and combined it with the ViBe method to detect motion; the improvement was most obvious in noisy and blurred regions.
Another work proposed a spatiotemporal background model by integrating the concepts of a local feature-based approach and a statistical approach into a single framework; the results show that it deals well with illumination changes and dynamic background scenes. These algorithms contain both temporal and spatial information and therefore perform reasonably well.
Initialization and update strategy are important steps common to background modeling methods. As for initialization, some background subtraction methods initialized the background models with the pixel values at each pixel in the first frames. However, this was not effective for dynamic background scenes because neighboring pixel information was missing. Another method initialized the model from the first frame by randomly choosing neighborhood pixel values as samples. However, it used only one frame for initialization. In addition, it sampled 20 pixels from the neighborhood of the current pixel, which contains only 8 pixels, so repeated selection was inevitable, and the resulting poorly constructed model would affect the segmentation decision. A different initialization method stored a pixel value and an efficacy in every element of the model, and the element with the least efficacy was removed or updated. However, the element with the least efficacy might not be the worst element in dynamic background scenes. As for update strategy, one approach used a random process to determine whether a pixel that had been classified as background was used to update the corresponding pixel model. This was workable but too blind to update the model well.
Herein, a nonparametric model collecting both the history and the neighborhood pixel values is presented to improve the performance in dynamic background scenes. The proposed method, based on a spatiotemporal model, collects pixel values as samples from the history and neighborhood of a pixel, and the model elements are sampled from the neighborhood region in the first frames. As for update strategy, the range of randomness is reduced to increase accuracy. These features distinguish the proposed method from other methods based on spatiotemporal models.
3. Spatiotemporal Model for Background
Normally, a background model fits only one kind of scene, and it is difficult to obtain a universal background model that can deal with all complex and diverse scenes. Some background subtraction methods combine different models or features, such as texture, to obtain universal models. These methods treat every frame as the most complex scene and therefore require a large amount of computation. To address this problem, this paper proposes a novel and simple method to model the background of dynamic scenes, in which the idea of combining spatial and temporal information is employed to initialize the model. Next, the details of our spatiotemporal model are introduced. The diagram of the proposed method is shown in Figure 1.
The proposed method initializes the background model from the first frames. First, in each of these frames, a spatial model of a pixel is initialized by randomly picking pixel values from the 3 × 3 neighborhood of that pixel; the number of picks per frame is less than 8.
Then the spatial background models of all the initialization frames are integrated to construct the spatiotemporal model of the pixel.
The values of these parameters are discussed in Section 4. In this way, the spatial information and the temporal information are integrated without a large amount of computation. The proposed background model proves to be effective.
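The initialization described in this section can be sketched as follows. It is an illustration under stated assumptions rather than the authors' C++ implementation: the names `init_pixel_model` and `picks_per_frame` are hypothetical, neighbors are drawn without repetition within a frame (an assumption suggested by the requirement that the number of picks be less than 8), and border pixels are handled by clamping coordinates, which the text does not specify.

```python
import random

# Offsets of the 8 neighbors in a 3 x 3 neighborhood (center excluded).
NEIGHBOR_OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0)]


def init_pixel_model(frames, y, x, picks_per_frame, rng=random):
    """Build the model of pixel (y, x) from several grayscale frames.

    frames is a list of 2D lists of gray values. From every frame,
    picks_per_frame distinct neighbors of (y, x) are drawn at random.
    """
    height, width = len(frames[0]), len(frames[0][0])
    model = []
    for frame in frames:
        for dy, dx in rng.sample(NEIGHBOR_OFFSETS, picks_per_frame):
            # Assumption: clamp so that border pixels reuse the nearest valid pixel.
            ny = min(max(y + dy, 0), height - 1)
            nx = min(max(x + dx, 0), width - 1)
            model.append(frame[ny][nx])
    return model


# Example: 8 synthetic 4 x 4 frames and 5 picks per frame give 40 samples.
frames = [[[f * 10 + r for _ in range(4)] for r in range(4)] for f in range(8)]
model = init_pixel_model(frames, 1, 1, 5, random.Random(0))
```

Because every frame contributes its own picks, each pixel model mixes values from different times (temporal information) and from different neighbors (spatial information).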
3.2. Segmentation Decision
Since the proposed model consists only of grayscale pixel values, the segmentation decision in our single model is simple. The distance between the current pixel value and each element of the background model is compared with a decision threshold, and the number of elements that meet the threshold condition is counted. If this number is smaller than the required minimum number of matching elements, the pixel belongs to the foreground; otherwise, the pixel belongs to the background.
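A minimal sketch of such a decision rule is given below. The names `radius` (distance threshold) and `min_matches` (least number of matching elements) are hypothetical, as is the choice of a strict inequality; the values in the example are arbitrary and are not the parameters chosen in Section 4.

```python
def classify_pixel(value, model, radius, min_matches):
    """Return True if the pixel is foreground, False if it is background.

    A model element matches when its gray value differs from the current
    value by less than radius. The pixel is background as soon as
    min_matches elements match.
    """
    matches = 0
    for sample in model:
        if abs(value - sample) < radius:
            matches += 1
            if matches >= min_matches:
                return False  # enough close samples: background
    return True  # too few close samples: foreground


# Example with an arbitrary model of five gray values.
example_model = [100, 102, 98, 150, 101]
background_case = classify_pixel(101, example_model, radius=10, min_matches=2)
foreground_case = classify_pixel(200, example_model, radius=10, min_matches=2)
```

Stopping at the first `min_matches` matches avoids scanning the whole model for most background pixels.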
3.3. Updating Process
Background changes all the time in dynamic background scenes, so it is necessary to update the background model regularly to fit the dynamic background. In this section, update of the spatiotemporal model and adaptive update of decision threshold will be described in detail.
3.3.1. Update of the Spatiotemporal Model
The proposed method divides the model elements into two parts, a high-efficacy part and a low-efficacy part. The elements that meet the formula belong to the high-efficacy part, and the rest belong to the low-efficacy part. The random update strategy is then applied only to the elements of the low-efficacy part. Moreover, the learning rate is determined experimentally to better fit the proposed method.
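The restricted random update can be sketched as follows. Several details are assumptions, because the efficacy formula is not reproduced in this text: here an element counts as high-efficacy when it lies within `radius` of the current background value, and an update is attempted with probability `1 / learning_rate`, following the subsampling convention of ViBe. All names are hypothetical.

```python
import random


def update_pixel_model(model, value, radius, learning_rate, rng=random):
    """Possibly replace one low-efficacy sample with value.

    Call only for pixels classified as background. Returns True when a
    sample was replaced. The model list is modified in place.
    """
    # Assumption: update once every learning_rate calls on average.
    if rng.randrange(learning_rate) != 0:
        return False
    # Assumption: samples far from the current value form the low-efficacy part.
    low = [i for i, s in enumerate(model) if abs(value - s) >= radius]
    if not low:
        return False  # every sample already agrees with the current value
    model[rng.choice(low)] = value
    return True


# Example: only the outlier 200 is eligible for replacement.
pixel_model = [100, 101, 200, 99]
replaced = update_pixel_model(pixel_model, 100, radius=10, learning_rate=1,
                              rng=random.Random(1))
```

Compared with replacing any sample at random, restricting the choice to the low-efficacy part keeps the samples that currently explain the background and discards the ones that do not.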
3.3.2. Update of the Neighborhood
Background pixels tend to occur together in regions, so the neighbors of a pixel are likely to be background pixels if this pixel has been classified as background. However, this may not be true in edge regions. In short, pixels in the neighborhood of a background pixel are more likely to be background pixels than other pixels. The background model of a neighborhood pixel is therefore updated as well, with the same method introduced in Section 3.3.1. After the update process, parameter will become -1 when segmentation decision is conducted in neighborhood, which is just like adaptive update.
The update method above is a memoryless update strategy: a sample present in the background model is preserved after an update of the pixel model with a fixed probability, regardless of how long it has been in the model. Consequently, the probability that a sample is still present after a given time, and hence the expected remaining lifespan of any sample value of the model, decays exponentially.
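The exponential decay can be illustrated numerically. The sketch below assumes the simplest case, as in ViBe, where each update replaces one of `num_samples` samples chosen uniformly at random; in the proposed method the choice is restricted to the low-efficacy part, so the actual probabilities differ. Under this assumption a sample survives one update with probability (num_samples - 1) / num_samples, and the two functions give the same value in product form and in exponential form.

```python
import math


def survival_probability(num_samples, updates):
    """Probability that a given sample survives a number of updates."""
    return ((num_samples - 1) / num_samples) ** updates


def survival_probability_exp(num_samples, updates):
    """The same probability written as an exponential decay."""
    return math.exp(-math.log(num_samples / (num_samples - 1)) * updates)


# Example: a model of 40 samples observed over 10 updates.
p_product = survival_probability(40, 10)
p_exponential = survival_probability_exp(40, 10)
```

The memoryless property shows up as a product rule: surviving 7 updates is the same as surviving 3 updates and then 4 more, whatever the age of the sample.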
4. Experiments and Results
In this section, a series of experiments is conducted to analyze the parameter setting and to evaluate the performance of the proposed method against other methods. Here, we first express our gratitude to changedetection.net, which provides the datasets for our experiments. The datasets include six test videos in the dynamic background category and several objective indexes for evaluating performance quantitatively:
$$\text{Recall} = \frac{TP}{TP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP}, \qquad F\text{-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}},$$
where True Positive (TP) is the number of correctly classified foreground pixels and True Negative (TN) is the number of correctly classified background pixels. On the other hand, False Positive (FP) is the number of background pixels that are incorrectly classified as foreground, and False Negative (FN) is the number of foreground pixels that are incorrectly classified as background by the background subtraction method. These counts are used to calculate Recall, Precision, and F-Measure. Recall represents the percentage of correctly detected foreground relative to the ground truth foreground. Precision represents the percentage of correctly detected foreground relative to all detected foreground, including true and false foreground. F-Measure is a comprehensive index of Recall and Precision, which is primarily used to evaluate the performance of different parameters and different methods.
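As a small worked example, the three indexes can be computed from pixel counts as sketched below, using the standard definitions Recall = TP / (TP + FN), Precision = TP / (TP + FP) and F-Measure = 2 × Precision × Recall / (Precision + Recall). The function name `evaluate` and the convention of returning 0.0 for an empty denominator are assumptions made for this illustration.

```python
def evaluate(tp, fp, fn):
    """Return (recall, precision, f_measure) from pixel counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall == 0:
        return recall, precision, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, precision, f_measure


# Example: 80 foreground pixels found, 20 false alarms, 20 misses.
recall, precision, f_measure = evaluate(tp=80, fp=20, fn=20)
```

TN does not appear in any of the three indexes, so a method cannot raise its score merely by labeling the large background area correctly.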
The proposed method is implemented in C++ with OpenCV 2.4.9 on a Core i3 CPU at 3.0 GHz with 2 GB of RAM.
4.1. Parameter Setting
As mentioned in Section 3, the model is initialized from the first frames, and in each frame elements are randomly sampled from the neighborhood several times. We conducted a series of experiments on the adjustment of these two parameters, with the other parameters, including the learning rate, fixed and without postprocessing.
It is clear that performance with parameter from 5 to 6 and from 6 to 10 are better in Figure 2. Further experiments tested with different parameters are shown in Table 1. Performance with different value is shown in Figure 3.
| and are the best choices, and is also a desired parameter for small computational burden.|
After these experiments (Figures 3 and 4), the parameters were set as follows:(1), , .(2).(3), .(4)A median filter was applied as a postprocessing step, and Table 2 shows that a 9 × 9 window performs better. The median filter improves the results, but it is removed when the proposed method is compared with other algorithms.
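The effect of median filtering on a binary foreground mask can be sketched in a few lines. This is a plain-Python illustration, not the OpenCV routine used in the experiments; the function name `median_filter_mask` and the border handling (replicating the nearest pixel) are assumptions.

```python
def median_filter_mask(mask, window):
    """Apply a window x window median filter to a 2D list of 0/1 values."""
    half = window // 2
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            values = sorted(
                # Assumption: coordinates are clamped, so borders are replicated.
                mask[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in range(-half, half + 1)
                for dx in range(-half, half + 1)
            )
            out[y][x] = values[len(values) // 2]
    return out


# Example: an isolated false detection in a 5 x 5 mask, filtered with a 3 x 3 window.
noisy = [[0] * 5 for _ in range(5)]
noisy[2][2] = 1
cleaned = median_filter_mask(noisy, 3)
```

On a binary mask the median is a majority vote within the window, so isolated false detections disappear while large foreground regions are preserved.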
4.2. Comparison with Other Methods
Figures 5(b) and 5(c) show the detection results of a temporal-only method and of the proposed method, respectively, for the input frame in Figure 5(a). The waving tree leaves in (a) are dynamic background. Since the compared method builds a temporal-only model, its background model lacks neighborhood pixel information, so it regards the dynamic background as moving objects. The proposed method considers both temporal and spatial information, setting up the background model from the first 8 frames by randomly sampling 5 times in the 3 × 3 neighborhood region. Therefore, its performance in dynamic background scenes is better than that of the compared method.
Figure 6 shows the detection results of ViBe and the proposed method. Since ViBe sets up the background model based only on spatial information, time-related artifacts such as ghosts may appear. As shown in Figure 6(c), ViBe sets up the background model from the first frame alone and regards all pixels in it as background pixels, as if the frame contained no moving objects. If there are moving objects in the first frame and they later move away (the fiftieth frame, (b)), they will be detected as ghosts (cars marked with red rectangles in (c)). The background model of the proposed method contains not only spatial but also temporal information, so it can recognize the moving objects in the first frame. Therefore, ghosts can be largely eliminated.
The proposed method focuses on building and updating a more effective background model to deal with dynamic background scenes. The public dynamic background video datasets from changedetection.net are used to conduct the experiments: “Boats” with 7999 frames, “Canoe” with 1189 frames, “Fall” with 4000 frames, “Fountain01” with 1184 frames, “Fountain02” with 1499 frames, and “Overpass” with 3000 frames. For fair comparison, the results of the proposed method do not use any postprocessing. ViBe and CodeBook are two classical methods for background segmentation, so we compared the proposed method with them. Experimental results are shown in Figure 7.
Beyond dynamic background scenes, the results for other categories in changedetection.net are shown in Figure 8. It can be seen that the proposed method performs well in several different categories, such as “Bad Weather,” “Baseline,” “Thermal,” and “Intermittent Object Motion.” In other categories, however, the proposed method does not perform as well. For example, in the “PTZ” category, after the camera moves, the proposed method needs a rather long time to learn the new background through the update process, which may result in false detections during this period. Although the proposed method is not a universal method, it can deal with most scenes satisfactorily.
The quantitative comparison results for the “dynamic background” category between the proposed method and several other background subtraction methods are shown in Table 3. Among these methods, ViBe is a nonparametric algorithm, from which the proposed method is derived. LOBSTER and the Multiscale Spatiotemporal BG Model are spatiotemporal background modeling algorithms, which are similar to the proposed method. EFIC is a popular method on changedetection.net. TVRPCA is an advanced RPCA-based method, which is also designed for dynamic background scenes. As shown in Table 3, AAPSA has the highest F-Measure owing to its autoadaptive strategy. Except for AAPSA, the proposed method gets the highest F-Measure. Although the proposed method’s F-Measure is not the highest overall, it handles both dynamic background scenes and ghost elimination well.
In this paper, a novel nonparametric background segmentation method for change detection in dynamic background scenes is proposed. The background model is built by randomly sampling 5 times in the 3 × 3 neighborhood region in each of the first 8 frames. The samples of the background model are separated into a high-efficacy part and a low-efficacy part, and the samples in the low-efficacy part are replaced randomly. This update strategy, which replaces samples in the low-efficacy part, continuously optimizes the background model. The experimental results show that the proposed method is robust in dynamic background scenes and in ghost elimination compared with other methods.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This work was supported by the National Natural Science Foundation of China under Grant 61401137 and Grant 61404043 and the Fundamental Research Funds for the Central Universities under Grant J2014HGXJ0083.
- I. Haritaoglu, D. Harwood, and L. S. Davis, “W4: real-time surveillance of people and their activities,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 809–830, 2000.
- E. Stringa and C. S. Regazzoni, “Real-time video-shot detection for scene surveillance applications,” IEEE Transactions on Image Processing, vol. 9, no. 1, pp. 69–79, 2000.
- L. Li, Y. H. Gu, M. K. H. Leung, and Q. Tian, “Knowledge-based fuzzy reasoning for maintenance of moderate-to-fast background changes in video surveillance,” in Proceedings of the 4th IASTED International Conference Signal and Image Processing, 2002.
- Z. Zivkovic and F. van der Heijden, “Efficient adaptive density estimation per image pixel for the task of background subtraction,” Pattern Recognition Letters, vol. 27, no. 7, pp. 773–780, 2006.
- C. Schmidt and H. Matiar, “Performance evaluation of local features in human classification and detection,” IET Computer Vision, vol. 2, no. 28, pp. 236–246, 2008.
- L. Maddalena and A. Petrosino, “A self-organizing approach to background subtraction for visual surveillance applications,” IEEE Transactions on Image Processing, vol. 17, no. 7, pp. 1168–1177, 2008.
- C. Guo and L. Zhang, “A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression,” IEEE Transactions on Image Processing, vol. 19, no. 1, pp. 185–198, 2010.
- P. Sun, S. Xia, G. Yuan, and D. Li, “An overview of moving object trajectory compression algorithms,” Mathematical Problems in Engineering, vol. 2016, Article ID 6587309, 13 pages, 2016.
- C. I. Patel, S. Garg, T. Zaveri, and A. Banerjee, “Top-down and bottom-up cues based moving object detection for varied background video sequences,” Advances in Multimedia, vol. 2014, Article ID 879070, 20 pages, 2014.
- J. L. Barron, D. J. Fleet, and S. S. Beauchemin, “Performance of optical flow techniques,” International Journal of Computer Vision, vol. 12, no. 1, pp. 43–77, 1994.
- S. Denman, C. Fookes, and S. Sridharan, “Improved simultaneous computation of motion detection and optical flow for object tracking,” in Proceedings of the Digital Image Computing: Techniques and Applications, DICTA 2009, pp. 175–182, December 2009.
- R. Liang, L. Yan, P. Gao, X. Qian, Z. Zhang, and H. Sun, “Aviation video moving-target detection with inter-frame difference,” in Proceedings of the 2010 3rd International Congress on Image and Signal Processing, CISP 2010, pp. 1494–1497, October 2010.
- K. Kim, T. H. Chalidabhongse, D. Harwood, and L. Davis, “Real-time foreground-background segmentation using codebook model,” Real-Time Imaging, vol. 11, no. 3, pp. 172–185, 2005.
- K. Wilson, “Real-time tracking for multiple objects based on implementation of RGB color space in video,” International Journal of Signal Processing, Image Processing and Pattern Recognition, vol. 9, no. 4, pp. 331–338, 2016.
- M. Heikkilä and M. Pietikäinen, “A texture-based method for modeling the background and detecting moving objects,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 4, pp. 657–662, 2006.
- B. Yin, J. Zhang, and Z. Wang, “Background segmentation of dynamic scenes based on dual model,” IET Computer Vision, vol. 8, no. 6, pp. 545–555, 2014.
- M. Hofmann, P. Tiefenbacher, and G. Rigoll, “Background segmentation with feedback: The pixel-based adaptive segmenter,” in Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2012, pp. 38–43, June 2012.
- G.-A. Bilodeau, J.-P. Jodoin, and N. Saunier, “Change detection in feature space using local binary similarity patterns,” in Proceedings of the 10th International Conference on Computer and Robot Vision, CRV 2013, pp. 106–112, May 2013.
- C. R. Wren, A. Azarbayejani, T. Darrell, and A. P. Pentland, “Pfinder: real-time tracking of the human body,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 780–785, 1997.
- P. Kaewtrakulpong and R. Bowden, An Improved Adaptive Background Mixture Model for Realtime Tracking with Shadow Detection, Springer, USA, 2002.
- D.-S. Lee, “Effective Gaussian mixture learning for video background subtraction,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 827–832, 2005.
- A. Mittal and N. Paragios, “Motion-based background subtraction using adaptive kernel density estimation,” in Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004, CVPR 2004, vol. 2, no. 2, pp. 302–309, Washington, DC, USA.
- H. Wang and D. Suter, “A consensus-based method for tracking: Modelling background scenario and foreground appearance,” Pattern Recognition, vol. 40, no. 3, pp. 1091–1105, 2007.
- O. Barnich and M. Van Droogenbroeck, “ViBe: a universal background subtraction algorithm for video sequences,” IEEE Transactions on Image Processing, vol. 20, no. 6, pp. 1709–1724, 2011.
- M. Van Droogenbroeck and O. Paquot, “Background subtraction: experiments and improvements for ViBe,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, vol. 71, no. 6, pp. 32–37, IEEE, Providence, RI, USA, June 2012.
- G. A. Bilodeau, J. P. Jodoin, and N. Saunier, “Change detection in feature space using local binary similarity patterns,” in Proceedings of the International Conference on Computer & Robot Vision, vol. 10, no. 1, pp. 106–112, 2013.
- P.-L. St-Charles and G.-A. Bilodeau, “Improving background subtraction using Local Binary Similarity Patterns,” in Proceedings of the 2014 IEEE Winter Conference on Applications of Computer Vision, WACV 2014, pp. 509–515, March 2014.
- S. Yoshinaga, A. Shimada, H. Nagahara, and R.-I. Taniguchi, “Object detection based on spatiotemporal background models,” Computer Vision and Image Understanding, vol. 122, no. 5, pp. 84–91, 2014.
- B. Wang and P. Dudek, “A fast self-tuning background subtraction algorithm,” in Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2014, pp. 401–404, June 2014.
- X. Lu, “A multiscale spatio-temporal background model for motion detection,” in Proceedings of the IEEE International Conference on Image Processing, pp. 3268–3271.
- G. Allebosch, F. Deboeverie, P. Veelaert, and W. Philips, “EFIC: Edge based Foreground background segmentation and interior classification for dynamic camera viewpoints,” in Advanced Concepts for Intelligent Vision Systems (ACIVS), pp. 433–454, 2015.
- X. Cao, L. Yang, and X. Guo, “Total variation regularized rpca for irregularly moving object detection under dynamic background,” IEEE Transactions on Cybernetics, vol. 46, no. 4, pp. 1014–1027, 2016.
- G. Ramírez-Alonso and M. I. Chacón-Murguía, “Auto-adaptive parallel SOM architecture with a modular analysis for dynamic object segmentation in videos,” Neurocomputing, vol. 175, pp. 990–1000, 2016.
- N. Goyette, P.-M. Jodoin, F. Porikli, J. Konrad, and P. Ishwar, “changedetection.net: a new change detection benchmark dataset,” in Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2012, pp. 1–8, June 2012.
Copyright © 2017 Yizhong Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.