Abstract

Moving object detection (MOD), a classical problem in computer vision, can be achieved efficiently by separating the foreground from the background. Robust Principal Component Analysis (RPCA)-based methods have shown potential for solving this problem. However, their detection accuracy is limited in complex scenes with slow motion. Moreover, background modeling by low-rank minimization is time-consuming because multiple video frames are required as input. Therefore, a real-time MOD framework (LSRPCA_KF) is proposed for dynamic backgrounds, in which a weighted low-rank and structured sparse RPCA algorithm models the background from history data, while online MOD is achieved by background subtraction and updated by the Kalman filter for every real-time frame. Specifically, for the background model, a newly designed weight is incorporated to distinguish the significance of different singular values, and a structured sparse prior is added to promote the spatial connectivity of the moving object. The weighted low-rank and structured sparse RPCA model is solved efficiently by the Alternating Direction Method of Multipliers (ADMM). Experimental results demonstrate that the proposed method detects moving objects more accurately and with significantly reduced processing delay, especially for dynamic backgrounds.

1. Introduction

Moving object detection (MOD) in video sequences is an important technique that represents the moving objects of interest by a binary mask in each frame. It has been actively investigated for various vision-based applications such as intelligent video surveillance, human behavior understanding, human-computer interaction, and protection monitoring along marine borders [1]. In theory, for surveillance video from a fixed camera, the only changes between consecutive frames are caused by moving objects, so MOD is easy to implement by interframe difference or background subtraction. In practice, however, not all changes are due to moving objects. For example, in a controlled indoor scene, changes may be caused by shadows and by variations in the illumination source. In an outdoor scene, MOD is further affected by many uncontrollable changes such as the movement of branches and clouds. Therefore, it is still challenging to develop robust MOD methods for dynamic environments that account for the complexity of the background.

Many studies have been carried out on MOD for dynamic environments [2], and there are two main types of methods: object detection-based and background subtraction-based algorithms [3]. In recent years, a large number of object detectors have been proposed to achieve MOD, but they need either large-scale datasets for offline learning [4] or a manually labeled background at the start of a video for online learning [5]. Their performance has improved significantly in recent years, especially for deep learning-based methods [6]. However, these detectors exploit only spectral and spatial information and no temporal information, which limits their application to moving object detection with a dynamic background. Besides, the data-based learning process and the manual labeling process are not appropriate for real-time application. Alternatively, MOD can be achieved by background subtraction, that is, by taking the difference between the current frame and a background image [7]. However, the performance depends on the choice of the background image. Therefore, both types of methods are limited in certain automated video analysis applications.

Motion-based MOD methods [8] avoid a training phase because they require only motion information. The most commonly employed motion information is the motion pattern, which is assumed to be rigid or smooth [9]. However, this assumption often does not hold in real applications, and the background may be complex, especially in dynamic environments. In this situation, background estimation can be chosen as an alternative motion-based method, in which the background model is estimated directly [10]. But this method also assumes that the background pixels remain unchanged over a certain time interval, so it does not apply to dynamic backgrounds or moving cameras.

Robust Principal Component Analysis (RPCA)-based methods [11] extend background estimation and can estimate the background without additional assumptions on it. The main idea is that the background changes slowly over a set of consecutive image frames, so, owing to this high similarity, it can be expressed by a low-rank matrix. Correspondingly, a moving object is treated as a sudden change in local intensity, which cannot be fitted by the low-rank background and instead appears as a local sparse outlier [12]. On this basis, moving objects can be detected effectively. However, RPCA algorithms obtain their solutions in a batch manner: the optimization requires multiple video frames as input and is solved over multiple iterations, and the number of frames processed at a time is fixed. Therefore, such algorithms are not suitable for real-time processing.

To solve the above problems, the Kalman filter is incorporated in this study to form a real-time moving object detection framework called LSRPCA_KF. Kalman filtering is a commonly used object tracking algorithm that provides an efficient recursive solution, with a prediction and correction mechanism, for minimizing the mean squared error. The Kalman filter tracks quickly, but it has difficulty with moving targets under strong light or sudden illumination changes, in which case the whole region is regarded as the foreground.

To address the above problems, a novel online MOD method is proposed that combines weighted low-rank structured sparse RPCA with Kalman filtering, so that MOD and background modeling are achieved at the same time without a training process. During real-time operation, moving objects in the current frame are detected by background subtraction, where the background image is obtained by solving a weighted low-rank and structured sparse RPCA problem on the N consecutive frames preceding the current frame. The detection results are then used as measurement information for the KF and updated for the current frame to improve detection accuracy and speed. The main contributions of our paper are summarized as follows:
(1) We propose a new formulation of RPCA-based MOD, in which two newly designed weights and a structured sparse prior are incorporated into the background modeling problem to provide an accurate background image. The model can be interpreted as RPCA with a low-rank and sparse structure, which is better suited to motion segmentation because it removes assumptions on foreground behavior. An effective ADMM-based algorithm is developed to solve the new model. Experiments show that the proposed method can be used effectively for both MOD and background modeling in dynamic environments.
(2) Once a moving object is detected, a Kalman filter is initialized to track it, which adapts the algorithm to real-time applications. Notably, neither an initialization step on the first frame nor a training process is required. In addition, the background model is continuously updated by the RPCA algorithm so that it remains effective for target detection. In particular, when occlusion occurs, the online detection mechanism can detect the target again and take its position as the starting point of Kalman filter-based prediction and tracking.

The rest of the paper is organized as follows. The most closely related works, RPCA and the Kalman filter, are briefly introduced in Section 2. The proposed method is presented in Section 3, where both the weighted low-rank and structured sparse RPCA-based moving object detection and the Kalman filter-based moving object tracking are described in detail. Experimental results and the corresponding performance analysis are given in Section 4. Finally, conclusions and suggestions for future research are drawn in Section 5.

2. Related Work

This paper combines RPCA-based background modeling with Kalman filtering-based object tracking for MOD. These two related techniques are reviewed in this section.

2.1. RPCA-Based Background Modeling Method

RPCA formulates moving object detection as a low-rank matrix decomposition problem that exploits both the low-rank property of the background and the sparsity of the objects. MOD is achieved by separating a low-rank background matrix $L$ and a sparse foreground matrix $S$ from the input video sequence [8]:

$$D = L + S, \tag{1}$$

where $D \in \mathbb{R}^{mn \times T}$ is the data matrix converted from the input video sequence (one vectorized frame per column), the image resolution is $m \times n$, and the number of frames is $T$. The goal of the RPCA-based background modeling problem is to reconstruct the low-rank background matrix $L$ and the sparse foreground matrix $S$. Therefore, a combined low-rank and sparse optimization problem is constructed as

$$\min_{L,S} \; \|L\|_* + \lambda \|S\|_1 \quad \text{s.t.} \quad D = L + S, \tag{2}$$

where the low-rank property of $L$ and the sparsity of $S$ are characterized as

$$\|L\|_* = \sum_i \sigma_i(L), \tag{3}$$

$$\|S\|_1 = \sum_{i,j} |S_{ij}|, \tag{4}$$

in which $\sigma_i(L)$ are the singular values of $L$ and $\lambda > 0$ balances the two terms.

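Many methods can be used to solve (2), such as Principal Component Pursuit (RPCA-PCP) [13], GoDec [14], and the Alternating Direction Method of Multipliers (ADMM). In this section, ADMM is employed to provide a relatively accurate result. Firstly, an unconstrained optimization problem is constructed through the augmented Lagrangian

$$\mathcal{L}(L, S, Y, \mu) = \|L\|_* + \lambda \|S\|_1 + \langle Y, D - L - S \rangle + \frac{\mu}{2} \|D - L - S\|_F^2, \tag{5}$$

where $Y$ is the Lagrange multiplier, $\mu > 0$ is the penalty parameter, $\langle \cdot, \cdot \rangle$ denotes the inner product, and $\|\cdot\|_F$ is the Frobenius norm [15]. Secondly, the two subproblems for $L$ and $S$ are solved alternately and iteratively as

$$L^{k+1} = \arg\min_L \; \|L\|_* + \frac{\mu}{2} \left\| D - L - S^k + \frac{Y^k}{\mu} \right\|_F^2, \tag{6}$$

$$S^{k+1} = \arg\min_S \; \lambda \|S\|_1 + \frac{\mu}{2} \left\| D - L^{k+1} - S + \frac{Y^k}{\mu} \right\|_F^2, \tag{7}$$

where the superscript $k$ denotes the iteration index.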

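For the L subproblem (6), singular value thresholding is employed:

$$U \Sigma V^T = \operatorname{svd}\!\left( D - S^k + \frac{Y^k}{\mu} \right), \tag{8}$$

$$L^{k+1} = U \, \mathcal{S}_{1/\mu}(\Sigma) \, V^T. \tag{9}$$

For the S subproblem (7), soft thresholding is employed, with the element-wise operator

$$\mathcal{S}_\tau(x) = \operatorname{sign}(x) \max(|x| - \tau, 0), \tag{10}$$

$$S^{k+1} = \mathcal{S}_{\lambda/\mu}\!\left( D - L^{k+1} + \frac{Y^k}{\mu} \right). \tag{11}$$

Finally, Y is updated as

$$Y^{k+1} = Y^k + \mu \left( D - L^{k+1} - S^{k+1} \right). \tag{12}$$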

The background modeling algorithm via the ADMM is summarized in Algorithm 1.

Input: video data D, $\lambda$, $\mu$
(1) Initialization: L = S = Y = 0;
(2) while not converged do
(3)  Update L as (8) and (9);
(4)  Update S as (11);
(5)  Update Y as (12);
(6) end while
Output: L, S
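The steps of Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' Matlab implementation: the function names, the default values of `lam` and `mu`, and the stopping rule are assumptions chosen for the example.

```python
import numpy as np

def soft_threshold(X, tau):
    """Element-wise shrinkage: sign(x) * max(|x| - tau, 0)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: shrink every singular value of X by tau."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def rpca_admm(D, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Split D into a low-rank part L and a sparse part S with fixed-penalty ADMM."""
    D = np.asarray(D, dtype=float)
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))        # assumed default, not from the paper
    if mu is None:
        mu = 0.25 * m * n / np.abs(D).sum()   # assumed heuristic, not from the paper
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    Y = np.zeros_like(D)
    norm_D = np.linalg.norm(D)
    for _ in range(max_iter):
        L = svt(D - S + Y / mu, 1.0 / mu)             # L update
        S = soft_threshold(D - L + Y / mu, lam / mu)  # S update
        R = D - L - S                                 # primal residual
        Y = Y + mu * R                                # multiplier update
        if np.linalg.norm(R) <= tol * norm_D:         # assumed stopping rule
            break
    return L, S
```

In a video setting, each column of `D` would hold one vectorized frame, the columns of `L` would give the background estimate, and the nonzero entries of `S` would mark candidate foreground pixels.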

RPCA and its improved variants have been applied to MOD and have achieved good detection accuracy. Many studies have demonstrated that the choice of the sparse constraint term directly affects the results of RPCA-based methods in practical problems. However, there is currently no way of designing the sparse constraint term that suits all situations. The sparsity pattern used for the moving objects needs to be improved, and the method for solving the modified objective function needs to be adjusted accordingly.

2.2. Kalman Filter-Based Object Detection and Tracking

The Kalman filter is popular in the field of target tracking for its simplicity [16]. The discrete Kalman filter uses the basic idea of feedback control to estimate the state of a discrete-time controlled process. The state variable $x_t$ for moving object tracking consists of the centroid of an object in the $t$th frame and the object velocity, which is assumed constant in this paper. Once a moving object is detected, its position and velocity are used to initialize the KF for that object, where n denotes the nth object.

A KF consists of two main processes, the predictor process and the corrector process [17]. The predictor process projects the current state estimate ahead in time, so the predictor equations are constructed as

$$\hat{x}_k^- = A \hat{x}_{k-1}, \qquad P_k^- = A P_{k-1} A^T + Q,$$

where $\hat{x}_k$ is the estimated state at time k and $P_k$ is the error covariance matrix, which measures the accuracy of $\hat{x}_k$. A = [1 0 1 0; 0 1 0 1; 0 0 1 0; 0 0 0 1] is used as the state transition model, and Q is the process noise covariance. In practice Q changes over time, but here it is defined as a constant.

The corrector equations adjust the projected estimate by the actual measurement $z_k$ at that time:

$$K_k = P_k^- H^T \left( H P_k^- H^T + R \right)^{-1}, \qquad \hat{x}_k = \hat{x}_k^- + K_k \left( z_k - H \hat{x}_k^- \right), \qquad P_k = (I - K_k H) P_k^-,$$

where the gain $K_k$ is chosen to minimize the error covariance and R is the measurement noise covariance, which is assumed constant. The measurement model is H = [1 0; 0 0; 0 1; 0 0], and $H \hat{x}_k^-$ is the predicted measurement. The state estimation error covariance matrix, the system noise Q, and the measurement noise R are set as unit matrices.
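The predictor and corrector steps can be illustrated with a small constant-velocity tracker. This is a sketch under stated assumptions: the state is taken to be ordered as [x, y, vx, vy] so that it matches the transition matrix A given above, and the measurement matrix `H` is assumed to select the centroid coordinates from that state, which may differ from the H printed in the text. The function names are hypothetical.

```python
import numpy as np

# Assumed state ordering [x, y, vx, vy], matching the transition matrix A in the text.
A = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
# Assumption: only the centroid (x, y) is measured.
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4)  # process noise covariance (unit matrix, as stated in the text)
R = np.eye(2)  # measurement noise covariance (unit matrix, as stated in the text)

def kf_predict(x, P):
    """Project the state estimate and its covariance one frame ahead."""
    return A @ x, A @ P @ A.T + Q

def kf_correct(x_pred, P_pred, z):
    """Blend the prediction with the measured centroid z."""
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    x = x_pred + K @ (z - H @ x_pred)
    P = (np.eye(4) - K @ H) @ P_pred
    return x, P

def track(centroids, x0):
    """Run predict/correct over a sequence of detected centroids."""
    x = np.asarray(x0, dtype=float)
    P = np.eye(4)                          # initial error covariance (unit matrix)
    states = []
    for z in centroids:
        x, P = kf_predict(x, P)
        x, P = kf_correct(x, P, np.asarray(z, dtype=float))
        states.append(x.copy())
    return np.array(states)
```

For example, `track([(2, 1), (4, 2), (6, 3)], x0=[0, 0, 2, 1])` follows an object that moves two pixels right and one pixel down per frame. In the framework described here, the centroids would come from the background subtraction step.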

3. Online Weight Low-Rank Combined with the Kalman Filter Framework for MOD

To address MOD with a dynamic background, an online framework that combines weighted low-rank RPCA with the Kalman filter is proposed. Specifically, a weighted low-rank and structured sparse RPCA algorithm models the background from history data, while online MOD is achieved by background subtraction and updated by the Kalman filter for every real-time frame. The background image is also updated in an online fashion. The flowchart of the algorithm is depicted in Figure 1. The framework combines the two major components as follows: moving objects in the current frame are detected by background subtraction, where the background image is estimated from the previous N frames, and when a new moving object is detected, a KF is initialized and used for tracking. In particular, the proposed algorithm requires no object initialization. Thus, the framework works well for real-time systems.
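The online loop described above can be sketched as follows. To keep the example short, the background of the previous N frames is estimated with a temporal median, which is only a stand-in for the weighted low-rank and structured sparse RPCA model; the threshold value and all function names are likewise assumptions made for illustration.

```python
import numpy as np
from collections import deque

def detect_foreground(frame, background, thresh):
    """Background subtraction: mark pixels that differ strongly from the background."""
    return np.abs(frame - background) > thresh

def centroid(mask):
    """Centroid (x, y) of the foreground pixels, or None if nothing is detected."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return float(xs.mean()), float(ys.mean())

def online_mod(frames, N=5, thresh=0.2):
    """Detect the moving object in each frame from a model of the previous N frames."""
    window = deque(maxlen=N)   # sliding window of history frames
    results = []
    for frame in frames:
        frame = np.asarray(frame, dtype=float)
        if len(window) == N:
            # Stand-in background model (assumption): temporal median of the window.
            background = np.median(np.stack(list(window)), axis=0)
            mask = detect_foreground(frame, background, thresh)
            results.append(centroid(mask))
        window.append(frame)   # the background model is refreshed online
    return results
```

Each returned centroid is the kind of measurement that would be passed to the Kalman filter, and the first N frames only fill the history window.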

3.1. Weighted Nuclear Norm Minimization

The nuclear norm is employed to constrain the low-rank matrix in (2), and the singular value thresholding algorithm is commonly employed to solve the corresponding nuclear norm minimization problem. In general, all singular values are treated equally when the nuclear norm is minimized, which greatly restricts the convergence speed when solving the RPCA model. However, most of the background information is characterized by the large singular values, while the background noise is characterized by the small ones, so shrinking all singular values by the same amount loses background information. Studies have shown that the main information of an image is characterized by its larger singular values. Therefore, larger singular values should be preserved as much as possible, and smaller singular values should be shrunk more. Based on the above analysis, different weights should be assigned, and the minimization problem is rewritten as follows [18]:

$$\min_X \; \frac{1}{2} \|M - X\|_F^2 + \|X\|_{w,*}, \tag{14}$$

where $M$ is the matrix to be approximated and

$$\|X\|_{w,*} = \sum_i w_i \, \sigma_i(X)$$

denotes the weighted nuclear norm of $X$. A different nonnegative weight $w_i$ is assigned to each singular value $\sigma_i(X)$, and the weights are in nondescending order. The WNNM problem in (14) then has the globally optimal solution

$$\hat{X} = U \, \mathcal{S}_w(\Sigma) \, V^T, \qquad [\mathcal{S}_w(\Sigma)]_{ii} = \max(\Sigma_{ii} - w_i, 0),$$

where $M = U \Sigma V^T$ is the singular value decomposition of $M$.

For MOD, two newly designed weight vectors and are proposed, where c and b are two constants, n is the number of nonzero singular values, and is the largest singular value [15].
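The effect of weighting can be illustrated with a weighted singular value thresholding step. The weights used here are inversely proportional to the singular values, a simple hypothetical choice for illustration only. They are not the two newly designed weight vectors of this paper, and the function names and the constants `c` and `eps` are assumptions.

```python
import numpy as np

def weighted_svt(M, weights):
    """Shrink the i-th singular value of M by weights[i] (nondescending weights)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or np.any(np.diff(w) < 0):
        raise ValueError("weights must be nonnegative and nondescending")
    return (U * np.maximum(s - w, 0.0)) @ Vt

def inverse_weights(s, c=1.0, eps=1e-8):
    """Illustrative weights: large singular values receive small weights."""
    return c / (np.asarray(s, dtype=float) + eps)

M = np.diag([5.0, 2.0, 0.5])
s = np.linalg.svd(M, compute_uv=False)          # singular values, largest first
X_weighted = weighted_svt(M, inverse_weights(s))
X_uniform = weighted_svt(M, [1.0, 1.0, 1.0])    # ordinary SVT with threshold 1
```

With these inputs the weighted version keeps the dominant singular value close to its original size while still removing the smallest one, whereas the uniform threshold reduces every singular value by the same amount.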

3.2. Structured Sparse Prior

For the sparse constraint in (2), the l1 norm is commonly used in the RPCA model. The l1 norm imposes a sparse constraint on a moving target at the pixel scale, but the spatial connection between pixels is not considered. A spatial prior that expresses this connection is therefore more appropriate for MOD. Commonly used spatial constraints include total variation (TV) regularization [16], the first-order Markov random field (MRF), the l2,1-norm, and block sparsity. TV enforces smoothness of the foreground, while MRF enforces continuity of the edges of moving objects [19]. Further, the l2,1-norm was employed with column-wise sparsity in [20], and block sparsity has been used in [21, 22]. However, none of these spatial constraints enforces structured information, although it has been shown that foregrounds are distributed with structure [23]. Motivated by the successful use of structural information in sparse signal recovery [24], a structured sparsity norm is introduced to promote structured sparsity of the objects. In this paper, the structured sparsity norm inspired by [25] is utilized to describe the structure information of the foreground objects, which is

Therefore, the weighted low-rank and structured sparse RPCA has been modified as

Similarly, the ADMM method is employed, and the augmented Lagrange functions are defined as

The iterative scheme is the same as in the ADMM-based RPCA procedure. The major difference is the subproblem for S, which is

The S subproblem is solved as a quadratic min-cost flow problem; please refer to [25] for more details. The whole low-rank and structured sparse prior-based RPCA method is given in Algorithm 2.

Input: video matrix D, $\lambda$, $\mu$
(1) Initialization: L = S = Y = 0, k = 0
(2) while not converged do
(3)  Update L as (8), (18) or (19) and (16);
(4)  Update S by solving the structured sparse subproblem;
(5)  Update Y as (12).
(6)  k = k + 1
(7) end while
Output: L, S
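A small numerical comparison may help show why a structured prior suits connected foreground objects better than the l1 norm. The group norm below sums the largest absolute value inside every overlapping 3 x 3 patch. It is an assumed, simplified stand-in chosen for illustration and is not claimed to be the exact norm defined in [25].

```python
import numpy as np

def l1_norm(S):
    """Pixel-wise sparsity measure: ignores where the nonzero pixels are."""
    return float(np.abs(S).sum())

def group_linf_norm(S, size=3):
    """Sum over all overlapping size x size patches of the largest |value| in the patch."""
    A = np.abs(S)
    h, w = A.shape
    total = 0.0
    for i in range(h - size + 1):
        for j in range(w - size + 1):
            total += A[i:i + size, j:j + size].max()
    return total

# Two foreground masks with the same number of active pixels.
clustered = np.zeros((8, 8))
clustered[2:5, 2:5] = 1.0      # one connected 3 x 3 blob
scattered = np.zeros((8, 8))
scattered[::3, ::3] = 1.0      # nine isolated pixels

l1_values = (l1_norm(clustered), l1_norm(scattered))
group_values = (group_linf_norm(clustered), group_linf_norm(scattered))
```

Both masks have the same l1 norm, so the l1 penalty cannot tell them apart, while the patch-based norm is lower for the connected blob and therefore favors spatially coherent foreground.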

4. Experimental Results

All methods are implemented in Matlab R2013b, and all experiments were performed on a computer with an Intel Core i7-6700 3.40 GHz CPU and 16 GB RAM. The performance was tested on video sequences selected from the benchmark dataset provided by [26]: Water surface, boat, and canoe. All the selected videos contain a dynamic background, such as rippling water, which makes them well suited to verifying moving object detection and identification on the sea surface. The frame resolution of Water surface is while for the other two.

4.1. Offline Analysis of Weighted Low-Rank and Structured Sparsity RPCA-Based Object Detection

In this section, the performance of the weighted low-rank and structured sparsity RPCA-based object detection method is evaluated by both qualitative and quantitative analysis. A comparative analysis with other background modeling methods, such as traditional RPCA and MOG, is also carried out. Seven evaluation metrics (Re, Sp, FPR, FNR, PWC, Pr, and F-measure) are employed to give a quantitative and therefore more reliable assessment of the MOD method. The evaluation procedure and the calculation of the metrics follow the website https://www.changedetection.net/.
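For reference, the seven metrics can be computed from the pixel-level counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The sketch below uses the commonly published changedetection.net definitions; the function name and the dictionary keys are assumptions made for this example.

```python
def mod_metrics(tp, fp, fn, tn):
    """Seven change detection metrics from pixel-level confusion counts."""
    re = tp / (tp + fn)                              # Recall
    sp = tn / (tn + fp)                              # Specificity
    fpr = fp / (fp + tn)                             # False positive rate
    fnr = fn / (tp + fn)                             # False negative rate
    pwc = 100.0 * (fn + fp) / (tp + fn + fp + tn)    # Percentage of wrong classifications
    pr = tp / (tp + fp)                              # Precision
    f = 2.0 * pr * re / (pr + re)                    # F-measure
    return {"Re": re, "Sp": sp, "FPR": fpr, "FNR": fnr,
            "PWC": pwc, "Pr": pr, "F": f}

# Example with made-up counts, not results from the paper.
example = mod_metrics(tp=80, fp=20, fn=20, tn=880)
```

Higher Re, Sp, Pr, and F indicate better detection, while lower FPR, FNR, and PWC are better.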

4.1.1. Parameter Setting

The performance of the proposed object detection method is controlled by several parameters ($\lambda$, $\mu$, b, and c). The parameter $\lambda$ controls the trade-off between the structured sparsity term and the low-rank constraint term, and the penalty parameter $\mu$ is introduced by the ADMM optimization process. In the following experiments, and are fixed to and respectively, for all image sequences. In addition, b and c are two extra parameters introduced with the first newly designed weight vector. The results with different weights are shown in Figure 2. The value of b is adjusted with c fixed, and the optimal value of b is selected as 0. Then, the value of c is adjusted with b fixed, and the optimal value of c is selected as 15.3. Besides, the performance of is much better than . Therefore, is selected in our method.

4.1.2. Performance Evaluation

The detection results of the proposed algorithm are compared with those of traditional RPCA, WNNM, GoDec, RPCA_PCP, and MOG in Figure 3. As Figure 3 shows, the algorithm in this paper obtains the best background modeling and target extraction results. GoDec and RPCA_PCP are two different methods for solving the RPCA model, and overall they obtain slightly worse experimental results. The background modeling results of RPCA and WNNM contain serious moving object artifacts, and the integrity of the detected moving object is poor. The background obtained by the proposed algorithm is similar to that obtained by MOG, but the moving target detection of MOG is very poor. To further illustrate the role of the structured sparse constraint term in the proposed algorithm, methods with different objective functions are compared in Figure 4.

The evaluation metrics for all compared algorithms on the selected video sequences are given in Table 1. The metrics show that RPCA-based methods are superior to MOG on the whole, and among the three RPCA-based methods, our method performs best. According to [26], the metric most relevant to moving object detection performance is the F-measure. In terms of F, the algorithm in this paper performs best on both test sequences. The detection performance is much better for the video boat than for the video canoe, for two main reasons. Firstly, the water surface ripple dynamics in the video canoe are more intense. Secondly, the hull surface area in the video canoe is much larger, so the hull is easily recognized as background when it moves slowly. Subsequent research will focus on this problem.

4.1.3. Analysis on Computation Consumption

To assess suitability for real-time applications, the convergence is analyzed in Table 2. As Table 2 shows, introducing the structured sparse constraint term and the weight vectors improves the convergence of the algorithm. However, structured sparsity encoding greatly increases the computational cost of each iteration. Reducing the computation caused by the sparse constraint is therefore a focus of future work.

4.2. Performance for Kalman Filtering-Based Online Object Detection

The real-time performance of object detection algorithms needs to be considered in practical applications. The low-rank and structured sparse RPCA algorithm shows good performance in both object detection and background modeling for offline application. To make the algorithm better suited to real-time applications, the Kalman filter is incorporated to form an online object detection mechanism.

To verify the real-time moving object detection performance of the algorithm, the video boats is selected for simulation analysis in this section. This video sequence contains ground truth for the detected moving object, which is convenient for comparing algorithms. Real-time object detection results for different algorithms are shown in Figures 5 and 6. Figure 5 shows the center point coordinates (horizontal and vertical) of the object detection bounding box over the entire video sequence, where the ground truth is provided by the video sequence. As Figures 5 and 6 show, the proposed method achieves better real-time object detection and tracking, and its average deviation is the smallest among the compared methods. The tracking is worst at the beginning of the video, mainly because the object detection algorithm identifies the maximum connected domain as a moving object only after it exceeds a certain threshold, so the target is not considered valid when it first enters the field of view (Figure 6 (a)). In addition, a large deviation appears near the 20th frame of the video (Figure 6 (b)) because a car is moving at the top of the image and the vehicle is similar to the background. It is therefore not recognized as a valid moving object in the ground truth files, but it is detected as a moving object by our real-time object detection mechanism. This indicates that our method is well suited to moving object detection in complex backgrounds. Also, the horizontal position is tracked better than the vertical position on the whole, and the vertical coordinate tracking results are lower than the true value. The reason is that object detection requires a complete structure constraint, so the person on the boat and the boat itself are detected as two objects. Finally, the algorithm in this paper has a better recognition ability for ship wake flow, as shown in Figure 6 (e).

5. Conclusion

In this paper, an online moving object detection method for dynamic backgrounds, based on weighted low-rank and structured sparse RPCA and the Kalman filter, has been proposed. The algorithm is successfully used for real-time applications without any additional sensor data. A newly designed weight vector and a structured sparsity prior have been incorporated to improve the effectiveness of background modeling, and the KF is used to realize online object detection with the RPCA-based method. Experimental results show that online weighted low-rank and structured sparse RPCA is very effective for background modeling and moving object tracking in dynamic backgrounds. The proposed method is therefore expected to be useful in real-time moving object detection applications. However, the method does not handle changing background conditions, and these, together with camera shake, will be considered in future work.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Fundamental Research Funds for the Central Universities (3132020110) and China Postdoctoral Science Foundation (2019M661076).