Abstract
Background and foreground modeling is a common task in computer vision. The prevailing “low-rank + sparse” model decomposes the frames of a video sequence into a low-rank background and a sparse foreground. However, the sparsity assumption may not hold in practice, and the model does not directly reflect the correlation between the background and the foreground. We therefore present a novel model that decomposes the arranged data matrix into a low-rank background and a moving foreground. The only prior assumption is that the background is low-rank; the foreground is then separated from the background as much as possible. Based on this division, we use a pair of dual norms, the nuclear norm and the spectral norm, to regularize the background and the foreground, respectively. Furthermore, we replace the plain norms with reweighted functions to obtain a better and faster approximation model. A detailed explanation of our two models, based on linear algebra, is presented in this paper. The experimental results show that our model yields better background modeling, and even simplified versions of our algorithms perform better than the mainstream techniques IALM and GoDec.
1. Introduction
Over the past several decades, monitoring and control applications, including video surveillance, traffic monitoring and analysis, human detection and tracking, and gesture recognition in human-machine interfaces, have become increasingly popular and widely used in daily life. Moeslund et al. [1] and many other scholars have devoted themselves to the related research using various methods.
Background subtraction has been studied intensively since the late 1990s. Owing to its ease of implementation and effectiveness, foreground object detection by background subtraction is widely used in video surveillance applications. This kind of approach can be traced back to Toyama et al. [2], whose algorithm predicts pixel intensity with a Wiener filter to extract the foreground region instead of maintaining a specific image. Meanwhile, Stauffer and Grimson [3, 4] proposed a method based on the Gaussian mixture model (GMM) to track dynamic objects and recover their trajectories. The GMM has gained great popularity in the computer vision community, and other scholars continue to revisit the method and propose enhanced algorithms [5, 6]. However, modeling high-frequency variations in the background with 3–5 Gaussian distributions is not accurate and can fail to achieve sensitive detection. In subsequent studies, Elgammal et al. [7, 8] therefore proposed a new background subtraction method that uses more short-term distributions to obtain better detection sensitivity. They modeled the statistical scene background with general nonparametric kernel density estimation to subtract the background while retaining the foreground image. All the methods above are real-time algorithms; the estimated background at each pixel location is based on that pixel's recent history, and no spatial correlation between different or neighbouring pixel locations is used. To exploit such correlation, Oliver built an eigenspace to model the background and used principal component analysis (PCA) to reduce the dimensionality of the space [9].
Over the last five years, there has been growing interest in using PCA to model the background. Chandrasekaran put forward the “low-rank + sparse” model for background modeling and foreground detection in video processing [10]. In 2009, Candes et al. [11] proposed the robust principal component analysis (RPCA) method to decompose the large data matrix of a video sequence into low-rank and sparse components. RPCA was first applied in computer vision by de la Torre and Black [12]. At almost the same time as Candes, Wright et al. [13] also chose RPCA to model the background as the low-rank part of the frames in a video sequence. After that, many scholars worked on solving this model more effectively. An alternating direction method (ADM) for sparse and low-rank decomposition was studied by Yuan and Yang [14]. Lin et al. solved the optimization problem with the method of augmented Lagrange multipliers (ALM) in [15]. Zhou and Tao [16] introduced a fast low-rank approximation based on bilateral random projections (BRP), which is the essence of their algorithm GoDec. Peng et al. [17] referred to the RPCA problem as robust alignment by sparse and low-rank (RASL) decomposition and gave the outer and inner loops of RASL to solve it. All these improved PCA methods exploit the correlations between different pixels very well and are no longer local methods like the earlier background subtraction algorithms.
In the general “low-rank + sparse” model (1) found in the literature, a data matrix whose columns are the pixels from each frame in the video sequence is decomposed into a low-rank matrix, which naturally corresponds to the stationary background, and a sparse matrix, which contains the moving objects in the foreground. The model is stated with three norms: the squared Frobenius norm is the sum of squares of the matrix elements; the nuclear norm is the sum of the singular values of the matrix; and the ℓ1 norm is the sum of the absolute values of the matrix elements [18, 19].
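The three norms named above can be computed directly from their definitions. The following is a minimal NumPy sketch; the function names are illustrative and do not come from the paper. Note that the Frobenius norm itself is the square root of the sum of squared entries, so the plain sum of squares corresponds to its square.

```python
import numpy as np

def frobenius_norm(A):
    """Square root of the sum of squared entries."""
    return float(np.sqrt(np.sum(np.asarray(A, dtype=float) ** 2)))

def nuclear_norm(A):
    """Sum of the singular values."""
    return float(np.linalg.svd(np.asarray(A, dtype=float), compute_uv=False).sum())

def l1_norm(A):
    """Sum of the absolute values of the entries."""
    return float(np.abs(np.asarray(A, dtype=float)).sum())

def spectral_norm(A):
    """Largest singular value."""
    return float(np.linalg.svd(np.asarray(A, dtype=float), compute_uv=False).max())

# Small example: a diagonal matrix with singular values 4 and 3.
A = np.array([[3.0, 0.0], [0.0, -4.0]])
norms = (frobenius_norm(A), nuclear_norm(A), l1_norm(A), spectral_norm(A))
```

For this diagonal example the nuclear norm and the ℓ1 norm coincide, which is not true for general matrices.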
To address this convex optimization problem, numerous approaches have been explored and proposed in the literature. However, in our view, the “sparse” assumption in this model may not always conform to reality. The moving foreground target may be a very small part of the full view, but it may also occupy a large proportion of the captured picture. Moreover, scholars usually add an extra term to (1) to constrain the correlation between the foreground and background matrices. Therefore, we try to improve the PCA method in another way and seek a new model that separates the background and foreground while also reflecting the correlation between them. In this paper, we propose a “low-rank + dual” model to solve the foreground and background modeling problem. We further use reweighted dual function norms instead of the plain norms to obtain a better and faster model, as shown in Algorithm 2. Although both of our models are offline methods, they provide the probability distribution function of the background while taking into account the spatial correlation between neighboring pixels. Experiments on several scenes show that our methods have better performance and lower consumption than the “low-rank + sparse” methods IALM and GoDec.
This paper is organized as follows. Section 2 describes the motivation for the rudimentary dual model and gives an algorithm for it. Section 3 presents an improved reweighted model and the corresponding algorithm. Section 4 reports the experiments with our methods and compares the results with IALM and GoDec. Section 5 gives a final summary.
2. Dual Norm Model
2.1. Functional with the Dual Norm Constraint
Inspired by Meyer's use of dual norms to regularize the cartoon and texture parts in the image decomposition problem [20], we try to exploit the advantages of such dual regularizations. We therefore put forward our rudimentary dual norm model, the “low-rank + dual” model, which improves on the general “low-rank + sparse” model, as shown in Algorithm 1. Here, we keep the low-rank hypothesis for the background, which is similar to Oliver’s concept of the eigenspace, and use the nuclear norm as the regularization that realizes this assumption. For the foreground part, however, we use a “dual” norm regularization instead of the “sparse” constraint to obtain the foreground that is least correlated with the background. To illustrate this point, we need to recall the definition of the dual norm [21].


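Definition 1. For any given norm $\|\cdot\|$ on an inner product space, there exists a dual norm $\|\cdot\|^{*}$ defined as $\|Y\|^{*} = \sup\{\langle X, Y\rangle : \|X\| \le 1\}$ (2). Furthermore, the dual norm of $\|\cdot\|^{*}$ is again the original norm $\|\cdot\|$.
From the definition, we obtain the following upper bound on the inner product: $\langle X, Y\rangle \le \|X\|\,\|Y\|^{*}$ (3). Then, since $ab \le \frac{1}{4}(a+b)^{2}$ for any real numbers $a$ and $b$, (3) further implies $\langle X, Y\rangle \le \frac{1}{4}\left(\|X\| + \|Y\|^{*}\right)^{2}$ (4).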
Thus, minimizing the sum of a norm of one variable and the dual norm of the other yields a smaller value of their inner product, which leads to the least correlated pair. Returning to foreground and background modeling in video sequences, we use the nuclear norm as the prior assumption on the background. If we then apply the dual norm of the nuclear norm to the foreground and minimize the sum of the two norms, we obtain the least correlated foreground and background from the known data matrix.
Theorem 2 (Proposition 2.1 in [22]). The dual norm of the nuclear norm is the operator norm, that is, the spectral norm.
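The duality between the nuclear norm and the spectral norm can be checked numerically. The sketch below, with illustrative function names, verifies the bound |⟨X, Y⟩| ≤ ‖X‖_nuclear · ‖Y‖_spectral on random matrices and builds a rank-one matrix of unit nuclear norm that attains the spectral norm of Y as an inner product.

```python
import numpy as np

def nuclear_norm(A):
    return float(np.linalg.svd(A, compute_uv=False).sum())

def spectral_norm(A):
    return float(np.linalg.norm(A, 2))  # largest singular value

def inner(X, Y):
    """Matrix inner product <X, Y> = trace(X^T Y)."""
    return float(np.sum(X * Y))

def dual_maximizer(Y):
    """Rank-one matrix u1 v1^T built from the leading singular pair of Y.

    It has nuclear norm 1 and satisfies <u1 v1^T, Y> = sigma_max(Y).
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return np.outer(U[:, 0], Vt[0])

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
Y = rng.standard_normal((6, 4))

bound_holds = abs(inner(X, Y)) <= nuclear_norm(X) * spectral_norm(Y) + 1e-9
X0 = dual_maximizer(Y)
attained = inner(X0, Y)  # equals spectral_norm(Y) up to rounding
```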
So, the minimization problem (1) can be improved into the following nuclear norm and spectral norm minimization problem (NSMP), referred to as (5), in which the spectral norm, the largest singular value of the matrix, regularizes the foreground.
This is our rudimentary model. The fidelity term keeps the result faithful to the original video data; among the regularization terms, the nuclear norm enforces the low rank of the background, and the spectral norm drives the extraction of the foreground.
2.2. Algorithm to the Dual Norm Model
Rather than minimizing (5) over both variables jointly, we note that its two subproblems, the nuclear norm minimization (NNM) problem and the spectral norm minimization (SNM) problem, are both convex. Thus, we solve the two subproblems alternately instead of solving (5) directly.
2.2.1. Nuclear Norm Minimization Problem
For a fixed foreground matrix, the NNM problem can be expressed as the subproblem (6). The solution to this NNM problem was given by Cai et al. in Theorem 2.1 of their work [23]: given the singular value decomposition (SVD) of the matrix to be approximated, the solution to (6) is obtained by applying the soft-threshold operator to its singular values (7) [24].
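Singular value soft-thresholding can be sketched in a few lines of NumPy. The sketch assumes the subproblem has the standard form min_X ½‖X − M‖_F² + τ‖X‖_*, where M and τ are placeholder names for the matrix being approximated and the threshold.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: shrink every singular value by tau, floor at 0."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    # Multiplying U by a 1-D array scales its columns, i.e. U @ diag(s_shrunk).
    return (U * s_shrunk) @ Vt

M = np.diag([5.0, 3.0, 1.0])
B = svt(M, 2.0)  # singular values 5, 3, 1 become 3, 1, 0
```

Singular values below the threshold are set to zero, which is how this step lowers the rank of the background estimate.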
2.2.2. Spectral Norm Minimization Problem
Similarly, for a fixed background matrix, the other subproblem can be expressed in the form (8).
In this section, we derive a singular value thresholding method for the spectral norm minimization problem using linear algebra. For convenience, we re-express the problem as (9). Given the SVD of the matrix to be approximated, we put forward the following theorem.
Theorem 3. The solution to the minimization problem (9) can be obtained by , where obeys denotes the vector obtained by the rearrangement of the diagonal elements of the matrix.
Here, we assume directly and set as any matrix of the same size with and let denote the diagonal of the matrix . For the reason that the Frobenius norm and the spectral norm are invariant under orthogonal transformation, (9) can be firstly converted into On account of the fact that and , (11) can be further converted into Finally, rewriting the above function into the vector form can get (10), which means the minimum of the solution reaches on the diagonal of the variable matrix.
Problem (10) was solved by Fadili and Peyre in Proposition 2 of their article [25]. To solve the problem, we introduce a sequence (13) built from the rearranged singular values of the matrix, which can be extended into (14) by means of the cumulated ordered values.
For and , the optimal solution to the problem (9) obeys where and is given by where satisfies .
Such a threshold keeps most of the small singular values unchanged while replacing the larger singular values with a constant, and this constant changes with the matrix. So, the solution to problem (8) is obtained by applying this threshold to the singular values in the SVD.
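The clipping behaviour described above can be reproduced with a short sketch. It assumes the subproblem has the form min_X ½‖X − M‖_F² + τ‖X‖_2 with τ > 0, and it computes the clipping level through the Moreau decomposition (the proximal map of the ℓ∞ norm equals the identity minus the projection onto an ℓ1 ball), using a standard sort-based projection rather than the exact sequence construction of the paper. All names are illustrative.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Project a nonnegative vector v onto {x >= 0 : sum(x) <= radius}, radius > 0."""
    if v.sum() <= radius:
        return v.copy()
    u = np.sort(v)[::-1]                    # values in decreasing order
    css = np.cumsum(u)                      # cumulated ordered values
    k = np.arange(1, u.size + 1)
    rho = k[u - (css - radius) / k > 0][-1]
    theta = (css[rho - 1] - radius) / rho   # shift applied by the projection
    return np.maximum(v - theta, 0.0)

def spectral_prox(M, tau):
    """Minimizer of 0.5 * ||X - M||_F^2 + tau * ||X||_2 for tau > 0."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_clipped = s - project_l1_ball(s, tau)  # equals min(s, theta)
    return (U * s_clipped) @ Vt

M = np.diag([5.0, 3.0, 1.0])
F = spectral_prox(M, 2.0)  # singular values 5, 3, 1 become 3, 3, 1
```

In the example the largest singular value is lowered to the constant 3 while the smaller ones are kept, and the total reduction equals τ = 2.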
2.2.3. Alternating Iterative Algorithm
Based on the aforementioned analysis, we present the alternating iterative method (AIM) for the minimization problem (5).
(i) Stopping Criterion. We suggest monitoring a convergence value and ending the iteration once it falls below a prescribed tolerance. The method also stops if the number of iterations exceeds a preset maximum.
(ii) First regularization parameter. We set its value for each video sequence from the sizes of the data matrix, multiplied by a small adjustable constant that ensures good performance.
(iii) Second regularization parameter. We set it in the same way, with its own small adjustable constant. Both parameters are thus scaled by a common factor, which keeps the two regularization terms on the same order of magnitude.
The choice of parameter values is very important, and two parameters can be adjusted in our algorithm. Because the best settings differ slightly between data sets, we can only give a range for these two parameters here. The parameters influence each other and thereby affect the running of the algorithm. Neither parameter can be too small; otherwise, the processing slows down and the image suffers great losses. The more complex the foreground of the frames, the larger both parameters need to be. A larger first parameter brings a cleaner background, and a larger second parameter yields a more complete extraction of the foreground figure. However, the parameters cannot be made arbitrarily large: too large a first parameter leads to a failure to extract the background, and too large a second parameter pulls background into the foreground. The exact theoretical basis for parameter selection will be studied in detail in follow-up work.
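The alternating scheme can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the objective is taken to be ½‖D − B − F‖_F² + λ‖B‖_* + μ‖F‖_2 with λ, μ > 0, the stopping rule is a relative-change test chosen here for illustration, and every name (D, B, F, lam, mu, alternate) is hypothetical.

```python
import numpy as np

def svt(M, tau):
    """Nuclear norm step: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def spectral_prox(M, tau):
    """Spectral norm step: clip the singular values at a data-dependent level."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    if s.sum() <= tau:
        return np.zeros_like(M)
    u = np.sort(s)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, u.size + 1)
    rho = k[u - (css - tau) / k > 0][-1]
    theta = (css[rho - 1] - tau) / rho
    return (U * np.minimum(s, theta)) @ Vt

def objective(D, B, F, lam, mu):
    fit = 0.5 * np.linalg.norm(D - B - F, "fro") ** 2
    return fit + lam * np.linalg.svd(B, compute_uv=False).sum() + mu * np.linalg.norm(F, 2)

def alternate(D, lam, mu, tol=1e-6, max_iter=100):
    """Alternate the two closed-form steps until the iterates stop changing."""
    B = np.zeros_like(D)
    F = np.zeros_like(D)
    history = []
    scale = max(np.linalg.norm(D), 1e-12)
    for _ in range(max_iter):
        B_new = svt(D - F, lam)               # background update, foreground fixed
        F_new = spectral_prox(D - B_new, mu)  # foreground update, background fixed
        change = (np.linalg.norm(B_new - B) + np.linalg.norm(F_new - F)) / scale
        B, F = B_new, F_new
        history.append(objective(D, B, F, lam, mu))
        if change < tol:                      # assumed stopping rule
            break
    return B, F, history

rng = np.random.default_rng(1)
D = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))
D += 0.1 * rng.standard_normal((20, 15))
B, F, history = alternate(D, lam=1.0, mu=0.5)
```

Because each step minimizes the assumed objective exactly over one variable, the recorded objective values never increase from one iteration to the next.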
3. Improved Dual Function Model
3.1. Model Creation
In the previous section, we put forward the preliminary dual model NSMP. Next, we introduce an improved dual model, the weighted function nuclear norm and spectral norm minimization model. This “low-rank + dual” model applies the weighted function nuclear norm, instead of the plain nuclear norm, to the low-rank background and then uses the dual representation of this weighted function nuclear norm to regularize the foreground. The reason for this improvement is that the well-known nuclear norm regularized problem is the convex relaxation of the rank minimization problem, but it is not a perfect approximation of the rank function. Although the minimization problem with the weighted function nuclear norm is nonconvex, it fortunately has a closed-form solution owing to the special choice of the weights, and it is also a better approximation to the rank function. Thus, we next introduce the weighted function and optimize our model further. In this way, we improve (5) into the weighted function nuclear norm and spectral norm minimization problem (WNSMP), referred to as (19), in which the weighted function places the weights directly on the singular values of the matrix: for any matrix, the weighted function nuclear norm is defined as the weighted sum of its singular values (20).
If we again arrange the singular values of the matrix into a vector, the weighted function nuclear norm is exactly the weighted ℓ1 norm of that vector. Taking the dual of this weighted vector norm gives the weighted function spectral norm. Thus, the weighted function nuclear norm and the weighted function spectral norm are dual to each other under the same weights.
3.2. Algorithm to the Dual Function Model
3.2.1. Weighted Function Nuclear Norm Minimization Problem
Here, we again use the alternating iterative algorithm to solve problem (19). Fixing the foreground matrix gives the first subproblem (24).
For convenience of discussion, we substitute (20) into the functional and rewrite (24) as (25). Here, we impose nondecreasing weights to ensure a global solution to (25), although the weighted function is then no longer a matrix norm. In our paper, the adaptive weights are determined from an adjustable constant that ensures the range of the weights includes 1, the number of frames in the sample image sequence, the singular values of the low-rank matrix, and a small constant that avoids a zero initial value.
Theorem 4 (Theorem 2.3 in [26]). For any , , and , a global optimal solution to the optimization problem (25) is given by the adaptive SVD softthresholding (ASVT) operator , where , and
Thus, the solution to the problem (24) is .
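A weighted variant of singular value thresholding can be sketched as below. The sketch assumes the subproblem has the form min_X ½‖X − M‖_F² + τ Σᵢ wᵢ σᵢ(X) with nondecreasing weights, so that the i-th singular value is shrunk by τ·wᵢ; the weights are passed in by the caller because the paper's weight formula is not reproduced here, and all names are illustrative.

```python
import numpy as np

def weighted_svt(M, tau, weights):
    """Shrink the i-th largest singular value by tau * weights[i], floor at 0.

    The weights must be nondecreasing, so that larger singular values are
    penalized less than smaller ones.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    w = np.asarray(weights, dtype=float)
    if w.shape != s.shape:
        raise ValueError("need one weight per singular value")
    if np.any(np.diff(w) < 0):
        raise ValueError("weights must be nondecreasing")
    return (U * np.maximum(s - tau * w, 0.0)) @ Vt

M = np.diag([5.0, 3.0, 1.0])
B = weighted_svt(M, 1.0, [0.5, 1.0, 2.0])  # singular values become 4.5, 2, 0
```

Compared with the unweighted operator, the leading singular value is reduced less and the trailing one is removed more aggressively, which matches the motivation of approximating the rank function more closely.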
3.2.2. Weighted Function Spectral Norm Minimization Problem
Fixing the matrix , we can express WNSMP by the following subproblem: For any , the solution to the problem (28) is , and, for any matrix, has where .
3.2.3. Alternating Iterative Algorithm
To allow the weights to be updated in each iteration, we associate the weights used in an iteration with the low-rank matrix of that iteration. The difficulty is that this matrix is not yet available when the weights are needed, so we estimate it from the matrices that are already available. We always rearrange the resulting values to ensure that the weights are nondecreasing. The stopping criterion and parameter settings are the same as in NSMP.
4. Experimental Results
In this section, we present the numerical experiments corroborating our main results. We run IALM, GoDec, NSMP, and WNSMP on a wide variety of surveillance video sequences covering five different scenarios: lobby, restaurant, airport, shopping mall, and campus. All the datasets can be downloaded from http://perception.i2r.astar.edu.sg/bk_model/bk_index.html. For each scenario, we stack 200 frames as the columns of one data matrix and use the four algorithms to decompose the data matrix into background and foreground. The selected frames from the five video sequences are shown in Figure 1. The foreground in Figure 1(a) is just a single person, while in Figures 1(b), 1(c), and 1(d) many people are in motion; in Figure 1(e), the moving foreground also includes objects.
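Arranging frames as the columns of a data matrix is a simple reshape. A minimal sketch, assuming grayscale frames of equal size held in one array; the function names are illustrative:

```python
import numpy as np

def stack_frames(frames):
    """Turn an array of shape (n_frames, height, width) into a (height*width, n_frames) matrix."""
    frames = np.asarray(frames, dtype=float)
    n_frames = frames.shape[0]
    return frames.reshape(n_frames, -1).T

def unstack_column(D, j, shape):
    """Recover frame j (for example a background or foreground image) from column j."""
    return D[:, j].reshape(shape)

# Two tiny 3x4 "frames" stand in for the 200 video frames used in the experiments.
frames = np.arange(24, dtype=float).reshape(2, 3, 4)
D = stack_frames(frames)
frame1 = unstack_column(D, 1, (3, 4))
```

The same `unstack_column` step turns a column of the recovered background or foreground matrix back into an image for display.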
Figure 2 shows the segmentation results for the case in which only one person is moving in the video. The difference in performance is inconspicuous, and all four methods do well.
In simple situations, the advantage of our algorithms is not very obvious, but in relatively complex cases their superiority becomes apparent. The backgrounds produced by IALM in Figures 3(a), 4(a), and 5(a) always contain incompletely removed moving shadows. The artifact problem in both the foreground and the background of GoDec is worse and shows up in three ways: the backgrounds contain unnatural artifacts, as shown in Figure 4(b); incompletely removed moving shadows remain in the processed background, as shown in Figure 5(b), just as for IALM in Figure 5(a); and, even worse, interference from other frames causes unexpected shadows to appear in the result for the current frame, as shown in Figure 5(f). In all these cases, our algorithms perform stably, with clear backgrounds and complete foregrounds.
Figure 6 shows a far more complex situation. Although the improvement is smaller than expected, our methods still give relatively good results. The backgrounds produced by our methods are always very clear, while GoDec shows slight artifacts and IALM is even worse.
From the images in Figures 2–6, we can conclude that our model obtains a better and clearer low-rank background. To show quantitatively that our model is good enough to model the background, we display the precision-recall (PR) curves and the receiver operating characteristic (ROC) curves in Figure 7, which evaluate the binary decision problem of separating the foreground from the background. Furthermore, we list the rank of the final background model for each of the four methods in Table 1.
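Points on PR and ROC curves come from thresholding a per-pixel foreground score against a ground-truth mask. The sketch below is a generic illustration; the choice of score (for example, the absolute value of the recovered foreground) and the function name are assumptions, not details taken from the experiments.

```python
import numpy as np

def pr_roc_points(scores, truth, thresholds):
    """Return (threshold, precision, recall, false positive rate) for each threshold."""
    scores = np.asarray(scores, dtype=float).ravel()
    truth = np.asarray(truth, dtype=bool).ravel()
    points = []
    for t in thresholds:
        pred = scores >= t                      # pixels declared foreground
        tp = int(np.sum(pred & truth))
        fp = int(np.sum(pred & ~truth))
        fn = int(np.sum(~pred & truth))
        tn = int(np.sum(~pred & ~truth))
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 0.0   # also the true positive rate
        fpr = fp / (fp + tn) if fp + tn else 0.0
        points.append((float(t), precision, recall, fpr))
    return points

# Four pixels: two true foreground pixels, one of which receives a low score.
points = pr_roc_points([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0], thresholds=[0.0, 0.5])
```

Plotting precision against recall gives the PR curve, and plotting recall against the false positive rate gives the ROC curve.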
In the noiseless situation, low rank is the distinguishing characteristic of the background in a video sequence, and a small correlation separates the non-low-rank foreground from the background. Thus, our “low-rank + dual” model not only keeps the background but also extracts the foreground part of the video. Additionally, the correlation between the foreground and background images is given in Table 1. We measure this correlation column by column: each pair of columns, taken from the foreground and background matrices, holds the foreground and background images of the same frame reshaped into column vectors.
From the visual and numerical results, it is easy to identify the advantages and disadvantages of each method. IALM is relatively fast, but it performs a blind separation of the low-rank and sparse data. Therefore, once the foreground becomes more complex, the detected background is visibly corrupted by artifacts. More importantly, the rank of the resulting low-rank approximation matrix generally reaches around 100, so in some sense it can no longer be called a low-rank matrix. GoDec has two adjustable parameters, “rank” and “card.” Presetting “rank” guarantees that the approximation matrix is low-rank, but because the rank is fixed in advance, it cannot adapt to the scene. The correct selection of “card” is also very important and has a great influence on the results. The larger the value of “card,” the longer the algorithm runs and the messier the background becomes; the smaller the value of “card,” the lower the time consumption, but the foreground loses information. Both of our methods adaptively choose an appropriate low rank and ensure that the least correlated foreground and background are separated. The parameter settings of our methods are stated in Section 2, and the parameters of IALM and GoDec are set as in [15, 16].
5. Discussion and Future Work
This paper proposes a “low-rank + dual” model to decompose each frame of a video surveillance sequence into a static background and a moving foreground. Our model decomposes the stacked data matrix into two parts, a low-rank background and a moving foreground, and we further upgrade this model into a reweighted function norm form. The experiments show that our two models not only achieve clearer backgrounds but also have lower consumption and lower correlation. In all the tested cases, the “low-rank + dual” model extracted a clear and complete background, which makes it a better choice for background modeling. However, since the focus of this paper is to propose the new model and show its feasibility, the algorithm itself has not been optimized and still relies on the singular value decomposition to obtain the solution. Consequently, the computational and space complexities of our algorithm are still those of the SVD. Using a fast SVD method or other effective techniques to speed up our algorithms will be our next step. Further study of the choice of the optimal parameters and weights is also left for follow-up work.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors thank Zhouchen Lin and Tianyi Zhou for their respective open source codes. The authors would also like to thank the anonymous reviewers for their kind comments. This paper is supported by the National Natural Science Foundation of China (nos. 61271294, 61105011, and 61362029).