Abstract
Motion estimation techniques are widely used in today's video processing systems. The frequently used techniques are frequency-domain motion estimation methods, most notably phase correlation (PC). If the image frames are corrupted by Gaussian noises, then cross-correlation and related techniques do not work well. In this paper, however, we have studied this topic from a viewpoint different from the above. Our scheme is based on the bispectrum method for sub-pixel motion estimation of noisy image sequences. Experimental results show that our proposed method performs significantly better than PC technique.
1. Introduction
Image frames are generated by scanning a scene several times a second where each frame, generally, consists of two regions. One region, referred to as stationary background, is virtually the same as the previous frame. The other region, referred to as moving data, has moved with respect to the previous frame. Estimating the motion between image frames has long been a problem of interest in areas such as video compression, robot vision, and biomedical engineering [1].
Many motion estimation schemes have been developed. They can be classified into spatial-domain and frequency-domain approaches. The spatial-domain algorithms consist of matching algorithms and gradient-based algorithms. The frequency domain algorithms consist of phase correlation algorithms, wavelet transform-based algorithms, and DCT-based algorithms [2]. The vast majority of these algorithms consider noise-free data, although in [3] the displacement vector is estimated from noisy data using the generalized maximum likelihood criterion. If the image frames are corrupted by Gaussian noises, then cross-correlation and related techniques do not work well. In this circumstance, higher-order spectra (HOS) in general and the bispectrum in particular have recently been widely used as an important tool for signal processing. The classical methods based on the power spectrum are now being effectively superseded by the bispectral ones due to some definite disadvantages of the former. These include the inability to identify systems fed by non-Gaussian noise (NGN) inputs and nonminimum phase (NMP) systems, and by identification of system nonlinearity. In these cases, the autocorrelation-based methods offer no answer. Out of all these, the identifiability of NMP systems has received the maximum attention from researchers.
HOS-based methods have already been proposed to estimate motion between image frames [4β9]. In [6], the displacement vector is obtained by maximizing a third-order statistics criterion. In [8], the global motion parameters obtained by a new region recursive algorithm. In [4, 5], several algorithms are developed based on a parametric cumulant method, a cumulant-matching method, and a mean-kurtosis error criterion. In this correspondence, a novel algorithm for the detection of motion vectors in video sequences is proposed. The algorithm uses bispectrum method for subpixel motion estimation of noisy image sequences to obtain a measure of content similarity for temporally adjacent frames and responds very well to scene motion vectors.
2. Bispectrum-Based Image Motion Estimation
The problem of motion estimation can be stated as follows: βgiven an image sequence, compute a representation of the motion field that best aligns pixels in one frame of the sequence with those in the nextβ [9]. This is formulated as where ; denotes spatial image position of a point; and are observed image intensities at instant and , respectively; and are the noise-free frames; and are assumed to be spatially and temporally stationary, zero-mean image Gaussian noise sequences with unknown covariance; is the displacement vector of the object during the time interval .
The third-order autocumulants and cross-cumulants of a zero-mean from 2D random field are defined, respectively, as follows:where represents the expectation operator.
The discrete bispectrum of the frame is defined as follows: where denotes the Fourier transform operation; corresponds to the DFT of the frame ; indicates the complex conjugate; and are the frequency co-ordinates for the 2D Fourier transform.
Due to the shift-invariance of the bispectrumwhere and denote bispectrum of the frame and noise, respectively. If the probability density function of the noise is symmetrical, that is, , or at least not skewed, that is, , then the term is negligible which renders the triple-correlation very effective in detecting a signal embedded in noise [10]. ThenThe cross-bispectrum is defined as follows:Then,Using the relation in (5), we can transform (7) as
As we can see from (3), the bispectrum has two vector arguments containing totally four scalar frequency variables. Assuming that is an N-by-N discrete Fourier transform of , the bispectrum becomes a four-dimensional N-by-N-by-N-by-N matrix. It is therefore not practical to evaluate the whole bispectrum. A better solution is to take 2D slices of the 4D spectrum. There are basically various ways of defining these slices, but we will only consider the case where
Although we have now taken only a small portion of the whole spectrum [11], it can be shown that the motion vector is still possible and no essential information has been lost.
Thus, the third-order hologram, , is defined by
As a result, by finding the location of the pulse in (10) we are able to tell the displacement, which is the motion vector. Since third-order statistics are used, the method is insensitive (in theory) to both spatially and temporally corrupted by noise which is symmetrically distributed (e.g., Gaussian). In practice, the motion vector is not an impulse; hence, we estimate as the index , which maximizes
The co-ordinates of the maximum of the real-valued array can be used as an estimate of the horizontal and vertical components of motion between and as follows:
3. Subpixel Accuracy
Subpixel performance is a critical element of the proposed algorithm. With reference to our previously published work [12, 13], we are introducing a number of important new features, which improve the accuracy of the motion estimates.
Subpixel accuracy of motion measurements is obtained dy variable-separable fitting performed in the neighborhood of the maximum using one-dimensional quadratic function. Using the notation in (11) above, prototype functions are fitted to the tripletsthat is, the maximum peak of the phase correlation surface and its two neighboring values on either side, vertically and horizontally.
The location of the maximum of the fitted function provides the required subpixel motion estimate . Fitting a parabolic function horizontally to the data triplet (12) yields a closed-from solution for the horizontal component of the motion estimate as follows:where
The fractional part of the vertical component can be obtained in a similar way using (13) instead of (12).
Finally, the horizontal and vertical components of the subpixel accurate motion estimate are obtained by computing the location of the maxima of each of the above fitted quadratics.
In [14], it is shown that half-pixel accuracy motion vectors leads to a very significant improvement when compared to one-pixel accuracy, where as a higher precision results in negligible changes. Therefore, a half-pixel accuracy was chosen in our simulations.
4. Experimental Results
To prove the feasibility of the proposed method, we compared it to a PC technique implemented in a similar manner as our approach. In this section, we examine a few examples and compare the performance, efficiency, and complexity of the two methods. In our experiments, we used the well-known test sequences: foreman (176 pixels by 144 lines), mother-daughter and Stefan (352 pixels by 288 lines), table tennis (352 pixels by 240 lines). Although the original sequences are in color, only the luminance (brightness) component is used to estimate the motion vectors.
To assess the performances of the different motion estimation techniques, the following comparisons were made. First, the subjective quality of the estimated motion field was evaluated, showing the capability of the algorithm to estimate the true motion in the scene. Second, the PSNR of motion compensated was measured, giving insight about the quality of the prediction. Results obtained using the foreman, mother-daughter and Stefan sequences are shown in Figure 1. All image sequences are degraded with additive zero-mean Gaussian noise to a signal-to-noise ratio (SNR) of 10βdB. Our results demonstrate that the proposed method outperforms PC, achieving higher precision and a significantly smaller corresponding measurement error. This confirms the motion that the proposed technique of an image is a superior feature selector utilizing the portions of the image spectrum most likely to contribute to reliable motion estimation.
The ability of the bispectrum method to accurately estimate the displacement vector field from a degraded sequence is demonstrated in Figure 2. In this Figure, the estimated motion vector fields for the mother-daughter sequence using the two aforementioned motion estimation methods. The motion vectors estimated between the frames 126 and 127 are shown for the mother-daughter sequence. We can see that the estimates from the PC seem very random, but the bispectrum technique gives better results, producing the same motion vectors. Thus, the motion fields estimated by the βour approachβ tend to be very smooth due to the smoothness constraint. Because of the noise-resistant property of the bispectrum, it produces more reliable estimates. Therefore, the proposed method motion estimation results globally in motion fields more representative of the true motion in the scene.
(a) PC
(b) Bispectrum
In terms of motion compensated images, from mother-daughter sequence, we observe better compensated images by the proposed method. We also observe that the motion compensated images for the βour methodβ are much closer to the original images. Thus, the βour schemeβ is able to measure the motion vector more accurately and is more robust in general. Overall, the bispectrum typically offers better visual quality images than the PC method. Figure 3 gives examples of motion compensated images.
(a) Original image
(b) PC
(c) Bispectrum
Comparisons of the PC and bispectrum methods indicate that the bispectrum is a robust technique for motion vector. Results of these comparisons are shown for different noise levels and video sequences. Additive Gaussian noise (AGN) was added to image sequences with an input SNR varying from 15βdB to 28βdB. Figure 4 shows the PSNR of the motion compensated prediction error for noise power of 15βdB for table tennis sequence, confirming that our scheme is consistently more immune to noise. Experiments with other levels of noise power demonstrated that similar performance gains are achievable. Similar experiments were performed for the other sequences. The results further confirm that the βour methodβ consistently outperforms PC. In terms of complexity, this is measured by the computation time. All the computations are performed on Intel(R) Pentium(R)D CPU 3.4βGHz with Windows XP. The two algorithms have been implemented using a prototype written in MATLAB 6.5 R13. The comparison between the βour methodβ and the PC confirmed that the two methods have the complexity on the same order. This is shown in Table 1.
5. Conclusion
In this paper, the bispectrum method for subpixel motion estimation of noisy image sequences in frequency-domain was presented. The βourβ proposed method provides an advantage over the PC algorithm in the presence of AGN. With βour method,β the displacement vector field is smoother, providing a more accurate measure of object motion. At relatively low noise levels, the bispectrum performance is comparable to its performance in the noise-free environment. At high noise levels SNR around 10βdB, the PC fails, yet even under these extreme conditions, the bispectrum provides improvement in performance over the PC algorithm. In addition to its PSNR performance, the bispectrum also yields smooth motion fields.