Motion estimation techniques are widely used in today's video processing systems. Among the most frequently used are frequency-domain methods, most notably phase correlation (PC). If the image frames are corrupted by Gaussian noise, however, cross-correlation and related techniques do not work well. In this paper, we approach the problem from a different viewpoint: our scheme uses the bispectrum for subpixel motion estimation of noisy image sequences. Experimental results show that the proposed method performs significantly better than the PC technique.
1. Introduction
Image frames are generated by scanning a scene several times a second where each frame, generally, consists of two regions. One region, referred to as stationary
background, is virtually the same as the previous frame. The other region,
referred to as moving data, has moved with respect to the previous frame.
Estimating the motion between image frames has long been a problem of interest
in areas such as video compression, robot vision, and biomedical engineering
[1].
Many motion estimation schemes have been developed.
They can be classified into spatial-domain and frequency-domain approaches. The
spatial-domain algorithms consist of matching
algorithms and gradient-based algorithms. The frequency-domain algorithms
consist of phase correlation algorithms, wavelet transform-based algorithms,
and DCT-based algorithms [2]. The vast majority of these
algorithms consider noise-free data, although in [3] the displacement vector is
estimated from noisy data using the generalized maximum likelihood criterion.
If the image frames are corrupted by Gaussian noise, then cross-correlation
and related techniques do not work well. In this circumstance, higher-order
spectra (HOS) in general, and the bispectrum in particular,
have recently been widely used as an important tool
for signal processing. The classical methods based on the power spectrum are
now being superseded by bispectral ones because of some definite
disadvantages of the former. These include the inability to identify systems
fed by non-Gaussian noise (NGN) inputs, to identify nonminimum phase (NMP) systems, and
to identify system nonlinearity. In these cases, the autocorrelation-based
methods offer no answer. Of these, the identifiability of NMP systems
has received the most attention from researchers.
HOS-based methods have already been proposed to
estimate motion between image frames [4–9]. In [6], the displacement vector is
obtained by maximizing a third-order statistics criterion. In [8], the global motion
parameters are obtained by a new region recursive algorithm. In [4, 5], several algorithms are
developed based on a parametric cumulant method, a cumulant-matching method,
and a mean-kurtosis error criterion. In this paper, a novel algorithm
for the detection of motion vectors in video sequences is proposed. The algorithm
uses the bispectrum method for subpixel motion estimation of noisy image sequences
to obtain a measure of content similarity for temporally adjacent frames, and it
responds very well to scene motion.
2. Bispectrum-Based Image Motion Estimation
The problem of motion estimation can be stated as follows: “given an image sequence, compute a representation of the motion field that best aligns pixels in one frame of the sequence with those in the next” [9]. This is formulated as
where ; denotes spatial
image position of a point; and are observed
image intensities at instant and , respectively; and are the
noise-free frames; and are assumed to
be spatially and temporally stationary, zero-mean image Gaussian noise
sequences with unknown covariance; is the
displacement vector of the object during the time interval .
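The model described above can be written compactly as follows. This is a sketch in assumed notation, not necessarily the authors' symbols: $g_{k-1}, g_k$ stand for the observed frames, $f_{k-1}, f_k$ for the noise-free frames, $n_{k-1}, n_k$ for the noise fields, $\mathbf{x}$ for the spatial position, and $\mathbf{d}$ for the displacement vector.

```latex
\[
\begin{aligned}
g_{k-1}(\mathbf{x}) &= f_{k-1}(\mathbf{x}) + n_{k-1}(\mathbf{x}), \\
g_{k}(\mathbf{x})   &= f_{k}(\mathbf{x}) + n_{k}(\mathbf{x})
                     = f_{k-1}(\mathbf{x} - \mathbf{d}) + n_{k}(\mathbf{x}).
\end{aligned}
\]
```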
The third-order autocumulants and cross-cumulants of a
zero-mean from 2D random field are defined,
respectively, as follows:
where represents the expectation operator.
The discrete bispectrum of the frame is defined as
follows:
where denotes the Fourier transform operation; corresponds to the
DFT of the frame ; indicates the
complex conjugate; and are the
frequency co-ordinates for the 2D Fourier transform.
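For orientation, one common way of writing these quantities is sketched below. The notation is assumed for illustration: $g_1$ and $g_2$ denote two zero-mean real-valued frames, $\boldsymbol{\tau}$ and $\boldsymbol{\rho}$ are spatial lags, $G_1$ is the DFT of $g_1$, and $\mathbf{u}, \mathbf{v}$ are 2D frequency coordinates. The ordering of the arguments in the cross-cumulant is one possible convention among several.

```latex
\[
\begin{aligned}
c_{111}(\boldsymbol{\tau}, \boldsymbol{\rho})
  &= E\{\, g_1(\mathbf{x})\, g_1(\mathbf{x} + \boldsymbol{\tau})\, g_1(\mathbf{x} + \boldsymbol{\rho}) \,\}, \\
c_{112}(\boldsymbol{\tau}, \boldsymbol{\rho})
  &= E\{\, g_1(\mathbf{x})\, g_1(\mathbf{x} + \boldsymbol{\tau})\, g_2(\mathbf{x} + \boldsymbol{\rho}) \,\}, \\
B_{111}(\mathbf{u}, \mathbf{v})
  &= G_1(\mathbf{u})\, G_1(\mathbf{v})\, G_1^{*}(\mathbf{u} + \mathbf{v}).
\end{aligned}
\]
```

The last line is the usual single-realization (direct) estimate: it follows from taking the 2D Fourier transform of the first line over both lags when the expectation is replaced by a sum over $\mathbf{x}$ and $g_1$ is real.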
Due to the shift-invariance of the
bispectrum
where and denote the
bispectra of the frame and the noise,
respectively. If the probability density function of the noise is symmetrical,
that is, , or at least not skewed, that is, , then the term is negligible,
which renders the triple correlation very effective in detecting a signal
embedded in noise [10]. Then
The cross-bispectrum is defined
as follows:
Then,
Using the relation in (5), we
can transform (7) as
As we can see from (3), the bispectrum has two vector
arguments containing four scalar frequency variables in total. Assuming that is an N-by-N
discrete Fourier transform of , the bispectrum becomes a four-dimensional
N-by-N-by-N-by-N array. It is therefore not practical to evaluate the whole
bispectrum. A better solution is to take 2D slices of the 4D spectrum. There
are various ways of defining these slices, but we will only consider
the case where
Although we have now taken only a small portion of the
whole spectrum [11],
it can be shown that estimating the motion vector is still possible and that no essential
information has been lost.
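The reason a slice suffices can be illustrated with a short derivation. It is a sketch under stated assumptions: noise-free frames, a cyclic displacement $\mathbf{d}$ on an N-by-N grid, $G_1$ and $G_2$ denoting the DFTs of the previous and current frames, and the second frame placed in the $\mathbf{v}$ slot of the cross-bispectrum. All symbol names here are illustrative choices.

```latex
\[
\begin{aligned}
G_2(\mathbf{v}) &= G_1(\mathbf{v})\, e^{-j 2\pi\, \mathbf{v} \cdot \mathbf{d} / N}, \\
B_{121}(\mathbf{u}, \mathbf{v})
  &= G_1(\mathbf{u})\, G_2(\mathbf{v})\, G_1^{*}(\mathbf{u} + \mathbf{v})
   = B_{111}(\mathbf{u}, \mathbf{v})\, e^{-j 2\pi\, \mathbf{v} \cdot \mathbf{d} / N}, \\
\frac{B_{121}(\mathbf{u}, \mathbf{v})\, B_{111}^{*}(\mathbf{u}, \mathbf{v})}
     {\left| B_{121}(\mathbf{u}, \mathbf{v})\, B_{111}^{*}(\mathbf{u}, \mathbf{v}) \right|}
  &= e^{-j 2\pi\, \mathbf{v} \cdot \mathbf{d} / N}
  \;\xrightarrow{\;\text{inverse DFT in } \mathbf{v}\;}\;
  \delta(\mathbf{x} - \mathbf{d}).
\end{aligned}
\]
```

The phase factor depends only on $\mathbf{v}$, so tying $\mathbf{u}$ to $\mathbf{v}$ by any fixed rule (for example $\mathbf{u} = \mathbf{v}$, used here purely as an example) leaves it unchanged wherever $B_{111}$ is nonzero on that slice.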
Thus, the third-order hologram, , is defined by
As a result, by finding the location of the pulse in
(10) we obtain the displacement, which is the motion vector. Since
third-order statistics are used, the method is, in theory, insensitive to
noise, whether spatial or temporal, that is symmetrically distributed
(e.g., Gaussian). In practice, the hologram is not an ideal impulse; hence, we
estimate as the index , which maximizes
The co-ordinates of the maximum
of the real-valued array can be used as
an estimate of the horizontal and vertical components of motion between and as
follows:
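The procedure of this section can be sketched in NumPy as shown below. This is a minimal illustration under several assumptions, not the authors' implementation: the function name `bispectrum_slice_shift` is hypothetical, the diagonal slice (first frequency argument equal to the second) is chosen only as an example, the bispectra are single-realization estimates with no averaging to approximate the expectation, and the displacement is treated as a cyclic integer shift.

```python
import numpy as np

def bispectrum_slice_shift(frame_prev, frame_next, eps=1e-12):
    """Integer (rows, cols) shift from a normalized cross-bispectrum slice.

    Assumed slice: u = v, so B(v, v) = G(v) G(v) conj(G(2v)).
    """
    g1 = np.asarray(frame_prev, dtype=float)
    g2 = np.asarray(frame_next, dtype=float)
    # Third-order statistics are defined for zero-mean fields.
    g1 = g1 - g1.mean()
    g2 = g2 - g2.mean()

    G1 = np.fft.fft2(g1)
    G2 = np.fft.fft2(g2)

    # G1 evaluated at frequency 2v (indices wrap modulo the DFT size).
    rows = (2 * np.arange(G1.shape[0])) % G1.shape[0]
    cols = (2 * np.arange(G1.shape[1])) % G1.shape[1]
    G1_2v = G1[np.ix_(rows, cols)]

    auto = G1 * G1 * np.conj(G1_2v)    # auto-bispectrum slice of frame 1
    cross = G1 * G2 * np.conj(G1_2v)   # cross-bispectrum slice, frame 2 in slot v

    # Keep only the phase difference, then return to the spatial domain.
    num = cross * np.conj(auto)
    surface = np.real(np.fft.ifft2(num / (np.abs(num) + eps)))

    # The peak location is the shift; map indices above N/2 to negative shifts.
    peak = np.unravel_index(np.argmax(surface), surface.shape)
    shift = tuple(int(p) if p <= n // 2 else int(p) - n
                  for p, n in zip(peak, surface.shape))
    return shift, surface
```

For a noise-free frame and a cyclically shifted copy, for instance `np.roll(frame, (3, -5), axis=(0, 1))`, the returned shift is `(3, -5)`. The returned surface can be passed on to a subpixel refinement step.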
3. Subpixel Accuracy
Subpixel performance is a critical element of the proposed algorithm. Building on our previously published work [12, 13], we introduce a number of important new
features that improve the accuracy of the motion estimates.
Subpixel accuracy of motion measurements is obtained
by variable-separable fitting performed in the neighborhood of the maximum
using a one-dimensional quadratic function. Using the notation in (11) above,
prototype functions are fitted to the
triplets
that is, the maximum peak of the
phase correlation surface and its two neighboring values on either side,
vertically and horizontally.
The location of the maximum of the fitted function
provides the required subpixel motion estimate . Fitting a parabolic function horizontally to the
data triplet (12) yields a closed-form solution for the horizontal component of
the motion estimate as
follows:
where
The fractional part of the vertical
component can be obtained in a similar way using (13) instead of (12).
Finally, the horizontal and vertical components of the
subpixel accurate motion estimate are obtained by computing the location of the
maxima of each of the above fitted quadratics.
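The separable quadratic fit described above can be sketched as follows. The closed form used is the standard vertex of a parabola through three equally spaced samples; whether it matches the authors' expression term for term is an assumption. The names `parabolic_offset`, `subpixel_peak`, and `to_half_pixel` are hypothetical, and neighbors are taken with wraparound at the array borders.

```python
import numpy as np

def parabolic_offset(c_minus, c_zero, c_plus):
    """Vertex offset, in samples, of the parabola through (-1, 0, +1)."""
    denom = c_minus - 2.0 * c_zero + c_plus
    if denom == 0.0:
        return 0.0  # flat triplet: no refinement possible
    return 0.5 * (c_minus - c_plus) / denom

def subpixel_peak(surface):
    """Peak location (row, col) refined independently along each axis."""
    surface = np.asarray(surface, dtype=float)
    n_rows, n_cols = surface.shape
    r, c = np.unravel_index(np.argmax(surface), surface.shape)
    # Vertical triplet: the peak and its neighbors above and below.
    dr = parabolic_offset(surface[(r - 1) % n_rows, c],
                          surface[r, c],
                          surface[(r + 1) % n_rows, c])
    # Horizontal triplet: the peak and its neighbors left and right.
    dc = parabolic_offset(surface[r, (c - 1) % n_cols],
                          surface[r, c],
                          surface[r, (c + 1) % n_cols])
    return float(r + dr), float(c + dc)

def to_half_pixel(value):
    """Quantize an estimate to the nearest half pixel."""
    return float(np.round(2.0 * value) / 2.0)
```

On a surface that is exactly quadratic around its maximum, the fit recovers the true peak position; on a real correlation surface it is an approximation whose quality depends on how parabolic the peak is.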
In [14], it is shown that half-pixel accuracy motion vectors
lead to a very significant improvement when compared to one-pixel accuracy,
whereas a higher precision results in negligible changes. Therefore,
half-pixel accuracy was chosen in our simulations.
4. Experimental Results
To demonstrate the feasibility of the proposed method, we compared it to a PC technique
implemented in a manner similar to our approach. In this section, we examine a
few examples and compare the performance, efficiency, and complexity of the two
methods. In our experiments, we used the well-known test sequences foreman
(176 pixels by 144 lines), mother-daughter and Stefan (352 pixels by 288
lines), and table tennis (352 pixels by 240 lines). Although the original sequences
are in color, only the luminance (brightness) component is used to estimate the
motion vectors.
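For reference, the PC baseline in its textbook form can be sketched as below. The details of the implementation used in the experiments are not given, so this is only an illustration under assumptions: the function name `phase_correlation_shift` is hypothetical, no windowing or block partitioning is applied, and the displacement is treated as a cyclic integer shift.

```python
import numpy as np

def phase_correlation_shift(frame_prev, frame_next, eps=1e-12):
    """Integer (rows, cols) shift from the normalized cross-power spectrum."""
    G1 = np.fft.fft2(np.asarray(frame_prev, dtype=float))
    G2 = np.fft.fft2(np.asarray(frame_next, dtype=float))

    # Normalizing the cross-power spectrum keeps only the phase difference.
    cross_power = G2 * np.conj(G1)
    surface = np.real(np.fft.ifft2(cross_power / (np.abs(cross_power) + eps)))

    # The peak location is the shift; map indices above N/2 to negative shifts.
    peak = np.unravel_index(np.argmax(surface), surface.shape)
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, surface.shape))
```

Only the luminance plane would be passed in, in line with the setup described above.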
To assess the performance of the different motion
estimation techniques, the following comparisons were made. First, the
subjective quality of the estimated motion field was evaluated, showing the
capability of the algorithm to estimate the true motion in the scene. Second,
the PSNR of the motion-compensated prediction was measured, giving insight into the quality
of the prediction. Results obtained using the foreman, mother-daughter, and
Stefan sequences are shown in Figure 1. All image sequences are degraded with
additive zero-mean Gaussian noise to a signal-to-noise ratio (SNR) of 10 dB. Our
results demonstrate that the proposed method outperforms PC, achieving higher
precision and a significantly smaller corresponding measurement error. This confirms
the notion that the proposed technique is a superior feature
selector, utilizing the portions of the image spectrum most likely to contribute
to reliable motion estimation.
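The degradation and the quality measure used in this evaluation can be sketched as follows. The exact definitions used in the experiments are not stated, so two assumptions are made here: SNR is taken as the ratio of the frame's variance to the noise variance, and PSNR uses a peak value of 255 for 8-bit luminance. The function names are hypothetical.

```python
import numpy as np

def add_gaussian_noise(frame, snr_db, rng):
    """Add zero-mean Gaussian noise so that var(frame)/var(noise) matches snr_db."""
    frame = np.asarray(frame, dtype=float)
    signal_power = np.mean((frame - frame.mean()) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return frame + rng.normal(0.0, np.sqrt(noise_power), size=frame.shape)

def psnr(reference, prediction, peak=255.0):
    """Peak signal-to-noise ratio in dB between a frame and its prediction."""
    reference = np.asarray(reference, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    mse = np.mean((reference - prediction) ** 2)
    if mse == 0.0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(peak ** 2 / mse))
```

A typical use would be `noisy = add_gaussian_noise(frame, 10.0, np.random.default_rng(0))`, followed by `psnr(frame, predicted)` for each motion-compensated frame.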
Figure 1: PSNR obtained for noisy sequences (SNR = 10 dB).
The ability of the bispectrum method to accurately
estimate the displacement vector field from a degraded sequence is demonstrated in Figure 2. This figure shows the motion vector fields estimated between frames 126 and 127 of the
mother-daughter sequence using the two aforementioned motion estimation
methods. We can see that the estimates from PC appear nearly random, whereas the bispectrum technique gives better results, producing consistent motion vectors. Thus, the motion fields estimated by our approach tend to be very smooth due to the smoothness constraint. Because of
the noise-resistant property of the bispectrum, it produces more reliable
estimates. Therefore, the proposed method yields
motion fields that are, overall, more representative of the true motion in the scene.
Figure 2: Motion field for the mother-daughter sequence in the presence of noise.
In terms of motion-compensated images, for the mother-daughter sequence we observe better compensated images with the proposed
method. We also observe that the motion-compensated images for our
method are much closer to the original images. Thus, our scheme is able to measure the motion vector more accurately and is more robust in general. Overall, the bispectrum typically offers better visual quality than the PC method. Figure 3 gives examples of motion-compensated images.
Figure 3: Prediction for frame 3 of the mother-daughter sequence.
Comparisons of the PC and bispectrum methods indicate
that the bispectrum is a robust technique for motion vector estimation. Results of these comparisons are shown for different noise levels and video sequences. Additive
Gaussian noise (AGN) was added to the image sequences, with the input SNR varying from 15 dB to 28 dB. Figure 4 shows the PSNR of the motion-compensated prediction
error for a noise power of 15 dB for the table tennis sequence, confirming that our scheme is consistently more immune to noise. Experiments with other levels of
noise power demonstrated that similar performance gains are achievable. Similar experiments were performed for the other sequences. The results further confirm that our method consistently outperforms PC. Complexity is measured by the computation time. All the computations were
performed on an Intel(R) Pentium(R) D 3.4 GHz CPU running Windows XP. The two algorithms were implemented as a prototype written in MATLAB 6.5 R13.
The comparison between our method and PC confirmed that the two methods have complexity of the same order. This is shown in Table 1.
Table 1: Comparison of the computation time of the two methods.
Figure 4: PSNR versus frame number for motion compensated prediction of the table tennis sequence.
5. Conclusion
In this paper, a frequency-domain bispectrum method for subpixel motion estimation of noisy image sequences was presented. The proposed method provides an
advantage over the PC algorithm in the presence of AGN. With our method,
the displacement vector field is smoother, providing a more accurate measure of
object motion. At relatively low noise levels, the bispectrum performance is
comparable to its performance in the noise-free environment. At high noise
levels (SNR around 10 dB), PC fails, yet even under these extreme conditions
the bispectrum provides an improvement in performance over the PC algorithm. In
addition to its PSNR performance, the bispectrum also yields smooth motion
fields.