Table of Contents Author Guidelines Submit a Manuscript
Research Letters in Signal Processing
Volume 2008, Article ID 417915, 5 pages
http://dx.doi.org/10.1155/2008/417915
Research Letter

Estimation of Subpixel Motion Using Bispectrum

1Faculté des Sciences de Rabat, Université Mohammed V Agdal, 4 Avenue Ibn Battouta, B.P. 1014 RP, Rabat, Morocco
2Institut National des Postes et Télécommunication, Madinat Al Irfane, Rabat, Morocco

Received 19 September 2007; Accepted 1 February 2008

Academic Editor: Liang-Gee Chen

Copyright © 2008 El Mehdi Ismaili Aalaoui and Elhassane Ibn-Elhaj. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Motion estimation techniques are widely used in today's video processing systems. The frequently used techniques are frequency-domain motion estimation methods, most notably phase correlation (PC). If the image frames are corrupted by Gaussian noises, then cross-correlation and related techniques do not work well. In this paper, however, we have studied this topic from a viewpoint different from the above. Our scheme is based on the bispectrum method for sub-pixel motion estimation of noisy image sequences. Experimental results show that our proposed method performs significantly better than PC technique.

1. Introduction

Image frames are generated by scanning a scene several times a second where each frame, generally, consists of two regions. One region, referred to as stationary background, is virtually the same as the previous frame. The other region, referred to as moving data, has moved with respect to the previous frame. Estimating the motion between image frames has long been a problem of interest in areas such as video compression, robot vision, and biomedical engineering [1].

Many motion estimation schemes have been developed. They can be classified into spatial-domain and frequency-domain approaches. The spatial-domain algorithms consist of matching algorithms and gradient-based algorithms. The frequency domain algorithms consist of phase correlation algorithms, wavelet transform-based algorithms, and DCT-based algorithms [2]. The vast majority of these algorithms consider noise-free data, although in [3] the displacement vector is estimated from noisy data using the generalized maximum likelihood criterion. If the image frames are corrupted by Gaussian noises, then cross-correlation and related techniques do not work well. In this circumstance, higher-order spectra (HOS) in general and the bispectrum in particular have recently been widely used as an important tool for signal processing. The classical methods based on the power spectrum are now being effectively superseded by the bispectral ones due to some definite disadvantages of the former. These include the inability to identify systems fed by non-Gaussian noise (NGN) inputs and nonminimum phase (NMP) systems, and by identification of system nonlinearity. In these cases, the autocorrelation-based methods offer no answer. Out of all these, the identifiability of NMP systems has received the maximum attention from researchers.

HOS-based methods have already been proposed to estimate motion between image frames [49]. In [6], the displacement vector is obtained by maximizing a third-order statistics criterion. In [8], the global motion parameters obtained by a new region recursive algorithm. In [4, 5], several algorithms are developed based on a parametric cumulant method, a cumulant-matching method, and a mean-kurtosis error criterion. In this correspondence, a novel algorithm for the detection of motion vectors in video sequences is proposed. The algorithm uses bispectrum method for subpixel motion estimation of noisy image sequences to obtain a measure of content similarity for temporally adjacent frames and responds very well to scene motion vectors.

2. Bispectrum-Based Image Motion Estimation

The problem of motion estimation can be stated as follows: “given an image sequence, compute a representation of the motion field that best aligns pixels in one frame of the sequence with those in the next” [9]. This is formulated as 𝑔𝑘1(𝑥,𝑦)=𝑓𝑘1(𝑥,𝑦)+𝑛𝑘1𝑔(𝑥,𝑦),𝑘(𝑥,𝑦)=𝑓𝑘(𝑥,𝑦)+𝑛𝑘(𝑥,𝑦)=𝑓𝑘1𝑥𝑑𝑘𝑥,𝑦𝑑𝑘𝑦+𝑛𝑘(𝑥,𝑦)=𝑔𝑘1𝑥𝑑𝑘𝑥,𝑦𝑑𝑘𝑦+𝑤𝑘(𝑥,𝑦),(1)where 𝑤𝑘(𝑥,𝑦)=𝑛𝑘(𝑥,𝑦)𝑛𝑘1(𝑥𝑑𝑘𝑥,𝑦𝑑𝑘𝑦); (𝑥,𝑦) denotes spatial image position of a point; 𝑔𝑘(𝑥,𝑦) and 𝑔𝑘1(𝑥,𝑦) are observed image intensities at instant 𝑘 and 𝑘1, respectively; 𝑓𝑘(𝑥,𝑦) and 𝑓𝑘1(𝑥,𝑦) are the noise-free frames; 𝑛𝑘(𝑥,𝑦) and 𝑛𝑘1(𝑥,𝑦) are assumed to be spatially and temporally stationary, zero-mean image Gaussian noise sequences with unknown covariance; (𝑑𝑘𝑥,𝑑𝑘𝑦) is the displacement vector of the object during the time interval [𝑘1,𝑘].

The third-order autocumulants and cross-cumulants of a zero-mean from 2D random field 𝑔𝑘(𝑥,𝑦) are defined, respectively, as follows:𝐶𝑔𝑘𝑔𝑘𝑔𝑘3𝑟1,𝑟2;𝑠1,𝑠2𝑔=𝐸𝑘(𝑥,𝑦)𝑔𝑘𝑥+𝑟1,𝑦+𝑟2𝑔𝑘𝑥+𝑠1,𝑦+𝑠2,𝐶𝑔𝑘𝑔𝑘1𝑔𝑘3𝑟1,𝑟2;𝑠1,𝑠2𝑔=𝐸𝑘(𝑥,𝑦)𝑔𝑘1𝑥+𝑟1,𝑦+𝑟2𝑔𝑘𝑥+𝑠1,𝑦+𝑠2,(2)where 𝐸{} represents the expectation operator.

The discrete bispectrum of the frame 𝑔𝑘(𝑥,𝑦) is defined as follows: 𝐵𝑔𝑘𝑔𝑘𝑔𝑘3𝑢1,𝑢2;𝑣1,𝑣2𝐶=𝑔𝑘𝑔𝑘𝑔𝑘3𝑟1,𝑟2;𝑠1,𝑠2=𝐺𝑔𝑘𝑢1,𝑢2𝐺𝑔𝑘𝑣1,𝑣2𝐺𝑔𝑘𝑢1+𝑣1,𝑢2+𝑣2,(3)where denotes the Fourier transform operation; 𝐺𝑔𝑘(𝑢) corresponds to the DFT of the frame 𝑔𝑘(𝑥,𝑦); indicates the complex conjugate; (𝑢1,𝑢2) and (𝑣1,𝑣2) are the frequency co-ordinates for the 2D Fourier transform.

Due to the shift-invariance of the bispectrum𝐵𝑔𝑘𝑔𝑘𝑔𝑘3𝑢1,𝑢2;𝑣1,𝑣2=𝐵𝑔𝑘1𝑔𝑘1𝑔𝑘13𝑢1,𝑢2;𝑣1,𝑣2+𝐵𝑤𝑘𝑤𝑘𝑤𝑘3𝑢1,𝑢2;𝑣1,𝑣2,(4)where 𝐵𝑔𝑘1𝑔𝑘1𝑔𝑘13(𝑢1,𝑢2;𝑣1,𝑣2) and 𝐵𝑤𝑘𝑤𝑘𝑤𝑘3(𝑢1,𝑢2;𝑣1,𝑣2) denote bispectrum of the frame 𝑔𝑘1(𝑥,𝑦) and noise, respectively. If the probability density function of the noise is symmetrical, that is, 𝑝(𝑛)=𝑝(𝑛), or at least not skewed, that is, 𝑛3𝑝(𝑛)𝑑𝑛=0, then the term 𝐵𝑤𝑘𝑤𝑘𝑤𝑘3(𝑢1,𝑢2;𝑣1,𝑣2) is negligible which renders the triple-correlation very effective in detecting a signal embedded in noise [10]. Then𝐵𝑔𝑘𝑔𝑘𝑔𝑘3𝑢1,𝑢2;𝑣1,𝑣2=𝐵𝑔𝑘1𝑔𝑘1𝑔𝑘13𝑢1,𝑢2;𝑣1,𝑣2.(5)The cross-bispectrum is defined as follows:𝐵𝑔𝑘𝑔𝑘1𝑔𝑘3𝑢1,𝑢2;𝑣1,𝑣2𝐶=𝑔𝑘𝑔𝑘1𝑔𝑘3𝑟1,𝑟2;𝑠1,𝑠2=𝐺𝑔𝑘𝑢1,𝑢2𝐺𝑔𝑘1𝑣1,𝑣2𝐺𝑔𝑘𝑢1+𝑣1,𝑢2+𝑣2.(6)Then,𝐵𝑔𝑘𝑔𝑘1𝑔𝑘3𝑢1,𝑢2;𝑣1,𝑣2=𝐵𝑔𝑘1𝑔𝑘1𝑔𝑘13𝑢1,𝑢2;𝑣1,𝑣2𝑒𝑗2𝜋(𝑣1𝑑𝑘𝑥+𝑣2𝑑𝑘𝑦).(7)Using the relation in (5), we can transform (7) as𝐵𝑔𝑘𝑔𝑘1𝑔𝑘3𝑢1,𝑢2;𝑣1,𝑣2=𝐵𝑔𝑘𝑔𝑘𝑔𝑘3𝑢1,𝑢2;𝑣1,𝑣2𝑒𝑗2𝜋(𝑣1𝑑𝑘𝑥+𝑣2𝑑𝑘𝑦).(8)

As we can see from (3), the bispectrum has two vector arguments containing totally four scalar frequency variables. Assuming that 𝐺𝑔𝑘(𝑢1,𝑢2) is an N-by-N discrete Fourier transform of 𝑔𝑘(𝑥,𝑦), the bispectrum becomes a four-dimensional N-by-N-by-N-by-N matrix. It is therefore not practical to evaluate the whole bispectrum. A better solution is to take 2D slices of the 4D spectrum. There are basically various ways of defining these slices, but we will only consider the case where𝐵𝑔𝑘𝑔𝑘𝑔𝑘3,𝑙𝑢1,𝑢2=𝐵𝑔𝑘𝑔𝑘𝑔𝑘3𝑢1,𝑢2;𝑙𝑢1,𝑙𝑢2𝐵𝑙𝑅,𝑔𝑘𝑔𝑘1𝑔𝑘3,𝑙𝑢1,𝑢2=𝐵𝑔𝑘𝑔𝑘1𝑔𝑘3𝑢1,𝑢2;𝑙𝑢1,𝑙𝑢2𝑙𝑅.(9)

Although we have now taken only a small portion of the whole spectrum [11], it can be shown that the motion vector is still possible and no essential information has been lost.

Thus, the third-order hologram, 𝑙(𝑟1,𝑟2), is defined by𝑙𝑟1,𝑟2=1[𝐵𝑔𝑘𝑔𝑘1𝑔𝑘3,𝑙𝑢1,𝑢2|||𝐵𝑔𝑘𝑔𝑘𝑔𝑘3,𝑙𝑢1,𝑢2|||𝑟]=𝛿1𝑑𝑘𝑥,𝑟2𝑑𝑘𝑦.(10)

As a result, by finding the location of the pulse in (10) we are able to tell the displacement, which is the motion vector. Since third-order statistics are used, the method is insensitive (in theory) to both spatially and temporally corrupted by noise which is symmetrically distributed (e.g., Gaussian). In practice, the motion vector is not an impulse; hence, we estimate (𝑑𝑘𝑥,𝑑𝑘𝑦) as the index (𝑟1,𝑟2), which maximizes |𝑙(𝑟1,𝑟2)|.

The co-ordinates (𝑟1𝑚,𝑟2𝑚) of the maximum of the real-valued array 𝑙(𝑟1,𝑟2) can be used as an estimate of the horizontal and vertical components of motion between 𝑔𝑘(𝑥,𝑦) and 𝑔𝑘1(𝑥,𝑦) as follows:𝑟1𝑚,𝑟2𝑚=argmax𝑙𝑟1,𝑟2.(11)

3. Subpixel Accuracy

Subpixel performance is a critical element of the proposed algorithm. With reference to our previously published work [12, 13], we are introducing a number of important new features, which improve the accuracy of the motion estimates.

Subpixel accuracy of motion measurements is obtained dy variable-separable fitting performed in the neighborhood of the maximum using one-dimensional quadratic function. Using the notation in (11) above, prototype functions are fitted to the triplets𝑙𝑟1𝑚1,𝑟2𝑚,𝑙𝑟1𝑚,𝑟2𝑚,𝑙𝑟1𝑚+1,𝑟2𝑚,(12)𝑙𝑟1𝑚,𝑟2𝑚1,𝑙𝑟1𝑚,𝑟2𝑚,𝑙𝑟1𝑚,𝑟2𝑚+1,(13)(𝑑𝑘𝑑𝑥,𝑘𝑦)that is, the maximum peak of the phase correlation surface and its two neighboring values on either side, vertically and horizontally.

The location of the maximum of the fitted function provides the required subpixel motion estimate 𝑑𝑘𝑥. Fitting a parabolic function horizontally to the data triplet (12) yields a closed-from solution for the horizontal component of the motion estimate 𝑑𝑘𝑥=𝑙𝑟1𝑚+1,𝑟2𝑚𝑙𝑟1𝑚1,𝑟2𝑚2𝐻,(14) as follows:𝐻=𝑙𝑟1𝑚+1,𝑟2𝑚2𝑙𝑟1𝑚,𝑟2𝑚+𝑙𝑟1𝑚1,𝑟2𝑚.where 𝑑𝑘𝑦

The fractional part of the vertical component can be obtained in a similar way using (13) instead of (12).

Finally, the horizontal and vertical components of the subpixel accurate motion estimate are obtained by computing the location of the maxima of each of the above fitted quadratics.

In [14], it is shown that half-pixel accuracy motion vectors leads to a very significant improvement when compared to one-pixel accuracy, where as a higher precision results in negligible changes. Therefore, a half-pixel accuracy was chosen in our simulations.

4. Experimental Results

To prove the feasibility of the proposed method, we compared it to a PC technique implemented in a similar manner as our approach. In this section, we examine a few examples and compare the performance, efficiency, and complexity of the two methods. In our experiments, we used the well-known test sequences: foreman (176 pixels by 144 lines), mother-daughter and Stefan (352 pixels by 288 lines), table tennis (352 pixels by 240 lines). Although the original sequences are in color, only the luminance (brightness) component is used to estimate the motion vectors.

To assess the performances of the different motion estimation techniques, the following comparisons were made. First, the subjective quality of the estimated motion field was evaluated, showing the capability of the algorithm to estimate the true motion in the scene. Second, the PSNR of motion compensated was measured, giving insight about the quality of the prediction. Results obtained using the foreman, mother-daughter and Stefan sequences are shown in Figure 1. All image sequences are degraded with additive zero-mean Gaussian noise to a signal-to-noise ratio (SNR) of 10 dB. Our results demonstrate that the proposed method outperforms PC, achieving higher precision and a significantly smaller corresponding measurement error. This confirms the motion that the proposed technique of an image is a superior feature selector utilizing the portions of the image spectrum most likely to contribute to reliable motion estimation.

417915.fig.001
Figure 1: PSNR obtained for noisy sequences (SNR = 10 dB).

The ability of the bispectrum method to accurately estimate the displacement vector field from a degraded sequence is demonstrated in Figure 2. In this Figure, the estimated motion vector fields for the mother-daughter sequence using the two aforementioned motion estimation methods. The motion vectors estimated between the frames 126 and 127 are shown for the mother-daughter sequence. We can see that the estimates from the PC seem very random, but the bispectrum technique gives better results, producing the same motion vectors. Thus, the motion fields estimated by the “our approach” tend to be very smooth due to the smoothness constraint. Because of the noise-resistant property of the bispectrum, it produces more reliable estimates. Therefore, the proposed method motion estimation results globally in motion fields more representative of the true motion in the scene.

fig2
Figure 2: Motion field for the mother-daughter sequence in the presence of noise.

In terms of motion compensated images, from mother-daughter sequence, we observe better compensated images by the proposed method. We also observe that the motion compensated images for the “our method” are much closer to the original images. Thus, the “our scheme” is able to measure the motion vector more accurately and is more robust in general. Overall, the bispectrum typically offers better visual quality images than the PC method. Figure 3 gives examples of motion compensated images.

fig3
Figure 3: Prediction for frame 3 of the mother-daughter.

Comparisons of the PC and bispectrum methods indicate that the bispectrum is a robust technique for motion vector. Results of these comparisons are shown for different noise levels and video sequences. Additive Gaussian noise (AGN) was added to image sequences with an input SNR varying from 15 dB to 28 dB. Figure 4 shows the PSNR of the motion compensated prediction error for noise power of 15 dB for table tennis sequence, confirming that our scheme is consistently more immune to noise. Experiments with other levels of noise power demonstrated that similar performance gains are achievable. Similar experiments were performed for the other sequences. The results further confirm that the “our method” consistently outperforms PC. In terms of complexity, this is measured by the computation time. All the computations are performed on Intel(R) Pentium(R)D CPU 3.4 GHz with Windows XP. The two algorithms have been implemented using a prototype written in MATLAB 6.5 R13. The comparison between the “our method” and the PC confirmed that the two methods have the complexity on the same order. This is shown in Table 1.

tab1
Table 1: The comparison between two methods for the computation time.
417915.fig.004
Figure 4: PSNR versus frame number for motion compensated prediction of the table tennis sequence.

5. Conclusion

In this paper, the bispectrum method for subpixel motion estimation of noisy image sequences in frequency-domain was presented. The “our” proposed method provides an advantage over the PC algorithm in the presence of AGN. With “our method,” the displacement vector field is smoother, providing a more accurate measure of object motion. At relatively low noise levels, the bispectrum performance is comparable to its performance in the noise-free environment. At high noise levels SNR around 10 dB, the PC fails, yet even under these extreme conditions, the bispectrum provides improvement in performance over the PC algorithm. In addition to its PSNR performance, the bispectrum also yields smooth motion fields.

References

  1. F. Kelly and A. Kokaram, “Graphics hardware for gradient based motion estimation,” in Embedded Processors for Multimedia and Communications, vol. 5309 of Proceedings of SPIE, pp. 92–103, San Jose, Calif, USA, January 2004. View at Publisher · View at Google Scholar
  2. L. Jooheung, N. Vijaykrishnan, M. J. Irwin, and W. Wolf, “An efficient architecture for motion estimation and compensation in the transform domain,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 2, pp. 191–201, 2006. View at Publisher · View at Google Scholar
  3. N. M. Namazi and C. H. Lee, “Nonuniform image motion estimation from noisy data,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, no. 2, pp. 364–366, 1990. View at Publisher · View at Google Scholar
  4. J. M. Anderson and G. B. Giannakis, “Noise insensitive image motion estimation using cumulants,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '91), vol. 4, pp. 2721–2724, Toronto, Ontario, Canada, April 1991. View at Publisher · View at Google Scholar
  5. J. M. Anderson and G. B. Giannakis, “Image motion estimation algorithms using cumulants,” IEEE Transactions on Image Processing, vol. 4, no. 3, pp. 346–357, 1995. View at Publisher · View at Google Scholar · View at PubMed
  6. R. P. Kleihorst, R. L. Lagendijk, and J. Biemond, “Noise reduction of severely corrupted image sequences,” in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '93), vol. 5, pp. 293–296, Minneapolis, Minn, USA, April 1993. View at Publisher · View at Google Scholar
  7. E. Sayrol, T. Gasull, and J. R. Fonollosa, “Cost function for motion estimation based on higher order statistics,” in Proceedings of the 7th European Signal Processing Conference (EUSIPCO '94), pp. 1117–1120, Edinbourgh, Scotland, UK, September 1994.
  8. E. Ibn-Elhaj, D. Aboutajdine, S. Pâteux, and L. Morin, “HOS-based method of global motion estimation for noisy image sequences,” Electronics Letters, vol. 35, no. 16, pp. 1320–1322, 1999. View at Publisher · View at Google Scholar
  9. E. Sayrol, A. Gasull, and J. R. Fonollosa, “Motion estimation using higher order statistics,” IEEE Transactions on Image Processing, vol. 5, no. 6, pp. 1077–1084, 1996. View at Publisher · View at Google Scholar · View at PubMed
  10. C. L. Nikias and A. P. Petrpulu, Higher-Order Spectra Analysis: A Nonlinear Signal Processing Framework, Prentice-Hall, Englewood Cliffs, NJ, USA, 1993.
  11. A. P. Petropulu and H. Pozidis, “Phase reconstruction from bispectrum slices,” IEEE Transactions on Signal Processing, vol. 46, no. 2, pp. 527–530, 1998. View at Publisher · View at Google Scholar
  12. E. M. Ismaili Aalaoui and E. Ibn-Elhaj, “Estimation of motion fields from noisy image sequences: using generalized cross-correlation methods,” in Proceedings of the IEEE International Conference on Signal Processing and Communications (ICSPC '07), Dubai, United Arab Emirates, November 2007.
  13. E. M. Ismaili Aalaoui and E. Ibn-Elhaj, “Estimation of displacement vector field from noisy data using maximum likelihood estimator,” in Proceedings of the 14th IEEE International Conference on Electronics, Circuits and Systems (ICECS '07), Marrakech, Morocco, December 2007.
  14. G. Madec, “Half pixel accuracy in block matching,” in Proceedings on the Picture Coding Symposium (PCS '90), Cambridge, Mass, USA, March 1990.