Abstract

A novel digital image stabilization technique is proposed in this paper. It is based on fuzzy Kalman compensation of the global motion vector (GMV), which is estimated in the log-polar plane. The GMV is extracted from four local motion vectors (LMVs) computed on respective subimages in the log-polar plane. The fuzzy Kalman system consists of a fuzzy system combined with the discrete time-invariant definition of the Kalman filter. Owing to this inherited recursiveness, the output is a smoothed image sequence. The proposed stabilization system aims to compensate any oscillations of the absolute frame positions, based on motion estimation in the log-polar domain filtered by the fuzzy Kalman system, and thus exploits the advantages of both the fuzzy Kalman system and the log-polar transformation. The described technique produced a smooth, well-compensated output in all the tested sequences.

1. Introduction

Digital video stabilization is the process by which a video signal is smoothed against unwanted oscillations while the intentional camera movements are preserved. Almost any acquired image sequence is affected by noise and undesired camera jitter, caused by unstable holding or rough terrain. These unwanted positional oscillations degrade the visual quality, which, besides its aesthetic role, is crucial in many applications such as robot vision and video compression. High visual quality enables both humans and machines to perceive the sequence easily, so that meaningful results can be extracted. Several image stabilization methods have been reported in the literature, and they fall into three major categories. When the unwanted fluctuations are mostly rotational and the stabilization is implemented by servo motors that compensate the pan and tilt camera movements, the technique is known as active image stabilization [1]. Image stabilization performed by electronic hardware is referred to as electronic image stabilization [2]. Finally, when the unwanted oscillations are compensated by pure image processing techniques, the process is called digital image stabilization (DIS) [3]. A DIS system consists of two successive units: motion estimation and motion compensation. The goal of the first unit is to compute the motion vectors and, eventually, the GMV. The compensation unit follows the motion estimation and produces the vector that shifts the current frame's position so that the output is free from irregularities while the desired global motion is preserved. An important factor affecting the performance of DIS systems is the noise level: the lower the noise, the smoother the results. The GMV calculation has been realized by various techniques, such as phase correlation matching [4] and normalized cross-correlation [5].
A real-time DIS implementation that matches two successive images by means of the Fourier-Mellin transformation has been reported in [6]. In [7], the GMV estimation is optimized by exploiting fuzzy logic. Kalman filtering has been utilized to enhance the compensation of the frame position [4, 8]. Apart from matching techniques, optical flow techniques have been adopted to estimate the motion in a sequence. In [9], the undesired motion effects are calculated by estimating the rotational center and the angular frequency from the local translational motion definition through fine-to-coarse multiresolution motion estimation. In [10], the stabilization is accomplished by fixating on the central image region, while optical flow estimation optimizes this approximation. The LMVs determine the movement in a part of the image, resulting in a better estimation of the intended camera movement and the undesired motion. A widely used technique is to compute the GMV from a set of LMVs. The computational cost of full-frame search algorithms motivated the calculation of the global motion on subimages, and estimating the LMVs on these regions has reduced the processing times considerably. Transforming the image sequence into less computationally intensive topological rearrangements has further reduced the processing time and the computational resources required.

In this paper, we transform the Cartesian images into log-polar ones [11, 12] and compute the GMV there from four LMVs in respective image regions. The resulting method achieves low processing times, suitable for real-time implementation. Due to the intrinsic attentional nature of the log-polar transformation, the motion estimation of the LMVs exhibits a space-variant distribution. Moreover, a fuzzy Kalman DIS technique is proposed. Kalman filters and fuzzy systems have been widely used in DIS applications, and recursive fuzzy systems have provided good results. Smoothing the displacements before they enter the fuzzy system, either by Kalman filtering or by another filter, has also improved the efficiency of fuzzy systems [7]. However, in this work the recursiveness of the Kalman filter is introduced directly into the fuzzy system, instead of expressing it as a standard discrete time-invariant system. The fuzzy inputs of the proposed system are expressed with the estimation-correction equations of the Kalman filter. Therefore, the intended camera movement is preserved more efficiently since it happens mostly in the foreground. Consequently, the fuzzy Kalman filter is applied to the GMV estimation. In each time step, the estimated motion vector is the a priori measurement, while the output of the system is the a posteriori one. Finally, the correction is achieved through the previous measurements, which are used as the estimated ones. The fuzzy system was tested with several types of membership functions (MFs) and different aggregation and defuzzification methods. The measured fluctuations were not filtered further. Using log-polar images for the motion field extraction yielded fast results and improved both the stabilization of each frame and the visual quality of the video output in all the tested situations. The whole operation exploits the advantages of the log-polar plane and the fuzzy Kalman system.

2. Motion Estimation

The motion estimation unit of the DIS system extracts the GMV. This unit distinguishes between the desired and the unwanted motion effects, and its key feature is the accuracy with which the intended camera motion is estimated. Several motion estimation approaches have been proposed in the past. Their main categories are block matching [13], phase correlation [14], and optical flow [15].

2.1. Log-Polar Transformation

Motion estimation is extremely demanding in terms of computation and resources. Subsampling of the images is often used to reduce this computational load. A topological arrangement, and notably a space-variant one such as the log-polar, reduces the volume of the image data without constraining the field of view or the image resolution at the fixation point. The log-polar transformation is based on the projection of the retinal plane of the human eye onto the visual cortex, and it has its origins in studies of the vision mechanisms of mammals. Adopting this topology in artificial vision systems offers several advantages in visual attention, throughput rate, and real-time processing. Many applications of the log-polar transformation have been reported, such as time-to-impact estimation [11], wavelet extraction based on log-polar mapping [16], tracking [17], and disparity estimation and vergence control [18].

The mathematical model of the log-polar mapping can be expressed as a transformation between the polar plane $(\rho, \theta)$ (retinal plane), the log-polar plane $(\xi, \eta)$ (cortical plane), and the Cartesian plane $(x, y)$ (image plane), as shown in Figure 1. Assuming that $N_r$ is the number of cells in the radial direction and $N_a$ is the number of cells in the angular direction, the mapping from the polar coordinates $(\rho, \theta)$ to the log-polar coordinates $(\xi, \eta)$ is defined as

$$\xi = \log_{\alpha}\left(\frac{\rho}{\rho_0}\right), \qquad \eta = \frac{N_a}{2\pi}\,\theta, \tag{1}$$

where $\xi$ indexes the pixel rows, $\eta$ indexes the pixel columns, and $\rho_0$ is the radius of the fovea. The logarithmic basis $\alpha$ is obtained from the foveal radius, the image radius $\rho_{\max}$, and the radial resolution $N_r$:

$$\alpha^{N_r} = \frac{\rho_{\max}}{\rho_0} \quad \text{or} \quad \alpha = e^{(1/N_r)\ln(\rho_{\max}/\rho_0)}. \tag{2}$$

Applying the aforementioned mathematical formulation to the image in Figure 2(a) results in the log-polar image in Figure 2(b). Figure 2(c) shows the Cartesian representation reconstructed from the log-polar image.
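As an illustration of the mapping in (1) and (2), the following minimal sketch converts a Cartesian point, given relative to the fixation point, into log-polar coordinates. It assumes that the angular coordinate is scaled linearly to the $N_a$ angular cells; the function names and the numerical values of the foveal radius, image radius, and resolutions are hypothetical and chosen only for the example.

```python
import numpy as np

def log_polar_base(rho_0, rho_max, n_r):
    """Logarithmic base from Eq. (2): alpha**n_r = rho_max / rho_0."""
    return np.exp(np.log(rho_max / rho_0) / n_r)

def cartesian_to_log_polar(x, y, rho_0, rho_max, n_r, n_a):
    """Map a point (x, y), relative to the fixation point, to (xi, eta)."""
    rho = np.hypot(x, y)                        # polar radius
    theta = np.arctan2(y, x) % (2.0 * np.pi)    # polar angle in [0, 2*pi)
    alpha = log_polar_base(rho_0, rho_max, n_r)
    xi = np.log(rho / rho_0) / np.log(alpha)    # radial (row) coordinate, Eq. (1)
    eta = n_a * theta / (2.0 * np.pi)           # angular (column) coordinate (assumed scaling)
    return xi, eta

# Hypothetical sensor layout: fovea radius 5 px, image radius 240 px,
# 64 radial cells and 128 angular cells.
xi, eta = cartesian_to_log_polar(0.0, 240.0, rho_0=5.0, rho_max=240.0, n_r=64, n_a=128)
```

A point on the outer image radius maps to the last radial cell, which is a quick way to check that the base obtained from (2) is consistent with the chosen radial resolution.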

2.2. Motion Field Extraction

The image motion is the projection of the real-world 3D motion onto the two-dimensional image plane. It is expressed as either image velocities or image displacements on the x and y axes of the optical flow field. Optical flow techniques are divided into three main categories: differential techniques, frequency-based techniques, and matching methods [15]. The chosen calculation method is a differential one, namely the classical Horn and Schunck optical flow model as modified in [19].
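To make the differential approach concrete, the sketch below implements the classical Horn-Schunck iteration only; the modification of [19] is not reproduced. It is a simplified version under stated assumptions: a plain four-neighbour mean replaces the weighted average of the original formulation, the borders are treated as periodic, and the regularization weight `alpha`, the iteration count, and the synthetic test pattern are hypothetical choices.

```python
import numpy as np

def neighbor_mean(f):
    """Mean of the four nearest neighbours, with periodic borders."""
    return 0.25 * (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                   np.roll(f, 1, 1) + np.roll(f, -1, 1))

def horn_schunck(frame1, frame2, alpha=0.05, n_iter=500):
    """Classical Horn-Schunck flow: u along x (columns), v along y (rows)."""
    mean_img = 0.5 * (frame1 + frame2)
    # Central differences for the spatial gradients, frame difference in time.
    i_x = 0.5 * (np.roll(mean_img, -1, 1) - np.roll(mean_img, 1, 1))
    i_y = 0.5 * (np.roll(mean_img, -1, 0) - np.roll(mean_img, 1, 0))
    i_t = frame2 - frame1
    u = np.zeros_like(frame1)
    v = np.zeros_like(frame1)
    for _ in range(n_iter):
        u_bar, v_bar = neighbor_mean(u), neighbor_mean(v)
        # Residual of the brightness-constancy constraint at the averaged flow.
        residual = (i_x * u_bar + i_y * v_bar + i_t) / (alpha**2 + i_x**2 + i_y**2)
        u = u_bar - i_x * residual
        v = v_bar - i_y * residual
    return u, v

# Synthetic periodic pattern, shifted one pixel to the right between frames.
yy, xx = np.mgrid[0:64, 0:64]
frame1 = np.sin(2 * np.pi * xx / 32) + np.cos(2 * np.pi * yy / 32)
frame2 = np.roll(frame1, 1, axis=1)
u, v = horn_schunck(frame1, frame2)
```

For this pattern the recovered horizontal flow is expected to be positive on average, while the vertical flow should average out near zero. On real frames the borders are not periodic, so a different border handling would be needed.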

In order to reduce the computational load of the motion estimation, the horizontal and vertical displacements are computed on selected image regions located at the periphery of the image. On the Cartesian plane, these regions are rectangles of 440×100 pixels and 100×280 pixels, respectively, as shown in Figure 3(a). The LMVs, however, were calculated on the log-polar plane, where the respective patches have an arch-like shape of dimensions 7353 pixels and 1893 pixels, respectively, as shown in Figure 3(b).

Yet, motion estimation on the log-polar plane has some special features that should be taken into consideration: the motion vectors are not transferred straightforwardly from the Cartesian to the log-polar plane, due to the fictitious gray-value curvature introduced in the polar image [12]. Once the LMVs had been estimated, the GMV was taken as the average of the four LMVs, as this provided better results for the tested image sequences. The displacements are then fed into the fuzzy Kalman system without further processing.
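The averaging step can be sketched as follows. The function name and the four example LMVs are hypothetical; each LMV is assumed to be a (dx, dy) displacement in pixels for one subimage.

```python
import numpy as np

def global_motion_vector(lmvs):
    """Average the local motion vectors (one (dx, dy) row per subimage)."""
    lmvs = np.asarray(lmvs, dtype=float)
    return lmvs.mean(axis=0)

# Hypothetical LMVs for the four peripheral subimages, in pixels.
lmvs = [(2.0, -1.0), (3.0, -1.5), (2.5, -0.5), (2.5, -1.0)]
gmv = global_motion_vector(lmvs)
```

A plain mean weights all four regions equally, so a single region dominated by a moving object would bias the GMV; the text reports that the average nevertheless gave better results for the tested sequences.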

3. Fuzzy Kalman System

The prediction-correction recursive equations of the Kalman filter were employed for the definition of the fuzzy inputs. The ground truth values of the fuzzy Kalman system are the displacements obtained by the optical flow technique at the motion estimation phase. The equations of the fuzzy Kalman system are depicted in Figure 4 and are defined as follows.

Prediction:
π‘ƒβˆ’π‘˜=π΄π‘ƒπ‘˜βˆ’1𝐴𝑇+𝑄,Μ‚π‘₯βˆ’π‘˜=𝐴̂π‘₯βˆ’π‘˜βˆ’1+π΅π‘’π‘˜.(3)

Correction:
$$\mathrm{Input1}_k = z_{k-1} - \hat{x}^-_k, \tag{4}$$

$$\mathrm{Input2}_k = \mathrm{Input1}_k - \mathrm{Input1}_{k-1}, \tag{5}$$

$$K_k = P^-_k H^T \left(H P^-_k H^T + R\right)^{-1}, \tag{6}$$

$$P_k = \left(I - K_k H\right) P^-_k, \tag{7}$$

$$\hat{x}_k = \hat{x}^-_k + K_k \left(z_k - H \hat{x}^-_k\right), \tag{8}$$

where $k$ is the time index, $z_k$ is the measurement value at the current time step, and $\hat{x}^-_k$ is the a priori estimation of the frame positions; $\hat{x}_k$ are the a posteriori estimated frame positions, and $P^-_k$ and $P_k$ define, respectively, the a priori and the a posteriori error covariance matrices. The first input (4) is defined as the difference between the absolute frame translation and the a priori estimation of the stabilized frame position. The second input (5) indicates the rate of change of the first input at the current time step. The measurement values at each time index represent the frame's translation. The tuning variables Q and R for the process and the measurement noise, respectively, are set to a ratio of 10 (R/Q = 10). A higher ratio yields quicker responses, but the final output is not smooth enough, as the final frame positions stay close to the measured ones. High R values lead to slow responses, though the high frequencies are cut off, providing a smooth output. In order to provide a fast response the ratio was set to 10, although a ratio of 100 or higher introduced less error in the final output.
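A minimal scalar sketch of this recursion is given below. It is an illustration under stated assumptions rather than the system of Figure 4: it takes A = H = 1 and B = 0 (a constant-position model), uses the textbook prediction step that propagates the previous a posteriori estimate, applies the plain Kalman correction instead of the fuzzy one, and assumes initial values for the state and its covariance. The function name and the sample measurements are hypothetical.

```python
def fuzzy_kalman_inputs(measurements, q=1.0, r=10.0):
    """Scalar Kalman recursion that also returns the fuzzy inputs of Eqs. (4)-(5).

    Assumes A = H = 1 and B = 0; q and r are the process and measurement
    noise variances, with r / q = 10 as stated in the text.
    """
    x_hat, p = measurements[0], 1.0          # initial state and covariance (assumed)
    z_prev, input1_prev = measurements[0], 0.0
    estimates, inputs = [], []
    for z in measurements:
        # Prediction with A = 1, B = 0 (textbook form, an assumption here).
        x_prior = x_hat
        p_prior = p + q
        # Fuzzy inputs, Eqs. (4) and (5).
        input1 = z_prev - x_prior
        input2 = input1 - input1_prev
        # Correction, Eqs. (6)-(8) with H = 1.
        k = p_prior / (p_prior + r)
        p = (1.0 - k) * p_prior
        x_hat = x_prior + k * (z - x_prior)
        estimates.append(x_hat)
        inputs.append((input1, input2))
        z_prev, input1_prev = z, input1
    return estimates, inputs

# Hypothetical jittery frame translations, in pixels.
measurements = [0.0, 4.0, -3.0, 5.0, -2.0, 3.0]
estimates, inputs = fuzzy_kalman_inputs(measurements)
```

In the proposed system the pair (Input1, Input2) would be passed to the fuzzy inference stage; here the pairs are only collected so that their evolution can be inspected next to the filtered estimates.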

The key features in the design of a fuzzy system are the shape of the MFs and the decision rules. In the proposed system, five MFs are used for each input and for the output, as they are sufficient for the desired task. The construction of the fuzzy rules depends on the experience of the designer and on the application. In our task, the MFs had to cover the whole range so that the final output would be smooth enough. Thus, the options are to distribute the MFs evenly over their range or to introduce more MFs. More MFs lead to more fuzzy rules and consequently to higher complexity. The selection of the type of the MFs is also crucial for the construction of a fuzzy system. The tested types of MFs are Gaussian, trapezoidal, and triangular. In all experiments, all the variables (inputs and output) had the same type of MFs. The MFs of the two inputs and of the output are distributed evenly over their range in order to obtain, as mentioned, a smooth output. All the variables define the frame translations and are set to [−8, 8] pixels, as 8 pixels was the maximum absolute translation on both the horizontal and the vertical axis. The sign indicates the direction of the movement, that is, left or right and up or down. The rule set is depicted in Table 1, and Figure 5 illustrates the fuzzy system for the Gaussian MFs. The inference methods, namely the implication, the aggregation, and the defuzzification, also play an important role in the fuzzy system. In the proposed system, the implication was set to product and the aggregation method to sum, as this provided a smoother output value. The defuzzification method was set to centroid, as it covers the output range more efficiently.
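The inference chain described above (five MFs per variable on [−8, 8], product implication, sum aggregation, centroid defuzzification) can be sketched as follows for triangular MFs. The rule table of Table 1 is not reproduced here, so `rule_output_index` is a hypothetical placeholder rule base; taking the AND of the two antecedents as a product and sampling the output universe at 801 points are likewise assumptions.

```python
import numpy as np

CENTERS = np.linspace(-8.0, 8.0, 5)   # peaks of the five MFs: -8, -4, 0, 4, 8
HALF_WIDTH = 4.0                      # distance between neighbouring peaks

def triangular(x, center, half_width=HALF_WIDTH):
    """Triangular membership value(s) of x for the MF peaking at `center`."""
    return np.maximum(0.0, 1.0 - np.abs(x - center) / half_width)

def rule_output_index(i, j):
    """Hypothetical rule table: the output MF index grows with both input indices."""
    return int(np.clip(i + j - 2, 0, 4))

def fuzzy_correction(input1, input2, n_samples=801):
    """Product implication, sum aggregation, centroid defuzzification."""
    input1 = float(np.clip(input1, -8.0, 8.0))
    input2 = float(np.clip(input2, -8.0, 8.0))
    y = np.linspace(-8.0, 8.0, n_samples)   # sampled output universe
    aggregated = np.zeros_like(y)
    for i, c1 in enumerate(CENTERS):
        for j, c2 in enumerate(CENTERS):
            # Firing strength of rule (i, j); AND taken as a product (assumed).
            strength = triangular(input1, c1) * triangular(input2, c2)
            if strength > 0.0:
                out_mf = triangular(y, CENTERS[rule_output_index(i, j)])
                aggregated += strength * out_mf   # product implication, sum aggregation
    return float(np.sum(y * aggregated) / np.sum(aggregated))   # centroid

correction = fuzzy_correction(2.0, 0.0)
```

Because the evenly spaced triangular MFs overlap so that their memberships sum to one inside the range, at least one rule always fires for clipped inputs and the centroid is well defined.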

4. Experimental Results

In order to evaluate the performance of the proposed system, we performed several tests. These include different stabilization experiments on sequences captured by an active stereo vision head. The size of the acquired sequences is 640×480 pixels. Some of the test videos were acquired while an active image stabilization routine was running. All of these sequences suffer from high-frequency image jitter, produced intentionally by the user for testing purposes. They also suffer from strong illumination changes as well as from fluctuations caused by the servo motors. Further experiments were made by capturing video on a free course. These sequences suffer from motion-blurred frames, for which the remedy is a higher frame rate. As the videos were acquired at 25 fps, the fast oscillatory movements during the course caused a considerable loss of information. The purpose of capturing such noisy and shaky sequences is to assess the proposed fuzzy Kalman system under complicated and challenging circumstances.

In order to compare the efficiency of our system the stabilization was assessed in four different combinations of image topologies as follows:

(i) Cartesian image, full frame;
(ii) Log-polar image, full frame;
(iii) Cartesian image, subimages;
(iv) Log-polar image, subimages.

The use of LMVs in Cartesian images provided better results than the full-frame ones. Table 2 summarizes the comparative results. In order to measure the performance of the proposed stabilization, the mean square error (MSE), the least square error (LSE), and the least mean square error (LMSE) were calculated. The equations of these errors, as all the values are known, are defined as

$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(f_i - z_i\right)^2, \qquad \mathrm{LSE} = \left(f_i - z_i\right)^2 = \min, \qquad \mathrm{LMSE} = \frac{1}{m}\sum_{i=1}^{m}\left(f_i - z_i\right)^2 = \min, \tag{9}$$

where $f_i$ is the final stabilized frame position and $z_i$ is the measured one from the motion estimation phase for every time index $i$. It is clear that the GMV extraction via LMVs in the log-polar plane provided the smoothest output. The fuzzy Kalman system responded best with triangular MFs. The visual results of the fuzzy Kalman system are demonstrated in Figure 6, while Figure 7 shows the initial and the final frame translations for all the tested cases. It is clear that the estimation of the GMV in the log-polar plane provides better performance.
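As a small worked example of the MSE, that is, the mean over all frames of the squared difference between the stabilized position and the measured one, consider the sketch below. The function name and the sample positions are hypothetical and are not taken from the reported experiments.

```python
import numpy as np

def mean_square_error(stabilized, measured):
    """MSE between stabilized positions f_i and measured positions z_i."""
    f = np.asarray(stabilized, dtype=float)
    z = np.asarray(measured, dtype=float)
    return float(np.mean((f - z) ** 2))

# Hypothetical frame positions along one axis, in pixels.
stabilized = [0.0, 1.0, 1.5, 1.0]
measured = [0.0, 2.0, 1.0, 3.0]
mse = mean_square_error(stabilized, measured)
```

The per-frame squared differences here are 0, 1, 0.25, and 4, so the mean over the four frames is 1.3125.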

Furthermore, these errors were also calculated to compare the efficiency of the different types of MFs. Table 3 presents the comparative results for all the tested MFs. From Figure 8 and Table 3, it is clear that the triangular MFs provide a smoother output, as they exhibit lower error in all the tests.

5. Conclusion

An image stabilization technique by means of a fuzzy Kalman system was proposed. The fuzzy Kalman system processes the GMV, which is computed in the log-polar plane. The system provided a smoothly compensated output in all the tested image sequences. For the proposed fuzzy system, the triangular MFs produced the smallest errors. The use of log-polar images, along with the recursiveness of the Kalman filter, led to a system that not only stabilizes fluctuations but also filters the noise during the process. To conclude, log-polar images are well suited to image stabilization, as they yielded smaller errors. The proposed fuzzy Kalman system is a valuable and efficient tool for image stabilization.

Acknowledgment

This work is partially supported by the EC research Project “ACROBOTER” FP6-IST-2006-045530.