Department of Communication Engineering, National Central University, Chungli 32054, Taiwan
Abstract
We incorporate the early zero-block detection technique into the UMHexagonS algorithm, which has already been adopted in the H.264/AVC JM reference software, to speed up the motion estimation process. A nearly sufficient condition is derived for early zero-block detection. Although the conventional early zero-block detection method achieves a significant reduction in computation, the accompanying PSNR loss is not negligible, especially at high quantization parameters (QPs) or in low bit-rate coding. This paper modifies the UMHexagonS algorithm with the early zero-block detection technique to improve its coding performance. The experimental results reveal that the improved UMHexagonS algorithm greatly reduces computation while maintaining very high coding efficiency.
1. Introduction
The newest international video coding standard, H.264/AVC, has recently been approved by the ITU-T (as Recommendation H.264) and by ISO/IEC as the international standard MPEG-4 Part 10 Advanced Video Coding (AVC) [1]. H.264/AVC achieves significantly better performance in both PSNR and visual quality at the same bit-rate than prior video coding standards such as MPEG-4 Part 2 and H.263. Two important techniques behind this gain are variable block-size motion estimation and rate-distortion optimization; however, the computational complexity of H.264/AVC increases dramatically because of the many variable block-size modes that must be evaluated.
Many
fast and efficient methods for motion estimation (ME) have been proposed in
recent years to reduce computational cost and maintain coding performance. In
general, there are two ways to reduce computation. One is to speed up the ME
algorithms themselves, such as the hybrid unsymmetrical-cross multihexagon-grid
search (UMHexagonS) algorithm [2], which has been adopted in JM reference
software. The other is to terminate the ME calculation by early detection of the zero-blocks (ZBs) of discrete
cosine transform (DCT) coefficients after quantization. Xie et al. [3] established a
zero-block condition based on the following criterion:
(1)where
and
is residual samples between the current
macroblock and the reference macroblock. For H.264, the relation between
and quantization parameter is
.
This criterion has been employed in the JM reference software. In [4, 5], the early zero-block detection approach was applied to
the motion search process using a threshold of
for comparison with the sum of difference of
block size (
) and deciding whether
DCT is a
zero-block. The motion search stops when all zero-blocks are detected. This
results in significant computational savings, especially for low bit-rate
coding. The threshold of
(corresponding to
in
discrete cosine transform and
quantization (DCT/Q)) is not sufficient, and
it could improperly detect a great number of zero-blocks, leading to a severe
degradation in coding performance.
Some sufficient but not necessary conditions for zero-block detection of DCT coefficients after quantization were derived by examining the sum of absolute differences (SAD) between the current macroblock and the reference macroblock [6, 7]. Although these conditions detect zero-blocks of DCT coefficients correctly, numerous zero-blocks still remain undetected. Based on Moon’s method [7], a technique using an adaptive threshold was suggested to enhance the zero-block detection capability [8].
In this work, we
derive a nearly sufficient condition based on the ensemble average of all
DCT coefficients. The nearly sufficient
condition for zero-block detection is then applied to both motion search and
DCT/Q calculation in the UMHexagonS algorithm. The
experimental results reveal that a significant improvement in computation
reduction can be achieved compared to methods using the other two sufficient
conditions, while high coding efficiency is still maintained.
2. A Nearly Sufficient Condition for Zero-Block Detection
To allow an integer implementation of the transform, the $4\times 4$ DCT in H.264/AVC is approximated by the following form:
(2)where
,
and
.
The basic quantization operation is given by
(3)
The value of quantization parameter (QP) varies in the range 0–51. The quantizer step size
is used to control bit-rate and video quality.
With postscaling factor (PF) considered with the quantizer, the quantized
output
can be written as
(4)where
is the entry of the core 2D transform
.
To avoid any division operation, the factor (
) is implemented by a multiplication factor
and a right shift:
(5) with
(6)
where
,
% denotes the modular operator, and
is the multiplication factor. The quantized
coefficient can be implemented using integer arithmetic:
(7) where $\gg$ represents a binary right shift, and $f$ is $2^{qbits}/6$ for inter blocks or $2^{qbits}/3$ for intra blocks.
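As an illustration of the integer-arithmetic quantization described above, the following minimal sketch applies the core $4\times 4$ transform and then the multiply-and-shift quantizer. It assumes the multiplication factors commonly tabulated for H.264/AVC, $qbits = 15 + \lfloor QP/6 \rfloor$, and rounding offsets of $2^{qbits}/6$ (inter) and $2^{qbits}/3$ (intra); the names `CF`, `MF_TABLE`, `mf`, `core_transform`, `quantize`, and `is_zero_block` are chosen here for illustration and do not come from the JM software.

```python
# Core 4x4 forward transform matrix of H.264/AVC (integer entries).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

# Commonly tabulated multiplication factors, indexed by QP % 6, for the
# three position classes: (even, even), (odd, odd), and mixed positions.
MF_TABLE = [
    (13107, 5243, 8066),
    (11916, 4660, 7490),
    (10082, 4194, 6554),
    (9362, 3647, 5825),
    (8192, 3355, 5243),
    (7282, 2893, 4559),
]


def mf(qp, i, j):
    """Multiplication factor for coefficient position (i, j)."""
    even_even, odd_odd, mixed = MF_TABLE[qp % 6]
    if i % 2 == 0 and j % 2 == 0:
        return even_even
    if i % 2 == 1 and j % 2 == 1:
        return odd_odd
    return mixed


def core_transform(x):
    """Return W = CF * X * CF^T for a 4x4 residual block x (list of lists)."""
    tmp = [[sum(CF[i][k] * x[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return [[sum(tmp[i][k] * CF[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]


def quantize(x, qp, intra=False):
    """Quantize a 4x4 residual block using only integer arithmetic."""
    qbits = 15 + qp // 6
    f = (1 << qbits) // (3 if intra else 6)  # assumed rounding offset
    w = core_transform(x)
    z = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            level = (abs(w[i][j]) * mf(qp, i, j) + f) >> qbits
            z[i][j] = level if w[i][j] >= 0 else -level
    return z


def is_zero_block(x, qp, intra=False):
    """True if every quantized coefficient of the block is zero."""
    return all(v == 0 for row in quantize(x, qp, intra) for v in row)
```

For a flat residual block of ones, only the DC coefficient of the core transform is nonzero (it equals 16), so the block is quantized to a zero-block once QP is large enough.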
Sousa
[6] derived a simple sufficient condition under which each quantized coefficient
becomes zero for
DCT. To derive the sufficient condition for the $4\times 4$ DCT, the PF factor is absorbed back into the core 2D transform, and the DCT coefficients are rewritten as Y:
(8)where
,
and
.
Each coefficient
can be written as
(9)
(10)for all DCT coefficients. For interblock encoding,
the DCT coefficient is quantized as zero when the quantized coefficient
satisfies
,
that is,
(11) From (10) and (11), it is easy to show that the
DCT
is a zero-block if the sum of absolute differences
satisfies
(12) This is Sousa’s sufficient condition for zero-block
detection.
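The flavor of such a sufficient condition can be checked numerically. The sketch below does not reproduce (12) itself; it derives a Sousa-style SAD bound directly from the integer quantizer, under the assumption that the standard H.264/AVC multiplication factors and the inter rounding offset $2^{qbits}/6$ are used. Since each core-transform coefficient satisfies $|W_{ij}| \le g_{ij}\cdot\mathrm{SAD}$, where $g_{ij}$ is the product of the largest absolute entries of rows $i$ and $j$ of the core matrix, every coefficient quantizes to zero whenever $\mathrm{SAD}\cdot\max_{ij}(g_{ij}\,MF_{ij}) + f < 2^{qbits}$. All function names are hypothetical.

```python
import random

CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
# Commonly tabulated MF values per QP % 6: (even,even), (odd,odd), mixed.
MF_TABLE = [(13107, 5243, 8066), (11916, 4660, 7490), (10082, 4194, 6554),
            (9362, 3647, 5825), (8192, 3355, 5243), (7282, 2893, 4559)]


def mf(qp, i, j):
    """Multiplication factor for coefficient position (i, j)."""
    even_even, odd_odd, mixed = MF_TABLE[qp % 6]
    if i % 2 == 0 and j % 2 == 0:
        return even_even
    return odd_odd if (i % 2 == 1 and j % 2 == 1) else mixed


def sad_bound(qp):
    """Largest integer SAD that guarantees an all-zero inter 4x4 block."""
    qbits = 15 + qp // 6
    f = (1 << qbits) // 6  # assumed inter rounding offset
    row_max = [max(abs(c) for c in row) for row in CF]
    # Worst-case gain from SAD to |W_ij| * MF_ij over all positions.
    worst = max(row_max[i] * row_max[j] * mf(qp, i, j)
                for i in range(4) for j in range(4))
    # Require SAD * worst + f < 2^qbits.
    return ((1 << qbits) - f - 1) // worst


def is_zero_block(x, qp):
    """Exact check: transform and quantize, then test for all zeros."""
    qbits = 15 + qp // 6
    f = (1 << qbits) // 6
    for i in range(4):
        for j in range(4):
            w = sum(CF[i][r] * x[r][c] * CF[j][c]
                    for r in range(4) for c in range(4))
            if (abs(w) * mf(qp, i, j) + f) >> qbits:
                return False
    return True


def random_block_with_sad(sad, rng):
    """Random 4x4 residual block whose SAD equals `sad` exactly."""
    signs = [rng.choice((-1, 1)) for _ in range(16)]
    mags = [0] * 16
    for _ in range(sad):
        mags[rng.randrange(16)] += 1
    return [[signs[4 * r + c] * mags[4 * r + c] for c in range(4)]
            for r in range(4)]
```

Any block produced by `random_block_with_sad(sad_bound(qp), rng)` passes `is_zero_block`, whereas blocks with a larger SAD may or may not be zero-blocks; this gap is exactly why a purely sufficient condition leaves many zero-blocks undetected.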
Moon et al. [7] derived a more precise sufficient condition for zero-block detection by examining the integer transform and quantization in H.264/AVC, which is summarized as follows:
(1)
if
, then
DCT is a zero-block, and where
(13)
(2)
if
and
, then
DCT is also a zero-block where the parameters
and
are, respectively, given by
(14)
Interestingly,
note that

is exactly identical to Sousa’s condition. As
can be seen, the condition varies with

.
An intensive study indicates that this sufficient condition varies within a
range

,
which is a little higher than Sousa’s condition (

).
2.1. A Nearly Sufficient Condition Based on Ensemble Average of DCT Coefficients
In this section, a
nearly sufficient condition is derived based upon the ensemble average of all
DCT coefficients by summing up all
DCT
coefficients
.
The summation over all DCT coefficients can be written as
(15) Define
,
and the ensemble average
can be upper-bounded as follows:
(16)or
(17) After some manipulation,
was found to be 3.7975. Instead, using
,
if the ensemble average of DCT coefficients
is applied to (11), the following upper bound
for zero-block detection can be obtained:
(18) Although the ensemble average condition is not strictly sufficient and may occasionally detect a zero-block incorrectly, the experiments indicate that only a very small portion of DCT blocks is incorrectly detected as zero-blocks. Moreover, compared to both Sousa’s and Moon’s conditions, more zero-blocks can be detected correctly using the derived condition.
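The constant 3.7975 quoted above can be reproduced numerically. The sketch below assumes that the scaled transform is $Y = A X A^{T}$, with $A$ built from $a = 1/2$ and $b = \sqrt{2/5}$ as in the usual H.264/AVC approximation; the matrix layout and all names are assumptions made for illustration. By the triangle inequality, $\sum_{i,j}|Y_{ij}| \le \sum_{x,y}|X_{xy}|\sum_{i,j}|A_{ix}||A_{jy}|$, so the sum of all coefficient magnitudes is at most $K\cdot\mathrm{SAD}$, where $K$ is the largest product of two column sums of $|A|$, and the ensemble average is at most $(K/16)\cdot\mathrm{SAD}$.

```python
import math

a = 0.5
b = math.sqrt(2.0 / 5.0)

# Scaled 4x4 transform matrix (post-scaling absorbed), assumed form.
A = [
    [a, a, a, a],
    [b, b / 2, -b / 2, -b],
    [a, -a, -a, a],
    [b / 2, -b, b, -b / 2],
]


def column_abs_sums(m):
    """Sum of absolute values in each column of a 4x4 matrix."""
    return [sum(abs(m[i][x]) for i in range(4)) for x in range(4)]


def sum_bound_constant(m):
    """K such that sum_ij |Y_ij| <= K * SAD for Y = m X m^T."""
    cols = column_abs_sums(m)
    return max(cx * cy for cx in cols for cy in cols)


def max_single_coefficient_gain(m):
    """Largest g such that |Y_ij| <= g * SAD holds for every (i, j)."""
    return max(abs(v) for row in m for v in row) ** 2


def dct_abs_sum(x):
    """Sum of |Y_ij| over all 16 coefficients of Y = A X A^T."""
    return sum(
        abs(sum(A[i][r] * x[r][c] * A[j][c]
                for r in range(4) for c in range(4)))
        for i in range(4) for j in range(4)
    )


K = sum_bound_constant(A)          # close to 3.7975
AVERAGE_GAIN = K / 16.0            # bound on the ensemble average per unit SAD
```

The bound is tight: a residual block with a single nonzero sample attains $\sum_{i,j}|Y_{ij}| = K\cdot\mathrm{SAD}$. The per-coefficient gain `max_single_coefficient_gain(A)` is $b^{2} = 0.4$, which is larger than `AVERAGE_GAIN`, and this is why a condition on the average admits a higher SAD threshold than a condition on every individual coefficient.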
2.2. Zero-Block Detection Capability and Computation Reduction in DCT/Quantization
The various thresholds
for zero-block detection as a function of QP are plotted in Figure
1.
Note that both Sousa’s and Moon’s conditions are theoretically sufficient, but
not for the thresholds
and
.
The zero-block detection capabilities of the various thresholds on the news and paris sequences are plotted in Figure 2. Although both Sousa’s and Moon’s conditions are theoretically sufficient, they detect fewer zero-blocks than the other two conditions. The threshold
achieves the best zero-block detection capability; however, it also detects numerous improper zero-blocks, which could lead to severe performance degradation. The percentages of zero-blocks detected improperly using these two nonsufficient conditions are shown in Figure 3. As
can be seen, less than 1% of improper zero-blocks were found for the ensemble
average threshold
,
while more than 9% for the threshold
for
.
Figure 1: Thresholds versus QP.
Figure 2: Zero-block detecting capability.
Figure 3: Percentage of improperly detected zero-blocks.
To evaluate the performance of the previously mentioned conditions for early zero-block detection, an experiment was performed on the DCT/Q calculation. Table 1 displays the savings in total encoding time obtained in DCT/Q, as well as the PSNR loss, measured on the news sequence for different QPs. The integer transform and quantization occupy only about 5% of the total encoding time. Note that no loss in either PSNR or bit-rate was found for Sousa’s and Moon’s conditions. As shown, the threshold
can achieve a significant reduction in DCT/Q
computation with a negligible PSNR loss. Up to 3% of total encoding time can be
saved with PSNR loss of only 0.005 dB for
. The threshold
[4], however, suffers severe PSNR degradation due to improper zero-block detection, although the computation in DCT/Q can be further reduced. Consequently, the threshold
is not analyzed further.
Table 1: Encoding time saving and PSNR loss in DCT/Q.
3. Conventional Methods to Adopt Zero-Block Detection in the UMHexagonS Algorithm
In H.264/AVC, interframe motion estimation is performed for 7 different block sizes (denoted as modes): $16\times 16$, $16\times 8$, $8\times 16$, $8\times 8$, $8\times 4$, $4\times 8$, and $4\times 4$. Motion estimation involves finding a macroblock in a previously encoded reference frame that best matches the current macroblock, using the SAD between the current and reference area samples:
(19)
In
the early termination method of motion estimation, each
in
is compared with a threshold; and if all
satisfy sufficient or nearly sufficient
conditions, the motion search stops. In addition, the DCT/Q calculation need not be done
if the
DCT is a zero-block. This leads to a great
reduction in computation. Since the conventional early zero-block detection
method only requires a comparison of
with a threshold, this approach can be applied
to all kinds of motion searches, such as full search and all other fast search
algorithms. This has been investigated in [4, 5].
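The early termination test described above amounts to a per-sub-block comparison, which can be sketched as follows. The function names and the plain list-of-lists block representation are assumptions for illustration; the threshold is passed in as a parameter because its value depends on which condition (sufficient or nearly sufficient) is used.

```python
def sad_4x4_blocks(cur, ref):
    """Return the SAD of every 4x4 sub-block of two equally sized blocks.

    `cur` and `ref` are lists of rows whose height and width are
    multiples of 4 (for example 16x16, 8x16, or 4x4).
    """
    height, width = len(cur), len(cur[0])
    sads = []
    for by in range(0, height, 4):
        for bx in range(0, width, 4):
            sads.append(sum(
                abs(cur[y][x] - ref[y][x])
                for y in range(by, by + 4)
                for x in range(bx, bx + 4)
            ))
    return sads


def can_stop_search(cur, ref, threshold):
    """True if every 4x4 SAD is below the threshold.

    In that case all 4x4 sub-blocks are declared zero-blocks, so the
    motion search can stop and DCT/Q can be skipped for this block.
    """
    return all(s < threshold for s in sad_4x4_blocks(cur, ref))
```

Because the test only needs the SAD values that the motion search computes anyway, it adds little overhead per candidate position.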
In this section, we apply
the various zero-block detection methods to the UMHexagonS algorithm and investigate the
performance. The simulation
conditions are tabulated in Table 2. Table 3 displays the average search points
per block for different QPs conducted on the news sequence
achieved using various zero-block detection thresholds. As shown, the average
search points decrease with increasing threshold. For the news sequence
and
, up to 78% of average search points (14.09 reduced to 3.04) in
the motion estimation can be saved when utilizing the zero-block detection
approach using threshold
much higher than the other two sufficient
conditions (9.11 and 7.26, resp.). The average
PSNR loss, bit-rate increment, and motion estimation time saving versus QP are also compared using various thresholds and tabulated in Table 4. As shown, the early zero-block detection using a
nearly sufficient condition (i.e., with threshold
) significantly outperforms other thresholds
in terms of computation for any bit-rate coding. As high as 56% of motion
estimation time can be saved for
compared to the UMHexagonS
algorithm.
Table 2: Simulation conditions.
Table 3: Average search points per frame achieved by various thresholds.
Table 4: Performance comparison on news sequence.
The PSNR degradation, to whatever extent it occurs, becomes more severe for low bit-rate coding or high QP. Table 5 displays the PSNR loss measured on several video sequences for
.
As shown, the conventional zero-block detection incurs a PSNR loss of 0.212 dB on the foreman sequence. This phenomenon is illustrated in Figure 4, which shows
the SAD error surface and the corresponding search iterations using the
UMHexagonS algorithm in mode
for a macroblock (42nd MB, 10th frame) in the foreman sequence. As shown, it requires 110 search points
for the UMHexagonS algorithm to find the minimum error (
at the 26th iteration). The search
stops at the 26th iteration and the minimum error can also be found when the
conventional zero-block detection method is applied to the UMHexagonS
algorithm with
(threshold
). However, the search stops at the first
iteration where
as QP is increased to
, which corresponds to the threshold
; this leads to severe performance
degradation. As the quantization parameter increases, the degradation becomes more severe.
Table 5: PSNR loss using nearly sufficient condition for

.
Figure 4: Foreman (42nd MB,10th frame) (a) SAD error surface (b) search iteration using UMHexagonS
algorithm.
4. Improved UMHexagonS Algorithm
The conventional early zero-block detection technique cannot give satisfactory coding performance when applied to the UMHexagonS algorithm for large quantization step sizes. In this section, we modify the UMHexagonS algorithm using the early zero-block detection technique to achieve high coding efficiency. Several commonly used video sequences with different motion contents (4 QCIF sequences: foreman, carphone, football, coastguard; and 4 CIF sequences: stefan, mobile, paris, tempete) were simulated using the full search algorithm with a search range
.
The experimental results indicate that a large proportion of the global minima lie near the search center, especially at the zero MV (0,0) (38% on average), along the horizontal direction (27% on average), and along the vertical direction (18% on average). The early zero-block detection technique is therefore not applied at these search points, so as to preserve coding performance. In addition, the motion search does not stop immediately when the nearly sufficient condition is satisfied. Instead, the diamond search is performed to find a smaller SAD. The improved algorithm is illustrated in Figure 5 and summarized as follows.
Figure 5: Early zero-block detection for motion search and DCT/Q.
Step 1. Predict the initial search point.
Step 2. Perform unsymmetrical-cross
search.
Step 3. Perform uneven multi-hexagon-grid search. If all
satisfy the nearly sufficient condition in (16), the motion search stops in this step and jumps directly to the diamond search in Step 4.
Step 4. Perform extended hexagon-based search. Similarly, if all
satisfy the nearly sufficient condition in the hexagon search, then jump directly to the diamond search.
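The control flow of Steps 1 to 4 can be sketched as follows. This is a simplified illustration under several assumptions, not the JM implementation: each search stage is reduced to a fixed list of candidate offsets around the current best point, the early exit is tested only when a candidate improves the best cost, and the names `staged_search`, `small_diamond_refine`, `cost`, and `is_zero_block_at` are hypothetical. Stages that cover the search center and the horizontal and vertical directions are flagged so that early detection is not applied there.

```python
def small_diamond_refine(cost, start, max_iters=16):
    """Greedy small-diamond refinement around `start`."""
    best, best_cost = start, cost(start)
    for _ in range(max_iters):
        improved = False
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            cand = (best[0] + dx, best[1] + dy)
            c = cost(cand)
            if c < best_cost:
                best, best_cost, improved = cand, c, True
        if not improved:
            break
    return best, best_cost


def staged_search(cost, is_zero_block_at, start, stages):
    """Run the search stages with early zero-block exit.

    `stages` is a list of (offsets, allow_early_exit) pairs. When early
    exit is allowed and an improving candidate satisfies the zero-block
    test, the remaining stages are skipped, but the diamond refinement
    is still performed to look for a smaller SAD.
    """
    best, best_cost = start, cost(start)
    for offsets, allow_early_exit in stages:
        center = best
        stop = False
        for dx, dy in offsets:
            cand = (center[0] + dx, center[1] + dy)
            c = cost(cand)
            if c < best_cost:
                best, best_cost = cand, c
                if allow_early_exit and is_zero_block_at(cand):
                    stop = True
                    break
        if stop:
            break
    return small_diamond_refine(cost, best)
```

In use, `cost` would return the SAD (or rate-distortion cost) at a motion vector, and `is_zero_block_at` would compare every $4\times 4$ SAD at that motion vector with the nearly sufficient threshold. Keeping the final diamond refinement even after an early exit mirrors the design choice above: the search is shortened, but it does not settle for the first position that merely satisfies the threshold.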
The average PSNR loss, bit-rate increment, and ME time saving of the improved algorithm versus QP are compared with those of the UMHexagonS algorithm and tabulated in Table 6. As shown, the computation is greatly reduced, with up to 55% of the ME computation saved, while a very good rate-distortion performance is maintained. A gain of 0.128 dB in PSNR is obtained by the improved algorithm on the foreman sequence for
with a slight
increase in computation, compared to the conventional early zero-block
detection method.
Table 6: PSNR loss, bit-rate and ME time saving.
5. Conclusion
In this paper, we modified the early termination of the UMHexagonS algorithm to avoid the severe performance degradation at high QP. In addition, we derived a nearly sufficient condition for zero-block detection of
DCT coefficients after quantization, based
upon the ensemble average of all
DCT coefficients. The nearly sufficient
condition for zero-block detection is shown to have excellent zero-block
detecting capability, while improper zero-block detection is negligible. The early
zero-block detection approach with a nearly sufficient condition (threshold
) was then applied to both motion search and
DCT/Q calculation in a fast-motion estimation algorithm (UMHexagonS algorithm). The simulation results reveal that a significant
improvement in computation reduction (up to 55%) can be achieved with negligible
performance degradation compared to the UMHexagonS algorithm.
References
[1] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560–576, 2003.
[2] Z. Chen, J. Xu, Y. He, and J. Zheng, “Fast integer-pel and fractional-pel motion estimation for H.264/AVC,” Journal of Visual Communication and Image Representation, vol. 17, no. 2, pp. 264–290, 2006.
[3] Z. Xie, Y. Liu, J. Liu, and T. Yang, “A general method for detecting all-zero-blocks prior to DCT and quantization,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 2, pp. 237–241, 2007.
[4] J.-F. Yang, S.-C. Chang, and C.-Y. Chen, “Computation reduction for motion search in low rate video coders,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 10, pp. 948–951, 2002.
[5] L. Yang, K. Yu, J. Li, and S. Li, “An effective variable block-size early termination algorithm for H.264 video coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 6, pp. 784–788, 2005.
[6] L. A. Sousa, “General method for eliminating redundant computations in video coding,” Electronics Letters, vol. 36, no. 4, pp. 306–307, 2000.
[7] Y. H. Moon, G. Y. Kim, and J. H. Kim, “An improved early detection algorithm for all-zero blocks in H.264 video encoding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 8, pp. 1053–1057, 2005.
[8] D. Wu, K. P. Lim, T. K. Chiew, J. Y. Tham, and K. H. Goh, “An adaptive thresholding technique for the detection of all-zeros blocks in H.264,” in Proceedings of the IEEE International Conference on Image Processing (ICIP '07), vol. 5, pp. 329–332, San Antonio, Tex, USA, September 2007.