VLSI Design

Research Article

An Efficient Multi-Core SIMD Implementation for H.264/AVC Encoder

SIMD instructions for video coding.


Instruction description	Affected modules	Notes

Horizontal add: adds all the elements inside a vector register and produces a scalar result	ME, intraprediction	Speeds up SAD
Horizontal permute: rearranges elements inside a vector register	Intraprediction, DCT/Q/IQ/IDCT	Allows zig-zag scan and speeds up intra diagonal modes
Concatenate: concatenates two vector registers into an intermediate composite, then shifts the composite to the right by a variable offset	Motion estimation and compensation	Allows software implementation of unaligned loads
Precision promotion/demotion: efficient support for promoting element precision while loading data from memory and for demoting precision (with saturation) while storing data to memory	All the main modules	Speeds up load and store operations in several modules
Absolute subtraction: for every element “a” in the first vector and the corresponding element “b” in the second vector, performs the following operation: |a − b|	ME, intraprediction, deblocking filter	Speeds up SAD in conjunction with horizontal add; used in deblocking filter
Shift with round: performs the following operation for every element “a” in the vector operand: (a + 2^(n−1)) >> n, where n is a scalar value	IDCT, deblocking filter, motion compensation	Speeds up pixel interpolation
Average: for every element “a” in the first vector and the corresponding element “b” in the second vector, performs the following operation: (a + b + 1) >> 1	Intraprediction, deblocking filter, motion compensation	Speeds up pixel interpolation
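
The arithmetic instructions in the table can be illustrated with a scalar reference model in C. This is only a sketch of the per-element semantics, not the authors' instruction set or implementation: the vector width of eight 16-bit lanes (`LANES`) and the function names `vec_absdiff`, `vec_hadd`, `sad8`, `vec_shift_round`, and `vec_avg` are hypothetical, and the rounding forms `(a + 2^(n-1)) >> n` and `(a + b + 1) >> 1` are the conventional ones used in H.264 pixel interpolation, assumed here rather than taken from the table.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* assumed vector width: 8 lanes of 16 bits (hypothetical) */

/* Absolute subtraction: out[i] = |a[i] - b[i]| for every lane.
   Assumes pixel-range inputs so the result fits in 16 bits. */
void vec_absdiff(const int16_t *a, const int16_t *b, int16_t *out) {
    for (size_t i = 0; i < LANES; i++) {
        int32_t d = (int32_t)a[i] - (int32_t)b[i];
        out[i] = (int16_t)(d < 0 ? -d : d);
    }
}

/* Horizontal add: sums all lanes of one vector into a scalar. */
int32_t vec_hadd(const int16_t *v) {
    int32_t sum = 0;
    for (size_t i = 0; i < LANES; i++) {
        sum += v[i];
    }
    return sum;
}

/* SAD of one 8-pixel row: absolute subtraction followed by horizontal add. */
int32_t sad8(const int16_t *a, const int16_t *b) {
    int16_t d[LANES];
    vec_absdiff(a, b, d);
    return vec_hadd(d);
}

/* Shift with round: out[i] = (a[i] + 2^(n-1)) >> n, with scalar n >= 1.
   Assumes non-negative inputs; in C, >> on negative values is
   implementation-defined. */
void vec_shift_round(const int16_t *a, int n, int16_t *out) {
    int32_t bias = (int32_t)1 << (n - 1);
    for (size_t i = 0; i < LANES; i++) {
        out[i] = (int16_t)(((int32_t)a[i] + bias) >> n);
    }
}

/* Average: out[i] = (a[i] + b[i] + 1) >> 1, i.e. the mean rounded up.
   Assumes non-negative inputs, as for pixel values. */
void vec_avg(const int16_t *a, const int16_t *b, int16_t *out) {
    for (size_t i = 0; i < LANES; i++) {
        out[i] = (int16_t)(((int32_t)a[i] + (int32_t)b[i] + 1) >> 1);
    }
}
```

In this model, `sad8` shows why the table pairs absolute subtraction with horizontal add: with both available as single instructions, the SAD of a pixel row reduces to two vector operations instead of a per-pixel loop.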