An Efficient Multi-Core SIMD Implementation for H.264/AVC Encoder
Table 4: SIMD instructions for video coding.
Instruction description
Affected modules
Notes
Horizontal add: adds all the elements inside a vector register and produces a scalar result
ME, intraprediction
Speeds up SAD
Horizontal permute: rearranges elements inside a vector register
Intraprediction, DCT/Q/IQ/IDCT
Allows zig-zag scan and speeds up intra diagonal modes
Concatenate: concatenates two vector registers into an intermediate composite, then shifts the composite to the right by a variable offset
Motion estimation and compensation
Allows software implementation of unaligned load
Precision promotion/demotion: efficient support for promoting element precision while loading data from memory and demoting it (with saturation) while storing data to memory
All the main modules
Speeds up load and store operations in several modules
Absolute subtraction: for every element “a” in the first vector and the corresponding element “b” in the second vector, performs the following operation: $|a - b|$
ME, intraprediction, deblocking filter
Speeds up SAD in conjunction with horizontal add; used in deblocking filter
Shift with round: performs the following operation for every element “a” in the vector operand: $(a + 2^{n-1}) \gg n$, where n is a scalar value
IDCT, deblocking filter, motion compensation
Speeds up pixel interpolation
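As a reading aid for the rows above, the following is a minimal scalar sketch in C of how absolute subtraction and horizontal add combine into a SAD, and of a rounding shift. It is not the paper's implementation: the lane count `LANES`, the function names, and the element types are hypothetical, and the rounding shift assumes the conventional form (a + 2^(n-1)) >> n with n >= 1, which the extracted table text does not show.

```c
#include <stdint.h>
#include <stdlib.h>

#define LANES 8  /* assumed vector width, chosen only for illustration */

/* Absolute subtraction: out[i] = |a[i] - b[i]| for every lane. */
static void abs_sub(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    for (int i = 0; i < LANES; i++)
        out[i] = (uint8_t)abs((int)a[i] - (int)b[i]);
}

/* Horizontal add: reduce all lanes of one vector to a scalar sum. */
static uint32_t horizontal_add(const uint8_t *v)
{
    uint32_t sum = 0;
    for (int i = 0; i < LANES; i++)
        sum += v[i];
    return sum;
}

/* SAD of one row of pixels: absolute subtraction followed by horizontal add. */
static uint32_t sad_row(const uint8_t *cur, const uint8_t *ref)
{
    uint8_t diff[LANES];
    abs_sub(cur, ref, diff);
    return horizontal_add(diff);
}

/* Shift with round (assumed form): out[i] = (a[i] + 2^(n-1)) >> n, n >= 1.
 * Right-shifting a negative signed value is implementation-defined in C;
 * this sketch is only exercised with non-negative inputs. */
static void shift_round(const int32_t *a, int32_t *out, unsigned n)
{
    int32_t bias = (int32_t)1 << (n - 1);
    for (int i = 0; i < LANES; i++)
        out[i] = (a[i] + bias) >> n;
}
```

On a SIMD target each loop would map to one vector instruction, which is why the table pairs absolute subtraction with horizontal add for SAD.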
Average: for every element “a” in the first vector and the corresponding element “b” in the second vector, performs the following operation: $(a + b + 1) \gg 1$