Research Article

An Efficient Multi-Core SIMD Implementation for H.264/AVC Encoder

Table 4: SIMD instructions for video coding.

| Instruction description | Affected modules | Notes |
| --- | --- | --- |
| Horizontal add: adds all the elements inside a vector register and produces a scalar result | ME, intraprediction | Speeds up SAD |
| Horizontal permute: rearranges elements inside a vector register | Intraprediction, DCT/Q/IQ/IDCT | Allows zig-zag scan and speeds up intra diagonal modes |
| Concatenate: concatenates two vector registers into an intermediate composite, then shifts the composite to the right by a variable offset | Motion estimation and compensation | Allows software implementation of unaligned load |
| Promotion/demotion of precision: efficient support for promoting element precision while loading data from memory, and for demoting the precision (with saturation) while storing data to memory | All the main modules | Speeds up load and store operations in several modules |
| Absolute subtraction: for every element "a" in the first vector and every element "b" in the second vector, performs the operation $|a - b|$ | ME, intraprediction, deblocking filter | Speeds up SAD in conjunction with horizontal add; used in the deblocking filter |
| Shift with round: for every element "a" in the vector operand, performs the operation $(a + 2^{n-1}) \gg n$, where n is a scalar value | IDCT, deblocking filter, motion compensation | Speeds up pixel interpolation |
| Average: for every element "a" in the first vector and every element "b" in the second vector, performs the operation $(a + b + 1) \gg 1$ | Intraprediction, deblocking filter, motion compensation | Speeds up pixel interpolation |
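
The semantics of several instructions in Table 4 can be illustrated with a scalar reference model in C. This is only a sketch under stated assumptions, not the instruction set or code of the paper: the vector width of 16 lanes, all function names, and the lane ordering of the concatenate operation are hypothetical, and the rounding formulas are assumed to be the conventional ones, |a - b|, (a + 2^(n-1)) >> n, and (a + b + 1) >> 1.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16 /* assumed vector width: 16 elements per register */

/* Absolute subtraction: per-lane |a - b| on unsigned 8-bit pixels. */
static void vec_absdiff(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    for (size_t i = 0; i < LANES; i++)
        out[i] = (uint8_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
}

/* Horizontal add: sums all lanes of one vector into a scalar. */
static uint32_t vec_hadd(const uint8_t *v)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < LANES; i++)
        sum += v[i];
    return sum;
}

/* SAD of one 16-pixel row: absolute subtraction followed by horizontal add. */
static uint32_t sad_row16(const uint8_t *cur, const uint8_t *ref)
{
    uint8_t diff[LANES];
    vec_absdiff(cur, ref, diff);
    return vec_hadd(diff);
}

/* Shift with round: per-lane (a + 2^(n-1)) >> n, for 1 <= n <= 15.
 * Right-shifting a negative value is implementation-defined in C;
 * an arithmetic shift is assumed here. */
static void vec_shift_round(const int16_t *a, unsigned n, int16_t *out)
{
    for (size_t i = 0; i < LANES; i++)
        out[i] = (int16_t)(((int32_t)a[i] + (1 << (n - 1))) >> n);
}

/* Average: per-lane (a + b + 1) >> 1, rounding halves upward. */
static void vec_avg(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    for (size_t i = 0; i < LANES; i++)
        out[i] = (uint8_t)(((unsigned)a[i] + b[i] + 1u) >> 1);
}

/* Concatenate and shift: treats lo followed by hi as one 32-element
 * composite and extracts LANES elements starting at offset (0..LANES).
 * With lo and hi loaded from two consecutive aligned addresses, this
 * emulates an unaligned load. The lane ordering is an assumption. */
static void vec_concat_shift(const uint8_t *lo, const uint8_t *hi,
                             unsigned offset, uint8_t *out)
{
    for (size_t i = 0; i < LANES; i++) {
        size_t k = i + offset;
        out[i] = (k < LANES) ? lo[k] : hi[k - LANES];
    }
}

/* Demotion with saturation: clamps 16-bit values to the 8-bit pixel range. */
static void vec_demote_sat(const int16_t *a, uint8_t *out)
{
    for (size_t i = 0; i < LANES; i++)
        out[i] = (uint8_t)(a[i] < 0 ? 0 : (a[i] > 255 ? 255 : a[i]));
}
```

In a motion estimation loop, a function such as `sad_row16` would be called once per row of a 16x16 macroblock and the row results accumulated. A real SIMD implementation replaces each loop with one instruction, or a few, operating on a whole register.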