Research Article

An Efficient Multi-Core SIMD Implementation for H.264/AVC Encoder

Algorithm 1

Hadamard transform xSTream SIMD implementation.
/*first level: one 16-sample butterfly*/
/*(s0..s7)+(s8..s15)*/
vaddh out_low = in_low, in_high
/*(s0..s7)−(s8..s15)*/
vsubh out_high = in_low, in_high
/*data reordering*/
/*0 1 2 3 8 9 10 11*/
vmrgbl in_low = out_low, out_high, perm
/*4 5 6 7 12 13 14 15*/
vmrgbu in_high = out_low, out_high, perm
/*second level: two 8-sample butterflies*/
vaddh out_low = in_low, in_high
vsubh out_high = in_low, in_high
/*data reordering*/
/*0 1 8 9 4 5 12 13*/
vmrge in_low = out_low, out_high
/*2 3 10 11 6 7 14 15*/
vmrgo in_high = out_low, out_high
/*third level: four 4-sample butterflies*/
vaddh out_low = in_low, in_high
vsubh out_high = in_low, in_high
/*data reordering*/
/*0 8 2 10 4 12 6 14*/
vmrgeh in_low = out_low, out_high
/*1 9 3 11 5 13 7 15*/
vmrgoh in_high = out_low, out_high
/*fourth level: eight 2-sample butterflies*/
vaddh out_low = in_low, in_high
vsubh out_high = in_low, in_high
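
For readers without access to the xSTream instruction set, the four butterfly levels of Algorithm 1 can be sketched in scalar C. This is only an illustrative reference under stated assumptions, not the authors' code: the function name hadamard16 is hypothetical, samples are assumed to be 16-bit halfwords (as the "h" suffix of vaddh and vsubh suggests), and the merge instructions are not modelled. Instead of reordering data between levels, the sketch halves the butterfly stride (8, 4, 2, 1), which pairs the same samples as the lane pairings given in the reordering comments. The results are therefore left in natural index order, which may differ from the lane order of the SIMD listing.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar sketch of a 16-point Hadamard butterfly network (hypothetical
 * helper, not part of the original listing). Works in place on s[0..15]. */
static void hadamard16(int16_t s[16])
{
    /* half = 8, 4, 2, 1 corresponds to the first to fourth level. */
    for (size_t half = 8; half >= 1; half /= 2) {
        /* Each level processes 16 / (2 * half) independent butterflies. */
        for (size_t base = 0; base < 16; base += 2 * half) {
            for (size_t i = base; i < base + half; ++i) {
                int16_t a = s[i];
                int16_t b = s[i + half];
                s[i]        = (int16_t)(a + b); /* role of vaddh */
                s[i + half] = (int16_t)(a - b); /* role of vsubh */
            }
        }
    }
}
```

In the scalar form each level costs 16 additions or subtractions, whereas the listing above issues one vaddh and one vsubh per level on 8-lane vectors, plus two merge instructions between consecutive levels.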