Research Article

An Efficient Multi-Core SIMD Implementation for H.264/AVC Encoder

Algorithm 2

Unaligned load SIMD implementation with concatenate instruction.
uint32 AddressAt128, Offset;
vector_16b_sw Va, Vb, Vout;
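AddressAt128 = ((uint32) (mref_ptr)) & (~0xF); /* round down to the 16-byte (128-bit) boundary */
Offset = ((uint32) (mref_ptr)) & (0xF); /* byte offset of mref_ptr within that 16-byte block */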
Va = ldq(AddressAt128, 0);
Vb = ldq(AddressAt128, 16);
Vout = wrot(Va, Vb, Offset);
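
The listing above relies on target-specific instructions (ldq, wrot). As a rough illustration of the same idea in portable C, the sketch below emulates the two aligned loads and the concatenate step with byte arrays. The names vec128, load_aligned16, concat_extract, and load_unaligned16 are hypothetical and not taken from the paper. The sketch assumes that the concatenate step selects 16 consecutive bytes, starting at Offset, from the 32 bytes formed by Va followed by Vb; the byte ordering of the actual instruction may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit vector register. */
typedef struct { uint8_t b[16]; } vec128;

/* Emulates an aligned 16-byte load at (base + byte_off). */
static vec128 load_aligned16(const uint8_t *base, size_t byte_off) {
    vec128 v;
    memcpy(v.b, base + byte_off, sizeof v.b);
    return v;
}

/* Emulates the concatenate step (assumed semantics): treat (a, b) as one
   32-byte sequence and return the 16 bytes starting at `offset`. */
static vec128 concat_extract(vec128 a, vec128 b, unsigned offset) {
    uint8_t both[32];
    vec128 out;
    assert(offset < 16);
    memcpy(both, a.b, 16);
    memcpy(both + 16, b.b, 16);
    memcpy(out.b, both + offset, 16);
    return out;
}

/* Unaligned 16-byte load built from two aligned loads, following the
   structure of Algorithm 2. */
static vec128 load_unaligned16(const uint8_t *p) {
    uintptr_t addr = (uintptr_t)p;
    /* Round down to the 16-byte boundary and keep the remainder as offset. */
    const uint8_t *aligned = (const uint8_t *)(addr & ~(uintptr_t)0xF);
    unsigned offset = (unsigned)(addr & 0xF);
    vec128 va = load_aligned16(aligned, 0);   /* block containing p */
    vec128 vb = load_aligned16(aligned, 16);  /* next 16-byte block */
    return concat_extract(va, vb, offset);
}
```

Note that this scheme always reads the 32 bytes starting at the aligned address, even when the offset is zero, so the buffer must remain readable up to the end of the second block. The sketch uses uintptr_t rather than a 32-bit integer for the address so that it also behaves correctly on 64-bit hosts.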