Research Article
An Efficient Multi-Core SIMD Implementation for H.264/AVC Encoder
Algorithm 2
Unaligned load SIMD implementation with concatenate instruction.
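uint32 AddressAt128, Offset;
vector_16b_sw Va, Vb, Vout;

/* Round the pointer down to the enclosing 16-byte (128-bit) boundary. */
AddressAt128 = ((uint32) (mref_ptr)) & (~0xF);
/* Byte offset of the requested data inside that aligned block. */
Offset = ((uint32) (mref_ptr)) & (0xF);
/* Two aligned loads covering the 32 bytes that contain the unaligned vector. */
Va = ldq(AddressAt128, 0);
Vb = ldq(AddressAt128, 16);
/* Concatenate Va and Vb and extract the 16 bytes starting at Offset. */
Vout = wrot(Va, Vb, Offset);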
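To make the data flow of Algorithm 2 easier to follow, the sketch below emulates it in portable C. It is an illustration under stated assumptions, not the target processor's instruction set: `vec128`, `ldq_emul`, `wrot_emul`, and `load_unaligned` are hypothetical names, `ldq` is assumed to be an aligned 128-bit load from base plus byte offset, and `wrot` is assumed to return the 16 bytes that start at `Offset` within the 32-byte concatenation of its two operands, taken in memory order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit vector type: 16 bytes in memory order. */
typedef struct { uint8_t b[16]; } vec128;

/* Assumed behaviour of ldq: aligned 16-byte load from base + byte_off. */
static vec128 ldq_emul(uintptr_t base, unsigned byte_off) {
    vec128 v;
    memcpy(v.b, (const void *)(base + byte_off), 16);
    return v;
}

/* Assumed behaviour of wrot: concatenate va and vb into 32 bytes and
   return the 16 bytes that start at byte `offset` (0..15). */
static vec128 wrot_emul(vec128 va, vec128 vb, unsigned offset) {
    uint8_t cat[32];
    vec128 out;
    assert(offset < 16);
    memcpy(cat, va.b, 16);
    memcpy(cat + 16, vb.b, 16);
    memcpy(out.b, cat + offset, 16);
    return out;
}

/* Unaligned 16-byte load built from two aligned loads, as in Algorithm 2.
   Like the listing, it always reads the 32 bytes starting at the aligned
   address, so those bytes must be readable by the caller. */
static vec128 load_unaligned(const uint8_t *p) {
    uintptr_t addr    = (uintptr_t)p;
    uintptr_t aligned = addr & ~(uintptr_t)0xF;   /* AddressAt128 */
    unsigned  offset  = (unsigned)(addr & 0xF);   /* Offset */
    vec128 va = ldq_emul(aligned, 0);
    vec128 vb = ldq_emul(aligned, 16);
    return wrot_emul(va, vb, offset);
}
```

Because the second aligned load is issued even when the offset is zero, the reference buffer needs up to 16 readable bytes beyond the last vector requested; padding the buffer is one way to guarantee this.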