Research Article

Effective SIMD Vectorization for Intel Xeon Phi Coprocessors

Algorithm 4

do j = 1, 4
  do k = 1, 4
    sumx = 0.0
    sumy = 0.0
    do i = 1, 4
      sumx = sumx + matrixA(i,k) * matrixB(i,j)
      sumy = sumy + matrixA(i,k) * matrixB(j,i)
    enddo
    matrixC(k,j) = sumx
    matrixD(j,k) = sumy
  enddo
enddo
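
The listing above omits declarations and input data, so the following is a minimal self-contained sketch of how it might be exercised rather than the authors' test harness. The program name, the parameter n, and the integer-valued fill pattern are assumptions introduced only for illustration. The check relies on the observation that the loop nest computes matrixC as the product of the transpose of matrixA with matrixB, and matrixD as the product of matrixB with matrixA, so both results can be compared against the MATMUL intrinsic.

```fortran
program algorithm4_check
  implicit none
  integer, parameter :: n = 4          ! assumed size, matching the 4x4 loops of Algorithm 4
  real :: matrixA(n,n), matrixB(n,n), matrixC(n,n), matrixD(n,n)
  real :: sumx, sumy
  integer :: i, j, k

  ! Hypothetical input data: small integer values, so every product and
  ! partial sum is exactly representable in single precision.
  do j = 1, n
    do i = 1, n
      matrixA(i,j) = real(i + 2*j)
      matrixB(i,j) = real(3*i - j)
    enddo
  enddo

  ! Loop nest of Algorithm 4.
  do j = 1, n
    do k = 1, n
      sumx = 0.0
      sumy = 0.0
      do i = 1, n
        ! matrixB(i,j) is read with unit stride (Fortran is column-major);
        ! matrixB(j,i) is read with a stride of n elements.
        sumx = sumx + matrixA(i,k) * matrixB(i,j)
        sumy = sumy + matrixA(i,k) * matrixB(j,i)
      enddo
      matrixC(k,j) = sumx
      matrixD(j,k) = sumy
    enddo
  enddo

  ! Reference results: C = transpose(A) * B and D = B * A.
  if (any(matrixC /= matmul(transpose(matrixA), matrixB))) error stop "matrixC mismatch"
  if (any(matrixD /= matmul(matrixB, matrixA))) error stop "matrixD mismatch"
  print *, "ok"
end program algorithm4_check
```

Exact comparison is safe here only because the assumed inputs are small integers; with general floating-point data a tolerance-based comparison would be needed, since MATMUL may accumulate in a different order.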