Scientific Programming
Volume 2015, Article ID 269764, 14 pages
Research Article

Effective SIMD Vectorization for Intel Xeon Phi Coprocessors

1 Mobile Computing and Compilers Software and Service Group, Intel Corporation, Santa Clara, CA 95054, USA
2 Mobile Computing and Compilers Software and Service Group, Intel Corporation, 6/1 Prospect Akademika, Novosibirsk 125009, Russia

Received 15 May 2014; Accepted 29 September 2014

Academic Editor: Sunita Chandrasekaran

Copyright © 2015 Xinmin Tian et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Efficiently exploiting SIMD vector units is one of the most important factors in achieving high performance for application code running on Intel Xeon Phi coprocessors. In this paper, we present several effective SIMD vectorization techniques, including less-than-full-vector loop vectorization, Intel MIC specific alignment optimization, and small matrix transpose/multiplication 2D vectorization, all implemented in the Intel C/C++ and Fortran production compilers for Intel Xeon Phi coprocessors. We evaluate these techniques on a set of workloads drawn from several application domains. The results show performance gains of up to 12.5x on the Intel Xeon Phi coprocessor. We also demonstrate a 2000x speedup from the seamless integration of SIMD vectorization and parallelization.
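As a rough illustration of the idea behind less-than-full-vector loop vectorization, the C sketch below emulates a masked 16-lane vector operation in scalar code: the leftover iterations that do not fill a whole vector are handled as one vector step under a lane mask rather than as a separate scalar loop. This is not the compiler's implementation. The lane count `VLEN`, the helper name `masked_add`, and the bitmask scheme are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed lane count: a 512-bit vector holding 16 single-precision floats. */
#define VLEN 16

/* Hypothetical helper: a[i] += b[i] for i in [0, n).
   Inner loops over lanes stand in for single vector instructions. */
static void masked_add(float *a, const float *b, size_t n)
{
    size_t i = 0;

    /* Full-vector steps: every lane is active. */
    for (; i + VLEN <= n; i += VLEN) {
        for (size_t lane = 0; lane < VLEN; ++lane) {
            a[i + lane] += b[i + lane];
        }
    }

    /* Less-than-full-vector step: the remaining 1..VLEN-1 elements are
       processed as one vector step in which only the low lanes are enabled. */
    size_t rem = n - i;
    if (rem > 0) {
        uint32_t mask = (1u << rem) - 1u;   /* low `rem` bits set */
        for (size_t lane = 0; lane < VLEN; ++lane) {
            if (mask & (1u << lane)) {      /* inactive lanes touch no memory */
                a[i + lane] += b[i + lane];
            }
        }
    }
}
```

With `n = 19`, for instance, the sketch performs one full 16-lane step followed by one masked step with three active lanes; elements at index 19 and beyond are neither read nor written.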