Scientific Programming
Volume 2015, Article ID 269764, 14 pages
http://dx.doi.org/10.1155/2015/269764
Research Article

Effective SIMD Vectorization for Intel Xeon Phi Coprocessors

1Mobile Computing and Compilers Software and Service Group, Intel Corporation, Santa Clara, CA 95054, USA
2Mobile Computing and Compilers Software and Service Group, Intel Corporation, 6/1 Prospect Akademika, Novosibirsk 125009, Russia

Received 15 May 2014; Accepted 29 September 2014

Academic Editor: Sunita Chandrasekaran

Copyright © 2015 Xinmin Tian et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. Intel Corporation, “Intel Xeon Phi Coprocessor System Software Developers Guide,” 2012, http://software.intel.com/en-us/mic-developer.
  2. N. Satish, C. Kim, J. Chhugani et al., “Can traditional programming bridge the Ninja performance gap for parallel computing applications?” in Proceedings of the 39th Annual International Symposium on Computer Architecture (ISCA '12), pp. 440–451, June 2012.
  3. J. Reinders, “An Overview of Programming for Intel Xeon processor and Intel Xeon Phi Coprocessor,” 2012.
  4. Intel Corporation, Intel Advanced Vector Extensions Programming Reference, Document Number 319433-011, Intel Corporation, 2011.
  5. A. J. C. Bik, M. Girkar, P. M. Grey, and X. Tian, “Automatic intra-register vectorization for the Intel architecture,” International Journal of Parallel Programming, vol. 30, no. 2, pp. 65–98, 2002.
  6. A. J. C. Bik, D. L. Kreitzer, and X. Tian, “A case study on compiler optimizations for the Intel Core™ 2 Duo processor,” International Journal of Parallel Programming, vol. 36, no. 6, pp. 571–591, 2008.
  7. X. Tian, H. Saito, M. Girkar et al., “Compiling C/C++ SIMD extensions for function and loop vectorization on multicore-SIMD processors,” in Proceedings of the IEEE 26th International Parallel and Distributed Processing Symposium Workshops (IPDPSW '12), pp. 2349–2358, May 2012.
  8. H. J. Lu, M. Girkar, M. Matz, J. Hubicka, A. Jaeger, and M. Mitchell, “System V Application Binary Interface K1OM Architecture Processor Supplement,” Version 1.0, 2012, http://software.intel.com/en-us/forums/topic/278102.
  9. S. J. Aarseth, Gravitational N-Body Simulations: Tools and Algorithms, Cambridge Monographs on Mathematical Physics, Cambridge University Press, Cambridge, UK, 2003.
  10. A. G. Gray and A. W. Moore, “‘N-body’ problems in statistical learning,” in Advances in Neural Information Processing Systems (NIPS), pp. 521–527, 2000.
  11. M. Kachelrieß, M. Knaup, and O. Bockenbach, “Hyperfast perspective cone-beam backprojection,” in Proceedings of the IEEE Nuclear Science Symposium Conference Record, pp. 1679–1683, November 2006.
  12. R. Allen and K. Kennedy, “Automatic translation of FORTRAN programs to vector form,” ACM Transactions on Programming Languages and Systems, vol. 9, no. 4, pp. 491–542, 1987.
  13. A. E. Eichenberger, K. O'Brien, P. Wu et al., “Optimizing compiler for the CELL processor,” in Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT '05), pp. 161–172, IEEE, St. Louis, Mo, USA, September 2005.
  14. R. Karrenberg and S. Hack, “Whole-function vectorization,” in Proceedings of the 9th International Annual IEEE/ACM Symposium on Code Generation and Optimization, pp. 141–150, Chamonix, France, April 2011.
  15. S. Larsen and S. Amarasinghe, “Exploiting superword level parallelism with multimedia instruction sets,” in Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (PLDI '00), pp. 145–156, June 2000.
  16. P. Wu, A. E. Eichenberger, and A. Wang, “Efficient SIMD code generation for runtime alignment and length conversion,” in Proceedings of the International Symposium on Code Generation and Optimization (CGO '05), pp. 153–164, March 2005.
  17. Crescent Bay Software, VAST-F/AltiVec: Automatic Fortran Vectorizer for PowerPC Vector Unit, 2004.
  18. D. Nuzman and A. Zaks, “Outer-loop vectorization—revisited for short SIMD architectures,” in Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT '08), pp. 2–11, Toronto, ON, Canada, October 2008.
  19. D. Nuzman and R. Henderson, “Multi-platform auto-vectorization,” in Proceedings of the 4th International Symposium on Code Generation and Optimization (CGO '06), pp. 281–294, New York, NY, USA, March 2006.
  20. G. Cheong and M. S. Lam, “An optimizer for multimedia instruction sets,” in Proceedings of the 2nd SUIF Compiler Workshop, August 1997.
  21. J. Shin, M. Hall, and J. Chame, “Superword-level parallelism in the presence of control flow,” in Proceedings of the International Symposium on Code Generation and Optimization (CGO '05), pp. 165–175, IEEE Computer Society, March 2005.
  22. M. Klemm, A. Duran, X. Tian, H. Saito, D. Caballero, and X. Martorell, “Extending OpenMP* with vector constructs for modern multicore SIMD architectures,” in OpenMP in a Heterogeneous World: 8th International Workshop on OpenMP, IWOMP 2012, Rome, Italy, June 11–13, 2012. Proceedings, Lecture Notes in Computer Science, pp. 59–72, Springer, Berlin, Germany, 2012.
  23. OpenMP Architecture Review Board, “OpenMP Application Program Interface,” Version 4.0 (Release Candidate RC1), 2012.