Scientific Programming
Volume 2015, Article ID 859491, 20 pages
http://dx.doi.org/10.1155/2015/859491
Research Article

OpenCL Performance Evaluation on Modern Multicore CPUs

School of Computer Science, College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA

Received 15 May 2014; Accepted 29 September 2014

Academic Editor: Xinmin Tian

Copyright © 2015 Joo Hwan Lee et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
