Scientific Programming
Volume 2015, Article ID 621730, 15 pages
http://dx.doi.org/10.1155/2015/621730
Research Article

Multi-GPU Support on Single Node Using Directive-Based Programming Model

Department of Computer Science, University of Houston, Houston, TX 77004, USA

Received 15 May 2014; Accepted 29 September 2014

Academic Editor: Xinmin Tian

Copyright © 2015 Rengan Xu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. CUDA, http://www.nvidia.com/object/cuda_home_new.html.
  2. OpenCL Standard, http://www.khronos.org/opencl.
  3. HMPP Directives Reference Manual, (HMPP Workbench 3.1), 2015, https://www.olcf.ornl.gov/wp-content/uploads/2012/02/HMPPWorkbench-3.0_HMPP_Directives_ReferenceManual.pdf.
  4. The Portland Group, PGI Accelerator Programming Model for Fortran and C, Version 1.3, The Portland Group, 2010.
  5. OpenACC Standard Home, http://www.openacc-standard.org/.
  6. C. Liao, Y. Yan, B. R. de Supinski, D. J. Quinlan, and B. Chapman, “Early experiences with the OpenMP accelerator model,” in OpenMP in the Era of Low Power Devices and Accelerators: Proceedings of the 9th International Workshop on OpenMP, IWOMP 2013, Canberra, ACT, Australia, September 16–18, 2013, vol. 8122 of Lecture Notes in Computer Science, pp. 84–98, Springer, Berlin, Germany, 2013.
  7. X. Tian, R. Xu, Y. Yan, Z. Yun, S. Chandrasekaran, and B. Chapman, “Compiling a high-level directive-based programming model for GPGPUs,” in Languages and Compilers for Parallel Computing: 26th International Workshop, LCPC 2013, San Jose, CA, USA, September 25–27, 2013, Revised Selected Papers, pp. 105–120, Springer International Publishing, 2014.
  8. R. Reyes, I. López-Rodríguez, J. J. Fumero, and F. de Sande, “accULL: an OpenACC implementation with CUDA and OpenCL support,” in Euro-Par 2012 Parallel Processing, vol. 7484 of Lecture Notes in Computer Science, pp. 871–882, Springer, Berlin, Germany, 2012.
  9. S. Lee and J. S. Vetter, “Early evaluation of directive-based GPU programming models for productive exascale computing,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '12), pp. 1–11, IEEE Computer Society Press, Salt Lake City, Utah, USA, November 2012.
  10. S. Wienke, P. Springer, C. Terboven, and D. an Mey, “OpenACC—first experiences with real-world applications,” in Euro-Par 2012 Parallel Processing, vol. 7484 of Lecture Notes in Computer Science, pp. 859–870, Springer, Berlin, Germany, 2012.
  11. R. Xu, S. Chandrasekaran, B. Chapman, and C. F. Eick, “Directive-based programming models for scientific applications—a comparison,” in Proceedings of the SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC '12), pp. 1–9, IEEE, Salt Lake City, Utah, USA, November 2012.
  12. A. Hart, R. Ansaloni, and A. Gray, “Porting and scaling OpenACC applications on massively-parallel, GPU-accelerated supercomputers,” The European Physical Journal Special Topics, vol. 210, no. 1, pp. 5–16, 2012.
  13. J. M. Levesque, R. Sankaran, and R. Grout, “Hybridizing S3D into an exascale application using OpenACC: an approach for moving to multi-petaflops and beyond,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '12), pp. 1–11, IEEE Computer Society Press, Salt Lake City, Utah, USA, November 2012.
  14. J. Bueno, J. Planas, A. Duran et al., “Productive programming of GPU clusters with OmpSs,” in Proceedings of the IEEE 26th International Parallel & Distributed Processing Symposium (IPDPS '12), pp. 557–568, IEEE, May 2012.
  15. C. Augonnet, S. Thibault, R. Namyst, and P.-A. Wacrenier, “StarPU: a unified platform for task scheduling on heterogeneous multicore architectures,” Concurrency and Computation: Practice and Experience, vol. 23, no. 2, pp. 187–198, 2011.
  16. R. Xu, S. Chandrasekaran, and B. Chapman, “Exploring programming multi-GPUs using OpenMP and OpenACC-based hybrid model,” in Proceedings of the IEEE 27th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW '13), pp. 1169–1176, IEEE, Cambridge, Mass, USA, May 2013.
  17. S. Chatterjee, M. Grossman, A. Sbîrlea, and V. Sarkar, “Dynamic task parallelism with a GPU work-stealing runtime system,” in Languages and Compilers for Parallel Computing, vol. 7146 of Lecture Notes in Computer Science, pp. 203–217, Springer, Berlin, Germany, 2013.
  18. T. Komoda, S. Miwa, H. Nakamura, and N. Maruyama, “Integrating multi-GPU execution in an OpenACC compiler,” in Proceedings of the 42nd Annual International Conference on Parallel Processing (ICPP '13), pp. 260–269, IEEE, Lyon, France, October 2013.
  19. E. Ayguadé, N. Copty, A. Duran et al., “The design of OpenMP tasks,” IEEE Transactions on Parallel and Distributed Systems, vol. 20, no. 3, pp. 404–418, 2009.
  20. Technical report on directives for attached accelerators, 2012, http://openmp.org/wp/openmp-specifications/.
  21. CAPS OpenACC Parallelism Mapping, 2015, http://exxactcorp.com/index.php/software/prod_list/5.
  22. R. Xu, X. Tian, Y. Yan, S. Chandrasekaran, and B. Chapman, “Reduction operations in parallel loops for GPGPUs,” in Proceedings of the Programming Models and Applications on Multicores and Manycores (PMAM '14), pp. 10–20, ACM, New York, NY, USA, 2014.
  23. NVIDIA Kepler GK110 Architecture Whitepaper, 2014, http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf.
  24. J. H. Chen, A. Choudhary, B. de Supinski et al., “Terascale direct numerical simulations of turbulent combustion using S3D,” Computational Science & Discovery, vol. 2, no. 1, Article ID 015001, 2009.
  25. K. Spafford, J. Meredith, J. Vetter, J. Chen, R. Grout, and R. Sankaran, “Accelerating S3D: a GPGPU case study,” in Euro-Par 2009—Parallel Processing Workshops, vol. 6043 of Lecture Notes in Computer Science, pp. 122–131, Springer, Berlin, Germany, 2010.
  26. O. Hernandez, W. Ding, B. Chapman, C. Kartsaklis, R. Sankaran, and R. Graham, “Experiences with high-level programming directives for porting applications to GPUs,” in Facing the Multicore—Challenge II, vol. 7174 of Lecture Notes in Computer Science, pp. 96–107, Springer, Berlin, Germany, 2012.
  27. G. Pullan, “Cambridge CUDA course 25–27 May 2009,” http://www.many-core.group.cam.ac.uk/archive/CUDAcourse09/.
  28. Cray C and C++ Reference Manual, 2014, http://docs.cray.com/books/S-2179-81/S-2179-81.pdf.
  29. R. Xu, M. Hugues, H. Calandra, S. Chandrasekaran, and B. Chapman, “Accelerating Kirchhoff migration on GPU using directives,” in Proceedings of the 1st Workshop on Accelerator Programming Using Directives (WACCPD '14), pp. 37–46, IEEE, 2014.