Scientific Programming
Volume 2015, Article ID 576498, 16 pages
http://dx.doi.org/10.1155/2015/576498
Research Article

Optimized Data Transfers Based on the OpenCL Event Management Mechanism

1 Tohoku University/JST CREST, Sendai, Miyagi 980-8579, Japan
2 Tohoku University, Sendai, Miyagi 980-8578, Japan
3 NVIDIA Research, Santa Clara, CA 95050, USA
4 The University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA

Received 15 May 2014; Accepted 29 September 2014

Academic Editor: Sunita Chandrasekaran

Copyright © 2015 Hiroyuki Takizawa et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. D. B. Kirk and W. W. Hwu, Programming Massively Parallel Processors: A Hands-on Approach, Morgan Kaufmann Publishers, 2007.
  2. B. Gaster, L. Howes, D. R. Kaeli, P. Mistry, and D. Schaa, Heterogeneous Computing with OpenCL, Morgan Kaufmann, Boston, Mass, USA, 2011.
  3. W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message Passing Interface, The MIT Press, 1999.
  4. O. S. Lawlor, “Message passing for GPGPU clusters: cudaMPI,” in Proceedings of the IEEE International Conference on Cluster Computing and Workshops (CLUSTER '09), pp. 1–8, 2009.
  5. A. M. Aji, J. Dinan, D. Buntinas et al., “MPI-ACC: An integrated and extensible approach to data movement in accelerator-based systems,” in Proceedings of the 14th IEEE International Conference on High Performance Computing and Communications (HPCC '12), pp. 647–654, Liverpool, UK, June 2012.
  6. H. Wang, S. Potluri, M. Luo, A. K. Singh, S. Sur, and D. K. Panda, “MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters,” Computer Science: Research and Development, vol. 26, no. 3-4, pp. 257–266, 2011.
  7. J. A. Stuart, P. Balaji, and J. D. Owens, “Extending MPI to accelerators,” in Proceedings of the 1st Workshop on Architectures and Systems for Big Data (ASBD '11), pp. 19–23, 2011.
  8. I. Gelado, J. Cabezas, N. Navarro, J. E. Stone, S. Patel, and W.-M. W. Hwu, “An asymmetric distributed shared memory model for heterogeneous parallel systems,” in Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '10), pp. 347–358, March 2010.
  9. J. A. Stuart and J. D. Owens, “Message passing on data-parallel architectures,” in Proceedings of the 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS '09), pp. 1–12, May 2009.
  10. A. Barak, T. Ben-Nun, E. Levy, and A. Shiloh, “A package for OpenCL based heterogeneous computing on clusters with many GPU devices,” in Proceedings of the IEEE International Conference on Cluster Computing Workshops and Posters, pp. 1–7, September 2010.
  11. Y. Munekawa, F. Ino, and K. Hagihara, “Design and implementation of the Smith-Waterman algorithm on the CUDA-compatible GPU,” in Proceedings of the 8th IEEE International Conference on BioInformatics and BioEngineering (BIBE '08), pp. 1–6, October 2008.
  12. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105, 2012.
  13. E. H. Phillips and M. Fatica, “Implementing the Himeno benchmark with CUDA on GPU clusters,” in Proceedings of the 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS '10), pp. 1–10, April 2010.
  14. M. Shigeta and T. Watanabe, “Growth model of binary alloy nanopowders for thermal plasma synthesis,” Journal of Applied Physics, vol. 108, no. 4, Article ID 043306, 2010.
  15. The Open MPI Project, “Open MPI: open source high performance computing,” http://www.open-mpi.org/.