Scientific Programming
Volume 2019, Article ID 6825728, 20 pages
https://doi.org/10.1155/2019/6825728
Research Article

Performance Optimization and Modeling of Fine-Grained Irregular Communication in UPC

1Simula Research Laboratory, P.O. Box 134, NO-1325 Lysaker, Norway
2University of Innsbruck, Technikerstraße 13, A-6020 Innsbruck, Austria
3The Arctic University of Norway, NO-9037 Tromsø, Norway
4University of Oslo, NO-0316 Oslo, Norway

Correspondence should be addressed to Xing Cai; xingca@simula.no

Received 26 September 2018; Revised 14 January 2019; Accepted 27 January 2019; Published 3 March 2019

Academic Editor: Manuel E. Acacio Sanchez

Copyright © 2019 Jérémie Lagravière et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. G. Almasi, “PGAS (partitioned global address space) languages,” in Encyclopedia of Parallel Computing, D. Padua, Ed., pp. 1539–1545, Springer, Berlin, Germany, 2011.
  2. D. E. Culler, A. Dusseau, S. C. Goldstein et al., “Parallel programming in split-C,” in Proceedings of 1993 ACM/IEEE Conference on Supercomputing (Supercomputing’93), pp. 262–273, Portland, OR, USA, November 1993.
  3. M. de Wael, S. Marr, B. de Fraine, T. van Cutsem, and W. de Meuter, “Partitioned global address space languages,” ACM Computing Surveys, vol. 47, no. 4, pp. 1–27, 2015.
  4. PGAS—Partitioned Global Address Space, 2016, http://www.pgas.org.
  5. T. El-Ghazawi, W. Carlson, T. Sterling, and K. Yelick, UPC: Distributed Shared Memory Programming, John Wiley & Sons, Hoboken, NJ, USA, 2005.
  6. UPC Consortium, “UPC language specifications version 1.3,” 2013, http://upc.lbl.gov/docs/user/upc-lang-spec-1.3.pdf.
  7. W.-Y. Chen, “Optimizing partitioned global address space programs for cluster architectures,” Ph.D. thesis, University of California at Berkeley, Berkeley, CA, USA, 2007.
  8. W.-Y. Chen, C. Iancu, and K. Yelick, “Communication optimizations for fine-grained UPC applications,” in Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT’05), St. Louis, MO, USA, September 2005.
  9. R. Grimes, D. Kincaid, and D. Young, “ITPACK 2.0 User's Guide,” Technical Report CNA-150, Center for Numerical Analysis, University of Texas, Austin, TX, USA, 1979.
  10. Y. Zheng, “Optimizing UPC programs for multi-core systems,” Scientific Programming, vol. 18, no. 3-4, pp. 183–191, 2010.
  11. S. Williams, A. Waterman, and D. Patterson, “Roofline: an insightful visual performance model for multicore architectures,” Communications of the ACM, vol. 52, no. 4, pp. 65–76, 2009.
  12. J. Langguth, N. Wu, J. Chai, and X. Cai, “Parallel performance modeling of irregular applications in cell-centered finite volume methods over unstructured tetrahedral meshes,” Journal of Parallel and Distributed Computing, vol. 76, pp. 120–131, 2015.
  13. J. D. McCalpin, “STREAM: sustainable memory bandwidth in high performance computers,” Technical Report, University of Virginia, Charlottesville, VA, USA, 2007.
  14. J. Langguth, M. Sourouri, G. T. Lines, S. B. Baden, and X. Cai, “Scalable heterogeneous CPU-GPU computations for unstructured tetrahedral meshes,” IEEE Micro, vol. 35, no. 4, pp. 6–15, 2015.
  15. H. Si, “TetGen, a Delaunay-based quality tetrahedral mesh generator,” ACM Transactions on Mathematical Software, vol. 41, no. 2, pp. 1–36, 2015.
  16. The Abel computer cluster, 2018, https://www.uio.no/english/services/it/research/hpc/abel/.
  17. Berkeley UPC—Unified Parallel C, 2018, http://upc.lbl.gov.
  18. C. Barton, C. Caşcaval, and J. N. Amaral, “A characterization of shared data access patterns in UPC programs,” in Proceedings of 19th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2006), vol. 4382, pp. 111–125, Springer, New Orleans, LA, USA, November 2006.
  19. T. El-Ghazawi and F. Cantonnet, “UPC performance and potential: a NPB experimental study,” in Proceedings of ACM/IEEE SC 2002 Conference (SC’02), Baltimore, MD, USA, November 2002.
  20. H. Shan, F. Blagojević, S.-J. Min et al., “A programming model performance study using the NAS parallel benchmarks,” Scientific Programming, vol. 18, no. 3-4, pp. 153–167, 2010.
  21. Z. Zhang and S. R. Seidel, “Benchmark measurements of current UPC platforms,” in Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS’05), Denver, CO, USA, April 2005.
  22. D. Bailey, E. Barszcz, J. Barton et al., “The NAS parallel benchmarks,” Technical Report RNR-94-007, NASA Ames Research Center, Mountain View, CA, USA, 1994.
  23. J. González-Domínguez, Ó. García-López, G. L. Taboada, M. J. Martín, and J. Touriño, “Performance evaluation of sparse matrix products in UPC,” Journal of Supercomputing, vol. 64, no. 1, pp. 100–109, 2013.
  24. S. Li, C. Hu, J. Zhang, and Y. Zhang, “Automatic tuning of sparse matrix-vector multiplication on multicore clusters,” Science China Information Sciences, vol. 58, no. 9, pp. 1–14, 2015.
  25. M. Alvanos, “Optimization techniques for fine-grained communication in PGAS environments,” Ph.D. thesis, Universitat Politècnica de Catalunya, Barcelona, Spain, 2013.
  26. F. Cantonnet, T. El-Ghazawi, P. Lorenz, and J. Gaber, “Fast address translation techniques for distributed shared memory compilers,” in Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS’05), Denver, CO, USA, April 2005.
  27. Z. Zhang and S. R. Seidel, “A performance model for fine-grain accesses in UPC,” in Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS’06), Rhodes Island, Greece, April 2006.
  28. R. Rabenseifner, Short Course: Introduction to Unified Parallel C (UPC) and Co-array Fortran (CAF), HLRS, University of Stuttgart, Stuttgart, Germany, 2015.
  29. H. Stengel, J. Treibig, G. Hager, and G. Wellein, “Quantifying performance bottlenecks of stencil computations using the execution-cache-memory model,” in Proceedings of the 29th ACM on International Conference on Supercomputing, pp. 207–216, ACM, Irvine, CA, USA, June 2015.
  30. A. Marowka, “Execution model of three parallel languages: OpenMP, UPC and CAF,” Scientific Programming, vol. 13, no. 2, pp. 127–135, 2005.
  31. K. Z. Ibrahim, P. H. Hargrove, C. Iancu, and K. Yelick, “An evaluation of one-sided and two-sided communication paradigms on relaxed-ordering interconnect,” in Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS’14), pp. 1115–1125, IEEE, Phoenix, AZ, USA, May 2014.
  32. M. Prugger, L. Einkemmer, and A. Ostermann, “Evaluation of the partitioned global address space (PGAS) model for an inviscid Euler solver,” Parallel Computing, vol. 60, pp. 22–40, 2016.