Special Issue: Exploring Languages for Expressing Medium to Massive On-Chip Parallelism
Yili Zheng, "Optimizing UPC Programs for Multi-Core Systems", Scientific Programming, vol. 18, Article ID 646829, 9 pages, 2010. https://doi.org/10.3233/SPR-2010-0310
Optimizing UPC Programs for Multi-Core Systems
The Partitioned Global Address Space (PGAS) model of Unified Parallel C (UPC) helps users express and manage application data locality on non-uniform memory access (NUMA) multi-core shared-memory systems in order to obtain good performance. First, we describe several UPC program optimization techniques that are important for achieving good performance on NUMA multi-core computers, illustrated with examples and quantitative performance results. Second, we use two numerical computing kernels, parallel matrix–matrix multiplication and parallel 3-D FFT, to demonstrate the end-to-end development and optimization of UPC applications. Our results show that the optimized UPC programs achieve good, scalable performance on current multi-core systems and, in some cases, even outperform vendor-optimized libraries.
Copyright © 2010 Hindawi Publishing Corporation. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.