Scientific Programming
Volume 20 (2012), Issue 1, Pages 45-67
http://dx.doi.org/10.3233/SPR-2012-0338

Combined Scheduling and Mapping for Scalable Computing with Parallel Tasks

Jörg Dümmler,1 Thomas Rauber,2 and Gudula Rünger1

1Department of Computer Science, Chemnitz University of Technology, Chemnitz, Germany
2Angewandte Informatik II, Bayreuth University, Bayreuth, Germany

Copyright © 2012 Hindawi Publishing Corporation. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Recent and future parallel clusters and supercomputers use symmetric multiprocessors (SMPs) and multi-core processors as basic nodes, providing a large number of parallel resources. These systems often have hierarchically structured interconnection networks that combine computing resources at different levels, from the interconnect within a multi-core processor up to the network connecting the nodes of the cluster or supercomputer. The challenge for the programmer is to utilize these computing resources efficiently by exploiting the available degree of parallelism of the application program and by structuring the application so that it is sensitive to the heterogeneous interconnect. In this article, we pursue a parallel programming method that uses parallel tasks to structure parallel implementations. A parallel task can be executed by multiple processors or cores, and for each activation of a parallel task, the actual number of executing cores can be adapted to the specific execution situation. In particular, we propose a new combined scheduling and mapping technique for parallel tasks with dependencies that takes the hierarchical structure of modern multi-core clusters into account. An experimental evaluation shows that the presented programming approach can lead to significantly higher performance than standard data parallel implementations.
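
To make the notion of scheduling parallel (moldable) tasks concrete, the following Python sketch illustrates the general idea: tasks that can run on a variable number of cores are grouped into dependency layers, and the cores of each layer are split among its tasks. This is only an illustrative toy, not the scheduling and mapping algorithm proposed in the article; the Amdahl-style cost model, the proportional-to-work allocation, and all names (ParallelTask, schedule_layers) are assumptions made for this example.

    # Illustrative sketch only; cost model and allocation rule are assumed,
    # not taken from the article.
    from dataclasses import dataclass, field

    @dataclass
    class ParallelTask:
        name: str
        work: float                     # sequential work (time on 1 core)
        deps: list = field(default_factory=list)

        def runtime(self, cores: int) -> float:
            # Assumed Amdahl-style cost model with a 10% sequential fraction.
            seq = 0.10 * self.work
            return seq + (self.work - seq) / cores

    def schedule_layers(tasks, total_cores):
        """Group independent tasks into layers (topological levels) and
        split the cores of one layer among its tasks in proportion to work."""
        level = {}
        for t in tasks:                 # tasks assumed in topological order
            level[t.name] = 1 + max((level[d.name] for d in t.deps), default=-1)
        layers = {}
        for t in tasks:
            layers.setdefault(level[t.name], []).append(t)
        makespan = 0.0
        for lvl in sorted(layers):
            layer = layers[lvl]
            total_work = sum(t.work for t in layer)
            # Naive rounding; a real scheduler must keep the sum <= total_cores.
            alloc = {t.name: max(1, round(total_cores * t.work / total_work))
                     for t in layer}
            # A layer finishes when its slowest task finishes.
            makespan += max(t.runtime(alloc[t.name]) for t in layer)
            print(f"layer {lvl}: " + ", ".join(
                f"{t.name}->{alloc[t.name]} cores" for t in layer))
        return makespan

    a = ParallelTask("A", work=100.0)
    b = ParallelTask("B", work=60.0, deps=[a])
    c = ParallelTask("C", work=40.0, deps=[a])
    d = ParallelTask("D", work=80.0, deps=[b, c])
    print(f"estimated makespan: {schedule_layers([a, b, c, d], 16):.1f}")

Running the example schedules B and C concurrently on disjoint core groups (10 and 6 of the 16 cores), while A and D each use all cores; a mapping step that is aware of the machine hierarchy, as proposed in the article, would additionally place each core group within one node or socket to keep intra-task communication on the fast local interconnect.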