Scientific Programming
Volume 18, Issue 1, Pages 35-50
http://dx.doi.org/10.3233/SPR-2010-0297

Scheduling Two-Sided Transformations Using Tile Algorithms on Multicore Architectures

Hatem Ltaief,1 Jakub Kurzak,1 Jack Dongarra,1,2,3 and Rosa M. Badia4

1Department of Electrical Engineering and Computer Science, University of Tennessee, TN, USA
2Computer Science and Mathematics Division, Oak Ridge National Laboratory, TN, USA
3School of Mathematics and School of Computer Science, University of Manchester, Manchester, UK
4Barcelona Supercomputing Center – Centro Nacional de Supercomputación, Consejo Nacional de Investigaciones Científicas, Barcelona, Spain

Copyright © 2010 Hindawi Publishing Corporation. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

The objective of this paper is to describe, in the context of multicore architectures, three different scheduler implementations for two-sided linear algebra transformations, in particular the Hessenberg and bidiagonal reductions, which are the first steps in solving the standard eigenvalue problem and in computing the singular value decomposition, respectively. State-of-the-art dense linear algebra software, such as the LAPACK and ScaLAPACK libraries, suffers performance losses on multicore processors due to its inability to fully exploit thread-level parallelism. At the same time, the fine-grain dataflow model is gaining popularity as a paradigm for programming multicore architectures. Buttari et al. (Parallel Comput. Syst. Appl. 35 (2009), 38–53) introduced the concept of tile algorithms, in which parallelism is no longer hidden inside the Basic Linear Algebra Subprograms (BLAS) but is brought to the fore to yield much better performance. Combined with efficient scheduling mechanisms for data-driven execution, these tile two-sided reductions achieve high performance, reaching up to 75% of the DGEMM peak on a 12000×12000 matrix with 16 Intel Tigerton 2.4 GHz processors. The main drawback of the tile algorithm approach for two-sided transformations is that the full reduction cannot be obtained in one stage: the tile algorithms produce band matrices, and other methods have to be considered to further reduce these band matrices to the required Hessenberg or bidiagonal forms.
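To make the idea of data-driven execution of tile algorithms concrete, here is a minimal Python sketch, assuming a superscalar-style scheduler that infers dependencies from the tiles each task reads and writes, in the order the tasks are submitted. The task list borrows the one-sided tile QR kernel names of Buttari et al. (GEQRT, LARFB, TSQRT, SSRFB) purely as an illustration; it is not the two-sided Hessenberg or bidiagonal reduction studied in this paper, and no arithmetic is performed. The helpers `tile_qr_tasks`, `build_dag` and `simulate` are hypothetical names, and every task is assumed to take one time step.

```python
from collections import defaultdict

READ, WRITE = "R", "RW"  # access modes a task declares for each tile


def tile_qr_tasks(nt):
    """Return (name, accesses) pairs in sequential program order for an nt x nt tile matrix.

    Kernel names follow the one-sided tile QR of Buttari et al.; only the tiles each
    kernel touches are recorded, no numerical work is done (illustrative assumption).
    """
    tasks = []
    for k in range(nt):
        tasks.append((f"GEQRT({k},{k})", [((k, k), WRITE)]))
        for j in range(k + 1, nt):
            tasks.append((f"LARFB({k},{j})", [((k, k), READ), ((k, j), WRITE)]))
        for i in range(k + 1, nt):
            tasks.append((f"TSQRT({i},{k})", [((k, k), WRITE), ((i, k), WRITE)]))
            for j in range(k + 1, nt):
                tasks.append((f"SSRFB({i},{j},{k})",
                              [((i, k), READ), ((k, j), WRITE), ((i, j), WRITE)]))
    return tasks


def build_dag(tasks):
    """Infer task dependencies from tile accesses, in submission order."""
    names, deps = [], []
    last_writer = {}              # tile -> id of the last task that wrote it
    readers = defaultdict(list)   # tile -> ids of tasks that read it since that write
    for tid, (name, accesses) in enumerate(tasks):
        d = set()
        for tile, mode in accesses:
            if tile in last_writer:
                d.add(last_writer[tile])   # read-after-write or write-after-write
            if mode == WRITE:
                d.update(readers[tile])    # write-after-read
                last_writer[tile] = tid
                readers[tile] = []
            else:
                readers[tile].append(tid)
        names.append(name)
        deps.append(d)
    return names, deps


def simulate(names, deps, workers):
    """Greedy data-driven schedule with unit-cost tasks.

    At each time step up to `workers` ready tasks run; a task becomes ready once all
    of its predecessors have completed. Returns (execution order, number of steps).
    """
    remaining = [set(d) for d in deps]
    successors = defaultdict(list)
    for t, d in enumerate(deps):
        for p in d:
            successors[p].append(t)
    ready = [t for t, d in enumerate(remaining) if not d]
    order, steps = [], 0
    while ready:
        batch, ready = ready[:workers], ready[workers:]
        steps += 1
        for t in batch:
            order.append(names[t])
            for s in successors[t]:
                remaining[s].discard(t)
                if not remaining[s]:
                    ready.append(s)
        ready.sort()  # prefer tasks submitted earlier
    return order, steps


if __name__ == "__main__":
    names, deps = build_dag(tile_qr_tasks(3))
    print(len(names), "tasks")
    for p in (1, 2, len(names)):
        print(p, "worker(s):", simulate(names, deps, p)[1], "steps")
```

For a 3×3 tile matrix this sketch generates 14 tasks: one worker reproduces the sequential program order in 14 steps, while enough workers finish in 9 steps, the length of the task graph's critical path, which is the parallelism the tile formulation exposes. Tracking dependencies per whole tile is deliberately conservative here: each TSQRT waits for every LARFB of the same step because both touch the diagonal tile, which may be a false dependency that a finer-grained scheme could avoid.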