Abstract

A number of applications on parallel computers deal with very large data sets that cannot fit in main memory. In such applications, data must be stored in files on disks and fetched into memory during program execution. Parallel programs with large out-of-core arrays stored in files must read/write smaller sections of the arrays from/to files. In this article, we describe a method for accessing sections of out-of-core arrays efficiently. Our method, the extended two-phase method, uses collective l/O: Processors cooperate to combine several l/O requests into fewer larger granularity requests, to reorder requests so that the file is accessed in proper sequence, and to eliminate simultaneous l/O requests for the same data. In addition, the l/O workload is divided among processors dynamically, depending on the access requests. We present performance results obtained from two real out-of-core parallel applications – matrix multiplication and a Laplace's equation solver – and several synthetic access patterns, all on the Intel Touchstone Delta. These results indicate that the extended two-phase method significantly outperformed a direct (noncollective) method for accessing out-of-core array sections.