Abstract

On most massively parallel architectures, actual communication performance remains far below what the hardware can deliver. The main cause of this gap is dynamic routing, whose software management mechanisms impose a large overhead. This article presents experimental studies of benchmark programs from scientific computing; the results show that most communication patterns in application programs are predictable at compile time. An execution model is proposed that exploits this knowledge: predictable communications are compiled directly, and dynamic communications are emulated by scheduling an appropriate set of compiled communications. An evaluation of the model shows that performance is better in static cases and degrades gracefully as communication patterns become more complex and more dynamic.