Table of Contents Author Guidelines Submit a Manuscript
Scientific Programming
Volume 8, Issue 1, Pages 23-30
http://dx.doi.org/10.1155/2000/269312

Performance of MC2 and the ECMWF IFS Forecast Model on the Fujitsu VPP700 and NEC SX-4M

Michel Desgagné, Stephen Thomas, and Michel Valin

Recherche en prévision numérique (RPN), Environment Canada, 2121, route Transcanadienne, Dorval, Québec, Canada

Received 11 October 2000; Accepted 11 October 2000

Copyright © 2000 Hindawi Publishing Corporation. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

The NEC SX-4M cluster and Fujitsu VPP700 supercomputers are both based on custom vector processors using low-power CMOS technology. Their basic architectures and programming models are however somewhat different. A multi-node SX-4M cluster contains up to 32 processors per shared memory node, with a maximum of 16 nodes connected via the proprietary NEC IXS fibre channel crossbar network. A hybrid combination of inter-node MPI message-passing with intra-node tasking or threads is possible. The Fujitsu VPP700 is a fully distributed-memory vector machine with a crossbar interconnect which also supports MPI. The parallel performance of the MC2 model for high-resolution mesoscale forecasting over large domains and of the IFS RAPS 4.0 benchmark are presented for several different machine configurations. These include an SX-4/32, an SX-4/32M cluster and up to 100 PE's of the VPP700. Our results indicate that performance degradation for both models on a single SX-4 node is primarily due to memory contention within the internal crossbar switch. Multinode SX-4 performance is slightly better than single node. Longer vector lengths and SDRAM memory on the VPP700 result in lower per processor execution rates. Both models achieve close to ideal scaling on the VPP700.