Research Article
Performance Optimization and Modeling of Fine-Grained Irregular Communication in UPC
Table 3
Time usage (in seconds) of 1000 SpMV iterations for Test problems 1–3.
| Test problem | Implementation | 1 node (16 threads) | 2 nodes (32 threads) | 4 nodes (64 threads) | 8 nodes (128 threads) | 16 nodes (256 threads) | 32 nodes (512 threads) | 64 nodes (1024 threads) |
|---|---|---|---|---|---|---|---|---|
| Test problem 1: 6,810,586 tetrahedrons | UPCv1 | 28.80 | 522.15 | 443.98 | 1882.01 | 551.20 | 311.54 | 183.73 |
| | UPCv2 | 39.37 | 36.70 | 23.68 | 18.89 | 13.61 | 9.98 | 9.57 |
| | UPCv3 | 25.01 | 15.07 | 8.22 | 4.65 | 2.91 | 2.68 | 5.56 |
| Test problem 2: 13,009,527 tetrahedrons | UPCv1 | 59.14 | 2525.05 | 3532.33 | 3657.95 | 3078.35 | 2613.85 | 1588.67 |
| | UPCv2 | 73.79 | 69.60 | 55.33 | 36.39 | 24.16 | 25.06 | 21.29 |
| | UPCv3 | 46.88 | 24.97 | 15.43 | 10.91 | 6.25 | 5.15 | 7.54 |
| Test problem 3: 25,587,400 tetrahedrons | UPCv1 | 115.25 | 2990.92 | 1758.94 | 986.85 | 1302.52 | 4653.10 | 2692.69 |
| | UPCv2 | 154.72 | 178.14 | 122.38 | 81.77 | 52.99 | 41.16 | 44.80 |
| | UPCv3 | 93.30 | 48.74 | 26.13 | 15.37 | 11.12 | 7.41 | 10.16 |