Research Article
Performance Optimization and Modeling of Fine-Grained Irregular Communication in UPC
Listing 3
An improved UPC implementation of SpMV by explicit thread privatization.
/* Allocation of the five shared arrays x, y, D, A, J as in the naive implementation */
// …
/* Instead of upc_forall, each thread directly handles its designated blocks */
int mythread_nblks = nblks / THREADS + (MYTHREAD < (nblks % THREADS) ? 1 : 0);
for (int mb = 0; mb < mythread_nblks; mb++) {
    int offset = (mb * THREADS + MYTHREAD) * BLOCKSIZE;
    /* casting shared pointers to local pointers */
    double *loc_y = (double *) (y + offset);
    double *loc_D = (double *) (D + offset);
    double *loc_A = (double *) (A + offset * rnz);
    int *loc_J = (int *) (J + offset * rnz);
    /* computation per block */
    for (int k = 0; k < min(BLOCKSIZE, n - offset); k++) {
        double tmp = 0.0;
        for (int j = 0; j < rnz; j++)
            tmp += loc_A[k * rnz + j] * x[loc_J[k * rnz + j]];
        loc_y[k] = loc_D[k] * x[offset + k] + tmp;
    }
}
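The block-ownership arithmetic in Listing 3 can be checked without a UPC compiler. The following is a minimal sketch in plain C, not part of the original article: the thread count and thread id are passed as ordinary parameters instead of the UPC built-ins `THREADS` and `MYTHREAD`, and the helper names `blocks_owned`, `block_offset`, and `covers_all_blocks_once` are hypothetical, introduced only for illustration. It mirrors the `mythread_nblks` and `offset` expressions and verifies that the round-robin assignment visits every block exactly once.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of blocks owned by thread t when nblks blocks are dealt out
   round-robin to nthreads threads (mirrors mythread_nblks in Listing 3).
   nthreads and t stand in for the UPC built-ins THREADS and MYTHREAD. */
int blocks_owned(int nblks, int nthreads, int t)
{
    assert(nthreads > 0 && t >= 0 && t < nthreads);
    return nblks / nthreads + (t < (nblks % nthreads) ? 1 : 0);
}

/* Global index of the first element of the mb-th block owned by thread t
   (mirrors the offset expression in Listing 3). */
long block_offset(int mb, int nthreads, int t, int blocksize)
{
    assert(blocksize > 0);
    return ((long)mb * nthreads + t) * blocksize;
}

/* Returns 1 if the per-thread loops together visit every block index
   0 .. nblks-1 exactly once, and 0 otherwise. */
int covers_all_blocks_once(int nblks, int nthreads, int blocksize)
{
    char *seen = calloc((size_t)(nblks > 0 ? nblks : 1), 1);
    int visited = 0;
    int ok = (seen != NULL);

    for (int t = 0; ok && t < nthreads; t++) {
        int mine = blocks_owned(nblks, nthreads, t);
        for (int mb = 0; mb < mine; mb++) {
            long blk = block_offset(mb, nthreads, t, blocksize) / blocksize;
            if (blk >= nblks || seen[blk]) {  /* out of range or duplicate */
                ok = 0;
                break;
            }
            seen[blk] = 1;
            visited++;
        }
    }
    free(seen);
    return ok && visited == nblks;
}
```

For example, with 10 blocks and 4 threads, threads 0 and 1 each own three blocks and threads 2 and 3 each own two. Thread 1 owns blocks 1, 5, and 9, which are exactly the blocks that have affinity to it under a block-cyclic layout. This is why the casts to local pointers in the listing are safe for `y`, `D`, `A`, and `J`.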