Research Article

Performance Optimization and Modeling of Fine-Grained Irregular Communication in UPC

Listing 2

A naive UPC implementation of SpMV using a modified EllPack storage format.
#include <upc.h>
/* BLOCKSIZE and rnz are compile-time constants: UPC layout qualifiers take constant expressions */
/* Total number of blocks in every shared array (n / BLOCKSIZE, rounded up) */
int nblks = n / BLOCKSIZE + ((n % BLOCKSIZE) ? 1 : 0);
/* Allocation of five shared arrays */
shared [BLOCKSIZE] double *x = upc_all_alloc(nblks, BLOCKSIZE * sizeof(double));
shared [BLOCKSIZE] double *y = upc_all_alloc(nblks, BLOCKSIZE * sizeof(double));
shared [BLOCKSIZE] double *D = upc_all_alloc(nblks, BLOCKSIZE * sizeof(double));
/* A and J use blocks of rnz * BLOCKSIZE so that row i lands on the thread owning y[i] */
shared [rnz * BLOCKSIZE] double *A = upc_all_alloc(nblks, rnz * BLOCKSIZE * sizeof(double));
shared [rnz * BLOCKSIZE] int *J = upc_all_alloc(nblks, rnz * BLOCKSIZE * sizeof(int));
// …
/* Computation of SpMV involving all threads; iteration i runs on the thread owning y[i] */
upc_forall (int i = 0; i < n; i++; &y[i]) {
    double tmp = 0.0;
    for (int j = 0; j < rnz; j++)
        /* A and J reads are local; x[J[...]] may be a fine-grained remote read */
        tmp += A[i * rnz + j] * x[J[i * rnz + j]];
    y[i] = D[i] * x[i] + tmp;
}
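To make the storage layout and the resulting communication pattern concrete, the following is a sequential sketch in plain C (UPC extends C), not the authors' code. BLOCKSIZE, RNZ and NTHREADS are hypothetical constants chosen for illustration, NTHREADS stands in for UPC's THREADS, and the helper owner() mimics the round-robin block distribution of a shared [BLOCKSIZE] array allocated with upc_all_alloc. Rows with fewer than RNZ off-diagonal nonzeros are assumed to be padded with A = 0 and a valid column index (here the row's own index).

```c
#include <assert.h>

/* Hypothetical sizes, chosen only for illustration. */
#define BLOCKSIZE 2 /* elements per block, as in shared [BLOCKSIZE] */
#define RNZ 2       /* off-diagonal slots per row (rnz in the listing) */
#define NTHREADS 3  /* stand-in for UPC's THREADS */

/* Number of blocks needed for n elements (ceiling of n / BLOCKSIZE). */
static int num_blocks(int n) {
    return n / BLOCKSIZE + ((n % BLOCKSIZE) ? 1 : 0);
}

/* Thread with affinity to element i of a shared [BLOCKSIZE] array:
   blocks are dealt out to the threads round-robin. */
static int owner(int i) {
    return (i / BLOCKSIZE) % NTHREADS;
}

/* Sequential modified-EllPack SpMV, y = D .* x + A * x, where row i keeps
   its RNZ off-diagonal values in A[i*RNZ .. i*RNZ+RNZ-1] and their column
   indices in J at the same positions. Returns how many x[J[...]] reads
   would be remote in the UPC version, i.e. touch an element owned by a
   different thread than the one owning y[i]. */
static int spmv_ellpack(int n, const double *D, const double *A,
                        const int *J, const double *x, double *y) {
    int remote = 0;
    for (int i = 0; i < n; i++) {
        double tmp = 0.0;
        for (int j = 0; j < RNZ; j++) {
            int col = J[i * RNZ + j];
            assert(col >= 0 && col < n); /* padding must still be a valid index */
            tmp += A[i * RNZ + j] * x[col];
            if (owner(col) != owner(i))
                remote++;
        }
        y[i] = D[i] * x[i] + tmp;
    }
    return remote;
}
```

For a 5-row tridiagonal matrix (D = 2, off-diagonal entries -1, x = 1, 2, 3, 4, 5) with these constants, spmv_ellpack yields y = 0, 0, 0, 0, 6 and reports 4 remote reads, one for each of rows 1 to 4, where a neighboring column falls in a block owned by another thread. In num_blocks the conditional is parenthesized on purpose: because + binds more tightly than ?:, dropping the inner parentheses would make the whole expression evaluate to 0 or 1.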