Scientific Programming

Research Article

Performance Optimization and Modeling of Fine-Grained Irregular Communication in UPC

An improved UPC implementation of SpMV by explicit thread privatization.

	/* Allocation of the five shared arrays x, y, D, A, J as in the naive implementation */
	// …
	/* Instead of upc_forall, each thread directly handles its designated blocks */
	int mythread_nblks = nblks / THREADS + (MYTHREAD < (nblks % THREADS) ? 1 : 0);
	for (int mb = 0; mb < mythread_nblks; mb++) {
	    int offset = (mb * THREADS + MYTHREAD) * BLOCKSIZE;
	    /* casting shared pointers to local pointers */
	    double *loc_y = (double *) (y + offset);
	    double *loc_D = (double *) (D + offset);
	    double *loc_A = (double *) (A + offset * r_nz);
	    int *loc_J = (int *) (J + offset * r_nz);
	    /* computation per block */
	    for (int k = 0; k < min(BLOCKSIZE, n - offset); k++) {
	        double tmp = 0.0;
	        for (int j = 0; j < r_nz; j++)
	            tmp += loc_A[k * r_nz + j] * x[loc_J[k * r_nz + j]];
	        loc_y[k] = loc_D[k] * x[offset + k] + tmp;
	    }
	}
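
The block-ownership arithmetic in the listing above (the formulas for mythread_nblks and offset) can be checked outside UPC. The following is a minimal sketch in plain C, not UPC code and not part of the original implementation: it emulates the threads one after another and counts how often each row index is visited. The helper names (num_blocks, owned_blocks, block_offset, covers_each_row_once) are hypothetical, and the sketch assumes that nblks is the ceiling of n / BLOCKSIZE, which is consistent with the min(BLOCKSIZE, n - offset) bound in the listing but is not stated in it.

```c
#include <assert.h>

/* Number of blocks when n rows are cut into chunks of `blocksize`.
   Assumption: nblks = ceil(n / blocksize), so the last block may be short. */
static int num_blocks(int n, int blocksize) {
    assert(n >= 0 && blocksize > 0);
    return n / blocksize + (n % blocksize ? 1 : 0);
}

/* Blocks owned by thread `tid` under a block-cyclic layout, where block b
   belongs to thread b % nthreads. Same formula as mythread_nblks. */
static int owned_blocks(int nblks, int nthreads, int tid) {
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);
    return nblks / nthreads + (tid < nblks % nthreads ? 1 : 0);
}

/* Global index of the first row of the mb-th block owned by `tid`.
   Same formula as `offset` in the listing. */
static int block_offset(int mb, int nthreads, int tid, int blocksize) {
    return (mb * nthreads + tid) * blocksize;
}

/* Emulates all threads sequentially and counts visits per global row.
   Returns 1 if every row in [0, n) is visited exactly once, else 0.
   `hits` is caller-provided scratch space of length n. */
static int covers_each_row_once(int n, int blocksize, int nthreads, int *hits) {
    int nblks = num_blocks(n, blocksize);
    for (int i = 0; i < n; i++)
        hits[i] = 0;
    for (int tid = 0; tid < nthreads; tid++) {
        int mine = owned_blocks(nblks, nthreads, tid);
        for (int mb = 0; mb < mine; mb++) {
            int offset = block_offset(mb, nthreads, tid, blocksize);
            /* Same bound as min(BLOCKSIZE, n - offset) in the listing. */
            int len = blocksize < n - offset ? blocksize : n - offset;
            for (int k = 0; k < len; k++)
                hits[offset + k]++;
        }
    }
    for (int i = 0; i < n; i++)
        if (hits[i] != 1)
            return 0;
    return 1;
}
```

For example, with n = 10, a block size of 3, and 3 emulated threads, there are 4 blocks: thread 0 owns two of them (starting at rows 0 and 9) and threads 1 and 2 own one each (starting at rows 3 and 6). Because each thread visits only rows of its own blocks, the casts from shared to local pointers in the listing touch only data with affinity to the calling thread, provided the arrays are distributed with the matching block size.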