Scientific Programming

Research Article

Performance Optimization and Modeling of Fine-Grained Irregular Communication in UPC

A naive UPC implementation of SpMV using a modified EllPack storage format.

	/* Total number of blocks in every shared array (n / BLOCKSIZE, rounded up) */
	int nblks = n / BLOCKSIZE + ((n % BLOCKSIZE) ? 1 : 0);
	/* Allocation of five shared arrays */
	shared [BLOCKSIZE] double *x = upc_all_alloc(nblks, BLOCKSIZE * sizeof(double));
	shared [BLOCKSIZE] double *y = upc_all_alloc(nblks, BLOCKSIZE * sizeof(double));
	shared [BLOCKSIZE] double *D = upc_all_alloc(nblks, BLOCKSIZE * sizeof(double));
	shared [r_nz * BLOCKSIZE] double *A = upc_all_alloc(nblks, r_nz * BLOCKSIZE * sizeof(double));
	shared [r_nz * BLOCKSIZE] int *J = upc_all_alloc(nblks, r_nz * BLOCKSIZE * sizeof(int));
	// …
	/* Computation of SpMV involving all threads */
	upc_forall (int i = 0; i < n; i++; &y[i]) {
		double tmp = 0.0;
		for (int j = 0; j < r_nz; j++)
			tmp += A[i * r_nz + j] * x[J[i * r_nz + j]];
		y[i] = D[i] * x[i] + tmp;
	}
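
The listing above can be checked against a sequential reference. What follows is a minimal sketch in plain C (not UPC), under two assumptions: the blocks of a `shared [BLOCKSIZE]` array are dealt out round-robin starting at thread 0, and each iteration `i` runs on the thread that owns `y[i]`, as the `&y[i]` affinity expression requests. The function names `owner_thread`, `spmv_ellpack`, and `count_remote_x_reads` are hypothetical helpers introduced for illustration; they are not part of the article's code.

```c
#include <assert.h>
#include <stddef.h>

/* Thread owning element i of an array with block size `blocksize`,
 * assuming block b is placed on thread b % nthreads. */
int owner_thread(size_t i, size_t blocksize, int nthreads)
{
    return (int)((i / blocksize) % (size_t)nthreads);
}

/* Sequential reference for the modified EllPack SpMV:
 * D holds the main diagonal, A holds r_nz off-diagonal values per row,
 * and J holds the matching column indices. */
void spmv_ellpack(size_t n, size_t r_nz,
                  const double *D, const double *A, const int *J,
                  const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        double tmp = 0.0;
        for (size_t j = 0; j < r_nz; j++) {
            int col = J[i * r_nz + j];
            assert(col >= 0 && (size_t)col < n); /* guard the indirect access */
            tmp += A[i * r_nz + j] * x[col];
        }
        y[i] = D[i] * x[i] + tmp;
    }
}

/* Number of x[J[...]] reads whose owner differs from the owner of y[i],
 * i.e. the reads that would touch another thread's part of x. */
size_t count_remote_x_reads(size_t n, size_t r_nz, const int *J,
                            size_t blocksize, int nthreads)
{
    size_t remote = 0;
    for (size_t i = 0; i < n; i++) {
        int me = owner_thread(i, blocksize, nthreads);
        for (size_t j = 0; j < r_nz; j++) {
            size_t col = (size_t)J[i * r_nz + j];
            if (owner_thread(col, blocksize, nthreads) != me)
                remote++;
        }
    }
    return remote;
}
```

For example, take n = 4 and r_nz = 2 with D = {4, 4, 4, 4}, all eight entries of A equal to -1, J = {1, 3, 0, 2, 1, 3, 2, 0}, and x = {1, 2, 3, 4}. `spmv_ellpack` then yields y = {-2, 4, 6, 12}. With a block size of 2 on 2 threads, `count_remote_x_reads` reports that 4 of the 8 indirect reads of x fall outside the owning thread's block. Those are the individual, irregular accesses that the naive version performs one element at a time.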