Research Article

A Strategy for Automatic Performance Tuning of Stencil Computations on GPUs

Algorithm 1: Generic 3D stencil computation.
void stencil_computation(
    float (*array1)[Y_SIZE][X_SIZE],
    float (*array2)[Y_SIZE][X_SIZE])
{
    float (*in)[Y_SIZE][X_SIZE] = array1;
    float (*out)[Y_SIZE][X_SIZE] = array2;
    for (int t = 0; t < T_MAX; ++t) {
        for (int z = 0; z < Z_SIZE; ++z)
            for (int y = 0; y < Y_SIZE; ++y)
                for (int x = 0; x < X_SIZE; ++x) {
                    // dzK, dyK, dxK stand for the constant, stencil-specific
                    // offsets of the K-th read
                    float temp0 = in[z + dz0][y + dy0][x + dx0];
                    float temp1 = in[z + dz1][y + dy1][x + dx1];
                    ...
                    float tempN = in[z + dzN][y + dyN][x + dxN];
                    out[z][y][x] = f(temp0, temp1, ..., tempN);
                }
        // Swap in and out pointers
    }
}
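
As a concrete illustration of the generic pattern in Algorithm 1, the following is a minimal sketch of one possible instance, not the stencil studied in this article. It assumes a 7-point stencil whose function f averages the centre point and its six face neighbours. The grid sizes, the grid_t typedef, the t_max parameter, the returned pointer, and the choice to skip boundary points are all assumptions made for the example.

```c
#include <assert.h>

/* Hypothetical grid sizes, chosen only for illustration. */
#define Z_SIZE 8
#define Y_SIZE 8
#define X_SIZE 8

/* A grid_t* can be indexed as g[z][y][x]. */
typedef float grid_t[Y_SIZE][X_SIZE];

/* Assumed stencil function f: mean of the centre point and its
   six face neighbours (a 7-point Jacobi-style stencil). */
float f(float c, float xm, float xp, float ym, float yp,
        float zm, float zp)
{
    return (c + xm + xp + ym + yp + zm + zp) / 7.0f;
}

/* Runs t_max sweeps and returns the grid holding the latest result.
   Boundary points are never written, so both grids should start
   with identical boundary values. */
grid_t *stencil_computation(grid_t *array1, grid_t *array2, int t_max)
{
    assert(t_max >= 0);
    grid_t *in = array1;
    grid_t *out = array2;
    for (int t = 0; t < t_max; ++t) {
        /* Interior points only, so every neighbour read is in bounds. */
        for (int z = 1; z < Z_SIZE - 1; ++z)
            for (int y = 1; y < Y_SIZE - 1; ++y)
                for (int x = 1; x < X_SIZE - 1; ++x)
                    out[z][y][x] = f(in[z][y][x],
                                     in[z][y][x - 1], in[z][y][x + 1],
                                     in[z][y - 1][x], in[z][y + 1][x],
                                     in[z - 1][y][x], in[z + 1][y][x]);
        /* Swap in and out pointers. */
        grid_t *tmp = in;
        in = out;
        out = tmp;
    }
    return in; /* after the last swap, in points at the newest data */
}
```

Because the two buffers alternate roles on every time step, the result ends up in array2 after an odd number of sweeps and in array1 after an even number, which is why this sketch returns the pointer rather than assuming a fixed output buffer.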