Improving Accuracy for Matrix Multiplications on GPUs
Reproducibility is a commonly used measure of an experiment's validity. In scientific computing, reproducibility is hard to achieve because floating-point rounding errors accumulate during numerical computation and can greatly reduce the accuracy of the result. Matrix multiplication is particularly susceptible to these rounding errors, which is why so many remedies exist, ranging from simulated extra precision to compensated summation algorithms. All of these remedies share the same problem: abysmal performance compared with the original algorithm. Graphics cards are especially affected because all but the most recent generation lack double precision, so improving the accuracy of the precision they do offer becomes paramount. By selectively applying compensated summation algorithms, our method recovers a whole digit of accuracy on current-generation graphics cards and potentially two digits on the newly released “Fermi” architecture, with only a 2% drop in performance.
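To make the term "compensated summation" concrete, the following is a minimal sketch of the classic Kahan scheme applied to a single dot product, the inner kernel of a matrix multiplication. It is an illustration only and not the selective method described above: it compensates every term, it runs on the CPU, and it uses Python's double-precision floats rather than GPU single precision. The function names `naive_dot` and `kahan_dot` are hypothetical.

```python
import math


def naive_dot(x, y):
    """Plain running-sum dot product; rounding error grows with the length."""
    acc = 0.0
    for a, b in zip(x, y):
        acc += a * b
    return acc


def kahan_dot(x, y):
    """Dot product with Kahan compensated summation (illustrative sketch).

    `comp` holds an estimate of the low-order bits lost by the previous
    addition and feeds them back into the next term.
    """
    acc = 0.0
    comp = 0.0
    for a, b in zip(x, y):
        term = a * b - comp        # apply the correction from the last step
        t = acc + term             # low-order bits of `term` may be lost here
        comp = (t - acc) - term    # recover what was lost (as a negative error)
        acc = t
    return acc


if __name__ == "__main__":
    n = 1_000_000
    x = [0.1] * n
    y = [1.0] * n
    # math.fsum returns a correctly rounded sum and serves as the reference.
    reference = math.fsum(a * b for a, b in zip(x, y))
    print("naive error:", abs(naive_dot(x, y) - reference))
    print("kahan error:", abs(kahan_dot(x, y) - reference))
```

The compensated loop performs several extra floating-point operations per term, which is the kind of overhead that motivates applying compensation selectively rather than everywhere.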