Mathematical Problems in Engineering
Volume 2016, Article ID 8471283, 12 pages
http://dx.doi.org/10.1155/2016/8471283
Research Article

A Novel CSR-Based Sparse Matrix-Vector Multiplication on GPUs

Guixia He and Jiaquan Gao

1Zhijiang College, Zhejiang University of Technology, Hangzhou 310024, China
2College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China

Received 4 January 2016; Accepted 27 March 2016

Academic Editor: Veljko Milutinovic

Copyright © 2016 Guixia He and Jiaquan Gao. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
