VLSI Design
Volume 2014, Article ID 712085, 11 pages
http://dx.doi.org/10.1155/2014/712085
Research Article

A Low-Power Scalable Stream Compute Accelerator for General Matrix Multiply (GEMM)

School of Engineering, University of Guelph, Guelph, ON, Canada N1G 2W1

Received 6 August 2013; Revised 15 December 2013; Accepted 18 December 2013; Published 24 February 2014

Academic Editor: Jim Ching-Rong Lin

Copyright © 2014 Antony Savich and Shawki Areibi. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

References

  1. J. Jang, S. Choi, and V. Prasanna, “Area and time efficient implementations of matrix multiplication on FPGAs,” in Proceedings of the IEEE International Conference on Field Programmable Technology (FPT ’02), pp. 93–100, Hong Kong, China, December 2002.
  2. A. Qasim, A. Telba, and A. AlMazroo, “FPGA design and implementation of dense matrix-vector multiplication for image processing application,” in Proceedings of the World Congress of Engineering and Computer Science (WCECS ’10), vol. 1, pp. 1–4, San Francisco, Calif, USA, October 2010.
  3. F. Bensaali, A. Amira, and A. Bouridane, “Accelerating matrix product on reconfigurable hardware for image processing applications,” IEE Proceedings—Circuits Devices and Systems, vol. 152, no. 3, pp. 236–246, 2005.
  4. N. Dave, K. Fleming, M. King, M. Pellauer, and M. Vijayaraghavan, “Hardware acceleration of matrix multiplication on a Xilinx FPGA,” in Proceedings of the 5th ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE ’07), pp. 97–100, Nice, France, June 2007.
  5. D. Yang, G. D. Peterson, H. Li, and J. Sun, “An FPGA implementation for solving least square problem,” in Proceedings of the 17th IEEE Symposium on Field Programmable Custom Computing Machines (FCCM ’09), pp. 303–306, Napa, Calif, USA, April 2009.
  6. I. Sotiropoulos and I. Papaefstathiou, “A fast parallel matrix multiplication reconfigurable unit utilized in face recognitions systems,” in Proceedings of the 19th International Conference on Field Programmable Logic and Applications (FPL ’09), pp. 276–281, Lausanne, Switzerland, September 2009.
  7. M. Vucha and A. Rajawat, “Design and FPGA implementation of systolic array architecture for matrix multiplication,” International Journal of Computer Applications, vol. 26, no. 3, pp. 18–22, 2011.
  8. J. Jiang, V. Mirian, K. P. Tang, P. Chow, and Z. Xing, “Matrix multiplication based on scalable macro-pipelined FPGA accelerator architecture,” in Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig ’09), pp. 48–53, Quintana Roo, Mexico, December 2009.
  9. C. Lin, Z. Zhang, N. Wong, and H. So, “Design space exploration for sparse matrix-matrix multiplication on FPGAs,” in Proceedings of the International Conference on Field-Programmable Technology (FPT ’10), pp. 369–372, Beijing, China, December 2010.
  10. V. Kumar, S. Joshi, S. Patkar, and H. Narayanan, “FPGA based high performance double-precision matrix multiplication,” International Journal of Parallel Programming, vol. 38, no. 3-4, pp. 322–338, 2010.