VLSI Design
Volume 2014 (2014), Article ID 712085, 11 pages
Research Article

A Low-Power Scalable Stream Compute Accelerator for General Matrix Multiply (GEMM)

School of Engineering, University of Guelph, Guelph, ON, Canada N1G 2W1

Received 6 August 2013; Revised 15 December 2013; Accepted 18 December 2013; Published 24 February 2014

Many applications, ranging from machine learning, image processing, and machine vision to optimization, use matrix multiplication as a fundamental building block, and matrix operations play an important role in determining the performance of such applications. This paper proposes a novel, efficient, and highly scalable hardware accelerator that matches the performance of a 2 GHz quad-core PC but can be used in low-power embedded systems that require high-performance computation. Power, performance, and resource consumption are demonstrated on a fully functional prototype. The proposed hardware accelerator is 36× more energy efficient per unit of computation than a state-of-the-art Xeon processor of equal vintage and is 14× more efficient as a stand-alone platform with equivalent performance. An important comparison between simulated system estimates and real system performance is also carried out.