Table of Contents
VLSI Design
Volume 2014, Article ID 712085, 11 pages
Research Article

A Low-Power Scalable Stream Compute Accelerator for General Matrix Multiply (GEMM)

School of Engineering, University of Guelph, Guelph, ON, Canada N1G 2W1

Received 6 August 2013; Revised 15 December 2013; Accepted 18 December 2013; Published 24 February 2014

Academic Editor: Jim Ching-Rong Lin

Copyright © 2014 Antony Savich and Shawki Areibi. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Many applications ranging from machine learning, image processing, and machine vision to optimization utilize matrix multiplication as a fundamental block. Matrix operations play an important role in determining the performance of such applications. This paper proposes a novel efficient, highly scalable hardware accelerator that is of equivalent performance to a 2 GHz quad core PC but can be used in low-power applications targeting embedded systems requiring high performance computation. Power, performance, and resource consumption are demonstrated on a fully-functional prototype. The proposed hardware accelerator is 36× more energy efficient per unit of computation compared to state-of-the-art Xeon processor of equal vintage and is 14× more efficient as a stand-alone platform with equivalent performance. An important comparison between simulated system estimates and real system performance is carried out.