Fast Inner Product Computation on Short Buses

Lin, R.; Olariu, S.

doi:https://doi.org/10.1080/10655140290011140

VLSI Design

Open Access

Volume 14 | Article ID 471289 | https://doi.org/10.1080/10655140290011140

R. Lin¹ and S. Olariu²

Received 03 Dec 2000

Revised 12 Apr 2001

Abstract

We propose a VLSI inner product processor architecture that broadcasts only over short buses (each containing fewer than 64 switches). The architecture leads to an efficient algorithm for inner product computation. Specifically, computing the inner product of two arrays of N = 2^9 elements, each consisting of m = 64 bits, takes 13 broadcasts, each over fewer than 64 switches, plus 2 carry-save additions (t_csa) and 2 carry-lookahead additions (t_cla). Using the same order of VLSI area, our algorithm runs faster than the best known fast inner product algorithm of Smith and Torng [“Design of a fast inner product processor,” Proceedings of IEEE 7th Symposium on Computer Arithmetic (1985)], which takes about 28 t_csa + t_cla for the same computation.
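For readers unfamiliar with the cost terms in the abstract, the following is a minimal Python sketch of generic carry-save reduction: all partial products of an inner product are compressed three-to-two, level by level, and a single carry-propagating addition finishes the sum. It is not the short-bus architecture proposed here, nor a reconstruction of Smith and Torng's design. The function names (`carry_save_add`, `partial_products`, `inner_product_csa`), the unsigned-operand assumption, and the greedy grouping into triples are all illustrative choices, not taken from the paper.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: turn three non-negative integers into a (sum, carry)
    pair whose total equals a + b + c, without propagating any carries."""
    partial_sum = a ^ b ^ c                      # bitwise sum, carries ignored
    carry = ((a & b) | (a & c) | (b & c)) << 1   # majority bits, shifted left
    return partial_sum, carry


def partial_products(x, y, m):
    """Shifted copies of x, one for each set bit of the m-bit multiplier y."""
    return [x << i for i in range(m) if (y >> i) & 1]


def inner_product_csa(xs, ys, m=64):
    """Inner product of unsigned m-bit arrays via carry-save reduction.

    Returns (value, csa_levels), where csa_levels counts the levels of
    3:2 compression used by this simple greedy grouping (an assumption of
    this sketch, not an optimized tree)."""
    # Pool the partial products of every element pair.
    terms = []
    for x, y in zip(xs, ys):
        terms.extend(partial_products(x, y, m))

    csa_levels = 0
    while len(terms) > 2:
        reduced = []
        # Compress each full group of three terms into two.
        for i in range(0, len(terms) - 2, 3):
            s, c = carry_save_add(terms[i], terms[i + 1], terms[i + 2])
            reduced.extend([s, c])
        # Carry over the one or two terms that did not fill a group.
        reduced.extend(terms[len(terms) - len(terms) % 3:])
        terms = reduced
        csa_levels += 1

    # One final carry-propagating addition (the role a carry-lookahead
    # adder plays in hardware).
    return sum(terms), csa_levels
```

Calling `inner_product_csa([3, 5, 7], [2, 4, 6], m=8)` returns the inner product together with the number of compression levels this grouping needed. In hardware, each level costs roughly one carry-save addition time, which is why the depth of such a tree dominates the running time of tree-based designs.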

Copyright

Copyright © 2002 Hindawi Publishing Corporation. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
