EURASIP Journal on Embedded Systems
Volume 2008 (2008), Article ID 930250, 12 pages
doi:10.1155/2008/930250
Research Article

High Speed 3D Tomography on CPU, GPU, and FPGA

Nicolas GAC,1,2 Stéphane Mancini,1 Michel Desvignes,1 and Dominique Houzet1

1Grenoble-Images-Parole-Signal-Automatique Laboratoire (GIPSA-lab), Grenoble Institute of Technology (INPG), BP 46, 38402 Grenoble Cedex, France
2Equipes Traitement des Images et du Signal (ETIS), Centre National de la Recherche Scientifique (CNRS), Ecole Nationale Supérieure de l'Electronique et de ses Applications (ENSEA), Université de Cergy-Pontoise, 95000 Cergy-Pontoise Cedex, France

Received 1 March 2008; Revised 24 June 2008; Accepted 12 November 2008

Academic Editor: Dragomir Milojevic

Copyright © 2008 Nicolas GAC et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Back-projection (BP) is a computationally expensive step in tomographic image reconstruction, for instance in positron emission tomography (PET). To reduce the computation time, this paper presents a pipelined, prefetch, and parallelized architecture for PET BP (3PA-PET). The key feature of this architecture is its original memory access strategy, which masks the high latency of the external memory. Without such a strategy, the pattern of memory references to the acquired data stalls the processing unit. The memory access bottleneck is overcome by exploiting the intrinsic temporal and spatial locality of the BP algorithm. A loop reordering makes efficient use of the caches of general-purpose processors in software implementations, and of the 3D predictive and adaptive cache (3D-AP cache) in hardware implementations. Parallel hardware pipelines are also efficient thanks to a hierarchical 3D-AP cache: each pipeline performs a memory reference in about one clock cycle, so that the computational throughput is close to 100%. The 3PA-PET architecture is prototyped on a system on programmable chip (SoPC) to validate the system and to measure its expected performance. Execution times are compared with those of a desktop PC, a workstation, and a graphics processing unit (GPU).
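
To make the idea of loop reordering concrete, the following minimal sketch contrasts two loop orders for a simplified 2D parallel-beam back-projection. It is an illustration under stated assumptions, not the 3PA-PET implementation: the function names, the nearest-neighbour lookup, the 2D geometry, and the tile size are all hypothetical choices made here, and the reordering used in the paper may differ. The first version sweeps the whole image for each angle; the second processes the image tile by tile, so that the pixels of one tile, and the narrow range of detector bins they project onto, are reused while they are still likely to be in cache.

```python
import math

def _bin_index(x, y, c, ct, st, cb):
    # Nearest detector bin hit by pixel (x, y) at the angle with cos=ct, sin=st.
    return int(round((x - c) * ct + (y - c) * st + cb))

def backproject_angle_major(sino, n, angles):
    """Reference order: for each angle, sweep the whole n x n image."""
    img = [[0.0] * n for _ in range(n)]
    c = (n - 1) / 2.0                 # image centre (assumed geometry)
    nb = len(sino[0])                 # number of detector bins
    cb = (nb - 1) / 2.0               # detector centre
    for a, theta in enumerate(angles):
        ct, st = math.cos(theta), math.sin(theta)
        for y in range(n):
            for x in range(n):
                s = _bin_index(x, y, c, ct, st, cb)
                if 0 <= s < nb:
                    img[y][x] += sino[a][s]
    return img

def backproject_tiled(sino, n, angles, tile=4):
    """Reordered: for each image tile, accumulate all angles before moving on."""
    img = [[0.0] * n for _ in range(n)]
    c = (n - 1) / 2.0
    nb = len(sino[0])
    cb = (nb - 1) / 2.0
    for y0 in range(0, n, tile):
        for x0 in range(0, n, tile):
            for a, theta in enumerate(angles):
                ct, st = math.cos(theta), math.sin(theta)
                # Neighbouring pixels of a tile map to neighbouring bins,
                # so each angle touches only a short run of sino[a].
                for y in range(y0, min(y0 + tile, n)):
                    for x in range(x0, min(x0 + tile, n)):
                        s = _bin_index(x, y, c, ct, st, cb)
                        if 0 <= s < nb:
                            img[y][x] += sino[a][s]
    return img
```

Both functions add the contributions to each pixel in the same angle order, so they return identical images; only the order in which memory is visited changes. In interpreted Python the cache effect itself is not observable, so the sketch shows the structure of the transformation rather than its speed-up.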