Research Article

Power-Efficient Computing: Experiences from the COSA Project

Figure 9

Slices per second (a) and per Joule (b) of OpenMP and CUDA versions of the Filtered Backprojection algorithm executed on traditional high-end Xeon architecture (red) and on Jetson TK1 board (blue). The rightmost bar shows the combined GPU + CPU solution on Jetson TK1; that is, only 3 CPU threads are used for the OpenMP version. Otherwise, the OpenMP version uses all the available threads, that is, 24 threads on Xeon and 4 threads on Jetson TK1.
(a)
(b)