(a) Dataset-level view
(b) Memory-level view
Figure 1: (a) An overview of row-column decomposition (RCD) for 2D FFT implementation. Intermediate storage is required because the results of all row-wise operations must be available before column-wise processing can begin. (b) An overview of strided column-wise access from DRAM compared with trivial row-wise access. An entire DRAM row must be read into the row buffer even to access a single element of that row.
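The two views in Figure 1 can be illustrated with a minimal Python sketch. It is not the implementation discussed here: `dft_1d` is a naive O(n^2) transform standing in for a real 1D FFT kernel, and `count_row_activations` is a hypothetical single-bank model that assumes row-major storage, one open row at a time, and a DRAM row holding `row_buffer_elems` matrix elements. All function and parameter names are illustrative.

```python
import cmath


def dft_1d(x):
    """Naive 1D DFT, used here only as a stand-in for a 1D FFT kernel."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def fft2_rcd(a):
    """2D transform by row-column decomposition (RCD)."""
    n_rows, n_cols = len(a), len(a[0])
    # Row pass: every row is transformed and kept in intermediate storage.
    intermediate = [dft_1d(row) for row in a]
    # Column pass: each column needs one element from every row result,
    # so it cannot start until the whole row pass has finished.
    cols = [
        dft_1d([intermediate[r][c] for r in range(n_rows)])
        for c in range(n_cols)
    ]
    # cols is indexed [column][row]; transpose back to [row][column].
    return [[cols[c][r] for c in range(n_cols)] for r in range(n_rows)]


def count_row_activations(n_rows, n_cols, row_buffer_elems, column_wise):
    """Count DRAM row activations for a full traversal of a row-major matrix.

    Assumed model: a single bank with one open row; touching an element
    in a different DRAM row forces a new activation.
    """
    if column_wise:
        order = ((r, c) for c in range(n_cols) for r in range(n_rows))
    else:
        order = ((r, c) for r in range(n_rows) for c in range(n_cols))
    open_row = None
    activations = 0
    for r, c in order:
        dram_row = (r * n_cols + c) // row_buffer_elems
        if dram_row != open_row:
            activations += 1
            open_row = dram_row
    return activations
```

Under these assumptions, traversing an 8 x 8 matrix with 8 elements per DRAM row takes 8 activations row-wise (`column_wise=False`) and 64 column-wise (`column_wise=True`), since every strided access lands in a different DRAM row. Real devices have multiple banks and larger rows, so the ratio is indicative only.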