Research Article  Open Access
KeunYung Byun, ChunSu Park, JeeYoung Sun, SungJea Ko, "Vector Radix 2 × 2 Sliding Fast Fourier Transform", Mathematical Problems in Engineering, vol. 2016, Article ID 2416286, 8 pages, 2016. https://doi.org/10.1155/2016/2416286
Vector Radix 2 × 2 Sliding Fast Fourier Transform
Abstract
The twodimensional (2D) discrete Fourier transform (DFT) in the sliding window scenario has been successfully used for numerous applications requiring consecutive spectrum analysis of input signals. However, the results of conventional sliding DFT algorithms are potentially unstable because of the accumulated numerical errors caused by recursive strategy. In this letter, a stable 2D sliding fast Fourier transform (FFT) algorithm based on the vector radix (VR) 2 × 2 FFT is presented. In the VR2 × 2 FFT algorithm, each 2D DFT bin is hierarchically decomposed into four subDFT bins until the size of the subDFT bins is reduced to 2 × 2; the output DFT bins are calculated using the linear combination of the subDFT bins. Because the subDFT bins for the overlapped input signals between the previous and current window are the same, the proposed algorithm reduces the computational complexity of the VR2 × 2 FFT algorithm by reusing previously calculated subDFT bins in the sliding window scenario. Moreover, because the resultant DFT bins are identical to those of the VR2 × 2 FFT algorithm, numerical errors do not arise; therefore, unconditional stability is guaranteed. Theoretical analysis shows that the proposed algorithm has the lowest computational requirements among the existing stable sliding DFT algorithms.
1. Introduction
The twodimensional (2D) discrete Fourier transform (DFT) has been widely used for spectrum analysis of 2D input signals in the field of signal processing. The vector radix (VR) 2 × 2 FFT [1] is one of the most practical approaches to performing the 2D DFT. The VR2 × 2 FFT algorithm hierarchically decomposes each DFT bin into subDFT bins until the size of the DFT bins becomes 2 × 2. Because the DFT bins can be efficiently obtained from the subDFT bins using a butterfly structure, the computational cost of the DFT can be significantly reduced. Additionally, various VR FFT algorithms including VR4 × 4, split VR, and VR2^{2} × 2^{2} have been introduced to further enhance the efficiency of the VR2 × 2 FFT by more finely decomposing the DFT bins [2–5].
The existing VRbased 2D FFT algorithms are widely applied in various fields of signal processing with satisfactory computing speed. However, these algorithms have computational redundancies when the transform window is shifted to the next sample because the input signals in the previous and current windows overlap. To reduce the redundancies in the sliding window scenario, numerous algorithms have been introduced over the past several years. The sliding DFT (SDFT) algorithm significantly reduces the computational load of the 1D DFT for the shifted window using the circular shift property [6]. The hopping DFT (HDFT) applies the SDFT to the hopping window scenario; HDFT can reduce the number of complex multiplications and additions in intermediate calculations [7]. Recently, the 2D SDFT proposed in [8] further reduced the number of computations of the SDFT using recursive strategy for a 2D input signal. The moving FFT (MFFT) algorithm proposed in [9] introduced a fast implementation of the 2D FFT by updating the previously calculated FFT bins when the current window is shifted. It is noteworthy that the recursive strategy adopted in the aforementioned sliding algorithms significantly reduces the number of complex multiplications and additions. However, the recursive implementation causes error propagation. Because, in practice, the complex numbers used in the DFT are represented in floatingpoint format with finite precision, the output DFT bins of the sliding algorithm and those of the traditional algorithm are not the same, although they are mathematically equivalent. Furthermore, the errors accumulate and, thus, can increase in the worst case.
To address this issue, several stable DFT algorithms have been proposed. The stable SDFT proposed in [10], called rSDFT, reduces the numerical error by multiplying the twiddle factor by the damping factor , where . In the modulated SDFT (mSDFT) [11], the unstable twiddle factor is multiplied recursively by the previous output DFT bin, which is considered to be the cause of numerical errors. Therefore, the mSDFT removes the twiddle factor from the feedback loop, thereby guaranteeing unconditional stability. Recently, the guaranteedstable SDFT (gSDFT) was proposed for fast, stable DFT [12]. The gSDFT not only guarantees the stability by removing the twiddle factor from the feedback of the resonator but also reduces the computational requirements by adopting the butterfly structure of the traditional FFT algorithm to obtain the bins of the updating vector transform. Nevertheless, the output DFT bins still contain the numerical error because the recursive strategy is used. On the other hand, the sliding FFT (SFFT) algorithm introduced by FarhangBorojueny and Gazor in [13] does not adopt the recursive strategy. The SFFT calculates the bins of the shifted window by exploiting delayed intermediate calculations of the previous window. Therefore, the SFFT has the advantage that numerical errors do not occur as the transform window slides. However, because the SFFT is designed to handle 1D input data, its computational complexity can be further reduced for the 2D input data.
In this letter, we propose a novel stable, fast 2D sliding FFT algorithm. Because the 2D butterfly structure used in the VR2 × 2 FFT algorithm is comprised of the 1D butterfly structure, the proposed VR2 × 2 SFFT adopts the concept of the SFFT to guarantee the stability. In addition, we consider the 2D sliding window to move in only one direction, that is, columnwise or rowwise, to reduce the computational complexity. The rest of this letter is organized as follows. In Section 2, we first analyze the computational relationship between the subDFT bins and the DFT bins of the VR2 × 2 FFT. Then, we explain how the proposed VR2 × 2 SFFT algorithm reduces the amount of computation in the sliding window scenario while guaranteeing the computational stability. Section 3 presents the performance of the VR2 × 2 SFFT by analyzing its arithmetic complexity and stability, making a comparison with those of the conventional algorithms. Finally, conclusions are given in Section 4.
2. Proposed VR2 × 2 SFFT Algorithm
Because the proposed VR2 × 2 SFFT algorithm reduces the computational complexity of the VR2 × 2 FFT in the sliding window situation, we first explain the structure of the VR2 × 2 FFT. The VR2 × 2 FFT can be derived only for an data sequence, where is an integer power of two. Let and denote the bin of the input data and the bin of the output DFT bins, respectively. Then, the point DFT is defined aswhere , and is a twiddle factor equal to .
The VR2 × 2 FFT algorithm first decomposes (1) into 2 × 2 partial sums as follows:Note that each partial sum denoted by is a 2D DFT of size .
has a period of along both and ; that is, Combining (2) and (3), the following relationship can be obtained:where , and . The matrix relationship shows that the subDFT bins are shared to calculate multiple DFT bins, resulting in the VR2 × 2 FFT algorithm having less computational complexity than 2D DFT. This relationship has generally been illustrated with the butterfly diagram, as shown in Figure 1.
It is obvious that the conventional butterfly diagram represents the computational relations between DFT bins and subDFT bins. However, because the butterfly diagram has been developed to illustrate the data flow of the 1D FFT, it is difficult to present both the 2D spatial position of the input and output data and the spatial relations between DFT bins and subDFT bins. Although the index of each subDFT bin can indicate its 2D spatial location, the diagram becomes complicated as the amount of data increases.
For visual comprehension of the VR2 × 2 FFT algorithm, we introduce a new 3D diagram, as shown in Figure 2.
Unlike for the conventional 2D butterfly diagram, the 2D spatial location of the input subDFT bins and the output DFT bins can be clearly illustrated, as in Figure 2. In addition, because the Xshaped pairs of arrows do not overlap with each other, it is easier to discriminate the relations between input and output DFT bins. However, the diagram still becomes complicated as the amount of data increases. To make the diagram simpler, let us omit the Xshaped arrows and twiddle factors , , and from Figure 2. Then, the 3D diagram of the VR2 × 2 FFT can be simplified as shown in Figure 3.
The decimation procedure repeats times until the size of is reduced to 2 × 2. Let denote the th decimation stage, where is an integer in the range of , and the input data bins appear at stage . The sizes of and at stage are and , respectively. The twiddle factors multiplied with , , , and , , at are 1, , , and , respectively. An example of the twiddle factors required for the 8 × 8point VR2 × 2 FFT is shown in Figure 4.
Now, let us consider the computational relationship between the input data and the values of . As in (2), the VR2 × 2 FFT algorithm computes the DFT bins using the decomposed 2 × 2 , which means that the input data give hierarchical dependencies to the output DFT bins through all decimation stages.
This technique can best be explained via an example. Figure 5 shows a flow graph of the 8 × 8point VR2 × 2 FFT algorithm. For the 8 × 8point VR2 × 2 FFT, three decimation stages, that is, , , and , are required, and, thus, there are four data layers, as shown in Figure 5. Let , , denote the th data layer, where and consist of the input data and the output DFT bins, respectively. Note that the input data are arranged in bitreverse order. First, the 2 × 2 input data indicated by black circles in are linearly combined at stage to obtain the corresponding 2 × 2 DFT bins indicated by gray circles in . Then, 4 × 4 data including the resultant 2 × 2 DFT bins in are used at stage to calculate 4 × 4 DFT bins indicated by gray circles in . Finally, the 8 × 8 DFT bins in are obtained at stage using all of the data in . Here, it is noteworthy that the data indicated by white circles in and are not involved in calculating the subDFT bins indicated by gray circles in and .
Based on this observation, we derive the proposed algorithm, which can reduce the computational complexity of the VR2 × 2 FFT in the sliding window scenario. Assume that all calculations in the structure shown in Figure 5 have already been performed for the previous data and that the results were stored. Furthermore, assume that the 8 × 8 window on the input data shifts by one column and that the same process must be performed at the current state.
Figure 6 shows 2D flow graphs of the VR2 × 2 SFFT at both the previous and the current state by projecting the 3D flow graph in Figure 5 onto the columnlayer plane. Denote a set of data of the column in the th layer by . Then, the number of data corresponding to is . Figure 6(a) shows that the 8 × 8point VR2 × 2 FFT is performed using the input data from the 0th to the 7th column from the previous state. All of the calculated data in and represented by gray circles are stored. At the current state, the input data of the 0th column are removed from the window and those of the 8th column are newly included. After the input data of the current window are arranged in bitreverse order, they are transformed using the 8 × 8point VR2 × 2 FFT.
Here, note that the data in , 5, rd, 7, and nd, 6 columns are paired, becoming the input of at the current state, as they were at the previous state. Then, the values of , , , , , in and , , , in at the previous state are the same as those at the current state, and, thus, they can be reused. As a result, only the data indicated by white circles in Figure 6(b) need to be calculated in the sliding scenario.
In addition, the computations for the data indicated by white circles can be further reduced, based on the fact that the 2D window is shifted columnwise. The structure of the 2D butterfly is composed of two stages, where each stage has two 1D butterflies, as shown in Figure 7.
In the first stage, two sets of the data and are the inputs of the single butterfly. The resultant and are used as the input to the butterflies in the second stage. Here, it can be observed that the data set is not used to calculate . This means that if the data have already been calculated, they do not need to be recalculated. Then, the complexity of one complex multiplication () and two complex additions (’s) can be reduced.
In Figure 6(b), the input data of the 4th and 8th columns are used together as the input to stage . Because the calculations for the data in the 4th column were already performed for the previous state, the results can be reused for the current state. Next, and are paired with and , respectively, as the input at stage , and the 2D butterfly is performed for each pair. Because and have also been calculated, the number of computations for the 2D butterflies can be reduced. The complexity reductions can be achieved at stage in the same manner.
Consequently, each 2D butterfly of the proposed algorithm requires only two ’s and six ’s, whereas that of the VR2 × 2 FFT algorithm needs three ’s and eight ’s. Moreover, because the output DFT bins of the proposed algorithm are identical to those of the VR2 × 2 FFT algorithm, numerical errors do not accumulate. Therefore, the proposed VR2 × 2 SFFT algorithm can guarantee stability. Next, we analyze the computational complexity of the proposed VR2 × 2 SFFT and compare it to that of the conventional algorithms.
3. Complexity and Stability Analysis of the VR2 × 2 SFFT Algorithm
3.1. Complexity Analysis
For input data, the VR2 × 2 FFT is comprised of decimation stages. Each decimation stage has 2 × 2 butterflies, and each 2 × 2 butterfly needs three ’s and eight ’s. Hence, for the input data, the VR2 × 2 FFT requires ’s and ’s. Among the butterflies in the structure of the VR2 × 2 FFT, the 2 × 2 butterflies related to only the newly imported input data must be calculated in the VR2 × 2 SFFT. For the input data, the number of 2 × 2 butterflies calculated at is . Then, the total number of 2 × 2 butterflies required by the VR2 × 2 SFFT isAs mentioned in Section 2, a 2 × 2 butterfly needs two ’s and six ’s, and the VR2 × 2 SFFT requires ’s and ’s for input data. Because the proposed VR2 × 2 SFFT computes only some of the butterflies among those of the VR2 × 2 FFT structure, the computational requirements of the proposed algorithm are lower than those of the VR2 × 2 FFT algorithm.
A computational comparison of various window sizes is shown in Table 1, where one is counted as four real multiplications (’s) and two real additions (’s) and one is counted as two ’s. The 2D MFFT [9], SFFT [13], and 2D SDFT [8] are chosen for the existing fast DFT/FFT algorithms. In addition, the rSDFT [10], mSDFT [11], and gSDFT [12] are chosen for the existing stable DFT algorithms. Considering that the SFFT, rSDFT, mSDFT, and gSDFT are proposed for the 1D input signal, we assume that each 1D transform is horizontally performed on the 2D input signal, and, then, the 1D FFT is vertically applied to the results. All the algorithms listed in Table 1 were implemented using the ANSIC code and the performance was evaluated on a 3.3GHz CPU with 8 GB of RAM. In our simulation, the size of transform window was set to 16 × 16 and we measured the processing time required for performing the sliding transform process times. For each algorithm, we measured the processing time 10 times and averaged the measured values. The resultant values are listed in Table 2. The complexity of the proposed VR2 × 2 SFFT is higher than that of the 2D SDFT and 2D MFFT. However, the proposed algorithm achieves complexity in both and , which is the lowest among the existing stable DFT algorithms.


Further, we present the memory requirements of all the algorithms in Table 2. For each algorithm, we examined the amount of required memory for performing the sliding transform. For the sake of the clarity, the memory to store the input and output signals is not considered. In Table 2, we see that the memory requirements of the algorithms vary depending on the window size. In general, 2D SDFT and 2D MFFT need a relatively small amount of memory as compared to the other algorithms. Note that the size of the transform window is usually much smaller than those of the input and output signals. Therefore, in general, the amount of memory required to performing the sliding transform may be not a big burden for realworld applications.
3.2. Stability Analysis
We investigated the stability of the proposed VR2 × 2 SFFT algorithm using a complex test signal, which was zeromean Gaussian noise with a standard deviation of one. The simulation was performed in 64bit doubleprecision arithmetic; was set to 16. In our simulation, the numerical errors are generated by the recursive strategy of the sliding DFT/FFT algorithms. Therefore, we evaluate the stability of the algorithm by computing the differences between the output DFT bins of the sliding algorithm and those of the reference algorithm. The reference algorithm denotes the original algorithm from which the sliding algorithm was derived, for example, the reference algorithm of the VR2 × 2 SFFT is the VR2 × 2 FFT. All algorithms were tested over iterations. The error at time index is calculated aswhere represents the DFT bin of the reference algorithm and is the bin of the test algorithm.
Then, it is observed that the errors in the 2D MFFT and the 1D rSDFT + 1D FFT are accumulated as increases, as shown in Figure 8. For the rSDFT, the damping factor is set to . In our simulation, the average of the 2D MFFT is and that of the 1D rSDFT + 1D FFT is . In terms of the average increasing ratio of , the 2D MFFT and 1D rSDFT + 1D FFT are and , respectively.
The measured values of the 1D mSDFT + 1D FFT, 1D gSDFT + 1D FFT, and VR2 × 2 SFFT are presented in Figure 9. The errors of the 1D mSDFT + 1D FFT and the 1D gSDFT + 1D FFT did not accumulate; however, they fluctuated in the range of and , respectively. The average values of the 1D mSDFT + 1D FFT and 1D gSDFT + 1D FFT are and , respectively. On the other hand, as mentioned in Section 2, the outputs of the proposed algorithm are exactly the same as those of the VR2 × 2 FFT, and errors do not accumulate. Therefore, the proposed VR2 × 2 SFFT outperforms the other algorithms in terms of stability.
4. Conclusion
In this letter, a new stable SFFT based on the VR2 × 2 FFT algorithm was presented for 2D input data. We first analyzed the computational relationship between the subDFT bins of the structure of the VR2 × 2 FFT. Then, we adopt the concept of the 1D SFFT, which calculates the bins of the shifted window by exploiting the delayed intermediate calculations of the previous window. The proposed VR2 × 2 SFFT algorithm achieves complexity, which is the lowest among the existing stable DFT algorithms. Because the output DFT bins of the proposed method are exactly the same as those of the VR2 × 2 FFT algorithm, the numerical errors do not accumulate in the sliding transform process.
Conflict of Interests
The authors declare that there is no conflict of interest regarding the publication of this paper.
Acknowledgment
This work was supported by Institute for Information & Communications Technology Promotion (IITP) Grant funded by the Korean Government (MSIP) (no. B0101150525; Development of Global MultiTarget Tracking and Event Prediction Techniques Based on RealTime LargeScale Video Analysis) and by Business for Cooperative R&D between Industry, Academy, and Research Institute funded Korea Small and Medium Business Administration in 2015 (Grant no. C0276386).
References
 K. R. Rao, D. N. Kim, and J. J. Hwang, Fast Fourier Transform—Algorithms and Applications, Springer, 2011.
 H. R. Wu and F. J. Paoloni, “Structure of vector radix fast Fourier transforms,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 9, pp. 1415–1424, 1989. View at: Publisher Site  Google Scholar
 H. R. Wu and F. J. Paoloni, “On the twodimensional vector splitradix FFT algorithm,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 8, pp. 1302–1304, 1989. View at: Publisher Site  Google Scholar  Zentralblatt MATH
 S. C. Chan and K. L. Ho, “Split vectorradix fast Fourier transform,” IEEE Transactions on Signal Processing, vol. 40, no. 8, pp. 2029–2039, 1992. View at: Publisher Site  Google Scholar  Zentralblatt MATH
 M. T. Hamood and S. Boussakta, “Vectorradix2^{2}$\times $2^{2} fast Fourier transform algorithm,” in Proceedings of the 17th IEEE International Conference on Electronics, Circuits, and Systems (ICECS '10), pp. 734–737, IEEE, Athens, Greece, December 2010. View at: Publisher Site  Google Scholar
 E. Jacobsen and R. Lyons, “An update to the sliding DFT,” IEEE Signal Processing Magazine, vol. 21, no. 1, pp. 110–111, 2004. View at: Publisher Site  Google Scholar
 C.S. Park and S.J. Ko, “The hopping discrete fourier transform,” IEEE Signal Processing Magazine, vol. 31, no. 2, pp. 135–139, 2014. View at: Publisher Site  Google Scholar
 C.S. Park, “2D discrete Fourier transform on sliding windows,” IEEE Transactions on Image Processing, vol. 24, no. 3, pp. 901–907, 2015. View at: Publisher Site  Google Scholar  MathSciNet
 B. G. Sherlock and D. M. Monro, “Moving discrete Fourier transform,” IEE Proceedings, Part F: Radar and Signal Processing, vol. 139, no. 4, pp. 279–282, 1992. View at: Publisher Site  Google Scholar
 E. Jacobsen and R. Lyons, “The sliding DFT,” IEEE Signal Processing Magazine, vol. 20, no. 2, pp. 74–80, 2003. View at: Publisher Site  Google Scholar
 K. Duda, “Accurate, guaranteed stable, sliding discrete fourier transform,” IEEE Signal Processing Magazine, vol. 27, no. 6, pp. 124–127, 2010. View at: Publisher Site  Google Scholar
 C. Park, “Fast, accurate, and guaranteed stable sliding discrete Fourier transform,” IEEE Signal Processing Magazine, vol. 32, no. 4, pp. 145–156, 2015. View at: Publisher Site  Google Scholar
 B. FarhangBorojueny and S. Gazor, “Generalized sliding FFT and its application to implementation of block LMS adaptive filters,” IEEE Transactions on Signal Processing, vol. 42, no. 3, pp. 532–537, 1994. View at: Publisher Site  Google Scholar
Copyright
Copyright © 2016 KeunYung Byun et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.