Research Article  Open Access
Leonid Moroz, Shinobu Nagayama, Taras Mykytiv, Ihor Kirenko, Taras Boretskyy, "Simple Hybrid Scaling-Free CORDIC Solution for FPGAs", International Journal of Reconfigurable Computing, vol. 2014, Article ID 615472, 4 pages, 2014. https://doi.org/10.1155/2014/615472
Simple Hybrid Scaling-Free CORDIC Solution for FPGAs
Abstract
COordinate Rotation DIgital Computer (CORDIC) is an effective method used in digital signal processing applications for computing various trigonometric, hyperbolic, linear, and transcendental functions. This paper presents the theoretical basis and practical implementation of a circular (sine-cosine) CORDIC-based generator. Synthesis results for this generator on an Altera Stratix III FPGA (EP3SL340F1517C2) using Quartus II version 9.0 show that the proposed hybrid FPGA architecture significantly reduces latency (42% reduction) with a small area overhead, compared to the conventional version. The proposed algorithm has been simulated for sine and cosine function evaluation, and it has been verified that its accuracy is comparable with that of the conventional algorithm.
1. Introduction
COordinate Rotation DIgital Computer (CORDIC) is an effective method for rotating a vector in circular and hyperbolic coordinate systems. CORDIC was described by Jack Volder in 1959. Since then, a great deal of work on the subject has been published [1]. The method remains popular due to its ease of software and hardware implementation. Indeed, the classical CORDIC method needs only a small amount of memory and a few basic operations such as loading from memory, addition, subtraction, and shift. The disadvantage of classical CORDIC is that the method has linear convergence. Thus, in order to get the correct fractional bits of the result, one must perform all iterations.
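As an illustration of the classical method described above, the following is a minimal floating-point sketch of rotation-mode circular CORDIC. It is not the authors' implementation: the function name `cordic_sincos` and the default of 16 iterations are assumptions made for this example, and multiplication by a power of two stands in for a hardware shift.

```python
import math

def cordic_sincos(angle, iterations=16):
    """Rotation-mode circular CORDIC sketch; returns (cos, sin) of angle.

    Converges for |angle| up to roughly 1.74 rad. The name and the
    floating-point arithmetic are illustrative assumptions.
    """
    # Table of elementary angles atan(2^-i): the only memory required.
    atan_table = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Accumulated gain of all micro-rotations, compensated once up front.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0      # rotation direction from residual angle
        shift = 2.0 ** -i                # models a right shift by i bits
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * atan_table[i]           # update residual angle
    return x, y
```

Each pass through the loop contributes roughly one additional correct bit, which is the linear convergence noted above: truncating the loop early leaves a residual angle of at most the last elementary angle used.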
A comprehensive review of the current state of the theory and hardware implementation of the CORDIC method is provided in [1–3]. To achieve high performance, there have been attempts to build hybrid algorithms, such as LUT + CORDIC or CORDIC + final multipliers [1, 4, 5]. However, there is no hybrid algorithm that combines all three components: LUT, CORDIC, and final multipliers. Moreover, no practical realization of such an algorithm is known. An interesting method to increase performance and reduce the average computation time, known as the scaling-free CORDIC method, was proposed in [6, 7]. However, this approach has a narrow range of input angles and a complex structure, which makes it impossible to design CORDIC calculators with an accuracy of more than 16 correct bits.
In this paper we propose a high-performance hybrid architecture with a simple structure that allows one to obtain more than 16 correct bits of the result.
2. General Approach
The known methods to improve the performance of conventional [1, 2, 8, 9] and scaling-free CORDIC methods [6, 7, 10, 11] are the table lookup method and the residual multiplication method described in [4, 10, 12], respectively. The table method is used to process the most significant bits of the input angle (argument), while the residual multiplication handles the least significant bits.
The first method creates a precalculated table of values for the most significant bits of the function argument. Along these lines, [10] proposes an approach based on the scaling-free CORDIC, which combines memory units (LUT) with the corresponding iterative processing. The algorithms in [4, 12] compute sine and cosine functions by performing the first iterations with the classical CORDIC and then applying output multipliers, which allows significant savings in hardware resources compared to the classical approach described in [13]. We propose combining both approaches. The combination of memory units, iterative scaling-free CORDIC stages, and multipliers provides a method that does not distort the modulus of the vector, which significantly saves hardware resources in FPGA implementation and reduces delays. The theoretical basis for the proposed method and its 16-bit microcontroller implementation are described in [11, 14]. However, no hardware implementation on FPGA has been reported until now. This paper proposes an algorithm for a hybrid FPGA calculator of the trigonometric sine and cosine functions.
3. Proposed Algorithm
We present an improved and expanded version of the hybrid CORDIC algorithm of [11, 14] as follows.
Let be the given number of bits of the CORDIC, which determines the absolute error for calculating sine and cosine functions. We assume that the input angle is in the range (the range of values will be discussed below) and is represented as where are the coefficients of the binary representation of an angle.
To construct the sine-cosine generator, we use three elements: a LUT, simple scaling-free CORDIC stages, and output multipliers. Accordingly, the input angle is divided into three groups, which are processed sequentially using the LUT, the scaling-free CORDIC, and the output multipliers.
Therefore
We must set the numbers and to determine the size of each of these construction elements. By using and , the number of bits of each group can be written as;;.
To do this, we first consider the possibility of using the CORDIC method in this computational scheme. The scale factor for conventional CORDIC is and for the scaling-free CORDIC it is on every iteration. An angle is also updated by the values and accordingly [11, 14]. For a given value, the influence of the scale factor on the results of calculations for the conventional CORDIC is canceled when, beginning from [1, 4], and for the scaling-free CORDIC it is canceled when, beginning from [11]. This is because the scale factor becomes equal to 1.
Similarly, the influence of the angle for a conventional CORDIC also stops when, beginning from [5], and for the scaling-free CORDIC when, beginning from [11]. In these cases and are equal,.
Hence, we determine that and. Thus, the number of bits for the first group (most significant bits) is defined as, that for the second (middle) is, and that for the third (least significant bits) is.
Example 1. Assume that, ; and.
Examples of the distribution of iterations between LUT, scaling-free (SF) CORDIC, and multipliers for different numbers of digits are shown in Table 1.
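The division of the input angle into three groups can be sketched as simple bit-field extraction. The example below is only an illustration under stated assumptions: the helper names `split_angle` and `field_angles` are hypothetical, the angle is assumed to be an unsigned 16-bit fraction whose most significant bit has weight 2^-1, and the default widths (4, 4, 8) combine the 4-bit LUT input and 4 CORDIC stages reported in Section 4 with an assumed 8-bit remainder for the multipliers.

```python
def split_angle(word, widths=(4, 4, 8)):
    """Split an unsigned fixed-point angle word into (high, middle, low) fields.

    widths = (LUT bits, scaling-free CORDIC bits, multiplier bits); the
    default values are an assumption for a 16-bit example.
    """
    hi_w, mid_w, lo_w = widths
    total = hi_w + mid_w + lo_w
    if not 0 <= word < (1 << total):
        raise ValueError("angle word does not fit in the given widths")
    hi = word >> (mid_w + lo_w)                 # most significant bits: LUT address
    mid = (word >> lo_w) & ((1 << mid_w) - 1)   # middle bits: CORDIC stages
    lo = word & ((1 << lo_w) - 1)               # least significant bits: multipliers
    return hi, mid, lo

def field_angles(word, widths=(4, 4, 8)):
    """Return the three partial angles (in radians) whose sum is the input angle."""
    hi_w, mid_w, lo_w = widths
    hi, mid, lo = split_angle(word, widths)
    scale = 2.0 ** -(hi_w + mid_w + lo_w)       # weight of the least significant bit
    return (hi << (mid_w + lo_w)) * scale, (mid << lo_w) * scale, lo * scale
```

For instance, `split_angle(0xA5C3)` yields the fields `0xA`, `0x5`, and `0xC3`, and the three values returned by `field_angles(0xA5C3)` add up to `0xA5C3 / 65536`.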

Step 1. Preprocessing (or range reduction). Calculation of the sine and cosine of any angle, which is in the range, can be reduced to the calculation of the sine and cosine of an angle, which is in the first octant of . Such a reduction of range is allowed by the rule of 8-point symmetry of the circle. Methods for transforming the input angle to the are well known, for example, [8, 10], so we do not discuss them here.
Step 2. For the angle is divided into the following three angles:
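Since the range reduction itself is not spelled out here, the following sketch shows one common way to apply the 8-point symmetry of the circle. It is an assumption-laden illustration rather than the method of [8, 10]: the function names `fold_to_first_octant` and `unfold` are hypothetical, and floating-point arithmetic is used for clarity.

```python
import math

def fold_to_first_octant(angle):
    """Map any angle (radians) to (reduced angle in [0, pi/4], octant index 0..7)."""
    a = math.fmod(angle, 2.0 * math.pi)
    if a < 0.0:
        a += 2.0 * math.pi
    octant = min(int(a // (math.pi / 4.0)), 7)
    r = a - octant * (math.pi / 4.0)
    if octant % 2 == 1:
        r = math.pi / 4.0 - r            # odd octants are mirrored
    r = min(max(r, 0.0), math.pi / 4.0)  # guard against rounding at the edges
    return r, octant

def unfold(cos_r, sin_r, octant):
    """Recover (cos, sin) of the original angle from the reduced-angle values."""
    table = [
        ( cos_r,  sin_r),   # octant 0: angle = r
        ( sin_r,  cos_r),   # octant 1: angle = pi/2 - r
        (-sin_r,  cos_r),   # octant 2: angle = pi/2 + r
        (-cos_r,  sin_r),   # octant 3: angle = pi - r
        (-cos_r, -sin_r),   # octant 4: angle = pi + r
        (-sin_r, -cos_r),   # octant 5: angle = 3*pi/2 - r
        ( sin_r, -cos_r),   # octant 6: angle = 3*pi/2 + r
        ( cos_r, -sin_r),   # octant 7: angle = 2*pi - r
    ]
    return table[octant]
```

The same octant index serves both the preprocessing here and the postprocessing of Step 5: only a swap of the two outputs and a choice of signs is needed to restore the result for the original angle.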
Step 3. Value is selected from LUT:
and then the following iterations are made for: For In vector-matrix form, Step 3 is as follows:
Step 4. Performing the multiplication on the residual angle or in the unfolded form Evidently from these equations, the final multiplication can be realized with the shift-add operations typical of CORDIC.
Step 5. Postprocessing. The computed sine and cosine of the angle must be converted to the sine and cosine of the angle by the rule of 8-point symmetry of the circle (everything reduces to the appropriate choice of sign and ).
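To make the idea of a scaling-free stage concrete, the sketch below uses the approximation familiar from the scaling-free CORDIC literature [6], sin a ≈ 2^-i and cos a ≈ 1 - 2^-(2i+1), applied only for angle bits equal to 1. This is an assumption for illustration and may differ in detail from the iteration used in this paper; the names `sf_stage` and `sf_rotate` and the starting index 5 in the usage note are hypothetical.

```python
import math

def sf_stage(x, y, i):
    """One scaling-free micro-rotation by approximately 2^-i radians."""
    c = 1.0 - 2.0 ** -(2 * i + 1)   # cos approximation: one shift and one subtract
    s = 2.0 ** -i                   # sin approximation: one shift
    return c * x - s * y, s * x + c * y

def sf_rotate(x, y, bits, first_index):
    """Apply stage first_index + k only where bits[k] is 1 (assumed convention)."""
    for k, b in enumerate(bits):
        if b:
            x, y = sf_stage(x, y, first_index + k)
    return x, y
```

For example, `sf_rotate(1.0, 0.0, [1, 0, 1, 1], 5)` rotates the unit vector by about 2^-5 + 2^-7 + 2^-8 rad, and the length of the result stays within about 10^-6 of 1, so no scale-factor compensation is applied in this sketch.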
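The residual multiplication can be illustrated with a first-order rotation, assuming that for a sufficiently small residual angle t one may take cos t ≈ 1 and sin t ≈ t, so that x' = x - y·t and y' = y + x·t. This form, the name `residual_rotate`, and the bit indices used in the usage note are assumptions for illustration, not necessarily the exact equations of Step 4.

```python
import math

def residual_rotate(x, y, low_bits, first_index):
    """First-order rotation by t = sum(b_k * 2^-(first_index + k)) via shift-adds.

    Computes x - y*t and y + x*t one angle bit at a time, so each
    nonzero bit costs one shift and one addition per coordinate.
    """
    dx, dy = 0.0, 0.0
    for k, b in enumerate(low_bits):
        if b:
            w = 2.0 ** -(first_index + k)   # models a right shift by first_index + k
            dx += y * w
            dy += x * w
    return x - dx, y + dy
```

With the residual angle below 2^-8 rad, as when `first_index` is 9, the neglected second-order term is at most about 2^-17, which is below the weight of the last bit of a 16-bit result.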
4. FPGA Implementation Results
To show the effectiveness of the proposed algorithm, we implemented Steps 3 and 4 of the algorithm on the Altera Stratix III FPGA (EP3SL340F1517C2) using Quartus II version 9.0.
Figure 1 shows the architecture of the algorithm implemented on the FPGA. In this figure, the LUT and CORDIC blocks realize Step 3 of the algorithm, and the block labeled “Shift Add” realizes Step 4. To achieve high-throughput computation, the architecture is pipelined with 9 stages: 1 stage for the LUT block, 4 stages for the CORDIC block, and 4 stages for the Shift Add block.
The LUT block is implemented by ROM with a 4-bit input and two 16-bit outputs. In the CORDIC block, each iteration of the algorithm from to is implemented as a pipeline stage by logic elements, and it has two 20-bit outputs. The Shift Add block is implemented by dedicated DSP blocks on the FPGA, which have 4 pipeline stages. The outputs of the Shift Add block have 28 bits, and they are rounded to 16 bits.
In the best-known FPGA implementation of the conventional CORDIC algorithm for high throughput, all iterations of the algorithm from to are unrolled, and each of them is implemented as a pipeline stage. Thus, such an implementation incurs a long latency and requires many logic elements. In contrast, our implementation of the hybrid CORDIC requires fewer logic elements and has a shorter latency due to fewer pipeline stages.
Table 2 shows FPGA implementation results of the conventional CORDIC algorithm and our hybrid CORDIC algorithm. As shown in this table, our hybrid algorithm is much more efficient than the conventional one in terms of hardware cost. Compared to the conventional CORDIC, the proposed algorithm has approximately the same bandwidth. However, there is a considerable reduction in latency, as well as in the number of logic elements. Moreover, the number of pipeline stages is also reduced. Therefore, the proposed method is easier to implement and uses fewer logic elements, while it has similar bandwidth. Although our hybrid algorithm requires not only logic elements but also DSP blocks, this is efficient for modern FPGAs, because most modern FPGAs have DSP blocks and memory blocks, and a balanced use of them together with logic elements is more efficient. Therefore, we can conclude that our hybrid algorithm is suitable for modern FPGA implementation.

5. Concluding Remarks
This paper describes the theoretical basis and a practical pipelined FPGA implementation of a new hybrid scaling-free CORDIC algorithm. A logical combination of three building blocks of modern FPGAs, namely LUTs, simple scaling-free CORDIC stages, and multipliers, allowed a considerable improvement in the efficiency of calculating sine and cosine functions without loss of accuracy, compared to conventional CORDIC algorithms.
The proposed method allows the reduction of the FPGA hardware resources by more than 30%, while providing the same throughput and accuracy of calculations.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] P. K. Meher, J. Valls, T.-B. Juang, K. Sridharan, and K. Maharatna, “50 years of CORDIC: algorithms, architectures, and applications,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 56, no. 9, pp. 1893–1907, 2009.
[2] J.-M. Muller, Elementary Functions: Algorithms and Implementation, Birkhauser, Boston, Mass, USA, 2nd edition, 2006.
[3] B. Lakshmi and A. S. Dhar, “CORDIC architectures: a survey,” VLSI Design, vol. 2010, Article ID 794891, 19 pages, 2010.
[4] E. Antelo and J. Villalba, “Low latency pipelined circular CORDIC,” in Proceedings of the 17th IEEE Symposium on Computer Arithmetic (ARITH-17 '05), pp. 280–287, USA, June 2005.
[5] S. Wang, V. Piuri, and E. E. Swartzlander Jr., “Hybrid CORDIC algorithms,” IEEE Transactions on Computers, vol. 46, no. 11, pp. 1202–1207, 1997.
[6] K. Maharatna, S. Banerjee, E. Grass, M. Krstic, and A. Troya, “Modified virtually scaling-free adaptive CORDIC rotator algorithm and architecture,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 11, pp. 1463–1473, 2005.
[7] F. J. Jaime, M. A. Sánchez, J. Hormigo, J. Villalba, and E. L. Zapata, “Enhanced scaling-free CORDIC,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 57, no. 7, pp. 1654–1662, 2010.
[8] A. Madisetti, A. Y. Kwentus, and A. N. Willson Jr., “100-MHz, 16-b, direct digital frequency synthesizer with a 100-dBc spurious-free dynamic range,” IEEE Journal of Solid-State Circuits, vol. 34, no. 8, pp. 1034–1043, 1999.
[9] A. S. N. Mokhtar, M. B. I. Reaz, K. Chellappan, and M. A. Mohd Ali, “Scaling free CORDIC algorithm implementation of sine and cosine function,” in Proceedings of the World Congress on Engineering (WCE '13), vol. II, London, UK, July 2013.
[10] Y.-S. Juang, L.-T. Ko, J.-E. Chen, T.-Y. Sung, and H.-C. Hsin, “Optimization and implementation of scaling-free CORDIC-based direct digital frequency synthesizer for body care area network systems,” Computational and Mathematical Methods in Medicine, vol. 2012, Article ID 651564, 9 pages, 2012.
[11] L. Moroz, Theory and hardware-software devices of iteration evaluation of functions [M.S. thesis], Lviv Polytechnic National University, Lviv, Ukraine, 2013.
[12] F. de Dinechin, M. Istoan, and G. Sergent, “Fixed-point trigonometric functions on FPGAs,” in Proceedings of the 4th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, Edinburgh, UK, March 2013.
[13] XILINX LogiCORE IP CORDIC v6.0 (December 2013).
[14] L. Moroz, T. Mykytiv, and M. Herasym, “Improved scaling-free CORDIC algorithm,” in East-West Design & Test Symposium, pp. 1–5, September 2013.
Copyright
Copyright © 2014 Leonid Moroz et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.