#### Abstract

COordinate Rotation DIgital Computer (CORDIC) is an effective method used in digital signal processing applications for computing various trigonometric, hyperbolic, linear, and transcendental functions. This paper presents the theoretical basis and practical implementation of a circular (sine-cosine) CORDIC-based generator. Synthesis results for this generator on an Altera Stratix III FPGA (EP3SL340F1517C2) using Quartus II version 9.0 show that the proposed hybrid FPGA architecture significantly reduces latency (42% reduction) with a small area overhead, compared to the conventional version. The proposed algorithm has been simulated for sine and cosine function evaluation, and it has been verified that its accuracy is comparable to that of the conventional algorithm.

#### 1. Introduction

COordinate Rotation DIgital Computer (CORDIC) is an effective method for rotating a vector in circular and hyperbolic coordinate systems. CORDIC was described by Jack Volder in 1959, and a great deal of work on it has been published since then [1]. The method remains popular because it is easy to implement in both software and hardware. Indeed, the classical CORDIC method needs only a small amount of memory and a few basic operations: loading from memory, addition, subtraction, and shift. The disadvantage of classical CORDIC is that it converges linearly. Thus, in order to obtain the correct fractional bits of the result, one must perform all iterations.
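As an illustration of the operations listed above, the following is a minimal floating-point sketch of classical rotation-mode CORDIC. It is not taken from the paper: the function name `cordic_sin_cos` and the parameter `n_iter` are hypothetical, and multiplications by powers of two stand in for the hardware shifts. Each iteration contributes roughly one more correct bit, which is the linear convergence mentioned above.

```python
import math

def cordic_sin_cos(angle, n_iter=16):
    """Classical rotation-mode CORDIC (illustrative model, not the paper's design).

    Valid for |angle| up to about 1.74 rad, the sum of all elementary angles.
    """
    # Small table of elementary angles atan(2^-i): the only memory needed.
    atan_table = [math.atan(2.0 ** -i) for i in range(n_iter)]
    # Total gain of all micro-rotations; compensated by scaling the start vector.
    gain = 1.0
    for i in range(n_iter):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, angle
    for i in range(n_iter):
        d = 1.0 if z >= 0.0 else -1.0      # rotation direction from residual angle
        shift = 2.0 ** -i                  # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * atan_table[i]
    return y, x                            # (sin, cos)
```

All `n_iter` iterations are always executed, regardless of the input angle, which is the cost that hybrid schemes try to reduce.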

A comprehensive review of the current state of the theory and hardware implementation of the CORDIC method is provided in [1–3]. To achieve high performance, hybrid algorithms such as LUT + CORDIC or CORDIC + final multipliers have been proposed [1, 4, 5]. However, there is no hybrid algorithm that combines all three components: LUT, CORDIC, and final multipliers. Moreover, no practical realization of such an algorithm is known. An interesting method to increase performance and reduce the average computation time, known as the scaling-free CORDIC method, was proposed in [6, 7]. However, this approach has a narrow range of input angles and a complex structure, which makes it impossible to design CORDIC calculators with an accuracy of more than 16 correct bits.

In this paper we propose a high-performance hybrid architecture with a simple structure that allows one to obtain more than 16 correct bits of the result.

#### 2. General Approach

The known methods for improving the performance of conventional [1, 2, 8, 9] and scaling-free [6, 7, 10, 11] CORDIC methods are the table (look-up) method and the residual multiplication method, described in [4, 10, 12]. The table method processes the most significant bits of the input angle (argument), while the residual multiplication handles the least significant bits.

The first method creates a precalculated table of values for the most significant bits of the function argument. For this method, [10] proposes an approach based on the scaling-free CORDIC, which combines memory units (LUT) with the corresponding iterative processing. The algorithms in [4, 12] compute sine and cosine functions by performing the first iterations with the classical CORDIC and then applying output multipliers, which allows significant savings in hardware resources compared to the classical approach described in [13]. We propose combining both approaches. The combination of memory units, iterative scaling-free CORDIC stages, and multipliers provides a method that does not distort the modulus of the vector, which significantly saves hardware resources in an FPGA implementation and reduces delays. The theoretical basis for the proposed method and its 16-bit microcontroller implementation are described in [11, 14]. However, no hardware implementation on an FPGA has been reported until now. This paper proposes an algorithm for a hybrid FPGA calculator of the trigonometric sine and cosine functions.

#### 3. Proposed Algorithm

We present an improved and expanded version of the hybrid CORDIC algorithm [11, 14], as follows.

Let be the given number of bits of the CORDIC, which determines the absolute error for calculating sine and cosine functions. We assume that the input angle is in the range (the range of values will be discussed below) and is represented as where are the coefficients of the binary representation of an angle.

To construct the sine-cosine generator, we use three elements: LUT, simple scaling-free CORDIC stages, and output multipliers. Accordingly, the input angle is divided into three groups, which are processed sequentially using LUT, scaling-free CORDIC, and output multipliers.
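The division of the input angle into three groups can be sketched as a simple bit-field split. This is only an illustration: the function name `split_angle` is hypothetical, and the default widths (4 bits for the LUT, 4 bits for the scaling-free CORDIC stages, and the remaining 8 bits for the multipliers) are assumptions chosen to match the 4-bit LUT input and four CORDIC stages reported in Section 4 for a 16-bit design.

```python
def split_angle(word, n_bits=16, lut_bits=4, cordic_bits=4):
    """Split an n_bits-wide fractional angle word into three bit groups.

    Returns (msb, mid, lsb): the groups intended for the LUT, the
    scaling-free CORDIC stages, and the output multipliers.
    The widths are illustrative assumptions, not values from the paper.
    """
    mult_bits = n_bits - lut_bits - cordic_bits
    msb = word >> (n_bits - lut_bits)                      # most significant group
    mid = (word >> mult_bits) & ((1 << cordic_bits) - 1)   # middle group
    lsb = word & ((1 << mult_bits) - 1)                    # least significant group
    return msb, mid, lsb
```

For example, `split_angle(0xA5C3)` returns `(0xA, 0x5, 0xC3)`, and concatenating the three fields restores the original word.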

Therefore

We must set the numbers and to determine the size of each of these construction elements. By using and , the number of bits of each group can be written as;;.

To do this, we first consider the possibility of using the CORDIC method in this computational scheme. The scale factor for conventional CORDIC is and for the scaling-free CORDIC it is on every iteration. An angle is also updated by the values and accordingly [11, 14]. For a given value, the influence of the scale factor on the results of calculations for the conventional CORDIC is canceled when, beginning from [1, 4], and for the scaling-free CORDIC it is canceled when, beginning from [11]. This is because the scale factor becomes equal to 1.

Similarly, the influence of the angle for a conventional CORDIC also stops when, beginning from [5], and for the scaling-free CORDIC when, beginning from [11]. In these cases and are equal.

Hence, we determine that and. Thus, the number of bits for the first group (most significant bits) is defined as, that for the second (middle) group is, and that for the third group (least significant bits) is.
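The observation that the scale factor becomes indistinguishable from 1 after some iteration can be checked numerically. The sketch below is an assumption-laden illustration, not the paper's derivation: it uses the textbook per-iteration gain of conventional CORDIC, a second-order scaling-free micro-rotation (cosine approximated by 1 − 2^(−(2i+1)), sine by 2^(−i)), and a threshold of 2^(−N) on the deviation from 1. The helper names are hypothetical.

```python
import math

def conventional_scale(i):
    """Per-iteration gain of a conventional CORDIC micro-rotation."""
    return math.sqrt(1.0 + 2.0 ** (-2 * i))

def scaling_free_scale(i):
    """Gain of an assumed second-order scaling-free micro-rotation."""
    c = 1.0 - 2.0 ** -(2 * i + 1)   # approximation of cos(2^-i)
    s = 2.0 ** -i                   # approximation of sin(2^-i)
    return math.hypot(c, s)

def first_unit_scale_iteration(n_bits, scale):
    """Smallest i for which the gain differs from 1 by less than 2^-n_bits."""
    eps = 2.0 ** -n_bits
    i = 0
    while abs(scale(i) - 1.0) >= eps:
        i += 1
    return i
```

Under these assumptions and for 16 bits, the conventional gain falls below the threshold from iteration 8 onward and the scaling-free gain from iteration 4 onward; the exact bounds used in the paper may be defined differently.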

*Example 1.* Assume that, ; and.

Examples of the distribution of iterations between LUT, scaling-free (SF) CORDIC, and multipliers for different numbers of digits are shown in Table 1.

*Step 1*. Preprocessing (or range reduction). The calculation of sine and cosine for any angle in the range can be reduced to the calculation of sine and cosine for an angle in the first octant (from 0 to π/4). Such a reduction of the range is permitted by the rule of 8-point symmetry of the circle. Methods for transforming the input angle to the reduced angle are well known (for example, [8, 10]), so we do not discuss them here.

*Step 2*. For the angle is divided into the following three angles:

*Step 3*. Value is selected from LUT:

and then the following iterations are made for:

For

In vector-matrix form, step 3 is as follows:

*Step 4*. Performing the multiplication on the residual angle
or in the unfolded form
As these equations show, the final multiplication can be realized with the shift-add operations typical of CORDIC.
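Steps 3 and 4 can be modeled in floating point as follows. This is a sketch under stated assumptions rather than the authors' exact equations: the bit split (4 + 4 + 8 of 16), the second-order scaling-free micro-rotation, the first-order residual rotation (cosine taken as 1, sine taken as the residual angle), and the names `LUT` and `hybrid_sin_cos` are all illustrative choices.

```python
import math

N_BITS, LUT_BITS, CORDIC_BITS = 16, 4, 4   # assumed split, not from the paper

# Step 3 (first part): table of (cos, sin) for the most significant bit group.
LUT = [(math.cos(k * 2.0 ** -LUT_BITS), math.sin(k * 2.0 ** -LUT_BITS))
       for k in range(1 << LUT_BITS)]

def hybrid_sin_cos(word):
    """Model of LUT + scaling-free CORDIC + final multiplication.

    word is an unsigned N_BITS-bit fraction; the angle is word * 2^-N_BITS rad.
    """
    mult_bits = N_BITS - LUT_BITS - CORDIC_BITS
    x, y = LUT[word >> (N_BITS - LUT_BITS)]
    # Step 3 (second part): one scaling-free micro-rotation per set middle bit.
    for i in range(LUT_BITS + 1, LUT_BITS + CORDIC_BITS + 1):
        if (word >> (N_BITS - i)) & 1:
            c = 1.0 - 2.0 ** -(2 * i + 1)   # shift and subtract in hardware
            s = 2.0 ** -i                   # shift in hardware
            x, y = c * x - s * y, s * x + c * y
    # Step 4: rotate by the residual angle r using cos r ~ 1 and sin r ~ r.
    r = (word & ((1 << mult_bits) - 1)) * 2.0 ** -N_BITS
    x, y = x - y * r, y + x * r
    return y, x                             # (sin, cos)
```

In this model no rotation-direction decision is needed: each bit of the angle directly enables or skips its stage, and the approximation error stays on the order of 2^(−16) for first-octant inputs with the assumed split.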

*Step 5*. Postprocessing. The computed sine and cosine of the reduced angle must be converted to the sine and cosine of the original input angle by the rule of 8-point symmetry of the circle (everything reduces to the appropriate definition of sign and ).
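The pre- and postprocessing of Steps 1 and 5 can be sketched together. The paper defers the details to [8, 10], so the following is one common way to apply the 8-point symmetry, not necessarily the one used by the authors; the function names and the `(quadrant, swap)` flag encoding are hypothetical.

```python
import math

def reduce_to_first_octant(phi):
    """Step 1 sketch: map any angle to theta in [0, pi/4] plus undo flags."""
    half_pi = math.pi / 2.0
    phi = math.fmod(phi, 2.0 * math.pi)
    if phi < 0.0:
        phi += 2.0 * math.pi
    q = int(phi // half_pi)
    r = min(max(phi - q * half_pi, 0.0), half_pi)   # offset inside the quadrant
    swap = r > math.pi / 4.0                        # second half of the quadrant
    theta = half_pi - r if swap else r
    return theta, q % 4, swap

def restore_from_first_octant(sin_t, cos_t, quadrant, swap):
    """Step 5 sketch: recover sin/cos of the original angle from the flags."""
    if swap:                      # sin(pi/2 - t) = cos t, cos(pi/2 - t) = sin t
        sin_t, cos_t = cos_t, sin_t
    if quadrant == 0:
        return sin_t, cos_t
    if quadrant == 1:
        return cos_t, -sin_t
    if quadrant == 2:
        return -sin_t, -cos_t
    return -cos_t, sin_t
```

Only an exchange of the two outputs and sign changes are needed after the core computation, so the postprocessing adds no arithmetic beyond negation.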

#### 4. FPGA Implementation Results

To show the effectiveness of the proposed algorithm, we implemented steps 3 and 4 of the algorithm on an Altera Stratix III FPGA (EP3SL340F1517C2) using Quartus II version 9.0.

Figure 1 shows the architecture of the algorithm implemented on the FPGA. In this figure, LUT and CORDIC blocks realize step 3 in the algorithm, and the block labeled with “Shift Add” realizes step 4. To achieve high-throughput computation, the architecture is pipelined with 9 stages: 1 stage for the LUT block, 4 stages for the CORDIC block, and 4 stages for the Shift Add block.

The LUT block is implemented by ROM with a 4-bit input and two 16-bit outputs. In the CORDIC block, each iteration of the algorithm from to is implemented as a pipeline stage by logic elements, and it has two 20-bit outputs. The Shift Add block is implemented by dedicated DSP blocks on the FPGA, which have 4 pipeline stages. The outputs of the Shift Add block have 28 bits, and they are rounded to 16 bits.

In the best-known FPGA implementation of the conventional CORDIC algorithm for high throughput, all iterations of the algorithm from to are unrolled, and each of them is implemented as a pipeline stage. Thus, such an implementation incurs a long latency and requires many logic elements. In contrast, our implementation of the hybrid CORDIC requires fewer logic elements and has a shorter latency because it has fewer pipeline stages.

Table 2 shows the FPGA implementation results of the conventional CORDIC algorithm and our hybrid CORDIC algorithm. As shown in this table, our hybrid algorithm is much more efficient than the conventional one in terms of hardware cost. Compared to the conventional CORDIC, the proposed algorithm has approximately the same bandwidth. However, there is a considerable reduction in latency, as well as in the number of logic elements. Moreover, the number of pipeline stages is also reduced. Therefore, the proposed method is easier to implement and uses fewer logic elements, while it has similar bandwidth. Although our hybrid algorithm requires not only logic elements but also DSP blocks, this is efficient for modern FPGAs, because most modern FPGAs have DSP blocks and memory blocks, and a balanced use of them together with logic elements is more efficient. Therefore, we can conclude that our hybrid algorithm is suitable for modern FPGA implementation.

#### 5. Concluding Remarks

This paper describes the theoretical basis and a practical pipelined FPGA implementation of a new hybrid scaling-free CORDIC algorithm. The logical combination of three construction elements of modern FPGAs (LUT, simple scaling-free CORDIC stages, and multipliers) allows a considerable improvement in the efficiency of calculating sine and cosine functions without loss of accuracy, compared to conventional CORDIC algorithms.

The proposed method allows the reduction of the FPGA hardware resources by more than 30%, while providing the same throughput and accuracy of calculations.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.