Research Article  Open Access
Design and Implementation of Hybrid CORDIC Algorithm Based on Phase Rotation Estimation for NCO
Abstract
The numerically controlled oscillator (NCO) is widely used in radar, digital receivers, and software radio systems. This paper first introduces the traditional CORDIC algorithm. Then, to improve computing speed and save resources, it proposes a hybrid CORDIC algorithm based on phase rotation estimation for the NCO. By estimating the direction of part of the phase rotations, the algorithm eliminates some rotation stages and add-subtract units and thereby decreases delay. The NCO is simulated and implemented with Quartus II and Modelsim. Simulation results indicate an improvement over the traditional CORDIC algorithm in ease of computation, resource utilization, and computing speed/delay while maintaining precision. The algorithm is suitable for high-speed, high-precision digital modulation and demodulation.
1. Introduction
The numerically controlled oscillator (NCO) is an important part of digital down-conversion. It is widely used in radar wireless transceiver systems and software radio systems [1–3]. The main function of the NCO is to produce two mutually orthogonal, discrete-time streams of sine and cosine samples with variable frequency. It has the advantages of high frequency precision and fast response.
The traditional methods for implementing an NCO are the lookup table method and the polynomial expansion method. The data accuracy of the lookup table method depends on the size of the lookup table ROM. The memory size grows exponentially with the phase precision, which enlarges resource consumption and reduces the processing speed of the system. The work in [4] addresses this problem by exploiting odd-even symmetry to map the stored content, which optimizes the storage unit and reduces the storage resources to 12.5%. However, when high precision is required, it still consumes a lot of resources. The polynomial expansion method is a real-time computing method that needs multiplier resources, which restricts the complexity and speed of the hardware. With either method it is hard to trade off speed, accuracy, and resources. The coordinate rotation digital computer (CORDIC) algorithm was proposed to solve this problem. It replaces complex operations with a basic iterative one and is easy to implement in hardware: it requires no hardware multiplier, and all operations are shifts and accumulations, which meets the hardware requirements of modularity and regularity.
With the advent of high-speed broadband receivers, the requirements on data accuracy and processing speed have risen. Against this background, the traditional CORDIC algorithm has some inherent drawbacks, such as a limited coverage angle and too many pipeline stages, which increase resource consumption and limit data processing speed. To address these shortcomings, this paper puts forward an efficient pipelined CORDIC architecture for NCO design.
2. Traditional CORDIC Algorithm
Volder proposed the CORDIC algorithm in 1959, and in 1971 Walther unified the form of the algorithm [5, 6]. Meyer-Baese was the first to implement the algorithm on an FPGA. The CORDIC algorithm has been applied in many fields, such as direct digital frequency synthesizers, the fast Fourier transform, the discrete cosine transform, digital modulators/demodulators, and stream processors [7–10]. The starting point rotates through a sequence of fixed phases and gradually approaches the final point. The rotation vector diagram is shown in Figure 1.
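In Figure 1, it is easy to get
$$x_1 = x_0\cos\theta - y_0\sin\theta, \qquad y_1 = x_0\sin\theta + y_0\cos\theta. \tag{1}$$
From the start to the end position, the rotation can be done in several steps, and each step rotates only by a certain phase $\theta_i$:
$$x_{i+1} = x_i\cos\theta_i - y_i\sin\theta_i, \qquad y_{i+1} = x_i\sin\theta_i + y_i\cos\theta_i. \tag{2}$$
After extracting $\cos\theta_i$, formula (2) can be expressed as follows:
$$x_{i+1} = \cos\theta_i\,(x_i - y_i\tan\theta_i), \qquad y_{i+1} = \cos\theta_i\,(y_i + x_i\tan\theta_i). \tag{3}$$
In order to simplify the hardware implementation, every operation sets the rotation phase to $\theta_i = \arctan 2^{-i}$. The total rotation phase is $\theta = \sum_i \delta_i\theta_i$ with $\delta_i = \pm 1$. So $\tan\theta_i = 2^{-i}$. Formula (3) can be expressed as follows:
$$x_{i+1} = \cos\theta_i\,(x_i - \delta_i\, y_i\, 2^{-i}), \qquad y_{i+1} = \cos\theta_i\,(y_i + \delta_i\, x_i\, 2^{-i}). \tag{4}$$
From formula (4), apart from the coefficient $\cos\theta_i$, the operations are simple shifts and additions.
In the final result, the product of the coefficients $\cos\theta_i$ can be eliminated by multiplying by a known constant $K$. For example, when the number of iterations is 16, $K \approx 0.6073$. $K$ can be expressed as follows:
$$K = \prod_{i=0}^{n-1}\cos\theta_i = \prod_{i=0}^{n-1}\frac{1}{\sqrt{1+2^{-2i}}}. \tag{5}$$
In the phase rotation process, the approximate rotational iterative formula is
$$x_{i+1} = x_i - \delta_i\, y_i\, 2^{-i}, \qquad y_{i+1} = y_i + \delta_i\, x_i\, 2^{-i}. \tag{6}$$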
Parameter $z_i$ is used to judge when the iteration is over: $z_0 = \theta$, $z_{i+1} = z_i - \delta_i\theta_i$. When $z_i \ge 0$, $\delta_i = +1$. When $z_i < 0$, $\delta_i = -1$. If the initial value is $(x_0, y_0) = (K, 0)$, $(x_n, y_n)$ of the $n$th iteration will converge to $(\cos\theta, \sin\theta)$. The phase convergence satisfies the CORDIC convergence theorem [6]. The constant scaling factor is fixed and can be precomputed as long as the precision is determined. After analysis of the calculation accuracy of the traditional CORDIC algorithm, the iteration number and phase precision are expressed as follows: where and the input phase data width is .
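The rotation-mode iteration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: floating-point arithmetic stands in for the fixed-point shift-and-add hardware, the index starts at 0, and the names `cordic_gain` and `cordic_sin_cos` are hypothetical rather than taken from the paper.

```python
import math

def cordic_gain(n):
    """Constant scale factor: product of cos(arctan(2^-i)) for i = 0..n-1."""
    k = 1.0
    for i in range(n):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return k

def cordic_sin_cos(theta, n=16):
    """Rotation-mode CORDIC sketch; theta in radians, roughly |theta| <= 1.74."""
    # Pre-scaling x by the gain removes the need for a final multiplication.
    x, y, z = cordic_gain(n), 0.0, theta
    for i in range(n):
        d = 1.0 if z >= 0 else -1.0          # rotation direction from the sign of z
        # In hardware the factor 2^-i is a right shift by i bits.
        x, y = x - d * y * 2.0 ** (-i), y + d * x * 2.0 ** (-i)
        z -= d * math.atan(2.0 ** (-i))      # residual phase
    return x, y                              # (cos(theta), sin(theta)) approximately
```

With `n = 16` the residual phase is bounded by arctan(2^-15), so both outputs agree with `math.cos` and `math.sin` to roughly four decimal places.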
3. Hybrid CORDIC Algorithm Based on Phase Rotation Estimation
Common implementation structures are the iterative, pipelined, and differential CORDIC algorithms. The iterative structure occupies fewer hardware resources, but its data processing efficiency is low. The pipelined structure occupies more hardware resources but improves throughput. Implementation schemes based on these two structures include parallel pipelines, hybrid rotation CORDIC, the angle recoding method, and so forth [11, 12]. The work in [13] puts forward a way of predicting the rotation direction. That algorithm, applied to error analysis and elimination, has the advantage of high speed, but it does not optimize the hardware structure. Using the structure of the parallel hybrid CORDIC algorithm, the prediction scheme of [14] is more regular and simpler than previous approaches and can reduce the number of iterations by more than 50 percent. However, the judgment of the rotation direction is not optimized, which increases latency and resource usage and therefore affects throughput. The work in [15] puts forward a modified hybrid CORDIC algorithm that improves the precision of the output data, but the method is more complex. Weighing the disadvantages of the above methods against the advantages of the pipelined and iterative structures, this paper simplifies the CORDIC algorithm further. By using a property of the arctangent function, it reduces the number of rotation direction judgments and add-subtract operations.
In this paper, attention is focused mainly on techniques that reduce the number of iterations while keeping latency low. The hybrid CORDIC algorithm based on phase rotation estimation is presented in this section; it can be realized with digit-on-line pipelined CORDIC circuits and a repetitive multiple accumulations architecture.
3.1. Rotation Phase Estimation
Assuming that the input phase length of CORDIC algorithm is and pipeline series is , rotation phase can be represented as follows: where . It is noted that the initial value of is 1, and the reason is that we restrict the rotation angle within the range in the application example of NCO.
With the increase of rotational coefficient , gets close to . When , . Error is . The arctangent function is expanded as a Taylor series: where . The minimum phase value is . In the process of phase rotation, when the error estimate is , error generated by estimated value can be ignored. The range of is
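The arctangent property used here can be checked numerically. The sketch below finds the first index i at which replacing arctan(2^-i) by 2^-i costs less than one unit of phase resolution. It assumes, as an illustration only, that the resolution is 2^-N radians for an N-bit phase word; the function name `first_linear_stage` is hypothetical. The leading Taylor term of the error is 2^(-3i)/3, so the index grows roughly as N/3.

```python
import math

def first_linear_stage(n_bits):
    """Smallest i with |2^-i - arctan(2^-i)| < 2^-n_bits (assumed resolution)."""
    threshold = 2.0 ** (-n_bits)
    i = 0
    while abs(2.0 ** (-i) - math.atan(2.0 ** (-i))) >= threshold:
        i += 1
    return i

# Example: index from which the linear estimate is accurate to 16 bits.
stage_16 = first_linear_stage(16)
```

From that index onward the remaining micro-rotation angles are, to within the assumed resolution, plain binary weights, which is what allows the residual phase to be handled in a single estimated rotation.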
When , . Through (7), pipeline series of CORDIC algorithm is . The less the pipeline series are, the faster the speed is. When , we define the hybrid radix set: After iterating times, the sum of residual rotation phase is , as shown in formula (12):
The actual residual phase is . According to the traditional CORDIC algorithm theory, . So . When the th rotation begins, the new rotation phase is , where the absolute value of is . Thus . When , . When , . Taking it into formula (3),
After the rotation, the residual phase is 0. It shows that and are the output of cosine data and sine data.
3.2. Rotation Function Optimization and Error Analysis
In order to obtain cosine data from the new pipeline process, we put forward unidirectional rotation method to reduce the comparator and choose addition or subtractor. should be expressed firstly. bits input phase needs to iterate times. The results can be expressed as . The residual phase at this time is . is expressed as follows: where or 0. When , . When , is that flips every bit and adds 1:
From formula (15), is unknown. In the hardware implementation, needs to be expressed as follows: where or . Taking all figures of into (15),
Uniting formulas (13) and (17), where and or 0.
The efficient pipelined CORDIC algorithm uses instead of the traditional rotation phase . The last set of rotation phases can be expressed in binary, so the rotation directions are obtained directly from that set. One additional shift-and-add operation reduces the number of rotations. While ensuring phase and data accuracy, it reduces resource consumption and improves operation speed. Finally, with fewer pipeline stages, the constant coefficient is as follows:
At this time set the initial value to . According to the above process, converges to .
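The overall idea of this section can be illustrated with a small floating-point sketch: a few conventional micro-rotations, followed by one final rotation in which tan(z) of the residual phase z is estimated by z itself. This is not the authors' fixed-point design. It assumes the index starts at 1 and the input lies in [0, π/4], as in the NCO example; the name `hybrid_cos_sin` and the default `m = 8` are hypothetical, and the product `y * z` stands in for the shift-and-add network driven by the binary digits of z.

```python
import math

def hybrid_cos_sin(theta, m=8):
    """Hybrid CORDIC sketch for theta in [0, pi/4] (radians)."""
    # Scale factor of the m conventional stages only (i = 1..m).
    k = 1.0
    for i in range(1, m + 1):
        k /= math.sqrt(1.0 + 4.0 ** (-i))
    x, y, z = k, 0.0, theta
    # Conventional part: direction chosen from the sign of the residual phase.
    for i in range(1, m + 1):
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** (-i), y + d * x * 2.0 ** (-i)
        z -= d * math.atan(2.0 ** (-i))
    # Estimated part: one rotation by the whole residual, using tan(z) ~ z
    # and cos(z) ~ 1. In hardware, y*z and x*z would be sums of shifted copies.
    return x - y * z, y + x * z
```

After m conventional stages the residual satisfies |z| ≤ arctan(2^-m), so the error of this final step is on the order of z²/2; with `m = 8` that is below 2^-16 in this idealized floating-point setting.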
The cosine error of this algorithm can be divided into three parts: (1) the quantization error caused by the limited word length, (2) the approximation error caused by the limited phase word length, and (3) the rotation estimation error caused by the phase estimation.
Quantization error is inversely related to word length, and the output word length is set by the number of pipeline stages. The more pipeline stages there are, the lower the quantization error, but more stages consume more resources. It is therefore necessary to trade off the number of pipeline stages against the quantization error according to the data width. Considering hardware consumption, computing speed, and precision, [7] proposes a method for optimizing the data width and the number of pipeline stages. According to [16], the quantization error consists of two parts: the error accumulated in the previous stages and the error introduced in the current stage. It can be expressed as follows: where is the sum of quantization error and is the th phase rotation quantization error with .
When the output data is and , , , , can be expressed as follows:
According to formula (7), when phase length is , phase resolution is . Approximation error produced by limited phase word length can be expressed as follows: where is the actual value and is the error value. is the difference between real phase and approximate phase. In the final rotating phase estimate, rotating phase is instead of . Arc value of can be replaced only by binary values similarly. In formula (16), the generated error can be expressed as follows:
4. The FPGA Design and Implementation of NCO
4.1. High Speed and Precision NCO Structure
This paper adopts the efficient pipelined CORDIC structure for a high-speed, high-precision NCO. Its structure is shown in Figure 2. We take 16-bit control words as an example. First, the inputs are a 16-bit phase control word and a 16-bit frequency control word. Second, the phase accumulator and phase adder output a 16-bit phase value, and the phase map generates the phase for the rotation core. Third, the efficient shift-add pipeline processes the phase data. Finally, according to the previous mapping relation, 16-bit sine and cosine data are generated.
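The front end of this structure, the phase accumulator and phase adder, can be sketched as follows. The function names are hypothetical, and the output-frequency relation f_out = FCW · f_clk / 2^N is the standard NCO relation rather than a formula quoted from the paper.

```python
def phase_accumulator(fcw, pcw=0, width=16, steps=8):
    """Return the first `steps` phase words of an NCO front end.

    fcw: frequency control word, pcw: phase control word (both `width` bits).
    """
    mask = (1 << width) - 1
    acc = 0
    phases = []
    for _ in range(steps):
        phases.append((acc + pcw) & mask)   # phase adder applies the offset
        acc = (acc + fcw) & mask            # accumulator wraps modulo 2^width
    return phases

def output_frequency(fcw, f_clk, width=16):
    """Standard NCO relation: f_out = fcw * f_clk / 2^width."""
    return fcw * f_clk / (1 << width)
```

For instance, with a 16-bit accumulator, a frequency control word of 0x4000 advances the phase by a quarter turn per clock, so the phase sequence repeats every four samples.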
The range of rotation angle value is and approximates to . It does not cover the full phase range. Before a 16-bit phase value is sent into the algorithm, the symmetry of the sine and cosine functions is therefore exploited by examining the highest, second highest, and third highest bits. According to a fixed mapping relation, the highest 3 bits of the 16-bit phase value and the phase can be reduced to and , respectively. The highest bit controls the sign of the sine data. If the bit is 1, the algorithm negates the sine data (inverts every bit and adds 1); otherwise, the sine data are left unchanged. The highest and second highest bits control the sign of the cosine data. If they differ, the algorithm negates the cosine data (inverts every bit and adds 1); otherwise, the cosine data are left unchanged. The second highest and third highest bits control the positions of the cosine and sine data. If they differ, the algorithm exchanges the cosine data and sine data; otherwise, they are left in place.
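The three-bit mapping can be made concrete with a short sketch. It makes several assumptions that the text does not spell out: the lower 13 bits are folded (subtracted from π/4) when the third highest bit is 1, the swap is applied before the sign corrections, and `math.cos`/`math.sin` stand in for the CORDIC core. The name `nco_sample` is hypothetical.

```python
import math

def nco_sample(phase, width=16):
    """Map a `width`-bit phase word (0 .. 2^width - 1 covers 0 .. 2*pi) to (cos, sin)."""
    b_msb = (phase >> (width - 1)) & 1      # highest bit
    b_2nd = (phase >> (width - 2)) & 1      # second highest bit
    b_3rd = (phase >> (width - 3)) & 1      # third highest bit
    frac_bits = width - 3
    r = phase & ((1 << frac_bits) - 1)      # position inside the octant
    if b_3rd:
        r = (1 << frac_bits) - r            # fold odd octants back into [0, pi/4]
    angle = r * (math.pi / 4) / (1 << frac_bits)
    c, s = math.cos(angle), math.sin(angle) # stand-in for the rotation core
    if b_2nd != b_3rd:
        c, s = s, c                         # exchange cosine and sine data
    if b_msb != b_2nd:
        c = -c                              # cosine sign
    if b_msb:
        s = -s                              # sine sign
    return c, s
```

Only angles in [0, π/4] ever reach the rotation core, which is why the core can start its iteration at index 1 and still cover the full circle.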
4.2. Internal Architecture Design and the Major Implementation Steps
According to formula (10), our algorithm needs times for traditional phase rotation and one time for rotation phase estimation. If , the pipeline structure is shown in Figure 3. Each stage needs only three adder-subtractors, two or six phase shift registers, and a phase coefficient memory, and the structure eliminates more than half of the rotation direction judgments and shift operations. To shorten the critical path in the pipelined implementation of traditional CORDIC, the differential CORDIC (DCORDIC) algorithm based on digit-on-line pipelined CORDIC circuits [17] can be used to achieve higher throughput and lower pipeline latency. The DCORDIC algorithm is equivalent to the usual CORDIC in terms of accuracy as well as convergence. The system architecture uses a parallel, pipelined differential CORDIC architecture to reduce latency and improve throughput. Digit-on-line pipelined CORDIC circuits take the place of the continuous phase accumulation in Figure 3.
From what has been discussed above, the major steps of our algorithm are as follows.
Step 1. Phase rotation is limited in the range of .
Step 2. Traditional or differential CORDIC algorithm implements partial phase rotation.
Step 3. Using a relatively simple prediction scheme, we divide original CORDIC rotations into the lower part and the higher part.
Step 4. The differential CORDIC or traditional architecture is used to compute the rotation direction. The lower part is computed by continuous accumulation or by an on-line architecture [18] based on differential CORDIC, and the higher part is predicted by rotation phase estimation.
Step 5. According to phase mapping relationship, the required high precision and high speed cosine data is produced.
4.3. Simulation Results
Table 1 compares the delay of some CORDIC rotation methods. Our proposed algorithm could obtain good performance in delay and resource.

To compare our pipelined CORDIC algorithm fairly with previously proposed methods, we assume that the carry-save adder (CSA) is the universal adder in all algorithms and that fast carry-propagate adders (CPA) are used in the last stage to convert carry-save forms back to the format of the input initial phase value.
In [13], the first iterations use the traditional continuous comparison method, the same as the traditional CORDIC. The delay increases logarithmically with the maximum number of shifts. If the delay of a carry-propagate adder (CPA) is , the latency of iterations increases linearly with the word length and the delay is .
Based on the calculation method above, the traditional CORDIC based on pipeline architecture has the delay of .
Unlike the above methods, our proposed method reduces the number of iterations and simplifies the datapath. The first iterations still adopt the traditional CORDIC algorithm where a delay of is assumed for an bit CPA. The accumulations of the final iteration use the repetitive multiple accumulations architecture [19], which has much higher throughput and less delay than a serial accumulator or a pipelined adder based on carry-save addition. The last iteration increases linearly and the delay is , where is the full-adder number for the accumulations based on the adder-tree architecture.
According to the structure shown in Figure 2, the traditional pipeline structure and the efficient pipeline structure based on rotation phase estimation are each implemented in Verilog. The hardware platform is a Cyclone II series EP2C8Q208C8 chip, and the software platform is Altera's Quartus II. Modelsim 10.0 is used to verify the experimental results by simulation. Firstly, the input frequency control word, phase control word, and clock frequency are set to , , and 100 MHz. The output frequency is 10 MHz. The resource usage of the two structures is compared in Table 2.

The comparison in Table 2 shows that our proposed algorithm clearly reduces resource usage.
The precision of this algorithm is the same as that of the traditional CORDIC algorithm, . The input frequency control word, phase control word, and clock frequency are set to and . The output frequency is . The errors between the theoretical and experimental values are shown in Figures 4 and 5. As shown in Figure 6, the simulation runtime of our proposed algorithm is shorter than that of the traditional CORDIC algorithm.
[Figure 4: (a) sine value error; (b) cosine value error.]
[Figure 5: (a) sine value error; (b) cosine value error.]
Comparing Figures 4 and 5, the error of our proposed algorithm fluctuates more, while the errors of both algorithms are controlled in .
Although our algorithm structure uses fewer logic units, it preserves the accuracy of the cosine data. Figure 7 shows the NCO simulation waveform of the efficient pipeline structure.
It is necessary to determine the effective bits of the phase, the optimum iteration number, and the data width. We repeat the above experiment 200 times, with the random angle restricted to the range from 0 to 45°. The effective bits are obtained for iteration numbers from 5 to 8 and data widths of 15, 16, 18, and 21. The relationship of the effective bit number with the iteration number and data width is shown in Table 3. The data unit is degree.

The algorithm error will be controlled in , when the iteration number is greater than 6. The experimental results show that the effective bit number is 13. By calculating the minimum number of micro-rotations, the effective bit number is found to be generally seven greater than the iteration number. The total quantization error can be calculated with this method.
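An experiment of this kind can be mimicked in a few lines. The sketch below is not a reproduction of Table 3: it works in double-precision floating point and therefore ignores the data width, it assumes the micro-rotation index starts at 1, and the names `hybrid_cos_sin` and `effective_bits` are hypothetical. It draws 200 random angles in [0°, 45°], runs m conventional micro-rotations followed by one estimated rotation, and reports the number of leading fraction bits on which the worst-case output still agrees with the reference.

```python
import math
import random

def hybrid_cos_sin(theta, m):
    """m conventional micro-rotations (i = 1..m) plus one estimated rotation."""
    k = 1.0
    for i in range(1, m + 1):
        k /= math.sqrt(1.0 + 4.0 ** (-i))
    x, y, z = k, 0.0, theta
    for i in range(1, m + 1):
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** (-i), y + d * x * 2.0 ** (-i)
        z -= d * math.atan(2.0 ** (-i))
    return x - y * z, y + x * z              # tan(z) ~ z for the residual phase

def effective_bits(m, trials=200, seed=0):
    """floor(-log2(worst error)) over random angles in [0, pi/4]."""
    rng = random.Random(seed)                # fixed seed keeps the run repeatable
    worst = 2.0 ** -52                       # floor avoids log2(0)
    for _ in range(trials):
        theta = rng.uniform(0.0, math.pi / 4)
        c, s = hybrid_cos_sin(theta, m)
        worst = max(worst, abs(c - math.cos(theta)), abs(s - math.sin(theta)))
    return math.floor(-math.log2(worst))

bits_by_iterations = {m: effective_bits(m) for m in range(5, 9)}
```

In this idealized setting the error of the estimated rotation is on the order of z²/2 with |z| ≤ arctan(2^-m), so the bit count grows by about two per additional conventional iteration until the finite data width, which the sketch leaves out, becomes the limiting factor.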
5. Conclusion
In this paper, a hybrid CORDIC algorithm based on phase rotation estimation is proposed for NCO design. While assuring high-precision output, the efficient CORDIC algorithm eliminates more than half of the rotation direction judgments and shift operations. Its resource consumption, operation speed, and system delay are much better than those of the traditional CORDIC algorithm. It is of practical value for electronic countermeasures. The algorithm has been successfully used in a high-speed broadband ADS-B receiver and shows good performance.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Grant no. 61172159) and the Fundamental Research Funds for the Central Universities (HEUCFT1101).
References
[1] L. Guo, S. Tian, Z. Wang, and J. Luo, “Study of NCO realization in parallel digital down conversion,” Chinese Journal of Scientific Instrument, vol. 33, no. 5, pp. 998–1004, 2012.
[2] Q. Zhang, Y. Luo, S. Chen, and J. Yan, “Design and implementation of NCO based on phase rotation,” Systems Engineering and Electronics, vol. 32, no. 5, pp. 908–911, 2010.
[3] X.N. Yang, Y.C. Lou, and J.L. Xu, Software Radio Technology and Application, Beijing Institute of Technology Press, Beijing, China, 2010.
[4] W.B. Qin, L.Y. Luo, and T.Y. Li, “Study on the efficient technology applied to high precision and high resolution storage in high speed NCO,” Journal of Sichuan University (Engineering Science Edition), vol. 39, no. 1, pp. 156–159, 2007.
[5] J. Volder, “The CORDIC trigonometric computing technique,” IRE Transactions on Electronic Computers, vol. 8, no. 3, pp. 330–334, 1959.
[6] J. Walther, “A unified algorithm for elementary functions,” in Proceedings of the Spring Joint Computer Conference, vol. 38, pp. 379–385, 1971.
[7] S.Q. Wan, W.F. Chen, S.R. Huang, H. Ji, and Z. Yu, “Implementation of a high-speed direct digital frequency synthesizer based on improved CORDIC algorithm,” Chinese Journal of Scientific Instrument, vol. 31, no. 11, pp. 2586–2591, 2010.
[8] S. Y. Park and Y. J. Yu, “Fixed-point analysis and parameter selections of MSR-CORDIC with applications to FFT designs,” IEEE Transactions on Signal Processing, vol. 60, no. 12, pp. 6245–6256, 2012.
[9] S. Aggarwal, P. K. Meher, and K. Khare, “Scale-free hyperbolic CORDIC processor and its application to waveform generation,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 60, no. 2, pp. 314–326, 2013.
[10] H. Huang and L. Xiao, “CORDIC based fast radix-2 DCT algorithm,” IEEE Signal Processing Letters, vol. 20, no. 5, pp. 483–486, 2013.
[11] T. Juang, “Low latency angle recoding methods for the higher bit-width parallel CORDIC rotator implementations,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 55, no. 11, pp. 1139–1143, 2008.
[12] P. K. Meher and S. Y. Park, “CORDIC designs for fixed angle of rotation,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 21, no. 2, pp. 217–228, 2013.
[13] S. Wang, V. Piuri, and E. E. Swartzlander Jr., “Hybrid CORDIC algorithms,” IEEE Transactions on Computers, vol. 46, no. 11, pp. 1202–1207, 1997.
[14] S.F. Hsiao, Y.H. Hu, and T.B. Juang, “A memory-efficient and high-speed sine/cosine generator based on parallel CORDIC rotations,” IEEE Signal Processing Letters, vol. 11, no. 2, pp. 152–155, 2004.
[15] X. Zhang, R. Xin, Q. Wang, and H. Li, “Design of direct digital frequency synthesizer based on improved hybrid CORDIC algorithm,” Acta Electronica Sinica, vol. 36, no. 6, pp. 1144–1148, 2008.
[16] Y. H. Hu, “The quantization effects of the CORDIC algorithm,” IEEE Transactions on Signal Processing, vol. 40, no. 4, pp. 834–844, 1992.
[17] H. Dawid and H. Meyr, “The differential CORDIC algorithm: constant scale factor redundant implementation without correcting iterations,” IEEE Transactions on Computers, vol. 45, no. 3, pp. 307–318, 1996.
[18] M. D. Ercegovac and T. Lang, “Redundant and on-line CORDIC: application to matrix triangularization and SVD,” IEEE Transactions on Computers, vol. 39, no. 6, pp. 725–740, 1990.
[19] P. K. Meher, “New approach to scalable parallel and pipelined realization of repetitive multiple accumulations,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 55, no. 9, pp. 902–906, 2008.
Copyright
Copyright © 2014 Chaozhu Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.