Abstract

Coordinate rotation digital computer (CORDIC) is an efficient algorithm for computing trigonometric functions. Scaling-free CORDIC is a well-known CORDIC variant with advantages in speed and area. In this paper, a novel direct digital frequency synthesizer (DDFS) based on scaling-free CORDIC is presented. The proposed multiplier-less architecture, with a small ROM and a pipelined data path, offers high data rate, high precision, and low hardware cost. The design procedure, together with performance and hardware analysis for optimization, is also given. The design is verified by Matlab simulations and then implemented in Verilog on a field-programmable gate array (FPGA). The spurious-free dynamic range (SFDR) is over 86.85 dBc, and the signal-to-noise ratio (SNR) is more than 81.12 dB. The scaling-free CORDIC-based architecture is suitable for VLSI implementations of DDFS in terms of hardware cost, power consumption, SNR, and SFDR. The proposed DDFS is well suited to medical instruments and body care area network systems.

1. Introduction

The direct digital frequency synthesizer (DDFS) has been widely used in modern communication systems. DDFS is preferable to the classical phase-locked-loop (PLL) based synthesizer in terms of switching speed, frequency resolution, and phase noise, which benefits high-performance communication systems. Figure 1 depicts the conventional DDFS architecture [1], which consists of a phase accumulator, a sine/cosine generator, a digital-to-analog converter (DAC), and a low-pass filter (LPF). Two inputs are used, the reference clock and the frequency control word (FCW); the phase accumulator integrates FCW to produce an angle in the interval $[0, 2\pi)$, and the sine/cosine generator computes the corresponding sinusoidal values. In practice, the sine/cosine generator is implemented digitally and is therefore followed by digital-to-analog conversion and low-pass filtering to produce analog outputs. Such systems can be applied in many fields, especially in industrial, biological, and medical applications [2–4].

The simplest way to implement the sine/cosine generator is to use a ROM lookup table (LUT). However, a large ROM is needed [5]. Several efficient compression techniques have been proposed to reduce the lookup table size [5–10]. The quadrant compression technique reduces the ROM size by 75% [6]. The Sunderland architecture splits the ROM into two smaller memories [7], and the Nicholas architecture improves on the Sunderland architecture to achieve a higher ROM-compression ratio (32:1) [8]. The ROM size can be further reduced by using polynomial approximations [11–18] or the CORDIC algorithm [19–27]. In polynomial approximation-based DDFSs, the phase interval is divided into subintervals, and the sine/cosine functions are evaluated in each subinterval. Such a DDFS requires a ROM to store the polynomial coefficients, as well as polynomial evaluation hardware with multipliers. In the circular mode of CORDIC, an iterative algorithm for computing sine/cosine functions, an initial vector is rotated through a predetermined sequence of subangles such that the sum of the rotations approaches the desired angle [28, 29]. CORDIC has been widely used for the sine/cosine generator of DDFS [19–27]. Compared with the lookup table-based DDFS, the CORDIC-based DDFS avoids the exponential growth of hardware complexity as the output word size increases [30–33].

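In Figure 1, the word length of the phase accumulator is $N$ bits; thus, the period of the output signal is

$$T_o = \frac{2^N T_s}{\mathrm{FCW}},$$

where FCW is the phase increment and $T_s$ denotes the sampling period. The output frequency can therefore be written as

$$F_o = \frac{1}{T_o} = \frac{\mathrm{FCW}}{2^N} F_s, \qquad F_s = \frac{1}{T_s}.$$

According to the equation above, the minimum change of output frequency, obtained when FCW changes by one, is

$$\Delta F_{o,\min} = \frac{(\mathrm{FCW} + 1) - \mathrm{FCW}}{2^N} F_s = \frac{F_s}{2^N}.$$

Thus, the frequency resolution of DDFS depends on the word length $N$ of the phase accumulator: for a fixed clock rate, each additional accumulator bit halves the frequency step.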
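As a numerical illustration of the output frequency and frequency resolution discussed above, the following Python sketch assumes the standard DDFS relation $F_o = \mathrm{FCW} \cdot F_s / 2^N$. The function names are hypothetical; the 32-bit accumulator and the 500 MHz clock are the values quoted in Sections 3.1 and 4, and the example FCW is an arbitrary choice.

```python
def ddfs_output_frequency(fcw: int, f_clk_hz: float, n_bits: int) -> float:
    """Output frequency F_o = FCW * F_s / 2**N of an N-bit phase accumulator."""
    return fcw * f_clk_hz / (1 << n_bits)


def ddfs_frequency_resolution(f_clk_hz: float, n_bits: int) -> float:
    """Smallest output-frequency step, obtained when FCW changes by one."""
    return f_clk_hz / (1 << n_bits)


if __name__ == "__main__":
    f_clk = 500e6   # clock rate quoted in Section 4
    n = 32          # accumulator width quoted in Section 3.1
    print(ddfs_frequency_resolution(f_clk, n))        # about 0.116 Hz
    print(ddfs_output_frequency(1 << 30, f_clk, n))   # FCW = 2**30 gives F_s / 4
```

With these values the frequency step is roughly a tenth of a hertz, which illustrates why a wide accumulator is attractive even when the output word is only 16 bits.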

The bandwidth of DDFS is defined as the difference between the highest and the lowest output frequencies. The highest frequency is determined by either the maximum clock rate or the speed of the logic circuitry; the lowest frequency depends on FCW. Spurious-free dynamic range (SFDR) is defined as the ratio of the amplitude of the desired frequency component to that of the largest undesired one at the output of DDFS, and it is often expressed in dBc as

$$\mathrm{SFDR} = 20 \log_{10}\!\left(\frac{A_p}{A_s}\right),$$

where $A_p$ is the amplitude of the desired frequency component and $A_s$ is the amplitude of the largest undesired one.
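The SFDR definition can be made concrete with a few lines of Python. This is only a sketch of the amplitude-ratio formula; the function names are hypothetical, and no spectrum estimation is attempted.

```python
import math


def sfdr_dbc(a_desired: float, a_spur: float) -> float:
    """SFDR in dBc from the desired amplitude and the largest spur amplitude."""
    return 20.0 * math.log10(a_desired / a_spur)


def amplitude_ratio(sfdr_db: float) -> float:
    """Inverse mapping: amplitude ratio corresponding to an SFDR in dBc."""
    return 10.0 ** (sfdr_db / 20.0)


if __name__ == "__main__":
    print(sfdr_dbc(1.0, 1.0e-3))      # a spur 1000 times smaller is about 60 dBc down
    print(amplitude_ratio(86.85))     # roughly 2.2e4 for the SFDR reported in this paper
```

Read this way, an SFDR of 86.85 dBc means that the largest spur is about 22,000 times smaller in amplitude than the desired tone.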

In this paper, a novel DDFS architecture based on the scaling-free CORDIC algorithm [34] with ROM mapping is presented. The rest of the paper is organized as follows. In Section 2, CORDIC is reviewed briefly. In Section 3, the proposed DDFS architecture is presented. In Section 4, the hardware implementation of DDFS is given. Conclusions are given in Section 5.

2. The CORDIC Algorithm

CORDIC is an efficient algorithm that evaluates various elementary functions, including the sine and cosine functions. Because its hardware implementation requires only simple adders and shifters, CORDIC has been widely used in high-speed applications.

2.1. The CORDIC Algorithm in the Circular Coordinate System

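A rotation of angle $\theta$ in the circular coordinate system can be obtained by performing a sequence of micro-rotations iteratively. Specifically, a vector can be successively rotated through a sequence of predetermined step angles $\alpha(i) = \tan^{-1}(2^{-i})$. This methodology can be applied to generate various elementary functions using only simple adders and shifters. The conventional CORDIC algorithm in the circular coordinate system is as follows [28, 29]:

$$\begin{aligned} x(i+1) &= x(i) - \sigma_i 2^{-i} y(i), \\ y(i+1) &= y(i) + \sigma_i 2^{-i} x(i), \\ z(i+1) &= z(i) - \sigma_i \alpha(i), \end{aligned}$$

where $\sigma_i \in \{-1, +1\}$ denotes the direction of the $i$th micro-rotation, with $\sigma_i = \operatorname{sign}(z(i))$ and $z(i) \to 0$ in the vector rotation mode [34], and $\sigma_i = -\operatorname{sign}(x(i)) \operatorname{sign}(y(i))$ with $y(i) \to 0$ in the angle accumulated mode [34]; the corresponding scale factor $k(i)$ is equal to $\sqrt{1 + 2^{-2i}}$, and $i = 0, 1, \ldots, N-1$. The product of the scale factors after $N$ micro-rotations is given by

$$K = \prod_{i=0}^{N-1} k(i) = \prod_{i=0}^{N-1} \sqrt{1 + 2^{-2i}}.$$

In the vector rotation mode, $\sin\theta$ and $\cos\theta$ can be obtained with the initial value $(x(0), y(0), z(0)) = (1/K, 0, \theta)$. More specifically, the outputs $x(N)$ and $y(N)$ are computed from the initial value $(x(0), y(0)) = (1/K, 0)$ as follows:

$$\begin{bmatrix} x(N) \\ y(N) \end{bmatrix} = K \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} 1/K \\ 0 \end{bmatrix} = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}.$$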
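The vector rotation mode described above can be sketched in floating-point Python as follows. This is a behavioral illustration rather than the hardware data path: the function name and the default of 16 iterations are assumptions, and the multiplications by powers of two stand in for the shifts that a hardware implementation would use.

```python
import math


def cordic_rotate(theta: float, n_iter: int = 16):
    """Return (cos(theta), sin(theta)) with conventional circular CORDIC.

    Valid for |theta| up to about 1.74 rad, the sum of all step angles.
    """
    # Product of the scale factors; the start vector is pre-scaled by 1/K.
    k = 1.0
    for i in range(n_iter):
        k *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / k, 0.0, theta
    for i in range(n_iter):
        sigma = 1.0 if z >= 0.0 else -1.0        # drive the residual angle z to zero
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * math.atan(2.0 ** -i)        # step angle alpha(i)
    return x, y


if __name__ == "__main__":
    c, s = cordic_rotate(math.pi / 6.0)
    print(c, s)   # close to cos(pi/6) and sin(pi/6)
```

Every micro-rotation is always executed, in one direction or the other, which is why the scale factor $K$ is a constant that can be folded into the initial vector.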

2.2. Scaling-Free CORDIC Algorithm in the Circular Coordinate System

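Based on the following approximations of the sine and cosine functions for a small step angle $2^{-i}$:

$$\sin(2^{-i}) \cong 2^{-i}, \qquad \cos(2^{-i}) \cong 1 - 2^{-(2i+1)},$$

the scaling-free CORDIC algorithm is obtained by using (6), (7), and the above. The iterative rotation is as follows:

$$\begin{bmatrix} x(i+1) \\ y(i+1) \end{bmatrix} = \begin{bmatrix} 1 - 2^{-(2i+1)} & -2^{-i} \\ 2^{-i} & 1 - 2^{-(2i+1)} \end{bmatrix} \begin{bmatrix} x(i) \\ y(i) \end{bmatrix}, \qquad z(i+1) = z(i) - 2^{-i}.$$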

For a word length of $N$ bits, the implementation of the scaling-free CORDIC algorithm uses four shifters and four adders for each of the earlier micro-rotations; each of the later micro-rotations needs two fewer shifters and two fewer adders [24, 34, 35].
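The name "scaling-free" can be motivated by a short calculation, sketched here under the assumption that a micro-rotation by $2^{-i}$ radian uses the approximations $\sin(2^{-i}) \cong 2^{-i}$ and $\cos(2^{-i}) \cong 1 - 2^{-(2i+1)}$. The squared gain of one such micro-rotation is

$$\left(1 - 2^{-(2i+1)}\right)^2 + \left(2^{-i}\right)^2 = 1 - 2^{-2i} + 2^{-(4i+2)} + 2^{-2i} = 1 + 2^{-(4i+2)},$$

whereas a conventional CORDIC micro-rotation has squared gain $1 + 2^{-2i}$. For $i = 4$, for example, the two values are $1 + 2^{-18}$ and $1 + 2^{-8}$, so the scaling-free form leaves the vector length practically unchanged and no scale-factor compensation is required.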

3. Design and Optimization of the Scaling-Free CORDIC-Based DDFS Architecture

In this section, the architecture of the proposed DDFS is presented together with its performance analysis. It combines the scaling-free CORDIC algorithm with an LUT; this hybrid approach takes advantage of both CORDIC and LUT to achieve high precision and high data rate, respectively. The proposed DDFS architecture consists of a phase accumulator, a radian converter, a sine/cosine generator, and an output stage.

3.1. Phase Accumulator

Figure 2 shows the phase accumulator, which consists of a 32-bit adder that recursively accumulates the phase angle by FCW. At time $n$, the output of the phase accumulator is $\phi(n)$, and the sine/cosine generator produces $\sin\phi(n)$ and $\cos\phi(n)$. The load control signal loads FCW into the register, and the reset signal initializes the content of the phase accumulator to zero.
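The behavior of the phase accumulator can be modeled in a few lines of Python. This is a behavioral sketch only; the class and method names are hypothetical and merely mirror the load and reset signals described above, with wrap-around modulo $2^{32}$ standing in for the overflow of the 32-bit adder.

```python
class PhaseAccumulator:
    """Behavioral model of an N-bit phase accumulator with load and reset."""

    def __init__(self, width: int = 32):
        self.mask = (1 << width) - 1
        self.fcw = 0
        self.phase = 0

    def load(self, fcw: int) -> None:
        """Load the frequency control word into the FCW register."""
        self.fcw = fcw & self.mask

    def reset(self) -> None:
        """Initialize the accumulated phase to zero."""
        self.phase = 0

    def clock(self) -> int:
        """One clock edge: add FCW to the phase, wrapping at 2**width."""
        self.phase = (self.phase + self.fcw) & self.mask
        return self.phase


if __name__ == "__main__":
    acc = PhaseAccumulator()
    acc.load(1 << 30)                        # a quarter turn per clock
    print([acc.clock() for _ in range(4)])   # wraps back to 0 on the fourth clock
```

The wrap-around is what makes the output periodic: with FCW equal to $2^{30}$, the phase returns to zero every four clocks.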

3.2. Radian Converter

To convert the output of the phase accumulator into its binary representation in radians, the following strategy has been adopted. An efficient ROM reduction scheme based on the symmetry of the sinusoidal wave reconstructs the full wave from its first-quadrant part using simple logic operations. The first two MSBs of an angle indicate its quadrant in the circular coordinate system, and the third MSB indicates which half of the quadrant it lies in; thus, the first three MSBs of an angle control the interchange/negation operations in the output stage. As shown in Figure 3, the angles $\phi$ in the second, third, and fourth quadrants can be mapped into the first quadrant by setting the first two MSBs to zero. The radian value $\theta$ of the mapped angle is then obtained by multiplying it by a fixed constant proportional to $\pi$, which can be implemented with the simple shifter-and-adder array shown in Figure 4. Note that the third MSB of any angle in the upper half of a quadrant is 1, and the sine/cosine of an angle in the upper half of a quadrant can be obtained from the corresponding angle in the lower half, as shown in Figure 5. More specifically, since $\sin\theta = \cos(\pi/2 - \theta)$ and $\cos\theta = \sin(\pi/2 - \theta)$, the normalized angle is obtained by replacing $\theta$ with $\pi/2 - \theta$ when the third MSB is 1. When the third MSB is 0, no replacement is needed because $\theta$ already lies in the lower half of the quadrant, $0 \le \theta < \pi/4$.
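The folding described above can be sketched in Python as follows. The sketch assumes that the phase word is an unsigned fraction of a full turn, so that a `width`-bit value $p$ represents the angle $2\pi p / 2^{\text{width}}$; the function name is hypothetical, and the floating-point multiplication stands in for the shifter-and-adder array of Figure 4.

```python
import math


def normalize_phase(phase: int, width: int = 32):
    """Split a phase word into (octant, theta) with 0 <= theta <= pi/4."""
    octant = phase >> (width - 3)                   # first three MSBs
    frac = phase & ((1 << (width - 2)) - 1)         # clear the first two MSBs
    theta = frac * (2.0 * math.pi / (1 << width))   # radians in [0, pi/2)
    if octant & 1:                                  # third MSB set: upper half
        theta = math.pi / 2.0 - theta               # fold into the lower half
    return octant, theta


if __name__ == "__main__":
    print(normalize_phase(1 << 28))   # octant 0, theta = pi/8
    print(normalize_phase(3 << 29))   # octant 3, theta = pi/4
```

The returned octant carries exactly the three control bits that the output stage needs to undo the folding.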

3.3. Sine/Cosine Generator

As the core of the DDFS architecture, the sine/cosine generator produces sinusoidal waves based on the output of the radian converter. Without loss of generality, let the output resolution be 16 bits. The sine/cosine generator then consists of a cascade of 16 processors, each of which performs a sub-rotation by a fixed angle of $2^{-i}$ radian as follows:

$$\begin{aligned} x(i+1) &= \left(1 - \sigma_i 2^{-(2i+1)}\right) x(i) - \sigma_i 2^{-i} y(i), \\ y(i+1) &= \left(1 - \sigma_i 2^{-(2i+1)}\right) y(i) + \sigma_i 2^{-i} x(i), \end{aligned}$$

where, for the $i$th processor, $\sigma_i \in \{1, 0\}$ represents a positive or zero sub-rotation, respectively. Figure 6 depicts CORDIC processor-A for the first 7 micro-rotations, which consists of four 16-bit adders and four 16-bit shifters. CORDIC processor-B, with two 16-bit adders and two 16-bit shifters for the last 9 micro-rotations, is shown in Figure 7.
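The two processor types can be illustrated with an integer shift-and-add sketch in Python. Several details are assumptions made for illustration only: the fixed-point format with 16 fractional bits, the stage index running from 1 to 16, the counter-clockwise rotation direction, and the function names `processor_a`, `processor_b`, and `cascade`, none of which are taken from Figures 6 and 7.

```python
import math

FRAC_BITS = 16          # hypothetical fixed-point format: value = integer / 2**16
ONE = 1 << FRAC_BITS


def processor_a(x: int, y: int, i: int):
    """Full micro-rotation by 2**-i rad: four shifts and four add/subtracts."""
    return (x - (x >> (2 * i + 1)) - (y >> i),
            y - (y >> (2 * i + 1)) + (x >> i))


def processor_b(x: int, y: int, i: int):
    """Reduced micro-rotation: two shifts and two add/subtracts."""
    return x - (y >> i), y + (x >> i)


def cascade(sigmas, x0=ONE, y0=0, n_a=7):
    """Apply stages i = 1, 2, ... in order; sigmas[i - 1] in {0, 1} enables stage i."""
    x, y = x0, y0
    for i, sigma in enumerate(sigmas, start=1):
        if sigma:
            x, y = processor_a(x, y, i) if i <= n_a else processor_b(x, y, i)
    return x, y


if __name__ == "__main__":
    sigmas = [0] * 16
    sigmas[4] = sigmas[6] = 1          # stages 5 and 7: theta = 2**-5 + 2**-7 rad
    x, y = cascade(sigmas)
    theta = 2.0 ** -5 + 2.0 ** -7
    print(x / ONE, math.cos(theta))    # fixed-point result next to the reference
    print(y / ONE, math.sin(theta))
```

In this format the term shifted by $2i + 1$ positions vanishes once $2i + 1$ exceeds the number of fractional bits, so from stage 8 onward the two processors produce identical results, which is consistent with the 7 and 9 split quoted above.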

The first $m$ CORDIC stages can be replaced by a simple LUT to shorten the data path, at the cost of hardware complexity that increases exponentially with $m$. Table 1 lists the hardware costs of the 16-bit DDFS with respect to the number of replaced CORDIC stages, where each 16-bit adder, 16-bit shifter, and 1-bit memory cell requires 200 gates, 90 gates, and 1 gate [36], respectively. Figure 8 shows the hardware requirements with respect to the number of replaced CORDIC stages [24]. Figure 9 shows the SFDR and SNR with respect to the number of replaced CORDIC stages [25]. As one can expect from these figures, there is a tradeoff between hardware complexity and performance in the design of DDFS.
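The tradeoff can be explored with a rough gate-count estimate in Python. The per-component gate counts are those quoted above; everything else is an assumption made for illustration, in particular the ROM organization (one 16-bit sine word and one 16-bit cosine word per entry, $2^m$ entries when $m$ stages are replaced) and the hypothetical function names. The sketch covers only the sine/cosine generator and is not meant to reproduce Table 1.

```python
ADDER_GATES = 200    # per 16-bit adder, as quoted in the text
SHIFTER_GATES = 90   # per 16-bit shifter, as quoted in the text
MEMORY_GATES = 1     # per 1-bit memory cell, as quoted in the text


def cordic_stage_gates(adders: int, shifters: int) -> int:
    """Gate count of one CORDIC processor."""
    return adders * ADDER_GATES + shifters * SHIFTER_GATES


def ddfs_gate_estimate(m: int, n_a: int = 7, n_b: int = 9, word: int = 16) -> int:
    """Estimated gates when the first m stages are replaced by a ROM.

    Assumes a ROM of 2**m entries holding one sine and one cosine word each.
    """
    if not 0 <= m <= n_a + n_b:
        raise ValueError("m must lie between 0 and the number of stages")
    a_left = max(n_a - m, 0)                 # processor-A stages still present
    b_left = n_b - max(m - n_a, 0)           # processor-B stages still present
    rom_bits = (2 ** m) * 2 * word if m > 0 else 0
    return (a_left * cordic_stage_gates(4, 4)
            + b_left * cordic_stage_gates(2, 2)
            + rom_bits * MEMORY_GATES)


if __name__ == "__main__":
    for m in range(0, 11):
        print(m, ddfs_gate_estimate(m))
```

Under these assumptions the estimate first falls as the expensive processor-A stages are removed and then rises again once the ROM term, which doubles with every replaced stage, starts to dominate.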

3.4. Output Stage

Figure 10 shows the architecture of the output stage, which maps the computed $\sin\theta$ and $\cos\theta$ of the normalized angle $\theta$ to the desired $\sin\phi$ and $\cos\phi$ of the accumulated phase $\phi$. As mentioned previously, this mapping can be accomplished by simple negation and/or interchange operations. The three control signals xinv, yinv, and swap, derived from the first three MSBs of $\phi$, are shown in Table 2. The signals xinv and yinv control the negation of the outputs, and swap controls the interchange operation.
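One possible encoding of the three control signals is sketched below in Python. It is derived directly from the quadrant and half-quadrant symmetries of the sine and cosine functions and is not copied from Table 2, whose conventions (for example, whether negation precedes the interchange) may differ. The function names are hypothetical.

```python
import math


def output_stage(octant: int, cos_t: float, sin_t: float):
    """Map (cos(theta), sin(theta)) of the folded angle to (cos(phi), sin(phi)).

    octant holds the first three MSBs of the phase: b0 (MSB), b1, b2.
    """
    b0 = (octant >> 2) & 1
    b1 = (octant >> 1) & 1
    b2 = octant & 1
    swap = b1 ^ b2      # interchange the sine and cosine outputs
    xinv = b0 ^ b1      # negate the cosine output (second and third quadrants)
    yinv = b0           # negate the sine output (third and fourth quadrants)
    x, y = (sin_t, cos_t) if swap else (cos_t, sin_t)
    return (-x if xinv else x), (-y if yinv else y)


def fold_and_rebuild(phi: float):
    """Fold phi (0 <= phi < 2*pi) into [0, pi/4], then rebuild (cos, sin)."""
    octant = int(phi // (math.pi / 4.0)) % 8
    psi = phi - (octant >> 1) * (math.pi / 2.0)            # angle within the quadrant
    theta = math.pi / 2.0 - psi if octant & 1 else psi     # fold the upper half
    return output_stage(octant, math.cos(theta), math.sin(theta))


if __name__ == "__main__":
    for k in range(8):
        phi = k * math.pi / 4.0 + 0.3
        print(k, fold_and_rebuild(phi), (math.cos(phi), math.sin(phi)))
```

The demonstration loop prints the rebuilt pair next to the reference values for one angle in each octant, which is a quick way to check any alternative encoding of xinv, yinv, and swap.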

4. Hardware Implementation of the Scaling-Free CORDIC-Based DDFS

In this section, the proposed low-power and high-performance DDFS architecture is presented. Figure 11 depicts the system block diagram; the SFDR of the proposed DDFS architecture is shown in Figure 12. As one can see, the SFDR of the proposed architecture is more than 86.85 dBc.

A platform for architecture development and verification has also been designed and implemented to evaluate the development cost [37–40]. The proposed DDFS architecture has been implemented on the Xilinx FPGA emulation board [41]. The Xilinx Spartan-3 FPGA has been integrated with a microcontroller (MCU) and an interface circuit (USB 2.0) to form the architecture development and verification platform.

Figure 13 depicts the block diagram and circuit board of the architecture development and evaluation platform. The microcontroller reads data and commands from the PC and writes the results back to the PC via the USB 2.0 bus; the Xilinx Spartan-3 FPGA implements the proposed DDFS architecture. The Verilog hardware code is simulated and compiled on the PC with the ModelSim simulation tool [42] and the Xilinx ISE smart compiler [43]. It is noted that the throughput can be improved by using the proposed architecture, while the computation accuracy is the same as that obtained by using the conventional one with the same word length. Thus, the proposed DDFS architecture is able to improve the power consumption and computation speed significantly. Moreover, all the control signals are generated internally on-chip. The proposed DDFS provides both high performance and low hardware cost.

The chip has been synthesized by using the TSMC 0.18 μm 1P6M CMOS cell libraries [44]. The physical circuit has been synthesized by the Astro tool. The circuit has been evaluated by DRC, LVS, and PVS [45]. Figure 14 shows the cell-based design flow.

Figure 15 shows layout view of the proposed scaling-free CORDIC-based DDFS. The core size obtained by the Synopsys design analyzer is . The power consumption obtained by the PrimePower is 0.302 mW with clock rate of 500 MHz at 1.8 V. The tuning latency is 11 clock cycles. All of the control signals are internally generated on-chip. The chip provides both high throughput and low gate count.

5. Conclusion

In this paper, we present a novel DDFS architecture based on the scaling-free CORDIC algorithm with a small ROM and a pipelined data path. Circuit emulation shows that the proposed architecture has the advantages of high precision, high data rate, and simple hardware. For the 16-bit DDFS, the SFDR of the proposed architecture is more than 86.85 dBc. As shown in Table 3, the proposed DDFS is superior to previous works in terms of SFDR, SNR, output resolution, and tuning latency [6, 17, 18, 26, 27]. Owing to its high performance, the proposed DDFS is well suited to medical instruments and body care network systems [46–49]. Written in portable Verilog, the proposed DDFS is a reusable IP that can be implemented in various processes with tradeoffs among performance, area, and power consumption.

Acknowledgment

The National Science Council of Taiwan under Grants NSC100-2628-E-239-002-MY2 and NSC100-2410-H-216-003 supported this work.