Research Article | Open Access
Reduced Voltage Scaling in Clock Distribution Networks
We propose a novel circuit technique to generate reduced voltage swing (RVS) signals for active power reduction on main buses and clocks. This is achieved without performance degradation, without an extra power supply, and with minimum area overhead. The technique cuts off the discharge path of a net that is swinging low once the net reaches a certain voltage value. It reduces active power on the target net by as much as 33% compared to traditional full swing signaling. The logic 0 voltage value is programmable through control bits. If desired, the reduced-swing mode can also be disabled. The approach assumes that the logic 0 voltage value is always less than the threshold voltage of the nMOS receivers, which eliminates the need for low-to-high voltage translation. The reduced noise margin and the increased leakage of the receiver transistors under this approach are addressed through the selective use of multithreshold voltage (MTV) devices and the programmability of the low voltage value.
Continuous VLSI technology scaling has enabled the integration of millions of transistors on a single chip operating at GHz-range clock frequencies. Besides area (cost) and performance, low power consumption is a critical objective for modern VLSI designs, driven by limited battery lifetime in mobile applications, the growing priority of energy efficiency in data centers, web servers, and supercomputing centers, and the expense of alternative cooling options for personal computers.
Low-voltage swing (LVS) signaling is an effective technique for reducing active power consumption, since active power consumption is proportional to signal voltage swing. Interconnects are responsible for up to 50% of the active power consumption, while up to 90% of interconnect power consumption comes from only 10% of the interconnects, such as clock networks and global signal buses. Developing LVS techniques for these power-hungry interconnects is critical to modern VLSI designs.
Power efficiency is an increasingly critical VLSI design objective. Low-power design for high-performance computers improves energy efficiency and reduces package cost for heat dissipation, while low-power design for mobile applications increases battery lifetime.
Low-voltage swing is an effective technique to reduce dynamic power consumption, especially for clocks, which are among the most active signals in a VLSI circuit and generally consume up to 50% of the total power. Reduced voltage swing clock signals can be applied at the upper level of a clock tree for low power, while clock gates (such as inverters) amplify the signals to full swing upon reaching the sequential elements.
Existing techniques to generate reduced voltage swing signals either require an extra low-voltage power supply or need precise timing for a pulse signal that enables the driver gate. A number of voltage level converters have also been developed to convert a reduced voltage swing signal into a full swing signal.
In general, there are two techniques to reduce clock swing: the first uses dual power supply voltages, and the second uses a single power supply voltage. The first method adds complexity to the overall design and layout. The challenge with the second, single-supply method is the design of the reduced swing buffers. Many papers [1, 4] implemented this method by using pMOS transistors to pass the low logic level and nMOS transistors to pass the high logic level. Such techniques result in poor rise and fall times, which make them impractical for high-performance applications.
In this paper, we propose a Reduced Voltage Swing (RVS) design as an alternative to the traditional Low Voltage Swing (LVS) technique. We elevate the low logic voltage instead of lowering the high logic voltage. We propose an inverter design that generates RVS signals at the cost of one extra transistor, and an extension of the RVS inverter with programmable gates for an adjustable low logic voltage. We achieve (1) minimum area overhead (by not requiring an extra power supply network), (2) minimum performance degradation (by keeping the supply voltage and the high logic voltage), and (3) robustness to process variations (the logic 0 voltage is adaptive to process variations). Simulation results from the HSPICE tool show that the design reduces active power consumption with very limited performance loss.
The rest of the paper is organized as follows. Existing low-voltage swing signaling and clocking schemes are presented in Section 2. Section 3 presents the reduced voltage swing principle and circuits, followed by implementation and simulation results in Section 4. Finally, Section 5 concludes the paper.
2. Existing Low-Voltage Swing Signaling and Clocking Schemes
Existing low-voltage swing circuits possess a number of deficiencies, such as the need for extra supplies, performance impact, differential signaling, and reliability degradation. They typically reduce the supply voltage on the targeted net, which impacts timing significantly. Most papers describing low or reduced voltage swing signals target clock networks or signal nodes with high capacitance in order to reduce power. Zhang et al. surveyed the different options and circuits used to generate small or reduced signal swings. Their paper compares the speed, power, and complexity of the different options and points out the deficiencies of each technique. They also proposed their own scheme, called pseudodifferential interconnect (PDIFF). However, all these LVS signaling techniques require an extra power supply, which adds cost and complexity to the design.
An LVS clocking technique that requires only a single power supply has also been proposed, wherein intermediate clock buffers are turned off once they reach the desired voltage levels. This leaves the clock node essentially floating and susceptible to noise. Subsequent regular clock buffers act as amplifiers which restore the clock signal to full swing. The short circuit power consumption of these amplifier clock buffers is reduced through the use of small, high threshold voltage transistors.
3. Reduced Voltage Swing Principle and Circuits
For clock distribution, the synchronous clock must be distributed all over the chip with the minimum possible skew. The clocking network consumes a significant amount of power, and the clock distribution interconnects and their increased parasitics with scaling increase the power consumption further. Typically, buffers are inserted within the clock network to isolate the downstream capacitance; this reduces the transition times but increases the power consumption substantially.
As stated and used in [6–8], there is a need to reduce the power dissipation of the clock network while maintaining the performance objectives. Power can be reduced by reducing the clock frequency. However, the frequency cannot be changed without significant architectural changes. Alternatively, power can be reduced by reducing the total load capacitance, CL, on all nodes, by reducing VDD, or by reducing Vswing without reducing VDD, which corresponds to a linear reduction in the power dissipation.
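The linear dependence on swing can be made concrete with the standard first-order switching power model, P = alpha * CL * VDD * Vswing * f. The Python sketch below is illustrative only: the function names, the activity factor alpha, and the example voltages are assumptions for this sketch and are not values taken from this paper.

```python
def dynamic_power(alpha, c_load, vdd, v_swing, freq):
    """First-order switching power in watts: alpha * CL * VDD * Vswing * f."""
    return alpha * c_load * vdd * v_swing * freq


def swing_saving(vdd, v_low):
    """Fraction of switching power saved when logic 0 is raised from 0 V to v_low.

    VDD and the logic 1 level stay unchanged, so only Vswing = VDD - v_low shrinks.
    """
    full = dynamic_power(1.0, 1.0, vdd, vdd, 1.0)
    reduced = dynamic_power(1.0, 1.0, vdd, vdd - v_low, 1.0)
    return 1.0 - reduced / full


if __name__ == "__main__":
    # Hypothetical operating point: 1.2 V supply, logic 0 raised to 0.3 V.
    print(f"saving = {swing_saving(1.2, 0.3):.1%}")
```

In this model the saving equals v_low / VDD, so raising logic 0 to a quarter of the supply saves about a quarter of the switching power. Short circuit and leakage power are not modeled.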
Former works reduce Vswing by adding extra supplies to the selected nets. An extra supply not only adds cost and area but also increases the complexity of routing and of switching off two power grids for power savings.
Figure 1 shows the traditional waveforms obtained with the conventional reduced voltage swing approach, together with the waveforms produced by our proposed circuit. The main difference is that with the proposed circuit, the voltage waveform still reaches the full supply voltage (vdd). This eliminates the timing impact; in particular, the rising edge delay (t1) of lclk1 from clk_in (Figure 1) is not impacted in our approach.
Figure 1: (a) Conventional reduced voltage swing waveform. (b) Proposed reduced voltage swing waveform.
The proposed RVS circuit is shown in Figure 2, with both receiver and driver using the same power supply. The figure shows both the traditional and the new circuit with their expected waveforms. In the proposed approach, no extra supply is needed to generate the RVS signal (lclk2). The overhead of adding a transistor to the driver is minimal compared to the total net capacitance, especially when the driver fans out to many receivers, as is the case in a clock distribution network or on a long wire net.
The transistors in our proposed design use a low threshold voltage (LVT) for the driver and a high threshold voltage (HVT) for the receiver. This provides a built-in noise margin on the net lclk2 equal to the difference between the HVT and LVT threshold voltages. In addition, because the receiver transistor is HVT and its drain-to-source voltage is less than vdd, the increase in leakage due to the elevated gate voltage is minimized. The receiver SSTC latch topology is selected because the clock pin drives only an nFET transistor, which eliminates the need for level translation to prevent short circuit current. Another advantage of our proposed design is that lclk2 is always actively driven. If coupling noise pulls the lclk2 net high, the pull-down stack turns on and clears any charge on the net before it reaches the threshold of the receiver HVT FET. This holds because the driver is an LVT device and the receiver is an HVT device. The addition of a series transistor to the final driver slows down the falling edge of the clock, which affects only hold time and not the speed of the circuit (clk-to-q delay). The M1 transistor, which is controlled by power_mode, serves as a system override. If power_mode is set to 1, the RVS circuit behaves the same as the traditional one.
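The built-in noise margin described above can be checked with a small calculation. The following Python sketch is a first-order illustration under stated assumptions: the function names and the threshold values are hypothetical, and it assumes the logic 0 level on lclk2 settles near the LVT threshold voltage.

```python
def built_in_noise_margin(vt_hvt, vt_lvt):
    """Margin on the reduced-swing net: receiver (HVT) minus driver (LVT) threshold."""
    return vt_hvt - vt_lvt


def receiver_stays_off(v_logic0, vt_hvt):
    """True if the elevated logic 0 level is below the nMOS receiver threshold."""
    return v_logic0 < vt_hvt


if __name__ == "__main__":
    # Hypothetical thresholds in volts; not taken from the paper.
    vt_lvt, vt_hvt = 0.25, 0.45
    print(f"margin = {built_in_noise_margin(vt_hvt, vt_lvt):.2f} V")
    print("receiver off at logic 0:", receiver_stays_off(vt_lvt, vt_hvt))
```

A positive margin means the receiver nFET stays off at logic 0, which is the condition that removes the need for a level translator.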
One limitation of our proposed technique, which is shown in Figure 2, is that it limits lclk2 to a fixed swing of vdd-vt (the net swings between vdd and vt), where vt is the threshold voltage of the LVT transistor. We therefore developed another circuit (Figure 3) that makes the logic 0 value programmable through the control bits Cnt. The Programmable Reduced Voltage Swing (PRVS) circuit varies the logic 0 value based on how many bits of the Cnt bus are selected. Each of the Cnt bits corresponds to a W transistor and changes how fast the fdb node can be discharged to vt through the dotted path 1. Both Mp1 and Mkp are minimum size devices that pull up the fdb node, and neither has an impact on the circuit speed.
To summarize, our proposed technique differs from traditional techniques in the following key design metrics.
(1) Area. Traditional LVS signaling requires an extra power supply, and routing such an extra power supply network gives rise to considerable area overhead. RVS signaling does not require an extra power supply and adds only one extra transistor to each inverter.
(2) Power consumption. Active power consumption is proportional to signal voltage swing. As a result, LVS and RVS signaling are equivalent in reducing signal voltage swing and hence active power consumption.
(3) Performance. The low supply voltage and low logic 1 voltage in LVS signaling lead to performance degradation, while in RVS signaling the constant supply voltage and logic 1 voltage do not degrade performance.
(4) Noise margin. The reduced signal voltage swing needs to cover the receiver flip-flop’s metastability point (e.g., 0.5 vdd), and the minimum distance from the metastability point to the input signal voltage swing boundary gives the noise margin.
4. Simulation Results
We compared our proposed RVS inverter with a traditional full voltage swing (FVS) inverter using HSPICE simulation. Results for the two inverters with a 2.00 fF load capacitance under supply voltages of 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, and 1.5 V are shown in Table 1, which gives the average comparison results between the FVS and RVS inverters.
To further verify this approach, we built a clock spine which drives an array of 6 × 32 = 192 flip-flops. HSPICE simulation shows that by replacing the clock buffers with the proposed RVS buffers as shown in Figure 4, we achieve a 37.2% power reduction, while the signal propagation delay from the spine input to a flip-flop is degraded by 8.6% and the clock signal slew rate is degraded by 30.0%. Table 2 gives the comparison results, and Figure 5 shows the RVS waveform and output results.
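One common way to weigh a power saving against a delay penalty is the power-delay product. This metric is not reported in this paper; the Python sketch below is only an illustration that combines the two clock spine figures quoted above, and the function name is an assumption.

```python
def relative_power_delay_product(power_reduction, delay_increase):
    """Ratio of RVS to full-swing power-delay product (below 1.0 is a net gain)."""
    return (1.0 - power_reduction) * (1.0 + delay_increase)


if __name__ == "__main__":
    # Figures quoted above for the 192 flip-flop clock spine.
    ratio = relative_power_delay_product(0.372, 0.086)
    print(f"relative power-delay product = {ratio:.3f}")
```

With a 37.2% power reduction and an 8.6% delay increase, the ratio is about 0.68, so under this simple metric the power saving outweighs the delay penalty. The slew rate degradation is not captured here.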
5. Conclusion
In this paper, we propose Reduced-Voltage-Swing (RVS) signaling as an alternative to traditional Low-Voltage-Swing (LVS) signaling for reduced active power consumption. We achieve minimum area overhead (no extra power supply network to route and a minimum number of extra transistors), equivalent active power reduction, and minimum performance degradation. HSPICE simulation results using Arizona State University technology models with respect to a variety of design parameters (supply voltage, load capacitance, input signal slew rate, etc.) verify the effectiveness of these novel RVS circuits, which save an average of 37.2% dynamic power with an 8.6% increase in clock insertion delay in a clock spine driving 192 flip-flops.
[1] N. Magen, A. Kolodny, U. Weiser, and N. Shamir, “Interconnect-power dissipation in a microprocessor,” in Proceedings of the International Workshop on System Level Interconnect Prediction (SLIP '04), pp. 7–13, February 2004.
[2] S. K. H. Fung, H. T. Huang, S. M. Cheng et al., “65nm CMOS high speed, general purpose and low power transistor technology for high volume foundry application,” in Proceedings of the Digest of Technical Papers Symposium on VLSI Technology, pp. 92–93, June 2004.
[3] F. Haj and M. Sachdev, “A low-power reduced swing global clocking methodology,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 5, pp. 538–545, 2004.
[4] H. Zhang, V. George, and J. M. Rabaey, “Low-swing on-chip signaling techniques: effectiveness and robustness,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 8, no. 3, pp. 264–272, 2000.
[5] “Typical HSPICE model files,” http://ptm.asu.edu/.
[6] P.-F. Lu, L. Sigal, N. Cao, P. Woltgens, R. Robertazzi, and D. Heidel, “A low-voltage swing latch for reduced power dissipation in high-frequency microprocessors,” in Proceedings of the IEEE International SOI Conference, pp. 165–167, October 2004.
[7] A. D. Bailey, J. Di, S. C. Smith, and H. A. Mantooth, “Ultra-low power delay-insensitive circuit design,” in Proceedings of the IEEE Midwest Symposium on Circuits and Systems, pp. 503–506, 2008.
[8] S. Lin, Y.-B. Kim, and F. Lombardi, “A 32nm SRAM design for low power and high stability,” in Proceedings of the IEEE Midwest Symposium on Circuits and Systems, pp. 422–425, 2008.
[9] J. Yuan and C. Svensson, “New single-clock CMOS latches and flipflops with improved speed and power savings,” IEEE Journal of Solid-State Circuits, vol. 32, no. 1, pp. 62–69, 1997.
Copyright © 2009 Khader Mohammad et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.