VLSI Design

Special Issue

CAD for Gigascale SoC Design and Verification Solutions


Research Article | Open Access

Volume 2011 | Article ID 845957 | https://doi.org/10.1155/2011/845957

Yoni Aizik, Avinoam Kolodny, "Finding the Energy Efficient Curve: Gate Sizing for Minimum Power under Delay Constraints", VLSI Design, vol. 2011, Article ID 845957, 13 pages, 2011. https://doi.org/10.1155/2011/845957

Finding the Energy Efficient Curve: Gate Sizing for Minimum Power under Delay Constraints

Academic Editor: Shiyan Hu
Received 23 Sep 2010
Accepted 28 Jan 2011
Published 07 Apr 2011

Abstract

A design scenario examined in this paper assumes that a circuit has been designed initially for high speed, and it is redesigned for low power by downsizing of the gates. In recent years, as power consumption has become a dominant issue, new optimizations of circuits are required for saving energy. This is done by trading off some speed in exchange for reduced power. For each feasible speed, an optimization problem is solved in this paper, finding new sizes for the gates such that the circuit satisfies the speed goal while dissipating minimal power. Energy/delay gain (EDG) is defined as a metric to quantify the most efficient tradeoff. The EDG of the circuit is evaluated for a range of reduced circuit speeds, and the power-optimal gate sizes are compared with the initial sizes. Most of the energy savings occur at the final stages of the circuits, while the largest relative downsizing occurs in middle stages. Typical tapering factors for power efficient circuits are larger than those for speed-optimal circuits. Signal activity and signal probability affect the optimal gate sizes in the combined optimization of speed and power.

1. Introduction

Optimizing a digital circuit for both energy and performance involves a tradeoff, because any implementation of a given algorithm consumes more energy if it is executed faster. The tradeoff between power and speed is influenced by the circuit structure, the logic function, the manufacturing process, and other factors. Traditional design practices tend to overemphasize speed and waste power. In recent years power has become a dominant consideration, causing designers to downsize logic gates in order to reduce power, in exchange for increased delay. However, resizing of gates to save power is often performed in a nonoptimal way, such that for the same energy dissipation, a sizing that results in better performance could be achieved.

In this paper, we explore the energy-performance design space, evaluating the optimal tradeoff between performance and energy by tuning gate sizes in a given circuit. We describe a mathematical method that minimizes the total energy in a combinational CMOS circuit, for a given delay constraint. It is based on an extension of the Logical Effort [1] model to express the dynamic and leakage energy of a path as well as the delay. Starting from the minimum achievable delay, we apply the method for a range of longer delays, in order to find the optimal energy-delay relation for the given circuit. We show that downsizing all gates in a fast circuit by the same factor does not yield an energy-efficient design, and we characterize the differences between gate sizing for high speed and sizing for low power.

In trading off delay for energy, we are interested only in a subset of all the possible downsized circuits: those implementations that are energy efficient. A design implementation is considered to be energy efficient when it has the highest performance among all possible configurations dissipating the same power [2, 3]. When the optimal implementations are plotted in the energy-delay plane, they form a curve called the energy efficient curve. In Figure 1, each point represents a different hardware implementation. The implementations which belong to the energy efficient family reside on the energy efficient curve.

Zyuban and Strenski [3, 4] introduce the hardware intensity metric. Hardware intensity ($\eta$) is defined to be the ratio of the relative increase in energy to the corresponding relative gain in performance achievable "locally" through gate resizing and logic manipulation at a fixed power-supply voltage for a power efficient design. Simply put, it is the ratio of % energy per % speed performance tradeoff for an energy-efficient design. Since speed performance is inversely proportional to delay,

$$\eta = -\frac{1/E}{1/D}\,\frac{\partial E}{\partial D} = -\frac{D}{E}\,\frac{\partial E}{\partial D}, \tag{1}$$

where $D$ is delay, $E$ is the dissipated energy, and $\eta$ represents the hardware intensity. The hardware intensity is a measure of the "differential" energy-performance tradeoff (the energy gained if the delay is relaxed by a small $\Delta D$ around a given delay and energy point on the energy efficient curve), and is actually the sensitivity of the energy to the delay.

As shown in [3], each point on the energy efficient curve corresponds to a different value of the hardware intensity $\eta$. The hardware intensity decreases along the energy efficient curve towards larger delay values. According to [3], $\eta$ is equivalent to the tradeoff parameter $n$ in the commonly used optimization objective function combining energy and delay:

$$F_{\text{opt}} = E \cdot D^{n}, \quad n \geq 0. \tag{2}$$

In [5], Brodersen et al. formalize the tradeoff between energy and delay via sensitivities to tuning parameters. The sensitivity of energy to delay due to tuning the size $W_i$ of gate $i$ is defined as

$$\theta(W_i) = -\frac{1/E}{1/D} \cdot \frac{\partial E/\partial W_i}{\partial D/\partial W_i}, \tag{3}$$

where $\theta(W_i)$ is the sensitivity, $D$ is the delay, $E$ is the energy, $\partial E/\partial W_i$ is the derivative of energy with respect to the size of device $i$, and $\partial D/\partial W_i$ is the derivative of delay with respect to the size of device $i$. To achieve the most energy-efficient design, the energy reduction potentials of all the tuning variables must be the same. Therefore, for an energy efficient design, (3) is equivalent to (1) for all points on the energy efficient curve.

The focus of this paper is on the conversion to low power of circuits that were optimized only for speed during their initial design process. Optimal downsizing is applied to each gate for each relaxed delay target, such that the whole energy efficient curve is generated for the circuit. Note that the gate sizes are allowed to vary in a continuous manner between a minimum and a maximum size. While the resultant gate sizes would be mapped into a finite cell library in a practical design, the continuous result for some basic circuits provides guidelines and observations about CMOS circuit design for low power.

The rest of this paper is organized as follows: The design scenario is described in Section 2. Usage of logical effort to analyze the delay and energy is described in Section 3. The optimization problem is formalized in Section 4. Typical circuit types are analyzed in Section 5. Section 6 concludes the paper.

2. Power Reduction Design Scenario

Typically, an initial circuit is given, where speed was the only design goal. In order to save energy, the delay constraint is relaxed, and the gate sizes are reduced. For example, consider Figure 1, with the initial circuit implementation 0, which is energy efficient. While relaxing the delay constraint (moving from $D_0$ to $D_1$), the design gets downsized, which results in circuit implementation 1.

To calculate the energy gain achievable by relaxing the delay by $X$ percent, we define a metric we call "Energy Delay Gain" (EDG). The EDG is defined as the ratio of relative decrease in energy to the corresponding relative increase in delay, with respect to the initial design point $(D_0, E_0)$. $D_0$ is the initial delay (not necessarily the minimum achievable delay), and $E_0$ is the corresponding initial energy. Note that the EDG defines the total energy-performance tradeoff, as opposed to the differential tradeoff given by the hardware intensity. Mathematically, EDG at a given delay $D$ with corresponding energy $E$ is defined as

$$\mathrm{EDG} = \frac{(E_0 - E)/E_0}{(D - D_0)/D_0}. \tag{4}$$

For example, assuming that the initial design point in Figure 1 is implementation 0, then the EDG of point 1 is

$$\frac{(E_0 - E_1)/E_0}{(D_1 - D_0)/D_0}. \tag{5}$$

Figure 2 illustrates the difference between hardware intensity and EDG. It shows the energy efficient curve of a given circuit, where $D_0$ is the initial delay, and $E_0$ is the corresponding initial energy. The hardware intensity is the ratio between the slope of the tangent to the energy efficient curve at point $(D, E)$ and the slope of the line connecting the origin to point $(D, E)$. The EDG is the ratio between the slope of the line connecting points $(D_0, E_0)$ and $(D, E)$ and the slope of the line connecting the origin to point $(D_0, E_0)$. Note that when point $(D, E)$ is close to $(D_0, E_0)$, the two definitions converge.
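The relation between the two metrics can be checked numerically. The following is a minimal sketch, not taken from the paper: the function names and the sample curve E(D) = 1/D^2 are assumptions chosen only for illustration. It evaluates the EDG of (4) and a finite-difference estimate of the hardware intensity of (1).

```python
def edg(d0, e0, d, e):
    """Energy/delay gain of point (d, e) relative to the initial point (d0, e0), per (4)."""
    return ((e0 - e) / e0) / ((d - d0) / d0)


def hardware_intensity(energy_of_delay, d, step=1e-6):
    """Estimate of (1), eta = -(D/E) * dE/dD, using a central difference."""
    e = energy_of_delay(d)
    slope = (energy_of_delay(d + step) - energy_of_delay(d - step)) / (2 * step)
    return -(d / e) * slope


def sample_curve(d):
    """Hypothetical energy-efficient curve E(D) = 1/D**2 (illustration only)."""
    return 1.0 / d ** 2


# Differential tradeoff at the initial point D0 = 1.
eta_at_start = hardware_intensity(sample_curve, 1.0)
# Total tradeoff for a very small and for a 10% delay relaxation.
edg_small = edg(1.0, sample_curve(1.0), 1.001, sample_curve(1.001))
edg_10pct = edg(1.0, sample_curve(1.0), 1.1, sample_curve(1.1))
```

For this assumed curve the hardware intensity is 2 at every point, the EDG for a very small relaxation is close to 2, and the EDG for a 10% relaxation is lower, which matches the statement that the two definitions converge near the initial point.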

Resizing of the gates to trade off performance against active energy is the most practical approach available to the circuit engineer. Continuous gate sizing has been used for optimizing delay under area constraints and vice versa [6]. Other degrees of freedom include logic restructuring, tuning of threshold voltages or supply voltage, and power gating. Changing the threshold voltage affects mainly the leakage energy, and not the dynamic energy dissipation [7, 8], and the same holds for power gating [9, 10]. Logic restructuring of the circuit could be an effective method to trade off energy and performance, by reducing the load on high activity nets, and by introducing new nodes that have a lower switching activity [11]. However, changing the circuit topology may increase the time required for the design process to converge. Changing the supply voltage is an effective technique as well [3, 5, 7, 11–14]. However, in most cases, changing the supply voltage for a subcircuit requires major changes in the package and in the system and therefore is not practical. For instance, the latest state-of-the-art CPUs include only 1-2 power planes [15, 16].

In the following sections, we set up an optimization framework that maximizes the energy saving for any assumed delay constraint in a given combinational CMOS circuit. It determines the appropriate sizing factor for each gate in the circuit. For the primary inputs and outputs of the circuit we assume fixed capacitances. A given activity factor and signal probability are assumed at each node of the circuit. The result of this optimization process is equivalent to finding the energy-efficient curve for the given circuit.

3. Analytical Model

The optimization problem we solve is defined as follows. Given a path in a circuit with initial delay (minimum or arbitrary) $D_0$ and the corresponding energy consumption $E_0$, find gate sizing that maximizes the EDG for an assumed delay constraint. We use the logical effort method [1] in order to calculate the delay of a path and adapt it to calculate the dynamic and leakage energy dissipation of the circuit.

For a given path (Figure 3), we assume constant input and output loads and an initial sizing that is given as the input capacitance of each gate. For each gate we apply a sizing factor $k$. The input capacitance of the resized $i$th gate is expressed as the initial input capacitance $C_{0,i}$ multiplied by $k_i$. The energy-delay design space is explored by tuning the $k$'s.

The following properties are defined:
$M_i$: number of inputs to gate $i$,
$\mathrm{AF}_i^{j}$: activity factor (switching probability) of input $j$ in gate $i$,
$\mathrm{AF}_i^{o}$: output activity factor of gate $i$,
$g_i$: logical effort of gate $i$,
$p_i$: parasitic delay of gate $i$,
$C_{0,i}$: initial capacitance of gate $i$ that achieves the initial path delay (corresponds to $(D_0, E_0)$),
$C_{\text{off}_i}$: off-path constant capacitance driven by gate $i$,
$P_{\text{leak}_i}$: the average leakage power for gate $i$, for a unit input capacitance,
$k_i$: sizing factor for gate $i$.

The $k$'s are used in the gate downsizing process. For each gate $i$, $k_i \cdot C_{0,i}$ is the gate size. Although specified, $k_1$ is assumed to be 1 (constant driver).

3.1. Energy of a Logic Path
3.1.1. Switching Energy

The switching energy of a static CMOS gate $i$ with $M_i$ inputs and a single output is

$$\text{SwitchingEnergy} = \underbrace{\sum_{j=1}^{M_i} \mathrm{AF}_i^{j} \cdot C_j \cdot V_j^{2}}_{\text{input energy}} + \underbrace{\mathrm{AF}_{\text{out}_i} \cdot C_{\text{out}_i} \cdot V_{\text{out}_i}^{2}}_{\text{output energy}}. \tag{6}$$

Assuming that the voltage amplitude for each net in the design is the same ($V_{cc}$), we can define a parameter called dynamic capacitance ($C_{\text{dyn}}$), which is the switching energy normalized by $V_{cc}^{2}$. The dynamic capacitance of a gate $i$ ($C_{\text{dyn}_i}$) is

$$C_{\text{dyn}_i} = \frac{\text{SwitchingEnergy}}{V_{cc}^{2}} = \sum_{j=1}^{M_i} \mathrm{AF}_i^{j} \cdot C_j + \mathrm{AF}_{\text{out}_i} \cdot C_{\text{out}_i}. \tag{7}$$

Without loss of generality, we assume that the first input of each gate resides on the investigated path. We assume that the inputs of the gates we deal with are symmetrical (input capacitance on each input pin is equal) and the gates are noncompound (i.e., gates implementing functions like $a \cdot b + c$ are out of scope). Our method can be easily extended to support these types. Under these assumptions, all input capacitances of a given gate are identical. Therefore, the input $C_{\text{dyn}}$ of gate $i$ ($C_{\text{dyn}_{\text{in}_i}}$) is

$$C_{\text{dyn}_{\text{in}_i}} = C_{0,i} \cdot k_i \sum_{j=1}^{M_i} \mathrm{AF}_i^{j} = C_{0,i} \cdot k_i \cdot \mathrm{AF}_i, \tag{8}$$

where $\mathrm{AF}_i$ is defined to be $\sum_{j=1}^{M_i} \mathrm{AF}_i^{j}$, the sum of the activity factors of the input pins of gate $i$. Note that unlike calculating the delay of a gate, when calculating the gate energy, all input and output nets of a gate have to be taken into consideration. The $C_{\text{dyn}}$ of the nets not in the desired path should not be overlooked.

The output capacitance of a gate is defined to be its self loading and consists mainly of the drain diffusion capacitors connected to the output. The parasitic delay of gate $i$ in the logical effort method, denoted by $p_i$, is proportional to the diffusion capacitance. The logical effort of gate $i$, denoted by $g_i$, expresses the ratio of the input capacitance of gate $i$ to that of an inverter capable of delivering the same current. It is easy to see that the output capacitance of gate $i$ can be expressed as

$$C_{\text{out}_i} = \frac{C_{\text{in}_i}}{g_i}\, p_i. \tag{9}$$

We can now rewrite (7) using the notation defined previously:

$$C_{\text{dyn}_i} = C_{0,i}\, k_i \cdot \mathrm{AF}_i + \frac{C_{0,i}\, k_i}{g_i}\, p_i \cdot \mathrm{AF}_i^{o}. \tag{10}$$

Besides the gates in the path, we have to take into account the $C_{\text{dyn}}$ of the side loads. Multiplying $C_{\text{off}_i}$ by $\mathrm{AF}_i^{1}$ results in the $C_{\text{dyn}}$ of the off-path load driven by gate $i$. We use (10) to calculate the $C_{\text{dyn}}$ of a desired path:

$$C_{\text{dyn}} = \underbrace{\mathrm{AF}_1 \cdot C_{\text{in}_1}}_{\text{input } C_{\text{dyn}}} + \sum_{i=2}^{N} \underbrace{\left(\mathrm{AF}_i \cdot C_{\text{in}_i} + \mathrm{AF}_{i-1}^{o} \cdot C_{\text{out}_{i-1}}\right)}_{\text{stage } i \; C_{\text{dyn}}} + \underbrace{\mathrm{AF}_N^{o}\left(C_{\text{out}_N} + C_{\text{load}}\right)}_{\text{output } C_{\text{dyn}}} + \sum_{i=1}^{N} \underbrace{C_{\text{off}_i} \cdot \mathrm{AF}_i^{1}}_{C_{\text{dyn}} \text{ of off-path load } i}. \tag{11}$$

Substituting the input $C_{\text{dyn}}$ with (8) and $C_{\text{out}_i}$ with (9), and rearranging the formula, we get

$$C_{\text{dyn}} = \sum_{i=1}^{N} k_i \left(\mathrm{AF}_i \cdot C_{0,i} + \mathrm{AF}_i^{o} \cdot C_{0,i} \cdot \frac{p_i}{g_i}\right) + \mathrm{AF}_{N+1} \cdot C_{\text{load}} + \sum_{i=1}^{N} C_{\text{off}_i} \cdot \mathrm{AF}_i^{1}. \tag{12}$$

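By defining

$$C_{\text{dyn}_i} \triangleq \mathrm{AF}_i \cdot C_{0,i} + \mathrm{AF}_i^{o} \cdot C_{0,i} \cdot \frac{p_i}{g_i}, \qquad C_{\text{dyn-off}} \triangleq \sum_{i=1}^{N} C_{\text{off}_i} \cdot \mathrm{AF}_i^{1}, \tag{13}$$

we get

$$C_{\text{dyn}} = \sum_{i=1}^{N} C_{\text{dyn}_i} \cdot k_i + \mathrm{AF}_{N+1} \cdot C_{\text{load}} + C_{\text{dyn-off}}. \tag{14}$$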

The initial $C_{\text{dyn}}$ is achieved by setting all $k_i$'s to 1:

$$C^{0}_{\text{dyn}} \triangleq C_{\text{dyn}}\Big|_{k_i\text{'s}=1} = \sum_{i=1}^{N} C_{\text{dyn}_i} + \mathrm{AF}_{N+1} \cdot C_{\text{load}} + C_{\text{dyn-off}}. \tag{15}$$
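As a small illustration of (13) and (14), the sketch below computes the per-gate dynamic capacitance weight and the total dynamic capacitance of a path. The function names and the numeric values (a three-inverter chain with activity factors of 0.5) are assumptions made for this example and do not come from the paper.

```python
def gate_cdyn_weight(af_in_sum, af_out, c0, g, p):
    """C_dyn_i of (13): AF_i*C0_i + AF_i^o*C0_i*p_i/g_i."""
    return af_in_sum * c0 + af_out * c0 * p / g


def path_cdyn(k, weights, af_load, c_load, cdyn_off):
    """Total C_dyn of (14): sum_i C_dyn_i*k_i + AF_{N+1}*C_load + C_dyn-off."""
    return sum(ki * wi for ki, wi in zip(k, weights)) + af_load * c_load + cdyn_off


# Assumed example: inverters (g = p = 1) sized 1, 4, 16 driving a load of 64,
# every activity factor equal to 0.5, and no side loads.
weights = [gate_cdyn_weight(0.5, 0.5, c0, 1.0, 1.0) for c0 in (1.0, 4.0, 16.0)]
cdyn_initial = path_cdyn([1.0, 1.0, 1.0], weights, 0.5, 64.0, 0.0)   # (15)
cdyn_resized = path_cdyn([1.0, 0.5, 0.5], weights, 0.5, 64.0, 0.0)   # (14)
```

Only the first sum depends on the sizing factors; the load and side-load terms are constants, which is why they can later be dropped from the objective function.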

3.1.2. Leakage Energy

The leakage energy of a static CMOS gate $i$ with $M_i$ inputs and a single output can be expressed as

$$\text{LeakageEnergyOfGate}_i = T_{\text{cycle}} \cdot P_{\text{leak}_i} \cdot C_{0,i}, \tag{16}$$

where $T_{\text{cycle}}$ is the cycle time of the circuit, and $P_{\text{leak}_i}$ is the average leakage power for gate $i$, for a unit input capacitance. $P_{\text{leak}_i}$ is a function of the manufacturing technology, gate topology, and signal probability (SP: the probability for a signal to be in a logical TRUE state at a given cycle) for each input. See [8, 17, 18] for leakage power calculation methods. Under a given workload, $P_{\text{leak}_i}$ should be precalculated for each gate $i$. Since $P_{\text{leak}_i}$ is sensitive to the signal probability, it needs to be recalculated whenever the workload is modified, to reflect changes in the gates' signal probabilities.

By dividing the leakage energy by $V_{cc}^{2}$, we can express the leakage in terms of capacitance:

$$\text{LeakageCapacitanceOfGate}_i \triangleq C_{\text{leak}_i} = \frac{1}{V_{cc}^{2}}\, T_{\text{cycle}} \cdot C_{0,i} \cdot P_{\text{leak}_i}. \tag{17}$$

The total $C_{\text{leak}}$ is equal to

$$C_{\text{leak}} = \frac{1}{V_{cc}^{2}}\, T_{\text{cycle}} \sum_{i=1}^{N} \left(k_i \cdot C_{0,i} \cdot P_{\text{leak}_i}\right) = \sum_{i=1}^{N} k_i \cdot C_{\text{leak}_i}. \tag{18}$$

The initial $C_{\text{leak}}$ is achieved by setting all $k_i$'s to 1:

$$C^{0}_{\text{leak}} \triangleq C_{\text{leak}}\Big|_{k_i\text{'s}=1} = \sum_{i=1}^{N} C_{\text{leak}_i}. \tag{19}$$

By combining (14), (15), (18), and (19) we can express the total capacitance and the initial capacitance of a desired path:

$$C_{\text{path}} = \sum_{i=1}^{N} k_i \left(C_{\text{dyn}_i} + C_{\text{leak}_i}\right) + \mathrm{AF}_{N+1} \cdot C_{\text{load}} + C_{\text{dyn-off}}, \qquad C^{0}_{\text{path}} = \sum_{i=1}^{N} \left(C_{\text{dyn}_i} + C_{\text{leak}_i}\right) + \mathrm{AF}_{N+1} \cdot C_{\text{load}} + C_{\text{dyn-off}}. \tag{20}$$

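The energy decrease rate ($e_{\text{dec}}$) due to downsizing of the gates by the factors $k_i$ is expressed as

$$e_{\text{dec}} = \frac{C^{0}_{\text{path}} - C_{\text{path}}}{C^{0}_{\text{path}}} = \frac{\sum_{i=1}^{N} \left(C_{\text{dyn}_i} + C_{\text{leak}_i}\right)\left(1 - k_i\right)}{\sum_{i=1}^{N} \left(C_{\text{dyn}_i} + C_{\text{leak}_i}\right) + \mathrm{AF}_{N+1} \cdot C_{\text{load}} + C_{\text{dyn-off}}}. \tag{21}$$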

In order to estimate the upper bound of $e_{\text{dec}}$, we assume an initial design point with minimum delay for $C^{0}_{\text{path}}$, and set the sizes of the gates in the path to the minimum allowed size ($C_{\min}$), to reflect the minimum possible $C_{\text{path}}$. By defining

$$C^{\min}_{\text{dyn}_i} \triangleq \mathrm{AF}_i \cdot C_{\min} + \mathrm{AF}_i^{o} \cdot C_{\min} \cdot \frac{p_i}{g_i}, \qquad C^{\min}_{\text{leak}_i} \triangleq \frac{1}{V_{cc}^{2}}\, T_{\text{cycle}} \cdot C_{\min} \cdot P_{\text{leak}_i}, \tag{22}$$

we get

$$e_{\text{dec}} \leq e^{\text{MAX}}_{\text{dec}} = \frac{\sum_{i=1}^{N} \left(C_{\text{dyn}_i} + C_{\text{leak}_i}\right) - \sum_{i=1}^{N} \left(C^{\min}_{\text{dyn}_i} + C^{\min}_{\text{leak}_i}\right)}{\sum_{i=1}^{N} \left(C_{\text{dyn}_i} + C_{\text{leak}_i}\right) + \mathrm{AF}_{N+1} \cdot C_{\text{load}} + C_{\text{dyn-off}}}. \tag{23}$$

By using (23), the upper bound to the EDG at a given delay increase rate ($d_{\text{inc}}$), denoted $\mathrm{EDG}_{\text{MAX}}(d_{\text{inc}})$, can also be calculated, simply by dividing $e^{\text{MAX}}_{\text{dec}}$ by $d_{\text{inc}}$:

$$\mathrm{EDG}_{\text{MAX}}(d_{\text{inc}}) = \frac{e^{\text{MAX}}_{\text{dec}}}{d_{\text{inc}}}. \tag{24}$$

$\mathrm{EDG}_{\text{MAX}}(d_{\text{inc}})$ can be used by the circuit designer to quickly evaluate the potential for saving power. However, the designer should note that the value of $\mathrm{EDG}_{\text{MAX}}(d_{\text{inc}})$ is a nonreachable upper bound, since the minimum sizing leads to a delay increase which is always greater than the one that the designer refers to. If the value of $\mathrm{EDG}_{\text{MAX}}(d_{\text{inc}})$ is not sufficient, other energy reduction techniques should be considered.
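The bound of (23) and (24) needs no optimization, so it can be evaluated in a few lines. The sketch below is an illustration under assumed inputs: the function names, the argument `fixed` (standing for the constant term AF_{N+1}·C_load + C_dyn-off), and the numeric values are all hypothetical.

```python
def energy_decrease(k, c_dyn, c_leak, fixed):
    """e_dec of (21); `fixed` is AF_{N+1}*C_load + C_dyn-off."""
    num = sum((cd + cl) * (1.0 - ki) for ki, cd, cl in zip(k, c_dyn, c_leak))
    den = sum(cd + cl for cd, cl in zip(c_dyn, c_leak)) + fixed
    return num / den


def edg_max(c_dyn, c_leak, c_dyn_min, c_leak_min, fixed, d_inc):
    """Upper bound EDG_MAX(d_inc) of (24), using e_dec^MAX from (23)."""
    total = sum(cd + cl for cd, cl in zip(c_dyn, c_leak))
    floor = sum(cd + cl for cd, cl in zip(c_dyn_min, c_leak_min))
    return ((total - floor) / (total + fixed)) / d_inc


# Assumed example: three gates, negligible leakage, minimum-size weight 1 per gate.
c_dyn, c_leak = [1.0, 4.0, 16.0], [0.0, 0.0, 0.0]
c_dyn_min, c_leak_min = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
fixed = 32.0

bound = edg_max(c_dyn, c_leak, c_dyn_min, c_leak_min, fixed, d_inc=0.10)
actual = energy_decrease([1.0, 0.5, 0.5], c_dyn, c_leak, fixed)
```

Any feasible sizing gives an `energy_decrease` no larger than `bound * d_inc`, since no gate can shrink below the minimum size.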

3.2. Delay of a Logic Path

When using the logical effort notation, the path delay ($D$) is expressed as

$$D = \sum_{i=1}^{N} g_i h_i + P. \tag{25}$$

The electrical effort of stage $i$ ($h_i$) is calculated as the ratio between the capacitance of gate $i+1$ and that of gate $i$, plus the ratio of the side load capacitance of gate $i$ to the input capacitance of gate $i$. For the sake of simplicity, $k_{N+1}$ and $k_1$ are defined to be 1. Using the notation defined earlier, the path delay $D$ can be written as

$$D = \sum_{i=1}^{N} g_i \left(\frac{C_{0,i+1}\, k_{i+1}}{C_{0,i}\, k_i} + \frac{C_{\text{off}_i}}{C_{0,i}\, k_i}\right) + P. \tag{26}$$

By defining

$$D_{0,i} \triangleq g_i \frac{C_{0,i+1}}{C_{0,i}}, \qquad D_{1,i} \triangleq g_i \frac{C_{\text{off}_i}}{C_{0,i}}, \tag{27}$$

equation (26) becomes

$$D = \sum_{i=1}^{N} \left(D_{0,i} \frac{k_{i+1}}{k_i} + D_{1,i} \frac{1}{k_i}\right) + P. \tag{28}$$

The initial delay is achieved by setting all $k_i$'s to 1:

$$D_0 \triangleq D\Big|_{k_i\text{'s}=1} = \sum_{i=1}^{N} \left(D_{0,i} + D_{1,i}\right) + P. \tag{29}$$

Therefore, the delay increase rate ($d_{\text{inc}}$) due to downsizing of the gates by the factors $k_i$ is

$$d_{\text{inc}} = \frac{D - D_0}{D_0} = \frac{\sum_{i=1}^{N} \left(D_{0,i}\, k_{i+1}/k_i + D_{1,i}/k_i\right) + P - D_0}{D_0}. \tag{30}$$
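The delay expressions (27) to (30) translate directly into code. The following sketch is illustrative only: the function names are hypothetical, the fixed output load is treated as the capacitance of a virtual gate N+1 with k_{N+1} = 1, and the four-inverter chain is an assumed example.

```python
def delay_terms(g, c0, c_off, c_load):
    """D_0i = g_i*C0_{i+1}/C0_i and D_1i = g_i*Coff_i/C0_i, per (27)."""
    nxt = list(c0[1:]) + [c_load]      # the load plays the role of C0_{N+1}
    d0 = [gi * cn / ci for gi, cn, ci in zip(g, nxt, c0)]
    d1 = [gi * co / ci for gi, co, ci in zip(g, c_off, c0)]
    return d0, d1


def path_delay(k, d0, d1, parasitic):
    """Path delay of (28), with k_{N+1} = 1."""
    kk = list(k) + [1.0]
    stage_sum = sum(d0[i] * kk[i + 1] / kk[i] + d1[i] / kk[i] for i in range(len(k)))
    return stage_sum + parasitic


def delay_increase(k, d0, d1, parasitic):
    """Delay increase rate d_inc of (30), relative to all k_i = 1."""
    base = path_delay([1.0] * len(k), d0, d1, parasitic)
    return (path_delay(k, d0, d1, parasitic) - base) / base


# Assumed example: inverters sized 1, 4, 16, 64 driving a load of 256, P = 4.
d0, d1 = delay_terms([1.0] * 4, [1.0, 4.0, 16.0, 64.0], [0.0] * 4, 256.0)
initial_delay = path_delay([1.0, 1.0, 1.0, 1.0], d0, d1, 4.0)
resized_rate = delay_increase([1.0, 0.5, 0.5, 0.5], d0, d1, 4.0)
```

In this example, halving every gate except the fixed driver raises the delay from 20 to 22 units, a delay increase rate of 10%.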

4. Optimizing Power and Performance

Given a delay value that is $d_{\text{inc}}$ percent greater than the initial delay $D_0$, we seek the path sizing ($C_{0,2} \cdot k_2, \ldots, C_{0,N} \cdot k_N$) that maximizes the energy reduction rate $e_{\text{dec}}$.

From (21), maximizing $e_{\text{dec}}$ is achieved by minimizing $C_{\text{path}}$. By ignoring the terms in (20) that do not depend on $k_i$ and therefore do not affect the optimization process, we define an objective function $f_0$:

$$f_0 = \sum_{i=1}^{N} k_i \left(C_{\text{dyn}_i} + C_{\text{leak}_i}\right). \tag{31}$$

Note that $f_0$ depends linearly on the dynamic and the leakage capacitances, which apply weights and determine the importance of each $k_i$. Equation (31) can also be written as

$$f_0 = \sum_{i=1}^{N} k_i\, C_{0,i} \left(\frac{1}{V_{cc}^{2}}\, T_{\text{cycle}}\, P_{\text{leak}_i} + \mathrm{AF}_i + \mathrm{AF}_i^{o} \cdot \frac{p_i}{g_i}\right). \tag{32}$$

Note that when all gates in a path are of the same type, all activity factors are equal, and the average leakage power for all gates in the path is equal, both $C_{\text{dyn}_i}$ and $C_{\text{leak}_i}$ can be eliminated from (31) without affecting the optimization result. These conditions are satisfied on an inverter chain with input signal probability of 0.5, for instance. In this case, neither the leakage power nor the activity factor has any influence on the optimization result.

To get a canonical constraint goal, in which the constraint is less than or equal to 1, we rearrange (30) to

$$\sum_{i=1}^{N} \left(D_{0,i} \frac{k_{i+1}}{k_i} + D_{1,i} \frac{1}{k_i}\right) = d_{\text{inc}} D_0 + D_0 - P, \tag{33}$$

and define

$$D'_{0,i} \triangleq \frac{D_{0,i}}{d_{\text{inc}} D_0 + D_0 - P}, \qquad D'_{1,i} \triangleq \frac{D_{1,i}}{d_{\text{inc}} D_0 + D_0 - P}, \tag{34}$$

to get

$$\sum_{i=1}^{N} \left(D'_{0,i} \frac{k_{i+1}}{k_i} + D'_{1,i} \frac{1}{k_i}\right) = 1. \tag{35}$$

We can now use (35) to get an optimization constraint:

$$f_1 = \sum_{i=1}^{N} \left(D'_{0,i} \frac{k_{i+1}}{k_i} + D'_{1,i} \frac{1}{k_i}\right) \leq 1. \tag{36}$$

Combining (31) and (36) results in the following optimization problem:

$$\begin{aligned} &\text{Minimize } f_0(k_1, \ldots, k_N), \quad \text{subject to } f_1(k_1, \ldots, k_N) \leq 1, \text{ where} \\ &f_0(k_1, \ldots, k_N) = \sum_{i=1}^{N} k_i \left(C_{\text{dyn}_i} + C_{\text{leak}_i}\right), \\ &f_1(k_1, \ldots, k_N) = \sum_{i=1}^{N} \left(D'_{0,i} \frac{k_{i+1}}{k_i} + D'_{1,i} \frac{1}{k_i}\right). \end{aligned} \tag{37}$$

However, $f_1$ defined above is nonconvex. We use geometrical programming [19–21] to solve the optimization problem, by changing variables:

$$\begin{aligned} \tilde{k}_i &= \log(k_i) \;\Longrightarrow\; k_i = e^{\tilde{k}_i}, \\ \tilde{C}_{\text{dyn}_i} &= \log\left(C_{\text{dyn}_i}\right) \;\Longrightarrow\; C_{\text{dyn}_i} = e^{\tilde{C}_{\text{dyn}_i}}, \\ \tilde{C}_{\text{leak}_i} &= \log\left(C_{\text{leak}_i}\right) \;\Longrightarrow\; C_{\text{leak}_i} = e^{\tilde{C}_{\text{leak}_i}}, \\ \tilde{D}'_{0,i} &= \log\left(D'_{0,i}\right) \;\Longrightarrow\; D'_{0,i} = e^{\tilde{D}'_{0,i}}, \\ \tilde{D}'_{1,i} &= \log\left(D'_{1,i}\right) \;\Longrightarrow\; D'_{1,i} = e^{\tilde{D}'_{1,i}}. \end{aligned} \tag{38}$$

So the equivalent convex optimization problem (which can be solved using convex optimization tools) is

$$\begin{aligned} &\text{Minimize } \tilde{f}_0(\tilde{k}_1, \ldots, \tilde{k}_N), \quad \text{subject to } \tilde{f}_1(\tilde{k}_1, \ldots, \tilde{k}_N) \leq 0, \text{ where} \\ &\tilde{f}_0(\tilde{k}_1, \ldots, \tilde{k}_N) = \log\left(\sum_{i=1}^{N} e^{\tilde{k}_i + \tilde{C}_{\text{dyn}_i}} + e^{\tilde{k}_i + \tilde{C}_{\text{leak}_i}}\right), \\ &\tilde{f}_1(\tilde{k}_1, \ldots, \tilde{k}_N) = \log\left(\sum_{i=1}^{N} e^{\tilde{k}_{i+1} - \tilde{k}_i + \tilde{D}'_{0,i}} + e^{\tilde{D}'_{1,i} - \tilde{k}_i}\right). \end{aligned} \tag{39}$$

The convexity of (39) ensures that a solution to the optimization problem exists, and that the solution is the global optimum point. In order to obtain the EDG curve, the delay increase rate is swept from 0 to the desired value, and for each delay increase value, a different optimization problem is solved by geometrical programming.
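To make the structure of problem (37) concrete, here is a minimal pure-Python sketch for a single path. It is not the geometrical programming solver used in the paper: as a swapped-in technique, it minimizes the Lagrangian f0 + λ·f1 by coordinate descent (each k_i has the closed-form update sqrt(B/A)) and bisects on the multiplier λ until the delay constraint is met with equality. The function names, the multiplier bracket, the iteration counts, and the example chain are all assumptions.

```python
import math


def normalized_delay(k, a, b):
    """f1 of (36): sum_i a_i*k_{i+1}/k_i + b_i/k_i, with k_{N+1} = 1."""
    kk = list(k) + [1.0]
    return sum(a[i] * kk[i + 1] / kk[i] + b[i] / kk[i] for i in range(len(k)))


def minimize_lagrangian(w, a, b, lam, sweeps=200):
    """Coordinate descent on f0 + lam*f1 with k_1 fixed to 1.

    In each k_i the Lagrangian has the form A*k_i + B/k_i + const,
    whose minimizer is sqrt(B/A).
    """
    n = len(w)
    k = [1.0] * n
    for _ in range(sweeps):
        for i in range(1, n):
            k_next = k[i + 1] if i + 1 < n else 1.0
            A = w[i] + lam * a[i - 1] / k[i - 1]
            B = lam * (a[i] * k_next + b[i])
            k[i] = math.sqrt(B / A)
    return k


def size_for_delay(w, d0, d1, D0, P, d_inc):
    """Sizing factors minimizing sum(w_i*k_i) at path delay (1 + d_inc)*D0."""
    scale = d_inc * D0 + D0 - P          # right-hand side of (33)
    a = [x / scale for x in d0]          # D'_0i of (34)
    b = [x / scale for x in d1]          # D'_1i of (34)
    lo, hi = 1e-6, 1e6                   # assumed bracket for the multiplier
    for _ in range(60):                  # bisection in the log domain
        lam = math.sqrt(lo * hi)
        if normalized_delay(minimize_lagrangian(w, a, b, lam), a, b) > 1.0:
            lo = lam                     # too slow: weight the delay more
        else:
            hi = lam
    return minimize_lagrangian(w, a, b, hi)


# Assumed example: 4 inverters (g = p = 1) sized 1, 4, 16, 64, load 256, P = 4.
w = [2.0, 8.0, 32.0, 128.0]    # C_dyn_i with AF_i = AF_i^o = 1, no leakage
d0 = [4.0, 4.0, 4.0, 4.0]      # D_0i = g_i*C0_{i+1}/C0_i
d1 = [0.0, 0.0, 0.0, 0.0]      # no side loads
k_opt = size_for_delay(w, d0, d1, D0=20.0, P=4.0, d_inc=0.10)
energy_opt = sum(wi * ki for wi, ki in zip(w, k_opt))
```

For comparison, scaling gates 2 to 4 of this example uniformly by 0.5 meets the same 10% delay relaxation with an objective value of 86 (down from 170), so the optimized sizing should come out below that figure. Repeating the call for a range of `d_inc` values traces the EDG curve described above.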

This result can be extended to handle circuit delay, instead of a single path delay. All paths must be enumerated, and the optimized delay should reflect the critical path delay. The critical path delay is calculated as the maximum delay of all enumerated paths. However, the MAX operator cannot be handled directly in geometrical programming, since it produces a result which is not necessarily differentiable. Boyd et al. [20] solve the general problem of using the MAX operator in geometrical programming ($\mathrm{MAX}(f_1(x), f_2(x), \ldots, f_N(x)) \leq 1$) by introducing a new variable $t$ and $N$ inequalities ($N$ being the number of paths), to obtain

$$t \leq 1, \quad f_1(x) \leq t, \quad f_2(x) \leq t, \quad \ldots, \quad f_N(x) \leq t. \tag{40}$$

This transformation can be used in order to feed the critical path into the optimizer. To calculate the energy-delay tradeoff, the $C_{\text{dyn}}$ of the entire circuit should be taken into account.

In the following sections, we employ this procedure to characterize the EDG and power reduction in typical logic circuits, and derive design guidelines.

5. Exploring Energy-Delay Tradeoff in Basic Circuits

We run numerical experiments that explore the EDG of some basic circuits. We use GGPLAB [22] as a geometrical programming optimizer, to solve the optimization problem (37) and (39). GGPLAB is a free open source library and can be easily installed over Matlab. For each experiment, we provide an EDG curve which is obtained by optimizing the circuit for a wide range of increased delay values. Although the propagation delay and the active energy dissipation are technology independent, the leakage depends on the manufacturing technology and the circuit's cycle time. Throughout this section, the leakage is calculated according to the 32 nm technology node of the ITRS 2007 projection [23], in which $C_{\text{leak}_{\text{inv}}}$ is calculated to be 0.5694, based on a clock frequency of 2 GHz and a signal probability of 0.5.

5.1. Inverter Chain

Consider a chain consisting of $N$ inverters, with an output load of $C_{\text{out}}$. $C_{0,1}$ is set arbitrarily to a constant value of 1 fF, and therefore the path electrical effort ($H$) is $C_{\text{out}}$ (Figure 4). We set initial gate capacitances ($C_{0,2}, \ldots, C_{0,N}$) that ensure minimum delay, using the logical effort methodology. The minimum delay was obtained by setting the electrical effort of each stage to the $N$th root of the path electrical effort. The leakage calculation takes into account the signal probability of the inverters in the chain.
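The minimum-delay starting point described above can be sketched in a few lines. The function name and the example values are assumptions; the rule itself (equal stage effort, the Nth root of H) is the one stated in the text.

```python
def min_delay_inverter_chain(n_stages, c_in, c_load):
    """Input capacitances C0_1..C0_N of an inverter chain sized for minimum delay.

    Each stage gets the same electrical effort h = H**(1/N), with H = c_load/c_in.
    """
    stage_effort = (c_load / c_in) ** (1.0 / n_stages)
    return [c_in * stage_effort ** i for i in range(n_stages)]


# Assumed example: 4 stages, 1 fF input capacitance, 256 fF load.
sizes = min_delay_inverter_chain(4, 1.0, 256.0)
```

For this example the stage effort is 4, so the chain is sized approximately 1, 4, 16, and 64 fF.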

Figure 5(a) shows the EDG for different combinations of path electrical effort ($H$) and chain length ($N$) where the leakage energy is negligible. Figure 5(b) shows the same analysis, for negligible dynamic energy. In both cases, the largest potential for energy savings occurs near the point where the design is sized for minimum achievable delay. The potential for energy savings decreases as the delay is being relaxed further. This is consistent with the observation in [5].

Figure 6 shows the optimal sizing of a fixed input and output load inverter chain with an arbitrary activity factor and a signal probability of 0.5, for various delay increase values. For an input signal probability of 0.5, all the gates in the inverter chain have the same signal probability. Therefore, the optimization process is indifferent to the average leakage power of each gate: $P_{\text{leak}_i}$ in (17) is constant and can be eliminated from (37).

The optimization process leads to increasing the electrical effort of the last stages and decreasing the electrical effort of the first stages, to meet the timing requirements (Figure 6(f)). The largest energy savings, for a given delay increase value, are achieved by downsizing the largest gates in the chain (Figure 6(e)). The relative downsizing, however, is maximal around the middle of the chain (Figure 6(c)), due to the fact that the first stage and the load are anchored with a fixed size. In order to understand the behavior of the middle stages, a simulation of a 16-stage inverter chain is plotted in Figure 6(d). As the delay increases, the gates towards the middle of the chain are downsized and form a plateau-like shape. Note that the optimal gate sizes might be limited by the minimum allowed size according to design rules.

Both Figures 6(a) and 6(b) (absolute sizing) and Figure 6(f) illustrate that as we move further from the minimal achievable delay (delay increase = 0, where all electrical efforts are identical), the difference between the electrical efforts of the stages increases. However, uniform downsizing (e.g., increase the delay by downsizing each gate by 5%) is sometimes used in the power reduction process by the circuit designer as an easy and straightforward method to trade off energy for performance. Figure 7 shows the energy efficient curve (optimal sizing) versus energy-delay curve generated by uniform downsizing of an 8-long inverter chain with out/in capacitance ratio of 200. The energy difference between the curves in the figure reaches up to 7%.

Most of the energy in the path is dissipated in the last stages of the chain, where the fanout factors are larger, in order to drive the large fixed output capacitance.

Figure 8 demonstrates the effect of chain length on delay and energy. The external load of the circuit is relatively large (9 pF), for which an 8-long chain yields the optimal timing. The energy efficient curves for chains of 8, 6, and 4 inverters are plotted in the energy-delay plane. We can see that the number of stages is important when the optimal delay is required. Generally, as we move further from the smallest achievable delay, fewer inverters achieve better energy dissipation for the same delay. However, the difference in energy between the optimal number of inverters and a fixed number of inverters decreases as the delay is relaxed.

Figure 9 shows good correlation between $\mathrm{EDG}_{\text{MAX}}(10\%)$ (see (24)) and the actual energy delay gain. The energy saving opportunity increases when the output load is small, and when the number of stages in the path increases.

5.2. Activity and Signal Probability Effect on Sizing

The more active a gate is, the more energy it consumes. In order to trade off delay and energy better, active gates in the timing critical path can be downsized more than inactive gates in the critical path. For instance, consider the circuit in Figure 10. The path from $A$ to out is the timing-critical path. Input $A$ has a fixed activity factor of 0.5, while the activity factor of input $B$ is varied. In order to calculate the activity factor and signal probability of internal nodes, the method described in [24] for AF/SP propagation in combinational circuits is used. For a NAND gate with uncorrelated inputs $A$ and $B$ and output $O$, the activity at its output is calculated as

$$\mathrm{AF}_O = \mathrm{AF}_A \cdot \mathrm{SP}_B + \mathrm{AF}_B \cdot \mathrm{SP}_A - \frac{1}{2} \cdot \mathrm{AF}_A \cdot \mathrm{AF}_B. \tag{41}$$

According to (41), the activity factor at the NAND gate's output is $\mathrm{AF}_{\text{nand}} = 0.25 + 0.5\,\mathrm{AF}_B$: the activity factor at the output of the NAND is controlled by the activity factor of input $B$, and rises monotonically as $\mathrm{AF}_B$ increases.
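Equation (41) can be wrapped in a small helper for propagating activity through a two-input NAND gate. This is a sketch only: the function name and the input values are arbitrary assumptions and are not the values of the Figure 10 circuit.

```python
def nand_output_activity(af_a, sp_a, af_b, sp_b):
    """Output activity factor of a 2-input NAND with uncorrelated inputs, per (41)."""
    return af_a * sp_b + af_b * sp_a - 0.5 * af_a * af_b


# Arbitrary illustrative inputs: raising AF_B raises the output activity.
low_b = nand_output_activity(af_a=0.4, sp_a=0.5, af_b=0.2, sp_b=0.3)
high_b = nand_output_activity(af_a=0.4, sp_a=0.5, af_b=0.6, sp_b=0.3)
```

With these inputs the coefficient of AF_B is SP_A − AF_A/2 = 0.3, which is positive, so the output activity grows with AF_B as described in the text.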

When the delay constraints of the circuit are relaxed and $\mathrm{AF}_B$ is increased (and with it $\mathrm{AF}_{\text{nand}}$), we expect that the gates that are driven by the NAND gate will get downsized at the expense of the gates driving the NAND gate. Figure 11 shows the sizing factor of each gate for various $\mathrm{AF}_B$ values, for a delay increase rate of 20%. We see that as $\mathrm{AF}_B$ increases, the sizing factor of gates 1 and 2 is increased, while the sizing factor of gates 5 and 6 is decreased.

A similar observation holds for leakage dominant circuits, where the signal probability becomes the affecting parameter instead of the activity factor. $P_{\text{leak}_i}$ in (16) depends on the signal probability. Therefore, it is expected that the sizing of each gate during the optimization process will be influenced by the signal probability at the gate input. For example, in an inverter where the Pmos transistor's size is twice the size of the Nmos transistor, the leakage power of a single inverter can be estimated by

$$\text{InverterLeakagePower} = \mathrm{SP} \cdot C_{\text{in}} \cdot \frac{2}{3} \cdot P_{\text{leak}}(\text{Pmos}) + (1 - \mathrm{SP}) \cdot C_{\text{in}} \cdot \frac{1}{3} \cdot P_{\text{leak}}(\text{Nmos}), \tag{42}$$

where SP is the signal probability at the input of the inverter, $C_{\text{in}}$ is the input capacitance of the inverter, and $P_{\text{leak}}(\text{Nmos})$ and $P_{\text{leak}}(\text{Pmos})$ are the leakage powers of the Nmos and Pmos transistors, respectively, per unit input capacitance. Figure 12 shows the sizes of the gates in a six-stage inverter chain with an input capacitance of 1 fF and an output load of 600 fF with a small activity factor, when the delay increase rate is varied from 0% to 50%. The optimal sizing at each stage is clearly affected by the signal probability. Up to 50% difference in the sizing of the stages as a function of the signal probability can be observed (see delay increase of 50%, 4th stage).
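The estimate in (42) is simple enough to evaluate directly. The sketch below uses a hypothetical function name and made-up leakage values purely to show how the signal probability shifts the result; real values of P_leak would come from the technology data.

```python
def inverter_leakage_power(sp, c_in, p_leak_pmos, p_leak_nmos):
    """Leakage power estimate of (42) for an inverter with a 2:1 Pmos/Nmos size ratio.

    With the input high (probability sp) the Pmos is off and leaks;
    with the input low (probability 1 - sp) the Nmos is off and leaks.
    """
    pmos_part = sp * c_in * (2.0 / 3.0) * p_leak_pmos
    nmos_part = (1.0 - sp) * c_in * (1.0 / 3.0) * p_leak_nmos
    return pmos_part + nmos_part


# Made-up per-unit-capacitance leakage values, for illustration only.
balanced = inverter_leakage_power(sp=0.5, c_in=3.0, p_leak_pmos=1.0, p_leak_nmos=2.0)
mostly_high = inverter_leakage_power(sp=0.9, c_in=3.0, p_leak_pmos=1.0, p_leak_nmos=2.0)
```

Because successive inverters in a chain see complementary signal probabilities, an input probability other than 0.5 gives alternating leakage weights along the chain, which is one way to read the stage-to-stage differences reported for Figure 12.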

5.3. Comparing Analytical and Simulation-Based Optimization

To validate the EDG optimization algorithm, the results of Section 5.1 are compared with simulation results. The simulation was performed using a proprietary circuit simulator combined with a proprietary numerical optimization environment, in a 32 nm process. The circuit was first optimized for minimum delay, which was later used as a reference. To obtain the EDG curves, the circuit was then optimized by the simulation-based tool for minimum energy under several delay constraints.

Figure 13 presents the difference between the analytical computation (Section 5.1) and the simulation-based optimization. The error is small, ranging from approximately 0% to a maximum of 7%. Obtaining the EDG curves using simulation-based optimization is orders of magnitude slower than running the proposed analytical method. Table 1 compares the run time of simulation-based optimization with that of the proposed analytical model for a few inverter-chain circuits. Note that the run time of simulation-based optimization increases dramatically as circuit complexity increases.


Table 1: Run time of simulation-based optimization and of the analytical model.

Circuit                   Sim-based optimization   Analytical model optimization

4-long inverter chain     240 sec                  25 sec
8-long inverter chain     360 sec                  40 sec
15-long inverter chain    1100 sec                 70 sec
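The run-time ratios implied by Table 1 can be worked out directly. The short Python sketch below does only that arithmetic; the dictionary name `run_times` is hypothetical, and the numbers are copied from the table.

```python
# Run times in seconds, copied from Table 1: (simulation-based, analytical).
run_times = {
    "4-long inverter chain": (240, 25),
    "8-long inverter chain": (360, 40),
    "15-long inverter chain": (1100, 70),
}

# Speedup = simulation-based run time / analytical-model run time.
speedups = {name: sim / ana for name, (sim, ana) in run_times.items()}
for name, speedup in speedups.items():
    print(f"{name}: {speedup:.1f}x")
```

For these three circuits the ratio grows from about 9x to about 16x as the chain gets longer.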

The analytical model was calibrated by computing the parasitic delay of an inverter ($p$) for the given technology, simply by comparing the output capacitance with the input capacitance of an unloaded inverter (see (9)).

6. Final Remarks and Conclusion

We have presented a design optimization framework that explores the power-performance space. The framework provides fast and accurate answers to the following questions.
(1) How much power can be saved by slowing down the circuit by $x$ percent?
(2) How should gate sizes be determined for optimal power under a given delay constraint?

We introduced the energy/delay gain (EDG) as a metric for the amount of energy that can be saved as a function of increased delay. The method was demonstrated on a variety of circuits, exhibiting good correlation with accurate simulation-based optimizations. We have shown that around 25% of the dynamic energy can be saved when the delay constraint is relaxed by 5% in an optimal way, for circuits in 32 nm technology that were initially designed for maximal operating speed. An upper bound on the power savings in a given circuit can be obtained without optimization, in order to quickly assess whether a downsizing effort is justified for the circuit.
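To make the quoted numbers concrete, the sketch below treats EDG as the ratio of relative energy saving to relative delay increase. This reading of the metric is an assumption based on the description above; the formal definition appears earlier in the paper and is not restated in this section. The function name `energy_delay_gain` is hypothetical.

```python
def energy_delay_gain(energy_saving_pct, delay_increase_pct):
    """Relative energy saving divided by relative delay increase (assumed EDG form)."""
    if delay_increase_pct <= 0:
        raise ValueError("delay increase must be positive")
    return energy_saving_pct / delay_increase_pct


# Design points quoted in the conclusions.
edg_32nm = energy_delay_gain(25, 5)   # about 25% energy saved for 5% more delay
edg_mux = energy_delay_gain(40, 10)   # 2-bit multiplexer: up to 40% for 10% more delay
print(edg_32nm, edg_mux)
```

Under this assumed form, the two quoted design points correspond to ratios of 5 and 4, so each percent of delay given up near the minimum-delay point buys several percent of dynamic energy.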

The method described in this work can be used by both circuit designers and EDA tools. Circuit designers can sharpen their intuition for the energy-delay tradeoff. The following rules of thumb can be derived from the experiments.
(i) Minimum delay is power expensive. By relaxing the delay, a significant amount of dynamic energy can be saved. We have shown that, under given conditions, up to 40% of the dynamic energy of a 2-bit multiplexer can be saved when the delay constraint is relaxed by 10%.
(ii) A fixed uniform downsizing factor for all gates in the circuit leads to an inefficient design in terms of energy. The optimal downsizing factor is not uniform.
(iii) Increase delay by downsizing the "middle" gates. To save energy with minimal impact on timing, the gates located in the middle (between the input and the load) are downsized the most. The downsizing factor increases as the delay constraint is relaxed.
(iv) Increase delay by increasing the electrical effort towards the load. Minimum-delay design requires a constant tapering factor; typically, a "fanout of 4" is used [1]. Minimum-energy design (when neglecting short-circuit power) requires a high tapering factor, which decreases the number of stages. When performance is compromised to save energy, the tapering factor of the stages must increase towards the external load. The tapering factor increases as the delay constraint is relaxed. Note that this result is applicable only when the external load is larger than the input capacitance.
(v) Downsizing of the gates reduces both dynamic and leakage energy dissipation. Both depend linearly on the size of the gates, so downsizing the gates reduces both.
(vi) The power optimization has to be performed under a given workload. The activity factor and signal probability influence the sizing of the optimized circuit, so different tests may result in different sizing. Using random tests rather than typical tests to optimize the circuit may lead to a suboptimal design.
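Rule (iv) can be illustrated with the inverter chain of Figure 12 (1 fF input capacitance, 600 fF load). The sketch below computes only the constant tapering factor of the minimum-delay starting point, using the standard logical-effort result that an N-stage inverter chain is fastest when every stage has the same electrical effort. It does not reproduce the power-optimal sizing, which requires the optimization described in the paper. The function names are hypothetical.

```python
def constant_tapering_factor(c_in, c_load, n_stages):
    """Per-stage electrical effort of a minimum-delay inverter chain."""
    return (c_load / c_in) ** (1.0 / n_stages)


def min_delay_input_caps(c_in, c_load, n_stages):
    """Input capacitance of each stage when the tapering factor is constant."""
    f = constant_tapering_factor(c_in, c_load, n_stages)
    return [c_in * f ** i for i in range(n_stages)]


# Values from the inverter-chain example: 1 fF input, 600 fF load.
C_IN_FF, C_LOAD_FF = 1.0, 600.0

# Fewer stages for the same load imply a larger tapering factor.
for n in (6, 5, 4, 3):
    f = constant_tapering_factor(C_IN_FF, C_LOAD_FF, n)
    print(f"{n} stages: tapering factor = {f:.2f}")

# Stage input capacitances (fF) of the six-stage minimum-delay chain.
print([round(c, 1) for c in min_delay_input_caps(C_IN_FF, C_LOAD_FF, 6)])
```

The printed factors grow as the stage count shrinks, which is the direction rule (iv) describes for energy-oriented designs: higher tapering, fewer stages.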

Acknowledgments

The authors would like to thank Yoad Yagil for his valuable inputs.

References

  1. I. E. Sutherland, R. F. Sproull, and D. F. Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan Kaufmann, Boston, Mass, USA, 1999.
  2. P. I. Penzes and A. J. Martin, "Energy-delay efficiency of VLSI computations," in Proceedings of the 12th ACM Great Lakes Symposium on VLSI (GLSVLSI '02), April 2002.
  3. V. Zyuban and P. N. Strenski, "Balancing hardware intensity in microprocessor pipelines," IBM Journal of Research and Development, vol. 47, no. 5-6, pp. 585–598, 2003.
  4. V. Zyuban and P. Strenski, "Unified methodology for resolving power-performance tradeoffs at the microarchitectural and circuit levels," in Proceedings of the International Symposium on Low Power Electronics and Design, pp. 166–171, Monterey, Calif, USA, August 2002.
  5. D. Marković, V. Stojanović, B. Nikolić, M. A. Horowitz, and R. W. Brodersen, "Methods for true energy-performance optimization," IEEE Journal of Solid-State Circuits, vol. 39, no. 8, pp. 1282–1293, 2004.
  6. C. P. Chen, C. C. N. Chu, and D. F. Wong, "Fast and exact simultaneous gate and wire sizing by Lagrangian relaxation," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 7, pp. 1014–1025, 1999.
  7. R. Gonzalez, B. M. Gordon, and M. A. Horowitz, "Supply and threshold voltage scaling for low power CMOS," IEEE Journal of Solid-State Circuits, vol. 32, no. 8, pp. 1210–1216, 1997.
  8. Z. Chen, M. Johnson, L. Wei, and K. Roy, "Estimation of standby leakage power in CMOS circuits considering accurate modeling of transistor stacks," in Proceedings of the International Symposium on Low Power Design, pp. 239–244, 1998.
  9. L. Benini, G. de Micheli, and E. Macii, "Designing low-power circuits: practical recipes," IEEE Circuits and Systems Magazine, vol. 1, no. 1, pp. 6–25, 2001.
  10. V. Khandelwal and A. Srivastava, "Leakage control through fine-grained placement and sizing of sleep transistors," in Proceedings of the International Conference on Computer-Aided Design (ICCAD '04), pp. 533–536, 2004.
  11. S. Iman and M. Pedram, Logic Synthesis for Low Power VLSI Designs, Springer, New York, NY, USA, 1998.
  12. R. Zlatanovici and B. Nikolić, "Power-performance optimization for custom digital circuits," in Proceedings of the Power-Performance Optimization for Custom Digital Circuits (PATMOS '05), vol. 3728 of Lecture Notes in Computer Science, pp. 404–414, 2005.
  13. H. Q. Dao, B. R. Zeydel, and V. G. Oklobdzija, "Energy optimization of pipelined digital systems using circuit sizing and supply scaling," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 2, pp. 122–134, 2006.
  14. V. Oklobdzija and R. K. Krishnamurthy, High-Performance Energy-Efficient Microprocessor Design, Springer, New York, NY, USA, 2006.
  15. A. Naveh, E. Rotem, A. Mendelson et al., "Power and thermal management in the Intel Core Duo processor," Intel Technology Journal, vol. 10, no. 2, 2006.
  16. AMD Press Release, "AMD Phenom X4 9100e processor enables full featured, sleek and quiet quad-core PCs," AMD Press Resources Web Page, March 2008.
  17. H. Rahman and C. Chakrabarti, "A leakage estimation and reduction technique for scaled CMOS logic circuits considering gate-leakage," in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '04), pp. 297–300, May 2004.
  18. Y. Xu, Z. Luo, and X. Li, "A maximum total leakage current estimation method," in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '04), pp. 757–760, May 2004.
  19. S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, UK, 2006.
  20. S. Boyd, S. J. Kim, L. Vandenberghe, and A. Hassibi, "A Tutorial on Geometric Programming," Revised for Optimization and Engineering, July 2005.
  21. S. P. Boyd, S. J. Kim, D. D. Patil, and M. A. Horowitz, "Digital circuit optimization via geometric programming," Operations Research, vol. 53, no. 6, pp. 899–932, 2005.
  22. A. Mutapcic, K. Koh, S. Kim, L. Vandenberghe, and S. Boyd, "GGPLAB: A Simple Matlab Toolbox for Geometric Programming," May 2006, http://www.stanford.edu/boyd/ggplab/.
  23. "International Technology Roadmap for Semiconductors," 2007 Edition, http://www.itrs.net/Links/2007ITRS/Home2007.htm.
  24. A. Ghosh, S. Devadas, K. Keutzer, and J. White, "Estimation of average switching activity in combinational and sequential circuits," in Proceedings of the 29th ACM/IEEE Conference on Design Automation, pp. 253–259, Anaheim, Calif, USA, June 1992.

Copyright © 2011 Yoni Aizik and Avinoam Kolodny. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

