Selected Papers from the Midwest Symposium on Circuits and Systems
Research Article | Open Access
Kumar Yelamarthi, Chien-In Henry Chen, "Dynamic CMOS Load Balancing and Path Oriented in Time Optimization Algorithms to Minimize Delay Uncertainties from Process Variations", VLSI Design, vol. 2010, Article ID 230783, 13 pages, 2010. https://doi.org/10.1155/2010/230783
Dynamic CMOS Load Balancing and Path Oriented in Time Optimization Algorithms to Minimize Delay Uncertainties from Process Variations
Abstract
The complexity of timing optimization of high-performance circuits has been increasing rapidly as CMOS device sizes shrink and the magnitude of process variations rises. Addressing these challenges, this paper presents a timing optimization algorithm for CMOS dynamic logic and a Path Oriented IN Time (POINT) optimization flow for mixed-static-dynamic CMOS logic, in which a design is partitioned into static and dynamic circuits. Implemented on a 64-b adder and International Symposium on Circuits and Systems (ISCAS) benchmark circuits, the POINT optimization algorithm has shown an average improvement of 38% in delay and 35% in delay uncertainty from process variations in comparison with a state-of-the-art commercial optimization tool.
1. Introduction
The performance improvement of microprocessors has traditionally been driven by dynamic logic and microarchitectural improvements [1] and can be further enhanced through circuit design and topology organization. Dynamic logic is an effective logic style in terms of timing and area when compared to its static counterpart, because it does not require a complementary PMOS network and uses a clock signal to implement combinational logic circuits. In general, CMOS dynamic logic uses fast NMOS transistors in its pull-down network. Its delay depends on the number and size (width) of transistors in the NMOS critical path. This paper presents an NMOS transistor sizing optimization for faster operation.
Static logic is slower because it has twice the loading, higher thresholds, and uses slow PMOS transistors for computation. Dynamic logic has been predominantly used in microprocessors, and its use has improved timing performance significantly over static CMOS circuits [1, 2]. However, timing optimization of dynamic logic is challenging due to several issues such as charge sharing, noise immunity, leakage, and environmental and semiconductor process variations. Also, because dynamic circuits consume more power than static CMOS, an optimal balance of delay and power can be achieved at the architectural level through effective partitioning of the design into a mixed-static-dynamic circuit style [3].
Process variations introduce design uncertainties at each step of process development, design, manufacturing, and test. The ratio of these process variations to the nominal values has been increasing as device sizes shrink towards 32 nm [4], making it necessary to account for process variations during timing optimization. They need to be taken into account during the design phase to make sure that performance analysis provides an accurate estimation [5].
One of the challenges in timing optimization of CMOS logic is delay uncertainty from process variations, $\Delta D = D_{\max} - D_{\min}$, where $D_{\max}$ and $D_{\min}$ are the maximum and minimum delays of a timing path. In the 180 nm CMOS technology, these process variations have caused about 30% variation in chip frequency, along with 20X variation in current leakage [6]. The magnitude of intra-die channel length variations has been estimated to increase from 35% of total variations in the 130 nm CMOS process to 60% in the 70 nm process, and variation in wire width, height, and thickness is also expected to increase from 25% to 35% [7]. In the 65 nm CMOS process, the parameters that affect timing the most are device length, threshold voltage, device width, mobility, and oxide thickness [8]. For circuits that are sensitive to process variations, such as SRAM arrays and dynamic logic circuits, these variations may result in functional failure and yield loss [7].
Addressing the challenges of timing optimization and delay uncertainty from process variations, this paper presents a timing optimization algorithm for dynamic circuits and a timing optimization flow for mixed-static-dynamic CMOS logic. The proposed algorithm and flow are validated through implementation on several benchmark circuits and a 64-b binary adder in a 130 nm CMOS process. This paper is an extension of our previous work [9] and is organized as follows. Section 2 presents previous work on transistor sizing (width) optimization for timing and process variation minimization. Section 3 presents the proposed transistor sizing optimization for CMOS dynamic logic. Section 4 presents the implementation and results for several ISCAS benchmark circuits. Based on this timing optimization algorithm for dynamic circuits, Section 5 presents a timing optimization flow for mixed-static-dynamic CMOS logic, along with its implementation and results for several ISCAS benchmark circuits. Finally, the conclusion is presented in Section 6.
2. Previous Work
Methods of automating transistor sizing for timing optimization were proposed in [3, 10–16], but many of them focus on static CMOS circuits and technologies using multiple threshold voltages. TImed LOgic Synthesizer (TILOS) [10] presented an algorithm that iteratively sizes transistors in the critical path by a certain factor. The algorithm is not a deterministic approach, as it does not guarantee convergence in timing optimization. MINFLOTRANSIT [11] is another transistor sizing algorithm, based on an iterative relaxation method, but it requires iterative generation of directed acyclic graphs for every step of timing optimization. Computation of "Logical Effort" is another method proposed for timing optimization [16]. However, it has two limitations. First, it requires an estimate of input capacitance, which is difficult to obtain accurately for circuits with complex branches or multiple paths. Second, it optimizes timing at the cost of increased area [17].
Methods to mitigate the effect of process variations in CMOS circuits were proposed in [6, 7, 18–23]. These methods deal with statistical variations and are not optimal for designs with a large number of parameter variations [24]. A technique called Adaptive Body Biasing (ABB) was presented in [6, 22] to provide variation tolerance. The ABB technique is implemented post-silicon, where each die receives a unique bias voltage, reducing the variance of the frequency distribution. However, this method does not minimize intra-die variations, as each block in the design would require a unique bias voltage. Another limitation is the increased leakage power caused by the reduction of threshold voltage. Programmable keepers were proposed in [23] to compensate for process variations. This method works for designs with a large number of parallel stacks (similar to NOR gates). However, it requires additional hardware to program the keeper transistors for other designs.
Research has shown that intra-die variations primarily impact the mean delay, and inter-die variations impact the variance of delay [18]. As timing optimization should consider both inter-die and intra-die variations, both mean and variance should be accounted for. In addition to optimizing path delay, other parameters affected by process variations that need to be considered and reduced are delay uncertainty ($\Delta D = D_{\max} - D_{\min}$) and sensitivity ($\sigma/\mu$), where $D_{\max}$ and $D_{\min}$ are the maximum and minimum delays, $\mu$ is the mean delay, and $\sigma$ is the standard deviation of the delay distribution.
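As a minimal sketch of how these quantities could be extracted from simulated delay samples, the following Python fragment computes the mean, standard deviation, uncertainty, and sensitivity of one path. It assumes that uncertainty is the difference between the maximum and minimum delays and that sensitivity is the ratio of standard deviation to mean; the function name `delay_metrics` and the sample values are hypothetical.

```python
import statistics

def delay_metrics(delays_ps):
    """Summarize sampled delays of one timing path (e.g., from Monte Carlo runs)."""
    mu = statistics.mean(delays_ps)
    sigma = statistics.pstdev(delays_ps)  # population standard deviation
    return {
        "mean": mu,
        "std": sigma,
        # Assumed definition: maximum delay minus minimum delay.
        "uncertainty": max(delays_ps) - min(delays_ps),
        # Assumed definition: standard deviation relative to the mean.
        "sensitivity": sigma / mu,
    }

# Hypothetical delay samples in psec, for illustration only.
samples = [98.0, 100.0, 102.0, 100.0]
metrics = delay_metrics(samples)
print(metrics)
```

Both derived quantities shrink when the distribution narrows, which is the effect the optimization in the following sections aims for.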
3. Transistor Sizing Optimization of Dynamic Circuits
The delay of a dynamic circuit is highly dependent on the number and size (width) of transistors in the critical path. Increasing the width of transistors in a path increases the discharging current and reduces the delay of the output pull-down path. However, increasing the width of transistors to reduce one path delay may increase the capacitive load of channel-connected transistors on other paths and substantially increase their delays. This complexity grows with the number of paths present in the circuit. A 2-b Weighted Binary-to-Thermometric Converter (WBTC), used in high-performance binary adders [25] and shown in Figure 1, is used as an example to explain the complexity of path delay optimization while considering process variations.
Figure 1 highlights two timing paths: pathA (T_{28}-T_{7}-T_{8}-T_{12}-T_{18}-T_{32}) and pathB (T_{28}-T_{0}-T_{4}-T_{11}-T_{15}-T_{16}-T_{31}). A test was performed to optimize pathA by gradually increasing the widths of T_{7}, T_{8}, T_{12}, and T_{18}. The delay of pathA was reduced by 4%, but the delay of pathB increased by 9.3%. This is a result of transistors on pathB being channel-connected to transistors on pathA. For instance, T_{4} and T_{11} are channel-connected to T_{7} and T_{8}, and T_{15} and T_{16} are channel-connected to T_{12} and T_{18}. Increasing the widths of T_{7}, T_{8}, T_{12}, and T_{18} in pathA increases the capacitive load of T_{4}, T_{11}, T_{15}, and T_{16} and therefore increases the delay of pathB. This example illustrates that increasing the widths of transistors on one critical path increases the capacitive load and delay of other critical paths.
Conventionally, a path delay is denoted by the mean of its delay distribution, which accounts only for intra-die variations. As inter-die variations are equally important, the standard deviation should be considered as well. Consider the delay distribution of two paths of the WBTC shown in Figure 2. PathB has a high mean delay, while pathA has a high standard deviation. Typically, pathB would be chosen as the critical path for timing optimization, as it has the highest mean delay $\mu$. Optimizing the design by increasing the width of transistors on pathB may reduce the mean delay but may not reduce the standard deviation. However, by considering both the mean and the standard deviation (i.e., $\mu + \sigma$), pathA would be chosen as the critical path to be optimized. As both inter-die and intra-die variations are equally important, the proposed timing optimization algorithm ranks critical paths based on the sum of the path mean delay and its standard deviation, $\mu + \sigma$. The Load Balance of Multiple Paths (LBMP) timing optimization algorithm proposed for transistor sizing of dynamic circuits while considering process variations is presented in Figure 3.
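The difference between the two ranking criteria can be illustrated with a short Python sketch. The delay statistics below are hypothetical values chosen only to mirror the situation described for Figure 2 (pathB with the higher mean, pathA with the higher standard deviation); `rank_paths` is an illustrative helper, not part of the published algorithm.

```python
def rank_paths(stats):
    """Sort path names by mean + standard deviation, most critical first."""
    return sorted(stats, key=lambda p: stats[p][0] + stats[p][1], reverse=True)

# Hypothetical (mean, standard deviation) pairs in psec.
stats = {
    "pathA": (300.0, 40.0),  # lower mean, wide distribution
    "pathB": (320.0, 10.0),  # higher mean, narrow distribution
}

critical_by_mean = max(stats, key=lambda p: stats[p][0])
ranked = rank_paths(stats)

print(critical_by_mean)  # pathB
print(ranked)            # ['pathA', 'pathB']
```

Ranking by the mean alone selects pathB, whereas ranking by the sum of mean and standard deviation selects pathA (340 psec against 330 psec for these example values).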
Consider the circuit with a series of channel-connected NMOS transistors in Figure 4. The transistor nearest the output conducts only the discharge current of the load capacitance, whereas the transistor nearest Gnd conducts the discharge current of the total capacitive load (the load capacitance plus the internal node capacitances of the stack), which becomes substantially larger as the number of series transistors increases. The discharge time of the transistor near Gnd is therefore longer than that of the transistor near the output. To account for discharge times that increase from the output towards Gnd, the transistor sizes are made progressively larger, starting from a minimum-size transistor at the output end, to reduce the total discharge time of the pull-down path (from out to Gnd). Each successive transistor towards Gnd is scaled up in width by a factor. For this reason, the proposed LBMP algorithm assigns each transistor a weight (used for transistor sizing) in the range 0.05–0.5 according to its distance from the output.
For instance, the 2-b WBTC in Figure 1 comprises seven transistor stacks, ordered by their distance from the output. Stack1, closest to the output, includes transistors T_{3}, T_{10}, T_{16}, T_{21}, T_{25}, and T_{27}. Stack2 includes transistors T_{6}, T_{13}, T_{18}, T_{23}, and T_{26}. Stack3 includes transistors T_{2}, T_{9}, T_{15}, T_{20}, and T_{24}. Stack4 includes transistors T_{5}, T_{12}, T_{17}, and T_{22}. Stack5 includes transistors T_{1}, T_{8}, T_{14}, and T_{19}. Stack6 includes transistors T_{4} and T_{11}. Stack7, farthest from the output, includes transistors T_{0} and T_{7}. Accordingly, transistors in stacks 1–7 are assigned weights of 0.05, 0.1, 0.15, 0.2, 0.3, 0.4, and 0.5, respectively. For designs with a different number of stacks, the weights are distributed evenly over the range 0.05–0.5 according to distance from the output: a weight of 0.05 is assigned to the stack closest to the output, and a weight of 0.5 is assigned to the stack farthest from the output.
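The weight assignment for an arbitrary number of stacks can be sketched as follows. The text does not give a formula, so the linear spacing in `stack_weights` is an assumption; note that for seven stacks it does not reproduce the quoted WBTC weights exactly, which are kept separately as `WBTC_WEIGHTS`.

```python
# Weights quoted in the text for the seven stacks of the 2-b WBTC
# (stack 1 is closest to the output, stack 7 is farthest).
WBTC_WEIGHTS = [0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5]

def stack_weights(n_stacks, lo=0.05, hi=0.5):
    """Assumed linear spacing from `lo` (nearest the output) to `hi` (farthest)."""
    if n_stacks < 1:
        raise ValueError("need at least one stack")
    if n_stacks == 1:
        return [lo]
    step = (hi - lo) / (n_stacks - 1)
    return [round(lo + i * step, 4) for i in range(n_stacks)]

four = stack_weights(4)
seven = stack_weights(7)
print(four)   # [0.05, 0.2, 0.35, 0.5]
print(seven)  # linear spacing; differs from WBTC_WEIGHTS in the middle stacks
```

Either scheme preserves the property that matters for sizing: the weight grows monotonically with distance from the output, with end points of 0.05 and 0.5.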
As increasing the width of the transistor that appears in the largest number of paths reduces the overall delay, the number of paths in which each transistor is present is computed and denoted by "repeats". The initial step in the LBMP algorithm is to size the transistors on every path by a fixed ratio, for example, 1.1, for faster optimization convergence [26]. After the repeats and the weights of all transistors are computed, simulations are performed to obtain the delay distribution of each path. The transistors on the top 20% critical paths are grouped into setx, and their widths are increased as calculated by (1).
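The bookkeeping in this step can be sketched in Python without reproducing (1), which is not restated here. The sketch counts repeats from a path list and selects the top 20% of paths by score; the helper names are hypothetical, the two paths are pathA and pathB from Figure 1, and rounding the 20% cut down is an assumption (it yields six paths out of 34, matching the six critical paths listed for the 2-b WBTC in Section 4).

```python
import math
from collections import Counter

def count_repeats(paths):
    """Number of timing paths in which each transistor appears."""
    counts = Counter()
    for transistors in paths.values():
        counts.update(set(transistors))
    return counts

def top_critical_paths(scores, fraction=0.2):
    """Names of the top `fraction` of paths, ranked by score (mean + std)."""
    k = max(1, math.floor(len(scores) * fraction))  # assumed: round down
    return sorted(scores, key=scores.get, reverse=True)[:k]

paths = {
    "pathA": ["T28", "T7", "T8", "T12", "T18", "T32"],
    "pathB": ["T28", "T0", "T4", "T11", "T15", "T16", "T31"],
}
repeats = count_repeats(paths)
print(repeats["T28"], repeats["T7"])  # 2 1

# Hypothetical scores for 34 paths, for illustration only.
scores = {f"path{i}": float(i) for i in range(1, 35)}
critical = top_critical_paths(scores)
print(len(critical))  # 6
```

The union of the transistors on the selected paths then forms setx.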
As the delay of a critical path depends on the capacitive load of channel-connected transistors, reducing this capacitive load reduces the overall delay. The first-order connection transistors of setx are identified and grouped into sety. Then, transistors in setx are excluded from sety to form setz. For each transistor in setz, it is checked whether the transistor was present in setx of the previous iteration. If so, its width is decreased as calculated by (2) and (3). If not, its width is decreased as calculated by (4). Once new transistor widths are determined, simulations are performed to locate the new critical paths. This procedure is repeated until it converges to an optimal solution.
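The set construction described above can be sketched as follows. This fragment does not implement (2)–(4); it only partitions setz into the transistors that would be handled by (2) and (3) and those that would be handled by (4). The connectivity map uses the channel connections stated for Figure 1, with T_{7}–T_{8} added as an assumed series connection, and the previous-iteration setx is hypothetical.

```python
def partition_sets(set_x, neighbors, prev_set_x):
    """Build sety and setz, then split setz by membership in the previous setx."""
    set_y = set()
    for t in set_x:
        set_y |= neighbors.get(t, set())  # first-order connection transistors
    set_z = set_y - set_x                 # exclude transistors being upsized
    in_prev = set_z & prev_set_x          # widths decreased per (2) and (3)
    not_in_prev = set_z - prev_set_x      # widths decreased per (4)
    return set_z, in_prev, not_in_prev

# Channel connections taken from the pathA/pathB discussion.
neighbors = {
    "T7": {"T4", "T11", "T8"},   # T8 assumed as the series neighbor on pathA
    "T8": {"T4", "T11", "T7"},
    "T12": {"T15", "T16"},
    "T18": {"T15", "T16"},
}
set_x = {"T7", "T8", "T12", "T18"}
prev_set_x = {"T4", "T15"}        # hypothetical previous iteration

set_z, in_prev, not_in_prev = partition_sets(set_x, neighbors, prev_set_x)
print(sorted(set_z))        # ['T11', 'T15', 'T16', 'T4']
print(sorted(in_prev))      # ['T15', 'T4']
print(sorted(not_in_prev))  # ['T11', 'T16']
```

Distinguishing the two groups lets transistors that were upsized in the previous iteration be treated differently from those that were not, as the text prescribes.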
4. Optimization of Delay, Uncertainty, and Sensitivity from Process Variations
A 2-b Weighted Binary-to-Thermometric Converter (WBTC) used in high-performance binary adders was shown in Figure 1 [25]. This circuit is used as an example to illustrate the complexity of transistor sizing optimization. With fewer than 50 transistors, the 2-b WBTC has 34 timing paths, whose delays change dramatically with transistor sizes.
The timing paths of the 2-b WBTC are shown in Table 1, and transistor repeat and weight profiles are shown in Table 2. Prior to optimization, the worst-case delay of the 2-b WBTC was 355 psec, from path1. The top 20% critical paths are path1, 2, 5, 8, 26, and 29. The widths of all transistors in these critical paths are initially increased by a ratio of 1.1 relative to their initial values. For example, the widths of the transistors in path1 (T_{22}, T_{11}, T_{4}, and T_{0}) are each increased to 1.1 times their initial widths. After initial transistor sizing, process variations are considered in simulations that yield the delay distribution of each path. Then, transistor sizes are updated using (1)–(4), and simulations are performed to obtain a new critical path order. After several iterations of the LBMP timing optimization algorithm, the worst-case delay of the 2-b WBTC converged to 157 psec, a 55.77% improvement.
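The reported improvement follows directly from the worst-case delays before and after optimization:

```latex
\[
\text{improvement}
  = \frac{355\ \text{psec} - 157\ \text{psec}}{355\ \text{psec}}
  = \frac{198}{355}
  \approx 55.77\%
\]
```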


The efficiency of the LBMP algorithm is further illustrated through the reduction in delay uncertainty. Figures 5 and 6 show the normalized delay distribution of the 2-b WBTC before and after optimization, respectively. In the optimized design, the delay uncertainty is reduced and the distribution is significantly narrower. With the major contributors to delay uncertainty being gate length, channel width, capacitance, supply voltage, and threshold voltage [5], timing analysis was performed to categorize the impact of each. Figure 7 shows the reduction in delay uncertainty of the 2-b WBTC from 14% to 8% due to variation in zero-bias junction capacitance.
Kinget, in his work on device mismatch [27], has shown that the variance of the mismatch $\Delta P$ in a device parameter, and hence of the delay distribution, depends on device area, as shown in (5), where $W$ is the transistor width, $L$ is the transistor channel length, and $A_P$ is the proportionality constant (a technology-dependent value). Experimental results for the delay impact of variation in oxide thickness are shown in Figure 8, where increasing device area after transistor sizing reduces the delay uncertainty of the 2-b WBTC from 24% to 15%. Other research [5] shows that a drop in supply voltage degrades cell timing at a quadratic rate; a 5% drop in total rail-to-rail voltage may result in a 15% timing degradation. Figure 9 shows the delay uncertainty of the 2-b WBTC before and after optimization using the LBMP algorithm. A 20% drop in total rail-to-rail voltage (from 1.0 V to 0.8 V) results in a 4% variation in timing (much less than 15%), which further illustrates that a design optimized with the LBMP algorithm is less sensitive to variation in supply voltage:
$$\sigma^2(\Delta P) = \frac{A_P^2}{W L} \tag{5}$$
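The area dependence can be made concrete with a small numerical sketch. It assumes the mismatch variance is inversely proportional to the gate area $W L$ with proportionality constant $A_P^2$ (the usual form of the area law in the mismatch literature); the function name and the unitless values are hypothetical.

```python
import math

def mismatch_sigma(a_p, width, length):
    """Standard deviation of parameter mismatch, assuming sigma^2 = A_P^2 / (W * L)."""
    return a_p / math.sqrt(width * length)

# Unitless example values, for illustration only.
base = mismatch_sigma(a_p=1.0, width=1.0, length=1.0)
wide = mismatch_sigma(a_p=1.0, width=4.0, length=1.0)

print(base)         # 1.0
print(wide)         # 0.5
print(wide / base)  # 0.5
```

Under this assumption, quadrupling the width at a fixed channel length halves the standard deviation, which is consistent with the observation that upsizing transistors narrows the delay distribution.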
Another benchmark used to validate the algorithm is a 4-b Unity Weight BTC (UWBTC), used in high-performance digital-to-analog converters, as shown in Figure 10. Along with the increase in the number of transistors, the number of timing paths to be considered also increases to 83. Prior to optimization, the 4-b UWBTC had a worst-case delay of 152 psec. Through iterative optimization using the LBMP algorithm, the worst-case delay of the 4-b UWBTC was reduced to 103 psec, an improvement of 33%. Furthermore, the LBMP algorithm was implemented on several ISCAS benchmark circuits, for which the ratio of the number of critical paths to the number of transistors is shown in Table 3. Through implementation and verification in a 130 nm CMOS process, the LBMP algorithm has shown, in Table 4, an average delay reduction of 47.8%, an uncertainty reduction of 48%, a power increase of 13%, and an area increase of 39.8%. The delay convergence profiles of these circuits are shown in Figure 11.


As delay can in general be reduced by increasing power consumption [28], the power-delay product (PDP) is a key evaluation parameter for comparing design performance among different circuit structures. Table 5 shows the PDP of the benchmark circuits before and after optimization. Through optimal sizing of transistor widths, the proposed LBMP timing optimization algorithm has reduced the PDP by an average of 40.17%.
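As a rough consistency check, the PDP change implied by the average delay reduction and power increase quoted above can be computed directly. This is only an approximation: the reported 40.17% is an average over per-circuit PDP values, which need not equal the product of the averages. The helper `pdp_change` is hypothetical.

```python
def pdp_change(delay_reduction, power_increase):
    """Fractional PDP change for a fractional delay reduction and power increase."""
    return (1.0 - delay_reduction) * (1.0 + power_increase) - 1.0

# Average figures quoted in the text: 47.8% delay reduction, 13% power increase.
change = pdp_change(0.478, 0.13)
print(f"{change:.2%}")  # about -41%
```

The result, a reduction of about 41%, is close to the reported average PDP reduction, so the delay gain clearly outweighs the added power.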

Another performance measure associated with timing optimization is delay sensitivity due to process variations. Traditionally, CMOS device switching speed improves at lower temperature due to the increase in mobility. However, Negative Bias Temperature Instability (NBTI) effects may degrade the device switching speed over time through threshold voltage shifts in PMOS transistors [29, 30], even at a lower temperature. The delay sensitivities of several ISCAS benchmark circuits due to process variations at different temperatures are reported in Figure 12. After timing optimization, all circuits show very little difference in delay sensitivity reduction across temperatures. The LBMP timing optimization thus provides consistent delay sensitivity at different temperatures.
5. Timing Optimization of MixedStaticDynamic Circuits
Conventionally, synthesis tools perform design and optimization using static CMOS logic [31, 32]. It is not uncommon for the synthesis tools to fail to find an acceptable solution in terms of timing. This challenge can be addressed by exploiting the fast timing of dynamic logic. Dynamic logic has smaller gate capacitances than its static CMOS counterpart, which accounts for a significant speedup [3, 33]. With static and dynamic logic having their respective advantages of low power and low delay, an optimal balance can be obtained by partitioning the design to use both static and dynamic logic effectively.
At the architecture level, a common limitation in most design optimization flows is the limited accounting for process variations. Typically, if a design fails to meet the timing constraints after placement and route, the optimization flow is repeated. Even after several iterations, the design may still not meet the timing constraints and may miss the time-to-market window. The process variation-aware Path Oriented IN Time (POINT) optimization algorithm proposed in Figure 13 addresses these timing optimization challenges and also accounts for process variations. Utilizing the LBMP algorithm proposed in Section 3, the POINT optimization algorithm partitions the design to use both dynamic and static CMOS logic effectively to meet the timing constraints.
Initially, a high-level description of a design is input to Synopsys Design Compiler (SDC) [31] for synthesis and optimization. The optimized designs from SDC are taken as the initial case for the POINT optimization flow. Following synthesis and optimization, static timing analysis (STA) is performed using Synopsys PrimeTime (SPT) to identify the critical paths. The critical timing modules, identified by their number of occurrences and delay significance on the critical paths, are also reported, and dynamic-logic versions of these modules are designed. Using the LBMP algorithm, iterative transistor sizing optimization for timing is performed on these dynamic circuits.
With the updated design comprising dynamic logic circuits, clock tree design and timing verification are performed. After the design is verified against the clock signal timing constraints, incremental STA is performed to verify timing convergence. The algorithm is repeated iteratively until it converges to an acceptable solution. Following timing convergence, the final mixed-static-dynamic circuit design is exported for placement and route.
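The iterative structure of the flow can be sketched as a simple loop. This is a toy model under stated assumptions, not the POINT implementation: the critical path delay is taken as the sum of the listed module delays, the slowest remaining static module is converted on each pass, and the effect of redesigning a module in dynamic logic with LBMP sizing is supplied by a caller-provided function. The function name `point_flow` and the 0.3 nanosecond target are hypothetical; the module delays are those reported for submodules CC5 and CC9 of C3540 later in this section.

```python
def point_flow(module_delays, target_ns, to_dynamic, max_iters=10):
    """Convert the slowest static module to dynamic logic until timing is met."""
    delays = dict(module_delays)
    converted = []
    for _ in range(max_iters):
        if sum(delays.values()) <= target_ns:  # stand-in for the STA check
            break
        static = [m for m in delays if m not in converted]
        if not static:
            break  # nothing left to convert
        worst = max(static, key=delays.get)
        delays[worst] = to_dynamic(worst, delays[worst])  # dynamic redesign + LBMP
        converted.append(worst)
    return sum(delays.values()), converted

# Static and optimized dynamic delays (nanoseconds) quoted for C3540 submodules.
static_ns = {"CC5": 0.5, "CC9": 0.61}
dynamic_ns = {"CC5": 0.07, "CC9": 0.20}

total, order = point_flow(static_ns, target_ns=0.3,
                          to_dynamic=lambda name, delay: dynamic_ns[name])
print(order)            # ['CC9', 'CC5']
print(round(total, 2))  # 0.27
```

In the actual flow each pass also includes clock tree design and incremental STA on the full design, which this sketch collapses into the single sum-based check.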
The POINT optimization algorithm is verified through implementation on several ISCAS benchmark circuits, including C3540, an 8-b ALU, as shown in Figure 14 [34]. Initial synthesis and optimization were performed using SDC, and static timing analysis was performed using SPT [35]. For the design in hierarchical format (synthesis and optimization were performed at block level, and the design flatten option was disabled), the critical path delay was found to be 3.6 nanoseconds. The critical modules and the critical paths obtained from STA are highlighted in Figure 14. The STA report shows that the ALU CoreM5, with a delay of 1.24 nanoseconds, is the timing-critical module with the largest number of worst-case paths. Figure 15 shows the schematic of UM5_6 from ALU CoreM5 with the critical paths highlighted; the submodules labeled CC5 and CC9 are timing critical, with delays of 0.5 nanoseconds and 0.61 nanoseconds, respectively.
With timing optimization being the primary goal at this stage, submodules CC5 and CC9 in M5/UM5_6 of C3540 are designed in dynamic logic, and timing optimization is performed using the LBMP algorithm. With the dynamic circuits optimized using the LBMP algorithm, the delay of CC5 was reduced from 0.5 nanoseconds to 0.07 nanoseconds, and the delay of CC9 was reduced from 0.61 nanoseconds to 0.20 nanoseconds. After the first iteration of the POINT optimization flow, the critical path delay of C3540 was reduced from 3.6 nanoseconds to 2.8 nanoseconds. Further iterations of the POINT optimization flow reduced the critical path delay to 2.4 nanoseconds, as shown in Figure 16. In addition to reducing the delay by 33%, the delay uncertainty due to process variations is also reduced by 40%, as shown in Figure 17.
Similarly, the process variation-aware POINT optimization was implemented on other benchmark circuits, and the timing optimization results are presented in Table 6, where delay and uncertainty from process variations were reduced by an average of 38% and 35%, respectively, over initial designs optimized with Synopsys Design Compiler.

6. Conclusion
In this paper, a process variation-aware timing optimization of dynamic logic and a timing optimization flow for mixed-static-dynamic CMOS logic have been presented. Solutions addressing further design challenges are presented by considering delay uncertainties from process variations and developing a process variation-aware Path Oriented IN Time (POINT) optimization algorithm for mixed-static-dynamic logic.
Through implementation and verification of several benchmark circuits in a 130 nm CMOS process, the process variation-aware timing optimization algorithm has shown, on average, a delay reduction of 47.8%, an uncertainty reduction of 48%, a power increase of 13%, and a power-delay-product reduction of 40%. Validated through implementation of mixed-static-dynamic logic on a 64-b adder and several ISCAS benchmark circuits, the POINT optimization algorithm has demonstrated an average delay reduction of 38% and an average reduction of 35% in delay uncertainty from process variations.
7. Acknowledgment
The authors wish to thank the anonymous reviewers for insightful comments to improve the quality of presentation.
References
[1] D. H. Allen, S. H. Dhong, H. P. Hofstee et al., "Custom circuit design as a driver of microprocessor performance," IBM Journal of Research and Development, vol. 44, no. 6, pp. 799–822, 2000.
[2] P. E. Gronowski, W. J. Bowhill, R. P. Preston, M. K. Gowan, and R. L. Allmon, "High-performance microprocessor design," IEEE Journal of Solid-State Circuits, vol. 33, no. 5, pp. 676–685, 1998.
[3] M. Zhao and S. S. Sapatnekar, "Timing-driven partitioning and timing optimization of mixed static-domino implementations," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 19, no. 11, pp. 1322–1336, 2000.
[4] L. Zhang, Statistical timing analysis for digital circuit design, Ph.D. dissertation, December 2005.
[5] P. McGuinness, "Variations, margins, and statistics," in Proceedings of the International Symposium on Physical Design, pp. 60–67, Portland, Ore, USA, April 2008.
[6] J. Tschanz, K. Bowman, and V. De, "Variation-tolerant circuits: circuit solutions and techniques," in Proceedings of Design Automation Conference, pp. 762–763, 2005.
[7] P. S. Zuchowski, P. A. Habitz, J. D. Hayes, and J. H. Oppold, "Process and environmental variation impacts on ASIC timing," in Proceedings of IEEE/ACM International Conference on Computer Aided Design (ICCAD '04), pp. 336–342, November 2004.
[8] S. B. Samaan, "The impact of device parameter variations on the frequency and performance of VLSI chips," in Proceedings of IEEE/ACM International Conference on Computer Aided Design (ICCAD '04), pp. 343–346, 2004.
[9] K. Yelamarthi and C.-I. H. Chen, "A path oriented in time optimization flow for mixed-static-dynamic CMOS logic," in Proceedings of the 51st Midwest Symposium on Circuits and Systems, pp. 454–457, Knoxville, Tenn, USA, August 2008.
[10] J. P. Fishburn and A. E. Dunlop, "TILOS: a posynomial programming approach to transistor sizing," in Proceedings of IEEE International Conference on Computer Aided Design (ICCAD '85), pp. 326–328, Santa Clara, Calif, USA, 1985.
[11] V. Sundararajan, S. S. Sapatnekar, and K. K. Parhi, "Fast and exact transistor sizing based on iterative relaxation," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 21, no. 5, pp. 568–581, 2002.
[12] M. Borah, R. M. Owens, and M. J. Irwin, "Transistor sizing for minimizing power consumption of CMOS circuits under delay constraint," in Proceedings of the International Symposium on Low Power Design, pp. 167–172, Dana Point, Calif, USA, April 1995.
[13] S.-O. Jung, K.-W. Kim, and S.-M. Kang, "Transistor sizing for reliable domino logic design in dual threshold voltage technologies," in Proceedings of the 11th Great Lakes Symposium on VLSI (GLSVLSI '01), pp. 133–138, West Lafayette, Ind, USA, March 2001.
[14] Z. Luo, "General transistor-level methodology on VLSI low-power design," in Proceedings of the ACM Great Lakes Symposium on VLSI (GLSVLSI '06), pp. 115–118, Philadelphia, Pa, USA, April 2006.
[15] A. R. Conn, I. M. Elfadel, W. W. Molzen Jr. et al., "Gradient-based optimization of custom circuits using a static-timing formulation," in Proceedings of Design Automation Conference, pp. 452–459, June 1999.
[16] I. Sutherland, B. Sproull, and D. Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan Kaufmann, San Francisco, Calif, USA, 1999.
[17] N. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, Addison Wesley, Boston, Mass, USA, 3rd edition, 2004.
[18] K. A. Bowman, S. G. Duvall, and J. D. Meindl, "Impact of die-to-die and within-die parameter fluctuations on the maximum clock frequency distribution for gigascale integration," IEEE Journal of Solid-State Circuits, vol. 37, no. 2, pp. 183–190, 2002.
[19] M. Orshansky, Increasing Circuit Performance through Statistical Design Techniques, Closing the Gap between ASIC & Custom, Kluwer Academic Publishers, Dordrecht, The Netherlands, 2003.
[20] D. Burnett, K. Erington, C. Subramanian, and K. Baker, "Implications of fundamental threshold voltage variations for high-density SRAM and logic circuits," in Proceedings of the Symposium on VLSI Technology, pp. 15–16, Honolulu, Hawaii, USA, June 1994.
[21] K. Takeuchi, T. Tatsumi, and A. Furukawa, "Channel engineering for the reduction of random-dopant-placement-induced threshold voltage fluctuation," in Proceedings of the IEEE Electron Devices Meeting (IEDM '97), pp. 841–844, Washington, DC, USA, December 1997.
[22] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De, "Parameter variations and impact on circuits and microarchitecture," in Proceedings of Design Automation Conference, pp. 338–342, 2003.
[23] C. H. Kim, K. Roy, S. Hsu, R. Krishnamurthy, and S. Borkar, "A process variation compensating technique with an on-die leakage current sensor for nanometer scale dynamic circuits," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 6, pp. 646–649, 2006.
[24] L. Scheffer, "The Count of Monte Carlo," in Proceedings of the ACM/IEEE International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems (TAU '04), February 2004.
[25] F. Maloberti and C. Gang, "Performing arithmetic functions with the Chinese abacus approach," IEEE Transactions on Circuits and Systems II, vol. 46, no. 12, pp. 1512–1515, 1999.
[26] B. Fu, Q. Yu, and P. Ampadu, "Energy-delay minimization in nanoscale domino logic," in Proceedings of the 16th ACM Great Lakes Symposium on VLSI (GLSVLSI '06), pp. 316–319, Philadelphia, Pa, USA, April 2006.
[27] P. R. Kinget, "Device mismatch and tradeoffs in the design of analog circuits," IEEE Journal of Solid-State Circuits, vol. 40, no. 6, pp. 1212–1224, 2005.
[28] W. Wolf, Modern VLSI Design: IP-Based Design, Prentice Hall, Upper Saddle River, NJ, USA, 4th edition, 2008.
[29] B. Lasbouygues, R. Wilson, N. Azemard, and P. Maurine, "Timing analysis in presence of supply voltage and temperature variations," in Proceedings of the International Symposium on Physical Design, pp. 10–16, 2006.
[30] W. Wang, S. Yang, S. Bhardwaj et al., "The impact of NBTI on the performance of combinational and sequential circuits," in Proceedings of the 44th Annual Design Automation Conference, pp. 364–369, 2007.
[31] Synopsys Design Compiler, http://www.synopsys.com/.
[32] Cadence Encounter, http://www.cadence.com/.
[33] R. Puri, "Design issues in mixed static-dynamic circuit implementation," in Proceedings of International Conference on Computer Design, pp. 270–275, 1998.
[34] M. C. Hansen, H. Yalcin, and J. P. Hayes, "Unveiling the ISCAS-85 benchmarks: a case study in reverse engineering," IEEE Design and Test of Computers, vol. 16, no. 3, pp. 72–80, 1999.
[35] Synopsys PrimeTime, http://www.synopsys.com/.
Copyright
Copyright © 2010 Kumar Yelamarthi and Chien-In Henry Chen. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.