Fast and Near-Optimal Timing-Driven Cell Sizing under Cell Area and Leakage Power Constraints Using a Simplified Discrete Network Flow Algorithm

Ren, Huan; Dutt, Shantanu

doi:https://doi.org/10.1155/2013/474601

VLSI Design


Special Issue: New Algorithmic Techniques for Complex EDA Problems


Research Article | Open Access

Volume 2013 | Article ID 474601 | https://doi.org/10.1155/2013/474601


Huan Ren¹ and Shantanu Dutt¹

Academic Editor: Gi-Joon Nam

Received 24 May 2012

Revised 06 Nov 2012

Accepted 21 Nov 2012

Published 07 May 2013

Abstract

We propose a timing-driven discrete cell-sizing algorithm that can address total cell size and/or leakage power constraints. We model cell sizing as a “discretized” mincost network flow problem, wherein the available sizes of each cell are modeled as nodes. Flow passing through a node indicates the choice of the corresponding cell size, and the total flow cost reflects the timing objective function value corresponding to these choices. Compared to other discrete optimization methods for cell sizing, our method obtains near-optimal solutions in a time-efficient manner. We tested our algorithm on the ISCAS’85 benchmarks and compared our results to those produced by an optimal dynamic programming- (DP-) based method. The results show that, compared to the optimal method, the improvements to an initial sizing solution obtained by our method are only 1% (3%) worse when using a 180 nm (90 nm) library, while our method is 40–60 times faster. We also obtained results for the ISPD’12 cell-sizing benchmarks under a leakage power constraint and compared them to those of a state-of-the-art approximate DP method (optimal DP runs out of memory even for the smallest of these circuits). Our results are only 0.9% worse than those of the approximate DP method, while our method is more than twice as fast.

1. Introduction

In order to achieve a balance between design quality and time-to-market, cell library-based design is becoming the dominant design methodology over the custom design method even for high-performance ICs. Usually in a cell library, several different cell implementations are available for the same function with different sizes, intrinsic delays, driving resistances, and input capacitances. Choosing the cell with an appropriate size, that is, cell sizing, is a very effective approach to improve timing.

The cell-sizing problem has been studied for a long time. Many methods [1, 2] assume the availability of a continuous range of cell sizes; that is, the size of a cell can take any value in a range. The obtained gate sizes are then rounded to the nearest available sizes in the library. However, many realistic cell libraries are “sparse,” for example, geometrically spaced instead of uniformly spaced [3]. Geometrically spaced gate sizes are desirable because they cover a large size range with a relatively small number of cell instances. It has also been proved in [4] that, under certain conditions, the optimal gate sizes form a geometric progression. With a sparse library, simple rounding can significantly degrade the continuous solution, which often causes the sizing results to fail to meet the given timing requirements [3].

On the other hand, few time-efficient methods are known that can directly handle timing optimization with discrete cell sizes, since this problem is NP-complete. The technique in [5] uses multidimensional descent optimization, which iteratively improves the current solution by changing the size choices of the set of cells that produces the largest improvement. It is not clear how well this method avoids being trapped in a local optimum. In [3], a new rounding method is developed based on an initial continuous solution. Instead of only rounding to the nearest available size, the method visits the cells of the circuit in topological order and tries several discrete sizes around the continuous solution for each cell. To reduce the search space, after a new cell is visited, it performs a pruning step that discards obviously inferior solutions and a merging step that keeps only several representative solutions within a certain quality region. The run time of this method is significantly larger than that of the method in [5].

More recently, [6] proposed a method that combines continuous and discrete optimization techniques for the power-driven, timing-constrained cell-sizing problem. The problem is first simplified to an unconstrained problem using Lagrangian relaxation. The resulting unconstrained discrete sizing problem is then solved using dynamic programming. Through this Lagrangian-relaxation-based simplification, the method can handle the complex delay constraints of current industrial designs. However, since Lagrangian relaxation is a continuous convex optimization technique, its quality and convergence are not guaranteed when it is applied to discrete problems. In contrast, our proposed method handles constraint satisfaction directly and simultaneously with objective optimization in a unified, specially designed mincost network flow model.

In this paper, we propose a network-flow-based method for the discrete gate-sizing problem. In our method, the different size options of a cell are modeled by nodes in the network graph. The flow cost represents the change in the timing objective function value when the cell sizes corresponding to the nodes that carry flow are chosen. Hence, by solving a mincost flow problem on the network graph, a near-optimal cell sizing can be determined by choosing the cell sizes whose corresponding nodes carry the mincost flow. The result is near-optimal rather than optimal because the flow must be constrained to satisfy certain discrete requirements, such as passing through exactly one size-option node per cell.

By modeling the gate-sizing problem as a mincost network flow problem, we can solve it using standard network flow algorithms, which are very time efficient. Also problem constraints, like the maximum allowable total cell area, can be handled efficiently by making the flow amount proportional to the chosen cell area and using an arc with an appropriate capacity to limit the total flow amount.

However, network flow is a continuous optimization method. Thus, when it is applied to the discrete option-selection problem, invalid solutions may be produced. For example, the mincost flow may pass through two sizing options of the same cell. We resolve such cases using min-lookahead-cost or max-flow selection heuristics. Network flow has been used to solve various EDA problems, including placement [7–9] and placement legalization [10]. In the recent technique of [10], only the network graph modeling the legalization problem and the classification of its nodes (representing bins of cells) into overfull bins (supply nodes) and underfull bins (demand nodes) are determined a priori. The costs of arcs between adjacent bins, which represent shipments of cells, are not known in advance, since they depend on the exact set of cells shipped from one bin to an adjacent one. This set of cells and the costs of arcs out of the “current” bin are determined dynamically within Dijkstra’s shortest path computation, which is used iteratively to solve an approximate mincost flow problem. Our use of network flow for cell sizing faces a different issue: while we know the exact costs and capacities of the arcs in the network graph modeling the problem, we must prevent the flow from splitting among the arcs of each subset that represents the size selection of a cell, so that flow goes through only one of these arcs, thus providing a unique size selection for each cell. We solve this problem in an outer loop enveloping the mincost flow computation, as opposed to the technique in [10], which solves its problem within an inner loop of the mincost flow computation.

This paper is an extended version of the workshop paper [11], which appeared in an internal workshop compendium of papers that is not considered a published proceedings. Thus, it is not strictly necessary to discuss the extensions over that paper. Nevertheless, the extensions include an updated discussion of previous work, results for benchmark circuits using Synopsys’s 90 nm library (in addition to the 180 nm library used in [11]), and an analysis of the differences between the results for the two libraries.

The rest of the paper is organized as follows. Section 3 provides an overview of our method. A general view of our size selection network graph (SSG) is presented in Section 4. In Sections 5.1–5.4, we discuss various issues of the SSG. In Section 6, we show how to obtain a valid mincost flow in the SSG. Section 7 briefly describes an optimal exhaustive search method to which we compare our network-flow-based technique. Section 8 presents experimental results and we conclude in Section 9.

2. New Algorithmic Approach Used

In this paper, we use a simplified version of discretized network flow (DNF), which has been introduced over the last four years as a versatile optimization technique for CAD problems [7, 9, 12–14]. The DNF technique imposes certain discrete requirements on the flow through the network graph, such as a mutual exclusivity constraint on certain arc sets, called mutually exclusive arc (MEA) sets: in each MEA set, at most one arc can carry a nonzero flow. In the works cited above, DNF problems are modeled as fixed-charge network flow problems [15] in order to solve complex physical synthesis problems involving multiple transforms and constraints with significant efficacy. However, the discrete cell-sizing problem can be solved near-optimally with a simple version of DNF in which max-flow or min-lookahead-cost (subsequently termed “mincost” for brevity) heuristics determine which arc in each MEA set should have a nonzero flow. We present this simpler version of DNF as the new algorithmic technique in this paper.
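The MEA requirement can be phrased as a simple check on a flow assignment. The Python sketch below assumes flows are stored in a dict keyed by arc identifiers and MEA sets are given as lists of arc identifiers; the function name and data layout are illustrative, not taken from the paper.

```python
from typing import Dict, Hashable, Iterable, List

def violated_mea_sets(flow: Dict[Hashable, float],
                      mea_sets: Iterable[List[Hashable]],
                      eps: float = 1e-9) -> List[List[Hashable]]:
    """Return, for each violated MEA set, the arcs that carry nonzero flow.

    flow maps an arc identifier to its flow amount (missing arcs carry 0).
    An MEA set is satisfied when at most one of its arcs has flow > eps.
    """
    violated = []
    for arcs in mea_sets:
        nonzero = [a for a in arcs if flow.get(a, 0.0) > eps]
        if len(nonzero) > 1:
            violated.append(nonzero)
    return violated

# First MEA set has its flow split across two arcs; the second is satisfied.
flow = {("d1", "s1"): 2.0, ("d1", "s2"): 1.0, ("d2", "s1"): 3.0}
mea = [[("d1", "s1"), ("d1", "s2")], [("d2", "s1"), ("d2", "s2")]]
print(violated_mea_sets(flow, mea))  # -> [[('d1', 's1'), ('d1', 's2')]]
```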

3. Overview of Our Method

Our cell-sizing method starts from an initial sizing solution that may be far from optimal. The objective is to improve the critical path delay of the circuit by resizing cells. We define $\mathcal{P}_c$ to be the set of paths with a delay greater than a fraction $\alpha$ of the most critical path delay. To reduce the complexity of the problem, we only consider changing the sizes of cells on paths in $\mathcal{P}_c$. In our experiments, $\alpha$ is set to 0.1. This simplification does not limit the optimization potential of our method, since the method is incremental in nature: we can iterate it several times to take more paths into consideration.

We define $S_c(n)$ to be the set of critical sinks of a net $n$: all sinks of $n$ on paths in $\mathcal{P}_c$ if $n$ lies on a path in $\mathcal{P}_c$, or otherwise the single sink with the minimum slack. Our method tries to improve the critical path delay by minimizing the objective function proposed in [9]. The timing cost of a net $n$ is defined as [9]

$$TC(n) = \sum_{c_j \in S_c(n)} \frac{d_n(c_j)}{s_a(n)^{\gamma}}, \quad (1)$$

where $c_j$ is a sink cell of net $n$, $d_n(c_j)$ is the net delay of $n$ to $c_j$, and $s_a(n)$ is the allocated slack of $n$ in the initial sizing solution; the allocated slack is defined as the path slack divided by the number of nets in the path. The exponent $\gamma$ of the allocated slack adjusts the weight difference between the costs of nets on critical and noncritical paths. Based on experimental results, we use the value of $\gamma$ that works best in this scenario, since only nets in $\mathcal{P}_c$ and those connected to it are considered in the optimization function (see below). Let $N_a$ be the set of nets that are either in $\mathcal{P}_c$ or connected to cells in it. The timing objective function $F$ is the summation of $TC(n)$ over all nets in $N_a$:

$$F = \sum_{n \in N_a} TC(n). \quad (2)$$

Here we only consider nets in $N_a$, since only the delays of these nets change with our selection of resizable cells. Note that in this objective function, nets on more critical paths contribute more, since the net cost is inversely proportional to (a power of) the allocated slack, and they will therefore be optimized more. This is a desirable outcome, especially when there is a quota on the resource (e.g., total area) available for optimization.
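To make (1) and (2) concrete, here is a small Python sketch that computes allocated slack, per-net timing cost, and the objective $F$. The dict layout and the value $\gamma = 1$ in the example are illustrative assumptions (the paper's chosen $\gamma$ is not reproduced here), and the sketch simply rejects non-positive allocated slack, a case the text does not discuss.

```python
from typing import Dict, List

def allocated_slack(path_slack: float, nets_on_path: int) -> float:
    """Allocated slack: path slack divided by the number of nets on the path."""
    return path_slack / nets_on_path

def net_timing_cost(sink_delays: Dict[str, float], critical_sinks: List[str],
                    s_alloc: float, gamma: float) -> float:
    """TC(n) = sum over critical sinks c_j of d_n(c_j) / s_a(n)**gamma (Eq. (1))."""
    if s_alloc <= 0:
        raise ValueError("this sketch assumes a positive allocated slack")
    weight = 1.0 / s_alloc ** gamma
    return weight * sum(sink_delays[c] for c in critical_sinks)

def objective(nets: Dict[str, dict], gamma: float) -> float:
    """F = sum of TC(n) over the nets in N_a (here: every entry of `nets`), Eq. (2)."""
    return sum(net_timing_cost(n["delays"], n["critical"], n["s_alloc"], gamma)
               for n in nets.values())

# Illustrative data only; gamma = 1.0 is an arbitrary choice.
nets = {
    "n1": {"delays": {"a": 2.0, "b": 1.0}, "critical": ["a", "b"],
           "s_alloc": allocated_slack(8.0, 4)},          # s_a = 2.0
    "n2": {"delays": {"c": 3.0}, "critical": ["c"], "s_alloc": 1.0},
}
F = objective(nets, gamma=1.0)   # 0.5 * 3.0 + 1.0 * 3.0 = 4.5
```

The net with the smaller allocated slack (`n2`) contributes twice as much per unit of delay, which is the weighting effect described above.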

Usually, the total area is given as a constraint for timing-driven cell sizing. Our algorithm can also handle this constrained optimization problem.

4. Overview of the Size Selection Graph (SSG)

We model the timing-minimization cell-size selection problem as a mincost network flow problem. A network flow graph called the size selection graph (SSG) is constructed, in which the set of sizing options of each cell is modeled as a set of sizing option nodes, each corresponding to one sizing option. We use flows in the SSG to model cell-size selection; that is, a size option of a cell is chosen by a flow in the SSG if its corresponding option node is on the path of the flow. Hence, each flow through the SSG that passes through one option node in every set of sizing options corresponds to a sizing selection for the cells on paths in $\mathcal{P}_c$. Furthermore, if the costs of arcs between option nodes are set so that the total cost incurred by a flow through the SSG equals the change in the timing objective function $F$ of (2) under the sizing scheme selected by the flow, then the mincost flow in the SSG selects the optimal cell-sizing scheme for these cells. Hence, the problem of finding the optimal cell sizing is converted into the problem of finding the mincost flow in a graph, for which several efficient algorithms, such as the network simplex algorithm and the enhanced scaling algorithm [16], are available. The general structure of our SSG is described below.
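To show how a mincost flow picks size options, the sketch below uses the textbook successive-shortest-path algorithm (Bellman-Ford on the residual graph). This is a swapped-in technique chosen for brevity, not the network simplex or enhanced scaling algorithms named above, and the class and method names are hypothetical. In the toy graph, arc costs stand for timing-cost changes, so negative costs are improvements.

```python
from typing import List, Tuple

class MinCostFlow:
    """Successive-shortest-path min-cost flow using Bellman-Ford on the residual graph."""

    def __init__(self, n: int):
        self.n = n
        self.graph: List[List[int]] = [[] for _ in range(n)]
        # edge e = [head, residual capacity, cost]; its reverse edge is e ^ 1
        self.edges: List[list] = []

    def add_arc(self, u: int, v: int, cap: float, cost: float) -> int:
        self.graph[u].append(len(self.edges))
        self.edges.append([v, cap, cost])
        self.graph[v].append(len(self.edges))
        self.edges.append([u, 0.0, -cost])
        return len(self.edges) - 2          # id of the forward arc

    def solve(self, s: int, t: int, demand: float) -> Tuple[float, float]:
        """Send up to `demand` units from s to t; return (flow sent, total cost)."""
        sent, total_cost = 0.0, 0.0
        while sent < demand:
            dist = [float("inf")] * self.n
            prev_edge = [-1] * self.n
            dist[s] = 0.0
            for _ in range(self.n - 1):      # Bellman-Ford handles negative costs
                updated = False
                for u in range(self.n):
                    if dist[u] == float("inf"):
                        continue
                    for e in self.graph[u]:
                        v, cap, cost = self.edges[e]
                        if cap > 1e-12 and dist[u] + cost < dist[v] - 1e-12:
                            dist[v] = dist[u] + cost
                            prev_edge[v] = e
                            updated = True
                if not updated:
                    break
            if dist[t] == float("inf"):
                break                        # no augmenting path remains
            push, v = demand - sent, t       # bottleneck along the shortest path
            while v != s:
                e = prev_edge[v]
                push = min(push, self.edges[e][1])
                v = self.edges[e ^ 1][0]
            v = t
            while v != s:                    # augment along the path
                e = prev_edge[v]
                self.edges[e][1] -= push
                self.edges[e ^ 1][1] += push
                v = self.edges[e ^ 1][0]
            sent += push
            total_cost += push * dist[t]
        return sent, total_cost

    def flow_on(self, arc_id: int) -> float:
        """Flow on a forward arc = residual capacity of its reverse edge."""
        return self.edges[arc_id ^ 1][1]

# 0 = source, 1-2 = driver options, 3-4 = sink-cell options, 5 = sink node
mcf = MinCostFlow(6)
for v in (1, 2):
    mcf.add_arc(0, v, 1.0, 0.0)
costs = {(1, 3): 0.0, (1, 4): -0.5, (2, 3): -0.2, (2, 4): 0.3}
arc_id = {uv: mcf.add_arc(uv[0], uv[1], 1.0, c) for uv, c in costs.items()}
for v in (3, 4):
    mcf.add_arc(v, 5, 1.0, 0.0)
sent, total = mcf.solve(0, 5, 1.0)
```

The unit of flow takes the path through driver option 1 and sink option 4, the pair with the largest timing-cost reduction.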

4.1. The SSG Structure

Since $F$ is the summation of the timing costs of nets, we employ a divide-and-conquer approach in constructing the SSG: first, a mini network flow graph called a net structure is constructed for each critical net (a net on a path in $\mathcal{P}_c$), and then the net structures of connected nets are connected by net spanning structures to form the complete SSG, as shown in Figure 1. Thus, the SSG has a topology similar to that of the paths in $\mathcal{P}_c$. Note that the source node $s$ is the only supply node, with a total flow of $f_s$ (whose determination is discussed in Section 5.4), and the sink $t$ is the only demand node, with a demand of $f_s$. A net structure $NS_j$ is a child net structure of $NS_i$ if they are connected and $NS_j$ follows $NS_i$ in the flow direction in the SSG (i.e., the signal direction in the circuit is from net $n_i$ to net $n_j$); correspondingly, $NS_i$ is a parent net structure of $NS_j$. Each net structure contains the sets of option nodes of the cells in the corresponding net.

Figure 1: Structure of the SSG, panels (a) and (b); panel (b) shows the area-gathering node.

Let us denote the set of sizing options of a cell $c_i$ as $O(c_i)$, and the $k$th sizing option in it as $o_{i,k}$. For a net $n$ with driving cell $c_d$, the net structure is constructed by connecting each sizing option node in $O(c_d)$ to all sizing option nodes in $O(c_k)$ of every sink cell $c_k$, forming a complete bipartite subgraph between $O(c_d)$ and each $O(c_k)$. An example is shown in Figure 2. The net $n$ has one driving cell $c_d$ and two critical sink cells $c_{k_1}$ and $c_{k_2}$; see Figure 2(a). The corresponding net structure is shown in Figure 2(b); only two sizing options are shown in each option set. A flow through the net structure is also shown, which selects one size option for each of $c_d$, $c_{k_1}$, and $c_{k_2}$. Because of the complete bipartite subgraphs between the sizing option sets of the driving cell and the sink cells, every possible combination of size choices of the cells in the net has a corresponding flow through the net structure and hence is considered in our size selection process.
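The net-structure construction can be sketched as enumerating the complete bipartite arc sets $A(c_d, c_k)$. In this Python sketch, option nodes are `(cell, option index)` tuples and the cell names are hypothetical; arc costs and capacities are left out.

```python
from itertools import product
from typing import Dict, List, Tuple

Option = Tuple[str, int]          # (cell name, option index)

def net_structure_arcs(driver: str, sinks: List[str],
                       n_options: Dict[str, int]) -> Dict[str, List[Tuple[Option, Option]]]:
    """Arc sets A(c_d, c_k) of one net structure, keyed by sink cell.

    Every option node of the driver is joined to every option node of each
    sink cell, giving one complete bipartite subgraph per sink.
    """
    arcs = {}
    for sink in sinks:
        arcs[sink] = [((driver, p), (sink, q))
                      for p, q in product(range(n_options[driver]),
                                          range(n_options[sink]))]
    return arcs

# One driver and two critical sinks, two size options per cell (as in Figure 2).
ns = net_structure_arcs("cd", ["ck1", "ck2"], {"cd": 2, "ck1": 2, "ck2": 2})
```

Each of the 2 x 2 x 2 = 8 size combinations corresponds to choosing one driver option and one arc from each arc set that leaves it.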

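Figure 2: (a) A net with one driving cell and two critical sink cells; (b) the corresponding net structure, with a flow that selects one size option per cell.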

Two major issues must be tackled in constructing the SSG so that the mincost flow in the SSG correctly maps to an optimal sizing choice:

(i) In each net structure, the cost of each arc must be determined so that the total cost of a flow through the net structure accurately captures the change in the timing cost of the net under the sizing options chosen by the flow. This issue is described in detail in Section 5.1.

(ii) The flow that is determined must be valid, that is, convertible to a valid sizing option selection. A valid sizing option selection must satisfy two types of consistency. First, the sizing option nodes of a particular cell must be chosen in a mutually exclusive manner (consistency within a sizing option set). Second, if a cell is connected to multiple nets, its sizing options are included in each of the corresponding net structures, and the selected sizing option for the cell must be the same in all of them (consistency across net structures). As described in Section 5.2, the net spanning structure is designed to guide flows between net structures to satisfy the second type of consistency. To guarantee the first type, we propose two heuristic methods that prune flows corresponding to invalid option selections when determining the mincost flow; these are discussed in Section 6.
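Both consistency requirements can be checked directly on a candidate selection. The Python sketch below assumes a hypothetical layout in which `selected[ns][cell]` is the set of option indices of `cell` that the flow passes through inside net structure `ns`; it only reports violations and does not repair them.

```python
from typing import Dict, List, Set

def check_selection(selected: Dict[str, Dict[str, Set[int]]]) -> List[str]:
    """Return human-readable violations of the two consistency requirements."""
    problems = []
    choice: Dict[str, int] = {}          # first unique choice seen for each cell
    for ns, cells in selected.items():
        for cell, opts in cells.items():
            if len(opts) != 1:           # consistency within a sizing option set
                problems.append(f"{cell} in {ns}: {len(opts)} options selected")
                continue
            (opt,) = opts
            if choice.setdefault(cell, opt) != opt:   # consistency across net structures
                problems.append(f"{cell}: option {opt} in {ns} but {choice[cell]} elsewhere")
    return problems

sel = {"NS1": {"a": {0}, "b": {1}},
       "NS2": {"b": {0}, "c": {0, 1}}}
issues = check_selection(sel)
```

Here cell `b` is sized inconsistently across the two net structures, and the flow through `c` is split over two options.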

4.2. Handling Cumulative Constraints

We define a cumulative metric as one that is the sum, over all relevant circuit components (cells in our case), of a function $g(o_i)$ of the chosen option $o_i$ (cell size in our case) of component/cell $c_i$, that is, $M = \sum_i g(o_i)$. A cumulative constraint is then an upper-bound or lower-bound constraint on a cumulative metric. In this paper, we tackle upper-bound cumulative constraints, specifically on total cell area and total leakage power.
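A cumulative metric and its upper-bound constraint reduce to a sum and a comparison. In this Python sketch, `g` takes both the cell and its chosen option so that per-cell-type tables can be used; the area table is a made-up example, not data from any real library. A leakage-power constraint would use a leakage table in place of `AREA`.

```python
from typing import Callable, Dict

def cumulative_metric(choice: Dict[str, str], g: Callable[[str, str], float]) -> float:
    """M = sum over cells c_i of g(c_i, o_i), where o_i is the chosen option of c_i."""
    return sum(g(cell, opt) for cell, opt in choice.items())

def meets_upper_bound(choice: Dict[str, str], g: Callable[[str, str], float],
                      bound: float) -> bool:
    """Upper-bound cumulative constraint: M <= bound."""
    return cumulative_metric(choice, g) <= bound

# Hypothetical per-option area table (arbitrary units).
AREA = {"X1": 1.0, "X2": 2.0, "X4": 4.0}

def area(cell: str, opt: str) -> float:
    return AREA[opt]

choice = {"u1": "X2", "u2": "X4", "u3": "X1"}
total_area = cumulative_metric(choice, area)   # 2 + 4 + 1 = 7.0
```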

To handle a given total cell area constraint, each net structure has an outgoing arc called the area output arc. A shunting structure (Section 5.3) in each net structure connects the area output arc with the option nodes of the net structure. When an option node is chosen, the shunting structure diverts, from the incoming flow through that option node, a flow equal to the chosen size to the area output arc. All these flows are gathered at the area-gathering node shown in Figure 1(b), which is connected to the sink. By setting the capacity of the arc between the area-gathering node and the sink to the given total cell area limit, we ensure that the total selected cell area, which equals the total flow into the area-gathering node, is at most the given limit. Note that, as mentioned before, the sizing options of a cell can be contained in more than one net structure; in this case, the flow equal to its selected area is sent to the area output arc in only one of the net structures that contain the cell.

A leakage power constraint is handled similarly, by having a flow equal to the leakage power of a cell corresponding to its chosen size option go through the shunting structure into a power-gathering node, and having an arc from this node to the sink with capacity equal to the leakage power upper-bound constraint.

The high-level pseudocode of our cell-sizing method FlowSize is given in Algorithm 1.

Algorithm 1: FlowSize
(1) Construct a net structure, as depicted in Figure 2, for each critical net (net on a path in $\mathcal{P}_c$).
(2) Determine the cost of each arc in the net structures so that the change in timing cost is accurately incurred by flows through them (Section 5.1).
(3) Connect the net structures with net spanning structures that maintain the consistency of size selection of common cells across multiple net structures (Section 5.2).
(4) Add the shunting structure and an area output arc (Section 5.3) to each net structure to divert a flow equal to the selected size options (nodes) to the area output arc.
(5) Connect the area output arcs to the area-gathering node, which has only one outgoing arc, with capacity equal to the area constraint, to limit the total selected cell area (= flow amount into the node).
(6) Determine a valid min-cost flow in the resulting SSG by iteratively applying a standard min-cost flow algorithm and the min-cost/max-flow heuristics (Section 6).
(7) Select the size options chosen by the valid min-cost flow as the cell sizes, to obtain a near-optimal critical path delay for the circuit.

5. Further Details of the SSG

In this section, we discuss important details of the SSG, including determining arc costs and capacities, net spanning structures, and further details of the structures for constraint satisfaction.

5.1. Arc Cost Determination

To explain our arc cost formulation, which accurately captures the timing cost change, we assume a lumped RC net delay model, which is widely used in cell sizing [2, 4, 17]. For a net $n$ with driving cell $c_d$, the delay to a sink cell $c_j$ of $n$ is

$$d_n(c_j) = R_d \Big( c_0 L_n + \sum_{c_k \in S(n)} C_{in}(c_k) \Big), \quad (3)$$

where $R_d$ is the driving resistance of $c_d$, $c_0$ is the unit wirelength (WL) capacitance, $L_n$ is the WL of $n$, $S(n)$ is the set of sink cells of $n$, and $C_{in}(c_k)$ is the input capacitance of cell $c_k$. In the pre-placement sizing stage, the WL of a net is usually estimated from the fan-out of the net. In the post-placement resizing stage, it can be estimated using one of several well-known models, for example, the half-perimeter bounding box (HPBB). With this delay model, for a critical net $n$ with critical sinks $S_c(n)$, the timing cost of $n$ is

$$TC(n) = \frac{|S_c(n)|}{s_a(n)^{\gamma}} \Big( R_d c_0 L_n + \sum_{c_k \in S(n)} R_d C_{in}(c_k) \Big). \quad (4)$$

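In the above expression, let us denote the coefficient $|S_c(n)|/s_a(n)^{\gamma}$ as $\beta_n$. The parameters affected by cell-sizing options are $R_d$ and $C_{in}(c_k)$. Hence, if a term in the formula includes $R_d$, its value is determined by the size of $c_d$, and if a term includes $C_{in}(c_k)$, its value is determined by the size of $c_k$. For example, the term $\beta_n R_d C_{in}(c_k)$ is determined by the sizes of both $c_d$ and $c_k$, and its value change when the sizing options $o_{d,p}$ and $o_{k,q}$ are chosen can be written as

$$\Delta T_{dk}(o_{d,p}, o_{k,q}) = \beta_n \big( R_d(o_{d,p})\, C_{in}(o_{k,q}) - R_d(o_d^0)\, C_{in}(o_k^0) \big), \quad (5)$$

where $R_d(o_{d,p})$ is the driving resistance of the driver cell with size option $o_{d,p}$, $C_{in}(o_{k,q})$ is the input capacitance of $c_k$ with size option $o_{k,q}$, and $o_d^0$ and $o_k^0$ are the original sizes of $c_d$ and $c_k$, respectively. The term $\beta_n R_d c_0 L_n$ is determined only by the size of $c_d$, and its value change when option $o_{d,p}$ is selected is

$$\Delta T_d(o_{d,p}) = \beta_n c_0 L_n \big( R_d(o_{d,p}) - R_d(o_d^0) \big). \quad (6)$$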
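Because (4) is linear in each $R_d C_{in}$ product, the total change of $TC(n)$ decomposes exactly into one driver-only term (6) plus one pairwise term (5) per sink. The Python sketch below checks this numerically; the function names and numbers are illustrative only.

```python
from typing import Iterable

def net_delay(r_drive: float, c0: float, wl: float, sink_caps: Iterable[float]) -> float:
    """Lumped-RC net delay (Eq. (3)): R_d * (c0 * L_n + sum of sink input capacitances)."""
    return r_drive * (c0 * wl + sum(sink_caps))

def delta_pair(beta: float, r_new: float, cin_new: float,
               r_old: float, cin_old: float) -> float:
    """Change of beta * R_d * C_in(c_k) when the driver/sink options change (Eq. (5))."""
    return beta * (r_new * cin_new - r_old * cin_old)

def delta_driver(beta: float, c0: float, wl: float, r_new: float, r_old: float) -> float:
    """Change of beta * R_d * c0 * L_n, which depends on the driver size only (Eq. (6))."""
    return beta * c0 * wl * (r_new - r_old)

# Illustrative numbers: driver resistance 2 -> 1, first sink cap 1 -> 2, second unchanged.
beta, c0, wl = 0.5, 0.2, 10.0
old = beta * net_delay(2.0, c0, wl, [1.0, 3.0])          # TC before resizing
new = beta * net_delay(1.0, c0, wl, [2.0, 3.0])          # TC after resizing
decomposed = (delta_driver(beta, c0, wl, 1.0, 2.0)
              + delta_pair(beta, 1.0, 2.0, 2.0, 1.0)
              + delta_pair(beta, 1.0, 3.0, 2.0, 3.0))
```

Here `beta` already includes the $|S_c(n)|$ factor, so multiplying the delay by it gives $TC(n)$ directly.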

We denote the set of arcs between $O(c_d)$ and $O(c_k)$ as $A(c_d, c_k)$, where $c_d$ is the driving cell and $c_k$ is a sink cell. A valid flow passes through only one arc in each such arc set, since only one option in each of $O(c_d)$ and $O(c_k)$ can meaningfully be chosen. Hence, to make the cost of a valid flow equal to the change in the timing cost, the cost of an arc in an arc set is set to the sum of the changes of all terms in the timing cost function that are functions of the cell-size options represented by the arc. Furthermore, if a term in $TC(n)$ is a function of the size of only one cell $c_i$, we arbitrarily choose one arc set among all those connected to $O(c_i)$, and the term's value change determined by each option $o_{i,p}$ is added to all arcs of that arc set incident to option node $o_{i,p}$.

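Thus, the value change of the term $\beta_n R_d C_{in}(c_k)$ is included in the cost of the arc set $A(c_d, c_k)$; specifically, the cost of an arc $(o_{d,p}, o_{k,q})$ in this set includes $\Delta T_{dk}(o_{d,p}, o_{k,q})$. The value change of the term $\beta_n R_d c_0 L_n$ is a function only of the driving cell size and is thus included in the costs of the arcs of only one arc set $A(c_d, c_{k^*})$, where $c_{k^*}$ is an arbitrarily chosen sink cell of $n$; specifically, the cost of an arc $(o_{d,p}, o_{k^*,q})$ in this set includes $\Delta T_d(o_{d,p})$.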

Finally, the size change of a cell in a critical net also affects the timing costs of the noncritical nets connected to the cell. Instead of constructing net structures for these affected noncritical nets as well, we use a much simpler method: the timing cost changes of noncritical nets are included in the net structures of the critical nets they are connected to. Let $c_i$ be a cell of a critical net $n$. There are two cases with respect to $c_i$ and any noncritical net $m$ (a net not on a path in $\mathcal{P}_c$) it may be connected to. If $c_i$ is the driver of $m$, the timing cost change of $m$ is $\beta_m \big( R_d(o_{i,p}) - R_d(o_i^0) \big) C_L(m)$, where $o_{i,p}$ is the chosen option of $c_i$, $o_i^0$ is the initial size of $c_i$, and $C_L(m)$ is the total load capacitance of $m$. Otherwise, if $c_i$ is a sink cell of $m$, the timing cost change of $m$ is $\beta_m R_d(m) \big( C_{in}(o_{i,p}) - C_{in}(o_i^0) \big)$, where $R_d(m)$ is the driving resistance of the driver of $m$. Let $\Delta NC(o_{i,p})$ denote the total timing cost change of all noncritical nets connected to $c_i$. If $c_i$ is the driving cell of the critical net $n$, $\Delta NC(o_{i,p})$ is included in the arcs of the single arc set $A(c_d, c_{k^*})$, where, again, $c_{k^*}$ is the arbitrarily chosen sink cell of $n$. Otherwise, if $c_i$ is a sink cell of $n$, $\Delta NC$ is included in the arc set $A(c_d, c_i)$; specifically, the cost of an arc $(o_{d,p}, o_{i,q})$ includes $\Delta NC(o_{i,q})$.

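To sum up, for an arc $(o_{d,p}, o_{k,q})$ in a net structure, if $c_k = c_{k^*}$ (where $c_{k^*}$ is the chosen sink of $n$ in whose arc set we include the changes of the terms of $TC(n)$ that depend only on the size of the driver $c_d$), its cost is

$$\mathrm{cost}(o_{d,p}, o_{k,q}) = \Delta T_{dk}(o_{d,p}, o_{k,q}) + \Delta T_d(o_{d,p}) + \Delta NC(o_{d,p}) + \Delta NC(o_{k,q}). \quad (7)$$

Otherwise, if $c_k \neq c_{k^*}$, its cost is

$$\mathrm{cost}(o_{d,p}, o_{k,q}) = \Delta T_{dk}(o_{d,p}, o_{k,q}) + \Delta NC(o_{k,q}). \quad (8)$$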
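The cost rules (7) and (8) can be assembled per arc set, as in the Python sketch below. The per-term change functions are passed in as callables with hypothetical names; the check confirms that a valid flow (one driver option, one arc per arc set) collects every term exactly once.

```python
from typing import Callable, Dict, Tuple

def arc_costs(n_drv: int, n_snk: int,
              d_pair: Callable[[int, int], float],
              d_drv: Callable[[int], float],
              nc_drv: Callable[[int], float],
              nc_snk: Callable[[int], float],
              is_designated_sink: bool) -> Dict[Tuple[int, int], float]:
    """Costs of the arcs (o_{d,p}, o_{k,q}) in one arc set A(c_d, c_k).

    Every arc carries the pairwise change and the sink's noncritical-net change
    (Eq. (8)); only the designated set A(c_d, c_k*) also carries the driver-only
    terms (Eq. (7)), so those terms are counted once per valid flow.
    """
    costs = {}
    for p in range(n_drv):
        for q in range(n_snk):
            c = d_pair(p, q) + nc_snk(q)
            if is_designated_sink:
                c += d_drv(p) + nc_drv(p)
            costs[(p, q)] = c
    return costs

# Toy change functions (arbitrary numbers) for a net with two sinks.
def d_pair(p, q): return p + q
def d_drv(p): return 10.0 * p
def nc_drv(p): return 0.5 * p
def nc_snk(q): return 0.1 * q

a1 = arc_costs(2, 2, d_pair, d_drv, nc_drv, nc_snk, is_designated_sink=True)
a2 = arc_costs(2, 2, d_pair, d_drv, nc_drv, nc_snk, is_designated_sink=False)
# Valid flow: driver option 1, first sink option 1, second sink option 0.
flow_cost = a1[(1, 1)] + a2[(1, 0)]
expected = d_pair(1, 1) + d_pair(1, 0) + d_drv(1) + nc_drv(1) + nc_snk(1) + nc_snk(0)
```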

Note that our cost formulation is accurate under the assumption that delay varies linearly with load capacitance. Only under this assumption can we, for a multiple-fanout net, sum the costs of the size changes of the individual fanouts to obtain the total delay change of the net. Unfortunately, in modern libraries with small feature sizes, the delay of a cell usually varies nonlinearly with load capacitance. To handle this nonlinear delay function (of capacitive load) in the ISPD’12 library, where the nonlinearity is very pronounced, we determine, in each iteration of the mincost flow computation (see Section 6), a least-squares linear approximation around the current design point.
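A least-squares linear approximation of delay versus load can be computed in closed form. The Python sketch below assumes the fit uses a few (load, delay) samples near the current load, for example taken from the library's delay table; how the paper chooses these samples is not specified here, so the sampling is an assumption.

```python
from typing import List, Tuple

def linearize(loads: List[float], delays: List[float]) -> Tuple[float, float]:
    """Least-squares fit delay ~ a + b * load; returns (a, b).

    The slope b acts as a local, linearized sensitivity of delay to load.
    """
    n = len(loads)
    if n < 2 or n != len(delays):
        raise ValueError("need at least two (load, delay) samples")
    mx = sum(loads) / n
    my = sum(delays) / n
    sxx = sum((x - mx) ** 2 for x in loads)
    if sxx == 0:
        raise ValueError("loads must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(loads, delays))
    b = sxy / sxx
    return my - b * mx, b

# Samples around a current load of 2.0 (illustrative values).
a, b = linearize([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```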

5.2. Maintaining Consistency across Net Spanning Structures

As we mentioned before, the net spanning structure is designed to maintain consistency across net structures. Note that if multiple nets have a common cell, their net structures must be connected in the SSG by net spanning structures.

We first consider the situation in which a cell $c$ is a sink cell of a net $n_0$ and the driving cell of nets $n_1, \ldots, n_m$ ($m \ge 1$); see Figure 3(a). The size option set $O(c)$ is present in all the net structures corresponding to these connected nets. We connect these net structures by adding arcs from each option node in $O(c)$ of $NS_0$ to the equivalent option node (indicating the same size choice) in $O(c)$ of each $NS_i$ ($1 \le i \le m$), where $NS_i$ is the net structure corresponding to net $n_i$. The resulting spanning structure is shown in Figure 3(b). With this structure, if one size option of $c$ is chosen by the flow through $NS_0$, then this flow also passes through the equivalent option nodes in $NS_1, \ldots, NS_m$. Thus, the required consistency is maintained.
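The spanning structure for this first situation is a set of zero-cost arcs that pair each option of $c$ in $NS_0$ with the same option of $c$ in each $NS_i$. In this Python sketch, nodes are `(net structure, cell, option index)` tuples; the naming is an illustrative assumption.

```python
from typing import List, Tuple

Node = Tuple[str, str, int]   # (net structure, cell, option index)

def spanning_arcs(cell: str, n_options: int, parent_ns: str,
                  child_ns: List[str]) -> List[Tuple[Node, Node]]:
    """Zero-cost arcs tying option k of `cell` in the parent net structure
    to the same option k of `cell` in every child net structure it drives."""
    return [((parent_ns, cell, k), (child, cell, k))
            for child in child_ns for k in range(n_options)]

arcs = spanning_arcs("c", 3, "NS0", ["NS1", "NS2"])
```

Because each arc connects equivalent option nodes only, flow entering option $k$ in `NS0` can leave only toward option $k$ in the children.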

In the other situation, the common cell $c$ is a sink cell of more than one net, $n_1, \ldots, n_m$ ($m \ge 2$), and the driver of at least one net $n_0$, as shown in Figure 3(c). In this situation, we treat each $n_i$ ($1 \le i \le m$) individually and make the connections from each $NS_i$ to $NS_0$ as in the first situation. However, in this case the spanning structure cannot guarantee that the option selected for $c$ is consistent across $NS_1, \ldots, NS_m$. We use a mincost heuristic to tackle this problem, as described in Section 6.

The costs of all arcs in the spanning structure are 0.

5.3. Structures for Cumulative Constraint Satisfaction

In this subsection we discuss further details on the structures for satisfying the cell area constraint. As described earlier in Section 4.2, a leakage power constraint is handled by a similar structure.

As mentioned before, each net structure has an area output arc. Once a sizing option node is chosen, a flow equal to the chosen size must be sent to this arc; this flow is then sent to the sink via the common area-gathering node and its capacity-constrained arc in order to satisfy the total cell area constraint. The simplest way to achieve this is to add to each option node an arc that leaves the net structure toward the area output arc, with capacity equal to the option size. Then, if enough flow comes into the option node, the desired amount is sent to the area output arc.

An example is shown in Figure 4. A flow $f$ selects two sizing options $o_{d,p}$ and $o_{k,q}$ and incurs the cost of arc $(o_{d,p}, o_{k,q})$ in a net structure; $S(o)$ denotes the size of an option $o$. At each of the two option nodes, a branch of the flow diverts part of $f$ to the area output arc, with an amount equal to the corresponding option size.

However, with this structure, the amount of flow leaving a net structure depends on the selected options and is thus variable. This is undesirable for determining the capacities of arcs in the spanning structures: the capacities of arcs in a spanning structure from net structure $NS_i$ to $NS_j$ need to be constant, that is, independent of the size choices made in $NS_i$ by the flow entering $NS_i$. We thus need a structure that diverts a constant amount of the flow entering $NS_i$; this amount is $\sum_{c} S_{\max}(c)$, summed over the cells whose area is accounted for in $NS_i$, where $S_{\max}(c)$ is the maximum size among the options in $O(c)$.

Our complete structure for diverting flows to the area output arc is shown in Figure 5. In this structure, the capacity of the arc leaving toward the sink (called the leaving arc) from each option node in an option set $O(c)$ is set to $S_{\max}(c)$. Therefore, the amount of flow leaving the net structure from any option node of $O(c)$ is always $S_{\max}(c)$. However, not all of this amount is sent to the area output arc. A shunting node on each leaving arc divides the flow into two parts: one goes to the area output arc, and the other is shunted (i.e., sent directly) to the sink. If the leaving arc is from an option node $o$, the capacity of the arc between the shunting node and the area output arc is $S(o)$, and the capacity of the arc from the shunting node to the sink is $S_{\max}(c) - S(o)$. Therefore, if $o$ is chosen, the amount of flow sent to the area output arc is exactly $S(o)$, and the rest is shunted to the sink. In this way, we send the correct amount of flow into the area output arc of each net structure while keeping the total flow leaving $NS_i$ toward the sink at $\sum_{c} S_{\max}(c)$. The costs of all arcs in this structure between an option node and the area output arc are 0.
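The three capacities around each option node follow directly from the option sizes of the cell. This Python sketch computes them for one option set; the dict keys are illustrative labels, not names from the paper.

```python
from typing import Dict, List

def shunting_capacities(option_sizes: List[float]) -> List[Dict[str, float]]:
    """Capacities around each option node o of one option set O(c).

    leaving: option node -> shunting node, always S_max(c)
    to_area: shunting node -> area output arc, S(o)
    to_sink: shunting node -> sink, S_max(c) - S(o)
    """
    s_max = max(option_sizes)
    return [{"leaving": s_max, "to_area": s, "to_sink": s_max - s}
            for s in option_sizes]

caps = shunting_capacities([1.0, 2.0, 4.0])
```

Whichever option is chosen, exactly $S_{\max}(c)$ units leave through its leaving arc, and only $S(o)$ of them reach the area output arc.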

5.4. Arc Capacity Determination

Setting proper arc capacities is very important for the correct functioning of the SSG. The capacities should be set such that sufficient flow can be sent to each net structure in the SSG to meet the flow demand of its area/power output arc and such that, within a net structure, the total incoming flow can be distributed to the option sets of all the sink cells and the driver cell.

To determine the arc capacities, we first determine how much incoming flow each net structure needs. A net structure has two types of outgoing flow: flow into the area output arc for area constraint satisfaction, and supply flow to its child net structures. The first type has a fixed amount $A_i$ for a net structure $NS_i$, irrespective of the chosen sizing options, as discussed in Section 5.3. The incoming flow must be sufficient to cover both types of outgoing flow, and thus the required incoming flow of a net structure $NS_i$ is recursively given as

$$f_{in}(NS_i) = A_i + \sum_{NS_j \in \mathrm{child}(NS_i)} \frac{f_{in}(NS_j)}{\mathrm{indeg}(NS_j)}, \quad (9)$$

where $\mathrm{child}(NS_i)$ is the set of child net structures of $NS_i$, and $\mathrm{indeg}(NS_j)$ is the incoming degree of $NS_j$, that is, the number of parent net structures of $NS_j$. In (9), we assume that the flow required by a net structure is sent uniformly from all its parent net structures. The computation starts from the boundary condition of the “leaf” net structures, which are directly connected to the sink: their child sets are empty, so by (9) the incoming flow they need is just $A_i$, the total of their first type of outgoing flow. Starting from the leaf net structures, we visit the other net structures in reverse topological order and determine their required incoming flow according to (9). The total flow to be supplied by the source node is then

$$f_s = \sum_{NS_r \text{ root}} f_{in}(NS_r), \quad (10)$$

where a root net structure is the net structure of a net that is driven by an I/O cell in the circuit and thus has no parent net in the circuit; in the SSG, the parent of all root net structures is $s$.
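The recursion (9) and the source supply (10) can be evaluated in one reverse-topological pass. The Python sketch below uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+); the dict-based graph layout and the example amounts are illustrative assumptions.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

def required_inflow(children: Dict[str, List[str]],
                    area_out: Dict[str, float]) -> Tuple[Dict[str, float], float]:
    """Required incoming flow f_in of every net structure (Eq. (9)) and source supply f_s (Eq. (10)).

    children[ns] lists the child net structures of ns; area_out[ns] is the fixed
    amount A_i that ns diverts through its leaving arcs.
    """
    indeg = {ns: 0 for ns in area_out}
    for kids in children.values():
        for k in kids:
            indeg[k] += 1
    # TopologicalSorter expects node -> predecessors; passing the children as
    # "predecessors" makes static_order() emit every child before its parents.
    order = TopologicalSorter({ns: children.get(ns, []) for ns in area_out}).static_order()
    f_in: Dict[str, float] = {}
    for ns in order:
        f_in[ns] = area_out[ns] + sum(f_in[k] / indeg[k] for k in children.get(ns, []))
    roots = [ns for ns in area_out if indeg[ns] == 0]   # parent is the source s
    return f_in, sum(f_in[r] for r in roots)

# Diamond: NS_a feeds NS_b and NS_c, which both feed the leaf NS_d.
children = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
area_out = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
f_in, f_s = required_inflow(children, area_out)
```

In the example, the leaf needs 4 units, each of its two parents supplies half of that, and the source supply equals the sum of all diverted amounts, as flow conservation requires.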

After obtaining $f_{in}(NS_i)$ for each net structure, we determine the arc capacities as follows.
(i) For an arc in the net spanning structure from $NS_i$ to $NS_j$, the capacity is $f_{in}(NS_j)/\mathrm{indeg}(NS_j)$, which equals the flow amount sent from $NS_i$ to $NS_j$ according to (9). This makes the total incoming flow to a net structure $NS_j$ exactly $f_{in}(NS_j)$. As mentioned before, the cost of this arc is 0.
(ii) Within a net structure, all arcs in an arc set $A(c_d, c_k)$ have the same capacity, derived below; $c_d$ is the driving cell and $c_k$ is any sink cell of the corresponding net.

Consider the arc set between the driving cell and a sink cell c of the net. If c is not connected to any arc in the outgoing spanning structure of the net structure, the capacity of each arc in the set is chosen so that sufficient flow can be sent to the leaving (area output) arc for constraint satisfaction. Otherwise, let N_1, ..., N_p be the child net structures that c is connected to via spanning structures. This means that c is a common cell of the current net and the nets of N_1, ..., N_p. In this case, the capacity of each arc in the set also covers the flow that must be forwarded to these child net structures.

(a) Unit Flow Arc Cost

When we gave the costs of arcs in a net structure in Section 5.1, we assumed that any flow on an arc incurs the full arc cost. In a standard network flow graph, however, the cost of a flow on an arc is the flow amount multiplied by the unit flow cost of the arc. With the above capacity assignment, a valid flow always saturates every arc it passes through in a net structure. This is because a valid flow uses exactly one arc in each arc set, and the capacities of these chosen arcs sum to the incoming flow amount. To make a valid flow incur the same cost as determined in Section 5.1, the unit flow cost $u(e)$ of an arc $e$ is set to its arc cost $c(e)$ from Section 5.1 divided by its capacity $\mathrm{cap}(e)$ as determined above, that is,

$$u(e) = \frac{c(e)}{\mathrm{cap}(e)}.$$
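The unit-cost rule can be checked with a few lines of Python. In this minimal sketch, which is an illustration rather than the paper's code, `arcs` maps hypothetical arc names to pairs of (arc cost as in Section 5.1, capacity). Each arc's unit cost is its arc cost divided by its capacity. A flow that saturates an arc is then charged exactly that arc's cost, and a flow that only partially uses an arc is charged proportionally less.

```python
from fractions import Fraction
from typing import Dict, Tuple

def unit_costs(arcs: Dict[str, Tuple[Fraction, int]]) -> Dict[str, Fraction]:
    """Unit flow cost of each arc: its (Section 5.1) arc cost divided by its capacity."""
    return {name: Fraction(cost) / cap for name, (cost, cap) in arcs.items()}

def flow_cost(flows: Dict[str, int], unit: Dict[str, Fraction]) -> Fraction:
    """Standard network-flow cost: sum over arcs of flow amount times unit cost."""
    return sum((flow * unit[name] for name, flow in flows.items()), Fraction(0))

# Two arcs of one arc set, each with capacity 4 (the net structure's inflow in this toy).
arcs = {"e1": (Fraction(6), 4), "e2": (Fraction(3), 4)}
unit = unit_costs(arcs)
full = flow_cost({"e1": 4}, unit)            # saturating e1 is charged exactly 6
split = flow_cost({"e1": 2, "e2": 2}, unit)  # an invalid split is charged 3 + 3/2 = 9/2
```

The split flow is charged a blend of the two arc costs, which does not correspond to any single size choice.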

6. Finding a Valid (Discretized) Mincost Flow

The standard mincost network flow problem is a linear program (a continuous optimization problem). Hence, it cannot enforce the consistency (mutual exclusiveness) requirement on the size option set of a particular cell, and the resulting mincost flow may be invalid for size selection. For example, after one iteration of the mincost network flow process, the mincost flow in a net structure may pass through an option node of the driving cell and then split into two or more sizing options in the size option set of a sink cell, as shown in Figure 6(a). Furthermore, as explained in Section 5.2, the resulting flow may also violate the consistency requirement for size option selection across net structures that have a common sink cell.


In either case, we start a new iteration of the network flow process in which some options that lead to an invalid flow are pruned, based on certain criteria of the flow, so that a near-optimal size selection is obtained. In the new iteration, for the first case, the driving cell's option node connects to only one of the option nodes in the sink cell's size option set that carried flow in the first iteration, as shown in Figure 6(b). The selection criterion is discussed shortly. Similarly, for the second case, consider all net structures whose nets have the common cell as a sink cell. In these net structures, the driving cell options chosen in the first iteration (for example, the two options shown in Figure 7(b)) connect to only one and the same sizing option of the common cell. This option is selected from those that carried flow in the first iteration. In this way, the same invalid flow cannot occur in the second iteration.
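To make the solve, prune, and re-solve process concrete, the sketch below uses a small successive-shortest-path mincost flow solver. This solver is a stand-in for the network simplex method used in the paper, chosen only because it fits in a few lines of standard-library Python. The toy graph is invented for illustration. A driving-cell option node D feeds two options A and B of one sink cell, and A's onward arc has capacity 1, mimicking a tight downstream limit. The first solve splits the flow between A and B, an invalid selection of the first kind. Keeping only the option arc that carried the most flow (the max-flow heuristic discussed below) and re-solving yields a valid flow.

```python
from typing import List, Optional, Tuple

Arc = Tuple[int, int, int, int]  # (tail, head, capacity, cost)

def min_cost_flow(n: int, arcs: List[Arc], s: int, t: int,
                  demand: int) -> Optional[Tuple[int, List[int]]]:
    """Successive shortest paths with Bellman-Ford; returns (cost, flow per arc) or None."""
    # Residual edge: [head, residual capacity, cost, index of reverse edge in graph[head]]
    graph: List[List[list]] = [[] for _ in range(n)]
    where = []  # position of each forward edge, used to read the flows back at the end
    for u, v, cap, cost in arcs:
        where.append((u, len(graph[u])))
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])
    total, sent = 0, 0
    while sent < demand:
        dist = [float("inf")] * n
        prev: List[Optional[Tuple[int, int]]] = [None] * n
        dist[s] = 0
        for _ in range(n - 1):  # Bellman-Ford tolerates negative residual costs
            changed = False
            for u in range(n):
                if dist[u] == float("inf"):
                    continue
                for i, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v], prev[v] = dist[u] + cost, (u, i)
                        changed = True
            if not changed:
                break
        if dist[t] == float("inf"):
            return None  # the remaining demand cannot be routed
        push, v = demand - sent, t
        while v != s:  # bottleneck residual capacity along the shortest path
            u, i = prev[v]
            push = min(push, graph[u][i][1])
            v = u
        v = t
        while v != s:  # augment along the path and update residual capacities
            u, i = prev[v]
            edge = graph[u][i]
            edge[1] -= push
            graph[v][edge[3]][1] += push
            v = u
        sent += push
        total += push * int(dist[t])
    return total, [arcs[k][2] - graph[u][i][1] for k, (u, i) in enumerate(where)]

S, D, A, B, T = range(5)
arcs = [(S, D, 3, 0), (D, A, 3, 1), (D, B, 3, 3), (A, T, 1, 0), (B, T, 3, 0)]
option_arcs = [1, 2]  # the arc set from driver option D to the sink cell's options A and B

cost1, flow1 = min_cost_flow(5, arcs, S, T, 3)   # 7, [3, 1, 2, 1, 2]: flow splits over A and B
used = [k for k in option_arcs if flow1[k] > 0]  # option arcs that carried flow
keep = max(used, key=lambda k: flow1[k])         # max-flow heuristic keeps D -> B
pruned = [a for k, a in enumerate(arcs) if k not in option_arcs or k == keep]
cost2, flow2 = min_cost_flow(5, pruned, S, T, 3) # 9, [3, 3, 0, 3]: a valid, single-option flow
```

In this toy, keeping only the cheaper option A would have made the demand unroutable, because A's onward arc carries only one unit. This illustrates why the choice among flow-carrying options matters.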


We have used two alternative selection heuristics to choose, from an illegal selection in each option set, a good option node to be part of the new iteration.
(i) Max-flow heuristic: always choose the option node with the largest flow amount through it.
(ii) Mincost heuristic: for the first situation, start from the driving cell's option node, follow the path of each branch of the mincost flow up to a length of L net structures, and choose the option node on the branch with the mincost path.

For the second situation, the cost we use to judge each currently selected option o of the common sink cell c consists of two parts: an outgoing path cost and an incoming arc cost. As shown in Figure 8, and similar to the first situation, the outgoing path cost of o is the cost of the flow path starting from the option node o. The incoming arc cost of o is the total cost of the incoming arcs to all instances of o across all net structures that contain the size option set of c; see Figure 8. The sum of these two costs is a good estimate of the cost of a valid flow that chooses only option o for cell c. Hence, we choose the option with the smallest sum of path cost and arc cost. Due to run-time considerations we cannot always follow each branch flow to the sink, so we set a limit (in number of net structures) on the length of the paths we follow.
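The two selection rules can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code. The current mincost flow is given as a hypothetical successor map from each node to (head, arc cost, flow) triples. A branch is followed by taking, at each step, the outgoing arc that carries the most flow. The path-length limit counts arcs rather than net structures.

```python
from typing import Dict, List, Tuple

# node -> list of (head node, arc cost, flow on the arc) in the current mincost flow
Succ = Dict[str, List[Tuple[str, float, int]]]

def branch_cost(succ: Succ, start: str, limit: int) -> float:
    """Cost of the flow branch leaving `start`, truncated after `limit` arcs.
    Assumption: at each node, follow the outgoing arc that carries the most flow."""
    cost, node = 0.0, start
    for _ in range(limit):
        out = [a for a in succ.get(node, []) if a[2] > 0]
        if not out:
            break  # no outgoing flow (e.g., the sink has been reached)
        head, arc_cost, _ = max(out, key=lambda a: a[2])
        cost, node = cost + arc_cost, head
    return cost

def max_flow_choice(succ: Succ, driver_opt: str) -> str:
    """Max-flow heuristic: the option node that receives the largest flow."""
    return max((a for a in succ[driver_opt] if a[2] > 0), key=lambda a: a[2])[0]

def mincost_choice_first(succ: Succ, driver_opt: str, limit: int) -> str:
    """First situation: cheapest flow branch leaving the driving-cell option node."""
    branches = [(head, c) for head, c, f in succ[driver_opt] if f > 0]
    return min(branches, key=lambda b: b[1] + branch_cost(succ, b[0], limit - 1))[0]

def mincost_choice_second(succ: Succ, incoming: Dict[str, float], limit: int) -> str:
    """Second situation: outgoing path cost plus the total incoming arc cost of each
    option of the common sink cell (incoming costs summed over all net structures)."""
    return min(incoming, key=lambda o: branch_cost(succ, o, limit) + incoming[o])

succ = {
    "d": [("a1", 1.0, 2), ("a2", 2.0, 1)],  # split: both sink options carry flow
    "a1": [("x", 5.0, 2)],
    "a2": [("y", 1.0, 1)],
}
by_flow = max_flow_choice(succ, "d")            # "a1": it carries more flow
look2 = mincost_choice_first(succ, "d", 2)      # "a2": 2 + 1 < 1 + 5
look1 = mincost_choice_first(succ, "d", 1)      # "a1": only the first arc is seen
```

The toy shows why look-ahead matters: with a one-arc limit the mincost rule sees only the cheaper first arc, while a two-arc limit exposes the expensive continuation behind it.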

Table 1 gives the percentage timing improvement for four representative ISCAS’85 circuits. The mincost heuristic with path length limits of 2, 3, 4, and 5 performs consistently better than the max-flow heuristic, with a relative advantage in the range of 24–39%. The mincost heuristic is thus implemented in our algorithm, with its path length limit chosen to balance computational complexity and accuracy.

6.1. Time Complexity of FlowSize

With either of the two selection heuristics used for pruning, the following holds. If the option selection for a cell is inconsistent in the mincost flow obtained in one iteration, then in all following iterations the mincost flow selects a valid size option for that cell. Every iteration that does not yet yield a valid flow therefore resolves at least one inconsistent cell. As a result, the number of iterations required to reach a valid mincost flow is no more than the total number of cells being resized.

In each iteration, we use the network simplex algorithm to solve the mincost flow problem. Consider a graph whose arc capacities and costs are all integers, with U the maximum arc capacity and C the maximum arc cost. The worst-case time complexity of the network simplex method on such a graph is polynomially bounded in the number of arcs, U, and C [18]. Moreover, it is well known that the average-case run time of the simplex method is much lower than its worst-case complexity [19]. As described in Section 5.4, the capacities of arcs in our SSG are sums of cell sizes and hence integers; cell sizes in a standard-cell design are integers in units of the technology feature size of the library. Our costs, on the other hand, are not integers, but they can be converted to integer values by proper scaling. Although we do not actually perform the scaling, we adopt this assumption here to derive an upper bound on the time complexity of our algorithm.

Let us first count the arcs in our SSG. The count depends on the number of cells n, the number of nets m in the circuit, and the number of available sizes k for each cell. There are three types of arcs in our SSG: arcs between size option sets, arcs in net spanning structures, and arcs in shunting structures.
- Arcs between size option sets. There are $k^2$ arcs between two size option sets. A net structure contains $d-1$ such arc sets, where d is the degree of the corresponding net (one driving cell and $d-1$ sink cells). The total number of these arcs is thus $O(k^2 m \bar{d})$, where $\bar{d}$ is the average net degree of the circuit. Since $\bar{d}$ is usually no more than 4, this is $O(k^2 m)$.
- Arcs in shunting structures. The shunting structure of each option node has three arcs. A net structure contains d size option sets of k option nodes each, so the total number of shunting arcs is $O(k m \bar{d}) = O(km)$.
- Arcs in net spanning structures. A net spanning structure only connects size option sets of the same cell in multiple net structures, and at most $k^2$ arcs connect two such size option sets. The total number of size option sets is $O(m \bar{d}) = O(m)$, so the total number of arcs in the net spanning structures is $O(k^2 m)$.

In sum, the SSG has $O(k^2 m)$ arcs.

The maximum arc capacity U equals the total cell size when all cells are at their maximum width, and it thus grows linearly with n for a given library. The maximum arc cost C is at most the maximum net delay. The delay of a net depends on the driving resistance of the driving cell, the input capacitances of the sink cells, and the net fanout. The driving resistances and input capacitances of cells are constants specified by the library, and the average fanout is usually a small constant in a VLSI circuit, so C can be viewed as a constant.

Typically, in a real circuit, n and m are about the same, so the number of arcs in our SSG can be rewritten as $O(k^2 n)$. Substituting these bounds into the complexity of the network simplex method gives a per-iteration time that is polynomial in k and n. Since the number of iterations is at most the number of cells being resized, the total time complexity is polynomially bounded as well. This is a highlight of our algorithm FlowSize, since other discrete cell-sizing methods such as [3, 5] are not polynomially bounded in run time. Also, as we show in Section 8 (Figure 10), the actual run time of FlowSize grows roughly linearly with the number of resizable cells. This is in keeping with the much smaller average-case run time of the network simplex algorithm compared to its worst-case complexity. Furthermore, our experiments show that FlowSize needs far fewer iterations than the number of cells being resized; for example, eight iterations suffice for a circuit with 300 such cells.

7. An Optimal Dynamic Programming Algorithm

Since we do not have the library or library parameters used in [3], we cannot directly compare our results on the benchmarks they use (ISCAS’85) to their published results. We therefore implemented an optimal dynamic programming (DP) method that produces optimal solutions for the sizing problem of the cells being resized, and we compared our solution quality to the optimal one.

In the DP algorithm, we propose three pruning methods that reduce the number of partial solutions generated during the DP process. These methods maintain the optimality of the solution while greatly reducing the run time. We process cells in circuit topological order (from driver cells to sink cells). Each time we process a cell, we generate a new set of partial solutions involving that cell by combining all partial solutions for the previously processed cells with the possible size choices of the cell. Pruning is applied after the new partial solutions are generated.

We propose three pruning conditions. A partial solution is pruned when
(1) it fails to meet the area constraint;
(2) it gives a longer delay than the critical path delay produced by our method; or
(3) there is another, better partial solution (generated in the search process or extracted from the complete solution of our method) that satisfies the following. It has a smaller total area, and better arrival times at the outputs of the cells on the boundary of the visited region (the visited cells connected to unvisited cells, as shown in Figure 9) in both of these cases: (a) the unvisited cells are all at their maximum sizes and (b) the unvisited cells are all at their minimum sizes.

The first two pruning conditions obviously do not change the optimality of the exhaustive search method. For the third condition, denote the arrival time at the output of a cell c as AT(c). Figure 9 shows a single-path situation: c_b is the visited cell at the boundary of the visited region, and c_u is the unvisited cell connected to it. Consider two partial solutions P_1 and P_2, where P_1 is better than P_2 according to the third pruning condition. For any complete solution S_2 expanded from P_2, we can expand P_1 to a complete solution S_1 by choosing exactly the same sizes for the unvisited cells as in S_2. Since P_1 has smaller area than P_2, if S_2 meets the area constraint, so does S_1. The delays of c_u and of the cells after it on the path are also the same in both complete solutions, since those cells are sized identically. Because P_1 is better than P_2, we have AT_1(c_b) ≤ AT_2(c_b), where AT_1(c_b) and AT_2(c_b) are the values under P_1 and P_2, respectively. For a delay model that is linear in the load capacitance, being better when c_u is at its minimum and at its maximum size implies being better for any size of c_u in between. Hence, the total path delay of S_1 is no larger than that of S_2. Pruning P_2 therefore never eliminates all optimal solutions, so our third pruning condition does not negatively affect the optimality of the method.
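The three pruning conditions can be illustrated on the single-path case that the argument above considers. The Python sketch below is a hypothetical toy, not the DP implementation used in the experiments. It assumes a chain of cells and a linear delay model: cell delay equals intrinsic delay plus driving resistance times the input capacitance of the next cell, or a fixed output load for the last cell. It also uses weak inequalities ("no worse" rather than "strictly better") in the dominance test, which still preserves optimality. An exhaustive search is included only to check the result.

```python
from itertools import product
from typing import List, NamedTuple, Optional, Tuple

class Size(NamedTuple):
    area: int    # cell area
    r: float     # driving resistance
    cin: float   # input capacitance
    d0: float    # intrinsic delay

class Partial(NamedTuple):
    area: int               # total area of the visited cells
    sizes: Tuple[int, ...]  # chosen size index of each visited cell
    at_in: float            # arrival time at the input of the last visited cell
    last: Size              # size of the last visited (boundary) cell

def boundary_at(p: Partial, load: float) -> float:
    """Arrival time at the output of the boundary cell for a given load capacitance."""
    return p.at_in + p.last.d0 + p.last.r * load

def dominates(a: Partial, b: Partial, lo: float, hi: float) -> bool:
    """Third condition: a is no worse than b in area and in boundary arrival time with the
    unvisited cell at its minimum (lo) and maximum (hi) input capacitance."""
    return (a.area <= b.area
            and boundary_at(a, lo) <= boundary_at(b, lo)
            and boundary_at(a, hi) <= boundary_at(b, hi))

def size_chain(chain: List[List[Size]], c_out: float, area_max: int,
               delay_ub: float = float("inf")) -> Optional[Tuple[float, Tuple[int, ...]]]:
    """Minimum path delay under an area budget; returns (delay, sizes) or None."""
    partials = [Partial(s.area, (j,), 0.0, s) for j, s in enumerate(chain[0])]
    for i in range(len(chain)):
        if i + 1 < len(chain):
            lo = min(s.cin for s in chain[i + 1])
            hi = max(s.cin for s in chain[i + 1])
        else:
            lo = hi = c_out  # the last cell drives the fixed output load
        kept: List[Partial] = []
        for p in partials:
            if p.area > area_max:              # first condition: area budget exceeded
                continue
            if boundary_at(p, lo) > delay_ub:  # second condition: lower bound beats known delay
                continue
            if any(dominates(k, p, lo, hi) for k in kept):  # third condition
                continue
            kept = [k for k in kept if not dominates(p, k, lo, hi)]
            kept.append(p)
        if i + 1 < len(chain):
            # Expand every surviving partial solution with each size of the next cell.
            partials = [Partial(p.area + s.area, p.sizes + (j,), boundary_at(p, s.cin), s)
                        for p in kept for j, s in enumerate(chain[i + 1])]
        else:
            partials = kept
    if not partials:
        return None
    best = min(partials, key=lambda p: boundary_at(p, c_out))
    return boundary_at(best, c_out), best.sizes

def brute_force(chain: List[List[Size]], c_out: float, area_max: int) -> Optional[float]:
    """Exhaustive reference: try every size combination."""
    best = None
    for combo in product(*(range(len(c)) for c in chain)):
        cells = [chain[i][j] for i, j in enumerate(combo)]
        if sum(c.area for c in cells) > area_max:
            continue
        delay = 0.0
        for i, c in enumerate(cells):
            load = cells[i + 1].cin if i + 1 < len(cells) else c_out
            delay += c.d0 + c.r * load
        best = delay if best is None else min(best, delay)
    return best

lib = [Size(1, 4.0, 1.0, 1.0), Size(2, 2.0, 2.0, 1.0), Size(4, 1.0, 4.0, 1.0)]
chain = [lib] * 4
delay, sizes = size_chain(chain, c_out=3.0, area_max=9)  # matches brute_force(chain, 3.0, 9)
```

Under the linear model, the boundary arrival time is an affine function of the next cell's input capacitance. This is why the sketch tests dominance only at the minimum and maximum input capacitance.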

8. Experimental Results

We tested our algorithm on two sets of benchmarks: the ISCAS’85 suite and the ISPD 2012 suite [21]. For the ISCAS’85 suite, we used two different libraries: a 0.18 μm (180 nm) library and Synopsys’s 90 nm library [20]. We use the same industrial 0.18 μm standard cell library as in [22]. This library provides four cell implementations for each function, with different areas, driving resistances, input capacitances, and intrinsic delays. The intervals between the four available sizes of each cell increase roughly exponentially. The other electrical parameters we use are the unit-length interconnect resistance and the unit-length interconnect capacitance. The ISPD 2012 suite comes with its own artificial library, in which delay depends highly nonlinearly on load capacitance. Results were obtained on Pentium IV machines with 1 GB of main memory for the ISCAS’85 benchmarks and on a Xeon machine with 72 GB of memory for the ISPD 2012 benchmarks. The competing methods (the optimal DP method of Section 7 for the ISCAS’85 benchmarks and an approximate DP method [3] for the ISPD’12 benchmarks) were run on the same respective machines.

The ISCAS’85 benchmarks come with initial sizing solutions. We ran our algorithm with a 10% total cell area increase constraint, which means that the total cell area after cell sizing cannot exceed that of the initial solution by more than 10%. Table 2 lists the improvements over the initial solution obtained by our network flow method and by the optimal DP-based exhaustive search method. Compared to the optimal DP solution, the improvement obtained by our method is only 1% (absolute) worse (11.9% versus 12.9%); the solutions of both methods satisfy the 10% area increase constraint. Furthermore, our run time is 60 times smaller than that of the exhaustive search method, even though the latter uses all three pruning conditions. Note again that we cannot compare our technique directly to that of [3], since we do not have their cell library or parameters.

We have also tested our algorithm using the more advanced and industry-like 90 nm library from Synopsys [20]. The results are given in Table 3. A trend similar to that of Table 2 is obtained: with the same initial sizing and 10% area relaxation, our method is only 3% worse (10.9% versus 13.9%) than the optimal solution, and its run time is on average 40 times smaller. We observe a slight increase in the optimality gap (from 1% to 3%) for the 90 nm library compared to the 180 nm library. We conjecture that this increase is mainly due to the greater sensitivity of the 90 nm library: for the same amount of cell size change, the relative delay change in the 90 nm library can be about 40% larger than in the 180 nm library. Thus, small errors in the choice of cell sizes by near-optimal methods such as ours can lead to somewhat larger errors in circuit delay for libraries with greater sensitivity. Even with this large sensitivity increase, however, our optimality gap increases by only 2%.

To show the scalability of our algorithm, we performed two additional experiments with the 180 nm library. In the first, all cells were considered for resizing, rather than only the cells on critical and near-critical paths. The results are listed in Table 4(a). Compared to resizing only those cells, the average number of resizable cells increases by a factor of about 11, from 120 to 1336. The run time also increases by a factor of about 11, from 48 s to 525 s, while the timing improvement is 2% (absolute) better. Figure 10 plots the run time with respect to the number of resizable cells; the plot is best fit by a linear function. This growth is much lower than the worst-case complexity derived in Section 6. It is in keeping with the well-known, much smaller empirical time complexity of the network simplex method compared to its worst-case complexity [19].

In the second experiment, we expanded the cell library by adding six artificial size options for each cell, with proportional driving resistances and input capacitances, so that each cell has ten different size options. Three of the new options are added with uniform spacing between two of the original sizes, and the other three are made larger than the upper of these two sizes. The intervals between the last three new options are the same as those between the first three. We use a linear approximation determined from the four options of the original library to calculate the driving resistances and input capacitances of the added options. Table 4(b) shows the results with this larger library and the same set of resizable cells as in Table 2. With the larger library, the exhaustive search method could obtain optimal results only for the two smallest circuits. Compared to the optimal solutions, our results are only 2.5% worse (14.8% versus 17.3%) for circuit C432 and 2.0% worse (16.8% versus 18.8%) for C499. Our method runs about 74 times faster than the exhaustive search method for C432 (72 s versus 5343 s) and over 135 times faster for C499 (140 s versus 18993 s). Compared to the four-option results in Table 2, the run time of our DNF method increases by about 7 times, while the timing improvement also increases in absolute terms. Figure 11 plots the run time with respect to the number of available size options per cell, k, for three representative circuits: C432, C1908, and C7552. The plot for C1908 best matches a cubic function of k, and the plots for C432 and C7552 best match quadratic functions. All three are consistent with the polynomial upper bound in k derived in Section 6. However, in current VLSI circuits, k and even small powers of k are generally much smaller than n, the number of cells being resized. The dominant run-time function is therefore the one shown in Figure 10, which is linear in n.

Finally, we ran our DNF method on the complex ISPD 2012 contest benchmarks [21], whose sizes range from 25 K to 959 K cells. For each cell type, the library provides 30 different “sizes” (combinations of actual sizes and threshold voltages). For each circuit, the power constraint is determined by power-optimizing the fast version of the ISPD benchmark with an internal tool. This tool uses the more complex DNF formulation, modeled as a fixed-charge network flow problem, that was alluded to in Section 2. It took part in the ISPD’12 competition and placed in the top 6 (out of 17 teams) for 6 of the circuits; see [23]. We then perform our timing-driven sizing under this leakage power constraint, in three rounds of DNF-based sizing. In the first round, all cells in the circuit are sizable. This is needed because the initial sizings that come with the ISPD’12 circuits correspond to very high-delay, low-power designs. In the second round, we further improve the result by sizing only cells on critical and near-critical paths (with delay ≥90% of the max path delay); this is the only round we perform for the ISCAS’85 circuits. In the last round, we perform a final adjustment, again using the DNF method, on only those sub-paths of the critical paths that show delay reduction potential. The delay reduction potential of a cell is measured by two quantities: (i) the differences between its delay under the size selected in the second round and its delays under the adjacent sizes, and (ii) the ratio of the corresponding power increase to the current positive leakage power slack of the circuit.
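The two quantities that measure delay reduction potential in the last round can be computed as in the following sketch. It is an assumption-laden illustration, not the authors' code. The delay of each size is assumed to be supplied by timing analysis (here it is simply given), the leakage values are per size, and all names are hypothetical. The sketch makes no assumption about how the two quantities are combined into one score, so it only reports them.

```python
from typing import List, NamedTuple, Tuple

class SizeOption(NamedTuple):
    delay: float    # delay with the cell at this size (assumed given by timing analysis)
    leakage: float  # leakage power of this size

def reduction_potential(options: List[SizeOption], chosen: int,
                        power_slack: float) -> List[Tuple[int, float, float]]:
    """For each size adjacent to `chosen`, return
    (size index, delay reduction, leakage increase / positive leakage power slack)."""
    if power_slack <= 0:
        raise ValueError("the leakage power slack must be positive")
    cur = options[chosen]
    result = []
    for j in (chosen - 1, chosen + 1):
        if 0 <= j < len(options):
            alt = options[j]
            result.append((j, cur.delay - alt.delay, (alt.leakage - cur.leakage) / power_slack))
    return result

# Hypothetical numbers: three sizes, second-round choice is index 1, leakage slack of 4 units.
opts = [SizeOption(10.0, 1.0), SizeOption(8.0, 2.0), SizeOption(7.0, 4.0)]
print(reduction_potential(opts, 1, 4.0))  # [(0, -2.0, -0.25), (2, 1.0, 0.5)]
```

In the example, upsizing to index 2 reduces delay by 1 unit at the cost of half the remaining leakage slack.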

Both the circuit sizes of the ISPD’12 benchmarks and the number of sizing options per cell are far beyond the capabilities of the optimal DP method, which runs out of memory for the smallest circuit. We therefore compared our method to the state-of-the-art cell-sizing method of [3] (implemented by us). This method uses approximate dynamic programming: its nonoptimal, similarity-based partial solution pruning can significantly reduce the number of partial solutions generated. The results are shown in Table 5. We achieve almost the same delay quality as the DP-based method (only 0.9% worse on average) while using less than half of its run time. This highlights the efficiency of our method in achieving high-quality results. Figure 12 plots the run time with respect to the number of cells; again, our DNF method scales linearly with the number of cells.

9. Conclusions

We presented a novel and efficient timing-driven network flow-based cell-sizing algorithm. We developed a size option selection graph, in which cell size options are modeled as nodes, and the cost of flows passing through various nodes is equal to the change in the timing objective function when the cell sizes corresponding to these nodes are chosen. Thus, by solving for a mincost flow, we can determine the cell sizes that can optimize the circuit delay. Various techniques are proposed to ensure that we can obtain, from the continuous optimization of standard mincost flow, a valid “discrete” mincost flow that meets the discrete mutual exclusiveness condition of cell-size selection. Area and leakage power constraint satisfaction is also taken care of by special network flow structures. The results show that the timing improvement obtained using our method is near optimal for ISCAS’85 benchmarks and similar to a state-of-the-art method for the ISPD’12 benchmarks (the near optimality could not be determined for these circuits as the optimal DP method ran out of memory for the smallest circuit) while being more than twice as fast as that technique. Furthermore, our technique scales well with problem size since its worst-case time complexity is polynomially bounded, and the empirical time complexity is linear.

Acknowledgment

This work was supported by NSF Grants CCF-0811855 and CCR-0204097.

References

[1] C. P. Chen, C. C. N. Chu, and D. F. Wong, “Fast and exact simultaneous gate and wire sizing by Lagrangian relaxation,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 7, pp. 1014–1025, 1999.
[2] J. Fishburn and A. Dunlop, “Tilos: a posynomial programming approach to transistor sizing,” in Proceedings of International Conference on Computer-Aided Design, pp. 326–328, 1985.
[3] S. Hu, M. Ketkar, and J. Hu, “Gate sizing for cell library-based designs,” in Proceedings of the 44th ACM/IEEE Design Automation Conference (DAC '07), pp. 847–852, June 2007.
[4] F. Beeftink, P. Kudva, D. Kung, and L. Stok, “Gate-size selection for standard cell libraries,” in Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD '98), pp. 545–550, November 1998.
[5] O. Coudert, “Gate sizing for constrained delay/power/area optimization,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 5, no. 4, pp. 465–472, 1997.
[6] M. M. Ozdal, S. Burns, and J. Hu, “Gate sizing and device technology selection algorithms for high-performance industrial designs,” in Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD '11), pp. 724–731, November 2011.
[7] S. Dutt and H. Ren, “Discretized network flow techniques for timing and wire-length driven incremental placement with white-space satisfaction,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 7, pp. 1277–1290, 2011.
[8] H. Ren and S. Dutt, “A provably high-probability white-space satisfaction algorithm with good performance for standard-cell detailed placement,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 7, pp. 1291–1304, 2011.
[9] S. Dutt, H. Ren, F. Yuan, and V. Suthar, “A network-flow approach to timing-driven incremental placement for ASICs,” in Proceedings of the International Conference on Computer-Aided Design (ICCAD '06), pp. 375–382, November 2006.
[10] U. Brenner, “VLSI legalization with minimum perturbation by iterative augmentation,” in Proceedings of Design, Automation & Test in Europe Conference & Exhibition (DATE '12), pp. 1385–1390, March 2012.
[11] H. Ren and S. Dutt, “A network-flow based cell sizing algorithm,” in Proceedings of the 17th International Workshop on Logic & Synthesis, pp. 7–14, 2008.
[12] S. Dutt and H. Ren, “Timing yield optimization via discrete gate sizing using globally-informed delay PDFs,” in Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD '10), pp. 570–577, November 2010.
[13] H. Ren and S. Dutt, “Effective power optimization under timing and voltage-island constraints via simultaneous VDD, Vth assignments, gate sizing, and placement,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, no. 5, pp. 746–759, 2011.
[14] H. Ren and S. Dutt, “Algorithms for simultaneous consideration of multiple physical synthesis transforms for timing closure,” in Proceedings of IEEE/ACM International Conference on Computer-Aided Design (ICCAD '08), November 2008.
[15] A. Nahapetyan and P. M. Pardalos, “A bilinear relaxation based algorithm for concave piecewise linear network flow problems,” Journal of Industrial and Management Optimization, vol. 3, no. 1, pp. 71–85, 2007.
[16] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, Network Flows: Theory, Algorithms, and Applications, chapter 10-11, Prentice-Hall, Upper Saddle River, NJ, USA, 1993.
[17] D. Kim and P. Pardalos, “Gate sizing in MOS digital circuits with linear programming,” in Proceedings of the Conference on European Design Automation (EURO-DAC '90), pp. 217–221, 1990.
[18] R. K. Ahuja and J. B. Orlin, “Scaling network simplex algorithm,” Operations Research, vol. 40, supplement 1, pp. S5–S13, 1992.
[19] I. Adler and N. Megiddo, “A simplex algorithm whose average number of steps is bounded between two quadratic functions of the smaller dimension,” Journal of the ACM, vol. 32, no. 4, pp. 871–895, 1985.
[20] Synopsys 90 nm Library, http://www.synopsys.com/community/universityprogram/pages/library.aspx.
[21] ISPD 2012 cell sizing contest, http://www.ispd.cc/contests/12/ispd2012_contest.html.
[22] X. Yang, B. K. Choi, and M. Sarrafzadeh, “Timing-driven placement using design hierarchy guided constraint generation,” in Proceedings of the IEEE/ACM International Conference on Computer Aided Design (ICCAD '02), pp. 177–180, November 2002.
[23] http://www.ispd.cc/contests/12/ISPD_2012_Contest_Results.pdf.

Copyright

Copyright © 2013 Huan Ren and Shantanu Dutt. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
