VLSI Design

Volume 2013 (2013), Article ID 905493, 12 pages

http://dx.doi.org/10.1155/2013/905493

## Power-Driven Global Routing for Multisupply Voltage Domains

University of Wisconsin, Madison, WI 53706, USA

Received 22 June 2012; Revised 16 November 2012; Accepted 5 April 2013

Academic Editor: Shantanu Dutt

Copyright © 2013 Tai-Hsuan Wu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This work presents a method for global routing (GR) to minimize power associated with global nets. We consider routing in designs with multiple supply voltages. Level converters are added to nets that connect driver cells to sink cells of higher supply voltage and are modeled as additional terminals of the nets during GR. Given an initial GR solution obtained with the objective of minimizing wirelength, we propose a GR method to detour nets to further save the power of global nets. When detouring routes via this procedure, overflow is not increased, and the increase in wirelength is bounded. The power saving opportunities include (1) reducing the area capacitance of the routes by detouring from the higher metal layers to the lower ones, (2) reducing the coupling capacitance between adjacent routes by distributing the congestion, and (3) considering different power weights for each segment of a routed net with level converters (to capture its corresponding supply voltage and activity factor). We present a mathematical formulation to capture these power saving opportunities and solve it using integer programming techniques. In our simulations, we show considerable saving in a power metric for GR, without any wirelength degradation.

#### 1. Introduction

Power consumption is a primary design objective in many application domains. Dynamic power still remains the dominant portion of the overall power spectrum. Design with multisupply voltage (MSV) allows significant reduction in dynamic power by taking advantage of its quadratic dependence on the supply voltage.

Dynamic power is dissipated in combinational and sequential logic cells, clock network, and the (remaining) local and global nets. We refer to the latter as the signal power. The signal power can take a significant portion of the dynamic power spectrum. For example, the contribution of the signal power is reported to be around 30% of dynamic power for a 45 nm high-performance microprocessor synthesized using a structured data paths design style and about 18% of the overall power spectrum [1].

The signals are complex structures in nanometer technologies that span over many metal layers. The power of a route segment depends on its width, metal layer, and spacing relative to its adjacent parallel-running routes. These factors determine the area, fringe, and coupling capacitances which impact power. Furthermore, in MSV designs, the power of a routed net depends on its corresponding supply voltage. For example, a route will have lower power if all its terminal cells have (the same) lower supply voltage. If a net connects a driver cell of lower voltage to a sink cell of higher voltage, its route includes a level converter (LC) and is decomposed into two segments of low and high supply voltages, corresponding to before and after the LC.

Figure 1 shows an example that motivates power-aware global routing. Three nets are given in this example, each with a corresponding supply voltage (low or high). An activity factor is also given for each net. A net with a higher supply voltage and activity factor consumes more power. Figure 1(a) shows that a shortest-length routing results in overflow of routing resources; the congested area is marked in the figure. Traditional GR minimizes overflow with a minimal increase in wirelength. The result for this example is shown in Figure 1(b), in which one net is now 2 units longer and the congested area is eliminated. However, the detoured net has the highest power consumption (due to its higher supply voltage and activity factor), and making it longer further increases its power consumption. Therefore, in the power-aware GR shown in Figure 1(c), this net keeps its shortest length and a different net is detoured to eliminate the congestion. The wirelength of the power-aware GR is higher than that of the wirelength-based GR, but its signal power is lower.

In this work, we propose a global routing (GR) method that optimizes the signal power in MSV designs. Figure 2 shows a generic design flow for an MSV-based GR. After placement and voltage assignment, the location and supply voltage of each cell are known. The supply voltage is determined either through voltage island generation [2, 3] or through a row-based assignment in a standard cell methodology. Furthermore, LCs are added to any net that connects a driver cell to a set of sink cells of higher supply voltage. Next, GR is applied to minimize the overall wirelength (WL), where the LCs are also included as terminals of a net.

For a given WL-optimized GR solution, we propose to further detour the nets in order to optimize the signal power. The signal power can be approximated during GR since at this stage the metal layers of each route segment are known. Furthermore, the spacing of parallel routes can be estimated from the routing congestion. Given a WL-optimized solution, the nets can be rerouted to trade off WL against power. For example, nets on higher metal layers can be rerouted to lower ones, which have smaller wire widths and lower area capacitance. Nets can also be rerouted to spread the congestion, thereby increasing their spacing and reducing coupling capacitance. Activity factor and voltage can be incorporated as a power weight for each route.

We present a mathematical formulation for MSV-based GR to minimize power and present integer programming-based techniques to solve the formulation. As part of power saving, our methods spread the routing congestion and ensure no additional overflow (of routing resources) and a bounded degradation in WL compared to the initial solution.

To the best of our knowledge, this is the first work on power-driven global routing in MSV designs. The recent work [4] discusses power-driven GR; however, it does not consider the MSV case. Furthermore, it relies on the availability of *power-efficient candidate routes* for each net but generates such candidate routes heuristically. As part of the contributions of this work, we show a formal procedure to generate power-efficient candidate routes from the initial WL-optimized solution while taking into account the overall WL degradation and power saving. The recent work [5] also studies the GR problem for MSV domains, but it does not focus on routing for power minimization.

#### 2. New Algorithmic Techniques Used

Power-aware routing can be considered a new EDA problem. This is because the power of global interconnects (or signals) is starting to make a nonnegligible contribution to the overall power spectrum for advanced technology nodes [1]. This issue is further exacerbated in multisupply voltage domains, for which the power of a net changes dramatically depending on the voltage domain(s) that it (fully or partially) falls in. This work is the first to propose and formulate power-aware routing for multisupply voltage domains.

Furthermore, from an algorithmic perspective, the techniques offered in this work combine integer programming with parallel processing based on problem decomposition. Integer programming allows obtaining a higher-quality solution compared to using heuristics, as shown in [6]. However, it is not considered a suitable approach for large industrial circuits. This work relies on decomposing the routing problem into smaller subproblems that are processed in parallel, in order to make the use of integer programming possible for large circuits.

#### 3. Interconnect Modeling

In this section, we discuss an MSV-based GR model. We assume that the level converters (LCs) are placed for some of the nets and that the supply voltage of each cell is known.

##### 3.1. Interconnect Modeling in MSV Designs

We are given a grid-graph model $G = (V, E)$ of the GR problem, where each vertex $v \in V$ corresponds to a global bin containing a number of cells. Each edge $e \in E$ represents the boundary of two adjacent bins. A capacity $u_e$ is associated with each edge $e$, reflecting the maximum number of routes that can pass between two adjacent bins. A net $T_i$ is identified by its terminal cells, which are a subset of the vertices $V$. In MSV-based GR, the terminals of a net may also be the LCs. During GR, a Steiner tree $t_i$ in $G$ is found for each net $T_i$ to connect its terminals. The length of $t_i$ is taken to be its wirelength (WL).

Figure 3(a) shows an example. The chip is divided into regions, each of which has either a low or a high supply voltage. A routed net is specified in the figure. The net has one driver terminal with the low supply voltage and three sink terminals with the high supply voltage. The route includes two LCs, which are also considered additional terminals of the net.

For power-driven MSV-based GR, we first decompose each net that contains an LC into a set of subnets. We reroute each subnet as an individual net during power optimization. Consequently, we have $N$ nets after decomposition. For example, in Figure 3(b), the initial route is shown with its LCs. The net is decomposed into three subnets, each of which will be rerouted. The first subnet connects the driver terminal in the low-voltage region to the two LCs. The second one connects one LC to one high-voltage sink terminal. The third one connects the other LC to the other two high-voltage sink terminals.

Figures 3(c)–3(e) illustrate our net decomposition procedure. The decomposition of each net is done using its initial route and the location(s) of its level converter(s), assuming they are determined before this stage. For a net containing level converters, starting from its driver terminal, a subnet corresponding to a low supply voltage is formed that connects the driver terminal to a set of level converters and/or a set of sink terminals of the same supply voltage. Next, one or more subnets are formed that connect the level converters to the sink terminals of the same (and higher) voltage level. In our implementation, the BFS algorithm is used to traverse the initial route. For example, in Figure 3(d), we start traversing from the source node until reaching the two level converters. All the traversed edges form the first subnet, which has a low supply voltage. Next, we continue traversing from each of the level converters individually until reaching all the sink nodes, which identifies the two remaining subnets with a high supply voltage.
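The traversal described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `decompose_net`, the edge-list representation of the route, and the vertex labels are all assumptions made for the example.

```python
from collections import deque

def decompose_net(route_edges, source, level_converters):
    """Split a routed tree into single-voltage subnets at level converters.

    route_edges: iterable of (u, v) pairs forming a tree (the initial route).
    source: the driver terminal.
    level_converters: set of vertices that hold an LC (hypothetical labels).
    Returns a list of subnets (lists of edges); the first one starts at the driver.
    """
    adj = {}
    for u, v in route_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    subnets = []
    roots = deque([source])   # every root (driver or LC) starts one subnet
    visited = {source}
    while roots:
        root = roots.popleft()
        edges = []
        queue = deque([root])
        while queue:           # BFS restricted to the current voltage segment
            u = queue.popleft()
            for v in adj.get(u, []):
                if v in visited:
                    continue
                visited.add(v)
                edges.append((u, v))
                if v in level_converters:
                    roots.append(v)   # stop here; the LC roots a new subnet
                else:
                    queue.append(v)
        if edges:
            subnets.append(edges)
    return subnets
```

For a route shaped like the example of Figure 3(b), with one driver, two LCs, and three sinks, the sketch returns three subnets: one from the driver to both LCs and one behind each LC.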

Our net decomposition procedure finds a minimum number of subnets for each net that contains a level converter, such that each subnet has only one corresponding supply voltage. Note that after rerouting, some of these subnets may pass through the same edge(s), as shown in Figure 3(e). If the subnets that pass through the same edges have the same voltage level (as in the example of Figure 3(e)), then we can merge these subnets to release the overutilized routing resources. The above procedure is given for the case when two supply voltages (low and high) exist, which is also the case considered in this work. For a higher number of voltage domains, the procedure can be extended in a similar way.

##### 3.2. Power Modeling

Each decomposed net $T_i$ has a corresponding supply voltage $V_i$ and switching activity $\alpha_i$. The required interconnect power for a GR solution is estimated as

$$P = f \sum_{i=1}^{N} \alpha_i V_i^2 \left( C_i^{\text{sink}} + C_{t_i} \right), \tag{1}$$

where $f$ is the frequency. As seen in (1), the capacitance of routed net $T_i$ is the sum of the capacitances of its sink cells (denoted by $C_i^{\text{sink}}$) and of its route (denoted by $C_{t_i}$). Here $C_i^{\text{sink}}$ is a constant that does not depend on the rerouting, so it is excluded from the optimization. Note that the power of the LCs is considered fixed and is thus also not considered as part of the interconnect power optimization.

The capacitance $C_{t_i}$ for a routed net is the sum of the capacitances of its unit-length edges that are contained in route $t_i$ (given by the notation $e \in t_i$):

$$C_{t_i} = \sum_{e \in t_i} c_e. \tag{2}$$

The parameter $c_e$ is the capacitance of one routed edge $e$. This capacitance is a function of the metal layer $l_e$, wire width $w_e$, and wire spacing $d_e$ of the edge $e$. Specifically,

$$c_e(l_e, w_e, d_e) = c_e^{\text{area}} + c_e^{\text{fringe}} + c_e^{\text{coupling}}, \tag{3}$$

where $c_e^{\text{area}}$ and $c_e^{\text{fringe}}$ are the area and fringe capacitances with respect to substrate, and $c_e^{\text{coupling}}$ is the coupling capacitance. As indicated, these capacitances are functions of wire length, width, and spacing and are provided by the technology library through a lookup table.
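As a small worked illustration of (1), the sketch below sums the route-dependent part of the power over a set of routed nets. It assumes the usual dynamic-power form (activity × voltage² × frequency × capacitance) with any constant factor omitted and the constant sink-cell term left out; the dictionary keys and function names are hypothetical.

```python
def net_power(activity, voltage, edge_caps, freq=1.0):
    """Route-dependent dynamic power of one net (sink-cell term excluded).

    edge_caps: capacitances of the unit-length edges on the net's route.
    """
    return activity * voltage ** 2 * freq * sum(edge_caps)

def total_interconnect_power(nets, freq=1.0):
    """Sum the route-dependent power over all (decomposed) nets."""
    return sum(
        net_power(n["activity"], n["voltage"], n["edge_caps"], freq)
        for n in nets
    )
```

Because the voltage enters quadratically, the same route costs four times as much power at twice the supply voltage, which is why the high-voltage segment behind an LC is weighted more heavily than the low-voltage segment in front of it.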

In this work, we assume that only one (and a different) wire width is associated with each metal layer, so we exclude the wire-width parameter $w_e$, and for each edge $e$, its metal layer $l_e$ is known. The spacing $d_e$ for edge $e$ is estimated from the edge utilization $r_e$ in a GR solution. Given the utilization $r_e$ and the length of edge $e$ (computed from the chip dimension and the routing grid granularity), the spacing $d_e$ is calculated to allow maximum spacing between its corresponding routes. Figure 4 shows an example. This simple *averaging* strategy may be adjusted if more information is available at the GR stage (e.g., the adjustment may be due to the fixed short nets which fall inside a single global routing bin). With this approximation, we can express the capacitance of a unit-length route edge in terms of the edge’s metal layer and its utilization. The total capacitance $C_e$ of edge $e$ is given by the product of the per-unit capacitance $c_e$ and the utilization $r_e$: $C_e = c_e r_e$.
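A minimal sketch of the averaging idea follows. It assumes that the routes crossing an edge are spread evenly over the edge length, so that each route receives an equal share of the free space; this even-spread rule and the function names are assumptions for illustration, not a formula taken from the text.

```python
def estimated_spacing(edge_length, wire_width, utilization):
    """Spacing per route when `utilization` wires share one GR edge evenly."""
    if utilization <= 0:
        return float("inf")          # no routes, hence no neighbouring wire
    free = edge_length - utilization * wire_width
    if free < 0:
        raise ValueError("the wires do not fit on this edge")
    return free / utilization

def total_edge_capacitance(per_unit_cap, utilization):
    """Total capacitance of an edge: per-route capacitance times utilization."""
    return per_unit_cap * utilization
```

Under this assumption the spacing shrinks as the utilization grows, so the coupling capacitance (and with it the per-unit capacitance) rises with congestion.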

Figure 5(a) shows the curves representing area, fringe, and coupling capacitances for metal layer 1 with respect to edge utilization for a 45 nm library [7], assuming each GR edge is 2. The summation of the 3 capacitances is shown in Figure 5(b).

#### 4. Placement of Level Converters

The LCs may only be placed on the WL-optimized route, initially provided for each net. This ensures that the addition of LCs will not cause extra congestion; it allows connecting each LC to the initial route conveniently just by adding vias from the LC to the initial route. Randomly placing the LCs may harm the GR congestion and degrade WL or overflow.

We list a set of requirements to identify valid LC insertion cases for a net $T_i$ with given route $t_i$. We assume the net has a single source and may have multiple sink terminals.

(1) The location of an LC is a vertex $v$ in $t_i$.

(2) This vertex should fall inside a high-voltage island.

(3) The global bin corresponding to $v$ should have enough space to add the LC. We denote the available space of $v$ by $A_v$ and compute it after placement (see Figure 2).

(4) For vertices satisfying the above three conditions, if all have the same distance to the source terminal (in terms of the number of edges on $t_i$), we require that LCs be added on these vertices simultaneously.

Figure 6 shows the set of potential LC locations of a net $T_i$ with initial route $t_i$. The source is the terminal in the low-voltage island. Note that one vertex in $t_i$ cannot be used because it is in the low-voltage island. We have four cases for valid LC insertion, indicated in the figure. In the last case, two LCs should be placed on the net after the diverging point on the route to ensure that the high supply voltage is delivered to both sink terminals. For a single-source net $T_i$, we identify all the cases for valid LC insertion using a breadth first traversal on $t_i$ and denote this set by $J_i$. In this example $|J_i| = 4$. For each case $j \in J_i$, we further compute a corresponding power $P_{ij}$ using (1), where the edge utilization required to compute coupling capacitance is obtained from the initial WL-optimized solution. The power includes the low- and high-voltage interconnect portions on $t_i$ and the LC(s).

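To select one LC insertion case for each net, we define the binary variable $y_{ij}$ to be equal to 1 if and only if case $j$ (from the set $J_i$ of valid insertion cases, with power $P_{ij}$) is selected for net $T_i$. The LC placement problem is expressed as the following integer program (IP), which can be solved efficiently using a solver, as we elaborate in our experiments:

$$\begin{aligned}
\min\ & \sum_{i} \sum_{j \in J_i} P_{ij}\, y_{ij} + M \sum_{i} s_i \\
\text{s.t.}\ & \sum_{j \in J_i} y_{ij} + s_i = 1, && \forall i, \\
& \sum_{i} \sum_{j \in J_i} b_{ijv}\, y_{ij} \le A_v, && \forall v \in V, \\
& y_{ij} \in \{0, 1\},\ s_i \ge 0,
\end{aligned}$$

where the parameter $b_{ijv}$ is equal to 1 if, in case $j$, an LC is placed at vertex $v$, and $A_v$ is the available placement space of vertex $v$. The first set of constraints ensures that at most one LC insertion case is selected for each net. The slack variable $s_i$ will be positive if there is no available space for placing LCs for net $T_i$ and is heavily penalized by a positive $M$ to maximize the number of placed LCs. The second set of constraints ensures that LCs are placed in the free placement space.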

In addition, it may not be possible to place LCs on every vertex of the GR grid, because the corresponding global bin may be highly congested. We therefore associate with each vertex $v$ a constant parameter $A_v$, indicating its available placement space. In our experiments, we calculate this available space for each global bin according to the placement density.

With this assumption, after adding an LC, the initial route can connect to the LC by extending through a set of vias at the LC location. Furthermore, for the WL-optimized tree of a net, the potential locations of LCs are restricted to those vertices which fall inside the high-voltage islands, as the LC must be connected to the high supply voltage.

To enumerate all the possible LC insertion cases in a given route, consider a single-source net $T_i$ with WL-optimized route $t_i$. We systematically enumerate all the cases for *valid* LC insertion according to the distance of a vertex $v$ from the source vertex on the route. The distance is measured in terms of the number of edges on the route between $v$ and the source and is obtained using a breadth first traversal on tree $t_i$. We count each LC location on $t_i$ and in the high-voltage island as one possibility for LC placement on that route. However, if multiple vertices have the same distance to the source, we consider adding LCs on all such vertices simultaneously and count them as one possibility. For example, in the last case in Figure 6 we insert two LCs simultaneously. We then define a set $J_i$ for each net $T_i$, indicating all the possibilities for its valid LC insertion. In the given example $|J_i| = 4$. Furthermore, we compute the power $P_{ij}$ for LC insertion case $j$ for route $t_i$. The power is computed according to (1), where the edge utilizations used to compute coupling capacitance are taken from the provided WL-optimized solution.
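The enumeration by distance can be sketched as follows. The function name `lc_insertion_cases` and the `eligible` set (vertices assumed to lie in a high-voltage island with enough free space) are hypothetical, and the sketch only groups eligible vertices by their breadth-first distance; it does not check that every sink ends up downstream of a converter.

```python
from collections import deque

def lc_insertion_cases(route_edges, source, eligible):
    """Group eligible route vertices by their BFS distance from the source.

    Each returned group is one candidate insertion case; vertices at the
    same distance are used together, as described in the text.
    """
    adj = {}
    for u, v in route_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    dist = {source: 0}
    queue = deque([source])
    while queue:                      # plain breadth first traversal
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)

    cases = {}
    for v, d in dist.items():
        if v != source and v in eligible:
            cases.setdefault(d, []).append(v)
    return [sorted(cases[d]) for d in sorted(cases)]
```

On a route that runs through three eligible vertices and then branches to two eligible sinks, the sketch yields four cases, the last of which contains two LC locations, mirroring the situation of Figure 6.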

#### 5. Power-Driven MSV-Based GR

In this section, we first present a mathematical formulation of power-driven MSV-based GR. We then discuss integer programming-based techniques to obtain high-quality solutions to the formulation.

##### 5.1. Mathematical Formulation

As described in Section 3.2, the per-unit capacitance $c_e$ of an edge $e$ is a function of its metal layer $l_e$ and the edge utilization $r_e$. Typically, this function is a convex increasing function, as depicted in Figure 5. We represent the function for metal layer $l$ by a set of line segments denoted by $K_l$. For example, the set $K_l$ is composed of 7 line segments in the library used in this work [7]. Each line segment $k \in K_l$ is of the form $a_k r_e + b_k$, for a given range of $r_e$, where $a_k$ and $b_k$ are derived from the library for that range. For each of the 8 metal layers in our library, the curve is represented as 7 piecewise linear segments.

Since the per-unit capacitance is convex, its value may be expressed in our mathematical optimization problem for GR with the following set of linear inequalities:

$$c_e \ge a_k r_e + b_k, \quad \forall k \in K_{l_e}. \tag{5}$$

For a given edge utilization $r_e$, the corresponding $c_e$ is obtained from the line equation that gives the largest value of $a_k r_e + b_k$ for $k \in K_{l_e}$.
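The "largest line wins" rule for a convex piecewise linear function is short enough to state in code. In the sketch below, each segment is a (slope, intercept) pair; the coefficients used in the example call are made up for illustration and are not library values.

```python
def per_unit_capacitance(utilization, segments):
    """Evaluate a convex piecewise linear function as the maximum of its lines.

    segments: iterable of (slope, intercept) pairs, one per line segment.
    """
    return max(slope * utilization + intercept for slope, intercept in segments)

# Hypothetical three-segment curve (illustrative coefficients only).
example_segments = [(0.0, 1.0), (1.0, 0.0), (2.0, -2.0)]
low = per_unit_capacitance(0.5, example_segments)    # flat segment is active
high = per_unit_capacitance(3.0, example_segments)   # steepest segment is active
```

In a minimization problem the same function is obtained by requiring the capacitance variable to be at least as large as every line, because the objective then pushes it down onto the highest one.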

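To model GR, we are given a routing grid graph $G = (V, E)$, a set of decomposed multiterminal nets denoted by $\{T_1, \ldots, T_N\}$, and edge capacities $u_e$. Let $\mathcal{T}_i$ be the collection of all Steiner trees that can route net $T_i$. We later discuss how to approximate $\mathcal{T}_i$ by generating a set of power-efficient candidate trees with consideration of WL degradation. Each tree $t \in \mathcal{T}_i$ is associated with a binary decision variable $x_{it}$ which is equal to 1 if and only if it is selected to route net $T_i$. Let the parameter $a_{te}$ be equal to 1 if tree $t$ contains edge $e$ (if $e \in t$). The GR problem for power minimization is given by

$$\begin{aligned}
\min\ & \sum_{i=1}^{N} \sum_{t \in \mathcal{T}_i} \alpha_i V_i^2 \Big( \sum_{e \in E} a_{te}\, c_e \Big) x_{it} + M \sum_{i=1}^{N} s_i \\
\text{s.t.}\ & \sum_{t \in \mathcal{T}_i} x_{it} + s_i = 1, && \forall i, \\
& \sum_{i=1}^{N} \sum_{t \in \mathcal{T}_i} a_{te}\, x_{it} \le u_e, && \forall e \in E, \\
& c_e \ge a_k \sum_{i=1}^{N} \sum_{t \in \mathcal{T}_i} a_{te}\, x_{it} + b_k, && \forall e \in E,\ \forall k \in K_{l_e}, \\
& \sum_{i=1}^{N} \sum_{t \in \mathcal{T}_i} \mathrm{WL}_t\, x_{it} \le \beta\, \mathrm{WL}_0, \\
& x_{it} \in \{0, 1\},\ s_i \ge 0.
\end{aligned} \tag{6}$$

The first term in the objective function is the interconnect power as explained in Section 3.2. It includes the activity $\alpha_i$ and voltage $V_i$ of net $T_i$. The capacitance of a route $t$ of net $T_i$ is obtained by adding the unit edge capacitances $c_e$ for all the edges $e \in t$. Here, the route $t$ will be selected for net $T_i$ only if $x_{it} = 1$.

The first set of constraints selects at most one route for each net. The slack variable $s_i$ is equal to 1 if net $T_i$ cannot be routed, and the variable is penalized in the objective function by a large parameter $M$ to maximize the number of routed nets. The term $\sum_i \sum_{t \in \mathcal{T}_i} a_{te} x_{it}$ represents the edge utilization $r_e$. The second set of constraints ensures that the edge utilizations are within the given edge capacities. The third set of constraints determines the per-unit edge capacitance $c_e$ for each edge from its utilization, using the discussed piecewise linear model with line segments $a_k r_e + b_k$, $k \in K_{l_e}$. The fourth constraint ensures the new wirelength is within a factor $\beta$ of the initially provided wirelength $\mathrm{WL}_0$. Here $\mathrm{WL}_t$ denotes the wirelength of route $t$ of net $T_i$.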

The constraints of the formulation are all linear. However, the objective expression is nonlinear (due to the multiplication of the per-unit capacitance variables and the route selection variables). We handle the nonlinearity in a heuristic manner using a two-phase approach. First, we choose a rerouting that attempts to minimize the total capacitance of all edges. Next, per-unit capacitances are estimated (and fixed) based on the solution of the first phase, and a rerouting is sought that minimizes the total estimated power. Each of these two phases is an integer *linear* program (IP); the two phases are discussed in the next sections.

##### 5.2. Phase 1: Minimizing Total Capacitance

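Using the piecewise linear approximation for the per-unit capacitance given by (5), with line segments $a_k r_e + b_k$ for $k \in K_{l_e}$, we may also approximate the total capacitance $C_e = c_e r_e$ of an edge $e$ with utilization $r_e$ as

$$C_e \approx \max_{k \in K_{l_e}} \left( a_k r_e^2 + b_k r_e \right). \tag{7}$$

This (convex) nonlinear expression may be relinearized, resulting in another piecewise linear expression, with coefficients $a'_k$ and $b'_k$, for the total edge capacitance that may be used in our linear integer program for minimizing the total capacitance:

$$C_e \ge a'_k r_e + b'_k, \quad \forall k \in K'_{l_e}. \tag{8}$$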

###### 5.2.1. Formulation

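The formulation of phase 1 is given by the following IP:

$$\begin{aligned}
\min\ & \sum_{e \in E} C_e + M \sum_{i=1}^{N} s_i \\
\text{s.t.}\ & \sum_{t \in \mathcal{T}_i} x_{it} + s_i = 1, && \forall i, \\
& \sum_{i=1}^{N} \sum_{t \in \mathcal{T}_i} a_{te}\, x_{it} \le u_e, && \forall e \in E, \\
& C_e \ge a'_k \sum_{i=1}^{N} \sum_{t \in \mathcal{T}_i} a_{te}\, x_{it} + b'_k, && \forall e \in E,\ \forall k \in K'_{l_e}, \\
& \sum_{i=1}^{N} \sum_{t \in \mathcal{T}_i} \mathrm{WL}_t\, x_{it} \le \beta\, \mathrm{WL}_0, \\
& x_{it} \in \{0, 1\},\ s_i \ge 0.
\end{aligned}$$

The objective expression is similar to that of the formulation in Section 5.1, but the first term is replaced by $\sum_{e \in E} C_e$, which represents an estimate of the total interconnect capacitance. The third set of constraints is also updated; the total-capacitance variable $C_e$ replaces the per-unit capacitance $c_e$ of the previous formulation, and the coefficients in the piecewise linear model are updated to use (8).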

###### 5.2.2. A Price-and-Branch Solution Procedure

We approximately solve the phase-1 IP using a two-step heuristic. First, a *pricing* procedure is used to generate a set of candidate routes for each net that are power-efficient while considering the WL degradation. The pricing step approximates the set of routes of each net in the formulation by a small set of power-efficient candidate routes, instead of all the potential routes of the net. Second, *branch-and-bound* is applied to solve the resulting IP, selecting one route for each net from the set of generated candidate routes. The standard branch and bound algorithm can be carried out using a commercial solver. This two-step procedure of generating candidate routes and then running branch and bound is commonly known as price and branch [8, 9]. The price and branch procedure was recently applied to solve the GR problem for WL improvement [6]. We apply the same procedure for power improvement. The major technical difference in our procedure is in the pricing step to find power-efficient candidate routes, which we discuss next in detail.

###### 5.2.3. Overview of Pricing for Route Generation

We solve a linear-programming relaxation of the phase-1 IP by replacing the binary requirements on the variables $x_{it}$ with the constraints $0 \le x_{it} \le 1$ for all $i$ and for all $t \in \mathcal{T}_i$. The linear program is solved by an iterative procedure known as column generation [10]. In column generation, we start by replacing $\mathcal{T}_i$ (the set of all possible routes of net $T_i$) in the formulation by a subset $\mathcal{S}_i \subseteq \mathcal{T}_i$, initially containing one candidate route per net. We then gradually expand $\mathcal{S}_i$, adding new routes that may decrease the objective function. New candidate routes are added via a *power-aware pricing condition* for each net.

Before explaining the procedure in more detail, we first introduce the following notation: (1) we refer to the LP relaxation of the phase-1 IP, in which the full set of routes of each net is replaced by its current subset of candidate routes, as the “restricted master problem” and denote it by (RMLP-P1); (2) we refer to the dual of the restricted master problem as (D-RMLP-P1). The solution of (D-RMLP-P1) consists of the dual variables for the first, second, and third sets of constraints in the relaxed phase-1 IP, respectively.

The iterative column generation procedure, including the pricing condition, is enumerated below.

(1) For each net, initialize its set of candidate routes with one route. (In this work we start with the solution of [11].)

(2) Solve (RMLP-P1), yielding a primal solution and the dual values in (D-RMLP-P1).

(3) Generate a new route for each net. Using the solution of step 2, evaluate the pricing condition for the new route; if the condition is satisfied, add the route to the candidate set of the net.

(4) If an improving route for some net was found in step 3, return to step 2. Otherwise, stop; *the current primal solution is an optimal solution to* (RMLP-P1).
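The control flow of this loop can be sketched independently of any particular LP solver. In the sketch below, `solve_restricted_lp` and `price_route` are hypothetical callables supplied by the caller: the first stands in for solving the restricted master problem and returning its primal and dual values, the second for generating a route and testing the pricing condition (returning `None` when no improving route exists). Neither corresponds to a real solver API.

```python
def column_generation(nets, initial_routes, solve_restricted_lp, price_route,
                      max_iters=100):
    """Generic column-generation loop with injected solver and pricing steps."""
    routes = {n: [initial_routes[n]] for n in nets}   # one starting route per net
    primal = None
    for _ in range(max_iters):
        primal, duals = solve_restricted_lp(routes)   # solve restricted master
        improved = False
        for n in nets:
            candidate = price_route(n, routes[n], duals)
            if candidate is not None and candidate not in routes[n]:
                routes[n].append(candidate)           # new column for net n
                improved = True
        if not improved:                              # no net has an improving route
            break
    return routes, primal
```

The restricted problem is re-solved after every round of additions, since the dual values, and therefore the outcome of the pricing test, change whenever new routes enter the candidate sets.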

Step 3 gives the pricing condition in terms of the solution of the dual problem (D-RMLP-P1) obtained at the current iteration. This step can determine, for a given new route, whether it should be added to the set of candidate routes to reduce the objective of (RMLP-P1). However, it does not specify how a new route should be found such that the pricing condition is satisfied. We next discuss a convenient graph-based procedure to generate a new route that satisfies the pricing condition.

###### 5.2.4. Route Generation for One Net

To find improving routes for a net, we associate with each edge of the GR grid a weight that is derived from the dual values of the current solution of (D-RMLP-P1).

By the theory of linear programming, for each edge, at most one dual variable will be positive in an optimal solution to (D-RMLP-P1). Thus, for a given route, the pricing condition can be evaluated from the total weight of the edges on that route. We take advantage of this interpretation to identify a promising route which satisfies the pricing condition. Given a route obtained from previous iterations, we obtain a new route by rerouting its branches with the updated edge weights so that the overall weight of the rerouted branches is reduced.

We explain the procedure with the example of Figure 7. Considering two nets, suppose we are initially given one route for each of them. After step 2 at the first iteration of column generation, we obtain the edge weights given in Figure 7(a). To obtain a new route for one of the nets, we reroute different branches of its current route. For each terminal, we identify a branch as the segment connecting it to the first Steiner point on the route. We then reroute this branch by running Dijkstra’s single-source shortest path algorithm [12] on the weighted graph with the weights of the first iteration, similar to [13, 14]. The new route is shown in Figure 7(b). After adding it to the set of candidate routes of the net, we proceed to the second iteration and obtain new edge weights, which are shown in Figure 7(b).
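Rerouting one branch amounts to a shortest-path query on the weighted grid. The sketch below is a standard textbook Dijkstra on an adjacency dictionary; the graph representation and the function name `dijkstra_path` are assumptions for illustration, and the edge weights are assumed to be nonnegative, as Dijkstra's algorithm requires.

```python
import heapq

def dijkstra_path(adj, start, goal):
    """Shortest path on a weighted graph given as {u: {v: weight}}.

    Returns (path, cost), or (None, inf) if the goal is unreachable.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue                 # stale heap entry
        done.add(u)
        if u == goal:
            break
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return None, float("inf")
    path = [goal]
    while path[-1] != start:         # walk the predecessor chain backwards
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[goal]
```

When a direct edge carries a high weight (for example because it is congested), the query returns a longer detour whose total weight is lower, which is the behavior the rerouting step relies on.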

The discussed pricing procedure is similar to that of [6]. However, it differs in the pricing condition and in the way edge weights are set up. For solving (RMLP-P1) and its dual at each iteration, we use the solver CPLEX 12.0. After obtaining the final set of candidate routes, we again use CPLEX 12.0 for the branch and bound step to get the final solution. We further accelerate the process by applying a simple problem decomposition that we discuss in Section 5.4.

##### 5.3. Phase 2: Considering Activity and Voltage

At phase 2, we approximate the per-unit edge capacitances using the solution from phase 1 and reroute the nets to minimize an approximation of the total power. Since the utilization (and hence capacitance) corresponding to the routing solution of phase 2 may be different from phase 1, we heavily penalize any mismatch in our optimization.

###### 5.3.1. Formulation

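We compute the following quantities after phase 1.

(1) We define a new “effective” capacity $u'_e$ for each edge $e$ as $u'_e = \sum_{i} \sum_{t \in \mathcal{T}_i} a_{te}\, x^*_{it}$, where $x^*$ is the value of the routing solution from phase 1.

(2) We define the new per-unit capacitance $c'_e$ of each edge $e$ from $C^*_e$, the value of the edge capacitance from the solution found in phase 1.

With these definitions, the formulation of phase 2 is the following integer linear program:

$$\begin{aligned}
\min\ & \sum_{i=1}^{N} \sum_{t \in \mathcal{T}_i} \alpha_i V_i^2 \Big( \sum_{e \in E} a_{te}\, c'_e \Big) x_{it} + M \sum_{i=1}^{N} s_i + M' \sum_{e \in E} o_e \\
\text{s.t.}\ & \sum_{t \in \mathcal{T}_i} x_{it} + s_i = 1, && \forall i, \\
& \sum_{i=1}^{N} \sum_{t \in \mathcal{T}_i} a_{te}\, x_{it} \le u'_e + o_e, && \forall e \in E, \\
& \sum_{i=1}^{N} \sum_{t \in \mathcal{T}_i} \mathrm{WL}_t\, x_{it} \le \beta\, \mathrm{WL}_0, \\
& 0 \le o_e \le u_e - u'_e, && \forall e \in E, \\
& x_{it} \in \{0, 1\},\ s_i \ge 0.
\end{aligned}$$

The first term in the objective expression is the sum of an estimate of the power of the nets, where $c'_e$ is the fixed approximate per-unit capacitance of an edge $e$ contained in route $t$ and is obtained using the solution of phase 1 as discussed before. The first set of constraints ensures that at most one route is selected per net; otherwise, a heavy penalty of $M$ is incurred if $s_i > 0$, and this is reflected in the second term of the objective function. The second set of constraints enforces the new utilization of each edge to be at most $u'_e + o_e$, where $o_e$ is a new variable which is heavily penalized by a large factor $M'$ in the objective function if $o_e > 0$. In other words, we highly penalize a rerouting that causes a larger edge utilization compared to phase 1. This in effect forces the routing process to keep the *mismatch* in the edge utilizations as small as possible, so that the capacitance (which is a function of utilization) remains close to that of phase 1. We also enforce $o_e \le u_e - u'_e$ in the fourth set of constraints to ensure that the edge utilization does not exceed the actual capacity $u_e$. Finally, the third set of constraints ensures that the increase in wirelength is bounded by the factor $\beta$ relative to the initial wirelength $\mathrm{WL}_0$, where $\mathrm{WL}_t$ is the wirelength of route $t$.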

###### 5.3.2. Solving Using Price and Branch

The solution procedure is quite similar to the one explained in Section 5.2 for phase 1. Here, we just note the differences. We denote the restricted master problem by (RMLP-P2). The dual of the restricted master problem is denoted by (D-RMLP-P2), and its solution consists of the dual variables corresponding to the first, second, and third sets of inequalities in the relaxed phase-2 formulation, respectively.

The initial set of candidate routes of each net is set to *all the candidate routes generated from phase 1*. This helps to quickly generate a high-quality solution for phase 2. It also ensures that the solution of phase 1 is included as a *feasible* solution in phase 2.

The pricing condition of phase 2 is again an inequality in the dual values of (D-RMLP-P2) and is used to define the weight of every edge of the GR grid.

##### 5.4. Decomposition

To accelerate solving the two-phase formulation, we apply a simple problem decomposition. We recursively divide the chip into a set of rectangular subregions while balancing the total number of nets that fall inside each subregion. We use the initial WL-optimized solution of [11] to guide this process. We stop when the number of nets in each subregion is at most 3000, a limit which we determined empirically for the benchmarks from the ISPD2008 suite [15] used in our experiments.
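A balanced recursive bisection of this kind can be sketched as below. The sketch makes several simplifying assumptions that are not taken from the text: each net is represented by a single point (for example the center of its bounding box), the cut direction alternates between x and y, and each cut is placed at the median so that both halves hold roughly the same number of nets. Nets that cross subregion boundaries are not modeled here.

```python
def partition_nets(net_centers, max_nets=3000):
    """Recursively bisect nets until every group has at most `max_nets` nets.

    net_centers: list of (net_id, x, y) tuples (one representative point per net).
    Returns a list of groups, each a list of net ids.
    """
    def split(items, axis):
        if len(items) <= max_nets:
            return [[net_id for net_id, _, _ in items]]
        ordered = sorted(items, key=lambda it: it[1 + axis])   # sort along cut axis
        mid = len(ordered) // 2                                # median cut balances halves
        return split(ordered[:mid], 1 - axis) + split(ordered[mid:], 1 - axis)

    return split(list(net_centers), 0)
```

With the limit of 3000 nets quoted above, a design with 10,000 nets would be cut twice, giving four groups of 2500 nets each, and the resulting subproblems can then be handed to separate workers.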

Each subproblem is then defined as one rectangular subregion with the set of nets assigned to it. If a net passes through multiple subregions, we fix its terminal location on the subregion boundary to the one given by the initial WL-optimized solution. This allows each subproblem to be solved independently, without the hassle of later connecting the segments of a route in adjacent subregions. The subproblems are then solved in parallel (in a single pass) to get the final solution. Figure 8 shows an example.

Even though in our decomposition each subproblem is in effect assigned a low or high voltage level, it is possible that the nets assigned to it have different supply levels. For example, a high-voltage net may just pass through a subproblem in a low-voltage island, or a net with a level converter (which will have portions of high and low voltage levels after net decomposition) may fall in a high-voltage island.

Please note, the main difference between our decomposition procedure and [6] is the use of the initial WL-optimized solution to fix the terminal locations on the subregion boundaries and thus avoid later connecting adjacent subproblems.

Overall this decomposition is extended from PGRIP [16], but we make use of our initially provided global routing solution for more effective decomposition to determine the fixed terminal locations on the boundaries for independent and parallel processing of the subproblems.

#### 6. Simulation Results

##### 6.1. Benchmark Instances

In order to test our solution procedure and determine whether significant power savings were possible without increasing wirelength, we modified known benchmarks to include multisupply voltages. Modifying the benchmarks required us to generate timing data and power data and to place level converters. We implemented the procedure of [2] to generate voltage islands for two voltage levels (low and high). The procedure required a sequential netlist with gate-level delay and power models.

*Timing Modeling*. We assumed the locations of the sequential elements in the ISPD 2008 benchmarks using the following procedure. First, we obtained a directed acyclic graph (DAG) representation of the benchmarks from the variation provided by the ISPD 2006 placement benchmarks [17]: starting from the designated primary inputs, we traversed the netlist in the forward direction until reaching the primary outputs. To identify sequential elements, we also assumed that nets with more than 50 terminals are clock trees.
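The two steps of this procedure can be sketched as follows. This is a minimal illustration, not the authors' code: the netlist is assumed to be available as a fanout map from each cell to the cells it drives, and the function names are hypothetical. Only the 50-terminal threshold comes from the text.

```python
from collections import deque
from typing import Dict, List, Set


def clock_nets(net_terminals: Dict[str, List[str]], threshold: int = 50) -> Set[str]:
    """Nets with more than `threshold` terminals are treated as clock trees."""
    return {net for net, terms in net_terminals.items() if len(terms) > threshold}


def forward_reachable(fanout: Dict[str, List[str]], primary_inputs: List[str]) -> List[str]:
    """Breadth-first forward traversal from the primary inputs.

    Returns the cells in visit order; cells without fanout (for example,
    primary outputs) end the traversal along their branch.
    """
    seen = set(primary_inputs)
    order = []
    queue = deque(primary_inputs)
    while queue:
        cell = queue.popleft()
        order.append(cell)
        for nxt in fanout.get(cell, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order
```

Cells attached to a net returned by `clock_nets` would then be marked as sequential elements.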

We then assumed that the delay of each cell (or node in the DAG) is proportional to its size (for unit load) where the unit delay was assumed to be of the inverter of the 45 nm library [7] used in this work. We considered loading in our cell delay modeling to be proportional to the cell size which was also given in the placement benchmarks.

*Power Modeling*. We randomly and uniformly generated the activity factors of each net to be between 0.1 and 0.9. The 45 nm library used in this work contained information about the total capacitance (area, fringe, and coupling) for each of the 8 metal layers. We used the method described in Section 5 to extract piecewise linear model for and for each of the 8 metal layers. For each metal layer, we considered the minimum wire size given in the library. To map edge utilization to spacing, we assumed the length of each edge of the GR grid to be ; for a given utilization we assumed the maximum spacing between the routes mapped to the same GR edge.
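As a small illustration of the activity factor generation, the sketch below draws one factor per net uniformly from the range 0.1 to 0.9 and combines it with a supply voltage into a per-segment weight. The weight uses the standard quadratic dependence of dynamic power on supply voltage; the exact power metric of (1) may contain further terms. The function names, the fixed seed, and the voltage values in the usage note are assumptions for illustration only.

```python
import random
from typing import Dict, List


def activity_factors(net_names: List[str], seed: int = 0) -> Dict[str, float]:
    """Draw one activity factor per net, uniformly between 0.1 and 0.9."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible (assumption)
    return {net: rng.uniform(0.1, 0.9) for net in net_names}


def segment_weight(alpha: float, vdd: float) -> float:
    """Power weight of a route segment: activity factor times Vdd squared."""
    return alpha * vdd ** 2
```

With hypothetical supply levels of 0.8 V and 1.0 V and the same activity factor, a low-voltage segment carries about 64% of the weight of a high-voltage one, which is why the position of a level converter along a route affects the power of the net.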

*Level Converter Placement*. After voltage island generation, we needed to decide the locations of the level converters (LCs). (The procedure in [2] did not specify these locations.) For simplicity, we inserted the LCs on the initial WL-optimized solution that was taken from [11]. The LCs were inserted for any net that had a source terminal driving one or more sink terminals. The procedure minimized the number of LCs and placed them as close as possible to the sink terminals, subject to the available whitespace. The whitespace inside each global bin was derived by evaluating (both) the placement and GR variations of the ISPD benchmarks.

##### 6.2. Level Converter Placement

In our first experiment, we report the result from our level converter placement algorithm for the nets that contained a level converter (had a source terminal in island with fanout terminals in islands). We consider the following case in our experiment: we routed all the nets using the initial wirelength-optimized solution of NTHU-Route2.0 [11]. We solve our formulation to obtain the level converter locations subject to the area density constraints. We consider the obtained results as the base case for power comparison in our second experiment.

Recall that the placement of level converters can impact the power of each route by decomposing it into multiple segments, where each segment has a high or low supply level. Using (1), we compute the total power of the nets which need level conversion. This includes the power of the level converters and of the different route segments of the decomposed nets after inserting the level converters.

Table 1 reports our power comparison results. For each benchmark, we report the total number of nets and the number of nets which require level conversion in columns 2 and 3, respectively. The total number of level converters in our case is given in column 4. This number is larger than the one in column 3, indicating that for some nets it may be better to add extra LCs and place them closer to the sink terminals, which reduces the portion of the route that is driven by the high voltage and thus saves power. In column 5, we report the power of ([11] + LC) for the nets including the ones with level conversion. We use these power numbers as the base case for our next experiment. Finally, the wall clock time of the level converter placement (indicated by WCPU) is given in column 6. As can be seen, this step runs very quickly.

##### 6.3. Power-Aware Global Routing

In this experiment, we used the initial WL-optimized solution of [11], and after fixing the locations of the LCs, we applied net decomposition (as described in Section 3.1). We then solved two IPs, corresponding to the formulations given in phase 1 and phase 2, for each subproblem using CPLEX 12.0 [18]. The number of subproblems is listed in column 5 of Table 2 (indicated by SP number) and ranged from 130 to 670 among the benchmarks. These IPs were solved on the computer-aided engineering (CAE) grid at the University of Wisconsin-Madison. Each machine had 2 GB of memory. All IPs were submitted to HTCondor [19], which manages the computers in a shared environment. HTCondor then assigned the jobs to the available machines for parallel processing.

Table 2 reports the number of nets, decomposed nets, and LCs in columns 2, 3, and 4, respectively. We then applied our power-driven GR procedure using a wirelength degradation factor of , so *no* wirelength degradation was allowed.

We then compared three routing solutions: (i) the initial WL-optimized solution of [11]; (ii) the solution after applying phase 1, obtained by solving the formulation ; (iii) the solution by further applying phase 2, obtained by solving followed by .

For each case, we report the wirelength (WL), the total capacitance (, where is defined in (2)), given in units , and the GR power metric from (1), excluding the constant portions of the expression.

The results are reported in Table 2 in columns 6 to 14. For the initial solution, we report the wirelength of the NTHU-R2.0 routes that have been augmented with the extra via-only segment(s) to connect the LC(s) to the original routes. (As a result, there is slight increase in wirelength compared to the numbers reported in the work [11].) For the solutions of phase 1 and phase 2, we report only the percentage *improvement* in WL, , and , all with respect to the initial solution.

As can be seen, applying phase 1 of the power-reduction heuristic results in significant saving of 8.77% in . Recall, the savings are solely due to capacitance reduction (as can be seen from the higher improvement rate in compared to ). By further applying phase 2, we see additional improvement in (on average 16.70%). The improvement in is slightly larger than phase 1, even though phase 1 solely focuses on optimizing . This is because we start phase 2 by including all the candidate routes generated from phase 1. Notice that in both phase 1 and phase 2 there is an improvement (reduction) in WL compared to . It is important to note that no extra overflow was introduced in the power-optimized solutions.

In our simulations, we explicitly bounded the runtime of phase 1 and phase 2: for all benchmarks, the wall clock runtime was limited to 30 min for phase 1 and 40 min for phase 2. The number of processors used for parallel processing of the subproblems was upper bounded by the number of subproblems, for example, up to 130 simultaneously processed jobs for benchmark adaptec1. The exact number of parallel jobs is not known. It depended on the number of free machines in our computational grid (which in turn depended on the number of grid users when the simulations ran) and on HTCondor’s internal procedure for scheduling jobs to available resources, which considers factors such as user priority and past usage history. Furthermore, HTCondor’s resource management policy ensured that each machine ran at most one job at a time, so the machines were solely dedicated to solving our subproblems while we used them.

In an ideal situation (i.e., a grid which can support simultaneous runs of all the subproblems), the wall clock time of our tool would be 30 min and 40 min for phases 1 and 2, respectively, for a total of 70 min for each benchmark. Note that, unlike PGRIP [16], our decomposition procedure creates *independent* subproblems, so no communication between the subproblems is needed.
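The ideal-case figure generalizes to a simple upper bound when fewer machines than subproblems are available. The sketch below is an estimate under stated assumptions, not a measured result: each subproblem is assumed to use its full time limit in both phases, each machine runs one job at a time, and the machines stay available throughout. The function name is hypothetical.

```python
import math


def wall_clock_bound(num_subproblems: int, num_machines: int,
                     phase1_min: int = 30, phase2_min: int = 40) -> int:
    """Upper bound (in minutes) on the wall clock time for both phases."""
    if num_machines < 1:
        raise ValueError("at least one machine is required")
    # Independent subproblems are processed in successive waves of num_machines jobs.
    waves = math.ceil(num_subproblems / num_machines)
    return waves * (phase1_min + phase2_min)
```

With 130 subproblems and 130 machines, the bound is the 70 min of the ideal case; with only 65 machines, it doubles to 140 min. The bound scales this way only because the subproblems are independent and need no synchronization.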

#### 7. Conclusions

We proposed a formulation for minimizing an interconnect power metric in global routing for designs with multiple supply voltages. Power minimization is performed after an initial wirelength-optimized solution is obtained. We presented a mathematical formulation which considers power saving opportunities by reducing the area, fringe, and congestion-dependent coupling capacitances at each metal layer, while accounting for the activity and supply voltage of each route segment. We showed significant savings in the power metric for global routing without any degradation in wirelength or overflow.

#### References

1. R. S. Shelar and M. Patyra, “Impact of local interconnects on timing and power in a high performance processor,” in *ACM International Symposium on Physical Design*, pp. 145–152, 2010.
2. R. L. S. Ching, E. F. Y. Young, K. C. K. Leung, and C. C. N. Chu, “Post-placement voltage island generation,” in *Proceedings of the International Conference on Computer-Aided Design (ICCAD '06)*, pp. 641–646, November 2006.
3. L. Guo, Y. Cai, Q. Zhou, and X. Hong, “Logic and layout aware voltage island generation for low power design,” in *Proceedings of the Asia and South Pacific Design Automation Conference*, pp. 666–671, January 2007.
4. H. Shojaei, T.-H. Wu, A. Davoodi, and T. Basten, “A Pareto-algebraic framework for signal power optimization in global routing,” in *Proceedings of the 16th ACM/IEEE International Symposium on Low-Power Electronics and Design (ISLPED '10)*, pp. 407–412, August 2010.
5. W.-H. Liu, Y.-L. Li, and K.-Y. Chao, “High-quality global routing for multiple dynamic supply voltage designs,” in *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD '11)*, pp. 263–269, November 2011.
6. T.-H. Wu, A. Davoodi, and J. T. Linderoth, “GRIP: scalable 3D global routing using integer programming,” in *Proceedings of the 46th ACM/IEEE Design Automation Conference (DAC '09)*, pp. 320–325, July 2009.
7. Nangate 45 nm open cell library, 2008, http://www.nangate.com/.
8. C. Barnhart, E. L. Johnson, G. L. Nemhauser, M. W. P. Savelsbergh, and P. H. Vance, “Branch-and-price: column generation for solving huge integer programs,” *Operations Research*, vol. 46, no. 3, pp. 316–329, 1998.
9. D. G. Jørgensen and M. Meyling, “A branch-and-price algorithm for switch-box routing,” *Networks*, vol. 40, no. 1, pp. 13–26, 2002.
10. G. Dantzig and P. Wolfe, “Decomposition principle for linear programs,” *Operations Research*, vol. 8, pp. 101–111, 1960.
11. Y.-J. Chang, Y.-T. Lee, and T.-C. Wang, “NTHU-route 2.0: a fast and stable global router,” in *Proceedings of the International Conference on Computer-Aided Design (ICCAD '08)*, pp. 338–343, November 2008.
12. E. W. Dijkstra, “A note on two problems in connexion with graphs,” *Numerische Mathematik*, vol. 1, no. 1, pp. 269–271, 1959.
13. M. Pan and C. C. N. Chu, “FastRoute 2.0: a high-quality and efficient global router,” in *Proceedings of the Asia and South Pacific Design Automation Conference*, pp. 250–255, January 2007.
14. J. A. Roy and I. L. Markov, “High-performance routing at the nanometer scale,” *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 27, no. 6, pp. 1066–1077, 2008.
15. “ISPD 2008 global routing contest and benchmark suite,” http://www.sigda.org/ispd2008/contests/ispd08rc.html.
16. T.-H. Wu, A. Davoodi, and J. T. Linderoth, “A parallel integer programming approach to global routing,” in *Proceedings of the 47th Design Automation Conference (DAC '10)*, pp. 194–199, June 2010.
17. ISPD 2006 placement contest and benchmark suite.
18. CPLEX Optimization, *Using the CPLEX Callable Library, Version 9*, Incline Village, Nev, USA, 2005.
19. M. J. Litzkow, M. Livny, and M. W. Mutka, “Condor - a hunter of idle workstations,” in *Proceedings of the 8th International Conference on Distributed Computing Systems*, pp. 104–111, 1998.