`VLSI DesignVolume 2012 (2012), Article ID 395260, 14 pageshttp://dx.doi.org/10.1155/2012/395260`
Research Article

## A New Length-Based Algebraic Multigrid Clustering Algorithm

Department of Electrical and Computer Engineering, University of Calgary, 2500 University Drive NW Calgary, AB, Canada T2N 1N4

Received 21 December 2011; Revised 24 February 2012; Accepted 25 February 2012

Copyright © 2012 L. Rakai et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Clustering algorithms have been used to improve the speed and quality of placement. Traditionally, clustering focuses on the local connections between cells. In this paper, a new clustering algorithm that is based on the estimated lengths of circuit interconnects and the connectivity is proposed. In the proposed algorithm, first an a priori length estimation technique is used to estimate the lengths of nets. Then, the estimated lengths are used in a clustering framework to modify a clustering technique based on algebraic multigrid (AMG), that finds the cells with the highest connectivity. Finally, based on the results from the AMG-based process, clusters are made. In addition, a new physical unclustering technique is proposed. The results show a significant improvement, reductions of up to 40%, in wire length can be achieved when using the proposed technique with three academic placers on industry-based circuits. Moreover, the runtime is not significantly degraded and can even be improved.

#### 1. Introduction

Clustering is usually employed during the large-scale placement problems encountered in today’s circuits, to speed up the placement process and improve the solution quality. The clustering algorithms used today are based on finding small groups of cells with high connectivity and putting each of them in a cluster. This scheme has proven effective, to a large extent, as there is a high correlation between the cells’ connectivity and the lengths of the nets that connect them [1, 2]. Hence, by clustering cells that are close to one another, the lengths of the nets between them, and hence the total wire length will be reduced.

In this paper, a new perspective on how to cluster cells is proposed, where clustering decisions are made based on estimates of the lengths of nets connecting the cells. If a net is estimated to be short, then its cells are expected to be physically closer to each other, and the cells can be put in a cluster. The opposite is true for a net which has a high estimated length.

The proposed algorithm consists of three phases. In the first phase, a state-of-the-art length estimation technique is used to estimate the lengths of individual nets before placement.

Accurate length estimates are then used to direct the clustering decisions. These length estimates are used in the second phase to build a length-based proximity matrix. Then, it is proposed to use the proximity matrix in an AMG-based clustering framework to find clusters of the circuit. The AMG-based clustering finds seed cells and assigns scores based on the whole proximity matrix, not only the local connections. Hence, it can be more effective in finding a more globally optimal clustering configuration.

In the third phase, clusters are formed, and an initial placement is performed on the new clustered circuit. The initial placement solution is then refined in the unclustering phase, where a new technique for unclustering is proposed.

The major contributions of this paper are as follows:(i)using estimated lengths when deciding which cells should be clustered;(ii)designing and implementing a length-based AMG clustering algorithm;(iii)proposing and implementing an unclustering refinement algorithm;(iv)improving the runtime of the placement process;(v)illustrating the effectiveness of multilevel length-based AMG clustering.

The rest of this paper is organized as follows. In Section 2, a literature review of existing net length estimation methods, an introduction to AMG, and a background of clustering techniques are presented. In Section 3, the proposed length-based clustering algorithm is described in detail. The proposed model efficacy is examined and validated by several experiments in Section 4. Conclusions are given in Section 5.

#### 2. Background

In this section, a literature review of the three main components of the algorithm are given. As these components are distinct subjects not all normally used in the context of VLSI placement, the literature review is divided into three distinct sections: length estimation (2.1), AMG (2.2), and finally, clustering algorithms (2.3).

##### 2.1. Net Length Estimation Techniques

Several a priori length estimation techniques, for example [36], have been proposed that try to estimate the length of individual nets. Older techniques, such as [712], can only estimate the average length of a set of nets or the distributions of net lengths.

In [6], a variable referred to as the intrinsic shortest path length (ISPL) is developed and used to estimate the individual net lengths. Although the estimation results obtained using this technique are exceptional, this approach is not very useful for modern mixed-size circuits since it only considers cells with unit area and nets with degree two.

Different properties of nets and cells are modeled by several variables in [3]. These variables are then used to make a third-order polynomial model for length estimation. The estimation results are well correlated for relatively small circuits that only include standard cells. However, none of these variables considers the effects of macro blocks on net lengths, and so the estimation results for mixed-size circuits are unreliable. In [5], this technique is further studied using a quadratic polynomial. Three new variables are proposed to account for the effects of macro blocks. The estimation results for modern mixed-size circuits show around improvement over those obtained using [3].

Some applications of a priori net length estimation techniques are introduced in [4, 13, 14]. A variable called mutual contraction, is proposed and used for net length estimation, in [13, 14]. This variable is then used in placement. The estimated net lengths are utilized to quantify the quality of potential clusters. In [4], another application of preplacement length estimation is proposed where the negative effects of clustering are corrected based on the estimation results.

##### 2.2. Algebraic Multigrid

A challenging and frequently encountered problem in various domains is to solve a large system of linear equations as in where is a large, sparse square matrix, is a vector containing the unknown variables, and is the known right-hand side vector. When is very large, solving this system of equations directly becomes very expensive, in general. In the circuit placement domain, this equation arises from minimizing a quadratic length objective where the dimensions of are on the order of millions.

The algebraic multigrid (AMG) technique uses a multilevel framework to approximately solve (1), hence reducing the computational cost. Several AMG algorithms have been developed [1521]. Each one of these methods is suited for a specific type of problem where the matrix has certain properties. General methods such as [15, 17, 20] have sufficient convergence conditions and can easily be adapted to more problems.

A typical AMG technique consists of three steps: coarsening, direct solution, and interpolation. Coarsening is used to decrease the size of the matrix by keeping the essential elements of it and disregarding the nonessential elements. Once the matrix size has been adequately reduced, the second step starts. In this step, the reduced problem is solved exactly using a direct method. In the final step, interpolation, the solution obtained after the direct method step is projected back onto the sequence of larger systems of equations obtained during coarsening to achieve a final solution.

Another difference between the AMG framework and other direct methods is that in AMG, instead of solving for directly, the difference or the error, , between the initial guess for and the actual value of is driven to zero. This is accomplished by solving the residual equation for level , where is the residual that is calculated as . A multilevel AMG scheme is illustrated in Figure 1. In this figure, in each level of coarsening, a restriction matrix, , is calculated. This matrix contains fewer rows than columns. To reduce the size of the equation matrix at level , , is premultiplied by and postmultiplied by , the interpolation matrix, and the equation matrix at level , is obtained [16]. Thus, has smaller dimensions than . At the lowest level, , an exact solution for is found, that is, as shown in the figure. In the interpolation stage, the error is propagated to the higher levels using the appropriate interpolation matrix. Finally, using the calculated error in the highest level, the solution approximation is updated: . The relaxation steps shown in the figure refer to using a fast, iterative method to improve the solution at the current level.

Figure 1: A multilevel AMG scheme.
##### 2.3. Clustering Algorithms

In different stages of VLSI physical design, clustering algorithms are used to reduce the sizes of circuits. Several clustering algorithms have been proposed to handle the modern large-size circuits, for example, [1, 2, 2226]. Most placement techniques, such as the ones that competed in the ISPD 2006 [27] placement contest, use clustering algorithms.

Clustering algorithms such as heavy-edge matching [22], Edge Coarsening [28], FirstChoice [26], and PinEC [23] are similar in how they choose and form clusters. In these algorithms, a cell is chosen randomly to become the seed for a cluster. A seed refers to the first cell of a cluster. Once a seed has been chosen, a cell that measures the highest in connectivity with the seed is added to the cluster. These algorithms are easy to implement, and the runtimes for them are typically low. However, as no comparisons between potential clusters are performed, the quality of the results can be low.

In an effort to improve results, algorithms, such as edge-separability [24], fine-granularity clustering (FGC) [25], best-choice clustering [1], and Net Cluster [2], first prepare a set of potential clusters and then either refine the potential clusters or only finalize potential clusters that have high scores.

In [29, 30], a clustering algorithm for wire length-driven placement, called SafeChoice, is proposed where a condition is used to check if potential clusters will degrade the placement solution. If a cluster passes the check, it is referred to as a safe cluster, and the cluster is finalized.

In [31], a linear-time AMG-based clustering technique is proposed. This AMG-based algorithm uses only the connectivity matrix of a circuit to form clusters.

#### 3. The Proposed Length-Driven Multilevel Placement Framework

##### 3.1. The Proposed Length Estimation-Based AMG Clustering Algorithm

The main objective of the placement phase in the physical design of circuits is to reduce the total wire length of the circuit. Clustering algorithms are normally used during placement to improve the total wire length and runtime. However, clustering is performed based on circuit connectivity and not the wire length. In this paper, an algorithm that performs clustering based on estimated wire lengths is proposed. The novelty of the algorithm is in using the estimated lengths instead of connectivities in determining the best cells to be clustered.

The algorithm has five main stages: preplacement length estimation, proximity matrix construction, AMG-based coarsening, cluster seed cell and score assignment, and final cluster formation, as summarized in Algorithm 1.

Algorithm 1: The high-level flow of the AMG-LE clustering algorithm.

###### 3.1.1. Preplacement Length Estimation

In an estimation technique, first a set of model parameters needs to be calculated. In this work, the parameters used or developed in [5] are employed as these have been shown to be a comprehensive set of parameters that are well suited to the mixed-size circuits used today. These parameters include local net characteristics, such as the half perimeter of the cells of each net and global characteristics, such as the number of degree-two nets in the design.

Once the model parameters have been selected, an estimation technique should be used to fit the parameters into a model. The model parameters in [5] are used to fit a quadratic model. Any terms of the resulting model which have small coefficients, ineffective terms, are pruned from the model. The remaining terms, effective terms, are used to make the estimation model and calculate the length estimates.

It is worth noting that any other individual net length estimation technique could be used in this step. The model in [5] is selected as it includes several factors such as macro cells but is independent of the placer to be used allowing for broader application of this work.

To better illustrate the algorithm flow, a small example circuit is presented in Figure 2. In this figure, the nets are annotated with the estimated lengths. The estimated lengths are found using a simplified version of the technique in [5]. Due to the small size of the example circuit, only the variables associated with the cell dimensions and net degree are used. These lengths will be used in the following sections to perform clustering using the proposed AMG-LE algorithm.

Figure 2: An example circuit with nets annotated with estimated lengths. The dimensions of all cells are except cells 4, 7, and 9 that are , and cell 5 which is .
###### 3.1.2. Proximity Matrix Construction

To perform circuit clustering, the circuit’s net list should be represented using a matrix. Normally, this matrix is the connectivity matrix of the circuit which shows the total number of weighted connections between two cells. In this paper, a new matrix, called the proximity matrix, is designed which shows how close two cells are predicted to be. The proximity matrix is later used for the AMG-based clustering.

It is shown in [16] that for the best AMG performance the matrix of (1), which in this case represents the net list, should be an M-matrix, that is, a symmetric, positive-definite matrix with positive diagonal and nonpositive offdiagonal elements. In the proposed AMG-based clustering technique, a proximity matrix , which is an M-matrix, is constructed. The rows and columns of represent the cells of the circuit. The element shows the connectivity between cell and cell . Each is determined as follows: consider two cells, and , which are treated as points in AMG, and let be the set of nets, , that contain both cells and . Then, element is calculated as where represents the inverse of the estimated length of a net and is defined as where denotes the estimated length of a net obtained from the model described in Section 3.1.1. The inverse of the estimate is used because nets whose estimated lengths are small have a high proximity and vice versa. Using the formulation in (2), any nonzero offdiagonal element of the proximity matrix is equal to the negative sum of the inverses of the estimated lengths of nets between and , and any diagonal element is equal to the sum of all the inverses of nets connected to . This matrix is a positive-semidefinite M-matrix which is highly desirable for AMG. In addition, multiterminal nets and multiple connections between cells are handled by (2).

As an example, the proximity matrix for the example circuit in Figure 2 is given in (4). For any two cells and that are not connected, the entry is equal to zero. For any two cells that are connected, the entry is equal to the negative sum of the inverse of estimated lengths of the nets connecting them. As an example, cells 5 and 6 are connected by two nets each with estimated length of six. Therefore, . Finally, the diagonal elements are the negative sum of the off-diagonal elements in the corresponding row of the matrix. For example, .

###### 3.1.3. AMG-Based Coarsening on the Proximity Matrix

Once the proximity matrix has been constructed, the coarsening algorithm in [31] is used to reduce and construct and the associated restriction matrix . The coarsening can be loosely thought of as a heuristic for selecting a maximally independent subset of cells which are the most significant in . The significance of each cell is measured by the number of cells that strongly connect to it. A parameter, , is used in the definition of the strength of a connection. Cell is strongly connected to cell if where, is the element of , and is the number of cells in level . The relationship between the parameter and the amount coarsening performed is complex. If is close to one, more aggressive coarsening will be performed. However, more aggressive coarsening can overly simplify the reduced matrix and the associated reduced circuit. An interpretation of the strength of connection criteria in the context of the proposed proximity matrix is that if a cell is in several nets with small estimated length it is likely to be a part of the reduced circuit.

###### 3.1.4. Seed Cell and Cluster Score Assignment

After coarsening the proximity matrix, the cells that are selected to form the reduced circuit become seed cells used for clustering. The cells which are not selected to form the reduced circuit are considered to be clustered with the selected seed cells. This is the same technique used in [31]. In this technique, the entries of the AMG interpolation matrix are used to rank each seed cell that a non-selected cell connects to. A large entry in the interpolation matrix means the seed cell is representative of the non-selected cell in the reduced circuit.

The selected seed cells for the example in Figure 2 are cells , and 7. Therefore, the interpolation matrix has three columns, one for each seed cell with the first column corresponding to cell 2, and so forth. The number of rows is nine, which is the total number of cells and row 1 corresponds to cell 1, and so forth. The interpolation matrix is shown in Figure 3(a). Each seed cell interpolates from itself directly, so entries , and are all one, where is the element of . The other non-zero entries represent the interpolation weight of non-selected cells to seed cells. As an example, cell 4 is connected to seed cells 2 and 5. The interpolation weights of cell 4 with seed cells 2 and 5 are given by entries and , respectively. To better visualize the seed cell and cluster score assignment using the interpolation matrix, a pictorial representation of is presented in Figure 3(b). In this figure, the seed cells are shown with a bold border. Each non-selected cell has an interpolation weight to the seed cells that they connect to. These interpolation weights annotate arrows pointing from non-selected cells to seed cells in Figure 3(b).

Figure 3: The interpolation matrix shown in matrix form and pictorial form for the example circuit in Figure 2. In the pictorial representation, the selected seed cells are shown in bold, and arrows are annotated with the interpolation weights of non-selected cells.
###### 3.1.5. Final Cluster Formation

The final step of the clustering algorithm is to cluster non-selected cells to the seed cell which they interpolate from most strongly. A cluster is finalized as long as the area of the cluster is not greater than five times the average standard cell area and its interpolation weight is more than 0.9. The cluster area is restricted to prevent the formation of macro-sized clusters which require special treatment in placement algorithms. The interpolation weight threshold is set high to ensure that only the best quality clusters are formed. Each cluster meeting the constraints is given a physical shape in the clustered circuit with a height equal to the maximum height of all of its cells and a width equal to the sum of the widths of its cells.

The final clusters for the example in Figure 2 are shown with dashed lines in Figure 4. The clusters are formed by clustering each non-selected cell with the seed cell that it has the maximum interpolation weight with. Assuming each cluster has an area of less than or equal to five times the average standard cell area, the clusters are finalized.

Figure 4: The final clusters for the example circuit in Figure 2 shown by dashed lines.
##### 3.2. The Proposed Length-Driven Unclustering Technique

The description of a clustering algorithm alone is not enough to fully specify the implementation of a clustering algorithm used for placement. Just as important is the algorithm for unclustering, or determining the locations of cells within a cluster after the cluster’s location has been determined. The description of this step is rarely mentioned in the literature of clustering algorithms.

In this work, a length-driven unclustering algorithm is proposed. The chief benefit of using the proposed technique is that a legal solution is preserved after unclustering. Global placement algorithms often produce legal solutions before performing detailed placement. A new unclustering technique which preserves the legality of the globally-placed solution improves the runtime in multilevel placement. With the legality restriction, the problem of unclustering reduces to ordering the cluster’s cells. The proposed technique finds the ordering by solving a relaxed length-driven minimization and using the result to determine the ordering of the cells. In addition, the combination of preserving legality and restricting cluster areas means that the row in which the unclustered cells will be placed is also preserved. Consequently, the vertical components of the lengths can be ignored. This helps to reduce the disruption in the wire length before and after unclustering which is a significant problem for multilevel placement as discussed in [32].

The locations of cells, , in cluster are determined by minimizing the quadratic matrix length objective where is a weighted connectivity matrix between the cells of the cluster, that is, , and is defined as where is the degree of net , that is, the number of cells in . The vector contains the weighted horizontal cell locations of cells outside of the cluster that each clustered cell connects to, that is, and defined as where is the location of the cell or the cluster that a cell belongs to. The scalar represents the length of all the nets not involving the cells in and can be ignored. After minimizing (6), the cells are inserted into the area occupied by the cluster in the order of their locations in . An illustration of the length-driven unclustering is given in Figure 5. In Figure 5(a), a clustered placement is given. The highlighted cluster is the focus of the example and contains three cells (not shown in Figure 5(a)). The cluster is connected to three other cells via degree-two nets, shown by dashed lines in the illustration. The cells contained in the cluster are shown in shades of grey in Figure 5(b). The three external connections as well as the two internal connections are used to determine the non-zero values in the matrix . In this example all of the values for are equal to 2, the degree of each net. The locations of the cells not in the cluster, shown in white, are used in forming the vector . The locations of the cluster’s cells in Figure 5(b) are determined by minimizing (6). These locations are used to determine the order of the cells inside of the cluster’s area to complete the unclustering, illustrated in Figure 5(c). Because the unclustered cells remain inside of the cluster’s area, the placement remains legal.

Figure 5: An illustration of the length-driven unclustering technique.

This technique does not take advantage of the locations of already unclustered cells as it iterates through the clusters. However, this also means that the technique is not sensitive to the order in which the clusters are visited. Furthermore, because the clusters are restricted to have an area of less than five times the average standard cell area, the additional benefit of using the locations of already unclustered cells and a good ordering of the clusters is expected to be small.

#### 4. Experimental Results

Several experiments have been carried out to verify the efficacy of the proposed clustering algorithm. These experiments are performed using the ICCAD04 benchmark circuits [33] released by IBM. The detailed statistics of these circuits are given in Table 1. In columns 2 and 3 of this table the number of nets and cells in each circuit are given, respectively. In columns 4 and 5, the maximum net degree, , and maximum cell degree, , are given.

Table 1: Statistics of ICCAD04 benchmarks.

To better show the scalability of the proposed clustering algorithm, the same experiments are also performed with the ISPD05 benchmark suite [34]. The statistics of these benchmark circuits are presented in Table 2. The benchmark circuit bigblue4 is not included in the experiments due to the memory limitations of the testing platform.

Table 2: Statistics of ISPD05 benchmarks.
##### 4.1. Multilevel Clustering Results Using the Proposed AMG-LE Technique

To evaluate the proposed clustering algorithm, its effectiveness in the context of multilevel circuit placement using multiple placers is illustrated. The experiments are performed using three high-quality academic placers: Capo 10.5 [35], Fastplace 3.0 [36], and mPL6 [37]. Fastplace and mPL are analytical placers and have clustering algorithms embedded inside their placement algorithms. To better evaluate the performance of the proposed technique, the internal clustering of these placers are disabled in the clustering experiments. If the internal clustering is enabled, the results will have noise and it will not be clear if the proposed clustering technique or the internal clustering is the cause of any benefits. Comparisons of the proposed technique with existing clustering techniques, including the technique used internally by Fastplace 3.0 and mPL6, are presented in Section 4.3. Capo is a partitioning-based placer and does not use internal clustering. Capo is, however, non-deterministic so the average of ten runs of each experiment are reported when results for Capo are given. The experiments are performed on a 64-bit dual core AMD opteron system running Linux with 8 GB of RAM. In addition, the parameter in (5) is set to 0.1 in the experiments.

Wire length and runtime results of the experiment are presented in Table 3(a) and 3(b), respectively. In both tables, Column 1 identifies the circuit that is placed. Columns 2–4, 5–7, and 8–10 show the results for Capo, Fastplace, and mPL, respectively. Each placer is used in three modes: baseline without clustering, one-level AMG-LE clustering with length-driven unclustering, and two-level AMG-LE clustering with length-driven unclustering. The columns showing one- and two-level AMG-LE are given in terms of the percentage improvement over the baseline. In this representation, positive percentages represent improvements.

Table 3: Placement wire length and runtime results comparing one- and two-level AMG-LE clustering with length-driven unclustering to baseline placements without using clustering with three placers on the ICCAD04 benchmark suite.

The results in Table 3(a) show that AMG-LE is effective at improving wire length on average for all placers using one or two levels of clustering. A maximum improvement of 38.1% is achieved using one-level AMG-LE and Fastplace. Using one-level AMG-LE, every circuit for Capo, 16 circuits for Fastplace, and 10 circuits for mPL have improved wire length when compared to the baseline. When two-level AMG-LE is used, every circuit for Capo and Fastplace, and 12 circuits for mPL are improved.

The runtime results show that using the one-level scheme for mPL and the two-level scheme for all placers double the total runtime. For a placer, such as Fastplace, the increase in runtime may be more acceptable because it requires much less time than other placers. However, the increase for Capo and particularly for mPL are less acceptable. It should be noted that mPL does not offer an option to perform only detailed placement requiring full global and detailed placement to be performed between each level, which accounts for the significant increases compared to the other two placers. The experiment performed in the following section will illustrate how to improve the runtime results while maintaining, and even improving, the wire length results.

An interesting result arising from the experiment is that the placements resulting from using the AMG-LE framework are more correlated with the length estimates than the baseline placements. The correlation improvement is small, with a maximum improvement of 6% for Capo using two-level AMG-LE, but does suggest that the estimates are directing the placement.

In order to assess the scalability of the proposed clustering and unclustering algorithms, the same experiments are performed on the ISPD05 circuits. These circuits are bigger than the ICCAD04 benchmarks and have significantly larger maximum net degree. The results are tabulated in Table 4. The version of Capo used in the experiment could not place the adaptec4 benchmark on the test platform so its entries in Table 4 are not included. In general, the results resemble those of Table 3 with Capo and Fastplace obtaining significant wire length improvement for both one- and two-level AMG-LE. Meanwhile, only a negligible average improvement is achieved by mPL. The runtime increases for all placers with mPL having the largest increase in runtime, which is not surprising given that mPL must perform global placement in addition to detailed placement at each level. Capo has a more modest increase in runtime compared to the other two placers, on average. In this case, the placements for mPL are nearly identical whether one or two levels of AMG-LE clustering are performed. This means that the effects of the second level of clustering are almost entirely removed by performing global placement on the unclustered circuit.

Table 4: Placement wire length and runtime results comparing one- and two-level AMG-LE clustering with length-driven unclustering to baseline placements without using clustering with three placers on the ISPD05 benchmark suite.
##### 4.2. Length-Driven Unclustering as a Detailed Placer

In this section, the same multilevel experiments are performed with the difference of not performing any placement apart from placing the bottommost clustered circuit. The circuit is placed at the bottom level using the global and detailed placement functions of each placer (if the two are distinguished). As a result, the placement at the bottom level is legal, that is, free from overlaps. To improve upon the runtime results given in the previous section, it is proposed to only use the length-driven unclustering technique as the only refinement between levels. The placement will preserve its legality using the technique mentioned in Section 3.2. The results of this experiment are given in Table 5.

Table 5: Placement wire length and runtime results comparing one- and two-level AMG-LE clustering with length-driven unclustering without detailed placement to baseline placements without using clustering with three placers on the ICCAD04 benchmark suite.

The format of the Tables 5(a) and 5(b) is the same as Table 3. It can be seen that all placers’ wire lengths are improved on average using either one- or two-level AMG-LE. The best wire length improvement is again for Fastplace with one-level AMG-LE on ibm06 with a 35.8% improvement. Using one-level AMG-LE 16, 15, and 14 circuits improve for Capo, Fastplace, and mPL, respectively. Improvement is observed in 14, 16, and 15 circuits for Capo, Fastplace, and mPL, respectively, when using two-level AMG-LE. When compared to the results in Section 4.1, the average wire length for Capo and Fastplace degrades while mPL significantly improves. This improvement is because mPL has no option to perform just detailed placement. It performs global and detailed placement using the clustered placement as the starting point. During this process, the clusters become dispersed. Therefore, when the proposed length-driven unclustering is used as a detailed placer, the benefits of clustering and length-driven unclustering are preserved.

The runtime results show significant improvements over the results in Section 4.1. When compared to the baseline, Capo is not significantly changed while Fastplace is degraded by 19% and 66% for one- and two-level AMG-LE. The increases in runtime for Fastplace may be tolerated because of its comparatively low runtimes. However, mPL runtimes are improved significantly with more improvement seen with two-level AMG-LE, nearly cutting the average runtime by half.

In order to assess the scalability of the proposed length-driven unclustering as a detailed placer, the same experiments are performed on the ISPD05 circuits. The results are tabulated in Table 6. The results follow similar trends as those of Table 5. All placers achieve wire length improvements and significantly improved running times compared to the results of Table 4 which are obtained by using each placers’ detailed placement algorithm in between clustering levels. In this case, Capo achieves an overall runtime improvement compared to the baseline with and average improvements in wire length when performing one and two levels of AMG-LE clustering, respectively. mPL achieves substantial improvements in wire length with modest increases in average running time. Finally, for Fastplace, significant average wire length improvements are observed for both one and two levels of AMG-LE clustering, while the runtime has degraded which is acceptable considering the relatively low runtimes of Fastplace.

Table 6: Placement wire length and runtime results comparing one- and two-level AMG-LE clustering with length-driven unclustering without detailed placement to baseline placements without using clustering with three placers on the ISPD05 benchmark suite.

In the end, the circuit designers can decide which of the placement flows presented in Sections 4.1 and 4.2 is preferred for their application. Both improve wire length but to varying degrees depending on the choice of placer. When using a placer like Capo or Fastplace, the best wire length is obtained by an increase in runtime, while mPL achieves the best results with the fast detailed placement flow. If runtime is the most urgent concern, the proposed scheme of using length-driven unclustering as a detailed placer is the best option.

##### 4.3. Comparison to Existing Clustering Algorithms

The proposed AMG-LE clustering algorithm with length-driven unclustering is compared to other existing clustering algorithms in this section. Each technique is evaluated in terms of after placement wire length and total runtime using the mPL6 placer [37]. The clustering algorithms compared to are heavy-edge matching (HEM) [22], best-choice (BC) [1], and the AMG clustering (AMGC) algorithm in [31]. It is worth mentioning that best-choice is the clustering algorithm used internally by Fastplace 3.0 and mPL6. Placement is first performed without any clustering to establish a baselline. Then, each of the algorithms is used to produce a clustered circuit which is placed by mPL. The clustered placement is then unclustered and detailed placement is performed by mPL or by using length-driven unclustering as a detailed placer. Results of the experiment are given in Table 7.

Table 7: Placement wire length and runtime results comparing one-level of existing clustering algorithms and AMG-LE with and without length-driven unclustering to baseline placements without using clustering with mPL6 on the ICCAD04 benchmark suite.

The percentage improvement for AMG-LE in a regular clustered placement flow, as described in Section 4.1, is given under the subheading Regular. The column labeled Fast Detailed refers to the results using only length-driven unclustering as a detailed placer, described in Section 4.2. In terms of wire length, the regular AMG-LE scheme is competitive with the existing algorithms performing better in some circuits and worse in others. On the other hand, the runtime for regular AMG-LE degrades more than the other clustering techniques. This is because the runtime includes the time to perform pre-placement length estimation and the algorithm is implemented in the MATLAB environment Therefore, the runtime can be way more competitive if the procedure is implemented in a more efficient way. However, the fast detailed variant of AMG-LE is a clear winner in terms of average wire length and runtime improvement.

#### 5. Conclusions

This paper presents a new clustering technique by utilizing length estimates in an algebraic multigrid-based clustering technique. In addition, a physical unclustering technique is proposed with the benefits of reducing wire length and preserving legality. Because of its legality-preserving nature, the technique eliminates the need to use detailed placement between levels in the framework. These techniques are shown to be effective in reducing wire length and can also be used to benefit runtime.

#### Acknowledgments

This research is supported by the Natural Sciences and Engineering Research Council of Canada and Alberta Innovates Technology Futures. This research has been enabled by the use of computing resources provided by WestGrid and Compute/Calcul Canada.

#### References

1. C. Alpert, A. Kahng, G. J. Nam, S. Reda, and P. Villarrubia, “A semi-persistent clustering technique for vlsi circuit placement,” in International Symposium on Physical Design (ISPD '05), pp. 200–207, April 2005.
2. J. Li, L. Behjat, and A. Kennings, “Net cluster: a net-reduction-based clustering preprocessing algorithm for partitioning and placement,” IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems, vol. 26, no. 4, pp. 669–679, 2007.
3. S. Bodapati and F. N. Najm, “Prelayout estimation of individual wire lengths,” IEEE Transactions on VLSI Systems, vol. 9, no. 6, pp. 943–958, 2001.
4. A. Farshidi, L. Behjat, L. Rakai, and B. Fathi, “A pre-placement individual net length estimation model and an application for modern circuits,” Integration, vol. 44, no. 2, pp. 111–122, 2011.
5. B. Fathi, L. Behjat, and L. M. Rakai, “A pre-placement net length estimation technique for mixed-size circuits,” in ACM/IEEE Workshop on System Level Interconnect Prediction (SLIP '09), pp. 45–52, July 2009.
6. A. B. Kahng and S. Reda, “Intrinsic shortest path length: a new, accurate a priori wirelength estimator,” in IEEE/ACM International Conference on Computer-Aided Design (ICCAD '05), pp. 173–180, November 2005.
7. W. E. Donath, “Placement and average interconnection lengths of computer logic,” IEEE Trans Circuits Syst, vol. 26, no. 4, pp. 272–277, 1979.
8. W. E. Donath, “Wire length distribution for placements of computer logic,” Ibm Journal of Research and Development, vol. 25, no. 2-3, pp. 152–155, 1981.
9. M. Feuer, “Connectivity of random logic,” IEEE Transactions on Computers, vol. C-31, no. 1, pp. 29–33, 1982.
10. T. Hamada, C. K. Cheng, and P. M. Chau, “A wire length estimation technique utilizing neighborhood density equations,” IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems, vol. 15, no. 8, pp. 912–922, 1996.
11. H. T. Heineken and W. Maly, “Standard cell interconnect length prediction from structural circuit attributes,” in IEEE Custom Integrated Circuits Conference, pp. 167–170, May 1996.
12. M. Pedram and B. Preas, “Interconnection length estimation for optimized standard cell layouts,” in IEEE International Conference on Computer-Aided Design (ICCAD '89), pp. 390–393, November 1989.
13. B. Hu and M. Marek-Sadowska, “Wire length prediction based clustering and its application in placement,” in 40th Design Automation Conference, pp. 800–805, June 2003.
14. Q. Liu, B. Hu, and M. Marek-Sadowska, “Wire length prediction in constraint driven placement,” in International Workshop on System Level Interconnect Prediction, pp. 99–105, April 2003.
15. A. Brandt, “Algebraic multigrid theory: the symmetric case,” Applied Mathematics and Computation, vol. 19, no. 1–4, pp. 23–56, 1986.
16. W. L. Briggs, V. E. Henson, and S. F. McCormick, A Multigrid Tutorial, SIAM, Philadelphia, Pa, USA, 2nd edition, 2000.
17. S. McCormick, Ed., Multigrid Methods, vol. 5 of Frontiers in Applied Mathematics, SIAM, Philadelphia, Pa, USA, 1987.
18. J. Ruge and K. Stüben, Multigrid Methods, chapter 4, SIAM, Philadelphia, Pa, USA, 1987.
19. K. Stüben, “Algebraic multigrid (AMG): experiences and comparisons,” Applied Mathematics and Computation, vol. 13, no. 3-4, pp. 419–451, 1983.
20. U. Trottenberg, C. Oosterlee, and A. Schüller, Multigrid, Academic Press, 2001.
21. P. Wesseling, An Introduction to Multigrid Methods, Pure and Applied Mathematics Series, John Wiley and Sons, New York, NY, USA, 1992.
22. C. J. Alpert, J. H. Huang, and A. B. Kahng, “Multilevel circuit partitioning,” IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems, vol. 17, no. 8, pp. 655–667, 1998.
23. A. Caldwell, A. Kahng, and I. Markov, “Improved algorithms for hypergraph bi-partitioning,” in Asia and South Pacific Design Automation Conference (ASP-DAC '00), pp. 661–666, 2000.
24. J. Cong and S. K. Lim, “Edge separability-based circuit clustering with application to multilevel circuit partitioning,” IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems, vol. 23, no. 3, pp. 346–357, 2004.
25. B. Hu and M. Marek-Sadowska, “Fine granularity clustering for large scale placement problems,” in International Symposium on Physical Design, pp. 67–74, April 2003.
26. G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar, “Multilevel hypergraph partitioning: applications in vlsi domain,” IEEE Transactions on VLSI Systems, vol. 7, no. 1, pp. 69–79, 1999.
27. G. J. Nam, “ISPD 2006 placement contest: benchmark suite and results,” in International Symposium on Physical Design (ISPD '06), p. 167, April 2006.
28. G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar, “Multilevel hypergraph partitioning: application in vlsi domain,” in 34th Design Automation Conference, pp. 526–529, June 1997.
29. J. Z. Yan, C. Chu, and W. K. Mak, “Safechoice: a novel clustering algorithm for wirelength-driven placement,” in ACM International Symposium on Physical Design (ISPD '10), pp. 185–192, March 2010.
30. J. Z. Yan, C. Chu, and W. K. Mak, “Safechoice: a novel approach to hypergraph clustering for wirelength-driven placement,” IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems, vol. 30, no. 7, pp. 1020–1033, 2011.
31. L. Rakai, L. Behjat, S. Martin, and J. Aguado, “An algebraic multigrid-based algorithm for circuit clustering,” Applied Mathematics and Computation, vol. 218, no. 9, pp. 5202–5216, 2012.
32. A. B. Kahng and Q. Wang, “Implementation and extensibility of an analytic placer,” IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems, vol. 24, no. 5, pp. 734–747, 2005.
33. S. N. Adya, S. Chaturvedi, J. A. Roy, D. A. Papa, and I. L. Markov, “Unification of partitioning, placement and floorplanning,” in IEEE/ACM International Conference on Computer-Aided Design (ICCAD '04), pp. 550–557, November 2004.
34. G. J. Nam, C. J. Alpert, P. Villarrubia, B. Winter, and M. Yildiz, “The ispd2005 placement contest and benchmark suite,” in International Symposium on Physical Design (ISPD '05), pp. 216–220, April 2005.
35. J. Roy and I. Markov, Partitioning-driven Techniques for VLSI Placement. Handbook of Algorithms for VLSI Physical Design Automation, CRC Press, 2008.
36. N. Viswanathan, M. Pan, and C. Chu, “Fastplace 3.0: a fast multilevel quadratic placement algorithm with placement congestion control,” in Asia and South Pacific Design Automation Conference (ASP-DAC '07), pp. 135–140, January 2007.
37. T. F. Chan, J. Cong, J. R. Shinnerl, K. Sze, and M. Xie, “Mpl6: enhanced multilevel mixed-size placement,” in International Symposium on Physical Design (ISPD '06), pp. 212–214, April 2006.