Abstract

We propose using the conjugate gradient method to efficiently solve the thermal resistance model in the HotSpot thermal floorplanning tool. Iterative conjugate gradient solvers are well suited to traditional sparse linear systems. We also define the relative sparse matrix, which arises in iterative thermal floorplanning within a Simulated Annealing (SA) framework; the iterative method for relative sparse matrices can also be applied to other iterative framework algorithms. Experimental results show that our incremental iterative conjugate gradient solver runs approximately 11x faster than the LU decomposition method on case ami49, and the speedup ratio curve shows that the acceleration of our iterative conjugate gradient solver increases with the number of modules.

1. Introduction

As chip speeds continue to improve, chip power and temperature increase accordingly, which makes thermal problems increasingly critical to physical design quality. To cope with rising chip temperatures, thermal-aware floorplanning is widely used in physical design to avoid hotspots. In [1, 2], the authors employ a hierarchical thermal model to decrease the modules' maximum temperature in chip physical design. They use the -tree [3, 4] to represent the floorplan/placement and optimize it with Simulated Annealing. Their hierarchical thermal floorplan/placement includes two critical steps: first, the modules are clustered by power density, and then the thermal linear system is solved with the Gauss–Seidel method. In [1, 2], the authors only present a theoretical comparison between the Gauss–Seidel method and the traditional LU decomposition solver based on loop iteration counts. They assume that the time complexity of the LU decomposition solver is $O(n^3)$ and compare the actual number of Gauss–Seidel loop iterations with this bound, instead of measuring the actual program or solver CPU running time.

In [5], the authors compare program running times; this comparison is not conclusive because the program running time also includes other cost computations and the SA iterations. To compare the linear solvers more precisely, in this study we improve the timing method and add other preconditioning methods, such as SSOR preconditioning.

As noted in [8], exponentially increasing power densities in current designs, caused by aggressive technology scaling, have made temperature one of the primary design constraints, along with timing, area, and power. Apart from architectural techniques such as throttling for dynamic thermal management, many design techniques are adopted during the physical design stage to minimize power. In [8], the authors propose a practical methodology for better thermal management through floorplan modifications based on thermal hotspots obtained from dynamic simulations, without disturbing the logical connectivity information. The benefits of this methodology can be realized readily by performing the analysis early in the design cycle. Considering the lateral heat spreading enabled by better floorplanning, the methodology can also improve the placement of thermal sensors and extract additional performance by delaying their triggering.

In [9], with the continuing scaling of CMOS technology, on-chip temperature and thermally induced variations have become a major design concern. To effectively limit high temperatures in a chip equipped with a cost-effective cooling system, thermal-specific approaches, in addition to low-power techniques, are necessary at the chip design level. High hotspot temperatures and large thermal gradients are caused by high local power density and nonuniform power dissipation across the chip. To reduce the power density in hotspots, the authors proposed two placement techniques that spread the cells in hotspots over a larger area. Increasing the area occupied by a hotspot directly reduces its power density, which lowers the peak temperature and the thermal gradient. To minimize the overhead introduced in delay and dynamic power, they maintained the relative positions of the coupling cells in the new layout. They compared the proposed methods in terms of temperature reduction, timing, and area overhead with a baseline method that enlarges the circuit area uniformly. Their experimental results showed that the proposed methods achieve larger reductions in both peak temperature and thermal gradient than the baseline method, which, although it reduces the peak temperature in most cases, has little impact on the thermal gradient.

In [10], with the thermal effect, improper analog placements may degrade circuit performance because the thermal gradient can affect the electrical characteristics of thermally sensitive devices. To mitigate the thermal effect in analog layout design, thermally induced mismatches among matched devices must be reduced in addition to eliminating thermal hotspots. The study presented the major challenges that the chip thermal gradient poses for analog placement, introduced nonuniform and uniform thermal profiles and the corresponding placement configurations, surveyed key existing techniques for analog placement under both kinds of thermal profiles, and provided experimental results for analog placement with thermal consideration.

In [11], the authors developed a thermal-aware placer, ThermPL, that reduces both on-chip peak temperature and thermal gradient by combining thermal force and padding techniques with rough legalization in force-directed global placement. Thermal padding is first adopted to reduce local power density. To apply thermal force, the authors used a thermal gain basis to capture the temperature distribution of a placement quickly and accurately and to calculate the thermal contribution of cells efficiently based on thermal locality. They then used the proposed innate thermal force, assessed through thermal criticality and capability, to spread cells away from hotspots. With the thermal gain basis, ThermPL obtains the thermal profile of a placement with a maximum error of 0.65% compared with a commercial tool. Experimental results show that ThermPL reduces peak temperature and thermal gradient by 7% and 19% on average, respectively, with only 4.6% wirelength overhead.

In [12], with the thermal effect, improper analog placements may degrade circuit performance because the heat from power devices can affect the electrical characteristics of thermally sensitive devices. Little previous work considers the desired placement configuration between power devices and thermally sensitive devices that yields a better thermal profile and reduces thermally induced mismatches. The study first introduced the properties of a desired thermal profile for better thermal matching of matched devices. It then presented a thermal-driven analog placement methodology that achieves the desired thermal profile and the best device matching under that profile while satisfying the symmetry and common-centroid constraints. Experimental results on real analog circuits show that, among existing works, the proposed approach achieves the best analog circuit performance and accuracy with the least impact from the thermal gradient.

In this study, we embed the iterative conjugate gradient method into the HotSpot floorplanner to compare the actual CPU running times of different solvers under the same compiler and running environment.

The conjugate gradient solver was integrated into the thermal floorplanning tool HotSpot [13] alongside its existing LU decomposition solver, so that the running times of thermal floorplanning with the two solvers can be compared directly. The solver is selected by a command-line parameter while the running environment, including CPU, memory, GCC version, and compiler options, stays the same. Comparing the CPU times of the two solvers is more convincing than a theoretical analysis of loop iteration counts. We also use SSOR and Jacobi preconditioners to accelerate the conjugate gradient solver.

HotSpot's thermal-aware floorplanning uses the thermal model to compute block temperatures, and the temperature metric is combined with the area and wire length metrics; within the iterative SA framework algorithm, the HotSpot thermal model produces relatively sparse matrices. The HotSpot thermal floorplan can decrease the maximum block temperature by distributing the power density evenly, avoiding hotspots in the floorplanning step of physical design. We introduce an iterative method to solve this kind of relatively sparse linear system in the floorplan thermal model. This paper's contributions are as follows: (i) We define the relative sparse matrix, which lets an iterative sparse solver converge faster. (ii) We introduce the conjugate gradient iterative method into the HotSpot floorplan thermal model; this efficient algorithm reduces the running time by accelerating the linear solver in HotSpot.

This paper is organized as follows. Section 2 introduces the HotSpot floorplan flow. The relative sparse matrix definition and the thermal resistance model are given in Section 3 and Section 4, respectively. Section 5 presents the experimental results, and Section 6 gives conclusions and future research.

2. HotSpot Thermal Floorplan

VLSI floorplanning places the blocks on the silicon die without overlap; the floorplanning algorithm must satisfy the chip constraints while optimizing a cost that combines area, wire length, and temperature metrics. The temperature of a placement is obtained by solving the linear equations of the thermal model.

2.1. Introduction to HotFloorplan

Floorplanning/placement is a critical step in thermal-aware physical design. HotFloorplan is a thermal-aware tool that decreases module temperatures and avoids hotspots. It merges the thermal cost with the traditional area and wire length costs in an iterative Simulated Annealing algorithm, and computing the module temperatures inside this iterative loop is time-consuming.

We also use the HotSpot model to guide thermal floorplanning/placement: it computes the steady-state temperatures and module temperature statistics, and the resulting thermal cost is combined with the area and wire length costs for thermal-aware physical design.
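
As a minimal sketch of how the thermal cost joins the other two metrics, the C code below forms the weighted SA cost that Algorithm 2 writes as lambdaA A + lambdaT T + lambdaW W, assuming the thermal term is the maximum block temperature. The struct `flp_weights`, the functions `max_block_temp` and `flp_cost`, and any weight values are hypothetical illustrations, not HotFloorplan's actual code.

```c
#include <stddef.h>

/* Hypothetical weights for the three cost terms; real values are tuned per design. */
typedef struct {
    double lambda_area;   /* weight of the die area term (A) */
    double lambda_temp;   /* weight of the maximum block temperature term (T) */
    double lambda_wire;   /* weight of the wire length term (W) */
} flp_weights;

/* Return the largest entry of the block temperature vector T[0..n-1] (n >= 1). */
double max_block_temp(const double *T, size_t n)
{
    double tmax = T[0];
    for (size_t i = 1; i < n; i++)
        if (T[i] > tmax)
            tmax = T[i];
    return tmax;
}

/* Weighted SA cost: lambdaA * A + lambdaT * Tmax + lambdaW * W. */
double flp_cost(const flp_weights *w, double area, const double *T, size_t n,
                double wirelength)
{
    return w->lambda_area * area
         + w->lambda_temp * max_block_temp(T, n)
         + w->lambda_wire * wirelength;
}
```

For example, with weights (1.0, 2.0, 0.5), area 10.0, block temperatures {350.0, 362.5, 341.0}, and wire length 4.0, `flp_cost` returns 10.0 + 2.0 × 362.5 + 0.5 × 4.0 = 737.0. A real floorplanner may also normalize the three terms so that no single metric dominates; that step is omitted here.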

HotSpot builds a thermal model to compute block-level or grid-level temperatures on the chip. We introduce the conjugate gradient solver to accelerate the block-model thermal computation in the SA floorplanning algorithm.

3. Relative Sparse Matrix of Iterative Framework Algorithm

There are more sophisticated algorithms for solving sparse linear equations that avoid processing the zero entries of a sparse matrix [6, 7], such as iterative methods for linear systems. A relative sparse matrix is not strictly a sparse matrix and tends to be dense; "relatively sparse" means that only a few “interesting” entries differ between one matrix and another.

3.1. Relative Sparse Matrix

Relative sparse matrix definition: a matrix is sparse if only a few of its entries are nonzero. We define matrices $A_1$ and $A_2$ to be relatively sparse if the difference $A_2 - A_1$ is sparse; in other words, $A_2$ is a sparse matrix relative to $A_1$. Assume there are two linear systems, $A_1x = b$ and $A_2x = b$, solved in order within a sequential iterative framework algorithm: the first step solves $A_1x = b$, and the second step solves $A_2x = b$. For example, the Simulated Annealing algorithm solves $A_1x = b$ and $A_2x = b$ sequentially with the same vector $b$, while the matrix of the linear system changes from $A_1$ to $A_2$.

For such a sequence of relatively sparse matrices, iterative methods are used to solve the relatively sparse linear equations so that fewer iterations are needed for convergence. The detailed operations are as follows:

In the first step, the linear system $A_1x = b$ is solved from a trivial initial guess, giving the solution $x_1$; we then reuse $x_1$ as the initial estimate for the linear system $A_2x = b$. In the same way, the solution of $A_kx = b$ becomes the initial estimate for $A_{k+1}x = b$, sequentially.

For such a sequence of relatively sparse matrices, setting the initial estimate for $A_2x = b$ equal to $x_1$, the previous solution of $A_1x = b$, speeds up the convergence of the iterative method. Because the matrix changes only slightly from $A_1$ to $A_2$, $A_2$ is sparse relative to $A_1$ even though $A_1$ and $A_2$ are full matrices rather than sparse matrices in the traditional sense of entry density.

We call this relative sparse linear system computation the incremental updating solution method.

In the iterative algorithm, the solution from the previous iteration is preserved as an intermediate result and used as the initial estimate for the linear system in the current iteration.
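
The definition in Section 3.1 can be checked directly by counting the entries in which two consecutive matrices differ. This C sketch assumes dense row-major storage; the function names and the `max_fraction` threshold are hypothetical, since the text does not fix a numeric bound for "a few" changed entries.

```c
#include <stddef.h>

/* Count the entries where the n-by-n row-major matrices A1 and A2 differ by
 * more than tol, i.e., the nonzeros of A2 - A1. */
size_t count_changed_entries(const double *A1, const double *A2, size_t n,
                             double tol)
{
    size_t changed = 0;
    for (size_t k = 0; k < n * n; k++) {
        double d = A2[k] - A1[k];
        if (d > tol || d < -tol)
            changed++;
    }
    return changed;
}

/* Return 1 if A2 is sparse relative to A1, i.e., the fraction of changed
 * entries is at most max_fraction (a hypothetical, user-chosen threshold). */
int is_relatively_sparse(const double *A1, const double *A2, size_t n,
                         double tol, double max_fraction)
{
    size_t changed = count_changed_entries(A1, A2, n, tol);
    return (double)changed <= max_fraction * (double)(n * n);
}
```

For two fully dense 3 × 3 matrices that differ only in the symmetric pair of entries (2,3) and (3,2), `count_changed_entries` returns 2, so $A_2$ counts as sparse relative to $A_1$ with `max_fraction = 0.25` (2 ≤ 2.25) but not with 0.1. Neither matrix is sparse, which is exactly the situation the incremental updating solution method exploits.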

4. Relative Sparse Matrix in Thermal Floorplan of SA Framework Algorithm

4.1. Thermal Model Introduction

In VLSI floorplanning, the circuit modules are placed on the die by Simulated Annealing through random perturbations; each time Simulated Annealing generates a floorplan, we calculate the cost metrics: the die area, the wire length between circuit modules, and the maximum temperature of the circuit modules given by the thermal model.

Simulated Annealing is an iterative optimization algorithm, and the thermal model is evaluated inside the SA loop. Heat conduction in the die is complex [14], but it can be abstracted as a thermal resistance model [15]:

$$RT = P,$$

where $T$ and $P$ are the vectors of block temperatures and power consumption, respectively, and the thermal resistance matrix $R$ is square and symmetric. Once the circuit blocks are determined, the power vector $P$ does not change; it is a constant vector. The entries of $R$ change with the placement details, and $R$ is dense rather than sparse, but it matches the relative sparse matrix definition within the iterative Simulated Annealing framework algorithm.

The Simulated Annealing algorithm changes the placement only slightly from one stage to the next and then computes the new cost metrics. For example, moving a block from its location to an unused location changes only that block's position, so the thermal conduction between blocks and most block temperatures also change little. The updated thermal resistance matrix $R$ is still dense but has only a few “interesting” changed entries, so consecutive matrices $R_k$ and $R_{k+1}$ are relatively sparse: the new thermal resistance matrix $R_{k+1}$ is sparse relative to $R_k$.

4.2. LU Decomposition Solving Linear Equation

Solving linear equations: given a system of linear equations in the matrix form

$$Ax = b,$$

where the matrix $A$ and the vector $b$ are known, we solve for $x$. Suppose $A$ has been LUP decomposed such that $PA = LU$. The linear equations can then be transformed equivalently into the LU form

$$LUx = Pb.$$

The LU solver proceeds in two logical steps: (i) first, solve the lower triangular system $Ly = Pb$ for $y$; (ii) second, solve the upper triangular system $Ux = y$ for $x$.

The cost of solving a system of linear equations with LU decomposition is approximately $\frac{2}{3}n^3$ floating-point operations if the matrix $A$ has size $n$ [16].

LU decomposition is a direct method for solving linear equations.
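
The two-step LU procedure above can be sketched in C as follows. This is a textbook LUP factorization with partial pivoting on a dense row-major matrix, written for illustration; the names `lup_decompose` and `lup_solve` are hypothetical, and this is not HotSpot's own LU routine.

```c
#include <stddef.h>

/* In-place LUP decomposition of the n-by-n row-major matrix A with partial
 * pivoting. On return A holds L (unit diagonal, strictly below) and U (on and
 * above the diagonal); perm[i] is the original row now at row i, so PA = LU.
 * Returns 0 on success, -1 if a zero pivot shows A is singular. */
int lup_decompose(double *A, size_t *perm, size_t n)
{
    for (size_t i = 0; i < n; i++)
        perm[i] = i;
    for (size_t k = 0; k < n; k++) {
        /* pick the row with the largest |A[i][k]| as the pivot row */
        size_t piv = k;
        double best = 0.0;
        for (size_t i = k; i < n; i++) {
            double v = A[i * n + k] < 0 ? -A[i * n + k] : A[i * n + k];
            if (v > best) { best = v; piv = i; }
        }
        if (best == 0.0)
            return -1;
        if (piv != k) {                      /* swap whole rows k and piv */
            for (size_t j = 0; j < n; j++) {
                double t = A[k * n + j];
                A[k * n + j] = A[piv * n + j];
                A[piv * n + j] = t;
            }
            size_t t = perm[k]; perm[k] = perm[piv]; perm[piv] = t;
        }
        /* eliminate below the pivot, storing the multipliers in L */
        for (size_t i = k + 1; i < n; i++) {
            A[i * n + k] /= A[k * n + k];
            for (size_t j = k + 1; j < n; j++)
                A[i * n + j] -= A[i * n + k] * A[k * n + j];
        }
    }
    return 0;
}

/* Solve Ax = b from the output of lup_decompose: Ly = Pb, then Ux = y. */
void lup_solve(const double *LU, const size_t *perm, const double *b,
               double *x, size_t n)
{
    /* forward substitution with unit lower triangular L (y stored in x) */
    for (size_t i = 0; i < n; i++) {
        double s = b[perm[i]];
        for (size_t j = 0; j < i; j++)
            s -= LU[i * n + j] * x[j];
        x[i] = s;
    }
    /* back substitution with upper triangular U */
    for (size_t i = n; i-- > 0;) {
        double s = x[i];
        for (size_t j = i + 1; j < n; j++)
            s -= LU[i * n + j] * x[j];
        x[i] = s / LU[i * n + i];
    }
}
```

Because the factorization depends on every entry of the matrix, a direct solver must refactor after each SA move and gains nothing from the previous solution, even when only a few entries change; an iterative solver, by contrast, can start from that solution.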

4.3. Incremental Iterative Conjugate Gradient Solver and Convergence

There are many iterative linear solvers, including the conjugate gradient, Gauss–Seidel, and successive over-relaxation (SOR) methods. In this study, the conjugate gradient method is used to solve the floorplan thermal model.

4.3.1. Convergence of the Incremental Iterative Conjugate Gradient Solver

The conjugate gradient method converges if the matrix $A$ is symmetric and positive definite. The condition number associated with the linear equation $Ax = b$ gives a bound on how inaccurate the solution $x$ will be after approximation. The condition number of the matrix $A$ is the product of two operator norms:

$$\kappa(A) = \|A\|\,\|A^{-1}\|.$$

If $A$ is normal, then $\kappa(A) = |\lambda_{\max}(A)|/|\lambda_{\min}(A)|$ in the 2-norm, where $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ are the eigenvalues of $A$ with the largest and smallest moduli, respectively. The convergence of CG depends on the condition number of $A$, which for a symmetric positive-definite matrix equals $\lambda_{\max}/\lambda_{\min}$.

Denote the initial guess for $x$ by $x_0$. At the start of the SA algorithm we can assume $x_0 = 0$; after the first linear solve yields $x^{(1)}$, the conjugate gradient solver reuses this previous solution as the initial estimate for the next solve, $x_0 = x^{(1)}$, and so on. This incremental updating solution method accelerates solver convergence.
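
A standard error bound for CG, which this paper does not derive, makes the effect of reusing the previous solution explicit. With the error $e_k = x_k - x_*$ and the energy norm $\|e\|_A = \sqrt{e^{T}Ae}$, CG applied to a symmetric positive-definite $A$ satisfies:

```latex
\|e_k\|_A \;\le\; 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{k}\|e_0\|_A,
\qquad \kappa = \frac{\lambda_{\max}}{\lambda_{\min}} .
```

For large $\kappa$ the factor is about $\exp(-2k/\sqrt{\kappa})$, so the bound guarantees $\|e_k\|_A \le \epsilon$ after roughly $k \approx \tfrac{1}{2}\sqrt{\kappa}\,\ln(2\|e_0\|_A/\epsilon)$ iterations. Inheriting $x_0$ from the previous solve shrinks $\|e_0\|_A$ and therefore the logarithmic term, while preconditioning reduces the $\sqrt{\kappa}$ factor.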

The solution $x_*$ of $Ax = b$ is also the unique minimizer of the quadratic function

$$f(x) = \tfrac{1}{2}x^{T}Ax - x^{T}b, \qquad x \in \mathbb{R}^{n}.$$

This suggests taking the first basis vector $p_0$ to be the negative of the gradient of $f$ at $x = x_0$. The gradient of $f$ equals $Ax - b$, so starting from the initial guess $x_0$ we take $p_0 = b - Ax_0$.

The other basis vectors are chosen to be conjugate to the gradient, hence the name conjugate gradient method.

Let $r_k$ be the residual at the $k$th step:

$$r_k = b - Ax_k.$$

Note that $r_k$ is the negative gradient of $f$ at $x = x_k$, so the gradient descent method would move in the direction $r_k$. Here, we insist that the directions $p_k$ be conjugate to each other, that is, $p_i^{T}Ap_j = 0$ for $i \ne j$. We also require that the next search direction be built out of the current residual and all previous search directions, which is reasonable enough in practice.
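
Under these requirements the standard unpreconditioned CG recurrences are as follows; Algorithm 1 reduces to them when $z = r$:

```latex
\begin{aligned}
r_0 &= b - A x_0, \qquad p_0 = r_0,\\
\alpha_k &= \frac{r_k^{T} r_k}{p_k^{T} A p_k},\\
x_{k+1} &= x_k + \alpha_k p_k,\\
r_{k+1} &= r_k - \alpha_k A p_k,\\
\beta_k &= \frac{r_{k+1}^{T} r_{k+1}}{r_k^{T} r_k},\\
p_{k+1} &= r_{k+1} + \beta_k p_k.
\end{aligned}
```

Each iteration costs one matrix-vector product $Ap_k$ plus a few vector operations, and in exact arithmetic the single coefficient $\beta_k$ already makes $p_{k+1}$ $A$-conjugate to all previous directions, so the earlier directions need not be stored.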

4.3.2. Pseudocode of Conjugate Gradient Algorithm

The algorithm below solves $Ax = b$, where $A$ is a real, symmetric, positive-definite matrix. The input vector $x_0$ can be an approximate initial solution or 0.

The pseudocode of conjugate gradient solver is shown in Algorithm 1.

Require: symmetric positive-definite matrix A; right-hand side b; initial solution x0 (an approximate solution or 0); precision; optional preconditioner M
Ensure: approximate solution x of Ax = b
int Iter = 0;
double α, β, rp, rpold, rnorm;
vector x, r, p, q, z;
x = x0;
r = b - A x;
rnorm = (r, r);
while (rnorm > precision) do
  if using a preconditioner then
    z = M^(-1) r;   // Jacobi or SSOR preconditioner
  else
    z = r;          // no preconditioner
  end if
  rp = (r, z);
  if Iter == 0 then
    p = z;
  else
    β = rp / rpold;
    p = z + β p;
  end if
  q = A p;
  α = rp / (p, q);
  x = x + α p;
  r = r - α q;
  rnorm = (r, r);
  rpold = rp;
  Iter = Iter + 1;
end while

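
A runnable C sketch of Algorithm 1 follows, assuming a dense row-major matrix (the text describes the thermal matrix as dense) and offering only the Jacobi preconditioner. The names `pcg_solve`, `cg_matvec`, and `cg_dot` and the caller-supplied `work` buffer are hypothetical, not HotSpot's code. The `x` argument is both the initial estimate and the result, which is what the incremental updating solution method relies on.

```c
#include <stddef.h>

/* y = A x for a dense n-by-n row-major matrix A. */
static void cg_matvec(const double *A, const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double s = 0.0;
        for (size_t j = 0; j < n; j++)
            s += A[i * n + j] * x[j];
        y[i] = s;
    }
}

/* Inner product (u, v). */
static double cg_dot(const double *u, const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += u[i] * v[i];
    return s;
}

/* Preconditioned CG (Algorithm 1) for a symmetric positive-definite A.
 * x: initial solution on entry (0 or a previous solution), result on exit.
 * jacobi != 0 applies z = D^{-1} r; otherwise z = r (no preconditioner).
 * work: caller-supplied buffer of 4*n doubles.
 * Stops when (r, r) <= tol*tol or after max_iter iterations;
 * returns the number of iterations performed. */
int pcg_solve(const double *A, const double *b, double *x, size_t n,
              int jacobi, double tol, int max_iter, double *work)
{
    double *r = work, *p = work + n, *q = work + 2 * n, *z = work + 3 * n;
    double rpold = 0.0;
    int iter = 0;

    cg_matvec(A, x, q, n);                  /* r = b - A x */
    for (size_t i = 0; i < n; i++)
        r[i] = b[i] - q[i];

    while (cg_dot(r, r, n) > tol * tol && iter < max_iter) {
        for (size_t i = 0; i < n; i++)      /* z = M^{-1} r */
            z[i] = jacobi ? r[i] / A[i * n + i] : r[i];
        double rp = cg_dot(r, z, n);
        if (iter == 0) {
            for (size_t i = 0; i < n; i++)
                p[i] = z[i];
        } else {
            double beta = rp / rpold;
            for (size_t i = 0; i < n; i++)
                p[i] = z[i] + beta * p[i];
        }
        cg_matvec(A, p, q, n);              /* q = A p */
        double alpha = rp / cg_dot(p, q, n);
        for (size_t i = 0; i < n; i++) {
            x[i] += alpha * p[i];
            r[i] -= alpha * q[i];
        }
        rpold = rp;
        iter++;
    }
    return iter;
}
```

Passing the previous temperature vector in `x` implements the warm start; if that vector already meets the tolerance, the function returns 0 after computing only the initial residual.
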
4.3.3. Preconditioning of the Conjugate Gradient Method

In most cases, preconditioning is necessary to ensure fast convergence of the conjugate gradient method. The preconditioned conjugate gradient method applies CG to the preconditioned system

$$M^{-1}Ax = M^{-1}b,$$

where $M$ is a nonsingular matrix (symmetric positive-definite for CG) that approximates $A$ and for which $Mz = r$ is cheap to solve.

Jacobi preconditioning: the simplest preconditioner consists of just the diagonal of the matrix:

$$M_{ij} = \begin{cases} A_{ii}, & i = j,\\ 0, & \text{otherwise}. \end{cases}$$

This is known as the Jacobi preconditioner.

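The SSOR preconditioner, like the Jacobi preconditioner, can be derived from the coefficient matrix without any extra work. If the original symmetric matrix is decomposed as

$$A = D + L + L^{T}$$

into its diagonal, strictly lower, and strictly upper triangular parts, the SSOR matrix is defined as

$$M = (D + L)\,D^{-1}\,(D + L)^{T},$$

or, parametrized by the relaxation factor $\omega$,

$$M(\omega) = \frac{1}{2-\omega}\left(\frac{1}{\omega}D + L\right)\left(\frac{1}{\omega}D\right)^{-1}\left(\frac{1}{\omega}D + L\right)^{T}.$$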

Algorithm 1 also covers the preconditioned variant: its step z = M^(-1) r uses the Jacobi or SSOR matrix $M$ defined above.
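
The SSOR preconditioner can be applied without ever forming $M$. A minimal C sketch, assuming $\omega = 1$, a dense row-major symmetric $A$, and the hypothetical name `ssor_apply`, is shown below; it could stand in for the Jacobi step z = M^(-1) r of Algorithm 1. Since $M^{-1} = (D + L)^{-T} D\,(D + L)^{-1}$, the application is a forward triangular solve, a diagonal scaling, and a backward triangular solve.

```c
#include <stddef.h>

/* Solve M z = r for the SSOR preconditioner with omega = 1,
 * M = (D + L) D^{-1} (D + L)^T, where A = D + L + L^T is symmetric
 * (n-by-n, row-major). y is a caller-supplied buffer of n doubles. */
void ssor_apply(const double *A, const double *r, double *z, double *y,
                size_t n)
{
    /* forward solve (D + L) y = r */
    for (size_t i = 0; i < n; i++) {
        double s = r[i];
        for (size_t j = 0; j < i; j++)
            s -= A[i * n + j] * y[j];
        y[i] = s / A[i * n + i];
    }
    /* diagonal scaling y = D y */
    for (size_t i = 0; i < n; i++)
        y[i] *= A[i * n + i];
    /* backward solve (D + L)^T z = y; entry (i, j > i) of (D + L)^T is A[j][i] */
    for (size_t i = n; i-- > 0;) {
        double s = y[i];
        for (size_t j = i + 1; j < n; j++)
            s -= A[j * n + i] * z[j];
        z[i] = s / A[i * n + i];
    }
}
```

For $A = \begin{pmatrix} 4 & 1\\ 1 & 3 \end{pmatrix}$ and $r = (1, 2)^T$, this gives $z = (5/48,\ 7/12)^T$, and one can check that $(D + L)D^{-1}(D + L)^T z = r$. Like Jacobi it needs no setup, but each application costs about as much as one dense matrix-vector product.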

4.3.4. Inheriting the Initial Estimate Incrementally from the Previous Solution

The HotSpot thermal floorplan uses the iterative Simulated Annealing algorithm. Each SA move changes the locations of only one or two blocks, so most block temperatures change only slightly. If the initial estimate is inherited from the previous temperature result, fewer iterations are needed for convergence. This is why the SA framework floorplan algorithm employs the iterative conjugate gradient solver to accelerate the convergence of the thermal model.
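
To make the warm start concrete, the C sketch below uses a toy stand-in for the thermal matrix: a chain of blocks with conductances between neighbors and to ambient, which is symmetric positive definite. The chain model, the SA "move" (changing one inter-block conductance), and all names are hypothetical assumptions for illustration; HotSpot's block model is more detailed and, as noted above, dense. The point is the pattern in `solve_after_move`: only four matrix entries change, and the previous temperature vector is copied in as the CG initial estimate.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy thermal matrix: n blocks in a chain, conductance g[i]
 * between blocks i and i+1, and gamb from every block to ambient.
 * G (n*n, row-major) is symmetric positive definite when gamb > 0. */
static void build_chain(double *G, const double *g, double gamb, size_t n)
{
    memset(G, 0, n * n * sizeof *G);
    for (size_t i = 0; i < n; i++)
        G[i * n + i] = gamb;
    for (size_t i = 0; i + 1 < n; i++) {
        G[i * n + i] += g[i];
        G[(i + 1) * n + (i + 1)] += g[i];
        G[i * n + (i + 1)] = -g[i];
        G[(i + 1) * n + i] = -g[i];
    }
}

/* Plain CG on dense SPD G. x: initial guess on entry, solution on exit.
 * Stops when ||r||^2 <= tol^2; returns iterations, or -1 if out of memory. */
static int cg(const double *G, const double *b, double *x, size_t n,
              double tol, int max_iter)
{
    double *r = malloc(3 * n * sizeof *r);
    if (r == NULL)
        return -1;
    double *p = r + n, *q = r + 2 * n;
    double rr = 0.0;
    int k = 0;
    for (size_t i = 0; i < n; i++) {        /* r = b - G x, p = r */
        double s = 0.0;
        for (size_t j = 0; j < n; j++)
            s += G[i * n + j] * x[j];
        r[i] = b[i] - s;
        p[i] = r[i];
        rr += r[i] * r[i];
    }
    while (rr > tol * tol && k < max_iter) {
        double pq = 0.0, rr_new = 0.0;
        for (size_t i = 0; i < n; i++) {    /* q = G p */
            double s = 0.0;
            for (size_t j = 0; j < n; j++)
                s += G[i * n + j] * p[j];
            q[i] = s;
            pq += p[i] * s;
        }
        double alpha = rr / pq;
        for (size_t i = 0; i < n; i++) {
            x[i] += alpha * p[i];
            r[i] -= alpha * q[i];
            rr_new += r[i] * r[i];
        }
        double beta = rr_new / rr;
        for (size_t i = 0; i < n; i++)
            p[i] = r[i] + beta * p[i];
        rr = rr_new;
        k++;
    }
    free(r);
    return k;
}

/* One SA-like move: set the conductance of edge `moved` (between blocks
 * moved and moved+1) to new_g, which changes only four entries of G, then
 * solve G x = P starting from x_prev, the solution before the move.
 * G and g are updated in place; x_new receives the new solution. */
int solve_after_move(double *G, double *g, double gamb, const double *P,
                     const double *x_prev, double *x_new, size_t n,
                     size_t moved, double new_g, double tol)
{
    g[moved] = new_g;
    build_chain(G, g, gamb, n);                 /* rebuilt for simplicity */
    memcpy(x_new, x_prev, n * sizeof *x_new);   /* inherit previous solution */
    return cg(G, P, x_new, n, tol, (int)(10 * n));
}
```

Comparing the return value of `solve_after_move` with a cold solve (`x` initialized to zero) on the same moved matrix gives the iteration counts with and without the inherited estimate for a given problem; both runs converge to the same solution within the tolerance.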

The pseudocode of SA thermal floorplan algorithm with the conjugate gradient thermal solver is shown in Algorithm 2.

Require: cost(f): SA floorplan evaluation metric
Ensure: optimal floorplan
initialize the floorplan, its cost, and the block temperature vector T;
initialize the annealing temperature Ta and its schedule;
steps = 0;
// stop annealing if the temperature has cooled down enough or the maximum number of iterations has been tried
while (Ta >= cfg.Tcold && steps < cfg.Nmax) do
  n = cfg.Kmoves * flp->n_units;
  i = downs = rejects = 0;
  sumcost = 0;
  // try enough total or downhill moves per annealing temperature
  while ((i < 2 * n) && (downs < n)) do
    make a random move and update the floorplan data;
    solve the thermal model with CG, reusing the previous T as the initial solution;
    // cost of area (A), maximum temperature (Tmax), and wire length (W)
    new_cost = lambdaA * A + lambdaT * Tmax + lambdaW * W;
    if (new_cost < cost || rand_fraction() < exp(-(new_cost - cost) / Ta)) then
      // downhill moves are always accepted; uphill moves follow the Boltzmann probability
      if (new_cost < cost) then downs++; end if
      accept the move: cost = new_cost;
    else
      rejects++;   // keep the previous floorplan
    end if
    i++;
  end while
  // stop annealing if there are too few accepted moves
  if ((double)rejects / i > cfg.Rreject) then
    break;
  end if
  // annealing schedule
  Ta = Ta * cfg.Rcool;
  steps++;
end while

4.4. Stopping Criteria for the Iterative Solver

The residual at step $k$ is computed as follows:

$$r_k = b - Ax_k.$$

The iterations terminate as soon as the squared residual norm $(r_k, r_k) = \|r_k\|_2^2$ falls below the desired precision, as in Algorithm 1.

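The overall iteration count of the conjugate gradient solver in the floorplan is discussed in [17]. For standard model problems (discretized elliptic problems on $d$-dimensional grids, where $\kappa \in O(n^{2/d})$), CG needs $O(\sqrt{\kappa})$ iterations, which gives a total time of $O(n^{3/2})$ for two-dimensional problems and $O(n^{4/3})$ for three-dimensional problems; a good preconditioner lowers these costs by reducing $\kappa$.

In contrast, LU decomposition solves the linear equations in $O(n^3)$ time. The analysis and the experiments support the advantage of the incremental updating conjugate gradient method in the SA iterative framework algorithm of thermal floorplanning.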

5. Results of Experiments

The conjugate gradient algorithm is implemented in C and C++ and integrated into the open-source HotSpot floorplanner [15]. The HotSpot thermal floorplan uses the Simulated Annealing (SA) [18] optimization algorithm. The experiments run on Ubuntu on an Intel® Core™ i5-2300 CPU at 2.80 GHz with 12 GB of memory. The benchmarks are the MCNC [19] benchmark circuits. The block power traces are generated by a random function in a Perl script, with the power density of each block ranging from $10^5$ W/m² to $10^7$ W/m² [14].

Comparing the CPU times of two linear solvers within the same program is more convincing. The conjugate gradient algorithm was implemented in C++ and merged into HotFloorplan [15], and it is compared with HotFloorplan's default LU decomposition solver. The two linear solvers are switched by a command-line parameter, so the run environment, including CPU, memory, GCC version, and compiler options, is the same.
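
The measurement set-up described above (one binary, solver chosen on the command line, only solver time accumulated) can be sketched in C as follows. The option strings `lu` and `cg`, the solver signature, the placeholder `diag_solve`, and the `solver_timer` struct are all hypothetical; HotSpot's actual command-line options are not reproduced here. `clock()` measures processor time, which matches the CPU-time comparison in the text.

```c
#include <string.h>
#include <time.h>

/* A thermal solver solves A x = b for an n-by-n system (signature hypothetical). */
typedef void (*thermal_solver)(const double *A, const double *b, double *x, int n);

typedef struct {
    thermal_solver solve;
    double cpu_seconds;   /* CPU time spent inside the solver only */
    long calls;
} solver_timer;

/* Placeholder solver for a diagonal system, used only to exercise the timer. */
void diag_solve(const double *A, const double *b, double *x, int n)
{
    for (int i = 0; i < n; i++)
        x[i] = b[i] / A[i * n + i];
}

/* Pick the solver from a command-line value such as "lu" or "cg";
 * returns NULL for an unknown name. */
thermal_solver select_solver(const char *name, thermal_solver lu, thermal_solver cg)
{
    if (strcmp(name, "lu") == 0) return lu;
    if (strcmp(name, "cg") == 0) return cg;
    return NULL;
}

/* Call the selected solver and add only its CPU time to the statistics,
 * excluding the rest of the SA loop (moves, cost evaluation, and so on). */
void timed_solve(solver_timer *t, const double *A, const double *b, double *x, int n)
{
    clock_t start = clock();
    t->solve(A, b, x, n);
    t->cpu_seconds += (double)(clock() - start) / CLOCKS_PER_SEC;
    t->calls++;
}

/* Average CPU time per solve, the kind of per-call figure reported in Tables 1 to 3. */
double avg_solve_time(const solver_timer *t)
{
    return t->calls ? t->cpu_seconds / (double)t->calls : 0.0;
}
```

Timing only the solver call, rather than the whole program, separates the linear solver cost from the SA iterations and other cost computations that blur whole-program comparisons.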

Table 1 compares the run time (in seconds) of the conjugate gradient solver without preconditioning (“CG normal” in the table) with that of the LU decomposition solver on the MCNC cases. Without preconditioning, the conjugate gradient solver achieves an average speedup of 1.49x over the LU decomposition solver; the average time per solve is 0.00387 s for the LU decomposition solver (3.87E−03 in the table) and 0.00191 s for the conjugate gradient solver (1.91E−03 in the table).

Table 2 shows that the conjugate gradient solver with Jacobi preconditioning is faster than the LU decomposition solver, with an average speedup of 4.32x; its average time per solve is 0.000567 s (5.67E−04 in the table).

Table 3 shows that the conjugate gradient solver with SSOR preconditioning is slightly faster than the one with Jacobi preconditioning: it achieves an average speedup of 5.18x over the LU decomposition solver, with an average time per solve of 0.000473 s (4.73E−04 in the table), and a speedup of 11x on test case ami49. Among the three variants, the conjugate gradient solver with SSOR preconditioning achieves the best speedup over the LU decomposition solver.

Figure 1 shows the run times of the conjugate gradient solvers and the LU decomposition solver on the MCNC benchmarks. In the legend, CG is the conjugate gradient solver without preconditioning, Jacobi is the conjugate gradient solver with Jacobi preconditioning, and SSOR is the conjugate gradient solver with SSOR preconditioning. Without preconditioning, the conjugate gradient solver is slower than the LU decomposition solver on the two small cases and faster on the larger cases; with Jacobi or SSOR preconditioning, it is faster than the LU decomposition solver. Preconditioning is therefore important for the iterative conjugate gradient solver.

Figure 2 shows the run time ratio between the conjugate gradient solvers and the LU decomposition solver on the MCNC benchmarks, using the same legend as Figure 1. The ratio curve shows that the speedup of our iterative conjugate gradient solver increases with the number of modules.

Following a reviewer's suggestion, we also ran the larger GSRC benchmarks to test the scalability of our algorithm. Note, however, that the HotSpot floorplanner is designed for processor floorplans with a small number of modules and is not intended for large placement instances, so its running time on these benchmarks is relatively long. Table 4 shows that the conjugate gradient solver with SSOR preconditioning is faster than the LU decomposition solver, with an average speedup of 17x and a speedup of 24x on test case n300. Again, the conjugate gradient solver with SSOR preconditioning achieves the best speedup over the LU decomposition solver.

6. Conclusions and Future Work

The conjugate gradient solver is widely used for computations with large sparse matrices, and the HotSpot thermal floorplan can be accelerated by using this sparse iterative linear solver.

The experiments show that the iterative conjugate gradient solver is faster than the direct LU decomposition solver on the MCNC benchmark.

The relative sparse matrix theory could be applied to other iterative framework algorithms and extended further. Future work includes the following: (i) extending the iterative linear solver with other preconditioning methods; (ii) exploring sparse linear system theory to speed up our program, given the many theoretical advances in solving linear systems over the last two decades.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

Special thanks are due to Professor Zhuo Feng of Michigan Technological University for valuable technical discussions. This paper was supported by “the Fundamental Research Funds for the Central Universities.”