Mobile Information Systems

Volume 2018, Article ID 2921451, 8 pages

https://doi.org/10.1155/2018/2921451

## HotSpot Thermal Floorplan Solver Using Conjugate Gradient to Speed Up

School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China

Correspondence should be addressed to Zhonghua Jiang; moc.621@008hzj

Received 9 November 2017; Revised 5 February 2018; Accepted 20 February 2018; Published 5 April 2018

Academic Editor: Yuh-Shyan Chen

Copyright © 2018 Zhonghua Jiang and Ning Xu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

We propose using the conjugate gradient method to efficiently solve the thermal resistance model in the HotSpot thermal floorplan tool. The iterative conjugate gradient solver is suitable for traditional sparse linear systems. We also define the relative sparse matrix in the iterative thermal floorplan of the Simulated Annealing framework; the iterative method for relative sparse matrices can be applied to other iterative framework algorithms. The experimental results show that our incremental iterative conjugate gradient solver runs approximately 11x faster than the LU decomposition method for the ami49 case, and the ratio curve shows that the speedup grows as the number of modules increases.

#### 1. Introduction

With the constant improvement of chip speed, the power and temperature of a chip also increase. To cope with rising chip temperatures, thermal aware floorplanning is widely used in physical design to avoid hotspots on chips, which makes the thermal problem more critical for physical design quality. In [1, 2], the authors employ a hierarchical thermal model to decrease the modules’ maximum temperature in chip physical design. They use the $B^*$-tree [3, 4] to represent the floorplan/placement, together with the Simulated Annealing optimization algorithm. Their hierarchical thermal floorplan/placement includes two critical steps: first, they cluster the modules by power density, and then they use the Gauss–Seidel method to solve the thermal linear system. However, in [1, 2], the authors only give a theoretical comparison between the Gauss–Seidel method and the traditional LU decomposition solver in terms of loop iteration counts. They assume that the time complexity of the LU decomposition solver is $O(n^3)$ and compare the actual Gauss–Seidel loop iteration count with $n^3$ instead of measuring the CPU running time of the real program or solver.

In [5], the authors compare the program running time, but this comparison is not conclusive because the program running time includes other computation costs and the SA iteration time. To compare the linear solvers more directly, in this study we improve the timing method and add other preconditioning methods, such as the SSOR preconditioner.

As noted in [8], exponentially increasing power densities in current designs, caused by aggressive technology scaling, have made temperature one of the primary design constraints alongside timing, area, and power. Many design techniques are adopted during the physical design stage to minimize power, in addition to architectural techniques such as throttling for dynamic thermal management. The authors of [8] propose a practical methodology for better thermal management through floorplan modifications based on thermal hotspots obtained from dynamic simulations, without disturbing the logical connectivity information. The benefits of this methodology can be readily realized by performing the analysis early in the design cycle. It can also improve the placement of thermal sensors and yield additional performance from their delayed triggering, since better floorplanning accounts for lateral heat spreading.

As discussed in [9], with the continuing scaling of CMOS technology, on-chip temperature and thermally induced variations have become a major design concern. To effectively limit the high temperature in a chip equipped with a cost-effective cooling system, thermal-specific approaches, besides low-power techniques, are necessary at the chip design level. The high temperature in hotspots and large thermal gradients are caused by the high local power density and the nonuniform power dissipation across the chip. With the objective of reducing power density in hotspots, the authors proposed two placement techniques that spread cells in hotspots over a larger area. Increasing the area occupied by a hotspot directly reduces its power density, leading to a reduction in peak temperature and thermal gradient. To minimize the introduced overhead in delay and dynamic power, they maintained the relative positions of the coupling cells in the new layout. They compared the proposed methods, in terms of temperature reduction, timing, and area overhead, with a baseline method that enlarges the circuit area uniformly. The experimental results showed that the proposed methods achieve a larger reduction in both peak temperature and thermal gradient than the baseline method. The baseline method, although it reduces peak temperature in most cases, has little impact on thermal gradient.

As described in [10], improper analog placement may degrade circuit performance because the thermal gradient can affect the electrical characteristics of thermally sensitive devices. To mitigate the thermal effect in analog layout design, it is necessary to reduce thermally induced mismatches among matched devices in addition to eliminating thermal hotspots. The study presented the major challenges that the chip thermal gradient poses for analog placement, introduced nonuniform and uniform thermal profiles as well as the corresponding placement configurations, surveyed key existing techniques for analog placement under nonuniform and uniform thermal profiles, and provided experimental results for analog placement with thermal consideration.

In [11], the work developed a thermal aware placer, ThermPL, to abate both on-chip peak temperature and thermal gradient by developing thermal force and padding techniques cooperated with rough legalization in the force-directed global placement. Thermal padding is firstly adopted to reduce local power density. To make use of thermal force, the authors used the thermal gain basis to fast and accurately capture the temperature distribution of a placement and effectively calculate the thermal contribution of cells based on the thermal locality. Then, they utilized the proposed innate thermal force assessed through thermal criticality and capabilities to spread cells away from hotspots. With the thermal gain basis, ThermPL can efficiently obtain the thermal profile of placement with the maximum error of 0.65% compared with a commercial tool. Experimental results show that ThermPL can provide 7% and 19% reduction on average in peak temperature and thermal gradient, respectively, within only 4.6% wirelength overhead.

In [12], with the thermal effect, improper analog placements may degrade circuit performance because the thermal impact from power devices can affect electrical characteristics of the thermally sensitive devices. There is not much previous work that considers the desired placement configuration between power and thermally sensitive devices for a better thermal profile to reduce the thermally induced mismatches. The study first introduced the properties of a desired thermal profile for better thermal matching of the matched devices. It then presented a thermal-driven analog placement methodology to achieve the desired thermal profile and to consider the best device matching under the thermal profile while satisfying the symmetry and the common-centroid constraints. Experimental results based on real analog circuits show that the proposed approach can achieve the best analog circuit performance/accuracy with the least impact due to the thermal gradient, among existing works.

In this study, we embed the iterative conjugate gradient method into the HotSpot floorplan to compare the real CPU running time of different solvers under the same compiler and running environment.

We integrated the conjugate gradient solver into the thermal floorplan tool HotSpot [13] alongside its existing LU decomposition solver, so that the running time of the thermal floorplan can be compared between the two solvers. The solver is selected by a program command line parameter, and both solvers run in the same environment: the same CPU and memory, GCC version, and compiler options. Comparing the CPU time of the two solvers is more convincing than a theoretical analysis of loop iteration counts. We also use the SSOR and Jacobi preconditioners to accelerate the conjugate gradient solver.

The HotSpot thermal aware floorplan employs the thermal model to compute the blocks’ temperatures, and the temperature metric is combined with the area and wire length metrics. Within the iterative SA framework algorithm, the matrix of the HotSpot thermal model is a relative sparse matrix. The HotSpot thermal floorplan can decrease the maximum block temperature by distributing the power density evenly, avoiding hotspots in the floorplan step of physical design. We introduce an iterative method to solve this kind of relative sparse linear system in the thermal model of the floorplan. This paper’s contributions include the following:

(i) The relative sparse matrix is defined. It can speed up the convergence of the linear system solver through the iterative sparse method.

(ii) The conjugate gradient iterative method is introduced into the HotSpot floorplan thermal model. It is an efficient algorithm that reduces the running time by accelerating the linear solver in HotSpot.

This paper is organized as follows. Section 2 introduces the HotSpot floorplan flow. Relative sparse matrix definition and the thermal resistance model are given in Section 3 and Section 4, respectively. Section 5 shows the result of experiments, and conclusions and future research are given in Section 6.

#### 2. HotSpot Thermal Floorplan

In VLSI physical design, floorplanning places the blocks on the silicon die without overlap. The floorplan algorithm must obey the chip constraints while optimizing the cost of the area, wire length, and temperature metrics. The block temperatures are obtained by solving the linear equations of the thermal model.

##### 2.1. Introduction of Hot Thermal Floorplan

Floorplanning/placement is a critical step of thermal aware physical design. Hot floorplan is a thermal aware tool that decreases module temperatures and avoids the concentration of hotspots. It merges the thermal cost with the traditional area and wire length costs in an iterative Simulated Annealing algorithm, in which solving for the module temperatures is time-consuming.

We also use the HotSpot model to guide the thermal floorplan/placement by computing static temperatures and collecting the modules’ temperature statistics; the thermal cost is then integrated with the area and wire length costs to perform thermal aware physical design.

HotSpot builds a thermal model to compute the dynamic block or grid temperature on chip. We introduce the conjugate gradient solver to accelerate the thermal block model computation in the floorplan SA algorithm.

#### 3. Relative Sparse Matrix of Iterative Framework Algorithm

More sophisticated algorithms, such as iterative methods for linear systems, solve sparse linear equations while avoiding the processing of zero entries of the sparse matrix [6, 7]. A relative sparse matrix is not a strictly sparse matrix and may even be dense; relative sparsity means that only a few “interesting” entries differ between one matrix and another.

##### 3.1. Relative Sparse Matrix

Relative sparse matrix definition: a matrix is sparse if only a few of its entries are nonzero. We say that two matrices $A_1$ and $A_2$ are relatively sparse, or equivalently that $A_2$ is a sparse matrix relative to $A_1$, if only a few entries differ between them, that is, if the difference $A_2 - A_1$ is sparse. Assume there are two linear equations $A_1 x = b$ and $A_2 x = b$ that are solved in order within a sequential iterative framework algorithm: the first step solves $A_1 x = b$, and the second step solves $A_2 x = b$. For example, the Simulated Annealing algorithm solves $A_1 x = b$ and $A_2 x = b$ sequentially with the same vector $b$, in a sequence of linear systems whose matrices change from $A_1$ to $A_n$.

For such a sequence of relatively sparse matrices, iterative methods are employed to solve the linear equations so that fewer iterations are needed for convergence. The detailed operations are as follows.

In the first step, the linear system $A_1 x = b$ is solved in the usual way, obtaining the solution $x_1$, and we then reuse $x_1$ as the initial estimate for the linear equations $A_2 x = b$. In the same way, we reuse the solution $x_2$ of $A_2 x = b$ as the initial estimate for $A_3 x = b$, and so on sequentially.

Setting the initial estimate for $A_2 x = b$ to $x_1$, the previous solution of $A_1 x = b$, speeds up the convergence of the iterative method because the change from $A_1$ to $A_2$ is small. Even though $A_1$ and $A_2$ may be full matrices rather than traditional sparse matrices in terms of entry density, the matrix $A_2$ is a sparse matrix relative to the matrix $A_1$.

We call this way of solving a sequence of relative sparse linear systems the incremental updating solution method.

In the iterative framework algorithm, the solution from the previous iteration is preserved as an intermediate solution and serves as the initial estimate for the linear system of the current iteration.
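The effect of the incremental updating solution method can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the test matrix from `make_matrix`, the size of the perturbation, and the use of Gauss–Seidel as the iterative solver (chosen only because it is short; it is not the solver proposed in this paper). The sketch solves a slightly perturbed system twice, once from a zero initial estimate and once from the previous solution, and reports the iteration counts.

```python
def gauss_seidel(A, b, x0, tol=1e-10, max_iter=10_000):
    """Solve A x = b by Gauss-Seidel sweeps; return (x, sweeps used)."""
    n = len(b)
    x = list(x0)
    for it in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new_xi = (b[i] - s) / A[i][i]
            max_change = max(max_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if max_change < tol:  # stop when a full sweep barely changes x
            return x, it
    return x, max_iter


def make_matrix(n):
    """Hypothetical dense, symmetric, strictly diagonally dominant matrix."""
    return [[4.0 if i == j else 1.0 / (1 + abs(i - j)) for j in range(n)]
            for i in range(n)]


n = 6
b = [1.0] * n                     # same right-hand side for both systems
A1 = make_matrix(n)
A2 = [row[:] for row in A1]
A2[0][1] += 0.05                  # assumed small change: one symmetric pair
A2[1][0] += 0.05                  # of entries differs between A1 and A2

zero = [0.0] * n
x1, _ = gauss_seidel(A1, b, zero)            # first system, zero start
x_cold, it_cold = gauss_seidel(A2, b, zero)  # second system, zero start
x_warm, it_warm = gauss_seidel(A2, b, x1)    # second system, reuse x1
print(f"zero start: {it_cold} sweeps, reused solution: {it_warm} sweeps")
```

Both runs reach the same solution of the second system; the run that reuses `x1` starts with a much smaller initial error and therefore needs fewer sweeps.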

#### 4. Relative Sparse Matrix in Thermal Floorplan of SA Framework Algorithm

##### 4.1. Thermal Model Introduction

In the floorplanning step of VLSI physical design, circuit modules are placed randomly on the die using Simulated Annealing. Each time Simulated Annealing generates a floorplan of circuit modules, we calculate the cost metrics: the die area, the wire length between circuit modules, and the maximum temperature of the circuit modules given by the thermal model.

Simulated Annealing (SA) is an iterative optimization algorithm, and the thermal model is incorporated into it. Thermal conduction in the die is complex [14], and it can be abstracted as a thermal resistance model [15], a linear system that relates the vector of block temperatures and the vector of power consumption through the thermal resistance matrix *R*, which is square and symmetric. Once the circuit blocks are determined, the blocks’ power vector does not change; it is a constant vector. The entries of the thermal resistance matrix change according to the placement details. It is a dense matrix rather than a sparse matrix, but it matches the relative sparse matrix definition within the iterative Simulated Annealing framework algorithm.

The Simulated Annealing algorithm changes the placement only slightly from one stage to the next and then computes the new cost metrics. For example, moving a block from one location to another unused location changes only that block’s position in the placement, so the thermal conduction between blocks and most blocks’ temperatures also change little. The updated thermal resistance matrix *R* is therefore dense but has only a few “interesting” changed entries: the new thermal resistance matrix is a sparse matrix relative to the matrix of the previous placement.

##### 4.2. LU Decomposition Solving Linear Equation

Consider a system of linear equations in matrix form:

$$Ax = b.$$

Given the matrix $A$ and the vector $b$, we need to solve for $x$. The matrix $A$ is LUP decomposed such that $PA = LU$. The linear equations can then be transformed equivalently into LU form as

$$LUx = Pb.$$

The LU solver proceeds in two logical steps:

(i) First step: solve the lower triangular system $Ly = Pb$ for $y$.

(ii) Second step: solve the upper triangular system $Ux = y$ for $x$.

The cost of solving a system of linear equations is approximately $\frac{2}{3}n^3$ floating point operations if the matrix $A$ has size *n* [16].

The LU decomposition is the direct solver method of the linear equation.
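The direct method can be sketched in a few lines. The following is a minimal illustration of LUP decomposition with partial pivoting followed by forward and back substitution; the function names `lup_decompose` and `lup_solve` are hypothetical and are not taken from the HotSpot source code.

```python
def lup_decompose(A):
    """Return (LU, perm) such that P A = L U, using partial pivoting.

    L (unit lower triangular) and U are packed into the single matrix LU;
    row k of the factorization corresponds to row perm[k] of A.
    """
    n = len(A)
    LU = [row[:] for row in A]
    perm = list(range(n))
    for k in range(n):
        # choose the row with the largest pivot candidate in column k
        pivot = max(range(k, n), key=lambda i: abs(LU[i][k]))
        if LU[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        LU[k], LU[pivot] = LU[pivot], LU[k]
        perm[k], perm[pivot] = perm[pivot], perm[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]                  # multiplier, stored in L
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]   # eliminate below the pivot
    return LU, perm


def lup_solve(LU, perm, b):
    """Solve A x = b given the packed factors of P A = L U."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                            # forward: L y = P b
        y[i] = b[perm[i]] - sum(LU[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):                  # backward: U x = y
        s = sum(LU[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / LU[i][i]
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
LU, perm = lup_decompose(A)
x = lup_solve(LU, perm, b)
print(x)  # close to [1/11, 7/11]
```

The triple loop in `lup_decompose` is the source of the cubic cost: in an iterative framework the factorization has to be redone whenever the matrix changes, even if only a few entries differ.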

##### 4.3. Incrementally Iterative Conjugate Gradient Solver and Convergence

There are many iterative linear solver methods, including conjugate gradient, Gauss–Seidel, and successive over-relaxation. In this study, the conjugate gradient method is used to solve the thermal model in the floorplan.

###### 4.3.1. Convergence of Incrementally Iterative Conjugate Gradient Solver

The conjugate gradient method converges if the matrix $A$ is symmetric and positive definite. The condition number associated with the linear equation $Ax = b$ gives a bound on how inaccurate the solution $x$ will be after approximation. The condition number of the matrix $A$ is the product of the two operator norms:

$$\kappa(A) = \|A^{-1}\| \cdot \|A\|.$$

If $A$ is normal, then $\kappa(A) = |\lambda_{\max}(A)| / |\lambda_{\min}(A)|$, where $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ are the eigenvalues of $A$ with maximum and minimum modulus, respectively. The convergence of CG depends on the condition number $\kappa(A)$ of the matrix $A$.
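As a small worked example, consider a symmetric positive-definite matrix chosen purely for illustration (it does not come from the thermal model). Writing $\kappa(A)$ for the condition number of $A$, its eigenvalues and condition number are

$$A = \begin{pmatrix} 4 & 1 \\ 1 & 3 \end{pmatrix}, \qquad \lambda_{\max,\min}(A) = \frac{7 \pm \sqrt{5}}{2}, \qquad \kappa(A) = \frac{7 + \sqrt{5}}{7 - \sqrt{5}} \approx 1.94.$$

A standard textbook bound, which is not stated in the original text, relates the error of the $k$th CG iterate $x_k$ to the initial error, where $x_*$ is the exact solution and $\|v\|_A = \sqrt{v^{T} A v}$:

$$\|x_k - x_*\|_A \le 2 \left( \frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1} \right)^{k} \|x_0 - x_*\|_A.$$

For the matrix above the contraction factor is about 0.16 per iteration. The bound suggests two ways to reduce the work: lowering $\kappa(A)$, which is the role of a preconditioner, and shrinking the initial error $\|x_0 - x_*\|_A$, which is what reusing the previous solution as the initial estimate does.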

Let $x_0$ denote the initial guess for $x$. At the start of the SA algorithm, we can assume that $x_0 = 0$. Once the first linear solve has produced a solution, the conjugate gradient solver reuses that previous solution as the initial estimate $x_0$ of the next solve. This incremental updating solution method accelerates the convergence of the solver.

The conjugate gradient method is motivated by the fact that the solution of $Ax = b$ is also the unique minimizer of the following quadratic function:

$$f(x) = \frac{1}{2} x^{T} A x - x^{T} b.$$

This suggests taking the first basis vector $p_0$ to be the negative of the gradient of $f$ at $x = x_0$. The gradient of $f$ equals $Ax - b$, so we take $p_0 = b - Ax_0$.

The other vectors in the basis are conjugate to the gradient, hence the name of the method.

Let $r_k$ be the residual at the $k$th step:

$$r_k = b - Ax_k.$$

Note that $r_k$ is the negative gradient of *f* at $x = x_k$, so the gradient descent method would move in the direction $r_k$. Here, we insist that the directions $p_k$ be conjugate to each other. We also require that the next search direction be built out of the current residual and all previous search directions, which is reasonable in practice.

###### 4.3.2. Pseudocode of Conjugate Gradient Algorithm

The algorithm is detailed below for solving $Ax = b$, where $A$ is a real, symmetric, positive-definite matrix. The input vector $x_0$ can be an approximate initial solution or 0.

The pseudocode of conjugate gradient solver is shown in Algorithm 1.
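For readers who prefer runnable code, the following is a minimal sketch of one common textbook form of the conjugate gradient iteration; it may differ in detail from Algorithm 1. The function name `conjugate_gradient`, the tolerance, and the default iteration limit are assumptions made for illustration. The optional `x0` argument is where a previous solution can be supplied as the initial estimate.

```python
def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A.

    Returns (x, iterations). x0 is the initial estimate (zero if omitted).
    """
    n = len(b)
    x = [0.0] * n if x0 is None else list(x0)
    if max_iter is None:
        max_iter = 10 * n          # assumed safety limit

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    r = [bi - ai for bi, ai in zip(b, matvec(x))]   # r0 = b - A x0
    p = r[:]                                        # first search direction
    rs_old = dot(r, r)
    for k in range(max_iter):
        if rs_old ** 0.5 < tol:                     # residual norm small enough
            return x, k
        Ap = matvec(p)
        alpha = rs_old / dot(p, Ap)                 # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                      # keeps directions conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, its = conjugate_gradient(A, b)                   # zero initial estimate
x_again, its_again = conjugate_gradient(A, b, x0=x) # reuse the solution
print(x, its, its_again)
```

In this small example the second call receives an initial estimate that already satisfies the system to within the tolerance, so it returns without performing any iteration. In the floorplan setting the matrix changes slightly between calls, so a reused solution is only approximately correct, but the residual it starts from is still much smaller than that of a zero estimate.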