Probabilistic Analysis of Steady-State Temperature and Maximum Frequency of Multicore Processors considering Workload Variation

Zhang, Biying; Fu, Zhongchuan; Chen, Hongsong; Cui, Gang

doi:https://doi.org/10.1155/2016/2462504

Mathematical Problems in Engineering

Special Issue

Advances in High Performance Computing and Related Issues

Research Article | Open Access

Volume 2016 | Article ID 2462504 | https://doi.org/10.1155/2016/2462504

Biying Zhang,¹,² Zhongchuan Fu,¹ Hongsong Chen,³ and Gang Cui¹

Academic Editor: Veljko Milutinovic

Received 03 Dec 2015

Accepted 06 Apr 2016

Published 22 May 2016

Abstract

This paper presents a probabilistic method for analyzing the temperature and the maximum frequency of multicore processors while taking workload variation into account. First, at the microarchitecture level, dynamic power is modeled as a linear function of IPC (instructions per cycle), and leakage power is approximated as a linear function of temperature. Second, the microarchitecture-level hotspot temperatures of both active cores and inactive cores are derived as linear functions of the IPCs. The normal probabilistic distribution of the hotspot temperatures is then derived under the assumption that the IPCs of all cores follow the same normal distribution. Third, the probabilistic distribution of the set of discrete frequencies is determined. The experimental results show that the hotspot temperatures of multicore processors are not deterministic and vary significantly, and that the number of active cores and the running frequency jointly determine the probabilistic distribution of the hotspot temperatures. The number of active cores not only changes the probabilistic distribution of the frequencies but also changes the probability of triggering DFS (dynamic frequency scaling).

1. Introduction

Continuous technology scaling and miniaturization have escalated the power density and temperature of multicore processors. To decrease manufacturing costs, the packages of multicore processors are mostly designed for average rather than maximum power dissipation, and temperature is controlled with dynamic thermal management (DTM) techniques such as dynamic voltage and frequency scaling (DVFS) and dynamic frequency scaling (DFS) [1]. When the temperature of the processor reaches or approaches the critical point, DVFS or DFS is invoked to enforce the thermal constraint at the cost of processor speed. Therefore, it is crucial for design space exploration to analyze temperature and running frequency quickly and accurately at the early stage.

1.1. Motivation

To explore the design space of thermal-aware multicore processors at the early stage, several thermal models have been proposed to estimate the temperature and performance of processors [2–5], and most estimation approaches are based on transient analysis [6–14]. Transient analysis traces the temporal variations of temperature and performance under the workloads, which yields high estimation accuracy. However, transient analysis is time-consuming, and for multicore processors in particular its time complexity is unacceptable at the early design stage. Accordingly, to speed up the estimation of temperature and performance of multicore processors, researchers resort to steady-state analysis [15, 16]. Nevertheless, to the best of our knowledge, all previous work on steady-state analysis assumes that every workload has the same thermal contribution, which greatly limits the estimation accuracy. In fact, the temperature of multicore processors varies greatly between workloads [10, 17]. Our preliminary work has demonstrated that the dynamic power of processors is highly correlated with IPC (instructions per cycle) and that, within a small temperature range, leakage power depends linearly on temperature [18]. According to the HotSpot thermal model, temperature can be derived given the power of processors [3]. Thus, processor temperature is highly correlated with IPC. According to the CLT (central limit theorem), when a large number of instructions are executed in the processor, the probabilistic distribution of IPC tends to follow the normal distribution. Accordingly, given both the probabilistic distribution of IPC and the relationship between temperature and IPC, the probabilistic distribution of processor temperature can be derived and analyzed. Subsequently, the probabilistic distribution of the maximum running frequency can be inferred given that the processor uses the zero-slack DTM policy, under which the speed of the processor is set to the value at which the hotspot temperature equals the maximum threshold allowed by the processor [7].

1.2. Contributions

In this paper, a probabilistic method is proposed to analyze the steady-state temperature and frequency of multicore processors, taking into account the variation of workloads. To simplify the analysis, the DFS technique rather than the DVFS technique is adopted to manage the temperature of the processor, so that the voltage is constant and only the frequency is adjusted. The dynamic power can then be modeled as a linear function of the frequency [12, 16]. The main contributions of this work are as follows:
(i) At the microarchitecture level, the dynamic power of processors is modeled as a linear function of IPC and running frequency, and the leakage power of the processor is approximated by a linear model of temperature.
(ii) The microarchitecture-level hotspot temperatures of both active cores and inactive cores are derived as linear functions of the IPCs of all active cores.
(iii) It is inferred that the hotspot temperatures of both active cores and inactive cores follow the normal probabilistic distribution, based on the assumption that the IPCs of all active cores follow the same normal distribution.
(iv) The probabilistic distribution of the set of frequencies is determined given the zero-slack DTM policy [7].

The remainder of this paper is organized as follows. In Section 2, related work is overviewed. In Section 3, the microarchitecture-level steady-state temperature of a core is formulated as the linear function of its powers based on the Hotspot thermal model. In Section 4, at the microarchitecture level, the dynamic powers of processors are modeled as the linear function of IPC and the running frequency, and the leakage powers are approximated as the linear model of temperature. In Section 5, the microarchitecture-level hotspot temperatures of both active cores and inactive cores are derived as the linear function of IPCs of all active cores. It is inferred that the hotspot temperatures of both active cores and inactive cores follow the normal probabilistic distribution, based on the assumption that the IPCs of all active cores follow the same normal probabilistic distribution. In Section 6, the probabilistic distribution of the set of frequencies is determined given the zero-slack DTM policy. In Section 7, experimental results are presented. This paper is concluded in Section 8.

2. Related Work

Approaches for estimating the temperature and performance of processors can be classified into transient analysis and steady-state analysis, and so far most research has been based on transient analysis.

2.1. Transient Analysis

In order to transiently analyze the temperature and explore the design space of processors at the early stage, several thermal models for processors have been proposed. Skadron et al. and Huang et al. [2, 3] presented a compact thermal modeling methodology based on the analogy between thermal and electrical phenomena, namely, HotSpot. Using HotSpot, the spatial and temporal variations of processor temperature can be obtained through transient analysis. To improve the accuracy of thermal simulation, Jang et al. [11] made an extension to the thermal model for HotSpot by taking into account the different ambient temperature owing to workload variations. To accelerate thermal analysis of multicore processors at the architecture level, Wang et al. [5] presented a composite thermal model, termed ThermComp, to optimize the model for different large processors. Li et al. [4] proposed a parameterized architecture-level dynamic thermal model, namely, ParThermPOF, in which many parameters can be set such as the location of thermal sensors and the conductivity of different components.

In order to improve the performance of thermal-aware multicore processors, various DTM techniques have been investigated based on thermal models such as Hotspot and analyzed transiently for the estimation of performance. Hanumaiah et al. [7, 19] presented an online thermal management algorithm for thermal-aware multicore processors, in which DVFS and task allocation techniques are simultaneously adopted. In the context of hard real-time systems, the time-varying voltage and frequency of multicores are computed to satisfy not only the thermal constraint but also the deadline constraint [6]. Shi et al. [12] presented a DTM policy under soft thermal constraint, in which the temperature constraint can be exceeded sometimes.

Several studies have aimed to simulate the thermal behavior of multicore processors quickly and accurately. Wojciechowski et al. [10, 20] analyzed the transient characteristics of workloads based on a finite Fourier series expansion to accurately predict the thermal behavior of multicore processors, and presented a new DVFS approach. Liu et al. [13] proposed a transient temperature analysis method for multicore processors based on moment matching and used it to guide task migration. To account for the nondeterministic behavior of tasks in terms of execution times and decision branches, Das et al. [14] formulated thermal analysis as a hybrid automata reachability verification problem and provided an algorithm for constructing the automata.

For transient analysis, temporal variations of temperature and performance depending on workload are traced and then high estimation accuracy is obtained. However, transient analysis is time-consuming, and in particular for multicore processors time complexity is unacceptable at the early design stage.

2.2. Steady-State Analysis

In order to speed up the estimation of temperature and performance at the early design stage of multicore processors, researchers resort to steady-state analysis and have carried out a lot of work. Based on Amdahl’s Law, Lee and Kim [15, 21] introduced variations of process and workload parallelism into the analyzing model and optimized the throughput of thermal-aware multicore processors by exploiting DVFS and the per-core power-gating (PCPG). Based on HotSpot, Rao et al. [16, 22] described an approximate thermal model for homogeneous multicore processors to fast and accurately predict the maximum steady-state throughput under thermal constraints. In the context of a hard real-time system of a single-core processor, Mohaqeqi et al. [23] studied stochastic behavior of the system, for example, performance, temperature, and reliability, based on Markovian view.

To the best of our knowledge, all previous work on steady-state analysis assumes that every workload has the same thermal contribution to the processor, which makes the estimates of temperature and performance inaccurate. This is the focus of our work. In this paper, the variation of workloads is taken into account to model temperature and frequency more accurately.

3. Thermal Model

In this paper, a microarchitecture-level thermal model for a multicore processor is created by replicating a single-core processor model based on HotSpot [3]. The multicore processor is divided into four layers, that is, chip, thermal interface material (TIM), heat spreader, and heat sink. Let $n$ be the number of cores and $m$ the number of thermal blocks (functional units) per core. The chip and the TIM each contain $nm$ thermal blocks, and the heat spreader and heat sink contain five and nine thermal blocks, respectively. In total, there are $N = 2nm + 14$ thermal blocks in the multicore processor.

The microarchitecture-level thermal model can be represented by the state-space differential equation as follows [16]:

$$\frac{d\mathbf{T}(t)}{dt} = \mathbf{A}\,\mathbf{T}(t) + \mathbf{B}\,\mathbf{P}(t), \tag{1}$$

where $\mathbf{T}$ and $\mathbf{P}$ are $N$-dimension vectors denoting, respectively, the temperature and power of the multicore processor, and $\mathbf{A}$ and $\mathbf{B}$ are constant matrices of dimension $N \times N$ depending on the thermal conductance and capacitance of the processor.

When the temperature of the processor is in the steady state, $d\mathbf{T}(t)/dt = \mathbf{0}$, and then

$$\mathbf{G}\,\mathbf{T} = \mathbf{P}, \tag{2}$$

where $\mathbf{G} = -\mathbf{B}^{-1}\mathbf{A}$ is the thermal conductance matrix of the HotSpot model.
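As an illustration of the steady-state relation between conductance, temperature, and power, the following minimal sketch solves a conductance system for a hypothetical two-block network. The conductance values, the power values, and the helper `solve_linear` are assumptions made for illustration; they are not taken from HotSpot or from this paper, and temperatures are expressed as rises above ambient.

```python
def solve_linear(G, P):
    """Solve G * T = P by Gaussian elimination with partial pivoting."""
    n = len(P)
    # Build an augmented matrix so the inputs are left unmodified.
    A = [list(map(float, G[i])) + [float(P[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("conductance matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n + 1):
                A[row][k] -= factor * A[col][k]
    # Back substitution.
    T = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][k] * T[k] for k in range(i + 1, n))
        T[i] = (A[i][n] - s) / A[i][i]
    return T


# Hypothetical two-block network (illustrative values only).
g_lateral = 0.5          # W/K, conductance between the two blocks
g_ambient = [2.0, 1.0]   # W/K, conductance from each block to ambient
G = [
    [g_ambient[0] + g_lateral, -g_lateral],
    [-g_lateral, g_ambient[1] + g_lateral],
]
P = [10.0, 4.0]          # W, power dissipated in each block

T_rise = solve_linear(G, P)  # steady-state temperature rise above ambient, in K
```

A full HotSpot model has many more blocks, but the computation has the same shape: one linear solve per power vector, with no time stepping.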

The thermal conductance matrix can be obtained from the HotSpot simulation tool. The thermal conductance matrix of a dual-core processor is as shown in Figure 1, where the submatrices , , and along the diagonal are lateral thermal conductance of the chip, TIM, and package, respectively, the submatrices and are the vertical conductance between die and TIM, and the vectors and are the vertical conductance between the TIM and the spreader. is equal to , and is equal to . The detailed conductance matrix of is as shown in Figure 2.

Lemma 1. According to HotSpot thermal model, when the temperature of the multicore processor is in the steady-state, there exist -dimension matrices , , and , such that where and are -dimension vectors, representing, respectively, temperature and power of the th core of the processor.

Proof. According to the arrangement of elements in the conductance matrix in HotSpot thermal model, (2) can be decomposed into the following equations:where is -dimension vector representing the temperature of TIM layer of the th core, is a scalar representing the temperature of the central block of the spreader, and is 13-dimension vector representing the temperature of other blocks of the spreader and the sink.
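According to (6) and (7), one obtains (8). With the substitution set in (9), Equation (8) can be converted into (10). Substituting (10) into (5) gives (11). According to (4) and (11), one obtains (12). With the substitution set in (13), (12) can be transformed into (14). According to (14), one obtains (15), and according to (15), one obtains (16). Substituting (16) into (14) gives (17). Finally, with the definitions set in (18), one obtains (19).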

This means that in the steady-state of temperature of a multicore processor, given the powers of all cores, the temperature of any core can be calculated according to Lemma 1 at the microarchitecture level.

4. Power Model

It is assumed that multicore processors have two power states, active mode and inactive mode, and that the power state of each core can be set separately. A global dynamic frequency scaling (DFS) technique is used, in which the frequencies of all cores are scaled uniformly.

4.1. Active Mode

When a core is in the active mode, workloads are executed and the core dissipates both dynamic power and leakage power. At the microarchitecture level, the power of the active core is defined aswhere , , and are, respectively, the total power, dynamic power, and leakage power of the th active core of size .

For a processor using the DVFS technique, the dynamic power is proportional to the product of the square of the voltage and the frequency [7]; that is, $P_{\text{dyn}} \propto V^{2} f$. In this paper, the primary purpose is to analyze the impact of workload variation on the temperature of processors. To simplify the analysis, the DFS technique rather than the DVFS technique is therefore used to manage the temperature of the processor, so that the voltage is constant and only the frequency is adjusted. Hence, the dynamic power can be modeled as a linear function of the frequency [12, 16]. In addition, the dynamic power caused by workload execution has a close linear relationship with IPC.

Let be the IPC of the th core when workloads are running on it and let be the normalized frequency between 0 and 1, and then the microarchitecture-level dynamic power of the th active core is defined as where and are the linear regression coefficients when is set to maximum 1 and is the regression residual which follows the normal distribution with mean of zero: namely, . , , , and are vectors of size , and is the element-by-element squares of .

At the microarchitecture level, in order to simplify the analyzing procedure, the relationship between the leakage power and the temperature of the th active core is approximated by the linear model as follows: where and are the regression coefficients and is the regression residual which follows the normal distribution with mean of zero: namely, . is a diagonal matrix of size . , , , and are vectors of size . is the element-by-element squares of .

According to (20), (21), and (22), the power of the active core is represented by

4.2. Inactive Mode

When a core is in the inactive mode, it is powered off using power-gating technique. The inactive core only dissipates leakage power, which is much lower than that of the active core. Hence, the power of the th inactive core is the leakage power , which depends on the core’s temperature and it can be approximated by the linear model at the microarchitecture level as follows:where and are regression coefficients and is the regression residuals which follow normal distribution with mean of zero: namely, . is a diagonal matrix of size . , , and are vectors of size . is the element-by-element squares of .

5. Thermal Analysis

Lemma 2. When the temperature of a multicore processor is in the steady state, the temperatures and the powers of different inactive cores are the same; that is, $T_i = T_j$ and $P_i = P_j$ for $i \neq j$, where $T_i$ and $P_i$ are the temperature and the power of the $i$th inactive core, respectively.

Proof. According to (3), for any two inactive cores, (25) holds. Substituting (24) into (25) gives (26), and according to (26), one obtains (27).

The parameters and are invertible matrices, so is an invertible matrix. Therefore, ; that is, . And can also be obtained according to (24).

To be convenient, set and ; (24) can be simplified into

Theorem 3. Assume that all cores of a processor have the same hotspot. Let be the selection vector of the hotspot, where only one element of can be set to 1 and the others are 0 s, indicating that the corresponding functional unit is hotspot. There exist functions , , , , , and , such that the hotspot of the th active core can be formulated byThere exist functions , , , , and , such that the hotspot of the inactive core can be formulated by

Proof. According to (3), (23), and (28), the temperature of the inactive core can be derived as (31). With the definitions in (32), (31) can be transformed into (33). According to (3), (23), and (28), the temperature of the $i$th active core can be derived as (34). With the definitions in (35), (34) can be transformed into (36). Substituting (33) into (36) gives (37). With the definitions in (38), (37) is transformed into (39). According to (39), one obtains (40). Substituting (40) into (39) gives (41). The hotspot temperature of the active core is given by (42). With the definitions in (43), (42) is transformed into (44). Substituting (40) into (33) gives (45). The hotspot temperature of the inactive core is given by (46). With the definitions in (47), (46) is transformed into (48).

According to Theorem 3, it can be known that the hotspot temperature of any core can be expressed as the linear function of IPCs of all cores.
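To see why a linear power model leads to a hotspot temperature that is linear in IPC, consider the following minimal sketch for a single lumped core. It is a scalar simplification, not the matrix derivation of Theorem 3. All names and values are hypothetical: `a` and `b` are dynamic-power regression coefficients, `c` and `d` are leakage regression coefficients, `R` is a lumped thermal resistance, and the frequency `f` is assumed to scale the whole dynamic-power term.

```python
def hotspot_linear_coeffs(f, a, b, c, d, R, T_amb):
    """Closed-form slope and intercept of T as a linear function of IPC.

    Assumed scalar model (illustration only):
        P_dyn  = f * (a * ipc + b)
        P_leak = c * T + d
        T      = T_amb + R * (P_dyn + P_leak)
    Solving the last equation for T gives T = slope * ipc + intercept.
    """
    denom = 1.0 - R * c
    if denom <= 0.0:
        raise ValueError("R * c must be below 1 (no thermal runaway)")
    slope = R * f * a / denom
    intercept = (T_amb + R * (f * b + d)) / denom
    return slope, intercept


def hotspot_fixed_point(ipc, f, a, b, c, d, R, T_amb, iters=200):
    """Iterate the power/temperature feedback loop to its fixed point."""
    T = T_amb
    for _ in range(iters):
        power = f * (a * ipc + b) + (c * T + d)
        T = T_amb + R * power
    return T


# Hypothetical parameters (W per IPC, W, W/K, W, K/W, degrees C).
params = dict(f=1.0, a=8.0, b=5.0, c=0.05, d=1.0, R=1.5, T_amb=45.0)
slope, intercept = hotspot_linear_coeffs(**params)
T_closed_form = slope * 1.2 + intercept
T_iterated = hotspot_fixed_point(1.2, **params)
```

The closed form and the iterated feedback loop agree, which is the scalar analogue of eliminating the leakage-temperature dependence to obtain a linear function of IPC.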

Theorem 4. Suppose that (a) the random variable $X_i$ for the IPC of the $i$th active core follows the normal distribution with mean $\mu$ and variance $\sigma^2$, that is, $X_i \sim N(\mu, \sigma^2)$; (b) the tasks running on different cores are mutually independent, that is, $X_i$ and $X_j$ are independent for $i \neq j$; and (c) workload balancing techniques are used in the processor, such that $X_i$ and $X_j$ follow the same normal distribution. Then the hotspot temperature of the active core follows the normal distribution, with its mean and variance each given by a function as stated in (49), and the hotspot temperature of the inactive core follows the normal distribution, with its mean and variance each given by a function as stated in (50).

Proof. According to (29), one obtains (51). The normally distributed IPC random variables of different cores are mutually independent, so that their linear combination still follows the normal distribution.
All elements of each random vector follow the normal distribution and are independent, so that the linear combination of the elements of each vector also follows the normal distribution.
All random terms in the expression of the hotspot temperature of the active core therefore follow the normal distribution and are mutually independent, so that this hotspot temperature follows the normal distribution, with the mean given by (52) and the variance given by (53). In the same way, the random terms in (30) all follow the normal distribution and are mutually independent, so that the hotspot temperature of the inactive core follows the normal distribution, with the mean given by (54) and the variance given by (55).

According to Theorem 4, it can be known that the hotspot temperature of any core follows the normal probabilistic distribution.
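The propagation step behind this result can be sketched as follows, assuming the hotspot temperature has already been written as an intercept plus a weighted sum of independent, identically distributed normal IPCs plus one independent zero-mean residual. The function name, the weights, and all numeric values are hypothetical and only illustrate how the mean and variance combine.

```python
from math import sqrt
from statistics import NormalDist


def hotspot_distribution(slopes, intercept, ipc_mean, ipc_std, resid_std=0.0):
    """Distribution of T = intercept + sum(slopes[i] * IPC_i) + residual.

    Assumes IPC_i ~ N(ipc_mean, ipc_std**2), mutually independent, and an
    independent residual ~ N(0, resid_std**2).
    """
    mean = intercept + ipc_mean * sum(slopes)
    variance = ipc_std ** 2 * sum(s * s for s in slopes) + resid_std ** 2
    return NormalDist(mean, sqrt(variance))


# Hypothetical weights: the core's own IPC dominates, neighbours contribute less.
slopes = [6.0, 1.5, 1.5, 1.5]
dist = hotspot_distribution(slopes, intercept=55.0, ipc_mean=1.2,
                            ipc_std=0.3, resid_std=1.0)

# Probability that the hotspot exceeds a 100 degree C threshold.
p_exceed = 1.0 - dist.cdf(100.0)
```

Because variances of independent terms add, more active cores or larger weights widen the distribution as well as shifting its mean.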

6. Frequency Analysis

The zero-slack policy is used as the DTM strategy of the processor; that is, the speed of the processor is set to the value at which the hotspot temperature equals the threshold [7]. However, the frequencies are discrete in this work. Therefore, in most cases, no frequency in the set makes the hotspot temperature exactly equal to the threshold.

Theorem 5. Let be the set of frequencies of multicore processors, where ; then the probabilistic distribution of the frequency f follows where is the function of the number of active cores and the frequency , representing the hotspot temperature of the active core, and is the temperature threshold of the processor.

Proof. When , according to the zero-slack DTM policy, obviously, When , then the probabilities of can be broken into two cases:(a)if , according to the zero-slack DTM strategy, the probability of is 0; that is, (b)if , the probability of is given byThe right-hand side of (59) can be derived byAccording to (59) and (60), getHence, when , according to (58) and (61), get

Therefore, according to Theorem 5 the probabilistic distribution of the set of frequencies can be obtained based on the assumption that the zero-slack policy is used.
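One way to compute such a distribution is sketched below. It is an interpretation under stated assumptions and not a transcription of Theorem 5: the processor is assumed to run at the highest discrete frequency whose hotspot temperature stays at or below the threshold, the lowest frequency is assumed to be the fallback, and the events "safe at frequency f" are assumed to be nested (safe at a higher frequency implies safe at every lower one). The function name and all numeric values are hypothetical.

```python
from statistics import NormalDist


def frequency_distribution(freqs, temp_dists, t_threshold):
    """Return P(frequency = freqs[i]) for ascending discrete frequencies.

    temp_dists[i] is the (assumed normal) hotspot temperature distribution
    when the processor runs at freqs[i].
    """
    if list(freqs) != sorted(freqs):
        raise ValueError("freqs must be in ascending order")
    n = len(freqs)
    if n == 1:
        return [1.0]
    # Probability that the hotspot stays within the threshold at each frequency.
    p_safe = [d.cdf(t_threshold) for d in temp_dists]
    probs = []
    for i in range(n):
        if i == n - 1:
            p = p_safe[i]                  # full speed is safe
        elif i == 0:
            p = 1.0 - p_safe[1]            # fallback: nothing higher is safe
        else:
            p = p_safe[i] - p_safe[i + 1]  # safe here, unsafe one step up
        probs.append(max(p, 0.0))
    return probs


# Hypothetical hotspot distributions at the four normalized frequencies.
freqs = [0.5, 2.0 / 3.0, 5.0 / 6.0, 1.0]
temp_dists = [NormalDist(70, 2), NormalDist(80, 3),
              NormalDist(90, 4), NormalDist(99, 5)]
probs = frequency_distribution(freqs, temp_dists, t_threshold=100.0)

# DFS is triggered whenever the processor cannot run at the maximum frequency.
p_dfs = 1.0 - probs[-1]
```

Under the nesting assumption the probabilities telescope and sum to one, and the probability of triggering DFS is simply the complement of the probability of running at full speed.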

Given the probabilistic distribution of the frequency and the mean hotspot temperature of the active core at each frequency, the average hotspot temperature of the active core can be obtained by

$$\bar{T} = \sum_{f \in F} P(f)\,\mu(f), \tag{63}$$

where $F$ denotes the set of frequencies of multicore processors, $P(f)$ denotes the probability that the frequency is $f$, and $\mu(f)$ denotes the mean hotspot temperature of the active core given the frequency $f$.

7. Experimental Results

7.1. Experimental Methodology

A multicore version of the Alpha 21264 processor with eight cores is used as the processor model in our experiment [24]. The cores have two working states, active and inactive, and the working state of each core can be set separately. The processor employs a global DFS technique, which means that the frequencies of all cores in the processor are scaled uniformly. The processor uses four discrete frequencies, that is, 1.5 GHz, 2 GHz, 2.5 GHz, and 3 GHz. To facilitate analysis, the frequencies are normalized into the interval [0, 1], so that the maximum frequency is normalized to 1. After normalization, the set of frequencies is {1/2, 2/3, 5/6, 1}. According to our previous work, the hotspot of the Alpha 21264 processor is the branch predictor [18]. So the second element of the hotspot selection vector, corresponding to the branch predictor, is set to 1, and the other elements are set to 0. The thermal threshold, that is, the maximum temperature allowed by the processor, is set to 100°C.

HotSpot is used as the thermal model of the multicore processor, and parameters such as thermal conductance and capacitance are set to the default values of the HotSpot simulation tool [3]. To construct the linear model of dynamic power, PTScalar is modified to obtain both the dynamic power profiles of each functional unit in the Alpha 21264 processor and the IPC profiles [25]. The parameters of PTScalar are set to default values as well. The mean and variance of the normally distributed IPC are determined from the IPC profiles by maximum likelihood estimation. Some representative tasks such as mesa, ammp, quake, bzip, mcf, math, and qsort from MiBench [26] and SPEC CPU2000 [27] are selected as the benchmarks. These selected tasks are mutually independent; that is, no task takes precedence over the others, so the tasks can be executed in parallel at the task level. In addition, workload balancing techniques are used in the processor. A task is not fixed on a core, and the tasks can be migrated among all cores such that the IPCs of different cores are equal. Simple linear regression analysis is used to determine the regression coefficients in (21), and the variance of the regression residuals is obtained.
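The two estimation steps described here can be sketched as follows, assuming a small hypothetical profile of IPC samples and matching dynamic power samples for one functional unit. The data values and function names are illustrative only and are not measurements from PTScalar.

```python
from statistics import fmean


def mle_normal(samples):
    """Maximum likelihood estimates (mean, variance) of a normal distribution.

    The MLE of the variance divides by n, not n - 1.
    """
    mu = fmean(samples)
    var = fmean([(x - mu) ** 2 for x in samples])
    return mu, var


def fit_line(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept.

    Also returns the variance of the regression residuals.
    """
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    resid_var = fmean([r * r for r in residuals])
    return slope, intercept, resid_var


# Hypothetical profile: IPC samples and dynamic power (W) of one unit.
ipc = [0.8, 1.0, 1.2, 1.4, 1.6]
dyn_power = [2.0, 2.6, 2.9, 3.2, 3.8]

ipc_mean, ipc_var = mle_normal(ipc)
slope, intercept, resid_var = fit_line(ipc, dyn_power)
```

The residual variance is kept alongside the coefficients because the regression error is itself treated as a normal random term in the analysis.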

The leakage power is a nonlinear, monotonically increasing function of temperature, given by (64) [25], whose three parameters depend on the topology, size, technology, and design of the processor. To construct the linear model of leakage power, (64) is regressed linearly to determine the regression coefficients in (22) and in (24), as well as the variances of the corresponding regression residuals. The three parameters of (64) are set to the default values of PTScalar. The smaller the temperature range, the higher the linear correlation between leakage power and temperature [7]. The temperature of a processor using DTM techniques does not exceed the maximum value, and lower temperatures have no impact on the design optimization of thermal-aware processors. Therefore, the linear regression analysis of leakage power is performed over the temperature interval between 60°C and 100°C, and the regression results are used to estimate the leakage power over the whole temperature interval.

7.2. Estimated Accuracy

For the hotspot of the processor, that is, the branch predictor, Figure 3 shows the comparison between the actual value and the estimated value of leakage power for active cores, and Figure 4 shows that for inactive cores. A higher estimation accuracy of leakage power is obtained at the temperature interval between 60°C and 100°C at the cost of lower accuracy at the other intervals.
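The trade-off between accuracy inside and outside the fitting interval can be reproduced with a small experiment. The sketch below fits a line over 60°C to 100°C to a stand-in leakage curve and compares the worst relative error inside and outside that interval. The exponential stand-in and its constants are assumptions chosen only because they are convex and increasing; they are not the leakage model of (64).

```python
from math import exp


def leakage_stand_in(temp_c):
    """Hypothetical convex, increasing leakage curve (not the model of (64))."""
    return 1.0 * exp(0.02 * (temp_c - 60.0))


def fit_line(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def max_relative_error(temps, slope, intercept):
    """Worst-case relative error of the linear estimate over the given temps."""
    return max(
        abs(slope * t + intercept - leakage_stand_in(t)) / leakage_stand_in(t)
        for t in temps
    )


# Fit only on the interval that matters for thermal management.
fit_temps = list(range(60, 101, 5))
slope, intercept = fit_line(fit_temps, [leakage_stand_in(t) for t in fit_temps])

err_inside = max_relative_error(fit_temps, slope, intercept)
err_outside = max_relative_error(list(range(20, 60, 5)), slope, intercept)
```

With a convex curve, the fitted line tracks the function closely on the fitting interval and drifts away below it, which matches the qualitative behaviour described for Figures 3 and 4.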

After regression analysis, the dynamic and leakage power can be estimated with the linear models in (21), (22), and (24). Figure 5 shows the estimation error rate of the dynamic power and that of the leakage power for active cores and inactive cores in the thermal range between 60°C and 100°C. The error rates of the dynamic power vary significantly between functional units. The estimation error rate of dynamic power is only 2.62% for the decoder but 10.14% for the floating point register (FPReg). The reason is that the linear correlation between the IPC and the dynamic power differs between functional units, and a higher correlation gives a lower error rate. The IPC and the dynamic power of the decoder have the highest linear correlation, while the IPC and the dynamic power of FPReg have the lowest. It can also be seen from Figure 5 that the estimation error rates of leakage power are similar for active cores and inactive cores. The estimation error rates of leakage power for active cores are between 3.15% and 3.26%, and those for inactive cores are between 3.06% and 5.91%. This is because the temperature and the leakage power of the various functional units have similar linear correlations. To account for the impact of estimation errors on the analysis of temperature and frequency, the regression error terms are introduced into the linear models of dynamic power and leakage power as expressed in (21), (22), and (24).

7.3. Probabilistic Distribution of Temperature

Based on the assumption that no DTM techniques are used to control the temperature of processors, according to Theorem 4, the probabilistic distribution of the hotspot temperatures for both active cores and inactive cores can be obtained. Table 1 presents the means and the standard deviations of the probabilistic distribution of the hotspot temperature for active cores, and the corresponding probabilistic density curves are as shown in Figure 6. It can be seen that the hotspot temperature of processors is not deterministic and has significant variations for a certain number of active cores and a certain frequency. The number of active cores and the running frequency simultaneously determine the range in which the temperature lies and the probabilistic distribution of the hotspot temperature. For the same running frequency, more active cores will yield higher temperature, and vice versa. For the same number of active cores, higher frequency will bring higher temperature, and vice versa. According to the characteristics of normal distribution curve, it can be known that the shape of probabilistic density curve corresponds to the variations of data distribution depending on the standard deviation of random variables. The curve with a higher peak implies a smaller standard deviation, that is, a lower variation of data distribution, whereas the curve with a lower peak implies a bigger standard deviation, that is, a larger variation; it can be seen from Figure 6 that the probabilistic density curves corresponding to various frequencies have different peaks. This observation implies that the degree of temperature variation has close correlation with working frequency, and higher frequency will yield higher variation of temperature.

Table 2 presents the means and standard deviations of the probabilistic distribution of the hotspot temperature for inactive cores, and the corresponding probabilistic density curves are shown in Figure 7. When all eight cores are active, there is no inactive core, so Table 2 and Figure 7 contain no case with eight active cores. The effect of the frequency and the number of active cores on the hotspot temperature of inactive cores is the same as that for active cores, except that the mean and the variation of the hotspot temperature of inactive cores are lower than those of active cores for the same frequency and number of active cores.

7.4. Probabilistic Distribution of Frequencies

If the power-gating and the DFS techniques are simultaneously used to manage the temperature of a processor, according to Theorem 5, the probabilistic distribution of working frequencies can be determined. Figure 8 presents the probabilistic distribution of frequencies when the number of active cores is six, seven, and eight, respectively. When the frequency is less than 1, it is implied that the hotspot temperature surpasses the threshold, and the DFS technique is triggered to reduce the frequency of the processor. So the probability for triggering DFS can be obtained from the probabilistic distribution of frequencies. Figure 9 presents the probability for triggering DFS when the number of active cores is six, seven, and eight, respectively.

If a core is powered off and made inactive using the power-gating technique, it only dissipates leakage power, which is much less than the power of an active core, so the power dissipated by the processor is reduced significantly. When the number of active cores is less than six, that is, more than two cores are powered off, the power saved is enough for the remaining active cores to execute at full speed, and DFS does not need to be triggered. Therefore, when the number of active cores is less than six, the frequency of the processor is constantly 1, and the probability of triggering DFS is constantly 0. This situation is not shown in Figures 8 and 9.

It can be seen that different numbers of active cores result in different probabilistic distributions of frequencies and different probabilities of triggering DFS. When all cores are powered on, the probability that the processor runs at the maximum frequency is only 44.16%, and the probability of triggering DFS is 55.84%. This means that all cores run at full speed only when the IPCs of the tasks are low. If the IPCs increase beyond a certain extent, the running frequency is scaled down to keep the temperature under the threshold. As the number of active cores decreases, the probability that the processor runs at the maximum frequency increases, and the probability of triggering DFS decreases. When the number of active cores is less than six, no matter what the IPCs are, the power saved by shutting off more than two cores guarantees that the active cores run at full speed, so the probability that the processor runs at the maximum frequency is 100%, and the probability of triggering DFS is 0%.

7.5. Comparisons of Temperatures with and without DFS

If DFS is not used, the processor always runs at full speed; that is, the running frequency is constantly 1. The average hotspot temperature of an active core without DFS can then be obtained according to (52). If power gating and DFS are used simultaneously for dynamic thermal management, the running frequency can be scaled to keep the temperature of the processor under the thermal threshold, and the average hotspot temperature of an active core with DFS can be obtained according to (63). Table 3 compares active cores with and without DFS, in terms of the average hotspot temperature and the probability that the hotspot temperature exceeds the threshold, when the number of active cores is 6, 7, and 8.

If the processor uses DFS, the running frequency is scaled down to reduce the temperature once the hotspot temperature reaches the threshold. The hotspot temperature therefore does not exceed the threshold; that is, the probability that the hotspot temperature exceeds the threshold is 0%. If the processor does not use DFS, the running frequency is always the maximum and the hotspot temperature may exceed the threshold; that is, the probability that the hotspot temperature exceeds the threshold is larger than 0%. Therefore, the average hotspot temperature of the processor with DFS is lower than that without DFS.
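Because the hotspot temperature at a fixed frequency is modeled as normally distributed, the probability of exceeding the threshold without DFS is a normal tail probability. The sketch below evaluates it with the standard library only. The function names and the numerical values of the mean, standard deviation, and threshold are hypothetical; in the paper the mean and variance would come from the derived temperature distribution, which is not reproduced here.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative distribution function of N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def prob_exceeds_threshold(mu: float, sigma: float, threshold: float) -> float:
    """P(T > threshold) for a hotspot temperature T ~ N(mu, sigma^2), i.e. without DFS."""
    return 1.0 - normal_cdf(threshold, mu, sigma)


if __name__ == "__main__":
    # Hypothetical numbers (deg C), chosen only to show the trend.
    threshold = 85.0
    for mu in (75.0, 80.0, 85.0):
        p = prob_exceeds_threshold(mu, sigma=5.0, threshold=threshold)
        print(f"mean {mu:.0f} C -> P(T > {threshold:.0f} C) = {p:.4f}")
```

The closer the mean temperature at full speed lies to the threshold, the larger this probability becomes. With DFS the same probability is 0% by construction, since the frequency is lowered whenever the threshold is reached.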

As the number of active cores decreases, the power saved by shutting off cores makes it more likely that the active cores can execute at full speed, and the cooling effect of DFS weakens until it disappears. Therefore, the average hotspot temperatures of the processors with and without DFS become closer as the number of active cores decreases, as shown in Table 3. Even without DFS, the probability that the hotspot temperature exceeds the threshold falls to 0% as more cores are powered off.

When fewer than 6 cores are active, that is, when more than 2 cores are powered off, the hotspot temperature does not exceed the threshold whatever speed the processor runs at, so the active cores never need to trigger DFS to manage the temperature of the processor and can run at the maximum frequency. Therefore, when fewer than 6 cores are active, the probability that the hotspot temperature exceeds the threshold is 0%, and the average hotspot temperatures of the processor with and without DFS are the same. Because DFS makes no difference in this situation, the corresponding comparisons are not given in Table 3.
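The qualitative behaviour described in this subsection can be reproduced with a small Monte Carlo sketch. It is not the analytical method of the paper: equations (52) and (63) are not used, and every number below is a hypothetical placeholder (the frequency levels, ambient temperature, threshold, linear temperature coefficients, and IPC distribution). The only property taken from the paper is that the hotspot temperature is linear in the IPC at a fixed frequency and that the IPC is normally distributed.

```python
import random

FREQS = (1.0, 0.9, 0.8, 0.7, 0.6, 0.5)  # hypothetical normalized DFS levels, highest first
T_AMBIENT = 45.0                         # hypothetical ambient temperature, deg C
T_THRESHOLD = 85.0                       # hypothetical thermal threshold, deg C


def hotspot_temp(ipc: float, freq: float, n_active: int) -> float:
    """Toy steady-state model, linear in IPC at a fixed frequency (made-up coefficients)."""
    return T_AMBIENT + n_active * freq * (3.0 + 2.0 * ipc)


def pick_frequency(ipc: float, n_active: int) -> float:
    """Highest level that keeps the hotspot at or below the threshold (lowest level as fallback)."""
    for f in FREQS:
        if hotspot_temp(ipc, f, n_active) <= T_THRESHOLD:
            return f
    return FREQS[-1]


def compare(n_active: int, samples: int = 20000, seed: int = 0) -> dict:
    """Sample IPCs and compare hotspot temperatures without DFS (f = 1) and with DFS."""
    rng = random.Random(seed)
    no_dfs, with_dfs = [], []
    for _ in range(samples):
        ipc = max(0.0, rng.gauss(1.0, 0.3))  # hypothetical IPC distribution, clipped at 0
        no_dfs.append(hotspot_temp(ipc, 1.0, n_active))
        with_dfs.append(hotspot_temp(ipc, pick_frequency(ipc, n_active), n_active))
    return {
        "mean_no_dfs": sum(no_dfs) / samples,
        "mean_with_dfs": sum(with_dfs) / samples,
        "p_exceed_no_dfs": sum(t > T_THRESHOLD for t in no_dfs) / samples,
        "p_exceed_with_dfs": sum(t > T_THRESHOLD for t in with_dfs) / samples,
    }


if __name__ == "__main__":
    for n in (8, 7, 6, 4):
        r = compare(n)
        print(n, {k: round(v, 4) for k, v in r.items()})
```

Under these assumed parameters, the gap between the two average temperatures shrinks as cores are powered off and vanishes once the full-speed temperature can no longer reach the threshold, which mirrors the trend reported for Table 3. The absolute numbers carry no meaning for the processor studied in the paper.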

8. Conclusions

In this paper, a probabilistic method is presented for analyzing the temperature and frequency of multicore processors while taking workload variation into account. It is proved theoretically that (i) the hotspot temperatures of both active cores and inactive cores are linear functions of the IPC; (ii) the hotspot temperature follows a normal probabilistic distribution under the assumption that the IPCs of all cores follow the same normal distribution; and (iii) the running frequency follows a probabilistic distribution.

Several observations follow from the experimental results. The estimation error rates of the dynamic powers vary significantly across functional units, indicating that the linear correlation between the IPC and the dynamic power differs from one functional unit to another. The estimation error rates of the leakage powers are similar for active cores and inactive cores, showing similar linear correlations between temperature and leakage power across functional units. A higher estimation accuracy of leakage powers can be obtained in the temperature interval between 60°C and 100°C at the cost of lower accuracy in other intervals. The hotspot temperature of the processor is not deterministic and varies significantly for a given number of active cores and a given frequency, and the number of active cores and the running frequency jointly determine the probabilistic distribution of the hotspot temperature. Finally, different numbers of active cores result in different probabilistic distributions of frequencies and different probabilities of triggering DFS.

Competing Interests

The authors declare no competing interests.

Authors’ Contributions

Biying Zhang performed the theoretical analysis and wrote the major part of this paper. Gang Cui and Zhongchuan Fu are, respectively, the supervisor and the cosupervisor of Biying Zhang's current Ph.D. work. Hongsong Chen performed the experiments.

Acknowledgments

This work is supported by the project under Grant no. SGSDWH00YXJS1500229, Beijing Natural Science Foundation (4142034), and Fundamental Research Funds for the Central Universities (no. FRF-BR-15-056A).

Copyright

Copyright © 2016 Biying Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
