Abstract

Modern embedded applications are becoming increasingly complex and resource demanding. Systems on Chip (SoC) are one of the keys to meeting their requirements, which call for high performance while maintaining a low-power profile. On the one hand, due to the limited power budget imposed by batteries, power has become the limiting factor of logic CMOS. On the other hand, the downscaling of the technology node to 65 nm and beyond, taking the International Technology Roadmap for Semiconductors (ITRS) as a reference, has resulted not only in high energy consumption but also in increased chip temperature. To address this challenge, designing at the system level is the appropriate way to tackle the complexity of Systems on Chip, aiming at a better trade-off between simulation time and accuracy for power and temperature estimations. In a first stage, we present two models describing static and dynamic power at the physical level. These models are implemented on an open virtual platform for modeling power consumption and temperature in SystemC/TLM (LIBTLMPWT), based on a representative SoC architecture. In a second stage, we focus on the power and, especially, the thermal behaviour of the chip while running a set of three benchmarks built on the game of life application for two different technology nodes.

1. Introduction

Today, the demand for more advanced technology is high, as people's needs and lifestyles change. A good example is smartphones, which are becoming increasingly complex and resource demanding. This is one reason why technology scaling enables the integration of different processing elements, input/output components, and memories on a single silicon die to form a SoC [1]. These SoCs are built from very small components, MOSFETs [2] (Metal Oxide Semiconductor Field Effect Transistors), and downscaling remains the most effective way to achieve low power consumption and high performance as long as the chip area is kept constant. Nonetheless, the actual scaling trend of recent years has been more aggressive: we have suffered from the increase of both dynamic and static power consumption while enjoying the increase of clock frequency and the downsizing of the technology node.

Looking back at Moore's law [3], the number of transistors on a chip was 32 when he made his prediction, while today approximately a billion transistors are integrated on a single chip. However, short-channel effects start to dominate as transistors shrink and the technology node drops below 65 nm. As a consequence, leakage current grows rapidly, which increases the static power consumption.

According to the IRDS and ITRS [4] (International Roadmap for Devices and Systems and International Technology Roadmap for Semiconductors), the latest reports anticipate that power issues in general get worse as we move to the next technology node. Hence, power estimation needs to be done at an early stage of the design flow.

Accordingly, recent research studies have proposed different power estimation techniques, ranging from the gate level to the system level. The latter is considered a vital premise for dealing with critical design constraints, even if some common practices still rely on low abstraction levels, which is unfortunately not suitable for the ever-growing complexity of embedded systems supporting complex applications.

In fact, the system level [5] includes various abstraction levels, such as transactional, functional, and cycle accurate. However, the latter two require long simulation runs, so they are not considered in our approach. At this stage, transaction-level modeling is the appropriate level to model digital systems, in which the computation units and the communication details are separated from each other. FIFOs or buses are modeled as channels and presented to modules using SystemC interface classes.
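To make this separation concrete, the following minimal SystemC/TLM-2.0 sketch (our own illustration, not code taken from LIBTLMPWT; the module name and register address are hypothetical) shows an initiator issuing a single blocking write transaction through a socket instead of driving pin-level signals:

```cpp
// Minimal TLM-2.0 example: communication is a transaction on a socket,
// not a set of wires. Names (CpuModel, register address) are hypothetical.
#include <systemc>
#include <tlm>
#include <tlm_utils/simple_initiator_socket.h>

struct CpuModel : sc_core::sc_module {
  tlm_utils::simple_initiator_socket<CpuModel> socket;   // bus modeled as a channel

  SC_CTOR(CpuModel) : socket("socket") { SC_THREAD(run); }

  void run() {
    uint32_t data = 0xCAFEu;
    tlm::tlm_generic_payload trans;
    sc_core::sc_time delay = sc_core::SC_ZERO_TIME;
    trans.set_command(tlm::TLM_WRITE_COMMAND);
    trans.set_address(0x40000000);                        // hypothetical device register
    trans.set_data_ptr(reinterpret_cast<unsigned char*>(&data));
    trans.set_data_length(sizeof(data));
    trans.set_streaming_width(sizeof(data));
    trans.set_byte_enable_ptr(nullptr);
    socket->b_transport(trans, delay);                    // one call models the whole bus access
    wait(delay);                                          // annotated timing, not cycle accuracy
  }
};
```

On the target side, b_transport() decodes the address and performs the access; only the transaction and its annotated delay are exchanged, never individual bus signals.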

In this paper, we present not only static and dynamic power models but also temperature estimation, implemented in the open virtual platform LIBTLMPWT [6], where the temperature computations are performed by the ATMI [7] (Analytical Model of Temperature in Microprocessors) library. Different benchmarks are run to capture the effect of the software on the hardware part while changing the technology node.

The rest of the paper is organized as follows. Section 2 presents an overview of the related works. In Section 3, physical models are described through the power estimation methodology exposing the relation between power density and temperature. Section 4 provides some applications run on the platform to evaluate the accuracy of the power models implemented. Finally, Section 5 concludes the paper.

2. Related Works

In the last decade, various research efforts have been devoted to estimating power and temperature dissipation at different abstraction levels of embedded systems [8]. Tiwari et al. [9] presented the concept of power estimation at the software level through the ILPA (Instruction Level Power Analysis) approach. Using an ISS (Instruction Set Simulator) allows extracting instruction traces and then estimating the power consumed by the program running on the processor. This approach is no longer efficient when working on complex architectures. Hence, the FLPA [10] (Functional Level Power Analysis) methodology was proposed for faster elaboration of processor power models based on algorithmic and configuration parameters. This method was successfully applied to different hardware components, relying on the identification of a set of functional blocks that influence the power consumption of the target component. One of the tools based on this methodology is Soft Explorer, which covers the power analysis of simple and complex processors. This tool was included as a part of CAT [11] (Consumption Analysis Toolbox), which provides precise estimations in a very short time. However, some parameters, such as cache miss rates, may be difficult to determine precisely when complex hardware or software is involved.

Evaluating system power consumption at a higher abstraction level, in order to reduce simulation time, has been one of the solutions proposed in most studies. More recently, the EDPE (Early Design Power Estimation) methodology was introduced. In this method, the authors relied on characterizing the power consumed by each hardware component, and the resulting models were integrated into the SoCLib [12] library. The power models operate at a fine-grain level, which makes the approach more complicated for multiprocessor systems.

In all the approaches cited so far, power-state models are used early on for the components of the chip. However, there is no notion of temperature, which makes it impossible to rely on such power simulators without a power and temperature feedback loop. A system-level analytical model was proposed by Kumar and Thiele [12], taking into account the consumption and thermal behaviour of a chip, but the consumption of hardware blocks other than processors was not considered.

Our solution follows the same principles as [13]: we can run the actual software after including all the hardware components. Using LIBTLMPWT allows us to obtain early estimations of power and temperature, even though it lacked accuracy since the static power model originally implemented was based on a linear equation. Our contribution focuses on implementing realistic models on this open platform, with the aim of analyzing the thermal behaviour of the chip for different technology nodes.

3. Case Study

This section details our hardware/software cosimulation methodology for a soft-core processor, covering the SystemC/TLM level for more effective management of complexity, improved quality, and reduced time to market.

This methodology is built on emulating the behaviour of the different parts of the system in terms of power consumption. The power modeling process rests on two aspects: the characterization of the main activities and the granularity of the power model. The second aspect concerns the granularity of the relevant activities, which covers a large spectrum, from the fine-grain level, such as logic gate switching, to the coarse-grain level, such as hardware component events.

Generally, for system-level designers, fine-grain power estimation appears better correlated with technological parameters and data, whereas coarse-grain power models are based on microarchitectural activities that cannot be determined easily given the complexity of the system.

Thus, the analysis of the global consumption goes through the study of the consumption of each element, in order to estimate the consumption of a function that uses them. The complexity of the proposed solution will become more visible when extending this work to multi-core systems, where multiple sources of power dissipation exist; spatial placement will then play a crucial role in addition to time.

Figure 1 presents the applied methodology. Our first step is to implement generic power models at the system level in the LIBTLMPWT library, computing the power density of modern electronic systems.

A major issue with deep submicron processes is the leakage current, which is strongly affected by the threshold voltage, the oxide thickness, and the gate length. Since our architecture is based on a soft-core processor model affected by these transistor-level issues, the second step of our methodology describes the software applications derived from the game of life application, used to evaluate the accuracy of the models and to expose the thermal behaviour for two different technology nodes.

As highlighted in Figure 1, the current version of our methodology considers a shared-memory system. The methodology can be extended to take other architectures into account, including multiprocessor systems. For now, the target architecture includes a single soft-core processor able to execute different tasks and to share resources through a system bus.

In general, the implemented generic power models rely on different parameters: static ones, such as the capacitance, the leakage current, the temperature, and all physical values that depend on the technology; and dynamic ones, such as the supply voltage of each component, the frequency, and the active gates ratio, which can change at every clock cycle.
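As a sketch of how these parameters can be grouped (our own illustration; the names below are hypothetical and do not reflect the LIBTLMPWT data structures), the technology-dependent values can be kept separate from the runtime ones:

```cpp
// Hypothetical grouping of power-model parameters (not the LIBTLMPWT API).
struct TechnologyParams {          // fixed once the technology node is chosen
  double gate_capacitance;         // C per gate (F)
  double leakage_current_ref;      // I0, reference leakage per transistor (A)
  double threshold_voltage;        // Vth (V)
  double transistors_per_mm2;      // transistor density for the node
};

struct RuntimeParams {             // may change during simulation
  double supply_voltage;           // Vdd (V)
  double frequency;                // f (Hz)
  double active_gates_ratio;       // alpha in [0, 1], can change every clock cycle
  double temperature;              // T (K), fed back by the thermal solver
};
```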

We consider a system serving several independent tasks denoted A1, A2, ..., Ai. Each task is characterized by a timing model and a power model; here we focus on the power models. While executing tasks, the system consumes power, which can be recorded as the power trace of the system. This power is the sum of three parts, the dynamic power P_dyn, the static power P_stat, and the short-circuit power P_sc, as expressed by equation (1). The short-circuit power also belongs to the dynamic component: it results from the direct-path short-circuit current that flows when the NMOS and PMOS transistors are simultaneously active, conducting current directly from supply to ground. This source of power is neglected for the moment in our approach. The total energy consumption is obtained by integrating the power over time; when the power is constant, the integration reduces to a multiplication by the duration.
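Equation (1) itself does not appear in this text; based on the description above, it is assumed to be the usual decomposition

P_{total} = P_{dyn} + P_{stat} + P_{sc},    (1)

and the energy over an interval [t_0, t_1] is E = \int_{t_0}^{t_1} P_{total}(t)\,dt, which reduces to E = P_{total} \cdot (t_1 - t_0) when the power is constant over the interval.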

3.1. Dynamic Power Model

The charging and discharging of the capacitance C in MOSFETs are the source of dynamic switching power dissipation [14]. During each low-to-high output transition, the load capacitance C is charged through the PMOS transistor and energy is drawn from the power supply; part of this energy is dissipated in the PMOS device and part is stored on C. During the high-to-low output transition, C is discharged and the stored energy is dissipated through the NMOS transistor. This power is independent of temperature; however, it depends on runtime parameters such as the supply voltage, the frequency, the active gates ratio, and the capacitance per gate.
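The corresponding equation is not reproduced in the text; the runtime parameters listed above (active gates ratio \alpha, capacitance per gate C, supply voltage V_{dd}, and frequency f) match the standard CMOS switching-power expression, which we assume here takes the form

P_{dyn} = \alpha \, N_{gates} \, C \, V_{dd}^{2} \, f,

where N_{gates} is our notation for the number of gates, derived from the transistor density of the technology node and the component area.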

3.2. Static Power Model

Static power is usually low in relatively long-channel MOSFET technologies. With current technology trends and increased MOSFET miniaturization, however, hundreds of millions of transistors can be placed on a single chip, which increases the subthreshold leakage current, band-to-band tunneling, and gate tunneling leakage [15]. Currently, subthreshold leakage appears to be the main contributor among all types of leakage power [16]; it results from subthreshold conduction when there is no activity in the circuit and the gate-to-source voltage is below the threshold voltage V_th.

Equation (4) gives the formula of the static power, followed by the expression of the leakage current in equation (5), where n is the subthreshold-swing coefficient [17], expressed from the depletion-layer capacitance per unit area C_dep and the gate oxide capacitance C_ox, and V_T is the thermal voltage.
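Equations (4) and (5) themselves are not reproduced here; from the quantities named above they are assumed to take the standard subthreshold form

P_{stat} = N_{gates} \, V_{dd} \, I_{leak},    (4)

I_{leak} = I_0 \, e^{(V_{gs} - V_{th}) / (n V_T)} \left(1 - e^{-V_{ds}/V_T}\right),    (5)

with n = 1 + C_{dep}/C_{ox} and V_T = kT/q. The thermal voltage V_T is what couples the static power to the chip temperature T in the feedback loop of Section 3.3.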

3.3. Thermal Model

The temperature solver ATMI computes the temperature at the sensor location from the power consumption of the components, taking the floor plan as input. At the beginning, an initial temperature is fixed in order to compute the power density, which depends on it. We then compute the temperature corresponding to this power density, recompute the power density with the temperature just obtained, and so on.
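A minimal, self-contained sketch of this fixed-point iteration is given below. The solve_temperature() function is only a stand-in for the ATMI call (whose actual interface is not reproduced here), and the leakage model is a toy exponential used purely for illustration:

```cpp
// Illustrative fixed-point loop between power density and temperature.
// solve_temperature() stands in for the ATMI solver, and static_power()
// is a toy leakage model: both are assumptions, not the platform's code.
#include <cmath>
#include <cstdio>

static double static_power(double temp_k) {
  // Toy model: leakage grows exponentially with temperature.
  return 0.1 * std::exp((temp_k - 300.0) / 40.0);         // watts
}

static double solve_temperature(double power_density) {
  // Stand-in for ATMI: ambient temperature plus a linear thermal resistance.
  return 299.35 + 8.0 * power_density;                     // kelvin
}

int main() {
  const double area_mm2 = 4.0;                             // component area
  const double dynamic_density = 0.05;                     // W/mm^2, kept fixed here
  double temperature = 299.35;                             // ~26.2 degC initial guess

  for (int iter = 0; iter < 100; ++iter) {
    // 1. Power density depends on the current temperature (via leakage).
    double density = dynamic_density + static_power(temperature) / area_mm2;
    // 2. The thermal solver returns the temperature produced by that density.
    double new_temperature = solve_temperature(density);
    // 3. Stop when the loop has converged.
    if (std::fabs(new_temperature - temperature) < 1e-3) break;
    temperature = new_temperature;
  }
  std::printf("steady-state temperature: %.2f K\n", temperature);
  return 0;
}
```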

In the second stage of our methodology, evaluating the power and temperature of the proposed virtual platform requires characterizing the activities, which involves a certain number of microbenchmarking experiments and thus significant time to build the power models. Fortunately, the TLM approach makes it possible to run embedded software applications (Ai) on the ISS (Instruction Set Simulator) of the soft-core processor, with the static power and the temperature for two different technology nodes as the core focus. The common set of parameters of the generic power models used is listed in Table 1.

4. Evaluation and Results of Power and Temperature Models

This section validates and highlights the benefits of the power models in the context of a SoC architecture where an external solver can be used. We cosimulate the functional behaviour with the power and temperature solver, which enables a bidirectional interaction between them.

4.1. Experimental Settings

Extensive simulation experiments have been conducted to validate the proposed models. We perform our experimental simulations on a MicroBlaze soft-core processor for a System on Chip implemented in the SystemC/TLM simulator provided by the SoCLib library.

After including all the hardware components, namely a shared memory split into an instruction part and a data part, a VGA controller, a timer, and the usual devices, the actual software is run. The benchmarks used to evaluate the models are based on the game of life application [18]. It is a cellular automaton, played on an infinite grid of square cells, whose evolution is determined only by its initial state. Every cell interacts with its eight neighbors, which are the horizontally, vertically, and diagonally adjacent cells.
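For reference, the update rule that the benchmark implements can be sketched as follows (a plain C++ illustration on a finite, wrap-around grid; it is not the benchmark's actual source):

```cpp
// One generation of Conway's game of life on a W x H grid with wrap-around edges.
#include <array>

constexpr int W = 64, H = 48;
using Grid = std::array<std::array<bool, W>, H>;

Grid step(const Grid& g) {
  Grid next{};
  for (int y = 0; y < H; ++y) {
    for (int x = 0; x < W; ++x) {
      int alive = 0;
      for (int dy = -1; dy <= 1; ++dy)          // count the eight neighbours
        for (int dx = -1; dx <= 1; ++dx)
          if (dx || dy)
            alive += g[(y + dy + H) % H][(x + dx + W) % W];
      // A live cell survives with 2 or 3 neighbours; a dead cell is born with exactly 3.
      next[y][x] = g[y][x] ? (alive == 2 || alive == 3) : (alive == 3);
    }
  }
  return next;
}
```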

The processor model is built for the 65 nm and 22 nm technology nodes. Using the parameters presented in Table 1, we assume initially T = 26.2°C, V_dd = 3 V, and f = 50 MHz. The period of the tasks in the applications is assumed to equal the temperature deadline, which is up to 100 seconds, as can be seen in Figure 2. The number of transistors is estimated at 2 million transistors/mm² for the 65 nm technology node and 15.3 million transistors/mm² for 22 nm, based on Intel newsroom figures [18]. Finally, the task activity ratio α is uniformly distributed in the interval [0, 1].

4.2. Simulation Results

We evaluate the accuracy and efficiency of the proposed models using a longer benchmark run. As shown in Figure 3, while running the game of life application illustrated in Figure 4, the CPU, the VGA controller, and the bus are the most consuming devices when the processor is computing. We first discuss how the idle mode is modeled: according to the system model, when no task is executing, the processor is idle with a power consumption Pidle = 0.58 W for one second, after which it switches to the running mode for half a second.

Looking at a short time range, we plot the corresponding temperature trace in Figure 5, which fluctuates by more than 12°C for the soft-core processor. As a consequence of the dissipation phenomena, the temperature increases during periods of high consumption and decreases during periods of low consumption.

Coming back to the discussion of Section 3, static power exhibits a strong dependence on subthreshold leakage. For a single gate it is a small fraction of the total power, but as feature sizes scale down it poses new low-power design challenges. Figures 6 and 7 show that when moving to the 22 nm technology node the static power becomes more significant.

The second benchmark polls the device registers instead of using idle modes and interrupts. The life-polling application keeps the CPU constantly busy. The behaviour of the system is preserved, apart from the power consumed by the IPs, which rises as a consequence. The results for this application can be seen in Figure 8.

The temperature of the system as a function of time is plotted in Figure 5. Its oscillations are amplified instead of remaining stable, which may damage the chip. A further application controls the temperature of the embedded system, and overcomes this drawback, by running PID controller software on the target platform. Figure 9 shows that the temperature curves are smoothed compared to the previous applications.
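A minimal sketch of such a controller is shown below (our own illustration of the idea, not the life-PID source; the gains and the platform hooks in the commented usage are hypothetical):

```cpp
// Textbook discrete PID loop regulating chip temperature by throttling activity.
struct Pid {
  double kp, ki, kd;          // gains, to be tuned for the target platform
  double integral = 0.0;
  double prev_error = 0.0;

  double update(double setpoint, double measured, double dt) {
    double error = setpoint - measured;
    integral += error * dt;
    double derivative = (error - prev_error) / dt;
    prev_error = error;
    return kp * error + ki * integral + kd * derivative;
  }
};

// Hypothetical control loop: a positive output lets the CPU run more,
// a negative one throttles it. read_temperature_sensor(), set_cpu_activity(),
// clamp01(), and sleep_ms() are placeholder platform hooks.
//
//   double target = 60.0;                      // degC, hypothetical setpoint
//   Pid pid{0.5, 0.05, 0.1};
//   while (true) {
//     double t = read_temperature_sensor();
//     double u = pid.update(target, t, 0.1);   // dt = 100 ms control period
//     set_cpu_activity(clamp01(0.5 + u));
//     sleep_ms(100);
//   }
```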

As can be seen in Table 2, the simulation results obtained from the virtual platform are as expected. Each application is characterized by its power consumption: the life-polling and life-PID applications are close to each other, whereas the game of life consumes less.

Finally, to verify the usefulness of the whole platform, Figures 10 and 11 present the power consumed by the game of life benchmark for the two technology nodes, which confirms that with newer technologies static power becomes predominant.

5. Conclusion

Managing on-chip temperature and power consumption is a key aspect of the design of today's electronic devices. LIBTLMPWT proved to be a suitable tool for building a virtual platform that includes both the hardware and the software parts.

This paper presents one of the efforts made to achieve accurate estimations, in which physical models were implemented and simulated at the system level.

Each task is characterized by a power consumption and a temperature computed by means of the ATMI solver. Two different technology nodes are tested to highlight the static power issue that accompanies Moore's law.

We experimented on a small but representative architecture, containing both hardware IPs and a soft-core processor. As future work, it would be interesting to target a larger platform containing several IPs and processors.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

We declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

Special thanks are due to Matthieu Moy, Lecturer at the Verimag laboratory, and Grenoble INP Ensimag, who partially supported our research leading to these results.