
Regulating CPU Temperature With Thermal-Aware Scheduling Using a Reduced Order Learning Thermal Model

Anthony Dowling Lin Jiang Ming-Cheng Cheng Yu Liu Department of Electrical and Computer Engineering, Clarkson University, 8 Clarkson Ave., Potsdam, 13699, New York, USA
Abstract

Modern real-time systems utilize considerable amounts of power while executing computation-intensive tasks. The execution of these tasks leads to significant power dissipation and heating of the device, which in turn causes severe thermal issues such as temperature escalation, high thermal gradients, and excessive hot spot formation. These issues may degrade chip performance, accelerate device aging, and cause premature failure. Thermal-Aware Scheduling (TAS) enables optimization of thermal dissipation to maintain a safe thermal state. In this work, we implement a new TAS algorithm, POD-TAS, which manages the thermal behavior of a multi-core CPU based on a defined set of states and their transitions. We compare the performance of a dynamic RC thermal circuit simulator (HotSpot [34]) and a reduced-order Proper Orthogonal Decomposition (POD)-based thermal model, and we select the latter for use in our POD-TAS algorithm. We implement a novel simulation-based evaluation methodology to compare TAS algorithms, and use it to evaluate the performance of the proposed POD-TAS algorithm. Additionally, we compare a state-of-the-art TAS algorithm, RT-TAS [19], to our proposed POD-TAS algorithm, utilizing the COMBS benchmark suite [9] to provide CPU workloads for task scheduling. Our experimental results on a multi-core processor using a set of 4 benchmarks demonstrate that the proposed POD-TAS method can improve thermal performance by decreasing the peak thermal variance by 53.0% and the peak chip temperature by 29.01%. Using a set of 8 benchmarks, the comparison of the two algorithms shows a decrease of 29.57% in the peak spatial variance of the chip temperature and 26.26% in the peak chip temperature. We also identify several potential future research directions.

keywords:
Thermal-Aware Scheduling, Proper Orthogonal Decomposition, High-Resolution Thermal Modelling, CPU Thermal Management, Real-Time Scheduling

1 Introduction

Real-time computing systems aim to execute computational tasks to meet real-world clock deadlines as required by their use case. Modern embedded computing chips used in real-time systems comprise large numbers of transistors which execute computationally massive tasks. However, these complex devices suffer from large amounts of heat dissipation due to the electricity required to power their integrated circuits. This heat dissipation can create a number of issues within the chip’s circuitry, including mechanical stress and electrical malfunction, which can render the chip unable to meet the real-time deadlines of tasks. This has led to the implementation of methods that minimize heat dissipation to avoid hardware damage and maintain the ability to meet real-time deadlines. These improvements can aid in increasing chip reliability for safety-critical systems such as autonomous vehicles, in-flight computers, and others, which in turn improves the level of safety that these systems can maintain for their operators.

Various measures have been proposed to control the thermal behavior of modern embedded chips to address the issue of thermal stress. These measures include both reactive and proactive methods [18]. A reactive method takes action when a temperature becomes too high, while a proactive method aims to avoid the temperature rising too high in the first place. Reactive methods include clock gating, power gating, and Dynamic Voltage and Frequency Scaling (DVFS). Of these, DVFS is a popular solution [18, 37], but since it reduces power consumption by lowering the chip voltage and operating frequency, it degrades performance, which may become an obstacle for time-sensitive real-time systems [36]. The major argument in favor of using DVFS is that while performance is sacrificed, deadlines can still be met [18]. This argument relies on a moderate and predictable workload so that the scheduler can generate enough slack time while still maintaining the thermal state of the chip; a heavy or sporadic workload challenges DVFS and requires a more effective scheduling approach to maintain real-time task deadlines. Furthermore, not all available chips support DVFS, as some do not allow the operating frequency to be modified. This limits the generalizability of algorithms that rely on DVFS.

To address the thermal stress issues proactively without negatively impacting the performance of the multi-core processor, thermal-aware task scheduling aims to optimize heat dissipation while maintaining the deadlines of tasks as required by hard real-time systems. By applying Thermal-Aware Scheduling (TAS), the thermal output is lowered, which minimizes the cooling costs, energy consumption, and hazards to the chip’s reliability. Thus, TAS is a very active research area [18, 37]. However, to enhance the effectiveness of TAS, accurate and efficient thermal simulation of the CPU is required. In the past, many methods for thermal simulation have been developed, and each carries differing levels of accuracy and efficiency.

Rigorous approaches to thermal simulation that provide a high-resolution and accurate thermal profile capable of capturing all hot spots are called Direct Numerical Simulations (DNSs). These methods include the Finite Difference (FDM), Finite Element (FEM), and Finite Volume (FVM) Methods. Their drawback is that they require a very large number of degrees of freedom and therefore carry a high computational cost [15], which makes them challenging to use for chip-level thermal simulation. Another, more efficient and more commonly used method of thermal simulation is the RC-element thermal circuit. These models are commonly used for large-scale CPUs due to their efficiency and simplicity [7, 35]. RC circuit models treat each Functional Block (FB) as a thermal node; thus, the temperature in each FB is assumed to be uniform in space. This assumption limits the potential accuracy of the model [16]. This has been improved in the HotSpot thermal simulator [34] by using a grid-based RC circuit approach, but even then, this method carries accuracy drawbacks due to the inherent limitations of the thermal circuit approach. Additionally, the use of very small RC elements in grid mode makes the computational burden of the RC circuit approach similar to that of DNS methods.

With the goal of achieving real-time TAS, the capabilities of the chosen thermal model must be taken into consideration. A computationally heavy model, like a DNS method, will limit the speed of the scheduling algorithm, making it unable to make decisions in real time. An inaccurate thermal model could cause the algorithm to make sub-optimal decisions, or even decisions that damage the CPU while executing tasks. Thus, a highly accurate and computationally efficient thermal model is required for scheduling.

Proper Orthogonal Decomposition (POD), a physics-informed, data-centric learning model, can be applied in TAS. POD enables the complex thermal behavior of a physical domain to be trained using a reduced-order method. This method uses collected data to train POD modes that may be used for more efficient thermal simulation of a CPU. Due to the reduced order inherent to POD, the computational cost of thermal prediction is massively reduced. However, even with this reduction in computational cost, the accuracy of the model is comparable to DNS [16]. In this work, we present an initial result comparing POD prediction with FEM DNS and a dynamic RC circuit simulation (HotSpot); see Figure 2, where POD outperforms HotSpot in accuracy and very closely matches the FEM DNS.

Many current TAS algorithms, including RT-TAS [19], are designed to utilize a steady state thermal model [19, 21, 31]. These models do not provide information regarding the transient thermal state of the chip (See Figure 1). The transient thermal state of a chip depicts how the temperature of the chip changes over time. The transient thermal state can vary from the steady state temperature of the chip due to the time it takes the chip to reach the steady state temperature after a change in power dissipation. Thus, in this work, we develop a dynamic POD-based TAS algorithm, named POD-TAS. POD-TAS uses a reduced-order thermal modeling approach which enables a dynamic model that is capable of accurate prediction of the transient thermal state with high spatiotemporal resolution. This enables POD-TAS to assign tasks based on the CPU’s transient thermal behavior.

We implement the POD thermal model and POD-TAS scheduling algorithm and replicate the steady-state RC circuit-based thermal model used by Lee et al. [19] and the RT-TAS [19] scheduling algorithm for comparative evaluation. This algorithm is chosen to compare to POD-TAS due to the differences in the methods the algorithms employ for temperature management. RT-TAS focuses on minimizing temperature by balancing the steady-state temperatures of the CPU cores, while POD-TAS observes the transient thermal state of the CPU. By comparing POD-TAS with an algorithm that relies only on steady-state temperature, the potential advantages of using a dynamic thermal model for scheduling can be investigated.

Using a simulation-based evaluation methodology (Section 6), we compare our proposed POD-TAS algorithm to RT-TAS [19], an existing state-of-the-art TAS algorithm that relies on a steady-state thermal model. Our evaluation methodology enables a direct comparison of the CPU thermal behavior while executing task schedules created by the two algorithms. Our simulation pipeline includes gem5 [6], a cycle-accurate CPU simulator that allows detailed traces of architecture-level events to be collected during execution. These traces are then used as input to McPAT [20], a power simulator that can generate a dynamic power map of each FB of the CPU with high temporal resolution. Lastly, FEniCS [10] is used to collect the dynamic temperature profile of the CPU during schedule execution via FEM-based DNS. Refer to Figures 4, 6, and 7. The COMBS [9] benchmark suite is used to provide task workloads during our evaluation of both TAS algorithms (see Table 4).

Therefore, the contributions of this work can be summarized as follows:

  1. We implement a new TAS algorithm, POD-TAS, that controls the temperature of a multi-core CPU based on a pair of temperature thresholds (Section 5). This algorithm is the first to use a POD-based thermal model, which provides temperature information with high spatiotemporal resolution. We describe a set of CPU core thermal states and transitions among them to manage the CPU behavior, which avoids the assignment of tasks to cores in unsuitable states. See Figure 3.

  2. We compare the accuracy of a popular RC thermal circuit simulator, HotSpot, with a POD-based thermal model, using FEM DNS as a baseline. See Figure 2. This comparison supports the use of a POD-based thermal model over others.

  3. This work provides a novel simulation-based methodology for the evaluation of TAS algorithms, which measures algorithm performance with high fidelity without the need for hardware setups (Section 6). A state-of-the-art benchmark suite, COMBS [9], is used to provide CPU workloads during task scheduling.

  4. We evaluate the POD-TAS algorithm to demonstrate its capability to manage the thermal behavior of the CPU.

  5. POD-TAS is compared to the existing state of the art, RT-TAS [19], using the proposed methodology (Section 6). We implement the POD thermal model and POD-TAS scheduling algorithm and replicate the RT-TAS [19] thermal model and scheduling algorithm for comparative evaluation (see Figure 13). Two evaluation cases per algorithm (POD-TAS and RT-TAS [19]) are performed using subsets of the benchmark suite, one with 4 benchmarks and another with 8 benchmarks. See Tables 4 and 5.

  6. Using a set of 4 benchmarks, we observe that POD-TAS can decrease the peak spatial thermal variance by 53.0% and the peak chip temperature by 29.01%. The variance of the maximum CPU temperature over time is decreased by 96.18%, and the variance of the mean temperature is decreased by 88.69% as well. Using a set of 8 benchmarks, POD-TAS decreases the variance of the maximum chip temperature by 93.26% and the peak chip temperature by 26.26%. See Table 5.

Our results indicate that the use of an accurate and efficient thermal model, POD, can improve the performance of TAS algorithms. However, the TAS algorithm must be designed to leverage the capabilities of the dynamic thermal model that underlies its decision-making.

The remainder of this paper is structured as follows: Section 2 reviews studies related to this work and Section 3 describes the POD thermal model used by POD-TAS. Section 4 investigates the differences between POD and other common thermal models. The effects that these differences may have on scheduling are also discussed. Section 5 provides a formal description of the POD-TAS algorithm, while Section 6 describes our evaluation methodology. Section 7 provides the results of the evaluation of POD-TAS and its comparison with RT-TAS [19]. Section 8 identifies potential future research directions to extend this study and lastly, Section 9 concludes the paper.

2 Related Work

To select existing state-of-the-art works related to our study, we choose the most recent works that have performed TAS and examine the methods by which they manage the chip temperature and evaluate their TAS algorithms. The evaluation methodologies of TAS works tend to use either a simulation environment [21, 33, 22, 28, 25, 27, 30, 32, 1, 29] or a hardware setup [19, 17, 24, 4, 3, 26]. While hardware can be argued to be very realistic, simulation-based methodologies offer greater flexibility without the risk of damaging the hardware during evaluation. Another varying factor in TAS research is the thermal model used for predicting or detecting the CPU temperature during scheduling. Some works rely on inbuilt temperature sensors to log the real temperature [17, 24], while others use the Analytical model of Temperature in MIcroprocessors (ATMI) [33], neural network models [1], or a custom Ordinary Differential Equation (ODE)-based methodology [26]. However, the most commonly used thermal model is the thermal RC circuit. Some works use the HotSpot simulator to generate the RC circuit model [25, 30, 32], while others create their own RC model [21, 22, 19, 28, 4, 27, 29]. The thermal models used in TAS algorithms are typically either dynamic models that allow the transient thermal state to be predicted, or steady-state models that predict the steady-state temperature of the CPU for a given power dissipation. Additionally, some algorithms collect the real temperature of the CPU using inbuilt temperature sensors. Many recent studies utilize DVFS to control the heat output of the CPU [21, 17, 24, 25, 27, 3, 29, 32], while others use different methods [33, 22, 19, 28, 4, 30, 1, 26]. Table 1 describes related state-of-the-art TAS algorithms, highlighting, for each study, the year of publication, CPU model, thermal model, temperature type from the thermal model, whether DVFS is used, evaluation methodology, and performance metrics.

There are existing studies that aim to create TAS algorithms for mixed-criticality systems, which have low priority and high priority operating modes. The algorithm by Li et al. [21] uses an RC circuit model and can be tuned to focus on either energy-aware scheduling or thermal-aware scheduling. Li et al. [22] extend the existing algorithm to support mixed-criticality systems by assigning each task a different Worst Case Execution Time (WCET) for each criticality mode. In the mixed-criticality algorithm, if a task cannot complete execution while the system is in low criticality mode, then all low priority tasks are dropped and the system switches to high criticality mode until its return to low criticality mode is determined to be safe. The algorithm focuses on setting the execution rates for each task to approximate a fluid schedule. Safari et al. [30] also design a TAS algorithm for mixed-criticality systems, which relies on thermal safe power, defined as the amount of power that a processor can dissipate before it overheats. Based on this value, it determines the maximum safe number of simultaneously active cores that can be used for scheduling.

Akbar et al. [1] use a neural network to predict the temperature of servers in a data center to schedule tasks among a large set of machines. The TAS algorithm implemented by Sharma et al. [32] creates groups of CPU cores called clusters. The algorithm initially assigns tasks to the clusters, then the schedule is modified based on whether energy or temperature efficiency is being prioritized. Bashir et al.’s [3] algorithm relies on DVFS and a modified RC circuit model. Their algorithm controls the CPU power management states of the CPU to regulate the static power dissipation of cores.

Many studies focus on heterogeneous processors, as they provide more task allocation options to a TAS algorithm. Benedikt et al. [4] design the MultiPAWS algorithm for an avionics environment. Their algorithm relies on a steady-state RC circuit model to schedule tasks on an ARM big.LITTLE CPU. Kim et al. [17] also design a TAS algorithm for a heterogeneous ARM big.LITTLE CPU, which focuses on minimizing power dissipation by deciding whether to use DVFS or migrate tasks between the big and LITTLE cores. Maity et al. [24] implement their TAS algorithm for an ARM big.LITTLE heterogeneous chip which includes a GPU. Their algorithm uses the LITTLE cores to run the operating system, scheduler, and temperature sensors, while tasks are run on the big cores and GPU. This is one of the works that uses inbuilt temperature sensors during scheduling. Lee et al. implement the Real-Time Thermal Aware Scheduling (RT-TAS [19]) algorithm for a heterogeneous chip with an integrated GPU and CPU cores. RT-TAS [19] assigns tasks with the goal of maintaining a balanced steady-state temperature among the cores; maintaining this balance aims to minimize the chip temperature. Our POD-TAS algorithm can be readily expanded to support scheduling tasks on chips with an integrated GPU and CPU cores or other heterogeneous chips.

There are studies that limit the execution rate of tasks using DVFS and other tools to control the thermal behavior of the chip. Ozceylan et al. [26] create an algorithm that uses a differential equation-based thermal model to predict the system temperature. After finding the minimum utilization required to complete a task by its deadline, the cpulimit tool is used to cap the task's CPU usage. Shehzad et al. [33] use an Earliest Deadline First (EDF) methodology along with a fairness metric to create a TAS algorithm that aims to approximate a fluid schedule by maintaining equal amounts of execution among tasks. These methods rely on limiting the execution rate of tasks to control the CPU temperature and may face difficulty under a heavy or sporadic task load. Additionally, not all chips support the use of DVFS for limiting the execution rate of tasks.

Rodríguez et al. [28, 27, 29] create and expand algorithms that rely on two temperature thresholds. Their algorithm for single-core processors aims to maintain the CPU temperature below a maximum temperature threshold. This algorithm has two settings that define whether the CPU is cooled just enough to execute the next task, or cooled back down to the low threshold after executing each task [28]. The algorithm is then expanded by incorporating DVFS to control the heating rate of the CPU [27], and a further study expands these works to support dual-core CPUs [29]. The proposed POD-TAS algorithm differs from the algorithms developed by Rodríguez et al. in its conditions for inserting idle time for CPU cores and in its prioritization of tasks and cores during task assignment. Additionally, unlike Rodríguez et al., the POD-TAS algorithm is not limited to single- and dual-core CPUs and does not rely on DVFS for temperature control.

Table 1: Comparative literature review of selected recent studies

| Ref. Num. | Year | CPU Model | Thermal Model | Temperature Type | Uses DVFS? | Evaluation Method | Metrics Used |
| --- | --- | --- | --- | --- | --- | --- | --- |
| [33] | 2019 | Custom | ATMI | Transient | no | Simulation | Transient Temp., Thermal Gradient, Max Spatial Gradient, Mean Core Temp. |
| [22] | 2019 | Custom | RC | Transient | no | Simulation | Transient Temp., Energy, Acceptance Ratio |
| [19] | 2019 | Tegra X1 | RC | Steady-State | no | Hardware | Peak Temp., Response Time, Temp. vs. Utilization, Schedulability, Transient Temp. |
| [28] | 2019 | Custom | RC | Transient | no | Simulation | Response Time, Schedulability |
| [17] | 2020 | Exynos 5422 | Sensor | Real | yes | Hardware | Normalized Execution Time, Mean Temp., Peak Temp., Thermal Emergencies |
| [24] | 2021 | Exynos 5422 | Sensor | Real | yes | Hardware | Peak Temp. |
| [4] | 2021 | NXP i.MX8 | RC | Steady-State | no | Hardware | Steady-State Temp., Schedule Visualizations |
| [27] | 2021 | Custom | RC | Transient | yes | Simulation | Response Time, Min/Mean/Max Temp. |
| [30] | 2021 | ARM Cortex-A15 | RC (HotSpot) | Steady-State | no | Simulation | Feasibility, Schedulability, Thermal Violations, Quality of Service, Peak Temp., Reliability |
| [32] | 2022 | Custom ARMv8 | RC (HotSpot) | Steady-State | yes | Simulation | Mean Temp., Success Ratio, Energy Consumption, Context Switching |
| [1] | 2022 | Data Center | DNN | Steady-State | no | Simulation | Energy, Temp. Difference, Migrations, Service Level Agreement Violations |
| [3] | 2022 | Xeon 2680v3 | RC (modified) | Transient | yes | Hardware | Power, Mean Temp., Schedulability, Overheats |
| [29] | 2022 | Custom | RC | Transient | yes | Simulation | Response Time, Min/Mean/Max Temp. |
| [26] | 2022 | Exynos 5422 | ODE-based | Transient | no | Hardware | Max Temp., Variance |
| This work | - | Athlon II X4 640 | POD | Transient | no | Simulation | Peak Temp., Peak Variance, Var(Mean), Var(Max), Var(Var) |

Table 1 summarizes the reviewed studies in comparison to ours. As Table 1 shows, many studies on thermal-aware real-time scheduling rely on RC circuit-based thermal models, which carry an isothermal assumption for each RC element. This assumption can lead to the calculation of incorrect heat flux across element interfaces, which is worse in the dynamic case. Due to this isothermal-element assumption, a popular RC-based thermal simulator, HotSpot [34], has been observed to have an error as large as 200% in its block mode when compared to FEM analysis [16]. This lack of accuracy can cause a TAS algorithm to make sub-optimal scheduling decisions that may result in hardware damage due to overheating.

Furthermore, as we can see from Table 1, 6 out of the 14 reviewed TAS methods rely on DVFS. While DVFS can provide more control over the chip behavior, it limits the generalizability of an algorithm, as the algorithm will only be usable on processors that support DVFS. Algorithms that do not rely on DVFS can schedule for chips that do not support DVFS, as well as for those that do, by maintaining a constant frequency. Thus, POD-TAS is designed to maintain generalizability by avoiding DVFS.

3 Physics-Informed Thermal Modeling Enabled By Proper Orthogonal Decomposition [14]

POD is a physics-informed model which is able to represent a complex thermal problem in both time and space using trained POD modes, together with the Galerkin Projection (GP) of the heat equation [12, 13]. The modes are extracted from thermal solution data of the physical domain in a training process. The data used for training should include a range of parametric variations to inform the model about the response of the physical quantity in the domain under a range of conditions. After training, the resulting modes are tuned to the parametric variations, such as Boundary Conditions (BCs) and power variations. The GP further provides physics-based guidance to improve the model accuracy and efficiency.

3.1 Construction of POD Modes

The POD modes are optimized by maximizing the mean square inner product of the thermal solution data with the modes over the entire domain [23, 5] while taking into account the dynamic or static parametric variations of BCs and interior power sources. In our study, the thermal data is collected at each simulation time step of the FEM DNS (Section 6). This optimization process [23, 5] leads to an eigenvalue problem, described by the Fredholm equation shown in Eq. (1):

\int_{\Omega} R(\vec{r},\vec{r}\,')\,\varphi(\vec{r}\,')\,d\vec{r}\,' = \lambda\,\varphi(\vec{r}), \qquad (1)

where λ is the eigenvalue corresponding to the POD mode (i.e., eigenfunction φ) and R(r, r′) is the two-point correlation tensor shown in Eq. (2):

R(\vec{r},\vec{r}\,') = \langle T(\vec{r},t) \otimes T(\vec{r}\,',t) \rangle \qquad (2)

with ⊗ as the tensor operator and ⟨·⟩ indicating the average over the number of thermal data sets. With the modes obtained from the training process, including collection of the thermal data and solving the eigenvalue problem in Eq. (1), the temperature can then be represented by Eq. (3):

T(\vec{r},t) = \sum_{i=1}^{M} a_i(t)\,\varphi_i(\vec{r}), \qquad (3)

where M is the number of POD modes chosen to reconstruct the temperature solution.
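For concreteness, the following is a minimal NumPy sketch of this training step using the method of snapshots, a standard discrete analogue of solving Eq. (1). It omits the finite-element quadrature weights of a full implementation, and all names are illustrative rather than taken from our codebase.

```python
import numpy as np

def train_pod_modes(snapshots, M):
    """Method-of-snapshots POD: a discrete analogue of the eigenvalue
    problem in Eq. (1). snapshots is an (n_points, n_snapshots) array of
    temperature fields T(r, t_k) collected from FEM DNS."""
    n = snapshots.shape[1]
    # Discrete two-point correlation, averaged over the data sets (Eq. (2))
    R = snapshots.T @ snapshots / n
    eigvals, eigvecs = np.linalg.eigh(R)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:M]     # keep the M largest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    modes = snapshots @ eigvecs               # lift back to physical space
    modes /= np.linalg.norm(modes, axis=0)    # orthonormal columns
    return modes, eigvals

def reconstruct_temperature(modes, a):
    """Eq. (3): T(r, t) = sum_i a_i(t) * phi_i(r)."""
    return modes @ a
```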

3.2 Projection of the Thermal Problem to POD Space

The GP is used to construct a POD model by projecting the heat transfer equation onto POD space, as shown in Eq. (4):

\int_{\Omega} \Big( \varphi_i(\vec{r})\,\frac{\partial(\rho C T)}{\partial t} + \nabla\varphi_i(\vec{r}) \cdot k\nabla T \Big)\,d\Omega = \int_{\Omega} \varphi_i(\vec{r})\,P_d(\vec{r},t)\,d\Omega - \int_{S} \varphi_i(\vec{r})\,(-k\nabla T \cdot \vec{n})\,dS, \qquad (4)

where k, ρ, and C are the thermal conductivity, density, and specific heat, respectively. P_d(r, t) is the interior power density, S is the boundary surface, and n is the outward normal vector of the boundary surface. With the selected POD modes, Eq. (4) can be rewritten as an M-dimensional Ordinary Differential Equation (ODE) for a_i(t), as shown in Eq. (5):

\sum_{i=1}^{M} c_{i,j}\,\frac{da_i(t)}{dt} + \sum_{i=1}^{M} g_{i,j}\,a_i(t) = P_j, \qquad j = 1 \;\text{to}\; M, \qquad (5)

where c_{i,j} and g_{i,j} are the elements of the thermal capacitance and thermal conductance matrices in the POD space, defined in Eq. (6) and Eq. (7) respectively as:

c_{i,j} = \int_{\Omega} \rho C\,\varphi_i(\vec{r})\,\varphi_j(\vec{r})\,d\Omega \qquad (6)

\text{and}\quad g_{i,j} = \int_{\Omega} k\,\nabla\varphi_i(\vec{r}) \cdot \nabla\varphi_j(\vec{r})\,d\Omega \qquad (7)

P_j in Eq. (5) is the power source strength for the j-th POD mode in the POD space and is described in Eq. (8):

P_j = \int_{\Omega} \varphi_j(\vec{r})\,P_d(\vec{r},t)\,d\Omega - \int_{S} \varphi_j(\vec{r})\,(-k\nabla T \cdot \vec{n})\,dS. \qquad (8)

Once the power consumption is obtained, the interior power source strength in POD space given in Eq. (8) can be pre-evaluated. For the boundary heat source in Eq. (8), the BC of the substrate bottom is modeled by convection heat transfer with a constant heat transfer coefficient and an ambient temperature (T_amb) of 45 °C. All other boundaries are adiabatic. The coefficients in Eq. (6) and Eq. (7) can also be pre-evaluated once the modes are determined. With a_i(t) solved from Eq. (5) in the POD simulation, the temperature solution can be predicted from Eq. (3).
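Our solver for Eq. (5) is written in C++ with PETSc (Section 5). As a readable stand-in, the following NumPy sketch integrates the same M-dimensional ODE with backward Euler, assuming the pre-evaluated matrices C and G and a callable supplying the mode-space power P_j(t); it is an illustration, not our production solver.

```python
import numpy as np

def solve_pod_ode(C, G, P_of_t, a0, dt, n_steps):
    """Backward-Euler integration of Eq. (5): C da/dt + G a = P(t).
    C, G: (M, M) capacitance/conductance matrices from Eqs. (6)-(7).
    P_of_t(t): returns the (M,) mode-space power vector P_j of Eq. (8).
    Returns the (n_steps + 1, M) history of coefficients a_i(t)."""
    M = C.shape[0]
    a = np.zeros((n_steps + 1, M))
    a[0] = a0
    # dt is fixed, so the implicit system matrix can be inverted once;
    # M is small (tens of modes), making a dense inverse acceptable.
    A_inv = np.linalg.inv(C / dt + G)
    for k in range(n_steps):
        rhs = C @ a[k] / dt + P_of_t((k + 1) * dt)
        a[k + 1] = A_inv @ rhs
    return a
```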

4 Thermal Model Selection and Effects on Scheduling

The effects that the chosen thermal model can have on the performance of a TAS algorithm are manifold. An inaccurate temperature prediction can lead to sub-optimal or destructive decisions. On the other hand, some thermal models may not provide detailed enough information to make optimal scheduling decisions. Thus, the choice of the thermal model used when designing a TAS algorithm is a defining factor for its overall performance. In this section, we compare the differences between steady-state and dynamic thermal models (Figure 1). Also, we present an initial result (Figure 2) that compares our thermal model, POD, with HotSpot RC thermal circuit simulation, and FEM DNS.

Refer to caption
Figure 1: Dynamic temperature prediction compared to steady state

A dynamic thermal model predicts the temperature of the CPU over time. In contrast, a steady-state model only predicts the temperature at which the CPU will reach equilibrium for a given power dissipation. Figure 1 shows the predictions of a dynamic thermal model alongside those of a steady-state thermal model. When the power dissipated is high, as shown at t = 0 s and t = 0.2 s, the steady-state model predicts a temperature of 91.43 °C. However, the core takes a significant amount of time to approach that temperature, as shown by the dynamic predictions. Given this difference, a scheduler may make incorrect decisions based on the steady-state predictions unless its decision rate is extremely slow. Moreover, as shown in Table 4, the execution time of most tasks is far less than the time needed for the CPU to approach its steady-state temperature; in this case, extra slack time may be generated for a task that does not need it. Due to the lack of transient thermal prediction, steady-state thermal models prevent an algorithm from having precise knowledge of the current thermal state of the CPU it is scheduling tasks for, which limits its ability to make optimal scheduling decisions. Thus, dynamic temperature prediction is required for a TAS algorithm to make well-informed decisions based on the CPU thermal state at a given time.
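To make the gap concrete, a first-order lumped model (not the paper's POD model) illustrates how slowly a core approaches the steady-state value; the steady-state temperature matches Figure 1's example, while the time constant below is a hypothetical stand-in rather than a fitted chip parameter.

```python
import numpy as np

# Illustrative first-order lumped model: a constantly powered core
# approaches its steady-state temperature exponentially with time
# constant tau = R_th * C_th. tau here is hypothetical.
T_start, T_ss = 45.0, 91.43   # deg C; T_ss from Figure 1's example
tau = 0.05                    # s; assumed thermal time constant
for t in np.linspace(0.0, 0.2, 5):
    T = T_ss + (T_start - T_ss) * np.exp(-t / tau)
    print(f"t = {t:4.2f} s: dynamic {T:6.2f} C vs steady-state {T_ss:.2f} C")
```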

Refer to caption
Figure 2: Comparing POD with HotSpot and FEM DNS

A popular RC thermal circuit simulator is HotSpot [34]. This simulator allows for simulation in two modes, the first of which is block mode, where each functional unit of the CPU is treated as an RC node. The second mode is grid mode, which allows for the spatial domain to be treated as a mesh of RC elements, allowing for much higher simulation accuracy at the cost of computation time. This simulator also includes a scaling factor setting which scales the capacitance of lumped thermal nodes to aid in simulation accuracy.

We compare POD with FEM DNS, a rigorous and accurate thermal simulation [15], and with HotSpot, a dynamic RC thermal circuit simulator [34]. This comparison sheds light on the accuracy of HotSpot and POD when compared with FEM DNS as a baseline. To this end, FEM is configured with a mesh size of 200×200×17 to allow for high spatial resolution in the x, y, and z directions. POD is trained with, and thus predicts, temperature with the same resolution and mesh size as FEM. HotSpot only supports x and y mesh sizes that are powers of 2, so to support a fair comparison between the models, we configure it to use a mesh of 256×256×17. The spatial domain in all cases is representative of an AMD Athlon II X4 640 CPU, which has a size of 14 mm × 12 mm × 0.3 mm.

Figure 2 plots the temperature over time simulated by three different methods: POD, HotSpot, and FEM DNS. All methods use the same power input data for consistency. The FEM DNS is the solid blue line in the plot, which is the baseline against which the other models are compared. The four dashed lines show the temperature calculated using HotSpot with different scaling factors, denoted as C and set to 1.0, 0.9, 0.8, and 0.5. Lastly, the three dotted lines are predictions using our POD model with varying numbers of modes: 3, 10, and 30. We can see from the figure that HotSpot does not follow the FEM temperature well, and over time it diverges from the FEM curve, unlike POD. From the inset plot in Figure 2, we can see that as the number of POD modes increases, the prediction accuracy greatly increases. Around 0.052 seconds, the POD prediction with 30 modes follows the FEM result almost exactly, while the 10-mode result has some error; the 3-mode result is close to the FEM result, but its slope differs greatly from the FEM result in this region. However, even with only 3 modes, the POD prediction is more accurate than the HotSpot result. Using 30 POD modes, the average prediction error of the maximum chip temperature is 0.0033%, while with HotSpot, the lowest error is 2.12% at C = 0.8. Therefore, we do not evaluate our proposed TAS algorithm using a dynamic RC thermal circuit model such as HotSpot, due to its insufficient accuracy. Even though the temperature output is very similar between POD with 30 modes and FEM DNS, we do not perform TAS using FEM DNS due to its computational cost [15].

Furthermore, as shown in Figure 2, different thermal models and differing configurations can provide very different accuracy. For instance, HotSpot, when configured with a scaling factor of 0.5, predicts the temperature to be much higher than the FEM baseline at 0.02 seconds. In this case, a scheduling algorithm that relies on HotSpot would generate extra slack time for that CPU core even though, in reality, it is not needed. Conversely, at 0.035 seconds, many of the HotSpot temperature curves are below the ground-truth curve. In this case, the algorithm would receive the incorrect information that the CPU has more thermal capacity available than it realistically does, leading to overheating of the device. Thus, a highly accurate thermal model is required to guarantee the safety of the hardware when using a TAS algorithm.

5 A TAS Algorithm Based on a Dynamic POD Thermal Model

As shown, different thermal models can provide temperature prediction with a wide variety of characteristics. Given the high performance of the POD-based thermal model, it is chosen as the basis for our POD-TAS algorithm, and the algorithm is designed to leverage the model's dynamic, high-resolution thermal prediction. The POD-TAS algorithm uses a pair of temperature thresholds to restrain the maximum core temperature. Once tasks are selected and assigned, POD (Section 3) is used to predict the temperature of all cores of the CPU so that a core may be idled when its temperature exceeds the threshold. Once the predicted temperature of a core is observed to reach the hot temperature threshold, T_H, that core is idled. The idling continues until the core reaches a “cool” threshold, T_C, after which it is allowed to resume execution.

Table 2: Table of Notations

| Notation | Meaning |
| --- | --- |
| Δt | The time step size used during scheduling |
| τ | Set of tasks |
| π | Set of CPU cores |
| Λ | Mapping of tasks to cores |
| Ψ | Set of CPU core states |
| Ψ_{π_i} | The state of CPU core π_i |
| T_H | The high temperature threshold |
| T_C | The low temperature threshold |
| T_π | The temperature of the CPU π |
| T_{π_i} | The temperature of CPU core π_i |
| n() | A function that returns the number of elements in a set |
Algorithm 1: Pseudocode of POD-TAS Algorithm (Primary Concept)

1:  function POD-TAS(τ, T_C, T_H, t_end)
2:      t_curr ← 0; σ ← []
3:      while t_curr < t_end do
4:          τ′ ← SELECT_TASKS(τ, π, Ψ)
5:          Λ ← ASSIGN_TASKS(τ′, π, Ψ)
6:          append(σ, (t_curr, Λ))
7:          do
8:              t_curr ← t_curr + Δt
9:              T_π ← PREDICT_TEMP(P_σ, t_curr)
10:             Ψ ← UPDATE_STATES(π, Ψ, T_π, Λ)
11:         while max(T_{π_i}(t_curr), ∀ π_i ∈ π) < T_H and t_curr < t_end
12:     return σ

Algorithms 1, 2, 3, and 4 represent the primary concept of our POD-TAS algorithm. Based on these concepts, we have implemented our novel POD-TAS algorithm in a scheduling system which enables the algorithm to generate schedules in a computer-readable format for use in evaluation. A description of the use of the scheduling system is provided in Section 6. In our POD-TAS algorithm, the notation π represents a CPU platform, with π = {π_1, π_2, ..., π_m} being the set of m CPU cores. The temperatures at time t are T_{π_i}(t), π_i ∈ π, where 1 ≤ i ≤ m. The set of tasks is denoted as τ, such that τ_k ∈ τ represents the k-th task. The mapping or assignment of the tasks from τ to the CPU cores in π is represented as Λ. Λ_{π_i} represents the task that is assigned to CPU core i, where π_i belongs to a sorted subset of π whose cores have valid thermal states; the sorting of the elements of π is explained later in this section. The notation Ψ in our algorithm represents the set of CPU core states as described in Figure 3, and we use Ψ_{π_i} to represent the state of the i-th CPU core, π_i. The primary concept of the assignment of cores depending on their thermal state is represented in Algorithm 2. The notation σ describes an ordered set of task-to-core assignments paired with their assignment times (the schedule). Additionally, to denote the power dissipated by a certain schedule, we use the notation P_σ. The power being dissipated by the CPU cores depends on the task assignment, and the power dissipated by each task at a given time instant depends on how much run time the schedule σ has given to the task. Thus, P_σ contains the power dissipation of the CPU over time while it is running schedule σ. To form the power map P_j in Eq. (5) (Section 3), the algorithm constructs P_σ to be used in each POD prediction.
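As an illustration of this bookkeeping, the sketch below represents σ as an ordered list of (time, Λ) pairs and derives a simplified P_σ. The task names are placeholders, and real per-task power varies with elapsed run time rather than being constant as assumed here.

```python
# sigma: ordered (time, assignment) pairs; an assignment maps each core
# to a task name or None (idle). Task names here are placeholders.
sigma = [
    (0.000, {0: 'radix_sort', 1: '2d_heat', 2: None,     3: 'fftw'}),
    (0.010, {0: 'radix_sort', 1: None,      2: 'ks_pde', 3: 'fftw'}),
]

def power_map(sigma, task_power, t):
    """Simplified P_sigma: per-core power at time t, read from the last
    assignment at or before t. task_power maps task name -> watts and is
    assumed constant per task (the real model uses time-varying traces)."""
    assignment = {}
    for t_k, lam in sigma:
        if t_k <= t:
            assignment = lam
        else:
            break
    return {core: (task_power[task] if task is not None else 0.0)
            for core, task in assignment.items()}
```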

Refer to caption
Figure 3: State Transition Diagram for CPU Cores in POD-TAS

To denote the thermal state of the CPU cores, we describe a set of states and transitions among them to manage the CPU behavior. Figure 3 describes the possible CPU states, Ψ, and their transitions. Each state consists of a pair of letters, of which the first denotes the thermal state of the core and the second denotes its execution state. The possible thermal states are C, W, and H, which correspond to cool, warm, and hot respectively. The possible execution states are I and R, where I denotes an idle core and R denotes a running core. The definition of the states of each CPU core and the transitions between them defines how the algorithm should treat CPU cores and what actions should be taken depending on their state. In this way, we can avoid the assignment of tasks to cores that are in unsuitable states. To our knowledge, no other study has used such a concept of core assignment depending on core state.

Algorithm 2: Pseudocode of POD-TAS Core State Management Algorithm (Primary Concept)

1:  function UPDATE_STATES(π, Ψ, T_π, Λ)
2:      for π_i ∈ π do
3:          run_state ← R
4:          if Λ_{π_i} = ∅ then
5:              run_state ← I
6:          if T_{π_i} < T_C then
7:              Ψ_{π_i} ← (C, run_state)
8:          else if T_C < T_{π_i} < T_H then
9:              Ψ_{π_i} ← (W, run_state)
10:         else
11:             Ψ_{π_i} ← (H, run_state)
12:     return {Ψ_{π_i} where π_i ∈ π}

Algorithm 2 describes the subroutine UPDATE_STATES, which includes the procedure for core state management and is called in Algorithm 1. As shown in Figure 3, all cores start in the cool, idle state, (C, I). From this state, when a task is assigned to the core, it moves to the cool, running state, (C, R). From the (C, R) state, a core can return to (C, I) if the core is idled due to task completion or reassignment. However, if the temperature of the core, T_{π_i}, exceeds T_C, the core moves to the state (W, R), which denotes a warm, running core. The warm states denote that the core has crossed T_C but has not yet crossed T_H; thus, the core is in a transitional state. Similar to the transitions from (C, R), a core in the (W, R) state can be idled, moving it to the (W, I) (warm, idle) state, but should its temperature exceed T_H, meaning a temperature violation has occurred, the core is forced to idle and moved to the (H, I) (hot, idle) state. Cores in the (H, I) state are disallowed from executing any tasks until their temperature falls below T_C and they return to state (C, I). This restriction ensures that after a core exceeds the high temperature threshold, it must cool down before resuming execution, so that no core rapidly alternates between task execution and idling. The (H, R) (hot, running) state is separate from the other states, with no transitions to it, because that state is forbidden: as soon as the temperature of a core is observed to exceed T_H, the core is idled.
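A minimal Python rendering of this state update follows, with the hysteresis of Figure 3 (a hot core stays hot until it cools below T_C) made explicit, since the pseudocode leaves the previous state implicit; all names and container types are illustrative.

```python
def update_states(cores, temps, assignments, prev_states, T_C, T_H):
    """Algorithm 2 (UPDATE_STATES) with Figure 3's hysteresis explicit.
    cores: list of core ids; temps, assignments, prev_states: dicts keyed
    by core id. Returns {core: (thermal, exec)} with thermal in
    {'C','W','H'} and exec in {'I','R'}; (H, R) is never produced."""
    states = {}
    for c in cores:
        run_state = 'R' if assignments.get(c) is not None else 'I'
        if temps[c] < T_C:
            thermal = 'C'
        elif temps[c] < T_H:
            # Per Figure 3, a hot core must cool below T_C before leaving H
            thermal = 'H' if prev_states.get(c, ('C', 'I'))[0] == 'H' else 'W'
        else:
            thermal = 'H'
        states[c] = (thermal, 'I' if thermal == 'H' else run_state)
    return states
```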

Algorithm 1 formally describes the primary concept of the POD-TAS algorithm. The algorithm takes as input τ, the set of tasks to be run; T_C, the CPU cool temperature threshold; T_H, the CPU hot temperature threshold; and t_end, the amount of time for which the schedule should be constructed. The function returns the schedule, σ, which can be stored for use by a scheduling system. Four subroutines are called: UPDATE_STATES, SELECT_TASKS, ASSIGN_TASKS, and PREDICT_TEMP. PREDICT_TEMP takes the dynamic power map of the current schedule and the time for which to predict, and predicts the temperature of each CPU core at that time using the POD thermal model. UPDATE_STATES is described in Algorithm 2, which was explained above with the help of Figure 3. SELECT_TASKS and ASSIGN_TASKS are described in the remainder of this section.

Algorithm 3: Pseudocode of POD-TAS Task Selection Algorithm (Primary Concept)

1:  function SELECT_TASKS(τ, π, Ψ)
2:      n_tasks ← n({π_i ∈ π | Ψ_{π_i} ≠ (H, I)})
3:      τ′ ← sort(τ | TimeLeft(τ_i) ≥ TimeLeft(τ_{i+1}))
4:      τ_run ← {}
5:      while n(τ_run) < n_tasks do
6:          τ_run ← τ_run ∪ {τ′[n(τ_run)]}
7:      return τ_run

In POD-TAS, temperature management is handled primarily by the two thresholds, but task selection from the queue and core assignment can have a significant impact on the schedule created and on the schedulability of a task set. Task selection prioritizes the execution of tasks with more remaining run time (TimeLeft(τ_i)). This helps to avoid tasks being idled unnecessarily and failing to complete by their deadlines. The task selection algorithm (the subroutine SELECT_TASKS) is described in Algorithm 3. Its primary concept is that for each CPU core that is not in the (H, I) state, a task should be selected from τ, prioritizing tasks with more remaining execution time.

Algorithm 4: Pseudocode of POD-TAS Task to Core Assignment Algorithm (Primary Concept)

1:  function ASSIGN_TASKS(τ_run, π, Ψ)
2:      π′ ← sort({π_i ∈ π | Ψ_{π_i} ≠ (H, I)} | T_{π_i} ≤ T_{π_{i+1}})
3:      for τ_k ∈ τ_run do
4:          for π_i ∈ π′ do
5:              if Λ_{π_i} = ∅ then
6:                  Λ_{π_i} ← τ_k
7:                  break
8:      return {Λ_{π_i} where π_i ∈ π′}

Algorithm 4 (the subroutine ASSIGN_TASKS) describes the assignment of the selected tasks to available cores. This algorithm first sorts the cores in ascending order of temperature so that cooler cores are assigned the tasks requiring more execution time. This prioritization scheme aims to provide the maximum amount of execution time to the tasks that require it, since Algorithm 3 ensures that the tasks with the most remaining run time are prioritized. The interplay and effects of these two algorithms are a potential future research direction. This subroutine creates Λ, which notes which task from the task set, τ, should be running on each CPU core π_i. If a CPU core π_i ∈ π does not have an assignment, then Λ_{π_i} will not exist for that assignment, meaning that core is idle. This can occur either when more cores are available in π′ than there are tasks in τ_run, or when π_i is overheated, that is, Ψ_{π_i} = (H, I). A minimal Python rendering of these two subroutines is sketched below.
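This sketch renders Algorithms 3 and 4 under the same illustrative conventions as the state-update sketch above; Python's built-in sort reflects the Θ(n log n) cost discussed in Section 5.1.

```python
def select_tasks(tasks, states, time_left):
    """Algorithm 3: pick one task per core not in (H, I), prioritizing
    tasks with the most remaining execution time."""
    n_avail = sum(1 for s in states.values() if s != ('H', 'I'))
    ordered = sorted(tasks, key=lambda t: time_left[t], reverse=True)
    return ordered[:n_avail]

def assign_tasks(selected, states, temps):
    """Algorithm 4: cores sorted coolest-first each take the next task;
    since selected is sorted longest-remaining-first, the coolest core
    receives the task with the most remaining work."""
    eligible = sorted((c for c, s in states.items() if s != ('H', 'I')),
                      key=lambda c: temps[c])
    return dict(zip(eligible, selected))
```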

We utilize Python as the main language for the POD-TAS implementation, with the Numpy [11] package used to support mathematical operations. The ODE in Eq. (5) must be solved to obtain a_i(t) for POD temperature prediction. The PETSc [2] library is used to implement a program in C++ to solve the ODE. The values of c_{i,j} and g_{i,j} for Eq. (5) are pre-calculated using Eq. (6) and Eq. (7) respectively. To provide P_j, the dynamic power map of each benchmark is arranged according to the schedule that has been constructed. After constructing P_j, the main script writes it to an output file, which is loaded by the ODE solver program to solve Eq. (5). The solution, a_i(t), is then loaded using Numpy [11] to evaluate Eq. (3) and obtain the predicted temperatures used in the POD-TAS algorithm.
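Tying the sketches above together, a compact version of Algorithm 1's main loop might look as follows. Here predict_temp stands in for the whole POD prediction path (power-map construction, the ODE solve of Eq. (5), and the reconstruction of Eq. (3)), and all signatures are illustrative.

```python
def pod_tas(tasks, cores, time_left, predict_temp, T_C, T_H, t_end, dt):
    """Algorithm 1 sketch: build schedule sigma up to t_end.
    predict_temp(sigma, t) -> {core: temperature}, backed by POD.
    (Per-task remaining-time accounting is omitted for brevity.)"""
    t, sigma = 0.0, []
    states = {c: ('C', 'I') for c in cores}   # all cores start in (C, I)
    temps = {c: 45.0 for c in cores}          # assumed ambient start
    while t < t_end:
        selected = select_tasks(tasks, states, time_left)
        lam = assign_tasks(selected, states, temps)
        sigma.append((t, lam))
        while True:                           # the inner do-while loop
            t += dt
            temps = predict_temp(sigma, t)
            states = update_states(cores, temps, lam, states, T_C, T_H)
            if max(temps.values()) >= T_H or t >= t_end:
                break
    return sigma
```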

5.1 Runtime Complexity of POD-TAS

Algorithm 1 is a loop with linear time complexity in t_end/Δt, which calls Algorithms 2, 3, and 4. Algorithm 2 uses a linear loop through the CPU cores to update the core states in Ψ. Algorithm 3 sorts the tasks in non-increasing order of their remaining execution times before using a linear-time loop to take enough tasks to fill the available CPU cores. The largest cost in Algorithm 3 is the sorting of the tasks by their remaining time, which common efficient sorting algorithms achieve in Θ(n log n) time, where n is the number of tasks to be sorted. Algorithm 4 sorts the CPU cores by their temperatures before a pair of nested loops iterates through the selected tasks and the available CPU cores. Thus, Algorithm 4 runs with Θ(m log m) + qr complexity, where m is the number of CPU cores in π, q is the number of tasks in τ_run, and r is the number of CPU cores in π′. The number of CPU cores can be assumed to be small in the general CPU scheduling case, while the number of tasks does not tend to be large enough to make sorting the task set a heavy computational cost. Thus, the primary source of computation is Algorithm 1, with its loop that calls the other three algorithms, so POD-TAS overall has a linear time complexity in t_end/Δt. The RT-TAS [19] algorithm, which we replicate and compare POD-TAS to, also has linear time complexity [19].

5.2 Differences Between POD-TAS and RT-TAS [19] Algorithms

The RT-TAS [19] algorithm supports the co-scheduling of an integrated CPU-GPU platform, such as a heterogeneous chip. The algorithm also supports platforms with only CPU cores and no integrated GPU. The evaluation performed by Lee et al. [19] demonstrates the performance of the RT-TAS algorithm on a computing platform with an integrated GPU over a long period of time (>17 minutes). Thus, there is a need to replicate their algorithm on a CPU platform without an integrated GPU to examine its performance in this environment. Furthermore, the temperature sensors provided in a computing chip, which are used in the RT-TAS [19] evaluation, are not capable of measuring the chip temperature with high spatiotemporal resolution, so a more detailed evaluation methodology is required. The methodology used in this work (Section 6) allows evaluation with high spatiotemporal resolution. Hence, we implement the POD thermal model and POD-TAS scheduling algorithm and replicate the RT-TAS [19] thermal model and scheduling algorithm for comparative evaluation. Our evaluation includes a set of 4 tasks to provide a case similar to the one used by RT-TAS [19]. Additionally, we evaluate with 8 tasks to examine the performance of the two algorithms under a comparatively heavy workload. Furthermore, the POD-TAS algorithm can be expanded to support computing chips with integrated GPU and CPU cores by treating the GPU as another CPU core to which only certain tasks (GPU computations) may be assigned.

It is important to note that the POD-TAS algorithm differs from RT-TAS [19] in its use of a dynamic thermal model. This change enables the prediction of how long a certain task can execute on a CPU core before crossing the thermal threshold. RT-TAS [19] relies on a steady-state thermal model, which only predicts the temperature where the CPU will reach thermal equilibrium for a certain power dissipation. This allows RT-TAS [19] to use a task assignment algorithm based on the Worst-Fit Decreasing (WFD) bin-packing algorithm to maintain a balance among the steady state temperatures of the CPU cores. However, this ignores the transient thermal behavior of the CPU cores caused by variations in the power dissipation throughout task execution. The POD-TAS algorithm tracks the state of each CPU core over time using a state transition diagram which further aids in assigning tasks to suitable cores. Thus, the POD-TAS algorithm ensures that the dynamic temperature of each core does not exceed a given temperature threshold, which is the key to improving the reliability and extending the lifespan of the processor chips.

6 Evaluation Methodology

Refer to caption
Figure 4: TAS Algorithm Evaluation Flow Overview

With the goal of uniformly evaluating TAS algorithms with high spatiotemporal resolution, we develop a simulation-based methodology. This methodology is used to evaluate both our POD-TAS and the RT-TAS [19] algorithms using the same set of tasks (Table 4) on the same CPU platform. To this end, a set of open-source simulators are adapted to be used for evaluation. Many steps are involved in the evaluation pipeline to enable a thorough evaluation. First, the benchmark programs used as workloads for task scheduling are evaluated to find their timing and power information. We also train the thermal models that will be used for scheduling. Then, the trained thermal model and the task information are given to the scheduler program, which constructs the execution schedule. Then, this schedule is executed in the final evaluation stages to gather the dynamic temperature profile of the CPU during the execution of the schedule. An overview of the evaluation process is shown in Figure 4.
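As a schematic of how the stages in Figure 4 chain together, the snippet below invokes the three simulators in sequence; every binary name, flag, and file path is a placeholder rather than the exact invocation used in this work.

```python
import subprocess

def run_benchmark_pipeline(benchmark):
    """Schematic gem5 -> McPAT -> FEniCS flow; all paths and flags are
    placeholders, not the exact commands used in this work."""
    # 1. gem5: cycle-accurate run collecting architecture-level statistics
    subprocess.run(['gem5.opt', 'se_config.py', '--cmd', benchmark],
                   check=True)
    # 2. McPAT: turn the statistics trace into a per-FB power trace
    subprocess.run(['mcpat', '-infile', f'{benchmark}.xml',
                    '-print_level', '5'], check=True)
    # 3. FEniCS (FEM DNS): dynamic temperature profile from the floorplan
    #    and the per-FB dynamic power map
    subprocess.run(['python3', 'fem_thermal.py',
                    '--floorplan', 'athlon_x4_640.flp',
                    '--power', f'{benchmark}.power'], check=True)
```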

The simulators chosen are gem5 [6] and Multicore Power, Area, and Timing (McPAT) [20]. Gem5 is a cycle-accurate CPU simulator that can be used to gather traces of architecture-level events during program execution. These traces of event statistics are then used as input for McPAT. The McPAT simulator computes the power dissipation of different FBs of the CPU based on the statistics trace from gem5. Lastly, for thermal evaluation, the Finite Element Computational Software (FEniCS) [10] FEM platform is used to calculate the detailed thermal profile of the CPU during execution. The input to FEniCS is a floor plan of the CPU FBs, along with the dynamic power map for each FB that was collected by McPAT.

Refer to caption
Figure 5: Floorplan of AMD Athlon II X4 640
Table 3: CPU Configuration Parameters for AMD Athlon II X4 640

| Parameter | Value |
| --- | --- |
| Clock Speed | 3.0 GHz |
| Number of Cores | 4 |
| L1i Size | 64 kB/core |
| L1d Size | 64 kB/core |
| L2 Size | 512 kB/core |
| Memory Size | 8 GB |
| Memory Type | DDR3 1600 8x8 |
| L1 Cache Latency | 3 cycles |
| L2 Cache Latency | 15 cycles |
Table 4: Selected Benchmarks from COMBS

| Benchmark | Description | WCET (ms) |
| --- | --- | --- |
| 2D Heat | Tightly coupled heat distribution algorithm | 147 |
| Radix Sort | Sorts a large array with the Radix Sort algorithm | 85 |
| Advection-Diffusion | One-dimensional CFD simulation | 41 |
| Monte-Carlo | Performs Monte-Carlo simulation to estimate π | 32 |
| FFTW | Fastest Fourier Transform in the West | 150 |
| KS-PDE | 1D Kuramoto-Sivashinsky equation | 84 |
| Loops | Speed-tests loop structures | 39 |
| Lid-Driven Cavity | CFD simulation of viscous incompressible fluid flow | 78 |

The CPU chosen for our evaluation is the AMD Athlon II X4 640 [8]. This CPU model was chosen due to its quad-core configuration and the availability of information about the layout of its FBs. The layout of the CPU is shown in Figure 5, with each FB labeled with its name. Each FB of the CPU has a different purpose (e.g., northbridge, caches, cores, etc.). For our evaluation, gem5 and McPAT are configured to match the AMD Athlon II X4 640 as closely as possible; the values used for the CPU configuration are shown in Table 3. Thus, the simulators are matched to the real CPU while achieving greater temporal resolution.
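For illustration, a gem5 system configuration approximating Table 3 might begin as below; exact class and option names depend on the gem5 version and CPU model used, so treat this as a hedged sketch rather than our exact configuration script.

```python
# Hypothetical excerpt of a gem5 configuration approximating Table 3.
from m5.objects import System, SrcClockDomain, VoltageDomain, AddrRange

system = System()
system.clk_domain = SrcClockDomain(clock='3GHz',
                                   voltage_domain=VoltageDomain())
system.mem_mode = 'timing'
system.mem_ranges = [AddrRange('8GB')]
# Four cores with the Table 3 cache sizes would be instantiated here
# to mirror the quad-core Athlon II X4 640.
```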

To provide a realistic workload for the CPU to run during thermal-aware task scheduling, we select a subset of the open-source COMBS benchmark suite [9]. This benchmark suite was chosen to supply our workload because it contains many benchmarks that are representative of realistic workloads, such as heat simulation, Fourier transforms, PDE solving, and conjugate gradient calculation. The chosen subset of the COMBS benchmark suite is described in Table 4; the remaining benchmarks in the suite exhibited compatibility issues with the Linux environment simulated by gem5 that is used for our evaluation. The evaluation is performed using two different benchmark sets: a set of 4 tasks and a set of 8 tasks. We chose to evaluate with a set of 4 tasks to create a lighter scheduling scenario similar to the evaluation performed for Lee et al.'s RT-TAS [19], and with a set of 8 tasks to create a scenario whose thermal dissipation is comparatively higher. For the 4-task evaluations, the first 4 benchmarks in Table 4 are used, while for the 8-task evaluations, all 8 listed benchmarks are used.

Figure 6 shows the flowcharts for the creation of both thermal models (POD and steady-state) used in this work. Figure 6(a) shows the POD model creation flow: the temperature profile output by FEniCS is used in the POD training process described in Eq. (2) of Section 3, yielding a POD model trained for the AMD Athlon II X4 640. For the creation of a steady-state model, we use the process proposed by RT-TAS [19]; a flowchart of this process is shown in Figure 6(b). To create the model, we first use FEniCS to find the steady-state temperature of the CPU when a random power is applied to a single FB. Then, from the temperature solution output by FEniCS, we extract the average temperature of each FB to solve for the thermal-coupling coefficients of each block needed for steady-state temperature prediction. Through this, we train the steady-state model needed for prediction. This thermal model is the one used by Lee et al. (2019) for their scheduling algorithm, RT-TAS [19]. A sketch of fitting such coupling coefficients is given below.
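This least-squares sketch assumes the steady-state model takes the usual thermal-coupling form, linear in the per-FB powers (T_ss ≈ Θ·P + T_amb); shapes and names are illustrative rather than drawn from the replicated code.

```python
import numpy as np

def fit_coupling(P_train, T_train, T_amb=45.0):
    """Fit thermal-coupling coefficients Theta so that the steady-state
    FB temperatures satisfy T_ss ≈ Theta @ P + T_amb (assumed linear
    form). P_train: (n_samples, n_blocks) applied powers; T_train:
    (n_samples, n_blocks) steady-state FB temperatures from FEniCS."""
    rise = T_train - T_amb
    # Least-squares solve of P_train @ Theta.T ≈ rise
    Theta_T, *_ = np.linalg.lstsq(P_train, rise, rcond=None)
    return Theta_T.T

def predict_steady_state(Theta, P, T_amb=45.0):
    """Steady-state FB temperatures for a per-FB power vector P."""
    return Theta @ P + T_amb
```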

(a) POD Model Creation
(b) Steady-State Model Creation
Figure 6: Thermal Model Creation for Use During Scheduling

(a) Benchmark Data Collection Flow
(b) Schedule Creation
Figure 7: Benchmark Data Collection and Schedule Creation Flows

To supply the task scheduler with information about the tasks, each benchmark is first profiled. The timing and power measurements for each benchmark are then used to create a task file for scheduling, which records each task's name, execution time, and power dissipation. The flow of this process is shown in Figure 7(a).
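As an illustration, one entry of such a task file might look as follows. The field names and the power figure are hypothetical placeholders (the WCET is the 2D Heat entry from Table 4); the exact format used by our scheduler is not reproduced here.

import json

# Hypothetical task-file entry; field names and the power value are
# illustrative placeholders, not the scheduler's actual format.
task = {
    "name": "2D Heat",
    "wcet_ms": 147,       # worst-case execution time from Table 4
    "avg_power_w": 18.5,  # illustrative power dissipation figure
}

with open("tasks.json", "w") as f:
    json.dump([task], f, indent=2)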

The final stage before schedule evaluation is the creation of the execution schedule, shown in Figure 7(b). Using the task file created during benchmark profiling and one of the trained thermal models (either our POD model or Lee et al.'s steady-state model [19]), the scheduler is run to create the task execution schedule. If the steady-state model is selected, the RT-TAS [19] algorithm is used; if the POD model is selected, the POD-TAS algorithm is used. This produces a schedule for the final evaluation. Once schedules have been created with both Lee et al.'s (2019) model [19] and our POD model, they are evaluated to compare the thermal behavior of the CPU during schedule execution.

7 Evaluation Results

Figure 8: Average LSE per number of POD modes over 1 second of prediction
Figure 9: Whole-chip LSE over time per number of POD modes

We perform evaluation using the methodology described in Section 6 on both our POD-TAS algorithm and RT-TAS [19]. To evaluate with our POD model, we first need to determine an appropriate number of POD modes to use for prediction during scheduling. To this end, a POD model is trained for the CPU and evaluated on a task schedule to simulate how it would perform when predicting the CPU temperature during schedule execution. To provide a ground truth for the POD predictions, FEniCS (FEM DNS) is used to calculate the temperature profile of the same schedule. Comparison is done using the Least-Squared Error (LSE) as the metric. The LSE per time step is calculated using Eq. (9), where $N_x$, $N_y$, and $N_z$ are the number of thermal simulation mesh grid steps in the $x$, $y$, and $z$ directions respectively; $T_i$ is the temperature at the $i^{\text{th}}$ point in space calculated by FEniCS (the ground truth); and $P_i$ is the temperature at the $i^{\text{th}}$ point in space predicted by POD.

\text{LSE}(\%) = 100\sqrt{\frac{\sum_{i=0}^{N_x N_y N_z}(T_i - P_i)^2}{\sum_{i=0}^{N_x N_y N_z} T_i^2}} \qquad (9)
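Eq. (9) translates directly into a few lines of NumPy; the function name is ours, but the computation follows the equation as written.

import numpy as np

def lse_percent(T, P):
    # T: ground-truth FEniCS temperatures; P: POD predictions, both over
    # the Nx*Ny*Nz mesh points of one time step.
    T = np.asarray(T, dtype=float).ravel()
    P = np.asarray(P, dtype=float).ravel()
    return 100.0 * np.sqrt(np.sum((T - P) ** 2) / np.sum(T ** 2))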

The resulting LSE over time for each number of modes is shown in Figure 9. A key property of these error measurements is that, while they vary over time, they exhibit no increasing trend, showing that the POD model's predictions remain accurate over long prediction horizons. The time-averaged LSE per number of modes is shown by the blue line in Figure 8, while the red line shows the LSE of the maximum predicted temperature in the active layer of the chip. The latter is a key value because we are most interested in the peak chip temperature during scheduling. Based on Figures 8 and 9, the predictions for schedule construction in POD-TAS for the remainder of this work use 30 modes, balancing model accuracy against computational overhead. Furthermore, Figure 2 shows that the 30-mode prediction follows the temperature calculated with FEniCS (FEM) very closely.

(a) 4 Tasks
(b) 8 Tasks
Figure 10: POD-TAS task assignment frequency vs. $T_H$ and $T_C$

Figure 10 shows characteristics of the schedules generated for varying combinations of $T_C$ and $T_H$. Figures 10(a) and 10(b) show the task assignment frequency for the 4-task and 8-task evaluation sets respectively. Note that pairs $(T_C, T_H)$ where $T_C \geq T_H$ are invalid; thus, no points are shown on or below the line $T_H = T_C$. The other missing points correspond to combinations of $(T_C, T_H)$ that do not yield a schedule meeting the hard real-time deadlines of all tasks. Comparing Figures 10(a) and 10(b), we observe that as more tasks are added without changing any deadlines, $T_H$ and $T_C$ must be increased to obtain valid schedules. Furthermore, as $T_H$ approaches $T_C$ or as $T_H$ is lowered, the frequency of assignments performed by POD-TAS increases, because the algorithm must more actively control the thermal dissipation of the chip to keep the temperature within the constraints.
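The feasibility structure in Figure 10 can be expressed as a simple parameter sweep. The sketch below assumes a hypothetical build_schedule(tc, th) that returns None when no deadline-feasible schedule exists; it is illustrative, not our scheduler's interface.

import itertools

def sweep_thresholds(candidates, build_schedule):
    # candidates: iterable of candidate threshold temperatures (deg C).
    # build_schedule(tc, th): hypothetical call returning a schedule,
    # or None when the hard real-time deadlines cannot all be met.
    valid = {}
    for tc, th in itertools.product(candidates, repeat=2):
        if tc >= th:
            continue  # invalid pair: T_C must lie strictly below T_H
        sched = build_schedule(tc, th)
        if sched is not None:
            valid[(tc, th)] = sched
    return valid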

Figure 11 shows an example of the CPU temperature calculated with FEniCS during schedule execution, compared to the temperatures predicted by POD during scheduling. For this case, $T_H = 75^{\circ}$C and $T_C = 70^{\circ}$C. The left-hand side of Figure 11 shows the first 200 ms of schedule construction, while the right-hand side shows the predictions between 1800 ms and 2 s. The POD predictions follow the maximum temperature from FEniCS very closely. Furthermore, each core follows the behavior defined by the CPU states in Figure 3 after heating beyond $T_H$: cores that have exceeded $T_H$ idle to cool down until their temperatures fall below $T_C$.
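This two-threshold behavior is a hysteresis rule, sketched below with the thresholds from this example; the state names are illustrative and do not reproduce the exact labels of Figure 3.

T_H, T_C = 75.0, 70.0  # thresholds used in the Figure 11 example (deg C)

def next_state(state, core_temp):
    # A core that exceeds T_H stops receiving tasks and idles; it only
    # becomes eligible for tasks again once it has cooled below T_C.
    if state == "ACTIVE" and core_temp >= T_H:
        return "COOLING"
    if state == "COOLING" and core_temp < T_C:
        return "ACTIVE"
    return state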

Figure 11: POD predictions during schedule construction
Figure 12: HotSpot predictions during schedule construction

Figure 12 shows the result of using HotSpot as the thermal prediction model during scheduling with the POD-TAS algorithm, using the same CPU and task configuration as the 4-task POD result in Figure 11. HotSpot is configured with a mesh size of $256 \times 256 \times 17$ to match the POD model as closely as possible. This experiment demonstrates the need for an accurate thermal model during scheduling. While the POD predictions in Figure 11 are highly accurate, HotSpot exhibits a consistent error of about 3.5°C, corresponding to an average percent error of 12.69% in the predicted temperature rise above ambient. Furthermore, HotSpot tends to underpredict the true CPU temperature during schedule execution, so the actual temperature runs well above what is predicted during scheduling. POD's greatly reduced prediction error allows it to make scheduling decisions that reflect the true thermal state of the CPU during execution.
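Our reading of the 12.69% figure is a time-averaged percent error of the predicted temperature rise above ambient, which could be computed as below; the ambient value and the function interface are assumptions.

import numpy as np

def avg_percent_error_above_ambient(true_t, pred_t, t_amb=45.0):
    # Percent error of the predicted rise above ambient, averaged over
    # time steps; t_amb is an assumed ambient temperature (deg C).
    rise_true = np.asarray(true_t, dtype=float) - t_amb
    rise_pred = np.asarray(pred_t, dtype=float) - t_amb
    return 100.0 * np.mean(np.abs(rise_pred - rise_true) / rise_true)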

To demonstrate the effectiveness of the POD-TAS algorithm, we evaluated it alongside RT-TAS [19] in our simulation platform. The evaluation of both algorithms covers a period of 2 seconds with a $\Delta t$ of 10 $\mu$s, a temporal resolution high enough to capture temperature spikes in the CPU's transient thermal behavior that occur on very small time scales. The thermal results for both models are shown in Figure 13. The left y-axis of each plot is temperature, against which the minimum, mean, and maximum chip temperatures and the temperature thresholds ($T_C$, $T_H$) are plotted; the right y-axis is thermal variance, against which the variance values are plotted. The schedule for the 4-task POD-TAS evaluation is generated using $T_H = 75^{\circ}$C and $T_C = 70^{\circ}$C; the 8-task case uses $T_H = 80^{\circ}$C and $T_C = 77^{\circ}$C. These thresholds are shown as horizontal lines in Figures 13(a) and 13(c). They are selected based on the assignment frequencies shown in Figure 10: the highest assignment frequencies are avoided while minimizing $T_H$. For our evaluation of RT-TAS [19], shown in Figures 13(b) and 13(d), no temperature thresholds are shown, since RT-TAS does not support their use.

(a) POD-TAS 4 Task
(b) RT-TAS [19] 4 Task
(c) POD-TAS 8 Task
(d) RT-TAS [19] 8 Task
Figure 13: CPU temperature during schedule execution
Table 5: Evaluation Metrics to Compare the Performances of the POD-TAS and RT-TAS [19] Algorithms. All temperatures from FEniCS are in °C.
                         4 Task                              8 Task
Metric       RT-TAS [19]  POD-TAS  % Difference  RT-TAS [19]  POD-TAS  % Difference
Peak Temp.   110.53       78.47    -29.01        112.71       83.11    -26.26
Peak Var.    87.12        40.95    -53.00        75.83        53.41    -29.57
Var(Mean)    10.08        1.14     -88.69        5.04         3.03     -39.88
Var(Max)     141.71       5.41     -96.18        111.49       7.51     -93.26
Var(Var)     332.80       15.03    -95.48        178.30       53.28    -70.12

Table 5 gives further detail on the performance of the two algorithms using several metrics. The first is the peak chip temperature during schedule execution. The second is the peak spatial variance of the chip temperature, which is driven by high thermal gradients within the chip: lower thermal gradients yield a lower spatial variance. Variance is calculated as $\sum_{i=1}^{n}(x_i - \bar{x})^2 / n$, where $n$ is the number of samples, $\bar{x}$ is the sample mean, and $x_i$ is the $i$-th sample. The last three metrics in Table 5 are all calculated similarly. First, the given quantity is computed from the spatial temperature distribution at each time step; each time step thus yields a single value for the mean chip temperature, the maximum chip temperature, and the spatial variance of the temperature. These values are plotted in Figure 13. Then, to measure how steady each quantity is, its variance over time is computed and reported in Table 5.
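These metric computations can be summarized in a short sketch, given an array holding the spatial temperature distribution at each time step (the array layout is our assumption):

import numpy as np

def table5_metrics(temps):
    # temps: array of shape (num_timesteps, num_points); each row is the
    # spatial temperature distribution at one time step.
    temps = np.asarray(temps, dtype=float)
    mean_t = temps.mean(axis=1)  # mean chip temperature per time step
    max_t = temps.max(axis=1)    # maximum chip temperature per time step
    var_t = temps.var(axis=1)    # spatial variance per time step
    return {
        "Peak Temp.": max_t.max(),
        "Peak Var.": var_t.max(),
        "Var(Mean)": mean_t.var(),  # steadiness of the mean over time
        "Var(Max)": max_t.var(),    # steadiness of the maximum over time
        "Var(Var)": var_t.var(),    # steadiness of the spatial variance
    }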

From Table 5, we can see that POD-TAS strongly outperforms RT-TAS [19] in both the 4-task and 8-task scenarios. POD-TAS lowers the peak chip temperature by 29.01% in the 4-task case and 26.26% in the 8-task case, owing to its capability to hold the temperature very close to the configured thresholds. $T_H$ is occasionally exceeded, but only by a very limited margin; this is due to the temporal fidelity of the TAS algorithm's decision making, which is limited to intervals of 1 millisecond. Furthermore, POD-TAS reduces the peak spatial variance of the temperature by 53.00% and 29.57% in the 4- and 8-task cases respectively, demonstrating its ability to restrict the occurrence of high spatial thermal gradients.

Table 5 also reports the percent difference for each metric, calculated as $100 \cdot (a - b)/b$, where $a$ is the value of the metric under POD-TAS and $b$ is the value under RT-TAS [19]. These values demonstrate that POD-TAS maintains a much more stable chip temperature over time. Especially important are the reductions in the variance of the maximum chip temperature over time, by 96.18% and 93.26% in the 4- and 8-task cases respectively. The variance of the mean chip temperature is likewise reduced by 88.69% (4 tasks) and 39.88% (8 tasks), and the variance of the spatial variance by 95.48% and 70.12%. These reductions show that POD-TAS holds the maximum chip temperature at a much more consistent level than RT-TAS [19], avoiding repeated heating and cooling of the chip, which improves chip reliability and prevents premature device aging.

8 Future Research Directions

Future work in this new direction includes a more thorough investigation of POD-TAS's behavior as its parameters change. A better understanding of how the generated schedules depend on the temperature thresholds would allow POD-TAS to be applied more optimally, and evaluation with more varied task sets would further deepen the understanding of the algorithm's behavior. Extending the algorithm to schedule sporadic task sets and react to sudden task-set changes would further leverage the capabilities of the underlying POD thermal model, yielding a more versatile algorithm that can manage the thermal behavior of systems with non-periodic tasks, such as laptops, desktops, and servers. This would also allow POD-TAS to be applied in data center environments to optimize the thermal dissipation of large-scale computing systems. Furthermore, extending POD-TAS to perform co-scheduling on platforms with integrated GPU and CPU cores would enable its use in more diverse computing environments; our implementation can be readily extended to support this capability. A further extension would be a method for dynamically modifying the thresholds to minimize thermal dissipation under a sporadic workload.

9 Conclusion

TAS algorithms can employ a variety of methods to limit the thermal dissipation of a chip, including careful scheduling of task execution, the use of DVFS, and many others. Many TAS methodologies rely on a thermal model to predict the chip temperature during schedule execution, and limitations in a model's prediction accuracy or efficiency can undermine the algorithm's ability to schedule tasks effectively. In this work, we have proposed and implemented POD-TAS, a TAS algorithm based on a reduced-order thermal model enabled by POD.

The POD thermal model enables temperature prediction with high spatiotemporal resolution. The accuracy of the dynamic RC thermal circuit simulator, HotSpot, is compared with the POD-based thermal model in Figure 2; this comparison supports the use of the POD-based model for scheduling. The dynamic nature of the POD model allows POD-TAS to use a pair of temperature thresholds to manage the transient CPU temperature during task scheduling, whereas a steady-state thermal model does not provide the transient temperature predictions needed to control the chip temperature against threshold values. The states of the CPU cores are defined using a state transition diagram (Figure 3) that enables POD-TAS to avoid assigning tasks to cores in unsuitable states.

We replicate the steady-state thermal model of Lee et al. [19] and their TAS algorithm, RT-TAS. The RT-TAS algorithm is evaluated alongside our POD-TAS implementation in a novel simulation-based evaluation pipeline so that the two algorithms can be compared directly. Our methodology captures the thermal behavior of the evaluated TAS algorithms with high spatiotemporal resolution over a period of 2 seconds, enabling sharp temperature variations to be observed. The evaluation uses two subsets of the COMBS [9] benchmark suite: a 4-task case similar to the evaluation of Lee et al. [19] and a heavier 8-task case. The results demonstrate POD-TAS's ability to control the CPU temperature based on a pair of configured temperature thresholds, and show that POD-TAS consistently outperforms RT-TAS [19]: it reduces the peak spatial variance of the CPU temperature by 53.00% and 29.57%, and the peak chip temperature by 29.01% and 26.26%, in the 4- and 8-task cases respectively. Furthermore, variations in chip temperature over time are greatly reduced by POD-TAS. These improvements stabilize and reduce the temperature over the entire chip and consequently improve its lifetime.

Acknowledgements

This work is supported by the National Science Foundation under Grant No. ECCS-2003307.

References

  • [1] Saeed Akbar, Ruixuan Li, Muhammad Waqas, and Avais Jan. Server temperature prediction using deep neural networks to assist thermal-aware scheduling. Sustainable Computing: Informatics and Systems, 36:100809, 2022.
  • [2] Satish Balay, William Gropp, Lois Curfman McInnes, and Barry F Smith. PETSc, the portable, extensible toolkit for scientific computation. Argonne National Laboratory, 2(17), 1998.
  • [3] Qaisar Bashir, Mohammad Pivezhandi, and Abusayeed Saifullah. Energy-and temperature-aware scheduling: From theory to an implementation on intel processor. In 2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), pages 1922–1930. IEEE, 2022.
  • [4] Ondrej Benedikt, Michal Sojka, Pavel Zaykov, David Hornof, Matej Kafka, Premysl Sucha, and Zdenek Hanzalek. Thermal-aware scheduling for mpsoc in the avionics domain: Tooling and initial results. In 2021 IEEE 27th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), pages 159–168. IEEE, 2021.
  • [5] Gal Berkooz, Philip Holmes, and John L Lumley. The proper orthogonal decomposition in the analysis of turbulent flows. Annual Review of Fluid Mechanics, 25(1):539–575, 1993.
  • [6] Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven K Reinhardt, Ali Saidi, Arkaprava Basu, Joel Hestness, Derek R Hower, Tushar Krishna, Somayeh Sardashti, et al. The gem5 simulator. ACM SIGARCH Computer Architecture News, 39(2):1–7, 2011.
  • [7] Pedro Chaparro, José González, Grigorios Magklis, Qiong Cai, and Antonio González. Understanding the thermal implications of multi-core architectures. IEEE Transactions on Parallel and Distributed Systems, 18(8):1055–1065, 2007.
  • [8] CPU-World.com. AMD Athlon II X4 640, 2010.
  • [9] Anthony Dowling, Frank Swiatowicz, Yu Liu, Alexander John Tolnai, and Fabian Herbert Engel. COMBS: First open-source based benchmark suite for multi-physics simulation relevant HPC research. In Algorithms and Architectures for Parallel Processing: 20th International Conference, ICA3PP 2020, New York City, NY, USA, October 2–4, 2020, Proceedings, Part I 20, pages 3–14. Springer, 2020.
  • [10] Todd Dupont, Johan Hoffman, Claes Johnson, Robert C Kirby, Mats G Larson, Anders Logg, and L Ridgway Scott. The FEniCS project. Chalmers Finite Element Centre, Chalmers University of Technology, Gothenburg, 2003.
  • [11] Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, Robert Kern, Matti Picus, Stephan Hoyer, Marten H. van Kerkwijk, Matthew Brett, Allan Haldane, Jaime Fernández del Río, Mark Wiebe, Pearu Peterson, Pierre Gérard-Marchant, Kevin Sheppard, Tyler Reddy, Warren Weckesser, Hameer Abbasi, Christoph Gohlke, and Travis E. Oliphant. Array programming with NumPy. Nature, 585(7825):357–362, September 2020.
  • [12] Wangkun Jia, Brian T Helenbrook, and Ming-C Cheng. Fast thermal simulation of finfet circuits based on a multiblock reduced-order model. IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., 35(7):1114–1124, 2016.
  • [13] Wangkun Jia, Brian T Helenbrook, and Ming-Cheng Cheng. Thermal modeling of multi-fin field effect transistor structure using proper orthogonal decomposition. IEEE Trans. Electron Devices, 61(8):2752–2759, 2014.
  • [14] Lin Jiang, Anthony Dowling, Yu Liu, and Ming-C Cheng. Exploring an efficient approach for architecture-level thermal simulation of multi-core cpus. In 2022 IEEE International Symposium on Circuits and Systems (ISCAS), pages 278–282. IEEE, 2022.
  • [15] Lin Jiang, Yu Liu, and Ming-C Cheng. Fast accurate full-chip dynamic thermal simulation with fine resolution enabled by a learning method. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022.
  • [16] Lin Jiang, Martin Veresko, Yu Liu, and Ming-C Cheng. An effective physics simulation methodology based on a data-driven learning algorithm. In Proceedings of the Platform for Advanced Scientific Computing Conference, pages 1–10, 2022.
  • [17] Young Geun Kim, Minyong Kim, Joonho Kong, and Sung Woo Chung. An adaptive thermal management framework for heterogeneous multi-core processors. IEEE Transactions on Computers, 69(6):894–906, 2020.
  • [18] CM Krishna and Israel Koren. Thermal-aware management techniques for cyber-physical systems. Sustainable Computing: Informatics and Systems, 15:39–51, 2017.
  • [19] Youngmoon Lee, Kang G Shin, and Hoon Sung Chwa. Thermal-aware scheduling for integrated cpus–gpu platforms. ACM Transactions on Embedded Computing Systems (TECS), 18(5s):1–25, 2019.
  • [20] Sheng Li, Jung Ho Ahn, Richard D Strong, Jay B Brockman, Dean M Tullsen, and Norman P Jouppi. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures. In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, pages 469–480, 2009.
  • [21] Tiantian Li, Ge Yu, and Jie Song. Minimizing energy by thermal-aware task assignment and speed scaling in heterogeneous mpsoc systems. Journal of Systems Architecture, 89:118–130, 2018.
  • [22] Tiantian Li, Tianyu Zhang, Ge Yu, Yichuan Zhang, and Jie Song. Ta-mcf: Thermal-aware fluid scheduling for mixed-criticality system. Journal of Circuits, Systems and Computers, 28(02):1950029, 2019.
  • [23] John Leask Lumley. The structure of inhomogeneous turbulent flows. Atmospheric Turbulence and Radio Wave Propagation, 1967.
  • [24] Srijeeta Maity, Anirban Ghose, Soumyajit Dey, and Swarnendu Biswas. Thermal-aware adaptive platform management for heterogeneous embedded systems. ACM Transactions on Embedded Computing Systems (TECS), 20(5s):1–28, 2021.
  • [25] Sanjay Moulik, Arnab Sarkar, and Hemangee K Kapoor. Tarts: A temperature-aware real-time deadline-partitioned fair scheduler. Journal of Systems Architecture, 112:101847, 2021.
  • [26] Baver Ozceylan, Boudewijn R Haverkort, Maurits de Graaf, and Marco ET Gerards. Minimizing the maximum processor temperature by temperature-aware scheduling of real-time tasks. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 30(8):1084–1097, 2022.
  • [27] Javier Pérez Rodríguez and Patrick Meumeu Yomsi. An efficient proactive thermal-aware scheduler for dvfs-enabled single-core processors. In 29th International Conference on Real-Time Networks and Systems, pages 144–154, 2021.
  • [28] Javier Pérez Rodríguez and Patrick Meumeu Yomsi. Thermal-aware schedulability analysis for fixed-priority non-preemptive real-time systems. In 2019 IEEE Real-Time Systems Symposium (RTSS), pages 154–166. IEEE, 2019.
  • [29] Javier Pérez Rodríguez, Patrick Meumeu Yomsi, and Pavel Zaykov. A thermal-aware approach for dvfs-enabled multi-core architectures. In 2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), pages 1904–1911. IEEE, 2022.
  • [30] Sepideh Safari, Heba Khdr, Pourya Gohari-Nazari, Mohsen Ansari, Shaahin Hessabi, and Jörg Henkel. Therma-mics: Thermal-aware scheduling for fault-tolerant mixed-criticality systems. IEEE Transactions on Parallel and Distributed Systems, 33(7):1678–1694, 2021.
  • [31] Shi Sha, Wujie Wen, Gustavo A Chaparro-Baquero, and Gang Quan. Thermal-constrained energy efficient real-time scheduling on multi-core platforms. Parallel Computing, 85:231–242, 2019.
  • [32] Yanshul Sharma and Sanjay Moulik. Cetas: a cluster based energy and temperature efficient real-time scheduler for heterogeneous platforms. In Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing, pages 501–509, 2022.
  • [33] Muhammad Naeem Shehzad, Qaisar Bashir, Ghufran Ahmad, Adeel Anjum, Muhammad Naeem Awais, Umar Manzoor, Zeeshan Azmat Shaikh, Muhammad A Balubaid, and Tanzila Saba. Thermal-aware resource allocation in earliest deadline first using fluid scheduling. International Journal of Distributed Sensor Networks, 15(3):1550147719834417, 2019.
  • [34] Mircea R Stan, Kevin Skadron, Marco Barcella, Wei Huang, Karthik Sankaranarayanan, and Sivakumar Velusamy. HotSpot: A dynamic compact thermal model at the processor-architecture level. Microelectronics Journal, 34(12):1153–1165, 2003.
  • [35] Yen-Wei Wu, Chia-Lin Yang, Ping-Hung Yuh, and Yao-Wen Chang. Joint exploration of architectural and physical design spaces with thermal consideration. In Proceedings of the 2005 International Symposium on Low Power Electronics and Design, pages 123–126, 2005.
  • [36] Shikang Xu, Israel Koren, and C Mani Krishna. Thermal aware task scheduling for enhanced cyber-physical systems sustainability. IEEE Transactions on Sustainable Computing, 5(4):581–593, 2019.
  • [37] Qi Zhou, Lei Mo, and Xianghui Cao. Energy optimization for dvfs-enabled cps through reliable and real-time task mapping. In 2020 Chinese Automation Congress (CAC), pages 5832–5837. IEEE, 2020.