# Online Resource Management for Improving Reliability of Real-Time Systems on "Big–Little" Type MPSoCs

Yue Ma, Student Member, IEEE, Junlong Zhou, Member, IEEE, Thidapat Chantem, Senior Member, IEEE, Robert P. Dick, Member, IEEE, Shige Wang, Senior Member, IEEE, and Xiaobo Sharon Hu, Fellow, IEEE

Abstract-Heterogeneous multiprocessor systems on a chip (MPSoCs) consisting of cores with different performance/power characteristics are widely used in many real-time embedded systems, where both soft-error reliability and lifetime reliability are key concerns. Although existing efforts have investigated related problems, they either focus on one of the two reliability concerns or propose time-consuming scheduling algorithms that cannot adequately address runtime workload and environmental variations. This paper introduces an online framework that adapts to runtime variations and maximizes soft-error reliability while satisfying the lifetime reliability constraint for soft real-time systems executing on MPSoCs that are composed of high-performance cores and low-power (LP) cores. Based on each core's executing frequency and utilization, the framework performs workload migration between high-performance cores and LP cores to reduce power consumption and improve soft-error reliability. Experimental results based on different hardware platforms show that the proposed approach reduces the probability of failures due to soft errors by at least 17%, and by 50% on average, compared to a number of representative existing approaches that satisfy the same lifetime reliability constraints.

Index Terms—Heterogeneous multiprocessor systems on a chip
 (MPSoC), lifetime reliability, real-time embedded system, soft error reliability.

Manuscript received May 28, 2018; revised September 4, 2018; accepted October 18, 2018. This work was supported in part by NSF under awards CNS-1319904, CNS-1319718, CNS-1319784, National Natural Science Foundation of China under Grant 61802185, and Natural Science Foundation of Jiangsu Province under Grant BK20180470. This paper was recommended by Associate Editor C. Yang. (*Corresponding author: Junlong Zhou.*)

Y. Ma and X. S. Hu are with the Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556 USA (e-mail: yma1@nd.edu; shu@nd.edu).

J. Zhou is with the School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China (e-mail: jlzhou@njust.edu.cn).

T. Chantem is with the Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203 USA (e-mail: tchantem@vt.edu).

R. P. Dick is with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109 USA (e-mail: dickrp@umich.edu).

S. Wang is with General Motors, Warren, MI 48093 USA (e-mail: shige.wang@gm.com).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCAD.2018.2883990

#### I. INTRODUCTION

To address power/energy concerns, various heterogeneous multiprocessor systems on a chip (MPSoCs) have been introduced [1]. A popular MPSoC architecture that is often used in power/energy-conscious real-time embedded applications is composed of pairs of high-performance (HP) cores and low-power (LP) cores. Following the terminology introduced by ARM [2], we refer to this architecture as the "big–little" architecture. Nvidia's variable symmetric multiprocessing [3] is one such example. Such HP and LP cores present unique performance, power/energy, and reliability tradeoffs, which are investigated in this paper.

Resource management in heterogeneous MPSoCs has been widely studied [4]–[8], but few works target the big–little architecture [9]–[12]. In this architecture, all HP cores are homogeneous, as are all LP cores, and both HP and LP cores have the same instruction set architecture. However, big–little type MPSoCs may support different execution models. In one model, represented by Nvidia's TK1 [13] and Samsung's Exynos 5410 [14], one HP core is paired with one LP core, and the HP and LP cores in the same pair cannot work simultaneously. In another model, represented by Nvidia's TX2 [15] and NXP's i.MX8 [16], although HP and LP cores can work simultaneously, all HP (all LP) cores must execute at the same frequency. We aim to design a resource management framework that is adaptive to different execution models.

Since many real-time embedded systems are deployed in critical applications and are expensive as well as inconvenient to replace, lifetime reliability due to permanent faults<sup>1</sup> as well as soft-error reliability due to transient faults are important design considerations. Although there exist several efforts that target either soft-error reliability [17]–[19] or lifetime reliability [8], [20]–[23], only a few papers have examined both soft-error reliability and lifetime reliability together [24]–[27]. In addition, runtime workload variations further complicate the problem of improving the overall system reliability.

<sup>1</sup>Intermittent faults are unlikely to be strongly dependent on power consumption and therefore are out of the scope of this paper.

0278-0070 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

Hence, designing an online approach that considers both lifetime reliability and soft-error reliability becomes necessary.

This paper systematically addresses reliability concerns for real-time systems running on big–little type MPSoCs. Since transient faults occur much more frequently than permanent faults [28], we focus on increasing soft-error reliability without sacrificing lifetime reliability. Specifically, we solve the problem of maximizing soft-error reliability while satisfying temperature, real-time, and lifetime reliability requirements. Our problem is motivated by many real-world applications, such as mobile devices and in-vehicle infotainment systems [29]. We are particularly interested in developing an online framework to address unavoidable workload and environment variations.

Our online framework, referred to as the dynamic reliability improvement framework (DRIF), achieves the above by dynamically and judiciously scaling core frequencies to increase soft-error reliability. By leveraging the power and performance features of big–little type MPSoCs, we dynamically migrate workload and activate the most power-efficient cores to execute tasks. Meanwhile, in order to reduce the computational overhead of checking whether the lifetime reliability resulting from a thermal profile is larger than a lifetime reliability constraint, we design a tool, referred to as LTR-Checker, which is computationally efficient to use at run time.

This paper makes three main contributions.

1) We propose a computationally efficient method to determine whether a given temporal thermal profile would respect the corresponding lifetime reliability threshold.

2) By performing extensive experiments on a hardware platform, we experimentally establish a suitable task migration guideline that allows tasks to be executed on the most power-efficient cores.

3) We develop an online framework to maximize soft-error reliability under temperature, real-time, and lifetime reliability constraints by scaling cores' frequencies and selecting the most power-efficient cores to execute tasks.

We have implemented and validated DRIF on two hardware boards containing Nvidia's TK1 [13] chip and TX2 [15] chip, respectively. Based on the results obtained from running the MiBench benchmark suite [30], we show that DRIF extends the time during which no soft error occurs by at least two days compared to existing approaches.

The rest of this paper is organized as follows. We review related work in Section II. Section III introduces the various system models. We experimentally explore the power features of HP and LP cores, and establish a task migration guideline in Section IV. Section V formulates the problem and provides an overview of our framework. Section VI describes the LTR-Checker. Section VII describes DRIF in detail. Sections VIII and IX describe our experimental setup and the results, respectively. Section X concludes this paper.

## II. RELATED WORK


As a special type of heterogeneous MPSoCs, big–little type MPSoCs use two types of cores: the LP cores offer high power efficiency while the HP cores provide maximum computing performance [2]. This type of MPSoC provides flexibility to balance performance and power, and facilitates ease of use [31]. Since different execution models introduce unique constraints, e.g., the HP core and LP core in the same pair cannot work simultaneously, or all HP (all LP) cores must execute at the same frequency, most resource management approaches for heterogeneous MPSoCs are not applicable to the big–little architecture [5], [21], [23], [32], [33]. Focusing on the big–little architecture, Liu et al. [9] proposed an iterative approach for mapping multithreaded applications on MPSoCs composed of multiple core types to achieve high performance and power efficiency. Annamalai et al. [10] designed a novel technique to dynamically swap threads between HP cores and LP cores and change the core frequency to achieve a high throughput/Watt. Considering the constraints for HP cores and LP cores, Carroll and Heiser [11] investigated the mechanisms for frequency scaling, and proposed a technique to reduce energy consumption. Singla et al. [12] designed an online method to predict and reduce power and runtime temperature for big–little type MPSoCs. While the above works consider the specific features of the big–little architecture, none of them focuses on lifetime reliability or soft-error reliability.

There exist several efforts that directly aim to increase soft-error reliability [17], [18], [34], [35] or lifetime reliability [7], [21], [36], [37]. In order to improve soft-error reliability, Zhao et al. proposed a method to allocate recoveries for tasks [17], [18] while Nahar and Meyer [38] assigned redundancies to tasks statically. Fan et al. proposed a dynamic voltage and frequency scaling (DVFS)-based method to reduce power consumption under a soft-error reliability constraint. Although these methods are effective at improving and ensuring soft-error reliability, they usually reduce lifetime reliability because of a high operating temperature. For periodic tasks running on an MPSoC, Huang et al. [21] proposed an analytical model to estimate lifetime reliability of MPSoCs and a task mapping and scheduling algorithm to guard against aging effects. Bolchini et al. [36] dynamically determined the most effective mapping of tasks to minimize network-on-chip energy consumption and maximize lifetime reliability. Das et al. [7] proposed a machine learning-based algorithm to handle inter- and intra-application variations and reduce peak temperature and thermal cycling. These methods are designed to increase lifetime reliability but weaken soft-error reliability.

Our proposed framework considers both soft-error reliability and lifetime reliability, which have not typically been examined together. The work by Das *et al.* [24] aims to jointly improve soft-error reliability and lifetime reliability by mapping tasks to all cores and scaling core frequencies. However, their solution is too computationally intensive to use at run time. Kapadia and Pasricha [25] proposed a framework to optimize performance and energy. Although transient and permanent faults are considered, their framework does not maximize reliability but only focuses on reducing power under lifetime reliability and soft-error reliability constraints. Zhou *et al.* [26] proposed an offline technique to maximize system availability by allocating replications of tasks and determining the core frequency statically. Although these works consider both lifetime reliability and soft-error reliability, they are offline approaches and ignore the specific features of big–little type MPSoCs. In this paper, we focus on big–little type MPSoCs and propose to maximize soft-error reliability under a lifetime reliability constraint.

## III. SYSTEM MODELS

In this section, we present the hardware platform as well as the task and reliability models used in our framework.

## A. Hardware Model


We consider big–little type MPSoCs with *n* HP and *m* LP cores. We assume that both HP cores and LP cores support DVFS and have multiple frequency levels [13], [15]. A core dissipates static power when it is idle and consumes additional active power when it performs operations [33]. Both active and static power are related to the core's frequency. Let the utilization of a core in a given time interval of length |t| be $u = |t_a|/|t|$, where $|t_a|$ is the amount of time during which the core performs operations [33]. A core's utilization is commonly used to estimate real-time performance and soft-error reliability.

We consider two execution models of big–little MPSoCs in this paper. In the first execution model, referred to as the Hetero-Paired model and represented by Nvidia's TK1 [13] and Samsung's Exynos 5410 [14], HP cores and LP cores are paired, and the paired HP core and LP core cannot be active simultaneously. In the second execution model, referred to as the Homo-Grouped model and represented by Nvidia's TX2 [15] and NXP's i.MX8 [16], all cores can work simultaneously, but all HP (all LP) cores must execute at the same frequency. There exist other execution models, where HP and LP cores can run simultaneously with their own core frequencies, but such models are not widely supported by MPSoCs.

## B. Task Model

We assume that MPSoCs execute independent periodic tasks with soft deadlines, such as those found in multimedia and communication applications. A task is associated with a tuple $\tau_i = \{d_i, e_i^H, e_i^L\}$, where $d_i$ is the deadline, and $e_i^H$ and $e_i^L$ represent the worst-case execution times when running on an HP core and an LP core, respectively. Generally, $e_i^H \leq e_i^L$. Since all the jobs of the *i*th task have the same properties, $\tau_i$ also denotes the jobs of the *i*th task. Tasks on each core are scheduled according to a real-time scheduling policy, such as earliest deadline first or rate monotonic scheduling [39]. In this paper, we adopt a mapping approach in which tasks are assigned to cores at design time to balance the workload of cores [40]. We guarantee the real-time constraint by ensuring that the utilization of each core is lower than the utilization bound for schedulability [8], [41].
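The utilization-based real-time check described above can be sketched as follows. This is only an illustration under stated assumptions: each task's period is taken to equal its deadline $d_i$, the bounds used are the classic ones (1 for earliest deadline first, $n(2^{1/n}-1)$ for rate monotonic), and the `Task` class, function names, and task set are hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    deadline: float  # d_i; also taken as the period (implicit-deadline assumption)
    wcet_hp: float   # e_i^H, worst-case execution time on an HP core
    wcet_lp: float   # e_i^L, worst-case execution time on an LP core


def core_utilization(tasks, core_type):
    """Sum of WCET / period over the tasks mapped to one core."""
    if core_type not in ("HP", "LP"):
        raise ValueError("core_type must be 'HP' or 'LP'")
    return sum((t.wcet_hp if core_type == "HP" else t.wcet_lp) / t.deadline
               for t in tasks)


def utilization_bound(n_tasks, policy):
    """Classic bounds: 1 for EDF, n(2^(1/n) - 1) for rate monotonic."""
    if policy == "EDF":
        return 1.0
    if policy == "RM":
        return n_tasks * (2 ** (1 / n_tasks) - 1)
    raise ValueError("policy must be 'EDF' or 'RM'")


def meets_bound(tasks, core_type, policy):
    """True if the core's utilization does not exceed the policy's bound."""
    if not tasks:
        return True
    return core_utilization(tasks, core_type) <= utilization_bound(len(tasks), policy)


# Hypothetical task set mapped to a single core (times in milliseconds).
tasks = [Task(10, 2, 3), Task(20, 4, 6), Task(40, 8, 14)]
for core in ("HP", "LP"):
    print(core, round(core_utilization(tasks, core), 3),
          meets_bound(tasks, core, "EDF"), meets_bound(tasks, core, "RM"))
```

In this example the same task set passes both bounds on an HP core but only the EDF bound on an LP core, because $e_i^L \geq e_i^H$ raises the utilization. The rate monotonic bound is sufficient but not necessary, so failing it does not by itself prove that deadlines will be missed.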

#### C. Soft-Error Reliability

In this paper, we aim to maximize reliability in the presence of soft errors caused by transient faults. The soft-error reliability of a single core in a time interval is the probability that no soft errors occur during the time interval [26]

$$r(f,t) = e^{-\lambda(f) \times u \times |t|}. \tag{1}$$

Here, *f* is the core frequency, |t| is the length of the time interval, and *u* is the core's utilization in this time interval. $\lambda(f)$ is the average fault rate, which depends on *f* [26]

$$\lambda(f) = \lambda_0 \times 10^{\frac{d(f_{\max} - f)}{f_{\max} - f_{\min}}}. \tag{2}$$

$\lambda_0$ is the average fault rate at the highest core frequency, $f_{\min}$ and $f_{\max}$ are the minimum and maximum core frequencies, and $d$ ($d > 0$) is a hardware-specific constant that indicates the sensitivity of fault rates to frequency scaling. This model indicates that increasing the core frequency is effective in improving soft-error reliability.
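To make the frequency dependence in (1) and (2) concrete, the following minimal sketch evaluates both equations. The values of `LAMBDA0` and `D` are purely illustrative assumptions (they are hardware specific and not given here), and the function names are hypothetical.

```python
import math


def fault_rate(f, f_min, f_max, lambda0, d):
    """Average fault rate lambda(f) from (2); it equals lambda0 at f = f_max."""
    if not f_min <= f <= f_max:
        raise ValueError("frequency outside [f_min, f_max]")
    return lambda0 * 10 ** (d * (f_max - f) / (f_max - f_min))


def core_reliability(f, u, t_len, f_min, f_max, lambda0, d):
    """Soft-error reliability r(f, t) from (1) for utilization u over t_len seconds."""
    return math.exp(-fault_rate(f, f_min, f_max, lambda0, d) * u * t_len)


# Illustrative values only: lambda0 and d are assumed, not measured.
F_MIN, F_MAX = 0.345, 1.881  # GHz, example frequency range
LAMBDA0, D = 1e-6, 2.0       # faults per second at f_max; sensitivity constant

for f in (F_MIN, 1.267, F_MAX):
    lam = fault_rate(f, F_MIN, F_MAX, LAMBDA0, D)
    r = core_reliability(f, u=0.6, t_len=3600.0, f_min=F_MIN, f_max=F_MAX,
                         lambda0=LAMBDA0, d=D)
    print(f"f = {f:.3f} GHz  lambda = {lam:.2e}  r = {r:.6f}")
```

With these assumed parameters the fault rate at $f_{\min}$ is $10^d$ times the rate at $f_{\max}$, so at a fixed utilization the reliability rises monotonically with frequency. The sketch holds *u* fixed across frequencies; in practice the utilization of a given workload also changes with the core frequency.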

For a big–little type MPSoC with *n* active HP cores and *m* active LP cores, the soft-error reliability in the *i*th time interval, $t_i$, is

$$R(t_i) = \prod_{j=1}^{n} r_j^{\text{HP}}(f_j, t_i) \times \prod_{j=1}^{m} r_j^{\text{LP}}(f_j, t_i) \tag{3}$$

where $r_j^{\text{HP}}(f_j, t_i)$ and $r_j^{\text{LP}}(f_j, t_i)$ are the soft-error reliabilities of the *j*th HP core and the *j*th LP core, respectively, in the time interval $t_i$. The aim of this paper is to maximize the soft-error reliability of the MPSoC in each time interval.
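Because every factor in (3) has the exponential form of (1), the product collapses to a single exponential of the summed exponents. The sketch below shows this under assumed per-core fault rates and utilizations; all numeric values and the function name are hypothetical.

```python
import math


def system_reliability(fault_rates, utilizations, t_len):
    """R(t_i) from (3), evaluated as exp(-|t| * sum_j lambda_j * u_j)."""
    if len(fault_rates) != len(utilizations):
        raise ValueError("one utilization is required per core")
    exponent = sum(lam * u for lam, u in zip(fault_rates, utilizations)) * t_len
    return math.exp(-exponent)


# Hypothetical chip: 2 active HP cores followed by 4 active LP cores.
rates = [1e-6, 1e-6, 4e-6, 4e-6, 4e-6, 4e-6]  # lambda(f_j), faults per second
utils = [0.8, 0.6, 0.3, 0.3, 0.2, 0.0]        # u_j; an idle core contributes a factor of 1
R = system_reliability(rates, utils, t_len=60.0)

# Cross-check against the literal product of per-core reliabilities.
per_core = [math.exp(-lam * u * 60.0) for lam, u in zip(rates, utils)]
print(R, math.prod(per_core))
```

The summed form makes it easy to see that the core with the largest product of fault rate and utilization dominates the loss in reliability.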

## D. Lifetime Reliability

Lifetime reliability, which is typically measured by the mean-time-to-failure (MTTF), is dependent on multiple wear-out effects [23]. For the sake of simplicity, we consider electromigration as the primary source of permanent faults in this paper. Other device fault mechanisms can be incorporated using the sum-of-fault rate model [22], [24]. Since the tasks are executed periodically, the temperature variation over time also becomes periodic once the system stabilizes, so we assume that the thermal profile is the same in each hyperperiod of the task set, hp. Based on the thermal profile in a hyperperiod, the MTTF can be calculated by

$$\text{MTTF} = |\text{hp}| \times \sum_{i=0}^{\infty} e^{-(i \times A)^{\beta}} \tag{4}$$

where |hp| is the length of the hyperperiod and $\beta$ is the slope parameter in the Weibull distribution [21]. *A* is a temperature-related parameter. If one hyperperiod can be divided into *p* time intervals of the same length, and the operating temperature is constant in each time interval, we calculate *A* as

$$A = \sum_{i=1}^{p} \frac{|t|}{\alpha(T_i)} \tag{5}$$

where |t| and $T_i$ are the length of the time interval and the temperature at the *i*th time interval, respectively. $\alpha(T_i)$ relates to the arrival rate of permanent faults and depends on the hardware and temperature $T_i$ [21].
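A direct evaluation of (4) and (5) can be sketched as follows. The sketch takes the values $\alpha(T_i)$ as given inputs, since their form is hardware dependent; the numbers used, the value of $\beta$, and the truncation tolerance `tol` are illustrative assumptions, and all quantities share one arbitrary time unit.

```python
import math


def aging_parameter(interval_len, alphas):
    """A from (5): sum of |t| / alpha(T_i) over the p intervals of one hyperperiod."""
    return sum(interval_len / a for a in alphas)


def mttf(interval_len, alphas, beta, tol=1e-12):
    """MTTF from (4), truncating the series once a term falls below tol."""
    hp_len = interval_len * len(alphas)
    a = aging_parameter(interval_len, alphas)
    if a <= 0:
        raise ValueError("A must be positive")
    total, i = 0.0, 0
    while True:
        term = math.exp(-((i * a) ** beta))
        total += term
        if term < tol:
            break
        i += 1
    return hp_len * total


# Hypothetical alpha(T_i) values for a cooler and a hotter thermal profile.
BETA = 2.0
cool = [400.0, 400.0, 300.0, 300.0]
hot = [150.0, 150.0, 100.0, 100.0]
print(mttf(0.25, cool, BETA), mttf(0.25, hot, BETA))
```

The hotter profile has smaller $\alpha(T_i)$ values, hence a larger *A* and a shorter MTTF. The number of terms needed before the series can be truncated grows roughly as $1/A$, so a direct evaluation becomes expensive when |hp| is very short relative to $\alpha(T_i)$.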




Fig. 1. Power consumption of an HP (Denver) core and an LP (ARM) core under different utilization and frequency levels.

## IV. EMPIRICAL STUDY: POWER CONSUMPTION OF CORES

In this section, we examine big–little type MPSoCs consisting of HP and LP cores and, in particular, explore their unique power features. We first observe that executing tasks on an LP core may consume more power and energy than executing them on an HP core. We provide a measurement-based method to quantitatively compare the power and energy consumption of HP and LP cores. Based on this method and the measurement results, we establish a suitable task mapping and migration guideline to migrate tasks between cores and reduce a chip's power consumption.

Whereas the primary goal of big–little MPSoCs is to reduce power consumption by executing a light workload on the LP cores, an LP core may consume more power than an HP core. To fully capture the power consumption behavior of big–little MPSoCs, we have conducted a series of measurement-based experiments. We measure the power consumption of the HP core and LP core<sup>2</sup> in Nvidia's TX2 [15]. We use FLUKE AC/DC current clamp meters [43] and a National Instruments USB-6216 data acquisition system [44] to acquire power consumption when cores execute at different core frequencies and at different utilizations.

To generally evaluate the power features of HP and LP cores, we propose a measurement-based method to quantitatively compare the power consumption of HP and LP cores. This method measures and compares the power consumption of cores at different frequencies and utilizations, and the comparison results can guide the mapping of tasks. A low utilization means that the workload is light, the core consumes less active power, and leakage power may dominate. To hold the core's utilization at a specified value during each measurement, we develop a feedback-based tool.

TABLE I TASK MAPPING AND MIGRATION GUIDELINE

| Utilization / Core frequency | 1.881 GHz | 1.574 GHz | 1.267 GHz | 0.960 GHz | 0.652 GHz | 0.345 GHz |
|------------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
| 100%                         | HP        | -         | -         | LP        | LP        | LP        |
| 80%                          | -         | -         | LP        | LP        | LP        | LP        |
| 60%                          | -         | LP        | LP        | LP        | LP        | LP        |
| 40%                          | -         | LP        | LP        | LP        | LP        | LP        |
| 20%                          | LP        | LP        | LP        | LP        | LP        | LP        |

The measured power consumption is illustrated in Fig. 1. The results show that for any core frequency, both HP and LP cores have a higher power consumption with a heavier workload. However, LP cores are not always power efficient. The LP core consumes less power than the HP core only when the core frequency is low and the workload is light. For other platforms, such as Nvidia's TK1 [13], we have similar observations that the LP core consumes less power than the HP core only when the utilization and core frequency are low [27]. One possible reason for this phenomenon is that the HP and LP cores have different microarchitectures, as on TX2. Meanwhile, although HP and LP cores on TK1 have the same microarchitecture, the transistors in the HP core and LP core have different threshold voltages. The LP core consumes low leakage power but requires a high voltage to operate at high frequencies. On the contrary, the HP core can work at a high frequency with a low voltage. The measurement results reveal that in order to reduce the power consumption of MPSoCs, we should keep the workload light on the LP cores, and it is necessary to migrate tasks between HP and LP cores if cores' utilizations vary at run time.

Based on the data collected from our extensive experiments, we can establish a suitable task mapping and migration guideline that guides the selection of cores for executing workload to balance power consumption and performance. This guideline indicates whether the LP core or the HP core consumes less power for each given core frequency and core utilization. With this guideline, we should map and migrate tasks to the core consuming less power. As an example, Table I presents the guideline for Nvidia TX2. In this table, "HP" ("LP") indicates that the HP (LP) core is more power efficient at the corresponding core frequency and utilization, so the workload should be executed on an HP (LP) core. Note that due to small variations in ambient temperature, as well as chip operating voltage and current, the power consumption may vary slightly even for exactly the same workload. Therefore, it is insufficient to conclude that a core always consumes less power when its measured power is lower than that of another core by a small amount. We treat two measured power values as the same if their difference is smaller than 0.1 W, which is the resolution of our sensors. In Table I, "-" indicates that the difference in power consumption between an HP core and an LP core is smaller than this threshold. In this case, the workload can run on either an HP core or an LP core.

<sup>2</sup>Note that TX2 is composed of ARM Cortex A57 cores geared for multithreading, and Nvidia's Denver cores for high single-thread performance with dynamic code optimization [42]. In this measurement, we only consider single-thread applications for TX2; therefore, the Denver core is an HP core and the ARM core is an LP core.

In this paper, to dynamically improve reliability, we use this guideline to migrate tasks between HP and LP cores so that tasks execute on the most power-efficient cores. By reducing power consumption and temperature, this task migration allows the cores to execute at a high core frequency and thereby achieves a high soft-error reliability.
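The guideline in Table I can be encoded as a simple lookup. In the sketch below, the table entries are copied from Table I, while the handling of values between tabulated points (rounding utilization up to the next tabulated level and snapping to the nearest tabulated frequency) and the choice to stay on the current core for "-" entries are assumptions made for illustration.

```python
import bisect

FREQS_GHZ = (1.881, 1.574, 1.267, 0.960, 0.652, 0.345)
UTIL_LEVELS = (20, 40, 60, 80, 100)

# Entries of Table I (Nvidia TX2), one row per utilization level.
GUIDELINE = {
    100: ("HP", "-", "-", "LP", "LP", "LP"),
    80: ("-", "-", "LP", "LP", "LP", "LP"),
    60: ("-", "LP", "LP", "LP", "LP", "LP"),
    40: ("-", "LP", "LP", "LP", "LP", "LP"),
    20: ("LP", "LP", "LP", "LP", "LP", "LP"),
}


def preferred_core(util_pct, freq_ghz, current):
    """Core type suggested by Table I; 'current' is kept when the entry is '-'."""
    if not 0 <= util_pct <= 100:
        raise ValueError("utilization must be within 0..100 percent")
    # Assumption: round utilization up to the next tabulated level.
    level = UTIL_LEVELS[bisect.bisect_left(UTIL_LEVELS, util_pct)]
    # Assumption: use the tabulated frequency closest to the requested one.
    col = min(range(len(FREQS_GHZ)), key=lambda k: abs(FREQS_GHZ[k] - freq_ghz))
    entry = GUIDELINE[level][col]
    return current if entry == "-" else entry


print(preferred_core(100, 1.881, current="LP"))  # HP
print(preferred_core(90, 1.574, current="LP"))   # LP (entry is '-', no migration)
print(preferred_core(50, 1.0, current="HP"))     # LP
```

Staying on the current core for "-" entries avoids migrations whose power benefit is below the 0.1 W measurement resolution.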

## V. PROBLEM FORMULATION AND FRAMEWORK OVERVIEW

In this section, we first formulate the problem addressed in this paper and then describe our solution, DRIF, at a high level.

#### A. Problem Formulation

The problem that we aim to solve is motivated by applications, such as in-vehicle infotainment systems. For such systems, tasks are expected to complete before their deadlines, and both lifetime and soft-error reliability are critical to guarantee the safety of human drivers and passengers [29]. At the same time, the infotainment and other in-vehicle computational subsystems should be power efficient especially for electric vehicles [45]. Furthermore, the workload in these systems can vary significantly at run time due to variations in input data and the environment.

Before formulating the problem, we first introduce two definitions.

Definition 1: A sampling window (SW) is defined as a time interval during which the temperature is constant.

*Definition 2:* A profiling window (PW) is composed of multiple equal-length SWs.

We determine the core frequencies and cores' workloads for each SW, and the PW is used to estimate lifetime reliability. The soft-error reliability, frequency, utilization, and operating temperature of the *j*th HP (LP) core at the *i*th SW are denoted by $r(SW_i, HP_j)$ ($r(SW_i, LP_j)$), $f(SW_i, HP_j)$ ($f(SW_i, LP_j)$), $u(SW_i, HP_j)$ ($u(SW_i, LP_j)$), and $T(SW_i, HP_j)$ ($T(SW_i, LP_j)$), respectively.

Assume that a PW is composed of *p* SWs and the MPSoC has *n* HP cores and *m* LP cores.<sup>3</sup> Our objective is to maximize the system-level soft-error reliability in each PW

$$R = \prod_{i=1}^{p} \left( \prod_{j=1}^{n} r(SW_i, HP_j) \times \prod_{j=1}^{m} r(SW_i, LP_j) \right) \tag{6}$$

subject to

$$T(SW_i, HP_j) \le T_{\text{th}}, \quad \forall SW_i, \forall HP_j \tag{7}$$

$$T(SW_i, LP_j) \le T_{\text{th}}, \quad \forall SW_i, \forall LP_j \tag{8}$$

$$u(SW_i, HP_j) \le u_{\text{th}}, \quad \forall SW_i, \forall HP_j \tag{9}$$

$$u(SW_i, LP_j) \le u_{\text{th}}, \quad \forall SW_i, \forall LP_j \tag{10}$$

$$\text{MTTF}(\text{TP}(\text{PW})) \ge \text{MTTF}_{\text{th}}. \tag{11}$$

The first two constraints require that the temperature of both HP and LP cores not exceed the threshold $T_{\text{th}}$ in any SW. Note that this temperature constraint also limits the power consumption of the system. The third and fourth constraints capture the real-time requirement, where $u_{\text{th}}$ is the upper bound on utilization to satisfy schedulability. The last constraint requires the MTTF resulting from the thermal profile, TP(PW), to be not less than a threshold $\text{MTTF}_{\text{th}}$. For soft real-time systems, temporarily violating the real-time and lifetime reliability constraints is acceptable, but the temperature constraint must be satisfied to avoid thermal throttling.

Different execution models of big–little type MPSoCs introduce different execution-related constraints. For the Hetero-Paired execution model, the paired HP core and LP core cannot work simultaneously. If the *j*th HP core is paired with the *j*th LP core, one of them must be idle, i.e.,

$$f(SW_i, HP_j) \times f(SW_i, LP_j) = 0. \tag{12}$$

We assume that a core whose frequency is 0 is powered off. For the Homo-Grouped execution model, all HP (LP) cores should have the same core frequency, i.e.,

$$f(SW_i, HP_j) = f(SW_i, HP_{j+1}), \quad \forall j \tag{13}$$

$$f(SW_i, LP_j) = f(SW_i, LP_{j+1}), \quad \forall j. \tag{14}$$

Our framework is applicable to both execution models and dynamically improves the soft-error reliability under the temperature, real-time, and lifetime reliability constraints in each PW.
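As an illustration of how the per-SW constraints (7)–(10) and the execution-model constraints (12)–(14) might be checked for a candidate configuration, consider the sketch below. The `CoreState` class, the function name, the model labels, and the threshold values are hypothetical. The lifetime reliability constraint (11) is not covered because it is evaluated over a whole PW rather than a single SW.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreState:
    freq: float  # core frequency in GHz; 0 means powered off
    util: float  # utilization in [0, 1]
    temp: float  # operating temperature in degrees Celsius


def check_sw(hp, lp, t_th, u_th, model):
    """Return the list of constraints violated in one SW (empty if none)."""
    violations = []
    for kind, cores in (("HP", hp), ("LP", lp)):
        for j, core in enumerate(cores):
            if core.temp > t_th:    # constraints (7) and (8)
                violations.append(f"temperature {kind}{j}")
            if core.util > u_th:    # constraints (9) and (10)
                violations.append(f"utilization {kind}{j}")
    if model == "hetero-paired":
        if len(hp) != len(lp):
            raise ValueError("paired model needs equally many HP and LP cores")
        for j, (h, l) in enumerate(zip(hp, lp)):
            if h.freq * l.freq != 0:    # constraint (12)
                violations.append(f"pair {j} has both cores active")
    elif model == "homo-grouped":
        for kind, cores in (("HP", hp), ("LP", lp)):
            if len({core.freq for core in cores}) > 1:    # constraints (13), (14)
                violations.append(f"{kind} cores run at different frequencies")
    else:
        raise ValueError("unknown execution model")
    return violations


# Hypothetical thresholds and core states.
T_TH, U_TH = 85.0, 1.0
paired_hp = [CoreState(1.881, 0.7, 70.0), CoreState(0.0, 0.0, 45.0)]
paired_lp = [CoreState(0.0, 0.0, 45.0), CoreState(0.652, 0.3, 50.0)]
print(check_sw(paired_hp, paired_lp, T_TH, U_TH, "hetero-paired"))

grouped_hp = [CoreState(1.574, 0.5, 65.0), CoreState(1.574, 0.4, 64.0)]
grouped_lp = [CoreState(0.960, 0.2, 50.0), CoreState(0.652, 0.2, 49.0)]
print(check_sw(grouped_hp, grouped_lp, T_TH, U_TH, "homo-grouped"))
```

Returning the list of violations rather than a single Boolean makes it possible to treat the hard temperature constraint differently from the constraints that a soft real-time system may violate temporarily.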

In order to solve the formulated problem, there are two main challenges that we need to overcome: 1) since the history (i.e., tasks' execution times) does not always reflect the future, it is possible for the constraints to be violated when using history-based predictions and 2) a highly efficient algorithm is needed to avoid excessive overhead. We address these challenges by proposing an online framework to: 1) obtain the system's runtime status and 2) dynamically migrate tasks between cores, power off idle cores, and determine core frequencies based on the recorded system status.

#### B. Overview of Reliability Improvement Framework

As stated earlier in this paper, to better respond to workload and environmental changes that are unavoidable in real-time embedded systems, we aim to develop an online approach to solve the problem defined in (6)–(11) while taking into consideration the execution models given in (12) or (13)–(14). The basic idea of our framework, DRIF, is to incrementally solve the optimization problem by using the history of system states in the previous PW. The system state includes which cores are active and each active core's frequency, operating temperature, and utilization. Note that our method can be easily applied to any arbitrary history window size. DRIF consists of three main components: a schedule generator (SG), which is triggered at the beginning of each PW, a schedule executor (SE), which is triggered at the beginning of each SW, and a state collector (SC), which collects the system state in each SW (see Fig. 2).

DRIF works as follows. In each SW, SC collects and saves the system state. At the end of each PW, the system state during this PW is sent to SG. Based on the state information, SG then generates a solution, called a schedule, which specifies cores' workloads and frequencies for each SW in the next PW (see Section VII-A). The migration guideline given in Table I is used by SG to migrate tasks between cores to achieve a lower power consumption as well as a lower operating temperature. In order to reduce the computational cost, SG relies on LTR-Checker to efficiently check whether the lifetime reliability constraint is satisfied. In each SW, SE either adopts the schedule generated by SG or modifies the schedule to adapt to runtime variations (see Section VII-B).

<sup>3</sup>*m* is equal to *n* for MPSoCs with the Homo-Grouped execution model.

Fig. 2. High-level overview of DRIF.

We highlight the effectiveness of DRIF. First of all, DRIF is adaptive to different types of big–little MPSoCs and to different numbers of cores and/or pairs of cores. Meanwhile, considering that the workload in systems may vary at run time, DRIF periodically obtains the status of each core. Based on the obtained runtime status, DRIF determines the most appropriate cores to execute tasks while satisfying the real-time, lifetime reliability, and operating temperature constraints. In order to reduce the computational overhead, we propose heuristics to periodically migrate tasks and tune core frequencies in linear time. Note that the execution order of tasks on each core can be determined by existing scheduling policies. DRIF can work with any scheduling policy, such as rate monotonic and earliest deadline first [39]. The details of DRIF are elaborated in Section VII.
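The interaction between SG, SE, and SC described in this section can be sketched as a simple control loop. This is only a structural illustration: the callback interface, the function `run_drif`, and the stub components are hypothetical, and the actual schedule generation and execution logic is the subject of Section VII.

```python
def run_drif(num_sws, sws_per_pw, collect_state, generate_schedule, execute):
    """Drive SC every SW, SE every SW, and SG at each PW boundary."""
    history = []     # system states collected during the PW in progress
    schedule = None  # no schedule exists until the first PW has been observed
    for sw in range(num_sws):
        if sw % sws_per_pw == 0 and history:
            # PW boundary: SG plans the next PW from the previous PW's states.
            schedule = generate_schedule(history)
            history = []
        planned = schedule[sw % sws_per_pw] if schedule else None
        execute(sw, planned)               # SE adopts or adapts the planned entry
        history.append(collect_state(sw))  # SC records the state of this SW


# Stub components that only record when they are invoked.
log = []


def collect_state(sw):
    return {"sw": sw, "util": 0.5}


def generate_schedule(history):
    log.append(("SG", len(history)))
    return [{"freq_ghz": 1.267} for _ in history]


def execute(sw, planned):
    log.append(("SE", sw, planned is not None))


run_drif(6, 3, collect_state, generate_schedule, execute)
print(log)
```

In this sketch the first PW runs without a schedule because no history exists yet; how the first PW is actually handled is an assumption and is not specified in this section.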

## VI. LTR-CHECKER: TOOL TO CHECK LIFETIME RELIABILITY CONSTRAINT

In this section, we design a tool, LTR-Checker, which efficiently checks whether the lifetime reliability resulting from a given thermal profile in a task set's hyperperiod is larger than a prespecified constraint, MTTF<sub>th</sub>. Calculating MTTF by using (4) is extremely time consuming and may not be practical at runtime. Hence, the goal of LTR-Checker is to reduce the runtime computational overhead by allowing some calculations to be performed offline. We first introduce a concept called super hyperperiod, sp, which is a set of multiple adjacent hyperperiods. Let the length of a super hyperperiod be |sp|, and $|sp| = |hp| \times k$, where k is a positive integer. Since one super hyperperiod is composed of multiple adjacent hyperperiods and thermal profiles are the same in each super hyperperiod, the lifetime reliability can also be expressed as

$$\text{MTTF} = |\text{sp}| \times \sum_{i=0}^{\infty} e^{-(i \times A^{\star})^{\beta}} \tag{15}$$

where

$$A^{\star} = \sum_{i=1}^{k \times p} \frac{|t|}{\alpha(T_i)}. \tag{16}$$

For a given thermal profile in the hyperperiod, LTR-Checker checks whether the corresponding MTTF is larger than MTTF<sub>th</sub>. LTR-Checker reduces the online computational overhead by performing the accumulation in (15) offline and only calculating $A^{\star}$ online.

The aim of the offline part of LTR-Checker is to find a threshold for $A^{\star}$, referred to as $A_{\text{th}}^{\star}$, such that if $A^{\star} \leq A_{\text{th}}^{\star}$, the corresponding MTTF is larger than MTTF<sub>th</sub>. We first arbitrarily determine the length of the super hyperperiod |sp|. Since |hp| is usually in seconds and MTTF<sub>th</sub> is in years, setting |sp| on the order of months allows |sp| to be evenly divisible by any possible |hp|. After determining the value of |sp|, we can find the threshold $A_{\text{th}}^{\star}$ such that

$$|\text{sp}| \times \sum_{i=0}^{\infty} e^{-(i \times A_{\text{th}}^{\star})^{\beta}} = \text{MTTF}_{\text{th}}. \tag{17}$$

If the $A^{\star}$ caused by a thermal profile is smaller than $A_{\text{th}}^{\star}$, the corresponding system's MTTF is larger than MTTF<sub>th</sub>.

The online part of LTR-Checker calculates $A^{\star}$ based on the thermal profile in a hyperperiod. With the determined |sp|, we first find the relationship between $A$ [in (5)] and $A^{\star}$, which is described in Lemma 1.

*Lemma 1:* If one super hyperperiod is composed of $k$ hyperperiods, i.e., $|sp| = k \times |hp|$, then $A^{\star} = A \times k$.

*Proof:* Since thermal profiles are the same in each hyperperiod, each hyperperiod's *i*th time interval has the same temperature, i.e., $T_i = T_{i+p} = \cdots = T_{i+(k-1)p}$. Furthermore, $\alpha(T_i) = \alpha(T_{i+p}) = \cdots = \alpha(T_{i+(k-1)p})$. Hence

$$A^{\star} = \sum_{i=1}^{kp} \frac{|t|}{\alpha(T_i)} = k \times \sum_{i=1}^{p} \frac{|t|}{\alpha(T_i)} = A \times k. \tag{18}$$

Since |sp| is arbitrarily determined offline and |hp| is constant for a given task set, we only need to calculate *A* in order to obtain $A^{\star}$. *A* can be obtained by using (5), and its computational overhead only depends on the value of |hp|, which is much smaller than |sp| and MTTF<sub>th</sub>. Compared with obtaining MTTF directly by using (4) and (5), the online operation of LTR-Checker only obtains *A* by using (5). Hence, LTR-Checker dramatically reduces the online computational overhead and can be easily used even when the computational resources are limited. In DRIF, we require that the length of the PW be a multiple of the length of the task set's hyperperiod, and the SG utilizes LTR-Checker to determine whether a given operating temperature can guarantee the lifetime reliability constraint.
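The offline/online split of LTR-Checker can be illustrated with a minimal Python sketch. It is not the authors' implementation: the bisection bracket, the truncation tolerance, and the function names are assumptions, and the scale function `alpha` from (4) and (5) is not reproduced here, so it is passed in by the caller.

```python
import math

def weibull_sum(a_star: float, beta: float, tol: float = 1e-12) -> float:
    """Evaluate sum_{i>=0} exp(-(i * a_star)**beta), stopping once a term drops below tol."""
    total, i = 0.0, 0
    while True:
        term = math.exp(-((i * a_star) ** beta))
        total += term
        if term < tol:
            return total
        i += 1

def offline_threshold(sp: float, mttf_th: float, beta: float,
                      lo: float = 1e-6, hi: float = 10.0) -> float:
    """Offline part: bisect for A*_th such that sp * weibull_sum(A*_th) = MTTF_th, as in (17).

    The sum decreases as A* grows, so the root is unique. [lo, hi] is an assumed bracket.
    """
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if sp * weibull_sum(mid, beta) > mttf_th:
            lo = mid   # MTTF still above the constraint, so the threshold is larger
        else:
            hi = mid
    return lo          # conservative side: MTTF at lo is not below MTTF_th

def online_check(temps, dt, k, a_star_th, alpha) -> bool:
    """Online part: compute A over one hyperperiod, scale by k (Lemma 1), compare."""
    a = sum(dt / alpha(t) for t in temps)
    return k * a <= a_star_th
```

With |sp| set to 30 days, MTTF<sub>th</sub> set to 10 years, and an assumed β = 2, the offline step returns a threshold of roughly 0.0073. Only `online_check`, whose cost is linear in the number of intervals in one hyperperiod, has to run at each PW.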

## VII. DESIGN OF RELIABILITY IMPROVEMENT FRAMEWORK

We provide the details of our framework DRIF to improve the soft-error reliability under the temperature, real-time, and lifetime reliability constraints.


#### Algorithm 1 SG for Homo-Grouped MPSoCs

| 1: hf (lf): the cores with high (low) core frequencies                                                      |
|-------------------------------------------------------------------------------------------------------------|
| 2: $l(lf, SW_i)$ : frequency level of $lf$ cores at sampling window $SW_i$                                  |
| 3: $TP_j$ : thermal profile in the $j^{th}$ profiling window                                                |
| 4: <b>procedure</b> GENERATORHOG( <i>Sc</i> ( <i>PW<sub>j</sub></i> ), <i>St</i> ( <i>PW<sub>j</sub></i> )) |
| 5: <b>if</b> MTTF $(TP_j) < MTTF_{th}$ <b>then</b>                                                          |
| 6: <b>for</b> each sampling window $SW_i$ <b>do</b>                                                         |
| 7: <b>if</b> $u(l(hf, SW_i) - 1) < u_{th}$ then                                                             |
| 8: $l(hf, SW_i) = l(hf, SW_i) - 1$                                                                          |
| 9: else if $u(l(lf, SW_i) - 1) < u_{th}$ then                                                               |
| 10: $l(lf, SW_i) = l(lf, SW_i) - 1$                                                                         |
| 11: end if                                                                                                  |
| 12: end for                                                                                                 |
| 13: <b>else</b>                                                                                             |
| 14: <b>for</b> each sampling window $SW_i$ <b>do</b>                                                        |
| 15: <b>if</b> $T(l(lf, SW_i) + 1) < T_{th}$ <b>then</b>                                                     |
| 16: $l(lf, SW_i) = l(lf, SW_i) + 1$                                                                         |
| 17: <b>end if</b>                                                                                           |
| 18: end for                                                                                                 |
| 19: <b>end if</b>                                                                                           |
| 20: for each sampling window $SW_i$ do                                                                      |
| 21: $Sc^{\star}(SW_i) \leftarrow migrate workload based on TABLE I$                                         |
| 22: end for                                                                                                 |
| 23: $Sc(PW_{j+1}) \leftarrow \{Sc^{\star}(SW_1), \dots, Sc^{\star}(SW_p)\}$                                 |
| 24: end procedure                                                                                           |
|                                                                                                             |

#### A. Schedule Generator

The goal of SG is to generate a schedule, i.e., each core's workload and frequency, for the next PW based on the system status in the current PW. Although it is possible to use an optimization solver to generate an optimal schedule for the problem defined in (6)–(11), such a solver would be too time consuming for online use. Instead, we design a computationally efficient heuristic that migrates tasks and dynamically scales core frequencies.

As pointed out earlier, we assume that the workload has already been mapped and the workload is balanced between cores. Considering the runtime variations of workload, SG determines the frequencies of all cores to maximize soft-error reliability and meet all constraints in (7)–(11) by considering the execution models of big–little MPSoCs given in (12) or (13), (14).

Before we present the algorithm in SG, we first introduce some concepts. System state, $St(PW_j)$, denotes the state in the PW PW<sub>j</sub>, which includes the utilization, frequency, and operating temperature of each core in the SWs of PW<sub>j</sub>. $St(SW_i)$, a subset of $St(PW_j)$, represents the state in the SW SW<sub>i</sub>. System schedule, $Sc(PW_j)$, specifies each core's workload and frequency in all SWs in PW<sub>j</sub>. Similarly, $Sc(SW_i)$ represents the schedule in the SW SW<sub>i</sub>.

SG is invoked at the end of each PW and takes $St(PW_j)$ and $Sc(PW_j)$ as inputs. SG generates a schedule for Homo-Grouped MPSoCs (in Algorithm 1) or for Hetero-Paired MPSoCs (in Algorithm 2). We provide the details to generate a schedule for Homo-Grouped MPSoCs first. The idea is that we check whether the lifetime reliability constraint is satisfied, and try to increase core frequencies if the lifetime reliability is larger than its constraint; otherwise, we reduce core frequencies (in lines 5–19). Since all HP (LP) cores run at the same core frequency, we use *hf* (*lf*) to represent cores

#### Algorithm 2 SG for Hetero-Paired MPSoCs

| 1:  | $\rho_k$ : the $k^{th}$ active core                                     |
|-----|-------------------------------------------------------------------------|
| 2:  | $l(\rho_k, SW_i)$ : $\rho_k$'s frequency level at sampling window $SW_i$ |
| 3:  | $TP_j$ : thermal profile in the $j^{th}$ profiling window               |
| 4:  | <b>procedure</b> GENERATORHEP( $Sc(PW_j), St(PW_j)$ )                   |
| 5:  | if $MTTF(TP_j) < MTTF_{th}$ then                                        |
| 6:  | for each sampling window $SW_i$ do                                      |
| 7:  | Sort cores by their core frequencies                                    |
| 8:  | for $\rho_k$ (starting from the core with the highest frequency) do     |
| 9:  | if $u(l(\rho_k, SW_i) - 1) < u_{th}$ then                               |
| 10: | $l(\rho_k, SW_i) = l(\rho_k, SW_i) - 1$                                 |
| 11: | break                                                                   |
| 12: | end if                                                                  |
| 13: | end for                                                                 |
| 14: | end for                                                                 |
| 15: | else                                                                    |
| 16: | for each sampling window $SW_i$ do                                      |
| 17: | Sort cores by their core frequencies                                    |
| 18: | for $\rho_k$ (starting from the core with the lowest frequency) do      |
| 19: | if $T(l(\rho_k, SW_i) + 1) < T_{th}$ then                               |
| 20: | $l(\rho_k, SW_i) = l(\rho_k, SW_i) + 1$                                 |
| 21: | break                                                                   |
| 22: | end if                                                                  |
| 23: | end for                                                                 |
| 24: | end for                                                                 |
| 25: | end if                                                                  |
| 26: | for each sampling window $SW_i$ do                                      |
| 27: | $Sc^{\star}(SW_i) \leftarrow migrate workload based on TABLE I$         |
| 28: | end for                                                                 |
| 29: | $Sc(PW_{j+1}) \leftarrow \{Sc^{\star}(SW_1), \dots, Sc^{\star}(SW_p)\}$ |
| 30: | end procedure                                                           |
|     |                                                                         |

running at high (low) core frequencies. For each SW, if the system status in the previous PW, $St(PW_j)$, violates the lifetime reliability constraint, SG reduces the core frequencies of cores running at high frequency if doing so does not violate the real-time constraint (in lines 7 and 8). Otherwise, SG reduces the core frequencies of cores running at low frequency if doing so does not violate the real-time constraint (in lines 9 and 10). Meanwhile, if $St(PW_j)$ meets the lifetime reliability constraint, SG increases frequencies for cores with low core frequency to improve soft-error reliability under the temperature constraint (in lines 14–18). After determining core frequencies, SG migrates tasks between cores to reduce the power consumption and temperature (in lines 20–22). We provide the details of task migration in Algorithm 3. After determining core frequencies and migrating tasks between cores, the schedule for the next PW, $Sc(PW_{j+1})$, is generated (in line 23). The computational complexity to determine the core frequencies for Homo-Grouped MPSoCs is O(p), where p is the number of SWs in a PW.
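The frequency-tuning step of Algorithm 1 (lines 5–19) can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' code: the callables `util_at` and `temp_at` stand in for the predictions $u(\cdot)$ and $T(\cdot)$, and the level bounds `0` and `max_level` are assumptions added so that the sketch never leaves the valid frequency range.

```python
from typing import Callable, Dict, List

def adjust_levels_homo_grouped(
    levels: Dict[str, List[int]],                 # "hf"/"lf" -> frequency level per SW
    ltr_ok: bool,                                 # outcome of the lifetime reliability check
    util_at: Callable[[str, int, int], float],    # (group, sw, level) -> predicted utilization
    temp_at: Callable[[str, int, int], float],    # (group, sw, level) -> predicted temperature
    u_th: float,
    t_th: float,
    max_level: int,
) -> Dict[str, List[int]]:
    """Return new per-SW levels; one pass over the SWs, i.e., O(p)."""
    new = {g: list(v) for g, v in levels.items()}
    for sw in range(len(new["hf"])):
        if not ltr_ok:
            # Lines 7-10: lower hf first; fall back to lf if hf cannot be lowered.
            for g in ("hf", "lf"):
                lv = new[g][sw]
                if lv > 0 and util_at(g, sw, lv - 1) < u_th:
                    new[g][sw] = lv - 1
                    break
        else:
            # Lines 15-16: raise lf if the predicted temperature stays below T_th.
            lv = new["lf"][sw]
            if lv < max_level and temp_at("lf", sw, lv + 1) < t_th:
                new["lf"][sw] = lv + 1
    return new
```

Each SW is changed by at most one level per PW, which matches the incremental, one-level-at-a-time behavior described for DRIF.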

SG generates a schedule for Hetero-Paired MPSoCs in Algorithm 2. If the system status in the previous PW, St(PW<sub>*j*</sub>), violates the lifetime reliability constraint, SG tries to reduce the core frequency of the core which executes at the highest frequency if doing so does not violate the real-time constraint (in lines 6–14). On the contrary, if St(PW<sub>*j*</sub>) satisfies the lifetime reliability constraint, SG increases the core frequency of cores with low core frequency under the temperature constraint (in lines 16–24). Similar to Homo-Grouped MPSoCs,

| Algorithm 3 Migrate Workload                                                              |
|-------------------------------------------------------------------------------------------|
| 1: $Ty(\rho_j)$ : the type of $\rho_j$, either HP or LP                                   |
| 2: $u(\rho_j, SW_i)$ : $\rho_j$'s utilization at $SW_i$                                   |
| 3: $u(\rho_j, W)$ : $\rho_j$'s utilization if executing workload $W$                      |
| 4: $e_k^{\rho_p}$ : the execution time of task $\tau_k$ on core $\rho_p$                  |
| 5: <b>procedure</b> MIGRATE( $Sc(SW_i)$ , $St(SW_i)$ , TABLE I)                           |
| 6: <b>if</b> Homo-Grouped MPSoCs <b>then</b>                                              |
| 7: <b>for</b> each core $\rho_i$ <b>do</b>                                                |
| 8: $\tau_k$ : the task on $\rho_i$ with the shortest execution time                       |
| 9: $\rho_p$ : the lowest-utilization core of a different type from $\rho_i$               |
| 10: Search TABLE I with $u(\rho_i, SW_i)$ and $f(\rho_i, SW_i)$                           |
| 11: $\mathcal{T} \leftarrow$ the type of the most power-efficient core                    |
| 12: <b>while</b> $Ty(\rho_i) \neq \mathcal{T}$ <b>do</b>                                  |
| 13: <b>if</b> $u(\rho_p) + \frac{e_k^{\rho_p}}{d_k} < u_{th}$ <b>then</b>                 |
| 14: Migrate $\tau_k$ to core $\rho_p$                                                     |
| 15: <b>end if</b>                                                                         |
| 16: $\mathcal{T} \leftarrow$ search TABLE I                                               |
| 17: <b>end while</b>                                                                      |
| 18: <b>end for</b>                                                                        |
| 19: <b>end if</b>                                                                         |
| 20: <b>if</b> Hetero-Paired MPSoCs <b>then</b>                                            |
| 21: <b>for</b> each active core $\rho_j$ <b>do</b>                                        |
| 22: $\mathcal{T} \leftarrow$ search TABLE I with $u(\rho_j)$ and $f(\rho_j)$              |
| 23: $W$ : the workload on $\rho_j$                                                        |
| 24: $\rho_p$ : $\rho_j$'s paired core                                                     |
| 25: <b>if</b> $Ty(\rho_j) \neq \mathcal{T}$ and $u(\rho_p, W) < u_{th}$ <b>then</b>       |
| 26: Migrate all workload to the paired core $\rho_p$                                      |
| 27: <b>end if</b>                                                                         |
| 28: <b>end for</b>                                                                        |
| 29: <b>end if</b>                                                                         |
| 30: <b>for</b> each core $\rho_j$ <b>do</b>                                               |
| 31: <b>if</b> $\rho_j$'s workload is empty <b>then</b>                                    |
| 32: Power off $\rho_j$                                                                    |
| 33: <b>end if</b>                                                                         |
| 34: <b>end for</b>                                                                        |
| 35: <b>end procedure</b>                                                                  |

SG migrates tasks (in lines 26–28) and finally generates a new schedule Sc(PW<sub>*j*+1</sub>) (in line 29). The computational complexity of Algorithm 2 is $O(p \times (n + m) \times \log(n + m))$, where *p* is the number of SWs in a PW, and *n* and *m* are the numbers of HP cores and LP cores, respectively.
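For comparison, the per-core tuning of Algorithm 2 can be sketched as below. Again this is only an illustration under assumptions: `freq_of`, `util_at`, and `temp_at` are hypothetical callables supplied by the caller, and the level bounds are added for safety. The sort in each SW is what gives the $\log(n + m)$ factor in the stated complexity.

```python
from typing import Callable, Dict, List

def adjust_levels_hetero_paired(
    schedule: List[Dict[str, int]],               # per SW: core name -> frequency level
    ltr_ok: bool,                                 # outcome of the lifetime reliability check
    freq_of: Callable[[str, int], float],         # (core, level) -> frequency
    util_at: Callable[[str, int, int], float],    # (core, sw, level) -> predicted utilization
    temp_at: Callable[[str, int, int], float],    # (core, sw, level) -> predicted temperature
    u_th: float,
    t_th: float,
    max_level: int,
) -> List[Dict[str, int]]:
    new = [dict(sw_levels) for sw_levels in schedule]
    for sw, lv in enumerate(new):
        # Highest frequency first when lowering, lowest frequency first when raising.
        order = sorted(lv, key=lambda c: freq_of(c, lv[c]), reverse=not ltr_ok)
        for core in order:
            if not ltr_ok:
                if lv[core] > 0 and util_at(core, sw, lv[core] - 1) < u_th:
                    lv[core] -= 1
                    break          # only one core changes per SW
            elif lv[core] < max_level and temp_at(core, sw, lv[core] + 1) < t_th:
                lv[core] += 1
                break
    return new
```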

The details of how to migrate tasks and select power-efficient cores to execute tasks are given in Algorithm 3. This task migration algorithm is called by Algorithms 1 and 2 at each SW, and its inputs are the migration guideline given in Table I, the system status, and the schedule at each SW. The key idea is that we search the migration guideline with each core's utilization and frequency, and migrate tasks based on the search results. For the Homo-Grouped MPSoCs, for a core $\rho_i$, if the migration guideline indicates we should tune $\rho_i$'s utilization to save power, we migrate the task with the shortest execution time to an LP or HP core (in lines 6–19). We iteratively migrate tasks between cores until the results of searching the migration guideline match the types of all cores. For the Hetero-Paired MPSoCs, the paired HP and LP cores work exclusively. Hence, if tasks are already optimally mapped to each pair initially, we only need to select the HP or LP core to use for each pair. If the search result from the task migration guideline does not match the type of the active core $\rho_j$, we migrate all tasks on $\rho_j$ to its paired core if doing so does not violate the real-time constraint (in lines 20–29). For both Homo-Grouped and Hetero-Paired MPSoCs, if a core's workload is empty, this core is powered off to save energy (in lines 30–34). For the Homo-Grouped MPSoCs, the computational complexity of Algorithm 3 is $O(\wp \times (m + n))$, where $\wp$, *m*, and *n* are the numbers of tasks, HP cores, and LP cores, respectively. For Hetero-Paired MPSoCs, the complexity is O(m + n).
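The Hetero-Paired branch and the power-off step of Algorithm 3 (lines 20–34) can be sketched as follows. Table I is not reproduced in this section, so the guideline lookup is modeled by a caller-supplied function `preferred_type`, which is purely hypothetical. The `Core` class and the task dictionary layout are likewise assumptions made for illustration, and execution times are not rescaled with frequency here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Task = Dict[str, float]   # {"HP": exec time on HP, "LP": exec time on LP, "period": period}

@dataclass
class Core:
    kind: str                                  # "HP" or "LP"
    freq: float                                # current frequency in GHz
    tasks: Dict[str, Task] = field(default_factory=dict)
    powered: bool = True

def utilization(tasks: Dict[str, Task], kind: str) -> float:
    """Utilization of a workload if it ran on a core of the given type."""
    return sum(t[kind] / t["period"] for t in tasks.values())

def migrate_hetero_paired(
    pairs: List[Tuple[Core, Core]],                  # (HP core, LP core) pairs
    preferred_type: Callable[[float, float], str],   # stand-in for the Table I lookup
    u_th: float,
) -> None:
    for hp, lp in pairs:
        active, idle = (hp, lp) if hp.tasks else (lp, hp)
        if active.tasks:
            want = preferred_type(utilization(active.tasks, active.kind), active.freq)
            # Lines 25-26: move the whole workload only if the paired core stays schedulable.
            if want != active.kind and utilization(active.tasks, idle.kind) < u_th:
                idle.tasks, active.tasks = active.tasks, {}
        # Lines 30-34: power off any core left without workload.
        for core in (hp, lp):
            core.powered = bool(core.tasks)
```

Each pair is visited once, which is consistent with the O(m + n) complexity stated for Hetero-Paired MPSoCs.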


## B. Schedule Executor

The SE determines the active cores' frequencies at the beginning of each SW. A straightforward approach is to simply follow the schedule generated by SG. However, since the schedule $Sc(PW_{j+1})$ is generated based on the system status $St(PW_j)$, but the utilization in the PW PW<sub>j</sub> can be different from that in $PW_{j+1}$, $Sc(PW_{j+1})$ may actually violate some or all of the constraints during run time. For soft real-time systems, it is acceptable to temporarily violate the real-time constraint, as the violation can be compensated in the next PW. However, violating the temperature constraint may cause either timing faults or unexpected throttling. Therefore, SE should be designed to avoid the occurrence of such a case.

SE adjusts the core frequency for each core. At the beginning of each SW, SE receives the initial temperatures from SC, which are the temperatures of the previous SW, and gets the cores' frequencies from Sc(PW<sub>*j*+1</sub>). We can statically design a table that covers all possible initial temperatures and core frequencies. This table indicates the worst-case temperature in an SW by assuming the core utilization is 100%. SE checks whether the worst-case temperature can remain below the thermal threshold. If not, SE reduces the core frequency to one level lower than that specified in the schedule Sc(PW<sub>*j*+1</sub>). Since we establish such a table statically, the computational complexity of SE is O(1).
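A table-driven SE of this kind might look like the following sketch. The thermal model used to fill the table is not given in this section, so `worst_case_temp` is a hypothetical caller-supplied function, and the 5-degree temperature grid with round-up lookup is an assumption chosen to keep the lookup conservative.

```python
import math
from typing import Callable, Dict, Iterable, Tuple

def build_table(
    init_temps: Iterable[int],                       # grid of initial temperatures
    n_levels: int,                                   # number of frequency levels
    worst_case_temp: Callable[[int, int], float],    # (init temp, level) -> peak temp at 100% utilization
) -> Dict[Tuple[int, int], float]:
    """Offline: tabulate the worst-case temperature for every (temperature, level) pair."""
    return {(t, lv): worst_case_temp(t, lv) for t in init_temps for lv in range(n_levels)}

def se_level(
    table: Dict[Tuple[int, int], float],
    init_temp: float,
    scheduled_level: int,
    t_th: float,
    step: int = 5,
) -> int:
    """Online: one dictionary lookup, then keep the scheduled level or drop one level."""
    key_temp = step * math.ceil(init_temp / step)    # round up to the next grid point
    if scheduled_level > 0 and table[(key_temp, scheduled_level)] >= t_th:
        return scheduled_level - 1
    return scheduled_level
```

The online part performs a single lookup per core, in line with the O(1) complexity stated above. The grid must cover every initial temperature the sensor can report.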

## VIII. EXPERIMENTAL SETUP

To evaluate the proposed DRIF, we conducted experiments to compare it with two representative approaches. In this section, we present the platforms, workloads, and the frameworks used for comparison in our experiments.

## A. Comparison Targets

We compared the performance of DRIF to two representative frameworks. The multiobjective optimization of system reliability (MOO) framework finds Pareto-optimal tradeoffs between soft-error reliability and lifetime reliability by using a genetic algorithm [24]. Since the genetic algorithm-based solver is too costly to be used at runtime, core frequencies are determined offline and cannot be changed online. In order to evaluate the benefits of migrating tasks between cores, we also compare DRIF with a framework called simplified DRIF (S-DRIF), which scales core frequencies as in DRIF but does not migrate tasks between cores.

Three metrics are considered in the comparison. The probability of failures (PoF) due to soft errors quantifies the soft-error reliability. The PoF is defined as 1 − R, where R

TABLE II TASKS' EXECUTION TIMES ON TK1

| Task     | Execution time on HP ARM core | Execution time on LP ARM core |
|----------|-------------------------------|-------------------------------|
| qsort    | 145 ms                        | 145 ms                        |
| blowfish | 150 ms                        | 152 ms                        |
| crc32    | 195 ms                        | 196 ms                        |

is the system-level soft-error reliability. Achieving a lower PoF is equivalent to achieving a higher soft-error reliability. We used the percentage of feasible solutions for the real-time constraint (FS-RT) to describe the capability of satisfying the real-time constraint. In the experiments, the jobs of each task are periodically released. We checked whether each job met its deadline, and the percentage of FS-RT is quantified as the ratio of the number of jobs meeting their deadlines to the total number of jobs. Similarly, the percentage of feasible solutions for the lifetime reliability constraint (FS-LTR) describes the capability of satisfying the lifetime reliability constraint. In the experiments, we utilized LTR-Checker to check whether the lifetime reliability constraint is satisfied at each PW. The percentage of FS-LTR is quantified as the ratio of the number of PWs achieving a higher lifetime reliability than the lifetime reliability constraint to the total number of PWs.
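The three metrics reduce to simple ratios, as the short sketch below illustrates. The function names and the input layout (finish time and deadline per job, one Boolean per PW) are assumptions for illustration, and a job is taken to meet its deadline when it finishes no later than the deadline.

```python
from typing import List, Tuple

def pof(reliability: float) -> float:
    """Probability of failures due to soft errors: PoF = 1 - R."""
    return 1.0 - reliability

def fs_rt(jobs: List[Tuple[float, float]]) -> float:
    """Percentage of jobs, given as (finish time, deadline), that meet their deadlines."""
    met = sum(1 for finish, deadline in jobs if finish <= deadline)
    return 100.0 * met / len(jobs)

def fs_ltr(pw_ok: List[bool]) -> float:
    """Percentage of PWs in which the lifetime reliability check passed."""
    return 100.0 * sum(pw_ok) / len(pw_ok)
```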

#### B. Experimental Platforms

The experiments are conducted on two boards containing Nvidia's TK1 [13] and TX2 [15] chips, respectively. The TK1 chip provides four HP cores and one LP core, but the HP cores and the LP core cannot work simultaneously. Hence, the TK1 chip is a Hetero-Paired type MPSoC, and it only provides one HP–LP core pair. In our experiments, the workload for TK1 is designed to be light enough to fit on one HP or LP core. The TX2 chip includes two HP cores (with Nvidia's Denver microarchitecture [42]) and four LP cores (with ARM Cortex A57 microarchitecture). Hence, the TX2 chip is a Homo-Grouped type MPSoC. Note that we only consider single-thread tasks, so the Denver core has better performance than the ARM core [42].

We obtained the chips' operating temperatures by reading their integrated thermal sensors. Note that although TK1 and TX2 only report one CPU temperature, it is enough to show that DRIF can achieve a lower temperature and guarantee the temperature constraint. For both HP and LP cores in TK1, we use the core frequencies 1.092 GHz, 0.96 GHz, 0.828 GHz, 0.696 GHz, and 0.564 GHz. For TX2, we select the core frequencies 1.881 GHz, 1.574 GHz, 1.267 GHz, 0.960 GHz, 0.652 GHz, and 0.345 GHz.

## C. Workloads

We now discuss the task sets for experiments on TK1 and TX2. Considering the low performance of cores in TK1, we chose three tasks from the MiBench benchmark suite [30] and measured their execution times when the core's frequency is 1.092 GHz (see Table II). TK1 only provides one HP–LP core pair, so tasks execute either on the HP core or the LP core. For experiments on TX2, we used two ARM cores and one Denver core to execute eight tasks from MiBench [30]. We

TABLE III TASKS' EXECUTION TIMES ON TX2

| Task         | Execution time on Denver core | Execution time on ARM core |
|--------------|-------------------------------|----------------------------|
| cjpeg        | 24 ms                         | 33 ms                      |
| qsort        | 49 ms                         | 69 ms                      |
| dijkstra     | 47 ms                         | 64 ms                      |
| blowfish     | 26 ms                         | 52 ms                      |
| susan        | 52 ms                         | 78 ms                      |
| stringsearch | 2 ms                          | 3 ms                       |
| crc32        | 30 ms                         | 75 ms                      |
| patricia     | 12 ms                         | 16 ms                      |

TABLE IV TASK ALLOCATION FOR TX2

| Tasks        | Mapping to    |
|--------------|---------------|
| cjpeg        | ARM Core 0    |
| qsort        | ARM Core 0    |
| dijkstra     | ARM Core 1    |
| blowfish     | ARM Core 1    |
| susan        | Denver Core 0 |
| stringsearch | Denver Core 0 |
| crc32        | Denver Core 0 |
| patricia     | Denver Core 0 |

first measured the execution times of the tasks on an ARM core and a Denver core at the highest core frequency (see Table III). Based on the measurements, we mapped these tasks to ARM and Denver cores and balanced the workloads of cores (see Table IV). Note that although TX2 provides four ARM cores and two Denver cores, we only used one Denver core and two ARM cores because the workload is light. If the selected tasks were allocated to three ARM cores and/or two Denver cores, the workload of each core would be so light that a core could always execute at the highest frequency. Meanwhile, we target independent tasks, and the soft-error reliability achieved by DRIF is related to a core's utilization but independent of the number of cores. Hence, executing tasks on two ARM cores and one Denver core is sufficient to validate the capability of DRIF in improving soft-error reliability.

We designed two task groups. In the first group, tasks are frame-based and share the same period and deadline. For experiments on TX2, tasks' periods and deadlines are 150, 200, 250, and 300 ms, and for experiments on TK1, they are 700, 800, 900, and 1000 ms. In the second group, a task's deadline and period are set to be the same but are chosen randomly from the ranges 150–200 ms, 200–250 ms, and 250–300 ms for TX2, and from the ranges 700–800 ms, 800–900 ms, and 900–1000 ms for TK1. We used the deadline-monotonic scheduling policy to schedule tasks, where a task with a shorter deadline is assigned a higher priority and executed earlier [39]. Such setups ensure that tasks are schedulable, and represent multiple workloads ranging from heavy to light.

## IX. EXPERIMENTAL RESULTS


In this section, we examine the performance of the proposed DRIF compared to S-DRIF and MOO.

#### A. Experiments on TK1 Chip

We first validated our approach on a TK1 chip with the Hetero-Paired execution model. We compared the proposed DRIF



Fig. 3. PoFs due to soft errors and percentage of feasible solutions for a frame-based task set running on TK1.



Fig. 4. PoFs due to soft errors and percentage of feasible solutions for a general periodic task set running on TK1.

with MOO and S-DRIF to determine whether DRIF can
improve soft-error reliability without violating temperature,
real-time, and lifetime reliability constraints.

Fig. 3 shows the experimental results when tasks are frame-based. DRIF and S-DRIF have similar performance when the workload is heavy, but DRIF achieves a lower PoF than MOO and S-DRIF in all cases. The PoF of DRIF is 97.89%, 95.64%, 37.9%, and 18.89% of that of S-DRIF when the period is 700, 800, 900, and 1000 ms, respectively. This reduced PoF means the system can work without soft errors at least 2 min longer than with S-DRIF, and up to 10 h longer. Meanwhile, since our task migration considers the real-time and lifetime reliability constraints, the percentages of FS-RT and FS-LTR of DRIF, S-DRIF, and MOO are close, especially when the workload is light. For the soft-error reliability, the PoF of DRIF is only 29.18%, 51.21%, 16.29%, and 15.04% of that of MOO. This means that the system can work successfully without soft errors 1.1, 0.4, 12.7, and 100.8 h longer than with MOO, respectively.

We extended the experiment to validate DRIF for a general periodic task set, where each task's period and deadline are equal (see Fig. 4). The average PoF of DRIF is 81% of that of S-DRIF and 51% of that of MOO, which translates to DRIF allowing the system to successfully



Fig. 5. PoFs due to soft errors and percentage of feasible solutions for a frame-based task set running on TX2.

work for 17 min more than S-DRIF on average, and 63 min more than MOO on average. Compared with the results in Fig. 3, DRIF provides fewer benefits when tasks have different periods. The reason is that the workload in each SW varies dramatically, and DRIF guarantees the lifetime reliability constraint with low core frequencies, which limits its ability to improve soft-error reliability. However, DRIF still outperforms S-DRIF and MOO and achieves a lower PoF.

We measured the time and power consumption of DRIF on an ARM core. DRIF takes less than 1 ms to complete, and we cannot observe power changes when running DRIF because the resolution of our power measurement tool is about 0.1 W. Based on these measurements, we claim that the time and power overhead of DRIF on TK1 can be ignored.

We also compared DRIF with a brute force search-based approach which finds the optimal solution at each SW. Although this approach can guarantee the highest soft-error reliability at each SW, it is computation intensive and cannot be used at runtime. The execution time of this approach is about 30 s when running on the TK1's HP core. In comparison, the computation time of DRIF is less than 1 ms even if one PW has 100 SWs. Since both approaches determine core frequencies for each PW, which is typically in minutes, the brute force search may not be a good choice to use at runtime.

Although the brute force search-based approach can find optimal solutions at each SW, it is too computationally expensive to apply at runtime. In contrast, DRIF determines the core frequencies at each PW by tuning the operating core frequencies one level at a time. Our experiments show that DRIF can find the best solution starting from the fifth PW. Since the length of a PW is in minutes, not finding the best solution in the first five PWs (less than 10 min) has negligible effect on lifetime reliability and soft-error reliability.
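The contrast between exhaustive search and one-level-at-a-time tuning can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `feasible` predicate (a toy stand-in for the temperature, real-time, and lifetime reliability checks), and the use of the summed frequency level as a proxy for soft-error reliability are all assumptions made for illustration.

```python
from itertools import product
from typing import Callable, Optional, Tuple

Levels = Tuple[int, ...]  # one frequency-level index per core (hypothetical encoding)


def tune_one_level(current: Levels, num_levels: int,
                   feasible: Callable[[Levels], bool]) -> Levels:
    """Move each core by at most one frequency level for the next PW."""
    if not feasible(current):
        # Constraint violated: step every core down one level (floor at 0).
        return tuple(max(lv - 1, 0) for lv in current)
    nxt = list(current)
    for core in range(len(nxt)):
        if nxt[core] + 1 < num_levels:
            trial = nxt.copy()
            trial[core] += 1
            # Keep the raise only if the assumed constraint check still passes.
            if feasible(tuple(trial)):
                nxt = trial
    return tuple(nxt)


def brute_force(num_cores: int, num_levels: int,
                feasible: Callable[[Levels], bool]) -> Optional[Levels]:
    """Enumerate all num_levels**num_cores assignments; keep the best feasible one."""
    best: Optional[Levels] = None
    for cand in product(range(num_levels), repeat=num_cores):
        if feasible(cand) and (best is None or sum(cand) > sum(best)):
            best = cand
    return best


def toy_feasible(levels: Levels) -> bool:
    """Toy constraint (assumption): total level budget of 8."""
    return sum(levels) <= 8


state: Levels = (0, 0)
for pw in range(5):  # five profiling windows
    state = tune_one_level(state, num_levels=6, feasible=toy_feasible)

print(state, brute_force(2, 6, toy_feasible))
```

In this toy setting the incremental tuner performs a handful of constraint checks per PW, while the exhaustive search evaluates every combination of levels, which grows exponentially with the number of cores.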

## B. Experiments on TX2 Chip

We conducted experiments on the TX2 chip to evaluate the performance of DRIF on a platform with the Homo-Grouped execution model. On this platform, DRIF scales core frequencies and migrates tasks to increase soft-error



Fig. 6. PoFs due to soft errors and percentage of feasible solutions for a general periodic task set running on TX2.

reliability under temperature, real-time, and lifetime reliability constraints.

Similar to the experiments on TK1, we validated DRIF for: 1) a frame-based task set (see Fig. 5) and 2) a general periodic task set (see Fig. 6). For the frame-based task set, the PoF of DRIF is 47.25%, 81.95%, 0.1%, and 0.003% of S-DRIF when the period is 150, 200, 250, and 300 ms, respectively. This low PoF guarantees the system can successfully work 158 h more than S-DRIF on average and up to 24 days. Thanks to dynamic task migration, DRIF dynamically selects the most appropriate cores to execute tasks. DRIF can also dynamically power off any idle cores to reduce power consumption and allow active cores to run at a high core frequency. Hence, the benefits of DRIF are clearer than in the experiments on TK1 in Fig. 3. Compared with MOO, DRIF achieves a lower PoF in all cases, and leads to a system that can successfully work about 6.6 days more than MOO on average and up to 26 days. In terms of satisfying real-time and lifetime reliability constraints, both DRIF and S-DRIF achieve a similar percentage of FS-RT and FS-LTR to MOO, especially when the workload is light.
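To make the relative PoF figures easier to read, the short Python sketch below converts each reported ratio (PoF of DRIF as a percentage of S-DRIF) into the corresponding PoF reduction. The variable and function names are hypothetical, and the unweighted mean computed at the end is only an illustrative summary of the four reported values, not a figure reported in the experiments.

```python
# Relative PoF of DRIF (as a percentage of S-DRIF) for the frame-based task
# set on TX2, keyed by task period in ms. Values are taken from the text.
RELATIVE_POF_PERCENT = {150: 47.25, 200: 81.95, 250: 0.1, 300: 0.003}


def pof_reduction_percent(relative_pof_percent: float) -> float:
    """Reduction in PoF relative to the baseline, in percent."""
    return 100.0 - relative_pof_percent


reductions = {period: pof_reduction_percent(rel)
              for period, rel in RELATIVE_POF_PERCENT.items()}

# Unweighted mean of the four ratios (illustrative only, an assumption).
mean_relative = sum(RELATIVE_POF_PERCENT.values()) / len(RELATIVE_POF_PERCENT)

for period, reduction in reductions.items():
    print(f"period {period} ms: PoF reduced by {reduction:.3f}% vs. S-DRIF")
print(f"mean relative PoF: {mean_relative:.3f}%")
```

The reductions are small for short periods (heavy workload) and approach 100% for long periods, which matches the observation that DRIF helps most when the workload is light.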

Fig. 6 shows the performance of DRIF when the workload is a general periodic task set. The PoF of DRIF is about 98%, 86%, and 0.001% of S-DRIF when the periods of tasks are in the ranges 150–200 ms, 200–250 ms, and 250–300 ms, respectively. This means that DRIF guarantees the system successfully works without soft errors 7.6 days more than S-DRIF on average, and up to 22.8 days. Meanwhile, the soft-error reliability improvement of DRIF over MOO is similar to that over S-DRIF. Compared with MOO, DRIF increases the system's successful execution time by about 7.6 days on average, and up to 22.8 days. Finally, the execution time of DRIF is less than 1 ms on either the ARM core or the Denver core. The power consumption of DRIF on TX2, as on TK1, is also too small to be observed. In summary, the above experiments confirm that DRIF performs better in improving soft-error reliability in all cases, especially when the workload is light.

#### X. CONCLUSION

Focusing on two execution models of big-little type MPSoCs, we proposed DRIF to maximize soft-error reliability under temperature, real-time, and lifetime reliability constraints. We designed a computationally efficient tool to check whether the lifetime reliability caused by a thermal profile is larger than a prespecified constraint. In order to reduce power consumption, we empirically studied the power features of the HP and LP cores and established a task migration guideline to indicate the most appropriate and power-efficient core to execute tasks. Based on these contributions, our framework dynamically migrates tasks between cores and adjusts the core frequencies to satisfy all constraints. The results on chips supporting different execution models show that our approach is effective in increasing soft-error reliability under constraints compared to other representative approaches. As future work, we plan to extend our approach to more general task models and consider MPSoCs with GPUs.




Yue Ma (S'16) received the B.S. degree from the Chengdu University of Technology, Chengdu, China, and the M.S. degree from the University of Electronic Science and Technology of China, Chengdu. He is currently pursuing the Ph.D. degree with the Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA.

His current research interests include real-time embedded systems, reliable system design, power efficiency, and temperature-aware resource management.



Junlong Zhou (S'15–M'17) received the Ph.D. degree in computer science from East China Normal University, Shanghai, China, in 2017.

He was a Visiting Scholar with the University of Notre Dame, Notre Dame, IN, USA, from 2014 to 2015. He is currently an Assistant Professor with the School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China. His current research interests include real-time embedded systems, cloud computing and IoT, and cyber physical systems.

Dr. Zhou has been an Associate Editor for the Journal of Circuits, Systems, and Computers since 2017.



Thidapat Chantem (S'05–M'11–SM'18) received the bachelor's degree from Iowa State University, Ames, IA, USA, in 2005 and the master's and Ph.D. degrees from the University of Notre Dame, Notre Dame, IN, USA, in 2011.

She is an Assistant Professor of electrical and computer engineering with Virginia Tech, Blacksburg, VA, USA. Her current research interests include real-time embedded systems, energy-aware and thermal-aware system-level design, cyber-physical system design, and intelligent transportation systems.



**Robert P. Dick** (S'95–M'02) received the B.S. degree from Clarkson University, Potsdam, NY, USA, in 1996 and the Ph.D. degree from Princeton University, Princeton, NJ, USA, in 2002.

He is an Associate Professor of electrical engineering and computer science with the University of Michigan, Ann Arbor, MI, USA. He was a Visiting Professor with the Department of Electronic Engineering, Tsinghua University, Beijing, China, in 2002, was a Visiting Researcher with NEC Labs America, Princeton, NJ, USA, in 1999, and was on the faculty of Northwestern University, Evanston, IL, USA, from 2003 to 2008. He is also CEO of Stryd, Inc., Boulder, CO, USA, which produces wearable electronics for athletes.

Dr. Dick was a recipient of the NSF CAREER Award and the Department's Best Teacher of the Year Award in 2004. In 2007, his technology won a Computerworld Horizon Award, and in 2010 his research won the Best Paper Award at DATE. His paper was selected as one of the 30 in a special collection of DATE papers appearing during the past 10 years. He served as the Technical Program Committee Co-Chair of the 2011 International Conference and System Synthesis, as an Associate Editor of the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, and as a Guest Editor for *ACM Transactions on Embedded Computing Systems*.



Shige Wang (S'02–M'05–SM'11) received the Ph.D. degree in computer science and engineering from the University of Michigan, Ann Arbor, MI, USA, in 2004.

He is a Staff Research Scientist with General Motors Research and Development, Warren, MI, USA. His current research interests include system modeling and analysis, software architecture for parallel processing in automated driving systems, and embedded real-time control systems.



Xiaobo Sharon Hu (S'85–M'89–SM'02–F'16) received the B.S. degree from Tianjin University, Tianjin, China, the M.S. degree from the Polytechnic Institute of New York, New York, NY, USA, and the Ph.D. degree from Purdue University, West Lafayette, IN, USA.

She is a Professor with the Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA. Her current research interests include real-time embedded systems, LP system design, and computing with emerging technologies. She has published over 250 papers in the above areas.

Dr. Hu was a recipient of the NSF CAREER Award in 1997, and the Best Paper Award from the Design Automation Conference in 2001 and the IEEE Symposium on Nanoscale Architectures in 2009. She served as an Associate Editor for the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, *ACM Transactions on Design Automation of Electronic Systems*, and *ACM Transactions on Embedded Computing*. She is the Program Chair of the 2016 Design Automation Conference (DAC) and the TPC Co-Chair of DAC in 2014 and 2015.