
Path Structured Multimarginal Schrödinger Bridge for Probabilistic
Learning of Hardware Resource Usage by Control Software

Georgiy A. Bondar, Robert Gifford, Linh Thi Xuan Phan, Abhishek Halder
Georgiy A. Bondar is with the Department of Applied Mathematics, University of California, Santa Cruz, CA 95064, USA, [email protected].
Robert Gifford and Linh Thi Xuan Phan are with the Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA, {rgif,linhphan}@seas.upenn.edu.
Abhishek Halder is with the Department of Aerospace Engineering, Iowa State University, Ames, IA 50011, USA, [email protected].
This work was supported by NSF grants 2112755, 2111688 and 1750158.
Abstract

The solution of the path structured multimarginal Schrödinger bridge problem (MSBP) is the most likely measure-valued trajectory consistent with a sequence of observed probability measures or distributional snapshots. We leverage recent algorithmic advances in solving such structured MSBPs for learning stochastic hardware resource usage by control software. The solution enables predicting the time-varying distribution of hardware resource availability at a desired time with guaranteed linear convergence. We demonstrate the efficacy of our probabilistic learning approach in a model predictive control software execution case study. The method exhibits rapid convergence to an accurate prediction of the hardware resource utilization of the controller. The method can be broadly applied to any software to predict cyber-physical context-dependent performance at an arbitrary time.

I Introduction

Control software in safety-critical cyber-physical systems (CPS) is often designed and verified based on platform models that do not fully capture the complexity of its deployment settings. For example, it is common to assume that the processor always operates at full speed, is dedicated to the control software, and that overheads are negligible. In practice, hardware resources – such as the last-level shared cache (LLC), memory bandwidth, and processor cycles – often vary with time and depend on the current hardware state, which is one reason why we observe different execution times across different runs of the same control software [1]. This gap can lead to overly costly or unsafe designs.

Measurement-based approaches and overhead-aware analysis can reduce the analysis pessimism or ensure safety [2]. The recent work [3] uses fine-grained profiles of the software execution on an actual platform to make dynamic scheduling and resource allocation decisions. Supervisory algorithms also exist that dynamically switch among a bank of controllers – all provably safe, but some computationally more benign (and less performant) than others – depending on the resource availability [4]. However, the effectiveness of these techniques is contingent on the quality of the prediction of hardware resource availability at a future time instance or over a time horizon of interest.

In this work, we propose to predict the resource usage of control software based on just a small set of measurements. This approach is attractive as it reduces the measurement effort while better handling potential variability.

A first-principles predictive model for hardware resource availability based on the semiconductor physics of the specific platform is, however, unavailable. Furthermore, resources such as cache and bandwidth are not only time-varying and stochastic, but also statistically correlated. This makes it challenging to predict the joint stochastic variability of hardware resource availability in general. The challenge is even more pronounced for control software because the computational burden then also depends on additional context, e.g., the reference trajectory that the controller is tracking.

We note that for safety-critical CPS, predicting the joint stochastic hardware resource state, as opposed to predicting a lumped variable such as worst-case execution time, can open the door for designing a new class of dynamic scheduling algorithms with better performance than what is feasible today while minimizing hardware cost.

This work proposes learning a joint stochastic process for hardware resource availability from control software execution profiles conditioned on CPS contexts (to be made precise in Sec. III-A, III-B). Our proposed method leverages recent advances in stochastic control – specifically in the multimarginal Schrödinger bridge problem (MSBP) – to allow prediction of the time-varying joint statistical distributions of hardware resource availability at any desired time.

Contributions

Our specific contributions are as follows.

  • We show how recent algorithmic developments in solving the MSBP enable probabilistic learning of hardware resource usage. This advances the state-of-the-art at the intersection of control, learning, and real-time systems.

  • The proposed method is statistically nonparametric, and is suitable for high-dimensional joint prediction since it avoids gridding the hardware feature/state space.

  • The proposed formulation provably predicts the most likely distribution given a sequence of distributional snapshots for the hardware resource state.

  • We explain that the resulting algorithm is an instance of the multimarginal Sinkhorn iteration with a path structured cost that is guaranteed to converge to a unique solution, and enjoys a linear rate of convergence. Its computational complexity scales linearly w.r.t. the state dimension, linearly w.r.t. the number of distributional snapshots, and quadratically w.r.t. the number of scattered samples.

Organization

We introduce the notations and preliminaries in Sec. II. The problem formulation is detailed in Sec. III. Sec. IV summarizes the overall methodology, which is then followed by a numerical case study in Sec. V. Concluding remarks are provided in Sec. VI.

II Notations and Preliminaries

We use unboldfaced capital letters to denote matrices and bold capital letters to denote tensors (of order three or more). Unboldfaced (resp. boldfaced) small letters are used to denote scalars (resp. vectors). Capital calligraphic letters are reserved to denote sets.

Square braces are used to denote components. For instance, $\left[\bm{X}_{i_{1},\ldots,i_{r}}\right]$ denotes the $(i_{1},\ldots,i_{r})$th component of the order $r$ tensor $\bm{X}$, where $(i_{1},\ldots,i_{r})\in\mathbb{N}^{r}$. We use the $r$-fold tensor product space notation $\left(\mathbb{R}^{d}\right)^{\otimes r}:=\underbrace{\mathbb{R}^{d}\otimes\ldots\otimes\mathbb{R}^{d}}_{r\;\text{times}}$.

For two tensors $\bm{X},\bm{Y}$ of order $r$, we define their Hilbert-Schmidt inner product as
\[
\langle\bm{X},\bm{Y}\rangle:=\sum_{i_{1},\ldots,i_{r}}\left[\bm{X}_{i_{1},\ldots,i_{r}}\right]\left[\bm{Y}_{i_{1},\ldots,i_{r}}\right]. \tag{1}
\]

The operators $\exp(\cdot)$ and $\log(\cdot)$ are understood elementwise. We use $\odot$ and $\oslash$ to denote elementwise (Hadamard) multiplication and division, respectively.

For measures $\mu,\nu$ defined on two Polish spaces, their product measure is denoted by $\mu\otimes\nu$. The relative entropy, a.k.a. Kullback-Leibler divergence, $D_{\rm{KL}}(\cdot\|\cdot)$ between probability measures $\mu$ and $\nu$ is
\[
D_{\rm{KL}}(\mu\|\nu):=\begin{cases}\int\log\frac{{\rm{d}}\mu}{{\rm{d}}\nu}\,{\rm{d}}\mu&\text{if}\quad\mu\ll\nu,\\ +\infty&\text{otherwise,}\end{cases} \tag{2}
\]
where $\frac{{\rm{d}}\mu}{{\rm{d}}\nu}$ denotes the Radon-Nikodym derivative, and $\mu\ll\nu$ is shorthand for “$\mu$ is absolutely continuous w.r.t. $\nu$”.

The Hilbert (projective) metric (see e.g., [5]) $d_{\rm{H}}\left(\bm{u},\bm{v}\right)$ between two vectors $\bm{u},\bm{v}\in\mathbb{R}^{n}_{>0}$ is
\[
d_{\rm{H}}\left(\bm{u},\bm{v}\right)=\log\left(\dfrac{\max_{i=1,\ldots,n}u_{i}/v_{i}}{\min_{i=1,\ldots,n}u_{i}/v_{i}}\right). \tag{3}
\]
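For implementation purposes, (3) amounts to a one-liner. Below is a minimal NumPy sketch (the function name is illustrative, not from the authors' codebase); this metric is used in Sec. V to monitor Sinkhorn convergence.

```python
import numpy as np

def hilbert_metric(u, v):
    """Hilbert projective metric d_H(u, v) in (3) for entrywise-positive vectors."""
    r = u / v                              # elementwise ratios u_i / v_i
    return np.log(r.max() / r.min())

# d_H is a metric on rays: it is invariant to positive scaling of either argument.
u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 2.0, 2.0])
assert np.isclose(hilbert_metric(u, v), hilbert_metric(5.0 * u, v))
```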

We use the term “control cycle” to mean one pass of a feedback control loop. Due to hardware stochasticity, each control cycle takes a variable amount of time to complete.

III Problem Formulation

III-A Context $\bm{c}$

We consider a context vector $\bm{c}$ comprised of separable cyber and physical context vectors
\[
\bm{c}:=\begin{pmatrix}\bm{c}_{\rm{cyber}}\\ \bm{c}_{\rm{phys}}\end{pmatrix}. \tag{4}
\]

In this work, we consider an instance of (4) where
\[
\bm{c}_{\rm{cyber}}=\begin{pmatrix}\text{allocated last-level cache}\\ \text{allocated memory bandwidth}\end{pmatrix}, \tag{5}
\]
where both features are allocated in blocks of some fixed size, and
\[
\bm{c}_{\rm{phys}}=y_{\rm{des}}(x)\in{\rm{GP}}\left([x_{\min},x_{\max}]\right), \tag{6}
\]
where ${\rm{GP}}$ denotes a Gaussian process over the domain $[x_{\min},x_{\max}]$. We work with a collection of contexts of cardinality $n_{\rm{context}}$, i.e., a sample of contexts $\{\bm{c}^{i}\}_{i=1}^{n_{\rm{context}}}$.

III-B Hardware Resource State $\bm{\xi}$

For concreteness, we define the hardware resource state or feature vector used in our numerical case study (Sec. V):
\[
\bm{\xi}:=\begin{pmatrix}\xi_{1}\\ \xi_{2}\\ \xi_{3}\end{pmatrix}=\begin{pmatrix}\text{instructions retired}\\ \text{LLC requests}\\ \text{LLC misses}\end{pmatrix}. \tag{7}
\]
The three elements of $\bm{\xi}$ denote the number of retired CPU instructions, the number of LLC requests, and the number of LLC misses in the last time unit (10 ms in our profiling), respectively.

We emphasize that our proposed method is not limited by the specific components comprising $\bm{\xi}$. To highlight this flexibility, we describe the proposed approach for $\bm{\xi}\in\mathbb{R}^{d}$ with suitable interpretations for the specific application.

For a time interval $[0,t]$ of interest, we think of the time-varying $\bm{\xi}$ as a continuous-time vector-valued stochastic process over subsets of $\mathbb{R}^{d}$. Suppose that $s\in\mathbb{N}$, $s\geq 2$, snapshots or observations are made of the stochastic state $\bm{\xi}(\tau)$, $0\leq\tau\leq t$, at (possibly non-equispaced) instances
\[
\tau_{1}\equiv 0<\tau_{2}<\ldots<\tau_{s-1}<\tau_{s}\equiv t.
\]
Consider the snapshot index set $\llbracket s\rrbracket:=\{1,\ldots,s\}$. For a fixed context $\bm{c}$, the snapshot observations comprise a sequence of joint probability measures $\{\mu_{\sigma}\}_{\sigma\in\llbracket s\rrbracket}$ satisfying $\int{\rm{d}}\mu_{\sigma}(\bm{\xi}(\tau_{\sigma}))=1$. In other words,
\[
\bm{\xi}(\tau_{\sigma})\sim\mu_{\sigma}\quad\forall\sigma\in\llbracket s\rrbracket. \tag{8}
\]

In our application, the data $\{\mu_{\sigma}\}_{\sigma\in\llbracket s\rrbracket}$ come from control software execution profiles, i.e., from executing the same control software for the same $\bm{c}$ with all parameters and initial conditions fixed. So the stochasticity in $\bm{\xi}(\tau_{\sigma})$ stems from the dynamic variability in hardware resource availability.

In particular, for finitely many (say $n$) execution profiles, we consider the empirical distributions
\[
\mu_{\sigma}:=\dfrac{1}{n}\sum_{i=1}^{n}\delta(\bm{\xi}-\bm{\xi}^{i}(\tau_{\sigma})), \tag{9}
\]
where $\delta(\bm{\xi}-\bm{\xi}^{i}(\tau_{\sigma}))$ denotes the Dirac delta at the sample location $\bm{\xi}^{i}(\tau_{\sigma})$, $i\in\llbracket n\rrbracket$, $\sigma\in\llbracket s\rrbracket$. At any snapshot index $\sigma\in\llbracket s\rrbracket$, the set $\{\bm{\xi}^{i}(\tau_{\sigma})\}_{i=1}^{n}$ is scattered data, i.e., the samples do not lie on a fixed grid.
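Numerically, the scattered snapshot data and the empirical marginals (9) are straightforward to represent: each marginal is a uniform weight vector over its $n$ sample locations. A minimal sketch (with placeholder random arrays standing in for measured profiles, and dimensions borrowed from Sec. V; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n, s, d = 500, 26, 3                          # profiles, snapshots, features (cf. Sec. V)
xi = [rng.random((n, d)) for _ in range(s)]   # xi[sigma][i] stands in for xi^i(tau_sigma)
mu = [np.full(n, 1.0 / n) for _ in range(s)]  # the empirical marginal (9): mass 1/n per sample
```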

Given the data (8)-(9), we would like to predict the most likely hardware resource state statistics
\[
\bm{\xi}(\tau)\sim\mu_{\tau}\quad\text{for any}\;\tau\in[0,t]. \tag{10}
\]
Without the qualifier “most likely”, the problem is underdetermined since there are uncountably many measure-valued continuous curves over $[0,t]$ that are consistent with the observed data (8)-(9).

Figure 1: The path tree for sequentially observed $\{\mu_{\sigma}\}_{\sigma\in\llbracket s\rrbracket}$.

III-C Multimarginal Schrödinger Bridge

Let $\mathcal{X}_{\sigma}:={\rm{support}}\left(\mu_{\sigma}\right)\subseteq\mathbb{R}^{d}$ $\forall\sigma\in\llbracket s\rrbracket$, and consider the Cartesian product $\mathcal{X}_{1}\times\mathcal{X}_{2}\times\ldots\times\mathcal{X}_{s}=:\bm{\mathcal{X}}\subseteq\left(\mathbb{R}^{d}\right)^{\otimes s}$. Let $\mathcal{M}\left(\mathcal{X}_{\sigma}\right)$ and $\mathcal{M}\left(\bm{\mathcal{X}}\right)$ denote the collection (i.e., manifold) of probability measures on $\mathcal{X}_{\sigma}$ and $\bm{\mathcal{X}}$, respectively. Define a ground cost $\bm{C}:\bm{\mathcal{X}}\mapsto\mathbb{R}_{\geq 0}$.

Following [6, Sec. 3], let
\begin{align}
{\rm{d}}\bm{\xi}_{-\sigma}&:={\rm{d}}\bm{\xi}(\tau_{1})\times\ldots\times{\rm{d}}\bm{\xi}(\tau_{\sigma-1})\times{\rm{d}}\bm{\xi}(\tau_{\sigma+1})\times\ldots\times{\rm{d}}\bm{\xi}(\tau_{s}), \tag{11a}\\
\bm{\mathcal{X}}_{-\sigma}&:=\mathcal{X}_{1}\times\ldots\times\mathcal{X}_{\sigma-1}\times\mathcal{X}_{\sigma+1}\times\ldots\times\mathcal{X}_{s}. \tag{11b}
\end{align}

For $\varepsilon\geq 0$ (not necessarily small), the multimarginal Schrödinger bridge problem (MSBP) is the following infinite dimensional convex program:
\begin{align}
\underset{\bm{M}\in\mathcal{M}\left(\bm{\mathcal{X}}\right)}{\min}&\int_{\bm{\mathcal{X}}}\big\{\bm{C}\left(\bm{\xi}(\tau_{1}),\ldots,\bm{\xi}(\tau_{s})\right)+\varepsilon\log\bm{M}\left(\bm{\xi}(\tau_{1}),\ldots,\bm{\xi}(\tau_{s})\right)\big\}\,\bm{M}\left(\bm{\xi}(\tau_{1}),\ldots,\bm{\xi}(\tau_{s})\right){\rm{d}}\bm{\xi}(\tau_{1})\ldots{\rm{d}}\bm{\xi}(\tau_{s}) \tag{12a}\\
\text{subject to}\quad&\int_{\bm{\mathcal{X}}_{-\sigma}}\bm{M}\left(\bm{\xi}(\tau_{1}),\ldots,\bm{\xi}(\tau_{s})\right){\rm{d}}\bm{\xi}_{-\sigma}=\mu_{\sigma}\quad\forall\sigma\in\llbracket s\rrbracket. \tag{12b}
\end{align}

In particular, $\mathcal{M}(\bm{\mathcal{X}})$ is a convex set. The objective (12a) is strictly convex in $\bm{M}$, thanks to the $\varepsilon$-regularized negative entropy term $\int_{\bm{\mathcal{X}}}\varepsilon\bm{M}\log\bm{M}$. The constraints (12b) are linear.

In this work, the measures $\{\mu_{\sigma}\}_{\sigma\in\llbracket s\rrbracket}$ correspond to sequential observations, and we therefore fix the path structured (Fig. 1) ground cost
\[
\bm{C}\left(\bm{\xi}(\tau_{1}),\ldots,\bm{\xi}(\tau_{s})\right)=\sum_{\sigma=1}^{s-1}c_{\sigma}\left(\bm{\xi}(\tau_{\sigma}),\bm{\xi}(\tau_{\sigma+1})\right). \tag{13}
\]

In particular, we choose the squared Euclidean distance as the sequential cost between two consecutive snapshot indices, i.e., $c_{\sigma}(\cdot,\cdot):=\|\cdot-\cdot\|_{2}^{2}$ $\forall\sigma\in\llbracket s\rrbracket$. MSBPs with more general tree structured ground costs have appeared in [7].

When the cardinality of the index set $\llbracket s\rrbracket$ equals $2$, then (12) reduces to the (bimarginal) Schrödinger bridge problem (SBP) [8, 9]. In this case, the solution of (12) gives the most likely evolution between two marginal snapshots $\mu_{1},\mu_{2}$. This can be established via the large deviations [10] interpretation [11, Sec. II] of the SBP using Sanov's theorem [12]; see also [13, Sec. 2.1].

Specifically, let $\mathcal{C}\left([\tau_{1},\tau_{2}],\mathbb{R}^{d}\right)$ denote the collection of continuous functions on the time interval $[\tau_{1},\tau_{2}]$ taking values in $\mathbb{R}^{d}$. Let $\Pi(\mu_{1},\mu_{2})$ be the collection of all path measures on $\mathcal{C}\left([\tau_{1},\tau_{2}],\mathbb{R}^{d}\right)$ with time $\tau_{1}$ marginal $\mu_{1}$ and time $\tau_{2}$ marginal $\mu_{2}$. Given a symmetric ground cost (e.g., Euclidean distance) $C:\mathcal{X}_{1}\times\mathcal{X}_{2}\mapsto\mathbb{R}_{\geq 0}$, let
\[
K(\cdot,\cdot):=\exp\left(-\dfrac{C(\cdot,\cdot)}{\varepsilon}\right), \tag{14}
\]
and consider the bimarginal Gibbs kernel
\[
K\left(\bm{\xi}(\tau_{1}),\bm{\xi}(\tau_{2})\right)\,\mu_{1}\otimes\mu_{2}. \tag{15}
\]

Then the bimarginal SBP solves
\[
\underset{\pi\in\Pi(\mu_{1},\mu_{2})}{\min}\ \varepsilon D_{\rm{KL}}\left(\pi\,\|\,K\left(\bm{\xi}(\tau_{1}),\bm{\xi}(\tau_{2})\right)\mu_{1}\otimes\mu_{2}\right), \tag{16}
\]
i.e., it finds the most likely evolution of the path measure consistent with the observed measure-valued snapshots $\mu_{1},\mu_{2}$.

Under the stated assumptions on the ground cost $C$, the existence of a minimizer for (16) is guaranteed [14, 15]. The uniqueness of the minimizer follows from the strict convexity of the map $\pi\mapsto D_{\rm{KL}}(\pi\|\nu)$ for fixed $\nu$.

This relative entropy reformulation, and thereby the “most likely evolution consistent with observed measures” interpretation, also holds for the MSBP (12) with $s\geq 2$ snapshots. Specifically, for $\bm{C}:\bm{\mathcal{X}}\mapsto\mathbb{R}_{\geq 0}$ as in (12)-(13), we generalize (14) as
\[
\bm{K}\left(\bm{\xi}(\tau_{1}),\ldots,\bm{\xi}(\tau_{s})\right):=\exp\left(-\dfrac{\bm{C}\left(\bm{\xi}(\tau_{1}),\ldots,\bm{\xi}(\tau_{s})\right)}{\varepsilon}\right), \tag{17}
\]
and define the multimarginal Gibbs kernel
\[
\bm{K}\left(\bm{\xi}(\tau_{1}),\ldots,\bm{\xi}(\tau_{s})\right)\,\mu_{1}\otimes\ldots\otimes\mu_{s}. \tag{18}
\]

Problem (16) then generalizes to
\[
\underset{\pi\in\Pi(\mu_{1},\ldots,\mu_{s})}{\min}\ \varepsilon D_{\rm{KL}}\left(\pi\,\|\,\bm{K}\left(\bm{\xi}(\tau_{1}),\ldots,\bm{\xi}(\tau_{s})\right)\mu_{1}\otimes\ldots\otimes\mu_{s}\right), \tag{19}
\]
where $\Pi(\mu_{1},\ldots,\mu_{s})$ denotes the collection of all path measures on $\mathcal{C}\left([\tau_{1},\tau_{s}],\mathbb{R}^{d}\right)$ with time $\tau_{\sigma}$ marginal $\mu_{\sigma}$ $\forall\sigma\in\llbracket s\rrbracket$. The equivalence between (12) and (19) can be verified by direct computation. Thus solving (19), or equivalently (12), yields the most likely evolution of the path measure consistent with the observed measure-valued snapshots $\mu_{\sigma}$ $\forall\sigma\in\llbracket s\rrbracket$.

We propose to solve the MSBP (12) to learn the time-varying statistics of the hardware resource state $\bm{\xi}$ as in (10). We next detail a discrete formulation to numerically solve the same for scattered data $\{\bm{\xi}^{i}(\tau_{\sigma})\}_{i=1}^{n}$, where $n$ is the number of control software execution profiles.

The minimizer of (12), $\bm{M}_{\rm{opt}}\left(\bm{\xi}(\tau_{1}),\ldots,\bm{\xi}(\tau_{s})\right)$, can be used to compute the optimal coupling between snapshot index pairs $(\sigma_{1},\sigma_{2})\in\{\llbracket s\rrbracket^{\otimes 2}\mid\sigma_{1}<\sigma_{2}\}$ as
\[
\int_{\bm{\mathcal{X}}_{-\sigma_{1},-\sigma_{2}}}\bm{M}_{\rm{opt}}\left(\bm{\xi}(\tau_{1}),\ldots,\bm{\xi}(\tau_{s})\right){\rm{d}}\bm{\xi}_{-\sigma_{1},-\sigma_{2}}, \tag{20}
\]
where
\begin{align}
{\rm{d}}\bm{\xi}_{-\sigma_{1},-\sigma_{2}}&:=\prod_{\sigma\in\llbracket s\rrbracket\setminus\{\sigma_{1},\sigma_{2}\}}{\rm{d}}\bm{\xi}(\tau_{\sigma}), \tag{21a}\\
\bm{\mathcal{X}}_{-\sigma_{1},-\sigma_{2}}&:=\prod_{\sigma\in\llbracket s\rrbracket\setminus\{\sigma_{1},\sigma_{2}\}}\mathcal{X}_{\sigma}. \tag{21b}
\end{align}

This will be useful for predicting the statistics of $\bm{\xi}(\tau)\sim\mu_{\tau}$ at any (out-of-sample) query time $\tau\in[0,t]$.

Remark 1.

(MSBP and MOT) When the entropic regularization strength $\varepsilon=0$, the MSBP (12) reduces to the multimarginal optimal transport (MOT) problem [16, 17], which has found widespread applications in barycenter computation [18], fluid dynamics [19, 20], team matching problems [21], and density functional theory [22, 23]. Further specializing the MOT to the case where the cardinality of $\llbracket s\rrbracket$ equals $2$ yields the (bimarginal) optimal transport [24] problem.

III-D Discrete Formulation of MSBP

For finite scattered data $\{\bm{\xi}^{i}(\tau_{\sigma})\}_{i=1}^{n}$ and $\{\mu_{\sigma}\}_{\sigma\in\llbracket s\rrbracket}$ as in (9), we set up a discrete version of (12) as follows.

With a slight abuse of notation, we use the same symbol for the continuum and discrete versions of a tensor. The ground cost in the discrete formulation is represented by an order $s$ tensor $\bm{C}\in\left(\mathbb{R}^{n}\right)^{\otimes s}_{\geq 0}$ with components $\left[\bm{C}_{i_{1},\ldots,i_{s}}\right]=\bm{C}\left(\bm{\xi}_{i_{1}},\ldots,\bm{\xi}_{i_{s}}\right)$. The component $\left[\bm{C}_{i_{1},\ldots,i_{s}}\right]$ encodes the cost of transporting unit mass for a tuple $(i_{1},\ldots,i_{s})$.

Likewise, consider the discrete mass tensor $\bm{M}\in\left(\mathbb{R}^{n}\right)^{\otimes s}_{\geq 0}$ with components $\left[\bm{M}_{i_{1},\ldots,i_{s}}\right]=\bm{M}\left(\bm{\xi}_{i_{1}},\ldots,\bm{\xi}_{i_{s}}\right)$. The component $\left[\bm{M}_{i_{1},\ldots,i_{s}}\right]$ denotes the amount of mass transported for a tuple $(i_{1},\ldots,i_{s})$.

For any $\sigma\in\llbracket s\rrbracket$, the empirical marginals $\bm{\mu}_{\sigma}\in\mathbb{R}^{n}_{\geq 0}$ are supported on the finite sets $\{\bm{\xi}^{i}(\tau_{\sigma})\}_{i=1}^{n}$. We denote the projection of $\bm{M}\in\left(\mathbb{R}^{n}\right)^{\otimes s}_{\geq 0}$ on the $\sigma$th marginal as ${\rm{proj}}_{\sigma}(\bm{M})$. Thus ${\rm{proj}}_{\sigma}:\left(\mathbb{R}^{n}\right)^{\otimes s}_{\geq 0}\mapsto\mathbb{R}^{n}_{\geq 0}$ is given componentwise as
\[
\left[{\rm{proj}}_{\sigma}\left(\bm{M}\right)_{j}\right]=\sum_{i_{1},\ldots,i_{\sigma-1},i_{\sigma+1},\ldots,i_{s}}\bm{M}_{i_{1},\ldots,i_{\sigma-1},j,i_{\sigma+1},\ldots,i_{s}}. \tag{22}
\]

Likewise, denote the projection of $\bm{M}\in\left(\mathbb{R}^{n}\right)^{\otimes s}_{\geq 0}$ on the $(\sigma_{1},\sigma_{2})$th marginal as ${\rm{proj}}_{\sigma_{1},\sigma_{2}}(\bm{M})$, i.e., ${\rm{proj}}_{\sigma_{1},\sigma_{2}}:\left(\mathbb{R}^{n}\right)^{\otimes s}_{\geq 0}\mapsto\mathbb{R}^{n\times n}_{\geq 0}$, given componentwise as
\[
\left[{\rm{proj}}_{\sigma_{1},\sigma_{2}}\left(\bm{M}\right)_{j,\ell}\right]=\sum_{i_{\sigma}\mid\sigma\in\llbracket s\rrbracket\setminus\{\sigma_{1},\sigma_{2}\}}\bm{M}_{i_{1},\ldots,i_{\sigma_{1}-1},j,i_{\sigma_{1}+1},\ldots,i_{\sigma_{2}-1},\ell,i_{\sigma_{2}+1},\ldots,i_{s}}. \tag{23}
\]

We note that (22) and (23) are the discrete versions of the integrals in (12b) and (20), respectively.
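To build intuition for (22)-(23), the following brute-force NumPy illustration computes both projections for a small instance ($s=3$) by direct summation. This is for exposition only: the summations cost $\mathcal{O}(n^{s})$, which Sec. III-F avoids.

```python
import numpy as np

n = 4
rng = np.random.default_rng(1)
M = rng.random((n, n, n))            # an order-3 mass tensor, one axis per snapshot

proj_2 = M.sum(axis=(0, 2))          # (22): project on the 2nd marginal (axis index 1)
proj_13 = M.sum(axis=1)              # (23): project on the (1, 3) marginal pair

# Both projections conserve the total transported mass.
assert np.isclose(proj_2.sum(), M.sum()) and np.isclose(proj_13.sum(), M.sum())
```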

With the above notation in place, the discrete version of (12) becomes
\begin{align}
\underset{\bm{M}\in\left(\mathbb{R}^{n}\right)^{\otimes s}_{\geq 0}}{\min}~&\langle\bm{C}+\varepsilon\log\bm{M},\bm{M}\rangle \tag{24a}\\
\text{subject to}~~&{\rm{proj}}_{\sigma}\left(\bm{M}\right)=\bm{\mu}_{\sigma}\quad\forall\sigma\in\llbracket s\rrbracket. \tag{24b}
\end{align}

The primal formulation (24) has $n^{s}$ decision variables, and is computationally intractable. Recall that even for the bimarginal ($s=2$) case, a standard approach [25] is to use Lagrange duality to notice that the optimal mass matrix $M_{\rm{opt}}$ is a diagonal scaling of $K:=\exp(-C/\varepsilon)\in\mathbb{R}^{n\times n}_{>0}$, i.e., $M_{\rm{opt}}={\rm{diag}}(\bm{u}_{1})\,K\,{\rm{diag}}(\bm{u}_{2})$, where $\bm{u}_{1}:=\exp(\bm{\lambda}_{1}/\varepsilon)$, $\bm{u}_{2}:=\exp(\bm{\lambda}_{2}/\varepsilon)$, and $\bm{\lambda}_{1},\bm{\lambda}_{2}\in\mathbb{R}^{n}$ are the Lagrange multipliers associated with the respective bimarginal constraints ${\rm{proj}}_{1}(M)=\bm{\mu}_{1}$, ${\rm{proj}}_{2}(M)=\bm{\mu}_{2}$. The unknowns $\bm{u}_{1},\bm{u}_{2}$ can be obtained by performing the Sinkhorn iterations
\begin{align}
\bm{u}_{1}&\leftarrow\bm{\mu}_{1}\oslash\left(K\bm{u}_{2}\right), \tag{25a}\\
\bm{u}_{2}&\leftarrow\bm{\mu}_{2}\oslash\left(K^{\top}\bm{u}_{1}\right), \tag{25b}
\end{align}
with guaranteed linear convergence [26], wherein the computational cost per iteration is governed by two matrix-vector multiplications.
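A minimal NumPy sketch of the iterations (25), assuming uniform empirical marginals as in (9) and the Gibbs kernel (14) with squared Euclidean cost (the function name and stopping rule are illustrative assumptions):

```python
import numpy as np

def sinkhorn_bimarginal(C, mu1, mu2, eps=0.1, tol=1e-10, max_iter=20_000):
    """Bimarginal Sinkhorn (25); returns u1, u2 and M_opt = diag(u1) K diag(u2)."""
    K = np.exp(-C / eps)                                 # Gibbs kernel (14)
    u1, u2 = np.ones_like(mu1), np.ones_like(mu2)
    for _ in range(max_iter):
        u1 = mu1 / (K @ u2)                              # (25a)
        u2 = mu2 / (K.T @ u1)                            # (25b)
        if np.max(np.abs(u1 * (K @ u2) - mu1)) < tol:    # marginal feasibility residual
            break
    return u1, u2, u1[:, None] * K * u2[None, :]

# Example on synthetic scattered data with uniform marginals.
rng = np.random.default_rng(2)
n, d = 50, 3
X1, X2 = rng.random((n, d)), rng.random((n, d))
C = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)     # squared Euclidean cost
mu = np.full(n, 1.0 / n)
u1, u2, M = sinkhorn_bimarginal(C, mu, mu)
assert np.allclose(M.sum(axis=1), mu) and np.allclose(M.sum(axis=0), mu)
```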

The duality result extends to the multimarginal ($s\geq 2$) case. Specifically, the optimal mass tensor in (24) admits the structure $\bm{M}_{\rm{opt}}=\bm{K}\odot\bm{U}$, where $\bm{K}:=\exp(-\bm{C}/\varepsilon)\in\left(\mathbb{R}^{n}\right)^{\otimes s}_{>0}$, $\bm{U}:=\otimes_{\sigma=1}^{s}\bm{u}_{\sigma}\in\left(\mathbb{R}^{n}\right)^{\otimes s}_{>0}$, $\bm{u}_{\sigma}:=\exp(\bm{\lambda}_{\sigma}/\varepsilon)$, and $\bm{\lambda}_{\sigma}\in\mathbb{R}^{n}$ are the Lagrange multipliers associated with the respective multimarginal constraints (24b). The unknowns $\bm{u}_{\sigma}$ can, in principle, be obtained from the multimarginal Sinkhorn iterations [27]
\[
\bm{u}_{\sigma}\leftarrow\bm{u}_{\sigma}\odot\bm{\mu}_{\sigma}\oslash{\rm{proj}}_{\sigma}\left(\bm{K}\odot\bm{U}\right)\quad\forall\sigma\in\llbracket s\rrbracket, \tag{26}
\]
which generalize (25). However, computing ${\rm{proj}}_{\sigma}\left(\bm{K}\odot\bm{U}\right)$ naively requires $\mathcal{O}\left(n^{s}\right)$ operations. Before describing how to avoid this exponential complexity (Sec. III-F), we point out the convergence guarantees for (26).

III-E Convergence for Multimarginal Sinkhorn Iterations

The iterations (26) can be derived either as alternating Bregman projections [27] or via block coordinate dual ascent [6]. Either viewpoint leads to guaranteed linear convergence of (26); see [28], [7, Thm. 3.5].

More recent works have also established guaranteed convergence for the continuous formulation (12) [29], with a linear rate of convergence [30].

III-F Multimarginal Sinkhorn Iterations for Path Structured $\bm{C}$

We circumvent the exponential complexity of computing ${\rm{proj}}_{\sigma}\left(\bm{K}\odot\bm{U}\right)$ in (26) by leveraging the path structured ground cost (13). This is enabled by a key result from [6], rephrased and reproved below in a slightly generalized form.

Proposition 1.

([6, Prop. 2]) Consider the discrete ground cost tensor $\bm{C}$ in (24) induced by a path structured cost (13), so that $\left[\bm{C}_{i_{1},\ldots,i_{s}}\right]=\sum_{\sigma=1}^{s-1}\left[C_{i_{\sigma},i_{\sigma+1}}^{\sigma\rightarrow\sigma+1}\right]$, where the matrix $C^{\sigma\rightarrow\sigma+1}\in\mathbb{R}^{n\times n}_{\geq 0}$ encodes the cost of transporting unit mass between each source-destination pair from the source set $\{\bm{\xi}^{i}(\tau_{\sigma})\}_{i=1}^{n}$ to the destination set $\{\bm{\xi}^{i}(\tau_{\sigma+1})\}_{i=1}^{n}$.

Let $K^{\sigma\rightarrow\sigma+1}:=\exp(-C^{\sigma\rightarrow\sigma+1}/\varepsilon)\in\mathbb{R}^{n\times n}_{>0}$, $\bm{K}:=\exp(-\bm{C}/\varepsilon)\in\left(\mathbb{R}^{n}\right)^{\otimes s}_{>0}$, and $\bm{U}:=\otimes_{\sigma=1}^{s}\bm{u}_{\sigma}\in\left(\mathbb{R}^{n}\right)^{\otimes s}_{>0}$.

Then (22) and (23) can be expressed as
\[
{\rm{proj}}_{\sigma}\left(\bm{K}\odot\bm{U}\right)=\left(\bm{u}_{1}^{\top}K^{1\to 2}\prod_{j=2}^{\sigma-1}{\rm{diag}}(\bm{u}_{j})K^{j\to j+1}\right)^{\top}\odot\,\bm{u}_{\sigma}\odot\left(\left(\prod_{j=\sigma+1}^{s-1}K^{j-1\to j}{\rm{diag}}(\bm{u}_{j})\right)K^{s-1\to s}\bm{u}_{s}\right)\quad\forall\sigma\in\llbracket s\rrbracket, \tag{27}
\]
and
\[
{\rm{proj}}_{\sigma_{1},\sigma_{2}}\left(\bm{K}\odot\bm{U}\right)={\rm{diag}}\left(\bm{u}_{1}^{\top}K^{1\to 2}\prod_{j=2}^{\sigma_{1}-1}{\rm{diag}}(\bm{u}_{j})K^{j\to j+1}\right)\,{\rm{diag}}(\bm{u}_{\sigma_{1}})\prod_{j=\sigma_{1}+1}^{\sigma_{2}}\left(K^{j-1\to j}{\rm{diag}}(\bm{u}_{j})\right)\,{\rm{diag}}\left(\left(\prod_{j=\sigma_{2}+1}^{s-1}K^{j-1\to j}{\rm{diag}}(\bm{u}_{j})\right)K^{s-1\to s}\bm{u}_{s}\right)\quad\forall(\sigma_{1},\sigma_{2})\in\{\llbracket s\rrbracket^{\otimes 2}\mid\sigma_{1}<\sigma_{2}\}. \tag{28}
\]
Proof.

The proof strategy is to write the Hilbert-Schmidt inner product $\langle\bm{K},\bm{U}\rangle$ in two different ways.

First, recall that $\bm{K}:=\exp(-\bm{C}/\varepsilon)\in\left(\mathbb{R}^{n}\right)^{\otimes s}_{>0}$ and $\bm{U}:=\otimes_{\sigma=1}^{s}\bm{u}_{\sigma}\in\left(\mathbb{R}^{n}\right)^{\otimes s}_{>0}$. So following (1), we have
\begin{align*}
\langle\bm{K},\bm{U}\rangle&=\sum_{i_{1},\dots,i_{s}}\left(\prod_{j=2}^{s}\left[K^{j-1\to j}_{i_{j-1},i_{j}}\right]\right)\prod_{j=1}^{s}(\bm{u}_{j})_{i_{j}}\\
&=\sum_{i_{1},\dots,i_{s}}(\bm{u}_{1})_{i_{1}}\prod_{j=2}^{s}\left[K^{j-1\to j}{\rm{diag}}(\bm{u}_{j})\right]_{i_{j-1},i_{j}}\\
&=\bm{u}_{1}^{\top}\left(\prod_{j=2}^{s-1}K^{j-1\to j}{\rm{diag}}(\bm{u}_{j})\right)K^{s-1\to s}\bm{u}_{s},
\end{align*}
and (27) follows from [6, Lemma 1 in Appendix 1].

Next, notice that we can alternatively write
\begin{align*}
\langle\bm{K},\bm{U}\rangle=\;&\bm{u}_{1}^{\top}\left(\prod_{j=2}^{\sigma_{1}-1}K^{j-1\to j}{\rm{diag}}(\bm{u}_{j})\right)K^{\sigma_{1}-1\to\sigma_{1}}\left(\prod_{j=\sigma_{1}}^{\sigma_{2}-1}{\rm{diag}}(\bm{u}_{j})K^{j\to j+1}\right){\rm{diag}}(\bm{u}_{\sigma_{2}})\left(\prod_{j=\sigma_{2}+1}^{s-1}K^{j-1\to j}{\rm{diag}}(\bm{u}_{j})\right)K^{s-1\to s}\bm{u}_{s}.
\end{align*}
Then (28) follows from [6, Lemma 2 in Appendix 1]. ∎

Remark 2.

Unlike [6, Prop. 2], our data $\{\bm{\xi}^{i}(\tau_{\sigma})\}_{i=1}^{n}$ are scattered, i.e., not on a fixed grid, hence the need for the superscripts $\sigma\rightarrow\sigma+1$ on the time-varying matrices in our Prop. 1. In contrast, the corresponding matrices in [6, Prop. 2] are independent of $\sigma$.

Remark 3.

We note that substituting (27) into (26) cancels the (elements of the) positive vectors $\bm{u}_{\sigma}$ $\forall\sigma\in\llbracket s\rrbracket$ from the corresponding numerators and denominators. This further simplifies our multimarginal Sinkhorn recursions to
\[
\bm{u}_{\sigma}\leftarrow\bm{\mu}_{\sigma}\oslash\left(\left(\bm{u}_{1}^{\top}K^{1\to 2}\prod_{j=2}^{\sigma-1}{\rm{diag}}(\bm{u}_{j})K^{j\to j+1}\right)^{\top}\odot\left(\left(\prod_{j=\sigma+1}^{s-1}K^{j-1\to j}{\rm{diag}}(\bm{u}_{j})\right)K^{s-1\to s}\bm{u}_{s}\right)\right)\quad\forall\sigma\in\llbracket s\rrbracket. \tag{29}
\]
Remark 4.

(From exponential to linear complexity in $s$) We note that (29) involves $s-1$ matrix-vector multiplications, each of $\mathcal{O}(n^{2})$ complexity. So the computational complexity of (29) becomes $\mathcal{O}\left((s-1)n^{2}\right)$, which is linear in $s$, i.e., a significant reduction from the earlier $\mathcal{O}\left(n^{s}\right)$ complexity mentioned at the end of Sec. III-D.

Remark 5.

(Linear complexity in $d$) The dimension $d$ of the vector $\bm{\xi}$ only affects the construction of the time-varying squared Euclidean distance matrices $C^{\sigma\rightarrow\sigma+1}$ $\forall\sigma\in\llbracket s-1\rrbracket$ in Prop. 1, whose total cost of $\mathcal{O}\left((s-1)n^{2}d\right)$ is linear in $d$. Once constructed, the recursions (29) are independent of $d$.
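Putting Prop. 1 and Remarks 3-5 together, the following sketch implements the recursions (29); the function names are illustrative, not the authors' MATLAB implementation. For clarity, the left and right chains are recomputed from scratch, making one full sweep $\mathcal{O}(s^{2}n^{2})$; caching the partial matrix-vector products recovers the $\mathcal{O}\left((s-1)n^{2}\right)$ per-sweep cost of Remark 4.

```python
import numpy as np

def hilbert_metric(u, v):
    """Hilbert projective metric (3), used as the convergence criterion (cf. Fig. 5)."""
    r = u / v
    return np.log(r.max() / r.min())

def msbp_sinkhorn_path(Ks, mus, max_iter=5000, tol=1e-9):
    """Multimarginal Sinkhorn recursions (29) for a path structured cost.
    Ks[a] is the n-by-n Gibbs kernel K^{a->a+1} (0-indexed); mus[a] are the marginals."""
    s, n = len(mus), len(mus[0])
    u = [np.ones(n) for _ in range(s)]

    def left(a):   # transpose of u_1^T K^{1->2} diag(u_2) ... K^{a-1->a}, cf. (27)
        phi = np.ones(n)
        for j in range(a):
            phi = Ks[j].T @ (u[j] * phi)
        return phi

    def right(a):  # (prod_j K^{j-1->j} diag(u_j)) K^{s-1->s} u_s, cf. (27)
        psi = np.ones(n)
        for j in range(s - 2, a - 1, -1):
            psi = Ks[j] @ (u[j + 1] * psi)
        return psi

    for _ in range(max_iter):
        u_prev = [x.copy() for x in u]
        for a in range(s):
            u[a] = mus[a] / (left(a) * right(a))    # the update (29)
        if max(hilbert_metric(u[a], u_prev[a]) for a in range(s)) < tol:
            break
    return u
```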

We next outline how the solution tensor $\bm{M}_{\rm{opt}}=\bm{K}\odot\bm{U}$, obtained from the converged Sinkhorn iterations, can be used together with (28) to make stochastic predictions of the most likely hardware resource state in the form (10).

III-G Predicting Most Likely Distribution

For the ground cost (13) resulting from the sequential information structure (Fig. 1), we utilize (28) to decompose $\bm{M}_{\rm{opt}}=\bm{K}\odot\bm{U}$ of (24) into the bimarginal transport plans
\[
M^{\sigma_{1}\to\sigma_{2}}:={\rm{proj}}_{\sigma_{1},\sigma_{2}}(\bm{M}_{\rm{opt}})={\rm{proj}}_{\sigma_{1},\sigma_{2}}(\bm{K}\odot\bm{U}). \tag{30}
\]

Further, when $\bm{C}$ is squared Euclidean, as considered here, the maximum likelihood estimate for $\mu_{\tau}$ in (10) at a query time $\tau\in[0,t]$ is (see [6, Sec. 2.2])
\[
\hat{\mu}_{\tau}:=\sum_{i=1}^{n}\sum_{j=1}^{n}\left[M^{\sigma\to\sigma+1}_{i,j}\right]\delta(\bm{\xi}-\hat{\bm{\xi}}(\tau,\bm{\xi}^{i}(\tau_{\sigma}),\bm{\xi}^{j}(\tau_{\sigma+1}))), \tag{31}
\]
where $\sigma\in\llbracket s\rrbracket$ is such that $\tau\in[\tau_{\sigma},\tau_{\sigma+1}]$, and
\begin{align}
\hat{\bm{\xi}}(\tau,\bm{\xi}^{i}(\tau_{\sigma}),\bm{\xi}^{j}(\tau_{\sigma+1}))&:=(1-\lambda)\bm{\xi}^{i}(\tau_{\sigma})+\lambda\bm{\xi}^{j}(\tau_{\sigma+1}), \tag{32a}\\
\lambda&:=\dfrac{\tau-\tau_{\sigma}}{\tau_{\sigma+1}-\tau_{\sigma}}\in[0,1]. \tag{32b}
\end{align}
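A sketch of the prediction step (30)-(32), reusing the converged scaling vectors $\bm{u}_{\sigma}$ and kernels $K^{\sigma\to\sigma+1}$ from the Sinkhorn sketch above (the names and array conventions are ours):

```python
import numpy as np

def pairwise_plan(a, u, Ks):
    """Consecutive bimarginal plan M^{a->a+1} = proj_{a,a+1}(K ⊙ U) via (28), (30)."""
    s, n = len(u), len(u[0])
    phi = np.ones(n)                    # left chain over snapshots before a
    for j in range(a):
        phi = Ks[j].T @ (u[j] * phi)
    psi = np.ones(n)                    # right chain over snapshots after a+1
    for j in range(s - 2, a, -1):
        psi = Ks[j] @ (u[j + 1] * psi)
    return ((phi * u[a])[:, None] * Ks[a]) * (u[a + 1] * psi)[None, :]

def predict(tau, taus, a, u, Ks, xi):
    """Most likely distribution (31)-(32) at a query time tau in [taus[a], taus[a+1]].
    Returns (n*n)-by-d support points and the matching (n*n,) probability weights."""
    lam = (tau - taus[a]) / (taus[a + 1] - taus[a])                     # (32b)
    pts = (1 - lam) * xi[a][:, None, :] + lam * xi[a + 1][None, :, :]   # (32a)
    return pts.reshape(-1, xi[a].shape[1]), pairwise_plan(a, u, Ks).ravel()
```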

IV Overall Algorithm

The methodology proposed in Sec. III comprises the following three overall steps.
Step 1. Given a collection of contexts (Sec. III-A) $\{\bm{c}^{i}\}_{i=1}^{n_{\rm{context}}}$, execute the control software over $[0,t]$ to generate hardware resource state sample snapshots (Sec. III-B) $\{\bm{\xi}^{i}(\tau_{\sigma})\}_{i=1}^{n}$, and thereby the empirical $\mu_{\sigma}$ as in (9) for all $\sigma\in\llbracket s\rrbracket$, conditioned on each of the $n_{\rm{context}}$ context samples.
Step 2. Using the data from Step 1, construct the squared Euclidean distance matrices $C^{\sigma\rightarrow\sigma+1}$ from the source set $\{\bm{\xi}^{i}(\tau_{\sigma})\}_{i=1}^{n}$ to the destination set $\{\bm{\xi}^{i}(\tau_{\sigma+1})\}_{i=1}^{n}$ $\forall\sigma\in\llbracket s-1\rrbracket$. Perform the recursions (29) until convergence (error within desired tolerance).
Step 3. Given a query context $\bm{c}$ and time $\tau\in[0,t]$, return the most likely distribution $\hat{\mu}_{\tau}$ using (31).

Remark 6.

For the three steps mentioned above, Step 1 is data generation, Step 2 is probabilistic learning using data from Step 1, and Step 3 is prediction using the learnt model.
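A hypothetical end-to-end driver tying the three steps together, reusing msbp_sinkhorn_path and predict from the sketches in Sec. III; synthetic random arrays stand in for the Step 1 profiling measurements.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, s, eps = 100, 3, 6, 0.1
taus = np.sort(rng.random(s))                            # snapshot times tau_1 < ... < tau_s
xi = [rng.random((n, d)) + 0.1 * a for a in range(s)]    # Step 1 stand-in: scattered snapshots
mus = [np.full(n, 1.0 / n) for _ in range(s)]            # empirical marginals (9)

# Step 2: Gibbs kernels from squared Euclidean costs, then Sinkhorn recursions (29).
Ks = [np.exp(-((xi[a][:, None, :] - xi[a + 1][None, :, :]) ** 2).sum(-1) / eps)
      for a in range(s - 1)]
u = msbp_sinkhorn_path(Ks, mus)

# Step 3: most likely distribution (31) at a query time inside [taus[2], taus[3]].
tau = 0.5 * (taus[2] + taus[3])
pts, wts = predict(tau, taus, 2, u, Ks, xi)
assert np.isclose(wts.sum(), 1.0, atol=1e-3)             # weights form a probability vector
```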

Figure 2: All 12 paths used in profiling the NMPC software (Sec. V), generated by GP sampling via Scikit-learn [31] over the domain $[0,10]$ using mean zero and variance 10.

V Numerical Case Study

In this section, we illustrate the application of the proposed method to vehicle path tracking control software. Throughout, we provide details for the three steps in Sec. IV.

Control Software. We wrote custom software¹ in the C language implementing a path-following nonlinear model predictive controller (NMPC) for a kinematic bicycle model (KBM) [32, 33] of a vehicle with four states $(x,y,v,\psi)$ and two control inputs $(a_{c},\delta)$, given by
\[
\dot{x}=v\cos(\psi+\beta),\quad\dot{y}=v\sin(\psi+\beta),\quad\dot{v}=a_{c},\quad\dot{\psi}=\frac{v}{\ell_{\text{rear}}}\sin\beta,
\]
where the sideslip angle $\beta=\arctan\left(\frac{\ell_{\text{rear}}}{\ell_{\text{front}}+\ell_{\text{rear}}}\tan\delta\right)$. The $4\times 1$ state vector comprises the inertial position $(x,y)$ of the vehicle's center-of-mass, its speed $v$, and the vehicle's inertial heading angle $\psi$. The $2\times 1$ control vector comprises the acceleration $a_{c}$ and the front steering wheel angle $\delta$.
¹Git repo: https://github.com/abhishekhalder/CPS-Frontier-Task3-Collaboration

Figure 3: Components of the measured feature vector $\bm{\xi}$ in (7) for all five control cycles across 500 executions of the NMPC software, where $\bm{c}=[15,15,y^{1}_{\rm{des}}(x)]$.

The parameters $\ell_{\text{front}},\ell_{\text{rear}}$ are the distances of the vehicle's center-of-mass to the front and rear axles, respectively.
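For illustration, one explicit-Euler integration step of the KBM dynamics above could be coded as follows; the axle distances and step size here are illustrative assumptions, not the values used in the case study.

```python
import numpy as np

L_FRONT, L_REAR = 1.2, 1.6     # center-of-mass to front/rear axle distances (m), assumed

def kbm_step(state, a_c, delta, dt=0.01):
    """One Euler step of the kinematic bicycle model; state = (x, y, v, psi)."""
    x, y, v, psi = state
    beta = np.arctan(L_REAR / (L_FRONT + L_REAR) * np.tan(delta))   # sideslip angle
    return np.array([
        x + dt * v * np.cos(psi + beta),
        y + dt * v * np.sin(psi + beta),
        v + dt * a_c,
        psi + dt * (v / L_REAR) * np.sin(beta),
    ])
```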

The NMPC was designed to track a desired path given as a sequence of $N=200$ waypoint tuples $\big\{(x_{d}^{(i)},y_{d}^{(i)},v_{d}^{(i)})\big\}_{i=1}^{N}$, i.e., a sequence of desired positions and speeds (the desired speed profile was numerically estimated from the desired waypoint profiles). At every control step (at most every 100 ms), using the IPOPT nonlinear program solver [34], the NMPC solved an optimization problem to minimize the sum of the crosstrack, $\psi$, and $v$ errors, along with the magnitude and rapidity of change of the control inputs, over a period of time from $0$ to the time horizon $H_{p}=4$, subject to control magnitude and slew rate constraints. For formulation details and the control performance achieved, see [35]. For implementation and parameter values, we refer the readers to the Git repository in the footnote.

While closing the loop for the KBM with the computed control values requires minimal computational overhead, the NMPC itself is computationally demanding. In the case where multiple vehicle controllers are available, it is of practical interest to predict the hardware resource usage of the NMPC over one to several control cycles, conditioned on the CPS context $\bm{c}$ (Sec. III-A) at a given time. For this we ‘profile’ the NMPC, meaning we run the software many times for different values of $\bm{c}$ as in (4), measuring the time evolution of the hardware resource state $\bm{\xi}$ as in (7). We use these profiles to generate the marginals $\mu_{\sigma}$ as in (9), thus completing Step 1 in Sec. IV.

We next provide details on generating control software execution profiles for our specific case study.
Generating Execution Profiles. To gather the execution profiles of our NMPC control software, we used an Ubuntu 16.04.7 Linux machine with an Intel Xeon E5-2683 v4 CPU. We leveraged Intel's Cache Allocation Technology (CAT) [36] and Memguard [37] to control the allocation of LLC partitions and of the memory bandwidth available to the control software, respectively. Both LLC partitions and memory bandwidth were allocated in blocks of 2 MB.

Utilizing these resource partitioning mechanisms, we ran our application on an isolated CPU and used the Linux perf tool [38], version 4.9.3, to sample $\bm{\xi}$ every 10 ms.

For each run of our application, we set the cache and memory bandwidth to a static allocation and passed as input a path for the NMPC to follow, represented as an array of desired $(x,y)$ coordinates. We then executed the control software for $n_{c}:=5$ uninterrupted “control cycles”, wherein the NMPC gets the KBM state, makes a control decision, and updates the KBM state.

We profiled 12 unique desired paths to track, denoted $\{y_{\rm{des}}^{i}(x)\}_{i=1}^{12}$, and 5 unique vectors $\{\bm{c}_{\rm{cyber}}^{i}\}_{i=1}^{5}$, comprising $n_{\rm{context}}=12\times 5=60$ samples of $\bm{c}$. Conditioned on each of these $60$ context samples $\{\bm{c}^{i}\}_{i=1}^{n_{\rm{context}}=60}$, we ran the software 500 times, for a total of 30,000 profiles.

The sample paths $\{y_{\rm{des}}^{i}(x)\}_{i=1}^{12}$ in (6) were all generated for $x\in[0,10]$ using a GP with mean zero and variance 10 [31], and are shown in Fig. 2.
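A sketch of this sampling with scikit-learn follows; the paper specifies only a zero-mean GP with variance 10 over $[0,10]$, so the RBF kernel choice, its length scale, and the discretization below are our assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

x = np.linspace(0.0, 10.0, 200).reshape(-1, 1)
kernel = ConstantKernel(constant_value=10.0) * RBF(length_scale=1.0)  # variance 10
gp = GaussianProcessRegressor(kernel=kernel)           # unfitted => zero-mean prior
y_des = gp.sample_y(x, n_samples=12, random_state=0)   # 200 x 12 array of sample paths
```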

Our vector samples $\{\bm{c}_{\rm{cyber}}^{i}\}_{i=1}^{5}$ in (5) were $[1,1]^{\top}$, $[5,5]^{\top}$, $[10,10]^{\top}$, $[15,15]^{\top}$, and $[20,20]^{\top}$, where each entry represents the number of cache/memory bandwidth partitions, ranging from $1$ to $20$. These values were selected to broadly cover the range of possible hardware contexts.

Figure 4: Normalized histograms (gray filled) and kernel density estimates (KDEs) (solid line) for the end times of all five control cycles across 500 executions of the NMPC software, conditioned on a fixed CPS context $\bm{c}$ shown above. The KDEs used a Gaussian kernel with bandwidths computed via cross validation [39, 40].
Control cycle | Mean | Standard deviation
#1 | 0.1181 | 0.0076
#2 | 0.2336 | 0.0106
#3 | 0.3495 | 0.0127
#4 | 0.4660 | 0.0143
#5 | 0.5775 | 0.0159
TABLE I: The means and standard deviations of the end times for the $n_{c}=5$ control cycles data shown in Fig. 4.
Figure 5: Linear convergence of the Sinkhorn iterations (29) for $s_{\rm{int}}=4$ w.r.t. the Hilbert projective metric $d_{\rm{H}}$ in (3) between $\bm{u}_{\sigma}$, $\sigma\in\llbracket s\rrbracket$, at iteration indices $k$ and $k-1$.
Figure 6: Predicted $\hat{\mu}_{\hat{\tau}_{j}}$ (blue) vs. measured $\mu_{\hat{\tau}_{j}}$ (red) at times $\hat{\tau}_{j}$, $j\in\llbracket 5\rrbracket$, during the 3rd control cycle with $s_{\rm{int}}=4$. Distributions at the control cycle boundaries are in black.
$s_{\rm{int}}$ | $W_{1}$ | $W_{2}$ | $W_{3}$ | $W_{4}$ | $W_{5}$
0 | 2.0489 | - | - | - | -
1 | 2.2695 | 1.1750 | - | - | -
2 | 5.7717 | 0.9163 | 0.3794 | - | -
3 | 2.2413 | 1.6432 | 1.2345 | 0.6010 | -
4 | 0.6372 | 1.2691 | 0.9176 | 0.6689 | 0.2111
TABLE II: Number of intracycle marginals $s_{\rm{int}}$ vs. Wasserstein distances $W_{j}$ as in (33). All entries are scaled up by $10^{4}$.

Applying the Proposed Algorithm. Given a query context $\bm{c}$, we determine the closest CPS context for which profiling data are available, using the Euclidean distance between cyber context vectors (5) and the Fréchet distance [41] between physical context curves (6). In this case study, we consider a query context whose closest cyber context is $\bm{c}_{\rm{cyber}}=[15,15]^{\top}$ and closest physical context is $\bm{c}_{\rm{phys}}=y_{\rm{des}}^{1}(x)$. Profiling data for this $\bm{c}$ are shown in Fig. 3.

For each of the $n=500$ profiles, we are given the end times of each of the $n_{c}=5$ control cycles. We then determine the statistics of the cycle end times (Fig. 4) and compute the empirical distributions of $\bm{\xi}$ at the means (Table I) of the control cycle start/end time boundaries. For the empirical distributions at times between cycle boundaries, we let $s_{\rm{int}}$ be the number of marginals equispaced in time between consecutive cycle boundaries. We then set $\tau_{\sigma}$, $\sigma\in\llbracket s\rrbracket$, from the means in Table I, where $s:=1+n_{c}(s_{\rm{int}}+1)$ and $\tau_{\sigma(s_{\rm{int}}+1)+1}$ is the sample mean end time of the $\sigma$th control cycle.

Our distributions are as in (9), where $\bm{\xi}^{i}(\tau_{\sigma})$ is the sample of the hardware resource state (7) at time $\approx\tau_{\sigma}$ (within 5 ms) for profile $i$, given the context $\bm{c}$.

We set $\varepsilon=0.1$ and solve the discrete MSBP (24) with the squared Euclidean cost $\bm{C}$ using (29). Fig. 5 shows that the Sinkhorn iterations converge linearly (Sec. III-E). We emphasize that the computational cost of the proposed algorithm is minimal, thanks to the path structure of the information. In particular, we solve the MSBP (24) with $n^{s}=500^{26}$ decision variables in approx. 10 s in MATLAB on an Ubuntu 22.04.2 LTS Linux machine with an AMD Ryzen 7 5800X CPU.

Fig. 6 compares the predicted versus the observed empirical distributions. Specifically, Fig. 6 shows $s_{\rm{int}}+1=5$ distributional predictions $\hat{\mu}_{\hat{\tau}_{j}}$ at times $\hat{\tau}_{j}$, temporally equispaced throughout the duration of the 3rd control cycle, i.e., between $\tau_{2(s_{\rm{int}}+1)+1}$ and $\tau_{3(s_{\rm{int}}+1)+1}$, with
\[
\hat{\tau}_{j}=\tau_{2(s_{\rm{int}}+1)+1}+\Bigg(\frac{\tau_{3(s_{\rm{int}}+1)+1}-\tau_{2(s_{\rm{int}}+1)+1}}{s_{\rm{int}}+2}\Bigg)j,
\]
where $j\in\llbracket s_{\rm{int}}+1\rrbracket$. We used (31) with $\sigma=2(s_{\rm{int}}+1)+j$, since $\hat{\tau}_{j}\in[\tau_{2(s_{\rm{int}}+1)+j},\tau_{2(s_{\rm{int}}+1)+j+1}]$.

From Fig. 6 it is clear that the measure-valued predictions, while largely accurate, are prone to error in cases where the software resource usage changes in bursts too short to appear in our observations. It follows that increasing the number of snapshots should improve the overall accuracy. In this example, we achieve this by increasing $s_{\rm{int}}$. Table II reports the Wasserstein distances between the corresponding predicted and measured distributions:
\[
W_{j}:=W(\hat{\mu}_{\hat{\tau}_{j}},\mu_{\hat{\tau}_{j}})\quad\forall j\in\llbracket s_{\rm{int}}+1\rrbracket. \tag{33}
\]

We computed each $W_{j}$ as the square root of the optimal value of the corresponding linear program that results from specializing (24) to $s=2$, $\varepsilon=0$.
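A sketch of this computation via SciPy's LP solver; the function name and conventions are ours.

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein2(X, Y, mu, nu):
    """2-Wasserstein distance between empirical measures (weights mu on rows of X,
    nu on rows of Y): the square root of the optimal value of the bimarginal OT
    linear program, i.e., (24) with s = 2 and eps = 0."""
    n, m = len(mu), len(nu)
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # squared Euclidean cost
    A_eq = np.zeros((n + m, n * m))                      # marginal (mass balance) constraints
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0                 # row sums of M equal mu
    for j in range(m):
        A_eq[n + j, j::m] = 1.0                          # column sums of M equal nu
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([mu, nu]),
                  bounds=(0, None), method="highs")
    return np.sqrt(res.fun)

# Example: distance between two small empirical measures.
rng = np.random.default_rng(4)
X, Y = rng.random((20, 3)), rng.random((20, 3))
w = np.full(20, 1.0 / 20)
print(wasserstein2(X, Y, w, w))
```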

VI Concluding Remarks

We apply recent algorithmic advances in solving the MSBP to learn stochastic hardware resource usage by control software. The learnt model demonstrates accurate nonparametric measure-valued predictions for the joint hardware resource state at a desired time, conditioned on the CPS context. The formulation and its solution come with a maximum likelihood guarantee in the space of probability measures, and the algorithm enjoys a guaranteed linear rate of convergence.

References

  • [1] G. Bernat, A. Colin, and S. Petters, “WCET analysis of probabilistic hard real-time systems,” in 23rd IEEE Real-Time Systems Symposium (RTSS), 2002, pp. 279–288.
  • [2] M. Lv, N. Guan, Y. Zhang, Q. Deng, G. Yu, and J. Zhang, “A survey of WCET analysis of real-time operating systems,” in 2009 International Conference on Embedded Software and Systems, 2009, pp. 65–72.
  • [3] R. Gifford, N. Gandhi, L. T. X. Phan, and A. Haeberlen, “DNA: Dynamic resource allocation for soft real-time multicore systems,” in 2021 IEEE 27th Real-Time and Embedded Technology and Applications Symposium (RTAS).   IEEE, 2021, pp. 196–209.
  • [4] K. Zhang, J. Sprinkle, and R. G. Sanfelice, “Computationally aware switching criteria for hybrid model predictive control of cyber-physical systems,” IEEE Transactions on Automation Science and Engineering, vol. 13, no. 2, pp. 479–490, 2016.
  • [5] P. J. Bushell, “Hilbert’s metric and positive contraction mappings in a Banach space,” Archive for Rational Mechanics and Analysis, vol. 52, pp. 330–338, 1973.
  • [6] F. Elvander, I. Haasler, A. Jakobsson, and J. Karlsson, “Multi-marginal optimal transport using partial information with applications in robust localization and sensor fusion,” Signal Processing, vol. 171, p. 107474, 2020.
  • [7] I. Haasler, A. Ringh, Y. Chen, and J. Karlsson, “Multimarginal optimal transport with a tree-structured cost and the Schrödinger bridge problem,” SIAM Journal on Control and Optimization, vol. 59, no. 4, pp. 2428–2453, 2021.
  • [8] C. Léonard, “A survey of the Schrödinger problem and some of its connections with optimal transport,” Discrete and Continuous Dynamical Systems-Series A, vol. 34, no. 4, pp. 1533–1574, 2014.
  • [9] Y. Chen, T. T. Georgiou, and M. Pavon, “Stochastic control liaisons: Richard Sinkhorn meets Gaspard Monge on a Schrödinger bridge,” SIAM Review, vol. 63, no. 2, pp. 249–313, 2021.
  • [10] A. Dembo and O. Zeitouni, Large deviations techniques and applications.   Springer Science & Business Media, 2009, vol. 38.
  • [11] H. Föllmer, “Random fields and diffusion processes,” École d'Été de Probabilités de Saint-Flour XV–XVII, 1985–87, 1988.
  • [12] I. N. Sanov, On the probability of large deviations of random variables.   United States Air Force, Office of Scientific Research, 1958.
  • [13] M. Pavon, G. Trigila, and E. G. Tabak, “The data-driven Schrödinger bridge,” Communications on Pure and Applied Mathematics, vol. 74, no. 7, pp. 1545–1573, 2021.
  • [14] I. Csiszár, “I-divergence geometry of probability distributions and minimization problems,” The annals of probability, pp. 146–158, 1975.
  • [15] J. M. Borwein, A. S. Lewis, and R. D. Nussbaum, “Entropy minimization, DAD problems, and doubly stochastic kernels,” Journal of Functional Analysis, vol. 123, no. 2, pp. 264–307, 1994.
  • [16] L. Rüschendorf and L. Uckelmann, “On the n-coupling problem,” Journal of multivariate analysis, vol. 81, no. 2, pp. 242–258, 2002.
  • [17] B. Pass, “Multi-marginal optimal transport: theory and applications,” ESAIM: Mathematical Modelling and Numerical Analysis-Modélisation Mathématique et Analyse Numérique, vol. 49, no. 6, pp. 1771–1790, 2015.
  • [18] M. Agueh and G. Carlier, “Barycenters in the Wasserstein space,” SIAM Journal on Mathematical Analysis, vol. 43, no. 2, pp. 904–924, 2011.
  • [19] Y. Brenier, “Generalized solutions and hydrostatic approximation of the Euler equations,” Physica D: Nonlinear Phenomena, vol. 237, no. 14-17, pp. 1982–1988, 2008.
  • [20] J.-D. Benamou, G. Carlier, and L. Nenna, “Generalized incompressible flows, multi-marginal transport and sinkhorn algorithm,” Numerische Mathematik, vol. 142, pp. 33–54, 2019.
  • [21] G. Carlier, A. Oberman, and E. Oudet, “Numerical methods for matching for teams and Wasserstein barycenters,” ESAIM: Mathematical Modelling and Numerical Analysis, vol. 49, no. 6, pp. 1621–1642, 2015.
  • [22] G. Buttazzo, L. De Pascale, and P. Gori-Giorgi, “Optimal-transport formulation of electronic density-functional theory,” Physical Review A, vol. 85, no. 6, p. 062502, 2012.
  • [23] C. Cotar, G. Friesecke, and C. Klüppelberg, “Density functional theory and optimal transportation with Coulomb cost,” Communications on Pure and Applied Mathematics, vol. 66, no. 4, pp. 548–599, 2013.
  • [24] G. Peyré and M. Cuturi, “Computational optimal transport: With applications to data science,” Foundations and Trends® in Machine Learning, vol. 11, no. 5-6, pp. 355–607, 2019.
  • [25] M. Cuturi, “Sinkhorn distances: Lightspeed computation of optimal transport,” Advances in neural information processing systems, vol. 26, 2013.
  • [26] J. Franklin and J. Lorenz, “On the scaling of multidimensional matrices,” Linear Algebra and its applications, vol. 114, pp. 717–735, 1989.
  • [27] J.-D. Benamou, G. Carlier, M. Cuturi, L. Nenna, and G. Peyré, “Iterative Bregman projections for regularized transportation problems,” SIAM Journal on Scientific Computing, vol. 37, no. 2, pp. A1111–A1138, 2015.
  • [28] H. H. Bauschke and A. S. Lewis, “Dykstras algorithm with Bregman projections: A convergence proof,” Optimization, vol. 48, no. 4, pp. 409–427, 2000.
  • [29] S. D. Marino and A. Gerolin, “An optimal transport approach for the Schrödinger bridge problem and convergence of Sinkhorn algorithm,” Journal of Scientific Computing, vol. 85, no. 2, p. 27, 2020.
  • [30] G. Carlier, “On the linear convergence of the multimarginal Sinkhorn algorithm,” SIAM Journal on Optimization, vol. 32, no. 2, pp. 786–794, 2022.
  • [31] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
  • [32] J. Kong, M. Pfeiffer, G. Schildbach, and F. Borrelli, “Kinematic and dynamic vehicle models for autonomous driving control design,” in 2015 IEEE intelligent vehicles symposium (IV).   IEEE, 2015, pp. 1094–1099.
  • [33] S. Haddad, A. Halder, and B. Singh, “Density-based stochastic reachability computation for occupancy prediction in automated driving,” IEEE Transactions on Control Systems Technology, vol. 30, no. 6, pp. 2406–2419, 2022.
  • [34] A. Wächter and L. T. Biegler, “On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming,” Mathematical Programming, vol. 106, no. 1, pp. 25–57, 2006.
  • [35] https://github.com/abhishekhalder/CPS-Frontier-Task3-Collaboration/blob/master/Codes/kbm_sim/Documentation_KinematicBicycle_Controllers.pdf, accessed: 2023-09-29.
  • [36] Intel Corporation, “Improving real-time performance by utilizing cache allocation technology,” Apr. 2015, White Paper.
  • [37] H. Yun, G. Yao, R. Pellizzoni, M. Caccamo, and L. Sha, “Memory bandwidth management for efficient performance isolation in multi-core platforms,” IEEE Transactions on Computers, vol. 65, no. 2, pp. 562–576, Feb 2016.
  • [38] “perf(1) — linux manual page,” https://man7.org/linux/man-pages/man1/perf.1.html, accessed: 2023-09-29.
  • [39] A. W. Bowman, “An alternative method of cross-validation for the smoothing of density estimates,” Biometrika, vol. 71, no. 2, pp. 353–360, 1984.
  • [40] P. Hall, J. Marron, and B. U. Park, “Smoothed cross-validation,” Probability theory and related fields, vol. 92, no. 1, pp. 1–20, 1992.
  • [41] T. Eiter and H. Mannila, “Computing discrete Fréchet distance,” 1994. [Online]. Available: https://api.semanticscholar.org/CorpusID:16010565