An MCMC Method for Uncertainty Set Generation
via Operator-Theoretic Metrics

Anand Srinivasan and Naoya Takeishi A. Srinivasan is with the Mathematics Department at the Massachusetts Institute of Technology, Cambridge, MA, USA [email protected]N. Takeishi is with RIKEN Center for Advanced Intelligence Project, Tokyo, Japan [email protected]

Abstract

Model uncertainty sets are required in many robust optimization problems, such as robust control and prediction with uncertainty, but there is no definite methodology to generate uncertainty sets for nonlinear dynamical systems. In this paper, we propose a method for model uncertainty set generation via Markov chain Monte Carlo. The proposed method samples from distributions over dynamical systems via metrics over transfer operators and is applicable to general nonlinear systems. We adapt Hamiltonian Monte Carlo for sampling high-dimensional transfer operators in a computationally efficient manner. We present numerical examples to validate the proposed method for uncertainty set generation.

I INTRODUCTION

Generating model uncertainty sets of dynamical systems is a universal problem in situations such as robust control, prediction with uncertainty, and scenario optimization. For example, in a min-max model predictive control (MPC) problem, one would like to solve

\min_{\{u\}}\max_{F\in\mathcal{F}}~{}J(\{u\},x_{0},F),

where $J$ is some loss function, $u$ denotes control signals, $x_{0}$ is an initial state, and $F$ is a dynamics model. Here, $\mathcal{F}$ is a model uncertainty set with which the worst-case performance is to be optimized and thus is key to good performance of the robust controller. However, the way to configure a good $\mathcal{F}$ is not trivial in general.

A possible strategy for model uncertainty set generation is via Bayesian inference, with which we can create an uncertainty set from an inferred posterior. One of the common difficulties in Bayesian inference is that we need to prepare appropriate priors and observation models, which is sometimes challenging when the target dynamics are nonlinear. Moreover, the computational procedures of Bayesian inference may depend on the specific parametrization of dynamics models.

In this paper, we describe an approach toward uncertainty quantification that is parametrization-agnostic, using the transfer operator description of dynamical systems. While commonly used in mathematical physics, the operator-theoretic view of dynamical systems has attracted attention for use in model order reduction, estimation, and control [1]. Considering linear operators (called transfer operators) over function spaces that represent the transition of observable or density functions, we can analyze, identify, and control nonlinear dynamical systems using linear techniques [2, 3, 4, 5, 6, 7, 8, 9].

Model uncertainty quantification within the transfer operator view, however, is a relatively new research area, and developing suitable methods of transfer operator generation for robust optimization and control is the aim of our work. To this end, we develop a method for sampling transfer operators using a metric [10] between nonlinear dynamical systems. The proposed method is based on a Hamiltonian Monte Carlo in the space defined by the metric about a nominal model. Due to the transfer operator representation, our method requires only computing perturbations of linear operators (which are ultimately approximated as $d\times d$ matrices) to generate the uncertainty set, and thus does not require error propagation through complex parametrizations. Using both linear and nonlinear examples, we show how the sampling method preserves qualitative dynamical properties while efficiently searching dynamics space. We finally provide heuristics for sampling transfer operators with constraints.

II RELATED WORK

Uncertainty is inevitable as data are always finite and may include observation noise, but the intersection of uncertainty and the operator-theoretic view on dynamical systems has not yet been explored well. Takeishi et al. [11] discussed a probabilistic interpretation of a technique called dynamic mode decomposition (DMD), which is a widely used algorithm for computing transfer operators, but they only considered the uncertainty of spectral components of the operators. Morton et al. [12] considered the uncertainty of linear transition operators via the uncertainty of observable function values for a model based on neural networks.

In terms of parametric models, uncertainty and inference of parameters in nonlinear differential equations is a well-explored problem [13], with recent advances for accelerating Bayesian inference [14, 15]. These methods are useful in certain contexts (e.g. where only parameters or spectral components are uncertain) but may not fully capture dynamics present in uncertain data. In this case, a more general method is needed to explore the space of dynamics, and this is where data-driven methods for approximating transfer operators such as extended DMD [16] and observable eigenfunction learning [17] are promising.

Complementarily to model uncertainty, state uncertainty has been at the center of interests in the control community. Several researchers have investigated the computation of state uncertainty using operator-theoretic techniques [18, 19, 20]. For example, state-space density propagation using Perron–Frobenius operators has been used for controller selection [20].

Our method is complementary to existing techniques for state- and parameter-space uncertainty quantification in two ways. First, rather than performing error propagation from ensembles of trajectories, we utilize a distribution induced by a metric between transfer operators – thus it is a way to take an arbitrary set of initial models and expand it. Secondly, while the uncertainty set we produce can be used to extrapolate uncertain trajectories and perform state estimation, its use is not only limited to trajectory sampling but a wide range of optimization problems where a set of models is needed, without assuming a specific parametrization.

III BACKGROUND

III-A Transfer operator theory of nonlinear dynamics

A transfer operator is a linear map $\mathcal{L}$ acting on functions $\phi:\mathcal{M}\to\mathbb{C}$ on phase space $\mathcal{M}$ of a dynamical system $F:\mathcal{M}\to\mathcal{M}$ . $\mathcal{L}$ is a linear operator over the function space $\{\phi\}$ and thus is a convenient alternative to the nonlinear state-space function $F$ .

In computational fluid dynamics, statistical physics, control theory, and many other fields, the use of transfer operators in describing dynamical systems has seen a recent surge of popularity since it facilitates the usage of linear techniques (e.g. spectral methods) in analyzing nonlinear systems [1].

When $\phi$ are phase-space densities $p_{\mathcal{M}}(x)$ , $\mathcal{L}$ is referred to as the Perron–Frobenius operator $\mathcal{P}$ that acts as the pushforward operator for the Markov process $p_{\mathcal{M}}^{+}(x)=\mathcal{P}p_{\mathcal{M}}(x)$ . When $\phi$ are arbitrary observables $\phi:\mathcal{M}\to\mathbb{C}$ , $\mathcal{L}$ is known as the Koopman (or composition) operator $\mathcal{K}$ [21]. Seminal work by Mezić [1] showed the utility of transfer operator theory in data-driven system identification and model reduction. Using Koopman eigenfunctions to obtain linear descriptions of nonlinear systems facilitated the study of dynamics via the spectrum of the Koopman operator (e.g., characterizations of chaos [22]):

\displaystyle\Phi(F^{t}(X))=\Lambda^{t}\Phi(X),

where $\Phi$ and $\Lambda$ are the eigenfunctions and the eigenvalues of the Koopman operator, respectively. This led to the least-squares solutions for $\mathcal{K}$ by Schmid et al. [23], called DMD.

DMD has since been extended with dictionaries of nonlinear basis functions (extended DMD [16]), observables in reproducing kernel Hilbert spaces [24, 25], neural networks (e.g., [26]), and the incorporation of control [4]. In this paper, we build upon this prior work, exploring uncertainty quantification in the context of transfer operators which are learned from observation data.

III-B Transfer operators for stochastic processes

To discuss transfer operators in the context of uncertain data, let us move to a fundamentally probabilistic setting where observations $X_{t}\in\mathcal{M}$ represent a stochastic process. While the system may not be intrinsically noisy, the probabilistic view facilitates representing uncertainty about future states. We adopt the framework of stochastic Koopman operators as introduced by Klus et al. [25] and Song et al. [27], and provide a brief background below.

Let a dynamical system $F:\mathcal{M}\to\mathcal{M}$ have invariant probability measure $\mu$ , compactly supported over a measurable subspace $\Omega\subseteq\mathcal{M}$ . Let the sequence $\{X_{t}|t\geq 0\}$ be a Markov process with transition density given by $p(y|x)=\Pr\{F(X)=y|X=x\}$ . Given an observable $g\in L^{2}(\mathcal{M},\mu)$ , the action of the stochastic Koopman operator $\mathcal{K}$ on $g$ is defined by the conditional mean:

\mathcal{K}g(x)=\mathbb{E}[g(F(X))|X=x]=\int_{\Omega}p(y|x)g(y)\mathrm{d}\mu(y).

(1)

Its right-adjoint, the Perron–Frobenius operator $\mathcal{P}$ , directly maps marginal distributions as $p^{+}(x)=\mathcal{P}p(x)=\int_{\Omega}p(y|x)p(x)\mathrm{d}\mu(x)$ .

Let observables $\phi$ lie in an inner product space $\mathcal{H}$ ( $\mathcal{H}$ is taken to be an RKHS in [25], but for our purposes it can be the completion of a finite observable basis $|\Phi|=d$ , and for trajectory visualization one may define these functions such that the preimage can be computed). Now, we may define the Gramian $\mathcal{C}_{XX}$ (i.e., a cross-covariance operator [27]):

\mathcal{C}_{XY}:=\mathbb{E}_{XY}[\Phi(X)\otimes\Phi(Y)]=\int_{\Omega\otimes\Omega}\Phi(x)\otimes\Phi(y)\mathrm{d}p(x,y).

Using the relation $\mathcal{C}_{YX}f=\mathbb{E}_{Y|X}[f(Y)|X]\mathcal{C}_{XX}$ [27], we can express the stochastic Koopman and Perron–Frobenius operators (1) in terms of the cross-covariance operators as:

	$\displaystyle\mathcal{K}$	$\displaystyle\simeq C_{XY}(C_{XX}+\epsilon\mathcal{I})^{-1},$		(2)
	$\displaystyle\mathcal{P}$	$\displaystyle\simeq C_{YX}(C_{XX}+\epsilon\mathcal{I})^{-1},$		(3)

that push forward observables $g\in\mathcal{H}$ and densities $p(x)$ on $\Omega$ , respectively. $\epsilon$ is for ensuring invertibility.

Considering perturbations to these operators, which are estimated from cross-covariances matrices using finite trajectories, forms the population of our uncertainty set.

III-C Kernels over dynamical systems

To compare the behavior of two dynamical systems, various metrics exist in the literature. For Markov processes, total-variation distance between density functions is commonly used, and more general classes of linear metrics over Markov chains have been proposed [28]. The standard operator norm can be used for bounded linear operators; when approximated by matrices, one can use an induced matrix norm, such as $\operatorname{Tr}(A^{T}B)$ where $A$ and $B$ are linear finite dynamical systems. However, both of these standard metrics fail to capture the iterated behavior of the respective systems. To address this, Viswanathan et al. leverage the Binet–Cauchy theorem to define a kernel between trajectories or iterated maps [29].

In [10], Ishikawa et al. generalize the Binet-Cauchy kernel to general nonlinear dynamical systems defined by their respective Perron–Frobenius operators as follows. For two dynamical systems $(D_{1},D_{2})$ specified by their initial values and maps $((X_{1,0},f_{1}),(X_{2,0},f_{2}))$ ,

	$\displaystyle k_{\mathcal{P}}^{m,T}$	$\displaystyle((X_{1,0},f_{1}),(X_{2,0},f_{2}))$		(4)
		$\displaystyle:=\operatorname{Tr}\left(\bigwedge^{m}\sum_{t=0}^{T-1}(L_{1,h}\mathcal{P}_{1}^{t}\mathcal{I}_{1})^{*}L_{2,h}\mathcal{P}_{2}^{t}\mathcal{I}_{2}\right),$		(4)

where $\mathcal{I}:\mathbb{C}^{n}\to\mathcal{H}$ is an initial value operator embedding initial data into a Hilbert space, $\mathcal{P}^{t}$ is the $t$ th iterate of the Perron–Frobenius operator on $\mathcal{H}$ , and $L_{h}:\mathcal{H}\to\mathcal{H}_{\text{ob}}$ embeds states into an observable space. There is a relation between (4) and the determinant kernel in [29]; please refer to [10] for details. In the proposed sampling method, we make use of this kernel (4) to generate perturbations to Koopman operators.

IV PROPOSED METHOD

In this section, we first define a kernel and a pseudo-metric over Koopman operators utilizing the previous studies on operator-theoretic kernels over dynamical systems [30, 10]. Then, we present sampling procedures for dynamical systems using the kernel, which depend on the Hamiltonian Monte Carlo method [31] with some heuristic modifications.

IV-A Defining a kernel over Koopman operators

Whereas the kernel proposed by Ishikawa et al. [10] is defined for Perron–Frobenius operators on RKHS, we adapt it for the Koopman operator as follows. Let the initial value and observable operators $\mathcal{I},\mathcal{L}_{h}$ be already applied, and the kernel (4) evaluated on a dynamical system $\mathcal{K}$ which acts in this (observable) space. Then we simplify (4) to:

\displaystyle k_{\mathcal{K}}^{m,T}

\displaystyle(f_{1},f_{2}):=\operatorname{Tr}\left(\bigwedge^{m}\sum_{t=0}^{T-1}(\mathcal{K}_{1}^{t})^{*}\mathcal{K}_{2}^{t}\right).

(5)

As noted in [10], this kernel is convergent in the limit of $T$ for semi-stable $\mathcal{K}$ , that is, those with spectral radius $\rho(\mathcal{K})\leq 1$ only. To use $k_{\mathcal{K}}^{m,T}$ for general Koopman operators $\mathcal{K}$ , we use an exponential discounting scheme with factor $\lambda\geq 0$ :

\displaystyle k_{\mathcal{K}}^{m,T,\lambda}

\displaystyle(\mathcal{K}_{1},\mathcal{K}_{2})=\operatorname{Tr}\left(\bigwedge^{m}\sum_{t=0}^{T-1}e^{-\lambda t}(\mathcal{K}_{1}^{t})^{*}\mathcal{K}_{2}^{t}\right).

(6)

Note that the discounting factor has been adopted also in previous studies on dynamical system kernels [32, 30].

Let us show the convergence of (6) as $T\to\infty$ informally, for a finite-dimensional approximation $K$ of $\mathcal{K}$ . We first observe that $\sum_{t=0}^{\infty}A^{t}$ converges if $\lim_{t\to\infty}||A^{t}||_{F}=0$ for any matrix $A$ . As the product of two convergent series is convergent, it suffices to show that $\lim_{t\to\infty}||e^{-\lambda t/2}K^{t}||_{F}=0$ for all $K$ . Using Gelfand’s formula,

\displaystyle||e^{-\lambda t/2}K^{t}||_{F}=e^{-\lambda t/2}||K^{t}||_{F}<e^{-\lambda t/2}\rho(K)^{t}.

Even if $\rho(K)>1$ , the series is convergent if $\lambda>2\log\rho(K)$ . Thus $k^{m,T,\lambda}$ converges for all $K$ and appropriate $\lambda$ .

IV-B Sampling from distributions over transfer operators

We propose a sampling procedure for dynamical systems models given an inner product defined over transfer operators. We define a (pseudo-)metric bounded in $[0,1]$ for operators $K_{1},K_{2}$ , using a cosine similarity, as

\displaystyle d_{k}(K_{1},K_{2}):=\sqrt{1-\frac{\langle K_{1},K_{2}\rangle_{k}^{2}}{\langle K_{1},K_{1}\rangle_{k}\langle K_{2},K_{2}\rangle_{k}}},

where $\langle\cdot,\cdot\rangle_{k}$ denotes the inner product induced by a positive-definite kernel $k$ . As a baseline, we may also consider the standard linear kernel $k(A,B)=\operatorname{Tr}(A^{\textsf{T}}B)$ , which is not necessarily a good option for transfer operators.

Let $D\in[0,1]$ be a random variable with density $p_{D}$ . Furthermore, let $K_{0}$ be a nominal transfer operator (such as ones estimated by DMD). Then we define a likelihood of any dynamical system $K$ as:

\displaystyle\mathcal{L}(K\mid K_{0}):=p_{D}(d_{k}(K,K_{0}))

(7)

For example, we may assume $p_{D}$ is a Beta distribution with $\beta\gg\alpha$ , or an exponential distribution with vanishing density past $1$ . Using this we may easily construct an uncertainty set of radius $r$ as $\Delta=\{K\sim\mathcal{L}(K|K_{0})\ |\ d_{k}(K,K_{0})\leq r\}$ .

Sampling $K$ using $\mathcal{L}$ can be done in a number of ways. The conceptually simplest algorithm is rejection sampling: for any uniformly perturbed $K$ , accept if $u<c\mathcal{L}(K|K_{0})$ for $u\sim[0,1]$ and convergence parameter $c$ . Unfortunately, even generating the initial uniform perturbations may be computationally intractable due to the high dimensionality of the samples. This results in low acceptance rates for rejection sampling as well as random-walk MCMC methods such as Metropolis–Hastings. We turn our focus to gradient-based MCMC methods which are able to generate distant proposals and achieve dimensionality-independent acceptance rates.

IV-C High-dimensional sampling via HMC

In his seminal work [31], Neal introduced Hamiltonian Monte Carlo (HMC), which uses the gradient of the likelihood to simulate stochastic Hamiltonian dynamics whose stationary distribution is the posterior (7). It is well suited to our case since many kernels over dynamical systems are basically differentiable.

We adapt HMC for transfer operator sampling by introducing an auxiliary momentum variable $R$ which is of the same dimension as $K$ , whose relation to $K$ is given via Hamiltonian dynamics. Let us define the potential and the Hamiltonian of an operator $K$ as:

	$\displaystyle U(K)$	$\displaystyle=-\log\Big{[}p_{D}\big{(}d_{k}(K,K_{0})\big{)}\Big{]}$		(8)
	$\displaystyle H(K)$	$\displaystyle=U(K)+\frac{1}{2}\operatorname{Tr}(R^{T}R)$		(9)

Noting that the potential $U$ must be defined (and continuous) everywhere in order to achieve the correct stationary distribution, $d_{k}$ must be similarly well-behaved for all $K$ . Using the discounted kernel (6), we generate samples $\{K\}$ of Koopman operators about a nominal $K_{0}$ via Hamiltonian dynamics in the potential defined by (8) using the leapfrog integrator (we refer the reader to [31] for details).

IV-D Heuristic: parallel HMC with uniform prior

The mixing time of HMC is highly sensitive to the choice of discretization parameters (in particular, n_leapfrog, step_size). In practice, Neal recommends $\epsilon\sim O(d^{1/4})$ [31], and there also exist adaptive step-determining algorithms such as the No-U-Turn Sampler [33]; however, we find a tradeoff between computational efficiency (samples/step) and sufficient exploration of the model space when using HMC for transfer operators. To compensate, we use a pre-run of HMC in a zero-potential to generate a uniform prior of samples, which are then used as initial conditions for parallel HMC sampling from the distribution (7). We find that this accelerates the mixing time and wall-clock time for sampling significantly.

IV-E Heuristic: HMC with spectral constraints

In the interest of producing meaningful samples, one may wish to constrain the space of models using prior knowledge of the underlying system. Constraints can be readily expressed as boundary conditions on HMC without changing its stationary solution or reversibility (under some assumptions) – known as Reflective HMC [34]. As an example, suppose that we have knowledge that the system is structurally stable under any realistic perturbation; then, we can (roughly) encode this as a constraint on the spectral radius $\rho(K_{0})-\epsilon\leq\rho(K)\leq\rho(K_{0})+\epsilon$ .

Extending the reflective HMC procedure from [34], we describe a leapfrog integrator (Algorithm 1) which ensures $f(K)\in[a,b]$ for any differentiable, scalar-valued $f$ . In numerical experiments, we use $f(K)=\rho(K)$ and $[a,b]=[\rho(K_{0})-.01,\,\rho(K_{0})+.01]$ .

Algorithm 1 HMC with scalar constraints

procedure BoundedLeapfrog(

K,R,\epsilon,f,a,b

)

R\leftarrow R-1/2\epsilon\nabla_{K}U

for

i\leftarrow 1

L

while

|\epsilon-\epsilon^{\prime}|>\delta

\triangleright

Reflect till exhaustion

\epsilon^{\prime}\leftarrow\epsilon

K,R,\epsilon\leftarrow\textsc{Step}({K},{R},{\epsilon},{f},{a},{b})

R\leftarrow R-1/2\epsilon\nabla_{K}U

return

K,R

function Step(

K,R,\epsilon,f,a,b

)

f(K+\epsilon R)>b

then

K,\epsilon\leftarrow\textsc{FindMax}({K},{f},{b})

\triangleright

Find boundary

R_{\perp}\leftarrow\frac{\langle R,\nabla_{K}f\rangle}{\langle\nabla_{K}f,\nabla_{K}f\rangle}*\nabla_{K}f

\triangleright

Reflection plane

R\leftarrow R-2R_{\perp}

return

K,R,\epsilon

else if

f(K+\epsilon R)<a

then

\cdots

\triangleright

Defined similarly

else

return

K+\epsilon R,R,0

\triangleright

Otherwise, take full step

IV-F Computation of samples in practical settings

We will give two formulations of HMC (8) for finite arguments, one explicitly over transfer operators, and one implicitly over observed trajectories. Either may be used.

Formulation 1

Assume a transfer operator $\mathcal{K}$ for a dynamical system is approximated as a matrix $K\in\mathbb{R}^{d\times d}$ . Then, (6) simplifies to:

		$\displaystyle k^{m,T,\lambda}_{K}(K,K_{0})$		(10)
		$\displaystyle\quad=\sum_{I\subset[d],\|I\|=m}\det\left(\sum_{t=0}^{T-1}e^{-\lambda t}(K^{t})^{\textsf{T}}K_{0}^{t}\right)_{[I,I]},$		(10)

where $A_{[I,I]}$ denotes the submatrix given by indices $I$ . This formulation can be used when we have a nominal Koopman operator estimation $K_{0}$ and want to generate perturbed $K$ ’s.

Formulation 2

Assume observations $x_{i}\in\mathbb{R}^{d}$ of a dynamical system are given as a matrix $X$ . Then, we may define the Hamiltonian (8) for trajectories as:

\displaystyle H(X)

\displaystyle=-\log\left[p_{D}(d_{k}(X,X_{0}))\right]+1/2\langle R,R\rangle.

The kernel (6) can be defined for trajectories of length $T$ in a similar fashion to (10). With $T$ fixed, an explicit discounting term is no longer needed. As an example, for $m=2$ :

\displaystyle k^{2,T,\lambda}(X,X_{0})=\sum_{i,j\in[1,T]}\det\begin{bmatrix}k(x_{i},x_{j})&k(x_{i},x_{0,j})\\ k(x_{0,i},x_{j})&k(x_{0,i},x_{0,j})\end{bmatrix}

for some feature kernel $k(\cdot,\cdot)$ . Here, $x_{i}$ and $x_{0,i}$ are the elements of $X$ and $X_{0}$ , respectively. If we have explicit observables $\phi$ , simply let $k(x,y)=\langle\phi(x),\phi(y)\rangle$ . This formulation allows one to sample trajectories $\{X\}$ about some observed trajectory $X_{0}$ . When used in conjunction with a parametrized dynamics model $F_{\theta}$ , {X} can be used for inference procedures (e.g., E-M) on $\theta$ .

Refer to caption — Figure 1: Perturbations of 2x2 systems. Left to right: spiral sink, spiral source, nodal sink, center, center, saddle. Top to bottom: trace-determinant plots for baseline vs. proposed method ( $m=2,T=80,\lambda=2\log\rho(A_{d})$ ).

V NUMERICAL EXAMPLES

We show how our sampling procedure compares against baseline perturbation methods in generating meaningful perturbations of dynamical systems¹¹1Codes: github.com/ooblahman/koopman-robust-control.. For example, methods in robust optimization (RO) take the general form

\displaystyle\min_{\theta}\max_{A\in\Delta}J(A,\theta)\quad\text{where}\quad\Delta:=\{A+\Delta A\},

where $A$ is a model and $\Delta A$ is an uncertainty structure. $\Delta A$ may have a particular form, e.g. block-diagonal, or unstructured, e.g. $\Delta A\sim\mathcal{N}(0_{d\times d},\Sigma^{2})$ . The goodness of the RO minimizer depends solely on the choice of uncertainty set. Either of these uncertainty structures essentially induces a distribution over the norms of perturbations; we note that this is an assumption, and the subject of our testing is whether this assumption is valid when it is known the perturbed matrices represent dynamical systems.

A norm-bounded perturbation set essentially translates as sampling with a trace kernel:

\displaystyle k(A,A_{0})=\operatorname{Tr}(A^{\textsf{T}}A_{0}),

(11)

which we use as a baseline for comparison. As mentioned in section III-C, despite its naturalness, it is not necessarily suitable for dynamical systems. By contrast, the proposed uncertainty structure induced by (10) incorporates the iterated behavior of $K$ . We demonstrate that the latter better preserves key properties of dynamical systems such as structural stability and attractor basins, while effectively exploring dynamics space, on both linear systems of ODEs and nonlinear systems via the Koopman operator.

V-A 2-dimensional LTI systems

We first consider linear systems of the form $\dot{x}=Ax$ via discretization as $A_{d}=e^{A\Delta t}$ . We use simple 2x2 systems in order to clearly characterize the dynamics in a trace-determinant plot. Using the discounted kernel (10), we are able to generate perturbations of both source- and saddle-types in addition to the semistable regimes. We use spread parameter $\beta=5$ , HMC step $\epsilon=10^{-4}$ , HMC leapfrog $L=100$ , and generate $N=1000$ samples with $k=50$ initial conditions for every test shown. We compare the following two kernels:

\displaystyle\operatorname{Tr}(A^{\textsf{T}}A_{0})~{}\text{in}~{}\eqref{eq:tracekernel}\quad\text{and}\quad k^{m,T,\lambda}_{K}(A,A_{0})~{}\text{in}~{}\eqref{eq:finitekernel},

termed the “trace kernel” and “Koopman kernel”, respectively.

The strength of the proposed method, using $k^{m,T,\lambda}_{K}$ , can be seen when the nominal system is within a region of structural instability (see the two center systems, Figure 1). The trace kernel perturbations venture easily into spiral sink or spiral sources, which are distant in dynamics terms but very close in absolute norm. In these cases, the proposed method using the Koopman kernel retains a much tighter spread in the trace-determinant plane. Furthermore, it can be seen from the posterior distributions that the proposed method is able to explore distant dynamics while staying bounded within structurally similar regions.

V-B Unforced Duffing oscillator

Next, we consider perturbations of a nonlinear system. We use the unforced Duffing equation:

\dot{x}=y,\quad\dot{y}=-0.3y+x-x^{3},

(12)

whose basins of attraction are like in Figure 2.

We use simulated trajectories of length $t=400$ seconds with $8000$ samples per trajectory across $144$ initial conditions in the range $[-2,-2],[2,2]$ . Using $15$ polynomial observables with maximum degree $5$ , we compute the Koopman operator $K$ as $K=YX^{\textsf{T}}(XX^{\textsf{T}})^{\dagger}$ . For all experiments we use spread $\beta=5$ , HMC step $\epsilon=5\times 10^{-5}$ , HMC leapfrog $L=200$ , $N=2000$ samples, and $k=200$ initial conditions. In the shown perturbations, trajectories in the left and right basins of attraction are highlighted in red and blue, respectively (Figure 3).

We immediately observe some qualitative differences between the two perturbation sets. First, it is apparent that perturbations via the Koopman kernel preserve attractor structure in most samples, versus almost none in the baseline setting. In the case of the Duffing oscillator, this is a defining feature, and such preservations are important consideration for any robust prediction or control procedure over dynamics models. Second, a large proportion of samples produced by the baseline method are diverging; these would need to be manually filtered out if used in a robust optimization setting. We observe in experiments that this can be mitigated by restricting the norm of perturbations (i.e., increasing $\beta$ ), but this comes at the cost of decreased exploration of dynamics space and changes the robustness of an RO solution using the perturbation set (moreover, manual filtering changes the posterior, altering the RO problem). We also find that the spectral radius constraint alleviates many of these concerns with the baseline method, however, non-convex reflection is not a trivial procedure to implement in HMC and has not been a typically used method in generating uncertainty sets.

Finally, while attractors are mostly preserved in our method, the attractor basins seem to undergo some geometry warping. This suggests an interpretation of our perturbation method as warping the underlying potential wells, which may have meaningful physical interpretations.

VI CONCLUSIONS

In this work, we developed a method for sampling from distributions over dynamical systems using transfer-operator representations, leveraging operator-theoretic metrics to generate perturbations. We suggested to use the method for model uncertainty set generation, which is a universal problem in robust control and prediction and an important step for uncertainty quantification. Future directions of research include expressing constraints over sampled dynamical systems where we may have domain-specific knowledge.

References

[1] I. Mezić, “Spectral properties of dynamical systems, model reduction and decompositions,” Nonlinear Dynamics, vol. 41, no. 1, pp. 309–325, 2005.
[2] A. Mauroy, F. Forni, and R. Sepulchre, “An operator-theoretic approach to differential positivity,” in Proc. of the 54th IEEE Conf. on Decision and Control, pp. 7028–7033, 2015.
[3] J. Annoni, P. Seiler, and M. R. Jovanović, “Sparsity-promoting dynamic mode decomposition for systems with inputs,” in Proc. of the 55th IEEE Conf. on Decision and Control, pp. 6506–6511, 2016.
[4] J. L. Proctor, S. L. Brunton, and J. N. Kutz, “Dynamic mode decomposition with control,” SIAM J. on Applied Dynamical Systems, vol. 15, no. 1, pp. 142–161, 2016.
[5] A. Surana, “Koopman operator based observer synthesis for control-affine nonlinear systems,” in Proc. of the 55th IEEE Conf. on Decision and Control, pp. 6492–6499, 2016.
[6] A. Mauroy and J. Goncalves, “Linear identification of nonlinear systems: A lifting technique based on the Koopman operator,” in Proc. of the 55th IEEE Conf. on Decision and Control, pp. 6500–6505, 2016.
[7] A. Mauroy and I. Mezić, “Global stability analysis using the eigenfunctions of the Koopman operator,” IEEE Trans. on Automatic Control, vol. 51, no. 1, pp. 3356–3369, 2016.
[8] A. Surana, M. O. Williams, M. Morari, and A. Banaszuk, “Koopman operator framework for constrained state estimation,” in Proc. of the 56th IEEE Conf. on Decision and Control, pp. 94–101, 2017.
[9] N. Takeishi, T. Yairi, and Y. Kawahara, “Factorially switching dynamic mode decomposition for Koopman analysis of time-variant systems,” in Proc. of the 57th IEEE Conf. on Decision and Control, pp. 6402–6408, 2018.
[10] I. Ishikawa, K. Fujii, M. Ikeda, Y. Hashimoto, and Y. Kawahara, “Metric on nonlinear dynamical systems with Perron–Frobenius operators,” in Advances in Neural Information Processing Systems 30, pp. 2856–2866, 2018.
[11] N. Takeishi, Y. Kawahara, Y. Tabei, and T. Yairi, “Bayesian dynamic mode decomposition,” in Proc. of the 26th Int. Joint Conf. on Artificial Intelligence, pp. 2814–2821, 2017.
[12] J. Morton, F. D. Witherden, and M. J. Kochenderfer, “Deep variational Koopman models: Inferring Koopman observations for uncertainty-aware dynamics modeling and control,” in Proc. of the 28th Int. Joint Conf. on Artificial Intelligence, pp. 3173–3179, 2019.
[13] M. Girolami, “Bayesian inference for differential equations,” Theoretical Computer Science, vol. 408, no. 1, pp. 4–16, 2008.
[14] B. Calderhead, M. Girolami, and N. D. Lawrence, “Accelerating bayesian inference over nonlinear differential equations with gaussian processes,” in Advances in neural information processing systems 21, pp. 217–224, 2009.
[15] M. Niu, S. Rogers, M. Filippone, and D. Husmeier, “Fast inference in nonlinear dynamical systems using gradient matching,” in Proc. of the 33rd Int. Conf. on Machine Learning, pp. 1699–1707, 2016.
[16] M. O. Williams, I. G. Kevrekidis, and C. W. Rowley, “A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition,” J. of Nonlinear Science, vol. 25, no. 6, pp. 1307–1346, 2015.
[17] M. Korda and I. Mezić, “Optimal construction of Koopman eigenfunctions for prediction and control.” arXiv:1810.08733, 2018.
[18] A. Surana and A. Banaszuk, “Linear observer synthesis for nonlinear systems using Koopman operator framework,” IFAC-PapersOnLine, vol. 49, no. 18, pp. 716–723, 2016.
[19] T. Shnitzer, R. Talmon, and J.-J. Slotine, “Diffusion maps Kalman filter for a class of systems with gradient flows.” arXiv:1711.09598, 2017.
[20] J. J. Meyers, A. M. Leonard, J. D. Rogers, and A. R. Gerlach, “Koopman operator approach to optimal control selection under uncertainty,” in Proc. of the 2019 American Control Conf., pp. 2964–2971, 2019.
[21] B. O. Koopman and J. v. Neumann, “Dynamical systems of continuous spectra,” Proc. of the National Academy of Sciences, vol. 18, no. 3, pp. 255–263, 1932.
[22] H. Arbabi, Koopman Spectral Analysis and Study of Mixing in Incompressible Flows. PhD thesis, University of California, Santa Barbara, 2017.
[23] P. J. Schmid, “Dynamic mode decomposition of numerical and experimental data,” J. of Fluid Mechanics, vol. 656, pp. 5–28, 2010.
[24] Y. Kawahara, “Dynamic mode decomposition with reproducing kernels for Koopman spectral analysis,” in Advances in Neural Information Processing Systems 29, pp. 911–919, 2016.
[25] S. Klus, I. Schuster, and K. Muandet, “Eigendecompositions of transfer operators in reproducing kernel Hilbert spaces,” J. of Nonlinear Science, vol. 30, p. 283–315, Aug 2019.
[26] N. Takeishi, Y. Kawahara, and T. Yairi, “Learning Koopman invariant subspaces for dynamic mode decomposition.,” in Advances in Neural Information Processing Systems 30, pp. 1130–1140, 2017.
[27] L. Song, J. Huang, A. Smola, and K. Fukumizu, “Hilbert space embeddings of conditional distributions with applications to dynamical systems,” in Proc. of the 26th Int. Conf. on Machine Learning, pp. 961–968, 2009.
[28] P. Daca, T. A. Henzinger, J. Kretínský, and T. Petrov, “Linear distances between Markov chains.” arXiv:1605.00186, 2016.
[29] A. J. Smola and S. Vishwanathan, “Binet-Cauchy kernels.,” in Advances in Neural Information Processing Systems 17, pp. 1441–1448, 2005.
[30] K. Fujii, Y. Inaba, and Y. Kawahara, “Koopman spectral kernels for comparing complex dynamics: Application to multiagent sport plays,” in Lecture Notes in Computer Science, vol. 10536, pp. 127–139, 2017.
[31] R. M. Neal, MCMC Using Hamiltonian Dynamics, ch. 5. CRC Press, 2011.
[32] S. Vishwanathan, A. J. Smola, and R. Vidal, “Binet–Cauchy kernels on dynamical systems and its application to the analysis of dynamic scenes,” Int. J. of Computer Vision, vol. 73, no. 1, p. 95–119, 2007.
[33] M. D. Hoffman and A. Gelman, “The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo,” J. of Machine Learning Research, vol. 15, no. 47, pp. 1593–1623, 2014.
[34] H. Mohasel Afshar and J. Domke, “Reflection, refraction, and Hamiltonian Monte Carlo,” in Advances in Neural Information Processing Systems 28, pp. 3007–3015, 2015.

An MCMC Method for Uncertainty Set Generation via Operator-Theoretic Metrics

Abstract

I INTRODUCTION

II RELATED WORK

III BACKGROUND

III-A Transfer operator theory of nonlinear dynamics

III-B Transfer operators for stochastic processes

III-C Kernels over dynamical systems

IV PROPOSED METHOD

IV-A Defining a kernel over Koopman operators

IV-B Sampling from distributions over transfer operators

IV-C High-dimensional sampling via HMC

IV-D Heuristic: parallel HMC with uniform prior

IV-E Heuristic: HMC with spectral constraints

IV-F Computation of samples in practical settings

Formulation 1

Formulation 2

V NUMERICAL EXAMPLES

V-A 2-dimensional LTI systems

V-B Unforced Duffing oscillator

VI CONCLUSIONS

References

An MCMC Method for Uncertainty Set Generation
via Operator-Theoretic Metrics