On the Problem of Reformulating Systems with Uncertain Dynamics as a Stochastic Differential Equation

Thomas Lew, Apoorva Sharma, James Harrison,
Edward Schmerling, Marco Pavone
The authors are with the Department of Aeronautics & Astronautics, Stanford University, Stanford, CA 94305-4035 USA (emails: {thomas.lew, apoorva, jharrison, schmrlng, pavone}@stanford.edu).
Abstract

We identify an issue in recent approaches to learning-based control that reformulate systems with uncertain dynamics using a stochastic differential equation. Specifically, we discuss the approximation that replaces a model with fixed but uncertain parameters (a source of epistemic uncertainty) with a model subject to external disturbances modeled as a Brownian motion (corresponding to aleatoric uncertainty).

I Problem Formulation and Error in Literature

Consider a nonlinear system whose state at time $t\geq 0$ is $x(t)\in\mathbb{R}^{n}$ and whose control inputs are $u(t)\in\mathbb{R}^{m}$, such that

$$\dot{x}(t)=f(x(t),u(t)),\quad t\in[0,T], \tag{1}$$

where $T>0$, $x(0)=x_{0}\in\mathbb{R}^{n}$ almost surely, i.e., $x(0)$ is known exactly, and $f:\mathbb{R}^{n}\times\mathbb{R}^{m}\rightarrow\mathbb{R}^{n}$ is twice continuously differentiable.

In many applications, $f$ is not known exactly, and prior knowledge is necessary to safely control (1). One approach consists of assuming that $f$ lies in a known space of functions $\mathcal{H}$ and imposing a prior distribution $\mathbb{P}(\mathcal{H})$ on this space. For instance, by assuming that $f$ lies in a bounded reproducing kernel Hilbert space (RKHS), a common approach consists of imposing a Gaussian process prior on the uncertain dynamics $f\sim\mathcal{GP}(m,k)$, where $m:\mathbb{R}^{n+m}\rightarrow\mathbb{R}^{n}$ is the mean function, and $k:\mathbb{R}^{n+m}\times\mathbb{R}^{n+m}\rightarrow\mathbb{R}^{n\times n}$ is a symmetric positive definite covariance kernel function which uniquely defines $\mathcal{H}$ [1, 2]. An alternative consists of assuming that $f(x,u)=\phi(x,u)\theta$, where $\phi:\mathbb{R}^{n+m}\rightarrow\mathbb{R}^{n\times p}$ are known basis functions, and $\theta\in\mathbb{R}^{p}$ are unknown parameters. With this approach, one typically sets a prior distribution on $\theta$, e.g., a Gaussian $\theta\sim\mathcal{N}(\bar{\theta},\Sigma_{\theta})$, and updates this belief as additional data about the system is gathered.

Given these model assumptions and prior knowledge about $f$, safe learning-based control algorithms often consist of designing a control law $u$ satisfying different specifications, e.g., minimizing fuel consumption $\|u\|$, or satisfying constraints $x(t)\in\mathcal{X}\ \forall t\in[0,T]$, with $\mathcal{X}$ a set encoding safety and physical constraints.

Next, we describe an issue with the mathematical formulation of the safe learning-based control problem that has appeared in recent research [3, 4, 5, 6]. For clarity of exposition, and without loss of generality, we slightly change notation and assume a finite-dimensional combination of features. As in [5], consider the problem of safely controlling the uncertain system

$$\dot{x}(t)=\phi(x(t),u(t))\theta,\quad\theta\sim\mathcal{N}(\bar{\theta},\Sigma_{\theta}), \tag{2}$$

where $t\in[0,T]$, $x(0)=x_{0}$, $\bar{\theta}\in\mathbb{R}^{p}$, and $\Sigma_{\theta}\in\mathbb{R}^{p\times p}$ is positive definite with Cholesky decomposition $\Sigma_{\theta}=B_{\theta}B_{\theta}^{\top}$. Note that this formulation can be equivalently expressed in function space, where $f$ is drawn from a Gaussian process with mean function $m(x,u)=\phi(x,u)\bar{\theta}$ and kernel $k(x,u,x^{\prime},u^{\prime})=\phi(x,u)\Sigma_{\theta}\phi(x^{\prime},u^{\prime})^{\top}$, i.e., the covariance that the prior on $\theta$ induces between $f(x,u)$ and $f(x^{\prime},u^{\prime})$. Our representation can be seen as a weight-space treatment of the GP approaches used in [3] and [4]. (Footnote 1: For a squared exponential kernel, one needs $p\rightarrow\infty$ for this equivalence; see [1] for more details. Nevertheless, the issue discussed in this paper remains valid for such kernels.)
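The weight-space and function-space views can be checked numerically. The sketch below is only an illustration under assumed values: the features `phi`, the dimensions ($n=1$, $p=2$), and the numbers in `THETA_BAR` and `B` are hypothetical and not taken from the cited works. It samples $\theta=\bar{\theta}+B_{\theta}z$ with $z\sim\mathcal{N}(0,I)$ and compares the empirical covariance of $f=\phi\theta$ at two inputs with $\phi(x,u)\Sigma_{\theta}\phi(x^{\prime},u^{\prime})^{\top}$.

```python
import math
import random

# Illustrative assumptions (not from the paper): n = 1, p = 2, and these values.
THETA_BAR = [0.5, -1.0]
B = [[1.0, 0.0],
     [0.5, 0.8]]  # lower-triangular Cholesky factor, Sigma = B B^T
SIGMA = [[sum(B[i][k] * B[j][k] for k in range(2)) for j in range(2)]
         for i in range(2)]


def phi(x, u):
    """Hypothetical feature row vector phi(x, u) in R^{1 x 2}."""
    return [x, math.sin(u)]


def kernel(z1, z2):
    """k(z1, z2) = phi(z1) Sigma phi(z2)^T for inputs z = (x, u)."""
    p1, p2 = phi(*z1), phi(*z2)
    return sum(p1[i] * SIGMA[i][j] * p2[j] for i in range(2) for j in range(2))


def sample_theta(rng):
    """Draw theta = theta_bar + B z with z ~ N(0, I)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(2)]
    return [THETA_BAR[i] + sum(B[i][k] * z[k] for k in range(2))
            for i in range(2)]


def empirical_cov(z1, z2, n_samples=50_000, seed=0):
    """Monte Carlo estimate of Cov(f(z1), f(z2)) with f = phi * theta."""
    rng = random.Random(seed)
    p1, p2 = phi(*z1), phi(*z2)
    f1, f2 = [], []
    for _ in range(n_samples):
        theta = sample_theta(rng)
        f1.append(p1[0] * theta[0] + p1[1] * theta[1])
        f2.append(p2[0] * theta[0] + p2[1] * theta[1])
    m1, m2 = sum(f1) / n_samples, sum(f2) / n_samples
    return sum((a - m1) * (b - m2) for a, b in zip(f1, f2)) / (n_samples - 1)


if __name__ == "__main__":
    z1, z2 = (1.0, 0.5), (-0.5, 2.0)
    print("kernel value   :", kernel(z1, z2))
    print("empirical value:", empirical_cov(z1, z2))
```

The two printed values should agree up to Monte Carlo error, which shrinks as `n_samples` grows.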

These works then proceed by introducing the Brownian motion $W(t)$, making the change of variable $\theta\,\mathrm{d}t=\bar{\theta}\,\mathrm{d}t+B_{\theta}\,\mathrm{d}W(t)$, and reformulating (2) as a stochastic differential equation (SDE)

$$\mathrm{d}x(t)=\phi(x(t),u(t))\bar{\theta}\,\mathrm{d}t+\phi(x(t),u(t))B_{\theta}\,\mathrm{d}W(t), \tag{3}$$

with $t\in[0,T]$ and $x(0)=x_{0}$. Unfortunately, (3) is not equivalent to (2). Indeed, the solution to (3) is a Markov process, whereas the solution to (2) is not. Intuitively, the increments of the Brownian motion $W(t)$ in (3) are independent, whereas in (2), $\theta$ is randomized only once, and the uncertainty in its realization is propagated along the entire trajectory. By making this change of variables for $\theta\,\mathrm{d}t$, the temporal correlation between the trajectory $x(t)$ and the uncertain parameters $\theta$ is neglected. In the next section, we provide a few examples to illustrate the distinction between these two cases. The first demonstrates the heart of the issue on a simple autonomous system, whereas the second shows that analyzing the SDE reformulation (3) is insufficient to deduce the closed-loop stability of the system in (2).

II Counter-Examples

II-A Uncontrolled system

Consider the scalar continuous-time linear system

$$\dot{x}(t)=\theta,\quad\theta\sim\mathcal{N}(0,1),\quad t\in[0,T], \tag{4}$$

where $T>0$ and $x(0)=0$ almost surely, i.e., $x(0)$ is known exactly. The solution to (4) satisfies $x(t)=\theta t$, i.e., each sample path is a linear (continuously differentiable) function of time $t\in[0,T]$. The marginal distribution of this stochastic process is Gaussian at any time $t\in[0,T]$, with $x(t)\sim\mathcal{N}(0,t^{2})$. The increments of this process are not independent, since the increment $x(t_{2})-x(t_{1})=\theta(t_{2}-t_{1})=\frac{t_{2}-t_{1}}{t_{1}}x(t_{1})$ is fully determined by the earlier increment $x(t_{1})-x(0)=x(t_{1})$ for any $t_{2}>t_{1}>0$.

Using the change of variables described previously, one might consider substituting $\mathrm{d}W(t)$ for $\theta\,\mathrm{d}t$, where $W(t)$ is a standard Brownian motion, yielding the following SDE

$$\mathrm{d}x(t)=\mathrm{d}W(t),\quad x(0)=0\ \text{(a.s.)},\quad t\in[0,T]. \tag{5}$$

The solution of this SDE is a standard Brownian motion $x(t)=W(t)$ started at $W(0)=0$. This stochastic process has different marginal distributions $x(t)\sim\mathcal{N}(0,t)$, has independent increments, and is almost surely not differentiable at any $t$. We illustrate sample paths of these two different stochastic processes in Figure 1.
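The difference between (4) and (5) can also be seen in a few lines of simulation. The following is a minimal sketch, assuming the evaluation times $t_{1}=1$ and $t_{2}=2$, a fixed seed, and helper names that are purely illustrative. It estimates the variance of $x(t_{2})$ and the correlation between successive increments for both processes.

```python
import math
import random


def sample_fixed_parameter(rng, times):
    """One path of (4): theta is drawn once, then x(t) = theta * t."""
    theta = rng.gauss(0.0, 1.0)
    return [theta * t for t in times]


def sample_brownian(rng, times):
    """One path of (5): x(t) = W(t), built from independent Gaussian increments."""
    x, prev_t, path = 0.0, 0.0, []
    for t in times:
        x += rng.gauss(0.0, math.sqrt(t - prev_t))
        prev_t = t
        path.append(x)
    return path


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def correlation(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def compare(sampler, t1=1.0, t2=2.0, n_paths=20_000, seed=0):
    """Return (Var[x(t2)], Corr[x(t1) - x(0), x(t2) - x(t1)]) for a sampler."""
    rng = random.Random(seed)
    paths = [sampler(rng, [t1, t2]) for _ in range(n_paths)]
    first = [p[0] for p in paths]          # x(t1) - x(0), with x(0) = 0
    second = [p[1] - p[0] for p in paths]  # x(t2) - x(t1)
    return variance([p[1] for p in paths]), correlation(first, second)


if __name__ == "__main__":
    print("fixed parameter (4):", compare(sample_fixed_parameter))
    print("Brownian motion (5):", compare(sample_brownian))
```

For (4), the variance at $t_{2}=2$ should be close to $t_{2}^{2}=4$ with perfectly correlated increments, whereas for (5) it should be close to $t_{2}=2$ with increments that are approximately uncorrelated.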

II-B System with linear feedback

Starting from $x(0)=x_{0}\in\mathbb{R}$, consider the controlled linear system

$$\dot{x}(t)=\theta x(t)+u(t)=(\theta+k)x(t),\quad\theta\sim\mathcal{N}(\bar{\theta},1),\quad t\in[0,T], \tag{6}$$

where $k\in\mathbb{R}$ is a feedback gain and $u(t)=kx(t)$ is the state-feedback control policy. Solutions to (6) take the form $x(t)=x_{0}e^{(\theta+k)t}$. Choosing the gain $k=-(\bar{\theta}+1)$ and simulating from $x_{0}=1$, one obtains the sample paths shown in Figure 2. We observe that some sampled trajectories are unstable, corresponding to samples of $\theta$ such that $\theta+k>0$.

Note that the substitution $\theta\,\mathrm{d}t=\bar{\theta}\,\mathrm{d}t+\mathrm{d}W(t)$ yields the SDE

$$\mathrm{d}x(t)=(\bar{\theta}+k)x(t)\,\mathrm{d}t+x(t)\,\mathrm{d}W(t),\quad t\in[0,T]. \tag{7}$$

The solution of this SDE is a geometric Brownian motion $x(t)=x_{0}e^{(\bar{\theta}+k-\frac{1}{2})t+W(t)}$. Choosing the same control gain $k=-(\bar{\theta}+1)$ and plotting sample paths in Figure 2, we observe that the system (7) is stochastically stable.
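The stability gap between (6) and (7) can be quantified by sampling the two closed-form solutions directly. The sketch below assumes an illustrative prior mean `THETA_BAR = 0.3` and a horizon of $T=10$, neither of which comes from the paper, and uses the gain $k=-(\bar{\theta}+1)$ from the text. It reports the fraction of paths whose magnitude at time $T$ exceeds the initial condition.

```python
import math
import random

THETA_BAR = 0.3         # illustrative prior mean (assumption)
K = -(THETA_BAR + 1.0)  # feedback gain k = -(theta_bar + 1), as in the text
X0 = 1.0


def terminal_state_fixed_parameter(rng, horizon):
    """x(T) for (6): theta drawn once, x(T) = x0 * exp((theta + k) * T)."""
    theta = rng.gauss(THETA_BAR, 1.0)
    return X0 * math.exp((theta + K) * horizon)


def terminal_state_sde(rng, horizon):
    """x(T) for (7): geometric Brownian motion with W(T) ~ N(0, T)."""
    w = rng.gauss(0.0, math.sqrt(horizon))
    return X0 * math.exp((THETA_BAR + K - 0.5) * horizon + w)


def fraction_grown(sampler, horizon=10.0, n_paths=20_000, seed=0):
    """Fraction of sampled paths with |x(T)| > |x0|."""
    rng = random.Random(seed)
    grown = sum(abs(sampler(rng, horizon)) > abs(X0) for _ in range(n_paths))
    return grown / n_paths


if __name__ == "__main__":
    # For (6), a path grows exactly when theta + k > 0, i.e. when a standard
    # normal variable exceeds 1.
    print("P(N(0,1) > 1)      :", 0.5 * math.erfc(1.0 / math.sqrt(2.0)))
    print("fixed parameter (6):", fraction_grown(terminal_state_fixed_parameter))
    print("SDE (7)            :", fraction_grown(terminal_state_sde))
```

Under these assumptions, roughly 16% of the fixed-parameter trajectories grow, independently of the horizon, while almost none of the geometric Brownian motion paths do.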

Figure 1: Visualization of sample paths and confidence intervals for the open loop system given by (4) (green) and for the reformulation presented in [3, 4, 5, 6] (red): the solutions of (4) and of (5) are distinct. The solid and dashed lines represent a handful of sample paths. The shaded regions represent the marginal $95\%$ confidence intervals.

II-C Discrete-time system

The observation we make in this note is well known in the discrete-time problem setting. For example, starting from $x_{0}\in\mathbb{R}$, the linear system with multiplicative uncertainty

$$x_{t+1}=\theta x_{t},\quad\theta\sim\mathcal{N}(\bar{\theta},1),\quad t\in\mathbb{N}, \tag{8}$$

is different from the system with additive disturbances

$$x_{t+1}=\bar{\theta}x_{t}+w_{t},\quad w_{t}\sim\mathcal{N}(0,1),\quad t\in\mathbb{N}, \tag{9}$$

where the disturbances $(w_{t})_{t\in\mathbb{N}}$ are independent and identically distributed. We refer to [7, Chapter 4.7] for further discussion of this topic. We also refer to [8] for a recent analysis of systems of the form of (8) where the parameters $\theta$ are resampled at each time $t$.
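A short rollout makes the discrete-time distinction concrete. This is a sketch under assumed values: `THETA_BAR = 0.5`, $x_{0}=1$, a horizon of 20 steps, and a divergence threshold of 10 are illustrative choices, not taken from the paper. In (8) the parameter is drawn once per rollout, whereas in (9) a fresh disturbance is drawn at every step.

```python
import random

THETA_BAR = 0.5  # illustrative prior mean (assumption)
X0 = 1.0


def rollout_fixed_parameter(rng, horizon):
    """System (8): theta is sampled once and reused at every step."""
    theta = rng.gauss(THETA_BAR, 1.0)
    x = X0
    for _ in range(horizon):
        x = theta * x
    return x


def rollout_additive_noise(rng, horizon):
    """System (9): nominal dynamics plus an i.i.d. disturbance at each step."""
    x = X0
    for _ in range(horizon):
        x = THETA_BAR * x + rng.gauss(0.0, 1.0)
    return x


def summarize(rollout, horizon=20, threshold=10.0, n_paths=20_000, seed=0):
    """Return (sample variance of x_T, fraction of paths with |x_T| > threshold)."""
    rng = random.Random(seed)
    finals = [rollout(rng, horizon) for _ in range(n_paths)]
    mean = sum(finals) / n_paths
    var = sum((x - mean) ** 2 for x in finals) / (n_paths - 1)
    frac = sum(abs(x) > threshold for x in finals) / n_paths
    return var, frac


if __name__ == "__main__":
    print("fixed parameter (8):", summarize(rollout_fixed_parameter))
    print("additive noise (9) :", summarize(rollout_additive_noise))
```

With these values, the terminal variance of (9) settles near $1/(1-\bar{\theta}^{2})=4/3$ and essentially no path exceeds the threshold, while a substantial fraction of the rollouts of (8) diverge because $|\theta|>1$ for the sampled parameter.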

III Implications and Possible Solutions

As (2) and (3) are generally not equivalent, the stability and constraint satisfaction guarantees derived for the SDE (3) in recent research [3, 4, 5, 6] do not necessarily hold for the system (2). (Footnote 2: For instance, the generator of the Markov process solving the SDE (3) (see [9]) is used to prove stability in [3, 4, 5, 6]. Unfortunately, (2) does not yield a Markov process. Thus, it would be necessary to adapt the concept of generator to solutions of (2) before concluding the stability of the original system.) This could yield undesired behaviors when applying such algorithms, developed on an SDE formulation of dynamics (3), to safety-critical systems where uncertainty is better modeled by (2), i.e., dynamical systems with uncertain parameters that are not changing over time.

Although (3) is not equivalent to (2), it is interesting to ask whether (3) is a conservative reformulation of (2) for the purpose of safe control. For instance, given a safe set $\mathcal{X}\subset\mathbb{R}^{n}$, if one opts to encode safety constraints through joint chance constraints of the form

$$\mathbb{P}(x(t)\in\mathcal{X}\ \forall t\in[0,T])\geq 1-\delta, \tag{10}$$

where $\delta\in(0,1)$ is a tolerable probability of failure, there may be settings where a controller $u$ satisfying (10) for the SDE (3) provably satisfies (10) for the uncertain model (2). Indeed, as solutions of (3) may have unbounded total variation (as in the example presented above), which is not the case for solutions of (2), we conjecture that for long horizons $T$, a standard proportional-integral-derivative (PID) controller may better stabilize (2) than (3), and that similar properties hold for adaptive controllers.
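To make the joint chance constraint (10) concrete, the following sketch estimates its left-hand side by Monte Carlo for the uncontrolled example of Section II-A, once for the fixed-parameter model (4) and once for the SDE (5). The safe set $\mathcal{X}=[-2,2]$, the horizon $T=1$, the time grid, and all function names are assumptions made for illustration. Checking the constraint only on a time grid can miss excursions between grid points, so the SDE estimate is slightly optimistic. The example evaluates (10) for one open-loop system and does not by itself support or refute the conjecture above.

```python
import math
import random


def safe_fixed_parameter(rng, horizon, bound, n_steps):
    """Check |x(t)| <= bound on a time grid for (4), where x(t) = theta * t."""
    theta = rng.gauss(0.0, 1.0)
    dt = horizon / n_steps
    return all(abs(theta * dt * i) <= bound for i in range(1, n_steps + 1))


def safe_sde(rng, horizon, bound, n_steps):
    """Check |x(t)| <= bound on a time grid for (5), where x(t) = W(t)."""
    dt = horizon / n_steps
    x = 0.0
    for _ in range(n_steps):
        x += rng.gauss(0.0, math.sqrt(dt))  # independent Brownian increment
        if abs(x) > bound:
            return False
    return True


def estimate_safety(check, horizon=1.0, bound=2.0, n_steps=100,
                    n_paths=5_000, seed=0):
    """Monte Carlo estimate of P(x(t) in [-bound, bound] for all grid times)."""
    rng = random.Random(seed)
    safe = sum(check(rng, horizon, bound, n_steps) for _ in range(n_paths))
    return safe / n_paths


def exact_safety_fixed_parameter(horizon, bound):
    """For (4): P(|theta| * T <= bound) = erf(bound / (T * sqrt(2)))."""
    return math.erf(bound / (horizon * math.sqrt(2.0)))


if __name__ == "__main__":
    print("fixed parameter, exact    :", exact_safety_fixed_parameter(1.0, 2.0))
    print("fixed parameter, estimated:", estimate_safety(safe_fixed_parameter))
    print("SDE, estimated            :", estimate_safety(safe_sde))
```

A given tolerance $\delta$ can then be compared against one minus each estimate to see whether (10) holds under either model for the chosen set and horizon.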

Alternatively, there exist approaches that bound the model error through the Bayesian posterior predictive variance [10] or through confidence sets holding jointly over time [11, 12]. Given these probabilistic bounds, a policy can be synthesized that yields constraint satisfaction guarantees.

Figure 2: Visualization of sample paths and confidence intervals for the controlled system given by (6) (green) and for the reformulation presented in [3, 4, 5, 6] (red): the solutions of (6) and of (7) are distinct. The solid and dashed lines represent a handful of sample paths. The shaded regions represent the marginal $95\%$ confidence intervals.

References

  • [1] C. K. Williams and C. E. Rasmussen, Gaussian Processes for Machine Learning. MIT Press, 2006.
  • [2] M. A. Álvarez, L. Rosasco, and N. D. Lawrence, “Kernels for vector-valued functions: A review,” Foundations and Trends in Machine Learning, vol. 4, no. 3, pp. 195–266, 2012.
  • [3] G. Chowdhary, H. A. Kingravi, J. P. How, and P. A. Vela, “Bayesian nonparametric adaptive control using Gaussian processes,” IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 3, pp. 537–550, 2015.
  • [4] D. D. Fan, J. Nguyen, R. Thakker, N. Alatur, A. Agha-mohammadi, and E. A. Theodorou, “Bayesian learning-based adaptive control for safety critical systems,” in Proc. IEEE Conf. on Robotics and Automation, 2020.
  • [5] Y. K. Nakka, A. Liu, G. Shi, A. Anandkumar, Y. Yue, and S. J. Chung, “Chance-constrained trajectory optimization for safe exploration and learning of nonlinear systems uncertainties,” IEEE Robotics and Automation Letters, vol. 1, no. 1, pp. 1–9, 2020.
  • [6] G. Joshi and G. Chowdhary, “Stochastic deep model reference adaptive control,” in Proc. IEEE Conf. on Decision and Control, 2021.
  • [7] A. McHutchon, “Nonlinear modelling and control using Gaussian processes,” Ph.D. dissertation, University of Cambridge, 2014.
  • [8] R. R. Smith and B. Bamieh, “Median, mean, and variance stability of a process under temporally correlated stochastic feedback,” IEEE Control Systems Letters, vol. 5, no. 3, 2021.
  • [9] J. F. Le Gall, Brownian Motion, Martingales, and Stochastic Calculus.   Springer, 2016.
  • [10] M. J. Khojasteh, V. Dhiman, M. Franceschetti, and N. Atanasov, “Probabilistic safety constraints for learned high relative degree system dynamics,” in 2nd Annual Conference on Learning for Dynamics & Control, 2020.
  • [11] T. Koller, F. Berkenkamp, M. Turchetta, and A. Krause, “Learning-based model predictive control for safe exploration,” in Proc. IEEE Conf. on Decision and Control, 2018.
  • [12] T. Lew, A. Sharma, J. Harrison, A. Bylard, and M. Pavone, “Safe active dynamics learning and control: A sequential exploration-exploitation framework,” 2021, available at https://arxiv.org/abs/2008.11700.