
An LMI Framework for Contraction-based Nonlinear Control Design by Derivatives of Gaussian Process Regression

Yu Kawano (Graduate School of Advanced Science and Engineering, Hiroshima University, Higashi-Hiroshima 739-8527, Japan; email: [email protected]) and Kenji Kashima (Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan; email: [email protected]).
Abstract

Contraction theory formulates the analysis of nonlinear systems in terms of Jacobian matrices. Although this provides the potential to develop a linear matrix inequality (LMI) framework for nonlinear control design, conditions are imposed not on controllers but on their partial derivatives, which makes control design challenging. In this paper, we illustrate that this so-called integrability problem can be solved by a non-standard use of Gaussian process regression (GPR) for parameterizing controllers, and we then establish an LMI framework of contraction-based control design for nonlinear discrete-time systems, as an easy-to-implement tool. Subsequently, we consider the case where the drift vector fields are unknown and employ GPR for functional fitting in its standard use. GPR describes learning errors in terms of probability, and thus we further discuss how to incorporate stochastic learning errors into the proposed LMI framework.

Keywords: Nonlinear systems, discrete-time systems, stochastic systems, contraction analysis, Gaussian process regression
This work was supported in part by JSPS KAKENHI Grant Numbers JP21H04875 and JP21K14185.

1 Introduction

Contraction theory [28, 9] has attracted considerable research attention in the systems and control community, e.g., [7, 40]. It establishes a differential geometric approach to studying incremental properties, i.e., properties between any pair of trajectories. Revisiting nonlinear control theory from contraction perspectives brings new insights not only for stability analysis but also for dissipativity theory [43, 11, 23], balancing theory [25, 19], and monotone systems [10, 20, 21], to name a few. As an advantage over classical Lyapunov theory, the nature of incremental stability enables us to formulate various design problems in a unified framework, such as stabilizing control [29, 24], tracking control [15, 33], observer design [1, 28], and control design for achieving synchronization [1] and other rich behaviors [11]. Owing to the differential geometric feature, these problems are described in terms of Jacobian matrices, which is expected to be another advantage in practical use. Indeed, restricting the class of controllers (and observer gains) to linear ones reduces design problems to linear matrix inequalities (LMIs) [19, 11]. However, nonlinear control design is more involved because of the so-called integrability problem: design conditions are imposed not on controllers but on their partial derivatives, which is the main difficulty in developing an LMI framework for contraction-based nonlinear control design.

To overcome this difficulty, in this paper, we employ Gaussian process regression (GPR), a functional fitting tool [32, 4]. As a non-standard use, we employ GPR to parametrize a controller based on two important features: 1) computing derivatives of GPR is easy; 2) GPR is linear with respect to its parameters while possessing the flexibility to describe a nonlinear function. Utilizing these two features, we describe a condition for control design in terms of LMIs with respect to the parameters of GPR. Namely, we establish an LMI framework for contraction-based nonlinear stabilizing control design by explicitly addressing the integrability problem. We mainly consider nonlinear discrete-time systems with constant input vector fields and constant metrics for contraction. Then, we show that the formulations remain in the LMI framework for discrete-time systems with non-constant input vector fields and for continuous-time systems with non-constant input vector fields and non-constant metrics. The proposed method is further applicable to the aforementioned design problems thanks to the unified problem formulation provided by contraction theory.

In systems and control, GPR is typically used to estimate an unknown drift vector field from measured states, and [17, 36, 35, 2, 41, 42] study control design for learned models. Other applications are the joint estimation of the state and model [3, 6] and solving Hamilton-Jacobi equations/inequalities [18, 14]. In particular, [35, 2, 41, 42] study closed-loop stability under learning errors. However, controllers are designed without taking learning errors into account, and learning errors are used for closed-loop analysis only. Motivated by these works, in this paper, we also consider the case where a drift vector field is unknown and is learned by GPR. In contrast to the conventional approach, we compensate for the learning error by control design, which is a benefit of developing an LMI framework for control design while learning models by GPR. The proposed approach can be generalized to the case where the whole system dynamics are unknown, and only the system's input and output are measurable, since there are learning approaches by GPR in such a setting [13, 12].

As relevant research, [38, 39, 35] give neural network frameworks for contraction-based control design, which require iterations for finding suitable parameters, in contrast to the proposed LMI framework. The paper [17] formulates control design for models learned by GPR in an LMI framework and designs a switching linear controller, but it neither uses GPR for control design nor is based on contraction theory. Therefore, the result in [17] is not applicable to solving the integrability problem. In this paper, we establish an LMI framework for contraction-based control design by utilizing the derivatives of GPR to solve the integrability problem.

The remainder of this paper is organized as follows. In Section 2, we pose the problem formulation by explaining the integrability problem of control design in contraction theory. In Section 3, we develop an LMI framework for contraction-based control design by utilizing derivatives of GPR. In Section 4, we consider the case where drift vector fields are unknown. In Section 5, the proposed control design method is illustrated by means of an example.

Notation: The sets of real numbers, non-negative integers, and positive integers are denoted by $\mathbb{R}$, $\mathbb{Z}_{\geq 0}$, and $\mathbb{Z}_{>0}$, respectively. The identity matrix of size $n$ is denoted by $I_n$. For $P, Q \in \mathbb{R}^{n \times n}$, $P \succ Q$ (resp. $P \succeq Q$) means that $P - Q$ is symmetric and positive definite (resp. semi-definite). The Euclidean norm of $x \in \mathbb{R}^n$ weighted by $P \succ 0$ is denoted by $|x|_P := \sqrt{x^\top P x}$. If $P = I_n$, this is simply denoted by $|x|$. The Moore-Penrose inverse of a matrix $A$ is denoted by $A^+$.

For a function $f(x, y)$, the row vector-valued function consisting of its partial derivatives with respect to $x$ is denoted by $\partial_x f := \partial f / \partial x = [\partial f / \partial x_1 \ \cdots \ \partial f / \partial x_n]$. If $f$ depends on $x$ only, this is simply denoted by $\partial f$. For a scalar-valued function $k(x, x')$, its Hessian matrix is denoted by $\partial^2 k(x, x') := \partial^2 k(x, x') / \partial x \partial x'$, which is a matrix-valued function. The multivariate normal distribution with mean $\mu$ and variance $\Sigma$ is denoted by $\mathcal{N}(\mu, \Sigma)$. The (standard) expectation is denoted by $\mathbb{E}[\cdot]$. A stochastic process $\{\omega_k\}_{k \in \mathbb{Z}_{\geq 0}}$ is said to be i.i.d. if the $\omega_k$, $k \in \mathbb{Z}_{\geq 0}$, are independently distributed, and none of the characteristics of $\omega_k$ changes with $k \in \mathbb{Z}_{\geq 0}$.

2 Preliminaries

2.1 Problem Formulation

Consider the following discrete-time nonlinear system:

$$x_{k+1} = f(x_k) + b u_k, \quad k \in \mathbb{Z}_{\geq 0}, \qquad (1)$$

where $x_k \in \mathbb{R}^n$ and $u_k \in \mathbb{R}$ denote the state and input, respectively; $f : \mathbb{R}^n \to \mathbb{R}^n$ is of class $C^1$, i.e., continuously differentiable, and $b \in \mathbb{R}^n$. The $i$th components of $f$ and $b$ are denoted by $f_i$ and $b_i$, respectively. For the sake of notational simplicity, we consider single-input systems. However, the results can readily be generalized to multiple-input systems, as explained below. We also later discuss the case where $b$ is a function of $x$.

In contraction theory [28, 37, 22], we study incremental stability as a property of pairs of trajectories, stated below.

Definition 2.1.

The system $x_{k+1} = f(x_k)$ (with its copy $x'_{k+1} = f(x'_k)$) is said to be incrementally exponentially stable (IES) if there exist $a > 0$ and $\lambda \in (0, 1)$ such that

$$|x_k - x'_k| \leq a \lambda^k |x_0 - x'_0|, \quad \forall k \in \mathbb{Z}_{\geq 0}$$

for each $(x_0, x'_0) \in \mathbb{R}^n \times \mathbb{R}^n$. $\lhd$

Applying [37, Theorem 15] to control design yields the following IES condition.

Proposition 2.2.

Suppose that there exist $\varepsilon > 0$, $P \in \mathbb{R}^{n \times n}$, and a continuous $p_\partial : \mathbb{R}^n \to \mathbb{R}^{1 \times n}$ such that

$$\begin{bmatrix} P & * \\ \partial f(x) P + b p_\partial(x) & P \end{bmatrix} \succeq \varepsilon I_{2n} \qquad (2)$$

for all $x \in \mathbb{R}^n$, where $*$ denotes the block that makes the matrix symmetric. If there exists $p : \mathbb{R}^n \to \mathbb{R}$ of class $C^1$ such that

$$p_\partial(x) = \partial p(x) P \qquad (3)$$

for all $x \in \mathbb{R}^n$, then the closed-loop system $x_{k+1} = f(x_k) + b p(x_k)$ is IES.

Proof. By the Schur complement, the set of (2) and (3) is equivalent to $P \succ 0$ and

$$P (\partial f(x) + b \partial p(x))^\top P^{-1} (\partial f(x) + b \partial p(x)) P - P \preceq -\varepsilon I_n. \qquad (4)$$

This is nothing but the definition of uniform contraction with $\Theta = P^{-1/2}$ [37, Definition 6] for the closed-loop system $x_{k+1} = f(x_k) + b p(x_k)$. Therefore, [37, Theorem 15] concludes IES of the closed-loop system. ∎

In [37, Theorem 15], it has been shown that an IES system admits a state-dependent $P$ satisfying an inequality similar to (2). Moreover, such a $P$ is uniformly lower and upper bounded by constant matrices. It is not yet clear when $P$ becomes constant. If one restricts the class of controllers to linear ones for a constant $P$, control design can be reduced to linear matrix inequalities (LMIs); see, e.g., [11, 19]. Namely, control design can sometimes be formulated as a practically solvable problem even for nonlinear systems. Indeed, (2) is an LMI with respect to $\varepsilon$, $P$, and $p_\partial(x)$ at each $x \in \mathbb{R}^n$. However, as in Proposition 2.2, designing nonlinear controllers is not fully formulated in the LMI framework, due to the so-called integrability constraint (3), i.e., $p_\partial(x) P^{-1}$ needs to be the partial derivative of some function $p(x)$ providing a feedback control law $u = p(x)$. The main objective of this paper is to develop an LMI framework for stabilizing nonlinear control design by tackling the integrability constraint, stated below.

Problem 2.3.

Given $f(x)$ and $b$ of the system (1), develop an LMI framework for designing $p : \mathbb{R}^n \to \mathbb{R}$ which satisfies all the conditions in Proposition 2.2, if such a $p$ exists. $\lhd$

Remark 2.4.

We later consider the case where $f$ is unknown and learn it by GPR. As a byproduct of developing control design methodologies in the LMI framework, it is possible to compensate for learning errors by control design. $\lhd$

An important feature of contraction theory is to study the convergence between any pair of trajectories. By virtue of this, one can handle observer design [1, 28], tracking control [15, 33], and control design for imposing synchronization [1] and rich behavior such as limit cycles [11] in the same framework as stabilizing control design. These references mainly focus on continuous-time systems, but similar results can be derived for discrete-time systems, since incremental stability conditions in contraction analysis have also been established for discrete-time systems [28, 37]. Moreover, the proposed method in this paper can be generalized to continuous-time systems, as will be explained in Section 3.3. Therefore, solving Problem 2.3 can result in LMI frameworks for various design problems.

Integrability constraints sometimes appear in nonlinear adaptive control or observer design. Since directly addressing integrability constraints is difficult, there are techniques for avoiding them by increasing the dynamic order of the identifier. However, as pointed out by [27], adding extra dynamics can degrade control performance, and it has not been validated that such an approach works for contraction-based control design. Therefore, it is worth solving Problem 2.3 directly. In particular, we provide an LMI framework for control design, which is easy to implement. The proposed method may be tailored for nonlinear adaptive control or observer design, although this is beyond the scope of this paper.

2.2 Gaussian Process Regression

To solve Problem 2.3, we employ Gaussian process regression (GPR) [32, 4]. In this subsection, we briefly summarize the basics of GPR and, in the next section, demonstrate that GPR is a suitable tool for handling problems involving partial derivatives, e.g., integrability conditions.

Let $\{x^{(i)}\}_{i=1}^N$, $x^{(i)} \in \mathbb{R}^n$ be input data, and let $\{y^{(i)}\}_{i=1}^N$, $y^{(i)} \in \mathbb{R}$ be the corresponding output data given by

$$y^{(i)} = p(x^{(i)}) + \omega^{(i)}, \quad i = 1, \dots, N, \qquad (5)$$

where $\omega^{(i)} \sim \mathcal{N}(0, \sigma^2)$ is i.i.d. GPR is a technique to learn an unknown function $p : \mathbb{R}^n \to \mathbb{R}$ from input-output data $\{x^{(i)}, y^{(i)}\}_{i=1}^N$ by modeling $p$ as a Gaussian process (GP).

A stochastic process $p$ is said to be a GP if any finite set $\{p(x^{(i)})\}_{i=1}^N$ has a joint Gaussian distribution [32, Definition 2.1]. A GP is completely specified by its mean function $m_p : \mathbb{R}^n \to \mathbb{R}$ and covariance function $k_p : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$. They are defined by

$$m_p(x) = \mathbb{E}[p(x)] \qquad (6)$$
$$k_p(x, x') = \mathbb{E}[(p(x) - m_p(x))(p(x') - m_p(x'))], \qquad (7)$$

and we represent the GP by $p \sim \mathcal{GP}(m_p(x), k_p(x, x'))$.

The essence of GPR is to estimate $m_p(x)$ and $k_p(x, x')$ as the posterior mean and covariance given $\{x^{(i)}, y^{(i)}\}_{i=1}^N$ by Bayesian estimation [32, 4]. Typically, the prior mean $m_0(x)$ is selected as zero. The prior covariance $k_0(x, x')$ needs to be a positive definite kernel [32, Section 6], and we also require smoothness. Standard kernels are the linear, polynomial, and squared exponential (SE) kernels [32, 4], which are all smooth and positive definite. For instance, an SE kernel is

$$k_0(x, x') = \beta e^{-|x - x'|^2_{\Sigma^{-1}} / 2}, \qquad (8)$$

where $\beta > 0$ and $\Sigma \succ 0$, in addition to $\sigma > 0$ in (5), are free parameters, called hyperparameters. The hyperparameters can be selected to maximize the marginal likelihood of the observed data; see, e.g., [32, Section 5.4].
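As a concrete illustration, the following minimal Python sketch (ours, not from the paper) evaluates the SE kernel (8) and assembles its Gram matrix over a data set; `beta` and `Sigma_inv` play the roles of $\beta$ and $\Sigma^{-1}$.

```python
import numpy as np

def se_kernel(x, xp, beta=1.0, Sigma_inv=None):
    # SE kernel (8): k0(x, x') = beta * exp(-|x - x'|^2_{Sigma^{-1}} / 2)
    d = np.asarray(x, dtype=float) - np.asarray(xp, dtype=float)
    if Sigma_inv is None:
        Sigma_inv = np.eye(d.size)  # default choice Sigma = I_n
    return beta * np.exp(-0.5 * d @ Sigma_inv @ d)

def gram_matrix(X, beta=1.0, Sigma_inv=None):
    # Gram matrix [k0(x^(i), x^(j))] for the rows of X (an N x n array)
    N = X.shape[0]
    return np.array([[se_kernel(X[i], X[j], beta, Sigma_inv)
                      for j in range(N)] for i in range(N)])
```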

To use GPR for learning $p(x)$, we need its data as in (5). However, Proposition 2.2 contains no information about $p(x)$ itself, only about $\partial p(x)$ via the integrability constraint (3). To handle this situation, we employ derivatives of GPR, as elaborated in the next section.

3 Contraction-based Nonlinear Control Design

In this section, we establish an LMI framework for contraction-based nonlinear control design by utilizing derivatives of GPR to solve the integrability problem. Then, we discuss generalizations of the proposed method to non-constant input vector fields and continuous-time cases.

3.1 LMI Frameworks for Nonlinear Control Design

Taking the partial derivatives of (6) and (7), we obtain the joint distribution of $p(x)$ and $\partial p(x)$ as

$$\begin{bmatrix} p \\ \partial^\top p \end{bmatrix} \sim \mathcal{GP}\left( \begin{bmatrix} m_p(x) \\ \partial^\top m_p(x) \end{bmatrix}, \begin{bmatrix} k_p(x, x') & \partial_{x'} k_p(x, x') \\ \partial_x^\top k_p(x, x') & \partial^2 k_p(x, x') \end{bmatrix} \right); \qquad (9)$$

see, e.g., [30, Equation (2)]. The goal of this subsection is to find $m_p(x)$ by Bayesian estimation such that $p(x) = m_p(x)$ is a solution to Problem 2.3. For the sake of notational simplicity, we select the prior mean of $p(x)$ as zero.

A standard procedure of Bayesian estimation is to first select a class $C^2$ positive definite kernel $k_0(x, x')$ as a prior covariance and then compute the posterior mean $m_p(x)$ given data of $\partial p(x)$. Looking at this from a different angle, $m_p(x)$ can be viewed as a function of the data of $\partial p(x)$. Based on this perspective, we consider generating suitable data such that $m_p(x)$ becomes a solution to Problem 2.3, as a non-standard use of GPR.

To this end, let $\{x^{(i)}, \hat{p}^{(i)}\}_{i=1}^N$ denote a data set to be generated, where

$$\hat{p}^{(i)} = \partial p(x^{(i)}) + \omega_p^{(i)}, \quad i = 1, \dots, N. \qquad (10)$$

The role of the i.i.d. $\omega_p^{(i)} \sim \mathcal{N}(0, \sigma_p^2 I_n)$ is explained later; one can simply choose it as zero. Define the vector consisting of $\{\hat{p}^{(i)}\}_{i=1}^N$ by

$$Y_p := \begin{bmatrix} \hat{p}^{(1)} & \cdots & \hat{p}^{(N)} \end{bmatrix} \in \mathbb{R}^{nN}.$$

Also, we introduce the following notations:

$$X_N := \begin{bmatrix} (x^{(1)})^\top & \cdots & (x^{(N)})^\top \end{bmatrix}^\top \in \mathbb{R}^{nN}$$
$$\partial p(X_N) := \begin{bmatrix} \partial p(x^{(1)}) & \cdots & \partial p(x^{(N)}) \end{bmatrix} \in \mathbb{R}^{nN}$$
$$\mathbf{k}'_{\partial,0}(x') := \begin{bmatrix} \partial_x k_0(x^{(1)}, x') & \cdots & \partial_x k_0(x^{(N)}, x') \end{bmatrix}$$
$$\mathbf{k}_{\partial,0}(x) := \begin{bmatrix} \partial_{x'} k_0(x, x^{(1)}) & \cdots & \partial_{x'} k_0(x, x^{(N)}) \end{bmatrix}$$
$$K_0 := \begin{bmatrix} \partial^2 k_0(x^{(1)}, x^{(1)}) & \cdots & \partial^2 k_0(x^{(1)}, x^{(N)}) \\ \vdots & & \vdots \\ \partial^2 k_0(x^{(N)}, x^{(1)}) & \cdots & \partial^2 k_0(x^{(N)}, x^{(N)}) \end{bmatrix} \in \mathbb{R}^{nN \times nN}. \qquad (11)$$

Then, the joint distribution of the prior distribution of $p(x)$, denoted by $p^{\rm pri}(x)$, and $\partial p(X_N)$ is

$$\begin{bmatrix} p^{\rm pri} \\ \partial^\top p(X_N) \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} k_0(x, x') & \mathbf{k}_{\partial,0}(x) \\ (\mathbf{k}'_{\partial,0})^\top(x') & K_0 + \sigma_p^2 I_{nN} \end{bmatrix} \right).$$

By Bayesian estimation, we can compute the posterior mean $m_p(x)$ given $Y_p$ as

$$m_p(x | Y_p) := \sum_{i=1}^N \partial_{x'} k_0(x, x^{(i)}) h_p^{(i)}, \qquad (12)$$

where $h_p^{(i)}$ denotes the $i$th block-component of size $n$ of $h_p$ defined by

$$h_p := (K_0 + \sigma_p^2 I_{nN})^{-1} Y_p^\top \in \mathbb{R}^{nN}. \qquad (13)$$

The partial derivative of $m_p(x | Y_p)$ is easy to compute:

$$\partial m_p(x | Y_p) = \frac{\partial}{\partial x} \left( \sum_{i=1}^N \partial_{x'} k_0(x, x^{(i)}) h_p^{(i)} \right). \qquad (14)$$

Especially if $\sigma_p = 0$ and $K_0 \succ 0$, it follows from (14) that

$$\partial m_p(x^{(i)} | Y_p) = \hat{p}^{(i)}, \quad i = 1, \dots, N. \qquad (15)$$

Note that $m_p(x | Y_p)$ and $\partial m_p(x | Y_p)$ are nonlinear functions of $x$ and linear functions of $Y_p$. Therefore, control design reduces to generating suitable $Y_p$, i.e., $\{x^{(i)}, \hat{p}^{(i)}\}_{i=1}^N$, which proceeds based on Proposition 2.2.
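To make (12)-(14) concrete, the following NumPy sketch (ours) implements them for the SE kernel (8) with $\Sigma = I_n$, for which the derivatives are available in closed form: $\partial_{x'} k_0(x, x') = k_0(x, x')(x - x')^\top$ and $\partial^2 k_0(x, x') = k_0(x, x')(I_n - (x - x')(x - x')^\top)$. The function names are our choices, and the sketch assumes $K_0 + \sigma_p^2 I_{nN}$ is invertible.

```python
import numpy as np

def k0(x, xp, beta=1.0):
    d = x - xp
    return beta * np.exp(-0.5 * d @ d)          # SE kernel with Sigma = I_n

def dk0_dxp(x, xp, beta=1.0):
    return k0(x, xp, beta) * (x - xp)           # row vector d k0 / d x'

def d2k0(x, xp, beta=1.0):
    d = x - xp                                  # mixed Hessian d^2 k0 / (dx dx')
    return k0(x, xp, beta) * (np.eye(d.size) - np.outer(d, d))

def fit_hp(Xd, Yp, sigma_p=0.0, beta=1.0):
    # h_p = (K0 + sigma_p^2 I_{nN})^{-1} Y_p^T as in (13); Xd is an N x n array
    N, n = Xd.shape
    K0 = np.block([[d2k0(Xd[i], Xd[j], beta) for j in range(N)] for i in range(N)])
    return np.linalg.solve(K0 + sigma_p**2 * np.eye(n * N), np.ravel(Yp))

def m_p(x, Xd, hp, beta=1.0):
    # posterior mean (12): sum_i d k0/dx'(x, x^(i)) h_p^(i)
    N, n = Xd.shape
    return sum(dk0_dxp(x, Xd[i], beta) @ hp[i*n:(i+1)*n] for i in range(N))

def dm_p(x, Xd, hp, beta=1.0):
    # derivative (14): row vector sum_i d2k0(x, x^(i)) h_p^(i)
    N, n = Xd.shape
    return sum(d2k0(x, Xd[i], beta) @ hp[i*n:(i+1)*n] for i in range(N))
```

With $\sigma_p = 0$ and $K_0 \succ 0$, one can check numerically that `dm_p(Xd[i], Xd, hp)` reproduces $\hat{p}^{(i)}$, which is exactly (15).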

Now, we are ready to develop an LMI framework for contraction-based nonlinear control design. Substituting $p_\partial = \partial m_p(x | Y_p) P$ into (2) does not give an LMI because of the coupling between $Y_p$ and $P$. This issue can be addressed by a standard LMI technique. According to [34, Theorem 2.3.11], (2), i.e., the set of $P \succ 0$ and (4), implies, for all $x \in \mathbb{R}^n$,

$$\left\{ \begin{array}{l} P \succ 0 \\ b^+ (P - \partial f(x) P \partial^\top f(x)) (b^+)^\top \succeq \varepsilon_p I_{n-1}. \end{array} \right. \qquad (18)$$

This is an LMI with respect to $P$ and $\varepsilon_p > 0$ at each $x \in \mathbb{R}^n$. For a solution $P$ to (18) and $p_\partial = \partial m_p(x | Y_p) P$, one only has to solve (2) with respect to $Y_p$. The proposed procedure for solving Problem 2.3 is summarized in Algorithm 1.

Algorithm 1 Solving Problem 2.3
Input: $f(x)$, $b$, $k_0(x, x')$, $\sigma_p \geq 0$, $\{x^{(i)}\}_{i=1}^N$
Output: $p(x)$ and $P$ (if they exist)
1: Define $\partial m_p(x | Y_p)$ as in (14)
2: Solve the following finite family of LMIs with respect to $P$ and $\varepsilon_p > 0$:
$$\left\{ \begin{array}{l} P \succ 0 \\ b^+ (P - \partial f(x^{(i)}) P \partial^\top f(x^{(i)})) (b^+)^\top \succeq \varepsilon_p I_{n-1} \end{array} \right. \quad i = 1, \dots, N \qquad (21)$$
3: For $P$ obtained in Step 2, solve the following finite family of LMIs with respect to $Y_p$ and $\varepsilon > 0$:
$$\begin{bmatrix} P & * \\ (\partial f(x^{(i)}) + b \partial m_p(x^{(i)} | Y_p)) P & P \end{bmatrix} \succeq \varepsilon I_{2n}, \quad i = 1, \dots, N \qquad (23)$$
4: Define $m_p(x | Y_p)$ as in (12) by using $Y_p$ obtained in Step 3
5: Return $p(x) := m_p(x | Y_p)$ and $P$
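The following CVXPY sketch (ours) mirrors Algorithm 1, reusing `d2k0` from the previous sketch. Two caveats: we read $b^+$ in (21) as a left annihilator $B$ of $b$ (a full-rank $(n-1) \times n$ matrix with $Bb = 0$), which matches the dimension $I_{n-1}$; and we fix $\varepsilon_p$, $\varepsilon$ as small parameters and optimize directly over $h_p$ instead of $Y_p$, which is equivalent since $h_p = (K_0 + \sigma_p^2 I_{nN})^{-1} Y_p^\top$ is a bijection.

```python
import numpy as np
import cvxpy as cp

def algorithm1(df, b, Xd, beta=1.0, eps_p=1e-3, eps=1e-3):
    # df(x): Jacobian of f at x (n x n); b: constant input vector (n,); Xd: N x n data points
    N, n = Xd.shape
    B = np.linalg.svd(b.reshape(1, n))[2][1:, :]   # rows span the left annihilator of b
    # Step 2: solve (21) for P
    P = cp.Variable((n, n), symmetric=True)
    cons = [P >> eps_p * np.eye(n)]
    for i in range(N):
        A = df(Xd[i])
        cons.append(B @ (P - A @ P @ A.T) @ B.T >> eps_p * np.eye(n - 1))
    cp.Problem(cp.Minimize(0), cons).solve()
    Pv = P.value
    # Step 3: solve (23) for h_p (equivalently Y_p)
    h = cp.Variable(n * N)
    cons = []
    for i in range(N):
        Gi = np.hstack([d2k0(Xd[i], Xd[j], beta) for j in range(N)])  # n x nN
        dm = cp.reshape(Gi @ h, (1, n))            # row vector dm_p(x^(i) | Y_p)
        Acl = df(Xd[i]) + b.reshape(n, 1) @ dm     # closed-loop Jacobian at x^(i)
        M = cp.bmat([[Pv, (Acl @ Pv).T], [Acl @ Pv, Pv]])
        cons.append(M >> eps * np.eye(2 * n))
    cp.Problem(cp.Minimize(cp.sum_squares(h)), cons).solve()
    return h.value, Pv   # the controller is then p(x) = m_p(x, Xd, h.value)
```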

If Algorithm 1 has a set of solutions, the obtained $p(x)$ is a solution to Problem 2.3 at each data point, as stated below.

Theorem 3.1.

Given a system (1), a class $C^2$ positive definite kernel $k_0(x, x')$, $\sigma_p \geq 0$, and $\{x^{(i)}\}_{i=1}^N$, suppose that Algorithm 1 has a set of solutions $p(x)$ and $P$. Then, $p(x)$ satisfies (2) and (3) for all $x = x^{(i)}$, $i = 1, \dots, N$. $\lhd$

Proof. Using a set of solutions $p(x)$ and $P$, define $p_\partial(x) = \partial p(x) P$. Then, (3) holds for all $x \in \mathbb{R}^n$. It suffices to confirm that (2) holds at $x = x^{(i)}$, $i = 1, \dots, N$. Since $p(x) = m_p(x | Y_p)$, we have $\partial p(x^{(i)}) = \partial m_p(x^{(i)} | Y_p)$, $i = 1, \dots, N$. Therefore, (23) implies (2) for $x = x^{(i)}$, $i = 1, \dots, N$. ∎

Note that in Algorithm 1, the number $N$ of training data and the set of data points $\{x^{(i)}\}_{i=1}^N$ can be chosen arbitrarily. As $N$ increases, the dimension of $Y_p \in \mathbb{R}^{nN}$ in (23) increases. However, the problem is still convex. As explained in the next subsection, for sufficiently large $N$, one can show that $p(x) = m_p(x | Y_p)$ is a solution to Problem 2.3 also at points other than $\{x^{(i)}\}_{i=1}^N$ when the $x^{(i)}$ are distributed evenly in the state space. Moreover, such an $N$ can be found.

When $\sigma_p = 0$ and $K_0 \succ 0$, (15) helps to simplify Algorithm 1. In fact, we only have to solve a finite family of LMIs once, as stated below without proof.

Corollary 3.2.

Given a class $C^2$ positive definite kernel $k_0(x, x')$ and $\{x^{(i)}\}_{i=1}^N$, suppose that $K_0 \succ 0$, and the following finite family of LMIs admits a set of solutions $\{\bar{p}^{(i)}\}_{i=1}^N$, $P \in \mathbb{R}^{n \times n}$, and $\varepsilon > 0$:

$$\begin{bmatrix} P & * \\ \partial f(x^{(i)}) P + b \bar{p}^{(i)} & P \end{bmatrix} \succeq \varepsilon I_{2n}, \quad i = 1, \dots, N. \qquad (24)$$

Define $\hat{p}^{(i)} = \bar{p}^{(i)} P^{-1}$, $i = 1, \dots, N$ and choose $\sigma_p = 0$. Then, $p(x) := m_p(x | Y_p)$ defined by (12) satisfies (2) and (3) for all $x = x^{(i)}$, $i = 1, \dots, N$. $\lhd$

Remark 3.3.

It is not guaranteed that the controller $p(x)$ constructed in Theorem 3.1 preserves an equilibrium point $x^*$ of $x_{k+1} = f(x_k)$. However, it is easy to impose $p(x^*) = 0$. One approach is to use the shifted control $u = p(x) - p(x^*)$. Another approach is to utilize (5). Consider new data $\{x^{(i)}, y^{(i)}\}_{i=N+1}^M$ in (5), and define $Y := [y^{(N+1)} \ \cdots \ y^{(M)}]^\top$. Then, the posterior mean $m_p(x)$ given $Y$ and $Y_p$ can also be computed by Bayesian estimation. As a special case, specifying $M = N + 1$, $x^{(N+1)} = x^*$, $y^{(N+1)} = 0$, and $\sigma = 0$ can result in $p(x^*) = 0$ for the learned $p(x)$. More generally, one can specify the values of $p(x)$ at arbitrary finite points $\{x^{(i)}\}_{i=N+1}^M$. $\lhd$

Remark 3.4.

In the multiple-input case, each component $p_i$ of $p$ can be designed separately by introducing $m_{p,i}(x | Y_p)$ corresponding to $p_i$. Namely, the results obtained in this paper can readily be generalized to the multiple-input case. $\lhd$

At the end of this subsection, we discuss the role of $\omega_p^{(i)} \sim \mathcal{N}(0, \sigma_p^2 I_n)$ in (10) by interpreting the procedure of Algorithm 1 as follows. We generate suitable $Y_p$ based on the LMIs and then construct $p(x) = m_p(x | Y_p)$ by functional fitting. In functional fitting, overfitting is a common issue, since a constructed function becomes unnecessarily complex. The variance $\sigma_p^2$ specifies how closely a constructed function needs to fit the data, and thus adding the noise $\omega_p^{(i)}$ helps to avoid overfitting. However, if $\sigma_p^2$ is large, a constructed function can fit $\omega_p^{(i)}$ instead of $\partial p(x^{(i)})$. This is well known as the bias-variance tradeoff in functional fitting [16, Section 5.4.4]. It is worth emphasizing that Theorem 3.1 holds for arbitrary $\sigma_p > 0$.

If $K_0$ in (11) is non-singular, $h_p$ in (13) is nothing but the minimizer of the following optimization problem:

$$\min_{h_p} |Y_p^\top - K_0 h_p|^2_{K_0^{-1}} + \sigma_p^2 |h_p|^2. \qquad (25)$$

The first term evaluates the fitting error at each $x^{(i)}$, and the second one is for regularization. Especially when $\sigma_p = 0$, the optimal value becomes zero for the obtained $h_p$, i.e., a complete fitting $\partial p(x^{(i)}) = \hat{p}^{(i)}$, $i = 1, \dots, N$ is achieved, as stated by Corollary 3.2. A reproducing kernel Hilbert space is a formal tool to study a functional fitting problem under regularization as an optimization problem. With the prior covariance as the kernel, the pair $(p, \partial p)$ can be understood as a minimizer according to the representer theorem [31, Theorem 2].
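For completeness, the first-order optimality condition of (25) indeed recovers (13) when $K_0 \succ 0$ (here $Y_p^\top$ appears because $Y_p$ is a row vector):

$$0 = \nabla_{h_p} \left( |Y_p^\top - K_0 h_p|^2_{K_0^{-1}} + \sigma_p^2 |h_p|^2 \right) = -2 (Y_p^\top - K_0 h_p) + 2 \sigma_p^2 h_p \;\; \Longrightarrow \;\; (K_0 + \sigma_p^2 I_{nN}) h_p = Y_p^\top.$$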

3.2 Closed-loop Stability

In Theorem 3.1, it is not clear whether $p_\partial(x) = \partial p(x) P$ satisfies (2) at points other than $\{x^{(i)}\}_{i=1}^N$, in contrast to the integrability condition (3). Applying standard arguments of the polytope approach [5], we improve the LMIs in Algorithm 1 such that their solution satisfies (2) also at points other than $\{x^{(i)}\}_{i=1}^N$ when $N$ is sufficiently large.

For a set of matrices $\mathbb{A} := \{A_1, \dots, A_L\}$, let ${\rm ConvexHull}(\mathbb{A})$ denote its convex hull, i.e.,

$${\rm ConvexHull}(\mathbb{A}) := \left\{ A = \sum_{\ell=1}^L \theta_\ell A_\ell : \sum_{\ell=1}^L \theta_\ell = 1, \; \theta_\ell \geq 0, \; \ell = 1, \dots, L \right\}.$$

Now, we choose $L = N$ and

$$A_i := \partial f(x^{(i)}) + b \partial p(x^{(i)}), \quad i = 1, \dots, N.$$

From the standard discussion of the polytope approach [5], (2) and (3) hold for all $x \in \mathbb{R}^n$ belonging to

$$D := \{x \in \mathbb{R}^n : \partial f(x) + b \partial p(x) \in {\rm ConvexHull}(\mathbb{A})\}.$$

This set $D$ can be made larger by increasing the number $N$ of data used for control design.

Increasing $N$ is not the only approach to enlarging the set of $x$ on which (2) holds. Another approach is to improve the LMIs in Algorithm 1. First, we decide $\mathbb{A}^{(i)} := \{A^{(i)}_1, \dots, A^{(i)}_{L^{(i)}}\} \subset \mathbb{R}^{n \times n}$, which is arbitrary as long as

$$\partial f(x^{(i)}) \in {\rm ConvexHull}(\mathbb{A}^{(i)}), \quad i = 1, \dots, N. \qquad (26)$$

Instead of (21), we consider the following finite family of LMIs with respect to $P \in \mathbb{R}^{n \times n}$ and $\varepsilon_p > 0$:

$$\left\{ \begin{array}{l} P \succ 0 \\ b^+ (P - A^{(i)}_{\ell^{(i)}} P (A^{(i)}_{\ell^{(i)}})^\top) (b^+)^\top \succeq \varepsilon_p I_{n-1} \end{array} \right. \quad \ell^{(i)} = 1, \dots, L^{(i)}, \; i = 1, \dots, N. \qquad (29)$$

Using a solution $P$, we consider the following finite family of LMIs with respect to $Y_p \in \mathbb{R}^{nN}$ and $\varepsilon > 0$ instead of (23):

$$\begin{bmatrix} P & * \\ (A^{(i)}_{\ell^{(i)}} + b \partial m_p(x^{(i)} | Y_p)) P & P \end{bmatrix} \succeq \varepsilon I_{2n}, \quad \ell^{(i)} = 1, \dots, L^{(i)}, \; i = 1, \dots, N. \qquad (30)$$

The newly obtained $p(x) := m_p(x | Y_p)$ satisfies (2) for all $x \in \mathbb{R}^n$ belonging to

$$\bar{D} := \bigcup_{i=1}^N \{x \in \mathbb{R}^n : \partial f(x) + b \partial p(x) \in {\rm ConvexHull}(\mathbb{A}^{(i)})\}.$$

Now, we are ready to state a control design procedure such that the IES conditions (2) and (3) hold on a given bounded set.

Theorem 3.5.

Consider a system (1), a class $C^2$ positive definite kernel $k_0(x, x')$, and $\sigma_p \geq 0$. Let $D \subset \mathbb{R}^n$ denote an $n$-dimensional closed hypercube. We partition it evenly into $N$ closed hypercubes, denoted by $\{D^{(i)}\}_{i=1}^N$, where $N = r^n$, $r \in \mathbb{Z}_{>0}$. For all $r \in \mathbb{Z}_{>0}$, there exist $\mathbb{A}^{(i)}$, $i = 1, \dots, N$ such that

$$\partial f(x) \in {\rm ConvexHull}(\mathbb{A}^{(i)}), \quad \forall x \in D^{(i)}, \; \forall i = 1, \dots, N. \qquad (31)$$

Let each $x^{(i)}$, $i = 1, \dots, N$ be the center of $D^{(i)}$. Suppose that for such $\mathbb{A}^{(i)}$, $i = 1, \dots, N$,

1) for all $r \in \mathbb{Z}_{>0}$, the LMI (29) admits $P \in \mathbb{R}^{n \times n}$ and $\varepsilon_p > 0$;

2) for all $r \in \mathbb{Z}_{>0}$ and $P$ obtained in item 1), the LMI (30) admits $Y_p \in \mathbb{R}^{nN}$ and $\varepsilon > 0$;

3) for the elements $\hat{p}^{(i)}$ of $Y_p$, there exists a strictly decreasing positive function $\delta$ of $r$ such that $\delta(r) \to 0$ as $r \to \infty$, and $|\hat{p}^{(i)} - \hat{p}^{(j)}| < \delta(r)$ if $D^{(i)}$ and $D^{(j)}$ are adjacent.

Then, there exists a sufficiently large $r \in \mathbb{Z}_{>0}$ such that $p(x) := m_p(x | Y_p)$ satisfies (2) and (3) for all $x \in D$.

Proof. First, we show (31). Since a continuous function is bounded on a bounded set, $\{\partial f(x) \in \mathbb{R}^{n \times n} : x \in D^{(i)}\}$ is bounded for each $i = 1, \dots, N$, and every bounded set is contained in the convex hull of finitely many points. Therefore, there exists $\mathbb{A}^{(i)}$ satisfying (31).

Next, we consider the latter statement. We choose $p_\partial(x) = \partial p(x) P = \partial m_p(x | Y_p) P$. Then, (3) holds on $\mathbb{R}^n$. It remains to show that (2) holds on $D$. Items 1) and 2) and (31) imply that for each $i = 1, \dots, N$,

$$\begin{bmatrix} P & * \\ \partial f(x) P + b p_\partial(x^{(i)}) & P \end{bmatrix} \succeq \varepsilon I_{2n}, \quad \forall x \in D^{(i)}.$$

The strict inequality holds for any positive $\bar{\varepsilon} < \varepsilon$. Since $p_\partial(x) = \partial m_p(x | Y_p) P$ is continuous (with respect to $x$), there exists a sufficiently small $\bar{D}_{\bar{\varepsilon}}^{(i)} \subset \mathbb{R}^n$ centered at $x^{(i)}$ such that

$$\begin{bmatrix} P & * \\ \partial f(x) P + b p_\partial(x) & P \end{bmatrix} \succ \bar{\varepsilon} I_{2n}, \quad \forall x \in \bar{D}_{\bar{\varepsilon}}^{(i)}.$$

From item 3), the continuity of $\partial f$ and $p_\partial$, and the boundedness of $D^{(i)}$, there exists a sufficiently large $r \in \mathbb{Z}_{>0}$ such that $D^{(i)} \subset \bar{D}_{\bar{\varepsilon}}^{(i)}$ for all $i = 1, \dots, N$. Consequently, (2) holds on $D$, the union of $D^{(i)}$, $i = 1, \dots, N$. ∎

In the above theorem, $D$ and the $D^{(i)}$ do not need to be hypercubes or closed. The essential requirements are that $D$ is covered by $\{D^{(i)}\}_{i=1}^N$ and that each $D^{(i)}$ shrinks as $N$ increases. In the proof, we show that a switching controller $u = \partial p(x^{(i)}) x$, $x \in D^{(i)}$ also satisfies (2) and (3). However, a continuous controller $u = p(x)$ is easier to implement.

Remark 3.6.

In Theorem 3.5, IES of the closed-loop system is guaranteed on a geodesically convex $D$ if $D$ is either positively invariant or contains an equilibrium point. By the Schur complement, one can confirm that (2) and (3) on $D$ imply (4) on $D$. According to the proof of [37, Theorem 15], if (4) holds on $D$, then there exists $\lambda \in [0, 1)$ such that

$$\sqrt{(x_{k+1} - x'_{k+1})^\top P (x_{k+1} - x'_{k+1})} \leq \lambda \sqrt{(x_k - x'_k)^\top P (x_k - x'_k)}$$

for all $k = 0, 1, \dots$ and $(x_0, x'_0) \in \mathbb{R}^n \times \mathbb{R}^n$ as long as $x_k, x'_k \in D$. This implies that if $D$ is geodesically convex and positively invariant, the closed-loop system is IES on $D$. Next, if $D$ contains an equilibrium point $x^*$, it follows that

$$\sqrt{(x_1 - x^*)^\top P (x_1 - x^*)} \leq \lambda \sqrt{(x_0 - x^*)^\top P (x_0 - x^*)}$$

for all $x_0 \in D$. From geodesic convexity, this further implies that $D$ is positively invariant. Thus, the closed-loop system is IES on $D$. $\lhd$

3.3 Discussions for Generalizations

In this subsection, we discuss how to generalize our results to the case where the input vector field $b$ is a function of $x$. We also mention the continuous-time case.

First, we consider the system with non-constant $b$:

$$x_{k+1} = f(x_k) + b(x_k) u_k, \quad k \in \mathbb{Z}_{\geq 0}, \qquad (32)$$

where $b : \mathbb{R}^n \to \mathbb{R}^n$ is of class $C^1$. A modification of Proposition 2.2 implies that a controller $u = p(x)$ achieves IES if there exist $\varepsilon > 0$, $P \in \mathbb{R}^{n \times n}$, and $p : \mathbb{R}^n \to \mathbb{R}$ of class $C^1$ such that for all $x \in \mathbb{R}^n$,

$$\begin{bmatrix} P & * \\ (\partial f(x) + p(x) \partial b(x) + b(x) \partial p(x)) P & P \end{bmatrix} \succeq \varepsilon I_{2n}. \qquad (33)$$

The difference from (2) is the additional term $p(x) \partial b(x)$.

For finding $P$ first, one can utilize a modification of (18):

$$\left\{ \begin{array}{l} P \succ 0 \\ b^+(x) (P - \partial f(x) P \partial^\top f(x)) (b^+(x))^\top \succeq \varepsilon_p I_{n-1}. \end{array} \right. \qquad (36)$$

Substituting its solution $P$, $p(x) = m_p(x | Y_p)$, and $\partial p(x) = \partial m_p(x | Y_p)$ into (33) yields an LMI with respect to $Y_p$ and $\varepsilon$ at each $x \in \mathbb{R}^n$. Therefore, even for non-constant $b$, one can still design a controller only by solving two finite families of LMIs on the data points $\{x^{(i)}\}_{i=1}^N$.

In this paper, we focus on discrete-time systems. However, our method can also be applied to continuous-time systems:

$$\dot{x} = f(x) + b(x) u. \qquad (37)$$

According to [9, Theorem 1], a controller $u = p(x)$ makes the closed-loop system IES if there exist $\varepsilon > 0$ and $P(x) \succ 0$, $x \in \mathbb{R}^n$ such that for all $x \in \mathbb{R}^n$,

$$\sum_{k=1}^n \frac{\partial P(x)}{\partial x_k} (f_k(x) + b_k(x) p(x)) + P(x) (\partial f(x) + p(x) \partial b(x) + b(x) \partial p(x)) + (\partial f(x) + p(x) \partial b(x) + b(x) \partial p(x))^\top P(x) \preceq -\varepsilon P(x). \qquad (38)$$
Let $P_{i,j} \sim \mathcal{GP}(m_{i,j}(x), k_{i,j}(x, x'))$ and $P_{i,j}^{(l)} = P_{i,j}(x^{(l)}) + \omega_{i,j}^{(l)}$, where $P_{i,j}(x)$ denotes the $(i,j)$th element of $P(x)$, and $\omega_{i,j}^{(l)} \sim \mathcal{N}(0, \sigma_{i,j}^{(l)})$ is i.i.d. Denote by $P(x | P_{i,j}^{(l)})$ the posterior mean of $P(x)$ given $\{P_{i,j}^{(l)}\}_{l=1}^N$, $i, j = 1, \dots, n$. Then, we first find $P(x) = P(x | P_{i,j}^{(l)})$ satisfying $P(x) \succ 0$ and

$$b^+(x) \Biggl( \sum_{k=1}^n \frac{\partial P(x)}{\partial x_k} f_k(x) + P(x) \partial f(x) + \partial^\top f(x) P(x) \Biggr) (b^+(x))^\top \preceq -\varepsilon_p b^+(x) P(x) (b^+(x))^\top, \quad \forall x \in \mathbb{R}^n.$$

For the obtained $P(x)$, it suffices to solve (38) with respect to $p(x) = m_p(x | Y_p)$, i.e., $Y_p$. Therefore, in the continuous-time case, nonlinear control design can be achieved only by solving two finite families of LMIs at $\{x^{(l)}\}_{l=1}^N$, even for a non-constant metric $P(x)$. As mentioned in Section 2.1, the proposed method can further be applied to various design problems.

4 Control Design for Unknown Systems

For unknown system dynamics, it has been shown, e.g., in [13, 12], that a state-space model can be estimated from the system's input and output by GPR. Since GPR is a Bayesian approach, the estimation error is represented by a posterior covariance. In this section, we show how to compensate for a stochastic learning error by control design. To focus on the main idea, we consider the case where the drift vector field is unknown and the state is measurable, but the results can be generalized to the case where all system dynamics are unknown and only the system's input and output are measurable by utilizing the results in [13, 12].

4.1 Learning Drift Vector Fields

In GPR, we learn each component $f_i$, $i = 1, \dots, n$ of $f$ separately from training data $\{x^{(j)}, y_i^{(j)}\}_{j=1}^N$,

$$y_i^{(j)} = f_i(x^{(j)}) + \omega_{y_i}^{(j)}, \quad i = 1, \dots, n, \; j = 1, \dots, N,$$

where $\omega_{y_i}^{(j)} \sim \mathcal{N}(0, \sigma_{y_i}^2)$ is i.i.d. The number $N$ of training data and the training data points $\{x^{(j)}\}_{j=1}^N$ for learning $f$ are allowed to be different from those used for control design. In contrast to control design, we can directly obtain training data of $f_i$, and thus it can be estimated by the standard use of GPR. Moreover, to compensate for the learning error of $f_i$ by control design, we estimate the error as the posterior covariance.

We choose a prior distribution of $f_i$ as $f_i^{\rm pri} \sim \mathcal{GP}(0, k_i(x, x'))$, where $k_i : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ is a class $C^2$ positive definite kernel. Then, Bayesian estimation yields the posterior means of $f_i$ and $\partial f_i$ as follows:

$$\mu_i(x) := \sum_{j=1}^N k_i(x^{(j)}, x) h_i^{(j)}$$
$$\partial \mu_i(x) := \sum_{j=1}^N \partial k_i(x^{(j)}, x) h_i^{(j)}$$
$$h_i := (K_i + \sigma_{y_i}^2 I_N)^{-1} Y_i \in \mathbb{R}^N,$$
$$Y_i := \begin{bmatrix} y_i^{(1)} & \cdots & y_i^{(N)} \end{bmatrix}^\top \in \mathbb{R}^N$$
$$K_i := \begin{bmatrix} k_i(x^{(1)}, x^{(1)}) & \cdots & k_i(x^{(1)}, x^{(N)}) \\ \vdots & \ddots & \vdots \\ k_i(x^{(N)}, x^{(1)}) & \cdots & k_i(x^{(N)}, x^{(N)}) \end{bmatrix} \in \mathbb{R}^{N \times N},$$

where $h_i^{(j)}$ denotes the $j$th component of $h_i$; see, e.g., [32, Section 2] for the computation of $\mu_i(x)$; $\partial \mu_i(x)$ can be computed by taking its partial derivative with respect to $x$.

Remark 4.1.

When $b$ is also unknown, we learn $g_i(x, u) := f_i(x) + b_i u$, $i = 1, \dots, n$ from training data $\{(x^{(j)}, u^{(j)}), y_i^{(j)}\}_{j=1}^N$,

$$y_i^{(j)} = g_i(x^{(j)}, u^{(j)}) + \omega_{y_i}^{(j)}, \quad i = 1, \dots, n, \; j = 1, \dots, N.$$

To utilize the prior knowledge that $g_i(x, u)$ is linear with respect to $u$, we employ the kernel $k_i(x, x') + u u'$, where $k_i : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ is a class $C^2$ positive definite kernel. Namely, we select a prior distribution of $g_i(x, u)$ as $g_i^{\rm pri} \sim \mathcal{GP}(0, k_i(x, x') + u u')$. Then, Bayesian estimation yields the posterior mean of $g_i$ as follows:

$$\bar{\mu}_i(x, u) := \sum_{j=1}^N (k_i(x^{(j)}, x) + u^{(j)} u) \bar{h}_i^{(j)}$$
$$\bar{h}_i := (K_i + K_u + \sigma_{y_i}^2 I_N)^{-1} Y_i \in \mathbb{R}^N$$
$$K_u := \begin{bmatrix} (u^{(1)})^2 & \cdots & u^{(1)} u^{(N)} \\ \vdots & \ddots & \vdots \\ u^{(N)} u^{(1)} & \cdots & (u^{(N)})^2 \end{bmatrix} \in \mathbb{R}^{N \times N},$$

where $K_i$ and $Y_i$ are the same as above. Note that $\bar{\mu}_i(x, u)$ is linear with respect to $u$. $\lhd$

A benefit of GPR for learning a function is the ease of analytical computation of the posterior covariance function $v_i : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ of $f_i$, given by

$$v_i(x, x') := k_i(x, x') - \mathbf{k}_i^\top(x) (K_i + \sigma_{y_i}^2 I_N)^{-1} \mathbf{k}_i(x')$$
$$\mathbf{k}_i(x) := \begin{bmatrix} k_i(x^{(1)}, x) & \cdots & k_i(x^{(N)}, x) \end{bmatrix}^\top.$$

Similarly, the posterior covariance function $v_{\partial,i} : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^{n \times n}$ of $\partial f_i$ is easy to compute:

$$v_{\partial,i}(x, x') := \partial^2 k_i(x, x') - \mathbf{k}_{\partial,i}^\top(x) (K_i + \sigma_{y_i}^2 I_N)^{-1} \mathbf{k}_{\partial,i}(x')$$
$$\mathbf{k}_{\partial,i}(x) := \begin{bmatrix} \partial_{x'}^\top k_i(x, x^{(1)}) & \cdots & \partial_{x'}^\top k_i(x, x^{(N)}) \end{bmatrix}^\top.$$

Therefore, the posterior distributions of $f_i$ and $\partial f_i$ are respectively obtained as

$$f_i^{\rm post} | Y_i \sim \mathcal{GP}(\mu_i(x), v_i(x, x'))$$
$$\partial^\top f_i^{\rm post} | Y_i \sim \mathcal{GP}(\partial^\top \mu_i(x), v_{\partial,i}(x, x')),$$

and consequently,

$$f_i^{\rm post}(x) | Y_i \sim \mathcal{N}(\mu_i(x), \sigma_i^2(x)), \quad \forall x \in \mathbb{R}^n$$
$$\partial^\top f_i^{\rm post}(x) | Y_i \sim \mathcal{N}(\partial^\top \mu_i(x), \sigma_{\partial,i}^2(x)), \quad \forall x \in \mathbb{R}^n$$
$$\sigma_i(x) := \sqrt{v_i(x, x)}, \quad \sigma_{\partial,i}(x) := \sqrt{v_{\partial,i}(x, x)}. \qquad (39)$$

An advantage of obtaining the covariance functions $\sigma_i(x)$ and $\sigma_{\partial,i}(x)$ in nonlinear system identification is that one can compute empirical confidence intervals and decide whether to add training data in some region of interest and relearn the model. Repeating this, one can construct a model with a desired accuracy.
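As a sketch of this standard use (ours; it reuses `k0` from the sketch in Section 3.1), the following Python function returns the posterior mean $\mu_i(x)$ and variance $v_i(x, x)$ for one component $f_i$, so that $\sigma_i(x)$ in (39) is the square root of the returned variance:

```python
import numpy as np

def gpr_fit(Xd, Yi, sigma_y=0.01, beta=1.0):
    # Fit GPR to data {x^(j), y_i^(j)}; Xd is N x n, Yi has length N.
    N = Xd.shape[0]
    Ki = np.array([[k0(Xd[a], Xd[c], beta) for c in range(N)] for a in range(N)])
    L = np.linalg.cholesky(Ki + sigma_y**2 * np.eye(N))
    hi = np.linalg.solve(L.T, np.linalg.solve(L, Yi))   # h_i = (K_i + sigma^2 I)^{-1} Y_i
    def posterior(x):
        kx = np.array([k0(Xd[a], x, beta) for a in range(N)])
        mu = kx @ hi                                     # posterior mean mu_i(x)
        w = np.linalg.solve(L.T, np.linalg.solve(L, kx))
        var = k0(x, x, beta) - kx @ w                    # posterior variance v_i(x, x)
        return mu, var
    return posterior
```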

Taking the model learning error into account, a representation of the estimated closed-loop system with $u = p(x)$ becomes

$$x_{k+1} = \mu_c(x_k) + \sigma(x_k) \omega_k \qquad (40)$$
$$\mu_c(x) := \mu(x) + b p(x)$$
$$\sigma(x) := {\rm diag}\{\sigma_1(x), \dots, \sigma_n(x)\},$$

where $\omega_k \sim \mathcal{N}(0, I_n)$ is i.i.d. The error can be compensated by control design. To see this, we study the stochastic system (40) from two aspects. First, by applying a moment IES condition in [22, Corollary 5.4], we discuss how to choose $\varepsilon > 0$ in the LMI (23). Then, we also discuss how to construct $\mathbb{A}^{(i)}$, $i = 1, \dots, N$ for guaranteeing IES in probability.

4.2 Moment Incremental Stability

Proposition 2.2 for IES has been generalized to moment IES of stochastic systems [22, Corollary 5.4]. This can be used to decide $\varepsilon > 0$ in (23) for control design.

Definition 4.2.

[22, Definition 3.4] The system (40) is said to be IES in the $p$th moment if there exist $a > 0$ and $\lambda \in (0, 1)$ such that

$$\mathbb{E}[|x_k - x'_k|^p] \leq a \lambda^k |x_0 - x'_0|^p, \quad \forall k \in \mathbb{Z}_{\geq 0}$$

for each $(x_0, x'_0) \in \mathbb{R}^n \times \mathbb{R}^n$. $\lhd$

Applying [22, Corollary 5.4] to the system (40) gives the following condition for moment IES.

Proposition 4.3.

A system (40) is IES in the second moment if there exist $\bar{\varepsilon} > 0$ and $\bar{P} : \mathbb{R}^n \to \mathbb{R}^{n \times n}$ such that

$$\left\{ \begin{array}{l} \bar{P} \succ 0 \\ \bar{P} - \partial^\top \mu_c(x) \bar{P} \partial \mu_c(x) \succeq \bar{\varepsilon} I_n + \displaystyle\sum_{i=1}^n \partial^\top \sigma_i(x) e_i^\top \bar{P} e_i \partial \sigma_i(x) \end{array} \right. \qquad (44)$$

for all $x \in \mathbb{R}^n$, where each $e_i$, $i = 1, \dots, n$ denotes the standard basis vector whose $i$th element is $1$ and whose other elements are all $0$.

Proof. According to [22, Corollary 5.4], the system (40) with i.i.d. noise $\omega_k$ is IES in the second moment if there exist $\lambda \in (0, 1)$ and $\bar{P} : \mathbb{R}^n \to \mathbb{R}^{n \times n}$ such that

$$\left\{ \begin{array}{l} \bar{P} \succ 0 \\ \mathbb{E}\left[ (\partial \mu_c(x) + \partial \sigma(x) \omega_k)^\top \bar{P} (\partial \mu_c(x) + \partial \sigma(x) \omega_k) \right] - \lambda^2 \bar{P} \preceq 0. \end{array} \right.$$

Since $\omega_k \sim \mathcal{N}(0, I_n)$ is i.i.d., the left-hand side of the second inequality can be rearranged as

$$\mathbb{E}[(\partial \mu_c(x) + \partial \sigma(x) \omega_k)^\top \bar{P} (\partial \mu_c(x) + \partial \sigma(x) \omega_k)] - \lambda^2 \bar{P} = \partial \mu_c^\top(x) \bar{P} \partial \mu_c(x) + \sum_{i=1}^n \partial^\top \sigma_i(x) e_i^\top \bar{P} e_i \partial \sigma_i(x) - \lambda^2 \bar{P}.$$

These inequalities hold if (44) holds. ∎

The condition (44) can be used to decide $\varepsilon > 0$ in (2), i.e., (4), for control design. If (4) holds for a sufficiently large $\varepsilon$ (or $\varepsilon(x)$), then (44) holds for some $\bar{\varepsilon} > 0$. Therefore, moment IES suggests how to decide $\varepsilon > 0$.

4.3 Incremental Stability in Probability

In Section 3.2, a polytope approach to control design was mentioned, in which there is freedom in designing the finite families of matrices $\mathbb{A}^{(j)} \subset \mathbb{R}^{n \times n}$, $j = 1, \dots, N$. This freedom can be utilized to guarantee IES in probability.

Applying Chebyshev's inequality [8, Theorem 1] yields, for a given $c > 0$,

$$\mathbb{P}\big( (\partial f_i^{\rm post}(x) - \partial \mu_i(x)) \sigma_{\partial,i}^{-1}(x) (\partial f_i^{\rm post}(x) - \partial \mu_i(x))^\top < c \big) \geq 1 - \frac{n}{c}, \quad \forall x \in \mathbb{R}^n, \; i = 1, \dots, n,$$

where $\sigma_{\partial,i}(x)$ is the variance of the Jacobian matrix, computed in (39). Therefore, one can design $\mathbb{A}^{(j)}$ such that $\partial f_i^{\rm post}(x^{(j)})$, $i = 1, \dots, n$ are contained in ${\rm ConvexHull}(\mathbb{A}^{(j)})$ with probability at least $(1 - n/c)^n$, where we note that $\partial f_i^{\rm post} | Y_i$ and $\partial f_j^{\rm post} | Y_j$, $i \neq j$ are mutually independent. For the designed $\mathbb{A}^{(j)}$, suppose that (30) has a set of solutions. Then, a controller guaranteeing IES with probability at least $(1 - n/c)^n$ can be constructed from the solutions.
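As a small numerical illustration (the function name is ours), the smallest $c$ achieving a target joint probability $\delta = (1 - n/c)^n$ is $c = n / (1 - \delta^{1/n})$; the vertices of $\mathbb{A}^{(j)}$ can then be chosen to cover the resulting confidence ellipsoid around $\partial \mu_i(x^{(j)})$:

```python
def chebyshev_radius(n, delta):
    # smallest c with (1 - n/c)^n >= delta, i.e. c = n / (1 - delta**(1/n))
    return n / (1.0 - delta ** (1.0 / n))

# e.g. n = 2 states and a 95% joint guarantee give c of roughly 79.0
c = chebyshev_radius(2, 0.95)
```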

5 Example

Consider a negative resistance oscillator [26, Exercise 2.7]. Its forward Euler discretization with the sampling period $\Delta t = 0.01$ is given by

$$f(x) = x + \begin{bmatrix} x_2 \\ -x_1 + h(x_1) x_2 \end{bmatrix} \Delta t, \quad b = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \Delta t$$
$$h(x_1) = -x_1 + x_1^3 - x_1^5 / 5 + x_1^7 / 105.$$
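For reproducibility, here is a Python sketch of this model (naming ours); the Jacobian `df` is what enters the LMIs (21) and (23):

```python
import numpy as np

dt = 0.01

def h(x1):
    return -x1 + x1**3 - x1**5 / 5 + x1**7 / 105

def f(x):
    # forward Euler discretization of the negative resistance oscillator
    return x + np.array([x[1], -x[0] + h(x[0]) * x[1]]) * dt

b = np.array([0.0, 1.0]) * dt

def df(x):
    # Jacobian of f; h'(x1) = -1 + 3 x1^2 - x1^4 + x1^6 / 15
    dh = -1.0 + 3.0 * x[0]**2 - x[0]**4 + x[0]**6 / 15.0
    return np.eye(2) + np.array([[0.0, 1.0],
                                 [-1.0 + dh * x[1], h(x[0])]]) * dt
```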

We learn $\partial f_2$ only, since $\partial f_1 = [\,0 \ \ 1\,]$ is determined by the physical structure. For the number of training data, we consider two cases, $N = 121$ and $N = 2601$. For both cases, the training data points $\{x^{(j)}\}_{j=1}^N$ are distributed evenly on $[-3, 3] \times [-3, 3]$. Training data are generated by $y^{(j)} = f_2(x_1^{(j)}, x_2^{(j)}) + \omega_y^{(j)}$, where $\omega_y^{(j)} \sim \mathcal{N}(0, 0.01^2)$. As a kernel function, we use the Gaussian kernel $k = e^{-|x - x'|^2 / 2}$. Figures 1 and 2 show the learned $\partial \mu_2$ and the learning error $\partial(f_2 - \mu_2)$, respectively. In Fig. 2, the error is large around the edges. This is because computing derivatives at some point requires information around that point, which is not available around the edges. In other words, the learned $\partial \mu_2$ in Fig. 1 is close to the true $\partial f_2$ except around the edges.

Figure 1: (top) learned $\partial \mu_2 / \partial x_1$, (bottom) learned $\partial \mu_2 / \partial x_2$; (left) $N = 121$, (right) $N = 2601$

Figure 2: (top) $\partial (f_2 - \mu_2) / \partial x_1$, (bottom) $\partial (f_2 - \mu_2) / \partial x_2$; (left) $N = 121$, (right) $N = 2601$

Next, we design a nonlinear controller based on the learned $\partial \mu_2$ by using Algorithm 1. To avoid the edges, we consider the smaller region $[-2, 2] \times [-2, 2]$. For the number of training data, we consider two cases, $N = 49$ and $N = 961$. For both cases, the training data points $\{x^{(j)}\}_{j=1}^N$ are distributed evenly. We select $k_0 = e^{-|x - x'|^2 / 2}$ and $\sigma_p = 0$. Then, for both cases, the solutions to the LMI (21) are the same:

$$P = \begin{bmatrix} 30.3 & -25.2 \\ -25.2 & 30.0 \end{bmatrix}.$$

For this $P$, we solve the LMI (23) and construct $p(x)$, which is plotted in Fig. 3. We apply the constructed controller to the true system. Since the origin is an equilibrium point of $x_{k+1} = f(x_k)$, we modify the controller as $u = p(x) - p(0)$ to preserve the equilibrium point. For the different numbers of training data, Fig. 4 shows the phase portraits of the closed-loop systems, i.e., the state trajectories starting from different initial states. In each case, the origin of the true system is stabilized by a controller designed for a learned model.

An advantage of our approach is that a nonlinear stabilizing controller is designed only by solving LMIs. The papers [41, 42] propose stabilizing control design methods for models learned by GPR. The essence of these methods is to cancel nonlinear terms by state feedback, as in feedback linearization. We apply this approach for comparison. Namely, we divide the control design procedure into two steps. First, we apply $u = -\mu_2 + \bar{u}$ to cancel the nonlinear term $f_2$, where $\mu_2$ is an estimate of $f_2$. Next, we design linear feedback $\bar{u} = K x$ based on the linear terms, which can be done by solving an LMI. In fact, we obtain $K = [-49.8 \ \ 40.6]$. For the different numbers of training data, Fig. 5 shows the phase portraits of the closed-loop system with $u = -\mu_2 - 49.8 x_1 + 40.6 x_2$. In each case, some trajectories (e.g., the red one) do not converge to the equilibrium point. This can be attributed to the gap between the true $f_2$ and the learned $\mu_2$, as feedback linearization is known to be fragile under model uncertainty.

Figure 3: constructed $p$; (left) $N = 49$, (right) $N = 961$

Figure 4: phase portraits of the closed-loop system; (left) $N = 121$ for the model and $N = 49$ for the controller, (right) $N = 2601$ for the model and $N = 961$ for the controller

Figure 5: phase portraits of the closed-loop system with $u = -\mu_2 - 49.8 x_1 + 40.6 x_2$; (left) $N = 121$ for the model, (right) $N = 2601$ for the model

6 Conclusion

In this paper, we have established an LMI framework for contraction-based control design by utilizing the derivatives of GPR to solve the integrability problem, as an easy-to-implement tool for nonlinear control. In contrast to its standard use, we have used GPR to parametrize a set of controllers and have found suitable parameters from the contraction condition. The proposed method can further deal with the case where the system dynamics are unknown by learning them with GPR, and the learning errors can be compensated by control design through simple modifications of the proposed LMI framework. Future work includes applying our method to a high-dimensional model with a real data set.

References

  • [1] V. Andrieu, B. Jayawardhana, and L. Praly. Transverse exponential stability and applications. IEEE Transactions on Automatic Control, 61(11):3396–3411, 2016.
  • [2] T. Beckers, D. Kulić, and S. Hirche. Stable Gaussian process based tracking control of Euler–Lagrange systems. Automatica, 103:390–397, 2019.
  • [3] K. Berntorp. Online Bayesian inference and learning of Gaussian-process state–space models. Automatica, 129:109613, 2021.
  • [4] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, 2006.
  • [5] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Theory. SIAM, Philadelphia, 1994.
  • [6] M. Buisson-Fenet, V. Morgenthaler, S. Trimpe, and F. Di Meglio. Joint state and dynamics estimation with high-gain observers and Gaussian process models. IEEE Control Systems Letters, 5(5):1627–1632, 2020.
  • [7] F. Bullo. Contraction Theory for Dynamical Systems. Kindle Directed Publishing, 2022.
  • [8] X. Chen. A new generalization of Chebyshev inequality for random vectors. arXiv:0707.0805, 2007.
  • [9] F. Forni and R. Sepulchre. A differential Lyapunov framework for contraction analysis. IEEE Transactions on Automatic Control, 59(3):614–628, 2014.
  • [10] F. Forni and R. Sepulchre. Differentially positive systems. IEEE Transactions on Automatic Control, 61(2):346–359, 2015.
  • [11] F. Forni and R. Sepulchre. Differential dissipativity theory for dominance analysis. IEEE Transactions on Automatic Control, 64(6):2340–2351, 2018.
  • [12] R. Frigola, Y. Chen, and C.E. Rasmussen. Variational Gaussian process state-space models. Advances in Neural Information Processing Systems, 27, 2014.
  • [13] R. Frigola, F. Lindsten, T.B. Schön, and C.E. Rasmussen. Identification of Gaussian process state-space models with particle stochastic approximation EM. IFAC Proceedings Volumes, 47(3):4097–4102, 2014.
  • [14] K. Fujimoto, H. Beppu, and Y. Takaki. On computation of numerical solutions to Hamilton-Jacobi inequalities using Gaussian process regression. Proc. 2018 American Control Conference, pages 424–429, 2018.
  • [15] M. Giaccagli, D. Astolfi, V. Andrieu, and L. Marconi. Sufficient conditions for global integral action via incremental forwarding for input-affine nonlinear systems. IEEE Transactions on Automatic Control, 67(12):6537–6551, 2022.
  • [16] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT press, Massachusetts, 2016.
  • [17] Y. Ito, K. Fujimoto, Y. Tadokoro, and T. Yoshimura. On stabilizing control of Gaussian processes for unknown nonlinear systems. IFAC-PapersOnLine, 50(1):15385–15390, 2017.
  • [18] Y. Ito, K. Fujimoto, T. Yoshimura, and Y. Tadokoro. On Gaussian kernel-based Hamilton-Jacobi-Bellman equations for nonlinear optimal control. Proc. 2018 American Control Conference, pages 1835–1840, 2018.
  • [19] Y. Kawano. Controller reduction for nonlinear systems by generalized differential balancing. IEEE Transactions on Automatic Control, 67(11):5856–5871, 2022.
  • [20] Y. Kawano, B. Besselink, and M. Cao. Contraction analysis of monotone systems via separable functions. IEEE Transactions on Automatic Control, 65(8):3486–3501, 2020.
  • [21] Y. Kawano and M. Cao. Contraction analysis of virtually positive systems. Systems & Control Letters, 168:105358, 2022.
  • [22] Y. Kawano and Y. Hosoe. Contraction analysis of discrete-time stochastic systems. arXiv:2106.05635, 2021.
  • [23] Y. Kawano, C. K. Kosaraju, and J. M. A. Scherpen. Krasovskii and shifted passivity based control. IEEE Transactions on Automatic Control, 66(10):4926–4932, 2021.
  • [24] Y. Kawano and T. Ohtsuka. Nonlinear eigenvalue approach to differential Riccati equations for contraction analysis. IEEE Transactions on Automatic Control, 62(10):6497–6504, 2017.
  • [25] Y. Kawano and J. M. A. Scherpen. Model reduction by differential balancing based on nonlinear Hankel operators. IEEE Transactions on Automatic Control, 62(7):3293–3308, 2017.
  • [26] H. K. Khalil. Nonlinear Systems. Prentice-Hall, New Jersey, 1996.
  • [27] M. Krstic. On using least-squares updates without regressor filtering in identification and adaptive control of nonlinear systems. Automatica, 45(3):731–735, 2009.
  • [28] W. Lohmiller and J.-J. E. Slotine. On contraction analysis for non-linear systems. Automatica, 34(6):683–696, 1998.
  • [29] I. R. Manchester and J.-J. E. Slotine. Control contraction metrics: Convex and intrinsic criteria for nonlinear feedback design. IEEE Transactions on Automatic Control, 62(6):3046–3053, 2017.
  • [30] M. Padidar, X. Zhu, L. Huang, J. Gardner, and D. Bindel. Scaling Gaussian processes with derivative information using variational inference. Advances in Neural Information Processing Systems, 34, 2021.
  • [31] G. Pillonetto, F. Dinuzzo, T. Chen, G. De Nicolao, and L. Ljung. Kernel methods in system identification, machine learning and function estimation: A survey. Automatica, 50(3):657–682, 2014.
  • [32] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Massachusetts, 2006.
  • [33] R. Reyes-Báez, A. J. van der Schaft, and B. Jayawardhana. Tracking control of fully-actuated port-Hamiltonian mechanical systems via sliding manifolds and contraction analysis. Proc. 20th IFAC World Congress, pages 8256–8261, 2017.
  • [34] R. E. Skelton, T. Iwasaki, and K. M. Grigoriadis. A Unified Algebraic Approach to Linear Control Design. Routledge, London, 2017.
  • [35] D. Sun, M. J. Khojasteh, S. Shekhar, and C. Fan. Uncertain-aware safe exploratory planning using Gaussian process and neural control contraction metric. Proc. 3rd Annual Conference on Learning for Dynamics and Control, pages 728–741, 2021.
  • [36] Y. Takaki and K. Fujimoto. On output feedback controller design for Gaussian process state space models. Proc. 56th IEEE Conference on Decision and Control, pages 3652–3657, 2017.
  • [37] D. N. Tran, B. S. Rüffer, and C. M. Kellett. Convergence properties for discrete-time nonlinear systems. IEEE Transactions on Automatic Control, 63(8):3415–3422, 2018.
  • [38] H. Tsukamoto and S.-J. Chung. Convex optimization-based controller design for stochastic nonlinear systems using contraction analysis. Proc. 58th IEEE Conference on Decision and Control, pages 8196–8203, 2019.
  • [39] H. Tsukamoto, S.-J. Chung, and J.-J. E. Slotine. Neural stochastic contraction metrics for learning-based control and estimation. IEEE Control Systems Letters, 5(5):1825–1830, 2020.
  • [40] H. Tsukamoto, S.-J. Chung, and J.-J. E. Slotine. Contraction theory for nonlinear stability analysis and learning-based control: A tutorial overview. Annual Reviews in Control, 52:135–169, 2021.
  • [41] J. Umlauft and S. Hirche. Feedback linearization based on Gaussian processes with event-triggered online learning. IEEE Transactions on Automatic Control, 65(10):4154–4169, 2019.
  • [42] J. Umlauft, L. Pöhler, and S. Hirche. An uncertainty-based control Lyapunov approach for control-affine systems modeled by Gaussian process. IEEE Control Systems Letters, 2(3):483–488, 2018.
  • [43] A. J. van der Schaft. On differential passivity. IFAC Proceedings Volumes, 46(23):21–25, 2013.