This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Stability Analysis using Quadratic Constraints for Systems with Neural Network Controllers

He Yin, Peter Seiler and Murat Arcak Funded in part by the Air Force Office of Scientific Research grant FA9550-18-1-0253, the Office of Naval Research grant N00014-18-1-2209, and the U.S. National Science Foundation grant ECCS-1906164.H. Yin and M. Arcak are with the University of California, Berkeley {he_yin, arcak}@berkeley.edu.P. Seiler is with the University of Michigan, Ann Arbor pseiler@umich.edu.
Abstract

A method is presented to analyze the stability of feedback systems with neural network controllers. Two stability theorems are given to prove asymptotic stability and to compute an ellipsoidal inner-approximation to the region of attraction (ROA). The first theorem addresses linear time-invariant systems, and merges Lyapunov theory with local (sector) quadratic constraints to bound the nonlinear activation functions in the neural network. The second theorem allows the system to include perturbations such as unmodeled dynamics, slope-restricted nonlinearities, and time delay, using integral quadratic constraint (IQCs) to capture their input/output behavior. This in turn allows for off-by-one IQCs to refine the description of activation functions by capturing their slope restrictions. Both results rely on semidefinite programming to approximate the ROA. The method is illustrated on systems with neural networks trained to stabilize a nonlinear inverted pendulum as well as vehicle lateral dynamics with actuator uncertainty.

I Introduction

The paradigm of stabilizing dynamical systems with Neural Network (NN) controllers [1] has been revived following recent development in deep NNs, e.g. policy gradient [2, 3, 4, 5] and behavioral cloning [6]. However, feedback systems with NN controllers suffer from lack of stability and safety certificates due to the complexity of the NN structure. Specifically, NNs have various types of nonlinear activation functions, potentially numerous layers, and a large number of hidden neurons, making it difficult to apply classical analysis methods, e.g. Lyapunov theory [7]. Monte-Carlo simulations can be used to investigate stability but lack formal guarantees, which are important in safety-critical applications.

Several works propose using quadratic constraints (QCs) to bound the nonlinear activation functions. This approach is used to outer-bound the outputs of a (static) NN given a set of inputs in [8] and upper-bound the Lipschitz constant of NNs in [9, 10]. The work [11] uses this idea for finite-time reachability analysis of a system with a NN controller. The work [12] performs stability analysis by constructing QCs from the bounds of partial gradients of NN controllers. Reference [13] assesses global asymptotic stability of dynamic neural network models using QCs and Lyapunov theory.

This paper presents two main stability results for a feedback system with a NN controller. Theorem 1 provides a condition to prove stability and to inner-approximate the region of attraction (ROA) for a linear time-invariant (LTI) plant. It uses Lyapunov theory, and local (sector) QCs to bound the nonlinear activation functions in the NN. Theorem 2 allows the plant to include perturbations such as unmodeled dynamics, slope-restricted nonlinearities, and time delay, characterizing them with integral quadratic constraints (IQCs) [14, 15]. This in turn allows for the use of off-by-one IQCs [16] to capture the slope restrictions of activation functions. Both results rely on semidefinite programming to approximate the ROA.

The specific contributions of this paper are three-fold. First, our nominal analysis with LTI plants and NN controllers uses offset local sector QCs that are centered around the equilibrium inputs to the NN activation functions and allow for analyzing stability around a non-zero equilibrium point. Second, our analysis of uncertain plants and NN controllers provides robustness guarantees for the feedback system. The uncertain plant is modeled as an interconnection of the nominal plant and perturbations that are described by IQCs. The use of IQCs also allows for plants that are not necessarily LTI. Third, the proposed framework allows for local (dynamic) off-by-one IQCs to further sharpen the description of activation functions by capturing their slope restrictions. This differs from [8, 9, 10, 11, 12], which derive only static QCs for activation functions.

Local (static) sector IQCs have been used in the stability analysis of linear systems with actuator saturation [17, 18], and unbounded nonlinearities [19]. The description of these nonlinearties are refined by incorporating soft (dynamic) IQCs in the stability analysis framework for linear systems [20], and polynomial systems [21]. Compared with these works, this work is specialized to NN-controlled systems: it exploits the specific properties of NNs and uses the Interval Bound Propagation method [22] to derive non-conservative static and dynamic local IQCs to describe NN controllers; and it also allows for the analysis of NN-controlled nonlinear systems by accommodating perturbations.

The paper is organized as follows. Section II presents the problem formulation and the nominal stability analysis when the plant is LTI. Section III addresses uncertain systems using IQCs. Section IV provides numerical examples, including a nonlinear inverted pendulum and an uncertain vehicle model.

Notation: 𝕊n\mathbb{S}^{n} denotes the set of nn-by-nn symmetric matrices. 𝕊+n\mathbb{S}_{+}^{n} and 𝕊++n\mathbb{S}_{++}^{n} denote the sets of nn-by-nn symmetric, positive semidefinite and positive definite matrices, respectively. 𝕃\mathbb{RL}_{\infty} is the set of rational functions with real coefficients and no poles on the unit circle. 𝕃\mathbb{RH}_{\infty}\subset\mathbb{RL}_{\infty} contains functions that are analytic in the closed exterior of the unit disk in the complex plane. 2nx\ell^{n_{x}}_{2} is the set of sequences x:nxx:\mathbb{N}\rightarrow\mathbb{R}^{n_{x}} with x2:=k=0x(k)x(k)<\left\lVert x\right\rVert_{2}:=\sqrt{\sum_{k=0}^{\infty}x(k)^{\top}x(k)}<\infty. When applied to vectors, the orders >,>,\leq are applied elementwise. For P𝕊++nP\in\mathbb{S}_{++}^{n}, xnx_{*}\in\mathbb{R}^{n}, define the ellipsoid

(P,x):={xn:(xx)P(xx)1}.\displaystyle\mathcal{E}(P,x_{*}):=\{x\in\mathbb{R}^{n}:(x-x_{*})^{\top}P(x-x_{*})\leq 1\}. (1)

II Nominal Stability Analysis

II-A Problem Formulation

Consider the feedback system consisting of a plant GG and state-feedback controller π\pi as shown in Figure 1. As a first step, we assume the plant GG is a linear, time-invariant (LTI) system defined by the following discrete-time model:

x(k+1)\displaystyle x(k+1) =AGx(k)+BGu(k),\displaystyle=A_{G}\ x(k)+B_{G}\ u(k), (2)

where x(k)nGx(k)\in\mathbb{R}^{n_{G}} is the state, u(k)nuu(k)\in\mathbb{R}^{n_{u}} is the input, AGnG×nGA_{G}\in\mathbb{R}^{n_{G}\times n_{G}} and BGnG×nuB_{G}\in\mathbb{R}^{n_{G}\times n_{u}}. The controller π:nGnu\pi:\mathbb{R}^{n_{G}}\rightarrow\mathbb{R}^{n_{u}} is an \ell-layer, feed-forward neural network (NN) defined as:

w0(k)\displaystyle w^{0}(k) =x(k),\displaystyle=x(k), (3a)
wi(k)\displaystyle w^{i}(k) =ϕi(Wiwi1(k)+bi),i=1,,,\displaystyle=\phi^{i}\left(\ W^{i}w^{i-1}(k)+b^{i}\ \right),\ i=1,\ldots,\ell, (3b)
u(k)\displaystyle u(k) =W+1w(k)+b+1,\displaystyle=W^{\ell+1}w^{\ell}(k)+b^{\ell+1}, (3c)

where winiw^{i}\in\mathbb{R}^{n_{i}} are the outputs (activations) from the ithi^{th} layer and n0=nGn_{0}=n_{G}. The operations for each layer are defined by a weight matrix Wini×ni1W^{i}\in\mathbb{R}^{n_{i}\times n_{i-1}}, bias vector binib^{i}\in\mathbb{R}^{n_{i}}, and activation function ϕi:nini\phi^{i}:\mathbb{R}^{n_{i}}\rightarrow\mathbb{R}^{n_{i}}. The activation function ϕi\phi^{i} is applied element-wise, i.e.

ϕi(v):=[φ(v1),,φ(vni)],\displaystyle\phi^{i}(v):=\begin{bmatrix}\varphi(v_{1}),\cdots,\varphi(v_{n_{i}})\end{bmatrix}^{\top}, (4)

where φ:\varphi:\mathbb{R}\rightarrow\mathbb{R} is the (scalar) activation function selected for the NN. Common choices for the scalar activation function include φ(ν):=tanh(ν)\varphi(\nu):=\tanh(\nu), sigmoid φ(ν):=11+eν\varphi(\nu):=\frac{1}{1+e^{-\nu}}, ReLU φ(ν):=max(0,ν)\varphi(\nu):=\max(0,\nu), and leaky ReLU φ(ν):=max(aν,ν)\varphi(\nu):=\max(a\nu,\nu) with a(0,1)a\in(0,1). We assume the activation φ\varphi is identical in all layers; this can be relaxed with minor changes to the notation.

Refer to caption
Figure 1: Feedback system with plant GG and NN π\pi

The state vector xx_{*} is an equilibrium point of the feedback system with input uu_{*} if the following conditions hold:

x\displaystyle x_{*} =AGx+BGu,\displaystyle=A_{G}\ x_{*}+B_{G}\ u_{*}, (5a)
u\displaystyle u_{*} =π(x).\displaystyle=\pi(x_{*}). (5b)

Let χ(k;x0)\chi(k;x_{0}) denote the solution to the feedback system at time kk from initial condition x(0)=x0x(0)=x_{0}. Our goal is to analyze asymptotic stability of the equilibrium point and to find the largest estimate of the region of attraction, defined below, using an ellipsoidal inner approximation.

Definition 1.

The region of attraction (ROA) of the feedback system with plant GG and NN π\pi is defined as

:={x0nG:limkχ(k;x0)=x}.\displaystyle\mathcal{R}:=\{x_{0}\in\mathbb{R}^{n_{G}}:\lim_{k\rightarrow\infty}\chi(k;x_{0})=x_{*}\}. (6)

II-B NN Representation: Isolation of Nonlinearities

It is useful to isolate the nonlinear activation functions from the linear operations of the NN as done in [8, 13]. Define viv^{i} as the input to the activation function ϕi\phi^{i}:

vi(k):=Wiwi1(k)+bi,i=1,,.\displaystyle v^{i}(k):=W^{i}w^{i-1}(k)+b^{i},\ i=1,\ldots,\ell. (7)

The nonlinear operation of the ithi^{th} layer (3b) is thus expressed as wi(k)=ϕi(vi(k))w^{i}(k)=\phi^{i}(v^{i}(k)). Gather the inputs and outputs of all activation functions:

vϕ:=[v1v]nϕ and wϕ:=[w1w]nϕ,\displaystyle v_{\phi}:=\begin{bmatrix}v^{1}\\ \vdots\\ v^{\ell}\end{bmatrix}\in\mathbb{R}^{n_{\phi}}\mbox{ and }w_{\phi}:=\begin{bmatrix}w^{1}\\ \vdots\\ w^{\ell}\end{bmatrix}\in\mathbb{R}^{n_{\phi}}, (8)

where nϕ:=n1++nn_{\phi}:=n_{1}+\cdots+n_{\ell}, and define the combined nonlinearity ϕ:nϕnϕ\phi:\mathbb{R}^{n_{\phi}}\rightarrow\mathbb{R}^{n_{\phi}} by stacking the activation functions:

ϕ(vϕ):=[ϕ1(v1)ϕ(v)].\displaystyle\phi(v_{\phi}):=\begin{bmatrix}\phi^{1}(v^{1})\\ \vdots\\ \phi^{\ell}(v^{\ell})\end{bmatrix}. (9)

Thus wϕ(k)=ϕ(vϕ(k))w_{\phi}(k)=\phi(v_{\phi}(k)), where the scalar activation function φ\varphi is applied element-wise to each entry of vϕv_{\phi}. Finally, the NN control policy π\pi defined in (3) can be rewritten as:

[u(k)vϕ(k)]\displaystyle\begin{bmatrix}u(k)\\ v_{\phi}(k)\end{bmatrix} =N[x(k)wϕ(k)1]\displaystyle=N\begin{bmatrix}x(k)\\ w_{\phi}(k)\\ 1\end{bmatrix} (10a)
wϕ(k)\displaystyle w_{\phi}(k) =ϕ(vϕ(k)).\displaystyle=\phi(v_{\phi}(k)). (10b)

The matrix NN depends on the weights and biases as follows, where the vertical and horizontal bars partition NN compatibly with the inputs (x,wϕ,1)(x,w_{\phi},1) and outputs (u,vϕ)(u,v_{\phi}):

N\displaystyle N :=[000W+1b+1W1000b10W200b200W0b]\displaystyle:=\left[\begin{array}[]{c|cccc|c}0&0&0&\cdots&W^{\ell+1}&b^{\ell+1}\\ \hline\cr W^{1}&0&\cdots&0&0&b^{1}\\ 0&W^{2}&\cdots&0&0&b^{2}\\ \vdots&\vdots&\ddots&\vdots&\vdots&\vdots\\ 0&0&\cdots&W^{\ell}&0&b^{\ell}\end{array}\right] (11f)
:=[NuxNuwNubNvxNvwNvb].\displaystyle:=\begin{bmatrix}N_{ux}&N_{uw}&N_{ub}\\ N_{vx}&N_{vw}&N_{vb}\end{bmatrix}. (11g)

This decomposition of the NN, depicted in Figure 2, isolates the activation functions in preparation for the stability analysis.

Refer to caption
Figure 2: NN representation to isolate the nonlinearities ϕ\phi.

Suppose (x,u)(x_{*},u_{*}) satisfies (5). Then xx_{*} can be propagated through the NN to obtain equilibrium values viv_{*}^{i}, wiw_{*}^{i} for the inputs/outputs of each activation function (i=1,,i=1,\ldots,\ell), yielding (vϕ,wϕ)=(v,w)(v_{\phi},w_{\phi})=(v_{*},w_{*}). Thus (x,u,v,w)(x_{*},u_{*},v_{*},w_{*}) is an equilibrium point of (2) and (3) if:

x\displaystyle x_{*} =AGx+BGu,\displaystyle=A_{G}\ x_{*}+B_{G}\ u_{*}, (12a)
[uv]\displaystyle\begin{bmatrix}u_{*}\\ v_{*}\end{bmatrix} =N[xw1],\displaystyle=N\begin{bmatrix}x_{*}\\ w_{*}\\ 1\end{bmatrix}, (12b)
w\displaystyle w_{*} =ϕ(v).\displaystyle=\phi(v_{*}). (12c)

II-C Quadratic Constraints: Scalar Activation Functions

The stability analysis relies on quadratic constraints (QCs) to bound the activation function. A typical constraint is the sector bound as defined next.

Definition 2.

Let αβ\alpha\leq\beta be given. The function φ:\varphi:\mathbb{R}\rightarrow\mathbb{R} lies in the (global) sector [α,β][\alpha,\beta] if:

(φ(ν)αν)(βνφ(ν))0ν.\displaystyle(\varphi(\nu)-\alpha\nu)\cdot(\beta\nu-\varphi(\nu))\geq 0\,\,\,\forall\nu\in\mathbb{R}. (13)

The interpretation of the sector [α,β][\alpha,\beta] is that φ\varphi lies between lines passing through the origin with slope α\alpha and β\beta. Many activation functions are bounded in the sector [0,1][0,1], e.g. tanh\tanh and ReLU. Figure 3 illustrates φ(ν)=tanh(ν)\varphi(\nu)=\tanh(\nu) (blue solid) and the global sector defined by [0,1][0,1] (red solid lines).

Refer to caption
Figure 3: Sector constraints on tanh\tanh

The global sector constraint is often too coarse for stability analysis, and a local sector constraint provides tighter bounds.

Definition 3.

Let α\alpha, β\beta, ν¯\underline{\nu}, ν¯\bar{\nu}\in\mathbb{R} with αβ\alpha\leq\beta and ν¯0ν¯\underline{\nu}\leq 0\leq\bar{\nu}. The function φ:\varphi:\mathbb{R}\rightarrow\mathbb{R} satisfies the local sector [α,β][\alpha,\beta] if

(φ(ν)αν)(βνφ(ν))0ν[ν¯,ν¯].\displaystyle(\varphi(\nu)-\alpha\,\nu)\cdot(\beta\,\nu-\varphi(\nu))\geq 0\,\,\,\forall\nu\in[\underline{\nu},\bar{\nu}]. (14)

As an example, φ(ν):=tanh(ν)\varphi(\nu):=\tanh(\nu) restricted to the interval [ν¯,ν¯][-\bar{\nu},\bar{\nu}] satisfies the local sector bound [α,β][\alpha,\beta] with α:=tanh(ν¯)/ν¯>0\alpha:=\tanh(\bar{\nu})/\bar{\nu}>0 and β:=1\beta:=1. As shown in Figure 3 (green dashed lines), the local sector provides a tighter bound than the global sector. These bounds are valid for a symmetric interval around the origin with ν¯=ν¯\underline{\nu}=-\bar{\nu}; non-symmetric intervals (ν¯ν¯\underline{\nu}\neq-\bar{\nu}) can be handled similarly.

The local and global sector constraints above were defined to be centered at the point (ν,φ(ν))=(0,0)(\nu,\varphi(\nu))=(0,0). The stability analysis will require offset sectors centered around an arbitrary point (ν,φ(ν))(\nu_{*},\varphi(\nu_{*})) on the function. For example, φ(ν)=tanh(ν)\varphi(\nu)=\tanh(\nu) satisfies the global sector bound (red solid) around the point (ν,tanh(ν))(\nu_{*},\tanh(\nu_{*})) with [α,β]=[0,1][\alpha,\beta]=[0,1], as shown in Figure 4. It satisfies a tighter local sector bound (green dashed) when the input is restricted to ν[ν¯,ν¯]\nu\in[\underline{\nu},\bar{\nu}]. An explicit expression for this local sector is β=1\beta=1 and

α:=min(tanh(ν¯)tanh(ν)ν¯ν,tanh(ν)tanh(ν¯)νν¯).\displaystyle\alpha:=\min\left(\frac{\tanh(\bar{\nu})-\tanh(\nu_{*})}{\bar{\nu}-\nu_{*}},\frac{\tanh(\nu_{*})-\tanh(\underline{\nu})}{\nu_{*}-\underline{\nu}}\right).

The local sector upper bound β\beta can be tightened further. This leads to the following definition of an offset local sector.

Refer to caption
Figure 4: Offset local sector constraint on tanh\tanh
Definition 4.

Let α\alpha, β\beta, ν¯\underline{\nu}, ν¯\bar{\nu}, ν\nu_{*}\in\mathbb{R} be given with αβ\alpha\leq\beta and ν¯νν¯\underline{\nu}\leq\nu_{*}\leq\bar{\nu}. The function φ:\varphi:\mathbb{R}\rightarrow\mathbb{R} satisfies the offset local sector [α,β][\alpha,\beta] around the point (ν,φ(ν))(\nu_{*},\varphi(\nu_{*})) if:

(Δφ(ν)αΔν)(βΔνΔφ(ν))0ν[ν¯,ν¯]\displaystyle(\Delta\varphi(\nu)-\alpha\,\Delta\nu)\cdot(\beta\,\Delta\nu-\Delta\varphi(\nu))\geq 0\,\,\,\forall\nu\in[\underline{\nu},\bar{\nu}] (15)

where Δφ(ν):=φ(ν)φ(ν)\Delta\varphi(\nu):=\varphi(\nu)-\varphi(\nu_{*}) and Δν:=νν\Delta\nu:=\nu-\nu_{*}.

II-D Quadratic Constraints: Combined Activation Functions

Offset local sector constraints can also be defined for the combined nonlinearity ϕ\phi, given by (9). Let v¯,v¯,vnϕ\underline{v},\bar{v},v_{*}\in\mathbb{R}^{n_{\phi}} be given with v¯vv¯\underline{v}\leq v_{*}\leq\bar{v}. Assume that the activation input vϕnϕv_{\phi}\in\mathbb{R}^{n_{\phi}} lies, element-wise, in the interval [v¯,v¯][\underline{v},\bar{v}] and the ithi^{th} input/output pair is wϕ,i=φ(vϕ,i)w_{\phi,i}=\varphi(v_{\phi,i}). Further assume the scalar activation function satisfies the local sector [αi,βi][\alpha_{i},\beta_{i}] around the point v,iv_{*,i} with the input restricted to vϕ,i[v¯i,v¯i]v_{\phi,i}\in[\underline{v}_{i},\bar{v}_{i}] for i=1,,nϕi=1,\ldots,n_{\phi}. The local sector bounds can be computed for φ\varphi on the given interval either analytically (as above for tanh\tanh) or numerically. These local sectors can be stacked into vectors αϕ,βϕnϕ\alpha_{\phi},\beta_{\phi}\in\mathbb{R}^{n_{\phi}} that provide QCs satisfied by the combined nonlinearity ϕ\phi.

Lemma 1.

Let αϕ\alpha_{\phi}, βϕ\beta_{\phi}, v¯\underline{v}, v¯\bar{v}, vnϕv_{*}\in\mathbb{R}^{n_{\phi}} be given with αϕβϕ\alpha_{\phi}\leq\beta_{\phi}, v¯vv¯\underline{v}\leq v_{*}\leq\bar{v}, and w:=ϕ(v)w_{*}:=\phi(v_{*}). Assume ϕ\phi satisfies the offset local sector [αϕ,βϕ][\alpha_{\phi},\beta_{\phi}] around the point (v,w)(v_{*},w_{*}) element-wise for all vϕ[v¯,v¯]v_{\phi}\in[\underline{v},\bar{v}]. If λnϕ\lambda\in\mathbb{R}^{n_{\phi}} with λ0\lambda\geq 0 then:

[vϕvwϕw]ΨϕMϕ(λ)Ψϕ[vϕvwϕw]0\displaystyle\begin{bmatrix}v_{\phi}-v_{*}\\ w_{\phi}-w_{*}\end{bmatrix}^{\top}\Psi_{\phi}^{\top}M_{\phi}(\lambda)\Psi_{\phi}\begin{bmatrix}v_{\phi}-v_{*}\\ w_{\phi}-w_{*}\end{bmatrix}\geq 0
vϕ[v¯,v¯],wϕ=ϕ(vϕ),\displaystyle\hskip 28.90755pt\forall v_{\phi}\in[\underline{v},\bar{v}],\,w_{\phi}=\phi(v_{\phi}),
whereΨϕ\displaystyle\text{where}\ \ \Psi_{\phi} :=[diag(βϕ)Inϕdiag(αϕ)Inϕ]\displaystyle:=\begin{bmatrix}\text{diag}(\beta_{\phi})&-I_{n_{\phi}}\\ -\text{diag}(\alpha_{\phi})&I_{n_{\phi}}\end{bmatrix} (16)
Mϕ(λ)\displaystyle M_{\phi}(\lambda) :=[0nϕdiag(λ)diag(λ)0nϕ].\displaystyle:=\begin{bmatrix}0_{n_{\phi}}&\text{diag}(\lambda)\\ \text{diag}(\lambda)&0_{n_{\phi}}\end{bmatrix}. (17)
Proof.

For any vϕnϕv_{\phi}\in\mathbb{R}^{n_{\phi}} and wϕ=ϕ(vϕ)w_{\phi}=\phi(v_{\phi}):

[vϕvwϕw]ΨϕMϕ(λ)Ψϕ[vϕvwϕw]\displaystyle\begin{bmatrix}v_{\phi}-v_{*}\\ w_{\phi}-w_{*}\end{bmatrix}^{\top}\Psi_{\phi}^{\top}M_{\phi}(\lambda)\Psi_{\phi}\begin{bmatrix}v_{\phi}-v_{*}\\ w_{\phi}-w_{*}\end{bmatrix}
=i=1nϕλi(ΔwiαiΔvi)(βiΔviΔwi)\displaystyle\,\,\,\,=\sum_{i=1}^{n_{\phi}}\lambda_{i}(\Delta w_{i}-\alpha_{i}\,\Delta v_{i})\cdot(\beta_{i}\,\Delta v_{i}-\Delta w_{i})

where Δwi:=φ(vϕ,i)φ(v,i)\Delta w_{i}:=\varphi(v_{\phi,i})-\varphi(v_{*,i}) and Δvi:=vϕ,iv,i\Delta v_{i}:=v_{\phi,i}-v_{*,i}. If vϕ[v¯,v¯]v_{\phi}\in[\underline{v},\bar{v}] then each term in the sum is non-negative by the offset local sector constraints and λ0\lambda\geq 0. ∎

In order to apply the local sector and slope bounds in the stability analysis, we must first compute the bounds v¯,v¯nϕ\underline{v},\overline{v}\in\mathbb{R}^{n_{\phi}} on the activation input vϕv_{\phi}. The process to compute the bounds is briefly discussed here with more details provided in [22]. Let v1v_{*}^{1} be the equilibrium value at the first NN layer. Select v¯1\underline{v}^{1}, v¯1n1\bar{v}^{1}\in\mathbb{R}^{n_{1}} with v¯1v1v¯1\underline{v}^{1}\leq v_{*}^{1}\leq\bar{v}^{1}. The assumed bounds on v1v^{1} can be used to compute an interval [w¯1,w¯1][\underline{w}^{1},\bar{w}^{1}] for the output w1=ϕ1(v1)w^{1}=\phi^{1}(v^{1})For example, if φ(ν)=tanh(ν)\varphi(\nu)=\tanh(\nu) then the input bound ν[ν¯,ν¯]\nu\in[-\bar{\nu},\bar{\nu}] implies the output bound φ(ν)[tanh(ν¯),tanh(ν¯)]\varphi(\nu)\in[-\tanh(\bar{\nu}),\tanh(\bar{\nu})]. which can then be used to compute bounds [v¯2,v¯2][\underline{v}^{2},\bar{v}^{2}] on the input v2v^{2} to the next activation function.The next activation input is v2:=W2w1+b2v^{2}:=W^{2}w^{1}+b^{2}. The largest value of the ithi^{th} entry of this vector is obtained by solving the following optimization: v¯i2:=maxw¯1w1w¯1yw1+bi2\displaystyle\bar{v}^{2}_{i}:=\max_{\underline{w}^{1}\leq w^{1}\leq\bar{w}^{1}}y^{\top}w^{1}+b_{i}^{2} (18) where yy^{\top} is the ithi^{th} row of W2W^{2}. Define c:=12(w¯1+w¯1)c:=\frac{1}{2}(\bar{w}^{1}+\underline{w}^{1}) and r:=12(w¯1w¯1)r:=\frac{1}{2}(\bar{w}^{1}-\underline{w}^{1}). The optimization can be rewritten as: v¯i2:=(yc+bi2)+maxrδryδ\displaystyle\bar{v}^{2}_{i}:=\left(y^{\top}c+b_{i}^{2}\right)+\max_{-r\leq\delta\leq r}y^{\top}\delta (19) This has the explicit solution v¯i2=yc+bi2+j=1n1|yjrj|\bar{v}^{2}_{i}=y^{\top}c+b_{i}^{2}+\sum_{j=1}^{n_{1}}|y_{j}r_{j}|. Similarly, the minimal value is v¯i2=yc+bi2j=1n1|yjrj|\underline{v}^{2}_{i}=y^{\top}c+b_{i}^{2}-\sum_{j=1}^{n_{1}}|y_{j}r_{j}|. The intervals computed for w1w^{1} and v2v^{2} will contain their equilibrium value w1w_{*}^{1} and v2v_{*}^{2}. This process can be propagated through all layers of the NN to obtain the bounds v¯,v¯nϕ\underline{v},\overline{v}\in\mathbb{R}^{n_{\phi}} for the activation function input vϕv_{\phi}. The remainder of the paper will assume the local sector bounds have been computed as briefly summarized in the following property.

Property 1.

Let vnϕv_{*}\in\mathbb{R}^{n_{\phi}} be an equilibrium value of the activation input and v1n1v_{*}^{1}\in\mathbb{R}^{n_{1}} be the corresponding value at the first layer. Let v¯1\underline{v}^{1}, v¯1n1\bar{v}^{1}\in\mathbb{R}^{n_{1}} with v1[v¯1,v¯1]v^{1}_{*}\in[\underline{v}^{1},\bar{v}^{1}] and their corresponding activation input bounds v¯,v¯\underline{v},\overline{v} be given. There exist αϕ\alpha_{\phi}, βϕnϕ\beta_{\phi}\in\mathbb{R}^{n_{\phi}} such that ϕ\phi satisfies the offset local sector around the point (v,ϕ(v))(v_{*},\phi(v_{*})) for all vϕ[v¯,v¯]v_{\phi}\in[\underline{v},\bar{v}].

II-E Lyapunov Condition

This section uses a Lyapunov function and the offset local sector to compute an inner approximation for the ROA of the feedback system of GG and π\pi. To simplify notation, the interval bound on v1v^{1} is assumed to be symmetrical about v1v^{1}_{*}, i.e. v¯1=2v1v¯1\underline{v}^{1}=2v^{1}_{*}-\bar{v}^{1} so that v¯1v1=v1v¯1\bar{v}^{1}-v_{*}^{1}=v_{*}^{1}-\underline{v}^{1}. This can be relaxed to handle non-symmetrical intervals with minor notational changes.

Theorem 1.

Consider the feedback system of plant GG in (2) and NN π\pi in (3) with equilibrium point (x,u,v,w)(x_{*},u_{*},v_{*},w_{*}) satisfying (12). Let v¯1n1\bar{v}^{1}\in\mathbb{R}^{n_{1}}, v¯1:=2v1v¯1\underline{v}^{1}:=2v_{*}^{1}-\bar{v}^{1}, and αϕ,βϕnϕ\alpha_{\phi},\beta_{\phi}\in\mathbb{R}^{n_{\phi}} be given vectors satisfying Property 1 for the NN. Denote the ithi^{th} row of the first weight W1W^{1} by Wi1W^{1}_{i} and define matrices

RV:=[InG0nG×nϕNuxNuw], and Rϕ:=[NvxNvw0nϕ×nGInϕ].\displaystyle R_{V}:=\begin{bmatrix}I_{n_{G}}&0_{n_{G}\times n_{\phi}}\\ N_{ux}&N_{uw}\end{bmatrix},\,\mbox{ and }\,R_{\phi}:=\begin{bmatrix}N_{vx}&N_{vw}\\ 0_{n_{\phi}\times n_{G}}&I_{n_{\phi}}\end{bmatrix}.

If there exists a matrix P𝕊++nGP\in\mathbb{S}_{++}^{n_{G}}, and vector λnϕ\lambda\in\mathbb{R}^{n_{\phi}} with λ0\lambda\geq 0 such that

RV[AGPAGPAGPBGBGPAGBGPBG]RV\displaystyle R_{V}^{\top}\begin{bmatrix}A_{G}^{\top}PA_{G}-P&A_{G}^{\top}PB_{G}\\ B_{G}^{\top}PA_{G}&B_{G}^{\top}PB_{G}\end{bmatrix}R_{V}
+RϕΨϕMϕ(λ)ΨϕRϕ<0,\displaystyle\hskip 28.90755pt+R_{\phi}^{\top}\Psi_{\phi}^{\top}M_{\phi}(\lambda)\Psi_{\phi}R_{\phi}<0, (20)
[(v¯i1v,i1)2Wi1Wi1P]0,i=1,,n1,\displaystyle\begin{bmatrix}(\bar{v}_{i}^{1}-v^{1}_{*,i})^{2}&W^{1}_{i}\\ W^{1\top}_{i}&P\end{bmatrix}\geq 0,\,\,\,i=1,\cdots,n_{1}, (21)

then: (i) the feedback system consisting of GG and π\pi is locally stable around xx_{*}, and (ii) the set (P,x)\mathcal{E}(P,x_{*}), defined by (1), is an inner-approximation to the ROA.

Proof.

By Schur complements, (21) is equivalent to:

Wi1P1Wi1(v¯i1v,i1)2,i=1,,n1.\displaystyle W_{i}^{1}P^{-1}W_{i}^{1\top}\leq(\bar{v}_{i}^{1}-v_{*,i}^{1})^{2},\ i=1,\cdots,n_{1}. (22)

It follows from Lemma 1 in [23] that:

(P,x){xnG:v¯1v1W1(xx)v¯1v1}.\displaystyle\mathcal{E}(P,x_{*})\subseteq\{x\in\mathbb{R}^{n_{G}}:\underline{v}^{1}-v_{*}^{1}\leq W^{1}(x-x_{*})\leq\bar{v}^{1}-v_{*}^{1}\}.

Finally, use v1v1=W1(xx)v^{1}-v_{*}^{1}=W^{1}(x-x_{*}) to rewrite this as:

(P,x){x:v¯1v1v¯1}.\displaystyle\mathcal{E}(P,x_{*})\subseteq\{x:\underline{v}^{1}\leq v^{1}\leq\bar{v}^{1}\}.

To summarize, feasibility of (21) verifies that if x(k)(P,x)x(k)\in\mathcal{E}(P,x_{*}) then v1(k)[v¯1,v¯1]v^{1}(k)\in[\underline{v}^{1},\bar{v}^{1}] and hence the offset local sector conditions are valid.

Next, since the LMI in (20) is strict, there exists ϵ>0\epsilon>0 such that left / right multiplication of the LMI by [(x(k)x)(wϕ(k)w)]\begin{bmatrix}(x(k)-x_{*})^{\top}&(w_{\phi}(k)-w_{*})^{\top}\end{bmatrix} and its transpose yields

[][AGPAGPAGPBGBGPAGBGPBG][x(k)xu(k)u]\displaystyle\Big{[}\star\Big{]}^{\top}\begin{bmatrix}A_{G}^{\top}PA_{G}-P&A_{G}^{\top}PB_{G}\\ B_{G}^{\top}PA_{G}&B_{G}^{\top}PB_{G}\end{bmatrix}\begin{bmatrix}x(k)-x_{*}\\ u(k)-u_{*}\end{bmatrix}
+[]ΨϕMϕ(λ)Ψϕ[vϕ(k)vwϕ(k)w]ϵx(k)x2.\displaystyle+\Big{[}\star\Big{]}^{\top}\Psi_{\phi}^{\top}M_{\phi}(\lambda)\Psi_{\phi}\begin{bmatrix}v_{\phi}(k)-v_{*}\\ w_{\phi}(k)-w_{*}\end{bmatrix}\leq-\epsilon\|x(k)-x_{*}\|^{2}.

where the entries denoted by \star can be inferred from symmetry. Define the Lyapunov function V(x):=(xx)P(xx)V(x):=(x-x_{*})^{\top}P(x-x_{*}) and use (2) and (12) to show:

V(x(k+1))V(x(k))+[]ΨϕMϕ(λ)Ψϕ[vϕ(k)vwϕ(k)w]\displaystyle V(x(k+1))-V(x(k))+\Big{[}\star\Big{]}^{\top}\Psi_{\phi}^{\top}M_{\phi}(\lambda)\Psi_{\phi}\begin{bmatrix}v_{\phi}(k)-v_{*}\\ w_{\phi}(k)-w_{*}\end{bmatrix}
ϵx(k)x2.\displaystyle~{}~{}~{}~{}~{}~{}~{}~{}\leq-\epsilon\|x(k)-x_{*}\|^{2}. (23)

Assume x(k)(P,x)x(k)\in\mathcal{E}(P,x_{*}) for some k0k\geq 0, i.e., V(x(k))1V(x(k))\leq 1. As noted above, x(k)(P,x)x(k)\in\mathcal{E}(P,x_{*}) implies the offset local sector [αϕ,βϕ][\alpha_{\phi},\beta_{\phi}] around vv_{*}. Then, by Lemma 1, the final term on the left side of (II-E) is 0\geq 0, and thus from (II-E) we have V(x(k+1))1V(x(k+1))\leq 1, i.e., x(k+1)(P,x)x(k+1)\in\mathcal{E}(P,x_{*}). By induction, we have that (P,x)\mathcal{E}(P,x_{*}) is forward invariant, i.e., x(0)(P,x)x(k)(P,x)k0x(0)\in\mathcal{E}(P,x_{*})\implies x(k)\in\mathcal{E}(P,x_{*})\ \forall k\geq 0. As a result, if x(0)(P,x)x(0)\in\mathcal{E}(P,x_{*}), then the final term on the left side of (II-E) is 0\geq 0 for all k0k\geq 0, and V(x(k+1))V(x(k))ϵx(k)x2V(x(k+1))-V(x(k))\leq-\epsilon\|x(k)-x_{*}\|^{2} for all k0k\geq 0. It follows from a Lyapunov argument, e.g. Theorem 4.1 in [7], that xx_{*} is an asymptotically stable equilibrium point and (P,x)\mathcal{E}(P,x_{*}) is an inner approximation of the ROA. ∎

Remark 1.

Note that v¯1\overline{v}^{1} should be chosen with care as it affects the size of ROA inner-approximations directly: decreasing (v¯1v1)(\overline{v}^{1}-v_{*}^{1}) gives rise to sharper local sector bounds, which is beneficial on ROA estimation, but also restricts the region where ROA inner-approximations lie in; increasing (v¯1v1)(\overline{v}^{1}-v_{*}^{1}) leads to a larger region that contains ROA inner-approximations, but also provides looser local sector bounds. A possible way of choosing v¯1\overline{v}^{1} is to parameterize (v¯1v1)(\overline{v}^{1}-v_{*}^{1}) as v¯1v1=δv×1n1×1\overline{v}^{1}-v_{*}^{1}=\delta_{v}\times 1_{n_{1}\times 1} with δv++\delta_{v}\in\mathbb{R}_{++}, grid the interval [0,δ¯v][0,\overline{\delta}_{v}]§§§δ¯v\overline{\delta}_{v} is the largest value such that (20) and (21) stay feasible. where δv\delta_{v} lies in, inner-approximate the ROA on the grid, and choose δv\delta_{v} that leads to the largest inner-approximation.

Remark 2.

In the paper, the NN controller is assumed to be state-feedback. For the output-feedback case, i.e., u=π(Cx)u=\pi(Cx), where Cny×nGC\in\mathbb{R}^{n_{y}\times n_{G}}, the stability analysis can be performed similarly, using a new NvxN_{vx} defined as Nvx:=[W1C0(n2++n)×nG]N_{vx}:=\left[\begin{smallmatrix}W^{1}C\\ 0_{(n_{2}+...+n_{\ell})\times n_{G}}\end{smallmatrix}\right].

III Robust Stability Analysis

III-A Problem Formulation

Consider the uncertain feedback system in Figure 5, consisting of an uncertain plant Fu(G,Δ)F_{u}(G,\Delta) and a NN controller π\pi as defined by (3). The uncertain plant Fu(G,Δ)F_{u}(G,\Delta) is an interconnection of a nominal plant GG and a perturbation Δ\Delta. The nominal plant GG is defined by the following equations:

x(k+1)\displaystyle x(k+1) =AGx(k)+BG1q(k)+BG2u(k),\displaystyle=A_{G}\ x(k)+B_{G1}\ q(k)+B_{G2}\ u(k), (24a)
p(k)\displaystyle p(k) =CGx(k)+DG1q(k)+DG2u(k),\displaystyle=C_{G}\ x(k)+D_{G1}\ q(k)+D_{G2}\ u(k), (24b)

where x(k)nGx(k)\in\mathbb{R}^{n_{G}} is the state, u(k)nuu(k)\in\mathbb{R}^{n_{u}} is the control input, p(k)npp(k)\in\mathbb{R}^{n_{p}} and q(k)nqq(k)\in\mathbb{R}^{n_{q}} are the input and output of Δ\Delta, AGnG×nGA_{G}\in\mathbb{R}^{n_{G}\times n_{G}}, BG1nG×nqB_{G1}\in\mathbb{R}^{n_{G}\times n_{q}}, BG2nG×nuB_{G2}\in\mathbb{R}^{n_{G}\times n_{u}}, CGnp×nGC_{G}\in\mathbb{R}^{n_{p}\times n_{G}}, DG1np×nqD_{G1}\in\mathbb{R}^{n_{p}\times n_{q}}, and DG2np×nuD_{G2}\in\mathbb{R}^{n_{p}\times n_{u}}. The perturbation is a bounded, causal operator Δ:2enp2enq\Delta:\ell^{n_{p}}_{2e}\rightarrow\ell^{n_{q}}_{2e}. The nominal plant GG and perturbation Δ\Delta form the interconnection Fu(G,Δ)F_{u}(G,\Delta) through the constraint

q()=Δ(p()).\displaystyle q(\cdot)=\Delta(p(\cdot)). (25)

Denote the set of perturbations to be considered as 𝒮\mathcal{S}.

Refer to caption
Figure 5: Feedback system with uncertain plant Fu(G,Δ)F_{u}(G,\Delta) and NN controller π\pi
Assumption 1.

In this section, we assume (i) the equilibrium point (x,u,v,w,p,q)(x_{*},u_{*},v_{*},w_{*},p_{*},q_{*}) of the feedback system is at the origin, and (ii) 0=Δ(0)0=\Delta(0) for all Δ𝒮\Delta\in\mathcal{S}. Note that Δ\Delta is modeled as an operator mapping inputs to outputs. If Δ\Delta has an internal state then there is an implicit assumption that it has zero initial condition.

Let χ(k;x0,Δ)\chi(k;x_{0},\Delta) denote the solution to the feedback system of Fu(G,Δ)F_{u}(G,\Delta) and π\pi with Δ𝒮\Delta\in\mathcal{S} at time kk from the initial condition x(0)=x0x(0)=x_{0}An input/output model is used for the perturbation Δ\Delta so that its internal state and initial condition is not explicitly considered.. Define the robust ROA associated with xx_{*} as follows.

Definition 5.

The robust ROA of the feedback system with the uncertain plant Fu(G,Δ)F_{u}(G,\Delta) and NN π\pi is defined as:

:={x0nG:limkχ(k;x0,Δ)=xΔ𝒮}.\displaystyle\mathcal{R}:=\{x_{0}\in\mathbb{R}^{n_{G}}:\lim_{k\rightarrow\infty}\chi(k;x_{0},\Delta)=x_{*}\ \forall\Delta\in\mathcal{S}\}. (26)

The objective is to prove the uncertain feedback system is asymptotically stable and, if so, to find the largest estimate of the robust ROA using an ellipsoidal inner approximation.

III-B Integral Quadratic Constraints

The perturbation can represent various types of uncertainty [14], [15], including saturation, time delay, unmodeled dynamics, and slope-restricted nonlinearities. The input-output relationship of Δ\Delta is characterized with an integral quadratic constraint (IQC), which consists of a ‘virtual’ filter ΨΔ\Psi_{\Delta} applied to the input pp and output qq of Δ\Delta and a constraint on the output rr of ΨΔ\Psi_{\Delta}. The filter ΨΔ\Psi_{\Delta} is an LTI system of the form:

ψ(k+1)\displaystyle\psi(k+1) =AΨψ(k)+BΨ1p(k)+BΨ2q(k)\displaystyle=A_{\Psi}\ \psi(k)+B_{\Psi 1}\ p(k)+B_{\Psi 2}\ q(k) (27a)
r(k)\displaystyle r(k) =CΨψ(k)+DΨ1p(k)+DΨ2q(k)\displaystyle=C_{\Psi}\ \psi(k)+D_{\Psi 1}\ p(k)+D_{\Psi 2}\ q(k) (27b)
ψ(0)\displaystyle\psi(0) =0\displaystyle=0 (27c)

where ψ(k)nψ\psi(k)\in\mathbb{R}^{n_{\psi}} is the state, r(k)nrr(k)\in\mathbb{R}^{n_{r}} is the output, and AΨA_{\Psi} is a Schur matrix. The state matrices have compatible dimensions. The dynamics of ΨΔ\Psi_{\Delta} can be compactly denoted by [AΨBΨ1BΨ2CΨDΨ1DΨ2]\left[\begin{array}[]{c|cc}A_{\Psi}&B_{\Psi 1}&B_{\Psi 2}\\ \hline\cr C_{\Psi}&D_{\Psi 1}&D_{\Psi 2}\end{array}\right]. By (p,q)=0(p_{*},q_{*})=0 from Assumption 1, the equilibrium state ψnψ\psi_{*}\in\mathbb{R}^{n_{\psi}} of (27) is also zero.

The Lyapunov analysis in the next subsection makes use of time-domain IQCs as defined next:

Definition 6.

Let ΨΔnr×(np+nq)\Psi_{\Delta}\in\mathbb{RH}_{\infty}^{n_{r}\times(n_{p}+n_{q})} and MΔ𝕊nrM_{\Delta}\in\mathbb{S}^{n_{r}} be given. A bounded, causal operator Δ:2enp2enq\Delta:\ell^{n_{p}}_{2e}\rightarrow\ell^{n_{q}}_{2e} satisfies the time domain IQC defined by (ΨΔ,MΔ)(\Psi_{\Delta},M_{\Delta}) if the following inequality holds for all p2enpp\in\ell_{2e}^{n_{p}}, q=Δ(p)q=\Delta(p) and for all N0N\geq 0

k=0Nr(k)MΔr(k)0.\displaystyle\sum_{k=0}^{N}r(k)^{\top}M_{\Delta}r(k)\geq 0. (28)

The notation Δ\Delta\in IQC(ΨΔ,MΔ)(\Psi_{\Delta},M_{\Delta}) indicates that Δ\Delta satisfies the IQC defined by ΨΔ\Psi_{\Delta} and MΔM_{\Delta}. Therefore, the precise relation (25), for analysis, is replaced by the constraint (28) on rr. The QC proposed in Lemma 1 is a special instance of a time-domain IQCs. Specifically, Lemma 1 defines a QC that holds at each time step kk and hence the inequality also holds summing over any finite horizons. This is referred to as the offset local sector IQC.

The time-domain IQCs, as defined here, hold on any finite horizon N0N\geq 0. These are typically called “hard IQCs” [14]. IQCs can also be defined in the frequency domain and equivalently expressed as time-domain constraints over an infinite horizon (N=N=\infty). These are called soft IQCs. Although this paper focuses on the use of hard IQCs, it is possible to also incorporate soft IQCs [21, 20, 24, 25].

III-C Lyapunov Condition

Let ζ:=[xψ]nζ\zeta:=\left[\begin{smallmatrix}x\\ \psi\end{smallmatrix}\right]\in\mathbb{R}^{n_{\zeta}} define the extended state vector, nζ=nG+nψn_{\zeta}=n_{G}+n_{\psi}, whose dynamics are

ζ(k+1)\displaystyle\zeta(k+1) =𝒜ζ(k)+[q(k)u(k)]\displaystyle=\mathcal{A}\ \zeta(k)+\mathcal{B}\begin{bmatrix}q(k)\\ u(k)\end{bmatrix} (29a)
r(k)\displaystyle r(k) =𝒞ζ(k)+𝒟[q(k)u(k)]\displaystyle=\mathcal{C}\ \zeta(k)+\mathcal{D}\begin{bmatrix}q(k)\\ u(k)\end{bmatrix} (29b)
u(k)\displaystyle u(k) =π(x(k))\displaystyle=\pi(x(k)) (29c)

where the state-space matrices are

𝒜=[AG0BΨ1CGAΨ],=[BG1BG2BΨ1DG1+BΨ2BΨ1DG2],\displaystyle\mathcal{A}=\begin{bmatrix}A_{G}&0\\ B_{\Psi 1}C_{G}&A_{\Psi}\end{bmatrix},\ \mathcal{B}=\begin{bmatrix}B_{G1}&B_{G2}\\ B_{\Psi 1}D_{G1}+B_{\Psi 2}&B_{\Psi 1}D_{G2}\end{bmatrix},
𝒞=[DΨ1CGCΨ],𝒟=[DΨ1DG1+DΨ2DΨ1DG2].\displaystyle\mathcal{C}=\begin{bmatrix}D_{\Psi 1}C_{G}&C_{\Psi}\end{bmatrix},\ \mathcal{D}=\begin{bmatrix}D_{\Psi 1}D_{G1}+D_{\Psi 2}&D_{\Psi 1}D_{G2}\end{bmatrix}.

Let ζ:=[xψ]=0\zeta_{*}:=\left[\begin{smallmatrix}x_{*}\\ \psi_{*}\end{smallmatrix}\right]=0 define the equilibrium point of the extended system (29). Since IQCs implicitly constrain the input pp of the extended system (29), the response of the extended system subject to IQCs “covers” the behaviors of the original uncertain feedback system. The following theorem provides a method for inner-approximating the robust ROA by performing analysis on the extended system subject to IQCs.

Theorem 2.

Consider the feedback system of an uncertain plant Fu(G,Δ)F_{u}(G,\Delta) in (24)–(25), and the NN π\pi in (3) with zero equilibrium point (ζ,u,v,w,p,q)(\zeta_{*},u_{*},v_{*},w_{*},p_{*},q_{*}). Assume Δ\Delta\in IQC(ΨΔ,MΔ)(\Psi_{\Delta},M_{\Delta}) with ΨΔ\Psi_{\Delta} and MΔM_{\Delta} given. Let v¯1n1\bar{v}^{1}\in\mathbb{R}^{n_{1}}, v¯1:=v¯1\underline{v}^{1}:=-\bar{v}^{1}, and αϕ,βϕnϕ\alpha_{\phi},\beta_{\phi}\in\mathbb{R}^{n_{\phi}} be given vectors satisfying Property 1 for the NN, and define matrices

RV\displaystyle R_{V} =[Inζ0000InqNuζNuw0],Nuζ=[Nux,0nu×nψ],\displaystyle=\left[\begin{array}[]{ccc}I_{n_{\zeta}}&0&0\\ \hline\cr 0&0&I_{n_{q}}\\ N_{u\zeta}&N_{uw}&0\end{array}\right],\ N_{u\zeta}=[N_{ux},0_{n_{u}\times n_{\psi}}],
Rϕ\displaystyle R_{\phi} =[NvζNvw00Inϕ0],Nvζ=[Nvx,0nϕ×nψ],\displaystyle=\begin{bmatrix}N_{v\zeta}&N_{vw}&0\\ 0&I_{n_{\phi}}&0\end{bmatrix},\ N_{v\zeta}=[N_{vx},0_{n_{\phi}\times n_{\psi}}],
𝒲i1\displaystyle\mathcal{W}_{i}^{1} =[Wi101×nψ],Wi1is the ith row ofW1.\displaystyle=\begin{bmatrix}W^{1}_{i}&0_{1\times n_{\psi}}\end{bmatrix},\ W^{1}_{i}\ \text{is the $i^{th}$ row of}\ W^{1}.

If there exists a matrix P𝕊++nζP\in\mathbb{S}_{++}^{n_{\zeta}}, and vector λnϕ\lambda\in\mathbb{R}^{n_{\phi}} with λ0\lambda\geq 0 such that

RV[𝒜P𝒜P𝒜PP𝒜P]RV+RϕΨϕMϕ(λ)ΨϕRϕ\displaystyle R_{V}^{\top}\begin{bmatrix}\mathcal{A}^{\top}P\mathcal{A}-P&\mathcal{A}^{\top}P\mathcal{B}\\ \mathcal{B}^{\top}P\mathcal{A}&\mathcal{B}^{\top}P\mathcal{B}\end{bmatrix}R_{V}+R_{\phi}^{\top}\Psi_{\phi}^{\top}M_{\phi}(\lambda)\Psi_{\phi}R_{\phi}
+RV[𝒞𝒟]MΔ[𝒞𝒟]RV<0,\displaystyle~{}~{}~{}~{}~{}~{}+R_{V}^{\top}\begin{bmatrix}\mathcal{C}&\mathcal{D}\end{bmatrix}^{\top}M_{\Delta}\begin{bmatrix}\mathcal{C}&\mathcal{D}\end{bmatrix}R_{V}<0, (30a)
[(v¯i1)2𝒲i1𝒲i1P]0,i=1,,n1,\displaystyle\begin{bmatrix}(\bar{v}^{1}_{i})^{2}&\mathcal{W}^{1}_{i}\\ \mathcal{W}^{1\top}_{i}&P\end{bmatrix}\geq 0,\ i=1,\cdots,n_{1}, (30b)

then: (i) the feedback system comprising Fu(G,Δ)F_{u}(G,\Delta) and π\pi is locally stable around xx_{*} for any Δ\Delta\in IQC(ΨΔ,MΔ)(\Psi_{\Delta},M_{\Delta}), and (ii) the intersection of (P,ζ)\mathcal{E}(P,\zeta_{*}) with the hyperplane ψ=0\psi=0, i.e. (Px,x)\mathcal{E}(P_{x},x_{*}) where PxnG×nGP_{x}\in\mathbb{R}^{n_{G}\times n_{G}} is the upper left block of PP, is an inner-approximation to the robust ROA.

Proof.

As in the proof of Theorem 1, feasibility of (30b) implies that if ζ(k)(P,ζ)\zeta(k)\in\mathcal{E}(P,\zeta_{*}) then v1(k)[v¯1,v¯1]v^{1}(k)\in[\underline{v}^{1},\bar{v}^{1}] and hence the offset local sectors conditions are valid. Since the LMI in (30a) is strict, there exists ϵ>0\epsilon>0 such that left/right multiplication of the LMI by [(ζ(k)ζ),(wϕ(k)w),(q(k)q)][(\zeta(k)-\zeta_{*})^{\top},(w_{\phi}(k)-w_{*})^{\top},(q(k)-q_{*})^{\top}] and its transpose yields:

[][𝒜P𝒜P𝒜PP𝒜P][ζ(k)ζq(k)qu(k)u]\displaystyle\Bigg{[}\star\Bigg{]}^{\top}\begin{bmatrix}\mathcal{A}^{\top}P\mathcal{A}-P&\mathcal{A}^{\top}P\mathcal{B}\\ \mathcal{B}^{\top}P\mathcal{A}&\mathcal{B}^{\top}P\mathcal{B}\end{bmatrix}\left[\begin{array}[]{c}\zeta(k)-\zeta_{*}\\ \hline\cr q(k)-q_{*}\\ u(k)-u_{*}\end{array}\right] (34)
+[][𝒞𝒟]MΔ[𝒞𝒟][ζ(k)ζq(k)qu(k)u]\displaystyle+\Bigg{[}\star\Bigg{]}^{\top}\begin{bmatrix}\mathcal{C}&\mathcal{D}\end{bmatrix}^{\top}M_{\Delta}\begin{bmatrix}\mathcal{C}&\mathcal{D}\end{bmatrix}\left[\begin{array}[]{c}\zeta(k)-\zeta_{*}\\ \hline\cr q(k)-q_{*}\\ u(k)-u_{*}\end{array}\right] (38)
+[]ΨϕMϕ(λ)Ψϕ[vϕ(k)vwϕ(k)w]ϵζ(k)ζ2,\displaystyle+\Big{[}\star\Big{]}^{\top}\Psi_{\phi}^{\top}M_{\phi}(\lambda)\Psi_{\phi}\begin{bmatrix}v_{\phi}(k)-v_{*}\\ w_{\phi}(k)-w_{*}\end{bmatrix}\leq-\epsilon\|\zeta(k)-\zeta_{*}\|^{2},

Define the Lyapunov function V(ζ):=(ζζ)P(ζζ)V(\zeta):=(\zeta-\zeta_{*})^{\top}P(\zeta-\zeta_{*}), and use (29) to show:

V(ζ(k+1))V(ζ(k))+r(k)MΔr(k)\displaystyle V(\zeta(k+1))-V(\zeta(k))+r(k)^{\top}M_{\Delta}r(k)
+[]ΨϕMϕ(λ)Ψϕ[vϕ(k)vwϕ(k)w]ϵζ(k)ζ2.\displaystyle+\Big{[}\star\Big{]}^{\top}\Psi_{\phi}^{\top}M_{\phi}(\lambda)\Psi_{\phi}\begin{bmatrix}v_{\phi}(k)-v_{*}\\ w_{\phi}(k)-w_{*}\end{bmatrix}\leq-\epsilon\|\zeta(k)-\zeta_{*}\|^{2}.

Sum this inequality from k=0k=0 to any finite time N0N\geq 0. The third and fourth term on the left side will be 0\geq 0 by the local sector conditions and the IQC. This yields:

V(ζ(N+1))V(ζ(0))k=0Nϵζ(k)ζ2.\displaystyle V(\zeta(N+1))-V(\zeta(0))\leq-\sum_{k=0}^{N}\epsilon\|\zeta(k)-\zeta_{*}\|^{2}.

Thus if ζ(0)(P,ζ)\zeta(0)\in\mathcal{E}(P,\zeta_{*}) then ζ(k)(P,ζ)\zeta(k)\in\mathcal{E}(P,\zeta_{*}) for all k0k\geq 0. Moreover, this inequality implies that ζ(N)ζ\zeta(N)\rightarrow\zeta_{*} as NN\rightarrow\infty. The initial condition for the virtual filter is ψ(0)=0\psi(0)=0 so that ζ(0)(P,ζ)\zeta(0)\in\mathcal{E}(P,\zeta_{*}) is equivalent to x(0)(Px,x)x(0)\in\mathcal{E}(P_{x},x_{*}). Hence (Px,x)\mathcal{E}(P_{x},x_{*}) is an inner approximation for the ROA. ∎

For a particular perturbation Δ\Delta there is typically a class of valid time-domain IQCs defined by a fixed filter ΨΔ\Psi_{\Delta} and a matrix MΔM_{\Delta} drawn from a constraint set Δ\mathcal{M}_{\Delta}. Therefore when formulating an optimization problem, along with PP and λ\lambda, we can treat MΔΔM_{\Delta}\in\mathcal{M}_{\Delta} as an additional decision variable to reduce conservatism. In this paper, the set Δ\mathcal{M}_{\Delta} is restricted to one that is described by LMIs [15]. Using trace(Px)(P_{x}) as the cost function to minimize along with the LMIs developed before, we have the following optimization to compute the “largest” ROA inner-approximation:

minP,λ,MΔtrace(Px)s.t.(30a)and(30b)hold,\displaystyle\min_{P,\lambda,M_{\Delta}}\ \text{trace$(P_{x})$}\ \text{s.t.}\ \eqref{eq:dissineq}\ \text{and}\ \eqref{eq:robsetcontain}\ \text{hold,} (39)

which is convex in (P,λ,MΔ)(P,\lambda,M_{\Delta}). The strict inequality in (30a) can be enforced by either replacing <0<0 with ϵI\leq-\epsilon I with ϵ=1×106\epsilon=1\times 10^{-6}, or solving (39) with a non-strict inequality 0\leq 0, and checking if the constraint is active afterwards.

III-D IQCs for Combined Activation Functions ϕ\phi

Now that we have the general framework that merges Lyapunov theory with IQCs, we will revisit the problem of describing the activation functions ϕ\phi using more general tools. Recall that offset local sector QCs have been used in Sections II and III to bound activation functions ϕ\phi. However, these local sectors fail to incorporate slope bounds of ϕ\phi. In this subsection, in addition to local sectors, we will use off-by-one IQCs [16] to capture the slope information of ϕ\phi to achieve less conservative ROA inner-approximations.

Besides the local sector bound αϕ,βϕ\alpha_{\phi},\beta_{\phi}, the bounds v¯,v¯\underline{v},\overline{v} on activation input vϕv_{\phi} can also be used to compute the local slope bounds [mϕ,Lϕ][m_{\phi},L_{\phi}] of ϕ\phi, with mϕ,Lϕnϕm_{\phi},L_{\phi}\in\mathbb{R}^{n_{\phi}}. For example, ϕi(vϕ,i):=tanh(vϕ,i)\phi_{i}(v_{\phi,i}):=\tanh(v_{\phi,i}) restricted to the interval [v¯i,v¯i][\underline{v}_{i},\overline{v}_{i}] for i=1,,nϕi=1,...,n_{\phi} satisfies the local slope bound [mϕ,i,Lϕ,i][m_{\phi,i},L_{\phi,i}] with mϕ,i:=min(dtanh(vi)dvi|vi=v¯i,dtanh(vi)dvi|vi=v¯i)m_{\phi,i}:=\min\left(\frac{d\tanh(v_{i})}{dv_{i}}|_{v_{i}=\underline{v}_{i}},\frac{d\tanh(v_{i})}{dv_{i}}|_{v_{i}=\overline{v}_{i}}\right), and Lϕ,i:=1L_{\phi,i}:=1. If wϕ=ϕ(vϕ)w_{\phi}=\phi(v_{\phi}) with vϕ(k)[v¯,v¯]v_{\phi}(k)\in[\underline{v},\overline{v}], then ϕ\phi also satisfies the hard IQC defined by (Ψoff,Moff)(\Psi_{\text{off}},M_{\text{off}}), where

Ψoff:=[0nϕdiag(Lϕ)InϕInϕdiag(Lϕ)Inϕ0nϕdiag(mϕ)Inϕ],\displaystyle\Psi_{\text{off}}:=\left[\begin{array}[]{c|cc}0_{n_{\phi}}&-\text{diag}(L_{\phi})&I_{n_{\phi}}\\ \hline\cr I_{n_{\phi}}&\text{diag}(L_{\phi})&-I_{n_{\phi}}\\ 0_{n_{\phi}}&-\text{diag}(m_{\phi})&I_{n_{\phi}}\end{array}\right], (43)
Moff(η):=[0nϕdiag(η)diag(η)0nϕ],for allηnϕwithη0.\displaystyle M_{\text{off}}(\eta):=\begin{bmatrix}0_{n_{\phi}}&\text{diag}(\eta)\\ \text{diag}(\eta)&0_{n_{\phi}}\end{bmatrix},\ \text{for all}\ \eta\in\mathbb{R}^{n_{\phi}}\ \text{with}\ \eta\geq 0.

This is the so-called “off-by-one” IQC [16], which is a special instance of the Zames-Falb IQC [26, 27]. It provides constraints that relate the activation at different time instances, e.g. between ϕi(vϕ,i(k))\phi_{i}(v_{\phi,i}(k)) and ϕi(vϕ,i(k+1))\phi_{i}(v_{\phi,i}(k+1)) for any i=1,,nϕi=1,\ldots,n_{\phi}.

The analysis on the feedback system of Fu(G,Δ)F_{u}(G,\Delta) and π\pi can be instead performed on the extended system made up by GG, ΨΔ\Psi_{\Delta} and Ψoff\Psi_{\text{off}} with additional constraints that Δ\Delta\in IQC(ΨΔ,MΔ)(\Psi_{\Delta},M_{\Delta}), and ϕ\phi satsifies the offset local sector and ϕ\phi\in IQC(Ψoff,Moff)(\Psi_{\text{off}},M_{\text{off}}). However, since Ψoff\Psi_{\text{off}} introduces a number of nϕn_{\phi} states to the extended system, the size of the corresponding Lyapunov matrix PP will increase from 𝕊++nx+nψ\mathbb{S}_{++}^{n_{x}+n_{\psi}} to 𝕊++nx+nψ+nϕ\mathbb{S}_{++}^{n_{x}+n_{\psi}+n_{\phi}}, which leads to longer computation time. The effectiveness of the off-by-one IQC is demonstrated in Section IV-B.

IV Examples

In the following examples, the optimization (39) is solved using MOSEK with CVX. The code is available at https://github.com/heyinUCB/Stability-Analysis-using-Quadratic-Constraints-for-Systems-with-Neural-Network-Controllers.

IV-A Inverted pendulum

Consider the nonlinear inverted pendulum example with mass m=0.15m=0.15 kg, length l=0.5l=0.5 m, and friction coefficient μ=0.5\mu=0.5 Nms//rad. The dynamics are:

θ¨(t)\displaystyle\ddot{\theta}(t) =mglsin(θ(t))μθ˙(t)+sat(u(t))ml2,\displaystyle=\frac{mgl\sin(\theta(t))-\mu\dot{\theta}(t)+\text{sat}(u(t))}{ml^{2}}, (44)

where θ\theta is the angular position (rad) and uu is the control input (Nm). The plant state is x=[θ,θ˙]x=[\theta,\dot{\theta}]. The saturation function is defined as sat(u)=sgn(u)min(|u|,umax)\text{sat}(u)=\text{sgn}(u)\min(|u|,u_{\text{max}}), with umax=0.7u_{\text{max}}=0.7 Nm. The controller π\pi is obtained through a reinforcement learning process using policy gradient [3, 4, 5]. During training, the agent decision making process is characterized by a probability: u(k)Pr(u(k)=u|x(k)=x)u(k)\sim Pr(u(k)=u\ |\ x(k)=x) for all uu\in\mathbb{R} and x2x\in\mathbb{R}^{2} where the probability is a Gaussian distribution with mean π(x(k))\pi(x(k)), and standard deviation σ\sigma. After training, the policy mean π\pi is used as the deterministic controller u(k)=π(x(k))u(k)=\pi(x(k)). The controller π\pi is parameterized by a 2-layer, feedforward NN with n1=n2=32n_{1}=n_{2}=32 and tanh\tanh as the activation function for both layers. The biases in the NN are set to zero during training to ensure that the equilibrium point is x=0x_{*}=0 and u=0u_{*}=0. The dynamics used for training are the discretized version of (44) with sampling time dt=0.02dt=0.02 s.

We rearrange (44) into the form:

θ¨(t)\displaystyle\ddot{\theta}(t) =mglq(t)+mglθ(t)μθ˙(t)+sat(u(t))ml2,\displaystyle=\frac{-mglq(t)+mgl\theta(t)-\mu\dot{\theta}(t)+\text{sat}(u(t))}{ml^{2}}, (45a)
q(t)\displaystyle q(t) =Δ(θ(t)):=θ(t)sin(θ(t)).\displaystyle=\Delta(\theta(t)):=\theta(t)-\sin(\theta(t)). (45b)

The static nonlinearity Δ(θ)=θsin(θ)\Delta(\theta)=\theta-\sin(\theta) is slope-restricted, and sector bounded. If we assume that θ(k)[θ¯,θ¯]\theta(k)\in[\underline{\theta},\overline{\theta}] with θ¯=θ¯=0.73\bar{\theta}=-\underline{\theta}=0.73, then the nonlinearity is slope-restricted in [0,0.2548][0,0.2548], and sector bounded in [0,0.087][0,0.087]. We also assume that v1[v¯1,v¯1]v^{1}\in[\underline{v}^{1},\bar{v}^{1}] with v¯1=v¯1=δv×132×1\bar{v}^{1}=-\underline{v}^{1}=\delta_{v}\times 1_{32\times 1} using δv=0.1\delta_{v}=0.1. Both assumptions are verified using the ROA inner-approximation. Two types of IQCs are used to characterize the nonlinearity Δ()\Delta(\cdot): an off-by-one IQC to capture the slope information, and a local sector IQC to express the local sector bound. Only the local sector IQC is used to characterize the activation functions ϕ\phi. The saturation function is static and can also be described using a local sector bound. Let u¯\bar{u} be the largest possible control command from π\pi induced from the assumption that v1[v¯1,v¯1]v^{1}\in[\underline{v}^{1},\bar{v}^{1}]. Then the saturation function satisfies the local sector [α,β][\alpha,\beta], where α:=umaxu¯\alpha:=\frac{u_{\text{max}}}{\bar{u}} and β:=1\beta:=1.

Figure 6 shows the boundaries for the sets {x:v¯1v1v¯1}\{x:\underline{v}^{1}\leq v^{1}\leq\bar{v}^{1}\} and {x:θ¯θθ¯}\{x:\underline{\theta}\leq\theta\leq\bar{\theta}\} with orange and brown lines, the ROA inner-approximation with a blue ellipsoid, and the phase portrait of the closed-loop system, with green and red curves representing trajectories inside and outside the ROA.

Refer to caption
Figure 6: A ROA inner-approximation of the inverted pendulum

IV-B Vehicle lateral control

Consider the vehicle lateral dynamics from [28]:

[e˙e¨e˙θe¨θ]=\displaystyle\begin{bmatrix}\dot{e}\\ \ddot{e}\\ \dot{e}_{\theta}\\ \ddot{e}_{\theta}\end{bmatrix}= [01000Cαf+CαrmUCαf+CαrmaCαfbCαrmU00010aCαfbCαrIzUaCαfbCαrIza2Cαf+b2CαrIzU][ee˙eθe˙θ]\displaystyle\begin{bmatrix}0&1&0&0\\ 0&\frac{C_{\alpha f}+C_{\alpha r}}{mU}&-\frac{C_{\alpha f}+C_{\alpha r}}{m}&\frac{aC_{\alpha f}-bC_{\alpha r}}{mU}\\ 0&0&0&1\\ 0&\frac{aC_{\alpha f}-bC_{\alpha r}}{I_{z}U}&-\frac{aC_{\alpha f}-bC_{\alpha r}}{I_{z}}&\frac{a^{2}C_{\alpha f}+b^{2}C_{\alpha r}}{I_{z}U}\end{bmatrix}\begin{bmatrix}e\\ \dot{e}\\ e_{\theta}\\ \dot{e}_{\theta}\end{bmatrix}
+\displaystyle+ [0Cαfm0aCαfIz]u+[0aCαfbCαrmU20a2Cαf+b2CαrIz]c\displaystyle\begin{bmatrix}0\\ -\frac{C_{\alpha f}}{m}\\ 0\\ -\frac{aC_{\alpha f}}{I_{z}}\end{bmatrix}u+\begin{bmatrix}0\\ \frac{aC_{\alpha f}-bC_{\alpha r}}{m}-U^{2}\\ 0\\ \frac{a^{2}C_{\alpha f}+b^{2}C_{\alpha r}}{I_{z}}\end{bmatrix}c (46)

where ee is the perpendicular distance to the lane edge (m), and eθe_{\theta} is the angle between the tangent to the straight section of the road and the projection of the vehicle’s longitudinal axis (rad). Let x=[e,e˙,eθ,e˙θ]x=[e,\dot{e},e_{\theta},\dot{e}_{\theta}]^{\top} denote the plant state. The control uu is the steering angle of the front wheel (rad), the disturbance cc is the road curvature (1/1/m), and the parameters are: longitudinal velocity U=28U=28 m//s, front cornering stiffness Cαf=1.232×105C_{\alpha f}=-1.232\times 10^{5} N//rad, rear cornering stiffness Cαr=1.042×105C_{\alpha r}=-1.042\times 10^{5} N//rad, mass m=1.67×103m=1.67\times 10^{3} kg, moment of inertia Iz=2.1×103I_{z}=2.1\times 10^{3} kg/m2/\text{m}^{2}, distances from vehicle center of gravity to front axle a=0.99a=0.99 m and rear axle b=1.7b=1.7 m.

Again, the controller π\pi is obtained using policy gradient, and is parameterized by a 2-layer, feedforward NN, with n1=n2=32n_{1}=n_{2}=32 and tanh\tanh as the activation function for both layers. The training process uses a discretized version of (IV-B) with sampling time dt=0.02dt=0.02 s and draws the curvature c(k)c(k) at each time step from an interval [1/200,1/200][-1/200,1/200]. The control command derived from u(k)=π(x(k))u(k)=\pi(x(k)) enters the vehicle dynamics through a saturation function sat()(\cdot) with umax=π/6u_{\text{max}}=\pi/6. Let usat:=sat(π(x))u_{\text{sat}}:=\text{sat}(\pi(x)) define the saturated control signal.

The analysis is performed for a constant curvature c0c\equiv 0, resulting in a zero equilibrium state x=0x_{*}=0. In the analysis problem, on top of saturation, we also add a norm-bounded LTI uncertainty ΔLTI\Delta_{\text{LTI}}\in\mathbb{RH}_{\infty} with ΔLTI0.1\left\lVert\Delta_{\text{LTI}}\right\rVert_{\infty}~{}\leq~{}0.1 to the control input. This is used to assess the robustness of the NN controller against actuator uncertainty. As shown in Figure 7, the actual input to the vehicle dynamics is

upert(k)=usat(k)+q(k),andq()=ΔLTI(usat()).\displaystyle u_{\text{pert}}(k)=u_{\text{sat}}(k)+q(k),\ \text{and}\ q(\cdot)=\Delta_{\text{LTI}}(u_{\text{sat}}(\cdot)).
Refer to caption
Figure 7: Uncertain vehicle system with actuator uncertainty

It is assumed that v1[v¯1,v¯1]v^{1}\in[\underline{v}^{1},\bar{v}^{1}], where v¯1=v¯1=δv×132×1\bar{v}^{1}=-\underline{v}^{1}=\delta_{v}\times 1_{32\times 1} with δv=0.6\delta_{v}=0.6. To show effectiveness of the off-by-one IQC, two experiments were carried out: one with only local sector IQC to describe ϕ\phi, and one with both local sector and off-by-one IQCs. The achieved trace(Px)(P_{x}) for the two experiments are 4.4 and 2.9, respectively. Moreover, the achieved det(Px1)\det(P_{x}^{-1}) (proportional to the volume) for the experiments are 3.2×1053.2\times 10^{5}, and 1.1×1061.1\times 10^{6}, respectively. Therefore, with the help of off-by-one IQC to sharpen the description of ϕ\phi, the second experiment achieves a larger ROA inner-approximation. It is also important to note that thanks to the off-by-one IQC, the SDP is able to tolerate looser local sector bounds. The largest value of δv\delta_{v} such that the SDP is feasible is 0.67 for the first experiment, and 1.4 for the second experiment.

Figure 8 shows slices of the ROA inner-approximation from the second experiment on the eee˙\dot{e} and eθe_{\theta}e˙θ\dot{e}_{\theta} spaces. Specifically, these are intersections of (Px,x)\mathcal{E}(P_{x},x_{*}) with the hyperplanes (eθ,e˙θ)=(eθ,e˙θ)(e_{\theta},\dot{e}_{\theta})=(e_{\theta*},\dot{e}_{\theta*}) and (e,e˙)=(e,e˙)(e,\dot{e})=(e_{*},\dot{e}_{*}), respectively, where x=[e,e˙,eθ,e˙θ]x_{*}=[e_{*},\dot{e}_{*},e_{\theta*},\dot{e}_{\theta*}]^{\top}. The slices are shown with blue ellipsoids. The boundary of the polytopic set {x:v¯1v1v¯1}\{x:\underline{v}^{1}\leq v^{1}\leq\bar{v}^{1}\} is shown with the orange lines. The brown crosses represent the zero equilibrium state xx_{*}.

Refer to caption
Figure 8: ROA inner-approximation on the eee˙\dot{e} and eθe_{\theta}e˙θ\dot{e}_{\theta} spaces using both local sector and off-by-one IQCs with δv=0.6\delta_{v}=0.6.

References

  • [1] A. U. Levin and K. S. Narendra, “Control of nonlinear dynamical systems using neural networks: controllability and stabilization,” IEEE Transactions on Neural Networks, vol. 4, no. 2, pp. 192–206, 1993.
  • [2] R. J. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Mach. Learn., vol. 8, no. 3–4, p. 229–256, May 1992.
  • [3] J. Peters and S. Schaal, “Reinforcement learning of motor skills with policy gradients,” Neural networks : the official journal of the International Neural Network Society, vol. 21 4, pp. 682–97, 2008.
  • [4] J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel, “Trust Region Policy Optimization,” p. arXiv:1502.05477, Feb. 2015.
  • [5] S. Kakade, “A natural policy gradient,” in Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, ser. NIPS’01.   Cambridge, MA, USA: MIT Press, 2001, p. 1531–1538.
  • [6] D. A. Pomerleau, ALVINN: An Autonomous Land Vehicle in a Neural Network.   San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1989, p. 305–313.
  • [7] H. K. Khalil, Nonlinear systems; 3rd ed.   Upper Saddle River, NJ: Prentice-Hall, 2002.
  • [8] M. Fazlyab, M. Morari, and G. J. Pappas, “Safety Verification and Robustness Analysis of Neural Networks via Quadratic Constraints and Semidefinite Programming,” p. arXiv:1903.01287, Mar. 2019.
  • [9] M. Fazlyab, A. Robey, H. Hassani, M. Morari, and G. Pappas, “Efficient and accurate estimation of Lipschitz constants for deep neural networks,” in Advances in Neural Information Processing Systems 32.   Curran Associates, Inc., 2019, pp. 11 427–11 438.
  • [10] P. Pauli, A. Koch, J. Berberich, and F. Allgöwer, “Training robust neural networks using Lipschitz bounds,” p. arXiv:2005.02929, May 2020.
  • [11] H. Hu, M. Fazlyab, M. Morari, and G. J. Pappas, “Reach-SDP: Reachability Analysis of Closed-Loop Systems with Neural Network Controllers via Semidefinite Programming,” p. arXiv:2004.07876, Apr. 2020.
  • [12] M. Jin and J. Lavaei, “Stability-certified reinforcement learning: A control-theoretic perspective,” p. arXiv:1810.11505, Oct. 2018.
  • [13] K. K. Kim, E. R. Patrón, and R. D. Braatz, “Standard representation and unified stability analysis for dynamic artificial neural network models,” Neural Networks, vol. 98, pp. 251–262, 2018.
  • [14] A. Megretski and A. Rantzer, “System analysis via integral quadratic constraints,” IEEE Transactions on Automatic Control, vol. 42, no. 6, pp. 819–830, June 1997.
  • [15] J. Veenman, C. W. Scherer, and H. Köroğlu, “Robust stability and performance analysis based on integral quadratic constraints,” European Journal of Control, vol. 31, pp. 1 – 32, 2016.
  • [16] L. Lessard, B. Recht, and A. Packard, “Analysis and design of optimization algorithms via integral quadratic constraints,” SIAM Journal on Optimization, vol. 26, no. 1, pp. 57–95, 2016.
  • [17] H. Fang, Z. Lin, and M. Rotea, “On iqc approach to the analysis and design of linear systems subject to actuator saturation,” Systems & Control Letters, vol. 57, no. 8, pp. 611 – 619, 2008.
  • [18] S. Tarbouriech, Stability and Stabilization of Linear Systems with Saturating Actuators.   Springer Publishing Company, Incorporated, 2014.
  • [19] E. Summers and A. Packard, “L2 gain verification for interconnections of locally stable systems using integral quadratic constraints,” in 49th IEEE Conference on Decision and Control (CDC), 2010, pp. 1460–1465.
  • [20] M. Fetzer, C. W. Scherer, and J. Veenman, “Invariance with dynamic multipliers,” IEEE Transactions on Automatic Control, vol. 63, no. 7, pp. 1929–1942, 2018.
  • [21] A. Iannelli, P. Seiler, and A. Marcos, “Region of attraction analysis with integral quadratic constraints,” Automatica, vol. 109, p. 108543, 2019.
  • [22] S. Gowal, K. Dvijotham, R. Stanforth, R. Bunel, C. Qin, J. Uesato, R. Arandjelovic, T. Mann, and P. Kohli, “On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models,” p. arXiv:1810.12715, Oct. 2018.
  • [23] H. Hindi and S. Boyd, “Analysis of linear systems with saturation using convex optimization,” in Proceedings of the 37th IEEE Conference on Decision and Control, vol. 1, 1998, pp. 903–908 vol.1.
  • [24] H. Yin, P. Seiler, and M. Arcak, “Backward reachability using integral quadratic constraints for uncertain nonlinear systems,” IEEE Control Systems Letters, vol. 5, no. 2, pp. 707–712, 2021.
  • [25] H. Yin, A. Packard, M. Arcak, and P. Seiler, “Reachability analysis using dissipation inequalities for uncertain nonlinear systems,” Systems & Control Letters, vol. 142, p. 104736, 2020.
  • [26] G. Zames and P. L. Falb, “Stability conditions for systems with monotone and slope-restricted nonlinearities,” SIAM Journal on Control, vol. 6, no. 1, pp. 89–108, 1968.
  • [27] V. V. Kulkarni and M. G. Safonov, “All multipliers for repeated monotone nonlinearities,” IEEE Transactions on Automatic Control, vol. 47, no. 7, pp. 1209–1212, 2002.
  • [28] A. Alleyne, “A comparison of alternative intervention strategies for unintended roadway departure (urd) control,” Vehicle System Dynamics, vol. 27, no. 3, pp. 157–186, 1997.