
FRIDAY: Real-time Learning DNN-based Stable LQR controller for Nonlinear Systems under Uncertain Disturbances

Takahito Fujimori  ([email protected])
School of Biology, Osaka University, Machikaneyamacho 1-1, Toyonaka, Japan
Abstract

The Linear Quadratic Regulator (LQR) is often combined with feedback linearization (FBL) for nonlinear systems whose nonlinearity is additive to the input. Conventional approaches estimate and cancel the nonlinearity using first-principles models or data-driven methods such as Gaussian Processes (GPs). However, the former requires an elaborate modeling process, and the latter provides a fixed learned model, which may struggle when the model dynamics change. In this letter, we use a Deep Neural Network (DNN) trained on a dataset updated in real time to approximate the unknown nonlinearity while the controller is running. By spectrally normalizing the weights at each time-step, we stably incorporate the DNN prediction into an LQR controller and compensate for the nonlinear term. Leveraging the bounded Lipschitz constant of the DNN, we provide theoretical analysis and prove local exponential stability of the proposed controller. Simulation results show that our controller significantly outperforms baseline controllers in trajectory tracking tasks.

keywords:
Feedback Linearization, Deep Neural Networks, Real-time Learning, Guaranteed Stability

1 Introduction

Although the Linear Quadratic Regulator (LQR) is an intuitively simple controller for linear systems and one of the success stories of modern control theory (Chen, 1984), many safety-critical systems such as self-driving vehicles, unmanned aerial vehicles (UAVs), and mobile manipulators exhibit nonlinear dynamics additive to the control input (Yang et al., 2021). For such systems, LQR is often combined with feedback linearization (FBL) to cancel out the nonlinearity, and various identification methods have been proposed.
A classical approach uses first-principles (Tourajizadeh et al., 2016) or adaptive (Yucelen, 1999) approximations. However, parametric FBL-LQR controllers often suffer from slow response and delayed feedback when the model mismatches the true dynamics. Another approach uses data-driven estimation, such as Gaussian Processes (GPs). GPs learn the nonlinear term and quantify the prediction mismatch, enabling GP-FBL-LQR controllers to stably linearize the system (Greeff et al., 2021; Greeff and Schoellig, 2020). However, GPs typically keep a fixed dataset and do not update the model while the controller is running. Fixed models may struggle when the model dynamics change due to unexpected disturbances, or when the model must be updated because of biased training data. Compensating strategies add data to the dataset in real time (Berkenkamp et al., 2016) or update the model based on its current reliability (Umlauft and Hirche, 2019), but these methods rely heavily on data efficiency and require a possibly expensive online optimization process, compromising fast convergence to the reference states: the data efficiency of GPs is very sensitive to the choice of kernel/covariance function, and their runtime complexity is inherently $O(n^3)$ in the number of data points. For these reasons, we take a Deep Neural Network (DNN) approach to compensate for the unknown disturbances in real time.
Related works: (Li et al., 2017; Konoiko et al., 2019) incorporate a learned DNN into a controller and directly cancel out the uncertainty with its prediction. However, these DNNs are not theoretically analyzed and can thus generate unpredictable outputs. To guarantee stability, (Shi et al., 2019; Zhang et al., 2023) apply Spectral Normalization (SN) to their DNN weights to constrain the Lipschitz constants. Though they stably compensate for complex fluid dynamics, they use fixed training datasets, so their performance depends on how well the DNN generalizes. Concurrently with real-time execution, (Joshi and Chowdhary, 2019; Sun et al., 2021) collect data and iteratively train their DNNs with Stochastic Gradient Descent (SGD) to apply batch-like updates. Still, in such multi-time-scale controllers, when to collect the data and when to update the weights are open questions. Instead of SGD, adaptive DNN weight-update laws based on Lyapunov stability analysis have been developed to adjust the weights continuously. Though these architectures are well-established, they only apply to NNs with a single hidden layer (Lewis, 1999) or only update the output-layer weights (Joshi et al., 2021). For full-layer weight updates, (Le et al., 2021; Patil et al., 2022) develop modular adaptive laws, but for DNNs of arbitrary width and depth, simultaneously updating all weights online under adaptive laws may be computationally intractable or undesired.
Contributions: Exploiting the technique for stabilizing a DNN-based FBL signal (Shi et al., 2019), which is computationally light and easy to implement, we continuously update all the weights with simple SGD rather than adaptive laws. Specifically, we apply SN to the weights before executing our DNN-based control input at each time-step, so that the input map becomes a contraction converging to its unique fixed point. By doing so, we can stably optimize the weights with SGD while controlling the system. The proposed controller is named FRIDAY, short for Fast ResIdual Dynamics AnalYsis. Leveraging the bounded Lipschitz constant, FRIDAY is proved to be locally exponentially stable under bounded learning error. Simulation results show FRIDAY achieves almost double the trajectory-tracking accuracy of an adaptive baseline controller and ten times that of an LQR, while learning the map of uncertain disturbances. To the best of our knowledge, this is the first guaranteed framework that constantly collects data and updates all layer weights with SGD to cancel out unknown dynamics.

2 Problem Statement: Nonlinear Systems under Uncertain Disturbances

Given the state vector $\mathbf{x}\in\mathbb{R}^{n}$ and input vector $\mathbf{u}\in\mathbb{R}^{m}$ of a system, we consider the control-affine nonlinear dynamics

$\dot{\mathbf{x}} = f(\mathbf{x}) + g(\mathbf{x})\mathbf{u}.$ (1)

A wide range of dynamical systems, such as quadrotors and car-like vehicles, can be separated into a linear dynamics component and an additive nonlinearity (Greeff et al., 2021; Greeff and Schoellig, 2020). Thus, we divide the nonlinear system into a Linear Time-Invariant (LTI) system and a nonlinear term as follows (Yang et al., 2021):

$\dot{\mathbf{x}} = A\mathbf{x} + B(\mathbf{u} + \mathbf{R}(\mathbf{x},\mathbf{u})),$ (2)

where $A, B$ are time-invariant matrices with the pair $(A, B)$ controllable, and $\mathbf{R}(\mathbf{x},\mathbf{u})$ accounts for unknown nonlinear dynamics, including model uncertainties, which we call the residual dynamics. Note that the analysis in this letter is restricted to this form.

Problem Statement: We aim to build a real-time copy $\hat{\mathbf{R}}(\mathbf{x},\mathbf{u})$ of the residual dynamics to cancel the nonlinear term through the control input $\mathbf{u} = \mathbf{u}^{\text{LQR}} - \hat{\mathbf{R}}$, so that the LQR controller operates on the linearized system. We use a real-time learning DNN directly as the cancellation term; that is, we incorporate DNN predictions into the controller while the DNN is learning. To stabilize the closed loop, we exploit the contraction-mapping technique proposed by (Shi et al., 2019) using Spectral Normalization, which constrains the Lipschitz constant of DNNs. Leveraging the bounded Lipschitz constant, we guarantee the controller's stability.

Figure 1: Our proposed architecture using a real-time learning DNN has three key components: (1) Updating the Dataset in Real-time: the currently observed values of the states, control input, and residual dynamics are added to the training dataset at each time-step; (2) One-time Learning: using the dataset, the DNN is optimized once per iteration with Stochastic Gradient Descent; (3) Execution after Spectral Normalization: the DNN estimate cancels the nonlinearity after its Lipschitz constant is bounded.

3 FRIDAY Controller Design

We first show the overall design of our controller, composed of a conventional LQR and a DNN-based feedforward cancellation term $\hat{\mathbf{R}}(\mathbf{x},\mathbf{u})$. The DNN learns $\mathbf{R}$ from observed values while its prediction directly compensates for the unknown dynamics so that the LQR controls the system (see Fig. 1):

$\mathbf{u} = -K(\mathbf{x} - \mathbf{x}_r) + \mathbf{u}_r - \hat{\mathbf{R}}(\mathbf{x},\mathbf{u}),$ (3)

where the feedback gain $K = R^{-1}B^{\top}P$ derives from the solution $P$ of the Algebraic Riccati Equation (ARE) $A^{\top}P + PA - PBR^{-1}B^{\top}P + Q = 0$, with $Q, R > 0$. $\mathbf{x}_r$ and $\mathbf{u}_r$ are reference signals satisfying $\dot{\mathbf{x}}_r = A\mathbf{x}_r + B\mathbf{u}_r$ (Chen, 1984). Substituting (3) into (2), the closed-loop dynamics become the following, with approximation error $\boldsymbol{\epsilon} = \mathbf{R} - \hat{\mathbf{R}}$:

$\dot{\mathbf{x}} - \dot{\mathbf{x}}_r = (A - BK)(\mathbf{x} - \mathbf{x}_r) + B\boldsymbol{\epsilon}.$ (4)

By defining $\mathbf{z} = \mathbf{x} - \mathbf{x}_r$ and $A_{\text{cl}} = A - BK$, the system dynamics simply become

$\dot{\mathbf{z}} = A_{\text{cl}}\mathbf{z} + B\boldsymbol{\epsilon}.$ (5)

As long as $\|B\boldsymbol{\epsilon}\|$ is bounded, $\mathbf{x}(t) \to \mathbf{x}_r(t)$ locally and exponentially with bounded error (Slotine et al., 1991; Shi et al., 2018).
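For concreteness, the following minimal Python sketch (our own illustration, not the released code) computes the gain $K$ from the ARE for the nominal model and weights used later in Sec. 6:

```python
# Minimal sketch: LQR gain K = R^{-1} B^T P from the continuous-time ARE,
# using the nominal model of Sec. 6.1 and the weights of Sec. 6.3.
import numpy as np
from scipy.linalg import solve_continuous_are

m = 1.5                                  # vehicle mass [kg] (Sec. 6.1)
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])               # nominal LTI dynamics
B = np.array([[0.0],
              [1.0 / m]])
Q = np.diag([20.0, 5.0])                 # LQR weights (Sec. 6.3)
R = np.array([[1.0]])

# Solve A^T P + P A - P B R^{-1} B^T P + Q = 0, then K = R^{-1} B^T P.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)

A_cl = A - B @ K                         # closed-loop matrix of (5)
assert np.all(np.linalg.eigvals(A_cl).real < 0)   # Hurwitz, as required
```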

4 DNN Learning Residual Dynamics

We now describe how the DNN is trained and predicts the residual dynamics while the controller is running. We use a DNN with the Rectified Linear Unit (ReLU) activation function. ReLU DNNs have been shown to converge faster, suffer less from vanishing gradients, and be easier to optimize than other activation functions such as sigmoid and tanh (Zeiler et al., 2013):

$\hat{\mathbf{R}}(D_X, \boldsymbol{\theta}) = W^L a\big(W^{L-1} a\big(W^{L-2} \cdots a(W^1 D_X) \cdots\big)\big),$ (6)

where $D_X = \{\mathbf{x}, \mathbf{u}\}$ consists of the observed state and control input, $\boldsymbol{\theta} = \{W^1, \cdots, W^L\}$ collects the weight matrices of the $L$ layers, and $a(\cdot) = \max(\cdot, 0)$ is the layer-wise ReLU.
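A minimal PyTorch sketch of the network (6) is given below; the defaults follow the configuration of Sec. 6.3 (3-dimensional input, four hidden layers of 50 neurons, scalar output), and the bias-free layers mirror the pure weight-matrix form of (6). This is our illustrative reconstruction, not the released implementation:

```python
# Illustrative PyTorch version of the ReLU network in (6); bias-free linear
# layers, so the model is exactly nested weight matrices with ReLU between.
import torch
import torch.nn as nn

class ResidualDNN(nn.Module):
    def __init__(self, n_in=3, n_hidden=50, n_out=1, n_layers=4):
        super().__init__()
        dims = [n_in] + [n_hidden] * n_layers + [n_out]
        self.layers = nn.ModuleList(
            [nn.Linear(d_in, d_out, bias=False)
             for d_in, d_out in zip(dims[:-1], dims[1:])]
        )

    def forward(self, x):
        # W^L a(W^{L-1} a( ... a(W^1 D_X) ... )) with a = ReLU
        for layer in self.layers[:-1]:
            x = torch.relu(layer(x))
        return self.layers[-1](x)
```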

4.1 Real-time Learning

Real-time learning means that the following optimization is conducted at each time-step:

$\mathcal{D}_X := \{D_{X_1}, \cdots, D_{X_{k-1}}\} \cup \{D_{X_k} = \{\mathbf{x}_k, \mathbf{u}_k\}\},$
$\mathcal{D}_Y := \{D_{Y_1}, \cdots, D_{Y_{k-1}}\} \cup \{D_{Y_k} = \tilde{\mathbf{R}}_k\},$
$\min_{\boldsymbol{\theta}} \; \frac{1}{\text{n}(N)} \sum_{n \in N} \|D_{Y_n} - \hat{\mathbf{R}}(D_{X_n}, \boldsymbol{\theta})\|^2,$ (7)

where $\tilde{\mathbf{R}}_k$ is the observed residual-dynamics value and $N$ denotes a mini-batch of size $\text{n}(N)$. In other words, the DNN is trained on the datasets $\mathcal{D}_X$, $\mathcal{D}_Y$, which are updated at each iteration. Once (7) is done, the estimator $\hat{\mathbf{R}}$ is used for the cancellation at the next time-step (see Algorithm 1, lines 10 to 16, and line 8).
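Continuing the sketch above, one real-time learning step of (7), appending the newly observed pair and taking a single SGD step on a random mini-batch, might look as follows (names are illustrative):

```python
# One real-time learning step of (7): append the current observation, then
# take a single gradient step on a random mini-batch N of the grown dataset.
import random
import torch

D_X, D_Y = [], []                        # datasets D_X, D_Y, grown every step

def real_time_update(model, optimizer, x_k, u_k, R_tilde_k, batch_size=32):
    D_X.append(torch.cat([x_k, u_k]))    # D_{X_k} = {x_k, u_k}
    D_Y.append(R_tilde_k)                # D_{Y_k} = observed residual
    idx = random.sample(range(len(D_X)), min(batch_size, len(D_X)))
    X = torch.stack([D_X[i] for i in idx])
    Y = torch.stack([D_Y[i] for i in idx])
    loss = ((Y - model(X)) ** 2).mean()  # (1/n(N)) sum ||D_Y - R_hat||^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```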
However, it is not preferable to integrate the estimates of such a still-developing DNN into a feedback controller, because its output is unpredictable and can be unstable. To address this instability, we apply Spectral Normalization to our mapping.

4.2 Spectral Normalization

Spectral Normalization (SN) normalizes the Lipschitz constant of the objective function. The Lipschitz constant is defined as the smallest value $\|f\|_{\text{Lip}}$ such that

$\forall \mathbf{x}, \acute{\mathbf{x}}: \; \|f(\mathbf{x}) - f(\acute{\mathbf{x}})\|_2 \,/\, \|\mathbf{x} - \acute{\mathbf{x}}\|_2 \leq \|f\|_{\text{Lip}}.$ (8)

Since the Lipschitz constant of the linear mapping $W\mathbf{x}$ is the spectral norm $\sigma(W)$ of the weight matrix (its maximum singular value), and that of ReLU is $\|a(\cdot)\|_{\text{Lip}} = 1$, the Lipschitz constant of a ReLU DNN is naturally upper bounded by the product of all the spectral norms,

$\|f\|_{\text{Lip}} \leq \|W^L\|_{\text{Lip}} \cdot \|a\|_{\text{Lip}} \cdots \|W^1\|_{\text{Lip}} = \prod_{l=1}^{L} \sigma(W^l).$ (9)

Leveraging this property, we can upper bound the Lipschitz constant of the DNN by an intended value $\zeta$ by dividing each weight: $W_{SN}^l = W^l / \sigma(W^l) \cdot \zeta^{\frac{1}{L}}$ (see Sec. 2.1 of (Miyato et al., 2018) and Lemma 3.1 of (Shi et al., 2019)).
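A sketch of this per-step normalization, reusing the ResidualDNN class above: each weight matrix is divided by its spectral norm and rescaled by $\zeta^{1/L}$, so that the product bound (9) is at most $\zeta$:

```python
# Sketch: divide each weight by its spectral norm (largest singular value)
# and rescale by zeta^(1/L), bounding the DNN Lipschitz constant by zeta.
import torch

@torch.no_grad()
def spectrally_normalize(model, zeta=1.0):
    L = len(model.layers)
    for layer in model.layers:
        sigma = torch.linalg.matrix_norm(layer.weight, ord=2)  # sigma(W^l)
        layer.weight.mul_(zeta ** (1.0 / L) / sigma)
```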

4.3 Constrained Prediction

According to Lemma 5.1 and Theorem 5.2 discussed later, the $\zeta$-Lipschitz DNN-based control input converges. Hence, stably incorporating a real-time learning DNN into the controller reduces to normalizing all the weights such that $\|\hat{\mathbf{R}}\|_{\text{Lip}} \leq \zeta$ before executing the control input $\mathbf{u}$ (see Algorithm 1, lines 4 to 9). We optimize the DNN using Stochastic Gradient Descent with momentum (Momentum SGD).

Algorithm 1 FRIDAY algorithm
1:  Initialize the weights $\boldsymbol{\theta}$ of the SN-DNN.
2:  for the entire duration do
3:     Obtain the current states $\mathbf{x}_k$
4:     Calculate the spectral norm $\sigma(W^l)$ of each weight matrix and divide:
5:     for $l = 1$ to $L$ do
6:        $W^l \leftarrow W^l / \sigma(W^l) \cdot \zeta^{\frac{1}{L}}$
7:     end for
8:     Estimate the residual dynamics $\hat{\mathbf{R}}_k = \hat{\mathbf{R}}(\mathbf{x}_k, \mathbf{u}_{k-1})$
9:     Execute the current control input $\mathbf{u}_k = -K\mathbf{x}_k - \hat{\mathbf{R}}_k$
10:     Begin the DNN training:
11:     Add the current data to the training sets,
12:     $\mathcal{D}_X \leftarrow \mathcal{D}_X \cup \{\mathbf{x}_k, \mathbf{u}_k\}$
13:     $\mathcal{D}_Y \leftarrow \mathcal{D}_Y \cup \{\tilde{\mathbf{R}}_k\}$
14:     Sample a random mini-batch $N$ and update $\boldsymbol{\theta}$ once to minimize
15:     $\frac{1}{\text{n}(N)} \sum_{n \in N} \|D_{Y_n} - \hat{\mathbf{R}}(D_{X_n}, \boldsymbol{\theta})\|^2$
16:     End the training.
17:  end for
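Assembling the sketches above into Algorithm 1 gives the following illustrative closed loop; the plant (a mass with an unknown drag residual), the gain value, and all constants are our placeholders, not the paper's simulation code:

```python
# Illustrative closed loop for Algorithm 1, reusing ResidualDNN,
# spectrally_normalize, and real_time_update from the earlier sketches.
import torch

m, dt = 1.5, 0.05                            # mass [kg], 20 Hz control rate
K = torch.tensor([[4.47, 3.37]])             # illustrative LQR gain (Sec. 3)
model = ResidualDNN()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

x = torch.tensor([1.0, 0.0])                 # state [p, p_dot]
u_prev = torch.zeros(1)
for k in range(400):                         # the entire duration
    spectrally_normalize(model, zeta=1.0)    # lines 4-7
    with torch.no_grad():
        R_hat = model(torch.cat([x, u_prev]))            # line 8
    u = (-(K @ x) - R_hat).detach()                      # line 9 (regulation to 0)
    # plant step: m p_ddot = u + m R(x, u), with R unknown to the controller
    R_true = -0.4 * x[1] * torch.abs(x[1])               # illustrative residual
    p_ddot = u / m + R_true
    x_next = x + dt * torch.stack([x[1], p_ddot.squeeze()])
    R_tilde = p_ddot - u / m                             # observed residual
    real_time_update(model, optimizer, x, u, R_tilde)    # lines 10-16
    x, u_prev = x_next, u
```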

5 Theoretical Guarantees

We analyze the closed-loop system to prove its stability and robustness. This analysis also provides insight into how to tune the hyperparameters of the DNN and the LQR controller to improve performance. Note that all norms $\|\cdot\|$ used below denote the $L^2$ norm.

5.1 Convergence of Control input

Using fixed-point iteration, we show that the control input defined in (3) converges to a unique point when all states are fixed.

Lemma 5.1: The control input defined by the following mapping $\mathbf{u}_k = \mathcal{F}(\mathbf{u}_{k-1})_k$ converges to the unique solution satisfying $\mathbf{u}_k^* = \mathcal{F}(\mathbf{u}_k^*)_k$ when $\|\hat{\mathbf{R}}\|_{\text{Lip}} \leq \zeta$ and all states are fixed,

$\mathcal{F}(\mathbf{u})_k = -K(\mathbf{x}_k - \mathbf{x}_{r_k}) + \mathbf{u}_{r_k} - \hat{\mathbf{R}}(\mathbf{x}_k, \mathbf{u}).$ (10)

Proof: With all states fixed and $\forall \mathbf{u}_1, \mathbf{u}_2 \in \mathcal{U}$, where $\mathcal{U}$ is a compact set of feasible control inputs, the distance in $L^2$-space is

$\|\mathcal{F}(\mathbf{u}_1)_k - \mathcal{F}(\mathbf{u}_2)_k\| = \|\hat{\mathbf{R}}(\mathbf{x}_k, \mathbf{u}_1) - \hat{\mathbf{R}}(\mathbf{x}_k, \mathbf{u}_2)\| \leq L_R\|\mathbf{u}_1 - \mathbf{u}_2\|,$ (11)

where $L_R$ is the Lipschitz constant of the estimated dynamics $\hat{\mathbf{R}}(\mathbf{x},\mathbf{u})$. As long as we constrain $L_R$ such that $L_R \leq \zeta \leq 1$ in every iteration, $\mathcal{F}(\cdot)$ is always a contraction mapping. Thus the $k$-th input $\mathbf{u}_k$ mapped by $\mathcal{F}(\cdot)_k$ approaches its unique solution. $\Box$
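A quick numerical illustration of the lemma (our own toy example, not part of the paper's experiments): with the state-dependent terms frozen, iterating (10) with a contractive stand-in estimator converges to its fixed point:

```python
# Toy fixed-point iteration of (10) with frozen state terms and a stand-in
# estimator of Lipschitz constant 0.5 < 1; u settles at the unique solution.
import numpy as np

a = -0.6                                # frozen -K(x_k - x_r_k) + u_r_k, illustrative
R_hat = lambda u: 0.5 * np.tanh(u)      # stand-in estimator, L_R = 0.5

u = 0.0
for _ in range(30):
    u = a - R_hat(u)                    # u_k = F(u_{k-1})
print(u, a - R_hat(u))                  # u and F(u) agree at the fixed point
```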

5.2 Stability of FRIDAY Controller

To present the stability of the FRIDAY controller, we make four assumptions.
Assumption 1: The reference signals admit maxima $x_{r_m} = \max_{t \geq t_0}\|\mathbf{x}_r(t)\|$ and $u_{r_m} = \max_{t \geq t_0}\|\mathbf{u}_r(t)\|$.
Assumption 2: The distance between consecutive control inputs satisfies $\|\mathbf{u}_k - \mathbf{u}_{k-1}\| \leq \rho\|\mathbf{z}\|$ with a small positive constant $\rho$.
The intuition behind this assumption is as follows. From (10), the distance is inherently upper bounded,

$\Delta\mathbf{u}_k \leq \sigma(K)\,\Delta\mathbf{z}_k + \Delta\mathbf{u}_{r_k} + L_R(\Delta\mathbf{u}_{k-1} + \Delta\mathbf{x}_k),$ (12)

with $\Delta(\cdot)_k = \|(\cdot)_k - (\cdot)_{k-1}\|$. Under the condition that the update rates of the states and reference signals are much faster than that of the FRIDAY controller, in practice we can safely neglect $\Delta\mathbf{z}_k$, $\Delta\mathbf{u}_{r_k}$, and $\Delta\mathbf{x}_k$ within one update (see Theorem 11.1 of (Khalil, 2002); e.g., rates > 100 Hz (Shi et al., 2019; O'Connell et al., 2022)), which leads to:

$\Delta\mathbf{u}_k \leq L_R(\Delta\mathbf{u}_{k-1} + c),$ (13)

where $c$ is a small constant summing the neglected variables, and $L_R < 1$. Hence $\Delta\mathbf{u}_k$ has a small ultimate bound, and there exists a positive constant $\rho$ such that $\|\mathbf{u}_k - \mathbf{u}_{k-1}\| \leq \rho\|\mathbf{z}\|$.
Assumption 3: Over the compact sets of feasible states and control inputs $\mathbf{x} \in \mathcal{X}$, $\mathbf{u} \in \mathcal{U}$, the residual dynamics $\mathbf{R}(\mathbf{x},\mathbf{u})$ and the learning error $\boldsymbol{\epsilon}(\mathbf{x},\mathbf{u}) = \mathbf{R}(\mathbf{x},\mathbf{u}) - \hat{\mathbf{R}}(\mathbf{x},\mathbf{u})$ have upper bounds $R_m = \sup_{\mathbf{x}\in\mathcal{X},\mathbf{u}\in\mathcal{U}}\|\mathbf{R}(\mathbf{x},\mathbf{u})\|$ and $\epsilon_m = \sup_{\mathbf{x}\in\mathcal{X},\mathbf{u}\in\mathcal{U}}\|\boldsymbol{\epsilon}(\mathbf{x},\mathbf{u})\|$.
This assumption is supported by (Neyshabur et al., 2017), which shows that SN-DNNs empirically generalize well to unseen events with almost the same distribution as the training set.
Assumption 4: The compact sets $\mathcal{X} = \bar{B}_{r_{\mathcal{X}}}(\mathbf{0}, r_{\mathcal{X}})$ and $\mathcal{U} = \bar{B}_{r_{\mathcal{U}}}(\mathbf{0}, r_{\mathcal{U}})$ are closed balls of radii $r_{\mathcal{X}}, r_{\mathcal{U}}$, centered at the origin.

Based on these assumptions, we prove the stability and robustness of the closed-loop system.
Theorem 5.2: If $\mathbf{x}_0 \in \mathcal{X}$, $\mathbf{u}_0 \in \mathcal{U}$, and $r_{\mathcal{X}}, r_{\mathcal{U}}$ are larger than some constants, then the controller defined in (3) achieves $\mathbf{x}(t) \to \mathbf{x}_r(t)$ exponentially to an error ball $\bar{B}_r$ of radius $r$.
Proof: We select the Lyapunov function $\mathcal{V}(\mathbf{z}) = \mathbf{z}^{\top}P\mathbf{z}$, where $P$ is the positive definite matrix satisfying the ARE. Applying Assumptions 1-4, we bound the time derivative $\dot{\mathcal{V}}$:

$\dot{\mathcal{V}} = \mathbf{z}^{\top}(A_{\text{cl}}^{\top}P + PA_{\text{cl}})\mathbf{z} + 2\mathbf{z}^{\top}PB\big(\hat{\mathbf{R}}(\mathbf{x}_k, \mathbf{u}_k) - \hat{\mathbf{R}}(\mathbf{x}_k, \mathbf{u}_{k-1}) + \boldsymbol{\epsilon}(\mathbf{x}_k, \mathbf{u}_k)\big)$
$\leq \lambda_{\text{max}}(A_{\text{cl}}^{\top}P + PA_{\text{cl}})\|\mathbf{z}\|^2 + 2\lambda_{\text{max}}(P)\,\sigma(B)\,\|\mathbf{z}\|\big(L_R\|\mathbf{u}_k - \mathbf{u}_{k-1}\| + \epsilon_m\big),$ (14)

Let $\lambda = -\lambda_{\text{max}}(PA_{\text{cl}} + A_{\text{cl}}^{\top}P)$, $c_1 = \lambda_{\text{min}}(P)$, $c_2 = \lambda_{\text{max}}(P)$, and $c_3 = 2\lambda_{\text{max}}(P)\sigma(B)$ denote constants, and consider the inequality $c_1\|\mathbf{z}\|^2 \leq \mathcal{V} \leq c_2\|\mathbf{z}\|^2$. Then (14) boils down to

$\dot{\mathcal{V}} \leq -\frac{1}{c_2}\Big(\lambda - \frac{c_2 c_3}{c_1}\rho L_R\Big)\mathcal{V} + \frac{c_3}{\sqrt{c_1}}\,\epsilon_m\sqrt{\mathcal{V}}.$ (15)

Here, we define $\mathcal{W} = \sqrt{\mathcal{V}}$, $\dot{\mathcal{W}} = \dot{\mathcal{V}}/(2\sqrt{\mathcal{V}})$ to apply the Comparison Lemma (Khalil, 2002) and obtain the convergence of $\lim_{t\to\infty}\|\Lambda\mathbf{z}\|$, with $\Lambda$ being the decomposition $P = \Lambda^{\top}\Lambda$:

$\|\Lambda\mathbf{z}(t)\| \leq \|\Lambda\mathbf{z}(t_0)\|\exp\Big(-\frac{1}{2c_2}\big(\lambda - \frac{c_2 c_3}{c_1}\rho L_R\big)(t - t_0)\Big) + \frac{c_2 c_3 \sqrt{c_1}}{c_1\lambda - c_2 c_3 \rho L_R}\,\epsilon_m.$ (16)

This gives us $\|\mathbf{z}(t)\| \leq r_{\mathbf{z}}$ with $r_{\mathbf{z}} = \sigma(\Lambda^{-1})\big(\|\Lambda\mathbf{z}(t_0)\| + \frac{c_2 c_3 \sqrt{c_1}}{c_1\lambda - c_2 c_3 \rho L_R}\epsilon_m\big)$ and, from (3),

$\|\mathbf{u}(t)\| \leq r_{\mathbf{u}}, \quad r_{\mathbf{u}} = \sigma(K)\,r_{\mathbf{z}} + u_{r_m} + \epsilon_m + R_m.$ (17)

As long as $r_{\mathbf{z}} \leq r_{\mathcal{X}}$ and $r_{\mathbf{u}} \leq r_{\mathcal{U}}$, $\mathbf{x}$ and $\mathbf{u}$ lie inside $\mathcal{X}, \mathcal{U}$, yielding $\|\mathbf{z}(t)\| \to \bar{B}_r\big(\mathbf{0}, r = \sigma(\Lambda^{-1})\frac{c_2 c_3\sqrt{c_1}}{c_1\lambda - c_2 c_3 \rho L_R}\epsilon_m\big)$. This result also implies a theoretical trade-off: a smaller $L_R$ makes $\mathbf{z}$ converge faster but leaves a larger offset, since a more tightly constrained DNN can incur a larger learning error $\epsilon_m$. $\Box$

6 Experiments

We evaluate FRIDAY's performance in trajectory-tracking simulations. In the experimental setup, FRIDAY knows only a nominal model and learns the map of uncertain disturbances while controlling the system. A simple mass system is used as the nominal model, and a nonlinear term is added to it as the truth-model. First, we show that FRIDAY improves tracking performance compared to a Baseline controller. Next, we demonstrate that SN guarantees closed-loop stability by comparing against the estimation error of FRIDAY without SN. Lastly, we give an intuition for why SN-DNNs suit FRIDAY better than another popular data-driven estimator, GPs. All our experiments are performed on Google Colaboratory with 12 GB of RAM and a 2.20 GHz Intel Xeon processor. The code is available at: https://github.com/SpaceTAKA/FRIDAY_CarSimu

6.1 Defining Nominal Model

For simplicity, we use the following mass system as the nominal model, which can be seen as a longitudinal car (Khalil, 2002):

$m\ddot{p} = u,$ (18)

where $\ddot{p}$, $u$, and $m$ are the longitudinal acceleration, the driving force, and the mass of the vehicle (1.5 kg), respectively. We write it as the LTI system $\dot{\mathbf{x}} = A\mathbf{x} + Bu$, with $\mathbf{x} = [p, \dot{p}]^{\top}$, $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 0 \\ \frac{1}{m} \end{bmatrix}$.

6.2 Defining Truth-Models

Based on the nominal model, we define the true nonlinear models, named truth-models, which the controllers do not know.
Param-truth: This model represents a situation where all parameters of the vehicle are in fact time-variant. We consider a time-variant load mass $m(t)$ with $a_{load} = 9$ and a time-variant force coefficient $\lambda(t)$, with $T$ being the simulation period (Khalil, 2002):

$m(t)\ddot{p} = \lambda(t)u, \quad \begin{cases} m(t) = m + a_{load}\,m\,(1 - e^{-t/T}) \\ \lambda(t) = e^{-t/T} \end{cases}$ (19)

Multi-truth: This model represents a situation where the additive nonlinearity contains multiplicative functions of the control input $u$. We consider the additive uncertainty $\dot{p}^2 u + p^2 + \dot{p}|u|$ (Ge et al., 2000):

$m\ddot{p} = (1 + \dot{p}^2)u + p^2 + \dot{p}|u|.$ (20)

Enviro-truth: This model represents a situation where the vehicle is exposed to complex nonlinear environmental forces. We consider an air drag force $f_{air}$ with $c_{air} = 0.6$ and a rolling resistance $f_{roll}$ on an icy road with $\mu_{icy} = 0.6$, $a_{roll} = 0.4$, $r_1 = 0.2$, $r_2 = 0.1$. To increase the nonlinearity, a Duffing spring force is added with $k_1 = 0.5$, $k_2 = 0.3$ (Amodeo et al., 2009; Ward and Iagnemma, 2008):

$m\ddot{p} = \mu_{icy}u - f_{air} - f_{roll} - f_{duff}, \quad \begin{cases} f_{air} = c_{air}\,\dot{p}^2\sin\dot{p}, \\ f_{roll} = -\text{sign}(\dot{p})\,mg\big(r_1(1 - e^{-a_{roll}|\dot{p}|}) + r_2|\dot{p}|\big), \\ f_{duff} = k_1 p + k_2 p^3 \end{cases}$ (21)
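For concreteness, an illustrative implementation of the Enviro-truth acceleration (21); the constants follow the text, while gravity $g = 9.81$ m/s$^2$ is our assumption, as the text does not state it:

```python
# Illustrative Enviro-truth acceleration p_ddot from (21); g is assumed.
import numpy as np

def enviro_truth_accel(p, p_dot, u, m=1.5, g=9.81, mu_icy=0.6, c_air=0.6,
                       a_roll=0.4, r1=0.2, r2=0.1, k1=0.5, k2=0.3):
    f_air = c_air * p_dot**2 * np.sin(p_dot)
    f_roll = -np.sign(p_dot) * m * g * (r1 * (1.0 - np.exp(-a_roll * abs(p_dot)))
                                        + r2 * abs(p_dot))
    f_duff = k1 * p + k2 * p**3
    return (mu_icy * u - f_air - f_roll - f_duff) / m
```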

6.3 Implementation of FRIDAY

In this experiment, our DNN $\hat{\mathbf{R}}(\mathbf{x}, u)$ consists of four fully-connected hidden layers with 50 neurons each, mapping a 3-dimensional input to a 1-dimensional output. We spectrally normalize the weights to the Lipschitz constant $L_R = 1$. The teacher data for training is the observed residual-dynamics value obtained from the relation $\mathbf{R}(\mathbf{x}, u) = [0, \ddot{p} - \frac{1}{m}u]^{\top}$. The LQR weights are $Q = \text{diag}(20, 5)$, $R = 1$, and the control rate is 20 Hz.
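In code, this configuration might look as follows, reusing the earlier sketches; the SGD learning rate is our assumption (the text does not specify it), and the teacher value follows the stated relation:

```python
# Sketch of the Sec. 6.3 configuration; learning rate is an assumption.
import torch

m, dt = 1.5, 1.0 / 20.0                  # mass [kg], 20 Hz control rate
model = ResidualDNN(n_in=3, n_hidden=50, n_out=1, n_layers=4)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def observed_residual(p_ddot_obs, u):
    # teacher data: the actuated row of R(x, u) = [0, p_ddot - (1/m) u]
    return torch.tensor([p_ddot_obs - u / m])
```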

6.4 Baseline Controller

We compare FRIDAY to an adaptive FBL-LQR controller with guaranteed stability (Yucelen, 1999). Instead of a DNN, this Baseline Controller approximates the uncertainty using weighted basis functions $\hat{W}(t)\boldsymbol{\sigma}(\mathbf{x},\mathbf{u})$, with update law $\dot{\hat{W}}(t) = \gamma\boldsymbol{\sigma}\mathbf{e}PB$, where the learning rate is $\gamma = 0.03$ and $\mathbf{e}$ is the error between the state and the reference model. We also compare to a simple LQR controller, which can be seen as FRIDAY without the SN-DNN.

Figure 2: Setpoint regulation trajectory and the mean tracking error of 10 trajectories using FRIDAY (Left, blue), the Baseline controller (Right, red) and the LQR (Right, pink) in (a) Param-truth, (b) Multi-truth, and (c) Enviro-truth.

6.5 Tracking Performance

First, we conduct a setpoint regulation test. The target setpoint is $\mathbf{x}_r = [1, 0]^{\top}$ with reference control input $u_r = 0$. From Fig. 2, we conclude that FRIDAY quickly and precisely reaches the target setpoint, while the Baseline Controller converges slowly and the LQR shows a large offset (2-b) or oscillates strongly (2-c) due to the unknown nonlinearity. FRIDAY achieves almost double the tracking accuracy of the Baseline Controller and ten times that of the LQR.
For more practical use, we give a sine-wave reference trajectory $\mathbf{x}_r = [\sin\omega t, \omega\cos\omega t]^{\top}$, $u_r = -m\omega^2\sin\omega t$, where the wave frequency is $\omega = 2\pi/50$ rad/s. In Fig. 3, as in the setpoint regulation test, our controller outperforms the Baseline Controller and the LQR in tracking accuracy and convergence speed. FRIDAY smoothly fits the curve even where the Baseline and the LQR struggle to follow the path.

Figure 3: Sine wave tracking trajectory and the mean tracking error of 10 trajectories using FRIDAY (Left, blue), the Baseline controller (Right, red) and the LQR (Right, pink) in (a) Param-truth, (b) Multi-truth, and (c) Enviro-truth.

6.6 DNN Prediction Performance

Fig. 4 (a) compares the true dynamics $\mathbf{R}$ with the predicted dynamics $\hat{\mathbf{R}}$. We observe that FRIDAY learns the mapping more and more precisely as time passes. Where the DNN prediction fails to fit some curves, it is because their curvature exceeds what the bounded Lipschitz constant allows. Without SN, FRIDAY's learning error blows up, as shown in Fig. 4 (b), which empirically demonstrates the necessity of SN for stabilizing the closed-loop system.

Figure 4: (a) Real-time estimated residual dynamics $\hat{\mathbf{R}}$ compared to the true dynamics $\mathbf{R}$, with mean estimation error, in the sine-wave tracking case (see Fig. 3). (b) Learning loss of FRIDAY and FRIDAY without SN in the sine-wave/Enviro-truth case with $r_1 = 0.4$, $a_{roll} = 0.8$.
Table 1: Comparison of learned models

Model     Wall time [s]   Mean estimation error [N]
SN-DNN    1.9             1.14
DNN       1.8             1.82
GP        104.0           1.10

6.7 Estimator Comparison

To provide an intuition for why SN-DNNs suit FRIDAY's real-time estimation, we compare an SN-DNN with another popular data-driven estimator, GPs, in terms of training runtime and mean estimation error. We use a GP model from the Python library GPy with a Matern52 kernel. The training data is collected by running the LQR controller through random setpoints (between -1.0 m and 1.0 m). After training, we plug the learned models into our controller in the sine-wave/Enviro-truth case.
Table 1 reveals two main advantages of SN-DNNs for FRIDAY: (a) the SN-DNN learns the residual dynamics much faster than the GP while achieving comparable estimation accuracy, and (b) the SN-DNN generalizes better than the DNN without SN. These advantages are reinforced by the fact that the real-time learning SN-DNN in Fig. 4 (a) shows a smaller mean estimation error than the offline-trained SN-DNN in Table 1.

7 Conclusion

In this letter, we presented FRIDAY, composed of an LQR controller and a stable real-time learning DNN whose feedforward prediction cancels unknown nonlinearity. Our framework has two main benefits: (1) just by spectrally normalizing the weights, FRIDAY can learn fast with SGD and compensate for uncertain disturbances while controlling the system, and (2) FRIDAY's stability is rigorously guaranteed by theory. Future work will include further application and generalization of FRIDAY.

Acknowledgments

I thank Guanya Shi, Xichen Shi, people on Mathematics Stack Exchange, and my family.

References

  • Amodeo et al. (2009) Matteo Amodeo, Antonella Ferrara, Riccardo Terzaghi, and Claudio Vecchio. Wheel slip control via second-order sliding-mode generation. IEEE Transactions on Intelligent Transportation Systems, 11(1):122–131, 2009.
  • Berkenkamp et al. (2016) Felix Berkenkamp, Riccardo Moriconi, Angela P Schoellig, and Andreas Krause. Safe learning of regions of attraction for uncertain, nonlinear systems with gaussian processes. In 2016 IEEE 55th Conference on Decision and Control (CDC), pages 4661–4666. IEEE, 2016.
  • Chen (1984) Chi-Tsong Chen. Linear system theory and design. Saunders college publishing, 1984.
  • Ge et al. (2000) SS Ge, TH Lee, and J Wang. Adaptive control of non-affine nonlinear systems using neural networks. In Proceedings of the 2000 IEEE International Symposium on Intelligent Control. Held jointly with the 8th IEEE Mediterranean Conference on Control and Automation (Cat. No. 00CH37147), pages 13–18. IEEE, 2000.
  • Greeff and Schoellig (2020) Melissa Greeff and Angela P Schoellig. Exploiting differential flatness for robust learning-based tracking control using gaussian processes. IEEE Control Systems Letters, 5(4):1121–1126, 2020.
  • Greeff et al. (2021) Melissa Greeff, Adam W Hall, and Angela P Schoellig. Learning a stability filter for uncertain differentially flat systems using gaussian processes. In 2021 60th IEEE Conference on Decision and Control (CDC), pages 789–794. IEEE, 2021.
  • Joshi and Chowdhary (2019) Girish Joshi and Girish Chowdhary. Deep model reference adaptive control. In 2019 IEEE 58th Conference on Decision and Control (CDC), pages 4601–4608. IEEE, 2019.
  • Joshi et al. (2021) Girish Joshi, Jasvir Virdi, and Girish Chowdhary. Asynchronous deep model reference adaptive control. In Conference on Robot Learning, pages 984–1000. PMLR, 2021.
  • Khalil (2002) Hassan Khalil. Nonlinear Systems. Prentice Hall, 2002.
  • Konoiko et al. (2019) Aleksey Konoiko, Allan Kadhem, Saiful Islam, Navid Ghorbanian, Yahya Zweiri, and M. Necip Sahinkaya. Deep learning framework for controlling an active suspension system. Journal of Vibration and Control, 25(17):2316–2329, 2019.
  • Le et al. (2021) Duc M Le, Max L Greene, Wanjiku A Makumi, and Warren E Dixon. Real-time modular deep neural network-based adaptive control of nonlinear systems. IEEE Control Systems Letters, 6:476–481, 2021.
  • Lewis (1999) FL Lewis. Nonlinear network structures for feedback control. Asian Journal of Control, 1(4):205–228, 1999.
  • Li et al. (2017) Qiyang Li, Jingxing Qian, Zining Zhu, Xuchan Bao, Mohamed K Helwa, and Angela P Schoellig. Deep neural networks for improved, impromptu trajectory tracking of quadrotors. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 5183–5189. IEEE, 2017.
  • Miyato et al. (2018) Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.
  • Neyshabur et al. (2017) Behnam Neyshabur, Srinadh Bhojanapalli, and Nathan Srebro. A pac-bayesian approach to spectrally-normalized margin bounds for neural networks. arXiv preprint arXiv:1707.09564, 2017.
  • O’Connell et al. (2022) Michael O’Connell, Guanya Shi, Xichen Shi, Kamyar Azizzadenesheli, Anima Anandkumar, Yisong Yue, and Soon-Jo Chung. Neural-fly enables rapid learning for agile flight in strong winds. Science Robotics, 7(66):eabm6597, 2022.
  • Patil et al. (2022) Omkar Sudhir Patil, Duc M Le, Max L Greene, and Warren E Dixon. Lyapunov-derived control and adaptive update laws for inner and outer layer weights of a deep neural network. IEEE Control Systems Letters, 6:1855–1860, 2022.
  • Shi et al. (2019) Guanya Shi, Xichen Shi, Michael O’Connell, Rose Yu, Kamyar Azizzadenesheli, Animashree Anandkumar, Yisong Yue, and Soon-Jo Chung. Neural lander: Stable drone landing control using learned dynamics. In 2019 International Conference on Robotics and Automation (ICRA), pages 9784–9790. IEEE, 2019.
  • Shi et al. (2018) Xichen Shi, Kyunam Kim, Salar Rahili, and Soon-Jo Chung. Nonlinear control of autonomous flying cars with wings and distributed electric propulsion. In 2018 IEEE Conference on Decision and Control (CDC), pages 5326–5333. IEEE, 2018.
  • Slotine et al. (1991) Jean-Jacques E Slotine, Weiping Li, et al. Applied nonlinear control. Prentice hall Englewood Cliffs, NJ, 1991.
  • Sun et al. (2021) Runhan Sun, Max L Greene, Duc M Le, Zachary I Bell, Girish Chowdhary, and Warren E Dixon. Lyapunov-based real-time and iterative adjustment of deep neural networks. IEEE Control Systems Letters, 6:193–198, 2021.
  • Tourajizadeh et al. (2016) Hami Tourajizadeh, Mahdi Yousefzadeh, and Ali Tajik. Closed loop optimal control of a stewart platform using an optimal feedback linearization method. International Journal of Advanced Robotic Systems, 13(3):134, 2016.
  • Umlauft and Hirche (2019) Jonas Umlauft and Sandra Hirche. Feedback linearization based on gaussian processes with event-triggered online learning. IEEE Transactions on Automatic Control, 65(10):4154–4169, 2019.
  • Ward and Iagnemma (2008) Chris C Ward and Karl Iagnemma. A dynamic-model-based wheel slip detector for mobile robots on outdoor terrain. IEEE Transactions on Robotics, 24(4):821–831, 2008.
  • Yang et al. (2021) Rui Yang, Lei Zheng, Jiesen Pan, and Hui Cheng. Learning-based predictive path following control for nonlinear systems under uncertain disturbances. IEEE Robotics and Automation Letters, 6(2):2854–2861, 2021.
  • Yucelen (1999) Tansel Yucelen. Model reference adaptive control. Wiley Encyclopedia of Electrical and Electronics Engineering, pages 1–13, 1999.
  • Zeiler et al. (2013) Matthew D Zeiler, M Ranzato, Rajat Monga, Min Mao, Kun Yang, Quoc Viet Le, Patrick Nguyen, Alan Senior, Vincent Vanhoucke, Jeffrey Dean, et al. On rectified linear units for speech processing. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 3517–3521. IEEE, 2013.
  • Zhang et al. (2023) Yiqiang Zhang, Jiaxing Che, Yijun Hu, Jiankuo Cui, and Junhong Cui. Real-time ocean current compensation for auv trajectory tracking control using a meta-learning and self-adaptation hybrid approach. Sensors, 23(14):6417, 2023.