
FRIDAY: Real-time Learning DNN-based Stable LQR controller for Nonlinear Systems under Uncertain Disturbances

Takahito Fujimori  ([email protected])
School of Biology, Osaka University, Machikaneyamacho 1-1, Toyonaka, Japan
Abstract

The Linear Quadratic Regulator (LQR) is often combined with feedback linearization (FBL) for nonlinear systems whose nonlinearity is additive to the input. Conventional approaches estimate and cancel the nonlinearity using first-principles models or data-driven methods such as Gaussian Processes (GPs). However, the former requires an elaborate modeling process, and the latter provides a fixed learned model, which may struggle when the model dynamics change. In this letter, we use a Deep Neural Network (DNN) trained on a dataset updated in real time to approximate the unknown nonlinearity while the controller is running. By spectrally normalizing the weights at each time-step, we stably incorporate the DNN prediction into an LQR controller and compensate for the nonlinear term. Leveraging the bounded Lipschitz constant of the DNN, we provide theoretical analysis and prove local exponential stability of the proposed controller. Simulation results show that our controller significantly outperforms baseline controllers in trajectory tracking tasks.

keywords:
Feedback Linearization, Deep Neural Networks, Real-time Learning, Guaranteed Stability

1 Introduction

Although the Linear Quadratic Regulator (LQR) is an intuitively simple controller for linear systems and one of the success stories of modern control theory (Chen, 1984), many safety-critical systems such as self-driving vehicles, unmanned aerial vehicles (UAVs), and mobile manipulators exhibit nonlinear dynamics additive to the control input (Yang et al., 2021). For such systems, LQR is often combined with feedback linearization (FBL) to cancel out the nonlinearity, and various identification methods have been proposed.
A classical approach uses first-principles (Tourajizadeh et al., 2016) or adaptive (Yucelen, 1999) approximations. However, parametric FBL-LQR controllers often suffer from slow response and delayed feedback when the model mismatches the true dynamics. Another approach uses data-driven estimation, such as Gaussian Processes (GPs). GPs learn the nonlinear term and quantify the prediction mismatch, enabling GP-FBL-LQR controllers to stably linearize the system (Greeff et al., 2021; Greeff and Schoellig, 2020). However, GPs typically keep a fixed dataset and do not update the model while the controller is running. Fixed models may struggle when the model dynamics change due to unexpected disturbances, or when the model must be updated because of biased training data. Compensating strategies add data to the dataset in real time (Berkenkamp et al., 2016) or update the model based on its current reliability (Umlauft and Hirche, 2019), but these methods rely heavily on data efficiency and require a possibly expensive online optimization process, compromising fast convergence to the reference states: the data efficiency of GPs is very sensitive to the choice of kernel/covariance function, and their runtime complexity is inherently $O(n^3)$ in the number of data points. For these reasons, we take a Deep Neural Network (DNN) approach to compensate for the unknown disturbances in real time.
Related works: (Li et al., 2017; Konoiko et al., 2019) incorporate a learned DNN into a controller and directly cancel out the uncertainty with its prediction. However, these DNNs are not theoretically analyzed and can thus generate unpredictable outputs. To guarantee stability, (Shi et al., 2019; Zhang et al., 2023) apply Spectral Normalization (SN) to their DNN weights to constrain the Lipschitz constants. Though they stably compensate for complex fluid dynamics, they use fixed training datasets, so their performance depends on how well the DNN generalizes. Concurrently with real-time execution, (Joshi and Chowdhary, 2019; Sun et al., 2021) collect data and iteratively train their DNNs with Stochastic Gradient Descent (SGD) to apply batch-like updates. Still, in such multi-time-scale controllers, when to collect the data and when to update the weights are open questions. Instead of SGD, adaptive DNN weight-update laws based on Lyapunov stability analysis have been developed to adjust the weights continuously. Though these architectures are well-established, they only apply to NNs with a single hidden layer (Lewis, 1999) or only update the output-layer weights (Joshi et al., 2021). For full-layer weight updates, (Le et al., 2021; Patil et al., 2022) develop modular adaptive laws, but for DNNs of arbitrary width and depth, simultaneously updating all weights online under adaptive laws may be computationally intractable or undesired.
Contributions: Exploiting the technique for stabilizing a DNN-based FBL signal (Shi et al., 2019), which is computationally light and easy to implement, we continuously update all the weights with simple SGD rather than adaptive laws. Specifically, we apply SN to the weights before executing our DNN-based control input at each time-step, so that the input map becomes a contraction converging to its unique fixed point. By doing so, we can stably optimize the weights with SGD while controlling the system. The proposed controller is named FRIDAY, short for Fast ResIdual Dynamics AnalYsis. Leveraging the bounded Lipschitz constant, FRIDAY is proved to be locally exponentially stable under bounded learning error. Simulation results show FRIDAY achieves almost double the trajectory-tracking accuracy of an adaptive baseline controller and ten times that of an LQR, while learning the map of uncertain disturbances. To the best of our knowledge, this is the first guaranteed framework that constantly collects data and updates all layer weights with SGD to cancel out unknown dynamics.

2 Problem Statement: Nonlinear Systems under Uncertain Disturbances

Given the state vector $\mathbf{x}\in\mathbb{R}^{n}$ and input vector $\mathbf{u}\in\mathbb{R}^{m}$ of a system, we consider the control-affine nonlinear dynamics

$\dot{\mathbf{x}} = f(\mathbf{x}) + g(\mathbf{x})\mathbf{u}.$ (1)

A wide range of dynamical systems, such as quadrotors and car-like vehicles, can be separated into a linear dynamics component and an additive nonlinearity (Greeff et al., 2021; Greeff and Schoellig, 2020). Thus, we divide the nonlinear system into a Linear Time-Invariant (LTI) system and a nonlinear term as follows (Yang et al., 2021):

$\dot{\mathbf{x}} = A\mathbf{x} + B(\mathbf{u} + \mathbf{R}(\mathbf{x},\mathbf{u})),$ (2)

where $A, B$ are time-invariant matrices with the pair $(A, B)$ controllable, and $\mathbf{R}(\mathbf{x},\mathbf{u})$ accounts for unknown nonlinear dynamics, including model uncertainties, which we call the residual dynamics. Note that the analysis in this letter is restricted to this form.

Problem Statement: We aim to build a real-time copy $\hat{\mathbf{R}}(\mathbf{x},\mathbf{u})$ of the residual dynamics to cancel the nonlinear term through the control input $\mathbf{u} = \mathbf{u}^{\text{LQR}} - \hat{\mathbf{R}}$, so that the LQR controller operates on the linearized system. We use a real-time learning DNN directly as the cancellation term; that is, we incorporate DNN predictions into the controller while the DNN is learning. To stabilize the closed loop, we exploit the contraction-mapping technique proposed by (Shi et al., 2019) using Spectral Normalization, which constrains the Lipschitz constant of DNNs. Leveraging the bounded Lipschitz constant, we guarantee the controller's stability.

Figure 1: Our proposed architecture using a real-time learning DNN has three key components: (1) Updating the Dataset in Real-time: the currently observed values of the states, control input, and residual dynamics are added to the training dataset at each time-step; (2) One-time Learning: using the dataset, the DNN is optimized once per iteration with Stochastic Gradient Descent; (3) Execution after Spectral Normalization: the DNN estimate cancels the nonlinearity after its Lipschitz constant is bounded.

3 FRIDAY Controller Design

We first show the overall design of our controller, composed of a conventional LQR and a DNN-based feedforward cancellation term $\hat{\mathbf{R}}(\mathbf{x},\mathbf{u})$. The DNN learns $\mathbf{R}$ from observed values while its prediction directly compensates for the unknown dynamics so that the LQR controls the system (see Fig. 1):

$\mathbf{u} = -K(\mathbf{x} - \mathbf{x}_r) + \mathbf{u}_r - \hat{\mathbf{R}}(\mathbf{x},\mathbf{u}),$ (3)

where the feedback gain $K = R^{-1}B^{\top}P$ derives from the solution $P$ of the Algebraic Riccati Equation (ARE) $A^{\top}P + PA - PBR^{-1}B^{\top}P + Q = 0$, with $Q, R > 0$. $\mathbf{x}_r$ and $\mathbf{u}_r$ are reference signals satisfying $\dot{\mathbf{x}}_r = A\mathbf{x}_r + B\mathbf{u}_r$ (Chen, 1984). Substituting (3) into (2), the closed-loop dynamics become the following, with approximation error $\boldsymbol{\epsilon} = \mathbf{R} - \hat{\mathbf{R}}$:

$\dot{\mathbf{x}} - \dot{\mathbf{x}}_r = (A - BK)(\mathbf{x} - \mathbf{x}_r) + B\boldsymbol{\epsilon}.$ (4)

By defining $\mathbf{z} = \mathbf{x} - \mathbf{x}_r$ and $A_{\text{cl}} = A - BK$, the system dynamics simply become

$\dot{\mathbf{z}} = A_{\text{cl}}\mathbf{z} + B\boldsymbol{\epsilon}.$ (5)

As long as $\|B\boldsymbol{\epsilon}\|$ is bounded, $\mathbf{x}(t) \to \mathbf{x}_r(t)$ locally and exponentially with bounded error (Slotine et al., 1991; Shi et al., 2018).
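For concreteness, the following minimal Python sketch (our own illustration, not the released code) computes the gain $K$ from the ARE for the nominal model and weights used later in Sec. 6:

```python
# Minimal sketch: LQR gain K = R^{-1} B^T P from the continuous-time ARE,
# using the nominal model of Sec. 6.1 and the weights of Sec. 6.3.
import numpy as np
from scipy.linalg import solve_continuous_are

m = 1.5                                  # vehicle mass [kg] (Sec. 6.1)
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])               # nominal LTI dynamics
B = np.array([[0.0],
              [1.0 / m]])
Q = np.diag([20.0, 5.0])                 # LQR weights (Sec. 6.3)
R = np.array([[1.0]])

# Solve A^T P + P A - P B R^{-1} B^T P + Q = 0, then K = R^{-1} B^T P.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)

A_cl = A - B @ K                         # closed-loop matrix of (5)
assert np.all(np.linalg.eigvals(A_cl).real < 0)   # Hurwitz, as required
```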

4 DNN Learning Residual Dynamics

We now describe how the DNN is trained and predicts the residual dynamics while the controller is running. We use a DNN with the Rectified Linear Unit (ReLU) activation function. ReLU DNNs have been shown to converge faster, suffer less from vanishing gradients, and be easier to optimize than other activation functions such as sigmoid and tanh (Zeiler et al., 2013):

$\hat{\mathbf{R}}(D_X, \boldsymbol{\theta}) = W^L a\big(W^{L-1} a\big(W^{L-2} \cdots a(W^1 D_X) \cdots\big)\big),$ (6)

where $D_X = \{\mathbf{x}, \mathbf{u}\}$ consists of the observed state and control input, $\boldsymbol{\theta} = \{W^1, \cdots, W^L\}$ collects the weight matrices of the $L$ layers, and $a(\cdot) = \max(\cdot, 0)$ is the layer-wise ReLU.
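A minimal PyTorch sketch of the network (6) is given below; the defaults follow the configuration of Sec. 6.3 (3-dimensional input, four hidden layers of 50 neurons, scalar output), and the bias-free layers mirror the pure weight-matrix form of (6). This is our illustrative reconstruction, not the released implementation:

```python
# Illustrative PyTorch version of the ReLU network in (6); bias-free linear
# layers, so the model is exactly nested weight matrices with ReLU between.
import torch
import torch.nn as nn

class ResidualDNN(nn.Module):
    def __init__(self, n_in=3, n_hidden=50, n_out=1, n_layers=4):
        super().__init__()
        dims = [n_in] + [n_hidden] * n_layers + [n_out]
        self.layers = nn.ModuleList(
            [nn.Linear(d_in, d_out, bias=False)
             for d_in, d_out in zip(dims[:-1], dims[1:])]
        )

    def forward(self, x):
        # W^L a(W^{L-1} a( ... a(W^1 D_X) ... )) with a = ReLU
        for layer in self.layers[:-1]:
            x = torch.relu(layer(x))
        return self.layers[-1](x)
```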

4.1 Real-time Learning

Real-time learning means that the following optimization is conducted at each time-step:

$\mathcal{D}_X := \{D_{X_1}, \cdots, D_{X_{k-1}}\} \cup \{D_{X_k} = \{\mathbf{x}_k, \mathbf{u}_k\}\},$
$\mathcal{D}_Y := \{D_{Y_1}, \cdots, D_{Y_{k-1}}\} \cup \{D_{Y_k} = \tilde{\mathbf{R}}_k\},$
$\min_{\boldsymbol{\theta}} \; \frac{1}{\text{n}(N)} \sum_{n \in N} \|D_{Y_n} - \hat{\mathbf{R}}(D_{X_n}, \boldsymbol{\theta})\|^2,$ (7)

where $\tilde{\mathbf{R}}_k$ is the observed residual-dynamics value and $N$ denotes a mini-batch of size $\text{n}(N)$. In other words, the DNN is trained on the datasets $\mathcal{D}_X$, $\mathcal{D}_Y$, which are updated at each iteration. Once (7) is done, the estimator $\hat{\mathbf{R}}$ is used for the cancellation at the next time-step (see Algorithm 1, lines 10 to 16, and line 8).
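Continuing the sketch above, one real-time learning step of (7), appending the newly observed pair and taking a single SGD step on a random mini-batch, might look as follows (names are illustrative):

```python
# One real-time learning step of (7): append the current observation, then
# take a single gradient step on a random mini-batch N of the grown dataset.
import random
import torch

D_X, D_Y = [], []                        # datasets D_X, D_Y, grown every step

def real_time_update(model, optimizer, x_k, u_k, R_tilde_k, batch_size=32):
    D_X.append(torch.cat([x_k, u_k]))    # D_{X_k} = {x_k, u_k}
    D_Y.append(R_tilde_k)                # D_{Y_k} = observed residual
    idx = random.sample(range(len(D_X)), min(batch_size, len(D_X)))
    X = torch.stack([D_X[i] for i in idx])
    Y = torch.stack([D_Y[i] for i in idx])
    loss = ((Y - model(X)) ** 2).mean()  # (1/n(N)) sum ||D_Y - R_hat||^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```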
However, it is not preferable to integrate the estimates of such a still-developing DNN into a feedback controller, because its output is unpredictable and can be unstable. To address this instability, we apply Spectral Normalization to our mapping.

4.2 Spectral Normalization

Spectral Normalization (SN) normalizes the Lipschitz constant of the objective function. The Lipschitz constant is defined as the smallest value $\|f\|_{\text{Lip}}$ such that

$\forall \mathbf{x}, \acute{\mathbf{x}}: \; \|f(\mathbf{x}) - f(\acute{\mathbf{x}})\|_2 \,/\, \|\mathbf{x} - \acute{\mathbf{x}}\|_2 \leq \|f\|_{\text{Lip}}.$ (8)

Since the Lipschitz constant of the linear mapping $W\mathbf{x}$ is the spectral norm $\sigma(W)$ of the weight matrix (its maximum singular value), and that of ReLU is $\|a(\cdot)\|_{\text{Lip}} = 1$, the Lipschitz constant of a ReLU DNN is naturally upper bounded by the product of all the spectral norms,

$\|f\|_{\text{Lip}} \leq \|W^L\|_{\text{Lip}} \cdot \|a\|_{\text{Lip}} \cdots \|W^1\|_{\text{Lip}} = \prod_{l=1}^{L} \sigma(W^l).$ (9)

Leveraging this property, we can upper bound the Lipschitz constant of the DNN by an intended value $\zeta$ by dividing each weight: $W_{SN}^l = W^l / \sigma(W^l) \cdot \zeta^{\frac{1}{L}}$ (see Sec. 2.1 of (Miyato et al., 2018) and Lemma 3.1 of (Shi et al., 2019)).
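A sketch of this per-step normalization, reusing the ResidualDNN class above: each weight matrix is divided by its spectral norm and rescaled by $\zeta^{1/L}$, so that the product bound (9) is at most $\zeta$:

```python
# Sketch: divide each weight by its spectral norm (largest singular value)
# and rescale by zeta^(1/L), bounding the DNN Lipschitz constant by zeta.
import torch

@torch.no_grad()
def spectrally_normalize(model, zeta=1.0):
    L = len(model.layers)
    for layer in model.layers:
        sigma = torch.linalg.matrix_norm(layer.weight, ord=2)  # sigma(W^l)
        layer.weight.mul_(zeta ** (1.0 / L) / sigma)
```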

4.3 Constrained Prediction

According to Lemma 5.1 and Theorem 5.2 discussed later, the $\zeta$-Lipschitz DNN-based control input converges. Hence, stably incorporating a real-time learning DNN into the controller reduces to normalizing all the weights such that $\|\hat{\mathbf{R}}\|_{\text{Lip}} \leq \zeta$ before executing the control input $\mathbf{u}$ (see Algorithm 1, lines 4 to 9). We optimize the DNN using Stochastic Gradient Descent with momentum (Momentum SGD).

Algorithm 1 FRIDAY algorithm
1:  Initialize the weights $\boldsymbol{\theta}$ of the SN-DNN.
2:  for the entire duration do
3:     Obtain the current states $\mathbf{x}_k$
4:     Calculate the spectral norm $\sigma(W^l)$ of each weight matrix and divide:
5:     for $l = 1$ to $L$ do
6:        $W^l \leftarrow W^l / \sigma(W^l) \cdot \zeta^{\frac{1}{L}}$
7:     end for
8:     Estimate the residual dynamics $\hat{\mathbf{R}}_k = \hat{\mathbf{R}}(\mathbf{x}_k, \mathbf{u}_{k-1})$
9:     Execute the current control input $\mathbf{u}_k = -K\mathbf{x}_k - \hat{\mathbf{R}}_k$
10:     Begin the DNN training:
11:     Add the current data to the training sets,
12:     $\mathcal{D}_X \leftarrow \mathcal{D}_X \cup \{\mathbf{x}_k, \mathbf{u}_k\}$
13:     $\mathcal{D}_Y \leftarrow \mathcal{D}_Y \cup \{\tilde{\mathbf{R}}_k\}$
14:     Sample a random mini-batch $N$ and update $\boldsymbol{\theta}$ once to minimize
15:     $\frac{1}{\text{n}(N)} \sum_{n \in N} \|D_{Y_n} - \hat{\mathbf{R}}(D_{X_n}, \boldsymbol{\theta})\|^2$
16:     End the training.
17:  end for
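Assembling the sketches above into Algorithm 1 gives the following illustrative closed loop; the plant (a mass with an unknown drag residual), the gain value, and all constants are our placeholders, not the paper's simulation code:

```python
# Illustrative closed loop for Algorithm 1, reusing ResidualDNN,
# spectrally_normalize, and real_time_update from the earlier sketches.
import torch

m, dt = 1.5, 0.05                            # mass [kg], 20 Hz control rate
K = torch.tensor([[4.47, 3.37]])             # illustrative LQR gain (Sec. 3)
model = ResidualDNN()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

x = torch.tensor([1.0, 0.0])                 # state [p, p_dot]
u_prev = torch.zeros(1)
for k in range(400):                         # the entire duration
    spectrally_normalize(model, zeta=1.0)    # lines 4-7
    with torch.no_grad():
        R_hat = model(torch.cat([x, u_prev]))            # line 8
    u = (-(K @ x) - R_hat).detach()                      # line 9 (regulation to 0)
    # plant step: m p_ddot = u + m R(x, u), with R unknown to the controller
    R_true = -0.4 * x[1] * torch.abs(x[1])               # illustrative residual
    p_ddot = u / m + R_true
    x_next = x + dt * torch.stack([x[1], p_ddot.squeeze()])
    R_tilde = p_ddot - u / m                             # observed residual
    real_time_update(model, optimizer, x, u, R_tilde)    # lines 10-16
    x, u_prev = x_next, u
```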

5 Theoretical Guarantees

We analyze the closed-loop system to prove its stability and robustness. This analysis also provides insight into how to tune the hyperparameters of the DNN and the LQR controller to improve performance. Note that all norms $\|\cdot\|$ used below denote the $L^2$ norm.

5.1 Convergence of Control input

Using fixed-point iteration, we show that the control input defined in (3) converges to a unique point when all states are fixed.

Lemma 5.1: The control input defined by the following mapping $\mathbf{u}_k = \mathcal{F}(\mathbf{u}_{k-1})_k$ converges to the unique solution satisfying $\mathbf{u}_k^* = \mathcal{F}(\mathbf{u}_k^*)_k$ when $\|\hat{\mathbf{R}}\|_{\text{Lip}} \leq \zeta$ and all states are fixed,

$\mathcal{F}(\mathbf{u})_k = -K(\mathbf{x}_k - \mathbf{x}_{r_k}) + \mathbf{u}_{r_k} - \hat{\mathbf{R}}(\mathbf{x}_k, \mathbf{u}).$ (10)

Proof: With all states fixed and $\forall \mathbf{u}_1, \mathbf{u}_2 \in \mathcal{U}$, where $\mathcal{U}$ is a compact set of feasible control inputs, the distance in $L^2$-space is

$\|\mathcal{F}(\mathbf{u}_1)_k - \mathcal{F}(\mathbf{u}_2)_k\| = \|\hat{\mathbf{R}}(\mathbf{x}_k, \mathbf{u}_1) - \hat{\mathbf{R}}(\mathbf{x}_k, \mathbf{u}_2)\| \leq L_R\|\mathbf{u}_1 - \mathbf{u}_2\|,$ (11)

where $L_R$ is the Lipschitz constant of the estimated dynamics $\hat{\mathbf{R}}(\mathbf{x},\mathbf{u})$. As long as we constrain $L_R$ such that $L_R \leq \zeta \leq 1$ in every iteration, $\mathcal{F}(\cdot)$ is always a contraction mapping. Thus the $k$-th input $\mathbf{u}_k$ mapped by $\mathcal{F}(\cdot)_k$ approaches its unique solution. $\Box$
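A quick numerical illustration of the lemma (our own toy example, not part of the paper's experiments): with the state-dependent terms frozen, iterating (10) with a contractive stand-in estimator converges to its fixed point:

```python
# Toy fixed-point iteration of (10) with frozen state terms and a stand-in
# estimator of Lipschitz constant 0.5 < 1; u settles at the unique solution.
import numpy as np

a = -0.6                                # frozen -K(x_k - x_r_k) + u_r_k, illustrative
R_hat = lambda u: 0.5 * np.tanh(u)      # stand-in estimator, L_R = 0.5

u = 0.0
for _ in range(30):
    u = a - R_hat(u)                    # u_k = F(u_{k-1})
print(u, a - R_hat(u))                  # u and F(u) agree at the fixed point
```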

5.2 Stability of FRIDAY Controller

To present the stability of the FRIDAY controller, we make four assumptions.
Assumption 1: The reference signals admit maxima $x_{r_m} = \max_{t \geq t_0}\|\mathbf{x}_r(t)\|$ and $u_{r_m} = \max_{t \geq t_0}\|\mathbf{u}_r(t)\|$.
Assumption 2: The distance between consecutive control inputs satisfies $\|\mathbf{u}_k - \mathbf{u}_{k-1}\| \leq \rho\|\mathbf{z}\|$ with a small positive constant $\rho$.
The intuition behind this assumption is as follows. From (10), the distance is inherently upper bounded,

$\Delta\mathbf{u}_k \leq \sigma(K)\,\Delta\mathbf{z}_k + \Delta\mathbf{u}_{r_k} + L_R(\Delta\mathbf{u}_{k-1} + \Delta\mathbf{x}_k),$ (12)

with $\Delta(\cdot)_k = \|(\cdot)_k - (\cdot)_{k-1}\|$. Under the condition that the update rates of the states and reference signals are much faster than that of the FRIDAY controller, in practice we can safely neglect $\Delta\mathbf{z}_k$, $\Delta\mathbf{u}_{r_k}$, and $\Delta\mathbf{x}_k$ within one update (see Theorem 11.1 of (Khalil, 2002); e.g., rates > 100 Hz (Shi et al., 2019; O'Connell et al., 2022)), which leads to:

$\Delta\mathbf{u}_k \leq L_R(\Delta\mathbf{u}_{k-1} + c),$ (13)

where $c$ is a small constant summing the neglected variables, and $L_R < 1$. Hence $\Delta\mathbf{u}_k$ has a small ultimate bound, and there exists a positive constant $\rho$ such that $\|\mathbf{u}_k - \mathbf{u}_{k-1}\| \leq \rho\|\mathbf{z}\|$.
Assumption 3: Over the compact sets of feasible states and control inputs $\mathbf{x} \in \mathcal{X}$, $\mathbf{u} \in \mathcal{U}$, the residual dynamics $\mathbf{R}(\mathbf{x},\mathbf{u})$ and the learning error $\boldsymbol{\epsilon}(\mathbf{x},\mathbf{u}) = \mathbf{R}(\mathbf{x},\mathbf{u}) - \hat{\mathbf{R}}(\mathbf{x},\mathbf{u})$ have upper bounds $R_m = \sup_{\mathbf{x}\in\mathcal{X},\mathbf{u}\in\mathcal{U}}\|\mathbf{R}(\mathbf{x},\mathbf{u})\|$ and $\epsilon_m = \sup_{\mathbf{x}\in\mathcal{X},\mathbf{u}\in\mathcal{U}}\|\boldsymbol{\epsilon}(\mathbf{x},\mathbf{u})\|$.
This assumption is supported by (Neyshabur et al., 2017), which shows that SN-DNNs empirically generalize well to unseen events with almost the same distribution as the training set.
Assumption 4: The compact sets $\mathcal{X} = \bar{B}_{r_{\mathcal{X}}}(\mathbf{0}, r_{\mathcal{X}})$ and $\mathcal{U} = \bar{B}_{r_{\mathcal{U}}}(\mathbf{0}, r_{\mathcal{U}})$ are closed balls of radii $r_{\mathcal{X}}, r_{\mathcal{U}}$, centered at the origin.

Based on these assumptions, we prove the stability and robustness of the closed-loop system.
Theorem 5.2: If $\mathbf{x}_0 \in \mathcal{X}$, $\mathbf{u}_0 \in \mathcal{U}$, and $r_{\mathcal{X}}, r_{\mathcal{U}}$ are larger than some constants, then the controller defined in (3) achieves $\mathbf{x}(t) \to \mathbf{x}_r(t)$ exponentially to an error ball $\bar{B}_r$ of radius $r$.
Proof: We select the Lyapunov function $\mathcal{V}(\mathbf{z}) = \mathbf{z}^{\top}P\mathbf{z}$, where $P$ is the positive definite matrix satisfying the ARE. Applying Assumptions 1-4, we bound the time derivative $\dot{\mathcal{V}}$:

$\dot{\mathcal{V}} = \mathbf{z}^{\top}(A_{\text{cl}}^{\top}P + PA_{\text{cl}})\mathbf{z} + 2\mathbf{z}^{\top}PB\big(\hat{\mathbf{R}}(\mathbf{x}_k, \mathbf{u}_k) - \hat{\mathbf{R}}(\mathbf{x}_k, \mathbf{u}_{k-1}) + \boldsymbol{\epsilon}(\mathbf{x}_k, \mathbf{u}_k)\big)$
$\leq \lambda_{\text{max}}(A_{\text{cl}}^{\top}P + PA_{\text{cl}})\|\mathbf{z}\|^2 + 2\lambda_{\text{max}}(P)\,\sigma(B)\,\|\mathbf{z}\|\big(L_R\|\mathbf{u}_k - \mathbf{u}_{k-1}\| + \epsilon_m\big),$ (14)

Let $\lambda = -\lambda_{\text{max}}(PA_{\text{cl}} + A_{\text{cl}}^{\top}P)$, $c_1 = \lambda_{\text{min}}(P)$, $c_2 = \lambda_{\text{max}}(P)$, and $c_3 = 2\lambda_{\text{max}}(P)\sigma(B)$ denote constants, and consider the inequality $c_1\|\mathbf{z}\|^2 \leq \mathcal{V} \leq c_2\|\mathbf{z}\|^2$. Then (14) boils down to

$\dot{\mathcal{V}} \leq -\frac{1}{c_2}\Big(\lambda - \frac{c_2 c_3}{c_1}\rho L_R\Big)\mathcal{V} + \frac{c_3}{\sqrt{c_1}}\,\epsilon_m\sqrt{\mathcal{V}}.$ (15)

Here, we define $\mathcal{W} = \sqrt{\mathcal{V}}$, $\dot{\mathcal{W}} = \dot{\mathcal{V}}/(2\sqrt{\mathcal{V}})$ to apply the Comparison Lemma (Khalil, 2002) and obtain the convergence of $\lim_{t\to\infty}\|\Lambda\mathbf{z}\|$, with $\Lambda$ being the decomposition $P = \Lambda^{\top}\Lambda$:

$\|\Lambda\mathbf{z}(t)\| \leq \|\Lambda\mathbf{z}(t_0)\|\exp\Big(-\frac{1}{2c_2}\big(\lambda - \frac{c_2 c_3}{c_1}\rho L_R\big)(t - t_0)\Big) + \frac{c_2 c_3 \sqrt{c_1}}{c_1\lambda - c_2 c_3 \rho L_R}\,\epsilon_m.$ (16)

This gives us $\|\mathbf{z}(t)\| \leq r_{\mathbf{z}}$ with $r_{\mathbf{z}} = \sigma(\Lambda^{-1})\big(\|\Lambda\mathbf{z}(t_0)\| + \frac{c_2 c_3 \sqrt{c_1}}{c_1\lambda - c_2 c_3 \rho L_R}\epsilon_m\big)$ and, from (3),

$\|\mathbf{u}(t)\| \leq r_{\mathbf{u}}, \quad r_{\mathbf{u}} = \sigma(K)\,r_{\mathbf{z}} + u_{r_m} + \epsilon_m + R_m.$ (17)

As long as $r_{\mathbf{z}} \leq r_{\mathcal{X}}$ and $r_{\mathbf{u}} \leq r_{\mathcal{U}}$, $\mathbf{x}$ and $\mathbf{u}$ lie inside $\mathcal{X}, \mathcal{U}$, yielding $\|\mathbf{z}(t)\| \to \bar{B}_r\big(\mathbf{0}, r = \sigma(\Lambda^{-1})\frac{c_2 c_3\sqrt{c_1}}{c_1\lambda - c_2 c_3 \rho L_R}\epsilon_m\big)$. This result also implies a theoretical trade-off: a smaller $L_R$ makes $\mathbf{z}$ converge faster but leaves a larger offset, since a more tightly constrained DNN can incur a larger learning error $\epsilon_m$. $\Box$

6 Experiments

We evaluate FRIDAY's performance in trajectory-tracking simulations. In the experimental setup, FRIDAY knows only a nominal model and learns the map of uncertain disturbances while controlling the system. A simple mass system is used as the nominal model, and a nonlinear term is added to it as the truth-model. First, we show that FRIDAY improves tracking performance compared to a Baseline controller. Next, we demonstrate that SN guarantees closed-loop stability by comparing against the estimation error of FRIDAY without SN. Lastly, we give an intuition for why SN-DNNs suit FRIDAY better than another popular data-driven estimator, GPs. All our experiments are performed on Google Colaboratory with 12 GB of RAM and a 2.20 GHz Intel Xeon processor. The code is available at: https://github.com/SpaceTAKA/FRIDAY_CarSimu

6.1 Defining Nominal Model

For simplicity, we use the following mass system as the nominal model, which can be seen as a longitudinal car (Khalil, 2002):

$m\ddot{p} = u,$ (18)

where $\ddot{p}$, $u$, and $m$ are the longitudinal acceleration, the driving force, and the mass of the vehicle (1.5 kg), respectively. We write it as the LTI system $\dot{\mathbf{x}} = A\mathbf{x} + Bu$, with $\mathbf{x} = [p, \dot{p}]^{\top}$, $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 0 \\ \frac{1}{m} \end{bmatrix}$.

6.2 Defining Truth-Models

Based on the nominal model, we define the true nonlinear models, named truth-models, which the controllers do not know.
Param-truth: This model represents a situation where all parameters of the vehicle are in fact time-variant. We consider a time-variant load mass $m(t)$ with $a_{load} = 9$ and a time-variant force coefficient $\lambda(t)$, with $T$ being the simulation period (Khalil, 2002):

$m(t)\ddot{p} = \lambda(t)u, \quad \begin{cases} m(t) = m + a_{load}\,m\,(1 - e^{-t/T}) \\ \lambda(t) = e^{-t/T} \end{cases}$ (19)

Multi-truth: This model represents a situation where the additive nonlinearity contains multiplicative functions of the control input $u$. We consider the additive uncertainty $\dot{p}^2 u + p^2 + \dot{p}|u|$ (Ge et al., 2000):

$m\ddot{p} = (1 + \dot{p}^2)u + p^2 + \dot{p}|u|.$ (20)

Enviro-truth: This model represents a situation where the vehicle is exposed to complex nonlinear environmental forces. We consider an air drag force $f_{air}$ with $c_{air} = 0.6$ and a rolling resistance $f_{roll}$ on an icy road with $\mu_{icy} = 0.6$, $a_{roll} = 0.4$, $r_1 = 0.2$, $r_2 = 0.1$. To increase the nonlinearity, a Duffing spring force is added with $k_1 = 0.5$, $k_2 = 0.3$ (Amodeo et al., 2009; Ward and Iagnemma, 2008):

$m\ddot{p} = \mu_{icy}u - f_{air} - f_{roll} - f_{duff}, \quad \begin{cases} f_{air} = c_{air}\,\dot{p}^2\sin\dot{p}, \\ f_{roll} = -\text{sign}(\dot{p})\,mg\big(r_1(1 - e^{-a_{roll}|\dot{p}|}) + r_2|\dot{p}|\big), \\ f_{duff} = k_1 p + k_2 p^3 \end{cases}$ (21)
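For concreteness, an illustrative implementation of the Enviro-truth acceleration (21); the constants follow the text, while gravity $g = 9.81$ m/s$^2$ is our assumption, as the text does not state it:

```python
# Illustrative Enviro-truth acceleration p_ddot from (21); g is assumed.
import numpy as np

def enviro_truth_accel(p, p_dot, u, m=1.5, g=9.81, mu_icy=0.6, c_air=0.6,
                       a_roll=0.4, r1=0.2, r2=0.1, k1=0.5, k2=0.3):
    f_air = c_air * p_dot**2 * np.sin(p_dot)
    f_roll = -np.sign(p_dot) * m * g * (r1 * (1.0 - np.exp(-a_roll * abs(p_dot)))
                                        + r2 * abs(p_dot))
    f_duff = k1 * p + k2 * p**3
    return (mu_icy * u - f_air - f_roll - f_duff) / m
```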

6.3 Implementation of FRIDAY

In this experiment, our DNN $\hat{\mathbf{R}}(\mathbf{x}, u)$ consists of four fully-connected hidden layers with 50 neurons each, mapping a 3-dimensional input to a 1-dimensional output. We spectrally normalize the weights to the Lipschitz constant $L_R = 1$. The teacher data for training is the observed residual-dynamics value obtained from the relation $\mathbf{R}(\mathbf{x}, u) = [0, \ddot{p} - \frac{1}{m}u]^{\top}$. The LQR weights are $Q = \text{diag}(20, 5)$, $R = 1$, and the control rate is 20 Hz.
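In code, this configuration might look as follows, reusing the earlier sketches; the SGD learning rate is our assumption (the text does not specify it), and the teacher value follows the stated relation:

```python
# Sketch of the Sec. 6.3 configuration; learning rate is an assumption.
import torch

m, dt = 1.5, 1.0 / 20.0                  # mass [kg], 20 Hz control rate
model = ResidualDNN(n_in=3, n_hidden=50, n_out=1, n_layers=4)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def observed_residual(p_ddot_obs, u):
    # teacher data: the actuated row of R(x, u) = [0, p_ddot - (1/m) u]
    return torch.tensor([p_ddot_obs - u / m])
```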

6.4 Baseline Controller

We compare FRIDAY to an adaptive FBL-LQR controller with guaranteed stability (Yucelen, 1999). Instead of a DNN, this Baseline Controller approximates the uncertainty using weighted basis functions $\hat{W}(t)\boldsymbol{\sigma}(\mathbf{x},\mathbf{u})$, with update law $\dot{\hat{W}}(t) = \gamma\boldsymbol{\sigma}\mathbf{e}PB$, where the learning rate is $\gamma = 0.03$ and $\mathbf{e}$ is the error between the state and the reference model. We also compare to a simple LQR controller, which can be seen as FRIDAY without the SN-DNN.

Figure 2: Setpoint regulation trajectory and the mean tracking error of 10 trajectories using FRIDAY (Left, blue), the Baseline controller (Right, red) and the LQR (Right, pink) in (a) Param-truth, (b) Multi-truth, and (c) Enviro-truth.

6.5 Tracking Performance

First, we conduct a setpoint regulation test. The target setpoint is $\mathbf{x}_r = [1, 0]^{\top}$ with reference control input $u_r = 0$. From Fig. 2, we conclude that FRIDAY quickly and precisely reaches the target setpoint, while the Baseline Controller converges slowly and the LQR shows a large offset (2-b) or oscillates strongly (2-c) due to the unknown nonlinearity. FRIDAY achieves almost double the tracking accuracy of the Baseline Controller and ten times that of the LQR.
For more practical use, we give a sine-wave reference trajectory $\mathbf{x}_r = [\sin\omega t, \omega\cos\omega t]^{\top}$, $u_r = -m\omega^2\sin\omega t$, where the wave frequency is $\omega = 2\pi/50$ rad/s. In Fig. 3, as in the setpoint regulation test, our controller outperforms the Baseline Controller and the LQR in tracking accuracy and convergence speed. FRIDAY smoothly fits the curve even where the Baseline and the LQR struggle to follow the path.

Figure 3: Sine wave tracking trajectory and the mean tracking error of 10 trajectories using FRIDAY (Left, blue), the Baseline controller (Right, red) and the LQR (Right, pink) in (a) Param-truth, (b) Multi-truth, and (c) Enviro-truth.

6.6 DNN Prediction Performance

Fig. 4 (a) compares the true dynamics $\mathbf{R}$ with the predicted dynamics $\hat{\mathbf{R}}$. We observe that FRIDAY learns the mapping more and more precisely as time passes. Where the DNN prediction fails to fit some curves, it is because their curvature exceeds what the bounded Lipschitz constant allows. Without SN, FRIDAY's learning error blows up, as shown in Fig. 4 (b), which empirically demonstrates the necessity of SN for stabilizing the closed-loop system.

Figure 4: (a) Real-time estimated residual dynamics $\hat{\mathbf{R}}$ compared to the true dynamics $\mathbf{R}$, with mean estimation error, in the sine-wave tracking case (see Fig. 3). (b) Learning loss of FRIDAY and FRIDAY without SN in the sine-wave/Enviro-truth case with $r_1 = 0.4$, $a_{roll} = 0.8$.
Table 1: Comparison of learned models

Model     Wall time [s]   Mean estimation error [N]
SN-DNN    1.9             1.14
DNN       1.8             1.82
GP        104.0           1.10

6.7 Estimator Comparison

To provide an intuition for why SN-DNNs suit FRIDAY's real-time estimation, we compare an SN-DNN with another popular data-driven estimator, GPs, in terms of training runtime and mean estimation error. We use a GP model from the Python library GPy with a Matern52 kernel. The training data is collected by running the LQR controller through random setpoints (between -1.0 m and 1.0 m). After training, we plug the learned models into our controller in the sine-wave/Enviro-truth case.
Table 1 reveals two main advantages of SN-DNNs for FRIDAY: (a) the SN-DNN learns the residual dynamics much faster than the GP while achieving comparable estimation accuracy, and (b) the SN-DNN generalizes better than the DNN without SN. These advantages are reinforced by the fact that the real-time learning SN-DNN in Fig. 4 (a) shows a smaller mean estimation error than the offline-trained SN-DNN in Table 1.

7 Conclusion

In this letter, we presented FRIDAY, composed of an LQR controller and a stable real-time learning DNN whose feedforward prediction cancels unknown nonlinearity. Our framework has two main benefits: (1) just by spectrally normalizing the weights, FRIDAY can learn fast with SGD and compensate for uncertain disturbances while controlling the system, and (2) FRIDAY's stability is rigorously guaranteed by theory. Future work will include further application and generalization of FRIDAY.

Acknowledgments

I thank Guanya Shi, Xichen Shi, people on Mathematics Stack Exchange, and my family.

References

  • Amodeo et al. (2009) Matteo Amodeo, Antonella Ferrara, Riccardo Terzaghi, and Claudio Vecchio. Wheel slip control via second-order sliding-mode generation. IEEE Transactions on Intelligent Transportation Systems, 11(1):122–131, 2009.
  • Berkenkamp et al. (2016) Felix Berkenkamp, Riccardo Moriconi, Angela P Schoellig, and Andreas Krause. Safe learning of regions of attraction for uncertain, nonlinear systems with gaussian processes. In 2016 IEEE 55th Conference on Decision and Control (CDC), pages 4661–4666. IEEE, 2016.
  • Chen (1984) Chi-Tsong Chen. Linear system theory and design. Saunders college publishing, 1984.
  • Ge et al. (2000) SS Ge, TH Lee, and J Wang. Adaptive control of non-affine nonlinear systems using neural networks. In Proceedings of the 2000 IEEE International Symposium on Intelligent Control. Held jointly with the 8th IEEE Mediterranean Conference on Control and Automation (Cat. No. 00CH37147), pages 13–18. IEEE, 2000.
  • Greeff and Schoellig (2020) Melissa Greeff and Angela P Schoellig. Exploiting differential flatness for robust learning-based tracking control using gaussian processes. IEEE Control Systems Letters, 5(4):1121–1126, 2020.
  • Greeff et al. (2021) Melissa Greeff, Adam W Hall, and Angela P Schoellig. Learning a stability filter for uncertain differentially flat systems using gaussian processes. In 2021 60th IEEE Conference on Decision and Control (CDC), pages 789–794. IEEE, 2021.
  • Joshi and Chowdhary (2019) Girish Joshi and Girish Chowdhary. Deep model reference adaptive control. In 2019 IEEE 58th Conference on Decision and Control (CDC), pages 4601–4608. IEEE, 2019.
  • Joshi et al. (2021) Girish Joshi, Jasvir Virdi, and Girish Chowdhary. Asynchronous deep model reference adaptive control. In Conference on Robot Learning, pages 984–1000. PMLR, 2021.
  • Khalil (2002) Hassan Khalil. Nonlinear Systems. Prentice Hall, 2002.
  • Konoiko et al. (2019) Aleksey Konoiko, Allan Kadhem, Saiful Islam, Navid Ghorbanian, Yahya Zweiri, and M. Necip Sahinkaya. Deep learning framework for controlling an active suspension system. Journal of Vibration and Control, 25(17):2316–2329, 2019.
  • Le et al. (2021) Duc M Le, Max L Greene, Wanjiku A Makumi, and Warren E Dixon. Real-time modular deep neural network-based adaptive control of nonlinear systems. IEEE Control Systems Letters, 6:476–481, 2021.
  • Lewis (1999) FL Lewis. Nonlinear network structures for feedback control. Asian Journal of Control, 1(4):205–228, 1999.
  • Li et al. (2017) Qiyang Li, Jingxing Qian, Zining Zhu, Xuchan Bao, Mohamed K Helwa, and Angela P Schoellig. Deep neural networks for improved, impromptu trajectory tracking of quadrotors. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 5183–5189. IEEE, 2017.
  • Miyato et al. (2018) Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.
  • Neyshabur et al. (2017) Behnam Neyshabur, Srinadh Bhojanapalli, and Nathan Srebro. A pac-bayesian approach to spectrally-normalized margin bounds for neural networks. arXiv preprint arXiv:1707.09564, 2017.
  • O’Connell et al. (2022) Michael O’Connell, Guanya Shi, Xichen Shi, Kamyar Azizzadenesheli, Anima Anandkumar, Yisong Yue, and Soon-Jo Chung. Neural-fly enables rapid learning for agile flight in strong winds. Science Robotics, 7(66):eabm6597, 2022.
  • Patil et al. (2022) Omkar Sudhir Patil, Duc M Le, Max L Greene, and Warren E Dixon. Lyapunov-derived control and adaptive update laws for inner and outer layer weights of a deep neural network. IEEE Control Systems Letters, 6:1855–1860, 2022.
  • Shi et al. (2019) Guanya Shi, Xichen Shi, Michael O’Connell, Rose Yu, Kamyar Azizzadenesheli, Animashree Anandkumar, Yisong Yue, and Soon-Jo Chung. Neural lander: Stable drone landing control using learned dynamics. In 2019 International Conference on Robotics and Automation (ICRA), pages 9784–9790. IEEE, 2019.
  • Shi et al. (2018) Xichen Shi, Kyunam Kim, Salar Rahili, and Soon-Jo Chung. Nonlinear control of autonomous flying cars with wings and distributed electric propulsion. In 2018 IEEE Conference on Decision and Control (CDC), pages 5326–5333. IEEE, 2018.
  • Slotine et al. (1991) Jean-Jacques E Slotine, Weiping Li, et al. Applied nonlinear control. Prentice hall Englewood Cliffs, NJ, 1991.
  • Sun et al. (2021) Runhan Sun, Max L Greene, Duc M Le, Zachary I Bell, Girish Chowdhary, and Warren E Dixon. Lyapunov-based real-time and iterative adjustment of deep neural networks. IEEE Control Systems Letters, 6:193–198, 2021.
  • Tourajizadeh et al. (2016) Hami Tourajizadeh, Mahdi Yousefzadeh, and Ali Tajik. Closed loop optimal control of a stewart platform using an optimal feedback linearization method. International Journal of Advanced Robotic Systems, 13(3):134, 2016.
  • Umlauft and Hirche (2019) Jonas Umlauft and Sandra Hirche. Feedback linearization based on gaussian processes with event-triggered online learning. IEEE Transactions on Automatic Control, 65(10):4154–4169, 2019.
  • Ward and Iagnemma (2008) Chris C Ward and Karl Iagnemma. A dynamic-model-based wheel slip detector for mobile robots on outdoor terrain. IEEE Transactions on Robotics, 24(4):821–831, 2008.
  • Yang et al. (2021) Rui Yang, Lei Zheng, Jiesen Pan, and Hui Cheng. Learning-based predictive path following control for nonlinear systems under uncertain disturbances. IEEE Robotics and Automation Letters, 6(2):2854–2861, 2021.
  • Yucelen (1999) Tansel Yucelen. Model reference adaptive control. Wiley Encyclopedia of Electrical and Electronics Engineering, pages 1–13, 1999.
  • Zeiler et al. (2013) Matthew D Zeiler, M Ranzato, Rajat Monga, Min Mao, Kun Yang, Quoc Viet Le, Patrick Nguyen, Alan Senior, Vincent Vanhoucke, Jeffrey Dean, et al. On rectified linear units for speech processing. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 3517–3521. IEEE, 2013.
  • Zhang et al. (2023) Yiqiang Zhang, Jiaxing Che, Yijun Hu, Jiankuo Cui, and Junhong Cui. Real-time ocean current compensation for auv trajectory tracking control using a meta-learning and self-adaptation hybrid approach. Sensors, 23(14):6417, 2023.