
A Satisficing Control Design Framework with Safety and Performance Guarantees for Constrained Systems under Disturbances

Yuzhen Han Student Member, IEEE and Hamidreza Modares Senior Member, IEEE Yuzhen Han and Hamidreza Modares are with the Department of Mechanical Engineering, Michigan State University, East Lansing, MI, 48863, USA (e-mails:[email protected]; [email protected]).
Abstract

This paper presents a safe robust policy iteration (SR-PI) algorithm to design controllers with satisficing (good enough) performance and safety guarantees. This is in contrast to standard PI-based control design methods, which provide no safety certification, and to existing safe control design approaches, which perform pointwise optimization and are thus myopic. Safety assurance requires satisfying a control barrier function (CBF), which might conflict with the performance-driven Lyapunov solution to the Bellman equation arising at each iteration of the PI. Therefore, a new development is required to robustly certify the safety of an improved policy at each iteration of the PI. The proposed SR-PI algorithm unifies a performance guarantee (provided by a Bellman inequality) with a safety guarantee (provided by a robust CBF) at each iteration. The Bellman inequality resembles the satisficing decision-making framework and parameterizes the sacrifice on performance with an aspiration level when there is a conflict with safety. This aspiration level is optimized at each iteration to minimize the sacrifice on performance. It is shown that the satisficing control policies obtained at each iteration of the SR-PI guarantee robust safety and performance. Robust stability is also guaranteed when there is no conflict with safety. A sum of squares (SOS) program is employed to implement the proposed SR-PI algorithm iteratively. Finally, numerical simulations are carried out to illustrate the proposed satisficing control framework.

Index Terms:
constrained continuous-time system, policy iteration, robust control, satisficing control, safe control

I Introduction

Successful deployment of the next-generation safety-critical systems (e.g., self-driving cars or assistive robots) requires certifying their safety despite uncertainties [1, 2]. Therefore, there is an urgent need for developing robust safe controllers that respect the system's constraints at all times. Safety, however, is the bare minimum requirement for any safety-critical system, and it is desired to design safe controllers that achieve as much performance as possible. To guarantee performance, one can solve an optimal control problem in which a pre-defined cost function that encodes desired system specifications is optimized [3, 4]. Despite the importance of designing safe optimal controllers, safe control design and optimal control design are typically treated separately in the literature. More specifically, while an optimal controller is generally found by solving the so-called Hamilton-Jacobi-Bellman (HJB) equation [5, 6], existing iterative solutions to the HJB equation mainly ignore safety constraints. On the other hand, to satisfy the safety requirements of dynamical systems, safety verification using control barrier functions (CBFs) [7, 8, 9] and reachability analysis [10, 11, 12] has been widely and successfully used. While these frameworks can effectively guarantee the forward invariance of a given safe set and asymptotic convergence of the system's trajectories to a target set, the long-term optimality of the solution is not considered. This can lead to conservative solutions that unnecessarily consume a significant amount of resources or result in poor performance.

To take safety and optimality into account simultaneously, in the reference governor approach [13, 14] a safe controller intervenes with a nominal performance-driven controller to avoid constraint violations when there is a risk of safety violation. However, the reference governor might keep intervening with the nominal controller, making the controller myopic and possibly far from optimal. This is because the nominal controller does not take the safety constraints into account in the design phase to proactively avoid them. Model predictive control (MPC) [15, 16] is another candidate control strategy for addressing safe optimal control design. MPC takes the system constraints into account while optimizing a short-horizon performance. However, despite its tremendous success, MPC needs to solve an optimization problem at every time step, which might not be computationally tractable for nonlinear systems. Moreover, because of its short-sighted nature, guaranteeing feasibility and stability is also hard.

Satisficing decision theory [17] has been widely employed in economic optimization problems to find good enough solutions that are not necessarily optimal. This is motivated by the fact that finding optimal solutions for systems under limited resources and incomplete information might not be feasible. Satisficing stabilizing control design has also been considered in the control community [18, 19]. These approaches, however, are typically plagued by the lack of a safety guarantee. In this paper, we propose a new safe and satisficing control approach in which a satisficing framework is leveraged to sacrifice performance in favor of safety while minimizing the sacrifice level on the performance as much as possible. Starting from a safe control policy, an iterative safe robust policy iteration (SR-PI) algorithm is then proposed to find improved satisficing controllers that certify robust safety against matched disturbances. More specifically, the policy evaluation step finds the value function for the current satisficing safe control policy, and the policy improvement step finds an improved satisficing control policy with guaranteed input-to-state safety (ISSf) [20]. The robust stability of satisficing policies is also guaranteed when there is no conflict between safety and stability. A sum of squares (SOS) program [21] is employed to implement the presented PI algorithm. Fig. 1 shows the schematic of the proposed SR-PI and its comparison with the standard PI algorithm. It should be noticed that only matched disturbances are investigated in this work; the approach could be generalized to the unmatched disturbance case, which has also attracted wide attention [22, 23, 24].

The rest of the paper is organized as follows. Section 2 presents some preliminaries that are used throughout the paper. A satisficing safe control design framework is developed in Section 3. Section 4 presents the simulation results. The paper is concluded in Section 5.

Notations: Throughout the paper, the set of continuously differentiable functions is represented by $C^{1}$, and the set of positive definite and proper functions in $C^{1}$ is denoted by $P$. A polynomial $p(x)$ is a sum of squares (SOS) polynomial, i.e., $p(x)\in\mathcal{P}^{SOS}$ where $\mathcal{P}^{SOS}$ is the set of SOS polynomials, if $p(x)=\sum_{i=1}^{m}p_{i}^{2}(x)$ for some polynomials $p_{i}(x)$, $i=1,\dots,m$. $\mathbb{R}[x]_{d_{1},d_{2}}$ denotes the set of polynomials in $x\in\mathbb{R}^{n}$ with degree at least $d_{1}$ and at most $d_{2}$. $\mathcal{X}\subset\mathbb{R}^{n}$ is the state space, which is a compact set. A continuous function $K:[0,a)\to[0,\infty)$ is a class $\kappa$ function, denoted by $K\in\kappa$, if it is strictly increasing and $K(0)=0$. A function $\alpha(s,t)$ is a class $\kappa\mathcal{L}$ function if for each fixed $t\geq 0$ the function $\alpha(\cdot,t)$ is a $\kappa$ function and for each fixed $s\geq 0$ it decreases to zero as $t\to\infty$. We also denote by $\kappa\kappa$ all functions $\gamma$ such that $\gamma(\cdot,t)\in\kappa$ for each fixed $t\geq 0$ and, similarly, $\gamma(s,\cdot)\in\kappa$ for each fixed $s\geq 0$. $\nabla f(x)$ is the gradient of the function $f$, i.e., $\nabla f(x)=[\frac{\partial f(x)}{\partial x_{1}},\frac{\partial f(x)}{\partial x_{2}},\dots,\frac{\partial f(x)}{\partial x_{n}}]^{T}$. $diag(x_{1},\dots,x_{n})$ denotes a square diagonal matrix with elements $x_{1},\dots,x_{n}$ on the main diagonal. $||x||$ indicates the Euclidean norm $\sqrt{x^{T}x}$ of a real vector $x\in\mathbb{R}^{n}$. For any set $S$, $Int(S)$ and $\partial S$ denote the interior and boundary of the set $S$, respectively; $\overline{Int(S)}$ is the closure of $Int(S)$. $||\xi||_{U}=\min_{a\in U}||\xi-a||$, where $||\cdot||$ is the Euclidean norm. For a given signal $x:\mathbb{R}\to\mathbb{R}^{n}$, its $L^{P}$ norm on the interval $T$ is given by $||x||_{L^{P}(T)}=(\int_{T}||x(t)||^{P}dt)^{1/P}$ and, similarly, its $L^{\infty}$ norm is defined by $||x||_{L^{\infty}(T)}=\operatorname{ess\,sup}_{t\in T}||x(t)||$. For the sake of conciseness, for $T=[0,\infty)$, we denote the $L^{\infty}$ norm of $x$ simply by $||x||_{L^{\infty}}$. For two vectors $x$ and $y$, $x\succeq y$ iff $x_{i}\geq y_{i}$ holds for all elements $x_{i}$ and $y_{i}$ of $x$ and $y$. $f_{max}$ indicates the maximum of the function $f$ over a set of interest.

Figure 1: Comparison between the proposed SR-PI algorithm and existing PI algorithms. (a) Standard PI without safety verification; (b) The proposed SR-PI with safety verification at each iteration to find an improved safe policy.

II Preliminaries

Consider the following continuous-time nonlinear system

\dot{x}=f(x)+g(x)(u+d)=f(x)+g(x)u+\omega   (1)

where $x\in\mathcal{X}$ is the vector of system states, $u\in\mathbb{R}^{m}$ is the vector of control inputs, $\omega$ is the disturbance on the control input, and $\omega=g(x)d$. The nonlinear functions $f:\mathbb{R}^{n}\to\mathbb{R}^{n}$ and $g:\mathbb{R}^{n}\to\mathbb{R}^{n\times m}$ are assumed to be locally Lipschitz continuous with $f(0)=0$.

Assumption 1. The system (1) is stabilizable on the set $\mathcal{X}$.

Assumption 2. The disturbance $d$ is bounded. That is, there exists a constant $d_{\max}$ such that

||d(t)||\leq d_{\max}   (2)

II-A Optimal Control Design Framework

In this section, we present a robust optimal control design framework for the system (1). To find an optimal controller, one can optimize a pre-defined performance index that encodes the designer's intention in achieving the system's specifications. For the case where $d=0$, the following infinite-horizon performance index is usually considered for the system (1):

J(x,u)=\int_{t}^{\infty}r(x,u)\,d\tau   (3)

where

r(x,u)=q(x)+u^{T}Ru   (4)

is the reward function, with $q(x)\in\mathbb{R}$ a positive definite function and $R\in\mathbb{R}^{m\times m}$ a symmetric positive definite matrix. The existence of a stabilizing optimal controller is guaranteed under some mild assumptions on the system dynamics and the performance index [25, 26]. The optimal control found by optimizing (3), however, does not guarantee robust stability of the system due to the disturbance. To guarantee robust stability of the system (1) for the case when $d\neq 0$, the performance index (3) can be modified as follows [27]:

\overline{J}(x,u)=\int_{t}^{\infty}\overline{r}(x,u)\,d\tau   (5)

with the modified reward function

\overline{r}(x,u)=q(x)+u^{T}Ru+\beta(x)   (6)

where $\beta(x)$ is an extra term added to guarantee robust stability despite the disturbance.

Remark 1. Note that several modified performance or reward functions have been presented in the presence of disturbances. For example, $H_{\infty}$ control defines the extra term as $\beta(x)=-\gamma^{2}d(x)^{T}d(x)$. On the other hand, in [29], the extra term is defined as $\beta(x)=\frac{1}{4}\nabla V^{T}\nabla V+d_{\max}^{2}$, where $V(x)$ is the value function corresponding to the control policy $u$, and it is shown that the optimal controller found by minimizing the modified cost guarantees robust stability and provides an upper bound for the original performance. However, since this extra term $\beta(x)$ depends quadratically on the gradient of the value function, solving the modified optimal control problem becomes computationally expensive when SOS is used to implement it. In the following, a modified performance index is presented to avoid this issue. As will be shown later, instead of solving one huge SOS optimization (because of the cross term $\nabla V^{T}\nabla V$), two SOS optimizations with much less complexity will be solved.

Theorem 1. Consider the system (1) with the performance function (5) and (6), and let

\beta(x)=\beta_{u}(x)+d_{\max}^{T}Rd_{\max}   (7)

with $\beta_{u}(x)\geq u_{\max}^{T}Ru_{\max}$. Then, the optimal control solution is

u^{o}(x)=-\frac{1}{2}R^{-1}((\nabla V^{o})^{T}(x)g(x))^{T}   (8)

where $V^{o}$ is the solution to the HJB equation given by

\overline{H}(V^{o})=0   (9)

with

\overline{H}(V)=q(x)+\nabla V^{T}(x)f(x)-\frac{1}{4}\nabla V^{T}(x)g(x)R^{-1}(\nabla V^{T}(x)g(x))^{T}+d_{\max}^{T}Rd_{\max}+\beta_{u}(x)   (12)

That is,

V^{o}(x_{0})=\underset{u}{\min}\,\overline{J}(x_{0},u)=\overline{J}(x_{0},u^{o})   (13)

Moreover, the optimal controller is unique, guarantees robust stability, and provides a suboptimal performance (i.e., an upper bound) for the original cost function (3) and (4).

Proof. The fact that (8) with (12) gives the optimal control policy can be shown as in [25], since only the cost function is modified and the derivation of the HJB equation does not change. We now show that the optimal controller guarantees robust stability and provides an upper bound for the original cost. First, we show that (8) is a solution to the robust control problem; that is, the system (1) is globally asymptotically stable under $u^{o}(x)$. To do this, we show that $V^{o}(x)$ is a Lyapunov function. Clearly,

\left\{\begin{array}{l}V^{o}(x)>0,\quad x\neq 0\\ V^{o}(x)=0,\quad x=0\end{array}\right.   (14)

Also, $\dot{V}^{o}(x)=dV^{o}(x)/dt<0$ for $x\neq 0$, because

\begin{aligned}\dot{V}^{o}(x)&=(dV^{o}(x)/dx)^{T}(dx/dt)\\ &=(\nabla V^{o})^{T}(x)(f(x)+g(x)u^{o}+\omega)\\ &=(\nabla V^{o})^{T}(x)(f(x)+g(x)u^{o})+(\nabla V^{o})^{T}(x)g(x)d\\ &=-\beta_{u}(x)-d_{\max}^{T}Rd_{\max}-q(x)-u^{o^{T}}Ru^{o}+(\nabla V^{o})^{T}(x)g(x)d\\ &=-\beta_{u}(x)-d_{\max}^{T}Rd_{\max}-q(x)-u^{o^{T}}Ru^{o}-2u^{o^{T}}Rd\\ &=-\beta_{u}(x)-d_{\max}^{T}Rd_{\max}-q(x)-(d+u^{o}(x))^{T}R(d+u^{o}(x))+d^{T}Rd\\ &\leq-\beta_{u}(x)-(d_{\max}^{T}Rd_{\max}-d^{T}Rd)-q(x)\\ &\leq-\beta_{u}(x)-q(x)<0.\end{aligned}   (25)

Therefore, the conditions of local Lyapunov stability theory are satisfied. Consequently, there exists a constant $c>0$ and a neighborhood $\mathcal{N}=\{x:\beta_{u}(x)+q(x)<c\}$ such that if $x(t)$ enters $\mathcal{N}$, then $\lim_{t\to\infty}x(t)=0$. However, $x(t)$ cannot remain outside $\mathcal{N}$ forever; otherwise, $\beta_{u}(x(t))+q(x(t))\geq c$ for all $t\geq 0$. Therefore,

\begin{aligned}V(x(t))-V(x(0))&=\int_{0}^{t}\dot{V}(x(\tau))d\tau\\ &\leq-\int_{0}^{t}(\beta_{u}(x)+q(x))d\tau\\ &\leq-\int_{0}^{t}c\,d\tau=-ct\end{aligned}   (30)

Letting $t\to+\infty$, we have

V(x(t))\leq V(x(0))-ct\to-\infty   (32)

which contradicts the fact that $V(x(t))\geq 0$ for all $x(t)$. Therefore, $\lim_{t\to\infty}x(t)=0$ no matter where the trajectory begins, and the optimal controller guarantees robust stability. From (25), the following holds:

\begin{aligned}\dot{V}(x)&\leq-\beta_{u}(x)-q(x)\\ &\leq-u_{\max}^{T}Ru_{\max}-q(x)\\ &\leq-u^{T}Ru-q(x)\end{aligned}   (37)

Integrating both sides of (37) on the time interval $[0,t]$ yields

V(x(0))-V(x(t))\geq\int_{0}^{t}[u^{T}Ru+q(x)]d\tau   (39)

Letting $t\to\infty$, then

V(x(0))\geq\underset{t\to\infty}{\lim}\int_{0}^{t}[u^{T}Ru+q(x)]d\tau=\int_{0}^{\infty}[u^{T}Ru+q(x)]d\tau=J(x,u)   (41)

which implies that the optimal controller provides a suboptimal performance (i.e., an upper bound) for the original cost function (3) and (4). Next, we show the uniqueness of the solution to (9). Given $u$, assume there is another Lyapunov function $V^{a}$ that satisfies the HJB equation

\overline{H}(V^{a})=0,\quad x\in\mathcal{X}.   (42)

Since $\overline{r}(x,u)>0$, we have $\nabla V^{T}(x)(f(x)+g(x)u(x))<0$, $\forall x\in\mathcal{X}\backslash\{0\}$. Subtracting $\overline{H}(V^{o})$ from $\overline{H}(V^{a})$ yields

\overline{H}(V^{a})-\overline{H}(V^{o})=[\nabla V^{a}(x)-\nabla V^{o}(x)]^{T}(f(x)+g(x)u(x))=0.   (43)

Since $f(x)+g(x)u(x)\neq 0$, $\forall x\in\mathcal{X}\backslash\{0\}$, it follows that $V^{a}=V^{o}+\varepsilon$ for some scalar $\varepsilon$. Then $V^{a}(0)=V^{o}(0)=0$ results in $\varepsilon=0$, and therefore $V^{a}(x)=V^{o}(x)$ holds for all $x\in\mathcal{X}$. This contradicts the assumption that another $V^{a}$ satisfies the HJB equation. So, $V^{o}(x)$ is the unique solution of $\overline{H}(V^{o})=0$.

\square
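To make the construction of Theorem 1 concrete, the following numerical sketch simulates the control law (8) on a simple scalar instance of (1). The dynamics, the hand-picked value-function candidate $V(x)=x^{2}$, and the disturbance bound are our own illustrative assumptions; $V$ is not the actual HJB solution $V^{o}$, so the sketch only illustrates the robust decrease of $V$ down to a small residual set around the origin.

```python
import numpy as np

# Scalar example: x_dot = f(x) + g(x)(u + d) with f(x) = -x**3, g(x) = 1.
# V(x) = x**2 is a hand-picked (hypothetical) value-function candidate used
# only to illustrate the control law (8); it is not the HJB solution V^o.
R = 1.0
d_max = 0.2

f = lambda x: -x**3
g = lambda x: 1.0
gradV = lambda x: 2.0 * x                              # dV/dx for V(x) = x**2
u_opt = lambda x: -0.5 * (1.0 / R) * gradV(x) * g(x)   # control law (8)

rng = np.random.default_rng(0)
x, dt = 1.5, 1e-3
V_init = x**2
for k in range(20000):
    d = rng.uniform(-d_max, d_max)                # bounded disturbance, cf. (2)
    x += dt * (f(x) + g(x) * (u_opt(x) + d))      # Euler step of (1)
V_final = x**2
print(f"V(x(0)) = {V_init:.3f}, V(x(T)) = {V_final:.4f}")
# V decays until the trajectory reaches a small residual set around the
# origin whose size shrinks with d_max, consistent with the proof of Theorem 1.
```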

Note that under Assumption 1, there exists a well-defined Lyapunov function $V^{0}\in P$ and a control policy $u^{1}$ such that

L(V^{0},u^{1})=-(\nabla V^{0})^{T}(x)(f(x)+g(x)u^{1})-\overline{r}(x,u^{1})\geq 0   (44)

Remark 2. Note that, in general, the optimal control problem does not necessarily have a smooth value function. However, under some mild assumptions [30], the value function satisfies desirable properties such as continuity and continuous differentiability. In this paper, all derivations are performed under the assumption of the existence of a smooth solution to (9). If the smoothness assumption is relaxed, then one needs to use the theory of viscosity solutions [30] to find a solution. Note, however, that the existence of the disturbance, and thus the addition of the extra term $\beta(x)$ to the HJB equation, might restrict the class of systems or the conditions under which the existence of the solution is guaranteed. This is the case for all other approaches that deal with disturbances. For example, for the disturbance-free case, if the system is stabilizable and $q(x)$ is positive definite, then the existence of a unique solution to the HJB equation is guaranteed. However, if $H_{\infty}$ control is employed for disturbance attenuation, then the solution to its corresponding Hamilton-Jacobi-Isaacs (HJI) equation is not guaranteed under the same conditions, and extra conditions on the performance parameters and the disturbance attenuation level are required to guarantee the existence of a solution.

II-B Safety Assurance Using Control Barrier Certificate

It is of vital importance for a safety-critical system to prevent its state from entering certain unsafe regions. To design a safe controller, the concept of a control barrier function (CBF) can be used. Consider the nonlinear system (1) with $x\in\mathcal{X}$, where $\mathcal{X}$ is the allowable set for the system's states, and let $\mathcal{X}_{0}\subset\mathcal{X}\subset\mathbb{R}^{n}$ and $\mathcal{X}_{u}\subset\mathcal{X}\subset\mathbb{R}^{n}$ be the initial set and the unsafe set, respectively. Let there exist a continuously differentiable function $h\in C^{1}(\mathcal{X}):\mathbb{R}^{n}\to\mathbb{R}$ such that

\begin{aligned}&h(x)\geq 0,\quad\forall x\in\mathcal{X}_{0},\\ &h(x)<0,\quad\forall x\in\mathcal{X}_{u}.\end{aligned}   (48)

Under disturbance, one must guarantee that, regardless of the disturbance value, the system never enters the unsafe set $\mathcal{X}_{u}$. The input-to-state safety and the robust CBF defined below provide the conditions under which the system trajectories never enter an unsafe set despite the disturbances (as inputs to the system).

Definition 1 [31]. Let $\mathcal{X}_{u}$ be the unsafe set. The system (1) is input-to-state safe (ISSf) if there exist $\alpha,\phi\in\kappa\kappa$ and a strictly increasing function $\sigma$ such that

\sigma(||x(t)||_{\mathcal{X}_{u}})\geq\alpha(||x(t)||_{\mathcal{X}_{u}},t)-\phi(||u||_{L^{\infty}},t)   (49)

holds $\forall t$.

Let the safe set $\mathcal{C}$ be defined as

\begin{aligned}&\partial\mathcal{C}=\{x\in\mathcal{X}:h(x)=0\},\\ &\mathcal{C}=\{x\in\mathcal{X}:h(x)\geq 0\},\\ &Int(\mathcal{C})=\{x\in\mathcal{X}:h(x)>0\}.\end{aligned}   (54)

Then, according to [31], in the presence of disturbance $d\neq 0$, $h(x)$ is called a zeroing CBF (ZCBF) if there exists an extended class $\kappa$ function $\varpi\in\kappa$ that satisfies

\underset{u\in U}{\sup}\left\{\nabla h(x)f(x)+\nabla h(x)g(x)u-\nabla h(x)g(x)(\nabla h(x)g(x))^{T}+\varpi(h(x))\right\}\geq 0,\quad x\in\mathcal{X}   (57)

Based on the ZCBF $h(x)$, the safe control space $S(x)$ is defined as

S(x)=\left\{u\in U\,|\,\nabla h(x)f(x)+\nabla h(x)g(x)u-\nabla h(x)g(x)(\nabla h(x)g(x))^{T}+\varpi(h(x))\geq 0\right\},\quad x\in\mathcal{X}.   (60)

The following theorem shows how to design a controller using the ZCBF concept to guarantee that the safe set $\mathcal{C}$ is forward invariant and thus that the system is safe.

Assumption 3 [31]. The admissible control space $S(x)$ is path-connected and nonempty. That is,

Int(S)\neq\emptyset\quad\text{and}\quad\overline{Int(S)}=S.   (61)

Theorem 2 [32]. Consider the set $\mathcal{C}\subset\mathbb{R}^{n}$ defined in (54) and let the ZCBF $h$ satisfy (57). Then, under Assumption 3, any Lipschitz continuous controller $u$ such that $u\in S(x)$ renders the system (1) ISSf and the safe set $\mathcal{C}$ forward invariant. $\square$

Conditions (54) and (57) guarantee that if the system starts from any initial condition $x\in\mathcal{X}_{0}$ within the safe set, its future trajectories will not enter the unsafe region $\mathcal{X}_{u}$ for any disturbance. This is because condition (57) makes the safe set $\mathcal{C}$ robustly invariant.
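For illustration, the following sketch computes a control in the safe control space $S(x)$ of (60) at a single state by projecting a nominal control onto the ZCBF constraint (57). This is precisely the pointwise (myopic) safe-control approach the paper moves away from; it is included only to make membership in $S(x)$ concrete. The dynamics, the barrier $h$, and the class $\kappa$ function $\varpi$ are our own illustrative choices.

```python
import cvxpy as cp
import numpy as np

# Pointwise safety filter: project a nominal control onto the robust safe
# control space S(x) of (60). Illustrative setup (our own choice, not from
# the paper): single integrator x_dot = u (f = 0, g = I), safe set the unit
# disk, h(x) = 1 - ||x||^2, and extended class-kappa function varpi(h) = h.
x = np.array([0.6, 0.0])
u_nom = np.array([1.0, 0.5])         # performance-driven nominal control

grad_h = -2.0 * x                    # gradient of h(x) = 1 - ||x||^2
Lf_h = grad_h @ np.zeros(2)          # grad_h * f(x), zero here since f = 0
hg = grad_h @ np.eye(2)              # grad_h * g(x)
varpi_h = 1.0 - x @ x                # varpi(h(x)) = h(x)

u = cp.Variable(2)
# ZCBF condition (57): Lf_h + hg @ u - hg @ hg^T + varpi(h) >= 0
constraint = [Lf_h + hg @ u - hg @ hg + varpi_h >= 0]
prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_nom)), constraint)
prob.solve()
print("nominal u:", u_nom, " filtered safe u:", u.value)
```

Here the filtered control overrides the outward-pointing nominal control, and the extra term $-\nabla h(x)g(x)(\nabla h(x)g(x))^{T}$ in (57) makes the filter demand inward motion even away from the boundary, which is what provides robustness to the bounded disturbance.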

II-C Satisficing Control Design

The satisficing (good enough) decision-making framework, which originated in economics [17] and was later adopted by the control community [18, 19], defines two utility functions: the selectability $p_{s}(u,x)$ and the rejectability $p_{r}(u,x)$. One then seeks to find a strategy $u$ from a so-called satisficing set defined as

S_{B}(x)=\left\{u\in\mathbb{R}^{m}:p_{s}(u,x)\succeq B(x)p_{r}(u,x)\right\}   (62)

where $B(x)$ is called the aspiration level. Any control strategy that has a larger selectability index than rejectability index belongs to the satisficing set and is considered to have a good enough performance. It was shown in [18] that if $p_{s}(u,x)=-(\nabla V)^{T}(x)(f(x)+g(x)u)$, where $V$ is a control Lyapunov function, and $p_{r}(u,x)=r(x,u)$ with $r(x,u)$ defined in (4), then the satisficing controllers (62) are stabilizing. Moreover, $p_{s}(u,x)$ indicates the stability index of the system and $p_{r}(u,x)$ indicates the cost of implementing the controller. It was also shown in [18] how to parametrize the set of all satisficing controllers. However, it is not clear how to choose one control strategy from the set of satisficing controllers to be applied to the system. We design a policy iteration (PI) algorithm that selects a satisficing strategy that optimizes a performance index.
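As a concrete illustration of (62), the following sketch checks satisficing-set membership for a few candidate controls on a scalar instance of (1), using the stabilizing indices of [18] described above. All problem data (dynamics, the control Lyapunov function $V$, and the aspiration level $b$) are our own illustrative choices.

```python
import numpy as np

# Satisficing-set membership test for (62) with the stabilizing indices of
# [18]: p_s(u,x) = -(dV/dx)(f(x) + g(x)u) and p_r(u,x) = r(x,u) from (4).
# Illustrative scalar data (our own choice): f(x) = -x, g(x) = 1,
# control Lyapunov function V(x) = x^2, q(x) = x^2, R = 1, aspiration b = 0.5.
b = 0.5

def in_satisficing_set(x, u):
    p_s = -(2 * x) * (-x + u)      # selectability: stability index
    p_r = x**2 + u**2              # rejectability: cost of the control
    return p_s >= b * p_r

for u in (-1.0, 0.0, 0.4, 1.0):
    print(f"u = {u:+.1f}: u in S_B(x=0.5)? {in_satisficing_set(0.5, u)}")
```

Controls that stabilize strongly relative to their cost pass the test, while expensive or destabilizing controls are rejected; raising $b$ shrinks the satisficing set.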

III A SATISFICING SAFE CONTROL DESIGN SCHEME

In this section, a novel satisficing safe control framework is designed that guarantees robust safety and a good enough performance. First, a relaxed robust stabilizing control framework is presented in Section III-B, inspired by [33, 34, 35, 36], in which an infinite-dimensional linear program (LP) is derived to find a robust suboptimal controller. The infinite-dimensional LP is then transformed into a sum-of-squares (SOS) program. This framework is then integrated with the CBF in Section III-C to certify robust safety of the resulting controller.

III-A A Satisficing Control Framework for Safe Control with Guaranteed Performance

We now define a satisficing control framework whose selectability index captures robust stability and robust safety of controllers and whose rejectability index captures their cost. Consider the system (1) with the performance function (5) and (6). Let $V$ be a control Lyapunov function and $h(x)$ be a control barrier function. Define

\left\{\begin{array}{l}p_{s}(u,x)=[-(\nabla V)^{T}(x)(f(x)+g(x)u),\\ \qquad\quad\nabla h(x)f(x)+\nabla h(x)g(x)u-\nabla h(x)g(x)(\nabla h(x)g(x))^{T}]^{T}\\ p_{r}(u,x)=[\overline{r}(x,u),\ -\eta\varpi(h(x))]^{T}\end{array}\right.   (63)

where $\overline{r}(x,u)$ is defined in (6), $\eta>0$ is a constant, $p_{s}(u,x)$ is the selectability, associated with the robust stability and robust safety of the system, and $p_{r}(u,x)$ is the rejectability, associated with the controller cost and safety aggressiveness. Define now the satisficing set as (62) where $B(x)=diag(b_{1}(x),b_{2}(x))$, with $b_{1}(x)$ the aspiration level on stability and $b_{2}(x)$ the aspiration level on safety.

Remark 3. The parameter $\eta$ in the rejectability index of (63) reveals how rapidly $\varpi(h(x))$ decays as the system's states move away from the safe boundaries, and thus it is treated as a rejectability index. A larger $\eta$ increases the system's aggressiveness in favor of having more flexibility in the safe set. The aspiration level reflects the designer's expectation on the satisfaction of the controller, and is also related to the solution space in the satisficing set $S_{B}(x)$.

The next theorem shows that, under Assumptions 1-3, there is always a feasible solution to (62), (63) for an appropriate choice of the aspiration level. The feasible solution makes the system both robustly safe and robustly stable if there is no conflict between safety and stability, and sacrifices stability in case of a conflict.

Theorem 3. Let Assumptions 1 and 3 hold. Then, there exists an aspiration level $B(x)$ for which the satisficing set (62), with $p_{s}(u,x)$ and $p_{r}(u,x)$ defined in (63), has a feasible solution.

Proof. By Assumption 3, there is always a safe control policy that satisfies the second inequality in (62), (63). On the other hand, by Assumption 1, there is a stabilizing controller that satisfies the first constraint. If there is no conflict between safety and stability, then there is a controller that satisfies both inequalities in (62), (63), and the satisficing set is nonempty. Assume now that there is a conflict between the two constraints in (62), (63). Let

b_{1}(x)=\frac{\overline{r}(x,u)+\delta}{\overline{r}(x,u)},\qquad b_{2}(x)=\theta   (64)

where $\delta$ is a function of $x$ representing the conflict between the two constraints, and $\theta>0$ is the aspiration on the safety level. Based on (57), the safety constraint is satisfied by choosing any $\theta>0$. On the other hand, based on $b_{1}(x)$ defined in (64), the first constraint in the equality $p_{s}(u,x)=B(x)p_{r}(u,x)$ becomes

-(\nabla V)^{T}(x)(f(x)+g(x)u)-\overline{r}(x,u)=\delta.   (65)

Therefore, any aspiration level of stability $b_{1}^{\prime}(x)$ satisfying $b_{1}^{\prime}(x)\leq b_{1}(x)$ also satisfies the safety constraint as

-(\nabla V)^{T}(x)(f(x)+g(x)u)\geq-\theta\eta\varpi(h(x)).   (66)

This completes the proof. \square

Equation (65) shows that $\delta$ indicates the sacrifice on stability and performance, since (65) resembles the Bellman inequality and will be used later to improve performance. This function will be optimized later to minimize the sacrifice on performance as much as possible. Note that a stabilizing solution to (65) is guaranteed under Assumption 1 when $\delta=0$. When $\delta$ is nonzero, however, the solution to (65) might not be stabilizing, and this can occur only when safety is in conflict with stability. In this paper, instead of parameterizing the set of all satisficing controllers, a PI algorithm is designed to optimize over all satisficing control policies by relating the control Lyapunov function to the Bellman inequality solution, which is a performance-oriented control Lyapunov function and can be iteratively optimized while assuring that every improved policy remains in the satisficing set and thus guarantees stability. To this end, the next subsection shows how to optimize over the set of satisficing control policies by iteratively solving Bellman inequalities while ignoring the safety constraint. Section III-C then combines safety constraints and Bellman inequalities to guarantee safety of the improved policies.

III-B Relaxed Robust Stabilizing Optimal Control Design

In this subsection, a relaxed robust optimal control framework is presented. While safety is ignored here, this framework allows incorporating safety constraints, as shown in the next subsection. Inspired by [33, 34, 35, 36], a finite-dimensional linear program (LP) is first derived. This finite-dimensional optimization problem is then integrated with the CBF to include safety and is solved using SOS in the next subsection.

Problem 1 (Relaxed suboptimal robust stabilizing control design problem)

Consider the system (1) with the performance index (5), (6). Find the value function $V$ by solving

\begin{aligned}&\underset{V}{\min}\int_{\Omega}V(x)dx\\ &\text{s.t.}\quad\overline{H}(V)\leq 0\\ &\qquad\ \ V\in P\end{aligned}   (71)

where $\Omega$ is the area in which the system performance is expected to be improved, and $\overline{H}(V)$ is defined in (12).

This is inspired by [33, 34, 35], in which it is shown that an optimal control problem can be transformed into an infinite-dimensional LP. Instead of searching for the optimal value function over an infinite-dimensional space, in [35] the optimization is performed over a region of interest $\Omega$, which makes the problem finite-dimensional and tractable. In contrast to [35], a modified HJB inequality $\overline{H}(V)\leq 0$ with the modified reward function is used to guarantee robust stability. This framework will allow us to incorporate safety constraints later and find satisficing control solutions that satisfy safety and optimize over stabilizing solutions. The following theorem shows some key properties of Problem 1.

Theorem 4. Consider the system (1) and let Assumptions 1-3 hold. Then,

1) Problem 1 has a feasible value function solution.

2) If $V$ is the unique solution of Problem 1, then the control solution

\overline{u}(x)=-\frac{1}{2}R^{-1}(\nabla V^{T}(x)g(x))^{T}   (72)

is globally robustly stabilizing and belongs to the satisficing set (62) when safety is ignored.

3) The control policy (72) provides an upper bound for the cost (3).

4) The following inequalities hold for $x\in\mathcal{X}$ along the trajectories of the closed-loop system (1) with the controller (72):

V(x_{0})+\int_{0}^{\infty}H(V(x(t)))dt\leq V^{o}(x_{0})\leq V(x_{0})   (73)

5) The value function $V^{o}$ in (9) is a global robust optimal solution to (71).

Proof. See APPENDIX. \square

Policy iteration algorithms can be designed to solve Problem 1. The need for safety assurance, however, makes existing policy iteration algorithms invalid, as they cannot guarantee safety. In the next subsection, to find a robust safe control policy with guaranteed performance, safety constraint satisfaction is incorporated by adding a CBF as another inequality to Problem 1, and a novel PI algorithm is developed to solve it. Its connection to satisficing controllers is also shown.

Remark 4. A feasible solution $V$ to (71) may not be the true cost function associated with $\overline{u}$. However, it is an upper bound on the actual cost.

III-C Robust Safe Satisficing Control with Performance Guarantee: A Novel Framework

While the controller designed by solving Problem 1 guarantees performance and robustness and belongs to the satisficing set that only concerns stability, it cannot assure safety. On the other hand, the control design based on the CBF satisfying (62) guarantees safety but might result in poor performance. To bring the best of both worlds together, in this section we aim to design robust stabilizing safe controllers that provide guaranteed performance within the volume of the certified safe area.

Problem 2 (Satisficing safe control design with performance guarantee)

Consider the system (1) with the performance function (5), (6). Find the value function that solves

\begin{aligned}&\underset{V,\delta}{\min}\int_{\mathcal{L}}V\,dx+k_{\delta}\delta^{2}\\ &\text{s.t.}\quad\overline{H}(V)\leq\delta\\ &\qquad\ \ \nabla h(x)f(x)+\nabla h(x)g(x)u-\nabla h(x)g(x)(\nabla h(x)g(x))^{T}+\varpi(h(x))\geq 0\end{aligned}   (79)

where $\overline{H}(V)$ is defined in (12), $\mathcal{L}$ is the safe region of interest in which the performance is expected to be improved, $k_{\delta}>0$ is a design parameter that trades off the system's aggressiveness toward performance against safety, and $\delta$ is a relaxation factor.

Note that the relaxation factor $\delta$ can be interpreted as the system's aspiration level for the performance, which shows how much performance is sacrificed when safety and performance cannot be satisfied together. This relaxation factor is minimized to retain as much performance as possible.

Lemma 1. The solution to Problem 2 belongs to the satisficing set (62).

Proof. Recalling (64), (65) and (66), one can see that the constraints in (79) can be transformed into (62) by selecting the suitable aspiration levels given in Theorem 3. Therefore, the search for the optimal solution is over the space of the satisficing set (62), and, thus, the solution to Problem 2 belongs to (62). $\square$

Theorem 5. The safe optimization problem (79) has a feasible solution.

Proof. Based on Theorem 3, a robust safe control policy $u$ exists by selecting a suitable aspiration level. Let us write this control policy as $u=u^{*}+u^{safe}$, where $u^{*}=-\frac{1}{2}R^{-1}((\nabla V^{*})^{T}(x)g(x))^{T}$ is the part of the control used to optimize the performance without concern for safety, as given in [37], and $u^{safe}$ is added to $u^{*}$ to guarantee safety. Now reformulate the HJB equation as follows:

\begin{aligned}\overline{H}(V^{*})&=q(x)+(\nabla V^{*})^{T}(x)f(x)-\frac{1}{4}(\nabla V^{*})^{T}(x)g(x)R^{-1}((\nabla V^{*})^{T}(x)g(x))^{T}+d_{\max}^{T}Rd_{\max}+\beta_{u}(x)\\ &=(\nabla V^{*})^{T}(x)(f(x)+g(x)u^{*})+\overline{r}(x,u^{*})\\ &=(\nabla V^{*})^{T}(x)(f(x)+g(x)u)+\overline{r}(x,u)+(u^{*})^{T}Ru^{*}-u^{T}Ru-(\nabla V^{*})^{T}(x)g(x)u^{safe}\\ &=(\nabla V^{*})^{T}(x)(f(x)+g(x)u)+\overline{r}(x,u)+(u^{*})^{T}Ru^{*}-u^{T}Ru+2(u^{*})^{T}Ru^{safe}\\ &=(\nabla V^{*})^{T}(x)(f(x)+g(x)u)+\overline{r}(x,u)+(u^{*})^{T}Ru^{*}-u^{T}Ru+2(u^{*})^{T}R(u-u^{*})\\ &=(\nabla V^{*})^{T}(x)(f(x)+g(x)u)+\overline{r}(x,u)-(u^{*})^{T}Ru^{*}-u^{T}Ru+2(u^{*})^{T}Ru\\ &=(\nabla V^{*})^{T}(x)(f(x)+g(x)u)+\overline{r}(x,u)-(u-u^{*})^{T}R(u-u^{*})\\ &=(\nabla V^{*})^{T}(x)(f(x)+g(x)u)+\overline{r}(x,u)-(u^{safe})^{T}Ru^{safe}\end{aligned}   (95)

where $\overline{r}$ is defined in (6) and (7). While $u^{*}$ is robustly stabilizing, if the robust safe control $u^{safe}$ is in conflict with robust stability, the overall control input $u$ might not be robustly stabilizing, i.e., $(\nabla V^{*})^{T}(f(x)+g(x)u)+\overline{r}(x,u)\leq 0$ might not be satisfied at some points. By choosing an appropriate slack variable $\delta$ to resolve the conflict between safety and stability, however, one has

\overline{H}(V^{*})-\delta=(\nabla V^{*})^{T}(x)(f(x)+g(x)u)+\overline{r}(x,u)-\delta\leq 0   (96)

for some $\delta$. On the other hand, since $u$ is safe, based on the converse CBF theorem [38], there exists a barrier certificate $h(x)$ satisfying

\nabla h(x)f(x)+\nabla h(x)g(x)u-\nabla h(x)g(x)(\nabla h(x)g(x))^{T}+\varpi(h(x))\geq 0.   (99)

\square

Assumption 4 [35]. There exists a smooth mapping $V^{0}:\mathbb{R}^{n}\to\mathbb{R}$ such that $V^{0}\in\mathbb{R}[x]_{2,2r}\cap P$ and $L(V^{0},u^{1})+\delta$ is SOS, where $L(V^{0},u^{1})=-\overline{H}(V^{0})$.

Solving optimization Problem 2 is non-trivial in general. If both the Bellman inequality and the CBF inequality constraints are restricted to SOS constraints, an SOS program can be used to significantly reduce the computational burden of finding a solution to this optimization problem. However, since $\overline{H}(V)$ is quadratic in $\nabla V$, it makes the optimization problem hard or even impossible to solve using SOS. Therefore, we propose a robust safe policy iteration algorithm that iterates on a Bellman inequality, which is linear in $V$, instead of directly solving $\overline{H}(V)\leq\delta$. Using this Bellman inequality, a policy evaluation step finds the value function $V^{i}$ corresponding to a robust safe control policy $u^{i}$, and a policy improvement step finds an improved policy $u^{i+1}$ whose safety is certified by adding the CBF inequality. We assume that an initial robust safe control policy $u^{0}$ is given, which can be found by a control policy that only satisfies the CBF without any concern about optimality.

To evaluate a given policy $u^{i}$, i.e., to find the value function $V^{i}$ corresponding to it, a Bellman inequality based on the modified reward function must be solved. This requires knowing $\beta_{u^{i}}(x)$ in the modified reward function, which in turn requires knowing a bound on ${u^{i}}_{\max}^{T}R\,{u^{i}}_{\max}$, as defined in Theorem 1. Therefore, before the policy evaluation step, we first find this bound and thus $\beta_{u^{i}}(x)$. Since $u^{i}$, and consequently $u^{i^{T}}Ru^{i}$, is polynomial, one can write $u^{i^{T}}Ru^{i}=c_{i}m_{i}^{x}$, where $m_{i}^{x}$ is the vector of monomials and $c_{i}$ is the vector of coefficients. To find ${u^{i}}_{\max}^{T}R\,{u^{i}}_{\max}=\max u^{i^{T}}Ru^{i}$, one can then solve the following optimization problem:

\begin{aligned}&{u^{i}}_{\max}^{T}R\,{u^{i}}_{\max}=\max\ c_{i}m_{i}^{x}\\ &\quad\text{s.t.}\quad x\in\mathcal{X}\end{aligned}   (103)

However, polynomial optimization is NP-hard and, instead, we obtain an upper bound for ${u^{i}}_{\max}^{T}R\,{u^{i}}_{\max}$ by solving

\begin{aligned}&L_{p}=\min\ \gamma\\ &\quad\text{s.t.}\quad\gamma-c_{i}m_{i}^{x}\geq 0\end{aligned}   (107)

which is an SOS optimization and can be efficiently solved. Note that ${u^{i}}_{\max}^{T}R\,{u^{i}}_{\max}\leq L_{p}$, and if we choose $\beta_{u^{i}}(x)=L_{p}$, then $\beta_{u^{i}}(x)\geq(u^{i}_{\max})^{T}R\,u^{i}_{\max}$, which satisfies the condition on the reward function in Theorem 1.
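As an illustration of how the bound (107) can be computed, the following sketch solves a miniature instance with CVXPY by matching coefficients against an explicit Gram matrix. The data ($u^{i}(x)=x-0.5x^{3}$, $R=1$) are our own; since $\gamma-c_{i}m_{i}^{x}\geq 0$ cannot hold globally for a coercive polynomial, the sketch restricts the bound to $\mathcal{X}=[-1,1]$ with an S-procedure multiplier, which we read as the intent of the constraint $x\in\mathcal{X}$ in (103).

```python
import cvxpy as cp
import numpy as np

# SOS upper bound for u(x)^T R u(x) over X = [-1, 1], in the spirit of (107),
# with an explicit S-procedure multiplier for the set constraint x in X.
# Illustrative data (our own choice): scalar u(x) = x - 0.5 x^3, R = 1, so
# p(x) = u(x)^2 = x^2 - x^4 + 0.25 x^6 (coefficients in increasing degree).
p = np.array([0, 0, 1, 0, -1, 0, 0.25])          # degrees 0..6

gamma = cp.Variable()
Q = cp.Variable((4, 4), PSD=True)    # Gram matrix, basis [1, x, x^2, x^3]
S = cp.Variable((3, 3), PSD=True)    # Gram matrix of the multiplier sigma(x)

# Coefficients of sigma(x) (degrees 0..4) from its Gram matrix S.
sigma = [S[0, 0], 2*S[0, 1], 2*S[0, 2] + S[1, 1], 2*S[1, 2], S[2, 2]]
sig = lambda k: sigma[k] if 0 <= k <= 4 else 0
# Coefficients of sigma(x) * (1 - x^2), degrees 0..6.
mult = [sig(k) - sig(k - 2) for k in range(7)]

def sos_coeff(Q, k):  # coefficient of x^k in z^T Q z, z = [1, x, x^2, x^3]
    return sum(Q[i, k - i] for i in range(4) if 0 <= k - i < 4)

# Match coefficients of: gamma - p(x) - sigma(x)(1 - x^2) = z^T Q z
constraints = [sos_coeff(Q, k) == (gamma if k == 0 else 0) - p[k] - mult[k]
               for k in range(7)]
cp.Problem(cp.Minimize(gamma), constraints).solve()
print(f"SOS bound L_p = {gamma.value:.4f}  (true max of p on [-1,1] ~ 0.2963)")
```

Based on this SOS bound, the following policy evaluation step is proposed.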

Safe policy evaluation step: Given a robust safe control policy $u^{i}$, find the bound for $u^{i}_{\max}$ using (107) and then find $V^{i}$ and $\delta_{i}$ that solve the following optimization problem:

\begin{aligned}&\underset{V^{i},\delta_{i}}{\min}\int_{\mathcal{L}}V^{i}dx+k_{\delta}\delta_{i}^{2}\\ &\text{s.t.}\quad L(V^{i},u^{i})=-(\nabla V^{i})^{T}(x)(f(x)+g(x)u^{i})-\overline{r}(x,u^{i})\geq-\delta_{i}\\ &\qquad\ \ V^{i-1}-V^{i}\geq 0\end{aligned}   (112)

In terms of SOS, this optimization problem is transformed into

\begin{aligned}&\underset{V^{i},\delta_{i}}{\min}\int_{\mathcal{L}}V^{i}dx+k_{\delta}\delta_{i}^{2}\\ &\text{s.t.}\quad L(V^{i},u^{i})+\delta_{i}\ \text{is SOS},\quad\forall x\in\mathcal{X}\\ &\qquad\ \ V^{i-1}-V^{i}\ \text{is SOS}\end{aligned}   (117)

where $V=p^{T}\overrightarrow{m}_{2,2r}(x)$ and $V^{i}=p_{i}^{T}\overrightarrow{m}_{2,2r}(x)$.

In the policy evaluation step (112), the value function corresponding to a given policy is found while minimizing the relaxation factor $\delta_{i}$. Note that since a robust safe control policy $u^{i}$ might not necessarily be robust stabilizing, $L(V^{i},u^{i})$ might not be positive semidefinite.
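The following sketch makes the mechanics of (117) concrete on a miniature scalar instance in which the SOS constraints reduce to sign conditions on polynomial coefficients. All data (dynamics, policy, $\beta_{u^{i}}$, $V^{i-1}$, $\mathcal{L}$, $k_{\delta}$) are our own illustrative choices; note how $\delta_{i}$ absorbs the constant robustness margin of $\overline{r}$ at the origin.

```python
import cvxpy as cp

# Miniature instance of the safe policy evaluation SOS program (117).
# Illustrative data (our own choice): f(x) = -x, g(x) = 1, u^i(x) = -0.5 x,
# q(x) = x^2, R = 1, d_max = 0.1, beta = 0.25 + d_max^2 = 0.26 (an upper
# bound of u^i(x)^2 on X = [-1,1] plus the disturbance term from (7)),
# V^{i-1}(x) = 2 x^2, performance region L = [-1, 1], k_delta = 1.
k, q2, R, beta = 0.5, 1.0, 1.0, 0.26
p_prev, k_delta = 2.0, 1.0

p = cp.Variable()        # V^i(x) = p x^2
delta = cp.Variable()    # relaxation factor delta_i

# L(V^i, u^i) + delta = (2p(1+k) - q2 - R k^2) x^2 + (delta - beta).
# For an even quadratic a x^2 + c, "is SOS" reduces to a >= 0 and c >= 0.
constraints = [
    2*p*(1 + k) - q2 - R*k**2 >= 0,   # x^2 coefficient of L + delta
    delta - beta >= 0,                # constant coefficient of L + delta
    p_prev - p >= 0,                  # V^{i-1} - V^i is SOS
    p >= 0,
]
# Objective: integral of V^i over L = [-1,1] plus k_delta * delta^2.
objective = cp.Minimize((2.0/3.0)*p + k_delta*cp.square(delta))
cp.Problem(objective, constraints).solve()
print(f"p = {p.value:.4f}, delta_i = {delta.value:.4f}")
# delta_i stays at beta = 0.26: the constant robustness margin in r_bar
# cannot be dominated at x = 0, so the relaxation absorbs it there.
```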

Remark 5. Instead of performing two SOS optimizations to evaluate a given policy (one for finding $u^{i}_{\max}$ at each step), one can regard every element of $u^{i}_{\max}=[u^{i}_{1},...,u^{i}_{m}]$ as a decision variable and incorporate it through the following optimization problem:

\begin{aligned}&\min\ u^{i}_{j_{\max}}\\ &\ \text{s.t.}\quad u^{i}_{j_{\max}}-u^{i}_{j}\ \text{is SOS}\end{aligned}   (121)

where $u^{i}_{j}$ is the $j$-th element of the improved safe policy $u^{i}$ found in the policy improvement step. Then, ${u^{i}}_{\max}^{T}R\,{u^{i}}_{\max}$ can be calculated since ${u^{i}}_{\max}=[u^{i}_{1_{\max}},...,u^{i}_{m_{\max}}]$. Alternatively, $u^{i}_{j_{\max}}$ can be defined as a polynomial which is to be minimized over a domain of interest. That is,

\begin{aligned}&\underset{u^{i}_{j_{\max}}}{\min}\int_{\mathcal{D}}u^{i}_{j_{\max}}\\ &\ \text{s.t.}\quad u^{i}_{j_{\max}}-u^{i}_{j}\ \text{is SOS}\end{aligned}   (125)

where $\mathcal{D}$ is the domain of interest in which $u^{i}_{\max}$ is to be minimized. Incorporating these extra SOS constraints into the policy evaluation step (112) removes the need to solve the separate SOS optimization (107).

Once a policy is evaluated, an improved control policy with ISSf certification is found. The following lemma shows how to find a safety-certified improved control policy.

Lemma 2 (Safe policy improvement). Let $u^{i}$ be a robust safe control policy with value function $V^{i}$. Then, an improved safety-certified control policy $u^{i+1}$ can be found by solving the following optimization problem:

\begin{aligned}&\underset{u^{safe},Z}{\min}\ (u^{safe})^{T}Ru^{safe}\\ &\text{s.t.}\quad u^{i+1}=u^{safe}-\frac{1}{2}R^{-1}((\nabla V^{i})^{T}(x)g(x))^{T}\\ &\qquad\ \ \nabla h(x)f(x)+\nabla h(x)g(x)u^{i+1}+Z\,h(x)-\nabla h(x)g(x)(\nabla h(x)g(x))^{T}\ \text{is SOS}\\ &\qquad\ \ Z\ \text{is SOS}\end{aligned}   (132)

Proof. To find an improved control policy, we use the stationarity condition [39] that minimizes the Bellman equation while satisfying the CBF. Note that the Bellman equation can be written as

\begin{aligned}L(V^{i},u^{i+1})&=-(\nabla V^{i})^{T}(x)(f(x)+g(x)u^{i+1})-\overline{r}(x,u^{i+1})\\ &=-(\nabla V^{i})^{T}(x)(f(x)+g(x)(u^{opt})^{i+1})+(u^{safe})^{T}Ru^{safe}-\overline{r}(x,(u^{opt})^{i+1})\end{aligned}   (137)

where $u^{i+1}=u^{opt}+u^{safe}$. Minimizing the term $-(\nabla V^{i})^{T}(x)(f(x)+g(x)(u^{opt})^{i+1})-\overline{r}(x,(u^{opt})^{i+1})$ using the stationarity condition results in $u^{opt}=-\frac{1}{2}R^{-1}((\nabla V^{i})^{T}(x)g(x))^{T}$. Therefore, minimizing $(u^{safe})^{T}Ru^{safe}$ as the second term while setting $u^{i+1}=u^{safe}-\frac{1}{2}R^{-1}((\nabla V^{i})^{T}(x)g(x))^{T}$ optimizes the performance. Since the control must certify the safety constraint, the CBF inequality must also be included. $\square$

Remark 6. In (117) and (132), $\delta$ and $u^{safe}$ are polynomials which can be written in the Square Matrix Representation (SMR) form $P^{T}(x)QP(x)$, where $P(x)$ is a vector of monomials and $Q$ is a symmetric coefficient matrix. To solve this optimization problem, we adopt a typical approach in the literature [40] and minimize $trace(Q)$ to obtain smaller $\delta$ and $u^{safe}$ in the objective functions of (117) and (132). The SOS program (132) involves bilinear decision variables. It can be solved efficiently by splitting it into several smaller SOS programs, as presented in [41].
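To make (132) concrete, the following sketch solves a miniature scalar instance in CVXPY. The dynamics, barrier, value function, and the parameterization of $u^{safe}$ and $Z$ are our own illustrative assumptions, and the quadratic coefficient objective is a simple surrogate for minimizing $(u^{safe})^{T}Ru^{safe}$ (cf. the SMR trace heuristic above).

```python
import cvxpy as cp

# Miniature instance of the safe policy improvement program (132).
# Illustrative data (our own choice): f(x) = -x, g(x) = 1, R = 1,
# V^i(x) = p x^2 with p = 0.4142, h(x) = 1 - x^2, u_safe(x) = s0 + s1 x,
# and a constant SOS multiplier Z. Writing out
#   dh/dx*f + dh/dx*g*u^{i+1} + Z*h - (dh/dx*g)^2
# for u^{i+1}(x) = u_safe(x) - p x gives the quadratic
#   Z - 2 s0 x + (2p - 2 - 2 s1 - Z) x^2,
# whose SOS test is a 2x2 PSD (here: second-order cone) condition.
p = 0.4142

s0, s1 = cp.Variable(), cp.Variable()
Z = cp.Variable(nonneg=True)             # Z is SOS (degree 0)

c0 = Z                                   # constant coefficient
c1 = -2 * s0                             # x coefficient
c2 = 2*p - 2 - 2*s1 - Z                  # x^2 coefficient
sos = cp.SOC(c0 + c2, cp.hstack([c0 - c2, c1]))  # [[c0,c1/2],[c1/2,c2]] >> 0

# Surrogate for min (u_safe)^T R u_safe on the coefficients of u_safe.
cp.Problem(cp.Minimize(s0**2 + s1**2), [sos]).solve()
print(f"u_safe(x) = {s0.value:.3f} + {s1.value:.3f} x,"
      f"  u^(i+1)(x) = {s0.value:.3f} + {s1.value - p:.3f} x")
```

In this instance the purely performance-driven part $-px$ alone does not certify the robust barrier condition near the boundary of the safe set, so a nonzero $u^{safe}$ is returned, which is exactly the role (132) assigns to it.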

Remark 7 [42]. To find a feasible $h(x)$ for performing the policy improvement step, one can solve the following optimization problem:

\begin{aligned}&\text{Find}\ h(x),\ \sigma_{1}(x)\ \text{and}\ \sigma_{2}(x)\\ &\text{s.t.}\quad h(x)-\varepsilon\sigma_{1}(x)x_{0}(x)\ \text{is SOS}\\ &\qquad-h(x)-\varepsilon\sigma_{2}(x)x_{u}(x)\ \text{is SOS}\\ &\qquad\ \sigma_{1}(x)\ \text{and}\ \sigma_{2}(x)\ \text{are SOS}\end{aligned}   (143)

where $x_{0}(x)\geq 0$ and $x_{u}(x)\geq 0$ are polynomial descriptions of the initial set $\mathcal{X}_{0}$ and the unsafe set $\mathcal{X}_{u}$, respectively. However, there might be multiple $h(x)$ solutions. By maximizing the margin $\varepsilon$ of the barrier certificate constraint, a good choice of $h(x)$ can be obtained. This method enlarges the feasible solution space of $u^{safe}$ in the subsequent policy improvement step, which speeds up the convergence of the optimization procedure [40, 41].
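As a sketch of the search (143), the following CVXPY snippet finds a quadratic barrier certificate for a scalar example with interval initial and unsafe sets, maximizing the margin $\varepsilon$ as suggested above. The sets, the degree of $h$, the constant multipliers, and the normalization $||a||\leq 1$ (needed to keep $\varepsilon$ bounded under scaling of $h$) are our own illustrative assumptions.

```python
import cvxpy as cp
import numpy as np

# Barrier certificate search in the spirit of Remark 7 / (143), scalar case.
# Illustrative sets (our own choice): initial set X_0 = [-0.5, 0.5] encoded by
# s0(x) = 0.25 - x^2 >= 0, unsafe set X_u = [1.5, 2.5] encoded by
# su(x) = (x - 1.5)(2.5 - x) = -x^2 + 4x - 3.75 >= 0. We search for a
# quadratic h(x) = a0 + a1 x + a2 x^2 with constant SOS multipliers and
# maximize the margin eps (h >= eps on X_0 and h <= -eps on X_u).
a = cp.Variable(3)                   # coefficients a0, a1, a2 of h
sig1 = cp.Variable(nonneg=True)
sig2 = cp.Variable(nonneg=True)
eps = cp.Variable()

def psd2(c0, c1, c2):
    # (c0 + c1 x + c2 x^2) is SOS  <=>  [[c0, c1/2], [c1/2, c2]] >> 0,
    # written as the second-order cone ||(c0 - c2, c1)|| <= c0 + c2.
    return cp.SOC(c0 + c2, cp.hstack([c0 - c2, c1]))

constraints = [
    # h - sig1*s0 - eps is SOS  (so h >= eps on X_0)
    psd2(a[0] - 0.25*sig1 - eps, a[1], a[2] + sig1),
    # -h - sig2*su - eps is SOS  (so h <= -eps on X_u)
    psd2(-a[0] + 3.75*sig2 - eps, -a[1] - 4*sig2, -a[2] + sig2),
    cp.norm(a) <= 1,                 # normalization to keep eps bounded
]
cp.Problem(cp.Maximize(eps), constraints).solve()
print("eps =", round(float(eps.value), 3), " h coeffs =", np.round(a.value, 3))
```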

Combining the safe policy evaluation step (117) with the safe policy improvement step (132), the following algorithm is presented to iteratively solve Problem 2.


Algorithm 1: Satisficing safe control design framework.

1: Initialize with $(V^{0},u^{1})$ satisfying Assumption 4 and set a sum of squares threshold variable $\epsilon$.
2: procedure $\forall\,i=1,2,...,N$
3:     Given $u^{i}$, let $V^{i}=p_{i}^{T}\overrightarrow{m}_{2,2r}(x)$, and calculate the value function $V^{i}$ and the relaxation variable $\delta_{i}$ using (117) through an SOS program. Then check whether $-(V^{i-1}-V^{i})+\epsilon$ is SOS. The algorithm stops if $-(V^{i-1}-V^{i})+\epsilon$ is SOS; otherwise go to Step 4.
4:     Search for an improved policy $u^{i+1}$ using (132). Then use $u^{i+1}$ in Step 3 to calculate a new value function.
5: end procedure

It should be noticed that in Step 3, together with the SOS constraint $V^{i-1}-V^{i}$ in (117), the condition that $-(V^{i-1}-V^{i})+\epsilon$ is SOS implies $|V^{i-1}-V^{i}|\leq\epsilon$. More specifically, Algorithm 1 terminates when the value function $V^{i}$ stops decreasing.
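The overall flow of Algorithm 1 can be sketched as follows. The two SOS programs (117) and (132) are stubbed with the closed-form answers of a miniature scalar instance (our own choice: $f(x)=-x$, $g=1$, $q(x)=x^{2}$, $R=1$, no active safety conflict so $u^{safe}=0$); in a real implementation both stubs are replaced by SOS solver calls.

```python
# Schematic skeleton of Algorithm 1 (SR-PI). Policy evaluation and policy
# improvement are stubs: for V^i = p_i x^2 and u^i = -k_i x, the constraint
# "L(V^i, u^i) + delta_i is SOS" reduces to 2 p (1 + k) >= 1 + k^2, and
# (132) with u_safe = 0 gives u^{i+1}(x) = -p_i x.
eps_stop = 1e-8

def policy_evaluation(k):
    # Stub for (117): smallest feasible p (assumes V^{i-1} - V^i stays SOS).
    return (1.0 + k**2) / (2.0 * (1.0 + k))

def policy_improvement(p):
    # Stub for (132): u^{i+1} = u_safe - (1/2) R^{-1} (dV^i/dx) g = -p x.
    return p

p_prev, k = 2.0, 0.5                  # initialization (V^0, u^1), Assumption 4
for i in range(1, 100):
    p = policy_evaluation(k)          # Step 3: evaluate u^i
    if abs(p_prev - p) < eps_stop:    # stopping test: -(V^{i-1}-V^i)+eps is SOS
        break
    p_prev, k = p, policy_improvement(p)   # Step 4: improve the policy
print(f"stopped at i = {i}: V*(x) ~ {p:.4f} x^2, u*(x) ~ -{k:.4f} x")
# The fixed point p = sqrt(2) - 1 matches the scalar HJB solution for
# f(x) = -x, g = 1, q = x^2, R = 1 (ignoring the constant robustness terms).
```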

Remark 8. The presented robust safe satisficing control scheme integrates a barrier certificate with a performance-driven Lyapunov function to assure safety while sacrificing as little performance as possible.

Theorem 6. Consider Assumptions 1-4 for the system (1). Then,

1) The policy evaluation step (112) has a nonempty feasible set.

2) The closed-loop system (1) with the controller $u^{i}$ derived after each safe policy iteration is robustly safe, and globally robust stability is guaranteed whenever safety and stability are not in conflict.

3) There exists a positive definite $V^{*}\in\mathbb{R}[x]_{2,2r}$ such that for any $x_{0}\in D$, $V^{*}(x_{0})\leq V^{i}(x_{0})$ holds. Besides, $\lim_{i\to\infty}V^{i}(x_{0})\to V^{*}(x_{0})$.

4) Along the solution of the system with $u^{*}(x)=-\frac{1}{2}R^{-1}((\nabla V^{*})^{T}(x)g(x))^{T}$, the following holds:

V^{*}(x_{0})+\int_{0}^{\infty}H(V^{*}(x(t)))dt\leq V^{o}(x_{0})   (144)

Proof:

1) The following mathematical induction steps are used to prove part 1.

i) Suppose i=1i=1. Then, under Assumption 4, L(V0,u1)+δL({{V}^{0}},{{u}^{{}_{1}}})+\delta is SOS. Therefore, V=V0V={{V}^{0}} is a feasible solution to (112).

ii) Assume now that V=Vj1V={{V}^{j-1}} is an optimal solution to the (112) with i=j1>1i=j-1>1. In the following, it is show that V=Vj1V={{V}^{j-1}} is then a feasible solution to the same problem with i=ji=j.

From the safe policy improvement step (132), by definition, ui=usafe12R1(Vi1(x)g(x))T{{u}^{i}}={{u}^{safe}}-\frac{1}{2}{{R}^{-1}}{{(\nabla{{V}^{i-1}}(x)g(x))}^{T}} and

L(Vj1,uj)+δ=(Vj1)T(f(x)+g(x)uj+g(x)ω)r¯(x,uj)+δ=L(Vj1,uj1)+δ(Vj1)Tg(x)(ujuj1)+(uj1)TRuj1+βuj1(x)βuj(x)(u)jTRuj=L(Vj1,uj1)+(uj1uj)TR(uj1uj)+δ+βuj1(x)βuj(x2(usafe)TR(ujuj1)\displaystyle\begin{gathered}L({{V}^{j-1}},{{u}^{{}_{j}}})+\delta=-{{(\nabla{{V}^{j-1}})}^{T}}(f(x)+g(x){{u}^{{}_{j}}}+g(x)\omega)-\bar{r}(x,{{u}^{{}_{j}}})+\delta\hfill\\ =L({{V}^{j-1}},{{u}^{{}_{j-1}}})+\delta-{{(\nabla{{V}^{j-1}})}^{T}}g(x)({{u}^{{}_{j}}}-{{u}^{{}_{j-1}}})+{{(u^{{}^{j-1}})}^{T}}Ru^{{}^{j-1}}+\beta_{u^{j-1}}(x)-\beta_{u^{j}}(x)-{{(u{{}^{j}})}^{T}}Ru^{{}^{j}}\hfill\\ =L({{V}^{j-1}},{{u}^{{}_{j-1}}})+{{({{u}^{{}_{j-1}}}-{{u}^{j}})}^{T}}R({{u}^{{}_{j-1}}}-{{u}^{j}})+\delta+\beta_{u^{j-1}}(x)-\beta_{u^{j}}(x-2{{({{u}^{safe}})}^{T}}R({{u}^{{}_{j}}}-{{u}^{{}_{j-1}}})\hfill\\ \end{gathered} (149)

Under the induction assumption, one has $V^{j-1}\in\mathbb{R}[x]_{2,2r}$ and $L(V^{j-1},u^{j-1})+\delta$ is SOS. By selecting a suitable relaxation variable $\delta$, the effect of the safe policy $u^{safe}$ on the positivity of (149) is eliminated. Hence, $L(V^{j-1},u^{j})+\delta$ is SOS. As a result, $V^{j-1}$ is a feasible solution to the SOS program (112) with $i=j$.
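The completing-the-square step behind the last equality in (149) is worth making explicit. From (132), $(\nabla V^{j-1})^{T}g(x)=-2(u^{j}-u^{safe})^{T}R$, so

```latex
% Algebra behind the last equality in (149), with R = R^T > 0:
\begin{aligned}
-(\nabla V^{j-1})^{T}g\,(u^{j}-u^{j-1})
  &= 2(u^{j}-u^{safe})^{T}R\,(u^{j}-u^{j-1})\\
  &= 2(u^{j})^{T}R(u^{j}-u^{j-1})-2(u^{safe})^{T}R(u^{j}-u^{j-1}),
\end{aligned}
```

and adding $(u^{j-1})^{T}Ru^{j-1}-(u^{j})^{T}Ru^{j}$ to the first term gives exactly $(u^{j-1}-u^{j})^{T}R(u^{j-1}-u^{j})$.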

2) We first show that $u^{i}$ obtained after each safe policy improvement is robustly safe. It can be seen from Algorithm 1 that $u=u^{opt}+u^{safe}$ and the following barrier certificate is satisfied.

\nabla h(x)f(x)+\nabla h(x)g(x)u^{i+1}+Z\,h(x)-\nabla h(x)g(x)\big(\nabla h(x)g(x)\big)^{T}\ \text{is SOS}   (153)

With an initial robust safe control policy u0{{u}^{0}}, the safety of the control policy can always be guaranteed in each iteration.
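Certifying (153) rigorously requires an SOS solver, but the polynomial to be certified is straightforward to assemble symbolically. The sketch below builds it with sympy for an illustrative scalar example and applies a grid-based positivity test, which is only a necessary (not sufficient) surrogate for the SOS certificate; the system data f, g, h, Z and the policy u here are placeholders, not the suspension model of Section IV:

```python
# Assemble the polynomial in (153),
#   p(x) = grad(h)*f + grad(h)*g*u + Z*h - (grad(h)*g)^2,
# and sample it as a sanity check before calling an SOS solver.
import numpy as np
import sympy as sp

x = sp.symbols('x')
f = -x + x**3 / 6      # drift (illustrative placeholder)
g = sp.Integer(1)      # input map (placeholder)
h = 1 - x**2           # barrier: safe set is {x : h(x) >= 0}
u = -2 * x             # candidate policy (placeholder)
Z = sp.Integer(4)      # gain multiplying h in (153)

dh = sp.diff(h, x)
p = sp.expand(dh * f + dh * g * u + Z * h - (dh * g) ** 2)
p_num = sp.lambdify(x, p, 'numpy')

grid = np.linspace(-1.0, 1.0, 2001)            # sample the safe set
print('min of p on grid:', p_num(grid).min())  # >= 0 is necessary evidence
```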

We now show global robust stability of the control solution when there is no conflict between stability and safety. In this case, $\delta=0$, since no relaxation variable is needed to guarantee feasibility once a suitable scalar $k_{\delta}$ is selected. The proof proceeds by induction.

i) Suppose $i=1$. Under Assumption 4, $u^{1}$ is globally robust stabilizing, and we also have $V^{0}\in P$. For each $x_{0}\in D$ with $x_{0}\neq 0$, we can obtain

V1(x0)0r¯(x,u1)𝑑t>0{{V}^{1}}({{x}_{0}})\geq\int\limits_{0}^{\infty}{\overline{r}(x,{{u}^{1}})dt}>0 (154)

Using this inequality and the constraint in (112), under Assumption 1 it follows that

VoV1V0{{V}^{o}}\leq{{V}^{{}_{1}}}\leq{{V}^{{}_{0}}} (155)

Since $V^{o}$ and $V^{0}$ are assumed to be positive definite, it follows that $V^{1}\in P$.

ii) Assume $u^{i-1}$ is globally robust stabilizing and $V^{i-1}\in P$ for $i\geq 2$. We now prove that $u^{i}$ is globally robust stabilizing and $V^{i}\in P$. Along the trajectory of system (1) with $u=u^{i}$, the following holds

V˙i1=(Vi1)T(f+g(ui+ω))=L(Vi1,ui)r¯(x,ui)0{{\dot{V}}^{i-1}}={{(\nabla{{V}^{i-1}})}^{T}}(f+g({{u}^{i}}+\omega))=-L({{V}^{i-1}},{{u}^{i}})-\overline{r}(x,{{u}^{i}})\leq 0 (156)

Therefore, ui{{u}^{i}} is globally robust stabilizing in this situation. Vi1{{V}^{i-1}} is a Lyapunov function for the system and we have

Vi(x0)0r¯(x,ui)𝑑t,x00.{{V}^{i}}({{x}_{0}})\geq\int\limits_{0}^{\infty}{\overline{r}(x,{{u}^{i}})dt},\,\,\,\,\,\forall{{x}_{0}}\neq 0. (157)

Similarly, the following inequalities hold

Vo(x0)Vi(x0)Vi1(x0){{V}^{o}}({{x}_{0}})\leq{{V}^{{}_{i}}}({{x}_{0}})\leq{{V}^{{}_{i-1}}}({{x}_{0}}) (158)

Since Vo{{V}^{o}} and Vi1{{V}^{{}_{i-1}}} are assumed to be positive definite, obviously ViP{{V}^{{}_{i}}}\in P.

3) By 2), the sequence $\{V^{i}(x)\}_{i=0}^{\infty}$ is monotonically decreasing and bounded below by $0$, owing to positive definiteness. Therefore, there exists a limit $V^{*}(x)$ such that $\lim_{i\to\infty}V^{i}(x)=V^{*}(x)$. Let $\{p_{i}\}_{i=0}^{\infty}$ be the coefficient sequence of $\{V^{i}(x)\}_{i=0}^{\infty}$ with $V^{i}=p_{i}^{T}\overrightarrow{m}_{2,2r}(x)$, so that $\lim_{i\to\infty}p_{i}=p^{*}\in\mathbb{R}^{n_{2r}}$ and $V^{*}=(p^{*})^{T}\overrightarrow{m}_{2,2r}(x)$. Similarly, it can be shown that $V^{o}(x)\leq V^{*}(x)\leq V^{0}(x)$. Since $V^{o}$ is positive definite and $V^{*}\geq V^{o}$, $V^{*}\in\mathbb{R}[x]_{2,2r}$ and is positive definite.

4) By 3), we know that

H¯(V)=L(V,u)0.\overline{H}(V^{*})=-L(V^{*},u^{*})\leq 0. (159)

Hence, $V^{*}$ is a solution to Problem 1, and the inequality in 4) follows from the fourth property of Theorem 5. This completes the proof. $\square$

IV SIMULATION RESULTS

In this section, the proposed safe optimal control algorithm is applied to a car suspension system, modeled as

\begin{bmatrix}\dot{x}_{1}\\ \dot{x}_{2}\\ \dot{x}_{3}\\ \dot{x}_{4}\end{bmatrix}=\begin{bmatrix}x_{2}\\ \dfrac{K_{s}(x_{3}-x_{1})-K_{n}(x_{1}-x_{3})^{3}+B_{s}(x_{4}-x_{2})+cu}{M_{b}}\\ x_{4}\\ \dfrac{K_{s}(x_{1}-x_{3})+K_{n}(x_{1}-x_{3})^{3}+B_{s}(x_{2}-x_{4})-K_{t}x_{3}-cu}{M_{w}}\end{bmatrix}   (169)

where $x_{1}$ and $x_{2}$ are the position and velocity of the car body, and $x_{3}$ and $x_{4}$ are the position and velocity of the wheel assembly. $M_{b}$ and $M_{w}$ denote the masses of the car body and the wheel assembly, respectively. $K_{t}$, $K_{s}$, and $K_{n}$ are the tire stiffness, the linear suspension stiffness, and the nonlinear suspension stiffness, respectively, $B_{s}$ is the damping rate of the suspension, and $c$ is a constant gain relating the control signal to the input force. In this experiment, the parameters of the system are set to $M_{b}=300\,kg$, $M_{w}=60\,kg$, $B_{s}=1000\,Ns/m$, $K_{s}=16000\,N/m$, $K_{t}=190000\,N/m$, $K_{n}=1600\,N/m$. The safe region $\mathcal{X}_{o}$ is defined as

𝒳o={x|x4,20x425}.\displaystyle\begin{gathered}\mathcal{X}_{o}=\{x|x\in{{\mathbb{R}}^{4}},\,-20\leq x_{4}\leq 25\}.\end{gathered} (171)

The parameters in the performance index are set to $q(x)=100x_{1}^{2}+x_{2}^{2}+x_{3}^{2}+x_{4}^{2}$ and $R=1$. We are interested in the behavior of the system in the following set

\Theta=\{x\,|\,x\in\mathbb{R}^{4},\ |x_{1}|,|x_{3}|\leq 0.5,\ |x_{2}|,|x_{4}|\leq 10\}.   (172)
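Before discussing the results, the experiment is easy to reproduce in simulation. The sketch below integrates the model (169) with the stated parameters and checks the safety constraint (171) on $x_{4}$; the input gain $c$, the zero (passive) control placeholder, and the initial state are illustrative assumptions, and the learned SOS-based satisficing policy would be substituted for u_fn:

```python
# Minimal simulation of the suspension model (169) plus the safety
# check (171). The control is a placeholder (passive suspension).
import numpy as np
from scipy.integrate import solve_ivp

Mb, Mw = 300.0, 60.0                     # body / wheel masses [kg]
Bs = 1000.0                              # suspension damping [N s/m]
Ks, Kt, Kn = 16000.0, 190000.0, 1600.0   # stiffnesses [N/m]
c = 1.0                                  # input-force gain (assumed)

u_fn = lambda xs: 0.0                    # placeholder for the SOS policy

def suspension(t, xs):
    x1, x2, x3, x4 = xs
    u = u_fn(xs)
    # Suspension force on the car body (equal and opposite on the wheel).
    Fs = Ks * (x1 - x3) + Kn * (x1 - x3) ** 3 + Bs * (x2 - x4)
    return [x2, (-Fs + c * u) / Mb, x4, (Fs - Kt * x3 - c * u) / Mw]

sol = solve_ivp(suspension, (0.0, 5.0), [0.2, 0.0, 0.05, 0.0], max_step=1e-3)

x4 = sol.y[3]                            # wheel-assembly velocity
print('x4 in [%.2f, %.2f]; safe: %s'
      % (x4.min(), x4.max(), bool(x4.min() >= -20 and x4.max() <= 25)))
```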

The state trajectories under the proposed safe optimal controller and in the uncontrolled case are shown in Fig. 2, while the value function and the safe optimal control signal are visualized in Fig. 3 and Fig. 4, respectively. According to Fig. 3, the sequence of value functions evaluated by (112) is monotonically decreasing and reaches a much smaller value than the initial one after 10 iterations. Besides, it can be clearly observed that the trajectory of $x_{4}$ under the proposed algorithm does not violate the safety constraint (171).

Figure 2: The system trajectories of car suspension system under the proposed safe optimal control design.
Figure 3: The 3D plot of value function of car suspension system for the initial control policy at the first iteration and the final control policy found after iteration 10.
Figure 4: The control signal for the proposed safe optimal control when applied to the suspension system.

To analyze and verify the effectiveness of the proposed algorithm, two comparison experiments are conducted in the following subsections. Since optimal control of safety-critical systems is the concern of this paper, both satisfaction of the safety constraints (i.e., staying in the safe region for all time when starting from the safe set) and optimality of the proposed control policy are investigated.

IV-A Safety Verification of the Proposed Algorithm

We now compare our results with the optimal control algorithm presented in [35]. Applying the algorithm in [35] to the car suspension system (169) results in the system performance shown in Fig. 5 and the control signal shown in Fig. 6. From Fig. 5, it can be seen that the trajectory of $x_{4}$ controlled by [35] violates the safety constraint (171). In contrast, the proposed safe optimal control policy does not escape the safe set (171). Therefore, the proposed algorithm guarantees safety, while the algorithm presented in [35] violates safety in this simulation case.

Figure 5: Comparison of the trajectories of the suspension system under the proposed safe optimal control design and the optimal control design without safety guarantee presented in [35].
Figure 6: Comparison of the control signal of the suspension system under the proposed safe optimal control design and the optimal control design without safety guarantee presented in [35].

IV-B Optimality Verification of the Proposed Algorithm

We now compare our proposed safe optimal control design method with the safe control design method presented in [41]. The method in [41] only considers safety and stability and does not incorporate any long-horizon optimality in the control design phase. The simulation results for the suspension system under the controller designed without cost optimization in [41] are shown in Fig. 7. It can be clearly observed from Fig. 7 that neither design violates the safety constraint (171) (represented by the blue dashed line). However, as shown in Fig. 8, the value function corresponding to the proposed safe optimal controller is much smaller than the value function computed with the same reward function for the safe controller of [41]. This clearly shows that the proposed approach outperforms standard safe control design approaches.

Figure 7: Comparison of the trajectories of car suspension system under the proposed safe optimal control and safe control design without optimality consideration.
Figure 8: Comparison of the value function of the system under the proposed safe optimal control and safe control design without optimality consideration.

V CONCLUSION AND FUTURE WORK

In this paper, the problems of safe control and optimal control are investigated simultaneously. The optimal solution is derived by solving the Bellman inequality using SOS programming. The obtained result is then verified against a barrier certificate to guarantee the safety of the system. A relaxation variable is added to handle conflicts between safety and stability. The final controller obtained from the proposed method is not necessarily optimal but assures the safety of the system with guaranteed performance, and is thus called a satisficing solution. Numerical simulations on a car suspension system illustrate the effectiveness of the proposed algorithm. Possible extensions of the presented work to the tracking problem, event-triggered systems, control of systems with unmatched disturbances, and output regulation will be explored in the future.

Appendix A PROOF OF THEOREM 4

1) Define $u_{0}(x)=-\frac{1}{2}R^{-1}\big(\nabla V_{0}^{T}(x)g(x)\big)^{T}$. Since (44) holds, then

\overline{H}(V_{0})=\nabla V_{0}^{T}(f(x)+g(x)u_{0}+w)+\overline{r}(x,u_{0})
=\nabla V_{0}^{T}(f(x)+g(x)u_{1}+w)+\overline{r}(x,u_{1})+\nabla V_{0}^{T}g(x)(u_{0}-u_{1})+u_{0}^{T}Ru_{0}-u_{1}^{T}Ru_{1}+\beta_{u_{0}}(x)-\beta_{u_{1}}(x)
=\nabla V_{0}^{T}(f(x)+g(x)u_{1}+w)+\overline{r}(x,u_{1})-(u_{0}-u_{1})^{T}R(u_{0}-u_{1})+\beta_{u_{0}}(x)-\beta_{u_{1}}(x)
\leq 0   (179)

Therefore, $V_{0}$ is a feasible solution to (71). Now, let us prove that the solution to Problem 1 is unique.

2) Consider system (1) with control policy (72). Along the solutions of the closed-loop system, it follows that

\dot{V}=\nabla V^{T}(x)\big(f(x)+g(x)\overline{u}+g(x)\omega(t)\big)=\nabla V^{T}(x)f(x)+\nabla V^{T}(x)g(x)\overline{u}+\nabla V^{T}(x)g(x)\omega(t)   (183)

Since Problem 1 optimizes over all value functions satisfying $\overline{H}(V)\leq 0$, applying the Hamiltonian $\overline{H}(V)=\nabla V^{T}(x)(f(x)+g(x)u)+\overline{r}(x,u)$ of (12), one has

\dot{V}\leq-\overline{r}(x,\overline{u})+\nabla V^{T}(x)g(x)d
=-q(x)-\overline{u}^{T}R\overline{u}-d_{max}^{T}Rd_{max}-\beta_{u}(x)+\nabla V^{T}(x)g(x)d
=-2\overline{u}^{T}Rd-d_{max}^{T}Rd_{max}-q(x)-\overline{u}^{T}R\overline{u}-\beta_{u}(x)
=-d_{max}^{T}Rd_{max}-q(x)+d^{T}Rd-(d+\overline{u})^{T}R(d+\overline{u})-\beta_{u}(x)
\leq-d_{max}^{T}Rd_{max}-q(x)+d^{T}Rd-\beta_{u}(x)
\leq-q(x)-\beta_{u}(x)\leq-q(x)-\overline{u}^{T}R\overline{u}<0   (191)
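The fourth line of (191) is again a completing-the-square step; for completeness, with $R=R^{T}>0$:

```latex
% Completing the square used in the fourth line of (191):
\begin{aligned}
(d+\overline{u})^{T}R(d+\overline{u})
  &= d^{T}Rd+2\overline{u}^{T}Rd+\overline{u}^{T}R\overline{u}\\
\Longrightarrow\;
-2\overline{u}^{T}Rd-\overline{u}^{T}R\overline{u}
  &= d^{T}Rd-(d+\overline{u})^{T}R(d+\overline{u}).
\end{aligned}
```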

From (191), it is easy to see that $V$ is a well-defined Lyapunov function for the closed-loop system. Therefore, $\overline{u}$ is globally robustly stabilizing. When there is no conflict, selecting $\delta=0$ and $b(x)=1$, it is clear that the solution belongs to (20) with $p_{s}(u,s)=-(\nabla V)^{T}(x)\big(f(x)+g(x)u_{0}\big)$ and $p_{r}(u,s)=r(x,u_{0})$.

3) Next, we show that the performance index (3) has an upper bound. Integrating both sides of inequality (191) over the interval $[0,T]$, we derive:

V(x0)V(x(T))0T(q(x)+uTRu)𝑑tV({{x}_{0}})-V(x(T))\geq\int\limits_{0}^{T}{(q\left(x\right)+{{u}^{T}}Ru})dt (192)

Since $V$ is a well-defined Lyapunov function for the closed-loop system (1), we have $V(x(T))\to 0$ as $T\to\infty$. Thus, (192) yields

V(x0)0(q(x)+uTRu)𝑑tV({{x}_{0}})\geq\int\limits_{0}^{\infty}{(q\left(x\right)+{{u}^{T}}Ru})dt (193)

which implies that $J(x_{0},u)\leq V(x_{0})$ for all small bounded disturbances $d$. This shows that the performance index (3) has the upper bound $V(x_{0})$. Moreover, along the trajectory of the nominal system $\dot{x}=f(x)+g(x)u$, we have

V˙=VT(x)(f(x)+g(x)u)q(x)uTRuβu(x)dmaxTRdmax\dot{V}=\nabla{{V}^{T}}\left(x\right)\left(f\left(x\right)+g\left(x\right)u\right)\leq-q\left(x\right)-{{u}^{T}}Ru-\beta_{u}(x)-d_{max}^{T}Rd_{max} (194)

Integrating both sides of (194) over [0,T][0,T] yields

V(x0)V(x(T))0T(q(x)+uTRu+βu(x)+dmaxTRdmax)𝑑tV({{x}_{0}})-V(x(T))\geq\int\limits_{0}^{T}{(q\left(x\right)+{{u}^{T}}Ru}+\beta_{u}(x)+d_{max}^{T}Rd_{max})dt (195)

Letting TT\to\infty, we obtain J¯(x0,u)V(x0)\overline{J}\left({{x}_{0}},u\right)\leq V({{x}_{0}}).

4) By 3), we know

V(x_{0})\geq\bar{J}(x_{0},\overline{u})\geq\min_{u}\,\bar{J}(x_{0},u)=V^{o}(x_{0})   (196)

Therefore, the second inequality in (73) is proved. Besides,

\overline{H}(V)=\overline{H}(V)-\overline{H}(V^{o})
=(\nabla V-\nabla V^{o})^{T}(f+g\overline{u})+\overline{r}(x,\overline{u})-(\nabla V^{o})^{T}g(u^{o}-\overline{u})-\overline{r}(x,u^{o})
=(\nabla V-\nabla V^{o})^{T}(f+g\overline{u})+(\overline{u}-u^{o})^{T}R(\overline{u}-u^{o})+\beta_{\overline{u}}(x)-\beta_{u^{o}}(x)
\geq(\nabla V-\nabla V^{o})^{T}(f+g\overline{u})   (202)

Integrating the above inequality along the solutions of the closed-loop system (1) with control policy (72) over the interval $[0,\infty)$, we derive

V(x0)Vo(x0)0H¯(V(x(t)))𝑑tV({{x}_{0}})-{{V}^{o}}({{x}_{0}})\leq-\int\limits_{0}^{\infty}{\bar{H}(V(x(t)))dt} (203)

5) By 3), for any feasible solution VV to (71), we have V(x)Vo(x)V(x)\geq{{V}^{o}}(x). Therefore

ΩVo(x)dxΩV(x)dx\mathop{\int}_{\Omega}{{V}^{o}}\left(x\right)dx\leq\mathop{\int}_{\Omega}V\left(x\right)dx (204)

which shows that $V^{o}$ is the global robust optimal solution to (71).

The proof is complete. \square

References

  • [1] A. D. Ames, X. Xu, J. W. Grizzle and P. Tabuada, “Control Barrier Function Based Quadratic Programs for Safety Critical Systems,” IEEE Transactions on Automatic Control, vol. 62, no. 8, pp. 3861-3876, Aug. 2017.
  • [2] L. E. G. Martins and T. Gorschek, “Requirements Engineering for Safety-Critical Systems: Overview and Challenges,” IEEE Software, vol. 34, no. 4, pp. 49-57, 2017.
  • [3] K. G. Vamvoudakis, D. Vrabie and F. L. Lewis, “Online adaptive algorithm for optimal control with integral reinforcement learning,” International Journal of Robust and Nonlinear Control, vol. 24, no. 17, pp. 2686-2710, Oct. 2014.
  • [4] R. M. Kretchmar, P. M. Young, C. W. Anderson, D. C. Hittle, M. L. Anderson and C. C. Delnero, “Robust reinforcement learning control with static and dynamic stability,” International Journal of Robust and Nonlinear Control, vol. 11, no. 15, pp. 1469-1500, Oct. 2001.
  • [5] W. M. McEneaney, “Max-plus methods for nonlinear control and estimation,” Springer Science and Business Media, 2006.
  • [6] W. H. Fleming, and W. M. McEneaney. “A Max-Plus-Based Algorithm for a Hamilton–Jacobi–Bellman Equation of Nonlinear Filtering,” SIAM Journal on Control and Optimization, vol. 38, no. 3, pp. 683-710, 2000.
  • [7] X. Xu, J. W. Grizzle, P. Tabuada and A. D. Ames, “Correctness Guarantees for the Composition of Lane Keeping and Adaptive Cruise Control,” IEEE Transactions on Automation Science and Engineering, vol. 15, no. 3, pp. 1216-1229, July 2018.
  • [8] S. Prajna, A. Jadbabaie, “Safety verification of hybrid systems using barrier certificates,” International Workshop on Hybrid Systems: Computation and Control, pp. 477-492, Mar. 2004
  • [9] Q. Nguyen and K. Sreenath, “Exponential Control Barrier Functions for enforcing high relative-degree safety-critical constraints,” in Proc. of American Control Conference, pp. 322-328, 2016.
  • [10] O. Bokanowski, N. Forcadel and H. Zidani, “Reachability and minimal times for state constrained nonlinear problems without any controllability assumption,” SIAM Journal on Control and Optimization, vol. 48, no. 7, pp. 4292-4316, 2010.
  • [11] I. M. Mitchell, A. M. Bayen and C. J. Tomlin, “A time-dependent Hamilton-Jacobi formulation of reachable sets for continuous dynamic games,” IEEE Transactions on Automatic Control, vol. 50, no. 7, pp. 947-957, July. 2005.
  • [12] J. Ding, J. Sprinkle, S. S. Sastry and C. J. Tomlin, “Reachability calculations for automated aerial refueling,” In Proceedings of IEEE Conference on Decision and Control, pp. 3706-3712, 2008.
  • [13] A. Bemporad, “Reference governor for constrained nonlinear systems,” IEEE Transactions on Automatic Control, vol. 43, no. 3, pp. 415-419, March. 1998.
  • [14] E. G. Gilbert and I. Kolmanovsky, “A generalized reference governor for nonlinear systems,” In Proceedings of IEEE Conference on Decision and Control, pp. 4222-4227, 2001.
  • [15] F. Borrelli, A. Bemporad, M. Fodor and D. Hrovat, “An MPC/hybrid system approach to traction control,” IEEE Transactions on Control Systems Technology, vol. 14, no. 3, pp. 541-552, May. 2006.
  • [16] S. Richter, C. N. Jones and M. Morari, “Computational Complexity Certification for Real-Time MPC With Input Constraints Based on the Fast Gradient Method,” IEEE Transactions on Automatic Control, vol. 57, no. 6, pp. 1391-1403, June. 2012.
  • [17] H. Simon and J. March, “Administrative behavior organization,” New York: Free Press, 1976.
  • [18] J. W. Curtis and R. W. Beard, “Satisficing: a new approach to constructive nonlinear control,” IEEE Transactions on Automatic Control, vol. 49, no. 7, pp. 1090-1102, July. 2004.
  • [19] M. A. Goodrich, W. C. Stirling and R. L. Frost, “A theory of satisficing decisions and control,” IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans vol. 28, no. 6, pp. 763-779, Nov. 1998.
  • [20] H. K. Khalil, “Nonlinear Systems, 3 ed.” Upper Saddle River, NJ: Prentice Hall, 2002.
  • [21] S. Prajna, A. Papachristodoulou and P. A. Parrilo, “Introducing SOSTOOLS: a general purpose sum of squares programming solver,” In Proceedings of IEEE Conference on Decision and Control, pp. 741-746, 2002.
  • [22] C. Mu and Y. Zhang, “Learning-Based Robust Tracking Control of Quadrotor With Time-Varying and Coupling Uncertainties,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 1, pp. 259-273, 2020.
  • [23] C. Mu, Y. Zhang, Z. Gao and C. Sun, “ADP-Based Robust Tracking Control for a Class of Nonlinear Systems With Unmatched Uncertainties,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2019.
  • [24] D. Wang, C. Mu, H. He and D. Liu, “Event-Driven Adaptive Robust Control of Nonlinear Systems With Uncertainties Through NDP Strategy,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 7, pp. 1358-1370, 2017.
  • [25] F. L. Lewis, D. Vrabie and V. L. Syrmos, Optimal control, John Wiley and Sons, 2012.
  • [26] M. Liang, D. Wang and D. Liu, “Neuro-Optimal Control for Discrete Stochastic Processes via a Novel Policy Iteration Algorithm,” IEEE Transactions on Systems, Man, and Cybernetics: Systems. 2019
  • [27] D. Wang, D. Liu and H. Li, “Policy Iteration Algorithm for Online Design of Robust Control for a Class of Continuous-Time Nonlinear Systems,” IEEE Transactions on Automation Science and Engineering, vol. 11, no. 2, pp. 627-632, April. 2014.
  • [28] F. L. Lewis and D. Vrabie, “Reinforcement learning and adaptive dynamic programming for feedback control,” IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32-50, Third Quarter 2009.
  • [29] D. Xu, Q. Wang, Y. Li, “Optimal Guaranteed Cost Tracking of Uncertain Nonlinear Systems Using Adaptive Dynamic Programming with Concurrent Learning,” International Journal of Control, Automation and Systems, pp. 1-12, 2019.
  • [30] M. Abu-Khalaf and F. L. Lewis, “Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach,” Automatica, vol. 41, no. 5, pp. 779-791, 2005.
  • [31] S. Kolathaya and A. D. Ames, “Input-to-State Safety With Control Barrier Functions,” IEEE Control Systems Letters, vol. 3, no. 1, pp. 108-113, Jan. 2019.
  • [32] A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath and P. Tabuada, “Control Barrier Functions: Theory and Applications,” European Control Conference , pp. 3420-3431, 2019.
  • [33] D. P. De Farias and B. Van Roy, “The linear programming approach to approximate dynamic programming,” Operations research , vol. 51, no. 6, pp. 850-865, 2003.
  • [34] D. P. De Farias and B. Van Roy, “On constraint sampling in the linear programming approach to approximate dynamic programming,” Mathematics of operations research , vol. 29, no. 3, pp. 462-478, 2004.
  • [35] Y. Jiang and Z. Jiang, “Global Adaptive Dynamic Programming for Continuous-Time Nonlinear Systems,” IEEE Transactions on Automatic Control , vol. 60, no. 11, pp. 2917-2929, Nov. 2015.
  • [36] P. N. Beuchat, A. Georghiou and J. Lygeros, “Performance Guarantees for Model-Based Approximate Dynamic Programming in Continuous Spaces,” IEEE Transactions on Automatic Control, vol. 65, no. 1, pp. 143-158, Jan. 2020.
  • [37] F. L. Lewis, D. Vrabie and K. G. Vamvoudakis, “Reinforcement Learning and Feedback Control: Using Natural Decision Methods to Design Optimal Adaptive Controllers,” IEEE Control Systems Magazine, vol. 32, no. 6, pp. 76-105, Dec. 2012.
  • [38] R. Konda, A. D. Ames and S. Coogan, “Characterizing safety: Minimal barrier functions from scalar comparison systems,” arXiv preprint arXiv:1908.09323, May 2019.
  • [39] D. P. Bertsekas, “Dynamic Programming and Optimal Control.” Belmont, MA, USA: Athena Scientific, 2007.
  • [40] G. Chesi, “Domain of attraction: analysis and control via SOS programming.” Springer Science and Business Media, 2011.
  • [41] L. Wang, D. Han and M. Egerstedt, “Permissive Barrier Certificates for Safe Stabilization Using Sum-of-squares,” in Proc. of American Control Conference, pp. 585-590, 2018.
  • [42] S. Prajna, A. Papachristodoulou, P. Seiler and P. A. Parrilo, “Positive polynomials in control,” Springer, 2005.