Controller Design via Experimental Exploration with Robustness Guarantees

Tobias Holicki, Carsten W. Scherer and Sebastian Trimpe This project has been funded in part by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy -EXC 2075 -390740016 and in part by the Cyber Valley Initiative, which is gratefully acknowledged by the authors. Tobias Holicki and Carsten W. Scherer are with the Department of Mathematics, University of Stuttgart, Pfaffenwaldring 5a, 70569 Stuttgart, Germany (email: {tobias.holicki, carsten.scherer}@imng.uni-stuttgart.de)Sebastian Trimpe is with the Institute for Data Science in Mechanical Engineering, RWTH Aachen University, Germany, and also with the Intelligent Control Systems Group, Max Planck Institute for Intelligent Systems, Stuttgart, Germany (email: [email protected])

Abstract

For a partially unknown linear systems, we present a systematic control design approach based on generated data from measurements of closed-loop experiments with suitable test controllers. These experiments are used to improve the achieved performance and to reduce the uncertainty about the unknown parts of the system. This is achieved through a parametrization of auspicious controllers with convex relaxation techniques from robust control, which guarantees that their implementation on the unknown plant is safe. This approach permits to systematically incorporate available prior knowledge about the system by employing the framework of linear fractional representations.

Index Terms:

Experimental exploration, robust controller design, linear matrix inequalities.

{textblock}

13.1(1, 15.25) ©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. DOI: 10.1109/LCSYS.2020.3004506

I INTRODUCTION

Recently, learning and data-based control design approaches have received a lot of attention even for linear systems [FerUme20, BocMat18, FazGe19, ZhaHu20, BerSch16, MarHen17]. These approaches can often be subsumed under the broad framework of reinforcement learning [KobBag13], but are still rather diverse [MatPro19]. In [FerUme20] robust control is combined with a dual design strategy that is used for exploring the closed-loop behavior, while [BocMat18] employs the system level synthesis framework with an identification step followed by a robust design and an end-to-end analysis. The approaches in [FazGe19, ZhaHu20] are based on policy gradient methods, while [BerSch16, MarHen17] rely on Bayesian optimization strategies involving Gaussian processes for tuning the controller parameters. The latter strategies turned out to be very efficient for various applications, in particular, in robotics [DriEng17, CalSey15, RohTri18].

Bayesian optimization and other direct sampling methods aim to synthesize optimal controllers based on measurements of a closed-loop cost function involving an unknown system $P_{0}$ to which suitable test controllers are applied [BerSch16, MarHen17, DriEng17, CalSey15, RohTri18]. While these methods have successfully been used in practice, several aspects are subject to current research:

•

A critical issue is safety which means here (and in contrast to the many other interpretations as, e.g., in [DeaTu19]) that the implemented controllers are guaranteed to stabilize the unknown plant $P_{0}$ [BerSch16, TurKra19]. Such guarantees are not often provided in learning control, which might lead to catastrophic outcomes due to closed-loop instability during the tuning process. To this end, a safe threshold on the cost is introduced in [BerSch16] as an indicator for stability, while [TurKra19] incorporates a robustness objective in terms of classical delay and gain margins.
•

The choice of a suitable parametrization of test controllers is another important issue which aims to keep the evaluations of the cost small even if the set of admissible controllers is large [MarHen16, RobMan11, BanCal17]. In [RobMan11], several naive parametrizations are illustrated and one based on the Youla parametrization is studied. In [MarHen16], the controller candidates are parametrized in terms of the weights in an LQ design for a given nominal system.
•

Different ways to incorporate prior knowledge is another topic of tremendous importance in these approaches [MarHen16, MarHen17, BocMat18, KobBag13, BerSch15, RohTri18]. A linearization of the underlying nonlinear system is used in [MarHen16] for the construction of a parametrization. In [MarHen17], prior knowledge is used for the design of specialized kernels that outperform standard ones, while [RohTri18] discusses how to choose hyperparameters from some simulation model.

In this paper, we propose a systematic parametrization of controllers based on modeling, analysis and design techniques from robust control that can be used for controller tuning/sampling and addresses all of the above concerns at the same time.

We assume that $P_{0}$ is only partially unknown and employ the linear fractional representation (LFR) framework in order to separate known from unknown (or difficult) components. Such representations are well-established and flexible modeling tools in robust control [ZhoDoy96, Sch01a], but they are not often used in learning control. In particular, LFRs allow for expressing $P_{0}$ as feedback interconnection of a known linear system $P$ and some unknown or uncertain component $\Delta\in\mathbf{\Delta}$ ; the set $\mathbf{\Delta}$ captures, e.g., crude guesses on parameter ranges. Prior knowledge is thus encoded in the choices of $P$ and $\mathbf{\Delta}$ . Dedicated robust design techniques then allow the synthesis of controllers that stabilize the uncertain interconnection and, hence, are guaranteed to stabilize the unknown $P_{0}$ ; these techniques ensure safety. In this initial work, we assume that the uncertain component is parametric and construct a parametrization based on a partition of the set $\mathbf{\Delta}=\bigcup\mathbf{\Delta}_{k}$ . The main idea is to use controllers as obtained from a robust multi-objective design problem with guaranteed stability and performance on $\mathbf{\Delta}$ and $\mathbf{\Delta}_{k}$ , respectively.

Outline. The remainder of the paper is organized as follows. After a short paragraph on notation, we specify the considered learning control problem and discuss its essential ingredients. Next we propose a systematic parametrization of robust controllers for safely and exploratively evaluating the underlying closed-loop cost function. We elaborate on the properties of this parametrization and demonstrate its benefits on some numerical examples.

Notation. We use the star product “ $\star$ ” and all rules for linear fractional transformations (LFTs) as in [ZhoDoy96, Chapter 10]. Objects that can be inferred by symmetry or are not relevant are indicated by “ $\bullet$ ”.

II SETTING

II-A Problem Formulation

We assume that we are given an unknown real system $P_{0}$ described as

\left(\begin{array}[]{@{}c@{}}e\\ y\end{array}\right)=P_{0}\left(\begin{array}[]{@{}c@{}}d\\ u\end{array}\right).

(1)

Here $e$ is the controlled output and $d$ is a generalized disturbance (both used to formulate performance specifications), while $y$ is the measurement output and $u$ the control input. The underlying control problem is to find a controller

u=Fy

(2)

such that the corresponding closed-loop system, which is referred to as $P_{0}\star F$ , is stable and such that a closed-loop cost function $J$ , which encodes the performance specifications, is minimized. Since $P_{0}$ is unknown, we aim to find such a controller based on evaluations of the cost function $J$ . This amounts to the selection of suitable test controllers, their implementation on the real system $P_{0}$ and the evaluation of their achieved closed-loop performance.

This is the setting in [BerSch16, MarHen17, DriEng17, CalSey15, RohTri18], where the individual approaches differ, e.g., in the choice of cost, the available measurements from the plant, the assumed prior knowledge about the plant and the employed controller parametrization.

We confine the discussion to continuous-time linear time-invariant (LTI) systems $P_{0}$ and the design of state-feedback controller gains $F$ which motivates to choose $y$ as the state $x$ of $P_{0}$ . Moreover, we choose the $H_{\infty}$ -norm cost function

J:F\mapsto J(F):=\|P_{0}\star F\|_{\infty},

(3)

for which ample motivations are found in the robust control literature. Data-based techniques for estimating $H_{\infty}$ -norms have been proposed, e.g., in [RalFor17]. To simplify the exposition, we assume that the measurements of the cost are exact, although it is possible to extend our framework to noisy ones.

II-B Encoding Prior Knowledge

We consider the case that $P_{0}$ is partially unknown. To this end, we adopt the framework of LFRs [ZhoDoy96] and describe $P_{0}$ as the interconnection of some known system $P$ given by

\left(\begin{array}[]{@{}c@{}}\dot{x}(t)\\ \hline\cr z(t)\\ e(t)\\ y(t)\end{array}\right)=\left(\begin{array}[]{@{}c|ccc@{}}A&B_{1}&B_{2}&B_{3}\\ \hline\cr C_{1}&D_{11}&D_{12}&D_{13}\\ C_{2}&D_{21}&D_{22}&D_{23}\\ I&0&0&0\end{array}\right)\left(\begin{array}[]{@{}c@{}}x(t)\\ \hline\cr w(t)\\ d(t)\\ u(t)\end{array}\right)

(4)

in feedback with some unknown part or uncertainty

w(t)=\Delta_{0}z(t).

(5)

Here $\Delta_{0}$ is a real matrix of suitable dimension and $w$ , $z$ are the interconnection variables. Then (1) admits a state-space representation with the matrix

\left(\begin{array}[]{@{}c|cc@{}}A&B_{2}&B_{3}\\ \hline\cr C_{2}&D_{22}&D_{23}\\ I&0&0\end{array}\right)\!+\!\left(\begin{array}[]{@{}c@{}}B_{1}\\ \hline\cr D_{21}\\ 0\end{array}\right)\!\Delta_{0}(I-D_{11}\Delta_{0})^{-1}\left(\begin{array}[]{@{}c|cc@{}}C_{1}&D_{12}&D_{13}\end{array}\right).

By slightly abusing the notation, the latter matrix and the system (1) are denoted as $\Delta_{0}\star P$ . Note that the controlled unknown system, the interconnection of (4), (5) and (2), is then given by $P_{0}\star F=(\Delta_{0}\star P)\star F=\Delta_{0}\star P\star F$ . Such representations are known to be highly flexible since they permit to effectively capture structural dependencies of models on uncertain scalar parameters or matrix sub-blocks, which are typically collected on the diagonal of the (structured) uncertainty $\Delta_{0}$ . As an extra advantage, LFRs allow a seamless generalization to multiple heterogeneous (i.e., a mixture of time-varying, non-linear or infinite dimensional) uncertainties collected in a nonlinear feedback operator $w(t)=\Delta_{0}(t,z)(t)$ , but this is not pursued here.

Instead, we adopt the point-of-view that $\Delta_{0}$ describes unknown parts of the system $P_{0}$ , while the known parts (as, e.g., resulting from first-principle modeling) are captured by $P$ . Moreover, we assume that $\Delta_{0}$ is contained in some known set $\mathbf{\Delta}$ of matrices that is compact and typically given by

\{\mathrm{diag}(\delta_{1}I,\dots,\delta_{m_{r}}I,\Delta_{1},\dots,\Delta_{m_{\rm f}}):|\delta_{j}|\leq 1,\leavevmode\nobreak\ \|\Delta_{j}\|\leq 1\}

with (repeated) diagonal and full unstructured blocks on the diagonal, all bounded in norm by one. As an extreme case, this description does capture the models in [BerSch15, MarHen17, BocMat18] and the ones in [FerUme20, MatPro19, FazGe19, ZhaHu20], in which it is assumed that nothing aside from linearity is known about $P_{0}$ and where $\Delta_{0}$ is just one large unstructured uncertain matrix.

Note that the development of modern robust control has been substantially motivated by the fact that facing completely unknown systems is often not realistic. By now LFRs are used in tandem with dedicated analysis and design tools from robust control, such as structural singular values or integral quadratic constraints (IQCs) [MegRan97], which permit to accurately exploit the fine structure of the unknown $\Delta_{0}$ .

Therefore, in view of their modeling power, LFRs provide an ideal setting to incorporate prior structural knowledge about a system (through $P$ ) with unknown to-be-learnt components (through the elements of $\Delta_{0}$ ).

II-C Safety

Clearly, guaranteeing stability is a critical issue in learning based approaches since probing the system with gains that are not stabilizing can lead to catastrophes. In contrast to many other approaches as, e.g., in [DriEng17, CalSey15, RohTri18] and aligned with [MarHen17], we propose to only select controllers that are guaranteed to be robustly stabilizing, i.e., that are taken from the set

\mathbb{F}(\mathbf{\Delta}):=\{F:\leavevmode\nobreak\ F\text{ stabilizies }\Delta\star P\text{ for all }\Delta\in\mathbf{\Delta}\}.

(6)

This set is typically much smaller than the set of controllers that are merely required to stabilize $\Delta_{0}\star P$ . However, since $\Delta_{0}\in\mathbf{\Delta}$ is unknown and since we can only rely on the prior knowledge about $\mathbf{\Delta}$ , there is no other choice than to pick gains from $\mathbb{F}(\mathbf{\Delta})$ in order to ensure a safe operation of the system in closed-loop. The minimal value of $J(F)$ over the set $\mathbb{F}(\mathbf{\Delta})$ is related to the cost of interest as

\inf_{F\text{ stabilizes }P_{0}}J(F)\leq\inf_{F\in\mathbb{F}(\mathbf{\Delta})}J(F),

(

\mathcal{S}

)

in which the gap reflects the price to-be-paid for safety.

II-D Motivation for Controller Parametrizations

For optimizing the cost $J$ it is highly beneficial and an often seen strategy to parameterize a family of test controllers by a few parameters before applying an optimization algorithm, especially if the ambient space of controller gains has a large dimension [MarHen16, RobMan11, BanCal17]. Formally, such a parametrization is a mapping $\mathcal{F}$ with a domain $\mathrm{dom}(\mathcal{F})$ that is contained in a low dimensional ambient space and chosen in order to render the gap in the inequality

\inf_{F\text{ stabilizes }P_{0}}J(F)\leq\inf_{\theta\in\mathrm{dom}(\mathcal{F})}J(\mathcal{F}(\theta))

(

\mathcal{P^{\prime}}

)

as small as possible. Then, the idea is to minimize the surrogate cost $J\circ\mathcal{F}$ over $\mathrm{dom}(\mathcal{F})$ instead of determining the minimum of the original cost $J$ . Since the former minimization problem is formulated in a low dimensional space, it is expected to require substantially fewer evaluations of the cost function for its (approximate) solution. The gap in ( $\mathcal{P^{\prime}}$ ) constitutes the price to-be-paid for this reduction of complexity and is rarely analyzed in the literature. We stress that it is instrumental to choose a parametrization $\mathcal{F}$ such that its values are contained in $\mathbb{F}(\mathbf{\Delta})$ for reasons of safety. Then the gap in ( $\mathcal{P^{\prime}}$ ) can even be more precisely identified as the sum of that in ( $\mathcal{S}$ ) and the one in

\inf_{F\in\mathbb{F}(\mathbf{\Delta})}J(F)\leq\inf_{\theta\in\mathrm{dom}(\mathcal{F})}J(\mathcal{F}(\theta)).

(

\mathcal{P}

)

II-E Main Contributions

For some index set $\mathbb{I}$ , we propose a novel parametrization $\mathcal{F}:\mathbb{I}\to\mathbb{F}(\mathbf{\Delta})$ of auspicious robustly stabilizing controllers based on a partition of $\mathbf{\Delta}$ . It features an a priori safety guarantee without the need to ensure this property through the employed optimization algorithm as proposed, e.g., in [ZhaHu20]. For its construction, we use advanced robust control techniques that explicitly take the available prior knowledge into account. Based on this parametrization, we show how experimental controller probing allows for controlling the size of the gap in ( $\mathcal{P}$ ) by varying the coarseness of the partition of $\mathbf{\Delta}$ , and how to even reduce the gap in ( $\mathcal{S}$ ) by systematically decreasing the size of $\mathbf{\Delta}$ without endangering safety.

In comparison to a standard robust design, which does not utilize data from closed-loop experiments, our approach naturally generates safe controllers with improved performance on the real plant $P_{0}$ .

III PARAMETRIZATION OF TEST CONTROLLERS

III-A Construction of the Controller Parametrization

Let us choose the index set $\mathbb{I}:=\{1,\dots,N\}$ and subsets $\mathbf{\Delta}_{1},\dots,\mathbf{\Delta}_{N}$ of the uncertainty set $\mathbf{\Delta}$ that form the partition

\mathbf{\Delta}={\textstyle\bigcup\limits_{k\in\mathbb{I}}}\mathbf{\Delta}_{k}\text{ \leavevmode\nobreak\ with\leavevmode\nobreak\ }\mathrm{int}\mathbf{\Delta}_{k}\cap\mathrm{int}\mathbf{\Delta}_{l}=\emptyset\text{ \leavevmode\nobreak\ for all\leavevmode\nobreak\ }k,l\in\mathbb{I}.

With this partition, we construct $\mathcal{F}:\mathbb{I}\to\mathbb{F}(\mathbf{\Delta})$ based on the rationale to render $J(\mathcal{F}(k))$ for at least one index $k\in\mathbb{I}$ as small as possible, since this leads to the best possible reduction of the gap in ( $\mathcal{P}$ ). The proposed parametrization assigns to $k\in\mathbb{I}$ a controller $F\in\mathbb{F}(\mathbf{\Delta})$ which reduces ${\sup_{\Delta\in\mathbf{\Delta}_{k}}\|\Delta\star P\star F\|_{\infty}}$ as much as possible. This means that we are facing a robust multi-objective synthesis problem involving robust stability w.r.t. $\Delta\in\mathbf{\Delta}$ and worst-case $H_{\infty}$ performance w.r.t. $\Delta\in\mathbf{\Delta}_{k}$ . Such problems are usually nonconvex as well as nonsmooth and thus hard to solve systematically. Still, it is possible to compute good upper bounds on the corresponding optimal value by solving a linear SDP if relying on so-called multiplier relaxations in robust control. One such relaxation is given in Theorem 1 and requires to specify a set $\mathbb{P}(\mathbf{\Delta})$ of real symmetric matrices with an LMI description such that

\left(\begin{array}[]{@{}c@{}}-\Delta^{T}\\ I\end{array}\right)^{T}P\left(\begin{array}[]{@{}c@{}}-\Delta^{T}\\ I\end{array}\right)\prec 0\text{ \leavevmode\nobreak\ for all\leavevmode\nobreak\ }\Delta\in\mathbf{\Delta},\ P\in\mathbb{P}(\mathbf{\Delta}).

We also assume that such multiplier classes $\mathbb{P}(\mathbf{\Delta}_{k})$ are available for the partition members $\mathbf{\Delta}_{k}$ and for $k=1,\ldots,N$ . A more detailed discussion with concrete choices for such multiplier sets can be found in [Sch05, SchWei00].

Theorem 1

For fixed $k\in\mathbb{I}$ consider the system of LMIs

	$Y\succ 0,\ \ (\bullet)^{T}\left(\begin{array}[]{@{}cc\|c@{}}0&I\\ I&0\\ \hline\cr&&P\end{array}\right)\left(\begin{array}[]{@{}cc@{}}I&0\\ -\mathbf{A}^{T}&-\mathbf{C}_{1}^{T}\\ \hline\cr 0&I\\ -B_{1}^{T}&-D_{11}^{T}\end{array}\right)\succ 0,$		(7a)
	$(\bullet)^{T}\left(\begin{array}[]{@{}cc\|c\|cc@{}}0&I&&&\\ I&0&&&\\ \hline\cr&&P_{k}&&\\ \hline\cr&&&-\gamma^{2}I&0\\ &&&0&I\end{array}\right)\left(\begin{array}[]{@{}ccc@{}}I&0&0\\ -\mathbf{A}^{T}&-\mathbf{C}_{1}^{T}&-\mathbf{C}_{2}^{T}\\ \hline\cr 0&I&0\\ -B_{1}^{T}&-D_{11}^{T}&-D_{21}^{T}\\ \hline\cr 0&0&I\\ -B_{2}^{T}&-D_{12}^{T}&-D_{22}^{T}\end{array}\right)\succ 0$		(7b)

in the variables $Y=Y^{T}$ , $P\in\mathbb{P}(\mathbf{\Delta})$ , $P_{k}\in\mathbb{P}(\mathbf{\Delta}_{k})$ , $M$ , $\gamma$ and with the abbreviations

(\mathbf{A},\mathbf{C}_{1},\mathbf{C}_{2})\!:=\!(AY\!+\!B_{3}M,C_{1}Y\!+\!D_{13}M,C_{2}Y\!+\!D_{23}M).

If these LMIs are feasible, the controller gain $F:=MY^{-1}$ satisfies $F\in\mathbb{F}(\mathbf{\Delta})$ and $\sup_{\Delta\in\mathbf{\Delta}_{k}}\|\Delta\star P\star F\|_{\infty}<\gamma$ .

The proof of this result is found in [SchWei00]. It shows that

\inf_{F\in\mathbb{F}(\mathbf{\Delta})}\sup_{\Delta\in\mathbf{\Delta}_{k}}\|\Delta\star P\star F\|_{\infty}\leq\gamma_{*}(k)

(8)

is satisfied for $\gamma_{*}(k)\!:=\!\inf\{\gamma\in\mathbb{R}:\text{ LMIs \eqref{DC::lem::eq::lmi_part} are feasible}\}$ .

All this leads us to the construction of the parametrization $\mathcal{F}$ as follows: For some fixed small $\varepsilon>0$ and $\gamma_{*}^{\varepsilon}(k):=(1+\varepsilon)\gamma_{*}(k)$ , we assign to $k\in\mathbb{I}$ some gain $\mathcal{F}(k)$ with

\mathcal{F}(k)\in\mathbb{F}(\mathbf{\Delta})\text{ and }\sup_{\Delta\in\mathbf{\Delta}_{k}}\|\Delta\star P\star\mathcal{F}(k)\|_{\infty}\leq\gamma_{*}^{\varepsilon}(k).

(9)

We emphasize that both $\gamma_{*}(k)$ and $\mathcal{F}(k)$ can be computed by solving a standard semi-definite program. Still note that, in general, $\gamma_{*}(k)$ is not attained (no optimal controller exists), which motivates the introduction of $\varepsilon$ .

In the sequel, we abbreviate the surrogate cost function resulting from the parametrization $\mathcal{F}$ as

L(k):=J(\mathcal{F}(k))=\|\Delta_{0}\star P\star\mathcal{F}(k)\|_{\infty}\text{ \leavevmode\nobreak\ for\leavevmode\nobreak\ }k\in\mathbb{I}.

Further, let us note at this point that $\Delta_{0}\in\mathbf{\Delta}_{k}$ for some index $k\in\mathbb{I}$ clearly implies

L(k)\leq\sup_{\Delta\in\mathbf{\Delta}_{k}}\|\Delta\star P\star\mathcal{F}(k)\|_{\infty}\leq\gamma_{*}^{\varepsilon}(k).

(10)

Remark 2

It is routine to adapt Theorem 1 to a singleton $\mathbf{\Delta}_{k}=\{\Delta\}$ with any $\Delta\in\mathbf{\Delta}$ . This adaptation no longer requires to choose a multiplier class for $\mathbf{\Delta}_{k}$ which promotes a smaller relaxation gap in (8). It also permits the choice $\mathbb{I}=\mathbf{\Delta}$ as a highly useful extreme case in our construction and leads to a parametrization $\mathcal{F}$ mapping $\mathbf{\Delta}$ into $\mathbb{F}(\mathbf{\Delta})$ .

Remark 3

As a key difference between the cost $J$ and its surrogate $J\circ\mathcal{F}$ , the domain of the former consists of the only implicitly defined set of (robustly) stabilizing controllers, while the latter can be evaluated directly. In particular for $\mathbb{I}=\mathbf{\Delta}$ as in Remark 2, $J\circ\mathcal{F}$ is simply defined on $\mathbf{\Delta}$ .

III-B Application of the Controller Parametrization

After having introduced the controller parametrization, the conceptual algorithm of this paper reads as follows. For each $k\in\mathbb{I}$ , we can implement the controller $\mathcal{F}(k)$ on the system, since it is assured to be stabilizing for $P_{0}$ , and measure the cost $L(k)$ . A mere minimization over $k\in\mathbb{I}$ then leads to an optimal controller $\mathcal{F}(k_{\ast})$ , and inequality ( $\mathcal{P^{\prime}}$ ) now reads as

\inf_{F\text{ stabilizes }P_{0}}J(F)\leq L(k_{*})\text{ \leavevmode\nobreak\ for\leavevmode\nobreak\ }k_{*}\in\operatorname*{arg\,min}_{k\in\mathbb{I}}L(k).

(11)

Fine partitions of $\mathbf{\Delta}$ lead to large index sets $\mathbb{I}$ . Instead of considering all $k\in\mathbb{I}$ , we can take fewer (random) samples $\{k_{1},\ldots,k_{M}\}$ of $\mathbb{I}$ and obtain a (rough) approximation $\min_{j=1,\dots,M}L(k_{j})$ of $L(k_{\ast})$ . In particular for the partition with $\mathbb{I}=\mathbf{\Delta}$ as described in Remark 2, one can directly employ a whole variety of smarter (derivative free) sampling and optimization strategies, such as Bayesian optimization involving Gaussian processes discussed in [BerSch16, MarHen17].

Instead of considering a single (fine) partition, one can as well start from a coarse partition of $\mathbf{\Delta}$ and propose adaptive refinement strategies which generate a sequence of controller parametrizations as follows. Given $\mathbf{\Delta}=\bigcup_{k=1}^{N}\mathbf{\Delta}_{k}$ , determine an index $k^{0}\in\operatorname*{arg\,min}_{j=1,\dots,N}L(j)$ . Then generate a partition of $\mathbf{\Delta}_{k^{0}}$ denoted as $\bigcup_{j=1}^{N}\mathbf{\Delta}_{k^{0}_{j}}$ in order to obtain a refined partition of the original set as

\mathbf{\Delta}=\Bigl{(}{\textstyle\bigcup\limits_{j=1,\dots,N}}\mathbf{\Delta}_{k^{0}_{j}}\Bigr{)}\leavevmode\nobreak\ \cup\leavevmode\nobreak\ \Bigl{(}{\textstyle\bigcup\limits_{k=1,\dots,N,\leavevmode\nobreak\ k\neq k^{0}}}\mathbf{\Delta}_{k}\Bigr{)}.

(12)

This refined partition yields a new parametrization $\mathcal{F}^{1}$ with corresponding new cost $L^{1}$ and some next optimal index $k^{1}\in\operatorname*{arg\,min}_{j=1,\dots,N}L^{1}(k^{0}_{j})$ . By construction it is guaranteed that $L^{1}(k^{1})\leq L(k^{0})$ holds. This step can be iterated in order to further decrease the value of the surrogate cost. In Section IV we propose a specific algorithm based on this approach which involves, in particular, a concrete strategy for refining given partitions.

III-C Reducing the Gap in Inequality ( $\mathcal{P}$ )

Our setup allows for identifying the sources of the gap in ( $\mathcal{P}$ ) and permits to generate systematic refinements towards its reduction. To illustrate this issue, let us suppose that the relaxation gap in (8) is small. Then we infer (by the definition of $\mathcal{F}$ and for small $\varepsilon>0$ ) that

\gamma_{*}^{\varepsilon}(k)\!\approx\!\inf_{F\in\mathbb{F}(\mathbf{\Delta})}\sup_{\Delta\in\mathbf{\Delta}_{k}}\|\Delta\star P\star F\|_{\infty}\!\leq\!\sup_{\Delta\in\mathbf{\Delta}_{k}}\|\Delta\star P\star\mathcal{F}(k)\|_{\infty}.

On the other hand, for $k\in\mathbb{I}$ with $\Delta_{0}\in\mathbf{\Delta}_{k}$ and if this member $\mathbf{\Delta}_{k}$ of the partition is sufficiently small, we have

\sup_{\Delta\in\mathbf{\Delta}_{k}}\|\Delta\star P\star\mathcal{F}(k)\|_{\infty}\approx\|\Delta_{0}\star P\star\mathcal{F}(k)\|_{\infty}=L(k).

Hence $\inf_{k\in\mathbb{I}}L(k)$ is close to $\inf_{F\in\mathbb{F}(\mathbf{\Delta})}\|\Delta_{0}\star P\star F\|_{\infty}$ which shows that the gap in ( $\mathcal{P}$ ) is small. In conclusion, it is essential that the size of the partition member containing $\Delta_{0}$ and the relaxation gap in (8) are both small. Without going into details, we emphasize that the latter can be controlled with the choices of the multiplier sets $\mathbb{P}(\mathbf{\Delta})$ and $\mathbb{P}(\mathbf{\Delta}_{k})$ , through the use of more advanced multi-object control techniques and by applying further refinements in robust control [ArzPea00], such as incorporating S-variables [EbiPea15] or dynamic instead of static IQCs [MegRan97].

III-D Reducing the Gap in Inequality ( $\mathcal{S}$ )

Our approach offers the opportunity to even reduce the gap in ( $\mathcal{S}$ ) by identifying a smaller index set $\tilde{\mathbb{I}}\subset\mathbb{I}$ with

	$\Delta_{0}\in\tilde{\mathbf{\Delta}}:={\textstyle\bigcup_{k\in\tilde{\mathbb{I}}}}\mathbf{\Delta}_{k}.$	(13a)
Indeed, this is guaranteed with
	$\tilde{\mathbb{I}}=\{k\in\mathbb{I}:L(k)\leq\gamma_{*}^{\varepsilon}(k)\}$	(13b)

since $k\in\mathbb{I}/\tilde{\mathbb{I}}$ implies $\Delta_{0}\notin\mathbf{\Delta}_{k}$ by (10). Note that $\tilde{\mathbf{\Delta}}$ can be considerably smaller than the original $\mathbf{\Delta}$ , which implies that the related set $\mathbb{F}(\tilde{\mathbf{\Delta}})$ of robustly stabilizing controllers is (much) larger than $\mathbb{F}(\mathbf{\Delta})$ . Thus, replacing $\mathbf{\Delta}$ with $\tilde{\mathbf{\Delta}}$ reduces the cost of safety as expressed by the gap in ( $\mathcal{S}$ ).

This suggests to repeat our design procedure for $\tilde{\mathbf{\Delta}}$ , which amounts to constructing a new parametrization $\tilde{\mathcal{F}}$ giving controllers $\tilde{\mathcal{F}}(k)$ (via Theorem 1) with which we can perform new closed-loop experiments to evaluate $\tilde{L}=J\circ\tilde{\mathcal{F}}$ . The controllers $\tilde{\mathcal{F}}(k)$ are expected to achieve (considerably) improved closed-loop performance with a smaller gap in ( $\mathcal{P^{\prime}}$ ), just due to the reduction of the gap in ( $\mathcal{S}$ ). The algorithm proposed in the next section is based on this strategy.

Remark 4

The set $\tilde{\mathbf{\Delta}}$ is not guaranteed to be convex. Similarly as in [Sch01], in this case one can express it as union of few convex sets and modify Theorem 1 by using a robust stabilization objective for each of the individual convex sets; this purposive design comes along with an increased numerical burden.

Note that the set $\mathbb{I}$ is constructed based on (10) which provides an upper bound on the cost $L(k)$ at the index $k$ with $\Delta_{0}\in\mathbf{\Delta}_{k}$ . We can also devise a lower bound which can be exploited similarly in order to further reduce $\tilde{\mathbb{I}}$ and shrink the gap in ( $\mathcal{S}$ ). To this end, observe that standard $H_{\infty}$ design permits to numerically determine

\gamma_{\mathrm{nom}}(\Delta)\!:=\!\inf_{F\text{ stabilizes }\Delta\star P}\!\|\Delta\star P\star F\|_{\infty}\text{ for any fixed }\Delta\!\in\!\mathbf{\Delta}.

Then $\Delta_{0}\in\mathbf{\Delta}_{k}$ for $k\in\mathbb{I}$ indeed yields the lower bound

\inf_{\Delta\in\mathbf{\Delta}_{k}}\!\gamma_{\mathrm{nom}}(\Delta)\leq\!\gamma_{\mathrm{nom}}(\Delta_{0})\leq\|\Delta_{0}\star P\star\mathcal{F}(k)\|_{\infty}\leq L(k).

Note that this lower bound is not cheap to compute as it involves a numeric minimization of $\gamma_{\mathrm{nom}}$ on $\mathbf{\Delta}_{k}$ for each considered $k\in\mathbb{I}$ . In contrast, the upper bound $\gamma_{\ast}^{\varepsilon}(k)$ is essentially obtained for free while constructing the map $\mathcal{F}$ .

IV AN ALGORITHM

In this section we propose a concrete algorithm that works in higher dimensions and aims to exploit (13). It involves the uncertainty box $\mathbf{\Delta}=\{\Delta(\delta):\delta_{\nu}\in\mathcal{I}_{\nu},\ \nu=1,\ldots,M\}$ where $\mathcal{I}_{1},\dots,\mathcal{I}_{M}$ are given intervals and in which we use the abbreviation $\Delta(\delta):=\mathrm{diag}(\delta_{1}I_{q_{1}},\dots,\delta_{M}I_{q_{M}})$ . The related Algorithm 1 is motivated by coordinate descent, which currently becomes more popular due to its appearance in machine learning applications.

input : Number of partitions

N

2Set

\nu=1

and

\mathcal{I}_{k}^{p}=\mathcal{I}_{k}

for all

k=1,\dots,M

3 while (not terminated) do

4 Choose a uniform partition

\mathcal{I}_{\nu}=\bigcup_{k=1}^{N}\tilde{\mathcal{I}}_{k}

5 Set

\mathbf{\Delta}_{k}\!:=\!\{\Delta(\delta)\!:\!\delta_{\nu}\!\in\!\tilde{\mathcal{I}}_{k},\,\delta_{j}\!\in\!\mathcal{I}_{j},\,j\!\neq\!\nu\}

to get

\mathbf{\Delta}\!=\!\bigcup_{k=1}^{N}\!\mathbf{\Delta}_{k}

6 Determine

\mathcal{F}(k)

\gamma_{*}(k)

and

L(k)

for all

k\in\mathbb{I}

7 Determine

\tilde{\mathbb{I}}

as in (13), set

\mathcal{I}_{\nu}\!=\!\mathrm{convex\,hull}\big{(}\bigcup_{k\in\tilde{\mathbb{I}}}\tilde{\mathcal{I}}_{k}\big{)}

and update

\mathbf{\Delta}

accordingly

8 Set

\mathcal{I}^{p}_{\nu}:=\tilde{\mathcal{I}}_{j}

where

j\!\in\!\operatorname*{arg\,min}_{k\in\tilde{\mathbb{I}}}L(k)

9 Set

\nu=\nu+1

\nu<M

and

\nu=1

otherwise

10 end while

11 Set

\nu=1

12 while (not terminated) do

13 Choose a uniform partition

\mathcal{I}_{\nu}^{p}=\bigcup_{k=1}^{N}\tilde{\mathcal{I}}_{k}

14 Set

\mathbf{\Delta}_{k}:=\{\Delta(\delta)\leavevmode\nobreak\ :\leavevmode\nobreak\ \delta_{\nu}\in\tilde{\mathcal{I}}_{k},\leavevmode\nobreak\ \delta_{j}\in\mathcal{I}^{p}_{j},\leavevmode\nobreak\ j\neq\nu\}

to get

\mathbf{\Delta}=\big{(}\bigcup_{k=1}^{N}\mathbf{\Delta}_{k}\big{)}\cup(\bullet)

as in (12)

15 Determine

\mathcal{F}(k)

\gamma_{*}(k)

and

L(k)

for all

k\in\{1,\dots,N\}

16 Set

\mathcal{I}^{p}_{\nu}\!:=\!\tilde{\mathcal{I}}_{j}

and

F_{\ast}\!=\!\mathcal{F}(j)

for

j\in\operatorname*{arg\,min}_{k=1,\dots,N}L(k)

17 Set

\nu=\nu+1

\nu<M

and

\nu=1

otherwise

18 end while

output : Controller gain

F_{\ast}

Algorithm 1 Design via Coordinate-Like Descent

The first loop of the algorithm generates a partition of $\mathbf{\Delta}$ by taking a uniform partition only of the interval $\mathcal{I}_{\nu}$ related to the parameter $\delta_{\nu}$ . In line 6, it exploits (13) in order to shrink $\bigcup_{k=1}^{N}\tilde{\mathcal{I}}_{k}$ to a new interval and to generate a reduced parameter set that is guaranteed to contain $\Delta_{0}$ ; moreover, it still has the structure of a hyperrectangle. In line 7 and as input to the second loop, we store those intervals $\mathcal{I}_{\nu}^{p}$ for which the best performance level is observed. Running this loop $n_{1}$ times requires to perform $Nn_{1}$ experiments.

In the second loop, the algorithm adaptively refines those subsets of $\mathbf{\Delta}$ for which the best closed-loop performance was achieved. This proceeds as in Section III-B, by generating sub-partitions along each parameter axis. Again, $n_{2}$ runs of the loop require $Nn_{2}$ experimental cost evaluations. In particular, if we let $n_{1},n_{2}=O(M)$ , the number of evaluations grows linearly in the number of unknown parameters $M$ and turns the algorithm applicable even if $M$ is large.

V NUMERICAL EXAMPLES

For numerical illustrations, we consider several modified examples from the library COMPl_eib [Lei04] which, unfortunately, does not comprise robust control examples. We let $\left(\begin{smallmatrix}A&B_{2}&B_{3}\\ C_{2}&D_{22}&D_{23}\end{smallmatrix}\right)$ be the matrices $\left(\begin{smallmatrix}A&B_{1}&B\\ C_{1}&D_{11}&D_{12}\end{smallmatrix}\right)$ in (1.1) from [Lei04] and choose the remaining matrices in order to define $P$ in (4) as $D_{11}=0$ , $D_{12}=0$ , $D_{21}=0$ ,

B_{1} = (\begin{matrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ \hdashline 0_{∙ \times 3} \end{matrix}), C_{1} = (\begin{matrix} 1 & 0 \\ 0 & 1 & 0_{3 \times ∙} \\ 0 & 1 \end{matrix}) and D_{13} = (\begin{matrix} 0 \\ 0 & 0_{3 \times ∙} \\ 1 \end{matrix}) . F u r t h e r, w e t a k e Δ ​:=​ {diag(δ_1, δ_2, δ_3) : δ_1, δ_2, δ_3 ∈[-1, 1] } a n d s u p p o s e t h a t t h e r e a l s y s t e m P_0 = Δ_0 ⋆P i s o b t a i n e d f o r Δ_0 = diag(0.7, -0.1, 0.7) . N o t e t h a t a l l s u b s e q u e n t a l g o r i t h m s o n l y a c c e s s t h e c o s t J(F)=∥P_0⋆F∥_∞ i n (3) f o r c h o s e n g a i n s F . I n T h e o r e m 1, b o t h s e t s o f m u l t i p l i e r s P (Δ) a n d P (Δ_k) c o n s i s t o f s o - c a l l e d D/G - s c a l i n g s [SchWei00] a n d w e t a k e ε := 0.05 i n (9) . W i t h T h e o r e m 1 f o r t h e t r i v i a l p a r t i t i o n N=1, w e c a n a s w e l l c o m p u t e a n u p p e r b o u n d γ_rp f o r t h e r o b u s t p e r f o r m a n c e s y n t h e s i s p r o b l e m a s i n (14)Equation 1414infF∈F(Δ)supΔ∈Δ∥Δ⋆P⋆F∥≤γrp. A n y l e a r n i n g b a s e d d e s i g n r e s u l t s i n a c o n t r o l l e r w i t h p e r f o r m a n c e l e v e l i n b e t w e e n γ_rp a n d t h e b e s t a c h i e v a b l e n o m i n a l p e r f o r m a n c e γ_nom:=
inf_F stabilizes P_0​ ∥P_0 ⋆F∥_∞ . L e t u s n o w e m p l o y A l g o r i t h m 1 w i t h N = 6 a n d b y u s i n g n_1, n_2 i t e r a t i o n s i n t h e f i r s t a n d s e c o n d l o o p, r e s p e c t i v e l y . T h e a c h i e v e d p e r f o r m a n c e l e v e l s γ^n_1,n_2 f o r (n_1,n_2)=(0,6) a n d (n_1,n_2)=(3,3) a r e d e p i c t e d i n T a b l e LABEL:DC::tab::results . W e c o m p a r e t h e s e r e s u l t s w i t h a b a s e d - l i n e l e a r n i n g a p p r o a c h w h i c h a i m s t o m i n i m i z e t h e c o s t J w i t h o u t e m p l o y i n g a c o n t r o l l e r p a r a m e t r i z a t i o n, s i m i l a r l y a s d o n e i n [MarHen17, FazGe19, ZhaHu20] . T o t h i s e n d, w e u s e a d e t e r m i n i s t i c d i r e c t s e a r c h m e t h o d [AudDen02] w h i c h i s i n i t i a l i z e d w i t h a r o b u s t c o n t r o l l e r a s o b t a i n e d b y c o m p u t i n g γ_rp i n (14) . W e r e l y o n t h e M a t l a b i m p l e m e n t a t i o n i n patternsearch a n d d e n o t e b y γ_ps^k t h e a c h i e v e d p e r f o r m a n c e l e v e l a f t e r k e v a l u a t i o n s o f t h e c o s t f u n c t i o n . L e t u s e m p h a s i z e a t t h i s p o i n t t h a t, f o r t h e c o n s i d e r e d e x a m p l e s, a l l i t e r a t e s o f patternsearch a r e s t a b i l i z i n g P_0 w i t h o u t a n y p a r t i c u l a r p r e c a u t i o n s; t h i s i s i n s t a r k c o n t r a s t t o o t h e r d i r e c t o p t i m i z a t i o n a l g o r i t h m s s u c h a s bayesopt . M o r e o v e r, w e a l s o e m p l o y patternsearch f o r m i n i m i z i n g J ∘ F i f m a k i n g u s e o f o u r p a r a m e t r i z a t i o n F f o r I =Δ a s d e s c r i b e d i n R e m a r k 2 . I t i s t h e n i n i t i a l i z e d i n Δ= 0 a n d γ_ps F^k d e n o t e s t h e a c h i e v e d p e r f o r m a n c e l e v e l a f t e r k c o s t e v a l u a t i o n s . T h e r e s u l t s i n c o l u m n 3 a n d 4 o f T a b l e LABEL:DC::tab::results d e m o n s t r a t e t h e b e n e f i t o f e x p l o i t i n g o u r c o n t r o l l e r p a r a m e t r i z a t i o n o v e r a d i r e c t m i n i m i z a t i o n o f t h e c o s t f o r o n l y a f e w (h e r e k=36) i t e r a t i o n s, d e s p i t e t h e g a p i n (P ′) a n d e v e n c o m i n g a l o n g w i t h s a f e t y g u a r a n t e e s . T h i s c a n b e a t t r i b u t e d t o t h e f a c t t h a t t h e d i m e n s i o n o f t h e a m b i e n t c o n t r o l l e r g a i n s p a c e i s l a r g e r t h a n 20 f o r s o m e e x a m p l e s, s i n c e k=200 i t e r a t i o n s l e a d t o f u r t h e r i m p r o v e m e n t s o f p e r f o r m a n c e a s s h o w n i n c o l u m n 2, b u t w i t h o u t g u a r a n t e e s f o r s t a b i l i t y a l o n g t h e i t e r a t i o n s . T h e r e s u l t s i n c o l u m n s 5 a n d 6 f o r A l g o r i t h m 1 s h o w p e r f o r m a n c e l e v e l s m o s t l y s i m i l a r t o γ_ps F^36 a n d f o r a n i d e n t i c a l n u m b e r 36 o f e v a l u a t i o n s o f t h e c o s t . T h e c o m p a r i s o n o f γ^3,3 w i t h γ^0,6 f o r t h e s e c o n d g r o u p o f e x a m p l e s r e v e a l s t h e b e n e f i t o f e x p l o i t i n g (13) i n t h e f i r s t l o o p o f A l g o r i t h m 1 . L e t u s f i n a l l y p o i n t t o t h e f i r s t a n d l a s t c o l u m n i n T a b l e LABEL:DC::tab::results i n o r d e r t o i l l u s t r a t e t h e g e n e r a l b e n e f i t o f o u r s a f e l e a r n i n g a p p r o a c h o v e r a s t a n d a r d r o b u s t d e s i g n, b y f i n d i n g c o n t r o l l e r s w i t h (s o m e t i m e s e v e n d r a s t i c a l l y) i m p r o v e d c l o s e d - l o o p H_∞ p e r f o r m a n c e f o r P_0, w h i c h e v e n c o m e s c l o s e t o t h e o p t i m a l l e v e l γ_nom i n s o m e c a s e s . Table ITable IITable IINominalperformanceandperformanceachievedbyc

(14)

Controller Design via Experimental Exploration with Robustness Guarantees

Abstract

Index Terms:

I INTRODUCTION

II SETTING

II-A Problem Formulation

II-B Encoding Prior Knowledge

II-C Safety

II-D Motivation for Controller Parametrizations

II-E Main Contributions

III PARAMETRIZATION OF TEST CONTROLLERS

III-A Construction of the Controller Parametrization

Theorem 1

Remark 2

Remark 3

III-B Application of the Controller Parametrization

III-C Reducing the Gap in Inequality (𝒫\mathcal{P})

III-D Reducing the Gap in Inequality (𝒮\mathcal{S})

Remark 4

IV AN ALGORITHM

V NUMERICAL EXAMPLES

III-C Reducing the Gap in Inequality ( $\mathcal{P}$ )

III-D Reducing the Gap in Inequality ( $\mathcal{S}$ )