Controller Design via Experimental Exploration with Robustness Guarantees
Abstract
For a partially unknown linear system, we present a systematic control design approach based on data generated from measurements of closed-loop experiments with suitable test controllers. These experiments are used to improve the achieved performance and to reduce the uncertainty about the unknown parts of the system. This is achieved through a parametrization of auspicious controllers with convex relaxation techniques from robust control, which guarantees that their implementation on the unknown plant is safe. The approach permits the systematic incorporation of available prior knowledge about the system by employing the framework of linear fractional representations.
Index Terms:
Experimental exploration, robust controller design, linear matrix inequalities.
©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. DOI: 10.1109/LCSYS.2020.3004506
I INTRODUCTION
Recently, learning and data-based control design approaches have received a lot of attention even for linear systems [FerUme20, BocMat18, FazGe19, ZhaHu20, BerSch16, MarHen17]. These approaches can often be subsumed under the broad framework of reinforcement learning [KobBag13], but are still rather diverse [MatPro19]. In [FerUme20] robust control is combined with a dual design strategy that is used for exploring the closed-loop behavior, while [BocMat18] employs the system level synthesis framework with an identification step followed by a robust design and an end-to-end analysis. The approaches in [FazGe19, ZhaHu20] are based on policy gradient methods, while [BerSch16, MarHen17] rely on Bayesian optimization strategies involving Gaussian processes for tuning the controller parameters. The latter strategies turned out to be very efficient for various applications, in particular, in robotics [DriEng17, CalSey15, RohTri18].
Bayesian optimization and other direct sampling methods aim to synthesize optimal controllers based on measurements of a closed-loop cost function involving an unknown system to which suitable test controllers are applied [BerSch16, MarHen17, DriEng17, CalSey15, RohTri18]. While these methods have successfully been used in practice, several aspects are subject to current research:
• A critical issue is safety, which means here (and in contrast to many other interpretations as, e.g., in [DeaTu19]) that the implemented controllers are guaranteed to stabilize the unknown plant [BerSch16, TurKra19]. Such guarantees are often not provided in learning control, which might lead to catastrophic outcomes due to closed-loop instability during the tuning process. To this end, a safe threshold on the cost is introduced in [BerSch16] as an indicator for stability, while [TurKra19] incorporates a robustness objective in terms of classical delay and gain margins.
• The choice of a suitable parametrization of test controllers is another important issue, which aims at keeping the number of cost evaluations small even if the set of admissible controllers is large [MarHen16, RobMan11, BanCal17]. In [RobMan11], several naive parametrizations are illustrated and one based on the Youla parametrization is studied. In [MarHen16], the controller candidates are parametrized in terms of the weights in an LQ design for a given nominal system.
• How to incorporate prior knowledge is another topic of tremendous importance in these approaches [MarHen16, MarHen17, BocMat18, KobBag13, BerSch15, RohTri18]. A linearization of the underlying nonlinear system is used in [MarHen16] for the construction of a parametrization. In [MarHen17], prior knowledge is used for the design of specialized kernels that outperform standard ones, while [RohTri18] discusses how to choose hyperparameters from a simulation model.
In this paper, we propose a systematic parametrization of controllers based on modeling, analysis and design techniques from robust control that can be used for controller tuning/sampling and addresses all of the above concerns at the same time.
We assume that the plant G is only partially unknown and employ the linear fractional representation (LFR) framework in order to separate known from unknown (or difficult) components. Such representations are well-established and flexible modeling tools in robust control [ZhoDoy96, Sch01a], but they are not often used in learning control. In particular, LFRs allow for expressing G as a feedback interconnection of a known linear system H and some unknown or uncertain component Δ contained in a set 𝚫; the set 𝚫 captures, e.g., crude guesses on parameter ranges. Prior knowledge is thus encoded in the choices of H and 𝚫. Dedicated robust design techniques then allow the synthesis of controllers that stabilize the uncertain interconnection and, hence, are guaranteed to stabilize the unknown G; these techniques ensure safety. In this initial work, we assume that the uncertain component is parametric and construct a parametrization of test controllers based on a partition of the set 𝚫. The main idea is to use controllers obtained from a robust multi-objective design problem with guaranteed stability on the whole of 𝚫 and guaranteed performance on the individual members of the partition, respectively.
Outline. The remainder of the paper is organized as follows. After a short paragraph on notation, we specify the considered learning control problem and discuss its essential ingredients. Next we propose a systematic parametrization of robust controllers for safely and exploratively evaluating the underlying closed-loop cost function. We elaborate on the properties of this parametrization and demonstrate its benefits on some numerical examples.
Notation. We use the star product “” and all rules for linear fractional transformations (LFTs) as in [ZhoDoy96, Chapter 10]. Objects that can be inferred by symmetry or are not relevant are indicated by “”.
II SETTING
II-A Problem Formulation
We assume that we are given an unknown real system G described as
ẋ = A x + B_d d + B_u u,   e = C_e x + D_ed d + D_eu u,   y = C_y x.    (1)
Here e is the controlled output and d is a generalized disturbance (both used to formulate performance specifications), while y is the measurement output, u is the control input and x denotes the state. The underlying control problem is to find a controller
u = K y    (2)
such that the corresponding closed-loop system, which is referred to as G ⋆ K, is stable and such that a closed-loop cost function c(K), which encodes the performance specifications, is minimized. Since G is unknown, we aim to find such a controller based on evaluations of the cost function c. This amounts to the selection of suitable test controllers, their implementation on the real system and the evaluation of their achieved closed-loop performance.
This is the setting in [BerSch16, MarHen17, DriEng17, CalSey15, RohTri18], where the individual approaches differ, e.g., in the choice of cost, the available measurements from the plant, the assumed prior knowledge about the plant and the employed controller parametrization.
We confine the discussion to continuous-time linear time-invariant (LTI) systems and the design of state-feedback controller gains, which motivates choosing the measurement y in (1) to be the state of G. Moreover, we choose the H∞-norm cost function
c(K) := ‖G ⋆ K‖∞    (3)
for which ample motivation can be found in the robust control literature. Data-based techniques for estimating H∞-norms have been proposed, e.g., in [RalFor17]. To simplify the exposition, we assume that the measurements of the cost c are exact, although it is possible to extend our framework to noisy ones.
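As a rough illustration of how the cost (3) could be evaluated in simulation (in the method itself it is obtained from closed-loop experiments, possibly via estimation techniques as in [RalFor17]), the following sketch approximates the closed-loop H∞ norm of a state-feedback loop by gridding the frequency axis; the matrix names follow our reconstruction of (1) and are assumptions, not part of the original text.

```python
import numpy as np

def hinf_cost(A, Bd, Bu, Ce, Ded, Deu, K, omegas=np.logspace(-3, 3, 2000)):
    """Approximate c(K) = ||G * K||_inf for the state-feedback loop u = K x by
    the peak singular value of the closed-loop frequency response on a grid.
    In the actual method this value is measured on the real plant; here it is
    computed from a (hypothetical) simulation model."""
    Acl = A + Bu @ K            # closed-loop state matrix
    Ccl = Ce + Deu @ K          # closed-loop performance output matrix
    if np.max(np.linalg.eigvals(Acl).real) >= 0:
        return np.inf           # not stabilizing: the cost is unbounded
    n = Acl.shape[0]
    peak = 0.0
    for w in omegas:
        T = Ccl @ np.linalg.solve(1j * w * np.eye(n) - Acl, Bd) + Ded
        peak = max(peak, np.linalg.svd(T, compute_uv=False)[0])
    return peak
```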
II-B Encoding Prior Knowledge
We consider the case that G is partially unknown. To this end, we adopt the framework of LFRs [ZhoDoy96] and describe G as the interconnection of some known system H given by a state-space system
(4)
with inputs (w, d, u) and outputs (z, e, y), in feedback with some unknown part or uncertainty
w = Δ_tr z.    (5)
Here Δ_tr is an unknown real matrix of suitable dimension and w, z are the interconnection variables. Then (1) admits a state-space representation whose combined system matrix is obtained by closing the channel w = Δ_tr z in (4), i.e., by taking the star product of Δ_tr with the realization matrix of H. By slightly abusing the notation, the latter matrix and the system (1) are denoted as Δ_tr ⋆ H, so that G = Δ_tr ⋆ H. Note that the controlled unknown system, the interconnection of (4), (5) and (2), is then given by (Δ_tr ⋆ H) ⋆ K. Such representations are known to be highly flexible since they permit to effectively capture structural dependencies of models on uncertain scalar parameters or matrix sub-blocks, which are typically collected on the diagonal of the (structured) uncertainty Δ_tr. As an extra advantage, LFRs allow a seamless generalization to multiple heterogeneous (i.e., a mixture of time-varying, nonlinear or infinite-dimensional) uncertainties collected in a nonlinear feedback operator, but this is not pursued here.
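For completeness, here is a minimal numerical sketch of the star product used above, under an assumed partitioning in which the uncertainty channel occupies the leading rows and columns of the matrix being closed; the function name and the channel ordering are our own conventions and may differ from the paper's.

```python
import numpy as np

def star(Delta, M, nz, nw):
    """Close the uncertainty channel w = Delta z of a matrix M that maps
    (w, rest) to (z, rest), partitioned as M = [[M11, M12], [M21, M22]]
    with M11 of size nz x nw; this is the (upper) star product Delta * M."""
    M11, M12 = M[:nz, :nw], M[:nz, nw:]
    M21, M22 = M[nz:, :nw], M[nz:, nw:]
    # well-posedness requires I - M11 @ Delta to be nonsingular
    return M22 + M21 @ Delta @ np.linalg.solve(np.eye(nz) - M11 @ Delta, M12)
```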
Instead, we adopt the point of view that Δ_tr describes the unknown parts of the system G, while the known parts (as, e.g., resulting from first-principles modeling) are captured by H. Moreover, we assume that Δ_tr is contained in some known set 𝚫 of matrices that is compact and typically given by
𝚫 = { diag(δ_1 I_{r_1}, …, δ_k I_{r_k}, D_1, …, D_l) : |δ_i| ≤ 1, ‖D_j‖ ≤ 1 },
with (repeated) diagonal and full unstructured blocks on the diagonal, all bounded in norm by one. As an extreme case, this description does capture the models in [BerSch15, MarHen17, BocMat18] and the ones in [FerUme20, MatPro19, FazGe19, ZhaHu20], in which it is assumed that nothing aside from linearity is known about G and where Δ_tr is just one large unstructured uncertain matrix.
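The following small sketch draws random samples from a structured set of the assumed form above (repeated scalar blocks and norm-bounded full blocks); such samples are useful, e.g., for Monte-Carlo sanity checks of a design. All names are ours.

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)

def sample_uncertainty(repeated_sizes, full_sizes):
    """Draw one element of a structured set of the assumed form
    diag(delta_1 I, ..., delta_k I, D_1, ..., D_l) with |delta_i| <= 1 and
    ||D_j|| <= 1 (spectral norm)."""
    deltas = rng.uniform(-1.0, 1.0, size=len(repeated_sizes))
    blocks = [d * np.eye(r) for d, r in zip(deltas, repeated_sizes)]
    for (m, n) in full_sizes:
        D = rng.standard_normal((m, n))
        blocks.append(D / max(1.0, np.linalg.norm(D, 2)))  # scale into the unit ball
    return block_diag(*blocks)

# example: two repeated scalar blocks of sizes 2 and 1, one full 2x3 block
Delta = sample_uncertainty([2, 1], [(2, 3)])
```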
Note that the development of modern robust control has been substantially motivated by the fact that facing completely unknown systems is often not realistic. By now, LFRs are used in tandem with dedicated analysis and design tools from robust control, such as the structured singular value or integral quadratic constraints (IQCs) [MegRan97], which permit to accurately exploit the fine structure of the unknown Δ_tr.
Therefore, in view of their modeling power, LFRs provide an ideal setting to incorporate prior structural knowledge about a system (through H) together with unknown to-be-learnt components (through the elements of 𝚫).
II-C Safety
Clearly, guaranteeing stability is a critical issue in learning-based approaches, since probing the system with gains that are not stabilizing can lead to catastrophes. In contrast to many other approaches as, e.g., in [DriEng17, CalSey15, RohTri18], and aligned with [MarHen17], we propose to only select controllers that are guaranteed to be robustly stabilizing, i.e., that are taken from the set
𝒦 := { K : (Δ ⋆ H) ⋆ K is stable for all Δ ∈ 𝚫 }.    (6)
This set is typically much smaller than the set of controllers that are merely required to stabilize G. However, since G is unknown and since we can only rely on the prior knowledge Δ_tr ∈ 𝚫, there is no other choice than to pick gains from 𝒦 in order to ensure a safe operation of the system in closed loop. The minimal value of c over the set 𝒦 is related to the cost of interest as
inf_{K stabilizing G} c(K) ≤ inf_{K ∈ 𝒦} c(K),    (I1)
in which the gap reflects the price to be paid for safety.
II-D Motivation for Controller Parametrizations
For optimizing the cost c it is highly beneficial, and an often seen strategy, to parametrize a family of test controllers by a few parameters before applying an optimization algorithm, especially if the ambient space of controller gains has a large dimension [MarHen16, RobMan11, BanCal17]. Formally, such a parametrization is a mapping θ ↦ K_θ with a domain Θ that is contained in a low-dimensional ambient space and chosen in order to render the gap in the inequality
inf_{K stabilizing G} c(K) ≤ inf_{θ ∈ Θ} c(K_θ)    (I2)
as small as possible. Then, the idea is to minimize the surrogate cost θ ↦ c(K_θ) over Θ instead of determining the minimum of the original cost c. Since the former minimization problem is formulated in a low-dimensional space, it is expected to require substantially fewer evaluations of the cost function for its (approximate) solution. The gap in (I2) constitutes the price to be paid for this reduction of complexity and is rarely analyzed in the literature. We stress that it is instrumental to choose a parametrization whose values are contained in 𝒦 for reasons of safety. Then the gap in (I2) can even be more precisely identified as the sum of that in (I1) and the one in
inf_{K ∈ 𝒦} c(K) ≤ inf_{θ ∈ Θ} c(K_θ).    (I3)
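With the labels used above and under the assumption that every K_θ indeed lies in 𝒦, the three inequalities chain together, which makes the claimed decomposition of the gap explicit:

```latex
\inf_{K \text{ stabilizes } G} c(K)
  \;\le\; \underbrace{\inf_{K \in \mathcal{K}} c(K)}_{\text{right-hand side of (I1)}}
  \;\le\; \underbrace{\inf_{\theta \in \Theta} c(K_\theta)}_{\text{right-hand side of (I3)}},
```

so the gap in (I2) is the sum of the gap in (I1) (the price of safety) and the gap in (I3) (the price of the parametrization).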
II-E Main Contributions
For some index set Θ, we propose a novel parametrization θ ↦ K_θ of auspicious robustly stabilizing controllers based on a partition of 𝚫. It features an a priori safety guarantee, without the need to ensure this property through the employed optimization algorithm as proposed, e.g., in [ZhaHu20]. For its construction, we use advanced robust control techniques that explicitly take the available prior knowledge into account. Based on this parametrization, we show how experimental controller probing allows for controlling the size of the gap in (I3) by varying the coarseness of the partition of 𝚫, and how to even reduce the gap in (I1) by systematically decreasing the size of 𝚫 without endangering safety.
In comparison to a standard robust design, which does not utilize data from closed-loop experiments, our approach naturally generates safe controllers with improved performance on the real plant G.
III PARAMETRIZATION OF TEST CONTROLLERS
III-A Construction of the Controller Parametrization
Let us choose an index set Θ and subsets 𝚫_θ ⊆ 𝚫 of the uncertainty set that form the partition
𝚫 = ⋃_{θ ∈ Θ} 𝚫_θ.
With this partition, we construct the parametrization θ ↦ K_θ based on the rationale to render c(K_θ) as small as possible for at least one index θ, since this leads to the best possible reduction of the gap in (I3). The proposed parametrization assigns to θ a controller K_θ ∈ 𝒦 which reduces the worst-case performance on 𝚫_θ as much as possible. This means that we are facing a robust multi-objective synthesis problem involving robust stability w.r.t. 𝚫 and worst-case performance w.r.t. 𝚫_θ. Such problems are usually nonconvex as well as nonsmooth and thus hard to solve systematically. Still, it is possible to compute good upper bounds on the corresponding optimal value by solving a linear SDP if relying on so-called multiplier relaxations in robust control. One such relaxation is given in Theorem 1 and requires the specification of a set 𝐏 of real symmetric matrices with an LMI description such that
[Δ; I]ᵀ P [Δ; I] ⪰ 0 for all Δ ∈ 𝚫 and all P ∈ 𝐏,
where [Δ; I] denotes the column-wise stacking of Δ and the identity matrix.
We also assume that such multiplier classes are available as 𝐏 for 𝚫 and as 𝐏_θ for the partition members 𝚫_θ. A more detailed discussion with concrete choices for such multiplier sets can be found in [Sch05, SchWei00].
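To make the requirement on such multiplier classes concrete, the following snippet spells out one simple admissible family for a single scalar parameter block δ ∈ [a, b], under the sign convention stated above; it is only an illustrative choice and not one of the richer classes from [Sch05, SchWei00].

```latex
% One admissible multiplier family for a scalar parameter delta in [a, b]:
\mathbf{P}_{[a,b]}
  := \left\{ q \begin{pmatrix} -2 & a+b \\ a+b & -2ab \end{pmatrix} : \; q \ge 0 \right\},
\qquad
\begin{pmatrix} \delta \\ 1 \end{pmatrix}^{\!\top} P
\begin{pmatrix} \delta \\ 1 \end{pmatrix}
  = -2q\,(\delta - a)(\delta - b) \;\ge\; 0
  \quad \text{for all } \delta \in [a,b],\; P \in \mathbf{P}_{[a,b]}.
```

The family is described by the single LMI constraint q ≥ 0 and can therefore be used directly within a semi-definite program.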
Theorem 1
For fixed θ ∈ Θ, consider the system of LMIs
(7a)
(7b)
in decision variables that comprise a Lyapunov certificate, a transformed controller gain, the performance level γ and multipliers P ∈ 𝐏 and P_θ ∈ 𝐏_θ, formulated with suitable abbreviations for the describing matrices of H. If these LMIs are feasible, the corresponding controller gain K satisfies K ∈ 𝒦 and sup_{Δ ∈ 𝚫_θ} ‖(Δ ⋆ H) ⋆ K‖∞ ≤ γ.
The proof of this result is found in [SchWei00]. It shows that
γ_θ := inf { γ : the LMIs (7) are feasible } ≥ inf_{K ∈ 𝒦} sup_{Δ ∈ 𝚫_θ} ‖(Δ ⋆ H) ⋆ K‖∞    (8)
is satisfied for every θ ∈ Θ.
All this leads us to the construction of the parametrization as follows: For some fixed small ε > 0 and each θ ∈ Θ, we assign to θ some gain K_θ obtained from Theorem 1 with
sup_{Δ ∈ 𝚫_θ} ‖(Δ ⋆ H) ⋆ K_θ‖∞ ≤ γ_θ + ε.    (9)
We emphasize that both γ_θ and a corresponding gain K_θ can be computed by solving a standard semi-definite program. Still note that, in general, the infimum γ_θ is not attained (no optimal controller exists), which motivates the introduction of ε.
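The LMIs (7) themselves are not reproduced here, but the following sketch illustrates the type of semi-definite program involved: a standard nominal H∞ state-feedback synthesis in cvxpy, i.e., a simplified stand-in for a single model rather than the robust multi-objective relaxation with multipliers from 𝐏 and 𝐏_θ. The matrix names follow our reconstruction of (1) and the helper name is ours.

```python
import numpy as np
import cvxpy as cp

def nominal_hinf_statefeedback(A, Bd, Bu, Ce, Ded, Deu, eps=1e-6):
    """Standard nominal H-infinity state-feedback synthesis as an LMI problem:
    minimize gamma over X > 0 and Z such that the bounded-real-lemma synthesis
    LMI holds; the gain is recovered as K = Z X^{-1}."""
    n, nd, ne, nu = A.shape[0], Bd.shape[1], Ce.shape[0], Bu.shape[1]
    X = cp.Variable((n, n), symmetric=True)
    Z = cp.Variable((nu, n))
    gam = cp.Variable(nonneg=True)

    AXBZ = A @ X + Bu @ Z                 # closed-loop "A X" block
    CXDZ = Ce @ X + Deu @ Z               # closed-loop "C X" block
    lmi = cp.bmat([
        [AXBZ + AXBZ.T, Bd,                CXDZ.T],
        [Bd.T,          -gam * np.eye(nd), Ded.T],
        [CXDZ,          Ded,               -gam * np.eye(ne)],
    ])
    lmi = 0.5 * (lmi + lmi.T)             # symmetrize for the PSD constraint
    constraints = [X >> eps * np.eye(n),
                   lmi << -eps * np.eye(n + nd + ne)]
    problem = cp.Problem(cp.Minimize(gam), constraints)
    problem.solve(solver=cp.SCS)
    K = Z.value @ np.linalg.inv(X.value)
    return K, gam.value
```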
In the sequel, we abbreviate the surrogate cost function resulting from the parametrization as
ĉ(θ) := c(K_θ) for θ ∈ Θ.
Further, let us note at this point that Δ_tr ∈ 𝚫_θ for some index θ clearly implies
ĉ(θ) = ‖(Δ_tr ⋆ H) ⋆ K_θ‖∞ ≤ sup_{Δ ∈ 𝚫_θ} ‖(Δ ⋆ H) ⋆ K_θ‖∞ ≤ γ_θ + ε.    (10)
Remark 2
An extreme case is the finest partition into singletons, i.e., Θ = 𝚫 and 𝚫_θ = {θ} for every θ ∈ 𝚫; the parametrization then assigns to each point θ of the uncertainty set a robustly stabilizing gain K_θ that (approximately) minimizes the nominal performance ‖(θ ⋆ H) ⋆ K‖∞.
Remark 3
As a key difference between the cost c and its surrogate ĉ, the domain of the former consists of the only implicitly defined set of (robustly) stabilizing controllers, while the latter can be evaluated directly. In particular, for Θ = 𝚫 as in Remark 2, ĉ is simply defined on 𝚫.
III-B Application of the Controller Parametrization
After having introduced the controller parametrization, the conceptual algorithm of this paper reads as follows. For each θ ∈ Θ, we can implement the controller K_θ on the system, since it is assured to be stabilizing for G, and measure the cost ĉ(θ) = c(K_θ). A mere minimization over Θ then leads to an optimal index θ* and controller K_θ*, and inequality (I2) now reads as
inf_{K stabilizing G} c(K) ≤ c(K_θ*) = min_{θ ∈ Θ} ĉ(θ).    (11)
Fine partitions of 𝚫 lead to large index sets Θ. Instead of considering all θ ∈ Θ, we can take fewer (random) samples of Θ and obtain a (rough) approximation of min_{θ ∈ Θ} ĉ(θ). In particular, for the partition into singletons as described in Remark 2, one can directly employ a whole variety of smarter (derivative-free) sampling and optimization strategies, such as the Bayesian optimization involving Gaussian processes discussed in [BerSch16, MarHen17].
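As a minimal sketch of such a sampling strategy, the following routine probes a random subset of the safe controllers and keeps the best index found; `measure_cost` is a hypothetical stand-in for one closed-loop experiment, and a Bayesian optimizer would simply replace the random choice of indices.

```python
import numpy as np

def random_search(indices, measure_cost, budget=20, seed=1):
    """Probe a random subset of the safe controllers K_theta on the plant and
    return the best index together with all recorded costs.
    `measure_cost(theta)` stands for one experiment returning c(K_theta)."""
    rng = np.random.default_rng(seed)
    probed = rng.choice(len(indices), size=min(budget, len(indices)), replace=False)
    costs = {indices[i]: measure_cost(indices[i]) for i in probed}
    best = min(costs, key=costs.get)
    return best, costs
```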
Instead of considering a single (fine) partition, one can as well start from a coarse partition of 𝚫 and propose adaptive refinement strategies which generate a sequence of controller parametrizations as follows. Given the current partition indexed by Θ, determine an index θ* minimizing ĉ over Θ. Then generate a partition {𝚫'_τ : τ ∈ Θ'} of the member 𝚫_θ* in order to obtain a refined partition of the original set as
𝚫 = ( ⋃_{θ ∈ Θ, θ ≠ θ*} 𝚫_θ ) ∪ ( ⋃_{τ ∈ Θ'} 𝚫'_τ ).    (12)
This refined partition yields a new parametrization with a corresponding new surrogate cost and some next optimal index. By construction, the refined family again forms a partition of 𝚫, and the minimal value of the surrogate cost cannot increase as long as the previously probed controllers are kept among the candidates. This step can be iterated in order to further decrease the value of the surrogate cost. In Section IV we propose a specific algorithm based on this approach which involves, in particular, a concrete strategy for refining given partitions.
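One generic way to realize the refinement step (12) when the partition members are hyperrectangles in parameter space is sketched below; note that this full product refinement grows exponentially with the number of parameters, which is exactly what Algorithm 1 in Section IV avoids by refining one coordinate at a time. The function name is ours.

```python
from itertools import product

def refine_member(box, splits=2):
    """Refine one partition member given as a hyperrectangle
    box = [(lo_1, hi_1), ..., (lo_p, hi_p)] by splitting every coordinate
    interval into `splits` equal parts; returns the list of sub-boxes."""
    edges = [[(lo + k * (hi - lo) / splits, lo + (k + 1) * (hi - lo) / splits)
              for k in range(splits)]
             for (lo, hi) in box]
    return [list(cell) for cell in product(*edges)]

# example: bisect a 2-parameter member into 4 sub-members
sub_members = refine_member([(-1.0, 0.0), (0.5, 1.0)])
```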
III-C Reducing the Gap in Inequality (I3)
Our setup allows for identifying the sources of the gap in (I3) and permits us to generate systematic refinements towards its reduction. To illustrate this issue, let us suppose that the relaxation gap in (8) is small. Then we infer (by the definition of γ_θ and for small ε) that, for each θ, the certified level γ_θ + ε achieved by K_θ is close to the optimal worst-case performance inf_{K ∈ 𝒦} sup_{Δ ∈ 𝚫_θ} ‖(Δ ⋆ H) ⋆ K‖∞ on 𝚫_θ. On the other hand, for the index θ with Δ_tr ∈ 𝚫_θ and if this member of the partition is sufficiently small, we have sup_{Δ ∈ 𝚫_θ} ‖(Δ ⋆ H) ⋆ K‖∞ ≈ ‖(Δ_tr ⋆ H) ⋆ K‖∞ = c(K) for the relevant gains K.
Hence ĉ(θ) = c(K_θ) is close to inf_{K ∈ 𝒦} c(K), which shows that the gap in (I3) is small. In conclusion, it is essential that the size of the partition member containing Δ_tr and the relaxation gap in (8) are both small. Without going into details, we emphasize that the latter can be controlled with the choices of the multiplier sets 𝐏 and 𝐏_θ, through the use of more advanced multi-objective control techniques and by applying further refinements from robust control [ArzPea00], such as incorporating S-variables [EbiPea15] or dynamic instead of static IQCs [MegRan97].
III-D Reducing the Gap in Inequality (I1)
Our approach offers the opportunity to even reduce the gap in (I1) by identifying a smaller index set Θ_r ⊆ Θ with
Δ_tr ∈ 𝚫_r := ⋃_{θ ∈ Θ_r} 𝚫_θ.    (13a)
Indeed, this is guaranteed with
Θ_r := { θ ∈ Θ : c(K_θ) ≤ γ_θ + ε },    (13b)
since any θ ∈ Θ \ Θ_r satisfies c(K_θ) > γ_θ + ε and hence Δ_tr ∉ 𝚫_θ by (10). Note that 𝚫_r can be considerably smaller than the original 𝚫, which implies that the related set 𝒦_r of robustly stabilizing controllers is (much) larger than 𝒦. Thus, replacing 𝚫 with 𝚫_r reduces the cost of safety as expressed by the gap in (I1).
This suggests repeating our design procedure for 𝚫_r, which amounts to constructing a new parametrization giving controllers (via Theorem 1) with which we can perform new closed-loop experiments to evaluate their cost. These controllers are expected to achieve (considerably) improved closed-loop performance with a smaller gap in (I2), just due to the reduction of the gap in (I1). The algorithm proposed in the next section is based on this strategy.
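A direct transcription of the reduction rule (13b), as reconstructed here, reads as follows; `measured_cost` and `gamma` are assumed to be dictionaries of experimental costs c(K_θ) and certified bounds γ_θ.

```python
def shrink_index_set(indices, measured_cost, gamma, eps):
    """Keep only the partition members that remain consistent with the
    experiments: by (10), an index theta with c(K_theta) > gamma_theta + eps
    cannot contain the true uncertainty and may be discarded, cf. (13b)."""
    return [theta for theta in indices
            if measured_cost[theta] <= gamma[theta] + eps]
```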
Remark 4
The set 𝚫_r is not guaranteed to be convex. Similarly as in [Sch01], in this case one can express it as a union of a few convex sets and modify Theorem 1 by using a robust stabilization objective for each of the individual convex sets; this purposive design comes along with an increased numerical burden.
Note that the set Θ_r is constructed based on (10), which provides an upper bound on the cost at the index θ with Δ_tr ∈ 𝚫_θ. We can also devise a lower bound which can be exploited similarly in order to further reduce Θ_r and shrink the gap in (I1). To this end, observe that standard nominal design permits to numerically determine
f(Δ) := inf_K ‖(Δ ⋆ H) ⋆ K‖∞ for any fixed Δ ∈ 𝚫.
Then Δ_tr ∈ 𝚫_θ indeed yields the lower bound
inf_{Δ ∈ 𝚫_θ} f(Δ) ≤ f(Δ_tr) = inf_{K stabilizing G} c(K).
Note that this lower bound is not cheap to compute, as it involves a numerical minimization of f on 𝚫_θ for each considered θ. In contrast, the upper bound in (10) is essentially obtained for free while constructing the map θ ↦ K_θ.
IV AN ALGORITHM
In this section we propose a concrete algorithm that works in higher dimensions and aims to exploit (13). It involves the uncertainty box 𝚫 = [a_1, b_1] × ⋯ × [a_p, b_p], where [a_i, b_i] are given intervals for the unknown parameters δ_i, and in which we use the abbreviation δ = (δ_1, …, δ_p). The related Algorithm 1 is motivated by coordinate descent, which is currently gaining popularity due to its use in machine learning applications.
In its i-th iteration, the first loop of the algorithm generates a partition of 𝚫 by taking a uniform partition (into N cells) only of the interval [a_i, b_i] related to the parameter δ_i. In line 6, it exploits (13) in order to shrink [a_i, b_i] to a new interval and to generate a reduced parameter set that is guaranteed to contain the true parameter vector; moreover, this set still has the structure of a hyperrectangle. In line 7, and as input to the second loop, we store those intervals for which the best performance level is observed. Running this loop p times requires performing pN experiments.
In the second loop, the algorithm adaptively refines those subsets of 𝚫 for which the best closed-loop performance was achieved. This proceeds as in Section III-B, by generating sub-partitions along each parameter axis. Again, r runs of this loop require rpN experimental cost evaluations. In particular, for fixed N and r, the number of evaluations grows linearly in the number p of unknown parameters, which renders the algorithm applicable even if p is large.
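The following skeleton sketches the two loops as we read the description above; `synthesize(cell, box)` and `measure_cost(K)` are hypothetical stand-ins for the robust design of Theorem 1 (returning a gain that is robustly stabilizing on the box together with a certified bound on the cell) and for one closed-loop experiment, respectively. Details such as how the best intervals are stored may differ from the actual Algorithm 1.

```python
import numpy as np

def explore(box, n_cells, n_rounds, synthesize, measure_cost, eps=1e-3):
    """Rough skeleton of the two-loop exploration strategy described above.
    `box` is a list of parameter intervals [lo, hi]."""
    box = [list(iv) for iv in box]
    best_cells = []
    # First loop: scan one coordinate at a time and shrink its interval via (13)
    for i in range(len(box)):
        lo, hi = box[i]
        grid = np.linspace(lo, hi, n_cells + 1)
        results = []
        for k in range(n_cells):
            cell_box = [list(iv) for iv in box]
            cell_box[i] = [grid[k], grid[k + 1]]
            K, gamma = synthesize(cell_box, box)
            results.append((cell_box[i], gamma, measure_cost(K)))
        consistent = [c for (c, g, cost) in results if cost <= g + eps]
        if consistent:                      # shrink to a smaller hyperrectangle
            box[i] = [min(c[0] for c in consistent), max(c[1] for c in consistent)]
        best_cells.append(min(results, key=lambda r: r[2])[0])
    # Second loop: adaptively refine the best cell of each coordinate
    for _ in range(n_rounds):
        for i in range(len(box)):
            lo, hi = best_cells[i]
            grid = np.linspace(lo, hi, n_cells + 1)
            scored = []
            for k in range(n_cells):
                cell_box = [list(iv) for iv in box]
                cell_box[i] = [grid[k], grid[k + 1]]
                K, _ = synthesize(cell_box, box)
                scored.append((cell_box[i], measure_cost(K)))
            best_cells[i] = min(scored, key=lambda s: s[1])[0]
    return box, best_cells
```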
V NUMERICAL EXAMPLES
For numerical illustrations, we consider several modified examples from the library COMPleib [Lei04] which, unfortunately, does not comprise robust control examples. We take the system matrices of (1.1) from [Lei04] and choose the remaining matrices in order to define H in (4) as
(14)