Online Learning Robust Control of Nonlinear Dynamical Systems
Abstract
In this work we address the problem of online robust control of nonlinear dynamical systems perturbed by disturbances. We study the problem of attenuation of the total cost over a duration $T$ in response to the disturbances. The attenuation is defined as the ratio of the deviation of the total cost from the cost of the desired trajectory to the total energy in the disturbance. This is a harder problem than dynamic regret minimization because the comparator in the attenuation problem is zero total cost. We consider the setting where the cost function at any particular time is a general continuous function and adversarial, and the disturbance is adversarial and bounded at every point of time. Our goal is to design a controller that can learn and adapt online to achieve a certain level of attenuation. We analyse two cases: (i) when the system is known and (ii) when the system is unknown. We measure the performance of the controller by the deviation of the controller's cost for a sequence of cost functions with respect to an attenuation $\gamma$, denoted by $R_T(\gamma)$. We propose an online controller and present guarantees for this metric when the maximum possible attenuation is given by $\gamma_c$, a system constant. We show that when the controller has a preview of the cost functions and the disturbances for a short duration of time and the system is known, $R_T(\gamma) \le O(1)$ when $\gamma \ge \gamma_c$. We then show that when the system is unknown the proposed controller, with a preview of the cost functions and the disturbances for a short horizon, achieves $R_T(\gamma) \le O(T_s + \epsilon(T_s)\,T)$ when $\gamma \ge \gamma_c$, where $\epsilon(T_s)$ is the accuracy of a given nonlinear estimator and $T_s$ is the duration of the initial estimation period. We also characterize the lower bound on the required prediction horizon for these guarantees to hold in terms of the system constants.
1 Introduction
In this work we study the problem of regulating a dynamical system perturbed by disturbances. Restating [21], the standard $\mathcal{H}_2$ and $\mathcal{H}_\infty$ control approaches for this problem are designed for a specific performance objective: (i) the expected cost for random disturbances drawn from a specific distribution in the $\mathcal{H}_2$ case and (ii) the worst-case cost for adversarial disturbances in the $\mathcal{H}_\infty$ case. These approaches can perform poorly if the disturbance belongs to a different class [21].
This naturally leads to the question of whether the control can be dynamically adjusted to achieve desirable performance for any disturbance realization. Recently, this problem has been studied in online control from the point of view of regret minimization [3, 27, 19]. In this framework, for a dynamically evolving system with state $x_t$, the online control policy is chosen so as to minimize the regret with respect to a comparator class of policies $\Pi$:
$$\sum_{t=1}^{T} c_t(x_t, u_t) - \min_{\pi \in \Pi} \sum_{t=1}^{T} c_t(x_t^{\pi}, u_t^{\pi}).$$
In contrast, [21] approach this problem from the point of view of minimizing the dynamic regret. The key difference is that the regret is defined against the best sequence of control actions in hindsight, not a specific class of policies. The consideration of this regret is important from two perspectives: (i) the comparison is not restricted to a certain class of policies but is against the globally optimal dynamic sequence of actions, and (ii) as stated in [21], the online controller designed for this regret is more flexible, i.e., it can respond to any type of disturbance, whereas linear feedback controllers can only be effective against either stochastic disturbances or adversarial disturbances. The dynamic regret is defined as follows:
$$\sum_{t=1}^{T} c_t(x_t, u_t) - \min_{u_1, \ldots, u_T} \sum_{t=1}^{T} c_t(x_t, u_t).$$
In [21], the authors derive the regret sub-optimal controller from the point of view of the dynamic regret for a known sequence of time-varying quadratic cost functions for a known time-varying linear dynamical system. They show that for a specified attenuation $\gamma$ the regret sub-optimal controller satisfies
$$\text{dynamic regret} \le \gamma \sum_{t=1}^{T} \|w_t\|^2,$$
provided such a controller exists for the specified $\gamma$. While these settings study the performance with respect to a class of control policies, they do not provide guarantees for the actual performance: the absolute attenuation of the cost of the system by the online adaptive controller in response to the disturbance. The central problem we study in this work is online guarantees for this problem of attenuation. The attenuation is a harder notion of performance than dynamic regret because a bound on the attenuation gives a bound on the ratio of the dynamic regret to the total energy in the disturbance, defined as $\sum_{t=1}^{T}\|w_t\|^2$, but not vice versa. We study the setting where the controller may not have complete knowledge of the sequence of cost functions and the system. Hence, the controller has to learn online to attenuate the cost of the system. How much can the online adaptive controller attenuate the total cost? At what rate does the total cost converge to this attenuation level with $T$? To answer these questions, we study the deviation of the controller's cost for a sequence of cost functions $c_t$, of the state of the system $x_t$ and the control input $u_t$, with respect to an attenuation $\gamma$:
$$R_T(\gamma) = \sum_{t=1}^{T} c_t(x_t, u_t) - \gamma \sum_{t=1}^{T} \|w_t\|^2.$$
The metric as stated is a function of $\gamma$ because, for the same sequence of control actions, the deviation of the cumulative cost from the second term depends on $\gamma$. We state the online robust control problem as follows: find an online control policy and the smallest attenuation $\gamma$ such that $R_T(\gamma)$ is bounded sublinearly in $T$ for any admissible disturbance sequence.
The smallest such $\gamma$ gives the maximum achievable attenuation, and the corresponding $R_T(\gamma)$ quantifies the convergence rate to this attenuation. This is the central question we study in this work. The problem we study is a harder notion of performance than dynamic regret because the bound on the cost derived from $R_T(\gamma)$ yields a bound on the dynamic regret, and not vice versa.
In this work we present algorithms and guarantees for the questions posed above. We consider a setting similar to [3]. The system is a general nonlinear dynamical system perturbed by a disturbance. The sequence of cost functions and the disturbances are adversarial, and the disturbance at any point of time is bounded. We make minimal assumptions on the cost functions: we assume that they are continuous functions lower bounded by a general non-negative function of the deviation from the desired trajectory. We assume that the desired trajectory $\{x^d_t\}$ is given. Most distinctly, the cost functions need not be convex.
We present our guarantees for the achievable attenuation under the condition that the optimal cost-to-go function for the horizon $M$ at time $t$ is upper bounded by a term depending only on the initial state plus $\gamma_c$ times the energy of the disturbances over the horizon. This upper bound has two terms: (i) the first term is the contribution of the initial state to the optimal cost-to-go over the horizon starting at $t$, and (ii) the second term is the contribution of the disturbance over the same period. The factor $\gamma_c$ is the maximum achievable attenuation. We note that the optimal cost-to-go for strongly convex quadratic cost functions for linear dynamical systems trivially satisfies such an upper bound. Hence, the setting we present is sufficiently general. This assumption does not in any way restrict the problem; it is a necessary assumption because the problem we address in this work has a solution only if a finite attenuation is achievable.
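To illustrate why the linear-quadratic case satisfies such a bound, consider the following sketch in our own notation, under the assumption that there exists $P \succeq 0$ satisfying an $\mathcal{H}_\infty$-type dissipation inequality. For $x_{t+1} = Ax_t + Bu_t + w_t$ with cost $c(x, u) = x^\top Q x + u^\top R u$, suppose $V(x) = x^\top P x$ satisfies
$$V(x_{t+1}) - V(x_t) \le -\,c(x_t, u_t) + \gamma_c \|w_t\|^2 .$$
Summing over $k = t, \ldots, t+M-1$ and using $V \ge 0$ gives
$$\sum_{k=t}^{t+M-1} c(x_k, u_k) \le V(x_t) + \gamma_c \sum_{k=t}^{t+M-1} \|w_k\|^2 \le \lambda_{\max}(P)\,\|x_t\|^2 + \gamma_c \sum_{k=t}^{t+M-1} \|w_k\|^2,$$
which upper bounds the optimal cost-to-go by a term depending only on the initial state plus $\gamma_c$ times the disturbance energy, exactly the assumed form.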
1.1 Contribution
The online controller we propose and analyse is a Receding Horizon Controller (RHC) which, at time $t$, has a preview of the cost functions for a short horizon $M$. In a typical RHC setting, the controller optimizes the cost-to-go for a certain horizon and applies the first element of the computed optimal control sequence as the control input; this is repeated at every time step. The RHC approach is a well-established technique in control. Its primary advantage is that the optimization carried out at every step can incorporate any type of control input, system state and output constraints, and cost functions [35, 38, 9]. The RHC controller we propose optimizes the cost-to-go for the short horizon $M$ to compute the control input for time $t$ and applies the first element of the computed optimal control sequence. We analyse the performance of RHC over a duration $T$.
First, we present the setting where the system is known. We present (i) the performance guarantee for the Receding Horizon Controller (RHC) when the controller has a preview of the disturbance for the horizon $M$ and (ii) the performance guarantee when the controller does not have any preview of the disturbance. For the first case we show that RHC achieves $R_T(\gamma) \le O(1)$ when $\gamma \ge \gamma_c$ and the horizon length $M$ exceeds a lower bound determined by the system constants. We show that the achievable attenuation in this case is nearly optimal, i.e., arbitrarily close to $\gamma_c$. For the second case, when the preview of the disturbance is not available, we guarantee that the overall cost is bounded by a constant multiple of the maximum possible energy in the disturbance over the duration $T$. Hence, in this case the achievable attenuation is nearly $\gamma_c$ with respect to the maximum possible energy in the disturbance.
We then present the setting where the system is unknown and the disturbance preview is available. We present the analysis for systems that are Bounded Input Bounded Output (BIBO) stable with convex cost functions. We assume that the state transition function is parameterized and that only the parameter $\theta$ is unknown. We propose a generic online control framework that operates in two phases: an estimation phase followed by a control phase. The estimation phase runs for a period of $T_s$ time steps, at the end of which the online controller computes an estimate $\hat{\theta}$ of $\theta$. In the control phase, the controller is the RHC that optimizes the cost-to-go for the estimated system. We present the performance analysis for a black-box estimation procedure with accuracy given by $\|\hat{\theta} - \theta\| \le \epsilon(T_s)$, a bounded and decreasing function of $T_s$. We show that the overall online controller achieves $R_T(\gamma) \le O(T_s + \epsilon(T_s)\,T)$ when $\gamma \ge \gamma_c$ and the horizon length exceeds the same threshold as before. For linear systems, we trivially recover the results for the known system case, since the estimation is trivial when the disturbance preview is available and the system is controllable. The framework we present provides a direct characterization with which to evaluate various estimation approaches.
1.2 Notations
For a sequence (of vectors/functions) $\{z_t\}$, we denote the subsequence $z_k, \ldots, z_l$ by $z_{k:l}$. For a matrix $A$, we denote its transpose by $A^\top$ and its maximum eigenvalue by $\lambda_{\max}(A)$. The two-norm of a vector $x$ is denoted by $\|x\|$. When two matrices $A$ and $B$ are related by $A \succeq B$, it implies that $A - B$ is positive semi-definite; similarly, $A \succ B$ implies that $A - B$ is positive definite. We denote by $\mathbb{R}^n$ the $n$-dimensional Euclidean space and by $\mathbb{R}_{+}$ the positive part of the real line.
2 Preliminaries
We consider the control of a general non-linear dynamical system with disturbances in dynamics:
$$x_{t+1} = f(x_t, u_t, \theta) + w_t, \qquad (1)$$
where $f$ is a continuous function, $x_t \in \mathbb{R}^n$ is the system state, $u_t \in \mathcal{U} \subseteq \mathbb{R}^m$ is the control input, $w_t \in \mathcal{W} \subset \mathbb{R}^n$ is the disturbance, and the parameter $\theta$ can be known or unknown. We make the assumption that the controller observes the full state.
We consider the setting where a fixed-horizon preview of the future cost functions and disturbances is available to the algorithm at each time step. In particular, the control input is computed as $u_t = \pi_t\big(x_{1:t}, c_{t:t+M-1}, w_{t:t+M_w-1}\big)$, where $M$ and $M_w$ are fixed horizon lengths and $\pi_t$ is the control policy at time $t$. The goal of the agent is to select a control policy in order to minimize the cumulative cost with respect to an attenuation $\gamma$. This can be formulated as the following optimization problem:
$$\min_{\{\pi_t\}} \; R_T(\gamma), \qquad R_T(\gamma) = \sum_{t=1}^{T} c_t(x_t, u_t) - \gamma \sum_{t=1}^{T} \|w_t\|^2. \qquad (2)$$
Our goal is to characterize the optimal $R_T(\gamma)$ and the maximum achievable attenuation.
The assumptions used throughout this paper are as follows:
Assumption 1
(Disturbance) The disturbance $w_t \in \mathcal{W}$ for all $t$, where $\mathcal{W}$ is compact; hence $\|w_t\|$ is uniformly bounded. The disturbance $w_t$ is adversarial for all $t$.
Assumption 2
(Cost Function) The function $c_t$ is Lipschitz continuous for all $t$, with a uniform Lipschitz constant for all $t$, and there exists a uniform constant that bounds the cost along the desired trajectory. The cost function $c_t$ is adversarial for all $t$.
3 Related Works
The control of dynamical systems with uncertainties such as modeling errors, parametric uncertainty, and disturbances is a central challenge in control theory and has been extensively studied. There is a vast literature on control synthesis for systems with such uncertainties. The robust control literature studies the problem of feedback control design for stability and performance guarantees under modeling uncertainty and disturbances; see [46]. The adaptive control literature studies the control of systems with parametric uncertainty; see [43, 28, 7].
Online Control: The field of online adaptive control is a widely studied topic. [1] studied the online Linear Quadratic Regulator (LQR) problem with an unknown system and stochastic disturbances. The authors propose an adaptive algorithm that achieves $\tilde{O}(\sqrt{T})$ regret w.r.t. the best linear control policy, which is the optimal policy in this setting. [15] propose an efficient algorithm for the same problem that achieves a regret of $\tilde{O}(T^{2/3})$. [13] and [34] improved on this result by providing efficient algorithms with a regret guarantee of $\tilde{O}(\sqrt{T})$ for the same problem. Mania et al. [34] also establish $\tilde{O}(\sqrt{T})$-regret for the partially observed Linear Quadratic Gaussian (LQG) setting. Recently, [44] showed that $\sqrt{T}$ is the optimal regret for the online LQR problem. [12] provide an algorithm for a variant of the online LQR where the system is known and the noise is stochastic but the controller cost function is an adversarially chosen quadratic function.
[3] consider the control of a known linear dynamical system with additive adversarial disturbances and adversarial Lipschitz controller cost functions. The controller they propose is a Disturbance Response Controller (DRC). They show that their learning algorithm achieves $O(\sqrt{T})$-regret with respect to the best DRC in hindsight. In a subsequent work, [4] show that a poly-logarithmic regret is achievable for strongly convex controller costs and well-conditioned stochastic disturbances. [27] consider the same setting when the system is unknown and present an $O(T^{2/3})$-regret algorithm. Recently, [45] generalized these results to provide similar regret guarantees for the same setting with partial observation for both known and unknown systems.
Receding Horizon Control: Many receding-horizon-control-based methods have been proposed for managing disturbances and uncertainties in the system dynamics. For example, some works handle disturbances or uncertainties through robust or chance constraints [29, 22, 33, 48, 23]. Adaptive RHC techniques that adapt online when the system model is unknown have also been proposed [20, 2, 8, 47, 10]. These methods primarily focus on constraint satisfaction, stability, and in some cases performance improvement using the adapted models. In contrast to these works, we consider the non-asymptotic performance of an online robust adaptive RHC. A considerable number of papers provide performance analysis of RHC under both time-invariant costs [5, 26, 24] and time-varying costs [18, 6, 17, 25]. However, most of these studies focus on asymptotic performance.
4 Online Learning Robust Controller: Algorithms
In this section, we present the online control algorithms for the setting where the disturbance preview is available. We present separate algorithms for the case where the system is known and the case where the system is unknown. The detailed discussion for the case where the disturbance preview is not accessible is deferred to the appendix.
4.1 System is known
In this setting, we assume that the system dynamics is known, i.e., it is not parameterized by an unknown $\theta$. We assume that the algorithm has a fixed-horizon preview of the future cost functions and disturbances. In particular, at each time $t$, the algorithm has access to $c_{t:t+M-1}$ and $w_{t:t+M-1}$, in addition to the history of observations until $t$. We use a receding horizon control approach that minimizes the cost-to-go for a horizon $M$ with the previewed disturbances and cost functions. In particular, to find the control input at each time step $t$, the algorithm solves the following optimization problem:
$$\min_{u_{t:t+M-1} \in \mathcal{U}^M} \; \sum_{k=t}^{t+M-1} c_k(x_k, u_k) \quad \text{s.t. } x_{k+1} = f(x_k, u_k) + w_k, \; k = t, \ldots, t+M-1, \qquad (3)$$
where $\mathcal{U}$ is a compact set. Given that $f$ and the $c_k$ are continuous functions, the solution to the above optimization exists; please see the proof of Lemma 3 for a detailed argument. We denote the solution of this optimization by $u^*_{t:t+M-1}$ and set the control input $u_t$ to its first element, which we succinctly denote by $u^*_t$. The overall online control algorithm is given in Algorithm 1.
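To make the loop concrete, here is a minimal sketch of the receding horizon step in Eq. (3), under illustrative assumptions (a generic local solver, box control constraints, and toy scalar dynamics); it is a sketch of the idea, not the paper's Algorithm 1.

```python
# Minimal receding-horizon step for Eq. (3): optimize the M-step cost-to-go
# using the previewed costs and disturbances, then apply only the first input.
import numpy as np
from scipy.optimize import minimize

def rhc_step(f, costs, x0, w_preview, M, m, u_bound):
    """Return the first element of an (approximately) optimal control sequence.

    f: dynamics, x_{k+1} = f(x_k, u_k) + w_k
    costs: list of M previewed cost functions c_k(x, u)
    w_preview: array of shape (M, n) with previewed disturbances
    m: control dimension; u_bound: box bound defining the compact set U
    """
    def horizon_cost(u_flat):
        u_seq = u_flat.reshape(M, m)
        x, total = x0, 0.0
        for k in range(M):
            total += costs[k](x, u_seq[k])
            x = f(x, u_seq[k]) + w_preview[k]  # roll previewed dynamics forward
        return total

    res = minimize(horizon_cost, np.zeros(M * m),
                   bounds=[(-u_bound, u_bound)] * (M * m), method="L-BFGS-B")
    return res.x.reshape(M, m)[0]  # apply only the first control input

# Usage: scalar system x_{t+1} = 0.9 x_t + u_t + w_t with a quadratic cost.
f = lambda x, u: 0.9 * x + u
c = lambda x, u: float(x @ x + 0.1 * u @ u)
x, rng = np.array([1.0]), np.random.default_rng(0)
for t in range(20):
    w_preview = 0.05 * rng.standard_normal((5, 1))
    u = rhc_step(f, [c] * 5, x, w_preview, M=5, m=1, u_bound=2.0)
    x = f(x, u) + w_preview[0]
```

Since the costs need not be convex, a local solver only returns a local optimum; the theoretical guarantees assume the optimization in Eq. (3) is solved exactly.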
A.1. Discussion
RHC algorithms are in general computationally demanding because of the explicit optimization that is carried out at every time step. The complexity could arise from system models that are too complicated to be used directly for prediction, and sometimes from the constraints in the problem. If the optimization is convex and the system is linear, then the optimization can be solved efficiently via primal-dual interior point frameworks [49]. Complex nonlinear RHC optimizations such as Eq. (3) can be solved efficiently via online linearization of the nonlinear model. Such approaches have been shown to be empirically very effective for a wide range of nonlinear control applications [30].
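As an illustration of the linearization idea (our sketch, not a specific method from [30]), the following routine linearizes the dynamics around a nominal trajectory by finite differences and takes one Gauss-Newton-style step on a quadratic horizon cost:

```python
# One online-linearization pass for a nonlinear RHC problem with quadratic
# stage cost x'Qx + u'Ru: linearize around a nominal input sequence, then
# solve the resulting linear-quadratic correction in closed form.
import numpy as np

def linearize(f, x, u, eps=1e-6):
    """Finite-difference Jacobians A = df/dx, B = df/du at (x, u)."""
    fx = f(x, u)
    A = np.column_stack([(f(x + eps * e, u) - fx) / eps for e in np.eye(len(x))])
    B = np.column_stack([(f(x, u + eps * e) - fx) / eps for e in np.eye(len(u))])
    return A, B

def linearized_rhc_step(f, x0, u_nom, w_preview, Q, R):
    """Return an improved input sequence u_nom + du for the horizon."""
    M, m = u_nom.shape
    n = len(x0)
    xs = [x0]                                   # nominal trajectory roll-out
    for k in range(M):
        xs.append(f(xs[k], u_nom[k]) + w_preview[k])
    As, Bs = zip(*[linearize(f, xs[k], u_nom[k]) for k in range(M)])
    G = np.zeros((M * n, M * m))                # stacked map: dx = G du
    for k in range(M):
        Phi = np.eye(n)
        for j in range(k, -1, -1):              # effect of du_j on dx_{k+1}
            G[k*n:(k+1)*n, j*m:(j+1)*m] = Phi @ Bs[j]
            Phi = Phi @ As[j]
    Qb, Rb = np.kron(np.eye(M), Q), np.kron(np.eye(M), R)
    xbar, ubar = np.concatenate(xs[1:]), u_nom.ravel()
    H = G.T @ Qb @ G + Rb                       # quadratic model in du
    g = G.T @ Qb @ xbar + Rb @ ubar
    du = np.linalg.solve(H, -g)
    return u_nom + du.reshape(M, m)
```

Iterating this step, re-linearizing around the improved sequence, is the usual way such schemes approximate the nonlinear optimum in practice.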
4.2 Online Control Framework for Unknown System
For the unknown system case, we assume the system dynamics is parametrized by an unknown parameter $\theta$, i.e.,
$$x_{t+1} = f(x_t, u_t, \theta) + w_t, \qquad \theta \in \Theta \text{ unknown}.$$
The online controller we propose operates in two phases: (i) an estimation phase and (ii) a control phase. The estimation phase runs for a duration of $T_s$ time steps. We assume that the estimation algorithm is such that, at the end of the estimation phase, the estimated parameter $\hat{\theta}$ satisfies $\|\hat{\theta} - \theta\| \le \epsilon(T_s)$, a bounded and decreasing function of $T_s$. In the discussion below, we comment on the estimation procedures for specific instances of the system. In this work we present the performance analysis of the online controller without restricting the analysis to a specific estimation procedure; the performance guarantee is presented in terms of the accuracy $\epsilon(T_s)$ of a given estimation algorithm. Thus, the analysis we present can be extended to any estimation procedure.
In the control phase, the controller is the online receding horizon controller, similar to the known system case. At each instant, the controller optimizes the cost-to-go for the horizon $M$ for the realized disturbance preview on the estimated system:
$$\min_{u_{t:t+M-1} \in \mathcal{U}^M} \; \sum_{k=t}^{t+M-1} c_k(x_k, u_k) \quad \text{s.t. } x_{k+1} = f(x_k, u_k, \hat{\theta}) + w_k, \; k = t, \ldots, t+M-1, \qquad (4)$$
We later argue that the solution to this optimization exists; please see the proof of Lemma 3 for a detailed argument. We denote the solution of this optimization by $\hat{u}^*_{t:t+M-1}$ and set the control input $u_t$ to its first element, which we succinctly denote by $\hat{u}^*_t$.
B.1. Discussion
In this work, we do not specify the estimation algorithm; rather, we present a general framework and performance guarantee for a given black-box estimation procedure whose accuracy is characterized by $\epsilon(T_s)$, where $T_s$ is the duration of the estimation phase.
The estimation procedure is trivially simple when the system is linear and controllable. In this case, the system matrices can be estimated exactly in finite time by generating a sequence of control inputs that is persistently exciting over a finite interval (see [37] for the definition of persistence of excitation) and using least squares to compute the estimate of the system matrices. In this case the estimation error is zero after a finite $T_s$, and hence the online controller achieves the same performance as in the known system case.
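For concreteness, here is a sketch of that estimation phase for a small linear system: excite with an i.i.d. random (hence persistently exciting) input and recover $(A, B)$ by least squares. The signal choice and noise level are our illustrative assumptions.

```python
# Estimation phase for a linear system: persistently exciting inputs plus
# ordinary least squares recover the system matrices from T_s transitions.
import numpy as np

def estimate_linear_system(A_true, B_true, T_s, noise=0.01, seed=0):
    rng = np.random.default_rng(seed)
    n, m = B_true.shape
    X, U, X_next = [], [], []
    x = np.zeros(n)
    for _ in range(T_s):
        u = rng.standard_normal(m)                # persistently exciting input
        x_new = A_true @ x + B_true @ u + noise * rng.standard_normal(n)
        X.append(x); U.append(u); X_next.append(x_new)
        x = x_new
    Z = np.hstack([np.array(X), np.array(U)])     # regressors [x_t, u_t]
    Theta, *_ = np.linalg.lstsq(Z, np.array(X_next), rcond=None)
    return Theta[:n].T, Theta[n:].T               # A_hat, B_hat

A = np.array([[0.9, 0.1], [0.0, 0.8]]); B = np.array([[0.0], [1.0]])
A_hat, B_hat = estimate_linear_system(A, B, T_s=500)
print(np.linalg.norm(A_hat - A), np.linalg.norm(B_hat - B))  # small errors
```

In the setting above, the previewed disturbance can be subtracted from $x_{t+1}$ before the regression, which is what makes exact finite-time estimation possible; the noise term here simply illustrates the robustness of the least-squares fit.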
The estimation can also be simple when the nonlinear system is linear in its parameters [31], i.e., has dynamics of the form $x_{t+1} = \theta^\top \phi(x_t, u_t) + w_t$ for a known function $\phi$. Under certain conditions, an estimator can be designed to estimate $\theta$ in finite time [31]. We note that a condition similar to persistence of excitation is required in this case too.
Nonlinear systems are in general complex to estimate. A typical approach to estimating nonlinear systems is via the Koopman operator [11, 36]. The Koopman operator essentially specifies the transition in a lifted space of observables and can be approximately represented as a point in a linear vector space. The main disadvantage is that this representation requires a suitable set of basis functions to be known. Given this representation, linear estimation via least squares can be applied to estimate the Koopman operator approximately [11].
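A minimal sketch of this idea, in the spirit of extended dynamic mode decomposition (EDMD): lift the state through an assumed dictionary of observables, then fit the lifted linear map by least squares. The dictionary and the toy dynamics are our illustrative assumptions.

```python
# Approximate Koopman estimation via EDMD: fit K with psi(x_{t+1}) ~ K psi(x_t).
import numpy as np

def lift(x):
    """Dictionary of observables psi(x); a suitable basis is assumed known."""
    return np.array([1.0, x[0], x[1], x[0]**2, x[0]*x[1], x[1]**2])

def edmd(pairs):
    """Least-squares fit of the lifted transition over snapshot pairs (x, x')."""
    Psi = np.array([lift(x) for x, _ in pairs])
    Psi_next = np.array([lift(y) for _, y in pairs])
    K, *_ = np.linalg.lstsq(Psi, Psi_next, rcond=None)
    return K.T   # psi(x_{t+1}) ~ K @ psi(x_t)

# Snapshot data from a weakly nonlinear map.
rng = np.random.default_rng(1)
step = lambda x: np.array([0.9 * x[0] + 0.1 * x[1]**2, 0.8 * x[1]])
xs = [rng.standard_normal(2) for _ in range(300)]
K = edmd([(x, step(x)) for x in xs])
```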
The general version of the online control algorithm for the unknown system case is shown in Algorithm 2.
5 Analysis and Results
In this section we present the performance analysis for the online controllers presented above. We first present the case where the system is known and then analyze the unknown case.
5.1 Analysis for Known System Case
We define $V_t$ as the optimal value function of the optimization problem given in Eq. (3). Hence, $V_t(x_t)$ is given by
$$V_t(x_t) = \min_{u_{t:t+M-1} \in \mathcal{U}^M} \sum_{k=t}^{t+M-1} c_k(x_k, u_k), \quad \text{s.t. } x_{k+1} = f(x_k, u_k) + w_k.$$
We note that this function exists and is continuous given that the solution to the optimization exists and is continuous (see the proof of Lemma 3 for a more detailed argument). In the next lemma, we characterize the incremental change of the value of this function as the system evolves. We present this characterization for the class of systems whose optimal cost-to-go satisfies the upper bound discussed in the introduction. This represents a sufficiently general class of systems; it includes strongly convex quadratic cost functions for linear dynamical systems. As stated in the introduction, the first term in the bound represents the contribution to the cost of the initial state and the second term bounds the cost of the deviation effected by the disturbances. Here $\gamma_c$ is the maximum achievable attenuation for this system.
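In our notation, one concrete form of this condition, consistent with the description above (the precise constants are immaterial to the discussion), is
$$V_t(x_t) \le \beta\, c_t(x_t, 0) + \gamma_c \sum_{k=t}^{t+M-1} \|w_k\|^2,$$
with $\beta > 0$ a system constant: the first term depends only on the state at time $t$, and the second collects the disturbance energy over the horizon at the attenuation level $\gamma_c$.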
Lemma 1
Suppose the optimal cost-to-go satisfies the stated upper bound and the disturbance preview is accessible. Then there exist constants $\lambda \in (0,1]$ and $\gamma_v > 0$ such that, for each $t$, the closed-loop system under the RHC control law satisfies
$$V_{t+1}(x_{t+1}) - V_t(x_t) \le -\lambda\, c_t(x_t, u_t) + \gamma_v \sum_{k=t}^{t+M} \|w_k\|^2,$$
where $\lambda$ and $\gamma_v$ depend only on the system constants and $\gamma_c$.
Please see the Appendix for the proof. The result states that the change in the value of the optimal cost-to-go when the system evolves from $x_t$ to $x_{t+1}$ is bounded by (i) the cost of the state at time $t$ and (ii) a constant times the total energy of the disturbances over the preview horizon starting from $t$. Next we use the above lemma to characterize an upper bound on the per-step cost incurred by the online controller.
Lemma 2
Under the conditions of Lemma 1, suppose the horizon $M$ exceeds the lower bound determined by the system constants and $\gamma \ge \gamma_c$. Then, for any time $t$, the per-step cost of the closed-loop system at $t$ is bounded by a term that decays exponentially with $t$ from the initial state cost plus an exponentially weighted sum of the energies of the past disturbances.
Please see the Appendix for the proof. The result implies that the cost at time $t$ is bounded by two terms: (i) the contribution of the cost of the initial state, which decays exponentially with $t$, and (ii) the contribution of the disturbances seen by the system so far. From the form of the coefficients it is clear that the online controller exponentially diminishes the contribution of disturbances from earlier times. In the next theorem we characterize the performance of the online controller.
Theorem 1
Under the conditions of Lemma 1, for $\gamma \ge \gamma_c$, a horizon $M$ exceeding the lower bound determined by the system constants, and any disturbance sequence satisfying Assumption 1, the total cost incurred by the closed-loop system satisfies
$$R_T(\gamma) = \sum_{t=1}^{T} c_t(x_t, u_t) - \gamma \sum_{t=1}^{T} \|w_t\|^2 \le O(1),$$
where the $O(1)$ constant depends on the initial condition and the system constants.
Please see the Appendix for the proof. It follows from the result that the achievable attenuation of the RHC approach is nearly $\gamma_c$. Given the bounds on the system constants, we can choose the horizon $M$ and $\gamma$ such that $\gamma$ is arbitrarily close to $\gamma_c$. This means that while the RHC approach cannot beat $\gamma_c$, it can approach it arbitrarily closely.
5.2 Analysis for Unknown System Case
Here, we present the performance analysis for a black-box estimation procedure with accuracy $\epsilon(T_s)$. Let $u^*_{t:t+M-1}(\theta)$ denote the solution of the RHC optimization when the system parameter is $\theta$.
Let $u^1_{t:t+M-1}$ and $u^2_{t:t+M-1}$ be two control sequences of length $M$, and define the distance between them as $\max_k \|u^1_k - u^2_k\|$.
We argue in Lemma 3 that the solution to the RHC optimization exists. Hence the solution map $u^*_{t:t+M-1}(\cdot)$ is well defined.
Definition 1
We say that a function $g$ is Lipschitz in the parameter $\theta$ with constant $L$, provided $\|g(\theta_1) - g(\theta_2)\| \le L\,\|\theta_1 - \theta_2\|$ for all $\theta_1, \theta_2 \in \Theta$.
We make the following additional assumptions on the system and the cost functions. These assumptions are required to establish the Lipschitz continuity of the solution $u^*_{t:t+M-1}(\theta)$ in $\theta$. This is the key additional step required to extend the analysis technique applied to the known system case.
Assumption 3
(System and Cost) (i) The cost function is time invariant; (ii) the system given by Eq. (1) is Bounded Input Bounded Output (BIBO) stable; (iii) the desired trajectory is bounded; (iv) the estimation accuracy $\epsilon(T_s)$ can be guaranteed by bounded excitation; (v) $\theta \in \Theta$, a compact set, and $f$ is continuous in all of its arguments; (vi) the function $f$ is Lipschitz in $\theta$ (this trivially includes linear systems); (vii) the cost function is convex in both arguments and strongly convex in the second argument (this includes LQR); (viii) the control constraint set $\mathcal{U}$ is compact; (ix) the disturbance preview is available.
The assumption that the estimation accuracy can be guaranteed by bounded excitation is trivially satisfied for linear controllable systems. The assumption that the system is BIBO stable is required to ensure that the total cost incurred during the estimation phase scales only linearly with $T_s$.
Lemma 3
Under Assumption 3, the solution map of the RHC optimization in Eq. (4) is Lipschitz in the parameter, i.e., $\|u^*_{t:t+M-1}(\theta_1) - u^*_{t:t+M-1}(\theta_2)\| \le L_u\,\|\theta_1 - \theta_2\|$ for some constant $L_u > 0$.
To establish this result we draw on results on the Lipschitz continuity of solutions to parametric optimization [40]. Please see the Appendix for the detailed proof. In the next lemma, we characterize the incremental change of the value of the function $V_t$ in the control phase as the closed-loop system evolves from $x_t$ to $x_{t+1}$. We establish these results for the same class of systems assumed for the known system case, i.e., systems whose optimal cost-to-go satisfies the stated upper bound, under the additional Assumption 3.
Lemma 4
Under the conditions of Lemma 1 and Assumption 3, there exist constants $\lambda \in (0,1]$, $\gamma_v > 0$, and $L_e > 0$ such that, in the control phase, the closed-loop system satisfies
$$V_{t+1}(x_{t+1}) - V_t(x_t) \le -\lambda\, c_t(x_t, u_t) + \gamma_v \sum_{k=t}^{t+M} \|w_k\|^2 + L_e\, \epsilon(T_s).$$
Please see the Appendix for the proof. We note that the first two terms in the bound are the same as in the known system case; the last term arises from the error in the parameter estimate $\hat{\theta}$. Next we use the above lemma to characterize an upper bound on the per-step cost incurred by the online controller in the control phase.
Lemma 5
Under the conditions of Lemma 4, suppose the horizon $M$ exceeds the lower bound determined by the system constants and $\gamma \ge \gamma_c$. Then, for any time $t$ in the control phase, the per-step cost of the closed-loop system is bounded by the two terms of Lemma 2 plus an additional term proportional to the estimation error $\epsilon(T_s)$.
Please see the Appendix for the proof. We note that the per-step cost in the control phase is bounded by three terms; the first two are the same as before, while the additional third term arises from the error in the parameter estimate $\hat{\theta}$. In the next theorem we characterize the performance of the online controller.
Theorem 2
Suppose the conditions of Lemma 4 hold. Then, for $\gamma \ge \gamma_c$, a horizon $M$ exceeding the lower bound determined by the system constants, and any disturbance sequence satisfying Assumption 1, the total cost incurred by the closed-loop system satisfies
$$R_T(\gamma) \le O\big(T_s + \epsilon(T_s)\,(T - T_s)\big),$$
where the constants in the bound depend on the system constants and the initial condition.
Proof: By Assumption 3, the system is BIBO stable and the accuracy can be guaranteed by bounded excitation, so the cost incurred during the estimation phase is bounded by $O(T_s)$. Then, from Lemma 5, following steps similar to the proof of Theorem 1, the bound for the cost of the control phase follows. Combining the two observations, the result follows.
The result presented characterizes the achievable performance of the RHC approach for a given black-box estimation procedure. If the estimation error vanishes after a finite $T_s$, as is the case with linear controllable systems, we recover the results for the known system case, where the RHC approach achieves attenuation arbitrarily close to $\gamma_c$. Similarly, for nonlinear systems linear in unknown parameters, as argued earlier, the parameter can be estimated in finite time, and so the RHC approach achieves the same attenuation in this case too. For a general estimation procedure with $\epsilon(T_s) = O(1/\sqrt{T_s})$, the optimal duration of the estimation phase is $T_s = \Theta(T^{2/3})$, which gives $R_T(\gamma) \le O(T^{2/3})$.
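To see the trade-off behind this choice, note that under the illustrative assumption $\epsilon(T_s) = c/\sqrt{T_s}$ the two contributions to the bound scale as
$$R_T(\gamma) \le O(T_s) + O\big(\epsilon(T_s)\,T\big) = O(T_s) + O\big(T/\sqrt{T_s}\big),$$
and the two terms balance at $T_s = \Theta(T^{2/3})$, giving $R_T(\gamma) \le O(T^{2/3})$.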
References
- [1] Yasin Abbasi-Yadkori and Csaba Szepesvári. Regret bounds for the adaptive control of linear quadratic systems. In Proceedings of the 24th Annual Conference on Learning Theory, pages 1–26, 2011.
- [2] Veronica Adetola, Darryl DeHaan, and Martin Guay. Adaptive model predictive control for constrained nonlinear systems. Systems & Control Letters, 58(5):320–326, 2009.
- [3] Naman Agarwal, Brian Bullins, Elad Hazan, Sham Kakade, and Karan Singh. Online control with adversarial disturbances. In International Conference on Machine Learning, pages 111–119, 2019.
- [4] Naman Agarwal, Elad Hazan, and Karan Singh. Logarithmic regret for online control. In Advances in Neural Information Processing Systems, pages 10175–10184, 2019.
- [5] David Angeli, Rishi Amrit, and James B Rawlings. On average performance and stability of economic model predictive control. IEEE transactions on automatic control, 57(7):1615–1626, 2011.
- [6] David Angeli, Alessandro Casavola, and Francesco Tedesco. Theoretical advances on economic model predictive control with time-varying costs. Annual Reviews in Control, 41:218–224, 2016.
- [7] Karl J Åström and Björn Wittenmark. Adaptive control. Courier Corporation, 2013.
- [8] Anil Aswani, Humberto Gonzalez, S Shankar Sastry, and Claire Tomlin. Provably safe and robust learning-based model predictive control. Automatica, 49(5):1216–1226, 2013.
- [9] Francesco Borrelli, Alberto Bemporad, and Manfred Morari. Predictive control for linear and hybrid systems. Cambridge University Press, 2017.
- [10] Monimoy Bujarbaruah, Xiaojing Zhang, Marko Tanaskovic, and Francesco Borrelli. Adaptive mpc under time varying uncertainty: Robust and stochastic. arXiv preprint arXiv:1909.13473, 2019.
- [11] Yongxin Chen and Umesh Vaidya. Sample complexity for nonlinear dynamics. arXiv preprint arXiv:1810.11747, 2018.
- [12] Alon Cohen, Avinatan Hassidim, Tomer Koren, Nevena Lazic, Yishay Mansour, and Kunal Talwar. Online linear quadratic control. arXiv preprint arXiv:1806.07104, 2018.
- [13] Alon Cohen, Tomer Koren, and Yishay Mansour. Learning linear-quadratic regulators efficiently with only $\sqrt{T}$ regret. In International Conference on Machine Learning, pages 1300–1309, 2019.
- [14] Constantinos Daskalakis, Stratis Skoulakis, and Manolis Zampetakis. The complexity of constrained min-max optimization. arXiv preprint arXiv:2009.09623, 2020.
- [15] Sarah Dean, Horia Mania, Nikolai Matni, Benjamin Recht, and Stephen Tu. Regret bounds for robust adaptive control of the linear quadratic regulator. In Advances in Neural Information Processing Systems, pages 4188–4197, 2018.
- [16] Francisco Facchinei and Jong-Shi Pang. Finite-dimensional variational inequalities and complementarity problems. Springer Science & Business Media, 2007.
- [17] Antonio Ferramosca, Daniel Limon, and Eduardo F Camacho. Economic mpc for a changing economic criterion for linear systems. IEEE Transactions on Automatic Control, 59(10):2657–2667, 2014.
- [18] Antonio Ferramosca, James B Rawlings, Daniel Limón, and Eduardo F Camacho. Economic mpc for a changing economic criterion. In 49th IEEE Conference on Decision and Control (CDC), pages 6131–6136. IEEE, 2010.
- [19] Dylan Foster and Max Simchowitz. Logarithmic regret for adversarial online control. In International Conference on Machine Learning, pages 3211–3221. PMLR, 2020.
- [20] Hiroaki Fukushima, Tae-Hyoung Kim, and Toshiharu Sugie. Adaptive model predictive control for a class of constrained linear systems based on the comparison model. Automatica, 43(2):301–308, 2007.
- [21] Gautam Goel and Babak Hassibi. Regret-optimal control in dynamic environments. arXiv preprint arXiv:2010.10473, 2020.
- [22] Paul J Goulart, Eric C Kerrigan, and Jan M Maciejowski. Optimization over state feedback policies for robust control with constraints. Automatica, 42(4):523–533, 2006.
- [23] PJ Goulart, X Zhang, M Kamgarpour, A Georghiou, and J Lygeros. Robust optimal control with adjustable uncertainty sets. Automatica, 75, 2016.
- [24] Lars Grüne and Anastasia Panin. On non-averaged performance of economic mpc with terminal conditions. In 2015 54th IEEE Conference on Decision and Control (CDC), pages 4332–4337. IEEE, 2015.
- [25] Lars Grüne and Simon Pirkelmann. Closed-loop performance analysis for economic model predictive control of time-varying systems. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pages 5563–5569. IEEE, 2017.
- [26] Lars Grüne and Marleen Stieler. Asymptotic stability and transient optimality of economic mpc without terminal conditions. Journal of Process Control, 24(8):1187–1196, 2014.
- [27] Elad Hazan, Sham Kakade, and Karan Singh. The nonstochastic control problem. In Algorithmic Learning Theory, pages 408–421. PMLR, 2020.
- [28] Petros A Ioannou and Jing Sun. Robust adaptive control. Courier Corporation, 2012.
- [29] Wilbur Langson, Ioannis Chryssochoos, SV Raković, and David Q Mayne. Robust model predictive control using tubes. Automatica, 40(1):125–133, 2004.
- [30] Maciej Ławryńczuk. Computationally Efficient Model Predictive Control Algorithms: A Neural Network Approach. Studies in Systems, Decision and Control, vol. 3. Springer, 2014.
- [31] Devon Lehrer, Veronica Adetola, and Martin Guay. Parameter identification methods for non-linear discrete-time systems. In Proceedings of the 2010 American Control Conference, pages 2170–2175. IEEE, 2010.
- [32] Yingying Li, Xin Chen, and Na Li. Online optimal control with linear dynamics and predictions: Algorithms and regret analysis. In Advances in Neural Information Processing Systems, pages 14887–14899, 2019.
- [33] D Limon, I Alvarado, TEFC Alamo, and EF Camacho. Robust tube-based mpc for tracking of constrained linear systems with additive disturbances. Journal of Process Control, 20(3):248–260, 2010.
- [34] Horia Mania, Stephen Tu, and Benjamin Recht. Certainty equivalent control of lqr is efficient. arXiv preprint arXiv:1902.07826, 2019.
- [35] David Q Mayne, James B Rawlings, Christopher V Rao, and Pierre OM Scokaert. Constrained model predictive control: Stability and optimality. Automatica, 36(6):789–814, 2000.
- [36] Igor Mezic. Koopman operator, geometry, and learning. arXiv preprint arXiv:2010.05377, 2020.
- [37] J Moore. Persistence of excitation in extended least squares. IEEE Transactions on Automatic Control, 28(1):60–68, 1983.
- [38] Manfred Morari and Jay H Lee. Model predictive control: past, present and future. Computers & Chemical Engineering, 23(4-5):667–682, 1999.
- [39] Arkadi Nemirovski. Interior point polynomial time methods in convex programming. Lecture notes, 2004.
- [40] Marc Quincampoix and Nadia Zlateva. Parameterized minimax problem: on lipschitz-like dependence of the solution with respect to the parameter. SIAM Journal on Optimization, 19(3):1250–1269, 2008.
- [41] R Tyrrell Rockafellar and Roger J-B Wets. Variational analysis, volume 317. Springer Science & Business Media, 2009.
- [42] J Ben Rosen. Existence and uniqueness of equilibrium points for concave n-person games. Econometrica: Journal of the Econometric Society, pages 520–534, 1965.
- [43] Shankar Sastry and Marc Bodson. Adaptive control: stability, convergence and robustness. Courier Corporation, 2011.
- [44] Max Simchowitz and Dylan Foster. Naive exploration is optimal for online lqr. In International Conference on Machine Learning, pages 8937–8948. PMLR, 2020.
- [45] Max Simchowitz, Karan Singh, and Elad Hazan. Improper learning for non-stochastic control. arXiv preprint arXiv:2001.09254, 2020.
- [46] Sigurd Skogestad and Ian Postlethwaite. Multivariable feedback control: analysis and design, volume 2. Citeseer, 2007.
- [47] Marko Tanaskovic, Lorenzo Fagiano, and Vojislav Gligorovski. Adaptive model predictive control for linear time varying mimo systems. Automatica, 105:237–245, 2019.
- [48] Roberto Tempo, Giuseppe Calafiore, and Fabrizio Dabbene. Randomized algorithms for analysis and control of uncertain systems: with applications. Springer Science & Business Media, 2012.
- [49] Stephen J Wright. Efficient convex optimization for linear mpc. In Handbook of model predictive control, pages 287–303. Springer, 2019.
Appendix A: Proof of Lemma 1
We note that
The last inequality follows directly from Assumption 2. Hence, . We introduce some notation for simplifying the presentation of the analysis. If , then let . We denote by , and by .
Let denote the state the system evolves to at time starting at , under the control sequence and disturbance . From the definition of it follows that
The second step just follows from the definition of . Hence,
where the comparison sequence is the optimal control sequence given that the control for the first time steps is fixed. The second inequality follows from the fact that this sequence is suboptimal compared to the optimum. Now by definition
Substituting this into the above inequality, the summation in the first term on the right cancels. Hence, we have
By definition of , the first sum on the right is the optimal cost-to-go starting from the state for the horizon from . Hence,
Once again by the assumption in the Lemma we have that
Hence,
Now, . Hence,
(5)
From Assumption 2 and the fact that for any it follows that
The above inequality implies that there exists a constant such that
Then, using this in Eq. (5), we get that
Then using the fact that we get that
Hence,
Appendix B: Proof of Lemma 2
Since , there exists such that
For convenience of presentation in the following we denote by . From Lemma 1 it follows that
From the above inequality it follows that and . Hence,
Then using the condition that we get
Pick such that . Then
Let . Note that . Let . Then
Then,
Similarly,
Extending the previous equation we get that for
(6)
Next, we prove the upper bound for the general case by induction. We hypothesize that the bound is given by
(7)
From Lemma 1 we know that
Substituting Eq. (7)
From Eq. (6) the hypothesis Eq. (7) is true for . Hence, by induction Eq. (7) is true for all . We can simplify Eq. (7) as
Since . Then using the condition that we get
Thus, for any ,
Appendix C: Proof of Theorem 1
In the following we denote the cost incurred at time step succinctly by ignoring its arguments. Let . Given that , Lemma 2 is applicable to bound . Hence, for any s.t. , the stage cost of the closed loop system at is given by,
Hence,
The first two terms are . Combining the terms corresponding to each disturbance realization we get
The last term is also . Hence,
Appendix D: Disturbance is not Accessible
In this setting, the disturbance preview is not available. The receding horizon online controller for this case optimizes the worst-case cost-to-go instead of the cost-to-go for the realized disturbance:
$$\min_{u_{t:t+M-1} \in \mathcal{U}^M} \; \max_{w_{t:t+M-1} \in \mathcal{W}^M} \; \sum_{k=t}^{t+M-1} c_k(x_k, u_k) \quad \text{s.t. } x_{k+1} = f(x_k, u_k) + w_k. \qquad (8)$$
Given that the constraint set is compact, the solution to the inner optimization exists and is continuous by an argument similar to that for Eq. (3) (see the proof of Lemma 3). Applying the same argument to the outer optimization, we find that its solution also exists and is continuous.
While RHC algorithms are already demanding because of the optimization, min-max optimization further exacerbates the difficulty because of its bi-level nature. Min-max optimization of general non-convex, non-concave objective functions is known to be hard [14]. It was shown in [14] that an approximate local min-max point of large enough approximation is guaranteed to exist, but finding one such point is PPAD-complete. When the objective function $L$ is convex-concave, i.e., $L(x, y)$ is convex in $x$ for every $y$ and concave in $y$ for every $x$, the constrained problem is guaranteed to have a solution under compactness of the constraint set [42], and computing a solution is amenable to convex programming. In fact, if $L$ is smooth, the problem can be solved via first-order methods, achieving an approximation error that decays polynomially in the number of iterations; see e.g. [39]. When the function is strongly convex and strongly concave, the rate becomes geometric [16].
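As a concrete illustration of the first-order approach in the convex-concave case, the following sketch runs projected gradient descent-ascent with iterate averaging on a smooth saddle function; the step size, projection radius, and test problem are our assumptions.

```python
# Projected gradient descent (in u) / ascent (in w) with averaging for a
# convex-concave saddle problem; averaged iterates converge at rate O(1/sqrt(N)).
import numpy as np

def gda(grad_u, grad_w, u0, w0, radius, eta=0.05, iters=2000):
    project = lambda z: z * min(1.0, radius / max(np.linalg.norm(z), 1e-12))
    u, w = u0.copy(), w0.copy()
    u_avg, w_avg = np.zeros_like(u0), np.zeros_like(w0)
    for _ in range(iters):
        u, w = (project(u - eta * grad_u(u, w)),   # simultaneous update
                project(w + eta * grad_w(u, w)))
        u_avg += u / iters; w_avg += w / iters
    return u_avg, w_avg

# Bilinear saddle L(u, w) = u^T A w with saddle point at the origin.
A = np.array([[1.0, 0.5], [0.0, 1.0]])
u_star, w_star = gda(lambda u, w: A @ w, lambda u, w: A.T @ u,
                     np.ones(2), -np.ones(2), radius=2.0)
```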
We denote the solution computed by the optimization in Eq. (8) by $u^*_{t:t+M-1}$ and $w^*_{t:t+M-1}$. The control input $u_t$ is set as the first element of the sequence $u^*_{t:t+M-1}$, which we succinctly denote by $u^*_t$. The complete algorithm for this case is given in Algorithm 3.
Let $\bar{V}_t(x_t)$ denote the optimal value of the min-max optimization in Eq. (8).
By the existence of the solution, this function is well defined. In the next lemma, we characterize the incremental change of the value of this function as the closed-loop system evolves from $x_t$ to $x_{t+1}$. As in the previous case, we establish the result for the class of systems whose cost-to-go satisfies the stated upper bound. In contrast to the previous case, the bound on this function is specified in terms of the maximum disturbance. Given the min-max nature of the optimization, where the maximization is carried out over all disturbance realizations, this bound naturally follows from the bound assumed in the previous case, and the two attenuation constants are related through the bound on the disturbance set. We once again note that this class of systems trivially includes strongly convex quadratic cost functions for linear dynamical systems.
Lemma 6
Suppose the worst-case cost-to-go satisfies the analogous upper bound. Then there exist constants $\lambda \in (0,1]$ and $\gamma_v > 0$, depending only on the system constants, such that for each $t$ the closed-loop system satisfies
$$\bar{V}_{t+1}(x_{t+1}) - \bar{V}_t(x_t) \le -\lambda\, c_t(x_t, u_t) + \gamma_v\, M \max_{w \in \mathcal{W}} \|w\|^2.$$
Proof: We note that
The last inequality follows directly from Assumption 2. Hence, . We introduce some notation for simplifying the presentation of the analysis. If , then let . We denote by , by , and by . Similarly, let
Let denote the state the system evolves to at time starting at , under the control sequence and disturbance . From the definition of it follows that
Define
Let
Since the realized disturbance is not necessarily the worst-case disturbance realization for the system starting at the given state and for the given control sequence, we have
Let . From the definitions of it follows that
Using the above inequality we further get
Using the definitions of and , we get
Then for any , we get
By definition
Hence for any ,
By definition
Using the fact that and , it follows that
Hence,
By definition, this is the worst-case disturbance realization for the given control sequence and for the system starting at the given state. Hence,
where the last inequality follows from the assumption above. Using this inequality we get that
Hence there exists a constant such that
Hence,
As in the previous case, two terms bound the incremental change in the value of the function $\bar{V}_t$: the first term is the contribution of the cost of the state, and the second term is the contribution of the maximum possible energy of the disturbance. Next we use the above lemma to characterize the performance of the online controller.
Theorem 3
Suppose the conditions of Lemma 6 hold. Then, for a sufficiently large attenuation level, a horizon $M$ exceeding the stated lower bound, and any disturbance sequence satisfying Assumption 1, the total cost incurred by the closed-loop system satisfies
$$\sum_{t=1}^{T} c_t(x_t, u_t) \le O(1) + O\Big(T \max_{w \in \mathcal{W}} \|w\|^2\Big),$$
where the constants depend on the system constants and the initial condition.
Proof: Since the stated condition holds, there exists a constant such that
For convenience of presentation, in the following we use a compact notation. From Lemma 6 it follows that
Hence,
Then using the fact that we get
Then using the fact that we get
Pick s.t. . Then
Let . Then
For convenience we denote the cost incurred at time step just by ignoring its arguments. Then
Then using the fact that we get
Thus summing over all we get
We note two differences in the result: (i) the attenuation depends on the disturbance bound and (ii) the attenuation guarantee is given with respect to the maximum possible energy of the disturbances over the duration $T$. The controller's performance can only be characterized in terms of the maximum possible energy of the disturbance since the controller is always regulating for the worst case, naturally. Given the bounds on the system constants, we can choose the horizon and the attenuation level accordingly. Thus the online RHC approach achieves an attenuation of nearly $\gamma_c$ with respect to the maximum possible energy of the disturbance.
Appendix E: Proof of Lemma 3
Here we prove that under Assumption 3 the conditions required for Proposition 2.4 of [40] are satisfied. Since the cost functions are continuous and $\mathcal{U}$ is compact, the objective function of the optimization in Eq. (3) is uniformly level bounded (see [41] for the definition), and continuous functions are proper by definition. This satisfies the assumption in Theorem 1.17 of [41] that the objective function is proper, lower semi-continuous, and uniformly level bounded. Moreover, by continuity of the functions, a finite solution exists for all feasible arguments; hence the domain of the solution map is the whole space. Then, by continuity and point 3 of Theorem 1.17 of [41], the solution is continuous. Hence Assumption A1 of Proposition 2.4 of [40] is satisfied. Since the cost is strongly convex in the second argument, Assumption A2 of Proposition 2.4 of [40] trivially holds. Finally, by (vii) of Assumption 3, Assumption A3 of Proposition 2.4 of [40] is satisfied. Thus, all the assumptions of Proposition 2.4 of [40] hold, and applying it we get the final result.
Appendix F: Proof of Lemma 4
We note that
The last inequality follows directly from Assumption 2. Hence, . We introduce some notation for simplifying the presentation of the analysis. If , then let . We denote by , and by .
Let denote the state the system evolves to at time starting at , under the control sequence and disturbance . From the definition of and Lemma 3 it follows that
We now analyze the term . By definition
The second step just follows from the definition of . Hence,
where the comparison sequence is the optimal control sequence given that the control for the first time steps is fixed. The second inequality follows from the fact that this sequence is suboptimal compared to the optimum. Now by definition
Let . Then . Recall that . Hence, by Lemma 3
Hence,
where . Hence,
Hence,
Hence,
By definition of , the first sum on the right is the optimal cost-to-go starting from the state for the horizon from . Hence,
By the assumption in the Lemma we have that
Now, . Hence,
(9)
From Assumption 2 and the fact that for any it follows that
The above inequality implies that there exists a constant such that
Then, using this in Eq. (9), we get that
Then using the fact that and rearranging terms we get that
Hence,
Appendix G: Proof of Lemma 5
The proof is very similar to that of Lemma 2. Since the stated condition holds, there exists a constant such that
For convenience of presentation, in the following we use a compact notation. From Lemma 4 it follows that
From the above inequality it follows that and . Hence,
Then
Hence,
Pick such that . Then
Let , where . Let , and . Then
Then, following steps similar to the proof of Lemma 2, we get that
Since , the condition implies