
Online Learning Robust Control of Nonlinear Dynamical Systems

Deepan Muthirayan
University of California, Irvine
Irvine, CA - 92697
[email protected]
Pramod P. Khargonekar
University of California, Irvine
Irvine, CA - 92697
[email protected]
Abstract

In this work we address the problem of online robust control of nonlinear dynamical systems perturbed by disturbances. We study the problem of attenuation of the total cost over a duration $T$ in response to the disturbances. The attenuation is defined as the ratio of the deviation of the total cost from the cost of the desired trajectory to the total energy in the disturbance. This is a harder problem than dynamic regret minimization because the comparator in the attenuation problem is zero total cost. We consider the setting where the cost function (at a particular time) is a general continuous function and adversarial, and the disturbance is adversarial and bounded at any point of time. Our goal is to design a controller that can learn and adapt to achieve a certain level of attenuation. We analyse two cases: (i) when the system is known and (ii) when the system is unknown. We measure the performance of the controller by the deviation of the controller's cost for a sequence of cost functions with respect to an attenuation $\gamma$, denoted $R^{p}_{T}$. We propose an online controller and present guarantees for the metric $R^{p}_{T}$ when the maximum possible attenuation is given by $\overline{\gamma}$, which is a system constant. We show that when the controller has a preview of the cost functions and the disturbances for a short duration of time and the system is known, $R^{p}_{T}(\gamma)=O(1)$ when $\gamma\geq\gamma_{c}$, where $\gamma_{c}=\mathcal{O}(\overline{\gamma})$. We then show that when the system is unknown the proposed controller, with a preview of the cost functions and the disturbances for a short horizon, achieves $R^{p}_{T}(\gamma)=\mathcal{O}(N)+\mathcal{O}(1)+\mathcal{O}((T-N)g(N))$ when $\gamma\geq\gamma_{c}$, where $g(N)$ is the accuracy of a given nonlinear estimator and $N$ is the duration of the initial estimation period.
We also characterize the lower bound on the required prediction horizon for these guarantees to hold in terms of the system constants.

1 Introduction

In this work we study the problem of regulating a dynamical system perturbed by disturbance. Restating [21], the standard $\mathcal{H}_{2}$ and $\mathcal{H}_{\infty}$ control approaches for this problem are designed for a specific performance objective: (i) the expected cost for a random disturbance drawn from a specific distribution in the $\mathcal{H}_{2}$ case and (ii) the worst-case cost for an adversarial disturbance in the $\mathcal{H}_{\infty}$ case. These approaches can perform poorly if the disturbance belongs to a different class [21].

This naturally leads to the question of whether the control can be dynamically adjusted to achieve desirable performance for any disturbance realization. Recently, this problem has been studied in online control from the point of view of regret minimization [3, 27, 19]. In this framework, for a dynamically evolving system with state $x_{t}$, the online control policy $\pi=\{\pi_{t}\}_{1\leq t\leq T}$ is chosen so as to minimize the regret

$$R^{\Pi}_{T}=\sum_{t=1}^{T}c_{t}\big(x_{t},\pi_{t}(\{w_{l}\}_{1\leq l\leq t})\big)-\inf_{\pi\in\Pi}\sum_{t=1}^{T}c_{t}(x_{t},\pi(w)),\quad w=\{w_{t}\}_{1\leq t\leq T}$$

In contrast, [21] approach this problem from the point of view of minimizing the dynamic regret. The key difference is that the regret is defined against the best sequence of control actions in hindsight, not against a specific class of policies. This notion of regret is important from two perspectives: (i) the comparison is not restricted to a certain class of policies but is against the globally optimal dynamic sequence of actions; (ii) as stated in [21], the online controller designed for this regret is more flexible, i.e., it can respond to any type of disturbance, whereas linear feedback controllers can only be effective against either stochastic or adversarial disturbances. The dynamic regret is defined as follows:

$$R^{d}_{T}=\sum_{t=1}^{T}c_{t}\big(x_{t},\pi_{t}(\{w_{l}\}_{1\leq l\leq t})\big)-\inf_{\{u_{t}\}_{1\leq t\leq T}}\sum_{t=1}^{T}c_{t}(x_{t},u_{t}(w)),\quad w=\{w_{t}\}_{1\leq t\leq T}$$

In [21], the authors derive the regret sub-optimal controller from the point of view of the dynamic regret for a known sequence of time-varying quadratic cost functions $c_{t}(\cdot,\cdot)$ for a known time-varying linear dynamical system. They show that for a specified attenuation $\gamma$ the regret sub-optimal controller satisfies

$$R^{d}_{T}\leq\gamma\sum_{t=1}^{T}\lVert w_{t}\rVert^{2}_{2},$$

provided such a controller exists for the specified $\gamma$. While these settings study the performance with respect to a class of control policies, they do not provide guarantees for the actual performance: the absolute attenuation of the cost of the system by the online adaptive controller in response to the disturbance. The central problem we study in this work is online guarantees for this problem of attenuation. Attenuation is a harder notion of performance than dynamic regret because a bound on the attenuation gives a bound on the ratio of the dynamic regret to the total energy in the disturbance, defined as $\sum_{t=1}^{T}\lVert w_{t}\rVert^{2}_{2}$, but not vice versa. We study the setting where the controller may not have complete knowledge of the sequence of cost functions and of the system. Hence, the controller has to learn online to attenuate the cost of the system. The questions are: how much can the online adaptive controller attenuate the total cost, and at what rate does the total cost converge to this attenuation level with $T$? To answer these questions, we study the deviation of the controller's cost for a sequence of cost functions, $\{c_{t}(\cdot,\cdot),t=1,2,...\}$, of the state of the system $x_{t}$ and the control input $u_{t}$, with respect to an attenuation $\gamma$:

$$R^{p}_{T}(\gamma)=\left(\sum_{t=1}^{T}c_{t}(x_{t},u_{t})-\gamma\sum_{t=1}^{T}\lVert w_{t}\rVert^{2}\right)_{+}.$$

The metric as stated is a function of $\gamma$ because, for the same sequence of control actions, the deviation of the cumulative cost from the second term depends on $\gamma$. We state the online robust control problem as follows:

1. What is the achievable $R^{p}_{T}(\gamma)$? 2. What is the smallest $\gamma$ such that $R^{p}_{T}$ is sub-linear?

The smallest $\gamma$ gives the maximum achievable attenuation, and the corresponding $R^{p}_{T}(\gamma)$ quantifies the convergence rate to this attenuation: $R^{p}_{T}(\gamma)/T$. This is the central question we study in this work. The problem we study is a harder notion of performance than dynamic regret because the bound on the cost derived from $R^{p}_{T}$ is a bound on the dynamic regret and not vice versa.
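To make the metric concrete, the following sketch computes $R^{p}_{T}(\gamma)$ from a realized trajectory; the function name and interface are illustrative assumptions, not from the paper.

```python
import numpy as np

def attenuation_regret(costs, disturbances, gamma):
    """R^p_T(gamma) = (sum_t c_t(x_t, u_t) - gamma * sum_t ||w_t||_2^2)_+
    for a realized trajectory. `costs` holds the per-step costs
    c_t(x_t, u_t) and `disturbances` the disturbance vectors w_t."""
    total_cost = float(np.sum(costs))
    # total energy in the disturbance: sum_t ||w_t||_2^2
    energy = float(sum(np.dot(w, w) for w in disturbances))
    # (.)_+ : only positive deviations above the attenuation level count
    return max(total_cost - gamma * energy, 0.0)
```

Note that the same trajectory yields a smaller $R^{p}_{T}(\gamma)$ for a larger $\gamma$, which is why the metric is parameterized by the attenuation level.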

In this work we present algorithms and guarantees for the questions posed above. We consider a setting similar to [3]. The system is a general non-linear dynamical system perturbed by disturbance. The sequence of cost functions and the disturbances are adversarial, and the disturbance at any time is bounded. We make minimal assumptions on the cost functions. We assume that they are continuous functions that are lower bounded by $\underline{\alpha}\sigma(x)$, where $\sigma(\cdot)$ is a general non-negative function. We assume that the desired trajectory is given by $\sigma(x)=0$. Most distinctly, the cost functions need not be convex.

We present our guarantees for the achievable attenuation under the condition that the optimal cost-to-go function for the horizon $M$ at time $t$ is upper bounded by $\overline{\alpha}\sigma(x_{t})+\overline{\gamma}\sum_{k=0}^{M-1}\lVert w_{t+k}\rVert^{2}_{2}$. This upper bound has two terms: (i) the first term is the contribution of the initial state to the optimal cost-to-go for the period $M$ from $t$, and (ii) the second term is the contribution of the disturbance over the period $M$. The factor $\overline{\gamma}$ is the maximum achievable attenuation. We note that the optimal cost-to-go for horizon $M$ for strongly convex quadratic cost functions for linear dynamical systems trivially satisfies such an upper bound. Hence, the setting we present is sufficiently general. This assumption does not in any way restrict the problem and is a necessary assumption, because the problem we address in this work has a solution only if a finite attenuation is achievable.

1.1 Contribution

The online controller we propose and analyse is a Receding Horizon Controller (RHC) which has a preview of the cost functions for a short horizon $M$ at time $t$. In a typical RHC setting, the controller optimizes the cost-to-go for a certain horizon and applies the first element of the computed optimal control sequence as the control input. This is then repeated at every time step. The RHC approach is a well-established technique in control. Its primary advantage is that the optimization carried out at every step can incorporate any type of control input, system state and output constraints, and cost functions [35, 38, 9]. The RHC controller we propose optimizes the cost-to-go for the short horizon $M$ to compute the control input for time $t$, applies the first element of the computed optimal control sequence as the control input, and repeats this at every time step. We analyse the performance of RHC over a duration $T\gg M$.

First, we present the setting where the system is known. We present: (i) the performance guarantee for the RHC when the controller has a preview of the disturbance $w_{k}$ for the horizon $M$, i.e., for $t\leq k\leq t+M-1$, and (ii) the performance guarantee when the controller does not have any preview of the disturbance. For the first case we show that RHC achieves $R^{p}_{T}(\gamma)=O(1)$ when $\gamma\geq\gamma_{c}$ and the horizon length $M>\overline{\alpha}^{2}/\underline{\alpha}^{2}+1$. We show that the achievable attenuation in this case is nearly optimal, that is, $\gamma_{c}=\mathcal{O}(\overline{\gamma})$. For the second case, when the preview of the disturbance is not available, we guarantee that the overall cost is bounded by $\sim 2\overline{\gamma}MT\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2}+\mathcal{O}(1)$. Hence, in this case the guaranteed attenuation is nearly $\mathcal{O}(\overline{\gamma})$ of the maximum energy in the disturbance.

We then present the setting where the system is unknown and the disturbance preview is available. We present analysis for systems that are Bounded Input Bounded Output (BIBO) stable with convex cost functions. We assume that the state transition function is parameterized and that only the parameter is unknown. We propose a generic online control framework that operates in two phases: the estimation phase followed by the control phase. The estimation phase runs for a period of $N$ time steps, at the end of which the online controller calculates an estimate $\hat{\theta}$ of $\theta$. In the control phase, the controller is the RHC that optimizes the cost-to-go for the estimated system. We present performance analysis for a black-box estimation procedure with accuracy given by $\lVert\hat{\theta}-\theta\rVert_{2}\leq g(N)$, a bounded and decreasing function of $N$. We show that the overall online controller achieves $R^{p}_{T}(\gamma)=\mathcal{O}(N)+\mathcal{O}(1)+\mathcal{O}((T-N)g(N))$ when $\gamma\geq\gamma_{c}$ and the horizon length $M$ exceeds the same threshold as before. For linear systems, we trivially recover the results for the known system case, since the estimation is trivial when the disturbance preview is available and the system is controllable. The framework we present provides a direct characterization by which to evaluate various estimation approaches.

1.2 Notations

For a sequence (of vectors/functions) $\{a_{t},1\leq t\leq T\}$, we denote $a_{1:t}=\{a_{\tau},1\leq\tau\leq t\}$. For a matrix $M$, we denote its transpose by $M^{\top}$ and its maximum eigenvalue by $\lambda_{\text{max}}(M)$. The two-norm of a vector is denoted by $\lVert\cdot\rVert_{2}$. For two matrices $M_{1}$ and $M_{2}$, $M_{1}\geq M_{2}$ means that $M_{1}-M_{2}$ is positive semi-definite; similarly, $M_{1}>M_{2}$ means that $M_{1}-M_{2}$ is positive definite. We denote by $\mathbb{R}^{n}$ the $n$-dimensional Euclidean space and by $\mathbb{R}_{\geq 0}$ the non-negative part of the real line.

2 Preliminaries

We consider the control of a general non-linear dynamical system with disturbance in the dynamics:

$$x_{t+1}=f(x_{t},u_{t},w_{t};\theta),\qquad(1)$$

where $f$ is a continuous function, $x_{t}\in\mathbb{R}^{n}$ is the system state, $u_{t}\in\mathbb{R}^{m}$ is the control input, $w_{t}\in\mathbb{R}^{n}$ is the disturbance, and the parameter $\theta$ could be known or unknown. We make the assumption that the controller observes the full state.

We consider the setting where a fixed-horizon preview of the future cost functions and disturbances is available to the algorithm at each time step $t$. In particular, the control input $u_{t}$ is computed as $u_{t}=\pi_{t}(x_{1:t},u_{1:t-1},w_{1:t+M_{w}-1},c_{1:t+M_{c}-1})$, where $M_{w}$ and $M_{c}$ are fixed horizon lengths and $\pi_{t}$ is the control policy at time $t$. The goal of the agent is to select a control policy $\pi=\pi_{1:T}$ that minimizes the cumulative cost with respect to an attenuation $\gamma$. This can be formulated as the following optimization problem:

$$R^{p}_{T}(\gamma)=J(\pi;w_{1:T},c_{1:T})-\gamma\sum_{t=1}^{T}\lVert w_{t}\rVert^{2}_{2}:=\sum^{T}_{t=1}c_{t}(x_{t},u_{t})-\gamma\sum_{t=1}^{T}\lVert w_{t}\rVert^{2}_{2},\ \text{where}$$
$$u_{t}=\pi_{t}(x_{1:t},u_{1:t-1},w_{1:t+M_{w}-1},c_{1:t+M_{c}-1}).\qquad(2)$$

Our goal is to characterize the optimal $R^{p}_{T}(\gamma)$ and the maximum achievable attenuation.

The assumptions used throughout this paper are as follows:

Assumption 1

(Disturbance) The disturbance $w_{t}\in\mathcal{W}$, where $\mathcal{W}$ is compact and $\lVert w_{t}\rVert_{2}\leq w_{c}$. The disturbance $w_{t}$ is adversarial for all $t$.

Assumption 2

(Cost Function) The function $c_{t}$ is Lipschitz continuous for all $t$, with a uniform Lipschitz constant $\alpha_{c}$, and there exists a constant $\underline{\alpha}>0$ such that $c_{t}(x,u)\geq\underline{\alpha}\sigma(x)\geq 0$. The cost function $c_{t}(\cdot,\cdot)$ is adversarial for all $t$.

The function $\sigma(x)$ could be any non-negative function. The assumption that the cost functions are Lipschitz continuous is a standard assumption [32, 45].

3 Related Works

The control of dynamical systems with uncertainties such as modeling errors, parametric uncertainty, and disturbances is a central challenge in control theory and so has been extensively studied. There is a vast literature in the field of control on control synthesis for systems with such uncertainties. The robust control literature studies the problem of feedback control design for stability and performance guarantees under modeling uncertainty and disturbances; see [46]. The adaptive control literature studies the control of systems with parametric uncertainty; see [43, 28, 7].

Online Control: The field of online adaptive control is a widely studied topic. [1] studied the online Linear Quadratic Regulator (LQR) problem with an unknown system and stochastic disturbances. The authors propose an adaptive algorithm that achieves $\sqrt{T}$ regret w.r.t. the best linear control policy, which is the optimal policy. [15] propose an efficient algorithm for the same problem that achieves a regret of $\mathcal{O}(T^{2/3})$. [13] and [34] improved on this result by providing an efficient run-time algorithm with a regret guarantee of $\mathcal{O}(\sqrt{T})$ for the same problem. Mania et al. [34] also establish $\mathcal{O}(\sqrt{T})$-regret for the partially observed Linear Quadratic Gaussian (LQG) setting. Recently, [44] showed that $\mathcal{O}(\sqrt{T})$ is the optimal regret for the online LQR problem. [12] provide an $\mathcal{O}(\sqrt{T})$ algorithm for a variant of the online LQR where the system is known and the noise is stochastic but the controller cost function is an adversarially chosen quadratic function.

[3] consider the control of a known linear dynamical system with additive adversarial disturbance and an adversarial Lipschitz controller cost function. The controller they propose is a Disturbance Response Controller (DRC). They show that the learning algorithm achieves $\mathcal{O}(\sqrt{T})$-regret with respect to the best DRC in hindsight. In a subsequent work, [4] show that a polylogarithmic regret is achievable for strongly convex controller costs and well-conditioned stochastic disturbances. [27] consider the same setting when the system is unknown and present an $\mathcal{O}(T^{2/3})$-regret algorithm. Recently, [45] generalized these results to provide similar regret guarantees for the same setting with partial observation for both known and unknown systems.

Receding Horizon Control: Many receding horizon control based methods have been proposed for managing disturbances and uncertainties in the system dynamics. For example, some works handle disturbances or uncertainties via robust or chance constraints [29, 22, 33, 48, 23]. Adaptive RHC techniques that adapt online when the system model is unknown have also been proposed [20, 2, 8, 47, 10]. These methods primarily focus on constraint satisfaction, stability, and in some cases performance improvement using the adapted models. In contrast to these works, we consider the non-asymptotic performance of an online robust adaptive RHC. A considerable number of papers provide performance analyses of RHC under both time-invariant costs [5, 26, 24] and time-varying costs [18, 6, 17, 25]. However, most of these studies focus on asymptotic performance.

4 Online Learning Robust Controller: Algorithms

In this section, we present the online control algorithms for the setting where $M_{c}=M_{w}=M\ll T$. We present separate algorithms for the case where the system is known and the case where the system is unknown. The detailed discussion of the case where the disturbance is not accessible is deferred to the appendix.

4.1 System is known

In this setting, we assume that the system dynamics is known, i.e., not parameterized by an unknown $\theta$. We assume that the algorithm has a fixed-horizon preview of the future cost functions and disturbances. In particular, at each time $t$, the algorithm has access to $c_{t:t+M-1}$ and $w_{t:t+M-1}$, in addition to the history of observations until $t$. We use a receding horizon control approach that minimizes the cost-to-go for a horizon $M$ with the previewed disturbances and cost functions. In particular, to find the control input $u_{t}$ at each time step $t$, the algorithm solves the following optimization problem:

$$\inf_{\tilde{u}_{t:t+M-1}}\sum_{j=t}^{t+M-1}c_{j}(\tilde{x}_{j},\tilde{u}_{j})\ \text{s.t.}\ \tilde{x}_{j+1}=f(\tilde{x}_{j},\tilde{u}_{j},w_{j}),\ \tilde{x}_{t}=x_{t},\ \tilde{u}_{j}\in\mathcal{U},\qquad(3)$$

where $\mathcal{U}$ is a compact set. Given that $f$ and $c_{j}$ are continuous functions, the solution to the above optimization exists; please see the proof of Lemma 3 for a detailed argument. We denote the solution of this optimization by $\text{MPC}^{t}(w_{t:t^{e}})$, where $t^{e}=t+M-1$. The control input $u_{t}$ is set as the first element of the sequence $\text{MPC}^{t}(w_{t:t^{e}})$. We succinctly denote the control input by $\kappa_{M}(x_{t})$. The overall online control algorithm is given in Algorithm 1.

1:  Input: $M$
2:  for t = 1,…,T do
3:     Observe $x_{t}$, $\{c_{j}(\cdot,\cdot)\}_{t\leq j\leq t+M-1}$, $w_{t:t+M-1}$
4:     Solve the optimization in Eq. (3) to obtain $\tilde{u}_{t:t+M-1}$
5:     Apply $\tilde{u}_{t}$ as the control input $u_{t}$
6:  end for
Algorithm 1 Online Learning Robust Control (Known System with Disturbance Preview)
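A minimal sketch of Algorithm 1 in Python, using a generic smooth solver for the horizon optimization in Eq. (3); the toy scalar system, the cost, and the unconstrained input set are assumptions for illustration only, not the paper's examples.

```python
import numpy as np
from scipy.optimize import minimize

def rhc_input(f, cost, x_t, w_preview, M):
    """One step of Algorithm 1: minimize the M-step cost-to-go of Eq. (3)
    for the previewed disturbances and return the first optimal input.
    The input set U is taken to be all of R here (an assumption), so an
    unconstrained solver (BFGS) suffices."""
    def cost_to_go(u_seq):
        x, total = x_t, 0.0
        for j in range(M):
            total += cost(x, u_seq[j])          # stage cost c_j(x_j, u_j)
            x = f(x, u_seq[j], w_preview[j])    # roll out with previewed w_j
        return total
    res = minimize(cost_to_go, np.zeros(M), method="BFGS")
    return res.x[0]

# Toy closed loop (illustrative assumptions):
f = lambda x, u, w: 0.5 * x + np.tanh(u) + w   # scalar nonlinear dynamics
cost = lambda x, u: x**2 + 0.1 * u**2          # c(x,u) >= sigma(x) = x^2

M, x = 2, 2.0
w_seq = [0.1, -0.1, 0.0, 0.05, 0.0, 0.0]
for t in range(4):
    u_t = rhc_input(f, cost, x, w_seq[t:t + M], M)
    x = f(x, u_t, w_seq[t])   # closed loop x_{t+1} = f(x_t, kappa_M(x_t), w_t)
```

With the preview of the disturbances, each step re-solves the horizon problem from the current state, which is exactly the repeat-at-every-step structure described above.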

A.1. Discussion

RHC algorithms are in general complex because of the explicit optimization carried out at every time step. The complexity can arise from system models that are too complicated to be used directly for prediction, and sometimes from the constraints in the problem. If the optimization is convex and the system is linear, then the optimization can be solved efficiently via primal-dual interior point frameworks [49]. Complex nonlinear RHC optimizations such as Eq. (3) can be solved efficiently via online linearization of the nonlinear model. Such approaches have been shown to be empirically very effective for a wide range of nonlinear control applications [30].

4.2 Online Control Framework for Unknown System

For the unknown system case, we assume the system dynamics is parametrized by an unknown parameter θ\theta, i.e.,

$$x_{t+1}=f(x_{t},u_{t},w_{t};\theta).$$

The online controller we propose operates in two phases: (i) an estimation phase, and (ii) a control phase. The estimation phase runs for a duration of $N$ time steps. We assume that the estimation algorithm is such that, at the end of the estimation phase, the estimated parameter $\hat{\theta}$ satisfies $\lVert\hat{\theta}-\theta\rVert_{2}\leq g(N)$, where $g(N)$ is a bounded and decreasing function of $N$. In the discussion below, we comment on estimation procedures for specific instances of the system. In this work we present the performance analysis of the online controller without restricting the analysis to a specific estimation procedure. The performance guarantee is presented in terms of the accuracy $g(N)$ of a given estimation algorithm. Thus, the analysis we present can be extended to any estimation procedure.

In the control phase, the controller is the online receding horizon controller similar to the known system case. At each instant, the controller optimizes the cost-to-go for the horizon MM for the realized disturbance for the estimated system:

$$\inf_{\tilde{u}_{t:t+M-1}}\sum_{j=t}^{t+M-1}c_{j}(\tilde{x}_{j},\tilde{u}_{j})\ \text{s.t.}\ \tilde{x}_{j+1}=f(\tilde{x}_{j},\tilde{u}_{j},w_{j};\hat{\theta}),\ \tilde{x}_{t}=x_{t},\ \tilde{u}_{j}\in\mathcal{U}.\qquad(4)$$

We later argue that the solution to this optimization exists; please see the proof of Lemma 3 for a detailed argument. We denote the solution of this optimization by $\text{MPC}^{t}(w_{t:t+M-1};\hat{\theta})$. The control input $u_{t}$ is set as the first element of $\text{MPC}^{t}(w_{t:t+M-1};\hat{\theta})$. We succinctly denote the control input by $\kappa_{M}(x_{t};\hat{\theta})$.

B.1. Discussion

In this work, we do not specify the estimation algorithm; rather, we present a general framework and performance guarantee for a given black-box estimation procedure whose accuracy is characterized by $g(N)$, where $N$ is the duration of the estimation phase.

The estimation procedure is trivially simple when the system is a linear controllable system. In this case, the system matrices can be estimated accurately in finite time by generating a sequence of control inputs that is persistently exciting over a finite interval (see [37] for the definition of persistence of excitation) and using least-squares to compute the estimate of the system matrices. In this case $N=O(1)$ and $g(N)=0$. Hence, the online controller will be able to achieve the same performance as in the known system case.
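The least-squares step described above can be sketched as follows for a linear system $x_{t+1}=Ax_{t}+Bu_{t}+w_{t}$ with known (previewed) disturbances; the interface is a hypothetical illustration, assuming the inputs are persistently exciting.

```python
import numpy as np

def estimate_linear_system(X, U, W):
    """Least-squares estimate of (A, B) for x_{t+1} = A x_t + B u_t + w_t
    when the disturbances w_t are known. X: (N, n) states, U: (N-1, m)
    inputs, W: (N-1, n) disturbances. Requires the regressors [x_t; u_t]
    to be persistently exciting (full column rank)."""
    Y = X[1:] - W                   # subtract the known disturbance
    Z = np.hstack([X[:-1], U])      # regressors [x_t, u_t]
    Theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    n = X.shape[1]
    return Theta[:n].T, Theta[n:].T  # A_hat, B_hat
```

Because the disturbance is subtracted exactly, the regression residual is zero and the matrices are recovered up to numerical precision, consistent with $g(N)=0$ after a finite excitation period.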

The estimation can also be simpler when the nonlinear system is linear in the parameters [31], i.e., the dynamics have the form $x_{k+1}=x_{k}+f(x_{k},u_{k})+g(x_{k},u_{k})\theta$. Under certain conditions on $g(x_{k},u_{k})$, an estimator can be designed to estimate $\theta$ in finite time [31]. We note that a condition similar to persistence of excitation is required in this case too.

Nonlinear systems are in general complex to estimate. A typical approach to estimating nonlinear systems is via the Koopman operator [11, 36]. The Koopman operator specifies the transition in a lifted observable space and can be approximately represented as a point in a linear vector space. The only disadvantage is that this representation requires a suitable set of basis functions to be known. Given this representation, linear estimation via least-squares can be applied to estimate the Koopman operator approximately [11].
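An illustrative sketch of the least-squares (extended-DMD-style) approximation of the Koopman operator on a user-chosen basis; the basis and the data below are assumptions for illustration, not a prescription from the paper.

```python
import numpy as np

def edmd_koopman(X, Y, basis):
    """Least-squares finite-dimensional approximation K of the Koopman
    operator on the span of `basis`, from snapshot pairs y_i = f(x_i).
    K satisfies Psi_x @ K ~= Psi_y in the least-squares sense."""
    Psi_x = np.array([basis(x) for x in X])   # lifted snapshots, (N, k)
    Psi_y = np.array([basis(y) for y in Y])
    K, *_ = np.linalg.lstsq(Psi_x, Psi_y, rcond=None)
    return K
```

For example, for the scalar map $x\mapsto 0.5x$ with the basis $\{x, x^{2}\}$, the lifted dynamics are exactly linear ($x\mapsto 0.5x$, $x^{2}\mapsto 0.25x^{2}$), so the fitted $K$ is diagonal; for a genuinely nonlinear map the fit is only approximate, which is where the choice of basis matters.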

The general version of the online control algorithm for the unknown system case is shown in Algorithm 2.

1:  Input: $N$, $M$
2:  Initialize: $\hat{\theta}=0$
3:  for t = 1,…,T do
4:     Observe $x_{t}$, $\{c_{j}(\cdot,\cdot)\}_{t\leq j\leq t+M-1}$, $w_{t:t+M-1}$
5:     if $t<N$ then
6:        Apply control for estimation phase
7:     else
8:        if $t==N$ then
9:           $\hat{\theta}\leftarrow\text{estimation}(\mathcal{D})$, $\mathcal{D}=\{\{x_{k+1},x_{k},u_{k},w_{k}\},1\leq k\leq N-1\}$
10:        end if
11:        Solve the optimization in Eq. (4) to obtain $\tilde{u}_{t:t+M-1}$
12:        Apply u~t\tilde{u}_{t} as the control input utu_{t}
13:     end if
14:  end for
Algorithm 2 Online Learning Robust Control (Unknown System with Disturbance Preview)
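The two-phase structure of Algorithm 2 can be sketched as the following skeleton; all the callables (exciting input, estimator, receding-horizon policy, true system step) are user-supplied placeholders and assumptions for illustration, not the paper's specific choices.

```python
import numpy as np

def two_phase_controller(T, N, excite, estimate, rhc, step, x0=0.0):
    """Skeleton of Algorithm 2: N estimation-phase steps driven by an
    exciting input, a one-shot parameter estimate from the collected
    data D at t == N, then receding-horizon control on the estimated
    model (Eq. (4))."""
    data, x, theta_hat = [], x0, None
    for t in range(T):
        if t < N:
            u = excite(t)                    # estimation phase
        else:
            if theta_hat is None:
                theta_hat = estimate(data)   # fit theta_hat from D
            u = rhc(x, theta_hat)            # control phase
        x_next = step(x, u)                  # true (unknown) system
        data.append((x, u, x_next))
        x = x_next
    return x, theta_hat
```

For a scalar system $x_{t+1}=\theta x_{t}+u_{t}$ with a constant exciting input, a least-squares `estimate` recovers $\theta$ exactly after a few steps, and a simple model-based policy then regulates the state, mirroring the $\mathcal{O}(N)+\mathcal{O}((T-N)g(N))$ structure of the guarantee.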

5 Analysis and Results

In this section we present the performance analysis for the online controllers presented above. We first present the case where the system is known and then analyze the unknown case.

5.1 Analysis for Known System Case

We define $V^{t}_{M}(x_{t},w_{t:t+M-1})$ as the optimal value of the optimization problem given in Eq. (3). Hence, $V^{t}_{M}(x_{t},w_{t:t+M-1})$ is given by

$$V^{t}_{M}(x_{t},w_{t:t+M-1})=J^{t}_{M}(x_{t},\tilde{u}_{t:t+M-1},w_{t:t+M-1}),$$
$$J^{t}_{M}(x_{t},\tilde{u}_{t:t+M-1},w_{t:t+M-1})=\sum_{j=t}^{t+M-1}c_{j}(\tilde{x}_{j},\tilde{u}_{j})\ \text{s.t.}\ \tilde{x}_{j+1}=f(\tilde{x}_{j},\tilde{u}_{j},w_{j}),\ \tilde{x}_{t}=x_{t}.$$

We note that this function exists and is continuous, given that the solution to the optimization exists and is continuous (see the proof of Lemma 3 for a more detailed argument). In the next lemma, we characterize the incremental change of the value of this function as the system evolves. We present this characterization for the class of systems that satisfy $V^{t}_{M}(x_{t})\leq\overline{\alpha}\sigma(x_{t})+\overline{\gamma}\sum_{k=t}^{t^{e}-1}\lVert w_{k}\rVert^{2}_{2}$. This represents a sufficiently general class of systems; it includes strongly convex quadratic cost functions for linear dynamical systems. As stated in the introduction, the first term in the bound represents the contribution to the cost by the initial state $x_{t}$, and the second term bounds the cost of the deviation effected by the disturbances. Here $\overline{\gamma}$ is the maximum achievable attenuation for this system.

Lemma 1

Suppose $t^{e}=t+M-1$ and $w_{t:t^{e}}$ is accessible. Then there exist constants $\underline{\alpha}_{V},\Gamma^{\gamma}_{V}>0$ and $\Gamma_{V}$ such that for each $M\geq 2$ and $x_{t}\in\mathbb{R}^{n}$, for the closed-loop system $x_{t+1}=f(x_{t},\kappa_{M}(x_{t}),w_{t})$, $V^{t}_{M}(x_{t},w_{t:t^{e}})\geq\underline{\alpha}_{V}\sigma(x_{t})$ and

$$V^{t+1}_{M}(x_{t+1},w_{t+1:t^{e}+1})-V^{t}_{M}(x_{t},w_{t:t^{e}})\leq\Gamma_{V}\sigma(x_{t})+\Gamma^{\gamma}_{V}\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2},$$

where $\underline{\alpha}_{V}=\underline{\alpha}$, $\Gamma_{V}=\left(\frac{\overline{\alpha}^{2}}{\underline{\alpha}(M-1)}-\underline{\alpha}\right)$, and $\Gamma^{\gamma}_{V}=\overline{\gamma}\left(\frac{\overline{\alpha}}{\underline{\alpha}(M-1)}+1\right)$.

Please see the Appendix for the proof. The result states that the change in the optimal cost-to-go when the system evolves from $t$ to $t+1$ is bounded by (i) the cost of the initial state $x_{t}$ and (ii) a constant times the total energy of the disturbances over the duration $M$ starting from $t$. Next we use the above lemma to characterize an upper bound on the per-step cost incurred by the online controller.

Lemma 2

Under the conditions of Lemma 1, suppose $b=\overline{\gamma}\left(\frac{\underline{\alpha}}{\overline{\alpha}}+1\right)$, $H\geq M$, and $t_{H}=t+H$. Then, for $M>\overline{\alpha}^{2}/\underline{\alpha}^{2}+1$ and any $a$ such that $1>a\geq 1-\tilde{\epsilon}/\overline{\alpha}$, $0<\tilde{\epsilon}<\underline{\alpha}-\frac{\overline{\alpha}^{2}}{(M-1)\underline{\alpha}}$, the per-step cost of the closed-loop system $x_{t+1}=f(x_{t},\kappa_{M}(x_{t}),w_{t})$ at $t_{H}$ satisfies

$$c_{t+H}(x_{t_{H}},\kappa_{M}(x_{t_{H}}))\leq M_{\lambda}e^{-\lambda H}\sigma(x_{t})+\sum_{j=t}^{t_{H}+M-2}M_{w,j}\lVert w_{j}\rVert^{2}_{2},\ \text{where}$$

$M_{\lambda}=\overline{\alpha}$, $\lambda=-\ln{a}$, and
$$M_{w,j}=\begin{cases}ba^{H-M}/(1-a)+a^{H}\overline{\gamma}, & t\leq j\leq t^{e},\\ ba^{t_{H}-j-1}/(1-a), & t^{e}+1\leq j\leq t_{H}-1,\\ b/(1-a), & t_{H}\leq j\leq t_{H}+M-2.\end{cases}$$

Please see the Appendix for the proof. The result implies that the cost at time $t+H$ is bounded by two terms: (i) the contribution of the cost from the initial state $x_{t}$, which decays exponentially with $H$, and (ii) the contribution from the disturbances seen by the system so far. From the form of the coefficients $M_{w,j}$ it is clear that the online controller exponentially diminishes the contribution of the disturbances from earlier times. In the next theorem we characterize the performance of the online controller.

Theorem 1

Under the conditions of Lemma 1, for $T\geq M+1$, $M>\overline{\alpha}^{2}/\underline{\alpha}^{2}+1$, and any $a$ such that $1>a\geq 1-\tilde{\epsilon}/\overline{\alpha}$, $0<\tilde{\epsilon}<\underline{\alpha}-\frac{\overline{\alpha}^{2}}{(M-1)\underline{\alpha}}$, the total cost incurred by the closed-loop system $x_{t+1}=f(x_{t},\kappa_{M}(x_{t}),w_{t})$ satisfies

$$\sum_{t=1}^{T}c_{t}(x_{t},\kappa_{M}(x_{t}))-\gamma\sum_{t=1}^{T}\lVert w_{t}\rVert^{2}_{2}\leq\mathcal{O}(1),\quad\gamma\geq\gamma_{c},$$

where $\gamma_{c}=\frac{b\left((M-1)(1-a)+1\right)}{(1-a)^{2}}$.

Please see the Appendix for the proof. It follows from the result that the attenuation achievable by the RHC approach is $\gamma_{c}$. Given the bounds on $a$, we can choose $a$ and $M$ such that $2\left(1+\frac{\underline{\alpha}}{\overline{\alpha}}\right)\left(\frac{\tilde{\gamma}\overline{\alpha}^{2}}{\underline{\alpha}^{2}}+\tilde{\gamma}^{2}\right)\overline{\gamma}\gtrapprox\gamma_{c}\geq 2\overline{\gamma}$, where $\tilde{\gamma}=\frac{\lceil\frac{\overline{\alpha}^{2}}{\underline{\alpha}^{2}}\rceil\underline{\alpha}}{\overline{\alpha}\left(\frac{\underline{\alpha}^{2}}{\overline{\alpha}^{2}}\lceil\frac{\overline{\alpha}^{2}}{\underline{\alpha}^{2}}\rceil-1\right)}$. This means that the RHC approach cannot beat $2\overline{\gamma}$ and that $\gamma_{c}=\mathcal{O}(\overline{\gamma})$.

5.2 Analysis for Unknown System Case

Here, we present the performance analysis for a black-box estimation procedure with accuracy θ^θFg(N)\lVert\hat{\theta}-\theta\rVert_{F}\leq g(N). Let

VMt(xt,wt:t+M1;θ^)=JMt(xt,u~t:t+M1,wt:t+M1;θ^),\displaystyle V^{t}_{M}(x_{t},w_{t:t+M-1};\hat{\theta})=J^{t}_{M}(x_{t},\tilde{u}_{t:t+M-1},w_{t:t+M-1};\hat{\theta}),
JMt(xt,u~t:t+M1,wt:t+M1)=j=tt+M1cj(x~j,u~j)s.t.x~j+1=f(x~j,u~j,wj;θ^),x~t=xt.\displaystyle J^{t}_{M}(x_{t},\tilde{u}_{t:t+M-1},w_{t:t+M-1})=\sum_{j=t}^{t+M-1}c_{j}(\tilde{x}_{j},\tilde{u}_{j})\ \text{s.t.}\ \tilde{x}_{j+1}=f(\tilde{x}_{j},\tilde{u}_{j},w_{j};\hat{\theta}),\ \tilde{x}_{t}=x_{t}.
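The certainty-equivalence objective above can be made concrete with a minimal sketch: roll the estimated model $\hat{\theta}$ forward under a candidate control sequence and accumulate the cost. The scalar model $f(x,u,w;\theta)=\theta x+u+w$, the quadratic cost, and the candidate sequences are hypothetical choices for illustration, not the paper's setting:

```python
# Stand-in for f(x, u, w; theta): a hypothetical scalar model linear in theta.
def f_model(x, u, w, theta):
    return theta * x + u + w

def cost(x, u):
    return x ** 2 + 0.1 * u ** 2

def J_M(x_t, u_seq, w_seq, theta_hat):
    """Evaluate J_M^t: accumulate cost along the *estimated* model's rollout."""
    J, x = 0.0, x_t
    for u, w in zip(u_seq, w_seq):
        J += cost(x, u)
        x = f_model(x, u, w, theta_hat)  # dynamics use theta_hat, not theta
    return J

# MPC^t would minimize J_M over u_seq; here we just compare two candidates.
u_passive = [0.0, 0.0, 0.0]
u_cancel = [-0.5, -0.1, 0.0]
w_prev = [0.1, -0.1, 0.0]
print(J_M(1.0, u_passive, w_prev, theta_hat=0.5),
      J_M(1.0, u_cancel, w_prev, theta_hat=0.5))
```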

Let $U$ and $\tilde{U}$ be two control sequences of length $M$. Define

J^{t}_{1,M}(x_{t},U,\tilde{U},w_{t:t^{e}};\theta)=J^{t}_{M}(x_{t},U,w_{t:t^{e}};\theta)-J^{t}_{M}(x_{t},\tilde{U},w_{t:t^{e}};\theta).

We argue in Lemma 3 that the solution to the RHC optimization MPCt(wt:te;θ^)\text{MPC}^{t}(w_{t:t^{e}};\hat{\theta}) exists. Hence the function VMt(xt,wt:te;θ^)V^{t}_{M}(x_{t},w_{t:t^{e}};\hat{\theta}) is well defined.

Definition 1

We say that a function f1:p×p×qf_{1}:\mathbb{R}^{p}\times\mathbb{R}^{p}\times\mathbb{R}^{q}\rightarrow\mathbb{R}, f1α,βf_{1}\in\mathcal{L}^{\alpha,\beta} provided f1(s,s~,λ)f1(s,s~,λ~)css~2αλλ~2βf_{1}(s,\tilde{s},\lambda)-f_{1}(s,\tilde{s},\tilde{\lambda})\leq c\lVert s-\tilde{s}\rVert^{\alpha}_{2}\lVert\lambda-\tilde{\lambda}\rVert^{\beta}_{2}.

We make the following additional assumptions on the system and the cost functions. These assumptions are required to establish Lipschitz continuity of the solution MPCt(wt:te;θ^)\text{MPC}^{t}(w_{t:t^{e}};\hat{\theta}). This is the key additional step that is required to extend the analysis technique applied to the known system case.

Assumption 3

(System and Cost) (i) The cost is time invariant; (ii) the system given by $x_{t+1}=f(x_{t},u_{t},w_{t};\theta)$ is Bounded-Input Bounded-Output (BIBO) stable; (iii) any trajectory along which $\sigma(x)=0$ is bounded; (iv) the estimation accuracy $g(N)$ can be guaranteed by bounded excitation; (v) $\theta\in\Theta$, a compact set with $\lVert\theta\rVert_{2}\leq S$, and $f$ is continuous for all $\theta\in\Theta$; (vi) the function $f(x_{t},u_{t},w_{t};\theta)$ is Lipschitz, i.e., $f(x^{\prime}_{t},u^{\prime}_{t},w_{t};\theta)-f(x_{t},u_{t},w_{t};\theta)\leq\alpha_{f}\lVert\theta\rVert_{2}\left(\lVert x^{\prime}_{t}-x_{t}\rVert_{2}+\lVert u^{\prime}_{t}-u_{t}\rVert_{2}\right)$ (this trivially includes linear systems); (vii) the cost function $c_{t}(\cdot,\cdot)$ is convex in both arguments and strongly convex in the second argument (this includes LQR); (viii) $J^{t}_{1,M}=f_{1}\in\mathcal{L}^{1,1}$ with $s=U$, $\tilde{s}=\tilde{U}$, $\lambda=\theta$, $\tilde{\lambda}=\tilde{\theta}$; (ix) $\hat{\theta}\in\Theta$.

The assumption that the estimation accuracy $g(N)$ can be guaranteed by bounded excitation is trivially satisfiable for linear controllable systems. The assumption that the system is BIBO stable is required to ensure that the total cost incurred during the estimation phase scales only linearly with $N$.
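As a hedged illustration of such an estimation phase, the sketch below fits the parameter of a hypothetical scalar system that is linear in the unknown parameter, $x_{t+1}=\theta\,\phi(x_{t},u_{t})+w_{t}$ with $\phi(x,u)=\tanh(x)+u$, using bounded exciting inputs and least squares; the resulting error plays the role of $g(N)$. None of these modeling choices come from the paper:

```python
import math
import random

theta = 0.7  # true (unknown) parameter of the hypothetical system

def phi(x, u):
    return math.tanh(x) + u

def estimate(N):
    """Run an N-step estimation phase with bounded excitation and return
    the least-squares estimate theta_hat."""
    random.seed(N)                      # deterministic per run length
    x, num, den = 0.0, 0.0, 0.0
    for _ in range(N):
        u = random.uniform(-1.0, 1.0)   # bounded exciting input
        w = random.uniform(-0.1, 0.1)   # bounded disturbance
        x_next = theta * phi(x, u) + w
        num += phi(x, u) * x_next       # least-squares accumulators
        den += phi(x, u) ** 2
        x = x_next
    return num / den                    # theta_hat

for N in (100, 10000):
    print(N, abs(estimate(N) - theta))
```

The error shrinks as the estimation phase lengthens, consistent with accuracies of the form $g(N)=\mathcal{O}(1/\sqrt{N})$ discussed later.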

Lemma 3

Under Assumption 3, which includes time invariance of the cost, it holds that

VMt(.,.;θ^)VMt(.,.;θ)αVθ^θ2,κM(xt;θ^)κM(xt;θ)ακθ^θ2.V^{t}_{M}(.,.;\hat{\theta})-V^{t}_{M}(.,.;\theta)\leq\alpha_{V}\lVert\hat{\theta}-\theta\rVert_{2},\ \kappa_{M}(x_{t};\hat{\theta})-\kappa_{M}(x_{t};\theta)\leq\alpha_{\kappa}\lVert\hat{\theta}-\theta\rVert_{2}.

To establish this result we draw from results on Lipschitz continuity of solutions to parametric optimization [40]. Please see the Appendix for the detailed proof. In the next lemma, we characterize the incremental change of the value of the function VMt(.,.;θ^)V^{t}_{M}(.,.;\hat{\theta}) in the control phase as the closed loop system evolves from tt to t+1t+1. We establish these results for the same class of systems assumed for the known system case, i.e., systems for which VMt(xt,Wt:te;θ)α¯σ(xt)+γ¯k=tte1wk22V^{t}_{M}(x_{t},W_{t:t^{e}};\theta)\leq\overline{\alpha}\sigma(x_{t})+\overline{\gamma}\sum_{k=t}^{t^{e}-1}\lVert w_{k}\rVert^{2}_{2} and additionally Assumption 3.

Lemma 4

Suppose Assumptions 1, 2, and 3 hold, $t^{e}=t_{M}-1$, and $W_{t:t^{e}}$ is accessible. Then there exist constants $\underline{\alpha}_{V},\Gamma^{\gamma}_{V},\Gamma^{\theta}_{V}>0$ and $\Gamma_{V}$ such that for each $M\geq 2$ and $x_{t}\in\mathbb{R}^{n}$, the closed-loop system $x_{t+1}=f(x_{t},\kappa_{M}(x_{t};\hat{\theta}),w_{t})$ in the control phase satisfies $V^{t}_{M}(x_{t},W_{t:t^{e}};\hat{\theta})\geq\underline{\alpha}_{V}\sigma(x_{t})$ and

VMt+1(xt+1,Wt+1:te+1;θ^)VMt(xt,Wt:te;θ^)ΓVσ(xt)+ΓVγk=ttewk22+ΓVθθ^θ2,V^{t+1}_{M}(x_{t+1},W_{t+1:t^{e}+1};\hat{\theta})-V^{t}_{M}(x_{t},W_{t:t^{e}};\hat{\theta})\leq\Gamma_{V}\sigma(x_{t})+\Gamma^{\gamma}_{V}\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}+\Gamma^{\theta}_{V}\lVert\hat{\theta}-\theta\rVert_{2},

where ΓVθ=(2αV+αcα~fακ(M1)(α¯α¯(M1)+1))\Gamma^{\theta}_{V}=\left(2\alpha_{V}+\alpha_{c}\tilde{\alpha}_{f}\alpha_{\kappa}(M-1)\left(\frac{\overline{\alpha}}{\underline{\alpha}(M-1)}+1\right)\right), α¯V,ΓV,ΓVγ\underline{\alpha}_{V},\Gamma_{V},\Gamma^{\gamma}_{V} are the same as in Lemma 1 and α~f=max0kM2αfk+1Sk+1\tilde{\alpha}_{f}=\max_{0\leq k\leq M-2}\alpha^{k+1}_{f}S^{k+1}.

Please see the Appendix for the proof. We note that the first two terms in the bound are the same as in the known system case. The last term arises from the error in the parameter estimate $\hat{\theta}$. Next, we use the above lemma to characterize an upper bound on the per-step cost incurred by the online controller in the control phase.

Lemma 5

Under the conditions of Lemma 4, suppose $b=\overline{\gamma}\left(\frac{\underline{\alpha}}{\overline{\alpha}}+1\right)$, $H\geq M$, and $t_{H}=t+H$. Then, for $M>\overline{\alpha}^{2}/\underline{\alpha}^{2}+1$ and any $a$ s.t. $1>a\geq 1-\tilde{\epsilon}/\overline{\alpha}$, $0<\tilde{\epsilon}<\underline{\alpha}-\frac{\overline{\alpha}^{2}}{(M-1)\underline{\alpha}}$, the per-step cost for the closed-loop system $x_{t+1}=f(x_{t},\kappa_{M}(x_{t};\hat{\theta}),w_{t})$ at $t_{H}$ in the control phase satisfies

ct+H(xtH,κM(xtH;θ^))MλeλHσ(xt)+j=ttH+M2Mw,jwj22+Mθθ^θ2,where,c_{t+H}(x_{t_{H}},\kappa_{M}(x_{t_{H}};\hat{\theta}))\leq M_{\lambda}e^{-\lambda H}\sigma(x_{t})+\sum_{j=t}^{t_{H}+M-2}M_{w,j}\lVert w_{j}\rVert^{2}_{2}+M_{\theta}\lVert\hat{\theta}-\theta\rVert_{2},\ \text{where},

$M_{\lambda}=\overline{\alpha}$, $\lambda=-\ln{a}$; $M_{w,j}=\left(ba^{H-M}/(1-a)+a^{H}\overline{\gamma}\right)$ when $t\leq j\leq t^{e}$; $M_{w,j}=\frac{b}{1-a}a^{t_{H}-j-1}$ when $t^{e}+1\leq j\leq t_{H}-1$; $M_{w,j}=b/(1-a)$ when $t_{H}\leq j\leq t_{H}+M-2$; and $M_{\theta}=\left(c\frac{1-a^{H}}{1-a}+a^{H}\alpha_{V}\right)$, where $c=\left(\alpha_{V}\left(2+\frac{\tilde{\epsilon}}{\overline{\alpha}}\right)+\alpha_{c}\tilde{\alpha}_{f}(M-1)\left(\frac{\underline{\alpha}-\tilde{\epsilon}}{\overline{\alpha}}+1\right)\right)$.

Please see the Appendix for the proof. We note that the per-step cost in the control phase in this case is bounded by three terms; the first two terms are the same as before while the additional third term arises from the error in the parameter estimate θ^\hat{\theta}. In the next theorem we characterize the performance of the online controller.

Theorem 2

Suppose the conditions of Lemma 4 hold. Then, for $T\geq M+1$, $M>\overline{\alpha}^{2}/\underline{\alpha}^{2}+1$, and any $a$ s.t. $1>a\geq 1-\tilde{\epsilon}/\overline{\alpha}$, $0<\tilde{\epsilon}<\underline{\alpha}-\frac{\overline{\alpha}^{2}}{(M-1)\underline{\alpha}}$, the total cost incurred by the closed-loop system $x_{t+1}=f(x_{t},\kappa_{M}(x_{t};\hat{\theta}),w_{t})$ satisfies

t=1Tct(xt,κM(xt))γt=1Twt22𝒪(N)+𝒪(1)+𝒪((TN+1)g(N)),γγc\sum_{t=1}^{T}c_{t}(x_{t},\kappa_{M}(x_{t}))-\gamma\sum_{t=1}^{T}\lVert w_{t}\rVert^{2}_{2}\leq\mathcal{O}(N)+\mathcal{O}(1)+\mathcal{O}((T-N+1)g(N)),\ \gamma\geq\gamma_{c}

where γc=b((M1)(1a)+1)(1a)2\gamma_{c}=\frac{b\left((M-1)(1-a)+1\right)}{(1-a)^{2}}.

Proof: By Assumption 3, since the system is BIBO stable and the accuracy $g(N)$ can be guaranteed by bounded excitation, the cost for the estimation phase is bounded by $\gamma\sum_{t=1}^{N-1}\lVert w_{t}\rVert^{2}_{2}+\mathcal{O}(N)$. Then, from Lemma 5, the fact that $\lVert\hat{\theta}-\theta\rVert_{2}\leq g(N)$, and steps similar to the proof of Theorem 1, it follows that the cost for the control phase is bounded by $\gamma\sum_{t=N}^{T}\lVert w_{t}\rVert^{2}_{2}+\mathcal{O}(1)+\mathcal{O}((T-N+1)g(N))$. Combining the two observations, the result follows. $\blacksquare$

The result presented characterizes the achievable performance by the RHC approach for a given black-box estimation procedure. If N=𝒪(1),g(N)=0N=\mathcal{O}(1),g(N)=0, as is the case with linear controllable systems, we recover the results for the known system case, where the RHC approach achieves the attenuation γc\gamma_{c}. Similarly, for nonlinear systems linear in unknown parameters, as argued earlier, N=𝒪(1),g(N)=0N=\mathcal{O}(1),g(N)=0, and so the RHC approach will achieve the attenuation γc\gamma_{c} in this case too. For a general estimation procedure for which g(N)=𝒪(1/N)g(N)=\mathcal{O}(1/\sqrt{N}), the optimal duration for the estimation phase is N=𝒪(T2/3)N=\mathcal{O}(T^{2/3}) and RTp𝒪(T2/3)R^{p}_{T}\leq\mathcal{O}(T^{2/3}).
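The stated trade-off can be checked numerically. With an illustrative bound of the form $N+(T-N)g(N)$, $g(N)=1/\sqrt{N}$, and all constants set to one (a simplification not taken from the paper's analysis), the minimizing estimation length indeed scales like $T^{2/3}$:

```python
# Balance the O(N) estimation cost against the O((T - N) g(N)) control-phase
# term with g(N) = 1/sqrt(N); constants are set to one for illustration.
def bound(N, T):
    return N + (T - N) / N ** 0.5

def best_N(T):
    # Exhaustive search over estimation lengths for the minimizing N.
    return min(range(1, T), key=lambda N: bound(N, T))

for T in (10_000, 100_000):
    N_star = best_N(T)
    print(T, N_star, round(N_star / T ** (2 / 3), 2))
```

The printed ratio $N^{*}/T^{2/3}$ stays roughly constant across $T$, consistent with $N=\mathcal{O}(T^{2/3})$.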

References

  • [1] Yasin Abbasi-Yadkori and Csaba Szepesvári. Regret bounds for the adaptive control of linear quadratic systems. In Proceedings of the 24th Annual Conference on Learning Theory, pages 1–26, 2011.
  • [2] Veronica Adetola, Darryl DeHaan, and Martin Guay. Adaptive model predictive control for constrained nonlinear systems. Systems & Control Letters, 58(5):320–326, 2009.
  • [3] Naman Agarwal, Brian Bullins, Elad Hazan, Sham Kakade, and Karan Singh. Online control with adversarial disturbances. In International Conference on Machine Learning, pages 111–119, 2019.
  • [4] Naman Agarwal, Elad Hazan, and Karan Singh. Logarithmic regret for online control. In Advances in Neural Information Processing Systems, pages 10175–10184, 2019.
  • [5] David Angeli, Rishi Amrit, and James B Rawlings. On average performance and stability of economic model predictive control. IEEE Transactions on Automatic Control, 57(7):1615–1626, 2011.
  • [6] David Angeli, Alessandro Casavola, and Francesco Tedesco. Theoretical advances on economic model predictive control with time-varying costs. Annual Reviews in Control, 41:218–224, 2016.
  • [7] Karl J Åström and Björn Wittenmark. Adaptive control. Courier Corporation, 2013.
  • [8] Anil Aswani, Humberto Gonzalez, S Shankar Sastry, and Claire Tomlin. Provably safe and robust learning-based model predictive control. Automatica, 49(5):1216–1226, 2013.
  • [9] Francesco Borrelli, Alberto Bemporad, and Manfred Morari. Predictive control for linear and hybrid systems. Cambridge University Press, 2017.
  • [10] Monimoy Bujarbaruah, Xiaojing Zhang, Marko Tanaskovic, and Francesco Borrelli. Adaptive mpc under time varying uncertainty: Robust and stochastic. arXiv preprint arXiv:1909.13473, 2019.
  • [11] Yongxin Chen and Umesh Vaidya. Sample complexity for nonlinear dynamics. arXiv preprint arXiv:1810.11747, 2018.
  • [12] Alon Cohen, Avinatan Hassidim, Tomer Koren, Nevena Lazic, Yishay Mansour, and Kunal Talwar. Online linear quadratic control. arXiv preprint arXiv:1806.07104, 2018.
  • [13] Alon Cohen, Tomer Koren, and Yishay Mansour. Learning linear-quadratic regulators efficiently with only T\sqrt{T} regret. In International Conference on Machine Learning, pages 1300–1309, 2019.
  • [14] Constantinos Daskalakis, Stratis Skoulakis, and Manolis Zampetakis. The complexity of constrained min-max optimization. arXiv preprint arXiv:2009.09623, 2020.
  • [15] Sarah Dean, Horia Mania, Nikolai Matni, Benjamin Recht, and Stephen Tu. Regret bounds for robust adaptive control of the linear quadratic regulator. In Advances in Neural Information Processing Systems, pages 4188–4197, 2018.
  • [16] Francisco Facchinei and Jong-Shi Pang. Finite-dimensional variational inequalities and complementarity problems. Springer Science & Business Media, 2007.
  • [17] Antonio Ferramosca, Daniel Limon, and Eduardo F Camacho. Economic mpc for a changing economic criterion for linear systems. IEEE Transactions on Automatic Control, 59(10):2657–2667, 2014.
  • [18] Antonio Ferramosca, James B Rawlings, Daniel Limón, and Eduardo F Camacho. Economic mpc for a changing economic criterion. In 49th IEEE Conference on Decision and Control (CDC), pages 6131–6136. IEEE, 2010.
  • [19] Dylan Foster and Max Simchowitz. Logarithmic regret for adversarial online control. In International Conference on Machine Learning, pages 3211–3221. PMLR, 2020.
  • [20] Hiroaki Fukushima, Tae-Hyoung Kim, and Toshiharu Sugie. Adaptive model predictive control for a class of constrained linear systems based on the comparison model. Automatica, 43(2):301–308, 2007.
  • [21] Gautam Goel and Babak Hassibi. Regret-optimal control in dynamic environments. arXiv preprint arXiv:2010.10473, 2020.
  • [22] Paul J Goulart, Eric C Kerrigan, and Jan M Maciejowski. Optimization over state feedback policies for robust control with constraints. Automatica, 42(4):523–533, 2006.
  • [23] PJ Goulart, X Zhang, M Kamgarpour, A Georghiou, and J Lygeros. Robust optimal control with adjustable uncertainty sets. Automatica, 75, 2016.
  • [24] Lars Grüne and Anastasia Panin. On non-averaged performance of economic mpc with terminal conditions. In 2015 54th IEEE Conference on Decision and Control (CDC), pages 4332–4337. IEEE, 2015.
  • [25] Lars Grüne and Simon Pirkelmann. Closed-loop performance analysis for economic model predictive control of time-varying systems. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pages 5563–5569. IEEE, 2017.
  • [26] Lars Grüne and Marleen Stieler. Asymptotic stability and transient optimality of economic mpc without terminal conditions. Journal of Process Control, 24(8):1187–1196, 2014.
  • [27] Elad Hazan, Sham Kakade, and Karan Singh. The nonstochastic control problem. In Algorithmic Learning Theory, pages 408–421. PMLR, 2020.
  • [28] Petros A Ioannou and Jing Sun. Robust adaptive control. Courier Corporation, 2012.
  • [29] Wilbur Langson, Ioannis Chryssochoos, SV Raković, and David Q Mayne. Robust model predictive control using tubes. Automatica, 40(1):125–133, 2004.
  • [30] Maciej Ławryńczuk. Computationally Efficient Model Predictive Control Algorithms: A Neural Network Approach. Studies in Systems, Decision and Control, vol. 3. Springer, 2014.
  • [31] Devon Lehrer, Veronica Adetola, and Martin Guay. Parameter identification methods for non-linear discrete-time systems. In Proceedings of the 2010 American Control Conference, pages 2170–2175. IEEE, 2010.
  • [32] Yingying Li, Xin Chen, and Na Li. Online optimal control with linear dynamics and predictions: Algorithms and regret analysis. In Advances in Neural Information Processing Systems, pages 14887–14899, 2019.
  • [33] D Limon, I Alvarado, TEFC Alamo, and EF Camacho. Robust tube-based mpc for tracking of constrained linear systems with additive disturbances. Journal of Process Control, 20(3):248–260, 2010.
  • [34] Horia Mania, Stephen Tu, and Benjamin Recht. Certainty equivalent control of lqr is efficient. arXiv preprint arXiv:1902.07826, 2019.
  • [35] David Q Mayne, James B Rawlings, Christopher V Rao, and Pierre OM Scokaert. Constrained model predictive control: Stability and optimality. Automatica, 36(6):789–814, 2000.
  • [36] Igor Mezic. Koopman operator, geometry, and learning. arXiv preprint arXiv:2010.05377, 2020.
  • [37] J Moore. Persistence of excitation in extended least squares. IEEE Transactions on Automatic Control, 28(1):60–68, 1983.
  • [38] Manfred Morari and Jay H Lee. Model predictive control: past, present and future. Computers & Chemical Engineering, 23(4-5):667–682, 1999.
  • [39] Arkadi Nemirovski. Interior point polynomial time methods in convex programming. Lecture notes, 2004.
  • [40] Marc Quincampoix and Nadia Zlateva. Parameterized minimax problem: on lipschitz-like dependence of the solution with respect to the parameter. SIAM Journal on Optimization, 19(3):1250–1269, 2008.
  • [41] R Tyrrell Rockafellar and Roger J-B Wets. Variational analysis, volume 317. Springer Science & Business Media, 2009.
  • [42] J Ben Rosen. Existence and uniqueness of equilibrium points for concave n-person games. Econometrica: Journal of the Econometric Society, pages 520–534, 1965.
  • [43] Shankar Sastry and Marc Bodson. Adaptive control: stability, convergence and robustness. Courier Corporation, 2011.
  • [44] Max Simchowitz and Dylan Foster. Naive exploration is optimal for online lqr. In International Conference on Machine Learning, pages 8937–8948. PMLR, 2020.
  • [45] Max Simchowitz, Karan Singh, and Elad Hazan. Improper learning for non-stochastic control. arXiv preprint arXiv:2001.09254, 2020.
  • [46] Sigurd Skogestad and Ian Postlethwaite. Multivariable feedback control: analysis and design, volume 2. Citeseer, 2007.
  • [47] Marko Tanaskovic, Lorenzo Fagiano, and Vojislav Gligorovski. Adaptive model predictive control for linear time varying mimo systems. Automatica, 105:237–245, 2019.
  • [48] Roberto Tempo, Giuseppe Calafiore, and Fabrizio Dabbene. Randomized algorithms for analysis and control of uncertain systems: with applications. Springer Science & Business Media, 2012.
  • [49] Stephen J Wright. Efficient convex optimization for linear mpc. In Handbook of model predictive control, pages 287–303. Springer, 2019.

Appendix A: Proof of Lemma 1

We note that

VMt(xt,Wt:te)ct(xt,κM(xt))α¯σ(xt).V^{t}_{M}(x_{t},W_{t:t^{e}})\geq c_{t}(x_{t},\kappa_{M}(x_{t}))\geq\underline{\alpha}\sigma(x_{t}).

The last inequality follows directly from Assumption 2. Hence, α¯V=α¯>0\underline{\alpha}_{V}=\underline{\alpha}>0. We introduce some notation for simplifying the presentation of the analysis. If MPCt(Wt:te)={u~t,u~t+1,,u~te}\text{MPC}^{t}(W_{t:t^{e}})=\{\tilde{u}^{*}_{t},\tilde{u}^{*}_{t+1},...,\tilde{u}^{*}_{t^{e}}\}, then let MPCt(Wt:te)[i:j]={u~t+i1,,u~t+j1},ij\text{MPC}^{t}(W_{t:t^{e}})[i:j]=\{\tilde{u}^{*}_{t+i-1},...,\tilde{u}^{*}_{t+j-1}\},\ i\leq j. We denote MPCt(Wt:te)\text{MPC}^{t}(W_{t:t^{e}}) by Mpt\text{M}^{t}_{p}, and Wt:teW_{t:t^{e}} by WtW_{t}.

Let ϕt(k,xt,Mpt,Wt)\phi^{t}(k,x_{t},\text{M}^{t}_{p},W_{t}) denote the state the system evolves to at time t+kt+k starting at xtx_{t}, under the control sequence Mpt[1:k]\text{M}^{t}_{p}[1:k] and disturbance WtW_{t}. From the definition of VMt(xt,Wt)V^{t}_{M}(x_{t},W_{t}) it follows that

VMt+1(xt+1,Wt+1)VMt(xt,Wt)=JMt+1(xt+1,Mpt+1,Wt+1)JMt(xt,Mpt,Wt)\displaystyle V^{t+1}_{M}(x_{t+1},W_{t+1})-V^{t}_{M}(x_{t},W_{t})=J^{t+1}_{M}(x_{t+1},\text{M}^{t+1}_{p},W_{t+1})-J^{t}_{M}(x_{t},\text{M}^{t}_{p},W_{t})
=k=0M1ct+1+k(ϕt+1(k,xt+1,Mpt+1,Wt+1),Mpt+1[k+1])k=0M1ct+k(ϕt(k,xt,Mpt,Wt),Mpt[k+1]).\displaystyle=\sum_{k=0}^{M-1}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\text{M}^{t+1}_{p},W_{t+1}),\text{M}^{t+1}_{p}[k+1])-\sum_{k=0}^{M-1}c_{t+k}(\phi^{t}(k,x_{t},\text{M}^{t}_{p},W_{t}),\text{M}^{t}_{p}[k+1]).

The second step just follows from the definition of JMt(.,.,.)J^{t}_{M}(.,.,.). Hence,

VMt+1(xt+1,Wt+1)VMt(xt,Wt)=k=0jct+1+k(ϕt+1(k,xt+1,Mpt+1,Wt+1),Mpt+1[k+1])\displaystyle V^{t+1}_{M}(x_{t+1},W_{t+1})-V^{t}_{M}(x_{t},W_{t})=\sum_{k=0}^{j}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\text{M}^{t+1}_{p},W_{t+1}),\text{M}^{t+1}_{p}[k+1])
+k=j+1M1ct+1+k(ϕt+1(k,xt+1,Mpt+1,Wt+1),Mpt+1[k+1])k=0M1ct+k(ϕt(k,xt,Mpt,Wt),Mpt[k+1])\displaystyle+\sum_{k=j+1}^{M-1}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\text{M}^{t+1}_{p},W_{t+1}),\text{M}^{t+1}_{p}[k+1])-\sum_{k=0}^{M-1}c_{t+k}(\phi^{t}(k,x_{t},\text{M}^{t}_{p},W_{t}),\text{M}^{t}_{p}[k+1])
k=0j1ct+1+k(ϕt+1(k,xt+1,M~pt+1,Wt+1),M~pt+1[k+1])\displaystyle\leq\sum_{k=0}^{j-1}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\tilde{\text{M}}^{t+1}_{p},W_{t+1}),\tilde{\text{M}}^{t+1}_{p}[k+1])
+k=jM1ct+1+k(ϕt+1(k,xt+1,M~pt+1,Wt+1),M~pt+1[k+1])k=0M1ct+k(ϕt(k,xt,Mpt,Wt),Mpt[k+1]),\displaystyle+\sum_{k=j}^{M-1}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\tilde{\text{M}}^{t+1}_{p},W_{t+1}),\tilde{\text{M}}^{t+1}_{p}[k+1])-\sum_{k=0}^{M-1}c_{t+k}(\phi^{t}(k,x_{t},\text{M}^{t}_{p},W_{t}),\text{M}^{t}_{p}[k+1]),

where $\tilde{\text{M}}^{t+1}_{p}=[\text{M}^{t}_{p}[2:j+1],\text{M}^{*j,t+1}_{p}[j+1:M]]$, and $\text{M}^{*j,t+1}_{p}[j+1:M]$ is the optimal control sequence given that the control for the first $j$ time steps is $\text{M}^{t}_{p}[2:j+1]$. The inequality follows from the fact that $\tilde{\text{M}}^{t+1}_{p}$ is sub-optimal compared to $\text{M}^{t+1}_{p}$. Now, by definition,

k=0M1ct+k(ϕt(k,xt,Mpt,Wt),Mpt[k+1])=ct(xt,Mpt[1])\displaystyle\sum_{k=0}^{M-1}c_{t+k}(\phi^{t}(k,x_{t},\text{M}^{t}_{p},W_{t}),\text{M}^{t}_{p}[k+1])=c_{t}(x_{t},\text{M}^{t}_{p}[1])
+k=0M2ct+1+k(ϕt+1(k,xt+1,Mpt[2:M],Wt+1),Mpt[k+2]).\displaystyle+\sum_{k=0}^{M-2}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\text{M}^{t}_{p}[2:M],W_{t+1}),\text{M}^{t}_{p}[k+2]).

Substituting this into the above inequality, the first $j$ terms (from $k=0$ to $j-1$) of the first summation on the right cancel. Hence, we have

VMt+1(xt+1,Wt+1)VMt(xt,Wt)k=jM1ct+1+k(ϕt+1(k,xt+1,M~pt+1,Wt+1),M~pt+1[k+1])\displaystyle V^{t+1}_{M}(x_{t+1},W_{t+1})-V^{t}_{M}(x_{t},W_{t})\leq\sum_{k=j}^{M-1}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\tilde{\text{M}}^{t+1}_{p},W_{t+1}),\tilde{\text{M}}^{t+1}_{p}[k+1])
ct(xt,Mpt[1]).\displaystyle-c_{t}(x_{t},\text{M}^{t}_{p}[1]).

By definition of M~pt+1\tilde{\text{M}}^{t+1}_{p}, the first sum on the right is the optimal cost-to-go starting from the state ϕt+1(j,xt+1,M~pt+1,Wt+1)\phi^{t+1}(j,x_{t+1},\tilde{\text{M}}^{t+1}_{p},W_{t+1}) for the horizon MjM-j from t+jt+j. Hence,

VMt+1(xt+1,Wt+1)VMt(xt,Wt)VMjt+j(ϕt+1(j,xt+1,M~pt+1,Wt+1),Wt+j)ct(xt,Mpt[1]).V^{t+1}_{M}(x_{t+1},W_{t+1})-V^{t}_{M}(x_{t},W_{t})\leq V^{t+j}_{M-j}(\phi^{t+1}(j,x_{t+1},\tilde{\text{M}}^{t+1}_{p},W_{t+1}),W_{t+j})-c_{t}(x_{t},\text{M}^{t}_{p}[1]).

Once again by the assumption in the Lemma we have that

VMjt+j(ϕt+1(j,xt+1,M~pt+1,Wt+1),Wt+j)α¯σ(ϕt+1(j,xt+1,M~pt+1,Wt+1))+γ¯k=t+jt+M2wk22.V^{t+j}_{M-j}(\phi^{t+1}(j,x_{t+1},\tilde{\text{M}}^{t+1}_{p},W_{t+1}),W_{t+j})\leq\overline{\alpha}\sigma(\phi^{t+1}(j,x_{t+1},\tilde{\text{M}}^{t+1}_{p},W_{t+1}))+\overline{\gamma}\sum_{k=t+j}^{t+M-2}\lVert w_{k}\rVert^{2}_{2}.

Hence,

VMt+1(xt+1,Wt+1)VMt(xt,Wt)α¯σ(ϕt+1(j,xt+1,M~pt+1,Wt+1))\displaystyle V^{t+1}_{M}(x_{t+1},W_{t+1})-V^{t}_{M}(x_{t},W_{t})\leq\overline{\alpha}\sigma(\phi^{t+1}(j,x_{t+1},\tilde{\text{M}}^{t+1}_{p},W_{t+1}))
+γ¯k=t+jt+M2wk22ct(xt,Mpt[1]).\displaystyle+\overline{\gamma}\sum_{k=t+j}^{t+M-2}\lVert w_{k}\rVert^{2}_{2}-c_{t}(x_{t},\text{M}^{t}_{p}[1]).

Now, ϕt+1(j,xt+1,M~pt+1,Wt+1)=ϕt+1(j,xt+1,Mpt[2:j+1],Wt+1)\phi^{t+1}(j,x_{t+1},\tilde{\text{M}}^{t+1}_{p},W_{t+1})=\phi^{t+1}(j,x_{t+1},\text{M}^{t}_{p}[2:j+1],W_{t+1}). Hence,

VMt+1(xt+1,Wt+1)VMt(xt,Wt)α¯σ(ϕt+1(j,xt+1,Mpt[2:j+1],Wt+1))\displaystyle V^{t+1}_{M}(x_{t+1},W_{t+1})-V^{t}_{M}(x_{t},W_{t})\leq\overline{\alpha}\sigma(\phi^{t+1}(j,x_{t+1},\text{M}^{t}_{p}[2:j+1],W_{t+1}))
+γ¯k=t+jt+M2wk22ct(xt,Mpt[1]).\displaystyle+\overline{\gamma}\sum_{k=t+j}^{t+M-2}\lVert w_{k}\rVert^{2}_{2}-c_{t}(x_{t},\text{M}^{t}_{p}[1]). (5)

From Assumption 2 and the fact that $V^{t}_{M}(x_{t},W_{t:t^{e}})\leq\overline{\alpha}\sigma(x_{t})+\overline{\gamma}\sum_{k=t}^{t^{e}-1}\lVert w_{k}\rVert^{2}_{2}$ for any $t$, it follows that

α¯k=0M2σ(ϕt+1(k,xt+1,Mpt[2:k+1],Wt+1))k=0M2ct+1+k(ϕt+1(k,xt+1,Mpt[2:M],Wt+1),Mpt[k+2])\displaystyle\underline{\alpha}\sum_{k=0}^{M-2}\sigma(\phi^{t+1}(k,x_{t+1},\text{M}^{t}_{p}[2:k+1],W_{t+1}))\leq\sum_{k=0}^{M-2}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\text{M}^{t}_{p}[2:M],W_{t+1}),\text{M}^{t}_{p}[k+2])
ct(xt,Mpt[1])+k=0M2ct+1+k(ϕt+1(k,xt+1,Mpt[2:M],Wt+1),Mpt[k+2])α¯σ(xt)+γ¯k=ttewk22.\displaystyle\leq c_{t}(x_{t},\text{M}^{t}_{p}[1])+\sum_{k=0}^{M-2}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\text{M}^{t}_{p}[2:M],W_{t+1}),\text{M}^{t}_{p}[k+2])\leq\overline{\alpha}\sigma(x_{t})+\overline{\gamma}\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}.

The above inequality implies that j{0,1,,M2}\exists\ j^{*}\in\{0,1,...,M-2\} such that

σ(ϕt+1(j,xt+1,Mpt[2:j+1],Wt+1))1α¯(M1)(α¯σ(xt)+γ¯k=ttewk22).\sigma(\phi^{t+1}(j^{*},x_{t+1},\text{M}^{t}_{p}[2:j^{*}+1],W_{t+1}))\leq\frac{1}{\underline{\alpha}(M-1)}\left(\overline{\alpha}\sigma(x_{t})+\overline{\gamma}\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}\right).

Then, using this in Eq. (5), we get that

VMt+1(xt+1,Wt+1)VMt(xt,Wt)α¯σ(ϕt+1(j,xt+1,Mpt[2:j+1],Wt+1))\displaystyle V^{t+1}_{M}(x_{t+1},W_{t+1})-V^{t}_{M}(x_{t},W_{t})\leq\overline{\alpha}\sigma(\phi^{t+1}(j^{*},x_{t+1},\text{M}^{t}_{p}[2:j^{*}+1],W_{t+1}))
+γ¯k=t+jt+M2wk22ct(xt,Mpt[1])\displaystyle+\overline{\gamma}\sum_{k=t+j^{*}}^{t+M-2}\lVert w_{k}\rVert^{2}_{2}-c_{t}(x_{t},\text{M}^{t}_{p}[1])
α¯α¯(M1)(α¯σ(xt)+γ¯k=ttewk22)+γ¯k=t+jt+M2wk22ct(xt,Mpt[1])\displaystyle\leq\frac{\overline{\alpha}}{\underline{\alpha}(M-1)}\left(\overline{\alpha}\sigma(x_{t})+\overline{\gamma}\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}\right)+\overline{\gamma}\sum_{k=t+j^{*}}^{t+M-2}\lVert w_{k}\rVert^{2}_{2}-c_{t}(x_{t},\text{M}^{t}_{p}[1])
=α¯2α¯(M1)σ(xt)ct(xt,Mpt[1])+γ¯(α¯α¯(M1)+1)(k=ttewk22).\displaystyle=\frac{\overline{\alpha}^{2}}{\underline{\alpha}(M-1)}\sigma(x_{t})-c_{t}(x_{t},\text{M}^{t}_{p}[1])+\overline{\gamma}\left(\frac{\overline{\alpha}}{\underline{\alpha}(M-1)}+1\right)\left(\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}\right).

Then using the fact that ct(xt,Mpt[1])α¯σ(xt)c_{t}(x_{t},\text{M}^{t}_{p}[1])\geq\underline{\alpha}\sigma(x_{t}) we get that

VMt+1(xt+1,Wt+1)VMt(xt,Wt)α¯2α¯(M1)σ(xt)α¯σ(xt)+γ¯(α¯α¯(M1)+1)(k=ttewk22)\displaystyle V^{t+1}_{M}(x_{t+1},W_{t+1})-V^{t}_{M}(x_{t},W_{t})\leq\frac{\overline{\alpha}^{2}}{\underline{\alpha}(M-1)}\sigma(x_{t})-\underline{\alpha}\sigma(x_{t})+\overline{\gamma}\left(\frac{\overline{\alpha}}{\underline{\alpha}(M-1)}+1\right)\left(\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}\right)
=(α¯2α¯(M1)α¯)σ(xt)+γ¯(α¯α¯(M1)+1)(k=ttewk22).\displaystyle=\left(\frac{\overline{\alpha}^{2}}{\underline{\alpha}(M-1)}-\underline{\alpha}\right)\sigma(x_{t})+\overline{\gamma}\left(\frac{\overline{\alpha}}{\underline{\alpha}(M-1)}+1\right)\left(\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}\right).

Hence,

ΓV=(α¯2α¯(M1)α¯),ΓVγ=γ¯(α¯α¯(M1)+1).\Gamma_{V}=\left(\frac{\overline{\alpha}^{2}}{\underline{\alpha}(M-1)}-\underline{\alpha}\right),\ \Gamma^{\gamma}_{V}=\overline{\gamma}\left(\frac{\overline{\alpha}}{\underline{\alpha}(M-1)}+1\right).

\blacksquare

Appendix B: Proof of Lemma 2

Since M>α¯2/α¯2+1M>\overline{\alpha}^{2}/\underline{\alpha}^{2}+1, there exists ϵ~(0,α¯)\tilde{\epsilon}\in(0,\underline{\alpha}) such that

M>α¯2/(α¯(α¯ϵ~))+1α¯/(α¯ϵ~)+1.M>\overline{\alpha}^{2}/\left(\underline{\alpha}(\underline{\alpha}-\tilde{\epsilon})\right)+1\geq\overline{\alpha}/\left(\underline{\alpha}-\tilde{\epsilon}\right)+1.

For convenience of presentation in the following we denote VMt(xt,Wt:te)V^{t}_{M}(x_{t},W_{t:t^{e}}) by VMtV^{t}_{M}. From Lemma 1 it follows that

V^{t+1}_{M}-V^{t}_{M}\leq\Gamma_{V}\sigma(x_{t})+\Gamma^{\gamma}_{V}\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2},\ \Gamma_{V}=\left(\frac{\overline{\alpha}^{2}}{\underline{\alpha}(M-1)}-\underline{\alpha}\right),\ \Gamma^{\gamma}_{V}=\overline{\gamma}\left(\frac{\overline{\alpha}}{\underline{\alpha}(M-1)}+1\right).

From the above inequality it follows that ΓVϵ~\Gamma_{V}\leq-\tilde{\epsilon} and ΓVγγ¯((α¯ϵ~)/α¯+1)\Gamma^{\gamma}_{V}\leq\overline{\gamma}\left((\underline{\alpha}-\tilde{\epsilon})/\overline{\alpha}+1\right). Hence,

V^{t+1}_{M}-V^{t}_{M}\leq-\tilde{\epsilon}\sigma(x_{t})+\overline{\gamma}\left(\frac{\underline{\alpha}-\tilde{\epsilon}}{\overline{\alpha}}+1\right)\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}.

Then, using the condition that $V^{t}_{M}\leq\overline{\alpha}\sigma(x_{t})+\overline{\gamma}\sum_{k=t}^{t^{e}-1}\lVert w_{k}\rVert^{2}_{2}$, we get

VMt+1VMt\displaystyle V^{t+1}_{M}-V^{t}_{M} ϵ~α¯(VMtγ¯k=tte1wk22)+γ¯(α¯ϵ~α¯+1)k=ttewk22\displaystyle\leq-\frac{\tilde{\epsilon}}{\overline{\alpha}}\left(V^{t}_{M}-\overline{\gamma}\sum_{k=t}^{t^{e}-1}\lVert w_{k}\rVert^{2}_{2}\right)+\overline{\gamma}\left(\frac{\underline{\alpha}-\tilde{\epsilon}}{\overline{\alpha}}+1\right)\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}
=ϵ~α¯VMt+γ¯(α¯α¯+1)k=ttewk22.\displaystyle=-\frac{\tilde{\epsilon}}{\overline{\alpha}}V^{t}_{M}+\overline{\gamma}\left(\frac{\underline{\alpha}}{\overline{\alpha}}+1\right)\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}.

Pick ϵ\epsilon such that 0<ϵϵ~/α¯0<\epsilon\leq\tilde{\epsilon}/\overline{\alpha}. Then

VMt+1VMtϵVMt+γ¯(α¯α¯+1)k=ttewk22.V^{t+1}_{M}-V^{t}_{M}\leq-\epsilon V^{t}_{M}+\overline{\gamma}\left(\frac{\underline{\alpha}}{\overline{\alpha}}+1\right)\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}.

Let a=(1ϵ)a=(1-\epsilon). Note that a<1a<1. Let b=γ¯(α¯α¯+1)b=\overline{\gamma}\left(\frac{\underline{\alpha}}{\overline{\alpha}}+1\right). Then

VMt+1aVMt+bk=ttewk22.V^{t+1}_{M}\leq aV^{t}_{M}+b\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}.

Then,

VMt+2aVMt+1+bk=t+1te+1wk22a(aVMt+bk=ttewk22)+bk=t+1te+1wk22\displaystyle V^{t+2}_{M}\leq aV^{t+1}_{M}+b\sum_{k=t+1}^{t^{e}+1}\lVert w_{k}\rVert^{2}_{2}\leq a\left(aV^{t}_{M}+b\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}\right)+b\sum_{k=t+1}^{t^{e}+1}\lVert w_{k}\rVert^{2}_{2}
=a2VMt+abwt22+b(1+a)k=t+1tewk22+bwte+122.\displaystyle=a^{2}V^{t}_{M}+ab\lVert w_{t}\rVert^{2}_{2}+b(1+a)\sum_{k=t+1}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}+b\lVert w_{t^{e}+1}\rVert^{2}_{2}.

Similarly,

VMt+3\displaystyle V^{t+3}_{M} aVMt+2+bk=t+2te+2wk22a3VMt+a2bwt22+ab(1+a)wt+122\displaystyle\leq aV^{t+2}_{M}+b\sum_{k=t+2}^{t^{e}+2}\lVert w_{k}\rVert^{2}_{2}\leq a^{3}V^{t}_{M}+a^{2}b\lVert w_{t}\rVert^{2}_{2}+ab(1+a)\lVert w_{t+1}\rVert^{2}_{2}
+b(1+a+a2)k=t+2tewk22+b(1+a)wte+122+bwte+222.\displaystyle+b(1+a+a^{2})\sum_{k=t+2}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}+b(1+a)\lVert w_{t^{e}+1}\rVert^{2}_{2}+b\lVert w_{t^{e}+2}\rVert^{2}_{2}.

Extending the previous equation we get that for HMH\leq M

VMt+HaHVMt+bk=0H2((j=0kaH1j)wt+k22)+b(j=0H1aj)(k=H1M1wt+k22)\displaystyle V^{t+H}_{M}\leq a^{H}V^{t}_{M}+b\sum_{k=0}^{H-2}\left(\left(\sum_{j=0}^{k}a^{H-1-j}\right)\lVert w_{t+k}\rVert^{2}_{2}\right)+b\left(\sum_{j=0}^{H-1}a^{j}\right)\left(\sum_{k=H-1}^{M-1}\lVert w_{t+k}\rVert^{2}_{2}\right)
+bk=MH+M2((j=0H+M2kaj)wt+k22).\displaystyle+b\sum_{k=M}^{H+M-2}\left(\left(\sum_{j=0}^{H+M-2-k}a^{j}\right)\lVert w_{t+k}\rVert^{2}_{2}\right). (6)

Next, we prove the upper bound for the case H>MH>M by induction. We hypothesize that VMt+HV^{t+H}_{M} for HMH\geq M is given by

VMt+HaHVMt+bk=0M1((j=0kaH1j)wt+k22)\displaystyle V^{t+H}_{M}\leq a^{H}V^{t}_{M}+b\sum_{k=0}^{M-1}\left(\left(\sum_{j=0}^{k}a^{H-1-j}\right)\lVert w_{t+k}\rVert^{2}_{2}\right)
+b1aM1ak=MH1aHk1wt+k22+bk=HH+M2((j=0H+M2kaj)wt+k22)\displaystyle+b\frac{1-a^{M}}{1-a}\sum_{k=M}^{H-1}a^{H-k-1}\lVert w_{t+k}\rVert^{2}_{2}+b\sum_{k=H}^{H+M-2}\left(\left(\sum_{j=0}^{H+M-2-k}a^{j}\right)\lVert w_{t+k}\rVert^{2}_{2}\right) (7)

From Lemma 1 we know that

V^{t+H+1}_{M}\leq aV^{t+H}_{M}+b\sum_{k=t+H}^{t+H+M-1}\lVert w_{k}\rVert^{2}_{2}.

Substituting Eq. (7), we get

VMt+H+1a(aHVMt+bk=0M1((j=0kaH1j)wt+k22)\displaystyle V^{t+H+1}_{M}\leq a\left(a^{H}V^{t}_{M}+b\sum_{k=0}^{M-1}\left(\left(\sum_{j=0}^{k}a^{H-1-j}\right)\lVert w_{t+k}\rVert^{2}_{2}\right)\right.
+b1aM1ak=MH1aHk1wt+k22+bk=HH+M2((j=0H+M2kaj)wt+k22))+bk=t+Ht+H+M1wk22\displaystyle\left.+b\frac{1-a^{M}}{1-a}\sum_{k=M}^{H-1}a^{H-k-1}\lVert w_{t+k}\rVert^{2}_{2}+b\sum_{k=H}^{H+M-2}\left(\left(\sum_{j=0}^{H+M-2-k}a^{j}\right)\lVert w_{t+k}\rVert^{2}_{2}\right)\right)+b\sum_{k=t+H}^{t+H+M-1}\lVert w_{k}\rVert^{2}_{2}
=aH+1VMt+bk=0M1((j=0kaHj)wt+k22)+b1aM1ak=MH1aHkwt+k22\displaystyle=a^{H+1}V^{t}_{M}+b\sum_{k=0}^{M-1}\left(\left(\sum_{j=0}^{k}a^{H-j}\right)\lVert w_{t+k}\rVert^{2}_{2}\right)+b\frac{1-a^{M}}{1-a}\sum_{k=M}^{H-1}a^{H-k}\lVert w_{t+k}\rVert^{2}_{2}
+bk=HH+M2((j=0H+M2kaj+1)wt+k22)+bk=t+Ht+H+M1wk22\displaystyle+b\sum_{k=H}^{H+M-2}\left(\left(\sum_{j=0}^{H+M-2-k}a^{j+1}\right)\lVert w_{t+k}\rVert^{2}_{2}\right)+b\sum_{k=t+H}^{t+H+M-1}\lVert w_{k}\rVert^{2}_{2}
=aH+1VMt+bk=0M1((j=0kaHj)wt+k22)+b1aM1ak=MHaHkwt+k22\displaystyle=a^{H+1}V^{t}_{M}+b\sum_{k=0}^{M-1}\left(\left(\sum_{j=0}^{k}a^{H-j}\right)\lVert w_{t+k}\rVert^{2}_{2}\right)+b\frac{1-a^{M}}{1-a}\sum_{k=M}^{H}a^{H-k}\lVert w_{t+k}\rVert^{2}_{2}
+bk=H+1H+M1((j=0H+M1kaj)wt+k22).\displaystyle+b\sum_{k=H+1}^{H+M-1}\left(\left(\sum_{j=0}^{H+M-1-k}a^{j}\right)\lVert w_{t+k}\rVert^{2}_{2}\right).

From Eq. (6) the hypothesis Eq. (7) is true for H=MH=M. Hence, by induction Eq. (7) is true for all HMH\geq M. We can simplify Eq. (7) as

VMt+HaHVMt+baHM(k=0M1aMk1(1ak+1)1awt+k22)+b1aM1ak=MH1aHk1wt+k22\displaystyle V^{t+H}_{M}\leq a^{H}V^{t}_{M}+ba^{H-M}\left(\sum_{k=0}^{M-1}\frac{a^{M-k-1}\left(1-a^{k+1}\right)}{1-a}\lVert w_{t+k}\rVert^{2}_{2}\right)+b\frac{1-a^{M}}{1-a}\sum_{k=M}^{H-1}a^{H-k-1}\lVert w_{t+k}\rVert^{2}_{2}
+bk=HH+M2((j=0H+M2kaj)wt+k22)\displaystyle+b\sum_{k=H}^{H+M-2}\left(\left(\sum_{j=0}^{H+M-2-k}a^{j}\right)\lVert w_{t+k}\rVert^{2}_{2}\right)
aHVMt+baHM1a(k=0M1wt+k22)+b1ak=MH1aHk1wt+k22+b1ak=HH+M2wt+k22.\displaystyle\leq a^{H}V^{t}_{M}+\frac{ba^{H-M}}{1-a}\left(\sum_{k=0}^{M-1}\lVert w_{t+k}\rVert^{2}_{2}\right)+\frac{b}{1-a}\sum_{k=M}^{H-1}a^{H-k-1}\lVert w_{t+k}\rVert^{2}_{2}+\frac{b}{1-a}\sum_{k=H}^{H+M-2}\lVert w_{t+k}\rVert^{2}_{2}.

Since VMt+HctH(xtH,κM(xtH))V^{t+H}_{M}\geq c_{t_{H}}(x_{t_{H}},\kappa_{M}(x_{t_{H}})), and using the condition that VMtα¯σ(xt)+γ¯k=ttewk22V^{t}_{M}\leq\overline{\alpha}\sigma(x_{t})+\overline{\gamma}\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}, we get

ctH(.,.)aHα¯σ(xt)+(baHM1a+aHγ¯)(k=0M1wt+k22)\displaystyle c_{t_{H}}(.,.)\leq a^{H}\overline{\alpha}\sigma(x_{t})+\left(\frac{ba^{H-M}}{1-a}+a^{H}\overline{\gamma}\right)\left(\sum_{k=0}^{M-1}\lVert w_{t+k}\rVert^{2}_{2}\right)
+b(1a)k=MH1aHk1wt+k22+b(1a)k=HH+M2wt+k22.\displaystyle+\frac{b}{(1-a)}\sum_{k=M}^{H-1}a^{H-k-1}\lVert w_{t+k}\rVert^{2}_{2}+\frac{b}{(1-a)}\sum_{k=H}^{H+M-2}\lVert w_{t+k}\rVert^{2}_{2}.

Thus, for any 0<ϵϵ~/α¯0<\epsilon\leq\tilde{\epsilon}/\overline{\alpha}, setting a=1ϵa=1-\epsilon, the constants in the statement of the lemma are

Mλ=aHα¯;λ=lna;Mw,j=(baHM1a+aHγ¯),tjte;\displaystyle M_{\lambda}=a^{H}\overline{\alpha};\ \lambda=-\ln{a};M_{w,j}=\left(\frac{ba^{H-M}}{1-a}+a^{H}\overline{\gamma}\right),t\leq j\leq t^{e};
Mw,j=b(1a)at+Hj1,te+1jt+H1;\displaystyle M_{w,j}=\frac{b}{(1-a)}a^{t+H-j-1},t^{e}+1\leq j\leq t+H-1;
Mw,j=b(1a),t+Hjt+H+M2\displaystyle M_{w,j}=\frac{b}{(1-a)},t+H\leq j\leq t+H+M-2\ \blacksquare

Appendix C: Theorem 1

In the following, we denote the cost incurred at time step tt succinctly by ctc_{t}, ignoring its arguments. Let HMH\geq M. Given that M>α¯2/α¯2+1M>\overline{\alpha}^{2}/\underline{\alpha}^{2}+1, Lemma 2 is applicable to bound cH+1c_{H+1}. Hence, for any aa s.t. 1>a1ϵ~/α¯,0<ϵ~<α¯α¯2(M1)α¯1>a\geq 1-\tilde{\epsilon}/\overline{\alpha},0<\tilde{\epsilon}<\underline{\alpha}-\frac{\overline{\alpha}^{2}}{(M-1)\underline{\alpha}}, the stage cost of the closed-loop system at H+1H+1 is bounded by

cH+1aHα¯σ(x1)+(baHM1a+aHγ¯)(k=0M1w1+k22)\displaystyle c_{H+1}\leq a^{H}\overline{\alpha}\sigma(x_{1})+\left(\frac{ba^{H-M}}{1-a}+a^{H}\overline{\gamma}\right)\left(\sum_{k=0}^{M-1}\lVert w_{1+k}\rVert^{2}_{2}\right)
+b(1a)k=MH1aHk1w1+k22+b1ak=HH+M2w1+k22.\displaystyle+\frac{b}{(1-a)}\sum_{k=M}^{H-1}a^{H-k-1}\lVert w_{1+k}\rVert^{2}_{2}+\frac{b}{1-a}\sum_{k=H}^{H+M-2}\lVert w_{1+k}\rVert^{2}_{2}.

Hence,

H=MT1cH+1α¯aM1aσ(x1)+(b(1a)2+aMγ¯1a)(k=0M1w1+k22)\displaystyle\sum_{H=M}^{T-1}c_{H+1}\leq\frac{\overline{\alpha}a^{M}}{1-a}\sigma(x_{1})+\left(\frac{b}{(1-a)^{2}}+\frac{a^{M}\overline{\gamma}}{1-a}\right)\left(\sum_{k=0}^{M-1}\lVert w_{1+k}\rVert^{2}_{2}\right)
+b(1a)H=MT1k=MH1aHk1w1+k22+b1aH=MT1k=HH+M2w1+k22.\displaystyle+\frac{b}{(1-a)}\sum_{H=M}^{T-1}\sum_{k=M}^{H-1}a^{H-k-1}\lVert w_{1+k}\rVert^{2}_{2}+\frac{b}{1-a}\sum_{H=M}^{T-1}\sum_{k=H}^{H+M-2}\lVert w_{1+k}\rVert^{2}_{2}.

The first two terms are O(1)O(1). Combining the terms corresponding to each disturbance realization wHw_{H}, we get

H=MT1cH+1O(1)+b((M1)(1a)+1)(1a)2H=MTwH22+b(M1)1aH=TT+M2wH22\sum_{H=M}^{T-1}c_{H+1}\leq O(1)+\frac{b\left((M-1)(1-a)+1\right)}{(1-a)^{2}}\sum_{H=M}^{T}\lVert w_{H}\rVert^{2}_{2}+\frac{b(M-1)}{1-a}\sum_{H=T}^{T+M-2}\lVert w_{H}\rVert^{2}_{2}

The last term is also O(1)O(1). Hence,

H=MT1cH+1O(1)+b((M1)(1a)+1)(1a)2H=MTwH22.\sum_{H=M}^{T-1}c_{H+1}\leq O(1)+\frac{b\left((M-1)(1-a)+1\right)}{(1-a)^{2}}\sum_{H=M}^{T}\lVert w_{H}\rVert^{2}_{2}.
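
For completeness, the coefficient γc\gamma_{c} defined below arises from a routine geometric-sum computation: each fixed disturbance term wH22\lVert w_{H}\rVert^{2}_{2} collects at most

b1aH>HaHH1b(1a)2andb(M1)1a\frac{b}{1-a}\sum_{H^{\prime}>H}a^{H^{\prime}-H-1}\leq\frac{b}{(1-a)^{2}}\quad\text{and}\quad\frac{b(M-1)}{1-a}

from the first and second double sums respectively, and

b(1a)2+b(M1)1a=b((M1)(1a)+1)(1a)2.\frac{b}{(1-a)^{2}}+\frac{b(M-1)}{1-a}=\frac{b\left((M-1)(1-a)+1\right)}{(1-a)^{2}}.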

From Eq. (6) in the proof of Lemma 2 we get that H=1M1cHO(1)\sum_{H=1}^{M-1}c_{H}\leq O(1). Then for γc=b((M1)(1a)+1)(1a)2\gamma_{c}=\frac{b\left((M-1)(1-a)+1\right)}{(1-a)^{2}} we get

H=1T1cHγcH=MTwH22O(1)\sum_{H=1}^{T-1}c_{H}-\gamma_{c}\sum_{H=M}^{T}\lVert w_{H}\rVert^{2}_{2}\leq O(1)\ \blacksquare

Appendix D: Disturbance is not Accessible

In this setting Mc=M,Mw=0M_{c}=M,M_{w}=0. The receding horizon online controller for this case optimizes the worst-case cost-to-go instead of the cost-to-go for the realized disturbance:

infu~t:t+M1supw~t:t+M1j=tt+M1cj(x~j,u~j)s.t.x~j+1=f(x~j,u~j,w~j),\displaystyle\underset{\tilde{u}_{t:t+M-1}}{\inf}\underset{\tilde{w}_{t:t+M-1}}{\sup}\sum_{j=t}^{t+M-1}c_{j}(\tilde{x}_{j},\tilde{u}_{j})\ \text{s.t.}\ \tilde{x}_{j+1}=f(\tilde{x}_{j},\tilde{u}_{j},\tilde{w}_{j}),
x~t=xt,u~j𝒰,w~j𝒲.\displaystyle\ \tilde{x}_{t}=x_{t},\ \tilde{u}_{j}\in\mathcal{U},\ \tilde{w}_{j}\in\mathcal{W}. (8)

Given that the constraint set 𝒲\mathcal{W} is compact, the solution to the inner optimization exists and is continuous, by an argument similar to Eq. (3) (see Proof of Lemma 3). Applying the same argument to the outer optimization, we find that its solution also exists and is continuous.

While RHC algorithms are already computationally demanding because of the underlying optimization, min-max optimization further exacerbates the difficulty because of its bi-level nature. Min-max optimization of general non-convex, non-concave objective functions is known to be hard [14]. It was shown in [14] that an approximate local min-max point with a sufficiently large approximation factor is guaranteed to exist, but computing one is PPAD-complete. When the objective function L(x,y)L(x,y) is convex-concave, i.e., LL is convex in xx for every yy and concave in yy for every xx, the constrained problem minxmaxyL(x,y)\min_{x}\max_{y}L(x,y) is guaranteed to have a solution under compactness of the constraint set [42], and computing a solution is amenable to convex programming. In fact, if LL is LL-smooth, the problem can be solved via first-order methods, achieving an approximation error of poly(L,1/T)\text{poly}(L,1/T) in TT iterations; see e.g. [39]. When the function is strongly convex and strongly concave, the rate becomes geometric [16].
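
As a concrete illustration of the convex-concave case, the following minimal sketch runs gradient descent-ascent on the toy objective L(x,y)=x2+xyy2L(x,y)=x^{2}+xy-y^{2}, which is strongly convex in xx and strongly concave in yy with its unique saddle point at the origin. The objective and step size are illustrative assumptions, not part of the paper's setup.

```python
# Gradient descent-ascent (GDA) on the toy convex-concave objective
#   L(x, y) = x^2 + x*y - y^2,
# strongly convex in x and strongly concave in y, with the unique
# saddle point at (0, 0). This only illustrates the first-order
# methods cited in the text; it is not the paper's controller.

def gda(x0, y0, eta=0.1, steps=500):
    x, y = x0, y0
    for _ in range(steps):
        gx = 2.0 * x + y          # dL/dx
        gy = x - 2.0 * y          # dL/dy
        # descend in the min variable, ascend in the max variable
        x, y = x - eta * gx, y + eta * gy
    return x, y

x_star, y_star = gda(3.0, -2.0)   # converges geometrically to (0, 0)
```

For this strongly convex-strongly concave instance the iteration matrix has spectral radius below one, so the geometric rate mentioned above is visible numerically.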

We denote the control and disturbance sequences computed by the optimization Eq. (8) as MPC𝒲t\text{MPC}^{t}_{\mathcal{W}} and Wt:t+M1W^{*}_{t:t+M-1}, respectively. The control input utu_{t} is set as the first element of the sequence MPC𝒲t\text{MPC}^{t}_{\mathcal{W}}. We succinctly denote this by κ𝒲,M(xt)\kappa_{\mathcal{W},M}(x_{t}). The complete algorithm for this case is given in Algorithm 3.

1:  Input: MM
2:  for t = 1,…,T do
3:     Observe yty_{t}, {cj(.,.)}tjt+M1\{c_{j}(.,.)\}_{t\leq j\leq t+M-1}
4:     Solve the optimization Eq. (8) to obtain u~t:t+M1\tilde{u}_{t:t+M-1}
5:     Apply u~t\tilde{u}_{t} to the system
6:  end for
Algorithm 3 Online Learning Robust Control (Known System and no Preview of Disturbance)
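
The loop in Algorithm 3 can be sketched numerically. The following toy instance uses a hypothetical scalar system xt+1=0.5xt+ut+wtx_{t+1}=0.5x_{t}+u_{t}+w_{t} with stage cost x2+u2x^{2}+u^{2}, and solves the min-max problem in Eq. (8) by brute-force enumeration over finite grids; the dynamics, grids, horizon, and all constants are illustrative assumptions, not from the paper.

```python
import itertools

# Toy numerical instance of Algorithm 3 (brute-force min-max RHC).
# Hypothetical scalar system  x_{t+1} = 0.5*x_t + u_t + w_t  with stage
# cost c(x, u) = x^2 + u^2; the control grid, disturbance set W, and
# horizon M are illustrative assumptions.

A, M = 0.5, 3
U_GRID = [u / 5.0 for u in range(-5, 6)]   # admissible controls
W_SET = [-0.1, 0.1]                        # disturbance set W

def rollout_cost(x, us, ws):
    """Total stage cost of a horizon-M rollout from state x."""
    cost = 0.0
    for u, w in zip(us, ws):
        cost += x * x + u * u
        x = A * x + u + w
    return cost

def minmax_control(x):
    """Solve Eq. (8) by enumeration; return the first control input."""
    best_u, best_val = None, float("inf")
    for us in itertools.product(U_GRID, repeat=M):
        worst = max(rollout_cost(x, us, ws)
                    for ws in itertools.product(W_SET, repeat=M))
        if worst < best_val:
            best_u, best_val = us[0], worst
    return best_u

# Closed loop: the min-max controller keeps the state near the origin
# even though the realized disturbances are adversarial within W.
x = 2.0
for t in range(10):
    x = A * x + minmax_control(x) + W_SET[t % 2]
```

Enumeration is used only because the example is tiny; as discussed above, practical min-max RHC requires structure (e.g. convex-concave objectives) to be tractable.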

Let

V𝒲,Mt(xt)=JMt(xt,u~t:t+M1,w~t:t+M1),\displaystyle V^{t}_{\mathcal{W},M}(x_{t})=J^{t}_{M}(x_{t},\tilde{u}_{t:t+M-1},\tilde{w}_{t:t+M-1}),
JMt(xt,u~t:t+M1,w~t:t+M1)=j=tt+M1cj(x~j,u~j)s.t.x~j+1=f(x~j,u~j,w~j),x~t=xt.\displaystyle J^{t}_{M}(x_{t},\tilde{u}_{t:t+M-1},\tilde{w}_{t:t+M-1})=\sum_{j=t}^{t+M-1}c_{j}(\tilde{x}_{j},\tilde{u}_{j})\ \text{s.t.}\ \tilde{x}_{j+1}=f(\tilde{x}_{j},\tilde{u}_{j},\tilde{w}_{j}),\tilde{x}_{t}=x_{t}.

By the existence of the solution MPC𝒲t\text{MPC}^{t}_{\mathcal{W}} this function is well defined. In the next lemma, we characterize the incremental change in the value of this function as the closed-loop system evolves from tt to t+1t+1. As in the previous case, we establish the result for the class of systems that satisfy V𝒲,Mt(xt)α¯𝒲σ(xt)+γ¯𝒲maxw𝒲w22V^{t}_{\mathcal{W},M}(x_{t})\leq\overline{\alpha}_{\mathcal{W}}\sigma(x_{t})+\overline{\gamma}_{\mathcal{W}}\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2}. In contrast to the previous case, the bound on this function is specified in terms of the maximum disturbance. Given the min-max nature of the optimization, where the maximization is carried out over all disturbance realizations, this bound follows naturally from the bound assumed on VMt(.)V^{t}_{M}(.) in the previous case. It is easy to see that γ¯\overline{\gamma} and γ¯𝒲\overline{\gamma}_{\mathcal{W}} are related by γ¯𝒲=γ¯M\overline{\gamma}_{\mathcal{W}}=\overline{\gamma}M. We once again note that this class of systems trivially includes strongly convex quadratic cost functions for linear dynamical systems.

Lemma 6

Suppose te=t+M1t^{e}=t+M-1. Then there exist constants α¯𝒲,V,Γ𝒲,Vγ>0\underline{\alpha}_{\mathcal{W},V},\Gamma^{\gamma}_{\mathcal{W},V}>0 and Γ𝒲,V\Gamma_{\mathcal{W},V} such that for each M2M\geq 2 and xtnx_{t}\in\mathbb{R}^{n}, for the closed-loop system xt+1=f(xt,κ𝒲,MT(xt),wt)x_{t+1}=f(x_{t},\kappa^{T}_{\mathcal{W},M}(x_{t}),w_{t}), V𝒲,Mt(xt)α¯𝒲,Vσ(xt)V^{t}_{\mathcal{W},M}(x_{t})\geq\underline{\alpha}_{\mathcal{W},V}\sigma(x_{t}) and

V𝒲,Mt+1(xt+1)V𝒲,Mt(xt)Γ𝒲,Vσ(xt)+Γ𝒲,Vγmaxw𝒲w22,V^{t+1}_{\mathcal{W},M}(x_{t+1})-V^{t}_{\mathcal{W},M}(x_{t})\leq\Gamma_{\mathcal{W},V}\sigma(x_{t})+\Gamma^{\gamma}_{\mathcal{W},V}\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2},

where α¯𝒲,V=α¯\underline{\alpha}_{\mathcal{W},V}=\underline{\alpha}, Γ𝒲,V=(α¯𝒲2(M1)α¯α¯),Γ𝒲,Vγ=γ¯𝒲(α¯𝒲(M1)α¯+1)\Gamma_{\mathcal{W},V}=\left(\frac{\overline{\alpha}^{2}_{\mathcal{W}}}{(M-1)\underline{\alpha}}-\underline{\alpha}\right),\ \Gamma^{\gamma}_{\mathcal{W},V}=\overline{\gamma}_{\mathcal{W}}\left(\frac{\overline{\alpha}_{\mathcal{W}}}{(M-1)\underline{\alpha}}+1\right).

Proof: We note that

V𝒲,Mt(xt)ct(xt,κ𝒲,M(xt))α¯σ(xt).V^{t}_{\mathcal{W},M}(x_{t})\geq c_{t}(x_{t},\kappa_{\mathcal{W},M}(x_{t}))\geq\underline{\alpha}\sigma(x_{t}).

The last inequality follows directly from Assumption 2. Hence, α¯𝒲,V=α¯\underline{\alpha}_{\mathcal{W},V}=\underline{\alpha}. We introduce some notation to simplify the presentation of the analysis. If MPC𝒲t={u~t,u~t+1,,u~te}\text{MPC}^{t}_{\mathcal{W}}=\{\tilde{u}^{*}_{t},\tilde{u}^{*}_{t+1},...,\tilde{u}^{*}_{t^{e}}\}, then let MPC𝒲t[i:j]={u~t+i1,,u~t+j1},ij\text{MPC}^{t}_{\mathcal{W}}[i:j]=\{\tilde{u}^{*}_{t+i-1},...,\tilde{u}^{*}_{t+j-1}\},\ i\leq j. We denote MPC𝒲t\text{MPC}^{t}_{\mathcal{W}} by M𝒲t\text{M}^{t}_{\mathcal{W}}, Wt:teW_{t:t^{e}} by WtW_{t}, and Wt:teW^{*}_{t:t^{e}} by WtW^{*}_{t}. Similarly, let Wt[i:j]=Wt+i1:t+j1,Wt[i:j]=Wt+i1:t+j1W_{t}[i:j]=W_{t+i-1:t+j-1},\ W^{*}_{t}[i:j]=W^{*}_{t+i-1:t+j-1}.

Let ϕt(k,xt,M𝒲t,Wt)\phi^{t}(k,x_{t},\text{M}^{t}_{\mathcal{W}},W_{t}) denote the state the system evolves to at time t+kt+k starting at xtx_{t}, under the control sequence M𝒲t[1:k]\text{M}^{t}_{\mathcal{W}}[1:k] and disturbance WtW_{t}. From the definition of V𝒲,Mt(xt)V^{t}_{\mathcal{W},M}(x_{t}) it follows that

V𝒲,Mt+1(xt+1)V𝒲,Mt(xt)=JMt+1(xt+1,M𝒲t+1,Wt+1)JMt(xt,M𝒲t,Wt)\displaystyle V^{t+1}_{\mathcal{W},M}(x_{t+1})-V^{t}_{\mathcal{W},M}(x_{t})=J^{t+1}_{M}(x_{t+1},\text{M}^{t+1}_{\mathcal{W}},W^{*}_{t+1})-J^{t}_{M}(x_{t},\text{M}^{t}_{\mathcal{W}},W^{*}_{t})
=k=0M1ct+1+k(ϕt+1(k,xt+1,M𝒲t+1,Wt+1),M𝒲t+1[k+1])\displaystyle=\sum_{k=0}^{M-1}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\text{M}^{t+1}_{\mathcal{W}},W^{*}_{t+1}),\text{M}^{t+1}_{\mathcal{W}}[k+1])
k=0M1ct+k(ϕt(k,xt,M𝒲t,Wt),M𝒲t[k+1]).\displaystyle-\sum_{k=0}^{M-1}c_{t+k}(\phi^{t}(k,x_{t},\text{M}^{t}_{\mathcal{W}},W^{*}_{t}),\text{M}^{t}_{\mathcal{W}}[k+1]).

Define, for a given j{0,,M1}j\in\{0,...,M-1\},

M~𝒲t+1=[M𝒲t[2:j+1],U~jMj],U~jMj={u~1,,u~Mj},\displaystyle\tilde{\text{M}}^{t+1}_{\mathcal{W}}=[\text{M}^{t}_{\mathcal{W}}[2:j+1],\ \tilde{U}^{M-j}_{j}],\ \tilde{U}^{M-j}_{j}=\{\tilde{u}_{1},...,\tilde{u}_{M-j}\},
W~t+1=[Wt+1[1:j],W~jMj],W~jMj={w~1,,w~Mj}.\displaystyle\tilde{W}_{t+1}=[W^{*}_{t+1}[1:j],\ \tilde{W}^{M-j}_{j}],\ \tilde{W}^{M-j}_{j}=\{\tilde{w}_{1},...,\tilde{w}_{M-j}\}.

Let

{U~j,Mj,W~j,Mj}=arginfU~jMjargsupW~jMjk=0j1ct+1+k(ϕt+1(k,xt+1,M~𝒲t+1,W~),M~𝒲t+1[k+1])\displaystyle\{\tilde{U}^{*,M-j}_{j},\tilde{W}^{*,M-j}_{j}\}=\underset{\tilde{U}^{M-j}_{j}}{\text{arginf}}\ \underset{\tilde{W}^{M-j}_{j}}{\text{argsup}}\sum_{k=0}^{j-1}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\tilde{\text{M}}^{t+1}_{\mathcal{W}},\tilde{W}),\tilde{\text{M}}^{t+1}_{\mathcal{W}}[k+1])
+JMjt+1+j(ϕt+1(j,xt+1,M~𝒲t+1,W~),U~jMj,W~jMj).\displaystyle+J^{t+1+j}_{M-j}(\phi^{t+1}(j,x_{t+1},\tilde{\text{M}}^{t+1}_{\mathcal{W}},\tilde{W}),\tilde{U}^{M-j}_{j},\tilde{W}^{M-j}_{j}).

Let W~t+1,j=[Wt+1[1:j],W~j,Mj]\tilde{W}^{*,j}_{t+1}=[W^{*}_{t+1}[1:j],\ \tilde{W}^{*,M-j}_{j}]. Note that [wt,W~t+1,j[1:M1]][w_{t},\tilde{W}^{*,j}_{t+1}[1:M-1]] need not be the worst-case disturbance realization for the system starting at xtx_{t} under the control sequence M𝒲t\text{M}^{t}_{\mathcal{W}}. Hence,

V𝒲,Mt+1(xt+1)V𝒲,Mt(xt)k=0M1ct+1+k(ϕt+1(k,xt+1,M𝒲t+1,Wt+1),M𝒲t+1[k+1])\displaystyle V^{t+1}_{\mathcal{W},M}(x_{t+1})-V^{t}_{\mathcal{W},M}(x_{t})\leq\sum_{k=0}^{M-1}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\text{M}^{t+1}_{\mathcal{W}},W^{*}_{t+1}),\text{M}^{t+1}_{\mathcal{W}}[k+1])
ct(xt,M𝒲t[1])k=0M2ct+1+k(ϕt+1(k,xt+1,M𝒲t[2:k+1],W~t+1,j[1:M1]),M𝒲t[k+2]).\displaystyle-c_{t}(x_{t},\text{M}^{t}_{\mathcal{W}}[1])-\sum_{k=0}^{M-2}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\text{M}^{t}_{\mathcal{W}}[2:k+1],\tilde{W}^{*,j}_{t+1}[1:M-1]),\text{M}^{t}_{\mathcal{W}}[k+2]).

Let M~𝒲,j=[M~𝒲t+1[1:j],U~j,Mj]\tilde{\text{M}}^{*,j}_{\mathcal{W}}=[\tilde{\text{M}}^{t+1}_{\mathcal{W}}[1:j],\tilde{U}^{*,M-j}_{j}]. From the definitions of M𝒲t+1,Wt+1,M~𝒲,j,W~t+1,j\text{M}^{t+1}_{\mathcal{W}},\ W^{*}_{t+1},\ \tilde{\text{M}}^{*,j}_{\mathcal{W}},\ \tilde{W}^{*,j}_{t+1} it follows that

k=0M1ct+1+k(ϕt+1(k,xt+1,M𝒲t+1,Wt+1),M𝒲t+1[k+1])\displaystyle\sum_{k=0}^{M-1}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\text{M}^{t+1}_{\mathcal{W}},W^{*}_{t+1}),\text{M}^{t+1}_{\mathcal{W}}[k+1])
k=0M1ct+1+k(ϕt+1(k,xt+1,M~𝒲,j,Wt+1),M~𝒲,j[k+1])\displaystyle\leq\sum_{k=0}^{M-1}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\tilde{\text{M}}^{*,j}_{\mathcal{W}},W^{*}_{t+1}),\tilde{\text{M}}^{*,j}_{\mathcal{W}}[k+1])
k=0M1ct+1+k(ϕt+1(k,xt+1,M~𝒲,j,W~t+1,j),M~𝒲,j[k+1]).\displaystyle\leq\sum_{k=0}^{M-1}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\tilde{\text{M}}^{*,j}_{\mathcal{W}},\tilde{W}^{*,j}_{t+1}),\tilde{\text{M}}^{*,j}_{\mathcal{W}}[k+1]).

Using the above inequality we further get

V𝒲,Mt+1(xt+1)V𝒲,Mt(xt)k=0M1ct+1+k(ϕt+1(k,xt+1,M~𝒲,j,W~t+1,j),M~𝒲,j[k+1])\displaystyle V^{t+1}_{\mathcal{W},M}(x_{t+1})-V^{t}_{\mathcal{W},M}(x_{t})\leq\sum_{k=0}^{M-1}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\tilde{\text{M}}^{*,j}_{\mathcal{W}},\tilde{W}^{*,j}_{t+1}),\tilde{\text{M}}^{*,j}_{\mathcal{W}}[k+1])
ct(xt,M𝒲t[1])k=0M2ct+1+k(ϕt+1(k,xt+1,M𝒲t[2:k+1],W~t+1,j[1:M1]),M𝒲t[k+2]).\displaystyle-c_{t}(x_{t},\text{M}^{t}_{\mathcal{W}}[1])-\sum_{k=0}^{M-2}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\text{M}^{t}_{\mathcal{W}}[2:k+1],\tilde{W}^{*,j}_{t+1}[1:M-1]),\text{M}^{t}_{\mathcal{W}}[k+2]).

Using the definitions of M~𝒲,j\tilde{\text{M}}^{*,j}_{\mathcal{W}} and W~t+1,j\tilde{W}^{*,j}_{t+1}, we get

V𝒲,Mt+1(xt+1)V𝒲,Mt(xt)k=0j1ct+1+k(ϕt+1(k,xt+1,M~𝒲t+1,W~t+1,j),M~𝒲t+1[k+1])\displaystyle V^{t+1}_{\mathcal{W},M}(x_{t+1})-V^{t}_{\mathcal{W},M}(x_{t})\leq\sum_{k=0}^{j-1}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\tilde{\text{M}}^{t+1}_{\mathcal{W}},\tilde{W}^{*,j}_{t+1}),\tilde{\text{M}}^{t+1}_{\mathcal{W}}[k+1])
+JMjt+1+j(ϕt+1(j,xt+1,M~𝒲t+1,W~t+1,j),U~j,Mj,W~j,Mj)ct(xt,M𝒲t[1])\displaystyle+J^{t+1+j}_{M-j}(\phi^{t+1}(j,x_{t+1},\tilde{\text{M}}^{t+1}_{\mathcal{W}},\tilde{W}^{*,j}_{t+1}),\tilde{U}^{*,M-j}_{j},\tilde{W}^{*,M-j}_{j})-c_{t}(x_{t},\text{M}^{t}_{\mathcal{W}}[1])
k=0M2ct+1+k(ϕt+1(k,xt+1,M𝒲t[2:k+1],W~t+1,j[1:M1]),M𝒲t[k+2]).\displaystyle-\sum_{k=0}^{M-2}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\text{M}^{t}_{\mathcal{W}}[2:k+1],\tilde{W}^{*,j}_{t+1}[1:M-1]),\text{M}^{t}_{\mathcal{W}}[k+2]).

Then for any j{0,,M1}j\in\{0,...,M-1\}, we get

V𝒲,Mt+1(xt+1)V𝒲,Mt(xt)JMjt+1+j(ϕt+1(j,xt+1,M~𝒲t+1,W~t+1,j),U~j,Mj,W~j,Mj)\displaystyle V^{t+1}_{\mathcal{W},M}(x_{t+1})-V^{t}_{\mathcal{W},M}(x_{t})\leq J^{t+1+j}_{M-j}(\phi^{t+1}(j,x_{t+1},\tilde{\text{M}}^{t+1}_{\mathcal{W}},\tilde{W}^{*,j}_{t+1}),\tilde{U}^{*,M-j}_{j},\tilde{W}^{*,M-j}_{j})
ct(xt,M𝒲t[1]).\displaystyle-c_{t}(x_{t},\text{M}^{t}_{\mathcal{W}}[1]).

By definition

JMjt+1+j(ϕt+1(j,xt+1,M~𝒲t+1,W~t+1,j),U~j,Mj,W~j,Mj)=V𝒲,Mjt+1+j(ϕt+1(j,xt+1,M~𝒲t+1,W~t+1,j)).J^{t+1+j}_{M-j}(\phi^{t+1}(j,x_{t+1},\tilde{\text{M}}^{t+1}_{\mathcal{W}},\tilde{W}^{*,j}_{t+1}),\tilde{U}^{*,M-j}_{j},\tilde{W}^{*,M-j}_{j})=V^{t+1+j}_{\mathcal{W},M-j}(\phi^{t+1}(j,x_{t+1},\tilde{\text{M}}^{t+1}_{\mathcal{W}},\tilde{W}^{*,j}_{t+1})).

Hence for any j{0,,M1}j\in\{0,...,M-1\},

V𝒲,Mt+1(xt+1)V𝒲,Mt(xt)V𝒲,Mjt+1+j(ϕt+1(j,xt+1,M~𝒲t+1,W~t+1,j))ct(xt,M𝒲t[1]).V^{t+1}_{\mathcal{W},M}(x_{t+1})-V^{t}_{\mathcal{W},M}(x_{t})\leq V^{t+1+j}_{\mathcal{W},M-j}(\phi^{t+1}(j,x_{t+1},\tilde{\text{M}}^{t+1}_{\mathcal{W}},\tilde{W}^{*,j}_{t+1}))-c_{t}(x_{t},\text{M}^{t}_{\mathcal{W}}[1]).

By definition

V𝒲,Mjt+1+j(.)V𝒲,Mt+1+j(.)α¯𝒲σ(.)+γ¯𝒲maxw𝒲w22.V^{t+1+j}_{\mathcal{W},M-j}(.)\leq V^{t+1+j}_{\mathcal{W},M}(.)\leq\overline{\alpha}_{\mathcal{W}}\sigma(.)+\overline{\gamma}_{\mathcal{W}}\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2}.

Using the fact that M~𝒲,j=[M𝒲t[2:j+1],U~j,Mj]\tilde{\text{M}}^{*,j}_{\mathcal{W}}=[\text{M}^{t}_{\mathcal{W}}[2:j+1],\tilde{U}^{*,M-j}_{j}] and W~t+1,j=[Wt+1[1:j],W~j,Mj]\tilde{W}^{*,j}_{t+1}=[W^{*}_{t+1}[1:j],\ \tilde{W}^{*,M-j}_{j}], it follows that

ϕt+1(j,xt+1,M~𝒲t+1,W~t+1,j)=ϕt+1(j,xt+1,M𝒲t[2:j+1],Wt+1[1:j]).\phi^{t+1}(j,x_{t+1},\tilde{\text{M}}^{t+1}_{\mathcal{W}},\tilde{W}^{*,j}_{t+1})=\phi^{t+1}(j,x_{t+1},\text{M}^{t}_{\mathcal{W}}[2:j+1],W^{*}_{t+1}[1:j]).

Hence,

V𝒲,Mt+1(xt+1)V𝒲,Mt(xt)α¯𝒲σ(ϕt+1(j,xt+1,M𝒲t[2:j+1],Wt+1[1:j]))\displaystyle V^{t+1}_{\mathcal{W},M}(x_{t+1})-V^{t}_{\mathcal{W},M}(x_{t})\leq\overline{\alpha}_{\mathcal{W}}\sigma(\phi^{t+1}(j,x_{t+1},\text{M}^{t}_{\mathcal{W}}[2:j+1],W^{*}_{t+1}[1:j]))
+γ¯𝒲maxw𝒲w22ct(xt,M𝒲t[1]).\displaystyle+\overline{\gamma}_{\mathcal{W}}\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2}-c_{t}(x_{t},\text{M}^{t}_{\mathcal{W}}[1]).

By Assumption 2 (similar to the corresponding step in Lemma 1) we have that

k=0M2α¯σ(ϕt+1(k,xt+1,M𝒲t[2:k+1],Wt+1[1:k]))ct(xt,M𝒲t[1])\displaystyle\sum_{k=0}^{M-2}\underline{\alpha}\sigma(\phi^{t+1}(k,x_{t+1},\text{M}^{t}_{\mathcal{W}}[2:k+1],W^{*}_{t+1}[1:k]))\leq c_{t}(x_{t},\text{M}^{t}_{\mathcal{W}}[1])
+k=0M2ct+1+k(ϕt+1(k,xt+1,M𝒲t[2:M],Wt+1),M𝒲t[k+2]).\displaystyle+\sum_{k=0}^{M-2}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\text{M}^{t}_{\mathcal{W}}[2:M],W^{*}_{t+1}),\text{M}^{t}_{\mathcal{W}}[k+2]).

By definition WtW^{*}_{t} is the worst-case disturbance realization for the control sequence M𝒲t\text{M}^{t}_{\mathcal{W}} and the system starting at xtx_{t}. Hence,

ct(xt,M𝒲t[1])+k=0M2ct+1+k(ϕt+1(k,xt+1,M𝒲t[2:M],Wt+1),M𝒲t[k+2])\displaystyle c_{t}(x_{t},\text{M}^{t}_{\mathcal{W}}[1])+\sum_{k=0}^{M-2}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\text{M}^{t}_{\mathcal{W}}[2:M],W^{*}_{t+1}),\text{M}^{t}_{\mathcal{W}}[k+2])
k=0M1ct+k(ϕt(k,xt,M𝒲t,Wt),M𝒲t[k+1])=V𝒲,Mtα¯𝒲σ(xt)+γ¯𝒲maxw𝒲w22,\displaystyle\leq\sum_{k=0}^{M-1}c_{t+k}(\phi^{t}(k,x_{t},\text{M}^{t}_{\mathcal{W}},W^{*}_{t}),\text{M}^{t}_{\mathcal{W}}[k+1])=V^{t}_{\mathcal{W},M}\leq\overline{\alpha}_{\mathcal{W}}\sigma(x_{t})+\overline{\gamma}_{\mathcal{W}}\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2},

where the last inequality follows from V𝒲,Mt(.)α¯𝒲σ(.)+γ¯𝒲maxw𝒲w22V^{t}_{\mathcal{W},M}(.)\leq\overline{\alpha}_{\mathcal{W}}\sigma(.)+\overline{\gamma}_{\mathcal{W}}\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2}. Using this inequality we get that

k=0M2α¯σ(ϕt+1(k,xt+1,M𝒲t[2:k+1],Wt+1[1:k]))α¯𝒲σ(xt)+γ¯𝒲maxw𝒲w22.\sum_{k=0}^{M-2}\underline{\alpha}\sigma(\phi^{t+1}(k,x_{t+1},\text{M}^{t}_{\mathcal{W}}[2:k+1],W^{*}_{t+1}[1:k]))\leq\overline{\alpha}_{\mathcal{W}}\sigma(x_{t})+\overline{\gamma}_{\mathcal{W}}\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2}.

Hence there exists a j{0,,M2}j^{*}\in\{0,...,M-2\} such that

σ(ϕt+1(j,xt+1,M𝒲t[2:j+1],Wt+1[1:j]))\displaystyle\sigma(\phi^{t+1}(j^{*},x_{t+1},\text{M}^{t}_{\mathcal{W}}[2:j^{*}+1],W^{*}_{t+1}[1:j^{*}]))
1(M1)α¯(α¯𝒲σ(xt)+γ¯𝒲maxw𝒲w22).\displaystyle\leq\frac{1}{(M-1)\underline{\alpha}}\left(\overline{\alpha}_{\mathcal{W}}\sigma(x_{t})+\overline{\gamma}_{\mathcal{W}}\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2}\right).

Hence,

V𝒲,Mt+1(xt+1)V𝒲,Mt(xt)(α¯𝒲2(M1)α¯α¯)σ(xt)\displaystyle V^{t+1}_{\mathcal{W},M}(x_{t+1})-V^{t}_{\mathcal{W},M}(x_{t})\leq\left(\frac{\overline{\alpha}^{2}_{\mathcal{W}}}{(M-1)\underline{\alpha}}-\underline{\alpha}\right)\sigma(x_{t})
+γ¯𝒲(α¯𝒲(M1)α¯+1)maxw𝒲w22\displaystyle+\overline{\gamma}_{\mathcal{W}}\left(\frac{\overline{\alpha}_{\mathcal{W}}}{(M-1)\underline{\alpha}}+1\right)\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2}\ \blacksquare

As in the previous case, two terms bound the incremental change in the value of the function V𝒲,MtV^{t}_{\mathcal{W},M}: the first term is the contribution from the cost at the state xtx_{t}, and the second term is the contribution from the maximum possible energy of the disturbance, noting that the factor γ¯𝒲\overline{\gamma}_{\mathcal{W}} in Γ𝒲,Vγ\Gamma^{\gamma}_{\mathcal{W},V} is γ¯M\leq\overline{\gamma}M. Next, we use the above lemma to characterize the performance of the online controller.

Theorem 3

Suppose the conditions of Lemma 6 hold. Then, for TM+1T\geq M+1, M>α¯𝒲2/α¯2+1M>\overline{\alpha}^{2}_{\mathcal{W}}/\underline{\alpha}^{2}+1, b=γ¯(α¯α¯𝒲+1)b=\overline{\gamma}\left(\frac{\underline{\alpha}}{\overline{\alpha}_{\mathcal{W}}}+1\right), and any aa s.t. 1>a1ϵ~/α¯𝒲,0<ϵ~<α¯α¯𝒲2(M1)α¯1>a\geq 1-\tilde{\epsilon}/\overline{\alpha}_{\mathcal{W}},0<\tilde{\epsilon}<\underline{\alpha}-\frac{\overline{\alpha}_{\mathcal{W}}^{2}}{(M-1)\underline{\alpha}}, the total cost incurred by the closed-loop system xt+1=f(xt,κ𝒲,MT(xt),wt)x_{t+1}=f(x_{t},\kappa^{T}_{\mathcal{W},M}(x_{t}),w_{t}) satisfies

t=1Tct(xt,κ𝒲,M(xt))γTmaxw𝒲w22O(1),γγc,𝒲\sum_{t=1}^{T}c_{t}(x_{t},\kappa_{\mathcal{W},M}(x_{t}))-\gamma T\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2}\leq O(1),\ \gamma\geq\gamma_{c,\mathcal{W}}

where γc,𝒲=γ¯𝒲1a(α¯α¯𝒲+1)\gamma_{c,\mathcal{W}}=\frac{\overline{\gamma}_{\mathcal{W}}}{1-a}\left(\frac{\underline{\alpha}}{\overline{\alpha}_{\mathcal{W}}}+1\right).

Proof: Since M>α¯𝒲2/α¯2+1M>\overline{\alpha}^{2}_{\mathcal{W}}/\underline{\alpha}^{2}+1, there exists ϵ~(0,α¯)\tilde{\epsilon}\in(0,\underline{\alpha}) such that

M>α¯𝒲2/(α¯(α¯ϵ~))+1α¯𝒲/(α¯ϵ~)+1.M>\overline{\alpha}^{2}_{\mathcal{W}}/\left(\underline{\alpha}(\underline{\alpha}-\tilde{\epsilon})\right)+1\geq\overline{\alpha}_{\mathcal{W}}/\left(\underline{\alpha}-\tilde{\epsilon}\right)+1.

For convenience of presentation, in the following we denote V𝒲,Mt(xt)V^{t}_{\mathcal{W},M}(x_{t}) by V𝒲,MtV^{t}_{\mathcal{W},M}. From Lemma 6 it follows that

V𝒲,Mt+1V𝒲,Mt(α¯𝒲2α¯(M1)α¯)σ(xt)+γ¯𝒲(α¯𝒲α¯(M1)+1)maxw𝒲w22.V^{t+1}_{\mathcal{W},M}-V^{t}_{\mathcal{W},M}\leq\left(\frac{\overline{\alpha}_{\mathcal{W}}^{2}}{\underline{\alpha}(M-1)}-\underline{\alpha}\right)\sigma(x_{t})+\overline{\gamma}_{\mathcal{W}}\left(\frac{\overline{\alpha}_{\mathcal{W}}}{\underline{\alpha}(M-1)}+1\right)\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2}.

Hence,

V𝒲,Mt+1V𝒲,Mtϵ~σ(xt)+γ¯𝒲(α¯𝒲α¯(M1)+1)maxw𝒲w22.V^{t+1}_{\mathcal{W},M}-V^{t}_{\mathcal{W},M}\leq-\tilde{\epsilon}\sigma(x_{t})+\overline{\gamma}_{\mathcal{W}}\left(\frac{\overline{\alpha}_{\mathcal{W}}}{\underline{\alpha}(M-1)}+1\right)\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2}.

Then using the fact that V𝒲,Mt(.)α¯𝒲σ(.)+γ¯𝒲maxw𝒲w22V^{t}_{\mathcal{W},M}(.)\leq\overline{\alpha}_{\mathcal{W}}\sigma(.)+\overline{\gamma}_{\mathcal{W}}\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2} we get

V𝒲,Mt+1V𝒲,Mtϵ~α¯𝒲(V𝒲,Mt(xt)γ¯𝒲maxw𝒲w22)+γ¯𝒲(α¯𝒲α¯(M1)+1)maxw𝒲w22.V^{t+1}_{\mathcal{W},M}-V^{t}_{\mathcal{W},M}\leq-\frac{\tilde{\epsilon}}{\overline{\alpha}_{\mathcal{W}}}\left(V^{t}_{\mathcal{W},M}(x_{t})-\overline{\gamma}_{\mathcal{W}}\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2}\right)+\overline{\gamma}_{\mathcal{W}}\left(\frac{\overline{\alpha}_{\mathcal{W}}}{\underline{\alpha}(M-1)}+1\right)\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2}.

Then using the fact that α¯ϵ~>α¯𝒲2/(α¯(M1))\underline{\alpha}-\tilde{\epsilon}>\overline{\alpha}^{2}_{\mathcal{W}}/\left(\underline{\alpha}(M-1)\right) we get

V𝒲,Mt+1V𝒲,Mtϵ~α¯𝒲V𝒲,Mt(xt)+γ¯𝒲(α¯α¯𝒲+1)maxw𝒲w22.V^{t+1}_{\mathcal{W},M}-V^{t}_{\mathcal{W},M}\leq-\frac{\tilde{\epsilon}}{\overline{\alpha}_{\mathcal{W}}}V^{t}_{\mathcal{W},M}(x_{t})+\overline{\gamma}_{\mathcal{W}}\left(\frac{\underline{\alpha}}{\overline{\alpha}_{\mathcal{W}}}+1\right)\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2}.

Pick ϵ\epsilon s.t. 0<ϵϵ~α¯𝒲0<\epsilon\leq\frac{\tilde{\epsilon}}{\overline{\alpha}_{\mathcal{W}}}. Then

V𝒲,Mt+1V𝒲,MtϵV𝒲,Mt(xt)+γ¯𝒲(α¯α¯𝒲+1)maxw𝒲w22.V^{t+1}_{\mathcal{W},M}-V^{t}_{\mathcal{W},M}\leq-\epsilon V^{t}_{\mathcal{W},M}(x_{t})+\overline{\gamma}_{\mathcal{W}}\left(\frac{\underline{\alpha}}{\overline{\alpha}_{\mathcal{W}}}+1\right)\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2}.

Let a=1ϵa=1-\epsilon. Then

V𝒲,Mt+1aV𝒲,Mt(xt)+γ¯𝒲(α¯α¯𝒲+1)maxw𝒲w22.V^{t+1}_{\mathcal{W},M}\leq aV^{t}_{\mathcal{W},M}(x_{t})+\overline{\gamma}_{\mathcal{W}}\left(\frac{\underline{\alpha}}{\overline{\alpha}_{\mathcal{W}}}+1\right)\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2}.

For convenience we denote the cost incurred at time step t+Ht+H just by ct+Hc_{t+H} ignoring its arguments. Then

V𝒲,Mt+HaHV𝒲,Mt(xt)+γ¯𝒲(1aH)1a(α¯α¯𝒲+1)maxw𝒲w22,\displaystyle V^{t+H}_{\mathcal{W},M}\leq a^{H}V^{t}_{\mathcal{W},M}(x_{t})+\frac{\overline{\gamma}_{\mathcal{W}}(1-a^{H})}{1-a}\left(\frac{\underline{\alpha}}{\overline{\alpha}_{\mathcal{W}}}+1\right)\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2},
i.e.,ct+HaHV𝒲,Mt(xt)+γ¯𝒲(1aH)1a(α¯α¯𝒲+1)maxw𝒲w22.\displaystyle\text{i.e.,}\ c_{t+H}\leq a^{H}V^{t}_{\mathcal{W},M}(x_{t})+\frac{\overline{\gamma}_{\mathcal{W}}(1-a^{H})}{1-a}\left(\frac{\underline{\alpha}}{\overline{\alpha}_{\mathcal{W}}}+1\right)\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2}.
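
As a quick numerical sanity check (with arbitrary illustrative constants) that the one-step contraction above unrolls to this geometric bound:

```python
# Verify numerically that iterating the one-step bound
#   v <- a*v + c   (worst case: equality)
# never exceeds the unrolled geometric bound
#   v_H <= a^H * v_0 + c * (1 - a^H) / (1 - a).
# The constants a, c, v0 are arbitrary illustrative choices.

a, c, v0 = 0.9, 0.3, 5.0
v = v0
for H in range(1, 51):
    v = a * v + c
    bound = a ** H * v0 + c * (1 - a ** H) / (1 - a)
    assert v <= bound + 1e-9   # holds with equality in this worst case
```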

Then using the fact that V𝒲,Mt(xt)α¯𝒲σ(xt)+γ¯𝒲maxw𝒲w22V^{t}_{\mathcal{W},M}(x_{t})\leq\overline{\alpha}_{\mathcal{W}}\sigma(x_{t})+\overline{\gamma}_{\mathcal{W}}\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2} we get

ct+HaHα¯𝒲σ(xt)+γ¯𝒲(1aH+1)1a(α¯α¯𝒲+1)maxw𝒲w22.c_{t+H}\leq a^{H}\overline{\alpha}_{\mathcal{W}}\sigma(x_{t})+\frac{\overline{\gamma}_{\mathcal{W}}(1-a^{H+1})}{1-a}\left(\frac{\underline{\alpha}}{\overline{\alpha}_{\mathcal{W}}}+1\right)\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2}.

Thus, summing the above bound (with t=1t=1) over all H{0,,T1}H\in\{0,...,T-1\} we get

H=0T1c1+Hγ¯𝒲1a(α¯α¯𝒲+1)Tmaxw𝒲w22α¯𝒲1aσ(x1)=O(1)\sum_{H=0}^{T-1}c_{1+H}-\frac{\overline{\gamma}_{\mathcal{W}}}{1-a}\left(\frac{\underline{\alpha}}{\overline{\alpha}_{\mathcal{W}}}+1\right)T\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2}\leq\frac{\overline{\alpha}_{\mathcal{W}}}{1-a}\sigma(x_{1})=O(1)\ \blacksquare

We note two differences in the result: (i) the attenuation depends on γ¯𝒲\overline{\gamma}_{\mathcal{W}} and (ii) the attenuation guarantee is given with respect to the maximum possible energy of the disturbances over the duration TT, i.e., Tmaxw𝒲w22T\max_{w\in\mathcal{W}}\lVert w\rVert^{2}_{2}. The controller's performance can only be characterized in terms of the maximum possible energy of the disturbance since the controller always regulates against the worst case. Given that γ¯𝒲=γ¯M\overline{\gamma}_{\mathcal{W}}=\overline{\gamma}M, we can choose MM and aa such that 2γ~(1+α¯α¯)γ¯γc,𝒲2γ¯2\tilde{\gamma}\left(1+\frac{\underline{\alpha}}{\overline{\alpha}}\right)\overline{\gamma}\gtrapprox\gamma_{c,\mathcal{W}}\geq 2\overline{\gamma}. Thus the online RHC approach achieves an attenuation of 𝒪(γ¯)\mathcal{O}(\overline{\gamma}) with respect to the maximum possible energy of the disturbance.

Appendix E: Proof of Lemma 3

Here we prove that under Assumption 3 the conditions required for Proposition 2.4 of [40] hold. Since f,cjf,c_{j} are continuous and 𝒲\mathcal{W} is bounded, the objective function of the optimization in Eq. (3) is uniformly level bounded (see [41] for the definition), and by definition continuous functions are proper. This satisfies the assumption in Theorem 1.17 of [41] that the objective function is proper, lower semi-continuous, and uniformly level bounded. Moreover, by continuity of the functions there exists a finite solution MPCMt\text{MPC}^{t}_{M} for all θΘ\theta\in\Theta, xtnx_{t}\in\mathbb{R}^{n} and all Wt:teW_{t:t^{e}} s.t. wt𝒲w_{t}\in\mathcal{W}. Hence the domain of the solution MPCMt\text{MPC}^{t}_{M} is {θ,x,Wt:te:θΘ,xn,wt𝒲}\{\theta,x,W_{t:t^{e}}:\theta\in\Theta,x\in\mathbb{R}^{n},w_{t}\in\mathcal{W}\}. Then, by continuity of f,cjf,c_{j} and point 3 of Theorem 1.17 of [41], the solution MPCMt\text{MPC}^{t}_{M} is continuous in xt,θx_{t},\theta and Wt:teW_{t:t^{e}}. Hence Assumption A1 of Proposition 2.4 of [40] is satisfied. Since ctc_{t} is strongly convex in its second argument, Assumption A2 of Proposition 2.4 of [40] trivially holds. Finally, by (vii) of Assumption 3, Assumption A3 of Proposition 2.4 of [40] is satisfied. Thus, all the assumptions of Proposition 2.4 of [40] are satisfied; applying the proposition yields the result.

Appendix F: Proof of Lemma 4

We note that

VMt(xt,Wt:te;θ^)ct(xt,κM(xt;θ^))α¯σ(xt).V^{t}_{M}(x_{t},W_{t:t^{e}};\hat{\theta})\geq c_{t}(x_{t},\kappa_{M}(x_{t};\hat{\theta}))\geq\underline{\alpha}\sigma(x_{t}).

The last inequality follows directly from Assumption 2. Hence, α¯V=α¯>0\underline{\alpha}_{V}=\underline{\alpha}>0. We introduce some notation for simplifying the presentation of the analysis. If MPCt(Wt:te;.)={u~t,u~t+1,,u~te}\text{MPC}^{t}(W_{t:t^{e}};.)=\{\tilde{u}^{*}_{t},\tilde{u}^{*}_{t+1},...,\tilde{u}^{*}_{t^{e}}\}, then let MPCt(Wt:te;.)[i:j]={u~t+i1,,u~t+j1},ij\text{MPC}^{t}(W_{t:t^{e}};.)[i:j]=\{\tilde{u}^{*}_{t+i-1},...,\tilde{u}^{*}_{t+j-1}\},\ i\leq j. We denote MPCt(Wt:te;θ)\text{MPC}^{t}(W_{t:t^{e}};\theta) by Mpt\text{M}^{t}_{p}, and Wt:teW_{t:t^{e}} by WtW_{t}.

Let ϕt(k,xt,Mpt,Wt)\phi^{t}(k,x_{t},\text{M}^{t}_{p},W_{t}) denote the state the system evolves to at time t+kt+k starting at xtx_{t}, under the control sequence Mpt[1:k]\text{M}^{t}_{p}[1:k] and disturbance WtW_{t}. From the definition of VMt(xt,Wt;θ^)V^{t}_{M}(x_{t},W_{t};\hat{\theta}) and Lemma 3 it follows that

VMt+1(xt+1,Wt+1;θ^)VMt(xt,Wt;θ^)VMt+1(xt+1,Wt+1;θ)VMt(xt,Wt;θ)+2αVθ^θ2.V^{t+1}_{M}(x_{t+1},W_{t+1};\hat{\theta})-V^{t}_{M}(x_{t},W_{t};\hat{\theta})\leq V^{t+1}_{M}(x_{t+1},W_{t+1};\theta)-V^{t}_{M}(x_{t},W_{t};\theta)+2\alpha_{V}\lVert\hat{\theta}-\theta\rVert_{2}.

We now analyze the term VMt+1(xt+1,Wt+1;θ)VMt(xt,Wt;θ)V^{t+1}_{M}(x_{t+1},W_{t+1};\theta)-V^{t}_{M}(x_{t},W_{t};\theta). By definition

\[
\begin{aligned}
V^{t+1}_{M}(x_{t+1},W_{t+1};\theta)-V^{t}_{M}(x_{t},W_{t};\theta)&=J^{t+1}_{M}(x_{t+1},\text{M}^{t+1}_{p},W_{t+1})-J^{t}_{M}(x_{t},\text{M}^{t}_{p},W_{t})\\
&=\sum_{k=0}^{M-1}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\text{M}^{t+1}_{p},W_{t+1}),\text{M}^{t+1}_{p}[k+1])\\
&\quad-\sum_{k=0}^{M-1}c_{t+k}(\phi^{t}(k,x_{t},\text{M}^{t}_{p},W_{t}),\text{M}^{t}_{p}[k+1]).
\end{aligned}
\]

The second step just follows from the definition of $J^{t}_{M}(\cdot,\cdot,\cdot)$. Hence,

\[
\begin{aligned}
&V^{t+1}_{M}(x_{t+1},W_{t+1};\theta)-V^{t}_{M}(x_{t},W_{t};\theta)\\
&=\sum_{k=0}^{j}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\text{M}^{t+1}_{p},W_{t+1}),\text{M}^{t+1}_{p}[k+1])\\
&\quad+\sum_{k=j+1}^{M-1}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\text{M}^{t+1}_{p},W_{t+1}),\text{M}^{t+1}_{p}[k+1])\\
&\quad-\sum_{k=0}^{M-1}c_{t+k}(\phi^{t}(k,x_{t},\text{M}^{t}_{p},W_{t}),\text{M}^{t}_{p}[k+1])\\
&\leq\sum_{k=0}^{j-1}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\tilde{\text{M}}^{t+1}_{p},W_{t+1}),\tilde{\text{M}}^{t+1}_{p}[k+1])\\
&\quad+\sum_{k=j}^{M-1}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\tilde{\text{M}}^{t+1}_{p},W_{t+1}),\tilde{\text{M}}^{t+1}_{p}[k+1])\\
&\quad-\sum_{k=0}^{M-1}c_{t+k}(\phi^{t}(k,x_{t},\text{M}^{t}_{p},W_{t}),\text{M}^{t}_{p}[k+1]),
\end{aligned}
\]

where $\tilde{\text{M}}^{t+1}_{p}=[\text{M}^{t}_{p}[2:j+1],\text{M}^{*j,t+1}_{p}[j+1:M]]$, and $\text{M}^{*j,t+1}_{p}[j+1:M]$ is the optimal control sequence given that the control for the first $j$ time steps is $\text{M}^{t}_{p}[2:j+1]$. The inequality follows from the fact that $\tilde{\text{M}}^{t+1}_{p}$ is suboptimal compared to $\text{M}^{t+1}_{p}$. Now, by definition,

\[
\begin{aligned}
\sum_{k=0}^{M-1}c_{t+k}(\phi^{t}(k,x_{t},\text{M}^{t}_{p},W_{t}),\text{M}^{t}_{p}[k+1])&=c_{t}(x_{t},\text{M}^{t}_{p}[1])\\
&\quad+\sum_{k=0}^{M-2}c_{t+1+k}(\phi^{t+1}(k,f_{t}(x_{t},\text{M}^{t}_{p}[1],w_{t};\theta),\text{M}^{t}_{p}[2:M],W_{t+1}),\text{M}^{t}_{p}[k+2]).
\end{aligned}
\]

Let $\tilde{x}_{t+1}=f_{t}(x_{t},\text{M}^{t}_{p}[1],w_{t};\theta)$. Then $\tilde{x}_{t+1}=f_{t}(x_{t},\kappa_{M}(x_{t};\theta),w_{t};\theta)$. Recall that $x_{t+1}=f_{t}(x_{t},\kappa_{M}(x_{t};\hat{\theta}),w_{t};\theta)$. Hence, by Lemma 3,

\[
\lVert x_{t+1}-\tilde{x}_{t+1}\rVert_{2}=\lVert f_{t}(x_{t},\kappa_{M}(x_{t};\hat{\theta}),w_{t};\theta)-f_{t}(x_{t},\text{M}^{t}_{p}[1],w_{t};\theta)\rVert_{2}\leq\alpha_{f}\alpha_{\kappa}S\lVert\hat{\theta}-\theta\rVert_{2}.
\]

Hence,

\[
\lVert\phi^{t+1}(k,\tilde{x}_{t+1},\text{M}^{t+1}_{p},W_{t+1})-\phi^{t+1}(k,x_{t+1},\text{M}^{t+1}_{p},W_{t+1})\rVert_{2}\leq\alpha^{k+1}_{f}\alpha_{\kappa}S^{k+1}\lVert\hat{\theta}-\theta\rVert_{2}\leq\tilde{\alpha}_{f}\alpha_{\kappa}\lVert\hat{\theta}-\theta\rVert_{2},
\]

where $\tilde{\alpha}_{f}=\max_{k\in\{0,1,\ldots,M-2\}}\alpha^{k+1}_{f}S^{k+1}$. Hence,

\[
\begin{aligned}
-\sum_{k=0}^{M-1}c_{t+k}(\phi^{t}(k,x_{t},\text{M}^{t}_{p},W_{t}),\text{M}^{t}_{p}[k+1])&\leq-c_{t}(x_{t},\text{M}^{t}_{p}[1])\\
&\quad-\sum_{k=0}^{M-2}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\text{M}^{t}_{p}[2:M],W_{t+1}),\text{M}^{t}_{p}[k+2])\\
&\quad+\alpha_{c}\tilde{\alpha}_{f}\alpha_{\kappa}(M-1)\lVert\hat{\theta}-\theta\rVert_{2}.
\end{aligned}
\]

Hence,

\[
\begin{aligned}
&V^{t+1}_{M}(x_{t+1},W_{t+1};\theta)-V^{t}_{M}(x_{t},W_{t};\theta)\\
&\leq\sum_{k=0}^{j-1}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\tilde{\text{M}}^{t+1}_{p},W_{t+1}),\tilde{\text{M}}^{t+1}_{p}[k+1])\\
&\quad+\sum_{k=j}^{M-1}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\tilde{\text{M}}^{t+1}_{p},W_{t+1}),\tilde{\text{M}}^{t+1}_{p}[k+1])-c_{t}(x_{t},\text{M}^{t}_{p}[1])\\
&\quad-\sum_{k=0}^{M-2}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\text{M}^{t}_{p}[2:M],W_{t+1}),\text{M}^{t}_{p}[k+2])+\alpha_{c}\tilde{\alpha}_{f}\alpha_{\kappa}(M-1)\lVert\hat{\theta}-\theta\rVert_{2}.
\end{aligned}
\]

Since the first $j$ entries of $\tilde{\text{M}}^{t+1}_{p}$ are $\text{M}^{t}_{p}[2:j+1]$, the first sum cancels with the corresponding terms of the last sum, and the remaining terms of the last sum are nonnegative. Hence,

\[
\begin{aligned}
V^{t+1}_{M}(x_{t+1},W_{t+1};\theta)-V^{t}_{M}(x_{t},W_{t};\theta)&\leq\sum_{k=j}^{M-1}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\tilde{\text{M}}^{t+1}_{p},W_{t+1}),\tilde{\text{M}}^{t+1}_{p}[k+1])\\
&\quad-c_{t}(x_{t},\text{M}^{t}_{p}[1])+\alpha_{c}\tilde{\alpha}_{f}\alpha_{\kappa}(M-1)\lVert\hat{\theta}-\theta\rVert_{2}.
\end{aligned}
\]

By definition of $\tilde{\text{M}}^{t+1}_{p}$, the first sum on the right is the optimal cost-to-go starting from the state $\phi^{t+1}(j,x_{t+1},\tilde{\text{M}}^{t+1}_{p},W_{t+1})$ for the horizon $M-j$ from $t+j$. Hence,

\[
\begin{aligned}
V^{t+1}_{M}(x_{t+1},W_{t+1};\theta)-V^{t}_{M}(x_{t},W_{t};\theta)&\leq V^{t+j}_{M-j}(\phi^{t+1}(j,x_{t+1},\tilde{\text{M}}^{t+1}_{p},W_{t+1}),W_{t+j})\\
&\quad-c_{t}(x_{t},\text{M}^{t}_{p}[1])+\alpha_{c}\tilde{\alpha}_{f}\alpha_{\kappa}(M-1)\lVert\hat{\theta}-\theta\rVert_{2}.
\end{aligned}
\]

By the assumption in the Lemma we have that

\[
V^{t+j}_{M-j}(\phi^{t+1}(j,x_{t+1},\tilde{\text{M}}^{t+1}_{p},W_{t+1}),W_{t+j};\theta)\leq\overline{\alpha}\sigma(\phi^{t+1}(j,x_{t+1},\tilde{\text{M}}^{t+1}_{p},W_{t+1}))+\overline{\gamma}\sum_{k=t+j}^{t+M-2}\lVert w_{k}\rVert^{2}_{2}.
\]

Now, $\phi^{t+1}(j,x_{t+1},\tilde{\text{M}}^{t+1}_{p},W_{t+1})=\phi^{t+1}(j,x_{t+1},\text{M}^{t}_{p}[2:j+1],W_{t+1})$. Hence,

\[
\begin{aligned}
V^{t+1}_{M}(x_{t+1},W_{t+1};\theta)-V^{t}_{M}(x_{t},W_{t};\theta)&\leq\overline{\alpha}\sigma(\phi^{t+1}(j,x_{t+1},\text{M}^{t}_{p}[2:j+1],W_{t+1}))\\
&\quad+\overline{\gamma}\sum_{k=t+j}^{t+M-2}\lVert w_{k}\rVert^{2}_{2}-c_{t}(x_{t},\text{M}^{t}_{p}[1])+\alpha_{c}\tilde{\alpha}_{f}\alpha_{\kappa}(M-1)\lVert\hat{\theta}-\theta\rVert_{2}.
\end{aligned}
\tag{9}
\]

From Assumption 2 and the fact that $V^{t}_{M}(x_{t},W_{t:t^{e}};\theta)\leq\overline{\alpha}\sigma(x_{t})+\overline{\gamma}\sum_{k=t}^{t^{e}-1}\lVert w_{k}\rVert^{2}_{2}$ for any $t$, it follows that

\[
\begin{aligned}
\underline{\alpha}\sum_{k=0}^{M-2}\sigma(\phi^{t+1}(k,x_{t+1},\text{M}^{t}_{p}[2:k+1],W_{t+1}))&\leq\sum_{k=0}^{M-2}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\text{M}^{t}_{p}[2:k+1],W_{t+1}),\text{M}^{t}_{p}[k+2])\\
&\leq c_{t}(x_{t},\text{M}^{t}_{p}[1])+\sum_{k=0}^{M-2}c_{t+1+k}(\phi^{t+1}(k,x_{t+1},\text{M}^{t+1}_{p},W_{t+1}),\text{M}^{t+1}_{p}[k+1])\\
&\leq c_{t}(x_{t},\text{M}^{t}_{p}[1])+\sum_{k=0}^{M-2}c_{t+1+k}(\phi^{t+1}(k,\tilde{x}_{t+1},\text{M}^{t}_{p}[2:M],W_{t+1}),\text{M}^{t}_{p}[k+2])\\
&\quad+\alpha_{c}\tilde{\alpha}_{f}\alpha_{\kappa}(M-1)\lVert\hat{\theta}-\theta\rVert_{2}\\
&\leq\overline{\alpha}\sigma(x_{t})+\overline{\gamma}\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}+\alpha_{c}\tilde{\alpha}_{f}\alpha_{\kappa}(M-1)\lVert\hat{\theta}-\theta\rVert_{2}.
\end{aligned}
\]

Since the left-hand side above is a sum of $M-1$ nonnegative terms, the smallest of them is at most their average, so the above inequality implies that there exists $j^{*}\in\{0,1,\ldots,M-2\}$ such that

\[
\sigma(\phi^{t+1}(j^{*},x_{t+1},\text{M}^{t}_{p}[2:j^{*}+1],W_{t+1}))\leq\frac{1}{\underline{\alpha}(M-1)}\left(\overline{\alpha}\sigma(x_{t})+\overline{\gamma}\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}+\alpha_{c}\tilde{\alpha}_{f}\alpha_{\kappa}(M-1)\lVert\hat{\theta}-\theta\rVert_{2}\right).
\]

Then, using this in Eq. (9), we get that

\[
\begin{aligned}
V^{t+1}_{M}(x_{t+1},W_{t+1};\theta)-V^{t}_{M}(x_{t},W_{t};\theta)&\leq\overline{\alpha}\sigma(\phi^{t+1}(j^{*},x_{t+1},\text{M}^{t}_{p}[2:j^{*}+1],W_{t+1}))\\
&\quad+\overline{\gamma}\sum_{k=t+j^{*}}^{t+M-2}\lVert w_{k}\rVert^{2}_{2}-c_{t}(x_{t},\text{M}^{t}_{p}[1])+\alpha_{c}\tilde{\alpha}_{f}\alpha_{\kappa}(M-1)\lVert\hat{\theta}-\theta\rVert_{2}\\
&\leq\frac{\overline{\alpha}}{\underline{\alpha}(M-1)}\left(\overline{\alpha}\sigma(x_{t})+\overline{\gamma}\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}+\alpha_{c}\tilde{\alpha}_{f}\alpha_{\kappa}(M-1)\lVert\hat{\theta}-\theta\rVert_{2}\right)\\
&\quad+\overline{\gamma}\sum_{k=t+j^{*}}^{t+M-2}\lVert w_{k}\rVert^{2}_{2}-c_{t}(x_{t},\text{M}^{t}_{p}[1])+\alpha_{c}\tilde{\alpha}_{f}\alpha_{\kappa}(M-1)\lVert\hat{\theta}-\theta\rVert_{2}.
\end{aligned}
\]

Then, using the fact that $c_{t}(x_{t},\text{M}^{t}_{p}[1])\geq\underline{\alpha}\sigma(x_{t})$ and rearranging terms, we get that

\[
\begin{aligned}
V^{t+1}_{M}(x_{t+1},W_{t+1};\theta)-V^{t}_{M}(x_{t},W_{t};\theta)&\leq\frac{\overline{\alpha}^{2}}{\underline{\alpha}(M-1)}\sigma(x_{t})-\underline{\alpha}\sigma(x_{t})+\overline{\gamma}\left(\frac{\overline{\alpha}}{\underline{\alpha}(M-1)}+1\right)\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}\\
&\quad+\alpha_{c}\tilde{\alpha}_{f}\alpha_{\kappa}(M-1)\left(\frac{\overline{\alpha}}{\underline{\alpha}(M-1)}+1\right)\lVert\hat{\theta}-\theta\rVert_{2}\\
&=\left(\frac{\overline{\alpha}^{2}}{\underline{\alpha}(M-1)}-\underline{\alpha}\right)\sigma(x_{t})+\overline{\gamma}\left(\frac{\overline{\alpha}}{\underline{\alpha}(M-1)}+1\right)\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}\\
&\quad+\alpha_{c}\tilde{\alpha}_{f}\alpha_{\kappa}(M-1)\left(\frac{\overline{\alpha}}{\underline{\alpha}(M-1)}+1\right)\lVert\hat{\theta}-\theta\rVert_{2}.
\end{aligned}
\]

Hence,

\[
\begin{aligned}
V^{t+1}_{M}(x_{t+1},W_{t+1};\hat{\theta})-V^{t}_{M}(x_{t},W_{t};\hat{\theta})&\leq\left(\frac{\overline{\alpha}^{2}}{\underline{\alpha}(M-1)}-\underline{\alpha}\right)\sigma(x_{t})+\overline{\gamma}\left(\frac{\overline{\alpha}}{\underline{\alpha}(M-1)}+1\right)\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}\\
&\quad+\left(2\alpha_{V}+\alpha_{c}\tilde{\alpha}_{f}\alpha_{\kappa}(M-1)\left(\frac{\overline{\alpha}}{\underline{\alpha}(M-1)}+1\right)\right)\lVert\hat{\theta}-\theta\rVert_{2}.\ \blacksquare
\end{aligned}
\]

Appendix G: Proof of Lemma 5

The proof builds on the result of Lemma 4. Since $M>\overline{\alpha}^{2}/\underline{\alpha}^{2}+1$, there exists $\tilde{\epsilon}\in(0,\underline{\alpha})$ such that

\[
M>\overline{\alpha}^{2}/\left(\underline{\alpha}(\underline{\alpha}-\tilde{\epsilon})\right)+1\geq\overline{\alpha}/\left(\underline{\alpha}-\tilde{\epsilon}\right)+1.
\]
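As a quick numeric sanity check that such an $\tilde{\epsilon}$ exists (illustration only; the values of $\overline{\alpha}$, $\underline{\alpha}$, $M$, $\tilde{\epsilon}$ below are arbitrary choices satisfying $M>\overline{\alpha}^{2}/\underline{\alpha}^{2}+1$, not constants from the paper):

```python
# Illustrative check: if M > alpha_bar^2 / alpha_low^2 + 1, then some
# eps in (0, alpha_low) satisfies both displayed inequalities.
# All numeric values here are arbitrary stand-ins.

alpha_bar, alpha_low, M = 2.0, 1.0, 6
assert M > alpha_bar**2 / alpha_low**2 + 1  # hypothesis of the lemma

eps = 0.1  # any small enough eps in (0, alpha_low) works; 0.1 does here
assert 0 < eps < alpha_low
# M > alpha_bar^2 / (alpha_low (alpha_low - eps)) + 1
assert M > alpha_bar**2 / (alpha_low * (alpha_low - eps)) + 1
# ... >= alpha_bar / (alpha_low - eps) + 1, since alpha_bar >= alpha_low
assert (alpha_bar**2 / (alpha_low * (alpha_low - eps)) + 1
        >= alpha_bar / (alpha_low - eps) + 1)
print("eps =", eps, "satisfies both inequalities")
```

The existence argument itself is by continuity: the strict inequality on $M$ holds at $\tilde{\epsilon}=0$, so it continues to hold for all sufficiently small $\tilde{\epsilon}>0$.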

For convenience of presentation, in the following we denote $V^{t}_{M}(x_{t},W_{t:t^{e}};\hat{\theta})$ by $V^{t}_{M}$. From Lemma 4 it follows that

\[
\begin{aligned}
&V^{t+1}_{M}-V^{t}_{M}\leq\Gamma_{V}\sigma(x_{t})+\Gamma^{\gamma}_{V}\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}+\Gamma^{\theta}_{V}\lVert\hat{\theta}-\theta\rVert_{2},\\
&\Gamma_{V}=\frac{\overline{\alpha}^{2}}{\underline{\alpha}(M-1)}-\underline{\alpha},\quad\Gamma^{\gamma}_{V}=\overline{\gamma}\left(\frac{\overline{\alpha}}{\underline{\alpha}(M-1)}+1\right),\\
&\Gamma^{\theta}_{V}=2\alpha_{V}+\alpha_{c}\tilde{\alpha}_{f}\alpha_{\kappa}(M-1)\left(\frac{\overline{\alpha}}{\underline{\alpha}(M-1)}+1\right).
\end{aligned}
\]

From the above condition on $M$ it follows that $\Gamma_{V}\leq-\tilde{\epsilon}$ and $\Gamma^{\gamma}_{V}\leq\overline{\gamma}\left((\underline{\alpha}-\tilde{\epsilon})/\overline{\alpha}+1\right)$. Hence,

\[
\begin{aligned}
V^{t+1}_{M}-V^{t}_{M}&\leq-\tilde{\epsilon}\sigma(x_{t})+\overline{\gamma}\left(\frac{\underline{\alpha}-\tilde{\epsilon}}{\overline{\alpha}}+1\right)\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}\\
&\quad+\left(2\alpha_{V}+\alpha_{c}\tilde{\alpha}_{f}\alpha_{\kappa}(M-1)\left(\frac{\underline{\alpha}-\tilde{\epsilon}}{\overline{\alpha}}+1\right)\right)\lVert\hat{\theta}-\theta\rVert_{2}.
\end{aligned}
\]

Then

\[
V^{t}_{M}\leq V^{t}_{M}(x_{t},W_{t:t^{e}};\theta)+\alpha_{V}\lVert\hat{\theta}-\theta\rVert_{2}\leq\overline{\alpha}\sigma(x_{t})+\overline{\gamma}\sum_{k=t}^{t^{e}-1}\lVert w_{k}\rVert^{2}_{2}+\alpha_{V}\lVert\hat{\theta}-\theta\rVert_{2}.
\]

Hence,

\[
\begin{aligned}
V^{t+1}_{M}-V^{t}_{M}&\leq-\frac{\tilde{\epsilon}}{\overline{\alpha}}\left(V^{t}_{M}-\overline{\gamma}\sum_{k=t}^{t^{e}-1}\lVert w_{k}\rVert^{2}_{2}-\alpha_{V}\lVert\hat{\theta}-\theta\rVert_{2}\right)+\overline{\gamma}\left(\frac{\underline{\alpha}-\tilde{\epsilon}}{\overline{\alpha}}+1\right)\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}\\
&\quad+\left(2\alpha_{V}+\alpha_{c}\tilde{\alpha}_{f}\alpha_{\kappa}(M-1)\left(\frac{\underline{\alpha}-\tilde{\epsilon}}{\overline{\alpha}}+1\right)\right)\lVert\hat{\theta}-\theta\rVert_{2}\\
&\leq-\frac{\tilde{\epsilon}}{\overline{\alpha}}V^{t}_{M}+\overline{\gamma}\left(\frac{\underline{\alpha}}{\overline{\alpha}}+1\right)\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}\\
&\quad+\left(\alpha_{V}\left(2+\frac{\tilde{\epsilon}}{\overline{\alpha}}\right)+\alpha_{c}\tilde{\alpha}_{f}\alpha_{\kappa}(M-1)\left(\frac{\underline{\alpha}-\tilde{\epsilon}}{\overline{\alpha}}+1\right)\right)\lVert\hat{\theta}-\theta\rVert_{2}.
\end{aligned}
\]

Pick ϵ\epsilon such that 0<ϵϵ~/α¯0<\epsilon\leq\tilde{\epsilon}/\overline{\alpha}. Then

\[
\begin{aligned}
V^{t+1}_{M}-V^{t}_{M}&\leq-\epsilon V^{t}_{M}+\overline{\gamma}\left(\frac{\underline{\alpha}}{\overline{\alpha}}+1\right)\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}\\
&\quad+\left(\alpha_{V}\left(2+\frac{\tilde{\epsilon}}{\overline{\alpha}}\right)+\alpha_{c}\tilde{\alpha}_{f}\alpha_{\kappa}(M-1)\left(\frac{\underline{\alpha}-\tilde{\epsilon}}{\overline{\alpha}}+1\right)\right)\lVert\hat{\theta}-\theta\rVert_{2}.
\end{aligned}
\]

Let $a=1-\epsilon$, so that $a<1$. Let $c=\alpha_{V}\left(2+\frac{\tilde{\epsilon}}{\overline{\alpha}}\right)+\alpha_{c}\tilde{\alpha}_{f}\alpha_{\kappa}(M-1)\left(\frac{\underline{\alpha}-\tilde{\epsilon}}{\overline{\alpha}}+1\right)$, and $b=\overline{\gamma}\left(\frac{\underline{\alpha}}{\overline{\alpha}}+1\right)$. Then

\[
V^{t+1}_{M}\leq aV^{t}_{M}+b\sum_{k=t}^{t^{e}}\lVert w_{k}\rVert^{2}_{2}+c\lVert\hat{\theta}-\theta\rVert_{2}.
\]

Then, unrolling the above recursion over $H$ steps, we get that

\[
\begin{aligned}
V^{t+H}_{M}&\leq a^{H}V^{t}_{M}+\frac{ba^{H-M}}{1-a}\sum_{k=0}^{M-1}\lVert w_{t+k}\rVert^{2}_{2}+\frac{b}{1-a}\sum_{k=M}^{H-1}a^{H-k-1}\lVert w_{t+k}\rVert^{2}_{2}\\
&\quad+\frac{b}{1-a}\sum_{k=H}^{H+M-2}\lVert w_{t+k}\rVert^{2}_{2}+c\frac{1-a^{H}}{1-a}\lVert\hat{\theta}-\theta\rVert_{2}.
\end{aligned}
\]
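The unrolled bound follows by iterating the one-step recursion $V^{t+1}_{M}\leq aV^{t}_{M}+b\sum_{k}\lVert w_{k}\rVert^{2}_{2}+c\lVert\hat{\theta}-\theta\rVert_{2}$. The following illustrative Python sketch (with arbitrary stand-in values for $a$, $b$, $c$, the disturbance sums and the estimation error, none taken from the paper) numerically checks the generic unrolling and the geometric-series coefficient $c(1-a^{H})/(1-a)$ that appears above.

```python
# Illustrative numeric check (not the paper's exact bound): unrolling
#   V_{j+1} <= a V_j + b s_j + c e   for H steps gives
#   V_H <= a^H V_0 + sum_{j<H} a^{H-1-j} (b s_j + c e),
# and the constant c-part sums to c e (1 - a^H) / (1 - a).
# All numeric values below are arbitrary stand-ins.

a, b, c, e = 0.8, 0.5, 0.3, 0.05
s = [1.0, 0.2, 0.0, 0.7, 0.1, 0.0]   # stand-ins for the disturbance sums
H = len(s)

# Worst case: run the recursion with equality.
V0 = 2.0
V = V0
for j in range(H):
    V = a * V + b * s[j] + c * e

# Closed-form unrolling and the geometric series of the c-term.
unrolled = a**H * V0 + sum(a**(H - 1 - j) * (b * s[j] + c * e) for j in range(H))
geom_c = c * e * (1 - a**H) / (1 - a)

assert abs(V - unrolled) < 1e-9
assert abs(sum(a**(H - 1 - j) * c * e for j in range(H)) - geom_c) < 1e-9
print(V, unrolled)
```

The grouping of the $b$-terms into the three sums in the displayed bound additionally uses the preview structure of the disturbance sums, which this generic check does not model.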

Since $V^{t+H}_{M}\geq c_{t_{H}}(x_{t_{H}},\kappa_{M}(x_{t_{H}};\hat{\theta}))$, the condition $V^{t}_{M}\leq\overline{\alpha}\sigma(x_{t})+\overline{\gamma}\sum_{k=t}^{t^{e}-1}\lVert w_{k}\rVert^{2}_{2}+\alpha_{V}\lVert\hat{\theta}-\theta\rVert_{2}$ implies

\[
\begin{aligned}
V^{t+H}_{M}&\leq a^{H}\overline{\alpha}\sigma(x_{t})+\left(\frac{ba^{H-M}}{1-a}+a^{H}\overline{\gamma}\right)\sum_{k=0}^{M-1}\lVert w_{t+k}\rVert^{2}_{2}+\frac{b}{1-a}\sum_{k=M}^{H-1}a^{H-k-1}\lVert w_{t+k}\rVert^{2}_{2}\\
&\quad+\frac{b}{1-a}\sum_{k=H}^{H+M-2}\lVert w_{t+k}\rVert^{2}_{2}+\left(c\frac{1-a^{H}}{1-a}+a^{H}\alpha_{V}\right)\lVert\hat{\theta}-\theta\rVert_{2}.\ \blacksquare
\end{aligned}
\]