
Safe and Efficient Model Predictive Control Using Neural Networks: An Interior Point Approach

Daniel Tabas and Baosen Zhang. This work is partially supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1762114 and NSF grants ECCS-1930605 and ECCS-2023531. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. The authors are with the Department of Electrical and Computer Engineering, University of Washington, Seattle, WA, United States. {dtabas, zhangbao}@uw.edu.
Abstract

Model predictive control (MPC) provides a useful means for controlling systems with constraints, but suffers from the computational burden of repeatedly solving an optimization problem in real time. Offline (explicit) solutions for MPC attempt to alleviate real-time computational challenges using either multiparametric programming or machine learning. The multiparametric approaches are typically applied to linear or quadratic MPC problems, while learning-based approaches can be more flexible and are less memory-intensive. Existing learning-based approaches offer significant speedups, but the challenge is ensuring constraint satisfaction while maintaining good performance. In this paper, we provide a neural network parameterization of MPC policies that explicitly encodes the constraints of the problem. By exploring the interior of the MPC feasible set in an unsupervised learning paradigm, the neural network finds better policies faster than projection-based methods and exhibits substantially shorter solve times. We use the proposed policy to solve a robust MPC problem, and demonstrate the performance and computational gains on a standard test system.

I Introduction

Model predictive control (MPC) [1] is a powerful technique for controlling systems that are subject to state and input constraints, such as agricultural [2], automotive [3], and energy systems [4]. However, many applications require fast decision-making which may preclude the possibility of repeatedly solving an optimization problem online [5].

A popular approach for accelerating MPC is to move as much computation offline as possible [5, 6]. These techniques, known as explicit MPC, involve precomputing the solution to the MPC problem over a range of parameters or initial conditions. Most of the research effort has focused on problems with linear dynamics and constraints, and linear or quadratic cost functions. In this case, the explicit MPC solution is a piecewise affine (PWA) function defined over a polyhedral partition of the state constraints. However, many of the applications of interest have cost functions that are not necessarily linear or quadratic, or even convex. Further, the memory required to store the partition and affine functions can be prohibitive even for modestly-sized problems.

In order to reduce the complexity of explicit MPC, the optimal offline solution can be approximated. Approximations generally fall into two categories: partition-based solutions [7, 8, 9] that generate piecewise control laws over coarser state space partitions, and learning-based solutions [10, 11, 12, 13] that use function approximation to compactly represent the optimal MPC policy. In this paper, we focus on the latter with the key contribution of ensuring constraint satisfaction while exploring all feasible policies.

Constraint satisfaction is crucial in many engineering applications, and the ability of MPC to enforce constraints is a major factor in its popularity. However, it is not straightforward to guarantee that a learning-based solution will satisfy constraints. The main challenge arises from the fact that while neural networks can limit their outputs to simple regions, there is no obvious way to enforce complex constraints at the output. In [10, 14], supervised and unsupervised learning were used to approximate MPC solutions, but neither provides feasibility guarantees. By contrast, [11] trains a neural network (NN) using a policy gradient approach, and guarantees feasibility by projecting the NN output into the feasible action set. However, this extra optimization step slows down online implementation, making it difficult to use in applications that require high-frequency solutions [15]. Supervised learning approaches that provide safety guarantees [13, 12] rely on a choice of MPC oracle that is not obvious when persistent disturbances are present.

In this paper, we propose an NN architecture for approximating explicit solutions to finite-horizon MPC problems with linear dynamics, linear constraints, and arbitrary differentiable cost functions. The proposed architecture guarantees constraint satisfaction without relying on projections or MPC oracles. By exploring the interior of the feasible set, we demonstrate faster training and evaluation, and comparable closed-loop performance relative to other NN architectures.

The proposed approach has parallels in interior point methods for convex optimization [16]. Interior point methods first solve a Phase I problem to find a strictly feasible starting point. This solution is used to initialize the Phase II algorithm for optimizing the original problem. Our approach accelerates both phases. The Phase I solution is given by a simple function (e.g., affine map) and the Phase II problem is solved using an NN architecture that can encode arbitrary polytopic constraints (Fig. 1).

The Phase II solution builds on a technique first proposed in [17], which uses a gauge map to establish equivalence between compact, convex sets. With respect to [17], the current work has three novel aspects. First, the reinforcement learning (RL) algorithm in [17] only uses information about the constraints, and does not use information about the cost function or dynamics. The resulting policy is safe, but can exhibit suboptimal performance. The MPC formulation in the current paper gives rise to a training algorithm that can exploit knowledge about the system, improving performance. Second, the MPC formulation permits explicit consideration for future time steps. The RL formulation cannot optimize entire trajectories due to the presence of constraints. This inability to “look ahead” again limits the performance of the RL algorithm. Finally, the previous work required a Phase I that used a linear feedback to find a strictly feasible point. A linear feedback, however, may not exist for some problems. The current work proposes a more general class of Phase I solutions (piecewise affine), while providing a way to manage the complexity of the Phase I solution.

Figure 1: Illustration of the interior point approach to learning-based MPC. The set $\mathcal{F}(x_0)$ represents the MPC feasible set, while $\mu_0(x_0)$ and $\mu_\theta(x_0)$ are control input sequences representing solutions to the Phase I and Phase II problems, respectively. The neural network $\mu_\theta$ moves the Phase I solution to a more optimal solution.

We demonstrate the effectiveness of the proposed technique on a 3-state test system, and compare to standard projection- and penalty-based approaches for learning with constraints. The results show that the proposed technique achieves Pareto efficiency in terms of closed-loop performance and online computation effort. All code is available at github.com/dtabas/gauge_networks.

Notation: The $p$-norm ball for $p\geq 1$ is $\mathbb{B}_p=\{z\mid\|z\|_p\leq 1\}$. A polytope $\mathcal{P}:=\{z\in\mathbb{R}^n\mid Fz\leq g\}\subset\mathbb{R}^n$ is the (bounded) intersection of a finite number of halfspaces. Scaling of a polytope by a factor $\lambda>0$ is defined as $\lambda\mathcal{P}=\{\lambda z\mid Fz\leq g\}=\{z\mid Fz\leq\lambda g\}$. Given a matrix $F$ and a vector $g$, the $i$th row of $F$ is denoted $F^{(i)T}$ and the $i$th component of $g$ is $g^{(i)}$. The interior of a set $\mathcal{Q}$ is denoted $\textbf{int}\,\mathcal{Q}$. The value of a variable $y$ at time $t$ is denoted $y_t$. A state or control trajectory of length $\tau$ is written as the vector $\textbf{x}=[x_1^T,\ldots,x_\tau^T]^T\in\mathbb{R}^{n\tau}$ or $\textbf{u}=[u_0^T,\ldots,u_{\tau-1}^T]^T\in\mathbb{R}^{m\tau}$. The column vector of all ones is $\textbf{1}$. The symbol $\circ$ denotes function composition.

II Problem Formulation

In this paper, we consider the problem of regulating discrete-time dynamical systems of the form

$x_{t+1} = Ax_t + Bu_t + d_t$   (1)

where $x_t\in\mathbb{R}^n$ is the system state at time $t$, $u_t\in\mathbb{R}^m$ is the control input, and $d_t\in\mathbb{R}^n$ is an uncertain input that captures exogenous disturbances and/or linearization error (if the true system dynamics are nonlinear) [18]. We assume the pair $(A,B)$ is stabilizable. The input constraints (actuation limits) are $\mathcal{U}=\{u\in\mathbb{R}^m\mid F_u u\leq g_u\}$, while the state constraints arising from safety-critical engineering considerations are $\mathcal{X}=\{x\in\mathbb{R}^n\mid F_x x\leq g_x\}$.

We consider the problem of operating the system (1) using finite-horizon model predictive control. The goal is to choose, given initial condition $x_0\in\mathcal{X}$, a sequence of inputs $\textbf{u}$ of length $\tau$ that minimizes the cost of operating the system while respecting the operational constraints.

However, since the disturbances $d_t$ are unknown ahead of time, the designer must carefully consider how to achieve both optimality and constraint satisfaction. The robust MPC literature contains many ways to handle the presence of disturbances in both the cost and constraints [19]. For example, the certainty-equivalent approach [5] considers only the nominal system trajectory, while the min-max approach [9] considers the worst-case disturbance. Interpolating between these two extremes, the tube-based approach [20] considers the cost of a nominal trajectory while guaranteeing that the true trajectory satisfies constraints. A stochastic point of view [21] treats the disturbance as a random variable and minimizes the expected cost while providing probabilistic guarantees of constraint satisfaction.

In most robust MPC formulations, the set of possible disturbances is modeled as either a finite set, a bounded set, or a probability distribution [22]. In this paper, we assume the disturbances lie in a closed and bounded set $\mathcal{D}:=\{d\in\mathbb{R}^n\mid F_d d\leq g_d\}$. In order to ensure constraint satisfaction, we operate the system within a robust control invariant set (RCI) $\mathcal{S}\subseteq\mathcal{X}$, defined as a set of initial conditions for which there exists a feedback policy in $\mathcal{U}$ keeping all system trajectories in $\mathcal{S}$, under any disturbance sequence in $\mathcal{D}$ [23]. In our simulations, we used approximately-maximal RCIs computed with the semidefinite program from [24].

With $\mathcal{S}:=\{x\in\mathbb{R}^n\mid F_s x\leq g_s\}$, we define the target set $\mathcal{T}:=\{x\in\mathbb{R}^n\mid x+d\in\mathcal{S}\ \forall\,d\in\mathcal{D}\}=\{x\in\mathbb{R}^n\mid F_s x\leq\tilde{g}_s\}$, where for each row $i$, $\tilde{g}_s^{(i)}=g_s^{(i)}-\max_{d\in\mathcal{D}}F_s^{(i)T}d$ [23]. Any policy that maps $\mathcal{S}$ to $\mathcal{T}$ under the nominal dynamics will map $\mathcal{S}$ to itself under the true dynamics, rendering $\mathcal{S}$ robustly invariant. By constraining the nominal state to the target set, robust constraint satisfaction is guaranteed for the first time step. Since $\mathcal{S}$ is RCI, this is sufficient for keeping closed-loop trajectories inside $\mathcal{S}$. Under this formulation, the MPC problem is posed as follows, given initial state $x_0$:

$\min_{\textbf{u}}\ \sum_{k=0}^{\tau-1} l(x_k, u_k) + l_F(x_\tau)$   (2a)
subject to $\forall\,k$: $x_{k+1} = Ax_k + Bu_k$   (2b)
$x_{k+1}\in\mathcal{T}$   (2c)
$u_k\in\mathcal{U}$   (2d)

where $l$ and $l_F$ are stage and terminal costs that are differentiable but possibly nonlinear or even non-convex. Although (2) differs from the standard tube-based approach, the techniques introduced in this paper can be applied to a variety of MPC formulations.
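The row-wise tightening $\tilde{g}_s^{(i)}=g_s^{(i)}-\max_{d\in\mathcal{D}}F_s^{(i)T}d$ used to build the target set $\mathcal{T}$ amounts to one small linear program per row of $F_s$. The following is a minimal sketch, not the authors' code; the function name and the use of SciPy's `linprog` are our own choices:

```python
import numpy as np
from scipy.optimize import linprog

def tighten(F_s, g_s, F_d, g_d):
    """Tighten each row i of {x | F_s x <= g_s} by the support function
    max_{d in D} F_s[i] @ d of the disturbance set D = {d | F_d d <= g_d}."""
    g_tilde = np.empty_like(g_s, dtype=float)
    for i, f in enumerate(F_s):
        # linprog minimizes, so negate the objective to maximize f @ d over D.
        res = linprog(-f, A_ub=F_d, b_ub=g_d, bounds=[(None, None)] * len(f))
        g_tilde[i] = g_s[i] - (-res.fun)
    return g_tilde
```

For a box disturbance set like the one in Section V, each LP reduces to a closed-form $\ell_1$-norm computation, but the LP form above covers any polytopic $\mathcal{D}$.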

In this paper, we seek to derive a safe feedback policy $\pi_\theta:\mathbb{R}^n\rightarrow\mathbb{R}^m$ that approximates the explicit solution to (2) by first approximating the optimal control sequence with a function $\mu_\theta:\mathbb{R}^n\rightarrow\mathbb{R}^{m\tau}$ and then implementing the first action of the sequence in closed loop. In practice, any MPC policy implemented in closed loop must be stabilizing and recursively feasible. Recursive feasibility is the property that closed-loop trajectories generated by the MPC controller will not lead to states in which the MPC problem is infeasible. This property is guaranteed when $\mathcal{S}$ is RCI [23]. If recursive feasibility is not guaranteed, then a backup controller must be developed, or a control sequence that is feasible for the most immediate time steps can be used. There is evidence in the literature that the latter approach performs quite well in practice [25], but the theoretical aspects remain open. In terms of stability, recursive feasibility guarantees that trajectories remain within a bounded set. Since this work focuses on constraint satisfaction, we do not consider stricter notions of stability.

III Phase I: Finding a Feasible Point

The feasible set of (2) is a polytope $\mathcal{F}(x_0)\subseteq\mathbb{R}^{m\tau}$, defined by the following inequalities in $\textbf{u}$:

$H_s(M_0 x_0 + M_u\textbf{u}) \leq \tilde{h}_s$,   (3a)
$H_u\textbf{u} \leq h_u$   (3b)

where $H_s, H_u, M_0, M_u, \tilde{h}_s$, and $h_u$ are block matrices and vectors derived from the system dynamics and constraints. In this paper, we assume that $\mathcal{F}(x_0)$ has nonempty interior for all $x_0\in\mathcal{S}$. Since the state constraint set $\mathcal{S}$ is an RCI, $\mathcal{F}(x_0)$ is already guaranteed to be nonempty, and the assumption of nonempty interior is only marginally more restrictive.

The gauge map technique introduced in [17] provides a way to constrain the outputs of a neural network $\mu_\theta:\mathbb{R}^n\rightarrow\mathbb{R}^{m\tau}$ to $\mathcal{F}(x_0)$ without a projection or penalty function, but $\mathcal{F}(x_0)$ must contain the origin in its interior. If this is not the case, then we must temporarily “shift” $\mathcal{F}(x_0)$ by subtracting any one of its interior points. In this section, we discuss several ways to reduce the complexity of finding an interior point.

We begin by considering the feasibility problem for the one-step safe action set defined as $\mathcal{V}(x_0)=\{u\in\mathbb{R}^m\mid u\in\mathcal{U},\ Ax_0+Bu\in\mathcal{T}\}$, which is guaranteed to have an interior point by the assumption on $\mathcal{F}(x_0)$. One way to find an interior point of $\mathcal{V}(x_0)$ is to minimize the maximum constraint violation:

$\min_{u,s}\ s$   (4a)
subject to: $F_s(Ax_0 + Bu) \leq \tilde{g}_s + s\textbf{1}$   (4b)
$F_u u \leq g_u + s\textbf{1}$   (4c)

which has an optimal cost $s^*\leq 0$ if $\mathcal{V}(x_0)$ is nonempty, and $s^*<0$ if $\mathcal{V}(x_0)$ has nonempty interior [16]. To avoid solving a linear program online during closed-loop implementation, the solution to (4) can be stored as a piecewise affine (PWA) function $\pi_0:\mathbb{R}^n\rightarrow\mathbb{R}^m$ [7]. Although solutions to multiparametric LPs can be demanding on computer memory, we take advantage of the fact that feasibility problems have low accuracy requirements: any suboptimal solution to (4) that achieves a cost $s<0$ for all $x_0\in\mathcal{S}$ is acceptable.
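For a single initial condition, the LP (4) is small enough to solve directly by stacking the decision variable as $z=[u;s]$. The sketch below is illustrative only (helper name and SciPy usage are our own assumptions, not part of the paper):

```python
import numpy as np
from scipy.optimize import linprog

def phase1_point(A, B, F_s, g_t, F_u, g_u, x0):
    """Solve LP (4): min s  s.t.  F_s(A x0 + B u) <= g_t + s*1,  F_u u <= g_u + s*1.
    Decision variable z = [u; s]; an optimum s* < 0 certifies that u is an
    interior point of the one-step safe action set V(x0)."""
    m = B.shape[1]
    c = np.zeros(m + 1)
    c[-1] = 1.0                                  # objective: minimize s
    A_ub = np.block([[F_s @ B, -np.ones((F_s.shape[0], 1))],
                     [F_u,     -np.ones((F_u.shape[0], 1))]])
    b_ub = np.concatenate([g_t - F_s @ (A @ x0), g_u])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (m + 1))
    return res.x[:m], res.x[-1]                  # (u, s*)
```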

Definition 1

A function $\pi_0:\mathbb{R}^n\rightarrow\mathbb{R}^m$ is said to solve (4) if, for all $x_0\in\mathcal{S}$, the optimal cost of (4) is negative when the decision variable $u$ is fixed at $\pi_0(x_0)$.

Existing techniques for approximate multiparametric linear programming [26], especially those that generate continuous solutions [27], can be used to reduce the memory requirements of offline solutions to (4).

To show just how far one can go in reducing complexity, we will construct an affine (rather than PWA) function that solves (4) for the system studied in Section V. Let $\pi_0(x_0)=Wx_0+w$. If $W\in\mathbb{R}^{m\times n}$ and $w\in\mathbb{R}^m$ satisfy

$F_s(Ax_0 + B(Wx_0+w)) < \tilde{g}_s$   (5a)
$F_u(Wx_0+w) < g_u$   (5b)

for all $x_0\in\mathcal{S}$, then $\pi_0(x_0)=Wx_0+w$ solves (4). The following optimization problem can be solved to find $W$ and $w$ or to certify that none exist. Let $\mathcal{Y}(s)=\{x_0\in\mathbb{R}^n\mid F_s(Ax_0+B(Wx_0+w))\leq\tilde{g}_s+s\textbf{1},\ F_u(Wx_0+w)\leq g_u+s\textbf{1}\}$. If the optimal cost of

$\min_{W,w,s}\ s\ \text{ subject to }\ \mathcal{S}\subseteq\mathcal{Y}(s)$   (6)

is negative, then (5) holds for all $x_0\in\mathcal{S}$, and thus $\pi_0$ solves (4). This happens to be the case for the example in Section V, taken from [6]. The constraint in (6) is a polytope containment constraint in halfspace representation, so (6) can be solved as a linear program [28].
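As a numerical sanity check on a candidate $(W, w, s)$, containment of one polytope in another (both in halfspace representation) can be verified row by row with small LPs: $\{x\mid Px\leq p\}\subseteq\{x\mid Qx\leq q\}$ holds iff $\max_{Px\leq p} Q^{(j)T}x \leq q^{(j)}$ for every row $j$. This support-function test is a verification route of our own, distinct from the LP encoding of [28] used to *solve* (6); the helper name is hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

def contained(P, p, Q, q):
    """Check {x | P x <= p} subseteq {x | Q x <= q} by maximizing each
    row of Q over the first polytope (support-function test)."""
    for Qj, qj in zip(Q, q):
        # linprog minimizes, so negate to maximize Qj @ x over {P x <= p}.
        res = linprog(-Qj, A_ub=P, b_ub=p, bounds=[(None, None)] * len(Qj))
        if -res.fun > qj + 1e-9:
            return False
    return True
```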

Now consider the feasibility problem for $\mathcal{F}(x_0)$, which is obtained by replacing (4b) and (4c) with (3a) and (3b) and changing the optimization variable from $u\in\mathbb{R}^m$ to $\textbf{u}\in\mathbb{R}^{m\tau}$. One would naturally expect the complexity of the PWA solution to this feasibility problem to increase rapidly with the time horizon $\tau$ as more decision variables and constraints are added. However, the next proposition shows that the cardinality of the stored partition can be made constant in $\tau$.

Proposition 1 (Phase I solution)

If $\pi_0$ solves (4), then the vector $\mu_0(x_0):=[\pi_0(x_0)^T,\ldots,\pi_0(x_{\tau-1})^T]^T$, where $x_{k+1}=Ax_k+B\pi_0(x_k)$, is an interior point of $\mathcal{F}(x_0)$ for any $x_0\in\mathcal{S}$.

Proof:

If $\pi_0$ solves (4), then $\pi_0(x)\in\textbf{int}\,\mathcal{V}(x)$ for all $x\in\mathcal{S}$. Applying the definition of $\mathcal{V}$ in an inductive argument, it is straightforward to show that the state trajectory associated with $\mu_0(x_0)$ is entirely contained in $\mathcal{S}$. Fix any such trajectory $\{x_1,\ldots,x_\tau\}\subset\mathcal{S}$ originating from $x_0\in\mathcal{S}$ under policy $\pi_0$. For any $k\in\{0,\ldots,\tau-1\}$, $x_k\in\mathcal{S}$ implies $\pi_0(x_k)\in\textbf{int}\,\mathcal{V}(x_k)$, which implies $\pi_0(x_k)\in\textbf{int}\,\mathcal{U}$ and $Ax_k+B\pi_0(x_k)\in\textbf{int}\,\mathcal{T}$. Since this holds for all $k$, the constraints defining $\mathcal{F}(x_0)$ hold strictly at $\mu_0(x_0)$. ∎
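Proposition 1 reduces Phase I to rolling out $\pi_0$ along the nominal dynamics. A minimal numpy sketch of that rollout (function names are ours, not the paper's):

```python
import numpy as np

def phase1_sequence(A, B, pi0, x0, tau):
    """Roll out the Phase I policy pi0 along the nominal dynamics to build
    mu0(x0) = [pi0(x0); ...; pi0(x_{tau-1})], an interior point of F(x0)."""
    u_seq = []
    x = np.asarray(x0, dtype=float)
    for _ in range(tau):
        u = pi0(x)
        u_seq.append(u)
        x = A @ x + B @ u                        # nominal step (d = 0)
    return np.concatenate(u_seq)
```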

In our simulations on the example from [6], (6) was feasible with negative optimal cost, meaning that a polyhedral partition of the state space was not needed (see Section V). This suggests that the minimum number of regions in a polyhedral state-space partition associated with a PWA solution to (4) is, in general, very small relative to the number of regions in an explicit solution to (2).

IV Phase II: Optimizing Performance

In this section, we construct a class of policies mapping $x_0\in\mathcal{S}$ to $\mathcal{F}(x_0)$ that can be trained using standard machine learning packages. Although it is difficult to constrain the output of a neural network to an arbitrary polytope such as $\mathcal{F}(x_0)$, it is easy to constrain the output to the hypercube $\mathbb{B}_\infty$ by applying a clamping function elementwise in the output layer. We then apply a mapping between polytopes that is closed-form, differentiable, and bijective. This mapping establishes an equivalence between $\mathbb{B}_\infty$ and $\mathcal{F}(x_0)$, allowing one to constrain the outputs of the policy to $\mathcal{F}(x_0)$. The mapping from $\mathbb{B}_\infty$ to $\mathcal{F}(x_0)$ is called the gauge map. The concept is illustrated in Figure 2.

Figure 2: The proposed control policy uses a neural network combined with the Phase I solution and a gauge map to constrain the decision $\textbf{u}$ to the MPC feasible set $\mathcal{F}(x_0)$. The first action from the sequence $\textbf{u}$ is extracted and implemented. On the right, the action of the gauge map is illustrated.

We begin constructing the gauge map by introducing some preliminary concepts. A C-set is a convex, compact set that contains the origin in its interior. The gauge function with respect to a C-set $\mathcal{P}\subset\mathbb{R}^n$, denoted $\gamma_\mathcal{P}:\mathbb{R}^n\rightarrow\mathbb{R}_+$, is the function whose sublevel sets are scaled versions of $\mathcal{P}$. Specifically, the gauge of a vector $v$ with respect to $\mathcal{P}$ is given by $\gamma_\mathcal{P}(v)=\inf\{\lambda\geq 0\mid v\in\lambda\mathcal{P}\}$. If $\mathcal{P}$ is a polytopic C-set given by $\{v\in\mathbb{R}^k\mid Fv\leq g\}$, then $\gamma_\mathcal{P}$ is the pointwise maximum of a finite set of affine functions [17]:

$\gamma_\mathcal{P}(v) = \max_i \dfrac{F^{(i)T}v}{g^{(i)}}.$   (7)

Given two C-sets $\mathcal{P}$ and $\mathcal{Q}$, the gauge map $G:\mathcal{P}\rightarrow\mathcal{Q}$ is

$G(v\mid\mathcal{P},\mathcal{Q}) = \dfrac{\gamma_\mathcal{P}(v)}{\gamma_\mathcal{Q}(v)}\cdot v.$   (8)

This function maps level sets of $\gamma_\mathcal{P}$ to level sets of $\gamma_\mathcal{Q}$.
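For a polytopic C-set, (7) and (8) are a few lines of numpy. A sketch of the gauge function and the gauge map from $\mathbb{B}_\infty$ onto a polytope $\mathcal{Q}=\{z\mid Fz\leq g\}$ (function names are ours), where $\gamma_{\mathbb{B}_\infty}(v)=\|v\|_\infty$:

```python
import numpy as np

def gauge(v, F, g):
    """Gauge of v w.r.t. the polytopic C-set {z | F z <= g}, eq. (7)."""
    return np.max(F @ v / g)

def gauge_map(v, F, g):
    """Gauge map (8) from the infinity-norm ball B_inf onto {z | F z <= g}:
    rescales v by gamma_Binf(v) / gamma_Q(v)."""
    nv = np.linalg.norm(v, np.inf)               # gauge w.r.t. B_inf
    if nv == 0.0:
        return np.zeros_like(v)                  # the origin maps to the origin
    return (nv / gauge(v, F, g)) * v
```

Note how a point on the boundary of $\mathbb{B}_\infty$ ($\|v\|_\infty=1$) lands on the boundary of $\mathcal{Q}$, as the level-set property requires.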

Proposition 2

Given two polytopic C-sets $\mathcal{P}$ and $\mathcal{Q}$, the gauge map $G:\mathcal{P}\rightarrow\mathcal{Q}$ is subdifferentiable and bijective. Further, given a Phase I solution $\mu_0$ from Proposition 1, the set $\tilde{\mathcal{F}}(x):=\mathcal{F}(x)-\mu_0(x)$ is a C-set for all $x\in\mathcal{S}$.

Proof:

The properties of subdifferentiability and bijectivity come from [17]. For the C-set property, fix $x\in\mathcal{S}$. Since $\mathcal{S}$, $\mathcal{U}$, and $\mathcal{D}$ are convex and compact, so is $\mathcal{F}(x)$. Since $\mu_0(x)$ is an interior point of $\mathcal{F}(x)$, the set $\tilde{\mathcal{F}}(x)$ contains the origin in its interior and is therefore a C-set. ∎

We now use the gauge map in conjunction with the Phase I solution to construct a neural network whose output is confined to $\mathcal{F}(x_0)$. Let $\psi_\theta:\mathcal{S}\rightarrow\mathbb{B}_\infty$ be a neural network parameterized by $\theta$. A safe policy is constructed by composing the gauge map $G:\mathbb{B}_\infty\rightarrow\tilde{\mathcal{F}}(x_0)$ with $\psi_\theta$, then adding $\mu_0(x_0)$ to map the solution into $\mathcal{F}(x_0)$:

$\mu_\theta(x_0) = G(\cdot\mid\mathbb{B}_\infty,\tilde{\mathcal{F}}(x_0))\circ\psi_\theta(x_0) + \mu_0(x_0).$   (9)

Computing the gauge map online simply requires evaluating $H_sM_0x_0$ from (3a) as well as the operations in (7).

The function $\mu_\theta$ has several important properties for approximating the optimal solution to (2). First, it leverages the universal function approximation properties of neural networks [29] along with the bijectivity of the gauge map (Proposition 2) to explore all interior points of $\mathcal{F}(x_0)$. This is an advantage over projection-based methods [11], which may be biased towards the boundary of $\mathcal{F}(x_0)$ even when the optimal solution lies in the interior. Second, $\mu_\theta$ is evaluated in closed form, and its outputs are constrained to $\mathcal{F}(x_0)$ without the use of an optimization layer [14] that may have high computational overhead. Finally, the subdifferentiability of the gauge map (Proposition 2) enables selection of the parameter $\theta$ using standard automatic differentiation techniques.
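Putting the pieces together, the forward pass of (9) with $\mathcal{F}(x_0)$ written in the stacked halfspace form (3), $H\textbf{u}\leq h(x_0)$, might look like the following sketch. The NN `psi`, the Phase I map `mu0`, and the stacked `H`, `h_of_x0` are supplied by the caller; all names are our own illustrations, not the paper's code:

```python
import numpy as np

def safe_policy(psi, mu0, H, h_of_x0, x0):
    """Safe policy (9): map the NN output psi(x0) in B_inf through the gauge
    map onto the shifted feasible set F(x0) - mu0(x0), then add mu0(x0) back.
    H u <= h_of_x0(x0) is the halfspace form (3) of F(x0)."""
    v = psi(x0)                                  # NN output with ||v||_inf <= 1
    u0 = mu0(x0)                                 # Phase I interior point
    h_shift = h_of_x0(x0) - H @ u0               # shifted set {u | H u <= h_shift}
    nv = np.linalg.norm(v, np.inf)
    if nv == 0.0:
        return u0                                # origin of B_inf -> mu0(x0)
    return (nv / np.max(H @ v / h_shift)) * v + u0
```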

Optimizing the parameter θ\theta

Similar to the approach taken in [10], we optimize $\theta$ by sampling $x\in\mathcal{S}$ and applying stochastic gradient descent. At each iteration, a new batch of initial conditions $\{x_0^j\}_{j=1}^M$ is sampled from $\mathcal{S}$ and the loss is computed as

$J(\theta) = \frac{1}{M}\sum_{j=1}^{M}\left(\sum_{k=0}^{\tau-1} l(x_k^j, u_k^j) + l_F(x_\tau^j)\right)$   (10)

with the control sequences $\textbf{u}^j$ given by $\mu_\theta(x_0^j)$ and state trajectories $\textbf{x}^j$ generated according to the nominal dynamics. The parameters $\theta$ are updated in the direction of $-\nabla_\theta J$, which is easily computed using automatic differentiation [30].
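For reference, the Monte-Carlo loss (10) with a nominal rollout can be written as below. This is a plain-numpy sketch of ours; in practice the same computation would be expressed in an automatic-differentiation framework so that $\nabla_\theta J$ is available:

```python
import numpy as np

def mpc_loss(mu_theta, batch, A, B, stage_cost, terminal_cost, tau):
    """Monte-Carlo loss (10): average MPC cost of the policy's control
    sequences over a batch of initial conditions, under nominal dynamics."""
    total = 0.0
    m = B.shape[1]
    for x0 in batch:
        u = mu_theta(x0).reshape(tau, m)         # control sequence from the policy
        x = np.asarray(x0, dtype=float)
        for k in range(tau):
            total += stage_cost(x, u[k])
            x = A @ x + B @ u[k]                 # nominal rollout (d = 0)
        total += terminal_cost(x)
    return total / len(batch)
```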

V Simulations

V-A Test systems

We simulate the proposed policy using a modified example from [6] with $n=3$, $m=2$, and $\tau=5$. The system matrices, constraints, costs, and Phase I solution (found using (6)) are given below:

$A=\begin{bmatrix}-.5 & .3 & -1\\ .2 & -.5 & .6\\ 1 & .6 & -.6\end{bmatrix},\quad B=\begin{bmatrix}-.601 & -.890\\ .955 & -.715\\ .246 & -.184\end{bmatrix},$   (11)

$\|x\|_\infty\leq 5,\quad \|u\|_\infty\leq 1,\quad \|d\|_\infty\leq 0.1,$   (12)

$l(x,u)=\|x\|_2^2+c_1\|u\|_2^2,\quad l_F(x)=c_2\|x\|_2^2,$   (13)

$W=\begin{bmatrix}0.116 & 0.210 & -0.370\\ -0.320 & -0.104 & -0.122\end{bmatrix},\quad w=\begin{bmatrix}-0.157\\ -0.0533\end{bmatrix}$

where $c_1$ and $c_2$ are positive constants. Although quadratic costs are used in the simulations, the proposed method can work with any differentiable cost.

We evaluate the performance of a given policy in both open- and closed-loop experiments. In the open-loop experiments, we evaluate the MPC cost (2a) and compare it to the optimal cost. The fractional suboptimality is

$\delta = \dfrac{c_{nn} - c_{mpc}}{c_{mpc}}$   (14)

where $c_{nn}$ is the average cost (2a) incurred by the control sequence $\mu_\theta$ on a validation set $\{x_0^j\}_{j=1}^{N_{val}}\subset\mathcal{S}$ and $c_{mpc}$ is the optimal cost.

In the closed-loop experiments, we evaluate the performance of a policy $\pi_\theta:\mathbb{R}^n\rightarrow\mathbb{R}^m$, derived from $\mu_\theta(x_t)$ by taking the first action in the sequence at each time step $t\geq 0$. We simulate (1) for $T\gg\tau$ time steps. The trajectory cost in the closed-loop experiments is computed as $\sum_{t=0}^{T-1} l(x_t,u_t) + l_F(x_T)$, and the disturbance is modeled as an autoregressive sequence [31], $d_{t+1}=\alpha d_t + (1-\alpha)\hat{d}$, where $\alpha\in(0,1)$ and $\hat{d}$ is drawn uniformly over $\mathcal{D}$.
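The autoregressive disturbance model above is straightforward to sample for the box disturbance set used in Section V. A sketch (function name and box specialization are ours); note that since each $d_{t+1}$ is a convex combination of points in $\mathcal{D}$, the sequence never leaves the box:

```python
import numpy as np

def ar_disturbance(rng, T, alpha, radius, n):
    """AR(1) disturbance d_{t+1} = alpha*d_t + (1-alpha)*d_hat, with d_hat
    drawn uniformly from the box D = {d | ||d||_inf <= radius}."""
    d = np.zeros((T, n))
    for t in range(T - 1):
        d_hat = rng.uniform(-radius, radius, size=n)
        d[t + 1] = alpha * d[t] + (1 - alpha) * d_hat
    return d
```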

V-B Benchmarks

We compare the proposed method to two of the most common approaches for learning a solution to (2). The first benchmark is a penalty-based approach [32] which enforces the constraints (2c) and (2d) by augmenting the cost (10) with a linear penalty on constraint violations given by $\beta\cdot\max\{0, F_s x_t - \tilde{g}_s\}$, where the $\max$ is evaluated elementwise and $\beta>0$. Since the penalty-based approach does not encode state constraints in the policy, the policy is constrained to the Cartesian product $\mathcal{U}^\tau=\prod_{k=0}^{\tau-1}\mathcal{U}$ using scaled $\tanh$ functions elementwise.

The second benchmark is a projection-based approach [11] which constrains the policy to the set $\mathcal{F}(x_0)$ by solving a convex quadratic program in the output layer of a neural network [33]. The optimization layer $\textbf{v}\rightarrow\textbf{u}$ returns

$\arg\min_{\textbf{u}}\ \|\textbf{v}-\textbf{u}\|_2^2\ \text{ subject to }\ \textbf{u}\in\mathcal{F}(x_0).$

Another class of approaches to learning-based MPC seeks to learn the optimal solution to (2) using regression [12, 13, 14]. Specifically, data-label pairs $(x_0, u_0^*)$ are generated by sampling $x_0$ from $\mathcal{S}$, solving (2) for each sample, and extracting $u_0^*$ from the optimal solution $\textbf{u}^*$. Then, a neural network or other function approximator is trained to learn the relationship between $x_0$ and $u_0^*$. Performance and constraint satisfaction are handled, e.g., by bounding the approximation error with respect to the MPC oracle. We do not compare against this type of approach since it requires a large number of labeled training samples, making it difficult to compare with our approach and the other unsupervised benchmarks.

V-C Neural network design

The neural networks were designed with $n$ inputs, $m\tau$ outputs, and two hidden layers with rectified linear unit (ReLU) activation functions. The width of the networks was chosen during hyperparameter tuning. In particular, we performed 30 iterations of random search over the width of the network (number of neurons per hidden layer) $\in\{64,\ldots,1024\}$, the batch size (number of initial conditions, $M$) $\in\{100,\ldots,3000\}$, and the learning rate (LR, the step size for gradient descent) $\in[10^{-5},10^{-3}]$. For each set of hyperparameters under consideration, we computed the validation score using (14) with $N_{val}=100$. The hyperparameters after tuning are reported in Table I.

TABLE I: Hyperparameters for the three neural networks.
Type        Width   LR                    $M$
Gauge       859     $4.7\times 10^{-4}$   1655
Penalty     318     $8.7\times 10^{-4}$   133
Projection  956     $9.0\times 10^{-5}$   813

V-D Simulation results

Here we compare our proposed approach (Gauge NN), the penalty-based approach (Penalty NN), the projection-based approach (Projection NN), and the “ground truth” obtained by solving (2) online in cvxpy. The results of the open-loop experiments are shown in Table II, with performance computed relative to the optimal MPC solution using (14) with $N_{val}=100$ trials. The proposed Gauge NN achieves lower cost than the projection-based method and has much lower computational complexity (its solve time is only 6% of the projection's). Table II only compares the NNs with safety guarantees because constraint violations are not accounted for in (14).

TABLE II: Open-loop test results.
Type        $\delta$ (14)   Solve time (sec)
Gauge       0.007           0.0015
Projection  0.010           0.024

Figure 3 shows the training curves for each type of network. The lower training cost achieved by the Gauge NN illustrates that it can be more efficient to explore the interior of the feasible set than the boundary. Since the MPC cost in the simulations is strictly convex, solutions with lower cost are closer to the optimal solution.

Figure 3: Training trajectories for the three types of neural networks. Our proposed Gauge-based approach achieves lower cost at a much faster rate.

Figure 4 compares the policies in terms of computation time and test performance. The box-and-whisker plots indicate the range of performance over 100 test trajectories of length $T=50$, while the vertical position of each box indicates the average time to compute a control action. Of the policies with safety guarantees (Gauge NN, Projection NN, and online MPC), the Gauge NN achieves Pareto efficiency in terms of average solve time and median trajectory cost. Our intuition behind the high performance of the neural networks is that (2) is a heuristic and the unsupervised learning approach can lead to better closed-loop policies.

Figure 4: Solve time vs. trajectory cost for the networks under consideration applied to the 3-state system. The Gauge NN is Pareto-efficient in terms of cost and computation time compared to the other techniques with safety guarantees (Online MPC and Projection NN).

VI Conclusion

In this paper, we provided an efficient way of exploring the interior of the MPC feasible set for learning-based approximate explicit MPC, and demonstrated the performance and computational gains that can be achieved by approaching the problem from the interior. The paradigm relies on a Phase I solution that exploits the structure of the MPC problem and a Phase II solution that features a projection-free feasibility guarantee. The results compare favorably against common approaches that use unsupervised learning, as well as against the oracle itself used in supervised approaches. Future work includes applications to MPC problems with convex but non-polytopic constraint sets, and to distributed settings.

References

  • [1] J. B. Rawlings, D. Q. Mayne, and M. M. Diehl, Model Predictive Control: Theory and Design. Santa Barbara, CA: Nob Hill, 2 ed., 2019.
  • [2] Y. Ding, L. Wang, Y. Li, and D. Li, “Model predictive control and its application in agriculture: A review,” Computers and Electronics in Agriculture, vol. 151, pp. 104–117, 2018.
  • [3] D. Hrovat, S. Di Cairano, H. E. Tseng, and I. V. Kolmanovsky, “The development of Model Predictive Control in automotive industry: A survey,” Proc. IEEE Int. Conf. on Control Applications, pp. 295–302, 2012.
  • [4] A. Ademola-Idowu and B. Zhang, “Frequency Stability Using MPC-Based Inverter Power Control in Low-Inertia Power Systems,” IEEE Trans. Power Syst., vol. 36, no. 2, pp. 1628–1637, 2021.
  • [5] A. Alessio and A. Bemporad, “A Survey on Explicit Model Predictive Control,” in Nonlinear Model Predictive Control (L. Magni, D. Raimondo, and F. Allgöwer, eds.), Berlin, Heidelberg: Springer, 2009.
  • [6] M. N. Zeilinger, C. N. Jones, and M. Morari, “Real-time suboptimal model predictive control using a combination of explicit MPC and online optimization,” IEEE Trans. Autom. Control, vol. 56, no. 7, pp. 1524–1534, 2011.
  • [7] C. N. Jones, M. Barić, and M. Morari, “Multiparametric linear programming with applications to control,” European Journal of Control, vol. 13, no. 2-3, pp. 152–170, 2007.
  • [8] T. A. Johansen, “Approximate explicit receding horizon control of constrained nonlinear systems,” Automatica, vol. 40, no. 2, pp. 293–300, 2004.
  • [9] A. Grancharova and T. A. Johansen, “Computation, approximation and stability of explicit feedback min-max nonlinear model predictive control,” Automatica, vol. 45, no. 5, pp. 1134–1143, 2009.
  • [10] B. M. Åkesson and H. T. Toivonen, “A neural network model predictive controller,” J. Process Control, vol. 16, no. 9, pp. 937–946, 2006.
  • [11] S. Chen, K. Saulnier, N. Atanasov, D. D. Lee, V. Kumar, G. J. Pappas, and M. Morari, “Approximating Explicit Model Predictive Control Using Constrained Neural Networks,” Proc. Am. Control Conf., vol. 2018-June, pp. 1520–1527, 2018.
  • [12] T. Parisini and R. Zoppoli, “A Receding-Horizon Regulator for Nonlinear Systems and a Neural Approximation,” Automatica, vol. 31, no. 10, pp. 1443–1451, 1995.
  • [13] A. Domahidi, M. N. Zeilinger, M. Morari, and C. N. Jones, “Learning a feasible and stabilizing explicit model predictive control law by robust optimization,” in Proc. IEEE Conf. Decision Control, pp. 513–519, IEEE, 2011.
  • [14] E. T. Maddalena, C. G. da Moraes, G. Waltrich, and C. N. Jones, “A neural network architecture to learn explicit MPC controllers from data,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 11362–11367, 2020.
  • [15] L. Zheng, Y. Shi, L. J. Ratliff, and B. Zhang, “Safe reinforcement learning of control-affine systems with vertex networks,” in Learning for Dynamics and Control, pp. 336–347, PMLR, 2021.
  • [16] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2009.
  • [17] D. Tabas and B. Zhang, “Computationally Efficient Safe Reinforcement Learning for Power Systems,” arXiv:2110.10333, 2021.
  • [18] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan, Linear Matrix Inequalities in System and Control Theory, vol. 15. Philadelphia: Society for Industrial and Applied Mathematics, 1994.
  • [19] A. Bemporad and M. Morari, “Robust model predictive control: A survey,” in Robustness in identification and control, pp. 207–226, London: Springer, 1999.
  • [20] W. Langson, I. Chryssochoos, S. V. Raković, and D. Q. Mayne, “Robust model predictive control using tubes,” Automatica, vol. 40, no. 1, pp. 125–133, 2004.
  • [21] M. Farina, L. Giulioni, and R. Scattolini, “Stochastic linear Model Predictive Control with chance constraints - A review,” J. Process Control, vol. 44, pp. 53–67, 2016.
  • [22] M. B. Saltık, L. Özkan, J. H. Ludlage, S. Weiland, and P. M. Van den Hof, “An outlook on robust model predictive control algorithms: Reflections on performance and computational aspects,” J. Process Control, vol. 61, pp. 77–102, 2018.
  • [23] F. Blanchini and S. Miani, Set-theoretic methods in control. Birkhauser, 2015.
  • [24] C. Liu and I. M. Jaimoukha, “The computation of full-complexity polytopic robust control invariant sets,” in Proc. 54th IEEE Conf. Decision Control, pp. 6233–6238, 2015.
  • [25] Y. Wang and S. Boyd, “Fast model predictive control using online optimization,” IEEE Trans. Control Syst. Technol., vol. 18, no. 2, pp. 267–278, 2010.
  • [26] C. Filippi, “An algorithm for approximate multiparametric linear programming,” Journal of Optimization Theory and Applications, vol. 120, no. 1, pp. 73–95, 2004.
  • [27] J. Spjøtvold, P. Tøndel, and T. A. Johansen, “A method for obtaining continuous solutions to multiparametric linear programs,” IFAC Proc. Volumes (IFAC-PapersOnline), vol. 38, no. 1, pp. 253–258, 2005.
  • [28] S. Sadraddini and R. Tedrake, “Linear Encodings for Polytope Containment Problems,” Proc. IEEE Conf. Decision Control, pp. 4367–4372, 2019.
  • [29] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.
  • [30] A. Baydin, A. A. Radul, B. A. Pearlmutter, and J. M. Siskind, “Automatic Differentiation in Machine Learning: a Survey,” J. Machine Learning Research, vol. 18, pp. 1–43, 2018.
  • [31] M. D. Srinath, P. Rajasekaran, and R. Viswanathan, Introduction to statistical signal processing with applications. Prentice-Hall, Inc., 1995.
  • [32] J. Drgona, K. Kis, A. Tuor, D. Vrabie, and M. Klauco, “Differentiable Predictive Control: Deep Learning Alternative to Explicit Model Predictive Control for Unknown Nonlinear Systems,” arXiv:2011.03699, 2020.
  • [33] A. Agrawal, B. Amos, S. Barratt, S. Boyd, S. Diamond, and J. Zico Kolter, “Differentiable convex optimization layers,” Advances in Neural Information Processing Systems, vol. 32, no. NeurIPS, 2019.