Safe and Efficient Model Predictive Control Using Neural Networks: An Interior Point Approach
Abstract
Model predictive control (MPC) provides a useful means for controlling systems with constraints, but suffers from the computational burden of repeatedly solving an optimization problem in real time. Offline (explicit) solutions for MPC attempt to alleviate real-time computational challenges using either multiparametric programming or machine learning. The multiparametric approaches are typically applied to linear or quadratic MPC problems, while learning-based approaches can be more flexible and are less memory-intensive. Existing learning-based approaches offer significant speedups, but the challenge is ensuring constraint satisfaction while maintaining good performance. In this paper, we provide a neural network parameterization of MPC policies that explicitly encodes the constraints of the problem. By exploring the interior of the MPC feasible set in an unsupervised learning paradigm, the neural network finds better policies faster than projection-based methods and exhibits substantially shorter solve times. We use the proposed policy to solve a robust MPC problem, and demonstrate the performance and computational gains on a standard test system.
I Introduction
Model predictive control (MPC) [1] is a powerful technique for controlling systems that are subject to state and input constraints, such as agricultural [2], automotive [3], and energy systems [4]. However, many applications require fast decision-making which may preclude the possibility of repeatedly solving an optimization problem online [5].
A popular approach for accelerating MPC is to move as much computation offline as possible [5, 6]. These techniques, known as explicit MPC, involve precomputing the solution to the MPC problem over a range of parameters or initial conditions. Most of the research effort has focused on problems with linear dynamics and constraints, and linear or quadratic cost functions. In this case, the explicit MPC solution is a piecewise affine (PWA) function defined over a polyhedral partition of the state space. However, many of the applications of interest have cost functions that are not necessarily linear or quadratic, or even convex. Further, the memory required to store the partition and affine functions can be prohibitive even for modestly-sized problems.
In order to reduce the complexity of explicit MPC, the optimal offline solution can be approximated. Approximations generally fall into two categories: partition-based solutions [7, 8, 9] that generate piecewise control laws over coarser state space partitions, and learning-based solutions [10, 11, 12, 13] that use function approximation to compactly represent the optimal MPC policy. In this paper, we focus on the latter with the key contribution of ensuring constraint satisfaction while exploring all feasible policies.
Constraint satisfaction is crucial in many engineering applications, and the ability of MPC to enforce constraints is a major factor in its popularity. However, it is not straightforward to guarantee that a learning-based solution will satisfy constraints. The main challenge arises from the fact that while neural networks can easily limit their outputs to simple regions, there is no obvious way to enforce satisfaction of more complex constraints at the output. In [10, 14], supervised and unsupervised learning were used to approximate the solution of MPCs, but did not provide any feasibility guarantees. By contrast, [11] trains an NN using a policy gradient approach, and guarantees feasibility by projecting the NN output into the feasible action set. However, this extra optimization step slows down online implementation, making it difficult to use in applications that require high-frequency solutions [15]. Supervised learning approaches that provide safety guarantees [13, 12] rely on a choice of MPC oracle that is not obvious when persistent disturbances are present.
In this paper, we propose an NN architecture for approximating explicit solutions to finite-horizon MPC problems with linear dynamics, linear constraints, and arbitrary differentiable cost functions. The proposed architecture guarantees constraint satisfaction without relying on projections or MPC oracles. By exploring the interior of the feasible set, we demonstrate faster training and evaluation, and comparable closed-loop performance relative to other NN architectures.
The proposed approach has parallels in interior point methods for convex optimization [16]. Interior point methods first solve a Phase I problem to find a strictly feasible starting point. This solution is used to initialize the Phase II algorithm for optimizing the original problem. Our approach accelerates both phases. The Phase I solution is given by a simple function (e.g., affine map) and the Phase II problem is solved using an NN architecture that can encode arbitrary polytopic constraints (Fig. 1).
The Phase II solution builds on a technique first proposed in [17], which uses a gauge map to establish equivalence between compact, convex sets. Relative to [17], the current work has three novel aspects. First, the reinforcement learning (RL) algorithm in [17] only uses information about the constraints, and does not use information about the cost function or dynamics. The resulting policy is safe, but can exhibit suboptimal performance. The MPC formulation in the current paper gives rise to a training algorithm that can exploit knowledge about the system, improving performance. Second, the MPC formulation permits explicit consideration of future time steps. The RL formulation cannot optimize entire trajectories due to the presence of constraints. This inability to "look ahead" again limits the performance of the RL algorithm. Finally, the previous work required a Phase I solution that used a linear feedback to find a strictly feasible point. A linear feedback, however, may not exist for some problems. The current work proposes a more general class of Phase I solutions (piecewise affine), while providing a way to manage the complexity of the Phase I solution.

We demonstrate the effectiveness of the proposed technique on a 3-state test system, and compare to standard projection- and penalty-based approaches for learning with constraints. The results show that the proposed technique achieves Pareto efficiency in terms of closed-loop performance and online computation effort. All code is available at github.com/dtabas/gauge_networks.
Notation: The $\infty$-norm ball $\{x \in \mathbb{R}^n : \|x\|_\infty \leq 1\}$ is denoted $\mathcal{B}_\infty$. A polytope is the (bounded) intersection of a finite number of halfspaces. Scaling of a polytope $\mathcal{P}$ by a factor $\alpha \geq 0$ is defined as $\alpha\mathcal{P} = \{\alpha x : x \in \mathcal{P}\}$. Given a matrix $F$ and a vector $g$, the $i$th row of $F$ is denoted $F_i$ and the $i$th component of $g$ is $g_i$. The interior of a set $\mathcal{A}$ is denoted $\operatorname{int} \mathcal{A}$. The value of a variable $x$ at time interval $t$ is denoted $x(t)$. A state or control trajectory of length $T$ is written as the boldface vector $\mathbf{x} = (x(0), \ldots, x(T-1))$ or $\mathbf{u} = (u(0), \ldots, u(T-1))$. The column vector of all ones is $\mathbf{1}$. The symbol $\circ$ denotes function composition.
II Problem Formulation
In this paper, we consider the problem of regulating discrete-time dynamical systems of the form

$x(t+1) = Ax(t) + Bu(t) + w(t)$    (1)
where $x(t) \in \mathbb{R}^n$ is the system state at time $t$, $u(t) \in \mathbb{R}^m$ is the control input, and $w(t) \in \mathbb{R}^n$ is an uncertain input that captures exogenous disturbances and/or linearization error (if the true system dynamics are nonlinear) [18]. We assume the pair $(A, B)$ is stabilizable. The input constraints (actuation limits) are $u(t) \in \mathcal{U}$, while the state constraints arising from safety-critical engineering considerations are $x(t) \in \mathcal{X}$.
We consider the problem of operating the system (1) using finite-horizon model predictive control. The goal is to choose, given initial condition $x(0)$, a sequence of inputs $\mathbf{u}$ of length $T$ that minimizes the cost of operating the system while respecting the operational constraints.
However, since the disturbances are unknown ahead of time, the designer must carefully consider how to achieve both optimality and constraint satisfaction. Robust MPC literature contains many ways to handle the presence of disturbances in both the cost and constraints [19]. For example, the certainty-equivalent approach [5] considers only the nominal system trajectory, while the min-max approach [9] considers the worst-case disturbance. Interpolating between these two extremes, the tube-based approach [20] considers the cost of a nominal trajectory while guaranteeing that the true trajectory satisfies constraints. A stochastic point of view in [21] considers the disturbance as a random variable and minimizes the expected cost while providing probabilistic guarantees for constraint satisfaction.
In most robust MPC formulations, the set of possible disturbances is modeled as either a finite set, a bounded set, or a probability distribution [22]. In this paper, we assume the disturbances lie in a closed and bounded set $\mathcal{W}$. In order to ensure constraint satisfaction, we operate the system within a robust control invariant set (RCI) $\mathcal{S} \subseteq \mathcal{X}$, defined as a set of initial conditions for which there exists a feedback policy taking values in $\mathcal{U}$ that keeps all system trajectories in $\mathcal{S}$, under any disturbance sequence in $\mathcal{W}$ [23]. In our simulations, we used approximately-maximal RCIs computed with the semidefinite program from [24].
With $\mathcal{S} = \{x : Fx \leq g\}$, we define the target set as $\bar{\mathcal{S}} = \{x : Fx \leq \bar{g}\}$, where for each row $i$, $\bar{g}_i = g_i - \max_{w \in \mathcal{W}} F_i w$ [23]. Any policy that maps $\mathcal{S}$ to $\bar{\mathcal{S}}$ under the nominal dynamics will map $\mathcal{S}$ to itself under the true dynamics, rendering $\mathcal{S}$ robustly invariant. By constraining the nominal state to the target set, robust constraint satisfaction is guaranteed for the first time step. Since $\mathcal{S}$ is RCI, this is sufficient for keeping closed-loop trajectories inside $\mathcal{S}$. Under this formulation, the MPC problem is posed as follows, given initial state $x(0) \in \mathcal{S}$:
$\min_{\mathbf{u}} \;\; c_T(\hat{x}(T)) + \sum_{t=0}^{T-1} c(\hat{x}(t), u(t))$    (2a)
s.t. $\hat{x}(t+1) = A\hat{x}(t) + Bu(t), \quad \hat{x}(0) = x(0)$    (2b)
$u(t) \in \mathcal{U}, \quad t = 0, \ldots, T-1$    (2c)
$\hat{x}(t) \in \bar{\mathcal{S}}, \quad t = 1, \ldots, T$    (2d)
where $c$ and $c_T$ are stage and terminal costs that are differentiable but possibly nonlinear or even non-convex. Although (2) differs from the standard tube-based approach, the techniques introduced in this paper can be applied to a variety of MPC formulations.
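The tightening that defines $\bar{\mathcal{S}}$ is simple to compute when $\mathcal{W}$ is a polytope, since the row-wise maxima are attained at vertices. A minimal sketch, assuming $\mathcal{W}$ is supplied as a vertex list (`W_vertices`, one vertex per row); this representation is an implementation assumption, not part of the formulation:

```python
import numpy as np

def target_set(F, g, W_vertices):
    """Tightened right-hand side g_bar of the target set
    S_bar = {x : F x <= g_bar}, with g_bar_i = g_i - max_{w in W} F_i w.
    For polytopic W the inner maximum is attained at a vertex, so it
    suffices to enumerate the rows of W_vertices."""
    return g - np.max(F @ W_vertices.T, axis=1)
```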
In this paper, we seek to derive a safe feedback policy that approximates the explicit solution to (2) by first approximating the optimal control sequence with a function $\mathbf{u} = \pi_\theta(x(0))$ and then implementing the first action of the sequence in the closed loop. In practice, any MPC policy implemented in closed loop must be stabilizing and recursively feasible. Recursive feasibility is the property that closed-loop trajectories generated by the MPC controller will not lead to states in which the MPC problem is infeasible. This property is guaranteed when $\mathcal{S}$ is RCI [23]. If recursive feasibility is not guaranteed, then a backup controller must be developed, or a control sequence that is feasible for the most immediate time steps can be used. There is evidence in the literature that the latter approach performs quite well in practice [25], but the theoretical aspects remain open. In terms of stability, recursive feasibility guarantees that trajectories will remain within a bounded set. Since this work focuses on constraint satisfaction, we do not consider stricter notions of stability.
III Phase I: Finding a Feasible Point
The feasible set of (2) is a polytope $\Omega_T(x(0))$, defined by the following inequalities in $\mathbf{u}$:
$\mathbf{F}(\mathbf{A}x(0) + \mathbf{B}\mathbf{u}) \leq \bar{\mathbf{g}}$    (3a)
$\mathbf{H}\mathbf{u} \leq \mathbf{h}$    (3b)
where $\mathbf{A}$, $\mathbf{B}$, $\mathbf{F}$, $\mathbf{H}$ and $\bar{\mathbf{g}}$, $\mathbf{h}$ are block matrices and vectors derived from the system dynamics and constraints: (3a) collects the state constraints $\hat{x}(t) \in \bar{\mathcal{S}}$ and (3b) collects the input constraints $u(t) \in \mathcal{U}$. In this paper, we assume that $\Omega_T(x)$ has nonempty interior for all $x \in \mathcal{S}$. Since the state constraints form an RCI, $\Omega_T(x)$ is already guaranteed to be nonempty, and the assumption of nonempty interior is only marginally more restrictive.
The gauge map technique introduced in [17] provides a way to constrain the outputs of a neural network to $\Omega_T(x)$ without a projection or penalty function, but $\Omega_T(x)$ must contain the origin in its interior. If this is not the case, then we must temporarily "shift" $\Omega_T(x)$ by subtracting any one of its interior points. In this section, we discuss several ways to reduce the complexity of finding an interior point.
We begin by considering the feasibility problem for the one-step safe action set defined as $\Omega(x) = \{u \in \mathcal{U} : Ax + Bu \in \bar{\mathcal{S}}\}$, which is guaranteed to have an interior point by the assumption on $\Omega_T(x)$. Writing $\mathcal{U} = \{u : Hu \leq h\}$, one way to find an interior point of $\Omega(x)$ is to minimize the maximum constraint violation:
$\min_{u, s} \;\; s$    (4a)
s.t. $Hu \leq h + \mathbf{1}s$    (4b)
$F(Ax + Bu) \leq \bar{g} + \mathbf{1}s$    (4c)
which has an optimal cost $s^\star \leq 0$ if $\Omega(x)$ is nonempty, and $s^\star < 0$ if $\Omega(x)$ has nonempty interior [16]. To avoid solving a linear program online during closed-loop implementation, the solution to (4) can be stored as a piecewise affine (PWA) function of $x$ [7]. Although solutions to multiparametric LPs can be demanding on computer memory, we take advantage of the fact that feasibility problems have low accuracy requirements: any suboptimal solution to (4) that achieves a cost $s < 0$ for all $x \in \mathcal{S}$ is acceptable.
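For a fixed $x$, (4) is a small linear program. A cvxpy sketch, using the halfspace data assumed above; the function and argument names are illustrative:

```python
import cvxpy as cp

def phase_one_lp(x, A, B, H, h, F, g_bar):
    """Phase I LP (4): minimize the maximum constraint violation s.
    Returns an interior point of Omega(x) whenever the optimal s < 0."""
    u = cp.Variable(B.shape[1])
    s = cp.Variable()
    constraints = [H @ u <= h + s,                    # (4b): input constraints
                   F @ (A @ x + B @ u) <= g_bar + s]  # (4c): next state in S_bar
    cp.Problem(cp.Minimize(s), constraints).solve()
    return u.value, s.value
```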
Definition 1: A function $\bar{u} : \mathcal{S} \to \mathbb{R}^m$ is said to solve (4) if, for every $x \in \mathcal{S}$, the pair $(\bar{u}(x), s)$ is feasible for (4) for some $s < 0$; equivalently, $\bar{u}(x) \in \operatorname{int} \Omega(x)$ for all $x \in \mathcal{S}$.
Existing techniques for approximate multiparametric linear programming [26], especially those that generate continuous solutions [27], can be used to reduce the memory requirements of offline solutions to (4).
To show just how far one can go with reducing complexity, we will construct an affine (rather than PWA) function that solves (4) for the system studied in Section V. Let $\bar{u}(x) = Kx + c$. If $K$ and $c$ satisfy
$H(Kx + c) < h$    (5a)
$F\big((A + BK)x + Bc\big) < \bar{g}$    (5b)
for all $x \in \mathcal{S}$, then $\bar{u}$ solves (4). The following optimization problem can be solved to find $K$ and $c$, or to certify that none exists. Let $\mathcal{P}(K, c, s) = \{x : H(Kx + c) \leq h + \mathbf{1}s, \; F((A + BK)x + Bc) \leq \bar{g} + \mathbf{1}s\}$. If the optimal cost of
$\min_{K, c, s} \;\; s \quad \text{s.t.} \quad \mathcal{S} \subseteq \mathcal{P}(K, c, s)$    (6)
is negative, then (5) holds for all $x \in \mathcal{S}$, and thus $\bar{u}(x) = Kx + c$ solves (4). This happens to be the case for the example in Section V, taken from [6]. The constraint in (6) is a polytope containment constraint in halfspace representation, so (6) can be solved as a linear program [28].
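A sketch of (6) in cvxpy, encoding $\mathcal{S} \subseteq \mathcal{P}(K, c, s)$ via the standard duality-based certificate for halfspace-representation containment [28]: $\{x : F_S x \leq g_S\} \subseteq \{x : Cx \leq d\}$ holds iff there exists an elementwise-nonnegative $\Lambda$ with $\Lambda F_S = C$ and $\Lambda g_S \leq d$. Names follow the reconstruction above and are illustrative:

```python
import cvxpy as cp

def affine_phase_one(A, B, H, h, F, g_bar, Fs, gs):
    """Solve (6): find K, c such that u_bar(x) = K x + c is strictly
    feasible for all x in S = {x : Fs x <= gs}. A cost s < 0 certifies (5)."""
    n, m = A.shape[0], B.shape[1]
    K, c, s = cp.Variable((m, n)), cp.Variable(m), cp.Variable()
    # Rows of P(K, c, s) written as C x <= d (both affine in K, c, s).
    C = cp.vstack([H @ K, F @ (A + B @ K)])
    d = cp.hstack([h + s - H @ c, g_bar + s - F @ (B @ c)])
    # Farkas-type containment certificate: Lam >= 0, Lam Fs = C, Lam gs <= d.
    Lam = cp.Variable((C.shape[0], Fs.shape[0]), nonneg=True)
    prob = cp.Problem(cp.Minimize(s), [Lam @ Fs == C, Lam @ gs <= d])
    prob.solve()
    return K.value, c.value, s.value
```

All constraints are affine in $(K, c, s, \Lambda)$, so the problem is indeed an LP.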
Now consider the feasibility problem for $\Omega_T(x(0))$, which is obtained by replacing (4b) and (4c) with (3a) and (3b) and changing the optimization variable from $u$ to $\mathbf{u}$. One would naturally expect the complexity of the PWA solution to this feasibility problem to increase rapidly with the time horizon $T$, as more decision variables and constraints are added. However, the next proposition shows that the cardinality of the stored partition can be made constant in $T$.
Proposition 1 (Phase I solution)
If $\bar{u}$ solves (4), then the vector $\bar{\mathbf{u}} = (\bar{u}_0, \ldots, \bar{u}_{T-1})$, where $\bar{u}_t = \bar{u}(\hat{x}(t))$ with $\hat{x}(t+1) = A\hat{x}(t) + B\bar{u}_t$ and $\hat{x}(0) = x(0)$, is an interior point of $\Omega_T(x(0))$ for any $x(0) \in \mathcal{S}$.
Proof:
If $\bar{u}$ solves (4), then $\bar{u}(x) \in \operatorname{int} \Omega(x)$ for all $x \in \mathcal{S}$. Applying the definition of $\Omega(x)$ in an inductive argument, it is straightforward to show that the nominal state trajectory associated with $\bar{\mathbf{u}}$ is entirely contained in $\mathcal{S}$. Fix any such trajectory originating from $x(0) \in \mathcal{S}$ under the policy $\bar{u}$. For any $t$, $\hat{x}(t) \in \mathcal{S}$ implies $\bar{u}_t \in \operatorname{int} \Omega(\hat{x}(t))$, which implies $\bar{u}_t \in \operatorname{int} \mathcal{U}$ and $A\hat{x}(t) + B\bar{u}_t \in \operatorname{int} \bar{\mathcal{S}}$. Since this holds for all $t$, the constraints defining $\Omega_T(x(0))$ hold strictly at $\bar{\mathbf{u}}$. ∎
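Proposition 1 reduces the Phase I computation for $\Omega_T(x(0))$ to a nominal rollout of the one-step solution. A sketch, with `u_bar` any function solving (4):

```python
import numpy as np

def phase_one_rollout(A, B, u_bar, x0, T):
    """Stack u_bar along a nominal trajectory to obtain an interior point
    of Omega_T(x0), per Proposition 1."""
    x, u_seq = np.asarray(x0, dtype=float), []
    for _ in range(T):
        u = u_bar(x)           # interior point of the one-step set Omega(x)
        u_seq.append(u)
        x = A @ x + B @ u      # nominal (disturbance-free) dynamics
    return np.concatenate(u_seq)
```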
In our simulations on the example from [6], (6) was feasible with negative optimal cost, meaning that a polyhedral partition of the state space was not needed (see Section V). This suggests that the minimum number of regions in a polyhedral state-space partition associated with a PWA solution to (4) is in general very small relative to the number of regions in an explicit solution to (2).
IV Phase II: Optimizing Performance
In this section, we construct a class of policies mapping $\mathcal{S}$ into $\Omega_T(\cdot)$ that can be trained using standard machine learning packages. Although it is difficult to constrain the output of a neural network to an arbitrary polytope such as $\Omega_T(x)$, it is easy to constrain the output to the hypercube $\mathcal{B}_\infty = \{v \in \mathbb{R}^{mT} : \|v\|_\infty \leq 1\}$ by applying a clamping function elementwise in the output layer. We apply a mapping between polytopes that is closed-form, differentiable, and bijective. This mapping establishes an equivalence between $\mathcal{B}_\infty$ and $\Omega_T(x)$, allowing one to constrain the outputs of the policy to $\Omega_T(x)$. The mapping from $\mathcal{B}_\infty$ to $\Omega_T(x)$ is called the gauge map. The concept is illustrated in Figure 2.

We begin constructing the gauge map by introducing some preliminary concepts. A C-set is a convex, compact set that contains the origin as an interior point. The gauge function with respect to a C-set $\mathcal{P}$, denoted $\gamma_{\mathcal{P}}$, is the function whose sublevel sets are scaled versions of $\mathcal{P}$. Specifically, the gauge of a vector $v$ with respect to $\mathcal{P}$ is given by $\gamma_{\mathcal{P}}(v) = \min\{\lambda \geq 0 : v \in \lambda\mathcal{P}\}$. If $\mathcal{P}$ is a polytopic C-set given by $\mathcal{P} = \{v : Gv \leq g\}$, then $\gamma_{\mathcal{P}}(v)$ is the pointwise maximum over a finite set of affine functions [17]:
$\gamma_{\mathcal{P}}(v) = \max_i \; \frac{G_i v}{g_i}$    (7)
Given two C-sets $\mathcal{B}$ and $\mathcal{P}$, the gauge map $\Phi_{\mathcal{B} \to \mathcal{P}} : \mathcal{B} \to \mathcal{P}$ is
$\Phi_{\mathcal{B} \to \mathcal{P}}(v) = \frac{\gamma_{\mathcal{B}}(v)}{\gamma_{\mathcal{P}}(v)} \, v$    (8)
This function maps level sets of $\gamma_{\mathcal{B}}$ to level sets of $\gamma_{\mathcal{P}}$.
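Equations (7) and (8) amount to a few tensor operations, which is what makes the gauge map cheap to evaluate inside a network. A PyTorch sketch, assuming batched inputs and a polytopic C-set $\{z : Gz \leq g\}$ with $g > 0$; the small constant guarding the division at $v = 0$ is an implementation detail, not part of (8):

```python
import torch

def gauge(v, G, g):
    """Gauge function (7) of v w.r.t. the C-set {z : G z <= g}, g > 0:
    the pointwise maximum of G_i v / g_i over the rows i."""
    return ((v @ G.T) / g).max(dim=-1).values

def gauge_map(v, G, g, eps=1e-9):
    """Gauge map (8) from the infinity-norm ball onto {z : G z <= g}.
    The gauge of the infinity-norm ball is simply the infinity norm."""
    gamma_box = v.abs().max(dim=-1).values
    gamma_P = gauge(v, G, g).clamp_min(eps)   # avoid 0/0 at v = 0
    return (gamma_box / gamma_P).unsqueeze(-1) * v
```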
Proposition 2
Given two polytopic C-sets $\mathcal{B}$ and $\mathcal{P}$, the gauge map $\Phi_{\mathcal{B} \to \mathcal{P}}$ is subdifferentiable and bijective. Further, given a function $\bar{\mathbf{u}}$ from Proposition 1, the set $\Omega_T(x) - \bar{\mathbf{u}}(x)$ is a C-set for all $x \in \mathcal{S}$.
Proof:
The properties of subdifferentiability and bijectivity come from [17]. For the C-set property, fix $x \in \mathcal{S}$. Since $\mathcal{U}$ and $\bar{\mathcal{S}}$ are convex and compact, so is $\Omega_T(x)$. Since $\bar{\mathbf{u}}(x)$ is an interior point of $\Omega_T(x)$, the set $\Omega_T(x) - \bar{\mathbf{u}}(x)$ contains the origin as an interior point and is therefore a C-set. ∎
We now use the gauge map in conjunction with the Phase I solution to construct a neural network whose output is confined to $\Omega_T(x)$. Let $\psi_\theta : \mathcal{S} \to \mathcal{B}_\infty$ be a neural network parameterized by $\theta$. A safe policy $\pi_\theta$ is constructed by composing the gauge map with $\psi_\theta$, then adding $\bar{\mathbf{u}}(x)$ to map the solution into $\Omega_T(x)$:

$\pi_\theta(x) = \Phi_{\mathcal{B}_\infty \to \Omega_T(x) - \bar{\mathbf{u}}(x)} \circ \psi_\theta(x) + \bar{\mathbf{u}}(x)$    (9)
Computing the gauge map online simply requires evaluating $\mathbf{A}x(0)$ from (3a) as well as the operations in (7).
The function $\pi_\theta$ has several important properties for approximating the optimal solution to (2). First, it leverages the universal function approximation properties of neural networks [29] along with the bijectivity of the gauge map (Proposition 2) to explore all interior points of $\Omega_T(x)$. This is an advantage over projection-based methods [11], which may be biased towards the boundary of $\Omega_T(x)$ even when the optimal solution lies in the interior. Second, $\pi_\theta$ is evaluated in closed form, and its outputs are constrained to $\Omega_T(x)$ without the use of an optimization layer [14] that may have high computational overhead. Finally, the subdifferentiability of the gauge map (Proposition 2) enables selection of the parameter $\theta$ using standard automatic differentiation techniques.
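A sketch of (9) as a PyTorch module, reusing `gauge_map` from the previous sketch. The halfspace data $(G, \tilde{g})$ of the shifted set $\Omega_T(x) - \bar{\mathbf{u}}(x)$ and the Phase I point are passed in as arguments, since how they are assembled from (3) depends on the implementation:

```python
import torch
import torch.nn as nn

class GaugePolicy(nn.Module):
    """Safe policy (9): squash an MLP into the unit box with tanh, push it
    through the gauge map onto Omega_T(x) - u_bar(x), then shift by u_bar(x)."""
    def __init__(self, n, mT, width):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, mT), nn.Tanh(),   # output inside B_inf
        )

    def forward(self, x, G, g_shifted, u_bar):
        # {w : G w <= g_shifted} is the C-set Omega_T(x) - u_bar(x) (Prop. 2)
        v = self.net(x)
        return gauge_map(v, G, g_shifted) + u_bar
```

Note that `tanh` keeps the raw output strictly inside $\mathcal{B}_\infty$, so the policy output stays in the interior of $\Omega_T(x)$, consistent with the interior-point viewpoint.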
Optimizing the parameter $\theta$
Similar to the approach taken in [10], we optimize $\theta$ by sampling initial conditions and applying stochastic gradient descent. At each iteration, a new batch of $N$ initial conditions $\{x^{(i)}(0)\}_{i=1}^{N}$ is sampled from $\mathcal{S}$ and the loss is computed as

$\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left[ c_T(\hat{x}^{(i)}(T)) + \sum_{t=0}^{T-1} c(\hat{x}^{(i)}(t), u^{(i)}(t)) \right]$    (10)

with the control sequences given by $\mathbf{u}^{(i)} = \pi_\theta(x^{(i)}(0))$ and the state trajectories generated according to the nominal dynamics. The parameters are updated in the direction of $-\nabla_\theta \mathcal{L}$, which is easily computed using automatic differentiation [30].
V Simulations
V-A Test systems
We simulate the proposed policy using a modified example from [6] with $n = 3$ states. The system matrices, constraints, costs, and Phase I solution (found using (6)) are given below:

(11)

(12)

(13)

where the quadratic cost weights are positive constants. Although quadratic costs are used in the simulations, the proposed method can work with any differentiable cost.
We evaluate the performance of a given policy in both open- and closed-loop experiments. In the open-loop experiments, we evaluate the MPC cost (2a) and compare it to the optimal cost. The fraction suboptimality is

$\eta = \frac{\bar{J} - J^\star}{J^\star}$    (14)

where $\bar{J}$ is the average cost (2a) incurred by the control sequence $\pi_\theta(x(0))$ on a validation set and $J^\star$ is the corresponding optimal cost.
In the closed-loop experiments, we evaluate the performance of a policy $\mu_\theta$ which is derived from $\pi_\theta$ by taking the first action in the sequence. We simulate (1) for $T_{\mathrm{sim}}$ time steps. The trajectory cost in the closed-loop experiments is computed as the stage cost accumulated along the realized trajectory, and the disturbance is modeled as an autoregressive sequence [31] $w(t+1) = \alpha w(t) + \delta(t)$, where $\alpha \in [0, 1)$ and $\delta(t)$ is drawn uniformly over a bounded set chosen so that $w(t) \in \mathcal{W}$ for all $t$.
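A sketch of one closed-loop rollout under this disturbance model; `mu` (the first-action policy), `stage_cost`, and `sample_delta` are assumed helpers:

```python
import numpy as np

def closed_loop_cost(A, B, mu, x0, T_sim, alpha, sample_delta, stage_cost):
    """Simulate (1) for T_sim steps under policy mu and an AR(1) disturbance
    w(t+1) = alpha * w(t) + delta(t), accumulating the stage cost."""
    x, w, cost = np.asarray(x0, dtype=float), np.zeros(A.shape[0]), 0.0
    for _ in range(T_sim):
        u = mu(x)                         # first action of the planned sequence
        cost += stage_cost(x, u)
        x = A @ x + B @ u + w             # true (disturbed) dynamics
        w = alpha * w + sample_delta()    # AR(1) disturbance update
    return cost
```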
V-B Benchmarks
We compare the proposed method to two of the most common approaches for learning a solution to (2). The first benchmark is a penalty-based approach [32] which enforces the constraints (2c) and (2d) by augmenting the cost (10) with a linear penalty term on constraint violations given by $\lambda \mathbf{1}^\top \max\{0, \; \mathbf{F}(\mathbf{A}x(0) + \mathbf{B}\mathbf{u}) - \bar{\mathbf{g}}\}$, where the $\max$ is evaluated elementwise and $\lambda > 0$. Since the penalty-based approach does not encode state constraints in the policy, the policy is constrained to the Cartesian product $\mathcal{U}^T$ using scaled $\tanh$ functions elementwise.
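For concreteness, the penalty term can be computed as below (a sketch; `F_blk` and `g_blk` denote the stacked state-constraint data from (3a) applied to the nominal trajectory, and `lam` is the penalty weight):

```python
import torch

def violation_penalty(x_traj, F_blk, g_blk, lam):
    """Linear penalty lam * 1^T max{0, F x_hat - g_bar} on state-constraint
    violations, evaluated elementwise and summed over the horizon."""
    residual = x_traj @ F_blk.T - g_blk     # positive entries are violations
    return lam * torch.relu(residual).sum(dim=-1)
```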
The second benchmark is a projection-based approach [11] which constrains the policy to the set $\Omega_T(x)$ by solving a convex quadratic program in the output layer of a neural network [33]. The optimization layer returns the Euclidean projection of the network's raw output onto $\Omega_T(x)$.
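This layer can be built with cvxpylayers [33]. A sketch with the constraint right-hand side passed as a parameter so that it can depend on $x(0)$:

```python
import cvxpy as cp
from cvxpylayers.torch import CvxpyLayer

def make_projection_layer(G):
    """Differentiable Euclidean projection onto {u : G u <= g}, where the
    raw network output v and the right-hand side g are layer inputs [33]."""
    p, d = G.shape
    v, g = cp.Parameter(d), cp.Parameter(p)
    u = cp.Variable(d)
    problem = cp.Problem(cp.Minimize(cp.sum_squares(u - v)), [G @ u <= g])
    return CvxpyLayer(problem, parameters=[v, g], variables=[u])
```

A call `(u,) = layer(v_batch, g_batch)` then returns projections that are differentiable with respect to both inputs, at the cost of solving a QP per forward pass.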
Another class of approaches to learning-based MPC seeks to learn the optimal solution to (2) using regression [12, 13, 14]. Specifically, data-label pairs are generated by sampling initial conditions from $\mathcal{S}$, solving (2) for each sample, and extracting the optimal control sequence. Then, a neural network or other function approximator is trained to learn the relationship between the initial condition and the optimal solution. Performance and constraint satisfaction are handled, e.g., by bounding the approximation error with respect to the MPC oracle. We do not compare against this type of approach since it requires a large number of labeled training samples, making it difficult to compare with our approach and the other unsupervised benchmarks.
V-C Neural network design
The neural networks were designed with $n$ inputs, $mT$ outputs, and two hidden layers with rectified linear unit (ReLU) activation functions. The width of the networks was chosen during hyperparameter tuning. In particular, we performed 30 iterations of random search over the width of the network (number of neurons per hidden layer), the batch size (number of initial conditions per gradient step), and the learning rate (LR, the step size for gradient descent). For each set of hyperparameters under consideration, we computed the validation score using (14). The hyperparameters after tuning are reported in Table I.
Type | Width | Batch size | LR
---|---|---|---
Gauge | | |
Penalty | 133 | |
Projection | | |
V-D Simulation results
Here we compare our proposed approach (Gauge NN), the penalty-based approach (Penalty NN), the projection-based approach (Projection NN), and the "ground truth" obtained by solving (2) online in cvxpy. The results of the open-loop experiments are shown in Table II, with performance computed relative to the optimal MPC solution using (14). The proposed Gauge NN achieves lower cost than the projection-based method and has much lower computational complexity (its solve time is only about 6% of the projection-based method's). Table II only compares the NNs with safety guarantees, because constraint violations are not accounted for in (14).
Type | Suboptimality (14) | Solve time (sec)
---|---|---
Gauge | 0.007 | 0.0015
Projection | 0.010 | 0.024
Figure 3 shows the training curves for each type of network. The lower training cost achieved by the Gauge NN illustrates that it can be more efficient to explore the interior of the feasible set than the boundary. Since the MPC cost in the simulations is strictly convex, solutions with lower cost are closer to the optimal solution.

Figure 4 compares the policies in terms of computation time and test performance. The box-and-whisker plots indicate the range of performance over 100 test trajectories of length $T_{\mathrm{sim}}$, while the vertical position of each box indicates the average time to compute a control action. Of the policies with safety guarantees (Gauge NN, Projection NN, and online MPC), the Gauge NN achieves Pareto efficiency in terms of average solve time and median trajectory cost. Our intuition for the strong performance of the neural networks is that (2) is itself a heuristic for the closed-loop control problem, and the unsupervised learning approach can lead to better closed-loop policies.

VI Conclusion
In this paper, we provided an efficient way of exploring the interior of the MPC feasible set for learning-based approximate explicit MPC, and demonstrated the performance and computational gains that can be achieved by approaching the problem from the interior. The paradigm relies on a Phase I solution that exploits the structure of the MPC problem and a Phase II solution that features a projection-free feasibility guarantee. The results compare favorably against common approaches that use unsupervised learning, as well as against the oracle itself used in supervised approaches. Future work includes applications to MPC problems with convex but non-polytopic constraint sets, and to distributed settings.
References
- [1] J. B. Rawlings, D. Q. Mayne, and M. Diehl, Model Predictive Control: Theory and Design. Santa Barbara, CA: Nob Hill, 2nd ed., 2019.
- [2] Y. Ding, L. Wang, Y. Li, and D. Li, “Model predictive control and its application in agriculture: A review,” Computers and Electronics in Agriculture, vol. 151, pp. 104–117, 2018.
- [3] D. Hrovat, S. Di Cairano, H. E. Tseng, and I. V. Kolmanovsky, “The development of Model Predictive Control in automotive industry: A survey,” Proc. IEEE Int. Conf. on Control Applications, pp. 295–302, 2012.
- [4] A. Ademola-Idowu and B. Zhang, “Frequency Stability Using MPC-Based Inverter Power Control in Low-Inertia Power Systems,” IEEE Trans. Power Syst., vol. 36, no. 2, pp. 1628–1637, 2021.
- [5] A. Alessio and A. Bemporad, “A Survey on Explicit Model Predictive Control,” in Nonlinear Model Predictive Control (L. Magni, D. Raimondo, and F. Allgöwer, eds.), Berlin, Heidelberg: Springer, 2009.
- [6] M. N. Zeilinger, C. N. Jones, and M. Morari, “Real-time suboptimal model predictive control using a combination of explicit MPC and online optimization,” IEEE Trans. Autom. Control, vol. 56, no. 7, pp. 1524–1534, 2011.
- [7] C. N. Jones, M. Barić, and M. Morari, “Multiparametric linear programming with applications to control,” European Journal of Control, vol. 13, no. 2-3, pp. 152–170, 2007.
- [8] T. A. Johansen, “Approximate explicit receding horizon control of constrained nonlinear systems,” Automatica, vol. 40, no. 2, pp. 293–300, 2004.
- [9] A. Grancharova and T. A. Johansen, “Computation, approximation and stability of explicit feedback min-max nonlinear model predictive control,” Automatica, vol. 45, no. 5, pp. 1134–1143, 2009.
- [10] B. M. Åkesson and H. T. Toivonen, “A neural network model predictive controller,” J. Process Control, vol. 16, no. 9, pp. 937–946, 2006.
- [11] S. Chen, K. Saulnier, N. Atanasov, D. D. Lee, V. Kumar, G. J. Pappas, and M. Morari, “Approximating Explicit Model Predictive Control Using Constrained Neural Networks,” Proc. Am. Control Conf., vol. 2018-June, pp. 1520–1527, 2018.
- [12] T. Parisini and R. Zoppoli, “A Receding-Horizon Regulator for Nonlinear Systems and a Neural Approximation,” Automatica, vol. 31, no. 10, pp. 1443–1451, 1995.
- [13] A. Domahidi, M. N. Zeilinger, M. Morari, and C. N. Jones, “Learning a feasible and stabilizing explicit model predictive control law by robust optimization,” in Proc. IEEE Conf. Decision Control, pp. 513–519, IEEE, 2011.
- [14] E. T. Maddalena, C. G. da Moraes, G. Waltrich, and C. N. Jones, “A neural network architecture to learn explicit MPC controllers from data,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 11362–11367, 2020.
- [15] L. Zheng, Y. Shi, L. J. Ratliff, and B. Zhang, “Safe reinforcement learning of control-affine systems with vertex networks,” in Learning for Dynamics and Control, pp. 336–347, PMLR, 2021.
- [16] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2009.
- [17] D. Tabas and B. Zhang, “Computationally Efficient Safe Reinforcement Learning for Power Systems,” arXiv:2110.10333, 2021.
- [18] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan, Linear Matrix Inequalities in System and Control Theory, vol. 15. Philadelphia: Society for Industrial and Applied Mathematics, 1994.
- [19] A. Bemporad and M. Morari, “Robust model predictive control: A survey,” in Robustness in identification and control, pp. 207–226, London: Springer, 1999.
- [20] W. Langson, I. Chryssochoos, S. V. Raković, and D. Q. Mayne, “Robust model predictive control using tubes,” Automatica, vol. 40, no. 1, pp. 125–133, 2004.
- [21] M. Farina, L. Giulioni, and R. Scattolini, “Stochastic linear Model Predictive Control with chance constraints - A review,” J. Process Control, vol. 44, pp. 53–67, 2016.
- [22] M. B. Saltık, L. Özkan, J. H. Ludlage, S. Weiland, and P. M. Van den Hof, “An outlook on robust model predictive control algorithms: Reflections on performance and computational aspects,” J. Process Control, vol. 61, pp. 77–102, 2018.
- [23] F. Blanchini and S. Miani, Set-theoretic methods in control. Birkhauser, 2015.
- [24] C. Liu and I. M. Jaimoukha, “The computation of full-complexity polytopic robust control invariant sets,” in Proc. 54th IEEE Conf. Decision Control, pp. 6233–6238, 2015.
- [25] Y. Wang and S. Boyd, “Fast model predictive control using online optimization,” IEEE Trans. Control Syst. Technol., vol. 18, no. 2, pp. 267–278, 2010.
- [26] C. Filippi, “An algorithm for approximate multiparametric linear programming,” Journal of Optimization Theory and Applications, vol. 120, no. 1, pp. 73–95, 2004.
- [27] J. Spjøtvold, P. Tøndel, and T. A. Johansen, “A method for obtaining continuous solutions to multiparametric linear programs,” IFAC Proc. Volumes (IFAC-PapersOnline), vol. 38, no. 1, pp. 253–258, 2005.
- [28] S. Sadraddini and R. Tedrake, “Linear Encodings for Polytope Containment Problems,” Proc. IEEE Conf. Decision Control, pp. 4367–4372, 2019.
- [29] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.
- [30] A. Baydin, A. A. Radul, B. A. Pearlmutter, and J. M. Siskind, “Automatic Differentiation in Machine Learning: a Survey,” J. Machine Learning Research, vol. 18, pp. 1–43, 2018.
- [31] M. D. Srinath, P. Rajasekaran, and R. Viswanathan, Introduction to statistical signal processing with applications. Prentice-Hall, Inc., 1995.
- [32] J. Drgona, K. Kis, A. Tuor, D. Vrabie, and M. Klauco, “Differentiable Predictive Control: Deep Learning Alternative to Explicit Model Predictive Control for Unknown Nonlinear Systems,” arXiv:2011.03699, 2020.
- [33] A. Agrawal, B. Amos, S. Barratt, S. Boyd, S. Diamond, and J. Zico Kolter, “Differentiable convex optimization layers,” Advances in Neural Information Processing Systems, vol. 32, no. NeurIPS, 2019.