Regularized Interior Point Methods for Constrained Optimization and Control

Alberto De Marchi, Universität der Bundeswehr München, Department of Aerospace Engineering, Institute of Applied Mathematics and Scientific Computing, Neubiberg/Munich, Germany. Email: [email protected]. ORCID: 0000-0002-3545-6898.
Abstract

Regularization and interior point approaches offer valuable perspectives to address constrained nonlinear optimization problems in view of control applications. This paper discusses the interactions between these techniques and proposes an algorithm that synergistically combines them. By building a sequence of closely related subproblems and approximately solving each of them, this approach inherently exploits warm-starting, early termination, and the possibility of adopting subsolvers tailored to specific problem structures. Moreover, by relaxing the equality constraints with a proximal penalty, the regularized subproblems are feasible and satisfy a strong constraint qualification by construction, allowing the safe use of efficient solvers. We show how regularization benefits the underlying linear algebra, and a detailed convergence analysis indicates that limit points tend to minimize constraint violation and satisfy suitable optimality conditions. Finally, we report on numerical results in terms of robustness, indicating that the combined approach compares favorably against both interior point and augmented Lagrangian codes.

Keywords. Nonlinear programming, optimization-based control, interior point methods, augmented Lagrangian

AMS MSC. 65K05, 90C06, 90C26, 90C30

1 Introduction

Mathematical optimization plays an important role in model-based and data-driven control systems, forming the basis for advanced techniques such as optimal control, nonlinear model predictive control (MPC), and parameter estimation. Significant research effort on computationally efficient real-time optimization algorithms has contributed to the success of MPC over the years, and yet the demand for fast and reliable methods for a broad spectrum of applications keeps growing; see [29, 27] and references therein. To tackle these challenges, it is desirable to have an algorithm that benefits from warm-starting information, can cope with infeasibility, is robust to problem scaling, and exploits the structure of the problem. To reduce computations and increase robustness, a common approach is to relax the requirements on the solutions, in terms of optimality, constraint violation, or both [14, 27]. In this work, we propose to address these requirements by combining proximal regularization and interior point techniques, developing a stabilized, efficient, and robust numerical method. We advocate for this strategy by bringing together and combining a variety of ideas from the nonlinear programming literature.

Let us consider the constrained nonconvex problem

\[
\begin{aligned}
\operatorname*{minimize}_{x}\quad & f(x) \\
\operatorname{subject\ to}\quad & c(x)=0,\qquad x\geq 0,
\end{aligned}
\tag{P}
\]

where the functions $f\colon\mathbb{R}^{n}\to\mathbb{R}$ and $c\colon\mathbb{R}^{n}\to\mathbb{R}^{m}$ are (at least) continuously differentiable. Problems with general inequality constraints on $x$ and $c(x)$ can be reformulated as (P) by introducing auxiliary variables. Nonlinear programming (NLP) problems such as (P) have been extensively studied and there exist several approaches for their numerical solution. Interior point (IP) [31, 32], penalty and augmented Lagrangian (AL) [8, 2, 6], and sequential programming [16] schemes are predominant ideas and have been re-combined in many ways [9, 5, 3].
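As a simple illustration of this standard reformulation, a one-sided inequality constraint can be absorbed into the form (P) by adding a slack variable,
\[
c(x)\leq 0
\quad\longleftrightarrow\quad
c(x)+s=0,\quad s\geq 0,
\]
and, similarly, a free variable can be split into the difference of two nonnegative ones.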

Starting from linear programming, IP methods have had a significant impact on the field of mathematical optimization [18]. By solving a sequence of barrier subproblems, they can efficiently handle inequality constraints and scale well with the problem size. The state-of-the-art solver Ipopt, described by [32], is an emblem of this remarkable success. However, relying on Newton-type schemes for approximately solving the subproblems, IP algorithms may suffer from degeneracy and a lack of constraint qualifications if suitable counter-mechanisms are not implemented. On the contrary, proximal techniques naturally cope with these scenarios thanks to their inherent regularizing action. Widely investigated in the convex setting [26], their favorable properties have been exploited to design (primal-dual) stabilized methods building on the proximal point algorithm [17, 20, 11]. The analysis of their close connection with the AL framework [25] led to the development of techniques applicable to more general problems [22, 10, 24].

The combination of IP and proximal strategies has been successfully leveraged in the context of convex quadratic programming [1, 7] and for linear [13] and nonlinear [28] least-squares problems. With this work we address general NLPs and devise a method for their numerical solution, which can be seen as an extension of a regularized Lagrange–Newton method to handle bound constraints via a barrier function [10], or as a proximally stabilized IP algorithm, generalizing the ideas put forward by [7].^1

^1 Although beyond the scope of this paper, topics such as infeasibility detection [3], ill-posed subproblems [2], and feasibility restoration [32, §3.3] are of great relevance in this context. Appropriate mechanisms should be incorporated in practical implementations, as they can significantly improve performance.

Outline

The paper is organized as follows. In Section 2 we provide and comment on some relevant optimality notions. Section 3 discusses the methodology and details the proposed algorithm, whose convergence properties are investigated in Section 4. We report numerical results on benchmark problems in Section 5 and conclude the paper in Section 6.

Notation

With $\mathbb{N}$, $\mathbb{R}$, and $\overline{\mathbb{R}}:=\mathbb{R}\cup\{\infty\}$ we denote the natural, real, and extended real numbers, respectively. We denote the set of real vectors of dimension $n\in\mathbb{N}$ as $\mathbb{R}^{n}$; a real matrix with $m\in\mathbb{N}$ rows and $n\in\mathbb{N}$ columns as $A\in\mathbb{R}^{m\times n}$, and its transpose as $A^{\top}\in\mathbb{R}^{n\times m}$. For a vector $a\in\mathbb{R}^{n}$, its $i$-th element is $a_{i}$ and its squared Euclidean norm is $\|a\|^{2}=a^{\top}a$. A vector or matrix with all zero elements is represented by $0$. The gradient of a function $f\colon\mathbb{R}^{n}\to\mathbb{R}$ at a point $\bar{x}\in\mathbb{R}^{n}$ is denoted by $\nabla f(\bar{x})\in\mathbb{R}^{n}$; the Jacobian of a vector function $c\colon\mathbb{R}^{n}\to\mathbb{R}^{m}$ by $\nabla c(\bar{x})\in\mathbb{R}^{m\times n}$.

2 Optimality and Stationarity

In this section we introduce some optimality concepts, following [6, Ch. 3] and [12, §2].

Definition 2.1 (Feasibility).

Relative to (P), we say a point $x^{\ast}\in\mathbb{R}^{n}$ is feasible if $x^{\ast}\geq 0$ and $c(x^{\ast})=0$; it is strictly feasible if additionally $x^{\ast}>0$.

Definition 2.2 (Approximate KKT stationarity).

Relative to (P), a point $x^{\ast}\in\mathbb{R}^{n}$ is $\varepsilon$-KKT stationary for some $\varepsilon\geq 0$ if there exist multipliers $y^{\ast}\in\mathbb{R}^{m}$ and $z^{\ast}\in\mathbb{R}^{n}$ such that

\[
\begin{aligned}
\lVert\nabla f(x^{\ast})+\nabla c(x^{\ast})^{\top}y^{\ast}+z^{\ast}\rVert &\leq \varepsilon, && \text{(1a)}\\
\lVert c(x^{\ast})\rVert &\leq \varepsilon, && \text{(1b)}\\
x^{\ast}\geq-\varepsilon,\quad z^{\ast}\leq\varepsilon,\quad \min\{x^{\ast},-z^{\ast}\} &\leq \varepsilon. && \text{(1c)}
\end{aligned}
\]

When $\varepsilon=0$, the point $x^{\ast}$ is said to be KKT stationary.

Notice that (1c) provides a condition modeling the approximate satisfaction of the (elementwise) complementarity condition $\min\{x,-z\}=0$ within some tolerance $\varepsilon\geq 0$. The IP algorithm discussed in Section 3 satisfies a stronger version of these conditions, since the iterates it generates meet the constraints $x\geq 0$ and $z\leq 0$ by construction. Furthermore, we point out that the condition $\min\{x_{i},-z_{i}\}\leq\varepsilon$ is analogous to $-x_{i}z_{i}\leq\varepsilon$, more typical for interior point methods, but does not depend on a specific barrier function, e.g., the logarithmic barrier in [32, §2.1].
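For concreteness, a minimal Julia sketch of the residual check behind Definition 2.2 could read as follows; the callables grad_f, c_fun, and jac_c are placeholders for user-supplied problem data, not part of the paper's implementation.

    using LinearAlgebra

    # Residuals of the ε-KKT conditions (1a)-(1c) at a triplet (x, y, z).
    function kkt_residuals(grad_f, c_fun, jac_c, x, y, z)
        r_stat = norm(grad_f(x) + jac_c(x)' * y + z)   # stationarity (1a)
        r_feas = norm(c_fun(x))                        # feasibility (1b)
        r_comp = max(maximum(-x), maximum(z),
                     maximum(min.(x, -z)))             # complementarity (1c)
        return (r_stat, r_feas, r_comp)
    end

    # A point is ε-KKT stationary when all residuals fall below ε.
    is_eps_kkt(residuals, ε) = all(r -> r <= ε, residuals)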

We shall consider the limiting behavior of approximate KKT stationary points as the tolerance $\varepsilon$ vanishes. In fact, having $x^{k}\to x^{\ast}$ with $x^{k}$ $\varepsilon_{k}$-KKT stationary for (P) and $\varepsilon_{k}\searrow 0$ does not guarantee KKT stationarity of the limit point $x^{\ast}$. This issue raises the need for defining KKT stationarity in an asymptotic sense [6, Def. 3.1].

Definition 2.3 (Asymptotic KKT stationarity).

Relative to (P), a feasible point $x^{\ast}\in\mathbb{R}^{n}$ is AKKT stationary if there exist sequences $\{x^{k}\},\{z^{k}\}\subset\mathbb{R}^{n}$ and $\{y^{k}\}\subset\mathbb{R}^{m}$ such that $x^{k}\to x^{\ast}$ and

\[
\begin{aligned}
\nabla f(x^{k})+\nabla c(x^{k})^{\top}y^{k}+z^{k} &\to 0, && \text{(2a)}\\
\min\{x^{k},-z^{k}\} &\to 0. && \text{(2b)}
\end{aligned}
\]

Any local minimizer $x^{\ast}$ for (P) is AKKT stationary, independently of constraint qualifications [6, Thm 3.1].

3 Approach and Algorithm

The methodology presented in this section builds upon the AL framework, interpreted as a proximal point scheme in the nonconvex regime, and IP methods. The basic idea is to construct a sequence of proximally regularized subproblems and to approximately solve each of them as a single barrier subproblem, effectively merging the AL and IP outer loops. Reduced computational cost can be achieved with an effective warm-starting of the IP iterations and with the tight entanglement of barrier and proximal penalty strategies, monitoring and updating the parameters' values alongside the inner tolerance.

A classical approach is to consider a sequence of bound-constrained Lagrangian (BCL) subproblems [8, 6]

\[
\operatorname*{minimize}_{x\geq 0}\quad f(x)+\frac{1}{2\rho_{k}}\|c(x)+\rho_{k}\hat{y}^{k}\|^{2}
\tag{3}
\]

where $\rho_{k}>0$ and $\hat{y}^{k}\in\mathbb{R}^{m}$ are some given penalty parameter and dual estimate, respectively. The nonlinearly-constrained Lagrangian (NCL) scheme [22] considers equality-constrained subproblems by introducing an auxiliary variable $s\in\mathbb{R}^{m}$ and the constraint $c(x)=s$. Analogously, a proximal point perspective yields the equivalent reformulation

\[
\begin{aligned}
\operatorname*{minimize}_{x,\,\lambda}\quad & f(x)+\frac{\rho_{k}}{2}\|\lambda\|^{2} \\
\operatorname{subject\ to}\quad & c(x)+\rho_{k}(\hat{y}^{k}-\lambda)=0,\qquad x\geq 0,
\end{aligned}
\tag{4}
\]

recovering the dual regularization term obtained, e.g., by [24, 10, 11]. By construction, these regularized subproblems are always feasible and satisfy a strong constraint qualification, namely the LICQ, at all points.
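To see the equivalence between (3) and (4), note that the equality constraint in (4) determines $\lambda$ explicitly; substituting it into the objective recovers (3):
\[
\lambda = \hat{y}^{k} + \tfrac{1}{\rho_{k}}c(x)
\quad\Longrightarrow\quad
f(x)+\frac{\rho_{k}}{2}\Big\|\hat{y}^{k}+\tfrac{1}{\rho_{k}}c(x)\Big\|^{2}
= f(x)+\frac{1}{2\rho_{k}}\|c(x)+\rho_{k}\hat{y}^{k}\|^{2}.
\]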

The regularized subproblems (3)–(4) can be numerically solved via IP algorithms. Let us consider a barrier parameter $\mu_{k}>0$ and barrier functions $b_{i}\colon\mathbb{R}\to\overline{\mathbb{R}}$, $i=1,\ldots,n$, each with domain $\operatorname{dom}b_{i}=(0,\infty)$, and such that $b_{i}(t)\to\infty$ as $t\to 0^{+}$ and $b_{i}^{\prime}\leq 0$. Exemplarily, the logarithmic function $x\mapsto-\ln(x)$ is one such barrier function. Other choices can be considered as well, e.g., to handle bilateral constraints [4]. We collect these barrier functions to define $b\colon\mathbb{R}^{n}\to\overline{\mathbb{R}}$, $b\colon x\mapsto\sum_{i=1}^{n}b_{i}(x_{i})$, whose domain is $\operatorname{dom}b=(0,\infty)^{n}$. Thus, analogously to [3], a barrier counterpart for the BCL subproblem (3) reads

\[
\operatorname*{minimize}_{x}\quad f(x)+\frac{1}{2\rho_{k}}\|c(x)+\rho_{k}\hat{y}^{k}\|^{2}+\mu_{k}b(x),
\tag{5}
\]

whereas for the constrained subproblem (4) this leads to

\[
\begin{aligned}
\operatorname*{minimize}_{x,\,\lambda}\quad & f(x)+\frac{\rho_{k}}{2}\|\lambda\|^{2}+\mu_{k}b(x) \\
\operatorname{subject\ to}\quad & c(x)+\rho_{k}(\hat{y}^{k}-\lambda)=0,
\end{aligned}
\tag{6}
\]

which is a regularized version of [32, Eq. 3] and reminiscent of [5, Eq. 2]. It should be stressed that, in stark contrast with classical AL and IP schemes, we intend to find an (approximate) solution to the regularized subproblem (4) by (approximately) solving only one barrier subproblem (6). Inspired by [9, 7], our rationale is to drive $\rho_{k}$, $\mu_{k}$, and the inner tolerance $\epsilon_{k}$ concurrently toward zero, effectively knitting together proximal and barrier strategies.
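As a concrete instance of the barrier machinery above, a minimal Julia sketch of the logarithmic barrier, and of the multiplier estimate $z=\mu\nabla b(x)$ used later in Algorithm 3.1, could read as follows; the names are illustrative, not the paper's code.

    # Logarithmic barrier b(x) = Σᵢ −ln(xᵢ), with dom b = (0, ∞)ⁿ.
    log_barrier(x) = -sum(log, x)

    # Its gradient ∇b(x) = −1 ./ x is nonpositive on the domain, so the
    # multiplier estimate z = μ ∇b(x) automatically satisfies z ≤ 0.
    log_barrier_grad(x) = -1 ./ x

    z_estimate(μ, x) = μ .* log_barrier_grad(x)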

It should be noted that a primal (Tikhonov-like) regularization term is not explicitly included in (3)–(6). In fact, the original objective $f$ could be replaced by a (proximal) model of the form $x\mapsto f(x)+\frac{\sigma_{k}}{2}\|x-\hat{x}^{k}\|^{2}$, with some given primal regularization parameter $\sigma_{k}\geq 0$ and reference point $\hat{x}^{k}\in\mathbb{R}^{n}$. However, as this term can be interpreted as an inertia correction, we prefer the subsolver to account for its contribution; cf. [32, §3.1]. In this way, the subsolver can surgically tune the primal correction term as needed, possibly improving the convergence speed, and sidestepping the issue that suitable values for $\sigma_{k}$ are unknown a priori.
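To make this interpretation concrete: with the reference point $\hat{x}^{k}$ fixed at the current iterate, the proximal term merely adds a multiple of the identity to the Hessian block appearing in the Newton systems below,
\[
\nabla_{xx}^{2}\Big[f(x)+\frac{\sigma_{k}}{2}\|x-\hat{x}^{k}\|^{2}\Big]
= \nabla_{xx}^{2}f(x)+\sigma_{k}I,
\]
which is precisely the form of the inertia correction applied to the (1,1)-block in [32, §3.1].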

Algorithm 3.1: Regularized interior point method for general nonlinear programs (P)

Data: $\epsilon_{0},\rho_{0},\mu_{0}>0$; $\kappa_{\rho},\kappa_{\mu},\kappa_{\epsilon}\in(0,1)$; $\theta_{\rho},\theta_{\mu}\in[0,1)$; $Y\subset\mathbb{R}^{m}$ nonempty and bounded; $\varepsilon>0$
Result: $\varepsilon$-KKT stationary point $x^{\ast}$ with multipliers $y^{\ast}$, $z^{\ast}$

For $k=0,1,2,\ldots$ do:
  1. Select $\hat{y}^{k}\in Y$.
  2. Find an $\epsilon_{k}$-KKT stationary point $(x^{k},\lambda^{k})$ for (6), with multiplier $y^{k}$.
  3. Set $z^{k}\leftarrow\mu_{k}\nabla b(x^{k})$.
  4. If $(x^{k},y^{k},z^{k})$ satisfies (1), return $(x^{\ast},y^{\ast},z^{\ast})\leftarrow(x^{k},y^{k},z^{k})$.
  5. Set $C^{k}\leftarrow\lVert c(x^{k})\rVert$ and $V^{k}\leftarrow\lVert\min\{x^{k},-z^{k}\}\rVert$.
  6. If $k=0$ or $C^{k}\leq\max\{\varepsilon,\theta_{\rho}C^{k-1}\}$, set $\rho_{k+1}\leftarrow\rho_{k}$; else select $\rho_{k+1}\in(0,\kappa_{\rho}\rho_{k}]$.
  7. If $k=0$ or $V^{k}\leq\max\{\varepsilon,\theta_{\mu}V^{k-1}\}$, set $\mu_{k+1}\leftarrow\mu_{k}$; else select $\mu_{k+1}\in(0,\kappa_{\mu}\mu_{k}]$.
  8. Set $\epsilon_{k+1}\leftarrow\max\{\varepsilon,\kappa_{\epsilon}\epsilon_{k}\}$.

The overall procedure is detailed in Algorithm 3.1. At every outer iteration, indexed by $k$, Algorithm 3.1 requires computing an approximate stationary point, with the associated Lagrange multiplier, for the regularized barrier subproblem (6). As the dual estimate $\hat{y}^{k}$ is selected from some bounded set $Y\subset\mathbb{R}^{m}$ at step 1, the AL scheme is safeguarded and has stronger global convergence properties [6, Ch. 4]. The assignment of $z^{k}$ at step 3 follows from comparing and matching the stationarity conditions for (P) and (6). After checking termination, we monitor progress in constraint violation and complementarity, based on (1), and update the parameters $\rho_{k}$ and $\mu_{k}$ accordingly, as well as the inner tolerance $\epsilon_{k}$. At steps 6 and 7 we consider relaxed conditions for satisfactory feasibility and complementarity, as it is preferable to have the sequences $\{\rho_{k}\}$, $\{\mu_{k}\}$, and $\{\epsilon_{k}\}$ bounded away from zero, in order to avoid unnecessary ill-conditioning and tight tolerances. Sufficient conditions to guarantee boundedness of the penalty parameter $\{\rho_{k}\}$ away from zero are given, e.g., by [2, §5]. Remarkably, as established by Lemma 4.2 in Section 4, there is no need for the barrier parameter $\mu_{k}$ to vanish in order to achieve $\varepsilon$-complementarity in the sense of (1c), for $\varepsilon>0$.
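A minimal Julia sketch of this outer loop, reusing the helpers sketched above, may help fix ideas. The inner solver solve_barrier_subproblem for (6) is hypothetical, prob is assumed to bundle the problem callables grad_f, c_fun, and jac_c, and the default parameter values are illustrative; this is a sketch of the update logic, not the paper's implementation.

    using LinearAlgebra

    function regip(prob, x, y; ε=1e-5, ϵ=1e-1, ρ=1e-6, μ=1e-1,
                   κρ=0.5, κμ=0.5, κϵ=0.5, θρ=0.5, θμ=0.5,
                   ymax=1e20, kmax=1000)
        Cold = Vold = Inf
        for k in 0:kmax
            ŷ = clamp.(y, -ymax, ymax)                       # step 1: project onto Y
            x, λ, y = solve_barrier_subproblem(prob, x, y, ŷ, ρ, μ, ϵ)  # step 2
            z = z_estimate(μ, x)                             # step 3
            rs = kkt_residuals(prob.grad_f, prob.c_fun, prob.jac_c, x, y, z)
            is_eps_kkt(rs, ε) && return (x, y, z)            # step 4: ε-KKT, stop
            C = norm(prob.c_fun(x))                          # step 5
            V = norm(min.(x, -z))
            (k == 0 || C <= max(ε, θρ * Cold)) || (ρ *= κρ)  # step 6
            (k == 0 || V <= max(ε, θμ * Vold)) || (μ *= κμ)  # step 7
            ϵ = max(ε, κϵ * ϵ)                               # step 8
            Cold, Vold = C, V
        end
        error("outer iteration limit reached")
    end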

We shall mention that considering equivalent yet different subproblem formulations may affect the practical performance of the subsolver. It is enlightening to pinpoint the effect of the dual regularization in (6) and to appreciate its interactions with the linear algebra routines used to solve the linear systems arising in Newton-type methods. Although (6) has more (possibly many more) variables than (5), a simple reordering yields matrices with the same structure [10, 24]. Let us have a closer look. Defining the Lagrangian function $\mathcal{L}_{k}(x,y):=f(x)+\mu_{k}b(x)+\langle y,c(x)\rangle$, the stationarity condition for (5) reads $0=\nabla_{x}\mathcal{L}_{k}(x,y_{k}(x))$, where $y_{k}(x):=\hat{y}^{k}+\rho_{k}^{-1}c(x)$, and the corresponding Newton system is

\[
\left[H_{k}(x,y_{k}(x))+\frac{1}{\rho_{k}}\nabla c(x)^{\top}\nabla c(x)\right]\delta x
= -\nabla_{x}\mathcal{L}_{k}(x,y_{k}(x)),
\tag{7}
\]

where $H_{k}(x,y)\in\mathbb{R}^{n\times n}$ denotes the Hessian matrix $\nabla_{xx}^{2}\mathcal{L}_{k}(x,y)$ or a symmetric approximation thereof. A linear transformation yields the equivalent linear system

\[
\begin{bmatrix}H_{k}(x,y_{k}(x)) & \nabla c(x)^{\top}\\ \nabla c(x) & -\rho_{k}I\end{bmatrix}
\begin{bmatrix}\delta x\\ \delta y\end{bmatrix}
= -\begin{bmatrix}\nabla_{x}\mathcal{L}_{k}(x,y_{k}(x))\\ 0\end{bmatrix}.
\tag{8}
\]

Indeed, the second block row of (8) enforces $\delta y=\rho_{k}^{-1}\nabla c(x)\delta x$, and substituting this into the first block row recovers (7). Analogous Newton systems for (6) read

\[
\begin{bmatrix}H_{k}(x,y) & \cdot & \nabla c(x)^{\top}\\ \cdot & \rho_{k}I & -\rho_{k}I\\ \nabla c(x) & -\rho_{k}I & \cdot\end{bmatrix}
\begin{bmatrix}\delta x\\ \delta\lambda\\ \delta y\end{bmatrix}
= -\begin{bmatrix}\nabla_{x}\mathcal{L}_{k}(x,y)\\ \rho_{k}(\lambda-y)\\ c(x)+\rho_{k}(\hat{y}^{k}-\lambda)\end{bmatrix}
\tag{9}
\]

and formally solving for $\delta\lambda$ gives the condensed system

\[
\begin{bmatrix}H_{k}(x,y) & \nabla c(x)^{\top}\\ \nabla c(x) & -\rho_{k}I\end{bmatrix}
\begin{bmatrix}\delta x\\ \delta y\end{bmatrix}
= -\begin{bmatrix}\nabla_{x}\mathcal{L}_{k}(x,y)\\ c(x)+\rho_{k}(\hat{y}^{k}-y)\end{bmatrix}.
\tag{10}
\]

The resemblances between these linear systems are apparent, as well as the differences. The AL relaxation in (5) introduces a dual regularization for both the linear algebra and the nonlinear solver, whose hidden constraint $c(x)+\rho_{k}(\hat{y}^{k}-y)=0$ holds pointwise due to the identity $y=y_{k}(x)$. We remark that, entering the (2,2)-block, the dual regularization prevents issues due to linear dependence. Furthermore, the primal regularization is left to the inertia correction strategy of the subsolver, affecting the (1,1)-block as in [32, §3.1]. If the approximation $H_{k}(x,y)$ is positive definite, e.g., by adopting suitable quasi-Newton techniques, the matrix in (10) is symmetric quasi-definite and can be efficiently factorized with tailored linear algebra routines [30].
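As an illustration of how the regularized (2,2)-block enters the linear algebra, the following Julia sketch assembles and solves the condensed system (10); a dense Bunch–Kaufman factorization stands in for the tailored quasi-definite routines of [30], and all names are illustrative.

    using LinearAlgebra

    # Solve the condensed Newton system (10) for (δx, δy).
    # H ≈ Hessian of the barrier Lagrangian (symmetric), J = ∇c(x),
    # ∇L = ∇ₓ L_k(x, y), and r = c(x) + ρ (ŷ − y).
    function condensed_step(H, J, ∇L, r, ρ)
        m, n = size(J)
        R = -ρ * Matrix(I, m, m)       # dual regularization of the (2,2)-block
        K = Symmetric([H J'; J R])     # symmetric saddle-point matrix of (10)
        δ = bunchkaufman(K) \ vcat(-∇L, -r)
        return δ[1:n], δ[n+1:n+m]      # primal and dual steps (δx, δy)
    end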

4 Convergence Analysis

In this section we analyze the asymptotic properties of the iterates generated by Algorithm 3.1 under the following blanket assumptions:

  1. (a1) Functions $f\colon\mathbb{R}^{n}\to\mathbb{R}$ and $c\colon\mathbb{R}^{n}\to\mathbb{R}^{m}$ in (P) are continuously differentiable.

  2. (a2) Subproblems (6) are well-posed for all parameters' values, namely for any $\mu_{k}\leq\mu_{0}$, $\rho_{k}\leq\rho_{0}$, and $\hat{y}^{k}\in Y$.

First, we characterize the iterates in terms of stationarity.

Lemma 4.1.

Consider a sequence $\{x^{k},y^{k},z^{k}\}$ generated by Algorithm 3.1. Then, for all $k\in\mathbb{N}$, it holds that $x^{k}>0$ and $z^{k}\leq 0$, and the following conditions are satisfied:

\[
\begin{aligned}
\|\nabla f(x^{k})+\nabla c(x^{k})^{\top}y^{k}+z^{k}\| &\leq \epsilon_{k}, && \text{(11a)}\\
\|c(x^{k})+\rho_{k}(\hat{y}^{k}-y^{k})\| &\leq 2\epsilon_{k}. && \text{(11b)}
\end{aligned}
\]
Proof.

Positivity of $x^{k}$ follows from the barrier function $b$ having domain $\operatorname{dom}b=(0,\infty)^{n}$, whereas nonpositivity of $z^{k}$ is a consequence of $b_{i}^{\prime}\leq 0$ for all $i$ and $\mu_{k}>0$. Based on Definition 2.2 and step 2 of Algorithm 3.1, the $\epsilon_{k}$-KKT stationarity of $(x^{k},\lambda^{k})$ for (6), with multiplier $y^{k}$, yields (11a) along with

\[
\begin{aligned}
\rho_{k}\lVert\lambda^{k}-y^{k}\rVert &\leq \epsilon_{k}, && \text{(12a)}\\
\lVert c(x^{k})+\rho_{k}(\hat{y}^{k}-\lambda^{k})\rVert &\leq \epsilon_{k}. && \text{(12b)}
\end{aligned}
\]

By the triangle inequality, (12a)–(12b) imply (11b), since $\lVert c(x^{k})+\rho_{k}(\hat{y}^{k}-y^{k})\rVert\leq\lVert c(x^{k})+\rho_{k}(\hat{y}^{k}-\lambda^{k})\rVert+\rho_{k}\lVert\lambda^{k}-y^{k}\rVert\leq 2\epsilon_{k}$. ∎

Patterning [12, Thm 4.2(ii)], we establish asymptotic complementarity.

Lemma 4.2.

Consider a sequence $\{x^{k},y^{k},z^{k}\}$ of iterates generated by Algorithm 3.1 with $\varepsilon=0$. Then, it holds that $\lim_{k\to\infty}\min\{x^{k},-z^{k}\}=0$.

Proof.

The algorithm can terminate in finite time only if the returned triplet $(x^{\ast},y^{\ast},z^{\ast})$ satisfies $\min\{x^{\ast},-z^{\ast}\}=0$. Excluding this ideal situation, we may assume that it runs indefinitely. If $\{\mu_{k}\}$ remains bounded away from zero, then step 7 of Algorithm 3.1 enforces $V^{k}\leq\theta_{\mu}V^{k-1}$ for all large $k$, so that $V^{k}=\lVert\min\{x^{k},-z^{k}\}\rVert\to 0$ and the statement follows directly. We may thus assume $\mu_{k}\searrow 0$, recalling that $x^{k}>0$ and $z^{k}\leq 0$ for all $k\in\mathbb{N}$ by Lemma 4.1. Consider now an arbitrary index $i\in\{1,\ldots,n\}$ and the two possible cases. If $x_{i}^{k}\to 0$, then the statement readily follows from $z_{i}^{k}\leq 0$. If instead a subsequence $\{x_{i}^{k}\}_{K}$ remains bounded away from zero, then $\{b_{i}^{\prime}(x_{i}^{k})\}_{K}$ is bounded and therefore $z_{i}^{k}=\mu_{k}b_{i}^{\prime}(x_{i}^{k})\to 0$ as $k\to_{K}\infty$, proving the statement since $x_{i}^{k}>0$. The claim then follows from the arbitrariness of the index $i$ and of the subsequence. ∎

Like all penalty-type methods in the nonconvex setting, Algorithm 3.1 may generate limit points that are infeasible for (P). Patterning standard arguments, the following result gives sufficient conditions for the feasibility of limit points; cf. [6, Ex. 4.12].

Proposition 4.3.

Consider a sequence $\{x^{k},y^{k},z^{k}\}$ of iterates generated by Algorithm 3.1. Then, each limit point $x^{\ast}$ of $\{x^{k}\}$ is feasible for (P) if one of the following conditions holds:

  1. (i) the sequence $\{\rho_{k}\}$ is bounded away from zero, or

  2. (ii) there exists some $B\in\mathbb{R}$ such that for all $k\in\mathbb{N}$
\[
f(x^{k})+\frac{1}{2\rho_{k}}\lVert c(x^{k})+\rho_{k}\hat{y}^{k}\rVert^{2}\leq B.
\]

These conditions are generally difficult to check a priori. Nevertheless, if each iterate $x^{k}$ is actually a (possibly inexact) global minimizer of (6), then limit points generated by Algorithm 3.1 have minimum constraint violation and tend to minimize the objective function subject to minimal infeasibility [6, Thm 5.1, Thm 5.3]. In particular, limit points are indeed feasible if (P) admits feasible points. However, these properties cannot be expected when solving the subproblems only up to stationarity. Nonetheless, even in the case where a limit point is not necessarily feasible, the next result shows that it is at least a stationary point for a feasibility problem associated with (P).

Proposition 4.4.

Consider a sequence $\{x^{k},y^{k},z^{k}\}$ generated by Algorithm 3.1 with $\varepsilon=0$. Then each limit point $x^{\ast}$ of $\{x^{k}\}$ is KKT stationary for the problem

\[
\operatorname*{minimize}_{x\geq 0}\quad\frac{1}{2}\|c(x)\|^{2}.
\tag{13}
\]
Proof.

We may consider two cases, depending on the sequence $\{\rho_{k}\}$. If $\{\rho_{k}\}$ remains bounded away from zero, then step 6 of Algorithm 3.1 implies that $\|c(x^{k})\|\to 0$ as $k\to\infty$. Continuity of $c$ and properties of norms yield $c(x^{\ast})=0$. Furthermore, by construction, we have $x^{k}>0$ for all $k\in\mathbb{N}$, hence $x^{\ast}\geq 0$. Altogether, this shows that $x^{\ast}$ is feasible for (P), namely a global minimizer for the feasibility problem (13) and, therefore, a KKT stationary point thereof. Assume now that $\rho_{k}\to 0$. Define $\delta^{k}\in\mathbb{R}^{n}$ and $\eta^{k}\in\mathbb{R}^{m}$ as

\[
\begin{aligned}
\delta^{k} &:= \nabla f(x^{k})+\nabla c(x^{k})^{\top}y^{k}+z^{k},\\
\eta^{k} &:= c(x^{k})+\rho_{k}(\hat{y}^{k}-y^{k})
\end{aligned}
\]

for all $k\in\mathbb{N}$. In view of Lemma 4.1, we have that $\lVert\delta^{k}\rVert\leq\epsilon_{k}$ and $\lVert\eta^{k}\rVert\leq 2\epsilon_{k}$ hold for all $k\in\mathbb{N}$. Multiplying $\delta^{k}$ by $\rho_{k}$, substituting $y^{k}$, and rearranging, we obtain

\[
\rho_{k}\delta^{k} = \rho_{k}\nabla f(x^{k})+\nabla c(x^{k})^{\top}\left[\rho_{k}\hat{y}^{k}+c(x^{k})-\eta^{k}\right]+\rho_{k}z^{k}.
\]

Now, let $x^{\ast}$ be a limit point of $\{x^{k}\}$ and $\{x^{k}\}_{K}$ a subsequence such that $x^{k}\to_{K}x^{\ast}$. Then the sequence $\{\nabla f(x^{k})\}_{K}$ is bounded, and so is $\{\hat{y}^{k}\}_{K}\subset Y$ by construction. Recalling from Lemma 4.1 that $x^{k}>0$ and $z^{k}\leq 0$, and observing that $0\leq\lVert\delta^{k}\rVert,\lVert\eta^{k}\rVert\leq 2\epsilon_{k}\to 0$, we now take the limit of $\rho_{k}\delta^{k}$ for $k\to_{K}\infty$, resulting in

\[
0 = \nabla c(x^{\ast})^{\top}c(x^{\ast})+\tilde{z}^{\ast}
\]

for some $\tilde{z}^{\ast}\leq 0$. Being a limit point of $\{\rho_{k}z^{k}\}$, $\tilde{z}^{\ast}$ together with $x^{\ast}$ satisfies $\min\{x^{\ast},-\tilde{z}^{\ast}\}=0$ by Lemma 4.2. Since we also have $x^{\ast}\geq 0$, it follows that $x^{\ast}$ is KKT stationary for (13) according to Definition 2.2. ∎

Finally, we qualify the output of Algorithm 3.1 in the case of feasible limit points. In particular, it is shown that any feasible limit point is AKKT stationary for (P) in the sense of Definition 2.3. Under some additional boundedness conditions, feasible limit points are KKT stationary, according to Definition 2.2.

Theorem 4.5.

Let $\{x^{k},y^{k},z^{k}\}$ be a sequence of iterates generated by Algorithm 3.1 with $\varepsilon=0$. Let $x^{\ast}$ be a feasible limit point of $\{x^{k}\}$ and $\{x^{k}\}_{K}$ a subsequence such that $x^{k}\to_{K}x^{\ast}$. Then,

  1. (i) $x^{\ast}$ is an AKKT stationary point for (P).

  2. (ii) If $\{y^{k},z^{k}\}_{K}$ remain bounded, then $x^{\ast}$ is KKT stationary for (P).

Proof.

(i) Together with the fact that $\epsilon_{k}\to 0$, Lemma 4.1 ensures that the sequence $\{x^{k}\}_{K}$ satisfies condition (2a), whereas Lemma 4.2 implies (2b). Feasibility of $x^{\ast}$ completes the proof.

(ii) By boundedness, the subsequences $\{y^{k}\}_{K}$ and $\{z^{k}\}_{K}$ admit limit points $y^{\ast}$ and $z^{\ast}$, respectively. Thus, from the previous assertion and by continuity arguments on $f$ and $c$, it follows that $x^{\ast}$ is KKT stationary for (P), not only asymptotically. ∎

Provided that the iterates admit a feasible limit point, finite termination of Algorithm 3.1 with an $\varepsilon$-KKT stationary point can be established as a direct consequence of Theorem 4.5.

5 Numerical Results

In this section we test an instance of the proposed regularized interior point approach, denoted RegIP, on the CUTEst benchmark problems [19]. RegIP is compared in terms of robustness against the IP solver Ipopt [32] and the AL solver Percival [15], which is based on a BCL method [8] coupled with a trust-region matrix-free solver [21] for the subproblems. We report neither runtimes nor iteration counts, since a fair comparison would require close inspection of heuristics and fallbacks [32, §3].

We implemented RegIP in Julia and set up the numerical experiments adopting the JSO software infrastructure by [23]. The IP solver Ipopt acts as subsolver to execute step 2 of Algorithm 3.1, warm-started at the current primal $(x^{k-1},y^{k-1})$ and dual $(y^{k-1},z^{k-1})$ estimates. We use its parameter tol to set the (inner) tolerance $\epsilon_{k}$, disabling other termination conditions, and let Ipopt control the barrier parameter as needed to approximately solve the regularized subproblem.^2 We let the safeguarding set be $Y:=\{v\in\mathbb{R}^{m}\,|\,\|v\|_{\infty}\leq 10^{20}\}$ and choose $\hat{y}^{k}$ by projecting the current estimate $y^{k-1}$ onto $Y$. We set the initial penalty parameter to $\rho_{0}=10^{-6}$, the inner tolerance to $\epsilon_{0}=\sqrt[3]{\varepsilon}$, and the parameters $\theta_{\rho}=0.5$, $\kappa_{\rho}=0.5$, and $\kappa_{\epsilon}=0.5$. RegIP declares success, and returns an $\varepsilon$-KKT stationary point, as soon as $\epsilon_{k}\leq\varepsilon$ and $C^{k}\leq\varepsilon$.^3 Instead, if $\epsilon_{k}\leq\varepsilon$, $C^{k}>\varepsilon$, and $\rho_{k}\leq\rho_{\min}:=10^{-20}$, RegIP stops and declares (local) infeasibility. For Ipopt, we set the tolerance tol to $\varepsilon$, remove the other desired thresholds, and disable termination based on acceptable iterates. For Percival, we set the absolute and relative tolerances atol, rtol, and ctol to $\varepsilon$. We select all CUTEst problems with at most 1000 variables and constraints, obtaining a test set with 609 problems. All solvers are provided with the primal-dual initial point available in CUTEst, a time limit of 60 seconds, a maximum of $10^{9}$ iterations, and a tolerance $\varepsilon\in\{10^{-3},10^{-5}\}$. A solver is deemed to solve a problem if it returns with a successful status; it fails otherwise. The source codes for the numerical experiments have been archived on Zenodo at

doi: 10.5281/zenodo.7109904.

^2 Solving a sequence of barrier subproblems may hinder the computational efficiency of RegIP compared to the approach behind Algorithm 3.1, but does not degrade its reliability. Ongoing research focuses on solving (6) and letting the IP subsolver update the barrier parameter after warm-starting at the current primal-dual estimate, in the spirit of [7, Alg. 3].

^3 The condition $\epsilon_{k}\leq\varepsilon$ implies both $\varepsilon$-stationarity (1a) and $\varepsilon$-complementarity (1c) required for $\varepsilon$-KKT stationarity. This follows from the observation that, in RegIP, the subsolver approximately solves (4) at step 2, not (6); see Footnote 2.

Table 1 summarizes the results, stratified by solver, termination tolerance ($\varepsilon$), and problem size ($n$, $m$). For each combination, we indicate the number of times RegIP solves a problem that the other solver fails (“W”) or solves (“T+”), and the number of times RegIP fails on a problem that the other one fails (“T-”) or solves (“L”). The results show that RegIP succeeds on more problems than the other solvers, consistently for both low and high accuracy, indicating that the underlying regularized IP approach can form the basis for reliable and scalable solvers.^4

^4 We agree with [5, §5] on the fact that “strong statements concerning the relative efficiency or robustness [...] are not possible in nonlinear optimization.”

Table 1: Comparison on CUTEst problems with $n$ variables and $m$ constraints.

RegIP against Ipopt

  Size $n,m$    | $\varepsilon=10^{-3}$: W / T+ / T- / L | $\varepsilon=10^{-5}$: W / T+ / T- / L
  0–10          | 19 / 417 / 25 / 3                      | 15 / 417 / 29 / 3
  11–100        |  6 /  60 /  7 / 1                      |  7 /  58 /  8 / 1
  101–1000      |  6 /  57 /  8 / 0                      |  7 /  53 / 10 / 1

RegIP against Percival

  Size $n,m$    | $\varepsilon=10^{-3}$: W / T+ / T- / L | $\varepsilon=10^{-5}$: W / T+ / T- / L
  0–10          | 17 / 419 / 20 / 8                      | 19 / 413 / 24 / 8
  11–100        | 12 /  54 /  8 / 0                      | 18 /  47 /  8 / 1
  101–1000      | 37 /  26 /  8 / 0                      | 37 /  23 / 11 / 0

6 Conclusion

This paper has presented a regularized interior point approach to solving constrained nonlinear optimization problems. Operating as an outer regularization layer, a quadratic proximal penalty provides robustness whilst consuming minimal computational effort once embedded into existing interior point codes as a principled inertia correction strategy. Furthermore, regularizing the equality constraints makes it possible to safely adopt more efficient linear algebra routines, while waiving the need for an infeasibility detection mechanism within the subsolver. Preliminary numerical results indicate that a close integration of proximal regularization within interior point schemes is key to providing efficient and robust solvers. We encourage further research in this direction.

Acknowledgements

I gratefully acknowledge the support of Ryan Loxton and the Centre for Optimisation and Decision Science for giving me the opportunity to visit Curtin University. I would also like to thank Hoa T. Bui for her friendly hospitality and lively discussions during this time in Perth.

References

  • [1] A. Altman and J. Gondzio, Regularized symmetric indefinite systems in interior point methods for linear and quadratic optimization, Optimization Methods and Software 11 (1999), 275–302, doi:10.1080/10556789908805754.
  • [2] R. Andreani, E. G. Birgin, J. M. Martínez, and M. L. Schuverdt, On Augmented Lagrangian Methods with General Lower-Level Constraints, SIAM Journal on Optimization 18 (2008), 1286–1309, doi:10.1137/060654797.
  • [3] P. Armand and N. N. Tran, Rapid infeasibility detection in a mixed logarithmic barrier-augmented Lagrangian method for nonlinear optimization, Optimization Methods and Software 34 (2019), 991–1013, doi:10.1080/10556788.2018.1528250.
  • [4] E. Bertolazzi, F. Biral, and M. Da Lio, Real-Time Motion Planning for Multibody Systems, Multibody System Dynamics 17 (2007), 119–139, doi:10.1007/s11044-007-9037-7.
  • [5] E. G. Birgin, L. F. Bueno, and J. M. Martínez, Sequential equality-constrained optimization for nonlinear programming, Computational Optimization and Applications 65 (2016), 699–721, doi:10.1007/s10589-016-9849-6.
  • [6] E. G. Birgin and J. M. Martínez, Practical Augmented Lagrangian Methods for Constrained Optimization, Society for Industrial and Applied Mathematics, Philadelphia, PA, 2014.
  • [7] S. Cipolla and J. Gondzio, Proximal stabilized Interior Point Methods for quadratic programming and low-frequency-updates preconditioning techniques, 2022, doi:10.48550/arxiv.2205.01775.
  • [8] A. R. Conn, N. I. M. Gould, and P. L. Toint, A Globally Convergent Augmented Lagrangian Algorithm for Optimization with General Constraints and Simple Bounds, SIAM Journal on Numerical Analysis 28 (1991), 545–572, doi:10.1137/0728030.
  • [9] F. E. Curtis, A penalty-interior-point algorithm for nonlinear constrained optimization, Mathematical Programming Computation 4 (2012), 181–209, doi:10.1007/s12532-012-0041-4.
  • [10] A. De Marchi, Augmented Lagrangian methods as dynamical systems for constrained optimization, in 60th IEEE Conference on Decision and Control (CDC), IEEE, Austin, TX, 2021, 6533–6538, doi:10.1109/cdc45484.2021.9683199.
  • [11] A. De Marchi, On a Primal-Dual Newton Proximal Method for Convex Quadratic Programs, Computational Optimization and Applications (2022), doi:10.1007/s10589-021-00342-y.
  • [12] A. De Marchi and A. Themelis, An Interior Proximal Gradient Method for Nonconvex Optimization, 2022, doi:10.48550/arxiv.2208.00799.
  • [13] M. Dehghani, A. Lambe, and D. Orban, A regularized interior-point method for constrained linear least squares, INFOR: Information Systems and Operational Research 58 (2020), 202–224, doi:10.1080/03155986.2018.1559428.
  • [14] M. Diehl, H. J. Ferreau, and N. Haverbeke, Efficient numerical methods for nonlinear MPC and moving horizon estimation, in Nonlinear Model Predictive Control: Towards New Challenging Applications, Springer, 2009, 391–417, doi:10.1007/978-3-642-01094-1_32.
  • [15] E. A. dos Santos and A. S. Siqueira, Percival.jl: an augmented Lagrangian method, 2020, doi:10.5281/zenodo.3969045, https://github.com/JuliaSmoothOptimizers/Percival.jl.
  • [16] A. V. Fiacco and G. P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques, Wiley, New York, 1968.
  • [17] M. P. Friedlander and D. Orban, A primal-dual regularized interior-point method for convex quadratic programs, Mathematical Programming Computation 4 (2012), 71–107, doi:10.1007/s12532-012-0035-2.
  • [18] J. Gondzio, Interior point methods 25 years later, European Journal of Operational Research 218 (2012), 587–601, doi:10.1016/j.ejor.2011.09.017.
  • [19] N. I. M. Gould, D. Orban, and P. L. Toint, CUTEst: a Constrained and Unconstrained Testing Environment with safe threads for mathematical optimization, Computational Optimization and Applications 60 (2015), 545–557, doi:10.1007/s10589-014-9687-3.
  • [20] D. Liao-McPherson and I. Kolmanovsky, FBstab: A proximally stabilized semismooth algorithm for convex quadratic programming, Automatica 113 (2020), 108801, doi:10.1016/j.automatica.2019.108801.
  • [21] C. J. Lin and J. J. Moré, Newton's Method for Large Bound-Constrained Optimization Problems, SIAM Journal on Optimization 9 (1999), 1100–1127, doi:10.1137/s1052623498345075.
  • [22] D. Ma, K. L. Judd, D. Orban, and M. A. Saunders, Stabilized Optimization Via an NCL Algorithm, in Numerical Analysis and Optimization, M. Al-Baali, L. Grandinetti, and A. Purnama (eds.), Springer, 2018, 173–191, doi:10.1007/978-3-319-90026-1_8.
  • [23] D. Orban and A. S. Siqueira, JuliaSmoothOptimizers: Infrastructure and Solvers for Continuous Optimization in Julia, 2019, doi:10.5281/zenodo.2655082, https://juliasmoothoptimizers.github.io.
  • [24] A. Potschka and H. G. Bock, A sequential homotopy method for mathematical programming problems, Mathematical Programming 187 (2021), 459–486, doi:10.1007/s10107-020-01488-z.
  • [25] R. T. Rockafellar, Augmented Lagrange Multiplier Functions and Duality in Nonconvex Programming, SIAM Journal on Control 12 (1974), 268–285, doi:10.1137/0312021.
  • [26] R. T. Rockafellar, Monotone Operators and the Proximal Point Algorithm, SIAM Journal on Control and Optimization 14 (1976), 877–898, doi:10.1137/0314056.
  • [27] N. Saraf and A. Bemporad, An efficient bounded-variable nonlinear least-squares algorithm for embedded MPC, Automatica 141 (2022), 110293, doi:10.1016/j.automatica.2022.110293.
  • [28] A. S. Siqueira and D. Orban, A Regularized Interior-Point Method for Constrained Nonlinear Least Squares, 2018, https://abelsiqueira.github.io/assets/2018-07-23-xiibrazopt.pdf (visited September 21, 2022). XII Brazilian Workshop on Continuous Optimization.
  • [29] P. Sopasakis, E. Fresk, and P. Patrinos, OpEn: Code Generation for Embedded Nonconvex Optimization, IFAC-PapersOnLine 53 (2020), 6548–6554, doi:10.1016/j.ifacol.2020.12.071. 21st IFAC World Congress.
  • [30] R. J. Vanderbei, Symmetric Quasidefinite Matrices, SIAM Journal on Optimization 5 (1995), 100–113, doi:10.1137/0805005.
  • [31] R. J. Vanderbei and D. F. Shanno, An Interior-Point Algorithm for Nonconvex Nonlinear Programming, Computational Optimization and Applications 13 (1999), 231–252, doi:10.1023/a:1008677427361.
  • [32] A. Wächter and L. T. Biegler, On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming, Mathematical Programming 106 (2006), 25–57, doi:10.1007/s10107-004-0559-y.