Regularized Interior Point Methods for Constrained Optimization and Control

Alberto De Marchi, Universität der Bundeswehr München, Department of Aerospace Engineering, Institute of Applied Mathematics and Scientific Computing, Neubiberg/Munich, Germany. Email: [email protected]. ORCID: 0000-0002-3545-6898.
Abstract

Regularization and interior point approaches offer valuable perspectives to address constrained nonlinear optimization problems in view of control applications. This paper discusses the interactions between these techniques and proposes an algorithm that synergistically combines them. By building a sequence of closely related subproblems and approximately solving each of them, this approach inherently exploits warm-starting, early termination, and the possibility of adopting subsolvers tailored to specific problem structures. Moreover, by relaxing the equality constraints with a proximal penalty, the regularized subproblems are feasible and satisfy a strong constraint qualification by construction, allowing the safe use of efficient solvers. We show how regularization benefits the underlying linear algebra, and a detailed convergence analysis indicates that limit points tend to minimize constraint violation and satisfy suitable optimality conditions. Finally, we report on numerical results in terms of robustness, indicating that the combined approach compares favorably against both interior point and augmented Lagrangian codes.

Keywords. Nonlinear programming, optimization-based control, interior point methods, augmented Lagrangian

AMS MSC. 65K05, 90C06, 90C26, 90C30

1 Introduction

Mathematical optimization plays an important role in model-based and data-driven control systems, forming the basis for advanced techniques such as optimal control, nonlinear model predictive control (MPC), and parameter estimation. Significant research effort on computationally efficient real-time optimization algorithms has contributed to the success of MPC over the years, and yet the demand for fast and reliable methods for a broad spectrum of applications keeps growing; see [29, 27] and references therein. To tackle these challenges, it is desirable to have an algorithm that benefits from warm-starting information, can cope with infeasibility, is robust to problem scaling, and exploits the structure of the problem. To reduce computations and increase robustness, a common approach is to relax the requirements on the solutions, in terms of optimality, constraint violation, or both [14, 27]. In this work, we propose to address these requirements by combining proximal regularization and interior point techniques, developing a stabilized, efficient, and robust numerical method. We advocate for this strategy by bringing together and combining a variety of ideas from the nonlinear programming literature.

Let us consider the constrained nonconvex problem

\[
\begin{aligned}
\operatorname*{minimize}_{x}\quad & f(x) \\
\operatorname{subject\ to}\quad & c(x)=0,\qquad x\geq 0,
\end{aligned}
\tag{P}
\]

where the functions $f\colon\mathbb{R}^{n}\to\mathbb{R}$ and $c\colon\mathbb{R}^{n}\to\mathbb{R}^{m}$ are (at least) continuously differentiable. Problems with general inequality constraints on $x$ and $c(x)$ can be reformulated as (P) by introducing auxiliary variables. Nonlinear programming (NLP) problems such as (P) have been extensively studied and there exist several approaches for their numerical solution. Interior point (IP) [31, 32], penalty and augmented Lagrangian (AL) [8, 2, 6], and sequential programming [16] schemes are predominant ideas and have been re-combined in many ways [9, 5, 3].
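As a simple illustration of this standard reformulation, a one-sided inequality constraint can be absorbed into the form (P) by adding a slack variable,
\[
c(x)\leq 0
\quad\longleftrightarrow\quad
c(x)+s=0,\quad s\geq 0,
\]
and, similarly, a free variable can be split into the difference of two nonnegative ones.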

Starting from linear programming, IP methods have had a significant impact on the field of mathematical optimization [18]. By solving a sequence of barrier subproblems, they can efficiently handle inequality constraints and scale well with the problem size. The state-of-the-art solver Ipopt, described by [32], is an emblem of this remarkable success. However, relying on Newton-type schemes for approximately solving the subproblems, IP algorithms may suffer from degeneracy and a lack of constraint qualifications if suitable counter-mechanisms are not implemented. On the contrary, proximal techniques naturally cope with these scenarios thanks to their inherent regularizing action. Widely investigated in the convex setting [26], their favorable properties have been exploited to design (primal-dual) stabilized methods building on the proximal point algorithm [17, 20, 11]. The analysis of their close connection with the AL framework [25] led to the development of techniques applicable to more general problems [22, 10, 24].

The combination of IP and proximal strategies has been successfully leveraged in the context of convex quadratic programming [1, 7] and for linear [13] and nonlinear [28] least-squares problems. With this work we address general NLPs and devise a method for their numerical solution, which can be seen as an extension of a regularized Lagrange–Newton method to handle bound constraints via a barrier function [10], or as a proximally stabilized IP algorithm, generalizing the ideas put forward by [7].^1

^1 Although beyond the scope of this paper, topics such as infeasibility detection [3], ill-posed subproblems [2], and feasibility restoration [32, §3.3] are of great relevance in this context. Appropriate mechanisms should be incorporated in practical implementations, as they can significantly improve performance.

Outline

The paper is organized as follows. In Section 2 we provide and comment on some relevant optimality notions. Section 3 discusses the methodology and details the proposed algorithm, whose convergence properties are investigated in Section 4. We report numerical results on benchmark problems in Section 5 and conclude the paper in Section 6.

Notation

With $\mathbb{N}$, $\mathbb{R}$, and $\overline{\mathbb{R}}:=\mathbb{R}\cup\{\infty\}$ we denote the natural, real, and extended real numbers, respectively. We denote the set of real vectors of dimension $n\in\mathbb{N}$ as $\mathbb{R}^{n}$; a real matrix with $m\in\mathbb{N}$ rows and $n\in\mathbb{N}$ columns as $A\in\mathbb{R}^{m\times n}$, and its transpose as $A^{\top}\in\mathbb{R}^{n\times m}$. For a vector $a\in\mathbb{R}^{n}$, its $i$-th element is $a_{i}$ and its squared Euclidean norm is $\|a\|^{2}=a^{\top}a$. A vector or matrix with all zero elements is represented by $0$. The gradient of a function $f\colon\mathbb{R}^{n}\to\mathbb{R}$ at a point $\bar{x}\in\mathbb{R}^{n}$ is denoted by $\nabla f(\bar{x})\in\mathbb{R}^{n}$; the Jacobian of a vector function $c\colon\mathbb{R}^{n}\to\mathbb{R}^{m}$ by $\nabla c(\bar{x})\in\mathbb{R}^{m\times n}$.

2 Optimality and Stationarity

In this section we introduce some optimality concepts, following [6, Ch. 3] and [12, §2].

Definition 2.1 (Feasibility).

Relative to (P), we say a point $x^{\ast}\in\mathbb{R}^{n}$ is feasible if $x^{\ast}\geq 0$ and $c(x^{\ast})=0$; it is strictly feasible if additionally $x^{\ast}>0$.

Definition 2.2 (Approximate KKT stationarity).

Relative to (P), a point $x^{\ast}\in\mathbb{R}^{n}$ is $\varepsilon$-KKT stationary for some $\varepsilon\geq 0$ if there exist multipliers $y^{\ast}\in\mathbb{R}^{m}$ and $z^{\ast}\in\mathbb{R}^{n}$ such that

\[
\begin{aligned}
\lVert\nabla f(x^{\ast})+\nabla c(x^{\ast})^{\top}y^{\ast}+z^{\ast}\rVert &\leq \varepsilon, && \text{(1a)}\\
\lVert c(x^{\ast})\rVert &\leq \varepsilon, && \text{(1b)}\\
x^{\ast}\geq-\varepsilon,\quad z^{\ast}\leq\varepsilon,\quad \min\{x^{\ast},-z^{\ast}\} &\leq \varepsilon. && \text{(1c)}
\end{aligned}
\]

When $\varepsilon=0$, the point $x^{\ast}$ is said to be KKT stationary.

Notice that (1c) provides a condition modeling the approximate satisfaction of the (elementwise) complementarity condition $\min\{x,-z\}=0$ within some tolerance $\varepsilon\geq 0$. The IP algorithm discussed in Section 3 satisfies a stronger version of these conditions, since the iterates it generates meet the constraints $x\geq 0$ and $z\leq 0$ by construction. Furthermore, we point out that the condition $\min\{x_{i},-z_{i}\}\leq\varepsilon$ is analogous to $-x_{i}z_{i}\leq\varepsilon$, more typical for interior point methods, but does not depend on a specific barrier function, e.g., the logarithmic barrier in [32, §2.1].
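For concreteness, a minimal Julia sketch of the residual check behind Definition 2.2 could read as follows; the callables grad_f, c_fun, and jac_c are placeholders for user-supplied problem data, not part of the paper's implementation.

    using LinearAlgebra

    # Residuals of the ε-KKT conditions (1a)-(1c) at a triplet (x, y, z).
    function kkt_residuals(grad_f, c_fun, jac_c, x, y, z)
        r_stat = norm(grad_f(x) + jac_c(x)' * y + z)   # stationarity (1a)
        r_feas = norm(c_fun(x))                        # feasibility (1b)
        r_comp = max(maximum(-x), maximum(z),
                     maximum(min.(x, -z)))             # complementarity (1c)
        return (r_stat, r_feas, r_comp)
    end

    # A point is ε-KKT stationary when all residuals fall below ε.
    is_eps_kkt(residuals, ε) = all(r -> r <= ε, residuals)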

We shall consider the limiting behavior of approximate KKT stationary points as the tolerance $\varepsilon$ vanishes. In fact, having $x^{k}\to x^{\ast}$ with $x^{k}$ $\varepsilon_{k}$-KKT stationary for (P) and $\varepsilon_{k}\searrow 0$ does not guarantee KKT stationarity of the limit point $x^{\ast}$. This issue raises the need for defining KKT stationarity in an asymptotic sense [6, Def. 3.1].

Definition 2.3 (Asymptotic KKT stationarity).

Relative to (P), a feasible point $x^{\ast}\in\mathbb{R}^{n}$ is AKKT stationary if there exist sequences $\{x^{k}\},\{z^{k}\}\subset\mathbb{R}^{n}$ and $\{y^{k}\}\subset\mathbb{R}^{m}$ such that $x^{k}\to x^{\ast}$ and

\[
\begin{aligned}
\nabla f(x^{k})+\nabla c(x^{k})^{\top}y^{k}+z^{k} &\to 0, && \text{(2a)}\\
\min\{x^{k},-z^{k}\} &\to 0. && \text{(2b)}
\end{aligned}
\]

Any local minimizer $x^{\ast}$ for (P) is AKKT stationary, independently of constraint qualifications [6, Thm 3.1].

3 Approach and Algorithm

The methodology presented in this section builds upon the AL framework, interpreted as a proximal point scheme in the nonconvex regime, and IP methods. The basic idea is to construct a sequence of proximally regularized subproblems and to approximately solve each of them as a single barrier subproblem, effectively merging the AL and IP outer loops. Reduced computational cost can be achieved with an effective warm-starting of the IP iterations and with the tight entanglement of barrier and proximal penalty strategies, monitoring and updating the parameters' values alongside the inner tolerance.

A classical approach is to consider a sequence of bound-constrained Lagrangian (BCL) subproblems [8, 6]

\[
\operatorname*{minimize}_{x\geq 0}\quad f(x)+\frac{1}{2\rho_{k}}\|c(x)+\rho_{k}\hat{y}^{k}\|^{2}
\tag{3}
\]

where $\rho_{k}>0$ and $\hat{y}^{k}\in\mathbb{R}^{m}$ are some given penalty parameter and dual estimate, respectively. The nonlinearly-constrained Lagrangian (NCL) scheme [22] considers equality-constrained subproblems by introducing an auxiliary variable $s\in\mathbb{R}^{m}$ and the constraint $c(x)=s$. Analogously, a proximal point perspective yields the equivalent reformulation

\[
\begin{aligned}
\operatorname*{minimize}_{x,\,\lambda}\quad & f(x)+\frac{\rho_{k}}{2}\|\lambda\|^{2} \\
\operatorname{subject\ to}\quad & c(x)+\rho_{k}(\hat{y}^{k}-\lambda)=0,\qquad x\geq 0,
\end{aligned}
\tag{4}
\]

recovering the dual regularization term obtained, e.g., by [24, 10, 11]. By construction, these regularized subproblems are always feasible and satisfy a strong constraint qualification, namely the LICQ, at all points.
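To see the equivalence between (3) and (4), note that the equality constraint in (4) determines $\lambda$ explicitly; substituting it into the objective recovers (3):
\[
\lambda = \hat{y}^{k} + \tfrac{1}{\rho_{k}}c(x)
\quad\Longrightarrow\quad
f(x)+\frac{\rho_{k}}{2}\Big\|\hat{y}^{k}+\tfrac{1}{\rho_{k}}c(x)\Big\|^{2}
= f(x)+\frac{1}{2\rho_{k}}\|c(x)+\rho_{k}\hat{y}^{k}\|^{2}.
\]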

The regularized subproblems (3)–(4) can be numerically solved via IP algorithms. Let us consider a barrier parameter $\mu_{k}>0$ and barrier functions $b_{i}\colon\mathbb{R}\to\overline{\mathbb{R}}$, $i=1,\ldots,n$, each with domain $\operatorname{dom}b_{i}=(0,\infty)$, and such that $b_{i}(t)\to\infty$ as $t\to 0^{+}$ and $b_{i}^{\prime}\leq 0$. Exemplarily, the logarithmic function $x\mapsto-\ln(x)$ is one such barrier function. Other choices can be considered as well, e.g., to handle bilateral constraints [4]. We collect these barrier functions to define $b\colon\mathbb{R}^{n}\to\overline{\mathbb{R}}$, $b\colon x\mapsto\sum_{i=1}^{n}b_{i}(x_{i})$, whose domain is $\operatorname{dom}b=(0,\infty)^{n}$. Thus, analogously to [3], a barrier counterpart for the BCL subproblem (3) reads

\[
\operatorname*{minimize}_{x}\quad f(x)+\frac{1}{2\rho_{k}}\|c(x)+\rho_{k}\hat{y}^{k}\|^{2}+\mu_{k}b(x),
\tag{5}
\]

whereas for the constrained subproblem (4) this leads to

\[
\begin{aligned}
\operatorname*{minimize}_{x,\,\lambda}\quad & f(x)+\frac{\rho_{k}}{2}\|\lambda\|^{2}+\mu_{k}b(x) \\
\operatorname{subject\ to}\quad & c(x)+\rho_{k}(\hat{y}^{k}-\lambda)=0,
\end{aligned}
\tag{6}
\]

which is a regularized version of [32, Eq. 3] and reminiscent of [5, Eq. 2]. It should be stressed that, in stark contrast with classical AL and IP schemes, we intend to find an (approximate) solution to the regularized subproblem (4) by (approximately) solving only one barrier subproblem (6). Inspired by [9, 7], our rationale is to drive $\rho_{k}$, $\mu_{k}$, and the inner tolerance $\epsilon_{k}$ concurrently toward zero, effectively knitting together proximal and barrier strategies.
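As a concrete instance of the barrier machinery above, a minimal Julia sketch of the logarithmic barrier, and of the multiplier estimate $z=\mu\nabla b(x)$ used later in Algorithm 3.1, could read as follows; the names are illustrative, not the paper's code.

    # Logarithmic barrier b(x) = Σᵢ −ln(xᵢ), with dom b = (0, ∞)ⁿ.
    log_barrier(x) = -sum(log, x)

    # Its gradient ∇b(x) = −1 ./ x is nonpositive on the domain, so the
    # multiplier estimate z = μ ∇b(x) automatically satisfies z ≤ 0.
    log_barrier_grad(x) = -1 ./ x

    z_estimate(μ, x) = μ .* log_barrier_grad(x)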

It should be noted that a primal (Tikhonov-like) regularization term is not explicitly included in (3)–(6). In fact, the original objective $f$ could be replaced by a (proximal) model of the form $x\mapsto f(x)+\frac{\sigma_{k}}{2}\|x-\hat{x}^{k}\|^{2}$, with some given primal regularization parameter $\sigma_{k}\geq 0$ and reference point $\hat{x}^{k}\in\mathbb{R}^{n}$. However, as this term can be interpreted as an inertia correction, we prefer the subsolver to account for its contribution; cf. [32, §3.1]. In this way, the subsolver can surgically tune the primal correction term as needed, possibly improving the convergence speed, and sidestepping the issue that suitable values for $\sigma_{k}$ are unknown a priori.
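To make this interpretation concrete: with the reference point $\hat{x}^{k}$ fixed at the current iterate, the proximal term merely adds a multiple of the identity to the Hessian block appearing in the Newton systems below,
\[
\nabla_{xx}^{2}\Big[f(x)+\frac{\sigma_{k}}{2}\|x-\hat{x}^{k}\|^{2}\Big]
= \nabla_{xx}^{2}f(x)+\sigma_{k}I,
\]
which is precisely the form of the inertia correction applied to the (1,1)-block in [32, §3.1].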

Algorithm 3.1: Regularized interior point method for general nonlinear programs (P)

Data: $\epsilon_{0},\rho_{0},\mu_{0}>0$; $\kappa_{\rho},\kappa_{\mu},\kappa_{\epsilon}\in(0,1)$; $\theta_{\rho},\theta_{\mu}\in[0,1)$; $Y\subset\mathbb{R}^{m}$ nonempty and bounded; $\varepsilon>0$
Result: $\varepsilon$-KKT stationary point $x^{\ast}$ with multipliers $y^{\ast}$, $z^{\ast}$

For $k=0,1,2,\ldots$ do:
  1. Select $\hat{y}^{k}\in Y$.
  2. Find an $\epsilon_{k}$-KKT stationary point $(x^{k},\lambda^{k})$ for (6), with multiplier $y^{k}$.
  3. Set $z^{k}\leftarrow\mu_{k}\nabla b(x^{k})$.
  4. If $(x^{k},y^{k},z^{k})$ satisfies (1), return $(x^{\ast},y^{\ast},z^{\ast})\leftarrow(x^{k},y^{k},z^{k})$.
  5. Set $C^{k}\leftarrow\lVert c(x^{k})\rVert$ and $V^{k}\leftarrow\lVert\min\{x^{k},-z^{k}\}\rVert$.
  6. If $k=0$ or $C^{k}\leq\max\{\varepsilon,\theta_{\rho}C^{k-1}\}$, set $\rho_{k+1}\leftarrow\rho_{k}$; else select $\rho_{k+1}\in(0,\kappa_{\rho}\rho_{k}]$.
  7. If $k=0$ or $V^{k}\leq\max\{\varepsilon,\theta_{\mu}V^{k-1}\}$, set $\mu_{k+1}\leftarrow\mu_{k}$; else select $\mu_{k+1}\in(0,\kappa_{\mu}\mu_{k}]$.
  8. Set $\epsilon_{k+1}\leftarrow\max\{\varepsilon,\kappa_{\epsilon}\epsilon_{k}\}$.

The overall procedure is detailed in Algorithm 3.1. At every outer iteration, indexed by $k$, Algorithm 3.1 requires computing an approximate stationary point, with the associated Lagrange multiplier, for the regularized barrier subproblem (6). As the dual estimate $\hat{y}^{k}$ is selected from some bounded set $Y\subset\mathbb{R}^{m}$ at step 1, the AL scheme is safeguarded and has stronger global convergence properties [6, Ch. 4]. The assignment of $z^{k}$ at step 3 follows from comparing and matching the stationarity conditions for (P) and (6). After checking termination, we monitor progress in constraint violation and complementarity, based on (1), and update the parameters $\rho_{k}$ and $\mu_{k}$ accordingly, as well as the inner tolerance $\epsilon_{k}$. At steps 6 and 7 we consider relaxed conditions for satisfactory feasibility and complementarity, as it is preferable to have the sequences $\{\rho_{k}\}$, $\{\mu_{k}\}$, and $\{\epsilon_{k}\}$ bounded away from zero, in order to avoid unnecessary ill-conditioning and tight tolerances. Sufficient conditions to guarantee boundedness of the penalty parameter $\{\rho_{k}\}$ away from zero are given, e.g., by [2, §5]. Remarkably, as established by Lemma 4.2 in Section 4, there is no need for the barrier parameter $\mu_{k}$ to vanish in order to achieve $\varepsilon$-complementarity in the sense of (1c), for $\varepsilon>0$.
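A minimal Julia sketch of this outer loop, reusing the helpers sketched above, may help fix ideas. The inner solver solve_barrier_subproblem for (6) is hypothetical, prob is assumed to bundle the problem callables grad_f, c_fun, and jac_c, and the default parameter values are illustrative; this is a sketch of the update logic, not the paper's implementation.

    using LinearAlgebra

    function regip(prob, x, y; ε=1e-5, ϵ=1e-1, ρ=1e-6, μ=1e-1,
                   κρ=0.5, κμ=0.5, κϵ=0.5, θρ=0.5, θμ=0.5,
                   ymax=1e20, kmax=1000)
        Cold = Vold = Inf
        for k in 0:kmax
            ŷ = clamp.(y, -ymax, ymax)                       # step 1: project onto Y
            x, λ, y = solve_barrier_subproblem(prob, x, y, ŷ, ρ, μ, ϵ)  # step 2
            z = z_estimate(μ, x)                             # step 3
            rs = kkt_residuals(prob.grad_f, prob.c_fun, prob.jac_c, x, y, z)
            is_eps_kkt(rs, ε) && return (x, y, z)            # step 4: ε-KKT, stop
            C = norm(prob.c_fun(x))                          # step 5
            V = norm(min.(x, -z))
            (k == 0 || C <= max(ε, θρ * Cold)) || (ρ *= κρ)  # step 6
            (k == 0 || V <= max(ε, θμ * Vold)) || (μ *= κμ)  # step 7
            ϵ = max(ε, κϵ * ϵ)                               # step 8
            Cold, Vold = C, V
        end
        error("outer iteration limit reached")
    end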

We shall mention that considering equivalent yet different subproblem formulations may affect the practical performance of the subsolver. It is enlightening to pinpoint the effect of the dual regularization in (6) and to appreciate its interactions with the linear algebra routines used to solve the linear systems arising in Newton-type methods. Although (6) has more (possibly many more) variables than (5), a simple reordering yields matrices with the same structure [10, 24]. Let us have a closer look. Defining the Lagrangian function $\mathcal{L}_{k}(x,y):=f(x)+\mu_{k}b(x)+\langle y,c(x)\rangle$, the stationarity condition for (5) reads $0=\nabla_{x}\mathcal{L}_{k}(x,y_{k}(x))$, where $y_{k}(x):=\hat{y}^{k}+\rho_{k}^{-1}c(x)$, and the corresponding Newton system is

\[
\left[H_{k}(x,y_{k}(x))+\frac{1}{\rho_{k}}\nabla c(x)^{\top}\nabla c(x)\right]\delta x
= -\nabla_{x}\mathcal{L}_{k}(x,y_{k}(x)),
\tag{7}
\]

where $H_{k}(x,y)\in\mathbb{R}^{n\times n}$ denotes the Hessian matrix $\nabla_{xx}^{2}\mathcal{L}_{k}(x,y)$ or a symmetric approximation thereof. A linear transformation yields the equivalent linear system

\[
\begin{bmatrix}H_{k}(x,y_{k}(x)) & \nabla c(x)^{\top}\\ \nabla c(x) & -\rho_{k}I\end{bmatrix}
\begin{bmatrix}\delta x\\ \delta y\end{bmatrix}
= -\begin{bmatrix}\nabla_{x}\mathcal{L}_{k}(x,y_{k}(x))\\ 0\end{bmatrix}.
\tag{8}
\]

Indeed, the second block row of (8) enforces $\delta y=\rho_{k}^{-1}\nabla c(x)\delta x$, and substituting this into the first block row recovers (7). Analogous Newton systems for (6) read

\[
\begin{bmatrix}H_{k}(x,y) & \cdot & \nabla c(x)^{\top}\\ \cdot & \rho_{k}I & -\rho_{k}I\\ \nabla c(x) & -\rho_{k}I & \cdot\end{bmatrix}
\begin{bmatrix}\delta x\\ \delta\lambda\\ \delta y\end{bmatrix}
= -\begin{bmatrix}\nabla_{x}\mathcal{L}_{k}(x,y)\\ \rho_{k}(\lambda-y)\\ c(x)+\rho_{k}(\hat{y}^{k}-\lambda)\end{bmatrix}
\tag{9}
\]

and formally solving for $\delta\lambda$ gives the condensed system

\[
\begin{bmatrix}H_{k}(x,y) & \nabla c(x)^{\top}\\ \nabla c(x) & -\rho_{k}I\end{bmatrix}
\begin{bmatrix}\delta x\\ \delta y\end{bmatrix}
= -\begin{bmatrix}\nabla_{x}\mathcal{L}_{k}(x,y)\\ c(x)+\rho_{k}(\hat{y}^{k}-y)\end{bmatrix}.
\tag{10}
\]

The resemblances between these linear systems are apparent, as well as the differences. The AL relaxation in (5) introduces a dual regularization for both the linear algebra and the nonlinear solver, whose hidden constraint $c(x)+\rho_{k}(\hat{y}^{k}-y)=0$ holds pointwise due to the identity $y=y_{k}(x)$. We remark that, entering the (2,2)-block, the dual regularization prevents issues due to linear dependence. Furthermore, the primal regularization is left to the inertia correction strategy of the subsolver, affecting the (1,1)-block as in [32, §3.1]. If the approximation $H_{k}(x,y)$ is positive definite, e.g., by adopting suitable quasi-Newton techniques, the matrix in (10) is symmetric quasi-definite and can be efficiently factorized with tailored linear algebra routines [30].
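As an illustration of how the regularized (2,2)-block enters the linear algebra, the following Julia sketch assembles and solves the condensed system (10); a dense Bunch–Kaufman factorization stands in for the tailored quasi-definite routines of [30], and all names are illustrative.

    using LinearAlgebra

    # Solve the condensed Newton system (10) for (δx, δy).
    # H ≈ Hessian of the barrier Lagrangian (symmetric), J = ∇c(x),
    # ∇L = ∇ₓ L_k(x, y), and r = c(x) + ρ (ŷ − y).
    function condensed_step(H, J, ∇L, r, ρ)
        m, n = size(J)
        R = -ρ * Matrix(I, m, m)       # dual regularization of the (2,2)-block
        K = Symmetric([H J'; J R])     # symmetric saddle-point matrix of (10)
        δ = bunchkaufman(K) \ vcat(-∇L, -r)
        return δ[1:n], δ[n+1:n+m]      # primal and dual steps (δx, δy)
    end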

4 Convergence Analysis

In this section we analyze the asymptotic properties of the iterates generated by Algorithm 3.1 under the following blanket assumptions:

  1. (a1) Functions $f\colon\mathbb{R}^{n}\to\mathbb{R}$ and $c\colon\mathbb{R}^{n}\to\mathbb{R}^{m}$ in (P) are continuously differentiable.

  2. (a2) Subproblems (6) are well-posed for all parameters' values, namely for any $\mu_{k}\leq\mu_{0}$, $\rho_{k}\leq\rho_{0}$, and $\hat{y}^{k}\in Y$.

First, we characterize the iterates in terms of stationarity.

Lemma 4.1.

Consider a sequence $\{x^{k},y^{k},z^{k}\}$ generated by Algorithm 3.1. Then, for all $k\in\mathbb{N}$, it holds that $x^{k}>0$ and $z^{k}\leq 0$, and the following conditions are satisfied:

\[
\begin{aligned}
\|\nabla f(x^{k})+\nabla c(x^{k})^{\top}y^{k}+z^{k}\| &\leq \epsilon_{k}, && \text{(11a)}\\
\|c(x^{k})+\rho_{k}(\hat{y}^{k}-y^{k})\| &\leq 2\epsilon_{k}. && \text{(11b)}
\end{aligned}
\]
Proof.

Positivity of $x^{k}$ follows from the barrier function $b$ having domain $\operatorname{dom}b=(0,\infty)^{n}$, whereas nonpositivity of $z^{k}$ is a consequence of $b_{i}^{\prime}\leq 0$ for all $i$ and $\mu_{k}>0$. Based on Definition 2.2 and step 2 of Algorithm 3.1, the $\epsilon_{k}$-KKT stationarity of $(x^{k},\lambda^{k})$ for (6), with multiplier $y^{k}$, yields (11a) along with

\[
\begin{aligned}
\rho_{k}\lVert\lambda^{k}-y^{k}\rVert &\leq \epsilon_{k}, && \text{(12a)}\\
\lVert c(x^{k})+\rho_{k}(\hat{y}^{k}-\lambda^{k})\rVert &\leq \epsilon_{k}. && \text{(12b)}
\end{aligned}
\]

By the triangle inequality, (12a)–(12b) imply (11b), since $\lVert c(x^{k})+\rho_{k}(\hat{y}^{k}-y^{k})\rVert\leq\lVert c(x^{k})+\rho_{k}(\hat{y}^{k}-\lambda^{k})\rVert+\rho_{k}\lVert\lambda^{k}-y^{k}\rVert\leq 2\epsilon_{k}$. ∎

Patterning [12, Thm 4.2(ii)], we establish asymptotic complementarity.

Lemma 4.2.

Consider a sequence $\{x^{k},y^{k},z^{k}\}$ of iterates generated by Algorithm 3.1 with $\varepsilon=0$. Then, it holds that $\lim_{k\to\infty}\min\{x^{k},-z^{k}\}=0$.

Proof.

The algorithm can terminate in finite time only if the returned triplet $(x^{\ast},y^{\ast},z^{\ast})$ satisfies $\min\{x^{\ast},-z^{\ast}\}=0$. Excluding this ideal situation, we may assume that it runs indefinitely. If $\{\mu_{k}\}$ remains bounded away from zero, then step 7 of Algorithm 3.1 enforces $V^{k}\leq\theta_{\mu}V^{k-1}$ for all large $k$, so that $V^{k}=\lVert\min\{x^{k},-z^{k}\}\rVert\to 0$ and the statement follows directly. We may thus assume $\mu_{k}\searrow 0$, recalling that $x^{k}>0$ and $z^{k}\leq 0$ for all $k\in\mathbb{N}$ by Lemma 4.1. Consider now an arbitrary index $i\in\{1,\ldots,n\}$ and the two possible cases. If $x_{i}^{k}\to 0$, then the statement readily follows from $z_{i}^{k}\leq 0$. If instead a subsequence $\{x_{i}^{k}\}_{K}$ remains bounded away from zero, then $\{b_{i}^{\prime}(x_{i}^{k})\}_{K}$ is bounded and therefore $z_{i}^{k}=\mu_{k}b_{i}^{\prime}(x_{i}^{k})\to 0$ as $k\to_{K}\infty$, proving the statement since $x_{i}^{k}>0$. The claim then follows from the arbitrariness of the index $i$ and of the subsequence. ∎

Like all penalty-type methods in the nonconvex setting, Algorithm 3.1 may generate limit points that are infeasible for (P). Patterning standard arguments, the following result gives sufficient conditions for the feasibility of limit points; cf. [6, Ex. 4.12].

Proposition 4.3.

Consider a sequence $\{x^{k},y^{k},z^{k}\}$ of iterates generated by Algorithm 3.1. Then, each limit point $x^{\ast}$ of $\{x^{k}\}$ is feasible for (P) if one of the following conditions holds:

  1. (i) the sequence $\{\rho_{k}\}$ is bounded away from zero, or

  2. (ii) there exists some $B\in\mathbb{R}$ such that for all $k\in\mathbb{N}$
\[
f(x^{k})+\frac{1}{2\rho_{k}}\lVert c(x^{k})+\rho_{k}\hat{y}^{k}\rVert^{2}\leq B.
\]

These conditions are generally difficult to check a priori. Nevertheless, if each iterate $x^{k}$ is actually a (possibly inexact) global minimizer of (6), then limit points generated by Algorithm 3.1 have minimum constraint violation and tend to minimize the objective function subject to minimal infeasibility [6, Thm 5.1, Thm 5.3]. In particular, limit points are indeed feasible if (P) admits feasible points. However, these properties cannot be expected when solving the subproblems only up to stationarity. Nonetheless, even in the case where a limit point is not necessarily feasible, the next result shows that it is at least a stationary point for a feasibility problem associated with (P).

Proposition 4.4.

Consider a sequence $\{x^{k},y^{k},z^{k}\}$ generated by Algorithm 3.1 with $\varepsilon=0$. Then each limit point $x^{\ast}$ of $\{x^{k}\}$ is KKT stationary for the problem

\[
\operatorname*{minimize}_{x\geq 0}\quad\frac{1}{2}\|c(x)\|^{2}.
\tag{13}
\]
Proof.

We may consider two cases, depending on the sequence $\{\rho_{k}\}$. If $\{\rho_{k}\}$ remains bounded away from zero, then step 6 of Algorithm 3.1 implies that $\|c(x^{k})\|\to 0$ as $k\to\infty$. Continuity of $c$ and properties of norms yield $c(x^{\ast})=0$. Furthermore, by construction, we have $x^{k}>0$ for all $k\in\mathbb{N}$, hence $x^{\ast}\geq 0$. Altogether, this shows that $x^{\ast}$ is feasible for (P), namely a global minimizer for the feasibility problem (13) and, therefore, a KKT stationary point thereof. Assume now that $\rho_{k}\to 0$. Define $\delta^{k}\in\mathbb{R}^{n}$ and $\eta^{k}\in\mathbb{R}^{m}$ as

\[
\begin{aligned}
\delta^{k} &:= \nabla f(x^{k})+\nabla c(x^{k})^{\top}y^{k}+z^{k},\\
\eta^{k} &:= c(x^{k})+\rho_{k}(\hat{y}^{k}-y^{k})
\end{aligned}
\]

for all $k\in\mathbb{N}$. In view of Lemma 4.1, we have that $\lVert\delta^{k}\rVert\leq\epsilon_{k}$ and $\lVert\eta^{k}\rVert\leq 2\epsilon_{k}$ hold for all $k\in\mathbb{N}$. Multiplying $\delta^{k}$ by $\rho_{k}$, substituting $y^{k}$, and rearranging, we obtain

\[
\rho_{k}\delta^{k} = \rho_{k}\nabla f(x^{k})+\nabla c(x^{k})^{\top}\left[\rho_{k}\hat{y}^{k}+c(x^{k})-\eta^{k}\right]+\rho_{k}z^{k}.
\]

Now, let $x^{\ast}$ be a limit point of $\{x^{k}\}$ and $\{x^{k}\}_{K}$ a subsequence such that $x^{k}\to_{K}x^{\ast}$. Then the sequence $\{\nabla f(x^{k})\}_{K}$ is bounded, and so is $\{\hat{y}^{k}\}_{K}\subset Y$ by construction. Recalling from Lemma 4.1 that $x^{k}>0$ and $z^{k}\leq 0$, and observing that $0\leq\lVert\delta^{k}\rVert,\lVert\eta^{k}\rVert\leq 2\epsilon_{k}\to 0$, we now take the limit of $\rho_{k}\delta^{k}$ for $k\to_{K}\infty$, resulting in

\[
0 = \nabla c(x^{\ast})^{\top}c(x^{\ast})+\tilde{z}^{\ast}
\]

for some $\tilde{z}^{\ast}\leq 0$. Being a limit point of $\{\rho_{k}z^{k}\}$, $\tilde{z}^{\ast}$ together with $x^{\ast}$ satisfies $\min\{x^{\ast},-\tilde{z}^{\ast}\}=0$ by Lemma 4.2. Since we also have $x^{\ast}\geq 0$, it follows that $x^{\ast}$ is KKT stationary for (13) according to Definition 2.2. ∎

Finally, we qualify the output of Algorithm 3.1 in the case of feasible limit points. In particular, it is shown that any feasible limit point is AKKT stationary for (P) in the sense of Definition 2.3. Under some additional boundedness conditions, feasible limit points are KKT stationary, according to Definition 2.2.

Theorem 4.5.

Let $\{x^{k},y^{k},z^{k}\}$ be a sequence of iterates generated by Algorithm 3.1 with $\varepsilon=0$. Let $x^{\ast}$ be a feasible limit point of $\{x^{k}\}$ and $\{x^{k}\}_{K}$ a subsequence such that $x^{k}\to_{K}x^{\ast}$. Then,

  1. (i) $x^{\ast}$ is an AKKT stationary point for (P).

  2. (ii) If $\{y^{k},z^{k}\}_{K}$ remain bounded, then $x^{\ast}$ is KKT stationary for (P).

Proof.

(i) Together with the fact that $\epsilon_{k}\to 0$, Lemma 4.1 ensures that the sequence $\{x^{k}\}_{K}$ satisfies condition (2a), whereas Lemma 4.2 implies (2b). Feasibility of $x^{\ast}$ completes the proof.

(ii) By boundedness, the subsequences $\{y^{k}\}_{K}$ and $\{z^{k}\}_{K}$ admit limit points $y^{\ast}$ and $z^{\ast}$, respectively. Thus, from the previous assertion and by continuity arguments on $f$ and $c$, it follows that $x^{\ast}$ is KKT stationary for (P), not only asymptotically. ∎

Provided that the iterates admit a feasible limit point, finite termination of Algorithm 3.1 with an $\varepsilon$-KKT stationary point can be established as a direct consequence of Theorem 4.5.

5 Numerical Results

In this section we test an instance of the proposed regularized interior point approach, denoted RegIP, on the CUTEst benchmark problems [19]. RegIP is compared in terms of robustness against the IP solver Ipopt [32] and the AL solver Percival [15], which is based on a BCL method [8] coupled with a trust-region matrix-free solver [21] for the subproblems. We report neither runtimes nor iteration counts, since a fair comparison would require close inspection of heuristics and fallbacks [32, §3].

We implemented RegIP in Julia and set up the numerical experiments adopting the JSO software infrastructure by [23]. The IP solver Ipopt acts as subsolver to execute step 2 of Algorithm 3.1, warm-started at the current primal $(x^{k-1},y^{k-1})$ and dual $(y^{k-1},z^{k-1})$ estimates. We use its parameter tol to set the (inner) tolerance $\epsilon_{k}$, disabling other termination conditions, and let Ipopt control the barrier parameter as needed to approximately solve the regularized subproblem.^2 We let the safeguarding set be $Y:=\{v\in\mathbb{R}^{m}\,|\,\|v\|_{\infty}\leq 10^{20}\}$ and choose $\hat{y}^{k}$ by projecting the current estimate $y^{k-1}$ onto $Y$. We set the initial penalty parameter to $\rho_{0}=10^{-6}$, the inner tolerance to $\epsilon_{0}=\sqrt[3]{\varepsilon}$, and the parameters $\theta_{\rho}=0.5$, $\kappa_{\rho}=0.5$, and $\kappa_{\epsilon}=0.5$. RegIP declares success, and returns an $\varepsilon$-KKT stationary point, as soon as $\epsilon_{k}\leq\varepsilon$ and $C^{k}\leq\varepsilon$.^3 Instead, if $\epsilon_{k}\leq\varepsilon$, $C^{k}>\varepsilon$, and $\rho_{k}\leq\rho_{\min}:=10^{-20}$, RegIP stops and declares (local) infeasibility. For Ipopt, we set the tolerance tol to $\varepsilon$, remove the other desired thresholds, and disable termination based on acceptable iterates. For Percival, we set the absolute and relative tolerances atol, rtol, and ctol to $\varepsilon$. We select all CUTEst problems with at most 1000 variables and constraints, obtaining a test set with 609 problems. All solvers are provided with the primal-dual initial point available in CUTEst, a time limit of 60 seconds, a maximum of $10^{9}$ iterations, and a tolerance $\varepsilon\in\{10^{-3},10^{-5}\}$. A solver is deemed to solve a problem if it returns with a successful status; it fails otherwise. The source codes for the numerical experiments have been archived on Zenodo at

doi: 10.5281/zenodo.7109904.

^2 Solving a sequence of barrier subproblems may hinder the computational efficiency of RegIP compared to the approach behind Algorithm 3.1, but does not degrade its reliability. Ongoing research focuses on solving (6) and letting the IP subsolver update the barrier parameter after warm-starting at the current primal-dual estimate, in the spirit of [7, Alg. 3].

^3 The condition $\epsilon_{k}\leq\varepsilon$ implies both $\varepsilon$-stationarity (1a) and $\varepsilon$-complementarity (1c) required for $\varepsilon$-KKT stationarity. This follows from the observation that, in RegIP, the subsolver approximately solves (4) at step 2, not (6); see Footnote 2.

Table 1 summarizes the results, stratified by solver, termination tolerance ($\varepsilon$), and problem size ($n$, $m$). For each combination, we indicate the number of times RegIP solves a problem that the other solver fails (“W”) or solves (“T+”), and the number of times RegIP fails on a problem that the other one fails (“T-”) or solves (“L”). The results show that RegIP succeeds on more problems than the other solvers, consistently for both low and high accuracy, indicating that the underlying regularized IP approach can form the basis for reliable and scalable solvers.^4

^4 We agree with [5, §5] on the fact that “strong statements concerning the relative efficiency or robustness [...] are not possible in nonlinear optimization.”

Table 1: Comparison on CUTEst problems with $n$ variables and $m$ constraints.

RegIP against Ipopt

  Size $n,m$    | $\varepsilon=10^{-3}$: W / T+ / T- / L | $\varepsilon=10^{-5}$: W / T+ / T- / L
  0–10          | 19 / 417 / 25 / 3                      | 15 / 417 / 29 / 3
  11–100        |  6 /  60 /  7 / 1                      |  7 /  58 /  8 / 1
  101–1000      |  6 /  57 /  8 / 0                      |  7 /  53 / 10 / 1

RegIP against Percival

  Size $n,m$    | $\varepsilon=10^{-3}$: W / T+ / T- / L | $\varepsilon=10^{-5}$: W / T+ / T- / L
  0–10          | 17 / 419 / 20 / 8                      | 19 / 413 / 24 / 8
  11–100        | 12 /  54 /  8 / 0                      | 18 /  47 /  8 / 1
  101–1000      | 37 /  26 /  8 / 0                      | 37 /  23 / 11 / 0

6 Conclusion

This paper has presented a regularized interior point approach to solving constrained nonlinear optimization problems. Operating as an outer regularization layer, a quadratic proximal penalty provides robustness whilst consuming minimal computational effort once embedded into existing interior point codes as a principled inertia correction strategy. Furthermore, regularizing the equality constraints makes it possible to safely adopt more efficient linear algebra routines, while waiving the need for an infeasibility detection mechanism within the subsolver. Preliminary numerical results indicate that a close integration of proximal regularization within interior point schemes is key to providing efficient and robust solvers. We encourage further research in this direction.

Acknowledgements

I gratefully acknowledge the support of Ryan Loxton and the Centre for Optimisation and Decision Science for giving me the opportunity to visit Curtin University. I would also like to thank Hoa T. Bui for her friendly hospitality and lively discussions during this time in Perth.

References

  • [1] A. Altman and J. Gondzio, Regularized symmetric indefinite systems in interior point methods for linear and quadratic optimization, Optimization Methods and Software 11 (1999), 275–302, doi:10.1080/10556789908805754.
  • [2] R. Andreani, E. G. Birgin, J. M. Martínez, and M. L. Schuverdt, On Augmented Lagrangian Methods with General Lower-Level Constraints, SIAM Journal on Optimization 18 (2008), 1286–1309, doi:10.1137/060654797.
  • [3] P. Armand and N. N. Tran, Rapid infeasibility detection in a mixed logarithmic barrier-augmented Lagrangian method for nonlinear optimization, Optimization Methods and Software 34 (2019), 991–1013, doi:10.1080/10556788.2018.1528250.
  • [4] E. Bertolazzi, F. Biral, and M. Da Lio, Real-Time Motion Planning for Multibody Systems, Multibody System Dynamics 17 (2007), 119–139, doi:10.1007/s11044-007-9037-7.
  • [5] E. G. Birgin, L. F. Bueno, and J. M. Martínez, Sequential equality-constrained optimization for nonlinear programming, Computational Optimization and Applications 65 (2016), 699–721, doi:10.1007/s10589-016-9849-6.
  • [6] E. G. Birgin and J. M. Martínez, Practical Augmented Lagrangian Methods for Constrained Optimization, Society for Industrial and Applied Mathematics, Philadelphia, PA, 2014.
  • [7] S. Cipolla and J. Gondzio, Proximal stabilized Interior Point Methods for quadratic programming and low-frequency-updates preconditioning techniques, 2022, doi:10.48550/arxiv.2205.01775.
  • [8] A. R. Conn, N. I. M. Gould, and P. L. Toint, A Globally Convergent Augmented Lagrangian Algorithm for Optimization with General Constraints and Simple Bounds, SIAM Journal on Numerical Analysis 28 (1991), 545–572, doi:10.1137/0728030.
  • [9] F. E. Curtis, A penalty-interior-point algorithm for nonlinear constrained optimization, Mathematical Programming Computation 4 (2012), 181–209, doi:10.1007/s12532-012-0041-4.
  • [10] A. De Marchi, Augmented Lagrangian methods as dynamical systems for constrained optimization, in 60th IEEE Conference on Decision and Control (CDC), IEEE, Austin, TX, 2021, 6533–6538, doi:10.1109/cdc45484.2021.9683199.
  • [11] A. De Marchi, On a Primal-Dual Newton Proximal Method for Convex Quadratic Programs, Computational Optimization and Applications (2022), doi:10.1007/s10589-021-00342-y.
  • [12] A. De Marchi and A. Themelis, An Interior Proximal Gradient Method for Nonconvex Optimization, 2022, doi:10.48550/arxiv.2208.00799.
  • [13] M. Dehghani, A. Lambe, and D. Orban, A regularized interior-point method for constrained linear least squares, INFOR: Information Systems and Operational Research 58 (2020), 202–224, doi:10.1080/03155986.2018.1559428.
  • [14] M. Diehl, H. J. Ferreau, and N. Haverbeke, Efficient numerical methods for nonlinear MPC and moving horizon estimation, in Nonlinear Model Predictive Control: Towards New Challenging Applications, Springer, 2009, 391–417, doi:10.1007/978-3-642-01094-1_32.
  • [15] E. A. dos Santos and A. S. Siqueira, Percival.jl: an augmented Lagrangian method, 2020, doi:10.5281/zenodo.3969045, https://github.com/JuliaSmoothOptimizers/Percival.jl.
  • [16] A. V. Fiacco and G. P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques, Wiley, New York, 1968.
  • [17] M. P. Friedlander and D. Orban, A primal-dual regularized interior-point method for convex quadratic programs, Mathematical Programming Computation 4 (2012), 71–107, doi:10.1007/s12532-012-0035-2.
  • [18] J. Gondzio, Interior point methods 25 years later, European Journal of Operational Research 218 (2012), 587–601, doi:10.1016/j.ejor.2011.09.017.
  • [19] N. I. M. Gould, D. Orban, and P. L. Toint, CUTEst: a Constrained and Unconstrained Testing Environment with safe threads for mathematical optimization, Computational Optimization and Applications 60 (2015), 545–557, doi:10.1007/s10589-014-9687-3.
  • [20] D. Liao-McPherson and I. Kolmanovsky, FBstab: A proximally stabilized semismooth algorithm for convex quadratic programming, Automatica 113 (2020), 108801, doi:10.1016/j.automatica.2019.108801.
  • [21] C. J. Lin and J. J. Moré, Newton's Method for Large Bound-Constrained Optimization Problems, SIAM Journal on Optimization 9 (1999), 1100–1127, doi:10.1137/s1052623498345075.
  • [22] D. Ma, K. L. Judd, D. Orban, and M. A. Saunders, Stabilized Optimization Via an NCL Algorithm, in Numerical Analysis and Optimization, M. Al-Baali, L. Grandinetti, and A. Purnama (eds.), Springer, 2018, 173–191, doi:10.1007/978-3-319-90026-1_8.
  • [23] D. Orban and A. S. Siqueira, JuliaSmoothOptimizers: Infrastructure and Solvers for Continuous Optimization in Julia, 2019, doi:10.5281/zenodo.2655082, https://juliasmoothoptimizers.github.io.
  • [24] A. Potschka and H. G. Bock, A sequential homotopy method for mathematical programming problems, Mathematical Programming 187 (2021), 459–486, doi:10.1007/s10107-020-01488-z.
  • [25] R. T. Rockafellar, Augmented Lagrange Multiplier Functions and Duality in Nonconvex Programming, SIAM Journal on Control 12 (1974), 268–285, doi:10.1137/0312021.
  • [26] R. T. Rockafellar, Monotone Operators and the Proximal Point Algorithm, SIAM Journal on Control and Optimization 14 (1976), 877–898, doi:10.1137/0314056.
  • [27] N. Saraf and A. Bemporad, An efficient bounded-variable nonlinear least-squares algorithm for embedded MPC, Automatica 141 (2022), 110293, doi:10.1016/j.automatica.2022.110293.
  • [28] A. S. Siqueira and D. Orban, A Regularized Interior-Point Method for Constrained Nonlinear Least Squares, 2018, https://abelsiqueira.github.io/assets/2018-07-23-xiibrazopt.pdf (visited September 21, 2022). XII Brazilian Workshop on Continuous Optimization.
  • [29] P. Sopasakis, E. Fresk, and P. Patrinos, OpEn: Code Generation for Embedded Nonconvex Optimization, IFAC-PapersOnLine 53 (2020), 6548–6554, doi:10.1016/j.ifacol.2020.12.071. 21st IFAC World Congress.
  • [30] R. J. Vanderbei, Symmetric Quasidefinite Matrices, SIAM Journal on Optimization 5 (1995), 100–113, doi:10.1137/0805005.
  • [31] R. J. Vanderbei and D. F. Shanno, An Interior-Point Algorithm for Nonconvex Nonlinear Programming, Computational Optimization and Applications 13 (1999), 231–252, doi:10.1023/a:1008677427361.
  • [32] A. Wächter and L. T. Biegler, On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming, Mathematical Programming 106 (2006), 25–57, doi:10.1007/s10107-004-0559-y.