
DEFT-FUNNEL: an open-source global optimization solver for constrained grey-box and
black-box problems

Phillipe R. Sampaio
Veolia Research and Innovation
Maisons-Laffitte, France
[email protected]
Abstract

The fast-growing need for grey-box and black-box optimization methods for constrained global optimization problems in fields such as medicine, chemistry, engineering and artificial intelligence has contributed to the design of new efficient algorithms for finding the best possible solution. In this work, we present DEFT-FUNNEL, an open-source global optimization algorithm for general constrained grey-box and black-box problems that belongs to the class of trust-region sequential quadratic optimization algorithms. It extends the previous works by Sampaio and Toint (2015, 2016) to a global optimization solver that is able to exploit information from closed-form functions. Polynomial interpolation models are used as surrogates for the black-box functions, and a clustering-based multistart strategy is applied to search for the global minima. Numerical experiments show that DEFT-FUNNEL compares favorably with other state-of-the-art methods on two sets of benchmark problems: one set containing problems where every function is a black box and another set with problems where some of the functions and their derivatives are known to the solver. The code as well as the test sets used for the experiments are available at the Github repository http://github.com/phrsampaio/deft-funnel.

Keywords Global optimization  \cdot constrained nonlinear optimization  \cdot black-box optimization  \cdot grey-box optimization  \cdot derivative-free optimization  \cdot simulation-based optimization  \cdot trust-region method  \cdot subspace minimization  \cdot sequential quadratic optimization

1 Introduction

We are interested in finding the global minimum of the optimization problem

\left\{\begin{array}{rl}\displaystyle\min_{x}&f(x)\\ \textrm{s.t.:}&l^{c}\leq c(x)\leq u^{c},\\ &l^{h}\leq h(x)\leq u^{h},\\ &l^{x}\leq x\leq u^{x},\end{array}\right.\qquad(5)

where $f:\mathbb{R}^{n}\rightarrow\mathbb{R}$ may or may not be a black box, $c:\mathbb{R}^{n}\rightarrow\mathbb{R}^{q}$ are black-box constraint functions and $h:\mathbb{R}^{n}\rightarrow\mathbb{R}^{l}$ are white-box constraint functions, i.e., their analytical expressions as well as their derivatives are available. The vectors $l^{c}$, $l^{h}$, $u^{c}$ and $u^{h}$ are lower and upper bounds on the constraint values $c(x)$ and $h(x)$, while $l^{x}$ and $u^{x}$ are bounds on the variables $x$, with $l^{c}\in(\mathbb{R}\cup\{-\infty\})^{q}$, $l^{h}\in(\mathbb{R}\cup\{-\infty\})^{l}$, $u^{c}\in(\mathbb{R}\cup\{\infty\})^{q}$, $u^{h}\in(\mathbb{R}\cup\{\infty\})^{l}$, $l^{x}\in(\mathbb{R}\cup\{-\infty\})^{n}$ and $u^{x}\in(\mathbb{R}\cup\{\infty\})^{n}$. We also address the case where there are no white-box constraint functions $h$. Finally, we assume that the bound constraints are unrelaxable, i.e. feasibility with respect to them must be maintained throughout the iterations, while the other general constraints are relaxable.

When at least one of the functions in (5) has a closed form, i.e. either the objective function is a white box or there is at least one white-box constraint function in the problem, the problem is said to be a grey-box problem. If no information about the functions is given at all, which means that the objective function is a black box and that there are no white-box constraints, the problem is known as a black-box problem. Both grey-box and black-box optimization belong to the field of derivative-free optimization (DFO) [12, 4], where the derivatives of the functions are not available. DFO problems are encountered in real-life applications in various fields such as engineering, medicine, science and artificial intelligence. The black boxes are often the result of an expensive simulation or a proprietary code, in which case automatic differentiation [21, 22] is not applicable.

Many optimization methods have been developed for finding stationary points or local minima of (5) when both the objective and the constraint functions are black boxes (e.g., [36, 29, 10, 5, 42, 43, 14, 1]). However, a local minimum is sometimes not enough and one then needs to search for the global minimum [16, 15]. Only a small number of methods have been proposed to find the global minima of constrained black-box problems (see, for instance, [25, 39, 40, 37, 38, 9]), and thus there is still much research to be done in this area. Moreover, many global optimization methods for constrained black-box problems proposed in the literature or used in industry are unavailable to the public and are not open source. We refer the reader to the survey papers [41, 28] and to the textbooks [12, 4] for a comprehensive review of DFO algorithms for different types of problems.

In the case of constrained grey-box problems, especially those found in industry, it is a good idea to exploit the available information about the white boxes, since such problems are usually hard to solve, i.e., highly nonlinear, multimodal and built from very expensive functions. One would therefore expect the solver to use any information given as input in order to attain the global minimum as fast as possible. Unfortunately, even fewer global optimization solvers exist for such problems today. A common workaround adopted by optimization researchers, engineers and practitioners is to treat all the functions as black boxes and to use a black-box optimization algorithm to solve the problem. Two of the few methods that exploit the available information are ARGONAUT [9] and the trust-region two-phase algorithm proposed in [6]. In ARGONAUT, the black-box functions are replaced by surrogate models and a global optimization algorithm is used to solve the resulting problem to global optimality, the surrogate models being updated only after each such solve. After updating the models, the problem is solved again with the updated models, and this process is repeated until convergence is declared. In the two-phase algorithm described in [6], radial basis functions (RBF) are used as surrogate models for the black-box functions. Moreover, as in DEFT-FUNNEL, the self-correcting geometry approach proposed in [44] is applied to the management of the interpolation set. Their algorithm is composed of a feasibility phase, where the goal is to find a feasible point, and an optimization phase, where the feasible point found in the first phase is used as a starting point for finding a global minimum.

The algorithm in [6] is the one sharing the most elements in common with DEFT-FUNNEL. However, these two methods are still very different in nature, since DEFT-FUNNEL combines a multistart strategy with a sequential quadratic optimization (SQO) algorithm in order to find global minima, while the other applies a global optimization solver in both of its phases for this purpose. Besides, DEFT-FUNNEL employs local polynomial interpolation models rather than RBF models, as the former have good performance in the context of local optimization, which suits the SQO algorithm used in its local search well. Despite the good performance of the algorithms proposed in [9] and [6], neither is freely available or open source.

Contributions. This paper proposes a new global optimization solver for general constrained grey-box and black-box problems written in Matlab [32] that exploits any available information given as input and that employs surrogate models built from polynomial interpolation in a trust-region-based SQO algorithm. Unlike the ARGONAUT approach, the surrogate models are updated during the optimization process as soon as new information from the evaluation of the functions at the iterates becomes available. Furthermore, the proposed solver, named DEFT-FUNNEL, is open source and freely available at the Github repository http://github.com/phrsampaio/deft-funnel. It is based on the previous works in [42, 43] and extends the original DFO algorithm to grey-box problems and to the search for global minima. Like its previous versions, it does not require feasible starting points. To our knowledge, DEFT-FUNNEL is the first open-source global optimization solver for general constrained grey-box and black-box problems that exploits the derivative information available from the white-box functions. It is also the first in the class of trust-funnel algorithms [19] to be used in the search for global minima in both derivative-based and derivative-free optimization.

This paper also serves as the first release of the DEFT-FUNNEL code to the open-source community and to the public in general. For this reason, and also due to the new extensions mentioned above, we give a complete description of the algorithm so that the reader can understand the method more easily while examining the code on Github. We also note that some modifications and additions have been made to the local-search SQO algorithm with respect to the one presented in [43]. In particular, changes were made to the condition for the normal step calculation, to the criticality step and to the maintenance of the interpolation set, all of which are described in due course. Furthermore, we have also added a second-order correction step.

The extension to global optimization is based on the multi-level single linkage (MLSL) method [26, 27], a well-known stochastic multistart strategy that combines random sampling, a clustering-based approach for the selection of the starting points and local searches in order to identify all local minima. In DEFT-FUNNEL, it is used for selecting the starting points of the local searches done with the trust-funnel SQO algorithm.

Organization. The outline of this paper is as follows. Section 2 introduces the MLSL method, while in Section 3 the DEFT-FUNNEL solver is presented in detail. In Section 4, numerical results on a set of benchmark problems for global optimization are shown and the performance of DEFT-FUNNEL is compared with that of other state-of-the-art algorithms in a black-box setting. Moreover, numerical results on a set of grey-box problems are also analyzed. Finally, some conclusions about the proposed solver are drawn in Section 5.

Notation. Unless otherwise specified, the norm $\|\cdot\|$ is the standard Euclidean norm. Given any vector $x\in\mathbb{R}^{n}$, we denote its $i$-th component by $\left[x\right]_{i}$. We define $\left[x\right]^{+}=\max(0,x)$, where the $\max$ operation is applied componentwise. We let ${\cal B}(z;\Delta)$ denote the closed Euclidean ball centered at $z$ with radius $\Delta>0$. Given any set ${\cal A}$, $|{\cal A}|$ denotes the cardinality of ${\cal A}$. By ${\cal P}_{n}^{d}$, we mean the space of all polynomials of degree at most $d$ in $\mathbb{R}^{n}$. Finally, given any subspace ${\cal S}$, we denote its dimension by $\dim({\cal S})$.

2 Multi-level single linkage

The MLSL method [26, 27] is a stochastic multistart strategy originally designed for bound-constrained global optimization problems as below

\left\{\begin{array}{rl}\displaystyle\min&f(x)\\ \textrm{s.t.:}&x\in\Omega,\end{array}\right.\qquad(8)

where $\Omega\subseteq\mathbb{R}^{n}$ is a convex, compact set defined by lower and upper bounds that contains the global minimum as an interior point. It was later extended to problems with general constraints in [45]. As most stochastic multistart methods, it consists of a global phase, where random points are sampled from a probability distribution, and a local phase, where selected points from the global phase are used as starting points for local searches done by some local optimization algorithm. MLSL aims at avoiding unnecessary and costly local searches that culminate in the same local minima. To achieve this goal, sample points are drawn from a uniform distribution in the global phase, and a local search procedure is then applied to each of them unless there is another sample point or a previously detected local minimum within a critical distance that has a smaller objective function value. The method is fully described in Algorithm 2.1.

 Algorithm 2.1: MLSL 

1: ${\cal L}^{*}=\emptyset$
2: for $k=1$ to $\ldots$ do
3:     Generate $N$ random points uniformly distributed.
4:     Rank the sample points by increasing value of $f$.
5:     for $i=1$ to $kN$ do
6:         if $\not\exists\, x: \|x-x_{i}\|\leq r_{k}$ and $f(x)<f(x_{i})$ then
7:             ${\cal L}^{*}={\cal L}^{*}\cup LocalSearch(x_{i})$
8:         end if
9:     end for
10: end for
11: return the best local minimum found in ${\cal L}^{*}$

The critical distance $r_{k}$ is defined as

r_{k}\stackrel{\rm def}{=}\pi^{-1/2}\left(\Gamma\left(1+\frac{n}{2}\right)m(S)\frac{\sigma\log kN}{kN}\right)^{1/n},\qquad(9)

where $\Gamma$ is the gamma function and $m(S)$ is the Lebesgue measure of the set $S$. The method is centred on the idea of exploring the region of attraction of all local minima, which is formally defined below.

Definition 2.1.

Given a local search procedure ${\cal P}$, we define a region of attraction ${\cal R}(x^{*})$ in $\Omega$ to be the set of all points in $\Omega$ starting from which ${\cal P}$ will arrive at $x^{*}$.

The ideal multistart method is the one that runs a local search only once at the region of attraction of every local minimum. However, two types of errors might occur in practice [30]:

  • Error 1. The same local minimum $x^{*}$ has been found after applying local search to two or more points belonging to the same region of attraction of $x^{*}$.

  • Error 2. The region of attraction of a local minimum $x^{*}$ contains at least one sampled point, but local search has never been applied to points in this region.

In [26, 27], the authors demonstrate the following theoretical properties of MLSL that are directly linked to the errors above:

  • Property 1. (Theorem 8 in [26] and Theorem 1 in [27]) If $\sigma>4$ in (9), then, even if the sampling continues forever, the total number of local searches ever started by MLSL is finite with probability 1.

  • Property 2. (Theorem 12 in [26] and Theorem 2 in [27]) If $r_{k}$ tends to 0 with increasing $k$, then any local minimum $x^{*}$ will be found within a finite number of iterations with probability 1.

Property 1 states that the number of possible occurrences of Error 1 is finite, while Property 2 says that Error 2 never happens. Due to its strong theoretical results and good practical performance, MLSL has become one of the most reliable and popular multistart methods.
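
The selection test of Algorithm 2.1 together with the critical distance (9) can be implemented in a few lines. The MATLAB sketch below is an illustration only, not the DEFT-FUNNEL implementation: it assumes that $\Omega$ is a box $[l,u]$, takes $m(S)$ as the volume of that box, and omits the comparison against previously found local minima. Here X holds the $kN$ ranked sample points as rows and fX their objective values.

function starts = mlsl_select_starts(X, fX, l, u, k, N, sigma)
  % Critical distance r_k of (9) for a box Omega = [l, u], with m(S) = box volume.
  n  = size(X, 2);
  mS = prod(u - l);
  rk = pi^(-1/2) * (gamma(1 + n/2) * mS * sigma * log(k*N) / (k*N))^(1/n);
  starts = [];
  for i = 1:size(X, 1)
      % Start a local search at x_i unless a nearby sample point has a smaller f-value.
      dists = vecnorm(X - X(i,:), 2, 2);
      if ~any(dists <= rk & fX(:) < fX(i))
          starts = [starts; X(i,:)];        %#ok<AGROW>
      end
  end
end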

Finally, we note that MLSL has already been applied to the constrained black-box global optimization setting in [2]. The proposed method, MLSL-MADS, integrates MLSL with a mesh adaptive direct search (MADS) method [3] to find multiple local minima of an inverse transport problem involving black-box functions.

3 The DEFT-FUNNEL solver

We first define the function $z:\mathbb{R}^{n}\rightarrow\mathbb{R}^{q+l}$ as $z(x)\stackrel{\rm def}{=}(c(x),h(x))$, i.e. it includes all the constraint functions of the original problem (5). Then, by defining $f(x,s)\stackrel{\rm def}{=}f(x)$ and $z(x,s)\stackrel{\rm def}{=}z(x)-s$, the problem (5) is rewritten as

\left\{\begin{array}{rl}\displaystyle\min_{(x,s)}&f(x,s)\\ \textrm{s.t.:}&z(x,s)=0,\\ &l^{s}\leq s\leq u^{s},\\ &l^{x}\leq x\leq u^{x},\end{array}\right.\qquad(14)

where $s\in\mathbb{R}^{q+l}$ are slack variables and $l^{s}\in(\mathbb{R}\cup\{-\infty\})^{q+l}$ and $u^{s}\in(\mathbb{R}\cup\{\infty\})^{q+l}$ are the lower and upper bounds of the modified problem, with $l^{s}=\left[l^{c}\;l^{h}\right]^{T}$ and $u^{s}=\left[u^{c}\;u^{h}\right]^{T}$. We highlight that the rewriting of the original problem (5) as (14) is done within the solver and that the user does not need to interfere.

DEFT-FUNNEL is composed of a global search and a local search that are combined to solve the problem (14). In the next two sections, we elaborate on each of these search steps.

3.1 Global search

As mentioned previously, the global search in DEFT-FUNNEL relies on the MLSL multistart strategy. A merit function $\Phi(x)$ is used to decide which starting points are selected for the local search. We introduce the global search in detail in what follows.

 Algorithm 3.1: GlobalSearch 

1: ${\cal L}^{*}=\emptyset$
2: for $k=1$ to $\ldots$ do
3:     Generate $N$ random points uniformly distributed.
4:     Rank the sample points by increasing value of $\Phi$.
5:     for $i=1$ to $kN$ do
6:         if $\not\exists\, x: \|x-x_{i}\|\leq r_{k}$ and $\Phi(x)<\Phi(x_{i})$ then
7:             ${\cal L}^{*}={\cal L}^{*}\cup LocalSearch(x_{i})$
8:         end if
9:     end for
10: end for
11: return the best feasible local minimum found in ${\cal L}^{*}$

 

Algorithm 3.1 is implemented in the function deft_funnel_multistart, which is called by typing the following line in the Matlab command window:

[best_sol, best_fval, best_indicators, total_eval, nb_local_searches, fL] = deft_funnel_multistart(@f, @c, @h, @dev_f, @dev_h, n, nb_cons_c, nb_cons_h)

The inputs and outputs of deft_funnel_multistart are detailed in Table 1. The options for the input ‘whichmodel’ are described in Section 3.2.

Table 1: Inputs and outputs of the function deft_funnel_multistart.

Mandatory inputs:
  f                  function handle of the objective function
  c                  function handle of the black-box constraints if any or an empty array
  h                  function handle of the white-box constraints if any or an empty array
  dev_f              function handle of the derivatives of f if it is a white box or an empty array
  dev_h              function handle of the derivatives of h if any or an empty array
  n                  number of decision variables
  nb_cons_c          number of black-box constraints (bound constraints not included)
  nb_cons_h          number of white-box constraints (bound constraints not included)

Optional inputs:
  lsbounds           vector of lower bounds for the constraints
  usbounds           vector of upper bounds for the constraints
  lxbounds           vector of lower bounds for the x variables
  uxbounds           vector of upper bounds for the x variables
  maxeval            maximum number of evaluations (default: 5000*n)
  maxeval_ls         maximum number of evaluations per local search (default: maxeval*0.7)
  whichmodel         approach to build the surrogate models
  f_global_optimum   known objective function value of the global optimum

Outputs:
  best_sol           best feasible solution found
  best_fval          objective function value of best_sol
  best_indicators    indicators of best_sol
  total_eval         number of evaluations used
  nb_local_searches  number of local searches done
  fL                 objective function values of all local minima found
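
As an illustration, the call below sketches how a small problem with one black-box constraint and no white-box constraints might be passed to deft_funnel_multistart. The mandatory arguments follow the signature above; the way the optional inputs of Table 1 are supplied (assumed here to be name-value pairs) should be checked against the examples in the repository.

% Hypothetical usage sketch; consult the repository examples for the exact calling convention.
f = @(x) (x(1) - 1)^2 + (x(2) - 2)^2;    % objective function (treated as a black box)
c = @(x) x(1)^2 + x(2)^2;                % black-box constraint function
h = []; dev_f = []; dev_h = [];          % no white-box functions, no derivatives

[best_sol, best_fval, best_indicators, total_eval, nb_local_searches, fL] = ...
    deft_funnel_multistart(f, c, h, dev_f, dev_h, 2, 1, 0, ...
        'lsbounds', 0, 'usbounds', 2, ...          % 0 <= c(x) <= 2
        'lxbounds', [-5 -5], 'uxbounds', [5 5]);   % -5 <= x <= 5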

The merit function $\Phi(x)$ employed in DEFT-FUNNEL is the well-known $\ell_{1}$ penalty function, which is defined as follows:

\Phi(x)\stackrel{\rm def}{=}f(x)+\pi\displaystyle\sum_{i=1}^{m}\left(\left[z(x)-u^{s}\right]^{+}_{i}+\left[l^{s}-z(x)\right]^{+}_{i}\right),\qquad(15)

where $m=q+l$ is the total number of constraints and $\pi$ is the penalty parameter. One of the advantages of this penalty function over others is that it is exact, that is, for sufficiently large values of $\pi$, a local minimum of $\Phi(x)$ subject only to the bound constraints on the $x$ variables is also a local minimum of the original constrained problem (5). We note that, although $\Phi(x)$ is nondifferentiable, it is only used in the global search for selecting the starting points for the local searches.
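
For reference, a direct MATLAB transcription of (15) is given below; f and z are function handles returning $f(x)$ and the stacked constraint values $z(x)=(c(x),h(x))$, and pen plays the role of the penalty parameter $\pi$ (the helper name is ours, not part of the solver).

function val = l1_merit(f, z, ls, us, pen, x)
  % l1 penalty merit function (15): objective plus penalized violation of l^s <= z(x) <= u^s.
  zx   = z(x);
  viol = max(zx - us, 0) + max(ls - zx, 0);   % componentwise [.]^+ terms
  val  = f(x) + pen * sum(viol);
end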

The LocalSearch algorithm at line 7 is started by calling the function deft_funnel whose inputs and outputs are given in the next section.

3.2 Local search

The algorithm used in the local search is based on the one described in [43]. It is a trust-region SQO method that makes use of a funnel bound on the infeasibility of the iterates in order to ensure convergence. At each iteration $k$, we have an iterate $(x_{k},s_{k})$ such that $x_{k}\in{\cal Y}_{k}$, where ${\cal Y}_{k}$ denotes the interpolation set at iteration $k$. Every iterate satisfies the following bound constraints:

l^{s}\leq s_{k}\leq u^{s},\qquad(16)
l^{x}\leq x_{k}\leq u^{x}.\qquad(17)

Depending on the optimality and feasibility of $(x_{k},s_{k})$, a new step $d_{k}\stackrel{\rm def}{=}(d^{x}_{k},d^{s}_{k})^{T}$ is computed. Each full step of the trust-funnel algorithm is decomposed as

d_{k}=\left(\begin{array}{c}d^{x}_{k}\\ d^{s}_{k}\end{array}\right)=\left(\begin{array}{c}n^{x}_{k}\\ n^{s}_{k}\end{array}\right)+\left(\begin{array}{c}t^{x}_{k}\\ t^{s}_{k}\end{array}\right)=n_{k}+t_{k},\qquad(24)

where the normal step component $n_{k}$ aims to improve feasibility and the tangent step component $t_{k}$ reduces the objective function model without worsening the constraint violation up to first order. This is done by requiring the tangent step to lie in the null space of the Jacobian of the constraints and by requiring that the predicted improvement in the objective function obtained in the tangent step is not negligible compared to the predicted change in $f$ resulting from the normal step. The full composite step $d_{k}$ is illustrated in Figure 1. As explained in the next subsections, the computation of the composite step in our algorithm does not involve the function $z(x,s)$ itself but rather its surrogate model.

After having computed a trial point $x_{k}+d_{k}$, the algorithm proceeds by checking whether the iteration was successful, in a sense yet to be defined. The iterate is then updated accordingly, while the sample set and the trust regions are updated according to a self-correcting geometry scheme described later on. If ${\cal Y}_{k}$ has been modified, the surrogate models are updated to satisfy the interpolation conditions for the new set ${\cal Y}_{k+1}$, implying that new function evaluations are carried out for the additional point obtained at iteration $k$.

Figure 1: Illustration of the composite step $d_{k}=n_{k}+t_{k}$. The normal step $n_{k}$ attempts to improve feasibility by reducing the linearized constraint violation at $(x_{k},s_{k})$, whereas the tangent step aims at minimizing the objective function without deteriorating the gains in feasibility obtained through the normal step. Here, $A(x_{k},s_{k})$ denotes the Jacobian of $z(x,s)$ at $(x_{k},s_{k})$.

The local search is started inside the multistart strategy loop in Algorithm 3.1, but it can also be called directly by the user in order to use DEFT-FUNNEL without multistart. This is done by typing the following line in the Matlab command window:

[x, fx, mu, indicators, evaluations, iterate, exit_algo] = deft_funnel(@f, @c, @h, @dev_f, @dev_h, x0, nb_cons_c, nb_cons_h)

The inputs and outputs of the function deft_funnel are detailed below in Table 2. Many other additional parameters can be set directly in the function deft_funnel_set_parameters. Those are related to the trust-region mechanism, to the interpolation set maintenance and to criticality step thresholds.

Table 2: Inputs and outputs of the function deft_funnel.

Mandatory inputs:
  f            function handle of the objective function
  c            function handle of the black-box constraints if any or an empty array
  h            function handle of the white-box constraints if any or an empty array
  dev_f        function handle of the derivatives of f if it is a white box or an empty array
  dev_h        function handle of the derivatives of h if any or an empty array
  x0           starting point (no need to be feasible)
  nb_cons_c    number of black-box constraints (bound constraints not included)
  nb_cons_h    number of white-box constraints (bound constraints not included)

Optional inputs:
  lsbounds     vector of lower bounds for the constraints
  usbounds     vector of upper bounds for the constraints
  lxbounds     vector of lower bounds for the x variables
  uxbounds     vector of upper bounds for the x variables
  maxeval      maximum number of evaluations (default: 500*n)
  type_f       string 'BB' if f is a black box (default) or 'WB' otherwise
  whichmodel   approach to build the surrogate models

Outputs:
  x            the best approximation found to a local minimum
  fx           the value of the objective function at x
  mu           local estimates for the Lagrange multipliers
  indicators   feasibility and optimality indicators
  evaluations  number of calls to the objective function and constraints
  iterate      info related to the best point found as well as the coordinates of all past iterates
  exit_algo    output signal (0: terminated with success; -1: terminated with errors)
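
For a single local search without multistart, the same illustrative problem can be passed directly to deft_funnel from a chosen starting point, as sketched below with f, c, h, dev_f and dev_h defined as in the sketch of Section 3.1 (again, the way the optional inputs are supplied is an assumption to be checked against the repository examples).

% Hypothetical direct call to the local search from a given starting point x0.
x0 = [0; 0];                              % starting point; it does not need to be feasible
[x, fx, mu, indicators, evaluations, iterate, exit_algo] = ...
    deft_funnel(f, c, h, dev_f, dev_h, x0, 1, 0, ...
        'lsbounds', 0, 'usbounds', 2, 'lxbounds', [-5 -5], 'uxbounds', [5 5]);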

Once the initial interpolation set has been built using one of the methods described in the next subsection, the algorithm calls the function deft_funnel_main. In fact, deft_funnel serves only as a wrapper for the main function of the local search, deft_funnel_main, performing all the data preprocessing and parameter setting needed in the initialization process. All the main steps, such as the subspace minimization step, the criticality step and the computation of the new directions, are part of the scope of deft_funnel_main.

3.2.1 Building the surrogate models

The local-search algorithm starts by building an initial interpolation set, either from a simplex or by drawing samples from a uniform distribution. The construction of the interpolation set is done in the function deft_funnel_build_initial_sample_set, which is called only once during a local search within deft_funnel. The choice between random sampling and the simplex is made in deft_funnel when calling the function deft_funnel_set_parameters, which defines the majority of the parameters of the local search. If random sampling is chosen, the algorithm checks whether the resulting interpolation set is well poised and, if not, the set is repaired using Algorithm 6.3 described in Chapter 6 of [12], which is implemented in deft_funnel_repair_Y.

For the sake of simplicity, we assume henceforth that the objective function is also a black box. Given a poised set of sample points ${\cal Y}_{0}=\{y^{0},y^{1},\ldots,y^{p}\}$ with an initial point $x_{0}\in{\cal Y}_{0}$, the next step of our algorithm is to replace the objective function $f(x)$ and the black-box constraint functions $c(x)=(c_{1}(x),c_{2}(x),\ldots,c_{q}(x))$ by surrogate models $m^{f}(x)$ and $m^{c}(x)=(m^{c_{1}}(x),m^{c_{2}}(x),\ldots,m^{c_{q}}(x))$ built from the solution of the interpolation system

M(\phi,{\cal Y})\,\alpha_{\phi}=\Upsilon({\cal Y}),\qquad(25)

where

M(ϕ,𝒴)=(ϕ0(y0)ϕ1(y0)ϕb(y0)ϕ0(y1)ϕ1(y1)ϕb(y1)ϕ0(yp)ϕ1(yp)ϕb(yp)),Υ(𝒴)=(Υ(y0)Υ(y1)Υ(yp)),pb,M(\phi,{\cal Y})=\begin{pmatrix}\phi_{0}(y^{0})&\phi_{1}(y^{0})&\cdots&\phi_{b}(y^{0})\\ \phi_{0}(y^{1})&\phi_{1}(y^{1})&\cdots&\phi_{b}(y^{1})\\ \vdots&\vdots&\ddots&\vdots\\ \phi_{0}(y^{p})&\phi_{1}(y^{p})&\cdots&\phi_{b}(y^{p})\end{pmatrix},\;\;\Upsilon({\cal Y})=\begin{pmatrix}\Upsilon(y^{0})\\ \Upsilon(y^{1})\\ \vdots\\ \Upsilon(y^{p})\end{pmatrix},\;\;p\leq b,

where $\phi$ is the basis of monomials and $\Upsilon(x)$ is replaced by the objective function $f(x)$ or some black-box constraint function $c_{j}(x)$.

We consider underdetermined quadratic interpolation models that are fully linear and that are enhanced with curvature information along the optimization process. Since the linear system (25) is potentially underdetermined, the resulting interpolating polynomials are no longer unique, and so we provide the user with four approaches to construct the models $m^{f}(x)$ and $m^{c_{j}}(x)$, chosen by passing a number from 1 to 4 to the input 'whichmodel': 1 - subbasis selection approach; 2 - minimum $\ell_{2}$-norm models (default); 3 - minimum Frobenius norm models; and 4 - regression models (recommended for noisy functions). Further details about each one can be found in [12].

If $p_{0}=|{\cal Y}_{0}|=n+1$, a linear model rather than an underdetermined quadratic model is built for each function. The reason is that, although both have error bounds that are linear in $\Delta$ for the first derivatives, the error bound for the latter also includes the norm of the model Hessian, as stated in Lemma 2.2 in [46], which makes it worse than the former.

Whenever $n+1<p_{k}\leq(n+1)(n+2)/2=p^{\max}$, the algorithm builds underdetermined quadratic models based on the user's choice among the approaches described above. If regression models are considered instead, we set $p^{\max}=(n+1)(n+2)$, which means that the sample set is allowed to have twice the number of sample points required for fully quadratic interpolation models. Notice that having a number of sample points much larger than that required for quadratic interpolation can also worsen the quality of the interpolation models, as the sample set could contain points that are too far from the iterate, which is not ideal for models built for local approximation.
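
To make the construction concrete, the sketch below builds a minimum $\ell_2$-norm model of a single function from system (25) using the pseudo-inverse and an unscaled monomial basis. It is a simplified stand-in for what deft_funnel_build_models does, not the solver's actual code.

function [alpha, mval] = build_min_l2_model(Y, fY)
  % Minimum l2-norm quadratic interpolation model for one function, cf. (25).
  % Y is n x (p+1) with the sample points as columns; fY are the function values.
  p1 = size(Y, 2);
  M  = zeros(p1, numel(quad_basis(Y(:, 1))));
  for j = 1:p1
      M(j, :) = quad_basis(Y(:, j))';    % row phi(y^j)' of M(phi, Y)
  end
  alpha = pinv(M) * fY(:);               % minimum-norm interpolating coefficients
  mval  = @(x) quad_basis(x)' * alpha;   % handle that evaluates the model m(x)
end

function phi = quad_basis(x)
  % Monomial basis {1, x_i, x_i*x_j (i >= j)} evaluated at x.
  n   = numel(x);
  Q   = x(:) * x(:)';
  phi = [1; x(:); Q(tril(true(n)))];
end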

It is also possible to choose the initial degree of the models among fully linear, quadratic with a diagonal Hessian or fully quadratic. This is done within deft_funnel by setting the input argument cur_degree in the call to deft_funnel_set_parameters to one of the following options: model_size.plin, model_size.pdiag or model_size.pquad.

The interpolation system (25) is solved using a QR factorization of the matrix $M(\phi,{\cal Y})$ within the function deft_funnel_computeP, which is called by deft_funnel_build_models.

In order to evaluate the error of the interpolation models and their derivatives with respect to the original functions $f$ and $c$, we make use of the measure of well poisedness of ${\cal Y}$ given below.

Definition 3.1.

Let ${\cal Y}=\{y^{0},y^{1},\ldots,y^{p}\}$ be a poised interpolation set and ${\cal P}_{n}^{d}$ be a space of polynomials of degree less than or equal to $d$ on $\mathbb{R}^{n}$. Let $\Lambda>0$ and let $\{\ell_{0}(x),\ell_{1}(x),\ldots,\ell_{p}(x)\}$ be the basis of Lagrange polynomials associated with ${\cal Y}$. Then, the set ${\cal Y}$ is said to be $\Lambda$-poised in ${\cal B}$ for ${\cal P}_{n}^{d}$ (in the interpolation sense) if and only if

\displaystyle\max_{0\leq i\leq p}\;\max_{x\in{\cal B}}|\ell_{i}(x)|\leq\Lambda.

As shown in [12], the error bounds for at most fully quadratic models depend linearly on the constant $\Lambda$: the smaller it is, the better the interpolation models approximate the original functions. We also note that the error bounds for underdetermined quadratic interpolation models are linear in the diameter of the smallest ball containing ${\cal Y}$ for the first derivatives and quadratic for the function values.
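
For intuition, in the linear case the Lagrange polynomial values $\ell_i(x)$ solve a small linear system with the matrix $M(\phi,{\cal Y})^{T}$, so $\Lambda$ can be estimated by sampling the ball, as in the rough MATLAB sketch below (an illustration of Definition 3.1, not the procedure used in the solver).

function Lambda = estimate_lambda_linear(Y, center, Delta, nsamples)
  % Monte Carlo estimate of the Lambda-poisedness of a linear interpolation set Y
  % (n x (n+1), points as columns) over the ball B(center, Delta).
  n = size(Y, 1);
  M = [ones(1, n+1); Y];                      % transpose of M(phi, Y) for the basis {1, x_1, ..., x_n}
  Lambda = 0;
  for k = 1:nsamples
      d   = randn(n, 1);
      x   = center + Delta * rand * (d / norm(d));   % random point in the ball
      ell = M \ [1; x];                              % Lagrange polynomial values l_i(x)
      Lambda = max(Lambda, max(abs(ell)));
  end
end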

Finally, since the constraint functions $c(x)$ are replaced by surrogate models $m^{c}(x)$ in the algorithm, we define the models $m^{z}(x)\stackrel{\rm def}{=}(m^{c}(x),h(x))$ and $m^{z}(x,s)\stackrel{\rm def}{=}m^{z}(x)-s$, which are those used for computing new directions.

3.2.2 Subspace minimization

In this subsection, we explain how subspace minimization is employed in our algorithm. We define the subspace ${\cal S}_{k}$ at iteration $k$ as

{\cal S}_{k}\stackrel{\rm def}{=}\{x\in\mathbb{R}^{n}\;|\;\left[x\right]_{i}=\left[l^{x}\right]_{i}\;\textrm{ for }\;i\in{\cal L}_{k}\;\textrm{ and }\;\left[x\right]_{i}=\left[u^{x}\right]_{i}\;\textrm{ for }\;i\in{\cal U}_{k}\},

where ${\cal L}_{k}\stackrel{\rm def}{=}\{i\;|\;\left[x_{k}\right]_{i}-\left[l^{x}\right]_{i}\leq\epsilon_{b}\}$ and ${\cal U}_{k}\stackrel{\rm def}{=}\{i\;|\;\left[u^{x}\right]_{i}-\left[x_{k}\right]_{i}\leq\epsilon_{b}\}$ define the index sets of (nearly) active variables at their bounds, for some small constant $\epsilon_{b}>0$. If ${\cal L}_{k}\neq\emptyset$ or ${\cal U}_{k}\neq\emptyset$, a new well-poised interpolation set ${\cal Z}_{k}$ is built and a recursive call is made in order to solve the problem in the new subspace ${\cal S}_{k}$.
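
The index sets ${\cal L}_k$ and ${\cal U}_k$ translate directly into code; the snippet below illustrates their computation on toy data (the threshold value used here is an arbitrary example, not the solver's default).

% Illustrative identification of the (nearly) active bounds at an iterate xk.
xk = [0; 1; 0.49999];  lx = [0; 0; 0];  ux = [1; 1; 0.5];   % example data
eps_b = 1e-5;                               % activity threshold epsilon_b (value assumed here)
Lk   = find(xk - lx <= eps_b);              % indices (nearly) active at their lower bounds
Uk   = find(ux - xk <= eps_b);              % indices (nearly) active at their upper bounds
free = setdiff(1:numel(xk), union(Lk, Uk)); % variables left free in the subspace S_k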

If the algorithm converges in a subspace ${\cal S}_{k}$ with an optimal solution $(\tilde{x},\tilde{s})$, it checks if the latter is also optimal for the full-space problem, in which case the algorithm stops. If not, the algorithm continues by attempting to compute a new direction in the full space. This procedure is illustrated in Figure 2.

Figure 2: Subspace minimization procedure. The algorithm calls itself recursively in order to solve the problem in a new subspace. If convergence is attained, it goes back to check whether the solution found is also optimal in the full space.

The dimensionality reduction of the problem mitigates the chances of degeneration of the interpolation set when the sample points become too close to each other and thus affinely dependent. Figure 3 gives an example of this scenario as the optimal solution is approached.

Figure 3: Illustration of a scenario where the interpolation set becomes degenerate as the optimal solution is approached. In this example, we consider a two-dimensional problem with the bound constraint $\left[x\right]_{2}\geq 0$, which is active at the solution $x^{*}$ and at the iterates close to it.

In order to check the criticality in the full-space problem, a full-space interpolation set of degree $n+1$ is built in an $\epsilon$-neighborhood around the point $x^{*}_{{\cal S}}$, which is obtained by assembling the subspace solution $\tilde{x}$ and the $|{\cal L}_{k}\,\cup\,{\cal U}_{k}|$ fixed components $\left[x_{k}\right]_{i}$. The models $m^{f}_{k}$ and $m^{c}_{k}$ are then updated and the criticality step is entered.

The complete subspace minimization step is described in Algorithm 3.2 and is implemented in the function deft_funnel_subspace_min, which is called inside deft_funnel_main.

 Algorithm 3.2: SubspaceMinimization(${\cal S}_{k-1}$, ${\cal Y}_{k}$, $x_{k}$, $s_{k}$, $\Delta_{k}^{f}$, $\Delta_{k}^{z}$, $v_{k}^{\max}$) 

1: Check for (nearly) active bounds at $x_{k}$ and define ${\cal S}_{k}$. If there is no (nearly) active bound or if ${\cal S}_{k}$ has already been explored, go to Step 5. If all bounds are active, go to Step 4.
2: Build a new interpolation set ${\cal Z}_{k}$ in ${\cal S}_{k}$.
3: Call recursively LocalSearch(${\cal S}_{k}$, ${\cal Z}_{k}$, $x_{k}$, $s_{k}$, $\Delta_{k}^{f}$, $\Delta_{k}^{z}$, $v_{k}^{\max}$) and let $(x^{*}_{{\cal S}},s^{*}_{{\cal S}})$ be the solution of the subspace problem after adding the fixed components.
4: If $\dim({\cal S}_{k-1})<n$, return $(x^{*}_{{\cal S}},s^{*}_{{\cal S}})$. Otherwise, reset $(x_{k},s_{k})=(x^{*}_{{\cal S}},s^{*}_{{\cal S}})$, construct a new set ${\cal Y}_{k}$ around $x_{k}$, build $m_{k}^{f}$ and $m_{k}^{c}$ and recompute $\pi^{f}_{k-1}$ (optimality measure).
5: If ${\cal S}_{k}$ has already been explored, set $(x_{k+1},s_{k+1})=(x_{k},s_{k})$, reduce the trust-region radii $\Delta_{k+1}^{f}=\gamma\Delta_{k}^{f}$ and $\Delta_{k+1}^{z}=\gamma\Delta_{k}^{z}$, set $\Delta_{k+1}=\min[\Delta_{k+1}^{f},\Delta_{k+1}^{z}]$ and build a new poised set ${\cal Y}_{k+1}$ in ${\cal B}(x_{k+1};\Delta_{k+1})$.

 

3.2.3 The normal step

The normal step aims at reducing the constraint violation at a given point $(x,s)$, defined by

v(x,s)\stackrel{\rm def}{=}{\textstyle\frac{1}{2}}\|m^{z}(x,s)\|^{2}.\qquad(26)

To ensure that the step $n_{k}$ is normal to the linearized constraint $m^{z}(x_{k},s_{k})+J(x_{k},s_{k})n=0$, where $J(x,s)\stackrel{\rm def}{=}(J(x)\;-I)$ is the Jacobian of $m^{z}(x,s)$ with respect to $(x,s)$ and $J(x)$ is the Jacobian of $m^{z}(x)$ with respect to $x$, we require that

\|n_{k}\|_{\infty}\leq\kappa_{n}\|m^{z}(x_{k},s_{k})\|,\qquad(27)

for some $\kappa_{n}\geq 1$.

The computation of $n_{k}$ is done by solving the constrained linear least-squares subproblem

\left\{\begin{array}{rl}\displaystyle\min_{n=(n^{x},n^{s})}&{\textstyle\frac{1}{2}}\|m^{z}(x_{k},s_{k})+J(x_{k},s_{k})n\|^{2}\\ \textrm{s.t.:}&l^{s}\leq s_{k}+n^{s}\leq u^{s},\\ &l^{x}\leq x_{k}+n^{x}\leq u^{x},\\ &x_{k}+n^{x}\in{\cal S}_{k},\\ &n\in{\cal N}_{k},\end{array}\right.\qquad(33)

where

{\cal N}_{k}\stackrel{\rm def}{=}\{n\in\mathbb{R}^{n+m}\mid\|n\|_{\infty}\leq\min\left[\,\Delta^{z}_{k},\,\kappa_{n}\,\|m^{z}(x_{k},s_{k})\|\,\right]\,\},\qquad(34)

for some trust-region radius $\Delta_{k}^{z}>0$. Finally, a funnel bound $v_{k}^{\max}$ is imposed on the constraint violation $v_{k}\stackrel{\rm def}{=}v(x_{k},s_{k})$ for the acceptance of new iterates, in order to ensure convergence towards feasibility.

We note that, although a linear approximation of the constraints is used for calculating the normal step, the second-order information of the quadratic interpolation model $m^{z}(x,s)$ is still used in the SQO model employed in the tangent step problem, as shown next.

The subproblem (33) is solved within the function deft_funnel_normal_step, which makes use of an original active-set algorithm where the unconstrained problem is solved at each iteration in the subspace defined by the currently active bounds, themselves determined by a projected Cauchy step. Each subspace solution is then computed using an SVD decomposition of the reduced matrix. This algorithm is implemented in the function deft_funnel_blls and is intended for small-scale bound-constrained linear least-squares problems.
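
The structure of (33) can also be reproduced with the Optimization Toolbox routine lsqlin, as in the sketch below. This is only an illustration of the subproblem: the solver itself relies on the active-set method in deft_funnel_blls, and the subspace constraint of (33) is omitted here.

function nk = normal_step_sketch(J, mz, xk, sk, lx, ux, ls, us, Delta_z, kappa_n)
  % Sketch of the normal-step subproblem (33) using LSQLIN as a stand-in for deft_funnel_blls.
  rho = min(Delta_z, kappa_n * norm(mz));            % radius of the box N_k in (34)
  lb  = max([lx - xk; ls - sk], -rho);               % lower bounds on n = (n^x, n^s)
  ub  = min([ux - xk; us - sk],  rho);               % upper bounds on n = (n^x, n^s)
  nk  = lsqlin(J, -mz, [], [], [], [], lb, ub);      % min 0.5*||J*n + mz||^2 s.t. lb <= n <= ub
end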

3.2.4 The tangent step

The tangent step is a direction that improves optimality. It is computed after the normal step calculation by using an SQO model of problem (14). The quadratic model of the objective function is defined as

\psi_{k}((x_{k},s_{k})+d)\stackrel{\rm def}{=}m^{f}(x_{k},s_{k})+\langle g_{k},d\rangle+{\textstyle\frac{1}{2}}\langle d,B_{k}d\rangle,\qquad(35)

where $m^{f}(x_{k},s_{k})\stackrel{\rm def}{=}m^{f}(x_{k})$, $g_{k}\stackrel{\rm def}{=}\nabla_{(x,s)}m^{f}(x_{k},s_{k})$, and $B_{k}$ is the approximate Hessian of the Lagrangian function

{\cal L}(x,s,\mu,\xi^{s},\tau^{s},\xi^{x},\tau^{x})=m^{f}(x,s)+\langle\mu,m^{z}(x,s)\rangle+\langle\tau^{s},s-u^{s}\rangle+\langle\xi^{s},l^{s}-s\rangle+\langle\tau^{x},x-u^{x}\rangle+\langle\xi^{x},l^{x}-x\rangle

with respect to $(x,s)$, given by

B_{k}=\begin{pmatrix}H_{k}+\sum_{i=1}^{m}[\hat{\mu}_{k}]_{i}Z_{ik}&0\\ 0&0\end{pmatrix},\qquad(36)

where $\xi^{s}$ and $\tau^{s}$ are the Lagrange multipliers associated with the lower and upper bounds, respectively, on the slack variables $s$, and $\xi^{x}$ and $\tau^{x}$ are the Lagrange multipliers associated with the lower and upper bounds on the $x$ variables. In (36), $H_{k}=\nabla^{2}_{xx}m^{f}(x_{k},s_{k})$, $Z_{ik}=\nabla^{2}_{xx}m^{z}_{ik}(x_{k},s_{k})$ and the vector $\hat{\mu}_{k}$ may be viewed as a local approximation of the Lagrange multipliers associated with the equality constraints $m^{z}(x,s)=0$.

By substituting (24) into (35), we obtain

\psi_{k}((x_{k},s_{k})+n_{k}+t)=\psi_{k}((x_{k},s_{k})+n_{k})+\langle g_{k}^{N},t\rangle+{\textstyle\frac{1}{2}}\langle t,B_{k}t\rangle,\qquad(37)

where

g_{k}^{N}\stackrel{\rm def}{=}g_{k}+B_{k}\,n_{k}.\qquad(38)

Since (37) is a local approximation of the function $m^{f}((x_{k},s_{k})+n_{k}+t)$, a trust region with radius $\Delta_{k}^{f}$ is used for the complete step $d=n_{k}+t$:

{\cal T}_{k}\stackrel{\rm def}{=}\{d\in\mathbb{R}^{n+m}\mid\|d\|_{\infty}\leq\Delta^{f}_{k}\}.\qquad(39)

Moreover, given that the normal step was also calculated using local models, it makes sense to remain in the intersection of both trust regions, which implies that

d_{k}\in{\cal R}_{k}\stackrel{\rm def}{=}{\cal N}_{k}\cap{\cal T}_{k}\stackrel{\rm def}{=}\{d\in\mathbb{R}^{n+m}\mid\|d\|_{\infty}\leq\Delta_{k}\},\qquad(40)

where $\Delta_{k}=\min[\Delta_{k}^{z},\Delta_{k}^{f}]$.

In order to make sure that there is still enough space left for the tangent step within ${\cal R}_{k}$, we first check whether the following constraint on the normal step is satisfied:

\|n_{k}\|_{\infty}\leq\kappa_{{\cal R}}\Delta_{k},\qquad(41)

for some $\kappa_{{\cal R}}\in(0,1)$. If (41) holds, the tangent step is calculated by solving the following subproblem

\left\{\begin{array}{rl}\displaystyle\min_{t=(t^{x},t^{s})}&\langle g_{k}^{N},t\rangle+{\textstyle\frac{1}{2}}\langle t,B_{k}t\rangle\\ \textrm{s.t.:}&J(x_{k},s_{k})t=0,\\ &l^{s}\leq s_{k}+n^{s}_{k}+t^{s}\leq u^{s},\\ &l^{x}\leq x_{k}+n^{x}_{k}+t^{x}\leq u^{x},\\ &x_{k}+n^{x}_{k}+t^{x}\in{\cal S}_{k},\\ &n_{k}+t\in{\cal R}_{k},\end{array}\right.\qquad(48)

where we require that the new iterate $x_{k}+d_{k}$ belong to the subspace ${\cal S}_{k}$ and satisfy the bound constraints (16) and (17). In the Matlab code, the tangent step is calculated by the function deft_funnel_tangent_step, which in turn calls either the Matlab solver linprog or our implementation of the nonmonotone spectral projected gradient method [8] in deft_funnel_spg to solve the subproblem (48). The choice between the two solvers is based on whether $\|B_{k}\|\leq\epsilon$ for a small $\epsilon>0$, in which case we assume that the problem is linear and linprog is therefore used.
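
For readers who prefer a standard toolbox routine, the structure of (48) can also be reproduced with quadprog, as sketched below. This is an illustration only: as stated above, DEFT-FUNNEL uses linprog or the spectral projected gradient method, and the subspace constraint is omitted here.

function tk = tangent_step_sketch(gN, Bk, J, xk, sk, nk, lx, ux, ls, us, Delta)
  % Sketch of the tangent-step subproblem (48) using QUADPROG (assumes a convex
  % model Hessian B_k); nk stacks the normal step components (n^x_k, n^s_k).
  lb = max([lx - xk; ls - sk], -Delta) - nk;   % combines (16)-(17) with ||n_k + t||_inf <= Delta_k
  ub = min([ux - xk; us - sk],  Delta) - nk;
  tk = quadprog(Bk, gN, [], [], J, zeros(size(J, 1), 1), lb, ub);
end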

We define our $f$-criticality measure as

\pi_{k}^{f}\stackrel{\rm def}{=}-\langle g_{k}^{N},r_{k}\rangle,\qquad(49)

where $r_{k}$ is the projected Cauchy direction obtained by solving the linear optimization problem

\left\{\begin{array}{rl}\displaystyle\min_{r=(r^{x},r^{s})}&\langle g_{k}^{N},r\rangle\\ \textrm{s.t.:}&J(x_{k},s_{k})r=0,\\ &l^{s}\leq s_{k}+n^{s}_{k}+r^{s}\leq u^{s},\\ &l^{x}\leq x_{k}+n^{x}_{k}+r^{x}\leq u^{x},\\ &x_{k}+n^{x}_{k}+r^{x}\in{\cal S}_{k},\\ &\|r\|_{\infty}\leq 1.\end{array}\right.\qquad(56)

By definition, $\pi_{k}^{f}$ measures how much decrease can be obtained locally along the projection of the negative of the approximate gradient $g_{k}^{N}$ onto the null space of $J(x_{k},s_{k})$, intersected with the region delimited by the bounds. This measure is computed in deft_funnel_compute_optimality, which uses linprog to solve the subproblem (56).
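
Since (56) is a linear program, a compact stand-alone sketch of the computation of $\pi_k^f$ with linprog reads as follows (the subspace constraint is again omitted and the variable names are ours).

function [pi_f, rk] = criticality_measure_sketch(gN, J, xk, sk, nk, lx, ux, ls, us)
  % Sketch of the projected Cauchy direction subproblem (56) and the measure (49).
  lb   = max([lx - xk; ls - sk] - nk, -1);     % bound constraints and ||r||_inf <= 1
  ub   = min([ux - xk; us - sk] - nk,  1);
  rk   = linprog(gN, [], [], J, zeros(size(J, 1), 1), lb, ub);
  pi_f = -gN' * rk;                            % f-criticality measure (49)
end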

A new local estimate of the Lagrange multipliers $(\mu_{k},\xi^{s}_{k},\tau^{s}_{k},\xi^{x}_{k},\tau^{x}_{k})$ is computed by solving the following problem:

\left\{\begin{array}{rl}\displaystyle\min_{(\mu,\hat{\xi}^{s},\hat{\tau}^{s},\hat{\xi}^{x},\hat{\tau}^{x})}&{\textstyle\frac{1}{2}}\|{\cal M}_{k}(\mu,\hat{\xi}^{s},\hat{\tau}^{s},\hat{\xi}^{x},\hat{\tau}^{x})\|^{2}\\ \textrm{s.t.:}&\hat{\xi}^{s},\hat{\tau}^{s},\hat{\xi}^{x},\hat{\tau}^{x}\geq 0,\end{array}\right.\qquad(59)

where

{\cal M}_{k}(\mu,\hat{\xi}^{s},\hat{\tau}^{s},\hat{\xi}^{x},\hat{\tau}^{x})\stackrel{\rm def}{=}\left(\begin{array}{c}g_{k}^{N}\\ 0\end{array}\right)+\left(\begin{array}{c}J(x_{k})^{T}\\ -I\end{array}\right)\mu+\left(\begin{array}{c}0\\ I_{\tau}^{s}\end{array}\right)\hat{\tau}^{s}+\left(\begin{array}{c}0\\ -I_{\xi}^{s}\end{array}\right)\hat{\xi}^{s}+\left(\begin{array}{c}I_{\tau}^{x}\\ 0\end{array}\right)\hat{\tau}^{x}+\left(\begin{array}{c}-I_{\xi}^{x}\\ 0\end{array}\right)\hat{\xi}^{x},\qquad(73)

where the matrix $I$ is the $m\times m$ identity matrix, the matrices $I_{\xi}^{s}$ and $I_{\tau}^{s}$ are obtained from $I$ by removing the columns whose indices are not associated to any active (lower and upper, respectively) bound at $s_{k}+n^{s}_{k}$, the matrices $I_{\xi}^{x}$ and $I_{\tau}^{x}$ are obtained from the $n\times n$ identity matrix by removing the columns whose indices are not associated to any active (lower and upper, respectively) bound at $x_{k}+n^{x}_{k}$, and the Lagrange multipliers $(\hat{\xi}^{s},\hat{\tau}^{s},\hat{\xi}^{x},\hat{\tau}^{x})$ are those in $(\xi^{s},\tau^{s},\xi^{x},\tau^{x})$ associated to active bounds at $s_{k}+n^{s}_{k}$ and $x_{k}+n^{x}_{k}$. All the other Lagrange multipliers are set to zero.

The subproblem (59) is also solved using the active-set algorithm implemented in the function deft_funnel_blls.

3.2.5 Which steps to compute and retain

The algorithm computes normal and tangent steps depending on the measures of feasibility and optimality at each iteration. Unlike in [42, 43], where the computation of the normal steps depends on the measure of optimality, here the normal step is computed whenever the following condition holds:

\|z(x_{k},s_{k})\|>\epsilon,\qquad(74)

for some small $\epsilon>0$, i.e. preference is always given to feasibility. This choice is based on the fact that, in many real-life problems with expensive functions and a small budget, one seeks to find a feasible solution as fast as possible, and a feasible solution with a smaller objective function value than the current one is already enough. If (74) fails, we set $n_{k}=0$.

We define a $v$-criticality measure that indicates how much decrease can be obtained locally along the projection of the negative gradient of the Gauss-Newton model of $v$ at $(x_{k},s_{k})$ onto the region delimited by the bounds as

\pi_{k}^{v}\stackrel{\rm def}{=}-\langle J(x_{k},s_{k})^{T}z(x_{k},s_{k}),b_{k}\rangle,

where the projected Cauchy direction $b_{k}$ is given by the solution of

\left\{\begin{array}{rl}\displaystyle\min_{b=(b^{x},b^{s})}&\langle J(x_{k},s_{k})^{T}z(x_{k},s_{k}),b\rangle\\ \textrm{s.t.:}&l^{s}\leq s_{k}+b^{s}\leq u^{s},\\ &l^{x}\leq x_{k}+b^{x}\leq u^{x},\\ &x_{k}+b^{x}\in{\cal S}_{k},\\ &\|b\|_{\infty}\leq 1.\end{array}\right.\qquad(80)

We say that $(x_{k},s_{k})$ is an infeasible stationary point if $z(x_{k},s_{k})\neq 0$ and $\pi_{k}^{v}=0$, in which case the algorithm terminates.

The procedure for the calculation of the normal step is given in the algorithm below. In the code, it is implemented in the function deft_funnel_normal_step, which calls the algorithm in deft_funnel_blls in order to solve the normal step subproblem (33).

 Algorithm 3.3: NormalStep($x_{k}$, $s_{k}$, $\pi_{k}^{v}$, $v_{k}$, $v_{k}^{\max}$) 

1: If $z(x_{k},s_{k})\neq 0$ and $\pi_{k}^{v}=0$, STOP (infeasible stationary point).
2: If (74) holds, compute a normal step $n_{k}$ by solving (33). Otherwise, set $n_{k}=0$.

 

If the solution of (56) is $r_{k}=0$, then by (49) we have $\pi_{k}^{f}=0$, in which case we set $t_{k}=0$. If the current iterate is farther from feasibility than from optimality, i.e., if, for a given monotonic bounding function $\omega_{t}$, the condition

\pi_{k}^{f}>\omega_{t}(\|z(x_{k},s_{k})\|)\qquad(81)

fails, then we skip the tangent step computation by setting $t_{k}=0$.

After the computation of the tangent step, the usefulness of the latter is evaluated by checking if the following conditions

\|t_{k}\|>\kappa_{{\cal Z}{\cal S}}\|n_{k}\|\qquad(82)

and

\delta_{k}^{f}\stackrel{\rm def}{=}\delta^{f,t}_{k}+\delta^{f,n}_{k}\geq\kappa_{\delta}\delta^{f,t}_{k},\qquad(83)

where

\delta^{f,t}_{k}\stackrel{\rm def}{=}\psi_{k}((x_{k},s_{k})+n_{k})-\psi_{k}((x_{k},s_{k})+n_{k}+t_{k})\qquad(84)

and

\delta^{f,n}_{k}\stackrel{\rm def}{=}\psi_{k}(x_{k},s_{k})-\psi_{k}((x_{k},s_{k})+n_{k}),\qquad(85)

are satisfied for some $\kappa_{{\cal Z}{\cal S}}>1$ and $\kappa_{\delta}\in(0,1)$. The inequality (83) indicates that the predicted improvement in the objective function obtained in the tangent step is substantial compared to the predicted change in $f$ resulting from the normal step. If (82) holds but (83) fails, the tangent step is not useful in the sense just discussed, and we choose to ignore it by resetting $t_{k}=0$.

The tangent step procedure is stated in Algorithm 3.4 and is implemented in the function deft_funnel_tangent_step.

 Algorithm 3.4: TangentStep($x_{k}$, $s_{k}$, $n_{k}$) 

1: If (41) holds, then
    1.1: select a vector $\hat{\mu}_{k}$ and define $B_{k}$ as in (36);
    1.2: compute $\mu_{k}$ by solving (59);
    1.3: compute the modified Cauchy direction $r_{k}$ by solving (56) and define $\pi_{k}^{f}$ as in (49);
    1.4: if (81) holds, compute a tangent step $t_{k}$ by solving (48).
2: If (41) fails, set $\mu_{k}=\mu_{k-1}$. In this case, or if (81) fails, or if (82) holds but (83) fails, set $t_{k}=0$ and $d_{k}=n_{k}$.
3: Define $(x_{k}^{+},s_{k}^{+})=(x_{k},s_{k})+d_{k}$.

 

3.2.6 Iteration types

Depending on the contributions of the current iteration in terms of optimality and feasibility, we classify it into three types: $\mu$-iteration, $f$-iteration and $z$-iteration. This is done by checking if some conditions hold for the trial point defined as

(x_{k}^{+},s_{k}^{+})\stackrel{\rm def}{=}(x_{k},s_{k})+d_{k}.\qquad(86)

3.2.6.1 $\mu$-iteration

If $d_{k}=0$, which means that the Lagrange multiplier estimates are the only new values that have been computed, iteration $k$ is said to be a $\mu$-iteration, with reference to the Lagrange multipliers $\mu$ associated with the constraints $m^{z}(x,s)=0$. Notice, however, that not only new values $\mu_{k}$ have been computed, but also all the other Lagrange multipliers $(\xi^{s}_{k},\tau^{s}_{k},\xi^{x}_{k},\tau^{x}_{k})$.

In this case, we set $(x_{k+1},s_{k+1})=(x_{k},s_{k})$, $\Delta_{k+1}^{f}=\Delta_{k}^{f}$, $\Delta_{k+1}^{z}=\Delta_{k}^{z}$, $v_{k+1}^{\max}=v_{k}^{\max}$ and we use the new multipliers to build a new SQO model in (35).

Since null steps $d_{k}=0$ might be due to the poor quality of the interpolation models, we check the $\Lambda$-poisedness in $\mu$-iterations and attempt to improve it whenever the following condition holds:

\Lambda\,\Delta({\cal Y}_{k})>\epsilon_{\mu},\qquad(87)

where

\Delta({\cal Y}_{k})\stackrel{\rm def}{=}\displaystyle\max_{j}\|y^{k,j}-x_{k}\|

and $\epsilon_{\mu}>0$. The left-hand side of (87) gives an estimate of the error bound for the interpolation models. If (87) holds, we try to reduce its value by modifying the sample set ${\cal Y}_{k}$. Firstly, we choose a constant $\xi\in(0,1)$ and replace all points $y^{k,j}\in{\cal Y}_{k}$ such that

\|y^{k,j}-x_{k}\|>\xi\Delta({\cal Y}_{k})

by new points $y^{k,j}_{*}$ that (approximately) maximize $|\ell_{j_{k}}(x)|$ in ${\cal B}(x_{k};\xi\Delta({\cal Y}_{k}))$. Then we use Algorithm 6.3 described in Chapter 6 of [12] with the smaller region ${\cal B}$ to improve the $\Lambda$-poisedness of the new sample set. This procedure is implemented in the Matlab code by the function deft_funnel_repair_sample_set.

3.2.6.2 $f$-iteration

If iteration $k$ has mainly contributed to optimality, we say that iteration $k$ is an $f$-iteration. Formally, this happens when $t_{k}\neq 0$, (83) holds, and

v(x_{k}^{+},s_{k}^{+})\leq v_{k}^{\max}.\qquad(88)

Convergence of the algorithm towards feasibility is ensured by condition (88), which limits the constraint violation with the funnel bound.

In this case, we set $(x_{k+1},s_{k+1})=(x_{k}^{+},s_{k}^{+})$ if

\rho_{k}^{f}\stackrel{\rm def}{=}\frac{m^{f}(x_{k},s_{k})-m^{f}(x_{k}^{+},s_{k}^{+})}{\delta^{f}_{k}}\geq\eta_{1},\qquad(89)

and $(x_{k+1},s_{k+1})=(x_{k},s_{k})$ otherwise. Note that $\delta_{k}^{f}>0$ (because of (84) and (83)) unless $(x_{k},s_{k})$ is first-order critical, and hence condition (89) is well defined. As for the value of the funnel bound, we set $v_{k+1}^{\max}=v_{k}^{\max}$.

Since our method can suffer from the Maratos effect [31], we also apply a second-order correction (see Chapter 15, Section 15.6, in [35] for more details) to the normal step whenever the complete direction $d_{k}$ is unsuccessful at improving optimality, i.e., whenever condition (89) fails. As this failure might be due to the local approximation of the constraint functions, the Maratos effect may be overcome with a second-order correction step $\hat{n}$ that is calculated in deft_funnel_sec_order_correction by solving the following subproblem

\left\{\begin{array}{rl}\displaystyle\min_{\hat{n}=(\hat{n}^{x},\hat{n}^{s})}&{\textstyle\frac{1}{2}}\|m^{z}(x_{k}^{+},s_{k}^{+})+J(x_{k},s_{k})\hat{n}\|^{2}\\ \textrm{s.t.:}&l^{s}\leq s_{k}^{+}+\hat{n}^{s}\leq u^{s},\\ &l^{x}\leq x_{k}^{+}+\hat{n}^{x}\leq u^{x},\\ &x_{k}^{+}+\hat{n}^{x}\in{\cal S}_{k},\\ &\hat{n}\in\hat{{\cal N}}_{k},\end{array}\right.\qquad(95)

where

\hat{{\cal N}}_{k}\stackrel{\rm def}{=}\{\hat{n}\in\mathbb{R}^{n+m}\mid\|\hat{n}\|_{\infty}\leq\min\left[\,\Delta^{z}_{k},\,\kappa_{n}\,\|m^{z}(x_{k}^{+},s_{k}^{+})\|\,\right]\,\}.\qquad(96)

3.2.6.3 $z$-iteration

If iteration $k$ is neither a $\mu$-iteration nor an $f$-iteration, then it is said to be a $z$-iteration. This means that the major contribution of iteration $k$ is to improve feasibility, which happens when either $t_{k}=0$ or (83) fails.

We accept the trial point if the improvement in feasibility is comparable to its predicted value

\delta^{z}_{k}\stackrel{\rm def}{=}{\textstyle\frac{1}{2}}\|z(x_{k},s_{k})\|^{2}-{\textstyle\frac{1}{2}}\|z(x_{k},s_{k})+J(x_{k},s_{k})d_{k}\|^{2},

and the latter is itself comparable to its predicted decrease along the normal step, that is

n_{k}\neq 0,\quad\delta_{k}^{z}\geq\kappa_{zn}\delta^{z,n}_{k}\quad\mbox{and}\quad\rho_{k}^{z}\stackrel{\rm def}{=}\frac{v(x_{k},s_{k})-v(x_{k}^{+},s_{k}^{+})}{\delta^{z}_{k}}\geq\eta_{1},\qquad(97)

for some $\kappa_{zn}\in(0,1-\kappa_{tg}]$ and where

\delta^{z,n}_{k}\stackrel{\rm def}{=}{\textstyle\frac{1}{2}}\|z(x_{k},s_{k})\|^{2}-{\textstyle\frac{1}{2}}\|z(x_{k},s_{k})+J(x_{k},s_{k})n_{k}\|^{2}.\qquad(98)

If (97) fails, the trial point is rejected.

Finally, the funnel bound is updated as follows

v_{k+1}^{\max}=\left\{\begin{array}{ll}\max\left[\kappa_{tx1}v_{k}^{\max},\,v(x_{k}^{+},s_{k}^{+})+\kappa_{tx2}(v(x_{k},s_{k})-v(x_{k}^{+},s_{k}^{+}))\right]&\mbox{if (97) holds,}\\ v_{k}^{\max}&\mbox{otherwise,}\end{array}\right.\qquad(101)

for some $\kappa_{tx1}\in(0,1)$ and $\kappa_{tx2}\in(0,1)$.
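
In code, the update (101) reduces to a one-line rule; the sketch below uses illustrative values for the constants and assumes a logical flag recording whether the acceptance conditions (97) hold.

% Funnel bound update (101) on illustrative data.
kappa_tx1 = 0.9;  kappa_tx2 = 0.5;                        % example values of the constants
v_k = 1.0;  v_plus = 0.4;  vmax = 1.2;  accepted = true;  % v(x_k,s_k), v(x_k^+,s_k^+), v_k^max, (97)
if accepted
    vmax_next = max(kappa_tx1 * vmax, v_plus + kappa_tx2 * (v_k - v_plus));
else
    vmax_next = vmax;                                     % funnel bound unchanged
end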

3.2.7 Criticality step

Two different criticality steps are employed: one for the subspaces 𝒮k{\cal S}_{k} with dim(𝒮k)<ndim({\cal S}_{k})<n and one for the full space (dim(𝒮k)=ndim({\cal S}_{k})=n). In the latter, convergence is declared whenever at least one of the following conditions is satisfied: (1) the trust-region radius Δk\Delta_{k} is too small, (2) the computed direction dkd_{k} is too small or (3) both feasibility and optimality have been achieved and the error between the real functions and the models is expected to be sufficiently small. As mentioned before, this error is directly linked to the Λ\Lambda-poisedness measure given in Definition 3.1. In the subspace case, we are less demanding and only ask that either Δk\Delta_{k} be very small or that both feasibility and optimality have been achieved, without checking the model error.

The complete criticality step in DEFT-FUNNEL is described in the next algorithm.

 Algorithm 3.5: CriticalityStep(𝒮k{\cal S}_{k}, 𝒴k{\cal Y}_{k}, πk1f\pi_{k-1}^{f}, ϵi\epsilon_{i})  

Step 1:

If dim(𝒮k)<ndim({\cal S}_{k})<n,

Step 1.1:

If Δkϵ(xk,sk)\Delta_{k}\leq\epsilon\|(x_{k},s_{k})\|, return (xk,sk)(x_{k},s_{k}).

Step 1.2:

If z(xk,sk)ϵ\|z(x_{k},s_{k})\|\leq\epsilon and π^ifϵ\hat{\pi}_{i}^{f}\leq\epsilon, return (xk,sk)(x_{k},s_{k}).

Step 2:

If dim(𝒮k)=ndim({\cal S}_{k})=n,

Step 2.1:

If Δkϵ(xk,sk)\Delta_{k}\leq\epsilon\|(x_{k},s_{k})\| or dkϵ(xk,sk)\|d_{k}\|\leq\epsilon\|(x_{k},s_{k})\|, return (xk,sk)(x_{k},s_{k}).

Step 2.2:

Define m^if=mkf\hat{m}^{f}_{i}=m^{f}_{k}, m^ic=mkc\hat{m}^{c}_{i}=m^{c}_{k} and π^if=πk1f\hat{\pi}_{i}^{f}=\pi_{k-1}^{f}.

Step 2.3:

If z(xk,sk)ϵi\|z(x_{k},s_{k})\|\leq\epsilon_{i} and π^ifϵi\hat{\pi}_{i}^{f}\leq\epsilon_{i}, set ϵi+1=max[αz(xk,sk),απ^if,ϵ]\epsilon_{i+1}=\max\left[\alpha\|z(x_{k},s_{k})\|,\alpha\hat{\pi}_{i}^{f},\epsilon\right] and modify 𝒴k{\cal Y}_{k} as needed to ensure it is Λ\Lambda-poised in (xk;ϵi+1){\cal B}(x_{k};\epsilon_{i+1}). If 𝒴k{\cal Y}_{k} was modified, compute new models m^if\hat{m}_{i}^{f} and m^ic\hat{m}_{i}^{c}, calculate r^i\hat{r}_{i} and π^if\hat{\pi}_{i}^{f} and increment ii by one. If z(xk,sk)ϵ\|z(x_{k},s_{k})\|\leq\epsilon and π^ifϵ\hat{\pi}_{i}^{f}\leq\epsilon, return (xk,sk)(x_{k},s_{k}); otherwise, start Step 2.3 again;

Step 2.4:

Set mkf=m^ifm^{f}_{k}=\hat{m}^{f}_{i}, mkc=m^icm^{c}_{k}=\hat{m}^{c}_{i}, πk1f=π^if\pi_{k-1}^{f}=\hat{\pi}_{i}^{f}, Δk=βmax[z(xk,sk),πk1f]\Delta_{k}=\beta\max\left[\|z(x_{k},s_{k})\|,\pi_{k-1}^{f}\right] and define ϑi=xk\vartheta_{i}=x_{k} if a new model has been computed.

 
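
The full-space branch (Step 2.3) can be pictured as the schematic MATLAB loop below, in which improve_poisedness and rebuild_models are hypothetical placeholders for the actual geometry-improvement and model-recomputation routines; the remaining inputs (z_norm, pi_f, eps_i, eps_final, alpha, Y, x) are assumed to be available.

% Schematic sketch of Step 2.3 (not the actual DEFT-FUNNEL implementation).
while z_norm <= eps_i && pi_f <= eps_i
    eps_i = max([alpha*z_norm, alpha*pi_f, eps_final]);  % shrink the threshold
    Y = improve_poisedness(Y, x, eps_i);       % make Y Lambda-poised in B(x_k; eps_i)
    [pi_f, z_norm] = rebuild_models(Y, x);     % recompute models and the measures
    if z_norm <= eps_final && pi_f <= eps_final
        break;                                 % declare criticality: return (x_k, s_k)
    end
end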

3.3 Maintenance of the interpolation set and trust-region updating strategy

The management of the geometry of the interpolation set is based on the self-correcting geometry scheme proposed in [44], where unsuccessful trial points are used to improve the geometry of the interpolation set. It depends on the criterion used to define successful iterations, which is passed to the algorithm through the parameter criterion; this parameter depends on the iteration type (μ\mu, ff or zz, as explained previously). In general, an unsuccessful trial point replaces the sample point in the interpolation set that maximizes a combined criterion of distance and poisedness involving the trial point. Finally, we note that, unlike [43], we no longer make use of “dummy” interpolation points resulting from projections onto the subspaces. The whole procedure is described in Algorithm 3.6. The definition of the maximum cardinality of the interpolation set, pmaxp^{\max}, is given in Section 3.2.1.

 Algorithm 3.6: UpdateInterpolationSet(𝒴k{\cal Y}_{k}, xkx_{k}, xk+x_{k}^{+}, Δk\Delta_{k}, ϵi\epsilon_{i}, criterion)  

1: Augment the interpolation set.

If |𝒴k|<pmax|{\cal Y}_{k}|<p^{\max}, then define 𝒴k+1=𝒴k{xk+}{\cal Y}_{k+1}={\cal Y}_{k}\cup\{x_{k}^{+}\}.

2: Successful iteration.

If |𝒴k|=pmax|{\cal Y}_{k}|=p^{\max} and criterion holds, then define 𝒴k+1=𝒴k{yk,r}{xk+}{\cal Y}_{k+1}={\cal Y}_{k}\setminus\{y^{k,r}\}\cup\{x_{k}^{+}\} for

yk,r=argmaxyk,j𝒴kyk,jxk+2|k,j(xk+)|.\displaystyle y^{k,r}=\arg\max_{y^{k,j}\in{\cal Y}_{k}}\|y^{k,j}-x_{k}^{+}\|^{2}|\ell_{k,j}(x_{k}^{+})|. (102)
3: Replace a far interpolation point.

If |𝒴k|=pmax|{\cal Y}_{k}|=p^{\max}, criterion fails, either xkϑix_{k}\neq\vartheta_{i} or Δkϵi\Delta_{k}\leq\epsilon_{i}, and the set

k=def{yk,j𝒴k such that yk,jxk>ζΔ and k,j(xk+)0}\displaystyle{\cal F}_{k}\stackrel{{\scriptstyle\rm def}}{{=}}\{y^{k,j}\in{\cal Y}_{k}\text{ such that }\|y^{k,j}-x_{k}\|>\zeta\Delta\text{ and }\ell_{k,j}(x_{k}^{+})\neq 0\} (103)

is non-empty, then define 𝒴k+1=𝒴k{yk,r}{xk+}{\cal Y}_{k+1}={\cal Y}_{k}\setminus\{y^{k,r}\}\cup\{x_{k}^{+}\}, where

yk,r=argmaxyk,jkyk,jxk+2|k,j(xk+)|.\displaystyle y^{k,r}=\arg\max_{y^{k,j}\in{\cal F}_{k}}\|y^{k,j}-x_{k}^{+}\|^{2}|\ell_{k,j}(x_{k}^{+})|. (104)
4: Replace a close interpolation point.

If |𝒴k|=pmax|{\cal Y}_{k}|=p^{\max}, criterion fails, either xkϑix_{k}\neq\vartheta_{i} or Δkϵi\Delta_{k}\leq\epsilon_{i}, the set k{\cal F}_{k} is empty, and the set

𝒞k=def{yk,j𝒴k such that yk,jxkζΔ and |k,j(xk+)|>Λ}\displaystyle{\cal C}_{k}\stackrel{{\scriptstyle\rm def}}{{=}}\{y^{k,j}\in{\cal Y}_{k}\text{ such that }\|y^{k,j}-x_{k}\|\leq\zeta\Delta\text{ and }|\ell_{k,j}(x_{k}^{+})|>\Lambda\} (105)

is non-empty, then define 𝒴k+1=𝒴k{yk,r}{xk+}{\cal Y}_{k+1}={\cal Y}_{k}\setminus\{y^{k,r}\}\cup\{x_{k}^{+}\}, where

yk,r=argmaxyk,j𝒞kyk,jxk+2|k,j(xk+)|.\displaystyle y^{k,r}=\arg\max_{y^{k,j}\in{\cal C}_{k}}\|y^{k,j}-x_{k}^{+}\|^{2}|\ell_{k,j}(x_{k}^{+})|. (106)
5: No replacements.

If |𝒴k|=pmax|{\cal Y}_{k}|=p^{\max}, criterion fails and either [xk=ϑix_{k}=\vartheta_{i} and Δk>ϵi\Delta_{k}>\epsilon_{i}] or k𝒞k={\cal F}_{k}\cup{\cal C}_{k}=\emptyset, then define 𝒴k+1=𝒴k{\cal Y}_{k+1}={\cal Y}_{k}.

 
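
The replacement rules (102), (104) and (106) all select the point that maximizes the same distance-times-poisedness score; a minimal MATLAB sketch of this selection follows, with made-up candidate points and Lagrange polynomial values.

% Pick y^{k,r} = argmax ||y - x_plus||^2 * |l_{k,j}(x_plus)| among candidates.
Ycand    = [0 0; 1 0; 0 1];          % candidate interpolation points (rows, made-up)
x_plus   = [0.2 0.1];                % trial point x_k^+  (made-up)
ell_vals = [0.4; -1.2; 0.3];         % l_{k,j}(x_k^+) for each candidate (made-up)
scores   = sum((Ycand - x_plus).^2, 2) .* abs(ell_vals);
[~, r]   = max(scores);
y_out    = Ycand(r, :);              % point y^{k,r} to be replaced by x_plus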

The trust-region update strategies associated with ff- and zz-iterations are now described. Following the idea proposed in [20], the trust-region radii are allowed to decrease even when the interpolation set has been changed by the replacement of a far or a close point at unsuccessful iterations. However, the number of times a radius can be shrunk in this case is limited to νfmax\nu_{f}^{\max} and νzmax\nu_{z}^{\max}, respectively, in order to prevent the trust regions from becoming too small. If the interpolation set has not been updated, the algorithm infers that the lack of success is due not to the surrogate models but to the trust-region size, and it therefore reduces the latter.

 Algorithm 3.7: ff-iteration(xkx_{k}, sks_{k}, xk+x_{k}^{+}, sk+s_{k}^{+}, Δkf\Delta_{k}^{f}, Δkz\Delta_{k}^{z})  

1: Successful iteration.

If ρkfη1\rho_{k}^{f}\geq\eta_{1}, set (xk+1,sk+1)=(xk+,sk+)(x_{k+1},s_{k+1})=(x_{k}^{+},s_{k}^{+}) and νf=0\nu_{f}=0. The radius of 𝒯k{\cal T}_{k} is updated by

Δk+1f={min[max[γ2dk,Δkf],Δmax]ifρkfη2,Δkfifρkf[η1,η2),\displaystyle\Delta_{k+1}^{f}=\left\{\begin{array}[]{ll}\min\left[\max[\gamma_{2}\|d_{k}\|,\Delta_{k}^{f}],\Delta^{\max}\right]&\;\;\mbox{if}\;\;\rho_{k}^{f}\geq\eta_{2},\\ \Delta_{k}^{f}&\;\;\mbox{if}\;\;\rho_{k}^{f}\in[\eta_{1},\eta_{2}),\\ \end{array}\right. (109)

The radius of 𝒩k{\cal N}_{k} is updated by

Δk+1z={min[max[γ2nk,Δkz],Δmax]ifv(xk+,sk+)<η3vkmax,Δkzotherwise.\displaystyle\Delta_{k+1}^{z}=\left\{\begin{array}[]{ll}\min\left[\max[\gamma_{2}\|n_{k}\|,\Delta_{k}^{z}],\Delta^{\max}\right]&\!\!\!\;\;\mbox{if}\;\;v(x_{k}^{+},s_{k}^{+})<\eta_{3}\,v_{k}^{\max},\\ \Delta_{k}^{z}&\!\!\!\;\;\mbox{otherwise.}\end{array}\right. (112)
2: Unsuccessful iteration.

If ρkf<η1\rho_{k}^{f}<\eta_{1}, set (xk+1,sk+1)=(xk,sk)(x_{k+1},s_{k+1})=(x_{k},s_{k}) and Δk+1z=δkz\Delta^{z}_{k+1}=\delta^{z}_{k}. The radius of 𝒯k{\cal T}_{k} is updated by

Δk+1f={γ1dkif either(𝒴k+1𝒴kandνfνfmax)or𝒴k+1=𝒴k,Δkfif𝒴k+1𝒴kandνf>νfmax,\displaystyle\Delta_{k+1}^{f}=\left\{\begin{array}[]{ll}\gamma_{1}\|d_{k}\|&\!\!\!\!\!\;\;\mbox{if either}\;\;({\cal Y}_{k+1}\neq{\cal Y}_{k}\;\;\mbox{and}\;\;\nu_{f}\leq\nu_{f}^{\max})\\ &\,\;\;\;\mbox{or}\;\;{\cal Y}_{k+1}={\cal Y}_{k},\\ \Delta_{k}^{f}&\!\!\!\!\!\;\;\mbox{if}\;\;{\cal Y}_{k+1}\neq{\cal Y}_{k}\;\;\mbox{and}\;\;\nu_{f}>\nu_{f}^{\max},\\ \end{array}\right. (116)

If 𝒴k+1𝒴k{\cal Y}_{k+1}\neq{\cal Y}_{k} and νfνfmax\nu_{f}\leq\nu_{f}^{\max}, update νf=νf+1\nu_{f}=\nu_{f}+1.

 
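
For concreteness, the ff-iteration radius update in (109) and (116), together with the cap on consecutive shrinks described earlier, can be sketched in MATLAB as follows; all values are made-up placeholders.

% Sketch of the f-iteration radius update (109)/(116) with the shrink cap nu_f_max.
rho_f = 0.95;  eta1 = 1e-4;  eta2 = 0.9;  gamma1 = 0.5;  gamma2 = 2.0;
Delta_f = 0.1;  Delta_max = 10;  norm_d = 0.08;        % ||d_k||
Y_changed = false;  nu_f = 0;  nu_f_max = 20;
if rho_f >= eta2                                       % very successful
    Delta_f = min(max(gamma2*norm_d, Delta_f), Delta_max);
    nu_f = 0;
elseif rho_f >= eta1                                   % successful: radius kept
    nu_f = 0;
else                                                   % unsuccessful, rule (116)
    if ~Y_changed || nu_f <= nu_f_max
        Delta_f = gamma1*norm_d;                       % shrink, at most nu_f_max times
    end
    if Y_changed && nu_f <= nu_f_max
        nu_f = nu_f + 1;
    end
end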

The operations related to zz-iterations follow below.

 Algorithm 3.8: zz-iteration(xkx_{k}, sks_{k}, xk+x_{k}^{+}, sk+s_{k}^{+}, Δkf\Delta_{k}^{f}, Δkz\Delta_{k}^{z})  

1: Successful iteration.

If (97) holds, set (xk+1,sk+1)=(xk+,sk+)(x_{k+1},s_{k+1})=(x_{k}^{+},s_{k}^{+}), Δk+1f=Δkf\Delta_{k+1}^{f}=\Delta_{k}^{f} and νz=0\nu_{z}=0. The radius of 𝒩k{\cal N}_{k} is updated by

Δk+1z={min[max[γ2nk,Δkz],Δmax]ifρkzη2,Δkzifρkz[η1,η2).\displaystyle\Delta_{k+1}^{z}=\left\{\begin{array}[]{ll}\min\left[\max[\gamma_{2}\|n_{k}\|,\Delta_{k}^{z}],\Delta^{\max}\right]&\!\!\!\!\!\;\;\mbox{if}\;\;\rho_{k}^{z}\geq\eta_{2},\\ \Delta_{k}^{z}&\!\!\!\!\!\;\;\mbox{if}\;\;\rho_{k}^{z}\in[\eta_{1},\eta_{2}).\\ \end{array}\right. (119)
2: Unsuccessful iteration.

If (97) fails, set (xk+1,sk+1)=(xk,sk)(x_{k+1},s_{k+1})=(x_{k},s_{k}) and Δk+1f=Δkf\Delta^{f}_{k+1}=\Delta^{f}_{k}. The radius of 𝒩k{\cal N}_{k} is updated by

Δk+1z={γ1nkifnk0and either(𝒴k+1𝒴kandνzνzmax)or𝒴k+1=𝒴k,Δkzif𝒴k+1𝒴kandνz>νzmax,γ1Δkzifnk=0,\displaystyle\Delta_{k+1}^{z}=\left\{\begin{array}[]{ll}\gamma_{1}\|n_{k}\|&\!\!\!\!\!\;\;\mbox{if}\;\;\|n_{k}\|\neq 0\;\;\mbox{and either}\;\;({\cal Y}_{k+1}\neq{\cal Y}_{k}\;\;\mbox{and}\;\;\nu_{z}\leq\nu_{z}^{\max})\\ &\,\;\;\;\mbox{or}\;\;{\cal Y}_{k+1}={\cal Y}_{k},\\ \Delta_{k}^{z}&\!\!\!\!\!\;\;\mbox{if}\;\;{\cal Y}_{k+1}\neq{\cal Y}_{k}\;\;\mbox{and}\;\;\nu_{z}>\nu_{z}^{\max},\\ \gamma_{1}\Delta_{k}^{z}&\!\!\!\!\!\;\;\mbox{if}\;\;\|n_{k}\|=0,\\ \end{array}\right. (124)

If 𝒴k+1𝒴k{\cal Y}_{k+1}\neq{\cal Y}_{k} and νzνzmax\nu_{z}\leq\nu_{z}^{\max}, update νz=νz+1\nu_{z}=\nu_{z}+1.

 
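
The analogous sketch for the zz-iteration radius update in (119) and (124) follows; again, all values are made-up placeholders and the flag accept stands for the outcome of test (97).

% Sketch of the z-iteration radius update (119)/(124) with the shrink cap nu_z_max.
accept = false;  rho_z = 0;  eta2 = 0.9;  gamma1 = 0.5;  gamma2 = 2.0;
Delta_z = 0.2;  Delta_max = 10;  norm_n = 0.15;        % ||n_k||
Y_changed = true;  nu_z = 3;  nu_z_max = 20;
if accept                                              % (97) holds
    nu_z = 0;
    if rho_z >= eta2
        Delta_z = min(max(gamma2*norm_n, Delta_z), Delta_max);   % (119)
    end                                                % otherwise the radius is kept
else                                                   % (97) fails, rule (124)
    if norm_n == 0
        Delta_z = gamma1*Delta_z;
    elseif ~Y_changed || nu_z <= nu_z_max
        Delta_z = gamma1*norm_n;                       % shrink, at most nu_z_max times
    end
    if Y_changed && nu_z <= nu_z_max
        nu_z = nu_z + 1;
    end
end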

3.3.1 The local-search algorithm

We now provide the full description of the local search which assembles all the previous subroutines.

 Algorithm 3.9: LocalSearch(𝒮{\cal S}, 𝒴{\cal Y}, xx, ss, Δf\Delta^{f}, Δz\Delta^{z}, vmaxv^{\max})  

0: Initialization.

Choose an initial vector of Lagrange multipliers μ1\mu_{-1} and parameters ϵ>0\epsilon>0, ϵ0>0\epsilon_{0}>0, Δ0f>0\Delta^{f}_{0}>0, Δ0z>0\Delta^{z}_{0}>0, α(0,1)\alpha\in(0,1), 0<γ1<1<γ20<\gamma_{1}<1<\gamma_{2}, ζ1\zeta\geq 1, 0<η1<η2<10<\eta_{1}<\eta_{2}<1, Λ>1\Lambda>1, β>0\beta>0, η3>0\eta_{3}>0. Define Δ0=min[Δ0f,Δ0z]Δmax\Delta_{0}=\min[\Delta^{f}_{0},\Delta^{z}_{0}]\leq\Delta^{\max}. Initialize 𝒴0{\cal Y}_{0}, with x0𝒴0(x0;Δ0)x_{0}\in{\cal Y}_{0}\subset{\cal B}(x_{0};\Delta_{0}) and |𝒴0|n+1|{\cal Y}_{0}|\geq n+1, as well as the maximum number of interpolation points pmax|𝒴0|p_{\max}\geq|{\cal Y}_{0}|. Compute the associated models m0fm_{0}^{f} and m0cm_{0}^{c} around x0x_{0} and Lagrange polynomials {l0,j}j=0p\{l_{0,j}\}_{j=0}^{p}. Set 𝒮1=n{\cal S}_{-1}=\mathbb{R}^{n}. Define v0max=max[κza,κzrv(x0,s0)]v_{0}^{\max}=\max[\kappa_{za},\kappa_{zr}v(x_{0},s_{0})], where κza>0\kappa_{za}>0 and κzr>1\kappa_{zr}>1. Compute r1r_{-1} by solving (56) with normal step n1=0n_{-1}=0 and define π1f\pi_{-1}^{f} as in (49). Define νfmax>0\nu_{f}^{\max}>0 and νzmax>0\nu_{z}^{\max}>0 and set νf=νz=0\nu_{f}=\nu_{z}=0. Set k=0k=0 and i=0i=0.

1:

SubspaceMinimization(𝒮k1{\cal S}_{k-1}, 𝒴k{\cal Y}_{k}, xkx_{k}, sks_{k}, Δkf\Delta_{k}^{f}, Δkz\Delta_{k}^{z}, vkmaxv_{k}^{\max}).

2:

CriticalityStep(𝒮k{\cal S}_{k}, 𝒴k{\cal Y}_{k}, πk1f\pi_{k-1}^{f}, ϵi\epsilon_{i}).

3:

NormalStep(xkx_{k}, sks_{k}, πkv\pi_{k}^{v}, vkv_{k}, vkmaxv_{k}^{\max}).

4:

TangentStep(xkx_{k}, sks_{k}, nkn_{k}).

5: Conclude a μ\mu-iteration.

If nk=tk=0n_{k}=t_{k}=0, then

5.1:

set (xk+1,sk+1)=(xk,sk)(x_{k+1},s_{k+1})=(x_{k},s_{k}), Δk+1f=Δkf\Delta_{k+1}^{f}=\Delta_{k}^{f} and Δk+1z=Δkz\Delta_{k+1}^{z}=\Delta_{k}^{z};

5.2:

set Δk+1=min[Δk+1f,Δk+1z]\Delta_{k+1}=\min[\Delta^{f}_{k+1},\Delta^{z}_{k+1}], vk+1max=vkmaxv_{k+1}^{\max}=v_{k}^{\max} and 𝒴k+1=𝒴k{\cal Y}_{k+1}={\cal Y}_{k}.

6: Conclude an ff-iteration.

If tk0t_{k}\neq 0 and (83) and (88) hold,

6.1:

UpdateInterpolationSet(𝒴k{\cal Y}_{k}, xkx_{k}, xk+x_{k}^{+}, Δk\Delta_{k}, ϵi\epsilon_{i}, ‘ρkfη1\rho_{k}^{f}\geq\eta_{1}’);

6.2:

ff-iteration(xkx_{k}, sks_{k}, xk+x_{k}^{+}, sk+s_{k}^{+}, Δkf\Delta_{k}^{f}, Δkz\Delta_{k}^{z});

6.3:

Set Δk+1=min[Δk+1f,Δk+1z]\Delta_{k+1}=\min[\Delta^{f}_{k+1},\Delta^{z}_{k+1}] and vk+1max=vkmaxv_{k+1}^{\max}=v_{k}^{\max}.

7: Conclude a zz-iteration.

If either nk0n_{k}\neq 0 and tk=0t_{k}=0, or either one of (83) or (88) fails,

7.1:

UpdateInterpolationSet(𝒴k{\cal Y}_{k}, xkx_{k}, xk+x_{k}^{+}, Δk\Delta_{k}, ϵi\epsilon_{i}, ‘(97)’);

7.2:

zz-iteration(xkx_{k}, sks_{k}, xk+x_{k}^{+}, sk+s_{k}^{+}, Δkf\Delta_{k}^{f}, Δkz\Delta_{k}^{z});

7.3:

Set Δk+1=min[Δk+1f,Δk+1z]\Delta_{k+1}=\min[\Delta^{f}_{k+1},\Delta^{z}_{k+1}] and update vkmaxv_{k}^{\max} using (101).

8: Update the models and the Lagrange polynomials.

If 𝒴k+1𝒴k{\cal Y}_{k+1}\neq{\cal Y}_{k}, compute the interpolation models mk+1fm_{k+1}^{f} and mk+1cm_{k+1}^{c} around xk+1x_{k+1} using 𝒴k+1{\cal Y}_{k+1} and the associated Lagrange polynomials {lk+1,j}j=0p\{l_{k+1,j}\}_{j=0}^{p}. Increment kk by one and go to Step 1.

 

3.4 Parameters tuning and user goals

Like any other algorithm, DEFT-FUNNEL's performance can be affected by how its parameters are tuned. In this section, we discuss which parameters may have a major impact on its performance and how they should be tuned depending on the user's goals. We consider four main aspects concerning the solution of black-box problems: the budget, the priority level given to feasibility, the type of the objective function and the priority level given to global optimality. The possible settings of each aspect are listed below in decreasing order of difficulty:

  • Budget: very low, limited or unlimited.

  • Priority to feasibility: high or low.

  • Objective function: highly multimodal, multimodal or unimodal.

  • Priority given to global optimality: high or low.

Clearly, if the objective function is highly multimodal, the budget is limited and global optimality is important, one should give priority to a multistart strategy with a large number of samples per iteration. This can be done in two ways: the first is to reduce the budget of each local search through the optional input maxeval_ls, which equals maxeval*0.7 by default; for instance, one could set maxeval_ls = maxeval*0.4, so that each local search uses at most 40% of the total budget. The other possibility is to increase the number NN of random points generated at Step 3 of the global search (see Algorithm 3.1). If attaining the global minimum is not a requirement, a better approach is to give more budget to each local search so that the chances of reaching a local minimum are higher. If the objective function is not highly multimodal, one should look for a good compromise between spending the budget on each local search and on the sampling of the multistart strategy.
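
As a hypothetical illustration of the first option (the exact solver interface is not reproduced here; see the Github repository):

% Budget split for a highly multimodal objective (made-up total budget).
maxeval    = 2000;                  % total black-box evaluation budget
maxeval_ls = round(0.4*maxeval);    % cap each local search at 40% of the budget
                                    % (the default corresponds to 0.7*maxeval)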

We also notice that when the budget is very small and it is very hard to find a feasible solution, it may be a good idea to compute a tangent step only once feasibility has been achieved. This is because tangent steps may deteriorate the gains in feasibility obtained through normal steps; when this happens, more normal steps must be computed, which requires more function evaluations. Therefore, instead of spending the budget on both tangent and normal steps without any guarantee of feasibility in the end, it is a better strategy to compute only normal steps at the beginning so that the chances of obtaining a feasible solution are higher. For this purpose, the user should set the constant κ=0\kappa_{{\cal R}}=0 in (41). By doing so, the tangent step is computed only if the normal step equals zero, which, by (74), happens when the iterate is feasible. In the code, this can be done by setting kappa_r to zero in the function deft_funnel_set_parameters.
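
Schematically, the feasibility-first setting discussed above amounts to changing a single constant inside deft_funnel_set_parameters (whose surrounding code is not reproduced here):

% In deft_funnel_set_parameters (shown schematically):
kappa_r = 0;   % tangent step computed only once the normal step is zero,
               % i.e. once the iterate is feasible (see (41) and (74))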

4 Numerical experiments

We divide the numerical experiments into two sections: the first one is focused on the evaluation of the performance of DEFT-FUNNEL on black-box optimization problems, while the second one aims at analyzing the benefits of the exploitation of white-box functions on grey-box problems. In all experiments with DEFT-FUNNEL, minimum 2\ell_{2}-norm models were employed. The criticality step threshold was set to ϵ=104\epsilon=10^{-4} and the trust-region constants were defined as η1=104\eta_{1}=10^{-4}, η2=0.9\eta_{2}=0.9, η3=0.6\eta_{3}=0.6, γ1=0.5\gamma_{1}=0.5, γ2=2.0\gamma_{2}=2.0, νfmax=20×n\nu_{f}^{\max}=20\times n and νzmax=20×n\nu_{z}^{\max}=20\times n, where nn is the number of variables.

We compare DEFT-FUNNEL with two popular algorithms for constrained black-box optimization: the Genetic Algorithm (GA) and the Pattern Search (PS) algorithm from the Matlab Global Optimization Toolbox [32]. In particular, GA has been set with the Adaptive Feasible (’mutationadaptfeasible’) default mutation function and the Penalty (’penalty’) nonlinear constraint algorithm. As for PS, since it is a local optimization algorithm, it has been coupled with the Latin Hypercube Sampling (LHS) [33] method in order to achieve global optimality, a common strategy among practitioners. Our experiments therefore cover three of the most popular approaches for solving black-box problems: surrogate-based methods, genetic algorithms and pattern-search methods. Other DFO algorithms have already been compared against DEFT-FUNNEL in previous papers (see [42, 43]) on a much larger set of test problems mainly designed for local optimization.

4.1 Black-box optimization problems

The three methods are compared on a set of 14 well-known benchmark problems for constrained global optimization, including four industrial design problems — Welded Beam Design (WB4) [13], Gas Transmission Compressor Design (GTCD4) [7], Pressure Vessel Design (PVD4) [11] and Speed Reducer Design (SR7) [17] — and the Harley pooling problem (problem 5.2.2 from [16]), which is originally a maximization problem and has been converted into a minimization one. Besides the test problems originating from industrial applications, we have collected problems with different characteristics (multimodal, nonlinear, separable and non-separable, with connected and disconnected feasible regions) in order to have a broader view of the performance of the algorithms in various kinds of scenarios. For instance, the Hesse problem [23] is the combination of 3 separable problems with 18 local minima and 1 global minimum, while the Gómez #3 problem, listed as the third problem in [18], has many disconnected feasible regions and therefore many local minima. The test problems G3-G11 are taken from the widely known benchmark list in [34]. In Table 3, we give the number of decision variables, the number of constraints and the best known feasible objective function value of each test problem.

Table 3: Problem name, number of decision variables, number of constraints (simple bounds not included) and the best known feasible objective function value of each test problem.
Test problem    Number of variables    Number of constraints    Best known feasible objective function value
Harley (Harley Pooling Problem) 9 6 -600
WB4 (Welded Beam Design) 4 6 1.7250
GTCD4 (Gas Transmission Compressor Design) 4 1 2964893.85
PVD4 (Pressure Vessel Design) 4 3 5804.45
SR7 (Speed Reducer Design) 7 11 2994.42
Hesse 6 6 -310
Gómez #3 2 1 -0.9711
G3 2 1 -1
G4 5 6 -30665.539
G6 2 2 -6961.8139
G7 10 8 24.3062
G8 2 2 -0.0958
G9 7 4 680.6301
G11 2 1 0.75000455

Two types of black-box experiments were conducted in order to compare the algorithms. In the first type, a small budget of 100 black-box calls is given to each algorithm in order to evaluate how it performs on difficult problems with highly expensive functions. In such scenarios, many algorithms have difficulty obtaining even local minima, depending on the test problem. In the second type of experiments, we analyze their ability and speed in reaching global minima rather than local minima by allowing larger budgets that range from 100×n100\times n to 500×n500\times n, where nn is the number of variables. Every function is treated as a black box in both types of experiments. Finally, we have run each algorithm 50 times on each test problem.

Only approximately feasible solutions of problem (5) are considered when comparing the best objective function values obtained by the algorithms, i.e. we require that each solution xx^{*} satisfy cv(x)104cv(x^{*})\leq 10^{-4}, where

cv(x)=defmax[[z(x)us]+,[lsz(x)]+].\displaystyle cv(x)\stackrel{{\scriptstyle\rm def}}{{=}}\max\left[\left[z(x)-u^{s}\right]^{+},\left[l^{s}-z(x)\right]^{+}\right]. (125)
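
A minimal MATLAB sketch of this feasibility check, with made-up constraint values and bounds, is given below.

% Sketch of the feasibility check based on (125).
z  = [1.2; 0.0; -0.5];  ls = [0; -1; -1];  us = [1; 1; 1];   % made-up z(x) and bounds
cv = max([z - us; ls - z; 0]);      % largest bound violation, 0 if feasible
is_acceptable = (cv <= 1e-4);       % tolerance used in the experiments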

In the next two subsections, we show the results for the two types of experiments in the black-box setting.

4.1.1 Budget-driven experiments

The results of the first type of experiments are shown in Table 4. In the second column, fOPTf_{\textrm{OPT}} denotes the objective function value of the global minimum of the problem when it is known or the best known objective function value otherwise. For each solver, we show the best, the average and the worst objective function values obtained in 50 runs on every test problem.

Table 4: Results for the first type of experiments on a budget of 100 black-box calls. For each solver, we show the best, the average and the worst objective function values obtained in 50 runs.
Prob fOPTf_{\textrm{OPT}} Solver Best Avg. Worst
Harley -600 GA -195.2264 -3.1266 209.5655
PS None None None
DEFT-FUNNEL -600 -17.1508 301.9904
WB4 1.7250 GA 2.3013 4.4525 6.1749
PS 3.6274 7.2408 11.0038
DEFT-FUNNEL 2.9246 7.3649 16.1476
GTCD4 2964893.85 GA 4004353.4045 9693274.3280 13860176.1776
PS 4953046.0849 12212940.1950 13786675.7772
DEFT-FUNNEL 3648033.5677 11080105.7502 28062428.3209
PVD4 5804.45 GA 5804.3762 5870.6299 6095.9716
PS 5877.6483 7650.8416 10827.5206
DEFT-FUNNEL 5804.3761 7360.2882 10033.0231
SR7 2994.42 GA 3027.8978 3151.6054 3438.7069
PS 3134.0525 3516.1761 5677.6224
DEFT-FUNNEL 3003.7577 3489.8743 4583.1364
Hesse -310 GA -292.1091 -277.1073 -259.1020
PS -302.1163 -162.5650 -49.4364
DEFT-FUNNEL -310 -234.5969 -24
Gómez #3 -0.9711 GA -0.9711 -0.8551 -0.6532
PS -0.9700 -0.7774 -0.4293
DEFT-FUNNEL -0.9711 -0.0689 3.23333
G3 -1 GA -1 -0.9978 -0.9872
PS -1 -0.8162 -0.0831
DEFT-FUNNEL -1 -0.8967 -0.0182
G4 -30665.539 GA -30674.0765 -30048.4518 -29243.6646
PS -30814.5885 -29422.4521 -27838.8541
DEFT-FUNNEL -31025.6056 -30980.3524 -29246.5608
G6 -6961.8139 GA -6240.7711 -3520.1488 -2275.7798
PS -6252.2652 -3220.0072 -1206.1356
DEFT-FUNNEL -6961.8165 -6961.8158 -6961.8146
G7 24.3062091 GA 95.3300 325.3722 688.7635
PS 197.7475 434.6940 687.2492
DEFT-FUNNEL 24.3011 52.1011 185.5706
G8 -0.095825 GA -0.0909 -0.0267 -0.0002
PS -0.0953 -0.0097 0.0185
DEFT-FUNNEL -0.0958 -0.0471 0.0008
G9 680.6300573 GA 775.1281 1253.1102 4815.6952
PS 953.2964 5336.9157 12000.4159
DEFT-FUNNEL 797.1996 1403.7992 2668.9271
G11 0.75000455 GA 0.7500 0.7557 0.7941
PS 0.7502 0.8937 0.9997
DEFT-FUNNEL 0.7499 0.7513 0.8091

As can be seen in Table 4, DEFT-FUNNEL found the global minimum in 10 out of 14 problems, while GA and PS did so in only 5 problems. Moreover, when considering the best value found by each solver, DEFT-FUNNEL was superior or equal to the best solver in 12 problems. In the average and worst cases, it also presented a very good performance; in particular, its worst-case value was inferior to all the others’ in only 4 problems. Although GA did not reach the global minimum often, it presented the best average-case performance among all methods, while PS presented the worst. Finally, PS was the only solver unable to reach a feasible solution on the Harley pooling problem.

4.1.2 Experiments driven by global minima

We now evaluate the ability of each solver to find global minima rather than local minima. The results of the second type of experiments are presented individually for each test problem, which allows a better analysis of the evolution of each solver’s performance as the number of allowed function evaluations grows. Each figure is thus associated with a single test problem and shows the average progress curve of each solver over 50 trials as a function of the budget.

In Figure 4, we show the results on the test problems Harley, WB4, GTCD4 and PVD4. DEFT-FUNNEL and GA presented the best performance among the three methods, each being superior to the other on 2 out of 4 problems. PS was not only inferior to the other two methods, but it also seemed unaffected by the number of black-box calls allowed, as can be seen in Figures 4(a), 4(b) and 4(c). In particular, it did not find a feasible solution for the Harley problem at all. Moreover, the solutions found by PS often had much larger objective function values than those obtained by DEFT-FUNNEL and GA.

Figure 4: Mean best feasible objective function value obtained by each solver on 50 trials with budgets of 100×n, 200×n, 300×n and 400×n black-box function evaluations. Panels: (a) Harley; (b) WB4; (c) GTCD4; (d) PVD4.

Figure 5 shows the results on the test problems SR7, Hesse, Gómez #3 and G3. DEFT-FUNNEL and GA had similar performances on the SR7, Hesse and G3 problems, while PS had the poorest performance. GA was the best method on Gómez #3, coming very close to the global minimum in the end. Finally, PS’s performance on SR7 shows that allowing more black-box calls was not helpful; the objective function value even increased.

Figure 5: Mean best feasible objective function value obtained by each solver on 50 trials with budgets of 100×n, 200×n, 300×n and 400×n black-box function evaluations. Panels: (a) SR7; (b) Hesse; (c) Gómez #3; (d) G3.

Figure 6 shows the results on the test problems G4, G6, G7, G8, G9 and G11. DEFT-FUNNEL was superior to all the other methods, attaining global optimality on 4 problems independently of the number of evaluations allowed. GA was the second best, followed by PS. Besides not finding a feasible solution on the Harley pooling problem, PS shows a significant increase in its objective function value as the number of evaluations grows on the test problems SR7 and G7. These 3 problems are the ones with the largest numbers of constraints, which could be a reason for its poor performance.

Figure 6: Mean best feasible objective function value obtained by each solver on 50 trials with budgets of 100×n, 200×n, 300×n and 400×n black-box function evaluations. Panels: (a) G4; (b) G6; (c) G7; (d) G8; (e) G9; (f) G11.

4.2 Grey-box optimization problems

In this section, DEFT-FUNNEL is compared with GA and PS on a set of artificial grey-box problems built by selecting a test problem and defining some of its functions as black boxes and the remaining ones as white boxes. Among the 5 grey-box problems considered in the experiments, 3 were used in the black-box experiments described in the previous subsection, where all functions were considered as black boxes, namely: Hesse, SR7 and GTCD4. The reuse of these test problems allows us to evaluate the performance improvement of DEFT-FUNNEL in cases where some information is available and to compare its performance with that obtained in the black-box setting. The remaining grey-box problems are problems 21 and 23 from the well-known Hock-Schittkowski collection [24], denoted here as HS21 and HS23, respectively. Both HS21 and HS23 have nonlinear objective functions; however, in HS21 the objective function is defined as a white box, while in HS23 it is defined as a black box. The only constraint present in HS21 is defined as a black box, while in HS23 there is a balance between both categories among the constraints. We have therefore attempted to cover different possibilities related to the type of functions in a grey-box setting. The derivatives of all functions defined as white boxes were given as input to DEFT-FUNNEL. The reader can find more details about the tested grey-box problems in Table 5.

Table 5: Problem name, number of decision variables, number of black-box constraints, number of white-box constraints, type of objective function – black box (BB) or white box (WB) – and the best known feasible objective function value of each test problem.
Test problem    Number of variables    Number of BB constraints    Number of WB constraints    Type of obj. function    Best known feasible obj. function value
GTCD4 4 0 1 BB 2964893.85
SR7 7 9 2 WB 2994.42
Hesse 6 3 3 WB -310
HS21 2 1 0 WB -99.96
HS23 2 3 2 WB 2

In Table 6, the results obtained with a budget of 100 black-box calls are presented. It can be seen that DEFT-FUNNEL was the only solver to reach the global minimum in every problem, while GA had the best average-case and worst-case performance in general. We also notice that the three solvers had similar performances on HS21, all attaining the global minimum without much difficulty.

When comparing the black-box and grey-box results obtained by DEFT-FUNNEL on the problems GTCD4 and SR7, it is evident that the information available from the white-box functions contributed to improving its performance. Not only were the best objective function values on these problems improved, allowing the global minimum to be reached on SR7, but so were the average and worst values. Since DEFT-FUNNEL had already attained the global minimum on Hesse in the black-box setting, the only expected improvement would be in the average and worst cases, which did not happen in our experiments. This is probably due to the fact that Hesse is a multimodal problem with 18 local minima, being a combination of 3 separable problems with 1 global minimum. Therefore, even if information about the functions is partially available, the problem remains very difficult to solve due to its nature.

Table 6: Experiments with grey-box problems with a budget of 100 black-box calls. For each solver, we show the best, the average and the worst objective function values obtained in 50 runs.
Prob fOPTf_{\textrm{OPT}} Solver Best Avg. Worst
GTCD4 2964893.85 GA 4004353.4045 9693274.3280 13860176.1776
PS 4953046.0849 12212940.1950 13786675.7772
DEFT-FUNNEL 3564559.7818 9994027.4513 13937534.7346
SR7 2994.42 GA 3027.8978 3151.6054 3438.7069
PS 3134.0525 3516.1761 5677.6224
DEFT-FUNNEL 2994.4244 3370.8758 4274.8433
Hesse -310 GA -292.1091 -277.1073 -259.1020
PS -302.1163 -162.5650 -49.4364
DEFT-FUNNEL -310 -210.6784 -36
HS21 -99.96 GA -99.9599 -99.5528 -95.6688
PS -99.9599 -99.9435 -99.8086
DEFT-FUNNEL -99.96 -99.9593 -99.9267
HS23 2 GA 4.7868 13.7529 174.1049
PS 4.5138 45.6059 274.6368
DEFT-FUNNEL 2 424.9427 1242.011

5 Conclusions

We have introduced a new global optimization solver for general constrained grey-box and black-box optimization problems named DEFT-FUNNEL. It combines a stochastic multistart strategy with a trust-funnel sequential quadratic optimization algorithm where the black-box functions are replaced by surrogate models built from polynomial interpolation. The proposed solver is able to exploit available information from white-box functions rather than considering them as black boxes. Its code is open source and freely available at the Github repository http://github.com/phrsampaio/deft-funnel. Unlike many black-box optimization algorithms, it can handle both inequality and equality constraints and it does not require feasible starting points. Moreover, the constraints are treated individually rather than grouped into a penalty function.

We have shown that DEFT-FUNNEL compares favorably with other state-of-the-art algorithms available to the optimization community on a collection of well-known benchmark problems. The reported numerical results indicate that the proposed approach is very promising for grey-box and black-box optimization. Future research includes extensions to multiobjective optimization and mixed-integer nonlinear optimization, as well as a parallel version of the solver.

References

  • Amaioua et al. [2018] N. Amaioua, C. Audet, A. R. Conn, and S. L. Digabel. Efficient solution of quadratically constrained quadratic subproblems within the mesh adaptive direct search algorithm. European Journal of Operational Research, 268(1):13–24, 2018. ISSN 0377-2217. doi: https://doi.org/10.1016/j.ejor.2017.10.058. URL http://www.sciencedirect.com/science/article/pii/S0377221717309876.
  • Armstrong and Favorite [2016] J. C. Armstrong and J. A. Favorite. Using a derivative-free optimization method for multiple solutions of inverse transport problems. Optimization and Engineering, 17(1):105–125, Mar 2016. ISSN 1573-2924. doi: 10.1007/s11081-015-9306-x. URL https://doi.org/10.1007/s11081-015-9306-x.
  • Audet and Dennis [2006] C. Audet and J. Dennis. Mesh adaptive direct search algorithms for constrained optimization. SIAM Journal on Optimization, 17(1):188–217, 2006. doi: 10.1137/040603371. URL https://doi.org/10.1137/040603371.
  • Audet and Hare [2017] C. Audet and W. Hare. Derivative-Free and Blackbox Optimization. Springer, Cham, 2017. ISBN 978-3-319-68912-8. doi: https://doi.org/10.1007/978-3-319-68913-5.
  • Audet et al. [2015] C. Audet, S. L. Digabel, and M. Peyrega. Linear equalities in blackbox optimization. Computational Optimization and Applications, 61(1):1–23, May 2015. ISSN 1573-2894. doi: 10.1007/s10589-014-9708-2. URL https://doi.org/10.1007/s10589-014-9708-2.
  • Bajaj et al. [2018] I. Bajaj, S. S. Iyer, and M. M. F. Hasan. A trust region-based two phase algorithm for constrained black-box and grey-box optimization with infeasible initial point. Computers & Chemical Engineering, 116:306–321, 2018. ISSN 0098-1354. doi: https://doi.org/10.1016/j.compchemeng.2017.12.011. URL http://www.sciencedirect.com/science/article/pii/S0098135417304404.
  • Beightler and Phillips [1976] C. S. Beightler and D. T. Phillips. Applied Geometric Programming. John Wiley & Sons Inc, New York, 1976.
  • Birgin et al. [2000] E. G. Birgin, J. M. Martínez, and M. Raydan. Nonmonotone spectral projected gradient methods on convex sets. SIAM Journal on Optimization, 10(4):1196–1211, 2000.
  • Boukouvala et al. [2017] F. Boukouvala, M. M. F. Hasan, and C. A. Floudas. Global optimization of general constrained grey-box models: new method and its application to constrained pdes for pressure swing adsorption. Journal of Global Optimization, 67(1):3–42, Jan 2017. ISSN 1573-2916. doi: 10.1007/s10898-015-0376-2. URL https://doi.org/10.1007/s10898-015-0376-2.
  • Bueno et al. [2013] L. Bueno, A. Friedlander, J. Martínez, and F. Sobral. Inexact restoration method for derivative-free optimization with smooth constraints. SIAM Journal on Optimization, 23(2):1189–1213, 2013. doi: 10.1137/110856253. URL https://doi.org/10.1137/110856253.
  • Coello and Montes [2002] C. A. C. Coello and E. M. Montes. Constraint-handling in genetic algorithms through the use of dominance-based tournament selection. Advanced Engineering Informatics, 16(3):193–203, 2002. ISSN 1474–0346. doi: https://doi.org/10.1016/S1474-0346(02)00011-3. URL http://www.sciencedirect.com/science/article/pii/S1474034602000113.
  • Conn et al. [2009] A. R. Conn, K. Scheinberg, and L. N. Vicente. Introduction to derivative-free optimization. MPS-SIAM Book Series on Optimization, Philadelphia, 2009.
  • Deb [2000] K. Deb. An efficient constraint handling method for genetic algorithms. Computer Methods in Applied Mechanics and Engineering, 186(2):311 – 338, 2000. ISSN 0045-7825. doi: https://doi.org/10.1016/S0045-7825(99)00389-8. URL http://www.sciencedirect.com/science/article/pii/S0045782599003898.
  • Echebest et al. [2017] N. Echebest, M. L. Schuverdt, and R. P. Vignau. An inexact restoration derivative-free filter method for nonlinear programming. Computational and Applied Mathematics, 36(1):693–718, Mar 2017. ISSN 1807-0302. doi: 10.1007/s40314-015-0253-0. URL https://doi.org/10.1007/s40314-015-0253-0.
  • Floudas [2000] C. Floudas. Deterministic Global Optimization: Theory, Methods and Applications. Springer, Boston, MA, 2000. ISBN 978-1-4419-4820-5. doi: https://doi.org/10.1007/978-1-4757-4949-6.
  • Floudas et al. [1999] C. Floudas, P. Pardalos, C. Adjiman, W. R. Esposito, Z. H. Gümüs, S. T. Harding, J. L. Klepeis, C. A. Meyer, and C. A. Schweiger. Handbook of Test Problems in Local and Global Optimization. Springer US, 1999. ISBN 978-0-7923-5801-5. doi: 10.1007/978-1-4757-3040-1.
  • Floudas and Pardalos [1990] C. A. Floudas and P. M. Pardalos. A Collection of Test Problems for Constrained Global Optimization Algorithms. Springer-Verlag, Berlin, Heidelberg, 1990. ISBN 0-387-53032-0.
  • Gómez and Levy [1982] S. Gómez and A. V. Levy. The tunnelling method for solving the constrained global optimization problem with several non-connected feasible regions. In J. P. Hennart, editor, Numerical Analysis, pages 34–47, Berlin, Heidelberg, 1982. Springer Berlin Heidelberg. ISBN 978-3-540-38986-6.
  • Gould and Toint [2010] N. I. M. Gould and P. L. Toint. Nonlinear programming without a penalty function or a filter. Mathematical Programming, 122(1):155–196, Mar 2010. ISSN 1436-4646. doi: 10.1007/s10107-008-0244-7. URL https://doi.org/10.1007/s10107-008-0244-7.
  • Gratton et al. [2011] S. Gratton, P. L. Toint, and A. Tröltzsch. An active-set trust-region method for bound-constrained nonlinear optimization without derivatives. Optimization Methods and Software, 26(4-5):875–896, 2011.
  • Griewank [2003] A. Griewank. A mathematical view of automatic differentiation. Acta Numerica, 12:321––398, 2003. doi: 10.1017/S0962492902000132.
  • Griewank and Walther [2008] A. Griewank and A. Walther. Evaluating Derivatives. Society for Industrial and Applied Mathematics, second edition, 2008. doi: 10.1137/1.9780898717761. URL https://epubs.siam.org/doi/abs/10.1137/1.9780898717761.
  • Hesse [1973] R. Hesse. A heuristic search procedure for estimating a global solution of nonconvex programming problems. Operations Research, 21(6):1267–1280, 1973. doi: 10.1287/opre.21.6.1267. URL https://doi.org/10.1287/opre.21.6.1267.
  • Hock and Schittkowski [1980] W. Hock and K. Schittkowski. Test examples for nonlinear programming codes. Journal of Optimization Theory and Applications, 30(1):127–129, 1980. doi: 10.1007/BF00934594. URL https://doi.org/10.1007/BF00934594.
  • Jones et al. [1998] D. R. Jones, M. Schonlau, and W. J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455–492, Dec 1998. ISSN 1573-2916. doi: 10.1023/A:1008306431147. URL https://doi.org/10.1023/A:1008306431147.
  • Kan and Timmer [1987a] A. H. G. R. Kan and G. T. Timmer. Stochastic global optimization methods part i: Clustering methods. Mathematical Programming, 39(1):27–56, Sep 1987a. ISSN 1436-4646. doi: 10.1007/BF02592070. URL https://doi.org/10.1007/BF02592070.
  • Kan and Timmer [1987b] A. H. G. R. Kan and G. T. Timmer. Stochastic global optimization methods part ii: Multi level methods. Mathematical Programming, 39(1):57–78, Sep 1987b. ISSN 1436-4646. doi: 10.1007/BF02592071. URL https://doi.org/10.1007/BF02592071.
  • Larson et al. [2019] J. Larson, M. Menickelly, and S. M. Wild. Derivative-free optimization methods. Acta Numerica, 28:287–404, 2019. doi: 10.1017/S0962492919000060.
  • Lewis and Torczon [2002] R. Lewis and V. Torczon. A globally convergent augmented lagrangian pattern search algorithm for optimization with general constraints and simple bounds. SIAM Journal on Optimization, 12(4):1075–1089, 2002. doi: 10.1137/S1052623498339727. URL https://doi.org/10.1137/S1052623498339727.
  • Locatelli [1998] M. Locatelli. Relaxing the assumptions of the multilevel single linkage algorithm. Journal of Global Optimization, 13(1):25–42, Jan 1998. ISSN 1573-2916. doi: 10.1023/A:1008246031222. URL https://doi.org/10.1023/A:1008246031222.
  • Maratos [1978] N. Maratos. Exact penalty function algorithms for finite dimensional and control optimization problems. Department of Control Theory, Imperial College London, 1978. URL https://books.google.fr/books?id=Ar2AtgAACAAJ.
  • MATLAB [2015] MATLAB. 2015b. The MathWorks Inc., Natick, Massachusetts, 2015.
  • McKay et al. [1979] M. D. McKay, R. J. Beckman, and W. J. Conover. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2):239–245, 1979. ISSN 00401706. URL http://www.jstor.org/stable/1268522.
  • Michalewicz and Schoenauer [1996] Z. Michalewicz and M. Schoenauer. Evolutionary algorithms for constrained parameter optimization problems. Evolutionary Computation, 4(1):1–32, March 1996. doi: 10.1162/evco.1996.4.1.1.
  • Nocedal and Wright [2006] J. Nocedal and S. J. Wright. Numerical optimization. Springer Series in Operations Research and Financial Engineering. Springer, New York, NY, 2. ed. edition, 2006. ISBN 978-0-387-30303-1.
  • Powell [1994] M. J. D. Powell. A direct search optimization method that models the objective and constraint functions by linear interpolation. In S. Gomez and J.-P. Hennart, editors, Advances in Optimization and Numerical Analysis, pages 51–67. Springer Netherlands, Dordrecht, 1994. ISBN 978-94-015-8330-5. doi: 10.1007/978-94-015-8330-5_4. URL https://doi.org/10.1007/978-94-015-8330-5_4.
  • Regis [2011] R. G. Regis. Stochastic radial basis function algorithms for large-scale optimization involving expensive black-box objective and constraint functions. Computers & Operations Research, 38(5):837–853, 2011. ISSN 0305-0548. doi: https://doi.org/10.1016/j.cor.2010.09.013. URL http://www.sciencedirect.com/science/article/pii/S030505481000208X.
  • Regis [2014] R. G. Regis. Constrained optimization by radial basis function interpolation for high-dimensional expensive black-box problems with infeasible initial points. Engineering Optimization, 46(2):218–243, 2014. doi: 10.1080/0305215X.2013.765000. URL https://doi.org/10.1080/0305215X.2013.765000.
  • Regis and Shoemaker [2005] R. G. Regis and C. A. Shoemaker. Constrained global optimization of expensive black box functions using radial basis functions. Journal of Global Optimization, 31(1):153–171, Jan 2005. ISSN 1573-2916. doi: 10.1007/s10898-004-0570-0. URL https://doi.org/10.1007/s10898-004-0570-0.
  • Regis and Shoemaker [2007] R. G. Regis and C. A. Shoemaker. A stochastic radial basis function method for the global optimization of expensive functions. INFORMS Journal on Computing, 19(4):497–509, 2007. doi: 10.1287/ijoc.1060.0182. URL https://doi.org/10.1287/ijoc.1060.0182.
  • Rios and Sahinidis [2013] L. M. Rios and N. V. Sahinidis. Derivative-free optimization: a review of algorithms and comparison of software implementations. Journal of Global Optimization, 56(3):1247–1293, Jul 2013. ISSN 1573-2916. doi: 10.1007/s10898-012-9951-y. URL https://doi.org/10.1007/s10898-012-9951-y.
  • Sampaio and Toint [2015] P. R. Sampaio and P. L. Toint. A derivative-free trust-funnel method for equality-constrained nonlinear optimization. Computational Optimization and Applications, 61(1):25–49, May 2015. ISSN 1573-2894. doi: 10.1007/s10589-014-9715-3. URL https://doi.org/10.1007/s10589-014-9715-3.
  • Sampaio and Toint [2016] P. R. Sampaio and P. L. Toint. Numerical experience with a derivative-free trust-funnel method for nonlinear optimization problems with general nonlinear constraints. Optimization Methods and Software, 31(3):511–534, 2016. doi: 10.1080/10556788.2015.1135919. URL https://doi.org/10.1080/10556788.2015.1135919.
  • Scheinberg and Toint [2010] K. Scheinberg and P. L. Toint. Self-correcting geometry in model-based algorithms for derivative-free unconstrained optimization. SIAM Journal on Optimization, 20(6):3512–3532, 2010.
  • Sendín et al. [2009] J.-O. H. Sendín, J. R. Banga, and T. Csendes. Extensions of a multistart clustering algorithm for constrained global optimization problems. Industrial & Engineering Chemistry Research, 48(6):3014–3023, 2009. doi: 10.1021/ie800319m. URL https://doi.org/10.1021/ie800319m.
  • Zhang et al. [2010] H. Zhang, A. Conn, and K. Scheinberg. A derivative-free algorithm for least-squares minimization. SIAM Journal on Optimization, 20(6):3555–3576, 2010. doi: 10.1137/09075531X. URL https://doi.org/10.1137/09075531X.