Model Construction for Convex-Constrained Derivative-Free Optimization
Abstract
We develop a new approximation theory for linear and quadratic interpolation models, suitable for use in convex-constrained derivative-free optimization (DFO). Most existing model-based DFO methods for constrained problems assume the ability to construct sufficiently accurate approximations via interpolation, but the standard notions of accuracy (designed for unconstrained problems) may not be achievable by only sampling feasible points, and so may not give practical algorithms. This work extends the theory of convex-constrained linear interpolation developed in [Hough & Roberts, SIAM J. Optim, 32:4 (2022), pp. 2552–2579] to the case of linear regression models and underdetermined quadratic interpolation models.
Keywords: derivative-free optimization, convex constraints, interpolation
Mathematics Subject Classification: 65K05, 90C30, 90C56
1 Introduction
This work is concerned with model-based methods for derivative-free optimization (DFO). Such methods are widely studied and have proven to be popular in practice [16, 1, 25]. Typically, model-based methods construct a local interpolant to the objective from selected evaluations, which is then used in place of a Taylor series within a standard framework, most commonly trust-region methods [13]. The convergence and worst-case complexity of model-based DFO in the unconstrained setting is well-established. Here, we are interested in nonconvex minimization over a convex set,
(1.1)  $\min_{x \in C} f(x),$
where $f : \mathbb{R}^n \to \mathbb{R}$ is smooth (but nonconvex), and $C \subseteq \mathbb{R}^n$ is a closed, convex set with nonempty interior. Such problems appear in the derivative-free setting with, for example, bound and linear constraints in problems from climate science [31, 30], deep learning [33] and systems design engineering [23]. Although a model-based DFO framework for solving (1.1) was recently developed in [22], it only considered the case of constructing linear interpolants from function evaluations (at feasible points). Outside of structured problems such as nonlinear least-squares [8], quadratic interpolants are the most common models used in model-based DFO [28, 29, 16]. In this work, we develop a comprehensive approximation theory for linear regression and underdetermined quadratic interpolation models based on sampling only from feasible points. This theory fits the requirements of the method from [22], and so allows this framework to be used with more practical model choices.
1.1 Existing work
Model-based DFO methods for unconstrained problems have well-established convergence [13, 16] and worst-case complexity theory [17, 25]. This theory relies on building (typically linear or quadratic) interpolants for the objective function based on samples of the objective at specific points, which must be “fully linear”, i.e. satisfy specific error bounds. Unlike much approximation theory in numerical analysis (e.g. [32]), it is typically assumed that we have limited ability to select the sample points and instead should use as many pre-existing evaluations as possible (motivated by problems with computationally expensive objectives, an important DFO use case). In this setting, the model accuracy can be bounded by quantities depending only on the choice of sampling points (and information about the smoothness of the objective). Originally in [13], the concept of “$\Lambda$-poised” interpolation sets was introduced as a sufficient condition for constructing fully linear interpolation models, along with procedures for improving the choice of sample points (while retaining as many existing points as possible) to construct a $\Lambda$-poised set. This was extended in [14] to polynomial regression and underdetermined polynomial interpolation models. A specific form of underdetermined quadratic interpolation, namely minimum Frobenius norm interpolation, was introduced and extensively studied in [26, 27] and is the foundation of several state-of-the-art codes [28, 29, 6]. An analysis of minimum Frobenius norm interpolation in terms of $\Lambda$-poisedness/fully linear error bounds was given in [16]. Alternative model constructions have been proposed based on weighted linear regression [5], underdetermined quadratic interpolation based on Sobolev norms [37], radial basis functions (RBFs) [35] and combinations of polynomials and RBFs [2], for example.
Many model-based DFO methods have been proposed for constrained optimization; see [25, Section 7] for a thorough survey. We note in particular the following strictly feasible methods: [10] studies convex-constrained problems, [29, 19] and [34, Section 6.3] consider bound constraints, and [20] considers linear inequality constraints. Strictly feasible methods for nonconvex-constrained problems have also been considered in [11, 9, 36]. Of these methods, where convergence was established it was based on the assumption that fully linear models could be constructed using the same $\Lambda$-poised interpolation sets as in the unconstrained case. However, this is potentially impractical, as in some settings “it may be impossible to obtain a fully linear model using only feasible points” [25, p. 362].
More recently, [22] introduced a model-based DFO method for solving (1.1) by adapting the unconstrained approach from [15] based on gradient-based trust-region methods for convex-constrained optimization [12, Chapter 12]. The method has convergence and worst-case complexity results which align with unconstrained model-based DFO [17] and convex-constrained gradient-based trust-region methods [7]. To make this approach practical, it introduced a weaker notion of fully linear models which is sufficient for algorithm convergence, but which can be attained using linear interpolation over a $\Lambda$-poised set of only feasible points. It also demonstrated how to modify an interpolation set to make it $\Lambda$-poised (without changing too many points). However, linear models are not typically used in practice, except for special cases (such as nonlinear least-squares minimization [8]), with quadratic models being preferred [28, 29, 6].
We additionally note that the BOBYQA code [29] for bound-constrained optimization uses maximization of Lagrange polynomials inside the feasible region—similar to how we will define $\Lambda$-poisedness in this setting—but this was a heuristic with no theoretical analysis. (In fact, our results here give some theoretical justification for this approach.)
1.2 Contributions
The contributions of this work are:
• An extension of the convex-constrained linear interpolation theory of [22] to linear regression models, including fully linear error bounds and the associated notion of $\Lambda$-poisedness based on sampling only feasible points.
• A full approximation theory for underdetermined quadratic interpolation (in the style of [26, 27]) based on sampling only feasible points. This includes the derivation of fully linear error bounds, the relevant notion of $\Lambda$-poisedness in this setting, and procedures for efficiently generating $\Lambda$-poised sets. By construction, in the case of a full interpolation set of $(n+1)(n+2)/2$ points for quadratic interpolation, this approach produces the same result as standard quadratic interpolation, and so our results also apply to this setting.
While the linear regression theory is of independent interest for DFO applied to noisy objectives [24], it is primarily included as a prerequisite for deriving the quadratic results. The results proven in both cases are sufficient to enable these model constructions to be used in the algorithmic framework from [22] (with the associated convergence and worst-case complexity guarantees).
While the formulations and results are very similar to those for the unconstrained case from [14], the constrained setting necessitates new approaches for proving the results (which, when applied to the unconstrained case $C = \mathbb{R}^n$, provide simpler proofs of existing results).
Organization
A summary of the modified fully linear model accuracy requirement and associated algorithm from [22] is given in Section 2. That $\Lambda$-poisedness (only over the feasible region) guarantees fully linear interpolation models is first shown for linear regression models in Section 3 and then for (underdetermined) quadratic interpolation models in Section 4. Lastly, Section 5 provides concrete procedures for identifying and constructing $\Lambda$-poised quadratic interpolation models using only feasible points.
Notation
We let $\|\cdot\|$ denote the Euclidean norm of vectors and the operator 2-norm of matrices, and write $B(x,\Delta)$ for the closed ball $\{y \in \mathbb{R}^n : \|y - x\| \leq \Delta\}$.
2 Background
This section summarizes the key background and results from [22], which provide a model-based DFO algorithm for solving (1.1).
We now outline how problem (1.1) can be solved using model-based DFO methods, beginning with our problem assumptions.
Assumption 2.1.
The objective function $f$ is bounded below by some $f_{\mathrm{low}} \in \mathbb{R}$ and continuously differentiable, and $\nabla f$ is $L_{\nabla f}$-Lipschitz continuous.
Although $\nabla f$ exists, we assume that it is not available to the algorithm as an oracle.
Assumption 2.2.
The feasible set $C \subseteq \mathbb{R}^n$ is closed and convex with nonempty interior.
Assumption 2.2 implies that the Euclidean projection operator for $C$,
(2.1)  $\operatorname{proj}_C(y) := \arg\min_{x \in C} \|x - y\|,$
is a well-defined function [3, Theorem 6.25]. Our framework is based on a setting in which $\operatorname{proj}_C$ is inexpensive to evaluate (at least compared to evaluations of the objective $f$). Several examples of feasible sets which have simple analytic expressions for $\operatorname{proj}_C$ are given in [3, Table 6.1].
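For concreteness, the following is a minimal sketch (in Python/numpy, with illustrative function names) of two such simple cases: projection onto a box and onto a Euclidean ball. These specific feasible sets are examples only; the framework allows any closed convex $C$ with an available projection.

```python
import numpy as np

def project_box(y, lower, upper):
    """Euclidean projection onto the box C = {x : lower <= x <= upper}."""
    return np.clip(y, lower, upper)

def project_ball(y, centre, radius):
    """Euclidean projection onto the closed ball C = B(centre, radius)."""
    d = y - centre
    nrm = np.linalg.norm(d)
    return y.copy() if nrm <= radius else centre + (radius / nrm) * d
```

Both cost $\mathcal{O}(n)$ per call, which is negligible compared to an expensive objective evaluation.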
In this work, as in [22], we aim to find a first-order optimal solution to (1.1). To measure this, we use the following first-order criticality measure from [12, Section 12.1.4], defined for all $x \in C$:
(2.2)  $\pi^f(x) := \left| \min_{\substack{x + d \in C \\ \|d\| \leq 1}} \nabla f(x)^T d \right|.$
From [12, Theorem 12.1.6], we have that $\pi^f$ is a non-negative, continuous function of $x$, and $\pi^f(x) = 0$ if and only if $x$ is a first-order critical point for (1.1). In the unconstrained case $C = \mathbb{R}^n$, (2.2) simplifies to $\pi^f(x) = \|\nabla f(x)\|$. For notational convenience, we define $\pi_k := \pi^f(x_k)$, where $x_k$ is the $k$-th iterate of our algorithm.
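As an illustration of (2.2), the sketch below computes $\pi^f(x)$ for a box-constrained $C$ by solving the linear-objective subproblem numerically with scipy; the function name and the choice of the SLSQP solver are illustrative assumptions, not part of the algorithm of [22].

```python
import numpy as np
from scipy.optimize import minimize

def criticality_measure(grad, x, lower, upper):
    """pi^f(x) = | min { grad @ d : x + d in [lower, upper], ||d|| <= 1 } | for a box C."""
    x = np.asarray(x, float)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    bounds = list(zip(lower - x, upper - x))                  # enforces x + d in the box
    ball = {"type": "ineq", "fun": lambda d: 1.0 - d @ d}     # enforces ||d|| <= 1
    res = minimize(lambda d: grad @ d, np.zeros(len(x)), jac=lambda d: grad,
                   method="SLSQP", bounds=bounds, constraints=[ball])
    return abs(res.fun)
```

In the unconstrained case the minimizing direction is $-\nabla f(x)/\|\nabla f(x)\|$, recovering $\pi^f(x) = \|\nabla f(x)\|$ as noted above.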
2.1 Model Construction & Accuracy
We will consider a model-based DFO approach for solving (1.1). Specifically, at each iteration $k$ we construct a local quadratic approximation $m_k$ to $f$ which we hope to be accurate near the current iterate $x_k$:
(2.3)  $m_k(s) := c_k + g_k^T s + \frac{1}{2} s^T H_k s,$
for some $c_k \in \mathbb{R}$, $g_k \in \mathbb{R}^n$ and $H_k \in \mathbb{R}^{n \times n}$ with $H_k = H_k^T$.
Assumption 2.3.
There exists $\kappa_H > 0$ such that $\|H_k\| \leq \kappa_H$ for all $k$.
Convergence of our algorithm will require that the local models (2.3) are sufficiently accurate approximations of the objective (at least on some iterations). The required notion of ‘sufficiently accurate’ is the following:
Definition 2.4.
The model (2.3) is $C$-fully linear in $B(x_k, \Delta_k)$ if there exist constants $\kappa_{\mathrm{ef}}, \kappa_{\mathrm{eg}} > 0$, independent of $x_k$ and $\Delta_k$, such that
(2.4a)  $|f(y) - m_k(y - x_k)| \leq \kappa_{\mathrm{ef}} \Delta_k^2 \quad \text{for all } y \in B(x_k, \Delta_k) \cap C,$
(2.4b)  $\max_{\substack{x_k + d \in C \\ \|d\| \leq 1}} \left| (\nabla f(x_k) - g_k)^T d \right| \leq \kappa_{\mathrm{eg}} \Delta_k.$
When we wish to refer to this notion in the abstract sense, we use the term “$C$-full linearity”.
We note in particular that (2.4a) uses the constraint $y \in C$ and (2.4b) uses $x_k + d \in C$. This essentially corresponds to the standard definition of fully linear models [15, Definition 3.1] in the unconstrained case $C = \mathbb{R}^n$.
Our algorithm will require the existence of two procedures related to $C$-full linearity: given a model $m_k$, iterate $x_k$ and trust-region radius $\Delta_k$, we assume that we can
• Verify whether or not $m_k$ is $C$-fully linear in $B(x_k, \Delta_k)$; and
• If $m_k$ is not $C$-fully linear, create a new model (typically a small modification of $m_k$) which is $C$-fully linear in $B(x_k, \Delta_k)$.
Lastly, our model (2.3) induces an approximate first-order criticality measure, namely
(2.5)  $\pi^{m_k}(x) := \left| \min_{\substack{x + d \in C \\ \|d\| \leq 1}} \nabla m_k(x - x_k)^T d \right|.$
As above, for convenience we will define $\pi_k^m := \pi^{m_k}(x_k)$.
2.2 Algorithm Specification
Our DFO method is based on a trust-region framework. That is, given the local model (2.3), we calculate a potential new iterate as $x_k + s_k$, where $s_k$ is an approximate minimizer of the trust-region subproblem
(2.6)  $\min_{s \in \mathbb{R}^n} \ m_k(s) \quad \text{s.t.} \quad \|s\| \leq \Delta_k, \ \ x_k + s \in C,$
where $\Delta_k > 0$ is a parameter updated dynamically inside the algorithm. Formally, we require the following, which can be achieved using a Goldstein-type linesearch method [12, Algorithm 12.2.2 & Theorem 12.2.2].
Assumption 2.5.
There exists a constant $c_1 \in (0,1)$ such that the computed step $s_k$ satisfies $x_k + s_k \in C$, $\|s_k\| \leq \Delta_k$, and the generalized Cauchy decrease condition:
(2.7)  $m_k(0) - m_k(s_k) \geq c_1 \, \pi_k^m \, \min\left( \frac{\pi_k^m}{1 + \|H_k\|},\ \Delta_k,\ 1 \right).$
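One simple way to compute a feasible step of this type is a projected-gradient backtracking search along the arc $\operatorname{proj}_C(x_k - t g_k)$. The sketch below is illustrative only (a basic backtracking variant, not the Goldstein-type linesearch of [12]); `project`, the model data and the decrease fraction `eta` are assumed inputs.

```python
import numpy as np

def projected_cauchy_step(xk, gk, Hk, delta, project, eta=0.1, tau=0.5, max_iter=50):
    """Backtrack along s(t) = proj_C(xk - t*gk) - xk until the step lies inside the
    trust region and achieves a fraction eta of the linear model decrease."""
    def decrease(s):                            # m_k(0) - m_k(s)
        return -(gk @ s + 0.5 * s @ (Hk @ s))

    # Since xk is feasible and proj_C is nonexpansive, ||s(t)|| <= t * ||gk||.
    t = delta / max(np.linalg.norm(gk), 1e-15)
    for _ in range(max_iter):
        s = project(xk - t * gk) - xk
        if np.linalg.norm(s) <= delta and gk @ s < 0 and decrease(s) >= -eta * (gk @ s):
            return s
        t *= tau
    return np.zeros_like(xk)                    # fall back to the zero step
```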
The full derivative-free algorithm for solving (1.1) from [22], called CDFO-TR, is given in Algorithm 1. The overall structure is similar to other model-based DFO methods, such as [15, Algorithm 4.1] for unconstrained minimization.
(2.8)  $\rho_k := \frac{f(x_k) - f(x_k + s_k)}{m_k(0) - m_k(s_k)}.$
We have the following global convergence and worst-case complexity results for Algorithm 1 from [22].
Theorem 2.6 (Theorem 3.10 & Corollary 3.15, [22]). Suppose Assumptions 2.1, 2.2, 2.3 and 2.5 hold. Then the iterates of Algorithm 1 satisfy $\lim_{k\to\infty} \pi_k = 0$, and the number of iterations before $\pi_k \leq \epsilon$ for the first time is at most $\mathcal{O}(\epsilon^{-2})$.
In [22], details are given on how to construct fully linear models based on linear interpolation to feasible points. Although linear models can be practical for some structured problems, such as nonlinear least-squares objectives (e.g. [8]), quadratic models are generally preferred. The remainder of this paper is devoted to the construction of fully linear quadratic models by only sampling the objective at feasible points.
3 Linear Regression Models
We first consider how to extend the linear interpolation approximation theory from [22] to the case of linear regression models (with the slight extra generalization that the base point $x_k$ does not need to be an interpolation point). The purpose of this is twofold: regression models can be useful for modelling noisy objectives (e.g. [5]), and this theory will be necessary to develop the corresponding quadratic interpolation theory in Section 4. In the unconstrained case $C = \mathbb{R}^n$, these results were originally proven in [14].
Suppose we have sampled the objective at points $y_1, \dots, y_p \in C$ (where $p \geq n+1$), and given this information we wish to find a linear model
(3.1)  $m(s) = c + g^T s,$
by solving
(3.2)  $\min_{c \in \mathbb{R},\, g \in \mathbb{R}^n} \ \sum_{t=1}^{p} \left( m(y_t - x_k) - f(y_t) \right)^2.$
Equivalently, we can find the $c$ and $g$ for our model by finding the least-squares solution to the system
(3.3)  $M \begin{bmatrix} c \\ g \end{bmatrix} = \begin{bmatrix} f(y_1) \\ \vdots \\ f(y_p) \end{bmatrix}, \qquad \text{where} \quad M := \begin{bmatrix} 1 & (y_1 - x_k)^T \\ \vdots & \vdots \\ 1 & (y_p - x_k)^T \end{bmatrix} \in \mathbb{R}^{p \times (n+1)},$
which may be written as
(3.4)  $\begin{bmatrix} c \\ g \end{bmatrix} = M^{\dagger} \begin{bmatrix} f(y_1) \\ \vdots \\ f(y_p) \end{bmatrix},$
where $M^{\dagger}$ is the Moore–Penrose pseudoinverse of $M$.
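A minimal numpy sketch of the construction (3.3)–(3.4): build the matrix $M$ row by row and take the least-squares solution. Variable names are illustrative, and `np.linalg.lstsq` plays the role of applying $M^{\dagger}$.

```python
import numpy as np

def linear_regression_model(xk, Y, fvals):
    """Fit m(s) = c + g @ s to data (y_t, f(y_t)) in the least-squares sense."""
    xk = np.asarray(xk, float)
    Y = np.asarray(Y, float)                       # shape (p, n): rows are the sample points y_t
    M = np.hstack([np.ones((len(Y), 1)), Y - xk])  # rows [1, (y_t - xk)^T] as in (3.3)
    sol, *_ = np.linalg.lstsq(M, np.asarray(fvals, float), rcond=None)
    return sol[0], sol[1:]                         # c, g
```

For $p = n+1$ affinely independent points this reduces to linear interpolation, the case analyzed in [22].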
Later we will require the following standard properties of $M^{\dagger}$ (see, e.g. [18, Section 5.5.4]).
Lemma 3.1.
For $M \in \mathbb{R}^{p \times (n+1)}$ with full column rank, the Moore–Penrose pseudoinverse $M^{\dagger}$ of $M$ from (3.3) satisfies $M^{\dagger} M = I_{n+1}$ and $(M^T)^{\dagger} = (M^{\dagger})^T$. The minimal-norm solution to the underdetermined system $M^T z = b$ is $z = (M^{\dagger})^T b$.
The quality of the choice of interpolation points will be assessed by considering the associated set of Lagrange polynomials. In this case, the Lagrange polynomials associated with our interpolation set are the linear functions
(3.5)  $\ell_t(y) = c_t + g_t^T (y - x_k), \qquad t = 1, \dots, p,$
each defined by the least-squares regression problem
(3.6)  $\min_{c_t \in \mathbb{R},\, g_t \in \mathbb{R}^n} \ \sum_{s=1}^{p} \left( \ell_t(y_s) - \delta_{s,t} \right)^2,$
or equivalently
(3.7)  $\begin{bmatrix} c_t \\ g_t \end{bmatrix} = M^{\dagger} e_t.$
Our notion of the sampled points being a ‘good’ choice is given by the Lagrange polynomials having small magnitude in the region of interest. This is formalized in the following notion, which generalizes [14, Definition 2.7] to the convex feasible region $C$.
Definition 3.2.
Given $\Lambda > 0$, the set $\{y_1, \dots, y_p\} \subset C$ is $\Lambda$-poised for linear regression in $B(x_k, \Delta_k) \cap C$ if the vectors $(1, (y_t - x_k)^T)^T$ for $t = 1, \dots, p$ span $\mathbb{R}^{n+1}$ and
(3.8) |
We note that if $\{y_1, \dots, y_p\}$ is $\Lambda$-poised for linear regression, then $M$ has full column rank (since its rows are assumed to span $\mathbb{R}^{n+1}$). This requirement also necessitates that $p \geq n+1$.
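From (3.7), the Lagrange polynomial coefficients are the columns of $M^{\dagger}$, so their values at any $y$ are cheap to evaluate. The sketch below estimates the largest absolute Lagrange polynomial value over $B(x_k,\Delta_k) \cap C$ by feasible sampling; this is only a lower bound on the poisedness constant (and it uses the maximum absolute value as the poisedness measure, which should be checked against the precise form of (3.8)). The helper `project` is an assumed projection onto $C$.

```python
import numpy as np

def regression_lagrange_values(xk, Y, y):
    """Values ell_t(y), t = 1..p, of the linear regression Lagrange polynomials;
    from (3.7) these are the entries of pinv(M).T @ [1, (y - xk)^T]^T."""
    xk = np.asarray(xk, float)
    D = np.asarray(Y, float) - xk
    M = np.hstack([np.ones((len(D), 1)), D])
    return np.linalg.pinv(M).T @ np.concatenate(([1.0], np.asarray(y, float) - xk))

def sampled_poisedness(xk, Y, delta, project, n_samples=5000, seed=0):
    """Monte-Carlo lower bound on max_t max_{y in B(xk,delta) ∩ C} |ell_t(y)|.
    Samples the ball and projects onto C; the projection is nonexpansive and xk is
    feasible, so the projected points remain in B(xk, delta) ∩ C."""
    rng = np.random.default_rng(seed)
    xk = np.asarray(xk, float)
    n = len(xk)
    best = np.max(np.abs(regression_lagrange_values(xk, Y, xk)))
    for _ in range(n_samples):
        d = rng.normal(size=n)
        d *= delta * rng.random() ** (1.0 / n) / np.linalg.norm(d)   # uniform in the ball
        y = project(xk + d)
        best = max(best, np.max(np.abs(regression_lagrange_values(xk, Y, y))))
    return best
```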
Assuming the $\Lambda$-poisedness of $\{y_1, \dots, y_p\}$ will be sufficient for the associated linear regression model to be $C$-fully linear. The proof of this follows a similar structure to [22, Lemma 4.3 & Theorem 4.4], but with additional complexity in the proofs coming from using regression instead of interpolation.
Lemma 3.3.
Proof.
We begin by defining the residual of the least-squares problem (3.3) as
(3.12) |
Then for all $t = 1, \dots, p$ we have
(3.13) | ||||
(3.14) |
Now, fix $y \in B(x_k, \Delta_k) \cap C$. Since $M$ has full column rank, its rows span $\mathbb{R}^{n+1}$ and so there exist constants $\alpha_1, \dots, \alpha_p$ such that
(3.15) |
Of these, we take the $\alpha = (\alpha_1, \dots, \alpha_p)^T$ with minimal norm (cf. Lemma 3.1),
(3.16) |
There are two key properties of $\alpha$: firstly,
(3.17) |
where the last equality follows from Lemma 3.1. Hence the $\Lambda$-poisedness condition gives a bound on $\alpha$. Secondly, we have
(3.18) |
from Lemma 3.1. Altogether, we have
(3.19) | ||||
(3.20) | ||||
(3.21) | ||||
(3.22) | ||||
(3.23) |
and we recover (3.9), where we used (3.18) and $\|y - x_k\| \leq \Delta_k$ to get the last inequality.
Theorem 3.4.
Proof.
Fix . We first derive by considering the cases and separately.
If , then . Since is convex and we also have . Hence (3.9) gives
(3.27) |
and so
(3.28) |
This gives us
(3.29) | ||||
(3.30) | ||||
(3.31) | ||||
(3.32) | ||||
(3.33) | ||||
(3.34) |
where we use (3.10) to get the fourth inequality, and the last line follows from . Instead if , then already, and so (3.9) immediately gives
(3.35) | ||||
(3.36) | ||||
(3.37) |
Either way, we get the desired value of .
To get we now fix an arbitrary and again consider the cases and separately. First, if , then . From (3.11) we get
(3.38) |
where the second inequality follows from . Alternatively, if then the convexity of implies that . Again from (3.11) we get
(3.39) | ||||
(3.40) | ||||
(3.41) |
where the last line follows from . Again, either way we get the desired value of . ∎
4 Underdetermined Quadratic Interpolation Models
We now consider the case of forming $C$-fully linear quadratic interpolation models. Our approach follows that of [26] for the unconstrained case, although the fully linear error bounds for this approach were shown later in [16]. We note that this approach is different from the underdetermined quadratic interpolation used in [14].
Here, we aim to construct a quadratic model
(4.1)  $m(s) = c + g^T s + \frac{1}{2} s^T H s,$
where $c \in \mathbb{R}$, $g \in \mathbb{R}^n$ and $H \in \mathbb{R}^{n \times n}$ with $H = H^T$. We assume that our interpolation set is $\{y_1, \dots, y_p\} \subset C$ with $n + 2 \leq p \leq (n+1)(n+2)/2$. The case $p = (n+1)(n+2)/2$ corresponds to full quadratic interpolation [13, Section 4.2]. We exclude the case $p = n + 1$ which, from (4.2) below, corresponds to linear interpolation and was analyzed (in the convex-constrained case) in [22, Section 4].
If $p < (n+1)(n+2)/2$ then there are infinitely many models (4.1) satisfying the interpolation conditions $m(y_t - x_k) = f(y_t)$ for all $t = 1, \dots, p$. So, following [26] we choose the model with minimum Frobenius norm Hessian by solving
(4.2a)  $\min_{c \in \mathbb{R},\, g \in \mathbb{R}^n,\, H = H^T} \ \frac{1}{4} \|H\|_F^2$
(4.2b)  s.t. $\ m(y_t - x_k) = f(y_t), \qquad t = 1, \dots, p.$
This is a convex quadratic program and, as shown in [26], reduces to solving the linear system
(4.20)  $F \begin{bmatrix} \lambda \\ c \\ g \end{bmatrix} := \begin{bmatrix} Q & M \\ M^T & 0 \end{bmatrix} \begin{bmatrix} \lambda \\ c \\ g \end{bmatrix} = \begin{bmatrix} f(y_1) \\ \vdots \\ f(y_p) \\ 0 \end{bmatrix},$
where $M$ is from the linear regression problem (3.3) and $Q \in \mathbb{R}^{p \times p}$ has entries $Q_{t,s} = \frac{1}{2}\left( (y_t - x_k)^T (y_s - x_k) \right)^2$ for $t, s = 1, \dots, p$. The solution to (4.20) immediately gives us $c$ and $g$; the (symmetric) model Hessian is given by $H = \sum_{t=1}^{p} \lambda_t (y_t - x_k)(y_t - x_k)^T$. We note that $Q$ is symmetric positive semidefinite [26, eq. 2.10], and $\lambda_1, \dots, \lambda_p$ are the Lagrange multipliers associated with constraints (4.2b).
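The following numpy sketch assembles and solves this system and recovers the model Hessian; the scaling of $Q$ and the formula for $H$ follow Powell's formulation [26] as reconstructed above, so the exact constants should be checked against the precise statement of (4.20). Variable names are illustrative.

```python
import numpy as np

def min_frobenius_model(xk, Y, fvals):
    """Minimum Frobenius norm quadratic interpolation model (in the sense of [26])."""
    xk = np.asarray(xk, float)
    D = np.asarray(Y, float) - xk                 # shape (p, n): shifted interpolation points
    p, n = D.shape
    M = np.hstack([np.ones((p, 1)), D])           # regression matrix from (3.3)
    Q = 0.5 * (D @ D.T) ** 2                      # Q_{ts} = 0.5 * ((y_t-xk)^T (y_s-xk))^2
    F = np.block([[Q, M], [M.T, np.zeros((n + 1, n + 1))]])
    rhs = np.concatenate([np.asarray(fvals, float), np.zeros(n + 1)])
    sol = np.linalg.solve(F, rhs)                 # requires F invertible (cf. Definition 4.1)
    lam, c, g = sol[:p], sol[p], sol[p + 1:]
    H = (D.T * lam) @ D                           # H = sum_t lam_t (y_t-xk)(y_t-xk)^T
    return c, g, H
```

When $p = (n+1)(n+2)/2$ the interpolation conditions already determine the model uniquely, and this construction coincides with full quadratic interpolation.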
The Lagrange polynomials associated with our interpolation set are
(4.21)  $\ell_t(y) = c_t + g_t^T (y - x_k) + \frac{1}{2} (y - x_k)^T H_t (y - x_k), \qquad t = 1, \dots, p,$
where $c_t$, $g_t$ and $H_t$ come from solving (4.2) with (4.2b) replaced by $\ell_t(y_s) = \delta_{s,t}$ for $s = 1, \dots, p$. Equivalently, we have
(4.22)  $F \begin{bmatrix} \lambda^{(t)} \\ c_t \\ g_t \end{bmatrix} = \begin{bmatrix} e_t \\ 0 \end{bmatrix},$
and $H_t = \sum_{s=1}^{p} \lambda^{(t)}_s (y_s - x_k)(y_s - x_k)^T$. This gives
(4.23) | ||||
(4.24) | ||||
(4.25) |
Given (4.20) and (4.22), we conclude that
(4.26) |
which gives
(4.27)  $m(y - x_k) = \sum_{t=1}^{p} f(y_t)\, \ell_t(y).$
We now have our new definition of $\Lambda$-poisedness:
Definition 4.1.
Given $\Lambda > 0$, the interpolation set $\{y_1, \dots, y_p\} \subset C$ is $\Lambda$-poised for minimum Frobenius norm interpolation in $B(x_k, \Delta_k) \cap C$ if the matrix $F$ in (4.20) is invertible, and
(4.28) |
Lemma 4.2.
If the set $\{y_1, \dots, y_p\}$ is $\Lambda$-poised for minimum Frobenius norm interpolation in $B(x_k, \Delta_k) \cap C$, then it is $\Lambda$-poised for linear regression in $B(x_k, \Delta_k) \cap C$.
Proof.
Since $F$ is invertible, the sub-matrix $M$ has full column rank by [4, Theorem 3.3], and so its rows span $\mathbb{R}^{n+1}$. The remainder of this proof is based on the argument in [16, p. 83]. We note that, since the minimum Frobenius norm model is a global minimizer of the objective function of (4.2), if $f$ is linear then we have exact interpolation, $m = f$. Applying this to the constant function and the coordinate functions $y \mapsto (y - x_k)_i$ for $i = 1, \dots, n$, we get from (4.27) that
(4.29) |
Denoting as the vector of all , these are equivalent to
(4.30) |
Hence is another solution to the (underdetermined) system (3.15). Since the minimal norm solution to (3.15) was , we must have . However from (3.17) we know , where are the Lagrange polynomials associated with linear regression for . Thus . Now, fixing any we get
(4.31) |
and we are done. ∎
We are now in a position to construct our fully linear error bounds. We will begin by proving a bound on the size of the model Hessian, which requires the following technical result.
Lemma 4.3.
Fix and . If satisfy and then and .
Proof.
We first prove the bound on . To find a contradiction, first suppose that and so . Since we have , which means , contradicting . Instead, suppose that and so . Then means and so , again contradicting .
The bound on follows from similar reasoning. First suppose that and so . Since we have and so , contradicting . Instead suppose that and so . Then since we have and so , again contradicting . ∎
We can now give our bound on the model Hessian. In the existing (unconstrained) theory, this is presented as a bound on $\|H\|$ [16, Theorem 5.7], but here we only need to consider specific Rayleigh-type quotients.
Lemma 4.4.
Proof.
First, fix $t \in \{1, \dots, p\}$ and consider the associated Lagrange polynomial
(4.33) |
Additionally, fix , and we will first provide a bound on
(4.34) |
To do this, we consider the value of at five different points: , , , plus
(4.35) |
where . Since we have , and by convexity, since . Similarly, we also have . For these points, from the definition of -poisedness we know and by definition of Lagrange polynomials we have and so .
From we have , and so implies
(4.36) | ||||
(4.37) | ||||
(4.38) |
where the last line follows from the reverse triangle inequality. Together with , we get
(4.39) | ||||
(4.40) |
Since by definition, applying Lemma 4.3 we conclude
(4.41) | ||||
(4.42) |
Since was arbitrary, the same inequalities hold with replaced by .
Now, consider the point
(4.43) |
Since , we have , and also
(4.44) |
and so . Written in full, this is
(4.45) |
That is,
(4.46) |
Applying , (4.41) and (4.42), we conclude
(4.47) |
or
(4.48) |
for all , where
(4.49) |
More simply, we have
(4.50) | ||||
(4.51) | ||||
(4.52) |
using to get the inequality. Now let us turn our attention to the model (4.2). Note that we may add/subtract a linear function to without changing (since this just changes and in (4.2)). So, we consider the model generated by interpolation to . From (4.27) we may write
(4.53) |
where are the Hessians of the Lagrange polynomials. From Assumption 2.1, we also have . Then for any we conclude
(4.54) | ||||
(4.55) | ||||
(4.56) |
and we are done after applying (4.52). ∎
Remark 4.5.
Next, we can prove an analogous result to Lemma 3.3.
Lemma 4.6.
Proof.
Fix $y \in B(x_k, \Delta_k) \cap C$. From Lemma 4.2, our interpolation set is $\Lambda$-poised for linear regression. Therefore there exist constants $\alpha_1, \dots, \alpha_p$ such that
(4.61) |
where (provided the minimal norm solution is taken as in (3.15)). Thus we have
(4.62) | |||
(4.63) | |||
(4.64) | |||
(4.65) | |||
(4.66) |
using Assumption 2.1 and Lemma 4.4 in the last line, and we have (4.58). To get (4.59), we just take in (4.58). Finally, to get (4.60), we use
(4.67) |
Finally, we can give our fully linear error bounds for underdetermined quadratic interpolation models.
Theorem 4.7.
Proof.
First fix and we will derive by considering the cases and separately. If then is in and since is convex. We then apply Lemma 4.6 to get
(4.69) | ||||
(4.70) | ||||
(4.71) |
where the last inequality follows from . Also, from (4.61) we have with , and so from Lemma 4.4,
(4.72) | ||||
(4.73) | ||||
(4.74) | ||||
(4.75) |
again using in the last line.
Instead, if then and we apply Lemma 4.6 directly to get
(4.76) | ||||
(4.77) |
Also, using Lemma 4.4 we have
(4.78) | ||||
(4.79) | ||||
(4.80) |
In either case, we use (4.71) and (4.75), or (4.77) and (4.80), with Assumption 2.1 and Lemma 4.6 to get
(4.81) | ||||
(4.82) | ||||
(4.83) |
and we get the value of after .
Remark 4.8.
The fully linear error bounds for the unconstrained case [14, 16] give , so Theorem 4.7 matches the bound for up to a factor of (but with a value of which is smaller than the unconstrained case; see Remark 4.5). Our value of has an extra term of size and so may be larger depending on the relative sizes of and .
5 Constructing $\Lambda$-Poised Quadratic Models
To meet all the requirements of Algorithm 1, we require procedures which can verify whether or not a given model is $C$-fully linear, and if not, modify it to be $C$-fully linear. From Theorem 4.7 it is clear that it suffices to verify/ensure that $\{y_1, \dots, y_p\}$ is $\Lambda$-poised for underdetermined quadratic interpolation in $B(x_k, \Delta_k) \cap C$ and that $y_t \in B(x_k, \Delta_k)$ for all $t$.
In the first case—verifying whether or not the interpolation set is $\Lambda$-poised—we may follow the approach in [22, Section 4.3.1] and simply maximize/minimize each $\ell_t$ in $B(x_k, \Delta_k) \cap C$. Checking $y_t \in B(x_k, \Delta_k)$ is straightforward. Although maximizing/minimizing $\ell_t$, a possibly nonconvex quadratic function, in $B(x_k, \Delta_k) \cap C$ may not be straightforward depending on the structure of $C$, we only need to know whether the solution lies above/below $\pm\Lambda$ and not the exact solution. Existing solvers, including potentially global optimization solvers, are sufficient for this.
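For example, when $C$ is a box one can estimate $\max_{y \in B(x_k,\Delta_k) \cap C} |\ell_t(y)|$ with an off-the-shelf local solver from several starting points. The sketch below uses scipy's SLSQP; being a local method it only yields a lower bound on the maximum (enough to detect a violation $|\ell_t(y)| > \Lambda$, but a global solver would be needed for certification). All names and defaults are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def max_abs_lagrange(c, g, H, xk, delta, lower, upper, n_starts=10, seed=0):
    """Multi-start estimate of max |ell(y)| over B(xk, delta) ∩ [lower, upper] for the
    quadratic ell(y) = c + g @ (y - xk) + 0.5 (y - xk) @ H @ (y - xk)."""
    rng = np.random.default_rng(seed)
    xk = np.asarray(xk, float)
    n = len(xk)

    def ell(y):
        s = y - xk
        return c + g @ s + 0.5 * s @ (H @ s)

    ball = {"type": "ineq", "fun": lambda y: delta**2 - np.sum((y - xk) ** 2)}
    bounds = list(zip(lower, upper))
    best = abs(ell(xk))
    for sign in (+1.0, -1.0):                                   # maximize +ell and -ell
        for _ in range(n_starts):
            y0 = np.clip(xk + rng.uniform(-delta, delta, n), lower, upper)
            res = minimize(lambda y: -sign * ell(y), y0, method="SLSQP",
                           bounds=bounds, constraints=[ball])
            if res.success:
                best = max(best, abs(ell(res.x)))
    return best
```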
We now consider the situation where $\{y_1, \dots, y_p\}$ is not $\Lambda$-poised, and we wish to construct a new interpolation set which is $\Lambda$-poised. There are two possible scenarios:
• The associated matrix $F$ in (4.20) is not invertible; or
• The matrix $F$ is invertible but $|\ell_t(y)| > \Lambda$ for some $t$ and $y \in B(x_k, \Delta_k) \cap C$.
We will describe how to handle both of these situations in Section 5.2, but we will first need a technical result about how the matrix $F$ in (4.20) changes as interpolation points are changed.
5.1 Updating the matrix $F$
In this section we analyze changes in the determinant of the matrix $F$ as interpolation points are updated. These results are based on ideas from [26, Section 4], which considers changes in $F^{-1}$ from updating interpolation points, rather than $\det F$.
Lemma 5.1.
If $W \in \mathbb{R}^{m \times m}$ is a symmetric, invertible matrix and we form $W'$ by changing the $t$-th row and column of $W$ to $w' \in \mathbb{R}^m$, then
(5.1) |
Proof.
Denote the $t$-th row and column of $W$ by $w$. We first note that $W'$ can be written as a symmetric rank-3 update of $W$:
(5.2)  $W' = W + e_t (w' - w)^T + (w' - w) e_t^T - (w'_t - w_t)\, e_t e_t^T.$
The first two update terms add $w' - w$ to the $t$-th column and row respectively, and the last term corrects for the double update of the $(t,t)$ entry from the first two updates. We can then apply the matrix determinant lemma (that is, $\det(A + UV^T) = \det(A) \det(I + V^T A^{-1} U)$ whenever $A$ is invertible, e.g. [21, Corollary 18.1.4]) to get
(5.3) | ||||
(5.4) |
where , , and since is symmetric. Directly computing, we get
(5.5) | ||||
(5.6) | ||||
(5.7) |
Finally, since , we get
(5.8) |
and
(5.9) | ||||
(5.10) | ||||
(5.11) | ||||
(5.12) |
Our result then follows from (5.7) combined with the definition of , (5.8) and (5.12). ∎
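The rank-3 representation (5.2) is easy to verify numerically; the short check below compares the update against directly overwriting the $t$-th row and column of a random symmetric matrix (sizes and seed are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)
m, t = 6, 2                                     # matrix size and index of the replaced row/column
W = rng.normal(size=(m, m)); W = W + W.T        # symmetric test matrix
w_new = rng.normal(size=m)                      # new t-th row/column (w_new[t] is the new diagonal entry)

e_t = np.zeros(m); e_t[t] = 1.0
d = w_new - W[:, t]

# Rank-3 symmetric update as in (5.2)
W_update = W + np.outer(e_t, d) + np.outer(d, e_t) - d[t] * np.outer(e_t, e_t)

# Direct replacement of the t-th row and column
W_direct = W.copy()
W_direct[t, :] = w_new
W_direct[:, t] = w_new

assert np.allclose(W_update, W_direct)
```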
Theorem 5.2.
Proof.
If , then the result holds trivially, so we assume without loss of generality that .
Changing the interpolation point $y_t$ to $y$ requires replacing the $t$-th row and column of $F$ by $w'$, where
(5.14a) | ||||
(5.14b) | ||||
(5.14c) | ||||
(5.14d) |
That is, is the same as from (4.25) except for the -th entry. Specifically, we have
(5.15) |
where
(5.16) |
Given this, Lemma 5.1 gives us (denoting for notational convenience)
(5.17) | ||||
(5.18) | ||||
(5.19) |
So from the definition of , we get
(5.20) |
where . Finally, [26, Lemma 2] gives us provided , which gives us the result. (This condition is not explicitly assumed in the statement of [26, Lemma 2], but it is required as the proof relies on [26, Theorem, p. 196].) ∎
Remark 5.3.
Remark 5.4.
5.2 Constructing $\Lambda$-Poised Sets
We begin by describing how to construct an initial interpolation set, contained in $B(x_k, \Delta_k) \cap C$, for which $F$ is invertible. The full process for this is given in Algorithm 2. We first generate an initial set in $B(x_k, \Delta_k)$, but not necessarily in $C$, for which $F$ is invertible. We can do this using the approach outlined in [29, Section 2], which chooses points of the form $x_k \pm \Delta_k e_i$ or $x_k \pm \Delta_k e_i \pm \Delta_k e_j$ for coordinate vectors $e_i, e_j$ in a specific way. We then replace any of these points which are not in $C$ by new points in $B(x_k, \Delta_k) \cap C$ while maintaining an invertible $F$.
Theorem 5.5.
Proof.
Firstly, Algorithm 2 terminates after at most $p$ iterations, since an index $t$ can only ever be selected once (after which the new $y_t \in C$).
Next, all points in the output set were either originally generated in line 2, in which case they must have been in both $B(x_k, \Delta_k)$ (from the construction in line 2) and $C$ (in order to never be replaced in the main loop), or were chosen in line 4. Hence the output set is contained in $B(x_k, \Delta_k) \cap C$.
Finally, the initial collection from [29, Section 2] generates an invertible $F$, with an explicit formula for $F^{-1}$ given in [29, eqns. 2.11–2.15]. Hence the initial $F$ satisfies $\det F \neq 0$. Then at each iteration we replace $y_t$ with a point $y$ for which $\ell_t(y) \neq 0$, and so the new $F$ also satisfies $\det F \neq 0$ by Theorem 5.2. Thus the output interpolation set has an invertible $F$. ∎
We can now present our algorithm to generate a $\Lambda$-poised set for minimum Frobenius norm interpolation, given in Algorithm 3. This algorithm is very similar to existing methods for unconstrained linear/quadratic poisedness (e.g. [16, Algorithm 6.3]) or constrained linear poisedness ([22, Algorithm 4.2]). Essentially we first ensure we have an interpolation set with invertible $F$ (so we can construct the associated Lagrange polynomials), and we then iteratively find indices $t$ and points $y \in B(x_k, \Delta_k) \cap C$ with $|\ell_t(y)| > \Lambda$, replacing $y_t$ with $y$.
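A rough sketch of this improvement loop, in the spirit of Algorithm 3 (not a reproduction of its exact steps): compute each minimum Frobenius norm Lagrange polynomial from the system (4.20) with right-hand side $e_t$, search for a feasible point where some $|\ell_t|$ exceeds $\Lambda$ (here by feasible sampling rather than exact maximization), and move that interpolation point there. It assumes $F$ is invertible throughout (otherwise Algorithm 2 would first be applied); `project` and all defaults are assumptions.

```python
import numpy as np

def mfn_lagrange(xk, Y, t):
    """Coefficients (c_t, g_t, H_t) of the t-th minimum Frobenius norm Lagrange polynomial,
    obtained by solving (4.20) with right-hand side e_t."""
    xk = np.asarray(xk, float)
    D = np.asarray(Y, float) - xk
    p, n = D.shape
    M = np.hstack([np.ones((p, 1)), D])
    F = np.block([[0.5 * (D @ D.T) ** 2, M], [M.T, np.zeros((n + 1, n + 1))]])
    rhs = np.zeros(p + n + 1); rhs[t] = 1.0
    sol = np.linalg.solve(F, rhs)
    lam, c, g = sol[:p], sol[p], sol[p + 1:]
    return c, g, (D.T * lam) @ D

def improve_poisedness(xk, Y, delta, Lambda, project, n_samples=2000, max_moves=50, seed=0):
    """Greedily move interpolation points until no sampled feasible point violates
    |ell_t(y)| <= Lambda (a sampling-based surrogate for the exact maximization)."""
    rng = np.random.default_rng(seed)
    xk, Y = np.asarray(xk, float), np.array(Y, dtype=float)
    n = len(xk)
    for _ in range(max_moves):
        U = rng.normal(size=(n_samples, n))
        U /= np.linalg.norm(U, axis=1, keepdims=True)
        R = delta * rng.random((n_samples, 1)) ** (1.0 / n)
        S = np.array([project(z) for z in xk + R * U])       # samples of B(xk, delta) ∩ C
        worst_val, worst = Lambda, None
        for t in range(len(Y)):
            c, g, H = mfn_lagrange(xk, Y, t)
            D = S - xk
            vals = np.abs(c + D @ g + 0.5 * np.einsum("ij,jk,ik->i", D, H, D))
            i = int(np.argmax(vals))
            if vals[i] > worst_val:
                worst_val, worst = vals[i], (t, S[i])
        if worst is None:
            return Y                 # Lambda-poised on the sampled points
        t, y = worst
        Y[t] = y                     # replace y_t by the point where |ell_t| was largest
    return Y
```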
Theorem 5.6.
For any $\Lambda > 1$, $x_k \in C$ and $\Delta_k > 0$, Algorithm 3 produces an interpolation set which is $\Lambda$-poised for minimum Frobenius norm interpolation in $B(x_k, \Delta_k) \cap C$ and for which $y_t \in B(x_k, \Delta_k) \cap C$ for all $t$.
Proof.
If Algorithm 3 terminates, then by the definition of the loop check it must satisfy the requirements of the theorem, so it suffices to show that the ‘while’ loop terminates.
At the start of the ‘while’ loop in line 5, we have that $\{y_1, \dots, y_p\} \subset B(x_k, \Delta_k) \cap C$ and $F$ is invertible (either because the ‘if’ statement in line 2 evaluates to false or by Theorem 5.5). Hence at the start of the ‘while’ loop we have $|\det F| > 0$. Let us call this value $d_0$.
In each iteration of the ‘while’ loop, we increase $|\det F|$ by a factor of at least $\Lambda^2 > 1$ by Theorem 5.2. Hence after $j$ iterations of the loop, we have $|\det F| \geq \Lambda^{2j} d_0$. However, since by construction our interpolation set is contained in the compact set $B(x_k, \Delta_k) \cap C$, there is a global upper bound $d_{\max}$ on $|\det F|$ (which is a continuous function of the interpolation points). Hence the loop must terminate after at most $\log(d_{\max}/d_0)/(2 \log \Lambda)$ iterations. ∎
Remark 5.7.
6 Conclusions and Future Work
In [22], a weaker notion of fully linear models was used to construct convergent algorithms for (1.1) (namely Algorithm 1), and it was shown that linear interpolation models constructed only with feasible points can yield such models. Here, we show that Algorithm 1 can also be implemented using underdetermined quadratic interpolation models in the minimum Frobenius norm sense of [26], which is a much more useful construction in practice. As an extra benefit, this analysis also establishes the same results for linear regression models, a strict generalization of the analysis in [22] that is independently useful in the case where only noisy objective evaluations are available.
The natural extension of this work is to the construction and analysis of second-order accurate models (“fully quadratic” in the language of [16]) built only using feasible points. A concrete implementation of Algorithm 1 with quadratic interpolation models would also be a useful way to extend the implementation from [22] (which only considers nonlinear least-squares objectives, for which linear interpolation models can be practically useful).
References
- [1] C. Audet and W. Hare, Derivative-Free and Blackbox Optimization, Springer Series in Operations Research and Financial Engineering, Springer, Cham, Switzerland, 2017.
- [2] F. Augustin and Y. M. Marzouk, A trust-region method for derivative-free nonlinear constrained stochastic optimization, arXiv preprint arXiv:1703.04156, (2017).
- [3] A. Beck, First-Order Methods in Optimization, MOS-SIAM Series on Optimization, MOS/SIAM, Philadelphia, 2017.
- [4] M. Benzi, G. H. Golub, and J. Liesen, Numerical solution of saddle point problems, Acta Numerica, 14 (2005), pp. 1–137.
- [5] S. C. Billups, J. Larson, and P. Graf, Derivative-free optimization of expensive functions with computational error using weighted regression, SIAM Journal on Optimization, 23 (2013), pp. 27–53.
- [6] C. Cartis, J. Fiala, B. Marteau, and L. Roberts, Improving the flexibility and robustness of model-based derivative-free optimization solvers, ACM Transactions on Mathematical Software, 45 (2019), pp. 32:1–41.
- [7] C. Cartis, N. I. M. Gould, and P. L. Toint, An adaptive cubic regularization algorithm for nonconvex optimization with convex constraints and its function-evaluation complexity, IMA Journal of Numerical Analysis, 32 (2012), pp. 1662–1695.
- [8] C. Cartis and L. Roberts, A derivative-free Gauss-Newton method, Mathematical Programming Computation, 11 (2019), pp. 631–674.
- [9] P. Conejo, E. Karas, and L. Pedroso, A trust-region derivative-free algorithm for constrained optimization, Optimization Methods and Software, 30 (2015), pp. 1126–1145.
- [10] P. Conejo, E. Karas, L. Pedroso, A. Ribeiro, and M. Sachine, Global convergence of trust-region algorithms for convex constrained minimization without derivatives, Applied Mathematics and Computation, 220 (2013), pp. 324–330.
- [11] A. Conn, K. Scheinberg, and Ph. Toint, A derivative free optimization algorithm in practice, in 7th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, St. Louis, MO, USA, 1998, American Institute of Aeronautics and Astronautics.
- [12] A. R. Conn, N. I. M. Gould, and P. L. Toint, Trust-Region Methods, vol. 1 of MPS-SIAM Series on Optimization, MPS/SIAM, Philadelphia, 2000.
- [13] A. R. Conn, K. Scheinberg, and L. N. Vicente, Geometry of interpolation sets in derivative free optimization, Mathematical Programming, 111 (2007), pp. 141–172.
- [14] , Geometry of sample sets in derivative-free optimization: Polynomial regression and underdetermined interpolation, IMA Journal of Numerical Analysis, 28 (2008), pp. 721–748.
- [15] , Global convergence of general derivative-free trust-region algorithms to first- and second-order critical points, SIAM Journal on Optimization, 20 (2009), pp. 387–415.
- [16] , Introduction to Derivative-Free Optimization, vol. 8 of MPS-SIAM Series on Optimization, MPS/SIAM, Philadelphia, 2009.
- [17] R. Garmanjani, D. Júdice, and L. N. Vicente, Trust-region methods without using derivatives: Worst case complexity and the nonsmooth case, SIAM Journal on Optimization, 26 (2016), pp. 1987–2011.
- [18] G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, 3rd ed., 1996.
- [19] S. Gratton, P. L. Toint, and A. Tröltzsch, An active-set trust-region method for derivative-free nonlinear bound-constrained optimization, Optimization Methods and Software, 26 (2011), pp. 873–894.
- [20] E. A. E. Gumma, M. H. A. Hashim, and M. M. Ali, A derivative-free algorithm for linearly constrained optimization problems, Computational Optimization and Applications, 57 (2014), pp. 599–621.
- [21] D. A. Harville, Matrix Algebra From a Statistician’s Perspective, Springer, New York, 2008.
- [22] M. Hough and L. Roberts, Model-based derivative-free methods for convex-constrained optimization, SIAM Journal on Optimization, 32 (2022), pp. 2552–2579.
- [23] K. Kulshreshtha, B. Jurgelucks, F. Bause, J. Rautenberg, and C. Unverzagt, Increasing the sensitivity of electrical impedance to piezoelectric material parameters with non-uniform electrical excitation, Journal of Sensors and Sensor Systems, 4 (2015), pp. 217–227.
- [24] J. Larson and S. C. Billups, Stochastic derivative-free optimization using a trust region framework, Computational Optimization and Applications, 64 (2016), pp. 619–645.
- [25] J. Larson, M. Menickelly, and S. M. Wild, Derivative-free optimization methods, Acta Numerica, 28 (2019), pp. 287–404.
- [26] M. J. D. Powell, Least Frobenius norm updating of quadratic models that satisfy interpolation conditions, Mathematical Programming, 100 (2004), pp. 183–215.
- [27] , On the use of quadratic models in unconstrained minimization without derivatives, Optimization Methods and Software, 19 (2004), pp. 399–411.
- [28] , The NEWUOA software for unconstrained optimization without derivatives, in Large-Scale Nonlinear Optimization, P. Pardalos, G. Di Pillo, and M. Roma, eds., vol. 83, Springer US, Boston, MA, 2006, pp. 255–297.
- [29] , The BOBYQA algorithm for bound constrained optimization without derivatives, Tech. Rep. DAMTP 2009/NA06, University of Cambridge, 2009.
- [30] M. Scheuerer and T. M. Hamill, Statistical postprocessing of ensemble precipitation forecasts by fitting censored, shifted gamma distributions, Monthly Weather Review, 143 (2015), pp. 4578 – 4596.
- [31] S. F. B. Tett, J. M. Gregory, N. Freychet, C. Cartis, M. J. Mineter, and L. Roberts, Does model calibration reduce uncertainty in climate projections?, Journal of Climate, 35 (2022), pp. 2585–2602.
- [32] L. N. Trefethen, Approximation Theory and Approximation Practice, SIAM, Philadelphia, 2020.
- [33] G. Ughi, V. Abrol, and J. Tanner, An empirical study of derivative-free-optimization algorithms for targeted black-box attacks in deep neural networks, Optimization and Engineering, 23 (2022), pp. 1319–1346.
- [34] S. M. Wild, Derivative-Free Optimization Algorithms for Computationally Expensive Functions, PhD thesis, Cornell University, 2009.
- [35] S. M. Wild, R. G. Regis, and C. A. Shoemaker, ORBIT: Optimization by radial basis function interpolation in trust-regions, SIAM Journal on Scientific Computing, 30 (2008), pp. 3197–3219.
- [36] M. Q. Xuan and J. Nocedal, A feasible method for constrained derivative-free optimization, arXiv preprint arXiv:2402.11920, (2024).
- [37] Z. Zhang, Sobolev seminorm of quadratic functions with applications to derivative-free optimization, Mathematical Programming, 146 (2014), pp. 77–96.