
[1]\fnmGonglin \surYuan \equalcontThese authors contributed equally to this work.

[1]\orgdivGuangxi University, \orgnameSchool of Mathematics and Information Science, Center for Applied Mathematics of Guangxi (Guangxi University), \orgaddress\streetStreet, \cityNanning, \postcode530004, \stateGuangxi, \countryChina

[2]\orgnameThai Nguyen University of Economics and Business Administration, \orgaddress\streetStreet, \stateThai Nguyen, \countryVietnam

A Diagonal BFGS Update Algorithm with Inertia Acceleration Technology for Minimization

\fnmZhenhua \surLuo [email protected]    [email protected]    \fnmHongtruong \surPham [email protected]
Abstract

We integrate the diagonal quasi-Newton update approach with the enhanced BFGS formula proposed by Wei, Z., Yu, G., Yuan, G., Lian, Z. [1], and incorporate an extrapolation (inertial) acceleration technique. The method, designed for non-convex unconstrained problems, requires the search direction to ensure sufficient descent, and global linear convergence is established. This design has yielded exceptionally favorable numerical results.

keywords:
Non-convex Unconstrained Optimization, q-linear convergence, Quasi-Newton Methods, Diagonal Quasi-Newton Updates, Inertia Algorithms.

1 Introduction

Consider the unconstrained optimization problem of minimizing a continuously differentiable function f:\mathbb{R}^{n}\rightarrow\mathbb{R}:

\min_{x\in\mathbb{R}^{n}}f(x). (1.1)

Quasi-Newton methods are a class of iterative optimization algorithms that employ approximations of the Hessian matrix to solve problem (1.1). The underlying idea is to generate a sequence of iterates \{x_{k}\} that converges to a stationary point of f, while iteratively refining an approximate Hessian matrix B_{k} to accelerate the optimization of f at each iteration. The quasi-Newton approach originated with Davidon [2], and subsequent enhancements and applications have been explored by numerous scholars [3, 4, 5]. Starting from x_{0}\in\mathbb{R}^{n} and B_{0}\in\mathbb{R}^{n\times n}, the method generates the sequence \{x_{k}\} while iteratively refining the estimate of the Hessian matrix:

x_{k+1}=x_{k}+\alpha_{k}d_{k}, (1.2)

where d_{k}\in\mathbb{R}^{n} is the search direction and the step size \alpha_{k}>0 is determined by a line search procedure. Throughout, g_{k} denotes the gradient of f at x_{k}. The search direction is obtained by solving

B_{k}d_{k}=-g_{k}, (1.3)

Here s_{k}=x_{k+1}-x_{k} denotes the difference between consecutive iterates and y_{k}=g_{k+1}-g_{k} denotes the difference between the corresponding gradients. In algorithms that approximate the Hessian matrix, the vectors s_{k} and y_{k} are crucial to the update process. The BFGS update is the most widely used of these formulations, expressed as:

B_{k+1}=B_{k}-\frac{B_{k}s_{k}s_{k}^{T}B_{k}}{s_{k}^{T}B_{k}s_{k}}+\frac{y_{k}y_{k}^{T}}{y_{k}^{T}s_{k}}. (1.4)
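For reference, the classical update (1.4) can be sketched in a few lines of Python/NumPy. This is only an illustrative sketch under our own naming; the curvature safeguard that skips the update when y_{k}^{T}s_{k} is too small is a common implementation choice, not part of formula (1.4).

import numpy as np

def bfgs_update(B, s, y, curvature_tol=1e-12):
    # One BFGS update of the Hessian approximation B, cf. (1.4).
    Bs = B @ s
    sBs = s @ Bs
    ys = y @ s
    if ys <= curvature_tol or sBs <= curvature_tol:
        return B          # skip the update if the curvature condition fails
    return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys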

Computational evidence has consistently shown the BFGS update to be more effective than other quasi-Newton formulas. Considerable research has been devoted to the convergence properties of the technique, particularly for convex minimization, as evidenced by studies such as [6, 7, 8, 3, 4, 5]. These formulations have proved highly effective across a diverse range of practical optimization problems. Building on the insights from these studies, we refine the matrix B_{k} in (1.4), leading to enhanced convergence performance.

Drawing inspiration from Neculai Andrei's work on diagonal approximations of the Hessian of a function f [9], we leverage the efficacy of the update formula (1.4) and its variants. The approach determines only the entries on the principal diagonal of the approximating matrix, following the minimal-change secant updating theory of Dennis and Wolkowicz [10]. Specifically, we consider

B_{k}=\text{diag}(b_{k}^{1},\ldots,b_{k}^{n}), (1.5)

where b_{k}^{i}>0 for every i=1,\ldots,n, as an estimate of the Hessian of f at iteration k. Within the algorithm, the search direction is computed as d_{k}=-B_{k}^{-1}g_{k}, i.e., d_{k}^{i}=-{g_{k}^{i}}/{b_{k}^{i}} for i=1,\ldots,n. Since B_{k} is both diagonal and positive definite, each gradient component is scaled by a distinct positive factor determined by the diagonal of B_{k}. To apply the minimal-change secant update strategy, assume that the diagonal matrix B_{k} is positive definite. The updated matrix B_{k+1}, obtained from B_{k}, can then be expressed as:

B_{k+1}=B_{k}+\Delta_{k}, (1.6)

where \Delta_{k}=\text{diag}(\delta_{k}^{1},\cdots,\delta_{k}^{n}) is a correction matrix chosen so that B_{k+1} remains accurate and stable. More precisely, \Delta_{k} corrects B_{k} so that B_{k+1} is positive definite and satisfies the weak secant condition proposed by Dennis and Wolkowicz [10]:

s_{k}^{T}B_{k+1}s_{k}=s_{k}^{T}y_{k}.

Our method utilizes the update (1.11), introduced below, together with the weak Wolfe–Powell (WWP) step-size rule:

f(x_{k}+\alpha_{k}d_{k}) \leq f(x_{k})+\rho\alpha_{k}g_{k}^{T}d_{k}, (1.7)
g_{k+1}^{T}d_{k} \geq \sigma g_{k}^{T}d_{k}, (1.8)
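A simple bracketing/bisection realization of the WWP conditions (1.7)-(1.8) is sketched below in Python/NumPy. The bracketing strategy and the iteration cap are our own implementation assumptions; any line search returning a step that satisfies (1.7)-(1.8) may be used instead. The default parameters mirror the values \sigma=0.8 and \rho=0.0001 used in the experiments of Section 4.

import numpy as np

def wwp_line_search(f, grad, x, d, rho=1e-4, sigma=0.8, alpha0=1.0, max_iter=50):
    # Return a step size satisfying (1.7)-(1.8) via a standard bracketing/bisection scheme.
    lo, hi = 0.0, np.inf
    alpha = alpha0
    fx, gTd = f(x), grad(x) @ d
    for _ in range(max_iter):
        if f(x + alpha * d) > fx + rho * alpha * gTd:      # (1.7) violated: shrink
            hi = alpha
            alpha = 0.5 * (lo + hi)
        elif grad(x + alpha * d) @ d < sigma * gTd:        # (1.8) violated: enlarge
            lo = alpha
            alpha = 2.0 * alpha if hi == np.inf else 0.5 * (lo + hi)
        else:
            return alpha
    return alpha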

Meanwhile, Wei, Z., Yu, G., Yuan, G., and Lian, Z. introduced a novel quasi-Newton condition [1]:

B_{k+1}s_{k}=y_{k}^{*}, (1.9)

where y^{*}_{k}=y_{k}+Y_{k}(3)s_{k} and

Y_{k}(3)=\frac{2[f(x_{k})-f(x_{k+1})]+(g(x_{k+1})+g(x_{k}))^{T}s_{k}}{\|s_{k}\|^{2}}I. (1.10)

By using (1.9), they provided a BFGS-type update,

B_{k+1}=B_{k}-\frac{B_{k}s_{k}s_{k}^{T}B_{k}}{s_{k}^{T}B_{k}s_{k}}+\frac{y_{k}^{*}(y_{k}^{*})^{T}}{s_{k}^{T}y_{k}^{*}}. (1.11)
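The quantities in (1.9)-(1.11) are straightforward to compute. The following sketch (our own notation) returns the scalar that multiplies the identity in (1.10) together with the modified vector y_{k}^{*}; it assumes the objective f and its gradient g are available as callables.

import numpy as np

def modified_difference(f, g, x_k, x_k1):
    # Scalar of Y_k(3) in (1.10) and y_k^* = y_k + Y_k(3) s_k from (1.9).
    s = x_k1 - x_k
    y = g(x_k1) - g(x_k)
    yk3 = (2.0 * (f(x_k) - f(x_k1)) + (g(x_k1) + g(x_k)) @ s) / (s @ s)
    y_star = y + yk3 * s          # Y_k(3) = yk3 * I, hence Y_k(3) s_k = yk3 * s_k
    return yk3, y_star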

Therefore, our attention shifts to the modified BFGS method of Wei, Yu, Yuan, and Lian [1], a formulation shown to exhibit superlinear convergence. They introduced several modifications to the BFGS method based on the new quasi-Newton condition B_{k+1}s_{k}=y_{k}^{*}. Employing this property within our framework enables us to establish linear convergence.

This observation motivates us to combine these two derivations of B_{k} and obtain a new, improved update formula, which also simplifies the convergence proof. In the iterative update we assume that B_{k} is positive definite, and B_{k+1} is derived from B_{k} through the condition B_{k+1}s_{k}=y_{k}^{*}, where y_{k}^{*} is a formula approximating \nabla^{2}f(x)s_{k}. In practice, \alpha_{k} in (1.2) is determined by the WWP rule [11, 12] with positive constants \sigma and \rho satisfying 1>\sigma>\rho>0.

Section 1 of this paper introduces the research problem and recent advancements in related studies. Section 2 explains the motivation behind this research and proposes the improved diagonal quasi-Newton algorithm, DMBFGS3, as well as the WDMBFGS3 algorithm, which incorporates extrapolation techniques. Section 3 proves the global linear convergence of both algorithms. Section 4 validates the rapid convergence and stability of the WDMBFGS3 algorithm through numerical experiments. Section 5 wraps up the findings discussed in this paper and proposes avenues for further investigation.

2 Motivation and algorithm: Optimizing Hessian with Diagonal Minimization

As previously mentioned, the matrix \Delta_{k} in (1.6) is determined as the solution of the following problem:

\min \frac{1}{2}\|\Delta_{k}\|_{F}^{2}+\mathrm{tr}(B_{k}+\Delta_{k}), (2.1)
s.t. \quad s_{k}^{T}B_{k+1}s_{k}=s_{k}^{T}y_{k}.

Replacing y_{k} by the modified vector y_{k}^{*} in the constraint, we obtain the following problem:

\min \frac{1}{2}\|\Delta_{k}\|_{F}^{2}+\mathrm{tr}(B_{k}+\Delta_{k}) (2.2)
s.t. \quad s_{k}^{T}B_{k+1}s_{k}=s_{k}^{T}y_{k}^{*}.

The objective of problem (2.2) combines two goals: minimizing the change from B_{k} to B_{k+1}, in accordance with the variational principle, and minimizing the sum of the diagonal elements of B_{k+1}. The term \text{tr}(B_{k}+\Delta_{k}) in the objective of (2.1) is included with the aim of deriving a correction matrix \Delta_{k} with small diagonal entries. Componentwise, problem (2.2) reads:

\min_{\Delta_{k}} \frac{1}{2}\sum_{i=1}^{n}(\delta_{k}^{i})^{2}+\sum_{i=1}^{n}(b_{k}^{i}+\delta_{k}^{i}) (2.3)
s.t. \quad \sum_{i=1}^{n}(s_{k}^{i})^{2}\delta_{k}^{i}=s_{k}^{T}y_{k}+s_{k}^{T}Y_{k}(3)s_{k}-\sum_{i=1}^{n}(s_{k}^{i})^{2}b_{k}^{i},

where \Delta_{k}=\operatorname{diag}(\delta_{k}^{1},\dots,\delta_{k}^{n}), s_{k}=(s_{k}^{1},\dots,s_{k}^{n})^{T}, and y_{k}=\nabla^{2}f(x_{k})s_{k}. The Lagrangian of problem (2.3) is:

L=\frac{1}{2}\sum_{i=1}^{n}(\delta_{k}^{i})^{2}+\sum_{i=1}^{n}(b_{k}^{i}+\delta_{k}^{i})+\lambda\left(\sum_{i=1}^{n}(s_{k}^{i})^{2}\delta_{k}^{i}-s_{k}^{T}y_{k}-s_{k}^{T}Y_{k}(3)s_{k}+\sum_{i=1}^{n}(s_{k}^{i})^{2}b_{k}^{i}\right), (2.4)

where \lambda is the Lagrange multiplier. Solutions of problem (2.3) correspond to stationary points of the Lagrangian. Setting the partial derivative of (2.4) with respect to \delta_{k}^{i} to zero gives:

\delta_{k}^{i}+1+\lambda(s_{k}^{i})^{2}=0,\quad i=1,\dots,n, (2.5)

Rearranging terms yields:

\delta_{k}^{i}=-1-\lambda(s_{k}^{i})^{2},\quad i=1,\dots,n. (2.6)

To determine \lambda, we use the constraint in (2.3) and substitute (2.6) into it, which gives:

\lambda=\frac{\sum_{i=1}^{n}(s_{k}^{i})^{2}b_{k}^{i}-s_{k}^{T}y_{k}-s_{k}^{T}Y_{k}(3)s_{k}-\sum_{i=1}^{n}(s_{k}^{i})^{2}}{\operatorname{tr}(A_{k}^{2})}, (2.7)

where A_{k}=\operatorname{diag}((s_{k}^{1})^{2},\dots,(s_{k}^{n})^{2}). Substituting (2.7) into (2.6), the diagonal elements of \Delta_{k} can be expressed as follows:

\delta_{k}^{i}=\frac{s_{k}^{T}y_{k}+s_{k}^{T}Y_{k}(3)s_{k}+s_{k}^{T}s_{k}-s_{k}^{T}B_{k}s_{k}}{\operatorname{tr}(A_{k}^{2})}(s_{k}^{i})^{2}-1,\quad i=1,\dots,n, (2.8)

where B_{k} serves as the diagonal approximation of the Hessian matrix. The diagonal correction matrix \Delta_{k} can therefore be written as

\Delta_{k}=\left(\frac{s_{k}^{T}y_{k}+s_{k}^{T}Y_{k}(3)s_{k}+s_{k}^{T}s_{k}-s_{k}^{T}B_{k}s_{k}}{\mathrm{tr}(A_{k}^{2})}\right)A_{k}-I. (2.9)

Then we have the following equation:

B_{k+1}=B_{k}+\left(\frac{s_{k}^{T}y_{k}+s_{k}^{T}Y_{k}(3)s_{k}+s_{k}^{T}s_{k}-s_{k}^{T}B_{k}s_{k}}{\mathrm{tr}(A_{k}^{2})}\right)A_{k}-I, (2.10)

or, componentwise,

b_{k+1}^{i}=b_{k}^{i}+\frac{s_{k}^{T}y_{k}+s_{k}^{T}Y_{k}(3)s_{k}+s_{k}^{T}s_{k}-s_{k}^{T}B_{k}s_{k}}{\operatorname{tr}(A_{k}^{2})}(s_{k}^{i})^{2}-1,\quad i=1,\dots,n. (2.11)

The search direction is subsequently computed as:

d_{k+1}=-B_{k+1}^{-1}g_{k+1}, (2.12)

where g_{k+1} is the gradient at the new point and d_{k+1} is the search direction. It is essential to note that, to prevent the updated matrix B_{k+1} from becoming singular or indefinite, it is necessary that b_{k+1}^{i}=b_{k}^{i}+\delta_{k}^{i}>0 holds for all i=1,\ldots,n. In practice, if b_{k+1}^{i}\geq\epsilon_{b}, then d_{k+1}^{i}=-\frac{g_{k+1}^{i}}{b_{k+1}^{i}}; otherwise, d_{k+1}^{i}=-g_{k+1}^{i}, where \epsilon_{b} denotes a small positive constant.
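The complete diagonal update (2.10)-(2.11) and the safeguarded direction can be collected as follows. This is a sketch under our own naming, with an illustrative default for \epsilon_{b}; the small assertion at the end merely illustrates numerically that the update reproduces the weak secant condition s_{k}^{T}B_{k+1}s_{k}=s_{k}^{T}y_{k}^{*} used in (2.2).

import numpy as np

def diagonal_update(b, s, y_star):
    # Update the diagonal b of B_k according to (2.10)-(2.11).
    trA2 = np.sum(s**4)                        # tr(A_k^2)
    coeff = (s @ y_star + s @ s - np.sum(b * s**2)) / trA2
    return b + coeff * s**2 - 1.0              # componentwise form (2.11)

def safeguarded_direction(b_new, g_new, eps_b=1e-8):
    # Safeguarded direction described above: -g/b componentwise where b is safely positive.
    return np.where(b_new >= eps_b, -g_new / b_new, -g_new)

# Illustration: the updated diagonal satisfies the weak secant condition.
rng = np.random.default_rng(0)
b, s, y_star = rng.random(5) + 1.0, rng.random(5) + 0.1, rng.random(5)
b_new = diagonal_update(b, s, y_star)
assert np.isclose(s @ (b_new * s), s @ y_star)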

In practice, the pursuit of iterative algorithms with rapid convergence rates remains a consistent focus of research [13, 14, 15, 16, 17, 18]. Among acceleration approaches, the inertial extrapolation method has received significant attention [19, 20]. These methods rely on iterative schemes in which each new iterate is built from the two preceding ones. The concept originates from Polyak's seminal work [21] and is rooted in the implicit discretization of a second-order-in-time dissipative dynamical system, described by the following differential equation:

x′′(k)+ηx(k)+f(x(k))=0,x^{\prime\prime}(k)+\eta x^{\prime}(k)+\nabla f(x(k))=0, (2.13)

Let \eta>0 and let f:\mathbb{R}^{n}\rightarrow\mathbb{R} be a differentiable function. Discretizing system (2.13) allows the next iterate x_{k+1} to be determined from the two previous iterates x_{k-1} and x_{k}. This discretization is central to inertial extrapolation methods, which aim to improve the convergence rate of iterative algorithms by exploiting information from the two preceding iterates. The specific form of the discretization and the resulting iterative scheme depend on the particular inertial method, but the underlying principle is the same: harnessing the momentum generated by previous iterations to expedite convergence.
System (2.13) can be discretized so that, given the terms x_{k-1} and x_{k}, the subsequent term x_{k+1} is obtained from

\frac{x_{k+1}-2x_{k}+x_{k-1}}{q^{2}}+\eta\frac{x_{k}-x_{k-1}}{q}+\nabla f(x_{k})=0,\quad k\geq 1, (2.14)

where q is the step size. Equation (2.14) yields the following iterative scheme:

x_{k+1}=x_{k}+\rho(x_{k}-x_{k-1})-\alpha\nabla f(x_{k}),\quad k\geq 1, (2.15)

where \rho=1-\eta q and \alpha=q^{2}; the term \rho(x_{k}-x_{k-1}) is called the inertial extrapolation term and is intended to accelerate the convergence of the sequence generated by (2.15).
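For intuition, the heavy-ball iteration (2.15) can be written in a few lines. The quadratic test function and the parameter values below are our own toy choices for illustration only.

import numpy as np

def heavy_ball(grad, x0, rho=0.9, alpha=0.01, iters=500):
    # Inertial iteration (2.15): x_{k+1} = x_k + rho (x_k - x_{k-1}) - alpha grad(x_k).
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        x_next = x + rho * (x - x_prev) - alpha * grad(x)
        x_prev, x = x, x_next
    return x

# Toy usage on f(x) = 0.5 ||x||^2, whose gradient is x; the iterates approach 0.
x_min = heavy_ball(lambda x: x, np.ones(3))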

Recently, many algorithms based on the inertial acceleration technique, initially proposed by Alvarez [22], have emerged to tackle a diverse array of optimization and engineering problems. Notably, the combination of inertial acceleration with carefully designed search-direction strategies, as advocated by Sun and Liu [23], has attracted significant attention. Building on this groundwork, [24] made significant contributions to solving nonlinear monotone equations, where convergence requires boundedness and the following condition:

\sum_{k=1}^{\infty}\alpha_{k}\|x_{k}-x_{k-1}\|<\infty, (2.16)

The inertial extrapolation step magnitude is selected as:

\alpha_{k}:=\min\left\{\alpha,\frac{1}{k^{2}\|x_{k}-x_{k-1}\|}\right\}, (2.17)

In this case, with \alpha\in[0,1), the condition (2.16) is satisfied automatically. As is well known, this condition is stronger than the one typically used:

\sum_{k=1}^{\infty}\alpha_{k}\|x_{k}-x_{k-1}\|^{2}<\infty, (2.18)

For several algorithms obtained through inertial extrapolation techniques [25, 20, 26, 19], condition (2.18) is instead guaranteed by setting

\alpha_{k}:=\min\left\{\alpha,\frac{1}{(k\|x_{k}-x_{k-1}\|)^{2}}\right\}, (2.19)

Comparing these two choices, we observe that as the iterations proceed, the discrepancy between x_{k-1} and x_{k} becomes smaller and smaller. Therefore, the step size (2.17) associated with condition (2.16) is expected to achieve better convergence than the step size (2.19) associated with (2.18).

Based on the preceding derivation, we propose the following algorithm:

Algorithm 1 The DMBFGS3 Algorithm
1:Select x_{1}\in\mathbb{R}^{n}, a sufficiently small \epsilon_{g}>0, \epsilon_{b}>0, and B_{1}\in\mathbb{R}^{n\times n}. Set k=1.
2:If \|g_{k}\|\leq\epsilon_{g}, stop.
3:Solve B_{k}d_{k}+g_{k}=0 to find d_{k}.
4:Apply the WWP rule (1.7)-(1.8) to determine \alpha_{k}.
5:Update x_{k+1}=x_{k}+\alpha_{k}d_{k}. Calculate Y_{k}(3) from (1.10), and obtain y^{*}_{k}=y_{k}+Y_{k}(3)s_{k}.
6:Compute the diagonal elements b_{k+1}^{i}, i=1,\ldots,n, of the Hessian approximation B_{k+1} as detailed in (2.11).
7:Calculate the search direction d_{k+1}: if b_{k+1}^{i}\geq\epsilon_{b}, then d_{k+1}^{i}=-\frac{g_{k+1}^{i}}{b_{k+1}^{i}}; otherwise, set d_{k+1}^{i}=-g_{k+1}^{i}.
8:Increment k by 1 and proceed with Step 2.
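A compact driver for Algorithm 1 can be assembled from the pieces sketched earlier. The helpers wwp_line_search, modified_difference, diagonal_update and safeguarded_direction are the hypothetical functions from the sketches above, so the block below only illustrates the flow of Steps 1-8 and is not the reference implementation used in the experiments.

import numpy as np

def dmbfgs3(f, grad, x1, eps_g=1e-6, eps_b=1e-8, max_iter=5000):
    # Sketch of Algorithm 1 (DMBFGS3) with B_k stored as its diagonal.
    x, b = x1.copy(), np.ones(x1.size)              # B_1 = I
    d = -grad(x)                                    # solves B_1 d_1 = -g_1
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps_g:              # Step 2: stopping test
            break
        alpha = wwp_line_search(f, grad, x, d)      # Step 4: WWP step size
        x_new = x + alpha * d                       # Step 5
        s = x_new - x
        _, y_star = modified_difference(f, grad, x, x_new)
        b = diagonal_update(b, s, y_star)           # Step 6: update (2.11)
        d = safeguarded_direction(b, grad(x_new), eps_b)  # Step 7
        x = x_new
    return x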

We further accelerate the convergence of the DMBFGS3 algorithm by means of the extrapolation technique described above. The resulting algorithm is as follows:

Algorithm 2 The WDMBFGS3 Algorithm
1:Select x_{1}\in\mathbb{R}^{n}, a sufficiently small \epsilon_{g}>0, \epsilon_{b}>0, and B_{1}\in\mathbb{R}^{n\times n}. Set k=1.
2:If \|g_{k}\|\leq\epsilon_{g}, stop. Compute
x_{k+1}=x_{k}+\alpha_{k}d_{k},
and choose 0\leq\tau_{k}\leq\tilde{\tau_{k}} with
\tilde{\tau_{k}}:=\begin{cases}\min\{\frac{1}{k^{2}\|x_{k}-x_{k-1}\|^{2}},\tau\},&\text{if }x_{k}\neq x_{k-1},\\ \tau,&\text{otherwise.}\end{cases}
3:Compute \nabla f(p_{k}). If \|\nabla f(p_{k})\|\leq\text{Tol}, stop. Otherwise, solve B_{k}d_{k}+g_{k}=0 to find d_{k}.
4:Apply the WWP rule (1.7)-(1.8) to determine \alpha_{k}.
5:Compute p_{k}=x_{k}+\tau_{k}(x_{k}-x_{k-1}) and s_{k}=p_{k}-p_{k-1}. Calculate Y_{k}(3) from (1.10), and obtain y^{*}_{k}=y_{k}+Y_{k}(3)s_{k}.
6:Compute the diagonal elements b_{k+1}^{i}, i=1,\ldots,n, of the Hessian approximation B_{k+1} as detailed in (2.11).
7:Calculate the search direction d_{k+1}: if b_{k+1}^{i}\geq\epsilon_{b}, then d_{k+1}^{i}=-\frac{g_{k+1}^{i}}{b_{k+1}^{i}}; otherwise, set d_{k+1}^{i}=-g_{k+1}^{i}.
8:Increment k by 1 and proceed with Step 2.
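The only new ingredients of Algorithm 2 are the inertial point p_{k} and the bound \tilde{\tau}_{k} of Step 2. A minimal sketch of that step is given below; the value \tau=0.9 is an illustrative assumption, and any 0\leq\tau_{k}\leq\tilde{\tau}_{k} is admissible.

import numpy as np

def inertial_point(x, x_prev, k, tau=0.9):
    # Steps 2 and 5 of Algorithm 2: compute tilde(tau_k), pick tau_k, and form p_k.
    diff = np.linalg.norm(x - x_prev)
    tau_tilde = min(1.0 / (k**2 * diff**2), tau) if diff > 0 else tau
    tau_k = tau_tilde                              # simplest admissible choice
    p = x + tau_k * (x - x_prev)
    return p, tau_k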

Remark

  • We introduce a novel approach based on the theory of minimal alterations in secant updates proposed by Dennis and Wolkowicz [10]. In our method, we enforce the condition that the updating matrices are diagonal and serve as approximations to the Hessian. We have improved the choice of y_{k} and achieved linear convergence for the entire algorithm.

  • Our experimental results demonstrate that the algorithm proposed by combining these two methods is more efficient and robust compared to other benchmark algorithms.

3 Convergence of the Algorithms

The following assumptions are needed throughout this article:

  • The function f is continuously differentiable on the set \Omega, with \|g(m)-g(n)\|\leq L\|m-n\| for all m,n\in\Omega, where L is a non-negative constant.

  • The level set X=\{t\mid f(t)\leq f(t_{0})\} is contained in the set \Omega.

  • There exist constants v_{1},v_{2}>0 such that the function f is uniformly convex, i.e., v_{1}\|t\|^{2}\leq t^{T}G(x)t\leq v_{2}\|t\|^{2} for all t\in\mathbb{R}^{n}, where G(x) denotes the Hessian of f.

  • There exists x^{*}\in\Omega that solves the minimization problem (1.1).

The following Lemma 1 and Lemma 2, presented in [1, 9], are reproduced here for completeness.

Lemma 1.

Assume that \|s_{k}\|\neq 0 for each k and that the function f satisfies the required assumptions, and let the sequences of diagonal matrices \{\Delta_{k}\} and \{B_{k}\} be generated by (2.9) and (2.10).

Assuming the existence of constants \beta_{0} and \gamma_{0} satisfying \beta_{0}\leq b_{0}^{i}\leq\gamma_{0} for every i=1,\ldots,n, where b_{0}^{i} is the i-th diagonal element of the matrix B_{0}, it follows that the sequences \{\Delta_{k}\} and \{B_{k}\} remain finite for each k.

Proof.

By Equations (2.9) and (2.10), since B_{k} is a diagonal matrix, we can derive the iterative formula (2.11) for its diagonal elements, with y_{0}=\nabla^{2}f(\xi_{0})s_{0} and s_{k}=x_{k+1}-x_{k}. Because \left(s_{0}^{\max}\right)^{4}/tr\left(A_{0}^{2}\right)\leq 1, using Equation (2.11) we have:

\|\delta_{0}^{i}\| =\left\|\frac{s_{0}^{T}y_{0}+s_{0}^{T}s_{0}-s_{0}^{T}B_{0}s_{0}}{\operatorname{tr}(A_{0}^{2})}(s_{0}^{i})^{2}-1\right\| (3.1)
\leq\frac{\|s_{0}^{T}y_{0}+s_{0}^{T}s_{0}-s_{0}^{T}B_{0}s_{0}\|}{\operatorname{tr}(A_{0}^{2})}(s_{0}^{i})^{2}+1
\leq\frac{\|s_{0}^{T}\nabla^{2}f(\xi_{0})s_{0}+s_{0}^{T}s_{0}-s_{0}^{T}B_{0}s_{0}\|}{\operatorname{tr}(A_{0}^{2})}(s_{0}^{\max})^{2}+1
=\frac{\|s_{0}^{T}\nabla^{2}f(\xi_{0})s_{0}+s_{0}^{T}s_{0}-s_{0}^{T}B_{0}s_{0}\|}{(s_{0}^{\max})^{2}\operatorname{tr}(A_{0}^{2})}(s_{0}^{\max})^{4}+1
\leq\frac{\|s_{0}^{T}\nabla^{2}f(\xi_{0})s_{0}+s_{0}^{T}s_{0}-s_{0}^{T}B_{0}s_{0}\|}{(s_{0}^{\max})^{2}}+1.

Since \|s_{k}\|\neq 0, there exist \varsigma>0 and \varepsilon>0 such that \varsigma\geq s_{0}^{\mathrm{max}} and \|s_{k}\|\geq\varepsilon. Therefore,

\|s_{0}\|^{2}=s_{0}^{T}s_{0}=\sum_{i=1}^{n}\left(s_{0}^{i}\right)^{2}\leq\sum_{i=1}^{n}\left(s_{0}^{\max}\right)^{2}=n\left(s_{0}^{\max}\right)^{2}.

Since \xi_{0}\in\Omega, using the assumptions we can deduce that

\|s_{0}^{T}\nabla^{2}f(\xi_{0})s_{0}\|\leq nK(s_{0}^{\mathrm{max}})^{2}\leq nK\varsigma^{2},\qquad\beta_{0}\|s_{0}\|^{2}\leq s_{0}^{T}B_{0}s_{0}\leq\gamma_{0}\|s_{0}\|^{2}.

Set \Gamma=\max\{\|\beta_{0}\|,\|\gamma_{0}\|\}; then

\|\delta_{0}^{i}\| \leq\frac{\|s_{0}^{T}\nabla^{2}f(\xi_{0})s_{0}\|+\|s_{0}^{T}s_{0}\|+\|s_{0}^{T}B_{0}s_{0}\|}{(s_{0}^{\max})^{2}}+1
\leq\frac{nK\varsigma^{2}+n\varsigma^{2}+n\Gamma\varsigma^{2}}{(s_{0}^{\max})^{2}}+1
\leq\frac{(K+1+\Gamma)n^{2}\varsigma^{2}}{\varepsilon^{2}}+1\equiv c,

for each i=1,\ldots,n. Using b_{k+1}^{i}=b_{k}^{i}+\delta_{k}^{i}, we can deduce that

\beta_{k}\leq b_{k}^{i}\leq\gamma_{k},\quad i=1,\ldots,n,

where \beta_{k}=\beta_{0}-(k-1)c and \gamma_{k}=\gamma_{0}+(k-1)c. The diagonal matrix sequence \{\Delta_{k}\} is therefore bounded and, because of Equation (1.6), the sequence \{B_{k}\} is bounded as well. ∎

Lemma 2.

Suppose that the iterates are produced by the algorithm DMBFGS3. Then, for any k, the function value f(x_{k}) can be represented as:

f(x_{k})=f(x_{k+1})+g(x_{k+1})^{T}\Delta x_{k}+\frac{1}{2}\Delta x_{k}^{T}B_{k+1}\Delta x_{k}, (3.2)

where \Delta x_{k}=x_{k}-x_{k+1}.

Lemma 3.

Suppose that the sequence is generated by the algorithm DMBFGS3 and that G is continuous at x^{*}. Then

\lim_{k\to\infty}\|Y_{k}(3)\|=0. (3.3)
Proof.

We derive the following equations through Taylor expansions:

y_{k}^{T}s_{k}=\left(g_{k+1}-g_{k}\right)^{T}s_{k}=s_{k}^{T}G(\xi_{1k})s_{k},

and

f(x_{k})-f(x_{k+1})=-g_{k+1}^{T}s_{k}+\frac{1}{2}s_{k}^{T}G(\xi_{2k})s_{k},

where \xi_{1k}=x_{k}+\theta_{1k}(x_{k+1}-x_{k}), \xi_{2k}=x_{k}+\theta_{2k}(x_{k+1}-x_{k}), and \theta_{1k},\theta_{2k}\in(0,1). From the definition of Y_{k}(3) and Lemma 2, we get

Y_{k}(3)=\frac{s_{k}^{T}B_{k+1}s_{k}-s_{k}^{T}G(\xi_{1k})s_{k}}{\|s_{k}\|^{2}}I,

and

s_{k}^{T}B_{k+1}s_{k}=s_{k}^{T}G(\xi_{2k})s_{k}.

Hence,

\|Y_{k}(3)\|\leq\|G(\xi_{2k})-G(\xi_{1k})\|.

Therefore, Lemma 3 holds. In particular, Y_{k}(3) is bounded, and we denote a bound of this sequence by \|Y_{k}(3)\|\leq N_{0}. ∎

Lemma 4.

Let \{x_{k}\} be produced by the DMBFGS3 algorithm. Then there exist a sequence of positive numbers \{b_{i}\}_{i=1}^{n} and a sufficiently large positive number W such that, for all iterations k>W,

\|B_{k+1}-G(x^{*})\| \leq\|B_{k}-G(x^{*})\|+\frac{2n}{\varepsilon_{1}}(c\sqrt{n}+1)\sigma_{k}+\sqrt{n}+\frac{n^{2}N_{0}}{\varepsilon_{1}^{3}} (3.4)
=\|B_{k}-G(x^{*})\|+\omega_{1}\sigma_{k}+\sqrt{n}+\frac{n^{2}N_{0}}{\varepsilon_{1}^{3}}.
Proof.

From Equation (1.6), for a given k, we have B_{k+1}=B_{k}+\Delta_{k}. Hence,

\left\|B_{k+1}-\nabla^{2}f(x^{*})\right\| =\left\|B_{k}+\Delta_{k}-\nabla^{2}f(x^{*})\right\| (3.5)
\leq\left\|B_{k}-\nabla^{2}f(x^{*})\right\|+\|\Delta_{k}\|.

Since y_{k}^{*}=B_{k+1}s_{k}+Y_{k}(3)s_{k}, we have

\left\|\Delta_{k}\right\| =\left\|\frac{s_{k}^{T}y_{k}^{*}-s_{k}^{T}B_{k}s_{k}+s_{k}^{T}s_{k}}{\mathrm{tr}\left(A_{k}^{2}\right)}A_{k}-I\right\| (3.6)
\leq\left\|\frac{s_{k}^{T}B_{k+1}s_{k}-s_{k}^{T}B_{k}s_{k}}{\mathrm{tr}\left(A_{k}^{2}\right)}A_{k}\right\|+\left\|\frac{s_{k}^{T}s_{k}}{\mathrm{tr}\left(A_{k}^{2}\right)}A_{k}-I\right\|+\left\|\frac{s_{k}^{T}Y_{k}\left(3\right)}{\mathrm{tr}\left(A_{k}^{2}\right)}\right\|.

Since \|A_{k}\|=\sqrt{tr\left(A_{k}^{T}A_{k}\right)}=\sqrt{tr\left(A_{k}^{2}\right)}, we have:

\left\|\frac{s_{k}^{T}B_{k+1}s_{k}-s_{k}^{T}B_{k}s_{k}}{\mathrm{tr}\left(A_{k}^{2}\right)}A_{k}\right\|=\frac{\left\|s_{k}^{T}(B_{k+1}-B_{k})s_{k}\right\|\left\|A_{k}\right\|}{\mathrm{tr}\left(A_{k}^{2}\right)} (3.7)
=\frac{\left\|s_{k}^{T}(B_{k+1}-B_{k})s_{k}\right\|}{\sqrt{\mathrm{tr}\left(A_{k}^{2}\right)}}\leq\frac{\left\|B_{k+1}-B_{k}\right\|\left\|s_{k}\right\|^{2}}{\sqrt{\mathrm{tr}\left(A_{k}^{2}\right)}}.

Since \|s_{k}\|\neq 0, there exist \alpha>0 and \varepsilon_{1}>0 such that \alpha\geq s_{k}^{\mathrm{max}} and \|s_{k}\|\geq\varepsilon_{1}. Therefore, n\alpha^{2}\geq n(s_{k}^{\mathrm{max}})^{2}\geq\|s_{k}\|^{2}\geq\varepsilon_{1}^{2}. Hence,

\alpha^{2}\geq\left(s_{k}^{\max}\right)^{2}\geq\frac{\varepsilon_{1}^{2}}{n}. (3.8)

Moreover,

tr\left(A_{k}^{2}\right)=\sum_{i=1}^{n}\left(s_{k}^{i}\right)^{4}\geq\left(s_{k}^{\max}\right)^{4}=\left(s_{k}^{\max}\right)^{2}\left(s_{k}^{\max}\right)^{2}n\frac{1}{n}\geq\frac{\varepsilon_{1}^{2}}{n}\left\|s_{k}\right\|^{2}\frac{1}{n}.

Hence

tr\left(A_{k}^{2}\right)\geq\frac{\varepsilon_{1}^{2}}{n^{2}}\left\|s_{k}\right\|^{2},

and therefore

\frac{\|s_{k}\|}{\sqrt{tr\left(A_{k}^{2}\right)}}\leq\frac{n}{\varepsilon_{1}}. (3.9)

But, from Lemma 1,

\|s_{k}\|=\|x_{k+1}-x_{k}\|=\|x_{k+1}-x^{*}+x^{*}-x_{k}\|\leq\|x_{k+1}-x^{*}\|+\|x^{*}-x_{k}\|\leq 2\sigma_{k}.

Using (3.7) and Lemma 1, we deduce that:

\frac{\left\|B_{k+1}-B_{k}\right\|\left\|s_{k}\right\|^{2}}{\sqrt{\mathrm{tr}\left(A_{k}^{2}\right)}}=\frac{\left\|\Delta_{k}\right\|\left\|s_{k}\right\|\left\|s_{k}\right\|}{\sqrt{\mathrm{tr}\left(A_{k}^{2}\right)}}\leq c\sqrt{n}\frac{n}{\varepsilon_{1}}2\sigma_{k}. (3.10)

Examining Equation 3.6, we find:

\left\|\frac{s_{k}^{T}s_{k}}{\operatorname{tr}\left(A_{k}^{2}\right)}A_{k}-I\right\| =\left\|\frac{\left(s_{k}^{T}s_{k}\right)A_{k}-\operatorname{tr}\left(A_{k}^{2}\right)I}{\operatorname{tr}\left(A_{k}^{2}\right)}\right\| (3.11)
=\frac{\left\|\left(s_{k}^{T}s_{k}\right)A_{k}-\operatorname{tr}\left(A_{k}^{2}\right)I\right\|}{\operatorname{tr}\left(A_{k}^{2}\right)}\leq\frac{\left\|\left(s_{k}^{T}s_{k}\right)A_{k}+\operatorname{tr}\left(A_{k}^{2}\right)I\right\|}{\operatorname{tr}\left(A_{k}^{2}\right)}
\leq\frac{\|s_{k}\|^{2}\|A_{k}\|+\operatorname{tr}\left(A_{k}^{2}\right)\|I\|}{\operatorname{tr}\left(A_{k}^{2}\right)}=\frac{\|s_{k}\|^{2}\sqrt{\operatorname{tr}\left(A_{k}^{2}\right)}+\operatorname{tr}\left(A_{k}^{2}\right)\sqrt{n}}{\operatorname{tr}\left(A_{k}^{2}\right)}
=\frac{\|s_{k}\|^{2}}{\sqrt{\operatorname{tr}\left(A_{k}^{2}\right)}}+\sqrt{n}\leq\frac{\|s_{k}\|^{2}}{\frac{\varepsilon_{1}}{n}\|s_{k}\|}+\sqrt{n}=\frac{n}{\varepsilon_{1}}\|s_{k}\|+\sqrt{n}\leq\frac{n}{\varepsilon_{1}}2\sigma_{k}+\sqrt{n}.

Also, considering the terms in (3.8) and (3.9), we get:

\left\|\frac{s_{k}^{T}Y_{k}\left(3\right)}{\mathrm{tr}\left(A_{k}^{2}\right)}\right\|\leqslant\frac{\left\|s_{k}\right\|\left\|Y_{k}\left(3\right)\right\|}{\frac{\varepsilon_{1}^{2}}{n^{2}}\left\|s_{k}\right\|^{2}}=\frac{\left\|Y_{k}\left(3\right)\right\|}{\frac{\varepsilon_{1}^{2}}{n^{2}}\left\|s_{k}\right\|}\leqslant\frac{n^{2}N_{0}}{\varepsilon_{1}^{3}}. (3.12)

From (3.5), using (3.6), (3.10), (3.11), and (3.12), we have:

\left\|B_{k+1}-\nabla^{2}f(x^{*})\right\| \leq\left\|B_{k}-\nabla^{2}f(x^{*})\right\|+\frac{2n}{\varepsilon_{1}}(c\sqrt{n}+1)\sigma_{k}+\sqrt{n}+\frac{n^{2}N_{0}}{\varepsilon_{1}^{3}}
=\left\|B_{k}-\nabla^{2}f(x^{*})\right\|+\omega_{1}\sigma_{k}+\sqrt{n}+\frac{n^{2}N_{0}}{\varepsilon_{1}^{3}},

where \omega_{1}=2n(c\sqrt{n}+1)/\varepsilon_{1}>0, i.e., the update (2.10) fulfills the bounded deterioration property. ∎

In particular, the sequence \{\|B_{k}\|\} is bounded. The following theorem establishes the q-linear convergence of the DMBFGS3 technique. The proof relies on the bounded deterioration property established in Lemma 4 (see [27] for a comprehensive treatment).

Theorem 1.

Assume that the update formula (2.10) satisfies the bounded deterioration property, and that there exist positive constants \varepsilon and \delta such that

\|x_{0}-x^{*}\|<\varepsilon,\qquad\|B_{0}-G(x^{*})\|<\delta.

Then the sequence \{x_{k}\} produced by the DMBFGS3 method is well defined, and \{x_{k}\} converges q-linearly to x^{*}.

Theorem 2.

Suppose \{x_{k}\} and \{p_{k}\} are produced by Algorithm 2. If x^{*}\in\Omega, then under the stated assumptions,

\lim_{k\to\infty}\|p_{k}-x^{*}\|=0.
Proof.
\left\|p_{k}-x^{*}\right\|^{2} =\left\|x_{k}-x^{*}+\tau_{k}(x_{k}-x_{k-1})\right\|^{2} (3.13)
\leq\left\|x_{k}-x^{*}\right\|^{2}+2\tau_{k}(x_{k}-x_{k-1})^{T}\left(x_{k}+\tau_{k}(x_{k}-x_{k-1})-x^{*}\right)
\leq\left\|x_{k}-x^{*}\right\|^{2}+2\tau_{k}\|x_{k}-x_{k-1}\|\left(\left\|x_{k}-x^{*}\right\|+\tau_{k}\|x_{k}-x_{k-1}\|\right)
\leq\left\|x_{k}-x^{*}\right\|^{2}+2M_{0}\tau_{k}\|x_{k}-x_{k-1}\|+4M_{0}\tau_{k}\|x_{k}-x_{k-1}\|
=\left\|x_{k}-x^{*}\right\|^{2}+6M_{0}\tau_{k}\|x_{k}-x_{k-1}\|.

Since the sequence \{x_{k}\} is convergent, there exists M_{0} such that \|x_{k}-x^{*}\|\leq M_{0}; consequently \|x_{k}-x_{k-1}\|\leq 2M_{0}. By Theorem 1 we obtain \sum_{k=1}^{\infty}\tau_{k}\|x_{k}-x^{*}\|<\infty and \sum_{k=1}^{\infty}\tau_{k}\|x_{k}-x_{k-1}\|<\infty; since the convergence is q-linear, \sum_{k=1}^{\infty}\|x_{k}-x^{*}\|^{2}<\infty as well. Then from (3.13) we obtain:

\sum_{k=1}^{\infty}\left\|p_{k}-x^{*}\right\|^{2}<\infty, (3.14)

So that

\lim_{k\to\infty}\|p_{k}-x^{*}\|=0. (3.15)

The convergence of the algorithm DMBFGS3 is thus ensured, the sequence \{x_{k}\} converges q-linearly to x^{*}, and the sequence \{p_{k}\} generated by WDMBFGS3 converges as well. ∎

4 Data Analysis

All code was written and executed on a personal computer equipped with a 13th Gen Intel Core(TM) i5-13490F processor (2.50 GHz) and 32.0 GB of RAM (31.8 GB usable). MATLAB R2021b was used for coding and execution.

To support the theoretical results, this paper compares two classic quasi-Newton optimization algorithms: DNRTR [9] and MBFGS3 [1]. MBFGS3 is an enhanced quasi-Newton algorithm with a modified term added to y_{k}, while DNRTR is a diagonal quasi-Newton update method for unconstrained optimization.

4.1 CS problems

Candes et al. [28] and Donoho [29] have highlighted significant advancements in the field of Compressive Sensing (CS) in recent years. CS is a theoretical framework for efficiently acquiring and reconstructing sparse or approximately sparse signals. In this theory, sparsity means that a signal can be described using a relatively small number of non-zero components, such as the coefficient vector in an appropriate basis or transform. In essence, compressive sensing represents a large, sparse signal x with a limited number of linear measurements using a matrix A, storing y=Ax. The primary difficulty lies in reconstructing the original signal x from the observation vector y.

We applied the proposed method to the CS problem, a swiftly expanding discipline that has attracted significant interest across multiple scientific areas. We assess the quality of the reconstructed signal x^{*} by calculating its relative error with respect to the original signal x_{s} as follows:

\text{RelErr}:=100\frac{\|x^{*}-x_{s}\|}{\|x_{s}\|}.

Following the method described in [30], we created random test problems by setting the signal dimension to n=2^{12} and using the Hadamard matrix for the sampling matrix \psi. We performed a comparison among WDMBFGS3, MBFGS3, DMBFGS3, and DNRTR using a Bernoulli matrix A, with entries randomly taking values of +1 or -1 with equal likelihood. This setup allows us to evaluate the performance of these algorithms under the specified conditions and matrix structure. The initial point was set to 0\in\mathbb{R}^{n}. To perform a meaningful comparison of these approaches, the resulting data were analyzed in terms of the relative error (RelErr).
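The test setup described above can be reproduced along the following lines. This is a sketch only: the number of measurements m and the sparsity level k_sparse are illustrative values of our own and are not the settings used for the reported experiments.

import numpy as np

def cs_test_problem(n=2**12, m=2**10, k_sparse=2**5, seed=0):
    # Sparse signal, +/-1 Bernoulli sensing matrix A, and measurements y = A x.
    rng = np.random.default_rng(seed)
    x_s = np.zeros(n)
    support = rng.choice(n, size=k_sparse, replace=False)
    x_s[support] = rng.standard_normal(k_sparse)
    A = rng.choice([-1.0, 1.0], size=(m, n))       # entries +1 or -1 with equal probability
    y = A @ x_s
    return A, y, x_s

def rel_err(x_rec, x_s):
    # Relative error (in percent) between recovered and original signals, cf. RelErr above.
    return 100.0 * np.linalg.norm(x_rec - x_s) / np.linalg.norm(x_s)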

Figures 3 to 6 compare DNRTR, MBFGS3, DMBFGS3, and WDMBFGS3 in terms of reconstruction ability; each figure shows the recovered signal (red circles) against the original signal (blue peaks). These figures confirm the competitiveness of WDMBFGS3 relative to MBFGS3, DMBFGS3, and DNRTR. Furthermore, examining the plots and the actual reconstruction errors makes the effectiveness and robustness of the WDMBFGS3 method in recovering large sparse signals evident compared with the alternative solvers. The problem is smooth in this formulation and can be handled efficiently by the proposed algorithm.

Figure 1: Diagram of the original signal.
Figure 2: Diagram of the observation (noisy measurement).
Figure 3: Diagram of the recovered signal by DNRTR.
Figure 4: Diagram of the recovered signal by MBFGS3.
Figure 5: Diagram of the recovered signal by DMBFGS3.
Figure 6: Diagram of the recovered signal by WDMBFGS3.

4.2 74 Unconstrained Optimization Test Problems

To compare the numerical performance of the WDMBFGS3 and DMBFGS3 methods, we evaluated them with several measures: the CPU time (CPU), the total number of function and gradient evaluations (NFG), and the iteration count (NI). Additionally, the numerical experiments included two control groups: DNRTR and MBFGS3. The experimental settings are as follows:
Problem dimensions: 900, 1500, and 2100.
Parameters used: all algorithm parameters are specified in brackets [\sigma=0.8, \rho=0.0001].
Stop criterion: for all algorithms, if the condition \|g_{k}\|\leq\epsilon=10^{-6} is met, or if the number of iterations exceeds 5000, the algorithm halts.
If this condition cannot be met, the algorithm is also terminated once \|g(x)\|<e_{1} is satisfied, where e_{1}=10^{-5}.
Tested problems: this study assessed 74 unconstrained optimization problems, an extensive enumeration of which is provided in the work of Andrei [27].

The figures illustrate the comparison in two respects: the left side of each plot (\tau=1) displays the percentage of problems on which an algorithm is the most efficient, while the right side (\tau=\infty) shows the percentage of problems that the algorithm can solve. For a given algorithm, the \tau=1 end thus indicates its efficiency, while the \tau=\infty end represents its robustness.

To evaluate and compare the efficiency of these algorithms, we follow the performance-profile methodology proposed by Dolan and Moré [17]. The horizontal axis shows the performance ratio \tau of a method (in NI, NFG, or CPU time) relative to the best performance among all methods, and the vertical axis shows the proportion of problems for which the method's ratio is at most \tau. From Figures 7, 8, and 9 it can be seen that the performance of the DMBFGS3 algorithm is, in a sense, the best, because 59% of the problems are solved by this algorithm with the fewest iterations (NI), outperforming MBFGS3 and DNRTR, which solve at most 22% of the test problems with the fewest iterations. At the same time, Figure 7 shows that WDMBFGS3 solves 55% of the problems with the fewest iterations, which is not the best result; however, when the horizontal axis is expanded to \tau=3, WDMBFGS3 solves nearly 100% of the problems, indicating stronger robustness. It can also be seen clearly that WDMBFGS3, which uses the extrapolation technique, consistently maintains the highest number of wins (i.e., the highest probability of being the superior solver). Figure 8 shows that WDMBFGS3 solves 68% of the test problems with the fewest total function and gradient evaluations (NFG). Notably, when the horizontal axis is expanded to \tau=6, DMBFGS3 and WDMBFGS3 both perform excellently, solving over 90% of the problems. Although the performance of DMBFGS3 is close to that of WDMBFGS3 for \tau\geq 6, WDMBFGS3 solves these problems in fewer iterations (NI), so the algorithm with the extrapolation technique, WDMBFGS3, retains a stronger advantage and competitiveness. Figure 9 reports the CPU time used by the algorithms; clearly, the WDMBFGS3 and DMBFGS3 algorithms significantly outperform the other two.
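The performance profiles in Figures 7-9 can be computed from the raw results as sketched below, where results is an (n_problems, n_solvers) array of NI, NFG, or CPU values and np.inf marks a failure; the function itself is our own helper, not part of the compared solvers.

import numpy as np

def performance_profile(results, taus):
    # Dolan-More profile: fraction of problems each solver handles within a factor tau of the best.
    best = results.min(axis=1, keepdims=True)      # best cost per problem
    ratios = results / best                        # performance ratios
    return np.array([[np.mean(ratios[:, s] <= t) for s in range(results.shape[1])]
                     for t in taus])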

Figure 7: The Iteration Count (NI) for Four Algorithms.
Figure 8: The Total Number of Function and Gradient Evaluations (NFG) for Four Algorithms.
Figure 9: The CPU Time (CPU) for Four Algorithms.
Table 1: WDMBFGS3 versus DMBFGS3 versus MBFGS3 versus DNRTR
          DNRTR              MBFGS3             DMBFGS3            WDMBFGS3
n         #iter   cpu        #iter   cpu        #iter   cpu        #iter   cpu
900       34      0.046875   35      0.65625    41      0.03125    33      0.0625
1500      27      0.34375    35      2.078125   35      0.046875   42      0.0625
2100      28      2.25       35      12.54688   113     1.234375   51      0.09375
Total     89      2.640625   105     15.28125   189     1.3125     126     0.21875

Table 1 presents the results for problems whose number of variables lies in the range 900 to 2100. Evidently, for large-scale unconstrained optimization problems, the WDMBFGS3 algorithm performs notably better than the other algorithms, particularly in CPU time.

4.3 Muskingum Model

The nonlinear Muskingum model, frequently applied in hydrological engineering, poses a challenging test for optimization algorithms. A notable study by Ouyang et al. [31] presents this model, which is formulated as follows:

\min f(x_{1},x_{2},x_{3})=\sum_{i=1}^{n-1}{\left(\begin{array}[]{c}\left(1-\frac{\Delta t}{6}\right)x_{1}\left(x_{2}I_{i+1}+(1-x_{2})Q_{i+1}\right)^{x_{3}}\\ -\left(1-\frac{\Delta t}{6}\right)x_{1}\left(x_{2}I_{i}+(1-x_{2})Q_{i}\right)^{x_{3}}\\ -\frac{\Delta t}{2}(I_{i}-Q_{i})+\frac{\Delta t}{2}\left(1-\frac{\Delta t}{3}\right)(I_{i+1}-Q_{i+1})\\ \end{array}\right)^{2}},

where n denotes the total number of time steps, x_{1} is the storage time constant, x_{2} is the weighting factor, x_{3} is an additional parameter, \Delta t represents the time step at time t_{i} (i=1,2,\ldots,n), and I_{i} and Q_{i} are the observed inflow and outflow discharges, respectively.
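For completeness, the Muskingum objective can be coded directly from the formula above (a sketch; I and Q are the arrays of observed inflows and outflows, and dt defaults to the 12-hour time step used below).

import numpy as np

def muskingum_objective(x, I, Q, dt=12.0):
    # Nonlinear Muskingum least-squares objective with parameters x = (x1, x2, x3).
    x1, x2, x3 = x
    r = ((1 - dt / 6) * x1 * (x2 * I[1:] + (1 - x2) * Q[1:])**x3
         - (1 - dt / 6) * x1 * (x2 * I[:-1] + (1 - x2) * Q[:-1])**x3
         - dt / 2 * (I[:-1] - Q[:-1])
         + dt / 2 * (1 - dt / 3) * (I[1:] - Q[1:]))
    return np.sum(r**2)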

The research data for this paper are taken from [32]. The time step \Delta t was set to 12 hours, and the initial parameter values were x=[0,1,1]^{T}. Comprehensive information on I_{i} and Q_{i} for 1961 is available in the work by Ouyang et al. [31]. Figure 10 displays both the recorded and computed outflows for the Muskingum model using WDMBFGS3 in 1961, while Figure 11 presents the same for DMBFGS3. It is evident that the outflow curve fitted by the WDMBFGS3 algorithm is noticeably better than that of DMBFGS3.

Figure 10: Fitting Curve of Outflow Utilizing WDMBFGS3 Algorithm in 1961.
Figure 11: Fitting Curve of Outflow Utilizing DMBFGS3 Algorithm in 1961.

5 Conclusion

This paper introduces an improved quasi-Newton algorithm, DMBFGS3, which incorporates diagonal transformations and an adjustment term on y_{k}, further enhancing the stability of the algorithm. By introducing an extrapolation technique, the iterative process of the algorithm is accelerated, resulting in the proposed WDMBFGS3 algorithm. We discuss the performance of these two proposed algorithms in three different non-convex optimization experimental settings. A series of numerical experiments demonstrate that DMBFGS3 exhibits high efficiency and robustness when applied to non-convex optimization problems. Future research will focus on the following aspects: 1. Designing adaptive step sizes to avoid manual adjustments while ensuring theoretical convergence; 2. Investigating faster iterative setting strategies to improve the efficiency of model training.

Declarations

Data availability Data available on request from the authors.

Acknowledgement

This work is supported by Guangxi Science and Technology base and Talent Project (Grant No. AD22080047) and the special foundation for Guangxi Ba Gui Scholars.

References

  • Wei et al. [2004] Wei, Z., Yu, G., Yuan, G., Lian, Z.: The superlinear convergence of a modified bfgs-type method for unconstrained optimization. Computational optimization and applications 29, 315–332 (2004)
  • Davidon [1991] Davidon, W.C.: Variable metric method for minimization. SIAM Journal on optimization 1(1), 1–17 (1991)
  • Li and Fukushima [2001] Li, D.-H., Fukushima, M.: A modified bfgs method and its global convergence in nonconvex minimization. Journal of Computational and Applied Mathematics 129(1-2), 15–35 (2001)
  • Li and Fukushima [1999] Li, D., Fukushima, M.: A globally and superlinearly convergent gauss–newton-based bfgs method for symmetric nonlinear equations. SIAM Journal on numerical Analysis 37(1), 152–172 (1999)
  • Li and Fukushima [2001] Li, D.-H., Fukushima, M.: On the global convergence of the bfgs method for nonconvex unconstrained optimization problems. SIAM Journal on Optimization 11(4), 1054–1064 (2001)
  • Dennis and Moré [1974] Dennis, J.E., Moré, J.J.: A characterization of superlinear convergence and its application to quasi-newton methods. Mathematics of computation 28(126), 549–560 (1974)
  • Griewank and Toint [1982] Griewank, A., Toint, P.L.: Local convergence analysis for partitioned quasi-newton updates. Numerische Mathematik 39(3), 429–448 (1982)
  • Pierre [1986] Pierre, D.A.: Optimization Theory with Applications. Courier Corporation, ??? (1986)
  • Andrei [2019] Andrei, N.: A diagonal quasi-newton updating method for unconstrained optimization. Numerical Algorithms 81(2), 575–590 (2019)
  • Dennis and Wolkowicz [1993] Dennis, J.E. Jr, Wolkowicz, H.: Sizing and least-change secant methods. SIAM Journal on Numerical Analysis 30(5), 1291–1314 (1993)
  • Wolfe [1969a] Wolfe, P.: Convergence conditions for ascent methods. SIAM review 11(2), 226–235 (1969)
  • Wolfe [1969b] Wolfe, P.: Convergence conditions for ascent methods. SIAM review 11(2), 226–235 (1969)
  • Chen et al. [2013] Chen, P., Huang, J., Zhang, X.: A primal–dual fixed point algorithm for convex separable minimization with applications to image restoration. Inverse Problems 29(2), 025011 (2013)
  • Iiduka [2012] Iiduka, H.: Iterative algorithm for triple-hierarchical constrained nonconvex optimization problem and its application to network bandwidth allocation. SIAM Journal on Optimization 22(3), 862–878 (2012)
  • Jolaoso et al. [2021] Jolaoso, L., Alakoya, T., Taiwo, A., Mewomo, O.: Inertial extragradient method via viscosity approximation approach for solving equilibrium problem in hilbert space. Optimization 70(2), 387–412 (2021)
  • Abubakar et al. [2020a] Abubakar, J., Kumam, P., Hassan Ibrahim, A., Padcharoen, A.: Relaxed inertial tseng’s type method for solving the inclusion problem with application to image restoration. Mathematics 8(5), 818 (2020)
  • Abubakar et al. [2020b] Abubakar, J., Kumam, P., Hassan Ibrahim, A., Padcharoen, A.: Relaxed inertial tseng’s type method for solving the inclusion problem with application to image restoration. Mathematics 8(5), 818 (2020)
  • Abubakar et al. [2019] Abubakar, J., Sombut, K., Ibrahim, A.H., et al.: An accelerated subgradient extragradient algorithm for strongly pseudomonotone variational inequality problems. Thai Journal of Mathematics 18(1), 166–187 (2019)
  • Alvarez [2004] Alvarez, F.: Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in hilbert space. SIAM Journal on Optimization 14(3), 773–782 (2004)
  • Alvarez and Attouch [2001] Alvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Analysis 9, 3–11 (2001)
  • Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. Ussr computational mathematics and mathematical physics 4(5), 1–17 (1964)
  • Alvarez [2000] Alvarez, F.: On the minimizing property of a second order dissipative system in hilbert spaces. SIAM Journal on Control and Optimization 38(4), 1102–1119 (2000)
  • Sun and Liu [2015] Sun, M., Liu, J.: A modified hestenes–stiefel projection method for constrained nonlinear equations and its linear convergence rate. Journal of Applied Mathematics and Computing 49, 145–156 (2015)
  • Ibrahim et al. [2022] Ibrahim, A.H., Kumam, P., Sun, M., Chaipunya, P., Abubakar, A.B.: Projection method with inertial step for nonlinear equations: application to signal recovery. Journal of Industrial and Management Optimization 19(1), 30–55 (2022)
  • Moudafi and Elizabeth [2003] Moudafi, A., Elizabeth, E.: Approximate inertial proximal methods using the enlargement of maximal monotone operators. International Journal of Pure and Applied Mathematics 5(3), 283–299 (2003)
  • Moudafi and Oliny [2003] Moudafi, A., Oliny, M.: Convergence of a splitting inertial proximal method for monotone operators. Journal of Computational and Applied Mathematics 155(2), 447–454 (2003)
  • Andrei [2008] Andrei, N.: An unconstrained optimization test functions collection. Adv. Model. Optim 10(1), 147–161 (2008)
  • Candès et al. [2006] Candès, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on information theory 52(2), 489–509 (2006)
  • Donoho [2006] Donoho, D.L.: Compressed sensing. IEEE Transactions on information theory 52(4), 1289–1306 (2006)
  • Aminifard et al. [2022] Aminifard, Z., Hosseini, A., Babaie-Kafaki, S.: Modified conjugate gradient method for solving sparse recovery problem with nonconvex penalty. Signal Processing 193, 108424 (2022)
  • Ouyang et al. [2015] Ouyang, A., Liu, L.-B., Sheng, Z., Wu, F.: A class of parameter estimation methods for nonlinear muskingum model using hybrid invasive weed optimization algorithm. Mathematical Problems in Engineering 2015(1), 573894 (2015)
  • Luo et al. [2022] Luo, D., Li, Y., Lu, J., Yuan, G.: A conjugate gradient algorithm based on double parameter scaled broyden–fletcher–goldfarb–shanno update for optimization problems and image restoration. Neural Computing and Applications 34(1), 535–553 (2022)