
SPICE: Scaling-Aware Prediction Correction Methods with a Free Convergence Rate for Nonlinear Convex Optimization

Sai Wang 1,2
1 XXX University, Country
2 Southern University of Science and Technology, China
Abstract

Recently, the prediction-correction method has been developed to solve nonlinear convex optimization problems. However, its convergence rate is often poor because large regularization parameters must be chosen to satisfy the convergence conditions. In this paper, the scaling-aware prediction correction (Spice) method is proposed to achieve a free convergence rate. The method adopts a novel scaling technique that adjusts the weights of the objective and constraint functions. The theoretical analysis demonstrates that increasing the scaling factor for the objective function, or decreasing the scaling factor for the constraint functions, significantly enhances the convergence rate of the prediction-correction method. In addition, the Spice method is extended to nonlinear convex optimization with separable variables. By employing different scaling factors as functions of the iteration count, the Spice method achieves convergence rates of \mathcal{O}(1/(t+1)), \mathcal{O}(1/[e^{t}(t+1)]), and \mathcal{O}(1/(t+1)^{t+1}). Numerical experiments further validate the theoretical findings, demonstrating the effectiveness of the Spice method in practice.

keywords:
Primal-dual method, prediction correction method, scaling technique.

1 Introduction

Nonlinear convex optimization is fundamental to many fields, including control theory, image denoising, and signal processing. This work addresses a general nonlinear convex problem with inequality constraints, formalized as:

\mathsf{P}0:\quad\min\left\{f(\mathbf{x})\mid\phi_{i}(\mathbf{x})\leq 0,\ \mathbf{x}\in\mathcal{X},\ i=1,\cdots,p\right\},\qquad(1.1)

where the set \mathcal{X} is a nonempty closed convex subset of \mathbb{R}^{n}. The objective function f(\mathbf{x}):\mathbb{R}^{n}\rightarrow\mathbb{R} and the constraint functions \phi_{i}(\mathbf{x}):\mathbb{R}^{n}\rightarrow\mathbb{R}, i=1,\cdots,p, are convex, and the constraint functions are also continuously differentiable. Notably, f(\mathbf{x}) is not assumed to be differentiable. The Lagrangian function associated with \mathsf{P}0, parameterized by the dual variable \boldsymbol{\lambda}, is given by:

\mathcal{L}(\mathbf{x},\boldsymbol{\lambda})=f(\mathbf{x})+\boldsymbol{\lambda}^{\top}\Phi(\mathbf{x}),\qquad(1.2)

where \Phi(\mathbf{x})=[\phi_{1}(\mathbf{x}),\cdots,\phi_{p}(\mathbf{x})]^{\top}, and the dual variable \boldsymbol{\lambda} belongs to the set \mathcal{Z}:=\mathbb{R}^{p}_{+}. In this way, the constrained problem is reformulated as a saddle point problem: the saddle point of (1.2) yields the optimal solution of P0. A common approach is to design an iterative scheme that updates the primal and dual variables alternately. The Arrow-Hurwicz method, one of the earliest primal-dual methods, was proposed to solve concave-convex optimization problems [1], laying the foundation for subsequent developments. Subsequently, the primal-dual hybrid gradient (PDHG) method [2] introduced gradient steps, offering a more flexible framework. However, PDHG has been shown to diverge in certain linear programming problems [3], and its variants achieve convergence only under more restrictive assumptions [4]. To address this issue, the Chambolle-Pock method was developed as a modified version of PDHG. By introducing a relaxation parameter, it improved stability and enhanced the convergence rate for saddle point problems with linear operators [5, 6, 7]. Further extensions of PDHG have been applied to nonlinear operators [8, 9]. Unfortunately, these approaches offer only local convergence guarantees, leaving the convergence rate unaddressed. Another notable class of primal-dual methods is the customized proximal point algorithms (PPA) developed in [10, 11, 12, 13, 14]. These algorithms customize a symmetric proximal matrix, rather than merely relaxing the primal variable, to address convex problems with linear constraints. Based on variational analysis theory, they provide a straightforward proof of convergence. Later, a novel prediction-correction (PC) method was proposed in [15, 16]. The key innovation of the PC method is to correct the primal and dual variables generated by the Arrow-Hurwicz method without constructing a symmetric proximal matrix. This unified framework establishes convergence conditions for the corrective matrix, which are relatively easy to satisfy. Table 1 compares different primal-dual methods in terms of their assumptions and convergence rates.

Recent advances have extended prediction-correction (PC) methods to nonlinear optimization problems, particularly by leveraging carefully designed regularization parameters [17, 18]. These works prove that ergodic convergence rates can be achieved provided the sequence of regularization parameters is non-increasing. However, designing such a sequence poses a significant challenge, as the regularization parameters strongly influence the proximal mapping subproblem. Larger regularization parameters, though useful in theory, often fail to capture the curvature of the proximal term effectively, leading to suboptimal convergence. While a smaller non-increasing sequence of regularization parameters can accelerate convergence toward the optimal solution, it introduces instability into the algorithm. The challenge, therefore, lies in balancing the size of these parameters to ensure both rapid convergence and algorithmic stability. The choice of these parameters is intricately tied to the derivative of the constraint functions: for any given derivative, a lower bound can be established on the product of the regularization parameters, which serves as a key condition for ensuring convergence. This lower bound prevents the parameters from being too small, while larger parameters are often inefficient at capturing the problem's local shape, thereby slowing convergence. Building upon these theoretical foundations, some works have introduced customized proximal matrices to optimize the selection of regularization parameters. For instance, the authors of [19, 18] proposed generalized primal-dual methods that achieve a 25% reduction in the lower bound on the product of regularization parameters, a significant improvement in parameter tuning. Despite this progress, identifying smaller regularization parameters that still meet the convergence conditions remains a major obstacle.

To address this challenge, this paper introduces a novel scaling technique that adjusts the weights of the objective and constraint functions, enabling a reduction in the regularization parameters. This innovation leads to the scaling-aware prediction correction (Spice) method, which achieves a free convergence rate for nonlinear convex problems and overcomes the limitations of existing approaches in terms of both stability and efficiency.

Table 1: Convergence rates of primal-dual methods for convex optimization (t is the number of iterations, \omega\in(0,1)).
Algorithm | Linear Constraints | Nonlinear Constraints | Other Assumption | Convergence Rate
Arrow-Hurwicz [4] | \surd | - | Strongly convex | \mathcal{O}(1/t)
Chambolle-Pock [6] | \surd | - | Smooth | \mathcal{O}(\omega^{t})
Customized PPA [10, 11, 12] | \surd | - | - | \mathcal{O}(1/t)
Traditional PC [15, 16] | \surd | - | - | \mathcal{O}(1/t)
Spice (this work) | - | \surd | - | Free

2 Preliminaries

This section introduces the foundational concepts, specifically the variational inequality, which is a key tool for the analysis later in the paper. Additionally, a scaling technique is proposed to adjust the variational inequality for further optimization.

2.1 Variational Inequality

This subsection formalizes the concept of variational inequality in the context of optimization problems. It provides the necessary mathematical background, including gradient representations and saddle point conditions, to establish the optimality criteria for constrained problems. The gradient of the function f(\mathbf{x}) with respect to the vector \mathbf{x}=[x_{1},\cdots,x_{n}]^{\top} is represented as \mathcal{D}f(\mathbf{x})=[\nabla f(\mathbf{x})]^{\top}. For a vector function \Phi(\mathbf{x}), its gradient is given by:

\mathcal{D}\Phi(\mathbf{x})=\left(\begin{array}{c}\mathcal{D}\phi_{1}(\mathbf{x})\\ \vdots\\ \mathcal{D}\phi_{p}(\mathbf{x})\end{array}\right)=\left(\begin{array}{ccc}\frac{\partial\phi_{1}}{\partial x_{1}}&\cdots&\frac{\partial\phi_{1}}{\partial x_{n}}\\ \vdots&\ddots&\vdots\\ \frac{\partial\phi_{p}}{\partial x_{1}}&\cdots&\frac{\partial\phi_{p}}{\partial x_{n}}\end{array}\right).
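As a computational illustration of this notation, the short Python sketch below builds \mathcal{D}\Phi(\mathbf{x}) by central finite differences. It is not part of the Spice method itself; the constraint function phi used in the example is a hypothetical placeholder.

import numpy as np

def jacobian_phi(phi, x, eps=1e-6):
    # Finite-difference approximation of D Phi(x), returned as a p x n matrix.
    # `phi` maps an n-vector to the p-vector [phi_1(x), ..., phi_p(x)].
    p, n = phi(x).shape[0], x.shape[0]
    J = np.zeros((p, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (phi(x + e) - phi(x - e)) / (2 * eps)
    return J

# Example with two quadratic constraints on R^3 (arbitrary illustrative data).
phi = lambda x: np.array([x @ x - 1.0, (x - 1.0) @ (x - 1.0) - 2.0])
print(jacobian_phi(phi, np.array([0.5, -0.2, 0.1])))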
Lemma 2.1.

Let \mathcal{X}\subset\mathbb{R}^{n} be a closed convex set, with f(\mathbf{x}) and h(\mathbf{x}) convex functions, where h(\mathbf{x}) is differentiable. Assume the minimization problem \min\{f(\mathbf{x})+h(\mathbf{x})\mid\mathbf{x}\in\mathcal{X}\} has a nonempty solution set. The vector \mathbf{x}^{*} is an optimal solution, i.e.,

\mathbf{x}^{*}\in\arg\min\{f(\mathbf{x})+h(\mathbf{x})\mid\mathbf{x}\in\mathcal{X}\},

if and only if

\mathbf{x}^{*}\in\mathcal{X},\quad f(\mathbf{x})-f(\mathbf{x}^{*})+(\mathbf{x}-\mathbf{x}^{*})^{\top}\nabla h(\mathbf{x}^{*})\geq 0,\quad\forall\ \mathbf{x}\in\mathcal{X}.
Proof.

Please refer to the work in [17] (Lemma 2.1). ∎

Let (\mathbf{x}^{*},\boldsymbol{\lambda}^{*}) be the saddle point of (1.2), which satisfies the following conditions:

\mathbf{x}^{*}\in\arg\min\{\mathcal{L}(\mathbf{x},\boldsymbol{\lambda}^{*})\mid\mathbf{x}\in\mathcal{X}\},
\boldsymbol{\lambda}^{*}\in\arg\max\{\mathcal{L}(\mathbf{x}^{*},\boldsymbol{\lambda})\mid\boldsymbol{\lambda}\in\mathcal{Z}\}.

By Lemma 2.1, the saddle point satisfies the following variational inequalities:

\mathbf{x}^{*}\in\mathcal{X},\quad f(\mathbf{x})-f(\mathbf{x}^{*})+(\mathbf{x}-\mathbf{x}^{*})^{\top}\mathcal{D}\Phi(\mathbf{x}^{*})^{\top}\boldsymbol{\lambda}^{*}\geq 0,\quad\forall\ \mathbf{x}\in\mathcal{X},
\boldsymbol{\lambda}^{*}\in\mathcal{Z},\quad(\boldsymbol{\lambda}-\boldsymbol{\lambda}^{*})^{\top}[-\Phi(\mathbf{x}^{*})]\geq 0,\quad\forall\ \boldsymbol{\lambda}\in\mathcal{Z},

where \mathcal{D}\Phi(\mathbf{x})=[\nabla\phi_{1},\cdots,\nabla\phi_{p}]^{\top}\in\mathbb{R}^{p\times n} and \mathcal{D}\Phi(\mathbf{x})^{\top}\boldsymbol{\lambda}=\sum_{i=1}^{p}\lambda_{i}\nabla\phi_{i}(\mathbf{x})\in\mathbb{R}^{n}. The following monotone variational inequality characterizes the optimality condition:

\mathbf{w}^{*}\in\Omega,\quad f(\mathbf{x})-f(\mathbf{x}^{*})+(\mathbf{w}-\mathbf{w}^{*})^{\top}\boldsymbol{\Gamma}(\mathbf{w}^{*})\geq 0,\quad\forall\ \mathbf{w}\in\Omega,\qquad(2.1)

where

\mathbf{w}=\left(\begin{array}{c}\mathbf{x}\\ \boldsymbol{\lambda}\end{array}\right),\quad\boldsymbol{\Gamma}(\mathbf{w})=\left(\begin{array}{c}\mathcal{D}\Phi(\mathbf{x})^{\top}\boldsymbol{\lambda}\\ -\Phi(\mathbf{x})\end{array}\right),\quad\mbox{and}\quad\Omega=\mathcal{X}\times\mathcal{Z}.\qquad(2.2)
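A minimal sketch of how the operator \boldsymbol{\Gamma}(\mathbf{w}) in (2.2) can be assembled in code is given below; phi and jac_phi are assumed user-supplied callables for \Phi and \mathcal{D}\Phi and are not defined in the paper.

import numpy as np

def gamma(x, lam, phi, jac_phi):
    # Gamma(w) = [ D Phi(x)^T lam ; -Phi(x) ] for w = (x, lam).
    return np.concatenate([jac_phi(x).T @ lam, -phi(x)])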
Lemma 2.2.

Let \mathcal{X}\subset\mathbb{R}^{n} and \mathcal{Z}:=\mathbb{R}^{p_{1}}_{+}\times\mathbb{R}^{p_{2}} be closed convex sets. Then the operator \boldsymbol{\Gamma} defined in (2.2) satisfies

(\mathbf{w}-\bar{\mathbf{w}})^{\top}\left[\boldsymbol{\Gamma}(\mathbf{w})-\boldsymbol{\Gamma}(\bar{\mathbf{w}})\right]\geq 0,\quad\forall\ \mathbf{w},\bar{\mathbf{w}}\in\Omega.\qquad(2.3)
Proof.

The proof can be found in [17] (Lemma 2.2). ∎

2.2 Scaling Technique

This subsection introduces a scaling technique for the objective and constraint functions. It shows how the scaling factors affect the variational inequality, leading to a reformulated optimization problem with improved flexibility in the choice of solution methods.

Lemma 2.3.

For any scaling factors \rho>0,\ \eta>0, if \mathbf{x}^{*} is the optimal solution of \mathsf{P}0, the following variational inequality holds

\mathbf{w}^{*}\in\Omega,\quad\rho\left[f(\mathbf{x})-f(\mathbf{x}^{*})\right]+(\mathbf{w}-\mathbf{w}^{*})^{\top}\frac{1}{\eta}\boldsymbol{\Gamma}(\mathbf{w}^{*})\geq 0,\quad\forall\ \mathbf{w}\in\Omega.\qquad(2.4)
Proof.

For any scaling factors \rho>0,\ \eta>0, the original problem \mathsf{P}0 can be rewritten as

\mathsf{P}1:\quad\min\left\{\rho f(\mathbf{x})\mid\frac{1}{\eta}\phi_{i}(\mathbf{x})\leq 0,\ \mathbf{x}\in\mathcal{X},\ i=1,\cdots,p\right\}.\qquad(2.5)

The scaled Lagrangian function of \mathsf{P}1 is defined as

\mathcal{L}(\mathbf{x},\boldsymbol{\lambda},\rho,\eta)=\rho f(\mathbf{x})+\boldsymbol{\lambda}^{\top}\frac{1}{\eta}\Phi(\mathbf{x}).\qquad(2.6)

For fixed values of \rho and \eta, the saddle point of (2.6) can be written as

\mathbf{x}^{*}\in\arg\min\{\mathcal{L}(\mathbf{x},\boldsymbol{\lambda}^{*},\rho,\eta)\mid\mathbf{x}\in\mathcal{X}\},
\boldsymbol{\lambda}^{*}\in\arg\max\{\mathcal{L}(\mathbf{x}^{*},\boldsymbol{\lambda},\rho,\eta)\mid\boldsymbol{\lambda}\in\mathcal{Z}\}.

The saddle point further satisfies the variational inequalities below:

\mathbf{x}^{*}\in\mathcal{X},\quad\rho\left[f(\mathbf{x})-f(\mathbf{x}^{*})\right]+(\mathbf{x}-\mathbf{x}^{*})^{\top}\frac{1}{\eta}\mathcal{D}\Phi(\mathbf{x}^{*})^{\top}\boldsymbol{\lambda}^{*}\geq 0,\quad\forall\ \mathbf{x}\in\mathcal{X},
\boldsymbol{\lambda}^{*}\in\mathcal{Z},\quad(\boldsymbol{\lambda}-\boldsymbol{\lambda}^{*})^{\top}\left[-\frac{1}{\eta}\Phi(\mathbf{x}^{*})\right]\geq 0,\quad\forall\ \boldsymbol{\lambda}\in\mathcal{Z}.

The above inequalities can be combined into a unified variational inequality:

\mathbf{w}^{*}\in\Omega,\quad\rho\left[f(\mathbf{x})-f(\mathbf{x}^{*})\right]+(\mathbf{w}-\mathbf{w}^{*})^{\top}\frac{1}{\eta}\boldsymbol{\Gamma}(\mathbf{w}^{*})\geq 0,\quad\forall\ \mathbf{w}\in\Omega.

Thus, this lemma is proven. ∎

Remark 2.1.

For any \rho>0 and \eta>0, the variational inequality (2.4) can be reformulated as

\mathbf{w}^{*}\in\Omega,\quad f(\mathbf{x})-f(\mathbf{x}^{*})+(\mathbf{w}-\mathbf{w}^{*})^{\top}\frac{1}{\eta\rho}\boldsymbol{\Gamma}(\mathbf{w}^{*})\geq 0,\quad\forall\ \mathbf{w}\in\Omega.\qquad(2.7)

Since \rho>0 and \eta>0, inequality (2.7) states that the combination of the function value difference f(\mathbf{x})-f(\mathbf{x}^{*}) and the term involving the operator \boldsymbol{\Gamma}(\mathbf{w}^{*}) is always nonnegative. This indicates that \mathbf{w}^{*} is indeed an optimal solution in the feasible set \Omega, because any deviation \mathbf{w} from \mathbf{w}^{*} within the set \Omega does not reduce the objective function value. Thus, the variational inequality effectively serves as a condition ensuring that \mathbf{x}^{*} minimizes the function f over the set \Omega.

3 Scaling-Aware Prediction Correction Method

The Spice method consists of two steps. The first uses a PPA-like scheme to predict the variables; the second applies a corrective matrix to the predicted variables.

Lemma 3.1.

Let \mathbf{Q} be a 2\times 2 block upper triangular matrix with scalar diagonal blocks, defined as:

\mathbf{Q}=\begin{pmatrix}r\mathbf{I}_{n}&\boldsymbol{\Lambda}\\ \mathbf{0}&s\mathbf{I}_{p}\end{pmatrix},

where r>0,\ s>0 are two positive constants, \mathbf{I}_{n} and \mathbf{I}_{p} are identity matrices, and \boldsymbol{\Lambda} is an n\times p matrix. For any vector \mathbf{w}\in\mathbb{R}^{n+p}, the following equality holds:

\mathbf{w}^{\top}\mathbf{Q}\mathbf{w}=\mathbf{w}^{\top}\frac{1}{2}(\mathbf{Q}+\mathbf{Q}^{\top})\mathbf{w}.
Proof.

Given the matrix 𝐐\mathbf{Q} and vector 𝐰\mathbf{w}, the quadratic form is:

𝐰𝐐𝐰=(𝐱𝝀)(r𝐈n𝚲𝟎s𝐈p)(𝐱𝝀).\mathbf{w}^{\top}\mathbf{Q}\mathbf{w}=\begin{pmatrix}\mathbf{x}^{\top}&\boldsymbol{\lambda}^{\top}\end{pmatrix}\begin{pmatrix}r\mathbf{I}_{n}&\boldsymbol{\Lambda}\\ \mathbf{0}&s\mathbf{I}_{p}\end{pmatrix}\begin{pmatrix}\mathbf{x}\\ \boldsymbol{\lambda}\end{pmatrix}.

Expanding this expression:

𝐰𝐐𝐰=r𝐱2+𝐱𝚲𝝀+s𝝀2.\mathbf{w}^{\top}\mathbf{Q}\mathbf{w}=r\|\mathbf{x}\|^{2}+\mathbf{x}^{\top}\boldsymbol{\Lambda}\boldsymbol{\lambda}+s\|\boldsymbol{\lambda}\|^{2}.

Now, consider the symmetric part of 𝐐\mathbf{Q}, which is 12(𝐐+𝐐)\frac{1}{2}(\mathbf{Q}+\mathbf{Q}^{\top}):

𝐐=(r𝐈n𝟎𝚲s𝐈p).\mathbf{Q}^{\top}=\begin{pmatrix}r\mathbf{I}_{n}&\mathbf{0}\\ \boldsymbol{\Lambda}^{\top}&s\mathbf{I}_{p}\end{pmatrix}.

Thus:

12(𝐐+𝐐)=12(r𝐈n𝚲𝟎s𝐈p)+12(r𝐈n𝟎𝚲s𝐈p)=(r𝐈n12𝚲12𝚲s𝐈p).\frac{1}{2}(\mathbf{Q}+\mathbf{Q}^{\top})=\frac{1}{2}\begin{pmatrix}r\mathbf{I}_{n}&\boldsymbol{\Lambda}\\ \mathbf{0}&s\mathbf{I}_{p}\end{pmatrix}+\frac{1}{2}\begin{pmatrix}r\mathbf{I}_{n}&\mathbf{0}\\ \boldsymbol{\Lambda}^{\top}&s\mathbf{I}_{p}\end{pmatrix}=\begin{pmatrix}r\mathbf{I}_{n}&\frac{1}{2}\boldsymbol{\Lambda}\\ \frac{1}{2}\ \boldsymbol{\Lambda}^{\top}&s\mathbf{I}_{p}\end{pmatrix}.

Now, the quadratic form 𝐰12(𝐐+𝐐)𝐰\mathbf{w}^{\top}\frac{1}{2}(\mathbf{Q}+\mathbf{Q}^{\top})\mathbf{w} is

𝐰12(𝐐+𝐐)𝐰=(𝐱𝝀)(r𝐈n12𝚲12𝚲s𝐈p)(𝐱𝝀).\mathbf{w}^{\top}\frac{1}{2}(\mathbf{Q}+\mathbf{Q}^{\top})\mathbf{w}=\begin{pmatrix}\mathbf{x}^{\top}&\boldsymbol{\lambda}^{\top}\end{pmatrix}\begin{pmatrix}r\mathbf{I}_{n}&\frac{1}{2}\boldsymbol{\Lambda}\\ \frac{1}{2}\boldsymbol{\Lambda}^{\top}&s\mathbf{I}_{p}\end{pmatrix}\begin{pmatrix}\mathbf{x}\\ \boldsymbol{\lambda}\end{pmatrix}.

Expanding this:

𝐰12(𝐐+𝐐)𝐰=r𝐱2+12𝐱𝚲𝝀+12𝝀𝚲𝐱+s𝝀2.\mathbf{w}^{\top}\frac{1}{2}(\mathbf{Q}+\mathbf{Q}^{\top})\mathbf{w}=r\|\mathbf{x}\|^{2}+\frac{1}{2}\mathbf{x}^{\top}\boldsymbol{\Lambda}\boldsymbol{\lambda}+\frac{1}{2}\boldsymbol{\lambda}^{\top}\boldsymbol{\Lambda}^{\top}\mathbf{x}+s\|\boldsymbol{\lambda}\|^{2}.

Since 𝐱𝚲𝝀\mathbf{x}^{\top}\boldsymbol{\Lambda}\boldsymbol{\lambda} and 𝝀𝚲𝐱\boldsymbol{\lambda}^{\top}\boldsymbol{\Lambda}^{\top}\mathbf{x} are scalars and equal, we have

𝐰12(𝐐+𝐐)𝐰=r𝐱2+s𝝀2+𝐱𝚲𝝀.\mathbf{w}^{\top}\frac{1}{2}(\mathbf{Q}+\mathbf{Q}^{\top})\mathbf{w}=r\|\mathbf{x}\|^{2}+s\|\boldsymbol{\lambda}\|^{2}+\mathbf{x}^{\top}\boldsymbol{\Lambda}\boldsymbol{\lambda}.

This result matches the original quadratic form:

𝐰𝐐𝐰=r𝐱2+s𝝀2+𝐱𝚲𝝀.\mathbf{w}^{\top}\mathbf{Q}\mathbf{w}=r\|\mathbf{x}\|^{2}+s\|\boldsymbol{\lambda}\|^{2}+\mathbf{x}^{\top}\boldsymbol{\Lambda}\boldsymbol{\lambda}.

Thus, this lemma holds. ∎

Remark 3.1.

The proof above shows that for any block upper triangular matrix with scalar diagonal blocks, the quadratic form \mathbf{w}^{\top}\mathbf{Q}\mathbf{w} equals the quadratic form \mathbf{w}^{\top}\frac{1}{2}(\mathbf{Q}+\mathbf{Q}^{\top})\mathbf{w}, without any additional factors in the mixed term. In particular, if r\cdot s>\frac{1}{4}\|\boldsymbol{\Lambda}\|^{2} holds, then \frac{1}{2}(\mathbf{Q}+\mathbf{Q}^{\top})\succ 0 and, for any \mathbf{w}\neq\mathbf{0}, \mathbf{w}^{\top}\mathbf{Q}\mathbf{w}=\mathbf{w}^{\top}\frac{1}{2}(\mathbf{Q}+\mathbf{Q}^{\top})\mathbf{w}=\|\mathbf{w}\|_{\frac{1}{2}(\mathbf{Q}+\mathbf{Q}^{\top})}^{2}>0.
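The identity of Lemma 3.1 and the positive-definiteness condition of Remark 3.1 can be checked numerically on random data; the sketch below uses arbitrary dimensions and is purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
n, p, r, s = 4, 2, 2.0, 1.5
Lam = rng.normal(size=(n, p))
Q = np.block([[r * np.eye(n), Lam], [np.zeros((p, n)), s * np.eye(p)]])
w = rng.normal(size=n + p)
sym = 0.5 * (Q + Q.T)
assert np.isclose(w @ Q @ w, w @ sym @ w)           # quadratic forms agree
if r * s > 0.25 * np.linalg.norm(Lam, 2) ** 2:      # r*s > ||Lambda||^2 / 4
    assert np.all(np.linalg.eigvalsh(sym) > 0)      # symmetric part is positive definite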

3.1 PPA-Like Prediction Scheme

The PPA-like prediction scheme exploits the relationship between adjacent iterates. For the k-th iteration and regularization parameters r_{k}>0,\ s_{k}>0, it is described by the following equations:

\bar{\mathbf{x}}^{k}=\arg\min\left\{\mathcal{L}(\mathbf{x},\boldsymbol{\lambda}^{k},\rho,\eta_{k})+\frac{r_{k}}{2}\|\mathbf{x}-\mathbf{x}^{k}\|^{2}\mid\mathbf{x}\in\mathcal{X}\right\},\qquad(3.1a)
\bar{\boldsymbol{\lambda}}^{k}=\arg\max\left\{\mathcal{L}(\bar{\mathbf{x}}^{k},\boldsymbol{\lambda},\rho,\eta_{k})-\frac{s_{k}}{2}\|\boldsymbol{\lambda}-\boldsymbol{\lambda}^{k}\|^{2}\mid\boldsymbol{\lambda}\in\mathcal{Z}\right\}.\qquad(3.1b)

Referring to Lemma 2.1, the pair (\bar{\mathbf{x}}^{k},\bar{\boldsymbol{\lambda}}^{k}) satisfies the following variational inequalities:

\bar{\mathbf{x}}^{k}\in\mathcal{X},\quad\rho\left[f(\mathbf{x})-f(\bar{\mathbf{x}}^{k})\right]+(\mathbf{x}-\bar{\mathbf{x}}^{k})^{\top}\left[\frac{1}{\eta_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}\bar{\boldsymbol{\lambda}}^{k}-\frac{1}{\eta_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}(\bar{\boldsymbol{\lambda}}^{k}-\boldsymbol{\lambda}^{k})+r_{k}(\bar{\mathbf{x}}^{k}-\mathbf{x}^{k})\right]\geq 0,\quad\forall\ \mathbf{x}\in\mathcal{X},
\bar{\boldsymbol{\lambda}}^{k}\in\mathcal{Z},\quad(\boldsymbol{\lambda}-\bar{\boldsymbol{\lambda}}^{k})^{\top}\left[-\frac{1}{\eta_{k}}\Phi(\bar{\mathbf{x}}^{k})+s_{k}(\bar{\boldsymbol{\lambda}}^{k}-\boldsymbol{\lambda}^{k})\right]\geq 0,\quad\forall\ \boldsymbol{\lambda}\in\mathcal{Z}.

This can be rewritten in a unified form as:

𝐰¯kΩ,ρ[f(𝐱)f(𝐱¯k)]+(𝐰𝐰¯k)[1ηk𝚪(𝐰¯k)+𝐐k(𝐰¯k𝐰k)]0,𝐰Ω,\bar{\mathbf{w}}^{k}\in\Omega,\quad\rho\left[f(\mathbf{x})-f(\bar{\mathbf{x}}^{k})\right]+(\mathbf{w}-\bar{\mathbf{w}}^{k})^{\top}\left[\frac{1}{\eta_{k}}\boldsymbol{\Gamma}(\bar{\mathbf{w}}^{k})+\mathbf{Q}_{k}(\bar{\mathbf{w}}^{k}-\mathbf{w}^{k})\right]\geq 0,\quad\forall\ \mathbf{w}\in\Omega, (3.2)

where the predictive (proximal) matrix 𝐐k\mathbf{Q}_{k} is given as:

𝐐k=(rk𝐈n1ηk𝒟Φ(𝐱¯k)𝟎sk𝐈p).\mathbf{Q}_{k}=\left(\begin{array}[]{cc}r_{k}\mathbf{I}_{n}&-\frac{1}{\eta_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}\\[2.84544pt] \mathbf{0}&s_{k}\mathbf{I}_{p}\end{array}\right).

In the PPA-like prediction scheme, the proximal matrix \mathbf{Q}_{k} is a block upper triangular matrix with scalar diagonal blocks. By Remark 3.1, if \mathbf{w}=\mathbf{w}^{k} and r_{k}\cdot s_{k}>\frac{1}{4\eta_{k}^{2}}\|\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})\|^{2} hold, the inequality (3.2) can be simplified as:

\rho\left[f(\mathbf{x}^{k})-f(\bar{\mathbf{x}}^{k})\right]+(\mathbf{w}^{k}-\bar{\mathbf{w}}^{k})^{\top}\frac{1}{\eta_{k}}\boldsymbol{\Gamma}(\bar{\mathbf{w}}^{k})\geq(\mathbf{w}^{k}-\bar{\mathbf{w}}^{k})^{\top}\mathbf{Q}_{k}(\mathbf{w}^{k}-\bar{\mathbf{w}}^{k})>0.

Although the generated sequence satisfies this variational condition, it may converge to a suboptimal solution. It has been shown in [17] that a symmetric proximal matrix guarantees the convergence of the customized PPA. Since the proximal matrix \mathbf{Q}_{k} is not symmetric, the scheme can diverge even for simple linear convex problems [3]. To ensure convergence of the proposed method, the predicted variables are corrected by a corrective matrix. The pseudocode of the Spice method is given in Algorithm 1.

Input: Parameters \mu>1; tolerance \tau.
Output: The optimal solution \mathbf{x}^{*}.
1  Initialize \mathbf{x}^{0},\boldsymbol{\lambda}^{0},k=0,\eta_{0}=1;
2  while error\geq\tau do
3      % The prediction step:
4      if k>0 then
5          \eta_{k}^{\prime}=\eta_{k-1}; \eta_{k}=\eta_{k}^{\prime}; \eta_{max}=\mu\cdot\eta_{k};
6          while \eta_{k}<\eta_{max} do
7              \eta_{k}=\eta_{k}^{\prime}; r_{k}=\frac{1}{\eta_{k}}\sqrt{\mathsf{R}(\mathbf{x}^{k})};
8              \bar{\mathbf{x}}^{k}=\arg\min\{\mathcal{L}(\mathbf{x},\boldsymbol{\lambda}^{k},\rho,\eta_{k})+\frac{r_{k}}{2}\|\mathbf{x}-\mathbf{x}^{k}\|^{2}\mid\mathbf{x}\in\mathcal{X}\};
9              \eta_{max}=\max\left\{\eta_{k-1}\cdot\sqrt{\frac{\mathsf{R}(\mathbf{x}^{k})}{\mathsf{R}(\mathbf{x}^{k-1})}},\ \eta_{k-1}\cdot\frac{\mathsf{R}(\bar{\mathbf{x}}^{k})\sqrt{\mathsf{R}(\mathbf{x}^{k-1})}}{\mathsf{R}(\bar{\mathbf{x}}^{k-1})\sqrt{\mathsf{R}(\mathbf{x}^{k})}}\right\};
10             \eta_{k}^{\prime}=\mu\cdot\eta_{k};
11         end while
12     end if
13     s_{k}=\frac{\mu}{\eta_{k}}\mathsf{R}(\bar{\mathbf{x}}^{k})/\sqrt{\mathsf{R}(\mathbf{x}^{k})};
14     \bar{\boldsymbol{\lambda}}^{k}=\mathsf{P}_{\mathcal{Z}}\left(\boldsymbol{\lambda}^{k}+\frac{1}{\eta_{k}s_{k}}\Phi(\bar{\mathbf{x}}^{k})\right);
15     % The correction step:
16     \mathbf{w}^{k+1}=\mathbf{w}^{k}-\mathbf{M}_{k}(\mathbf{w}^{k}-\bar{\mathbf{w}}^{k});
17     error=\mathrm{abs}[f(\mathbf{x}^{k})-f(\mathbf{x}^{k+1})];
18     k=k+1;
19 end while
20 return \mathbf{x}^{*}=\mathbf{x}^{k};
Algorithm 1 Scaling-aware prediction correction method
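The following Python sketch mirrors the structure of Algorithm 1 under simplifying assumptions: \mathcal{X}=\mathbb{R}^{n}, \mathcal{Z}=\mathbb{R}^{p}_{+}, the x-subproblem (3.1a) is handed to a generic solver, and the inner search over \eta_{k} is reduced to the r_{k}-monotonicity condition of (3.28). The callables f, phi and jac_phi are hypothetical user inputs; this is an illustration, not a reference implementation.

import numpy as np
from scipy.optimize import minimize

def spice(f, phi, jac_phi, x0, lam0, rho=1.0, mu=2.0, tol=1e-6, max_iter=200):
    x, lam, eta = np.asarray(x0, float), np.asarray(lam0, float), 1.0
    R = lambda z: np.linalg.norm(jac_phi(z), 2) ** 2        # R(x) = ||D Phi(x)||^2
    R_prev = None
    for k in range(max_iter):
        R_cur = R(x)
        if k > 0:
            # Partial form of (3.28): keeps r_k non-increasing; the full
            # Algorithm 1 also enforces the s_k condition via an inner search.
            eta = max(eta, eta * np.sqrt(R_cur / R_prev))
        r_k = np.sqrt(R_cur) / eta                          # (3.27a)
        # Prediction of x, eq. (3.1a): scaled Lagrangian plus proximal term.
        obj = lambda z: (rho * f(z) + lam @ phi(z) / eta
                         + 0.5 * r_k * np.sum((z - x) ** 2))
        x_bar = minimize(obj, x).x
        s_k = mu * R(x_bar) / (eta * np.sqrt(R_cur))        # (3.27b)
        # Prediction of lambda, eq. (3.1b): projected dual step onto R^p_+.
        lam_bar = np.maximum(lam + phi(x_bar) / (eta * s_k), 0.0)
        # Correction step (3.3) with the upper-triangular M_k of Section 3.2.
        x_next = x_bar + jac_phi(x_bar).T @ (lam - lam_bar) / (eta * r_k)
        lam_next = lam_bar
        if abs(f(x) - f(x_next)) < tol:
            return x_next, lam_next
        x, lam, R_prev = x_next, lam_next, R_cur
    return x, lam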

3.2 Matrix-Driven Correction Scheme

In the correction phase, a corrective matrix is employed to adjust the predicted variables. This matrix is not unique; various forms can be used, including both upper and lower triangular matrices, as discussed in related work [17]. The new iterate is generated by

\mathbf{w}^{k+1}=\mathbf{w}^{k}-\mathbf{M}_{k}(\mathbf{w}^{k}-\bar{\mathbf{w}}^{k}).\qquad(3.3)

For the sake of simplifying the analysis, in this paper, the corrective matrix 𝐌k\mathbf{M}_{k} is defined as follows:

𝐌k=(𝐈n1ηkrk𝒟Φ(𝐱¯k)𝟎𝐈p).\mathbf{M}_{k}=\left(\begin{array}[]{cc}\mathbf{I}_{n}&-\frac{1}{\eta_{k}r_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}\\[2.84544pt] \mathbf{0}&\mathbf{I}_{p}\end{array}\right).

Referring to equation (3.3), the corrective matrix links the predicted variables to the newly generated iterate. To facilitate the convergence analysis, two extended matrices, \mathbf{H}_{k} and \mathbf{G}_{k}, are introduced. They are derived from \mathbf{Q}_{k} and \mathbf{M}_{k} and lay the groundwork for the subsequent analysis.

𝐇k=𝐐k𝐌k1\displaystyle\mathbf{H}_{k}=\mathbf{Q}_{k}\mathbf{M}_{k}^{-1} =\displaystyle= (rk𝐈n1ηk𝒟Φ(𝐱¯k)𝟎sk𝐈p)(𝐈n1ηkrk𝒟Φ(𝐱¯k)𝟎𝐈p)\displaystyle\left(\begin{array}[]{cc}r_{k}\mathbf{I}_{n}&-\frac{1}{\eta_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}\\[2.84544pt] \mathbf{0}&s_{k}\mathbf{I}_{p}\end{array}\right)\left(\begin{array}[]{cc}\mathbf{I}_{n}&\frac{1}{\eta_{k}r_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}\\[2.84544pt] \mathbf{0}&\mathbf{I}_{p}\end{array}\right) (3.8)
=\displaystyle= (rk𝐈n𝟎𝟎sk𝐈p),\displaystyle\left(\begin{array}[]{cc}r_{k}\mathbf{I}_{n}&\mathbf{0}\\[2.84544pt] \mathbf{0}&s_{k}\mathbf{I}_{p}\end{array}\right), (3.11)

The first extended matrix, \mathbf{H}_{k}, is diagonal. When r_{k}\approx s_{k}, it is approximately the identity matrix scaled by r_{k}. Clearly, it is positive definite since both r_{k} and s_{k} are positive. The second extended matrix, \mathbf{G}_{k}, is defined as follows:

𝐆k\displaystyle\mathbf{G}_{k} =\displaystyle= 𝐐k+𝐐k𝐌k𝐇k𝐌k=𝐐k+𝐐k𝐌k𝐐k\displaystyle\mathbf{Q}_{k}^{\top}+\mathbf{Q}_{k}-\mathbf{M}_{k}^{\top}\mathbf{H}_{k}\mathbf{M}_{k}=\mathbf{Q}_{k}^{\top}+\mathbf{Q}_{k}-\mathbf{M}_{k}^{\top}\mathbf{Q}_{k} (3.18)
=\displaystyle= (2rk𝐈n1ηk𝒟Φ(𝐱¯k)1ηk𝒟Φ(𝐱¯k)2sk𝐈p)(𝐈n𝟎1ηkrk𝒟Φ(𝐱¯k)𝐈p)(rk𝐈n1ηk𝒟Φ(𝐱¯k)𝟎sk𝐈p)\displaystyle\left(\begin{array}[]{cc}2r_{k}\mathbf{I}_{n}&-\frac{1}{\eta_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}\\[2.84544pt] -\frac{1}{\eta_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})&2s_{k}\mathbf{I}_{p}\end{array}\right)-\left(\begin{array}[]{cc}\mathbf{I}_{n}&\mathbf{0}\\[2.84544pt] -\frac{1}{\eta_{k}r_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})&\mathbf{I}_{p}\end{array}\right)\left(\begin{array}[]{cc}r_{k}\mathbf{I}_{n}&-\frac{1}{\eta_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}\\[2.84544pt] \mathbf{0}&s_{k}\mathbf{I}_{p}\end{array}\right)
=\displaystyle= (2rk𝐈n1ηk𝒟Φ(𝐱¯k)1ηk𝒟Φ(𝐱¯k)2sk𝐈p)(rk𝐈n1ηk𝒟Φ(𝐱¯k)1ηk𝒟Φ(𝐱¯k)sk𝐈p+1ηk2rk𝒟Φ(𝐱¯k)𝒟Φ(𝐱¯k))\displaystyle\left(\begin{array}[]{cc}2r_{k}\mathbf{I}_{n}&-\frac{1}{\eta_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}\\[2.84544pt] -\frac{1}{\eta_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})&2s_{k}\mathbf{I}_{p}\end{array}\right)-\left(\begin{array}[]{cc}r_{k}\mathbf{I}_{n}&-\frac{1}{\eta_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}\\[2.84544pt] -\frac{1}{\eta_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})&s_{k}\mathbf{I}_{p}+\frac{1}{\eta_{k}^{2}r_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}\end{array}\right) (3.23)
=\displaystyle= (rk𝐈n𝟎𝟎sk𝐈p1ηk2rk𝒟Φ(𝐱¯k)𝒟Φ(𝐱¯k)).\displaystyle\left(\begin{array}[]{cc}r_{k}\mathbf{I}_{n}&\mathbf{0}\\[2.84544pt] \mathbf{0}&s_{k}\mathbf{I}_{p}-\frac{1}{\eta_{k}^{2}r_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}\end{array}\right). (3.26)

This results in a block diagonal matrix. By the Schur complement lemma, if the condition r_{k}\cdot s_{k}\cdot\mathbf{I}_{n}\succ\frac{1}{\eta_{k}^{2}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k}) is met, then \mathbf{G}_{k} is positive definite. In this paper, for \mu>1 and k\geq 1, we define \mathsf{R}(\mathbf{x}^{k})=\|\mathcal{D}\Phi(\mathbf{x}^{k})\|^{2} and set:

r_{k}=\frac{1}{\eta_{k}}\sqrt{\mathsf{R}(\mathbf{x}^{k})},\qquad(3.27a)
s_{k}=\frac{\mu\mathsf{R}(\bar{\mathbf{x}}^{k})}{\eta_{k}\sqrt{\mathsf{R}(\mathbf{x}^{k})}}.\qquad(3.27b)

The scaling factor \eta_{k} is chosen so that the sequence \{r_{k},s_{k}\} is non-increasing, i.e., r_{k-1}\geq r_{k} and s_{k-1}\geq s_{k}. Starting with the condition r_{k-1}\geq r_{k}, we have

\frac{1}{\eta_{k-1}}\sqrt{\mathsf{R}(\mathbf{x}^{k-1})}\geq\frac{1}{\eta_{k}}\sqrt{\mathsf{R}(\mathbf{x}^{k})}.

Thus, the value of \eta_{k} must satisfy:

\eta_{k}\geq\eta_{k-1}\cdot\sqrt{\frac{\mathsf{R}(\mathbf{x}^{k})}{\mathsf{R}(\mathbf{x}^{k-1})}}.

Similarly, beginning with the condition s_{k-1}\geq s_{k}, we have

\frac{\mu\mathsf{R}(\bar{\mathbf{x}}^{k-1})}{\eta_{k-1}\sqrt{\mathsf{R}(\mathbf{x}^{k-1})}}\geq\frac{\mu\mathsf{R}(\bar{\mathbf{x}}^{k})}{\eta_{k}\sqrt{\mathsf{R}(\mathbf{x}^{k})}}.

Therefore, \eta_{k} must also satisfy:

\eta_{k}\geq\eta_{k-1}\cdot\frac{\mathsf{R}(\bar{\mathbf{x}}^{k})\sqrt{\mathsf{R}(\mathbf{x}^{k-1})}}{\mathsf{R}(\bar{\mathbf{x}}^{k-1})\sqrt{\mathsf{R}(\mathbf{x}^{k})}}.

To satisfy both r_{k-1}\geq r_{k} and s_{k-1}\geq s_{k}, the parameter \eta_{k} must obey the more restrictive of the two conditions. Therefore, the appropriate choice for \eta_{k} is given by:

\eta_{k}\geq\max\left\{\eta_{k-1}\cdot\sqrt{\frac{\mathsf{R}(\mathbf{x}^{k})}{\mathsf{R}(\mathbf{x}^{k-1})}},\ \eta_{k-1}\cdot\frac{\mathsf{R}(\bar{\mathbf{x}}^{k})\sqrt{\mathsf{R}(\mathbf{x}^{k-1})}}{\mathsf{R}(\bar{\mathbf{x}}^{k-1})\sqrt{\mathsf{R}(\mathbf{x}^{k})}}\right\}.\qquad(3.28)
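Under the assumption that the four quantities \mathsf{R}(\mathbf{x}^{k-1}), \mathsf{R}(\mathbf{x}^{k}), \mathsf{R}(\bar{\mathbf{x}}^{k-1}) and \mathsf{R}(\bar{\mathbf{x}}^{k}) are already available, the bound (3.28) reduces to a one-line computation, sketched below.

import numpy as np

def eta_lower_bound(eta_prev, R_x_prev, R_x, R_xbar_prev, R_xbar):
    # Right-hand side of (3.28): the smallest admissible eta_k.
    return max(eta_prev * np.sqrt(R_x / R_x_prev),
               eta_prev * R_xbar * np.sqrt(R_x_prev) / (R_xbar_prev * np.sqrt(R_x)))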

For any iteration k0k\geq 0, we define a difference matrix as follows

\mathbf{D}_{k}=\mathbf{H}_{k}-\mathbf{H}_{k+1}=\left(\begin{array}{cc}(r_{k}-r_{k+1})\mathbf{I}_{n}&\mathbf{0}\\ \mathbf{0}&(s_{k}-s_{k+1})\mathbf{I}_{p}\end{array}\right).\qquad(3.31)

Since r_{k}\geq r_{k+1} and s_{k}\geq s_{k+1} hold, \mathbf{D}_{k} is a positive semidefinite matrix. The initial extended matrix can be reformulated as \mathbf{H}_{0}=\frac{1}{\eta_{0}}\mathsf{H}_{0}, given by:

\mathbf{H}_{0}=\left(\begin{array}{cc}\frac{1}{\eta_{0}}\sqrt{\mathsf{R}(\mathbf{x}^{0})}\mathbf{I}_{n}&\mathbf{0}\\ \mathbf{0}&\frac{\mu\mathsf{R}(\bar{\mathbf{x}}^{0})}{\eta_{0}\sqrt{\mathsf{R}(\mathbf{x}^{0})}}\mathbf{I}_{p}\end{array}\right),\quad\mathsf{H}_{0}=\left(\begin{array}{cc}\sqrt{\mathsf{R}(\mathbf{x}^{0})}\mathbf{I}_{n}&\mathbf{0}\\ \mathbf{0}&\frac{\mu\mathsf{R}(\bar{\mathbf{x}}^{0})}{\sqrt{\mathsf{R}(\mathbf{x}^{0})}}\mathbf{I}_{p}\end{array}\right).\qquad(3.36)

From the structure of these matrices, it is evident that \mathsf{H}_{0} is independent of the scaling factor \eta_{k}, relying solely on the initial values \mathsf{R}(\mathbf{x}^{0}) and \mathsf{R}(\bar{\mathbf{x}}^{0}).
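The structure of \mathbf{H}_{k} in (3.11) and the positive definiteness of \mathbf{G}_{k} in (3.26) can be verified numerically; the sketch below uses a random placeholder for \mathcal{D}\Phi(\bar{\mathbf{x}}^{k}) and, for simplicity, takes \mathsf{R}(\mathbf{x}^{k})=\mathsf{R}(\bar{\mathbf{x}}^{k}).

import numpy as np

rng = np.random.default_rng(1)
n, p, eta, mu = 5, 3, 1.0, 2.0
D = rng.normal(size=(p, n))                    # stands in for D Phi(xbar^k)
R = np.linalg.norm(D, 2) ** 2
r_k = np.sqrt(R) / eta                         # (3.27a)
s_k = mu * R / (eta * np.sqrt(R))              # (3.27b)
Q = np.block([[r_k * np.eye(n), -D.T / eta], [np.zeros((p, n)), s_k * np.eye(p)]])
M = np.block([[np.eye(n), -D.T / (eta * r_k)], [np.zeros((p, n)), np.eye(p)]])
H = Q @ np.linalg.inv(M)
G = Q.T + Q - M.T @ H @ M
assert np.allclose(H, np.diag(np.r_[r_k * np.ones(n), s_k * np.ones(p)]))  # (3.11)
assert np.all(np.linalg.eigvalsh(0.5 * (G + G.T)) > 0)                     # G_k is positive definite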

3.3 Sequence Convergence Analysis

Based on the previous preparation work, this part demonstrates the sequence convergence of the proposed method. First, the convergence condition is derived, followed by an analysis of the sequence convergence characteristics.

Lemma 3.2.

Let \{\mathbf{w}^{k},\bar{\mathbf{w}}^{k},\mathbf{w}^{k+1}\} be the sequences generated by the Spice method. When we use equation (3.27) to update the values of r_{k} and s_{k}, the predictive matrix \mathbf{Q}_{k} and corrective matrix \mathbf{M}_{k} are upper triangular and satisfy

\mathbf{H}_{k}=\mathbf{Q}_{k}\mathbf{M}_{k}^{-1}\succ 0\quad{\rm and}\quad\mathbf{G}_{k}=\mathbf{Q}_{k}^{\top}+\mathbf{Q}_{k}-\mathbf{M}_{k}^{\top}\mathbf{H}_{k}\mathbf{M}_{k}\succ 0,\qquad(3.37)

and the following variational inequality holds

\rho\left[f(\mathbf{x})-f(\bar{\mathbf{x}}^{k})\right]+(\mathbf{w}-\bar{\mathbf{w}}^{k})^{\top}\frac{1}{\eta_{k}}\boldsymbol{\Gamma}(\bar{\mathbf{w}}^{k})\geq\frac{1}{2}\|\bar{\mathbf{w}}^{k}-\mathbf{w}^{k}\|_{\mathbf{G}_{k}}^{2}+\frac{1}{2}\left(\|\mathbf{w}-\mathbf{w}^{k+1}\|_{\mathbf{H}_{k}}^{2}-\|\mathbf{w}-\mathbf{w}^{k}\|_{\mathbf{H}_{k}}^{2}\right),\quad\forall\ \mathbf{w}\in\Omega.\qquad(3.38)
Proof.

Referring to (3.2) and (3.37), the proximal term can be written as

(𝐰𝐰¯k)𝐐k(𝐰k𝐰¯k)=(𝐰𝐰¯k)𝐇k𝐌k(𝐰k𝐰¯k)=(𝐰𝐰¯k)𝐇k(𝐰k𝐰k+1).\displaystyle(\mathbf{w}-\bar{\mathbf{w}}^{k})^{\top}\mathbf{Q}_{k}(\mathbf{w}^{k}-\bar{\mathbf{w}}^{k})=(\mathbf{w}-\bar{\mathbf{w}}^{k})^{\top}\mathbf{H}_{k}\mathbf{M}_{k}(\mathbf{w}^{k}-\bar{\mathbf{w}}^{k})=(\mathbf{w}-\bar{\mathbf{w}}^{k})^{\top}\mathbf{H}_{k}(\mathbf{w}^{k}-\mathbf{w}^{k+1}). (3.39)

To further simplify it, the following fact can be used.

(\mathbf{w}-\mathbf{w}_{1})^{\top}\mathbf{H}_{k}(\mathbf{w}_{2}-\mathbf{w}_{3})=\frac{1}{2}\left(\|\mathbf{w}-\mathbf{w}_{3}\|_{\mathbf{H}_{k}}^{2}-\|\mathbf{w}-\mathbf{w}_{2}\|_{\mathbf{H}_{k}}^{2}\right)+\frac{1}{2}\left(\|\mathbf{w}_{1}-\mathbf{w}_{2}\|_{\mathbf{H}_{k}}^{2}-\|\mathbf{w}_{1}-\mathbf{w}_{3}\|_{\mathbf{H}_{k}}^{2}\right).\qquad(3.40)
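The identity (3.40) can be confirmed numerically with a random positive definite weight matrix; the sketch below is illustrative only.

import numpy as np

rng = np.random.default_rng(2)
d = 6
A = rng.normal(size=(d, d))
H = A @ A.T + d * np.eye(d)                     # a positive definite H_k
w, w1, w2, w3 = rng.normal(size=(4, d))
norm2 = lambda v: v @ H @ v                     # squared H_k-norm
lhs = (w - w1) @ H @ (w2 - w3)
rhs = 0.5 * (norm2(w - w3) - norm2(w - w2)) + 0.5 * (norm2(w1 - w2) - norm2(w1 - w3))
assert np.isclose(lhs, rhs)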

Applying (3.40) to the equation (3.39), we obtain

(𝐰𝐰¯k)𝐇k(𝐰k𝐰k+1)=\displaystyle(\mathbf{w}-\bar{\mathbf{w}}^{k})^{\top}\mathbf{H}_{k}(\mathbf{w}^{k}-\mathbf{w}^{k+1})= 12(𝐰𝐰k+1𝐇k2𝐰𝐰k𝐇k2)+12(𝐰¯k𝐰k𝐇k2𝐰¯k𝐰k+1𝐇k2).\displaystyle\frac{1}{2}\left(\|\mathbf{w}-\mathbf{w}^{k+1}\|_{\mathbf{H}_{k}}^{2}-\|\mathbf{w}-\mathbf{w}^{k}\|_{\mathbf{H}_{k}}^{2}\right)+\frac{1}{2}\left(\|\bar{\mathbf{w}}^{k}-\mathbf{w}^{k}\|_{\mathbf{H}_{k}}^{2}-\|\bar{\mathbf{w}}^{k}-\mathbf{w}^{k+1}\|_{\mathbf{H}_{k}}^{2}\right).

On the other hand, by Lemma 3.1, the following equation holds.

𝐰¯k𝐰k𝐇k2𝐰¯k𝐰k+1𝐇k2\displaystyle\|\bar{\mathbf{w}}^{k}-\mathbf{w}^{k}\|_{\mathbf{H}_{k}}^{2}-\|\bar{\mathbf{w}}^{k}-\mathbf{w}^{k+1}\|_{\mathbf{H}_{k}}^{2}
=\displaystyle= 𝐰¯k𝐰k𝐇k2(𝐰¯k𝐰k)(𝐰k+1𝐰k)𝐇k2\displaystyle\|\bar{\mathbf{w}}^{k}-\mathbf{w}^{k}\|_{\mathbf{H}_{k}}^{2}-\|(\bar{\mathbf{w}}^{k}-\mathbf{w}^{k})-(\mathbf{w}^{k+1}-\mathbf{w}^{k})\|_{\mathbf{H}_{k}}^{2}
=\displaystyle= 𝐰¯k𝐰k𝐇k2(𝐰¯k𝐰k)𝐌k(𝐰¯k𝐰k)𝐇k2\displaystyle\|\bar{\mathbf{w}}^{k}-\mathbf{w}^{k}\|_{\mathbf{H}_{k}}^{2}-\|(\bar{\mathbf{w}}^{k}-\mathbf{w}^{k})-\mathbf{M}_{k}(\bar{\mathbf{w}}^{k}-\mathbf{w}^{k})\|_{\mathbf{H}_{k}}^{2}
=\displaystyle= 2(𝐰¯k𝐰k)𝐇k𝐌k(𝐰¯k𝐰k)(𝐰¯k𝐰k)𝐌k𝐇k𝐌k(𝐰¯k𝐰k)\displaystyle 2(\bar{\mathbf{w}}^{k}-\mathbf{w}^{k})^{\top}\mathbf{H}_{k}\mathbf{M}_{k}(\bar{\mathbf{w}}^{k}-\mathbf{w}^{k})-(\bar{\mathbf{w}}^{k}-\mathbf{w}^{k})^{\top}\mathbf{M}_{k}^{\top}\mathbf{H}_{k}\mathbf{M}_{k}(\bar{\mathbf{w}}^{k}-\mathbf{w}^{k})
=\displaystyle= (𝐰¯k𝐰k)(𝐐k+𝐐k𝐌k𝐇k𝐌k)(𝐰¯k𝐰k)\displaystyle(\bar{\mathbf{w}}^{k}-\mathbf{w}^{k})^{\top}(\mathbf{Q}_{k}^{\top}+\mathbf{Q}_{k}-\mathbf{M}_{k}^{\top}\mathbf{H}_{k}\mathbf{M}_{k})(\bar{\mathbf{w}}^{k}-\mathbf{w}^{k})
=\displaystyle= 𝐰¯k𝐰k𝐆k2.\displaystyle\|\bar{\mathbf{w}}^{k}-\mathbf{w}^{k}\|_{\mathbf{G}_{k}}^{2}.

Combining with (3.39), the proximal term is equal to

(𝐰𝐰¯k)𝐐k(𝐰k𝐰¯k)=12(𝐰𝐰k+1𝐇k2𝐰𝐰k𝐇k2)+12𝐰¯k𝐰k𝐆k2,(\mathbf{w}-\bar{\mathbf{w}}^{k})^{\top}\mathbf{Q}_{k}(\mathbf{w}^{k}-\bar{\mathbf{w}}^{k})=\frac{1}{2}\left(\|\mathbf{w}-\mathbf{w}^{k+1}\|_{\mathbf{H}_{k}}^{2}-\|\mathbf{w}-\mathbf{w}^{k}\|_{\mathbf{H}_{k}}^{2}\right)+\frac{1}{2}\|\bar{\mathbf{w}}^{k}-\mathbf{w}^{k}\|_{\mathbf{G}_{k}}^{2}, (3.41)

By replacing the prediction term of (3.2) with (3.41), this lemma is proved. ∎

Theorem 3.1.

Let {𝐰k,𝐰¯k,𝐰k+1}\{\mathbf{w}^{k},\bar{\mathbf{w}}^{k},\mathbf{w}^{k+1}\} be the sequences generated by the Spice method. For the predictive matrix 𝐐k\mathbf{Q}_{k}, if there is a corrective matrix 𝐌k\mathbf{M}_{k} that satisfies the convergence condition (3.37), then the sequences satisfy the following inequality

𝐰𝐰k𝐇k2𝐰𝐰k+1𝐇k2+𝐰k𝐰¯k𝐆k2,𝐰Ω,\|\mathbf{w}^{*}-\mathbf{w}^{k}\|_{\mathbf{H}_{k}}^{2}\geq\|\mathbf{w}^{*}-\mathbf{w}^{k+1}\|_{\mathbf{H}_{k}}^{2}+\|\mathbf{w}^{k}-\bar{\mathbf{w}}^{k}\|_{\mathbf{G}_{k}}^{2},\quad\ \mathbf{w}^{*}\in\Omega^{*}, (3.42)

where Ω\Omega^{*} is the set of optimal solutions.

Proof.

Setting \mathbf{w}=\mathbf{w}^{*}, the inequality (3.38) can be reformulated as

\|\mathbf{w}^{*}-\mathbf{w}^{k}\|_{\mathbf{H}_{k}}^{2}-\|\mathbf{w}^{*}-\mathbf{w}^{k+1}\|_{\mathbf{H}_{k}}^{2}-\|\mathbf{w}^{k}-\bar{\mathbf{w}}^{k}\|_{\mathbf{G}_{k}}^{2}\geq 2\left\{\rho[f(\bar{\mathbf{x}}^{k})-f(\mathbf{x}^{*})]+(\bar{\mathbf{w}}^{k}-\mathbf{w}^{*})^{\top}\frac{1}{\eta_{k}}\boldsymbol{\Gamma}(\bar{\mathbf{w}}^{k})\right\}.

By Lemma 2.2, the monotone operator satisfies (\bar{\mathbf{w}}^{k}-\mathbf{w}^{*})^{\top}\boldsymbol{\Gamma}(\bar{\mathbf{w}}^{k})\geq(\bar{\mathbf{w}}^{k}-\mathbf{w}^{*})^{\top}\boldsymbol{\Gamma}(\mathbf{w}^{*}). Then, using the optimality inequality (2.4), we have

\|\mathbf{w}^{*}-\mathbf{w}^{k}\|_{\mathbf{H}_{k}}^{2}-\|\mathbf{w}^{*}-\mathbf{w}^{k+1}\|_{\mathbf{H}_{k}}^{2}-\|\mathbf{w}^{k}-\bar{\mathbf{w}}^{k}\|_{\mathbf{G}_{k}}^{2}\geq 2\left\{\rho[f(\bar{\mathbf{x}}^{k})-f(\mathbf{x}^{*})]+(\bar{\mathbf{w}}^{k}-\mathbf{w}^{*})^{\top}\frac{1}{\eta_{k}}\boldsymbol{\Gamma}(\mathbf{w}^{*})\right\}\geq 0.

Thus, this theorem holds. ∎

Remark 3.2.

The above theorem establishes a fundamental inequality for the sequences \{\mathbf{w}^{k},\bar{\mathbf{w}}^{k},\mathbf{w}^{k+1}\} generated by the Spice method. The inequality shows that the distance between the iterate and any optimal solution \mathbf{w}^{*}, measured in the \mathbf{H}_{k}-norm, decreases from one iteration to the next. Specifically, this decrease is driven by the residual term \|\mathbf{w}^{k}-\bar{\mathbf{w}}^{k}\|_{\mathbf{G}_{k}}^{2}, which quantifies the gap between the current iterate \mathbf{w}^{k} and the predicted point \bar{\mathbf{w}}^{k}, ensuring that the iterates progressively approach the set of optimal solutions \Omega^{*} under the given convergence condition. Hence, the theorem guarantees the convergence of the Spice method as long as the matrix \mathbf{M}_{k} satisfies condition (3.37).

Lemma 3.3.

Let the sequence \{\mathbf{w}^{k}\} be generated by the Spice method. If the parameters \{r_{k},s_{k},\eta_{k}\} satisfy (3.27) and (3.28), then for any k\geq 1 the matrix \mathbf{H}_{k} is diagonal and the sequence \{\mathbf{H}_{k}\} is monotonically non-increasing. We have \mathbf{H}_{k}\preceq\mathbf{H}_{k-1} and

\|\mathbf{w}^{k+1}-\mathbf{w}^{*}\|_{\mathbf{H}_{k+1}}^{2}\leq\|\mathbf{w}^{0}-\mathbf{w}^{*}\|_{\mathbf{H}_{0}}^{2}.\qquad(3.43)
Proof.

Combining \mathbf{H}_{k+1}\preceq\mathbf{H}_{k} with Theorem 3.1, we have:

\|\mathbf{w}^{k+1}-\mathbf{w}^{*}\|_{\mathbf{H}_{k+1}}^{2}\leq\|\mathbf{w}^{k+1}-\mathbf{w}^{*}\|_{\mathbf{H}_{k}}^{2}\leq\|\mathbf{w}^{k}-\mathbf{w}^{*}\|_{\mathbf{H}_{k}}^{2}-\|\mathbf{w}^{k}-\bar{\mathbf{w}}^{k}\|_{\mathbf{G}_{k}}^{2}\leq\|\mathbf{w}^{k}-\mathbf{w}^{*}\|_{\mathbf{H}_{k}}^{2}\leq\cdots\leq\|\mathbf{w}^{0}-\mathbf{w}^{*}\|_{\mathbf{H}_{0}}^{2}.

Thus, this lemma holds. ∎

Theorem 3.2.

Let \{\mathbf{w}^{k},\bar{\mathbf{w}}^{k}\} be the sequences generated by the Spice method. For the given optimal solution \mathbf{w}^{*}, the following limits hold

\lim_{k\to\infty}\|\mathbf{w}^{k}-\bar{\mathbf{w}}^{k}\|^{2}=0\quad{\rm and}\quad\lim_{k\to\infty}\mathbf{w}^{k}=\mathbf{w}^{*}.\qquad(3.44)
Proof.

We begin by analyzing the following inequality:

\|\mathbf{w}^{k+1}-\mathbf{w}^{*}\|_{\mathbf{H}_{k+1}}^{2}\leq\|\mathbf{w}^{k+1}-\mathbf{w}^{*}\|_{\mathbf{H}_{k}}^{2}\leq\|\mathbf{w}^{k}-\mathbf{w}^{*}\|_{\mathbf{H}_{k}}^{2}-\|\mathbf{w}^{k}-\bar{\mathbf{w}}^{k}\|_{\mathbf{G}_{k}}^{2}\leq\|\mathbf{w}^{0}-\mathbf{w}^{*}\|_{\mathbf{H}_{0}}^{2}-\sum_{t=0}^{k}\|\mathbf{w}^{t}-\bar{\mathbf{w}}^{t}\|_{\mathbf{G}_{t}}^{2}.

This implies:

\sum_{t=0}^{k}\|\mathbf{w}^{t}-\bar{\mathbf{w}}^{t}\|_{\mathbf{G}_{t}}^{2}\leq\|\mathbf{w}^{0}-\mathbf{w}^{*}\|_{\mathbf{H}_{0}}^{2}-\|\mathbf{w}^{k+1}-\mathbf{w}^{*}\|_{\mathbf{H}_{k+1}}^{2}.

Summing over all iterations, we find:

\sum_{k=0}^{\infty}\|\mathbf{w}^{k}-\bar{\mathbf{w}}^{k}\|_{\mathbf{G}_{k}}^{2}\leq\|\mathbf{w}^{0}-\mathbf{w}^{*}\|_{\mathbf{H}_{0}}^{2}.

Therefore, as kk\to\infty:

limk𝐰k𝐰¯k𝐆k2=0.\lim_{k\to\infty}\|\mathbf{w}^{k}-\bar{\mathbf{w}}^{k}\|_{\mathbf{G}_{k}}^{2}=0. (3.45)

Since 𝐆k\mathbf{G}_{k} depends on the values of {rk,sk}\{r_{k},s_{k}\}, equation (3.45) further implies:

limk𝐰k𝐰¯k2=0.\lim_{k\to\infty}\|\mathbf{w}^{k}-\bar{\mathbf{w}}^{k}\|^{2}=0.

Finally, since \|\mathbf{w}^{k}-\bar{\mathbf{w}}^{k}\|\to 0, letting k\to\infty in inequality (3.2) leads to:

\bar{\mathbf{w}}^{k}\in\Omega,\quad\rho[f(\mathbf{x})-f(\bar{\mathbf{x}}^{k})]+(\mathbf{w}-\bar{\mathbf{w}}^{k})^{\top}\frac{1}{\eta_{k}}\boldsymbol{\Gamma}(\bar{\mathbf{w}}^{k})\geq 0,\quad\forall\ \mathbf{w}\in\Omega,

so that, in the limit, \bar{\mathbf{w}}^{k} satisfies the optimality condition. Consequently, we obtain:

\lim_{k\to\infty}\mathbf{w}^{k}=\lim_{k\to\infty}\bar{\mathbf{w}}^{k}=\mathbf{w}^{*}.

Thus, this theorem is proved. ∎

Remark 3.3.

This theorem strengthens the convergence properties of the Spice method. It asserts that the sequences \{\mathbf{w}^{k},\bar{\mathbf{w}}^{k}\} not only satisfy the diminishing residual condition \|\mathbf{w}^{k}-\bar{\mathbf{w}}^{k}\|^{2}\to 0 as k\to\infty, but also ensure that the iterates \mathbf{w}^{k} themselves converge to an optimal solution \mathbf{w}^{*} in the limit. This implies that the discrepancy between the predictor and corrector steps vanishes asymptotically, guaranteeing that the algorithm progressively stabilizes and converges to an optimal solution. The theorem provides a rigorous foundation for the global convergence of the Spice method.

3.4 Ergodic Convergence Analysis

To analyze the ergodic convergence, we require an alternative characterization of the solution set of the variational inequality (2.1). This characterization is provided in the following lemma:

Lemma 3.4.

The solution set of the variational inequality (2.1) can be characterized as

\Omega^{*}=\underset{\mathbf{w}\in\Omega}{\bigcap}\left\{\bar{\mathbf{w}}\in\Omega:\rho[\vartheta(\mathbf{u})-\vartheta(\bar{\mathbf{u}})]+(\mathbf{w}-\bar{\mathbf{w}})^{\top}\frac{1}{\eta}\boldsymbol{\Gamma}(\mathbf{w})\geq 0\right\},\qquad(3.46)

where, for problem P0, \vartheta(\mathbf{u})=f(\mathbf{x}) and \mathbf{u}=\mathbf{x}. The proof can be found in [17] (Lemma 3.3). By Lemma 2.2, for any \mathbf{w}\in\Omega, the monotone operator satisfies:

(𝐰𝐰)𝚪(𝐰)(𝐰𝐰)𝚪(𝐰).(\mathbf{w}-\mathbf{w}^{*})^{\top}\boldsymbol{\Gamma}(\mathbf{w})\geq(\mathbf{w}-\mathbf{w}^{*})^{\top}\boldsymbol{\Gamma}(\mathbf{w}^{*}). (3.47)

Referring to (3.47), the variational inequality (2.1) can be rewritten as

𝐰Ω,ρ[ϑ(𝐮)ϑ(𝐮)]+(𝐰𝐰)1η𝚪(𝐰)0,𝐰Ω.\mathbf{w}^{*}\in\Omega,\quad\rho[\vartheta(\mathbf{u})-\vartheta(\mathbf{u}^{*})]+(\mathbf{w}-\mathbf{w}^{*})^{\top}\frac{1}{\eta}\boldsymbol{\Gamma}(\mathbf{w})\geq 0,\quad\forall\ \mathbf{w}\in\Omega. (3.48)

For a given ϵ>0\epsilon>0, 𝐰¯Ω\bar{\mathbf{w}}\in\Omega is considered an approximate solution to the optimal solution 𝐰\mathbf{w}^{*} if it satisfies the following inequality:

ρ[ϑ(𝐮)ϑ(𝐮¯)]+(𝐰𝐰¯)1η𝚪(𝐰)ϵ,𝐰𝖭ϵ(𝐰¯),\rho[\vartheta(\mathbf{u})-\vartheta(\bar{\mathbf{u}})]+(\mathbf{w}-\bar{\mathbf{w}})^{\top}\frac{1}{\eta}\boldsymbol{\Gamma}(\mathbf{w})\geq-\epsilon,\quad\forall\ \mathbf{w}\in\mathsf{N}_{\epsilon}(\bar{\mathbf{w}}), (3.49)

where 𝖭ϵ(𝐰¯)={𝐰𝐰𝐰¯ϵ}\mathsf{N}_{\epsilon}(\bar{\mathbf{w}})=\{\mathbf{w}\mid\|\mathbf{w}-\bar{\mathbf{w}}\|\leq\epsilon\} is the ϵ\epsilon-neighborhood of 𝐰¯\bar{\mathbf{w}}. For a given ϵ>0\epsilon>0, after tt iterations, this provides 𝐰¯Ω\bar{\mathbf{w}}\in\Omega such that:

sup𝐰𝖭ϵ(𝐰¯){ρ[ϑ(𝐮¯)ϑ(𝐮)]+(𝐰¯𝐰)1η𝚪(𝐰)}ϵ.\sup_{\mathbf{w}\in\mathsf{N}_{\epsilon}(\bar{\mathbf{w}})}\left\{\rho[\vartheta(\bar{\mathbf{u}})-\vartheta(\mathbf{u})]+(\bar{\mathbf{w}}-\mathbf{w})^{\top}\frac{1}{\eta}\boldsymbol{\Gamma}(\mathbf{w})\right\}\leq\epsilon. (3.50)

Referring to Lemma 2.2, the inequality (3.38) can be reformulated as

ρ[f(𝐱)f(𝐱¯k)]+(𝐰𝐰¯k)1ηk𝚪(𝐰)+12𝐰𝐰k𝐇k212𝐰𝐰k+1𝐇k2,𝐰Ω.\rho\left[f(\mathbf{x})-f(\bar{\mathbf{x}}^{k})\right]+(\mathbf{w}-\bar{\mathbf{w}}^{k})^{\top}\frac{1}{\eta_{k}}\boldsymbol{\Gamma}(\mathbf{w})+\frac{1}{2}\|\mathbf{w}-\mathbf{w}^{k}\|_{\mathbf{H}_{k}}^{2}\geq\frac{1}{2}\|\mathbf{w}-\mathbf{w}^{k+1}\|_{\mathbf{H}_{k}}^{2},\quad\forall\ \mathbf{w}\in\Omega. (3.51)

Note that the above inequality holds only for 𝐆k0\mathbf{G}_{k}\succ 0.

Theorem 3.3.

Let \{\bar{\mathbf{x}}^{k}\} be the sequence generated by the Spice method for problem P0, with \eta_{k} satisfying condition (3.28). The ergodic variables \bar{\mathbf{x}}_{t}, \eta_{t} and \bar{\mathbf{w}}_{t} are defined as follows

\bar{\mathbf{x}}_{t}=\frac{1}{t+1}\sum_{k=0}^{t}\bar{\mathbf{x}}^{k},\quad\eta_{t}=\frac{t+1}{\sum_{k=0}^{t}\frac{1}{\eta_{k}}},\quad\bar{\mathbf{w}}_{t}=\frac{\sum_{k=0}^{t}\frac{\bar{\mathbf{w}}^{k}}{\eta_{k}}}{\sum_{k=0}^{t}\frac{1}{\eta_{k}}}.

If Lemma 3.2 holds, then for any iteration t>0 and scaling factors \rho(t)>0 and \eta_{k}>0, the error bound satisfies

\mathbf{w}^{*}\in\Omega,\quad f(\bar{\mathbf{x}}_{t})-f(\mathbf{x}^{*})+(\bar{\mathbf{w}}_{t}-\mathbf{w}^{*})^{\top}\frac{1}{\eta_{t}\rho(t)}\boldsymbol{\Gamma}(\mathbf{w}^{*})\leq\frac{1}{2\eta_{0}\rho(t)(t+1)}\|\mathbf{w}^{*}-\mathbf{w}^{0}\|_{\mathsf{H}_{0}}^{2},\quad\forall\ \bar{\mathbf{w}}_{t}\in\Omega.\qquad(3.52)
Proof.

Since the sequence \{\bar{\mathbf{w}}^{k},k=0,1,\cdots,t\} belongs to the convex set \Omega, its convex combination \bar{\mathbf{w}}_{t} also belongs to \Omega. Summing inequality (3.51) over all iterations, for any \mathbf{w}\in\Omega, we obtain:

\rho(t)(t+1)f(\mathbf{x})-\rho(t)\sum_{k=0}^{t}f(\bar{\mathbf{x}}^{k})+\sum_{k=0}^{t}\frac{1}{\eta_{k}}\left(\mathbf{w}-\bar{\mathbf{w}}^{k}\right)^{\top}\boldsymbol{\Gamma}(\mathbf{w})+\frac{1}{2}\sum_{k=0}^{t}\|\mathbf{w}-\mathbf{w}^{k}\|_{\mathbf{H}_{k}}^{2}\geq\frac{1}{2}\sum_{k=0}^{t}\|\mathbf{w}-\mathbf{w}^{k+1}\|_{\mathbf{H}_{k}}^{2}.

Dividing the above inequality by t+1 gives

\rho(t)f(\mathbf{x})-\frac{\rho(t)}{t+1}\sum_{k=0}^{t}f(\bar{\mathbf{x}}^{k})+\left(\mathbf{w}-\bar{\mathbf{w}}_{t}\right)^{\top}\frac{1}{\eta_{t}}\boldsymbol{\Gamma}(\mathbf{w})+\frac{1}{2(t+1)}\sum_{k=0}^{t}\|\mathbf{w}-\mathbf{w}^{k}\|_{\mathbf{H}_{k}}^{2}\geq\frac{1}{2(t+1)}\sum_{k=0}^{t}\|\mathbf{w}-\mathbf{w}^{k+1}\|_{\mathbf{H}_{k}}^{2}.

Since f(\mathbf{x}) is convex, the inequality f(\bar{\mathbf{x}}_{t})\leq\frac{1}{t+1}\sum_{k=0}^{t}f(\bar{\mathbf{x}}^{k}) holds, and therefore

\rho(t)[f(\mathbf{x})-f(\bar{\mathbf{x}}_{t})]+\left(\mathbf{w}-\bar{\mathbf{w}}_{t}\right)^{\top}\frac{1}{\eta_{t}}\boldsymbol{\Gamma}(\mathbf{w})+\frac{1}{2(t+1)}\sum_{k=0}^{t}\big{[}\|\mathbf{w}-\mathbf{w}^{k}\|_{\mathbf{H}_{k}}^{2}-\|\mathbf{w}-\mathbf{w}^{k+1}\|_{\mathbf{H}_{k}}^{2}\big{]}\geq 0.

Setting 𝐰=𝐰\mathbf{w}=\mathbf{w}^{*}, the above inequality can be expressed as

\rho(t)[f(\bar{\mathbf{x}}_{t})-f(\mathbf{x}^{*})]+\left(\bar{\mathbf{w}}_{t}-\mathbf{w}^{*}\right)^{\top}\frac{1}{\eta_{t}}\boldsymbol{\Gamma}(\mathbf{w}^{*})\leq\frac{1}{2(t+1)}\sum_{k=0}^{t}\left[\|\mathbf{w}^{*}-\mathbf{w}^{k}\|_{\mathbf{H}_{k}}^{2}-\|\mathbf{w}^{*}-\mathbf{w}^{k+1}\|_{\mathbf{H}_{k}}^{2}\right]\leq\frac{1}{2(t+1)}\left[\|\mathbf{w}^{*}-\mathbf{w}^{0}\|_{\mathbf{H}_{0}}^{2}-\sum_{k=0}^{t-1}\|\mathbf{w}^{*}-\mathbf{w}^{k+1}\|_{\mathbf{D}_{k}}^{2}\right].\qquad(3.53)

Since \mathbf{D}_{k}\succeq 0\ (k=0,\cdots,t-1) and \mathbf{H}_{0}=\frac{1}{\eta_{0}}\mathsf{H}_{0}, the inequality (3.53) can be further rewritten as

f(\bar{\mathbf{x}}_{t})-f(\mathbf{x}^{*})+\left(\bar{\mathbf{w}}_{t}-\mathbf{w}^{*}\right)^{\top}\frac{1}{\eta_{t}\rho(t)}\boldsymbol{\Gamma}(\mathbf{w}^{*})\leq\frac{1}{2\eta_{0}\rho(t)(t+1)}\|\mathbf{w}^{*}-\mathbf{w}^{0}\|_{\mathsf{H}_{0}}^{2}.

Thus, this theorem holds. ∎
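In an implementation, the ergodic quantities used in Theorem 3.3 can be accumulated directly from the histories produced by Algorithm 1; the helper below is a sketch under the assumption that the predicted iterates and scaling factors have been stored as lists.

import numpy as np

def ergodic_averages(x_bars, w_bars, etas):
    # x_bars: list of xbar^k, w_bars: list of wbar^k, etas: list of eta_k (k = 0..t).
    inv = np.array([1.0 / e for e in etas])
    x_bar_t = sum(x_bars) / len(x_bars)                           # (1/(t+1)) * sum of xbar^k
    eta_t = len(etas) / inv.sum()                                 # (t+1) / sum of 1/eta_k
    w_bar_t = sum(w / e for w, e in zip(w_bars, etas)) / inv.sum()
    return x_bar_t, eta_t, w_bar_t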

Remark 3.4.

The above theorem shows that the Spice method achieves an ergodic convergence rate of \mathcal{O}(1/[\rho(t)(t+1)]). The bound (3.52) provides an upper estimate of the error, which is determined by the scaling factor \rho(t), the iteration number t, and the initial conditions of the variables. This highlights how different choices of \rho(t) directly affect the convergence behavior of the method.

Corollary 3.1.

For any iteration t>0, the choice of the scaling factor \rho(t) allows flexible control over the convergence rate:

1. For any \alpha>0, setting \rho(t)=(t+1)^{\alpha} results in an ergodic power-law convergence rate of \mathcal{O}(1/(t+1)^{1+\alpha}). This demonstrates that increasing \alpha enhances the convergence rate, leading to faster decay of the error.

2. For any \beta>0, setting \rho(t)=e^{\beta t} results in an ergodic exponential convergence rate of \mathcal{O}(1/[e^{\beta t}(t+1)]). This choice accelerates convergence even further, as the error decays exponentially with the number of iterations.

3. Setting \rho(t)=(t+1)^{t} yields an ergodic power-exponential convergence rate of \mathcal{O}(1/(t+1)^{t+1}). This setting combines both power-law and exponential decay, offering a convergence rate that rapidly diminishes the error as t increases.

As \rho(t)\rightarrow\infty, the error term f(\bar{\mathbf{x}}_{t})-f(\mathbf{x}^{*})\rightarrow 0_{+}, ensuring the Spice method's asymptotic convergence to the optimal solution. The effect of these schedules on the error bound is illustrated in the sketch below.
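The prefactor 1/[\rho(t)(t+1)] of the bound (3.52) can be tabulated for the three schedules with a few lines of code; the values of \alpha and \beta below are arbitrary illustrative choices.

import numpy as np

schedules = {
    "power, rho(t) = (t+1)^alpha": lambda t, a=1.0: (t + 1.0) ** a,
    "exponential, rho(t) = exp(beta*t)": lambda t, b=1.0: np.exp(b * t),
    "power-exponential, rho(t) = (t+1)^t": lambda t: (t + 1.0) ** t,
}
for name, rho in schedules.items():
    prefactors = [1.0 / (rho(t) * (t + 1)) for t in (1, 5, 10)]
    print(name, ["%.2e" % v for v in prefactors])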

3.5 Solving Convex Problems with Equality Constraints

For a single variable, convex problems with both inequality and equality constraints can be written as:

\min\left\{f(\mathbf{x})\mid\phi_{i}(\mathbf{x})\leq 0,\ \mathbf{A}\mathbf{x}=\mathbf{b},\ \mathbf{x}\in\mathcal{X},\ i=1,\cdots,p_{1}\right\},

where \mathbf{A}\in\mathbb{R}^{p_{2}\times n}, \mathbf{b}\in\mathbb{R}^{p_{2}} (p_{1}+p_{2}=p), and \Phi(\mathbf{x})=[\phi_{1}(\mathbf{x}),\cdots,\phi_{p_{1}}(\mathbf{x}),\phi_{p_{1}+1}(\mathbf{x}),\cdots,\phi_{p}(\mathbf{x})]^{\top}, with [\phi_{p_{1}+1}(\mathbf{x}),\cdots,\phi_{p}(\mathbf{x})]^{\top}=\mathbf{A}\mathbf{x}-\mathbf{b}. The domain of the dual variable \boldsymbol{\lambda} becomes \mathcal{Z}:=\mathbb{R}^{p_{1}}_{+}\times\mathbb{R}^{p_{2}}.
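In code, the equality-constrained form fits the same framework by stacking the affine residual under the inequality constraints; the sketch below assumes phi_ineq and jac_phi_ineq are callables for the p_1 inequality constraints. The only other change is that the dual projection acts on \mathbb{R}^{p_{1}}_{+}\times\mathbb{R}^{p_{2}}, i.e., only the first p_1 multipliers are clipped at zero.

import numpy as np

def stacked_constraints(phi_ineq, jac_phi_ineq, A, b):
    # Phi(x) = [phi_1(x), ..., phi_{p1}(x), A x - b] and its Jacobian.
    phi = lambda x: np.concatenate([phi_ineq(x), A @ x - b])
    jac = lambda x: np.vstack([jac_phi_ineq(x), A])
    return phi, jac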

4 Nonlinear Convex Problems with Separable Variables

In this section, the Spice method is extended to address the following separable convex optimization problem, which includes nonlinear inequality constraints:

𝖯2:min{f(𝐱)+g(𝐲)|ϕi(𝐱)+ψi(𝐲)0,𝐱𝒳,𝐲𝒴,i=1,,p},\displaystyle\mathsf{P}2:\quad\min\left\{f(\mathbf{x})+g(\mathbf{y})\ \middle|\ \phi_{i}(\mathbf{x})+\psi_{i}(\mathbf{y})\leq 0,\ \mathbf{x}\in\mathcal{X},\ \mathbf{y}\in\mathcal{Y},\ i=1,\cdots,p\right\}, (4.1)

where the two convex sets \mathcal{X}\subset\mathbb{R}^{n} and \mathcal{Y}\subset\mathbb{R}^{m} are nonempty and closed. The two convex objective functions f(\mathbf{x}):\mathbb{R}^{n}\rightarrow\mathbb{R} and g(\mathbf{y}):\mathbb{R}^{m}\rightarrow\mathbb{R} are proper and closed. The constraint functions \phi_{i}:\mathbb{R}^{n}\rightarrow\mathbb{R} and \psi_{i}:\mathbb{R}^{m}\rightarrow\mathbb{R},\ i=1,\ldots,p, are convex and continuously differentiable. The Lagrangian function associated with P2 is characterized as:

(𝐱,𝐲,𝝀)=f(𝐱)+g(𝐲)+𝝀[Φ(𝐱)+Ψ(𝐲)],\mathcal{L}(\mathbf{x},\mathbf{y},\boldsymbol{\lambda})=f(\mathbf{x})+g(\mathbf{y})+\boldsymbol{\lambda}^{\top}[\Phi(\mathbf{x})+\Psi(\mathbf{y})], (4.2)

where \Phi(\mathbf{x})=[\phi_{1}(\mathbf{x}),\ldots,\phi_{p}(\mathbf{x})]^{\top} and \Psi(\mathbf{y})=[\psi_{1}(\mathbf{y}),\cdots,\psi_{p}(\mathbf{y})]^{\top}. The dual variable \boldsymbol{\lambda} belongs to the set \mathcal{Z}:=\mathbb{R}^{p}_{+}. In fact, the constrained convex problem can be interpreted as a specific instance of a saddle point problem:

min𝐱,𝐲max𝝀{(𝐱,𝐲,𝝀)𝐱𝒳,𝐲𝒴,𝝀𝒵}.\min_{\mathbf{x},\mathbf{y}}\max_{\boldsymbol{\lambda}}\left\{\mathcal{L}(\mathbf{x},\mathbf{y},\boldsymbol{\lambda})\mid\mathbf{x}\in\mathcal{X},\mathbf{y}\in\mathcal{Y},\boldsymbol{\lambda}\in\mathcal{Z}\right\}. (4.3)

By Lemma 2.1, for the saddle point, the following variational inequality holds:

\mathbf{x}^{*}\in\mathcal{X},\quad f(\mathbf{x})-f(\mathbf{x}^{*})+(\mathbf{x}-\mathbf{x}^{*})^{\top}\mathcal{D}\Phi(\mathbf{x}^{*})^{\top}\boldsymbol{\lambda}^{*}\geq 0,\quad\forall\ \mathbf{x}\in\mathcal{X},
\mathbf{y}^{*}\in\mathcal{Y},\quad g(\mathbf{y})-g(\mathbf{y}^{*})+(\mathbf{y}-\mathbf{y}^{*})^{\top}\mathcal{D}\Psi(\mathbf{y}^{*})^{\top}\boldsymbol{\lambda}^{*}\geq 0,\quad\forall\ \mathbf{y}\in\mathcal{Y},
\boldsymbol{\lambda}^{*}\in\mathcal{Z},\quad(\boldsymbol{\lambda}-\boldsymbol{\lambda}^{*})^{\top}[-\Phi(\mathbf{x}^{*})-\Psi(\mathbf{y}^{*})]\geq 0,\quad\forall\ \boldsymbol{\lambda}\in\mathcal{Z},

where \mathcal{D}\Phi(\mathbf{x})=[\nabla\phi_{1},\cdots,\nabla\phi_{p}]^{\top}\in\mathbb{R}^{p\times n} and \mathcal{D}\Psi(\mathbf{y})=[\nabla\psi_{1},\cdots,\nabla\psi_{p}]^{\top}\in\mathbb{R}^{p\times m}. The above inequalities can be further characterized as a monotone variational inequality:

𝐰Ω,ϑ(𝐮)ϑ(𝐮)+(𝐰𝐰)𝚪(𝐰)0,𝐰Ω,\mathbf{w}^{*}\in\Omega,\quad\vartheta(\mathbf{u})-\vartheta(\mathbf{u}^{*})+(\mathbf{w}-\mathbf{w}^{*})^{\top}\boldsymbol{\Gamma}(\mathbf{w}^{*})\geq 0,\quad\forall\ \mathbf{w}\in\Omega, (4.4)

where ϑ(𝐮)=f(𝐱)+g(𝐲)\vartheta(\mathbf{u})=f(\mathbf{x})+g(\mathbf{y}),

𝐮=(𝐱𝐲),𝐰=(𝐱𝐲𝝀),𝚪(𝐰)=(𝒟Φ(𝐱)𝝀𝒟Ψ(𝐲)𝝀Φ(𝐱)Ψ(𝐲)), and Ω=𝒳×𝒴×𝒵.\mathbf{u}=\left(\begin{array}[]{c}\mathbf{x}\\[5.69046pt] \mathbf{y}\end{array}\right),\quad\mathbf{w}=\left(\begin{array}[]{c}\mathbf{x}\\[5.69046pt] \mathbf{y}\\[5.69046pt] \boldsymbol{\lambda}\end{array}\right),\quad\boldsymbol{\Gamma}(\mathbf{w})=\left(\begin{array}[]{c}\mathcal{D}\Phi(\mathbf{x})^{\top}\boldsymbol{\lambda}\\[5.69046pt] \mathcal{D}\Psi(\mathbf{y})^{\top}\boldsymbol{\lambda}\\[5.69046pt] -\Phi(\mathbf{x})-\Psi(\mathbf{y})\end{array}\right),\quad\mbox{ and }\quad\Omega=\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}. (4.5)
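To make the construction in (4.5) concrete, the following Python sketch assembles $\mathbf{u}$, $\mathbf{w}$, and $\boldsymbol{\Gamma}(\mathbf{w})$ for quadratic constraint maps of the type used in Section 5. The data, dimensions, and function names are illustrative placeholders rather than the paper's test instances.

import numpy as np

rng = np.random.default_rng(0)
n, m, p, q = 5, 4, 3, 6
W = rng.standard_normal((p, q, n)); a = rng.standard_normal((p, q))
V = rng.standard_normal((p, q, m)); c = rng.standard_normal((p, q))

def Phi(x):               # Phi(x) in R^p, phi_i(x) = ||W_i x - a_i||^2
    return np.array([np.sum((W[i] @ x - a[i]) ** 2) for i in range(p)])

def Psi(y):               # Psi(y) in R^p, psi_i(y) = ||V_i y - c_i||^2
    return np.array([np.sum((V[i] @ y - c[i]) ** 2) for i in range(p)])

def DPhi(x):              # Jacobian, p x n, rows are grad phi_i(x)
    return np.stack([2.0 * W[i].T @ (W[i] @ x - a[i]) for i in range(p)])

def DPsi(y):              # Jacobian, p x m, rows are grad psi_i(y)
    return np.stack([2.0 * V[i].T @ (V[i] @ y - c[i]) for i in range(p)])

def Gamma(x, y, lam):
    """Stacked operator (DPhi(x)^T lam, DPsi(y)^T lam, -Phi(x)-Psi(y)) from (4.5)."""
    return np.concatenate([DPhi(x).T @ lam, DPsi(y).T @ lam, -Phi(x) - Psi(y)])

w = (rng.standard_normal(n), rng.standard_normal(m), rng.random(p))  # lam >= 0
print(Gamma(*w).shape)    # (n + m + p,)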

Then, the following lemma shows that the operator 𝚪\boldsymbol{\Gamma} is monotone.

Lemma 4.1.

Let 𝒳n\mathcal{X}\subset\mathbb{R}^{n}, 𝒴m\mathcal{Y}\subset\mathbb{R}^{m}, 𝒵:=+p\mathcal{Z}:=\mathbb{R}^{p}_{+} be closed convex sets. Then the operator 𝚪\boldsymbol{\Gamma} defined in (4.5) satisfies

(𝐰𝐰¯)[𝚪(𝐰)𝚪(𝐰¯)]0,𝐰,𝐰¯𝒳×𝒴×𝒵.(\mathbf{w}-\bar{\mathbf{w}})^{\top}[\boldsymbol{\Gamma}(\mathbf{w})-\boldsymbol{\Gamma}(\bar{\mathbf{w}})]\geq 0,\quad\forall\ \mathbf{w},\bar{\mathbf{w}}\in\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}. (4.6)
Proof.

Recall that $\phi_{i}$ and $\psi_{i}\;(i=1,\ldots,p)$ are assumed to be convex and differentiable on $\mathcal{X}$ and $\mathcal{Y}$, respectively. Then for any $\mathbf{x},\;\bar{\mathbf{x}}\in\mathcal{X}$ and $\mathbf{y},\;\bar{\mathbf{y}}\in\mathcal{Y}$, we have

ϕi(𝐱)ϕi(𝐱¯)+ϕi(𝐱¯),𝐱𝐱¯,i=1,,p,\phi_{i}(\mathbf{x})\geq\phi_{i}(\bar{\mathbf{x}})+\langle\nabla\phi_{i}(\bar{\mathbf{x}}),\mathbf{x}-\bar{\mathbf{x}}\rangle,\quad i=1,\ldots,p,

and

ψi(𝐲)ψi(𝐲¯)+ψi(𝐲¯),𝐲𝐲¯,i=1,,p,\psi_{i}(\mathbf{y})\geq\psi_{i}(\bar{\mathbf{y}})+\langle\nabla\psi_{i}(\bar{\mathbf{y}}),\mathbf{y}-\bar{\mathbf{y}}\rangle,\quad i=1,\ldots,p,

It implies that

\theta_{i}^{\prime}(\mathbf{x}\mid\bar{\mathbf{x}}):=\phi_{i}(\mathbf{x})-\phi_{i}(\bar{\mathbf{x}})-\langle\nabla\phi_{i}(\bar{\mathbf{x}}),\mathbf{x}-\bar{\mathbf{x}}\rangle\geq 0,\quad i=1,\ldots,p,

and

\theta_{i}^{\prime\prime}(\mathbf{y}\mid\bar{\mathbf{y}}):=\psi_{i}(\mathbf{y})-\psi_{i}(\bar{\mathbf{y}})-\langle\nabla\psi_{i}(\bar{\mathbf{y}}),\mathbf{y}-\bar{\mathbf{y}}\rangle\geq 0,\quad i=1,\ldots,p.

In the same way, for any i{1,,p}i\in\{1,\ldots,p\}, we obtain θi(𝐱¯𝐱)0\theta_{i}^{\prime}(\bar{\mathbf{x}}\mid\mathbf{x})\geq 0 and θi′′(𝐲¯𝐲)0\theta_{i}^{\prime\prime}(\bar{\mathbf{y}}\mid\mathbf{y})\geq 0. Let 𝚯:=[θ1,,θp]\boldsymbol{\Theta}^{\prime}:=[\theta_{1}^{\prime},\cdots,\theta_{p}^{\prime}]^{\top} and 𝚯′′:=[θ1′′,,θp′′]\boldsymbol{\Theta}^{\prime\prime}:=[\theta_{1}^{\prime\prime},\cdots,\theta_{p}^{\prime\prime}]^{\top}. Since 𝝀,𝝀¯𝒵\boldsymbol{\lambda},\bar{\boldsymbol{\lambda}}\in\mathcal{Z}, 𝚯(𝐱𝐱¯),𝚯(𝐱¯𝐱),𝚯′′(𝐲𝐲¯),𝚯′′(𝐲¯𝐲)+p\boldsymbol{\Theta}^{\prime}(\mathbf{x}\mid\bar{\mathbf{x}}),\boldsymbol{\Theta}^{\prime}(\bar{\mathbf{x}}\mid\mathbf{x}),\boldsymbol{\Theta}^{\prime\prime}(\mathbf{y}\mid\bar{\mathbf{y}}),\boldsymbol{\Theta}^{\prime\prime}(\bar{\mathbf{y}}\mid\mathbf{y})\in\mathbb{R}^{p}_{+}, the following inequality holds.

(\mathbf{w}-\bar{\mathbf{w}})^{\top}[\boldsymbol{\Gamma}(\mathbf{w})-\boldsymbol{\Gamma}(\bar{\mathbf{w}})]=\left(\begin{array}{c}\mathbf{x}-\bar{\mathbf{x}}\\ \mathbf{y}-\bar{\mathbf{y}}\\ \boldsymbol{\lambda}-\bar{\boldsymbol{\lambda}}\end{array}\right)^{\top}\left(\begin{array}{c}\mathcal{D}\Phi(\mathbf{x})^{\top}\boldsymbol{\lambda}-\mathcal{D}\Phi(\bar{\mathbf{x}})^{\top}\bar{\boldsymbol{\lambda}}\\ \mathcal{D}\Psi(\mathbf{y})^{\top}\boldsymbol{\lambda}-\mathcal{D}\Psi(\bar{\mathbf{y}})^{\top}\bar{\boldsymbol{\lambda}}\\ -\Phi(\mathbf{x})+\Phi(\bar{\mathbf{x}})-\Psi(\mathbf{y})+\Psi(\bar{\mathbf{y}})\end{array}\right)
=\displaystyle= 𝝀[Φ(𝐱¯)Φ(𝐱)𝒟Φ(𝐱)(𝐱¯𝐱)]+𝝀¯[Φ(𝐱)Φ(𝐱¯)𝒟Φ(𝐱¯)(𝐱𝐱¯)]\displaystyle\boldsymbol{\lambda}^{\top}[\Phi(\bar{\mathbf{x}})-\Phi(\mathbf{x})-\mathcal{D}\Phi(\mathbf{x})(\bar{\mathbf{x}}-\mathbf{x})]+\bar{\boldsymbol{\lambda}}^{\top}[\Phi(\mathbf{x})-\Phi(\bar{\mathbf{x}})-\mathcal{D}\Phi(\bar{\mathbf{x}})(\mathbf{x}-\bar{\mathbf{x}})]
+𝝀[Ψ(𝐲¯)Ψ(𝐲)𝒟Ψ(𝐲)(𝐲¯𝐲)]+𝝀¯[Ψ(𝐲)Ψ(𝐲¯)𝒟Ψ(𝐲¯)(𝐲𝐲¯)]\displaystyle+\boldsymbol{\lambda}^{\top}[\Psi(\bar{\mathbf{y}})-\Psi(\mathbf{y})-\mathcal{D}\Psi(\mathbf{y})(\bar{\mathbf{y}}-\mathbf{y})]+\bar{\boldsymbol{\lambda}}^{\top}[\Psi(\mathbf{y})-\Psi(\bar{\mathbf{y}})-\mathcal{D}\Psi(\bar{\mathbf{y}})(\mathbf{y}-\bar{\mathbf{y}})]
=\displaystyle= 𝝀𝚯(𝐱¯𝐱)+𝝀¯𝚯(𝐱𝐱¯)+𝝀𝚯′′(𝐲¯𝐲)+𝝀¯𝚯′′(𝐲𝐲¯)0.\displaystyle\boldsymbol{\lambda}^{\top}\boldsymbol{\Theta}^{\prime}(\bar{\mathbf{x}}\mid\mathbf{x})+\bar{\boldsymbol{\lambda}}^{\top}\boldsymbol{\Theta}^{\prime}(\mathbf{x}\mid\bar{\mathbf{x}})+\boldsymbol{\lambda}^{\top}\boldsymbol{\Theta}^{\prime\prime}(\bar{\mathbf{y}}\mid\mathbf{y})+\bar{\boldsymbol{\lambda}}^{\top}\boldsymbol{\Theta}^{\prime\prime}(\mathbf{y}\mid\bar{\mathbf{y}})\geq 0.

Thus, this lemma holds. ∎

4.1 PPA-Like Prediction Scheme

For the prediction scheme, we take the PPA output $\bar{\mathbf{w}}^{k}$ as the predicted variables. Consider the scaling factors $\rho>0$ and $\eta_{k}>0$. The corresponding scaling Lagrangian function for P2 is defined as

(𝐱,𝐲,𝝀,ρ,η)=ρ[f(𝐱)+g(𝐲)]+𝝀1η[Φ(𝐱)+Ψ(𝐲)],\mathcal{L}(\mathbf{x},\mathbf{y},\boldsymbol{\lambda},\rho,\eta)=\rho[f(\mathbf{x})+g(\mathbf{y})]+\boldsymbol{\lambda}^{\top}\frac{1}{\eta}[\Phi(\mathbf{x})+\Psi(\mathbf{y})], (4.7)

For the kk-th iteration, the PPA-like prediction scheme utilizes the current estimates (𝐱k,𝐲k,𝝀k)(\mathbf{x}^{k},\mathbf{y}^{k},\boldsymbol{\lambda}^{k}) to sequentially update the predicted values (𝐱¯k(\bar{\mathbf{x}}^{k}, 𝐲¯k\bar{\mathbf{y}}^{k}, and 𝝀¯k)\bar{\boldsymbol{\lambda}}^{k}) by solving the following subproblems:

𝐱¯k=\displaystyle\bar{\mathbf{x}}^{k}= argmin{(𝐱,𝐲k,𝝀k,ρ,ηk)+rk2𝐱𝐱k2𝐱𝒳},\displaystyle\arg\min\{\mathcal{L}(\mathbf{x},\mathbf{y}^{k},\boldsymbol{\lambda}^{k},\rho,\eta_{k})+\frac{r_{k}}{2}\|\mathbf{x}-\mathbf{x}^{k}\|^{2}\mid\mathbf{x}\in\mathcal{X}\}, (4.8a)
𝐲¯k=\displaystyle\bar{\mathbf{y}}^{k}= argmin{(𝐱¯k,𝐲,𝝀k,ρ,ηk)+rk2𝐲𝐲k2𝐲𝒴},\displaystyle\arg\min\{\mathcal{L}(\bar{\mathbf{x}}^{k},\mathbf{y},\boldsymbol{\lambda}^{k},\rho,\eta_{k})+\frac{r_{k}}{2}\|\mathbf{y}-\mathbf{y}^{k}\|^{2}\mid\mathbf{y}\in\mathcal{Y}\}, (4.8b)
𝝀¯k=\displaystyle\bar{\boldsymbol{\lambda}}^{k}= argmax{(𝐱¯k,𝐲¯k,𝝀,ρ,ηk)sk2𝝀𝝀k2𝝀𝒵}.\displaystyle\arg\max\{\mathcal{L}(\bar{\mathbf{x}}^{k},\bar{\mathbf{y}}^{k},\boldsymbol{\lambda},\rho,\eta_{k})-\frac{s_{k}}{2}\|\boldsymbol{\lambda}-\boldsymbol{\lambda}^{k}\|^{2}\mid\boldsymbol{\lambda}\in\mathcal{Z}\}. (4.8c)
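As a concrete illustration, the sketch below performs one prediction step (4.8) under the simplifying assumption that $\mathcal{X}=\mathbb{R}^{n}$ and $\mathcal{Y}=\mathbb{R}^{m}$, so the $\mathbf{x}$- and $\mathbf{y}$-subproblems can be handed to a generic smooth solver, while the $\boldsymbol{\lambda}$-subproblem is solved in closed form by projection onto $\mathbb{R}^{p}_{+}$ (the same closed form reappears in Section 5). The toy data $f$, $g$, $\Phi$, $\Psi$ and all parameter values are our own choices.

import numpy as np
from scipy.optimize import minimize

n, m, p = 3, 2, 1
f   = lambda x: np.sum((x - 1.0) ** 2)            # convex objective block in x
g   = lambda y: np.sum((y + 1.0) ** 2)            # convex objective block in y
Phi = lambda x: np.array([np.sum(x ** 2) - 1.0])  # one quadratic inequality row
Psi = lambda y: np.array([np.sum(y ** 2) - 1.0])

def predict(xk, yk, lamk, rho, eta, r, s):
    # (4.8a): x-subproblem of the scaled Lagrangian plus proximal term
    Lx = lambda x: rho * f(x) + lamk @ Phi(x) / eta + 0.5 * r * np.sum((x - xk) ** 2)
    x_bar = minimize(Lx, xk).x
    # (4.8b): y-subproblem (terms depending only on x_bar are constants and are dropped)
    Ly = lambda y: rho * g(y) + lamk @ Psi(y) / eta + 0.5 * r * np.sum((y - yk) ** 2)
    y_bar = minimize(Ly, yk).x
    # (4.8c): closed-form dual update, projected onto the nonnegative orthant
    lam_bar = np.maximum(0.0, lamk + (Phi(x_bar) + Psi(y_bar)) / (eta * s))
    return x_bar, y_bar, lam_bar

x_bar, y_bar, lam_bar = predict(np.zeros(n), np.zeros(m), np.ones(p),
                                rho=1.0, eta=1.0, r=5.0, s=5.0)
print(x_bar, y_bar, lam_bar)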

As established in Lemma 2.1, these optimization problems (4.8) can be expressed as the following variational inequalities:

𝐱¯k𝒳,\displaystyle\bar{\mathbf{x}}^{k}\in\mathcal{X},\quad ρ[f(𝐱)f(𝐱¯k)]+(𝐱𝐱¯k)[1ηk𝒟Φ(𝐱¯k)𝝀k+rk(𝐱¯k𝐱k)]0,\displaystyle\rho\left[f(\mathbf{x})-f(\bar{\mathbf{x}}^{k})\right]+(\mathbf{x}-\bar{\mathbf{x}}^{k})^{\top}\left[\frac{1}{\eta_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}\boldsymbol{\lambda}^{k}+r_{k}(\bar{\mathbf{x}}^{k}-\mathbf{x}^{k})\right]\geq 0, 𝐱𝒳,\displaystyle\quad\forall\ \mathbf{x}\in\mathcal{X},
𝐲¯k𝒴,\displaystyle\bar{\mathbf{y}}^{k}\in\mathcal{Y},\quad ρ[g(𝐲)g(𝐲¯k)]+(𝐲𝐲¯k)[1ηk𝒟Ψ(𝐲¯k)𝝀k+rk(𝐲¯k𝐲k)]0,\displaystyle\rho[g(\mathbf{y})-g(\bar{\mathbf{y}}^{k})]+(\mathbf{y}-\bar{\mathbf{y}}^{k})^{\top}\left[\frac{1}{\eta_{k}}\mathcal{D}\Psi(\bar{\mathbf{y}}^{k})^{\top}\boldsymbol{\lambda}^{k}+r_{k}(\bar{\mathbf{y}}^{k}-\mathbf{y}^{k})\right]\geq 0, 𝐲𝒴,\displaystyle\quad\forall\ \mathbf{y}\in\mathcal{Y},
𝝀¯k𝒵,\displaystyle\bar{\boldsymbol{\lambda}}^{k}\in\mathcal{Z},\quad (𝝀𝝀¯k)[1ηk[Φ(𝐱¯k)+Ψ(𝐲¯k)]+sk(𝝀¯k𝝀k)]0,\displaystyle(\boldsymbol{\lambda}-\bar{\boldsymbol{\lambda}}^{k})^{\top}\left[-\frac{1}{\eta_{k}}[\Phi(\bar{\mathbf{x}}^{k})+\Psi(\bar{\mathbf{y}}^{k})]+s_{k}(\bar{\boldsymbol{\lambda}}^{k}-\boldsymbol{\lambda}^{k})\right]\geq 0, 𝝀𝒵.\displaystyle\quad\forall\ \boldsymbol{\lambda}\in\mathcal{Z}.

In light of the optimality conditions, we can further refine these inequalities into the form:

𝐱¯k𝒳,\displaystyle\bar{\mathbf{x}}^{k}\!\in\!\mathcal{X},\ ρ[f(𝐱)f(𝐱¯k)]+(𝐱𝐱¯k)[1ηk𝒟Φ(𝐱¯k)𝝀¯k+rk(𝐱¯k𝐱k)1ηk𝒟Φ(𝐱¯k)(𝝀¯k𝝀k)]0,\displaystyle\rho\!\left[f(\mathbf{x})\!-\!f(\bar{\mathbf{x}}^{k})\right]\!+\!(\mathbf{x}\!-\!\bar{\mathbf{x}}^{k})^{\top}\left[\frac{1}{\eta_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}\bar{\boldsymbol{\lambda}}^{k}\!+\!r_{k}(\bar{\mathbf{x}}^{k}\!-\!\mathbf{x}^{k})\!-\!\frac{1}{\eta_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}(\bar{\boldsymbol{\lambda}}^{k}\!-\!\boldsymbol{\lambda}^{k})\right]\geq 0, 𝐱𝒳,\displaystyle\ \forall\ \mathbf{x}\in\mathcal{X},
𝐲¯k𝒴,\displaystyle\bar{\mathbf{y}}^{k}\!\in\!\mathcal{Y},\ ρ[g(𝐲)g(𝐲¯k)]+(𝐲𝐲¯k)[1ηk𝒟Ψ(𝐲¯k)𝝀¯k+rk(𝐲¯k𝐲k)1ηk𝒟Ψ(𝐲¯k)(𝝀¯k𝝀k)]0,\displaystyle\rho[g(\mathbf{y})\!-\!g(\bar{\mathbf{y}}^{k})]\!+\!(\mathbf{y}\!-\!\bar{\mathbf{y}}^{k})^{\top}\left[\frac{1}{\eta_{k}}\mathcal{D}\Psi(\bar{\mathbf{y}}^{k})^{\top}\bar{\boldsymbol{\lambda}}^{k}\!+\!r_{k}(\bar{\mathbf{y}}^{k}\!-\!\mathbf{y}^{k})\!-\!\frac{1}{\eta_{k}}\mathcal{D}\Psi(\bar{\mathbf{y}}^{k})^{\top}(\bar{\boldsymbol{\lambda}}^{k}\!-\!\boldsymbol{\lambda}^{k})\right]\geq 0, 𝐲𝒴,\displaystyle\ \forall\ \mathbf{y}\in\mathcal{Y},
𝝀¯k𝒵,\displaystyle\bar{\boldsymbol{\lambda}}^{k}\!\in\!\mathcal{Z},\ (𝝀𝝀¯k)[1ηk[Φ(𝐱¯k)+Ψ(𝐲¯k)]+sk(𝝀¯k𝝀k)]0,\displaystyle(\boldsymbol{\lambda}\!-\!\bar{\boldsymbol{\lambda}}^{k})^{\top}\left[\!-\!\frac{1}{\eta_{k}}[\Phi(\bar{\mathbf{x}}^{k})\!+\!\Psi(\bar{\mathbf{y}}^{k})]\!+\!s_{k}(\bar{\boldsymbol{\lambda}}^{k}\!-\!\boldsymbol{\lambda}^{k})\right]\geq 0, 𝝀𝒵.\displaystyle\ \forall\ \boldsymbol{\lambda}\in\mathcal{Z}.

The above inequalities can be combined into

ρ[ϑ(𝐮)ϑ(𝐮¯k)]+(𝐰𝐰¯k){1ηk𝚪(𝐰¯k)+(rk(𝐱¯k𝐱k)1ηk𝒟Φ(𝐱¯k)(𝝀¯k𝝀k)rk(𝐲¯k𝐲k)1ηk𝒟Ψ(𝐲¯k)(𝝀¯k𝝀k)sk(𝝀¯k𝝀k))}0,𝐰Ω.\rho[\vartheta(\mathbf{u})-\vartheta(\bar{\mathbf{u}}^{k})]+(\mathbf{w}-\bar{\mathbf{w}}^{k})^{\top}\left\{\frac{1}{\eta_{k}}\boldsymbol{\Gamma}(\bar{\mathbf{w}}^{k})+\left(\begin{array}[]{r}r_{k}(\bar{\mathbf{x}}^{k}-\mathbf{x}^{k})-\frac{1}{\eta_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}(\bar{\boldsymbol{\lambda}}^{k}-\boldsymbol{\lambda}^{k})\\[5.69046pt] r_{k}(\bar{\mathbf{y}}^{k}-\mathbf{y}^{k})-\frac{1}{\eta_{k}}\mathcal{D}\Psi(\bar{\mathbf{y}}^{k})^{\top}(\bar{\boldsymbol{\lambda}}^{k}-\boldsymbol{\lambda}^{k})\\[5.69046pt] s_{k}(\bar{\boldsymbol{\lambda}}^{k}-\boldsymbol{\lambda}^{k})\end{array}\right)\right\}\geq 0,\quad\forall\ \mathbf{w}\in\Omega.

Further, it can be written as

ρ[ϑ(𝐮)ϑ(𝐮¯k)]+(𝐰𝐰¯k)[1ηk𝚪(𝐰¯k)+𝐐k(𝐰¯k𝐰k)]0,𝐰Ω,\rho[\vartheta(\mathbf{u})-\vartheta(\bar{\mathbf{u}}^{k})]+(\mathbf{w}-\bar{\mathbf{w}}^{k})^{\top}\left[\frac{1}{\eta_{k}}\boldsymbol{\Gamma}(\bar{\mathbf{w}}^{k})+\mathbf{Q}_{k}(\bar{\mathbf{w}}^{k}-\mathbf{w}^{k})\right]\geq 0,\quad\forall\ \mathbf{w}\in\Omega, (4.9)

where the proximal matrix is

𝐐k=(rk𝐈n𝟎1ηk𝒟Φ(𝐱¯k)𝟎rk𝐈m1ηk𝒟Ψ(𝐲¯k)𝟎𝟎sk𝐈p),\mathbf{Q}_{k}=\left(\begin{array}[]{ccc}r_{k}\mathbf{I}_{n}&\boldsymbol{0}&-\frac{1}{\eta_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}\\[5.69046pt] \boldsymbol{0}&r_{k}\mathbf{I}_{m}&-\frac{1}{\eta_{k}}\mathcal{D}\Psi(\bar{\mathbf{y}}^{k})^{\top}\\[5.69046pt] \boldsymbol{0}&\boldsymbol{0}&s_{k}\mathbf{I}_{p}\end{array}\right),

which is an upper triangular block matrix whose diagonal blocks are scalar multiples of identity matrices. The variational inequality (4.9) is equivalent to

𝐰¯kΩ,ρ[ϑ(𝐮)ϑ(𝐮¯k)]+(𝐰𝐰¯k)1ηk𝚪(𝐰¯k)(𝐰𝐰¯k)𝐐k(𝐰k𝐰¯k),𝐰Ω,\bar{\mathbf{w}}^{k}\in\Omega,\quad\rho[\vartheta(\mathbf{u})-\vartheta(\bar{\mathbf{u}}^{k})]+(\mathbf{w}-\bar{\mathbf{w}}^{k})^{\top}\frac{1}{\eta_{k}}\boldsymbol{\Gamma}(\bar{\mathbf{w}}^{k})\geq(\mathbf{w}-\bar{\mathbf{w}}^{k})^{\top}\mathbf{Q}_{k}(\mathbf{w}^{k}-\bar{\mathbf{w}}^{k}),\quad\forall\ \mathbf{w}\in\Omega, (4.10)

where the proximal matrix 𝐐k\mathbf{Q}_{k} in the Spice method is called the predictive matrix.

4.2 Matrix-Driven Correction Scheme

For the correction scheme, we use a corrective matrix 𝐌k\mathbf{M}_{k} to correct the predictive variables by the following equation.

𝐰k+1=𝐰k𝐌k(𝐰k𝐰¯k),\mathbf{w}^{k+1}=\mathbf{w}^{k}-\mathbf{M}_{k}(\mathbf{w}^{k}-\bar{\mathbf{w}}^{k}), (4.11)

where the corrective matrix 𝐌k\mathbf{M}_{k} is not unique. In this paper, we take

𝐌k=(𝐈n𝟎1ηkrk𝒟Φ(𝐱¯k)𝟎𝐈m1ηkrk𝒟Ψ(𝐲¯k)𝟎𝟎𝐈p),𝐌k1=(𝐈n𝟎1ηkrk𝒟Φ(𝐱¯k)𝟎𝐈m1ηkrk𝒟Ψ(𝐲¯k)𝟎𝟎𝐈p).\mathbf{M}_{k}=\left(\begin{array}[]{ccc}\mathbf{I}_{n}&\mathbf{0}&-\frac{1}{\eta_{k}r_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}\\[5.69046pt] \mathbf{0}&\mathbf{I}_{m}&-\frac{1}{\eta_{k}r_{k}}\mathcal{D}\Psi(\bar{\mathbf{y}}^{k})^{\top}\\[5.69046pt] \mathbf{0}&\mathbf{0}&\mathbf{I}_{p}\end{array}\right),\quad\mathbf{M}_{k}^{-1}=\left(\begin{array}[]{ccc}\mathbf{I}_{n}&\mathbf{0}&\frac{1}{\eta_{k}r_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}\\[5.69046pt] \mathbf{0}&\mathbf{I}_{m}&\frac{1}{\eta_{k}r_{k}}\mathcal{D}\Psi(\bar{\mathbf{y}}^{k})^{\top}\\[5.69046pt] \mathbf{0}&\mathbf{0}&\mathbf{I}_{p}\end{array}\right).
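As a sanity check on this choice, the NumPy sketch below builds $\mathbf{Q}_{k}$ and $\mathbf{M}_{k}$ from random placeholder Jacobians (standing in for $\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})$ and $\mathcal{D}\Psi(\bar{\mathbf{y}}^{k})$), verifies numerically that $\mathbf{Q}_{k}\mathbf{M}_{k}^{-1}$ reduces to the diagonal matrix $\mathbf{H}_{k}$ given below, and applies the correction (4.11).

import numpy as np

rng = np.random.default_rng(1)
n, m, p = 4, 3, 2
rk, sk, eta = 2.0, 3.0, 1.5
JPhi = rng.standard_normal((p, n))   # stands for DPhi(x_bar^k)
JPsi = rng.standard_normal((p, m))   # stands for DPsi(y_bar^k)

I_n, I_m, I_p = np.eye(n), np.eye(m), np.eye(p)
Z = lambda a, b: np.zeros((a, b))

M = np.block([[I_n, Z(n, m), -JPhi.T / (eta * rk)],
              [Z(m, n), I_m, -JPsi.T / (eta * rk)],
              [Z(p, n), Z(p, m), I_p]])
Q = np.block([[rk * I_n, Z(n, m), -JPhi.T / eta],
              [Z(m, n), rk * I_m, -JPsi.T / eta],
              [Z(p, n), Z(p, m), sk * I_p]])

H = Q @ np.linalg.inv(M)             # should equal diag(rk*I_n, rk*I_m, sk*I_p)
assert np.allclose(H, np.diag([rk] * (n + m) + [sk] * p))

def correct(w, w_bar):
    """Correction step w^{k+1} = w^k - M_k (w^k - w_bar^k) from (4.11)."""
    return w - M @ (w - w_bar)

w, w_bar = rng.standard_normal(n + m + p), rng.standard_normal(n + m + p)
print(correct(w, w_bar)[:3])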

For any rk,sk>0r_{k},s_{k}>0, the extended matrix 𝐇k\mathbf{H}_{k} is positive-definite:

𝐇k=𝐐k𝐌k1=(rk𝐈n𝟎𝟎𝟎rk𝐈m𝟎𝟎𝟎sk𝐈p)0.\mathbf{H}_{k}=\mathbf{Q}_{k}\mathbf{M}_{k}^{-1}=\left(\begin{array}[]{ccc}r_{k}\mathbf{I}_{n}&\mathbf{0}&\mathbf{0}\\[5.69046pt] \mathbf{0}&r_{k}\mathbf{I}_{m}&\mathbf{0}\\[5.69046pt] \mathbf{0}&\mathbf{0}&s_{k}\mathbf{I}_{p}\end{array}\right)\succ 0.

When the condition rksk>1ηk2𝒟Φ(𝐱¯k)2+1ηk2𝒟Ψ(𝐲¯k)2r_{k}s_{k}>\frac{1}{\eta_{k}^{2}}\|\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})\|^{2}+\frac{1}{\eta_{k}^{2}}\|\mathcal{D}\Psi(\bar{\mathbf{y}}^{k})\|^{2} holds, we have

𝐆k\displaystyle\mathbf{G}_{k} =𝐐k+𝐐k𝐌k𝐇k𝐌k=(rk𝐈n𝟎𝟎𝟎rk𝐈m𝟎𝟎𝟎sk𝐈p1ηk2rk[𝒟Φ(𝐱¯k)𝒟Φ(𝐱¯k)+𝒟Ψ(𝐲¯k)𝒟Ψ(𝐲¯k)])0.\displaystyle=\mathbf{Q}_{k}^{\top}+\mathbf{Q}_{k}-\mathbf{M}_{k}^{\top}\mathbf{H}_{k}\mathbf{M}_{k}=\left(\begin{array}[]{ccc}r_{k}\mathbf{I}_{n}&\mathbf{0}&\mathbf{0}\\[5.69046pt] \mathbf{0}&r_{k}\mathbf{I}_{m}&\mathbf{0}\\[5.69046pt] \mathbf{0}&\mathbf{0}&s_{k}\mathbf{I}_{p}-\frac{1}{\eta_{k}^{2}r_{k}}[\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}+\mathcal{D}\Psi(\bar{\mathbf{y}}^{k})\mathcal{D}\Psi(\bar{\mathbf{y}}^{k})^{\top}]\end{array}\right)\succ 0.

For $\mu>1$ and $k\geq 1$, we set $\mathsf{R}(\mathbf{u}^{k})=\|\mathcal{D}\Phi(\mathbf{x}^{k})\|^{2}+\|\mathcal{D}\Psi(\mathbf{y}^{k})\|^{2}$ and choose the regularization parameters as

rk\displaystyle r_{k} =1ηk𝖱(𝐮k),\displaystyle=\frac{1}{\eta_{k}}\sqrt{\mathsf{R}(\mathbf{u}^{k})}, (4.12a)
sk\displaystyle s_{k} =μ𝖱(𝐮¯k)ηk𝖱(𝐮k).\displaystyle=\frac{\mu\mathsf{R}(\bar{\mathbf{u}}^{k})}{\eta_{k}\sqrt{\mathsf{R}(\mathbf{u}^{k})}}. (4.12b)

To ensure that the sequences $\{r_{k}\}$ and $\{s_{k}\}$ are non-increasing, the scaling factor $\eta_{k}$ should satisfy

ηkmax{ηk1𝖱(𝐮k)𝖱(𝐮k1),ηk1𝖱(𝐮¯k)𝖱(𝐮k1)𝖱(𝐮¯k1)𝖱(𝐮k)}.\eta_{k}\geq\max\left\{\eta_{k-1}\cdot\sqrt{\frac{\mathsf{R}(\mathbf{u}^{k})}{\mathsf{R}(\mathbf{u}^{k-1})}},\quad\eta_{k-1}\cdot\frac{\mathsf{R}(\bar{\mathbf{u}}^{k})\sqrt{\mathsf{R}(\mathbf{u}^{k-1})}}{\mathsf{R}(\bar{\mathbf{u}}^{k-1})\sqrt{\mathsf{R}(\mathbf{u}^{k})}}\right\}. (4.13)
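A minimal sketch of the parameter rule (4.12)-(4.13) follows; reading $\|\mathcal{D}\Phi\|$ and $\|\mathcal{D}\Psi\|$ as spectral norms of the Jacobians is our interpretation, and $\mu$ and the toy values are illustrative.

import numpy as np

rng = np.random.default_rng(0)
JPhi, JPsi = rng.standard_normal((3, 5)), rng.standard_normal((3, 4))

def R(JPhi, JPsi):
    """R(u) = ||DPhi||^2 + ||DPsi||^2, with the spectral norm (our reading of the norm)."""
    return np.linalg.norm(JPhi, 2) ** 2 + np.linalg.norm(JPsi, 2) ** 2

def scaling_update(Rk, Rbar_k, Rk_prev, Rbar_prev, eta_prev, mu=1.5):
    # smallest eta_k admitted by (4.13); any larger value also keeps r_k, s_k non-increasing
    eta_k = max(eta_prev * np.sqrt(Rk / Rk_prev),
                eta_prev * Rbar_k * np.sqrt(Rk_prev) / (Rbar_prev * np.sqrt(Rk)))
    r_k = np.sqrt(Rk) / eta_k                     # (4.12a)
    s_k = mu * Rbar_k / (eta_k * np.sqrt(Rk))     # (4.12b)
    return eta_k, r_k, s_k

Rk = R(JPhi, JPsi)                                # R(u^k) from the current Jacobians
print(scaling_update(Rk, Rbar_k=1.1 * Rk, Rk_prev=0.9 * Rk,
                     Rbar_prev=Rk, eta_prev=1.0))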

For any iteration $k\geq 0$, the difference matrix $\mathbf{D}_{k}=\mathbf{H}_{k}-\mathbf{H}_{k+1}$ is then positive semidefinite. The initial extended matrix can be written as $\mathbf{H}_{0}=\frac{1}{\eta_{0}}\mathsf{H}_{0}$, where

𝖧0=(𝖱(𝐮0)𝐈n𝟎𝟎𝟎𝖱(𝐮0)𝐈m𝟎𝟎𝟎μ𝖱(𝐮¯0)𝖱(𝐮0)𝐈p).\displaystyle\mathsf{H}_{0}=\left(\begin{array}[]{ccc}\sqrt{\mathsf{R}(\mathbf{u}^{0})}\mathbf{I}_{n}&\mathbf{0}&\mathbf{0}\\[2.84544pt] \mathbf{0}&\sqrt{\mathsf{R}(\mathbf{u}^{0})}\mathbf{I}_{m}&\mathbf{0}\\[2.84544pt] \mathbf{0}&\mathbf{0}&\frac{\mu\mathsf{R}(\bar{\mathbf{u}}^{0})}{\sqrt{\mathsf{R}(\mathbf{u}^{0})}}\mathbf{I}_{p}\end{array}\right). (4.17)

4.3 Convergence Analysis

Lemma 4.2.

Let {𝐰k,𝐰¯k,𝐰k+1}\{\mathbf{w}^{k},\bar{\mathbf{w}}^{k},\mathbf{w}^{k+1}\} be the sequences generated by the Spice method. When we use equation (4.12) to update the values of rkr_{k} and sks_{k}, the predictive matrix 𝐐k\mathbf{Q}_{k} and corrective matrix 𝐌k\mathbf{M}_{k} are upper triangular and satisfy

𝐇k=𝐐k𝐌k10and𝐆k=𝐐k+𝐐k𝐌k𝐇k𝐌k0,\mathbf{H}_{k}=\mathbf{Q}_{k}\mathbf{M}_{k}^{-1}\succ 0\quad{\rm and}\quad\mathbf{G}_{k}=\mathbf{Q}_{k}^{\top}+\mathbf{Q}_{k}-\mathbf{M}_{k}^{\top}\mathbf{H}_{k}\mathbf{M}_{k}\succ 0, (4.18)

Moreover, the following variational inequality holds:

ρ[ϑ(𝐮)ϑ(𝐮¯k)]+(𝐰𝐰¯k)1ηk𝚪(𝐰¯k)12(𝐰𝐰k+1𝐇k2𝐰𝐰k𝐇k2)+12𝐰¯k𝐰k𝐆k2,𝐰Ω.\rho[\vartheta(\mathbf{u})-\vartheta(\bar{\mathbf{u}}^{k})]+(\mathbf{w}-\bar{\mathbf{w}}^{k})^{\top}\frac{1}{\eta_{k}}\boldsymbol{\Gamma}(\bar{\mathbf{w}}^{k})\geq\frac{1}{2}\left(\|\mathbf{w}-\mathbf{w}^{k+1}\|_{\mathbf{H}_{k}}^{2}-\|\mathbf{w}-\mathbf{w}^{k}\|_{\mathbf{H}_{k}}^{2}\right)+\frac{1}{2}\|\bar{\mathbf{w}}^{k}-\mathbf{w}^{k}\|_{\mathbf{G}_{k}}^{2},\quad\forall\ \mathbf{w}\in\Omega. (4.19)
Proof.

The proof is omitted since it is similar to that of Lemma 3.2. ∎

Theorem 4.1.

Let $\{\mathbf{w}^{k},\bar{\mathbf{w}}^{k},\mathbf{w}^{k+1}\}$ be the sequences generated by the Spice method. For the predictive matrix $\mathbf{Q}_{k}$, if there is a corrective matrix $\mathbf{M}_{k}$ that satisfies the convergence condition (4.18), then we have

𝐰𝐰k𝐇k2𝐰𝐰k+1𝐇k2+𝐰k𝐰¯k𝐆k2,𝐰Ω,\|\mathbf{w}^{*}-\mathbf{w}^{k}\|_{\mathbf{H}_{k}}^{2}\geq\|\mathbf{w}^{*}-\mathbf{w}^{k+1}\|_{\mathbf{H}_{k}}^{2}+\|\mathbf{w}^{k}-\bar{\mathbf{w}}^{k}\|_{\mathbf{G}_{k}}^{2},\quad\ \mathbf{w}^{*}\in\Omega^{*}, (4.20)

where Ω\Omega^{*} is the set of optimal solutions.

Proof.

The proof is omitted since it is similar to that of Theorem 3.1. ∎

Theorem 4.2.

Let $\{\mathbf{w}^{k},\bar{\mathbf{w}}^{k}\}$ be the sequences generated by the Spice method. For the given optimal solution $\mathbf{w}^{*}$, the following limits hold:

limk𝐰k𝐰¯k2=0andlimk𝐰k=𝐰.\lim_{k\to\infty}\|\mathbf{w}^{k}-\bar{\mathbf{w}}^{k}\|^{2}=0\quad{\rm and}\quad\lim_{k\to\infty}\mathbf{w}^{k}=\mathbf{w}^{*}. (4.21)
Proof.

The proof is omitted since it is similar to that of Theorem 3.2. ∎

Theorem 4.3.

Let $\{\bar{\mathbf{w}}^{k}\}$ be the sequence generated by the Spice method for P2 and let $\eta_{k}$ satisfy condition (4.13). Let $\bar{\mathbf{u}}_{t}$, $\eta_{t}$, and $\bar{\mathbf{w}}_{t}$ be defined as follows

𝐮¯t=1t+1k=0t𝐮¯k,ηt=t+1k=0t1ηk,𝐰¯t=k=0t𝐰¯kηkk=0t1ηk.\bar{\mathbf{u}}_{t}=\frac{1}{t+1}\sum_{k=0}^{t}\bar{\mathbf{u}}^{k},\quad\eta_{t}=\frac{t+1}{\sum_{k=0}^{t}\frac{1}{\eta_{k}}},\quad\bar{\mathbf{w}}_{t}=\frac{\sum_{k=0}^{t}\frac{\bar{\mathbf{w}}^{k}}{\eta_{k}}}{\sum_{k=0}^{t}\frac{1}{\eta_{k}}}.

If Lemma 4.2 holds, for any iteration t>0t>0 and scaling factor ρ(t)>0\rho(t)>0, the error bound satisfies

𝐰Ω,ϑ(𝐮¯t)ϑ(𝐮)+(𝐰¯t𝐰)1ηtρ(t)𝚪(𝐰)12η0ρ(t)(t+1)𝐰𝐰0𝖧02,𝐰¯tΩ.\mathbf{w}^{*}\in\Omega,\quad\vartheta(\bar{\mathbf{u}}_{t})-\vartheta(\mathbf{u}^{*})+(\bar{\mathbf{w}}_{t}-\mathbf{w}^{*})^{\top}\frac{1}{\eta_{t}\rho(t)}\boldsymbol{\Gamma}(\mathbf{w}^{*})\leq\frac{1}{2\eta_{0}\rho(t)(t+1)}\|\mathbf{w}^{*}-\mathbf{w}^{0}\|_{\mathsf{H}_{0}}^{2},\quad\forall\ \bar{\mathbf{w}}_{t}\in\Omega. (4.22)
Proof.

The proof is omitted since it is similar to that of Theorem 3.3. ∎
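For reference, the ergodic quantities appearing in Theorem 4.3 can be accumulated from the stored prediction history as in the short sketch below; the variable names are ours.

import numpy as np

def ergodic_averages(u_hist, w_hist, eta_hist):
    """u_hist, w_hist: lists of iterates u_bar^k, w_bar^k; eta_hist: the eta_k > 0."""
    u_hist, w_hist = np.asarray(u_hist), np.asarray(w_hist)
    inv_eta = 1.0 / np.asarray(eta_hist)
    u_bar_t = u_hist.mean(axis=0)                               # (1/(t+1)) sum_k u_bar^k
    w_bar_t = (inv_eta[:, None] * w_hist).sum(axis=0) / inv_eta.sum()
    eta_t = len(eta_hist) / inv_eta.sum()                       # harmonic-type mean of eta_k
    return u_bar_t, w_bar_t, eta_t

print(ergodic_averages([[1.0], [3.0]], [[1.0, 0.0], [3.0, 2.0]], [1.0, 2.0]))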

Corollary 4.1.

The Spice method provides flexibility in controlling the convergence rate through the choice of the scaling factor ρ(t)\rho(t). For example, setting ρ(t)=(t+1)α\rho(t)=(t+1)^{\alpha} with α>0\alpha>0 yields a power-law convergence rate, while ρ(t)=eβt\rho(t)=e^{\beta t} with β>0\beta>0 achieves exponential convergence. Additionally, ρ(t)=(t+1)t\rho(t)=(t+1)^{t} combines both power-law and exponential effects. This adaptability ensures that the method can tailor its convergence behavior to different nonlinear convex optimization problems, converging asymptotically as ρ(t)\rho(t) increases.
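As an illustration, the schedules mentioned in Corollary 4.1 can be encoded directly; $\alpha=\beta=2$ matches the values later used in Section 5.3, and the dictionary keys are our own labels.

import numpy as np

rho_schedules = {
    "constant":    lambda t: 1.0,
    "power":       lambda t, alpha=2.0: (t + 1.0) ** alpha,
    "exponential": lambda t, beta=2.0: np.exp(beta * t),
    "super":       lambda t: (t + 1.0) ** (t + 1.0),
}
print([rho_schedules["power"](t) for t in range(4)])   # 1.0, 4.0, 9.0, 16.0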

4.4 Solving Convex Problems with Equality Constraints

The Spice method can be extended to address convex problems involving both nonlinear inequality and linear equality constraints. The convex problem is formulated as follows:

min{f(𝐱)+g(𝐲)|ϕi(𝐱)+ψi(𝐲)0,𝐀𝐱+𝐁𝐲=𝐛,𝐱𝒳,𝐲𝒴,i=1,,p1},\displaystyle\min\left\{f(\mathbf{x})+g(\mathbf{y})\ \middle|\ \phi_{i}(\mathbf{x})+\psi_{i}(\mathbf{y})\leq 0,\ \mathbf{A}\mathbf{x}+\mathbf{B}\mathbf{y}=\mathbf{b},\ \mathbf{x}\in\mathcal{X},\ \mathbf{y}\in\mathcal{Y},\ i=1,\cdots,p_{1}\right\},

where 𝐀p2×n\mathbf{A}\in\mathbb{R}^{p_{2}\times n}, 𝐁p2×m\mathbf{B}\in\mathbb{R}^{p_{2}\times m}, and 𝐛p2\mathbf{b}\in\mathbb{R}^{p_{2}} (p1+p2=:pp_{1}+p_{2}=:p). The constraint functions are structured as Φ(𝐱)=[ϕ1(𝐱),,ϕp1(𝐱),ϕp1+1(𝐱),,ϕp(𝐱)]\Phi(\mathbf{x})=[\phi_{1}(\mathbf{x}),\ldots,\phi_{p_{1}}(\mathbf{x}),\phi_{p_{1}+1}(\mathbf{x}),\cdots,\phi_{p}(\mathbf{x})]^{\top}, with [ϕp1+1(𝐱),,ϕp(𝐱)]=𝐀𝐱𝐛/2[\phi_{p_{1}+1}(\mathbf{x}),\cdots,\phi_{p}(\mathbf{x})]^{\top}=\mathbf{A}\mathbf{x}-\mathbf{b}/2, and Ψ(𝐲)=[ψ1(𝐲),,ψp1(𝐲),ψp1+1(𝐲),,ψp(𝐲)]\Psi(\mathbf{y})=[\psi_{1}(\mathbf{y}),\cdots,\psi_{p_{1}}(\mathbf{y}),\psi_{p_{1}+1}(\mathbf{y}),\cdots,\psi_{p}(\mathbf{y})]^{\top}, with [ψp1+1(𝐲),,ψp(𝐲)]=𝐁𝐲𝐛/2[\psi_{p_{1}+1}(\mathbf{y}),\cdots,\psi_{p}(\mathbf{y})]^{\top}=\mathbf{B}\mathbf{y}-\mathbf{b}/2. The domain of the dual variable 𝝀\boldsymbol{\lambda} must be redefined to account for these constraints. Specifically, the domain set of 𝝀\boldsymbol{\lambda} becomes 𝒵:=+p1×p2\mathcal{Z}:=\mathbb{R}^{p_{1}}_{+}\times\mathbb{R}^{p_{2}}, where +p1\mathbb{R}^{p_{1}}_{+} handles the non-positivity constraints and p2\mathbb{R}^{p_{2}} corresponds to the equality constraints.
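Following this splitting, a minimal sketch of assembling the extended constraint maps and of the projection of the dual variable onto $\mathcal{Z}=\mathbb{R}^{p_{1}}_{+}\times\mathbb{R}^{p_{2}}$ is given below; the toy data are placeholders.

import numpy as np

def make_maps(phi_list, psi_list, A, B, b):
    """phi_list/psi_list: callables for the p1 nonlinear inequality rows."""
    def Phi(x):
        return np.concatenate([np.array([phi(x) for phi in phi_list]), A @ x - b / 2.0])
    def Psi(y):
        return np.concatenate([np.array([psi(y) for psi in psi_list]), B @ y - b / 2.0])
    return Phi, Psi

def project_dual(lam, p1):
    """Project onto Z = R^{p1}_+ x R^{p2}: clip only the inequality part."""
    out = lam.copy()
    out[:p1] = np.maximum(out[:p1], 0.0)
    return out

A = np.array([[1.0, 0.0]]); B = np.array([[0.0, 1.0]]); b = np.array([2.0])
Phi, Psi = make_maps([lambda x: x @ x - 1.0], [lambda y: y @ y - 1.0], A, B, b)
print(Phi(np.array([1.0, 1.0])), project_dual(np.array([-0.5, -0.5]), p1=1))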

5 Numerical Experiment

To evaluate the performance of the Spice method, this paper examines quadratically constrained quadratic programming (QCQP) problems that commonly arise in control theory and signal processing. The QCQP structure allows for closed-form solutions at each iteration. Single-variable and separable-variable QCQP problems involving nonlinear objective and constraint functions are considered. This approach ensures a comprehensive evaluation of the method’s performance across different scenarios.

5.1 Single-Variable QCQP

In single-variable QCQP, the objective and constraints are quadratic functions, which can be described as:

min𝐱f(𝐱)=\displaystyle\min_{\mathbf{x}}\quad f(\mathbf{x})= 𝐖0𝐱𝐚02\displaystyle\|\mathbf{W}_{0}\mathbf{x}-\mathbf{a}_{0}\|^{2}
subject to 𝐖i𝐱𝐚i2πi,i=1,,p,\displaystyle\|\mathbf{W}_{i}\mathbf{x}-\mathbf{a}_{i}\|^{2}\leq\pi_{i},\quad i=1,\cdots,p,

where $\mathbf{x}\in\mathbb{R}^{n}$, $\mathbf{W}_{i}\in\mathbb{R}^{q\times n}$, and $\mathbf{a}_{i}\in\mathbb{R}^{q}$ for $i=0,1,\cdots,p$.

5.1.1 PPA-Like Prediction Scheme

In the first step, the PPA-like prediction scheme updates primal and dual variables by solving the corresponding subproblems.

𝐱¯k\displaystyle\bar{\mathbf{x}}^{k} =argmin{(𝐱,𝝀k,ρ,ηk)+rk2𝐱𝐱k2𝐱𝒳},\displaystyle=\arg\min\big{\{}\mathcal{L}(\mathbf{x},\boldsymbol{\lambda}^{k},\rho,\eta_{k})+\frac{r_{k}}{2}\|\mathbf{x}-\mathbf{x}^{k}\|^{2}\mid\mathbf{x}\in\mathcal{X}\big{\}},
𝝀¯k\displaystyle\boldsymbol{\bar{\lambda}}^{k} =argmax{(𝐱¯k,𝝀,ρ,ηk)sk2𝝀𝝀k2𝝀𝒵},\displaystyle=\arg\max\big{\{}\mathcal{L}(\bar{\mathbf{x}}^{k},\boldsymbol{\lambda},\rho,\eta_{k})-\frac{s_{k}}{2}\|\boldsymbol{\lambda}-\boldsymbol{\lambda}^{k}\|^{2}\mid\boldsymbol{\lambda}\in\mathcal{Z}\big{\}},

By solving the above subproblems, we obtain

𝐱¯k=(2ρ𝐖0𝐖0+2ηki=1pλik𝐖i𝐖i+rk𝐈n)1(2ρ𝐖0𝐚0+2ηki=1pλik𝐖i𝐚i+rk𝐱k),\displaystyle\bar{\mathbf{x}}^{k}=\left(2\rho\mathbf{W}_{0}^{\top}\mathbf{W}_{0}+\frac{2}{\eta_{k}}\sum_{i=1}^{p}\lambda^{k}_{i}\mathbf{W}^{\top}_{i}\mathbf{W}_{i}+r_{k}\mathbf{I}_{n}\right)^{-1}\cdot\left(2\rho\mathbf{W}_{0}^{\top}\mathbf{a}_{0}+\frac{2}{\eta_{k}}\sum_{i=1}^{p}\lambda^{k}_{i}\mathbf{W}^{\top}_{i}\mathbf{a}_{i}+r_{k}\mathbf{x}^{k}\right),
λ¯ik=max{λ^ik,0}, where λ^ik=λik+1ηksk(𝐖i𝐱¯k𝐚i2πi),i=1,,p.\displaystyle\bar{\lambda}^{k}_{i}=\max\{\hat{\lambda}^{k}_{i},0\},\mbox{ where }\hat{\lambda}^{k}_{i}=\lambda^{k}_{i}+\frac{1}{\eta_{k}s_{k}}\left(\|\mathbf{W}_{i}\bar{\mathbf{x}}^{k}-\mathbf{a}_{i}\|^{2}-\pi_{i}\right),\quad i=1,\cdots,p.
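A NumPy sketch of this closed-form prediction step is given below; the problem data are random placeholders, and the dimensions of the actual experiments are listed in Section 5.3.

import numpy as np

def predict_qcqp(xk, lamk, W0, a0, W, a, pi, rho, eta, r, s):
    """W: array (p, q, n), a: (p, q); returns the predicted pair (x_bar, lam_bar)."""
    p, _, n = W.shape
    Hmat = 2 * rho * W0.T @ W0 + r * np.eye(n)
    rhs = 2 * rho * W0.T @ a0 + r * xk
    for i in range(p):
        Hmat += (2.0 / eta) * lamk[i] * W[i].T @ W[i]
        rhs += (2.0 / eta) * lamk[i] * W[i].T @ a[i]
    x_bar = np.linalg.solve(Hmat, rhs)
    resid = np.array([np.sum((W[i] @ x_bar - a[i]) ** 2) - pi[i] for i in range(p)])
    lam_bar = np.maximum(lamk + resid / (eta * s), 0.0)
    return x_bar, lam_bar

rng = np.random.default_rng(0)
n, q, p = 6, 8, 3
x_bar, lam_bar = predict_qcqp(np.zeros(n), np.ones(p),
                              rng.standard_normal((q, n)), rng.standard_normal(q),
                              rng.standard_normal((p, q, n)), rng.standard_normal((p, q)),
                              np.full(p, 10.0), rho=1.0, eta=1.0, r=5.0, s=5.0)
print(x_bar.shape, lam_bar)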

5.1.2 Matrix-Driven Correction Scheme

In the second step, the prediction error is corrected by an upper triangular matrix. The iterative scheme of the variables is given by:

𝐰k+1=𝐰k𝐌k(𝐰k𝐰¯k),\mathbf{w}^{k+1}=\mathbf{w}^{k}-\mathbf{M}_{k}(\mathbf{w}^{k}-\bar{\mathbf{w}}^{k}),

where the corrective matrix 𝐌k\mathbf{M}_{k} is given as

𝐌k=(𝐈n1ηkrk𝒟Φ(𝐱¯k)𝟎𝐈p),\mathbf{M}_{k}=\left(\begin{array}[]{cc}\mathbf{I}_{n}&-\frac{1}{\eta_{k}r_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}\\[2.84544pt] \mathbf{0}&\mathbf{I}_{p}\end{array}\right),

where $\mathcal{D}\Phi(\mathbf{x})=[\nabla\phi_{1},\cdots,\nabla\phi_{p}]^{\top}\in\mathbb{R}^{p\times n}$. Let $\mathsf{R}(\mathbf{x}^{k})=\|\mathcal{D}\Phi(\mathbf{x}^{k})\|^{2}$; the regularization parameters $r_{k}$ and $s_{k}$ are then defined as follows:

rk=1ηk𝖱(𝐱k),sk=μ𝖱(𝐱¯k)ηk𝖱(𝐱k).r_{k}=\frac{1}{\eta_{k}}\sqrt{\mathsf{R}(\mathbf{x}^{k})},\quad s_{k}=\frac{\mu\mathsf{R}(\bar{\mathbf{x}}^{k})}{\eta_{k}\sqrt{\mathsf{R}(\mathbf{x}^{k})}}.

To ensure that $r_{k-1}\geq r_{k}$ and $s_{k-1}\geq s_{k}$, the scaling factor $\eta_{k}$ must satisfy the following condition:

ηkmax{ηk1𝖱(𝐱k)𝖱(𝐱k1),ηk1𝖱(𝐱¯k)𝖱(𝐱k1)𝖱(𝐱¯k1)𝖱(𝐱k)}.\eta_{k}\geq\max\left\{\eta_{k-1}\cdot\sqrt{\frac{\mathsf{R}(\mathbf{x}^{k})}{\mathsf{R}(\mathbf{x}^{k-1})}},\quad\eta_{k-1}\cdot\frac{\mathsf{R}(\bar{\mathbf{x}}^{k})\sqrt{\mathsf{R}(\mathbf{x}^{k-1})}}{\mathsf{R}(\bar{\mathbf{x}}^{k-1})\sqrt{\mathsf{R}(\mathbf{x}^{k})}}\right\}.
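The corresponding correction step and parameter update for the single-variable QCQP can be sketched as follows (placeholder inputs; the spectral norm for $\|\mathcal{D}\Phi\|$ is our assumption).

import numpy as np

def jacobian(W, a, x):
    """Rows grad phi_i(x) = 2 W_i^T (W_i x - a_i) for phi_i(x) = ||W_i x - a_i||^2 - pi_i."""
    return np.stack([2.0 * Wi.T @ (Wi @ x - ai) for Wi, ai in zip(W, a)])

def correct(xk, lamk, x_bar, lam_bar, J_bar, eta, r):
    # w^{k+1} = w^k - M_k (w^k - w_bar^k): the dual block keeps lam_bar,
    # while the primal block adds a Jacobian-weighted dual increment
    x_next = x_bar + J_bar.T @ (lamk - lam_bar) / (eta * r)
    return x_next, lam_bar.copy()

def next_parameters(J_k, J_bar, eta_prev, R_prev, Rbar_prev, mu=1.5):
    R_k, Rbar_k = np.linalg.norm(J_k, 2) ** 2, np.linalg.norm(J_bar, 2) ** 2
    eta = max(eta_prev * np.sqrt(R_k / R_prev),
              eta_prev * Rbar_k * np.sqrt(R_prev) / (Rbar_prev * np.sqrt(R_k)))
    return eta, np.sqrt(R_k) / eta, mu * Rbar_k / (eta * np.sqrt(R_k))

rng = np.random.default_rng(0)
n, q, p = 6, 8, 3
W, a = rng.standard_normal((p, q, n)), rng.standard_normal((p, q))
xk, x_bar = rng.standard_normal(n), rng.standard_normal(n)
lamk, lam_bar = np.ones(p), 0.5 * np.ones(p)
J_bar = jacobian(W, a, x_bar)
x_next, lam_next = correct(xk, lamk, x_bar, lam_bar, J_bar, eta=1.0, r=5.0)
print(x_next.shape, next_parameters(jacobian(W, a, xk), J_bar, 1.0,
                                    R_prev=1e3, Rbar_prev=1e3))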

5.2 Separable-Variable QCQP

For the separable-variable QCQP, the problem structure is extended to two blocks of variables, $\mathbf{x}$ and $\mathbf{y}$, whose objectives are separable but which are coupled through quadratic constraints. The optimization problem is formulated as follows:

\min_{\mathbf{x},\mathbf{y}}\quad f(\mathbf{x})+g(\mathbf{y})=\|\mathbf{W}_{0}\mathbf{x}-\mathbf{a}_{0}\|^{2}+\|\mathbf{V}_{0}\mathbf{y}-\mathbf{c}_{0}\|^{2}
subject to 𝐖i𝐱𝐚i2+𝐕i𝐲𝐜i2πi,i=1,,p,\displaystyle\|\mathbf{W}_{i}\mathbf{x}-\mathbf{a}_{i}\|^{2}+\|\mathbf{V}_{i}\mathbf{y}-\mathbf{c}_{i}\|^{2}\leq\pi_{i},\quad i=1,\cdots,p,

where $\mathbf{x}\in\mathbb{R}^{n}$, $\mathbf{y}\in\mathbb{R}^{m}$, $\mathbf{W}_{i}\in\mathbb{R}^{q\times n}$, $\mathbf{V}_{i}\in\mathbb{R}^{q\times m}$, $\mathbf{a}_{i}\in\mathbb{R}^{q}$, and $\mathbf{c}_{i}\in\mathbb{R}^{q}$ for $i=0,1,\cdots,p$.

5.2.1 PPA-Like Prediction Scheme

In the prediction step, we update the primal and dual variables by solving the following subproblems.

𝐱¯k\displaystyle\bar{\mathbf{x}}^{k} =argmin{(𝐱,𝐲k,𝝀k,ρ,ηk)+rk2𝐱𝐱k2𝐱𝒳},\displaystyle=\arg\min\big{\{}\mathcal{L}(\mathbf{x},\mathbf{y}^{k},\boldsymbol{\lambda}^{k},\rho,\eta_{k})+\frac{r_{k}}{2}\|\mathbf{x}-\mathbf{x}^{k}\|^{2}\mid\mathbf{x}\in\mathcal{X}\big{\}},
𝐲¯k\displaystyle\bar{\mathbf{y}}^{k} =argmin{(𝐱¯k,𝐲,𝝀k,ρ,ηk)+rk2𝐲𝐲k2𝐲𝒴},\displaystyle=\arg\min\big{\{}\mathcal{L}(\bar{\mathbf{x}}^{k},\mathbf{y},\boldsymbol{\lambda}^{k},\rho,\eta_{k})+\frac{r_{k}}{2}\|\mathbf{y}-\mathbf{y}^{k}\|^{2}\mid\mathbf{y}\in\mathcal{Y}\big{\}},
𝝀¯k\displaystyle\boldsymbol{\bar{\lambda}}^{k} =argmax{(𝐱¯k,𝐲¯k,𝝀,ρ,ηk)sk2𝝀𝝀k2𝝀𝒵},\displaystyle=\arg\max\big{\{}\mathcal{L}(\bar{\mathbf{x}}^{k},\bar{\mathbf{y}}^{k},\boldsymbol{\lambda},\rho,\eta_{k})-\frac{s_{k}}{2}\|\boldsymbol{\lambda}-\boldsymbol{\lambda}^{k}\|^{2}\mid\boldsymbol{\lambda}\in\mathcal{Z}\big{\}},

By solving the above subproblems, we obtain

𝐱¯k=(2ρ𝐖0𝐖0+2ηki=1pλik𝐖i𝐖i+rk𝐈n)1(2ρ𝐖0𝐚0+2ηki=1pλik𝐖i𝐚i+rk𝐱k),\displaystyle\bar{\mathbf{x}}^{k}=\left(2\rho\mathbf{W}_{0}^{\top}\mathbf{W}_{0}+\frac{2}{\eta_{k}}\sum_{i=1}^{p}\lambda^{k}_{i}\mathbf{W}^{\top}_{i}\mathbf{W}_{i}+r_{k}\mathbf{I}_{n}\right)^{-1}\cdot\left(2\rho\mathbf{W}_{0}^{\top}\mathbf{a}_{0}+\frac{2}{\eta_{k}}\sum_{i=1}^{p}\lambda^{k}_{i}\mathbf{W}^{\top}_{i}\mathbf{a}_{i}+r_{k}\mathbf{x}^{k}\right),
\bar{\mathbf{y}}^{k}=\left(2\rho\mathbf{V}_{0}^{\top}\mathbf{V}_{0}+\frac{2}{\eta_{k}}\sum_{i=1}^{p}\lambda^{k}_{i}\mathbf{V}^{\top}_{i}\mathbf{V}_{i}+r_{k}\mathbf{I}_{m}\right)^{-1}\cdot\left(2\rho\mathbf{V}_{0}^{\top}\mathbf{c}_{0}+\frac{2}{\eta_{k}}\sum_{i=1}^{p}\lambda^{k}_{i}\mathbf{V}^{\top}_{i}\mathbf{c}_{i}+r_{k}\mathbf{y}^{k}\right),
λ¯ik=max{λ^ik,0}, where λ^ik=λik+1ηksk(𝐖i𝐱¯k𝐚i2+𝐕i𝐲¯k𝐜i2πi),i=1,,p.\displaystyle\bar{\lambda}^{k}_{i}=\max\{\hat{\lambda}^{k}_{i},0\},\mbox{ where }\hat{\lambda}^{k}_{i}=\lambda^{k}_{i}+\frac{1}{\eta_{k}s_{k}}\left(\|\mathbf{W}_{i}\bar{\mathbf{x}}^{k}-\mathbf{a}_{i}\|^{2}+\|\mathbf{V}_{i}\bar{\mathbf{y}}^{k}-\mathbf{c}_{i}\|^{2}-\pi_{i}\right),\quad i=1,\cdots,p.

5.2.2 Matrix-Driven Correction Scheme

In the correction step, the prediction error is corrected by using an upper triangular matrix. The update rule is:

𝐰k+1=𝐰k𝐌k(𝐰k𝐰¯k),\mathbf{w}^{k+1}=\mathbf{w}^{k}-\mathbf{M}_{k}(\mathbf{w}^{k}-\bar{\mathbf{w}}^{k}),

where the corrective matrix 𝐌k\mathbf{M}_{k} is given as

𝐌k=(𝐈n𝟎1ηkrk𝒟Φ(𝐱¯k)𝟎𝐈m1ηkrk𝒟Ψ(𝐲¯k)𝟎𝟎𝐈p),\mathbf{M}_{k}=\left(\begin{array}[]{ccc}\mathbf{I}_{n}&\mathbf{0}&-\frac{1}{\eta_{k}r_{k}}\mathcal{D}\Phi(\bar{\mathbf{x}}^{k})^{\top}\\[5.69046pt] \mathbf{0}&\mathbf{I}_{m}&-\frac{1}{\eta_{k}r_{k}}\mathcal{D}\Psi(\bar{\mathbf{y}}^{k})^{\top}\\[5.69046pt] \mathbf{0}&\mathbf{0}&\mathbf{I}_{p}\end{array}\right),

where $\mathcal{D}\Phi(\mathbf{x})=[\nabla\phi_{1},\cdots,\nabla\phi_{p}]^{\top}\in\mathbb{R}^{p\times n}$ and $\mathcal{D}\Psi(\mathbf{y})=[\nabla\psi_{1},\cdots,\nabla\psi_{p}]^{\top}\in\mathbb{R}^{p\times m}$. For $\mu>1$ and $k\geq 1$, we set $\mathsf{R}(\mathbf{u}^{k})=\|\mathcal{D}\Phi(\mathbf{x}^{k})\|^{2}+\|\mathcal{D}\Psi(\mathbf{y}^{k})\|^{2}$ and choose the regularization parameters as

rk=1ηk𝖱(𝐮k),sk=μ𝖱(𝐮¯k)ηk𝖱(𝐮k).r_{k}=\frac{1}{\eta_{k}}\sqrt{\mathsf{R}(\mathbf{u}^{k})},\quad s_{k}=\frac{\mu\mathsf{R}(\bar{\mathbf{u}}^{k})}{\eta_{k}\sqrt{\mathsf{R}(\mathbf{u}^{k})}}.

To ensure that the sequences $\{r_{k}\}$ and $\{s_{k}\}$ are non-increasing, the scaling factor $\eta_{k}$ should satisfy

ηkmax{ηk1𝖱(𝐮k)𝖱(𝐮k1),ηk1𝖱(𝐮¯k)𝖱(𝐮k1)𝖱(𝐮¯k1)𝖱(𝐮k)}.\eta_{k}\geq\max\left\{\eta_{k-1}\cdot\sqrt{\frac{\mathsf{R}(\mathbf{u}^{k})}{\mathsf{R}(\mathbf{u}^{k-1})}},\quad\eta_{k-1}\cdot\frac{\mathsf{R}(\bar{\mathbf{u}}^{k})\sqrt{\mathsf{R}(\mathbf{u}^{k-1})}}{\mathsf{R}(\bar{\mathbf{u}}^{k-1})\sqrt{\mathsf{R}(\mathbf{u}^{k})}}\right\}.

5.3 Parameter Setting

The dimensionality of the vectors $\mathbf{x}$ and $\mathbf{y}$ is set to $n=300$ and $m=300$, respectively, while the dimension $q$ of the matrices $\mathbf{W}_{i}$, $\mathbf{V}_{i}$ and the vectors $\mathbf{a}_{i}$, $\mathbf{c}_{i}$ is set to 400. The number of constraints $p$ is set to 20. To ensure reproducibility, a random seed of 0 is used. The matrices $\mathbf{W}_{0}\in\mathbb{R}^{q\times n}$ and $\mathbf{V}_{0}\in\mathbb{R}^{q\times m}$ are generated by drawing elements from a standard normal distribution and scaling them by 1. The vectors $\mathbf{a}_{0}\in\mathbb{R}^{q}$ and $\mathbf{c}_{0}\in\mathbb{R}^{q}$ are drawn from the same distribution and scaled by 12. For each constraint $i=1,\dots,p$, the matrices $\mathbf{W}_{i}\in\mathbb{R}^{q\times n}$ and $\mathbf{V}_{i}\in\mathbb{R}^{q\times m}$ are generated by drawing elements from a standard normal distribution, and the corresponding vectors $\mathbf{a}_{i}\in\mathbb{R}^{q}$ and $\mathbf{c}_{i}\in\mathbb{R}^{q}$ are scaled by 0.1. The constraint bounds $\pi_{i}$ are uniformly set to 500,000 for the single-variable problem and 1,000,000 for the separable-variable problem. The tolerance is set to $10^{-9}$. The values of $\alpha$ and $\beta$ are both set to 2. In addition, the delta objective value is defined as the iterative error $\textsf{abs}(f(\mathbf{x}^{k})-f(\mathbf{x}^{k+1}))$.
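A sketch of this data-generation recipe is given below; the specific random-number generator and the order of the draws are our assumptions, while the dimensions, scalings, seed, bounds, and tolerance follow the text.

import numpy as np

n, m, q, p = 300, 300, 400, 20
rng = np.random.default_rng(0)            # seed 0 for reproducibility (RNG choice is ours)

W0 = rng.standard_normal((q, n)) * 1.0    # objective blocks, scaled by 1
V0 = rng.standard_normal((q, m)) * 1.0
a0 = rng.standard_normal(q) * 12.0        # objective vectors, scaled by 12
c0 = rng.standard_normal(q) * 12.0
W = rng.standard_normal((p, q, n))        # constraint matrices
V = rng.standard_normal((p, q, m))
a = rng.standard_normal((p, q)) * 0.1     # constraint vectors, scaled by 0.1
c = rng.standard_normal((p, q)) * 0.1

pi_single = np.full(p, 5e5)               # bound for the single-variable QCQP
pi_separable = np.full(p, 1e6)            # bound for the separable-variable QCQP
tol = 1e-9                                # tolerance on abs(f(x^k) - f(x^{k+1}))
alpha = beta = 2.0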

Figure 1: Performance of the Spice method for single-variable QCQP problems with different $\rho(t)$ functions. (a) Objective function value versus the number of iterations; (b) delta objective value versus the number of iterations.

Figure 2: Performance of the Spice method for separable-variable QCQP problems with different $\rho(t)$ functions. (a) Objective function value versus the number of iterations; (b) delta objective value versus the number of iterations.

5.4 Results Analysis

In Figure 1, the performance of the Spice method for single-variable QCQP problems is evaluated. In subplot (a), we show the objective function value versus the number of iterations. The different curve legends represent four different settings of the ρ(t)\rho(t) function: ρ(t)=1\rho(t)=1, ρ(t)=(t+1)α\rho(t)=(t+1)^{\alpha}, ρ(t)=eβt\rho(t)=e^{\beta t}, and ρ(t)=(t+1)t+1\rho(t)=(t+1)^{t+1}. The curves show a steady decrease in the objective function values as the number of iterations increases, with ρ(t)=(t+1)t+1\rho(t)=(t+1)^{t+1} converging faster than the others, particularly within the first 10 iterations. The other settings for ρ(t)\rho(t) demonstrate a slower but consistent convergence behavior. Subplot (b) indicates the delta objective value, plotted on a logarithmic scale, which provides more insight into the convergence rate. Again, ρ(t)=(t+1)t+1\rho(t)=(t+1)^{t+1} demonstrates a quicker decline in the delta objective value, indicating faster convergence. The results show that the choice of ρ(t)\rho(t) significantly impacts the convergence rate, with more aggressive functions leading to faster reductions in both the objective and delta objective values.

In Figure 2, the Spice method’s performance is extended to separable-variable QCQP problems. Subplot (a) shows the objective function values across iterations, where the same ρ(t)\rho(t) functions are used. As expected, ρ(t)=(t+1)t+1\rho(t)=(t+1)^{t+1} exhibits the fastest convergence, closely followed by ρ(t)=eβt\rho(t)=e^{\beta t}, while the constant function ρ(t)=1\rho(t)=1 leads to a much slower decrease in the objective value. Subplot (b) shows the delta objective value on a logarithmic scale, revealing a similar trend where ρ(t)=(t+1)t+1\rho(t)=(t+1)^{t+1} outperforms the others in reducing the objective error more rapidly. The overall conclusion from these figures is that the adaptive choices for ρ(t)\rho(t) can significantly accelerate convergence, with functions that grow with iterations providing more efficient performance in both single-variable and separable-variable QCQP problems.

Table 2: Number of iterations: PC and Spice methods

  Dimensions            Single-variable QCQP                          Separable-variable QCQP
  n     m     p         PC       Spice: ρ(t)=1   Spice: ρ(t)=e^{βt}   PC       Spice: ρ(t)=1   Spice: ρ(t)=e^{βt}
  100   100   10        546      31              8                    836      33              8
  300   300   10        12462    49              10                   24068    53              10
  100   100   20        745      32              8                    1129     35              9
  300   300   20        16890    51              10                   33206    54              11

Table 2 compares the number of iterations required by the PC method and the Spice method for both single-variable and separable-variable QCQP problems across different problem dimensions. The dimensions nn, mm, and pp refer to the problem size, with larger values representing more complex instances. For single-variable QCQP problems, the PC method consistently requires a significantly larger number of iterations than the Spice method, regardless of the ρ(t)\rho(t) function used. When ρ(t)=1\rho(t)=1, the number of iterations needed by the Spice method is greatly reduced, and when ρ(t)=eβt\rho(t)=e^{\beta t}, the iteration count decreases even further. For instance, in the case where n=100n=100, m=100m=100, and p=10p=10, the PC method requires 546 iterations, while the Spice method with ρ(t)=eβt\rho(t)=e^{\beta t} only takes 8 iterations, demonstrating the significant improvement in efficiency. Similarly, for separable-variable QCQP problems, the Spice method outperforms the PC method regarding the number of iterations. The trend persists as the problem dimensions increase, with the Spice method maintaining a more efficient convergence pattern, particularly with the ρ(t)=eβt\rho(t)=e^{\beta t} function. For example, in the case of n=300n=300, m=300m=300, and p=20p=20, the PC method requires 33,206 iterations, whereas the Spice method with ρ(t)=eβt\rho(t)=e^{\beta t} converges in just 11 iterations. This highlights the dramatic improvement in scalability offered by the Spice method with adaptive ρ(t)\rho(t). This table illustrates the superior convergence performance of the Spice method, especially when utilizing the exponential ρ(t)\rho(t) function, which achieves faster convergence and reduces the number of iterations by several orders of magnitude compared to the PC method.

6 Conclusion

This paper introduced a novel scaling technique to adjust the weights of the objective and constraint functions. Based on this technique, the Spice method was designed to achieve a free convergence rate, addressing limitations inherent in traditional PC approaches. Additionally, the Spice method was extended to handle separable-variable nonlinear convex problems. Theoretical analysis, supported by numerical experiments, demonstrated that varying the scaling factors for the objective and constraint functions results in flexible convergence rates. These findings underscore the practical efficacy of the Spice method, providing a robust framework for solving a wider range of nonlinear convex optimization problems.

References

  • [1] K. J. Arrow, L. Hurwicz, H. Uzawa, Studies in linear and nonlinear programming, Stanford University Press, Stanford, 1958.
  • [2] M. Zhu, T. F. Chan, An efficient primal-dual hybrid gradient algorithm for total variation image restoration, in: UCLA CAM Report Series, Los Angeles, CA, 2008, p. 08–34.
  • [3] B. He, S. Xu, X. Yuan, On convergence of the Arrow–Hurwicz method for saddle point problems, J. Math. Imaging Vis. 64 (6) (2022) 662–671. doi:10.1007/s10851-022-01089-9.
  • [4] A. Nedić, A. Ozdaglar, Subgradient methods for saddle-point problems, J. Optimiz. Theory App. 142 (1) (2009) 205–228. doi:10.1007/s10957-009-9522-7.
  • [5] J. Rasch, A. Chambolle, Inexact first-order primal-dual algorithms, Comput. Optim. Appl. 76 (2) (2020) 381–430. doi:10.1007/s10589-020-00186-y.
  • [6] A. Chambolle, T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging, J. Math. Imaging Vis. 40 (1) (2011) 120–145. doi:10.1007/s10851-010-0251-1.
  • [7] A. Chambolle, T. Pock, On the ergodic convergence rates of a first-order primal-dual algorithm, Math. Program. 159 (1) (2016) 253–287. doi:10.1007/s10107-015-0957-3.
  • [8] T. Valkonen, A primal-dual hybrid gradient method for nonlinear operators with applications to MRI, Inverse Probl. 30 (5) (May 2014). doi:10.1088/0266-5611/30/5/055012.
  • [9] Y. Gao, W. Zhang, An alternative extrapolation scheme of PDHGM for saddle point problem with nonlinear function, Comput. Optim. Appl. 85 (1) (2023) 263–291. doi:10.1007/s10589-023-00453-8.
  • [10] X. Cai, G. Gu, B. He, X. Yuan, A proximal point algorithm revisit on the alternating direction method of multipliers, Science China Mathematics 56 (10) (2013) 2179–2186. doi:10.1007/s11425-013-4683-0.
  • [11] G. Gu, B. He, X. Yuan, Customized proximal point algorithms for linearly constrained convex minimization and saddle-point problems: A unified approach, Comput. Optim. Appl. 59 (1) (2014) 135–161. doi:10.1007/s10589-013-9616-x.
  • [12] B. He, X. Yuan, W. Zhang, A customized proximal point algorithm for convex minimization with linear constraints, Comput. Optim. Appl. 56 (3) (2013) 559–572. doi:10.1007/s10589-013-9564-5.
  • [13] H.-M. Chen, X.-J. Cai, L.-L. Xu, Approximate customized proximal point algorithms for separable convex optimization, J. Oper. Res. Soc. China 11 (2) (2023) 383–408. doi:10.1007/s40305-022-00412-w.
  • [14] B. Jiang, Z. Peng, K. Deng, Two new customized proximal point algorithms without relaxation for linearly constrained convex optimization, Bull. Iranian Math. Soc. 46 (3) (2020) 865–892. doi:10.1007/s41980-019-00298-0.
  • [15] B. He, PPA-like contraction methods for convex optimization: A framework using variational inequality approach, J. Oper. Res. Soc. China 3 (4) (2015) 391–420. doi:10.1007/s40305-015-0108-9.
  • [16] B. He, F. Ma, X. Yuan, An algorithmic framework of generalized primal-dual hybrid gradient methods for saddle point problems, J. Math. Imaging Vis. 58 (2) (2017) 279–293. doi:10.1007/s10851-017-0709-5.
  • [17] S. Wang, Y. Gong, Nonlinear convex optimization: From relaxed proximal point algorithm to prediction correction method, Optimization, under review (2024).
  • [18] S. Wang, Y. Gong, A generalized primal-dual correction method for saddle-point problems with the nonlinear coupling operator, Int. J. Control Autom. Syst., accepted (2024).
  • [19] B. He, F. Ma, S. Xu, X. Yuan, A generalized primal-dual algorithm with improved convergence condition for saddle point problems, SIAM J. Imaging Sci. 15 (3) (2022) 1157–1183. doi:10.1137/21M1453463.