A Bregman inertial forward-reflected-backward method for nonconvex minimization
Abstract
We propose a Bregman inertial forward-reflected-backward (BiFRB) method for nonconvex composite problems. Our analysis relies on a novel approach that imposes general conditions on implicit merit function parameters, which yields a stepsize condition that is independent of inertial parameters. In turn, a question of Malitsky and Tam regarding whether FRB can be equipped with a Nesterov-type acceleration is resolved. Assuming the generalized concave Kurdyka-Łojasiewicz property of a quadratic regularization of the objective, we obtain sequential convergence of BiFRB, as well as convergence rates for both the function value and the actual sequence. We also present formulae for the Bregman subproblem, supplementing not only BiFRB but also the work of Boţ-Csetnek-László and Boţ-Csetnek. Numerical simulations are conducted to evaluate the performance of our proposed algorithm.
2010 Mathematics Subject Classification: Primary 90C26, 49J52; Secondary 26D10.
Keywords: Generalized concave Kurdyka-Łojasiewicz property, Bregman proximal mapping, forward-reflected-backward splitting, implicit merit function, nonconvex optimization, inertial effect.
1 Introduction
Consider the inclusion problem for maximally monotone operators:
(1) 0 ∈ Ax + Bx,
where A and B are (maximally) monotone operators with B being Lipschitz continuous. The behavior of splitting methods for solving (1) is well understood; see, e.g., the recent forward-reflected-backward method by Malitsky and Tam [22], and [3, Chapters 26, 28], which covers classical algorithms such as the forward-backward method [14] with B being cocoercive, the Douglas-Rachford method [19, 15], and Tseng’s method [29].
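For later reference, with a constant stepsize λ > 0 and the resolvent J_{λA} = (Id + λA)^{-1}, the forward-reflected-backward (FRB) update of Malitsky and Tam [22] for (1) can be written in its standard form:

```latex
x_{k+1} \;=\; J_{\lambda A}\bigl( x_k - \lambda B x_k - \lambda\,( B x_k - B x_{k-1}) \bigr)
        \;=\; J_{\lambda A}\bigl( x_k - 2\lambda B x_k + \lambda B x_{k-1} \bigr).
```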
The goal of this paper is to propose a Bregman inertial forward-reflected-backward method (Algorithm 3.2) for solving the nonconvex composite problem
(2) min_x f(x) + g(x),
where f is proper lower semicontinuous (lsc) and g has a Lipschitz continuous gradient; this corresponds to (1) with A = ∂f and B = ∇g when f and g are both convex. Informally, the proposed algorithm computes
(3) |
for some stepsize, inertial parameter, and kernel. In particular, (3) reduces to the inertial forward-reflected-backward method studied in [22, Corollary 4.4] when both functions are convex and the inertial parameter is fixed. Before stating our main contribution, let us note that several splitting algorithms have been extended to the nonconvex setting for solving (2); see, e.g., the Douglas-Rachford method [18], the forward-backward method [2, 8], inertial forward-backward methods [27, 10], the Peaceman-Rachford method [17], an inertial Tseng's method [9], and the forward-reflected-backward method [31].
Let us summarize the main contributions of this paper. Our convergence analysis relies on the generalized concave Kurdyka-Łojasiewicz property (Definition 2.2) and a novel framework for analyzing a sequence of implicit merit functions (Definition 3.10) through a general condition (Assumption 3.12) on the merit function parameters. In turn, we derive global sequential convergence to a stationary point with an explicit stepsize condition that is independent of inertial parameters, whereas stepsize assumptions in the current literature are entangled with inertial parameters or are even implicit; see Remark 4.3(i). Notably, this independence result resolves a question of Malitsky and Tam [22, Section 7] regarding whether FRB can be adapted to incorporate a Nesterov-type acceleration; see Remark 4.3(ii). Convergence rate analysis for both the function value and the actual sequence is also carried out. Moreover, we provide several formulae for the associated Bregman subproblem, which to the best of our knowledge are the first of their kind among existing results devoted to Bregman extensions of splitting algorithms with the same kernel; see, e.g., [9, 10].
The paper is organized as follows. We begin with notation and background in Section 2. The proposed algorithm and its abstract convergence are studied in Section 3. Then, in Section 4, we present suitable stepsize rules under which the abstract convergence holds. Function value and sequential convergence rate results are discussed in Section 5, followed by Bregman subproblem formulae in Section 6. Numerical simulations are conducted in Section 7. We end this paper in Section 8 with remarks on our contribution and possible directions for future research.
2 Notation and preliminaries
Throughout this paper, is the standard Euclidean space equipped with inner product and the Euclidean norm for . Let and . The open ball centered at with radius is denoted by . The distance function of a subset is ,
where if . For and , we set . We say a function is coercive if . A proper, lsc function is prox-bounded if there exists such that is bounded below; see, e.g., [28, Exercise 1.24]. The supremum of the set of all such is the threshold of prox-boundedness for . The indicator function of a set is if and otherwise.
Definition 2.1
Let be a proper function and a nonempty closed set. We say that
(i) is a Fréchet subgradient of at , denoted by , if
(4) |
(ii) is a limiting subgradient of at , denoted by , if
(5) |
where . Moreover, we set . We say that is a stationary point, if .
(iii) The Fréchet and limiting normal cones to at are and , respectively, if ; otherwise .
If is a nonempty convex set, then ; see, e.g., [28].
The proximal mapping of a proper and lsc with parameter is
which is the resolvent with when is convex; see, e.g., [3]. The Bregman distance induced by a differentiable is
For , denote by the class of functions satisfying the following conditions: (i) is right-continuous at with ; (ii) is strictly increasing on . The following concept will be the key in our convergence analysis.
Definition 2.2
[30] Let be proper and lsc. Let and , and let be a nonempty subset.
(i) We say that has the pointwise generalized concave Kurdyka-Łojasiewicz (KL) property at , if there exist neighborhood , and concave , such that for all ,
(6) |
where denotes the left derivative. Moreover, is a generalized concave KL function if it has the generalized concave KL property at every .
(ii) Suppose that on . We say has the setwise generalized concave Kurdyka-Łojasiewicz property on (we shall omit the adjectives “pointwise” and “setwise” whenever there is no ambiguity) if there exist , and concave such that for every ,
(7) |
Generalized concave KL functions are ubiquitous. The class of semialgebraic functions is a particularly rich source. We refer the interested reader to [7, 1, 6] for more examples; see also the foundational work of Łojasiewicz [20] and Kurdyka [16].
Definition 2.3
(i) A set is called semialgebraic if there exist finitely many polynomials such that
(ii) A function is called semialgebraic if its graph is semialgebraic.
Fact 2.4
[6, Corollary 16] Let be a proper and lsc function and let . If is semialgebraic, then it has the concave KL property at with for some and .
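In Fact 2.4 the desingularizing function takes the classical Łojasiewicz form: for some constant c > 0 and exponent θ ∈ [0, 1),

```latex
\varphi(s) \;=\; c\, s^{1-\theta},
\qquad\text{so that the KL inequality reads}\qquad
\varphi'\bigl(f(x) - f(\bar x)\bigr)\,\operatorname{dist}\bigl(0, \partial f(x)\bigr) \;\ge\; 1 .
```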
3 The BiFRB method and its abstract convergence
3.1 The BiFRB method
In the remainder of this paper, suppose that the following hold:
Assumption 3.1
(i) is proper, lsc and prox-bounded with threshold .
(ii) has Lipschitz continuous gradient with constant .
(iii) is -strongly convex and has Lipschitz continuous gradient with constant .
(iv) The objective is bounded below.
We now propose the Bregman inertial forward-reflected-backward (BiFRB) algorithm.
Algorithm 3.2 (BiFRB)
1. Initialization: Pick . Let . Choose .
2. For , choose and . Compute
(8) | |||
(9) |
The Bregman distance is comparable to the squared Euclidean distance, as the next lemma shows.
Lemma 3.3
[3, Exercise 17.5, Theorem 18.15] Let be -strongly convex and have a Lipschitz continuous gradient with constant . Then
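In standard form, for a σ-strongly convex kernel φ with L-Lipschitz continuous gradient, the bounds of Lemma 3.3 read:

```latex
\frac{\sigma}{2}\,\|x - y\|^2 \;\le\; D_\phi(x, y) \;\le\; \frac{L}{2}\,\|x - y\|^2
\qquad \text{for all } x, y \in \mathbb{R}^n .
```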
Assumption 3.1(iii) has been used in the literature to obtain non-Euclidean extensions of splitting algorithms; see, e.g., [9, 10]. Recall that the Fenchel conjugate of is and the Moreau envelope of with parameter is
Below is a systematic way to construct kernels via the Fenchel conjugate and Moreau envelope.
Proposition 3.4
Let and let be proper, lsc and convex. Define . Then the following hold:
(i) [3, Corollary 18.18] has -Lipschitz gradient if and only if is the Moreau envelope with parameter of a proper lsc and convex function . Therefore, in particular, if is -strongly convex then has -Lipschitz gradient.
(ii) If has -Lipschitz gradient, then is -strongly convex with -Lipschitz gradient.
Example 3.5
Let . Then satisfies Assumption 3.1(iii) with and .
Proof. Define . Then is -strongly convex; see, e.g., [5, Section 4.4, Example 5.29]. Applying Proposition 3.4 completes the proof.
We also assume the following throughout the paper, under which the proposed algorithm is well-defined.
Assumption 3.6
Suppose that one of the following holds:
(i) The kernel has strong convexity modulus .
(ii) Function has prox-threshold .
Remark 3.7
The threshold when is convex or bounded below.
Proposition 3.8
Let and suppose that Assumption 3.6 holds. Then for , the function is coercive. Consequently, the set
Proof. We claim that is prox-bounded with for an arbitrary . Indeed,
Thus invoking [28, Exercise 1.24] proves the claim.
By our claim, is prox-bounded with threshold . Let be such that . Then under Assumption 3.6. Therefore
as , where the infimum in the last inequality is finite thanks to prox-boundedness of , which completes the proof.
Setting the kernel gives Algorithm 3.9, which is an inertial forward-reflected-backward method (iFRB).
Algorithm 3.9 (iFRB)
1. Initialization: Pick . Let . Choose .
2. For , choose and . Compute
Convergence of iFRB with fixed stepsize and inertial parameter was studied in [22] in the presence of convexity. However, its behavior in the nonconvex setting is not yet understood; this gap is covered by our analysis.
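To make the Euclidean scheme concrete, here is a minimal numerical sketch of an inertial forward-reflected-backward iteration. The update rule x_{k+1} = prox_{λf}(x_k + β(x_k − x_{k−1}) − λ(2∇g(x_k) − ∇g(x_{k−1}))) is our reading of the scheme, and the toy problem, stepsize, and inertial parameter are illustrative choices rather than the paper's tuned rules.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal mapping of t * ||.||_1 (componentwise soft-threshold).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ifrb(grad_g, prox_f, x0, lam, beta, iters=500):
    """Inertial forward-reflected-backward sketch:
    x_{k+1} = prox_{lam*f}( x_k + beta*(x_k - x_{k-1})
                            - lam*(2*grad_g(x_k) - grad_g(x_{k-1})) ).
    """
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(iters):
        y = x + beta * (x - x_prev) - lam * (2.0 * grad_g(x) - grad_g(x_prev))
        x_prev, x = x, prox_f(y, lam)
    return x

# Toy problem: min_x 0.5*||x - b||^2 + mu*||x||_1,
# whose unique solution is soft_threshold(b, mu).
b = np.array([3.0, -0.5, 1.5, 0.2])
mu = 1.0
x = ifrb(grad_g=lambda x: x - b,
         prox_f=lambda v, lam: soft_threshold(v, lam * mu),
         x0=np.zeros(4), lam=0.4, beta=0.2)
print(np.round(x, 4))  # close to the soft-threshold solution [2, 0, 0.5, 0]
```

Here λ = 0.4 respects the classical FRB requirement λ < 1/(2L) for this smooth part (L = 1); a rigorous choice for BiFRB should follow the stepsize rules of Section 4.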
3.2 BiFRB merit function and its parameter rules
In the remainder of this paper, unless specified otherwise, let and define for
(10) | ||||
(11) |
where is a stepsize sequence of BiFRB; is a sequence of inertial parameters; and are parameters specified in Assumption 3.1. The following concept plays a central role in our analysis.
Definition 3.10
Let . Define by
Then we say that is a quadratic merit function with parameter . Let be given by (10). Then is the BiFRB merit function. If for some , then is a dominating BiFRB merit function.
Remark 3.11
Despite the recursion (10), we shall soon see that and are indeed implicit in our analysis. Imposing the following novel condition on and , one obtains abstract convergence of BiFRB, which in turn yields less restrictive stepsize rules; see Section 4 and in particular Remark 4.3.
Assumption 3.12 (general rules of BiFRB merit function parameter)
(i) and .
(ii) There exists such that for all .
3.3 Abstract function value convergence
In this section, we study function value convergence of BiFRB under Assumption 3.12; see Section 4 for concrete stepsize rules under which Assumption 3.12 holds.
Lemma 3.14
Let be a sequence generated by BiFRB and define for . Then the following hold:
(i) For all ,
(12) |
(ii) Suppose that Assumption 3.12(i) holds. Then the sequence is decreasing and . Consequently, is convergent and .
Proof. (i) By using the iterative scheme (9),
(13) |
where the last inequality is implied by Lemma 3.3. Invoking the descent lemma [3, Lemma 2.64] to yields
(14) |
(15) |
Next we estimate
Thus, inequality (15) further implies that
(16) |
Adding to the above inequality gives
where the first inequality follows from the identity .
(ii) Assume without loss of generality that and for . Thus, by statement (i) and Assumption 3.1, the sequence is decreasing and bounded below, and therefore convergent. Summing inequality (12) from to , we have
which together with the assumption that implies that .
The following lemma is an immediate consequence of subdifferential sum rules [28, Exercise 8.8, Proposition 10.5].
Lemma 3.15
Let . Then for every ,
The lemma below provides a lower bound on the gap between consecutive iterates.
Lemma 3.16
Let be a sequence generated by BiFRB, for , and . For , define ,
Set
and
Then and .
Proof. Invoking (9), one has
which implies that . It then follows from Lemma 3.15 that . Moreover,
Then
as desired.
Having established basic properties of , we now show its convergence. Denote by the set of all limit points of .
Theorem 3.17
Let be a sequence generated by BiFRB and define for . Suppose that Assumption 3.12 holds and is bounded. Let be a subsequence such that for some as . Then the following hold:
(i) We have . In fact, . Consequently , where is given in Assumption 3.12(ii).
(ii) The limit point satisfies and , which implies that .
(iii) The set is nonempty, connected, and compact, on which the function is constant . Moreover, .
Proof. (i) For every , appealing to the iterative scheme (9), we have
(17) |
Recall that as . Then Lemma 3.14(ii) and the triangle inequality imply that and
where the first inequality holds due to (8). Taking the above limits into account, replacing the index by in (3.3) and passing to the limit, one gets
where the last inequality holds thanks to the lower semicontinuity of , which implies that . Note that the sequence is convergent. Then one concludes that
where due to Assumption 3.12(ii) and Lemma 3.14(ii). Finally, it is easy to see that as under Assumption 3.12(ii). Thus .
(ii) By Lemma 3.14(ii) and the triangle inequality, the subsequences and have the same limit. Therefore . Combining Lemma 3.16 with , Lemma 3.14(ii), statement (i), and the outer semicontinuity of , we conclude that , which further implies .
(iii) Apply a similar proof as in [31, Theorem 3.6(iii)].
The result below provides a sufficient condition for the crucial boundedness assumption in Theorem 3.17.
Theorem 3.18
Let be a sequence generated by BiFRB and define for . Suppose that Assumption 3.12(i) holds. If is coercive (or level-bounded), then the sequence is bounded, and so is .
Proof. Assume without loss of generality that and for . Then, appealing to Lemma 3.14, one has for . Therefore
If were unbounded, then the above inequality would contradict the coercivity (or level-boundedness) of .
3.4 Abstract sequential convergence
In this section, we establish global sequential convergence of BiFRB to a stationary point by using the generalized concave KL property (recall Definition 2.2). For and nonempty set , we define .
Lemma 3.19
[30, Lemma 4.4] Let be proper lsc and let . Let be a nonempty compact set on which for all . The following statements hold:
(i) Suppose that satisfies the pointwise generalized concave KL property at each . Then there exist and such that has the setwise generalized concave KL property on with respect to , and .
(ii) Set and define by
Then the function ,
and , is well-defined and belongs to . The function has the setwise generalized concave KL property on with respect to , and . Moreover,
We say is the exact modulus of the setwise generalized concave KL property of on with respect to and .
Theorem 3.20 (global convergence of BiFRB)
Let be a sequence generated by BiFRB and define for . Suppose that Assumption 3.12 holds and is bounded. Suppose further that the dominating BiFRB merit function (recall Definition 3.10) has the generalized concave KL property on . Then the following hold:
(i) The sequence is Cauchy and has finite length. To be specific, there exist index , and such that for ,
(18) |
where
and is the exact modulus associated with the setwise generalized concave KL property of on with respect to and .
(ii) The sequence has finite length and converges to some with .
Proof. By the boundedness assumption, assume without loss of generality that . Then Theorem 3.17 implies that and as . By Lemma 3.14, assume without loss of generality that is decreasing. Consider two cases:
Case 1: Suppose that there exists such that . Then Lemma 3.14(i) implies that and . The desired result then follows by a simple induction.
Case 2: Assume that for all . By Theorem 3.17(iii), the function is constant on . Then Lemma 3.19 yields and such that has the setwise generalized concave KL property on with respect to and and the associated exact modulus . Appealing to Theorem 3.17 again, there exists such that for and such that for all . Put . Then for ,
where the first inequality holds because and the last one follows from the generalized concave KL inequality (recall Definition 2.2). Invoking Lemma 3.16 yields
(19) |
where . For simplicity, define
By the concavity of and (19), one has
(20) |
Applying Lemma 3.14 to (3.4) yields
where . On one hand, by Assumption 3.12(i), assume without loss of generality that for all . On the other hand, satisfies
Then and
(21) |
Pick . Summing (21) from to an arbitrary yields
implying that
from which (18) readily follows. Finally, notice that
(22) |
Recall from Theorem 3.17(i) and Lemma 3.14(ii) that and as . Thus, passing the index to infinity, we conclude from inequality (22) that is Cauchy. Invoking Theorem 3.17(ii) completes the proof.
(ii) The statement follows from the definition of and Theorem 3.17(ii).
The generalized concave KL assumption in Theorem 3.20 can be easily satisfied by semialgebraic functions.
Corollary 3.21
Let be a sequence generated by BiFRB. Assume that is bounded and that Assumption 3.12 holds. Suppose further that both and are semialgebraic. Then converges to some with and has the finite length property.
4 Stepsize strategies
Having established abstract convergence properties of BiFRB, we now develop conditions under which the key Assumption 3.12 holds.
To furnish Assumption 3.12, it suffices to establish the existence of suitable under desirable stepsize rules. Unlike most of the literature, which emphasizes explicit merit function parameters [9, 10, 31, 27], our and are implicit. This turns out to be instrumental in designing less restrictive stepsize rules. In the remainder of this section, unless specified otherwise,
(23) |
Consequently, the sequence and given by (10) and (11) satisfy
(24) |
The lemma below asserts that and are governed by stepsizes and an initial input , which will be useful soon.
Lemma 4.1
Let be a sequence of nonzero real numbers and let . Set and define as in (24). Then for ,
where we adopt the convention for any sequence .
Proof. Let . Then
and
from which the desired identities hold.
4.1 Non-Euclidean case
Now we provide a fixed stepsize rule for Algorithm 3.2. In contrast to similar stepsize results in the literature, we shall see that Assumption 3.12 furnishes explicit stepsize bounds that are independent of inertial parameters. In turn, we answer a question of Malitsky and Tam [22, Section 7]; see Remark 4.3.
Proposition 4.2 (fixed stepsize)
To furnish Assumption 3.12, we will prove that
(27) | |||
(28) | |||
(29) |
Combining inequalities (27)–(29) yields the existence of some such that
which together with (25) and (26) entails Assumption 3.12. First, we establish (27). Observe that
By assumption, and . Then the quadratic function has no root and is therefore strictly positive, implying (27). Next we justify (28), which amounts to . Notice that the discriminant and product of roots . Then the quadratic function has two real roots with opposite signs. In turn,
furnishing inequality (28). Finally, the assumption implies (29) after rearranging.
Remark 4.3
(i) (Comparison to known results) Compared to a result of Boţ and Csetnek [9, Lemma 3.3] with explicit merit function parameters but implicit stepsize bounds, our stepsize is explicit and independent of inertial parameters; see [10, Remark 7] for a similarly restrictive result and [22, Corollary 4.4] for a result, in the presence of convexity, whose stepsize bound and inertial parameter depend on each other; see also [25, Table 1].
(ii) (Regarding a question of Malitsky and Tam) In [22, Section 7], Malitsky and Tam posed a question regarding whether FRB can be adapted to incorporate a Nesterov-type acceleration. With an inertial parameter independent of the stepsize in Proposition 4.2, as opposed to a fixed inertial parameter dependent on the stepsize in [22, Corollary 4.4], this question is resolved. Indeed, the Nesterov acceleration scheme corresponds to Proposition 4.2 with
where and ; see, e.g., [24].
(iii) The assumption that is not demanding. For instance, let be such that and let be a function with -Lipschitz gradient. Then satisfies .
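For concreteness, the classical Nesterov inertial sequence mentioned above can be generated as follows. The recursion t_{k+1} = (1 + √(1 + 4t_k²))/2 with β_k = (t_k − 1)/t_{k+1} is the standard scheme; whether it matches the parameters displayed in Remark 4.3(ii) exactly is an assumption here.

```python
import math

def nesterov_betas(n):
    # Standard Nesterov sequence: t_0 = 1,
    # t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2,  beta_k = (t_k - 1) / t_{k+1}.
    t = 1.0
    betas = []
    for _ in range(n):
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        betas.append((t - 1.0) / t_next)
        t = t_next
    return betas

# The inertial parameters start at 0 and increase monotonically toward 1.
print([round(b, 4) for b in nesterov_betas(5)])
```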
4.2 Euclidean case
Considering kernel , we now turn to stepsize strategies of Algorithm 3.9.
Proposition 4.4 (dynamic stepsize)
Let be an iFRB stepsize sequence and let be the sequence generated by (24) with , and for some . Let be a sequence of positive real numbers with and let . Suppose that the following hold:
(i) The stepsize lower bound and upper bound .
(ii) For all , .
Proof. First notice that assumption (i) is not contradictory because . Then, by assumption (i), we have
which implies that there exists such that
(30) |
According to Lemma 4.1 and assumption (ii),
which together with (30) shows that is positive and bounded above. By assumption (ii), for all . Consequently, the sequence is convergent and . It follows from the above inequalities and (30) that
Hence .
The following corollary is immediate; together with Lemma 3.14, it provides an inertial variant of [31, Lemma 3.2].
Corollary 4.5 (fixed stepsize)
Remark 4.6
One recovers the stepsize requirement in [31, Lemma 3.2] by setting .
5 Convergence rates
Having established global convergence results, we now turn to convergence rates of both the function value and the actual sequence. To this end, the following lemma will be useful.
Lemma 5.1
[11, Lemma 10] Let be a decreasing sequence converging to . Assume further that there exist natural numbers such that for every ,
(31) |
where is some constant and . Then the following hold:
(i) If , then converges in finite steps.
(ii) If , then there exists such that for every ,
(iii) If , then there exists such that for every ,
Recall that a generalized concave KL function is said to have KL exponent if an associated desingularizing function can be chosen as for some . We now provide the function value convergence rate of BiFRB under a KL exponent assumption.
Theorem 5.2 (function value convergence rate)
Let be a sequence generated by BiFRB and define for . Suppose that all assumptions in Theorem 3.20 are satisfied and let be the limit given in Theorem 3.20(ii). Suppose further that the dominating BiFRB merit function (recall Definition 3.10) has the generalized concave KL property at with KL exponent . Define for . Then converges to and the following hold:
(i) If , then converges in finite steps.
(ii) If , then there exist and such that for sufficiently large,
(iii) If , then there exists such that for sufficiently large,
Proof. Recall from Lemma 3.14(ii) and Theorem 3.17(i) that the sequence is decreasing and converges to . By the KL exponent assumption, there exist and such that for
Assume without loss of generality that . So Lemmas 3.14 and 3.16 imply
where
Applying Lemma 5.1 completes the proof.
We now develop sequential convergence rate based on Theorem 5.2.
Theorem 5.3 (sequential convergence rate)
Let be a sequence generated by BiFRB and define for . Suppose that all assumptions in Theorem 3.20 are satisfied and let be the limit given in Theorem 3.20(ii). Suppose further that the dominating BiFRB merit function (recall Definition 3.10) has the generalized concave KL property at with KL exponent . Then the following hold:
(i) If , then converges in finite steps.
(ii) If , then there exist and such that for sufficiently large
(iii) If , then there exist such that for sufficiently large
Proof. For simplicity, define for
Then and . Without loss of generality (recall Lemma 3.14), assume that and for . To obtain the desired results, it suffices to estimate .
Assume without loss of generality that . Invoking Lemma 3.14(i) and Theorem 3.17(i) yields that , which in turn implies that
(32) |
where . Moreover, recall from Theorem 3.20(i) that there exists such that for
(33) |
where the last inequality holds because . Assume without loss of generality that is associated with desingularizing due to the KL exponent assumption. Thus, combined with (32), inequality (33) yields
(34) |
Case 1 (): Appealing to Theorem 5.2, the sequence converges to in finite steps. Thus (34) implies in finite steps, so does .
Case 3 (): Note that . Then appealing to (34) and Theorem 5.2 yields for sufficiently large
from which the desired result readily follows.
Example 5.4 (nonconvex feasibility problem)
Let be a nonempty, closed and convex set and let be nonempty and closed. Suppose that and either or is compact. Assume further that both and are semialgebraic. Consider the minimization problem (2) with and . Let be a sequence generated by BiFRB. Suppose that Assumption 3.12 holds. Then the following hold:
(i) There exists such that and .
(ii) Suppose additionally that
(35) |
Then . Moreover, there exist and such that
for sufficiently large.
Proof. (i) By the compactness assumption, the function is coercive. Hence Theorem 3.18 implies that is bounded. Since the sets are assumed semialgebraic, so are the functions and ; see [1, Section 4.3] and [2, Lemma 2.3]. Then apply Corollary 3.21.
(ii) Taking the fact that into account and applying the subdifferential sum rule [28, Exercise 8.8], one gets
The constraint qualification (35) then implies that and consequently . The set is assumed to be closed, thus . Then a direct application of [13, Theorem 5] guarantees that has KL exponent at . The desired result follows immediately from Theorem 5.3.
6 Bregman proximal subproblem formulae
Set . Fix and . In the remainder of this section, let
Define for , and
(36) | ||||
(37) |
Clearly Assumption 3.6(ii) holds with and , and the Bregman proximal subproblem (9) corresponds to (36) with , and . Note that (36) covers the subproblems in [9, Algorithm 3.1] and [10, Algorithm 1] as well.
In this section, we develop formulae of with the kernel chosen as above, supplementing not only Algorithm 3.2 but also [9, Algorithm 3.1] and [10, Algorithm 1].
6.1 Formula of -constrained problems
Let for an integer and a real number , where with . In this subsection, we consider the minimization problem
which is (2) with . Recall that the Hard-threshold of with parameter , denoted by , is given by
(38) |
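One common definition of the hard-threshold operator, sketched below, keeps the s largest-magnitude entries and zeros out the rest (a projection onto s-sparse vectors); treat this particular definition as an assumption rather than a verbatim transcription of (38).

```python
import numpy as np

def hard_threshold(v, s):
    # Keep the s largest-magnitude entries of v and zero out the rest
    # (a projection onto the set of s-sparse vectors; ties broken arbitrarily).
    out = np.zeros_like(v)
    if s <= 0:
        return out
    idx = np.argpartition(np.abs(v), -s)[-s:]
    out[idx] = v[idx]
    return out

print(hard_threshold(np.array([0.5, -3.0, 2.0, 0.1]), 2))  # [ 0. -3.  2.  0.]
```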
Immediately, for . The following lemma will be instrumental.
Lemma 6.1
Following a route similar to [8, Proposition 5.2], we now prove the desired formula.
Proposition 6.2 (subproblem formula)
Let , where for an integer and a real number . Define
(39) |
where is the unique solution of
(40) |
which satisfies if ; otherwise is the unique solution of
Then .
Proof. Set for simplicity. If , then and clearly . Now we consider , in which case . Invoking Lemma 6.1, one has for
(41) |
attained at
(42) |
It is easy to see that optimal value (6.1) and solution (42) still hold when .
Next we show that . By (37), is the solution set to
(43) |
Consider an arbitrary feasible point of (43) and let . Then and
where the first and last inequalities hold due to (40) and (6.1) respectively.
Finally, we turn to (40). Noticing the strong convexity of and applying the optimality condition, one gets
So
and
which never occurs when . If , then and , meaning that there exists a unique such that , due to the strong convexity of .
Remark 6.3
When , in Proposition 6.2 is indeed one of the four roots of quartic equation
Although quartic equations admit closed-form solutions, it is virtually impossible to determine beforehand the branch of solutions to which belongs, due to the parameters , and . To find efficiently, one may employ numerical root-finding algorithms. For instance, apply the bisection method to identify a subinterval of in which the desired root lies, then appeal to Newton’s method; see, e.g., [12, Chapter 2].
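The bisection-then-Newton strategy just described can be sketched generically as follows; the quartic used here is an illustrative stand-in, not the specific equation (40).

```python
def bisect_then_newton(f, fprime, lo, hi, bisect_steps=20, newton_steps=10):
    # Bisection narrows a sign-changing bracket [lo, hi]; Newton's method
    # then polishes the root starting from the bracket midpoint.
    assert f(lo) * f(hi) < 0, "need a sign change on [lo, hi]"
    for _ in range(bisect_steps):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    x = 0.5 * (lo + hi)
    for _ in range(newton_steps):
        x -= f(x) / fprime(x)
    return x

# Illustrative quartic with a single root in (1, 2): t^4 - t - 3 = 0.
root = bisect_then_newton(lambda t: t**4 - t - 3,
                          lambda t: 4 * t**3 - 1, 1.0, 2.0)
print(1.0 < root < 2.0)  # True
```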
6.2 Formula of problems penalized by a positively homogeneous convex function
Recall that is positively homogeneous if for every and . In this subsection, we provide subproblem formula for problems of the form
where is convex and positively homogeneous with known proximal mapping, which is a special case of problem (2).
The result below generalizes the technique of Bolte-Sabach-Teboulle-Vaisbourd [8, Proposition 5.1]. However, one cannot recover their result, as our kernel is different.
Proposition 6.4
Let be proper, lsc and convex. Suppose that is positively homogeneous. Then the following hold:
(i) .
(ii) (subproblem formula) , where is the unique root of the equation
Proof. (i) .
(ii) For simplicity, let . Note that is single-valued by strong convexity. Then invoking the optimality condition and the subdifferential sum rule yields
Define and . Then and the above inclusion together with statement(i) implies that
In turn, satisfies
If , then clearly as claimed; if , then and , which is consistent with statement (ii). To see the uniqueness, let for , which is strongly convex. Simple calculus shows that is a stationary point of , and thus unique.
Remark 6.5
Equipped with Proposition 6.4, we now turn to concrete problems. Recall that the Soft-threshold of with parameter is , which satisfies
see, e.g., [5, Example 6.8].
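The soft-threshold formula recalled above is easy to validate numerically; the snippet below checks, on a fine grid, that it minimizes the scalar prox objective x ↦ ½(x − v)² + t|x| (the grid check is purely illustrative).

```python
import numpy as np

def soft_threshold(v, t):
    # Componentwise soft-threshold: the proximal mapping of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Sanity check: for each scalar v, soft_threshold(v, t) should match the
# grid minimizer of x -> 0.5*(x - v)^2 + t*|x|.
t = 0.7
grid = np.linspace(-3.0, 3.0, 60001)  # spacing 1e-4
for v in [-2.0, -0.3, 0.0, 1.2]:
    obj = 0.5 * (grid - v) ** 2 + t * np.abs(grid)
    best = grid[np.argmin(obj)]
    assert abs(best - soft_threshold(np.array([v]), t)[0]) < 1e-3
print("soft-threshold matches the grid minimizer")
```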
Corollary 6.6 (-penalized problem)
Let . Then , where is the unique solution of .
Corollary 6.7 (-penalized problem)
Let . Then , where
and is the unique solution of .
Remark 6.8
The projection admits a closed-form formula; see, e.g., [5, Example 6.33].
7 Numerical experiments
In the remainder, let , , , and . Define and . Clearly is semialgebraic. The set can be represented as
which is an intersection of semialgebraic sets, see, e.g., [4, Formula 27(d)], and is thus semialgebraic. Consider
(44) |
which corresponds to (2) with and . In turn, Example 5.4 ensures that BiFRB and its Euclidean variant iFRB converge to a stationary point of (44).
We shall benchmark BiFRB and iFRB against the Douglas-Rachford (DR) [18], forward-reflected-backward (FRB) [31, 22], and inertial Tseng’s (iTseng) [9] methods with and . These splitting methods are known to converge globally to a stationary point of (2) under the same assumptions on ; see [18, Theorems 1–2, Remark 4, Corollary 1], [31, Theorem 3.9] and [9, Theorem 3.1], respectively.
BiFRB, iFRB and FRB are implemented with stepsize rules given by Proposition 4.2, Corollary 4.5 and [31, Theorem 3.9] respectively, and terminated when
(45) |
Moreover, BiFRB uses the kernel . We apply DR with the stepsize and termination condition given by [18, Section 5] with the same tolerance , while iTseng employs the stepsize given by [9, Lemma 3.3] and termination condition (45). As for the inertial parameter, we use for BiFRB and for both iFRB and iTseng. Finally, each algorithm is initialized at the origin and equipped with the stepsize heuristic described in [18, Remark 4].
Simulation results are presented in Table 1 below. Our problem data are generated by creating random matrices with entries drawn from the standard Gaussian distribution. For each problem size , we randomly generate 50 instances and report the ceiling of the average number of iterations (iter) and the minimal objective value at termination (). As problem (44) has optimal value zero, indicates whether an algorithm actually hits a global minimizer rather than merely converging to a stationary point. We observed the following:
- BiFRB is the most advantageous method on “bad” problems (small ), but it is not robust.
- DR tends to have the smallest function value at termination; however, it can converge very slowly.
- In most cases, iFRB has the smallest number of iterations with fair function values at termination.
8 Conclusions
We proved convergence of BiFRB (Algorithm 3.2) by using the generalized concave KL property and a general framework for establishing the descent property. The resulting stepsize condition is independent of the inertial parameter, and is less restrictive than those of other algorithms in the same setting. In turn, a question of Malitsky and Tam [22, Section 7] is answered. We believe that our approach could be useful for designing more implementable stepsize rules for other splitting algorithms in the absence of convexity. We also conducted convergence rate analysis on both the function value and the actual sequence. Formulae for our Bregman subproblems are provided as well, supplementing not only BiFRB but also [9, 10]. To end this paper, let us present several directions for future work:
- Another interesting question is to explore why BiFRB demonstrates inconsistent performance as the problem size varies; see Section 7.
- Now that the range of admissible inertial parameters has been extended (recall Proposition 4.2), it would be interesting to see whether there exists an optimal acceleration scheme.
| Size | | BiFRB | | iFRB | | FRB | | DR | | iTseng | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| m | n | iter | obj | iter | obj | iter | obj | iter | obj | iter | obj |
| 100 | 4000 | 50 | 0.03251 | 93 | 0.03251 | 631 | 0.03929 | 860 | 0.02819 | 1367 | 0.03149 |
| 100 | 5000 | 52 | 0.02413 | 115 | 0.02413 | 733 | 0.02875 | 684 | 0.02192 | 1632 | 0.02268 |
| 100 | 6000 | 57 | 0.01369 | 145 | 0.01369 | 911 | 0.01794 | 683 | 0.01253 | 1970 | 0.0133 |
| 200 | 4000 | 870 | 0.2923 | 39 | 0.2745 | 190 | 0.2769 | 5466 | 0.2711 | 448 | 0.2746 |
| 200 | 5000 | 1198 | 0.2367 | 41 | 0.2095 | 231 | 0.2157 | 5864 | 0.2073 | 546 | 0.2097 |
| 200 | 6000 | 81 | 0.2306 | 43 | 0.2259 | 257 | 0.2292 | 4199 | 0.2236 | 620 | 0.2264 |
| 300 | 4000 | 885 | 1.051 | 32 | 1.036 | 105 | 1.038 | 5497 | 1.036 | 253 | 1.048 |
| 300 | 5000 | 925 | 0.7481 | 34 | 0.7044 | 124 | 0.7071 | 5673 | 0.7032 | 302 | 0.7051 |
| 300 | 6000 | 1407 | 0.583 | 36 | 0.5546 | 144 | 0.555 | 6173 | 0.5541 | 352 | 0.5566 |
| Size | | BiFRB | | iFRB | | FRB | | DR | | iTseng | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| m | n | iter | obj | iter | obj | iter | obj | iter | obj | iter | obj |
| | | 1873 | 0.006609 | 1210 | 0.00365 | 7376 | 0.00816 | 2194 | 4e-21 | 10001 | 0.01671 |
| | | 574 | 0.00278 | 1427 | 0.002244 | 8675 | 1.098e-05 | 2919 | 1.626e-20 | 10001 | 0.0158 |
| | | 790 | 0.003827 | 1696 | 0.001261 | 9394 | 0.0002952 | 2344 | 3.438e-20 | 10001 | 0.01066 |
| | | 5842 | 2.992 | 750 | 2.442e-18 | 8223 | 0.007688 | 1038 | 1.106e-20 | 9797 | 0.1094 |
| | | 5938 | 0.6547 | 859 | 2.876e-05 | 7257 | 0.02581 | 1242 | 2.082e-20 | 9997 | 0.07125 |
| | | 5666 | 0.917 | 1086 | 0.0001211 | 7312 | 0.006726 | 1487 | 9.129e-21 | 10001 | 0.07524 |
| | | 7164 | 3.322 | 619 | 2.196e-18 | 4620 | 1.462e-05 | 753 | 9.795e-21 | 7722 | 0.2417 |
| | | 5267 | 2.299 | 693 | 2.175e-18 | 6352 | 0.02936 | 880 | 5.446e-20 | 9194 | 0.1702 |
| | | 5724 | 5.06 | 750 | 4.71e-18 | 8644 | 0.1109 | 1027 | 2.175e-20 | 9823 | 0.1423 |
Acknowledgments
XW and ZW were partially supported by NSERC Discovery Grants.
References
- [1] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality, Mathematics of Operations Research, 35 (2010), pp. 438–457.
- [2] H. Attouch, J. Bolte, and B. F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss–Seidel methods, Mathematical Programming, 137 (2013), pp. 91–129.
- [3] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, Cham, 2017.
- [4] H. H. Bauschke, D. R. Luke, H. M. Phan, and X. Wang, Restricted normal cones and sparsity optimization with affine constraints, Foundations of Computational Mathematics, 14 (2014), pp. 63–83.
- [5] A. Beck, First-order Methods in Optimization, SIAM, 2017.
- [6] J. Bolte, A. Daniilidis, A. Lewis, and M. Shiota, Clarke subgradients of stratifiable functions, SIAM Journal on Optimization, 18 (2007), pp. 556–572.
- [7] J. Bolte, A. Daniilidis, O. Ley, and L. Mazet, Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity, Transactions of the American Mathematical Society, 362 (2010), pp. 3319–3363.
- [8] J. Bolte, S. Sabach, M. Teboulle, and Y. Vaisbourd, First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems, SIAM Journal on Optimization, 28 (2018), pp. 2131–2151.
- [9] R. I. Boţ and E. R. Csetnek, An inertial Tseng’s type proximal algorithm for nonsmooth and nonconvex optimization problems, Journal of Optimization Theory and Applications, 171 (2016), pp. 600–616.
- [10] R. I. Boţ, E. R. Csetnek, and S. C. László, An inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions, EURO Journal on Computational Optimization, 4 (2016), pp. 3–25.
- [11] R. I. Boţ and D.-K. Nguyen, The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates, Mathematics of Operations Research, 45 (2020), pp. 682–712.
- [12] R. Burden, D. Faires, and A. Burden, Numerical Analysis, Cengage Learning, 2014.
- [13] C. Chen, T. K. Pong, L. Tan, and L. Zeng, A difference-of-convex approach for split feasibility with applications to matrix factorizations and outlier detection, Journal of Global Optimization, 78 (2020), pp. 107–136.
- [14] P. L. Combettes and V. R. Wajs, Signal recovery by proximal forward-backward splitting, Multiscale Modeling & Simulation, 4 (2005), pp. 1168–1200.
- [15] J. Douglas and H. H. Rachford, On the numerical solution of heat conduction problems in two and three space variables, Transactions of the American Mathematical Society, 82 (1956), pp. 421–439.
- [16] K. Kurdyka, On gradients of functions definable in o-minimal structures, Annales de l’institut Fourier, 48 (1998), pp. 769–783.
- [17] G. Li, T. Liu, and T. K. Pong, Peaceman–Rachford splitting for a class of nonconvex optimization problems, Computational Optimization and Applications, 68 (2017), pp. 407–436.
- [18] G. Li and T. K. Pong, Douglas-Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems, Mathematical Programming, 159 (2016), pp. 371–401.
- [19] P.-L. Lions and B. Mercier, Splitting algorithms for the sum of two nonlinear operators, SIAM Journal on Numerical Analysis, 16 (1979), pp. 964–979.
- [20] S. Łojasiewicz, Une propriété topologique des sous-ensembles analytiques réels, Les équations aux dérivées partielles, 117 (1963), pp. 87–89.
- [21] R. Luss and M. Teboulle, Conditional gradient algorithms for rank-one matrix approximations with a sparsity constraint, SIAM Review, 55 (2013), pp. 65–98.
- [22] Y. Malitsky and M. K. Tam, A forward-backward splitting method for monotone inclusions without cocoercivity, SIAM Journal on Optimization, 30 (2020), pp. 1451–1472.
- [23] B. S. Mordukhovich, Variational Analysis and Generalized Differentiation I: Basic Theory, Springer-Verlag, Berlin, 2006.
- [24] Y. Nesterov, A method for solving the convex programming problem with convergence rate O(1/k^2), Dokl. Akad. Nauk SSSR, 269 (1983), pp. 543–547.
- [25] P. Ochs, Local convergence of the heavy-ball method and iPiano for non-convex optimization, Journal of Optimization Theory and Applications, 177 (2018), pp. 153–180.
- [26] P. Ochs, Unifying abstract inexact convergence theorems and block coordinate variable metric iPiano, SIAM Journal on Optimization, 29 (2019), pp. 541–570.
- [27] P. Ochs, Y. Chen, T. Brox, and T. Pock, iPiano: Inertial proximal algorithm for nonconvex optimization, SIAM Journal on Imaging Sciences, 7 (2014), pp. 1388–1419.
- [28] R. T. Rockafellar and R. J.-B. Wets, Variational Analysis, Springer-Verlag, Berlin, 1998.
- [29] P. Tseng, A modified forward-backward splitting method for maximal monotone mappings, SIAM Journal on Control and Optimization, 38 (2000), pp. 431–446.
- [30] X. Wang and Z. Wang, The exact modulus of the generalized Kurdyka-Łojasiewicz property, Mathematics of Operations Research (2022), https://doi.org/10.1287/moor.2021.1227.
- [31] X. Wang and Z. Wang, Malitsky-Tam forward-reflected-backward splitting method for nonconvex minimization problems, Computational Optimization and Applications, 82 (2022), pp. 441–463.