A distributed proximal splitting method with linesearch for locally Lipschitz data
Abstract
In this paper, we propose a distributed first-order algorithm with backtracking linesearch for solving multi-agent minimisation problems, where each agent handles a local objective involving nonsmooth and smooth components. Unlike existing methods that require global Lipschitz continuity and predefined stepsizes, our algorithm adjusts stepsizes using distributed linesearch procedures, making it suitable for problems where global constants are unavailable or difficult to compute. The proposed algorithm is designed within an abstract linesearch framework for a primal-dual proximal-gradient method to solve min-max convex-concave problems, enabling the consensus constraint to be decoupled from the optimisation task. Our theoretical analysis allows for gradients of functions to be locally Lipschitz continuous, relaxing the prevalent assumption of globally Lipschitz continuous gradients.
Keywords:
Backtracking linesearch, distributed optimisation, locally Lipschitz data, min-max problems, primal-dual algorithm, proximal splitting algorithm.
Mathematics Subject Classification (MSC 2020):
90C25, 68W15, 49M27, 65K05.
1 Introduction
In this work, we consider a network of agents, connected by a communication graph, working collaboratively to solve
(1)   $\displaystyle \min_{x \in \mathbb{R}^d} \ \sum_{i=1}^{n} \bigl( f_i(x) + g_i(x) \bigr),$
where each agent possesses its own components of the objective function and . Here, is a proper lower semicontinuous convex function that may represent constraints or nonsmooth components of the problem, while is a differentiable convex function with locally Lipschitz continuous gradient capturing smooth components in the objective. Each agent has the ability to perform local computations using their respective and , and the agents communicate via their immediate neighbours in the network to exchange information.
Distributed optimisation problems fitting the form of (1) arise in numerous applications including signal processing, statistics and machine learning [15, 19, 17, 18]. Several methods have been proposed to address problems of this type; however, they typically assume that smooth components in the objective function have globally Lipschitz continuous gradients [23, 24, 11], leading to convergence guarantees involving stepsize bounds that depend on the inverse of the (global) Lipschitz modulus [6, 24, 11]. In cases where such constants are unavailable, for instance, when the problem formulation does not have globally Lipschitz gradients, or when the Lipschitz constant exists but cannot be computed accurately, defining appropriate stepsizes poses a challenge. Consequently, alternative strategies for determining the stepsizes, such as via a backtracking linesearch [2, 3], are required.
In order to solve (1), we present linesearch procedures for a distributed first-order algorithm. In the proposed method, each component of the objective function is handled separately, and each iteration utilises a backtracking linesearch involving communication, only allowing each agent to communicate with immediate neighbours and update local variables. For each individual agent, vector operations are performed in a decentralised manner, and the coordination operations to determine the stepsizes use distributed communication with only scalar values. Our approach builds on decentralised proximal-gradient methods, casting problem (1) into the more general framework of min-max problems and addressing the consensus constraint in the dual space. Primal-dual methods to solve min-max problems have been extensively studied in the literature, see [5, 9, 10, 12, 14] and references therein.
Let us recall that a proximal-gradient exact first-order algorithm (PG-EXTRA) is proposed in [24] to solve (1). This algorithm performs rounds of communications for the iterates in each iteration, using a common constant stepsize among agents. By denoting the set of agents directly connected to agent in the network and a matrix that represents the topology of the network and models communication among agents, called a mixing matrix (see Definition 2.3), the main iteration of PG-EXTRA can be formulated as follows. Given a fixed stepsize , for each and , compute
(2) |
This algorithm converges to a solution to (1) as long as each has globally Lipschitz continuous gradients with constant , and the stepsize satisfies a bound that depends on the minimum eigenvalue of and the Lipschitz constants:
However, computing this bound requires the smallest eigenvalue of the mixing matrix and the global Lipschitz constants of the gradients for each , two possibly expensive tasks. Moreover, the latter becomes unviable if at least one of the gradients is only locally Lipschitz continuous.
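To make the preceding discussion concrete, the following Python sketch simulates PG-EXTRA in its standard compact form on a toy instance. Everything here is our own illustrative choice rather than part of the text: quadratic smooth terms f_i(x) = (1/2)(x - a_i)^2, nonsmooth terms g_i(x) = lam*|x| handled by soft-thresholding, a Metropolis mixing matrix for a three-agent path graph, and a constant stepsize within the bound discussed above.

```python
import numpy as np

# Toy instance: agent i holds f_i(x) = 0.5*(x - a[i])**2 and g_i(x) = lam*|x|.
a = np.array([1.0, 2.0, 3.0])
lam = 0.3
n = 3

# Metropolis mixing matrix for the path graph 1-2-3 (symmetric, rows sum to one).
W = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])
W_tilde = (np.eye(n) + W) / 2

def grad(x):
    return x - a                      # stacked gradients of the f_i

def prox(z, g):
    # componentwise prox of g*lam*|.| (soft-thresholding)
    return np.sign(z) * np.maximum(np.abs(z) - g * lam, 0.0)

gamma = 0.5                           # constant stepsize for illustration
x_prev = np.zeros(n)
z = W @ x_prev - gamma * grad(x_prev) # first half-step of PG-EXTRA
x = prox(z, gamma)
for _ in range(3000):
    z = z + W @ x - W_tilde @ x_prev - gamma * (grad(x) - grad(x_prev))
    x_prev, x = x, prox(z, gamma)

print(x)
```

For this instance, the first-order optimality condition gives the consensus solution mean(a) - lam = 1.7, which all agents approach.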
To overcome the aforementioned challenge, we propose a distributed first-order algorithm that employs a linesearch to compute appropriate stepsizes via rounds of peer-to-peer communication. For a sequence of stepsizes computed via this linesearch, and a constant , the proposed algorithm takes the following form: for all and ,
(3) |
The relation between PG-EXTRA and the proposed algorithm is obtained by setting , and for each . This change of variables yields (2) with (see Remark 4.2 for further details). As we will see later, the alternative form used in (3) will be more convenient than (2) for the linesearch formulation and analysis.
Our contribution can be summarised as follows. We propose two linesearch procedures for a distributed proximal splitting method to solve the multi-agent minimisation problem (1). In other words, we propose backtracking linesearch routines for PG-EXTRA. Our derivation uses an abstract linesearch framework to define stepsizes for a primal-dual first-order splitting method for solving min-max convex-concave problems with a linear coupling term. We develop a convergence analysis for this splitting method that only requires locally Lipschitz continuous gradients; in this case, the linesearch estimates the local Lipschitz constant of the gradients. Furthermore, we extend this linesearch to allow approximation of the norm of the linear coupling term, giving more flexibility to the step length in each iteration.
The paper is organised as follows. Section 2 provides the preliminary concepts and materials used in this work. In Section 3, we propose a unifying linesearch framework for a primal-dual algorithm and analyse its convergence; this framework is related to the method with linesearch proposed in [13]. Building on this, Section 4 presents two distributed linesearch procedures to define stepsizes for a distributed proximal-gradient algorithm with guarantees of convergence to a solution of (1). Concluding remarks are given in Section 5.
2 Preliminaries
2.1 General definitions and notations
We assume throughout that is a finite-dimensional Hilbert space with inner product and induced norm . When , the notation represents the dot product given by for all . Similarly, when , denotes the trace inner product given by for all . For a symmetric matrix , and denote the smallest and largest eigenvalues of , respectively.
Let . The domain of is . We say that is proper if its domain is nonempty, lower semicontinuous (lsc) if, for all , , and convex if, for all and all ,
We say an operator is (globally) Lipschitz continuous on with constant if, for all ,
When , we simply say that is Lipschitz continuous.
Functions with Lipschitz gradients provide a powerful ingredient for convergence analysis of gradient-based methods, known as the descent lemma [1, Lemma 2.64]. This result states that the first-order approximation error can be bounded by a quadratic term when gradients are Lipschitz continuous.
Lemma 2.1 (Descent lemma).
Suppose $f$ is a differentiable function over an open convex set $C$, such that $\nabla f$ is Lipschitz continuous on $C$ with constant $L$. Then, for all $x, y \in C$,
$$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2}\|y - x\|^{2}.$$
We say that an operator is locally Lipschitz continuous on a set if, for each point of the set, there exists an open set containing it on which the operator is Lipschitz continuous. The gradient of any twice continuously differentiable function is locally Lipschitz continuous, since the Hessian is locally bounded. For locally Lipschitz continuous gradients in finite-dimensional spaces, Lemma 2.1 is therefore satisfied over bounded sets.
Given a proper lsc convex function $g$ and a stepsize $\gamma > 0$, the proximity operator of $g$ with stepsize $\gamma$ at $x$ is given by
$$\operatorname{prox}_{\gamma g}(x) = \operatorname*{arg\,min}_{u} \Bigl\{ g(u) + \tfrac{1}{2\gamma}\|u - x\|^{2} \Bigr\}.$$
For a nonempty closed convex set $C$, the indicator function $\iota_C$ of $C$ is given by $\iota_C(x) = 0$ if $x \in C$ and $\iota_C(x) = +\infty$ otherwise, and the projector onto $C$ is denoted by $P_C$.
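As simple illustrations of these definitions (our own standard examples, not taken from the text): the proximity operator of a scaled absolute value is the soft-thresholding map, and the proximity operator of an indicator function is the projection onto the set, independently of the stepsize.

```python
def prox_abs(x, gamma):
    """Prox of gamma*|.|: soft-thresholding, shrinking x towards zero by gamma."""
    if x > gamma:
        return x - gamma
    if x < -gamma:
        return x + gamma
    return 0.0

def prox_indicator_nonneg(x, gamma):
    """Prox of the indicator of [0, +inf): the projection, independent of gamma."""
    return max(x, 0.0)

print(prox_abs(3.0, 1.0), prox_indicator_nonneg(-2.0, 0.7))  # 2.0 0.0
```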
The following lemma is concerned with continuity properties of proximity operators. Here, the notation denotes the closure of a set .
Lemma 2.2 ([8, Corollary 7]).
Let be a proper lsc convex function. Then the operator given by
is continuous on its domain. Moreover, it is locally Lipschitz on the interior of its domain.
2.2 Distributed optimisation
We now turn our attention to notation and preliminaries for distributed optimisation. Following [23, 24, 22], given (column) vectors , we denote
For the functions in problem (1), we define and as
Under this definition, the proximity operator of with parameter and the gradient of are given by
In order to model communication between agents through the communication graph in (1), we recall the following definition (see, for example, [23]).
Definition 2.3 (Mixing matrix).
Given an undirected graph with , a matrix is called a mixing matrix if
- (i) $w_{ij} = 0$ whenever $i \neq j$ and $(i,j) \notin E$,
- (ii) $W = W^{\top}$,
- (iii) $W\mathbf{1} = \mathbf{1}$, where $\mathbf{1} = (1, \dots, 1)^{\top} \in \mathbb{R}^{n}$,
- (iv) $\lambda_{\min}(W) > -1$.
Definition 2.3(i)–(ii) ensures that communication is symmetric and only occurs between neighbours, while Definition 2.3(iii) is used to ensure consensus, and Definition 2.3(iv) is related to convergence. Note that, under the notation defined in this section, (2) can be combined over all agents to write PG-EXTRA compactly as
(4) |
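As an illustration of Definition 2.3 (our own construction; the text does not prescribe a particular mixing matrix), Metropolis weights give a simple recipe satisfying conditions (i)–(iv) for any connected undirected graph: the positive diagonal entries keep all eigenvalues strictly above -1 by Gershgorin's theorem. The sketch below builds such a matrix for a four-agent cycle and checks the conditions numerically.

```python
import numpy as np

def metropolis_mixing_matrix(n, edges):
    """Build a mixing matrix from Metropolis weights for an undirected graph."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    W = np.zeros((n, n))
    for i, j in edges:
        W[i, j] = W[j, i] = 1.0 / (1 + max(deg[i], deg[j]))
    for i in range(n):
        W[i, i] = 1.0 - W[i].sum()   # make each row sum to one
    return W

W = metropolis_mixing_matrix(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
eig = np.linalg.eigvalsh(W)
# Definition 2.3: neighbour-sparse, symmetric, row sums one, smallest eigenvalue > -1.
print(np.allclose(W, W.T), np.allclose(W.sum(axis=1), 1.0), eig.min() > -1)
```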
As discussed in [14], PG-EXTRA can be understood as the Condat–Vũ method [7, 25] applied to a reformulation of (1) as a saddle-point problem (see also Section 4). In what follows, we recall the derivation of this reformulation. For each agent , let denote its copy of the decision variable . Then problem (1) can be expressed as
(5)   $\displaystyle \min_{x_1, \dots, x_n \in \mathbb{R}^d} \ \sum_{i=1}^{n} \bigl( f_i(x_i) + g_i(x_i) \bigr) \quad \text{subject to} \quad x_1 = x_2 = \dots = x_n.$
Let so that . Then if and only if , in which case we say is in consensus. Hence, problem (5) becomes
(6) |
Since is proper lsc convex with , we have . Substituting this into (6) therefore gives
(7) |
Altogether, this shows that the multi-agent minimisation problem (1) can be reformulated as a saddle-point problem (7) having a linear coupling term related to consensus among agents. Note that, in this context, can be computed from the mixing matrix . The structure of the latter problem (7) motivates our study of min-max problems in the following section.
3 Proximal splitting algorithm with linesearch
Consider the following convex-concave min-max problem
(8) |
where is a (bounded) linear operator between two finite-dimensional Hilbert spaces, and are proper lsc convex functions, and is convex differentiable with locally Lipschitz continuous gradient. We assume that is known and that there exists a saddle point for problem (8), that is, there exists a pair satisfying
(9) |
Conditions for existence of saddle points for convex-concave problems can be found, for example, in [20, 21]. Note that (9) can be equivalently expressed as the following pair of inequalities
(10) |
The two quantities above define the primal-dual gap. Given a saddle point of problem (8), the primal-dual gap is given by
(11) |
Assuming is globally Lipschitz continuous (with constant ) and knowledge of , the Condat–Vũ method [7, 25] can be used to solve (8). Given stepsizes satisfying , this method takes the form
Note that, in the special case where , this method reduces to the primal-dual hybrid gradient (PDHG) method from [5]. To handle the case where useful estimates of both and are unavailable, a version of the Condat–Vũ method with a backtracking linesearch was developed in [13, Section 4]. However, this approach is not directly suited to our setting for two reasons: firstly, it still requires global Lipschitz continuity of (even though the Lipschitz constant is not needed) and, secondly, it cannot exploit knowledge of even when it is available. In the distributed setting of Section 4, the latter is related to the minimum eigenvalue of the mixing matrix and, as such, is typically assumed to be known.
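The primal-dual updates just described can be sketched on a toy instance. The example below is entirely our own: a one-dimensional total-variation problem min over x of (1/2)||x - b||^2 + lam*||Kx||_1, with K a forward-difference matrix, the primal prox equal to the identity (no nonsmooth primal term), and the dual prox equal to the projection onto [-lam, lam]; the stepsizes are chosen to satisfy the standard Condat–Vũ bound.

```python
import numpy as np

b = np.array([0.0, 0.0, 5.0, 5.0])
lam = 6.0                                    # large enough to force a constant solution
m = len(b)
K = np.zeros((m - 1, m))                     # forward differences: (Kx)_i = x[i+1] - x[i]
for i in range(m - 1):
    K[i, i], K[i, i + 1] = -1.0, 1.0

tau, sigma = 0.4, 0.1                        # chosen so 1/tau - sigma*||K||^2 >= L/2 with L = 1
x = np.zeros(m)
y = np.zeros(m - 1)
for _ in range(20000):
    x_new = x - tau * ((x - b) + K.T @ y)    # primal: gradient step on 0.5*||x - b||^2
    y = np.clip(y + sigma * K @ (2 * x_new - x), -lam, lam)  # dual: projection onto [-lam, lam]
    x = x_new

print(x)
```

For this lam, the solution is the constant vector with value mean(b) = 2.5, so the iterates reach consensus across coordinates.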
Thus, in order to solve problem (8) in the setting important for distributed optimisation, in this section, we extend the analysis of the primal-dual algorithm with linesearch introduced in [13] under the assumption that is locally Lipschitz (but not necessarily globally Lipschitz) and that is known. Our analysis considers an abstract linesearch procedure satisfying Assumptions 3.1 and 3.3 (for use in Section 4).
3.1 Primal-dual algorithm with abstract linesearch
In Algorithm 1, we present our proposed method with an abstract linesearch procedure, denoted LS, which is assumed to satisfy Assumption 3.1. The main result of this section (Theorem 3.4) shows that the sequences generated by Algorithm 1 converge to a saddle point of (8). An implementation of the abstract linesearch is given in Subroutine 2.
Assumption 3.1 (Abstract linesearch).
Given a set of parameters param containing and , and some set , consider an abstract linesearch LS that takes the input
and returns
satisfying the following conditions.
-
(i)
.
-
(ii)
The set of parameters param in Assumption 3.1, besides including the constants and that appear in the descent condition (ii), may also involve other parameters that define internal operations in LS. We will define a specific set of parameters in Section 3.2.
(12) |
In order to prove convergence of Algorithm 1, motivated by [13, Theorem 3.4], we first show that the algorithm satisfies a sufficient descent condition and generates bounded sequences.
Lemma 3.2 (Sufficient decrease and boundedness of iterates).
Suppose that and are proper lsc convex functions, and is a differentiable convex function with locally Lipschitz continuous gradient. In addition, suppose Assumption 3.1 holds, and let be a saddle point of (8). Then, for the sequences , and generated by Algorithm 1, the following hold:
-
(i)
The sequence defined by
(13) satisfies the sufficient decrease condition
(14) -
(ii)
The sequences , and are bounded, and
-
(iii)
, , , and converge to .
Proof.
In the context of Algorithm 1, Assumption 3.1(ii) implies:
(15) |
In view of the update rule of in Algorithm 1 and [1, Proposition 12.26], we have
(16) |
In view of Assumption 3.1(ii), , and thus . This estimate, together with condition (15) and the convexity of , implies
(17) | ||||
Thus, combining (16) and (17) yields
(18) |
for all . With (18) established, the remainder of the proof closely follows [13, Theorem 3.4]. From the update rule for in Algorithm 1 and [1, Proposition 12.26], we have
(19) |
Using (19), the identity and the definition of , we deduce that
(20) | ||||
(21) |
for all . Using properties of the adjoint, the inner product terms on the last line of (21) can be written as
(22) | ||||
Let be a saddle point of problem (8), and take . Substituting (22) into (21), noting the definitions for and from (11), we obtain
Using in the last line and the law of cosines in the second line yields
From (12) and Assumption 3.1(i), , and thus . Then, satisfies the sufficient decrease condition (14). Hence, converges to . The sufficient decrease condition implies
Furthermore, since and converges, the sequences and are bounded, which in turn implies that is also bounded. ∎
Global convergence will follow from the previous results, as long as the sequence of stepsizes does not vanish. This fact determines the next assumption of the abstract linesearch.
Assumption 3.3.
For Algorithm 1, suppose the sequence generated by LS is separated from zero whenever the sequence is bounded.
Note that, as a consequence of (12) and Assumptions 3.1(i) and 3.3, since , the sequence is bounded and separated from zero.
Theorem 3.4.
Suppose that and are proper lsc convex functions, and is a differentiable convex function with locally Lipschitz continuous gradient. Suppose, in addition, Assumptions 3.1 and 3.3 hold, and there exists a saddle point of (8). Then, the sequence generated by Algorithm 1 converges to a saddle point of problem (8), and converges to .
Proof.
We first prove that any cluster point of is a saddle point. Let be a cluster point of which is bounded due to Lemma 3.2(ii). Then there exists a subsequence such that as . From (18) and (19), we have
for all . In view of Assumptions 3.1(i) and 3.3 and Lemma 3.2(iii), taking the limit as yields and , and thus for all . That is a saddle point for problem (8) then follows from (10). Furthermore, substituting with in the above set of inequalities and , taking the limit as , since both and are separated from zero, Lemma 3.2(iii) implies and , so that follows from (11). Finally, we prove that the whole sequence converges to . Since defined in (13) (monotonically) converges to some , then
Taking the limit along the subsequence as in the previous equation yields . Thus and as . ∎
Remark 3.5.
Some comments regarding the proof of the result above are in order.
- (i)
-
(ii)
The introduction of the parameter in Step 2 of Algorithm 1 is crucial to guarantee global convergence of the sequence. Otherwise, if , we would only be able to prove subsequential convergence to saddle points. Indeed, setting in (14) would prevent us from concluding that converges to , and thus we would only be able to guarantee that converges (but not necessarily to ). In this case, an alternative is to assume, in addition, that is continuous on its domain as in [13, Theorem 3.4].
3.2 A linesearch procedure for primal-dual splitting methods
In this section, we propose a natural implementation in Subroutine 2 of LS to satisfy Assumptions 3.1 and 3.3.
(23) |
Remark 3.6.
Some comments regarding the parameters in Subroutine 2 are in order.
-
(i)
The shrinking parameter is used in the backtracking linesearch to find a stepsize of adequate length to satisfy (15). As for , it is used to balance the primal and dual stepsizes. The parameter sequence and the parameter give flexibility to the stepsizes. The initialisation in (12) ensures that the stepsize sequence remains bounded, while possibly increasing the stepsize when larger values are allowed.
-
(ii)
Suppose that, in (12), for all , and that the linesearch LS loop terminates in a finite number of steps in each iteration. Then, the sequence of stepsizes is nonincreasing and bounded above. In this case, it suffices to initialise the outer loop with , and set for condition (12) to always hold. Furthermore, the stepsize sequence is also eventually constant, and hence our proposed method eventually reduces to [7, Algorithm 3.2]. To see this, suppose, to the contrary, that the stepsize sequence decreases an infinite number of times. By construction, each decrease must be at least by a factor of . This implies that converges to , a contradiction.
-
(iii)
Let us assume that after finitely many iterations of Algorithm 1, the stepsizes become constant and equal to . In view of (12), it holds that . Furthermore, the linesearch condition (23) suggests that an approximation of the equivalent of a Lipschitz constant is . Therefore, for , the condition on the stepsize of the Condat–Vũ method reads
Hence, if the linesearch condition eventually finds constant stepsizes, it implies the bounds on the stepsizes in [7, 25], as the right-hand side in the above estimate is strictly smaller than . Therefore, we retrieve convergence to saddle points for this special case.
If is globally Lipschitz continuous with constant , the inequality (15) holds whenever by virtue of Lemma 2.1. This estimate is valid after finitely many steps in the linesearch procedure, since is a decreasing sequence and hence the linesearch is well-defined for globally Lipschitz continuous gradients. The analogous result when is only locally Lipschitz continuous is presented in the following lemma.
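The backtracking mechanism can be illustrated in isolation with a plain proximal-gradient step (a sketch under our own assumptions, not the paper's primal-dual algorithm): here f(x) = x^4/4 has a gradient that is locally but not globally Lipschitz continuous, g is the indicator of [1, 3], and the stepsize is halved until the quadratic descent condition holds, effectively estimating the local Lipschitz modulus.

```python
def f(x):
    return 0.25 * x ** 4           # gradient x**3 is locally but not globally Lipschitz

def grad(x):
    return x ** 3

def prox_g(x):
    return min(max(x, 1.0), 3.0)   # g = indicator of [1, 3]; prox = projection

beta = 0.5                         # shrinking factor for the backtracking
gamma = 1.0
x = 3.0
for _ in range(200):
    while True:
        x_plus = prox_g(x - gamma * grad(x))
        step = x_plus - x
        # descent condition: quadratic upper bound with estimated curvature 1/gamma
        if f(x_plus) <= f(x) + grad(x) * step + step ** 2 / (2 * gamma):
            break
        gamma *= beta              # stepsize too large for the local curvature: shrink
    x = x_plus

print(x, gamma)  # converges to the constrained minimiser x = 1
```

Once the stepsize drops below the reciprocal of the local Lipschitz modulus along the trajectory, no further backtracking occurs, in line with the discussion above.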
Lemma 3.7.
Suppose that and are proper lsc convex functions, and is a differentiable convex function with locally Lipschitz continuous gradient. Then, the following hold:
-
(i)
For each , there exists such that
(24) where and .
-
(ii)
The linesearch in Subroutine 2 terminates in finitely many iterations with .
-
(iii)
The sequence defined in Algorithm 1 is bounded and separated from .
In particular, the implementation of LS in Subroutine 2 is well-defined and satisfies Assumptions 3.1 and 3.3.
Proof.
In order to prove (i), fix . In view of Lemma 2.2, there exists such that, for all , we have , where denotes the open ball centred at with radius . Moreover, as is locally Lipschitz continuous and is finite dimensional, is Lipschitz continuous on the open bounded convex set . Thus, according to Lemma 2.1, there exists such that
Taking yields (24). Since , there exists such that for all . This shows that the linesearch terminates after iterations. Furthermore, since , Assumption 3.1(i) holds with , proving (ii). By taking in (24), Assumption 3.1(ii) follows. Next, as a consequence of (12), is bounded above by . Suppose, by way of a contradiction, that is not separated from zero. Then there exists a decreasing subsequence with as . Set and set
Due to the linesearch procedure in Subroutine 2, we obtain
(25) |
By Lemma 3.2, the sequences and are bounded. Thus, appealing to Lemma 2.2, is bounded as the continuous image of a bounded sequence. Let denote any bounded open convex set containing . By applying Lemma 2.1, there exists such that
(26) |
Combining (25) and (26) yields the contradiction , and hence we conclude that is separated from zero. The fact that is also bounded above and separated from zero then follows, establishing (iii), from which Assumption 3.3 follows. ∎
Corollary 3.8 (Primal-dual convergence).
Proof.
Remark 3.9.
In the following, we comment on the above convergence result.
-
(i)
If the operator norm of is not available for use in (12), then it is instead possible to replace (12) with and modify the linesearch conditions in (23) with
(27) This corresponds to the linesearch proposed and analysed in [13, Algorithm 4] for globally Lipschitz . In the locally Lipschitz case, it is also possible to prove results analogous to Lemma 3.7(i)–(ii) and Lemma 3.2. Regarding Lemma 3.7(iii), since we modify the initialisation and the linesearch condition in (23), we can only guarantee that is separated from zero, which is enough to argue convergence to saddle points as in [13, Section 4].
- (ii)
In this section, we defined a concrete linesearch procedure for Algorithm 1 that leads to convergence to saddle points of the min-max convex-concave problem (8), by assuming that the norm of the operator is available. In the next section, we apply these ideas to obtain implementable distributed algorithms for multi-agent minimisation problems.
4 A distributed proximal-gradient algorithm with linesearch
In this section, we propose a distributed first-order method with linesearch based upon Algorithm 1 to solve the multi-agent minimisation problem (1). In doing so, the main question that arises is how to implement the linesearch in distributed settings. We propose two implementations that address this with differing amounts of communication.
4.1 Construction of the algorithm
Let be a mixing matrix, and set . As explained in Section 2, problem (1) can be formulated as
(28) |
Hence, problem (28) is in the form of problem (8) of Section 3 with
Applying Algorithm 1 to problem (28) gives
(29) |
where and is calculated using a linesearch. However, in this form, (29) is not suitable for a distributed implementation since need not be a mixing matrix (in particular, it need not satisfy Definition 2.3(i)). To address this, denote
(30) |
Premultiplying the first two equations in (29) by and substituting (30) yield
(31) |
Since this only involves the mixing matrix , (31) is suitable for distributed implementation. Due to (30), we also note that the initialisation for (31) requires that (e.g., ).
As a result of the discussion above, we present, in Algorithm 3, a distributed proximal-gradient method for finding solutions to problem (28), given an abstract distributed linesearch procedure LS that computes stepsizes satisfying
This method corresponds to Algorithm 1 applied to (28). In Section 4.2, we discuss two possible distributed implementations of LS.
(32) |
Note that the linesearch initialisation (12) for Algorithm 1 involves the operator norm of . In view of the definition of , this can be expressed in terms of the minimum eigenvalue of the mixing matrix :
This identity is used to obtain the initialisation rule for in (32) from (12).
Remark 4.1.
Some further comments regarding Algorithm 3 are in order. Although the initialisation is used in Algorithm 3 for convenience, it is actually possible to use such that . To see this, note that the derivation (30) only requires and, due to Definition 2.3(iii), we have
The linesearch procedure LS in Algorithm 3 is assumed to perform agent-independent proximal steps to define
Two possible ways to implement LS are described in Subroutines 4 and 5 assuming are locally Lipschitz continuous for each and that (an estimate of) is available.
Remark 4.2 (Equivalent forms of PG-EXTRA).
- (i)
-
(ii)
The initialisation of Algorithm 3 can recover the initialisation of PG-EXTRA used in [24]. Indeed, taking and any gives which implies . Since , we therefore have
Alternatively, taking any and setting gives which implies . We therefore obtain
which corresponds to the initialisation of PG-EXTRA discussed in [14, Remark 4.5].
4.2 Linesearch procedures in distributed settings
In this section, we propose two distributed linesearch techniques that can take the role of LS in Algorithm 3. Although the updates for the variables and in Algorithm 3 can be decentralised (when the stepsize is known), the proposed linesearch procedures internally require global communication of scalar values across all agents. More precisely, LS as defined in Subroutine 4 requires a method for computing a global (scalar) sum, and LS as defined in Subroutine 5 requires computing a minimum (scalar) value among agents. Both of these operations can be efficiently implemented, for instance, using the Message Passing Interface (MPI) [16]. See [4, Chapter 10] for more details. Since the aforementioned approaches only require communication of scalars across the network, their global communication costs are small compared to methods that need global communication of vector and matrix variables such as and .
Linesearch with global (scalar) summation.
The first distributed linesearch subroutine we propose is presented in Subroutine 4, and corresponds to a direct application of Subroutine 2. It locally computes independent proximal steps for each agent in (33), using one common stepsize. In each outer loop of the linesearch, a round of global communication is then used to compute the sum of the scalars , and the linesearch condition (34) is checked. Note that this check (for positivity of the sum) can either be performed by all agents after the global value has been returned, or performed by one agent followed by broadcasting the outcome to the other agents. Observe also that each computation of the summation (34) involves performing proximal steps (one per agent). Note that, when (34) holds, the current stepsize is too large and must be shrunk by the same amount for all agents before returning to Step 1. When (34) does not hold, the linesearch procedure terminates with all agents agreeing on a common stepsize.
(33) |
(34) |
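The control flow of this sum-based subroutine can be mimicked in plain Python. The sketch below is schematic and entirely our own: the agents hold made-up quadratics f_i(x) = (1/2) c_i x^2, the local "gap" stands in for the per-agent quantity summed in (34), and an ordinary sum stands in for the network reduction.

```python
curvatures = [1.0, 4.0, 10.0]   # stand-ins for the agents' local Lipschitz moduli
points = [2.0, -1.0, 0.5]       # current local iterates

def local_gap(c, x, gamma):
    """Descent-condition residual for a gradient step on f_i(x) = 0.5*c*x**2:
    f_i(x+) - f_i(x) - grad*step - step**2/(2*gamma); nonpositive means accept."""
    step = -gamma * c * x
    return (0.5 * c * (x + step) ** 2 - 0.5 * c * x ** 2
            - (c * x) * step - step ** 2 / (2 * gamma))

beta, gamma = 0.5, 1.0
while sum(local_gap(c, x, gamma) for c, x in zip(curvatures, points)) > 0:
    gamma *= beta               # positive global sum: all agents shrink in lockstep

print(gamma)
```

Note that the aggregated test can accept a stepsize even when one agent's local gap is still positive; only the sum matters.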
We also propose a linesearch that, instead of requiring the calculation of a decentralised sum of local values over the network, needs only the computation of the minimum (scalar) value among agents.
Linesearch with a global minimum.
The second distributed linesearch subroutine we propose is presented in Subroutine 5. Different from Subroutine 4, this is not a direct application of Subroutine 2, as it locally computes independent proximal steps for each agent in (35) using potentially different stepsizes between agents. In the outer loop (Steps 1 & 2), the linesearch condition (36) is tested locally by each agent, rather than globally as in Subroutine 4, and hence no global communication is needed in this part. In exchange, a common stepsize for all agents is computed by finding a global minimum across the network. This common stepsize is then used to recompute the proximal steps for each agent (if needed). Note that the global minimum only needs to be computed once each time LS is run.
(35) |
(36) |
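The min-based variant can be sketched in the same schematic fashion (again with made-up quadratic agents of our own choosing): each agent backtracks on its own descent gap without communicating, and a single min-reduction then fixes the common stepsize.

```python
curvatures = [1.0, 4.0, 10.0]   # stand-ins for the agents' local Lipschitz moduli
points = [2.0, -1.0, 0.5]       # current local iterates

def local_gap(c, x, gamma):
    """Descent-condition residual for a gradient step on f_i(x) = 0.5*c*x**2."""
    step = -gamma * c * x
    return (0.5 * c * (x + step) ** 2 - 0.5 * c * x ** 2
            - (c * x) * step - step ** 2 / (2 * gamma))

beta = 0.5
local_steps = []
for c, x in zip(curvatures, points):
    gamma_i = 1.0
    while local_gap(c, x, gamma_i) > 0:  # purely local test: no communication here
        gamma_i *= beta
    local_steps.append(gamma_i)

gamma = min(local_steps)        # one min-reduction across the network
print(local_steps, gamma)
```

The resulting common stepsize is valid for the worst agent, so it can be more conservative than one accepted by an aggregated test, which is the trade-off discussed in the remark below.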
Remark 4.3.
-
(i)
The rounds of communication in both algorithms are different in nature. On the one hand, in each outer loop of Subroutine 4, we require communication to compute the global sum in (34) and test the inequality therein. On the other hand, in Subroutine 5, we require communication to compute the minimum stepsize among agents only once per linesearch call.
-
(ii)
One advantage of Subroutine 4 is that it does not require the computation of additional proximal steps after communicating, as all agents share the same stepsize before entering a new inner linesearch iteration. However, this comes at the price of the global communication needed to check the linesearch condition in each outer loop. There is thus a trade-off between repeatedly checking a global sum, on the one hand, and recomputing proximal steps after computing the global minimum stepsize, on the other. The latter is the case in Subroutine 5, in which both the proximal steps and the test of the linesearch condition are performed locally; this may be more suitable whenever the proximal steps are cheap relative to the cost of computing global scalar sums.
4.3 Convergence analysis
In this section, we analyse convergence of Algorithm 3 with LS being either Subroutine 4 or 5, which will require the following lemma.
Lemma 4.4.
Proof.
Subroutine 4 is a distributed implementation of Subroutine 2; therefore, the result for this procedure follows from Lemma 3.7, and (37) follows from (15). Regarding Subroutine 5, using Lemma 3.7(i) with , for each and , there exists such that
(38) |
where . After finitely many backtracking linesearch steps, the search for agent terminates with . Since , this means that the linesearch terminates with , so Assumption 3.1(i) holds with . Aggregating (38) over all and taking yields Assumption 3.1(ii) and (37). Furthermore, in view of Lemma 3.2, is bounded. In order to check that is separated from , assume by way of a contradiction that there exists a subsequence as . From Step 2 of Subroutine 5, there exists such that . Applying Lemma 3.7(iii) with yields a contradiction, as Step 1 in Subroutine 5 corresponds to Subroutine 2 applied to agent . Hence, Assumption 3.3 holds. ∎
Since we have established that Algorithm 1 is well-defined and both Subroutines 4 and 5 satisfy Assumptions 3.1 and 3.3, we are ready to state the convergence result: the primal sequence converges to a solution in consensus, being asymptotically feasible.
Corollary 4.5 (Convergence to primal solutions).
Suppose that, for each , is a proper lsc convex function and is a differentiable convex function with locally Lipschitz continuous gradient, and that the set of solutions of (1) is nonempty. Then, the sequence generated by Algorithm 3, using the linesearch in Subroutine 4 or 5, converges to a point in consensus , such that any row of defines a solution to (1).
Proof.
Remark 4.6.
The initialisation of in (32) requires knowledge of the smallest eigenvalue of . Instead, we could also perform the linesearch in (27) and initialise . For this alternative linesearch, we need a distributed way to calculate the leftmost term on the left-hand side to make it implementable. More specifically, since , the extra term in the linesearch can be rewritten as
(39) |
Therefore, in each step of the linesearch we need to perform pre-multiplications by , which amounts to rounds of communication. In this way, we check (34) with
while (36) requires
Note that when both linesearch procedures terminate, they imply
These two options could be as costly as estimating separately, since they require a number of rounds of network communication equal to the number of linesearch steps per iteration. In this sense, one may instead reallocate these potentially costly per-iteration communication rounds to a separate preliminary problem that computes . The modifications of the linesearch procedures (33)–(34) and (35)–(36) by using (39) are also well-defined, and convergence of the resulting method follows from Remark 3.9(i).
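To make the communication cost discussed in this remark concrete, the following centralised simulation sketches how each pre-multiplication by the mixing matrix corresponds to one round of neighbour-to-neighbour exchange: performing `k` multiplications costs `k` communication rounds. The matrix `W` below is a hypothetical doubly stochastic mixing matrix for a three-agent path graph, chosen for illustration only and not prescribed by the paper.

```python
import numpy as np

def communication_round(W, X):
    # One multiplication by the mixing matrix W: every agent replaces its
    # local row of X by a weighted combination of its neighbours' rows.
    return W @ X

def apply_rounds(W, X, k):
    # k pre-multiplications by W cost k rounds of network communication.
    for _ in range(k):
        X = communication_round(W, X)
    return X

# Hypothetical doubly stochastic mixing matrix for a 3-agent path graph.
W = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]])
```

As a side effect of double stochasticity, repeated rounds drive the rows of `X` towards their average, which is the consensus mechanism underlying the mixing step.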
5 Concluding remarks
In this work, we propose an abstract linesearch framework for a primal-dual splitting method to solve min-max convex-concave problems, which can be viewed as a generalisation of the method proposed in [13], and we provide an implementation of such a linesearch that naturally satisfies the assumptions. In the distributed optimisation setting, we propose two distributed linesearch procedures, so that the resulting method extends PG-EXTRA by allowing variable stepsizes. We also allow the gradients to be locally Lipschitz continuous for each agent , instead of using the usual assumption of globally Lipschitz gradients.
Funding.
FA, MND and MKT were supported in part by Australian Research Council grant DP230101749.
References
- [1] H. H. Bauschke and P. L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer New York, 2011.
- [2] A. Beck and M. Teboulle. Gradient-based algorithms with applications to signal-recovery problems. In D. P. Palomar and Y. C. Eldar, editors, Convex Optimization in Signal Processing and Communications, pages 42–88. Cambridge University Press, 2009.
- [3] J. Y. Bello Cruz and T. T. Nghia. On the convergence of the forward–backward splitting method with linesearches. Optimization Methods and Software, 31(6):1209–1238, 2016.
- [4] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1):1–122, 2011.
- [5] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40:120–145, 2011.
- [6] P. L. Combettes and J.-C. Pesquet. Proximal splitting methods in signal processing. In H. H. Bauschke, R. S. Burachik, P. L. Combettes, V. Elser, D. R. Luke, and H. Wolkowicz, editors, Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pages 185–212. Springer New York, New York, NY, 2011.
- [7] L. Condat. A primal–dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms. Journal of Optimization Theory and Applications, 158(2):460–479, 2013.
- [8] M. P. Friedlander, A. Goodwin, and T. Hoheisel. From perspective maps to epigraphical projections. Mathematics of Operations Research, 48(3):1711–1740, 2023.
- [9] B. He and X. Yuan. Convergence analysis of primal-dual algorithms for a saddle-point problem: from contraction perspective. SIAM Journal on Imaging Sciences, 5(1):119–149, 2012.
- [10] N. Komodakis and J.-C. Pesquet. Playing with duality: An overview of recent primal-dual approaches for solving large-scale optimization problems. IEEE Signal Processing Magazine, 32(6):31–54, 2015.
- [11] Z. Li, W. Shi, and M. Yan. A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates. IEEE Transactions on Signal Processing, 67(17):4494–4506, 2019.
- [12] Z. Li and M. Yan. New convergence analysis of a primal-dual algorithm with large stepsizes. Advances in Computational Mathematics, 47:1–20, 2021.
- [13] Y. Malitsky and T. Pock. A first-order primal-dual algorithm with linesearch. SIAM Journal on Optimization, 28(1):411–432, 2018.
- [14] Y. Malitsky and M. K. Tam. A first-order algorithm for decentralised min-max problems. arXiv preprint arXiv:2308.11876, 2023.
- [15] G. Mateos, J. A. Bazerque, and G. B. Giannakis. Distributed sparse linear regression. IEEE Transactions on Signal Processing, 58(10):5262–5276, 2010.
- [16] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard Version 4.1, Nov. 2023.
- [17] A. Nedić, A. Olshevsky, and C. A. Uribe. Fast convergence rates for distributed non-Bayesian learning. IEEE Transactions on Automatic Control, 62(11):5538–5553, 2017.
- [18] S. Omidshafiei, J. Pazis, C. Amato, J. P. How, and J. Vian. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In International Conference on Machine Learning, pages 2681–2690. PMLR, 2017.
- [19] C. Ravazzi, S. M. Fosson, and E. Magli. Distributed iterative thresholding for ℓ0/ℓ1-regularized linear inverse problems. IEEE Transactions on Information Theory, 61(4):2081–2100, 2015.
- [20] R. T. Rockafellar. Monotone operators associated with saddle-functions and minimax problems. In Proceedings of Symposia in Pure Mathematics, volume 18, pages 241–250. American Mathematical Society, 1970.
- [21] R. T. Rockafellar. Saddle-points and convex analysis. Differential games and related topics, 109, 1971.
- [22] E. K. Ryu and W. Yin. Large-Scale Convex Optimization: Algorithms & Analyses via Monotone Operators. Cambridge University Press, 2022.
- [23] W. Shi, Q. Ling, G. Wu, and W. Yin. EXTRA: An exact first-order algorithm for decentralized consensus optimization. SIAM Journal on Optimization, 25(2):944–966, 2015.
- [24] W. Shi, Q. Ling, G. Wu, and W. Yin. A proximal gradient algorithm for decentralized composite optimization. IEEE Transactions on Signal Processing, 63(22):6013–6023, 2015.
- [25] B. C. Vũ. A splitting algorithm for dual monotone inclusions involving cocoercive operators. Advances in Computational Mathematics, 38(3):667–681, 2013.