Online Convex Optimization Using Coordinate Descent Algorithms
Abstract
This paper considers the problem of online optimization where the objective function is time-varying. In particular, we extend coordinate descent type algorithms to the online case, where the objective function varies after a finite number of iterations of the algorithm. Instead of solving the problem exactly at each time step, we only apply a finite number of iterations at each time step. Commonly used notions of regret are used to measure the performance of the online algorithm. Moreover, coordinate descent algorithms with different updating rules are considered, including both deterministic and stochastic rules that are developed in the literature of classical offline optimization. A thorough regret analysis is given for each case. Finally, numerical simulations are provided to illustrate the theoretical results.
keywords:
Online convex optimization; Coordinate descent; Online learning; Regret minimization.
1 Introduction
Online learning [29], resource allocation [9], demand response in power systems [14], and localization of moving targets [2] are just a few examples where online convex optimization (OCO) has been applied. In the problem setup of OCO, the objective functions are time-varying and are not available to the decision maker a priori. At each time instance, after an update of the decision variable, new information of the latest cost function is made available to the decision maker. The objective of the decision maker is to minimize the objective function over time. One commonly used performance measure of an online optimization algorithm is the notion of regret which measures the suboptimality of the algorithm compared to the outcome generated by the best decision at each time step.
In the seminal work of [41], the method of online gradient descent is proposed for OCO problems, where at each time step the decision maker performs one gradient descent step using the latest available information. A static regret upper bound that is sublinear in $T$ is proved, where $T$ is the length of the horizon. Under stronger assumptions on the cost functions, such as strong convexity, an improved logarithmic regret bound can be achieved [11, 23, 10]. If future information is available, it can be used to further improve the performance of the online optimization algorithm in terms of regret bounds. The work [15] introduces an additional predictive step following the algorithm developed in [41], if certain conditions on the estimated gradient and descent direction are met. Similar algorithms have also been extended to cases where zeroth order [6, 30, 36, 26, 32] and second order [16] oracles are used instead of (sub)gradients. The works [6, 36] on bandit feedback consider the situation where there are time-varying inequality constraints. In such cases, the algorithms proposed in [41] are hard to implement because of the high computational demand of the projection operation. This motivates recent research on online optimization algorithms with time-varying constraints, including the primal-dual algorithm proposed in [13, 37] and a modified saddle-point method given in [9]. Other algorithms have been proposed to handle stochastic constraints [38] and to cover continuous-time applications [27]. In the case where only the values, rather than the exact form, of the cost function are revealed to the decision maker, bandit feedback based online algorithms [6, 36] can be used to solve the problem; other methods, such as the forward gradient [22], have also been proposed recently to deal with this issue. The need for applications in large-scale systems has also led to extensive research on distributed OCO. Distributed online algorithms that achieve sublinear regret bounds for convex optimization problems with static constraints can be found in [28, 12, 33]. For instance, [28] proposes a distributed version of the dynamic mirror descent algorithm, which is a generalization of the classical gradient descent methods suitable for high-dimensional optimization problems. The work [19] proposes distributed online primal-dual algorithms for optimization problems with static coupled inequality constraints, while the work [35] studies distributed online convex optimization with time-varying inequality constraints in the discrete-time setting. For a more detailed documentation of recent advances in online optimization, we refer the readers to the survey paper [17].
To the best of our knowledge, coordinate descent [31], as an important class of optimization algorithms, has not been sufficiently analyzed in the online optimization community. In coordinate descent algorithms, most components of the decision variable are fixed during one iteration, while the cost function is minimized with respect to the remaining components. The resulting problem is lower-dimensional and often much easier to solve. Thus, coordinate descent algorithms have great potential in applications such as machine learning, where an iteration with full gradient information is computationally expensive. In [24], it is shown that coordinate descent can be very efficient for huge-scale problems. Another situation where one may find coordinate descent useful is dual decomposition based methods for distributed optimization, see [21] and references therein. Specifically, the dual problem of multi-agent optimal consensus results in a sum of functions with very loose coupling between the dual variables. Calculation of a component of the gradient of the dual function only involves computations and communications of a pair of agents (or processors). Moreover, it can also be implemented in a parallel fashion, as shown in [3]. Therefore, considerable effort has recently been devoted to developing theoretical performance guarantees for various coordinate descent algorithms [31]. In this paper, we aim to extend this appealing class of algorithms to OCO problems by providing an in-depth regret analysis for different types of online coordinate descent algorithms.
The main contributions of the paper can be summarized as follows. First, we extend the coordinate descent algorithms considered in [31] to the online case and provide their regret analysis. To the best of our knowledge, this is the first attempt to look at the possibilities of using coordinate descent methods to solve OCO problems. Second, we provide an in-depth regret analysis of various coordinate descent algorithms with different updating rules, such as cyclic updating rules and random updating rules. Specifically, we consider both random and deterministic online coordinate descent algorithms under assumptions commonly used in the literature. In particular, most of the existing literature on OCO is based on extensions of offline algorithms that monotonically reduce the distance from the decision variable to the set of solutions at each iteration. An example is the well-known online gradient descent [41, 11]. However, the offline deterministic coordinate descent algorithm, although it has provable convergence to the set of solutions, does not necessarily produce an updated variable that is closer to the set of solutions at each iteration. We overcome this issue by using predictive-like updates at each time step, which are detailed in Section 5. Lastly, we show that the regret bounds achieved by our online coordinate descent algorithms are comparable to those achieved in the literature by centralized full-gradient based online algorithms.
We summarize the theoretical upper bounds on the regrets proved in Theorems 1 to 7 for Algorithms 1-5 in the following table.
Table 1: Summary of the regret bounds proved in this paper (C: convex costs, SC: strongly convex costs).

| | $\mathrm{Reg}_s$, C | $\mathrm{Reg}_s$, SC | $\mathrm{Reg}_d$, C | $\mathrm{Reg}_d$, SC |
|---|---|---|---|---|
| Alg. 1 | $O(\sqrt{T})$ | $O(\log T)$ | $O(\sqrt{T(1+V_T)})$ | $O(1+V_T)$ |
| Alg. 2, 4 | $O(\sqrt{T})$ | $O(\log T)$ | $O(\sqrt{T(1+V_T)})$ | $O(1+S_T)$ |
| Alg. 3, 5 | $O(\sqrt{T})$ | $O(\log T)$ | $O(\sqrt{T(1+V_T)})$ | $O(1+S_T)$ |
| Gradient Method | $O(\sqrt{T})$ [41] | $O(\log T)$ [11] | $O(\sqrt{T(1+V_T)})$ [7] | $O(1+V_T)$ [28], $O(1+S_T)$ [8] |
The regret bounds summarized in Table 1 are consistent with the regret bounds of full-gradient based online optimization algorithms proved in the existing literature [29, 11, 7, 23] under similar settings. Our dynamic regret bounds for strongly convex functions proved in Theorems 6 and 7 may require multiple updates at each time $t$. This setup is also adopted in some existing works, including [40, 8], to achieve less conservative regret bounds. It should also be noted that, although the online algorithms with different update rules share regret bounds of the same order, the exact coefficients of the regret bounds may still differ.
The rest of the paper is organized as follows. The problem formulation is presented in Section 2. The online coordinate descent algorithms considered in this paper are given in Section 3. Regret bounds for random online coordinate descent algorithms are given in Section 4, followed by regret bounds for deterministic online coordinate descent algorithms in Section 5. Numerical simulations are given in Section 6. Finally, the results presented in this paper are summarized in Section 7.
Notation: Let $\mathbb{R}$ be the set of real numbers, $\mathbb{R}^n$ be the $n$-dimensional Euclidean space, and $\mathbb{R}_{\ge 0}$ (resp. $\mathbb{R}_{>0}$) be the set of non-negative (resp. positive) real numbers. For $x, y \in \mathbb{R}^n$, $\langle x, y \rangle$ denotes the inner product in $\mathbb{R}^n$. The set of non-negative (resp. positive) integers is denoted by $\mathbb{N}$ (resp. $\mathbb{N}_{>0}$). Furthermore, $\|x\|$ denotes the Euclidean norm of a vector $x$. The matrix $I_n$ is used to denote the $n$-dimensional identity matrix, and the subscript will be omitted when the dimension is clear. For a given vector $x$, $x_i$ denotes the $i$-th component of $x$, $x_t$ denotes the value of $x$ at time $t$, and $x_{i,t}$ denotes the value of the $i$-th component of $x$ at time $t$. All random variables considered in this paper are defined on a probability space $(\Omega, \mathcal{F}, P)$, where $\Omega$ is the sample space, the $\sigma$-field $\mathcal{F}$ is the set of events, and $P$ is the probability measure defined on $\mathcal{F}$.
2 Problem formulation
We consider the following optimization problem
$$\min_{x \in X} f_t(x), \quad (1)$$
where $f_t : \mathbb{R}^n \to \mathbb{R}$ is the convex cost function at time $t \in \mathbb{N}$, $x$ is the decision variable, and $X \subseteq \mathbb{R}^n$ is the non-empty closed convex constraint set. For simplicity, we assume $f_t$ is continuously differentiable for any $t$. Moreover, let the decision variable at any given time be partitioned as $x = [x_1^\top, x_2^\top, \dots, x_N^\top]^\top$, $x_i \in \mathbb{R}^{n_i}$, $\sum_{i=1}^{N} n_i = n$. The components of the vector $x$ are assigned to $N$ individual processors. For any integer $i \in \{1, \dots, N\}$, the processor $i$ is responsible for updating its own "local" decision variable $x_i$ [3].
Denote the minimizer of $f_t$ at time $t$ by $x_t^*$. We use a notion of regret to measure the processors' ability to track $x_t^*$. Regret refers to a quantity that measures the overall difference between the cost incurred by an algorithm and the cost at the best possible point from an offline view up to a horizon $T$. Two notions of regret commonly considered in the literature are static regret and dynamic regret. The static regret is defined as:
$$\mathrm{Reg}_s(T) := \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x^*), \quad (2)$$
where $x^* \in \arg\min_{x \in X} \sum_{t=1}^{T} f_t(x)$; for a given $T$, the subscript of $x_T^*$ is omitted for convenience. On the other hand, the dynamic regret is defined as
$$\mathrm{Reg}_d(T) := \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x_t^*). \quad (3)$$
Remark 1.
Static regret is a useful performance metric in applications such as static parameter estimation, where the variable of interest is static. However, if the variable of interest evolves over time (e.g. tracking moving targets), the notion of dynamic regret makes more sense than its static counterpart. It can be seen from (2) and (3) that regret captures the accumulation of errors due to the fact that the optimization problems are not solved exactly at each step. If the regret is sublinear in $T$, then the average accumulated error $\mathrm{Reg}_s(T)/T$ (or $\mathrm{Reg}_d(T)/T$) will converge to $0$ as $T \to \infty$. This further implies that, on average, $f_t(x_t)$ converges to $f_t(x^*)$ (or $f_t(x_t^*)$). Although it is the more appropriate performance metric in most applications, dynamic regret does not have a provable bound sublinear in $T$ in general. To obtain a sublinear regret bound, additional regularity assumptions such as bounded variation of the environment are typically required, see [28] for example.
If stochastic algorithms are considered, similar notions of regret can be defined via expectations:
$$\mathrm{Reg}_s(T) := \mathbb{E}\Big[\sum_{t=1}^{T} f_t(x_t)\Big] - \sum_{t=1}^{T} f_t(x^*), \quad (4)$$
$$\mathrm{Reg}_d(T) := \mathbb{E}\Big[\sum_{t=1}^{T} f_t(x_t)\Big] - \sum_{t=1}^{T} f_t(x_t^*). \quad (5)$$
With some abuse of notation, $\mathrm{Reg}_s(T)$ and $\mathrm{Reg}_d(T)$ are used for both the random and the deterministic cases. However, this will not lead to confusion since it should be clear from the context whether an algorithm is stochastic or not.
3 Online coordinate descent algorithms
We construct our online block coordinate descent algorithms following the setup in [31]. At time $t$, we select a component of $x_t$ to update. The updating component at time $t$ is denoted by $i_t$. If the $i$-th component is updating, then the updating equation for component $i$ is given by
$$\hat{x}_{i,t+1} = x_{i,t} - \alpha_t \nabla_i f_t(x_t), \quad (6)$$
where $\alpha_t > 0$ is the stepsize at time $t$ and $\nabla_i f_t(x_t)$ is the $i$-th component of the gradient of $f_t$ evaluated at $x_t$ at time $t$. For any $j \ne i$, we have
$$\hat{x}_{j,t+1} = x_{j,t}. \quad (7)$$
Then $\hat{x}_{t+1}$ is projected onto $X$. We define the matrix $U_i \in \mathbb{R}^{n \times n}$ as follows. For any $i \in \{1, \dots, N\}$, $U_i = \mathrm{diag}(0, \dots, 0, I_{n_i}, 0, \dots, 0)$, whose $i$-th diagonal block is the identity, where $I_m$ and $0_m$ denote identity and zero matrices of dimension $m$, respectively. Then the updates of the coordinate descent algorithm at time $t$ can be written as
$$x_{t+1} = P_X\big(x_t - \alpha_t U_{i_t} \nabla f_t(x_t)\big), \quad (8)$$
where $P_X$ denotes the projection onto $X$, which is well defined by the closedness and convexity of $X$. The following non-expansive property of the projection operator will be used extensively throughout the paper.
Lemma 1.
[4, Proposition 3.2.1] Let $X \subseteq \mathbb{R}^n$ be a non-empty closed convex set. We have $\|P_X(x) - P_X(y)\| \le \|x - y\|$ for any $x, y \in \mathbb{R}^n$.
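To make the update concrete, the following is a minimal Python sketch of one step of (6)-(8); the gradient oracle `grad_f`, the projection `project`, and the block index sets `blocks` are assumptions supplied by the user rather than part of the algorithms above.

```python
def coordinate_step(x, t, i, alpha, grad_f, project, blocks):
    """One online coordinate descent step, cf. (6)-(8).

    x       : current decision variable x_t (1-D numpy array)
    i       : index of the block selected at time t
    grad_f  : grad_f(t, x) returns the full gradient of f_t at x
    project : Euclidean projection onto the constraint set X
    blocks  : list of index arrays, one per block x_1, ..., x_N
    """
    g = grad_f(t, x)
    x_hat = x.copy()
    x_hat[blocks[i]] -= alpha * g[blocks[i]]  # update the selected block only, cf. (6)-(7)
    return project(x_hat)                     # projection step, cf. (8)
```

In practice only the $i$-th partial gradient needs to be evaluated; the full gradient call above is kept for simplicity of the sketch.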
In this paper, we consider the following three commonly used schemes of selecting the updating component .
-
•
The random coordinate descent algorithm. In this case, is selected randomly with equal probability, independently of the selections made at previous iterations.
-
•
The cyclic coordinate descent algorithm. In this case, the updating component is selected in a pre-determined cyclic fashion: .
-
•
The coordinate descent algorithm with Gauss-Southwell Rule. In this case, the updating component is selected as .
In this paper, we consider the case where only one coordinate is allowed to change per iteration. The results on random coordinate descent can be extended to other cases where the probabilities of selection are unequal, with potentially overlapping components [20], such as the random sleep scheme [34]. Intuitively speaking, if for any $t$ the expectation of the update direction takes the form $-\alpha_t D \nabla f_t(x_t)$ for a given positive definite diagonal matrix $D$, then the analysis in this work can be applied with mild modifications. Our analysis for the deterministic cases cannot be trivially extended to cover overlapping components. We aim to address this topic in future research. A sketch of the three selection rules is given below.
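The following minimal sketch implements the three rules in the same notation as the step above; the helper names and the zero-based block indexing are assumptions for illustration, and the Gauss-Southwell branch is the only one that requires gradient evaluations.

```python
import numpy as np

def select_block(rule, t, N, rng=None, grad=None, blocks=None):
    """Return the block index i_t in {0, ..., N-1} under the given rule."""
    if rule == "random":           # uniform and independent across iterations
        return int(rng.integers(N))
    if rule == "cyclic":           # fixed order 1, 2, ..., N, 1, 2, ...
        return t % N
    if rule == "gauss_southwell":  # block with the largest partial-gradient norm
        return int(np.argmax([np.linalg.norm(grad[b]) for b in blocks]))
    raise ValueError(f"unknown rule: {rule}")
```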
Remark 2.
A substantial review of variants of coordinate descent algorithms can be found in [4, Section 6.5.1]. The cyclic selection of coordinates is normally assumed to ensure convergence of the algorithm, while irregular orders are considered by researchers to accelerate convergence. In particular, it is shown in [31] that randomization leads to faster convergence in expectation; obviously, this is not guaranteed for every realization of the algorithm. The Gauss-Southwell method leads to faster convergence at the cost of extra computations and evaluations of gradients during the selection of coordinates, which can be an issue in large-scale problems [25].
4 Regret bounds for online coordinate descent algorithms with random coordinate selection rules
The online random coordinate descent algorithm considered in this section is given in Algorithm 1.
4.1 Static regret for convex functions
Before we state the main results, we first list the assumptions.
Assumption 1.
For any given $x \in X$ and $t \in \mathbb{N}$, the following inequalities hold uniformly in $x$ and $t$ for problem (1):

(i) $\|\nabla f_t(x)\| \le G$,

(ii) $\|x_t - x\| \le R$,

where $x_t$ is the decision variable at time $t$.
Remark 3.
Item (ii) of Assumption 1 can be ensured if the constraint set $X$ is bounded. Moreover, when $X$ is bounded, item (i) of Assumption 1 holds in many cases, including linear regression and logistic regression [5]. By [29, Lemma 2.6], Assumption 1 (i) implies that $f_t$ is also Lipschitz continuous uniformly in $t$ over $X$. These assumptions are quite standard in the literature on online optimization [6, 7, 5, 17, 37, 39, 15, 18, 28].
Before we state the first main result of the paper, we present the so-called doubling trick stepsize rule introduced in [29, Section 2.3.1].
Definition 1 (Doubling trick scheme).
A stepsize sequence $\{\alpha_t\}$ is said to be chosen by the doubling trick scheme if, for $m = 0, 1, 2, \dots$, $\alpha_t = \frac{1}{\sqrt{2^m}}$ in each period of $2^m$ iterations $t = 2^m, \dots, 2^{m+1}-1$.
Remark 4.
The doubling trick scheme of stepsize choices is particularly useful when stepsizes of type $\alpha_t = O(1/\sqrt{T})$ are needed to achieve a desirable regret bound. Since in applications it is often unrealistic to know the horizon $T$ in advance, by using the doubling trick scheme, regret bounds of the same order can be achieved without explicit knowledge of $T$. The same trick will be used extensively throughout the paper to derive regret bounds. It should also be noted that the doubling trick scheme in general does not result in a better regret bound in terms of order compared to other stepsize rules [41].
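For illustration, a minimal sketch of Definition 1 as a stepsize generator; the base constant `c` is an assumed tuning parameter (Definition 1 corresponds to `c = 1`).

```python
import math

def doubling_trick_stepsize(t, c=1.0):
    """Stepsize at iteration t >= 1 under the doubling trick scheme:
    alpha_t = c / sqrt(2**m) on the m-th period t = 2**m, ..., 2**(m+1) - 1."""
    m = int(math.floor(math.log2(t)))  # index of the period containing t
    return c / math.sqrt(2**m)
```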
The following result states that, under Assumption 1, if the stepsize at each iteration is chosen by the doubling trick scheme, there is an upper bound for the static regret defined in (4). Moreover, the upper bound is of order $O(\sqrt{T})$ for convex costs.
Theorem 1.
Suppose Assumption 1 holds and the stepsizes of Algorithm 1 are chosen by the doubling trick scheme. Then, $\mathrm{Reg}_s(T) = O(\sqrt{T})$ for any $T \in \mathbb{N}$.

Proof. Given $x \in X$, denote the $\sigma$-field containing the past data of Algorithm 1 up to time $t$ by $\mathcal{F}_t$. Then, by Lemma 1, the update (8), the uniform random selection of $i_t$, and the convexity of $f_t$, we have
$$\mathbb{E}\big[\|x_{t+1} - x\|^2 \mid \mathcal{F}_t\big] \le \|x_t - x\|^2 - \frac{2\alpha_t}{N}\big(f_t(x_t) - f_t(x)\big) + \frac{\alpha_t^2}{N}\|\nabla f_t(x_t)\|^2. \quad (10)$$
Substituting $x = x^*$ into (10), taking total expectations using $\mathbb{E}[\mathbb{E}[\cdot \mid \mathcal{F}_t]] = \mathbb{E}[\cdot]$, and rearranging (10) leads to
$$\mathbb{E}\big[f_t(x_t) - f_t(x^*)\big] \le \frac{N}{2\alpha_t}\Big(\mathbb{E}\|x_t - x^*\|^2 - \mathbb{E}\|x_{t+1} - x^*\|^2\Big) + \frac{\alpha_t}{2}\mathbb{E}\|\nabla f_t(x_t)\|^2. \quad (11)$$
Note that
$$\mathrm{Reg}_s(T) \le \sum_{t=1}^{T}\frac{N}{2\alpha_t}\Big(\mathbb{E}\|x_t - x^*\|^2 - \mathbb{E}\|x_{t+1} - x^*\|^2\Big) + \sum_{t=1}^{T}\frac{\alpha_t}{2}\mathbb{E}\|\nabla f_t(x_t)\|^2. \quad (12)$$
Due to Assumption 1, the gradient of $f_t$ is uniformly bounded by $G$ and $\|x_t - x^*\|$ is upper bounded uniformly by $R$. Consequently,
$$\sum_{t=1}^{T}\frac{\alpha_t}{2}\mathbb{E}\|\nabla f_t(x_t)\|^2 \le \frac{G^2}{2}\sum_{t=1}^{T}\alpha_t \le \frac{G^2}{\sqrt{2}-1}\sqrt{T}. \quad (13)$$
In the last inequality of (13), the property $\sum_{t=1}^{T}\alpha_t \le \frac{2}{\sqrt{2}-1}\sqrt{T}$ is used, which is a direct consequence of the definition of the stepsize rule. On the other hand, the remaining term in (12) can be upper bounded as follows.
To derive the regret bound, we again use the properties of the doubling trick scheme. First, we set the horizon to be the known constant $2^m$ and the stepsize is chosen as the constant $\alpha_t = 1/\sqrt{2^m}$. Then it is obvious that, by telescoping within the period,
$$\sum_{t=2^m}^{2^{m+1}-1}\frac{N}{2\alpha_t}\Big(\mathbb{E}\|x_t - x^*\|^2 - \mathbb{E}\|x_{t+1} - x^*\|^2\Big) \le \frac{N R^2}{2}\sqrt{2^m}. \quad (14)$$
Since, for $m = 0, 1, \dots, \lceil \log_2 T \rceil$, the stepsize is constant in each period of $2^m$ iterations $t = 2^m, \dots, 2^{m+1}-1$, we can sum (14) up over any given $T$ as
$$\sum_{t=1}^{T}\frac{N}{2\alpha_t}\Big(\mathbb{E}\|x_t - x^*\|^2 - \mathbb{E}\|x_{t+1} - x^*\|^2\Big) \le \frac{N R^2}{2}\sum_{m=0}^{\lceil \log_2 T \rceil}\sqrt{2^m} \le \frac{N R^2}{\sqrt{2}-1}\sqrt{T}. \quad (15)$$
Combining (12), (13) and (15) yields $\mathrm{Reg}_s(T) = O(\sqrt{T})$. Thus, the proof is complete. ∎
Remark 4.1.
In Theorem 1, the differentiability of $f_t$ is in fact never used. Thus the results apply to the case where subgradients are used when $f_t$ is not differentiable for some $t$, as long as item (i) of Assumption 1 is satisfied by the subgradients used in the iterations. Moreover, the regret bound established in Theorem 1 is of the same order as the one established in [29] for online gradient descent under the same assumptions.
4.2 Static regret for strongly convex functions
In this part, we show that if the cost functions are strongly convex, then we can achieve an improved static regret bound for the online random coordinate descent algorithms.
Assumption 2.
For any $t \in \mathbb{N}$, the function $f_t$ is uniformly $\mu$-strongly convex, i.e., there exists $\mu > 0$ such that the following inequality holds for any $x, y \in X$,
$$f_t(y) \ge f_t(x) + \langle \nabla f_t(x), y - x \rangle + \frac{\mu}{2}\|y - x\|^2. \quad (16)$$
Theorem 2.
Suppose Assumptions 1 and 2 hold and the stepsizes of Algorithm 1 are chosen as $\alpha_t = \frac{N}{\mu t}$. Then, for any $T \in \mathbb{N}$,
$$\mathrm{Reg}_s(T) \le \frac{G^2 N}{2\mu}\big(1 + \log T\big). \quad (17)$$

Proof. Similar to (10), given $x \in X$, we have the following relationship for any $t$,
$$\mathbb{E}\big[\|x_{t+1} - x\|^2 \mid \mathcal{F}_t\big] \le \Big(1 - \frac{\mu\alpha_t}{N}\Big)\|x_t - x\|^2 - \frac{2\alpha_t}{N}\big(f_t(x_t) - f_t(x)\big) + \frac{\alpha_t^2}{N}\|\nabla f_t(x_t)\|^2. \quad (18)$$
By substituting $x = x^*$ into (18) and rearranging the terms, we have the following inequality
$$\mathbb{E}\big[f_t(x_t) - f_t(x^*)\big] \le \frac{N}{2}\Big(\frac{1}{\alpha_t} - \frac{\mu}{N}\Big)\mathbb{E}\|x_t - x^*\|^2 - \frac{N}{2\alpha_t}\mathbb{E}\|x_{t+1} - x^*\|^2 + \frac{\alpha_t}{2}\mathbb{E}\|\nabla f_t(x_t)\|^2.$$
Since the gradient of $f_t$ is uniformly bounded by $G$, the following inequality holds
$$\mathrm{Reg}_s(T) \le \sum_{t=1}^{T}\Big[\frac{N}{2}\Big(\frac{1}{\alpha_t} - \frac{\mu}{N}\Big)\mathbb{E}\|x_t - x^*\|^2 - \frac{N}{2\alpha_t}\mathbb{E}\|x_{t+1} - x^*\|^2\Big] + \frac{G^2}{2}\sum_{t=1}^{T}\alpha_t. \quad (19)$$
By choosing $\alpha_t = \frac{N}{\mu t}$, the first sum in (19) telescopes to a non-positive quantity, and we have the following relationship
$$\mathrm{Reg}_s(T) \le \frac{G^2 N}{2\mu}\sum_{t=1}^{T}\frac{1}{t} \le \frac{G^2 N}{2\mu}\big(1 + \log T\big), \quad (20)$$
where the second inequality follows by expanding the sum. Thus, the proof is complete. ∎
4.3 Dynamic regret for convex functions
Now, we provide an upper bound of the dynamic regret of the online coordinate descent algorithm. First, we define the following measure of variations of problem (1), which is commonly used in the literature [23, 28],
$$V_T := \sum_{t=1}^{T}\|x_t^* - x_{t-1}^*\|, \quad (21)$$
where $x_t^*$ is a solution to the optimization problem (1) at time $t$ and $x_0^*$ can be an arbitrary constant. The term $V_T$ captures the accumulated variations of the optimal points at two consecutive time instances over time. If the cost functions are general convex functions and an upper bound of $V_T$ is known, the following results can be stated.
Theorem 3.
Suppose Assumption 1 holds and the stepsize of Algorithm 1 is chosen as the constant $\alpha = O\big(\sqrt{(1+V_T)/T}\big)$. Then, $\mathrm{Reg}_d(T) = O\big(\sqrt{T(1+V_T)}\big)$.

Proof. Substituting $x = x_t^*$ in (10) for any $t \in \{1, \dots, T\}$ yields
$$\mathbb{E}\big[f_t(x_t) - f_t(x_t^*)\big] \le \frac{N}{2\alpha}\Big(\mathbb{E}\|x_t - x_t^*\|^2 - \mathbb{E}\|x_{t+1} - x_t^*\|^2\Big) + \frac{\alpha G^2}{2}.$$
Furthermore,
$$\mathbb{E}\|x_{t+1} - x_t^*\|^2 \ge \mathbb{E}\|x_{t+1} - x_{t+1}^*\|^2 - 3R\,\|x_{t+1}^* - x_t^*\|. \quad (22)$$
Then,
$$\mathrm{Reg}_d(T) \le \frac{N}{2\alpha}\big(R^2 + 3R\,V_T\big) + \frac{\alpha G^2 T}{2}.$$
Choosing the stepsize to be $\alpha = O\big(\sqrt{(1+V_T)/T}\big)$ leads to
$$\mathrm{Reg}_d(T) = O\big(\sqrt{T(1+V_T)}\big). \qquad ∎$$
Remark 4.2.
As discussed in Remark 4, the choice of stepsize can be replaced by the doubling trick scheme (see Definition 1) to derive a regret bound of the same order. Moreover, the exact value of $V_T$ is not necessary to derive the regret bound given in Theorem 3. If $V_T$ is not available, a stepsize of the same order can be used at the price of constant factor errors. We refer readers to [7, Theorem 1] for a more detailed discussion on how to implement stepsizes of the form $O(\sqrt{(1+V_T)/T})$ and obtain an upper bound of $V_T$ in practice using limited information. The inclusion of $V_T$ in the dynamic regret bound is common in the existing literature [7, 28, 37]. As argued in [7], if $V_T$ is sublinear in $T$, then the overall dynamic regret bound in Theorem 3 will be sublinear in $T$, which is desired and consistent with the dynamic regret bounds established in [7] for full-gradient based online algorithms. To see this, if the variation of the minimizers decreases with the order $O(1/t^2)$, then $V_T = O(1)$ and $\mathrm{Reg}_d(T) = O(\sqrt{T})$. Similarly, if the variation of the minimizers decreases with the order $O(t^{-\gamma})$ with $\gamma \in (0, 1)$, then $V_T = O(T^{1-\gamma})$ and $\mathrm{Reg}_d(T) = O(T^{1-\gamma/2})$. In the worst case, by Assumption 1, we have $f_t(x_t) - f_t(x_t^*) \le GR$ and $\mathrm{Reg}_d(T) = O(T)$. Thus, $\mathrm{Reg}_d(T)/T = O(1)$, which means the corresponding online algorithm incurs a steady-state tracking error. Moreover, the error decreases with the constant bound on the variation of the minimizers.
4.4 Dynamic regret for strongly convex functions
Now, we consider the dynamic regret of the online random coordinate descent algorithm for strongly convex functions. As before, we consider uniformly $\mu$-strongly convex functions. In addition, we will make the following assumption commonly seen in the online optimization literature [18, 23].
Assumption 3.
For any $t \in \mathbb{N}$, the gradient of $f_t$ is uniformly Lipschitz continuous, i.e., there exists $L > 0$ such that the following inequality holds for any $x, y \in X$,
$$\|\nabla f_t(x) - \nabla f_t(y)\| \le L\|x - y\|. \quad (23)$$
It is shown in [4, Proposition 6.1.2] that under the same conditions stated in Assumption 3, (23) is equivalent to
$$f_t(y) \le f_t(x) + \langle \nabla f_t(x), y - x \rangle + \frac{L}{2}\|y - x\|^2. \quad (24)$$
The main result regarding the dynamic regret bound of Algorithm 1 for strongly convex functions with Lipschitz gradients is stated as follows.
Theorem 4.
Suppose Assumptions 1 (i), 2 and 3 hold, and the stepsize of Algorithm 1 is chosen as a constant $\alpha < 2/L$. Then, $\mathrm{Reg}_d(T) = O(1 + V_T)$.

Proof. By the first-order optimality condition of $x_t^*$ and the strong convexity of $f_t$, we have, for any $t$,
$$\langle \nabla f_t(x_t), x_t - x_t^* \rangle \ge \mu\|x_t - x_t^*\|^2. \quad (25)$$
Then, using Lemma 1, we have $\|x_{t+1} - x_t^*\| \le \|x_t - \alpha U_{i_t}\nabla f_t(x_t) - x_t^*\|$. Next, we take the conditional expectation given the past history up to time $t$,
$$\mathbb{E}\big[\|x_{t+1} - x_t^*\|^2 \mid \mathcal{F}_t\big] \le \|x_t - x_t^*\|^2 - \frac{2\alpha}{N}\langle \nabla f_t(x_t), x_t - x_t^* \rangle + \frac{\alpha^2}{N}\|\nabla f_t(x_t)\|^2. \quad (26)$$
The last term in (26) can be upper bounded using the following inequality
$$\|\nabla f_t(x_t)\|^2 \le L\,\langle \nabla f_t(x_t), x_t - x_t^* \rangle. \quad (27)$$
The inequality (27) comes from the fact that $f_t$ is convex with an $L$-Lipschitz continuous gradient (co-coercivity) and that $\langle \nabla f_t(x_t^*), x_t - x_t^* \rangle \ge 0$ by the optimality of $x_t^*$.
Combining (25), (26) and (27), we have for any $t$
$$\mathbb{E}\big[\|x_{t+1} - x_t^*\|^2 \mid \mathcal{F}_t\big] \le \Big(1 - \frac{\alpha\mu(2 - \alpha L)}{N}\Big)\|x_t - x_t^*\|^2, \quad (28)$$
where the last inequality comes from the constant stepsize choice $\alpha < 2/L$. By Jensen's inequality, we have
$$\mathbb{E}\big[\|x_{t+1} - x_t^*\| \mid \mathcal{F}_t\big] \le \sqrt{1 - \frac{\alpha\mu(2 - \alpha L)}{N}}\,\|x_t - x_t^*\|. \quad (29)$$
Note that $1 - \frac{\alpha\mu(2 - \alpha L)}{N} < 1$. Thus, there exists $\rho \in (0, 1)$ such that $\mathbb{E}[\|x_{t+1} - x_t^*\| \mid \mathcal{F}_t] \le \rho\|x_t - x_t^*\|$.
Assumption 1 (i) implies $f_t(x) - f_t(x_t^*) \le G\|x - x_t^*\|$ for any $x \in X$ and any $t$. As a result, the stochastic dynamic regret is bounded as $\mathrm{Reg}_d(T) \le G\sum_{t=1}^{T}\mathbb{E}\|x_t - x_t^*\|$. Moreover, we have
$$\|x_t - x_t^*\| \le \|x_t - x_{t-1}^*\| + \|x_t^* - x_{t-1}^*\|. \quad (30)$$
Now taking total expectations, using (29) and summing up both sides of (30) from $t = 2$ to $T$ yields
$$\sum_{t=1}^{T}\mathbb{E}\|x_t - x_t^*\| \le \rho\sum_{t=1}^{T}\mathbb{E}\|x_t - x_t^*\| + \mathbb{E}\|x_1 - x_1^*\| + V_T. \quad (31)$$
Rearranging (31) leads to
$$\mathrm{Reg}_d(T) \le \frac{G\big(\mathbb{E}\|x_1 - x_1^*\| + V_T\big)}{1 - \rho} = O(1 + V_T). \qquad ∎$$
Remark 4.3.
By assuming uniform strong convexity and uniform Lipschitz continuity of the gradients of $f_t$, Theorem 4 improves the order of the theoretical dynamic regret bound from $O(\sqrt{T(1+V_T)})$ in Theorem 3 to $O(1+V_T)$, which is consistent with the results shown in [23] for full-gradient based online algorithms. This means that if problem (1) does not change over time, i.e., $f_t = f$ for any $t$, then $V_T = 0$ and the regret grows at a rate of $O(1)$, which is consistent with the convergence result of coordinate descent algorithms in the offline setting [31]. As item (ii) of Assumption 1 is not required for proving Theorem 4, the regret bound is applicable to unconstrained problems with bounded gradients.
5 Regret bounds for online coordinate descent algorithms with deterministic coordinate selection rules
We consider two deterministic online coordinate descent algorithms in this section and they are given in Algorithms 2 and 3, respectively.
Note that both Algorithms 2 and 3 are deterministic and update one component of the decision variable at each time step. We can therefore relate the regrets achievable by Algorithms 2 and 3 to the regret achievable by the online projected gradient descent algorithm, which takes the form
$$x_{t+1} = P_X\big(x_t - \alpha_t \nabla f_t(x_t)\big). \quad (32)$$
It can be easily seen that Algorithms 2 and 3 take the following form
$$x_{t+1} = P_X\big(x_t - \alpha_t(\nabla f_t(x_t) - e_t)\big), \quad (33)$$
where the vector $e_t := (I - U_{i_t})\nabla f_t(x_t)$ denotes the vector such that its $i_t$-th component is $0$ while all other entries are those of $\nabla f_t(x_t)$. It captures the effect of the components of the gradient that are not updating. We use $\mathrm{Reg}_s^g(T)$ and $\mathrm{Reg}_d^g(T)$ to denote the static and dynamic regrets of the online projected gradient descent algorithm, respectively. Then, we can have the following result.
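The decomposition (33) can be checked numerically: a coordinate step coincides with a full projected-gradient step in which the non-updated gradient components are treated as the error $e_t$. In the sketch below, the gradient vector and the box projection are arbitrary stand-ins chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, i = 4, 2                                # dimension and the selected (scalar) block
x = rng.standard_normal(n)
grad = rng.standard_normal(n)              # stands in for the gradient of f_t at x_t
alpha = 0.1
project = lambda z: np.clip(z, -1.0, 1.0)  # projection onto a box, one example of P_X

U_i = np.zeros((n, n)); U_i[i, i] = 1.0    # selection matrix U_{i_t}
e = (np.eye(n) - U_i) @ grad               # error term e_t: non-updated components

step_coord = project(x - alpha * (U_i @ grad))  # coordinate update (8)
step_grad = project(x - alpha * (grad - e))     # full-gradient form (33)
assert np.allclose(step_coord, step_grad)       # the two forms agree
```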
Proposition 5.1.
Suppose Assumption 1 holds. Then, for Algorithms 2 and 3, (i) $\mathrm{Reg}_s(T) \le \mathrm{Reg}_s^g(T) + E_T$ and (ii) $\mathrm{Reg}_d(T) \le \mathrm{Reg}_d^g(T) + E_T$, where $E_T$ is an error term accumulating the effect of $e_t$ and satisfying $E_T \le c\sum_{t=1}^{T}\alpha_t$ for some constant $c > 0$ depending only on $G$ and $R$.

Proof. By the definitions of regrets in (2) and (3), it is obvious that $\mathrm{Reg}_d(T) - \mathrm{Reg}_s(T) = \sum_{t=1}^{T} f_t(x^*) - \sum_{t=1}^{T} f_t(x_t^*)$, which does not depend on the algorithm generating $\{x_t\}$. Thus, if one of the two items is proved, so is the other item. Since Assumption 1 holds, from (2) and Lemma 1, we know that the additional term $\alpha_t e_t$ in (33) perturbs each iterate of (32) by at most $\alpha_t\|e_t\| \le \alpha_t G$, so the accumulated deviation of the regret is bounded by $c\sum_{t=1}^{T}\alpha_t$. Thus the proof is complete. ∎ By using Proposition 5.1, we can establish regret bounds for Algorithms 2 and 3 using known regret bounds for online gradient descent algorithms. Moreover, if the regret bounds for online gradient descent are sublinear in $T$ and $\sum_{t=1}^{T}\alpha_t$ is also sublinear in $T$, then the established regret bounds for Algorithms 2 and 3 will be sublinear.
5.1 Static regret for convex functions
The following static regret bound of online projected gradient descent algorithm is proved in [41, Theorem 1].
Lemma 5.2.
([41, Theorem 1]) Suppose Assumption 1 holds and the stepsizes of the online projected gradient descent algorithm (32) are chosen as $\alpha_t = O(1/\sqrt{t})$. Then, $\mathrm{Reg}_s^g(T) = O(\sqrt{T})$.
The following result on static regret bounds of Algorithms 2 and 3 is a direct corollary of Proposition 5.1 and Lemma 5.2.
Corollary 5.3.
Suppose the conditions of Lemma 5.2 hold. Then, Algorithms 2 and 3 guarantee $\mathrm{Reg}_s(T) = O(\sqrt{T})$.
5.2 Static regret for strongly convex functions
The static regret bound in Lemma 5.2 is improved in [11, Theorem 1] under the assumption of strong convexity.
Lemma 5.4.
([11, Theorem 1]) Suppose Assumptions 1 and 2 hold and the stepsizes of (32) are chosen as $\alpha_t = \frac{1}{\mu t}$. Then, $\mathrm{Reg}_s^g(T) = O(\log T)$.
The following result on static regret bounds of Algorithms 2 and 3 for strongly convex functions is a direct corollary of Proposition 5.1 and Lemma 5.4.
Corollary 5.5.
Suppose the conditions of Lemma 5.4 hold. Then, Algorithms 2 and 3 guarantee $\mathrm{Reg}_s(T) = O(\log T)$.
5.3 Dynamic regret for convex functions
We can adapt the proof of Theorem 3 to the deterministic case to derive a dynamic regret bound for convex functions which is given in the following result.
Proposition 5.6.
Suppose Assumption 1 holds and the stepsize of (32) is chosen as the constant $\alpha = O\big(\sqrt{(1+V_T)/T}\big)$. Then, $\mathrm{Reg}_d^g(T) = O\big(\sqrt{T(1+V_T)}\big)$.

Proof. From (32) and Lemma 1, we have
$$\|x_{t+1} - x\|^2 \le \|x_t - x\|^2 - 2\alpha\big(f_t(x_t) - f_t(x)\big) + \alpha^2\|\nabla f_t(x_t)\|^2 \quad (34)$$
for any $x \in X$. Substituting $x = x_t^*$ to (34) results in the following inequality for any $t$
$$f_t(x_t) - f_t(x_t^*) \le \frac{1}{2\alpha}\big(\|x_t - x_t^*\|^2 - \|x_{t+1} - x_t^*\|^2\big) + \frac{\alpha G^2}{2}.$$
Note that the inequality (22) still holds in this case. Thus, we have
$$\mathrm{Reg}_d^g(T) \le \frac{1}{2\alpha}\big(R^2 + 3R\,V_T\big) + \frac{\alpha G^2 T}{2}.$$
If we set the stepsize to be $\alpha = O\big(\sqrt{(1+V_T)/T}\big)$, we have $\mathrm{Reg}_d^g(T) = O\big(\sqrt{T(1+V_T)}\big)$. ∎
Using Propositions 5.1 and 5.6, we can state the following result on dynamic regret bounds of Algorithms 2 and 3.
Corollary 5.7.
Suppose the conditions of Proposition 5.6 hold. Then, Algorithms 2 and 3 guarantee $\mathrm{Reg}_d(T) = O\big(\sqrt{T(1+V_T)}\big)$.
5.4 Dynamic regret for strongly convex functions
Note that when the cost functions are strongly convex and constant stepsizes are used, the result in Proposition 5.1 only gives a dynamic regret bound that is linear in $T$. Therefore, we aim to establish better dynamic regret bounds for Algorithms 2 and 3. In this subsection, we will make the following assumption on problem (1).
Assumption 4.
The gradient of $f_t$ is block-wise Lipschitz continuous uniformly in $t$, i.e., for any $i \in \{1, \dots, N\}$, there exists $L_i > 0$ such that the following inequality holds for any $x \in X$, $t \in \mathbb{N}$, and $v \in \mathbb{R}^n$,
$$\|\nabla_i f_t(x + v) - \nabla_i f_t(x)\| \le L_i\|v\|, \quad (35)$$
where $v$ is such that $v_j = 0$ for all $j \ne i$.
We denote the largest $L_i$ for $i \in \{1, \dots, N\}$ by $L_{\max}$. By further assuming $\nabla f_t(x_t^*) = 0$ for any $t$, the dynamic regret bounds can be derived without Assumption 1. Therefore, they are applicable to unconstrained problems with unbounded gradients. Before analyzing the regrets, we define the following measure of variations of problem (1), which is first introduced in [40],
$$S_T := \sum_{t=1}^{T}\|x_t^* - x_{t-1}^*\|^2, \quad (36)$$
where $x_0^*$ is as in (21).
Remark 5.8.
Compared to $V_T$, $S_T$ can be significantly smaller if the variation of the minimizers is small. Following the discussion in Remark 4.2, if the variation of the minimizers decreases with the order $O(t^{-\gamma})$ with $\gamma \in (1/2, 1)$, then $S_T$ is finite while $V_T$ grows to $\infty$ with a rate sublinear in $T$. Similarly, if the variation decreases with the order $O(t^{-1/2})$, then $S_T = O(\log T)$ while $V_T = O(\sqrt{T})$.
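The gap between the two variation measures can be illustrated numerically; in the sketch below, the horizon and the decay rate of the minimizer increments are assumed example values.

```python
import numpy as np

T, gamma = 10_000, 0.75                       # assumed horizon and decay rate in (1/2, 1)
increments = np.arange(1, T + 1) ** (-gamma)  # ||x_t^* - x_{t-1}^*||
V_T = increments.sum()                        # path length: grows like T**(1 - gamma)
S_T = (increments ** 2).sum()                 # squared path length: converges since 2*gamma > 1
print(f"V_T = {V_T:.1f}, S_T = {S_T:.3f}")    # V_T keeps growing while S_T stays O(1)
```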
Theorem 5.
Suppose Assumptions 2, 3 and 4 hold, $\nabla f_t(x_t^*) = 0$ for any $t$, $L > \mu$, the number of components satisfies $N < \frac{L\mu}{L_{\max}(L - \mu)}$, and the stepsize of Algorithm 3 is chosen as $\alpha = \frac{1}{L_{\max}}$. Then, $\mathrm{Reg}_d(T) = O(1 + S_T)$.

Proof. Let $\{\hat{x}_t\}$ denote the sequence of real vectors such that $\hat{x}_{t+1} = x_t - \alpha U_{i_t}\nabla f_t(x_t)$, where the value of $i_t$ follows the coordinate selection rule in Algorithm 3. The variable $\hat{x}_{t+1}$ stores the value of the decision variable before projection, i.e., $x_{t+1} = P_X(\hat{x}_{t+1})$. By the block descent lemma [1, Lemma 3.2] and the fact that $L_i \le L_{\max}$ for all $i$, we have $f_t(\hat{x}_{t+1}) \le f_t(x_t) - \alpha(1 - \frac{L_{\max}\alpha}{2})\|\nabla_{i_t} f_t(x_t)\|^2$. That is
$$f_t(\hat{x}_{t+1}) - f_t(x_t^*) \le f_t(x_t) - f_t(x_t^*) - \alpha\Big(1 - \frac{L_{\max}\alpha}{2}\Big)\|\nabla_{i_t} f_t(x_t)\|^2. \quad (37)$$
From (37) and the Gauss-Southwell rule, which guarantees $\|\nabla_{i_t} f_t(x_t)\|^2 \ge \frac{1}{N}\|\nabla f_t(x_t)\|^2$:
$$f_t(\hat{x}_{t+1}) - f_t(x_t^*) \le f_t(x_t) - f_t(x_t^*) - \frac{\alpha}{N}\Big(1 - \frac{L_{\max}\alpha}{2}\Big)\|\nabla f_t(x_t)\|^2. \quad (38)$$
Since $\nabla f_t(x_t^*) = 0$ for any $t$, minimizing both sides of (16) with respect to $y$, we have $\|\nabla f_t(x)\|^2 \ge 2\mu\big(f_t(x) - f_t(x_t^*)\big)$ for any $x$ (known as the Polyak-Łojasiewicz condition). Then, by (38), we have
$$f_t(\hat{x}_{t+1}) - f_t(x_t^*) \le \rho^2\big(f_t(x_t) - f_t(x_t^*)\big),$$
where $\rho^2 = 1 - \frac{2\mu\alpha}{N}\big(1 - \frac{L_{\max}\alpha}{2}\big)$. Consequently, by Assumption 2,
$$\frac{\mu}{2}\|\hat{x}_{t+1} - x_t^*\|^2 \le \rho^2\big(f_t(x_t) - f_t(x_t^*)\big). \quad (39)$$
By the fact that $\nabla f_t(x_t^*) = 0$ and Assumption 3, we have $f_t(x_t) - f_t(x_t^*) \le \frac{L}{2}\|x_t - x_t^*\|^2$ and hence
$$\frac{\mu}{2}\|\hat{x}_{t+1} - x_t^*\|^2 \le \rho^2\,\frac{L}{2}\,\|x_t - x_t^*\|^2. \quad (40)$$
By (39) and (40), the following inequality holds as a result of Lemma 1
$$\|x_{t+1} - x_t^*\| \le \|\hat{x}_{t+1} - x_t^*\| \le \rho\sqrt{\frac{L}{\mu}}\,\|x_t - x_t^*\|. \quad (41)$$
Next we show that $\rho\sqrt{L/\mu} < 1$ under the stated assumptions. Since $\rho^2 = 1 - \frac{2\mu\alpha}{N}(1 - \frac{L_{\max}\alpha}{2})$, it can be shown that $\rho\sqrt{L/\mu} < 1$ is equivalent to $\frac{2L\alpha}{N}\big(1 - \frac{L_{\max}\alpha}{2}\big) > \frac{L}{\mu} - 1$ after some algebraic manipulations. When $N < \frac{L\mu}{L_{\max}(L - \mu)}$, the corresponding quadratic inequality has a positive discriminant and the set of solutions to the inequality with respect to $\alpha$ is non-empty. Moreover, the maximizing solution is given by $\alpha = \frac{1}{L_{\max}}$. Therefore, the conditions listed in the statement of the theorem ensure that $\bar{\rho} := \rho\sqrt{L/\mu} < 1$. Thus, (41) implies
$$\|x_{t+1} - x_t^*\| \le \bar{\rho}\,\|x_t - x_t^*\| \quad (42)$$
with $\bar{\rho} \in (0, 1)$.
Note that, for any $\epsilon > 0$,
$$\|x_t - x_t^*\|^2 \le (1 + \epsilon)\|x_t - x_{t-1}^*\|^2 + \Big(1 + \frac{1}{\epsilon}\Big)\|x_t^* - x_{t-1}^*\|^2. \quad (43)$$
The deterministic dynamic regret (3) can be upper bounded as follows
$$\mathrm{Reg}_d(T) = \sum_{t=1}^{T}\big(f_t(x_t) - f_t(x_t^*)\big) \le \frac{L}{2}\sum_{t=1}^{T}\|x_t - x_t^*\|^2. \quad (44)$$
From (42), we have the following relationship by summing up both sides of (43) from $t = 2$ to $T$,
$$\sum_{t=1}^{T}\|x_t - x_t^*\|^2 \le (1 + \epsilon)\bar{\rho}^2\sum_{t=1}^{T}\|x_t - x_t^*\|^2 + \|x_1 - x_1^*\|^2 + \Big(1 + \frac{1}{\epsilon}\Big)S_T. \quad (45)$$
Since $\bar{\rho} < 1$, choosing $\epsilon$ such that $(1 + \epsilon)\bar{\rho}^2 < 1$, (45) leads to $\mathrm{Reg}_d(T) = O(1 + S_T)$. ∎
Remark 5.9.
Note that in Theorem 5, we introduce an upper bound on the number of components $N$ which increases to $\infty$ as $L$ decreases to $\mu$. This is mainly a result of the conservatism introduced in the derivation of (41). However, we manage to improve the regret bound of Theorem 4 from $O(1 + V_T)$ to $O(1 + S_T)$, following the discussion in Remark 5.8. The bound is also valid for unconstrained OCO problems that are not covered by Theorems 1-4.
The following two theorems give dynamic regret bounds without assuming an upper bound on the number of components $N$. However, at each time $t$, potentially multiple offline steps are needed to guarantee desirable regret bounds.
The modified algorithms are given in Algorithms 4 and 5.
It can be seen that in Algorithms 4 and 5, at each time $t$, $K$ updates are performed, where $K \in \mathbb{N}_{>0}$ is an integer to be chosen. In Algorithms 2 and 3, however, only one step is performed at each time $t$.
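A minimal sketch of this multi-step variant, reusing the hypothetical `coordinate_step` and `select_block` helpers from the earlier sketches: at each time $t$, $K$ coordinate updates are applied against the same cost $f_t$ before the next cost is revealed.

```python
def multi_step_update(x, t, K, alpha, grad_f, project, blocks, rule, rng=None):
    """K coordinate updates against the fixed cost f_t (Algorithms 4/5 style sketch)."""
    N = len(blocks)
    for k in range(K):
        g = grad_f(t, x)  # f_t is fixed throughout the inner loop
        i = select_block(rule, t * K + k, N, rng=rng, grad=g, blocks=blocks)
        x = coordinate_step(x, t, i, alpha, grad_f, project, blocks)
    return x
```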
Theorem 6.
Suppose Assumptions 2, 3 and 4 hold and $\nabla f_t(x_t^*) = 0$ for any $t$. If the stepsize of Algorithm 4 is chosen as $\alpha = \frac{1}{L_{\max}}$ and the number of updates $K$ performed at each time $t$ is sufficiently large, then $\mathrm{Reg}_d(T) = O(1 + S_T)$.

Proof. Let $y_0 = x_t$ and let $y_k$, $k = 1, \dots, K$, denote the iterates generated by the $K$ coordinate updates of Algorithm 4 at time $t$, so that $x_{t+1} = P_X(y_K)$. By the block descent lemma [1, Lemma 3.2] and the fact that $L_i \le L_{\max}$ for all $i$, we have, at any $t$, $f_t(y_{k+1}) \le f_t(y_k) - \alpha(1 - \frac{L_{\max}\alpha}{2})\|\nabla_{i_k} f_t(y_k)\|^2$ for any $k$. That is, the cost $f_t$ is non-increasing along the inner iterates. Summing over all blocks in a full round, where all components have been updated exactly once, leads to
$$f_t(y_{k+N}) \le f_t(y_k) - \alpha\Big(1 - \frac{L_{\max}\alpha}{2}\Big)\sum_{j=k}^{k+N-1}\|\nabla_{i_j} f_t(y_j)\|^2.$$
By the update equation and the Lipschitz continuity of the gradient, we have $\|\nabla f_t(y_k)\|^2 \le c_1\sum_{j=k}^{k+N-1}\|\nabla_{i_j} f_t(y_j)\|^2$ for some constant $c_1 > 0$ depending on $N$, $L$ and $\alpha$ (see [1]). Therefore
$$f_t(y_{k+N}) - f_t(x_t^*) \le f_t(y_k) - f_t(x_t^*) - \frac{\alpha}{c_1}\Big(1 - \frac{L_{\max}\alpha}{2}\Big)\|\nabla f_t(y_k)\|^2. \quad (46)$$
By Assumption 2, minimizing both sides of (16) with respect to $y$, we have $\|\nabla f_t(x)\|^2 \ge 2\mu\big(f_t(x) - f_t(x_t^*)\big)$ for any $x$. Then, by (46), we have
$$f_t(y_{k+N}) - f_t(x_t^*) \le c_2\big(f_t(y_k) - f_t(x_t^*)\big), \qquad c_2 := 1 - \frac{2\mu\alpha}{c_1}\Big(1 - \frac{L_{\max}\alpha}{2}\Big) \in (0, 1),$$
which further implies, after $\lfloor K/N \rfloor$ full rounds and an application of (16),
$$\frac{\mu}{2}\|y_K - x_t^*\|^2 \le f_t(y_K) - f_t(x_t^*) \le c_2^{\lfloor K/N \rfloor}\big(f_t(x_t) - f_t(x_t^*)\big). \quad (47)$$
By Assumption 3 and the fact that $\nabla f_t(x_t^*) = 0$, we have
$$f_t(x_t) - f_t(x_t^*) \le \frac{L}{2}\|x_t - x_t^*\|^2 \quad (48)$$
for any $t$. Consequently, (47) and (48) yield
$$\|y_K - x_t^*\|^2 \le \frac{L}{\mu}\,c_2^{\lfloor K/N \rfloor}\,\|x_t - x_t^*\|^2 \quad (49)$$
at any $t$. Since $c_2 < 1$, we have $c_2^{\lfloor K/N \rfloor} \to 0$ as $K \to \infty$. Hence there exists $K$ such that $\bar{\rho} := \sqrt{\frac{L}{\mu}c_2^{\lfloor K/N \rfloor}} < 1$. Hence, by Lemma 1
$$\|x_{t+1} - x_t^*\| \le \|y_K - x_t^*\| \le \bar{\rho}\,\|x_t - x_t^*\|. \quad (50)$$
By (44), we have $\mathrm{Reg}_d(T) \le \frac{L}{2}\sum_{t=1}^{T}\|x_t - x_t^*\|^2$. Note that, for any $\epsilon > 0$,
$$\|x_t - x_t^*\|^2 \le (1 + \epsilon)\|x_t - x_{t-1}^*\|^2 + \Big(1 + \frac{1}{\epsilon}\Big)\|x_t^* - x_{t-1}^*\|^2. \quad (51)$$
Summing up both sides of (51) from $t = 2$ to $T$ using (50), we have
$$\sum_{t=1}^{T}\|x_t - x_t^*\|^2 \le (1 + \epsilon)\bar{\rho}^2\sum_{t=1}^{T}\|x_t - x_t^*\|^2 + \|x_1 - x_1^*\|^2 + \Big(1 + \frac{1}{\epsilon}\Big)S_T.$$
Since $\bar{\rho} < 1$ and $\epsilon$ can be chosen such that $(1 + \epsilon)\bar{\rho}^2 < 1$, $\mathrm{Reg}_d(T) = O(1 + S_T)$ follows. ∎
Theorem 7.
Suppose the conditions of Theorem 6 hold. Then, for sufficiently large $K$, Algorithm 5 guarantees $\mathrm{Reg}_d(T) = O(1 + S_T)$.

Proof. Following similar steps as those in the proof of Theorem 6 and using (39) and (48) yields
$$\frac{\mu}{2}\|y_K - x_t^*\|^2 \le \rho^{2K}\big(f_t(x_t) - f_t(x_t^*)\big) \le \rho^{2K}\,\frac{L}{2}\,\|x_t - x_t^*\|^2, \quad (52)$$
where $\rho^2 = 1 - \frac{2\mu\alpha}{N}(1 - \frac{L_{\max}\alpha}{2})$ as in the proof of Theorem 5, and $y_K$ denotes the iterate after the $K$ updates at time $t$. Since $\rho < 1$, we know there exists $K$ such that $\bar{\rho} := \rho^{K}\sqrt{L/\mu} < 1$. Hence, we have
$$\|x_{t+1} - x_t^*\| \le \bar{\rho}\,\|x_t - x_t^*\|. \quad (53)$$
From (53), the following inequality holds by summation of both sides of (51) from $t = 2$ to $T$
$$\sum_{t=1}^{T}\|x_t - x_t^*\|^2 \le (1 + \epsilon)\bar{\rho}^2\sum_{t=1}^{T}\|x_t - x_t^*\|^2 + \|x_1 - x_1^*\|^2 + \Big(1 + \frac{1}{\epsilon}\Big)S_T.$$
Since $\bar{\rho} < 1$, we have $\mathrm{Reg}_d(T) = O(1 + S_T)$. ∎
6 Numerical simulations
First, we study the following unconstrained quadratic problem
$$\min_{x \in \mathbb{R}^n}\; f_t(x) = \frac{1}{2}x^\top A_t x + b^\top x, \quad (54)$$
where $b$ is a randomly generated constant vector and $A_t$ is a time-varying matrix that is positive definite for all $t$. Moreover, all elements of $A_t$ lie in a fixed closed interval. As a result, (54) satisfies Assumptions 2-4. Next, we discuss Assumption 1 (i) made in Theorem 4. The constant $G$ from Assumption 1 (i) is only used to show $\mathrm{Reg}_d(T) \le G\sum_{t=1}^{T}\mathbb{E}\|x_t - x_t^*\|$ such that (31) holds. All other arguments made in the proof of Theorem 4 are still true even if such a uniform constant $G$ does not exist. Thus, every iteration of Algorithm 1 will move $x_t$ closer to $x_t^*$ expectation-wise, if the stepsize is small enough. Since $x_t^*$ in our setup is uniformly bounded, the expectation of $\|x_t\|$ will remain bounded too. This means that $\|x_t - x_t^*\|$ and the gradients along the iterates must be bounded, almost surely. Each component $x_i$ of the decision variable is a scalar, and a small constant stepsize is used over the whole horizon. Since the static regret is always upper bounded by the dynamic regret, we only show the plots for the dynamic regrets $\mathrm{Reg}_d(T)$ and their time-averages $\mathrm{Reg}_d(T)/T$.
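A minimal sketch of this experiment under the quadratic form assumed above, using Algorithm 1 with random selection; the dimension, horizon, stepsize, and the specific construction of $A_t$ below are illustrative placeholders rather than the values used to generate the reported figures.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, alpha = 5, 2000, 0.05          # placeholder dimension, horizon, and stepsize
b = rng.standard_normal(n)           # constant vector b

def A(t):
    # time-varying positive definite matrix (assumed form); its entries
    # stay in a fixed closed interval, here [0.9, 1.1] on the diagonal
    return (1.0 + 0.1 * np.sin(0.01 * t)) * np.eye(n)

x = np.zeros(n)
regret = 0.0
for t in range(1, T + 1):
    At = A(t)
    f = lambda z: 0.5 * z @ At @ z + b @ z
    x_star = np.linalg.solve(At, -b)   # minimizer of f_t
    regret += f(x) - f(x_star)         # dynamic regret increment
    i = rng.integers(n)                # random scalar coordinate
    x[i] -= alpha * (At @ x + b)[i]    # unconstrained coordinate step (8)
print("average dynamic regret:", regret / T)
```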
Figure 1: Dynamic regrets and their time-averages for problem (54).
It can be seen from Fig. 1 that when the constant stepsize is chosen sufficiently small, the regrets in all cases grow sublinearly and therefore their time-averages go to $0$ when $T$ is sufficiently large. This is consistent with our theoretical results from Theorems 4-7 on strongly convex functions.
Figure 2: Dynamic regrets and their time-averages for problem (54) with a diagonally dominant $A_t$.
In order to see how the variation of the problem impacts the performance of the algorithms, we add an extra diagonal term such that the matrix $A_t$ is diagonally dominant and its minimizer is therefore less sensitive to the time-varying entries. We test the online algorithms in this case and the results are shown in Fig. 2. It can be seen that when problem (54) varies slowly with respect to time, the curves of the regrets in Fig. 2 have a lower growth rate compared to the regrets shown in Fig. 1.
As expected, the algorithm using the full gradient has the best performance in terms of minimizing the dynamic regret. Yet, it is worth mentioning that, among the three coordinate descent algorithms considered for this numerical example, the Gauss-Southwell rule gives the best performance, which is consistent with Remark 2. The extra information of the component-wise gradient norms enables a better selection of the coordinate to update. An in-depth theoretical analysis of this phenomenon in the online setting is left for future work.
Next, we consider the following problem of minimizing entropy functions online
$$\min_{x \in X}\; f_t(x) = \sum_{i=1}^{5} a_{i,t}\, x_i \log x_i. \quad (55)$$
The variable $x$ is decomposed into 5 scalar components with a compact constraint set $X$. The values of the time-varying coefficients $a_{i,t}$ are individually and randomly selected from a fixed bounded positive interval. For sufficiently large $t$, the coefficients are held fixed for all $i$. It can be verified that the above selection ensures that $V_T$ and $S_T$ are bounded. Note that the cost function in (55) is convex but not strongly convex, and hence only Theorem 1 and Theorem 3 apply. We again show the plots of the dynamic regrets of Algorithms 1-3 and the full-gradient based algorithm in Fig. 3 with a constant stepsize. Moreover, the plot for random coordinate descent is averaged over 1000 runs.
It can be seen from Fig. 1 to Fig. 3 that, for the quadratic problem, the dynamic regrets are a lot flatter when $T$ is large. On the other hand, the dynamic regrets in Fig. 3 still exhibit significant growth at large $T$, even though we select the time-varying parameters to ensure that the variation measures remain bounded. These findings are consistent with the improved regret bounds shown in Theorems 4-7 for uniformly strongly convex functions with uniformly Lipschitz gradients.
Figure 3: Dynamic regrets and their time-averages for problem (55).
7 Summary
In this work, we have proposed online coordinate descent algorithms to deal with optimization problems that may change over time. Three widely used update rules of coordinate descent are considered. Under different assumptions, we have provided different upper bounds on the regrets of these online algorithms. In particular, we have verified that the established regret bounds of these coordinate descent algorithms are of similar orders to those of online gradient descent methods under the same settings. The regret bounds proved in this paper are summarized in Table 1. Lastly, numerical examples were given to illustrate our main results. The possibility of using coordinates with overlapping components is an interesting future research direction, especially for the deterministic case. Another topic of interest is the use of inaccurate gradient information in online coordinate descent algorithms.
The authors would like to thank Dr. Yuen-Man Pun for pointing out an error in the proof of Theorem 6 in the initial draft.
References
- [1] Amir Beck and Luba Tetruashvili. On the convergence of block coordinate descent type methods. SIAM Journal on Optimization, 23(4):2037–2060, 2013.
- [2] A. S. Bedi, P. Sarma, and K. Rajawat. Tracking moving agents via inexact online gradient descent algorithm. IEEE Journal of Selected Topics in Signal Processing, 12(1):202–217, 2018.
- [3] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, 1989.
- [4] Dimitri P Bertsekas. Convex Optimization Algorithms. Athena Scientific, 2015.
- [5] Xuanyu Cao and Tamer Başar. Decentralized online convex optimization with feedback delays. IEEE Transactions on Automatic Control, 67(6):2889–2904, 2022.
- [6] Xuanyu Cao and KJ Ray Liu. Online convex optimization with time-varying constraints and bandit feedback. IEEE Transactions on Automatic Control, 64(7):2665–2680, 2019.
- [7] Xuanyu Cao, Junshan Zhang, and H Vincent Poor. Online stochastic optimization with time-varying distributions. IEEE Transactions on Automatic Control, 66(4):1840–1847, 2021.
- [8] Ting-Jui Chang and Shahin Shahrampour. On online optimization: Dynamic regret analysis of strongly convex and smooth problems. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 6966–6973, 2021.
- [9] Tianyi Chen, Qing Ling, and Georgios B Giannakis. An online convex optimization approach to proactive network resource allocation. IEEE Transactions on Signal Processing, 65(24):6350–6364, 2017.
- [10] Dan Garber and Elad Hazan. A linearly convergent variant of the conditional gradient algorithm under strong convexity, with applications to online and stochastic optimization. SIAM Journal on Optimization, 26(3):1493–1528, 2016.
- [11] Elad Hazan, Amit Agarwal, and Satyen Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169–192, 2007.
- [12] Saghar Hosseini, Airlie Chapman, and Mehran Mesbahi. Online distributed convex optimization on dynamic networks. IEEE Transactions on Automatic Control, 61(11):3545–3550, 2016.
- [13] Rodolphe Jenatton, Jim Huang, and Cédric Archambeau. Adaptive algorithms for online convex optimization with long-term constraints. In International Conference on Machine Learning, pages 402–411, 2016.
- [14] A. Lesage-Landry and J. A. Taylor. Setpoint tracking with partially observed loads. IEEE Transactions on Power Systems, 33(5):5615–5627, 2018.
- [15] Antoine Lesage-Landry, Iman Shames, and Joshua A Taylor. Predictive online convex optimization. Automatica, 113:108771, 2020.
- [16] Antoine Lesage-Landry, Joshua A Taylor, and Iman Shames. Second-order online nonconvex optimization. IEEE Transactions on Automatic Control, 2020.
- [17] Xiuxian Li, Lihua Xie, and Na Li. A survey of decentralized online learning. arXiv preprint arXiv:2205.00473, 2022.
- [18] Yingying Li, Guannan Qu, and Na Li. Online optimization with predictions and switching costs: Fast algorithms and the fundamental limit. IEEE Transactions on Automatic Control, 66(10):4761–4768, 2021.
- [19] Zhenhong Li, Zhengtao Ding, Junyong Sun, and Zhongkui Li. Distributed adaptive convex optimization on directed graphs via continuous-time algorithms. IEEE Transactions on Automatic Control, 63(5):1434–1441, 2018.
- [20] Yankai Lin, Iman Shames, and Dragan Nešić. Asynchronous distributed optimization via dual decomposition and block coordinate ascent. In 58th IEEE Conference on Decision and Control, pages 6380–6385, 2019.
- [21] Yankai Lin, Iman Shames, and Dragan Nešić. Asynchronous distributed optimization via dual decomposition and block coordinate subgradient methods. IEEE Transactions On Control Of Network Systems, 8(3):1348–1359, 2021.
- [22] Behnam Mafakheri, Iman Shames, and Jonathan Manton. First order online optimisation using forward gradients under the Polyak-Łojasiewicz condition. arXiv preprint arXiv:2211.15825, 2022.
- [23] Aryan Mokhtari, Shahin Shahrampour, Ali Jadbabaie, and Alejandro Ribeiro. Online optimization in dynamic environments: Improved regret rates for strongly convex problems. In 55th IEEE Conference on Decision and Control, pages 7195–7201, 2016.
- [24] Yu Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(2):341–362, 2012.
- [25] Julie Nutini, Mark Schmidt, Issam Laradji, Michael Friedlander, and Hoyt Koepke. Coordinate descent converges faster with the Gauss-Southwell rule than random selection. In International Conference on Machine Learning, pages 1632–1641. PMLR, 2015.
- [26] Yipeng Pang and Guoqiang Hu. Randomized gradient-free distributed online optimization via a dynamic regret analysis. IEEE Transactions on Automatic Control, 2023.
- [27] Santiago Paternain and Alejandro Ribeiro. Online learning of feasible strategies in unknown environments. IEEE Transactions on Automatic Control, 62(6):2807–2822, 2017.
- [28] Shahin Shahrampour and Ali Jadbabaie. Distributed online optimization in dynamic environments using mirror descent. IEEE Transactions on Automatic Control, 63(3):714–725, 2018.
- [29] S. Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194, 2012.
- [30] Iman Shames, Daniel Selvaratnam, and Jonathan H Manton. Online optimization using zeroth order oracles. IEEE Control Systems Letters, 4(1):31–36, 2019.
- [31] S. J. Wright. Coordinate descent algorithms. Mathematical Programming, 151(1):3–34, 2015.
- [32] Yongyang Xiong, Xiang Li, Keyou You, and Ligang Wu. Distributed online optimization in time-varying unbalanced networks without explicit subgradients. IEEE Transactions on Signal Processing, 70:4047–4060, 2022.
- [33] Feng Yan, Shreyas Sundaram, S. V. N Vishwanathan, and Yuan Qi. Distributed autonomous online learning: Regrets and intrinsic privacy-preserving properties. IEEE Transactions on Knowledge and Data Engineering, 25(11):2483–2493, 2013.
- [34] Peng Yi and Yiguang Hong. Stochastic sub-gradient algorithm for distributed optimization with random sleep scheme. Control Theory and Technology, 13:333–347, 2015.
- [35] Xinlei Yi, Xiuxian Li, Lihua Xie, and Karl H Johansson. Distributed online convex optimization with time-varying coupled inequality constraints. IEEE Transactions on Signal Processing, 68:731–746, 2020.
- [36] Xinlei Yi, Xiuxian Li, Tao Yang, Lihua Xie, Tianyou Chai, and Karl H Johansson. Distributed bandit online convex optimization with time-varying coupled inequality constraints. IEEE Transactions on Automatic Control, 66:4620–4635, 2021.
- [37] Xinlei Yi, Xiuxian Li, Tao Yang, Lihua Xie, Tianyou Chai, and Karl H Johansson. Regret and cumulative constraint violation analysis for distributed online constrained convex optimization. IEEE Transactions on Automatic Control, 68:2875–2890, 2023.
- [38] Hao Yu, Michael J Neely, and Xiaohan Wei. Online convex optimization with stochastic constraints. In Advances in Neural Information Processing Systems, pages 1428–1438, 2017.
- [39] Deming Yuan, Alexandre Proutiere, and Guodong Shi. Distributed online optimization with long-term constraints. IEEE Transactions on Automatic Control, 67(3):1089–1104, 2022.
- [40] Lijun Zhang, Tianbao Yang, Jinfeng Yi, Rong Jin, and Zhi-Hua Zhou. Improved dynamic regret for non-degenerate functions. Advances in Neural Information Processing Systems, 30, 2017.
- [41] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning, pages 928–936, 2003.