An Optimistic Algorithm for Online Convex Optimization with Adversarial Constraints
Abstract
We study Online Convex Optimization (OCO) with adversarial constraints, where an online algorithm must make repeated decisions to minimize both convex loss functions and cumulative constraint violations. We focus on a setting where the algorithm has access to predictions of the loss and constraint functions. Our results show that we can improve the current best bounds of regret and cumulative constraint violation to $O\big(\sqrt{E_T(f)}\big)$ and $O\big(\sqrt{E_T(g)}\,\log T\big)$, respectively, where $E_T(f)$ and $E_T(g)$ represent the cumulative prediction errors of the loss and constraint functions. In the worst case, where $E_T(f) = O(T)$ and $E_T(g) = O(T)$ (assuming bounded loss and constraint functions), our rates match the prior $O(\sqrt{T})$ regret and $O(\sqrt{T}\log T)$ results. However, when the loss and constraint predictions are accurate, our approach yields significantly smaller regret and cumulative constraint violations. Notably, if the constraint function remains constant over time, we achieve an $O(\log T)$ cumulative constraint violation, aligning with prior results.
1 Introduction
We are interested in generalizations of Online Convex Optimization (OCO) to problems in which constraints are imposed; we refer to these generalizations as Constrained Online Convex Optimization (COCO). Recall the standard formulation of OCO (Orabona, 2019; Hazan, 2023): At each round $t$, a learner makes a decision $x_t \in \mathcal{X}$, receives a loss function $f_t$ from the environment, and suffers the loss $f_t(x_t)$. The goal of the learner is to minimize the cumulative loss $\sum_{t=1}^T f_t(x_t)$. In COCO, the learner additionally must attempt to satisfy a (potentially adversarial) constraint of the form $g_t(x) \le 0$ at every time step. The learner observes $g_t$ only after selecting $x_t$, and cannot always satisfy the constraint exactly, but can hope to have a small cumulative constraint violation. In the adversarial setting it is not viable to seek absolute minima of the cumulative loss, and the problem is generally formulated in terms of obtaining a sublinear regret, defined as the difference between the learner's cumulative loss and the cumulative loss of a fixed decision. Having sublinear regret means that we perform as well as the best action in hindsight. In the COCO problem we also aim to ensure small cumulative constraint violation.
COCO has been studied extensively over the past two decades, starting with Mahdavi et al. (2012); Jenatton et al. (2016); Yu and Neely (2020), which assumed time-invariant constraints. There has also been work on time-varying constraints (Neely and Yu, 2017; Liu et al., 2022) under a Slater assumption: there exist $\beta > 0$ and $\hat x \in \mathcal{X}$ such that $g_t(\hat x) \le -\beta$ for all $t$. The guarantees presented by those authors are polynomial in $1/\beta$, and unfortunately $\beta$ can be made arbitrarily small by the adversary. More recent work has established bounds on regret that scale as $O(\sqrt{T})$, with constraint violations scaling as $\tilde O(T^{3/4})$ and, most recently, $O(\sqrt{T}\log T)$ (Guo et al., 2022; Yi et al., 2023; Sinha and Vaze, 2024).
Recent work in OCO has considered settings in which the adversary is predictable, i.e., not entirely arbitrary, aiming to obtain improved regret bounds. For example, Chiang et al. (2012); Rakhlin and Sridharan (2013b) studied the case in which the previous loss function is used as a prediction at the current time step (i.e., $\hat f_t = f_{t-1}$). They showed that the regret improves from $O(\sqrt{T})$ to $O(\sqrt{D_T})$, where $D_T$ is a measure of the cumulative prediction error. This optimistic framework has also been studied in the COCO setting by Qiu et al. (2023), who focused on time-invariant constraints; those authors established a similar gradual-variation bound on regret together with bounded constraint violations. This line of work was pursued further in Anderson et al. (2022), who, considering the case of adaptive constraints and perfect predictions, established improved bounds on both regret and constraint violation.
In the current paper we go beyond earlier work to consider the case of adversarial constraints. Specifically, our contributions are as follows:
1. We present the first algorithm to solve COCO problems in which the constraints are adversarial but also predictable: a meta-algorithm that, when built on an optimistic OCO algorithm, achieves $O\big(\sqrt{E_T(f)}\big)$ regret and $O\big(\sqrt{E_T(g)}\,\log T\big)$ constraint violation.
2. When applied to problems having fixed constraints, our meta-algorithm achieves an $O(\sqrt{T})$ regret and an $O(\log T)$ constraint violation, matching previous work up to a $\log T$ term.
The rest of the paper is structured as follows: we present previous work in Section 2, introduce the main assumptions and notation in Section 3, and review optimistic OCO in Section 4. In Section 5, we present our meta-algorithm for the COCO problem and establish theoretical guarantees.
| Reference | Complexity per round | Constraints | Loss function | Regret | Violation |
| --- | --- | --- | --- | --- | --- |
| Guo et al. (2022) | Conv-OPT | Fixed | Convex | $O(\sqrt{T})$ | $O(1)$ |
| | Conv-OPT | Adversarial | Convex | $O(\sqrt{T})$ | $O(T^{3/4})$ |
| Yi et al. (2023) | Conv-OPT | Adversarial | Convex | $O(\sqrt{T})$ | $O(T^{3/4})$ |
| Sinha and Vaze (2024) | Projection | Adversarial | Convex | $O(\sqrt{T})$ | $O(\sqrt{T}\log T)$ |
| Qiu et al. (2023) | Projection | Fixed | Convex, Slater | $O(\sqrt{1+V_T})$ | $O(1)$ |
| Ours | Projection | Fixed | Convex | $O(\sqrt{1+E_T(f)})$ | $O(\log T)$ |
| | Projection | Adversarial | Convex | $O(\sqrt{1+E_T(f)})$ | $O(\sqrt{1+E_T(g)}\,\log T)$ |

Table 1: Comparison of our guarantees with prior work. $V_T$ denotes the gradual variation of the losses; $E_T(f)$ and $E_T(g)$ are the cumulative prediction errors defined in (3).
2 Related Work
Unconstrained OCO
The OCO problem was introduced by Zinkevich (2003), who established an $O(\sqrt{T})$ regret guarantee based on projected online gradient descent (OGD). Hazan (2023) and Orabona (2019) provide overviews of the burgeoning literature that has emerged since Zinkevich's seminal work, in particular focusing on online mirror descent (OMD) as a general way to solve OCO problems.
Optimistic OCO
Optimistic OCO is often formulated as a problem involving gradual variation, i.e., where $f_t$ and $f_{t-1}$ are close in some appropriate metric. Chiang et al. (2012) exploit this assumption in an optimistic version of OMD that incorporates a prediction based on the most recent gradient, and establish a regret bound of $O(\sqrt{D_T})$, where $D_T = \sum_{t=1}^T \sup_{x \in \mathcal{X}} \|\nabla f_t(x) - \nabla f_{t-1}(x)\|_*^2$. Rakhlin and Sridharan (2013a, b) prove that when using a predictor $M_t$ that is not necessarily the past gradient, one can obtain regret of the form $O\big(\sqrt{\sum_{t=1}^T \|\nabla f_t(x_t) - M_t\|_*^2}\big)$.
Constrained OCO
Constrained OCO was first studied in the context of time-invariant constraints, i.e., where $g_t = g$ for all $t$. In this setup one can employ projection-free algorithms, avoiding the potentially costly projection onto the set $\{x : g(x) \le 0\}$, by allowing the learner to violate the constraints in a controlled way (Mahdavi et al., 2012; Jenatton et al., 2016; Yu and Neely, 2020). Qiu et al. (2023) studied the case with gradual variations, proving a gradual-variation regret guarantee along with a bounded constraint violation. The case of time-varying constraints is significantly harder, as the constraints are potentially adversarial. Most of the early work studying such constraints (Neely and Yu, 2017; Yi et al., 2023) accordingly incorporated an additional Slater condition: there exist $\beta > 0$ and $\hat x \in \mathcal{X}$ such that $g_t(\hat x) \le -\beta$ for all $t$. These papers establish regret guarantees that grow with $1/\beta$, which unfortunately can be vacuous as $\beta$ can be arbitrarily small. Guo et al. (2022) present an algorithm that does not require the Slater condition and yields improved bounds, but it requires solving a convex optimization problem at each time step. In more recent work, Sinha and Vaze (2024) presented a simple and efficient algorithm that solves the problem with just a projection per round and obtained state-of-the-art guarantees. See Table 1 for a comparison of our results with previous work.
3 Preliminaries
3.1 Problem setup and notation
Let $\mathbb{R}$ denote the set of real numbers, and let $\mathbb{R}^d$ denote the set of $d$-dimensional real vectors. Let $\mathcal{X} \subseteq \mathbb{R}^d$ denote the set of possible actions of the learner, where $x \in \mathcal{X}$ is a specific action, and let $\|\cdot\|$ be a norm defined on $\mathbb{R}^d$. Let the dual norm be denoted as $\|y\|_* = \sup_{\|x\| \le 1} \langle x, y \rangle$.
Online learning is a problem formulation in which the learner plays the following game with Nature. At each step $t = 1, \dots, T$:

1. The learner plays an action $x_t \in \mathcal{X}$.
2. Nature reveals a loss function $f_t : \mathcal{X} \to \mathbb{R}$ and a constraint function $g_t : \mathcal{X} \to \mathbb{R}$. (If we have multiple constraint functions $g_t^{(1)}, \dots, g_t^{(k)}$, we set $g_t = \max_{i \in [k]} g_t^{(i)}$.)
3. The learner receives the loss $f_t(x_t)$ and the constraint violation $g_t(x_t)$.
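To make this interaction concrete, the sketch below spells out the protocol as code. It is purely illustrative: the `learner` and `nature` objects and their methods are hypothetical stand-ins, not part of any algorithm defined in this paper.

```python
import numpy as np

def run_protocol(learner, nature, T):
    """Illustrative COCO interaction loop; `learner` and `nature` are
    hypothetical objects exposing act/update and reveal, respectively."""
    losses, violations = [], []
    for t in range(1, T + 1):
        x_t = learner.act()                  # 1. the learner plays x_t
        f_t, g_t = nature.reveal(t)          # 2. nature reveals f_t and g_t
        losses.append(f_t(x_t))              # 3. the loss f_t(x_t) is suffered,
        violations.append(max(g_t(x_t), 0))  #    along with the violation g_t(x_t)^+
        learner.update(f_t, g_t)             # the learner may now use f_t and g_t
    return np.sum(losses), np.sum(violations)
```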
In standard OCO, the loss function $f_t$ is convex, and the goal of the learner is to minimize the regret with respect to an oracle action $x^\star \in \mathcal{X}$, where:

$$\mathrm{Regret}_T(x^\star) = \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x^\star). \qquad (1)$$
In COCO, we generalize the OCO problem to additionally ask the learner to obtain a small cumulative constraint violation, which we denote as $\mathrm{CCV}_T$:

$$\mathrm{CCV}_T = \sum_{t=1}^{T} g_t^+(x_t), \quad \text{where } g^+(\cdot) = \max\big(g(\cdot), 0\big). \qquad (2)$$
Overall, the goal of the learner is to achieve both sublinear regret and sublinear CCV. This is a challenging problem; indeed, Mannor et al. (2009) proved that no algorithm can achieve both sublinear regret and sublinear cumulative constraint violation for an oracle in the set $\{x \in \mathcal{X} : \sum_{t=1}^T g_t(x) \le 0\}$. However, it is possible to find algorithms that achieve sublinear regret for the smaller set $\mathcal{X}^\star := \{x \in \mathcal{X} : g_t(x) \le 0,\ \forall t \in [T]\}$, and this latter problem is our focus.
Finally, we will assume that at the end of step $t$, the learner can make predictions $\hat f_{t+1}$ and $\hat g_{t+1}$ of the upcoming loss and constraint. More precisely, we will be interested in predictions of the gradients, and, for any function $h$, we will denote by $\nabla \hat h$ the prediction of the gradient of $h$. We will abuse notation and denote by $\hat h$ the function whose gradient is $\nabla \hat h$. Moreover, we define the following prediction errors

$$E_T(f) = \sum_{t=1}^{T} \|\nabla f_t(x_t) - \nabla \hat f_t(x_t)\|_*^2, \qquad E_T(g) = \sum_{t=1}^{T} \|\nabla g_t(x_t) - \nabla \hat g_t(x_t)\|_*^2, \qquad (3)$$

where $(x_t)_{t \ge 1}$ is the sequence of actions taken by the learner.
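As a concrete reading of this definition, the snippet below computes a cumulative prediction error along the iterates. It follows the squared-dual-norm form of (3); the function names are ours, and the Euclidean (self-dual) norm is used purely for illustration.

```python
import numpy as np

def cumulative_prediction_error(grads, grad_hats, xs):
    """E_T = sum_t ||grad_t(x_t) - grad_hat_t(x_t)||_*^2 along the iterates.

    `grads[t]` and `grad_hats[t]` are callables returning gradients at a
    point; the Euclidean norm (self-dual) is used for illustration.
    """
    return sum(
        float(np.linalg.norm(g(x) - gh(x)) ** 2)
        for g, gh, x in zip(grads, grad_hats, xs)
    )
```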
Additionally, for a given $1$-strongly convex function $\psi$, we define the Bregman divergence between two points $x$ and $y$:

$$D_\psi(x, y) = \psi(x) - \psi(y) - \langle \nabla \psi(y), x - y \rangle. \qquad (4)$$
Two special cases are particularly important:

1. When $\psi(x) = \frac{1}{2}\|x\|_2^2$, the Bregman divergence is the squared Euclidean distance, $D_\psi(x, y) = \frac{1}{2}\|x - y\|_2^2$; here $\psi$ is strongly convex with respect to the L2 norm, and $\|\cdot\|_* = \|\cdot\|_2$.
2. When $\psi(x) = \sum_i x[i] \log x[i]$ on the simplex, the Bregman divergence is the KL divergence: $D_\psi(p, q) = \mathrm{KL}(p \| q) = \sum_i p[i] \log \frac{p[i]}{q[i]}$; here $\psi$ is strongly convex with respect to the L1 norm (by Pinsker's inequality), and $\|\cdot\|_* = \|\cdot\|_\infty$.
3.2 Assumptions
Throughout this paper we will use various combinations of the following assumptions.
Assumption 1 (Convex set, loss and constraints).
We make the following standard assumptions on the action set and the loss and constraint functions:

1. $\mathcal{X}$ is closed, convex and bounded with diameter $D$.
2. $\forall t$, $f_t$ is convex and differentiable.
3. $\forall t$, $g_t$ is convex and differentiable.
Assumption 2 (Bounded losses).
The loss functions are uniformly bounded by $F$ and the constraint functions are uniformly bounded by $G$: $|f_t(x)| \le F$ and $|g_t(x)| \le G$ for all $t$ and all $x \in \mathcal{X}$.
Assumption 3 (Feasibility).
The set $\mathcal{X}^\star = \{x \in \mathcal{X} : g_t(x) \le 0,\ \forall t \in [T]\}$ is not empty.
Assumption 4 (Lipschitz gradients).
For any $t$, the gradient of the loss function and the gradient of the constraint function are $L_f$- and $L_g$-Lipschitz, respectively. That is, $\forall x, y \in \mathcal{X}$, we have:

$$\|\nabla f_t(x) - \nabla f_t(y)\|_* \le L_f \|x - y\|, \qquad \|\nabla g_t(x) - \nabla g_t(y)\|_* \le L_g \|x - y\|.$$

We will abuse notation and denote $\|\nabla f_t\| = \sup_{x \in \mathcal{X}} \|\nabla f_t(x)\|_*$, and similarly for $g_t$. Moreover, we will assume that there is a constant $G_f$ such that $\max_t \|\nabla f_t\| \le G_f$. We similarly define $G_g$.
Assumption 5 (Prediction Sequence Regularity).
The prediction sequences are such that $\hat f_t$ is $\hat L_f$-smooth and $\hat g_t$ is $\hat L_g$-smooth:

$$\|\nabla \hat f_t(x) - \nabla \hat f_t(y)\|_* \le \hat L_f \|x - y\|, \qquad \|\nabla \hat g_t(x) - \nabla \hat g_t(y)\|_* \le \hat L_g \|x - y\|, \qquad \forall x, y \in \mathcal{X}.$$

We again abuse notation and let $\hat G_f = \max_t \sup_{x \in \mathcal{X}} \|\nabla \hat f_t(x)\|_*$, and similarly for $\hat G_g$.
4 Optimistic Algorithms for OCO
In this section, we introduce some of the foundational optimistic algorithms that have been used for OCO. Our meta-algorithm for solving COCO problems will build on these foundational ingredients.
4.1 Optimistic OMD and Optimistic AdaGrad
Theorem 7 (Optimistic OMD).
If $\eta_t = \eta$ is constant, we have

$$\mathrm{Regret}_T(x^\star) \le \frac{D_\psi(x^\star, x_1)}{\eta} + \eta \sum_{t=1}^{T} \|\nabla f_t(x_t) - \nabla \hat f_t(x_t)\|_*^2. \qquad (6)$$
For completeness we present a proof of this result in Appendix A, adapting the proof from Rakhlin and Sridharan (2013b) with the important change that the regret depends on $\|\nabla f_t(x_t) - \nabla \hat f_t(x_t)\|_*$ instead of $\sup_{x \in \mathcal{X}} \|\nabla f_t(x) - \nabla \hat f_t(x)\|_*$. This is significant, as the former equals zero when the predictions are perfect at the iterates, while the latter might not be zero.
In order to obtain the optimal convergence rate with a fixed learning rate, we must set $\eta \propto 1/\sqrt{E_T(f)}$, which requires knowing $E_T(f)$ in advance. We mitigate this issue via the use of adaptive learning rates, in the style of Rakhlin and Sridharan (2013b). Note also that D'Orazio and Huang (2021) present an algorithm for a more general class of algorithms called Lagrangian hedging, OMD being a special case. However, that algorithm requires advance knowledge of quantities that our approach does not require.
Theorem 8 (Adapted from Rakhlin and Sridharan (2013b), Corollary 2).
With the adaptive learning rate $\eta_t \propto \big(1 + \sum_{s < t} \|\nabla f_s(x_s) - \nabla \hat f_s(x_s)\|_*^2\big)^{-1/2}$, Algorithm 1 satisfies

$$\mathrm{Regret}_T(x^\star) = O\Big(\sqrt{1 + E_T(f)}\Big).$$

Rakhlin and Sridharan (2013b) use a slightly different learning rate. We modify the learning rate to depend on the observed prediction errors because this allows the scale of the functions to vary over time, which will be useful in the generalization to COCO. If the functions are linear, either choice of learning rate can be used.
The proof is in Appendix B.
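The adaptive learning rate can be computed online from the running sum of squared prediction errors. The sketch below shows the generic shape of such a rule; the exact constant in Theorem 8 is not reproduced, so `scale` is a placeholder for the problem-dependent constant.

```python
import numpy as np

def adaptive_eta(past_sq_errors, scale=1.0):
    """Optimistic-AdaGrad-style step size:
    eta_t = scale / sqrt(1 + sum of past squared prediction errors).
    `scale` stands in for the problem-dependent constant (e.g. the
    diameter D); the exact constant from Theorem 8 is not reproduced."""
    return scale / np.sqrt(1.0 + float(np.sum(past_sq_errors)))
```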
Remark 9.
Note that the algorithm is computationally efficient. Indeed, a mirror step can be computed in two stages:

1. Compute $y$ such that $\nabla \psi(y) = \nabla \psi(x_t) - \eta \nabla f(x_t)$. In particular, if $\nabla \psi$ is invertible, $y = (\nabla \psi)^{-1}\big(\nabla \psi(x_t) - \eta \nabla f(x_t)\big)$.
2. Let $x_{t+1} = \arg\min_{x \in \mathcal{X}} D_\psi(x, y)$.

Moreover, we have two special cases of interest:

1. When $\psi = \frac{1}{2}\|\cdot\|_2^2$, this is simply projected gradient descent: $x_{t+1} = \Pi_{\mathcal{X}}\big(x_t - \eta \nabla f(x_t)\big)$.
2. When $\mathcal{X} = \Delta_d$ is the $d$-dimensional simplex, with $\psi$ being the negative entropy, $x_{t+1}[i] = \frac{1}{Z}\, x_t[i]\, e^{-\eta \nabla f(x_t)[i]}$, where $Z$ is a normalizing factor ensuring that $\sum_i x_{t+1}[i] = 1$.
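The following sketch implements one optimistic step in the Euclidean special case (projected gradient descent with a prediction), using the standard two-sequence scheme of Rakhlin and Sridharan (2013b). The projection onto an L2 ball is a stand-in for the projection onto a generic convex set $\mathcal{X}$; this is an illustration, not the paper's exact Algorithm 1.

```python
import numpy as np

def project_l2_ball(x, radius=1.0):
    """Euclidean projection onto a centered L2 ball (a stand-in for a
    generic convex action set X)."""
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

def optimistic_play(x_sec, grad_hat, eta, project=project_l2_ball):
    """Play x_t = Proj_X(~x_{t-1} - eta * predicted gradient)."""
    return project(x_sec - eta * grad_hat)

def secondary_update(x_sec, grad_obs, eta, project=project_l2_ball):
    """Update ~x_t = Proj_X(~x_{t-1} - eta * gradient observed at x_t)."""
    return project(x_sec - eta * grad_obs)
```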
4.2 Optimistic OMD for learning with experts
In this setting, the agent has access to $d$ experts and has to form a distribution for selecting among them. She observes the loss of each expert and suffers an overall loss which is the expected value over the experts. Formally, we assume $\mathcal{X} = \Delta_d$, where $d$ is the number of experts. At each step $t$, the learner selects $p_t \in \Delta_d$, a distribution over experts, and observes $\ell_t \in \mathbb{R}^d$, which is the vector of losses across the experts. The learner then suffers the loss $\langle p_t, \ell_t \rangle$. We will let $\hat \ell_t$ denote the prediction of $\ell_t$.

We could use Algorithm 1 with $\psi = \frac{1}{2}\|\cdot\|_2^2$, but in the worst case the resulting bound can be as large as $O(\sqrt{dT})$; we are instead able to achieve a scaling in $\sqrt{\log d}$. Let $\psi(p) = \sum_i p[i] \log p[i]$ and $\|\cdot\| = \|\cdot\|_1$. In that case, the Bregman divergence is the KL divergence and the dual norm is $\|\cdot\|_\infty$. However, the KL divergence is not upper bounded, as any coordinate $p[i]$ can be arbitrarily close to zero. We circumvent this problem in Algorithm 2 by mixing the iterates with the uniform distribution. This algorithm can be found in Rakhlin and Sridharan (2013b) in the context of a two-player zero-sum game.
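A minimal sketch of one entropic step with uniform mixing follows. The mixing weight `mix` (e.g., $1/T$) and the function names are our own; this illustrates the idea behind Algorithm 2 rather than reproducing it exactly.

```python
import numpy as np

def optimistic_experts_step(p_sec, ell_hat, ell_obs, eta, mix=None):
    """One optimistic exponentiated-weights step with optional uniform mixing.

    p_sec   : secondary distribution ~p_{t-1} over the d experts
    ell_hat : predicted loss vector for round t
    ell_obs : observed loss vector, revealed after playing
    mix     : mixing weight toward uniform (e.g. 1/T), or None
    """
    d = len(p_sec)

    def exp_step(p, loss_vec):
        w = p * np.exp(-eta * loss_vec)
        return w / w.sum()

    p_t = exp_step(p_sec, ell_hat)          # play using the predicted losses
    if mix is not None:
        # Mixing keeps every coordinate at least mix/d, so KL terms stay bounded.
        p_t = (1.0 - mix) * p_t + mix / d
    p_sec_new = exp_step(p_sec, ell_obs)    # secondary update on observed losses
    return p_t, p_sec_new
```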
Theorem 10 (Optimistic OMD with experts).
Suppose Assumption 1 holds and $\eta_t = \eta$ is constant. Then

$$\mathrm{Regret}_T = O\bigg(\frac{\log(dT)}{\eta} + \eta \sum_{t=1}^{T} \|\ell_t - \hat \ell_t\|_\infty^2\bigg). \qquad (9)$$

Moreover, by using the adaptive learning rate $\eta_t$ defined as

$$\eta_t \propto \sqrt{\frac{\log(dT)}{1 + \sum_{s < t} \|\ell_s - \hat \ell_s\|_\infty^2}}, \qquad (10)$$

the regret guarantee becomes:

$$\mathrm{Regret}_T = O\bigg(\sqrt{\log(dT)\Big(1 + \sum_{t=1}^{T} \|\ell_t - \hat \ell_t\|_\infty^2\Big)}\bigg). \qquad (11)$$
5 Meta-Algorithm for Optimistic COCO
Our meta-algorithm is inspired by Sinha and Vaze (2024). The main idea of that paper is to build a surrogate loss function that can be seen as a Lagrangian of the optimization problem

$$\min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x) \quad \text{subject to} \quad g_t(x) \le 0,\ \forall t \in [T].$$

The learner then runs AdaGrad (Duchi et al., 2011) on the surrogate, with a theoretical guarantee of bounded cumulative constraint violation (CCV) and regret.
Our meta-algorithm is based on the use of optimistic methods, such as those presented in Section 4, which allows us to obtain stronger bounds that depend on the prediction quality. Presented in Algorithm 3, this algorithm assumes that at the end of every step $t-1$, the learner makes predictions $\hat f_t$ and $\hat g_t$ of the upcoming loss $f_t$ and constraint $g_t$. (We are actually only interested in the predictions of the gradients, but for simplicity we will let $\hat f_t$ denote any function whose gradient is the prediction of the gradient $\nabla f_t$.) At each time step $t$, the learner forms a surrogate loss function, defined via a convex potential (Lyapunov) function $\Phi : \mathbb{R}_+ \to \mathbb{R}_+$ whose derivative $\Phi'$ is non-decreasing, and the running CCV:

$$Q_t = Q_{t-1} + g_t^+(x_t), \qquad Q_0 = 0. \qquad (12)$$

Using the predictions $\hat f_t$ and $\hat g_t$, we form a prediction $\hat L_t$ of the Lagrange function $L_t$, where $L_t$ is defined in Equation (13):

$$L_t(x) = V f_t(x) + \lambda_t\, g_t^+(x), \qquad \hat L_t(x) = V \hat f_t(x) + \lambda_t\, \hat g_t^+(x), \qquad (13)$$

where $V > 0$ is a scale parameter and $\lambda_t \ge 0$ is a multiplier. In Sinha and Vaze (2024), the multiplier update is $\lambda_t = \Phi'(Q_t)$, but using $\Phi'(Q_t)$ at step $t$ requires $g_t$ to be known at the end of step $t-1$, when the prediction $\hat L_t$ must be formed. We instead define the following update:

$$\lambda_t = \Phi'(Q_{t-1}). \qquad (14)$$
The learner then executes the step of the base optimistic algorithm $\mathcal{A}$ on the surrogate, as described in Algorithm 3; a schematic sketch of this loop is given below.
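The sketch below illustrates this control flow under the exponential potential $\Phi(q) = e^{\lambda q} - 1$ used later in Theorem 13, so that $\Phi'(q) = \lambda e^{\lambda q}$. The `base_alg`, `predict`, and `observe` callables are hypothetical stand-ins for the optimistic base algorithm $\mathcal{A}$, the prediction oracle, and the environment; this is an illustration of Algorithm 3's structure, not its exact statement.

```python
import numpy as np

class BaseAlg:
    """Hypothetical interface for the optimistic base algorithm A."""
    def play(self, predicted_grad):
        """Return the next iterate x_t from the predicted surrogate gradient."""
        raise NotImplementedError
    def update(self, observed_grad):
        """Incorporate the observed surrogate gradient after the round."""
        raise NotImplementedError

def run_meta_algorithm(base_alg, predict, observe, T, V=1.0, lam=1.0):
    """Sketch of the meta-algorithm's control flow with Phi(q) = exp(lam*q) - 1.

    predict(t) -> (grad_f_hat, grad_gplus_hat): predicted gradients for round t.
    observe(t, x) -> (grad_f, g_val, grad_gplus): revealed after playing x.
    """
    Q = 0.0  # running cumulative constraint violation Q_{t-1}
    for t in range(1, T + 1):
        weight = lam * np.exp(lam * Q)          # lambda_t = Phi'(Q_{t-1})
        gf_hat, ggp_hat = predict(t)
        x = base_alg.play(V * gf_hat + weight * ggp_hat)  # optimistic play
        gf, g_val, ggp = observe(t, x)          # nature reveals f_t and g_t
        base_alg.update(V * gf + weight * ggp)  # surrogate gradient of L_t at x_t
        Q += max(g_val, 0.0)                    # Q_t = Q_{t-1} + g_t^+(x_t)
    return Q
```

We then have the following lemma, which relates the regret on the true losses, the CCV, and the regret of $\mathcal{A}$ on the surrogate sequence.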
Lemma 11 (Regret decomposition).
For any algorithm $\mathcal{A}$, if $\Phi$ is a Lyapunov potential function, we have that for any $T$, and any $x^\star \in \mathcal{X}^\star$,

$$V\,\mathrm{Regret}_T(x^\star) + \Phi(Q_T) - \Phi(Q_0) \le \mathcal{R}_T^{\mathcal{A}}(L) + \sum_{t=1}^{T} \big(\Phi'(Q_t) - \Phi'(Q_{t-1})\big)\, g_t^+(x_t), \qquad (15)$$

where $Q_t$ is defined in (12), and $\mathcal{R}_T^{\mathcal{A}}(L)$ is the regret of the algorithm $\mathcal{A}$ running on the surrogate sequence $L_1, \dots, L_T$.
Proof. By convexity of $\Phi$:

$$\Phi(Q_t) - \Phi(Q_{t-1}) \le \Phi'(Q_t)\big(Q_t - Q_{t-1}\big) = \Phi'(Q_t)\, g_t^+(x_t) = \lambda_t\, g_t^+(x_t) + \big(\Phi'(Q_t) - \Phi'(Q_{t-1})\big)\, g_t^+(x_t).$$

Let $x^\star \in \mathcal{X}^\star$; then by definition $g_t^+(x^\star) = 0$, thus $L_t(x^\star) = V f_t(x^\star)$.

Summing from $t = 1$ to $T$:

$$V \sum_{t=1}^T f_t(x_t) + \Phi(Q_T) - \Phi(Q_0) \le \sum_{t=1}^T L_t(x_t) + \sum_{t=1}^T \big(\Phi'(Q_t) - \Phi'(Q_{t-1})\big) g_t^+(x_t) \le V \sum_{t=1}^T f_t(x^\star) + \mathcal{R}_T^{\mathcal{A}}(L) + \sum_{t=1}^T \big(\Phi'(Q_t) - \Phi'(Q_{t-1})\big) g_t^+(x_t),$$

where $\mathcal{R}_T^{\mathcal{A}}(L) = \sum_{t=1}^T \big(L_t(x_t) - L_t(x^\star)\big)$. Rearranging yields (15).
Below we will make the assumption that the underlying optimistic OCO algorithm has standard regret guarantees, which we will express in terms of a functional $\mathcal{B}$ that takes as input a sequence of functions and returns a constant. For simplicity, we will denote $\mathcal{B}_T(h) = \mathcal{B}(h_1, \dots, h_T)$. One example is $\mathcal{B}_T(h) = \sqrt{1 + \sum_{t=1}^T \|\nabla h_t(x_t) - \nabla \hat h_t(x_t)\|_*^2}$ (as in Theorem 8); another is $\mathcal{B}_T(h) = \sqrt{\log(dT)\big(1 + \sum_{t=1}^T \|h_t - \hat h_t\|_\infty^2\big)}$ (as in Theorem 10, for linear functions over the simplex).
Assumption 12 (Regret of optimistic OCO).
The optimistic OCO algorithm $\mathcal{A}$ has the following regret guarantee: there is a constant $C$ and a sublinear functional $\mathcal{B}$ such that for any sequence of functions $h_1, \dots, h_T$, and any $x^\star \in \mathcal{X}$,

$$\sum_{t=1}^{T} \big(h_t(x_t) - h_t(x^\star)\big) \le C\, \mathcal{B}_T(h). \qquad (16)$$

(Sublinearity means that $\mathcal{B}_T(\alpha h + \beta h') \le \alpha\, \mathcal{B}_T(h) + \beta\, \mathcal{B}_T(h')$ for any $\alpha, \beta \ge 0$.)
Theorem 13 (Optimistic OCO regret and CCV guarantees).
Consider the following assumptions:

1. $L_t$ and $\hat L_t$ satisfy the assumptions of algorithm $\mathcal{A}$ for all $t$.
2. Assumptions 1, 2 and 3 hold.
3. $\mathcal{A}$ satisfies Assumption 12 for some constant $C$ and sublinear functional $\mathcal{B}$.
4. $\Phi(x) = \exp(\lambda x) - 1$ with $\lambda = \frac{1}{2(C\,\mathcal{B}_T(g^+) + G)}$.
5. $V = 1$.

Under these assumptions, Algorithm 3 has the following regret and CCV guarantees: $\forall T$,

$$\mathrm{Regret}_T(x^\star) = O\big(C\,\mathcal{B}_T(f)\big), \qquad (17)$$
$$\mathrm{CCV}_T = O\big(C\,\mathcal{B}_T(g^+)\,\log T\big). \qquad (18)$$
Proof. Note that $L_t$ is convex, and using (13), we obtain the following instantaneous prediction error:

$$\|\nabla L_t(x_t) - \nabla \hat L_t(x_t)\|_* \le V \|\nabla f_t(x_t) - \nabla \hat f_t(x_t)\|_* + \lambda_t \|\nabla g_t^+(x_t) - \nabla \hat g_t^+(x_t)\|_*.$$

Thus,

$$\mathcal{B}_T(L) \le \mathcal{B}_T\big(V f + \lambda\, g^+\big). \qquad (19)$$

By sub-linearity of $\mathcal{B}$:

$$\mathcal{B}_T(L) \le V\, \mathcal{B}_T(f) + \big(\max_{t \le T} \lambda_t\big)\, \mathcal{B}_T(g^+).$$

Finally, using Assumption 12, we have

$$\mathcal{R}_T^{\mathcal{A}}(L) \le C V\, \mathcal{B}_T(f) + C\, \Phi'(Q_T)\, \mathcal{B}_T(g^+),$$

using the fact that $(Q_t)_t$ is non-decreasing and $\Phi'$ is a non-decreasing function, so that $\max_{t} \lambda_t \le \Phi'(Q_T)$. With those two facts, and knowing that $g^+$ is non-negative and upper bounded by $G$, we can also upper bound the remainder term of Lemma 11:

$$\sum_{t=1}^T \big(\Phi'(Q_t) - \Phi'(Q_{t-1})\big)\, g_t^+(x_t) \le G\, \Phi'(Q_T).$$

We can now upper bound the regret: for any $x^\star \in \mathcal{X}^\star$, Lemma 11 gives

$$V\, \mathrm{Regret}_T(x^\star) + \Phi(Q_T) - \Phi(Q_0) \le C V\, \mathcal{B}_T(f) + \big(C\, \mathcal{B}_T(g^+) + G\big)\, \Phi'(Q_T).$$

Thus

$$V\, \mathrm{Regret}_T(x^\star) + \Phi(Q_T) \le \Phi(Q_0) + C V\, \mathcal{B}_T(f) + \big(C\, \mathcal{B}_T(g^+) + G\big)\, \Phi'(Q_T). \qquad (20)$$

Therefore, if we use $\mathrm{Regret}_T(x^\star) \ge -2FT$,

$$\Phi(Q_T) \le 2FVT + \Phi(Q_0) + C V\, \mathcal{B}_T(f) + \big(C\, \mathcal{B}_T(g^+) + G\big)\, \Phi'(Q_T).$$

Note that the final inequality comes from using the boundedness of the losses (Assumption 2). Also note that $\Phi(Q_0) = 0$, thus:

$$\Phi(Q_T) - \big(C\, \mathcal{B}_T(g^+) + G\big)\, \Phi'(Q_T) \le 2FVT + C V\, \mathcal{B}_T(f). \qquad (21)$$

If $\Phi(x) = \exp(\lambda x) - 1$, then $\Phi'(x) = \lambda\big(\Phi(x) + 1\big)$, and the choice $\lambda = \frac{1}{2(C\, \mathcal{B}_T(g^+) + G)}$ gives

$$\tfrac{1}{2} e^{\lambda Q_T} \le 1 + 2FVT + C V\, \mathcal{B}_T(f),$$

and

$$Q_T \le \frac{1}{\lambda} \log\Big(2\big(1 + 2FVT + C V\, \mathcal{B}_T(f)\big)\Big) = O\big(C\, \mathcal{B}_T(g^+)\, \log T\big).$$

With this $\lambda$ and $V = 1$, we have, from (20), $\mathrm{Regret}_T(x^\star) = O\big(C\, \mathcal{B}_T(f)\big)$, which completes the proof.
Remark 14.
For unknown $\mathcal{B}_T(g^+)$, we can use the doubling trick, using the previous prediction errors to estimate $\mathcal{B}_T(g^+)$ on the fly; a sketch is given below.
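The following snippet sketches the standard doubling trick under our naming; it is not the paper's exact restart schedule.

```python
def doubling_estimate(error_so_far, guess=1.0):
    """Doubling trick: maintain a guess for the unknown final error and
    double it whenever the running error exceeds it; the caller restarts
    the algorithm with parameters retuned to the new guess."""
    while error_so_far > guess:
        guess *= 2.0
    return guess
```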
Finally, we have the following corollaries, which are direct consequences of applying Algorithm 3, depending on the setting and the properties of the base algorithm $\mathcal{A}$.
Corollary 15 (Optimistic Adagrad COCO).
Consider the following assumptions:
1. $\mathcal{A}$ is optimistic AdaGrad, described in Algorithm 1, with $\psi = \frac{1}{2}\|\cdot\|_2^2$.
2. Assumption 5 holds.
3. $\Phi$ and $V$ are set as in Theorem 13.

Under these assumptions, the meta-algorithm (Algorithm 3) has the following regret and constraint violation guarantees:

$$\mathrm{Regret}_T = O\Big(\sqrt{1 + E_T(f)}\Big), \qquad \mathrm{CCV}_T = O\Big(\sqrt{1 + E_T(g)}\,\log T\Big). \qquad (22)$$
Proof. This is a direct consequence of Theorem 13. We only need to show that the surrogate sequence satisfies the assumptions of $\mathcal{A}$:

1. There is some constant such that $L_t$ is Lipschitz.
2. $\hat L_t$ is Lipschitz.
3. $\hat L_t$ is smooth.

The first two statements are straightforward from the definition of $L_t$ and sublinearity of the norm. Note that $g_t^+$ is Lipschitz for the same reason that $g_t$ is. Finally, the smoothness assumption of the corollary (Assumption 5) directly implies the third statement.
Corollary 16 (Fixed constraint bounds).
If $g_t = g$ is known for all $t$, then by using $\hat g_t = g$ in Algorithm 3 we have

$$\mathrm{Regret}_T = O\Big(\sqrt{1 + E_T(f)}\Big), \qquad \mathrm{CCV}_T = O(\log T). \qquad (23)$$

Proof.
The proof is a direct consequence of Theorem 13 with $E_T(g) = 0$.
Remark 17.
By applying the algorithm without loss predictions, i.e., using $\hat f_t = f_{t-1}$, we obtain $O(\sqrt{T})$ regret and $O(\log T)$ constraint violation, improving on the $O(\sqrt{T}\log T)$ constraint violation obtained when applying Sinha and Vaze (2024)'s algorithm.
Corollary 18 (Application to the experts setting).
When using Algorithm 2 in the meta-algorithm, we have

$$\mathrm{Regret}_T = O\bigg(\sqrt{\log(dT)\Big(1 + \sum_{t=1}^T \|\ell_t - \hat\ell_t\|_\infty^2\Big)}\bigg), \qquad \mathrm{CCV}_T = O\bigg(\sqrt{\log(dT)\Big(1 + \sum_{t=1}^T \|g_t - \hat g_t\|_\infty^2\Big)}\,\log T\bigg). \qquad (24)$$
References
- Anderson et al. (2022) Daron Anderson, George Iosifidis, and Douglas J Leith. Lazy Lagrangians with predictions for online learning. arXiv preprint arXiv:2201.02890, 2022.
- Chiang et al. (2012) Chao-Kai Chiang, Tianbao Yang, Chia-Jung Lee, Mehrdad Mahdavi, Chi-Jen Lu, Rong Jin, and Shenghuo Zhu. Online optimization with gradual variations. In Conference on Learning Theory, pages 6–1. JMLR Workshop and Conference Proceedings, 2012.
- D’Orazio and Huang (2021) Ryan D’Orazio and Ruitong Huang. Optimistic and adaptive Lagrangian hedging. arXiv preprint arXiv:2101.09603, 2021.
- Duchi et al. (2011) John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of machine learning research, 12(7), 2011.
- Guo et al. (2022) Hengquan Guo, Xin Liu, Honghao Wei, and Lei Ying. Online convex optimization with hard constraints: towards the best of two worlds and beyond. Advances in Neural Information Processing Systems, 35:36426–36439, 2022.
- Hazan (2023) Elad Hazan. Introduction to online convex optimization, 2023. URL https://arxiv.org/abs/1909.05207.
- Jenatton et al. (2016) Rodolphe Jenatton, Jim Huang, and Cedric Archambeau. Adaptive algorithms for online convex optimization with long-term constraints. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 402–411, New York, New York, USA, 20–22 Jun 2016. PMLR. URL https://proceedings.mlr.press/v48/jenatton16.html.
- Liu et al. (2022) Qingsong Liu, Wenfei Wu, Longbo Huang, and Zhixuan Fang. Simultaneously achieving sublinear regret and constraint violations for online convex optimization with time-varying constraints. ACM SIGMETRICS Performance Evaluation Review, 49(3):4–5, 2022.
- Mahdavi et al. (2012) Mehrdad Mahdavi, Rong Jin, and Tianbao Yang. Trading regret for efficiency: online convex optimization with long term constraints. The Journal of Machine Learning Research, 13(1):2503–2528, 2012.
- Mannor et al. (2009) Shie Mannor, John N Tsitsiklis, and Jia Yuan Yu. Online learning with sample path constraints. Journal of Machine Learning Research, 10(3), 2009.
- Neely and Yu (2017) Michael J. Neely and Hao Yu. Online convex optimization with time-varying constraints, 2017. URL https://arxiv.org/abs/1702.04783.
- Orabona (2019) Francesco Orabona. A modern introduction to online learning. arXiv preprint arXiv:1912.13213, 2019.
- Qiu et al. (2023) Shuang Qiu, Xiaohan Wei, and Mladen Kolar. Gradient-variation bound for online convex optimization with constraints. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 9534–9542, 2023.
- Rakhlin and Sridharan (2013a) Alexander Rakhlin and Karthik Sridharan. Online learning with predictable sequences. In Conference on Learning Theory, pages 993–1019. PMLR, 2013a.
- Rakhlin and Sridharan (2013b) Alexander Rakhlin and Karthik Sridharan. Optimization, learning, and games with predictable sequences, 2013b. URL https://arxiv.org/abs/1311.1869.
- Sinha and Vaze (2024) Abhishek Sinha and Rahul Vaze. Optimal algorithms for online convex optimization with adversarial constraints, 2024. URL https://arxiv.org/abs/2310.18955.
- Wei et al. (2020) Xiaohan Wei, Hao Yu, and Michael J Neely. Online primal-dual mirror descent under stochastic constraints. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 4(2):1–36, 2020.
- Yi et al. (2023) Xinlei Yi, Xiuxian Li, Tao Yang, Lihua Xie, Yiguang Hong, Tianyou Chai, and Karl H Johansson. Distributed online convex optimization with adversarial constraints: reduced cumulative constraint violation bounds under Slater’s condition. arXiv preprint arXiv:2306.00149, 2023.
- Yu and Neely (2020) Hao Yu and Michael J Neely. A low complexity algorithm with $O(\sqrt{T})$ regret and $O(1)$ constraint violations for online convex optimization with long term constraints. Journal of Machine Learning Research, 21(1):1–24, 2020.
- Zinkevich (2003) Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th international conference on machine learning (icml-03), pages 928–936, 2003.
Appendix A Proof of Theorem 7
The theorem is a direct consequence of the following lemma.
Lemma 19.
One step of optimistic online mirror descent satisfies:

$$\langle x_t - x^\star, \nabla f_t(x_t)\rangle \le \frac{D_\psi(x^\star, \tilde x_{t-1}) - D_\psi(x^\star, \tilde x_t)}{\eta_t} + \|\nabla f_t(x_t) - \nabla \hat f_t(x_t)\|_*\,\|x_t - \tilde x_t\| - \frac{\|x_t - \tilde x_t\|^2 + \|x_t - \tilde x_{t-1}\|^2}{2\eta_t}. \qquad (25)$$

Moreover, if $\hat f_t$ is $\hat L_f$-smooth with $\eta_t \le \frac{1}{2\hat L_f}$,

$$\langle x_t - x^\star, \nabla f_t(x_t)\rangle \le \frac{D_\psi(x^\star, \tilde x_{t-1}) - D_\psi(x^\star, \tilde x_t)}{\eta_t} + \|\nabla f_t(x_t) - \nabla \hat f_t(\tilde x_{t-1})\|_*\,\|x_t - \tilde x_t\|. \qquad (26)$$
We will need the following proposition to prove the lemma.
Proposition 20 (Chiang et al. (2012), Proposition 18).
For any $u \in \mathcal{X}$, if $x^+ = \arg\min_{x \in \mathcal{X}} \langle g, x \rangle + D_\psi(x, c)$, then

$$\langle x^+ - u, g \rangle \le D_\psi(u, c) - D_\psi(u, x^+) - D_\psi(x^+, c). \qquad (27)$$
Proof [of Lemma 19] Let
On the one hand, using Proposition 20, the left and right terms can be upper bounded, respectively:
Therefore
On the other hand,
By combining the last two inequalities, we obtain (25).
To prove (26), first note that the middle term of (25) can be controlled using the prediction. For the second part of the statement, if $\hat f_t$ is $\hat L_f$-smooth, the gradient-prediction error at $x_t$ can be related to the error at $\tilde x_{t-1}$. By inserting this in (25), if $\eta_t \le \frac{1}{2\hat L_f}$, the resulting third term is non-positive since $(\eta_t)_t$ is non-increasing, and we can upper bound it by zero.
Appendix B Proof of Theorem 8
Proof. Note that $1/\eta_t$ is non-decreasing. Therefore,
We can apply Theorem 7:
(29) |
Note that we can rewrite:
(30) |
We can also show that
(31) |
Using (30) and (31) in the regret upper bound (29):
Appendix C Proof of Theorem 10
The proof is similar to that of Theorem 7. We will need a lemma from Wei et al. (2020) that relates the mixed distribution $\tilde p$ to the original distribution $p$.
Lemma 21 (Wei et al. (2020), Lemma 31).
Let $p \in \Delta_d$ and let $\tilde p = (1 - \alpha)\, p + \frac{\alpha}{d}\mathbf{1}$ for some $\alpha \in (0, 1)$; the lemma bounds the cost of replacing $p$ by its mixture $\tilde p$.

We must then prove a lemma very similar to Lemma 19.
Lemma 22.
One step of optimistic online mirror descent satisfies:

$$\langle p_t - p^\star, \ell_t \rangle \le \frac{\mathrm{KL}(p^\star \| \tilde p_{t-1}) - \mathrm{KL}(p^\star \| \tilde p_t)}{\eta_t} + \|\ell_t - \hat\ell_t\|_\infty\,\|p_t - \tilde p_t\|_1 - \frac{\|p_t - \tilde p_t\|_1^2 + \|p_t - \tilde p_{t-1}\|_1^2}{2\eta_t}. \qquad (32)$$
Proof. The proof is exactly the same as that of Lemma 19. First, we can retrieve an inequality similar to (25), where the Bregman divergences are replaced by KL divergences with respect to the mixed iterates:

$$\eta_t \langle p_t - p^\star, \ell_t \rangle \le \mathrm{KL}(p^\star \| \tilde p_{t-1}) - \mathrm{KL}(p^\star \| \tilde p_t) + \eta_t \|\ell_t - \hat\ell_t\|_\infty \|p_t - \tilde p_t\|_1 - \tfrac{1}{2}\big(\|p_t - \tilde p_t\|_1^2 + \|p_t - \tilde p_{t-1}\|_1^2\big). \qquad (33)$$
Then we can upper bound the inner-product term using Hölder's inequality. We can also use Lemma 21 above to upper bound the KL divergence, and Pinsker's inequality to relate the KL divergence to the squared L1 distance. By combining these results in (33) and dividing both sides by $\eta_t$, we obtain (32).
Proof [of Theorem 10]. We first divide both sides of (32) by $\eta_t$ and sum over $t$:
We can upper bound the first term of the sum by a telescoping argument, using Lemma 21. The second term can be upper bounded by the cumulative prediction error. The last term can be upper bounded by a constant. By combining all of these terms, and setting $\eta_t = \eta$, we obtain (9).
To prove (11), we use the same reasoning as in the proof of Theorem 8, where we let $\eta_t$ be the adaptive learning rate defined in (10).