Decentralized Online Convex Optimization in Networked Systems
Abstract
We study the problem of networked online convex optimization, where each agent individually decides on an action at every time step and agents cooperatively seek to minimize the total global cost over a finite horizon. The global cost is made up of three types of local costs: convex node costs, temporal interaction costs, and spatial interaction costs. In deciding on its individual action at each time, an agent has access to predictions of local cost functions for the next $k$ time steps in an $r$-hop neighborhood. Our work proposes a novel online algorithm, Localized Predictive Control (LPC), which generalizes predictive control to multi-agent systems. We show that LPC achieves a competitive ratio of $1 + \tilde{O}(\rho_T^k) + \tilde{O}(\rho_S^r)$ in an adversarial setting, where $\rho_T$ and $\rho_S$ are constants in $(0, 1)$ that increase with the relative strength of temporal and spatial interaction costs, respectively. This is the first competitive ratio bound on decentralized predictive control for networked online convex optimization. Further, we show that the dependence on $k$ and $r$ in our results is near optimal by lower bounding the competitive ratio of any decentralized online algorithm.
1 Introduction
A wide variety of multi-agent systems can be modeled as optimization tasks in which individual agents must select actions based on local information with the goal of cooperatively learning to minimize a global objective in an uncertain, time-varying environment. This general setting emerges in applications such as formation control (Chen & Wang, 2005; Oh et al., 2015), power systems control (Molzahn et al., 2017; Shi et al., 2021), and multiproduct price optimization (Caro & Gallien, 2012; Candogan et al., 2012). In all these cases, it is key that agents use only local information, due to the computational burden created by the size of the systems, the information constraints in the systems, and the need for fast and/or interpretable decisions.
At this point, there is a mature literature focused on decentralized optimization, e.g. Bertsekas & Tsitsiklis (1989); Boyd et al. (2011); Shi et al. (2015); Nedić et al. (2018), see Xin et al. (2020) for a survey; however, the design of learning policies for uncertain, time-varying environments requires decentralized online optimization. The literature studying decentralized online optimization is still nascent (see the related work section for a discussion of recent papers, e.g. Li et al. (2021b); Yuan et al. (2021); Yi et al. (2020)) and many challenging open questions remain.
Three issues of particular importance for real-world applications are the following.
First, temporal coupling in actions is often of first-order importance to applications. For example, startup costs, ramping costs, and switching costs are prominent in settings such as power systems and cloud computing, and lead to penalties for changing actions dramatically over time. The design of online algorithms to address such temporal interaction costs has received significant attention in the single-agent case recently, e.g., smoothed online optimization (Goel et al., 2019; Lin et al., 2020), convex body chasing (Argue et al., 2020b; Sellke, 2020), online optimization with memory (Agarwal et al., 2019; Shi et al., 2020), and dynamic pricing (Besbes & Lobel, 2015; Chen & Farias, 2018).
Second, spatial interaction costs are of broad importance in practical applications. Such costs arise because of the need for actions of nearby agents to be aligned with one another, and are prominent in settings such as economic team theory (Marschak, 1955; Marschak & Radner, 1972), combinatorial optimization over graphs (Hochba, 1997; Gamarnik & Goldberg, 2010), and statistical inference (Wainwright & Jordan, 2008). An example is (dynamic) multiproduct pricing, where the price of a product can impact the demand of other related products (Song & Chintagunta, 2006).
Third, leveraging predictions of future costs has long been recognized as a promising way to improve the performance of online agents (Morari & Lee, 1999; Lin et al., 2012; Badiei et al., 2015; Chen et al., 2016; Shi et al., 2019; Li et al., 2020). As learning tools become more prominent, the role of predictions is growing. By collecting data from repeated trials, data-driven learning tools make it possible to provide accurate predictions for near future costs. For example, in multiproduct pricing, good demand forecasts can be constructed up to a certain time horizon and are invaluable in setting prices (Caro & Gallien, 2012).
In addition to the three issues above, we would like to highlight that existing results for decentralized online optimization focus on designing algorithms with low (static) regret (Hosseini et al., 2016; Li et al., 2021b), i.e., algorithms that (nearly) match the performance of the best static action in hindsight. In a time-varying environment, it is desirable to instead obtain stronger bounds, such as those on the dynamic regret or competitive ratio, which compare to the dynamic optimal actions instead of the best static action in hindsight, e.g., see results in the centralized setting such as Lin et al. (2020); Li et al. (2020); Shi et al. (2020).
This paper aims to address decentralized online optimization with the three features described above. In particular, we are motivated by the open question: Can a decentralized algorithm make use of predictions to be competitive for networked online convex optimization in an adversarial environment when spatial and temporal costs are considered?
Contributions. This paper provides the first competitive algorithm for decentralized learning in networked online convex optimization. Agents in a network must each make a decision at each time step, to minimize a global cost which is the sum of convex node costs, spatial interaction costs, and temporal interaction costs. We propose a predictive control framework called Localized Predictive Control (LPC, Algorithm 1) and prove that it achieves a competitive ratio of $1 + \tilde{O}(\rho_T^k) + \tilde{O}(\rho_S^r)$, which approaches 1 exponentially fast as the prediction horizon $k$ and communication radius $r$ increase simultaneously. Our results quantify the improvement in competitive ratio from increasing the communication radius $r$ (which also increases the computational requirements) versus increasing the prediction horizon $k$, and imply that, as a function of problem parameters, one of the two “resources” $k$ and $r$ emerges as the bottleneck to algorithmic performance. Given any target competitive ratio, we find the minimum required prediction horizon $k$ and communication radius $r$ as functions of the temporal interaction strength and the spatial interaction strength, respectively.
Further, we show that LPC is order-optimal in terms of $k$ and $r$ by proving a lower bound on the competitive ratio for any online algorithm. We formalize the near optimality of our algorithm by showing that a resource augmentation bound follows from our upper and lower bounds: our algorithm with given $k$ and $r$ performs at least as well as the best possible algorithm that is forced to work with a prediction horizon and communication radius that are a constant factor smaller than $k$ and $r$, respectively.
The algorithm we propose, LPC, is inspired by Model Predictive Control (MPC). After fixing the prediction horizon $k$ and the communication radius $r$, each agent makes an individual decision by solving a $k$-time-step optimization problem on a local neighborhood centered at itself with radius $r$. In doing so, the algorithm utilizes all available information and makes a “greedy” decision. One benefit of this algorithm is its simplicity and interpretability, which is often important for practical applications. Moreover, since the algorithm is local, the computation needed for each agent is independent of the network size.
Our main results are enabled by a new analysis methodology which obtains two separate decay factors for the propagation of decision errors (a temporal decay factor $\rho_T$ and a spatial decay factor $\rho_S$) through a novel perturbation analysis. Specifically, the perturbation analysis seeks to answer the following question: If we perturb the boundary condition of an agent $v$'s $r$-hop neighborhood at a time step $k$ steps after the present, how does that affect $v$'s current decision, as a function of the spatial distance $r$ and the temporal distance $k$? With our analysis, we are able to bound the impact on $v$'s current decision by $O(\rho_T^k \cdot \rho_S^r)$, where the decay factors $\rho_T$ and $\rho_S$ increase with the strength of the temporal and spatial interactions among individual decisions, respectively. This novel analysis is critical for deriving a competitive ratio that distinguishes the decay rate for temporal and spatial distances.
To illustrate the use of our results in a concrete application, Appendix B provides a detailed discussion of dynamic multiproduct pricing, which is a central problem in revenue management. The resulting revenue maximization problem fits into our theoretical framework, and we deduce from our results that LPC guarantees near optimal revenue, in addition to reducing the computational burden (Schlosser, 2016) and providing interpretable prices (Biggs et al., 2021) for products.
Related Work. This paper contributes to the literature in three related areas, each of which we describe below.
Distributed Online Convex Optimization. Our work relates to a growing literature on distributed online convex optimization with time-varying cost functions over multi-agent networks. Many recent advances have been made including distributed OCO with delayed feedback (Cao & Basar, 2021), coordinating behaviors among agents (Li et al., 2021a; Cao & Başar, 2021), and distributed OCO with a time-varying communication graph (Hosseini et al., 2016; Akbari et al., 2017; Yuan et al., 2021; Li et al., 2021b; Yi et al., 2020). A common theme of the previous literature is the idea that agents can only access partial information of time-varying global loss functions, thus requiring local information exchange between neighboring agents. To the best of our knowledge, our paper is the first in this literature to provide competitive ratio bounds or consider spatial and temporal costs, e.g., switching costs.
Online Convex Optimization (OCO) with Switching Costs. Online convex optimization with switching costs was first introduced in Lin et al. (2012) to model dynamic power management in data centers. Different forms of cost functions have been studied since then, e.g., Chen et al. (2018); Shi et al. (2020); Lin et al. (2020), in order to fit a variety of applications from video streaming (Joseph & de Veciana, 2012) to energy markets (Kim & Giannakis, 2017). The quadratic form of switching cost was first proposed in Goel & Wierman (2019) and yields connections to optimal control, which were further explored in Lin et al. (2021). The literature has focused entirely on the centralized, single-agent setting. Our paper contributes to this literature by providing the first analysis of switching costs in a networked setting with a decentralized algorithm.
Perturbation Analysis of Online Algorithms. Sensitivity analysis of convex optimization problems, which studies the properties of the optimal solutions as a function of the problem parameters, has a long and rich history (see Fiacco & Ishizuka (1990) for a survey). The works that are most related to ours consider the specific class of problems where the decision variables are located on a time axis, or consider a general network and aim to show the impact of a perturbation on a decision variable is exponentially decaying in the graph distance from that variable, e.g., Shin et al. (2021); Shin & Zavala (2021); Lin et al. (2021). The idea of using exponentially decaying perturbation bounds to analyze an online algorithm was first proposed in Lin et al. (2021), where only the temporal dimension is considered. This style of perturbation analysis is key to the proof of our competitive bounds, and we provide new perturbation results that separate the impact of spatial and temporal costs in a network for the first time. Additionally, our analysis is enabled by new results on the decay rate of a product of exponential decay matrices, which may be of independent interest.
Notation. A complete notation table can be found in Appendix A. Here we describe the most commonly used notation. In a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, we use $d_{\mathcal{G}}(u, v)$ to denote the distance (i.e., the length of the shortest path) between two vertices $u$ and $v$. $N_v^r$ denotes the $r$-hop neighborhood of vertex $v$, i.e., $N_v^r = \{u \in \mathcal{V} \mid d_{\mathcal{G}}(u, v) \le r\}$. $\partial N_v^r$ denotes the boundary of $N_v^r$, i.e., $\partial N_v^r = N_v^r \setminus N_v^{r-1}$, the set of vertices at distance exactly $r$ from $v$. We generalize these notations to temporal-spatial graphs, in which a copy of $\mathcal{V}$ is associated with each time step and $\times$ denotes the Cartesian product of sets. For any subset of vertices $S \subseteq \mathcal{V}$, we use $\mathcal{E}(S)$ to denote the set of all edges that have both endpoints in $S$, and define $S^+$ as $S$ together with its 1-hop neighbors. Let $\Delta$ denote the maximum degree of any vertex in $\mathcal{G}$. We say a function is in $C^2$ if it is twice continuously differentiable. We use $\|\cdot\|$ to denote the (Euclidean) 2-norm for vectors and the induced 2-norm for matrices.
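Since the $r$-hop neighborhood $N_v^r$ and its boundary $\partial N_v^r$ appear throughout the paper, the following short Python sketch makes the definitions concrete via breadth-first search; the adjacency-list representation and the path-graph example are our own illustrative choices.

```python
from collections import deque

def r_hop(adj, v, r):
    """Return (N_v^r, boundary of N_v^r) for vertex v via BFS,
    where the boundary is the set of vertices at distance exactly r."""
    dist = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        if dist[u] == r:      # do not expand past radius r
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return set(dist), {u for u, d in dist.items() if d == r}

# Path graph 0 - 1 - 2 - 3:
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(r_hop(adj, 0, 2))  # ({0, 1, 2}, {2})
```

Note that, per the convention above, the boundary $\partial N_v^r$ consists of the vertices at distance exactly $r$; the interior $N_v^{r-1}$ is the rest of the neighborhood.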
2 Problem Setting
We consider a set of agents in a networked system where each agent individually decides on an action at each time step and the agents cooperatively seek to minimize a global cost over a finite time horizon $H$. Specifically, we consider a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ of agents. Each vertex $v \in \mathcal{V}$ denotes an individual agent, and two agents $u$ and $v$ interact with each other if and only if they are connected by an undirected edge $(u, v) \in \mathcal{E}$. At each time step $t \in \{1, \dots, H\}$, each agent $v$ picks a $d$-dimensional local action $x_t^v \in \mathcal{X}$, where $d$ is a positive integer and $\mathcal{X} \subseteq \mathbb{R}^d$ is a convex set of feasible actions. The global action at time $t$ is the vector of agent actions $x_t = (x_t^v)_{v \in \mathcal{V}}$ and incurs a global stage cost, which is the sum of three types of local cost functions:
• Node costs: Each agent $v$ incurs a time-varying node cost $f_t^v(x_t^v)$, which characterizes agent $v$'s local preference for its local action $x_t^v$.
• Temporal interaction costs: Each agent $v$ incurs a time-varying temporal interaction cost $c_t^v(x_t^v, x_{t-1}^v)$, which characterizes how agent $v$'s previous local action interacts with its current local action.
• Spatial interaction costs: Each edge $e = (u, v) \in \mathcal{E}$ incurs a time-varying spatial interaction cost $s_t^e(x_t^u, x_t^v)$. (Since $e$ is an undirected edge, the order in which we write the two inputs, the action of $u$ and the action of $v$, does not matter; note, however, that the cost can still affect agents $u$ and $v$ asymmetrically.) This characterizes how agents $u$ and $v$'s current local actions affect each other.
In our model, the node cost is the part of the cost that depends only on the agent's current local action. If the other two types of costs are zero functions, each agent will trivially pick the minimizer of its node cost. Temporal interaction costs encourage agents to choose a local action that is “compatible” with their previous local action. For example, a temporal interaction could be a switching cost which penalizes large deviations from the previous action, in order to make the trajectory of local actions “smooth”. Such switching costs can be found in work on single-agent online convex optimization, e.g., Chen et al. (2018); Goel et al. (2019); Lin et al. (2020). Spatial interaction costs, on the other hand, can be used to enforce some collective behavior among the agents. For example, spatial interaction can model the probability that one agent's actions affect its neighbors' actions in diffusion processes on social networks (Kempe et al., 2015), or model interactions between complement/substitute products in multiproduct pricing (Candogan et al., 2012).
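To make the cost structure concrete, the following sketch evaluates one global stage cost, the sum of node, temporal, and spatial terms, on a toy path-graph instance. The quadratic cost choices and numerical parameters are hypothetical illustrations, not part of the model.

```python
def global_stage_cost(edges, x_t, x_prev, node_cost, temporal_cost, spatial_cost):
    """Global stage cost at one time step: sum of node costs, temporal
    interaction costs, and spatial interaction costs over agents/edges."""
    cost = sum(node_cost(v, x_t[v]) for v in x_t)                        # node costs
    cost += sum(temporal_cost(v, x_t[v], x_prev[v]) for v in x_t)        # temporal costs
    cost += sum(spatial_cost(u, v, x_t[u], x_t[v]) for (u, v) in edges)  # spatial costs
    return cost

# Illustrative quadratic instance on a path of 3 agents (hypothetical values):
theta = {0: 1.0, 1: -1.0, 2: 0.5}                  # per-agent target actions
node = lambda v, x: (x - theta[v]) ** 2            # strongly convex node cost
temp = lambda v, x, xp: 0.5 * (x - xp) ** 2        # switching-style temporal cost
spat = lambda u, v, xu, xv: 0.25 * (xu - xv) ** 2  # alignment-style spatial cost
edges = [(0, 1), (1, 2)]

x_prev = {0: 0.0, 1: 0.0, 2: 0.0}
x_t = dict(theta)                                  # each agent jumps to its target
print(global_stage_cost(edges, x_t, x_prev, node, temp, spat))  # 2.6875
```

Even though each agent is at its node-cost minimizer here, the temporal terms penalize the jump from $x_{t-1}$ and the spatial terms penalize misalignment with neighbors, which is exactly the tension the model captures.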
Our analysis is based on standard smoothness and convexity assumptions on the local cost functions (see Appendix A for definitions of smoothness and strong convexity):
Assumption 2.1.
For $t \in \{1, \dots, H\}$, the local cost functions and feasible sets for all $v \in \mathcal{V}$ and $e \in \mathcal{E}$ satisfy:
• the node cost $f_t^v$ is $\mu$-strongly convex, $\ell_f$-smooth, and in $C^2$;
• the temporal interaction cost $c_t^v$ is convex, $\ell_T$-smooth, and in $C^2$;
• the spatial interaction cost $s_t^e$ is convex, $\ell_S$-smooth, and in $C^2$;
• the feasible set $\mathcal{X}$ can be written as $\{x \in \mathbb{R}^d \mid g_i(x) \le 0, 1 \le i \le q\}$, where each $g_i$ is a convex function in $C^2$.
Note that the assumptions above are common, even in the case of single-agent online convex optimization, e.g., see Li et al. (2020); Shi et al. (2020); Lin et al. (2021).
It is useful to separate the global stage cost into two parts based on whether the cost term depends only on the current global action or whether it also depends on the previous action. Specifically, the part that only depends on the current global action is the sum of all node costs and spatial interaction costs. We refer to this component as the (global) hitting cost and denote it as $h_t(x_t) = \sum_{v \in \mathcal{V}} f_t^v(x_t^v) + \sum_{(u,v) \in \mathcal{E}} s_t^{(u,v)}(x_t^u, x_t^v)$.
The rest of the global stage cost involves the current global action $x_t$ and the previous global action $x_{t-1}$. We refer to it as the (global) switching cost and denote it as $c_t(x_t, x_{t-1}) = \sum_{v \in \mathcal{V}} c_t^v(x_t^v, x_{t-1}^v)$.
Combining the global hitting and switching costs, the networked agents work cooperatively to minimize the total global stage cost over a finite horizon starting from a given initial global action $x_0$ at time step $0$: $\mathrm{cost}(ALG) = \sum_{t=1}^{H} \left( h_t(x_t) + c_t(x_t, x_{t-1}) \right)$, where $ALG$ denotes the decentralized online algorithm used by the agents. The offline optimal cost is the clairvoyant minimum cost one can incur on the same sequence of cost functions and the same initial global action $x_0$, i.e., $\mathrm{cost}(OPT) = \min_{x_1, \dots, x_H} \sum_{t=1}^{H} \left( h_t(x_t) + c_t(x_t, x_{t-1}) \right)$.
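For intuition about the offline benchmark, note that with quadratic costs the first-order optimality conditions of the clairvoyant problem are linear, so the offline optimum can be computed exactly by solving one linear system over the whole trajectory. The scalar-action quadratic instance below is an illustrative assumption, not the general setting of the paper.

```python
import numpy as np

def offline_optimal(theta, edges, lam_T, lam_S, x0):
    """Offline optimum of sum_t h_t(x_t) + c_t(x_t, x_{t-1}) for scalar
    quadratic costs: node (x - theta[t,v])^2, temporal lam_T*(x - x_prev)^2,
    spatial lam_S*(x_u - x_v)^2, with x_0 given. Solves the stationarity
    system A x = b over the stacked trajectory."""
    H, n = theta.shape
    idx = lambda t, v: t * n + v
    A = np.zeros((H * n, H * n))
    b = np.zeros(H * n)
    for t in range(H):
        for v in range(n):
            i = idx(t, v)
            A[i, i] += 2.0                   # node cost curvature
            b[i] += 2.0 * theta[t, v]
            A[i, i] += 2.0 * lam_T           # coupling to the previous action
            if t == 0:
                b[i] += 2.0 * lam_T * x0[v]  # fixed initial action
            else:
                A[i, idx(t - 1, v)] -= 2.0 * lam_T
            if t + 1 < H:                    # coupling to the next action
                A[i, i] += 2.0 * lam_T
                A[i, idx(t + 1, v)] -= 2.0 * lam_T
        for (u, v) in edges:                 # spatial couplings at time t
            for (a, c) in ((u, v), (v, u)):
                A[idx(t, a), idx(t, a)] += 2.0 * lam_S
                A[idx(t, a), idx(t, c)] -= 2.0 * lam_S
    return np.linalg.solve(A, b).reshape(H, n)

theta = np.array([[1.0, -1.0], [0.5, 0.5]])  # H = 2, n = 2, hypothetical targets
xs = offline_optimal(theta, edges=[(0, 1)], lam_T=0.5, lam_S=0.25, x0=np.zeros(2))
print(xs.shape)  # (2, 2)
```

Strong convexity of the node costs makes the system matrix positive definite, so the clairvoyant trajectory is unique; the online problem is hard precisely because this system couples all times and all agents at once.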
We measure the performance of any online algorithm ALG by the competitive ratio (CR), which is a widely-used metric in the literature of online optimization, e.g., Chen et al. (2018); Goel et al. (2019); Argue et al. (2020a).
Definition 2.2.
The competitive ratio of an online algorithm $ALG$ is the supremum of the ratio $\mathrm{cost}(ALG) / \mathrm{cost}(OPT)$ over all possible problem instances, i.e., $CR(ALG) = \sup_{\text{instances}} \mathrm{cost}(ALG) / \mathrm{cost}(OPT)$.
Finally, we define the partial hitting and switching costs over subsets of the agents. In particular, for a subset of agents $S \subseteq \mathcal{V}$, we denote the joint action over $S$ as $x_t^S = (x_t^v)_{v \in S}$ and define the partial hitting cost and partial switching cost over $S$ as
$h_t^S(x_t^S) = \sum_{v \in S} f_t^v(x_t^v) + \sum_{(u,v) \in \mathcal{E}(S)} s_t^{(u,v)}(x_t^u, x_t^v)$ and $c_t^S(x_t^S, x_{t-1}^S) = \sum_{v \in S} c_t^v(x_t^v, x_{t-1}^v)$. (1)
This notation is useful for presenting decentralized online algorithms where the optimizations are performed over the $r$-hop neighborhood of each agent.
2.1 Information Availability Model
We assume that each agent has access to local cost functions up to a prediction horizon $k$ into the future, for themselves and their neighborhood up to a communication radius $r$. In more detail, recall that $N_v^r$ denotes the $r$-hop neighborhood of agent $v$, i.e., $N_v^r = \{u \in \mathcal{V} \mid d_{\mathcal{G}}(u, v) \le r\}$. Before picking a local action at time $t$, agent $v$ can observe $k$ steps of future node costs, temporal interaction costs, and spatial interaction costs within its $r$-hop neighborhood, as well as the previous local actions in $N_v^r$.
We provide an illustration of the local cost functions known to agent $v$ at time $t$ in Figure 1. In the figure, the black circles, blue lines, and orange lines denote the node costs, temporal interaction costs, and spatial interaction costs, respectively. The known functions are marked by solid lines. Note that, in addition to the local cost functions, agent $v$ also knows the previous local actions in $N_v^r$, which are not illustrated in the figure.
To simplify notation, in cases when the prediction horizon exceeds the whole horizon length (i.e., $t + k - 1 > H$), we adopt the convention that the local cost functions at time steps after $H$ are identically zero. These extended definitions do not affect our original problem with horizon $H$.
As in many previous works studying the power of predictions in the online optimization literature, e.g., Yu et al. (2020); Lin et al. (2020); Li et al. (2020); Lin et al. (2021), we assume the $k$-step predictions of cost functions are exact and leave the case of inexact predictions for future work. This model is reasonable in the case where the predictors can be trained to be very accurate for the near future. Although we focus on exact predictions throughout this paper, we also discuss how to extend this information availability model to include inexact predictions in Appendix C.3.
3 Algorithm and Main Results
In this section we present our main results, which show that our simple and practical LPC algorithm can achieve an order-optimal competitive ratio for the networked online convex optimization problem. We first introduce LPC in Section 3.1. Then, we present the key idea that leads to our competitive ratio bound: a novel perturbation-based analysis (Section 3.2). Next, we use our perturbation analysis to derive bounds on the competitive ratio in Section 3.3. Finally, we show that the competitive ratio of LPC is order-optimal in Section 3.4. An outline that highlights the major novelties in our proofs can be found in Appendix C.
3.1 Localized Predictive Control (LPC)
The design of LPC is inspired by the classical model predictive control (MPC) framework (Garcia et al., 1989), which leverages all available information at the current time step to decide the current local action “greedily”. In our context, when an agent $v$ wants to decide its action at time $t$, the available information includes previous local actions in the $r$-hop neighborhood $N_v^r$ and $k$-step predictions of all local node costs and temporal/spatial interaction costs. The boundaries of all available information, which are formed by $\partial N_v^r$ and the time step $t + k - 1$, are illustrated in Figure 2.
The pseudocode for LPC is presented in Algorithm 1. For each agent $v$ at time step $t$, LPC fixes the actions on the boundaries of available information and then solves for the optimal actions inside the boundaries. Specifically, define $\psi_t^v(\theta, \phi)$ as the optimal solution of the problem

$\min \sum_{\tau=t}^{t+k-1} \left( h_\tau^{N_v^r}(y_\tau) + c_\tau^{N_v^r}(y_\tau, y_{\tau-1}) \right)$
s.t. $y_{t-1}^u = \theta^u$ for all $u \in N_v^r$; $y_\tau^u = \phi_\tau^u$ for all $u \in \partial N_v^r$, $t \le \tau \le t + k - 1$, (3.1)

where the partial hitting cost $h_\tau^S$ and partial switching cost $c_\tau^S$ for a subset of agents $S$ were defined in (1). Note that $\psi_t^v(\theta, \phi)$ is a matrix of actions (in $\mathcal{X}$) indexed by the agents in $N_v^r$ and the time steps $\{t, \dots, t+k-1\}$. (When the context is clear, we use the shorthand $\psi_t^v$.) Once the parameters $\theta$ and $\phi$ are fixed, the agent can leverage its knowledge of the local cost functions to solve (3.1).
LPC fixes the parameters $\theta$ to be the previous local actions in $N_v^r$, and fixes the parameters $\phi$ to be the minimizers of the local node cost functions at nodes in $\partial N_v^r$. The selection of the parameters at nodes in $\partial N_v^r$ plays a similar role to the terminal cost of classical MPC in centralized settings.
For a single-agent system, MPC-style algorithms are perhaps the most prominent approach for optimization-based control (Garcia et al., 1989) because of their simplicity and excellent performance in practice. LPC extends the ideas of MPC to a multi-agent setting in a networked system by leveraging available information in both the temporal and spatial dimensions, whereas classical MPC focuses only on the temporal dimension. This change leads to significant technical challenges in the analysis.
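The local solve at the heart of LPC can be sketched on a scalar quadratic instance: each agent builds the $k$-step problem on its $r$-hop neighborhood, pins time $t-1$ to the observed previous actions, pins the boundary agents to their node-cost minimizers, solves exactly, and keeps only its own time-$t$ action. All cost choices below are illustrative assumptions, not the paper's general setup.

```python
import numpy as np
from collections import deque

def lpc_step(v, t, k, r, theta, adj, lam_T, lam_S, x_prev):
    """One LPC decision for agent v at time t on a scalar quadratic instance
    (node cost (x - theta[s,u])^2, temporal lam_T*(x - x_prev)^2, spatial
    lam_S*(x_u - x_w)^2). Agents at graph distance exactly r are pinned to
    their node-cost minimizers theta; the k-step problem over the rest of
    the r-hop neighborhood is solved exactly via its stationarity system."""
    H = theta.shape[0]
    dist = {v: 0}                      # BFS distances from v, truncated at r
    q = deque([v])
    while q:
        u = q.popleft()
        if dist[u] == r:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    free = [u for u, d in dist.items() if d < r]   # interior decision variables
    pos = {}
    for u in free:
        for s in range(t, min(t + k, H)):
            pos[(u, s)] = len(pos)
    A = np.zeros((len(pos), len(pos)))
    b = np.zeros(len(pos))
    for (u, s), i in pos.items():
        A[i, i] += 2.0                             # node cost curvature
        b[i] += 2.0 * theta[s, u]
        A[i, i] += 2.0 * lam_T                     # coupling to time s-1
        if s == t:
            b[i] += 2.0 * lam_T * x_prev[u]        # fixed previous action
        else:
            A[i, pos[(u, s - 1)]] -= 2.0 * lam_T
        if (u, s + 1) in pos:                      # coupling to time s+1
            A[i, i] += 2.0 * lam_T
            A[i, pos[(u, s + 1)]] -= 2.0 * lam_T
        for w in adj[u]:                           # spatial couplings
            A[i, i] += 2.0 * lam_S
            if (w, s) in pos:
                A[i, pos[(w, s)]] -= 2.0 * lam_S
            else:                                  # boundary agent, pinned
                b[i] += 2.0 * lam_S * theta[s, w]
    return np.linalg.solve(A, b)[pos[(v, t)]]

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}             # path on 4 agents
theta = np.tile(np.array([1.0, -1.0, 0.5, 0.0]), (5, 1))  # H = 5
action = lpc_step(v=1, t=0, k=3, r=2, theta=theta, adj=adj,
                  lam_T=0.5, lam_S=0.25, x_prev=np.zeros(4))
```

Each agent solves its own small system like this in parallel, so the per-agent computation depends on $k$ and $|N_v^r|$ but not on the network size.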
3.2 Perturbation Analysis
The key idea underlying our analysis of LPC is that the impact of perturbations to the actions at the boundaries of an agent's available information decays quickly, in fact exponentially fast, in the distance of the boundary from the agent. This quick decay means that small errors cannot build up to hurt algorithm performance.
In this section, we formally study such perturbations by deriving several new results which generalize perturbation bounds for online convex optimization problems on networks. Our bounds capture both the effect of temporal interactions as well as spatial interactions between agent actions, which is a more challenging problem compared to previous literature which considers either temporal interactions (Lin et al., 2021) or spatial interactions (Shin et al., 2021) but not both simultaneously.
More specifically, recall from Section 3.1 that for each agent $v$ at time $t$, LPC solves an optimization problem where actions on the boundaries of available information (i.e., at time $t-1$ and on $\partial N_v^r$) are fixed. By the principle of optimality, we know that if the actions on the boundaries are selected to be identical to the offline optimal actions, the agent can decide its current action optimally by solving $\psi_t^v$. However, due to the limits on the prediction horizon and communication radius, LPC can only approximate the offline optimal actions on the boundaries (we do this by using the minimizers of the node cost functions). The key idea in our analysis of the optimality gap of LPC is to first ask: If we perturb the parameters of $\psi_t^v$, i.e., the actions on the information boundaries, how large is the resulting change in a local action in the optimal solution to (3.1)?
Ideally, we would like the above impact to decay exponentially fast in the graph distance between the affected node and the communication boundary of node $v$ (i.e., $r$ minus the graph distance between the affected node and $v$), as well as in the temporal distance between the affected time step and the perturbed time step. We formalize this goal as an exponentially decaying local perturbation bound in Definition 3.1. We then show in Theorem 3.2 and Theorem 3.3 that such bounds hold under appropriate assumptions.
Definition 3.1.
We say an exponentially decaying local perturbation bound holds if there exist non-negative constants $C$, $\rho_T < 1$, and $\rho_S < 1$ such that, for any agent $u \in N_v^{r-1}$, any time $\tau \in \{t, \dots, t+k-1\}$, and arbitrary pairs of boundary parameters, the resulting change in the optimal local action of $u$ at time $\tau$ in (3.1) is bounded by a weighted sum of the boundary perturbations, where a perturbation at temporal distance $a$ and spatial distance $b$ from $(u, \tau)$ receives weight $C \rho_T^a \rho_S^b$.
Perturbation bounds were recently found to be a promising tool for the analysis of adaptive control and online optimization models (Lin et al., 2021). The exponentially decaying local perturbation bound defined above is similar in spirit to two recent results: Lin et al. (2021) derives a similar perturbation bound for line graphs, and Shin et al. (2021) for general graphs with local perturbations. In fact, one may attempt to derive such a bound by applying these results directly; however, a major weakness of the direct approach is that it yields a single decay factor $\rho_T = \rho_S$, i.e., it cannot distinguish between spatial and temporal dependencies, and the bound deteriorates as either interaction strength increases. For instance, even if the temporal interactions are weak (i.e., $\ell_T$ is small), $\rho_T$ can still be close to $1$ if $\ell_S$ is large, leading to a large slack in the perturbation bound for small prediction horizons $k$.
We overcome this limitation by redefining the action variables. Specifically, to focus on the temporal decay effect, we regroup all local actions in $N_v^r$ at each time step into a single “large” decision variable (in Figure 1 we would group each horizontal blue plane to create a new variable). After regrouping, we have $k$ “large” decision variables located on a line graph, where the strength of the interaction between consecutive variables is upper bounded by the temporal interaction strength. On the other hand, to focus on spatial decay, we regroup each agent's local actions across all time steps into a single decision variable (in Figure 1 we would group each vertical orange line to create a new variable). After regrouping, we have “large” decision variables located on $N_v^r$, where the strength of the interaction between two neighbors is upper bounded by the spatial interaction strength. Averaging the two perturbation bounds (since we have two valid bounds, their average is also a valid bound) provides the following exponentially decaying local perturbation bound (see Appendix D.1 for details of the proof).
Theorem 3.2.
Under Assumption 2.1, the exponentially decaying local perturbation bound (Definition 3.1) holds with constants $C$, $\rho_T$, and $\rho_S$ that depend only on the cost parameters $\mu$, $\ell_f$, $\ell_T$, and $\ell_S$; explicit expressions are given in Appendix D.1.
Note that, as $\ell_T$ (respectively, $\ell_S$) tends to zero, $\rho_T$ (respectively, $\rho_S$) in Theorem 3.2 also tends to zero.
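The decay phenomenon behind these bounds can be checked numerically on a one-dimensional chain: pin a boundary variable, perturb it, and watch the induced change at the interior decision shrink geometrically with the distance to the boundary. The quadratic instance below is an illustrative assumption.

```python
import numpy as np

def center_action(m, lam, z):
    """Variables x_0, ..., x_{m-1} free, boundary x_m pinned at z.
    Minimize sum_i x_i^2 + lam * sum_i (x_i - x_{i+1})^2 by solving
    the stationarity system; return the center decision x_0."""
    A = np.zeros((m, m))
    b = np.zeros(m)
    for i in range(m):
        A[i, i] = 2.0 + 2.0 * lam * (2 if i > 0 else 1)  # node + incident edges
        if i > 0:
            A[i, i - 1] = -2.0 * lam
        if i + 1 < m:
            A[i, i + 1] = -2.0 * lam
        else:
            b[i] = 2.0 * lam * z       # coupling to the pinned boundary x_m = z
    return np.linalg.solve(A, b)[0]

# Effect on x_0 of a unit perturbation of the boundary, as the boundary
# is pushed farther away; the values shrink geometrically in m:
shifts = [abs(center_action(m, 1.0, 1.0) - center_action(m, 1.0, 0.0))
          for m in (2, 4, 6, 8)]
print(shifts)
```

The same experiment run along the time axis (instead of the chain) illustrates the temporal decay factor; the paper's analysis makes both decays quantitative and combines them.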
Next, we provide a tighter bound (through a refined analysis) for the regime where the strong convexity parameter $\mu$ is large relative to the interaction strengths. Again, it is not possible to obtain this result from previous perturbation bounds in the literature.
Theorem 3.3.
Recall that $\Delta$ denotes the maximum degree of any vertex in $\mathcal{G}$. Suppose Assumption 2.1 holds together with a condition requiring $\mu$ to be sufficiently large relative to $\ell_T$, $\ell_S$, and $\Delta$ (stated precisely in Appendix D). Then the exponentially decaying local perturbation bound (Definition 3.1) holds with decay factors $\rho_T$ and $\rho_S$ given in Appendix D.
Note that $\rho_T, \rho_S < 1$ follow from the condition on $\mu$. Also observe that both decay factors tend to zero as $\mu$ grows.
The main difference between this result and Theorem 3.2 is that, instead of dividing and redefining the action variables, we explicitly track the perturbations along spatial edges and along temporal edges in the original temporal-spatial graph. We observe that per-time-step spatial interactions are characterized by a banded matrix whose inverse exhibits exponential correlation decay, which implies the exponentially decaying local perturbation bound holds if the perturbed boundary action and the impacted local action we consider are at the same time step. However, for a multi-time-step problem, characterizing the impact on a local action at one time step due to a perturbation of a boundary action at a different time step is a difficult problem. The main technical contribution of this proof is to establish that a product of exponentially decaying matrices still satisfies exponential decay under the conditions in Theorem 3.3. In addition, we obtain a tight bound on the decay rate of the product matrix (see Lemma C.3), which may be of independent interest.
Our condition on $\mu$ characterizes a tradeoff between the allowable neighborhood boundary sizes and how large $\mu$ needs to be compared to the interaction cost parameters $\ell_T$ and $\ell_S$. At one extreme, if the boundary sizes can grow exponentially in $r$, we must impose a strong requirement on $\mu$, growing exponentially in $\Delta$. At the other extreme, if the boundary sizes grow only polynomially in $r$ (as is the case if $\mathcal{G}$ is a grid), we can impose a much weaker requirement on $\mu$, one that grows only linearly in $\Delta$ and compares favorably with the requirement in the former case.
Proofs of Theorem 3.2 and Theorem 3.3 are in Appendix D.
3.3 From Perturbations to Competitive Bounds
We now present our main result, which bounds the competitive ratio of LPC using the exponentially decaying local perturbation bounds defined in the previous section.
Before presenting the result, we first provide some intuition as to why the perturbation bounds are useful for deriving the competitive ratio bound. Specifically, bounding the competitive ratio requires bounding the gap between LPC's trajectory and the offline optimal trajectory. This gap comes from the following two sources: (i) the per-time-step error made by LPC due to its limited prediction horizon and communication radius; and (ii) the cumulative impact of all per-time-step errors made in the past. Intuitively, the local perturbation bounds we derive in Section 3.2 allow us to bound the per-step error made jointly by all agents in LPC, and the perturbation bounds from Lin et al. (2021) then help us bound the second type of cumulative error.
We present our main result in the following theorem and defer a proof outline to Appendix C.2. A formal proof can be found in Appendix E.3.
Theorem 3.4.
Suppose Assumption 2.1 holds and the exponentially decaying local perturbation bound in Definition 3.1 holds with constants $C$, $\rho_T$, and $\rho_S$. Define $h(r) = \max_{v \in \mathcal{V}} |\partial N_v^r|$, the size of the largest $r$-hop boundary in $\mathcal{G}$. If the parameters $k$ and $r$ of LPC are large enough that $\rho_T^k$ and $h(r) \rho_S^r$ fall below a threshold made precise in Appendix E.3, then the competitive ratio of LPC is upper bounded by $1 + \tilde{O}(\rho_T^k) + \tilde{O}(h(r) \rho_S^r)$.
Here the $\tilde{O}(\cdot)$ notation hides factors that depend polynomially on the problem parameters; see Appendix E.3.
Recall that $h(r)$ denotes the size of the largest $r$-hop boundary in $\mathcal{G}$. The bound in Theorem 3.4 implies that if $h(r)$ can be upper bounded by a polynomial in $r$, the competitive ratio of LPC can be upper bounded by $1 + \tilde{O}(\rho_T^k) + \tilde{O}((\rho_S')^r)$ for some constant $\rho_S' \in (\rho_S, 1)$, because the polynomial factor can be absorbed into the exponential decay. Therefore, the competitive ratio improves exponentially with respect to the prediction horizon $k$ and communication radius $r$.
Note that the assumption is not particularly restrictive: for commonly seen graphs like finite-dimensional grids, $h(r)$ is polynomial in $r$, so the absorption above works. More generally, for graphs with bounded degree $\Delta$, the decay factor $\rho_S$ obtained from either Theorem 3.2 or Theorem 3.3 can be made small enough that $h(r) \rho_S^r$ still decays geometrically in $r$. Thus we can eliminate the dependence on $h(r)$ in the competitive ratio by making additional assumptions on $\mathcal{G}$. This result is stated in Corollary 3.5, whose proof is deferred to Appendix E.4. Corollary 3.5 is a corollary of Theorem 3.4 and Theorem 3.3; we use the bound in Theorem 3.3 rather than Theorem 3.2 because Theorem 3.3 is tighter when the spatial interactions are weak.
Corollary 3.5.
Suppose Assumption 2.1 and the conditions of Theorem 3.3 hold. If $k$ and $r$ are sufficiently large, then the competitive ratio of LPC is upper bounded by $1 + \tilde{O}(\rho_T^k) + \tilde{O}(\rho_S^r)$, where $\rho_T$ and $\rho_S$ are given by Theorem 3.3. The $\tilde{O}(\cdot)$ notation hides factors that depend polynomially on the problem parameters; see Appendix E.4 for the full constants.
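Given decay factors $\rho_T$ and $\rho_S$ of this form, the smallest prediction horizon and communication radius meeting a target competitive-ratio slack can be obtained by inverting the two exponential terms, as discussed in the contributions. The constants $C_T$, $C_S$ (placeholders for the factors hidden in the $\tilde{O}(\cdot)$ notation) and the even split of the error budget in this sketch are illustrative assumptions.

```python
import math

def min_resources(rho_T, rho_S, eps, C_T=1.0, C_S=1.0):
    """Smallest integers k and r with C_T*rho_T**k + C_S*rho_S**r <= eps,
    splitting the error budget evenly between the temporal term (k) and
    the spatial term (r)."""
    k = math.ceil(math.log(2 * C_T / eps) / math.log(1 / rho_T))
    r = math.ceil(math.log(2 * C_S / eps) / math.log(1 / rho_S))
    return k, r

# Stronger temporal than spatial interactions (rho_T > rho_S) means the
# prediction horizon is the bottleneck resource here:
k, r = min_resources(rho_T=0.5, rho_S=0.3, eps=0.01)
print(k, r)  # 8 5
```

This mirrors the qualitative message of the upper bound: the required $k$ and $r$ grow only logarithmically in the inverse of the target slack, and whichever decay factor is closer to one dictates the bottleneck resource.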
3.4 A Lower Bound
We show that the competitive ratio in Theorem 3.4 is order-optimal by deriving a lower bound on the competitive ratio of any decentralized online algorithm with prediction horizon $k$ and communication radius $r$. The specific constants and a proof of Theorem 3.6 can be found in Appendix F.
Theorem 3.6.
Under conditions on the problem parameters given in Appendix F, the competitive ratio of any decentralized online algorithm with prediction horizon $k$ and communication radius $r$ is lower bounded by $1 + \Omega(\lambda_T^k) + \Omega(\lambda_S^r)$. Here, the decay factors $\lambda_T, \lambda_S \in (0, 1)$ increase with the temporal and spatial interaction strengths, respectively; their precise expressions, including a case distinction for $\lambda_S$, are given in Appendix F. The $\Omega(\cdot)$ notation hides factors that depend polynomially on the problem parameters.
While Theorem 3.6 highlights that Theorem 3.4 is order-optimal, the decay factors in the lower bound differ from their counterparts in the upper bound for LPC. To understand the magnitude of the difference, we compare the bounds on graphs with bounded degree . The decay factors are functions of the interaction strengths, which are measured by and . Our lower bound on the temporal decay factor and the corresponding upper bound differ only by a constant factor on a log scale, and the same holds for the lower and upper bounds on the spatial decay factor.
To formalize this comparison, we derive a resource augmentation bound that quantifies the additional "resources" LPC needs to outperform the optimal decentralized online algorithm (see, e.g., Roughgarden (2020), for an introduction to this flavor of bound for expressing the near-optimality of an algorithm). Here, the prediction horizon and the communication radius can be viewed as the "resources" available to a decentralized online algorithm in our setting. We ask: how large do the prediction horizon and communication radius given to LPC need to be to ensure that it beats the optimal decentralized online algorithm given a communication radius and prediction horizon ?
We formally state our result in the following corollary and provide a proof in Appendix G.
Corollary 3.7.
Under Assumption 2.1, suppose the optimal decentralized online algorithm achieves a competitive ratio of with prediction horizon and communication radius . Additionally assume that and , where the notation hides a factor that depends polynomially on . As , LPC achieves a competitive ratio at least as good as that of the optimal decentralized online algorithm when LPC uses a prediction horizon of and a communication radius of .
Finally, note that we establish Corollary 3.7 based on the local perturbation bound in Theorem 3.2 rather than Theorem 3.3 for simplicity, because the former makes no assumptions on the relationship among and . We expect that Theorem 3.3 can yield better resource augmentation bounds under stronger assumptions on and .
4 Concluding Remarks
In this work, we introduce and study a novel form of decentralized online convex optimization in a networked system, where the local actions of each agent are coupled by temporal interactions and spatial interactions. We propose a decentralized online algorithm, LPC, which leverages all available information within a prediction horizon of length and a communication radius of to achieve a competitive ratio of . Our lower bound result shows that this competitive ratio is order-optimal. Our results imply that the two types of resources, the prediction horizon and the communication radius, must be improved simultaneously in order to obtain a competitive ratio that converges to . That is, it is not enough to have either a large communication radius or a long prediction horizon; the combination of both is necessary to approach the hindsight-optimal performance.
A limitation of this work is that we have considered only exact predictions in the available information model, the LPC algorithm, and its analysis. Considering inexact predictions is an important future goal and we are optimistic that our work can be generalized in that direction using the roadmap in Appendix C.3.
Acknowledgement: We would like to thank the ICML reviewers and meta-reviewer, as well as Yisong Yue, for their valuable feedback on this work. This work is supported by NSF grants ECCS-2154171, CNS-2146814, CPS-2136197, CNS-2106403, NGSDI-2105648, and CMMI-1653477, with additional support from Amazon AWS, PIMCO, and the Resnick Sustainability Institute. Yiheng Lin was supported by the Kortschak Scholars program.
References
- Agarwal et al. (2019) Agarwal, N., Bullins, B., Hazan, E., Kakade, S., and Singh, K. Online control with adversarial disturbances. In International Conference on Machine Learning, pp. 111–119. PMLR, 2019.
- Akbari et al. (2017) Akbari, M., Gharesifard, B., and Linder, T. Distributed online convex optimization on time-varying directed graphs. IEEE Transactions on Control of Network Systems, 4(3), 2017.
- Antoniadis et al. (2020) Antoniadis, A., Coester, C., Elias, M., Polak, A., and Simon, B. Online metric algorithms with untrusted predictions. In International Conference on Machine Learning, pp. 345–355. PMLR, 2020.
- Argue et al. (2020a) Argue, C., Gupta, A., and Guruganesh, G. Dimension-free bounds for chasing convex functions. In Conference on Learning Theory, pp. 219–241. PMLR, 2020a.
- Argue et al. (2020b) Argue, C., Gupta, A., Guruganesh, G., and Tang, Z. Chasing convex bodies with linear competitive ratio. In Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1519–1524. SIAM, 2020b.
- Badiei et al. (2015) Badiei, M., Li, N., and Wierman, A. Online convex optimization with ramp constraints. In 2015 54th IEEE Conference on Decision and Control (CDC), pp. 6730–6736. IEEE, 2015.
- Bertsekas & Tsitsiklis (1989) Bertsekas, D. and Tsitsiklis, J. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall Inc., 1989.
- Besbes & Lobel (2015) Besbes, O. and Lobel, I. Intertemporal price discrimination: Structure and computation of optimal policies. Management Science, 61(1):92–110, 2015.
- Biggs et al. (2021) Biggs, M., Sun, W., and Ettl, M. Model distillation for revenue optimization: Interpretable personalized pricing. In International Conference on Machine Learning, pp. 946–956. PMLR, 2021.
- Boyd et al. (2011) Boyd, S., Parikh, N., and Chu, E. Distributed optimization and statistical learning via the alternating direction method of multipliers. Now Publishers Inc, 2011.
- Candogan et al. (2012) Candogan, O., Bimpikis, K., and Ozdaglar, A. Optimal pricing in networks with externalities. Operations Research, 60(4):883–905, 2012.
- Cao & Basar (2021) Cao, X. and Basar, T. Decentralized online convex optimization with feedback delays. IEEE Transactions on Automatic Control, 2021. ISSN 15582523. doi: 10.1109/TAC.2021.3092562.
- Cao & Başar (2021) Cao, X. and Başar, T. Decentralized online convex optimization based on signs of relative states. Automatica, 129, 2021. ISSN 00051098. doi: 10.1016/j.automatica.2021.109676.
- Caro & Gallien (2012) Caro, F. and Gallien, J. Clearance pricing optimization for a fast-fashion retailer. Operations Research, 60(6):1404–1422, 2012.
- Chen & Chen (2015) Chen, M. and Chen, Z. L. Recent developments in dynamic pricing research: Multiple products, competition, and limited demand information. Production and Operations Management, 24, 2015. ISSN 19375956. doi: 10.1111/poms.12295.
- Chen et al. (2016) Chen, N., Comden, J., Liu, Z., Gandhi, A., and Wierman, A. Using predictions in online optimization: Looking forward with an eye on the past. ACM SIGMETRICS Performance Evaluation Review, 44(1):193–206, 2016.
- Chen et al. (2018) Chen, N., Goel, G., and Wierman, A. Smoothed online convex optimization in high dimensions via online balanced descent. In Proceedings of Conference On Learning Theory (COLT), pp. 1574–1594, 2018.
- Chen & Farias (2018) Chen, Y. and Farias, V. F. Robust dynamic pricing with strategic customers. Mathematics of Operations Research, 43(4):1119–1142, 2018.
- Chen & Wang (2005) Chen, Y. Q. and Wang, Z. Formation control: A review and a new consideration. In 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS, 2005. doi: 10.1109/IROS.2005.1545539.
- Fiacco & Ishizuka (1990) Fiacco, A. V. and Ishizuka, Y. Sensitivity and stability analysis for nonlinear programming. Annals of Operations Research, 27(1):215–235, 1990.
- Forsgren et al. (2002) Forsgren, A., Gill, P. E., and Wright, M. H. Interior methods for nonlinear optimization. SIAM review, 44(4):525–597, 2002.
- Gallego & Topaloglu (2019) Gallego, G. and Topaloglu, H. Revenue management and pricing analytics. International Series in Operations Research and Management Science, 279, 2019. ISSN 22147934. doi: 10.1007/978-1-4939-9606-3.
- Gamarnik & Goldberg (2010) Gamarnik, D. and Goldberg, D. A. Randomized greedy algorithms for independent sets and matchings in regular graphs: Exact results and finite girth corrections. Combinatorics Probability and Computing, 19, 2010. ISSN 09635483. doi: 10.1017/S0963548309990186.
- Garcia et al. (1989) Garcia, C. E., Prett, D. M., and Morari, M. Model predictive control: Theory and practice—a survey. Automatica, 25(3):335–348, 1989.
- Goel & Wierman (2019) Goel, G. and Wierman, A. An online algorithm for smoothed regression and LQR control. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.
- Goel et al. (2019) Goel, G., Lin, Y., Sun, H., and Wierman, A. Beyond online balanced descent: An optimal algorithm for smoothed online optimization. In Neural Information Processing Systems (NeurIPS), 2019.
- Hochba (1997) Hochba, D. S. Approximation algorithms for NP-hard problems. ACM Sigact News, 28(2):40–52, 1997.
- Hosseini et al. (2016) Hosseini, S., Chapman, A., and Mesbahi, M. Online distributed convex optimization on dynamic networks. IEEE Transactions on Automatic Control, 61(11), 2016. ISSN 00189286. doi: 10.1109/TAC.2016.2525928.
- Joseph & de Veciana (2012) Joseph, V. and de Veciana, G. Jointly optimizing multi-user rate adaptation for video transport over wireless systems: Mean-fairness-variability tradeoffs. In Proceedings of the IEEE INFOCOM, pp. 567–575, 2012.
- Kempe et al. (2015) Kempe, D., Kleinberg, J., and Tardos, É. Maximizing the spread of influence through a social network. Theory of Computing, 11, 2015. ISSN 15572862. doi: 10.4086/toc.2015.v011a004.
- Kim & Giannakis (2017) Kim, S. and Giannakis, G. B. An online convex optimization approach to real-time energy pricing for demand response. IEEE Transactions on Smart Grid, 8(6):2784–2793, 2017. ISSN 1949-3053. doi: 10.1109/TSG.2016.2539948.
- Li et al. (2021a) Li, X., Yi, X., and Xie, L. Distributed online convex optimization with an aggregative variable. IEEE Transactions on Control of Network Systems, 2021a. ISSN 2325-5870. doi: 10.1109/tcns.2021.3107480.
- Li et al. (2021b) Li, X., Yi, X., and Xie, L. Distributed online optimization for multi-agent networks with coupled inequality constraints. IEEE Transactions on Automatic Control, 66(8), 2021b. ISSN 15582523. doi: 10.1109/TAC.2020.3021011.
- Li et al. (2020) Li, Y., Qu, G., and Li, N. Online optimization with predictions and switching costs: Fast algorithms and the fundamental limit. IEEE Transactions on Automatic Control, 2020.
- Lin et al. (2012) Lin, M., Liu, Z., Wierman, A., and Andrew, L. L. Online algorithms for geographical load balancing. In Proceedings of the International Green Computing Conference (IGCC), pp. 1–10, 2012.
- Lin et al. (2020) Lin, Y., Goel, G., and Wierman, A. Online optimization with predictions and non-convex losses. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 4(1):1–32, 2020.
- Lin et al. (2021) Lin, Y., Hu, Y., Sun, H., Shi, G., Qu, G., and Wierman, A. Perturbation-based regret analysis of predictive control in linear time varying systems. arXiv preprint arXiv:2106.10497, 2021.
- Marschak (1955) Marschak, J. Elements for a theory of teams. Management Science, 1, 1955. ISSN 0025-1909. doi: 10.1287/mnsc.1.2.127.
- Marschak & Radner (1972) Marschak, J. and Radner, R. Economic Theory of Teams. Cowles Foundation for Research in Economics at Yale University, 1972.
- Molzahn et al. (2017) Molzahn, D. K., Dörfler, F., Sandberg, H., Low, S. H., Chakrabarti, S., Baldick, R., and Lavaei, J. A survey of distributed optimization and control algorithms for electric power systems. IEEE Transactions on Smart Grid, 8(6):2941–2962, 2017.
- Morari & Lee (1999) Morari, M. and Lee, J. H. Model predictive control: past, present and future. Computers & Chemical Engineering, 23(4-5):667–682, 1999.
- Nedić et al. (2018) Nedić, A., Olshevsky, A., and Rabbat, M. G. Network topology and communication-computation tradeoffs in decentralized optimization. Proceedings of the IEEE, 106(5):953–976, 2018.
- Oh et al. (2015) Oh, K. K., Park, M. C., and Ahn, H. S. A survey of multi-agent formation control. Automatica, 53, 2015. ISSN 00051098. doi: 10.1016/j.automatica.2014.10.022.
- Roughgarden (2020) Roughgarden, T. Resource augmentation. CoRR, abs/2007.13234, 2020. URL https://arxiv.org/abs/2007.13234.
- Schlosser (2016) Schlosser, R. Stochastic dynamic multi-product pricing with dynamic advertising and adoption effects. Journal of Revenue and Pricing Management, 15(2):153–169, 2016.
- Sellke (2020) Sellke, M. Chasing convex bodies optimally. In Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1509–1518. SIAM, 2020.
- Shi et al. (2020) Shi, G., Lin, Y., Chung, S.-J., Yue, Y., and Wierman, A. Online optimization with memory and competitive control. Advances in Neural Information Processing Systems, 33:20636–20647, 2020.
- Shi et al. (2019) Shi, M., Lin, X., and Jiao, L. On the value of look-ahead in competitive online convex optimization. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 3(2):22, 2019.
- Shi et al. (2015) Shi, W., Ling, Q., Wu, G., and Yin, W. Extra: An exact first-order algorithm for decentralized consensus optimization. SIAM Journal on Optimization, 25(2):944–966, 2015.
- Shi et al. (2021) Shi, Y., Qu, G., Low, S., Anandkumar, A., and Wierman, A. Stability constrained reinforcement learning for real-time voltage control. arXiv preprint arXiv:2109.14854, 2021.
- Shin & Zavala (2021) Shin, S. and Zavala, V. M. Controllability and observability imply exponential decay of sensitivity in dynamic optimization. IFAC-PapersOnLine, 54(6):179–184, 2021.
- Shin et al. (2020) Shin, S., Zavala, V. M., and Anitescu, M. Decentralized schemes with overlap for solving graph-structured optimization problems. IEEE Transactions on Control of Network Systems, 7(3):1225–1236, Sep 2020. ISSN 2372-2533. doi: 10.1109/tcns.2020.2967805. URL http://dx.doi.org/10.1109/TCNS.2020.2967805.
- Shin et al. (2021) Shin, S., Anitescu, M., and Zavala, V. M. Exponential decay of sensitivity in graph-structured nonlinear programs. arXiv preprint arXiv:2101.03067, 2021.
- Song & Chintagunta (2006) Song, I. and Chintagunta, P. K. Measuring cross-category price effects with aggregate store data. Management Science, 52(10):1594–1609, 2006.
- Stanica (2001) Stanica, P. Good lower and upper bounds on binomial coefficients. Journal of Inequalities in Pure and Applied Mathematics, 2(3):30, 2001.
- Talluri & Ryzin (2006) Talluri, K. and Ryzin, G. V. The Theory and Practice of Revenue Management. International Series in Operations Research and Management Science, 68. Springer, 2006.
- Tretter (2008) Tretter, C. Spectral Theory of Block Operator Matrices and Applications, 2008. doi: 10.1142/P493.
- Wainwright & Jordan (2008) Wainwright, M. J. and Jordan, M. I. Graphical models, exponential families, and variational inference. Now Publishers Inc, 2008.
- Xin et al. (2020) Xin, R., Pu, S., Nedić, A., and Khan, U. A. A general framework for decentralized optimization with first-order methods. Proceedings of the IEEE, 108(11):1869–1889, 2020.
- Yi et al. (2020) Yi, X., Li, X., Xie, L., and Johansson, K. H. Distributed online convex optimization with time-varying coupled inequality constraints. IEEE Transactions on Signal Processing, 68, 2020. ISSN 19410476. doi: 10.1109/TSP.2020.2964200.
- Yu et al. (2020) Yu, C., Shi, G., Chung, S.-J., Yue, Y., and Wierman, A. The power of predictions in online control. Advances in Neural Information Processing Systems, 33, 2020.
- Yuan et al. (2021) Yuan, D., Proutiere, A., and Shi, G. Distributed online optimization with long-term constraints. IEEE Transactions on Automatic Control, 2021. ISSN 15582523. doi: 10.1109/TAC.2021.3057601.
Appendix A Notation Summary and Definitions
The notation we use throughout the paper is summarized in the following two tables.
Notation | Meaning
---|---
 | The network of agents connected by undirected edges;
 | The graph distance (i.e. the length of the shortest path) between two vertices and in ;
 | The -hop neighborhood of vertex/agent in , i.e., ;
 | The boundary of the -hop neighborhood of vertex/agent , i.e., ;
 | The size of the largest -hop boundary in , i.e., ;
 | The maximum degree of any vertex in ;
 | The set of all edges that have both endpoints in , where ;
 | The extension of by 1-hop, i.e., ;
 | , which is a set of (time, vertex) pairs;
 | , which is a set of (time, vertex) pairs;
Notation | Meaning
---|---
 | The (Euclidean) 2-norm for vectors and the induced 2-norm for matrices;
 | The whole horizon length of the Networked OCO problem;
 | The sequence of integers ;
 | For any positive integer , denotes the set of all symmetric real matrices;
 | The sequence , for ;
 | The individual action of agent at time step . It is a vector in ;
 | The joint action of all agents in set at time , i.e., ;
 | The joint action of all agents in at time , i.e., . It is a shorthand of ;
 | The offline optimal joint action of all agents at time ;
 | The clairvoyant joint decision of all agents at time given that the joint decision is at time ;
 | The node cost function for agent at time step ;
 | The temporal interaction cost function for agent at time step ;
 | The spatial interaction cost for edge at time step ;
 | The strong convexity constant of node costs ;
 | The smoothness constant of node costs, temporal interaction costs, and spatial interaction costs;
 | The feasible set of for agent at time . It is a convex subset of ;
 | The minimizer of the node cost function for at time subject to , i.e., ;
 | The total node costs and spatial interaction costs over a subset at time , i.e., ;
 | The total temporal interaction costs over a subset at time , i.e., ;
 | The total node costs and spatial interaction costs over at time . A shorthand of ;
 | The total temporal interaction costs over at time . A shorthand of ;
 | The optimal individual decisions in when the decision boundaries formed by and are fixed as parameters;
 | The optimal global trajectory when and are fixed as parameters;
 | The optimal global trajectory when is fixed as the parameter;
In addition to the notation in the tables above, we make use of the concepts of strong convexity and smoothness throughout this paper.
Definition A.1.
For a fixed dimension , let be a convex set, and suppose function is a differentiable function. Then, is called -smooth for some constant if
and is called -strongly convex for some constant if
Here denotes the dot product of vectors.
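As a quick numerical illustration of Definition A.1 (the quadratic below is arbitrary, chosen only for the demonstration), both inequalities can be checked for a quadratic function whose strong-convexity and smoothness constants are the extreme eigenvalues of its Hessian:

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary positive-definite quadratic f(x) = 0.5 * x^T A x.
B = rng.standard_normal((4, 4))
A = B @ B.T + 0.5 * np.eye(4)

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

eigs = np.linalg.eigvalsh(A)
mu, ell = eigs[0], eigs[-1]  # strong-convexity and smoothness constants

# Check both inequalities of Definition A.1 at random point pairs:
# mu/2 * ||y - x||^2  <=  f(y) - f(x) - <grad f(x), y - x>  <=  ell/2 * ||y - x||^2
for _ in range(100):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    gap = f(y) - f(x) - grad(x) @ (y - x)
    half_sq = 0.5 * np.dot(y - x, y - x)
    assert mu * half_sq - 1e-9 <= gap <= ell * half_sq + 1e-9
```

For a quadratic the two bounds are tight at the extreme eigenvectors of the Hessian, which is why the eigenvalue spread (the condition number) governs the perturbation analysis later in the appendix.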
Appendix B Example: Multiproduct Pricing
The networked online convex optimization problem captures many applications where individual agents must make decisions online in a networked system. In this section, we give a concrete motivating example from multiproduct pricing. Many recent books on revenue management (Talluri & Ryzin, 2006; Gallego & Topaloglu, 2019) and papers (Song & Chintagunta, 2006; Caro & Gallien, 2012; Candogan et al., 2012; Chen & Chen, 2015) describe numerous instances of this problem. In our example below, we follow a pricing model similar to that of Candogan et al. (2012).
Consider a setting where a large company sells different products and wants to maximize its revenue by adjusting prices adaptively in a time-varying market. Each vertex/agent corresponds to a product and denotes its price at time . Two products and are connected by an edge if they interact, e.g., because the products are complements or substitutes.
We assume a classical linear demand model (Talluri & Ryzin, 2006; Gallego & Topaloglu, 2019), where the demand of at time , denoted as , is given by
with parameters and . Here, Part 1 corresponds to the nominal demand at price ; Part 2 adds the network externalities from 's complements/substitutes; and Part 3 reflects the pent-up demand of product due to a high price at time . Note that the coefficient can differ from . To simplify the notation, for each undirected edge , we define an aggregate coefficient .
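For concreteness, here is a minimal sketch of such a linear demand model on a hypothetical three-product line graph; all coefficient values below (nominal demand, own elasticity, cross-elasticity, pent-up term) are illustrative stand-ins, not values from the paper:

```python
import numpy as np

# Hypothetical 3-product line graph 0 - 1 - 2; every coefficient below
# is an illustrative stand-in, not a value from the paper.
edges = [(0, 1), (1, 2)]
a, b, d = 10.0, 2.0, 0.5     # nominal demand, own elasticity, pent-up demand
c = {e: 0.3 for e in edges}  # aggregate cross-elasticity per edge

def demand(p_now, p_prev):
    """Linear demand: nominal term + network externalities + pent-up demand."""
    dem = a - b * p_now + d * (p_prev - p_now)
    for (i, j), c_ij in c.items():
        dem[i] += c_ij * p_now[j]   # externality from neighbor j on i
        dem[j] += c_ij * p_now[i]   # and symmetrically from i on j
    return dem

print(demand(np.array([1.0, 2.0, 1.0]), np.ones(3)))
```

Raising one product's price lowers its own demand but (for complements/substitutes with positive cross-elasticity) raises its neighbors' demand, which is exactly the spatial coupling the framework captures.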
The full revenue maximization problem can be written as
(3) | ||||
s.t. |
which is equivalent to the following:
(4) | ||||
s.t. |
We assume that the product’s own price-elasticity coefficient is uniformly larger than the sum of the magnitudes of the cross-elasticity coefficients, i.e., there exist such that
holds for any . Further, we assume , , and .
We now observe that this problem fits within our framework with node, spatial, and temporal costs which are quadratic and defined as follows.
Note that is plus a constant , and the interaction functions can be rewritten as
Summing, we see that
(5) |
Hence the optimal solution of (4) is the same as the following problem:
(6) | ||||
s.t. |
where the node cost function is nonnegative, -strongly convex, and -smooth; the spatial interaction function is nonnegative, convex, and -smooth; and the temporal interaction function is nonnegative, convex, and -smooth.
The decentralized nature of our policy is important in this setting. Interpretable pricing algorithms (Biggs et al., 2021) are attractive in practice. Our local pricing algorithm is indeed interpretable, since the current price of a given product is transparently determined by reliable predictions of demand in the near future as well as interactions with directly related products.
In addition, exactly solving the global multiproduct pricing problem can be computationally challenging in practice, especially when the network is large. For example, large online e-commerce companies maintain millions of products, which makes the entire network difficult to store, let alone do computation over. Moreover, due to the ease of changing prices, e-commerce companies often use dynamic pricing and change prices on a daily (or quicker) basis, which magnifies the computational burden.
B.1 Competitive Bound
We end our discussion of multiproduct pricing by showing how the competitive bound in Theorem 3.4 can be applied to the revenue maximization problem of product networks through the use of the following lemma.
Lemma B.1.
Suppose the competitive ratio of our general cost minimization problem is , which is a function of the prediction horizon and communication radius . Suppose and . Then the competitive ratio for the corresponding revenue maximization problem, defined as , is at least , where denotes the degree of the product network and .
Proof.
We define . Suppose
then
Rearranging the terms yields
(7) |
To find a lower bound on , we choose a pricing strategy such that where . We first check that the demand is always nonnegative under this strategy:
Moreover,
Hence this is a feasible pricing strategy.
Since for the cost minimization problem,
This allows us to complete the proof as follows
∎
Appendix C Proof Outline
In this section, we outline the major novelties in our proofs for the tighter exponentially decaying local perturbation bound in Theorem 3.3 and the main competitive ratio bound for LPC in Theorem 3.4. The full details of the proofs of these and other results are in the appendices following this one.
C.1 Refined Analysis of Perturbation Bounds
We begin by outlining the four-step structure we use to prove Theorem 3.3. Our goal is to highlight the main ideas, while deferring a detailed proof to Appendix D.2.
Step 1: Establish first-order equations
We define as the objective function in (3.1), where actions on the boundary are fixed as and the actions at time are fixed as . We denote those fixed actions as system parameter
To avoid writing the time index repeatedly, we use to denote actions at time for . The main lemma for this step is the following.
Lemma C.1.
Given , system parameter and perturbation vector , we have
where
Step 2: Exploit the structure of matrix
is a hierarchical block matrix whose first level has dimension . When the first-level indices (i.e., time indices) in are fixed, the lower-level matrices are non-zero only if the difference in the time indices is . Hence we decompose into a block-diagonal matrix and a block tri-diagonal matrix with zero matrices on the diagonal. Each diagonal block in is a graph-induced banded matrix, which captures the Hessian of at a single time step. Denote each diagonal block as for . Further, for , (similarly ) captures the temporal correlation of individual actions between consecutive time steps. Given this decomposition,
For ease of notation, we denote by . Note that is not necessarily a symmetric matrix. Nevertheless, under technical conditions on 's eigenvalues, we have the following power series expansion (Shin et al., 2020). The details are presented in Lemma C.2 in Appendix D.2.
Lemma C.2.
Under the condition , we have
(8) |
To understand the power series in (8), consider the special case where each block , and . Denote as , which is equivalent to . Then, we have , , . Intuitively, captures the “correlation over actions” after one time step. More generally, for and any two time indices , ,
where is a constant depending on and equal to zero if .
Given that is a graph-induced banded matrix, satisfies exponential-decay properties, which makes it plausible that is an exponential decay matrix with a slower rate.
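The decay phenomenon invoked here can be seen numerically: for a well-conditioned banded matrix (below, a toy tridiagonal matrix standing in for a graph-induced banded Hessian; its values are arbitrary), the entries of the inverse decay geometrically in the distance from the diagonal:

```python
import numpy as np

n = 30
# A toy well-conditioned 1-banded (tridiagonal) SPD matrix, standing in
# for a graph-induced banded Hessian block; entries are arbitrary.
A = 4.0 * np.eye(n) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
Ainv = np.linalg.inv(A)

# Entries of the inverse decay geometrically in the distance |i - j|.
mags = [abs(Ainv[0, j]) for j in range(8)]
ratios = [mags[j + 1] / mags[j] for j in range(7)]
assert all(r < 0.5 for r in ratios)  # geometric decay rate bounded below 1/2
```

The decay rate worsens (approaches 1) as the matrix becomes ill-conditioned, matching the role the condition number plays in the perturbation bounds.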
For the general case, we need to bound terms similar to . This is the goal of Step 3.
Step 3: Properties for general exponential-decay matrices
The goal of this step is to establish that a product of exponential-decay matrices still exhibits an exponential-decay property under technical conditions on the underlying graph.
Lemma C.3.
Given any graph and integers , suppose the block matrices all satisfy exponential-decay properties, i.e., there exist , and such that
Select some s.t. . If , then satisfies exponential decay properties with decay rate , i.e.,
where .
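A small numerical check of this message, using synthetic exponential-decay matrices on a line graph (the decay rate below is arbitrary): the product of two such matrices still decays with distance, though at a slower rate than either factor, just as the lemma asserts.

```python
import numpy as np

n, q = 40, 0.3
idx = np.arange(n)
# A synthetic matrix whose entries decay geometrically in |i - j|,
# a line-graph analogue of an exponential-decay block matrix.
D = q ** np.abs(idx[:, None] - idx[None, :])
P = D @ D  # product of two exponential-decay matrices

# The product still decays with distance...
assert abs(P[0, 30]) < abs(P[0, 15]) < abs(P[0, 5])
# ...but more slowly than the single-factor rate q^|i - j|.
assert P[0, 15] > D[0, 15]
```

The slowdown comes from the number of "paths" through intermediate indices, which is why the lemma's conclusion involves both a degraded rate and a combinatorial constant.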
Step 4: Establish correlation decay properties of matrix
The last step of the proof is to study the properties of . To accomplish this, we first show that, for time indices , has the following properties:
- if or is odd.
- is a summation of terms, and the number of such terms is bounded by .
We formally state and prove the above observation in Lemma D.3. We can further apply Lemma C.3 to the block matrices , which gives the following lemma.
Lemma C.4.
Recall . Select s.t. and . Given and , we have
C.2 From Perturbation to Competitive Ratio
We now show how to use the result proven in the previous subsection to prove our competitive ratio bounds in Theorem 3.4. Our starting point is the assumption that the exponentially decaying local perturbation bound in Definition 3.1 holds for some and , which is established using the proof approach outlined in Section 3.3.
As we discussed in Section 3.3, our proof contains two key parts: (i) we bound the per-time-step error of LPC (Lemma C.7); and (ii) we show that the per-time-step errors do not accumulate to be unbounded (Theorem C.8).
A key observation that enables the above analysis approach is that the aggregation of local per-time-step error made by each agent at can be viewed as a global per-time-step error in the joint global action . Following this observation, we first introduce a global perturbation bound that focuses on the global action rather than the local actions . Recall that denotes the global hitting cost (see Section 2). Define the optimization problem that solves the optimal trajectory from global state to
s.t. | (9) |
and another one that solves the optimal trajectory from global state to the end of the game
s.t. | (10) |
The following global perturbation bound can be derived from Theorem 3.1 in Lin et al. (2021):
Theorem C.5 (Global Perturbation Bound).
To make the concept of per-time-step error rigorous, we formally define it as the distance between the actual next action picked by LPC and the clairvoyant optimal next action from the previous action to the end of the game:
Definition C.6 (Per-step error magnitude).
At time step , given the previous state , the (decentralized) online algorithm picks . We define error as
Using the local perturbation bound in Definition 3.1, we show the per-time-step error of LPC decays exponentially with respect to prediction length and communication range . This result is stated formally in Lemma C.7, and the proof can be found in Appendix E.2.
Lemma C.7.
For with parameters and , satisfies
where .
Using the global perturbation bound in Theorem C.5, we show in Theorem C.8 that can be upper bounded by the sum of per-time-step errors of LPC. The proof can be found in Appendix E.1.
Theorem C.8.
Let denote the offline optimal global trajectory and denote the trajectory of . The trajectory of satisfies that
where and is defined in Theorem C.5.
To understand the bound in Theorem C.8, set all per-time-step errors to zero except at a single time step . We see that the impact of on the total squared distance is at most a constant factor times . This is because the impact of on decays exponentially as increases from to .
By substituting the per-time-step error bound in Lemma C.7 into Theorem C.8, one can bound by the offline optimal cost, which can be converted to the competitive ratio bound in Theorem 3.4.
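A scalar caricature of this accumulation argument (the contraction rate below is arbitrary, standing in for the exponential decay of a per-step error's downstream impact): each per-step error injects a deviation that then contracts geometrically, so the total squared deviation is bounded by a constant times the sum of squared per-step errors, via Young's convolution inequality.

```python
# Scalar caricature of Theorem C.8: each per-step error e_t injects a
# deviation that then contracts at an (arbitrary, illustrative) rate rho.
rho = 0.6
errors = [0.0] * 10
errors[3], errors[6] = 1.0, 0.5   # two isolated per-step errors

dev, total_sq = 0.0, 0.0
for e in errors:
    dev = rho * dev + e       # old deviation contracts, new error enters
    total_sq += dev ** 2      # accumulate total squared deviation

# Young's convolution inequality: ||rho^* conv e||_2 <= ||rho^*||_1 ||e||_2,
# giving a constant-factor bound on the total squared deviation.
bound = sum(e ** 2 for e in errors) / (1.0 - rho) ** 2
assert total_sq <= bound
```

With a single nonzero error one can see the geometric tail directly: the deviation it causes at later steps shrinks by a factor of rho per step, so its total squared contribution is a constant multiple of the error itself.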
C.3 Roadmap to Generalize the Proof to Inexact Predictions
In this section, we present a roadmap for generalizing our proof to the case where predictions of future cost functions are inexact. In the information availability model in Section 2.1, one can study inexact predictions by introducing additional disturbance parameters to the three types of cost functions and generalizing them to for all . represents the disturbances in the cost functions that are hard to predict exactly. Before the decentralized online algorithm decides each local action, it receives the generalized cost functions and noisy predictions of the true disturbance parameters within time steps and an -hop neighborhood. In the LPC algorithm (Algorithm 1), the optimization problem can then be solved with the noisy predictions of the disturbance parameters. To analyze the performance of LPC in the presence of prediction errors, one can first generalize the exponentially decaying perturbation bounds in Theorem 3.2 and Theorem 3.3 to include perturbations of the disturbance parameters, similar to what we already did in Theorem D.2. The prediction error on the disturbance parameters will result in an additional additive term in the per-step error bound in Lemma C.7. If one is willing to assume that the total sum of prediction errors is , as in Antoniadis et al. (2020), one can derive a competitive ratio for LPC by substituting the per-step error bound into Theorem C.8. It is worth noting that the resulting competitive ratio will inevitably depend on the quality of predictions, and will converge to a limit larger than (under imperfect predictions) as the prediction horizon and the communication radius increase.
Appendix D Perturbation Bounds
This section provides the full proofs of the perturbation bounds stated in Section 3.2.
D.1 Proof of Theorem 3.2
We begin with a technical lemma. Recall that for any positive integer , denotes the set of all symmetric real matrices.
Lemma D.1.
For a graph , suppose is a positive definite matrix in formed by blocks, where the -th block has dimension , i.e., . Assume that is -banded for an even positive integer ; i.e.,
Let denote the smallest eigenvalue of , and the largest eigenvalue of . Assume that . Suppose , where is positive semi-definite. Let , where . Then we have , where
Here denotes the condition number of matrix .
We can show Lemma D.1 using the same method as Lemma B.1 in Lin et al. (2021). We only need to note that, even when the sizes of the blocks are not identical, the th power of a -banded matrix is a -banded matrix for any positive integer .
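The bandwidth fact used here — the kth power of a B-banded matrix is kB-banded — is easy to verify numerically on a toy tridiagonal (1-banded) matrix with arbitrary entries:

```python
import numpy as np

def bandwidth(M, tol=1e-12):
    """Largest |i - j| such that M[i, j] is non-negligible."""
    n = M.shape[0]
    dist = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    return int(dist[np.abs(M) > tol].max())

n = 12
# A toy 1-banded (tridiagonal) matrix with arbitrary nonzero entries.
A = np.eye(n) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)

assert bandwidth(A) == 1
assert bandwidth(A @ A) == 2  # the square of a 1-banded matrix is 2-banded
assert bandwidth(np.linalg.matrix_power(A, 3)) == 3
```

In graph terms, an entry of the kth power is nonzero only if the two indices are connected by a path of length at most k, which is exactly why the band widens linearly with the power.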
With the help of Lemma D.1, we can proceed to show a local perturbation bound on a general in Theorem D.2, where can be different from the network of agents in Section 2. Compared with Theorem 3.1 in Lin et al. (2021), Theorem D.2 is more general because it considers a general network of decision variables, while Theorem 3.1 in Lin et al. (2021) only considers the special case of a line graph. Although Theorem D.2 does not consider the temporal dimension that features in the local perturbation bound defined in Definition 3.1, we will use it to show Theorem 3.2 later by redefining the variables from two perspectives.
Theorem D.2.
For a network with undirected edges, suppose that each node is associated with a decision vector (we add a hat over the decision vector to distinguish it from the local action and global action defined in Section 2, and we assume is a -dimensional real vector) and a cost function , and each edge is associated with an edge cost . Assume that is -strongly convex for all and is -smooth for all . For some subset , define
For the disturbance vectors indexed by and indexed by (we do not consider the disturbance vectors in the exponentially decaying local perturbation bounds defined in Definition 3.1, but adding into the edge costs makes Theorem D.2 more general; for each edge , is a -dimensional real vector), let denote the optimal solution of the optimization problem
Then, we have that for any vertex , the following inequality holds:
where and . Here, denotes the maximum degree of any vertex in graph . For , we define .
Proof of Theorem D.2.
Let be a vector where for and for . Let be an arbitrary real number. Define function as
To simplify the notation, we use to denote the tuple of system parameters, i.e.,
From our construction, we know that is -strongly convex in , so we use the decomposition , where
Since is the minimizer of convex function , we see that
Taking the derivative with respect to gives that
To simplify the notation, we define
where in , the block size is ; in , the block size is ; in , the block size is . Hence we can write
Recall that are block matrices with block size ; are block matrices with block size . Let denote the set of neighbors of vertex on . For , the -th block can be non-zero only if . For , the -th block can be non-zero only if and . Hence we see that
Since the switching costs are -strongly smooth, we know that the norms of
are all upper bounded by . Taking norms on both sides gives that
(11)
Note that can be decomposed as , where
Since is block tri-diagonal and satisfies , and is block diagonal and satisfies , we obtain the following using Lemma D.1:
where .
Finally, by integration, we complete the proof:
∎
Now we return to the proof of Theorem 3.2. For simplicity, we temporarily assume the individual decision points are unconstrained, i.e., . We discuss how to relax this assumption in Appendix D.3.
We first consider the case where and differ only at one entry or . If the difference is at , by viewing each subset for in the original problem as a vertex in the new graph and applying Theorem D.2, we obtain that
(12)
where and . On the other hand, by viewing each subset for in the original problem as a vertex in the new graph and applying Theorem D.2, we obtain that
(13)
where and . Combining (12) and (13) gives that
(14)
when and differ only at one entry for .
We can use the same method to show that, when and differ only at one entry for , we have
(15)
In the general case where and differ at more than one entry, we can perturb the entries of the parameters one at a time and apply the triangle inequality. The conclusion of Theorem 3.2 then follows from (14) and (15).
D.2 Proof of Theorem 3.3
The proof follows a four step structure outlined in Section C.1.
Step 1. Establish first order equations
Given any system parameter , we can define function as follows:
coincides with on every node in coincides with on every node in For , coincides with on the boundary, i.e.,
Let perturbation vector where and for .
Given , is the global minimizer of convex function , and hence we have
Taking the derivative with respect to , we establish the following set of equations:
(16)
We adopt the following short-hand notation:
- , which is a hierarchical block matrix with the first level of dimension , the second level of dimension and the third level of dimension .
- , which is also a hierarchical block matrix with the first level of dimension , the second level of dimension and the third level of dimension .
- , which is also a hierarchical block matrix with the first level of dimension , the second level of dimension and the third level of dimension .
- , which is also a hierarchical block matrix with the first level of dimension , the second level of dimension and the third level of dimension .
Using the above, we can rewrite (16) as follows:
Due to the structure of the temporal interaction cost functions, for (resp. ), the lower-level block matrix is non-zero only when the first-level index is (resp. ); due to the structure of the spatial interaction cost functions, for , the lower-level block matrix is non-zero only when the first-level index is . Hence, for , we have
(17)
where the subscripts on the right hand side denote the first level index of hierarchical block matrices , , and .
Step 2. Exploit the structure of matrix
We decompose into a block diagonal matrix and a block tri-diagonal matrix such that . We denote each diagonal block in as for . The other blocks in are zero matrices.
Each non-zero block in is a diagonal block matrix, which captures the Hessian of temporal interaction cost between consecutive time steps. Denote each block as for .
We rewrite the inverse of as follows:
Next we show the proof for Lemma C.2.
Proof of Lemma C.2.
We claim that the eigenvalues of are in for some and such that . We first establish Lemma C.2 based on this claim, and then prove the claim.
We follow the argument in the proof of Theorem 4 in Shin et al. (2020). Any eigenvalue of satisfies . Thus, the eigenvalues of lie in , which guarantees . Therefore,
We let and , and prove the above claim by utilizing the Gershgorin circle theorem for block matrices.
By Theorem 1.13.1 and Remark 1.13.2 of Tretter (2008), the following holds: Consider () where and is symmetric. Suppose is the spectrum of a matrix. Define set
where denotes a disk and is the -th smallest eigenvalue of . Then,
Next, we use the above fact to find a superset of . Every diagonal block of is . Moreover, for , , . Hence we have
The last inequality follows from Assumption 2.1. Therefore, . This implies that all eigenvalues of are in . ∎
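As a sanity check on the block Gershgorin theorem invoked above, the following sketch verifies, on a random symmetric block matrix (an illustrative stand-in, not the matrix from the proof), that every eigenvalue lies within distance R_i of an eigenvalue of some diagonal block, where R_i sums the spectral norms of the off-diagonal blocks in block row i:

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 3, 4  # block size and number of blocks
# Symmetric block matrix with small off-diagonal coupling blocks.
blocks = [[rng.standard_normal((d, d)) * (0.1 if i != j else 1.0)
           for j in range(m)] for i in range(m)]
M = np.block(blocks)
M = (M + M.T) / 2

def block(i, j):
    return M[i*d:(i+1)*d, j*d:(j+1)*d]

# Block Gershgorin: every eigenvalue of M is within
# R_i = sum_{j != i} ||M_ij||_2 of an eigenvalue of some diagonal block M_ii.
eigs = np.linalg.eigvalsh(M)
for lam in eigs:
    covered = False
    for i in range(m):
        R = sum(np.linalg.norm(block(i, j), 2) for j in range(m) if j != i)
        diag_eigs = np.linalg.eigvalsh(block(i, i))
        if np.min(np.abs(diag_eigs - lam)) <= R + 1e-9:
            covered = True
    assert covered
print("block Gershgorin disks cover the spectrum")
```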
To further simplify the notation in the power series expansion, we define . Given any time indices and , we have
(18)
where the first equality holds because is a diagonal block matrix, and the second equality is due to Lemma C.2.
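The expansion used in (18) is a Neumann series for the inverse of a sum of a well-conditioned part and a small coupling. The sketch below, with a generic diagonal D and tri-diagonal N standing in for the decomposition of Step 2, verifies the series numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
# D: well-conditioned diagonal part; N: small tri-diagonal coupling.
D = np.diag(rng.uniform(2.0, 3.0, n))
N = np.zeros((n, n))
off = rng.uniform(-0.3, 0.3, n - 1)
N[np.arange(n - 1), np.arange(1, n)] = off
N[np.arange(1, n), np.arange(n - 1)] = off
M = D + N

# Neumann series: M^{-1} = sum_k (-D^{-1} N)^k D^{-1},
# valid when the spectral radius of D^{-1} N is below 1.
Dinv = np.linalg.inv(D)
T = -Dinv @ N
assert np.max(np.abs(np.linalg.eigvals(T))) < 1
approx = np.zeros_like(M)
term = Dinv.copy()
for _ in range(200):
    approx += term      # accumulate T^k D^{-1}
    term = T @ term
assert np.allclose(approx, np.linalg.inv(M), atol=1e-10)
print("Neumann expansion matches the exact inverse")
```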
Step 3: Property for general exponential-decay matrices
This step simply requires proving Lemma C.3.
Step 4: Establish correlation decay properties of matrix
In this step, we apply the property developed for general exponential-decay matrices to and derive the perturbation bound in Theorem 3.3.
Lemma D.3.
For , time index , has the following properties:
- if or is odd.
- is a summation of terms and the number of such terms is bounded by .
Note for integers , we define if is odd.
Proof.
Since is a tri-diagonal banded matrix, for . We prove the remaining properties of by induction on . When ,
Lemma D.3 holds for the base case. Suppose Lemma D.3 holds for for . Let , then
By the induction hypothesis, is a summation of terms . Moreover, the number of such terms is bounded by . Next, we proceed case by case.
Case 1: is odd.
If is odd, then and are both odd. In this case,
which is equal to
Case 2: is even and . In this case, we have
Since is even,
Case 3: is even and .
If is even, then and are both even. We denote as . By the triangle inequality, and are in . Since ,
which sums to by Pascal’s triangle.
∎
Next we present the proof of Lemma C.4.
Proof of Lemma C.4.
By Lemma D.3, is a summation of terms and the number of such terms is bounded by .
Define . Recall is a diagonal matrix and is a graph-induced banded matrix.
where the last inequality follows from applying Lemma D.1 to .
Under the condition , we can use Lemma C.3 to obtain the following bound,
Since the number of such terms is bounded by , we have
∎
Lemma D.4.
Given , and , we have
and for , ,
where and . We let .
Proof.
Given and , since , we have
(20)
With a slight abuse of notation, we use to denote and to denote in the remainder of this proof. We can rewrite the right-hand side of (20) using the new notation as follows:
(21)
For a given and ,
where the last equality holds since spatial interaction costs only exist between neighboring nodes.
For a given , since the spatial interaction cost for each edge is smooth,
which gives
Therefore,
(22)
By Lemma C.4, satisfies the following exponential decay properties: for any ,
where we choose , and .
Moreover, (which denotes ) is the inverse of a graph-induced banded matrix and satisfies, for any ,
where the first inequality again follows from applying Lemma D.1 to .
Applying Lemma C.3 on and , we have for any , and ,
where and . Since , it is straightforward to verify that the above inequality holds when .
With the exponential decay properties of , we have
(23)
The third inequality uses , which can be proved using the following version of Stirling’s approximation, valid for all natural numbers :
Similarly, consider for . With a slight abuse of notation, in this proof we use to denote and to denote . Following the same steps as before, we have
(24)
Since temporal interactions occur at the same node across consecutive time steps, is a diagonal block matrix. Hence,
Moreover, using the exponential decay properties of , we have for ,
Therefore,
(25)
∎
Given time index , node , and perturbation vector ,
where and . We let . Under the condition , and .
Then,
Finally, let and . By integration,
which is bounded by
D.3 Adding Constraints to Perturbation Bounds
Recall that in Appendix D.1 and D.2, we showed Theorem 3.2 and Theorem 3.3 under the assumption that the individual decisions are unconstrained, in order to simplify the analysis. In this section, we present a general way to relax this assumption by incorporating logarithmic barrier functions, which also applies to Theorem C.5.
Recall that in Assumption 2.1, we assume that is convex with a non-empty interior, and can be expressed as
where the th constraint is a convex function in . For any time-vertex pair , we can approximate the individual constraints
by adding the logarithmic barrier function to the original node cost function . Here, the parameter is a positive real number that controls how well the barrier function approximates the indicator function
The approximation improves as the parameter approaches . Thus, the new node cost function becomes
As an extension of the original notation, we use to denote the optimal solution of the following optimization problem
s.t.
Compared with defined in Section 3.1, the constraints are removed and the node costs are replaced with .
A key observation is that the perturbation bounds we have shown in Appendix D.1 and D.2 do not depend on the smoothness constant of the node cost functions. This means that the perturbation bound
holds for arbitrary , where are as specified in Theorem 3.2 or Theorem 3.3 and are independent of . Theorem 3.10 in Forsgren et al. (2002) guarantees that converges to for any positive sequence that tends to zero. Thus, the above perturbation bound also holds for , which includes the constraints on individual decisions.
Note that the argument we present in this section also works for Theorem C.5.
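The barrier argument can be illustrated in one dimension: as the barrier parameter tends to zero, the unconstrained minimizers of the barrier-augmented cost converge to the constrained optimum. The cost (x - 2)^2 and constraint x <= 1 below are illustrative stand-ins for the node costs and constraints, not the actual ones:

```python
import numpy as np

def barrier_minimizer(mu, lo=-10.0, hi=1.0 - 1e-12, iters=200):
    """Bisection on the stationarity condition of
    f_mu(x) = (x - 2)**2 - mu * log(1 - x), the barrier-augmented cost
    for: minimize (x - 2)^2 subject to x <= 1."""
    g = lambda x: 2 * (x - 2) + mu / (1 - x)  # f_mu'(x), strictly increasing
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# As mu -> 0, the barrier minimizers approach the constrained
# optimum x* = 1 (the constraint x <= 1 is active there).
sols = [barrier_minimizer(mu) for mu in (1.0, 1e-2, 1e-4, 1e-6)]
assert all(s < 1 for s in sols)        # iterates stay strictly feasible
assert abs(sols[-1] - 1.0) < 1e-3      # converges to the boundary
print([round(s, 4) for s in sols])
```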
Appendix E Competitive Bounds
This appendix includes the proofs of the competitive bounds presented in Section 3.3.
E.1 Proof of Theorem C.8
We first derive an upper bound on the distance between and .
E.2 Proof of Lemma C.7
In this section, we show that Lemma C.7 holds with the following specific constants:
(28)
Note that, by the principle of optimality, we have
Recall that we define the quantity to simplify the notation.
Since the exponentially decaying local perturbation bound in Definition 3.1 holds, we see that
(29)
which implies that
(30a)
(30b)
(30c)
where we used the AM-GM inequality in (30a), the Cauchy-Schwarz inequality in (30b), and the definitions of the functions and in (30c).
We also note that by the principle of optimality, the following equations hold for all :
Recall that . By Theorem C.5, we see that
(32)
which implies
(33a)
(33b)
where we used the triangle inequality and the AM-GM inequality in (33a), and (32) in (33b).
Substituting (33) into (E.2) gives
(34)
where we used the fact that the node cost function is non-negative and -strongly convex for all , thus
Note that . Thus we have finished the proof of (E.2).
E.3 Proof of Theorem 3.4
In this section, we show Theorem 3.4 holds with the following specific constants:
(35)
under the assumption that
(36)
Recall that is defined in Theorem C.8. Note that Theorem 3.2 and Theorem C.5 hold under Assumption 2.1. One can check that and are bounded by polynomials of and .
In the proof, we need to use Lemma F.2 in Lin et al. (2021) to bound LPC’s total cost by a weighted sum of the offline optimal cost and the sum of squared distances between their trajectories. For completeness, we present Lemma F.2 in Lin et al. (2021) below:
Lemma E.1.
For a fixed dimension , assume the function is convex, -smooth, and continuously differentiable. Then, for all and all , we have
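The precise statement of Lemma E.1 is elided above. One standard inequality of this type, for a convex, ell-smooth, non-negative f, is f(y) <= (1 + eta) f(x) + (ell/2)(1 + 1/eta) ||x - y||^2 for every eta > 0; we emphasize that this exact form is our assumption, not a quotation of the lemma. The sketch below checks it numerically on a random non-negative quadratic:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
A = rng.standard_normal((d, d))
b = rng.standard_normal(d)
f = lambda x: np.sum((A @ x - b) ** 2)     # convex, smooth, non-negative
ell = 2 * np.linalg.eigvalsh(A.T @ A)[-1]  # smoothness constant of f

# Check: f(y) <= (1 + eta) f(x) + (ell/2) (1 + 1/eta) ||x - y||^2
for _ in range(1000):
    x, y = rng.standard_normal(d), rng.standard_normal(d)
    eta = rng.uniform(0.01, 10.0)
    rhs = (1 + eta) * f(x) + 0.5 * ell * (1 + 1 / eta) * np.sum((x - y) ** 2)
    assert f(y) <= rhs + 1e-9
print("smoothness comparison inequality verified")
```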
Now we come back to the proof of Theorem 3.4. We first bound the sum of squared distances between LPC’s trajectory and the offline optimal trajectory:
(37a)
(37b)
where we used Theorem C.8 in (37a); we used Lemma C.7 with the specific constants given in Appendix E.2 in (37b).
Recall that in (36), we assume and are sufficiently large so that the coefficient of the first term in (37) satisfies
Substituting this bound into (37) gives that
(38)
By Lemma E.1, since is -smooth, convex, and non-negative on , and is -smooth, convex, and non-negative on , we know that
(39)
holds for any . Summing the above inequality over gives
(40)
where we used (38) and in the last inequality. Setting in (E.3) finishes the proof of (35).
E.4 Proof of Corollary 3.5
We first show that holds under Assumption 2.1 and the assumption that . To see this, note that, as discussed in Section 3.2, by setting and , Theorem 3.4 holds with
Hence we see that
which implies that
(41)
Recall that function . Hence we see that
(42)
Appendix F Proof of Theorem 3.6
In this appendix, we prove a lower bound on the competitive ratio of any online algorithm. Our proof first establishes temporal and spatial lower bounds separately, and then combines them.
Step 1: Temporal Lower Bounds
We first show that the competitive ratio of any online algorithm with steps of future predictions is lower bounded by . To show this, we consider the special case where there are no spatial interaction costs (i.e., for all and ). In this case, since all agents are independent of each other, it suffices to assume there is only one agent in the network . Thus, we drop the agent index in the following analysis. To further simplify the problem, we assume dimension , , and that the feasible set is for all . Let denote the diameter of , i.e., .
By Theorem 2 in Li et al. (2020) and Case 1 in its proof, we know that for any online algorithm with steps of future predictions and , there exists a problem instance with quadratic functions that have the form such that
(43)
where . Note that
Substituting this into (43) gives
(44)
Note that (43) implies , hence the competitive ratio can be unbounded if .
Step 2: Spatial Lower Bounds
We next show that the competitive ratio of any online algorithm that can communicate within an -hop neighborhood according to the scheme defined in Section 2.1 is lower bounded by . To show this, we construct a special Networked OCO instance with random cost functions and show, via the probabilistic method, that there exists a realization achieving the lower bound.
Theorem F.1.
Under the assumption that , the competitive ratio of any decentralized online algorithm with communication radius is lower bounded by , where notation hides factors that depend polynomially on and , and
(45)
Proof of Theorem F.1.
In the proof, we assume the online game lasts only one time step, i.e., . Note that when , the same counterexample can be constructed repeatedly by letting the temporal interaction costs for every agent and time step . To simplify the notation, we define and . Without loss of generality, we assume , so that each agent has a positive integer index.
We consider the case where the node cost function for each agent is and the spatial interaction cost between two neighboring agents and is . Here, is the scalar action of agent , and the parameter is local information that corresponds to agent . The parameters are sampled i.i.d. from a distribution , which we will specify later.
For a general graph of agents, let denote its graph Laplacian matrix. Recall that the graph Laplacian matrix is a symmetric matrix and it is defined as
for agents . Here, denotes the degree of agent in graph . We know that is symmetric positive semi-definite and has bandwidth w.r.t. . The centralized optimization problem can be expressed as
where the minimum is attained at .
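A minimal sketch of the centralized problem: assuming a standard quadratic form with node costs (x_v - theta_v)^2 / 2 and edge costs weighted by a parameter lam (the exact costs and scalings in the construction are elided above, so this is an illustrative instance rather than the exact one), the centralized optimum solves a linear system involving the graph Laplacian:

```python
import numpy as np

rng = np.random.default_rng(4)
n, lam = 8, 0.5  # agents on a ring; lam weighs the edge costs

# Graph Laplacian of a cycle: degree 2 on the diagonal, -1 per edge.
L = 2 * np.eye(n)
for v in range(n):
    L[v, (v + 1) % n] = L[(v + 1) % n, v] = -1

theta = rng.uniform(-1, 1, n)  # hypothetical local parameters theta_v

# Global objective (1/2)||x - theta||^2 + (lam/2) x^T L x,
# minimized at x* = (I + lam L)^{-1} theta.
obj = lambda x: 0.5 * np.sum((x - theta) ** 2) + 0.5 * lam * x @ L @ x
x_star = np.linalg.solve(np.eye(n) + lam * L, theta)

# Stationarity and optimality checks.
grad = (x_star - theta) + lam * L @ x_star
assert np.allclose(grad, 0, atol=1e-10)
for _ in range(100):
    assert obj(x_star) <= obj(x_star + 0.1 * rng.standard_normal(n)) + 1e-12
print("centralized optimum solves (I + lam L) x = theta")
```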
When each agent only has communication radius , it can only observe the part of that is within . To simplify the notation, we define the mask operator w.r.t. a set as
for . The local policy of agent (denoted ) is a mapping from to the local decision .
Suppose the distribution of each local parameter is mean-zero with support on . For every agent , we see that
(46a)
(46b)
(46c)
where in (46a) we used the fact that the conditional expectation minimizes the mean squared prediction error, and in (46b) we used the requirement that the distribution of is mean-zero.
To bound the variance term in (46c), we need the following lemma, which lower bounds the magnitude of every entry in the exponentially decaying matrix :
Lemma F.2.
There exists a finite graph with maximum degree that satisfies the following conditions: For any two vertices such that , the following inequality holds:
If we make the additional assumption that , we have that
We defer the proof of Lemma F.2 to the end of this section. Note that Lemma F.2 implies that there exists a graph that satisfies , where notation hides factors that depend polynomially on and , and is as defined in (45). We assume the agents are located in this graph for the rest of the proof.
Using Lemma F.2, we can derive the following lower bound on the variance term in (46c):
(47)
Substituting (47) into (46) and summing over all vertices , we obtain that
We also see that
(48)
Note that the global objective function is -strongly convex and is its minimizer. Thus, we have that for any outcome of ,
Taking expectations on both sides w.r.t. gives that
(49)
Dividing (49) by (48), we obtain that
Note that . Thus, there must exist an instance of such that and
∎
Before presenting the proof of Lemma F.2, we first introduce two technical lemmas that will be used in that proof. The first lemma (Lemma F.3) provides a lower bound on the binomial coefficient .
Lemma F.3.
For any positive integer and such that is an integer, the following inequality holds:
Proof of Lemma F.3.
By Lemma 2.1 in (Stanica, 2001), we know for any ,
where satisfies . Thus we see that
Therefore, we can lower bound by
∎
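The exact Stirling-based bound in Lemma F.3 is elided above; a weaker elementary bound of the same flavor, C(n, k) >= (n/k)^k, can be checked exhaustively on a small range:

```python
from math import comb

# Elementary lower bound C(n, k) >= (n/k)^k, since each factor
# (n - i) / (k - i) in the product form of C(n, k) is at least n / k.
for n in range(1, 60):
    for k in range(1, n + 1):
        assert comb(n, k) >= (n / k) ** k
print("binomial lower bound holds on the tested range")
```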
The second technical lemma (Lemma F.4) will be used to simplify the decay factor in the proof of Lemma F.2.
Lemma F.4.
For all , the following inequality holds
Proof of Lemma F.4.
By taking logarithms on both sides, we see that the original inequality is equivalent to
which is further equivalent to
(50)
Note that the LHS can be lower bounded by
The function satisfies , and its derivative is
Thus, for all . Hence (50) holds for all . ∎
Now we are ready to present the proof of Lemma F.2.
Proof of Lemma F.2.

Consider the graph constructed as in Figure 3: let be a sufficiently large positive integer. blocks form a ring, where each block contains nodes, and every pair of adjacent blocks is connected by a complete bipartite graph. The graph Laplacian of can be decomposed as , where is the adjacency matrix of . We see that
Fix two vertices and , denote , and assume . Without loss of generality, we can assume is in the clockwise direction from . We see that
(51)
Note that denotes the number of paths from to of length in graph , and that the shortest paths from to have length . To pick a path of length from to , we can first pick a path at the level of blocks: the number of possible block-level paths is lower bounded by , because we can choose in which of the steps to go in the counterclockwise direction. Once a block-level path is fixed, we can choose which specific vertices in the blocks the path lands at, and there are choices. Thus we see that
Substituting this into (51) gives
(52)
Setting gives that , which shows the first claim of Lemma F.2. We now proceed to show the second claim of Lemma F.2.
By Lemma F.3, we know that when , we have that
For any , the inequality we just showed can be used to bound a term in the summation of (52) from below:
where the last line follows from Lemma F.4.
Thus, we obtain that the following inequality holds for arbitrary :
(53)
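The walk-counting argument can be verified on a small instance of the ring-of-blocks construction: the number of length-k walks between two fixed vertices factors exactly into the number of block-level walks on the ring times m^(k-1) free choices of interior vertices:

```python
import numpy as np

B, m = 6, 3  # B blocks of m vertices arranged in a ring
n = B * m

# Vertex-level adjacency: complete bipartite links between adjacent blocks.
A = np.zeros((n, n), dtype=np.int64)
for i in range(B):
    j = (i + 1) % B
    A[i*m:(i+1)*m, j*m:(j+1)*m] = 1
    A[j*m:(j+1)*m, i*m:(i+1)*m] = 1

# Block-level adjacency: a plain cycle on B nodes.
C = np.zeros((B, B), dtype=np.int64)
for i in range(B):
    C[i, (i + 1) % B] = C[(i + 1) % B, i] = 1

# (A^k)_{uv} counts length-k walks; every block-level walk of length k
# contributes m^(k-1) vertex-level walks (the k-1 interior vertices are
# chosen freely inside their blocks), so the counts factor exactly.
u, v = 0, 2 * m  # a vertex in block 0 and a vertex in block 2
for k in range(1, 7):
    Ak = np.linalg.matrix_power(A, k)
    Ck = np.linalg.matrix_power(C, k)
    assert Ak[u, v] == Ck[0, 2] * m ** (k - 1)
print("walk counts factor into block-level walks times m^(k-1)")
```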
Step 3: Combine Temporal and Spatial Lower Bounds
Combining the results of Steps 1 and 2, we know that the competitive ratio of any decentralized online algorithm is lower bounded by
Appendix G Proof of Corollary 3.7
In this appendix, we prove a resource augmentation bound for LPC. To simplify the notation, we define the shorthand and , where and are positive real numbers. We first show two lemmas about the relationships between the decay factors and , and between and .
Lemma G.1.
Under the assumptions of Theorem 3.2, we have
Proof of Lemma G.1.
Lemma G.2.
Under the assumptions of Theorem 3.2, we have .
Proof of Lemma G.2.
Case 1: . We have in this case.
We first show that the following inequality holds for any positive integer and :
(55)
To see this, define function . Note that is a convex function with and
Thus, we see that holds for all . Hence (55) holds.
By (55), we see that
Case 2: . We have in this case.
To see this, note that , and by Theorem 3.6, . Therefore, we see that
Case 3: . We have in this case.
To see this, note that
Thus we see that
∎
Now we come back to the proof of Corollary 3.7. By Theorem 3.4 and Theorem 3.6, we know that the optimal competitive ratio is lower bounded by
and LPC’s competitive ratio is upper bounded by
where and are some positive constants. To achieve , it suffices to guarantee that
Note that can be upper bounded by some constant and under our assumptions. Applying Lemma G.1 and Lemma G.2 finishes the proof.