Decentralized Online Convex Optimization in Networked Systems
Abstract
We study the problem of networked online convex optimization, where each agent individually decides on an action at every time step and agents cooperatively seek to minimize the total global cost over a finite horizon. The global cost is made up of three types of local costs: convex node costs, temporal interaction costs, and spatial interaction costs. In deciding on its individual action at each time, an agent has access to predictions of local cost functions for the next $k$ time steps in an $r$-hop neighborhood. Our work proposes a novel online algorithm, Localized Predictive Control (LPC), which generalizes predictive control to multi-agent systems. We show that LPC achieves a competitive ratio of $1 + \tilde{O}(\rho_T^k) + \tilde{O}(\rho_S^r)$ in an adversarial setting, where $\rho_T$ and $\rho_S$ are constants in $(0, 1)$ that increase with the relative strength of temporal and spatial interaction costs, respectively. This is the first competitive ratio bound on decentralized predictive control for networked online convex optimization. Further, we show that the dependence on $k$ and $r$ in our results is near optimal by lower bounding the competitive ratio of any decentralized online algorithm.
1 Introduction
A wide variety of multi-agent systems can be modeled as optimization tasks in which individual agents must select actions based on local information with the goal of cooperatively learning to minimize a global objective in an uncertain, time-varying environment. This general setting emerges in applications such as formation control (Chen & Wang, 2005; Oh et al., 2015), power systems control (Molzahn et al., 2017; Shi et al., 2021), and multiproduct price optimization (Caro & Gallien, 2012; Candogan et al., 2012). In all these cases, it is key that agents use only local information, due to the computational burden created by the size of the systems, the information constraints in the systems, and the need for fast and/or interpretable decisions.
At this point, there is a mature literature focused on decentralized optimization, e.g. Bertsekas & Tsitsiklis (1989); Boyd et al. (2011); Shi et al. (2015); Nedić et al. (2018), see Xin et al. (2020) for a survey; however, the design of learning policies for uncertain, time-varying environments requires decentralized online optimization. The literature studying decentralized online optimization is still nascent (see the related work section for a discussion of recent papers, e.g. Li et al. (2021b); Yuan et al. (2021); Yi et al. (2020)) and many challenging open questions remain.
Three issues of particular importance for real-world applications are the following.
First, temporal coupling in actions is often of first-order importance to applications. For example, startup costs, ramping costs, and switching costs are prominent in settings such as power systems and cloud computing, and lead to penalties for changing actions dramatically over time. The design of online algorithms to address such temporal interaction costs has received significant attention in the single-agent case recently, e.g., smoothed online optimization (Goel et al., 2019; Lin et al., 2020), convex body chasing (Argue et al., 2020b; Sellke, 2020), online optimization with memory (Agarwal et al., 2019; Shi et al., 2020), and dynamic pricing (Besbes & Lobel, 2015; Chen & Farias, 2018).
Second, spatial interaction costs are of broad importance in practical applications. Such costs arise because of the need for actions of nearby agents to be aligned with one another, and are prominent in settings such as economic team theory (Marschak, 1955; Marschak & Radner, 1972), combinatorial optimization over graphs (Hochba, 1997; Gamarnik & Goldberg, 2010), and statistical inference (Wainwright & Jordan, 2008). An example is (dynamic) multiproduct pricing, where the price of a product can impact the demand of other related products (Song & Chintagunta, 2006).
Third, leveraging predictions of future costs has long been recognized as a promising way to improve the performance of online agents (Morari & Lee, 1999; Lin et al., 2012; Badiei et al., 2015; Chen et al., 2016; Shi et al., 2019; Li et al., 2020). As learning tools become more prominent, the role of predictions is growing. By collecting data from repeated trials, data-driven learning tools make it possible to provide accurate predictions for near future costs. For example, in multiproduct pricing, good demand forecasts can be constructed up to a certain time horizon and are invaluable in setting prices (Caro & Gallien, 2012).
In addition to the three issues above, we would like to highlight that existing results for decentralized online optimization focus on designing algorithms with low (static) regret (Hosseini et al., 2016; Li et al., 2021b), i.e., algorithms that (nearly) match the performance of the best static action in hindsight. In a time-varying environment, it is desirable to instead obtain stronger bounds, such as those on the dynamic regret or competitive ratio, which compare to the dynamic optimal actions instead of the best static action in hindsight, e.g., see results in the centralized setting such as Lin et al. (2020); Li et al. (2020); Shi et al. (2020).
This paper aims to address decentralized online optimization with the three features described above. In particular, we are motivated by the open question: Can a decentralized algorithm make use of predictions to be competitive for networked online convex optimization in an adversarial environment when spatial and temporal costs are considered?
Contributions. This paper provides the first competitive algorithm for decentralized learning in networked online convex optimization. Agents in a network must each make a decision at each time step, to minimize a global cost which is the sum of convex node costs, spatial interaction costs, and temporal interaction costs. We propose a predictive control framework called Localized Predictive Control (LPC, Algorithm 1) and prove that it achieves a competitive ratio of $1 + \tilde{O}(\rho_T^k) + \tilde{O}(\rho_S^r)$, which approaches 1 exponentially fast as the prediction horizon $k$ and communication radius $r$ increase simultaneously. Our results quantify the improvement in competitive ratio from increasing the communication radius $r$ (which also increases the computational requirements) versus increasing the prediction horizon $k$, and imply that, as a function of problem parameters, one of the two “resources” $k$ and $r$ emerges as the bottleneck to algorithmic performance. Given any target competitive ratio, we find the minimum required prediction horizon $k$ and communication radius $r$ as functions of the temporal interaction strength and the spatial interaction strength, respectively.
Further, we show that LPC is order-optimal in terms of $k$ and $r$ by proving a lower bound on the competitive ratio for any online algorithm. We formalize the near optimality of our algorithm by showing that a resource augmentation bound follows from our upper and lower bounds: our algorithm with given $k$ and $r$ performs at least as well as the best possible algorithm that is forced to work with a prediction horizon and communication radius that are a constant factor smaller than $k$ and $r$, respectively.
The algorithm we propose, LPC, is inspired by Model Predictive Control (MPC). After fixing the prediction horizon $k$ and the communication radius $r$, each agent makes an individual decision by solving a $k$-time-step optimization problem on a local neighborhood centered at itself with radius $r$. In doing so, the algorithm utilizes all available information and makes a “greedy” decision. One benefit of this algorithm is its simplicity and interpretability, which is often important for practical applications. Moreover, since the algorithm is local, the computation needed for each agent is independent of the network size.
Our main results are enabled by a new analysis methodology which obtains two separate decay factors for the propagation of decision errors (a temporal decay factor $\rho_T$ and a spatial decay factor $\rho_S$) through a novel perturbation analysis. Specifically, the perturbation analysis seeks to answer the following question: If we perturb the boundary condition of an agent $v$'s $r$-hop neighborhood at a time step $k$ steps after the present, how does that affect $v$'s current decision, as a function of the spatial distance $r$ and the temporal distance $k$? With our analysis, we are able to bound the impact on $v$'s current decision by $O(\rho_T^k \cdot \rho_S^r)$, where the decay factors $\rho_T$ and $\rho_S$ increase with the strength of the temporal and spatial interactions among individual decisions, respectively. This novel analysis is critical for deriving a competitive ratio that distinguishes the decay rate for temporal and spatial distances.
To illustrate the use of our results in a concrete application, Appendix B provides a detailed discussion of dynamic multiproduct pricing, which is a central problem in revenue management. The resulting revenue maximization problem fits into our theoretical framework, and we deduce from our results that LPC guarantees near optimal revenue, in addition to reducing the computational burden (Schlosser, 2016) and providing interpretable prices (Biggs et al., 2021) for products.
Related Work. This paper contributes to the literature in three related areas, each of which we describe below.
Distributed Online Convex Optimization. Our work relates to a growing literature on distributed online convex optimization with time-varying cost functions over multi-agent networks. Many recent advances have been made including distributed OCO with delayed feedback (Cao & Basar, 2021), coordinating behaviors among agents (Li et al., 2021a; Cao & Başar, 2021), and distributed OCO with a time-varying communication graph (Hosseini et al., 2016; Akbari et al., 2017; Yuan et al., 2021; Li et al., 2021b; Yi et al., 2020). A common theme of the previous literature is the idea that agents can only access partial information of time-varying global loss functions, thus requiring local information exchange between neighboring agents. To the best of our knowledge, our paper is the first in this literature to provide competitive ratio bounds or consider spatial and temporal costs, e.g., switching costs.
Online Convex Optimization (OCO) with Switching Costs. Online convex optimization with switching costs was first introduced in Lin et al. (2012) to model dynamic power management in data centers. Different forms of cost functions have been studied since then, e.g., Chen et al. (2018); Shi et al. (2020); Lin et al. (2020), in order to fit a variety of applications from video streaming (Joseph & de Veciana, 2012) to energy markets (Kim & Giannakis, 2017). The quadratic form of switching cost was first proposed in Goel & Wierman (2019) and yields connections to optimal control, which were further explored in Lin et al. (2021). The literature has focused entirely on the centralized, single-agent setting. Our paper contributes to this literature by providing the first analysis of switching costs in a networked setting with a decentralized algorithm.
Perturbation Analysis of Online Algorithms. Sensitivity analysis of convex optimization problems, which studies the properties of the optimal solutions as a function of the problem parameters, has a long and rich history (see Fiacco & Ishizuka (1990) for a survey). The works that are most related to ours consider the specific class of problems where the decision variables are located on a time axis, or consider a general network and aim to show the impact of a perturbation on a decision variable is exponentially decaying in the graph distance from that variable, e.g., Shin et al. (2021); Shin & Zavala (2021); Lin et al. (2021). The idea of using exponentially decaying perturbation bounds to analyze an online algorithm was first proposed in Lin et al. (2021), where only the temporal dimension is considered. This style of perturbation analysis is key to the proof of our competitive bounds, and we provide new perturbation results that separate the impact of spatial and temporal costs in a network for the first time. Additionally, our analysis is enabled by new results on the decay rate of a product of exponential decay matrices, which may be of independent interest.
Notation. A complete notation table can be found in Appendix A. Here we describe the most commonly used notation. In a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, we use $d_{\mathcal{G}}(u, v)$ to denote the distance (i.e., the length of the shortest path) between two vertices $u$ and $v$. $N_v^r$ denotes the $r$-hop neighborhood of vertex $v$, i.e., $N_v^r = \{u \in \mathcal{V} \mid d_{\mathcal{G}}(u, v) \le r\}$. $\partial N_v^r$ denotes the boundary of $N_v^r$, i.e., $\partial N_v^r = N_v^r \setminus N_v^{r-1}$, the set of vertices at distance exactly $r$ from $v$. We generalize these notations to temporal-spatial graphs, in which a copy of $\mathcal{V}$ is associated with each time step and $\times$ denotes the Cartesian product of sets. For any subset of vertices $S \subseteq \mathcal{V}$, we use $\mathcal{E}(S)$ to denote the set of all edges that have both endpoints in $S$, and define $S^+$ as $S$ together with its 1-hop neighbors. Let $\Delta$ denote the maximum degree of any vertex in $\mathcal{G}$. We say a function is in $C^2$ if it is twice continuously differentiable. We use $\|\cdot\|$ to denote the (Euclidean) 2-norm for vectors and the induced 2-norm for matrices.
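Since the $r$-hop neighborhood $N_v^r$ and its boundary $\partial N_v^r$ appear throughout the paper, the following short Python sketch makes the definitions concrete via breadth-first search; the adjacency-list representation and the path-graph example are our own illustrative choices.

```python
from collections import deque

def r_hop(adj, v, r):
    """Return (N_v^r, boundary of N_v^r) for vertex v via BFS,
    where the boundary is the set of vertices at distance exactly r."""
    dist = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        if dist[u] == r:      # do not expand past radius r
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return set(dist), {u for u, d in dist.items() if d == r}

# Path graph 0 - 1 - 2 - 3:
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(r_hop(adj, 0, 2))  # ({0, 1, 2}, {2})
```

Note that, per the convention above, the boundary $\partial N_v^r$ consists of the vertices at distance exactly $r$; the interior $N_v^{r-1}$ is the rest of the neighborhood.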
2 Problem Setting
We consider a set of agents in a networked system where each agent individually decides on an action at each time step and the agents cooperatively seek to minimize a global cost over a finite time horizon $H$. Specifically, we consider a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ of agents. Each vertex $v \in \mathcal{V}$ denotes an individual agent, and two agents $u$ and $v$ interact with each other if and only if they are connected by an undirected edge $(u, v) \in \mathcal{E}$. At each time step $t \in \{1, \dots, H\}$, each agent $v$ picks a $d$-dimensional local action $x_t^v \in \mathcal{X}$, where $d$ is a positive integer and $\mathcal{X} \subseteq \mathbb{R}^d$ is a convex set of feasible actions. The global action at time $t$ is the vector of agent actions $x_t = (x_t^v)_{v \in \mathcal{V}}$ and incurs a global stage cost, which is the sum of three types of local cost functions:
• Node costs: Each agent $v$ incurs a time-varying node cost $f_t^v(x_t^v)$, which characterizes agent $v$'s local preference for its local action $x_t^v$.
• Temporal interaction costs: Each agent $v$ incurs a time-varying temporal interaction cost $c_t^v(x_t^v, x_{t-1}^v)$, which characterizes how agent $v$'s previous local action interacts with its current local action.
• Spatial interaction costs: Each edge $e = (u, v) \in \mathcal{E}$ incurs a time-varying spatial interaction cost $s_t^e(x_t^u, x_t^v)$. (Since $e$ is an undirected edge, the order in which we write the two inputs, the action of $u$ and the action of $v$, does not matter; note, however, that the cost can still affect agents $u$ and $v$ asymmetrically.) This characterizes how agents $u$ and $v$'s current local actions affect each other.
In our model, the node cost is the part of the cost that depends only on the agent's current local action. If the other two types of costs are zero functions, each agent will trivially pick the minimizer of its node cost. Temporal interaction costs encourage agents to choose a local action that is “compatible” with their previous local action. For example, a temporal interaction could be a switching cost which penalizes large deviations from the previous action, in order to make the trajectory of local actions “smooth”. Such switching costs can be found in work on single-agent online convex optimization, e.g., Chen et al. (2018); Goel et al. (2019); Lin et al. (2020). Spatial interaction costs, on the other hand, can be used to enforce some collective behavior among the agents. For example, spatial interaction can model the probability that one agent's actions affect its neighbors' actions in diffusion processes on social networks (Kempe et al., 2015), or model interactions between complement/substitute products in multiproduct pricing (Candogan et al., 2012).
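To make the cost structure concrete, the following sketch evaluates one global stage cost, the sum of node, temporal, and spatial terms, on a toy path-graph instance. The quadratic cost choices and numerical parameters are hypothetical illustrations, not part of the model.

```python
def global_stage_cost(edges, x_t, x_prev, node_cost, temporal_cost, spatial_cost):
    """Global stage cost at one time step: sum of node costs, temporal
    interaction costs, and spatial interaction costs over agents/edges."""
    cost = sum(node_cost(v, x_t[v]) for v in x_t)                        # node costs
    cost += sum(temporal_cost(v, x_t[v], x_prev[v]) for v in x_t)        # temporal costs
    cost += sum(spatial_cost(u, v, x_t[u], x_t[v]) for (u, v) in edges)  # spatial costs
    return cost

# Illustrative quadratic instance on a path of 3 agents (hypothetical values):
theta = {0: 1.0, 1: -1.0, 2: 0.5}                  # per-agent target actions
node = lambda v, x: (x - theta[v]) ** 2            # strongly convex node cost
temp = lambda v, x, xp: 0.5 * (x - xp) ** 2        # switching-style temporal cost
spat = lambda u, v, xu, xv: 0.25 * (xu - xv) ** 2  # alignment-style spatial cost
edges = [(0, 1), (1, 2)]

x_prev = {0: 0.0, 1: 0.0, 2: 0.0}
x_t = dict(theta)                                  # each agent jumps to its target
print(global_stage_cost(edges, x_t, x_prev, node, temp, spat))  # 2.6875
```

Even though each agent is at its node-cost minimizer here, the temporal terms penalize the jump from $x_{t-1}$ and the spatial terms penalize misalignment with neighbors, which is exactly the tension the model captures.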
Our analysis is based on standard smoothness and convexity assumptions on the local cost functions (see Appendix A for definitions of smoothness and strong convexity):
Assumption 2.1.
For $t \in \{1, \dots, H\}$, the local cost functions and feasible sets for all $v \in \mathcal{V}$ and $e \in \mathcal{E}$ satisfy:
• the node cost $f_t^v$ is $\mu$-strongly convex, $\ell_f$-smooth, and in $C^2$;
• the temporal interaction cost $c_t^v$ is convex, $\ell_T$-smooth, and in $C^2$;
• the spatial interaction cost $s_t^e$ is convex, $\ell_S$-smooth, and in $C^2$;
• the feasible set $\mathcal{X}$ can be written as $\{x \in \mathbb{R}^d \mid g_i(x) \le 0, 1 \le i \le q\}$, where each $g_i$ is a convex function in $C^2$.
Note that the assumptions above are common, even in the case of single-agent online convex optimization, e.g., see Li et al. (2020); Shi et al. (2020); Lin et al. (2021).
It is useful to separate the global stage cost into two parts based on whether the cost term depends only on the current global action or whether it also depends on the previous action. Specifically, the part that only depends on the current global action is the sum of all node costs and spatial interaction costs. We refer to this component as the (global) hitting cost and denote it as $h_t(x_t) = \sum_{v \in \mathcal{V}} f_t^v(x_t^v) + \sum_{(u,v) \in \mathcal{E}} s_t^{(u,v)}(x_t^u, x_t^v)$.
The rest of the global stage cost involves the current global action $x_t$ and the previous global action $x_{t-1}$. We refer to it as the (global) switching cost and denote it as $c_t(x_t, x_{t-1}) = \sum_{v \in \mathcal{V}} c_t^v(x_t^v, x_{t-1}^v)$.
Combining the global hitting and switching costs, the networked agents work cooperatively to minimize the total global stage cost over a finite horizon starting from a given initial global action $x_0$ at time step $0$: $\mathrm{cost}(ALG) = \sum_{t=1}^{H} \left( h_t(x_t) + c_t(x_t, x_{t-1}) \right)$, where $ALG$ denotes the decentralized online algorithm used by the agents. The offline optimal cost is the clairvoyant minimum cost one can incur on the same sequence of cost functions and the same initial global action $x_0$, i.e., $\mathrm{cost}(OPT) = \min_{x_1, \dots, x_H} \sum_{t=1}^{H} \left( h_t(x_t) + c_t(x_t, x_{t-1}) \right)$.
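For intuition about the offline benchmark, note that with quadratic costs the first-order optimality conditions of the clairvoyant problem are linear, so the offline optimum can be computed exactly by solving one linear system over the whole trajectory. The scalar-action quadratic instance below is an illustrative assumption, not the general setting of the paper.

```python
import numpy as np

def offline_optimal(theta, edges, lam_T, lam_S, x0):
    """Offline optimum of sum_t h_t(x_t) + c_t(x_t, x_{t-1}) for scalar
    quadratic costs: node (x - theta[t,v])^2, temporal lam_T*(x - x_prev)^2,
    spatial lam_S*(x_u - x_v)^2, with x_0 given. Solves the stationarity
    system A x = b over the stacked trajectory."""
    H, n = theta.shape
    idx = lambda t, v: t * n + v
    A = np.zeros((H * n, H * n))
    b = np.zeros(H * n)
    for t in range(H):
        for v in range(n):
            i = idx(t, v)
            A[i, i] += 2.0                   # node cost curvature
            b[i] += 2.0 * theta[t, v]
            A[i, i] += 2.0 * lam_T           # coupling to the previous action
            if t == 0:
                b[i] += 2.0 * lam_T * x0[v]  # fixed initial action
            else:
                A[i, idx(t - 1, v)] -= 2.0 * lam_T
            if t + 1 < H:                    # coupling to the next action
                A[i, i] += 2.0 * lam_T
                A[i, idx(t + 1, v)] -= 2.0 * lam_T
        for (u, v) in edges:                 # spatial couplings at time t
            for (a, c) in ((u, v), (v, u)):
                A[idx(t, a), idx(t, a)] += 2.0 * lam_S
                A[idx(t, a), idx(t, c)] -= 2.0 * lam_S
    return np.linalg.solve(A, b).reshape(H, n)

theta = np.array([[1.0, -1.0], [0.5, 0.5]])  # H = 2, n = 2, hypothetical targets
xs = offline_optimal(theta, edges=[(0, 1)], lam_T=0.5, lam_S=0.25, x0=np.zeros(2))
print(xs.shape)  # (2, 2)
```

Strong convexity of the node costs makes the system matrix positive definite, so the clairvoyant trajectory is unique; the online problem is hard precisely because this system couples all times and all agents at once.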
We measure the performance of any online algorithm ALG by the competitive ratio (CR), which is a widely-used metric in the literature of online optimization, e.g., Chen et al. (2018); Goel et al. (2019); Argue et al. (2020a).
Definition 2.2.
The competitive ratio of an online algorithm $ALG$ is the supremum of the ratio $\mathrm{cost}(ALG) / \mathrm{cost}(OPT)$ over all possible problem instances, i.e., $CR(ALG) = \sup_{\text{instances}} \mathrm{cost}(ALG) / \mathrm{cost}(OPT)$.
Finally, we define the partial hitting and switching costs over subsets of the agents. In particular, for a subset of agents $S \subseteq \mathcal{V}$, we denote the joint action over $S$ as $x_t^S = (x_t^v)_{v \in S}$ and define the partial hitting cost and partial switching cost over $S$ as
$h_t^S(x_t^S) = \sum_{v \in S} f_t^v(x_t^v) + \sum_{(u,v) \in \mathcal{E}(S)} s_t^{(u,v)}(x_t^u, x_t^v)$ and $c_t^S(x_t^S, x_{t-1}^S) = \sum_{v \in S} c_t^v(x_t^v, x_{t-1}^v)$. (1)
This notation is useful for presenting decentralized online algorithms where the optimizations are performed over the $r$-hop neighborhood of each agent.
2.1 Information Availability Model
We assume that each agent has access to local cost functions up to a prediction horizon $k$ into the future, for themselves and their neighborhood up to a communication radius $r$. In more detail, recall that $N_v^r$ denotes the $r$-hop neighborhood of agent $v$, i.e., $N_v^r = \{u \in \mathcal{V} \mid d_{\mathcal{G}}(u, v) \le r\}$. Before picking a local action at time $t$, agent $v$ can observe $k$ steps of future node costs, temporal interaction costs, and spatial interaction costs within its $r$-hop neighborhood, as well as the previous local actions in $N_v^r$.
We provide an illustration of the local cost functions known to agent $v$ at time $t$ in Figure 1. In the figure, the black circles, blue lines, and orange lines denote the node costs, temporal interaction costs, and spatial interaction costs, respectively. The known functions are marked by solid lines. Note that, in addition to the local cost functions, agent $v$ also knows the previous local actions in $N_v^r$, which are not illustrated in the figure.
To simplify notation, in cases when the prediction horizon exceeds the whole horizon length (i.e., $t + k - 1 > H$), we adopt the convention that the local cost functions at time steps after $H$ are identically zero. These extended definitions do not affect our original problem with horizon $H$.
As in many previous works studying the power of predictions in the online optimization literature, e.g., Yu et al. (2020); Lin et al. (2020); Li et al. (2020); Lin et al. (2021), we assume the $k$-step predictions of cost functions are exact and leave the case of inexact predictions for future work. This model is reasonable in the case where the predictors can be trained to be very accurate for the near future. Although we focus on exact predictions throughout this paper, we also discuss how to extend this information availability model to include inexact predictions in Appendix C.3.
3 Algorithm and Main Results
In this section we present our main results, which show that our simple and practical LPC algorithm can achieve an order-optimal competitive ratio for the networked online convex optimization problem. We first introduce LPC in Section 3.1. Then, we present the key idea that leads to our competitive ratio bound: a novel perturbation-based analysis (Section 3.2). Next, we use our perturbation analysis to derive bounds on the competitive ratio in Section 3.3. Finally, we show that the competitive ratio of LPC is order-optimal in Section 3.4. An outline that highlights the major novelties in our proofs can be found in Appendix C.
3.1 Localized Predictive Control (LPC)
The design of LPC is inspired by the classical model predictive control (MPC) framework (Garcia et al., 1989), which leverages all available information at the current time step to decide the current local action “greedily”. In our context, when an agent $v$ wants to decide its action at time $t$, the available information includes previous local actions in the $r$-hop neighborhood $N_v^r$ and $k$-step predictions of all local node costs and temporal/spatial interaction costs. The boundaries of all available information, which are formed by $\partial N_v^r$ and the time step $t + k - 1$, are illustrated in Figure 2.
The pseudocode for LPC is presented in Algorithm 1. For each agent $v$ at time step $t$, LPC fixes the actions on the boundaries of available information and then solves for the optimal actions inside the boundaries. Specifically, define $\psi_t^v(\theta, \phi)$ as the optimal solution of the problem

$\min \sum_{\tau=t}^{t+k-1} \left( h_\tau^{N_v^r}(y_\tau) + c_\tau^{N_v^r}(y_\tau, y_{\tau-1}) \right)$
s.t. $y_{t-1}^u = \theta^u$ for all $u \in N_v^r$; $y_\tau^u = \phi_\tau^u$ for all $u \in \partial N_v^r$, $t \le \tau \le t + k - 1$, (3.1)

where the partial hitting cost $h_\tau^S$ and partial switching cost $c_\tau^S$ for a subset of agents $S$ were defined in (1). Note that $\psi_t^v(\theta, \phi)$ is a matrix of actions (in $\mathcal{X}$) indexed by the agents in $N_v^r$ and the time steps $\{t, \dots, t+k-1\}$. (When the context is clear, we use the shorthand $\psi_t^v$.) Once the parameters $\theta$ and $\phi$ are fixed, the agent can leverage its knowledge of the local cost functions to solve (3.1).
LPC fixes the parameters $\theta$ to be the previous local actions in $N_v^r$, and fixes the parameters $\phi$ to be the minimizers of the local node cost functions at nodes in $\partial N_v^r$. The selection of the parameters at nodes in $\partial N_v^r$ plays a similar role to the terminal cost of classical MPC in centralized settings.
For a single-agent system, MPC-style algorithms are perhaps the most prominent approach for optimization-based control (Garcia et al., 1989) because of their simplicity and excellent performance in practice. LPC extends the ideas of MPC to a multi-agent setting in a networked system by leveraging available information in both the temporal and spatial dimensions, whereas classical MPC focuses only on the temporal dimension. This change leads to significant technical challenges in the analysis.
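The local solve at the heart of LPC can be sketched on a scalar quadratic instance: each agent builds the $k$-step problem on its $r$-hop neighborhood, pins time $t-1$ to the observed previous actions, pins the boundary agents to their node-cost minimizers, solves exactly, and keeps only its own time-$t$ action. All cost choices below are illustrative assumptions, not the paper's general setup.

```python
import numpy as np
from collections import deque

def lpc_step(v, t, k, r, theta, adj, lam_T, lam_S, x_prev):
    """One LPC decision for agent v at time t on a scalar quadratic instance
    (node cost (x - theta[s,u])^2, temporal lam_T*(x - x_prev)^2, spatial
    lam_S*(x_u - x_w)^2). Agents at graph distance exactly r are pinned to
    their node-cost minimizers theta; the k-step problem over the rest of
    the r-hop neighborhood is solved exactly via its stationarity system."""
    H = theta.shape[0]
    dist = {v: 0}                      # BFS distances from v, truncated at r
    q = deque([v])
    while q:
        u = q.popleft()
        if dist[u] == r:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    free = [u for u, d in dist.items() if d < r]   # interior decision variables
    pos = {}
    for u in free:
        for s in range(t, min(t + k, H)):
            pos[(u, s)] = len(pos)
    A = np.zeros((len(pos), len(pos)))
    b = np.zeros(len(pos))
    for (u, s), i in pos.items():
        A[i, i] += 2.0                             # node cost curvature
        b[i] += 2.0 * theta[s, u]
        A[i, i] += 2.0 * lam_T                     # coupling to time s-1
        if s == t:
            b[i] += 2.0 * lam_T * x_prev[u]        # fixed previous action
        else:
            A[i, pos[(u, s - 1)]] -= 2.0 * lam_T
        if (u, s + 1) in pos:                      # coupling to time s+1
            A[i, i] += 2.0 * lam_T
            A[i, pos[(u, s + 1)]] -= 2.0 * lam_T
        for w in adj[u]:                           # spatial couplings
            A[i, i] += 2.0 * lam_S
            if (w, s) in pos:
                A[i, pos[(w, s)]] -= 2.0 * lam_S
            else:                                  # boundary agent, pinned
                b[i] += 2.0 * lam_S * theta[s, w]
    return np.linalg.solve(A, b)[pos[(v, t)]]

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}             # path on 4 agents
theta = np.tile(np.array([1.0, -1.0, 0.5, 0.0]), (5, 1))  # H = 5
action = lpc_step(v=1, t=0, k=3, r=2, theta=theta, adj=adj,
                  lam_T=0.5, lam_S=0.25, x_prev=np.zeros(4))
```

Each agent solves its own small system like this in parallel, so the per-agent computation depends on $k$ and $|N_v^r|$ but not on the network size.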
3.2 Perturbation Analysis
The key idea underlying our analysis of LPC is that the impact of perturbations to the actions at the boundaries of an agent's available information decays quickly, in fact exponentially fast, in the distance of the boundary from the agent. This quick decay means that small errors cannot build up to hurt algorithm performance.
In this section, we formally study such perturbations by deriving several new results which generalize perturbation bounds for online convex optimization problems on networks. Our bounds capture both the effect of temporal interactions as well as spatial interactions between agent actions, which is a more challenging problem compared to previous literature which considers either temporal interactions (Lin et al., 2021) or spatial interactions (Shin et al., 2021) but not both simultaneously.
More specifically, recall from Section 3.1 that for each agent $v$ at time $t$, LPC solves an optimization problem where actions on the boundaries of available information (i.e., at time $t-1$ and on $\partial N_v^r$) are fixed. By the principle of optimality, we know that if the actions on the boundaries are selected to be identical to the offline optimal actions, the agent can decide its current action optimally by solving $\psi_t^v$. However, due to the limits on the prediction horizon and communication radius, LPC can only approximate the offline optimal actions on the boundaries (we do this by using the minimizers of the node cost functions). The key idea in our analysis of the optimality gap of LPC is to first ask: If we perturb the parameters of $\psi_t^v$, i.e., the actions on the information boundaries, how large is the resulting change in a local action in the optimal solution to (3.1)?
Ideally, we would like the above impact to decay exponentially fast in the graph distance between the affected node and the communication boundary of node $v$ (i.e., $r$ minus the graph distance between the affected node and $v$), as well as in the temporal distance between the affected time step and the perturbed time step. We formalize this goal as an exponentially decaying local perturbation bound in Definition 3.1. We then show in Theorem 3.2 and Theorem 3.3 that such bounds hold under appropriate assumptions.
Definition 3.1.
We say an exponentially decaying local perturbation bound holds if there exist non-negative constants $C$, $\rho_T < 1$, and $\rho_S < 1$ such that, for any agent $u \in N_v^{r-1}$, any time $\tau \in \{t, \dots, t+k-1\}$, and arbitrary pairs of boundary parameters, the resulting change in the optimal local action of $u$ at time $\tau$ in (3.1) is bounded by a weighted sum of the boundary perturbations, where a perturbation at temporal distance $a$ and spatial distance $b$ from $(u, \tau)$ receives weight $C \rho_T^a \rho_S^b$.
Perturbation bounds were recently found to be a promising tool for the analysis of adaptive control and online optimization models (Lin et al., 2021). The exponentially decaying local perturbation bound defined above is similar in spirit to two recent results: Lin et al. (2021) derives a similar perturbation bound for line graphs, and Shin et al. (2021) for general graphs with local perturbations. In fact, one may attempt to derive such a bound by applying these results directly; however, a major weakness of the direct approach is that it yields a single decay factor $\rho_T = \rho_S$, i.e., it cannot distinguish between spatial and temporal dependencies, and the bound deteriorates as either interaction strength increases. For instance, even if the temporal interactions are weak (i.e., $\ell_T$ is small), $\rho_T$ can still be close to $1$ if $\ell_S$ is large, leading to a large slack in the perturbation bound for small prediction horizons $k$.
We overcome this limitation by redefining the action variables. Specifically, to focus on the temporal decay effect, we regroup all local actions in $N_v^r$ at each time step into a single “large” decision variable (in Figure 1 we would group each horizontal blue plane to create a new variable). After regrouping, we have $k$ “large” decision variables located on a line graph, where the strength of the interaction between consecutive variables is upper bounded by the temporal interaction strength. On the other hand, to focus on spatial decay, we regroup each agent's local actions across all time steps into a single decision variable (in Figure 1 we would group each vertical orange line to create a new variable). After regrouping, we have “large” decision variables located on $N_v^r$, where the strength of the interaction between two neighbors is upper bounded by the spatial interaction strength. Averaging the two perturbation bounds (since we have two valid bounds, their average is also a valid bound) provides the following exponentially decaying local perturbation bound (see Appendix D.1 for details of the proof).
Theorem 3.2.
Under Assumption 2.1, the exponentially decaying local perturbation bound (Definition 3.1) holds with constants $C$, $\rho_T$, and $\rho_S$ that depend only on the cost parameters $\mu$, $\ell_f$, $\ell_T$, and $\ell_S$; explicit expressions are given in Appendix D.1.
Note that, as $\ell_T$ (respectively, $\ell_S$) tends to zero, $\rho_T$ (respectively, $\rho_S$) in Theorem 3.2 also tends to zero.
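The decay phenomenon behind these bounds can be checked numerically on a one-dimensional chain: pin a boundary variable, perturb it, and watch the induced change at the interior decision shrink geometrically with the distance to the boundary. The quadratic instance below is an illustrative assumption.

```python
import numpy as np

def center_action(m, lam, z):
    """Variables x_0, ..., x_{m-1} free, boundary x_m pinned at z.
    Minimize sum_i x_i^2 + lam * sum_i (x_i - x_{i+1})^2 by solving
    the stationarity system; return the center decision x_0."""
    A = np.zeros((m, m))
    b = np.zeros(m)
    for i in range(m):
        A[i, i] = 2.0 + 2.0 * lam * (2 if i > 0 else 1)  # node + incident edges
        if i > 0:
            A[i, i - 1] = -2.0 * lam
        if i + 1 < m:
            A[i, i + 1] = -2.0 * lam
        else:
            b[i] = 2.0 * lam * z       # coupling to the pinned boundary x_m = z
    return np.linalg.solve(A, b)[0]

# Effect on x_0 of a unit perturbation of the boundary, as the boundary
# is pushed farther away; the values shrink geometrically in m:
shifts = [abs(center_action(m, 1.0, 1.0) - center_action(m, 1.0, 0.0))
          for m in (2, 4, 6, 8)]
print(shifts)
```

The same experiment run along the time axis (instead of the chain) illustrates the temporal decay factor; the paper's analysis makes both decays quantitative and combines them.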
Next, we provide a tighter bound (through a refined analysis) for the regime where the strong convexity parameter $\mu$ is large relative to the interaction strengths. Again, it is not possible to obtain this result from previous perturbation bounds in the literature.
Theorem 3.3.
Recall that $\Delta$ denotes the maximum degree of any vertex in $\mathcal{G}$. Suppose Assumption 2.1 holds together with a condition requiring $\mu$ to be sufficiently large relative to $\ell_T$, $\ell_S$, and $\Delta$ (stated precisely in Appendix D). Then the exponentially decaying local perturbation bound (Definition 3.1) holds with decay factors $\rho_T$ and $\rho_S$ given in Appendix D.
Note that $\rho_T, \rho_S < 1$ follow from the condition on $\mu$. Also observe that both decay factors tend to zero as $\mu$ grows.
The main difference between this result and Theorem 3.2 is that, instead of dividing and redefining the action variables, we explicitly track the perturbations along spatial edges and along temporal edges in the original temporal-spatial graph. We observe that per-time-step spatial interactions are characterized by a banded matrix whose inverse exhibits exponential correlation decay, which implies the exponentially decaying local perturbation bound holds if the perturbed boundary action and the impacted local action we consider are at the same time step. However, for a multi-time-step problem, characterizing the impact on a local action at one time step due to a perturbation of a boundary action at a different time step is a difficult problem. The main technical contribution of this proof is to establish that a product of exponentially decaying matrices still satisfies exponential decay under the conditions in Theorem 3.3. In addition, we obtain a tight bound on the decay rate of the product matrix (see Lemma C.3), which may be of independent interest.
Our condition on $\mu$ characterizes a tradeoff between the allowable neighborhood boundary sizes and how large $\mu$ needs to be compared to the interaction cost parameters $\ell_T$ and $\ell_S$. At one extreme, if the boundary sizes can grow exponentially in $r$, we must impose a strong requirement on $\mu$, growing exponentially in $\Delta$. At the other extreme, if the boundary sizes grow only polynomially in $r$ (as is the case if $\mathcal{G}$ is a grid), we can impose a much weaker requirement on $\mu$, one that grows only linearly in $\Delta$ and compares favorably with the requirement in the former case.
Proofs of Theorem 3.2 and Theorem 3.3 are in Appendix D.
3.3 From Perturbations to Competitive Bounds
We now present our main result, which bounds the competitive ratio of LPC using the exponentially decaying local perturbation bounds defined in the previous section.
Before presenting the result, we first provide some intuition as to why the perturbation bounds are useful for deriving the competitive ratio bound. Specifically, bounding the competitive ratio requires bounding the gap between LPC's trajectory and the offline optimal trajectory. This gap comes from the following two sources: (i) the per-time-step error made by LPC due to its limited prediction horizon and communication radius; and (ii) the cumulative impact of all per-time-step errors made in the past. Intuitively, the local perturbation bounds we derive in Section 3.2 allow us to bound the per-step error made jointly by all agents in LPC, and the perturbation bounds from Lin et al. (2021) then help us bound the second type of cumulative error.
We present our main result in the following theorem and defer a proof outline to Appendix C.2. A formal proof can be found in Appendix E.3.
Theorem 3.4.
Suppose Assumption 2.1 holds and the exponentially decaying local perturbation bound in Definition 3.1 holds with constants $C$, $\rho_T$, and $\rho_S$. Define $h(r) = \max_{v \in \mathcal{V}} |\partial N_v^r|$, the size of the largest $r$-hop boundary in $\mathcal{G}$. If the parameters $k$ and $r$ of LPC are large enough that $\rho_T^k$ and $h(r) \rho_S^r$ fall below a threshold made precise in Appendix E.3, then the competitive ratio of LPC is upper bounded by $1 + \tilde{O}(\rho_T^k) + \tilde{O}(h(r) \rho_S^r)$.
Here the $\tilde{O}(\cdot)$ notation hides factors that depend polynomially on the problem parameters; see Appendix E.3.
Recall that $h(r)$ denotes the size of the largest $r$-hop boundary in $\mathcal{G}$. The bound in Theorem 3.4 implies that if $h(r)$ can be upper bounded by a polynomial in $r$, the competitive ratio of LPC can be upper bounded by $1 + \tilde{O}(\rho_T^k) + \tilde{O}((\rho_S')^r)$ for some constant $\rho_S' \in (\rho_S, 1)$, because the polynomial factor can be absorbed into the exponential decay. Therefore, the competitive ratio improves exponentially with respect to the prediction horizon $k$ and communication radius $r$.
Note that the assumption is not particularly restrictive: for commonly seen graphs like finite-dimensional grids, $h(r)$ is polynomial in $r$, so the absorption above works. More generally, for graphs with bounded degree $\Delta$, the decay factor $\rho_S$ obtained from either Theorem 3.2 or Theorem 3.3 can be made small enough that $h(r) \rho_S^r$ still decays geometrically in $r$. Thus we can eliminate the dependence on $h(r)$ in the competitive ratio by making additional assumptions on $\mathcal{G}$. This result is stated in Corollary 3.5, whose proof is deferred to Appendix E.4. Corollary 3.5 is a corollary of Theorem 3.4 and Theorem 3.3; we use the bound in Theorem 3.3 rather than Theorem 3.2 because Theorem 3.3 is tighter when the spatial interactions are weak.
Corollary 3.5.
Suppose Assumption 2.1 and the conditions of Theorem 3.3 hold. If $k$ and $r$ are sufficiently large, then the competitive ratio of LPC is upper bounded by $1 + \tilde{O}(\rho_T^k) + \tilde{O}(\rho_S^r)$, where $\rho_T$ and $\rho_S$ are given by Theorem 3.3. The $\tilde{O}(\cdot)$ notation hides factors that depend polynomially on the problem parameters; see Appendix E.4 for the full constants.
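Given decay factors $\rho_T$ and $\rho_S$ of this form, the smallest prediction horizon and communication radius meeting a target competitive-ratio slack can be obtained by inverting the two exponential terms, as discussed in the contributions. The constants $C_T$, $C_S$ (placeholders for the factors hidden in the $\tilde{O}(\cdot)$ notation) and the even split of the error budget in this sketch are illustrative assumptions.

```python
import math

def min_resources(rho_T, rho_S, eps, C_T=1.0, C_S=1.0):
    """Smallest integers k and r with C_T*rho_T**k + C_S*rho_S**r <= eps,
    splitting the error budget evenly between the temporal term (k) and
    the spatial term (r)."""
    k = math.ceil(math.log(2 * C_T / eps) / math.log(1 / rho_T))
    r = math.ceil(math.log(2 * C_S / eps) / math.log(1 / rho_S))
    return k, r

# Stronger temporal than spatial interactions (rho_T > rho_S) means the
# prediction horizon is the bottleneck resource here:
k, r = min_resources(rho_T=0.5, rho_S=0.3, eps=0.01)
print(k, r)  # 8 5
```

This mirrors the qualitative message of the upper bound: the required $k$ and $r$ grow only logarithmically in the inverse of the target slack, and whichever decay factor is closer to one dictates the bottleneck resource.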
3.4 A Lower Bound
We show that the competitive ratio in Theorem 3.4 is order-optimal by deriving a lower bound on the competitive ratio of any decentralized online algorithm with prediction horizon $k$ and communication radius $r$. The specific constants and a proof of Theorem 3.6 can be found in Appendix F.
Theorem 3.6.
Under conditions on the problem parameters given in Appendix F, the competitive ratio of any decentralized online algorithm with prediction horizon $k$ and communication radius $r$ is lower bounded by $1 + \Omega(\lambda_T^k) + \Omega(\lambda_S^r)$. Here, the decay factors $\lambda_T, \lambda_S \in (0, 1)$ increase with the temporal and spatial interaction strengths, respectively; their precise expressions, including a case distinction for $\lambda_S$, are given in Appendix F. The $\Omega(\cdot)$ notation hides factors that depend polynomially on the problem parameters.
While Theorem 3.6 highlights that Theorem 3.4 is order-optimal, the decay factors in the lower bound differ from their counterparts in the upper bound for LPC. To understand the magnitude of the difference, we compare the bounds on graphs with bounded degree . The decay factors are functions of the interaction strengths, which are measured by and . Our lower bound on the temporal decay factor and the corresponding upper bound differ only by a constant factor on a log scale, and the same holds for the lower and upper bounds on the spatial decay factor.
To formalize this comparison, we derive a resource augmentation bound that quantifies the additional "resources" LPC needs to outperform the optimal decentralized online algorithm (see, e.g., Roughgarden (2020), for an introduction to this flavor of bound for expressing the near-optimality of an algorithm). Here, the prediction horizon and the communication radius can be viewed as the "resources" available to a decentralized online algorithm in our setting. We ask: how large do the prediction horizon and communication radius given to LPC need to be to ensure that it beats the optimal decentralized online algorithm given a communication radius and prediction horizon ?
We formally state our result in the following corollary and provide a proof in Appendix G.
Corollary 3.7.
Under Assumption 2.1, suppose the optimal decentralized online algorithm achieves a competitive ratio of with prediction horizon and communication radius . Additionally assume that and , where the notation hides a factor that depends polynomially on . As , LPC achieves a competitive ratio at least as good as that of the optimal decentralized online algorithm when LPC uses a prediction horizon of and a communication radius of .
Finally, note that we establish Corollary 3.7 based on the local perturbation bound in Theorem 3.2 rather than Theorem 3.3 for simplicity, because the former makes no assumptions on the relationship among and . We expect that Theorem 3.3 can yield better resource augmentation bounds under stronger assumptions on and .
4 Concluding Remarks
In this work, we introduce and study a novel form of decentralized online convex optimization in a networked system, where the local actions of each agent are coupled by temporal interactions and spatial interactions. We propose a decentralized online algorithm, LPC, which leverages all available information within a prediction horizon of length and a communication radius of to achieve a competitive ratio of . Our lower bound result shows that this competitive ratio is order-optimal. Our results imply that the two types of resources, the prediction horizon and the communication radius, must be improved simultaneously in order to obtain a competitive ratio that converges to . That is, it is not enough to have either a large communication radius or a long prediction horizon; the combination of both is necessary to approach the hindsight-optimal performance.
A limitation of this work is that we have considered only exact predictions in the available information model, the LPC algorithm, and its analysis. Considering inexact predictions is an important future goal and we are optimistic that our work can be generalized in that direction using the roadmap in Appendix C.3.
Acknowledgement: We would like to thank the ICML reviewers and meta-reviewer, as well as Yisong Yue, for their valuable feedback on this work. This work is supported by NSF grants ECCS-2154171, CNS-2146814, CPS-2136197, CNS-2106403, NGSDI-2105648, and CMMI-1653477, with additional support from Amazon AWS, PIMCO, and the Resnick Sustainability Institute. Yiheng Lin was supported by the Kortschak Scholars program.
References
- Agarwal et al. (2019) Agarwal, N., Bullins, B., Hazan, E., Kakade, S., and Singh, K. Online control with adversarial disturbances. In International Conference on Machine Learning, pp. 111–119. PMLR, 2019.
- Akbari et al. (2017) Akbari, M., Gharesifard, B., and Linder, T. Distributed online convex optimization on time-varying directed graphs. IEEE Transactions on Control of Network Systems, 4(3), 2017.
- Antoniadis et al. (2020) Antoniadis, A., Coester, C., Elias, M., Polak, A., and Simon, B. Online metric algorithms with untrusted predictions. In International Conference on Machine Learning, pp. 345–355. PMLR, 2020.
- Argue et al. (2020a) Argue, C., Gupta, A., and Guruganesh, G. Dimension-free bounds for chasing convex functions. In Conference on Learning Theory, pp. 219–241. PMLR, 2020a.
- Argue et al. (2020b) Argue, C., Gupta, A., Guruganesh, G., and Tang, Z. Chasing convex bodies with linear competitive ratio. In Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1519–1524. SIAM, 2020b.
- Badiei et al. (2015) Badiei, M., Li, N., and Wierman, A. Online convex optimization with ramp constraints. In 2015 54th IEEE Conference on Decision and Control (CDC), pp. 6730–6736. IEEE, 2015.
- Bertsekas & Tsitsiklis (1989) Bertsekas, D. and Tsitsiklis, J. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall Inc., 1989.
- Besbes & Lobel (2015) Besbes, O. and Lobel, I. Intertemporal price discrimination: Structure and computation of optimal policies. Management Science, 61(1):92–110, 2015.
- Biggs et al. (2021) Biggs, M., Sun, W., and Ettl, M. Model distillation for revenue optimization: Interpretable personalized pricing. In International Conference on Machine Learning, pp. 946–956. PMLR, 2021.
- Boyd et al. (2011) Boyd, S., Parikh, N., and Chu, E. Distributed optimization and statistical learning via the alternating direction method of multipliers. Now Publishers Inc, 2011.
- Candogan et al. (2012) Candogan, O., Bimpikis, K., and Ozdaglar, A. Optimal pricing in networks with externalities. Operations Research, 60(4):883–905, 2012.
- Cao & Basar (2021) Cao, X. and Basar, T. Decentralized online convex optimization with feedback delays. IEEE Transactions on Automatic Control, 2021. ISSN 15582523. doi: 10.1109/TAC.2021.3092562.
- Cao & Başar (2021) Cao, X. and Başar, T. Decentralized online convex optimization based on signs of relative states. Automatica, 129, 2021. ISSN 00051098. doi: 10.1016/j.automatica.2021.109676.
- Caro & Gallien (2012) Caro, F. and Gallien, J. Clearance pricing optimization for a fast-fashion retailer. Operations Research, 60(6):1404–1422, 2012.
- Chen & Chen (2015) Chen, M. and Chen, Z. L. Recent developments in dynamic pricing research: Multiple products, competition, and limited demand information. Production and Operations Management, 24, 2015. ISSN 19375956. doi: 10.1111/poms.12295.
- Chen et al. (2016) Chen, N., Comden, J., Liu, Z., Gandhi, A., and Wierman, A. Using predictions in online optimization: Looking forward with an eye on the past. ACM SIGMETRICS Performance Evaluation Review, 44(1):193–206, 2016.
- Chen et al. (2018) Chen, N., Goel, G., and Wierman, A. Smoothed online convex optimization in high dimensions via online balanced descent. In Proceedings of Conference On Learning Theory (COLT), pp. 1574–1594, 2018.
- Chen & Farias (2018) Chen, Y. and Farias, V. F. Robust dynamic pricing with strategic customers. Mathematics of Operations Research, 43(4):1119–1142, 2018.
- Chen & Wang (2005) Chen, Y. Q. and Wang, Z. Formation control: A review and a new consideration. In 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS, 2005. doi: 10.1109/IROS.2005.1545539.
- Fiacco & Ishizuka (1990) Fiacco, A. V. and Ishizuka, Y. Sensitivity and stability analysis for nonlinear programming. Annals of Operations Research, 27(1):215–235, 1990.
- Forsgren et al. (2002) Forsgren, A., Gill, P. E., and Wright, M. H. Interior methods for nonlinear optimization. SIAM review, 44(4):525–597, 2002.
- Gallego & Topaloglu (2019) Gallego, G. and Topaloglu, H. Revenue management and pricing analytics. International Series in Operations Research and Management Science, 279, 2019. ISSN 22147934. doi: 10.1007/978-1-4939-9606-3.
- Gamarnik & Goldberg (2010) Gamarnik, D. and Goldberg, D. A. Randomized greedy algorithms for independent sets and matchings in regular graphs: Exact results and finite girth corrections. Combinatorics Probability and Computing, 19, 2010. ISSN 09635483. doi: 10.1017/S0963548309990186.
- Garcia et al. (1989) Garcia, C. E., Prett, D. M., and Morari, M. Model predictive control: Theory and practice—a survey. Automatica, 25(3):335–348, 1989.
- Goel & Wierman (2019) Goel, G. and Wierman, A. An online algorithm for smoothed regression and LQR control. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.
- Goel et al. (2019) Goel, G., Lin, Y., Sun, H., and Wierman, A. Beyond online balanced descent: An optimal algorithm for smoothed online optimization. In Neural Information Processing Systems (NeurIPS), 2019.
- Hochba (1997) Hochba, D. S. Approximation algorithms for NP-hard problems. ACM Sigact News, 28(2):40–52, 1997.
- Hosseini et al. (2016) Hosseini, S., Chapman, A., and Mesbahi, M. Online distributed convex optimization on dynamic networks. IEEE Transactions on Automatic Control, 61(11), 2016. ISSN 00189286. doi: 10.1109/TAC.2016.2525928.
- Joseph & de Veciana (2012) Joseph, V. and de Veciana, G. Jointly optimizing multi-user rate adaptation for video transport over wireless systems: Mean-fairness-variability tradeoffs. In Proceedings of the IEEE INFOCOM, pp. 567–575, 2012.
- Kempe et al. (2015) Kempe, D., Kleinberg, J., and Tardos, É. Maximizing the spread of influence through a social network. Theory of Computing, 11, 2015. ISSN 15572862. doi: 10.4086/toc.2015.v011a004.
- Kim & Giannakis (2017) Kim, S. and Giannakis, G. B. An online convex optimization approach to real-time energy pricing for demand response. IEEE Transactions on Smart Grid, 8(6):2784–2793, 2017. ISSN 1949-3053. doi: 10.1109/TSG.2016.2539948.
- Li et al. (2021a) Li, X., Yi, X., and Xie, L. Distributed online convex optimization with an aggregative variable. IEEE Transactions on Control of Network Systems, 2021a. ISSN 2325-5870. doi: 10.1109/tcns.2021.3107480.
- Li et al. (2021b) Li, X., Yi, X., and Xie, L. Distributed online optimization for multi-agent networks with coupled inequality constraints. IEEE Transactions on Automatic Control, 66(8), 2021b. ISSN 15582523. doi: 10.1109/TAC.2020.3021011.
- Li et al. (2020) Li, Y., Qu, G., and Li, N. Online optimization with predictions and switching costs: Fast algorithms and the fundamental limit. IEEE Transactions on Automatic Control, 2020.
- Lin et al. (2012) Lin, M., Liu, Z., Wierman, A., and Andrew, L. L. Online algorithms for geographical load balancing. In Proceedings of the International Green Computing Conference (IGCC), pp. 1–10, 2012.
- Lin et al. (2020) Lin, Y., Goel, G., and Wierman, A. Online optimization with predictions and non-convex losses. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 4(1):1–32, 2020.
- Lin et al. (2021) Lin, Y., Hu, Y., Sun, H., Shi, G., Qu, G., and Wierman, A. Perturbation-based regret analysis of predictive control in linear time varying systems. arXiv preprint arXiv:2106.10497, 2021.
- Marschak (1955) Marschak, J. Elements for a theory of teams. Management Science, 1, 1955. ISSN 0025-1909. doi: 10.1287/mnsc.1.2.127.
- Marschak & Radner (1972) Marschak, J. and Radner, R. Economic Theory of Teams. Cowles Foundation for Research in Economics at Yale University, 1972.
- Molzahn et al. (2017) Molzahn, D. K., Dörfler, F., Sandberg, H., Low, S. H., Chakrabarti, S., Baldick, R., and Lavaei, J. A survey of distributed optimization and control algorithms for electric power systems. IEEE Transactions on Smart Grid, 8(6):2941–2962, 2017.
- Morari & Lee (1999) Morari, M. and Lee, J. H. Model predictive control: past, present and future. Computers & Chemical Engineering, 23(4-5):667–682, 1999.
- Nedić et al. (2018) Nedić, A., Olshevsky, A., and Rabbat, M. G. Network topology and communication-computation tradeoffs in decentralized optimization. Proceedings of the IEEE, 106(5):953–976, 2018.
- Oh et al. (2015) Oh, K. K., Park, M. C., and Ahn, H. S. A survey of multi-agent formation control. Automatica, 53, 2015. ISSN 00051098. doi: 10.1016/j.automatica.2014.10.022.
- Roughgarden (2020) Roughgarden, T. Resource augmentation. CoRR, abs/2007.13234, 2020. URL https://arxiv.org/abs/2007.13234.
- Schlosser (2016) Schlosser, R. Stochastic dynamic multi-product pricing with dynamic advertising and adoption effects. Journal of Revenue and Pricing Management, 15(2):153–169, 2016.
- Sellke (2020) Sellke, M. Chasing convex bodies optimally. In Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1509–1518. SIAM, 2020.
- Shi et al. (2020) Shi, G., Lin, Y., Chung, S.-J., Yue, Y., and Wierman, A. Online optimization with memory and competitive control. Advances in Neural Information Processing Systems, 33:20636–20647, 2020.
- Shi et al. (2019) Shi, M., Lin, X., and Jiao, L. On the value of look-ahead in competitive online convex optimization. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 3(2):22, 2019.
- Shi et al. (2015) Shi, W., Ling, Q., Wu, G., and Yin, W. Extra: An exact first-order algorithm for decentralized consensus optimization. SIAM Journal on Optimization, 25(2):944–966, 2015.
- Shi et al. (2021) Shi, Y., Qu, G., Low, S., Anandkumar, A., and Wierman, A. Stability constrained reinforcement learning for real-time voltage control. arXiv preprint arXiv:2109.14854, 2021.
- Shin & Zavala (2021) Shin, S. and Zavala, V. M. Controllability and observability imply exponential decay of sensitivity in dynamic optimization. IFAC-PapersOnLine, 54(6):179–184, 2021.
- Shin et al. (2020) Shin, S., Zavala, V. M., and Anitescu, M. Decentralized schemes with overlap for solving graph-structured optimization problems. IEEE Transactions on Control of Network Systems, 7(3):1225–1236, Sep 2020. ISSN 2372-2533. doi: 10.1109/tcns.2020.2967805. URL http://dx.doi.org/10.1109/TCNS.2020.2967805.
- Shin et al. (2021) Shin, S., Anitescu, M., and Zavala, V. M. Exponential decay of sensitivity in graph-structured nonlinear programs. arXiv preprint arXiv:2101.03067, 2021.
- Song & Chintagunta (2006) Song, I. and Chintagunta, P. K. Measuring cross-category price effects with aggregate store data. Management Science, 52(10):1594–1609, 2006.
- Stanica (2001) Stanica, P. Good lower and upper bounds on binomial coefficients. Journal of Inequalities in Pure and Applied Mathematics, 2(3):30, 2001.
- Talluri & Ryzin (2006) Talluri, K. and Ryzin, G. V. The Theory and Practice of Revenue Management. International Series in Operations Research and Management Science, 68. Springer, 2006.
- Tretter (2008) Tretter, C. Spectral Theory of Block Operator Matrices and Applications, 2008. doi: 10.1142/P493.
- Wainwright & Jordan (2008) Wainwright, M. J. and Jordan, M. I. Graphical models, exponential families, and variational inference. Now Publishers Inc, 2008.
- Xin et al. (2020) Xin, R., Pu, S., Nedić, A., and Khan, U. A. A general framework for decentralized optimization with first-order methods. Proceedings of the IEEE, 108(11):1869–1889, 2020.
- Yi et al. (2020) Yi, X., Li, X., Xie, L., and Johansson, K. H. Distributed online convex optimization with time-varying coupled inequality constraints. IEEE Transactions on Signal Processing, 68, 2020. ISSN 19410476. doi: 10.1109/TSP.2020.2964200.
- Yu et al. (2020) Yu, C., Shi, G., Chung, S.-J., Yue, Y., and Wierman, A. The power of predictions in online control. Advances in Neural Information Processing Systems, 33, 2020.
- Yuan et al. (2021) Yuan, D., Proutiere, A., and Shi, G. Distributed online optimization with long-term constraints. IEEE Transactions on Automatic Control, 2021. ISSN 15582523. doi: 10.1109/TAC.2021.3057601.
Appendix A Notation Summary and Definitions
The notation we use throughout the paper is summarized in the following two tables.
Notation | Meaning
---|---
 | The network of agents connected by undirected edges;
 | The graph distance (i.e. the length of the shortest path) between two vertices and in ;
 | The -hop neighborhood of vertex/agent in , i.e., ;
 | The boundary of the -hop neighborhood of vertex/agent , i.e., ;
 | The size of the largest -hop boundary in , i.e., ;
 | The maximum degree of any vertex in ;
 | The set of all edges that have both endpoints in , where ;
 | The extension of by 1-hop, i.e., ;
 | , which is a set of (time, vertex) pairs;
 | , which is a set of (time, vertex) pairs;
Notation | Meaning
---|---
 | The (Euclidean) 2-norm for vectors and the induced 2-norm for matrices;
 | The whole horizon length of the Networked OCO problem;
 | The sequence of integers ;
 | For any positive integer , denotes the set of all symmetric real matrices;
 | The sequence , for ;
 | The individual action of agent at time step . It is a vector in ;
 | The joint action of all agents in set at time , i.e., ;
 | The joint action of all agents in at time , i.e., . It is a shorthand of ;
 | The offline optimal joint action of all agents at time ;
 | The clairvoyant joint decision of all agents at time given that the joint decision is at time ;
 | The node cost function for agent at time step ;
 | The temporal interaction cost function for agent at time step ;
 | The spatial interaction cost for edge at time step ;
 | The strong convexity constant of node costs ;
 | The smoothness constant of node costs, temporal interaction costs, and spatial interaction costs;
 | The feasible set of for agent at time . It is a convex subset of ;
 | The minimizer of the node cost function for at time subject to , i.e., ;
 | The total node costs and spatial interaction costs over a subset at time , i.e., ;
 | The total temporal interaction costs over a subset at time , i.e., ;
 | The total node costs and spatial interaction costs over at time . A shorthand of ;
 | The total temporal interaction costs over at time . A shorthand of ;
 | The optimal individual decisions in when the decision boundaries formed by and are fixed as parameters;
 | The optimal global trajectory when and are fixed as parameters;
 | The optimal global trajectory when is fixed as the parameter;
In addition to the notation in the tables above, we make use of the concepts of strong convexity and smoothness throughout this paper.
Definition A.1.
For a fixed dimension , let be a convex set, and suppose function is a differentiable function. Then, is called -smooth for some constant if
and is called -strongly convex for some constant if
Here denotes the dot product of vectors.
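As a quick numerical illustration of Definition A.1 (the quadratic below is arbitrary, chosen only for the demonstration), both inequalities can be checked for a quadratic function whose strong-convexity and smoothness constants are the extreme eigenvalues of its Hessian:

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary positive-definite quadratic f(x) = 0.5 * x^T A x.
B = rng.standard_normal((4, 4))
A = B @ B.T + 0.5 * np.eye(4)

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

eigs = np.linalg.eigvalsh(A)
mu, ell = eigs[0], eigs[-1]  # strong-convexity and smoothness constants

# Check both inequalities of Definition A.1 at random point pairs:
# mu/2 * ||y - x||^2  <=  f(y) - f(x) - <grad f(x), y - x>  <=  ell/2 * ||y - x||^2
for _ in range(100):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    gap = f(y) - f(x) - grad(x) @ (y - x)
    half_sq = 0.5 * np.dot(y - x, y - x)
    assert mu * half_sq - 1e-9 <= gap <= ell * half_sq + 1e-9
```

For a quadratic the two bounds are tight at the extreme eigenvectors of the Hessian, which is why the eigenvalue spread (the condition number) governs the perturbation analysis later in the appendix.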
Appendix B Example: Multiproduct Pricing
The networked online convex optimization problem captures many applications where individual agents must make decisions online in a networked system. In this section, we give a concrete motivating example from multiproduct pricing. Many recent books on revenue management (Talluri & Ryzin, 2006; Gallego & Topaloglu, 2019) and papers (Song & Chintagunta, 2006; Caro & Gallien, 2012; Candogan et al., 2012; Chen & Chen, 2015) describe numerous instances of this problem. In our example below, we follow a pricing model similar to that of Candogan et al. (2012).
Consider a setting where a large company sells different products and wants to maximize its revenue by adjusting prices adaptively in a time-varying market. Each vertex/agent corresponds to a product and denotes its price at time . Two products and are connected by an edge if they interact, e.g., because the products are complements or substitutes.
We assume a classical linear demand model (Talluri & Ryzin, 2006; Gallego & Topaloglu, 2019), where the demand of at time , denoted as , is given by
with parameters and . Here, Part 1 corresponds to the nominal demand at price ; Part 2 adds the network externalities from 's complements/substitutes; and Part 3 reflects the pent-up demand of product due to a high price at time . Note that the coefficient can differ from . To simplify the notation, for each undirected edge , we define an aggregate coefficient .
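For concreteness, here is a minimal sketch of such a linear demand model on a hypothetical three-product line graph; all coefficient values below (nominal demand, own elasticity, cross-elasticity, pent-up term) are illustrative stand-ins, not values from the paper:

```python
import numpy as np

# Hypothetical 3-product line graph 0 - 1 - 2; every coefficient below
# is an illustrative stand-in, not a value from the paper.
edges = [(0, 1), (1, 2)]
a, b, d = 10.0, 2.0, 0.5     # nominal demand, own elasticity, pent-up demand
c = {e: 0.3 for e in edges}  # aggregate cross-elasticity per edge

def demand(p_now, p_prev):
    """Linear demand: nominal term + network externalities + pent-up demand."""
    dem = a - b * p_now + d * (p_prev - p_now)
    for (i, j), c_ij in c.items():
        dem[i] += c_ij * p_now[j]   # externality from neighbor j on i
        dem[j] += c_ij * p_now[i]   # and symmetrically from i on j
    return dem

print(demand(np.array([1.0, 2.0, 1.0]), np.ones(3)))
```

Raising one product's price lowers its own demand but (for complements/substitutes with positive cross-elasticity) raises its neighbors' demand, which is exactly the spatial coupling the framework captures.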
The full revenue maximization problem can be written as
(3) | ||||
s.t. |
which is equivalent to the following:
(4) | ||||
s.t. |
We assume that the product’s own price-elasticity coefficient is uniformly larger than the sum of the magnitudes of the cross-elasticity coefficients, i.e., there exist such that
holds for any . Further, we assume , , and .
We now observe that this problem fits within our framework with node, spatial, and temporal costs which are quadratic and defined as follows.
Note that is plus a constant , and the interaction functions can be rewritten as
Summing, we see that
(5) |
Hence the optimal solution of (4) is the same as the following problem:
(6) | ||||
s.t. |
where the node cost function is nonnegative, -strongly convex, and -smooth; the spatial interaction function is nonnegative, convex, and -smooth; and the temporal interaction function is nonnegative, convex, and -smooth.
The decentralized nature of our policy is important in this setting. Interpretable pricing algorithms (Biggs et al., 2021) are attractive in practice. Our local pricing algorithm is indeed interpretable, since the current price of a given product is transparently determined by reliable predictions of demand in the near future as well as interactions with directly related products.
In addition, exactly solving the global multiproduct pricing problem can be computationally challenging in practice, especially when the network is large. For example, large online e-commerce companies maintain millions of products, which makes the entire network difficult to store, let alone do computation over. Moreover, due to the ease of changing prices, e-commerce companies often use dynamic pricing and change prices on a daily (or quicker) basis, which magnifies the computational burden.
B.1 Competitive Bound
We end our discussion of multiproduct pricing by showing how the competitive bound in Theorem 3.4 can be applied to the revenue maximization problem of product networks through the use of the following lemma.
Lemma B.1.
Suppose the competitive ratio of our general cost minimization problem is , which is a function of the prediction horizon and communication radius . Suppose and . Then the competitive ratio for the corresponding revenue maximization problem, defined as , is at least , where denotes the degree of the product network and .
Proof.
We define . Suppose
then
Rearranging the terms yields
(7) |
To find a lower bound on , we choose a pricing strategy such that where . We first check that the demand is always nonnegative under this strategy:
Moreover,
Hence this is a feasible pricing strategy.
Since for the cost minimization problem,
This allows us to complete the proof as follows
∎
Appendix C Proof Outline
In this section, we outline the major novelties in our proofs for the tighter exponentially decaying local perturbation bound in Theorem 3.3 and the main competitive ratio bound for LPC in Theorem 3.4. The full details of the proofs of these and other results are in the appendices following this one.
C.1 Refined Analysis of Perturbation Bounds
We begin by outlining the four-step structure we use to prove Theorem 3.3. Our goal is to highlight the main ideas, while deferring a detailed proof to Appendix D.2.
Step 1: Establish first-order equations
We define as the objective function in (3.1), where actions on the boundary are fixed as and the actions at time are fixed as . We denote those fixed actions as system parameter
To avoid writing the time index repeatedly, we use to denote actions at time for . The main lemma for this step is the following.
Lemma C.1.
Given , system parameter and perturbation vector , we have
where
Step 2: Exploit the structure of matrix
is a hierarchical block matrix whose first level has dimension . When the first-level indices (i.e., time indices) in are fixed, the lower-level matrices are non-zero only if the difference in the time indices is . Hence we decompose into a block-diagonal matrix and a block tri-diagonal matrix with zero matrices on the diagonal. Each diagonal block in is a graph-induced banded matrix, which captures the Hessian of at a single time step. Denote each diagonal block as for . Further, for , (similarly ) captures the temporal correlation of individual actions between consecutive time steps. Given this decomposition,
For ease of notation, we denote by . Note that is not necessarily a symmetric matrix. Nevertheless, under technical conditions on 's eigenvalues, we have the following power series expansion (Shin et al., 2020). The details are presented in Lemma C.2 in Appendix D.2.
Lemma C.2.
Under the condition , we have
(8) |
To understand the power series in (8), consider the special case where each block , and . Denote as , which is equivalent to . Then, we have , , . Intuitively, captures the “correlation over actions” after one time step. More generally, for and any two time indices , ,
where is a constant depending on and equal to zero if .
Given that is a graph-induced banded matrix, satisfies exponential-decay properties, which makes it plausible that is an exponential decay matrix with a slower rate.
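The decay phenomenon invoked here can be seen numerically: for a well-conditioned banded matrix (below, a toy tridiagonal matrix standing in for a graph-induced banded Hessian; its values are arbitrary), the entries of the inverse decay geometrically in the distance from the diagonal:

```python
import numpy as np

n = 30
# A toy well-conditioned 1-banded (tridiagonal) SPD matrix, standing in
# for a graph-induced banded Hessian block; entries are arbitrary.
A = 4.0 * np.eye(n) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
Ainv = np.linalg.inv(A)

# Entries of the inverse decay geometrically in the distance |i - j|.
mags = [abs(Ainv[0, j]) for j in range(8)]
ratios = [mags[j + 1] / mags[j] for j in range(7)]
assert all(r < 0.5 for r in ratios)  # geometric decay rate bounded below 1/2
```

The decay rate worsens (approaches 1) as the matrix becomes ill-conditioned, matching the role the condition number plays in the perturbation bounds.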
For the general case, we need to bound terms similar to . This is the goal of Step 3.
Step 3: Properties for general exponential-decay matrices
The goal of this step is to establish that a product of exponential-decay matrices still exhibits an exponential-decay property under technical conditions on the underlying graph.
Lemma C.3.
Given any graph and integers , suppose the block matrices all satisfy exponential-decay properties, i.e., there exist , and such that
Select some s.t. . If , then satisfies exponential decay properties with decay rate , i.e.,
where .
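A small numerical check of this message, using synthetic exponential-decay matrices on a line graph (the decay rate below is arbitrary): the product of two such matrices still decays with distance, though at a slower rate than either factor, just as the lemma asserts.

```python
import numpy as np

n, q = 40, 0.3
idx = np.arange(n)
# A synthetic matrix whose entries decay geometrically in |i - j|,
# a line-graph analogue of an exponential-decay block matrix.
D = q ** np.abs(idx[:, None] - idx[None, :])
P = D @ D  # product of two exponential-decay matrices

# The product still decays with distance...
assert abs(P[0, 30]) < abs(P[0, 15]) < abs(P[0, 5])
# ...but more slowly than the single-factor rate q^|i - j|.
assert P[0, 15] > D[0, 15]
```

The slowdown comes from the number of "paths" through intermediate indices, which is why the lemma's conclusion involves both a degraded rate and a combinatorial constant.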
Step 4: Establish correlation decay properties of matrix
The last step of the proof is to study the properties of . To accomplish this, we first show that, for time indices , has the following properties:
- if or is odd.
- is a summation of terms, and the number of such terms is bounded by .
We formally state and prove the above observation in Lemma D.3. We can further apply Lemma C.3 to the block matrices , which gives the following lemma.
Lemma C.4.
Recall . Select s.t. and . Given and , we have
C.2 From Perturbation to Competitive Ratio
We now show how to use the result proven in the previous subsection to prove our competitive ratio bounds in Theorem 3.4. Our starting point is the assumption that the exponentially decaying local perturbation bound in Definition 3.1 holds for some and , which is established using the proof approach outlined in Section 3.3.
As we discussed in Section 3.3, our proof contains two key parts: (i) we bound the per-time-step error of LPC (Lemma C.7); and (ii) we show that the per-time-step errors do not accumulate to be unbounded (Theorem C.8).
A key observation that enables the above analysis approach is that the aggregation of local per-time-step error made by each agent at can be viewed as a global per-time-step error in the joint global action . Following this observation, we first introduce a global perturbation bound that focuses on the global action rather than the local actions . Recall that denotes the global hitting cost (see Section 2). Define the optimization problem that solves the optimal trajectory from global state to
s.t. | (9) |
and another one that solves the optimal trajectory from global state to the end of the game
s.t. | (10) |
The following global perturbation bound can be derived from Theorem 3.1 in Lin et al. (2021):
Theorem C.5 (Global Perturbation Bound).
To make the concept of per-time-step error rigorous, we formally define it as the distance between the actual next action picked by LPC and the clairvoyant optimal next action from the previous action to the end of the game:
Definition C.6 (Per-step error magnitude).
At time step , given the previous state , the (decentralized) online algorithm picks . We define error as
Using the local perturbation bound in Definition 3.1, we show the per-time-step error of LPC decays exponentially with respect to prediction length and communication range . This result is stated formally in Lemma C.7, and the proof can be found in Appendix E.2.
Lemma C.7.
For with parameters and , satisfies
where .
Using the global perturbation bound in Theorem C.5, we show in Theorem C.8 that can be upper bounded by the sum of per-time-step errors of LPC. The proof can be found in Appendix E.1.
Theorem C.8.
Let denote the offline optimal global trajectory and denote the trajectory of . The trajectory of satisfies that
where and is defined in Theorem C.5.
To understand the bound in Theorem C.8, set all per-time-step errors to zero except at a single time step . We see that the impact of on the total squared distance is at most a constant factor times . This is because the impact of on decays exponentially as increases from to .
By substituting the per-time-step error bound in Lemma C.7 into Theorem C.8, one can bound by the offline optimal cost, which can be converted to the competitive ratio bound in Theorem 3.4.
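A scalar caricature of this accumulation argument (the contraction rate below is arbitrary, standing in for the exponential decay of a per-step error's downstream impact): each per-step error injects a deviation that then contracts geometrically, so the total squared deviation is bounded by a constant times the sum of squared per-step errors, via Young's convolution inequality.

```python
# Scalar caricature of Theorem C.8: each per-step error e_t injects a
# deviation that then contracts at an (arbitrary, illustrative) rate rho.
rho = 0.6
errors = [0.0] * 10
errors[3], errors[6] = 1.0, 0.5   # two isolated per-step errors

dev, total_sq = 0.0, 0.0
for e in errors:
    dev = rho * dev + e       # old deviation contracts, new error enters
    total_sq += dev ** 2      # accumulate total squared deviation

# Young's convolution inequality: ||rho^* conv e||_2 <= ||rho^*||_1 ||e||_2,
# giving a constant-factor bound on the total squared deviation.
bound = sum(e ** 2 for e in errors) / (1.0 - rho) ** 2
assert total_sq <= bound
```

With a single nonzero error one can see the geometric tail directly: the deviation it causes at later steps shrinks by a factor of rho per step, so its total squared contribution is a constant multiple of the error itself.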
C.3 Roadmap to Generalize the Proof to Inexact Predictions
In this section, we present a roadmap for generalizing our proof to the case where predictions of future cost functions are inexact. In the information availability model in Section 2.1, one can study inexact predictions by introducing additional disturbance parameters to the three types of cost functions and generalizing them to for all . represents the disturbances in the cost functions that are hard to predict exactly. Before the decentralized online algorithm decides each local action, it receives the generalized cost functions and noisy predictions of the true disturbance parameters within time steps and an -hop neighborhood. In the LPC algorithm (Algorithm 1), the optimization problem can then be solved with the noisy predictions of the disturbance parameters. To analyze the performance of LPC in the presence of prediction errors, one can first generalize the exponentially decaying perturbation bounds in Theorem 3.2 and Theorem 3.3 to include perturbations of the disturbance parameters, similar to what we already did in Theorem D.2. The prediction error on the disturbance parameters will result in an additional additive term in the per-step error bound in Lemma C.7. If one is willing to assume that the total sum of prediction errors is , as in Antoniadis et al. (2020), one can derive a competitive ratio for LPC by substituting the per-step error bound into Theorem C.8. It is worth noting that the resulting competitive ratio will inevitably depend on the quality of predictions, and will converge to a limit larger than (under imperfect predictions) as the prediction horizon and the communication radius increase.
Appendix D Perturbation Bounds
This section provides the full proofs of the perturbation bounds stated in Section 3.2.
D.1 Proof of Theorem 3.2
We begin with a technical lemma. Recall that for any positive integer , denotes the set of all symmetric real matrices.
Lemma D.1.
For a graph , suppose is a positive definite matrix in formed by blocks, where the -th block has dimension , i.e., . Assume that is -banded for an even positive integer ; i.e.,
Let denote the smallest eigenvalue of , and the largest eigenvalue of . Assume that . Suppose , where is positive semi-definite. Let , where . Then we have , where
Here denotes the condition number of matrix .
We can show Lemma D.1 using the same method as Lemma B.1 in Lin et al. (2021). We only need to note that, even when the sizes of the blocks are not identical, the th power of a -banded matrix is a -banded matrix for any positive integer .
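The bandwidth fact used here — the kth power of a B-banded matrix is kB-banded — is easy to verify numerically on a toy tridiagonal (1-banded) matrix with arbitrary entries:

```python
import numpy as np

def bandwidth(M, tol=1e-12):
    """Largest |i - j| such that M[i, j] is non-negligible."""
    n = M.shape[0]
    dist = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    return int(dist[np.abs(M) > tol].max())

n = 12
# A toy 1-banded (tridiagonal) matrix with arbitrary nonzero entries.
A = np.eye(n) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)

assert bandwidth(A) == 1
assert bandwidth(A @ A) == 2  # the square of a 1-banded matrix is 2-banded
assert bandwidth(np.linalg.matrix_power(A, 3)) == 3
```

In graph terms, an entry of the kth power is nonzero only if the two indices are connected by a path of length at most k, which is exactly why the band widens linearly with the power.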
With the help of Lemma D.1, we can proceed to show a local perturbation bound on a general in Theorem D.2, where can be different from the network of agents in Section 2. Compared with Theorem 3.1 in Lin et al. (2021), Theorem D.2 is more general because it considers a general network of decision variables, while Theorem 3.1 in Lin et al. (2021) only considers the special case of a line graph. Although Theorem D.2 does not consider the temporal dimension that features in the local perturbation bound defined in Definition 3.1, we will use it to show Theorem 3.2 later by redefining the variables from two perspectives.
Theorem D.2.
For a network with undirected edges, suppose that each node is associated with a decision vector (we add a hat over the decision vector to distinguish it from the local action and global action defined in Section 2, and we assume is a -dimensional real vector) and a cost function , and each edge is associated with an edge cost . Assume that is -strongly convex for all and is -smooth for all . For some subset , define
For the disturbance vectors indexed by and indexed by (we do not consider the disturbance vectors in the exponentially decaying local perturbation bounds defined in Definition 3.1, but adding into the edge costs makes Theorem D.2 more general; for each edge , is a -dimensional real vector), let denote the optimal solution of the optimization problem
Then, we have that for any vertex , the following inequality holds:
where and . Here, denotes the maximum degree of any vertex in graph . For , we define .
Proof of Theorem D.2.
Let be a vector where for and for . Let be an arbitrary real number. Define function as
To simplify the notation, we use to denote the tuple of system parameters, i.e.,
From our construction, we know that is -strongly convex in , so we use the decomposition , where
Since is the minimizer of convex function , we see that
Taking the derivative with respect to gives that
To simplify the notation, we define
where in , the block size is ; in , the block size is ; in , the block size is . Hence we can write
Recall that are block matrices with block size ; are block matrices with block size . Let denote the set of neighbors of vertex on . For , the -th block can be non-zero only if . For , the -th block can be non-zero only if and . Hence we see that
Since the switching costs are -strongly smooth, we know that the norms of
are all upper bounded by . Taking norms on both sides gives that
(11)
Note that can be decomposed as , where
Since is block tri-diagonal and satisfies , and is block diagonal and satisfies , we obtain the following using Lemma D.1:
where .
Finally, by integration, we complete the proof:
∎
Now we return to the proof of Theorem 3.2. For simplicity, we temporarily assume the individual decision points are unconstrained, i.e., . We discuss how to relax this assumption in Appendix D.3.
We first consider the case where and differ only at one entry or . If the difference is at , by viewing each subset for in the original problem as a vertex in the new graph and applying Theorem D.2, we obtain that
(12)
where and . On the other hand, by viewing each subset for in the original problem as a vertex in the new graph and applying Theorem D.2, we obtain that
(13)
where and . Combining (12) and (13) gives that
(14)
when and differ only at one entry for .
We can use the same method to show that, when and differ only at one entry for , we have
(15)
In the general case where and differ at more than one entry, we can perturb the entries of the parameters one at a time and apply the triangle inequality. The conclusion of Theorem 3.2 then follows from (14) and (15).
D.2 Proof of Theorem 3.3
The proof follows a four step structure outlined in Section C.1.
Step 1. Establish first order equations
Given any system parameter , we can define function as follows:
coincides with on every node in coincides with on every node in For , coincides with on the boundary, i.e.,
Let perturbation vector where and for .
Given , is the global minimizer of convex function , and hence we have
Taking the derivative with respect to , we establish the following set of equations:
(16)
We adopt the following short-hand notation:
- , which is a hierarchical block matrix with the first level of dimension , the second level of dimension and the third level of dimension .
- , which is also a hierarchical block matrix with the first level of dimension , the second level of dimension and the third level of dimension .
- , which is also a hierarchical block matrix with the first level of dimension , the second level of dimension and the third level of dimension .
- , which is also a hierarchical block matrix with the first level of dimension , the second level of dimension and the third level of dimension .
Using the above, we can rewrite (16) as follows:
Due to the structure of the temporal interaction cost functions, for (resp. ), the lower-level block matrix is non-zero only when the first-level index is (resp. ); due to the structure of the spatial interaction cost functions, for , the lower-level block matrix is non-zero only when the first-level index is . Hence, for , we have
(17)
where the subscripts on the right hand side denote the first level index of hierarchical block matrices , , and .
Step 2. Exploit the structure of matrix
We decompose into a block diagonal matrix and a block tri-diagonal matrix such that . We denote each diagonal block in as for . The other blocks in are zero matrices.
Each non-zero block in is a diagonal block matrix, which captures the Hessian of temporal interaction cost between consecutive time steps. Denote each block as for .
We rewrite the inverse of as follows:
Next we show the proof for Lemma C.2.
Proof of Lemma C.2.
We claim that the eigenvalues of are in for some and such that . We first establish Lemma C.2 based on this claim, and then prove the claim.
We follow the argument in the proof of Theorem 4 in Shin et al. (2020). Any eigenvalue of satisfies . Thus, the eigenvalues of lie in , which guarantees . Therefore,
We let and , and prove the above claim by utilizing the Gershgorin circle theorem for block matrices.
By Theorem 1.13.1 and Remark 1.13.2 of Tretter (2008), the following holds: Consider () where and is symmetric. Suppose is the spectrum of a matrix. Define set
where denotes a disk and is the -th smallest eigenvalue of . Then,
Next, we use the above fact to find a superset of . Every diagonal block of is . Moreover, for , , . Hence we have
The last inequality follows from Assumption 2.1. Therefore, . This implies that all eigenvalues of are in . ∎
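As a sanity check on the block Gershgorin theorem invoked above, the following sketch verifies, on a random symmetric block matrix (an illustrative stand-in, not the matrix from the proof), that every eigenvalue lies within distance R_i of an eigenvalue of some diagonal block, where R_i sums the spectral norms of the off-diagonal blocks in block row i:

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 3, 4  # block size and number of blocks
# Symmetric block matrix with small off-diagonal coupling blocks.
blocks = [[rng.standard_normal((d, d)) * (0.1 if i != j else 1.0)
           for j in range(m)] for i in range(m)]
M = np.block(blocks)
M = (M + M.T) / 2

def block(i, j):
    return M[i*d:(i+1)*d, j*d:(j+1)*d]

# Block Gershgorin: every eigenvalue of M is within
# R_i = sum_{j != i} ||M_ij||_2 of an eigenvalue of some diagonal block M_ii.
eigs = np.linalg.eigvalsh(M)
for lam in eigs:
    covered = False
    for i in range(m):
        R = sum(np.linalg.norm(block(i, j), 2) for j in range(m) if j != i)
        diag_eigs = np.linalg.eigvalsh(block(i, i))
        if np.min(np.abs(diag_eigs - lam)) <= R + 1e-9:
            covered = True
    assert covered
print("block Gershgorin disks cover the spectrum")
```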
To further simplify the notation in the power series expansion, we define . Given any time indices and , we have
(18)
where the first equality holds because is a diagonal block matrix, and the second equality is due to Lemma C.2.
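The expansion used in (18) is a Neumann series for the inverse of a sum of a well-conditioned part and a small coupling. The sketch below, with a generic diagonal D and tri-diagonal N standing in for the decomposition of Step 2, verifies the series numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
# D: well-conditioned diagonal part; N: small tri-diagonal coupling.
D = np.diag(rng.uniform(2.0, 3.0, n))
N = np.zeros((n, n))
off = rng.uniform(-0.3, 0.3, n - 1)
N[np.arange(n - 1), np.arange(1, n)] = off
N[np.arange(1, n), np.arange(n - 1)] = off
M = D + N

# Neumann series: M^{-1} = sum_k (-D^{-1} N)^k D^{-1},
# valid when the spectral radius of D^{-1} N is below 1.
Dinv = np.linalg.inv(D)
T = -Dinv @ N
assert np.max(np.abs(np.linalg.eigvals(T))) < 1
approx = np.zeros_like(M)
term = Dinv.copy()
for _ in range(200):
    approx += term      # accumulate T^k D^{-1}
    term = T @ term
assert np.allclose(approx, np.linalg.inv(M), atol=1e-10)
print("Neumann expansion matches the exact inverse")
```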
Step 3: Property for general exponential-decay matrices
This step simply requires proving Lemma C.3.
Step 4: Establish correlation decay properties of matrix
In this step, we apply the property developed for general exponential-decay matrices to and derive the perturbation bound in Theorem 3.3.
Lemma D.3.
For , time index , has the following properties:
- if or is odd.
- is a summation of terms and the number of such terms is bounded by .
Note for integers , we define if is odd.
Proof.
Since is a tri-diagonal banded matrix, for . We prove the remaining properties of by induction on . When ,
Lemma D.3 holds for the base case. Suppose Lemma D.3 holds for for . Let , then
By the induction hypothesis, is a summation of terms . Moreover, the number of such terms is bounded by . Next, we proceed case by case.
Case 1: is odd.
If is odd, then and are both odd. In this case,
which is equal to
Case 2: is even and . In this case, we have
Since is even,
Case 3: is even and .
If is even, then and are both even. We denote as . By the triangle inequality, and are in . Since ,
which sums to by Pascal’s triangle.
∎
Next we present the proof of Lemma C.4.
Proof of Lemma C.4.
By Lemma D.3, is a summation of terms and the number of such terms is bounded by .
Define . Recall is a diagonal matrix and is a graph-induced banded matrix.
where the last inequality follows from applying Lemma D.1 to .
Under the condition , we can use Lemma C.3 to obtain the following bound,
Since the number of such terms is bounded by , we have
∎
Lemma D.4.
Given , and , we have
and for , ,
where and . We let .
Proof.
Given and , since , we have
(20)
With a slight abuse of notation, we use to denote and to denote in the remainder of this proof. We can rewrite the right-hand side of (20) using the new notation as follows:
(21)
For a given and ,
where the last equality holds since spatial interaction costs only exist between neighboring nodes.
For a given , since the spatial interaction cost for each edge is smooth,
which gives
Therefore,
(22)
By Lemma C.4, satisfies the following exponential decay properties: for any ,
where we choose , and .
Moreover, (which denotes ) is the inverse of a graph-induced banded matrix and satisfies, for any ,
where the first inequality again follows from applying Lemma D.1 to .
Applying Lemma C.3 on and , we have for any , and ,
where and . Since , it is straightforward to verify that the above inequality holds when .
With the exponential decay properties of , we have
(23)
The third inequality uses , which can be proved using the following version of Stirling’s approximation, valid for all natural numbers :
Similarly, consider for . With a slight abuse of notation, in this proof we use to denote and to denote . Following the same steps as before, we have
(24)
Since temporal interactions occur at the same node across consecutive time steps, is a diagonal block matrix. Hence,
Moreover, using the exponential decay properties of , we have for ,
Therefore,
(25)
∎
Given time index , node , and perturbation vector ,
where and . We let . Under the condition , and .
Then,
Finally, let and . By integration,
which is bounded by
D.3 Adding Constraints to Perturbation Bounds
Recall that in Appendix D.1 and D.2, we showed Theorem 3.2 and Theorem 3.3 under the assumption that the individual decisions are unconstrained, in order to simplify the analysis. In this section, we present a general way to relax this assumption by incorporating logarithmic barrier functions, which also applies to Theorem C.5.
Recall that in Assumption 2.1, we assume that is convex with a non-empty interior, and can be expressed as
where the th constraint is a convex function in . For any time-vertex pair , we can approximate the individual constraints
by adding the logarithmic barrier function to the original node cost function . Here, the parameter is a positive real number that controls how well the barrier function approximates the indicator function
The approximation improves as the parameter approaches . Thus, the new node cost function becomes
As an extension of the original notation, we use to denote the optimal solution of the following optimization problem
s.t.
Compared with defined in Section 3.1, the constraints are removed and the node costs are replaced with .
A key observation is that the perturbation bounds we have shown in Appendix D.1 and D.2 do not depend on the smoothness constant of the node cost functions. This means that the perturbation bound
holds for arbitrary , where are as specified in Theorem 3.2 or Theorem 3.3 and are independent of . Theorem 3.10 in Forsgren et al. (2002) guarantees that converges to for any positive sequence that tends to zero. Thus, the above perturbation bound also holds for , which includes the constraints on individual decisions.
Note that the argument we present in this section also works for Theorem C.5.
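The barrier argument can be illustrated in one dimension: as the barrier parameter tends to zero, the unconstrained minimizers of the barrier-augmented cost converge to the constrained optimum. The cost (x - 2)^2 and constraint x <= 1 below are illustrative stand-ins for the node costs and constraints, not the actual ones:

```python
import numpy as np

def barrier_minimizer(mu, lo=-10.0, hi=1.0 - 1e-12, iters=200):
    """Bisection on the stationarity condition of
    f_mu(x) = (x - 2)**2 - mu * log(1 - x), the barrier-augmented cost
    for: minimize (x - 2)^2 subject to x <= 1."""
    g = lambda x: 2 * (x - 2) + mu / (1 - x)  # f_mu'(x), strictly increasing
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# As mu -> 0, the barrier minimizers approach the constrained
# optimum x* = 1 (the constraint x <= 1 is active there).
sols = [barrier_minimizer(mu) for mu in (1.0, 1e-2, 1e-4, 1e-6)]
assert all(s < 1 for s in sols)        # iterates stay strictly feasible
assert abs(sols[-1] - 1.0) < 1e-3      # converges to the boundary
print([round(s, 4) for s in sols])
```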
Appendix E Competitive Bounds
This appendix includes the proofs of the competitive bounds presented in Section 3.3.
E.1 Proof of Theorem C.8
We first derive an upper bound on the distance between and .
E.2 Proof of Lemma C.7
In this section, we show that Lemma C.7 holds with the following specific constants:
(28)
Note that, by the principle of optimality, we have
Recall that we define the quantity to simplify the notation.
Since the exponentially decaying local perturbation bound in Definition 3.1 holds, we see that
(29)
which implies that
(30a)
(30b)
(30c)
where we used the AM-GM inequality in (30a), the Cauchy-Schwarz inequality in (30b), and the definitions of the functions and in (30c).
We also note that by the principle of optimality, the following equations hold for all :
Recall that . By Theorem C.5, we see that
(32)
which implies
(33a)
(33b)
where we used the triangle inequality and the AM-GM inequality in (33a), and (32) in (33b).
Substituting (33) into (E.2) gives
(34)
where we used the fact that the node cost function is non-negative and -strongly convex for all , thus
Note that . Thus we have finished the proof of (E.2).
E.3 Proof of Theorem 3.4
In this section, we show Theorem 3.4 holds with the following specific constants:
(35)
under the assumption that
(36)
Recall that is defined in Theorem C.8. Note that Theorem 3.2 and Theorem C.5 hold under Assumption 2.1. One can check that and are bounded by polynomials of and .
In the proof, we need to use Lemma F.2 in Lin et al. (2021) to bound LPC’s total cost by a weighted sum of the offline optimal cost and the sum of squared distances between their trajectories. For completeness, we present Lemma F.2 in Lin et al. (2021) below:
Lemma E.1.
For a fixed dimension , assume the function is convex, -smooth, and continuously differentiable. Then, for all and all , we have
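The precise statement of Lemma E.1 is elided above. One standard inequality of this type, for a convex, ell-smooth, non-negative f, is f(y) <= (1 + eta) f(x) + (ell/2)(1 + 1/eta) ||x - y||^2 for every eta > 0; we emphasize that this exact form is our assumption, not a quotation of the lemma. The sketch below checks it numerically on a random non-negative quadratic:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
A = rng.standard_normal((d, d))
b = rng.standard_normal(d)
f = lambda x: np.sum((A @ x - b) ** 2)     # convex, smooth, non-negative
ell = 2 * np.linalg.eigvalsh(A.T @ A)[-1]  # smoothness constant of f

# Check: f(y) <= (1 + eta) f(x) + (ell/2) (1 + 1/eta) ||x - y||^2
for _ in range(1000):
    x, y = rng.standard_normal(d), rng.standard_normal(d)
    eta = rng.uniform(0.01, 10.0)
    rhs = (1 + eta) * f(x) + 0.5 * ell * (1 + 1 / eta) * np.sum((x - y) ** 2)
    assert f(y) <= rhs + 1e-9
print("smoothness comparison inequality verified")
```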
Now we come back to the proof of Theorem 3.4. We first bound the sum of squared distances between LPC’s trajectory and the offline optimal trajectory:
(37a)
(37b)
where we used Theorem C.8 in (37a); we used Lemma C.7 with the specific constants given in Appendix E.2 in (37b).
Recall that in (36), we assume and are sufficiently large so that the coefficient of the first term in (37) satisfies
Substituting this bound into (37) gives that
(38)
By Lemma E.1, since is -smooth, convex, and non-negative on , and is -smooth, convex, and non-negative on , we know that
(39)
holds for any . Summing the above inequality over gives
(40)
where we used (38) and in the last inequality. Setting in (E.3) finishes the proof of (35).
E.4 Proof of Corollary 3.5
We first show that holds under Assumption 2.1 and the assumption that . To see this, note that, as discussed in Section 3.2, by setting and , Theorem 3.4 holds with
Hence we see that
which implies that
(41)
Recall that function . Hence we see that
(42)
Appendix F Proof of Theorem 3.6
In this appendix, we prove a lower bound on the competitive ratio of any online algorithm. Our proof first establishes temporal and spatial lower bounds separately, and then combines them.
Step 1: Temporal Lower Bounds
We first show that the competitive ratio of any online algorithm with steps of future predictions is lower bounded by . To show this, we consider the special case where there are no spatial interaction costs (i.e., for all and ). In this case, since all agents are independent of each other, it suffices to assume there is only one agent in the network . Thus, we drop the agent index in the following analysis. To further simplify the problem, we assume dimension , , and that the feasible set is for all . Let denote the diameter of , i.e., .
By Theorem 2 in Li et al. (2020) and Case 1 in its proof, we know that for any online algorithm with steps of future predictions and , there exists a problem instance with quadratic functions that have the form such that
(43)
where . Note that
Substituting this into (43) gives
(44)
Note that (43) implies , hence the competitive ratio can be unbounded if .
Step 2: Spatial Lower Bounds
We next show that the competitive ratio of any online algorithm that can communicate within an -hop neighborhood according to the scheme defined in Section 2.1 is lower bounded by . To show this, we construct a special Networked OCO instance with random cost functions and show, via the probabilistic method, that there exists a realization achieving the lower bound.
Theorem F.1.
Under the assumption that , the competitive ratio of any decentralized online algorithm with communication radius is lower bounded by , where notation hides factors that depend polynomially on and , and
(45)
Proof of Theorem F.1.
In the proof, we assume the online game lasts only one time step, i.e., . Note that when , the same counterexample can be constructed repeatedly by letting the temporal interaction costs for every agent and time step . To simplify the notation, we define and . Without loss of generality, we assume , so that each agent has a positive integer index.
We consider the case where the node cost function for each agent is and the spatial interaction cost between two neighboring agents and is . Here, is the scalar action of agent , and the parameter is local information that corresponds to agent . The parameters are sampled i.i.d. from a distribution , which we will specify later.
For a general graph of agents, let denote its graph Laplacian matrix. Recall that the graph Laplacian matrix is a symmetric matrix and it is defined as
for agents . Here, denotes the degree of agent in graph . We know that is symmetric positive semi-definite and has bandwidth w.r.t. . The centralized optimization problem can be expressed as
where the minimum is attained at .
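A minimal sketch of the centralized problem: assuming a standard quadratic form with node costs (x_v - theta_v)^2 / 2 and edge costs weighted by a parameter lam (the exact costs and scalings in the construction are elided above, so this is an illustrative instance rather than the exact one), the centralized optimum solves a linear system involving the graph Laplacian:

```python
import numpy as np

rng = np.random.default_rng(4)
n, lam = 8, 0.5  # agents on a ring; lam weighs the edge costs

# Graph Laplacian of a cycle: degree 2 on the diagonal, -1 per edge.
L = 2 * np.eye(n)
for v in range(n):
    L[v, (v + 1) % n] = L[(v + 1) % n, v] = -1

theta = rng.uniform(-1, 1, n)  # hypothetical local parameters theta_v

# Global objective (1/2)||x - theta||^2 + (lam/2) x^T L x,
# minimized at x* = (I + lam L)^{-1} theta.
obj = lambda x: 0.5 * np.sum((x - theta) ** 2) + 0.5 * lam * x @ L @ x
x_star = np.linalg.solve(np.eye(n) + lam * L, theta)

# Stationarity and optimality checks.
grad = (x_star - theta) + lam * L @ x_star
assert np.allclose(grad, 0, atol=1e-10)
for _ in range(100):
    assert obj(x_star) <= obj(x_star + 0.1 * rng.standard_normal(n)) + 1e-12
print("centralized optimum solves (I + lam L) x = theta")
```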
When each agent only has communication radius , it can only observe the part of that is within . To simplify the notation, we define the mask operator w.r.t. a set as
for . The local policy of agent (denoted ) is a mapping from to the local decision .
Suppose the distribution of each local parameter is mean-zero with support on . For every agent , we see that
(46a)
(46b)
(46c)
where in (46a) we used the fact that the conditional expectation minimizes the mean squared prediction error, and in (46b) we used the requirement that the distribution of is mean-zero.
To bound the variance term in (46c), we need the following lemma, which lower bounds the magnitude of every entry in the exponentially decaying matrix :
Lemma F.2.
There exists a finite graph with maximum degree that satisfies the following conditions: For any two vertices such that , the following inequality holds:
If we make the additional assumption that , we have that
We defer the proof of Lemma F.2 to the end of this section. Note that Lemma F.2 implies that there exists a graph that satisfies , where notation hides factors that depend polynomially on and , and is as defined in (45). We assume the agents are located in this graph for the rest of the proof.
Using Lemma F.2, we can derive the following lower bound on the variance term in (46c):
(47)
Substituting (47) into (46) and summing over all vertices , we obtain that
We also see that
(48)
Note that the global objective function is -strongly convex and is its minimizer. Thus, we have that for any outcome of ,
Taking expectations on both sides w.r.t. gives that
(49)
Dividing (49) by (48), we obtain that
Note that . Thus, there must exist an instance of such that and
∎
Before presenting the proof of Lemma F.2, we first introduce two technical lemmas that will be used in that proof. The first lemma (Lemma F.3) provides a lower bound on the binomial coefficient .
Lemma F.3.
For any positive integer and such that is an integer, the following inequality holds:
Proof of Lemma F.3.
By Lemma 2.1 in (Stanica, 2001), we know for any ,
where satisfies . Thus we see that
Therefore, we can lower bound by
∎
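The exact Stirling-based bound in Lemma F.3 is elided above; a weaker elementary bound of the same flavor, C(n, k) >= (n/k)^k, can be checked exhaustively on a small range:

```python
from math import comb

# Elementary lower bound C(n, k) >= (n/k)^k, since each factor
# (n - i) / (k - i) in the product form of C(n, k) is at least n / k.
for n in range(1, 60):
    for k in range(1, n + 1):
        assert comb(n, k) >= (n / k) ** k
print("binomial lower bound holds on the tested range")
```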
The second technical lemma (Lemma F.4) will be used to simplify the decay factor in the proof of Lemma F.2.
Lemma F.4.
For all , the following inequality holds
Proof of Lemma F.4.
By taking logarithms on both sides, we see that the original inequality is equivalent to
which is further equivalent to
(50)
Note that the LHS can be lower bounded by
The function satisfies , and its derivative is
Thus, for all . Hence (50) holds for all . ∎
Now we are ready to present the proof of Lemma F.2.
Proof of Lemma F.2.

Consider the graph constructed as in Figure 3: let be a sufficiently large positive integer. blocks form a ring, where each block contains nodes, and every pair of adjacent blocks is connected by a complete bipartite graph. The graph Laplacian of can be decomposed as , where is the adjacency matrix of . We see that
Fix two vertices and , denote , and assume . Without loss of generality, we can assume is in the clockwise direction from . We see that
(51)
Note that denotes the number of paths from to of length in graph , and that the shortest paths from to have length . To pick a path of length from to , we can first pick a path at the level of blocks: the number of possible block-level paths is lower bounded by , because we can choose in which of the steps to go in the counterclockwise direction. Once a block-level path is fixed, we can choose which specific vertices in the blocks the path lands at, and there are choices. Thus we see that
Substituting this into (51) gives
(52)
Setting gives that , which shows the first claim of Lemma F.2. We now proceed to show the second claim of Lemma F.2.
By Lemma F.3, we know that when , we have that
For any , the inequality we just showed can be used to bound a term in the summation of (52) from below:
where the last line follows from Lemma F.4.
Thus, we obtain that the following inequality holds for arbitrary :
(53)
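The walk-counting argument can be verified on a small instance of the ring-of-blocks construction: the number of length-k walks between two fixed vertices factors exactly into the number of block-level walks on the ring times m^(k-1) free choices of interior vertices:

```python
import numpy as np

B, m = 6, 3  # B blocks of m vertices arranged in a ring
n = B * m

# Vertex-level adjacency: complete bipartite links between adjacent blocks.
A = np.zeros((n, n), dtype=np.int64)
for i in range(B):
    j = (i + 1) % B
    A[i*m:(i+1)*m, j*m:(j+1)*m] = 1
    A[j*m:(j+1)*m, i*m:(i+1)*m] = 1

# Block-level adjacency: a plain cycle on B nodes.
C = np.zeros((B, B), dtype=np.int64)
for i in range(B):
    C[i, (i + 1) % B] = C[(i + 1) % B, i] = 1

# (A^k)_{uv} counts length-k walks; every block-level walk of length k
# contributes m^(k-1) vertex-level walks (the k-1 interior vertices are
# chosen freely inside their blocks), so the counts factor exactly.
u, v = 0, 2 * m  # a vertex in block 0 and a vertex in block 2
for k in range(1, 7):
    Ak = np.linalg.matrix_power(A, k)
    Ck = np.linalg.matrix_power(C, k)
    assert Ak[u, v] == Ck[0, 2] * m ** (k - 1)
print("walk counts factor into block-level walks times m^(k-1)")
```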
Step 3: Combine Temporal and Spatial Lower Bounds
Combining the results of Steps 1 and 2, we know that the competitive ratio of any decentralized online algorithm is lower bounded by
Appendix G Proof of Corollary 3.7
In this appendix, we prove a resource augmentation bound for LPC. To simplify the notation, we define the shorthand and , where and are positive real numbers. We first show two lemmas about the relationships between the decay factors and , and between and .
Lemma G.1.
Under the assumptions of Theorem 3.2, we have
Proof of Lemma G.1.
Lemma G.2.
Under the assumptions of Theorem 3.2, we have .
Proof of Lemma G.2.
Case 1: . We have in this case.
We first show that the following inequality holds for any positive integer and :
(55)
To see this, define function . Note that is a convex function with and
Thus, we see that holds for all . Hence (55) holds.
By (55), we see that
Case 2: . We have in this case.
To see this, note that , and by Theorem 3.6, . Therefore, we see that
Case 3: . We have in this case.
To see this, note that
Thus we see that
∎
Now we come back to the proof of Corollary 3.7. By Theorem 3.4 and Theorem 3.6, we know that the optimal competitive ratio is lower bounded by
and LPC’s competitive ratio is upper bounded by
where and are some positive constants. To achieve , it suffices to guarantee that
Note that can be upper bounded by some constant and under our assumptions. Applying Lemma G.1 and Lemma G.2 finishes the proof.